Title: SE listening test @128kbit/s
Post by: Serge Smirnoff on 2013-11-28 11:28:26

If anybody is interested in the results of the forthcoming SE listening test @128kbit/s, despite the questionable artifact amplification technique that will be used in this test, please propose your codec candidates.

The results of the test will be presented in the same detailed form as in the previous @64 and @96 tests.

Title: SE listening test @128kbit/s
Post by: Serge Smirnoff on 2014-03-24 22:21:17

The following codecs were added to the 128kbit/s section:

AAC VBR@112.0 (Winamp 5.666) - VBR, AAC LC
AAC VBR@118.4 (iTunes 11.1.3) - TrueVBR, AAC LC
AAC VBR@117.5 (NeroRef 1540) - CVBR, AAC LC
Vorbis VBR@119.4 (Xiph 1.3.3)
Opus VBR@115.7 (libopus 1.1)
mp3 VBR@113.7 (Lame 3.99.5) - MPEG-1 Layer 3, VBR
AAC VBR@110.9 (libfdk 3.4.12) - MPEG-4 AAC LC, VBR
mpc VBR@123.3 (SV8)

All encoders have integer/discrete quality settings - http://soundexpert.org/news/-/blogs/opus-aac-vorbis-mp3-mpc-at-128-kbit-s

Title: SE listening test @128kbit/s
Post by: C.R.Helmrich on 2014-03-25 09:57:33

Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

Chris

Title: SE listening test @128kbit/s
Post by: Serge Smirnoff on 2014-03-25 10:47:52

Quote from: C.R.Helmrich on 2014-03-25 09:57:33

Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

Not all test items were amplified, only those with unnoticeable artifacts.
If amplification was applied, at least three amplified versions of a test item were produced, in most cases with +1dB, +3dB and +5dB amplification. The amounts depend on the particular codec/item; in some cases they were as high as +4dB, +6dB and +10dB. For higher bitrates the amplification is usually higher as well.

Quote from: C.R.Helmrich on 2014-03-25 09:57:33

And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

The SE test set usually results in lower bitrates than pop music; it is closer to classical music material. Yes, some items contain silence. The SE test sequence can be downloaded from http://soundexpert.org/sound-samples (bottom of the page).

Title: SE listening test @128kbit/s
Post by: LithosZA on 2014-03-25 12:09:06

Quote

Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

Title: SE listening test @128kbit/s
Post by: Serge Smirnoff on 2014-03-25 12:26:33

Quote from: LithosZA on 2014-03-25 12:09:06

Quote
Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

No, as each test item is degraded by each codec differently, in each item/codec case the amplification is applied differently (if at all). If applied - three gradually degraded versions of an item are produced.

Title: SE listening test @128kbit/s
Post by: C.R.Helmrich on 2014-03-25 12:40:04

Quote from: Serge Smirnoff on 2014-03-25 12:26:33

... each test item is degraded by each codec differently, in each item/codec case the amplification is applied differently (if at all). If applied - three gradually degraded versions of an item are produced.

But then how can you rank the codecs for such items?

Chris

Title: SE listening test @128kbit/s
Post by: Serge Smirnoff on 2014-03-25 13:29:24

Quote from: C.R.Helmrich on 2014-03-25 12:40:04

Quote from: Serge Smirnoff on 2014-03-25 12:26:33
... each test item is degraded by each codec differently, in each item/codec case the amplification is applied differently (if at all). If applied - three gradually degraded versions of an item are produced.

But then how can you rank the codecs for such items?

A two-page document explains the whole ranking procedure: http://soundexpert.org/documents/10179/11017/se_igis.pdf
In short: three (or more) gradually degraded versions of a test item are graded by testers as usual. Each version then has two coordinates: the level of waveform degradation (Difference level, dB) and the subjective score [1-5]. These three points define a second-order curve that shows the relationship between measurable waveform degradation and perceived degradation of sound quality. The resulting score of the codec in such a case is the point on the curve corresponding to the Difference level of the item without amplification.
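
As a rough illustration of the curve fitting described in this post, here is a minimal Python sketch. It is not SoundExpert's actual code: the function name, the Difference levels, the subjective scores and the sign convention (more negative dB meaning less degradation) are all hypothetical assumptions. It fits the unique second-order curve through three graded points and reads off the score at the unamplified Difference level.

```python
def quadratic_through(points):
    """Return f(x) for the unique second-order curve through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def f(x):
        # Lagrange form of the interpolating parabola.
        l0 = (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
        l1 = (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
        l2 = (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        return y0 * l0 + y1 * l1 + y2 * l2

    return f


# Hypothetical data: Difference level of the unamplified item (assumed value)
# and three amplified versions with made-up subjective scores.
unamplified_level_db = -30.0
graded_points = [
    (unamplified_level_db + 1.0, 4.8),  # +1 dB amplification
    (unamplified_level_db + 3.0, 4.2),  # +3 dB amplification
    (unamplified_level_db + 5.0, 3.4),  # +5 dB amplification
]

curve = quadratic_through(graded_points)

# The codec's score for this item is the curve value at the unamplified level.
codec_score = curve(unamplified_level_db)
print(round(codec_score, 3))
```

With these made-up numbers the extrapolated score comes out slightly above 5, which shows how scores beyond the top of the 1-5 scale can arise from this kind of procedure.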

Title: SE listening test @128kbit/s
Post by: C.R.Helmrich on 2015-04-13 21:26:09

Looking at the current live rankings (http://soundexpert.org/encoders-128-kbps) makes me conclude that the following statistical outcome could occur.

Let us assume there are two codecs, A and B, which are tested using two test signals. Now, if

codec A has a mean score of 6 (i.e. transparent with a relatively strong margin) on both signals and
codec B has a mean score of 3 (clearly non-transparent) for the first and 11 (clearly transparent) for the second item,

then codec A averages 6 across both items, and codec B averages 7 across both items, meaning that:

codec A exhibits a lower mean score than codec B even though both signals are transparent for codec A while only one signal is transparent for codec B.

Is this correct? Looking at the current scores for the Vorbis encoders, the example does not seem far-off.

Chris
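
The two-codec example in this post can be checked with a few lines of Python. This is only a sketch: the variable and function names are made up, and the threshold of 5 for "transparent" is taken from the way scores above 5 are described in this thread.

```python
from statistics import mean

# Scores from the example: two test signals per codec.
codec_a = [6, 6]
codec_b = [3, 11]

TRANSPARENCY_THRESHOLD = 5  # scores above this are treated as transparent


def summarize(scores):
    """Return (arithmetic mean, number of transparent items)."""
    transparent_items = sum(score > TRANSPARENCY_THRESHOLD for score in scores)
    return mean(scores), transparent_items


for name, scores in (("A", codec_a), ("B", codec_b)):
    average, transparent_items = summarize(scores)
    print(f"codec {name}: mean={average}, transparent items={transparent_items}")
```

Codec B gets the higher arithmetic mean (7 against 6) although only one of its two items is transparent.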

Title: SE listening test @128kbit/s
Post by: halb27 on 2015-04-14 06:47:56

That's why looking at the bad-case scenarios is much more relevant to me than looking at the average outcome. That is true for every listening test.

Title: SE listening test @128kbit/s
Post by: IgorC on 2015-04-14 14:57:26

When consistency of quality matters, and it always does, because consistency is as important as the level of quality (average score), the geometric mean score should be more representative.

P.S. There may be other averaging functions that penalize the deviation of a particular score from the average score, and they could be worked out for particular cases.

Title: SE listening test @128kbit/s
Post by: C.R.Helmrich on 2015-04-14 19:56:23

Quote from: IgorC on 2015-04-14 14:57:26

When consistency of quality matters, and it always does, because consistency is as important as the level of quality (average score), the geometric mean score should be more representative.

Yes, I also thought about recommending the geometric mean here. It would give 5.74 instead of 7. An alternative would be to apply some kind of compressor when computing the arithmetic mean, e.g.

outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1) if itemScore > 5
and
outputScore(itemScore) = itemScore otherwise.

The above formula would give an average of 6.11 for my earlier example.

Chris
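
A small Python sketch of the two alternatives mentioned in this post: the geometric mean and the piecewise compressor formula, applied to the earlier example scores (6 and 6 for codec A, 3 and 11 for codec B). The function names are invented for illustration; only the formula itself comes from the post.

```python
import math


def geometric_mean(scores):
    """Geometric mean of positive scores."""
    return math.prod(scores) ** (1.0 / len(scores))


def compress(item_score):
    """Compressor from the post: square-root compression above a score of 5."""
    if item_score > 5:
        return 5 * math.sqrt(0.4 * item_score - 1)
    return item_score


def compressed_mean(scores):
    """Arithmetic mean of the compressed per-item scores."""
    return sum(compress(score) for score in scores) / len(scores)


codec_a = [6, 6]
codec_b = [3, 11]

for name, scores in (("A", codec_a), ("B", codec_b)):
    print(
        f"codec {name}: geometric={geometric_mean(scores):.2f}, "
        f"compressed={compressed_mean(scores):.2f}"
    )
```

For codec B this reproduces the 5.74 and 6.11 quoted above. For codec A the geometric mean stays at 6, while the compressed mean comes out at about 5.92, so with this particular compressor codec B still ends up slightly ahead.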

Title: SE listening test @128kbit/s
Post by: IgorC on 2015-04-16 01:28:39

Quote from: C.R.Helmrich on 2015-04-14 19:56:23

outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1) if itemScore > 5
and
outputScore(itemScore) = itemScore otherwise.

Oh, and that formula? It reminds me of how I formulate my own rules of thumb.

Anyway, fair enough. Though I think these differences in numbers arise because different encoders were tested at different times by different sets of people.
And now new codecs were added on top of that set of data, which was collected over several years. I don't know how to parse that.

Title: SE listening test @128kbit/s
Post by: C.R.Helmrich on 2015-04-16 07:53:08

Quote from: IgorC on 2015-04-16 01:28:39

Oh, and that formula? It reminds me of how I formulate my own rules of thumb.

Yes, it's quite rule-of-thumby. But only in the transparency range above a score of 5, where the infinite-score artifact-amplification thing itself could be considered a rule of thumb. BTW, I suggest applying the compressor only when computing the overall average score, not in the computation of the per-item mean scores.

The whole point of me bringing this up is that I fear that encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding for scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bit-rate rankings.

Chris

Title: SE listening test @128kbit/s
Post by: IgorC on 2015-04-17 14:10:41

Quote from: C.R.Helmrich on 2015-04-16 07:53:08

The whole point of me bringing this up is that I fear that encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding for scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bit-rate rankings.

Exactly. It can greatly benefit CBR or CBR-ish modes (ABR, constrained VBR) over pure quality-based VBR. If some VBR encoder scores 5.0 on two samples while CBR scores 4.0 and 6.0, then the average shows no benefit from VBR. But it's clear that VBR is considerably better in real scenarios.

Title: SE listening test @128kbit/s
Post by: halb27 on 2015-04-17 15:05:29

I'd prefer a formulation like: if encoder A yields 4.0 and 6.0 on 2 samples, while encoder B yields 5.0 for both of them, then judging from the arithmetic average both encoders have equal quality, while probably everybody would prefer encoder B.

I can't see evidence linking encoder A's behavior to CBR (or a somewhat similar method) and encoder B's behavior to VBR. In theory, with all other machinery the same, I'd expect VBR at a given average bitrate to have the larger quality variance but a higher average quality than CBR, if VBR works well. With real-world encoders, of course, things can differ in either direction.
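
To make the 4.0/6.0 versus 5.0/5.0 comparison concrete, here is a minimal Python sketch that reports the worst-case score and the spread next to the arithmetic mean. The choice of minimum and population standard deviation as summary statistics is an assumption made for illustration, not something proposed in this thread.

```python
from statistics import mean, pstdev

# Scores from the example above.
encoder_a = [4.0, 6.0]
encoder_b = [5.0, 5.0]


def summarize(scores):
    """Average, worst-case score and spread for one encoder."""
    return {
        "mean": mean(scores),
        "worst": min(scores),
        "spread": pstdev(scores),  # population standard deviation
    }


for name, scores in (("A", encoder_a), ("B", encoder_b)):
    print(name, summarize(scores))
```

Both encoders have the same mean of 5.0, but encoder A has a worst case of 4.0 and a spread of 1.0, while encoder B has a worst case of 5.0 and no spread.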

Title: SE listening test @128kbit/s
Post by: xorgy on 2015-12-23 23:16:25

Quote from: Serge Smirnoff on 2014-03-24 22:21:17

The following codecs were added to the 128kbit/s section:

AAC VBR@112.0 (Winamp 5.666) - VBR, AAC LC
AAC VBR@118.4 (iTunes 11.1.3) - TrueVBR, AAC LC
AAC VBR@117.5 (NeroRef 1540) - CVBR, AAC LC
Vorbis VBR@119.4 (Xiph 1.3.3)
Opus VBR@115.7 (libopus 1.1)
mp3 VBR@113.7 (Lame 3.99.5) - MPEG-1 Layer 3, VBR
AAC VBR@110.9 (libfdk 3.4.12) - MPEG-4 AAC LC, VBR
mpc VBR@123.3 (SV8)

All encoders have integer/discrete quality settings - http://soundexpert.org/news/-/blogs/opus-aac-vorbis-mp3-mpc-at-128-kbit-s

Why did you choose to upsample the Opus input, but none of the others? It seems like that would degrade the quality for a given target bitrate.

(Wrt "- 44.1/16 -> 48/24 by Audition CS6")

Title: SE listening test @128kbit/s
Post by: Rotareneg on 2015-12-24 02:32:27

Because Opus doesn't support a 44.1 kHz sampling rate.

Title: SE listening test @128kbit/s
Post by: jmvalin on 2015-12-28 21:52:50

Quote from: Serge Smirnoff on 2013-11-28 11:28:26

despite questionable artifact amplification technique

I haven't been able to find much information about the artifact amplification technique, but my concern is that it effectively penalizes more advanced codecs more heavily than "dumber" ones. For pure waveform codecs, it's probably not so bad, but as soon as you introduce perceptual tricks, the amplification is likely to be very wrong. I doubt you can properly "amplify" the artefacts of something as simple as MP3 intensity stereo. And it's going to be worse with more advanced codecs, HE-AAC's SBR, HE-AACv2's parametric stereo, Opus' folding, pseudo-intensity stereo, anti-collapse, ... all are going to end up weird when "amplified". In the end, when you compare two codecs that are normally (without amplification) near-transparent, the amount of damage the amplification does to each codec is going to have more impact on the results than how close to transparency each codec actually is.

Title: Re: SE listening test @128kbit/s
Post by: John Silver on 2016-03-17 10:39:54

Please add AAC 128 kbps ABR (new iTunes only, preset: High Quality)

Title: Re: SE listening test @128kbit/s
Post by: jarsonic on 2016-03-17 14:15:09

Zombie thread!

Hydrogenaudio Forum => Listening Tests => Topic started by: Serge Smirnoff on 2013-11-28 11:28:26