
Topic: SE listening test @128kbit/s

SE listening test @128kbit/s
If anybody is interested in the results of the forthcoming SE listening test @128kbit/s, despite the questionable artifact amplification technique that will be used in it, please propose your codec candidates.

The results of the test will be presented in the same detailed form as in the previous @64 and @96 tests.
keeping audio clear together - soundexpert.org

SE listening test @128kbit/s
Reply #1
The following codecs were added to the 128kbit/s section:

AAC VBR@112.0 (Winamp 5.666) - VBR, AAC LC
AAC VBR@118.4 (iTunes 11.1.3) - TrueVBR, AAC LC
AAC VBR@117.5 (NeroRef 1540) - CVBR, AAC LC
Vorbis VBR@119.4 (Xiph 1.3.3)
Opus VBR@115.7 (libopus 1.1)
mp3 VBR@113.7 (Lame 3.99.5) - MPEG-1 Layer 3, VBR
AAC VBR@110.9 (libfdk 3.4.12) - MPEG-4 AAC LC, VBR
mpc VBR@123.3 (SV8)

All encoders have integer/discrete quality settings - http://soundexpert.org/news/-/blogs/opus-a...c-at-128-kbit-s
keeping audio clear together - soundexpert.org

  • C.R.Helmrich
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #2
Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

Chris
  • Last Edit: 25 March, 2014, 05:59:23 AM by C.R.Helmrich
If I don't reply to your reply, it means I agree with you.

SE listening test @128kbit/s
Reply #3
Sorry if this has been answered before, but: how much error amplification do you apply at 128 kbps stereo? 1 dB, or more?

Not all test items were amplified, only those with unnoticeable artifacts.
If amplification was applied, at least three amplified versions of a test item were produced, in most cases with +1dB, +3dB and +5dB amplification. The exact values depend on the particular codec/item; in some cases they were even +4dB, +6dB and +10dB. For higher bitrates the amplification is usually higher as well.
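
As a rough illustration of what producing the "+1dB, +3dB, +5dB" versions could look like, here is a minimal Python sketch. It assumes that the artifact signal is the sample-wise difference between the time-aligned codec output and the reference, and that this difference is scaled and added back to the reference. The function name `amplify_artifacts` and the toy sample values are hypothetical and are not taken from SoundExpert's actual tooling.

```python
def amplify_artifacts(reference, decoded, gain_db):
    """Return the reference plus the codec error scaled by gain_db decibels.

    Assumption: both signals are time-aligned, equal-length sequences of floats.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    gain = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
    return [r + gain * (d - r) for r, d in zip(reference, decoded)]


# Toy data (hypothetical): a short reference and a slightly distorted decode.
reference = [0.0, 0.5, 1.0, 0.5, 0.0]
decoded = [0.0, 0.49, 1.02, 0.5, -0.01]

# Three gradually degraded versions of the same item/codec pair.
versions = {db: amplify_artifacts(reference, decoded, db) for db in (1, 3, 5)}
```

With a gain of 0 dB the function simply returns the decoded signal, so the unamplified item is the starting point of the same family of versions.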

And I'm surprised Fraunhofer's AAC encoder averages only 112 kbps on this item set. Do some samples include silence?

The SE test set usually results in lower bitrates than pop music; it is closer to classical music material. Yes, some items contain silence. The SE test sequence can be downloaded from http://soundexpert.org/sound-samples (bottom of the page).
keeping audio clear together - soundexpert.org

  • LithosZA
  • [*][*][*]
SE listening test @128kbit/s
Reply #4
Quote
Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

SE listening test @128kbit/s
Reply #5
Quote
Not all test items were amplified, only those with unnoticeable artifacts.

I assume the same amplification would be applied to all codecs for that item?

No. Each test item is degraded differently by each codec, so the amplification is applied differently (if at all) in each item/codec case. If it is applied, three gradually degraded versions of the item are produced.
keeping audio clear together - soundexpert.org

  • C.R.Helmrich
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #6
... each test item is degraded by each codec differently, in each item/codec case the amplification is applied differently (if at all). If applied - three gradually degraded versions of an item are produced.

But then how can you rank the codecs for such items?

Chris
If I don't reply to your reply, it means I agree with you.

SE listening test @128kbit/s
Reply #7
... each test item is degraded by each codec differently, in each item/codec case the amplification is applied differently (if at all). If applied - three gradually degraded versions of an item are produced.

But then how can you rank the codecs for such items?

A two-page document explains the whole ranking procedure - http://soundexpert.org/documents/10179/11017/se_igis.pdf
In short: three (or more) gradually degraded versions of a test item are graded by testers as usual. Each version then has two coordinates: the level of waveform degradation (Difference level, dB) and the subjective score [1-5]. These three points define a second-order curve that shows the relationship between measurable waveform degradation and perceived degradation of sound quality. The resulting score of the codec in such a case is the point on the curve corresponding to the Difference level of the item without amplification.
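
The curve step can be sketched in a few lines of Python. This is only an illustration for the case of exactly three graded versions: it uses Lagrange interpolation for the second-order curve, and the Difference levels and scores below are made-up numbers, not data from the test. The helper name `quadratic_through` is hypothetical.

```python
def quadratic_through(points):
    """Return f(x) for the unique second-order curve through three (x, y)
    points (Lagrange interpolation); the x values must be distinct."""
    (x0, y0), (x1, y1), (x2, y2) = points

    def f(x):
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return f


# Hypothetical data: (Difference level in dB, mean subjective score) for the
# +1 dB, +3 dB and +5 dB versions of one item/codec pair.
graded = [(-39.0, 4.9), (-37.0, 4.5), (-35.0, 3.9)]
unamplified_level = -40.0  # Difference level of the item without amplification

curve = quadratic_through(graded)
score = curve(unamplified_level)  # extrapolated score; can land above 5
print(round(score, 3))
```

Because the unamplified item lies outside the range of the graded versions, the resulting score is an extrapolation, which is how values above 5 can appear in the rankings.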
keeping audio clear together - soundexpert.org

  • C.R.Helmrich
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #8
Looking at the current live rankings makes me conclude that the following statistical outcome could occur.

Let us assume there are two codecs, A and B, which are tested using two test signals. Now, if

  • codec A has a mean score of 6 (i.e. transparent with a relatively strong margin) on both signals and
  • codec B has a mean score of 3 (clearly non-transparent) for the first and 11 (clearly transparent) for the second item,

then codec A averages 6 on both items, and codec B averages 7 on both items, meaning that:

codec A exhibits a lower mean score than codec B even though both signals are transparent for codec A while only one signal is transparent for codec B.

Is this correct? Looking at the current scores for the Vorbis encoders, the example does not seem far-off.

Chris
If I don't reply to your reply, it means I agree with you.

  • halb27
  • [*][*][*][*][*]
SE listening test @128kbit/s
Reply #9
That's why looking at the bad case scenarios is much more relevant to me than looking at the average outcome. True for every listening test.
lame3995o -Q1

  • IgorC
  • [*][*][*][*][*]
SE listening test @128kbit/s
Reply #10
When consistency of quality matters, and it always does, since consistency is as important as the level of quality (average score), the geometric mean score is more representative.


P.S. There may be other averaging functions that penalize the deviation of a particular score from the average score. They could be worked out for particular cases.
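
One possible shape for such a deviation-penalizing average, purely as an illustration: subtract a multiple of the standard deviation from the arithmetic mean. Both the formula and the weight `k` are assumptions made for this sketch; nobody in the thread proposed this exact function. The scores are the two-codec example from Reply #8.

```python
from statistics import mean, pstdev


def penalized_mean(scores, k=1.0):
    """Arithmetic mean minus k population standard deviations.

    Both the formula and the weight k are illustrative assumptions.
    """
    return mean(scores) - k * pstdev(scores)


codec_a = [6, 6]   # example scores from Reply #8
codec_b = [3, 11]

print(penalized_mean(codec_a))  # consistent codec keeps its mean
print(penalized_mean(codec_b))  # inconsistent codec is pulled down
```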
  • Last Edit: 14 April, 2015, 10:05:09 AM by IgorC

  • C.R.Helmrich
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #11
When consistency of quality matters, and it always does, since consistency is as important as the level of quality (average score), the geometric mean score is more representative.

Yes, I also thought about recommending the geometric mean here. It would give 5.74 instead of 7. An alternative would be to apply some kind of compressor when computing the arithmetic mean, e.g.

outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1)  if itemScore > 5
and
outputScore(itemScore) = itemScore  otherwise.

The above formula would give an average of 6.11 for my earlier example.
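
For anyone who wants to check these numbers, here is a short Python sketch that computes the arithmetic mean, the geometric mean and the compressed average for the two-codec example. The function name `compress` is just a label chosen here for the formula above.

```python
from math import sqrt
from statistics import geometric_mean, mean


def compress(item_score):
    """Compressor from the post: identity up to 5, square-root growth above."""
    if item_score > 5:
        return 5 * sqrt(0.4 * item_score - 1)
    return item_score


codec_a = [6, 6]
codec_b = [3, 11]

for name, scores in (("A", codec_a), ("B", codec_b)):
    print(
        name,
        round(mean(scores), 2),
        round(geometric_mean(scores), 2),
        round(mean([compress(s) for s in scores]), 2),
    )
```

For codec B this reproduces the 5.74 and 6.11 figures. For codec A the compressed average comes out at about 5.92, so in this example the compressor narrows the gap between the codecs but, unlike the geometric mean, does not reverse their order.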

Chris
If I don't reply to your reply, it means I agree with you.

  • IgorC
  • [*][*][*][*][*]
SE listening test @128kbit/s
Reply #12
outputScore(itemScore) = 5 * sqrt(0.4*itemScore - 1)  if itemScore > 5
and
outputScore(itemScore) = itemScore  otherwise.

Oh, and that formula? It reminds me of how I formulate my own rules of thumb.

Anyway, fair enough. Though I think these differences in numbers arise because different encoders were tested at different times by different sets of people.
And now new codecs have been added on top of that data set, which was collected over several years. I don't know how to parse that.
  • Last Edit: 15 April, 2015, 08:30:49 PM by IgorC

  • C.R.Helmrich
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #13
Oh, and that formula? It reminds me of how I formulate my own rules of thumb.

Yes, it's quite rule-of-thumby. But only in the transparency range above a score of 5, where the infinite-score artifact-amplification thing itself could be considered a rule of thumb. BTW, I suggest applying the compressor only when computing the overall average score, not when computing the per-item mean scores.

The whole point of me bringing this up is that I fear that encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding for scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bit-rate rankings.

Chris
  • Last Edit: 16 April, 2015, 02:56:51 AM by C.R.Helmrich
If I don't reply to your reply, it means I agree with you.

  • IgorC
  • [*][*][*][*][*]
SE listening test @128kbit/s
Reply #14
The whole point of me bringing this up is that I fear that encoders aiming for CONSTANT quality (i.e. small quality variance and NO overcoding for scores > 5, i.e. following the VBR principle) are being punished in this evaluation. This isn't so much the case in the lower-bit-rate rankings.

Exactly. It can greatly benefit CBR or CBR-ish modes (ABR, constrained VBR) over pure quality-based VBR. If some VBR encoder scores 5.0 on two samples while a CBR encoder scores 4.0 and 6.0, the average shows no benefit from VBR. But it's clear that VBR is considerably better in real scenarios.
  • Last Edit: 17 April, 2015, 09:11:16 AM by IgorC

  • halb27
  • [*][*][*][*][*]
SE listening test @128kbit/s
Reply #15
I'd prefer a formulation like: if encoder A yields 4.0 and 6.0 on 2 samples, while encoder B yields 5.0 for both of them, then judging from arithmetic average both encoders have equal quality, while probably everybody would prefer encoder B.

I can't see evidence linking encoder A's behavior to CBR (or a method similar to it to some extent) and encoder B's behavior to VBR. In theory, with all other machinery the same, I'd expect VBR at a given average bitrate to have the larger quality variance, but a higher average quality than CBR if VBR works well. With real-world encoders, of course, things can differ in any direction.
  • Last Edit: 17 April, 2015, 10:13:09 AM by halb27
lame3995o -Q1

  • xorgy
  • [*]
SE listening test @128kbit/s
Reply #16
The following codecs were added to the 128kbit/s section:

AAC VBR@112.0 (Winamp 5.666) - VBR, AAC LC
AAC VBR@118.4 (iTunes 11.1.3) - TrueVBR, AAC LC
AAC VBR@117.5 (NeroRef 1540) - CVBR, AAC LC
Vorbis VBR@119.4 (Xiph 1.3.3)
Opus VBR@115.7 (libopus 1.1)
mp3 VBR@113.7 (Lame 3.99.5) - MPEG-1 Layer 3, VBR
AAC VBR@110.9 (libfdk 3.4.12) - MPEG-4 AAC LC, VBR
mpc VBR@123.3 (SV8)

All encoders have integer/discrete quality settings - http://soundexpert.org/news/-/blogs/opus-a...c-at-128-kbit-s


Why did you choose to upsample the Opus input, but none of the others? It seems like that would degrade the quality for a given target bitrate.

(Wrt "- 44.1/16 -> 48/24 by Audition CS6")
  • Last Edit: 23 December, 2015, 06:26:28 PM by xorgy

  • Rotareneg
  • [*][*][*]
SE listening test @128kbit/s
Reply #17
Because Opus doesn't support a 44.1 kHz sampling rate.

  • jmvalin
  • [*][*][*][*][*]
  • Developer
SE listening test @128kbit/s
Reply #18
despite questionable artifact amplification technique

I haven't been able to find much information about the artifact amplification technique, but my concern is that it effectively penalizes more advanced codecs more heavily than "dumber" ones. For pure waveform codecs, it's probably not so bad, but as soon as you introduce perceptual tricks, the amplification is likely to be very wrong. I doubt you can properly "amplify" the artefacts of something as simple as MP3 intensity stereo. And it's going to be worse with more advanced codecs, HE-AAC's SBR, HE-AACv2's parametric stereo, Opus' folding, pseudo-intensity stereo, anti-collapse, ... all are going to end up weird when "amplified". In the end, when you compare two codecs that are normally (without amplification) near-transparent, the amount of damage the amplification does to each codec is going to have more impact on the results than how close to transparency each codec actually is.

Re: SE listening test @128kbit/s
Reply #19
Please add AAC 128 kbps ABR (new iTunes only, preset: High Quality)

  • jarsonic
  • [*][*][*]
Re: SE listening test @128kbit/s
Reply #20
Zombie thread!