
Topic: Analyzing the Results of Listening Tests

  • ScorLibran
  • Banned
Analyzing the Results of Listening Tests
A side-topic began in this thread about how to interpret the results of Roberto's 128kbps listening test.  It spurred some questions in my mind concerning where the "threshold of significance" exists for certain aspects of the test results.  This is addressed in some ways in the "Test Finished" thread, but there are still some things I'm wondering about concerning how the results can be interpreted.

The top four rated codecs, AAC, WMA Pro, Vorbis and MPC were statistically tied, yet the overall average bitrates for each codec are still disparate.  My contention is that the average bitrates are fixed values, and hence usable in further analysis of the test results.

Hence, is it feasible to measure the test results as "rating / average bitrate"?  I did a quick calculation of this based on the published test results data, multiplying each result by 128 (the bitrate target in kbps) to bring the numbers back to a readable scale, and came up with the following "composite ratings": the test's quality rating divided by the average bitrate for each codec (results are rounded):

AAC = (4.42/129)*128 = 4.39
WMA Pro = (4.30/128)*128 = 4.30
MPC = (4.51/146.1)*128 = 3.95
Vorbis = (4.28/140.1)*128 = 3.91

...so does this interpretation have any significance?  Or am I on the wrong track?   
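
For anyone who wants to redo the arithmetic, here is a minimal Python sketch of the calculation. The ratings and average bitrates are the ones quoted above; the names `results`, `composite` and `TARGET_KBPS` are only illustrative. It reproduces the numbers and says nothing about whether the differences are statistically meaningful.

```python
# Ratings and average bitrates (kbps) as quoted in this post.
results = {
    "AAC": (4.42, 129.0),
    "WMA Pro": (4.30, 128.0),
    "MPC": (4.51, 146.1),
    "Vorbis": (4.28, 140.1),
}

TARGET_KBPS = 128.0  # bitrate target of the test


def composite(rating, avg_kbps, target_kbps=TARGET_KBPS):
    """Rating per kbps, rescaled to the target bitrate (assumes linearity)."""
    return rating / avg_kbps * target_kbps


scores = {name: round(composite(r, b), 2) for name, (r, b) in results.items()}

# Print the codecs from highest to lowest composite rating.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```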

I hear many people saying that MPC was "better" than Vorbis in the test, but looking only at sound quality, since they were tied within ANOVA's margin of confidence this would not be the case.  However, it may be possible to qualify that statement by defining a "composite rating", if the approach has any significance, that is.

A decision of either "Yes, Chris, this is usable", or "No, the resulting variance between these four codecs at 128kbps is still insignificant, even considering the average bitrate differences" may settle a lot of disputes on this subject.

Edit: Changed multiplier from "100" to "128", as noted above.
  • Last Edit: 12 November, 2003, 05:41:43 AM by ScorLibran

  • Todesengel
Reply #1
I think you're right. We should compare all codecs at the same bitrate (128 kbps)... MPC and OGG will lose some points...

  • guruboolez
  • Members (Donating)
Reply #2
ScorLibran & Todesengel> This has been discussed many times. If mpc --radio gives an average bitrate close to 128 kbps on the 12 full albums, why would you lower the score on short samples extracted from those 12 discs? It's nonsense. It's just like encoding a DVD to a 700 MB DivX, cutting out a short, difficult part of the video stream, and then claiming that we should compensate for the localized bitrate inflation in order to be fair to CBR encoders.
Why do people prefer VBR over CBR? Simply because it gives better quality. And why is the quality better? Mainly because a VBR encoder can allocate more bitrate to difficult parts.

Don't forget that Roberto's test used difficult samples. Not killer ones, but difficult. Therefore, the average bitrate for a VBR encoder is logically higher than the 128 kbps targeted. Run another test with easy samples: CBR will stay at 128 kbps, and VBR will drop to the 100 kbps floor.

Short samples are not a good basis for bitrate speculation. Full albums, yes, but certainly not 20 selected seconds.
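
To make the point concrete, here is a small Python sketch. The per-segment bitrates are invented for illustration (they are not measurements from any encoder); the only assumption is that a VBR encoder spends more bits on a few difficult segments while the whole album still averages out at the target.

```python
# Hypothetical per-segment VBR bitrates (kbps) for one "album".
# These numbers are made up for illustration, not measured.
segments = [110, 115, 120, 118, 165, 170, 112, 114]

# Average over the whole album: this is what the quality setting targets.
album_avg = sum(segments) / len(segments)

# A "difficult" excerpt only contains the segments where the encoder
# spent extra bits (arbitrary threshold of 150 kbps for this sketch).
difficult = [kbps for kbps in segments if kbps > 150]
excerpt_avg = sum(difficult) / len(difficult)

print(f"album average:   {album_avg:.1f} kbps")
print(f"excerpt average: {excerpt_avg:.1f} kbps")
```

Dividing a rating by the excerpt average would penalize the encoder for spending bits exactly where they are needed, even though the album as a whole sits at the target bitrate.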
  • Last Edit: 12 November, 2003, 06:21:45 AM by guruboolez

  • tigre
Reply #3
Problem(s) with this interpretation:

The quality settings were chosen to give similar bitrates (~128kbps) when encoding a wide variety of music. The samples used for the test, on the other hand, were chosen to be difficult to encode; otherwise it would have been hard to get meaningful results at all.

Another point is that your calculation assumes a linear relationship between bitrate and quality, which most likely isn't the case.
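
A rough Python sketch of why the linearity assumption matters. It compares the linear rescaling with one arbitrary saturating curve, q(b) = 5 * (1 - exp(-b / tau)), fitted through MPC's quoted rating and average bitrate. That curve is purely hypothetical and is not a model of any real codec; it only shows that the "corrected" score depends heavily on the curve shape you assume.

```python
import math


def linear_at_target(rating, avg_kbps, target_kbps=128.0):
    """Rescale assuming quality is proportional to bitrate."""
    return rating / avg_kbps * target_kbps


def saturating_at_target(rating, avg_kbps, target_kbps=128.0, max_rating=5.0):
    """Rescale assuming a hypothetical saturating quality curve.

    The curve q(b) = max_rating * (1 - exp(-b / tau)) is an arbitrary
    choice for illustration; tau is picked so that the curve passes
    through the observed (avg_kbps, rating) point.
    """
    tau = -avg_kbps / math.log(1.0 - rating / max_rating)
    return max_rating * (1.0 - math.exp(-target_kbps / tau))


# MPC's rating and average bitrate as quoted earlier in the thread.
mpc_linear = linear_at_target(4.51, 146.1)
mpc_saturating = saturating_at_target(4.51, 146.1)

print(f"linear assumption:     {mpc_linear:.2f}")
print(f"saturating assumption: {mpc_saturating:.2f}")
```

Under the saturating curve the estimated drop at 128 kbps is much smaller than under the linear one, so the composite rating is only as good as the assumed bitrate/quality relationship.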

  • Todesengel
Reply #4
I have encoded about 100 albums in MPC, and I tell you, --radio gives a bitrate of around 140kbps!!!

  • ScorLibran
  • Banned
Reply #5
Quote
This has been discussed many times.

I know, but not in the specific context of using the results to define a composite rating system (that I found...if I missed something, I apologize).

Quote
It's nonsense. It's just like encoding a DVD to a 700 MB DivX, cutting out a short, difficult part of the video stream, and then claiming that we should compensate for the localized bitrate inflation in order to be fair to CBR encoders.
Why do people prefer VBR over CBR? Simply because it gives better quality. And why is the quality better? Mainly because a VBR encoder can allocate more bitrate to difficult parts.

I agree, so there would have to be some defined range of comparison.  It would obviously be nonsensical to compare composite quality/bitrate values at 320kbps vs. 96kbps, for instance, as the numbers would not reflect an accurate relationship.  I was just wondering if there was a way to show an accurate relationship between encoding rates within a narrower range.

Quote
Don't forget that Roberto's test used difficult samples. Not killer ones, but difficult. Therefore, the average bitrate for a VBR encoder is logically higher than the 128 kbps targeted. Run another test with easy samples: CBR will stay at 128 kbps, and VBR will drop to the 100 kbps floor.

I understand, and there may be no way around this other than not testing VBR alongside CBR.  As discussed in the test preparation thread, that would not be feasible, because these are the bitrates and settings that people actually use with these different codecs, and the comparison is quite fair in principle.

Quote
Problem(s) with this interpretation:

The quality settings were chosen to give similar bitrates (~128kbps) when encoding a wide variety of music. The samples used for the test, on the other hand, were chosen to be difficult to encode; otherwise it would have been hard to get meaningful results at all.

Another point is that your calculation assumes a linear relationship between bitrate and quality, which most likely isn't the case.

I've thought about the first point, but the only solution would be to test only with the same kinds of samples as the ones used in the pre-test bitrate analysis thread, right?  This becomes an issue when "regular" music samples are used to measure average bitrates and determine encoder settings, and then "problem" samples are used in the live test.  This too has been addressed, but I can't remember a specific resolution for it.  (Eyes are tired from reading MANY thread pages on this subject this morning.)

On your second point...could an alternative method be used in the place of a linear calculation?  I'm sure there is a lot more to that relationship than the little I understand of it, but it would seem that there could be some valid way of calculating a composite value if it's a feasible method at all.

  • Ivan Dimkovic
  • Developer
Reply #6
Quote
On your second point...could an alternative method be used in the place of a linear calculation? I'm sure there is a lot more to that relationship than the little I understand of it, but it would seem that there could be some valid way of calculating a composite value if it's a feasible method at all.


No, that's not possible without knowing each codec's coding parameters and algorithms, and the weak spots of the particular implementation.

The problem is that most codecs have some kind of "operating range" in which quality scales nicely; below that range, quality starts to drop significantly.

For example, for MP3 the "operating range" is around 96-160 kbps, below which quality starts to deteriorate progressively.  The same goes for AAC, but its range is around 80-128 kbps.

We could define some kind of "coding power": the number of bits each codec requires to code a signal with the same perceptual entropy (PE). The problem is that the psymodels in codecs differ, i.e., the PE is not the same for a given signal, so we don't have a common value on which to base predictions.
  • Last Edit: 12 November, 2003, 07:14:11 AM by Ivan Dimkovic