
Topic: Public MP3 Listening Test @ 128 kbps - FINISHED (Read 146735 times)

  • singaiya
  • [*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #175


You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought happens too, but it seems not to have had an effect: If you look at the first sample which had 39 listeners, the bars are about as long as the second sample which had 26 listeners, and definitely longer than the third sample which also had 26 listeners.


It does have an effect. I never said it is the only thing that influences the error margins.


What are the other factors?
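
As a rough illustration of the question above: under a simple normal-approximation model (an assumption; the test's own analysis may well use a different method), the half-width of a confidence interval around a mean rating depends on both the number of listeners and how much their ratings are spread out. The function name and the ratings below are made up for illustration and are not data from the test.

```python
from math import sqrt
from statistics import NormalDist, stdev


def ci_half_width(ratings, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(ratings) / sqrt(len(ratings))


# Hypothetical ratings on the 1-5 ABC/HR scale, NOT data from the test.
agreeing = [3.0, 3.2, 3.1, 2.9, 3.0, 3.1] * 4      # 24 listeners, similar ratings
disagreeing = [1.5, 4.8, 2.0, 4.5, 3.0, 2.2] * 6   # 36 listeners, scattered ratings

print(ci_half_width(agreeing))
print(ci_half_width(disagreeing))
```

In this sketch the sample with more listeners still gets the longer bar, because its ratings disagree much more. More listeners shrink the bars, but only relative to the spread of the ratings.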


I mean all this time members of this forum have strongly recommended against using the mp3 encoder in itunes,

Probably based on the last mp3 listening test where it SUCKED.


That still seems an overly strong interpretation to me. iTunes did lose to Lame 3.95 and AudioActive, but in the results, Roberto says at the beginning: "I would like to point out two very serious issues with this test: not using the latest version of Xing, bundled with Real Player, that has been reportedly extensively tuned since version 1.5; and forcing VBR on codecs that shouldn't be using them. I'm confident iTunes MP3 would perform better if it was featured at CBR 128, and the same might apply to FhG. I take full responsability on those mistakes, and for them, I apologize."

To me, "sucks" means not only losing the test (without caveats of choosing the wrong setting) but also a score below 3.

  • benski
  • [*][*][*][*][*]
  • Developer
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #176
That still seems an overly strong interpretation to me. iTunes did lose to Lame 3.95 and AudioActive, but in the results, Roberto says at the beginning: "I would like to point out two very serious issues with this test: not using the latest version of Xing, bundled with Real Player, that has been reportedly extensively tuned since version 1.5; and forcing VBR on codecs that shouldn't be using them. I'm confident iTunes MP3 would perform better if it was featured at CBR 128, and the same might apply to FhG. I take full responsability on those mistakes, and for them, I apologize."

To me, "sucks" means not only losing the test (without caveats of choosing the wrong setting) but also a score below 3.


Helix (somewhat-formerly Xing), iTunes and FhG were all tested with VBR this go-around.  This makes the comparison to that previous test all the more relevant.

  • singaiya
  • [*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #177
Good point. I still think "sucks" lines up more with a score of 1, otherwise the wording on ABC/HR rankings should be revised 

  • Raiden
  • [*][*][*]
  • Developer
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #178
...the results, ...

The interpretation of that result surprises me. Roberto said "Although iTunes is a little tied with Gogo, it's safe to say it lost." But actually iTunes was tied with FhG, Gogo and Xing, so it wasn't safe to say "it lost".
Also Lame was tied with AActive so "Lame wins, followed by AudioActive" is also not valid.
Or am I misinterpreting the graph?
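
The reading of the graph described above can be sketched as a visual rule of thumb: two encoders count as tied when their error bars overlap. This assumes each bar is a mean plus or minus one common half-width, and it is not a formal significance test. The encoder names, scores, and half-width below are hypothetical and are not taken from the test results.

```python
def intervals_overlap(mean_a, mean_b, half_width):
    """True when [mean - hw, mean + hw] for A and B share at least one point."""
    return abs(mean_a - mean_b) <= 2 * half_width


def tied_pairs(scores, half_width):
    """Return all pairs of encoder names whose intervals overlap."""
    names = sorted(scores)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if intervals_overlap(scores[a], scores[b], half_width)
    ]


# Hypothetical scores, for illustration only.
scores = {"A": 3.7, "B": 3.5, "C": 3.0, "D": 2.9}
print(tied_pairs(scores, half_width=0.2))
```

Under this rule, a ranking such as "A wins, followed by B" is only justified when A and B do not appear together in the list of tied pairs.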

  • greynol
  • [*][*][*][*][*]
  • Global Moderator
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #179
I think you're right, Raiden.

The iTunes mp3 bashing based on the results of that test has always annoyed me.
  • Last Edit: 28 November, 2008, 04:26:15 PM by greynol
13 February 2016: The world was blessed with the passing of a truly vile and wretched person.

Your eyes cannot hear.

  • kwanbis
  • [*][*][*][*][*]
  • Developer (Donating)
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #180

It was at 3.04, against LAME's 3.74. LAME won that one, and iTunes did not offer anything better, so it sucked.

  • greynol
  • [*][*][*][*][*]
  • Global Moderator
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #181
Not you too, kwanbis!

The difference between Lame and iTunes in that test could be smaller than 0.3; less than half the amount you're suggesting.  Not to mention the bitrate on samples created with iTunes was consistently and significantly less than Lame.

On the same token, the difference could be greater than 1.1.  Even still, the conclusions people make from that test are annoying. 
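
A small sketch of the arithmetic behind those two figures, assuming they come from stacking the error bars of the two encoders against each other. The means 3.74 and 3.04 are the ones quoted in this thread. The half-width of 0.2 per encoder is an assumption, picked only because it reproduces the 0.3 and 1.1 mentioned above, and it was not read from the actual graph.

```python
def difference_range(mean_a, mean_b, half_width_a, half_width_b):
    """Smallest and largest difference (a - b) compatible with both error bars."""
    centre = mean_a - mean_b
    spread = half_width_a + half_width_b
    return centre - spread, centre + spread


# Means quoted in the thread; the 0.2 half-widths are an assumption.
low, high = difference_range(3.74, 3.04, 0.2, 0.2)
print(round(low, 2), round(high, 2))
```

The point estimate of the difference is 0.70, but under these assumptions anything from the low to the high value is compatible with the bars.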
  • Last Edit: 28 November, 2008, 07:07:45 PM by greynol
13 February 2016: The world was blessed with the passing of a truly vile and wretched person.

Your eyes cannot hear.

Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #182
that's why i'm thinking something like http://www.hydrogenaudio.org/forums/index....mp;#entry601863 could be beneficial. i think it could be useful if there was some kind of an instruction like "how to do the test properly".
i know this has already been partially discussed, but i felt like it was a good moment to mention it.
also, everyone participating in such a test should look up information on how to interpret the results (maybe provide a faq or something like that?)

edit: trying to correct errors
  • Last Edit: 28 November, 2008, 10:38:18 PM by null-null-pi
10 FOR I=1 TO 3:PRINT"DAMN":NEXT

  • kwanbis
  • [*][*][*][*][*]
  • Developer (Donating)
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #183
Not you too, kwanbis!

The difference between Lame and iTunes in that test could be smaller than 0.3; less than half the amount you're suggesting.  Not to mention the bitrate on samples created with iTunes was consistently and significantly less than Lame.

On the same token, the difference could be greater than 1.1.  Even still, the conclusions people make from that test are annoying.

The point is that at that time even LAME sucked, but it sucked the least. So with all encoders being "ho-hum", what was the point of recommending the worst one? LAME statistically won, whether by 0.3 or 1.1. So, what was the point of recommending iTunes?

  • krabapple
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #184
I think what this thread is telling us is that HA needs a wiki page on statistics... and it should be required reading before posting about public listening test results, just as we require ABX results for audio claims.

(No, I won't be writing it; I'm not a stats expert. I have a few biostatistics books on my reference shelf, and I know just enough to be suspicious of claims being made here about some codecs being better than others, based on these results.  In other words, I'm with greynol: there is no statistically significant  difference I can see here, for general guidance).

Btw, who here does have a solid background in statistical analysis?  Just curious.

  • halb27
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #185
A deeper knowledge of statistics helps in recognizing that the confidence intervals must be respected when judging the overall average outcome.
The deeper problem is: what is this overall average outcome worth? We all want to have an easy life, but IMO many people consider the average outcome so important only because it's such a simple scheme.
I personally prefer an outcome that shows that a certain candidate is good, or at least not bad, on all of the samples tested (or at least on those samples that have a personal meaning). I'm well aware that this (or similar schemes) brings some amount of subjectivity (but does not at all make things arbitrary) and brings no general consensus, but in the end deciding on an encoder to use is an individual decision.
  • Last Edit: 29 November, 2008, 05:04:42 PM by halb27
lame3995o -Q1

  • halb27
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #186
I wanted to give FhG a try because of this test's outcome, and I wanted to concentrate on the most serious tonal problems I know (herding_calls, trumpet, trumpet_myPrince). I wanted these to come out with no obvious issues, that is, not at all annoying even when listening carefully. And I wanted to stay at around 200 kbps maximum (I will have to care about file size in the near future).

What I learnt from the test is that I don't have to care about the extreme HF region and can be content with HF up to 16 kHz which is very favorable when using mp3.

As a reference I used Lame 3.98.2. I figured out that -V0.5 --lowpass 16.0 is the setting which brings the desired quality for me. With my test set of typical regular music the average bitrate for this setting is 205 kbps.

Then I tried FhG surround, but didn't manage to find a competitive setting (even with lowpassing before encoding with one of the higher quality settings). So FhG isn't an alternative to me.

Though I didn't want to pick up again Helix as I decided some time ago not to use it I was curious how Helix behaved. At relatively low bitrate Helix has serious problems with all the samples (especially the 'tremolo' effect with trumpet_myPrince). But from a certain point on there is a strong quality increase. With '-X2 -U2 -SBT500 -TX0 -V110' my quality demands are met. With my typical regular music test set the average bitrate for this setting is 179 kbps. As this is a good result I looked up those samples where in the past I found a subtle issue concerning HF behavior and 'vividness'. I couldn't hear an issue, probably due to my reduced demands (and limited time I wanted to spend for this small test).

Struggling for lower bitrate with Lame I arrived at --abr 200 --lowpass 16.0 which gives an average bitrate of 191 kbps with my typical regular test set and meets my quality demands for these bad tonal problems.
But in this bitrate range and for general usage I prefer VBR, now that I'm happy with Lame's VBR behavior.

I will use Lame -V0.5 --lowpass 16.0 in the future. I have no reason to prefer it over Helix. It's just personal - my emotions are more with Lame than with Helix.
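
To put the three average bitrates from this post side by side, here is a small sketch that converts them into approximate file sizes. The bitrates are the ones reported above. The one-hour duration and the helper name are assumptions for illustration, the dictionary keys are just labels rather than complete command lines, and tags and container overhead are ignored.

```python
# Average bitrates (kbps) reported in the post above for the same music set.
settings = {
    "Lame -V0.5 --lowpass 16.0": 205,
    "Lame --abr 200 --lowpass 16.0": 191,
    "Helix -X2 -U2 -SBT500 -TX0 -V110": 179,
}


def size_mb(kbps, seconds):
    """Approximate file size in megabytes (10^6 bytes) for an average bitrate."""
    return kbps * 1000 * seconds / 8 / 1_000_000


hour = 3600  # hypothetical: one hour of music
for name, kbps in settings.items():
    print(f"{name}: {size_mb(kbps, hour):.1f} MB per hour")
```

The gap between the largest and smallest setting is 26 kbps, so the file size difference is modest next to the quality criteria discussed in the post.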
  • Last Edit: 30 November, 2008, 05:20:39 AM by halb27
lame3995o -Q1

Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #187
Quote
'-X2 -U2 -SBT500 -TX0 -V110'

What do these switches do?

I tried to compare the command line used in the test, '-X2 -U2 -V60' with the one suggested for metal,

'-X2 -HF2 -SBT500 -TX0 -C0 -V60'.

I found the latter to be harder to ABX even though I could ABX both (I only tried it on two tracks). Halb27, have you tried that command line?
//From the barren lands of the Northsmen

  • halb27
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #188
non-audio related switches:
-X2: MPEG compatible Xing Header
-U2: encoding speedup (uses SSE)
-C0: clear copyright bit

audio related switches:
-Vx: VBR quality (range 0-150)
-HF2: makes use of HF > 16 kHz
-SBTy: short block threshold
-TXz: nobody seems to be able to tell about this switch.

level is the one who studied these switches most intensively at VBR settings ~ -V120. As I'm in this bitrate range I simply follow his settings. I did some tests on my own with TX0...TX8 at -V110, and though I think I can hear differences they are so subtle that I personally can't say which one is best. Same goes for -SBT500 or default setting.
I don't use -HF2 as I don't need frequencies beyond 16 kHz.
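
For anyone scripting this, a minimal sketch that assembles the switches listed above into an argument list. The executable name hmp3 and the "input output switches" argument order are assumptions, so check them against your own Helix build. The function and its parameter names are hypothetical, and the code only builds the list without running anything.

```python
def helix_command(infile, outfile, vbr_quality, sbt=None, tx=None,
                  use_hf=False, exe="hmp3"):
    """Build a Helix encoder argument list from the switches described above.

    exe and the argument order are assumptions, not confirmed by the thread.
    """
    if not 0 <= vbr_quality <= 150:
        raise ValueError("VBR quality must be in the range 0-150")
    args = [exe, infile, outfile, "-X2", "-U2"]  # Xing header, SSE speedup
    if use_hf:
        args.append("-HF2")          # make use of HF > 16 kHz
    if sbt is not None:
        args.append(f"-SBT{sbt}")    # short block threshold
    if tx is not None:
        args.append(f"-TX{tx}")      # undocumented switch, see above
    args.append(f"-V{vbr_quality}")  # VBR quality
    return args


print(" ".join(helix_command("in.wav", "out.mp3", 110, sbt=500, tx=0)))
```

The resulting list can be passed to subprocess.run once the executable name and argument order have been verified.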
  • Last Edit: 30 November, 2008, 03:27:21 PM by halb27
lame3995o -Q1

  • pfloding
  • [*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #189
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?

  • halb27
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #190
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?

I understand your concern about a lower bitrate test, as this is not attractive to me either (though there may be codecs like HE-AAC which may provide for satisfactory sound quality at 96 kbps).
But as for your remark on sound system I am convinced that this is not the problem. I guess many participants are happy with their sound system. At least I am. BTW artifacts can often be heard even with a bad sound system. Why not accept the fact that 128 kbps provides satisfactory sound quality for most users most of the time?
lame3995o -Q1

  • pfloding
  • [*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #191

Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?

I understand your concern about a lower bitrate test, as this is not attractive to me either (though there may be codecs like HE-AAC which may provide for satisfactory sound quality at 96 kbps).
But as for your remark on sound system I am convinced that this is not the problem. I guess many participants are happy with their sound system. At least I am. BTW artifacts can often be heard even with a bad sound system. Why not accept the fact that 128 kbps provides satisfactory sound quality for most users most of the time?


Well, ok, let me rephrase the question then: What is the point of comparing codecs to each other rather than comparing to the uncompressed reference? Correct me if I'm wrong, but wasn't the test a comparison between encoders? (With the reference still being a low bit rate compressed format.)

Sure, some encoders are better than others for certain things. But certain other things are not even being considered. (Such as the pretty much non-existing ambient reflected sound field.) Without using a proper reference and a high quality audio system, tests will just be nit-picking about the lesser evil influence on this or that musical instrument, but missing entire huge aspects of sound quality!

It would be a good thing to put low bitrate audio quality into some kind of overall quality context. 128 kbps is not CD quality, it's not LP quality, and it's not even close to good Philips compact cassette. It sounds nice in a non-offensive way, which is good. Even on my Nano 128 kbps is clearly a lot worse than 224 kbps.

BTW, I'm not trying to start a flame war, just stating what I think are some truths that seem to be forgotten all the time.

  • gerwen
  • [*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #192
Well, ok, let me rephrase the question then: What is the point of comparing codecs to each other rather than comparing to the uncompressed reference? Correct me if I'm wrong, but wasn't the test a comparison between encoders? (With the reference still being a low bit rate compressed format.)

Sure, some encoders are better than others for certain things. But certain other things are not even being considered. (Such as the pretty much non-existing ambient reflected sound field.) Without using a proper reference and a high quality audio system, tests will just be nit-picking about the lesser evil influence on this or that musical instrument, but missing entire huge aspects of sound quality!

I take it you didn't do the test.  You should.  Even though it's already over.  There is a reference lossless (i believe) sample for you to compare each codec against.

Even on my Nano 128 kbps is clearly a lot worse than 224 kbps.

For many (if not most) people, that is simply not true.  Personally I can't tell the difference on 99% of material, with careful listening on a decent set of earphones.  Even for people who can spot the differences, i don't think it is 'clearly a lot worse.'
  • Last Edit: 01 December, 2008, 10:54:18 AM by gerwen

  • Squeller
  • [*][*][*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #193
I wanted to give Helix a try for my portable. After a quick test, artifacts on a test track (Classical, RVW "Fantasia on Christmas Carols", Hickox (RIP), first seconds) disappeared at -V120, 186 kbps average. Lame performed better even at V4 (I usually use V3 for portable listening): 162 kbps average, still no artifacts.

However, I was surprised: Helix encodes at 32x, Lame at 16x on my 3 GHz Intel P4 on one core. Only twice as fast as Lame? For Helix: are there optimized compiles out there?

  • Synthetic Soul
  • [*][*][*][*][*]
  • Global Moderator
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #194
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation.
Better than what?  You can't possibly know the equipment used by each participant.

As per gerwen's suggestion, it would be great to see some ABX results from you, using the samples tested.
I'm on a horse.

  • pfloding
  • [*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #195
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation.
Better than what?  You can't possibly know the equipment used by each participant.

As per gerwen's suggestion, it would be great to see some ABX results from you, using the samples tested.


It's true that the systems used are complete unknowns.

I had a look around for the samples, but couldn't locate them at:

http://www.listening-tests.info/mp3-128-1/

  • lvqcl
  • [*][*][*][*][*]
  • Developer
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #196
It's true that the systems used are complete unknowns.

I had a look around for the samples, but couldn't locate them at:

http://www.listening-tests.info/mp3-128-1/


You can find links to all samples on previous page 
http://www.hydrogenaudio.org/forums/index....st&p=601657

Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #197

So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.

Exactly. Or at least: "We can't be sure that there's a difference in quality between the different encoders for this set of samples and for the participants etc..."

To put the debate on statistical difference and on the practical side of the graph, I created a fake one in which I add as competitor a lossless encoding. It's not quite perfect as the confidence error margin would change a bit but I don't think a true graph would really look different:

http://img230.imageshack.us/my.php?image=resultsz1my0.png

[...]).  Err, I guess they could if they look at people's comments about what sounded bad in each sample.  That could tip them off to which sample was which if they can hear the same thing; I guess that answers my question.  Although you could just ask people not to look at people's detailed results while doing the test.  If people wanted to bias themselves, they could have used a Java debugger, or used strace -efile to see what .wav ABC/HR was opening, so you're always depending on people's honesty anyway.

Ok, that's enough ways to tread on statistical thin ice for now.
  • Last Edit: 01 December, 2008, 11:27:43 PM by llama peter

  • ckjnigel
  • [*][*][*]
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #198
Since the results are surprising, I wish Sebastian could moderate a discussion with three authorities.
In other words, a panel discussion with people like Roberto, a Nero rep, Menno, a LAME team member, whatever...
Maybe it could be done by Skype.  I think it would be better to allow participants to jump in and interrupt...
{ed add'n: I'm particularly interested to know if HELIX tweaks may come}
  • Last Edit: 08 December, 2008, 01:10:11 PM by ckjnigel

  • kwanbis
  • [*][*][*][*][*]
  • Developer (Donating)
Public MP3 Listening Test @ 128 kbps - FINISHED
Reply #199
The much awaited results of the Public, MP3 Listening Test @ 128 kbps are ready - partially.

Sebastian, would you be adding more info? Or is the test finished already?