Topic: Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #25
Regarding statistics: will the confidence intervals shrink if there are more participants?

Great to have updated results for 2008; thanks Sebastian!
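
On the confidence-interval question above: for a simple mean, the half-width of the interval shrinks roughly with the square root of the number of ratings. The sketch below is only an illustration under a normal approximation. The standard deviation of 1.0 is a made-up value, and the test itself used a different analysis (Tukey HSD), so treat the numbers as indicative only.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(std_dev, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * std_dev / sqrt(n)


# Hypothetical spread of 1.0 rating points; only the trend with n matters here.
for n in (10, 20, 40, 80):
    print(n, round(ci_half_width(1.0, n), 3))
```

Quadrupling the number of ratings halves the interval, so more participants help, but with diminishing returns.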

Reply #26
Is this claim correct? Has there been no improvement to the Helix encoder since 2005?


 

Reply #28
I analyzed my results, and the ranking of the encoders is different for each sample. So indeed there's no undisputed winner here.
I do have Helix in first place on some samples, though. I would never have expected that! Nice surprise.
Another surprise to me is that, on some samples, I found LAME 3.97 worse than Fhg or iTunes.
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.

This is indeed surprising. I'm sure I've seen smaller, recent ABX tests where LAME outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken.

The last time I saw Francis do an MP3 listening evaluation with LAME and Helix was in this post.

Is this claim correct? Has there been no improvement to the Helix encoder since 2005?

It's correct. Here's the latest compile (v5.1 2005.08.09) used in this test.

EDIT: LAME 3.97 comments

Reply #29
The zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give the impression of large differences where in fact the differences are not worth mentioning.
If the confidence intervals were not given in this test's zoomed view, only the averages, we would have this extreme form here.

Information at a glance, that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but have a baseline high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.
lame3995o -Q1.7 --lowpass 17

Reply #30
The zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give the impression of large differences where in fact the differences are not worth mentioning.
If the confidence intervals were not given in this test's zoomed view, only the averages, we would have this extreme form here.

Information at a glance, that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but have a baseline high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA I guess

About Helix: Let's not forget Guru's listening test from 2007, where Helix clearly failed on classical music.

Reply #31
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA I guess  ....

Sure, but blind people aren't the target audience either. The non-zoomed view gives all the information we need.
lame3995o -Q1.7 --lowpass 17


Reply #33
Just confused. Helix worse than LAME, Helix better than LAME, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.
FB2K,APE&LAME

Reply #34
Just confused. Helix worse than LAME, Helix better than LAME, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.

There has always been a tendency at HA to expect Lame to be seriously superior to other encoders, and listening tests have always been taken too much as 'proof' of this. In fact they contribute experience with encoders in a fairly objective way, but only within the limits of the samples tested and the listening abilities of the participants. It's the best we can do, but it has its limitations.

Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
As for Lame 3.98.2: isn't it a good thing that it scores so well? All we knew so far was that it brings improvement over 3.97 for certain classes of problems where 3.97 had rather weak quality. We did not have much evidence that 3.98 has no serious regression, which would have been possible. Now we have reason to believe that it does not, so we can expect with good reason that 3.98 is real progress.
lame3995o -Q1.7 --lowpass 17

Reply #35
Before anything else, I have to thank Sebastian again for conducting this nice MP3 Listening Test @ 128 kbps!

I think I rated the samples too harshly, but my ranking is very similar to the overall results:

               My Average  Test Results
iTunes 8.0.1      2,45        4,26
Lame 3.98.2       2,94        4,51
l3enc 0.99a       1,00        1,56
Fraunhofer        2,84        4,44
Lame 3.97         2,77        4,28
Helix v5.1        3,20        4,59

"Test Results" are the results of all participants. "My Average" is a simple linear average of my results, as I don't remember how to do other types of analysis (it was too long ago). Taking out the highest and lowest result for each encoder produces a result similar to the one presented above. If anyone can tell me which formula to use in MS Excel to get the error margin, please do.

I'm really surprised that an encoder that hasn't been tuned since 2005 gets such good results. I have more samples where Helix does better than Lame 3.98.2 than the other way around, although the differences are small. When doing the test I clearly noticed that 2 encoders were better than the rest, and I thought they were the Lame ones.
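
On the error-margin question above: one simple option, not necessarily the analysis used for the official results, is a normal-approximation margin around the mean of one encoder's ratings. The sketch below assumes that approach, and the ratings in it are hypothetical placeholders, not data from this test. The rough Excel equivalent would be `=1.96*STDEV(range)/SQRT(COUNT(range))`.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def margin_of_error(ratings, confidence=0.95):
    """Normal-approximation margin of error for the mean of a list of ratings."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * stdev(ratings) / sqrt(len(ratings))


# Hypothetical per-sample ratings for one encoder (placeholders, 14 samples).
ratings = [3.0, 2.5, 3.5, 4.0, 3.0, 2.0, 3.5, 4.5, 3.0, 3.5, 2.5, 3.0, 4.0, 3.5]

avg = mean(ratings)
moe = margin_of_error(ratings)
print(f"{avg:.2f} +/- {moe:.2f}")
```

With only 14 samples a t-based critical value would give a somewhat wider interval, so this understates the uncertainty a little.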

Reply #36
...
Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
...

I'm worried now, not because Helix is very competitive with Lame 3.98.2 in quality, but because Helix encodes so much faster, and that's very useful when I encode albums from my lossless archive to take them on the road.

I wonder why Lame doesn't do better compared to Helix, given that it has three more years of development behind it. I have just added Helix to my foobar2000 converters list and will play with it in my preferred bitrate range (160-220 kbps).

Reply #37
It would be unwise to conclude from the results of this test that Helix will be the best option in the 160-220 kbps range. You should check the quality after encoding at that bitrate.

Reply #38
Wouldn't it be possible to compare the variance within each encoder to get an idea of the robustness of each encoder?

Reply #39
It would be unwise to conclude from the results of this test that Helix will be the best option in the 160-220 kbps range. You should check the quality after encoding at that bitrate.

This is very obvious. I added Helix to foobar2000 to do just this: compare quality with some songs and samples 



Reply #42
...I have just added Helix to my foobar2000 converters list and will play with it in my preferred bitrate range (160-220 kbps).

You may want to try level's findings on improving quality in this bitrate range, which you can find in the Helix thread. The improvement was not confirmed by other people, though.
lame3995o -Q1.7 --lowpass 17


Reply #44
If anything the test shows samples where LAME needs improvement at 128 kbps.

Reply #45
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.


I do. If Lame 3.98.2 is file 2 and 3.97 is file 5, then sample 8 sounds near-transparent to me with Lame 3.97, not with 3.98.2. I also find sample 11 better with Lame 3.97.

Reply #46
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil


All results are available for download already so you can calculate whatever you wish. Tukey HSD is something around 0.5 IIRC (I'm at work right now and don't have access to the exact value) so the tolerance bars are around 0.25 in each direction.
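
To make the tolerance-bar reasoning above concrete: two encoders count as significantly different when their means differ by more than the HSD value, which is the same as bars of half that width not overlapping. The sketch below assumes HSD = 0.5, the approximate figure quoted from memory above rather than the exact one, and uses the overall "Test Results" averages posted in Reply #35.

```python
from itertools import combinations

# Approximate HSD quoted from memory in the post above; the exact value may differ.
HSD = 0.5

# Overall averages as posted in Reply #35 (decimal commas converted to points).
overall = {
    "iTunes 8.0.1": 4.26,
    "Lame 3.98.2": 4.51,
    "l3enc 0.99a": 1.56,
    "Fraunhofer": 4.44,
    "Lame 3.97": 4.28,
    "Helix v5.1": 4.59,
}


def significantly_different(a, b, hsd=HSD):
    """True when the means differ by more than HSD (bars of +/- HSD/2 do not overlap)."""
    return abs(overall[a] - overall[b]) > hsd


pairs = [(a, b) for a, b in combinations(overall, 2) if significantly_different(a, b)]
for a, b in pairs:
    print(f"{a} vs {b}: difference {abs(overall[a] - overall[b]):.2f}")
```

Under that assumed HSD, only the comparisons involving the low anchor come out as significant; the five contenders are all within 0.33 of each other.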

Reply #47
If anything the test shows samples where LAME needs improvement at 128 kbps.

I think we should analyze the results sample by sample and discuss the severity of the problems found. It would be useful to find out whether certain obvious problems with certain encoders were confirmed by the majority of the testers.

In general, I found the choice of the low anchor a bit problematic. The encoder is clearly badly broken. Obviously the 0.99 alpha version is not the version that was involved when the "128 kbps MP3 = CD quality" myth was created. In my experience the release version was already a lot better.

A low anchor that is too bad can have an adverse effect on the rating scale the testers choose to use. It can make the differences between the contenders appear less significant.

For comparison, here are my results:

% Result file produced by chunky-0.8.4-beta
% ..\chunky.exe --codec-file=..\codecs.txt -n --ratings=results --warn -p 0.05

% Sample Averages:
%    iTunes    L398    Anchor    Fhg    L397    Helix
01.    3.80    2.60    1.00    3.40    1.80    4.30
02.    3.80    2.60    1.40    2.80    2.20    3.10
03.    3.90    3.10    1.00    4.30    3.70    2.90
04.    4.20    4.40    1.00    4.40    3.30    4.40
05.    2.70    3.50    1.00    3.00    3.00    3.80
06.    2.20    3.50    1.00    3.00    3.90    4.00
07.    3.70    4.00    1.00    3.80    2.00    4.00
08.    2.40    4.00    1.00    3.00    4.30    3.00
09.    3.00    3.40    1.00    3.60    3.40    2.50
10.    4.50    4.50    1.00    4.20    4.00    4.50
11.    3.90    2.70    1.00    3.50    3.90    2.40
12.    2.00    3.70    1.00    2.60    3.30    3.60
13.    3.70    3.20    1.00    3.80    2.50    4.00
14.    2.00    3.60    1.00    3.10    3.30    4.00

% Codec averages:
%%%    3.27    3.49    1.03    3.46    3.19    3.61
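
The variance and quartile questions raised in Replies #38 and #46 can be explored directly on a table like the one above. A minimal sketch, assuming the per-sample averages are typed in by hand as listed (one listener's results, so it says little about the whole test):

```python
from statistics import mean, quantiles, variance

codecs = ["iTunes", "L398", "Anchor", "Fhg", "L397", "Helix"]

# Per-sample averages copied from the table above (samples 01 to 14).
rows = [
    [3.80, 2.60, 1.00, 3.40, 1.80, 4.30],
    [3.80, 2.60, 1.40, 2.80, 2.20, 3.10],
    [3.90, 3.10, 1.00, 4.30, 3.70, 2.90],
    [4.20, 4.40, 1.00, 4.40, 3.30, 4.40],
    [2.70, 3.50, 1.00, 3.00, 3.00, 3.80],
    [2.20, 3.50, 1.00, 3.00, 3.90, 4.00],
    [3.70, 4.00, 1.00, 3.80, 2.00, 4.00],
    [2.40, 4.00, 1.00, 3.00, 4.30, 3.00],
    [3.00, 3.40, 1.00, 3.60, 3.40, 2.50],
    [4.50, 4.50, 1.00, 4.20, 4.00, 4.50],
    [3.90, 2.70, 1.00, 3.50, 3.90, 2.40],
    [2.00, 3.70, 1.00, 2.60, 3.30, 3.60],
    [3.70, 3.20, 1.00, 3.80, 2.50, 4.00],
    [2.00, 3.60, 1.00, 3.10, 3.30, 4.00],
]

# Transpose so that each codec maps to its 14 per-sample scores.
columns = dict(zip(codecs, zip(*rows)))

summary = {}
for name, scores in columns.items():
    q1, median, q3 = quantiles(scores, n=4)
    summary[name] = {
        "mean": mean(scores),
        "variance": variance(scores),  # sample variance across the 14 samples
        "quartiles": (q1, median, q3),
    }
    print(f"{name:7s} mean={mean(scores):.2f} var={variance(scores):.2f} "
          f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f}")
```

A lower variance or a narrower Q1 to Q3 spread would suggest more consistent behaviour across samples, though with 14 samples from a single listener any such reading is tentative.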

Reply #48
Just try some Metal tracks on Helix at V60, I guarantee it will struggle.
"I never thought I'd see this much candy in one mission!"

Reply #49
/mnt told me that Helix is not gapless, which to me is a serious shortcoming. Another thing is that Helix is not as robust as LAME. But what is stunning people here is the encoding speed of an encoder that hasn't been worked on for 3 years, while the latest LAME is so much slower to encode!