HydrogenAudio

Hydrogenaudio Forum => Validated News => Topic started by: Sebastian Mares on 2008-11-24 21:30:55

Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-24 21:30:55
The much-awaited results of the Public MP3 Listening Test @ 128 kbps are ready - partially. So far, I have only uploaded an overall plot along with a zoomed version. The details will be available in the next few days. You can also download the encryption key along with the submitted results on the results page that is located here: http://www.listening-tests.info/mp3-128-1/results.htm (http://www.listening-tests.info/mp3-128-1/results.htm)

The results show that all encoders are tied for first place, except l3enc, which of course comes out last, being the low anchor.

What is interesting to see is how the MP3 codec has actually evolved since its first days (l3enc, released back in 1994, was the first MP3 software encoder) and how it is still competitive with newer formats like AAC or Ogg Vorbis.

Another very interesting thing, which was also one of the goals for this test, is that Fraunhofer and especially Helix, which both outperform LAME in terms of encoding speed, are still very competitive. While statistically tied with LAME for first place, Helix actually received a higher rating than LAME 3.98.2 - and this at 90x encoding speed! Even FhG received a slightly higher score, at least against LAME 3.97, which was the encoder recommended by the Hydrogenaudio community for a long time. But again, statistically they are all tied, so there is no quality winner.

(http://www.listening-tests.info/mp3-128-1/results.png)

The quality at 128 kbps is very good and MP3 encoders have improved a lot since the last test. This was the last test I will conduct at this bitrate. It's time to move on to bitrates like 96 kbps or 80 kbps.

Here is a zoomed version of the plot showing the competitors only and leaving out the low anchor l3enc.

(http://www.listening-tests.info/mp3-128-1/resultsz.png)

Finally, I would like to thank everyone who participated!

EDIT: Whoops, the link to the results was pointing to the 64 kbps multiformat test by mistake. Corrected now.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-24 21:52:57
Wow, I am really shocked that Helix (Xing) has performed so well. Not only did GNR's new album come out; Xing (Helix) has also outperformed LAME. Hell must be pretty cold now.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-24 21:56:23
I kept telling you guys that the results would be quite surprising.  If you analyze the decrypted results, you will see that for at least one sample, Helix is even statistically better than all other encoders.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-24 22:00:18
Does that make Helix the new recommended MP3 encoder, or does it have to be LAME because it's open source?
Edit: Both are open source.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-24 22:05:57
Does that make Helix the new recommended MP3 encoder, or does it have to be LAME because it's open source?

Helix is open source as well.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-24 22:14:10
Yes it is, I just noticed!
I can't believe how fast it is... here it encodes at 33x realtime. LAME's fastest speed here is 12x.
Foobar2000 parameters: -X2 -U2 -V150 -HF - %d
Would that be equivalent to LAME -V0 ?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-24 22:14:27
I don't think open source has anything to do with it.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-24 22:14:37
If you submitted results, I recommend you look at them and choose your encoder of choice based on that.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-24 22:22:25
I am curious about the detailed results, as my interest is primarily in worst-case behavior.
I guess Lame 3.98.2 and Helix will be the winners in this respect too; maybe the quality difference compared to the other contenders will be more remarkable in this area.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-24 22:25:22
They are all technically tied, but Helix outperformed all of them. Also, its encoding speed compared to LAME is absurdly faster. Could these two arguments qualify Helix as the new recommended MP3 encoder? (LAME being the second recommendation)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ExUser on 2008-11-24 22:51:24
Thank you very much Sebastian. We have some things to carefully consider, I see.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-24 22:56:04
This is indeed surprising. I'm sure I've seen smaller, recent, ABX-tests where Lame has outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken.

IIRC Helix has a very simple codebase. If it is open source, can it be tuned further by a third party? The latest version is from 2005, isn't it?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ZinCh on 2008-11-24 22:59:30
This is only a 128k test, so Helix is the winner in this niche. More tests are needed.

Helix can be recommended for 128k encoding.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-24 23:07:56
I encoded a few tracks using -V150 (VBR range 0-150) with -HF (high frequency encoding) enabled. The average bitrate goes up to 270 kbps. That would be equivalent to LAME -V0. People already find -V0 and -V2 excessive, and there is this ghost case about the sfb21 bloat... many are -V3 advocates. Helix is indeed a simpler encoder with fewer switches, but aren't its quality and speed remarkable?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-24 23:10:33
>Helix can be recommended for 128k encoding.

How can you say this when all the contenders were tied within the 95% margin of confidence?

Some other things to consider before leaping to such a conclusion:
- How many participants were there?
- Did Helix consistently score higher amongst all participants?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-24 23:11:58
I'm sure I've seen smaller, recent, ABX-tests where Lame has outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken. ...

Sorry: as far as I am concerned, I didn't do a recent ABX test of Helix vs. Lame 3.98.
A few years back, after level and others reported Helix's remarkable quality in the 200 kbps area, I did ABX Helix and some other encoders, and I valued Helix's robustness against artifacts in this bitrate range very highly. There were some more or less (to me) negligible HF issues, however, which Wombat found, and I also found some cases with a subtle but ABXable lack of 'vividness' (I don't know how to describe it). All these things are relevant only in the very high bitrate range, if at all. But as I don't care much about bitrate and as I was happy with Lame ABR 270 (3.90.3 then), I stuck with Lame for my mp3 encodings.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Pio2001 on 2008-11-24 23:34:09
AAAAAAAAAAAARRRRRRRRGGGGHGHHHHHH !!

They're all tied 

How can it be? I ABXed nearly all the samples that I submitted, and there were important differences in quality between them. Could it be that for every sample, the best-to-worst order was different? Or were there too many tied submissions?

I must check my personal results right now !

Oh, and Greynol is right. Helix is not the winner. It is tied. The differences are within the confidence intervals, which means that they are just random. If you redid the test with the same samples and same listeners, the simple fact that ABC/HR presents them in a different order every time would probably lead Lame, or Fraunhofer, or iTunes to get a slightly, but not significantly, superior score.

We must consider this to be chance, unless we have more information to back up further claims.
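To make that concrete, here is a rough bootstrap sketch in Python (the per-listener scores below are made-up placeholders, not anyone's real submissions; the point is only to show that resampling the listeners makes a tiny average difference flip sign all the time):

Code: [Select]
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-listener mean scores for two encoders (made-up numbers,
# roughly in the range seen in this test); each entry is one listener.
helix = np.array([4.7, 4.4, 4.8, 4.3, 4.6, 4.9, 4.5, 4.2, 4.7, 4.6])
lame  = np.array([4.6, 4.5, 4.7, 4.4, 4.5, 4.8, 4.6, 4.3, 4.5, 4.7])

observed = helix.mean() - lame.mean()

# Resample listeners with replacement and recompute the difference each time.
flips = 0
trials = 10000
for _ in range(trials):
    idx = rng.integers(0, len(helix), len(helix))
    if helix[idx].mean() - lame[idx].mean() <= 0:
        flips += 1

print(f"observed difference: {observed:+.3f}")
print(f"resamples where LAME comes out ahead: {flips / trials:.1%}")

If the sign flips in a large fraction of the resamples, the small average advantage is exactly the kind of random fluctuation I am talking about.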
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-24 23:42:54
Wow (even if the difference between LAME 3.98.2 and Helix is only 0.08, and knowing that both (all) are statistically tied).

EDIT: Congratulations to Sebastian for conducting the test.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-24 23:43:14
... Could these two arguments qualify Helix for the new recommended MP3 encoder? ...

I've never been too happy with recommendations, especially when it's about just one encoder.
I was especially unhappy with recommending Lame 3.97. There was also a listening test where Lame 3.97 came out great, with a bigger quality difference over the contenders compared to the more or less equal average scores in this test. It was only after that test that 3.97's 'sandpaper problem' became known. The question is how to weigh it: how annoying is it for the person who reads the recommendation? It may be negligible, it may be a big issue.

The problem is that we can't test encoders on the universe of music. We can get significant experience with encoders, which is why Sebastian's test is important. But we should always take the results with a grain of salt.
There's also the question of what kind of result you focus on. Usually people concentrate on an encoder's result averaged over all samples. But is that really what matters most? That's a very personal question. You can look at worst-case behavior, which is what I do in the first place. To me it's more important that my favorite encoder has a low number of scores below 4.0, and - at best - no sample with a score below 3.0. But this too has to be taken with a grain of salt. A bad score on a (to me) exotic sample doesn't count much for me, but it has a very high impact if it happens with music of my favorite genre. So evaluating an encoder is more than just looking at the average scores of a listening test.

Instead of giving a rather strong recommendation, as was done so far, I'd prefer a weaker suggestion, something like:
When targeting a quality that can be achieved with ~128 kbps on average, the most recent mp3 listening test has shown that the current versions of Lame, Helix, Fraunhofer, and iTunes all do an excellent job. Quality differences between them were negligible within this test as far as the average outcome was concerned, with XXX and YYY having the best consistency in high quality (in case it turns out that such a statement can be made).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Pio2001 on 2008-11-24 23:48:26
Excuse me, but what is the correspondence between the files and the encoders?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: krabapple on 2008-11-25 00:02:19
Sorry, it's not clear to me how many subjects participated.  Can you point me to that in the graph?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: lvqcl on 2008-11-25 00:16:04
Excuse me, but what is the correspondence between the files and the encoders?

As shown in the pictures above: samplexx_1.mp3 is encoded by iTunes, samplexx_2.mp3 by lame 3.98.2, etc. Of course, the ABC/HR tool randomizes the order of the samples every time you load an abc/hr config file (Samplexx.ecf).

Sorry, it's not clear to me how many subjects participated.  Can you point me to that in the graph?

From the downloaded results.rar, the number of submitted results per sample is:
39 - 26 - 26 - 27 - 30 - 26 - 26 - 26 - 26 - 26 - 27 - 26 - 29 - 30.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Pio2001 on 2008-11-25 00:37:50
Thanks, I analyzed my own results. That's what I feared. The ranking of the encoders is different for each sample. A Tukey HSD analysis of my ratings gives them all tied too, except the low anchor.
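For anyone who wants to run the same kind of check on their own decrypted ratings, a minimal sketch using statsmodels' pairwise Tukey HSD (the score table below is just a made-up placeholder, not my actual results):

Code: [Select]
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder ratings: rows = samples, columns = encoders (made-up values;
# replace them with your own decrypted scores).
encoders = ["iTunes", "LAME 3.98.2", "FhG", "LAME 3.97", "Helix"]
scores = np.array([
    [4.0, 4.5, 4.2, 4.1, 4.6],
    [3.8, 4.3, 4.4, 4.0, 4.2],
    [4.2, 4.1, 4.3, 3.9, 4.4],
    [3.9, 4.4, 4.0, 4.2, 4.5],
])

# pairwise_tukeyhsd expects a flat list of values plus a matching group label
# for each value.
values = scores.ravel()
groups = np.tile(encoders, scores.shape[0])

print(pairwise_tukeyhsd(values, groups, alpha=0.05))

The printed table flags every encoder pair whose difference exceeds the HSD threshold; with ratings this close, none of the contenders do.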

Thanks again to Sebastian !
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Raiden on 2008-11-25 01:12:30
Oh, and Greynol is right. Helix is not the winner. It is tied. The differences are within the confidence intervals, which means that they are just random. If you redid the test with the same samples and same listeners, the simple fact that ABC/HR presents them in a different order every time would probably lead Lame, or Fraunhofer, or iTunes to get a slightly, but not significantly, superior score.

We must consider this to be chance, unless we have more information to back up further claims.

Agreed. The zoomed plot should probably be removed, since it's quite misleading and doesn't show any useful information.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ExUser on 2008-11-25 01:32:22
Agreed. The zoomed plot should probably be removed, since it's quite misleading and doesn't show any useful information.
Not really. It's just a zoomed version of the previous data. It is not misleading, it just represents the same data as the other graph and text in a slightly different form.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-25 03:51:39
Regarding statistics... will the confidence intervals decrease in size if there are more participants?

Great to have updated results for 2008; thanks Sebastian!
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Squeller on 2008-11-25 07:46:33
Is this claim correct? There has been no improvement to the Helix encoder since 2005?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 08:02:55
Regarding statistics... will the confidence intervals decrease in size if there are more participants?

Great to have updated results for 2008; thanks Sebastian!


Yes, that is correct. The more people submit results, the shorter the confidence intervals become.
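For a rough feel of how fast they shrink: with the usual t-based interval, the half-width is t(0.975, n-1) * s / sqrt(n). A small sketch (the standard deviation of 1.0 is an assumed placeholder, not the real figure from the test):

Code: [Select]
from scipy import stats

s = 1.0  # assumed standard deviation of the ratings, purely for illustration
for n in (10, 20, 40, 80, 160):
    half_width = stats.t.ppf(0.975, n - 1) * s / n ** 0.5
    print(f"n = {n:3d}  ->  95% interval = mean +/- {half_width:.3f}")

Roughly, quadrupling the number of results halves the interval.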
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: melomaniac on 2008-11-25 08:19:10
I analyzed my results and the ranking of the encoders is different for each sample. So indeed there's no undisputed winner here.
Though I have Helix in first place for some samples. I would never have expected that! Nice surprise 
Another surprise to me is that, on some samples, I found LAME 3.97 worse than FhG or iTunes.
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.

This is indeed surprising. I'm sure I've seen smaller, recent, ABX-tests where Lame has outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken.

The last time I saw Francis do an MP3 listening evaluation with LAME and Helix was in this post (http://www.hydrogenaudio.org/forums/index.php?showtopic=58724).

Is this claim correct? There has been no improvement to the Helix encoder since 2005?

It's correct. Here (http://rarewares.org/mp3-others.php) is the latest compile (v5.1, 2005.08.09), which was used in this test.

EDIT: LAME 3.97 comments
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 08:19:41
The zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give the impression of extreme differences where in fact the differences are not worth mentioning.
If the confidence intervals were not given in this test's zoomed view, only the averages, we would have this extreme form here.

Information at a glance - that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but instead have a baseline high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Squeller on 2008-11-25 08:28:03
The zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can give the impression of extreme differences where in fact the differences are not worth mentioning.
If the confidence intervals were not given in this test's zoomed view, only the averages, we would have this extreme form here.

Information at a glance - that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but instead have a baseline high in the air, just a small step below the lowest results.

That's why I would prefer if there was no 'zoomed' view.
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA I guess

About Helix: Let's not forget Guru's listening test from 2007 (http://www.hydrogenaudio.org/forums/index.php?showtopic=58724) where Helix clearly failed on classical music.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 08:44:38
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA I guess  ....

Sure, but blind people aren't the target audience either. The non-zoomed view gives all the information we need.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: melomaniac on 2008-11-25 08:46:24
About Helix: Let's not forget Guru's listening test from 2007 (http://www.hydrogenaudio.org/forums/index.php?showtopic=58724) where Helix clearly failed on classical music.

I've already posted the link Squeller.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: memomai on 2008-11-25 09:08:05
Just confused. Helix worse than LAME, Helix better than LAME, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 09:30:02
Just confused. Helix worse than LAME, Helix better than LAME, Fraunhofer better than 3.97??
I'm just waiting for someone to say "lossless is lossy"; then my confusion will be complete.

There has always been a tendency at HA to expect Lame to be seriously superior to other encoders. And listening tests have always been taken too much as 'proof' of this, whereas they contribute experience with encoders in a pretty objective way, but only within the restrictions of the samples tested and the listening abilities of the participants. It's the best we can do, but it has its limitations.

Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
As for Lame 3.98.2: isn't it a good thing that it scores so well? All we have known so far is that it brings improvements over 3.97 for certain classes of problems where 3.97 had rather weak quality. We did not have much experience showing that there is no serious regression with 3.98, which would have been possible. Now that we have reason to believe this is not the case, we can expect with good reason that 3.98 is real progress.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alexxander on 2008-11-25 09:44:42
Before anything I have to thank Sebastian again for having conducted this nice MP3 Listening Test @ 128 kbps!

I think I was too harsh when rating the samples, but my results are very similar to the overall results:

Code: [Select]
               My Average  Test Results
iTunes 8.0.1      2,45        4,26
Lame 3.98.2       2,94        4,51
l3enc 0.99a       1,00        1,56
Fraunhofer        2,84        4,44
Lame 3.97         2,77        4,28
Helix v5.1        3,20        4,59

"Test Results" are the results of all participants. "My Average" is a simple linear average of the results as I don't remember how to do other type of analysis (too long ago  ). Taking out the highest and lowest result of all encoders produces a similar result as presented above. If anyone can tell me which formula to use in MS Excel to get error margin please do.

I'm really surprised that an encoder that hasn't been tuned since 2005 gets such good results. I have more samples where Helix does better than Lame 3.98.2 than the other way around, although the differences are small. When doing the test I clearly noticed that two encoders were better than the rest, and I thought they were the Lame ones 
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alexxander on 2008-11-25 09:55:48
...
Why worry? Isn't it a good thing that all the encoders perform very well on the samples?
...

I'm worried now, not because Helix is very competitive with Lame 3.98.2 with respect to quality, but because Helix encodes so much faster, and that's very useful when I encode albums from my lossless archive to take them on the road.

I wonder why Lame doesn't do better compared to Helix, having 3 more years of development behind it. I have just included Helix in my foobar2000 converter list and will play with it in my preferred bitrate range (160-220 kbps).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: muaddib on 2008-11-25 10:15:56
It is not good to conclude, from the results of this test, that Helix will be the best option in the 160-220 kbps range. You should check the quality after encoding in this bitrate range.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Jan S. on 2008-11-25 11:33:53
Wouldn't it be possible to compare the variance within each encoder to get an idea of the robustness of each encoder?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alexxander on 2008-11-25 11:37:52
It is not good to conclude, from the results of this test, that Helix will be the best option in the 160-220 kbps range. You should check the quality after encoding in this bitrate range.

This is very obvious. I added Helix to foobar2000 to do just this: compare quality with some songs and samples 
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 11:44:50
Wouldn't it be possible to compare the variance within each encoder to get an idea of the robustness of each encoder?


I am not quite sure I understand what you mean.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: robert on 2008-11-25 12:01:14
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil (http://de.wikipedia.org/wiki/Quantil)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 12:04:22
...I have just included Helix in my foobar2000 converter list and will play with it in my preferred bitrate range (160-220 kbps).

You may want to try level's findings about quality improvements in this bitrate range, which you can find in the Helix thread. The quality improvement was not confirmed by other people, though.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-25 12:06:57
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil (http://de.wikipedia.org/wiki/Quantil)

http://en.wikipedia.org/wiki/Quantile (http://en.wikipedia.org/wiki/Quantile)

(I think more people know English )
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: westgroveg on 2008-11-25 12:55:10
If anything the test shows samples where LAME needs improvement at 128 kbps.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Pio2001 on 2008-11-25 13:04:36
And finally, I don't have any results where LAME 3.97 is better than 3.98.2.


I do. If Lame 3.98.2 is file 2 and 3.97 is file 5, then sample 8 sounds near-transparent to me with Lame 3.97, not with 3.98.2. I also find sample 11 better with Lame 3.97.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 13:47:40
I would be more interested in quartiles than in variance.

http://de.wikipedia.org/wiki/Quantil (http://de.wikipedia.org/wiki/Quantil)


All results are available for download already, so you can calculate whatever you wish. The Tukey HSD is something around 0.5 IIRC (I'm at work right now and don't have access to the exact value), so the tolerance bars extend around 0.25 in each direction.
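For those who want to redo the arithmetic themselves: Tukey's HSD is the studentized range statistic times sqrt(MS_error / n). A sketch with assumed placeholder numbers (not the real ANOVA figures; it only shows where a value around 0.5 can come from, assuming a simple one-way layout):

Code: [Select]
from scipy.stats import studentized_range

# Assumed placeholder values, not the real figures from this test; the actual
# value comes from the ANOVA of the submitted ratings.
k = 6           # number of codecs (5 contenders + the low anchor)
n = 14          # observations per codec in a simple one-way layout
ms_error = 0.25 # ANOVA mean square error

q = studentized_range.ppf(0.95, k, k * (n - 1))
hsd = q * (ms_error / n) ** 0.5
print(f"q = {q:.2f}, HSD = {hsd:.3f}")

An HSD around 0.5 is why the tolerance bars end up around 0.25 on each side.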
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 13:51:28
If anything the test shows samples where LAME needs improvement at 128 kbps.

I think we should analyze the results sample by sample and discuss the severity of the problems found. It would be useful to find out if certain obvious problems with certain encoders were apparently confirmed by the majority of the testers.

In general, I found the choice of the low anchor a bit problematic. The encoder is clearly badly broken. Obviously the 0.99 alpha version is not the version that was involved when the 128 kbps MP3 = CD quality myth was created. In my experience the release version was already a lot better.

A low anchor that is too bad can have an adverse effect on the rating scale the testers choose to use. It can make the differences between the contenders appear less significant.

For comparison, here are my results:

Code: [Select]
% Result file produced by chunky-0.8.4-beta
% ..\chunky.exe --codec-file=..\codecs.txt -n --ratings=results --warn -p 0.05

% Sample Averages:
%    iTunes    L398    Anchor    Fhg    L397    Helix
01.    3.80    2.60    1.00    3.40    1.80    4.30
02.    3.80    2.60    1.40    2.80    2.20    3.10
03.    3.90    3.10    1.00    4.30    3.70    2.90
04.    4.20    4.40    1.00    4.40    3.30    4.40
05.    2.70    3.50    1.00    3.00    3.00    3.80
06.    2.20    3.50    1.00    3.00    3.90    4.00
07.    3.70    4.00    1.00    3.80    2.00    4.00
08.    2.40    4.00    1.00    3.00    4.30    3.00
09.    3.00    3.40    1.00    3.60    3.40    2.50
10.    4.50    4.50    1.00    4.20    4.00    4.50
11.    3.90    2.70    1.00    3.50    3.90    2.40
12.    2.00    3.70    1.00    2.60    3.30    3.60
13.    3.70    3.20    1.00    3.80    2.50    4.00
14.    2.00    3.60    1.00    3.10    3.30    4.00

% Codec averages:
%%%    3.27    3.49    1.03    3.46    3.19    3.61
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-25 14:19:44
Just try some metal tracks with Helix at V60; I guarantee it will struggle.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-25 15:07:26
/mnt told me that Helix is not gapless, which to me is a serious shortcoming. Another thing is that Helix is not as robust as LAME is. But what is stunning people here is the encoding speed of an encoder that hasn't been worked on for 3 years, while the latest LAME is so much slower to encode!
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-25 15:15:25
Why would Helix struggle with metal? IIRC it also struggled with classical music. It can't struggle with everything and still be on par with Lame, can it?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: uart on 2008-11-25 15:30:53
In general, I found the choice of the low anchor a bit problematic.


Yeah great test but I'd have to agree about the low anchor being too badly broken. That low anchor really was just a little too low IMHO.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 15:32:49
Well, one thing you have to consider is that Helix struggled with classical music for one person: Guru. These test results however are based on over 300 results.


In general, I found the choice of the low anchor a bit problematic.


Yeah great test but I'd have to agree about the low anchor being too badly broken. That low anchor really was just a little too low IMHO.


Yeah, I guess you are right. The thing is that I also wanted to see / show how the quality of MP3 has developed since the first days. 0.99a was a public version BTW. It's not that it was leaked from somewhere.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-25 15:38:09
Why would Helix struggle with metal? IIRC it also struggled with classical music. It can't struggle with everything and still be on par with Lame, can it?


I encoded a few metal albums from Metallica (Ride The Lightning), Iron Maiden (Powerslave) and Fear Factory (Obsolete), and plenty of the tracks on those albums warble like hell on guitars and have a swoosh noise on the drums. Hell, just try any Fear Factory track on Helix, such as Replica or Linchpin (Sample 11); it will choke like it was Blade. I will post some ABX results and possibly samples later.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-25 16:21:13
/mnt told me that Helix is not gapless, which to me is a serious shortcoming.
Agreed.  I thought about this last night and meant to test myself today.  I can confirm that foobar certainly does not play Helix files gaplessly.  This is a killer for me, although I am unlikely to switch from LAME in any case.

Well done Sebastian.  Very interesting results.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: IgorC on 2008-11-25 16:23:28
I can confirm /mnt's statements. Helix isn't good on my collection of rock and metal samples. But it's good for a bunch of overkill samples. 
I noticed that LAME has a problem with the first few seconds of each sample while Helix doesn't. That could explain the surprising results, at least partially.

I must admit, an interesting test. Thank you, Sebastian.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ff123 on 2008-11-25 16:26:56
I can confirm /mnt's statements. Helix isn't good on my collection of rock and metal samples. But it's good for a bunch of overkill samples. 
I noticed that LAME has a problem with the first few seconds of each sample while Helix doesn't. That could explain the surprising results, at least partially.

I must admit, an interesting test. Thank you, Sebastian.


There should have been an option in the configuration files to specify that the first second or two of each sample would be ignored.  This would prevent the problem you mention creeping in.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 16:34:21
0.99a was a public version BTW. It's not that it was leaked from somewhere.

Out of curiosity, where did you get that info? Do you know someone who was involved?

I suppose it wasn't difficult for FhG to prevent it from leaking. Dial-up internet connections were starting to become popular, but you couldn't easily download leaked or any other files at that time. I recall that I bought a couple of programs after checking the manufacturers websites in mid-nineties, but I sent the orders by fax and the floppy disks & manuals were delivered by snail mail. If you needed an update to a driver or program you connected the manufacturer's modem bank directly by a dial-up modem or got it on a floppy disk which was posted to you.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-25 17:04:41
The first 2 or 3 seconds were already ignored in this test.

Interesting results anyway. The conclusion is far from what I reached in the past. I only tested the last 11 samples; my results are therefore not totally comparable, but they are significantly different:

iTunes: 2.98
Lame 3.98: 3.30
l3enc: 1.171.00
fraunhofer: 3.51
LAME 3.97: 3.68
Helix: 2.95

This is also the very first test I performed with my new headphones, which I had only owned since the day before I started the test. The new sound signature was so different and therefore so disturbing that I didn't bother to spend more than a few minutes testing and rating each sample. It was a strange experience for me. I wonder how much a different headphone may change results. But it became clear to me that a different hardware configuration could heavily disturb a listener.

Anyway, even in this highly confusing listening environment, my results in this test tend to confirm that Helix doesn't please me at all, even with completely different samples / musical genres.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 17:08:15

I can confirm /mnt's statements. Helix isn't good on my collection of rock and metal samples. But it's good for a bunch of overkill samples. 
I noticed that LAME has a problem with the first few seconds of each sample while Helix doesn't. That could explain the surprising results, at least partially.

I must admit, an interesting test. Thank you, Sebastian.


There should have been an option in the configuration files to specify that the first second or two of each sample would be ignored.  This would prevent the problem you mention creeping in.


2000 ms were given as run-in time.


0.99a was a public version BTW. It's not that it was leaked from somewhere.

Out of curiosity, where did you get that info? Do you know someone who was involved?

I suppose it wasn't difficult for FhG to prevent it from leaking. Dial-up internet connections were starting to become popular, but you couldn't easily download leaked or any other files at that time. I recall that I bought a couple of programs after checking the manufacturers websites in mid-nineties, but I sent the orders by fax and the floppy disks & manuals were delivered by snail mail. If you needed an update to a driver or program you connected the manufacturer's modem bank directly by a dial-up modem or got it on a floppy disk which was posted to you.


Do you have any info that it was leaked? I never heard about l3enc being leaked. fastenc, yes, but l3enc, no. Roberto can also back up my claims for sure.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alexxander on 2008-11-25 17:27:09
...
I can confirm that foobar certainly does not play Helix files gaplessly.  This is a killer for me
...

I just tested it too: Helix does not play gaplessly, and gapless playback is a must for me. Helix doesn't do badly on track 01 - Pet Shop Boys - In The Night (http://www.hydrogenaudio.org/forums/index.php?act=Attach&type=post&id=4636) though, but it doesn't matter to me anymore.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 17:50:50
Do you have any info that it was leaked? I never heard about l3enc being leaked. fastenc, yes, but l3enc, no.

I didn't think it was leaked. I meant the first part: "0.99a was a public version BTW." At that time, "public version" probably meant that there was an official release announcement, product pricing was determined, and it was made available to potential buyers. I assume it was initially offered to professional users who needed to compress audio files for one reason or another. There wasn't any consumer market for MP3 encoders yet.

Actually, I think you may have read the RRW's l3enc story hastily. I think it says that the first public version was 1.00 (1994-07-13). The 0.99a version happens to be just below the *****The First Ever publicly available MP3 Software Encoder***** text line, which is a bit misleading.

from http://www.rjamorim.com/rrw/l3enc.html (http://www.rjamorim.com/rrw/l3enc.html)
Quote
FhG l3enc MP3 Encoder

Fraunhofer l3enc was the first MP3 (back then called MPEG layer 3, that's why l3enc) software encoder. The first public version was released on 1994-07-13, and before that only very expensive hardware encoders existed (not that l3enc wasn't expensive itself, registration for version 2.0 cost 350DM - ~U$250) ...

*****The First Ever publicly available MP3 Software Encoder*****
Date: 1994-03-16
Version: 0.99a
Interface: Command Line
Platform: DOS
Download: l3enc099a.zip - 311Kb ...

Date: 1994-07-13
Version: 1.00
Interface: Command Line
Platform: DOS
Download: l3enc100.zip - 126Kb  ...

Date: 1994-06-13
Version: 0.99c
Interface: Command Line
Platform: DOS
Download: l3enc099c.zip - 312Kb


Edit: fixed: 2004 -> 1994
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Jillian on 2008-11-25 18:13:41
I like the part where the test results (quality and encoding speed) should raise the popularity of Helix, but instead some people try to prove that Helix is bad in their own tests, while others blame Helix for not supporting gapless playback.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ExUser on 2008-11-25 18:34:42
/mnt told me that Helix is not gapless, which to me is a serious shortcoming. Another thing is that Helix is not as robust as LAME is. But what is stunning people here is the encoding speed of an encoder that hasn't been worked on for 3 years, while the latest LAME is so much slower to encode!


Agreed.  I thought about this last night and meant to test myself today.  I can confirm that foobar certainly does not play Helix files gaplessly.  This is a killer for me, although I am unlikely to switch from LAME in any case.


I just tested it too: Helix does not play gaplessly, and gapless playback is a must for me. Helix doesn't do badly on track 01 - Pet Shop Boys - In The Night (http://www.hydrogenaudio.org/forums/index.php?act=Attach&type=post&id=4636) though, but it doesn't matter to me anymore.


Did you guys try encoding to MP3/CUE for an album then splitting with pcutmp3? That should retain gapless information even with Helix. I haven't tested this out yet myself, but it could solve the problem. I think I might go give it a try and report back.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 18:36:58
/mnt told me that Helix is not gapless, which to me is a serious shortcoming.
Agreed.  I thought about this last night and meant to test myself today.  I can confirm that foobar certainly does not play Helix files gaplessly.

I thought that this gapless issue was common knowledge. Only the LAME encoders and nyaochi's ACMENC (a command line frontend for the ACM MP3 encoders) can create the LAME info headers which contain the needed info for gapless decoding.

FhG has developed a newer proprietary header style that serves the same purpose, but as far as I know only FhG's own decoder and Winamp support it.

It would be nice if nyaochi would create a version of his tool that would support .exe CL encoders like Helix and FhG. Or maybe someone else would like to modify it. I think the source code is available under LGPL license.

More info about ACMENC:
http://www.hydrogenaudio.org/forums/index....st&p=239454 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=26956&view=findpost&p=239454)
http://nyaochi.sakura.ne.jp/software/acmenc (http://nyaochi.sakura.ne.jp/software/acmenc)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ExUser on 2008-11-25 18:51:39
Gapless encoding using Helix is possible by encoding the entire album to one MP3 and a cuesheet, then splitting the MP3 with pcutmp3. I've just verified that. It's a bit of a hack, but it works with foobar2000. It should work anywhere pcutmp3 is known to work already.
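Roughly, the workflow looks like this (a Python sketch; the hmp3 binary name, its argument order, and pcutmp3's --cue flag are written from memory and are assumptions - check each tool's own usage output before relying on them):

Code: [Select]
import subprocess

# 1. Encode the whole album (one decoded WAV image) with the Helix CLI encoder.
#    The options follow the foobar2000 command line quoted earlier in the
#    thread; the exact argument order is an assumption.
subprocess.run(["hmp3", "-V150", "-HF", "album.wav", "album.mp3"], check=True)

# 2. Split the album MP3 along frame boundaries using the cue sheet. pcutmp3 is
#    a Java tool; verify the flag against "java -jar pcutmp3.jar" before use.
subprocess.run(["java", "-jar", "pcutmp3.jar", "--cue", "album.cue", "album.mp3"],
               check=True)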

C'mon guys, think outside the box.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-25 19:05:47
Basically you are right, but: People with dysfunctional brains who don't find this out themselves aren't the target audience of HA I guess

Ironically, I learnt about confidence intervals from a similar listening test conducted by roberto (rjamorim). It was him or a fellow forumer who kindly posted the correct way of interpreting such graphs.

Now, if that poster had your attitude, this forum would have been the poorer for it.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-25 19:06:49
Do you mean having to "split" manually the entire album, just to obtain gapless!? Ouch that's way too much...
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 19:10:11
Created 7 / 14 sample graphs...

BTW, Tukey's HSD for all samples is 0.537.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-25 19:18:04
Sebastian, where it says, "The results are graphed below. They show that all encoders are tied for first place, except l3enc, which of course comes out last, being the low anchor."

You should bold, colour, etc, like this:

"The results are graphed below. They show that all encoders are tied on first place, except l3enc which of course comes out last being the low anchor."

Or something like that
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 19:33:05
Do you mean having to "split" manually the entire album, just to obtain gapless!? Ouch that's way too much...

As Canar said, it's a hack.

If you would like to quickly convert a bunch of already ripped lossless track files to MP3, you would first need to create the image & cue files in one way or another.

Personally, I would probably first use foobar for converting the files to an MP3 image file and cue sheet, then cut the file with pcutmp3 and finally copy my extensive tags from the source files with foobar or Mp3tag, but that isn't really very practical.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Bodhi on 2008-11-25 19:44:16
Great Job once again. Thank you Sebastian!
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-25 19:58:53
Quote
As Canar said, it's a hack.


That alone wouldn't justify a switch from LAME!
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-25 20:00:07
Personally, I would probably first use foobar for converting the files to an MP3 image file and cue sheet, then cut the file with pcutmp3 and finally copy my extensive tags from the source files with foobar or Mp3tag, but that isn't really very practical.

Not practical at all. Statistically speaking, Helix is tied with LAME (on 14 samples). Mathematically, the difference is close to insignificant. The biggest advantage of HELIX lies in encoding speed; manipulating a cuesheet and an external tool would simply ruin this advantage.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-25 20:05:51
unless someone patches Helix and releases it with the same gapless header trick LAME uses...
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-25 20:07:22
The biggest advantage of HELIX lies in encoding speed; manipulating a cuesheet and an external tool would simply ruin this advantage.
Exactly.  If it can't be achieved in normal working mode then I'm not interested.

@Alex B.  Possibly common knowledge; however knowledge that I personally did not have, and could only assume.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 20:10:46
In case you are interested, here is a quick and dirty "quality distribution" across the samples:

(http://www.listening-tests.info/mp3-128-1/distribution.png)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 20:13:35
I like the part where the test results (quality and encoding speed) should raise the popularity of Helix, but instead some people try to prove that Helix is bad in their own tests, while others blame Helix for not supporting gapless playback.

I second that.
Nobody complained about the samples or a potential bias they might give to some encoders before the test.
A listening test's outcome is seriously influenced by the samples used (and the degree the participants are sensitive towards the issues with them).
But that's a natural thing we have to accept. It's been like that with any prior listening test.

As I wrote, personal conclusions are another story, and everybody does well to look at the test's details with respect to personal relevance before making decisions about encoder usage. But what we learnt here about Helix's behavior with metal, for instance, is correct in general: metal lovers won't like to use Helix.

Unfortunately there's a high degree of over-simplification even in this forum.
Many people like to see the best encoder (in a universal sense), and they expect it to be Lame (and we see again that Lame is great, it's just not the greatest encoder), and they expect that there should
be serious quality differences between encoders in a general sense.

What a pity! We should be glad that we have a variety of excellent encoders to pick from, so that it's also easy to take non-quality-related properties like gapless playback or encoding speed into account according to personal demands.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ExUser on 2008-11-25 20:14:21
The point I was trying to make was that though Helix is inherently gapful, it doesn't necessarily need to be, nor do any of the encoders here. If anyone wants to start using it for regular use to hunt for problem samples and still wants gapless, it's possible to do. It's not particularly straightforward to make them gapless, it's just possible.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 20:26:14
In case you are interested, here is a quick and dirty "quality distribution" across the samples:...

Thanks for the graph, very interesting.
Helix keeps well above 4.0 throughout all the samples, Lame 3.98 is getting close to that, and FhG is also not far behind. iTunes and Lame 3.97 are showing several weaknesses of a more serious kind.

Very interesting is the encoders' varying performance on sample 12 (Helix and L3.98 do pretty well and the other ones rather badly) and sample 1 (both Lame versions, especially L3.97, perform quite a bit worse than the other encoders). Samples 6, 7, and 14 show specific weaknesses of iTunes and L3.97, respectively.

Sample 11 is interesting too, as it addresses Helix's metal issue we learnt about in this thread. Yes, it's the weakest sample for Helix, but obviously the majority of testers didn't see a real issue with it. Of course anybody can come to a different individual conclusion. Quality judgement is personal and vital for choosing an encoder.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-25 20:42:05
In this graph, LAME 3.98.2 seems to be the most stable encoder...
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: TechVsLife on 2008-11-25 20:44:43
@sebastian mares: thanks for the test!

@guruboolez

Helix doesn't please me at all

Your personal quick test shows (if not a typo):

helix:lame398::lame398:lame397.  So according to this, if Helix is worse than Lame 3.98, then for you Lame 3.98 is just as much worse than 3.97.  How would you explain that? 

I take it that it's impossible for any individual's result in the general test to be (statistically) meaningless, because it's repeated and blinded etc. (so it's never a "fluke").  So I'm trying to explain what that means in the context of a statistical tie for the group, esp. in those individual cases, like guruboolez, where there is NOT a tie.

1. all encoders are so close, that individual sensitivities/variances (or quirks, depending on your view of their significance) dominate more, even (or especially) in a group of more sensitive than average listeners.  [this could sometimes result in ties for just one listener, if we are not talking about a specific weakness of an encoder repeated across several selections of music but very fine and specific differences, limited to specific sounds or instruments or genres, see #2]

2.  the division by music genre (or instruments used etc.) seems important.  is there a way to know if there is a division in the results this way, i.e. producing something other than a tie for the whole result set? [this could be true along with #1]. (I esp. care about classical.)

3. is there a way to know whether and to what extent an unduly low anchor masked or could mask substantial quality differences?

p.s. The sample-by-sample discussion is good and does address #2.  The graph by sample is helpful - I wonder if there is enough data to make informed, statistically sound judgments by music type.


The first 2 or 3 seconds were already ignored in this test.

Interesting results anyway. The conclusion is far from what I reached in the past. I only tested the last 11 samples; my results are therefore not totally comparable, but they are significantly different:

iTunes: 2.98
Lame 3.98: 3.30
l3enc: 1.171.00
fraunhofer: 3.51
LAME 3.97: 3.68
Helix: 2.95

This is also the very first test I performed with my new headphones, which I had only owned since the day before I started the test. The new sound signature was so different and therefore so disturbing that I didn't bother to spend more than a few minutes testing and rating each sample. It was a strange experience for me. I wonder how much a different headphone may change results. But it became clear to me that a different hardware configuration could heavily disturb a listener.

Anyway, even in this highly confusing listening environment, my results in this test tend to confirm that Helix doesn't please me at all, even with completely different samples / musical genres.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: benski on 2008-11-25 20:50:57
In case you are interested, here is a quick and dirty "quality distribution" across the samples:


I doubt the individual samples have a large enough sample base to make the statistics meaningful, but two things of note:

1) The iTunes encoder and LAME 3.98.2 never perform best on any sample.
2) The FhG encoder never performs worst on any sample (except #5, where the graph shows it is tied for worst).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-25 21:06:39
In this graph, LAME 3.98.2 seems to be the most stable encoder...

As halb27 said, just by looking at the picture it is obvious that LAME 3.98.2 has a problem only with sample no. 1. It seems to produce good quality with all the other samples. It should be possible for the LAME developers to fix this "sample 1" problem because the other three encoders can handle it better. Though, I could name a few other samples that are especially problematic for LAME.

It is also obvious that Helix did not fail with any of the samples (it didn't go below 4). Personally, I didn't like how it handled samples 3, 9 and 11. (In my results Helix was the worst encoder on these samples.)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 21:18:28
The graphs for all samples are available on the results page. I will add the corresponding text tomorrow since it's 22 o'clock and I just finished cooking.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: singaiya on 2008-11-25 21:21:40
Is anybody else not surprised that each contender is statistically tied? It was the same in the multi-format test at 128 kbps from 2005.

I'm wondering if testing at lower bitrates will increase separation in rankings. Regardless, I'd like to test 96 kbps next instead of 80 kbps.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-25 21:25:49
The graphs for all samples are available on the results page. I will add the corresponding text tomorrow since it's 22 o'clock and I just finished cooking.

Thanks a lot for your hard work. Enjoy your meal.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sunhillow on 2008-11-25 21:35:24
Thank you for this great checkup, Sebastian! I think it shows that we can be comfortable with bitrates in the -V3 to -V2 range  :-)

Now enjoy a Tannenzäpfle with your meal...
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Zilog Jones on 2008-11-25 21:53:27
I, like many others, was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 of the 6 encoders, with Helix being the only one I couldn't) and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it wasn't for this test I would never have bothered considering any other encoder, so thanks! 

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sizetwo on 2008-11-25 22:16:53
Without adding fuel to the fire, I think it's strange reading some of the comments on this test. The forum is so hellbent on factual tests and ABXing, and when the result in a way contradicts the paradigm of the forum, a lot of people start questioning it.  It's almost as though there is a preferred encoder, and it's ... not... Helix... It's as if ... some people really like defending LAME. Wow, I would never have thought.  No, honestly though, thanks for the test Sebastian. 

Let's look at the facts of the test; the proof is in the pudding. Does it mean that Helix is >= LAME at 128kbps? Yes, apparently it is.  If the forum really wants people to use cold hard facts when making a claim, well, here they are.  Now get over it. Seriously.  If we want to yell "ABX and ABC" at people making encoder claims, we really need to be content with the results we're given.  Sometimes LAME doesn't yield a superior result, and sometimes it does.  Does that mean we all have to switch to Helix? Absolutely not.  But let's not turn the replies into some frantic, strange, twisted sales pitch for LAME (free as it is), as it seems some people want to.

Anyway, it certainly was a baffling result.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-25 22:24:40
I, like many others, was also very surprised by the results. Seeing how well Helix coped with the electronic tracks (sample 12 was the only one where I could ABX 5 of the 6 encoders, with Helix being the only one I couldn't) and considering the speed of the encoder, I am definitely going to try it out on my FLAC transcodes for my MP3 player instead of LAME 3.98.2. If it wasn't for this test I would never have bothered considering any other encoder, so thanks! 

Also, do you have the full artist names and titles for all the samples? I know sample 12 is Kalifornia by Fatboy Slim (from You've Come a Long Way, Baby), and Tom's Diner is by Suzanne Vega, but after that I'm lost. Sample 01 certainly sounds like it could be a Nobuo Uematsu composition (he did the music for most of the Final Fantasy games), but it sounds like PC MIDI, which is a bit odd.


Sample names and sources are now on the results page.

Does it mean that Helix is >= LAME at 128kbps?


Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-25 22:30:31
I've been asking this a couple of times, but I guess there's no answer to it - is it possible to tune Helix further, since there seems to be some headroom for it?

If not, can better quality be achieved by fiddling with the command line switches? Earlier on, there were different command lines for Helix floating around here at HA.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sizetwo on 2008-11-25 22:34:48
Quote
Statistically, for the people who tested and for the given samples, it is on par with LAME, not better.


Sorry then, Helix == Lame at 128kbps. End of discussion... ? Probably not.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-25 22:52:09
Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead to further investigation and a brick to build an encoder's reputation. LAME's positive reputation took years to build and it wasn't done with one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better picture of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-25 23:36:32
No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said. Nothing more. Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead to further investigation and a brick to build an encoder's reputation. LAME's positive reputation took years to build and it wasn't done with one listening test. I don't think HELIX is currently as trustworthy as LAME. Collective experience may help to get a better picture of HELIX's quality and flaws. This experience will make the pudding bigger and the proof clearer.
100% with you guru.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-25 23:49:57
I have posted some ABX logs and samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=67548) of tracks that show Helix's major flaws.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-26 04:14:31
Does it mean that Helix is >= LAME at 128kbps? Yes, apparently it is.

You are probably the type that derives satisfaction from proving all those "smug, self-satisfied, self-proclaimed intellectuals" wrong. Unfortunately, your claim can never use ">=" to compare the encoders. As Guru correctly interpreted the graphs, the comparison to make is "==".

You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length and you might have a shot at using ">=" instead of "==".
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: singaiya on 2008-11-26 04:46:44
You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought happened too, but it seems not to have had an effect: if you look at the first sample, which had 39 listeners, the bars are about as long as for the second sample, which had 26 listeners, and definitely longer than for the third sample, which also had 26 listeners.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: JasonQ on 2008-11-26 05:18:29
Good test.  Good to see that Helix had a solid showing.  What I take away from this is that Lame 3.98 is rock solid.  It seems to be a bit more consistent than 3.97, and should probably be used instead.  If you want a fast encoder, Helix or Fraunhofer will do the trick with no worries.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sizetwo on 2008-11-26 06:18:32

Sorry then, Helix == Lame at 128kbps.

No. It's Helix == Lame at 128kbps for the people who tested and for the given samples - as Sebastian just said it. Nothing more.
Unlike you, I don't see anybody defending LAME in this thread.

A listening test doesn't give an answer. It's just a lead to further investigation and a brick to build an encoder's reputation. LAME's positive reputation took years to be constructed and it wasn't done by one listening test. I don't think HELIX is currently as trustable as LAME. A possible collective experience may help to get a better vision of HELIX quality and flaws. This experience will make the pudding bigger and the proof clearer.


You make a valid point, but then I think this should also be the mantra of any listening test: the result is only valid for the people who did the test; it's not a qualitative indicator of the format/encoder.  But what you are saying here is that what we need is quantity to get the "real" proof - in other words, there were too few participants? My issue is this: had LAME (3.97 or 3.98.2) come out on top, none of this (IMHO) would have happened. Am I right, or has the box of crazy pills been opened again?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 06:55:01
But what you are saying here is that what we need is quantity to get the "real" proof - in other words, there were too few participants?

Participants, but also samples - or what I said before, experience. And even there, you won't get any real proof or universal answer. The very best encoder in the world won't necessarily please every single user. When people here start using HELIX, reporting their good experiences and also their bad samples, and when developers start fixing those issues, then HELIX will for sure become a true alternative, or maybe the obvious choice for MP3 encoding at x bitrate. Trust is something that needs a long time to grow. LAME is not the best MP3 encoder but the most tested, and therefore the most trustable. LAME is not better, but simply safe (to a certain point).
Anyway, if HELIX really pleases some people here, I really suggest they start using it. Their experience will surely be interesting for all other potential users.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 08:06:52

You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought happened too, but it seems not to have had an effect: if you look at the first sample, which had 39 listeners, the bars are about as long as for the second sample, which had 26 listeners, and definitely longer than for the third sample, which also had 26 listeners.


It does have an effect. I never said it is the only thing that influences the error margins.
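To illustrate with a minimal sketch (made-up ratings, not this test's data or its exact analysis method): the confidence-interval half-width grows with the spread of the listeners' ratings and only shrinks with the square root of the number of listeners, so a sample with more listeners but more listener disagreement can easily end up with bars just as long.

Code: [Select]
import math
from statistics import stdev
from scipy.stats import t

def ci_half_width(scores, confidence=0.95):
    # t-based confidence-interval half-width: t * s / sqrt(n)
    n = len(scores)
    return t.ppf((1 + confidence) / 2, n - 1) * stdev(scores) / math.sqrt(n)

# Made-up ratings on the 1-5 ABC/HR scale
sample_a = [4.5, 3.0, 5.0, 2.5, 4.0, 3.5, 5.0, 2.0] * 5   # 40 listeners, high disagreement
sample_b = [4.0, 4.2, 4.5, 3.8, 4.1, 4.3] * 4             # 24 listeners, low disagreement

print(round(ci_half_width(sample_a), 2))  # wider bars despite more listeners
print(round(ci_half_width(sample_b), 2))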
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 09:04:00
...Unlike you, I don't see anybody defending LAME in this thread. ...  I don't think HELIX is currently as trustable as LAME. A possible collective experience may help to get a better vision of HELIX quality and flaws. This experience will make the pudding bigger and the proof clearer.

Well, as you can learn from recent posts, there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chances are high they wouldn't do something similar if Lame had come out clearly on top. I am one of those who feel like that.
And you are one of those Lame defenders, and you do it in a way I really dislike. What you say isn't wrong; it's just killer statements which, if taken seriously, make this test worthless.

It's true, and you can read it for instance in my posts in this thread, that such a test just contributes to our experience with encoders. It is one of the most objective contributions: a considerable number of participants with higher demands on encoder quality spent a considerable amount of time evaluating this. It's the average judgement of active HA members (and comparable people) on the samples tested. Not more. Not less.

You are trying to relativize Helix's result by casting doubt on how far we can trust Helix, and on the other hand you try to give special merit to Lame because you think we can trust Lame more. This simply isn't fair. And it's even a bad argument, because Lame 3.98 isn't Lame 3.97, and looking back at Lame's history there have been significant changes in Lame technology over time. Moreover, what is this trust in Lame good for if, for instance, the 'sandpaper problem' came up with Lame 3.97? We should just stick to the real experience we have with encoders. The trust speech without hard facts is the non-audio variant of the warm-fuzzy-feeling speech.

I like the way AlexB talked about his judgement on Helix behavior on 3 samples which he didn't like. He says what he felt, but in a way which respects the results of the test (which is the judgement of all the participants).

If we look at the test results IMO we can conclude the following for practical purposes:

a) the overall outcome of the encoders averaged over all the samples doesn't give any hint which encoder to use

b) the detailed outcome of the encoders on the individual samples gives some hints which encoder to use:

b1) iTunes and Lame 3.97 aren't attractive candidates for encoding (things can look different if the samples where these encoders perform weakly are not very relevant to the individual choosing the encoder)

b2) Lame 3.98, Helix, and FhG are all good candidates to use. Which encoder is 'best' is personal and can partially be answered by figuring out which samples are individually most relevant and looking at these encoders' outcomes on those samples. It's best to back things up with additional personal tests on favorite music. Not to mention the non-audio-quality topics, which are also relevant to encoder choice, but in a very individual way.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 09:34:16
I have posted some ABX logs and samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=67548) of tracks that show Helix's major flaws.

We know now that Helix has major flaws for you with metal, so chance is high that this is relevant to other metal lovers too. It is also backed up by the test where Helix shows its worst behavior with metal.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-26 09:53:58
I think guru has answered most of the nonsense far better than I ever could, but I can't let the statements below go without some comment.

I like the part where the test result (quality and encode speed) should raise the popularity of Helix, but instead people try to prove that Helix is bad in their tests, while others blame Helix for not supporting gaplessness.
I second that. Nobody complained about the samples or a potential bias they might give to some encoders before the test.
A listening test's outcome is seriously influenced by the samples used (and the degree to which the participants are sensitive to the issues in them).
It's incredible to me that you think that we may sing the praises of Helix but not mention any of the cons.  It is obvious, following the result of this test, that members are going to be drawn to Helix: we have had people suggesting that it become the new HA recommendation solely from the results of this test, and also members stating that it is proven better than LAME.  I think that it is important for members to consider the reality, pros and cons.

As for you halb27, are you complaining that we are not complaining enough or too much?  Should we start complaining about the samples or bias?  Is this even relevant to your point?


...Unlike you, I don't see anybody defending LAME in this thread. ... I don't think HELIX is currently as trustable as LAME. A possible collective experience may help to get a better vision of HELIX quality and flaws. This experience will make the pudding bigger and the proof clearer.
Well, as you can learn from recent posts, there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chances are high they wouldn't do something similar if Lame had come out clearly on top. I am one of those who feel like that.
And you are one of those Lame defenders, and you do it in a way I really dislike. What you say isn't wrong; it's just killer statements which, if taken seriously, make this test worthless.
I find this attack most strange.  How many LAME users are there compared to Helix users?  Which encoder do you think has had the most testing?  Or do you feel that these fourteen samples are enough to usurp the thousands of tracks that LAME users have thrown at LAME?

I just don't get it.

I don't think that you should see it as LAME fanboys shooting down Helix for no reason; I would rather see it as users who have shown a fresh interest in Helix attempting to make an informed decision.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 10:24:06
...As for you halb27, are you complaining (about sample bias) that we are not complaining enough or too much?  Should we start complaining about the samples or bias?

I didn't complain, and IMO nobody should (or they should have done so during sample selection, if there were concerns). What I'm trying to say is: we should take the test as it is. There is a tendency by some posters in this thread to talk down the Helix results. This isn't good. Look at, for instance, Helix's behavior on metal. It is reflected in the test. And it's okay to provide additional warnings on this from people who have experience in this field. But am I over-sensitive when I feel the way it's done has a tendency to bring down Helix in a more general way? Maybe I am, but that's what I feel about it. And it looks like I'm not the only one.
BTW, I personally don't use Helix (I'm considering converting from Lame to FhG), but I can't see the catastrophe when Helix gets some attention. After all, it seems to be a good encoder (okay, not so much for metal and hard rock). Maybe some warning should be given that we can't expect any further Helix development (guru gave this hint already) and have to take Helix as is.
But I don't expect further FhG stereo mp3 development either. I don't care as long as I'm happy with what I've got. It's even a bit of a relief not having to care about new versions. We can be happy that Lame is being developed further, but we can also be happy with Lame 3.98. mp3 development has pretty much reached a good endpoint, as shown in this test.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Gabriel on 2008-11-26 10:54:09
In case you are interested, here is a quick and dirty "quality distribution" across the samples:

(http://www.listening-tests.info/mp3-128-1/distribution.png)

Would it be possible for you to include this graph within the results page?

Btw, question for the audience: what are the relative speeds of FhG/Helix/Lame?
edit: sorry, speed is already mentioned within the test results
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 11:06:08
Sure, can be done when I get home.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-26 11:14:36
...but I can't see the catastrophe when Helix gets some attention. After all, it seems to be a good encoder (okay, not so much for metal and hard rock). Maybe some warning should be given that we can't expect any further Helix development (guru gave this hint already) and have to take Helix as is.
I haven't seen anybody suggesting that it is catastrophic.  I think the attention has been very positive on the whole.  Very positive, given that many people had probably never heard of or tested the damn thing!

For my part I think that Helix's results are of great interest.  So much interest that I considered running some tests of my own; however, I really enjoy the fact that I can play LAME MP3s gaplessly with foobar, and when my Creative Nano fails at this I find it really jarring.  The fact that Helix cannot currently do this natively (please, no one bother pointing out Canar's tongue-in-cheek suggestion) is a major drawback in my eyes.  I'm not saying that it couldn't be a minor fix.

I am not one of these members that is willing to encode every track with various encoders at various settings to see which makes a better job of it.  I decided upon LAME -V5 a couple of years back and I stick with it.  That's not to say that I can't change, but I don't have the time to be so picky when encoding new albums.

Now, that is not to say that Helix will never be a contender.  It is open source, and improvements can be made, if anyone cares to undertake it.

I'm very much in favour of some positive attention to Helix - as you rightly point out, the more the merrier - but I'm not in favour of glossing over its failings just because it's in vogue in November 2008.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sizetwo on 2008-11-26 11:56:42
Derived from this test and the consequent forum postings, this is what I have learned about listening tests:

1: Whichever encoder(s) comes out on top of a test does not indicate that it is a superior encoder at that bitrate - regardless of whether, as in this case, it's a tie.
2: The sample size and number of participants need to be increased, as does some form of participant knowledge of audio compression and what to listen for (artifacts).
3: The samples selected for a test will never be enough to make an encoder "safe"; in other words, we will not be able to know that the samples are representative of the various types of music one would imagine compressing.
4: The test results should be interpreted in a highly subjective manner, as everyone seems to interpret the results differently.
5: The final outcome is for most people to end up saying "test for yourself", thus negating the empirical evidence we can draw from such a test, and ultimately making it rather pointless, other than saying that people should stay away from the low anchor.

The question remains: how can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 12:12:28
The problem which a lot of people do not understand is that you cannot generalize the results by saying "Encoder x is the best" when a finite number of participants test a finite number of samples at a certain bitrate.
If you have average hearing and listen to all the types of music that were covered in this test, you could actually choose any of the contenders with regard to quality. If you only listen to metal, you would put the encoders that performed best on metal on your list. What these public listening tests actually serve for is to let you narrow down the encoders you should consider for starting your own tests. Then you start to cut more and more encoders from your list depending on whether you need fast speed, gapless playback, support for platforms like Linux or Mac, etc., and in the end you come up with one encoder that is best suited for your individual needs. I hope you get my point.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 12:24:16
...5: The final outcome is for most people to end up saying "test for yourself", thus negating the empirical evidence we can draw from such a test, and ultimately making it rather pointless, other than saying that people should stay away from the low anchor.

The question remains: how can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

It's true that the overall outcome averaged over all samples doesn't say much, especially if the results are tied. But looking at the detailed results, every reader can get something out of the test for whatever level of personal effort he is willing to put into interpreting it.
IMO it's like this (keep in mind it's only about interpreting the results of the test):

a) for the take-it-easy people not fussing over details:
Helix is best, as it achieved good results with every sample. It is rather closely followed first by Lame 3.98 and second by FhG surround, which each show a weakness (of minor to modest degree) on only 1 sample.  iTunes and Lame 3.97 are quite a bit behind, each having 3 weaknesses (one of them of a higher degree).
From the test organization, a warning can be helpful that deciding things this way may lead to a suboptimal decision, as the personal relevance of the samples is not taken into account.

b) for the more careful people putting some effort into interpreting the results but avoiding their own listening tests:
Concentrate on those samples which are meaningful to you (which are roughly your kind of music) and ignore those samples which have no or nearly no relevance to you. Look at the outcome of the various encoders for this sample selection and pick your favorite.
From the test organization, this procedure can be supported by giving more detailed information about the samples (genre(s) in the first place), as not every reader will listen to the samples (which, however, should be highly recommended, because otherwise the reader doesn't know what he's reading about).

c) for the very careful people willing to run their own listening tests, procedure b) can be a start and may even make things easier, as it can exclude certain encoders from consideration.

In case several encoders become candidates for personal use this way: don't worry, enjoy the choice in its own right (and you always have the option to go the b) or c) way in case you're coming from a level above).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: uart on 2008-11-26 13:36:24
1: Whichever encoder(s) comes out on top of a test does not indicate that it is a superior encoder at that bitrate - regardless of whether, as in this case, it's a tie.


This is called "statistical significance" and is a very important part of making judgements about which is better in  cases where there is an element of randomness or  uncertainly (aka variance) in measurement. There is well developed statistical theory that analyses the difference in the means (averages) in relation to the variance of the scores and the number of samples and determines whether the observed difference is likely to be a result of chance or whether it is more likely that it is due to a genuine difference in the nature of the things being measured. Loosely speaking these two cases correspond to "no significant difference" or a "significant difference" respectively.

What people really mean here when they say the scores are "tied" is that the cold hard statistical mathematics says that the differences are not statistically significant. Essentially this just means that, given the underlying randomness of the data set, it is unreliable to assume that there is a real difference.
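To make the idea concrete, here is a minimal sketch with hypothetical per-listener scores (not the data from this test, and not the exact analysis used for its plots): a paired test on the same listeners' scores for two encoders, where a large p-value means the observed difference is consistent with chance.

Code: [Select]
from scipy.stats import ttest_rel

# Hypothetical mean scores from the same listeners for two encoders
helix = [4.3, 4.8, 3.9, 4.5, 4.7, 4.2, 4.6, 4.4, 4.1, 4.9]
lame  = [4.4, 4.6, 4.0, 4.3, 4.8, 4.1, 4.5, 4.6, 4.0, 4.7]

t_stat, p_value = ttest_rel(helix, lame)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
# A large p-value (e.g. > 0.05) means the observed difference is consistent
# with chance, i.e. the two encoders are statistically "tied" on this data.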
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-26 14:02:07
I think it would be good to quote Pio2001's valid comment here:

... Oh, and Greynol is right. Helix is not the winner. It is tied. The differences are within the confidence intervals, which means that they are just random. If you redid the test with the same samples and same listeners, the simple fact that ABC/HR presents them in a different order every time would probably lead Lame, or Fraunhofer, or iTunes to get a slightly, but not significantly, superior score.

We must consider this to be chance, unless we have more information to backup further claims.


To better understand the results I am going to start sample specific discussion threads - one for each sample.

The first two are here:

http://www.hydrogenaudio.org/forums/index.php?showtopic=67562 (http://www.hydrogenaudio.org/forums/index.php?showtopic=67562)
http://www.hydrogenaudio.org/forums/index.php?showtopic=67561 (http://www.hydrogenaudio.org/forums/index.php?showtopic=67561)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-26 16:42:27
b1) iTunes and Lame 3.97 aren't attractive candidates for encoding (things can look different if the samples where these encoders perform weakly are not very relevant to the individual choosing the encoder)

This is nonsense.  Who's to say that using 14 different samples would have given the exact same outcome?  If they had, then maybe you'd be right, but there aren't more samples.  If the difference between samples where Lame 3.98 scored consistently and significantly higher than Lame 3.97 was due to a known defect of Lame 3.97 that has been corrected in Lame 3.98 ("sandpaper problem"), then possibly.  Perhaps a class of samples exists that shows weaknesses new to Lame 3.98.  This is not beyond the realm of possibility considering that we've seen regression in Lame's CBR method with at least one documented sample between 3.93 and 3.98, though one sample does not a class make.

Based on the test results the candidates were all tied.  There is not enough statistical evidence to suggest that the sound quality of any one of them is more attractive than any other, period, end of discussion.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 17:20:46
I'm quoting halb27:

Well, as you can learn from recent posts, there are some people feeling that there are posters here defending Lame in an inadequate way (though there is nothing to defend). Chances are high they wouldn't do something similar if Lame had come out clearly on top. I am one of those who feel like that.
Of course, and that's perfectly normal. When a general consensus is confirmed, there's no debate. But when the same consensus is broken by a new element (test, proof, theory), then the pertinence of the latter is subject to strong debate. Take an example: a scientist finds a new proof that the earth turns around the sun - the scientific community won't pay much attention to this new proof. Another scientist brings a test proving that heliocentrism is wrong… and guess what will happen. You see a bias where there's simply a very common attitude.

“What you say isn't wrong; it's just killer statements which, if taken seriously, make this test worthless.”
So what I say is not wrong but you refuse to accept it because it makes the test worthless?! I said this result is "a lead" and "a brick" to a bigger building. No more and certainly not less. I don't call this "worthless".

“and on the other hand you try to give special merit to Lame because you think we can trust Lame more. This simply isn't fair. And it's even a bad argument, because Lame 3.98 isn't Lame 3.97, and looking back at Lame's history there have been significant changes in Lame technology over time.”
This argument looks dishonest to my eyes. LAME 3.98 is an improvement, not a radically different piece of code. A new release won't break the confidence people have in an encoder just because parts of the code changed. People trust LAME in general, Vorbis in general, MPC, FLAC, x264, Xvid in general... and not a single past version of it. LAME has been trustworthy for years; LAME 3.98's quality didn't start from scratch; unsurprisingly, several people are trusting and using the latest version of the encoder. HELIX/Real wasn't trustworthy for years, and I don't see it as giving special merit to LAME when I say that a single listening test won't make Helix as trustworthy as LAME, considering the different histories they have.

“Moreover, what is this trust in Lame good for if, for instance, the 'sandpaper problem' came up with Lame 3.97?”
In case you forgot, the sandpaper issue occurred on very specific occasions, and the overall progress of LAME 3.97 over 3.96 was massive enough (especially with VBR at the mid-bitrate range) to prefer the more recent version. I posted several listening tests on the LAME 3.97 betas a few years ago (in which the artefact you described was discovered).

“b) the detailed outcome of the encoders on the individual samples gives some hints which encoder to use:

b1) iTunes and Lame 3.97 aren't attractive candidates for encoding ”


So long on HA.org and still unable to read a listening test?!
ALL ENCODERS ARE TIED. HELIX is as good as iTunes according to this test. If you refuse that, then you're implicitly admitting some limitation of collective listening tests.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 17:38:05
For the people interested, here are the samples (I don't know how long they will be available since my account expires on December 1st):

http://rapidshare.com/files/167638538/Sample01.zip (http://rapidshare.com/files/167638538/Sample01.zip)
http://rapidshare.com/files/167638513/Sample02.zip (http://rapidshare.com/files/167638513/Sample02.zip)
http://rapidshare.com/files/167638551/Sample03.zip (http://rapidshare.com/files/167638551/Sample03.zip)
http://rapidshare.com/files/167638514/Sample04.zip (http://rapidshare.com/files/167638514/Sample04.zip)
http://rapidshare.com/files/167638550/Sample05.zip (http://rapidshare.com/files/167638550/Sample05.zip)
http://rapidshare.com/files/167638524/Sample06.zip (http://rapidshare.com/files/167638524/Sample06.zip)
http://rapidshare.com/files/167638545/Sample07.zip (http://rapidshare.com/files/167638545/Sample07.zip)
http://rapidshare.com/files/167638522/Sample08.zip (http://rapidshare.com/files/167638522/Sample08.zip)
http://rapidshare.com/files/167638544/Sample09.zip (http://rapidshare.com/files/167638544/Sample09.zip)
http://rapidshare.com/files/167638527/Sample10.zip (http://rapidshare.com/files/167638527/Sample10.zip)
http://rapidshare.com/files/167638543/Sample11.zip (http://rapidshare.com/files/167638543/Sample11.zip)
http://rapidshare.com/files/167638554/Sample12.zip (http://rapidshare.com/files/167638554/Sample12.zip)
http://rapidshare.com/files/167638529/Sample13.zip (http://rapidshare.com/files/167638529/Sample13.zip)
http://rapidshare.com/files/167638525/Sample14.zip (http://rapidshare.com/files/167638525/Sample14.zip)

Or an all-in-one ZIP from kwanbis:

http://www.megaupload.com/de/?d=13B7NWEP (http://www.megaupload.com/de/?d=13B7NWEP)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-26 17:50:55
I've never seen such a hot debate. It's pretty cool. Perhaps the discussion of how things were in the past, and how we were there and saw LAME crawl to its majesty, is useless now.
Helix hasn't been tuned in 3 full years.
LAME latest tuning is from the last 3 months.
Helix is encoding at 90x. I got 30x in my PC probably because of hardware limitations. But it's OK.
LAME encoding here is no more than 12x. And this bothers me.
Helix performed a bit better than LAME in this test.
LAME is showing weaknesses at 128 kbps (this could be with this set of samples, we don't know)

This is just the tip of the iceberg that already started to bother the crowd.

Can you imagine if Helix had been developed and tuned? Would it have surpassed LAME by light-years?

I guess this discussion does not end here; I see a lot of analytical people trying to make a point. Everyone's got their point, and I think we should deepen this investigation, make another test, perhaps a different listening test with more people and a vast number of samples to "end this discussion".
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Big_Berny on 2008-11-26 17:58:39
So long on HA.org and still unable to read a listening test?!
ALL ENCODERS ARE TIED. HELIX is as good as iTunes according to this test. If you refuse that, then you're implicitly admitting some limitation of collective listening tests.

Well, to be 100% correct you can't say that HELIX is as good as iTunes. This is not 'proven' by the test (you'd have to test the beta error instead of the alpha error). But since the differences between the two aren't significant, you also can't say that HELIX is better, as the difference MAY BE (!) random.

So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.
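For what it's worth, the usual way to go beyond "no significant difference" and actually argue equivalence is a TOST-style check: the whole confidence interval of the difference has to fall inside a pre-chosen equivalence margin. A minimal sketch with made-up numbers and an assumed ±0.10 margin (nothing here comes from the actual test data):

Code: [Select]
import math
from statistics import mean, stdev
from scipy.stats import t

# Per-listener score differences (encoder A minus encoder B) -- made up
diffs = [0.3, -0.4, 0.1, 0.25, -0.2, 0.15, -0.3, 0.35, 0.0, -0.25]
margin = 0.10   # assumed equivalence margin; choosing it is a judgement call

n = len(diffs)
half_width = t.ppf(0.95, n - 1) * stdev(diffs) / math.sqrt(n)   # 90% CI, TOST convention
lo, hi = mean(diffs) - half_width, mean(diffs) + half_width

print("no significant difference" if lo < 0 < hi else "significant difference")
print("equivalent within margin" if (-margin < lo and hi < margin) else "equivalence not shown")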
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: null-null-pi on 2008-11-26 17:59:57
Yay, this is exciting!!
Seems like I'll have to run a few tests myself since I didn't check on FhG or Helix for a very long time. And it also seems like I underestimated their performance/progress in development...
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-26 18:13:06
Helix performed a bit better than LAME in this test.
No, it didn't.

LAME is showing weaknesses at 128 kbps (this could be with this set of samples, we don't know)
How so?

To point out the impotency of your analysis, based on Sebastian's colored graph, Lame 3.97 performed the best on the greatest number of samples (it appears to be tied with Fraunhofer on sample 10).  The point is that you have to look at the totality of the test and understand something about statistics.  Those vertical bars in the chart summarizing the results are there for a reason and it appears that you have no idea how to interpret them.

Would it (Helix) have surpassed LAME by light-years?
Quite possibly not.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Soap on 2008-11-26 18:13:14
and I think we should deepen this investigation, make another test, perhaps a different listening test with more people and a vast number of samples to "end this discussion".

We?  Let us give Sebastian Mares credit - this was largely a one-man show. 
We?  How about you?  Organize a new test if you like.  Don't beg the collective audience to do the work for you.
More People?  I'm sure if Sebastian had a magic wand there would have been more people involved - but even with a Slashdotting and repeated extensions there were only a limited number of participants.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Dingo_RG on 2008-11-26 18:23:50
Neasden said:

"Helix hasn't been tuned in 3 full years"

"Can you imagine if Helix had been developed and tuned? Would it have been surpassed LAME in light-years?"

-----------------------------------------------------------

Excellent point, exactly my thoughts...

With the results from the test, anyone could conclude that in general Helix is a good encoder - it performed excellently there...

There are two main flaws in the Helix encoder: one regarding audio quality with metal music, and the other regarding gapless playback.

Well, Helix is open source... there is a good challenge for the software developers and beta testers from HA to fix these two issues and tune Helix to its maximum capacity.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 18:39:35
So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.

Exactly. Or at "We can't be sure that there's a difference in quality between the different encoders for this set of samples and for the participants etc..."

To put the debate about statistical difference on the practical side of the graph, I created a fake one in which I added a lossless encoding as a competitor. It's not quite perfect, as the confidence error margins would change a bit, but I don't think a true graph would really look different:

(http://img230.imageshack.us/img230/9895/resultsz1my0.th.png) (http://img230.imageshack.us/my.php?image=resultsz1my0.png)

LAME 3.98 and Helix are statistically tied to any lossless format.
What's the point of this? Simply imagine any lossy competitor at a higher bitrate (it could be LAME -V2, MPC standard or any other idol): it would appear on this graph a bit below my virtual lossless contender. Then what should people conclude? That Helix at ~130 kbps is as good as LAME at ~200 kbps, but also much faster and much smaller. What the hell did the LAME developers do during all these years? Why are people on HA.org so conservative, and why don't they immediately switch to this encoder which even competes with lossless?

Am I clear enough? The first, immediate and indubitable conclusion of this test was first made by Sebastian Mares: it's the last time MP3 at 128 kbps will be tested (by him). The encoders are too close to transparency to reach other conclusions. From this test you can build the most foolish recommendations, including that LAME and Helix are a substitute for any lossless format. This is what the test would say. I'm not caricaturing things and it's not even an aberration: the evidence that a group of listeners would be OK with several MP3 implementations at around 130 kbps is there. I find it nice – much nicer than the useless debate below. It's not the conclusion I dreamt about, but I would thank Sebastian for bringing HA.org (which is sometimes a bit elitist) to a conclusion millions of people around the world reached by themselves: MP3 at 128 kbps is often good, even with the fastest encoders.

Now, individual users are different from a group, and people won't replace their lossless collections with Helix or Lame at 130 kbps just because a test said it's safe to do so. We don't blindly obey listening tests.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: [JAZ] on 2008-11-26 18:42:30
Can you imagine if Helix had been developed and tuned? Would it have surpassed LAME by light-years?


* Helix (including its former and current incarnation) has been developed for around 10 years (ok, make that 7~8 if we accept that the last modification was in 2005).

* Helix has always been developed by companies, and full-time workers.

* The original goal of Xing (Helix's parent) was speed. Back then, the claim was: "it is 8 times faster than current encoders". And it was true!

* During the later days of Xing, development focused on quality (moved from i/s stereo to m/s stereo, allowed full-bandwidth encoding instead of usually filtering at around 16 kHz, improved in the VBR department...)

* When Helix was born, as part of a whole new attempt of Real Networks to embrace the open source community (Helix DNA, Helix server, Helix player... ), the Helix mp3 encoder was further tuned and developed with quality in mind, while preserving its speed (For Real it was good to have a fast encoder).


In contrast:

* LAME has been developed for 10 years.

* LAME's development has always been a work of volunteers, sometimes, a single person.

* The original (1.0) goal of LAME was to be an MP3 encoder for Amiga PCs. That implied speed.
The actual goal of LAME since 2.0 is quality.
As such, LAME was based on the official dist10 reference MP3 encoder, improving its methods to get better quality.
This became even more notable when the LAME developers tuned the encoder using Fraunhofer's output as a reference.

* LAME has always received both speed and quality improvements, with quality treated as most important. GOGO took speed as most important.

* During the last years of LAME development, the changes have focused on new models, tweaking behaviour shown in certain killer samples, and overall standards compliance. This might suggest that development didn't advance much, but it did, as the test shows.


In the end, it is not strange for me to see Helix's behaviour. I might have found it strange if iTunes had shown Helix's behaviour.

About the test results, I will just repeat the consensus: they are tied. Case closed.
All of them have weaker and stronger areas.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: lvqcl on 2008-11-26 18:47:41
If the difference between samples where Lame 3.98 scored consistently and significantly higher than Lame 3.97 was due to a known defect of Lame 3.97 that has been corrected in Lame 3.98 ("sandpaper problem"), then possibly.  Perhaps a class of samples exists that shows weaknesses new to Lame 3.98.


I remembered this thread: Low Bitrate VBR sounds worse in 3.98 (http://www.hydrogenaudio.org/forums/index.php?showtopic=67041&hl=). Maybe the workaround for the "sandpaper problem" results in serious bitrate bloat for some tracks at -V7...-V9.
(Edit: punctuation.)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-26 18:51:31
halb27, I will stick with you when you say "some people like defending LAME"... even with a bit of irritation and anger... I have nothing to add. I am not saying Helix is BETTER than LAME, I didn't say that... but the numbers are there, and I am gonna stick with the numbers. You can't argue against the numbers. If you use 100 samples in a new test, it will come out the same.

Quote
How so?

To point out the impotency of your analysis, based on Sebastian's colored graph, Lame 3.97 performed the best on the greatest number of samples (it appears to be tied with Fraunhofer on sample 10). The point is that you have to look at the totality of the test and understand something about statistics. Those vertical bars in the chart summarizing the results are there for a reason and it appears that you have no idea how to interpret them.


If that graph is so misleading, why is it even presented this way in a listening test?
I assume you have a better proposal for graphics that would not mislead people when interpreting them, because then I wouldn't be the only owner of an "impotent analysis".

Quote
We? Let us give Sebastian Mares credit - this was largely a one-man show.
We? How about you? Organize a new test if you like. Don't beg the collective audience to do the work for you.
More People? I'm sure if Sebastian had a magic wand there would have been more people involved - but even with a Slashdotting and repeated extensions there were only a limited number of participants.


I didn't ask anyone for anything.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-26 18:55:18
I am gonna stick with the numbers
No one is defending Lame; it's just that you don't know how to read the numbers.

If you use 100 samples in a new test, it will come out the same.
That's the most absurd thing I've read so far.  You really have no idea what you're talking about.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-26 19:01:28
Quote
No one is defending Lame; it's just that you don't know how to read the numbers.


I hope more people agree with you.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 19:02:04
@Neasden

Sebastian's results page is unfinished. There will be comments on the right side of each graph to help people interpret the results correctly.
Example: http://www.listening-tests.info/mf-64-1/results.htm (http://www.listening-tests.info/mf-64-1/results.htm)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Soap on 2008-11-26 19:29:02
Two theory questions:
1 - If I were to take the raw data and throw out all responses where the listener did not correctly identify the low anchor, how "valid" would the numbers be?
2 - Why did this listening test not also include a high anchor?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 19:37:12
A high anchor becomes pointless when some competitors' results are expected to be at ~4.5/5 (i.e. close to transparency). It also makes the test heavier to perform for each listener, and if I'm not wrong it should bring additional statistical noise (i.e. longer error bars) to the final results.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-26 19:57:26

I have posted some ABX logs and samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=67548) of tracks that show Helix's major flaws.

We know now that Helix has major flaws for you with metal, so chance is high that this is relevant to other metal lovers too. It is also backed up by the test where Helix shows its worst behavior with metal.


Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: level on 2008-11-26 20:20:40


I have posted some ABX logs and samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=67548) of tracks that show Helix's major flaws.

We know now that Helix has major flaws for you with metal, so chance is high that this is relevant to other metal lovers too. It is also backed up by the test where Helix shows its worst behavior with metal.


Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.


Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Your ABX tests only show that there exists a problem with metal music in the 128 kbps area, and nothing more... suggesting that Helix is in general poor with metal music based on a few tests in the 128 kbps area doesn't make any sense.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 20:23:06
Two theory questions:
1 - If I were to take the raw data and throw out all responses where the listener did not correctly identify the low anchor, how "valid" would the numbers be?


I am not quite sure. I was also thinking about doing that, but this has to be made clear before the test starts, otherwise people might blame me afterwards for selecting only the results I like. Also, how do you know that for a given person the low anchor didn't really sound better? Let's say an encoder fails on a certain sample and introduces a lot of ringing, while the low anchor simply lowpasses at, let's say, 14 kHz. For that person, the lowpass might be more acceptable than the ringing.
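A minimal sketch of what such post-screening could look like (the field names and layout are made up; the real decrypted result files have their own format):

Code: [Select]
# Each record holds one listener's ratings for one sample; drop records where
# the low anchor (l3enc) was not rated strictly below every contender.
records = [
    {"listener": "A", "sample": 1, "ratings": {"lame398": 4.5, "helix": 4.7, "l3enc": 2.0}},
    {"listener": "B", "sample": 1, "ratings": {"lame398": 4.0, "helix": 3.8, "l3enc": 4.2}},
]

def low_anchor_identified(record, anchor="l3enc"):
    ratings = record["ratings"]
    return all(ratings[anchor] < score for codec, score in ratings.items() if codec != anchor)

screened = [r for r in records if low_anchor_identified(r)]
print(f"kept {len(screened)} of {len(records)} result sets")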
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-26 20:35:23
Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Yes, this is a 128kbps listening test. Anyway, LOW bitrates for me are 96kbps and less.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 20:47:49
Quote
...All of them have weaker and stronger areas.

That's the key result of the test, and we can learn about the encoders' strengths and weaknesses by looking at the encoders' results on the individual samples. There is a chance that users can get significant encoder differentiation for their individual needs.
Looking only at the results averaged over all the samples is pretty pointless, especially when results are tied, and it looks like some members have only this in mind.
Of course it's useful to base an encoder decision on more than just the encoders' performance on favorite genres in this test, but it's a starting point; and if someone doesn't want to dig deeper, that's his decision, and he still gets some (though a bit thin) basis for a decision from the test by looking at the results for his favorite genres.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-26 21:03:34
Quote
No one is defending Lame; it's just that you don't know how to read the numbers.


I hope more people agree with you.

Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge.  Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations.
The confidence interval bars are about as useless as kidneys are to a human.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: level on 2008-11-26 21:15:27

Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Yes, this is a 128kbps listening test. Anyway, LOW bitrates for me are 96kbps and less.

Well, that is your personal opinion...

For me, and many others, 128 kbps is low when we are talking about transparency and hi-fi audio. Even 160 kbps, and often 192 kbps, is very easy to ABX in many cases.

My point is that stating that Helix is IN GENERAL poor with metal music based only on a few tests in the 128 kbps area is irresponsible, because Helix performs well with metal music at high bitrates (equivalent in bitrate to Lame -V3 or more).

Obviously, if Helix were tuned in the 128 kbps area to perform better with metal music, then the performance of Helix with metal music at high bitrates could be even better than it is now... But we all know that this is not HA's interest.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: rednyrg721 on 2008-11-26 21:16:10
The question remains: how can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

That's an interesting question. I would ponder on it a bit ;-) The future of listening tests probably lies in some kind of online database of samples cross-referenced with a database of 'encoder+settings' sets. The tester would request a set of n samples and m 'encoder+settings' sets, which would end in downloading n*m lossless samples (or even some kind of online testing, though I think that would need too much internet speed). For security reasons, some kind of cryptographic signature of the files would probably be needed, in tags for example. Then some kind of future ABC/HR utility would run the test itself (checking the signatures of the files to prevent renaming/mangling) and send the results back with a tester ID attached (if the tester can't ABX an encoded sample from the original, it should be rated 5 automatically). The results would then be added to an online database of results signed with the tester ID. Such a listening test database should grow by itself, and eventually the number of test results for each (sample, encoder+settings) pair should be significant enough. The plus of it all is that each tester can select the samples/encoders he is interested in, and users interested in results can get, for example, a 'virtual' listening test for a selected set of encoders on a selected subset of samples (the kind of music they listen to).
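A minimal sketch of the signature idea above (the key handling and file names are assumptions; a real system might prefer public-key signatures so testers cannot forge tags with a leaked shared key):

Code: [Select]
import hashlib
import hmac

SERVER_KEY = b"secret-key-held-by-the-test-server"   # made-up shared key

def sign_file(path):
    # Keyed hash of the distributed sample, stored e.g. in a tag
    with open(path, "rb") as f:
        return hmac.new(SERVER_KEY, f.read(), hashlib.sha256).hexdigest()

def verify_file(path, expected_tag):
    # Reject renamed or modified samples before the ABC/HR session starts
    return hmac.compare_digest(sign_file(path), expected_tag)

# tag = sign_file("Sample01_encoderX.wav")          # server side, when packaging
# ok  = verify_file("Sample01_encoderX.wav", tag)   # client side, in the test utility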
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-26 21:19:15
Well, that is your personal opinion...

Probably the personal opinion of the entire iPod market, too. The ones who just don't give a heck about hi-fi, or have spoilt their ears enough not to give one. Just a small reminder.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 21:50:05
Full results are online now. Hopefully I didn't miss anything.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-26 22:04:18
Quote
Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge.  Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations. The confidence interval bars are about as useless as kidneys are to a human.


Yes Sir!

Quote
The question remains: how can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

Exactly... and I didn't see anything near that.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: lvqcl on 2008-11-26 22:23:51
Full results are online now. Hopefully I didn't miss anything.


#3 on picture: "linchpin". On text: san francisco bay = sfbay.

"linchpin" should be #11. And IIRC this sample from Fear Factory -- Digimortal.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-26 22:27:01
Quote

Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge.  Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations. The confidence interval bars are about as useless as kidneys are to a human.


Yes Sir!


Quote
The question remains: how can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

Exactly... and I didn't see anything near that.


Yes Sir! Haha! Too funny 

There is nothing wrong with the test itself, really; it's just that the encoders are very equal, that's all. No test in the world would point out a clear winner in this case. If it did, then that test would be f*cked up.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-26 22:34:50
...There is a chance that users can get significant encoder differentiation for their individual needs....

I was a bit over-optimistic here. I just figured out what the detailed results mean for my preferred genres. For these I can focus on samples 1-8, 10, 13, with special emphasis on samples 3, 4, 7, and 10.
I just looked at what comes out when concentrating on the Lame 3.98, FhG, and Helix results. Well, a real differentiation of the results isn't possible even when concentrating on 'my' music. (I will give FhG a try nonetheless, but there's no foundation for it from the test, just a personal decision.)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-26 22:45:20

Full results are online now. Hopefully I didn't miss anything.


#3 on picture: "linchpin". On text: san francisco bay = sfbay.

"linchpin" should be #11. And IIRC this sample from Fear Factory -- Digimortal.


Thanks! Fixed.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: guruboolez on 2008-11-26 22:50:46
There is a chance that users can get significant encoder differentiation for their individual needs.

A very small chance. What conclusion would someone draw when encoder X fails on one sample of genre Y? LAME 3.97 didn't perform extremely well on castanets.wav. Should I conclude that it's less suitable for Spanish music than other encoders? Anyway, where exactly is the problem: in the guitar? the castanets? the background noise? the quietest part of this dynamic sample? Should guitar lovers conclude anything about this sample if the problem only lies in the noisy background? A musical genre doesn't define a problem class. I hope I'm clear enough.

Nobody could seriously draw any conclusion by generalizing from the performance you get on one single sample. An encoder could be perfectly transparent on a metal sample in a listening test and still suffer from several flaws easily audible across the full Iron Maiden discography. Conclusion = 0. And conversely, an encoder could reveal a strong artefact which may turn out to be totally isolated and not reproducible. Same thing: conclusion = 0.
To reach a strong conclusion about a musical genre/taste, you need dozens of samples from the same category and a look at the overall performance... which means a listening test entirely dedicated to a specific genre (and several listeners interested in such a test).

Quote
Looking only at the results averaged over all the samples is pretty pointless, especially when results are tied, and it looks like some members have only this in mind.

Pointless? I find it a bit strange that you blamed me a few hours ago for "killing" the interest of this test and that now you're saying that looking at the overall conclusion is “pointless”?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-27 00:34:54
I have posted some ABX logs and samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=67548) of tracks that show Helix's major flaws.
We know now that Helix has major flaws for you with metal, so chance is high that this is relevant to other metal lovers too. It is also backed up by the test where Helix shows its worst behavior with metal.

Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.

Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Your ABX tests only show that there exists a problem with metal music in the 128 kbps area, and nothing more... suggesting that Helix is in general poor with metal music based on a few tests in the 128 kbps area doesn't make any sense.

I tried a few metal tracks with Helix at V100 "-X2 -U2 -V100 -HF", and it still sucks at metal.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:39:50

File A: C:\Temp\Helix Mp3\V100\Fear Factory - Demanufacture\04. Replica.mp3
File B: C:\Rips\Fear Factory - Demanufacture\04. Replica.flac

23:39:50 : Test started.
23:40:00 : 01/01  50.0%
23:40:11 : 02/02  25.0%
23:40:19 : 03/03  12.5%
23:40:28 : 04/04  6.3%
23:40:34 : 05/05  3.1%
23:40:40 : 06/06  1.6%
23:40:46 : 07/07  0.8%
23:40:56 : 08/08  0.4%
23:41:02 : 09/09  0.2%
23:41:09 : 10/10  0.1%
23:41:35 : 11/11  0.0%
23:41:41 : 12/12  0.0%
23:41:43 : Test finished.

 ----------
Total: 12/12 (0.0%)

Splashy drums at the start, but a huge improvement over V60.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:43:32

File A: C:\Temp\Helix Mp3\V100\Fear Factory - Digimortal\05. Linchpin.mp3
File B: C:\Rips\Fear Factory - Digimortal\05. Linchpin.flac

23:43:32 : Test started.
23:43:43 : 01/01  50.0%
23:43:48 : 02/02  25.0%
23:44:00 : 03/03  12.5%
23:44:06 : 04/04  6.3%
23:44:13 : 05/05  3.1%
23:44:26 : 06/06  1.6%
23:44:33 : 07/07  0.8%
23:44:41 : 08/08  0.4%
23:44:51 : 09/09  0.2%
23:44:56 : 10/10  0.1%
23:45:07 : 11/11  0.0%
23:45:15 : 12/12  0.0%
23:45:16 : Test finished.

 ----------
Total: 12/12 (0.0%)

Horrid warbling from the start, still sounds really bad.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:48:25

File A: C:\Temp\Helix Mp3\V100\Metallica - Ride The Lightning\05. Trapped Under Ice.mp3
File B: C:\Rips\Metallica - Ride The Lightning\05. Trapped Under Ice.flac

23:48:25 : Test started.
23:48:40 : 01/01  50.0%
23:48:53 : 02/02  25.0%
23:49:08 : 03/03  12.5%
23:49:15 : 04/04  6.3%
23:49:30 : 05/05  3.1%
23:49:34 : 06/06  1.6%
23:49:52 : 07/07  0.8%
23:49:59 : 08/08  0.4%
23:50:14 : 09/09  0.2%
23:50:22 : 10/10  0.1%
23:50:28 : 11/11  0.0%
23:50:35 : 12/12  0.0%
23:50:36 : Test finished.

 ----------
Total: 12/12 (0.0%)

Warbling on the guitar is still there, but a very good improvement over V60.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/27 00:14:36

File A: C:\Temp\Helix Mp3\V100\Metallica - Ride The Lightning\04. Fade To Black.mp3
File B: C:\Rips\Metallica - Ride The Lightning\04. Fade To Black.flac

00:14:36 : Test started.
00:15:25 : 01/01  50.0%
00:15:31 : 02/02  25.0%
00:15:43 : 03/03  12.5%
00:15:51 : 04/04  6.3%
00:15:59 : 05/05  3.1%
00:16:06 : 06/06  1.6%
00:16:14 : 07/07  0.8%
00:16:25 : 08/08  0.4%
00:16:38 : 09/09  0.2%
00:16:48 : 10/10  0.1%
00:16:51 : Test finished.

 ----------
Total: 10/10 (0.1%)

Drum smear and warbling on guitars.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:58:44

File A: C:\Rips\Ministry - Rio Grande Blood\05. Lieslieslies.flac
File B: C:\Temp\Helix Mp3\V100\Ministry - Rio Grande Blood\05. Lieslieslies.mp3

23:58:44 : Test started.
23:59:35 : 01/01  50.0%
23:59:45 : 02/02  25.0%
23:59:57 : 03/03  12.5%
00:00:08 : 04/04  6.3%
00:00:23 : 04/05  18.8%
00:00:31 : 05/06  10.9%
00:00:40 : 06/07  6.3%
00:00:46 : 07/08  3.5%
00:00:54 : 08/09  2.0%
00:01:01 : 09/10  1.1%
00:01:12 : 10/11  0.6%
00:01:19 : 11/12  0.3%
00:01:24 : 12/13  0.2%
00:01:28 : 13/14  0.1%
00:01:37 : 14/15  0.0%
00:01:46 : 15/16  0.0%
00:01:53 : 16/17  0.0%
00:01:55 : Test finished.

 ----------
Total: 16/17 (0.0%)

A pre-echo at around 0:27 - 0:29, and plenty of guitar warbling near the end of the track.
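
For readers unfamiliar with foo_abx output: the percentage column is simply the probability of scoring at least that many correct answers out of that many trials by pure guessing. A minimal sketch of that calculation (the underlying binomial formula, not foobar2000's actual code):

Code: [Select]
# Probability of getting at least `correct` of `trials` ABX trials right by
# pure guessing (p = 0.5). This reproduces the percentages printed in the logs.
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(f"{abx_p_value(12, 12):.4%}")  # 12/12 -> 0.0244%, shown rounded as 0.0%
print(f"{abx_p_value(16, 17):.4%}")  # 16/17 -> 0.0137%, also shown as 0.0%

The lower this number, the less likely the result is due to chance.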
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-27 00:53:49
Is this basically the end of mp3 by reaching its maximum level of quality being that all codecs are tied?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: level on 2008-11-27 02:42:50
@/mnt

Please; Could you perform the test again with the same samples?

but now with this command line:

-V122 -X2 -HF2 -SBT500 -TX0 -C0

Copy and paste EXACTLY as this is written.

Thanks.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Steve999 on 2008-11-27 03:00:35
Full results are online now. Hopefully I didn't miss anything.


Thanks so much everyone!  Sebastian, I sympathize with you, after putting forth the effort and conducting a very nice test, there are so many arguments!  I had agonized over whether the switch from Lame 3.97 to Lame 3.98 was worth it, and the answer appears to be yes, quite likely, very much so, for some types of samples.  So I will keep using LAME 3.98.  Thanks to all of the LAME developers for your good work.

The Helix results look very interesting.  I am not much of an expert.  I downloaded the Helix encoder from Rarewares, but I don't see a guide on how to set the switches.  I like to try new things just for fun.  I'm odd that way.

So, if someone could help me: if I want to use Helix in the low-to-mid 200 kbps range (which is what I use nowadays since memory and hard disk space are so much cheaper), what would be the standard-issue switch settings?  I assume I could use it with EAC (which is what I use)?  Is there a link to a guide for setting the switches?  I can't find the documentation.  For me a decent encoder in this range is fine.  I'm interested in the speed and in trying something new.

And most of all, thank you SO MUCH to everyone who participated in setting up and conducting the testing.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-27 03:13:17
I had agonized over whether the switch from Lame 3.97 to Lame 3.98 was worth it, and the answer appears to be yes, quite likely, very much so, for some types of samples.

There really isn't much that's legitimate to argue about.  The problem is that people seem to want to extrapolate the results for a single sample to the entire genre of that sample; and it extends beyond this particular thread.  Without conducting tests on additional samples and showing correlation, such claims simply aren't credible.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alexxander on 2008-11-27 08:45:32
Is this basically the end of mp3 by reaching its maximum level of quality being that all codecs are tied?

Define "end of mp3". Judging from this listening test clearly there's room for quality improvement as some encoders perform clearly better on some samples and worse on others.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Alex B on 2008-11-27 09:29:04
Judging from this listening test, there's clearly room for quality improvement, as some encoders perform clearly better on some samples and worse on others.

That is one of the reasons I started the sample-specific threads. I hope the threads will help the developers (mainly the LAME developers who are active here) to better understand what kind of problems the testers noticed.

I have added new threads for samples #3 and #4. More are coming.
http://www.hydrogenaudio.org/forums/index.php?showforum=40 (http://www.hydrogenaudio.org/forums/index.php?showforum=40)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-27 09:39:45
Is this basically the end of mp3 by reaching its maximum level of quality being that all codecs are tied?
You seem to enjoy throwing inflammatory one-liners into this thread.  Nice avatar BTW.

Thanks so much everyone! Sebastian, I sympathize with you, after putting forth the effort and conducting a very nice test, there are so many arguments!
I suspect that Sebastian is very pleased about the debate that his test has created.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-27 10:02:43

There is a chance that users can get significant encoder differentiation for their individual needs.

A very small chance. ...

You're right, as you can read in my last post.
BTW, I guess I can see what our differences are about: I care about other things than you do. For instance, it's not a question to me whether encoder X is better than encoder Y, first of all because of the difficulty of defining 'better'. I think most members here, including you, would have been happy if there had been a clear ordering in the overall results with well-separated confidence intervals. While in a formal sense this is meaningful (also to me), it is not the most important thing to me.
If, for instance, the best encoder in this sense had scored modestly on samples 3, 4, 7, 10 (those samples of very special importance to me - see my last post), that encoder would not be interesting to me.
In the end it comes down to this: I only care about results on the samples that matter to me (1-8, 10, 13, with special emphasis on 3, 4, 7, 10), and I'd like to see results which are pretty close to 5 (because only in that case is there strong agreement among all the participants that the outcome of the particular encoder on that sample is good). Sure, 'pretty close to 5' is a weak statement, but at 128 kbps I can't expect a stronger formulation if I want to get an answer at all.
And I do want an answer, because the question of saving storage space has become relevant to me: I plan to use a Meizu M6 SL DAP (I guess I'll get it for Xmas), which unfortunately has only 8 GB. This test came in quite handy, and I was surprised that I could ABX only pretty few encoders on pretty few samples (not counting the very low anchor). So the very high quality demands I have had so far (and which I could easily afford with my 40 GB iRiver H140) are inappropriate. So I take these test results as a starting point for my encoder choice, I have to reconsider those problem samples I care about (I still do, but I will drop some of minor practical significance to me and will be content when an encoder reaches a non-obvious-issue level), and once I have found a good candidate on this basis (it shouldn't be too hard), I will do intensive listening tests with 'normal' music of my favorite kind.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Sebastian Mares on 2008-11-27 10:04:23
Thanks so much everyone! Sebastian, I sympathize with you, after putting forth the effort and conducting a very nice test, there are so many arguments!
I suspect that Sebastian is very pleased about the debate that his test has created.


Of course. Otherwise, the test would be pretty much pointless and would give me the feeling that nobody cares.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: /mnt on 2008-11-27 10:48:35
@/mnt

Please; Could you perform the test again with the same samples?

but now with this command line:

-V122 -X2 -HF2 -SBT500 -TX0 -C0

Copy and paste EXACTLY as this is written.

Thanks.

Since I am busy today, I can only do a few tests.

Anyway, I did a few tracks.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/27 10:29:05

File A: C:\Temp\Helix Mp3\V122\Metallica - Ride The Lightning\04. Fade To Black.mp3
File B: C:\Rips\Metallica - Ride The Lightning\04. Fade To Black.flac

10:29:05 : Test started.
10:30:15 : 00/01  100.0%
10:30:29 : 01/02  75.0%
10:30:33 : 02/03  50.0%
10:30:39 : 03/04  31.3%
10:30:45 : 04/05  18.8%
10:30:50 : 05/06  10.9%
10:30:55 : 06/07  6.3%
10:31:00 : 07/08  3.5%
10:31:05 : 08/09  2.0%
10:31:10 : 09/10  1.1%
10:31:14 : 10/11  0.6%
10:31:22 : 11/12  0.3%
10:31:49 : 12/13  0.2%
10:31:54 : 13/14  0.1%
10:32:06 : 14/15  0.0%
10:32:08 : Test finished.

 ----------
Total: 14/15 (0.0%)

Warbling is almost gone at 3:55, but I think I spotted a pre-echo after the snare drum. Still, a lot better.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/27 10:32:57

File A: C:\Rips\Metallica - Ride The Lightning\05. Trapped Under Ice.flac
File B: C:\Temp\Helix Mp3\V122\Metallica - Ride The Lightning\05. Trapped Under Ice.mp3

10:32:57 : Test started.
10:33:17 : 01/01  50.0%
10:33:24 : 02/02  25.0%
10:33:31 : 03/03  12.5%
10:33:50 : 04/04  6.3%
10:34:01 : 05/05  3.1%
10:34:07 : 06/06  1.6%
10:34:22 : 07/07  0.8%
10:34:31 : 08/08  0.4%
10:34:39 : 09/09  0.2%
10:34:47 : 10/10  0.1%
10:34:55 : 11/11  0.0%
10:34:56 : Test finished.

 ----------
Total: 11/11 (0.0%)

Guitar warbling from the start is still there, but a lot better.

Code: [Select]
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/27 10:36:14

File A: C:\Rips\Fear Factory - Digimortal\05. Linchpin.flac
File B: C:\Temp\Helix Mp3\V122\Fear Factory - Digimortal\05. Linchpin.mp3

10:36:14 : Test started.
10:36:30 : 01/01  50.0%
10:36:37 : 02/02  25.0%
10:36:48 : 03/03  12.5%
10:36:58 : 04/04  6.3%
10:37:05 : 05/05  3.1%
10:37:12 : 06/06  1.6%
10:37:20 : 07/07  0.8%
10:37:27 : 08/08  0.4%
10:37:32 : 09/09  0.2%
10:37:40 : 10/10  0.1%
10:37:48 : 11/11  0.0%
10:37:56 : 12/12  0.0%
10:37:57 : Test finished.

 ----------
Total: 12/12 (0.0%)

Hardly any improvement with sample 11, warbling and smearing.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: sld on 2008-11-27 11:32:41
Maybe LAME 3.98.2 is far into the realm of diminishing returns at this stage (for 128 kbps), when taking into account the tradeoff in encoding speed.

Does LAME have to trade its quality to reach Helix's speed? Does Helix have to trade its speed for LAME's quality (consistency and less metal artifacting)?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-27 12:44:10
Quote
-V122 -X2 -HF2 -SBT500 -TX0 -C0

Where did you get that command line from? I've asked about this before; we should explore the existing switches to come up with the best possible settings. It's the best we can do, since tuning Helix itself doesn't seem like an option.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-27 13:13:35
Two things to consider are that these were all problem samples and that people were paying as much attention as they could to hear errors.

I think in normal conditions, people would not be so picky when listening.

Like, when I'm listening to music while working, I don't pay even half the attention to the music.

Same if I'm in a meeting with friends and we have background music.

I think in "reality", encoders sound better than in tests like this one.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Polar on 2008-11-27 15:59:54
For the people interested, here are the samples (I don't know how long they will be available, since my account expires on December 1st):

http://rapidshare.com/files/167638538/Sample01.zip (http://rapidshare.com/files/167638538/Sample01.zip)
http://rapidshare.com/files/167638513/Sample02.zip (http://rapidshare.com/files/167638513/Sample02.zip)
http://rapidshare.com/files/167638551/Sample03.zip (http://rapidshare.com/files/167638551/Sample03.zip)
http://rapidshare.com/files/167638514/Sample04.zip (http://rapidshare.com/files/167638514/Sample04.zip)
http://rapidshare.com/files/167638550/Sample05.zip (http://rapidshare.com/files/167638550/Sample05.zip)
http://rapidshare.com/files/167638524/Sample06.zip (http://rapidshare.com/files/167638524/Sample06.zip)
http://rapidshare.com/files/167638545/Sample07.zip (http://rapidshare.com/files/167638545/Sample07.zip)
http://rapidshare.com/files/167638522/Sample08.zip (http://rapidshare.com/files/167638522/Sample08.zip)
http://rapidshare.com/files/167638544/Sample09.zip (http://rapidshare.com/files/167638544/Sample09.zip)
http://rapidshare.com/files/167638527/Sample10.zip (http://rapidshare.com/files/167638527/Sample10.zip)
http://rapidshare.com/files/167638543/Sample11.zip (http://rapidshare.com/files/167638543/Sample11.zip)
http://rapidshare.com/files/167638554/Sample12.zip (http://rapidshare.com/files/167638554/Sample12.zip)
http://rapidshare.com/files/167638529/Sample13.zip (http://rapidshare.com/files/167638529/Sample13.zip)
http://rapidshare.com/files/167638525/Sample14.zip (http://rapidshare.com/files/167638525/Sample14.zip)

Or an all-in-one ZIP from kwanbis:

http://www.megaupload.com/de/?d=13B7NWEP (http://www.megaupload.com/de/?d=13B7NWEP)
They'll also still be available at
http://listeningtest.vanquickel.be/Sample01.zip (http://listeningtest.vanquickel.be/Sample01.zip)
http://listeningtest.vanquickel.be/Sample02.zip (http://listeningtest.vanquickel.be/Sample02.zip)
http://listeningtest.vanquickel.be/Sample03.zip (http://listeningtest.vanquickel.be/Sample03.zip)
http://listeningtest.vanquickel.be/Sample04.zip (http://listeningtest.vanquickel.be/Sample04.zip)
http://listeningtest.vanquickel.be/Sample05.zip (http://listeningtest.vanquickel.be/Sample05.zip)
http://listeningtest.vanquickel.be/Sample06.zip (http://listeningtest.vanquickel.be/Sample06.zip)
http://listeningtest.vanquickel.be/Sample07.zip (http://listeningtest.vanquickel.be/Sample07.zip)
http://listeningtest.vanquickel.be/Sample08.zip (http://listeningtest.vanquickel.be/Sample08.zip)
http://listeningtest.vanquickel.be/Sample09.zip (http://listeningtest.vanquickel.be/Sample09.zip)
http://listeningtest.vanquickel.be/Sample10.zip (http://listeningtest.vanquickel.be/Sample10.zip)
http://listeningtest.vanquickel.be/Sample11.zip (http://listeningtest.vanquickel.be/Sample11.zip)
http://listeningtest.vanquickel.be/Sample12.zip (http://listeningtest.vanquickel.be/Sample12.zip)
http://listeningtest.vanquickel.be/Sample13.zip (http://listeningtest.vanquickel.be/Sample13.zip)
http://listeningtest.vanquickel.be/Sample14.zip (http://listeningtest.vanquickel.be/Sample14.zip)

To throw in my own, perhaps simplistic, two cents: with all contenders yielding a 4.5-ish tie, I personally see little interest in endless debates about which encoder comes out best from whichever point of view, as all have once again proven to have bumped into the boundaries of the practically testable.  One of the few, maybe the only, conclusions to draw, imho, is that except for killer samples, 128k on any modern encoder is no longer interesting to test at this scale.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-27 16:53:25
You seem to enjoy throwing inflammatory one-liners into this thread.  Nice avatar BTW.


Did you find my one-liner inflammatory? I don't think it is. I think perhaps it was a naive question that was interpreted as sarcasm (which is quite often used by other members, usually unremarked), but I really meant it this way: is there more room for MP3 improvements, given the known limitations of the format?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: TechVsLife on 2008-11-27 17:47:04
Does this latest test suggest that the recommended settings for LAME be changed (or any other part of the ha wiki)?  And one specific point: is -v2 still the recommended minimum setting for transparency for most listeners in good listening environments etc., or should that be lower now? 



Thanks for any comments (and thanks to Sebastian Mares for conducting the test).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: gerwen on 2008-11-27 18:04:00
Two things to consider are that these were all problem samples and that people were paying as much attention as they could to hear errors.

I think in normal conditions, people would not be so picky when listening.

Like, when I'm listening to music while working, I don't pay even half the attention to the music.

Same if I'm in a meeting with friends and we have background music.

I think in "reality", encoders sound better than in tests like this one.


True enough, but do you really want your encoder to let you down when you do decide to pay specific attention? 

Imo, pick your encoder and bitrate based on very critical listening and tough samples.  At least then you can be reasonably confident that you won't hear artifacts when you do crank it up.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-11-27 18:38:09
  You seem to enjoy throwing inflammatory one-liners into this thread.  Nice avatar BTW.
Did you find my one-liner inflammatory? I don't think it is. I think perhaps it was a naive question that was interpreted as sarcasm (which is quite used by other members, often unremarked) but I really meant in this way: if there were more room for MP3 improvements, because of the known limitations bounded to the format.
Yes, I do see the suggestion that this is the "end of mp3" as inflammatory.  If you say that is not your intention then so be it.  FYI, I have compiled a few more of your statements from this thread as background for my remark, by way of explanation:

Does that make Helix the new recommended MP3 encoder, or has it to be LAME because it's open source?
Edit: Both are open source.
They are all technically tied, but Helix outperformed all of them. Also, the encoding speed compared to LAME is absurdly faster. Could these two arguments qualify Helix as the new recommended MP3 encoder? (LAME being the second recommended)
* Facts
...
Helix performed a bit better than LAME in this test.
I have nothing to add; like, I am not saying Helix is BETTER than LAME, I didn't say that... but the numbers are there, and I am gonna stick with the numbers. You can't argue against the numbers.
Is this basically the end of mp3 by reaching its maximum level of quality being that all codecs are tied?
I'll say no more on the subject now.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Destroid on 2008-11-27 18:41:51
Interesting results. It appears 128 kbps really is enough for general-use MP3 encoding after all. I actually stick to lossless, but I was satisfied with CBR 128 kbps Helix on the old P3-733 system because of its advantageous speed and acceptable quality.

If anything, these results should help get rid of those ridiculous threads about customized encoding settings (you know, the ones with "-V 0 -m s -b320 -B320" et cetera).
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-27 18:50:13
Does this latest test suggest that the recommended settings for LAME be changed (or any other part of the ha wiki)?
No.

is -v2 still the recommended minimum setting for transparency for most listeners in good listening environments etc., or should that be lower now?
-V2 is not recommended as the minimum setting for transparency.  If you're talking about the wiki, which is intended only as a general guideline, the minimum setting is -V3.  This is not to say that -V4 might not also deliver transparency to a large number of people over a large number of tracks, or even -V5.  Consider that -V5.7 was used in this test for LAME 3.98.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: TechVsLife on 2008-11-27 19:10:16
Thanks. I must have been mistakenly thinking of -V2 as the minimum standard for transparency from some even older recommendation. I'll consider -V3 a conservative and safe minimum standard for transparency for most (the vast majority of?) listeners under even very good conditions (i.e. quiet, but not ABX testing). I understand this is only a very general guideline, listeners vary, etc.

The reason I mention this test as possibly changing the wiki is that if the perceived quality of the MP3 encoders at 128 kbps is now higher than it was a couple of years ago (e.g. getting closer to "5"), then that might shift the recommendations a bit. The graph on the wiki of LAME -V settings against both resulting quality and file size is very useful; I'm assuming that's still consistent with the latest results.

Scroll down a little from here for the graph:
http://wiki.hydrogenaudio.org/index.php?ti...ate.29_settings (http://wiki.hydrogenaudio.org/index.php?title=LAME#VBR_.28variable_bitrate.29_settings)


Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Neasden on 2008-11-27 22:00:02
Quote
Yes, I do see the suggestion that this is the "end of mp3" as inflammatory.


Sorry if I offended anyone with those one-liners.
Exiting the thread.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: jimmy69 on 2008-11-28 01:48:20
Has anyone else noticed that the iTunes encoder actually performed very well, looking at the results?  I mean, all this time members of this forum have strongly recommended against using the MP3 encoder in iTunes, only to find that it performed just as well as LAME 3.97, which was considered a very good encoder for a long time.  On top of that, Apple has now updated iTunes with the fixed version of their MP3 encoder.  From these results I think it is fair to say to people who use iTunes and would prefer the ease of using the built-in MP3 encoder: go for it.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-28 08:26:20
... I mean, all this time members of this forum have strongly recommended against using the MP3 encoder in iTunes ...

Yes, I've never liked tendencies like these on HA (I've never seen a strong reason for them), and I like that the outcome of this test lets everybody pick their favorite for non-quality-related reasons as well.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-28 11:42:50
I mean, all this time members of this forum have strongly recommended against using the MP3 encoder in iTunes,

Probably based on the last mp3 listening test where it SUCKED.

It is really good that now it does not.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: lvqcl on 2008-11-28 16:14:41
I mean, all this time members of this forum have strongly recommended against using the MP3 encoder in iTunes,

Probably based on the last mp3 listening test where it SUCKED.

It is really good that now it does not.

It is really good that Sebastian has a dual-core computer, and Alex B doesn't.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: singaiya on 2008-11-28 20:02:21


You should have brought in your peers (yourself too) to inflate the sample size (no. of participants), so that the magical black bars decrease in length


That's what I thought happened too, but it seems not to have had an effect: if you look at the first sample, which had 39 listeners, the bars are about as long as for the second sample, which had 26 listeners, and definitely longer than for the third sample, which also had 26 listeners.


It does have an effect. I never said it is the only thing that influences the error margins.


What are the other factors?


I mean, all this time members of this forum have strongly recommended against using the MP3 encoder in iTunes,

Probably based on the last mp3 listening test where it SUCKED.


That still seems an overly strong interpretation to me. iTunes did lose to Lame 3.95 and AudioActive, but in the results (http://www.rjamorim.com/test/mp3-128/results.html), Roberto says at the beginning: "I would like to point out two very serious issues with this test: not using the latest version of Xing, bundled with Real Player, that has been reportedly extensively tuned since version 1.5; and forcing VBR on codecs that shouldn't be using them. I'm confident iTunes MP3 would perform better if it was featured at CBR 128, and the same might apply to FhG. I take full responsability on those mistakes, and for them, I apologize."

To me, "sucks" means not only losing the test (without caveats of choosing the wrong setting) but also a score below 3.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: benski on 2008-11-28 20:11:49
That still seems an overly strong interpretation to me. iTunes did lose to Lame 3.95 and AudioActive, but in the results (http://www.rjamorim.com/test/mp3-128/results.html), Roberto says at the beginning: "I would like to point out two very serious issues with this test: not using the latest version of Xing, bundled with Real Player, that has been reportedly extensively tuned since version 1.5; and forcing VBR on codecs that shouldn't be using them. I'm confident iTunes MP3 would perform better if it was featured at CBR 128, and the same might apply to FhG. I take full responsability on those mistakes, and for them, I apologize."

To me, "sucks" means not only losing the test (without caveats of choosing the wrong setting) but also a score below 3.


Helix (somewhat-formerly Xing), iTunes and FhG were all tested with VBR this go-around.  This makes the comparison to that previous test all the more relevant.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: singaiya on 2008-11-28 20:20:43
Good point. I still think "sucks" lines up more with a score of 1, otherwise the wording on ABC/HR rankings should be revised 
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Raiden on 2008-11-28 20:35:24
...the results (http://www.rjamorim.com/test/mp3-128/results.html), ...

The interpretation of that result surprises me. Roberto said "Although iTunes is a little tied with Gogo, it's safe to say it lost." But actually iTunes was tied with FhG, Gogo and Xing, so it wasn't safe to say "it lost".
Also, LAME was tied with AudioActive, so "Lame wins, followed by AudioActive" is also not valid.
Or am I misinterpreting the graph?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-28 20:38:31
I think you're right, Raiden.

The iTunes mp3 bashing based on the results of that test has always annoyed me.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-28 22:07:02
(http://www.rjamorim.com/test/mp3-128/plot12z.png)
It was at 3.04, against LAME's 3.74. LAME won that one, and iTunes did not offer anything better, so it sucked.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-11-28 23:32:08
Not you too, kwanbis!

The difference between LAME and iTunes in that test could be smaller than 0.3; less than half the amount you're suggesting.  Not to mention the bitrate of the samples created with iTunes was consistently and significantly lower than LAME's.

By the same token, the difference could be greater than 1.1.  Even so, the conclusions people draw from that test are annoying.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: null-null-pi on 2008-11-29 02:55:53
That's why I'm thinking something like http://www.hydrogenaudio.org/forums/index....mp;#entry601863 (http://www.hydrogenaudio.org/forums/index.php?showtopic=67547&st=0&#entry601863) could be beneficial. I think it could be useful if there were some kind of instructions on "how to do the test properly".
I know this has already been partially discussed, but I felt it was a good moment to mention it.
Also, everyone participating in such a test should look up information on how to interpret the results (maybe provide a FAQ or something like that?).

edit: trying to correct errors
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-11-29 03:28:08
Not you too, kwanbis!

The difference between LAME and iTunes in that test could be smaller than 0.3; less than half the amount you're suggesting.  Not to mention the bitrate of the samples created with iTunes was consistently and significantly lower than LAME's.

By the same token, the difference could be greater than 1.1.  Even so, the conclusions people draw from that test are annoying.

The point is that at that time even LAME sucked, but it sucked the least. So with all encoders being "ho-hum", what was the point of recommending the worst one? LAME had statistically won, even if by 0.3 or 1.1. So, what was the point of recommending iTunes?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: krabapple on 2008-11-29 03:41:32
I think what this thread is telling us is that HA needs a wiki page on statistics... and it should be required reading before posting about public listening test results, just as we require ABX results for audio claims.

(No, I won't be writing it; I'm not a stats expert. I have a few biostatistics books on my reference shelf, and I know just enough to be suspicious of claims being made here about some codecs being better than others, based on these results.  In other words, I'm with greynol: there is no statistically significant  difference I can see here, for general guidance).

Btw, who here does have a solid background in statistical analysis?  Just curious.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-29 14:42:06
A more profound knowledge of statistics can help in figuring out that it's necessary to respect the confidence intervals when judging the overall average outcome.
The deeper problem is: what is the worth of this overall average outcome? We all want to keep life easy, but IMO the average outcome is considered so important by many people only because it's such a simple scheme.
I personally prefer an outcome that shows that a certain candidate is good, or at least not bad, on all of the samples tested (or at least those samples that have a personal meaning). I'm well aware that this (or similar schemes) brings some amount of subjectivity (but does not at all make things arbitrary) and brings no general consensus, but in the end deciding on an encoder to use is an individual decision.
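
To make the "respect the confidence intervals" point concrete, here is a minimal sketch using made-up rating vectors (not the actual test data) and a plain normal approximation, which is not necessarily the analysis behind the published plots:

Code: [Select]
# Mean and rough 95% confidence interval per encoder, from hypothetical
# listener scores on the 1-5 scale (NOT the real test data).
from math import sqrt
from statistics import mean, stdev

ratings = {
    "encoder_a": [4.8, 4.5, 4.9, 4.2, 4.7, 4.6, 4.4, 4.8],
    "encoder_b": [4.6, 4.7, 4.3, 4.9, 4.5, 4.4, 4.8, 4.6],
}

for name, scores in ratings.items():
    m = mean(scores)
    half = 1.96 * stdev(scores) / sqrt(len(scores))  # normal-approximation half-width
    print(f"{name}: {m:.2f} +/- {half:.2f}")

If the two intervals overlap, the data alone cannot rank the encoders, which is exactly the "statistically tied" situation in the overall plot.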
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-30 10:18:05
I wanted to give FhG a try because of this test's outcome, and I wanted to concentrate on the most serious tonal problems I know (herding_calls, trumpet, trumpet_myPrince), which I wanted to get to a non-obvious-issue level that shouldn't be at all annoying with careful listening. And I wanted to stay at around 200 kbps maximum (having to care about file size in the near future).

What I learnt from the test is that I don't have to care about the extreme HF region and can be content with HF up to 16 kHz, which is very favorable when using MP3.

As a reference I used LAME 3.98.2. I figured out that -V0.5 --lowpass 16.0 is the setting which brings the desired quality for me. With my test set of typical regular music, the average bitrate for this setting is 205 kbps.

Then I tried FhG surround, but didn't manage to find a competitive setting (even with lowpassing before encoding with one of the higher quality settings). So FhG isn't an alternative for me.

Though I didn't want to pick up Helix again, as I decided some time ago not to use it, I was curious how it behaved. At relatively low bitrates Helix has serious problems with all these samples (especially the 'tremolo' effect with trumpet_myPrince). But from a certain point on there is a strong quality increase. With '-X2 -U2 -SBT500 -TX0 -V110' my quality demands are met. With my typical regular music test set, the average bitrate for this setting is 179 kbps. As this is a good result, I looked up those samples where in the past I had found a subtle issue concerning HF behavior and 'vividness'. I couldn't hear an issue, probably due to my reduced demands (and the limited time I wanted to spend on this small test).

Struggling for a lower bitrate with LAME, I arrived at --abr 200 --lowpass 16.0, which gives an average bitrate of 191 kbps with my typical regular test set and meets my quality demands for these bad tonal problems.
But in this bitrate range and for general usage I prefer VBR, now that I'm happy with LAME's VBR behavior.

I will use LAME -V0.5 --lowpass 16.0 in the future. I have no reason to prefer it over Helix. It's just personal - my emotions are more with LAME than with Helix.
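
In case anyone wants to reproduce this kind of "average bitrate over a test set" figure, a simple sketch: total encoded size divided by total playing time. It assumes the mutagen library for reading track lengths, and the folder name is just an example:

Code: [Select]
# Average bitrate of a set of MP3s: total bits / total seconds.
# Assumes the mutagen library is installed; "encoded" is an example folder name.
from pathlib import Path
from mutagen.mp3 import MP3

total_bits = 0.0
total_seconds = 0.0
for path in Path("encoded").glob("*.mp3"):
    total_bits += path.stat().st_size * 8
    total_seconds += MP3(str(path)).info.length

print(f"average bitrate: {total_bits / total_seconds / 1000:.0f} kbps")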
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DigitalDictator on 2008-11-30 17:04:22
Quote
'-X2 -U2 -SBT500 -TX0 -V110'

What do these switches do?

I tried to compare the command line used in the test, '-X2 -U2 -V60' with the one suggested for metal,

'-X2 -HF2 -SBT500 -TX0 -C0 -V60'.

I found the latter to be harder to ABX even though I could ABX both (I only tried it on two tracks). Halb27, have you tried that command line?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-11-30 20:22:08
non-audio related switches:
-X2: MPEG compatible Xing Header
-U2: encoding speedup (uses SSE)
-C0: clear copyright bit

audio related switches:
-Vx: VBR quality (range 0-150)
-HF2: makes use of HF > 16 kHz
-SBTy: short block threshold
-TXz: nobody seems to be able to say what this switch actually does.

level is the one who has studied these switches most intensively, at VBR settings around -V120. As I'm in this bitrate range, I simply follow his settings. I did some tests of my own with TX0...TX8 at -V110, and though I think I can hear differences, they are so subtle that I personally can't say which one is best. The same goes for -SBT500 versus the default setting.
I don't use -HF2 as I don't need frequencies beyond 16 kHz.
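
If someone wants to script these settings, here is a sketch of a batch encode. The binary name "hmp3", the folder names and the argument order ("switches, input, output") are assumptions based on the command lines quoted in this thread, so adjust them for your own build:

Code: [Select]
# Batch-encode WAV files with the Helix CLI encoder using the switches above.
# Assumption: a binary called "hmp3" on the PATH that takes "<switches> in.wav out.mp3".
import subprocess
from pathlib import Path

SWITCHES = ["-X2", "-U2", "-SBT500", "-TX0", "-V110"]

for wav in Path("rips").glob("*.wav"):
    mp3 = wav.with_suffix(".mp3")
    subprocess.run(["hmp3", *SWITCHES, str(wav), str(mp3)], check=True)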
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: pfloding on 2008-12-01 14:18:03
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: halb27 on 2008-12-01 14:56:02
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?

I understand your concern about a lower bitrate test, as this is not attractive to me either (though there may be codecs like HE-AAC which may provide for satisfactory sound quality at 96 kbps).
But as for your remark on sound system I am convinced that this is not the problem. I guess many participants are happy with their sound system. At least I am. BTW artifacts can often be heard even with a bad sound system. Why not accept the fact that 128 kbps provides satisfactory sound quality for most users most of the time?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: pfloding on 2008-12-01 15:34:31

Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation. I'm afraid the conclusion about evaluating lower bit rates sounds as if 128 kbps would provide satisfactory sound quality!?

I understand your concern about a lower bitrate test, as this is not attractive to me either (though there may be codecs like HE-AAC which may provide for satisfactory sound quality at 96 kbps).
But as for your remark on sound system I am convinced that this is not the problem. I guess many participants are happy with their sound system. At least I am. BTW artifacts can often be heard even with a bad sound system. Why not accept the fact that 128 kbps provides satisfactory sound quality for most users most of the time?


Well, ok, let me rephrase the question then: What is the point of comparing codecs to each other rather than comparing to the uncompressed reference? Correct me if I'm wrong, but wasn't the test a comparison between encoders? (With the reference still being a low bit rate compressed format.)

Sure, some encoders are better than others for certain things. But certain other things are not even being considered. (Such as the pretty much non-existent ambient reflected sound field.) Without using a proper reference and a high-quality audio system, tests will just be nit-picking about the lesser-evil influence on this or that musical instrument, but missing entire huge aspects of sound quality!

It would be a good thing to put low-bitrate audio quality into some kind of overall quality context. 128 kbps is not CD quality, it's not LP quality, and it's not even close to a good Philips compact cassette. It sounds nice in a non-offensive way, which is good. Even on my Nano 128 kbps is clearly a lot worse than 224 kbps.

BTW, I'm not trying to start a flame war, just stating what I think are some truths that seem to be forgotten all the time.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: gerwen on 2008-12-01 15:52:34
Well, ok, let me rephrase the question then: What is the point of comparing codecs to each other rather than comparing to the uncompressed reference? Correct me if I'm wrong, but wasn't the test a comparison between encoders? (With the reference still being a low bit rate compressed format.)

Sure, some encoders are better than others for certain things. But certain other things are not even being considered. (Such as the pretty much non-existent ambient reflected sound field.) Without using a proper reference and a high-quality audio system, tests will just be nit-picking about the lesser-evil influence on this or that musical instrument, but missing entire huge aspects of sound quality!

I take it you didn't do the test.  You should, even though it's already over.  There is a reference sample (lossless, I believe) for you to compare each codec against.

Even on my Nano 128 kbps is clearly a lot worse than 224 kbps.

For many (if not most) people, that is simply not true.  Personally I can't tell the difference on 99% of material, with careful listening on a decent set of earphones.  Even for people who can spot the differences, I don't think it is 'clearly a lot worse.'
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Squeller on 2008-12-01 15:55:29
I wanted to give Helix a try for my portable. After a quick test, artifacts in a test track (classical, RVW "Fantasia on Christmas Carols", Hickox, rip, first seconds) disappeared at -V120 - 186 kbps average. LAME performed better even at -V4 (I usually use -V3 for portable listening) --> 162 kbps average - still no artifacts.

However, I was surprised: Helix encodes at 32x, LAME at 16x on my 3 GHz Intel P4 on one core. Only twice as fast as LAME? For Helix: are there optimized compiles out there?
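
For what it's worth, such "x realtime" figures can be measured with something like the sketch below; the binary names, switches and file name are placeholders, and the wave module only handles plain PCM WAV input:

Code: [Select]
# Encoding speed as a multiple of realtime for one track.
# "lame"/"hmp3" on the PATH and "track.wav" are placeholders; adjust as needed.
import subprocess
import time
import wave

def realtime_factor(cmd, wav_path):
    with wave.open(wav_path) as w:
        duration = w.getnframes() / w.getframerate()  # track length in seconds
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return duration / (time.perf_counter() - start)

print("LAME :", realtime_factor(["lame", "-V3", "track.wav", "lame.mp3"], "track.wav"))
print("Helix:", realtime_factor(["hmp3", "-X2", "-U2", "-V110", "track.wav", "helix.mp3"], "track.wav"))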
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Synthetic Soul on 2008-12-01 16:16:01
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation.
Better than what?  You can't possibly know the equipment used by each participant.

As per gerwen's suggestion, it would be great to see some ABX results from you, using the samples tested.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: pfloding on 2008-12-01 16:28:16
Rather than moving to even lower bit rates, I suggest getting a better sound system for evaluation.
Better than what?  You can't possibly know the equipment used by each participant.

As per gerwen's suggestion, it would be great to see some ABX results from you, using the samples tested.


It's true that the systems used are complete unknowns.

I had a look around for the samples, but couldn't locate them at:

http://www.listening-tests.info/mp3-128-1/ (http://www.listening-tests.info/mp3-128-1/)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: lvqcl on 2008-12-01 16:59:55
It's true that the systems used are complete unknowns.

I had a look around for the samples, but couldn't locate them at:

http://www.listening-tests.info/mp3-128-1/ (http://www.listening-tests.info/mp3-128-1/)


You can find links to all the samples on the previous page:
http://www.hydrogenaudio.org/forums/index....st&p=601657 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=67529&view=findpost&p=601657)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: llama peter on 2008-12-02 03:09:10

So what we can say (as conservative scientists) is: We can't be sure that there's a difference in quality between the different encoders. Nothing more.

Exactly. Or, at most: "We can't be sure that there's a difference in quality between the different encoders for this set of samples and for these participants, etc..."

To put the debate about statistical difference on the practical side of the graph, I created a fake one in which I added a lossless encoding as a competitor. It's not quite perfect, as the confidence margins would change a bit, but I don't think a true graph would really look different:

http://img230.imageshack.us/my.php?image=resultsz1my0.png (http://img230.imageshack.us/my.php?image=resultsz1my0.png)

Err, I guess they could if they look at people's comments about what sounded bad in each sample.  That could tip them off to which sample was which if they can hear the same thing; I guess that answers my question.  Although you could just ask people not to look at other people's detailed results while doing the test.  If people wanted to bias themselves, they could have used a Java debugger, or used strace -efile to see which .wav ABC/HR was opening, so you're always depending on people's honesty anyway.

Ok, that's enough ways to tread on statistical thin ice for now.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: ckjnigel on 2008-12-08 18:06:40
Since the results are surprising, I wish Sebastian could moderate a discussion with three authorities.
In other words, a panel discussion with people like Roberto, a Nero rep, Menno, a LAME team member, whatever...
Maybe it could be done by Skype.  I think it would be better to allow participants to jump in and interrupt...
{ed add'n: I'm particularly interested to know if HELIX tweaks may come}
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: kwanbis on 2008-12-12 22:25:41
The much awaited results of the Public, MP3 Listening Test @ 128 kbps are ready - partially.

Sebastian, would you be adding more info? Or is the test finished already?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: Matth on 2008-12-14 18:11:44
Just wandered back into HA after a long time (and my system remembered my password).

One thing I note about the "128k" test is that all encoders other than the low anchor were allowed to average higher than 128k (over the whole sample set), so if reaching 128k for storage capacity/bandwidth cost reasons was required, then they all failed. How they would have compared at 128k ABR, CBR or VBR <= 128k (per sample / over all samples) may have played out differently, as they varied from 8.6 to 12.5% over the target bitrate (on average), with greater excursions above and below for particular samples.
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: DonP on 2008-12-14 18:37:02
Just wandered back into HA after a long time (and my system remembered my password).

One thing I note about the "128k" test, is that all encoders other than the low anchor were allowed to average higher than 128k (over the whole sample set),


To the extent that the samples are selected to be problematic (I don't know if they all were), maybe it's to be expected that they would run higher than the target rate?
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: greynol on 2008-12-14 18:55:02
I tried to do something about misleading test titles and Sebastian even seemed like he was going to go along with it:

http://www.hydrogenaudio.org/forums/index....st&p=421117 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=47313&view=findpost&p=421117)

...oh well!
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: robert on 2008-12-14 19:55:03
@Matth:
You may want to read the paragraph "Is it normal that the bitrate is very high on some samples (even 228 kbps)?" from the introduction to the listening test. (http://www.listening-tests.info/mp3-128-1/index.htm)
Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: SVI on 2009-02-02 16:07:57
Hi to everybody, and sorry for my poor English.

This is my first message to this forum, so I'll try not to be impolite, but there is something in these results that surprised me. I worked as a biostatistician for 5 years, and I cannot understand why all the 95% confidence intervals are equal.

The 95% error is supposed to depend on the sample size (identical for all the inner groups) and on the dispersion around the mean. So if there are different ratings (in the 1 to 5 range), there will be different confidence intervals.

Maybe there is something I have missed, but at first look it "sounds" strange.

And thank you for this incredible forum (I've been a long-time reader until today).

Title: Public MP3 Listening Test @ 128 kbps - FINISHED
Post by: zorba on 2011-06-25 08:29:58
Hi,

I would like to have a look at the results, but this link is dead: http://www.listening-tests.info/mp3-128-1/results.htm (http://www.listening-tests.info/mp3-128-1/results.htm)

Any idea how to get those results?
Thanks

Edit: found! Sorry.
http://listening-tests.hydrogenaudio.org/s...8-1/results.htm (http://listening-tests.hydrogenaudio.org/sebastian/mp3-128-1/results.htm)