Public MP3 Listening Test @ 128 kbps - FINISHED

Reply #125
If the difference between samples where Lame 3.98 scored consistently and significantly higher than Lame 3.97 was due to a known defect of Lame 3.97 that has been corrected in Lame 3.98 ("sandpaper problem"), then possibly. Perhaps a class of samples exists that shows weaknesses new to Lame 3.98.


I remembered this thread: Low Bitrate VBR sounds worse in 3.98. Maybe the workaround for the "sandpaper problem" results in serious bitrate bloat for some tracks at -V7...-V9.
(Edit: punctuation.)

Reply #126
halb27, I will stick with you when you say "some people like defending LAME"... even with a bit of irritation and anger... I have nothing to add. I am not saying Helix is BETTER than LAME, I didn't say that... but the numbers are there, and I am gonna stick with the numbers. You can't argue against the numbers. If you use 100 samples in a new test, it will come out the same.

Quote
How so?

To point out the impotency of your analysis, based on Sebastian's colored graph, Lame 3.97 performed the best on the greatest number of samples (it appears to be tied with Fraunhofer on sample 10). The point is that you have to look at the totality of the test and understand something about statistics. Those vertical bars in the chart summarizing the results are there for a reason and it appears that you have no idea how to interpret them.


If that graph is so misleading, why is it even done this way in a listening test?
I think you have a better proposal for graphs that won't mislead people when they interpret them, because I wouldn't be the only owner of an "impotent analysis".

Quote
We? Let us give Sebastian Mares credit - this was largely a one-man show.
We? How about you? Organize a new test if you like. Don't beg the collective audience to do the work for you.
More People? I'm sure if Sebastian had a magic wand there would have been more people involved - but even with a Slashdotting and repeated extensions there were only a limited number of participants.


I didn't ask anyone for anything.

Reply #127
I am gonna stick with the numbers
No one is defending Lame; it's just that you don't know how to read the numbers.

If you use 100 samples in a new test, it will come out the same.
That's the most absurd thing I've read so far.  You really have no idea what you're talking about.

Reply #128
Quote
No one is defending Lame; it's just that you don't know how to read the numbers.


I hope more people agree with you.

Reply #129
@Neasden

Sebastian's results page is unfinished. There will be comments on the right side of each graph to help people interpret the results correctly.
Example: http://www.listening-tests.info/mf-64-1/results.htm

Reply #130
Two theory questions:
1 - If I were to take the raw data and throw out all responses where the tester did not correctly identify the low anchor, how "valid" would the numbers be?
2 - Why did this listening test not also include a high anchor?
Creature of habit.

Reply #131
A high anchor becomes pointless when some competitors' results are expected at ~4.5/5 (i.e. close to transparency). It also makes the test heavier to perform for each listener, and if I'm not wrong it should bring additional statistical noise (i.e. longer error bars) to the final results.

Reply #132

I have posted some ABX logs and samples of tracks that show Helix's major flaws.

We know now that Helix has major flaws for you with metal, so chances are high that this is relevant to other metal lovers too. It is also backed up by the test, where Helix shows its worst behavior with metal.


Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.
"I never thought I'd see this much candy in one mission!"

Reply #133


I have posted some ABX logs and samples of tracks that show Helix's major flaws.

We know now that Helix has major flaws for you with metal, so chances are high that this is relevant to other metal lovers too. It is also backed up by the test, where Helix shows its worst behavior with metal.


Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.


Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Your ABX tests only show that there is a problem with metal music in the 128 kbps area, and nothing more... suggesting that Helix is poor with metal music in general on the basis of a few tests in the 128 kbps area doesn't make any sense.

Reply #134
Two theory questions:
1 - If I were to take the raw data and throw out all responses where the tester did not correctly identify the low anchor, how "valid" would the numbers be?


I am not quite sure. I was also thinking about doing that, but this has to be made clear before the test starts, otherwise people might blame me for selecting only the results I like afterwards. Also, how do you know that the low anchor didn't really sound better to a given person? Let's say an encoder fails on a certain sample and introduces a lot of ringing, while the low anchor simply lowpasses at, say, 14 kHz. For that person, the lowpass might be more acceptable than the ringing.
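
For anyone who wants to try the post-screening idea on the raw data, here is a minimal Python sketch. The data layout (a list of result sheets, each with a "ratings" dict) and the name "low_anchor" are assumptions made up for illustration, not the format of the actual result files. A rule like this would have to be announced before the test, and it cannot tell a careless listener from one who honestly preferred the low anchor.

```python
from statistics import mean


def screen_by_low_anchor(sheets, anchor="low_anchor"):
    """Keep only the sheets where the low anchor got the strictly lowest rating."""
    kept = []
    for sheet in sheets:
        ratings = sheet["ratings"]
        others = [score for name, score in ratings.items() if name != anchor]
        # Drop the sheet if any contender was rated at or below the anchor.
        if others and min(others) > ratings[anchor]:
            kept.append(sheet)
    return kept


def mean_ratings(sheets):
    """Average rating per encoder over the given sheets."""
    names = sheets[0]["ratings"].keys()
    return {name: mean(s["ratings"][name] for s in sheets) for name in names}


# Hypothetical result sheets for one sample (ratings on the 1-5 scale).
sheets = [
    {"listener": "A", "ratings": {"enc_x": 4.0, "enc_y": 3.5, "low_anchor": 1.5}},
    {"listener": "B", "ratings": {"enc_x": 2.0, "enc_y": 4.0, "low_anchor": 3.0}},
    {"listener": "C", "ratings": {"enc_x": 4.5, "enc_y": 4.0, "low_anchor": 1.0}},
]

screened = screen_by_low_anchor(sheets)
print([s["listener"] for s in screened])
print(mean_ratings(screened))
```

Comparing the means before and after screening shows how much a handful of dropped sheets can move the numbers, which is exactly why the rule needs to be fixed in advance.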

Reply #135
Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Yes, this is a 128kbps listening test. Anyway, LOW bitrates for me are 96kbps and less.

Reply #136
Quote
...All them have weaker and stronger areas.

That's the key result of the test, and we can learn about the encoders' strengths and weaknesses when looking at the encoders' results on the individual samples. There is a chance that users can get significant encoder differentiation for his individual needs.
Looking only at the results averaged over all the samples is pretty pointless, especially when results are tied, and it looks like some members have only this in mind.
Of course it's useful to base an encoder decision on more than the encoders' performance on favorite genres in this test, but it's a starting point. If someone doesn't want to dig deeper, that's his decision, and he still gets some basis for a decision, though a rather poor one, from the results for his favorite genres.
lame3995o -Q1.7 --lowpass 17

Reply #137
Quote
No one is defending Lame; it's just that you don't know how to read the numbers.


I hope more people agree with you.

Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge. Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations.
The confidence interval bars are about as useless as kidneys are to a human.

Reply #138

Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Yes, this is a 128kbps listening test. Anyway, LOW bitrates for me are 96kbps and less.

Well, that is your personal opinion...

For me, and many others, 128 kbps is low when we are talking about transparency and Hi-Fi audio. Even 160 kbps and often 192 kbps are very easy to ABX in many cases.

My point is that claiming Helix is IN GENERAL poor with metal music on the basis of only a few tests in the 128 kbps area is irresponsible, because Helix performs well with metal music at high bitrates (equivalent in bitrate to Lame -V3 or more).

If Helix were tuned in the 128 kbps area to perform better with metal music, then obviously its performance with metal music at high bitrates could be even better than now... But we all know that this is not of interest to HA.

Reply #139
The question remains: how, then, can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

That's an interesting question. I would ponder on it a bit ;-) The future of listening tests probably lies in some kind of online database of samples crossed with a database of 'encoder+settings' sets. The tester would probably request a set of n samples and m 'encoder+settings' sets, which would mean downloading n*m lossless samples (or even some kind of online testing, though I think that would need too high an internet speed). For security reasons, some kind of cryptographic signature of the files would probably be needed, in tags for example. Then some kind of future ABC/HR utility would run the test itself (checking the file signatures to prevent renaming/mangling of files) and send the results back with a tester ID attached (if the tester can't ABX an encoded sample from the original, it should be rated 5 automatically). The results would then be added to an online database of results, signed with the tester ID. Such a listening test database should grow by itself, and eventually the number of test results for each (sample, encoder+settings) pair should become significant enough. The plus of it all is that each tester can select the samples/encoders he is interested in, and users who are interested in the results can get, for example, a 'virtual' listening test for a selected set of encoders on a selected subset of samples (the kind of music they listen to).
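
A rough sketch of two pieces of this idea in Python: building the n*m test plan, and checking that a downloaded file has not been renamed or altered. Everything here is hypothetical (function names, the shared key, the sample and encoder labels). An HMAC from the standard library stands in for whatever signature scheme a real tool would use; a real design would more likely use public-key signatures so that testers never hold the signing key.

```python
import hashlib
import hmac
from itertools import product


def build_plan(samples, encoder_sets):
    """All (sample, encoder+settings) pairs a tester has to download: n*m items."""
    return list(product(samples, encoder_sets))


def sign_file(data: bytes, key: bytes) -> str:
    """Tag the server would store alongside the file, e.g. in a metadata tag."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_file(data: bytes, key: bytes, tag: str) -> bool:
    """True only if the file content still matches the stored tag."""
    return hmac.compare_digest(sign_file(data, key), tag)


def final_rating(abx_passed: bool, listener_rating: float) -> float:
    """A sample the tester cannot ABX from the original is rated 5 automatically."""
    return listener_rating if abx_passed else 5.0


# Hypothetical usage with placeholder names and data.
key = b"server-side secret"
plan = build_plan(["sample01", "sample02"], ["encoderA", "encoderB", "encoderC"])
audio = b"decoded audio bytes would go here"
tag = sign_file(audio, key)

print(len(plan))                             # 2 samples * 3 encoder sets
print(verify_file(audio, key, tag))          # untouched file
print(verify_file(audio + b"x", key, tag))   # altered file
print(final_rating(False, 3.2))              # failed ABX, so rated 5
```

Signing the file content instead of the file name is what defeats renaming: swapping two files leaves each tag attached to the wrong content, and verification fails.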

Reply #140
Well, that is your personal opinion...

Probably the personal opinion of the entire iPod market, too. The ones who just don't give a heck about hi-fi, or have spoilt their ears enough not to give it. Just a small reminder.


Reply #142
Quote
Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge. Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations. The confidence interval bars are about as useless as kidneys are to a human.


Yes Sir!

Quote
The question remains: how, then, can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

Exactly... and I didn't see anything near that.

Reply #143
Full results are online now. Hopefully I didn't miss anything.


#3 on picture: "linchpin". On text: san francisco bay = sfbay.

"linchpin" should be #11. And IIRC this sample is from Fear Factory -- Digimortal.

Reply #144
Quote

Please show a little more respect for and comprehension of math and statistics, especially on this forum. If you can come up with new math and statistical equations to overthrow the current system, sure, please do so.

I doubt you can, and I suggest you learn from those who are trying to teach knowledge. Making improvements in audio encoding goes far beyond walking around spouting placebo and unsubstantiated declarations. The confidence interval bars are about as useless as kidneys are to a human.


Yes Sir!


Quote
The question remains: how, then, can a test be devised so that it yields results that are in fact conclusive and creates a form of intersubjective opinion regarding the preferred codec at a specific bitrate?

Exactly... and I didn't see anything near that.


Yes Sir! Haha! Too funny 

There is nothing wrong with the test itself really, it's just that the encoders are very equal, that's all. No test in the world would point out a clear winner in this case. If it did, then that test would be f*cked up.
//From the barren lands of the Northsmen
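
To make the "very equal" point concrete, here is a small Python sketch of how overlapping error bars are read. The ratings are invented for illustration, and the interval uses a plain normal approximation (mean plus or minus 1.96 standard errors), which is an assumption and not necessarily the analysis behind the official graphs.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(ratings, z=1.96):
    """Approximate 95% interval for the mean rating (normal approximation)."""
    centre = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[1] >= b[0] and b[1] >= a[0]


# Invented ratings on the 1-5 scale, one value per listener.
enc_x = [4.0, 3.5, 4.5, 4.0, 3.0, 4.5]
enc_y = [3.5, 4.0, 4.0, 4.5, 3.5, 4.0]
low_anchor = [1.0, 1.5, 1.0, 2.0, 1.5, 1.0]

ci_x = confidence_interval(enc_x)
ci_y = confidence_interval(enc_y)
ci_anchor = confidence_interval(low_anchor)

print("x vs y overlap:     ", intervals_overlap(ci_x, ci_y))
print("x vs anchor overlap:", intervals_overlap(ci_x, ci_anchor))
```

Overlapping bars only mean the test cannot separate the two encoders; they do not prove the encoders are identical. Bars that clearly do not overlap, as with the low anchor here, are what a real difference looks like.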

Reply #145
...There is a chance that users can get significant encoder differentiation for his individual needs....

I was a bit over-optimistic here. I just figured out what the detailed results mean for my preferred genres. For these I can focus on samples 1-8, 10, 13, with special emphasis on samples 3, 4, 7, and 10.
I just looked at what comes out when concentrating on the Lame 3.98, FhG, and Helix results. Well, a real differentiation of the results isn't possible even when concentrating on 'my' music. (I will give FhG a try nonetheless, but there's no foundation for that in the test, just a personal decision.)
lame3995o -Q1.7 --lowpass 17

Reply #146

Full results are online now. Hopefully I didn't miss anything.


#3 on picture: "linchpin". On text: san francisco bay = sfbay.

"linchpin" should be #11. And IIRC this sample is from Fear Factory -- Digimortal.


Thanks! Fixed.

Reply #147
There is a chance that users can get significant encoder differentiation for his individual needs.

A very small chance. What conclusion would someone draw when encoder X fails on one sample of genre Y? LAME 3.97 didn't perform extremely well on castanets.wav. Should I conclude that it's less suitable for Spanish music than other encoders? Anyway, where exactly is the problem: on the guitar? on the castanets? on the background noise? on the lowest part of this dynamic sample? Should guitar lovers conclude anything from this sample if the problem only lies in the noisy background? A musical genre doesn't make a problem genre. I hope I'm clear enough.

Nobody could seriously draw any conclusion by generalizing from the performance on one single sample. An encoder could be perfectly transparent on a metal sample in a listening test and still suffer from several easily audible flaws across the full Iron Maiden discography. Conclusion = 0. Conversely, an encoder could reveal a strong artifact that turns out to be totally isolated and not reproducible. Same thing: conclusion = 0.
To reach a strong conclusion about a musical genre/taste, you need dozens of samples from the same category and have to look at the global performance... which means a listening test entirely dedicated to a specific genre (and several listeners interested in such a test).

Quote
Looking only at the results averaged over all the samples is pretty pointless, especially when results are tied, and it looks like some members have only this in mind.

Pointless? I find it a bit strange that you blamed me a few hours ago for "killing" the interest of this test and that now you're saying that looking at the overall conclusion is “pointless”?

Reply #148
I have posted some ABX logs and samples of tracks that show Helix's major flaws.
We know now that Helix has major flaws for you with metal, so chances are high that this is relevant to other metal lovers too. It is also backed up by the test, where Helix shows its worst behavior with metal.

Also, IgorC, shawdowking and kornchild2002 noticed that Helix performs very poorly with metal music.

Which is true ONLY for low bitrates, but for high bitrates, Helix performs very well with metal music.

Your ABX tests only show that there is a problem with metal music in the 128 kbps area, and nothing more... suggesting that Helix is poor with metal music in general on the basis of a few tests in the 128 kbps area doesn't make any sense.

I tried a few metal tracks with Helix at V100 "-X2 -U2 -V100 -HF", and it still sucks at metal.

Code:
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:39:50

File A: C:\Temp\Helix Mp3\V100\Fear Factory - Demanufacture\04. Replica.mp3
File B: C:\Rips\Fear Factory - Demanufacture\04. Replica.flac

23:39:50 : Test started.
23:40:00 : 01/01  50.0%
23:40:11 : 02/02  25.0%
23:40:19 : 03/03  12.5%
23:40:28 : 04/04  6.3%
23:40:34 : 05/05  3.1%
23:40:40 : 06/06  1.6%
23:40:46 : 07/07  0.8%
23:40:56 : 08/08  0.4%
23:41:02 : 09/09  0.2%
23:41:09 : 10/10  0.1%
23:41:35 : 11/11  0.0%
23:41:41 : 12/12  0.0%
23:41:43 : Test finished.

 ----------
Total: 12/12 (0.0%)

Splashy drums at the start, but a huge improvement over V60.

Code:
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:43:32

File A: C:\Temp\Helix Mp3\V100\Fear Factory - Digimortal\05. Linchpin.mp3
File B: C:\Rips\Fear Factory - Digimortal\05. Linchpin.flac

23:43:32 : Test started.
23:43:43 : 01/01  50.0%
23:43:48 : 02/02  25.0%
23:44:00 : 03/03  12.5%
23:44:06 : 04/04  6.3%
23:44:13 : 05/05  3.1%
23:44:26 : 06/06  1.6%
23:44:33 : 07/07  0.8%
23:44:41 : 08/08  0.4%
23:44:51 : 09/09  0.2%
23:44:56 : 10/10  0.1%
23:45:07 : 11/11  0.0%
23:45:15 : 12/12  0.0%
23:45:16 : Test finished.

 ----------
Total: 12/12 (0.0%)

Horrid warbling from the start, still sounds really bad.

Code:
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:48:25

File A: C:\Temp\Helix Mp3\V100\Metallica - Ride The Lightning\05. Trapped Under Ice.mp3
File B: C:\Rips\Metallica - Ride The Lightning\05. Trapped Under Ice.flac

23:48:25 : Test started.
23:48:40 : 01/01  50.0%
23:48:53 : 02/02  25.0%
23:49:08 : 03/03  12.5%
23:49:15 : 04/04  6.3%
23:49:30 : 05/05  3.1%
23:49:34 : 06/06  1.6%
23:49:52 : 07/07  0.8%
23:49:59 : 08/08  0.4%
23:50:14 : 09/09  0.2%
23:50:22 : 10/10  0.1%
23:50:28 : 11/11  0.0%
23:50:35 : 12/12  0.0%
23:50:36 : Test finished.

 ----------
Total: 12/12 (0.0%)

Warbling on the guitar is still there, but a very good improvement over V60.

Code:
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/27 00:14:36

File A: C:\Temp\Helix Mp3\V100\Metallica - Ride The Lightning\04. Fade To Black.mp3
File B: C:\Rips\Metallica - Ride The Lightning\04. Fade To Black.flac

00:14:36 : Test started.
00:15:25 : 01/01  50.0%
00:15:31 : 02/02  25.0%
00:15:43 : 03/03  12.5%
00:15:51 : 04/04  6.3%
00:15:59 : 05/05  3.1%
00:16:06 : 06/06  1.6%
00:16:14 : 07/07  0.8%
00:16:25 : 08/08  0.4%
00:16:38 : 09/09  0.2%
00:16:48 : 10/10  0.1%
00:16:51 : Test finished.

 ----------
Total: 10/10 (0.1%)

Drum smear and warbling on guitars.

Code:
foo_abx 1.3.3 report
foobar2000 v0.9.6 beta 3
2008/11/26 23:58:44

File A: C:\Rips\Ministry - Rio Grande Blood\05. Lieslieslies.flac
File B: C:\Temp\Helix Mp3\V100\Ministry - Rio Grande Blood\05. Lieslieslies.mp3

23:58:44 : Test started.
23:59:35 : 01/01  50.0%
23:59:45 : 02/02  25.0%
23:59:57 : 03/03  12.5%
00:00:08 : 04/04  6.3%
00:00:23 : 04/05  18.8%
00:00:31 : 05/06  10.9%
00:00:40 : 06/07  6.3%
00:00:46 : 07/08  3.5%
00:00:54 : 08/09  2.0%
00:01:01 : 09/10  1.1%
00:01:12 : 10/11  0.6%
00:01:19 : 11/12  0.3%
00:01:24 : 12/13  0.2%
00:01:28 : 13/14  0.1%
00:01:37 : 14/15  0.0%
00:01:46 : 15/16  0.0%
00:01:53 : 16/17  0.0%
00:01:55 : Test finished.

 ----------
Total: 16/17 (0.0%)

A pre-echo at around 0:27 - 0:29, plenty of guitar warbling near the end of the track.
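
For readers wondering where the percentages in these logs come from: they are consistent with a one-sided binomial test, i.e. the probability of getting at least that many trials right by pure guessing. A minimal Python sketch follows; the function name is made up here, and the formula is inferred from the logged numbers, not taken from the foo_abx source.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers in `trials` coin-flip guesses."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Scores taken from the logs above.
for correct, trials in [(4, 5), (10, 10), (12, 12), (16, 17)]:
    print(f"{correct:02d}/{trials:02d}  {abx_p_value(correct, trials):.4%}")
```

abx_p_value(4, 5) is 6/32 = 0.1875, matching the 18.8% shown after the single miss in the last log, and 12/12 gives 1/4096, which the log rounds to 0.0%.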
"I never thought I'd see this much candy in one mission!"

Reply #149
Is this basically the end of MP3, having reached its maximum level of quality, given that all codecs are tied?