Multiformat@128kbps listening test - FINISHED

Reply #75 – 2004-05-24 19:06:55

Quote

Also, I think Lame 3.96 -V5 --athaa-sensitivity 1 hasn't been tested enough to say it doesn't fail (badly) in certain cases, possibly even fairly often. IMO, iTunes 4.2 AAC is safer in this sense.

I take your points into consideration, but tbh I don't think my ears are up to the challenge of telling the two apart. I'm still undecided about which format I will use, though; I can never seem to make up my mind on these things.

Thanks to rjamorim for a very informative listening test.

-Brian

Reply #76 – 2004-05-24 19:13:18

Quote

I don't see how you can possibly disagree that it would have been a worse test if each sample had been encoded to give an average 128kbps per sample...

It would have been worse because it wouldn't simulate a real-life situation. In real life, you'd choose one setting and encode your music with it. You wouldn't spend time encoding every single song 3 or 4 times until you reach a 128 kbps average.

Reply #77 – 2004-05-24 19:45:46

My thanks to Roberto for organising such an enlightening test after the string of setbacks, and to aoTuV for improving Vorbis to such an impressive level.

Roll on the 48k test!

Reply #78 – 2004-05-24 19:46:21

Quote

Btw: Slashdot seems like a nice place to waste time. Why bother with the facts when you can assume things instead and base the discussion on those assumptions? (In this case: why bother reading how the test was performed? Let's just assume how it was done and base the discussion on that.) I only visit Slashdot when someone posts a link on this forum, and I don't plan on spending more time there either.

Same here. I poked my head into the discussion surrounding this particular news item and had to get out before my anger welled up. Lots of big mouths just waiting for the chance to open and blah blah blah BLAH. How totally useless. The headlines are all that is interesting.

Reply #79 – 2004-05-24 20:43:20

Vorbis winning a listening test (I know, MPC is very close)? That hasn't happened in a long time.

Reply #80 – 2004-05-24 20:45:24

Heh, as is tradition, I would like to thank Roberto and the participants for the test!
Good work, guys!

But I have one question about the MPC average bitrate.
It seems (I may be wrong) that for those two samples (Debussy, CouldBeSweet) the bit allocation mechanism, or something else, fails badly.
So, if we exclude those two samples, the average bitrate of MPC at this setting would be about 142 kbit/s (close to the average bitrate in the previous test).
I can't say this with professional authority, but maybe those two bitrate values should be excluded from the average bitrate calculation?
As I remember, one can check whether such bitrate values are statistically fit to be included in the calculation...

Reply #81 – 2004-05-24 21:09:27

Quote

But I have one question about the MPC average bitrate.
It seems (I may be wrong) that for those two samples (Debussy, CouldBeSweet) the bit allocation mechanism, or something else, fails badly.
So, if we exclude those two samples, the average bitrate of MPC at this setting would be about 142 kbit/s (close to the average bitrate in the previous test).
I can't say this with professional authority, but maybe those two bitrate values should be excluded from the average bitrate calculation?
As I remember, one can check whether such bitrate values are statistically fit to be included in the calculation...

ItCouldBeSweet was purposely inserted into the test to compensate somewhat for the higher average bitrate of the other samples (Debussy was not chosen specifically for bitrate).

I don't think it's a matter of the bit allocation mechanism failing; it's that the samples were chosen such that their average bitrates were generally higher than 128 kbit/s. The rationale was that such samples would make defects easier to detect. This is true for defects like pre-echo, but as we saw from the test, if a high bitrate helps the VBR codecs, a very low bitrate can also hurt them. In other words, problem samples can be found at either end of the bitrate spectrum.

I think the bitrate criticism has some validity, but probably not to the extent that the overall results would have been significantly different if the average bitrates were closer to 128 kbit/s. It's an oversimplification to assume a linear degradation with average bitrate.

ff123

Reply #82 – 2004-05-24 21:23:00

Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?

Reply #83 – 2004-05-24 21:40:44

Quote

ItCouldBeSweet was purposely inserted into the test to compensate somewhat for the higher average bitrate of the other samples (Debussy was not chosen specifically for bitrate).

Quote

it's that the samples were chosen such that their average bitrates were generally higher...

Oh, I didn't know that...
Thanks for the clarification, ff123!
I get the point; it's something to think about.

Quote

I think the bitrate criticism has some validity, but probably not to the extent that the overall results would have been significantly different...

Agree completely.
This wasn't really a criticism. English is not my native language, as you can see, so sometimes I can't express my thoughts clearly, sorry.
But I will try.
My point about the MPC (not Vorbis) bitrate was:
when you compute the average of a column of statistical data, you must exclude values that lie outside the 3-sigma bounds. Only then will the result be statistically valid.
So, in other words, did MPC at the setting used *really* produce an average bitrate of 136 kbit/s?
It seems not. Those two odd samples skew the statistics a bit, because they were specifically chosen (at least one of them). So users may be confused when the real average bitrate at this setting turns out to be 142 kbit/s...
BTW, this doesn't affect the rating calculations, only the bitrate...
This is my IMHO, of course.
Any opinion (and a clarification that I'm wrong, too) will be greatly appreciated.
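A minimal sketch of the 3-sigma exclusion described above, assuming a single pass with the population standard deviation. The bitrates are made-up numbers, not the test's real per-sample data, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def mean_excluding_outliers(values: list[float], k: float = 3.0) -> float:
    """Average of `values` after dropping points more than k standard
    deviations from the mean (one pass, population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return mu  # all values identical: nothing to exclude
    kept = [v for v in values if abs(v - mu) <= k * sigma]
    return mean(kept)


# Hypothetical per-sample bitrates in kbit/s (NOT the test's real data).
bitrates = [140.0] * 17 + [60.0]
print(round(mean(bitrates), 2))           # 135.56, plain average
print(mean_excluding_outliers(bitrates))  # 140.0, low sample dropped
```

One caveat worth knowing: with 18 samples, a single low value sits about 4.1 sigma from the mean and gets cut, but two equally low values inflate sigma enough that each sits only about 2.8 sigma away and both survive a 3-sigma rule. So with two odd samples, a plain 3-sigma cut may not remove them.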

Reply #84 – 2004-05-24 21:48:18

Quote

My point about the MPC (not Vorbis) bitrate was:
when you compute the average of a column of statistical data, you must exclude values that lie outside the 3-sigma bounds. Only then will the result be statistically valid.
So, in other words, did MPC at the setting used *really* produce an average bitrate of 136 kbit/s?
It seems not. Those two odd samples skew the statistics a bit, because they were specifically chosen (at least one of them). So users may be confused when the real average bitrate at this setting turns out to be 142 kbit/s...
BTW, this doesn't affect the rating calculations, only the bitrate...
This is my IMHO, of course.
Any opinion (and a clarification that I'm wrong, too) will be greatly appreciated.

Yes, I understand your point.

Ideally, you'd like the bitrate distribution to look somewhat like a bell curve with its mean at 128 kbit/s.

The two samples with extremely low bitrate do not compensate very well for the other 16 samples which are generally skewed above 128 kbit/s.

For the 48 kbit/s test, if there are VBR codecs, I think we should strive to have about an equal number of bitrates above and below the average bitrate (which should work out to be 48 kbit/s on average across the sample set).

The rationale about wanting to use "hard" samples does not apply at low bitrates.

ff123
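The balance ff123 describes for the 48 kbit/s test (about as many samples above the target as below, with the mean near the target) can be checked with a small helper. This is a sketch only; the function name and the sample bitrates are hypothetical.

```python
from statistics import mean


def bitrate_balance(bitrates: list[float], target: float) -> dict:
    """Summarize how a sample set's VBR bitrates sit around a target."""
    avg = mean(bitrates)
    return {
        "above": sum(1 for b in bitrates if b > target),
        "below": sum(1 for b in bitrates if b < target),
        "mean": avg,
        "mean_offset": avg - target,
    }


# Hypothetical 48 kbit/s sample set: balanced around the target.
print(bitrate_balance([44.0, 46.0, 50.0, 52.0], 48.0))
# Hypothetical 128 kbit/s set: 16 high samples, 2 very low ones.
print(bitrate_balance([140.0] * 16 + [60.0] * 2, 128.0))
```

The second call shows the situation discussed above: the mean can land close to the target while the counts (16 above, 2 below) reveal that the distribution is far from a bell curve centred on it.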

Reply #85 – 2004-05-24 21:50:27

Quote

Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?

IMHO, this test is pretty good evidence that Lame 3.96 performs better than 3.90.3, at least at this bitrate. Personally, this is enough to convince me to use 3.96 for mid-bitrates from now on, whatever the recommended version happens to be.

Reply #86 – 2004-05-24 21:50:47

Quote

This is true for defects like pre-echo, but as we saw from the test, if a high bitrate helps the VBR codecs, a very low bitrate can also hurt them.

Ehhh. I started thinking about this while writing my previous reply.
Things are not that easy with VBR encoding...
Maybe two-pass ABR is the best encoding mode?

Reply #87 – 2004-05-24 21:53:02

Quote

Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?

I think that this test should have little relevance as far as the recommended LAME version goes. Remember --preset standard is the setting we are really worried about there.

Reply #88 – 2004-05-24 22:07:42

Quote

Maybe two-pass ABR is the best encoding mode?

It is the best for test conductors, for sure :B

Too bad only WMA implements it.
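Why two-pass ABR is convenient for testers can be shown with a toy model: the first pass measures how demanding each frame is, and the second pass spends a fixed budget in proportion to that, so the file-wide average lands exactly on the target while hard passages still get more bits. This is a simplified illustration under assumed inputs (a single positive "complexity" score per frame, no bit reservoir or min/max limits), not how WMA or any other encoder actually allocates bits.

```python
def two_pass_allocation(complexity: list[float], target_kbps: float) -> list[float]:
    """Toy second pass: split a fixed budget across frames in proportion to
    the complexity measured in the first pass, so the average equals target."""
    if not complexity or any(c <= 0 for c in complexity):
        raise ValueError("need positive per-frame complexity values")
    budget = target_kbps * len(complexity)  # total to spend across all frames
    total = sum(complexity)
    return [budget * c / total for c in complexity]


frames = [1.0, 2.0, 1.0, 4.0]  # hypothetical first-pass complexity scores
rates = two_pass_allocation(frames, 128.0)
print(rates)                    # [64.0, 128.0, 64.0, 256.0]
print(sum(rates) / len(rates))  # 128.0
```

Single-pass VBR, by contrast, fixes a quality target and lets the average fall wherever the material puts it, which is exactly why the per-sample bitrates in this test drifted above and below 128 kbit/s.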

Reply #89 – 2004-05-24 22:11:44

Quote

Ideally, you'd like the bitrate distribution to look somewhat like a bell curve with its mean at 128 kbit/s.
The two samples with extremely low bitrate do not compensate very well for the other 16 samples which are generally skewed above 128 kbit/s.

Yep, that is what I mean.
Anyway, it's great that tests like this are performed!
Thanks again!
EDIT:

Quote

It is the best for test conductors, for sure :B

He-he

Reply #90 – 2004-05-24 22:54:16

Quote

Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 Nobody has dared to answer yet. Maybe all the MiniDisc guys had a heart attack after reading the results.

http://forums.minidisc.org/viewtopic.php?p=22321#22321

Reply #91 – 2004-05-24 22:56:30

Quote

Quote
About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?

Those samples had too few listeners and/or results were too close to each other. When that happens, friedman.exe doesn't output the LSD (which is essential to build the confidence intervals) and says that results are "not significant" (which in practice means they are tied).

Hmmm... If the results are too close to each other, then it doesn't make sense to find everything equal. For instance, in the leahy sample Vorbis gets 4.68 and Atrac gets 3.76. If the confidence intervals are that tight, there is no way these two are statistically equal. And if there are too few listeners, then you cannot run any statistical tests on the samples anyway. BTW, how many listeners do you consider too few?

Is there a way you can upload the text files with the individual ranks for each sample tested? It is a real pain to build the tables manually from the xml files and I have to exclude ranked references from the start. I'm asking because maybe I can help with providing statistical results for these samples.
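For anyone who wants to redo the per-sample analysis, the textbook Friedman statistic behind "significant / not significant" is short to compute. This sketch assumes a table with one row per listener and one column per codec, ranks each row (averaging tied ranks), and omits the tie correction; it is not the source of friedman.exe and does not reproduce its LSD step. The result is compared with a chi-square distribution with k - 1 degrees of freedom, an approximation that gets rough when there are few listeners.

```python
def average_ranks(row: list[float]) -> list[float]:
    """Rank one listener's scores (1 = lowest), averaging ranks for ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position holding the same score as position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j share the mean rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_chi2(table: list[list[float]]) -> float:
    """Friedman statistic for n listeners (rows) x k codecs (columns),
    without the tie correction."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)


# Hypothetical scores: 3 listeners who all order 3 codecs the same way.
print(friedman_chi2([[3.0, 4.0, 5.0]] * 3))  # 6.0
```

Identical rankings from every listener give the maximum value n(k - 1); listeners who rate everything equal give 0.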

Reply #92 – 2004-05-24 23:04:06

Quote

If the confidence intervals are so tight there is no way these two are statistically equal.

Quite the opposite - the confidence intervals are so broad that they all overlap - so there are no winners and no losers in that sample.
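The "overlap means tied" reading can be made concrete. Assuming error bars drawn as each mean plus or minus half the LSD (an assumption about the plotting convention, not confirmed for these graphs), two bars overlap exactly when the means differ by no more than one LSD. The 4.68 and 3.76 scores come from the leahy discussion above; the LSD values are hypothetical, since friedman.exe reported none for that sample.

```python
def tied_by_lsd(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Two codecs count as tied when their mean scores differ by no more
    than the least significant difference (LSD)."""
    return abs(mean_a - mean_b) <= lsd


def bars_overlap(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Error bars of half-width lsd / 2 (assumed convention) overlap exactly
    when the means are within one LSD of each other."""
    lo_a, hi_a = mean_a - lsd / 2, mean_a + lsd / 2
    lo_b, hi_b = mean_b - lsd / 2, mean_b + lsd / 2
    return lo_a <= hi_b and lo_b <= hi_a


print(tied_by_lsd(4.68, 3.76, 1.0), bars_overlap(4.68, 3.76, 1.0))  # True True
print(tied_by_lsd(4.68, 3.76, 0.5), bars_overlap(4.68, 3.76, 0.5))  # False False
```

A wide LSD (few listeners, noisy ratings) ties even a 0.9-point gap; a narrow one separates it.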

Quote

BTW how many listeners do you considers too few?

To make me happy, I need at least 20 valid results/sample.

Quote

Is there a way you can upload the text files with the individual ranks for each sample tested? It is a real pain to build the tables manually from the xml files and I have to exclude ranked references from the start. I'm asking because maybe I can help with providing statistical results for these samples.

First, download the .rar package containing all the XMLs. Decompress it to an empty folder.

Then, install python and Phong's wonderful Chunky:
http://www.phong.org/chunky/

In the folder where you decompressed the RAR, run

Code:

python "C:\path\to\chunky" -n --codec-file="C:\path\to\codec\list\codecs.txt" --ratings=results --warn -p 0.05

The codecs.txt should be:

Code:

1, Vorbis
2, MPC
3, Lame
4, iTunes
5, Atrac3
6, WMA

It'll create all the result tables (ready to be fed to friedman.exe) in the empty folder, and will discard the ranked results that haven't been ABXed at a significance level of 0.05. Chunky is just too wonderful to be true! OMG!

Regards;

Me.
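The ABX filter can be pictured as a one-sided binomial test: how likely is it to get at least this many trials right by pure guessing? The sketch below shows that standard calculation; whether Chunky uses exactly this test and a strict "< 0.05" cutoff is an assumption, and the function names are hypothetical.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of getting at least `correct` right out of
    `trials` ABX trials by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def keep_result(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """Keep a ranked result only if its ABX log beats chance at `alpha`."""
    return abx_p_value(correct, trials) < alpha


print(round(abx_p_value(12, 16), 4), keep_result(12, 16))  # 0.0384 True
print(round(abx_p_value(10, 16), 4), keep_result(10, 16))  # 0.2272 False
```

So a listener who ranked a codec below the reference but only scored 10/16 in ABX would have that ranking thrown out, while 12/16 passes.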

Reply #93 – 2004-05-25 00:04:51

Thanks for the plug Roberto. :-)

If you have Windows you don't need Python installed (the standalone Windows binary version should work). You should also try out the --help option to see some other options. My personal favorite is the --spreadsheet option, which outputs all the scores in a nice spreadsheet (CSV) format.

I intend to add an option for outputting the listener comments as browseable HTML.

I've tried to make the code fairly accessible, though it's gotten a bit crufty in recent versions (the XML support, for example, isn't as pretty and clean as I would like). The existing code is "incomplete"; there are some features I'd still like to add, but it already does all the heavy lifting (i.e. parsing result files into useful data structures and filtering out bad results). So, if you feel like "doing something" with the data and you know Python, feel free to jump in, fix features, or add bugs.
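As a starting point for "doing something" with CSV output, here is a standard-library sketch that averages scores per codec. The column names (listener, sample, codec, score) are a hypothetical layout, not Chunky's actual --spreadsheet format, so adjust them to whatever header the real file has; the scores below are made up.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_score_per_codec(csv_text: str) -> dict[str, float]:
    """Average the 'score' column per 'codec' (hypothetical column names)."""
    scores: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["codec"]].append(float(row["score"]))
    return {codec: mean(vals) for codec, vals in scores.items()}


# Made-up scores in the assumed layout.
example = """listener,sample,codec,score
a,Debussy,Vorbis,4.5
b,Debussy,Vorbis,4.0
a,Debussy,Atrac3,3.0
"""
print(mean_score_per_codec(example))  # {'Vorbis': 4.25, 'Atrac3': 3.0}
```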

Reply #94 – 2004-05-25 00:19:33

Thanks a lot Roberto!

I think this test showed us, once again, that open source is still better than any paid stuff. I don't want to start an Open Source political fight here, as I'm not a free software advocate most of the time. But hey, come on!

I think what impressed me most in this test was LAME's climb. LAME is doing the impossible with MP3: improving it even more. Perhaps Gabriel should make a very simple preset that uses this configuration, so maybe we can see more and more nice MP3s around.

Anyway, congrats to all for the nice encoders. By the way, where is the winner? Did he see the results already?

Reply #95 – 2004-05-25 00:27:25

Quote

By the way, where is the winner? Did he see the results already?

Indeed. Aoyumi should show up to receive a big round of applause.

Reply #96 – 2004-05-25 00:33:38

Quote

If you have windows you don't need Python installed (the standalone windows binary version should work).

I get a 404 when I try to download the Windows binary. I'm not on my Linux box right now, so I'm downloading Python for Windows (9 MB takes a long time to download on 56k).

Reply #97 – 2004-05-25 00:53:07

Ooops, should be fixed now.

Reply #98 – 2004-05-25 00:54:42

This site apparently interviewed Roberto about the test:

http://p2pnet.net/story/1525

They got the contestants wrong, though.

ff123

Reply #99 – 2004-05-25 00:59:20

Quote

This site apparently interviewed Roberto about the test:

http://p2pnet.net/story/1525

They got the contestants wrong, though.

Yeah, the site author mailed me earlier today asking for comments, and for that sexy picture.

I'll mail him asking him to correct the competitors list.

Another news site mentioning my test:
http://www.afterdawn.com/news/archive/5257.cfm