Hello.
I'd like to announce the results of the Multiformat at 128kbps listening test
Vorbis aoTuV is tied with Musepack for first place, Lame MP3 is tied with iTunes AAC for second place, WMA Standard is in third place, and Atrac3 takes last place.
The results page is here:
http://www.rjamorim.com/test/multiformat128/results.html
For those in a hurry, here are the zoomed overall results:
http://www.noveo.net/rjamorim/plot18z.png
Big thanks to everyone that helped and participated.
Best regards,
Roberto.
Now that was a surprise... Lame as good as AAC??? Did anyone expect that?
Vorbis (aoTuV) and MPC tied for first place. LAME and iTunes tied for second. Then WMA-S in third, and ATRAC3 at the back of the pack.
Funny that there was no real consistency this time across music types with the formats tested. Tends to oppose theories about certain formats excelling with certain types of music. At least among these samples.
What the! ... surprised!
Whoa, look at aoTuV!!
It is now as good as MPC. Very good work, Aoyumi. Vorbis is now back in the spotlight.
I believed Musepack would win the test, especially in such a bitrate range (-q4.15). Anyway, it's a very interesting result; good job Roberto and all participants.
Surprisingly, MPC 1.14 (the same version tested last year) isn't tied with iTunes AAC anymore, but ahead of it.
ATRAC3 (minidisc) is obviously a poor encoding solution.
aoTuV is without doubt a great step forward for Vorbis!
The codes:
1 - Vorbis aoTuV
2 - Musepack
3 - Lame MP3
4 - iTunes AAC
5 - Atrac3
6 - WMA Std.
The decryption key:
http://www.rjamorim.com/test/multiformat128/comments/multiformat.key
Very good results by aoTuV. It seems all the others have a new target for 128kbps quality now.
One thing which this test shows is that VBR coding (aoTuV, MPC) is definitely the way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).
One thing which this test shows is that VBR coding (aoTuV, MPC) is definitely the way to go for 128kbps, and with good enough VBR tweaking it's certainly possible to be clearly better than CBR (iTunes).
Yes. That is also true for Lame. With a very good VBR implementation, it got close to the best AAC implementation at that bitrate.
Let's hope Apple implements VBR in their codec, and Ahead improves their implementation considerably.
I believed Musepack would win the test, especially in such a bitrate range (-q4.15).
I thought so too.
I anticipated a tie between MPC and QT-AAC, then Vorbis in second place, then LAME, then WMA-S and ATRAC at the back. Vorbis and QT-AAC both surprised me.
My browsers (Firefox, MSIE) don't show the test comments correctly. Also, the title of this page (http://www.rjamorim.com/test/multiformat128/comments/comments.html) seems to be wrong.
My browsers (Firefox, MSIE) don't show the test comments correctly.
It's XML. IE should show something like this:
http://esc17.midphase.com/~calmerc/screenshots/screen-1.jpg
XML is worse for readability but easier to parse. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.
Also, the title of this page (http://www.rjamorim.com/test/multiformat128/comments/comments.html) seems to be wrong.
Fixed. Thanks for reporting.
XML is worse for readability but easier to parse. That's why Schnofler switched to XML results in recent versions of ABC/HR Java.
I expected something like in raw *.txt format. Thanks for clarification.
I expected something like in raw *.txt format. Thanks for clarification.
Schnofler already has a converter from xml -> txt in ABC/HR. But it only works for encrypted results ATM. Hopefully he'll add support for already-decrypted results.
Wow, what really impresses me is that I don't think there was one sample where the vorbis encoder did poorly. This is a little shocking after last test. Excellent work aoTuV!
Surprise surprise!
I hope this'll give vorbis development a new boost.
Oh! Joy!
Oh! Joy!
I'm happy my test is spreading happiness.
wow, now that's what I didn't expect
- vorbis aotuv: vorbis is back, and i am proud to have helped find out which vorbis encoder should be used
- mpc vs aac: funny that mpc was that much better than itunes (with only a 0.15 higher setting than in the last test)
- wma9: lol, worse than mp3! (and i even wonder how it got rated that high; even at 128 it had this metallic sound sometimes) -> go away m$
- atrac3: even worse than wma9 -> go away sony
and if you take this test as a comparison between some online music stores (itunes vs. wma9-based ones vs. sony's new store), itunes clearly comes out as the winner, leaving wma9 behind by far!
I see there is a very small margin between MPC and aoTuV; how would aoTuV react at higher bitrates?
Very interesting results ...
I think it could be an interesting addition to show the bitrate for each encoder in the specific diagrams for each sample ...
The more I think of it the more impressed I am with the performance of LAME. Very good work Gabriel (and consider adding --athaa-sensitivity to the -V5 defaults).
How many results were discarded because of ranked refs?
How many results were discarded because of ranked refs?
54
Mind you, I didn't discard results that ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
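[Editor's note: for readers unfamiliar with the criterion above, the p-value of an ABX run is the one-sided binomial probability of getting at least that many correct answers by guessing. A minimal sketch, not the actual test tooling:]

```python
from math import comb

def abx_pvalue(correct: int, trials: int) -> float:
    """One-sided binomial p-value: probability of scoring at least
    `correct` out of `trials` ABX trials by pure guessing (p = 0.5)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

# A listener who ranked the reference was still kept for a sample pair
# if their ABX run on that pair reached p <= 0.05:
print(abx_pvalue(7, 8))  # 9/256, about 0.035 -> kept
print(abx_pvalue(6, 8))  # 37/256, about 0.145 -> discarded
```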
Some comments:
1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis. Note that problem samples are not synonymous with high bitrate! I would hope Frank could look into what's going on with mpc on this sample.
2. It's not clear that all of the samples which didn't show significant differences (there were 4) would have benefited much from a larger listener sample size. The Bartok_strings2.wav and OrdinaryWorld.wav samples in particular are pretty evenly rated across the board.
Roberto did a separate analysis omitting these 4 samples and the overall results were very similar to the results with all 18 samples, except that with 18 samples the confidence level increased. So I'd say they helped out, even if individually they didn't show significant differences between codecs.
3. The absolute ratings of iTunes is remarkably stable in the tests it's been featured in (4.39, 4.42, 4.20, and 4.26 on this one), even though the tests are not strictly comparable.
4. MPC should have been expected (and it did appear) to be slightly better this time around than the last multiformat test since its quality setting was tweaked up slightly (from 4 to 4.15).
5. Excellent job on aoTuV b2, Aoyumi and everybody else who was involved. Seeing such a high score in this test shouldn't have been a big surprise, since those virtuoso tuning ears were rating the beta2 version at around 4.0 overall.
6. Lame is still improving. Good job Gabriel and [proxima]
ff123
Edit: After checking, I see that MPC's absolute score went down from 4.51 to 4.47, so comment 4 is not consistent with what actually happened. But then again, it's not strictly correct to compare scores on one test with scores on another.
How many results were discarded because of ranked refs?
54
Mind you, I didn't discard results that ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
Ok.
That's around 15% of the results... And to me it still doesn't feel right to treat them as irrelevant for the stats...
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
I found my chanchan listening test result wrongly classified as a NewYorkCity result.
-Grease
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B
I won't ever understand these people.
How many results were discarded because of ranked refs?
54
Mind you, I didn't discard results that ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
Ok.
That's around 15% of the results... And to me it still doesn't feel right to treat them as irrelevant for the stats...
What about all the /.ers?
Seems they were just interested in wasting bandwidth after all
Some results with ranked refs are worse than others. Roberto showed me results from one person whose listening results I wouldn't trust at all, they were so bad (meaning lots of ranked refs).
There is always a question about how these results should be treated, and there are probably multiple ways of handling them. The fairest and simplest way seems to be to just throw them away if you have enough results that you can afford to do that, which in this case seems to be true.
ff123
I found my chanchan listening test result wrongly classified as a NewYorkCity result.
No worries, that misclassification happened during uploading. I'll move it back to the correct folder later.
1. mpc encoded debussy.wav at too low of a bitrate (98 kbit/s), apparently, because multiple people commented on a distorted sound, and its low rating on this sample (3.53) hurt it in comparison with vorbis. Note that problem samples are not synonymous with high bitrate! I would hope Frank could look into what's going on with mpc on this sample.
The problem seems to be low volume. MPC --radio has some trouble with low-volume samples, especially when there's a slight amount of noise. Debussy.wav is just one example among hundreds with this problem.
The problem is shocking if the playback volume is exceptionally high, but is probably less annoying under normal playback conditions (which maybe explains the relatively good overall rating of the encoding - I expected it to be lower).
Note that the standard preset also suffers from this problem, but it's less critical...
Wow. That is interesting. LAME with the --athaa-sensitivity switch and aoTuV being that strong.
Thanks to all participants and - of course - to all these great codec developers, and especially to Roberto himself!!!
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy     iTunes  MPC   Vorbis  Lame  WMA   Atrac3
bitrate   128     155   149     133   128   132
Score     4.34    4.41  4.68    4.11  4.37  3.76
I am aware of the rationalization. I am aware of the overall average. But let this be an "oh?" to those that don't and aren't.
AC3??? Vorbious???
Get some sleep.
maybe it would make sense to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result for its lousy mp3 encoder?
- Lyx
lame's result is fairly amazing. I was about to begin encoding my cd collection into iTunes aac for an iPod I'm about to purchase. I think I'll just stick with lame now. Its level of quality combined with its compatibility between mp3 players is an unbeatable combination.
rjamorim, can you please make a zoomed "music store codecs only" chart too (aac, wma9, atrac3)? i think it would be very interesting and important to have such a chart handy for showing people that, when they have to choose where to buy songs from, not only the prices but also the quality is very important and varies a lot
btw did i already thank you for your great test? thanks a lot!
maybe it would make sense to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result for its lousy mp3 encoder?
yep, and maybe add "mp3" to lame too (and maybe ogg to vorbis), at least in the final chart, to exclude all possible misunderstandings
A big thank you to Roberto for his efforts in conducting this test. Let's hope that it is not the last too
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for Atrac3 and LAME, and the close-to-160 test for MPC/Vorbis.
Yup, it's hard to compare CBR encoders with VBR encoders.
Everything you do is wrong
Usually all encoders tend to produce files at around 128 kbps on an "average" sound file with the same settings. That's why I think it's ok to compare these codecs with these settings. Many test samples were chosen to be hard to encode (weren't they?). VBR encoders use higher bitrates in those complex situations. CBR encoders don't.
Bad luck for the CBR encoders.
So... you can ask yourself: Is the choice of test samples fair ?
I don't know...
bye,
Sebastian
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy     iTunes  MPC   Vorbis  Lame  WMA   Atrac3
bitrate   128     155   149     133   128   132
Score     4.34    4.41  4.68    4.11  4.37  3.76
I am aware of the rationalization. I am aware of the overall average. But let this be an "oh?" to those that don't and aren't.
That's why I suggested to put the bitrates into the score graphs for each sample ... so everyone can see at which average bitrate the codec's result has been obtained.
lame's result is fairly amazing. I was about to begin encoding my cd collection into iTunes aac for an iPod I'm about to purchase. I think I'll just stick with lame now. Its level of quality combined with its compatibility between mp3 players is an unbeatable combination.
I suggest you also do your own tests, concentrating for example on pre-echo etc. (I'm not saying that either one is better; I have not compared LAME 3.96 -V5 --athaa-sensitivity 1 against iTunes 4.2 with pre-echo).
Remember however that these are average results of a group, with a restricted number of samples and listeners of different abilities. It shows the quality on average pretty well, but doesn't necessarily show some of the details which might be interesting to you.
Also I think that Lame 3.96 -V5 --athaa-sensitivity 1 is not tested enough to say it doesn't fail (badly) in certain cases, even pretty often. Imo iTunes 4.2 AAC is safer in this sense.
But, if that's not a big deal, that Lame setting does seem pretty good on average.
A big thank you to Roberto for his efforts in conducting this test. Let's hope that it is not the last too
I second the thanks to Roberto and everyone else involved (including all the testers).
Roberto: come on, be honest, you would really miss all the hick-hack and nag-nag going hand in hand with the tests, wouldn't you
What about all the /.ers? :rolleyes:
Seems they were just interested in wasting bandwidth after all :lol:
More than 500 people downloaded the samples through bittorrent only - not counting HTTP downloads! :B
I won't ever understand these people. :frustrated:
I think a lot of people thought that the test was going to be very easy (me included), "Come on, it's 128kbit! That sounds like crap, everybody knows that.".
...only to find that no major imperfections could be found in the couple of samples tried. Sample 1 looks like it was one of the hardest ones to ABX; a very tough start, especially for someone who had set his mind on the assumption above.
And besides, ABX is an exhausting way of testing and it can be very frustrating/demotivating if you don't get the results you're expecting ;).
wow, now that's what I didn't expect
- wma9: lol, worse than mp3! (and i even wonder how it got rated that high; even at 128 it had this metallic sound sometimes) -> go away m$
it's a pity that wma9 Pro wasn't included in the test ...last test, when it was included, it performed quite well
wow, now that's what I didn't expect
- wma9: lol, worse than mp3! (and i even wonder how it got rated that high; even at 128 it had this metallic sound sometimes) -> go away m$
it's a pity that wma9 Pro wasn't included in the test ...last test, when it was included, it performed quite well
The answer to why wma9 Pro was not included is here: http://www.hydrogenaudio.org/forums/index.php?showtopic=20301&view=findpost&p=199103
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy     iTunes  MPC   Vorbis  Lame  WMA   Atrac3
bitrate   128     155   149     133   128   132
Score     4.34    4.41  4.68    4.11  4.37  3.76
I am aware of the rationalization. I am aware of the overall average. But let this be an "oh?" to those that don't and aren't.
Where did you get those numbers from?
@ Roberto
A big thanks for making this test possible. I hope you reconsider making more tests in the future.
About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?
This particular test should be called, "The 128 kbps test for iTunes/WMA, and the low-130 test for AC3 and LAME, and the close-to-160 test for MPC/Vorbious.
Leahy     iTunes  MPC   Vorbis  Lame  WMA   Atrac3
bitrate   128     155   149     133   128   132
Score     4.34    4.41  4.68    4.11  4.37  3.76
I am aware of the rationalization. I am aware of the overall average. But let this be an "oh?" to those that don't and aren't.
See here how the average bitrates were decided for this test (personally I'm not absolutely sure it was enough). Obviously the settings in the table closest to 128 were used:
http://www.hydrogenaudio.org/forums/index.php?showtopic=21079&view=findpost&p=207203
Also the correct average bitrates for the 18 samples tested are (instead of what you said):
iTunes MPC aoTuV Lame WMA Atrac3
128 136 135 134 128 132
Roberto> what software did you use to obtain the wma9 files? Is it VBR 2-pass 128 kbps? What decoder? I've tried to reproduce the same waveform with different settings, and I wasn't able to do it.
I'm very sorry I couldn't participate in this test.
However, many thanks to you Roberto - and isn't it nice to have such interesting results? I don't think many people expected this.
To me, these bitrates are very interesting. It's amazing both how good, and how bad, some things can sound at 128kbps.
Also, I bet almost every sample in this test was light years better than most people's typical experience of 128kbps lossy coding!
Cheers,
David.
Roberto> what software did you use to obtain the wma9 files? Is it VBR 2-pass 128 kbps? What decoder? I've tried to reproduce the same waveform with different settings, and I wasn't able to do it.
I already asked him about this.
http://www.hydrogenaudio.org/forums/index.php?showtopic=21370&view=findpost&p=210584
EDIT:It's certainly Bitrate VBR 128kbps, 44kHz, stereo VBR 1pass.
BTW, a thread has been made at Slashdot about the test results.
Vorbis And Musepack Win 128kbps Multiformat Test (http://slashdot.org/article.pl?sid=04/05/24/0623247&mode=thread&tid=141&tid=185&tid=188)
harashin> I need to try again. I probably made a mistake. Thanks
Other question: did someone post these results on minidisc-dedicated boards? It would be interesting, because a lot of MD users on those boards often said that atrac3@132 = mp3@192...
Thanks rjamorim for this test, and all testers for their time and results!
And imagine, the first Hungarian internet music store uses WMA instead of MP3, because it has "better quality" (and DRM, but they could have used AAC for DRM'd music)
Yeah, and thanks for the Hungarian references in the samples! (Bartók, Dances Hongroises - or whatever, they're "Magyar táncok" in Hungarian )
Other question: did someone post these results on minidisc-dedicated boards? It would be interesting, because a lot of MD users on those boards often said that atrac3@132 = mp3@192...
Actually, I used to hear things to the effect of "LP2 (132kbps) sounds better than a 320kbps MP3!!!" As much as I would love to send an email to the webmaster of the MD Community page or create a thread on one of their forums, I don't think posting the results will have much of an effect. If my past experiences are any indication of the MD community's openness to such data, it will be ignored or disputed via the usual subjective, empirically dubious arguments one might expect to be prevalent on such boards. For example, when I ABXed ATRAC-R with ease a few years ago and posted the results in the most unbiased manner possible, I was either flamed for being "anti-MD" or just outright ignored. The truth hurts, eh?
Roberto, kudos on another great test!
Cygnus X1> I've also read that MD > CD, because MD is 292 kbps and CD 176
This test is nevertheless different: there's not just one "biased" tester who posted false results, but a whole community, with 18 samples, under ABX conditions. Some opinions might change
Cool, these results clearly show that
WMA 1ST DEATH :[
/me wonders what the Extremetech editors will think when - if - they see these results...
This just in:
http://slashdot.org/comments.pl?sid=108647&cid=9238730
and
http://slashdot.org/comments.pl?sid=108647&cid=9238686
"actual ranking is Vorbis, iTunes, MPC, Lame, WMA, Atrac3" after "a quick-n-dirty compensation, [using] the average scores times 128 over the average bitrate."
ff123
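[Editor's note: the "quick-n-dirty compensation" quoted above is just a linear rescaling of each codec's mean score by 128 over its average bitrate. A minimal sketch with purely illustrative numbers (not the test's actual averages); note it assumes quality degrades linearly with bitrate, which, as ff123 points out later in the thread, is an oversimplification:]

```python
def compensated_score(score: float, avg_bitrate: float, target: float = 128.0) -> float:
    """The Slashdot-style 'compensation': scale a mean listening score
    by target bitrate over the codec's actual average bitrate.
    This implicitly assumes quality is linear in bitrate."""
    return score * target / avg_bitrate

# Illustrative numbers only: a VBR codec scoring 4.5 at 150 kbps average
# gets "compensated" down below a CBR codec scoring 4.0 at exactly 128.
print(compensated_score(4.5, 150.0))  # 3.84
print(compensated_score(4.0, 128.0))  # 4.0
```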
This just in:
http://slashdot.org/comments.pl?sid=108647&cid=9238730
and
http://slashdot.org/comments.pl?sid=108647&cid=9238686
"actual ranking is Vorbis, iTunes, MPC, Lame, WMA, Atrac3" after "a quick-n-dirty compensation, [using] the average scores times 128 over the average bitrate."
ff123
Not very hard to guess that these "compensators" would appear - people who don't want to understand the concept of VBR and how the test was conducted. Though I remember something like this on HA last time as well.
rjamorim, can you please make a zoomed "music store codecs only" chart too (aac, wma9, atrac3)? i think it would be very interesting and important to have such a chart handy for showing people that, when they have to choose where to buy songs from, not only the prices but also the quality is very important and varies a lot
Here it is. I probably won't add this graph to the results page. There are already plenty of graphs there.
http://www.rjamorim.com/test/multiformat128/plot18b.png
maybe it would make sense to rename "iTunes" to "iTunes AAC" in the summary chart, so that people do not mistake the iTunes result for its lousy mp3 encoder?
yep, and maybe add "mp3" to lame too (and maybe ogg to vorbis), at least in the final chart, to exclude all possible misunderstandings
God, no. If people are that uninformed, they shouldn't be even reading those results.
A big thank you to Roberto for his efforts in conducting this test. Let's hope that it is not the last too
Penultimate
That's why I suggested to put the bitrates into the score graphs for each sample ... so everyone can see at which average bitrate the codec's result has been obtained.
That will lead people to link bitrates with scores, just like happened at /. - and that is wrong.
I think a lot of people thought that the test was going to be very easy (me included), "Come on, it's 128kbit! That sounds like crap, everybody knows that.".
No worries, the next test will be at 48kbps. Even people with crappy $5 speakers (like me ) and tone-deaf ears will be able to participate.
it's a pity that wma9 Pro wasn't included in the test ...last test, when it was included, it performed quite well
It hasn't changed a bit since the last test. And I personally believe including WMA Pro in that test was a mistake (perhaps my second biggest mistake in conducting tests). When I included it, I expected Microsoft would soon start pushing it with all the might of their marketing department to make it replace WMA Std. Alas, that didn't happen. Microsoft seems to have settled on focusing WMA Pro on DVD players and industry usage, and keeping WMA Std. for consumer usage (portables, online stores, ripping at home...)
Moving on to next post...
About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?
Those samples had too few listeners and/or the results were too close to each other. When that happens, friedman.exe doesn't output the LSD (which is essential to build the confidence intervals) and says the results are "not significant" (which in practice means they are tied)
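[Editor's note: for the curious, the Friedman test behind those graphs ranks the codecs within each listener's results and checks whether the rank sums differ more than chance would allow. A pure-Python sketch of the statistic - this is not friedman.exe, and it omits the LSD post-hoc step used to draw the intervals:]

```python
def friedman_statistic(rows):
    """Friedman chi-square over `rows`, where each row holds one
    listener's scores for the same k codecs. Ties get average ranks."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            # extend j over a run of tied scores
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j + 2) / 2.0  # 1-based ranks i+1 .. j+1, averaged
            for t in range(i, j + 1):
                rank_sums[order[t]] += avg_rank
            i = j + 1
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Two listeners agreeing perfectly gives the maximum statistic for k=3:
print(friedman_statistic([[1, 2, 3], [1, 2, 3]]))  # 4.0
# Identical scores everywhere -> zero, i.e. no evidence of a difference:
print(friedman_statistic([[3, 3, 3], [3, 3, 3]]))  # 0.0
```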
Other question: did someone post these results on minidisc-dedicated boards? It would be interesting, because a lot of MD users on those boards often said that atrac3@132 = mp3@192...
MWAHAHA!
No worries, the next test will be at 48kbps. Even people with crappy $5 speakers (like me ) and tone-deaf ears will be able to participate.
Time for me to give it a try
Even the "Waiting" sample was un-ABX-able for me, when I tried at higher bitrates, I might be deaf
Other question: did someone post these results on minidisc-dedicated boards? It would be interesting, because a lot of MD users on those boards often said that atrac3@132 = mp3@192...
MWAHAHA!
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 - nobody has dared to answer yet. Maybe all the minidisc guys have had heart attacks after reading the results.
Here it is. I probably won't add this graph to the results page. There are already plenty of graphs there.
http://www.rjamorim.com/test/multiformat128/plot18b.png
why not? i would vote that everyone who thinks about buying songs from a music store which offers wma@128 should be forced to stare at this graph for two hours
maybe adding lame to this graph would also be good, to prove that probably even the "kazaa music store" will offer better quality at more reasonable prices than wma-based ones
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 - nobody has dared to answer yet. Maybe all the minidisc guys have had heart attacks after reading the results.
Let's all go over there and flame them. Mwahahaha!
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 - nobody has dared to answer yet. Maybe all the minidisc guys have had heart attacks after reading the results.
Let's all go over there and flame them. Mwahahaha!
http://microsoftusernetwork.com/forum/viewtopic.php?p=275
Not very hard to guess that these "compensators" would appear - people who don't want to understand the concept of VBR and how the test was conducted. Though I remember something like this on HA last time as well.
I have actually been waiting for the complaining about unfair bitrates to start. I personally find it hard to understand the complaining, after the reasoning behind it has been explained over and over.
Btw: Slashdot seems like a nice place to waste time. Why bother with the facts when you can assume things instead and base the discussion on those assumptions (in this case: why bother reading how the test was performed? Let's just assume how it was done and base the discussion on that). I only visit Slashdot when someone posts a link on this forum, and I don't plan on spending more time there either.
Reading discussions at Slashdot, makes me want to thank the Hydrogenaudio staff for keeping Hydrogenaudio a source of information, instead of a source of speculation, assumption and incorrect information. A big thank you to you all!
Also thanks to Roberto for yet another interesting test. Too bad they are always conducted when I have to prepare for my exams, but I don't really have a suitable listening environment for performing such tests anyway.
maybe adding lame to this graph would also be good, to prove that probably even the "kazaa music store" will offer better quality at more reasonable prices than wma-based ones
You won't find many lame -V5 --athaa-sensitivity 1 files on Kazaa, I reckon
IMHO the test is meaningless. If codecs can go over the 128 kbps average then essentially these codecs have cheated.
Don't get me wrong I've encoded most of my tunes into Vorbis as I feel it's a better codec than others, but I would like to see a fair comparison that fits the title of this listening test!!
IMHO the test is meaningless. If codecs can go over the 128 kbps average then essentially these codecs have cheated.
oh, come on! not again...
Good Test
Just the average bitrate is a little bit too high imo (itunes is ok though)...
*edit* hadn't seen ep0ch's post before...
oh, come on! not again...
Heh, it was asked for. Just my opinion!
I don't see how you can possibly disagree that it would have been a worse test if each sample had been encoded to give an average 128kbps per sample... oh I'll shut up to keep you all happy!!
*yawns* Well, the failure of the CBR codecs in this test is due to their developers: they did not write code smart enough to alter bitrate allocation dynamically. VBR is the key issue that requires proper tuning -> the test is fair.
Well, ATRAC has a cool name and that's about all, I guess. It looks like this test will be referred to many times in the upcoming years whenever someone claims ATRAC@132 sounds better than MP3@320. Maybe it's a good idea to open a Codec Comparison forum to post these tests in, for future easy access.
I really wish people wouldn't post links to slashdot. It's an incomparable juggernaut of stupidity. I can't resist the urge to go over and yell at them. It usually takes me about half an hour to realize that I'm just sticking my finger in a dike.
IMHO the test is meaningless. If codecs can go over the 128 kbps average then essentially these codecs have cheated.
AHHHHHHH!! You've got slashdotitus! Do they make a vaccine for this yet?
*runs off to wash hands*
Also I think that Lame 3.96 -V5 --athaa-sensitivity 1 is not tested enough to say it doesn't fail (badly) in certain cases, even pretty often. Imo iTunes 4.2 AAC is safer in this sense.
I take your points into consideration, but tbh I don't think my ears are up to the challenge of telling the two apart. Although I'm still undecided as to what format I will use.. I can never seem to make up my mind on these things
thanks to rjamorim for a very informative listening test.
-Brian
I don't see how you can possibly disagree that it would have been a worse test if each sample had been encoded to give an average 128kbps per sample...
It would have been worse because it wouldn't simulate a real-life situation. In real life, you'd choose one setting, and encode your music with it.. you wouldn't spend time encoding every single song 3 or 4 times until you reach 128kbps average.
My thanks to Roberto for organising such an enlightening test after the string of setbacks, and to Aoyumi for improving Vorbis to such an impressive level
Roll on the 48k test!
Btw: Slashdot seems like a nice place to waste time. Why bother with the facts when you can assume things instead and base the discussion on those assumptions (in this case: why bother reading how the test was performed? Let's just assume how it was done and base the discussion on that). I only visit Slashdot when someone posts a link on this forum, and I don't plan on spending more time there either.
Same here. I poked my head into the discussion surrounding this particular news item. Had to get out before my anger welled. Lots of big mouths just waiting for the chance to open and blah blah blah
BLAH. How totally useless. The headlines are all that is interesting.
Vorbis winning (I know, MPC is very close) a listening test? That didn't happen for a long time
Heh, as is tradition, I would like to thank Roberto and the participants for the test!
Good work, guys !
But I have one question about the MPC average bitrate.
It seems (I may be wrong) that for those two samples (Debussy, CouldBeSweet) the bit allocation mechanism or something else fails badly.
So, if we exclude those two samples, the average bitrate of MPC would be about 142 kbit for such a setting (close to the average bitrate in the previous test).
I can not tell it professionally, but maybe those two bitrate values should be excluded from the average bitrate calculation?
As I remember, it can be computed whether these bitrate results are statistically significant enough to include in the calculation...
But I have one question about the MPC average bitrate.
It seems (I may be wrong) that for those two samples (Debussy, CouldBeSweet) the bit allocation mechanism or something else fails badly.
So, if we exclude those two samples, the average bitrate of MPC would be about 142 kbit for such a setting (close to the average bitrate in the previous test).
I can not tell it professionally, but maybe those two bitrate values should be excluded from the average bitrate calculation?
As I remember, it can be computed whether these bitrate results are statistically significant enough to include in the calculation...
ItCouldBeSweet was purposely inserted into the test to compensate somewhat for the higher average bitrate of the other samples (Debussy was not chosen specifically for bitrate).
I don't think it's a matter of the bit allocation mechanism failing; it's that the samples were chosen such that their average bitrates were generally higher than 128 kbit/s. The rationale was that having such samples would make defects easier to detect. This is true for defects like pre-echo, but as we saw from the test, if having a high bitrate helps the VBR codecs, having a very low bitrate can also hurt them. In other words, problem samples can be found at either end of the bitrate spectrum.
I think the bitrate criticism has some validity, but probably not to the extent that the overall results would have been significantly different if the average bitrates were closer to 128 kbit/s. It's an oversimplification to assume a linear degradation with average bitrate.
ff123
Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?
ItCouldBeSweet was purposely inserted into the test to compensate somewhat for the higher average bitrate of the other samples (Debussy was not chosen specifically for bitrate).
it's that the samples were chosen such that their average bitrates were generally higher...
Oh, I didn't know that...
Thanks for the clarification, ff123!
I've got the point; it is material to think about.
I think the bitrate criticism has some validity, but probably not to the extent that the overall results would have been significantly different...
Agree completely
This was not a criticism, really. English is not my native language, as you can see; sometimes I can not explain my thoughts clearly, sorry.
But I will try
My point about the MPC (not Vorbis) bitrate was:
when you compute the average of a column of statistical data, you must exclude values that fall outside the 3-sigma boundaries. Only then is the result statistically valid.
So, in other words, did MPC with the setting used *really* produce an average bitrate of 136 kbit?
It seems not. Those two strange samples skew the statistics a bit, because they were specifically chosen (at least one of them). So users may (possibly) be confused when the real average bitrate with such a setting turns out to be 142 kbit...
BTW, this does not affect the rating calculations, only the bitrate...
This is my IMHO, of course.
Any opinion (and a clarification that I'm wrong, too) will be greatly appreciated
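The 3-sigma idea above can be sketched quickly in Python. The numbers below are made up to echo the figures discussed in the thread (about 136 kbit/s overall, about 142 kbit/s without the two low samples); they are not the real per-sample bitrates. The sketch also happens to illustrate a pitfall of the naive rule: with only 18 samples, the two outliers inflate sigma enough that they never get cut (two equal outliers can reach at most sqrt((n-2)/2) ≈ 2.83 standard deviations at n = 18).

```python
# Hypothetical per-sample MPC bitrates in kbit/s -- NOT the real test data.
# Chosen so the averages echo the thread: ~136 overall, ~142 without outliers.
bitrates = [140] * 8 + [144] * 8 + [86, 90]  # last two stand in for Debussy/CouldBeSweet

def mean(xs):
    return sum(xs) / len(xs)

m = mean(bitrates)
sigma = (sum((x - m) ** 2 for x in bitrates) / len(bitrates)) ** 0.5

# Naive 3-sigma filter, as proposed above: keep values within mean +/- 3*sigma.
kept = [x for x in bitrates if abs(x - m) <= 3 * sigma]

print(f"overall mean:          {mean(bitrates):.1f} kbit/s")              # 136.0
print(f"after naive 3-sigma:   {mean(kept):.1f} kbit/s")                  # still 136.0
print(f"dropping the 2 lowest: {mean(sorted(bitrates)[2:]):.1f} kbit/s")  # 142.0
```

So a plain 3-sigma cut on 18 values won't actually exclude the two samples here; a robust variant (median-based, or iterating the cut) would be needed to recover the ~142 kbit/s figure.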
My point about the MPC (not Vorbis) bitrate was:
when you compute the average of a column of statistical data, you must exclude values that fall outside the 3-sigma boundaries. Only then is the result statistically valid.
So, in other words, did MPC with the setting used *really* produce an average bitrate of 136 kbit?
It seems not. Those two strange samples skew the statistics a bit, because they were specifically chosen (at least one of them). So users may (possibly) be confused when the real average bitrate with such a setting turns out to be 142 kbit...
BTW, this does not affect the rating calculations, only the bitrate...
This is my IMHO, of course.
Any opinion (and a clarification that I'm wrong, too) will be greatly appreciated
Yes, I understand your point.
Ideally, you'd like the bitrate distribution to look somewhat like a bell curve with its mean at 128 kbit/s.
The two samples with extremely low bitrate do not compensate very well for the other 16 samples which are generally skewed above 128 kbit/s.
For the 48 kbit/s test, if there are VBR codecs, I think we should strive to have about an equal number of bitrates above and below the average bitrate (which should work out to be 48 kbit/s on average across the sample set).
The rationale about wanting to use "hard" samples does not apply at low bitrates.
ff123
Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?
IMHO, this test is pretty good evidence that LAME 3.96 performs better than 3.90.3 at these bitrates, at least. Personally, this is enough to convince me to use 3.96 for mid bitrates from now on, whatever the recommended version happens to be.
This is true for defects like pre-echo, but as we saw from the test, if having a high bitrate helps the VBR codecs, having a very low bitrate can also hurt them.
Ehhh. I only started thinking about this while writing my previous reply.
Things are not that easy with VBR encodings...
Maybe two-pass ABR is the best encoding mode?
Would I be opening up a can of worms to ask if this indicates that 3.96 should now be the recommended compile?
I think that this test should have little relevance as far as the recommended LAME version goes. Remember --preset standard is the setting we are really worried about there.
Maybe two-pass ABR is the best encoding mode?
It is the best for test conductors, for sure :B
Too bad only WMA implements it.
Ideally, you'd like the bitrate distribution to look somewhat like a bell curve with its mean at 128 kbit/s.
The two samples with extremely low bitrate do not compensate very well for the other 16 samples which are generally skewed above 128 kbit/s.
Yep, that is what I mean.
Anyway, it is great that such tests are performed!
Thanks again!
EDIT:
It is the best for test conductors, for sure :B
He-he
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 and nobody has dared to answer yet. Maybe all the minidisc guys got a heart attack after reading the results.
http://forums.minidisc.org/viewtopic.php?p=22321#22321
About the test results, I noticed that for some samples there are no confidence intervals on the graphs (bartok_strings, leahy, mahler, ordinary world). Did everybody score exactly the same on these samples, or maybe you just forgot to put the intervals on the graphs?
Those samples had too few listeners and/or results were too close to each other. When that happens, friedman.exe doesn't output the LSD (which is essential to build the confidence intervals) and says that results are "not significant" (what in practice means they are tied)
Hmmm... If the results are too close to each other, then it doesn't make sense to declare everything equal. For instance, in the leahy sample Vorbis gets 4.68 and ATRAC gets 3.76. If the confidence intervals are so tight, there is no way these two are statistically equal. And if there are too few listeners, then you cannot run any statistical tests on the samples anyway. BTW, how many listeners do you consider too few?
Is there a way you can upload the text files with the individual ranks for each sample tested? It is a real pain to build the tables manually from the xml files and I have to exclude ranked references from the start. I'm asking because maybe I can help with providing statistical results for these samples.
If the confidence intervals are so tight there is no way these two are statistically equal.
Quite the opposite - the confidence intervals are so broad that they all overlap - so there are no winners and no losers in that sample.
BTW how many listeners do you considers too few?
To make me happy, I need at least 20 valid results/sample.
Is there a way you can upload the text files with the individual ranks for each sample tested? It is a real pain to build the tables manually from the xml files and I have to exclude ranked references from the start. I'm asking because maybe I can help with providing statistical results for these samples.
First, download the .rar package containing all the XMLs. Decompress it to an empty folder.
Then, install python and Phong's wonderful Chunky:
http://www.phong.org/chunky/
At the folder you decompressed the RAR, run
python "C:\path\to\chunky" -n --codec-file="C:\path\to\codec\list\codecs.txt" --ratings=results --warn -p 0.05
The codecs.txt should be:
1, Vorbis
2, MPC
3, Lame
4, iTunes
5, Atrac3
6, WMA
It'll create all result tables (good to be fed to friedman.exe) at the empty folder, and will discard the ranked results that haven't been ABXd to a confidence of 0.05. Chunky is just too wonderful to be true! OMG!
Regards;
Me.
Thanks for the plug Roberto. :-)
If you have windows you don't need Python installed (the standalone windows binary version should work). You should also try out the --help option to get some other options. My personal favorite is the --spreadsheet option to output all the scores in a nice spreadsheet (CSV) format.
I intend to add an option for outputting the listener comments as browseable HTML.
I've tried to make the code fairly accessible, though it's gotten a bit crufty in recent versions (the XML support, for example, isn't as pretty and clean as I would like). The existing code is "incomplete"; there are some features I'd like to add still, but it does all the heavy lifting already (i.e. parsing result files into useful data structures and filtering out bad results). So, if you feel like "doing something" with the data and you know Python, feel free to jump in, fix features or add bugs.
Thanks a lot Roberto!
I think this test showed us, one more time, that open source is still better than any paid stuff. I don't want to start any open-source political fight here, as I am not a free software defender most of the time. But hey, come on!
I think what impressed me most in this test was LAME's climb. LAME is doing the impossible with MP3: improving it even more. Perhaps Gabriel should make a very simple preset that uses this configuration, so maybe we can see more and more nice MP3s around.
Anyway, congrats to all for the nice encoders. By the way, where is the winner? Did he see the results already?
By the way, where is the winner? Did he see the results already?
Indeed. Aoyumi should show up to receive a big round of applause. (http://pessoal.onda.com.br/rjamorim/clapping.gif)
If you have windows you don't need Python installed (the standalone windows binary version should work).
I get a 404 when I try to download the windows binary. I'm not on my linux box right now, so I'm downloading Python for Windows (9 MB takes a long time to download on 56k )
Ooops, should be fixed now.
This site apparently interviewed Roberto about the test:
http://p2pnet.net/story/1525
They got the contestants wrong, though.
ff123
This site apparently interviewed Roberto about the test:
http://p2pnet.net/story/1525
They got the contestants wrong, though.
Yeah, the site author mailed me earlier today asking for comments, and for that sexy picture.
I'll mail him asking him to correct the competitors list.
Another news site mentioning my test:
http://www.afterdawn.com/news/archive/5257.cfm
Ooops, should be fixed now.
Yes it is. Thanks.
Nope. It just won't work for me. All I get from chunky is
Parsing result files...
Traceback (most recent call last):
File "chunky", line 639, in ?
File "chunky", line 595, in main
File "abchr_parser.pyc", line 634, in __init__
File "abchr_parser.pyc", line 646, in _handleTargets
File "abchr_parser.pyc", line 697, in __init__
abchr_parser.Error: Sample directory names must end in a number.
But the directory names already end in a number!
But the directory names already end in a number!
The folder where you run chunky from (and where the SampleXX folders are) must be empty, i.e. no files there, only the 18 folders.
OK the program executed but all I got were files with:
%
% !EMPTY!:
Vorbis MPC Lame iTunes Atrac3 WMA
% Codec averages:
% 0.00 0.00 0.00 0.00 0.00 0.00
huh?
Somebody has posted the results at http://forums.minidisc.org/viewtopic.php?p=22300 and nobody has dared to answer yet. Maybe all the minidisc guys got a heart attack after reading the results.
http://forums.minidisc.org/viewtopic.php?p=22321#22321
Almost 400 page views and not a peep from anybody. I find this sort of response interesting... it's not like we're personally attacking MD. The test simply showed that its performance isn't up to par, fair or not. The exact same thing happened when I used to talk about pre-echo samples with ATRAC Type-R. I have to wonder, though, how many readers of that thread will rush out and buy a 1GB "Hi-MD" machine once they come out... although ATRAC3plus is technically a different animal (much bigger transform window, etc.) than ATRAC3, my expectations aren't very high for it either.
(Edit: I kant sphell)
OK the program executed but all I got were files with:
Oops.
Last detail: the file extension must be .txt :B
So please rename all XMLs to TXT (from cmd you can use: for /r %f in (*.xml) do ren "%f" "%~nf.txt", since plain ren has no /s switch)
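For anyone doing this outside of cmd, here is a minimal Python sketch of the same recursive rename; it assumes the directory layout described in the thread (SampleXX folders under one root directory):

```python
import os

def rename_xml_to_txt(root):
    """Recursively rename every .xml result file under `root` to .txt,
    so Chunky will pick the files up."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".xml"):
                src = os.path.join(dirpath, name)
                dst = src[:-4] + ".txt"  # swap the 4-char ".xml" suffix
                os.rename(src, dst)
                renamed.append(dst)
    return renamed

# e.g. run from the folder containing the 18 Sample folders:
# rename_xml_to_txt(".")
```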
Oops.
Last detail: the file extension must be .txt :B
So please rename all XMLs to TXT (from cmd you can use: for /r %f in (*.xml) do ren "%f" "%~nf.txt", since plain ren has no /s switch)
Got it! Finally! Thanks a lot Roberto. Here are some character graphs with confidence intervals included. I will make proper graphs tomorrow after I get some sleep, if you like (it's already 5:00 in the morning here! )
Bartok (p=0.851)
Level N Mean StDev -----+---------+---------+---------+-
Atrac3 16 4.3125 1.2894 (-----------*-----------)
iTunes 16 4.6438 0.9040 (-----------*-----------)
Lame 16 4.4688 0.8845 (------------*-----------)
MPC 16 4.5438 0.9736 (------------*-----------)
Vorbis 16 4.7375 0.5439 (-----------*------------)
WMA 16 4.4125 1.1621 (-----------*------------)
-----+---------+---------+---------+-
Pooled StDev = 0.9880 4.00 4.40 4.80 5.20
Leahy (p=0.532)
Level N Mean StDev -------+---------+---------+---------
Atrac3 12 3.758 1.672 (---------*--------)
iTunes 12 4.242 1.161 (---------*--------)
Lame 12 4.108 1.157 (---------*--------)
MPC 12 4.408 0.955 (---------*---------)
Vorbis 12 4.683 0.876 (---------*---------)
WMA 12 4.367 1.130 (--------*---------)
-------+---------+---------+---------
Pooled StDev = 1.186 3.50 4.20 4.90
Mahler (p=0.660)
Level N Mean StDev ---------+---------+---------+-------
Atrac3 12 3.617 1.777 (----------*---------)
iTunes 12 4.092 1.328 (---------*----------)
Lame 12 4.167 1.170 (----------*---------)
MPC 12 4.517 0.735 (----------*---------)
Vorbis 12 4.292 1.076 (---------*----------)
WMA 12 4.142 1.323 (---------*----------)
---------+---------+---------+-------
Pooled StDev = 1.274 3.50 4.20 4.90
Ordinary world (p=0.846)
Level N Mean StDev ------+---------+---------+---------+
Atrac3 13 4.4769 0.7939 (------------*------------)
iTunes 13 4.3846 1.0808 (------------*------------)
Lame 13 4.4538 0.8866 (------------*------------)
MPC 13 4.7077 0.6396 (------------*------------)
Vorbis 13 4.7077 0.7065 (------------*------------)
WMA 13 4.3077 1.3775 (------------*------------)
------+---------+---------+---------+
Pooled StDev = 0.9478 4.00 4.40 4.80 5.20
Good work Roberto.
The results are an upset win for ogg vorbis, and a significant improvement in the venerable Lame MP3 as well.
good work roberto ... i would continue to use LAME then till iRiver properly supports Vorbis
Okay, now that it is over, I want to make a few points.
I really couldn't tell the difference between any of these codecs and the wav at this bitrate. The first thing that will come to many of your minds is equipment, but I am using a decent pair of headphones (Grado SR-60's) and although I am working without a headphone amp on a Thinkpad which uses the Intel 855 AC'97 codec, things sound pretty good.
A few times I thought I heard a difference, but when I tried to ABX the set I was not successful. At one point I deleted my whole results fileset, thinking what I would be submitting wouldn't be acceptable. Though I reconsidered and figured I would go ahead and submit them with everything scored a five.
I am relatively new to this, and what is interesting to me is that when I use the myriad of training materials that exist, I can successfully hear the artifacts and problems when I am told what to listen for. I can successfully ABX those samples 100% of the time. However, in a blind test, when I don't know what I am listening for, I cannot hear a difference. I know quality plays a role here, but I am thinking my problem is that I don't have the attention span and attention to detail necessary (which would be consistent with how I react to other stimuli) to be good at this.
I would be very interested in hearing some isolated artifacts on a few of these samples so I can try to hear what I missed. It's been a learning experience anyway, thanks for letting me participate.
Wow, now that's what I did not expect
- Vorbis aoTuV: Vorbis is back, and I am proud to have helped find out which Vorbis encoder should be used
- MPC vs AAC: funny that MPC was that much better than iTunes (with an only 0.15 higher setting than in the last test)
- WMA9: lol, worse than MP3! (and I'm even surprised it got rated that high; even at 128 it had this metallic sound sometimes) -> go away M$
- ATRAC3: even worse than WMA9 -> go away Sony
And if you take this test as a comparison between some online music stores (iTunes vs. WMA9-based ones vs. Sony's new store), iTunes clearly comes out as the winner, leaving WMA9 behind by far!
ATRAC3plus is their new codec.
I do understand that the Sony Connect service currently offers ATRAC3, so stick to another format for those 99-cent purchases.
There are replies:
http://forums.minidisc.org/viewtopic.php?p=22345#22345
http://minidisct.com/forum/showthread.php?threadid=22995
Which implementation of ATRAC3 did this test use?
I only see FLAC decompressing the WAV; where does it originate from?
The hardware and the encoder in SonicStage may lead to different output.
Daijoubu,
I also had a hard time finding artifacts in the samples that I had time to listen to.
However, as in most sensory skills, practise makes perfect, so don't be too worried about your hearing being bad (in an absolute sense).
I've also trained using my own samples, lame/ff123/vorbis/klemm/vqf/MUS420/AES test samples, but clearly I still have a long way to go myself to properly hear even the most obvious of artifacts.
You are also right in assuming that attention plays a part in sensory detection. You will hear/see/taste/feel more if you know what to "look" for.
You can use the user comments to find problem parts in samples:
http://www.rjamorim.com/test/multiformat128/comments/comments.html
I tried to put what goes wrong, how and at what time in the playback of the sample (didn't do this to even half the samples though) for _my hearing_.
Using those comments from various testers, it is possible to guide your attention to listen for something specific and just repeat a certain part of the sample.
Note, however, that people are sensitive to different artifacts. I went through some of ff123's/Pio's comments and I didn't pay any attention to some of the stuff they heard and found annoying.
Guess, I'll have to train some more
best regards,
halcyon
The test was mentioned on The Screensavers last night. Not much was said beyond mentioning that the winners were Ogg Vorbis and Musepack. A link to the results was included in the shownotes (http://www.techtv.com/screensavers/shownotes/story/0,24330,3427263,00.html).
Patrick said that he didn't know how the tests were conducted (e.g. if they were blind etc), and that he was planning to download the test and try it out himself.
How many results were discarded because of ranked refs?
54
Mind you that I didn't discard results that ranked the reference but, on that sample pair, ABXed the samples to a p-value of 0.05 or less.
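For reference, that 0.05 threshold is the usual one-sided binomial p-value on the ABX trials: the chance of getting at least that many right by coin-flipping. A small sketch (the 7-of-8 and 6-of-8 examples are mine, not figures from the test):

```python
from math import comb

def abx_pvalue(correct, trials):
    """One-sided binomial p-value: probability of getting at least
    `correct` out of `trials` ABX trials right by guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(abx_pvalue(7, 8), 4))  # 0.0352 -- passes the 0.05 bar
print(round(abx_pvalue(6, 8), 4))  # 0.1445 -- does not
```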
Does that mean that all of the user's rankings for that sample were thrown out, or just for the codec(s) where they ranked the reference?
Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC, that would be somewhat interesting.
I guess I'm just looking for some information that would make me more comfortable that throwing out those 54 didn't distort the results in any obvious way.
I did not expect aoTuV to take first place.
This is a delightful miscalculation.
@Acknowledgement
First, I appreciate very much the people who performed this spontaneous comparison test, and my fellow Vorbis tuners.
And I am thankful to the people of Xiph.org, including Monty, who created libvorbis (& Ogg), the code base of aoTuV. Vorbis is a wonderful format!
Finally, I am thankful to all the people involved with this test.
Three things perhaps worth considering in the future. I noticed these myself, but I'm not sure others see them as important:
1) ABCHR Java version has, imho, some issues:
- buffer length that is small enough for fast switching can cause a lot of skips/gaps on playback (at least on my system. I think Gabriel may also have mentioned this?)
- Oftentimes, I found I was about to save a test result from the software with accidental "rankings". That is, I'd sometimes click accidentally on a sample that I knew was the reference, moving its slider from 5.0 to 4.9 (this happens, for instance, when I switch from the sound card volume mixer window back to ABCHR and accidentally click on the slider for a sample that I didn't mean to rate). Now, the problem is that this change from 5.0 to 4.9 on the UI itself is so small that it almost went unnoticed by me on various occasions. So, I was about to save/return results where I had erroneously/unintentionally ranked the reference/original sample (even though I had ABXed the reference from the test sample). This would have led to the results being discarded (i.e. wasted time for me and loss of data for the test). I wish there were a clear indicator (a colour or something) that showed when any of the sliders had been moved from the reference position (even if only 0.1 points). Just a UI design issue, and a minor one at that, but it can lead to discarding of perfectly "good" data.
- It is impossible to select the output sound card and/or the output method (DirectSound/WaveOut/ASIO/Kernel). On cards that have broken DirectSound (like the RME DIGI 96/8), this makes the card hard/useless to use. I had to resort to my worse sound card, worse headphone amplifier and worse headphones because of this. Not that it necessarily altered my listening accuracy at all, but it was a bummer not to be able to use the gear one is accustomed to. I wonder if there is any way around this limitation?
2) Intro: I've been reading Les Leventhal's AES papers, like "Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests". Mr Leventhal is a psychologist who understands auditory testing and statistical analysis issues on the subject of significance levels (I recommend J. Audio Eng. Soc., Vol 34, No 6, June 1986 as a starting point. He has further papers on the issue). While statistical analysis is not a substitute for a carefully thought-out research methodology and test setup, it can help to analyse non-ideal settings with higher confidence.
Suggestion: Considering Leventhal's points and the impossibility of making a perfect test: most tests don't even have an openly formulated research question, not to mention an analysis of the testing methodology in reference to the research question, both of which could actually further validate the results AND limit the scope of conclusions which can be drawn from the results.
With these in mind, I'd suggest considering the use of fairness coefficients in the significance calculation (especially in tests that have very small audio impairments and a low likelihood of detectability). Neurological research about the diminishing of auditory evoked responses with repeated tests also appears to support this conclusion.
3) For general use, for learning how to listen / training one's hearing, and for tests like the last 128 kbps test: could we build some general guidelines on how to conduct listening tests alone? That is, after somebody has offered the samples and the software, how does one actually carry out the listening and ranking in order to get the best out of it?
This could include issues like volume setting, selecting a good time to test, pros/cons of repeated fast switching, reinforcement of the neutral reference, attitudinal motivation, attention guiding, etc. All these can have a slight, and in some cases a dramatic, effect on the overall results (not necessarily changing any codec rankings, but enabling testers to find more artifacts). I already know of ff123's fine pages and they could serve as a starting point. We could inject some basic tips there culled from cognitive, neurological and audiological research. And your personal experience, of course.
Unfortunately I'm not much of a person to help with issues 1 & 2 any further, but maybe others can consider them for future alterations, if they feel they are important.
In 3 I could perhaps contribute, if others are interested.
Would this be a good Wiki project? Should we start a new thread to discuss this, if there are any interested parties?
regards,
halcyon
Oops.
Last detail: files extension must be .txt :B
So please rename all xmls to txt (ren /s *.xml *.txt)
I think that's my "oops." I thought I made it accept the .arf and .xml extensions... I'll fix that and a couple other little things and upload a new version tonight sometime.
With issue 1, why not just invoke a dialog box saying "Are you sure you want to change the ranking for sample x?" if you try switching from rating A to B? That way you could click [Oh crap nooooo] and not lose your previous rating.
I would appreciate some advice on testing dos and don'ts: basic technique combined with what to look for.
Three things perhaps worth considering in the future. I noticed these myself, but I'm not sure others see them as important:
1) ABCHR Java version has, imho, some issues:
- buffer length that is small enough for fast switching can cause a lot of skips/gaps on playback (at least on my system. I think Gabriel may also have mentioned this?)
I am slowly working towards implementing encryption in abchr for windows. The first part of that is to be able to decode xml setup files. I'm currently figuring out how to use expat/arabica to implement a document object model for xml (yes, I could have used msxml.dll, but I want something that works for all windows users without having to ask them to install updated dlls).
Hopefully being able to use a native windows app on pc/windows systems should take care of the clicking issue.
- It is impossible to select the output sound card and/or the output method (DirectSound/WaveOut/ASIO/Kernel). On cards that have broken DirectSound (like the RME DIGI 96/8), this makes the card hard/useless to use. I had to resort to my worse sound card, worse headphone amplifier and worse headphones because of this. Not that it necessarily altered my listening accuracy at all, but it was a bummer not to be able to use the gear one is accustomed to. I wonder if there is any way around this limitation?
In Java, you are restricted to the Java sound library. In abchr for windows, I only implemented waveOut playback, which is probably the most compatible method for existing PCs (plus it was convenient to use the MCI interface). I don't have plans to implement DirectSound or ASIO playback.
2) Intro: I've been reading Les Leventhal's AES papers, like "Type 1 and Type 2 Errors in the Statistical Analysis of Listening Tests". Mr Leventhal is a psychologist who understands auditory testing and statistical analysis issues on the subject of significance levels (I recommend J. Audio Eng. Soc., Vol 34, No 6, June 1986 as a starting point. He has further papers on the issue). While statistical analysis is not a substitute for a carefully thought-out research methodology and test setup, it can help to analyse non-ideal settings with higher confidence.
This sounds interesting. I should note that the method Roberto uses for analyzing the results favors finding differences at the expense of higher type I errors -- i.e., it does not correct for multiple samples.
Currently the biggest remaining criticisms I see of Roberto's tests are not statistical. I think the bitrate criticism should be tackled head-on in future tests. Bitrates over multiple albums and bitrates over the sample set should be about the same, IMO. So that means choosing samples which might not at first glance appear to be "difficult."
You ask how the test method affects the test. Well, in this case we have self-selected listeners and an ABC/HR test method. The self-selection is probably amplifying the differences. In the general population, I'd bet the vast majority of people would not find the differences this group of listeners did.
The ABC/HR and ABX test methods are also very sensitive, and certainly not representative of real-world listening. I think they also have a tendency to over-amplify differences (although those differences are very real). Bottom line: these tests are, if anything, too sensitive to represent everyday listening for the general population.
But for the people who actually care, they do a pretty good job of providing information on differentiating codec quality at a very subtle level.
ff123
This test was discussed slightly on the MiniDisc TBoard too -
http://www.minidisct.com/forum/showthread.php?s=&threadid=22995
Which implementation of ATRAC3 did this test use?
SonicStage2
I only see FLAC decompressing the WAV; where does it originate from?
Decoding the Atrac3 and encoding to FLAC. There's no other way to distribute the Atrac3 samples.
Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC, that would be somewhat interesting.
Hrm... you would have to check the output of Chunky with the command line I posted earlier to see which results are being discarded, and then analyze those results one by one.
I gotta say this has done a lot to restore my confidence in Vorbis, and I'm probably not the only one. Mad props to Aoyumi.
Given that LAME is now nipping at the heels of iTunes at 128k, I do really have to wonder about the AAC encoders that didn't win the last AAC listening test. I wouldn't be surprised if LAME is now ahead of or at least tied with Nero AAC. Who wants to test?
Vorbis winning (I know, MPC is very close) a listening test? That hasn't happened in a long time
As I said on #foobar2000 to someone saying the same thing: Learn stats, and post again.
I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win. Statistically speaking, it's more likely that Vorbis is better than MPC than the other way around, but you cannot say that Vorbis won with any level of certainty. It'd be something like a 60% probability that Vorbis is better and a 40% probability that Musepack is better (I pulled these numbers out of my ass for a visual example). I believe this test was run at a 95% significance level. Am I correct?
I believe this test was run with a significance level of 95%. Am I correct?
Erm... it's in the results page. Read the second sentence of "How to interpret the plots:"
Now, officially they are tied. But considering Vorbis' score is above MPC's confidence margin, I would say, with some confidence, that Vorbis aoTuV is better than MPC at this bitrate.
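To make the "tied at 95%" language concrete, here is a minimal sketch of the kind of calculation behind those overlap plots. The per-listener rating differences below are made up purely for illustration; the real data and analysis are on the results page.

```python
# Sketch: deciding "tied" vs "better" from paired listener ratings.
# The numbers below are hypothetical, NOT the actual test data.
import math
import statistics

# Hypothetical per-listener rating differences (Vorbis minus MPC)
diffs = [0.3, -0.1, 0.5, 0.0, 0.2, -0.2, 0.4, 0.1, 0.3, -0.1]

n = len(diffs)
mean = statistics.mean(diffs)
sem = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
t95 = 2.262  # two-sided 95% t critical value for 9 degrees of freedom

lo, hi = mean - t95 * sem, mean + t95 * sem
# If the interval contains 0, the codecs are statistically tied at 95%.
print(f"mean diff = {mean:+.2f}, 95% CI = [{lo:+.2f}, {hi:+.2f}]")
print("tied" if lo <= 0 <= hi else "significant difference")
```

With these invented numbers the interval straddles zero, so the verdict is "tied" even though the mean favors one codec, which is exactly the situation being argued about.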
I'd like to see sometime a double test. Meaning, another test after the first one with another set of samples, and see how close the final results are to each others.
I'd like to see sometime a double test. Meaning, another test after the first one with another set of samples, and see how close the final results are to each others.
You are invited to conduct it
Which implementation of ATRAC3 did this test use?
SonicStage2
I only see flac decompressing the wav, where does it originate from?
Decoding the Atrac3 and encoding to FLAC. There's no other way to distribute the Atrac3 samples.
Real time recording via internal loopback?
Real time recording via internal loopback?
Total Recorder.
It seems people outside HA do not understand the language I speak; they interpret 'yes' as 'no' and vice versa, 'better' as 'worse', and 'scientific, objective and replicable by yourself' as 'my mother did it and she owns the truth'
I'd like to see sometime a double test. Meaning, another test after the first one with another set of samples, and see how close the final results are to each others.
Couldn't you just split the current test into two 9-sample tests and pretend one was taken after the other? Comparing the results of these two 'sub-tests' would in effect be the same, and you've got the benefit of ~49'000 different ways to create two sub-tests. I would imagine from the previous comments that you wouldn't find a great amount of discrepancy; I got the impression that gone are the days when WMA was best for classical and mp3 best for metal (or whatever codec/genre associations there were)
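For the curious, the "49'000 different ways" figure is just a binomial coefficient: choosing which 9 of the test's 18 samples go into the first sub-test fixes the second sub-test automatically. A quick check:

```python
# Where the ~49'000 figure comes from: pick which 9 of the 18 samples
# form the first "sub-test"; the remaining 9 form the second.
from math import comb

ways = comb(18, 9)   # sub-tests labeled A and B
print(ways)          # 48620, i.e. roughly 49'000
print(ways // 2)     # 24310 if the two halves are interchangeable
```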
It seems people outside HA do not understand the language I speak; they interpret 'yes' as 'no' and vice versa, 'better' as 'worse',
Calm down...
People who WANT to understand, will understand.
People who do not care - will not...
There is an old Russian saying, I will try to translate:
When you argue with a fool, take care: other people might see no difference between you.
Erm... it's in the results page. Read the second sentence of "How to interpret the plots:"
Now, officially they are tied. But considering Vorbis' score is above MPC's confidence margin, I would say, with some confidence, that Vorbis aoTuV is better than MPC, at this bitrate.
Haha, yeah, figured I could find it out in a few moments, but I didn't really have them when I posted.
You make sense with the confidence margin thing, true, but you're likely going to start confusing the less statistically minded unless you stick pretty hardcore to the 95% confidence interval information. Either that, or you qualify the hell out of any statement that doesn't comply to the 95% interval.
You make sense with the confidence margin thing, true, but you're likely going to start confusing the less statistically minded unless you stick pretty hardcore to the 95% confidence interval information.
It's no use. This test is not controlled enough to warrant sticking to the 95% confidence as if it were gospel. Unlike ITU tests, I have no control over participants' listening environment, equipment, training, fatigue, etc. (and that's why ITU tests are damn expensive)
These results are there just to give an idea of how codecs rank. They are not trying to be definitive in what they report. And people should still test for themselves to decide which codec better suits them, and consider other features like availability, hardware support, etc, etc.
I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win.
Even if they are both tied (Vorbis and MPC), it means that Vorbis (and possibly MPC) won.
Was there any pattern in those 54 discarded results as to which codecs were misidentified? If, for example, half of those 54 were thrown out because they ranked the reference vs. MPC, that would be somewhat interesting.
Hrm... you would have to check the output of Chunky with the command line I posted earlier to see what results are being discarded, and then analyze these results one by one.
Oh, OK, I thought someone may have actually looked at those discarded results already.
Maybe someone can answer the other part of my question:
Does that mean that all of the users' rankings for that sample were thrown out or just for the codec(s) where they ranked the reference?
OK, so it looks like the VBR contenders did very well and iTunes's CBR held its own. How safe would it be to assume that using VBR with AAC (for instance the most recent FAAC with FB2K) would be a contender?
There are replies:
http://forums.minidisc.org/viewtopic.php?p=22345#22345 (http://forums.minidisc.org/viewtopic.php?p=22345#22345)
http://minidisct.com/forum/showthread.php?threadid=22995 (http://minidisct.com/forum/showthread.php?threadid=22995)
This guy's signature is great!
Best portable setup = 128kbps MP3 (super high quality, > CD!) -> transcoded to the best codec in the world, uber high quality ATRAC3/LP4 (5000% better than SACD) -> NetMD (faster than ur sh*tty firewire) -> N710 (EU version with 1.2mW x2, OH YEAHHHH BABY!) + MDR-E808 (bestest hedfonez in teh world!)
This will shizz on all ur lame iPods! Its sooooo clear dat I can almos feel teh mud flwing dwn da waterfal!
Worst portable setup = CD -> WAV -> (WaveGain @ 87dB) -> iTunes 4.5/QT 6.5.1 encoded 224kbps AAC or ALAC -> 3G iPod + Etymotic ER-4P
I am actually impressed with most responses there, but apparently some believe that the test is not fair because ATRAC was not tested on a preferred hardware DAC.
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense. The opposite actually makes more sense. On hardware, you must be worried about real time encoding, voltage consumption and battery consumption. On software, you can go nuts.
So, if Sony cut corners somewhere, it must have been on hardware due to inherent limitations.
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense. The opposite actually makes more sense. On hardware, you must be worried about real time encoding, voltage consumption and battery consumption. On software, you can go nuts.
So, if Sony cut corners somewhere, it must have been on hardware due to inherent limitations.
Even worse is the fact that some people claim ATRAC3 sounds "better" decoded through Type-S or their 1-bit digital amps, so the test is therefore invalid
I don't think that some people understand the point of comparing lossy codecs: it's not to see which one sounds "warm" or "fat" or "has better bass," it's to compare artifacts, with the best codec having the fewest and/or least annoying artifacts. I want to smack people when they claim that while ATRAC3 sounds worse than MP3 on the computer, it will sound better going through their 1-bit digital amp. NOOOO!!! An artifact is an artifact... a phasey cymbal or dropout will still be there no matter how good your amp or bass boost is. I'm personally surprised that although many people claim to be able to discern the "higher quality" of a 1-bit digital amp on certain players, they apparently aren't able to pick out what are sometimes blatant artifacts. I wonder how much of that can be attributed to marketing?
Replies gathering at:
http://microsoftusernetwork.com/forum/viewtopic.php?p=275 (http://microsoftusernetwork.com/forum/viewtopic.php?p=275)
where the response from the forum moderator is surprisingly dismissive. Oh well.
ff123
I'm surprised this "claim" hasn't been debunked yet. Vorbis did not win.
Even if they are both tied (Vorbis and MPC), it means that Vorbis (and possibly MPC) won.
I agree. To have a winner, you must have a loser(s). And there are some notable losers in this test (ie. ATRAC3). Since Vorbis and MPC are statistically tied, they both won over the rest.
I've uploaded chunky-0.8.4 which fixes the filename extension problem. Also, I've changed the default behavior so that it discards files with ranked references (i.e. -p 0.0 is assumed unless specified otherwise).
You can get it, as usual, at http://www.phong.org/chunky/ (http://www.phong.org/chunky/)
Does that mean that all of the users' rankings for that sample were thrown out or just for the codec(s) where they ranked the reference?
Yes, the whole result for that sample is thrown out. To do otherwise would taint the results. Even if you just guessed without listening, you would get about half of them right - if you just discarded the wrong ones, you'd still have half left with completely invalid ratings. The only safe route is to toss the whole result file.
On the other hand, it is possible that the reference was ranked inadvertently even if they did hear a difference (if it was very subtle). In those cases (i.e. when the differences are subtle), it's best to do an ABX test - if you are successful, the ranked reference won't cause your result to be discarded. If you fail the ABX test, then you know you probably didn't hear a difference and you shouldn't rank the sample at all (leave it at 5.0).
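A minimal sketch of the discard rule just described, assuming a hypothetical per-sample data layout (this is not Chunky's actual file format):

```python
# Sketch of the discard rule: if a listener ranked the hidden reference
# (gave it less than 5.0), the whole result for that sample is thrown
# out, not just the reference rating, since the other ratings can't be
# trusted either. Data layout here is hypothetical.

def keep_result(ratings: dict) -> bool:
    """Return True if this per-sample result should be kept."""
    return ratings.get("reference", 5.0) >= 5.0

results = [
    {"reference": 5.0, "vorbis": 4.2, "mpc": 4.5},  # valid result
    {"reference": 3.8, "vorbis": 4.6, "mpc": 4.1},  # ranked reference -> discard all
]

kept = [r for r in results if keep_result(r)]
print(len(kept))  # 1
```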
I agree. To have a winner, you must have a loser(s). And there are some notable losers in this test (ie. ATRAC3). Since Vorbis and MPC are statistically tied, they both won over the rest.
I meant that Vorbis didn't win when compared to Musepack. That's all. I didn't mean globally. Sorry for the confusion.
The arguments at the minidisc forums about hardware encoded Atrac3 sounding better than software encoded make no sense.
How can you jump straight to that conclusion?
The hardware ATRAC3 encoder in an MD player may use a different codebase from its software counterpart.
SonicStage may be to ATRAC3 what Blade is to MP3. We have to test it.
How can you jump straight to that conclusion?
The hardware ATRAC3 encoder in an MD player may use a different codebase from its software counterpart.
SonicStage may be to ATRAC3 what Blade is to MP3. We have to test it.
<sigh>
Have you ever even bothered reading the rest of my post?
Here, let me give you some knowledge. That way, you will think twice before posting next time:
On hardware, a developer must be concerned about constraints like voltage consumption, battery consumption, real time encoding, less precision (no FPU), a fraction of the CPU clocks, etc.
On software, the developer can go nuts since none of those restrictions apply.
In codec development, the usual path is first creating a software implementation (which will also later be used for compliance tests), and then cutting corners and complexity for the hardware version until it reaches the desired performance.
FOR THAT REASON, I claim it's nonsense. I don't claim it's impossible, maybe Sony has some serious voodoo going on there. But it does go against common sense.
Common sense is that they aren't deliberately putting a worse version of Atrac3 "like blade is for MP3" on SonicStage for kicks and giggles.
You're welcome.
Regards;
Roberto.
Well, I'm not sure about MPC.
Before the test I mentioned that MPC perhaps is only as good because it uses very high bitrates on these problem samples! But if the average bitrate is 128 for the tested quality setting, there should also be a lot of samples with bitrates under 128kbits! Logical, isn't it?
The problem with this test is that most samples had high bitrates, and the samples with small bitrates were not ranked as well!
For example, you could also modify an mp3 encoder to use very high bitrates (160kbits) on difficult samples and very low (80kbits) on normal samples. In this test it would probably be better than the current Lame encoder, but in practice there would be a lot of songs which would sound very bad!
I hope you understand what I mean. But perhaps my idea is totally wrong!?
Big_Berny
Congratulations to Roberto for once more pulling through a tough one. Great work, greatly appreciated.
Regarding the bitrate "issue"...
The encoders in the test were using standard settings, they were not specially tweaked for the test, and you can go ahead and use them with your songs.
So if some of the encoders have a flawed code to choose the bitrate in tough passages of music, well, it's their problem.
I think this test is really useful as an indication of which encoder does a better job with a setting that will end up giving an average 128kbps in a whole bunch of music. And I fail to see what's wrong with the idea.
Just to throw in my 2 cents, considering the ATRAC and WMA forums' responses:
I think we all more or less knew that there would be such a reaction when we posted these results. I even think some hoped for such a reaction, so they could say that these people make unsupported claims and such.
I myself trust the results, though I won't change my encoding habits: Lame aps for me, as I only have an mp3-capable portable. The people in the ATRAC/WMA forums won't do that either, IMHO, as they paid a lot of money to be able to use the formats they defend now.
And, to be honest, those people who really care about audio quality end up at HA eventually. And those who don't help the sales of lower-quality codec-capable devices skyrocket, because they only listen to the commercials.
And don't tell me you didn't read the "2 cents" warning
One more on-topic question: is it possible to send these results to portable manufacturers? Would it make sense to make a thread for collecting contact email addresses, so we could mail most portable manufacturers and give them a hint what to develop? E.g. Daisy MM (manufacturer of the Diva) wrote me in an email that they would consider implementing further codecs if their licensing fees are fine. So why not give the companies a hint?
Have you ever even bothered reading the rest of my post?
I did read your post on minidisc forum (and agree with you on that sense) before posting that.
My point is: if they (minidiscers) claim that their MD hardware encodes better, then we should consider their claim. Similar to how we select the best encoder for other codecs in your test.
My point is if they (minidiscers) claim that their MD hardware encodes better, then we should consider their claim. Similar to how we select the best encoder for other codecs in your test.
Consider, and then dismiss if there is no proof for the claim other than general subjective opinions.
If there is (semi-)scientific proof, it will be gladly accepted.
Guess what's gonna happen.
Oh I forgot to say that I want to thank you nevertheless (see my last post), Roberto!
In the next listening test we should use more songs with lower-than-average bitrates.
Big_Berny
My point is if they (minidiscers) claim that their MD hardware encodes better, then we should consider their claim. Similar to how we select the best encoder for other codecs in your test.
Consider, and then dismiss if there is no proof for the claim other than general subjective opinions.
If there is (semi-)scientific proof, it will be gladly accepted.
Guess what's gonna happen.
Well, they're the ones with the hardware.
Someone could easily record all the clips to their MD recorder via a digital link (from a non-resampling sound card), and then copy them all back to a PC. Then the three versions of each clip (original, software encoded, hardware encoded) could be the subject of a mini listening test. EDIT: like the LAME 3.90.3 vs 3.96 test, not like the present one.
I'm not suggesting Roberto should carry out such a test - I'm just saying it would be easy to prove this one way or the other.
As you said, it probably won't happen. Let's face it, MD users aiming for decent results aren't using this setting anyway, because it adds audible artefacts so often.
Cheers,
David.
Someone could easily record all the clips to their MD recorder via a digital link (from a non-resampling sound card), and then copy them all back to a PC. Then the three versions of each clip (original, software encoded, hardware encoded) could be the subject of a mini listening test.
If I had a NetMD, I would definitely do that test.
People who care about quality probably won't use MD.
Let's hope Apple implements VBR in their codec
The implementation in QT 6.5.1 / iTunes 4.5 is VBR see here (http://www.hydrogenaudio.org/forums/index.php?showtopic=21814&)
Let's hope Apple implements VBR in their codec
The implementation in QT 6.5.1 / iTunes 4.5 is VBR see here (http://www.hydrogenaudio.org/forums/index.php?showtopic=21814&)
ABR and recognizing silence is not the same as VBR.
If the codec knows it's encoding difficult music, it still can't flex the bitrate higher.
If the codec knows it's encoding difficult music, it still can't flex the bitrate higher.
How do you know it's unable to do that?
If the codec knows it's encoding difficult music, it still can't flex the bitrate higher.
How do you know it's unable to do that?
Because you can only set bitrates and the produced file has exactly that average bitrate?
It can vary in the song, but it can't vary among songs.
The implementation in QT 6.5.1 / iTunes 4.5 is VBR
Small clarification.
As far as I know, there are no CBR AAC encoders.
Two systems are used: VBR and ABR.
VBR is a quality-based mode: bitrate is adjusted to maintain constant quality (measured by S/N ratio, for example).
ABR is a bitrate-based mode: the average bitrate should be as defined.
CBR can (I'm unsure) be used with AAC, but by the ITU specs it is not defined.
AAC always uses ABR mode. If the bitrate fluctuations are no more than defined by the standard, it is considered CBR.
So, technically, you can consider iTunes encoding as CBR, if the requirements mentioned above are met.
BTW: I remember Ivan Dimkovic explained this somewhere on this forum, but I can't find it, period...
How do you know it's unable to do that?
It seems (from testing, try it) that average bitrate remains the same...
As far as I know there is no CBR aac encoders.
iTunes 4.2 is CBR. Frames are at a constant bitrate. Search for Ivan's explanation. I believe he refers to it directly as CBR, actually.
One more on-topic question: is it possible to send these results to portable manufacturers?
Problem is, unfortunately, codec quality is one of the least concerns of hardware manufacturers. They have much more to worry about first: hardware requirements for decoding, ease to implement on hardware, licensing fees, user demand...
iTunes 4.2 is CBR. Frames are at a constant bitrate.
Uhh. Found it at the end.
Ivan: AAC is always variable bit rate with following rules:
1. Maximum number of bits per one frame is in range from 0 to 6144, multiplied by the number of channels
See this thread: http://www.hydrogenaudio.org/forums/index....showtopic=8835& (http://www.hydrogenaudio.org/forums/index.php?showtopic=8835&)
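A quick worked example under the rule Ivan quotes. The 1024-samples-per-frame and 44.1 kHz figures are standard AAC parameters assumed here, not stated in this thread:

```python
# Worked example: with at most 6144 bits per frame per channel, and an
# AAC frame covering 1024 samples, the instantaneous per-channel
# bitrate ceiling at 44.1 kHz works out to:
max_bits_per_frame = 6144   # per channel, from the quoted rule
samples_per_frame = 1024    # standard AAC long-block frame (assumption)
sample_rate = 44100         # Hz (assumption)

frames_per_second = sample_rate / samples_per_frame
max_bitrate = max_bits_per_frame * frames_per_second
print(max_bitrate)  # 264600.0 bits/s, i.e. ~264.6 kbps per channel
```

So even a "CBR" AAC stream has considerable per-frame headroom below that ceiling, which is why the frames can flex while the average stays fixed.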
Well, I'm not sure about MPC.
Before the test I mentioned that MPC perhaps is only as good because it uses very high bitrates on these problem samples! But if the average bitrate is 128 for the tested quality setting, there should also be a lot of samples with bitrates under 128kbits! Logical, isn't it?
The problem with this test is that most samples had high bitrates, and the samples with small bitrates were not ranked as well!
For example, you could also modify an mp3 encoder to use very high bitrates (160kbits) on difficult samples and very low (80kbits) on normal samples. In this test it would probably be better than the current Lame encoder, but in practice there would be a lot of songs which would sound very bad!
I hope you understand what I mean. But perhaps my idea is totally wrong!?
Big_Berny
You make it sound like MPC is tweaked for winning listening tests. I believe it is tweaked to sound consistent through the entire track. I'm no expert, but it sounds to me like you kind of confuse VBR with ABR.
I'll try to explain my view on this matter. Then some of the experts can correct it
The average bitrate you see with a certain VBR quality setting is kind of a coincidence. When a lot of encoded material has a certain consistent quality during the whole track, it just happens to average to e.g. 128 kbps. If you take the settings used in this test and encode metal or another demanding genre, you won't see this 128kbps average anymore. The way you explain it, it sounds like it has to sacrifice large parts of the song to be able to boost the bitrate on the hard parts. That's not the idea of VBR (more like ABR, but not really that either). When a VBR setting uses only 80 kbps on certain parts, it's because it doesn't need more bits to reach the desired quality. It could have used more if it was needed.
Have you actually heard that the quality in between problem samples is lower? I'm not sure how people ended up providing the samples that they did, for this test. If they listened for problems, then from your reasoning, they wouldn't have provided the problem samples themselves, but rather the low quality parts between problem samples, as that low quality probably would stand out from the rest of the track.
CBR: Variable quality. High quality on easy parts, low quality on hard parts.
ABR: As constant quality as possible within the bitrate limitation. Use bits where they are most useful.
VBR: Constant quality.
So, from my point of view your idea is totally wrong. I think it would be possible, and a cynical marketing department might be tempted, but I would say that it isn't very likely the case here. These codecs are used every day, right?
IIRC Nvidia or ATI or both, tried to optimize their graphics drivers to win 3Dmark tests and make their cards look better, but I'm not sure if they sacrificed anything by doing it though.
Now, am I totally wrong?
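The CBR/VBR contrast described above can be illustrated with a toy bit-allocation model. The "difficulty" numbers are invented purely for illustration:

```python
# Toy illustration of the CBR/VBR distinction: "difficulty" stands in
# for how many bits a frame needs to reach a target quality.
difficulties = [80, 200, 100, 60, 160]  # bits needed per frame (made up)

# CBR: same bits every frame -> quality varies (hard frames get starved)
cbr_budget = 120
cbr_quality = [min(1.0, cbr_budget / d) for d in difficulties]

# VBR: spend what each frame needs -> constant quality, bitrate varies
vbr_bits = difficulties[:]             # every frame gets exactly what it needs
vbr_quality = [1.0] * len(difficulties)

print("CBR quality per frame:", [round(q, 2) for q in cbr_quality])
print("VBR bits per frame:   ", vbr_bits)
print("VBR average bits:     ", sum(vbr_bits) / len(vbr_bits))
```

Note that in this toy case both schemes spend 120 bits per frame on average, yet only VBR holds quality constant, which is the whole point of the distinction.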
I wonder why the test didn't compare files that were the same size, though I know it would be a royal PITA to repeatedly encode to get that. And then somebody would complain that time to encode should be the equalizing measure ...
You make it sound like MPC is tweaked for winning listening tests.
No! Sorry if it sounded like that! I think this is one of the most objective audio tests!
Sorry you misunderstood me. It's very difficult for me to explain because I speak German.
I'll try it again:
I'm only trying to say that MPC perhaps is only so good in this test because we only tested samples with high bitrates! (BTW, I know what ABR and VBR are!)
In this test we used quality settings to reach an average bitrate of 128kbits, right? I know that it's not the average bitrate of the samples that should be 128kbits, but the average bitrate of whole music collections with different genres. Right?
If you now look at the bitrate table you'll see that most of the samples encoded with MPC have a bitrate over 128kbits! I don't say that MPC is optimized for the listening test, but its bitrate spread is very high! And if we only test samples with high bitrates, MPC will probably give good results. But there must also be songs with a bitrate under 128kbits, because of the average bitrate of approx. 128kbits! Right?
And perhaps these "easy" samples could sound bad because they have a much lower bitrate than the other codecs at this quality setting!
Does someone understand this theory? It's only a theory, without any proof! But if you look at the test results you can see that the sample "debussy", with a very low bitrate, sounded bad with MPC!
Big_Berny
Big_Berny
you are just saying that musepack has an efficient vbr model... but saying this as if it is wrong or confusing or unfair...
this makes no sense
later
@Big_Berny:
I see what you mean after looking at the data for the Debussy sample. At first I thought your reasoning behind this low bitrate was for the wrong reason, but now I see that it probably wasn't.
I would also like to see a test like the one you suggest. If musepack or any other codec, uses too few bits in places it should have used more, then I suppose it would be useful to investigate it.
Btw: Explaining what I mean in English isn't my strength either...
Edit: Added the part about the Debussy sample
Big_Berny
you are just saying that musepack has an efficient vbr model... but saying this as if it is wrong or confusing or unfair...
this makes no sense
later
No, that's not what I want to say. (Very happy that upNorth seems to have understood)!
I want to say that MPC has a very strong VBR mode with very different bitrates at the same quality setting. In this test there were bitrates from 91kbits to 155kbits. You can call it efficient if you want.
But the problem is this: only two of the samples we tested had a bitrate under 128kbits! And one of them was rated very badly! I just want to say that perhaps the variation of the bitrate is too high, but nevertheless MPC will be rated very well in this test overall, because we (almost) only tested samples with bitrates over the average (for this quality setting).
In the next test we should perhaps also test some samples with very low bitrates, because that could be a serious problem for MPC.
If you only test difficult samples on a codec with a high bitrate variation, you'll get good ratings if the codec recognizes that the sample is difficult, because it will give it a high bitrate. But on the other hand it gives a very low bitrate to non-problem samples, so they perhaps end up with too small a bitrate. And then MPC will sound worse than the other codecs, which have a smaller bitrate variation and will give a higher bitrate on such a sample.
Example: Sample Kraftwerk (high bitrate)
Codec:   iTunes  MPC   Vorbis  Lame  WMA   Atrac3
Bitrate: 128     152   135     141   127   132
Rating:  4.30    4.78  4.30    3.32  3.11  2.29
Example: Sample Debussy (low bitrate)
Codec:   iTunes  MPC   Vorbis  Lame  WMA   Atrac3
Bitrate: 128     98    120     108   129   132
Rating:  4.67    3.53  4.91    3.75  3.95  4.54
You see now what I mean? Perhaps MPC is like "too efficient"!
Big_Berny
To: Big_Berny, upNorth.
Hey guys, the problem with MPC bitrate exists and was discussed on page 4 of this thread.
See ff123's comments on this issue and on how to avoid it in the future, if possible.
EDIT: grammar.
@SirGrey: I read it now, thanx.
But I think it was only mentioned that it is a problem of the test, and not that it could be a problem of the MPC encoder!
Big_Berny
I understand the point of Big_Berny, never thought about that.
He means that perhaps MPC could be failing on easy tracks, and as the test features mostly hard tracks, it is performing well.
It might also be a matter of track loudness. As an example, the current Lame version is likely to fail in vbr when using a low volume track, as it is not estimating loudness before encoding.
Perhaps this isn't the right time for me to comment on it but I think it would have been superior to test LAME 3.96 ABR instead of VBR, i.e. "--preset 128" or something similar. At lower bitrates ABR is generally considered more effective than VBR. Theoretically VBR should be best but that's not going to be the case in the real world always.
Of course I have not been keeping track of much that is going on so perhaps I missed something.
Perhaps this isn't the right time for me to comment on it but I think it would have been superior to test LAME 3.96 ABR instead of VBR, i.e. "--preset 128" or something similar. At lower bitrates ABR is generally considered more effective than VBR. Theoretically VBR should be best but that's not going to be the case in the real world always.
This was checked before the test. Based on pre-tests, VBR was chosen.
At lower bitrates ABR is generally considered more effective than VBR
For bitrates around 128kbps, it is now time to change this consideration.
Gabriel: I understand the point of Big_Berny, never thought about that.
Me too
ff123 mentioned that on the page I pointed to... Never thought of it that way before.
Interesting question to think of.
Big_Berny: But I think you only mentioned that it is a problem of the test and not that it could be a problem of the MPC encoder!
He-he.
That depends on your point of view and methodology.
Someone in that discussion proposed using MPC with a different setting for every song, to be sure the avg bitrate is 128Kbit. So, in that case MPC would have no problems!
But from another perspective, using just one setting is much more consistent...
BTW: I never used MPC for encoding, except for testing.
As I understand it, it is a tweaked Layer 2 encoder, so for bitrates less than ~130Kbit it should produce low quality output... (?)
Maybe somebody familiar with MPC could explain its behaviour?
Gabriel: ...as it is not estimating loudness before encoding.
Oh. Thing to do for version 4 ?
I understand the point of Big_Berny, never thought about that.
He means that perhaps MPC could be failing on easy tracks, and as the test features mostly hard tracks, it is performing well.
Thank you! That's exactly what I mean! You explained most of my thoughts in one simple sentence...
Big_Berny
To: Big_Berny, upNorth.
Hey, guys, problem with mpc bitrate exists and was discussed on the page 4 of this thread.
See ff123 comments on this issue and how to avoid this in the future, if possible.
EDIT: grammar.
I had read the whole thread already, but forgot about that discussion, sorry.
I still have the feeling that the bitrate is a topic because of the fact that this is a 128kbps test. If the motivation for adding more samples with low bitrate is only, or partly, to make the average bitrate look better (closer to 128kbps), then I would say the results of such a test would be less interesting. As ff123 has said already, some problems arise because of a too low bitrate (as seen with Debussy sample), and as I see it, that is a valid reason for using more such low bitrate samples. Add it because it is another type of problem sample, not because it makes the average closer to 128kbps.
Shouldn't the samples also be picked so that all codecs have something to struggle with? Like two samples WMA struggles with, two that mp3 struggles with, and so on? Or maybe that would be too much to ask if as many genres as possible should be covered?
My point of view is that when it comes to bitrate, the only thing that counts is the long-term average. Doing some artificial tweaking to make all codecs have the same average bitrate on all these short samples would ruin the value of the test for me. In short, I agree with the way things are done already.
Then a question, or more like checking whether my understanding is right: if a codec were perfect, wouldn't it then, at a specific quality setting, receive the same rating for all samples?
I'm sorry if all of this is old "news" and covered in another thread; if so, I would be grateful if someone could point me to it.
I'm just trying to see if my understanding and way of thinking is right.
Btw: As it takes me a while to write, a lot has happened in the meantime. I see now that Gabriel has picked up on the point Big_Berny made.
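The "long time average" point can be made concrete: a collection's average bitrate is total bits over total duration, i.e. a duration-weighted mean, which a plain mean of per-track bitrates can misrepresent badly. A toy example with made-up track data:

```python
# Duration-weighted average bitrate vs. a naive mean of per-track
# bitrates. The track numbers below are invented for illustration.
tracks = [
    # (bitrate in kbps, duration in seconds)
    (155, 180),  # short, demanding track
    (91, 600),   # long, easy track
]

total_bits = sum(kbps * dur for kbps, dur in tracks)
total_dur = sum(dur for _, dur in tracks)
weighted_avg = total_bits / total_dur
plain_avg = sum(kbps for kbps, _ in tracks) / len(tracks)

print(round(weighted_avg, 1))  # 105.8 -> dominated by the long easy track
print(plain_avg)               # 123.0 -> the misleading simple mean
```

This is why a handful of short, hard test samples averaging well above 128 kbps says little about what a full music collection will average.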
My point of view is that when it comes to bitrate, the only thing that counts is the long time average.
Yep. Of course. And here the problem with mpc lies.
So, to summarize my IMHO, formed by the last test and this thread:
1. MPC is a VBR encoder (or at least VBR is the mode in which it performs much better).
2. Tests on many different albums showed (see the 128Kbit test discussion thread) that the 4.15 setting produces an average bitrate of ~130Kbit, and that is ok.
3. The idea behind selecting the samples for the test is to make the encoders' job harder.
4. So, the average MPC bitrate rises from 130Kbit to 142Kbit on the test samples.
5. To keep the avg MPC bitrate around the projected value (the result was 136Kbit), additional easy sample(s) (one was selected, as it happened) were chosen.
6. MPC failed on these samples.
So, the question: does MPC have a high score because of its quality, or because of the samples selected?
Correct me if I'm wrong somewhere...
BTW: ff123's idea to use an equal number of over-bitrate and under-bitrate samples could correct the situation. Maybe. That's why I wrote that the error could be in the test setup, not in MPC...
The Debussy sample is a great sample for Frank Klemm when he does future MPC tuning. If indeed the bitrate was too low for MPC to give good quality, then that is an issue that needs to be fixed.
From the Vorbis side, I think guruboolez tested the 1.0 encoder on one sample (may have been creaking sample or brahms) at q 0 that produced a low bitrate of 40 kbps or something. It sounded pretty bad. Monty took note of this and made some tweaks to produce 1.0.1 which now gives a more realistic bitrate (somewhere close to nominal 64 kbps) and sounds much better now.
Quick contribution of some more humor to this thread: someone posted the results on a Windows tech forum, and someone else found it funny to reply with this:
I have audible problems with LAME so I use Blade Enc and all's fine now.
http://www.neowin.net/forum/index.php?showtopic=171304 (http://www.neowin.net/forum/index.php?showtopic=171304)
QuantumKnot> I sent two samples last year: Stockhausen - Stimmung (vocal music) and Liszt - Harmonies Poétiques et Religieuses (piano). Both at very low volume (the original; I didn't change it).
With the 1.0.1 encoder, the problem is corrected (or close to it).
I played yesterday with MPC 1.14 -q4.15. I encoded 10 CDs of piano music. The average bitrate was < 100 kbps (for the complete works of Erik Satie, 5h30 of digital recording, the bitrate was ~90 kbps). In other words, the average bitrate of the Debussy sample isn't an accident... or rather, it is a very usual one!
Low volume is just a part of the quality problem. I encoded a piece of contemporary music, very quiet too, and without background noise (lossless encoding reached 20%): the bitrate was below 80 kbps, and the quality was much better, far from the disaster of Debussy.wav. This means that MPC can achieve good reproduction even at a very low bitrate...
Maybe MPC's problem is low volume + background noise? To be verified...
Heh... just as the Slashdot hammering on my site starts to subside, I get Slashdotted again.
In Japan (http://slashdot.jp/article.pl?sid=04/05/27/020254)!!!
I wonder if the SNR there is as big as in slashdot.org...
Edit: BTW, I recommend you don't use Babelfish
Using part the some overseas, when the domestic technician builds up in the country, when it is called "domestic production" is many, because is.
So, including to the part, when it makes from one, it becomes "the purity domestic".
Your head will explode.
Roberto,
Can you clarify the sample sizes again? Does your N mean the total number of people who listened to all songs, or the number of people per song? And what is the sample size for the final ANOVA?
Btw, can you make the actual dataset available for other people to analyze?
As a final comment, I think this is too low a sample size.
Jaester
Can you clarify the sample sizes again? Does your N mean the total number of people who listened to all songs, or the number of people per song? And what is the sample size for the final ANOVA?
What do you mean by sample sizes?
N is the number of results I received for that sample, minus discarded results.
Btw, can you make the actual dataset available for other people to analyze?
It's already there.
http://www.rjamorim.com/test/multiformat12...s/comments.html (http://www.rjamorim.com/test/multiformat128/comments/comments.html)
As a final comment, I think this is too low a sample size.
?
Hi !
I just switched on my iRiver H120 in shuffle mode. I'm currently hearing a Madonna song I encoded many years ago with a Fraunhofer encoder at 128 kbps. It sounds very cool, probably due to the use of intensity stereo. I guess LAME could benefit from intensity stereo in the 128-ish bitrate area. I really wonder what ranking it would have got...
I'm currently too lazy to search all postings, but in case anyone knows: why has LAME been chosen for MP3 encoding in this test? It lacks intensity stereo support.
I guess the aoTuV beta 2 encoder would sound lousy in the 128-ish area if it did not make use of intensity stereo (or point stereo, whatever you want to call it)
bye,
SebastianG
As a final comment, I think this is too low a sample size.
Typically, a sample size of 30 is recommended to be representative of a population.
However, we already know going into these types of open tests that the group of listeners who'll respond are not going to be representative of the general population. So if you accept the proposition that this group of listeners represents the smaller population of "listeners who care (LWC)," then all you need are enough samples to produce a significant result.
You can get significant results this way from just
one person, as long as he represents the LWC (e.g. Dibrom tuning lame). However, more listeners are desirable, of course, to average out individual biases (for example, my limited high-frequency ability predisposes me to the sound of *gasp* wma9 standard). As was shown in the Vorbis and mp3 listening pre-tests, even a handful of listeners listening to multiple samples can produce reliable and accurate results.
ff123
I just switched on my iRiver H120 in shuffle mode. I'm currently hearing a Madonna song I encoded many years ago with a Fraunhofer encoder at 128 kbps. It sounds very cool, probably due to the use of intensity stereo. I guess LAME could benefit from intensity stereo in the 128-ish bitrate area. I really wonder what ranking it would have got...
I'm currently too lazy to search all postings, but in case anyone knows: why has LAME been chosen for MP3 encoding in this test? It lacks intensity stereo support.
I guess the aoTuV beta 2 encoder would sound lousy in the 128-ish area if it did not make use of intensity stereo (or point stereo, whatever you want to call it)
FhG does not use intensity stereo at 128 kbit/s. IS is a low bitrate technique, in the same vein as spectral band replication, and isn't meant to produce near transparent encodings.
However, that isn't to say that old FhG encodings can't sound competitive. Roberto's last mp3 test (designed to find the best mp3 encoder at 128 kbit/s, and which lame won) did not include the super slow FhG encoder, which many people with very good high frequency hearing might like best as their mp3 encoder at 128 kbit/s.
ff123
why has LAME been chosen for MP3 encoding in this test? It lacks intensity stereo support.
IMHO because:
1. It is most widely used mp3 encoder.
2. It is, I think, on par with (or maybe even better than) the old mp3enc31.
I'm sure Gabriel could answer this question more correctly.
BTW, did you know that mp3enc31 cost $199? And you cannot purchase it now.
I think most people used it illegally
Oh, and I think LAME USES joint stereo at 128kbit.
And IS (intensity stereo) corrupts the stereo image, thus it is not recommended for such a *high* bitrate as 128kbit.
EDIT: ff123 was faster
EDIT2:
Roberto's last mp3 test (designed to find the best mp3 encoder at 128 kbit/s, and which lame won)
Forgot to mention it as 3. Sorry...
Lame is the most widely used MP3 encoder? Perhaps around here. I would have to say most people are using variations on royalty-paying MP3 encoders, probably Fraunhofer, that are included in various all-in-one music solutions. Because of its so-so legal status, none of these programs can incorporate Lame, even if many popular applications work with it. What about MusicMatch? It comes on nearly every PC.
Frankly, I am amazed at all the tiny details that seem so fascinating around here.
IMO, Roberto's test is a blockbuster. Look at the politics: an unofficial build of the open-source Ogg Vorbis encoder blows everything away. Two proprietary solutions, one from the hated MS and the other from Sony, a mega copyright holder, make a weak showing. The highly compatible and easy-to-find Lame MP3 shows it has a bunch of life left in it. That is headline news in digital audio compression if there ever was any.
Lame is the most widely used MP3 encoder? Perhaps around here.
Heh. You are probably right.
Personally, neither I nor my friends have ever used any *box* solutions (except Nero, which comes with all my writers), so I simply didn't count them. My fault
MusicMatch, iTunes and so on have a huge audience, really...
FhG does not use intensity stereo at 128 kbit/s. IS is a low bitrate technique, in the same vein as spectral band replication, and isn't meant to produce near transparent encodings.
However, that isn't to say that old FhG encodings can't sound competitive. Roberto's last mp3 test (designed to find the best mp3 encoder at 128 kbit/s, and which lame won) did not include the super slow FhG encoder, which many people with very good high frequency hearing might like best as their mp3 encoder at 128 kbit/s.
ff123
I've used an old l3enc which DOES make use of IS, even at 192 kbps.
As for "near transparency": Current Vorbis encoders make use of IS at up to -q5.99. They just don't call it Intensity stereo. Monty seems to have a very different (official) point of view regarding this. He talks about diffuse and point images in the specification. Well, it's basically the same as intensity stereo.
(Maybe seeing/interpreting things from a different angle helps avoiding patent issues, I don't know...)
Anyway, I'm surprised that LAME performs so well WITHOUT intensity stereo in the 128-ish bitrate area. Same for FAAC (no IS, AFAIK).
I guess Vorbis will have strong competitors when LAME and FAAC start making use of IS for that kind of bitrate (perhaps PNS for FAAC, too).
time will tell.
bye,
Sebi
BTW, did you know that mp3enc31 cost $199? And you cannot purchase it now.
I think most people used it illegally
Believe it or not, I registered l3enc back in 1997, together with the WinPlay3 software for Win3.11
Oh, and I think LAME USES joint stereo at 128kbit.
And IS (intensity stereo) corrupts the stereo image, thus it is not recommended for such a *high* bitrate as 128kbit.
Yeah, I wasn't talking about joint-stereo coding in general. LAME does M/S coding as one possible joint-stereo coding technology, but not IS.
How do you define stereo image?
It is a widely believed fact that we are unable to perceive phase differences of high frequencies, so IS is an appropriate tool, even for near transparency encodings.
bye,
Sebi
Roberto's last mp3 test (designed to find the best mp3 encoder at 128 kbit/s, and which lame won) did not include the super slow FhG encoder
It did: AudioActive (i.e., slowenc with some tunings done in AudioActive)
IMO, Roberto's test is a blockbuster.
Thank you, but that's the reason conducting listening tests is nearly impossible now
I've used an old l3enc which DOES make use of IS, even at 192 kbps.
Buoy...
From l3enc documentation (taken from good ol' ReallyRareWares (http://www.rjamorim.com/rrw)):
*****For l3enc 2.0*****
For bitrates <= 96 kbps, the default is intensity stereo (-mod 1). For
bitrates >= 112 kbps, the default is ms-stereo (-mod 0). For
more details about encoding modes, please refer to section 1.11 'Encoding
Recommendations'
For coding of stereo files with bitrates <=96 kbps, the use of intensity
stereo is highly recommended. This is also the default configuration of
the encoder. Note, however, that the use of intensity stereo will destroy
information which is needed for sound processing schemes like
Dolby Surround. For bitrates >= 112 kbps, intensity stereo is not used by
default.
*****For l3enc 2.72*****
For the coding of stereo files with bitrates <=96 kbit/s, the encoder
will use the intensity stereo technique.
Note, however, that the use of intensity stereo may demage information
which is needed for sound processing schemes like Dolby Surround.
For bitrates >= 112 kbit/s, intensity stereo is not used.
Which means that, if you got IS at 192kbps, you were messing with something you shouldn't have
As for "near transparency": Current Vorbis encoders make use of IS at up to -q5.99. They just don't call it Intensity stereo. Monty seems to have a very different (official) point of view regarding this. He talks about diffuse and point images in the specification. Well, it's basically the same as intensity stereo.
I always understood Vorbis' implementation as a variation on M/S stereo, not IS.
After all, it's very well known that IS completely ruins the stereo image. There were some pre-tests for my listening tests that came to that conclusion (look for a post by tigre, IIRC)
Anyway, I'm surprised that LAME performs so well WITHOUT intensity stereo in the 128-ish bitrate area. Same for FAAC (no IS, AFAIK).
Same for MPC and iTunes.
Actually, IS was once available in MPC, and IIRC Andree removed it because it had no place in a codec targeted at high quality
I guess Vorbis will have strong competitors when LAME and FAAC start making use of IS for that kind of bitrate (perhaps PNS for FAAC, too).
I keep my point that using IS at bitrates above 96kbps is a very bad idea, except on very specific cases.
Regards;
Roberto.
Because of its so so legal status none of these programs can incorporate Lame, even if many popular applications work with it.
afaik, you are wrong: you are required to pay a license for the right to implement/use an MP3 encoder, so after that, you can use LAME legally if you want.
Roberto's last mp3 test (designed to find the best mp3 encoder at 128 kbit/s, and which lame won) did not include the super slow FhG encoder
It did. Audioactive (I.E, slowenc with some tunings done in AudioActive)
Audioactive is a different beast from the very slow codec, which is best represented by mp3enc31, or by using fastencc.exe in -hq mode (this version of the very slow codec has a higher lowpass than mp3enc31).
Audioactive/Opticom/"radium" can be grouped together, but not in the same family as mp3enc31/fastencc.exe -hq.
mp3enc31 is recognizable by low-frequency glitches. Ironically, bAdDuDeX (an mp3 connoisseur from long ago), who could hear a 16 kHz lowpass in applaud.wav, loved mp3enc31 despite the glitching and despite its relatively low 14.5 kHz lowpass, because it was free from high-frequency ringing.
ff123
Frankly, I am amazed at all the tiny details that seem so fascinating around here.
IMO, Roberto's test is a blockbuster.
I agree completely with both of these statements. They both, in their own way, provide interesting reading. Roberto's tests "always" provide informative, useful and constructive information, while the former simply serves to amuse and utter the occasional "WTF is that about?"
Later.
Heh... just as the Slashdot hammering on my site starts to subside, I get Slashdotted again.
In Japan (http://slashdot.jp/article.pl?sid=04/05/27/020254)!!!
I wonder if the SNR there is as big as in slashdot.org...
It seems to be even worse than the one on .org. I found some guys saying that Sony rules and this test sucks, or that they should have used Japanese songs for samples. A guy who has his doubts about the test didn't even know it was a double-blind test
It is a widely believed fact that we are unable to perceive phase differences of high frequencies, so IS is an appropriate tool, even for near transparency encodings.
The problem with MP3 IS is that it's not possible to restrict IS usage to certain frequencies: you can only switch stereo modes at the block level, not at the frequency level.
I've used an old l3enc which DOES make use of IS, even at 192 kbps.
Buoy...
Ahhh!
I don't know what was going wrong....
I just trimmed a frame out of the middle and checked for myself:
Header: FF FB 90 64 = 1111 1111 | 1111 1011 | 1001 0000 | 0110 0100
=>
MPEG 1, Layer 3, 128 kbps, 44100 Hz, Joint-Stereo
mode_extension = 10 => M/S coding: yes and IS coding: no.
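For anyone who wants to repeat this check, here is a minimal sketch of the same field decoding (a hypothetical helper, not part of any library), following the MPEG-1 Layer III header layout read off by hand above:

```python
# Free-format (0) and the standard MPEG-1 Layer III bitrate table, in kbps
BITRATES_V1_L3 = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES_V1 = [44100, 48000, 32000]

def decode_header(b: bytes) -> dict:
    """Decode the fields of a 4-byte MPEG-1 Layer III frame header."""
    h = int.from_bytes(b, "big")
    assert (h >> 21) & 0x7FF == 0x7FF, "no sync word"
    version = (h >> 19) & 3            # 3 -> MPEG 1
    layer = (h >> 17) & 3              # 1 -> Layer III
    bitrate = BITRATES_V1_L3[(h >> 12) & 0xF]
    samplerate = SAMPLE_RATES_V1[(h >> 10) & 3]
    mode = (h >> 6) & 3                # 1 -> joint stereo
    mode_ext = (h >> 4) & 3            # bit 1 = M/S on, bit 0 = IS on
    return {
        "mpeg1": version == 3,
        "layer3": layer == 1,
        "bitrate_kbps": bitrate,
        "samplerate": samplerate,
        "joint_stereo": mode == 1,
        "ms": bool(mode_ext & 2),
        "is": bool(mode_ext & 1),
    }

# The header quoted above: FF FB 90 64
print(decode_header(bytes.fromhex("FFFB9064")))
```

Running this on `FF FB 90 64` reproduces the manual decode: MPEG 1, Layer III, 128 kbps, 44100 Hz, joint stereo, M/S on, IS off.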
I apologize for that. Might have been fuzzy memory or something.
Also, I wasn't that experienced back in 1997.
But I remember that I fed l3enc with out-of-phase high-frequency sines that got cancelled ...
edit:
As for Vorbis: trust me, it's intensity stereo for q<6 (called "point stereo").
I stick to "IS is very powerful if done right."
Vorbis Stereo Stuff (http://www.xiph.org/ogg/vorbis/doc/stereo.html)
At q<6 and above a certain frequency, Vorbis encoders switch from lossless coupling to point stereo. In point stereo the angle value will always be zero. Therefore, the (unscaled) MDCT samples will be the same for both channels after inverse square polar mapping. Intensity is controlled by the floor curves.
In Vorbis, decorrelation and intensity stereo is achieved by square-polar-mapping and channel-interleaved vector quantization.
bye,
Sebastian
It is a widely believed fact that we are unable to perceive phase differences of high frequencies, so IS is an appropriate tool, even for near transparency encodings.
The problem with MP3 IS is that it's not possible to restrict IS usage to certain frequencies: you can only switch stereo modes at the block level, not at the frequency level.
well, in mp3 you can signal the use of IS stereo without using IS at all**.
if IS is used, you have to use it for the whole frequency range from the last scalefactor band down to some arbitrary but fixed scalefactor band. you can use L/R or M/S coding for the lower bands.
---
**) I'm actually not sure if sfb21 (where no scalefactor band exists) wouldn't have to be IS coded with 0-degree direction in this case
It is a widely believed fact that we are unable to perceive phase differences of high frequencies, so IS is an appropriate tool, even for near transparency encodings.
The problem with MP3 IS is that it's not possible to restrict IS usage to certain frequencies: you can only switch stereo modes at the block level, not at the frequency level.
This is a quote from the mp3 specification:
Intensity Stereo
This mode switch (found in the header: mode_extension) allows switching from 'normal stereo' to intensity stereo. The lower bound of the scalefactor bands decoded in intensity stereo is derived from the "zero_part" of the right channel. Above this bound decoding of intensity stereo is applied using the scalefactors of the right channel as intensity stereo positions. An intensity stereo position of 7 in one scalefactor band indicates that this scalefactor band is NOT decoded as intensity stereo.
I guess this means the encoder can choose some kind of split frequency. Below this frequency L/R or M/S coding is applied and above IS coding is used.
Agree ?
bye,
Sebastian
I guess this means the encoder can choose some kind of split frequency. Below this frequency L/R or M/S coding is applied and above IS coding is used.
Agree ?
yes.
M/S coding is a special case of a main-axis transformation of the stereo plane, transmitting the rotation angle plus the sum and difference signals; for mid/side coding the rotation angle is fixed and is not transmitted.
IS coding is a simplification where you leave out the difference signal.
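A toy numeric sketch of the two ideas (my own illustration, not any codec's actual implementation): M/S transmits the sum and difference and is fully invertible, while IS keeps only a downmix plus per-channel energy levels, so the phase relation between the channels is lost:

```python
import math

def ms_encode(l, r):
    # M/S: transmit mid (sum) and side (difference); fully invertible
    return ([(a + b) / 2 for a, b in zip(l, r)],
            [(a - b) / 2 for a, b in zip(l, r)])

def ms_decode(m, s):
    return ([a + b for a, b in zip(m, s)],
            [a - b for a, b in zip(m, s)])

def is_encode(l, r):
    # IS: keep a downmix plus the per-channel energies; the difference
    # signal (and with it the phase relation) is discarded
    el = math.sqrt(sum(x * x for x in l))
    er = math.sqrt(sum(x * x for x in r))
    mono = [(a + b) / 2 for a, b in zip(l, r)]
    return mono, el, er

def is_decode(mono, el, er):
    # rebuild both channels by scaling the downmix to the original energies
    em = math.sqrt(sum(x * x for x in mono)) or 1.0
    return ([x * el / em for x in mono],
            [x * er / em for x in mono])

l, r = [1.0, 2.0, -1.0], [0.5, -2.0, 1.0]
print(ms_decode(*ms_encode(l, r)))  # exact reconstruction
print(is_encode(l, r)[0])           # downmix: [0.75, 0.0, 0.0]
```

Note how the out-of-phase samples in the second and third positions cancel completely in the IS downmix: exactly the kind of loss discussed elsewhere in this thread with out-of-phase sines.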
ff123: mp3enc31 is recognizable by low frequency glitches. Ironically, bAdDuDeX (an mp3 connoiseur from long ago), who could hear a 16 kHz lowpass in applaud.wav, loved mp3enc31 despite the glitching and despite its relatively low 14.5 kHz lowpass because it was free from high-frequency ringing.
BTW, it was officially recommended to change its default lowpass with -bw 15995 (the number may be wrong).
And its ability to avoid ringing and produce very little pre-echo was always a goal all the free developers wanted to achieve, as I remember.
Interesting; maybe now, when MP3 seems to be a mature standard already, we could find somebody from FhG and ask how they avoid ringing in their encoder?
If it is not already known, of course...
I had a strong feeling that I had already seen this discussion about mp3enc.
I found it in the end: Test old Fhg encoder or not (http://www.hydrogenaudio.org/forums/index.php?showtopic=16270&st=50&)
I can also confirm that the current Vorbis encoders use a mix of lossless stereo (full mag, full ang preserved) and point stereo (zero ang) below q 6. Point stereo kicks in for components above a certain frequency which is dependent on the quality. For lower quality values, more point stereo is used, hence the recognisable 'stereo collapse'. It does not appear to be the optimal way of doing things but considering the quality we get from current Vorbis, it doesn't do a bad job either. Monty has plans of implementing a better stereo model.
Because of its so so legal status none of these programs can incorporate Lame, even if many popular applications work with it.
The Lame project is only providing a technology implementation. It is up to the company wanting to use it to acquire a patent license for the MP3 patents.
Several companies chose this solution and are using Lame in their products.
Anyway, I'm surprised that LAME peforms so well WITHOUT Intensity Stereo in the 128-ish bitrate area - Same for FAAC. (no IS AFAIK)
There is absolutely no need to use IS @128 kb/s for MP3 or AAC.
And I also disagree with the claims that IS could bring good quality: there are lots of cases with stereo configurations impossible to code properly with IS, because IS saves only ILD information (level difference) and not ITD (time difference) or inter-channel cross-correlation.
An equalized and mixed "left" (IS) channel can completely distort the phase information, and you end up with something quite different from the original when the coloration of the sound comes into question.
Applaud is one of the examples that is impossible to code properly with IS.
A smart psychoacoustic model would be able to disable IS for such frames, but @128 kb/s there would be no need for lossy bit savings; the same goes for PNS (in AAC), more or less. We did a lot of tests with PNS @128 kb/s and in most cases it is pretty much useless, or degrades the quality.
I guess this means the encoder can choose some kind of split frequency. Below this frequency L/R or M/S coding is applied and above IS coding is used.
Agree ?
yes.
M/S coding is some special case of doing some main axis transformation of the stereo plane and transmitting the rotation angle, the sum and the difference signal. for mid/side coding the rotation angle is fixed and is not transmitted.
IS coding is some simplification where you leaf out the difference signal.
Thanks a lot for the explanations. I want to apologize for making obviously wrong statements about MP3 IS.
And I also disagree with the claims that IS could bring good quality: there are lots of cases with stereo configurations impossible to code properly with IS, because IS saves only ILD information (level difference) and not ITD (time difference) or inter-channel cross-correlation.
[...]
A smart psychoacoustic model would be able to disable IS for such frames, but @128 kb/s there would be no need for lossy bit savings; the same goes for PNS (in AAC), more or less. We did a lot of tests with PNS @128 kb/s and in most cases it is pretty much useless, or degrades the quality.
Thanks for your reply.
Let's compare Ogg to AAC. Monty said once that lossless coupling would be like wasting space for the 128kbps and 160kbps modes. You guys keep telling me IS is inappropriate for those bitrates. Sure, this mapping is irreversible and only preserves the channels' energy levels, not all their phase relations. But AFAIK phase relations are not that important to us above 10 kHz because the wavelength is already very short. So if an advanced encoder made proper use of this psychoacoustic effect via IS, it could save some space and allow smaller scalefactors to improve the SNR.
AFAIK IS can be switched on/off for each scalefactor band (AAC). Another cool thing: IS can be done in-phase and out-of-phase. How about the following scheme for scalefactor bands above 10 kHz:
- treat MDCT samples for a scalefactor band as multidimensional vector
- compute the cosine of the angle between L and R by cos_a := \frac{<L,R>}{||L|| ||R||}
- use in-phase IS for cos_a > 0.5
- use out-of-phase IS for cos_a < -0.5
These thresholds (in this case 0.5) could be chosen depending on the quality-preset and frequency area.
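The selection rule proposed above could look like this in code (a sketch of the hypothetical scheme from this post, not an existing encoder's logic); `l` and `r` stand for the MDCT samples of one scalefactor band:

```python
import math

def choose_stereo_mode(l, r, threshold=0.5):
    # cos_a = <L,R> / (||L|| * ||R||): normalized correlation of the band
    dot = sum(a * b for a, b in zip(l, r))
    norm = math.sqrt(sum(a * a for a in l)) * math.sqrt(sum(b * b for b in r))
    if norm == 0.0:
        return "M/S"                # a silent channel: nothing to couple
    cos_a = dot / norm
    if cos_a > threshold:
        return "IS in-phase"
    if cos_a < -threshold:
        return "IS out-of-phase"
    return "M/S"                    # too decorrelated for intensity coding
```

For strongly correlated bands this picks in-phase IS, for inverted bands out-of-phase IS, and it falls back to M/S in between, matching the threshold idea above.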
Well, I don't know if intensity stereo is or is not appropriate for 128 kbps. But I do know that Vorbis makes use of it and "won" the listening test.
edit: corrected cos_a correlation formula
bye,
Sebastian
Surprising! How is it possible that LAME MP3 is better than iTunes AAC?
Are previous tests wrong?
Surprising! How is it possible that LAME MP3 is better than iTunes AAC?
Are previous tests wrong?
the encoder version + settings used for LAME MP3 during this test for sure weren't the same as in previous large-scale tests.
Lame is not = Lame. In recent versions, LAME seems to have made good progress at mid bitrates. You can see this in the LAME 3.90.3 vs. LAME 3.96 thread.
- Lyx
Surprising! How is it possible that LAME MP3 is better than iTunes AAC?
Are previous tests wrong?
the encoder version + settings used for LAME MP3 during this test for sure weren't the same as in previous large-scale tests.
It's also worth mentioning that Lame is not better in the test. It's officially tied, with a tendency to be a little worse.
OK, so it looks like the VBR contenders did very well and iTunes's CBR held its own. How safe would it be to assume that AAC with VBR (for instance the most recent FAAC with fb2k) would be a contender?
okay, it has been proven that iTunes AAC is better than FAAC, for instance.
http://www.rjamorim.com/test/aac128test/results.html (http://www.rjamorim.com/test/aac128test/results.html)
but how confidently could one say how good iTunes AAC would be if it had VBR implemented?
And one silly question:
how sure can one be about how good/bad an encoder would perform at higher bitrates, e.g. 160kbps?
I mean, is it right to say that Vorbis, for instance, can reach transparency at a lower bitrate than MPC or AAC?
okay, it has been proven that iTunes AAC is better than FAAC, for instance.
http://www.rjamorim.com/test/aac128test/results.html (http://www.rjamorim.com/test/aac128test/results.html)
This is a rather old test; here are newer test results: http://www.rjamorim.com/test/aac128v2/results.html (http://www.rjamorim.com/test/aac128v2/results.html) but iTunes is still the winner.
how sure can one be about how good/bad an encoder would perform at higher bitrates, e.g. 160kbps?
I mean, is it right to say that Vorbis, for instance, can reach transparency at a lower bitrate than MPC or AAC?
You can't extrapolate the results! WMA is generally better than MP3 at 64kb/s, but at 128kb/s MP3 is better.
You can't extrapolate the results! WMA is generally better than MP3 at 64kb/s, but at 128kb/s MP3 is better.
Okay, thank you, but
I mean, is it right to say that Vorbis, for instance, can reach transparency at a lower bitrate than MPC or AAC?
I assume we would need a listening test at this high bitrate, right?
I assume we would need a listening test at this high bitrate, right?
Indeed
Even if I took harashin's private listening test (http://209.152.181.168/~hydrogen/index.php?showtopic=21916&st=50&hl=) into account, I wouldn't be able to do so?
See, we know aoTuV is very good at 128kbps and at ~200kbps (at least for harashin).
Now we should be able to say how Vorbis would perform at around 160kbps, shouldn't we?
and what about my other question:
but how confidently could one say how good iTunes AAC would be if it had VBR implemented?
My post will not be very informative
but how confidently could one say how good iTunes AAC would be if it had VBR implemented?
No one knows.
As an example, FhG MP3 encoders at low bitrates tend to be better with CBR than with VBR (search the forum if you wish for more info), but no one would say (I hope) that VBR is loosely implemented there...
VBR has its own problems, as discussed in this thread...
EDIT: grammar
oh gosh!
trying is superior to studying, isn't it?
(well, I tried to translate a German saying)
Okay, now for me it IS like this:
aoTuV is superior to iTunes AAC even at 160kbps and above, and my second personal truth is that iTunes AAC would be better with VBR implemented, presuming it is decently implemented. harrrharrr!
I wonder how ATRAC3plus performs...it's a pity that it wasn't included in the test
It's currently not possible to compare atrac3+ with other encoders at 128 kbps, for a simple reason: there's no 130 kbps mode in the current public atrac3+ encoder. Only low-bitrate (48 & 64 kbps) and high-bitrate (256 kbps) settings.
It's currently not possible to compare atrac3+ with other encoders at 128 kbps, for a simple reason: there's no 130 kbps mode in the current public atrac3+ encoder. Only low-bitrate (48 & 64 kbps) and high-bitrate (256 kbps) settings.
Interesting... so I wonder what bitrate is used in Sony's music store... I read they use ATRAC3plus... don't tell me they use 64kbps
Hello, I'm back from holidays...
I just wanted to point out an odd thing that happened to me during this test: contrary to the common way of things, I could only ABX the 7th sample (gone) with speakers, and not with headphones! (Dynaudio Gemini speakers vs Sennheiser HD600 headphones)
A picture of me ABXing "gone" : listeningtest.jpg (http://perso.numericable.fr/laguill2/pictures/listeningtest.jpg)
To avoid any background noise, the picture is video-projected on a screen in front of me, and the computer is in the next room: 5-meter mouse, keyboard, SPDIF, and DVI cables.
Mhhh, actually, I must admit that my speaker setup is often referred to by my father as "the biggest headphones I've ever seen".
A picture of me ABXing "gone" : listeningtest.jpg (http://perso.numericable.fr/laguill2/pictures/listeningtest.jpg)
To avoid any background noise, the picture is video-projected on a screen in front of me, and the computer is in the next room: 5-meter mouse, keyboard, SPDIF, and DVI cables.
Mhhh, actually, I must admit that my speaker setup is often referred to by my father as "the biggest headphones I've ever seen".
A picture of me ABXing "gone" : listeningtest.jpg
Nice settings
contrary to the common way of things, I could only ABX the 7th sample (gone) with speakers, and not with headphones ! (Dynaudio Gemini speakers vs Sennheiser HD600 headphones).
Impressive!
Can you describe the artifact you can only detect with your speakers, and not your headphones?
From the "user comment" section, matching the samples ID with the filenames, then the filenames with the codecs, I got
Lame : 5/5
AAC : 4/5 : Ringing on the first guitar note.
ABX 13/16 from 8.3 to 22 s
Musepack : 5/5
Atrac : 4/5 More treble from 17s
ABX 15/16 from 17.17 to 24.21 s
Vorbis : 5/5
WMA : 3.5/5 : More treble when the guitar comes in
ABX 13/16, from 8 to 26.6 s
When I say "more treble", it is just a subjective impression. It does not necessarily mean the the treble level is higher, but rather the the treble sounds brighter.
Edit: these speakers don't have a linear response. The tweeter is set 1.5 dB louder than the woofer. The crossover frequency is 2 kHz. I usually cancel this with the Foobar convolver, but with ABC/HR I couldn't.
Edit2: the fact that they are to the sides of the listener rather than in front of him makes the treble even harsher (this is the case with audio sources when they are to the side of the listener). But that's the way I often listen to music. I did not move the speakers especially for the test.
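As an aside, the significance of ABX scores like those reported above can be checked with a one-sided binomial test: under pure guessing each trial succeeds with probability 1/2, so the p-value is the chance of getting at least that many trials correct. A quick sketch:

```python
from math import comb

def abx_p_value(correct, trials):
    # P(X >= correct) for X ~ Binomial(trials, 0.5): the probability that
    # pure guessing would do at least this well
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# the scores reported above
print(abx_p_value(13, 16))  # ~0.0106
print(abx_p_value(15, 16))  # ~0.00026
```

Both values are comfortably below the usual 0.05 threshold, so the 13/16 and 15/16 scores are very unlikely to be lucky guessing.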
I thought it was quite impressive that you look so much like your avatar.
Hello!
I would like to know if the poor results of LAME aren't due to its popularity. The more you listen to a format, the easier it is for you to recognize its artefacts, right? Maybe the people who participated were more able to recognize MP3 than other formats. What do you think?
Hello!
I would like to know if the poor results of LAME aren't due to its popularity. The more you listen to a format, the easier it is for you to recognize its artefacts, right? Maybe the people who participated were more able to recognize MP3 than other formats. What do you think?
[a href="index.php?act=findpost&pid=248163"][{POST_SNAPBACK}][/a]
Well, I wouldn't consider Lame's results poor. It ended up tied with AAC - which is supposed to sound much better!
Now, for your concern: I think the artifacts that happen on lossy music are pretty much the same across all formats. So, if you learn to distinguish pre-echo, smearing or stereo collapse on MP3, you will probably detect these same artifacts in AAC, Vorbis, MPC... if they are there.
That's just a supposition though, maybe MP3's popularity did affect its results in some way...
The more you listen to a format, the easier it is for you to recognize its artefacts, right?
[a href="index.php?act=findpost&pid=248163"][{POST_SNAPBACK}][/a]
I don't think so. For example, tons of people have been listening to Vorbis for years, and still can't detect anything wrong in its stereo image or timbre coarseness...
To recognize artifacts with ease, you probably need to track them. It's an active attitude, as opposed to daily listening, which is passive.
On the other hand, artifacts don't really differ from one encoder to another. mp3, aac, mpc, wma, atrac3... are really close to each other. SBR (mp3pro, he-aac) introduces specific problems in addition to the previous ones; vorbis is also slightly different (see above); hybrid encoders produce noise. But most artifacts (pre-echo, warbling, chirping, metallic sound...) are common to all transform encoders.
The more you listen to a format, the easier it is for you to recognize its artefacts, right?
[a href="index.php?act=findpost&pid=248163"][{POST_SNAPBACK}][/a]
I don't think so. For example, tons of people have been listening to Vorbis for years, and still can't detect anything wrong in its stereo image or timbre coarseness...
To recognize artifacts with ease, you probably need to track them. It's an active attitude, as opposed to daily listening, which is passive.
On the other hand, artifacts don't really differ from one encoder to another. mp3, aac, mpc, wma, atrac3... are really close to each other. SBR (mp3pro, he-aac) introduces specific problems in addition to the previous ones; vorbis is also slightly different (see above); hybrid encoders produce noise. But most artifacts (pre-echo, warbling, chirping, metallic sound...) are common to all transform encoders.
[a href="index.php?act=findpost&pid=248177"][{POST_SNAPBACK}][/a]
That's what most of the people who participated in this test did, I guess. We're not talking about average music listeners here, but people who have "trained ears".
We're not talking about average music listeners here, but people who have "trained ears".
[a href="index.php?act=findpost&pid=248182"][{POST_SNAPBACK}][/a]
Sorry, but when you said "The more you listen to a format, the easier it is for you to recognize its artefacts, right?", I thought you were making a general assumption.
To answer this: many results were sent by people who are not trained. Take a look at the overall ratings: wma@128 is "near transparent" according to the test. That can't be true for someone with even a little experience in artifact hunting.
Ok. Thank you both for your answers.
Roberto> what software did you use to obtain the wma9 files? Is it VBR 2-pass 128 kbps? What decoder? I've tried to reproduce the same waveform with different settings, and I wasn't able to do it.
I already asked him about this.
http://www.hydrogenaudio.org/forums/index....ndpost&p=210584 (http://www.hydrogenaudio.org/forums/index.php?showtopic=21370&view=findpost&p=210584)
EDIT: It's certainly Bitrate VBR 128kbps, 44kHz, stereo, VBR 1-pass.
I don't get that. From what I have seen, for 1-pass WMA VBR you cannot specify a bit rate at all, only the "quality settings" such as 50, 75, 90, etc.
With two pass WMA VBR you specify an average bit rate.
You state it was WMA one pass 128 kbps VBR. How could that be?
Windows Media Encoder allows you to do 1-pass bitrate VBR. It is some kind of ABR.
Windows Media Encoder allows you to do 1-pass bitrate VBR. It is some kind of ABR.
[a href="index.php?act=findpost&pid=249646"][{POST_SNAPBACK}][/a]
I guess I've only tried WMA VBR using DBPoweramp. For 1 pass VBR, there are no bit rate settings, just the quality settings like 50, 75, 90, etc. With the two pass VBR, you set a target bit rate. (I guess they figure that with two passes they can come closer to a target bit rate, but not with one pass.)
Surprised it's different in WME. It doesn't have the "quality settings"?
It has them also.
I used two-pass VBR (Bitrate VBR)
In some cases tests like this are not very objective -- for example, an opera lover will probably cringe at listening to tracks of "House Music" played with ANY codec, and probably vice versa. It's almost impossible to find music that everyone likes, which rather invalidates some of the test findings.
I've tried the new Hi-MD minidisc units from Sony, particularly the NH-1, and I'm pretty fussy with my music. The Hi-SP (ATRAC3+) format seems to me, certainly for music on the move or when wearing some decent cans, as good as CD (CDs have pretty varying quality as well).
For classical music, which on the whole has a higher dynamic range than most rock-type music, MP3s can sound pretty hopeless. Acoustic instruments also tend to sound somewhat "quirky" on MP3, whereas the more "electronic" sound of dance music tends to hide some of the more obvious problems with MP3, especially at the lower bit rates.
My main problem with ATRAC3 + is some of the really STUPID DRM problems which make copying and distributing YOUR OWN MUSIC a real pain.
-K
You are right, placebo effect is way more suggestive, and often works quite well.
You are right, placebo effect is way more suggestive, and often works quite well.
[a href="index.php?act=findpost&pid=268810"][{POST_SNAPBACK}][/a]
Acoustic instruments also tend to sound somewhat "quirky" on MP3, whereas the more "electronic" sound of dance music tends to hide some of the more obvious problems with MP3, especially at the lower bit rates.[a href="index.php?act=findpost&pid=268808"][{POST_SNAPBACK}][/a]
My experience is the exact opposite (at high bitrates at least). Guruboolez' harpsichord and orchestral samples sound transparent to me, while the electronic boxes of Amnesia, Fsol, Autechre, Spahm, Astral, Transwave etc. sound ugly to me once encoded.
Anyone who is basing their ideas on this listening test is taking a bit of a risk. This test is very good and I commend rjamorim for taking his time to conduct it, however I don't believe there were nearly enough testers to produce accurate data, and therefore to come to any valid conclusions. I think this test should be redone and spread much more widely over the internet audio boards, not just this one. Then we could formulate some accurate conclusions. In my opinion, there is just not enough data to do that.
I think this test should be redone and spread much more widely over the internet audio boards, not just this one.
Yes I think that's right to say that not enough people did participate.
Still, IIRC there were announcements made on other boards about this test. I personally made one there (on a popular French board, though not as specialized in audio coding as HA is):
http://forum.hardware.fr/forum2.php?config...sh=0&subcat=131 (http://forum.hardware.fr/forum2.php?config=hardwarefr.inc&post=66496&cat=3&cache=&sondage=0&owntopic=0&p=1&trash=0&subcat=131)
To tell the truth: I don't think that many people really want to spend some time testing different samples. Not even mentioning those who don't know what an ABX test is and claim that everything is just like "night and day" or so.
Formal listening tests conducted by the ITU and EBU sometimes use as few as 9-10 listeners. Trained listeners, of course, but still, that's quite few compared to the number of people that participated in some of the samples of this test...
Formal listening tests conducted by the ITU and EBU sometimes use as few as 9-10 listeners. Trained listeners, of course, but still, that's quite few compared to the number of people that participated in some of the samples of this test...
[a href="index.php?act=findpost&pid=273760"][{POST_SNAPBACK}][/a]
They also use reference systems, though. The downside to these internet tests is the wide array of equipment, which means the transparency threshold is often quite low. But then again, it reflects the real world nicely.
Anyone who is basing their ideas on this listening test is taking a bit of a risk. This test is very good and I commend rjamorim for taking his time to conduct it, however I don't believe there were nearly enough testers to produce accurate data, and therefore to come to any valid conclusions. I think this test should be redone and spread much more widely over the internet audio boards, not just this one. Then we could formulate some accurate conclusions. In my opinion, there is just not enough data to do that.
[a href="index.php?act=findpost&pid=273756"][{POST_SNAPBACK}][/a]
I think the conclusions reached were quite valid and accurate -- for the group of people who participated and for the samples listened to. That was the whole point of doing a statistical analysis.
If one wants to generalize to a larger group of people or a different set of samples, yes there is a bit of a risk, but the results are probably not far off the mark. A different sample set would probably get you the most different results. And of course trying to apply group results to a particular individual is quite a bit more risky. I would say that the variations are bigger from individual to individual than from one group to another.
ff123
Yes, there are many factors and variables that ruin the validity of the test. One being, which you named, the audio equipment being used for the testing. Most users have shit audio equipment, therefore their results are pretty poor and inaccurate. Secondly, many people, like you said, don't even know what the hell ABXing is, so you can tell from that that they don't know much about audio. Their ears and/or listening skills probably suck. This would dramatically alter the results of the test.
Anyways, the test is better than no test. It gives us a reasonable idea, but not accurate enough, in my opinion, to really make any conclusive judgements.
I would be interested in gathering a group of good listeners that have quality equipment. I think we should have enough here. I myself have Etymotic Research ER-4s, which are basically the best you can get as far as equipment goes.
I would be interested in gathering a group of good listeners that have quality equipment.[a href="index.php?act=findpost&pid=274122"][{POST_SNAPBACK}][/a]
And, by doing that, you would conduct a test that would only have meaning to people with good hearing and quality equipment
By accepting everyone and all equipment on my test, I got much closer to the average user than if I only targeted it at golden ears with headphones that cost more than 100 dollars.
No audiophile will encode at 128kbps anyway... so that was a real-world test with real people who use that bitrate... nothing wrong with it, and well done
Yes, there are many factors and variables that ruin the validity of the test. One being, which you named, the audio equipment being used for the testing. Most users have shit audio equipment, therefore their results are pretty poor and inaccurate. Secondly, many people, like you said, don't even know what the hell ABXing is, so you can tell from that that they don't know much about audio. Their ears and/or listening skills probably suck. This would dramatically alter the results of the test.
Anyways, the test is better than no test. It gives us a reasonable idea, but not accurate enough, in my opinion, to really make any conclusive judgements.
I would be interested in gathering a group of good listeners that have quality equipment. I think we should have enough here. I myself have Etymotic Research ER-4s, which are basically the best you can get as far as equipment goes.
[a href="index.php?act=findpost&pid=274122"][{POST_SNAPBACK}][/a]
You keep saying "invalid" and "inaccurate." But in what way? As in the rankings would have come out in a different order, or with another winner or loser? I don't think so. The effect of having different setups, in my opinion, is to add random variability to the results, so that the uncertainty is greater. But I don't think it would add a bias, i.e., change the order of the rankings by much, if any.
ff123
Secondly, many people, like you said, don't even know what the hell ABXing is, so you can tell from that that they don't know much about audio. Their ears and/or listening skills probably suck. This would dramatically alter the results of the test.
I just meant that most of these people will not bother testing, not that they will do the test WITHOUT knowing what an ABX test is.
This test is very good and I commend rjamorim for taking his time to conduct it, however I don't believe there were nearly enough testers to produce accurate data, and therefore to come to any valid conclusions.
I would be interested in gathering a group of good listeners that have quality equipment. I think we should have enough here. I myself have Etymotic Research ER-4s, which are basically the best you can get as far as equipment goes.
First you say there are not enough participants, then you want to conduct a test with a very selective group...
The test does not claim to be more than it is; it is not the definitive answer to which codec is "best". However, it does give a good indication of what is good and bad (or perhaps I should say "not so good"?) for the average user. There is no more inaccuracy in the test than the error bars in the graphs suggest.
A test which proves codec A does better than codec B on some $10,000 piece of equipment is completely useless for most people.
I would be interested in gathering a group of good listeners that have quality equipment.[a href="index.php?act=findpost&pid=274122"][{POST_SNAPBACK}][/a]
And, by doing that, you would conduct a test that would only have meaning to people with good hearing and quality equipment
It would have meaning for people with poor equipment as well; the quality headroom will be bigger with the winning codec. Even if the current equipment isn't good enough to reveal flaws at 128kbps, I bet most people would still want to encode with the best format at that bit rate, as rated by people for whom equipment and hearing aren't the bottleneck.
When hifi mags test speakers, they tend to use the best possible cables, amplifiers and most trained ears. That doesn't make the test useless to people with average hearing/equipment
A test which proves codec A does better than codec B on some $10,000 piece of equipment is completely useless for most people.
This is exactly my point. Crappy equipment and poor ears don't provide accurate data. Yes, it may be real-world to the majority of listeners. But, nevertheless, it does not prove anything substantial. For all we know these users were guessing. I would trust good ears and good equipment with a small minority of users, over poor ears and crappy equipment with a large majority of listeners.
For all we know these users were guessing. [a href="index.php?act=findpost&pid=274486"][{POST_SNAPBACK}][/a]
Do you even know how ABC/HR testing works?
For all we know these users were guessing.[a href="index.php?act=findpost&pid=274486"][{POST_SNAPBACK}][/a]
Nice one...
A test which proves codec A does better than codec B on some $10,000 piece of equipment is completely useless for most people.
This is exactly my point. Crappy equipment and poor ears don't provide accurate data. Yes, it may be real-world to the majority of listeners. But, nevertheless, it does not prove anything substantial. For all we know these users were guessing.
Please read up on how the test was performed. You cannot make any valid conclusion from it if you do not understand how to interpret the data.
For all we know these users were guessing. [a href="index.php?act=findpost&pid=274486"][{POST_SNAPBACK}][/a]
Do you even know how ABC/HR testing works?
[a href="index.php?act=findpost&pid=274491"][{POST_SNAPBACK}][/a]
@rjamorim: As I see, you couldn't resist the opportunity.
@jmitch: Well, that's what a listening test's supposed to be, isn't it? A "so good" audio setup is not that relevant.
SoNiX
Crappy equipment and poor ears don't provide accurate data. [a href="index.php?act=findpost&pid=274486"][{POST_SNAPBACK}][/a]
The accuracy of the answers is given in the test results: the confidence level is 95%.
Explanations: http://www.hydrogenaudio.org/forums/index.php?showtopic=16295 (http://www.hydrogenaudio.org/forums/index.php?showtopic=16295)
The ABC/HR method : http://ff123.net/abchr/abchr.html (http://ff123.net/abchr/abchr.html)
The ANOVA analysis (which gives the 95% above): http://www.psychstat.smsu.edu/introbook/sbk27.htm (http://www.psychstat.smsu.edu/introbook/sbk27.htm)
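For readers who don't want to chase the links: a one-way ANOVA compares the variation between the per-codec mean ratings to the variation within each codec's ratings; a large F statistic means the ranking differences are unlikely to be chance. A minimal sketch, with made-up 1-5 listener ratings (the real data lives on the results page):

```python
def one_way_anova(groups):
    """Return the F statistic for a one-way ANOVA across rating groups."""
    all_vals = [x for g in groups for x in g]
    n, k = len(all_vals), len(groups)
    grand_mean = sum(all_vals) / n
    # Between-group sum of squares: how far each codec's mean is from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: listener-to-listener scatter inside each codec
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ratings from five listeners per codec (NOT the real test data)
ratings = [
    [4.5, 4.2, 4.8, 4.4, 4.6],  # codec A
    [4.4, 4.3, 4.7, 4.5, 4.5],  # codec B
    [3.1, 3.5, 3.0, 3.4, 3.2],  # codec C
]
print(f"F = {one_way_anova(ratings):.2f}")
```

With ratings like these, F is large and codec C is clearly worse; when the groups overlap heavily, F shrinks toward 1 and the differences are not significant, which is exactly what the overlapping error bars on the results plots express.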
For all we know these users were guessing. I would trust good ears and good equipment with a small majority of users, over poor ears and crappy equipment with a large majority of listeners.
[a href="index.php?act=findpost&pid=274486"][{POST_SNAPBACK}][/a]
Just to satisfy yourself, why don't you take the test with your gear? All the samples and ABX software are still there. When you are done, just for yuks, go slumming and borrow someone's "crappy" sub-$100 phones and see if that makes a difference (not in how good things sound, just in how they affect your ability to ABX a codec).
Let us know how it went.
edit: fixed quote markers
Crappy equipment and poor ears don't provide accurate data.
I participated in this test myself (when it was run). I used:
- RME Digi 96/8 PAD (professional-level 96kHz/24-bit sound card with 1:1 bit accuracy and very nice analog measurements)
- High quality minimum capacitance shielded interconnects
- Meier Audio Pre head (very high quality solid state headphone amp)
- Sennheiser HD600, AKG K271s, Ultrasone HFI-650 and Etymotics ER4p/s headphones (Ultrasones and etymotics getting most of the listening time)
- A quiet room with a silenced computer
My ears have been tested to be flat to 11 kHz (with less than average attenuation after that up to 14kHz, the maximum that the test equipment at national hearing clinic was able to test).
I have spent c. 4 years trying to get into lossy audio, perhaps 3 of that with slowly increasing listening acuity. I've gone through several training sessions with the AES "Perceptual Audio Encoders - What to Listen For" CD, as well as many example samples from here, the MPEG development archives and previous listening tests. I've also purchased and gone through the Moulton Labs "Golden Ears" hearing training CD set. In addition, I regularly audition new hifi and high-end gear (and also write about hifi for a national publication). I think my hearing (both as an instrument and as a skill) is better than average.
While I'm far from being a "golden ear", I can invalidate the above argument by saying that neither my equipment nor my hearing is crap.
My results didn't significantly differ from that of the statistical averages in this test.
While neither my hearing nor my equipment is "best in class", I think both are clearly better than the average population's. It is debatable how good they are, but surely not crap.
As such, I don't think the test can be invalidated just by referring to "crappy equipment and poor listeners".
Had I magnificently surpassed every other listener in this test by picking out artifacts others couldn't hear, I could _perhaps_ be willing to entertain the possibility of the argument being right.
But alas, I wasn't even among the best listeners in the test. Surely equipment at least wasn't a limiting factor in my case.
I must say that I was also a wee bit surprised that spotting artifacts in a 128kbps ABR test was so difficult. I knew it was going to be hard, but it was even more so than I had initially imagined.
friendly regards,
halcyon
PS: I really should not even have needed to reply with this defense, as ad hominem attacks don't really need refutation. I think arguments should be evaluated based on the available evidence (and the logic of the reasoning), not on the basis of who makes the argument, UNLESS there is strong proof that the author is not to be trusted (which in this case is non-existent). Conjecture is not enough. Arguments need evidence, not prejudice, as their support.
When hifi mags test speakers, they tend to use the best possible cables, amplifiers and most trained ears. That doesn't make the test useless to people with average hearing/equipment
[a href="index.php?act=findpost&pid=274192"][{POST_SNAPBACK}][/a]
Tell me you're kidding. Hifi mags may *claim* that the ancillary equipment they use is the "best possible" (though it's almost never been subjected to a controlled listening test) and that their listeners are "trained" (no proof of that either... but I guess it's how guys like Robert Harley can hear the directionality of the crystalline structure of cables), but that's no reason to believe what they claim.
And it's always funny when they caution that the listener who isn't using a $10,000 amp and $100/ft cabling might not hear the amazing microdynamics they hear... thereby covering their asses.
FWIW, Floyd Toole, Sean Olive, et al., who are doing controlled comparisons of speakers using trained listeners at their facility at Harman/JBL, seem rather more credible to me than any hi-fi mag in this area.