Hello.
I was going to start this test discussion tomorrow, but considering HA will be, as per tradition, flooded with April 1st pranks, I decided to start today.
The test is planned to start on April 14th and end on April 25th. But the test start can be postponed if necessary.
The codecs that are planned to be tested are:
-Musepack - latest beta version - quality 4
-iTunes/QuickTime AAC - latest release - 128kbps (winner of the latest AAC test)
-Ogg Vorbis - whatever version the vorbis enthusiasts decide - quality that comes closer to 128kbps average (probably 4)
-Lame - latest stable version - --alt-preset 128 (winner of the MP3 test)
-Windows Media Audio 9 standard - whatever VBR setting comes close to 128kbps.
There is still room for a 6th competitor (and ONLY a 6th competitor). IMO, it could either be Atrac3 (not Plus) or an anchor. You guys decide.
Schnofler's ABC/HR Java comparator will be used, with encryption.
Also, I would like to invite everyone interested to subscribe to the listening test newsletter (http://www.rjamorim.com/test/newsletter.html), to keep informed about tests starting and results being published. The traffic is very low.
Regards;
Roberto.
Roberto,
Just out of curiosity, as I'm reluctant to participate in the test due to my 49er's ears: in case it will be ATRAC3, how can we listen to ATRAC3 material on our PCs? Are you going to provide flac or whatever rips of that?
Cheers
Sergio
There is still room for a 6th competitor (and ONLY a 6th competitor). IMO, it could either be Atrac3 (not Plus) or an anchor. You guys decide.
I would be interested to see how the Sony Atrac3 format compares to the others. I have a Sony portable, and I find that I can fill a single CD with almost twice as many songs (340) as I can with MP3 @ 128 kbps.
Unfortunately, the Sony Atrac3 software is very plodding and slow to work with. You can't even save your encoded files to your hard drive!
- Scott
Are you going to provide flac or whatever rips of that?
Provided I manage to deal with SonicStage 2, I will provide FLAC files of the decoded streams. Same thing for WMA.
Greetings to everybody,
Sorry if this has already been discussed, but I couldn't seem to find the answer in the previous listening test discussions. Why will the WMA9 Standard codec be included in the test, instead of the better-performing WMA9 Pro? The previous multiformat listening test showed that it has great performance, so I think it would be a pity not to include a worthy contender and instead include the inferior Standard version.
I suspect that this is done because of the very limited user base (compared to WMA 9 Standard) and the poor (if any) hardware support for the Pro codec, but chances are that this situation will improve sooner or later (on the other hand, sadly, I don't see that happening any time soon for MPC).
Thanks for your replies
Regards
-George
The main reason I'm not planning to include WMA Pro is that it has not changed since the 128kbps test. We already have an idea of how it compares to Vorbis, AAC, etc. and therefore, this test won't bring in any novelty.
Second, I never tested WMA Std at this bitrate, and there is still huge interest in this codec. Even though WMA Pro is much superior to Std, Microsoft is still heavily marketing Std - e.g., using WMP9 you can only rip CDs to WMA Std, not WMA Pro.
The fact that WMA pro has very little hardware, software and multiplatform support adds to the lack of interest in the format.
Last but not least, I'm really curious to see how WMA Std. performs at 128kbps. Microsoft's claim about WMA Std outputting the same quality as MP3 at half the bitrate has already been proven to be false. I wonder if it can output the same quality even at the same bitrate of 128kbps.
Regards;
Roberto.
Last but not least, I'm really curious to see how WMA Std. performs at 128kbps. Microsoft's claim about WMA Std outputting the same quality as MP3 at half the bitrate has already been proven to be false. I wonder if it can output the same quality even at the same bitrate of 128kbps.
I'd like to know this as well.
It's hard to explain WMA9-S sound quality to people in the other forums I'm on when I don't have an independent, recent test result to refer to.
I'm really looking forward to this test, and a big reason (oddly) is to finally test WMA9 Standard against the more "proven" competition.
It's hard to explain WMA9-S sound quality to people in the other forums I'm on when I don't have an independent, recent test result to refer to.
I feel exactly the same!
Anyway, I just wanted to say something about which Lame version should be used. From the results of the 3.90.3 vs 3.96b1 testing, everything seems to indicate that 3.90.3 may still be better. Personally, I would want to see the best version compete, but I can understand the point of supporting (and motivating) further development by using the latest stable version. Any other opinions on this?
...or an anchor.
That would be fantastic.
Let's give those minidisc.org fundamentalists something to beef about and let's include Atrac3. They even have some papers claiming that Atrac3 sounds better than MP3 on a theoretical basis (no listening test; only some theory).
There is still room for a 6th competitor (and ONLY a 6th competitor). IMO, it could either be Atrac3 (not Plus) or an anchor.
I'd like an anchor, and I think that Blade (once again) would be interesting.
A lowpass could also be used as an anchor, but an "encoder anchor" has the advantage of demonstrating encoding artifacts, and that could help some listeners. I know that artifacts are/could be different between encoders, but I think this would be more useful than a plain lowpass.
Anyway, I just wanted to say something about which Lame version should be used. From the results of the 3.90.3 vs 3.96b1 testing, everything seems to indicate that 3.90.3 may still be better. Personally, I would want to see the best version compete, but I can understand the point of supporting (and motivating) further development by using the latest stable version. Any other opinions on this?
1 - I think that the 3.90.x branch is unlikely to be developed further. New development will be based on 3.96.
2 - 3.96 is more likely to be used outside HA, mainly due to the increased speed. You can't really argue that the test is targeted at HA users, as most of them are probably not encoding at 128kbps.
Because of those 2 points, I think it would be better to use the latest available release (3.95.1 or perhaps 3.96).
I'd like to see ATRAC3 in the test, since that format became pretty popular here. I'm really interested in how Sony's format performs against other commercial formats.
IMO, it could either be Atrac3 (not Plus)
Great to see that you are finally thinking about using Atrac3 too.
So to say, I also vote for Atrac3,
because then this test can also be seen as a comparison of formats as they are used in music stores (AAC in iTunes, WMA9 Std in whatever crappy music store, and Atrac3 in Sony's).
Here's another vote to include Atrac. MiniDisc is quite widely used and I am interested in knowing how good it is.
Why don't you try VQF-128 as a 6th competitor?
I think it would be better to use the latest available release (3.95.1 or perhaps 3.96).
I second the opinion of including 3.96, and Atrac3... and I would participate this time!
There is still room for a 6th competitor (and ONLY a 6th competitor). IMO, it could either be Atrac3 (not Plus) or an anchor. You guys decide.
Atrac3 seems interesting... but I'm more interested in an anchor like l3enc: how does the grandfather fare against the newborns?
Why don't you try VQF-128 as a 6th competitor?
Maybe because so far no one has provided a good reason for doing this?
-------------------------------------
I'd like to see ATRAC included too
-------------------------------------
About Lame: use the winner from the test (http://www.hydrogenaudio.org/forums/index.php?showtopic=19813&) at ~128kbps:
3.96 -V 5 vs. 3.96 --preset 128 vs. 3.90.3 --alt-preset 128
I hope a few more of the 264 people who've voted for performing this test here (http://www.hydrogenaudio.org/forums/index.php?showtopic=19404&st=80&) will contribute results quickly... at least at this bitrate it should be possible for almost everyone to find samples where (s)he can hear a difference.
I would consider spending time on testing an inferior encoder, one that I wouldn't recommend to people later anyway, a waste. BTW - who knows, maybe 3.96 (or 3.97 final) will be the last version of the Lame 3.9x branch and the remaining Lame 3.9x developers will move to the 4.x branch or start developing some AAC encoder... so the fact that 3.90.x won't be developed further isn't really an argument IMO.
I suspect this will be a popular test. Can we consider increasing the number of samples to test?
ff123
I suspect this will be a popular test. Can we consider increasing the number of samples to test?
ff123
Roberto will never agree to that. It takes too much effort on the testers' part to do more than 6 samples.
I suspect this will be a popular test. Can we consider increasing the number of samples to test?
ff123
Roberto will never agree to that. It takes too much effort on the testers' part to do more than 6 samples.
Nobody will be expected to do all the samples. If a lot of people participate on a partial basis, though, the test could accommodate more samples. For example, if 24 people on average participate in a 12-sample test, then, without increasing the volume of individual effort, it could accommodate 16 people on average in an 18-sample test.
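ff123's arithmetic above can be sketched in a few lines. The listener counts are the illustrative numbers from the post, not real test data:

```python
# The trade-off above: the total number of submitted ratings is a fixed
# budget; spreading it over more samples lowers the per-sample listener
# count proportionally. Numbers are illustrative, not real test data.

def listeners_per_sample(total_ratings: int, num_samples: int) -> float:
    """Average number of listeners per sample for a fixed rating budget."""
    return total_ratings / num_samples

# 24 listeners on average over 12 samples -> 288 individual results.
budget = 24 * 12

# The same budget spread over 18 samples still gives 16 listeners each.
print(listeners_per_sample(budget, 18))  # -> 16.0
```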
ff123
Edit: Oops, now I understand the response. I wasn't talking about increasing the number of codecs; I was talking about increasing the number of music samples from 12.
Why don't you try VQF-128 as a 6th competitor?
Well, it would be fun; everybody would point at VQF and say "look at how badly it performs!". But what would be the point? Does anybody use VQF at all these days, let alone the Nero-exclusive 128kbps version? All the big supporting sites are gone, and Yamaha themselves gave up supporting the format. So I think the final VQF results won't make a difference to anyone.
I suspect this will be a popular test. Can we consider increasing the number of samples to test?
Can be done. How many samples would you consider appropriate?
And kl33per is right, the amount of codecs featured definitely won't be higher than 6.
Thanks for all your suggestions and ideas so far.
Regards;
Roberto.
How about increasing from 12 samples to 18?
ff123
I think Lame 3.96 should definitely go instead of 3.90.3.
We can --sorta-- predict how the well-tested, HA-recommended version will do.
I think it would be more useful to use the new contender, which seems to be doing quite alright in its "competition vs. 3.90.3" thread.
ATRAC should definitely go in, as it is still widely used.
It would make the test more relevant and interesting.
How about increasing from 12 samples to 18?
That is surely doable. I will do a call for samples in the next few days. I won't do it today, because people would believe it's an April 1st prank >_<
How about increasing from 12 samples to 18?
Hm, would this help the final results? If not, I would avoid it, as 10 samples are already enough to listen to IMHO.
I would like to see ATRAC, too, as I use my MiniDisc player quite often and want to know how good/bad the quality is.
How about increasing from 12 samples to 18?
Hm, would this help the final results? If not, I would avoid it, as 10 samples are already enough to listen to IMHO.
Again, people would not be expected to listen to all 18 samples. This should probably be made very clear in the test instructions, though, so listeners don't kill themselves.
ff123
This should probably be made very clear in the test instructions, though, so listeners don't kill themselves.
Headline: "Roberto's Listening Test claims another victim!"
WMA Standard????
Finally!!! Too cool. I'm curious to see how much better my iTunes-purchased songs are than the same offerings from Napster, and I'm hoping I made the right decision in ditching my Creative products (WMA) for Apple's (AAC) line of DAPs.
VQF 128
Was that a joke??? Does ANYONE use this codec??
ATRAC3
A very interesting 6th choice, as Sony Connect debuts in a matter of weeks.
OK, so I can consider that the vast majority wants Atrac3?
In this case, the codec list will be:
-Musepack
-Apple AAC
-Vorbis
-WMA Std.
-Lame
-Atrac3
I want to make clear that I'm not yet sure I will be able to provide Atrac3 encodings. I have SonicStage 2 installed, but I haven't yet tried doing encoding -> decoding. Input from people experienced with it will be hugely appreciated.
Regards;
Roberto.
Probably too late now but I just wanted to put my hand up for atrac as well.
One question: will MPC, like the other VBR codecs to be tested, be tested at the quality setting that gives a bitrate closest to 128kbps as well? In your very first post, you seem to have just put quality 4 for Musepack.
In your very first post, you seem to have just put quality 4 for Musepack.
http://www.hydrogenaudio.org/forums/index.php?showtopic=11134&
Musepack hasn't changed since.
Great to see people also want Atrac3.
rjamorim, can you please add "MP3" when you write "Lame"? I showed your last multiformat test at 128 to some newbies and they believed MPC was MP3.
I want to make clear that I'm not yet sure I will be able to provide Atrac3 encodings. I have SonicStage 2 installed, but I haven't yet tried doing encoding -> decoding. Input from people experienced with it will be hugely appreciated.
To decode .omg (encrypted atrac3) files:
- encode the samples
- RENAME or CHANGE LOCATION of the ORIGINAL samples. If you don't do that, Sonic Stage will play the original file, and not the encoded one.
- Play the sample(s) with Sonic Stage and capture the stream with Total Recorder.
I've tried to capture the decoded stream through foobar2000 or Adobe Audition. I don't know why, but there was a slight parasite noise VISIBLE in the high frequencies. No problems with Total Recorder. A problem with the Windows mixer?
There's a possible problem: the offset. You have to find it for each sample (or cut the stream exactly). Maybe Java ABC/HR could do this automatically. Otherwise, it will be painful (especially with more than 12 samples).
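Finding the offset automatically, as suggested above, amounts to aligning the captured stream with the original. A minimal sketch on a toy signal, assuming a simple dot-product correlation search (a real tool would operate on PCM frames read from the WAV files; the function name and `max_lag` value are illustrative assumptions):

```python
import random

def find_offset(original, decoded, max_lag=200):
    """Return the lag (in samples) that best aligns `decoded` with
    `original`, chosen by maximising a dot-product correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        shifted = decoded[lag:lag + len(original)]
        score = sum(a * b for a, b in zip(original, shifted))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy "decoded" stream: white noise standing in for music, with 57
# samples of leading silence mimicking a typical encoder/decoder delay.
rng = random.Random(42)
original = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
decoded = [0.0] * 57 + original

print(find_offset(original, decoded))  # -> 57
```

With the offset known, the captured stream can simply be trimmed before comparison; real decoded audio isn't bit-identical to the original, but the correlation peak at the true delay is normally still unambiguous.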
Or burn the *.omg files to an AudioCD and then rip them. It's still a hassle this way, though.
ATRAC3 should be "The one".
But what do you think about this (http://personal.inet.fi/koti/askoff/WBPB_30s.flac) sample? I can't be sure, though, if it is a good "problem" sample.
How about using WMA9 Pro as the 6th competitor? This way we can see what Pro is really capable of... I mean, it's the newest encoder from MS.
How about using WMA9 Pro as the 6th competitor? This way we can see what Pro is really capable of... I mean, it's the newest encoder from MS.
Hm, rjamorim already said that "The main reason I'm not planning to include WMA Pro is that it has not changed since the 128kbps test."
I think it would make much more sense to test a new codec than to test the same codec again.
How about using WMA9 Pro as the 6th competitor? This way we can see what Pro is really capable of... I mean, it's the newest encoder from MS.
In addition to what bond said, WMA9 Standard is supported by some portable players (and WMA9 Pro is not), so the results can help people choose which player to buy. Or do you suggest ditching some other codec and including WMA9 Pro additionally? In that case - which encoder would you suggest be replaced with WMA9 Pro (and why do you think WMA9 Pro is more interesting)?
It makes no sense whatsoever to include WMA Pro. Since MusePack is in both tests and also unchanged since, you can compare the results with the previous test and see how WMA Pro compares.
(And if you wonder, it makes more sense to have MusePack as reference since it won the last test)
(And if you wonder, it makes more sense to have MusePack as reference since it won the last test)
I'll save Roberto the trouble of replying:
[span style='font-size:21pt;line-height:100%']Musepack didn't win.[/span]
Just nitpicking, of course. I agree with you that it doesn't make sense to include WMA Pro.
Just nitpicking, of course. I agree with you that it doesn't make sense to include WMA Pro.
Lol, I remembered the result wrong.
OF COURSE AAC was also at the top
Edit: The text says 'big tie for first place', but I assume that was written before the error margins were corrected.
Edit: The text says 'big tie for first place', but I assume that was written before the error margins were corrected.
Aree!?! When was this update done? And how could there be such a big change? Was the analysis that messed up before?
Aree!?! When was this update done? And how could there be such a big change? Was the analysis that messed up before?
There was a bug in the calculation of the error margins, and several of the old test results got updated. The relative positions didn't change, but the results are much more significant now.
Edit: The text says 'big tie for first place', but I assume that was written before the error margins were corrected.
Aree!?! When was this update done? And how could there be such a big change? Was the analysis that messed up before?
http://www.hydrogenaudio.org/forums/index.php?showtopic=18474&view=findpost&p=190675
Edit, more here:
http://www.hydrogenaudio.org/forums/index.php?showtopic=18474&view=findpost&p=190827
The text says 'big tie for first place', but I assume that was written before the error margins were corrected.
Good point. AFAIK Roberto didn't change the text after the margins were corrected.
From the second link in my previous post:
In the Extension test, it seems Vorbis and WMAPro are no longer tied with AAC and MPC, and now share second place. I'll leave it to others to discuss.
If the new version of LAME is used (3.96) then which setting will be used:
[span style='font-size:12pt;line-height:100%']--preset 128 or -V 5[/span]
3.96b1 -V 5 > 3.90.3 --ap 128 :: Quizas :: tigre :: 0x verified so far
3.96b1 --p 128 < 3.90.3 --ap 128 :: Quizas :: tigre :: 0x verified so far
On this sample -V5 is better, but it hasn't been tested very much.
Changing the text of the 128kbps test is a problem. Even after the error margins were corrected, all 4 first places overlap, if even a little. I would eyeball that Musepack is better than Vorbis and WMA Pro. But Musepack isn't in first place alone, as it is obviously tied with iTunes. And iTunes, in its turn, is tied with Vorbis and WMA Pro. So there you have it...
And, to clarify: I still maintain Musepack didn't win - alone. At least, Musepack and iTunes AAC both won.
I think we should lower the bitrate of MPC compared to the last test! In the last test, MPC got a bitrate of 146.1, and that's a big difference from 128, I think. The average bitrate over all the test samples should be between 125 and 130!
Big_Berny
I think we should lower the bitrate of MPC compared to the last test! In the last test, MPC got a bitrate of 146.1, and that's a big difference from 128, I think. The average bitrate over all the test samples should be between 125 and 130!
I assume the tests are done on problem samples. So if the bitrate is inflated for these samples, it shouldn't matter, as long as the majority of music is around the correct bitrate for the setting.
Edit: The text says 'big tie for first place', but I assume that was written before the error margins were corrected.
Aree!?! When was this update done? And how could there be such a big change? Was the analysis that messed up before?
http://www.hydrogenaudio.org/forums/index.php?showtopic=18474&view=findpost&p=190675
Edit, more here:
http://www.hydrogenaudio.org/forums/index.php?showtopic=18474&view=findpost&p=190827
First I've heard about this.
That many mistakes and corrections should have been posted in a separate thread, rather than being buried in a couple of long threads.
I can understand making the mistake. That happens. I'm just surprised it wasn't a thread all by itself so everybody would hear about it.
The "how to choose quality settings for VBR codecs" topic has been discussed countless times in almost every thread related to rjamorim's tests (pretest, annoucement, results threads).
The posts by Big_Berny and FatBoyFin can be regarded as summaries of the 2 standpoints on this question.
After all these discussions, rjamorim's decision on how to choose settings for VBR codecs hasn't been made by flipping a coin, for sure. So to anyone who wants to add something related to this: please read the old threads and make sure you have something new to say before posting.
As long as the same criteria for choosing quality settings are applied consistently to all VBR codecs, that should be OK, I think.
As long as the same criteria for choosing quality settings are applied consistently to all VBR codecs, that should be OK, I think.
The problem is that not all codecs are VBR! For example, the AAC codec of iTunes is CBR, AFAIK.
Big_Berny
That's why the average quality ratings in the test results must be presented along with average efficiency, to accommodate variances in average bitrate. That way all factors are accounted for, making the results more meaningful. For instance, it won't be so surprising when a format gets the highest quality rating if it's also known that it averaged a 14% higher bitrate than the test target. (One of many factors, of course, that determine sound quality in an encoding format.)
If a codec's setting averages 128kbps over a wide range of music, yet uses a higher bitrate on these samples because it understands they are tough, it is _more_ efficient, not less, even though it averages a higher bitrate in the test.
OK, I changed my mind! I too think it should be the average bitrate over multiple albums and not over the test samples. But I think it would also be interesting to compare a song where MPC gets a bitrate of 100 and the others a higher one!
Big_Berny
Just a small note that the Vorbis listening test, which will determine the best encoder Vorbis has to offer for this multiformat test, has started (http://www.hydrogenaudio.org/forums/index.php?showtopic=20389&view=findpost&p=199844).
@guruboolez: Thanks for the tips on using SonicStage
About featuring more than 12 samples: even though this test MIGHT be a popular one, we must remember we're testing each codec at its best: the best AAC encoder (iTunes), the best MP3 encoder (Lame), the best Vorbis branch... so it'll still be a quite hard one. Probably the hardest I've conducted to this date.
For that reason, I'm not very confident about using several samples. It would be perfectly fine for the 48kbps test that is coming next, but I'm not sure it'll be a good idea for this one.
Regards;
Roberto.
About featuring more than 12 samples: even though this test MIGHT be a popular one, we must remember we're testing each codec at its best: the best AAC encoder (iTunes), the best MP3 encoder (Lame), the best Vorbis branch... so it'll still be a quite hard one. Probably the hardest I've conducted to this date.
For that reason, I'm not very confident about using several samples. It would be perfectly fine for the 48kbps test that is coming next, but I'm not sure it'll be a good idea for this one.
It's a matter of how best the listeners can be distributed to give the best results. I agree that it's a tough choice given the number of listeners who participated in the past. I wouldn't expect much more than 30 per sample if there are 12 samples, even on a very popular test. Maybe 25 is more realistic and if you're pessimistic maybe less than 20 per sample.
If you're pessimistic, it's better to keep the number of samples at 12.
ff123
If a codec's setting averages 128kbps over a wide range of music, yet uses a higher bitrate on these samples because it understands they are tough, it is _more_ efficient, not less, even though it averages a higher bitrate in the test.
I may have used the wrong term, but what I meant by "efficiency" is compression rate. An encoder using more bits on problem samples is, to me, an encoder using bits wisely to maintain good sound quality, at the expense of efficiency.
HERE'S MORE FUEL FOR THE DEBATE :-)
Isn't incorporating a totally proprietary format such as Atrac3 into the listening test kind of like bringing RealAudio into the fold? Yikes!!
I really like the idea of using a grandfather codec to demonstrate how years of research have hopefully improved upon THE old original standard from Fraunhofer - however, my twist on this would be to use the most "mature" example of the FhG codec available, from "AudioActive".
This codec had such extensive fine-tuning done at 128kbps that it would make an excellent competitor to the newest codecs available. It would truly be interesting to see just how far everyone else has come by comparison. The final MP3 codec from Fraunhofer was the benchmark that everyone looked to, as it had so much research and development in its favor.
In Roberto's MP3 tests at 128kbps, this is the codec that gave everybody quite a run for the money, beating out all the others more than once! Even though work on the codec has ceased, it may still make for some interesting competition five years later.
FOR THE 6TH CODEC SLOT - I VOTE FOR FHG PRO (AudioActive)
Isn't incorporating a totally proprietary format such as Atrac3 into the listening test kind of like bringing RealAudio into the fold? Yikes!!
There are several considerations in determining which formats to include in the listening test, and one of them is the general popularity of each format, not whether they are proprietary, in my opinion. I've heard of lots of people that use ATRAC3 (mostly MiniDisc users).
The results of this test will be meaningful to more people if the tested formats are ones they actually use, have used in the past, are considering, or at least have heard of.
In my opinion, FhG Pro belongs in an MP3-specific test, in which it has already had a showing.
QUOTE (nite @ Apr 3 2004, 10:01 PM)
Isn't incorporating a totally proprietary format such as Atrac3 into the listening test kind of like bringing RealAudio into the fold? Yikes!!
REPLY:
QUOTE (ScorLibran)
In my opinion, FhG Pro belongs in an MP3-specific test, in which it has already had a showing.
You may be totally right that FhG Pro has already had its day. The forerunner of AAC may not hold much relevance except as an anchor in the test.
I would love to see Sony's ATRAC3 held up to the quality standards of the other codecs that are definitely part of the new test.
Am I correct in believing that ATRAC3 was recently abandoned by RealPlayer as their proprietary encoding format? I thought they had made a move to AAC. I have long been a Sony fan - but... they do have a history of beating a dead horse... remember "Betamax"?
No matter... if the jury is still out, go ahead and put ATRAC3 to the challenge!
HERE'S MORE FUEL FOR THE DEBATE :-)
Isn't incorporating a totally proprietary format such as Atrac3 into the listening test kind of like bringing RealAudio into the fold? Yikes!!
I really like the idea of using a grandfather codec to demonstrate how years of research have hopefully improved upon THE old original standard from Fraunhofer - however, my twist on this would be to use the most "mature" example of the FhG codec available, from "AudioActive".
This codec had such extensive fine-tuning done at 128kbps that it would make an excellent competitor to the newest codecs available. It would truly be interesting to see just how far everyone else has come by comparison. The final MP3 codec from Fraunhofer was the benchmark that everyone looked to, as it had so much research and development in its favor.
In Roberto's MP3 tests at 128kbps, this is the codec that gave everybody quite a run for the money, beating out all the others more than once! Even though work on the codec has ceased, it may still make for some interesting competition five years later.
FOR THE 6TH CODEC SLOT - I VOTE FOR FHG PRO (AudioActive)
Erm, FhG is every bit as proprietary as RealAudio. What are you talking about? The best formats will be included, period. Whether they are proprietary or open source or whatever is completely irrelevant in this case.
Am I correct in believing that ATRAC3 was recently abandoned by RealPlayer as their proprietary encoding format? I thought they had made a move to AAC. I have long been a Sony fan - but... they do have a history of beating a dead horse... remember "Betamax"?
Atrac3 will be featured in Sony's online music store, which should be launched soon.
Actually, that's the feature of this test that is thrilling me the most: it'll be a big shootout among online music stores: Sony's (Atrac3), iTMS (Apple AAC) and almost everything else (WMA Std). I hope the Vorbis, MPC and MP3 enthusiasts forgive me, but what really interests me in this test is seeing whether AAC, WMA or Atrac3 will win. That will indicate which music store offers the files of highest quality. (Not considering the Real music store, unfortunately, since it's using a different bitrate range.)
I agree that it'll be of great interest to see these music store formats tested side-by-side.
Maybe you should rename it to the "Online Music Store Format Listening Test".
IMO more than 12 samples would be a good thing, no matter how many people participate. Reasons:
- In the previous tests, listening closely to some of the samples for ABXing was torture for me, because I don't like the music. I'd like to test as many samples as possible, but I'd prefer to have a choice and only test the music I like.
- Additionally, I believe that for music you like (i.e. instruments you're used to) it's easier to spot artifacts, especially the more subtle ones.
- This is just an assumption, but the overall results (especially the size of the error bars) depend on the number of results and the distribution of rankings, as far as I understand. If you give people the possibility to choose the samples they feel comfortable testing, you'll get more results and the error bars become smaller.
It'll be a big shootout among online music stores: Sony's (Atrac3), iTMS (Apple AAC) and almost everything else (WMA std).
Great that you like my arguments.
It makes no sense whatsoever to include WMA Pro. Since MusePack is in both tests and also unchanged since, you can compare the results with the previous test and see how WMA Pro compares.
(And if you wonder, it makes more sense to have MusePack as reference since it won the last test)
This comparison would only be somewhat possible if the samples remained the same between the two tests; otherwise every sense of comparison is lost, since WMA9 Pro was tested in CBR mode (two-pass VBR) and MPC is a highly adaptive, intrinsically VBR encoder. Even assuming both were tested in VBR mode, the comparison between tests with different samples would still be difficult (although not impossible, taking into account some error margins), since the capabilities and the number of listeners would vary between the tests, and the listening conditions for the same listeners would surely be different (the claim that VBR encoders offer constant quality over a wide range of music should also be taken with a grain of salt).
Concerning the number of samples, I think the question that arises is: "How many listeners per sample are needed to create accurate and statistically valid results?". If the answer is "around 15 listeners", then I don't think increasing the number of samples would help all that much (although it would be more than helpful under different testing conditions), taking into consideration that the distribution of listeners among the different samples won't be ideal, to the effect that some samples would be evaluated by too few listeners, making the results for those specific samples statistically invalid. On the other hand, if the number of participants in this test is expected to be quite high (highly doubtful, since the tops of modern encoders are being tested, making the test very difficult), the increase in samples makes perfect sense. So IMHO, 12 carefully chosen samples, representative of each genre, should be enough.
What do you think about using the same samples that were used in the AAC test? Something like that would help comparisons (with a margin of error of course) between encoders that will not be directly compared (e.g. Nero AAC vs. MPC, Lame vs. FAAC, Vorbis vs. FAAC and so forth).
Just my 0.02 euros
Kind Regards;
-George.
Just a small note that the Vorbis listening test, which will determine the best encoder Vorbis has to offer for this multiformat test, has started (http://www.hydrogenaudio.org/forums/index.php?showtopic=20389&view=findpost&p=199844).
I suggest everyone join the Vorbis listening test.
It really makes a difference!!!
I have tested 4 samples so far, and on 3 of them the tunings were significantly better!
Concerning the number of samples, I think the question that arises is: "How many listeners per sample are needed to produce accurate and statistically valid results?" If the answer is "around 15 listeners", then I don't think increasing the number of samples would help all that much (although it would be more than helpful under different testing conditions), given that the distribution of listeners among the samples won't be ideal: some samples would be evaluated by too few listeners, making the results for those samples statistically invalid.
IMO it's much more important to get a meaningful overall result than meaningful results for every single sample. I might be wrong, but in my understanding a low number of listeners for a single sample just leads to bigger error bars. If all error bars overlap, the result won't be meaningful for that sample but still statistically valid. When calculating the total result (average rankings + error bars), 18 samples with 20 results on average should be as good as 12 samples with 30 results on average. I believe that there will be more total results (i.e. 18*20 or 12*30 in the example) submitted if people can choose from more samples.
Please, someone knowledgeable, correct me if my assumptions about the way results are calculated are wrong.
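The pooling argument above (that 18 samples with 20 results each should be as good overall as 12 samples with 30 each) can be sketched numerically. A minimal illustration, not from any actual test tool; the per-rating standard deviation of 1.0 is purely an assumption for the example:

```python
import math

def overall_se(per_rating_sd, n_samples, listeners_per_sample):
    """Standard error of the grand mean over all submitted ratings,
    assuming every rating is independent with the same spread."""
    total_results = n_samples * listeners_per_sample
    return per_rating_sd / math.sqrt(total_results)

sd = 1.0  # assumed per-rating standard deviation (illustrative only)
se_18x20 = overall_se(sd, 18, 20)  # 18 samples, 20 listeners each
se_12x30 = overall_se(sd, 12, 30)  # 12 samples, 30 listeners each

# Both designs collect 360 ratings in total, so the overall error bar
# comes out the same width in either case.
assert 18 * 20 == 12 * 30 == 360
print(round(se_18x20, 4), round(se_12x30, 4))  # → 0.0527 0.0527
```

Per-sample error bars still widen as listeners per sample drop, which is exactly the caveat about individual samples losing significance.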
IMO it's much more important to get a meaningful overall result than meaningful results for every single sample.
This is the main reason why I'd like to see more samples. One can think of the number of samples in the test as analogous to the number of listeners per sample. I think it's probably better to have 12 listeners per sample and 24 samples than 24 listeners and 12 samples. That way the particular weaknesses/strengths of each codec can be better explored.
It would also be possible to carefully choose overlapping subsets of samples to answer criticisms of previous tests. As an example:
1. average bitrate of each codec in the subset chosen to come as close as possible to 128 kbps, including samples that produce both high and low bitrates in VBR codecs.
2. classical genre emphasized more.
3. problem samples used.
ff123
Well, if you guys say more samples will be better...
I will go for 18 samples then. The 12 samples from the AAC test will remain pretty much the same, and the remaining 6 samples could be:
-2 classical ones
-2 problem samples
-2 normal samples of styles not featured among the 12 from last test.
What do you think?
I considered adding some voice-only sample (from a movie, maybe) since that would be interesting for people doing DVD rips. But I guess that would be too transparent at 128kbps...
Regards;
Roberto.
I considered adding some voice-only sample (from a movie, maybe) since that would be interesting for people doing DVD rips. But I guess that would be too transparent at 128kbps...
Spoken voice on top of background/ambient noise or background music would be interesting
Hi All.
I discovered a problem when trying to burn a CD while converting from MP3 to ATRAC3(plus). I used the Sony program, SonicStage (Simple Burner) version 1.0.00.14180, that came with my player. I thought that this might be relevant when deciding whether or not to include ATRAC3 in the tests.
The problem occurs during the conversion phase, the first operation that creates an image file from the source file, and before it writes to disk. Once SonicStage SB encounters a problem file, it aborts the process, and you lose all of the image files that had already been converted. Nothing gets written to the CD. This is a critical shortcoming. The whole process of converting files is extremely slow - more than an hour to fill a single CD. If after an hour of processing files, it encounters a single problem file, it aborts and you lose everything.
For me, SonicStage SB could not handle converting APS or VBR MP3 files unless the maximum bit rate was set to 128 or less. What I find interesting is that it had no problems dealing with ABR 128. No problems with alt-preset insane (API) either.
Pentium 4 1.6 GHz
TDK VeloCD 48 x 16/24 x 48
Kansas: Leftoverture: Magnum Opus
APS (92-128): pass
APS (128-160): fail
APS (128-192): fail
APS (128-256): fail
API (320): pass
VBR default (64-128): pass
VBR default (64-160): fail
AP CBR 192 (192): pass
AP ABR 128 (64-320): pass
AP ABR 160 (64-320): fail
AP ABR 160 (64-160): fail
AP ABR 192 (64-320): fail
AP ABR 192 (64-224): fail
Can anyone confirm these findings? Perhaps this problem doesn't occur when using the regular SonicStage, or ATRAC3 instead of ATRAC3(plus). I don't know.
- Scott
The 12 samples from the AAC test will remain pretty much the same, and the remaining 6 samples could be:
-2 classical ones
which would mean that there will be 4 "classical" style samples (with Mahler and Hongroise from the AAC test) - more than 20% of the samples
Wouldn't this mean a bias towards WMA9 (or towards a specific music style)? At least WMA9 Pro seems to have been heavily tuned towards classical music (as the last multiformat test showed)
I considered adding some voice-only sample (from a movie, maybe) since that would be interesting for people doing DVD rips
great
which would mean that there will be 4 "classical" style samples (with Mahler and Hongroise from the AAC test) - more than 20% of the samples
Wouldn't this mean a bias towards WMA9 (or towards a specific music style)? At least WMA9 Pro seems to have been heavily tuned towards classical music (as the last multiformat test showed)
I don't think so, because even if the samples can all be lumped together as "classical", they are very different among themselves.
Hongroise is piano solo, and Mahler is a full orchestra. For the other two samples, I was planning a chamber orchestra piece and an opera. I never tested an opera sample, and I would be very interested in the results.
Can someone contribute samples of said styles? Guruboolez?
great
Past tense. I'm not considering it for this test anymore, although it would probably be an interesting sample for the 48kbps test
I don't think so, because even if the samples can all be lumped together as "classical", they are very different among themselves.
if you say so
Past tense. I'm not considering it for this test anymore, although it would probably be an interesting sample for the 48kbps test
bad
BTW, did you drop the idea of conducting a speech codec test? It would be more interesting than another low-bitrate test (bitrates hardly used, I think), which AAC (with SBR+PS) will win anyway, IMO...
I'd be interested in another punkrock/hardcore sample personally, since this genre is quite underrepresented right now. I'll post some new samples later tonight.
BTW, did you drop the idea of conducting a speech codec test?
Mostly, yes. I am pretty certain there will be not enough interest. Just look at the speech codecs forum (http://www.hydrogenaudio.org/forums/index.php?showforum=31&) and you'll understand.
I'd be interested in another punkrock/hardcore sample personally, since this genre is quite underrepresented right now.
Wouldn't gone and mybloodrusts already represent this "heavy stuff" you guys listen to?
I'm not into these styles, so I really don't know :B
just a quick question.
Why ATRAC3 and no ATRAC3 Plus?
thanks
Why ATRAC3 and no ATRAC3 Plus?
The only bitrates available for Atrac3 plus in SonicStage are 48, 64 and 256kbps
I cast my vote for a trance sample and some celtic/irish stuff.
Which ATRAC is ATRAC3? IIRC there are several variants of ATRAC3 and I can never keep them straight.
I cast my vote for a trance sample and some celtic/irish stuff.
Sounds good. Please provide samples, if you have any
Everyone is invited to provide samples you believe can fit in the 6 open slots.
Which ATRAC is ATRAC3? IIRC there are several variants of ATRAC3 and I can never keep them straight.
Atrac versioning is a mess. I'm using the version in SonicStage 2, and hoping that is what I should be using. :B
I vote for my Rosemary (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=196998) sample. It's not only from my favourite song but also a problematic sample I think.
I cast my vote for a trance sample and some celtic/irish stuff.
Sounds good. Please provide samples, if you have any
Please see this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=12255&st=75&). I uploaded 8 samples fitting both categories. I'd be happy to provide some more if you like.
The only bitrates available for Atrac3 plus in SonicStage are 48, 64 and 256kbps
I'm using SonicStage Simple Burner. I believe it uses Atrac3 plus. Under 'configuration', it provides for the following bit rates only:
48
64
66
105
132
- Scott
Why ATRAC3 and no ATRAC3 Plus?
The only bitrates available for Atrac3 plus in SonicStage are 48, 64 and 256kbps
Weird...
Thanks for the info.
which would mean that there will be 4 "classical" style samples (with Mahler and Hongroise from the AAC test) - more than 20% of the samples
Wouldn't this mean a bias towards WMA9 (or towards a specific music style)? At least WMA9 Pro seems to have been heavily tuned towards classical music (as the last multiformat test showed)
I don't think so, because even if the samples can all be lumped together as "classical", they are very different among themselves.
I just stumbled over ff123's old 64kbps listening test, and one of the conclusions he drew was "If one compares all classical chamber music, wma8 might come out near the top".
Of course someone can always say "let's listen to 18 'hard rock' samples, there will be no bias towards 'hard rock', because they are all so different among themselves", but there are still similarities if you compare styles like rock vs. techno vs. classical and so on...
Still, 4 classical-style samples (yes, the broad-range definition), as I said more than 20% of the samples, are too much; even the slightest bias towards a codec should be avoided IMHO.
Instead I would prefer a move towards other music styles. One (or four ) Irish music sample would be great, and Tigre's chanchan sample sounds interesting too.
Also, I would vote for taking in the rebel sample and the destitute sample from harashin (uploaded here (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=196998))
Hello,
I would like to ask some questions, concerning the consequences deriving from the increase of the samples for this test and most importantly about the use of the ANOVA / Fisher LSD method to analyse the results, since there are some things that I don't understand and I would very much appreciate it, if someone could help me out.
But first of all, let me cite an example from the recent 128kbps mp3 test, where a specific sample (riteofspring) was evaluated by a very low number of listeners (11).
Rite of spring
Listeners | Xing Lame iTunes Gogo FhG AActive
-----------|---------------------------------------------------
1 | 5.0 5.0 5.0 5.0 5.0 5.0
2 | --- --- --- 1.5 --- ---
3 | 2.5 2.5 3.5 3.0 4.0 4.2
4 | 2.0 3.0 2.2 3.6 4.4 4.0
5 | 4.7 4.4 4.9 4.8 4.2 4.8
6 | --- 4.7 --- 4.5 4.6 4.1
7 | 1.0 1.5 1.0 4.0 2.0 3.5
8 | 4.1 2.8 2.5 3.0 1.0 2.0
9 | 3.5 1.4 4.0 3.1 1.5 1.7
10 | 4.2 4.0 4.6 4.6 2.2 4.2
11 | 1.3 3.7 1.5 1.0 2.5 1.8
---------------------------------------------------------------
ANOVA | 3.50 3.86 3.47 2.45 3.36 4.29
Ranking | (3) (2) (4) (6) (5) (1)
In the following table we can see how many times,
each codec was placed in a particular position.
Listener Rating
| 1st 2nd 3rd 4th 5th 6th
--------------|-------------------------------------------------
Xing | 2 2 1 0 1 3
|
Lame | 3 0 2 3 0 2
|
iTunes | 4 0 1 3 0 1
|
Gogo | 3 2 3 2 0 1
|
FhG | 2 3 1 0 1 3
|
AActive | 2 4 1 2 0 1
--------------|----------------------------------------------------
So, can we draw any safe conclusions about the performance of the different codecs, by examining the above tables? Well, as far as I can understand it, I don't think so, and I surely don't think that we can conclude that iTunes should be fourth and AActive first or that Gogo is the worst by far, since only one listener came to that conclusion, while seven out of nine listeners found it better than Lame. So why does the ANOVA / Fisher LSD analysis indicate that it is the worst of all? Is there something that I am missing?
Is it because the ANOVA / Fisher LSD analysis produces very high error rates when dealing with contradicting and irregular data such as the above?
If the above is the case, will the upcoming increase of the samples combined with the difficulty of this test, lead to examples like this and if yes, shouldn't we discard them?
Can we use erroneous results such as this, to calculate the final ratings?
Will the use of an anchor make the distribution of the ratings more normal and thus provide more accurate results?
Should we consider revising the way we analyze the results, or am I just posting nonsense and making a complete fool of myself?
Kind regards;
-George.
I'd be interested in another punkrock/hardcore sample personally, since this genre is quite underrepresented right now.
Wouldn't gone and mybloodrusts already represent this "heavy stuff" you guys listen to?
I'm not into these styles, so I really don't know :B
mybloodrusts should be replaced, since its bad recording quality is somewhat sub-par even for this genre. I prepared three alternatives, which are similar in style:
A Case of Grenada - The Secret in The Sound, from 'The Evidence':
acaseofgrenada-thesecretinthesound.sample18sec.flac (http://dev0.rc55.com/files/samples/acaseofgrenada-thesecretinthesound.sample18sec.flac)
The Blood Brothers - Guitarmy, from 'Burn Piano Island, Burn!':
bloodbrothers-guitarmy.sample12sec.flac (http://dev0.rc55.com/files/samples/bloodbrothers-guitarmy.sample12sec.flac)
Give Up The Ghost - Since Always, from 'We're Down 'Til We're Underground':
giveuptheghost-sincealways.sample18sec.flac (http://dev0.rc55.com/files/samples/giveuptheghost-sincealways.sample18sec.flac)
I'd suggest the last one as a replacement, mostly because it produces some very interesting artifacts, but the others should be suitable too.
So, can we draw any safe conclusions about the performance of the different codecs, by examining the above tables? Well, as far as I can understand it, I don't think so, and I surely don't think that we can conclude that iTunes should be fourth and AActive first or that Gogo is the worst by far, since only one listener came to that conclusion, while seven out of nine listeners found it better than Lame. So why does the ANOVA / Fisher LSD analysis indicate that it is the worst of all? Is there something that I am missing?
If I recalculate the result based on the score table given by you, GoGo averages higher than LAME. That contradicts what the result says. So I don't get it either. Perhaps your score table is wrong?
Thanks for all the submitted samples. I'll check them out today and tomorrow.
And I'm still waiting for classical samples. Anyone? :/
Regards;
Roberto.
And I'm still waiting for classical samples. Anyone? :/
:B
I give up trying to convince rjamorim; first he always says no way, he won't do as you suggest (at least in my case it's that way), and after a month he says "yeah, let's take atrac3"...
first he always says no way, he won't do as you suggest (at least in my case it's that way) and after a month he says "yeah, let's take atrac3"...
Bullshit. From the start I said I preferred an anchor instead of Atrac3. And I still prefer it. But if the vast majority wants Atrac3, I'll oblige.
I never said no way about Atrac3 in this test.
Now, I don't see the vast majority wanting to ditch classical samples. Instead, I see the biggest authority in listening tests in this forum (and one of the forum members I respect the most) suggesting them.
And I'm still waiting for classical samples. Anyone? :/
I uploaded Bartok_strings (http://www.hydrogenaudio.org/forums/index.php?showtopic=18360&view=findpost&p=195032). However, it's already known by some Vorbis encoders; still, it's a "good" sample for people who are not familiar with classical music (like me).
EDIT: grammar
you mean ff123's statement?
It would also be possible to carefully choose overlapping subsets of samples to answer criticisms of previous tests. As an example:
...
2. classical genre emphasized more
To me this sounded more as if the criticism was that there were too many classical genre samples in the last test
Classical music suggestions (additionally to the piano and brass sample):
- wood winds
- classical voice, choral
- recording with prominent noise (stage noise, hiss from old recording)
I will upload some samples later...
To me this sounded more as if the criticism was that there were too many classical genre samples in the last test
Well, if you consider all three points he raised (it makes me wonder why you didn't quote them too), it looks more like these points are solutions to criticism, namely:
1. VBR bitrate deviation
2. Too little emphasis on classical
3. Too few problem samples
An increased number of samples helps address all this criticism:
1. bitrates are averaged over a broader set of samples
2. it makes room for classical samples
3. it makes room for problem samples
To me this sounded more as if the criticism was that there were too many classical genre samples in the last test
Well, if you consider all three points he raised (it makes me wonder why you didn't quote them too), it looks more like these points are solutions to criticism, namely:
1. VBR bitrate deviation
2. Too little emphasis on classical
3. Too few problem samples
An increased number of samples helps address all this criticism:
1. bitrates are averaged over a broader set of samples
2. it makes room for classical samples
3. it makes room for problem samples
Roberto is saying what I meant to say. There has been criticism in the past because classical, or maybe more generally, acoustic-only music, has been under-represented (so those critics say).
ff123
But first of all, let me cite an example from the recent 128kbps mp3 test, where a specific sample (riteofspring) was evaluated by a very low number of listeners (11).
Rite of spring
Listeners | Xing Lame iTunes Gogo FhG AActive
-----------|---------------------------------------------------
1 | 5.0 5.0 5.0 5.0 5.0 5.0
2 | --- --- --- 1.5 --- ---
3 | 2.5 2.5 3.5 3.0 4.0 4.2
4 | 2.0 3.0 2.2 3.6 4.4 4.0
5 | 4.7 4.4 4.9 4.8 4.2 4.8
6 | --- 4.7 --- 4.5 4.6 4.1
7 | 1.0 1.5 1.0 4.0 2.0 3.5
8 | 4.1 2.8 2.5 3.0 1.0 2.0
9 | 3.5 1.4 4.0 3.1 1.5 1.7
10 | 4.2 4.0 4.6 4.6 2.2 4.2
11 | 1.3 3.7 1.5 1.0 2.5 1.8
---------------------------------------------------------------
What are the ratings with dashes in them? Are these where the listener ranked the reference?
My page: http://ff123.net/friedman/stats.html
can calculate both parametrically (where ratings on a 1.0 to 5.0 scale are used) and non-parametrically (where rankings, i.e. 1st, 2nd, 3rd, etc., are used).
Excluding those entries with dashes in them, this sample did not produce significant results either way. If these were the actual numbers for the 128 kbps MP3 test, then the graph for that sample should only show the averages and no error bars. If the ANOVA does not find the results significant, it doesn't go further than that, so no error bars are computed.
Maybe this is a case where Roberto's graph needs to be corrected.
As for using these results in the final calculation, I would think it's still ok to do so. If there were a lot of samples like this (where the results are not significant), the worst that can happen is that when you group them all together, they will add a false bias to the results. But more likely you'll just be adding more random noise to the results.
ff123
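To make the point above concrete, here is a rough sketch, not ff123's actual tool, of a plain one-way ANOVA F-statistic computed over the nine complete rows of the table quoted above (excluding the two listeners with "---" entries). Note that the real analysis blocks by listener, so this unblocked version is only an approximation:

```python
# Unblocked one-way ANOVA sketch on the 9 complete rows of the quoted table.
# (ff123's tool uses a blocked, per-listener design; this only approximates it.)
rows = [  # column order as posted: Xing, Lame, iTunes, Gogo, FhG, AActive
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    [2.5, 2.5, 3.5, 3.0, 4.0, 4.2],
    [2.0, 3.0, 2.2, 3.6, 4.4, 4.0],
    [4.7, 4.4, 4.9, 4.8, 4.2, 4.8],
    [1.0, 1.5, 1.0, 4.0, 2.0, 3.5],
    [4.1, 2.8, 2.5, 3.0, 1.0, 2.0],
    [3.5, 1.4, 4.0, 3.1, 1.5, 1.7],
    [4.2, 4.0, 4.6, 4.6, 2.2, 4.2],
    [1.3, 3.7, 1.5, 1.0, 2.5, 1.8],
]
cols = list(zip(*rows))        # one tuple of ratings per codec
n, k = len(rows), len(cols)    # 9 listeners, 6 codecs
grand = sum(x for c in cols for x in c) / (n * k)
ss_between = n * sum((sum(c) / n - grand) ** 2 for c in cols)
ss_within = sum((x - sum(c) / n) ** 2 for c in cols for x in c)
f_stat = (ss_between / (k - 1)) / (ss_within / (k * (n - 1)))
# F comes out far below the roughly 2.4 needed for p < 0.05 with
# (5, 48) degrees of freedom: not significant, so no error bars.
print(round(f_stat, 2))
```

The between-codec variation is tiny next to the listener-to-listener spread, which is why no significance (and hence no error bars) can be claimed for this sample.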
Roberto is saying what I meant to say. There has been criticism in the past because classical, or maybe more generally, acoustic-only music, has been under-represented (so those critics say).
ok convinced
(damn i hate abxing classic samples )
(damn i hate abxing classic samples )
Well, you won't need to ABX classical samples.
if I go with 18 samples, and
if I go with 4 classical samples, that still leaves 14 samples of other styles. Plenty of room to have fun.
But first of all, let me cite an example from the recent 128kbps mp3 test, where a specific sample (riteofspring) was evaluated by a very low number of listeners (11).
OK, I see what happened with your interpretation of the data from this particular sample:
Here's what your table looked like:
Rite of spring
Listeners | Xing Lame iTunes Gogo FhG AActive
-----------|---------------------------------------------------
1 | 5.0 5.0 5.0 5.0 5.0 5.0
2 | --- --- --- 1.5 --- ---
3 | 2.5 2.5 3.5 3.0 4.0 4.2
4 | 2.0 3.0 2.2 3.6 4.4 4.0
5 | 4.7 4.4 4.9 4.8 4.2 4.8
6 | --- 4.7 --- 4.5 4.6 4.1
7 | 1.0 1.5 1.0 4.0 2.0 3.5
8 | 4.1 2.8 2.5 3.0 1.0 2.0
9 | 3.5 1.4 4.0 3.1 1.5 1.7
10 | 4.2 4.0 4.6 4.6 2.2 4.2
11 | 1.3 3.7 1.5 1.0 2.5 1.8
---------------------------------------------------------------
Here's what the table actually looks like:
Xing Lame iTunes Gogo FhG AActive
4.00 4.20 4.60 2.20 4.20 4.60
1.70 4.00 3.10 1.40 1.50 3.50
2.80 2.50 2.00 1.00 3.00 4.10
2.00 3.50 1.50 1.00 1.00 4.00
4.10 4.60 5.00 4.50 5.00 4.70
4.90 4.80 4.70 4.80 4.40 4.20
1.50 2.50 1.30 1.00 1.80 3.70
5.00 5.00 5.00 1.50 5.00 5.00
4.00 2.20 3.00 2.00 3.60 4.40
3.50 4.20 3.00 2.50 2.50 4.00
5.00 5.00 5.00 5.00 5.00 5.00
Averages: 3.50 3.86 3.47 2.45 3.36 4.29
The results files don't list the ratings in the same order every time, so the first rating doesn't necessarily correspond to Xing. You got the ordering mixed up. Roberto uses a script (http://www.phong.org/chunky/) to process the results files.
ff123
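As a sanity check on the corrected ordering, the per-codec averages can be recomputed directly from the corrected table above; they reproduce the published ANOVA line (3.50, 3.86, 3.47, 2.45, 3.36, 4.29). A minimal sketch:

```python
# Recompute per-codec averages from the corrected Rite of Spring table
# and compare against the averages line published with the test results.
rows = [
    [4.00, 4.20, 4.60, 2.20, 4.20, 4.60],
    [1.70, 4.00, 3.10, 1.40, 1.50, 3.50],
    [2.80, 2.50, 2.00, 1.00, 3.00, 4.10],
    [2.00, 3.50, 1.50, 1.00, 1.00, 4.00],
    [4.10, 4.60, 5.00, 4.50, 5.00, 4.70],
    [4.90, 4.80, 4.70, 4.80, 4.40, 4.20],
    [1.50, 2.50, 1.30, 1.00, 1.80, 3.70],
    [5.00, 5.00, 5.00, 1.50, 5.00, 5.00],
    [4.00, 2.20, 3.00, 2.00, 3.60, 4.40],
    [3.50, 4.20, 3.00, 2.50, 2.50, 4.00],
    [5.00, 5.00, 5.00, 5.00, 5.00, 5.00],
]
codecs = ["Xing", "Lame", "iTunes", "Gogo", "FhG", "AActive"]
averages = [round(sum(col) / len(col), 2) for col in zip(*rows)]
print(dict(zip(codecs, averages)))
# → {'Xing': 3.5, 'Lame': 3.86, 'iTunes': 3.47, 'Gogo': 2.45, 'FhG': 3.36, 'AActive': 4.29}
```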
For classical (lyrical - vocal) samples:
http://www.hydrogenaudio.org/forums/index.php?showtopic=14134&view=findpost&p=200645
My favorites among them:
• [glockenspiel] / « Pa-Pa-Pa » / Debussy / Mouret: the average volume is really low, and VBR encoders may decrease the bitrate, lowering the quality. With the last three samples, the bitrate drop is serious with MPC, LAME V 5 and also Vorbis QK32 (less sensitive here). I've just taken a look at MPC quality with all these samples... people requesting "low-bitrate samples" for MPC (or other encoders) should find an answer here to the possible quality drop.
The [glockenspiel] sample may be difficult for most encoders: the volume is very low, and in addition there's a very special instrument accompanying the voices (it might introduce pre-echo problems and a dull sound...).
I really like the Debussy one. It's maybe the most dramatic for VBR encoders. It's a live performance, probably recorded in winter (many coughs in the background). There's a lot of detail behind the scene, and it's really interesting to hear how much detail remains after encoding. The volume could be increased by the listener without harm.
• Rinaldo: two sopranos are singing. Few instruments, but castanets as accompaniment! Should be enough to betray most encoders.
• Mondonville: Prometheus is singing, and lame --preset standard is shaking... I've ABXed 3.96b2 with --standard; I suppose that all 128 kbps encoders will have audible problems.
• Vivaldi - Vespri: good stereo image, chorus, orchestra, harpsichord...
• [01] Kancheli: the biggest dynamic range of all the samples: ~60 dB. Ultra-quiet parts alternate with very loud chorus moments. If we need to test quiet/loud orchestral/chorus music in an all-in-one sample, this one is perfect. There's also a difficult-to-encode brass instrument.
To be short, my favorites for this test are Debussy - 01. Acte I - Scène 1 - Une forêt (2).wav, for measuring encoder quality on very quiet parts; then [GLOCKENSPIEL] Zaüberflöte - 14. ACTE I · Scène 16 - « Schnelle Füße, starker Mut » (Pamina, Papageno), because it makes me happy; and finally [01] Kancheli - Styx, for viola, mixed choir & orchestra (19 (30'50'').wav.
The other samples are here for artistic or aesthetic reasons, in case Roberto prefers vocal music from the Renaissance or sacred compositions. I offer some variety
also tigre's chanchan sample sounds interesting
I didn't think of that one - thanks for reminding.
Of course I'd like to see at least 4 latin music samples, but since I try to be a reasonable person, I suggest replacing my "Quizas" sample from the 128kbps AAC test with Chanchan1 (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=197226). Same genre but a different kind of problems (there are already enough samples with pre-echo/smearing problems IMO), even with lame --(a-)p standard. Plus it's a recent, decent recording and not some old material containing weird artifact-like sounds in the original because of too much processing/restoration (noise removal etc.), like the "Quizas" sample.
My "Bodyheat" sample (live recording) is similar (old recording with processing artifacts), so I suggest replacing it with some other live recording. (IIRC there was a hard-to-encode sample from Eric Clapton - Unplugged on some test sample page...)
I uploaded 4 samples in the same thread.
Mozart: Requiem - Domine Jesu
Verdi: Requiem - Lacrimosa (noisy, recording from 1959)
Korngold: Die tote Stadt
Mahler: Um Mitternacht (rather silent)
But Guru seems to cover a wide variety anyway.
I vote for my Rosemary (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=196998) sample. It's not only from my favourite song but also a problematic sample I think.
I like it. Also, it is appropriate, since Suzanne Vega was used by FhG to tune the algorithms that would later become MP3
Could you please tell the name of the Album this sample comes from? Also, can this style be defined?
I suggest replacing my "Quizas" sample from 128kbps AAC test with Chanchan1.
OK. Since you surely understand a lot more about latin music than I do, I will replace quizás with chanchan1.
I'd suggest the last one as a replacement, mostly because it produces some very interesting artifacts, but the others should be suitable too.
I will go with your suggestion. These samples are so far from what I personally like, that I wouldn't even know what to pick.
So, mybloodrusts gets replaced by "SinceAlways".
Please see this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=12255&st=75&). I uploaded 8 samples fitting both categories. I'd be happy to provide some more if you like.
All files uploaded to that thread have been deleted.
I think that happened when IPB was updated to 1.3, and Dibrom mistakenly deleted the entire forum folder.
To be short, my favorites for this test are Debussy - 01. Acte I - Scène 1 - Une forêt (2).wav, for measuring encoder quality on very quiet parts
I agree. That is an encoder weakness that hasn't been explored much in my tests.
I will probably feature it for Classical vocal. Can you please upload a lossless version of that sample?
My "Bodyheat" sample (live recording) is similar (old recording with processing artifacts) - I suggest replacing this one by some other live recording. (IIRC there was a hard to encode sample from eric clapton - unplugged on some test sample page...)
What I like about BodyHeat is that, besides being a live recording, it represents Soul. This style wasn't represented in my tests until this sample got in.
I uploaded Bartok_strings. However, it's already known by some Vorbis encoders; still, it's a "good" sample for people who are not familiar with classical music (like me).
I liked the string quartet sample. That's exactly what I want for the 4th classical sample: chamber orchestra or string quartet (preferably a quartet). But it's a little too short. Maybe you can provide a longer sample?
So, my idea of the 6 samples to be added would be:
-Guruboolez' Debussy
-Some string quartet/chamber orchestra
-2 problem samples (suggestions?)
-harashin's rosemary
-A Celtic sample provided by Gecko.
Comments? Ideas? Flames?
Also, thank-you for all the sample ideas and suggestions
Please see this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=12255&st=75&). I uploaded 8 samples fitting both categories. I'd be happy to provide some more if you like.
All files uploaded to that thread have been deleted.
I think that happened when IPB was updated to 1.3, and Dibrom mistakenly deleted the entire forum folder.
Hehe, I can picture Dibrom's "D'oh!"-moment.
I'll put together another small set of samples then. Are you only interested in celtic/irish or also trance? Please give me time until Friday. I've got my last exam of this very exhausting semester on Thursday.
Are you only interested in celtic/irish or also trance?
Sure, let's give trance a try.
Please give me time until Friday.
Until Sunday, actually. I'll travel tonight for the Easter holidays and will only be back early next Monday.
I like it. Also, it is appropriate, since Suzanne Vega was used by FhG to tune the algorithms that would later become MP3
I think Tom's Diner (FhG's pick) is the best of her songs, BTW.
Could you please tell the name of the Album this sample comes from? Also, can this style be defined?
It's from "The Best of Suzanne Vega - Tried and True". Usually I don't really care about music styles, but this time I'd define it as "Folk" or just "Pop". You could check out the tag in my FLAC sample anyway.
I liked the string quartet sample. That's exactly what I want for the 4th classical sample: chamber orchestra or string quartet (preferably a quartet). But it's a little too short. Maybe you can provide a longer sample?
I prepared a longer version of it.
http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=200859
Got your sample. Thank-you for preparing and uploading it
@ ff123: Thank-you very much for spending some of your time to clarify this for me, it got me thinking and I knew that I was probably doing something wrong. Thanks again (your website by the way, was a priceless source of information for me).
@ Roberto: I deeply respect your work and I had no intention whatsoever to doubt the credibility of your tests, so I hope you didn't take it that way.
@ Roberto: I deeply respect your work and I had no intention whatsoever to doubt the credibility of your tests, so I hope you didn't take it that way.
No worries. It was just a mistake
What I like about BodyHeat is that, besides being a live recording, it represents Soul. This style wasn't represented in my tests until this sample got in.
I have uploaded a classic Marvin Gaye (http://www.hydrogenaudio.org/show.php/showtopic/20504&) sample to help, if you decide to use the Clapton sample instead of bodyheat.
Edit: OT talk removed
I think we should lower the bitrate of MPC compared to the last test! In the last test MPC got a bitrate of 146.1, and that's a big difference from 128. I think the average bitrate over all test samples should be between 125 and 130!
I agree
re atrac3 or anchor
I'd personally prefer an anchor... mainly because I'm dreading another listening test and an anchor would make me feel better by at least being able to pick one of the codecs
How about increasing from 12 samples to 18?
hm would this help the final results? if not i would avoid this, as 10 samples are already enough to listen to imho
Agree here too
I find it easier to listen to less samples from more codecs... basically I have to learn less samples then
I know the argument about "well you don't have to do all the samples" but I'd still feel compelled to do them all
I have long been a Sony fan - but... they do have a history of beating a dead horse... remember "Betamax"?
And MiniDisc and MemoryStick
I cast my vote for a trance sample and some celtic/irish stuff.
Yes
Actually, if anyone has an original, you could kill two birds with one stone with Shiva Shidapu's Power of Celtic
I know it's on the Goa-head 5 VA and the Shiva Space Technology album
Flooding the thread, eh, Stux?
Re: Atrac3 vs. anchor: I personally would also prefer an anchor. That would leave my test less open to criticism. But I'm already being very strict about the other 5 codecs, so I'll leave the 6th to users' decision. And since the vast majority wants Atrac...
About 18 samples: Well, that's good if you feel compelled to test them all. I suggest you do one at a time, preferably in a random order, until you get too fed up with them. Then you send me the results
@music_man_mpc: Thanks for the samples. I will check them as soon as I return home, monday.
I can't upload the Debussy in the next hours. I suppose you don't need it before this week-end. Is that OK?
Sure. I won't be able to download it until monday anyway...
After some searching I found the "perfect" (IMO) replacement for BodyHeat:
Trust (http://www.hydrogenaudio.org/forums/index.php?showtopic=20504&view=findpost&p=201604)
It's a well-done live recording with hard-to-encode, realistic-sounding applause (similar to Eric Clapton - Unplugged). It's gospel, but lyrics aside, it fits perfectly into the soul/funk category (at least this song). And it kills a 3rd bird with the same stone, because it contains female vocals.
I've tried to find something useful on other CDs, e.g. a Marvin Gaye live CD, but the quality, especially of the audience noise, is bad in most cases.
Roberto, please see here for Celtic samples:
http://www.hydrogenaudio.org/forums/index.php?showtopic=20606&view=findpost&p=201810
Has the version and setting of Lame to use been settled?
3.90.3, 3.96b1, 3.96b2, or 3.96final?
(a)p 128 or V 5?
ff123
3.96 final
Roberto, just wanted to let you know that I'll send you an appropriate version of ABC/HR for Java tomorrow, in case it's not too late. Sorry I didn't finish it earlier. I constantly got sidetracked by other things during the last few weeks (yeah, I know that's a dumb excuse)
As a question to everyone: I implemented a feature to output detailed ABX reports to the results files (progress, time, playback range, etc.). So, basically, I just want to know whether I should enable this feature, or if you would find it useless/evil ("A successful ABX is a successful ABX, it's nobody's business how I got it!", "Not everybody has to know I do all my ABXing at 4am in the morning!").
As a feature, it can't hurt -- more information is added. The only problems that can occur are in the interpretation of the results.
Also, to the listeners of this test, enabling or disabling the feature doesn't make a difference at all until the results are decrypted. So I'd say go ahead and enable the detailed reports.
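For anyone curious how such ABX logs end up being judged: the standard yardstick is the binomial probability of scoring that well purely by guessing. A minimal illustrative sketch (not part of ABC/HR itself; the function name is made up):

```python
from math import comb  # binomial coefficient, Python 3.8+

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials purely by guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 14 correct out of 16 trials: p ~= 0.002, well below the usual 0.05 threshold
print(round(abx_p_value(14, 16), 4))  # 0.0021
```

The detailed-report feature being discussed would simply record the per-trial history behind such a score.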
ff123
3.96 final
From the lame threads, it appears that 3.96final may have regressed from the 3.96bx versions (at least for cbr 128), and those were in turn worse quality than 3.90.3 ap 128.
The idea behind the vorbis and aac contenders is to find the best-sounding version. For Vorbis, the choice is actually going far astray from what a typical user would choose to encode with, aotuv not being the official release.
But what is the idea behind the lame choice, exactly?
I actually don't really have a preference for what's chosen for lame, but I think the rationale should be clearly spelled out, especially if the philosophy is not consistent for all codecs.
ff123
It seems a little bit unfair to me to test the best Vorbis compile but LAME's latest compile, proven by ABX test results to be inferior to 3.90.3 and even worse than the 3.96 betas, just because it needs to be tested... I think we should test the latest official encoders, no matter whether they're considered the best or not.
Besides, Xiph.org seems to believe their current encoder is the best, since they had plenty of time to include tunings made by Garf and others in their CVS and never did it.
I'm sure this aoTuV encoder you guys have been tuning for the test sounds better than the official Xiph encoder, but let's face it, using "specially tuned" encoders for this test may be misleading for the common user, who is not aware of the builds used. If this codec performs well, an HA outsider will get the impression you're talking about Xiph's encoder.
So, my suggestion is: test the best available compiles for every codec, or test the latest official compile of every one of them.
Personally, I think it depends on the goals of this listening test: whether it is designed to find the best codec for consumers, or to find the best codecs based on their technical merits.
aoTuV may be far away from what consumers would most likely use but other formats like MPC are also not so well-known. Or if we look at the recent AAC listening test, I have a suspicion that neither FAAC nor Compaact are as well known to the consumer as, say, Apple iTunes.
If the goal of this test is based on technical merits, then aoTuV is a good example of what Vorbis is capable of and whether development is heading in the right direction. I'm sure codec developers rely a lot on listening tests to judge whether what they've done is good, and use these results to improve the codec further. Xiph Vorbis 1.0.1 has not undergone any development since the last test to warrant another test, unless people want to use it as an anchor to see how newer formats compare.
Just my two cents.
But what is the idea behind the lame choice, exactly?
A rather naïve rationale that "newer version is probably better"
Well, I guess now we'll have plenty of time to iron out these issues.
I'm very sorry to inform you that the test will have to be postponed. My dad (a doctor) just called me, and told me I'll have to undergo surgery next friday.
I did an examination while I was at my parents' town for Easter, and the results came out today.
I don't know how to explain it properly, but the doctor needs to extract from the back of my neck something that we call a "nodule" in Portuguese. It might be cancerous.
After the extraction is done, they send it to a lab to examine if it's really cancerous tissue or just some harmless cyst.
Dad told me odds are it's harmless, but you can take no risks...
The biopsy results should be out next Monday or Tuesday, and by then I'll know if I'll be able to start the test next Wednesday, or if I'll have to start doing chemotherapy/whatever and postpone the test indefinitely.
Thank-you for your understanding.
Best regards;
Roberto.
I don't know how to explain it properly, but the doctor needs to extract from the back of my neck something that we call a "nodule" in Portuguese. It might be cancerous.
Oy! Roberto, wishing for the best!
ff123
Best of luck, Roberto.
I'm very sorry to hear that, hope the lab work comes back fine.
I wish you the very best Roberto. This test can definitely wait. Your health is much more important. Let us know if there's anything we can do for you.
Wish you all the best, Roberto.
Oh boy... sorry to hear that, Roberto. Best of luck to you!
Besides, Xiph.org seems to believe their current encoder is the best, since they had plenty of time to include tunings made by Garf and others in their CVS and never did it.
Garf's tunings have certain drawbacks (inflated bitrate) which should be ironed out in a reference library.
I'm sure this aoTuV encoder you guys have been tuning for the test sounds better than the official Xiph encoder, but let's face it, using "specially tuned" encoders for this test may be misleading for the common user, who is not aware of the builds used.
Xiph.org delivers a reference implementation. Although Xiph's encoders are certainly the most widely used ones, there's nothing "wrong" with 3rd-party encoders. You won't argue LAME should not be used because it is not from Fraunhofer, will you?
If this codec performs well, an HA outsider will get the impression you're talking about Xiph's encoder.
I'm sure the exact encoder version will be documented.
Roberto: Best of luck to you!
I'm sure the exact encoder version will be documented.
I will definitely mention on every plot that it is a branch - say, "Vorbis aoTuV" instead of only "Vorbis"
I'll also write a whole paragraph explaining why aoTuV was chosen, linking to the Vorbis test thread.
Thank-you very much for all the good luck wishes
I would like to wish you good luck as well Roberto. At your age it is quite unlikely to be cancerous, although one never knows, so I think you have every right to be optimistic . Not that you don't already know that, I'm sure.
Just wanted to wish you all the best, Roberto!
From the lame threads, it appears that 3.96final may have regressed from the 3.96bx versions (at least for cbr 128), and those were in turn worse quality than 3.90.3 ap 128.
Ok, from my own tests of 3.96 final p 128 vs. 3.90.3 ap 128, I definitely prefer 3.96 final using --preset 128 (results of listening to the 12 samples of the first 64 kbps test). See my contribution to the thread:
http://www.hydrogenaudio.org/forums/index.php?showtopic=20715&
ff123
all the best also from my side, rjamorim
I will definitely mention on every plot that it is a branch - say, "Vorbis aoTuV" instead of only "Vorbis"
great idea! also good that you are adding "Vorbis" next to "aoTuV", otherwise it would cause a lot of hassle telling every newbie that this is a Vorbis encoder
My best wishes to you and your family Roberto, hopefully all goes well and you can continue taunting me daily on ICQ
I have finally uploaded the Roni Size sample I suggested to you, if you get the time you can check it out (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=35&t=20742) (and the old post relating to it is here (http://www.hydrogenaudio.org/forums/index.php?showtopic=2824&st=0&&#entry27797))
I hope the answer will be good; my mother has exactly the same problem at this moment...
Best regards.
Tanguy
I will definitely mention on every plot that it is a branch - say, "Vorbis aoTuV" instead of only "Vorbis"
great idea! also good that you are adding "Vorbis" next to "aoTuV", otherwise it would cause a lot of hassle telling every newbie that this is a Vorbis encoder
Adding a big explanation (iTunes *cough*) on top might be a good idea, too.
"aoTuV is an experimental Vorbis encoder. It's not the one you can download from Vorbis.com. It's based on code from Xiph.org, but is neither maintained nor supported by Xiph.org. You can download the latest version of aoTuV here."
Adding a big explanation (iTunes *cough*) on top might be a good idea, too.
Yes. I have learned that no amount of clarification is enough, ever :B
Well, I just got the biopsy results, and I'm happy to say it was indeed only a cyst.
Thank-you all very much for the good luck wishes.
Best regards;
Roberto.
Edit: I don't want to rush the test to start it tomorrow, so my plan is to start the test next wednesday (April 28th)
That also gives us more time to iron out issues about Lame version and Vorbis tuning to be tested
Well, I just got the biopsy results, and I'm happy to say it was indeed only a cyst.
great!
Well, I just got the biopsy results, and I'm happy to say it was indeed only a cyst.
Good for you
(I would assume congratulations aren't really appropriate )
Hello.
I recently got this e-mail:
hi roberto,
ANOVA is not normally used when the dependent variable is measured on an
'ordinal' scale, which is what you've used in these tests. The people are
effectively ranking their preference. ANOVA requires the use of 'ratio' or
'interval' data. While violations of the assumptions underlying ANOVA when
using 'ordinal' data are not as common as previously thought, you should
still tell people that it is a possibility.
I must admit I don't understand what he is talking about. So could someone with knowledge on statistics please tell me
a) What does he mean?
b) Is it correct? Should I add it to the result pages?
Thank-you.
Bollocks.
It's obvious he's never used the ABC/HR application. The ratings aren't just 1, 2, 3, 4, 5. The scale has resolution down to tenths between these 5 numbers, which effectively makes it continuous, or at least close enough to allow a parametric analysis without violating one of the assumptions such an analysis rests on.
ff123
Edit: "Ordinal" means first, second, third, fourth, etc. I.e., a ranking system instead of the rating system used by ABC/HR. So it's possible he wasn't even thinking about too big of a step size in between ratings at all, and confused the more basic idea about ratings vs. rankings. Either way, he's wrong.
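To make the ratings-vs-rankings distinction concrete, here is a minimal one-way ANOVA F-statistic computed on hypothetical ABC/HR-style ratings (continuous 1.0-5.0 values, as ff123 describes); for truly ordinal data, a rank-based test such as Kruskal-Wallis would be the alternative the emailer has in mind. All numbers below are invented for illustration:

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)
    for a list of rating lists, one list per codec."""
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = sum(all_vals) / n
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # variation of individual ratings around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical ratings for three codecs from five listeners each
codec_a = [4.2, 3.8, 4.5, 4.0, 4.3]
codec_b = [3.1, 3.4, 2.9, 3.3, 3.0]
codec_c = [4.1, 3.9, 4.4, 4.2, 4.0]
print(one_way_anova_f([codec_a, codec_b, codec_c]))
```

A large F here means the between-codec differences dominate the listener-to-listener noise; the test results pages compare that F against the F-distribution to get a p-value.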
Garf's tunings have certain drawbacks (inflated bitrate) which should be ironed out in a reference library.
It doesn't matter. It performs better than Xiph's reference encoder, and that wasn't enough to arouse Xiph's attention. Fortunately, Garf's potential is now being used in AAC...
Xiph.org delivers a reference implementation. Although Xiph's encoders are certainly the most widely used ones, there's nothing "wrong" with 3rd-party encoders. You won't argue LAME should not be used because it is not from Fraunhofer, will you?
Correct me if I'm wrong, but it seems to me that's not the case here. MPEG-1 is the reference; the Fraunhofer and LAME codecs are differently tuned encoders based on that reference. LAME is probably the best MP3 codec around because its open-source process works as it should. The developers work as a team and pay attention to each other's ideas.
However, from what I've read, I got the impression Xiph doesn't give a sh*t about external codec development, and I'm sure Monty is too busy to have a look at these new encoders' source. So you better believe your tunings will be forgotten when the new only-Monty-coded Vorbis II encoder is released. And then again, you'll grab the sources and start tuning it, and the whole process repeats itself.
I admit it, maybe I want the reference encoder to be part of the test so Xiph can learn from their mistakes and start giving external developers some attention and credit.
btw: glad to know it's nothing serious, Roberto
why don't you reply to that email and ask the author of it?
It doesn't matter. It performs better than Xiph's reference encoder, and that wasn't enough to arouse Xiph's attention. Fortunately, Garf's potential is now being used in AAC...
According to http://www.vorbis.com/ot/20030930.html
Fans of the Garf-tuned Vorbis encoder will be happy to hear that Monty is using Garf's work wherever he can.
Since Monty has not released Vorbis 1.1 yet, it is purely speculative to say that Xiph's attention has not been aroused. Monty says that he has higher priorities now with OggFile, and Vorbis is not the only Xiph project underway, so he has to time-slice his efforts. As with most open-source projects, developers aren't paid generous salaries to work on projects with full dedication. I'm sure the other 3rd-party Vorbis tuners would also understand the sacrifice we make in our own spare time to work on Vorbis. These things take time. We are not employed by big companies with commercial interests to work full-time.
However, from what I've read, I got the impression Xiph doesn't give a sh*t about external codec development, and I'm sure Monty is too busy to have a look at these new encoders' source. So you better believe your tunings will be forgotten when the new only-Monty-coded Vorbis II encoder is released. And then again, you'll grab the sources and start tuning it, and the whole process repeats itself.
I don't think it is a mandatory expectation for all 3rd-party code to be included in the reference coder. The source code is open, outside developers tweak it, and it is up to the lead developer, Monty, to determine whether the tweaks are good enough to be included in the reference, similar to how Linus Torvalds makes the final judgement about what goes into his linux kernel, in the midst of a plethora of code contributions from around the world.
I admit it, maybe I want the reference encoder to be part of the test so Xiph can learn from their mistakes and start giving external developers some attention and credit.
Well, it is ultimately up to Roberto to determine the rationale of the test and if he does choose the reference Vorbis encoder, then I doubt it is for that reason.
That reason is nonsensical.
Let's repeat the mantra again:
"This test is to compare each of the featured formats amongst themselves, and discover which format offers the best quality at 128kbps avg."
Of course, for that to be valid, each format must be represented by its best implementation at that bitrate. The best MP3 encoder (Lame), the best AAC encoder (iTunes), and the best Vorbis tuning.
Also, these tests are done by users, for users. I won't use a bad implementation of Vorbis just to punish Xiph because they can't take care of their schedule and their priorities properly.
Well, I guess ff123's proposed Lame listening test warrants yet another postponing.
http://www.hydrogenaudio.org/forums/index.php?showtopic=21079&
The test should now start on May 5th.
I wonder if this test will ever start
It just reminds me of The Supremes' old song "You can't hurry love (http://my.execpc.com/~suden/hurry_love.html)".
What are the settings going to be for wma9 standard?
There's
1. CBR
2. Bit rate VBR (Peak)
3. Quality VBR
4. Bit rate VBR
Seems like Quality VBR would be most like the other competitors, but there would have to be a number of albums encoded to find the appropriate quality setting. WMEncoder offers Q10, Q25, Q50, Q75, Q90, and Q98.
If it turns out that none of these settings yields a stable average bitrate close to 128 over a number of albums, then probably next best would be Bit rate VBR, which I imagine is a 2-pass scheme.
ff123
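Whichever Quality VBR setting is tried, checking that it lands near 128 kbps over a batch of albums is plain arithmetic on file sizes and durations. A small hypothetical helper (the track numbers below are made up):

```python
def average_bitrate_kbps(files):
    """files: list of (size_in_bytes, duration_in_seconds) pairs.
    Returns the overall average bitrate in kbps (total bits / total time)."""
    total_bits = sum(size * 8 for size, _ in files)
    total_seconds = sum(dur for _, dur in files)
    return total_bits / total_seconds / 1000

# e.g. two encoded tracks: 4.8 MB over 300 s and 3.2 MB over 200 s
tracks = [(4_800_000, 300), (3_200_000, 200)]
print(round(average_bitrate_kbps(tracks), 1))  # 128.0
```

Run over a few dozen albums per Q setting, this is enough to pick whichever of Q10-Q98 lands closest to the 128 kbps target.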
Hi All,
I'm new here but I've been using compressed audio for a while. I've been reading around and trying out ABC/HR after finding a link to it here.
I'd like very much to join in this next listening test. I was wondering if it is too late to suggest a sample. I have in mind a piece of opera, a passage sung by the baritone in Carmina Burana. The recording I have has very wide dynamics and I have heard it trip up codecs many times.
If anyone is interested I can prepare a .wav sample
Thanks,
Luke
If anyone is interested I can prepare a .wav sample
Yes. Please upload a sample to the forum's upload section. All samples are welcome.
If anyone is interested I can prepare a .wav sample
Remember to make it less than or equal to 30 seconds in length, and if you could encode it in FLAC before uploading, it would cut down on the HA server's bandwidth. Here (http://www.rarewares.org/files/lossless/flac-1.1.0.zip) is the encoder and here (http://www.rarewares.org/files/lossless/flacdrop.zip) is a frontend (if you are running win32 and like frontends).
Ok, I posted the file:
Upload section (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=35&t=21181)
I don't know how suitable you will think it is for this test; at 128kbps most problems are just nuances. It does not produce the glaring flaws that some of the test samples here do. It might be more suitable for a lower-bitrate test.
Also I have found this may be tricky to audition on some headphones.
My Sennheiser HD495’s are not that great and can't handle the dynamic range, so I have to listen to the quiet parts and loud parts separately. I find it easier to detect problems with this track using my speakers.
Luke
p.s. Thanks for the links music_man!
No prob.
That's a really interesting sample. I think it should be added to the test just because it is so different from all the others. Very refreshing, thanks Luke_A_P.
Hello,
I posted a new test sample for the 128 kbit Multiformat test
Test Sample (The Hooters) Irish Folk Rock (http://www.hydrogenaudio.org/forums/index.php?showtopic=21201&view=findpost&p=207452)
Regards
naturfreak
Someone for whom iTunes 4.5 AAC with extra high-frequency information may have proved detrimental:
http://www.hydrogenaudio.org/forums/index.php?showtopic=21148&st=25&
ff123
Well, I got the sample list more or less defined
These are the samples I plan to feature. If you disagree, please reply FAST, telling which sample you don't like and what you would replace it with.
Bartok_strings2 - String quartet
BigYellow - Easy listening
chanchan - Latino (replaces Quizás)
DaFunk - Electronic (House)
Debussy.wav - Classical vocal
getiton - Soul (replaces bodyheat)
gone - Extreme Metal
Hongroise - Piano solo
kraftwerk - Electronic (evil sample)
Leahy - Celtic
Mahler - Full orchestral
NewYorkCity - Pop/Rock
OrdinaryWorld - New Wave/Art Rock (This one stays! )
rosemary - Pop/Folk
SinceAlways - whatever style dev0 likes (replaces mybloodrusts)
trust - Gospel choir (fully featured with organ!)
velvet - Electronic (evil sample)
Waiting - Rock (evil sample)
Is something missing? Complaints? Comments? Questions? Please reply by Monday night. Tuesday morning I'll start encoding.
Best regards;
Roberto.
If anyone wants to check the samples:
http://www.rarewares.org/samples/ (http://www.rarewares.org/samples/)
Maybe I've missed something but I can't see any live recording (-> applause)...
trust
Edit: Heck, dude, you submitted this sample yourself
http://www.hydrogenaudio.org/forums/index.php?showtopic=20301&view=findpost&p=201606
LOL - Very good choice
Note to self: Drink coffee: first... Wait a few minutes: second... Start reading HA: third
leaving out the point that there seem to be 4 classical-style samples: rjamorim, will you use the latest Apple codec or the version before that (as ff123 reported some problems with the latest version)?
rjamorim, will you use the latest Apple codec or the version before that (as ff123 reported some problems with the latest version)?
I think it's worthless to test a version that is not available anymore.
Menno
hm, it's still available (only the English QuickTime was updated till now)
also, if it's really a big issue, I am sure Apple will fix it
You can actually find all QuickTime versions back to 4.0 if you dig deep on Apple's support site.
I'm still trying to decide if I'll go with 4.2 or 4.5
hm maybe some silent reader from apple wants to tell rjamorim which encoder he recommends to be used to fight wma9 in this comparison
He told me to consult the HA golden ears. So, no, no help is coming from Apple
ok, so the question is whether you will risk using a potentially "buggy" AAC encoder in a listening test which will maybe be the last multiformat 128 one you conduct, and which everyone will point at in the future
or if you choose to go the safe way with a highly tested, stable encoder
i vote for the latter
I vote for the latter also, but for another reason:
As JohnV made me see, there are two approaches to the codec choice:
-Use what is popular and easily available. That means iTunes 4.5 and vanilla Vorbis 1.0.1
-Use what is best at what it does, even if it's obscure or hard to find. That means iTunes 4.2 and Vorbis aoTuV.
I personally prefer the second option. Opinions?
I think the first option is actually better, as a normal user will simply go to the program's main site and download the latest version of the program (thinking that every newer version is better).
If you plan to do the test for normal users, go with the newest, original builds. If you don't care about that, test using the encoders which produce the best quality.
One more thing... If we are talking about the normal user who always gets the latest version, you would have to test Xiph's Vorbis 1.0.1, iTunes 4.5, LAME 3.96, etc.
Definitely the second option!
I want to know which is the best codec and how good the best Vorbis encoder is.
Why should we test a worse encoder? I think you should use the best encoder for each format. If the new iTunes 4.5 is better than 4.2, then we should use it; if not, then we should use 4.2!
And why should a normal user who reads the listening test use the Xiph encoder and not aoTuV? The normal users who read this listening test also want the best quality, don't they?
Big_Berny
I'd prefer option 1.5: AoTuV and iTunes 4.5. The reason for option 1 makes sense for iTunes, but I think we all know ol' Vorbis 1.01 all too well, whereas AoTuV performance is exciting and would be very interesting to see compared to others.
Go for the first option. No one except HA users use "obscure and hard to find" compiles.
... and no one except HA users cares as much about Roberto's listening tests.
what's going on now? why should we test 1.0.1? we should go for the best encoder a format can offer... well, what else???
we are doing the test not for what newbies will use, but to find out which format is the best!
therefore aotuv was found to be the best vorbis encoder -> it should be used
who says that no one will use it if people find out how good it is?
and if the current apple encoder is "buggy" or "alpha", which apple might fix in the next release, why use it?
it makes no sense... everyone will take this listening test as a measure of how good the formats are in the future, nothing less, nothing more
we aren't testing usability of the codecs or so...
why sacrifice this "one-time" opportunity for being "newbie-compatible"???
I'd prefer option 1.5: AoTuV and iTunes 4.5.
Nope. That would be incoherent at best.
what's going on now? why should we test 1.0.1? we should go for the best encoder a format can offer... well, what else???
This is simply quality vs. popularity. Pick your poison.
And, please, don't confuse "popular" with "for dumb people". That's rude at best.
And, please, don't confuse "popular" with "for dumb people". That's rude at best.
well, ok, that's too harsh of course
my point simply was that we should not test the codecs according to how newbie-proof or popular the encoding tools are, but what quality the format brings
I'd prefer option 1.5: AoTuV and iTunes 4.5.
Nope. That would be incoherent at best.
No, it's not. It's just going for the "best thing available". I just tumbled around for several minutes in the Apple website and searched in Google and nowhere could I find a darn link to download version 4.2, they all pointed to the 4.5 version. On the other hand, a simple google search for "aoTuv download" gave me a page for the download page as the 2nd or 3rd hit. iTunes 4.2 might be better than 4.5, but if it isn't even available, what's the point? If Apple screwed up with AAC in that version, too bad. Maybe the test will help them realise their mistake and correct it.
EDIT: OK, I just found out that the version at snapfiles.com is still v4.2, so yes, it is still available with a simple search... but only as long as Snapfiles doesn't update their version, which I guess will be soon.
I was thinking you could pull one of those classical samples (come on, classical is easy to encode!) and replace it with some Jesus and Mary Chain or My Bloody Valentine as really noisy high-frequency music is a) not represented here at all, and b) represented very well in my collection
I can upload a sample when i get home. should be really good for identifying ringing and high frequency reproduction.
No, it's not. It's just going for the "best thing available". I just tumbled around for several minutes in the Apple website and searched in Google and nowhere could I find a darn link to download version 4.2, they all pointed to the 4.5 version. On the other hand, a simple google search for "aoTuv download" gave me a page for the download page as the 2nd or 3rd hit. iTunes 4.2 might be better than 4.5, but if it isn't even available, what's the point?
QuickTime 6.4 is still available, which outputs the same AAC bitstreams as iTunes 4.2
http://docs.info.apple.com/article.html?artnum=120297 (http://docs.info.apple.com/article.html?artnum=120297)
If Apple screwed up with AAC in that version, too bad. Maybe the test will help them realise their mistake and correct it.
Ehm... my tests are not meant to punish developers' slips, you know?
QuickTime 6.4 is still available, which outputs same AAC bitstreams as iTunes 4.2
http://docs.info.apple.com/article.html?artnum=120297 (http://docs.info.apple.com/article.html?artnum=120297)
If Apple screwed up with AAC in that version, too bad. Maybe the test will help them realise their mistake and correct it.
Ehm... my tests are not meant to punish developers' slips, you know?
Oh well, I guess you beat me.
Anyway, my primary concern is really to see aoTuV in the test, not what iTunes version is used.
Does someone know which version of iTunes the iTunes Music Store AACs are encoded with? Do they "upgrade" them whenever a new version of the codec is released? If not, that would be a good reason to use 4.2; or 4.5, if they do, which seems unlikely.
My opinion is to go for the best codecs available. That is iTunes 4.2 for AAC and aoTuV for Vorbis. For those who might say that newer is better, just add a note on the presentation page explaining that iTunes 4.2 is used instead of 4.5, linking to the HA thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=21148&hl=) that discusses why 4.2 is better, in the hope that Apple will soon fix this.
regards
echo
sample (http://www.hydrogenaudio.org/forums/index.php?showtopic=21213&view=findpost&p=208415)
Here's another sample for the upcoming test (if it's not too late). Listen for yourself.
We should use iTunes 4.2 in the listening test.
It doesn't matter exactly when the encoder was available. iTunes 4.2 was widely available until recently (thus it is still installed on many computers). Roberto wouldn't postpone the test if a new version of iTunes were announced for one month in the future, would he?
As it is, the new version 4.5 raises some doubts concerning its encoding performance (see earlier in this thread), so it doesn't make sense to use it instead.
If someone really needs to know how iTunes 4.5 compares with the other codecs in the test, the results for iTunes 4.2 can give that person a good indication. Except that the new version could be worse (until Apple provides some patch)
Well, since no one really complained about the sample selection, I'll freeze it. The samples will be precisely the ones available at the start of the previous page in this thread.
Now, this is the codec selection I am proposing:
-Lame 3.96 -V5 --athaa-sensitivity 1 (as per this (http://www.hydrogenaudio.org/forums/index.php?showtopic=21079&view=findpost&p=207659) thread)
-Musepack 1.14 --quality 4 --xlevel
-Vorbis aoTuV -q 4 (as per
this (http://www.hydrogenaudio.org/show.php/showtopic/20389) thread)
-Apple iTunes 4.2 AAC 128kbps
-Microsoft WMA9 Std bitrate VBR 128kbps
-Sony SonicStage 2 Atrac3 132kbps
Comments? Criticism? Opinions?
Please reply fast. I'll start encoding the samples and creating the packages in about 12 hours.
Regards;
Roberto.
There is something to be said for using whole numbers for the quality settings. I.e., quality setting 4 for both Vorbis and MPC.
However, from the bitrate table I made
http://www.hydrogenaudio.org/forums/index....ndpost&p=207203 (http://www.hydrogenaudio.org/forums/index.php?showtopic=21079&view=findpost&p=207203)
AoTuV beta 2 should be about 128 kbit/s (on the albums I used) at about 4.35 quality.
and mpc 1.14 beta should be about the same at 4.15
Otherwise, both Vorbis and MPC will be a bit low on average.
ff123
Edit: the numbers above are extrapolated from the data I took. I actually encoded Vorbis at 4 and 4.25, and MPC at 4 and 4.2.
Overall I like the selection, but I'm not sure if using LAME -V 5 is really a good idea. Is Gabriel still opposed to the idea of using VBR?
And please include links to the discussion/testing threads to explain why each encoder/setting was chosen. There will be a lot of confusion (and, of course, bitching) otherwise.
Looks like Musepack has a problem where, on systems where ',' is used as the decimal separator, a setting like --quality 4,2 is interpreted as --quality 4
Maybe it's better just to use 4 and be done with it.
ff123
I will use decimal with vorbis, if you agree that's best.
And please include links to the discussion/testing threads to explain why each encoder/setting was chosen. There will be a lot of confusion (and, of course, bitching) otherwise.
Of course. There will be lengthy explanations. And bitchers will be methodically flamed and insulted, if proven necessary.
Overall I like the selection, but I'm not sure if using LAME -V 5 is really a good idea. Is Gabriel still opposed to the idea of using VBR?
Well, my opinion is not the only one that should count.
I would prefer seeing Lame 3.96 in ABR mode, but also iTunes 4.5 and not 4.2.
That is because I'd prefer to have results representative of the usual usage of the codecs.
On the other hand, if people prefer to test the best from every format, then Lame in VBR and iTunes 4.2.
Looks like Musepack has the problem where in systems where ',' is used as the decimal separator, then a setting like --quality 4,2 is interpreted as --quality 4
Sorry, I can't see the problem. Mppenc uses "." as the separator on all systems, so Roberto's batch file will always be valid. The problem is only for oggenc, which uses a different separator.
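For the curious, the comma bug is just locale-blind number parsing: a C-locale strtod-style parser consumes digits and a dot, and stops at the first comma. A minimal Python sketch of that behaviour (parse_quality is a hypothetical stand-in for the encoder's argument parsing, not oggenc's or mppenc's actual code):

```python
def parse_quality(arg):
    # Mimic a C-locale strtod: consume digits and '.', stop at the
    # first character that is not part of a C-locale number.
    i = 0
    while i < len(arg) and (arg[i].isdigit() or arg[i] == "."):
        i += 1
    return float(arg[:i]) if i else 0.0

print(parse_quality("4.2"))  # 4.2
print(parse_quality("4,2"))  # 4.0 -- the ",2" is silently dropped
```

This is why --quality 4,2 can end up meaning --quality 4 on systems where ',' is the decimal separator, while an encoder that always expects '.' is unaffected.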
Question: could someone legally distribute iTunes 4.2?
If it's not possible, I don't think that testing something not available at all makes any sense.
What sense does it make to use a format at an inferior quality in a listening test meant to determine which format offers the greatest quality???
NONE! wtf are you guys smoking?
Who cares if Joe Schmoe more often uses something else? Not me; I couldn't care less. These tests are to determine quality, not a popularity contest.
If it were a popularity contest, then aoTuV's enhancements are out.
So is Musepack entirely.
And Lame at anything other than CBR 128/160/192.
...get the picture?
Hm, someone could also say that testing a potentially buggy encoder, which will maybe be fixed in the next iTunes release anyway, in a test which will maybe never be done again in this way, doesn't make much sense either.
As rjamorim said, this would come close to "punishing developers' slips".
And from what rjamorim said, old versions of iTunes are still available.
and from what rjamorim said, old versions of itunes still are available
Of QuickTime!!!!
iTunes 4.2 has been completely wiped from Apple's site. (makes sense, since it's also a DRM update)
and from what rjamorim said, old versions of itunes still are available
Of QuickTime!!!!
Well OK, it's the same encoder anyway.
Still, I think the test should be about the best quality the formats can deliver, without caring about the usability of the tools.
I mean, iTunes or QT shouldn't be seen as iTunes/QuickTime, but as the representative of the AAC format, and every format should be represented by the best-quality encoder out there.
For me, a comparison of different audio encoding formats means that only the quality counts and nothing else.
for me a comparison of different audio encoding format means that only the quality counts and nothing else
It means that a preliminary test must be done, comparing ~10 samples with iTunes 4.2 and iTunes 4.5.
People are very quick to draw strong conclusions...
for me a comparison of different audio encoding format means that only the quality counts and nothing else
It means that a preliminary test must be done, comparing ~10 samples with iTunes 4.2 and iTunes 4.5.
People are very fast for concluding strong conclusions...
Yeah, exactly. Obviously this can't always be done (the same goes for which encoding settings to use with each encoder, which bitrates, and so on...).
Some things simply can't be tested with a dedicated listening test because it would be too much work.
So basically we have to look at what we know already, and that is that Apple's old encoder was the best AAC encoder in the last AAC-only test, and we know ff123's reports on the latest Apple encoder.
For me that would be enough to say "why risk something if the old Apple encoder was already the best AAC codec anyway?"
so basically we have to look at what we know already and that is that apples old encoder was the best aac encoder in the last aac-only test and we know the reports from ff123 on the latest apple encoder.
Apparently, most people want the best solution. Couldn't we imagine that iTunes 4.5 is better than 4.2?
so basically we have to look at what we know already and that is that apples old encoder was the best aac encoder in the last aac-only test and we know the reports from ff123 on the latest apple encoder.
Apparently, most people want the best solution. Couldn't we imagine that iTunes 4.5 is better than 4.2?
Well, it's the same point you already made: we don't know without a dedicated listening test, so we can only speculate. Maybe it's better, maybe it's worse.
Still, it's the same question:
will using the new encoder do the AAC format more harm or more good?
What does using a specific version potentially bring, and what does it potentially cost...
For me, the old AAC encoder has already shown that it shines, and IMHO it will be a good and stable representative for the AAC format.
@ Roberto
What about the decimals in Vorbis and MPC? Will you use them? I think it would be better to get as close to 128kbps as possible, just to be fair.
regards
echo
What about the decimals in vorbis and mpc? Will you use them? I think it should be better to get as close to 128kbps as possible just to be fair.
Someone made a point about this before. A lot of these samples are pretty tricky to encode, so if a quality-based encoder like Vorbis at q=4 is doing its job, these samples should come out at bitrates greater than 128kbps (even though the whole track might average more like 128kbps).
This way you are testing the bitrate allocation system as well as the encoding performance. I think it is fair: if I encode a track at a certain quality level and find it is larger than I expected, I don't generally go back and change it; that is not the point of quality-based encoding.
Luke
Regarding the test, it seems to me that it should follow previous precedent for the sake of consistency.
In the previous multiformat test, the LAME version used was the proven 3.90, despite the fact that newer versions existed. And even then, there has been much debate and testing to "authenticate" 3.96 as the new standard.
So following that precedent, it would mean using iTunes 4.2.
Personally, I am interested in 4.5, since I updated and that is what I am using. But for the sake of consistency, I think it makes more sense to test 4.2 for now; the next step can then be to focus on 4.2 vs 4.5.
Podunk.
I've posted in the appropriate thread the result of another test where iTunes 4.2 is better than 4.5. More testing is necessary, but (until proven otherwise with more samples) the latest version seems to be inferior.
I think we have to go with the good 4.2 version for this test.
OK. I read the last two pages of this thread, and it seems more people are demanding iTunes 4.2 instead of 4.5.
So, I'll start encoding now, with iTunes 4.2.
I'm also considering distributing MPC and Vorbis pre-encoded. Otherwise, the test wouldn't be very multiplatform: *nix users would have to compile aoTuV themselves, and there is no MPPenc for BSD, Solaris, etc.
The downside is that the Reference, WMA and Atrac3 files would already have to be distributed in FLAC. That would probably bring package sizes to about 10MB :/
Ideas?
Test starts tomorrow (at last!!!)
Thank-you for all your invaluable feedback.
Regards;
Roberto.
Together with all the pre-tests and discussion, I'm pretty sure this will be one of the most interesting (and probably most controversial...) tests ever.
Good luck.
P.S.: Is anyone gonna submit a test announcement to Slashdot, Ars Technica or similar sites?
P.S.: Is anyone gonna submit a test announcement to Slashdot, Ars Technica or similar sites?
I'm going to submit a test announcement to a "famous" French forum, the way I submitted some of the previous tests.
The downside is that Reference, WMA and Atrac3 would already have to be distributed in FLAC. That would probably bring package sizes to about 10Mb :/
Ideas?
Hm, I don't know how to set this up myself, but maybe spreading the packages via BT could help?
hm i dunno how to set this up myself, but maybe the spreading of the packages via BT could help?
This sounds like a good idea to me. But I don't know how many people have BitTorrent.
I just want to find a legal use for P2P
This sounds like a good idea to me. But I dont know how many people have Bit Torrent.
BitTorrent is easy enough to set up, but is the problem here hosting the files, or just how much people will have to download? BitTorrent doesn't make the download any smaller.
If hosting is a problem, I may be able to host some of the samples.
I can't host many, because I have a monthly cap on webspace bandwidth.
Luke
Together with all the pre-tests and discussion, I'm pretty sure this will be one of the most interesting (and probably most controversial...) tests ever.
Oh, for sure. Criticism will pour in.
Good luck.
Thank-you
hm i dunno how to set this up myself, but maybe the spreading of the packages via BT could help?
Ah, thanks for reminding me. I wanted to ask this.
I would like to set up a BT tracker on RareWares, to help spread the packages through means other than HTTP. Could someone guide me on how to do that, considering the server runs Linux (RedHat) and I have no root access?
If hosting is a problem I may be able to host some of the samples.
I can't host many because I have a monthly cap on webspace bandwidth.
Well, RareWares has a 150GB cap. RW itself uses about 50GB/month. That leaves us with 100GB, which should be enough. But just to be safe, I would like to set up BT. Even if it helps save only 10% of the bandwidth, it would already be of much help.
There seems to be a free tracker at this link.
http://dehacked.2y.net:6969/ (http://dehacked.2y.net:6969/)
Edit: fixing linkage
OK, nice to see everything is OK for you, Rjamorim... I'll participate in this test thanks to Nyarla, who submitted the announcement on the "world well-known board"...
Regards
There seems to be a free tracker at this link.
http://dehacked.2y.net:6969/ (http://dehacked.2y.net:6969/)
Edit: fixing linkage
Very interesting. But still, I gotta admit all I know about BitTorrent is downloading .torrent files and throwing them at TorrentStorm. I don't know how to create .torrent files, or how to set up and configure the PHP tracker...
Can someone point me to links to guides or something?
SonicStage is stupid and retarded, and refuses to encode my samples.
http://pessoal.onda.com.br/rjamorim/screen1.png (http://pessoal.onda.com.br/rjamorim/screen1.png)
(these are standard WAVs! It plays back those tracks without any problem, but refuses to convert to Atrac3)
If I don't find a way to fix this issue by later tonight, Atrac3 is out.
SonicStage is stupid and retarded, and refuses to encode my samples.
http://pessoal.onda.com.br/rjamorim/screen1.png (http://pessoal.onda.com.br/rjamorim/screen1.png)
(these are standard WAVs! It plays back those tracks without any problem, but refuses to convert to Atrac3)
If I don't find a way to fix this issue until later tonight, Atrac3 is out.
OK. I'll try to provide samples. (encode to ATRAC > Burn into AudioCD > Rip to wav)
Edit: Somehow the exact same error happens here. I probably need to reinstall the program or something.
screenshot (http://cyberquebec.ca/harashin/sonicstage.png)
SonicStage is stupid and retarded, and refuses to encode my samples.
http://pessoal.onda.com.br/rjamorim/screen1.png (http://pessoal.onda.com.br/rjamorim/screen1.png)
(these are standard WAVs! It plays back those tracks without any problem, but refuses to convert to Atrac3)
If I don't find a way to fix this issue until later tonight, Atrac3 is out.
Have you tried opening them with some program (Adobe Audition, for instance) and re-saving them (without modifications, of course)? "Converting" those WAVs to WAV with foobar2k could be another way of "re-saving" them that might work.
Have you tried opening them with some program (Adobe Audition, for instance) and re-saving them (without modifications, of course)? "Converting" those WAVs to WAV with foobar2k could be another way of "re-saving" them that might work.
Yes. Doesn't work.
It must be a bug in SonicStage. It can open the file without problems to play back, so why is it bitching about not being able to open it to encode?
The problem is only for oggenc that uses a different separator.
Both Aoyumi and I have compiled an oggenc that uses only a . (dot) as the separator.
OK. I'll try to provide samples. (encode to ATRAC > Burn into AudioCD > Rip to wav)
Edit: Somehow exact same error happens here. I probably need to reinstall the program or something.
screenshot (http://cyberquebec.ca/harashin/sonicstage.png)
Thank-you. I uninstalled, restarted Windows, installed, restarted Windows, and now it works.
So, no worries about providing the samples anymore.
Instead of burning to CD, I captured the playback audio with Total Recorder. Hopefully that won't introduce artifacts or other weirdness.
The ill-famed bitrate deviation table. Bitrates were obtained with foobar, except for the Atrac3 bitrates, which were obtained with ff123's Python bitrate calculator.
http://pessoal.onda.com.br/rjamorim/Bitrates.txt (http://pessoal.onda.com.br/rjamorim/Bitrates.txt)
iTunes, WMA Std and Atrac3 are very well behaved. Lame and Vorbis deviate a little too much, but they are still less than 10kbps from the target bitrate. MPC continues making my life miserable.
Very interesting. But still, I gotta admin all I know about BitTorrent is downloading .torrent files and throwing them at TorrentStorm. I don't know how to create .torrent files, how to set up and configure the php tracker...
Can someone point me to links to guides or something?
So? Is anybody willing to help?
I just calculated, and the entire sample set will end up at 165MB (!)
So, yes, I will definitely need some sort of P2P.
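A quick back-of-the-envelope check of the bandwidth headroom mentioned earlier, using only the figures quoted in this thread (illustrative arithmetic, not a real traffic estimate):

```python
# Figures quoted in the thread: 150GB monthly cap, ~50GB used by
# RareWares itself, and a 165MB total sample set.
spare_gb = 150 - 50
package_mb = 165
full_downloads = spare_gb * 1024 // package_mb
print(full_downloads)  # 620 complete downloads per month over plain HTTP
```

So plain HTTP alone could serve roughly 600 full sets per month before hitting the cap; BT seeding would stretch that further.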
For basic information about creating torrents and setting up a tracker read Brian Dessent's excellent BT-FAQ (http://dessent.net/btfaq/).
CompleteDir (http://prdownloads.sourceforge.net/bittorrent/completedir-1.1.exe?download) is the 'official' tool for creating *.torrent files, but many clients and tools like TorrentSpy have that capability too nowadays. I'd still use CompleteDir to be safe, since it's the reference and really easy to use.
I can offer you the use of my personal tracker, but it's hosted on a 128kbps-upstream ADSL connection and I won't guarantee 100% uptime.
I've also prepared a py2exe-ified version of bttrack.py, which you (or someone else) could use to run a tracker under Win32:
http://dev0.rc55.com/bttrack-win32/bttrack...cvs20040405.zip (http://dev0.rc55.com/bttrack-win32/bttrack-win32-3.4.2-cvs20040405.zip)
There is a PHP tracker implementation, which could work on a server without root access, but I've never used it.
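For anyone wondering what a .torrent file actually contains: it is just a bencoded dictionary with an announce URL and an "info" section describing the payload. A minimal sketch follows; the file name and data are made up, and the "/announce" path on the tracker URL is an assumption. Use CompleteDir for real torrents.

```python
import hashlib

def bencode(obj):
    # Minimal bencoder covering the types a .torrent file uses.
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):  # dictionary keys must be sorted
        return b"d" + b"".join(bencode(k) + bencode(obj[k])
                               for k in sorted(obj)) + b"e"
    raise TypeError("cannot bencode %r" % type(obj))

data = b"\x00" * 1000  # stand-in for a real sample package
meta = {
    "announce": "http://dehacked.2y.net:6969/announce",
    "info": {
        "name": "test_package.zip",  # hypothetical file name
        "length": len(data),
        "piece length": 262144,
        "pieces": hashlib.sha1(data).digest(),  # one piece suffices here
    },
}
torrent = bencode(meta)
print(torrent[:11])  # b'd8:announce'
```

Writing `torrent` to a file with a .torrent extension would give a metainfo file any client can open, provided a tracker is actually listening at the announce URL.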
Thank-you very much, dev0
Would just like to inform a last-minute change:
In an attempt to bring the bitrates down a little, ff123 found a sample that I used to replace velvet.
The sample is called ItCouldBeSweet
http://www.rarewares.org/samples/ItCouldBeSweet.flac (http://www.rarewares.org/samples/ItCouldBeSweet.flac)
And the new bitrate table looks like this:
http://pessoal.onda.com.br/rjamorim/Bitrates.txt (http://pessoal.onda.com.br/rjamorim/Bitrates.txt)
Now the average bitrates deviate at most +8 from the target bitrate. Much better than the +13 we had before.
Regards;
Roberto.
Here are links to the samples in the test. I've placed some of them on my site.
Bartok_strings2 (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=200859)
BigYellow (http://ff123.net/samples/BigYellow.flac)
chanchan (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=197226)
DaFunk (http://ff123.net/samples/DaFunk.flac)
Debussy (http://ff123.net/samples/Debussy.flac)
getiton (http://www.hydrogenaudio.org/forums/index.php?showtopic=20504&view=findpost&p=200896)
gone (http://ff123.net/samples/gone.flac)
Hongroise (http://ff123.net/samples/Hongroise.flac)
ItCouldBeSweet (http://ff123.net/samples/ItCouldBeSweet.flac)
kraftwerk (http://ff123.net/samples/kraftwerk.flac)
Leahy (http://www.hydrogenaudio.org/forums/index.php?showtopic=20606&view=findpost&p=201816)
Mahler (http://ff123.net/samples/Mahler.flac)
NewYorkCity (http://www.phong.org/audio/NewYorkCity.flac)
OrdinaryWorld (http://ff123.net/samples/OrdinaryWorld.flac)
rosemary (http://www.hydrogenaudio.org/forums/index.php?showtopic=19882&view=findpost&p=196998)
SinceAlways (http://ff123.net/samples/SinceAlways.flac)
trust (http://www.hydrogenaudio.org/forums/index.php?showtopic=20504&view=findpost&p=201604)
Waiting (http://ff123.net/samples/Waiting.flac)
ff123
Edit: Thanks. I transferred Debussy and SinceAlways to my site, as well as kraftwerk, which I converted to flac.
For some reason, I forgot to include Bartok_strings2. Thanks, all.
Kraftwerk : http://www.hydrogenaudio.org/forums/index....ST&f=35&t=21497 (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=35&t=21497)
Debussy.flac
SinceAlways.flac
Can't hold them there for long, though. For some reason I couldn't upload to HA (I have to investigate what's wrong at my end).
rgds,
halcyon
Bartok_strings2 (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=200859)
Checked the bitrates for the individual samples using the
python bitrate calculator (http://www.hydrogenaudio.org/forums/index.php?showtopic=10816&view=findpost&p=109091)
. The averages at the bottom were calculated by entering the numbers into Excel and averaging them (instead of using the bitrate calculator). This weights each sample equally, instead of giving more weight to longer samples and less to shorter ones.
MPC Vorbis Lame
Bartok_strings2 153 147 146
BigYellow 148 144 136
chanchan 148 146 144
DaFunk 135 124 143
Debussy 98 119 109
getiton 128 126 129
gone 132 133 132
Hongroise 104 127 117
ItCouldBeSweet 92 110 94
kraftwerk 152 133 141
Leahy 155 148 134
Mahler 146 133 145
NewYorkCity 144 136 132
OrdinaryWorld 153 145 143
rosemary 135 135 130
SinceAlways 146 123 153
trust 144 152 146
Waiting 153 143 148
----------------------------------------------
Averages 137 135 135
I think the above average bitrates are still too high for a "128" test, lending ammunition to critics who would say that "of course the highest bitrate codecs would rate better."
Here are the averages for each sample (averaging over mp3, mpc, and vorbis), from lowest to highest:
averages
ItCouldBeSweet 98.7
Debussy 108.7
Hongroise 116.0
getiton 127.7
gone 132.3
rosemary 133.3
DaFunk 134.0
NewYorkCity 137.3
SinceAlways 140.7
Mahler 141.3
kraftwerk 142.0
BigYellow 142.7
Leahy 145.7
chanchan 146.0
OrdinaryWorld 147.0
trust 147.3
Waiting 148.0
Bartok_strings2 148.7
If samples are to be replaced, it would be more efficient to replace from the bottom (highest average bitrate first).
ff123
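The distinction ff123 draws between the two averaging methods can be sketched in a few lines of Python (the durations and file sizes below are invented, purely to show the mechanics):

```python
# (name, duration in seconds, encoded size in bytes) -- made-up numbers
samples = [
    ("short_clip", 12.0, 180_000),
    ("long_clip",  30.0, 600_000),
]

def kbps(size_bytes, seconds):
    return size_bytes * 8 / seconds / 1000

# Each sample counts the same, regardless of length (the Excel method).
equal_weight = sum(kbps(size, dur) for _, dur, size in samples) / len(samples)

# Longer samples pull the average harder (the bitrate-calculator method).
total_size = sum(size for _, _, size in samples)
total_dur = sum(dur for _, dur, _ in samples)
duration_weighted = kbps(total_size, total_dur)

print(round(equal_weight, 1))       # 140.0
print(round(duration_weighted, 1))  # 148.6
```

With these made-up clips the two methods differ by almost 9kbps, which is why the table's bottom-line averages depend on which one is used.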
If samples are to be replaced, it would be more efficient to replace from the bottom (highest average bitrate first.
If Bartok_strings2 should be replaced with another sample for that reason, I can submit Shostakovich (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=210090) instead. (I prefer Bartok.)
If samples are to be replaced, it would be more efficient to replace from the bottom (highest average bitrate first.
If Bartok_strings2 should be replaced with other sample for that reason, I can submit Shostakovich (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=210090) instead. (I prefer Bartok.)
Shostakovich bitrates:
mp3: 146
mpc: 139
vorbis: 131
The replacement samples, if any, should average below 128, preferably by 10 kbit/s or more.
ff123
The replacement samples, if any, should average below 128, preferably by 10 kbit/s or more.
OK, I'll search for a quieter one. Not sure if it will also be hard to encode, though.
Edit:Check Webern (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=210104).
Here is a hip-hop sample, from Mystikal's "Let's Get Ready" album. The track is an untitled extra on the CD, but if I had to give it a title, it would be "Are You Lookin' for Me?"
looking4me (http://ff123.net/samples/looking4me.flac)
bitrates
mp3: 114
mpc: 105
vorbis: 120
ff123
The replacement samples, if any, should average below 128, preferably by 10 kbit/s or more.
OK, I'll search for more quiet one. Not sure if it also will be a hard-to-encode though.
Edit:Check Webern (http://www.hydrogenaudio.org/forums/index.php?showtopic=20498&view=findpost&p=210104).
That should probably help
I calculate:
mp3: 120
mpc: 128
vorbis: 129
ff123
Here's another rather low bitrate sample:
ExitMusic (http://ff123.net/samples/ExitMusic.flac)
"Exit Music (For A Film)" by Radiohead from the album "OK Computer"
male vocal with guitar
mp3: 103
mpc: 111
vorbis: 120
ff123
Sorry, but I'm not really planning to change samples for this test anymore. Three reasons:
- The first try at this test was a big mess. I'll try to fix all my mistakes this time. So, I won't change any of the test settings, to avoid new errors and mistakes creeping in.
- Samples with lower bitrates are usually easier-to-encode samples. I don't like the idea of replacing problem cases with easy cases: easy samples would be too difficult for non-golden-ears to test, and it would probably make the test less meaningful overall.
- I don't care about critics anymore. Let them drool and whine all they want. If someone wants to believe my tests, he's welcome to do so. If he doesn't, he should just ignore it. And if he doesn't ignore it, I'll be offensive and mean.
Thank-you for searching for samples, anyway.
Regards;
Roberto.
I agree with Roberto. Finding samples with lower bitrates ultimately means finding easier-to-encode samples.
IIRC, the settings used are supposed to bring the bitrate of "normal" music to around 128kbps, not that of problem samples. So there's really no point in changing the samples anyway. IMHO, more difficult-to-encode samples should be used, since they are the ones that will be given lower scores and make the statistical results more significant. Easy-to-encode samples would probably get too many 5.0 scores.