HydrogenAudio

Hydrogenaudio Forum => Validated News => Topic started by: rjamorim on 2004-03-01 05:49:49

Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 05:49:49
The AAC at 128kbps listening test is finished at last.

The summary is: iTunes and Nero are tied for first place, although iTunes sits noticeably above Nero in the final ranking. Compaact!, Faac and Real are tied for second place.

The results:
http://www.rjamorim.com/test/aac128v2/results.html

For those in a hurry:
(http://www.rjamorim.com/test/aac128v2/plot12z.png)

Thank you very much to everyone who participated.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 05:50:47
BTW, the obvious outcome of this is: iTunes will be featured in the 128kbps multiformat test, which should start next month.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: dev0 on 2004-03-01 05:52:58
And just as I expected, FAAC did a much better job than most people thought it would. Props to Knik for his awesome work.
And, of course, respect and thanks to rjamorim for his amazing efforts.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: norky on 2004-03-01 06:04:00
thanks to everyone who worked on this.  from what i've read a lot of hard work and listening was done.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Dologan on 2004-03-01 06:04:43
What did the sample numbers equate to? (e.g. sample_3 = Real, sample_2 = iTunes... etc.)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 06:11:13
1 = Nero
2 = Real
3 = Faac
4 = Compaact!
5 = iTunes

Tomorrow I'll upload the decryption key to the site.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-01 06:11:19
First of all, thanks for all your hard work on conducting this test, Roberto. And thanks to everyone who participated.

I tried 4 times to do the test, but the medication I've been taking has hopelessly affected my hearing sensitivity, and I wasn't able to differentiate anything. 

About the results, I'm no AAC expert, and I can only go on what I've heard here over the past several months.  Based on that, I'm not surprised at the outcome of QT or Nero, but FAAC surprised me a little, and Real surprised me quite a bit to be ranked as highly as it was.  I guess I was only able to remember the reputation RA had over the past couple of years, so I wasn't expecting Real's AAC encoder to do so well.  (Although, finding out since that it's Coding Technologies who created it, it's not so surprising after all.)


Edit:

@Roberto:  Can you post the actual average bitrates for each codec, for reference?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 06:26:09
Quote
Edit:

@Roberto:  Can you post the actual average bitrates for each codec, for reference?

Yes. Tomorrow.

I'll get some sleep now
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rpop on 2004-03-01 06:29:11
IMHO, iTunes won. Though not rated higher than Nero with 95% confidence, the difference does seem significant, with iTunes clearly getting first place on 4 of the samples, slightly edging out Nero on 6 others, and being slightly beaten by Nero on only 2 samples. These results also imply that iTunes is more efficient, seeing as it stayed at a constant 128 kbps, while Nero went above 140 kbps on some occasions. All in all, I think iTunes should be the recommended AAC encoder if one hopes to obtain the highest quality on average.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 06:34:42
That's true. I tried to clarify the issue in the results page, in this part:

Quote
iTunes is more or less tied to Nero. The security margin (LSD/2 overlapping with ranking) is very small, which would likely indicate iTunes is the winner.


Thank you (and thanks to JohnV) for your comments on this subject.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: QuantumKnot on 2004-03-01 06:41:13
Cool. iTunes placed first and it's free too. Great combination
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-01 06:47:41
Quote
IMHO, iTunes won. Though not rated higher than Nero with 95% confidence, the difference does seem significant, with iTunes clearly getting first place on 4 of the samples, slightly edging out Nero on 6 others, and being slightly beaten by Nero on only 2 samples. These results also imply that iTunes is more efficient, seeing as it stayed at a constant 128 kbps, while Nero went above 140 kbps on some occasions. All in all, I think iTunes should be the recommended AAC encoder if one hopes to obtain the highest quality on average.

I agree with this.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-01 06:53:25
Roberto, didn't you get my results?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: harashin on 2004-03-01 07:04:21
Quote
Roberto, didn't you get my results?

I found your name in a result.
http://www.rjamorim.com/test/aac128v2/comments/results12/anon03-sample12_decrypted.txt
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 07:04:26
Quote
Roberto, didn't you get my results?

I guess I did. Why?

Edit: Oh, harashin found them

Now that the participant's name can be entered in the ABC/HR results, I guess I will change the file naming method. Instead of anon, I'll just add "participantXX" before each result file. (Except for those who send their result files with their login in the filename.)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-01 07:13:46
Quote
IMHO, iTunes won. Though not rated higher than Nero with 95% confidence, the difference does seem significant

It's still not statistically significant. Strictly by the book there's no winner. Roberto was unsure about this because not so many people took the test, the LSD/2 overlap being 0.03 points according to him.
But I said that it still can't be called statistically significant (something either is or isn't significant), only that iTunes is a likely winner because the LSD/2 overlap is pretty small.
Still, strictly speaking, statistically there is no winner.
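
The overlap rule described above can be sketched roughly as follows. This is only an illustration, assuming each codec's error bar extends LSD/2 above and below its mean score; the function names and the numbers are hypothetical, picked just to reproduce a 0.03-point overlap, and are not the test's actual values.

```python
def lsd_bar_overlap(mean_a: float, mean_b: float, lsd: float) -> float:
    """Return how far the +/- LSD/2 error bars of two mean scores overlap.

    A positive value means the bars overlap (no significant difference at the
    confidence level the LSD was computed for); zero or negative means the
    bars are separated.
    """
    return lsd - abs(mean_a - mean_b)


def is_significant(mean_a: float, mean_b: float, lsd: float) -> bool:
    """Two means differ significantly only if their gap exceeds the LSD."""
    return abs(mean_a - mean_b) > lsd


# Hypothetical numbers, chosen only to give a 0.03-point overlap.
first, second, lsd = 4.20, 4.07, 0.16
overlap = lsd_bar_overlap(first, second, lsd)
print(f"overlap: {overlap:.2f}, significant: {is_significant(first, second, lsd)}")
```

However small the overlap gets, the check stays a yes/no answer, which is the point being made: a small overlap suggests a likely winner but does not make the difference significant.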

One has to remember that with a different sample set the results could have been different. 12 samples is pretty close to the practical limit for this kind of group test, but 12 samples is in no way "enough", considering that many deficiencies of all codecs are not revealed here. This is some kind of average indication, but I think a different set of samples could have produced more or less difference between the contenders.

Another thing which will likely draw attention is that, with these samples, Nero had higher bitrates than most other codecs, although its average over lots of samples, according to spoon's test, was about 131kbps. One has to remember that VBR is quality-based encoding, so under the VBR principle what you see is the average quality a VBR codec delivers at its overall average bitrate (131kbps in Nero's case), here estimated from only 12 samples. Compaact! is an exception in certain cases: it scales the bitrate up very heavily with short blocks, which means that it gets a good ranking with velvet. But this behaviour can't be taken as representative of general VBR behaviour or of the principle of VBR coding (the constant quality principle).
Of course, constant quality doesn't always happen in practice (even if you exclude Compaact!'s short block scaling), because VBR is hard to control, so things are quite complicated. It's by no means clear that if QT/iTunes gets a VBR mode it will be better than its CBR. Good VBR quality in all circumstances is hard to achieve and control at mid-low bitrates, and that's why my opinion is that ABR would probably be the best coding method for a mid-low bitrate like 128kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-01 07:25:54
Anyone want to post the comments archive zipped?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-01 07:29:17
Quote
Quote
Roberto, didn't you get my results?

I found your name in a result.
http://www.rjamorim.com/test/aac128v2/comments/results12/anon03-sample12_decrypted.txt

Thank you. I looked for my result for sample 5 (Hongroise) and couldn't find it. All other results are there (with anon03) but this one is missing! 

(Maybe I ranked the reference 5 times... )
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 07:35:38
Quote
Anyone want to post the comments archive zipped?

Well, you can get UnRAR and RAR for MacOS X, and I think StuffIt supports it too.

I use Rar because, in solid mode, it gets up to 3 times smaller than zip.

@Continuum: I'm afraid I messed up the anon ordering a little. Sorry :/
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-01 07:50:49
Quote
Quote
Anyone want to post the comments archive zipped?

Well, you can get UnRAR and RAR for MacOS X, and I think StuffIt supports it too.

Okay, I tried stuffit and gumby, they both report errors with the format

:-\

UnRAR cli app works
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Dibrom on 2004-03-01 07:58:19
Quote
Quote
Quote
Anyone want to post the comments archive zipped?

Well, you can get UnRAR and RAR for MacOS X, and I think StuffIt supports it too.

Okay, I tried stuffit and gumby, they both report errors with the format

:-\

UnRAR cli app works

UnRarX (http://unrarx.sourceforge.net/) should probably work too, and is a little nicer than using the commandline tool.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 08:38:49
Some comments:

1) The iTunes result is nice, but I think our progress is even nicer! In the previous test iTunes clearly beat Nero. Now it's getting quite arguable whether it's really better. The next major Nero release will be very interesting.

2) Somebody said that iTunes is more efficient because of bitrate - that's simply not true. The average bitrates of both codecs with the settings used are within +-3 kbps.

3) About interpreting the results. Is the graph correct? It is stated that the overlap is really small. But on the graph it is very large.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-01 09:43:29
Well, thank you to rjamorim for running the test

Was very interesting, I'm looking forward to the multi-format test now
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 10:27:25
Quote
2) Somebody said that iTunes is more efficient because of bitrate - that's simply not true. The average bitrates of both codecs with the settings used are within +-3 kbps.

Depends on the genre.
I've encoded ~1500 files with the latest Nero build (and a VBR profile especially optimized for this test), classical only, and the bitrate is close to 140 kbps. And the quality has serious issues. I really hope that the problems will be solved quickly. But in my opinion, the current encoder still has problems with efficiency AND with quality.


Anyway, faac surprises me. I really wonder if -q115 is the best preset for this encoder. I remember that in my last test I used ABR, which is safer. Is the bitrate table ready? It wouldn't surprise me if faac obtained good results on samples with bitrate > 140, and bad ones with bitrate < 120.

Good job by all the AAC developers. Many people discovered that 128 kbps is close to being transparent, if not totally transparent. And thanks to Roberto for this nice test.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 10:53:08
Yes, it is true that some codecs are more fit to a genre than others. This does not change my point at all - on a sample of various genres, the bitrate was not abnormally high, and apparently the quality is very good, too, so the efficiency argument is false.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-01 10:56:21
One more thing - please do not extrapolate results from CBR to VBR  (saying: if the codec was used in CBR/VBR mode it would be better/worse, etc..)

MP3 listening test clearly shows that:

http://www.rjamorim.com/test/mp3-128/results.html

One codec vendor's VBR codec was inferior to CBR implementation (slightly tuned) - which means that extrapolations do not work - CBR and VBR bit allocations could be quite different.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 11:05:48
Quote
so the efficiency argument is false.

If efficiency is a relation between quality and size, then Nero AAC seems to be less efficient (you said it yourself, the ranking is arguable).


I wonder: if there's enough space in the next test, why not include both iTunes and Nero AAC? After all, AAC is in a unique situation, with 6 or 7 (at least) vendors currently working on AAC encoders (this should be discussed in the next pre-test thread, but people should start thinking about the idea).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bawjaws on 2004-03-01 11:16:46
Am I reading this graph correctly?

iTunes and Nero are statistically tied because the extended arm of Nero rises higher than the middle point of iTunes, am I right?

If so then you cannot say that Nero is better than Faac by the same criterion, though you can say that iTunes is better than Faac as they only overlap at the very edges (and definitely better than the ones it doesn't overlap with at all).

I'd say that makes iTunes a winner as it is statistically better than all but one of the other codecs, whereas Nero statistically beats all but two (and 'looks' very close to the rest).

Does that make sense?

edit: answering my own question it seems that any overlap at all means less than 95% confidence so I assume some sophisticated statistical method beyond my ken was used to decide who was tied for which position.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 11:23:49
Quote
iTunes and Nero are statistically tied because the extended arm of Nero rises higher than the middle point of iTunes, am I right?

Not sure - generally there should not be an overlap between the error bars themselves either, but the addition of errors is not linear. It would probably be better to figure this out from the raw data than from the graph, especially as I cannot align Roberto's text with what the graph shows.

To answer your question, from eyeballing the result it's more likely that Nero is better than Compaact! and Real than that iTunes is better than Nero, though it's also quite likely iTunes is simply the best encoder, right now. You just can't conclude it with a very high degree of certainty.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-01 11:26:56
Quote
@Continuum: I'm afraid I messed up the anon ordering a little. Sorry :/

No, the one result is actually missing. No problem, it just bugs me that I have to wait for the decryption key now.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 11:29:57
I don't know if it's possible, but I would be interested in a specific graph with low scorers only (as ff123 (?) did for the multiformat test last year). There are a lot of results with little discrimination between encoders, and while they are significant regarding the transparency of modern AAC encoders, they don't help to see the existing differences among them.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-01 12:02:38
Quote
Anyway, it's safe to say that all codecs represented here are pretty mature and, no matter what your choice among them is, it's very likely you'll get very good results for your encodings.

Personally, i strongly disagree with this statement
I'd like to see some organized results from listeners who can discriminate codecs better; maybe these results can't give us statistical certainty, but they could be useful for developers/tuning.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: PoisonDan on 2004-03-01 12:30:37
Quote
BTW, the obvious outcome of this is: iTunes will be featured at the 128kbps multiformat test, that shall start next month.

Hmmm... I already look forward to seeing how QuantumKnot's tuned Ogg Vorbis encoder (http://www.hydrogenaudio.org/forums/index.php?showtopic=17949&view=findpost&p=187669) compares to iTunes at 128kbps. 

Of course, it's not certain that this particular Ogg Vorbis encoder will be featured, but it seems a logical choice to me at this moment.

Thank you for the test, Roberto.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-01 12:40:02
VERY interesting results:

1) apple knows what's good! it's still in a position where someone can say it's the best
also considering that its still cbr and was now tested against vbr nero

2) real knows why they offer 192kbps aac in their music store (sorry karl  )
for a big company wanting to compete against itunes it surely needs to offer a higher bitrate to keep up qualitywise (but well against all the wma stores around 128kbps real would also have been enough i assume  )

thinking about the "possible" results for the multiformat test i think someone can say that there is no way around itunes when it comes to buying online music with a great quality/size ratio

3) faac seems to be now on par with the other codecs and in fact also looking at my personal results its great!!!
but
when i did the listening test there was on nearly every sample one codec, which was easy to sort out
i always thought that this was faac (sorry about that) but now looking at my results it's Compaact!!!
i wonder how this codec got such a good ranking here? it was really worse and easy to sort out imho (sorry alex)
didn't anyone else have the same feelings/results as me?


and not to forget:
THANKS A LOT ROBERTO for this great test!!! 
you know how much value these tests bring to us, the customers, we can't say thank you for this often enough!
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 12:52:31
Quote
when i did the listening test there was on nearly every sample one codec, which was easy to sort out i always thought that this was faac (sorry about that) but now looking at my results it's Compaact!!! i wonder how this codec got such a good ranking here? it was really worse and easy to sort out imho didn't anyone else have the same feelings/results as me?

What do you mean? Are you surprised by faac's good quality or compaact!'s good performance?

In my opinion, faac's new lowpass value (16 kHz, versus 15 kHz last year) is friendlier for ABX tests. In this test, I've often ranked Real badly due to its lowpass. I'm not especially annoyed by lowpass in daily listening (portable use, and poor earbud performance), but in direct comparison, lowpass has a significant impact on quality: it gives a metallic coloration, sometimes the stereo feeling is affected (dull sound), etc... lowpass is sometimes difficult to interpret even if it's easy to detect.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-01 12:59:33
Quote
What do you mean? Are you surprised by faac's good quality or compaact!'s good performance?

no i am surprised that overall compaact was voted that high cause only from my results it would have got a score of 3.4 (which means it's clearly worse than the others for me)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 13:09:13
Quote
Quote
What do you mean? Are you surprised by faac's good quality or compaact!'s good performance?

no i am surprised that overall compaact was voted that high cause only from my results it would have got a score of 3.4 (which means it's clearly worse than the others)

OK
I don't have an overall view of my results (score table). I tried to form an approximate opinion by reading some of my results. Apparently, compaact! is very inconsistent. Absolutely fabulous with hongroise.wav (i.e. the piano sample), compaact! is on some samples the worst encoder, suffering from heavy trouble in the background. Faac seems to share the same position - and Nero AAC too, but the problem is less pronounced.

I second [proxima] when he disagrees with Roberto's conclusion. In my opinion, most AAC encoders are not mature enough. What is maturity for AAC? I guess that an AAC encoder should be called 'mature' when its performance is superior to MP3 on most samples. And I'm not sure that's the case for faac, real AAC, compaact! (and Nero AAC too, but I suppose that this encoder is more mature than the three other challengers).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: schnofler on 2004-03-01 13:22:50
The message of this test I like most is probably this: Lack of recent development is not a good indicator for poor quality (Take that, you Vorbis and MPC moaners! )

But seriously now, I'm a bit disappointed with AAC. With some samples (Sample 1, most noticeably) the codecs still all sounded worse to me than Lame at 128kbps. And 128kbps mp3 is not even supposed to be any competition for AAC at that bitrate. AAC is supposed to deliver equivalent quality at a lower bitrate, after all. And that advantage is just not huge at the moment. Maybe it can compete with mp3 at 160kbps, but it's no match for Lame APS. It's not such a big surprise then that modern formats still haven't really managed to replace mp3. A 25% advantage is just not enough to justify a switch in standards. I think the new codecs have to deliver the same quality as mp3 at half the bitrate before they will become a real danger to mp3's popularity.

I also think that Compaact! behaves strangely. In my results it won on the DaFunk sample and on Velvet, but got some terrible ratings on other samples, due to an awfully annoying metallic ringing artifact.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: eagleray on 2004-03-01 13:42:31
Going by the overlap rule it can not be said that either Nero or Quicktime is better than FAAC.  Hmmmm... I wonder what would happen if someone did some tuning and came up with a preset for FAAC that used a slightly higher lowpass than 16k traded off against a slightly lower q setting than 115 at the same file size.

I wonder what we will see when Roberto reveals the bitrates for the samples and the results are "adjusted" by this fudge factor.  The debate goes on.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: seanyseansean on 2004-03-01 13:43:46
It's on slashdot now. Batten down the hatches.

Title: AAC at 128kbps v2 listening test - FINISHED
Post by: cheerow on 2004-03-01 13:52:20
Quote
I think the new codecs have to deliver the same quality as mp3 at half the bitrate before they will become a real danger to mp3's popularity.

According to Microsoft wma sounds better at a fraction of any bitrate. 

Besides: compared to the development that went into mp3 (Lame), AAC is still in its infancy...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-01 14:00:12
Would it be possible to get another visualization of the results, too? Instead of the confidence intervals, show the variance (or something similar) of each result.

It would be interesting to see where the opinions differ most.
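
A minimal sketch of the kind of summary being asked for here, assuming the per-listener ratings for one sample are available as plain lists. The codec names and scores are hypothetical placeholders, not data from the test, and population variance is just one possible choice of "variance (or something similar)".

```python
from statistics import mean, pvariance

# Hypothetical ratings: codec name -> scores given by each listener on one sample.
ratings = {
    "codec A": [4.0, 4.2, 3.8, 4.1],
    "codec B": [2.0, 5.0, 3.0, 4.5],
}


def spread_by_codec(ratings):
    """Return (codec, mean, variance) tuples, most contested codec first."""
    rows = [(name, mean(s), pvariance(s)) for name, s in ratings.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, avg, var in spread_by_codec(ratings):
    print(f"{name}: mean {avg:.2f}, variance {var:.2f}")
```

Sorting by variance puts the codecs on which listeners disagree most at the top, which is where the opinions differ most.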
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Jojo on 2004-03-01 14:18:53
just a quick question. Why wasn't QuickTime included in this test? In your last AAC 128kbps test, it was the winner. So it would have been interesting to see, how well they would perform against each other.

Anyway, awesome test! Thank you!
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 14:20:50
Quote
just a quick question. Why wasn't QuickTime included in this test? In your last AAC 128kbps test, it was the winner. So it would have been interesting to see, how well they would perform against each other.

Anyway, awesome test! Thank you!

iTunes is more friendly for end-user and daily use. You can't rip a CD with QT. You can't (natively) batch encode with this software.
iTunes allows CD ripping/encoding, and batch encoding from .wav files. iTunes uses the QT encoder in "BETTER" profile (whereas QT allows you a "BEST" profile).
Last but not least, iTunes is free
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: cheerow on 2004-03-01 14:26:09
Quote
Quote
just a quick question. Why wasn't QuickTime included in this test? In your last AAC 128kbps test, it was the winner. So it would have been interesting to see, how well they would perform against each other.

Anyway, awesome test! Thank you!

iTunes is more friendly for end-user and daily use. You can't rip a CD with QT. You can't (natively) batch encode with this software.
iTunes allows CD ripping/encoding, and batch encoding from .wav files.

Don't these two use the absolute same encoding engine?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 14:27:49
I've just edited my previous post.
The encoder is the same for both programs. Just a difference in profile (but with limited if not inaudible impact on quality for 44.1/16 PCM files).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Jojo on 2004-03-01 14:28:34
Quote
Quote
Quote
just a quick question. Why wasn't QuickTime included in this test? In your last AAC 128kbps test, it was the winner. So it would have been interesting to see, how well they would perform against each other.

Anyway, awesome test! Thank you!

iTunes is more friendly for end-user and daily use. You can't rip a CD with QT. You can't (natively) batch encode with this software.
iTunes allows CD ripping/encoding, and batch encoding from .wav files.

Don't these two use the absolute same encoding engine?

that is what I just thought too  - QuickTime is made by Apple...so the one that is used in iTunes should be the newest release
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: mdmuir on 2004-03-01 14:52:03
Since I could not differentiate anything in this test (congrats to those with better hearing acuity than me!) I reached the personal conclusion not to worry about it anymore. I use Nero AAC encoder at the transparent VBR profile, which for my ears should be plenty of  "overkill"
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: seanyseansean on 2004-03-01 14:53:59
Roberto, I just saw this comment on the Slashdot thread:

Quote
As a physicist, I'd just like to draw everyone's attention to the error bars on these charts. For the majority of the tests, it's possible to draw a horizontal line through the 95% confidence intervals of nearly all the points.

Hence, the conclusions declaring clear winners/losers in these cases are invalid. If 99% confidence intervals were used (which gives a better statistical test), I feel that no clear winners or losers would be drawn.

Be careful with these sort of studies - even though the author has used confidence intervals, he has failed to use them to infer the proper conclusions.

That said, it's awfully nice to see error bars on this sort of website. Simple data points give such a false sense of precision, I find...


Any comments? I wouldn't have a clue 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-01 15:36:21
What is the link to the slashdot thread?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-01 15:55:07
In any case, since I can't find the slashdot thread, the response to the criticism of choosing 95% confidence over 99% confidence is that I don't know what he's so uptight about.  We're not talking about introducing a new drug here.  This is a listening test, for God's sake!  95% confidence is a standard accepted threshold, and it was properly set prior to the test.  As an engineer I think this physicist should focus less on theory and more on practicality.
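
To put the 95% versus 99% question in concrete terms, here is a rough sketch of how much wider the error bars get. It assumes a plain normal-approximation confidence interval for a single mean, which is a simplification and not the ANOVA/LSD analysis used for the test; the scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_half_width(scores, confidence):
    """Half-width of a normal-approximation confidence interval for the mean."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(scores) / sqrt(len(scores))


# Hypothetical ratings for one codec on the 1.0-5.0 ABC/HR scale.
scores = [4.5, 3.8, 4.0, 4.7, 3.5, 4.2, 4.9, 3.9]
half_95 = confidence_half_width(scores, 0.95)
half_99 = confidence_half_width(scores, 0.99)
print(f"mean {mean(scores):.2f}, 95%: +/-{half_95:.2f}, 99%: +/-{half_99:.2f}")
```

Under this approximation the 99% bars are about 1.3 times as wide as the 95% bars for the same data, so a stricter threshold mostly means more ties rather than different rankings.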

I wish that slashdotters could discern the real weaknesses in the test, which were already mentioned here:

1. not enough samples to be definitive
2. different set of samples could produce different results
3. highly discriminating listeners could spread the results more

ff123
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: seanyseansean on 2004-03-01 15:57:46
Quote
What is the link to the slashdot thread?

Sorry it took so long, the link is here (http://apple.slashdot.org/article.pl?sid=04/03/01/1234244&mode=flat)

sean

p.s. I post on there as 'doofusclam'...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 16:39:40
Quote
But seriously now, I'm a bit disappointed of AAC. With some samples (Sample 1, most noticeably) the codecs still all sounded worse to me than Lame at 128kbps. And 128kbps mp3 is not even supposed to be any competition for AAC at that bitrate. AAC is supposed to deliver equivalent quality at a lower bitrate, after all. And that advantage is just not huge at the moment. Maybe it can compete with mp3 at 160kbps, but it's no match for Lame APS. It's not such a big surprise then that modern formats still haven't really managed to replace mp3. A 25% advantage is just not enough to justify a switch in standards. I think the new codecs have to deliver the same quality as mp3 at half the bitrate before they will become a real danger to mp3's popularity.

I'm sorry, but I think the facts don't agree with you.

Evidence 1: Results of previous 128k extension test.

Evidence 2: Results of previous 64k test.

Conclusion: Modern AAC codecs are much superior to MP3 at the same bitrate and _approach_ similar efficiency at half the bitrate.

I think the chance is very good that HE-AAC + Parametric Stereo equals 128k MP3 at 64 kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 16:45:36
Let me try to reply to everything:

-Including Nero and iTunes in the multiformat test, probably getting rid of the anchor in the process since I won't accept more than 6 codecs:
Don't even get me started.

-About Slashdot:
 

-About AAC being mature enough:
I strongly believe all these encoders performed very well, considering the bitrate they were tested at. All of them are above or close to the ITU transparency cutoff, which surely wasn't the case in the MP3 test. Actually, if we compare graphs (I know that's not a good idea), the MP3 winner gets tied to the AAC loser.

Did I miss anything?

I will post the bitrate table later today.

Regards;

Roberto.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 17:05:29
Quote
Actually, if we compare graphs (I know that's not a good idea), the MP3 winner gets tied to the AAC loser.

In my opinion, AAC is now competitive with MP3, and the best AAC encoders are surely superior to LAME at 128 kbps in most cases. The current problem doesn't lie in overall performance, but in occasional artifacts. Faac, Real and Compaact! still have serious issues. Too many for my taste... I also realized this when I tested many AAC encoders and compared them to mp3 lame (OK, it was only one single musical genre, but there's a great diversity of instruments within): mp3 superiority over faac or Nero AAC was not something exceptional.

The next multiformat test will surely give a better idea on mp3/aac difference.

I don't say that mp3 is a reference, and that AAC isn't great progress. But I'm more disappointed when I hear an AAC problem than an mp3 artifact, just because AAC seems to have great potential and current implementations are surely far from it. I suppose that in two or three years, quality will be amazing
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Dologan on 2004-03-01 17:09:40
Quote
3. highly discriminating listeners could spread the results more

Hmm... how would the results look if people scoring 5.0 (or more than one 5.0) on a particular sample were not counted? The number of listeners would definitely not be enough to draw statistically valid results, but I would be curious to know if it would look qualitatively different.
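
This what-if can be sketched as a simple filter over per-listener scores. It is only an illustration: the function name and the data layout (listener name -> codec -> score, for one sample) are hypothetical and are not the format of the actual results files.

```python
from statistics import mean

def mean_scores_excluding_perfect(scores_by_listener, perfect=5.0):
    """Average each codec's score for one sample, skipping any listener
    who gave at least one score of `perfect` (5.0) on that sample.

    scores_by_listener: dict of listener name -> dict of codec -> score
    (hypothetical layout). Returns dict of codec -> mean score over the
    remaining listeners, or an empty dict if nobody remains.
    """
    kept = [
        scores for scores in scores_by_listener.values()
        if all(score < perfect for score in scores.values())
    ]
    if not kept:
        return {}
    codecs = kept[0].keys()
    return {codec: mean(scores[codec] for scores in kept) for codec in codecs}
```

With so few listeners left after filtering, the means would only hint at a qualitative trend, as noted above.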
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: karl_lillevold on 2004-03-01 17:19:29
Quote
2) real knows why they offer 192kbps aac in their music store (sorry karl  )
for a big company wanting to compete against itunes it surely needs to offer a higher bitrate to keep up qualitywise (but well against all the wma stores around 128kbps real would also have been enough i assume  )

I find it very interesting to view my own scores for "Waiting" (sample 12), considering I do not have very experienced or expert ears, or have listened much to any of these codecs before. I did however borrow a good quality DAC and headphones.

Quote
1 = Nero
2 = Real
3 = Faac
4 = Compaact!
5 = iTunes


1: 3.0 (Nero)
2: 4.0 (Real)
3: 2.5 (Faac)
4: 3.5 (Compaact!)
5: 4.2 (iTunes)

I apologize if I misunderstood, misread, or mixed up something, but based on these numbers, it seems I personally did not like Faac or Nero for "Waiting", while iTunes did the best. I do remember Faac and Nero stood out as clearly the worst for this clip to my ears.
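
To double-check that reading, here is a minimal sketch that pairs the key quoted above with the five scores listed in this post and sorts them (the variable names are just for illustration):

```python
# Key posted by rjamorim: sample number -> encoder.
key = {1: "Nero", 2: "Real", 3: "Faac", 4: "Compaact!", 5: "iTunes"}

# Scores from this post for "Waiting" (sample 12), by sample number.
scores = {1: 3.0, 2: 4.0, 3: 2.5, 4: 3.5, 5: 4.2}

# Rank encoders from highest to lowest score.
ranking = sorted(
    ((key[n], s) for n, s in scores.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranking:
    print(f"{name}: {score}")
```

This lists iTunes first and Faac last, with Nero second to last, which matches the description above.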

Preferences vary, and if anything, even though 128 kbps for me is close to transparency for most clips, at 192 kbps I know it would sound transparent, and if I had to choose to purchase at 192 kbps or 128 kbps, the overall results tell me that I would be confident in whichever codec is chosen, but rather select based on bitrate.

(I tried other clips than Waiting as well, but with the time I had, I had trouble hearing differences)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: schnofler on 2004-03-01 17:25:37
Quote
Quote
But seriously now, I'm a bit disappointed of AAC. With some samples (Sample 1, most noticeably) the codecs still all sounded worse to me than Lame at 128kbps. And 128kbps mp3 is not even supposed to be any competition for AAC at that bitrate. AAC is supposed to deliver equivalent quality at a lower bitrate, after all. And that advantage is just not huge at the moment. Maybe it can compete with mp3 at 160kbps, but it's no match for Lame APS. It's not such a big surprise then that modern formats still haven't really managed to replace mp3. A 25% advantage is just not enough to justify a switch in standards. I think the new codecs have to deliver the same quality as mp3 at half the bitrate before they will become a real danger to mp3's popularity.

I'm sorry, but I think the facts don't agree with you.

Evidence 1: Results of previous 128k extension test.


Yes, these results show that AAC is clearly superior to mp3 at 128kbps. I don't doubt that. My point is, that's not enough. Unfortunately, we don't have any public listening test results comparing AAC at 128kbps to mp3 at a higher bitrate (similar to the 64kbps test). However, I stand by my claim that AAC's advantage in this higher bitrate range is not huge. The codecs were easily discernible on many of the samples and had big problems on some (for me these were samples 1, 3, 8 and 11, as you can see from my results (anon02, for some reason), but I realize that I don't agree with the majority vote there). For me at least, this is not the case with Lame APS, which is usually transparent to my ears and doesn't quite reach twice the bitrate of this test. From this I drew the conclusion that in this bitrate range AAC still has quite a way to go until it can deliver the same quality at half the bitrate.

Quote
Evidence 2: Results of previous 64k test.

Conclusion: Modern AAC codecs are much superior to MP3 at the same bitrate and _approach_ similar efficiency at half the bitrate.

I think the chance is very good that HE-AAC + Parametric Stereo equals 128k MP3 at 64 kbps.

There, you have a point. I'm almost sure that for my hearing HE-AAC + Parametric Stereo will indeed surpass 128kbps mp3 (in my personal results from the 64kbps test, even HE-AAC without PS tied to Lame at 128kbps). And I agree with you that HE-AAC's advantage over mp3 is indeed stunning, even more so if you decrease the bitrate even further (I was shocked when I first tried HE-AAC at around 30kbps).
However, practically (as opposed to technically) HE-AAC is a completely different format than LC AAC, which is why I regard discussions about AAC (as if it was only a single format) as flawed.

Thus, my conclusion is: HE-AAC, in its native bitrate range, is much superior to mp3 at the same bitrate and very close to similar efficiency at half the bitrate.
LC AAC is clearly superior to mp3 at the same bitrate but far from approaching mp3's efficiency at half the bitrate.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 17:31:27
Quote
(anon02, for some reason)

In the next test, it'll be "listenerXX". It's still from the days I added all login names by hand to the filenames when the user asked to be associated with his results.

Quote
LC AAC is clearly superior to mp3 at the same bitrate but far from approaching mp3's efficiency at half the bitrate.


Well, according to MPEG's goals, LC AAC is not meant to sound the same as MP3 at half the bitrates, but at 30% smaller bitrates.  And I reckon the best AAC implementations deliver this quality.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 17:40:26
Quote
I think the chance is very good that HE-AAC + Parametric Stereo equals 128k MP3 at 64 kbps.

Will Parametric Stereo be useful at 64 kbps?
I sometimes read that the tool is efficient at very low bitrate only (24-48 kbps).


http://www.hydrogenaudio.org/forums/index....ndpost&p=178765 (http://www.hydrogenaudio.org/forums/index.php?showtopic=18154&view=findpost&p=178765)
http://www.hydrogenaudio.org/forums/index....ndpost&p=178100 (http://www.hydrogenaudio.org/forums/index.php?showtopic=18098&view=findpost&p=178100)
http://www.audiocoding.com/phorum/read.php...4453#reply_4495 (http://www.audiocoding.com/phorum/read.php?f=1&i=4495&t=4453#reply_4495)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Floydian Slip on 2004-03-01 17:47:06
Quote
Well, according to MPEG's goals, LC AAC is not meant to sound the same as MP3 at half the bitrates, but at 30% smaller bitrates.  And I reckon the best AAC implementations deliver this quality.

How did you come to this conclusion? If that was true then best LC-AAC at 128kbps quality should be close to LAME --alt-preset standard -Y quality (~166kbps?). It will be really interesting to see a valid listening test whether that industry hype holds water.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: schnofler on 2004-03-01 17:54:34
Quote
In the next test, it'll be "listenerXX". It's still from the days I added all login names by hand to the filenames when the user asked to be associated with his results.

From what you wrote in the readme file, I just got the impression you would name the results files according to what was entered in the listener text field. Anyway, doesn't matter really.

Thanks again for your work. It's interesting and enlightening as usual.

Quote
Well, according to MPEG's goals, LC AAC is not meant to sound the same as MP3 at half the bitrates, but at 30% smaller bitrates. And I reckon the best AAC implementations deliver this quality.

Aah, I see, so the flaw is in the concept, not the execution 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 17:54:53
Quote
Will Parametric Stereo be useful at 64 kbps?
I sometimes read that the tool is efficient at very low bitrate only (24-48 kbps).


http://www.hydrogenaudio.org/forums/index....ndpost&p=178765 (http://www.hydrogenaudio.org/forums/index.php?showtopic=18154&view=findpost&p=178765)
http://www.hydrogenaudio.org/forums/index....ndpost&p=178100 (http://www.hydrogenaudio.org/forums/index.php?showtopic=18098&view=findpost&p=178100)
http://www.audiocoding.com/phorum/read.php...4453#reply_4495 (http://www.audiocoding.com/phorum/read.php?f=1&i=4495&t=4453#reply_4495)

It's probably at the upper edge of the usefulness.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 17:56:10
Quote
How did you come to this conclusion? If that was true then best LC-AAC at 128kbps quality should be close to LAME --alt-preset standard -Y quality (~166kbps?). It will be really interesting to see a valid listening test whether that industry hype holds water.

OMG! It's the guy that hates my tests because they bring down HA's quality standards.

To start with, I said "I reckon". Don't come being smart assed on me, I didn't conclude anything as you are saying. Second, I evaluated that by looking at the superior score iTunes gets compared to Lame and other MP3 encoders.

If you want a "conclusion" feel free to conduct a test and while at it, raise HA's quality standards that I shamefully brought down.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 17:58:17
Quote
From what you wrote in the readme file, I just got the impression you would name the results files according to what was entered in the listener text field. Anyway, doesn't matter really.

It would be doable, but would also be a pain :B

Maybe you can add a routine to the decryption module that searches for listener name and adds it to the decrypted filename?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 18:00:18
Quote
How did you come to this conclusion? If that was true then best LC-AAC at 128kbps quality should be close to LAME --alt-preset standard -Y quality (~166kbps?). It will be really interesting to see a valid listening test whether that industry hype holds water.

128kbps LC-AAC being comparable with 160kbps MP3?

Certainly! I don't think that's 'hype' at all.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: 21_already on 2004-03-01 18:04:47
"However, practically (as opposed to technically) HE-AAC is a completely different format than LC AAC, which is why I regard discussions about AAC (as if it was only a single format) as flawed."  ~schnofler

I personally disagree, since MPEG chose SBR as a method to improve quality knowing full well that it would be somewhat backward compatible, i.e. an LC AAC decoder can play back an HE-AAC file, albeit without the SBR, making it an approximately interoperable solution. Try it, it's fun: listen to an HE-AAC track on QuickTime and then listen to it with COREAAC.ax. I keep showing it off to my friends (depressing when they can't tell the difference).

I don't know if the same thing goes for PS
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Floydian Slip on 2004-03-01 18:09:02
Quote
OMG! It's the guy that hates my tests because they bring down HA's quality standards.

Don't get personal here. I am just wondering whether the industry statement "LC-AAC can achieve similar quality level at 30% less than MP3's bitrate" is a hype.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Garf on 2004-03-01 18:11:44
Quote
I don't know if the same thing goes for PS

It'll play back mono. So it's backwards compatible.

But I think the difference between supported or not will be more easily heard than with HE-AAC, of course.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Floydian Slip on 2004-03-01 18:15:53
Quote
Quote
How did you come to this conclusion? If that was true then best LC-AAC at 128kbps quality should be close to LAME --alt-preset standard -Y quality (~166kbps?). It will be really interesting to see a valid listening test whether that industry hype holds water.

128kbps LC-AAC being comparable with 160kbps MP3?

Certainly! I don't think that's 'hype' at all.

Ummm, I am not that sure. Actually to be fair, to get around 30% less bitrate you have to compare 180kbps LAME with 128kbps LC-AAC. That's why I mentioned LAME --alt-preset standard -Y which will be a valid bitrange to compare.
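
The arithmetic behind these figures, as a quick sketch. It reads "30% smaller bitrate" as AAC bitrate = 0.7 x MP3 bitrate, which is an assumption about how the claim is meant; the function names are only for illustration.

```python
def equivalent_mp3_bitrate(aac_kbps, saving=0.30):
    """MP3 bitrate an AAC stream would have to match if AAC needs
    `saving` (as a fraction) less bitrate for the same quality."""
    return aac_kbps / (1.0 - saving)

def implied_saving(aac_kbps, mp3_kbps):
    """Fractional bitrate saving implied by equating AAC at aac_kbps
    with MP3 at mp3_kbps."""
    return 1.0 - aac_kbps / mp3_kbps

print(round(equivalent_mp3_bitrate(128), 1))  # 182.9
print(round(implied_saving(128, 160), 2))     # 0.2
print(round(implied_saving(128, 180), 3))     # 0.289
```

Under that reading, 128kbps against 160kbps corresponds to a 20% saving, while the 30% figure needs an MP3 at roughly 180kbps.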
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 18:22:44
Quote
Ummm, I am not that sure. Actually to be fair, to get around 30% less bitrate you have to compare 180kbps LAME with 128kbps LC-AAC. That's why I mentioned LAME --alt-preset standard -Y which will be a valid bitrange to compare.

--preset standard -Y is one of the highest quality levels ever reached by the mp3 format. For a good comparison, wait three years: a good AAC encoder with a well-tuned VBR mode will probably be competitive, and surely better (think about smearing) than mp3.

If you want a good comparison, try to compare iTunes 128 and Fastenc at 180 kbps. It wouldn't surprise me to see AAC sounding better.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: schnofler on 2004-03-01 18:44:24
Quote
Quote

From what you wrote in the readme file, I just got the impression you would name the results files according to what was entered in the listener text field. Anyway, doesn't matter really.


It would be doable, but would also be a pain :B

Maybe you can add a routine to the decryption module that searches for listener name and adds it to the decrypted filename?

Hehe, damn, dug my own hole there, didn't I?
Anyway, I was thinking about changing the results file format altogether, XML possibly. It's a lot easier to parse and process, and for the 64kbps test phong made that nifty script which produced browsable html pages (here (http://www.hydrogenaudio.org/forums/index.php?showtopic=13464&view=findpost&p=136917)), which is much nicer than reading text files anyway, so maybe we can dispense with text files altogether.

Quote
I personally disagree, since MPEG chose SBR as a method to improve quality knowing full well that it would be somewhat backward compatible, i.e. an LC AAC decoder can play back an HE-AAC file, albeit without the SBR, making it an approximately interoperable solution. Try it, it's fun: listen to an HE-AAC track on QuickTime and then listen to it with COREAAC.ax. I keep showing it off to my friends (depressing when they can't tell the difference).

Well, if your friends' ears are that terrible, they probably won't hear a difference between mp3 and HE-AAC anyway. I mean, get serious: HE-AAC played back by an LC-AAC decoder is lowpassed at 9kHz (!). Even if the remaining bandwidth is reproduced perfectly, this still sounds just plain terrible. "Approximately interoperable" is extremely euphemistic in these circumstances.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Floydian Slip on 2004-03-01 18:46:39
Quote
If you want a good comparison, try to compare iTunes 128 and Fastenc at 180 kbps. It wouldn't surprise me to see AAC sounding better.

To get a meaningful conclusion you have to compare the best encoders of both formats. That's why I am saying to compare LAME --alt-preset standard -Y with whatever the best LC-AAC encoder at 128kbps is (currently looks like iTunes). IMHO, comparing iTunes with Fastenc won't be a good choice. But even with vbr mode, it would be interesting to see whether the current best LC-AAC at 128kbps vbr can stand up against LAME --alt-preset standard -Y.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: music_man_mpc on 2004-03-01 18:50:37
Quote
1) iTunes result is nice, but I think our progress is even nicer! In the previous test iTunes clearly beat Nero. Now it's getting quite arguable whether it's really better  Next major Nero release will be very intresting.

Wasn't CBR used for Nero in the last test?  I'm not trying to say that Nero hasn't  made any progress, I am sure that it has, but wouldn't the CBR - VBR issue be part of Nero's better score in this test?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-01 18:56:31
Quote
The current problem doesn't lie in overall performance, but in occasional artifacts. Faac, Real and Compaact! still have serious issues.

I agree with you.
Quote
The next multiformat test will surely give a better idea on mp3/aac difference.

As you already said, there is no doubt that @128 kbps the best AAC implementation is superior to the best MP3 implementation. The next multiformat test will confirm what we already know. So, i understand your desire to see Ahead AAC codec but i'm afraid this could not happen because of lack of space. Maybe we can exclude iTunes: i see no point to compare again two codecs (LAME and iTunes) that, quality speaking, remained almost the same after the last multiformat test.

I think we are speaking of AAC vs MP3 in a wrong manner here. There are mature and bad MP3 encoders and there are good and "not yet mature" AAC encoders. The maturity of an AAC encoder should be evaluated taking into account its potential. If all codecs tested were "mature" we could say for sure that the iTunes one is the "most mature", so (according to the last multiformat test) we could conclude that a mature AAC encoder is equal to Musepack's performance @128 kbps....and i've read many times from Frank Klemm himself that musepack PNS implementation is in "alpha stage" !!!. Can we conclude this ?? I simply think we have reached an absurdity

From what i've read, AAC is considered the state of art and i simply think that it has the potential to be much better than today.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 19:02:43
Quote
To get a meaningful conclusion you have to compare the best encoders of both formats. That's why I am saying to compare LAME --alt-preset standard -Y with whatever the best LC-AAC encoder at 128kbps is (currently looks like iTunes). IMHO, comparing iTunes with Fastenc won't be a good choice. But even with vbr mode, it would be interesting to see whether the current best LC-AAC at 128kbps vbr can stand up against LAME --alt-preset standard -Y.

What statement did the MPEG Committee (or something similar) make: that AAC is 30% more efficient than MP3, or that current AAC encoders are 30% more efficient than current MP3 encoders?
If they were only talking about format, then comparison between different encoders isn't a good way to verify their claims. A technical discussion (comparing tool and specification potential) is probably the only way to see if AAC is that efficient.

Now, if you want to compare lame at 180 kbps to iTunes or Nero at 128 kbps, do it. I'm also interested, and I can perform some tests. But if iTunes isn't as good, what will be the conclusion? That AAC isn't 30% more efficient, or just that at the beginning of 2004, AAC hasn't reached its full potential?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 19:02:46
Quote
and i've read many times from Frank Klemm himself that musepack PNS implementation is in "alpha stage" !!!

WTF does PNS have to do with this whole discussion?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 19:09:33
Quote
([proxima] @ Mar 1 2004, 07:56 PM)
i see no point to compare again two codecs (LAME and iTunes) that, quality speaking, remained almost the same after the last multiformat test.

Don't want to start a pre-test discussion but:
- lame 3.90 -> 3.95 (two years of development)
- QuickTime 6.3 -> 6.5

I can't say whether the progress is limited or impressive, but both encoders have progressed.
But two other encoders didn't progress since last test:
- vorbis (official: 1.01 was a bugfix for lowbitrate on some conditions)
- mpc
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 19:13:55
Quote
- vorbis (official: 1.01 was a bugfix for lowbitrate on some conditions)

I'll maybe use one of the unofficial tunings.

I will leave that decision to the vorbis enthusiasts though. It makes no difference for me if we go with official or unofficial.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-01 19:15:53
Quote
WTF does PNS have to do with this whole discussion?

PNS is exploited in low bitrate settings with Musepack, even at --quality 4 but it is in alpha stage. Maybe Musepack could perform better at --quality 4 with a final PNS implementation.
Nevertheless, please, let's not begin another AAC/MPC comparison or subband/transform discussion. My analogy was certainly not meant to do that, excuse me if i was not clear enough.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-01 19:18:06
Quote
Quote
1) iTunes result is nice, but I think our progress is even nicer! In the previous test iTunes clearly beat Nero. Now it's getting quite arguable whether it's really better  Next major Nero release will be very intresting.

Wasn't CBR used for Nero in the last test?

No.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 19:19:01
Quote
([proxima] @ Mar 1 2004, 04:15 PM)
PNS is exploited in low bitrate settings with Musepack, even at --quality 4 but it is in alpha stage. Maybe Musepack could perform better at --quality 4 with a final PNS implementation.

Ehm... to the best of my knowledge, PNS wasn't used in my test.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 19:19:50
Quote
Quote
Quote
1) iTunes result is nice, but I think our progress is even nicer! In the previous test iTunes clearly beat Nero. Now it's getting quite arguable whether it's really better  Next major Nero release will be very intresting.

Wasn't CBR used for Nero in the last test?

No.

http://www.rjamorim.com/test/aac128test/presentation.html (http://www.rjamorim.com/test/aac128test/presentation.html)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 19:20:49
Quote
Quote

Wasn't CBR used for Nero in the last test?

No.

Well, it might have been ABR, but to this day Nero claims "Constant Bitrate" when encoding at 128kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-01 19:47:55
Quote
Ehm... to the best of my knowledge, PNS wasn't used in my test.

It's included by default with radio profile.

Regarding the upcoming multiformat test, i think we should include something new (i like your idea of including an unofficial vorbis version) because some codecs are almost the same. The LAME and QuickTime changes should not be so radical, but obviously we can't tell how significant the "recent" upgrades were for quality. But i suppose this will be discussed later, in the appropriate thread.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 20:02:45
If people believe the codecs are almost the same and that doesn't justify a new listening test, I will not conduct it and go straight to the low bitrates test.

I won't replace a winning codec with a loser.

Multiformat test discussion starts next week.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: SometimesWarrior on 2004-03-01 20:54:06
rjamorim, have you made available the decryption key yet? I have a few results that I forgot to submit by the deadline  but I'd like to know how I rated the samples.

Thanks for running the test. Your work is always appreciated!
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bidz on 2004-03-01 20:59:56
I really appreciate this test. I'll still stick with iTunes/QuickTime then. Not that I had any doubt before the test anyway; the results were mostly just as I anticipated.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Raptus on 2004-03-01 21:06:47
Almost the same? Whoa!
I might not be speaking for everyone but IMHO the differences between codecs are sometimes huge! The new listening test should definitely be made.

BTW, I'm quite happy with how well I (sknop) did in the test. 

I appreciate your efforts, Roberto.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: AstralStorm on 2004-03-01 22:11:27
The conclusions from my tests (too many ranked references):

- Don't test codecs at 0:00. (local time  )
- Train some more on AAC: I don't have some of the codecs.  (esp. iTunes and Real, Compaact can be tricky also)
- Do more ABX tests.

I usually test Vorbis, MP3 and lower MusePack (up to 5)

I vote for adding ABX test tracking in ABC/HR.
But how to parse it? Only take the best test into consideration?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-01 22:58:06
Quote
Quote
Quote
Quote
1) iTunes result is nice, but I think our progress is even nicer! In the previous test iTunes clearly beat Nero. Now it's getting quite arguable whether it's really better  Next major Nero release will be very intresting.

Wasn't CBR used for Nero in the last test?

No.

http://www.rjamorim.com/test/aac128test/presentation.html (http://www.rjamorim.com/test/aac128test/presentation.html)

Damn. I remembered that wrong. Sorry about that.
Anyway, for Roberto: Nero had no ABR at that time. ABR coding is used by default in Recode2, but not in normal Nero 6's encoder.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: music_man_mpc on 2004-03-01 23:08:28
Quote
Damn. I remembered that wrong. Sorry about that.

It's OK.  I was pretty sure when I wrote the original post, but you had me second guessing myself for a minute.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Mac on 2004-03-01 23:13:10
I'd just like to apologise to Roberto for not taking part.  After managing to spot one codec/sample out of 5 tests I gave up.  I meant to try again over the weekend but found I couldn't use earbuds 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: mmortal03 on 2004-03-01 23:17:57
Partial quote from a slashdot post here http://apple.slashdot.org/comments.pl?sid=98809&cid=8430289 (http://apple.slashdot.org/comments.pl?sid=98809&cid=8430289) :

Quote
Fundamentally, even the most "discriminating" audiophiles cannot tell the difference between 16-bit, 44.1kHz PCM (Pulse Code Modulation - e.g. AIFF, WAV, in the computer world) and the 1-bit, 2.7GHz DSD bitstream of SACD... nevermind the minute differences betweeen various AAC-enabled codecs. Hell, I would challenge anyone to be able to tell the difference between 16-bit PCM and MPEG-4 AAC. The AES (Audio Engineering Society) has stated that MPEG-4 AAC is perceptibly indistinguishable from uncompressed 16-bit, dual channel PCM (e.g. CD-DA spec audio).and I would wager any experienced audio engineer's pair of ears (my own included) against any consumer "audiophile" any day of the week. My advice to the idle rich? Don't buy the $45,000 pair of speakers... instead buy yourself better hearing and some common sense. My personal preference? MPEG-4 AAC. As a content creator intensely familiar with a variety of media standards including AES, NTSC, ISO, ITU-R/CCIR, etc. I believe MPEG-4 w/AAC (not Quicktime MPEG-4, mind you, but straight MPEG-4) is the superlative format for compressed audiovisual media. However, for critical listening, only uncompressed audio is the way to go.


A little off topic, but this WAS a post about this test.  My question is, what is this "straight MPEG-4 AAC" that this guy is talking about?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-01 23:34:56
Quote
My question is, what is this "straight MPEG-4 AAC" that this guy is talking about?

Excuse my french, but this guy is "plein de merde".
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-01 23:40:25
 
What did you learn during your french séjour? Not poetry
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: kl33per on 2004-03-01 23:40:50
Yeh, I've just been reading the whole topic at Slashdot, and most of the people there don't have the slightest clue what they're talking about.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: mmortal03 on 2004-03-01 23:40:58
Quote
Quote
My question is, what is this "straight MPEG-4 AAC" that this guy is talking about?

Excuse my french, but this guy is "plein de merde".

I figured.  Just wanted to make sure.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: music_man_mpc on 2004-03-01 23:53:44
Quote
Quote
My question is, what is this "straight MPEG-4 AAC" that this guy is talking about?

Excuse my french, but this guy is "plein de merde".

Good one Roberto.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 00:48:56
Don't know if I could differentiate PCM 44100/16 bits from straight MPEG-4 encodings... but here is the overview of my ratings for non-mythological AAC encoders:

http://www.foobar2000.net/divers/128AAC2.html (http://www.foobar2000.net/divers/128AAC2.html)


My own results are more contrasted than the overall results.

- Faac and Compaact! are often the worst encoders (Compaact! five times, Faac three times)
- Real has, to my ears, different problems. It suffers more from lowpass than from other annoying artifacts. I'd like to hear Real encodings lowpassed to ~16000 Hz.

- Nero sounded very bad on two files (the two classical music ones...), but generally ranked well (BEST twice, SECOND six times).
- iTunes is not hegemonic, but is, to my ears, the best AAC encoder I've tested at this bitrate.


There wasn't any low anchor, and therefore ratings are sometimes very low. But even with a low anchor, some encodings sounded very bad to my ears (irritating to very irritating).

(P.S. Thanks Roberto for the files you've sent me)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ihop on 2004-03-02 01:47:58
Quote
Quote
As a content creator intensely familiar with a variety of media standards including AES, NTSC, ISO, ITU-R/CCIR, etc. I believe MPEG-4 w/AAC (not Quicktime MPEG-4, mind you, but straight MPEG-4) is the superlative format for compressed audiovisual media. However, for critical listening, only uncompressed audio is the way to go.



I'm pretty sure that since this slashdot poster wrote "MPEG-4 w/AAC" and "audiovisual media" that he is in fact referring to (and criticizing) Quicktime's implementation of MPEG-4 video not audio.

As for his comment with regards to uncompressed media, I'm pretty sure it's not "the only way to go" (regardless of how discerning he might be) given the existence of lossless audio compression.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-02 02:11:40
Quote
Quote
(guruboolez @ Mar 1 2004, 06:05 PM)

The current problem doesn't lie in overall performance, but in occasional artifacts. Faac, Real and Compaact! still have serious issues.

I agree with you.


Me too

Quote
([proxima] @ Mar 2 2004, 05:56 AM)
As you already said, there is no doubt that @128 kbps the best AAC implementation is superior to the best MP3 implementation. The next multiformat test will confirm what we already know. So, i understand your desire to see Ahead AAC codec but i'm afraid this could not happen because of lack of space. Maybe we can exclude iTunes: i see no point to compare again two codecs (LAME and iTunes) that, quality speaking, remained almost the same after the last multiformat test.

I don't agree.

The point of the multi-format test is to compare the best of MP3 with the best of AAC with the best of Vorbis, etc.

Well, that was my understanding.

Why put Ahead/Nero AAC into the test if it lost to iTunes/QT AAC in both previous tests?

I don't see Fhg being used instead of LAME, even though Fhg lost to LAME in previous tests... we've all seen LAME in so many different audio tests after all...

(Apologies to any Ahead/Nero employees/fans)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Bonzi on 2004-03-02 02:40:52
Quote
Quote
My question is, what is this "straight MPEG-4 AAC" that this guy is talking about?

Excuse my french, but this guy is "plein de merde".

Lol, nice one rjamorim!  You might even say, "il est aussi ulite qu'un frien à main sur un canoe."  Ah, the French are so much more creative with insults than us .
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bidz on 2004-03-02 02:59:36
The winner of a listening test is, of course, the right one to contend in the multiformat test.

Choosing one of the "losing" codecs to represent a format in a multiformat test sounds insane. That's like getting the 4th best (ranked) sprinter to run in the Olympics, instead of the best - cause we all know the 1st ranked sprinter would just win again, and again, and again..

A stupid comparison, maybe, but it still makes sense somehow (i hope)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: kwanbis on 2004-03-02 03:29:59
i want xing to participate
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: QuantumKnot on 2004-03-02 03:38:29
Quote
i want xing to participate

Yep, just to see if it can pull off another surprise. 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 03:54:54
Why not TAC? I bet my good friend KM would be very pleased.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Gabriel on 2004-03-02 08:16:47
Quote
"il est aussi utile qu'un frein à main sur un canoe."

Funny one. (this sounds more like Canadian French than "France French")
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-02 13:50:13
Quote
Why put Ahead/Nero AAC into the test if it lost to iTunes/QT AAC in both previous tests?

i also dont get that

i clearly want to know how the best aac implementation does compared to wma9, thats simply the upcoming audio codec "war" and aac should be represented as good as possible!
also this comparison can be used/shown for consumers to decide whether they should choose itunes or a wma music store qualitywise

to make it short: qt should be used in any case!
(i also cant imagine a reason which speaks against using qt)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: gotaserena on 2004-03-02 14:33:02
I would like to ask a few questions about the methodology (I'm a newbie when it comes to audio compression, so my apologies if this is something stupid.)

- As far as I can understand there are two sources of uncertainties in such a test:
1) The spread of grades given to a particular music sample because people rate them differently, background noise, etc.
2) The spread in each implementation's efficiency (i.e. bitrate distribution, modulation, etc.) for each music sample.

How are those two dealt with in the final number? Just treat 1) and 2) as statistical errors and "sum and divide by N (or sqrt{N})"?

- I don't want to sound mighty and lofty, but the one that raised the point that the error bars are overlapping too much to call codec A "the winner" is right. Since people around here seem to believe that codec A is really the winner in the listening test it makes me think that the error bars are overestimated (this is a common "mistake" when one treats uncertainties as uncorrelated. It has more to do with measurement theory than statistics). If codec A really sounds better than codec B in most of the cases then the error bars as they stand don't tell us the whole story.

Like I said before, I'm a complete n00b in audio compression (although I have a little experience in experimental physics), so forgive me if the point I raised is idiotic -- although I would like to know why

Ah, last but not least: thanks to the rarewares admins for the wonderful page and especial kudos to everybody that participated in the test.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-02 22:04:29
Quote
- As far as I can understand there are two sources of uncertainties in such a test:
1) The spread of grades given to a particular music sample because people rate them differently, background noise, etc.
2) The spread in each implementation's efficiency (i.e. bitrate distribution, modulation, etc.) for each music sample.

"1)" is a known variable that is dealt with by qualifying the statement of the results.  "Codec A had the highest average rating of all the samples tested."  To simply say that "Codec A is the best" would be wrong, as the claim isn't qualified.

"2)" is more quantifiable, and there is a demand for resolving this question, as I hear many people ask "Which codec produces the best sound quality at the lowest bitrate, or (synonymously) with the smallest filesizes?"

I've proposed a "composite rating" system in the past, which produced some debate, but was never "replaced" with a more effective system.  My idea was...

( [average rating] / [actual average bitrate] ) x [target nominal bitrate]

For instance, in this test, if iTunes rated an average of 4.20, and had an actual average bitrate of 128kbps, then its Composite Rating would be...

( 4.20 / 128 ) x 128 = 4.20

Nero had an average rating of 4.04, and (I believe) an actual average bitrate of 141kbps, so it could be said to have a Composite Rating of...

(4.04 / 141 ) x 128 = ~3.67

This is not a perfect "system", I'm sure, but it at least attempts to address the disparity in average bitrates (and in turn, resulting filesizes) of the encoded sample tracks.  Encoded filesize (resulting from bitrate) is a matter of great importance to many people who encode music with a limited amount of HD space.

But, of course, how relevant this system would be when used to consider filesizes of encoded music other than the sample tracks is an issue that relates back to "1)".
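A minimal sketch of the composite rating formula proposed above, using only the figures quoted in this post. The function name `composite_rating` and its parameter names are made up for illustration, and the linear scaling is the proposal as stated, not an established method.

```python
def composite_rating(avg_rating, actual_kbps, target_kbps=128.0):
    """Scale an average listening-test rating by target/actual bitrate.

    This is the linear adjustment proposed in the post:
    (average rating / actual average bitrate) x target nominal bitrate.
    """
    if actual_kbps <= 0:
        raise ValueError("actual_kbps must be positive")
    return avg_rating / actual_kbps * target_kbps


# Figures quoted in the post (the 141 kbps value is the poster's recollection).
itunes = composite_rating(4.20, 128)
nero = composite_rating(4.04, 141)

print(f"iTunes composite: {itunes:.2f}")
print(f"Nero composite:   {nero:.2f}")
```

An encoder that lands exactly on the target bitrate keeps its rating unchanged; one that overshoots is penalised in proportion to the overshoot.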
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 22:24:05
Quote
( 4.20 / 128 ) x 128 = 4.20

Nero had an average rating of 4.04, and (I believe) an actual average bitrate of 141kbps, so it could be said to have a Composite Rating of...

(4.04 / 141 ) x 128 = ~3.67


Sorry, but this system is nonsense -

1. It is impossible to compare VBR and CBR directly (at least in Nero)  because they use completely different algorithms and psychoacoustic parameters - check MP3 tests where FhG VBR scored much worse than FhG CBR, for example - and the bit rate was similar

2. Nero CBR was used in the last AAC 128 test and it scored 4.02, if I remember correctly - it is not really possible to extrapolate results directly - and I am quite confident that current Nero encoder is way better than the one used in the last year's test.

3. Linear scaling of the subjective rankings directly to "match" bit-rate is not supported by any scientific proof - check the bit-rate vs. SDG distribution used in the making of the PEAQ tool (database of listening tests) and you will see that the quality vs. bit rate curve is nowhere near linear, at least not for AAC.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 22:38:13
Quote
1. It is impossible to compare VBR and CBR directly (at least in Nero)  because they use completely different algorithms and psychoacoustic parameters - check MP3 tests where FhG VBR scored much worse than FhG CBR, for example - and the bit rate was similar

These are completely different encoders. Audioactive is based on SlowEnc. Audition is based on FastEnc. Besides, Audioactive underwent in-house tuning.

A good proof that AudioActive is based on SlowEnc is that, in my PC, it takes 1:40 to encode a 3:20 music at 128kbps. Audition takes 12 seconds

Quote
and I am quite confident that current Nero encoder is way better than the one used in the last year's test.


That is arguable. iTunes barely changed, according to Apple's AAC developer. And it managed to keep a good margin of superiority compared to Nero.

I guess the explanation to that is that between tests, it seems more resources were targeted at low bitrate and SBR tuning, and not mid-high bitrate tuning.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 22:47:35
Quote
Nero had an average rating of 4.04, and (I believe) an actual average bitrate of 141kbps, so it could be said to have a Composite Rating of...

I'd hope simplified "conclusions" like yours would be exclusive for Slashdot, not for HA..
You totally forget again what vbr is about, and what was the average bitrate of thousands of tracks, not to talk about different approach of cbr and vbr which are not comparable. Again, the principle in VBR is that you choose a quality level, and encoder tries to keep it.
This chosen quality level then approaches some bitrate with thousands of tracks, with tested Nero's setting this is about 131kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 22:49:42
Quote
That is arguable. iTunes barely changed, according to Apple's AAC developer. And it managed to keep a good margin of superiority compared to Nero.

It's not arguable. It's totally clear that Nero has improved considerably for anybody who has done lots of testing. Just because your 12 samples may not show this, just shows that 12 samples isn't nearly enough to show the whole picture.
This is even more true for some of the lower ranking encoders, which in reality have way more trouble with quality than you can conclude from these 12 samples.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 22:49:44
Ok - I stand corrected - but anyway, comparing different encoder modes and extrapolating results is a completely wrong way of doing things.

The AAC quality scale is a curve which starts to descend rapidly towards 1.0 somewhere between 80 and 96 kb/s, and beyond that the curve angle is much smaller.  I don't have the graph, unfortunately - because it is part of the JAES tests - but you could see that the curve steepness is different for different codecs.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-02 22:51:00
Quote from: gotaserena,Mar 2 2004, 06:33 AM
- As far as I can understand there are two sources of uncertainties in such a test:
1) The spread of grades given to a particular music sample because people rate them differently, background noise, etc.
2) The spread in each implementation's efficiency (i.e. bitrate distribution, modulation, etc.) for each music sample.

How are those two dealt with in the final number? Just treat 1) and 2) as statistical errors and "sum and divide by N (or sqrt{N})"?

- I don't want to sound mighty and lofty, but the one that raised the point that the error bars are overlapping too much to call codec A "the winner" is right. Since people around here seem to believe that codec A is really the winner in the listening test it makes me think that the error bars are overestimated (this is a common "mistake" when one treats uncertainties as uncorrelated. It has more to do with measurement theory than statistics). If codec A really sounds better than codec B in most of the cases then the error bars as they stand don't tell us the whole story.

After the means for each sample have been calculated, these are fed back into the ANOVA/Fisher LSD program to come up with the final error bars for the codecs over all 12 samples.

So in effect, each sample is assumed to be independent of each other (uncorrelated) and weighted equally in figuring the final outcome.  I imagine this could be a problem, for example if the sample selection were biased towards a certain genre, or if some samples had a widely disparate number of participants.  It just underscores how much the test samples can affect the outcome of the test.
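As a rough illustration of the procedure described above, here is a textbook randomized-block ANOVA followed by Fisher's LSD, with samples as blocks and each sample weighted equally. This is a sketch under assumptions: it may differ in detail from the program actually used for the test, the function name `fisher_lsd` is hypothetical, and the critical t value is left to the caller (looked up in a t table for the chosen significance level and the returned error degrees of freedom).

```python
import math


def fisher_lsd(ratings, t_crit):
    """Blocked ANOVA + Fisher's least significant difference.

    ratings[i][j] is the mean rating of codec j on sample i.
    t_crit is the two-sided critical t value for the error degrees of
    freedom, (samples - 1) * (codecs - 1), supplied by the caller.
    Returns (codec_means, mse, df_error, lsd).
    """
    b = len(ratings)        # number of samples (blocks)
    k = len(ratings[0])     # number of codecs (treatments)
    if b < 2 or k < 2:
        raise ValueError("need at least 2 samples and 2 codecs")

    grand = sum(sum(row) for row in ratings) / (b * k)
    codec_means = [sum(row[j] for row in ratings) / b for j in range(k)]
    sample_means = [sum(row) / k for row in ratings]

    # Partition the total sum of squares into codec, sample and error terms.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_codec = b * sum((m - grand) ** 2 for m in codec_means)
    ss_sample = k * sum((m - grand) ** 2 for m in sample_means)
    ss_error = ss_total - ss_codec - ss_sample

    df_error = (b - 1) * (k - 1)
    mse = ss_error / df_error

    # Two codec means differ significantly if they are more than lsd apart.
    lsd = t_crit * math.sqrt(2.0 * mse / b)
    return codec_means, mse, df_error, lsd
```

Because every sample enters as one row, a sample rated by few listeners counts exactly as much as one rated by many, which is the equal weighting discussed above.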

Is this what you were after?

ff123
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 22:53:50
Quote
This chosen quality level then approaches some bitrate with thousands of tracks, with tested Nero's setting this is about 131kbps.

That's OK, but you can't deny that higher bitrates definitely help encoders.

A good example is Compaact!. It did pretty badly on most samples. But it really shone at Velvet, because it went up to 170kbps. (If I had featured fatboy, it would have used 270kbps!).

In these samples the bitrate stayed at 141. But if the average bitrate over hundreds of samples is 131-132kbps, on another batch of samples the bitrates would maybe be lower. And then quality would maybe be worse. Would it be worth weighting bitrate deviations then? It's pure speculation, but the question remains.

So, IMO both approaches - weighting bitrate deviations or not - have their cons and pros.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 22:55:49
Quote
It's not arguable. It's totally clear that Nero has improved considerably for anybody who has done lots of testing. Just because your 12 samples may not show this, just shows that 12 samples isn't nearly enough to show the whole picture.
This is even more true for some of the lower ranking encoders, which in reality have way more trouble with quality than you can conclude from these 12 samples.

Well, my samples come from a wide variety of styles. If you have other test results, using other samples, showing Nero indeed got much better since the last test, you should post them.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 22:57:11
Quote
Quote
This chosen quality level then approaches some bitrate with thousands of tracks, with tested Nero's setting this is about 131kbps.

That's OK, but you can't deny that higher bitrates definitely help encoders.

A good example is Compaact!. It did pretty badly on most samples. But it really shone at Velvet, because it went up to 170kbps. (If I had featured fatboy, it would have used 270kbps!).

In these samples the bitrate stayed at 141. But if the average bitrate over hundreds of samples is 131-132kbps, on another batch of samples the bitrates would surely be lower. And then quality would maybe be worse. Would it be worth weighting bitrate deviations then? It's pure speculation, but the question remains.

So, IMO both approaches - weighting bitrate deviations or not - have their cons and pros.

Compaact's high bitrate is due to high overcoding of short blocks. Just because of this, you can't extend this directly to other encoders.
Of course higher bitrate "helps", but the idea is to keep relatively constant quality. Compaact VBR doing very high overcoding of short blocks breaks this principle.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 22:57:50
Quote
So, IMO both approaches - weighting bitrate deviations or not - have their cons and pros.


You could theoretically weight bitrate and quality  by keeping the same coding mode of the encoder, and knowing - a priori  codec performance per bits/sample.

I could tell you that it is nowhere near perfection, and still it is a very tough method, and you have to have a couple of statistically significant tests at, say, 96, 128, 160 and 192 kb/s to know how quality "scales" with the bit rate.

"Scale" of descent, like I said, is not the same for AAC and different codecs, and - most likely - not the same between various implementations of the same algorithm.

What if we slide bit rate to, say, 192 - so, QT would get 6.3 rating, right?

Wrong.
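The arithmetic behind that 6.3 figure can be sketched as follows. It assumes the 4.20 average at 128 kbps used as an example earlier in the thread, and a rating scale that tops out at 5.0 (the scale bounds are an assumption here, and the function name is hypothetical).

```python
RATING_MIN, RATING_MAX = 1.0, 5.0  # assumed bounds of the rating scale


def linear_projection(rating, from_kbps, to_kbps):
    """Project a rating to another bitrate by simple proportion."""
    return rating * to_kbps / from_kbps


projected = linear_projection(4.20, 128, 192)
print(f"projected rating at 192 kbps: {projected:.1f}")
print("exceeds the scale maximum:", projected > RATING_MAX)
```

A result above the top of the scale is one quick way to see that simple proportion cannot hold across bitrates.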
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:00:24
Quote
Of course higher bitrate "helps", but the idea is to keep relatively constant quality. Compaact VBR doing very high overcoding of short blocks breaks this principle.

Well, they try to keep quality at all costs. I think that's the idea behind VBR - screw the bitrate, give desired quality.

And, even Velvet considered, Compaact managed to come really close to an average of 128kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:03:43
Quote
Well, my samples come from a wide variety of styles. If you have other test results, using other samples, showing Nero indeed got much better since the last test, you should post them.

Not sure if I have those old Nero encoders anywhere, but there's no doubt that Nero has clearly improved. I've been testing it during this time quite a lot. I think Guru can verify this too, although only for the kind of music he listens to.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:03:49
Quote
"Scale" of descent, like I said, is not the same for AAC and different codecs, and - most likely, not same between various implementations of the same algorithm

So that warrants allowing a codec to use more bits than allowed by the test setup?

Quote
What if we slide bit rate to, say, 192 - so, QT would get 6.3 rating, right?

Wrong.


What are you trying to say there?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:05:22
Quote
Well, they try to keep quality at all costs. I think that's the idea behind VBR - screw the bitrate, give desired quality.

And, even Velvet considered, Compaact managed to come really close to an average of 128kbps.

That's just it: Compaact gives Velvet relatively better quality than on average because of overcoding of short blocks. This is more than just "keeping the quality", it's "pushing the quality".
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:06:19
Quote
Not sure if I have that old Nero encoders anywhere, but there's no doubt that Nero has clearly improved.

Oh, it did improve. It surely got closer to QuickTime.

But, until another test result is posted proving it improved a lot, my test results prove it didn't improve all that much.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 23:07:05
Quote
So that warrants allowing a codec to use more bits than allowed by the test setup?


Hmm.. we had this discussion before - and if I remember correctly, in the time of 128 extension test  it was decided to use values for near 128 kb/s on average content, not on test items.

The fact that encoder used more bits on this particular sample set just means that it judged them as "hard to encode"

Of course, you could fine-scale encoder for each sample to even give you 128 kb/s for each sample with VBR - but what's the use of that - unless encoder has 2-pass VBR supported and used by most users?

Quote
What are you trying to say there?


That linear scaling of quality to projected bit rate is dead flat wrong.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:08:18
Quote
But, until another test result is posted proving it improved a lot, my test results prove it didn't improve all that much.

..for the tested 12 samples... Ideal would be to test something like 120 samples, but this is impossible.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:10:03
Quote
..for the tested 12 samples...

Right, but at least this is a proof, I'm not breaking rule #8 or anything

Besides, I test few samples but with a wide variety of styles and a wide variety of listeners, each one of them with their specific sensitivities to artifacts. IMO that is more representative of the current state of codec technology than a test consisting of hundreds of samples and only one or two listeners.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:17:02
Quote
IMO that is more representative of the current state of codec technology than a test consisting of hundreds of samples and only one or two listeners.

I agree and disagree. Group testing is the only way to indicate some kind of average, but 12 samples is way too few to give a full picture of codec qualities; it gives only some kind of indication.
And eventually there's no objective full picture, it's always subjective.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:19:02
Quote
The fact that encoder used more bits on this particular sample set just means that it judged them as "hard to encode"

Sure. But other encoders probably also found this sample hard to encode. Still, they behaved and didn't allow bitrate to fluctuate wildly.

Quote
That linear scaling of quality to projected bit rate is dead flat wrong.


Hrm... OK, but I still see no substantial proofs of this. You mention some "JAES" quality scales that can't be published, and some speculations about what would happen at listening tests comparing other bitrates, and there are some claims without proof that Nero got much better.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 23:19:32
Quote
But if the average bitrate over hundreds of samples is 131-132kbps, on another batch of samples the bitrates would maybe be lower.

Or higher....
Don't forget that I've launched a long batch encoding (1500 tracks, randomly listed from ~250 CD), and average bitrate was something between 136 and 139 kbps. It was only what people call "classical", but there's a great variety within this genre.


But I also agree that we can't balance final score by bitrate. Especially if the bitrate is the average one for short samples. And especially if the samples were selected with difficulty in mind. In these particular conditions, VBR encoders are always bigger than CBR encoders (VBR+difficulty=high bitrate).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:21:11
Quote
but 12 samples is way too little to give full picture of codec qualities

Of course it is. But it gives a picture. Better than having only personal tests comparing codecs of different versions, with different samples, and with different methodology  - and with results that can't be compared amongst themselves.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:23:26
Quote
In these particular conditions, VBR encoders are always bigger than CBR encoders (VBR+difficulty=high bitrate).

Nope. Faac and Compaact! are VBR, the settings I used output average 128kbps over a very big amount of tracks, and still they behaved, staying at 128kbps +-4

I need to publish the bitrate deviation table ASAP >_<
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 23:27:25
Quote
I think Guru can verify this too although only for the kind of music he listens.

Yes, Nero AAC improvements are real. At least with the music I listen to.
But I must add that old Nero AAC at ~128 kbps sounded really bad to my ears, clearly worse than LAME MP3, for example. Now, Nero AAC is much better, though I could still complain about some problems (I don't know the exact name of the problem: ringing? noise pumping? distorted background? But I also hear it on non-classical samples).

I must add that Nero AAC has some other advantages. I can transcode various lossless and lossy formats through the Nero AAC frontend or foobar2000. I have gapless playback with this encoder too.
Quality is a very important thing, but handling and extra features are important too.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: smok3 on 2004-03-02 23:28:18
i just wonder: is it possible to calculate what's the possibility that with another 12 samples the results would be completely different?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:31:12
Quote
i just wonder: is it possible to calculate what's the possibility that with another 12 samples the results would be completely different?

Not at all. Any claim about it would be just wild speculation.

The only way to know for sure would be conducting a listening test with the different samples.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:32:31
Quote
Quote
In these particular conditions, VBR encoders are always bigger than CBR encoders (VBR+difficulty=high bitrate).

Nope. Faac and Compaact! are VBR, the settings I used output average 128kbps over a very big amount of tracks, and still they behaved, staying at 128kbps +-4

I need to publish the bitrate deviation table ASAP >_<

If you want samples which make Compaact behave badly bitrate-wise, I have tons of those.
It depends quite a lot on the music style really. For this "effect music" which for example Dibrom listens sometimes, Compaact's bitrate usually is sky high.
Same with any track of any style of music with lots of sharp attacks practically.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 23:33:32
Quote
Nope. Faac and Compaact! are VBR, the settings I used output average 128kbps over a very big amount of tracks, and still they behaved, staying at 128kbps +-4

I need to publish the bitrate deviation table ASAP >_<

I've also tried to batch encode a lot of tracks (the same ones as for Nero AAC) with faac -q115. I stopped at ~300 files this time.
Average bitrate was between 111 and 114 kbps (compare it to the 136-139 kbps I obtained for Nero: not the same class with classical).

I don't have the extremes in mind, but I'm quite sure that the highest bitrate reached 140-150 kbps (but it was rare, probably the solo harpsichord tracks). A lot of files were below 110 kbps.
It means that Faac is highly responsive too. Maybe too much (that's why I've used ABR on my second classical music listening test).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 23:38:18
Ok, I used ODG here, but it should be clear what I mean:

(http://es.verat.net/~psyq/odg.jpg)


This is ODG scaling (si02.wav - hard pre-echo case) from 64 to 256 kb/s for plain-vanilla LC-AAC, no PNS, no nothing, Nero flavour -  and compared with bit rate ratio (64 kb/s is 1.0 and 256 kb/s is 4.0)

You see that curves are nowhere near identical - which means that quality does not scale with bit rate in a linear fashion.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:38:43
Quote
Average bitrate was between 111 and 114 kbps (compare it to the 136-139 kbps I obtained for Nero: not the same class with classical).

Yes, but classical account for only 2 of the tested samples.

I quote Spoon:

Quote
First Results for FAAC are in, for 2GB of encodings, quality 100 was 5% below and quality 125 was 5% above, I am thinking quality 115 would be spot on the ball.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-02 23:41:55
Quote
What if we slide bit rate to, say, 192 - so, QT would get 6.3 rating, right?

Wrong.

Wrong indeed.  If the target bitrate was 192, and QT hit 192, then its unadjusted and composite ratings would be the same.  If QT, however, encoded to a target of 192kbps at an actual (who knows how) of 128kbps, then yes, it would have a composite rating of 6.3. 

As I stated, I'm sure my concept is flawed, and if so there are two things that need to be done. 

1.  Tell me how it's wrong.  That's been addressed to some extent (though I'm not enough of an expert to know what the answer really means)... and ...

2. Based on the aforementioned demand for this kind of information, propose an alternative system that would tell which codec provides (on average) the best quality at a fixed filesize target. Concerning this "fixed target", yes, VBR may "not work that way", but hard drives do. A file size is a file size, and the larger the average filesize, the less encoded music you can keep in a fixed amount of space. This is what people want to know. And also, yes, there will be variation, since using quality-managed VBR encoding produces files of various average bitrates. But a test like this one can give us a scale of averages for the samples tested (the "qualification" I mentioned before).

From among the Eternal Questions of Psychoacoustic Audio Encoding™: "My quality target is X.  How many files encoded to my quality target with Nero AAC can I put on my 20GB hard drive?  OK, how many files encoded to the same quality target with QT AAC can I put in the same space?"

Is there no system that can be incorporated to answer this essential question so many expert and non-expert music encoders have?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:42:43
Quote
You see that curves are nowhere near identical - which means that quality does not scale with bit rate in a linear fashion.

OK. It's obviously a curve. But approximations can be done, specially considering the bitrate difference is only (?) 13kbps, not 32 as it would be in 128kbps vs. 160.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 23:43:35
Quote
Quote
Average bitrate was between 111 and 114 kbps (compare it to the 136-139 kbps I obtained for Nero: not the same class with classical).

Yes, but classical account for only 2 of the tested samples.

I quote Spoon:

Quote
First Results for FAAC are in, for 2GB of encodings, quality 100 was 5% below and quality 125 was 5% above, I am thinking quality 115 would be spot on the ball.

Yes. I was really interested by spoon results.
What I'm trying to show is that faac also corresponds to my previous equation:
In these particular conditions, VBR encoders are always bigger than CBR encoders (VBR+difficulty=high bitrate).
150 kbps for full complex tracks (harpsichord)
100 kbps for low complex tracks (voice, piano...)
and with short samples, values are even more contrasted

On a listening test based on difficult samples, VBR encodings will probably have an average bitrate superior to common encodings.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:46:05
Quote
On a listening test based on difficult samples, VBR encodings will probably have an average bitrate superior to common encodings.

That's OK. But what if, in a listening test based on difficult samples, a VBR encoder manages to behave and stay close to the target bitrate?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-02 23:49:16
Quote
Roberto
OK. It's obviously a curve. But approximations can be done, specially considering the bitrate difference is only 13kbps, not 32 as it would be in 128kbps vs. 160.


Yes, I said that it could be done, but only by making the bit-rate vs. quality curve based on average content for certain encoder -  it can't be linear approximation because your results would be flawed.

Quote
ScorLibran
1. Tell me how it's wrong. That's been addressed to some extent (though I'm not enough of an expert to know what the answer really means)... and ...


Psychoacoustic coders do not scale quality with bit rate in a linear fashion - meaning that you can't just project quality result to some other bit rate by simple proportion.

Quote
From among the Eternal Questions of Psychoacoustic Audio Encoding™: "My quality target is X. How many files encoded to my quality target with Nero AAC can I put on my 20GB hard drive? OK, how many files encoded to the same quality target with QT AAC can I put in the same space?"


Well - we did these tests before the listening test, and Nero ended up at 130.5 (correct me if I am wrong) on average - which indicates how much stuff you would be able to store.

Of course, it depends on signal statistics - so if you want to know exactly how much, you'll either have to encode files,  or to use CBR.  Different music has different masking patterns, and if you use 'real' VBR - it is not easy to predict final bit rate without, at least, psychoacoustic processing.
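To put the storage question in numbers, here is a small sketch that converts a drive capacity and an average bitrate into hours of audio. It assumes 1 kbps = 1000 bits per second, a "20GB" drive of 20 x 10^9 bytes, and ignores container and filesystem overhead; the function name is hypothetical. The 130.5 kbps figure is the average quoted above, and 128 kbps is the nominal target of the test.

```python
def hours_of_audio(capacity_bytes, avg_kbps):
    """Hours of audio that fit in capacity_bytes at an average bitrate.

    Assumes 1 kbps = 1000 bit/s and ignores container/filesystem overhead.
    """
    if avg_kbps <= 0:
        raise ValueError("avg_kbps must be positive")
    seconds = capacity_bytes * 8 / (avg_kbps * 1000)
    return seconds / 3600


DRIVE_BYTES = 20e9  # "20GB" taken as 20 * 10^9 bytes (assumption)

for kbps in (128.0, 130.5):
    print(f"{kbps} kbps: about {hours_of_audio(DRIVE_BYTES, kbps):.0f} hours")
```

The capacity scales inversely with the average bitrate, so the estimate is only as good as the average measured over music similar to one's own collection.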
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-02 23:51:24
Quote
Quote
You see that curves are nowhere near identical - which means that quality does not scale with bit rate in a linear fashion.

OK. It's obviously a curve. But approximations can be done, specially considering the bitrate difference is only (?) 13kbps, not 32 as it would be in 128kbps vs. 160.

There is no kind of reliable approximation of CBR quality from VBR. The coding methods are so different. CBR often has different kinds of issues than VBR, and CBR can avoid undercoding problems which may happen with VBR, especially at mid-low bitrates like near 128kbps average. The tweaking of CBR and VBR profiles must be done totally separately.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-02 23:55:51
Quote
But what if, in a listening test based on difficult samples, a VBR encoder manages to behave and stay close to the target bitrate?

Criticism will shut up.
But very brief, localized bitrate explosions might help encoders. See MPC --standard: short frames reach 700 kbps with castanets.wav.
Imagine the average bitrate on micro samples with VBR encodings...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:56:23
Quote
it can't be a linear approximation because your results would be flawed.

Hrm... no. They could be slightly imprecise, but considering we are extrapolating few kbps - not tens or hundreds - I doubt the results would be much different than if that codec behaved and output test samples at 128kbps average.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-02 23:57:46
Quote
Not any kind of reliable approximation of CBR quality from VBR. The coding methods are so different. CBR often has different kinds of issues than VBR, and CBR can avoid undercoding problems which may happen with VBR, especially at mid-low bitrates like near 128kbps average. The tweaking of CBR and VBR profiles must be done totally separately.

We are not trying to approximate CBR from VBR. We are approximating a theoretical VBR mode that stayed at an average bitrate of 128kbps across test samples.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 00:00:47
Anyway, AAC quality sucks, according to Microsoft, so we might as well stop discussing this shitty format and move on to the WMA standard.

http://www.macnn.com/news/19380 (http://www.macnn.com/news/19380)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 00:10:45
Quote
Quote
Not any kind of reliable approximation of CBR quality from VBR. The coding methods are so different. CBR often has different kinds of issues than VBR, and CBR can avoid undercoding problems which may happen with VBR, especially at mid-low bitrates like near 128kbps average. The tweaking of CBR and VBR profiles must be done totally separately.

We are not trying to approximate CBR from VBR. We are approximating a theoretical VBR mode that stayed at an average bitrate of 128kbps across test samples.

It's pointless, because it's not tweaked. In real life, emphasis can be put on psychoacoustic settings which lower the bitrate but decrease quality relatively less than some other setting, or than this so-called linearly approximated theoretical VBR-mode preset.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 00:17:25
Quote
Quote
ScorLibran
1. Tell me how it's wrong. That's been addressed to some extent (though I'm not enough of an expert to know what the answer really means)... and ...


Psychoacoustic coders do not scale quality with bit rate in a linear fashion, meaning that you can't just project a quality result to some other bit rate by simple proportion.

Understood.  So naturally my next question would be...

Then what does the math look like for at least getting a fairly accurate composite rating?  Or at least closer than a linear calculation?

Quote
Quote

From among the Eternal Questions of Psychoacoustic Audio Encoding™: "My quality target is X. How many files encoded to my quality target with Nero AAC can I put on my 20GB hard drive? OK, how many files encoded to the same quality target with QT AAC can I put in the same space?"


Well, we did these tests before the listening test, and Nero ended up at 130.5 kbps on average (correct me if I am wrong), which indicates how much stuff you would be able to store.

Of course, it depends on signal statistics, so if you want to know exactly how much, you'll have to either encode the files or use CBR. Different music has different masking patterns, and if you use 'real' VBR, it is not easy to predict the final bit rate without, at least, psychoacoustic processing.

So to answer the question of the non-expert music encoder asking how to calculate capacity requirements for each encoding format...

"Encode your collection with a setting you know you like, and if you can't fit all the music you want to onto your drive, then try a little lower setting and start over."

-- Not feasible.

"Use CBR."

-- Not good advice for the best sound quality, in many cases.

So, that person is stuck without a good answer, unless some kind of composite rating system is used with test results such as these.

First, I'd need to have an example of math that would be more accurate than my linear calculations.

And regarding variances across samples and types of music, well, that's where the (now mentioned at least 3 times) qualification of composite rating interpretation would come into play.  Everyone knew going in that in Roberto's test there could only be a limited number of samples, and it was my understanding that it would be safe to quote those results with the qualifier of "with the samples tested, here's how the codecs performed..."

Once we have a more accurate calculation method, we should just go with what we have so far, and as more tests are done, use those results to provide new composite ratings over time.

I'm just trying to speak for the hundreds of people who have posed these questions to me over the past year and have never gotten a good, straightforward answer.  And if my presentation of these questions and ideas is getting a bit too "simplistic", then forgive me, but inquiring minds want to know!
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: QuantumKnot on 2004-03-03 00:22:08
Quote
Anyway, AAC quality sucks, according to Microsoft, so we might as well stop discussing this shitty format and move on to the WMA standard.

http://www.macnn.com/news/19380 (http://www.macnn.com/news/19380)

hehehe Interesting article.

Quote
It’s true: Apple’s AAC cuts sound great with the tiny little speakers that come with computers. And they sound pretty good on an original (but AAC upgraded) iPod through the stock headphones. But listen through good headphones and what you’ll hear is dull-sounding bass, slightly sibilant voice quality and a lack of three-dimensionality.


Three-dimensionality?  Perhaps he lacked those special blue and red headphones
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: tcristy on 2004-03-03 00:34:48
I encoded 145 rock songs ranging from classic rock through metal with Compaact Q5 VBR and no low-pass filtering (I hate the low pass, which is why I use Compaact and not iTunes, which sounds dead to me) and got the following distribution:

http://home.columbus.rr.com/tcristy/CPDistro.jpg (http://home.columbus.rr.com/tcristy/CPDistro.jpg)

The average was 126 kbps.

The highest bit rate song at a bit over 143 kbps average was ZZ Top's Sleeping Bag, which has a synthed cymbal at about 6 beats per second through the whole thing.

Tim
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bidz on 2004-03-03 00:49:23
If all encoders were tested in CBR mode, this discussion wouldn't be an issue.

That would - at least - be a 100% fair comparison.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-03 01:25:10
Last year, Roberto was criticized for choosing CBR for all encoders, especially for Nero AAC, which had a VBR mode. Some people were sure that Nero AAC with VBR would beat QuickTime AAC CBR... That was another discussion, and people didn't consider the forced CBR comparison fair. And that's probably true.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: kwanbis on 2004-03-03 01:51:58
Quote
That would - at least - be a 100% fair comparison.

it won't, we are trying to compare what is the "standard" among them, and the way it was done is perfect, for me at least.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bidz on 2004-03-03 01:59:29
Quote
Quote
That would - at least - be a 100% fair comparison.

it won't, we are trying to compare what is the "standard" among them, and the way it was done is perfect, for me at least.

Well, IMHO, a perfect comparison consists of identical bitrates. If one encoder uses VBR and gets 140kbps, that's not fair to compare against a 128kbps CBR codec IMHO (even if the CBR codec wins).

Just my opinion.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 02:31:26
For my own concern, I'm not so interested in what's "fair".  The bitrate a codec averages on a sample is what it averages...whether it's high or low is incidental to me.

What I'd like to see, though, is some kind of "quality per bit" rating, or a "composite rating".  Like I said, although VBR doesn't really put relevance on bitrate, people do, since bitrate determines filesize, and filesize is used to determine how much music you can fit into a limited space.

Simple concept, but it seems the answer may be quite complex. 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 03:29:46
The bitrate distribution table is up
http://www.rjamorim.com/test/aac128v2/results.html (http://www.rjamorim.com/test/aac128v2/results.html)

Just under the individual plots, as usual.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bidz on 2004-03-03 03:45:49
140 average against 128 .. (Nero - iTunes). I'm really curious what Nero's rating would be if it used 128kbps CBR... Oh well..
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-03 05:30:33
Quote
Quote
So that warrants allowing a codec to use more bits than allowed by the test setup?


Hmm.. we had this discussion before - and if I remember correctly, at the time of the 128 extension test it was decided to use settings that give near 128 kb/s on average content, not on test items.

The fact that the encoder used more bits on this particular sample set just means that it judged them as "hard to encode".

Of course, you could fine-tune the encoder for each sample so that even VBR gives you 128 kb/s on every sample - but what's the use of that, unless the encoder supports 2-pass VBR and most users use it?

Quote
What are you trying to say there?


That linear scaling of quality to projected bit rate is dead flat wrong.

By telling a CBR encoder to encode at 128kbps you are telling it to sacrifice quality to encode at 128kbps.

Then you place VBR encoders in the same situation and tell them to sacrifice bitrate accuracy to maintain quality... in a quality oriented test this is not fair.

Personally (and we do this in video tests), if a codec operates best in VBR mode then a VBR setting should be used which comes as close as possible to the CBR bitrate ON THAT SAMPLE, otherwise what's the justification for not using the VBR bitrate with the CBR encoder, thus asking it to constrain its quality to the same bitrate as the VBR encoder.

Unless of course the granularity on VBR aac encoders isn't that crash hot.


Finding the VBR settings which match a CBR bitrate might take multiple encodings, but I think it's the only way to have a fair test. The test is not how each encoder does at "ultra streaming" quality, but rather at 128kbps.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: sven_Bent on 2004-03-03 05:52:57
Quote
Anyway, AAC quality sucks, according to Microsoft, so we might as well stop discussing this shitty format and move on to the WMA standard.

http://www.macnn.com/news/19380 (http://www.macnn.com/news/19380)

This article is a joke.

He compared the downloaded track to HIS original CD... who says it's from the same source?

Anyway, he clearly takes the time to say bad things about AAC, but he fails to mention any better alternative.

The burned disc did NOT play in any of my CD players. Not in the ones hooked up to my stereo, my portable players, or even in an old laptop without DVD capabilities. Nor did they play on either of my older MP3 players.

(May 15 update: It turned out the blank CD I was using was bad. I used other discs and was able to play them on my CD players.)


So it took him NINE days to figure out the CD was bad... now that's a technical guy I can put my trust in... DOH
He is clearly missing rule number one of science: "anything has to be done twice"


"Any device that can play a DVD can play burned copies of Apple’s AAC-compressed songs"
What the HECK. This is just plain wrong... unless he burns an audio CD, which he DOES NOT mention at all.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Jerethi on 2004-03-03 06:31:36
Quote
Anyway, AAC quality sucks, according to Microsoft, so we might as well stop discussing this shitty format and move on to the WMA standard.

http://www.macnn.com/news/19380 (http://www.macnn.com/news/19380)

This article is a pretty pathetic attempt at journalism.  He has obviously done little or no research, and clearly evinces a biased point of view.  This article is completely devoid of merit.

Either he's accusing Dolby of falsely representing the fact that AAC has undergone extensive listening tests, or he's claiming to have the shiniest golden ears this world has ever known.

What the hell are we paying editors for these days?   

EDIT: Well, I guess there's some meaningful content in the comments he makes about the iTunes online store, but this is hardly what I would call a "review."
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: sony666 on 2004-03-03 07:00:45
I would have preferred only CBR encodings in this test and the previous mp3 one. but that's just me
hope there will be a poll regarding this question for the next test.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 07:16:50
Quote
I would have preferred only CBR encodings in this test and the previous mp3 one. but that's just me
hope there will be a poll regarding this question for the next test.

Impossible. How do you expect me to come up with CBR Musepack? Or VBR iTunes?

Also, Vorbis zealots would slaughter me if I forced it to use CBR.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-03 07:24:19
Quote
The bitrate distribution table is up
http://www.rjamorim.com/test/aac128v2/results.html (http://www.rjamorim.com/test/aac128v2/results.html)

Great!

Can we get the decryption key now too?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 07:28:23
OMG! yes.

http://www.rjamorim.com/test/aac128v2/comments/aac128v2.key (http://www.rjamorim.com/test/aac128v2/comments/aac128v2.key)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: harashin on 2004-03-03 07:43:01
Decrypted my results. Somehow I cannot find my result for sample05 at http://www.rjamorim.com/test/aac128v2/comments/results05/ (http://www.rjamorim.com/test/aac128v2/comments/results05/).
Other results are available there.

Code: [Select]
ABC/HR for Java Version 0.4b3 SE, 20 2月 2004
Testname: Hongroise listening test

Tester: harashin

1L = Sample05\Hongroise_1.wav
2R = Sample05\Hongroise_5.wav
3L = Sample05\Hongroise_2.wav
4R = Sample05\Hongroise_3.wav
5R = Sample05\Hongroise_4.wav

---------------------------------------
General Comments:
---------------------------------------
2R File: Sample05\Hongroise_5.wav
2R Rating: 4.5
2R Comment:
---------------------------------------

ABX Results:
Original vs Sample05\Hongroise_1.wav
   25 out of 41, pval = 0.105
Original vs Sample05\Hongroise_2.wav
   15 out of 28, pval = 0.425
Original vs Sample05\Hongroise_3.wav
   16 out of 32, pval = 0.569
Original vs Sample05\Hongroise_4.wav
   6 out of 14, pval = 0.788
Original vs Sample05\Hongroise_5.wav
   6 out of 6, pval = 0.015
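
For readers wondering where the pval figures in an ABX listing like this come from, here is a minimal sketch. It assumes the tool reports the one-sided binomial tail, that is, the chance of getting at least that many trials right by pure guessing; that is the usual ABX calculation, but it is an assumption about ABC/HR for Java rather than something stated in this thread. The function name is made up for the example.

```python
from math import comb


def abx_pvalue(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing (p = 0.5)."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


# Scores taken from the ABX results listed above.
for correct, trials in [(25, 41), (15, 28), (16, 32), (6, 14), (6, 6)]:
    print(f"{correct} out of {trials}, pval = {abx_pvalue(correct, trials):.3f}")
```

Under that assumption the values agree with the listing to within rounding of the last digit. A small pval (6 out of 6) means the difference was very likely heard, while values near 0.5 are consistent with guessing.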
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: sony666 on 2004-03-03 07:46:56
I subconsciously associated multiformat with "mp3 vs. AAC" by wishful thinking, and unjustly pushed aside mpc and vorbis in my (sleepy) mind. my apologies.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 07:47:27
For some strange reason, I couldn't decrypt some of the Sample05 (and a few other samples) results.

It must be a bug in ABC/HR Java

Here's an example:
http://pessoal.onda.com.br/rjamorim/results09.zip (http://pessoal.onda.com.br/rjamorim/results09.zip)

BTW: I already reported the bug to schnofler.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: harashin on 2004-03-03 07:53:27
Quote
For some strange reason, I couldn't decrypt some of the Sample05 (and a few other samples) results.

It must be a bug in ABC/HR Java

Here's an example:
http://pessoal.onda.com.br/rjamorim/results09.zip (http://pessoal.onda.com.br/rjamorim/results09.zip)

BTW: I already reported the bug to schnofler.

I see. The results09.erf can't be decrypted here, either.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-03 09:02:48
Quote
Decrypted my results. Somehow I cannot find my result for sample05 at http://www.rjamorim.com/test/aac128v2/comments/results05/ (http://www.rjamorim.com/test/aac128v2/comments/results05/).
Other results are available there.

My result for sample 5 is missing there as well, for some unknown reason. No problem decrypting it locally:

Code: [Select]
ABC/HR for Java Version 0.4b3 SE, 20 Februar 2004
Testname: Hongroise listening test

Tester: Continuum

1L = Sample05\Hongroise_3.wav
2R = Sample05\Hongroise_2.wav
3L = Sample05\Hongroise_5.wav
4R = Sample05\Hongroise_4.wav
5L = Sample05\Hongroise_1.wav

---------------------------------------
General Comments:
---------------------------------------
1L File: Sample05\Hongroise_3.wav
1L Rating: 3.0
1L Comment: pre-echo? smeared at 10.6
---------------------------------------
2R File: Sample05\Hongroise_2.wav
2R Rating: 4.5
2R Comment: ABC-HR click moved (start pos 9.03)
little problem at sec 15
---------------------------------------
3L File: Sample05\Hongroise_5.wav
3L Rating: 2.0
3L Comment: severly smeared at ~10.6
---------------------------------------
4R File: Sample05\Hongroise_4.wav
4R Rating: 4.8
4R Comment: very little pre-echo at sec 22-23
---------------------------------------
5L File: Sample05\Hongroise_1.wav
5L Rating: 2.5
5L Comment: sec 15
---------------------------------------

ABX Results:
Original vs Sample05\Hongroise_2.wav
   1 out of 1, pval = 0.5
Original vs Sample05\Hongroise_1.wav
   5 out of 5, pval = 0.031


I find this sample particularly interesting. All codecs had problems at different sections. And looking at the bitrate table,
Code: [Select]
            Nero  Real  Faac  Comp!  iTunes
Hongroise   148   128   105    123    128

I am quite shocked by Nero's performance. Faac on the other hand got a respectable rating, considering its low bitrate.
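
To put numbers on how far each encoder strayed from the nominal target on this sample, here is a small sketch using the bitrates from the table above. The 128 kbps target comes from the test title; the function and variable names are only illustrative, and "Comp!" in the table is taken to mean Compaact!.

```python
TARGET_KBPS = 128

# Bitrates (kbps) for the Hongroise sample, copied from the table above.
hongroise_kbps = {"Nero": 148, "Real": 128, "Faac": 105, "Compaact!": 123, "iTunes": 128}


def deviation_percent(actual_kbps: float, target_kbps: float = TARGET_KBPS) -> float:
    """Signed deviation from the target bitrate, in percent."""
    return (actual_kbps - target_kbps) / target_kbps * 100


for encoder, kbps in hongroise_kbps.items():
    print(f"{encoder:10s} {kbps:4d} kbps  {deviation_percent(kbps):+6.1f}%")
```

This only describes the bitrates; it says nothing about how the ratings should be adjusted for them.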
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: spoon on 2004-03-03 09:27:58
Quote
Nope. Faac and Compaact! are VBR, the settings I used output average 128kbps over a very big amount of tracks, and still they behaved, staying at 128kbps +-4

I need to publish the bitrate deviation table ASAP >_<

The audio tracks I compressed were a mix of Pop, Rock, Rap - popular tracks. I could upload a playlist if required.

If you were to publish the bitrates over 2GB then any squabbling should stop, anyone looking at the results page for the first time could get the wrong impression that Nero was encoded too high.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 09:57:43
Quote
Then you place VBR encoders in the same situation and tell them to sacrifice bitrate accuracy to maintain quality... in a quality oriented test this is not fair.

Personally (and we do this in video tests), if a codec operates best in VBR mode then a VBR setting should be used which comes as close as possible to the CBR bitrate ON THAT SAMPLE, otherwise what's the justification for not using the VBR bitrate with the CBR encoder, thus asking it to constrain its quality to the same bitrate as the VBR encoder.

This issue has been discussed n+1 times with previous tests already.
Essentially it comes down to practical use. In audio encoding you usually encode a whole album (or a long movie track) and expect a certain average size from the whole album.
It's no use measuring a method which nobody uses (encoding every track one by one, checking the track bitrate every time, and switching between CBR and VBR).
The idea of testing VBR is to get some indication (with only 12 samples, though) of a certain codec setting which gives a certain average bitrate and quality.
If you measure several codec settings at once, it's practically a mess which doesn't tell you much of anything, especially because the sample count is already very low.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 10:11:00
Quote
140 average against 128 .. (Nero - iTunes). I'm really curious what Nero's rating would be if it used 128kbps CBR... Oh well..

Difficult to say. As has been said, the drop in quality is not linear with bitrate, and CBR and VBR have different issues. CBR could have coped with certain problem sections even better, even if the overall bitrate is lower. The 10kbps bitrate increase doesn't necessarily come from sections which would otherwise have been clearly audibly worse (although the psychoacoustic model might think so).
If you check the last test 8 months ago where Nero used CBR128 and consider it has become better, you can get some indication.
http://www.rjamorim.com/test/aac128test/results.html (http://www.rjamorim.com/test/aac128test/results.html)

Imo ABR would be the safest coding method at the bitrates here.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-03 10:20:40
Quote
Hongroise  148  128  105    123    128
I am quite shocked by Nero's performance. Faac on the other hand got a respectable rating, considering its low bitrate.

Yes, the bitrate surprises me as well. I'm used to encoding piano music with VBR encoders like mpc or lame, and the bitrate is often very friendly (mpc --standard at ~140 kbps, --alt-preset standard at 150-160 kbps). Nero's high bitrate is strange (but of course, not necessarily a bad thing, if quality is the goal). Another VBR encoder needs a much higher bitrate than average on this sample: WavPack v.4 alpha2 lossy -q (~450 kbps, whereas 300...350 for 99% of all other tracks).
I wonder why? Does someone have an explanation? Native dithering?

About quality now:
Many people share the feeling I've had for months: iTunes encodings have serious problems with smearing here. Generally, iTunes does a very good job on transients, with little pre-echo. It seems that this piano sample is more problematic for the iTunes encoder than a castanets one.
Compaact is amazing (the overall results show this encoder as the winner).
In my test, Real AAC obtained the only rating > 3.0 here. I guess it's because the lowpass at 15 kHz doesn't have consequences here. I don't know if Karl Lillevold could request a modification at 128 kbps from Coding Technologies or from Real's internal development team, in order to have something less aggressive (after all, lame lowpasses at 17500 Hz for abr 128, and I don't see why an AAC encoder has to lowpass at 15000).
Faac and Nero disappointed me. Not for pre-echo, but for other problems. Faac ABR is surely better here (VBR drops to 105 kbps...).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: robUx4 on 2004-03-03 10:42:54
I haven't seen mentioned this question so far, so here I go.
Would it be possible that the AAC decoder used in the test may favor one or the other encoder ?
Maybe a test of AAC decoders with the same encoder could make sure it's all safe ?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-03 10:49:56
Most people fail to tell different AAC encoders apart. ABXing decoder differences (in the LSB) is very, very hard.
look at:
http://www.foobar2000.net/mp3decoder (http://www.foobar2000.net/mp3decoder)
There are samples if you want to test mp3 decoders (and some techniques, like dithering and noise shaping). See if you could ABX them.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 11:01:52
Quote
I haven't seen mentioned this question so far, so here I go.
Would it be possible that the AAC decoder used in the test may favor one or the other encoder ?
Maybe a test of AAC decoders with the same encoder could make sure it's all safe ?


QT AAC decoder and FAAD2 give bit-to-bit identical results for LC-AAC content, regardless of the encoder used.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 11:42:38
Attempting to approximate the curve to more accurately compensate for bitrate variances in each codec.  Without some decent information about efficiency measurement in the mysterious AAC format, this can't be done.  But if we had someone that could answer the question at hand, then it would indeed be possible.  If AAC's efficiency should be measured on a curve for bitrate variances, then this should be the base algorithm...

composite rating = quality/bitrate

( test rating * SQRT(target bitrate) ) / SQRT(actual bitrate) = composite rating

This is obviously not the formula to use (watch someone claim I've said this  ).  But if I could simply squeeze the information out of an AAC expert, then a derivative formula could be developed.
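
Purely to make the arithmetic of the placeholder formula above concrete, here is a minimal sketch of it exactly as written. The function name and the example rating of 4.0 are hypothetical; the rating is not a result from the test. As stated above, this is not proposed as the formula to use.

```python
import math


def composite_rating(test_rating: float, target_kbps: float, actual_kbps: float) -> float:
    """Square-root bitrate adjustment, as written in the post (a placeholder only)."""
    return test_rating * math.sqrt(target_kbps) / math.sqrt(actual_kbps)


# Hypothetical rating of 4.0: unchanged at the target bitrate, reduced above it.
print(composite_rating(4.0, 128, 128))
print(round(composite_rating(4.0, 128, 140), 3))
```

With these inputs, an encoder that used 140 kbps instead of 128 kbps has its rating scaled by sqrt(128/140), a reduction of a little over 4%.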

Or, perhaps, the calculation of AAC's efficiency isn't the overly-complex pseudoscientific junk I'm being told of in this thread.    It would make more sense if this format's efficiency were measured just like that of any other codec...by compression rate at a quality point.  But if the "house codec" must be measured differently to try to sway the results in its favor, let's squeeze out the information we need to do that. m'kay?  *sheesh*

Late edit:  typos and grammar...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 12:12:26
ScorLibran,

Again - your method assumes that:

a - Encoders use simple flat SNR scaling for increasing/decreasing bit rate
b - AAC subjective performance is linear with bits/sample
c - SNR decrease of N would automatically yield decreased SDG
d - You are using limited sample set to get bit rate information

All four points are against this method of approximation.

You will also notice that subjective ranking highly depends on the other encoders in the test and also on the presence of a low-rate anchor.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 12:22:04
Quote
composite rating = quality/bitrate

( test rating * SQRT(target bitrate) ) / SQRT(actual bitrate) = composite rating

Please stop this pseudo-scientific BS already. There's no way you can reliably create any kind of approximation formula here. Every codec should have its own formula for starters, and those would need lots of testing. And you are trying to do this so-called approximation based on only 12 samples (as if 12 samples weren't inaccurate enough by themselves, gotta add some approximation formulas from your hat..)
It seems it's becoming increasingly difficult to keep HA from getting totally out of control recently (and I'm not only talking about this thread)..  *sigh*
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: duartix on 2004-03-03 14:57:30
Quote
Decrypted my results
What tool do you use to decrypt the *.erf files?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 15:56:53
Quote
Again - your method assumes that:

a - Encoders use simple flat SNR scaling for increasing/decreasing bit rate
b - AAC subjective performance is linear with bits/sample
c - SNR decrease of N would automatically yield decreased SDG
d - You are using limited sample set to get bit rate information

All four points are against this method of approximation.

You will also notice that subjective ranking highly depends on the other encoders in the test and also on the presence of a low-rate anchor.

If "a" would require a modified calculation, then please tell me what that should be.  (I think this is the 4th or 5th time this has been requested.)  "b" is addressed with a non-linear calculation to support this.  Again, if the particular calculation isn't correct, then I'm certainly willing to try a new one.    You'll have to define your acronyms for "c".  And for "d", the limited sample set is already a qualification for any claims made, whether for the composite rating or the unadjusted ratings.  (This has already been addressed exhaustively, both in this test and in every other to my knowledge.)

The shortcomings of the rating system are certainly not limited to this approach.

Quote
Please stop this pseudo-scientific BS already. There's no way you can reliably create any kind of approximation formula here. Every codec should have its own formula for starters, and those would need lots of testing. And you are trying to do this so-called approximation based on only 12 samples (as if 12 samples weren't inaccurate enough by themselves, gotta add some approximation formulas from your hat..)
It seems it's becoming increasingly difficult to keep HA from getting totally out of control recently (and I'm not only talking about this thread).. *sigh* 

And (said yet again  ) the limited sample set can only be represented with a qualifying statement covering the scope of the sample set, which would be required even for quoting the unadjusted ratings of these codecs.

An incomplete idea != pseudoscience.  If I had labeled this as "complete" without accurate development and testing, then it would be.  See the difference?  Because it doesn't change the ratings to something more favorable is no reason to label an attempt to "equalize the scale" as invalid. 

If the formula needs to be changed, then I completely agree, especially if it will make the system more "acceptable".  I have stated repeatedly that the method makes no "official" statement, that the formula is not accurate without further input from the experts, and that its results could only be qualified in the scope of the sample set tested.

As for HA being "out of control", I have no idea what that refers to (unless I've missed some specific activity over the past few weeks  ).  My posts, on the other hand, are on-topic, respectful, inquisitive, relevant, and seek to resolve an ongoing dilemma that has been somewhat of an issue here.  If the approach is flawed, then let's fix it.  If this is not the right approach at all, then let's replace it with one that is.  Abandoning any hope to resolve an issue is no solution.  That would, instead, be reminiscent of /.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-03 16:50:06
I don't see any possible way to change codec ratings based on used bitrates. It makes the test result completely opaque and dependent on dubious variables. All this appears to be wild speculation and trying to outwit the codec's VBR algorithm. (Which hopefully is more sophisticated than any simple result recalculation formula.)

A far better way, IMHO, to overcome the bitrate differences and the resulting quarrels would be to use a sample suite more representative of the total covered music spectrum, i.e. use more ordinary samples instead of problem samples. Then the VBR codecs would be forced to underbit some files, which could possibly cause problems for them. I say possibly, but it's quite reasonable to think that all codecs will return near-transparent results for the easy samples, even those that use a lower bitrate. And as a user, I'm more interested in the worst-case scenario anyway.

(While I think the idea is inherently flawed, I think JohnV's reaction was a little excessive.)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 16:58:21
Quote
this is not the right approach at all, then let's replace it with one that is.  Abandoning any hope to resolve an issue is no solution.  That would, instead, be reminiscent of /. 

Sure there is a right approach: arrange a new group test using the settings you are trying to approximate.
Anything else is pure speculation. Psychoacoustic audio and artifacting is way too complex an issue for the kind of approximation you are presenting here (not to mention the small number of samples).
But remember also that there's no sense in tweaking "fatboy.wav to vbr 128kbps". So for testing with near-identical individual sample bitrates, the only correct method is to use cbr or abr.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 17:13:13
Quote
I don't see any possible way to change codec ratings based on used bitrates. It makes the test result completely opaque and dependent on dubious variables. All this appears to be wild speculation and an attempt to outwit the codec's VBR algorithm. (Which hopefully is more sophisticated than any simple result recalculation formula.)

A far better way, IMHO, to overcome the bitrate differences and the resulting quarrels would be to use a sample suite more representative of the total covered music spectrum, i.e. use more ordinary samples instead of problem samples. Then the VBR codecs would be forced to underbit some files, which possibly would cause problems for them. I say possibly, but it's quite reasonable to think that all codecs will return near transparent results for the easy samples, even those that use a lower bitrate. And as a user, I'm more interested in the worst case scenario anyway.

I support your view entirely. 

Your second paragraph isn't so much of an issue though, IMO, since any sample set used for this test would be limited, whether "problem samples" or "ordinary samples", and as such quoting results will always require qualification.  For the purpose of determining the efficiency of a codec, anyway.  A need to use "ordinary samples" for any other reason would have its own justification as well.

Your first paragraph covers a relevant argument, but it's not my argument, per se.  I'm not trying to "outwit the VBR algorithm" of each codec, and if it seems like I have been, it's incidental.  The root of my argument is the need to resolve this issue...

Quote
...although VBR doesn't really put relevance on bitrate, people do, since bitrate determines filesize, and filesize is used to determine how much music you can fit into a limited space.

Most people (I believe) would like the best sound quality out of their AAC encodings, but many also have limited HD space, and therefore would be very interested in knowing the most efficient codec.

I know it may look like I'm trying to "nail down" VBR operating principles when exploring my point, but this is only because...If filesize (and hence, bitrate) were NOT of key importance to people, then there would be many more people using a lossless codec rather than AAC.

Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-03 17:21:11
Quote
move on to WMA standard

why tf does wma have the right to be called a standard?   

and this crap is one more reason why rjamorim should include apple's aac in the multiformat test, so it can be compared with wma9 std
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 17:21:28
Quote
If "a" would require a modifed calculation, then please tell me what that should be.


It is way too complex and different for each encoder implementation.

For example, Nero AAC has at least 100 parameters that could be tweaked - each parameter has its own range (tone masking noise from 0.0 dB to 30 dB,  M/S thresholds from 0.0 dB to 20 dB, short block switching constants - from 0.0 to 1000.0, etc...)

Each of these parameters has an impact on the target bit rate - but the impact on overall quality is different and hard to model.  Especially if you take into consideration that each listener is more sensitive to one property (say, pre-echo) than another (say, stereo image)

So - for each bit rate / vbr preset we want to find the optimum set of these parameters that gives the target average bit-rate (less work for the 2-pass quantizer/noise shaper)  - and the final quality does not scale in a linear manner - far from that.

Quote
(I think this is the 4th or 5th time this has been requested.) "b" is addressed with a non-linear calculation to support this. Again, if the particular calculation isn't correct, then I'm certainly willing to try a new one. You'll have to define your acronyms for "c". And for "d",


SNR is "Signal to Noise Ratio" - or even better "Signal to Mask Ratio" (SMR)

BUT the final SDG (this is the mark - 1.0 to 5.0)  does not scale in a linear fashion with the SMR - and SMR does not scale in a linear fashion with the bit-rate - now, do you get my point?  You have two variables which are not linearly dependent on each other - and the final result which is not linearly dependent on those variables -  and, finally, the way we allocate SNR might differ for pre-echo content, classical, etc..  making your "guessing" approach not very good.

Because your approach is missing one known thing - i.e. the Perceptual Entropy of the source - meaning, the minimum amount of bits required to encode something with the desired quality - for tracks like fatboy - it is, say, 192 kb/s - and for tracks like es02 (german speech) it is 96 kb/s -  now, how could a fixed VBR setting at, say, 128 possibly give the same quality, i.e. "SDG", for both sources?


So, at the end, you have these things:

1 - Perceptual Entropy of the Source (minimum number of bits to code a sample in a transparent manner)  depends on: psychoacoustic model

2 - Target SMR for a particular bit rate (depends on perceptual entropy)

3 - Target SDG (depends on SMR, Huffman encoder performance, bit allocation, etc..)

You will end up with the conclusion:

SMR for a fixed bit rate depends on perceptual entropy of the source

And perceptual entropy of the source could be anywhere between 80-300 kb/s

SDG of the encoder depends on SMR, pre-echo parameters, bandwidth,  mid/side stereo coding efficiency >and< user's sensitivity to each of these parameters
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 17:30:16
Quote
Sure there is a right approach: arrange a new group test using the settings you are trying to approximate.
Anything else is pure speculation. Psychoacoustic audio and artifacting is way too complex an issue for the kind of approximation you are presenting here (not to mention the small number of samples).

So we could have, perhaps, five popular AAC codecs to test.  We'll try to get as close as is feasible to a target of, oh, how about 128kbps?  And we could use, let's say, 12 audio samples which may not be the "toughest" problem tracks, but ones found in the past to be harder to encode than ordinary music.

Hey, I found the results of just such a test! (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&) 

So, now that we have an accurate results set for the samples tested, perhaps there's a way to adjust for variances in bitrate, since that's something that many people have brought up as an issue.  I found someone starting this kind of approach here (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189723). 

No need to reinvent the wheel, especially on the subject of AAC codecs.  We already have an idea proposed (but, as said, far from accurate).  But if it's the wrong approach entirely, then it should be scrapped.

But these questions remain...

Quote
"My quality target is X. How many files encoded to my quality target with Nero AAC can I put on my 20GB hard drive? OK, how many files encoded to the same quality target with QT AAC can I put in the same space?"

...and so on, for the other codecs.

With a fixed quality target, and discarding any kind of composite rating method, how else can we possibly answer these questions without too much vagueness, and without telling every person who asks to go "choose a setting and encode their collection, then keep starting over until they find the answer themselves"?

Therein lies the essence of my quest. 


Edit:

@ Ivan:  I just read your post after my last submission.  Thanks for the explanation.  I have to get back to work, but I'll return to this later to study it further.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 17:46:56
Quote
So we could have, perhaps, five popular AAC codecs to test.  We'll try to get as close as is feasible to a target of, oh, how about 128kbps?  And we could use, let's say, 12 audio samples which may not be the "toughest" problem tracks, but ones found in the past to be harder to encode than ordinary music.

Hey, I found the results of just such a test! (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&)  

So, now that we have an accurate results set for the samples tested, perhaps there's a way to adjust for variances in bitrate, since that's something that many people have brought up as an issue.  I found someone starting this kind of approach here (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189723). 

That's an 8-month-old test which uses a different sample set and lacks Real and Compaact. Hardly results which can now be used as the basis for an approximation in any reasonable way...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 18:22:00
Quote
That's an 8-month-old test which uses a different sample set and lacks Real and Compaact. Hardly results which can now be used as the basis for an approximation in any reasonable way...

Erm.....the link goes to page 1 of this thread. 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: 21_already on 2004-03-03 18:45:59
I'd like to make a suggestion that is for certain impractical for a test that requires taking as many test samples as possible to verify its reliability, but here it goes.

I'd really like to see a test (in the future) measuring audio samples of extended length, containing not just music, but sections of silence, spoken word sections and extremely spatially dynamic sections.

Why? Well, this could really show how good the vbr decision making is over some extremes within a sample, but more importantly let's remember that audio compression is also used in MOVIES, without the benefit of the user being able to change presets for different sections of the film. It's true that the video compression usually takes precedence, but improving the audio always helps. In fact it's quite important for audio to hit its file size targets when encoding video since I'd preferentially code the video first, leaving a defined space for the audio (becoming a real reason for the use of 2-pass audio coding).

In any case, I suppose it's too much of a burden for testers to sit through even 10 minute tracks, plus I don't know how a tester would rate one segment but not another, but I really would like to know which AAC codecs DON'T throw bits at a section I know is pretty much silent (not to say that they excessively do).

Well, I guess it's a stupid suggestion. Don't burn me too bad.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-03 19:02:45
21_already:
Longer samples are not needed for this. Even a short sample with spoken words (preferably different speakers) and near silence would suffice for that purpose. There was a discussion preceding this test about whether a speech sample should be added.
Because of lack of time and the unclear quality of the source (compressed DVD) this idea was discarded; but I think it's still something to keep in mind for the next tests.

The problem with this type of sample, however, is that they are generally easy to encode, and finding differences becomes very difficult for the listener.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 19:20:10
Quote
Quote
That's an 8-month-old test which uses a different sample set and lacks Real and Compaact. Hardly results which can now be used as the basis for an approximation in any reasonable way...

Erm.....the link goes to page 1 of this thread.   

Uh, I thought you meant Roberto's earlier 128kbps cbr AAC test, especially because you have been screaming about the bitrate issues and now talked about 128kbps. I don't know why you linked to this current test and this same thread.

Could be that I'm just tired of the amount of "you know what" in this thread. I'd hope those who actually raise issues like this "quality approximation" and start throwing formulas, would be experienced in both how psychoacoustic audio encoders work and listening testing with the codecs in question (well, anybody like that wouldn't raise this particular issue). It's just pretty frustrating to read and explain things again and again. Fortunately at least one developer (Ivan) is also interested in keeping some kind of knowledge level in the HA discussions.
If you suggest something and start wide-scale speculation, it would be good to first have at least basic knowledge of the related issues you are trying to handle.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-03 19:57:29
Quote
I'd like to make a suggestion that is for certain impractical for a test that requires taking as many test samples as possible to verify its reliability, but here it goes.

I'd really like to see a test (in the future) measuring audio samples of extended length, containing not just music, but sections of silence, spoken word sections and extremely spatially dynamic sections.

Why? Well, this could really show how good the vbr decision making is over some extremes within a sample, but more importantly let's remember that audio compression is also used in MOVIES, without the benefit of the user being able to change presets for different sections of the film. It's true that the video compression usually takes precedence, but improving the audio always helps. In fact it's quite important for audio to hit its file size targets when encoding video since I'd preferentially code the video first, leaving a defined space for the audio (becoming a real reason for the use of 2-pass audio coding).

In any case, I suppose it's too much of a burden for testers to sit through even 10 minute tracks, plus I don't know how a tester would rate one segment but not another, but I really would like to know which AAC codecs DON'T throw bits at a section I know is pretty much silent (not to say that they excessively do).

Well, I guess it's a stupid suggestion. Don't burn me too bad.

Some samples in this test were 30 seconds and others were 20 seconds....

I dreaded the 30 second samples...

I would not take part in a test with 10 minute samples
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 20:08:50
Besides, copyright would hit me hard if I distributed samples longer than 30 seconds.

(there is no law anywhere allowing samples of up to 30 seconds to be used for research purposes; legally even a 3-second sample is under copyright. But, oh well, it seems it became vox populi that distributing excerpts of up to 30 seconds is OK)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 21:32:26
Quote
Quote
Quote
That's an 8-month-old test which uses a different sample set and lacks Real and Compaact. Hardly results which can now be used as the basis for an approximation in any reasonable way...

Erm.....the link goes to page 1 of this thread.   

Uh, I thought you meant Roberto's earlier 128kbps cbr AAC test, especially because you have been screaming about the bitrate issues and now talked about 128kbps. I don't know why you linked to this current test and this same thread.

Could be that I'm just tired of the amount of "you know what" in this thread. I'd hope those who actually raise issues like this "quality approximation" and start throwing formulas, would be experienced in both how psychoacoustic audio encoders work and listening testing with the codecs in question (well, anybody like that wouldn't raise this particular issue). It's just pretty frustrating to read and explain things again and again. Fortunately at least one developer (Ivan) is also interested in keeping some kind of knowledge level in the HA discussions.
If you suggest something and start wide-scale speculation, it would be good to first have at least basic knowledge of the related issues you are trying to handle.

I've rescanned the thread, and haven't seen anyone screaming, especially me.  (That's what the caps lock key usually denotes.)  This is a discussion which has clearly ruffled a few feathers, but it really should not.  As I've said, this discussion is one which is relevant and frequently asked about.  My seeking a solution to this issue with this particular test is quite on-topic.

As for having a basic knowledge of the related issues, that's exactly what I've brought to the table.  I've become (not by choice) a bit of a "liaison" from many other forums to Hydrogenaudio, as many people I've talked to are intimidated even lurking here.  This, to my knowledge, is the best place to come for real knowledge about psychoacoustic audio encoding, and I've volunteered many hundreds of hours to "translate" the information here into something the people who come to me can understand.  I've become a "bridge", if you will.  So the "related issues" are what to tell these people who want an answer to the questions I've repeated several times.  Finding that out is my job.  Developing these audio codecs is Ivan's, among several others here.  We each have our relevant areas of knowledge on this subject.

If there is a better place to go to build knowledge about AAC, then please let me know.  If there is not one, and if a formula of some kind cannot be created to provide information to the people who have come to me about which AAC codec is the most efficient, then we've traversed new ground here (or are at least attempting to).  And yet again, giving up may actually not be the best solution.

So, back on-topic...codec efficiency is measured as quality/filesize, right?  And filesize is determined by average bitrate, right?  Then we hit the wall.  Efficiency is quality/bitrate, yet VBR modes shouldn't be measured from their average bitrate.  Bitrate determines filesize.  Which means the efficiency of an AAC codec in VBR mode should not be measured as quality/filesize?  Then how can its efficiency be measured?  Encoding time?  I'm afraid that's not answering the questions people have about test results such as these.

One of the natural interpretations of these test results by a person is:  How big (on average) will the files be when encoded with each of these codecs to the threshold of acceptable sound quality?  (Sound quality which at least two codecs in this test would likely satisfy for many people at the settings they were tested with here.)
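
The "bitrate determines filesize" arithmetic can be sketched as follows. This is only an illustration: the function names, the 4-minute track length and the example bitrates are assumptions, container overhead is ignored, and 1 GB is taken as 10^9 bytes. It covers the filesize half of the ratio only and says nothing about quality.

```python
def file_size_bytes(avg_bitrate_kbps, duration_s):
    """Approximate size of an encoded file from its average bitrate."""
    # 1 kbps = 1000 bits per second, 8 bits per byte; container overhead ignored.
    return avg_bitrate_kbps * 1000 * duration_s / 8


def tracks_that_fit(capacity_gb, avg_bitrate_kbps, track_duration_s=240):
    """How many equal-length tracks fit in the given space (1 GB = 10**9 bytes)."""
    capacity_bytes = capacity_gb * 10**9
    return int(capacity_bytes // file_size_bytes(avg_bitrate_kbps, track_duration_s))


if __name__ == "__main__":
    # Hypothetical average bitrates for two encoders at a comparable setting.
    for kbps in (128, 140):
        print(kbps, "kbps ->", tracks_that_fit(20, kbps), "four-minute tracks in 20 GB")
```

With these assumed numbers, a difference of 12 kbps in average bitrate changes the count by a few hundred tracks on a 20 GB drive.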

(Note that there is no screaming....Note that this is on topic...Note that nothing is gained by insulting my level of knowledge instead of providing answers to relevant questions.) 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: kwanbis on 2004-03-03 22:12:09
Quote
Quote
move on to WMA standard

why tf does wma have the right to be called a standard?   

and this crap is one more reason why rjamorim should include apple's aac in the multiformat test, so it can be compared with wma9 std

i think it means not WMA PRO
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 22:18:00
ScorLibran: I'm really going to start insulting you soon if you refuse to even try to understand what has been said here before..
If you read the posts about this issue again, you should at least get a glimpse of how hard this kind of approximation method would be to implement in practice, not to mention how hard it would be to verify its validity.
In practice it would need another group test to verify the validity, but even then the group test would be limited to a very small number of samples, and no doubt you'd like to use this "formula" for more wide-scale conclusions.

A much easier job than trying to create even a somewhat working approximation system would be just to arrange a new test and just be happy that it gives some indication of the quality with 12 or so samples. I'm saying again that the only reasonable method to measure the quality of identical bitrate samples is to use cbr/abr, not any kind of impossible approximation or vbr tweaking which can never be fair or make sense (think about tweaking fatboy down to 128kbps vbr or some speech sample up to 128kbps vbr; an extreme example, but there are problems of this kind if the "vbr bitrate tweak method" were implemented for a listening test, not to mention that all vbr codecs have different emphases, so choosing samples fairly would be very hard).

I really hope you start to understand that what you are asking is not going to happen. It's just not possible in practice.

PS. Sorry if I have insulted you. I try to be reasonably nice here as often as possible, but on some days it's very hard..
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-03 22:18:08
what about using atrac in the multiformat test?

sony will use it for its own music store (or does it already?) and it wasn't tested anywhere till now...
maybe it's interesting?

Quote
i think it means not WMA PRO

Title: AAC at 128kbps v2 listening test - FINISHED
Post by: kwanbis on 2004-03-03 22:20:09
Quote
If filesize (and hence, bitrate) were NOT of key importance to people, then there would be many more people using a lossless codec rather than AAC.

if that is the case, they can use 64KBPS CBR
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 22:20:37
Quote
So, back on-topic...codec efficiency is measured as quality/filesize, right? And filesize is determined by average bitrate, right? Then we hit the wall. Efficiency is quality/bitrate, yet VBR modes shouldn't be measured from their average bitrate. Bitrate determines filesize. Which means the efficiency of an AAC codec in VBR mode should not be measured as quality/filesize? Then how can its efficiency be measured? Encoding time? I'm afraid that's not answering the questions people have about test results such as these.


Ok, I'll try introductory specs...  psychoacoustics and coding for dummies

Let's say this -

CBR = Constant Bit Rate, Variable Quality,  ok?
VBR = Variable Bit Rate, Constant Quality, - I guess everyone agrees with this.

Q - But how is the "quality" obtained, what is it?
A - "quality" is defined in the encoder as the ratio between the amount of noise in each frequency band (Noise) and the Masking threshold - this is called NMR (Noise-to-Mask Ratio)

Q - How is the masking threshold obtained?
A - The masking threshold is obtained in the psychoacoustic model

Q - Is the masking threshold constant for a variety of content?
A - No, the masking threshold is not constant - it depends on signal statistics


Ok.. now we established some general measures -

Q - I want to reach quality Q, how many bits do I need?
A - It depends on the so-called perceptual entropy and the encoder's deviation from it

Q - But what is Perceptual Entropy?
A - It is a measure that estimates the number of bits required to code a signal given its masking threshold

For AAC:  BITS = 0.6 * PE + 24 * SQRT ( PE ), where PE is LOG2(SUM(ENERGY/THR))

Q - Is PE constant for music?
A - No, it is not,  you would find out that PE varies between signals a lot

Q - What are deviations from PE? What do you mean?
A - No encoder is perfect, and PE is the theoretical minimum of the transmitted information - 0.6 and SQRT are there to approximate the efficiency of AAC's Huffman encoder and side information packer, as well as MDCT diagonalization power.  However, it is just an approximation, and in the real world it could be off by more than 10%

Q - Great.. so I have the PE.. and I could estimate target quality by scaling it? Right?
A - Wrong.. target subjective quality is not just frequency domain NMR dependent

Q - But what is it then?
A - It is a combination of NMR, stereo image quality, temporal performance (mind you that AAC psychoacoustics works in the freq domain), perceived bandwidth, noise loudness, noise disturbance over time, roughness, etc..  and each person weights each of these parameters differently in a final 'SDG' score - and each encoder has different means of scaling these figures up and down depending on the bit rate.

Q - Ok, but if I have material - I want to know its quality for target size A and encoder B?
A - You'll have to encode to get the score, each encoder scales bit rate differently and this depends on the material, encoder tuning, etc.. there is a "shortcut" - if you build your own database of bit rate deviations vs. quality deviations - but again, each encoder could also be "produced" to optimize itself for a particular deviation, making the job not very easy

Q - Does the quality scale with the bit-rate deviation, for example if I want to code the signal at 160 kb/s and PE says it needs 192 kb/s?
A - It does, but the way it scales, and the amount of increase/decrease is encoder and algorithm dependent
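
The bit estimate quoted in the FAQ above can be written out as a short sketch. It follows the formula exactly as posted (PE as LOG2 of the summed energy/threshold ratios); the function names and the example numbers are made up for illustration, and a real encoder's PE calculation is more involved. As the FAQ says, the estimate itself can be off by more than 10%.

```python
import math


def perceptual_entropy(band_energies, band_thresholds):
    """PE as given in the post: LOG2(SUM(ENERGY/THR)) over the bands."""
    ratio_sum = sum(e / thr for e, thr in zip(band_energies, band_thresholds))
    return math.log2(ratio_sum)


def estimated_bits(pe):
    """BITS = 0.6 * PE + 24 * SQRT(PE), as quoted for AAC in the post."""
    pe = max(pe, 0.0)  # guard: sqrt is undefined for a negative PE
    return 0.6 * pe + 24.0 * math.sqrt(pe)


if __name__ == "__main__":
    # Hypothetical band energies and masking thresholds, not measured data.
    energies = [8.0, 8.0]
    thresholds = [1.0, 1.0]
    pe = perceptual_entropy(energies, thresholds)
    print("PE:", pe, "estimated bits:", estimated_bits(pe))
```

Because of the square-root term, the bit estimate does not grow in proportion to PE, which is one of the non-linearities the answers above keep pointing at.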


Ok.. this was the small FAQ.. now back to the topic

Signal statistics vary from signal to signal;  if the encoder uses real VBR, which is proportional to the PE - you will find a big difference between particular audio items - for encoding to a specific size ABR is the best option.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 22:28:49
Quote
i think it means not WMA PRO

Yes. WMA Std is the WMA codec for stereo, 16bit and sampling rates up to 48kHz. WMA Pro is for 24bits, multichannel or high sampling rates.

WMA Pro was tested in the last multiformat 128kbps test. WMA Std will be tested in the upcoming one.

Quote
what about using atrac in the multiformat test?

sony will use it for its own music store (or does it already?) and it wasn't tested anywhere till now...
maybe it's interesting?


Atrac, Atrac3 or Atrac3+?

RealProducer 8.5, SonicStage1 or SonicStage2?

See the problem? :/

Also, I refuse to install that #$&%! crapola SonicStage in my PC. Last time I tried I had to reinstall Windows from scratch. If people want it, somebody will have to send me encoded -> decoded samples.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 22:31:09
Actually

The only feasible method would be a 2-pass VBR (with target bit-rate - which makes it ABR at the end), where you would analyse bit rate demand in the first pass - and inform the user how many bits are missing, and make some "rough" measure of a missing_bits->amount_of_artifacts map.

As you see, if you set a target of, say, 128 kb/s - and perform the same thing for, say, Fatboy.wav  and, say, some_speech.wav you will get something like this:

Fatboy: Oops we're missing at least 64 kbit/s
Speech: Man, we have extra 32-64 kbit/s to waste, it's transparent anyway

Do you see what I mean?
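
The first-pass report described above might look something like this sketch. The per-file bit demand figures are placeholders based on the rough numbers mentioned earlier in the thread (about 192 kb/s for fatboy, about 96 kb/s for speech), and the function name is hypothetical; a real encoder would derive the demand from its psychoacoustic analysis.

```python
def bit_budget_report(target_kbps, demand_kbps):
    """Compare the first-pass bit demand of each file against the target bitrate."""
    report = {}
    for name, demand in demand_kbps.items():
        delta = target_kbps - demand
        if delta < 0:
            report[name] = f"missing {-delta} kbit/s"
        elif delta > 0:
            report[name] = f"{delta} kbit/s to spare"
        else:
            report[name] = "on target"
    return report


if __name__ == "__main__":
    # Placeholder demands: rough figures from the discussion, not measurements.
    demand = {"Fatboy.wav": 192, "some_speech.wav": 96}
    for name, verdict in bit_budget_report(128, demand).items():
        print(f"{name}: {verdict}")
```

The report only says how far each file is from the target; mapping the missing bits to audible artifacts is the hard, encoder-specific part.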
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: 21_already on 2004-03-03 22:44:53
Hmm i seem to be posting more than ever before.

They keep telling me in class that ppl are now using Neural Network Software to discern patterns that might not be obvious to traditional mathematical models, but for which there are many sets of sample data (they use it in genetics/proteomics to search, for example, through microarray data of everything an organism is doing and work out how it might all be related).

Now they say in class that you have to train the neural network first on a set of known data, then try it on a set of data you pretend you don't know, and see if its predictions match the reality.

Can you see where i'm going yet 

If we took enough ABX test samples and tracks along with perceptual quality scores (as determined by the hundreds of real live humans on HA for example) and fed them into a neural network, it could 'learn' what it is in sound that makes us think it's high quality, and what it is that makes us think it's bad, spitting out a predicted human score for a given track. Perhaps it could even take a source track and recommend a codec and settings at a desired 'quality level'.

I recognise that there are problems with this, but i figure it's just as good a way as any to measure quality, and if you had the home version, you could train it yourself for your own perceptual and musical tastes.
hmm, they're probably doing this already aren't they???
I guess this is shifting away from AAC but it's still in the interests of testing
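
The train-then-check procedure described above can be sketched in a few lines. This is a minimal illustration only: the stand-in "model" is a mean baseline rather than a neural network, the function names are hypothetical, and the rows are placeholder (features, score) pairs, not real listening-test data.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and set aside a fraction the model never sees in training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean_baseline(train_rows):
    """Stand-in for training: always predict the mean training score."""
    mean_score = sum(score for _, score in train_rows) / len(train_rows)
    return lambda features: mean_score


def mean_abs_error(predict, rows):
    """Average gap between predicted and actual scores on the given rows."""
    return sum(abs(predict(features) - score) for features, score in rows) / len(rows)


if __name__ == "__main__":
    # Placeholder rows: (features, listener score). Invented values for illustration.
    rows = [((i,), 3.0 + 0.1 * (i % 3)) for i in range(12)]
    train, test = holdout_split(rows)
    model = fit_mean_baseline(train)
    print("held-out error:", mean_abs_error(model, test))
```

Any real predictor would replace `fit_mean_baseline`; the point of the split is that the error is measured only on rows kept out of training.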
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-03 22:49:19
Quote
just be happy that it gives some indication of the quality with 12 or so samples

Argh. I'm tired of this "only (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189954) 12 (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189758) samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189765)" bullshit (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189918). You claim ScorLibran is speculating, but you are also speculating badly claiming 12 samples " is too few"  and " doesn't show the whole picture (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189743)", without proof or anything like that. And breaking rule 8 while at it.

I will only accept this kind of criticism after you conduct a public test with at least a hundred samples and prove the results are different and more significant. Anything short of that is pure wild speculation, and should have no place in this forum.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-03 22:57:59
Quote
Quote
just be happy that it gives some indication of the quality with 12 or so samples

Argh. I'm tired of this "only (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189954) 12 (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189758) samples (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189765)" bullshit (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189918). You claim ScorLibran is speculating, but you are also speculating badly claiming 12 samples " is too few"  and " doesn't show the whole picture (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189743)", without proof or anything like that. And breaking rule 8 while at it.

I will only accept this kind of criticism after you conduct a public test with at least a hundred samples and prove the results are different and more significant. Anything short of that is pure wild speculation, and should have no place in this forum.

I'm breaking rule #8?  lol.. 
Do you claim that 12 samples is enough then? Ask codec developers if 12 arbitrary samples is all they need for tweaking a codec... or if 12 arbitrary samples will reveal all that there is to reveal from their codecs.

My personal testing wouldn't be comparable to group testing results anyway; that's why there is group testing in the first place, in order to achieve some kind of common trend for the samples tested, and group testing with a hundred samples is impossible. But the fact that you even ask this reveals that you have not done much listening testing.
Besides, I'm not claiming that the results would be completely different. Of course a 12 sample group test gives an indication, like I have said all the time. But with a different set of samples the results could be differently emphasized in one way or another.
If you don't believe this, let's arrange an otherwise identical test but I choose the samples...
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 23:15:09
Quote
If we took enough ABX test samples and tracks along with perceptual quality scores (as determined by the hundreds of real live humans on HA for example) and fed them into a neural network, it could 'learn' what it is in sound that makes us think it's high quality, and what it is that makes us think it's bad, spitting out a predicted human score for a given track. Perhaps it could even take a source track and recommend a codec and settings at a desired 'quality level'.



www.itu.int (BS.1387)
www.peaq.org
www.opticom.de

And..

www.mp3-tech.org/  -> Programmer Sources - Misc - EAQUAL
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-03 23:19:56
@Ivan:  Thank you very much for the information, and for the very kind words.    And the people I talk to who are too afraid to come here will thank you as well, I'm sure.

@JohnV:  Careful...let's remember TOS#2, m'kay?  Now, before the threats get a little too out of hand, let's forget the formula that apparently created so much turmoil.  Forget the approach.  Forget about bitrate comparisons.  Forget "composite ratings" of the test results.  Forget everything except the one essential question.  (And no, neither you nor anyone else has addressed this particular item yet, so it's not asking for a "repeat" from anyone...)

How can the efficiency of an AAC file encoded with a VBR setting be measured at the file level?

That's all there is to it.  An answer, or saying "It can't be measured"...either one would be relevant.  And the answer should not require me to be a codec developer, I'm quite sure.  However, you cannot change the definition of "codec efficiency", no matter how hard you try.  If I'm smart enough to break down very complex subjects into language understandable to a layperson "on the fly", at least enough to answer a question they have, then I'm sure you and Ivan are as well.  (That is not at all an insult, but instead a compliment, as I have the highest respect for both of you.)

Everyone, including you, had their chance to provide samples and to provide their input on which codec settings should be used to make the bitrates as close to the test target as feasible.  Now is not the time to cry about your codec's generosity with bits causing its measured efficiency to dive.  Questioning scientifically robust test results after the fact would be delving into pseudoscience, and not what HA is about.  You should avoid adopting slashdot attitudes. 

Thanks in advance for what I'm sure will be an actual answer to my very easy question! 

Edit:  Found in search results...fixed typos.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-03 23:31:38
Quote
How can the efficiency of an AAC file encoded with a VBR setting be measured at the file level?


Very hard without knowing the signal statistics

Ok, think of it like this:

bits_required_to_encode = 0.6 * PE + 24 * sqrt(PE) + X

Where X is the inefficiency of the encoder implementation and the deviation due to signal statistics

So, for each PE you'd get a different bits_required_to_encode

- PE highly depends on content - it can be anywhere between 0 and 3 bits/sample

translate:

- Bit rate highly depends on content - it can be anywhere between
0.6 * (0 to 3 bits/sample) + 24 * sqrt(0 to 3 bits/sample) + X


So, what do we do? We scale some parameters to create "presets" that give an average bit rate, measured on average content (a lot of samples) - however, the amount of distortion is not linear in the scaling limit.

And.. one small thing - PE depends on the psychoacoustic model in the encoder, it is not universal
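
To make the formula above concrete, here is a minimal Python sketch that simply evaluates it over the quoted PE range. The function name `bits_required` and the default `x = 0` are assumptions made for illustration only; the post does not give the units or scaling of the result, and a real encoder's X is unknown and content-dependent.

```python
import math

def bits_required(pe: float, x: float = 0.0) -> float:
    """Evaluate 0.6 * PE + 24 * sqrt(PE) + X exactly as written in the post.

    x stands for the encoder-inefficiency / signal-statistics term. It is
    set to 0 here only because no real value is known (an assumption).
    """
    if pe < 0:
        raise ValueError("PE cannot be negative")
    return 0.6 * pe + 24.0 * math.sqrt(pe) + x

# Sweep the PE range quoted in the post (0 to 3 bits/sample).
for pe in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"PE = {pe:.1f} -> {bits_required(pe):.2f}")
```

The result moves over a wide range as PE goes from 0 to 3, which is the point being made: the same settings land on different bit rates for different content, before X is even considered.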
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: 21_already on 2004-03-04 00:20:12
Whoa....

Thanks Ivan 

hehe, too bad i don't have a card to buy a copy of the ITU recommendations for PEAQ/BS1387 (the same thing right?), not that i would understand it. However, from what i gather from the humongous contents page and stuff, it still relies on modeling the principles behind what happens to sound as it makes its way to (and through) the listener, and how their brain then decodes it with a whole lot of non-linear stuff going on. I hope i'm not out of place saying this but i don't think neural network systems are like this, i.e. the developers don't know how the neural networks are actually doing their job, they just do, the so-called black box. All it takes to get neural networks working is a whole lot of test samples and a lot of processing power. But then i guess if you don't understand how the process is working then it's not really very useful for codec development, only evaluation. Or am i wrong again? i think i attract corrections

Hmm, ok, so this program EAQUAL lets me test audio and it adheres to the testing standard right? Very cool. Actually come to think of it, is there any standard for submitting human test results like this AAC test to a central repository for human audio-perception modelling? (there's a phrase i don't use every day). hehe, the ITU or the ISO or someone could make something like Seti@home
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-04 00:22:45
Quote
I hope i'm not out of place saying this but i don't think neural network systems are like this, i.e. the developers don't know how the neural networks are actually doing their job


Actually - PEAQ uses a neural network to map these "model output variables" to an SDG-like quality scale (1.0 to 5.0)

This neural network was trained using a big database of professional listening tests carried out for ITU, MPEG, EBU, ...

Even with modeling of nonlinearities in human hearing, and with a neural network trained to match listening test results, this tool has big flaws - as already discussed many times on HA.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: 21_already on 2004-03-04 00:38:40
I See, Thank you very much for your replies!!! I don't wish to waste ppls time with things said already. Good luck with development! I really appreciate what you guys do! That applies to all the ppl running HA and the testing !!! Thanx
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-04 02:10:59
Ivan,

Thanks for the additional info.  My plan is to learn more about this, and then try (with help, of course) tackling the question of how to determine the efficiency of AAC (and other codecs) using VBR at some point in the future.

I appreciate all your input. 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rpop on 2004-03-04 06:14:39
Quote
But I said that it still can't be said that it is statistically significant (something either is or isn't significant), only that iTunes is a likely winner because the LSD/2 overlap is pretty small.
Still, strictly speaking, statistically there is no winner.

I understand that strictly speaking statistically, iTunes cannot be said to have better quality than Nero with any kind of significant confidence. However, the results for the individual samples do indicate to me that, when encoding music from a wide variety of genres at 128kbps, I am more likely to get results which are rated as higher quality with iTunes rather than Nero. Excluding the samples where there was too big of an overlap for either iTunes or Nero to be a clear winner, there were samples where iTunes was rated better than Nero with 95% confidence or more, whereas there were none where the opposite happened.

From this it seems pretty clear to me that, were the AAC section or the wiki to have a "Recommended AAC Settings" thread like the MPC forums, iTunes should be the recommended encoder for 128kbps.

Quote
One has to remember that with a different sample set the results could have been different. 12 samples is pretty close to the practical limit this kind of group test can have, but 12 samples is in no way "enough" considering that many deficiencies of all codecs are not revealed here. This is some kind of average indication, but I think a different set of samples could have produced more or less difference between the contenders.

I don't think so, mainly because Roberto's first listening test had 11 completely different samples (only Waiting was kept), yet the results were somewhat similar. I'm more inclined to believe that any 12 difficult to encode samples from a wide variety of genres will produce similar results.

Quote
Somebody said that iTunes is more efficient because of bitrate - that's simply not true. The average bitrates of both codecs with the settings used are within ±3 kbps.


I said that. I understand that given the settings used, on a normal sample of music (several hundred tracks), both iTunes and Nero will have the same bitrate (128kbps), and thus it can't be said that one is more efficient in terms of bitrate. However, on problem samples only, such as these short clips, I do feel that iTunes is more efficient, because it uses a lower bitrate than Nero yet still manages to achieve a higher quality more often than not.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-04 10:04:41
Quote
I don't think so, mainly because Roberto's first listening test had 11 completely different samples (only Waiting was kept), yet the results were somewhat similar. I'm more inclined to believe that any 12 difficult to encode samples from a wide variety of genres will produce similar results.


I think there are two "classes" of current AAC encoders

- Stable, which give more or less similar SDG results for many items
- Unstable, which can sometimes fail badly on some items and sometimes shine.

For example, you could consider FAAC "Unstable" - because it is able to do really good encoding on some items, and to fail very badly on some other, critical ones (si02.wav, velvet.wav).

"Stability" is directly related to encoder "maturity" - i.e. the amount of development and tweaking done - for example, old LAME builds (3.7, etc.) were "unstable" and failed badly on some critical tracks, while now LAME produces predictable output.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Gabriel on 2004-03-04 10:47:39
Quote
hehe, too bad i don't have a card to buy a copy of the ITU recommendations for PEAQ/BS1387

You should find interesting documents related to this on mp3-tech.org
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-04 10:49:13
Quote
Quote
One has to remember that with a different sample set the results could have been different. 12 samples is pretty close to the practical limit this kind of group test can have, but 12 samples is in no way "enough" considering that many deficiencies of all codecs are not revealed here. This is some kind of average indication, but I think a different set of samples could have produced more or less difference between the contenders.

I don't think so, mainly because Roberto's first listening test had 11 completely different samples (only Waiting was kept), yet the results were somewhat similar. I'm more inclined to believe that any 12 difficult to encode samples from a wide variety of genres will produce similar results.

Roberto's first test had a few more hard samples, where FAAC, for example, took a pretty bad hit. The problem with group testing is that the differences are usually quite small, so to get more significant results there would have to be a few harder samples. The ranking would probably stay much the same, but more problem samples that reveal codec deficiencies, even in group-test results, would probably have widened the gap between the last 3 codecs and iTunes and Nero.

IMO a 0.34 ITU point difference between the averages of iTunes and FAAC is too little, and it would be bigger if a larger number of samples (including clearly hard-to-encode traditional codec-tweaking test samples) were included.
As it is, the results stayed clearly within 1 ITU point, which may give a more positive picture of some of the lower-ranking codecs than someone who has tested hundreds of samples would give, considering how many bad failure cases a codec has.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-04 11:15:25
Quote
Quote
hehe, too bad i don't have a card to buy a copy of the ITU recommendations for PEAQ/BS1387

You should find interesting documents related to this on mp3-tech.org

It is possible to register and get three ITU recommendations free of charge.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-04 11:39:35
Quote
Atrac, Atrac3 or Atrac3+?

RealProducer 8.5, SonicStage1 or SonicStage2?

See the problem? :/

simple answer:
the one sony uses for its music store (whatever this is  )
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: askoff on 2004-03-04 12:57:12
Quote
So, back on-topic...codec efficiency is measured as quality/filesize. right?  And filesize is determined by average bitrate, right?  Then we hit the wall.  Efficiency is quality/bitrate, yet VBR modes shouldn't be measured from their average bitrate.  Bitrate determines filesize.  Which means the efficiency of an AAC codec in VBR mode should not be measured as quality/filesize?  Then how can its efficiency be measured?  Encoding time?     I'm afraid that's not answering the questions people have about test results such as these.

I must add a comment against your statement. How can you make a mathematical formula if you don't have precise values? Quality is not a fact; it's only a variable that is different for everyone. The only way to find the most efficient encoder is to test it yourself, or trust the statistics we just produced.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: askoff on 2004-03-04 12:58:27
Quote
Quote
Atrac, Atrac3 or Atrac3+?

RealProducer 8.5, SonicStage1 or SonicStage2?

See the problem? :/

simple answer:
the one sony uses for its music store (whatever this is  )

Or the best one of them
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: SBeaver on 2004-03-04 13:22:56
How about tricking everyone next time and just have original uncompressed sounds in all tests and see how good people really are at hearing.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 13:24:28
Quote
(...) but more problem samples that reveal codec deficiencies, even in group-test results, would probably have widened the gap between the last 3 codecs and iTunes and Nero.

If we really want to see a substantial gap between encoders, without performing another collective test with 100 samples but maybe only one or two silly listeners, I suggest filtering the results.
After all, ~50% of the results are useless. By useless, I don't mean that these results are wrong or illegitimate (and I want to thank everyone for their participation). It's just that some listeners tried to draw a precise hierarchy, and it was partially ruined in the overall result by log files from other people that reveal nothing except that all encoders are ~equally good (which is maybe true from a subjective point of view, but objectively false).
Imagine that 50 other people had joined this test with results showing ~total transparency for all encoders; the overall scores of the test would look like:

- iTunes = 4.88
- Nero AAC = 4.82
- Faac AAC = 4.78
...

All contrast would be lost, because all encoders would tend toward 5.0. And the conclusion would be that all encoders sound equal, or very, very close to each other. From my experience, and that of some other people, that's completely wrong, because some of us perceived serious differences between different AAC encoders.
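
The levelling effect described above is plain arithmetic, and a minimal Python sketch may help. The encoder labels "A", "B", "C", the starting means and the listener counts are all hypothetical numbers chosen for illustration; they are not taken from the test results.

```python
def diluted_mean(mean: float, n: int, extra: int, extra_score: float = 5.0) -> float:
    """Mean rating after adding `extra` listeners who all give `extra_score`."""
    return (mean * n + extra_score * extra) / (n + extra)

# Hypothetical means from 20 listeners (illustrative values only).
start = {"A": 4.2, "B": 3.9, "C": 3.6}

# Add 50 listeners who rate every encoder 5.0 (fully transparent).
after = {name: diluted_mean(m, 20, 50) for name, m in start.items()}

for name in start:
    print(f"{name}: {start[name]:.2f} -> {after[name]:.2f}")

# The gap between best and worst shrinks by the factor n / (n + extra).
print("gap before:", round(start["A"] - start["C"], 3))
print("gap after: ", round(after["A"] - after["C"], 3))
```

The ranking is unchanged, but every gap is multiplied by 20/70, so the plotted means crowd together near 5.0.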

Therefore, if we are interested in the differences between implementations, and not only in the degree of transparency for the average listener, I again request filtered results, like ff123 did for the last multiformat test. Filtered results are neither more nor less scientific or accurate. But low-scorers' results (i.e. the more contrasted ones) are the only pertinent material for comparative results.

Could someone do that (if it's not too much work, of course)? I guess that with filtered results, the gap between encoders will be more significant.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 14:40:46
I've found ff123's graphs again, separating two groups of listeners:
http://ff123.net/export/128exten_split.html (http://ff123.net/export/128exten_split.html)

At the bottom, the difference within the winning group (mpc, aac, vorbis, wmaPRO) is approximately the same for both groups of listeners. But the difference between the new generation and lame is much greater with low-scoring listeners than with high scorers.

Seems to be an easy way to get different results.
http://ff123.net/export/128ext/overall_low.gif (http://ff123.net/export/128ext/overall_low.gif)
http://ff123.net/export/128ext/overall_high.gif (http://ff123.net/export/128ext/overall_high.gif)


EDIT:
A good example is the macabre.wav sample:
- fully transparent for high scorers:
http://ff123.net/export/128ext/macabre_high.gif (http://ff123.net/export/128ext/macabre_high.gif)
- clear difference for wma9pro, compared to vorbis, mpc, aac - and clear difference again between them and lame:
http://ff123.net/export/128ext/macabre_low.gif (http://ff123.net/export/128ext/macabre_low.gif)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-04 15:42:21
Quote
low-scorers results (i.e. the more contrasted ones)

What exactly should be the filter criterion? A low average rating, or the fact that the listener could distinguish many/all encoded files?

Are the results available in a spreadsheet format?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-04 15:56:56
Quote
Therefore, if we are interested in the differences between implementations, and not only in the degree of transparency for the average listener, I again request filtered results

I am personally interested in the quality each encoder outputs for average listeners. Maybe because I'm an average listener myself and I'm pretty confident I would rate most codecs as transparent in this test if I had participated.


Now, Darryl tried filtering the results, using only results from "sensitive" listeners. He found out that you needed to add or remove just one result to make the overall results change a lot. Since we don't want people messing with the results to get to whatever score suits them better (and believing they are in the right), we agreed this idea should be left alone.

Quote
[02:35:26] miyaguch@esk: hmm, I added one more listener and the
           results are completely different!
[02:35:35] Leviathan: Haha!
[02:35:42] Leviathan: Dangerous business
[02:35:58] miyaguch@esk: yeah
[02:36:02] miyaguch@esk: maybe not a good idea
[02:37:04] Leviathan: Yeah. People might start building all sorts of
           wacked plots by carefully picking results.
[02:37:13] Leviathan: "If ff123 can do it, I can too" :B
[02:37:55] miyaguch@esk: heheh
[02:39:17] miyaguch@esk: scratch that idea
[02:39:31] Leviathan:
[02:39:39] Leviathan: Some things are better left alone
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-04 16:21:00
Quote
Now, Darryl tried filtering the results, using only results from "sensitive" listeners. He found out that you needed to add or remove just one result to make the overall results change a lot. Since we don't want people messing with the results to get to whatever score suits them better (and believing they are in the right), we agreed this idea should be left alone.

Here is what happened using sample 1, 8 lowest scoring listeners:

Code: [Select]
Fisher's protected LSD for ANOVA:   0.600

Means:

iTunes   Nero     Faac     Real     Compaact
 4.04     3.41     3.30     2.79     2.62  

---------------------------- p-value Matrix ---------------------------

        Nero     Faac     Real     Compaact
iTunes   0.042*   0.018*   0.000*   0.000*  
Nero              0.704    0.042*   0.012*  
Faac                       0.091    0.029*  
Real                                0.583    



And adding one more listener:

Code: [Select]
Fisher's protected LSD for ANOVA:   0.613

Means:

iTunes   Nero     Faac     Real     Compaact
 3.90     3.46     3.34     2.89     2.78  

---------------------------- p-value Matrix ---------------------------

        Nero     Faac     Real     Compaact
iTunes   0.150    0.074    0.002*   0.001*  
Nero              0.715    0.069    0.031*  
Faac                       0.140    0.069    
Real                                0.715    



And for comparison, here are all listeners who didn't rate all 5s (N=19).  BTW, Roberto included all listeners in his analysis, including everybody who rated all 5's.  This doesn't make a big difference in the comparative ratings, but it does change the means.

Code: [Select]
Fisher's protected LSD for ANOVA:   0.333

Means:

iTunes   Nero     Faac     Compaact Real    
 4.21     4.01     3.88     3.63     3.54  

---------------------------- p-value Matrix ---------------------------

        Nero     Faac     Compaact Real    
iTunes   0.235    0.055    0.001*   0.000*  
Nero              0.452    0.026*   0.007*  
Faac                       0.135    0.048*  
Compaact                            0.616    


So "completely different" means that with a small group, one person's preference makes a big difference.  I don't know if making plots like these is more misleading than not, but the data is available for anybody who wishes to play with it:

http://ff123.net/export/128aac_results.zip (http://ff123.net/export/128aac_results.zip)

Try sticking what you like here:

http://ff123.net/friedman/stats.html (http://ff123.net/friedman/stats.html)

ff123
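
For anyone reading the tables above: the LSD figure is the smallest difference between two means that counts as significant, so the starred entries in each p-value matrix are the pairs whose means differ by more than the LSD. A minimal Python sketch of that pairwise check follows. The helper name `lsd_significant_pairs` is made up for illustration, and this reproduces only the comparison step, not the ANOVA that produces the LSD.

```python
from itertools import combinations

def lsd_significant_pairs(means, lsd):
    """Return the pairs of encoders whose mean ratings differ by more than the LSD."""
    return [(a, b) for a, b in combinations(means, 2)
            if abs(means[a] - means[b]) > lsd]

# Means and LSD values copied from the two "sample 1" tables above.
eight = {"iTunes": 4.04, "Nero": 3.41, "Faac": 3.30, "Real": 2.79, "Compaact": 2.62}
nine = {"iTunes": 3.90, "Nero": 3.46, "Faac": 3.34, "Real": 2.89, "Compaact": 2.78}

pairs_eight = lsd_significant_pairs(eight, 0.600)
pairs_nine = lsd_significant_pairs(nine, 0.613)

print(len(pairs_eight), pairs_eight)
print(len(pairs_nine), pairs_nine)
```

With eight listeners, iTunes vs Nero clears the threshold (0.63 against an LSD of 0.600); with nine it no longer does (0.44 against 0.613), which matches the star disappearing from that cell.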
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 17:12:45
Quote
I am personally interested in the quality each encoder outputs for average listeners. Maybe because I'm an average listener myself (...)

(...) we agreed this idea should be left alone.

I understand both positions.
Nevertheless, I don't think that average listeners' results should be used for general conclusions. Look, for example, at --r3mix against --alt-preset standard. Which one is better? Could we say that both presets are equal, because 90% of the population couldn't differentiate them? Certainly not. With such reasoning, we would conclude that --r3mix is more efficient than --preset standard: same quality, but smaller bitrate and faster encoding.

HA recommendations were never based on average listeners, but on experienced ones. Therefore, I fear that the results of this listening test (based on average scoring) will lead to wrong conclusions (AAC encoders have ~same~ quality) and to bad recommendations (use encoder X [understand: the one I'm using], its quality is near iTunes' according to the collective test, but it's free/faster/...).

Of course, I'm not saying that you and your test would be responsible for this situation or for wrong conclusions based on it. Collective tests are highly instructive (some old myths have been busted, and a lot of people discovered that they could be satisfied with 128 kbps encoding).


ff123> thanks for the raw material, but I don't have the knowledge needed to exploit it :/
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-04 17:54:55
Quote
I must add a comment against your statement. How can you make a mathematical formula if you don't have precise values? Quality is not a fact; it's only a variable that is different for everyone. The only way to find the most efficient encoder is to test it yourself, or trust the statistics we just produced.

You can't do calculations on "approximate" data?  Sure you can.  As long as everyone understands that the output will only be as accurate as the input.  Since there is a significant subjective aspect to these listening tests, everything we're talking about here is an "approximation".  We could never calculate results of any kind if we forever waited for "perfectly accurate data".  Perfect accuracy is often unfeasible to wait for, so you define a threshold of accuracy to use instead.  Find an approximation close enough that you're "comfortable" with it.  Sometimes you have to bite the bullet and "go with what you have", as long as "what you have" meets your requirements.  Almost nothing in this world would get done otherwise.  A few years in most any kind of business or professional discipline can make the need for this clear.

Also, quality with a scope and a baseline is indeed a fact, as long as the scope is clearly communicated.  It requires qualification of statements being made about such "approximated" results, just as we qualify the scope of these listening test results.  We cannot say, for instance, "The results show that iTunes and Nero are the best AAC encoders."  Instead, the correct statement would be, "Based on reports from the test participants, and on the music samples tested, and within the bitrate range tested, the results show ....."

Quote
HA recommendations were never based on average listeners, but on experienced ones. Therefore, I fear that the results of this listening test (based on average scoring) will lead to wrong conclusions (AAC encoders have ~same~ quality) and to bad recommendations (use encoder X [understand: the one I'm using], its quality is near iTunes' according to the collective test, but it's free/faster/...).

That's a good point, but as long as HA uses the same guidelines for marking a particular codec version and setting as "recommended", then the results of public listening tests would not be intrusive at all.  People just need a point of reference to consider such results, to keep them from making the mistake of thinking "Codec B tied for the win in the listening test, so it must be equally recommended as codec A."  Making an encoder and setting recommendation should be coupled with information for the reader covering the reason for a recommendation, so as to avoid this kind of mistake being made by anyone.

That way, anyone who reads test results, and also sees the "recommended" encoder on an HA FAQ page will have a clear understanding, and be able to differentiate the two sources of information properly.  And HA, in turn, will maintain its standards of using filtered results of only the most sensitive listeners to base recommendations on.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 18:39:35
Quote
That's a good point, but as long as HA uses the same guidelines for marking a particular codec version and setting as "recommended", then the results of public listening tests would not be intrusive at all.

Yes, that's true. But in my opinion, there are two different levels of recommendations:
- recommendations I'd like to call "official" (sticky threads)
- spontaneous recommendations from other users, based on a mix of personal tests, rumours, beliefs, etc...

Don't forget that AAC is different from mp3 (lame is the only serious solution) or mpc/vorbis (only one implementation). AAC has a lot of varied, competing implementations. The AAC users are also divided: faac supporters (open-source) - QuickTime (quality) - Nero AAC (mix of quality and sympathy) and also some compaact! fans. An official recommendation would be very explosive...

I may be wrong, but as far as I remember there is no "official" HA statement about a recommended encoder/settings for AAC. There is one for lame, another for mpc, and a last one for vorbis. Not for AAC. (But I don't think that we need any recommended AAC encoder: evolution is very fast, and constantly ruining our certainties.) In other words, all AAC recommendations on HA are spontaneous. And the collective tests are an important source for people's opinions about quality.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-04 18:57:18
Quote
HA recommendations were never based on average listeners, but on experienced ones.

This is not only a true statement but IMHO one of the reasons the HA forums exist.
Quote
Therefore, I fear that the results of this listening test (based on average scoring) will lead to wrong conclusions (AAC encoders have ~same~ quality) and to bad recommendations (use encoder X [understand: the one I'm using], its quality is near iTunes' according to the collective test, but it's free/faster/...).

This attitude should be avoided. If I were to base a recommendation only on my results, I think I would exclude FAAC, Compaact! and Real nowadays. I've usually found FAAC annoying for my tastes. Here is the complete table: http://xoomer.virgilio.it/fofobella/aac128v2.png (http://xoomer.virgilio.it/fofobella/aac128v2.png)

I very much appreciate ff123's analysis, even if unfruitful. Now my humble opinion is that there is no chance of getting plausible results from the scores, because if we take all listeners there is a problem of "levelling", and if we take only a few (low-scoring) listeners there is the problem of too high a sensitivity.
We have talked many times about how inadvisable it is to conduct a high-bitrate PUBLIC listening test; with modern codecs @128 kbps I think we are near the limit of significance.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 19:24:08
Quote
We have talked many times about how inadvisable it is to conduct a high-bitrate PUBLIC listening test; with modern codecs @128 kbps I think we are near the limit of significance.

Unfortunately, it seems to be true.
But there is one difference. With 128 kbps listening tests, there is still a part of the community able to distinguish existing differences between many encoders, even on non-problematic samples. Whereas at higher bitrates it's more difficult, and most people need difficult samples, and sometimes only killer samples will do.
For me (and for you too, I suppose), it's surely hard to read that the differences are not significant, because we have experienced the contrary. And we want to warn the whole community that a big difference exists between iTunes-Nero and Compaact-Real-faac (and a small one between iTunes and Nero).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-04 19:27:00
Quote
And we want to warn the whole community that a big difference exists between iTunes-Nero and Compaact-Real-faac (and a small one between iTunes and Nero).

Bollocks. This big difference exists for you, not for the average community member or test participant.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-04 19:35:38
Quote
Bollocks. This big difference exists for you, not for the average community member or test participant.

Not yet.
Three years ago, I wasn't able to hear a difference between a CD and an AudioActive encoding at 128 kbps with full stereo. I was happy to follow the "overkill" recommendations of experienced listeners or testers.
If people so often request advice on encoding, it's not because they can't compare two files themselves, but because they don't trust their ears. They hope for something other than a statement of their own inability to tell the difference.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-04 19:58:48
Quote
Quote
And we want to warn the whole community that a big difference exists between iTunes-Nero and Compaact-Real-faac (and a small one between iTunes and Nero).

Bollocks. This big difference exists for you, not for the average community member or test participant.

That would depend on samples pretty much. With more extreme samples average members could hear more differences.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-04 20:06:31
Quote
Bollocks. This big difference exists for you, not for the average community member or test participant.

The fact is that the difference exists; moreover, some listeners (according to blind tests!) think that this difference is far from insignificant. I can understand guruboolez; I have difficulty reading that the codecs all give good results, because I've experienced the contrary. If you look, for example, at my scores for FAAC you can understand how difficult (impossible) it is for me to compare it with iTunes. I don't think I have extraordinary hearing; maybe with time (and training) some people could come to find some encodings annoying.

That said, I really think that the test was very useful because it gave each listener the opportunity to form his own idea of the quality of AAC encoders.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-04 20:07:50
Quote
With more extreme samples average members could hear more differences.

Yes, but what is the percentage of extreme samples in people's collections? I personally don't have any of the tunes the problem samples were extracted from, except for Waiting and Applause (which is from the Eagles' Hell Freezes Over). That probably accounts for less than 0.01% of my collection.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-04 20:12:04
Quote
The fact is that the difference exists; moreover, some listeners (according to blind tests!) think that this difference is far from insignificant. I can understand guruboolez: I find it hard to read that the codecs all give good results, because I have experienced the contrary. If you look at my scores for FAAC, for example, you can understand how difficult (impossible) it is for me to compare it with iTunes. I don't think I have extraordinary hearing; maybe with time (and training) some people could come to find some encodings annoying.

And your point is...?

I never claimed the difference is insignificant.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: JohnV on 2004-03-04 20:23:20
Quote
Quote
With more extreme samples average members could hear more differences.

Yes, but what is the percentage of extreme samples in people's collections? I personally don't have any of the tunes the problem samples were extracted from, except for Waiting and Applause (which is from the Eagles' Hell Freezes Over). That probably accounts for less than 0.01% of my collection.

It depends on the music, though sections that are quite hard to encode can be found on practically all CDs. It's just a matter of screening for these sections when preparing listening tests.
You can't really give a percentage, because how hard a "hard" section really is varies.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-04 20:26:10
Quote
Quote
And we want to warn the whole community that a big difference exists between iTunes-Nero and Compaact-Real-faac (and a small one between iTunes and Nero).

Bollocks. This big difference exists for you, not for the average community member or test participant.

My personal concern is that given time an average listener will begin to hear the differences

When I first started using MP3 back in 95 or so, I couldn't hear the difference... maybe at 112 I could...

anywho... over time without even *trying* to improve my discernability MP3 artifacts and problems grew more and more annoying...

Will I become more attuned to AAC's problems over time?

I don't know

(probably, it happened with mp3...)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-04 20:27:31
Quote
Quote
Bollocks. This big difference exists for you, not for the average community member or test participant.

Not yet.
Three years ago, I wasn't able to hear a difference between a CD and an AudioActive encoding at 128 kbps with full stereo. I was happy to follow the "overkill" recommendations of experienced listeners or testers.
If people so often ask for advice on encoding, it's not because they can't compare two files themselves, but because they don't trust their ears. They are hoping for something other than confirmation that they can't hear a difference.

Exactly

People want a guru to tell them it's okay
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-04 20:29:30
Quote
Yes, but what is the percentage of extreme samples in people's collections? I personally don't have any of the tunes the problem samples were extracted from, except for Waiting and Applause (which is from the Eagles' Hell Freezes Over). That probably accounts for less than 0.01% of my collection.


Again, AAC is considered to be "state of the art" when the number of latest audio algorithms and the minimum bit rate for acceptable quality are counted - this means that we need to test AAC encoders in extreme conditions to check which ones are good and which are not.

MPEG uses a set of samples with very extreme conditions (very hard and isolated attacks with reverberation, complex signals that have a very rich spectrum and differences in stereo image, isolated hi-fi speech samples with a harmonic structure especially bad for transform coders, etc.) - and there is a good reason why they use them in algorithm testing.

For example, on a few of these samples FAAC would fail badly - which means that it is, say, bad at isolated pre-echo protection.  QT or Nero, on the other hand, perform quite stably.

You wouldn't expect a sudden drop in quality with a good AAC encoder.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: [proxima] on 2004-03-04 20:38:55
Quote
And your point is...?

I never claimed the difference is insignificant.

I've already said what I think of your conclusion in my first reply in this thread.
I'm not criticizing the results, only some of the comments; I'm afraid that some quality-demanding people could choose this or that encoder because of them. That's all.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Ivan Dimkovic on 2004-03-04 20:45:34
Ok, to support my claim I've encoded MPEG's si02.wav with

FAAC
QT
FhG pro
Nero
Compaact

You will see that FAAC has the largest deviation - this clip breaks FAAC's pre-echo control.

I'm not saying that FAAC is bad - far from that,  just that it has issues that a tuned AAC encoder shouldn't have - quality must not drop to this level at 128.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-04 20:48:18
Quote
For example, on a few of these samples FAAC would fail badly - which means that it is, say, bad at isolated pre-echo protection.  QT or Nero, on the other hand, perform quite stably.

You wouldn't expect a sudden drop in quality with a good AAC encoder.


Perhaps what is needed in future tests is something like 20 samples, 8 of them being problem samples, and 12 of them being normal music.  Based on the number of participants in this test, a future test could support a 50% increase in the number of samples and keep the listener load the same (i.e., each listener could still listen to 12 or fewer samples).  Of course, if there aren't enough listeners, the whole test might be at risk of not getting enough data on any sample.  Just a thought.

That would help to shore up the biggest weaknesses in this type of test, which, IMO, are the number of samples and the selection of them.

ff123
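
To make the trade-off concrete, here is a rough sketch of the arithmetic. The participant count is a made-up placeholder, not the figure from this test, the 12 and 18 sample counts are assumptions read off the "12 or fewer" and "50% increase" remarks above, and `expected_results_per_sample` is only an illustrative name.

```python
def expected_results_per_sample(listeners, samples_per_listener, total_samples):
    """Average number of result files per sample, assuming listeners
    spread their effort evenly across the whole sample set."""
    return listeners * samples_per_listener / total_samples


# Hypothetical participant count, for illustration only.
listeners = 30

# Assumed current setup: 12 samples, every listener rates all of them.
current = expected_results_per_sample(listeners, 12, 12)

# 50% more samples (18), same load of 12 samples per listener.
proposed = expected_results_per_sample(listeners, 12, 18)

print(current, proposed)
```

With the load per listener held constant, the results per sample shrink in proportion to the growth of the sample set, which is why the number of participants becomes the limiting factor.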
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-04 21:24:27
Quote
Perhaps what is needed in future tests is something like 20 samples, 8 of them being problem samples, and 12 of them being normal music.  Based on the number of participants in this test, a future test could support a 50% increase in the number of samples and keep the listener load the same (i.e., each listener could still listen to 12 or fewer samples).  Of course, if there aren't enough listeners, the whole test might be at risk of not getting enough data on any sample.  Just a thought.

That would help to shore up the biggest weaknesses in this type of test, which, IMO, are the number of samples and the selection of them.

ff123

I think this is a great idea.

Perhaps there could be a very simple front-end for the download.  To prevent most people from just downloading the first 6-10 samples, leaving several samples in the cold in terms of having enough results to be statistically valid, a tester would connect to the DL site, and be asked in a dialog how many samples they'd like to test.  They enter a number, and a zip file is created with a random set of that many samples and is downloaded to them.  This would spread out the testing across all samples evenly.

Alternately, here's a solution that would require less additional work to prepare, but wouldn't actively enforce a balance of testing efforts across all samples.  If the samples were listed on the download site with values shown for each sample specifying how many times each has been downloaded, people could then (hopefully) download the samples with the lowest hit rates.

[wishful thinking]
And if tester participation increased over time, we could gradually increase the size of the sample set, to improve the validity of the results of tests conducted in the future (by broadening the scope of tested samples).
[/wishful thinking]

Just a thought... 
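
Both ideas above could be sketched roughly as follows. This is only an illustration: the sample names, the counter values and the function names are hypothetical, and a real download front-end would also have to build the zip file and store the counters somewhere.

```python
import random


def pick_random_samples(all_samples, how_many, rng=random):
    """Return a random subset of the sample set, so that no sample is
    favoured just because it is listed first on the download page."""
    how_many = min(how_many, len(all_samples))
    return rng.sample(all_samples, how_many)


def pick_least_downloaded(download_counts, how_many):
    """Alternative: return the samples with the lowest download counts.
    Ties are broken by sample name so the result is deterministic."""
    ranked = sorted(download_counts, key=lambda name: (download_counts[name], name))
    return ranked[:how_many]


# Hypothetical sample names and download counters, for illustration only.
counts = {"sample01": 14, "sample02": 3, "sample03": 9, "sample04": 3}

print(pick_random_samples(list(counts), 2))
print(pick_least_downloaded(counts, 2))
```

The random version spreads listeners evenly on average; the counter-based version corrects an imbalance that has already built up, but only if testers actually follow the counters.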
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-04 21:40:04
Quote
I think this is a great idea.

Perhaps there could be a very simple front-end for the download.  To prevent most people from just downloading the first 6-10 samples, leaving several samples in the cold in terms of having enough results to be statistically valid, a tester would connect to the DL site, and be asked in a dialog how many samples they'd like to test.  They enter a number, and a zip file is created with a random set of that many samples and is downloaded to them.  This would spread out the testing across all samples evenly.

My thought was that there would be a script which asks the user to check off which samples he has already listened to, which would then randomly return the next sample.

I also think that each of the samples should have exactly the same length (20 seconds?), so that bitrate comparisons are more accurate.  It might be possible to make the 12 normal samples more or less average the target bitrate for most of the codecs (I realize that we can't do this exactly if there are multiple vbr codecs) and to not constrain the problem samples at all.

ff123
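
A minimal sketch of such a script, assuming the checked-off samples arrive as a simple list of names. `next_sample` and the sample names are hypothetical; nothing here reflects an existing tool.

```python
import random


def next_sample(all_samples, already_heard, rng=random):
    """Return a random sample the listener has not ticked off yet,
    or None once every sample has been heard."""
    heard = set(already_heard)
    remaining = [name for name in all_samples if name not in heard]
    if not remaining:
        return None
    return rng.choice(remaining)


# Hypothetical sample names, for illustration only.
samples = ["sample01", "sample02", "sample03", "sample04"]

print(next_sample(samples, ["sample01", "sample03"]))
print(next_sample(samples, samples))
```

Handing out one sample at a time keeps the choice random for every listener, at the cost of one extra round trip per sample.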
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-05 07:17:11
No, don't force samples on the user!

One advantage of having more samples is that there would be more that I like listening to. These samples are not only more pleasant to my ears, they also cause much less fatigue. I figure most people can find artefacts more easily in their favourite genre.

Putting a number next to each sample is a far better idea IMHO.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-05 07:26:32
Quote
No, don't force samples on the user!

One advantage of having more samples is, that there would be more I like listening to. These samples are not only more pleasant to my ears, they also cause much less fatigue. I figure, most people can find artefacts more easily in their favourite genre.

Putting a number next to each sample is a far better idea IMHO.

Well, the idea is to more or less level out the number of listeners per sample.  But I suppose people can always just skip the ones they don't like or fail to submit results files for ones they have trouble with.

ff123
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Continuum on 2004-03-05 08:51:43
Quote
http://ff123.net/export/128aac_results.zip (http://ff123.net/export/128aac_results.zip)

I made a graph (http://stud4.tuwien.ac.at/~e0025119/audio/AAC-128-v2/sample06-distribution.png) showing the distribution of the results for sample 6. It seems there are 4 Nero-haters. FAAC got very good (most transparent) and very bad scores (Lowpass?).

Guru, what kind of visualization do you have in mind?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: robUx4 on 2004-03-05 09:33:42
I think a test of encoder quality should always be done on problematic samples only. What is the point of testing samples where no encoder has any problem and all of them sound more than good?

IMO you should call the test "Lack of quality testing" and not "Quality testing". Because in the end, what people want to know is which encoder will give them a good result most (if not all) of the time (including the problematic cases)... So what Ivan said is good: QuickTime and Nero will always give you "good" results. The other ones are not mature enough yet.

If you add a lot of non-problematic samples, you just hide the extreme cases. And the difference between encoders will not be very easy to determine.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-05 11:18:08
Quote
I think a test of encoder quality should always be done on problematic samples only. What is the point of testing samples where no encoder has any problem and all of them sound more than good?

because normally people will not use the codecs with problem samples

since normal samples also make a difference in output quality, i think it's necessary to test normal samples

testing only problem samples helps the developers, as it lets them find out where they have problems, but not the consumers

also, testing problem samples raises the question of which samples to use without biasing the test in a specific direction
that way it comes down much more to luck whether you really find out which codec is better than another

to sum it up:
i, as a consumer, am interested in how codecs do in 80-90% of the cases and not in how they do on some problem samples i will probably never stumble over anyway
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: robUx4 on 2004-03-05 11:57:58
As I do have some of the samples of this test in my "collection", I do care about having them reproduced correctly. And I will never trust a codec that is known for having drawbacks on some kinds of files.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-05 12:03:49
yes of course it's nice to have a codec which can also handle problem samples right

but by simply testing problem samples you will never know if you get representative results for the non-problematic samples, which are surely the majority
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: XXX on 2004-03-05 12:16:27
Torture track, and it's only a dozen or so seconds.

Joan Osborne
One of Us

The beginning, where she sounds like a kid recording herself on a mono, push-two-levers-at-a-time-to-record tape deck.  FAAC truly fails badly here, even at 128.  The left channel sounds like it's been through a washing machine, then a hand-cranked wringer.  Psytel Enc managed to get by.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-05 12:17:50
Continuum> something like that:
http://ff123.net/export/128exten_split.html (http://ff123.net/export/128exten_split.html)

robUx4> there's an epistemological problem if you select problematic samples only. Some encoders perform very well on hard-to-encode samples, and badly on innocent ones. See Compaact! for example. Clear winner on Velvet, but a lot of trouble on common signals.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ScorLibran on 2004-03-05 15:04:17
Quote
As I do have some of the samples of this test in my "collection", I do care about having them reproduced correctly. And I will never trust a codec that is known for having drawbacks on some kinds of files.

All lossy formats will have at least a few problem samples that even their highest settings will not encode correctly.

If you must have 100% accuracy, then lossless is the only solution.


Regarding the issue of testing problem samples vs. "normal" samples... I still like ff123's idea for addressing this.  8 problem samples, and 12 samples that haven't previously been shown to be hard to encode.  This cross-section, in my opinion, would represent the encoding performance of each format more thoroughly than testing only one type of sample.

The crux lies, of course, with having enough test participants that you'll end up with a large enough results set of each sample for statistical validity.  Or perhaps if only an evenly-selected sub-set had enough results, that would be OK as well...5 problem samples and 7 normal samples, for instance?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-06 04:10:47
http://ff123.net/export/128aacv2.html (http://ff123.net/export/128aacv2.html)

Shows ratings of the overall group vs. the ratings of the lowest raters.  Looks remarkably similar, I'd say.

ff123
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: harashin on 2004-03-06 04:39:44
Quote
http://ff123.net/export/128aacv2.html (http://ff123.net/export/128aacv2.html)

Shows ratings of the overall group vs. the ratings of the lowest raters.  Looks remarkably similar, I'd say.

ff123

Those graphs are located in your local folder.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: ff123 on 2004-03-06 04:42:38
Quote
Quote
http://ff123.net/export/128aacv2.html (http://ff123.net/export/128aacv2.html)

Shows ratings of the overall group vs. the ratings of the lowest raters.  Looks remarkably similar, I'd say.

ff123

Those graphs are located in your local folder.

Oops.  Hopefully it's better now.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: harashin on 2004-03-06 04:45:30
Yes, no problems now.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-06 09:33:58
Quote
http://ff123.net/export/128aacv2.html (http://ff123.net/export/128aacv2.html)

Shows ratings of the overall group vs. the ratings of the lowest raters.  Looks remarkably similar, I'd say.

ff123

assuming that nero stays the same, for the low raters itunes gets better, the "third placers" get worse (and in there compaact also gets worse)

but all in all the results are not that different
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: guruboolez on 2004-03-06 11:42:31
Quote
http://ff123.net/export/128aacv2.html (http://ff123.net/export/128aacv2.html)

Shows ratings of the overall group vs. the ratings of the lowest raters.  Looks remarkably similar, I'd say.

ff123

Thank you very much. The differences between encoders have not really changed. But now there are three encoders close to the 3.0 line, and two of them are below this value. It means that half (?) of the scorers have trouble with them (2.80 should mean something between irritating artifacts and slightly irritating artifacts).
It's very far from the overall maturity claimed in Roberto's conclusion (this is not intended to be offensive).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Stux on 2004-03-06 13:25:07
Quote
because normally people will not use the codecs with problem samples

I'm not so sure about that

every body of music will have sections which can be considered problem samples, in fact, the problem samples in these tests just come from experienced users listening to their music and hearing artifacts and deciding the given passage would make a good problem sample (I assume).

even if 80-90% works fine... that still leaves 10-20% which doesn't  which is a helluva lot... 2 tracks per album

80-90% of the time an AAC encoder will be fine... so what people really want to know is if a given AAC encoder will be fine 98% of the time... or 99% of the time... or 99.9%... or 100% etc

10% failure rate ain't good enough
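
A back-of-the-envelope version of that arithmetic, as a sketch only: it assumes a hypothetical 12-track album and treats the success rate as applying per track, which is a simplification of how problem passages are actually distributed.

```python
def expected_problem_tracks(tracks_per_album, success_rate):
    """Expected number of tracks per album with an audible problem,
    if each track is fine with probability success_rate."""
    return tracks_per_album * (1.0 - success_rate)


# Assumed album length, for illustration only.
TRACKS_PER_ALBUM = 12

for rate in (0.80, 0.90, 0.98, 0.99, 0.999):
    problems = expected_problem_tracks(TRACKS_PER_ALBUM, rate)
    print(f"{rate:.1%} fine -> about {problems:.2f} problem tracks per album")
```

Under these assumptions, 80-90% works out to between one and two and a half affected tracks per album, in line with the "2 tracks per album" figure above.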
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: Jojo on 2004-03-07 15:23:24
so what would be the logical conclusion to that? Using iTunes AAC with a bitrate of 160kbps; just to make sure  I mean, in this case, every test must be done with the original sample. If nobody could tell the difference between the original source and the AAC sample, it would be safe to use in real life, right? However, I think this test wasn't done to prove that, but rather to show what codec is best at this bitrate. So that people who usually use this codec at this particular bitrate will know what to choose. At a different bitrate iTunes could technically be the worst; which I doubt
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: jido on 2004-03-08 09:36:28
I do not agree with bond.

I, as a consumer, am interested in how the codecs fare on hard-to-encode samples. Maybe in the 64kbps test, where you can grade codecs by level of annoyance even with an easy sample, it would make sense to include easy ones. But at 128kbps I am expecting some quality, so the codecs should do their best on the hard parts.

On the other hand, I feel that some of the samples included here did not have many challenging parts (I spent quite a lot of time trying to ABX BigYellow, for example). I wouldn't mind including samples which are generally easy, but have at least one difficulty -- that should satisfy both sides.

Providing more samples and showing download counters seems a good idea. We would need to see it in practice. It would allow the organiser of the test to give directives to participants: not enough people working on this sample, I will need to drop it if no more people do it, etc.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 09:58:23
Quote
I, as a consumer, am interested in how the codecs fare on hard-to-encode samples.
But at 128kbps I am expecting some quality, so the codecs should do their best on the hard parts.
On the other hand, I feel that some of the samples included here did not have many challenging parts

yes, the current test didn't include any specific hard-to-encode samples, chosen for that reason, and still there are significant differences between the codecs

choosing hard-to-encode samples will not be able to give such significant results, because then whether some codec is rated better has more to do with chance than with how it will do in most cases
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 15:55:15
Quote
yes, the current test didn't include any specific hard-to-encode samples

Velvet.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 15:57:14
Quote
chosen for that reason

?
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 18:33:44
Quote
?

Yes, velvet was chosen because it's a traditional problem case.

http://lame.sourceforge.net/gpsycho/quality.html (http://lame.sourceforge.net/gpsycho/quality.html)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 18:39:44
oki, but it still doesn't change my opinion that testing only problem samples will have trouble producing significant results

anyways i am sure you will make a good decision
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 18:59:51
Quote
oki, but still it doesnt change my opinion that only testing problem samples will have problems bringing significant results

anyways i am sure you will make a good decision

The next test will have two traditional problem samples at most. No point testing more than that, the test loses significance and usability.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 19:24:12
did you already decide on atrac?

from what i read on sony's hp i think testing atrac3 from SonicStage Version 1.5.53 (seems to be the latest recommended for minidiscs/atrac-cd) should be the best

would be really interesting imo to have a new codec tested, especially considering sony's plans for an atrac music shop
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 20:06:54
Quote
did you already decide on atrac?

The discussion on the 128kbps Multiformat test will start soon.

And the test will probably start by April 7th.

The 5 main codecs are already set in stone (iTunes, Musepack, Lame, Vorbis and WMA standard), so the discussion will be whether to use Atrac or some sort of anchor. Since I'm personally tired of people claiming some of my tests are bad because no anchor was featured, I vote for an anchor.

BTW: No more than 6 codecs, of course.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 20:09:45
what about dropping mpc as it didn't change since the last test? also it's good to have one less great, hard-to-abx codec
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 20:11:47
Quote
what about dropping mpc

Definitely not. It was the codec that performed best (although, for the record, it didn't win!), so it should be the reference against which the others are compared.

Besides, Apple AAC barely changed since last test, and Vorbis only underwent some very small unofficial tunings by enthusiasts. So that argument isn't very strong.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 20:14:11
hm not much left to decide then anyways
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 20:15:45
Quote
hm not much left to decide then anyways

Yes, I decided to become less democratic since the horrendous fiasco of the AAC test (3 polls!).
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 20:18:17
hm thinking about it again this would mean that the only reason this test is done is to see how wma9 std does
not really exciting

i vote for taking a new codec in 
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 20:26:07
Quote
hm thinking about it again this would mean that the only reason this test is done is to see how wma9 std does
not really exciting

http://www.hydrogenaudio.org/forums/index....ndpost&p=189358 (http://www.hydrogenaudio.org/forums/index.php?showtopic=19190&view=findpost&p=189358)
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 20:31:28
hm it was your argument that vorbis and itunes didnt change that much not mine 

either way i am highly interested in this test
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-08 20:33:09
Quote
hm it was your argument that vorbis and itunes didnt change that much not mine 

I am confident they changed very little. It's now up to you guys to take this information and decide if a test is justifiable or not.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: bond on 2004-03-08 20:35:51
comparing the codecs to wma9 std is imo already reason enough for a test (simply to make once and for all clear how "good" wma9 and all the music stores using it are)

i think the test can be labelled as a comparison of music stores, and therefore it would make sense to take atrac3 in imho
as i said information is maybe the most important thing for a consumer in a market and nothing is bringing more light into the codec jungle than your tests, rjamorim

and personally i am not interested in mpc (and i think also not many people outside this community, sorry to say it that way) but in vorbis
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: duartix on 2004-03-10 18:14:24
My vote would be for a lower bitrate test.
IMHO it would be much more useful.
Title: AAC at 128kbps v2 listening test - FINISHED
Post by: rjamorim on 2004-03-10 18:49:39
Quote
My vote would be for a lower bitrate test.
IMHO it would be much more useful.

A dial-up bitrate (48kbps) test is scheduled for May.