15 years ago I ran a huge listening test at ∼130 kbps (https://hydrogenaud.io/index.php?topic=38792.0), which guided me to Apple’s AAC for my portable usage. Those results are now totally outdated. AAC was a state-of-the-art audio format in 2005, but it’s now an old one. In the meantime, OPUS was born and immediately appeared as the most efficient encoder in public listening tests (https://listening-test.coresv.net/results.htm). And for my own needs (I mainly listen to classical music), xHE-AAC has already shown its strength over OPUS at 64 kbps (https://hydrogenaud.io/index.php?topic=119333.0).
My knowledge was never properly updated after that big test. I ran a few sparse blind tests but the results were not helpful. As I no longer knew what to use, over the last few years I often kept untouched lossless encodings on my digital devices, or some high-bitrate AAC encodings to reduce size a bit, and I even caught myself sending high-resolution lossless files at 3000 kbps to my phone!
Of course, storage space is not the same issue in 2020 as it was in 2005. While my small DAP and then my iPod Classic were entirely filled with tiny MP3 and AAC files, my memory cards must now host HD movies, complete series, Netflix’s cache, hundreds of 12 to 48 megapixel pictures, games and applications. More storage today, but so much digital stuff to manage… For this reason, I may be inclined to stick again with an efficient audio encoder and save space rather than lazily opt for lossless or an unnecessarily high bitrate setting. But that means going back to work and doing a big blind listening test again :)
The questions I’m trying to answer are: what is the most suitable format for my own needs, and what are the most efficient settings that are still enjoyable? How far can I go below ∼130 kbps and still stay close to transparency?
For this test I compared AAC and OPUS at different bitrates: around 80, 96, 112 and 128 kbps. I even added AAC at ∼144 kbps. To these nine encodings I also added a low anchor. That makes ten encodings per sample! Ideally, I’d like to put exhale xHE-AAC in the same pool, but that would be too many things to test at the same time. Moreover, as I said, this test has a “practical” purpose, and at the moment xHE-AAC is not really usable on an Android/portable platform. I’ll nevertheless try a similar comparison with Exhale next year.
The full picture will consist of a 150-sample listening test: 75 samples of classical music (a set I already used this year for a multiformat test at 64 kbps (https://hydrogenaud.io/index.php?topic=119333.0) and for a 128 kbps AAC test (https://hydrogenaud.io/index.php?topic=120062.0)) and 75 additional samples of other kinds of music. The first part is already completed. I expect to perform the second part in the next few weeks. In the meantime, I decided to reveal the results of the first round.
These samples are well-known musical moments or parts I really enjoy. Some are very easy to encode and could be transparent even at 64 kbps; some others are much harder. I have deliberately chosen not to overrepresent problem samples and to stay closer to the average music I listen to on a daily basis.
The tested samples are available here.
ENCODERS AND SETTINGS
The last official releases of Apple AAC and OPUS were used:
• Apple AAC 184.108.40.206: -q36, -q45, -q54, -q64, -q73 [expected bitrates: 80, 96, 112, 128 and 144 kbps]
• Opus 1.31 from the official website: --bitrate 80, 96, 112, 128
• FDK 4.0.0 HE-AACv2 at CBR 32 kbps as low anchor
I used True VBR (TVBR) mode instead of Constrained VBR (CVBR) for both formats, since TVBR is recommended for both (source: https://developer.apple.com/library/archive/technotes/tn2271/_index.html and source: https://tools.ietf.org/html/rfc6716#section-2.1.1). I chose Apple’s AAC implementation over Fraunhofer’s based on the final result of my previous listening test (https://hydrogenaud.io/index.php?topic=120062.0).
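For anyone who wants to reproduce the nine encodings, here is a minimal Python sketch that only builds the command lines. It rests on assumptions not stated in the post: that the Apple encoder is driven through the `qaac` front end (whose `--tvbr` option selects the True VBR quality) and that Opus files are made with `opusenc` from opus-tools (where unconstrained VBR is the default). The file names are hypothetical.

```python
# Quality/bitrate values listed in the post.
AAC_QUALITIES = [36, 45, 54, 64, 73]
OPUS_BITRATES = [80, 96, 112, 128]


def qaac_tvbr_command(src, dst, quality):
    """Build a qaac True VBR command line (assumption: qaac is the front end)."""
    return ["qaac", "--tvbr", str(quality), src, "-o", dst]


def opusenc_command(src, dst, bitrate_kbps):
    """Build an opusenc command line; plain VBR is opusenc's default mode."""
    return ["opusenc", "--bitrate", str(bitrate_kbps), src, dst]


def build_commands(src="sample.wav"):
    """Return the nine encoder command lines for one source file."""
    commands = []
    for quality in AAC_QUALITIES:
        commands.append(qaac_tvbr_command(src, f"aac_q{quality}.m4a", quality))
    for bitrate in OPUS_BITRATES:
        commands.append(opusenc_command(src, f"opus_{bitrate}.opus", bitrate))
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the encoders are installed; the low anchor is left out here.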
Before beginning this test, I had to measure the average bitrate of each VBR setting. From experience I know that the average bitrate may vary from one musical genre to another: the average values I get with my classical library are radically different from those of other musical genres. With FLAC, the bitrate goes from ∼580 kbps with classical music to often beyond 1000 kbps with louder hard rock/metal music. While variations are not so huge with lossy, they exist and may affect the fairness of the comparison.
Therefore, I built a large test library of more than 240 albums: 91 classical music albums and a personal selection of 25 jazz albums, 25 metal albums and 25 electronic albums. I also added all 50 best-selling albums in the USA (https://www.businessinsider.com/50-best-selling-albums-all-time-2016-9?IR=T) and the first 25 discs from the Billboard hottest sales of the last decade (https://www.billboard.com/charts/decade-end/hot-100). My data now includes a lot of popular stuff and is therefore more reliable and, I hope, more useful than ever. The whole set now makes 255 CDs for 247 hours of music.
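As an illustration of how such averages can be measured, here is a minimal sketch that derives a duration-weighted average bitrate from file sizes and durations. It is an assumption about the method, not a description of how the figures in this post were obtained; the track sizes and durations are hypothetical, and file sizes include a little container overhead.

```python
def average_bitrate_kbps(tracks):
    """Return the duration-weighted average bitrate of a set of tracks.

    `tracks` is an iterable of (size_in_bytes, duration_in_seconds) pairs.
    Dividing total bits by total seconds gives long tracks more weight
    than short ones.
    """
    total_bytes = 0
    total_seconds = 0.0
    for size_bytes, seconds in tracks:
        total_bytes += size_bytes
        total_seconds += seconds
    if total_seconds == 0:
        raise ValueError("no audio duration to average over")
    return total_bytes * 8 / total_seconds / 1000


# Hypothetical library: genre -> list of (bytes, seconds) per encoded track.
library = {
    "classical": [(3_000_000, 300.0), (6_300_000, 600.0)],
    "metal": [(2_400_000, 240.0)],
}

per_genre = {genre: average_bitrate_kbps(t) for genre, t in library.items()}
overall = average_bitrate_kbps(
    track for tracks in library.values() for track in tracks
)

if __name__ == "__main__":
    for genre, kbps in per_genre.items():
        print(f"{genre}: {kbps:.1f} kbps")
    print(f"overall: {overall:.1f} kbps")
```

Averaging per genre first and then overall makes it easy to see when one genre pulls the global figure up or down.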
A full detailed list is available as a work-in-progress datasheet.
For more comfort, here is a summary of the tested settings with 247 hours of music:
|GENRE ||AAC q36||AAC q45||AAC q54||AAC q64||AAC q73||OPUS 80||OPUS 96||OPUS 112||OPUS 128|
|AVERAGE||76.7 kbps||96.6 kbps||111.8 kbps||126.4 kbps||143.9 kbps||83.5 kbps||100.4 kbps||117.1 kbps||133.8 kbps|
|CLASSICAL ||75.6 kbps||96.8 kbps||112.7 kbps||127.2 kbps||142.2 kbps||85.8 kbps||103.4 kbps||119.3 kbps||136.8 kbps|
|JAZZ ||80.8 kbps||100.5 kbps||115.7 kbps||131.0 kbps||149.4 kbps||85.4 kbps||102.1 kbps||119.3 kbps||135.8 kbps|
|METAL ||76.0 kbps||96.6 kbps||111.7 kbps||126.7 kbps||145.5 kbps||77.1 kbps||93.3 kbps||109.2 kbps||125.2 kbps|
|ELECTRONIC||77.9 kbps||97.1 kbps||112.1 kbps||126.5 kbps||144.2 kbps||87.5 kbps||105.5 kbps||123.2 kbps||140.7 kbps|
|50 BEST-SELLING||76.6 kbps||96.6 kbps||111.7 kbps||126.7 kbps||145.4 kbps||81.9 kbps||98.5 kbps||115.2 kbps||131.7 kbps|
|25 BILLBOARD||73.4 kbps||91.6 kbps||106.0 kbps||119.4 kbps||136.3 kbps||83.1 kbps||99.7 kbps||116.4 kbps||132.8 kbps|
As you can see, bitrate is globally comparable from one setting to another, but OPUS uses 4 to 7 kbps more at similar VBR settings. Unlike Apple’s AAC, OPUS VBR settings can be finely adjusted and could match its competitor more closely. But I decided to keep the most usual values and not lower the OPUS bitrate (e.g. --bitrate 128 instead of --bitrate 122). This slight excess also convinced me to add AAC at ∼144 kbps, which is sometimes close to OPUS 128 (142 kbps for AAC vs 137 for OPUS with classical music, and 144 vs 141 with electronic music). I also expect OPUS to be more efficient than AAC, so comparing it at ∼130 kbps with AAC at 144 kbps makes sense to me.
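The "4 to 7 kbps" figure can be checked directly from the AVERAGE row of the bitrate table above. A minimal sketch follows; the values are copied from that row, and the pairing of each AAC setting with the nearest OPUS target is the only assumption.

```python
# AVERAGE row of the bitrate table (kbps), keyed by nominal target bitrate.
aac_avg = {80: 76.7, 96: 96.6, 112: 111.8, 128: 126.4}
opus_avg = {80: 83.5, 96: 100.4, 112: 117.1, 128: 133.8}


def bitrate_gap(target):
    """Return OPUS minus AAC average bitrate (kbps) at one nominal target."""
    return round(opus_avg[target] - aac_avg[target], 1)


gaps = {target: bitrate_gap(target) for target in aac_avg}

if __name__ == "__main__":
    for target, gap in gaps.items():
        print(f"~{target} kbps target: OPUS uses {gap} kbps more than AAC")
```

The same dictionaries can be swapped for any genre row of the table to see how the gap changes, for instance with metal, where the two encoders land much closer together.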
[Results graph: LOW ANCHOR, AAC q36, AAC q45, AAC q54, AAC q64, AAC q73, OPUS 80, OPUS 96, OPUS 112, OPUS 128]
Beautiful graph, isn’t it :)
I started with an issue: ff123’s venerable Friedman analysis tool only works with up to 8 competitors. Fortunately, kamedo2’s brilliant graph generator (https://listening-test.coresv.net/graphmaker6.htm) handles this without an issue! I simply drew some lines in a picture editor to see the confidence limits at the top of the graph more clearly:
→ AAC 144 is better than OPUS 128 and AAC 128
→ AAC 128 is better than OPUS 112 and AAC 112
→ OPUS 128 is still tied with OPUS 112 and AAC 112 from a statistical point of view (95% confidence is not reached but seems to be between 90 and 95%, so I guess it’s safe to claim that OPUS 128 is superior to OPUS 112 and AAC 112).
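For readers curious about what a Friedman analysis does with such ratings, here is a minimal pure-Python sketch of the rank statistic. It is not the code of ff123’s or kamedo2’s tools: the scores are hypothetical, the tie correction is omitted, and the post-hoc step that produces the confidence limits on the graph is left out.

```python
def average_ranks(row):
    """Rank one sample's scores (1 = lowest); tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for index in order[pos:end + 1]:
            ranks[index] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Return (Q, rank_sums) for a samples x encoders table of scores."""
    n = len(scores)        # number of samples (blocks)
    k = len(scores[0])     # number of encoders (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, rank_sums


# Hypothetical ratings: 4 samples (rows) x 3 encoders (columns).
example = [
    [3.0, 4.0, 5.0],
    [2.5, 4.2, 4.8],
    [3.1, 3.9, 4.5],
    [2.0, 4.0, 4.9],
]
q_value, sums = friedman_statistic(example)

if __name__ == "__main__":
    print(q_value, sums)
```

Q is compared with a chi-square distribution with k − 1 degrees of freedom (critical value 5.991 for 2 degrees of freedom at the 5% level); with only four samples that approximation is rough, so the example is purely illustrative.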
AAC performs surprisingly well here with classical music. Except at 80 kbps, Apple’s AAC encoder offers performance comparable to OPUS for my taste and for these first 75 samples dedicated to classical music. And AAC gets this excellent performance with a slightly lower bitrate! I must say that I’m a bit disappointed by OPUS, which I expected to perform better, but not totally surprised either: I already noticed that OPUS at 64 kbps was much less impressive with classical than with other musical genres (https://hydrogenaud.io/index.php?topic=119333.0) (score: 3.31 for classical and 4.06 otherwise). I’m quite sure that OPUS will end with a better score (and AAC with a lesser one) with the second group of samples.
What also surprises me is AAC at 144 kbps being statistically better than AAC 128. As the latter is close to transparency on almost everything in daily playback, I couldn’t imagine a consistently better performance. But those extra bits are most often noticeable, on direct comparison at least. I’m sure that this difference is much more subtle in daily listening, but it can’t be fully ignored. From a practical point of view, AAC -q73 (≈144 kbps) and OPUS 128 (≈137 kbps here) have nearly the same bitrate, but AAC’s score is 0.25 higher. In other words, Apple’s AAC can appear a bit more efficient than OPUS at mid bitrates, with classical music at least.
This part of the test also shows how quickly and strongly AAC suffers from bitrate starvation. While OPUS and AAC get the exact same score at 112 kbps, AAC starts to show some limits at 96 kbps and then quickly drops from that point. The gap between 96 kbps and 80 kbps is huge with AAC (from 3.94 to 3.01). And during my previous test at 64 kbps with the same samples, LC-AAC fell very low, at 2.20. So OPUS is clearly much stronger at lower bitrates than LC-AAC and is considerably more acceptable at 80 kbps… and even at 64 kbps according to my previous test (https://hydrogenaud.io/index.php?topic=119333.0) (score = 3.31 at 64 kbps).
As an aside, I checked the results of kamedo2’s test. He tested different AAC encoders (FDK, FAAC and ffmpeg) and different samples, so the results aren’t strictly comparable. The gaps he got between 96 and 128 kbps are:
• FAAC = 0.56 (3.76-3.20)
• ffmpeg = 0.47 (3.44-2.97)
• FDK = 0.78
and my result with Apple’s AAC:
• Apple = 0.72
The gap for the two advanced and mature AAC encoders (FDK and Apple) is nearly the same and is rather huge. It suggests that the AAC-LC sweet spot is higher than 100 kbps and probably lies in the 112 to 128 kbps range.
From an audible point of view, there is not much to say at 112 and 128 kbps: OPUS and AAC are very similar and don’t have strong flaws. The differences are easier to catch below that. Smearing is sometimes higher with AAC, sometimes higher with OPUS. AAC suffers from traditional lossy flaws which are most often obvious at 80 kbps: distortions, like missing content in the spectrum, sometimes sounding like noise reduction filters. It’s very “MPEGish”, I would say, because it’s very similar to MP3 (and, as far as I remember, to WMA, which doesn’t come from MPEG). OPUS, on the other hand, has far fewer of these artifacts. The encoder replaces details with noise, often giving a coarse, hissy sound. At 80 kbps it sometimes feels a bit narrower, with some alteration of the stereo imaging. The sound is therefore confusing and a bit tiring. But at 80 kbps those issues are less irritating than what AAC produces. At 96 kbps AAC produces a cleaner output but can suffer from other distortions. I suspect the OPUS algorithms are more efficient with louder music, so I have to go back to ABC/HR and test the 75 other samples to confirm it or not :)
A few words about the low anchor: it’s HE-AAC with parametric stereo. I don’t know yet how it sounds with pop/rock music, but with classical the main issue is stereo imaging. It’s out of phase and wrong, giving me a headache in less than 30 seconds. I often checked the jack of my headphones, thinking it was partially out. I suspect it gives a better illusion without headphones and on a small device like a radio.
To answer my questions: now what can I use for daily listening?
Opus 96 seems very efficient and it ends with a score slightly higher than 4 (4.06), which literally means perceptible but not annoying. It should be good enough, no? Well, I’m not so sure. Let’s take another look at the results, and put some markers on the graph:
• With OPUS --bitrate 96, 36 samples out of 75 were rated ≤4.0. In other words, 48% of the tested samples showed some real annoyance. It’s not very good and it confirms a basic experience: listen to a (classical) disc encoded at 96 kbps, and audible distortions and artifacts constantly spoil the pleasure.
• At 112 kbps, OPUS and AAC end with the exact same score. 26 samples out of 75 encoded with OPUS 112 were still more or less annoying (35% of the tested samples). AAC at the same bitrate is slightly worse: 29 samples rated ≤4.0 (39%). Again, it’s not satisfying.
• 128 kbps encoding is more reassuring. Both Opus and AAC suffered a bit, with 14 samples out of 75 (still 19%…). Only 3 samples are between 3.00 and 3.50 with Apple’s AAC, while OPUS still has 8 samples in that range. In other words, Apple’s AAC -q64 has a better score and fewer severe issues (and a smaller bitrate: -7.5 kbps).
• AAC at 144 kbps offers an interesting performance: only 4 samples score 4.00 or below, and the lowest score is 3.7. This performance is much closer to what I expect from an annoyance-free setting. The final score of 4.85 is also quite high!
=> So my choice would be either AAC at 128 kbps or AAC at 144 kbps.
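The shares in the list above are simple counts of samples rated at or below 4.0. Here is a minimal sketch of that bookkeeping; the counts are the ones quoted in the list, while the per-sample scores in the second half are hypothetical and only show the helper in use.

```python
def share_at_or_below(scores, threshold=4.0):
    """Return (count, percent) of samples scoring at or below `threshold`."""
    count = sum(1 for score in scores if score <= threshold)
    return count, 100 * count / len(scores)


# Counts of samples rated <= 4.0 as quoted in the list, out of 75 samples.
TOTAL_SAMPLES = 75
quoted_counts = {
    "OPUS 96": 36,
    "OPUS 112": 26,
    "AAC 112": 29,
    "OPUS 128 / AAC 128": 14,
    "AAC 144": 4,
}
percentages = {
    name: round(100 * count / TOTAL_SAMPLES)
    for name, count in quoted_counts.items()
}

# Hypothetical per-sample scores for one encoder.
example_scores = [4.5, 3.8, 5.0, 4.0, 4.7]
example_count, example_percent = share_at_or_below(example_scores)

if __name__ == "__main__":
    for name, percent in percentages.items():
        print(f"{name}: {percent}% of samples rated <= 4.0")
```

Looking at this share alongside the mean score helps, because a mean just above 4 can still hide a large number of annoying samples.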
One more thing: how did subjective performance change in 15 years?
My hearing is probably not as good anymore, the sample database is different, and the encoders have improved. A direct comparison between the two tests is therefore questionable. The 2005 graph for classical music (150 samples) is the following one:
Apple's AAC was rated at 4.22. Remember, the encoder had just implemented VBR (which seemed to be CVBR at the time because the bitrate didn't go below 128 kbps). Today it's 4.66!
The high anchor was LAME --preset standard (or -V2 now) and was rated at 4.58, being statistically better than the competitors at 128 kbps. Today both AAC and OPUS at ∼130 kbps seem to perform as well as MP3 VBR at 192 kbps. This assumption was also somewhat confirmed by another listening test (https://hydrogenaud.io/index.php?topic=117489.0) made last year by kamedo2:
OPUS also seems to outperform its ancestor, Ogg Vorbis.
So, the first part of my test is now over… I'll take a short break and then begin the second part :) It will probably take a few weeks.
In the meantime, comments are welcome!
Thank you very much for your efforts! Posts like these are the reason I still keep returning to HA after all these years.
The results are very interesting and useful for choosing a good setting for everyday listening. I also listen mainly to classical music and your findings agree with the (very limited) testing I did myself.
Your tests seem to fit my views for electronic music. With AAC on its own, 160k = 4.80, Vorbis = 4.50. Not to get off topic, but has there ever been a one-off 256 kbps test, or even a silly 480 kbps test?
You certainly did. Thank you!
Edit: So nice to see you guys still at it, proving palpable progress has been made (still is, as xHE-AAC/USAC is proving it now) in lossy audio coding - a field many a self-proclaimed golden ear have keenly dismissed as being stuck ages ago.
...and to think most people outside communities like HA, showing a stiff upper lip towards lossy audio formats, haven't the faintest idea it's all they've been listening to all this time, whenever playing their beloved videos - on and offline - or watching digital TV.
Sometimes I think this "quest for the best" culture has become, thanks to the power Google affords us all, inherent to the rather instantaneous results we, as a society, have grown used to aim for lately - as if an end-all panacea were actually a thing!
Proof of that is the usual load of what's the best this or that... questions pervading well meaning communities like Quora.
As a simple search result can attest, such questions usually come from complete newbies to something - who don't want to "waste" their time, patience or money (from day one) by (God forbid!) learning/adopting something "wrong":
be it something as complex and time-demanding as a programming language, or more trivial decisions such as which Linux distro to adopt for personal use or even a simple calculator phone app... or lossy audio, for that matter - which, were they to pick the, again, wrong choice, might make them look like they were on some sort of losing side later on.
So much so that, back to the Linux distro option, it strikes me as odd that the distro community hub (distrowatch), in good olde "snake eats own tail" fashion, relies, of all things, on how many page views each distro has been getting from the site's own visitors to rank them on its home page, as if in some sort of (sigh!) best-of list!
With such rankings, one could be forgiven for thinking that fiddling with new stuff or being a trail blazer has become rather out of fashion these days.
Guruboolez, it seems your listening tests are getting larger and larger! Truly impressive work, very thoroughly done!
Just one question since I'm not familiar with Apple's AAC encoder: would it have been possible to use mode "q37" or "q38" instead of "q36" at the low-rate end? It seems (according to your 200+-hour bit-rate test) that this mode rarely reaches your target rate while Opus almost always exceeds that target rate. So part of the reason that AAC scores relatively low at 80 kbps could be that Opus uses ~10% more bit-rate there.
My god! When you do a listening test, you do a listening test!! Thank you so much for all your hard work.
I wish I was this meticulous with my tests. These days I mostly just invert the waveform and mix it with the original to hear what the leftover noise sounds like. I suppose that really only works for me because I'm trying to see which codec keeps the most information at the highest bit rate.
@For Serious : Good psychoacoustic encoders aren't designed for accuracy to waveform, but rather what loss is acceptable in reducing information via spectral masking or how long certain frequencies can be extended in the time domain. The kind of difference test you describe won't tell you much in terms of how human hearing works; we use ABX for that. Difference waveforms are pretty much only useful if you get a difference so low in SNR that the compressed result is comparable to 8 bits or higher lossless; but to reach that point the bit rate will be so high that you might as well use FLAC or WavPack.
Wow, haven't been to HA in a while. Vorby is no longer king and Apple has brought their lossy counterpart to the forefront apparently. What about at 60 kbps? My tin ears (not a lot of classical in the collection either) don't mind that average bitrate with Vorbis.
As a classical music listener, I can't thank you enough for this great test. I will stick to Apple AAC at 128 kbps for casual listening.
I'm late to the party, but THANK YOU for your test. Very interesting and useful results.