HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: guruboolez on 2005-07-10 19:13:53

Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-10 19:13:53
[span style='font-size:17pt;line-height:100%']I. INTRODUCTION[/span]


After the listening tests I've performed over the last two years, I've tried something new, based on this discussion (http://www.hydrogenaudio.org/forums/index.php?showtopic=34923&view=findpost&p=307329). This time, I've performed a multiformat blind comparison based on a much larger group of samples, but without ABX confirmation. The tests were still performed with a double-blind methodology: the only difference is that I haven't spent time confirming the audible differences with ABX sessions. The time saved was invested in something more interesting (to my eyes, but also for statistical analysis tools): 150 samples instead of the usual 15.

[span style='font-size:14pt;line-height:100%']1.1/ classical samples[/span]

A few words about this extravagant number. I used to perform comparisons on a limited number of classical samples (15–20). That was probably enough to draw reliable conclusions about the relative quality of various codecs, but such a limited collection couldn't represent the fullness of classical music, which consists of numerous instruments played in countless combinations, most of them offering a wide dynamic range. There are also voices, electronics, and finally all the variants linked to technical factors (acoustics, recording noise, etc.). That's why I've tried to build a structured collection of “classical music” situations, which of course doesn't aspire to completeness, but which should represent most situations. The collection is made up of very-hard-to-encode samples as well as very easy ones; loud (+10 dB) and ultra-quiet (+30 dB) ones; noisy and crystal-clear recordings; ultra-tonal and micro-detailed sounds. I've split it into four series:

Quote
artificial: electronic samples – most should correspond to critical samples for lossy encoders. Total: 5 samples.
ensemble: various instruments (no voice) played together. I've divided it into 2 categories: chamber music and orchestral music (larger ensemble). For each category, I've distinguished period instruments (Middle Ages, Renaissance, Baroque) from modern ones (~19th and 20th centuries). Total: 60 samples.
solo: an instrument played alone. Again, I've created separate categories (winds, bowed strings, plucked strings [i.e. guitar family: lute, theorbo, harp…], keyboards). Total: 55 samples.
voice: male, female, child – solo, duo and chorus. Total: 30 samples.


(note#1: all samples are deliberately short. First, it's easier to upload them. Second, there's only one acoustic phenomenon to test per sample, which makes comparisons between different tests a bit more interesting. The exact length of the collection is 25 minutes, which corresponds to 10.00 seconds per sample on average).


(note#2: all samples were named following a simple convention. The first letter (A, E, S, V) corresponds to the category (artificial, ensemble, solo, voice), and the number to the catalogue number. Then additional information is appended: nature of the instrument, type of instrument or voice, etc…

ex: S11_KEYBOARD_Harpsichord_A
ex: E35_PERIOD_CHAMBER_E_flutes_harpsichord.mpc
To keep it short, samples will be called S11, E35, etc.)
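
For anyone who wants to script against the collection, here is a tiny Python sketch of my own (the helper name is invented, not part of any tool I used) showing how such a name splits into its parts:

[code]
# Illustration only: split a sample name of the form
# <category letter><catalogue number>_<description>[.extension]
CATEGORIES = {"A": "artificial", "E": "ensemble", "S": "solo", "V": "voice"}

def parse_sample_name(name):
    base = name.rsplit(".", 1)[0]          # drop a possible extension (.mpc, .wav, ...)
    code, _, description = base.partition("_")
    return {
        "category": CATEGORIES[code[0]],   # A / E / S / V
        "number": int(code[1:]),           # catalogue number
        "description": description,        # instrument / voice information
        "short_name": code,                # S11, E35, ...
    }

print(parse_sample_name("E35_PERIOD_CHAMBER_E_flutes_harpsichord.mpc")["short_name"])   # E35
[/code]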




With such a collection, I should obtain a very precise idea of the performance of different lossy encoders on classical music. For me it's interesting, especially since I plan to buy, in the near future, a portable player supporting one of the newer audio formats, such as Vorbis, AAC or WMAPro. I'd like to know how good these new formats are compared to MP3. These 150 samples may also help developers/testers to evaluate the performance of a codec on a wide panel of situations.

[span style='font-size:14pt;line-height:100%']1.2/ various music samples[/span]

Last but not least, I've decided to give this test a wider audience by adding samples representing genres other than classical. For an elementary reason – 99.9% of my CDs are classical – I can't build the same kind of structured collection with what I will from now on call, for short, “various music”. I used all the samples selected by Roberto for his listening tests, removed the classical ones, and kept the 35 samples representing “various music”. It's much less than the 150 above, but more than double what was used in all previous collective listening tests.

=> total = 150 classical + 35 various = 185 samples.


[span style='font-size:14pt;line-height:100%']1.3/ choice of bitrate[/span]


For my first test based on these samples, I've selected a bitrate that is friendly (at least for the tester): 80 kbps. It may appear uninteresting, so I must explain my choice.
First, I plan to perform similar tests at higher bitrates. My dream is to build a coherent set of tests covering all bitrates from 80 up to 160 or 192. But this project is very ambitious – too ambitious, certainly – and I'll possibly stop my tests (in their current form) at ~130 kbps.
But why 80 and not 64 kbps? To my ears, there is currently no encoder that sounds satisfying at 64 kbps. They're all disappointing or unsuitable for listening on headphones, even cheap ones, even in an urban environment (I repeat: to my ears). But I've noticed that the perceptible and highly annoying distortions I heard at 64 kbps are seriously reduced once the bitrate reaches the next step. Vorbis has fewer problems, and AAC-LC (at least the advanced encoders) also seems to improve quickly beyond 64 kbps. It's a bit like MP3, which was considered acceptable at 128 kbps but quickly sank below this value. I would consider the *idea* of acceptable quality at 80 kbps with modern encoders to be reasonable. Let's see the facts...



[span style='font-size:17pt;line-height:100%']II. PROBLEMS[/span]



[span style='font-size:14pt;line-height:100%']2.1/ competitors[/span]

One big problem with this kind of test is the choice of competitors. Choosing the formats is easy: the tester just has to select what he considers interesting. Here, I'll exclude outdated formats (VQF, MP3Pro) and unsuitable ones (MPC, MP3 – though this last one would also be interesting to test, just as a reference...). That leaves: WMA, WMAPro (if available at this bitrate), AAC-LC, AAC-HE, Vorbis. But which implementation should I use? Nero AAC or iTunes AAC? Nero AAC features a VBR mode, but is VBR reliable at this bitrate, especially for samples with a wide dynamic range? And for Nero, which encoder would be the best: the “high” one (the default, which has verified issues with classical) or the “fast” one (which performs better with classical, but maybe not as well with various music, and which is still considered not completely mature by Nero's developers)? Vorbis CVS or Vorbis aoTuV? I'd say aoTuV, but if Vorbis fails people will (legitimately) suspect that the other one could have performed better. WMA CBR or WMA VBR? VBR is theoretically better than CBR, but tests have already shown that VBR can be unsafe at low bitrates.
My first idea was to test them all. Schnofler's ABC/HR allows the use of countless encoders in the same round (ff123's software is limited to 8 contenders). But after a quick enumeration of all possible competitors (iTunes AAC, Nero AAC CBR fast, Nero AAC CBR high, Nero AAC VBR fast, Nero AAC VBR high, faac, Vorbis aoTuV, Vorbis CVS, Vorbis ABR, WMA CBR, WMA VBR, HE-AAC fast, high, CBR & VBR...) and a mental calculation of the number of comparisons I would have to perform with 185 samples and so many contenders (something like 14 contenders × 185 samples ≈ 2,600 gradings), I immediately cancelled this project. Last but not least, multiplying the competitors in a single test would lower the significance (statistically speaking) of the results.
Then I came to a second idea: test all the competitors for a single format in a single pool, and put the winner of each pool into the final arena. It's like sports: qualification first, then the final for the best. The remaining problem is the additional work. I've planned to test 4–5 codecs per bitrate with 185 samples, not 13 or 14. That's why I've reduced the number of samples for the preliminary pools. I've limited it to 40 samples, using 25 samples coming from different categories of the complete classical collection and 15 from the 35 samples representing “various music”. The imbalance in favor of classical is intentional: the whole test is clearly focused on classical – “various music” is just an extension, a bonus.


[span style='font-size:14pt;line-height:100%']2.2/ Encoding mode and output bitrate[/span]

Another problem: VBR vs. CBR. Testing VBR against CBR has always been a source of controversy. In my opinion, testing a VBR encoder which outputs the targeted bitrate on average (i.e. over a full set of CDs) is absolutely not a problem, even if the bitrate reaches amazing values on short test samples. It's not a problem, but in my opinion the test should meet the following condition: it must include samples for which VBR encoders produce a high bitrate as well as samples for which they produce a low one. VBR encoders have the chance to automatically increase the bitrate when a difficulty is detected – a possibility that CBR encoders don't have, and they sometimes suffer from that handicap, especially on critical samples. But VBR encoders also decrease the bitrate on musical parts they don't consider difficult – and this reduction is sometimes very large; theoretically it shouldn't affect the quality, but we know the gap between theory and reality, between a principle and the implementations of that principle. Testing the output quality of the 'non-difficult' parts is therefore very important, because these samples are the possible handicap of VBR encoders; otherwise there's a big risk of favoring VBR encoders over CBR by testing only samples that are apparently favorable to VBR (whatever the format).
My classical music gallery is not exclusively based on critical or difficult samples; most of them don't exhibit any specific issue. The sample pool should therefore be fairly distributed between samples encoded at a bitrate lower than the target and samples encoded at a higher one. I'll post as an appendix a distribution curve which confirms this.
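
As a rough illustration of the kind of check I mean (a small Python sketch of my own, with invented bitrate values, not data from the test):

[code]
# Count how many samples a VBR encoder keeps below the 80 kbps target and how
# many it pushes above it: a fair pool should contain both kinds of samples.
TARGET_KBPS = 80

def distribution(per_sample_kbps):
    below = sum(1 for b in per_sample_kbps if b < TARGET_KBPS)
    return below, len(per_sample_kbps) - below

bitrates = [62, 69, 71, 75, 78, 85, 92, 104, 131, 160]   # made-up values, one per sample
below, above = distribution(bitrates)
print(below, "samples below the target,", above, "at or above it")
[/code]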

[span style='font-size:14pt;line-height:100%']2.3/ degree of tolerance[/span]

When testing VBR profiles, it's not always possible to match the exact target. Some encoders don't have a fine-grained scale of VBR settings. With luck, one available profile will approximately correspond to the chosen bitrate; sometimes, the output bitrate will deviate too much from the target. CBR is not free of problems either, although they're less important. With AAC for example, CBR is a form of ABR: the output bitrate can vary a little (but fortunately not very much).
That's why trying to obtain identical bitrates between the various contenders could be considered a utopia, even when the test is limited to CBR encoders only. The tester therefore has to allow some freedom: not too much, of course, in order to keep the comparisons meaningful, and not too little, in order to make the test possible. I consider a deviation of 10% acceptable, but again on one condition: 10% between the lowest averaged bitrate and the highest averaged one, not 10% between each encoder and the target. For example, if one encoder reached 72 kbps (80 kbps − 10%) and another 88 kbps (80 kbps + 10%), the total difference would be ~20%: too much.
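
To make the rule concrete, here is a small Python sketch of my own (just an illustration of the arithmetic, measured against the midpoint of the two extremes):

[code]
# The "10% of freedom" is measured between the lowest and the highest averaged
# bitrate, not between each encoder and the 80 kbps target.
def spread_ok(avg_bitrates_kbps, max_spread=0.10):
    low, high = min(avg_bitrates_kbps), max(avg_bitrates_kbps)
    spread = (high - low) / ((high + low) / 2)   # gap between extremes, relative to their midpoint
    return spread <= max_spread, spread

ok, spread = spread_ok([72, 88])
print(ok, round(spread * 100))   # False 20 -> too much, even though each value is within 80 kbps +/- 10%
[/code]
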
However, I will possibly allow rare exceptions: when a VBR profile is outside but close to the limit, or when it would be more interesting to test a more common profile (example: Musepack –quality 4 instead of –quality 3.9). Of course, the deviation mustn't be exaggerated; and I'll try to limit the possible exceptions to the pools, in order to keep the fairest conditions for the final test.

[span style='font-size:14pt;line-height:100%']2.4/ Bitrate evaluation for VBR encoders[/span]

Now that the rules are fixed, we have to estimate the corresponding bitrate for each VBR encoder and profile. It's not as easy as one might suppose. Ideally, I would have to encode a lot of albums with each profile. But with my slow computer, that's not really possible. And doing it would only give me the corresponding bitrate for classical; in my experience, this average bitrate can seriously differ from the output value that people listening to other music (like metal) have reported. Think about the LAME sf21 issues, which could inflate the bitrate up to 230–250 kbps with –preset standard, and compare that to the average bitrate I obtain with classical: <190 kbps! Another, different example: lossless.
For practical reasons, I followed a methodology I don't really consider acceptable, and took the average bitrate over the 185 samples as the reference for my test. I don't like it, because short samples can dramatically exaggerate the behavior of VBR encoders, and therefore distort the final estimation. Nevertheless, with 185 samples, the over- and undershooting occurring on some samples should normally be softened. And indeed, it seems that the average bitrates I obtain on the full suite with formats I've used in the past (LAME –preset standard, MPC) are very close to the average bitrate of my old music library. I can't be absolutely certain that my gallery works like a microcosm and that its bitrate matches the real usage of a full library, but I'm pretty sure that the deviation isn't significant (+/− 5%, something like that).

[span style='font-size:14pt;line-height:100%']2.5/ Bitrate report[/span]

Before starting to reveal the results, there's one last problem I'd like to put in the spotlight. It concerns the different ways of calculating the bitrate. I've tried to obtain the most reliable value, and that's why I logically thought of calculating it myself, using the file size as the basis. As long as no tags are embedded within the files, the calculated bitrate should correspond to the real one (the audio stream). But the problem lies elsewhere. Some formats are apparently embedded in complex containers, which weigh the size down. It's not a problem in real life: adding something like 30 KB to a 5 MB file is totally insignificant. But when these 30 KB are appended to very short encodings, the calculation of the average bitrate is completely distorted as a consequence. Concrete example: iTunes AAC. Just try the following: encode a sample (length: exactly one second) in CBR. At 80 kbps, we should obtain an 80 kbit, i.e. 10 KB, file (80 × 1 / 8). But the final size is 60 KB, which corresponds to a 480 kbps (60 × 8) encoding! What's the problem? Simply that iTunes adds something like 50 KB of extra chunks to each encoding. The problem can partly be solved with foobar2000 0.8 and the “optimize mp4 layout” command: the file size drops to 14 KB. But even then, these 14 KB correspond to a ~128 kbps bitrate, while the audio stream is only 80 kbps.
iTunes is apparently not alone in this situation. I haven't looked closely, but it seems that WMA (Pro) behaves the same way, and we have no “optimize WMA layout” tool to partially correct it. If we keep in mind that the average length of my samples is 10 seconds, with some of them only 5 seconds long, we have to admit that calculating the bitrate with the filesize/length formula is, for this test, anything but reliable.
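
To give an idea of the orders of magnitude (a small Python sketch of my own, reusing the 80 kbps / ~50 KB overhead figures from the iTunes example above):

[code]
# Apparent bitrate from the filesize/length formula when the container adds a
# fixed amount of metadata: the shorter the sample, the bigger the distortion.
def apparent_bitrate_kbps(audio_kbps, length_s, overhead_kB):
    audio_kB = audio_kbps * length_s / 8            # size of the audio stream itself
    return (audio_kB + overhead_kB) * 8 / length_s

print(apparent_bitrate_kbps(80, 1, 50))     # 1 s sample   -> 480 kbps, as observed with iTunes
print(apparent_bitrate_kbps(80, 10, 50))    # 10 s sample  -> 120 kbps, still far off
print(apparent_bitrate_kbps(80, 300, 50))   # 5 min track  -> ~81 kbps, overhead becomes negligible
[/code]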

That's why I relied on the values calculated by specialized software. MrQuestionMan 0.7 was released during my test, but that software has some issues calculating a correct average bitrate on short encodings (iTunes AAC encodings, for example). Foobar2000 appeared to be the most reliable tool, and I decided to trust its calculated value. For practical reasons, foobar2000 is also preferable: the “copy name” command can be configured to easily export bitrates to a spreadsheet.

[span style='font-size:14pt;line-height:100%']2.6/ notation and scale[/span]

The -really- last problem.
Each time I have to evaluate quality at low bitrates, I regret how ill-suited the scale used in ABC/HR is. At 80 kbps, encodings would rarely reach the 4.0 grade (“slight but not annoying difference”). 3.0 (“slightly annoying”) would rather be the best grade that modern encoders can obtain at this bitrate. It implies that the ratings will fluctuate within a compressed scale, from 1.0 to 3.0. That's not much, especially when the tester notices big differences in quality between contenders.
To solve this issue, I simply shifted the visible scale down by one point in my head. Example: when I considered an encoding to be “annoying” (the grade corresponding to 2.0), I put the slider at 3.0. The scale I used for the test was:
5.0 : “perceptible but not annoying”
4.0 : “slightly annoying”
3.0 : “annoying”
2.0 : “very annoying”
1.0 : “totally crap”

If, exceptionally, an encoding appeared to correspond to “perceptible but not annoying”, I put the slider at 4.9, which means “5.0”; if the quality was superior to this grade, I wrote the exact rating in the comments. A transparent encoding obtained 6.0.
When the tests were finished, I subtracted one point from all ratings. 6.0 became 5.0, 3.4 -> 2.4, and 1.0 was transformed into a shameful 0.0! By doing this, I maintain the usual scale; the only change is therefore a lower floor, corresponding to exceptionally bad quality.
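
In other words (a trivial Python sketch of my own, using the example ratings above):

[code]
# Raw scores were given on the mentally shifted 1.0-6.0 scale; subtracting one
# point restores the usual ABC/HR meaning, with 0.0 as the new "exceptionally bad" floor.
def restore_scale(raw_scores):
    return [round(s - 1.0, 1) for s in raw_scores]

print(restore_scale([6.0, 3.4, 1.0]))   # [5.0, 2.4, 0.0]
[/code]
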
The quality scale can be redefined directly within Schnofler's ABC/HR software, but apparently the tester has to type the descriptions again for each new test (did I miss an option?); it was faster for me to do this small mental exercise than to type the same text more than 200 times.


Now, the pools!
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-10 19:14:34
[span style='font-size:17pt;line-height:100%']III. PRELIMINARY POOLS[/span]




[span style='font-size:14pt;line-height:100%']POOL#1: Nero AAC-LC[/span]

Nero currently offers the widest support for AAC: two different encoders (hidden behind the names “high” and “fast”), and for each of them, support for CBR and VBR. The purpose of this first pool is to establish which one can be considered the most trustworthy AAC encoding solution from Nero. I didn't include the “fast” encoder in VBR mode: the “radio” profile targets a bitrate below 70 kbps and the next profile (“internet”) reaches ~140 kbps.
Contenders for this first pool:

• Nero AAC Codec 3.2.0.15 “fast” CBR 80
• Nero AAC Codec 3.2.0.15 “high” CBR 80
• Nero AAC Codec 3.2.0.15 “high” VBR ::radio:: [87 kbps on 185 samples]


[span style='font-size:14pt;line-height:100%']POOL#2: AAC-LC – faac, iTunes & Nero[/span]

My iTunes encoder is based on QuickTime 7.0.2 (it appears in the MP4 metadata).

• faac 1.24.1 –q70 [84 kbps on 185 samples]
• iTunes v4.9.0.17 / QuickTime 7.0.2 CBR 80
• Nero AAC Codec 3.2.0.15 “fast” CBR 80


(http://audiotests.free.fr/tests/2005.07/80/80TEST_PLOTS_02.png)

(analytic results are available > here (http://audiotests.free.fr/tests/2005.07/80/80TEST_RESULTS02.png)<)


• faac offers very poor quality in its current state: the more aggressive lowpass (13 kHz, vs. 14 kHz for Nero & 15 kHz for iTunes) is often audible. More annoying: severe distortions affect most of the tested files. Warbling is also often audible. I recall that faac suffers from warbling on some tonal samples up to –q500! However, this severe comparison shows how much AAC can still be improved at low bitrates.

• iTunes AAC offers very similar quality from one group to the other. Classical music and various music are encoded with approximately the same quality. Obviously, iTunes AAC is well balanced. Quality is very similar to Nero on classical (warning: “fast” encoder only – “high” is crappy here), and slightly better (but without a 95% confidence level) on various music.

• Nero AAC “fast” is less balanced than its contender. No surprise: that was already revealed during the first pool. However, quality is very close to iTunes; a small difference remains, at least for me and for the 40 tested samples.

=> iTunes AAC is qualified for the final comparison



[span style='font-size:14pt;line-height:100%']POOL#3: AAC-HE – Nero[/span]


There are several AAC-HE implementations available. Apple's is still not available on Windows. I didn't test Real's (it requires Producer). Therefore, I only have Nero to test. But that means two different encoders, with two different modes (CBR, VBR): four combinations. However, VBR can't be tested here: the highest VBR settings of both encoders output a low bitrate (60 kbps, too far from the target – see the bitrate table (http://audiotests.free.fr/tests/2005.07/80/80TEST_BITRATETABLE.png)).

• Nero AAC Codec 3.2.0.15 “fast” CBR 80
• Nero AAC Codec 3.2.0.15 “high” CBR 80


(http://audiotests.free.fr/tests/2005.07/80/80TEST_PLOTS_03.png)

(analytic results are available > here (http://audiotests.free.fr/tests/2005.07/80/80TEST_RESULTS03.png)<)


This time, the “fast” encoder isn't superior anymore on classical music (group 1), but it still shows a slight regression on various music (group 2). People might also note the very low average ratings for both encoders. Nero's AAC-HE suffers from several artifacts, typical of SBR I would say, which constantly annoyed me. The only difference I noticed between the two encoders was a tiny reduction in the level of one audible artifact (a sandy sound).



[span style='font-size:14pt;line-height:100%']POOL#4: Ogg Vorbis – 1.1.1 & aoTuV beta 4[/span]


This big listening test was a good occasion to compare the modifications introduced by Aoyumi to the 1.1.0 core (an aoTuV beta 4 based on 1.1.1 wasn't yet released when I performed this pool). It also gave me the opportunity to evaluate the performance of ABR (not recommended) against VBR. I don't expect anything from ABR, but surprises are always possible with lossy encodings.
Unfortunately, aoTuV and 1.1.1 don't output the same bitrate (see the bitrate table (http://audiotests.free.fr/tests/2005.07/80/80TEST_BITRATETABLE.png)). The difference isn't that big, but it might favor aoTuV in the results. I'd prefer to compare both on a near-identical basis, in order to see which one is really the best. That's why I kept aoTuV at –q1 and increased 1.1.1 to match the same bitrate. –q1,5 was very close (and it's a semi-round number: I prefer that over eccentric settings like –q1,38 or something like that).

• vorbis 1.1.1 –q 1,5
• vorbis 1.1.1 ABR 83 (obtained with John33’s OggDropXPd)
• vorbis aoTuV beta 4 –q 1,00


(http://audiotests.free.fr/tests/2005.07/80/80TEST_PLOTS_04.png)

(analytic results are available > here (http://audiotests.free.fr/tests/2005.07/80/80TEST_RESULTS04.png)<)


• on group 1, all encoders are tied (although aoTuV is better than 1.1.1 with 90% confidence). It's a disappointment for me, because I seriously expected aoTuV to reduce the level of coarseness/fatness on this specific musical genre. However, slight improvements were often perceptible – it's better than nothing. On some samples, a slight regression was also perceptible: additional distortion, or an apparently more restrictive lowpass (noticed with harpsichord). It's interesting to note that ABR doesn't perform badly, except on critical samples (its bitrate stayed at ~85 kbps where the VBR encodings reached 160!); ABR also sounded a bit better on some samples (tonal ones). A good point for ABR (just note that encoding speed is dramatically slow compared to VBR).

• on group 2, the differences are much more defined. ABR appeared clearly worse than VBR, and aoTuV beta 4 outdid 1.1.1 in VBR mode. Obviously, the changes Aoyumi made to Vorbis are much more effective on various music.

=> on average, aoTuV beta 4 was better than 1.1.1 (not a surprise, I would say), and therefore advances to the final.


[span style='font-size:14pt;line-height:100%']POOL#5: WMA 9.1 Standard[/span]


WMA9Pro offers a minimal CBR setting of 128 kbps; on the VBR side, Q10 outputs ~68 kbps and the next step (Q25) ~110 kbps. WMA9Pro therefore can't compete in this test.
I've therefore limited the test of Microsoft products to WMA9 Standard. It's the only one that can be played on DAPs, and the number of manufacturers supporting WMA Std is countless. WMA is supposed to offer better quality than MP3 at this bitrate, so it's interesting to see how this format really performs (at least, I will see it). I compared CBR to VBR. VBR Q25 produced a nice, round 80 kbps on the 185 samples. However, people should keep in mind that the bitrate was lower on the 150 classical samples (76 kbps) and higher on the 35 various music ones (88 kbps).

• Windows Media Audio 9.1 CBR 80
• Windows Media Audio 9.1 VBR Q25


(http://audiotests.free.fr/tests/2005.07/80/80TEST_PLOTS_05.png)

(analytic results are available > here (http://audiotests.free.fr/tests/2005.07/80/80TEST_RESULTS05.png)<)

• CBR 80 was slightly inferior to VBR Q25 on classical music. It's a good point for VBR, because wide-dynamics samples are often harder to handle at low bitrates with VBR than with CBR. It might be interesting to recall that this better performance was obtained with a (slightly) lower bitrate.

• with group 2, the difference is much more contrasted. CBR 80 performed very poorly on various music, whereas VBR showed significant progress. Microsoft clearly improved its product with VBR. VBR offers the most balanced results between the two groups (1.79 & 1.67), whereas CBR is obviously unbalanced (1.57 vs 1.07) in favor of classical music.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-10 19:15:45
[span style=\'font-size:17pt;line-height:100%\']IV. FINAL TEST: AAC vs Vorbis vs WMA vs MP3[/span]


The following contenders have reached this final:

• AAC-HE : Nero AAC 3.2.0.15 CBR 80 ‘high’
• AAC-LC : iTunes v4.9.0.17 / QuickTime 7.0.2 CBR 80
• Ogg Vorbis : aoTuV beta 4 VBR –q1
• WMA Standard : Serie 9.1, VBR Q25

At this stage, there's one slight problem: the Vorbis bitrate is a bit higher than the other contenders' (~83 kbps). It's not really a problem: 3 kbps can't lead to a significant difference. However, I know that some people like to whine, especially when their favorite encoder doesn't appear to win each listening test by a wide margin. That's why I decided not to use Vorbis aoTuV at –q1, but to set it to –q0,9. The bitrate is now very close to the other contenders' – a bit too low for classical and a bit higher than 80 for various music. Interested people can refer to the bitrate table (http://audiotests.free.fr/tests/2005.07/80/80TEST_BITRATETABLE.png).

To complete this test, I've also added two anchors:

• as the high anchor, I considered MP3 at 128 kbps to be the most profitable choice: good enough to play the role of high anchor, and also an interesting reference. Most editors or vendors like to claim that modern encoders can perform as well as MP3@128 at half the bitrate. Here we will see whether the best implementation of each audio format can do it at only 60% of the MP3 bitrate. I decided to maximize the quality of this anchor: ABR and LAME 3.97a10. The setting was --preset 131 in order to match 128 kbps (126 in fact).

• for the low anchor, I hesitated. Finally, I decided to use MP3 again, at 80 kbps. Quality should theoretically be low enough (though I had serious doubts before starting the test); I also consider it very important to obtain a direct comparison between old MP3 and the new competitors, theoretically much better at such a low bitrate. Again, I used LAME 3.97a10 and --preset 82.

[span style=\'font-size:14pt;line-height:100%\']RECAPITULATION[/span]

AAC-HE, CBR  •  Group 1 : 76 kbps —— Group 2 : 78 kbps
AAC-LC, CBR  •  Group 1 : 80 kbps —— Group 2 : 80 kbps
Ogg Vorbis, VBR  •  Group 1 : 76 kbps —— Group 2 : 83 kbps
WMA Std, VBR  •  Group 1 : 76 kbps —— Group 2 : 88 kbps
and, out of competition:
MP3, ABR  •  Group 1 : 78 kbps —— Group 2 : 80 kbps
MP3, ABR  •  Group 1 : 124 kbps —— Group 2 : 128 kbps

(N.B. the indicated bitrate values all correspond to the average calculated by foobar2000, which weights the calculation by the length of each sample. These values slightly differ from the averages calculated with Excel and reported at the bottom of my bitrate table (http://audiotests.free.fr/tests/2005.07/80/80TEST_BITRATETABLE.png)).
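
To illustrate the difference between the two kinds of averages (a small Python sketch of my own, with invented numbers, not values from the test):

[code]
# foobar2000 weights each sample's bitrate by its length; a plain spreadsheet
# average treats every sample equally, so short high-bitrate samples count more.
def simple_mean(samples):
    return sum(kbps for kbps, _ in samples) / len(samples)

def length_weighted_mean(samples):
    total_bits = sum(kbps * secs for kbps, secs in samples)
    return total_bits / sum(secs for _, secs in samples)

samples = [(120, 5.0), (75, 12.0), (82, 10.0)]   # (bitrate in kbps, length in seconds)
print(round(simple_mean(samples), 1), round(length_weighted_mean(samples), 1))   # 92.3 85.9
[/code]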

[span style=\'font-size:21pt;line-height:100%\']RESULTS[/span]






[span style=\'font-size:14pt;line-height:100%\']ANALYTICAL COMMENTS[/span]


• MP3 LAME, 80 kbps: as the low anchor, there's not much to comment on. Quality is poor, but not as poor as expected. Indeed, I was sometimes surprised by the quality obtained with MP3 at this setting: this format can handle some 'easy' samples decently (more easily, I would say, than some competitors). In rare cases, the low anchor obtained a better rating than the high one. I thought it was a mistake, but when I checked it later, I confirmed it. The reason is simple: LAME 3.97 has warbling problems on some samples; this warbling was a bit more annoying to my taste than the lowpass/resampling of the 80 kbps encoding, and that's how an 80 kbps encoding obtained a better rating than a 128 kbps one.

• AAC-HE (Nero, CBR 80 “high”): a very disappointing score for a format claimed to be a killer at low bitrates. 80 kbps is probably excessive for AAC-HE, now that AAC-LC implementations are getting better and better (take another look at POOL#1 and see how much AAC-LC has progressed). AAC-HE doesn't suffer from any lowpass, but the SBR layer is highly impure, and seems to interfere with the lowest part of the spectrum. As a result I get constant artefacts, noticed on more than 90% of the tested samples. AAC-HE may have a CD-like spectrum, but it's as if a cricket were screeching directly in my headphones. Personally, I would consider something poorer (with an audible lowpass and some ringing) better than this constant parasitical noise. Just a personal appreciation; other people might prefer the opposite – I don't know. AAC-HE also has *big* trouble with attacks (pre-echo) and fine details (smearing), even more audible than with plain MP3. AAC-HE would probably be more pertinent at lower bitrates, where the other contenders would probably struggle.

• AAC-LC (iTunes, CBR 80): poor results. I expected something better, a bit more suitable for listening on a portable player. Quality is not *that* bad (just compare it to MP3 or WMA for reference), but there are too often irritating distortions. The lowpass is also annoying, at least under ABC/HR conditions (with a direct comparison against a full-quality reference); it is probably less perceptible on common earbuds (I've tried, and the quality suddenly became much less irritating).

• Vorbis (aoTuV beta 4, VBR –q 0,9): this is by far the most enjoyable thing I've heard at this bitrate. I was highly surprised by the results I got with the 150 classical samples; I was literally astonished by the final score obtained with the 35 remaining ones! Vorbis is obviously an amazing tool at this bitrate. Put differently: Vorbis apparently embeds some encoding tools (point stereo?) which are remarkably well suited to this bitrate (but which maybe interfere too much at higher bitrates: see this test (http://www.hydrogenaudio.org/forums/index.php?showtopic=18359&hl=) and this test (http://www.hydrogenaudio.org/forums/index.php?showtopic=29925&hl=)).
Quality is not perfect of course; the usual Vorbis problems are here: noise boost, coarseness, fatness. Distortion (a vibrating effect) on long tonal notes also occurs. But these issues are limited (at least compared to the mutilations produced by other encoding tools at this bitrate), and I would say that Vorbis at this bitrate could pertinently be used for portable playback by people who are not excessively hard to please and are more interested in maximizing the capacity of their flash player. It's too bad for me that Vorbis's performance is not as good with classical as with “various music”. But even on this Achilles' heel, Vorbis outperforms the other current encoding tools.

• Windows Media Audio (9.1 Standard, VBR Q25): the performance is well balanced... in weakness. WMA shares the same problems as MP3 at a similar bitrate: an audible lowpass (13 kHz) and a lot of distortions accompanied by many artefacts. WMA is sometimes better than MP3, sometimes worse (especially with classical; a bit less so with various music – but I recall that WMA VBR outputs something like ~88 kbps on various music, and that MP3 was tested at a bitrate about 10% lower). I suppose that WMA would gain some quality by using automatic resampling like LAME does: in my experience, it helps the encoder limit the amplitude of some artefacts. I've often read that WMA should be preferred to MP3 for encodings below 128 kbps; these results could call that into question. Can't we expect MP3 to perform as well as, if not better than, WMA at 96 kbps? Answer in a few weeks, in my next listening test.

• MP3 LAME, ABR 128 kbps: the high anchor, perfect in this role. Quality seems to live in another universe compared to all the modern audio formats – at a bitrate that is of course not comparable. But it should indicate how optimistic (should I say “biased”) the claims of most software editors are, who don't hesitate to proclaim a 50% efficiency gain over MP3. I also recall that “MP3 at 128 kbps” doesn't necessarily mean “LAME at 128 kbps”. Compared to less efficient implementations of the format, modern AAC and Vorbis encoders could perform as well (and probably better in Vorbis's case).






To finish: well, this test was long to perform, but also enjoyable. Blind-testing 150 samples is less boring than testing 15 of them with fastidious and sometimes pointless ABX sessions (pointless when the difference is really obvious). People might be surprised – and even uninterested – by the low bitrate tested here. I recall that my purpose wasn't to evaluate encoders at near-transparency settings, but to see if I could get something decent at 80 kbps, and to evaluate with precision which encoding tool could safely be considered, to my ears, the best.

I also recall that testing 185 samples, even varied ones, even under double-blind conditions, doesn't remove one important limitation of such a test: the results correspond to my own subjectivity. It's important to remember this, especially at low bitrates. Why at low bitrates? Simply because the tester has to evaluate two things: the amount of degradation and the kind of degradation. The distortion introduced at low bitrates can take different shapes: lowpass, ringing, coarseness, noise boost, metallic coloration, etc. One tester could be tolerant of one kind of distortion, whereas another one could hate it (people who followed the old debate about RV9/RV10 blurring vs. MPEG-4 blocking probably know what I mean).
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-10 19:16:36
[span style='font-size:17pt;line-height:100%']V. APPENDIX[/span]


Very short this time

• If people have missed it, they could analyse the bitrate table:
(http://audiotests.free.fr/tests/2005.07/80/80TEST_BITRATETABLE.png)

[EDIT: of course I missed someone: John33...]
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-10 20:34:35
Awesome, as always!

Thanks, Guru.
Title: 80 kbps personal listening test (summer 2005)
Post by: SirGrey on 2005-07-10 21:02:50
Thanks a lot, guruboolez, as always 

To be honest, I didn't expect such a difference between the latest Ogg & Apple LC-AAC implementations.
Very interesting results...

EDIT: BTW, it seems that the Nero encoder has had no major updates in the last 12 months...
What are they waiting for, X-mas? ®Duke Nukem   
Title: 80 kbps personal listening test (summer 2005)
Post by: spoon on 2005-07-10 21:08:29
RE Bitrate problems: if your samples are 10 seconds could you create one large sample using that 10 seconds looping over and over 10 times, encode and then calc the bitrate and divide by 10?
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-10 21:25:23
Quote
RE Bitrate problems: if your samples are 10 seconds could you create one large sample using that 10 seconds looping over and over 10 times, encode and then calc the bitrate and divide by 10?
[a href="index.php?act=findpost&pid=312329"][{POST_SNAPBACK}][/a]

I don't understand. Could you explain?
Title: 80 kbps personal listening test (summer 2005)
Post by: Pri3st on 2005-07-10 22:06:20
Amazing work!

Thanks for all that information.
Title: 80 kbps personal listening test (summer 2005)
Post by: ExUser on 2005-07-10 22:12:24
Wow, guru. Even after debunking my invalid assertions you find time to do things like this. Good job. 
Title: 80 kbps personal listening test (summer 2005)
Post by: sehested on 2005-07-10 22:20:34
Great work guruboolez!
Title: 80 kbps personal listening test (summer 2005)
Post by: music_man_mpc on 2005-07-10 22:29:03
Merci beaucoup Guru!
Title: 80 kbps personal listening test (summer 2005)
Post by: bond on 2005-07-10 22:40:02
i also have to say i am impressed! great work and very interesting results
Title: 80 kbps personal listening test (summer 2005)
Post by: Megaman on 2005-07-10 23:07:21
Thank you for taking the time to test the latest encoders!. Excellent post.
I wish there were many Aoyumis around the world
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-10 23:13:09
What impressed me the most was Vorbis' performance, even compared to "state of the art" HE-AAC (even though Guruboolez' tastes probably played a big role in those results). If anything, that's yet another proof of Aoyumi's enormous talent.
Title: 80 kbps personal listening test (summer 2005)
Post by: nyaochi on 2005-07-11 00:15:57
Thank you very much for the fabulous test, guruboolez! I didn't expect such a huge difference between Vorbis and other codecs. And Aoyumi's effort becomes obvious.
Title: 80 kbps personal listening test (summer 2005)
Post by: IgorC on 2005-07-11 00:17:26
Great job. I agree with most of the statements. And HE-AAC doesn't sound great compared to Ogg and AAC-LC. Indeed, at lower bitrates the situation is different.
It would be interesting to see 64 kbit/s tested the same way, and see how good Ogg is compared to enhanced HE-AAC.
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-11 00:35:04
Quote
It would be interesting to see 64 kbit/s tested the same way, and see how good Ogg is compared to enhanced HE-AAC.[a href="index.php?act=findpost&pid=312382"][{POST_SNAPBACK}][/a]


I agree. Vorbis has progressed immensely since my 64kbps test (that featured it at pretty much version 1.0). It would be interesting to see how it competes now.

If only Apple released their HE AAC encoder already, that test would not be only interesting - it would be necessary.

Oh well...
Title: 80 kbps personal listening test (summer 2005)
Post by: Razor70 on 2005-07-11 01:05:18
So can I be the noob and stupid one here and ask..what does this tell us?  Why do all the tests always run towards the lower bitrates and not the higher bitrates?  Okay you can flame me now lol.
Title: 80 kbps personal listening test (summer 2005)
Post by: ff123 on 2005-07-11 01:25:12
Fantastic work, guru!  Hats off to you and all the developers.

ff123
Title: 80 kbps personal listening test (summer 2005)
Post by: Jojo on 2005-07-11 01:27:04
wow! not sure what to say...it's just amazing!
Title: 80 kbps personal listening test (summer 2005)
Post by: HotshotGG on 2005-07-11 02:35:56
Quote
So can I be the noob and stupid one here and ask..what does this tell us? Why do all the tests always run towards the lower bitrates and not the higher bitrates? Okay you can flame me now lol.


It's more difficult to actually hear any substantial differences between codecs. Guruboolez is really the only one around here who has golden ears  . Personally I couldn't really tell the difference beyond -q 5 and up with Vorbis, but that's just me I am sure some folks have found some problem case samples. 
Title: 80 kbps personal listening test (summer 2005)
Post by: Danimal on 2005-07-11 02:43:17
Quote
So can I be the noob and stupid one here and ask..what does this tell us?  Why do all the tests always run towards the lower bitrates and not the higher bitrates?  Okay you can flame me now lol.
[a href="index.php?act=findpost&pid=312397"][{POST_SNAPBACK}][/a]


At  higher bitrates it becomes much more difficult to tell any difference between the various codecs except on specific problem samples.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 02:43:25
Quote
Indeed, at lower bitrates the situation is different.
It would be interesting to see 64 kbit/s tested the same way, and see how good Ogg is compared to enhanced HE-AAC.

I hope that a collective listening test will start soon.
I must say that the poor results of HE-AAC don't really surprise me. In my opinion (at least to my ears), Vorbis quality drops quickly below ~70–80 kbps, whereas HE-AAC performance probably stagnates above 50–60 kbps.

I suspect that the amazing difference between Vorbis and the other contenders is very specific to this bitrate. At 96 kbps, AAC-LC is probably stronger (much stronger?) than what I heard in this test; at 64 kbps, AAC-HE's screeching artefacts are probably more acceptable when compared to the other forms of distortion audible with non-SBR products.

It's just a suspicion. To confirm or refute it, I'll probably start the second test very soon, and evaluate the relative quality of these contenders at 96 kbps. It should be less fastidious (fewer pools). Then 128 kbps should follow (August, if I'm motivated enough). That one will be much harder.
Title: 80 kbps personal listening test (summer 2005)
Post by: Destroid on 2005-07-11 02:44:51
Quote
So can I be the noob and stupid one here and ask..what does this tell us?  Why do all the tests always run towards the lower bitrates and not the higher bitrates?  Okay you can flame me now lol.
[a href="index.php?act=findpost&pid=312397"][{POST_SNAPBACK}][/a]


And also I find it interesting to see results in low-bitrate encodes when there are many encoders claiming CD quality. Guru has very good/critical listening abilities
Title: 80 kbps personal listening test (summer 2005)
Post by: Razor70 on 2005-07-11 02:45:51
Quote
Quote
So can I be the noob and stupid one here and ask..what does this tell us? Why do all the tests always run towards the lower bitrates and not the higher bitrates? Okay you can flame me now lol.


It's more difficult to actually hear any substantial differences between codecs. Guruboolez is really the only one around here who has golden ears  . Personally I couldn't really tell the difference beyond -q 5 and up with Vorbis, but that's just me I am sure some folks have found some problem case samples. 
[a href="index.php?act=findpost&pid=312419"][{POST_SNAPBACK}][/a]


Ok, another stupid question on my part then (sorry, I know this is all noob stuff and that's what I am): is using lower bitrates a better thing then? Or is it just used to save space? I get so lost on quality issues that I don't know what format to use. I have an iPod and would like to get the best quality for use on it, and I don't want to go lossless... so this is where I get confused about formats. I know the thing to do is ABX myself to find what I think sounds best, but isn't there a consensus on one format over another that would be the best format? Right now I am using 128AAC but the only reason is because I see it as a good go between for space and quality. But I really do want the best quality I can get. So any help would be appreciated guys.
Title: 80 kbps personal listening test (summer 2005)
Post by: sld on 2005-07-11 02:46:05
Wow... as a satisfied user of Vorbis for my flash player, what more can I say?

3 thumbs up!
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 02:48:21
Quote
Guruboolez is really the only one around here who has golden ears.
I won't say that. I'm just trained to catch artefacts and distortions (at least some of them). I'm rather an artefact hunter than a blessed audiophile.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 02:52:25
Quote
Right now I am using 128AAC but the only reason is because I see it as a good go between for space and quality.  But I really do want the best quality I can get.  So any help would be appreciated guys.


Right now, you have two possibilities:

- keeping your current setting. If you're happy with 128 kbps encodings, you won't get any audible benefit from higher bitrate.

- if you really want the absolutely best quality with AAC, just go with Nero AAC and set the bitrate to CBR 448 kbps. It's totally insane, but you'll obtain what you've asked for: "the best quality I can get".
Title: 80 kbps personal listening test (summer 2005)
Post by: Enig123 on 2005-07-11 03:02:01
Guru, you always bring us such brilliant articles. Very impressive and convincing.
Title: 80 kbps personal listening test (summer 2005)
Post by: HotshotGG on 2005-07-11 04:56:24
Quote
Vorbis apparently embed some encoding tools (point stereo?) which are remarkably suited for this bitrate (but which are maybe interfering too much at higher bitrate: see this test and this test).
Quality is not perfect of course; usual vorbis problems are here: noise boost, coarseness, fatness. Distortion


I wonder how large a role noise normalization plays in low-bitrate encoding? I think a lot of the noise that is characteristic of Vorbis has to do with how the noise floor is encoded via the VQ approach, which is more pleasant sounding, at least to me. I have been browsing through the source, trying to figure out what Aoyumi adjusted, for educational purposes, and I think I understand what he did, at least for the beta 2 tunings that were merged into 1.1. Hmm.... thank you for the results though, Guru. 

Quote
Right now I am using 128AAC but the only reason is because I see it as a good go between for space and quality. But I really do want the best quality I can get. So any help would be appreciated guys.


I was going to say the exact same thing that Guru said, but seeing that he answered your question first I would just stick with what you have now 
Title: 80 kbps personal listening test (summer 2005)
Post by: kl33per on 2005-07-11 06:17:23
Wow,

Thanks for putting the effort in Guru.
Title: 80 kbps personal listening test (summer 2005)
Post by: spoon on 2005-07-11 09:16:30
Quote
Quote
RE Bitrate problems: if your samples are 10 seconds could you create one large sample using that 10 seconds looping over and over 10 times, encode and then calc the bitrate and divide by 10?
[a href="index.php?act=findpost&pid=312329"][{POST_SNAPBACK}][/a]

I don't understand. Could you explain?
[a href="index.php?act=findpost&pid=312335"][{POST_SNAPBACK}][/a]


The problem is:

|---short audio data---| + container padding    does not give the true bitrate (without fudging the stream), so duplicate your short audio data x10:

|---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + |---short audio data---| + container padding   

and calculate the bitrate from that: the fixed padding now weighs ten times less in the result.
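
To put rough numbers on the idea (a small Python sketch of mine, reusing the ~50 KB overhead figure from the first post, not measured values):

[code]
# Looping a short sample ten times before encoding makes the fixed container
# padding ten times less significant in a filesize-based bitrate calculation.
def filesize_bitrate_kbps(audio_kbps, length_s, overhead_kB, loops=1):
    audio_kB = audio_kbps * length_s * loops / 8
    return (audio_kB + overhead_kB) * 8 / (length_s * loops)

print(filesize_bitrate_kbps(80, 10, 50, loops=1))    # 10 s sample       -> 120 kbps
print(filesize_bitrate_kbps(80, 10, 50, loops=10))   # same sample x10   -> 84 kbps, close to the real value
[/code]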
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 10:47:43
> Spoon: now I understand the purpose better. Good idea, but tedious if I have to apply it to so many samples.
Title: 80 kbps personal listening test (summer 2005)
Post by: Aoyumi on 2005-07-11 12:17:11
Guruboolez, I appreciate this large-scale test of yours. 

Quote
Vorbis apparently embed some encoding tools (point stereo?) which are remarkably suited for this bitrate (but which are maybe interfering too much at higher bitrate: see this test and this test).

Point stereo is powerful, but at the same time difficult to control. However, at low bitrates it is probably indispensable. Although there were cases where the aoTuV beta 3 improvements expanded point-stereo artifacts, this has improved in beta 4 (a part of the channel coupling was changed).

Quote
I wonder how large a role noise normalization plays in low-bitrate encoding? I think a lot of the noise that is characteristic of Vorbis has to do with how the noise floor is encoded via the VQ approach, which is more pleasant sounding, at least to me. I have been browsing through the source, trying to figure out what Aoyumi adjusted, for educational purposes, and I think I understand what he did, at least for the beta 2 tunings that were merged into 1.1. Hmm.... thank you for the results though, Guru.

Although noise normalization can control ringing (and metallic warbling), there are side effects. However, it is needed especially at low bitrates (especially q-1/-2).
I think that the distinctive features of Vorbis are Floor 1 encoding, vector quantization, and channel coupling. These are closely interrelated.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 15:12:04
Aoyumi> congrats! I hope that your work will soon be merged into the official branch.

I'm starting to think about the 96 kbps test. This time, I'd like to make a pool dedicated to MP3 at this bitrate. The old idea of Fraunhofer superiority at bitrates < 128 kbps is still alive, and I'd like to evaluate its validity. I haven't found any test comparing a modern release of LAME with a modern implementation of FhG.

My problem is: what software should I use? I have some possibilities:
- the new ACM encoder bundled with WMP10
- Nero Burning Rom
- iTunes
- Adobe Audition
- or maybe something else?


Does someone have an idea about the possible best FhG implementation?
Title: 80 kbps personal listening test (summer 2005)
Post by: dev0 on 2005-07-11 16:10:10
Quote
My problem is: what software should I use? I have some possibilities:
- the new ACM encoder bundled with WMP10
- Nero Burning Rom
- iTunes
- Adobe Audition
- or maybe something else?

Does someone have an idea about the possible best FhG implementation?[a href="index.php?act=findpost&pid=312539"][{POST_SNAPBACK}][/a]

iTunes does not use FhG, it's only identified as such by Encspot.

I'd vote for Adobe Audition, because it looks like the most configurable (using the 'Best - Current' encoder) FhG encoder.
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-11 16:21:00
Quote
I'd vote for Adobe Audition, because it looks like the most configurable (using the 'Best - Current' encoder) FhG encoder.[a href="index.php?act=findpost&pid=312561"][{POST_SNAPBACK}][/a]


I agree, but the ACM in WMP10 is more recent. I suggest a quick (only a handful of samples) listening test to select one of these.
Title: 80 kbps personal listening test (summer 2005)
Post by: rutra80 on 2005-07-11 16:44:27
Can we have listening-tests like this announced on the front-page news when they are finished, just like it was with Roberto's tests?
Title: 80 kbps personal listening test (summer 2005)
Post by: Garf on 2005-07-11 16:48:39
Quote
Can we have listening-tests like this announced on the front-page news when they are finished, just like it was with Roberto's tests?
[a href="index.php?act=findpost&pid=312569"][{POST_SNAPBACK}][/a]


IMHO roberto's tests had much more validity since they spanned a larger number of testers. The results of this test rely entirely on guruboolez's personal preferences, which may or may not be representative of the average person (and I suspect they are not).
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 16:54:09
Something annoys me about Audition: it's a bit expensive for the basic user. Another problem: there are various settings. Like Nero AAC, testing Audition's encoder means a lot of work. But it could be worth it.

I think I'll limit the MP3 pool to four contenders (three would be ideal).


+ The encoder bundled with WMP10 will probably be tested (it's an interesting one, which can be used without any expense on Windows, works very fast, and can – thanks to nyaochi – benefit from features such as gapless playback or direct re-encoding with foobar2000).

+ LAME of course

+ Audition (maybe directly the "slow" encoder?)


The last one could be iTunes. I suppose that CBR would be better at this bitrate. Has anyone experienced otherwise with it?
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-11 17:18:10
Quote
+ Audition (maybe directly the "slow" encoder?)[a href="index.php?act=findpost&pid=312573"][{POST_SNAPBACK}][/a]


There's no such thing as slowenc in Audition. The last versions of slowenc were MP3enc 3.1 and AudioActive 2.04j.

All three encoders in Audition are different versions of fastenc.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 17:48:41
I have three choices (my version shows them in French – here they are translated into English):

- Current (best quality)
- Legacy - high quality (slow)
- Legacy - average quality (fast)

I thought that "Legacy - high quality (slow)" corresponded to the old slow encoder [indeed, this "slow" encoder isn't actually slow, and obviously can't correspond to slowenc. Thanks for the clarification, Roberto].
But what really annoys me with Audition is the default settings: at 96 kbps the lowpass is set to 11480 Hz. I don't know the exact lowpass set by LAME at the same bitrate, but even at 80 kbps LAME lowpasses at a more comfortable value (~13000 Hz). To be honest, I really believe that Audition will finish last in the pool with such a lowpass (unless of course another contender really sucks).
Changing the lowpass would be more pertinent, but it's a game I don't want to play. My purpose is to evaluate the quality of current encoders, not to tune them... If the lowpass is set to 11.5 kHz by default, there's probably a reason. Any suggestions?
Title: 80 kbps personal listening test (summer 2005)
Post by: Pio2001 on 2005-07-11 17:50:49
Quote
IMHO roberto's tests had much more validity since they span a larger number of testers. [a href="index.php?act=findpost&pid=312570"][{POST_SNAPBACK}][/a]


On the other hand, they relied on much fewer samples: 20 of them, with many users only listening to half.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 17:59:04
Quote
Quote
IMHO roberto's tests had much more validity since they span a larger number of testers. [a href="index.php?act=findpost&pid=312570"][{POST_SNAPBACK}][/a]


On the other hand, they relied on much fewer samples: 20 of them, with many users only listening to half.
[a href="index.php?act=findpost&pid=312586"][{POST_SNAPBACK}][/a]

12 for the first 5 tests; 18 for the last 2.
Both approaches have their Achilles' heel: limited by the personal subjectivity of a single tester, or limited by the number of samples tested.
And in both cases, the organizer did his best:

- I can't multiply my subjectivity
- Roberto can't force people to test 50 samples

However, I must add that all samples are online (I gave the link for my 150 classical samples, and the 35 others should be somewhere on Rarewares), and I'd like to see other people testing them.
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-11 18:34:56
Quote
and the 35 others should be somewhere on Rarewares [a href="index.php?act=findpost&pid=312588"][{POST_SNAPBACK}][/a]


http://www.rjamorim.com/test/samples/
Title: 80 kbps personal listening test (summer 2005)
Post by: Zurman on 2005-07-11 20:44:34
Simply a m a z i n g  Guru, as usual 

My understanding of the results : mp3@128 is the best choice for mobile devices, no need to bother with other codecs 
(especially wma, really disappointing  )
Title: 80 kbps personal listening test (summer 2005)
Post by: rjamorim on 2005-07-11 20:55:18
Quote
mp3@128 is the best choice for mobile devices[a href="index.php?act=findpost&pid=312625"][{POST_SNAPBACK}][/a]


Dude, that's the high anchor.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 20:57:57
Quote
Simply a m a z i n g  Guru, as usual 

My understanding of the results : mp3@128 is the best choice for mobile devices, no need to bother with other codecs

If you want 128 kbps encodings for your player, vorbis and AAC are probably better than MP3.
And if you want LAME 128 kbps quality, you can probably reach it at lower bitrate (90...120 kbps) with other formats, and therefore increase the musical content of your player.

MP3 128 performed the test as anchor, not as competitor. It's here as reference.
Title: 80 kbps personal listening test (summer 2005)
Post by: a_aa on 2005-07-11 21:45:56
First: Thanks for a very interesting article, I admire your work!

Quote
If you want 128 kbps encodings for your player, vorbis and AAC are probably better than MP3.
[a href="index.php?act=findpost&pid=312628"][{POST_SNAPBACK}][/a]


I do understand you are mentioning Vorbis here, but is there really any large-scale testing to support a claim that any implementation of AAC performs better than LAME at 128 kbps? Roberto's multiformat test showed iTunes and LAME to be practically tied at this bitrate (both beaten by Vorbis and MPC).

The problem is, everybody tells me that AAC is theoretically much better than MP3, but I haven't seen much reliable testing of AAC implementations to substantiate this...

Got any good links?
Title: 80 kbps personal listening test (summer 2005)
Post by: Busemann on 2005-07-11 21:59:59
Quote
I do understand you are mentioning Vorbis here, but is there really any large-scale testing to support a claim that any implementation of AAC performs better than LAME at 128 kbps? Roberto's multiformat test showed iTunes and LAME to be practically tied at this bitrate (both beaten by Vorbis and MPC).

The problem is, everybody tells me that AAC is theoretically much better than MP3, but I haven't seen much reliable testing of AAC implementations to substantiate this...
[a href="index.php?act=findpost&pid=312641"][{POST_SNAPBACK}][/a]


The next 128 kbps listening test will be interesting indeed. iTunes 4.2/QT6.4 was used in the last test, but one of guru's classical music tests (http://www.foobar2000.net/divers/tests/2004.12/aac_global_results.png) showed that the encoder improved nicely with iTunes 4.7/QT6.5.2. How QT7.0.1 with VBR would perform is anyone's guess, but it should be closer to MPC/Ogg Vorbis than last time.
Title: 80 kbps personal listening test (summer 2005)
Post by: Zurman on 2005-07-11 22:07:29
Quote
Quote
mp3@128 is the best choice for mobile devices[a href="index.php?act=findpost&pid=312625"][{POST_SNAPBACK}][/a]


Dude, that's the high anchor.
[a href="index.php?act=findpost&pid=312627"][{POST_SNAPBACK}][/a]

I know 

I didn't say that because it had the best results, but rather because other codecs bring more 'problems' (compatibility, possibly higher battery use, ...) than quality or space gain
Title: 80 kbps personal listening test (summer 2005)
Post by: Zurman on 2005-07-11 22:08:19
Quote
Quote
Simply a m a z i n g  Guru, as usual 

My understanding of the results : mp3@128 is the best choice for mobile devices, no need to bother with other codecs

If you want 128 kbps encodings for your player, Vorbis and AAC are probably better than MP3.
And if you want LAME 128 kbps quality, you can probably reach it at a lower bitrate (90...120 kbps) with other formats, and therefore fit more music on your player.

MP3 at 128 kbps took part in the test as an anchor, not as a competitor. It's here as a reference.
[a href="index.php?act=findpost&pid=312628"][{POST_SNAPBACK}][/a]

see my previous post 
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 22:10:01
About 96 kbps and MP3

Using LAME and WMP10-Fhg isn't a problem: ABR is the most suitable encoding mode for LAME, and Fhg only offers CBR encoding. The situation is very different for iTunes and Audition. I don't know if VBR mode can be safely used at this bitrate with these encoders.

iTunes VBR is apparently protected against possible VBR flaws by a fixed bitrate floor: 96 kbps. It's not a joke. I've set VBR 96 at highest quality with iTunes, and the encoder is supposed to stay close to the 96 kbps target without allowing any frame below the target! Consequently, the average bitrate will necessarily be higher than the target. In fact, the average bitrate I obtained for the 150 short classical samples is 100 kbps. It's really close to the target, which implies that the VBR mode rarely goes beyond 96 kbps. LAME ABR encodings have more fluctuation than iTunes VBR (at least at 96 kbps), and this can be checked with EncSpot.
I've tried to evaluate, with quick and unreliable comparisons, which encoding mode could be considered the best. First point: both are very close, and neither revealed problems that the other was spared. I've just noted that VBR offers some improvements over CBR on some samples. Second point: they're both crap. It's much worse than what I heard during the 80 kbps test. Obviously, neither of these encoders would pass the pool.
I think I'll go for VBR.
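
This kind of per-frame fluctuation can also be checked without EncSpot. A minimal Python sketch along these lines should be enough for a plain MPEG-1 Layer III file; the file name is a placeholder, and ID3 tags or the Xing/VBR header frame are not handled specially, so the counts are only approximate:

Code
# Minimal sketch: walk MPEG-1 Layer III frame headers and print the per-frame
# bitrate distribution, similar to what EncSpot displays. ID3 tags and the
# Xing/VBR header frame are not skipped, so the counts are approximate.
import sys
from collections import Counter

BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128,
            160, 192, 224, 256, 320]        # kbps, MPEG-1 Layer III table
SAMPLE_RATES = [44100, 48000, 32000]        # Hz, MPEG-1

def frame_bitrates(path):
    data = open(path, "rb").read()
    i, rates = 0, []
    while i + 4 <= len(data):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        is_sync = b0 == 0xFF and (b1 & 0xE0) == 0xE0
        is_mpeg1_layer3 = ((b1 >> 3) & 0x3) == 0x3 and ((b1 >> 1) & 0x3) == 0x1
        if is_sync and is_mpeg1_layer3:
            br_idx = (b2 >> 4) & 0xF
            sr_idx = (b2 >> 2) & 0x3
            padding = (b2 >> 1) & 0x1
            if 0 < br_idx < 15 and sr_idx < 3:
                kbps = BITRATES[br_idx]
                rates.append(kbps)
                # frame length for MPEG-1 Layer III
                i += 144 * kbps * 1000 // SAMPLE_RATES[sr_idx] + padding
                continue
        i += 1                               # not a valid header: resync byte by byte
    return rates

if __name__ == "__main__":
    rates = frame_bitrates(sys.argv[1])      # e.g. python frames.py sample.mp3
    for kbps, count in sorted(Counter(rates).items()):
        print(f"{kbps:3d} kbps: {count} frames")
    if rates:
        print(f"average: {sum(rates) / len(rates):.1f} kbps")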

Adobe Audition: I have many options. CBR and VBR, different encoding modes and also different sampling rates (for CBR). I would prefer the new encoder and avoid the legacy one (do people agree?). VBR is probably the wisest choice. The strongest argument in favor of VBR is the lowpass (> 14000 Hz), clearly less irritating than the 11000 Hz one used by default with CBR. I've made a quick comparison, and indeed, VBR encodings are immediately much more enjoyable: the sound is less poor, and the VBR encodings don't reveal terrible distortions as a consequence of the higher lowpass (the sound is far better than with the iTunes encoder!). VBR q30 gives me a nice 96 kbps average bitrate (luck! but I have to check the bitrate with the second group). Another good point for VBR: it doesn't allow internal 32000 Hz encoding (which might have helped the encoder reduce some artefacts), so that's one less thing I have to test.
My choice for Audition would be the current encoder with VBR q30 (probably; it could still be adjusted precisely). It implies:
- 14780 Hz lowpass
- 44100 Hz sampling rate
- joint stereo
- intensity stereo
- CRC writing


The pool would look like:

- LAME 3.97a11 ABR
- Windows Media Player - Fraunhofer ACM CBR 96
- iTunes VBR 96 kbps BEST
- Audition - Fraunhofer VBR Q30

The average bitrate for the classical group will apparently fall between 96 kbps (WMP CBR) and 100 kbps (iTunes). Does that sound OK? Comments? Criticism?
Title: 80 kbps personal listening test (summer 2005)
Post by: Busemann on 2005-07-11 22:11:43
Quote
I didn't say that because it had the best results, but rather because other codecs bring more 'problems' (compatibility, possibly higher battery use, ...) than quality or space gain


Mobile phones are all AAC compatible these days so compatibility isn't an issue. For HD players there's no need to use such a low bitrate.

Quote
iTunes VBR is apparently protected against possible VBR flaws by a fixed bitrate floor


For some reason, that's how iTunes VBR works. The bitrate you select is a guaranteed minimum according to the documentation.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-11 22:26:59
Quote
but is there really any large-scale testing to support a claim that any implementation of AAC performs better than LAME at 128 kbps? Roberto's multiformat test showed iTunes and LAME to be practically tied at this bitrate (both beaten by Vorbis and MPC).


LAME was tested in VBR mode. It's a nice improvement over CBR/ABR, and indeed LAME was very close to iTunes AAC. But the performance of VBR is dramatically inverted with classical music. See for instance the Debussy sample in this test: the rating was very low. The current low VBR settings can't handle quiet passages very well. On the other hand, iTunes AAC has no problems here. I've already tested -V5 and compared it to ABR on a small classical suite (15 samples, somewhere in the alpha testing thread), and indeed VBR was inferior to ABR.
LAME -V5 can therefore reach very good quality, but the results are unstable, and in some situations the encodings can reveal severe artefacts you won't get with AAC. iTunes AAC is much more reliable, and can apparently handle more situations than LAME VBR.
Title: 80 kbps personal listening test (summer 2005)
Post by: Zurman on 2005-07-11 22:54:56
Quote
Quote
I didn't say that because it had the best results, but rather because other codecs bring more 'problems' (compatibility, possibly higher battery use, ...) than quality or space gain


Mobile phones are all AAC compatible these days so compatibility isn't an issue. For HD players there's no need to use such a low bitrate.

Well, if 128 is too low, what would I say for 80 kbps 
Title: 80 kbps personal listening test (summer 2005)
Post by: aabxx on 2005-07-11 23:03:37
Cheers guruboolez.

I tried to do this test myself at 80 kbps, comparing aoTuV b4 with the latest official encoder, and well... I couldn't really hear the difference between the two encoders, and the one time I really did hear a clear difference, I couldn't decide which one actually sounded better... figures...

Oh well.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-12 00:11:46
Did you try with all the samples? Hearing a difference isn't necessarily that easy. Sometimes good training is needed to evaluate the progress (or regressions). With aoTuV (and Vorbis in general), I'd say that it is really important to focus on the usual problems: fatness, noise, HF boost... Listeners should also be attentive to pre-echo and other forms of artefacts.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-12 00:39:39
Well, it seems that the iTunes MP3 encoder would make a perfect choice... as low anchor:

- Reference (http://guruboolez.free.fr/ENSEMBLE/E50_PERIOD_ORCHESTRAL_E_trombone_strings.ofr)
- LAME ABR (http://foobar2000.net/divers/temp/E50_PERIOD_ORCHESTRAL_E_trombone_LAME.mp3)
- iTunes VBR (http://foobar2000.net/divers/temp/E50_PERIOD_ORCHESTRAL_E_trombone_unTunes.mp3)

Title: 80 kbps personal listening test (summer 2005)
Post by: Busemann on 2005-07-12 01:24:57
Quote
- iTunes VBR (http://foobar2000.net/divers/temp/E50_PERIOD_ORCHESTRAL_E_trombone_unTunes.mp3)


Whoa! Sounds like a badly tuned radio 
Title: 80 kbps personal listening test (summer 2005)
Post by: moozooh on 2005-07-12 02:10:33
Quote
AAC-HE screeching artefacts are probably more acceptable when compared to other form of distortions audible with non-SBR products
[a href="index.php?act=findpost&pid=312422"][{POST_SNAPBACK}][/a]

I guess it's just a matter of taste… Personally, I can't stand that awful SBR sound. It just beats the high frequencies to a violent death, as far as I can hear.

Edit: BTW, what else do we need to make the merged 1.1.1+b4 version the recommended one? It seems that it is at least not inferior to 1.1…
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-12 02:27:41
Quote
Edit: BTW, what else do we need to make the merged 1.1.1+b4 version the recommended one?


Probably additional listening tests, or maybe calling into question the way recommendations are done.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-12 17:16:46
Back to the upcoming MP3 96 kbps Pool.

I have encoded the second group of 35 various samples, and the bitrate was significantly higher than the average obtained with the classical group.

iTunes classical = 100 kbps
iTunes various = 104 kbps
=> with iTunes, the bitrate for the second group is within the +/- 10% tolerance I've set. VBR is possible, I'd say.

Audition q30 classical = 96 kbps
Audition q30 various = 112 kbps
=> with Audition, the bitrate for the second group is 20% higher than the target bitrate. Even if I admit that the average bitrate for these 35 short samples doesn't entirely correspond to the average bitrate of full albums, it's clearly too high.


As a consequence, I've lowered the setting to VBR q20. Lowpass is automatically adjusted, but doesn't significantly drop (from 14780 to 14440 Hz). Bitrate:
Audition q20 classical = 89 kbps
Audition q20 various = 102 kbps
=> the bitrate is now within the acceptable range for both groups. However, the bitrate for classical now approaches the critical limit of ~87 kbps (96 kbps -10%) and is not fully comparable with the bitrate obtained with iTunes (100 kbps).

Then I tried VBR q25. It's a manual preset, and there's no default lowpass value for a manual VBR setting. I've therefore chosen 14600 Hz. Bitrate:
Audition q25 classical = 92 kbps
Audition q25 various = 107 kbps
=> excessive deviation with the second group (+11...12%)
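
For reference, the tolerance check itself is simple arithmetic; a small sketch like the one below is enough. The ±10% margin and the 96 kbps target are the values fixed for this test, while the example duration is arbitrary and the sizes would of course come from the real encodes:

Code
# Report a group's average bitrate and whether it stays within +/-10% of the
# 96 kbps target. total_bytes / total_seconds would come from the actual encodes.
def check_group(total_bytes, total_seconds, target_kbps=96, tolerance=0.10):
    avg_kbps = total_bytes * 8 / total_seconds / 1000
    low, high = target_kbps * (1 - tolerance), target_kbps * (1 + tolerance)
    ok = low <= avg_kbps <= high
    print(f"average = {avg_kbps:.1f} kbps "
          f"(allowed {low:.1f}..{high:.1f} kbps) -> {'OK' if ok else 'out of range'}")
    return ok

# e.g. the Audition q20 classical figure quoted above (~89 kbps average), for a
# group totalling 1500 seconds of audio (the duration is just an example value):
check_group(89_000 / 8 * 1500, 1500)   # -> 89.0 kbps, inside 86.4..105.6 kbps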


At this stage, I have four possibilities:

1/ Using Q20: the bitrate is OK for group 2; it is OK for group 1 but too low compared to iTunes at 100 kbps.
2/ Using Q25: the bitrate is OK for classical, but too high for group 2.
3/ Trying Q22...23 in order to obtain an unlikely better compromise.
=> in all these cases, the selected setting won't be optimal for either musical category.

4/ Using two different settings for the two different groups.
To me, this possibility makes sense. As someone planning to encode classical only, I wouldn't choose anything other than VBR Q30, which matches the desired bitrate. Someone planning to encode something different probably won't be happy with Q30 (~110 kbps) and will certainly go for Q20, or maybe even a slightly lower setting.
The dual-bitrate problem will also occur in other listening tests. No VBR encoder outputs the same bitrate for different kinds of samples. It can be observed with FAAC, Nero AAC, LAME MP3, Fhg MP3, MPC, WMA9 and WMA9Pro. In all cases, I would have to make compromises which probably don't correspond to the users' choices. Using two different settings (each one corresponding to the rational choice of someone listening to either "various music" [yes, the concept sucks] or "classical music") would avoid this compromise.


I could also play a dangerous game: testing iTunes and Audition VBR encodings at an excessive bitrate, crossing my fingers and hoping to see LAME win. The scenario is possible, I'd say. iTunes clearly has no chance to pass the pool even with a winning bitrate; but I'm less confident about a contender such as Audition.



I feel that solution 2 (Q25 for both groups) and solution 4 (Q20 for various, Q30 for classical) are the two most pertinent. Which would be the best in your opinion?
Title: 80 kbps personal listening test (summer 2005)
Post by: sehested on 2005-07-12 20:02:38
Quote
I feel that solution 2 (Q25 for both groups) and solution 4 (Q20 for various, Q30 for classical) are the two most pertinent. Which would be the best in your opinion?
[a href="index.php?act=findpost&pid=312826"][{POST_SNAPBACK}][/a]
I think you should go for solution 4. 

First and foremost because you are doing the listening tests to satisfy your own curiosity regarding the various codecs' performance on the music you love. Never choose a solution that does not make you happy - or motivation might slowly fade away. 

Secondly, because your listening tests are helping put classical music on the agenda when comparing the performance of the various codecs. I believe you are doing classical listeners a big favour with your insight and very qualified comments.

Thirdly, because the "various music" category is "just" an added bonus to the main objectives of your listening tests, although interesting to follow.

Hopefully someone will be inspired by your work and create samples of different pop/rock/jazz instruments, like different guitars, drums, cymbals, electronic organs etc. from first-class recordings.
Title: 80 kbps personal listening test (summer 2005)
Post by: a_aa on 2005-07-13 06:19:36
Quote
I feel that solution 2 (Q25 for both groups) and solution 4 (Q20 for various, Q30 for classical) are the two most pertinent. Which would be the best in your opinion?
[a href="index.php?act=findpost&pid=312826"][{POST_SNAPBACK}][/a]


The question is: do you want to compare encoders at a given bitrate, or do you want to compare encoders using given settings? In the first case, solution 4 will come closest to the intention; in the second case, solution 2 would be better.

Personally I think it makes most sense to compare at a given bitrate. In a perfect world, this would actually mean that you would have to find the perfect VBR setting for each sample to get the perfect bitrate - however, this would generate a lot of work prior to the testing (unless it could be done via a program/script?), and would not be consistent with normal use or be very helpful for the normal user. I therefore find the idea of finding the perfect VBR setting for each group of samples to be a good compromise.
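
A rough sketch of what such a script could look like, assuming a hypothetical encode_at_quality() wrapper around the encoder under test (returning the average bitrate in kbps for one sample) and assuming the bitrate grows monotonically with the quality setting:

Code
# Hypothetical helper: binary-search the VBR quality scale for one sample until
# the resulting bitrate lands near the target. encode_at_quality(q) stands in
# for whatever encoder is being tested; the range and tolerance are arbitrary.
def find_quality_for_bitrate(encode_at_quality, target_kbps=96,
                             q_min=0, q_max=100, tolerance_kbps=2):
    lo, hi = q_min, q_max
    best_q = q_min
    while lo <= hi:
        q = (lo + hi) // 2
        kbps = encode_at_quality(q)         # encode the sample, measure bitrate
        if abs(kbps - target_kbps) <= tolerance_kbps:
            return q                        # close enough to the target
        if kbps < target_kbps:
            lo = q + 1                      # quality too low -> bitrate too low
        else:
            hi = q - 1
        best_q = q
    return best_q                           # closest setting tried if no exact hit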

Conclusion: I'm a supporter of solution 4!

Edit: In brackets
Title: 80 kbps personal listening test (summer 2005)
Post by: sTisTi on 2005-07-13 14:38:11
Quote
Personally I think it makes most sense to compare at a given bitrate. In a perfect world, this would actually mean that you would have to find the perfect VBR setting for each sample to get the perfect bitrate - however, this would generate a lot of work prior to the testing (unless it could be done via a program/script?), and would not be consistent with normal use or be very helpful for the normal user. I therefore find the idea of finding the perfect VBR setting for each group of samples to be a good compromise.

Conclusion: I'm a supporter of solution 4!
[a href="index.php?act=findpost&pid=312989"][{POST_SNAPBACK}][/a]

I am not too happy with solution 4 - after all, VBR modes target a certain quality, not a certain bitrate. If the VBR mode allocates too few bits for certain samples and maybe thereby creates artefacts, it is a fault of the encoder/psymodel and should be treated and evaluated as such.
I know that there is no perfect solution to this problem, but I think combining the two sample sets, calculating the average bitrate over all samples and using this for selecting the VBR mode might be the least bad solution. It would be even less bad if the numbers of "classical" and "various" samples were approximately equal
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-13 15:25:48
Quote
I am not too happy with solution 4 - after all, VBR modes target a certain quality, and not a certain bitrate.

That's true. But everyone counts on an approximate bitrate with every VBR setting. MPC --standard is something like ~180 kbps, MP3 --standard close to ~200 kbps, etc... We all use this kind of correspondence, for our own purposes or for recommendations (someone using CBR 192 is advised to use VBR --standard instead).
In other words, we're all thinking in terms of bitrate.


Quote
If the VBR mode allocates too few bits for certain samples and maybe thereby creates artefacts, it is a fault of the encoder/psymodel and should be treated and evaluated as such.


Why "too little"? The fact is that bitrate is very different from one group to another. Here, classical needs less bitrate than 'various'. Therefore, someone listening to classical and looking for a VBR setting which offers 96 kbps would be tempted to use Q30, and not Q20, not Q25. He will use the VBR preset matching with the target.
But this setting, working well with classical, won't work with other kind of music. People looking for 96 kbps with various won't use Q30 (-> 110 kbps), Q25 neither (105 kbps). There's no point for both kind of listener to make a compromise, as long as they don't mix both kind of music.

Quote
I know that there is no perfect solution to this problem, but I think combining the two sample sets, calculating the average bitrate over all samples and using this for selecting the VBR mode might be the least bad solution.


I understand the logic, but does it really correspond to real usage?
I'm tempted to make an analogy with video encoding. Take XviD as an example. There's a dedicated mode to improve quality on cartoon movies. If you want to test the quality of XviD with both regular movies and cartoons, it would be senseless to use one and only one parameter set for both kinds of movies. It would also be surprising if the tester tried to find an unlikely hybrid setting: he would certainly lower the quality for both genres, and thus handicap the contender with this compromise. If I remember correctly, Doom9 adapted the encoder's parameters according to the kind of movie: Futurama was encoded with cartoon mode, but not The Matrix. Did people complain? I don't know.

Contrary to previous collective tests, I'm using a wide gallery of samples, wide enough to make a distinction between both kinds of music. The second group is maybe a "bonus" (I said so in my first post), but by "bonus" I don't mean something minor that should be neglected. I'd like to test the reaction of various encoders to different kinds of music, and see if some of these encoders are unbalanced in favor of either classical or something more popular. From my own experience, some renowned encoders have built their reputation on one specific kind of music or sample - and one only. The problem is that these reputed encoders are sometimes suggested to people listening to something totally different. LAME -V5, for example, works very well with many kinds of music, but with classical at least, it's not trustworthy.
That's why I'm very interested in testing two separate kinds of samples. And I'm more and more convinced that testing both categories with one setting is 1/ not optimal and 2/ won't correspond to the real usage of potential listeners.

For example, with AAC at 128 kbps, it will be impossible for me to test Nero's VBR 'internet' profile (which appeared to be the best AAC solution for classical in a previous test I made last December). Why? Its bitrate is ~140 kbps. With the second group, the average bitrate doesn't have this problem. By discarding VBR because of bitrate issues with one group, I'd be forced to use CBR with both groups, and I let you imagine the reaction of many people, who will probably shout about the usage of a suboptimal setting, etc...

Same goes for MPC. While --radio seems to be close to 128 kbps with the second group, it's not the case for classical. I could try to find an average setting, and in the end I'd be forced to use something within the --thumb range. I also let you imagine the reaction of some people (a couple of names immediately come to mind...).

For all these reasons, it looks preferable to me to evaluate both groups independently. You probably noticed that I didn't propose any mixed results (http://foobar2000.net/divers/tests/2005.07/80/80TEST_PLOTS_06.png) for the final test, and kept both categories totally independent.

Quote
It would be even less bad if the numbers of "classical" and "various" samples were approximately equal

I don't have enough material to build a coherent gallery similar to the classical one. And I don't plan to restrict the number of tested situations for classical. I can't solve the imbalance between the two categories, unless someone plans to build something similar with 'various' music.
Title: 80 kbps personal listening test (summer 2005)
Post by: a_aa on 2005-07-13 15:54:44
Quote
I am not too happy with solution 4 - after all, VBR modes target a certain quality, not a certain bitrate. If the VBR mode allocates too few bits for certain samples and maybe thereby creates artefacts, it is a fault of the encoder/psymodel and should be treated and evaluated as such.
I know that there is no perfect solution to this problem, but I think combining the two sample sets, calculating the average bitrate over all samples and using this for selecting the VBR mode might be the least bad solution. It would be even less bad if the numbers of "classical" and "various" samples were approximately equal
[a href="index.php?act=findpost&pid=313060"][{POST_SNAPBACK}][/a]

The objective of the testing is essential. I assume that you, in principle, want to see a comparison of encoders using given settings. (As guruboolez says, you probably have an idea about bitrates anyway - you wouldn't throw LAME --aps into this test, right...?)

In that case, the result of the testing will not be sufficient to answer the question: "If I'm willing to spend 96 kbps of my flash space to store this track, which encoder would probably give me the best sound for that particular space?" The "perfect-world solution" could have answered that...  On the other hand, if the winner was a VBR encoder/setting, this solution would have left it up to each user to find the setting on the winning encoder that actually produced a 96 kbps file out of their particular track. 

Have you seen those TV shows where people are invited to make the best dinner possible for a small amount of money, let's say $10? I really hate it when someone brings ingredients for $10.98...
Title: 80 kbps personal listening test (summer 2005)
Post by: sTisTi on 2005-07-13 17:45:24
@guruboolez & a_aa:
I appreciate your arguments. Maybe I'm too much of a VBR enthusiast in my encoding habits. If I want a certain quality level, I just use whatever preset or quality setting corresponds to this level, and am not too concerned about the resulting bitrates as long as they are roughly in line with what I have in mind. I find that in practice a few kB more or less are irrelevant, unless you are restricted by a really small flash player etc.
But as I said, there is no perfect solution with regard to a listening test that has a certain bitrate as its target. Whatever you choose, people will complain

BTW, I find the analogy with XviD does not really hold. AFAIK, its cartoon mode does not use more bits per se; it's just tuned more for cartoons, which are very different from "normal" movies with regard to the demands they pose to an encoder. So you get better quality without increasing the bitrate. Using cartoon mode for normal movies would probably be disastrous. It's like tuning an encoder especially for classical music in a way that would probably decrease its performance if used with other music.
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-13 18:11:44
Quote
(...) am not too concerned about the resulting bitrates, as long as they roughly are in line with what I have in mind.


I set a tolerance margin at the beginning of the test.
I agree with you: as a user, I won't really be annoyed if the final bitrate deviates a bit from what I expected. But as a tester, the situation is really problematic. Testing something at 142 kbps (IIRC the bitrate of Nero AAC VBR with classical samples) and comparing it with another contender at 124 kbps (IIRC the bitrate of WMA9Pro VBR) is irrelevant. It's like comparing a movie encoded at 700 MB to another one encoded at 500 MB. The comparison is not necessarily uninteresting (at 500 MB, some encoders might perform as well as another one with more bitrate), but in most cases it won't answer the question: at a given bitrate, is encoder x better than encoder y?

Quote
Whatever you choose, people will complain

I agree, but I'm trying to find the most diplomatic solution. For the 80 kbps test I heard no complaints. But with further tests, I really fear that I can't find the ideal compromise.

Quote
BTW, I find the analogy with XviD does not really hold. AFAIK, its cartoon mode does not use more bits per se, it's just more tuned for cartoons

Right, but in both cases of my analogy, we have a tester who tried to adapt the settings to the content. It's not usual: for the past collective tests, one single setting was used for all samples. It's probably what I would do if my tests were limited to ~15 samples. But here, it is obvious that VBR encoders don't react the same way with all samples, and that the classical group and the various group could, or maybe should, be treated as two different categories, with adapted (and optimal) settings for each of them. It looks more pertinent, as a user as well as a tester.
Title: 80 kbps personal listening test (summer 2005)
Post by: sehested on 2005-07-13 20:50:43
BTW Guruboolez, what is your listening set-up?

Sound card, speakers / headphones.
Title: 80 kbps personal listening test (summer 2005)
Post by: moozooh on 2005-07-14 00:25:08
Quote
BTW Guruboolez, what is your listening set-up?

Sound card, speakers / headphones.
[a href="index.php?act=findpost&pid=313117"][{POST_SNAPBACK}][/a]

I hope that's not Lynx Two with Sennheiser Orpheus or smthng.
[span style='font-size:8pt;line-height:100%']Just kidding though. [/span]
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-14 01:41:04
I used a Creative Audigy2, a Beyerdynamic DT-531 headphone and between them, a basic Onkyo amp. Nothing unusual, but the headphone (~120 euros) plays a very important role.
Title: 80 kbps personal listening test (summer 2005)
Post by: Cygnus X1 on 2005-07-14 02:28:46
Guru, I'm assuming that since you encoded the QT AAC sample through iTunes, the resulting sample rate was 44.1kHz? I ask because it might be interesting to see if there is any improvement when encoding directly in QT, which would let you resample to 32kHz. There might be a little more pre-echo as a result, but at this bitrate, it's all a trade off anyway....

Excellent investigation, as usual 
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-14 02:38:00
Quote
I ask because it might be interesting to see if there is any improvement when encoding directly in QT, which would let you resample to 32kHz.

Oh yes, it would be interesting... but Apple iTunes doesn't allow 32 kHz resampling at this bitrate (tested with iTunes 4.9, Win32). I must add that this surprised me.
Title: 80 kbps personal listening test (summer 2005)
Post by: aspifox on 2005-07-14 10:12:25
Resampling is a different ballgame -- I'd pre-resample the samples with ssrc if resampling is going to be a factor.  Oggenc's built-in resampling, for example, is terrible, while on the flip side I find that an ssrc'd 45 kbps 22 kHz Vorbis file with the lowpass lifted will usually sound rather better (and have fewer killer artifacts) than a 45 kbps 44.1 kHz Vorbis file (personal, double-blinded, very conspicuous).  80 kbps+ might be a different matter, where I don't venture much.
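
A sketch of that pre-resampling pipeline, for what it's worth (the ssrc --rate and oggenc -b/-o options are quoted from memory and should be checked against the installed versions; lifting the Vorbis lowpass is not shown here):

Code
# Two-step pipeline: resample with ssrc first, then encode with oggenc so its
# built-in resampler never runs. Paths and option names are assumptions to be
# verified against the tools actually installed.
import subprocess

def encode_resampled(wav_in, ogg_out, rate=22050, kbps=45):
    resampled = wav_in.replace(".wav", f".{rate}.wav")
    # ssrc performs the sample-rate conversion
    subprocess.run(["ssrc", "--rate", str(rate), wav_in, resampled], check=True)
    # oggenc then encodes at the nominal bitrate, no internal resampling involved
    subprocess.run(["oggenc", "-b", str(kbps), "-o", ogg_out, resampled], check=True)

encode_resampled("sample.wav", "sample_22khz_45kbps.ogg")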
Title: 80 kbps personal listening test (summer 2005)
Post by: Madman2003 on 2005-07-14 10:39:51
Why do you use a Creative Audigy and not an E-MU card or an Envy24-based solution that doesn't resample?
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-14 11:40:56
I have a Terratec DMX6fire 24/96, and the annoyance of audible clicks and pops is -far- higher than the annoyance of the inaudible resampling process
Title: 80 kbps personal listening test (summer 2005)
Post by: QuantumKnot on 2005-07-16 02:14:06
Quote
Quote
Edit: BTW, what else do we need to make the merged 1.1.1+b4 version the recommended one?


Probably additional listening tests, or maybe calling into question the way recommendations are done.
[a href="index.php?act=findpost&pid=312704"][{POST_SNAPBACK}][/a]


I'm open to suggestions.  In the beginning, we didn't even have a recommended vorbis encoder thread, so I put one together very quickly at the time.  Now that vorbis quality is starting to take some of the limelight, I think it is time to work out a systematic way of doing this.
Title: 80 kbps personal listening test (summer 2005)
Post by: HotshotGG on 2005-07-16 04:26:09
Quote
I'm open to suggestions. In the beginning, we didn't even have a recommended vorbis encoder thread, so I put one together very quickly at the time. Now that vorbis quality is starting to take some of the limelight, I think it is time to work out a systematic way of doing this.


I honestly think there should also be a page for it in the wiki, as well as the thread.  The thread has had a significant impact though, and I applaud you for taking the time to put it together. What's important is that these changes in the encoder should be merged into the next official Vorbis branch, much like 1.1. That should first and foremost remain the recommended encoder. All of the other binaries should remain around here though, for people who are interested in the latest bleeding-edge changes or seeking an alternative 
Title: 80 kbps personal listening test (summer 2005)
Post by: guruboolez on 2005-07-17 14:29:23
Quote
Quote
(...) or maybe calling into question the way recommendations are done.
[a href="index.php?act=findpost&pid=312704"][{POST_SNAPBACK}][/a]


I'm open to suggestions.  In the beginning, we didn't even have a recommended vorbis encoder thread (...)


I suggest that we (note that I say "we" and not "you") don't recommend any specific encoder. There are many possible reasons not to recommend a specific encoder, but the least acceptable one is the lack of listening tests. We are lucky to have people working hours and hours to improve the quality of an encoder (it isn't always the case), and we can't tell them "bravo, but sorry, we don't have any feedback to recommend your work".
If the community can't spend one hour of free time to make a listening test, the community doesn't have the right to give recommendations based on missing knowledge. It's just a sign of respect and the minimum we can do for the creative and innovative persons working for thousands or even millions of people.

The recommendation should therefore be limited to explaining the use of the encoder:
• why users shouldn't tweak the default profiles
• how people could solve some issues (pre-echo with --ITP as an example)
• explanations of all known problems with the encoder (specific artifacts)
• why VBR and not ABR; tips to avoid playback problems with some flash players
....

There's no need to recommend an encoder, especially an older one over newer, not-yet-tested ones (I don't like the expression "non-tested": it supposes that developers are throwing their work to the public without bothering to test it themselves, and it also supposes that the lack of reports is a sign of a lack of tests; I'd rather say: no news = good news ).
Couldn't we simply state that:

• 1.1.1 is the latest CVS version, that its quality is essentially based on the work performed by Aoyumi for aoTuV b2, and that this version is obviously not the most advanced version of Vorbis?
• aoTuV b4 is the latest version, and that this encoder solves x, y and z issues or bugs, and consequently should offer the best results?
• there are also other encoders, {slightly} outdated now, like GT2/GT3, Megamix, QK, ModestTuning, and give a short description of them
?


I know that people like and request explicit answers to their problems: they want THE best encoder and THE transparent setting. As someone starting to have good experience with multi-codec/multi-format blind listening comparisons, I can say that such encoders and such settings don't exist and probably won't. Even if aoTuV b4, LAME 3.97, Nero AAC... outperform all their competitors, there are still problematic samples for them, with some problems audible with the most advanced encoders but not audible with less recommendable ones.
Title: 80 kbps personal listening test (summer 2005)
Post by: HbG on 2005-07-17 16:00:37
You can still say one encoder is preferable to the other for the majority of uses and the majority of genres, just like you can say a certain bitrate is transparent to the majority of people on the majority of samples. Everyone with some knowledge will know such recommendations aren't absolute. Everyone else will be happy to follow them and feel assured they're getting the best out of Vorbis. Not recommending one, but instead pointing out how the encoders should be used and the relative differences between them, will only be seen as vague, even if it is more correct.

I believe keeping things simple for the average user, with one encoder to use and no more settings than the bitrate, will help spread Vorbis more than the possibility of the recommendations being incorrect will hold it back.

I'm relatively new to this forum and things like ABXing, but if someone organises a testing effort for aoTuV b4 I'd be happy to participate. If for nothing else, just because Aoyumi deserves to see some community response to his work.
Title: 80 kbps personal listening test (summer 2005)
Post by: a_aa on 2005-07-17 19:27:28
If this is a discussion of principles, it is not an easy one. I have few answers, but a lot of questions...

Is it not a recommendation to state that encoder N solves x, y and z issues or bugs, and consequently should offer the best result? Is it OK to do this without comparative testing to back it up? Isn't the fact that there will be more or less problematic samples for any encoder an argument to perform tests on different types of music and compare to other/older encoders - and would this be a sign of disrespect towards developers, or a sign of conservative thoroughness?

I agree that it is a good idea to have some kind of factbox for each encoder, almost like it has been done for lossless encoders. Unlike lossless encoders, where sound quality is by definition 100% for all encoders, sound quality would be the key factor to distinguish between good and less good lossy encoders - would it be possible to show sound quality in a factbox? The problem will be how much one should rely on theoretical improvements vs. tested/verified improvements. Too much emphasis on the latter can result in few users of a superior encoder (because of waiting for test results), and too much emphasis on the former can lead to the embrace of an encoder which can't handle real life...

Mercedes-Benz launched their A-series some years ago, and they were very pleased with their development process. They had cut down on expensive real-life testing and instead used more computer simulation etc. Problems arose when a Swedish magazine performed the so-called "moose test", in which the car has to do a fast manoeuvre to avoid crashing into a moose on the road. The manoeuvre resulted in a car upside down. Not just once or twice; it was more the rule than the exception. Basically, (insufficient) theory was beaten by reality...

The choice of this analogy may be an indication of where I'm heading - I don't know yet...
Title: 80 kbps personal listening test (summer 2005)
Post by: pepoluan on 2006-01-19 17:51:40
This test has been linked to from the HA Wiki page on Listening Tests (http://wiki.hydrogenaudio.org/index.php?title=Listening_Tests).

OT: To whoever is in charge of the HA Wiki Main Page (http://wiki.hydrogenaudio.org/index.php?title=Main_Page), is it possible a link to the Listening Tests page be added?
Title: 80 kbps personal listening test (summer 2005)
Post by: Jan S. on 2006-01-19 18:51:33
Quote
This test has been linked to from the HA Wiki page on Listening Tests (http://wiki.hydrogenaudio.org/index.php?title=Listening_Tests).

OT: To whoever in charge of the HA Wiki Main Page (http://wiki.hydrogenaudio.org/index.php?title=Main_Page), is it possible a link to the Listening Tests page be added?
[a href="index.php?act=findpost&pid=358328"][{POST_SNAPBACK}][/a]

Done.
Title: 80 kbps personal listening test (summer 2005)
Post by: pepoluan on 2006-01-23 19:16:14
Quote
Quote
This test has been linked to from the HA Wiki page on Listening Tests (http://wiki.hydrogenaudio.org/index.php?title=Listening_Tests).

OT: To whoever in charge of the HA Wiki Main Page (http://wiki.hydrogenaudio.org/index.php?title=Main_Page), is it possible a link to the Listening Tests page be added?
[a href="index.php?act=findpost&pid=358328"][{POST_SNAPBACK}][/a]

Done.
[a href="index.php?act=findpost&pid=358340"][{POST_SNAPBACK}][/a]

Uh... in the process I think you inadvertently killed the spacing between the "Downloads" section and the "Tests" section...