HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: ezlez on 2010-03-10 08:28:13

Title: *Improved* Multi Codec Listening Test Plans
Post by: ezlez on 2010-03-10 08:28:13
Table of Contents
•   Summary
•   Reasoning
•   Issues
•   Samples
•   Proposed Codec Settings


Summary:

Within the following weeks, I will conduct a listening test with several individuals, using ABC/HR for Java 0.53a, comparing different lossy codecs that use VBR. The goal of these listening tests is to evaluate the codecs across an equal range of quality settings, to figure out which codec compresses better than the others at each setting. The range spans from a low setting (around 64 kbps for most codecs) to the point where each codec reaches transparency, based on what people have recommended in the forums. At the moment, the plans are only partly complete, since I’m having issues with certain codecs and samples. Here are the codecs:

•   MP3: LAME 3.98.2 (Encoder) through Audacity 1.3.11 Beta (Frontend Software)
•   Ogg Vorbis: aoTuV beta 5.7 (Encoder) through OggdropXPd 1.9.0 (Frontend Software)
•   AAC: Nero 1.5.1 (Encoder) through foobar2000 1.0 (Frontend Software), QuickTime/Apple (Encoder) through iTunes 9.0.3.15
•   Windows Media Audio: Standard 9 & Pro 10 Versions (Encoder) through Windows Media Player 11
•   Musepack: Musepack SV8 (Encoder) through foobar2000 1.0 (Frontend Software)

So far there will be about seven individuals taking the listening tests, and each will be assigned their own codec to test.

Reasoning

The aim of these tests is to evaluate different quality settings instead of arbitrarily chosen bit rates. That said, I don’t know exactly where to set each quality setting for the tests. I chose 5 quality settings per codec (excluding MPC) as a rough draft; a couple of settings more or fewer than 5 is all right, but I want to evaluate the codecs within an equal range of settings. I also want to keep the number of tests/samples for each setting at 12 or more to yield meaningful results. Multiple sources claim that good results tend to come with 12–20 samples. I'm using Roberto Amorim's (rjamorim's) listening test document on ABC/HR testing as a guide for the tests that I will conduct (http://www.rarewares.org/rja/ListeningTest.pdf). I chose music samples because I don’t really know much about using plain audio samples, and because I’ve seen many ABC/HR tests done with music samples. The subjects taking the tests have experience with music and acoustics; most of them are teachers and complete music enthusiasts. Before any of them begins the actual testing, I’m going to demonstrate to them what sorts of artifacts there are and how to detect them, using portions of ff123’s Artifact Training Page, which can be found at http://ff123.net/training/training.html

Issues

My main issue regarding the tests is which quality levels each codec should be set to so that all codecs cover the same range. I want to test each codec across a range from low quality to transparent quality, but that doesn’t necessarily have to take 5 settings. For instance, Musepack is designed to compress music at transparent and high quality settings, which means it’s not optimized for lower bit rates. For Musepack, encoding usually starts at a quality setting of “Q3” and usually reaches transparency at “Q5”. So for the Musepack portion of the listening tests, I’m planning to test three quality settings (Q3, Q4, Q5), while still producing significant data for that codec.

Here is a list of issues that I’m currently facing with the codecs:
MP3 (LAME):
1.   Which quality settings should be tested: V9 (45–85 kbps) to V4 (145–185 kbps) [6 settings] or V8 (65–105 kbps) to V4 (145–185 kbps) [5 settings]?
2.   Variable Speed: Fast or Standard?
3.   Channel Mode: Joint Stereo or Stereo?
4.   Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

Ogg Vorbis (aoTuV beta 5.7)
1.   Which quality settings should be tested: q 0.0 (~64 kbps) to q 4.0 (~128 kbps) [5 settings] or q 0.0 (~64 kbps) to q 5.0 (~160 kbps) [6 settings]?
2.   Any advanced encoder options I should use (On OggdropXPd)?

Nero AAC 1.5.1
1.   What should the highest quality setting be? Some people argue that Nero AAC reaches transparency at q 0.45, while others argue that it reaches transparency at q 0.50.
2.   What should the lowest quality setting be? At the moment, I’m opting for q 0.25 (~63 kbps).
3.   How many quality settings should I test?

Apple/iTunes AAC
With the iTunes AAC encoder, there aren’t really quality settings. You’re given a choice of 17 stereo bit rate settings (16–320 kbps), 10 sampling rate settings (8.0 kHz – 48 kHz, and “Auto”), and 3 channel settings (Auto, Mono, and Stereo), plus options to use VBR, High Efficiency, and voice optimization.
1.   How many bit rate settings should I choose? Some argue that transparency is usually at 128 kbps
2.   What settings should I use for the “Sample Rate” and “Channels” Options?
3.   Should I consider using high efficiency?
4.   Should I consider using QuickTime Pro instead of iTunes 9? I haven’t bought it yet.

WMA Standard
1.   Big issue: Should I test WMA Standard with CBR or with VBR? At the moment, someone is scheduled to test WMA Std with VBR, and I’m trying to convince someone to test WMA Std with CBR.
2.   Neither WMA Std CBR nor VBR offers many quality settings. How many settings should I test? For that matter, when does WMA Standard (either CBR or VBR) reach transparency?

WMA Pro 10
1.   At which settings does WMA Pro reach transparency?
2.   Number of settings?
3.   What should be considered a low setting?

Musepack
1.   Are three quality settings with 18 samples per setting good enough? (Q3 – Q5)

Samples

There will be samples from 6 different genres of music, with 2 samples for each genre, each 20 seconds long. Each song will be ripped from CD. Here are the genres:
•   Classical
•   Jazz
•   Rap/Hip Hop
•   Country
•   Rock
•   Pop
I will post a complete list of the music samples within a couple of days.


Proposed Codec Settings
MP3

Ogg Vorbis

AAC

Windows Media Audio (Encoder: Windows Media Player 11)

Musepack (SV8)

As I stated before, I value any criticism and advice regarding the plans.

Title: *Improved* Multi Codec Listening Test Plans
Post by: Larson on 2010-03-10 08:32:34
You could use LAME 3.98.3, which was released recently, Nero AAC 1.5.4 (bugfix updates), and Apple AAC true VBR through qtaacenc by the great nao!
Title: *Improved* Multi Codec Listening Test Plans
Post by: ojdo on 2010-03-10 09:33:08
MP3 (LAME):
3.   Channel Mode: Joint Stereo or Stereo?

AFAIK Joint Stereo is more efficient at a given bitrate, as the coding scheme can exploit similarities between the L and R channels. So I would recommend using joint stereo (the default setting) instead of separate channels.
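
To illustrate what "exploiting similarities between L and R" can mean, here is a minimal Python sketch of a mid/side transform, one of the tools joint stereo relies on. It is a simplified illustration and not LAME's actual implementation; the function names and the sample values are made up for the example.

```python
# Simplified mid/side illustration; NOT LAME's implementation.
# All sample values below are made up for demonstration.

def to_mid_side(left, right):
    """Convert left/right sample lists to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side and recover the left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values."""
    return sum(x * x for x in samples)


# Two nearly identical channels, as is common in real recordings.
left = [0.50, 0.25, -0.75, 0.125]
right = [0.50, 0.125, -0.625, 0.125]

mid, side = to_mid_side(left, right)
print("energy of right channel:", energy(right))
print("energy of side channel: ", energy(side))
```

When the two channels are similar, the side signal carries very little energy, so an encoder can spend most of its bits on the mid signal. With fully separate stereo, both channels have to be coded at full cost.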


One more general question: You write in the summary that you want to find out
Quote
what codec performs better than others at compression within these different quality settings
but then you write that there are
Quote
seven individuals taking the listening tests, and each will be assigned their own codec to test.


I'm not an expert on listening tests, nor have I conducted one myself yet, but I fear you won't be able to compare the quality/transparency of different codecs if each of them is judged by a different individual. To draw such a conclusion, it would be better to have each individual ABC/HR a set of samples encoded with different codecs at (bitrate-wise) similar settings.
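
A rough Python sketch of such a crossed design, where every listener rates every codec on every sample, might look like the following. The listener, codec, and sample names are placeholders based on the first post (about seven listeners, seven encoders, 6 genres x 2 samples); `build_schedule` is a hypothetical helper, not part of ABC/HR for Java.

```python
import random

# Hypothetical helper; ABC/HR for Java does its own randomization per session.
def build_schedule(listeners, codecs, samples, seed=0):
    """Give every listener every (sample, codec) pair in a shuffled order."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    schedule = {}
    for listener in listeners:
        trials = [(s, c) for s in samples for c in codecs]
        rng.shuffle(trials)  # independent order for each listener
        schedule[listener] = trials
    return schedule


# Placeholder names taken from the plan in the first post.
listeners = [f"listener_{i}" for i in range(1, 8)]   # "about seven individuals"
codecs = ["LAME MP3", "aoTuV Vorbis", "Nero AAC", "Apple AAC",
          "WMA Standard", "WMA Pro", "Musepack"]
samples = [f"sample_{i:02d}" for i in range(1, 13)]  # 6 genres x 2 samples

schedule = build_schedule(listeners, codecs, samples)
for listener, trials in schedule.items():
    print(listener, "->", len(trials), "trials, first:", trials[0])
```

Because every listener hears every codec, differences between listeners affect all codecs equally instead of being mixed up with the codec being judged.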
Title: *Improved* Multi Codec Listening Test Plans
Post by: dv1989 on 2010-03-10 10:02:06
Quote
MP3 (LAME):
2. Variable Speed: Fast or Standard?
3. Channel Mode: Joint Stereo or Stereo?
4. Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

That you must ask such basic questions makes me wonder whether or not you're ready to undertake such an apparently ambitious test. Fast is default and recommended, as is joint stereo. The newest version works exactly like previous ones.
Title: *Improved* Multi Codec Listening Test Plans
Post by: googlebot on 2010-03-10 13:03:57
You should test at equal (average) bitrates. Otherwise the results are meaningless and just a function of the initially chosen bitrate proportions. From a user perspective it doesn't make sense, anyway, that codec X at 140 kbit/s outperforms Y at 128 kbit/s. The opposite doesn't, either. What matters is which is best at a chosen average rate for specific content.
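
As a sketch of how one might check that candidate settings really land at equal average bitrates over the sample set, consider the following Python snippet. The codec names, the measured bitrates, and the 10% tolerance are all made-up assumptions for illustration.

```python
def mean_bitrate(bitrates_kbps):
    """Average bitrate over all encoded samples."""
    return sum(bitrates_kbps) / len(bitrates_kbps)


def off_target(per_codec_bitrates, target_kbps, tolerance=0.10):
    """Return {codec: mean} for codecs whose mean bitrate deviates from the
    target by more than `tolerance` (a fraction of the target)."""
    flagged = {}
    for codec, rates in per_codec_bitrates.items():
        avg = mean_bitrate(rates)
        if abs(avg - target_kbps) > tolerance * target_kbps:
            flagged[codec] = avg
    return flagged


# Hypothetical per-sample bitrates in kbps, as reported after encoding.
measured = {
    "codec_x": [120, 131, 140, 125],
    "codec_y": [150, 160, 145, 165],
}

flagged = off_target(measured, target_kbps=128)
print(flagged)  # codecs that need a different quality setting
```

A flagged codec would be re-encoded at a neighbouring quality setting until its average over the whole sample set falls inside the tolerance.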

Your plan suffers from too much complexity. You must reduce it. Just do one proper ABX of Nero q .41 vs. iTunes AAC CVBR 128, and measure the time you need to come to a conclusive result. Multiply that by the hundreds if not thousands of individual comparisons your new plan necessarily involves.
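
The multiplication can be sketched in Python. The settings counts follow the first post (5 settings per codec, 3 for Musepack) and 12 samples (6 genres x 2); the 5 minutes per comparison is only a placeholder to be replaced by a measured value.

```python
# Settings per codec as drafted in the first post (5 each, 3 for Musepack).
settings_per_codec = {
    "MP3 (LAME)": 5,
    "Ogg Vorbis": 5,
    "Nero AAC": 5,
    "Apple AAC": 5,
    "WMA Standard": 5,
    "WMA Pro": 5,
    "Musepack": 3,
}


def total_comparisons(settings_per_codec, samples, listeners=1):
    """One comparison per (codec setting, sample, listener)."""
    return sum(settings_per_codec.values()) * samples * listeners


def total_hours(comparisons, minutes_per_comparison):
    """Convert a comparison count into listening hours."""
    return comparisons * minutes_per_comparison / 60


per_listener = total_comparisons(settings_per_codec, samples=12)
everyone = total_comparisons(settings_per_codec, samples=12, listeners=7)

# ASSUMPTION: 5 minutes per comparison is a placeholder, not a measurement.
print(per_listener, "comparisons per listener,",
      total_hours(per_listener, 5), "hours")
print(everyone, "comparisons in total")
```

Even with that optimistic placeholder, each listener would face several hundred comparisons, which is the scale of effort the plan has to be cut down from.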
Title: *Improved* Multi Codec Listening Test Plans
Post by: stephanV on 2010-03-10 13:09:57
I'm sorry to see that you haven't done anything with criticisms you received on your previous posts.
Title: *Improved* Multi Codec Listening Test Plans
Post by: Alexxander on 2010-03-10 14:04:47
Quote
So far there will be about seven individuals taking the listening tests, and each will be assigned their own codec to test.

Do you really mean each codec is assigned to only one individual? If so, think hard about what you're doing.
Title: *Improved* Multi Codec Listening Test Plans
Post by: timcupery on 2010-03-10 15:07:14
why don't you use foobar2000 as frontend software for all the codecs? (except WMA of course)
not that it's a big deal, but you can use foobar2000 for mp3 and ogg vorbis as well as for AAC and musepack. streamlines the process.

Quote
MP3 (LAME):
2. Variable Speed: Fast or Standard?
3. Channel Mode: Joint Stereo or Stereo?
4. Is the new release of LAME 3.98.3 available for most frontends (like Audacity or foobar2000 1.0)?

That you must ask such basic questions makes me wonder whether or not you're ready to undertake such an apparently ambitious test. Fast is default and recommended, as is joint stereo. The newest version works exactly like previous ones.

this is very true. I'll chime in on the general "back to school before the drawing board" sentiment.

but take heart. once you learn what you're doing, such a listening test would be a good thing.
Title: *Improved* Multi Codec Listening Test Plans
Post by: dv1989 on 2010-03-10 15:15:26
I made my point after skim-reading the initial post, so it's rather basic and only superficially related to methodology (which I'm not qualified to comment on), but several criticisms raised in this and previous topics by other users do seem to be fairly glaring.
Title: *Improved* Multi Codec Listening Test Plans
Post by: ezlez on 2010-03-11 01:13:58
I've now decided that I should test each subject with the same number and types of codecs. So now I need to overhaul the planning of the testing to produce significant results. At the moment, I've made two different plans for the listening tests.
Title: *Improved* Multi Codec Listening Test Plans
Post by: а.п.т. on 2010-03-11 12:46:36
Quote
Musepack (SV8)
  • Encoder: Musepack Encoder (MPC) through foobar2000
  • VBR
  • 18 Samples per Quality Setting (54 Tests)

  • Q3 (~90 kbps)
  • Q4 (~128 kbps)
  • Q5 (~170 kbps)


As I stated before, I value any criticism and advice regarding the plans



I believe that testing Musepack at settings below Q4 (it should be --quality 4, btw) to find the transparency point is pointless; it is not tuned at all for such bitrates. For me it becomes transparent at about Q5.5 (although once I needed Q6), so if you want 5 steps for Musepack as well, I would suggest you