HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: Kamedo2 on 2012-11-17 08:25:55

Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-17 08:25:55
Abstract:
A blind comparison of the September 2012 experimental Opusenc (tfsel5), the older Celtenc 0.11.2, and Apple AAC-LC in TVBR and CVBR modes.
This is an English version of my original post in Japanese. http://d.hatena.ne.jp/kamedo2/20121116/1353099244#seemore (http://d.hatena.ne.jp/kamedo2/20121116/1353099244#seemore)

Encoders:
libopus 0.9.11-146-gdc4f83b-exp_analysis
https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip (https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip)
celt-0.11.2-win32
https://people.xiph.org/~greg/celt-0.11.2-win32.zip (https://people.xiph.org/~greg/celt-0.11.2-win32.zip)
qaac 1.40

Settings:
opusenc --bitrate 66 input.wav output.opus
celtenc input.48k.raw --bitrate 75 --comp 10 output.oga
qaac --cvbr 72 -o output.m4a input.wav
qaac --tvbr 27 -o output.m4a input.wav
opusenc --bitrate 90 input.wav output.opus
celtenc input.48k.raw --bitrate 100 --comp 10 output.oga
qaac --cvbr 96 -o output.m4a input.wav
qaac --tvbr 45 -o output.m4a input.wav

Samples:
20 sound samples of various genres, from easy to moderately critical.
http://zak.s206.xrea.com/bitratetest/main.htm (http://zak.s206.xrea.com/bitratetest/main.htm)
To download, follow the link above; the files are the 3rd-6th links in the 2nd paragraph (40_30sec - Run up).

Hardware:
Sony PSP-3000 with RP-HT560 (1st run) and RP-HJE150 (2nd run); I took the average of the two results.

Results:
(http://i45.tinypic.com/imkosh.png)

(http://i46.tinypic.com/9vf75t.png)

Conclusions & Observations:
I could not detect a significant improvement in the new September 2012 Opus over the old 2011 Celtenc.
This is possibly because the new Opus inflates bitrate more than it improves quality, even though the sample set contains easy samples.
At 75kbps, Opus/Celt are markedly better than AAC. At 100kbps, there is no big difference between the codecs.

Raw data:
40 ABC/HR logs, plus encoder and decoder logs
http://zak.s206.xrea.com/bitratetest/log_opusaac75kbps100kbps.zip
Code: [Select]
% Opus, AAC 75kbps, 100kbps ABC/HR Score
% This format is compatible with my graphmaker, as well as ff123's FRIEDMAN.
opus_75k    celt_75k    cvbr_75k    tvbr_75k    opus100k    celt100k    cvbr100k    tvbr100k
%features 6 75kbps 75kbps 75kbps 75kbps 100kbps 100kbps 100kbps 100kbps
%features 7 OPUS OPUS AAC-LC AAC-LC OPUS OPUS AAC-LC AAC-LC
3.050    3.100    2.500    2.750    3.500    3.750    3.700    3.800    
3.750    2.950    2.700    2.750    4.050    3.800    4.000    3.950    
2.800    2.550    3.000    3.000    3.600    3.250    4.050    3.900    
2.700    3.150    2.350    2.300    3.350    3.800    3.600    3.700    
4.000    3.400    2.850    2.850    4.350    3.900    3.550    3.550    
2.600    2.550    2.800    2.800    3.350    3.150    3.950    3.900    
3.400    3.950    3.000    3.200    3.850    4.500    3.700    3.800    
3.450    3.500    2.900    2.800    3.850    4.050    4.050    4.150    
2.950    2.700    3.550    3.450    3.250    3.450    4.000    3.850    
3.100    3.400    2.750    2.600    3.800    3.850    4.150    4.000    
3.350    3.100    2.600    2.600    3.750    3.400    3.450    3.500    
3.750    3.350    2.800    2.950    4.050    3.750    3.800    3.850    
3.550    3.300    2.600    2.650    4.250    3.950    3.750    3.600    
3.100    3.350    2.750    2.550    3.650    3.700    3.850    3.800    
3.400    3.450    2.900    2.900    3.650    3.950    3.750    3.900    
3.250    3.300    2.750    2.800    3.650    3.850    3.950    3.750    
3.600    3.800    3.300    3.300    3.550    4.000    3.650    3.700    
3.700    3.350    3.300    3.300    3.900    3.650    4.100    4.000    
3.100    3.600    3.150    3.000    3.700    3.800    4.100    3.850    
3.650    4.050    3.000    2.900    4.050    4.250    3.750    3.550

Some scores fall on a 0.050 grid because I tested each sample twice and averaged the two ratings.
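
For readers who want to reproduce the per-condition averages or the significance test from the score matrix above, here is a minimal sketch. It assumes the matrix block is saved as scores.txt; the file name and the parsing are illustrative, not part of ff123's actual FRIEDMAN tool.
Code: [Select]
# Per-condition means and a Friedman test over the 20x8 score matrix.
import numpy as np
from scipy.stats import friedmanchisquare

rows, header = [], None
with open("scores.txt") as f:
    for line in f:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip comment and %features lines
        if header is None:
            header = line.split()  # the eight condition names
        else:
            rows.append([float(x) for x in line.split()])

scores = np.array(rows)  # shape (20 samples, 8 conditions)
for name, mean in zip(header, scores.mean(axis=0)):
    print(f"{name}: {mean:.3f}")

# Friedman test: do the eight conditions differ across the 20 samples?
stat, p = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")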
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: C.R.Helmrich on 2012-11-17 09:28:31
Thanks for this interesting test, Kamedo, and welcome to HA!

So is the "libopus 0.9.11-146-gdc4f83b-exp_analysis (https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip)" equal to the current official Opus release, or something older?

Chris
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-17 09:40:41
Thanks for this interesting test, Kamedo, and welcome to HA!

So is the "libopus 0.9.11-146-gdc4f83b-exp_analysis (https://people.xiph.org/~greg/opus-tools_exp_dc4f83be.zip)" equal to the current official Opus release, or something older?

Chris

I started the tests on 2012-09-01, and at that time this was the newest experimental branch, called exp_analysis7, which was merged into the main branch on 2012-10-09 by Jean-Marc Valin.
So it's not the current official Opus release. I think I spent too much time testing.
http://git.xiph.org/?p=opus.git;a=commit;h=dc4f83bef59b66608b4274c229dedf21b3fa6ecb
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-17 09:48:06
Thank you for your time and dedication, Kamedo2 

Like the other tests you've done, it's a very thorough job and provides useful comparative data points on the leading encoders at 100kbps, much as other good individual listeners' serious multi-sample tests (e.g. guruboolez's) have provided useful information in the past.

At 75 kbps it's debatable and possibly subjective whether iTunes/QuickTime LC-AAC or HE-AAC is the better AAC mode, as this is near the point (from blind comparisons I've seen) where the two AAC modes converge in quality scores. It also extends somewhat above 64 kbps the range where ABX logs have shown Opus to beat a leading AAC encoder by a likely significant margin (albeit that the 64 kbps multiformat test had ruled out LC-AAC before the main test, as HE-AAC beat it at 64 kbps).

It's also reassuring that the modifications to the final rather advanced CELT versions that allowed it to be combined with a modified SILK in the hybrid Opus codec do not appear to have inflicted any quality penalty at the bitrates and samples you've tested.

Thanks again!
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: IgorC on 2012-11-17 10:04:48
Kamedo2, thank you for all your tests.
Glad to see you on HydrogenAudio.

The way the test was presented reminds me of someone else's.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-17 17:44:41
The samples I used
(http://i48.tinypic.com/2qi02ea.png)

The ABX criterion was 12/15 (p=0.02). Among the 320 ABX runs I did, I failed to pass only once: mybloodrusts, Celt at 100kbps (11/15).
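
For reference, the p-value behind the 12/15 criterion is a one-sided binomial tail probability; a quick sketch:
Code: [Select]
# One-sided binomial p-value for an ABX run:
# P(at least `correct` out of `trials` by guessing alone).
from math import comb

def abx_p(correct: int, trials: int) -> float:
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(abx_p(12, 15))  # ~0.0176, the quoted p = 0.02 criterion
print(abx_p(11, 15))  # ~0.0592, so 11/15 does not pass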
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Anakunda on 2012-11-17 22:45:05
Blind Comparison between 2012/09 new Opusenc(tfsel5),  old Celtenc 0.11.2, Apple AAC-LC tvbr, cvbr.


Is Apple AAC really better at 96k than Opus? I was confident Opus rules all the low bitrate ranges 
And I wonder why OpusEnc with ea7 was used when there is the official build (libopus 1.0.1)?
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: C.R.Helmrich on 2012-11-17 23:54:00
Is Apple AAC really better at 96k than Opus? I was confident Opus rules all the low bitrate ranges 
And I wonder why OpusEnc with ea7 was used when there is the official build (libopus 1.0.1)?

Your second question was answered in Kamedo's reply to my question. And who says Apple AAC is better than Opus at 96 kbps? The confidence intervals overlap heavily, and Kamedo's statistical analysis says there is no significant difference.

And: in today's codec development, we consider 96 kbps "high bitrate".

Chris
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-18 04:47:56
Is Apple AAC really better at 96k than Opus? I was confident Opus rules all the low bitrate ranges

It is plausible that Opus is slightly better than Apple AAC at 96k over a very large set of samples.
Opus may look poorer than you imagined on the x-axis=actual-bitrate, y-axis=average-score graph because
its bitrate bloat is so large (nominal=90k, actual=102k), shifting its point rightward on the graph.
This is worrying behavior, given that my set of 20 samples contains plenty of non-critical samples.

And I wonder why OpusEnc with ea7 was used when there is the official build (libopus 1.0.1)?

libopus 1.0.1 was released two weeks after I had started the test, which took me 2.5 months to complete.
The newer version may beat AAC by a clear margin if its tweaks include quality improvements or tightened
actual bitrates.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-18 08:20:43
Quote
x-axis=actual bitrate


That was one query I did have about your method, Kamedo2 - mainly because I don't speak more than about 5 words of Japanese to find out for myself.

As you have been so thorough, I expect we can assume that you tested the encoders on a larger collection (maybe 10 or many more CDs) of normal music to determine the target settings to achieve the bitrate target, and that you plotted THAT collection-wide bitrate (for the large sample size) on the horizontal axis of your graphs, ignoring the actual bitrate used for the test samples?

The bitrate on a number of short samples (with more problem samples than usual) may properly be much higher than the average of a wide collection and is not relevant in the context of fair comparison. (That's the beauty of well tuned VBR, and to some extent modern constrained VBR as used by the 'CBR' modes - using much higher bitrate when the sound requires it and less when it doesn't without inflating the average over a whole collection).

There's also a chance that Opus produces lower bitrates for short samples, as it doesn't use large codebooks compared to AAC, Vorbis etc. Using the sample-wide bitrate would therefore over-emphasise the disadvantage to AAC at lower bitrates on shorter-than-normal samples, again calling for the bitrate of a wider collection of representative CD releases, presumably encoded in the popular file-per-track mode without any pre-applied ReplayGain, as that's most representative of real use of lossy encoders for playback.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-18 09:23:18
Quote
x-axis=actual bitrate


That was one query I did have about your method, Kamedo2 - mainly because I don't speak more than about 5 words of Japanese to find out for myself.

As you have been so thorough, I expect we can assume that you tested the encoders on a larger collection (maybe 10 or many more CDs) of normal music to determine the target settings to achieve the bitrate target, and that you plotted THAT collection-wide bitrate (for the large sample size) on the horizontal axis of your graphs, ignoring the actual bitrate used for the test samples?


The horizontal axis is always the average actual bitrate of each sample, including headers and footers.
I calculated filesize*8/(sample_num/44100) for each sample, typically 20-30 sec long.
Then I took the average of the 20 bitrates in bps. Larger collections are NEVER used.
Encoded files are typically around 200KB. Opus should benefit from smaller codebooks in this case.
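
For illustration, the per-sample formula and the averaging step might look like this in code (the file names and sample counts are placeholders; 44100 Hz is the source sample rate):
Code: [Select]
# Per-sample actual bitrate: filesize*8/(sample_num/44100), then the mean.
import os

def actual_bitrate_kbps(path, sample_num, rate=44100):
    """Bitrate including container headers/footers, in kbps (1000 bps)."""
    return os.path.getsize(path) * 8 / (sample_num / rate) / 1000

# Hypothetical list of the 20 encoded samples and their PCM sample counts:
files = [(f"sample{i:02d}.opus", 30 * 44100) for i in range(1, 21)]
rates = [actual_bitrate_kbps(p, n) for p, n in files]
print(sum(rates) / len(rates))  # the plotted x-axis value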

Quote
larger collection (maybe 10 or many more CDs) of normal music

My set of samples is fairly normal; at least it is not an array of super-critical fatboy-style killers, although the samples are short.
Here are the last 5 of the 20 samples I used (Reunion Blues ~ Run up):
http://zak.s206.xrea.com/bitratetest/bitratetest_wav25-29.zip

Code: [Select]
list of audio file size in KB(1000B)
286 377 187 220 231 151 153 278 104 232 252 204 182 246 187 251 267 255 155 170
283 278 182 231 175 165 191 264 100 237 260 192 188 265 190 282 289 256 186 189
303 280 202 233 190 163 191 266 108 233 273 199 188 263 184 313 278 256 187 189
319 216 208 224 175 163 191 253 108 209 268 205 163 240 154 341 233 242 169 158
389 509 252 298 313 204 202 375 139 311 342 276 244 334 238 341 344 344 210 229
379 373 242 309 234 220 255 353 133 316 346 257 250 354 253 376 386 343 249 252
415 370 265 310 252 222 252 355 143 312 372 264 248 354 249 411 366 341 250 245
425 287 275 310 239 222 262 347 143 282 372 282 231 340 212 455 305 330 241 222

list of audio file bitrate including headers and footers in kbps(1000bps)
  76 101  78  72  92  69  60  79  80  74  69  81  73  69  75  67  72  74  63  68
  75  74  76  76  70  76  75  75  77  76  72  76  75  75  76  75  78  75  75  76
  81  75  84  76  76  75  75  75  84  74  75  79  75  74  74  84  75  75  75  75
  85  58  87  73  70  75  75  72  84  67  74  81  65  68  62  91  63  71  68  63
104 136 105  97 125  93  79 106 107  99  94 110  98  94  96  91  93 100  84  92
101 100 101 101  94 101 100 100 103 101  95 102 100 100 102 100 104 100 100 101
111  99 111 101 101 102  99 100 111  99 102 105  99 100 100 110  99  99 101  98
113  77 115 101  96 102 103  98 111  90 102 112  92  96  86 121  83  96  97  89
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-18 19:07:16
Thank you for the clarification.

It seems that duration is typically 20-30 seconds, though a few are around 10 seconds, so this is typically 10-20% of the length of a typical song. I think the advantage to Opus of no codebook transmission (http://en.wikipedia.org/wiki/Opus_(audio_format)) comes mainly in files of a few seconds' duration, as it saves a few kilobytes compared to formats like Vorbis, so hopefully any bitrate penalty to AAC can be neglected, as it's probably less than 1% on average.

I guess only a few samples like Tom's Diner and 41_30sec have been considered problem samples in the past by my recollection, so their effect on overall bitrate is hopefully minor. My guess is that the test method is essentially sound, and any minor differences that might arise between the test samples and typical real-world usage are probably masked by the statistical error bars.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: IgorC on 2012-11-18 19:12:05
Kamedo2,

I'm not here to criticize your test, quite the contrary. I agree with your results.
I know how it feels when somebody starts criticizing test results just because the codec he ships in his products didn't come first, no matter how good your intentions were.

So, Thank You Very Much for your labor. 

Though I have a somewhat different philosophy of testing VBR encoders, one closely related to HA's public tests.
Simply put, an encoder is expected to bloat the bitrate on difficult samples. That's fine as long as VBR encoders respect the average bitrate over a large number of albums.
But I still agree with you that it's not wrong to expect the overall bitrate of all tested samples to match as well.
Like all listening tests, yours uses samples whose difficulty is higher than average. Not necessarily extreme killers, but somewhat harder to code. So it's an evaluation of the HIGH part of VBR only; we don't test the simple passages where the encoder goes really LOW. Yes, the experimental Opus bloats bitrate on hard parts (HIGH), but there was no evaluation of the LOW part (very easy to encode samples). More likely, Opus's VBR mode is more "true" than Apple's true VBR.
I've encoded some albums with the experimental Opus build and CELT 0.11.2. Yes, there could be a difference of 2-3 kbps overall, but shifting bitrates by ~10 kbps is a bit too much.

But then again it's my point of view.

I think ideally two encoders should be tested at settings where both end up with the same file size over a large number of albums, and the test items should then be chosen randomly so that they match the same rate between them too.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-19 01:17:10
Like all listening tests, yours uses samples whose difficulty is higher than average. Not necessarily extreme killers, but somewhat harder to code. So it's an evaluation of the HIGH part of VBR only; we don't test the simple passages where the encoder goes really LOW. Yes, the experimental Opus bloats bitrate on hard parts (HIGH), but there was no evaluation of the LOW part (very easy to encode samples).

If we were to include LOW samples, we can predict what happens to CBR and VBR codecs.
On CBR like CELT, because there are plenty of bits available, the average quality will go up: an upward shift on the x-axis=bitrate, y-axis=quality plot.
On VBR like Opus, because the goal is to retain the same quality, the average bitrate will go down while the quality remains the same: a leftward shift on the plot.
On a bitrate-vs-quality graph, an upward shift and a leftward shift effectively mean the same thing: better results.

I've encoded some albums with the experimental Opus build and CELT 0.11.2. Yes, there could be a difference of 2-3 kbps overall, but shifting bitrates by ~10 kbps is a bit too much.

Nice to hear that the bitrate increase is rather minor.
The problem is that the "some albums" set is not redistributable and thus not reproducible. Maybe we should have an objective method to measure "bitrate bloat".
In VBR development, people are likely to make tweaks that increase bitrate. A rightward shift on the plot is a bad thing; if a tweak increases the bitrate of only the
critical parts, the average bitrate bloat is minimized, meaning very little rightward shift. If a tweak causes a generic bitrate increase, the rightward shift is big, which makes it useless.
Generic bitrate increase plagues LAME, and I don't want Opus to go the same way.

As for my samples being short: a Vorbis codebook, for example, is several kilobytes, so compared to 4-5 minute samples, Opus gains at best 2% from my samples being short.
That can matter when the intended application is music storage; for ads, news videos, game sounds, and Wikimedia, testing at 20 seconds is about right.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-19 19:03:14
Hi Kamedo2, thanks for the test. From what I see, I suspect that the main "issue" is the rate. Based on the commit ID you're quoting for Opus, it was a version where the VBR is supposed to be properly "calibrated". That is, on a large collection of samples, the average output rate matches the target rate. Unlike the original CELT VBR code (which is what the Opus RFC has) that was always trying to get the same average rate, the new VBR code is quite aggressive. It will increase the rate significantly on hard samples and reduce it on easier samples. It would seem that your test has a significant fraction of samples that Opus considers hard, which is why you're getting 100 kb/s when asking for 90 kb/s. Keep in mind that the samples Opus considers hard are different from the ones that are considered hard for MP3, Vorbis or AAC.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: lvqcl on 2012-11-19 19:32:59
I took my Opus compile (libopus v1.0.1-140-gc55f4d8 from git) and encoded several albums with it (pop/rock/electronic etc).
X axis: the value of the --bitrate option. Y axis: real bitrate. One album = one line on the graph.

(http://i.imgur.com/1VRNP.png)
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-19 22:19:18
it was a version where the VBR is supposed to be properly "calibrated". That is, on a large collection of samples, the average output rate matches the target rate.

Thank you. Nice to hear that the Opus developers have a sensible way to avoid tweaks that increase bitrate on virtually everything yet get perceived as 'improvements'.

I took my Opus compile (libopus v1.0.1-140-gc55f4d8 from git) and encoded several albums with it (pop/rock/electronic etc).
X axis: the value of the --bitrate option. Y axis: real bitrate. One album = one line on the graph.

I'm pleased to see from your graph that in the real-life application of music storage, Opus always creates files very close to the user's desired bitrate.

According to the bitrate table I posted, only 2 samples, mybloodrusts and Dimmu Borgir, are what Opus treated as 'easy' samples (the value of the --bitrate option > real bitrate). Although I'm reluctant to include more easy samples, because they add ABXing time and statistical noise, I should have included more of them.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-20 02:03:32
Thanks again to everyone in this thread. I'm certainly learning a lot. That graph of actual versus requested bitrate is useful information I wasn't aware of.

The low latency and fixed codebook requirements of Opus/CELT do mean it has different hard samples compared to high latency codecs like AAC and Vorbis and requires different bitrates to encode them well.

The beauty of Kamedo2's visual graphing approach of bitrate versus quality rating is that we can adjust the horizontal position of the rated scores if we find the true representative bitrate of the settings used over a wider collection without having to retest the samples.

Moving Opus at 75 kbps leftwards to its 66 kbps target bitrate, it's still clearly a winner over AAC-LC at anything remotely plausible for the AAC setting, even 60 kbps upwards, so the Opus victory there is solid. We don't know where CELT would end up, but likely no further to the right than its current position, possibly as low as about 66 kbps also.

In the 100 kbps area, both AAC and Opus might need a little shift to the left (Opus to 90 kbps; AAC not sure, but probably a smaller shift) and are likely to remain statistically tied, as they are now.

If we get more accurate bitrates, the graph can be replotted, the statistical results will surely be the same and the scatter plots in the first figure will represent actual bitrate variation over the sample corpus. A short horizontal line indicating the wide collection average at each setting would be useful to indicate which samples each encoder considers about average or a little bit harder to encode than average. (In any future test where the tested encoder settings are matched closely over a wider collection of CDs, the 'average bitrate line' for each encoder at the same target rate will be pretty close)
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-20 04:13:01
We don't know where CELT would end up, but likely no further to the right than its current position, possibly as low as about 66 kbps also.


CELT would normally stay where it is because it never had unconstrained VBR. It was always trying to hit the requested bitrate average for every file. Unconstrained VBR is a new feature in Opus and is still only implemented in the development branch (git master). So normally the CELT curves would remain in place, while the Opus curves would move to 66 kb/s and 90 kb/s.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-20 17:39:10
According to the bitrate table I posted, only 2 samples, mybloodrusts and Dimmu Borgir, are what Opus treated as 'easy' samples (the value of the --bitrate option > real bitrate). Although I'm reluctant to include more easy samples, because they add ABXing time and statistical noise, I should have included more of them.


Well, you could also just "calibrate" on a large set that includes a bit of everything, which I believe is what most tests do. I had a closer look at the bitrates and indeed it seems like your test really focuses on hard samples -- probably more than any other test I've seen so far. Are the audio samples available so I can check why Opus is finding them hard?
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: C.R.Helmrich on 2012-11-20 22:01:23
Samples:
20 sound samples of various genres, from easy to moderately critical.
http://zak.s206.xrea.com/bitratetest/main.htm (http://zak.s206.xrea.com/bitratetest/main.htm)
To download, follow the link above; the files are the 3rd-6th links in the 2nd paragraph (40_30sec - Run up).


Edit: I put these samples into Fraunhofer's latest AAC encoder, and at VBR 3 (target ~97 kbps), they average 107 kbps. Here as well, only 2 items get <= 97 kbps (samples 11ff and 24td). So it seems that what Opus/Celt finds difficult to code is not that different from what an AAC encoder might find difficult, after all (except maybe for very tonal items, yes).

Chris
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-21 01:08:17
CELT would normally stay where it is because it never had unconstrained VBR. It was always trying to hit the requested bitrate average for every file. Unconstrained VBR is a new feature in Opus and is still only implemented in the development branch (git master). So normally the CELT curves would remain in place, while the Opus curves would move to 66 kb/s and 90 kb/s.

I assume the preference for bitrate 'calibration' is to get close to real-world applications, where easy and difficult samples appear roughly 1:1, as opposed to 2:18 here.
In a real-world mix, where easy samples appear more frequently than in this test, CBR codecs like CELT will not 'normally stay'. With plenty of easy samples there are fewer perceptible artifacts, thus better average quality (imagine MP3 CBR at 128kbps), so the point moves upward. It will NOT remain in place.
In other words, in typical music-storage applications, VBR codecs like Opus get better compression, while CBR codecs get better quality.


Edit: I put these samples into Fraunhofer's latest AAC encoder, and at VBR 3 (target ~97 kbps), they average 107 kbps. Here as well, only 2 items get <= 97 kbps (samples 11ff and 24td). So it seems that what Opus/Celt finds difficult to code is not that different from what an AAC encoder might find difficult, after all (except maybe for very tonal items, yes).

Chris

Interesting stuff. 11ff = finalfantasy is the sample Opus considered the hardest, but for AAC it's easy. I guess it's because the sound of the harpsichord is tonal. After spending 101 and 136kbps on the harpsichord, Opus did a very good job preserving the detail (3.75 and 4.05, respectively). And somehow, 24td = Tom's Diner is easy for AAC (but difficult for Opus).
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-21 02:15:32
I assume the preference for bitrate 'calibration' is to get close to real-world applications, where easy and difficult samples appear roughly 1:1, as opposed to 2:18 here.
In a real-world mix, where easy samples appear more frequently than in this test, CBR codecs like CELT will not 'normally stay'. With plenty of easy samples there are fewer perceptible artifacts, thus better average quality (imagine MP3 CBR at 128kbps), so the point moves upward. It will NOT remain in place.
In other words, in typical music-storage applications, VBR codecs like Opus get better compression, while CBR codecs get better quality.


Actually, in the "real world", I would say that difficult samples represent less than 10% of all samples. Also, by "stay where it is", all I meant was that the x-axis position would remain the same regardless of the samples, because CELT gives the same average for every sample (unlike the new Opus code). As for "better compression vs better quality", it's really all the same thing because you trade one for the other. The main thing we've actually been trying to do with VBR is to keep the quality constant by adjusting the bitrate. In a perfect world (unachievable), all samples would get exactly the same quality and only the bitrate would vary.


Interesting stuff. 11ff = finalfantasy is the samples Opus considered it to be the hardest. But for AAC, it's easy. I guess it's because the sound of the harpsichord is tonal. After using 101 and 136kbps for the harpsichord, Opus did a very good job preserving the detail(3.75 and 4.05, respectably). And somehow, 24td = Tom's Diner is easy for AAC(but difficult for Opus).


Of all the instruments, the harpsichord is the most difficult for Opus to code and the one that requires the highest bitrate. So the fact that you included two harpsichord samples (although harpsichord is typically rare) is in part what explains why Opus is using a rate that's much higher than its usual rate. As for the reason why tonal samples are harder for Opus than for AAC, the main reason is the focus on interactive applications. Opus is designed for lower delay than AAC-LC, which forced the use of a shorter window (especially a shorter overlap).
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-21 02:26:30
Bitrate vs Score plot of the 20 samples used.

Opus
(http://i47.tinypic.com/2h81kd1.png)
The two outlier samples at the top right are finalfantasy and FloorEssence: excellent quality, but a lot of bitrate used.

Apple AAC-LC tvbr
(http://i47.tinypic.com/316k9og.png)
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-21 04:37:25
Actually, in the "real world", I would say that difficult samples represent less than 10% of all samples. Also, by "stay where it is", all I meant was that the x-axis position would remain the same regardless of the samples, because CELT gives the same average for every sample (unlike the new Opus code). As for "better compression vs better quality", it's really all the same thing because you trade one for the other. The main thing we've actually been trying to do with VBR is to keep the quality constant by adjusting the bitrate. In a perfect world (unachievable), all samples would get exactly the same quality and only the bitrate would vary.

By 'hard samples' I meant samples whose real bitrate exceeds the value of the --bitrate option. In the real world, if we were to randomly choose 100 samples, it is likely that roughly 50 would fall below the --bitrate value and 50 above. Yes, 1:1. As for "better compression vs better quality", those who want to calibrate the x-axis to better represent the world should also calibrate the y-axis. Calibrating the y-axis involves listening tests with many easy samples, and it's understandable why people don't want to do that; but on an average-bitrate vs average-score graph, calibrating only one axis would break the beautiful point of the graph when it includes evaluation of CBR codecs like CELT, and make it less fair.

By far the simplest solution I can think of is to add more easy samples, defined as samples that produce a lower bitrate than expected from the value of the --bitrate option.

Of all the instruments, the harpsichord is the most difficult for Opus to code and the one that requires the highest bitrate. So the fact that you included two harpsichord samples (although harpsichord is typically rare) is in part what explains why Opus is using a rate that's much higher than its usual rate.

Sorry for my misleading reply. Of the 20 samples I used, only finalfantasy is a harpsichord sample. I can understand why critical samples like finalfantasy, FloorEssence, or VelvetRealm take so much space.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-22 01:45:34
By 'hard samples' I meant samples whose real bitrate exceeds the value of the --bitrate option. In the real world, if we were to randomly choose 100 samples, it is likely that roughly 50 would fall below the --bitrate value and 50 above. Yes, 1:1.


Well, I tend to define "hard samples" as those for which the real bitrate is much higher than the target. But even by your definition, there should be fewer than 50% hard samples. You can think of the Opus VBR (simplified explanation) as reducing the rate of all samples by about 10%, and then looking for things that require an increase. You can see that in your results: with --bitrate 66, the lowest real rate is 60 while the highest is 101, so the distribution is highly asymmetric. What really matters is not exactly how many are above the target, but how many are in the "long tail" of outliers.

As for "better compression vs better quality", those who want to calibrate the x-axis for the better representation of the world should also calibrate the y-axis. The calibration of the y-axis involves listening tests including many easy samples, and it's understandable why people don't want to do that, but on the average bitrate vs average score graph, calibrating only one axis would break the beautiful point of the graph, when the graph includes evaluation of CBR codecs like CELT, and make it less fair.


Sorry, I don't understand what you mean by y-axis calibration.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-22 06:33:35
As for "better compression vs better quality", those who want to calibrate the x-axis for the better representation of the world should also calibrate the y-axis. The calibration of the y-axis involves listening tests including many easy samples, and it's understandable why people don't want to do that, but on the average bitrate vs average score graph, calibrating only one axis would break the beautiful point of the graph, when the graph includes evaluation of CBR codecs like CELT, and make it less fair.

Sorry, I don't understand what you mean by y-axis calibration.

Sorry for my bad explanation.


There are some people who complain that my sample set focuses on hard samples and doesn't contain enough easy ones. Some of them suggest that I should encode a large set of normal music and use the average bitrate of that large collection as the x-axis of the average-bitrate vs average-quality graph. If you are not among 'some people', you can stop reading now. The reason those people advocate using large collections is that such collections contain more easy samples and are more natural. They want to see more easy samples thrown in, and to see what happens to the bitrate. What they fail to notice is that if I were to throw in more easy samples, not only the bitrate but also the quality would inevitably change. On VBR this supposedly doesn't happen, but often it does. Because they don't notice that, they adjust the x-axis but never touch the y-axis, moving the plot only horizontally. But as more easy samples are thrown in, the bitrate on the x-axis decreases while the quality on the y-axis naturally goes up. So the horizontal-only calibration that some people advocate is unnatural.


A y-axis calibration probably doesn't exist, other than the lengthy process of running listening tests on a large set of easy samples.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-22 17:10:17
There are some people who complain that my sample set focuses on hard samples and doesn't contain enough easy ones. Some of them suggest that I should encode a large set of normal music and use the average bitrate of that large collection as the x-axis of the average-bitrate vs average-quality graph. If you are not among 'some people', you can stop reading now.


Well, I'm among those who think that adjusting the x-axis that way isn't ideal, but that it's still better than just using the average rate of the test samples as the x-axis. It avoids having to include a large number of easy samples in the test.

The reason those people advocate using large collections is that such collections contain more easy samples and are more natural. They want to see more easy samples thrown in, and to see what happens to the bitrate. What they fail to notice is that if I were to throw in more easy samples, not only the bitrate but also the quality would inevitably change. On VBR this supposedly doesn't happen, but often it does. Because they don't notice that, they adjust the x-axis but never touch the y-axis, moving the plot only horizontally. But as more easy samples are thrown in, the bitrate on the x-axis decreases while the quality on the y-axis naturally goes up. So the horizontal-only calibration that some people advocate is unnatural.


A y-axis calibration probably doesn't exist, other than the lengthy process of running listening tests on a large set of easy samples.


I also don't know how you would do "y-axis calibration". The best I can think of here is to divide the test samples into two subsets: a hard subset like you currently have, and an "average" subset that represents the normal samples you're likely to encounter in practice. You'd still use an even larger database for the x-axis, but now you'd give two quality values.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-22 18:03:54
I think the objectives in tests (experiments) matter. And the questions they aim to answer should determine the primary graphs produced. Scatter plots with actual bitrates in test samples are often not important to the question being asked but may be interesting secondary information.

I suspect for a lot of consumers, bitrate really amounts to a proxy for "How much music can I store on my device?" or "How much space can I save in my music to take photos on my smartphone?".

Instantaneous bitrate or even bitrate over a whole song isn't that important to them.

They may also ask: "What's the most music I can store on my device at a reasonably good quality?"

The meaning of reasonably good quality will vary.

For some, perceptible but not annoying differences may be tolerable much of the time, with just once or twice every few hours something mildly annoying and unmusical being noticed.
For others, entirely transparent most of the time, but once or twice every few hours, some difference that's perceptible but not annoying being noticed is their quality lower limit.

These are just two examples from the range of possible requirements.

For example, suppose we test 160 variants (e.g. 20 samples over 2 bitrates of 4 encoders, or 10 samples over 4 bitrates of 4 encoders). Two different questions could be asked:

Question 1:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), which encoder offers the best quality?

(This is the question your tests are roughly set up to answer)

Question 2:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), what typical quality can we expect from each encoder (ignoring problem samples unless they account for 5% or more of general music).

(This is the question 'some people' seem to want answered)

Each question might warrant a different test:

Question 1 might warrant a large number of problem samples of various classes to make rare annoyances easy to detect and penalize the encoder that deals with problem samples worse. This might be best for people who wish to avoid the occasional annoying and unmusical artifacts.

Question 2 might warrant a collection of normal samples, not known to be problematic, to get a representative idea of typical quality, giving people an idea of which bitrates might suit them. This might be best for people who are more forgiving of occasional artifacts, but it would start to penalize any encoder that exposes artifacts frequently rather than rarely. It might also be sensible to include the same 'typical' samples over a range of a few bitrates and limit the number of samples in the test corpus (e.g. make it 10 samples but test over 4 bitrates per encoder) to match this aim.

There are reasons of academic interest, of limited practical use to the generic listener, where you'd really want to see a scatter plot: to know how variable VBR is in each encoder, or to compare quality against actual bitrate to get an idea of the bitrate-efficiency of the coding tools available in a particular format. Even then, the length of the sample may have a big influence on the reported bitrate: e.g. a 30-second clip of otherwise ordinary sound containing a 2-second period that requires lots of high-bitrate short blocks will exhibit a lower average bitrate than a 5-second clip containing the same 2 seconds, even though the instantaneous bitrate of the difficult part is the same. Then again, there are some samples like fatboy where the problem occurs almost throughout the song (Kalifornia by Fatboy Slim).
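
A quick back-of-the-envelope calculation illustrates that clip-length effect (the 200 and 90 kbps figures are purely illustrative, not measured):
Code: [Select]
# Average bitrate of a clip containing one hard passage of fixed cost.
def clip_average_kbps(hard_sec, hard_kbps, easy_kbps, clip_sec):
    return (hard_sec * hard_kbps + (clip_sec - hard_sec) * easy_kbps) / clip_sec

print(clip_average_kbps(2, 200, 90, 30))  # ~97 kbps over a 30-second clip
print(clip_average_kbps(2, 200, 90, 5))   # 134 kbps over a 5-second clip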

In a sense it ought to be made clear to the average listener that the average bitrate line, not the scattered bitrate is what's important for the question of "How much music can I fit on my device" or "How much space will be left on my phone's microSD card for taking photos and videos?" and the y-axis height is the important point regarding quality, where the average quality may have some importance, the spread of quality values might also be important (less spread usually being good) and the lowest quality reported also having some importance (if you'll be rather unforgiving of a really nasty artifact, for example - in my case it's 'birdies' and 'warbling' in bad MP3 encoders that drive me crazy).

I think Question 1 is where you're leaning when saying that changing the type of sample used changes the quality (easy sample - higher quality) even for VBR.
I think Question 2 is the sort of listening test 'some people' have asked you for.

Where many of us at Hydrogen Audio differ from you is in our belief that average bitrate over a collection of CDs is the important bitrate calibration even for Question 1.

When making an experiment to test Question 1, we use hard samples to differentiate encoders more easily and find out which is best. At the same time, we realise that the reported Quality is more representative of rare problem cases only, and is therefore penalised, and is not representative of normal music, so we wouldn't claim to have useful information about general quality that can be expected.

Question 2 is actually quite hard to test, partly because the Quality is often very close to transparent (5.0) especially at around 96-100 kbps on recent encoders with easy samples.

I guess a useful graph for a single codec and mode (e.g. Opus in VBR mode) could be one that showed Quality (y) versus Average Bitrate (x) but plotted as two different lines at each bitrate.
The upper line (blue) would be general music quality (based on 5 to 20 samples of normal music of various genres). This might start rather low at 32kbps, increase at 48 kbps, get pretty high at 64 kbps and reach 5.0 at 96 kbps
The lower line (red) would be 'problem sample quality', where a collection of typical codec-killers is used (fatboy, tomsdiner, eig, etc.). This might start terribly low at 32kbps, still be pretty bad at 48 kbps, get a bit better at 64 kbps and reach something like 4.0 at 96 kbps, for example; if extended to, say, 128 kbps, it might get quite close to 5.0.

The lines could also be fit lines, with quality scatter points above and below them to give an indication of the spread of quality for general samples (blue) and for problem samples (red).

The results of such a test might be relatively informative especially in terms of total space occupied by your music or total music duration per amount of storage space (e.g. expressed in hours per Gigabyte or Gigabytes per 10 hours) rather than focusing on bitrate.

Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-22 21:42:47
Question 1:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), which encoder offers the best quality?

(This is the question your tests are roughly set up to answer)

This is relatively easy to answer. If occasional problem samples matter and you want to avoid ugly artifacts, this test tells you how bad those exceptional moments are.

Question 2:
At the same bitrate over a general representative collection (thus being able to store the same amount of music on my device), what typical quality can we expect from each encoder (ignoring problem samples unless they account for 5% or more of general music).

(This is the question 'some people' seem to want answered)

The problem with using a general representative collection is that I don't know the quality of that collection. I may know in the future, after 3, 4, or 5 months of listening tests, but not now. Don't expect me to know.
But something similar can be done. I removed the problem samples, namely finalfantasy (harpsichord), FloorEssence (techno), VelvetRealm (techno, sharp attacks), and Tom's Diner (female a cappella).
I replotted the bitrate-vs-quality graph on the remaining 16 samples.

(http://i50.tinypic.com/wspvdc.png)

Hope it helps.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-23 15:35:44
Quote
rjamorim: There's some inverse proportionality there
rjamorim: At low bitrates nobody is interested, but the results are easy to obtain
rjamorim: At high bitrates everyone is interested, but you practically can't obtain usable results


The same can be said of hard samples versus easy samples.

I uploaded all the samples in the Upload forum.
http://www.hydrogenaudio.org/forums/index.php?showtopic=98003
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-23 20:51:25
I measured the average bitrate over a wide-ranging collection of 15 normal songs (63 min).

opusenc --bitrate 66 input.wav output.opus
65.9kbps

celtenc input.48k.raw --bitrate 75 --comp 10 output.oga
73.4kbps

qaac --cvbr 72 -o output.m4a input.wav
75.2kbps

qaac --tvbr 27 -o output.m4a input.wav
66.0kbps

opusenc --bitrate 90 input.wav output.opus
88.9kbps

celtenc input.48k.raw --bitrate 100 --comp 10 output.oga
98.9kbps

qaac --cvbr 96 -o output.m4a input.wav
100.0kbps

qaac --tvbr 45 -o output.m4a input.wav
91.9kbps
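
For comparison with the per-sample averages earlier in the thread, a collection-wide figure like those above would typically be computed from total size over total duration; a minimal sketch, with placeholder file names (this assumes that method, which the post does not spell out):
Code: [Select]
# Collection-wide average bitrate: total bits over total duration.
import os

def collection_kbps(paths, durations_sec):
    total_bits = sum(os.path.getsize(p) * 8 for p in paths)
    return total_bits / sum(durations_sec) / 1000

# e.g. 15 encoded songs totalling 63 minutes:
# print(collection_kbps(["song01.opus", ...], [251.3, ...]))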

The 15 songs are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate-vs-quality graph:

Imagine you make an x=height vs y=bodyweight scatter graph of many people, and you plot Alice's height as x and Bob's weight as y: that's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2012-11-24 18:49:51
The 15 songs are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate-vs-quality graph:

Imagine you make an x=height vs y=bodyweight scatter graph of many people, and you plot Alice's height as x and Bob's weight as y: that's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.


The method certainly has drawbacks, but it's not as silly as you might think. Here's another way to reason about it. I have a music player with a fixed capacity (e.g. 8 GB) and I need to fit my entire music collection on it. I compute the average rate I can afford and use that to encode all songs. Now imagine I wanted to figure out which codec to use. I can apply that process with many different codecs, and then compare the codecs to see which one provides the best music quality. Ideally, I'd listen to each song, but that would take way too long. The alternative is to pick only a relatively small sample. I *could* pick my samples at random, but in practice I probably want to bias my selection towards harder samples (because it's not OK to have 90% good files with 10% awful ones), while still having "normal" samples too. So I do the listening test on those samples and pick the best codec. When I do that, I have used the quality of a few samples as the y-axis, with the rate over a large sample as the x-axis. And it makes sense to do so.
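
As a concrete sketch of that reasoning, the affordable average rate is simply capacity divided by collection length (the 8 GB player and 180-hour collection are illustrative assumptions):
Code: [Select]
# Average bitrate one can afford for a fixed-capacity player.
def affordable_kbps(capacity_gb, collection_hours):
    bits = capacity_gb * 8e9                        # GB -> bits (1 GB = 10^9 bytes)
    return bits / (collection_hours * 3600) / 1000  # -> kbps

print(affordable_kbps(8, 180))  # ~98.8 kbps for 180 h of music on an 8 GB player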
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-25 14:34:13
The 15 songs are, of course, different from the 20 samples I used in this listening test.
I understand the importance of calibration, but I'm still reluctant to use these values as the x-axis of the bitrate-vs-quality graph:

Imagine you make an x=height vs y=bodyweight scatter graph of many people, and you plot Alice's height as x and Bob's weight as y: that's a chimera.
When I plot Charlie, I use Charlie's height and Charlie's weight. When I plot Dave, I use Dave's height and Dave's weight.


The method certainly has drawbacks, but it's not as silly as you might think. Here's another way to reason about it. I have a music player with a fixed capacity (e.g. 8 GB) and I need to fit my entire music collection on it. I compute the average rate I can afford and use that to encode all songs. Now imagine I wanted to figure out which codec to use. I can apply that process with many different codecs, and then compare the codecs to see which one provides the best music quality. Ideally, I'd listen to each song, but that would take way too long. The alternative is to pick only a relatively small sample. I *could* pick my samples at random, but in practice I probably want to bias my selection towards harder samples (because it's not OK to have 90% good files with 10% awful ones), while still having "normal" samples too. So I do the listening test on those samples and pick the best codec. When I do that, I have used the quality of a few samples as the y-axis, with the rate over a large sample as the x-axis. And it makes sense to do so.


One of the big objectives of this test is to compare Opus with Celt.
Does VBR-enabled Opus offer even better efficiency than Celt? Or did the developers do something silly, so that efficiency isn't improved at all?

Let's assume the latter. Suppose Celt and Opus are an ideal CBR encoder and an ideal VBR encoder with exactly the same performance.
The ideal CBR encoder never changes its bitrate, so harder songs occupy a lower position on a bitrate-vs-quality plot.
The ideal VBR encoder never changes its quality, so harder songs occupy a position further to the right on the plot.
There is a normal music collection of 21 songs, each 4 minutes long. Assume we have a 63MB storage device (that is, an average budget of 100 kbps), and we test which encoder offers better quality.
However, ABC/HRing the entire music collection is painstaking, so we omit the 8 easiest songs and test only the harder 13.
We then know the quality of only the 13 harder songs, but we know all 21 bitrates, so we can use either the 13-song or the 21-song average bitrate.

Which bitrate should we use?
(http://i47.tinypic.com/2e4ewb7.png)

Remember, these two encoders have exactly the same performance.
Compare the CBR consensus plot (red) with my plot (blue): they imply roughly the same performance.
Then compare the red plot with the dark blue plot: the same implication is not there, and the graph suggests that VBR is superior.
(Some may say it is superior because of its smaller quality spread; but this is an ideal encoder and actual behavior may vary. Look at the error bars of the actual Opus!)

I think the emphasis on bitrate calibration at HydrogenAudio is well deserved, and I respect it. From now on, if I reproduce this kind of test, I'll
calibrate the bitrate on a larger collection so that the bitrates are roughly the same. That's what people really want to know.

I'm going to use the bitrate of the test-sample set itself only when the problem explained above can actually occur: an overemphasis on hard samples combined with CBR codecs.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-25 20:35:30
My post #34 might be too difficult. I wish I had better words to explain it.

Let me review what this test really was. I used slightly-easy to hard samples and took the averages of the 20 bitrates and 20 quality scores.
There was no big measured difference in either bitrate or quality between Opus and Celt (green circle and orange circle).
(http://i50.tinypic.com/10xvsc0.png)
However, this test overemphasizes hard samples, and what people really want to see are the true Opus score and the true Celt score (blue and red circles).
The true bitrate is relatively easy to obtain, but the true quality requires additional effort. So some may want to adjust only the bitrate, the cheaper option,
which brings us closer to real usage. The likely consequence is that we get somewhere close to the true Opus score (blue circle) and the measured Celt score (orange circle).
What we want is the blue and red circles, and what we get is the blue and orange circles. At least one score is what we want: an improvement from zero matches to one match.
And we reach the conclusion: Opus is better.

But hey, is that a fair comparison? It ignores how nicely Celt encodes easy samples, and between the blue and red true scores, I don't think there would be a big difference.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: IgorC on 2012-11-26 02:08:34
Interesting.
Opus's scores deviate less from their average on particular items than CELT's, which means Opus has more constant quality.
That is something hard to see when looking only at average scores.

At the moment I have some thoughts about it.
Imagine we have two encoders, both producing the same average score, but one producing larger variation in results: for example, encoder A at 4.5+/-0.5 and encoder B at 4.5+/-0.25.

For an experienced listener, both have the same quality.
But the situation is different for another listener who can no longer spot differences where the experienced listener would give a score >4.7. In that case, all samples of encoder B with a score >4.7 will be ranked as transparent (5.0). So for the less experienced
listener (or a listener with inferior hardware), encoder B will have a higher average score (and quality as well).
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Dynamic on 2012-11-26 20:57:27
Once again, Kamedo2, I applaud you for your testing skills and your willingness to adapt your method and present your data to answer different questions.

As I understand it, post 34 and post 35 are both hypothetical data, not real data, used to illustrate a point on different ways of interpreting the same data. Am I correct?
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-26 22:56:26
Opus's scores deviate less from their average on particular items than CELT's, which means Opus has more constant quality.


Sorry, I couldn't understand what you meant by 'particular items'. Yes, it's plausible that Opus has slightly less quality deviation than CELT: 0.163 < 0.173 at 75k, 0.126 < 0.135 at 100k.
(Although it's not statistically significant, and I think the difference is almost negligible.)
The second and third paragraphs are about nonlinearity, I guess. If an encoder encodes the first half of a track at 4.0 (perceptible but not annoying) and the last half at 2.0 (annoying), I'll rate it 2.5 or so, not 3.0.
If another encoder does the first half at 3.6 and the last half at 3.4, my rating is 3.5. I have some thoughts about nonlinear averaging too.
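
One simple nonlinear average consistent with those two examples penalizes the spread between the halves. This is only an illustrative fit to the two ratings above, not a formula Kamedo2 stated:
Code: [Select]
# Worst-biased mean: plain mean minus a penalty on the quality spread.
# k = 0.25 is fitted to the two examples above, purely for illustration.
def worst_biased_mean(a, b, k=0.25):
    return (a + b) / 2 - k * abs(a - b)

print(worst_biased_mean(4.0, 2.0))  # 2.5, not the plain mean of 3.0
print(worst_biased_mean(3.6, 3.4))  # 3.45, close to the plain mean of 3.5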
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2012-11-26 23:14:36
As I understand it, post 34 and post 35 are both hypothetical data, not real data, used to illustrate a point on different ways of interpreting the same data. Am I correct?


Thank you. Of course, because I don't know the scores of samples that were never tested, posts 34-35 are hypothetical data based on how an ideal encoder is supposed to work.
And although I strive to give the same quality the same rating in every case, my ratings are not as accurate as posts 34-35 imply.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: jmvalin on 2013-01-03 00:09:52
Kamedo2, can you give 1.1-alpha a try? It includes post-exp_analysis changes that should address some issues exposed in your test. For example, samples 2 and 5 should have a lower rate, while samples 7 and 19 should have a higher rate.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: DonP on 2013-01-03 01:57:41
At the moment I have some thoughts about it.
Imagine we have two encoders, both producing the same average score, but one producing larger variation in results: for example, encoder A at 4.5+/-0.5 and encoder B at 4.5+/-0.25.
  ....
For an experienced listener, both have the same quality.
But the situation is different for another listener who can no longer spot differences where the experienced listener would give a score >4.7. In that case, all samples of encoder B with a score >4.7 will be ranked as transparent (5.0). So for the less experienced
listener (or a listener with inferior hardware), encoder B will have a higher average score (and quality as well).


IMO, if the encoder's VBR is quality-based, the one with more consistent quality deserves more credit anyway.
Title: Personal Listening Test of Opus, Celt, AAC at 75-100kbps
Post by: Kamedo2 on 2013-01-05 10:50:31
Kamedo2, can you give 1.1-alpha a try? It includes post-exp_analysis changes that should address some issues exposed in your test. For example, samples 2 and 5 should have a lower rate, while samples 7 and 19 should have a higher rate.

I'm going to test MP3 224kbps ABC/HR first(very long test), so I'll have no time to test Opus. Sorry.