HydrogenAudio

Hydrogenaudio Forum => Validated News => Topic started by: IgorC on 2011-04-12 00:40:04

Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 00:40:04
The test is finished, results are available here:

http://listening-tests.hydrogenaudio.org/igorc/results.html

Summary: CELT/Opus won, Apple HE-AAC is better than Nero HE-AAC, and Vorbis has caught up with Nero HE-AAC.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 01:02:33
If someone can assist with a bitrate table or per-sample results, that would be nice...
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 01:06:15
Oh, and given that Opus is open source, if one of the developers could give a technical explanation for our audience of which codec features and design decisions enabled it to win this test, that would be pretty damn interesting, too.
Title: Multiformat listening test @ ~64kbps: Results
Post by: AllanP on 2011-04-12 01:14:33
I just wonder one thing: when the Vorbis encoder was tested, how was it lowpassed? Was it tested with the default 14 kHz lowpass?
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 01:15:40
I just wonder one thing: when the Vorbis encoder was tested, how was it lowpassed? Was it tested with the default 14 kHz lowpass?


You can see the exact settings used for each codec here:

http://listening-tests.hydrogenaudio.org/igorc/index.htm

Title: Multiformat listening test @ ~64kbps: Results
Post by: AllanP on 2011-04-12 01:22:17
You can see the exact settings used for each codec here:

http://listening-tests.hydrogenaudio.org/igorc/index.htm


Ah, thanks, sorry, I did not see it.

It says -q 0.1, so I assume it used the default 14 kHz lowpass.
Title: Multiformat listening test @ ~64kbps: Results
Post by: romor on 2011-04-12 03:08:53
Congratulations to CELT/Opus!

I wanted to compare ratings by testers per sample, but it seems that every tester gets a random testing sequence.
Is there any way I can get such data and produce the plot I want? If it's not clear: I want to know the source sample formats (for the 5 rating bins) for each tester.

Thanks

Edit: never mind, I found a way. It seems that the sample name suffixes are the same (those describing the 5 bins at the header of each test result).
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 03:50:38
I think the results of lessthanjoey and AlexB are also anonymous. This will be changed.
If anyone is interested in his/her test, there is a key, or email me and I will send the results.


Oh, I participated in this test too.
Garf had the key for my results and checked them.

It's good to save strong words like "thank you, great job" for times like this. This time I want to say a big thank you to all participants and the people who helped to conduct this test.
Sebastian Mares - for his previous public tests. This test benefited greatly from them.
AlexB - for providing pre-decoded packages and being here.
Especially, Garf.

And many other people who were around here. Your time is valuable and highly appreciated.
Title: Multiformat listening test @ ~64kbps: Results
Post by: googlebot on 2011-04-12 08:06:57
I'm stunned by the CELT/Opus results! I would have assumed that your toolbox is smaller than usual when you are targeting low delay. And now CELT even beats the others by a wide margin.

Thanks for the great work, guys!
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 12:59:02
Thanks guys! Interesting results.


One note though:

Code: [Select]
Read 5 treatments, 531 samples => 10 comparisons
    Means:
          Vorbis   Nero_HE-AAC  Apple_HE-AAC          Opus    AAC-LC@48k
           3.513         3.547         3.817         3.999         1.656


For processing the result .txt files with chunky I organized them into sample folders. I removed the results that were marked "invalid" and results that apparently had a fixed newer version (marked as such). I had a duplicate problem with romor's results (a couple of duplicates in a subfolder), but I decided to keep the newer result files. I got 566 remaining result files. Assuming I did not make lots of mistakes, I wonder what can cause the difference. Did you disqualify more results after creating the rar package, or does "531 samples" mean something other than the total number of result files?

Here's how chunky parses the 566 result files I have:

Code: [Select]
% Result file produced by chunky-0.8.4-beta
% ..\chunky.exe --codec-file=..\codecs.txt -n --ratings=results --warn -p 0.05
%
% Sample Averages:

Vorbis    Nero    Apple    CELT    Anchor
2.56    4.28    4.19    2.67    1.87
2.95    4.20    4.03    2.36    1.68
3.42    3.51    3.98    4.73    2.51
4.12    3.84    4.49    4.64    2.18
4.18    3.59    3.87    4.52    1.95
3.35    3.68    3.34    4.00    1.56
3.86    2.98    2.96    3.50    1.85
4.03    3.78    4.09    4.49    2.02
3.60    3.71    3.89    3.94    1.51
4.28    2.78    2.19    4.12    1.44
4.12    3.93    4.17    4.39    1.70
3.25    3.18    3.20    4.14    1.77
3.83    3.63    3.86    4.56    1.41
3.49    3.81    4.01    4.27    1.37
4.15    3.84    4.08    4.76    2.04
3.97    2.74    3.09    4.38    1.74
3.35    3.24    4.15    4.44    1.56
2.68    2.96    3.63    4.10    1.51
3.58    4.37    4.88    3.73    1.76
3.40    4.10    4.68    4.26    1.61
3.80    3.49    3.55    4.43    1.38
3.81    3.30    4.27    4.26    1.13
3.59    3.14    3.51    4.09    1.18
3.29    3.61    3.88    4.16    1.36
3.66    3.84    4.37    3.86    1.55
2.78    3.99    4.18    2.82    1.57
3.62    3.88    3.92    3.93    1.34
3.39    4.03    4.39    3.96    1.46
3.61    4.12    4.36    4.09    1.54
4.42    3.48    4.29    4.68    1.82

% Codec averages:
% 3.60    3.63    3.92    4.08    1.65
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 13:53:46
I got 566 remaining result files. Assuming I did not make lots of mistakes, I wonder what can cause the difference.

I get the same result as you. It looks like the results submitted on the 10th of April are missing.

Edit: See below.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 14:14:51
For comparison I uploaded a rar package of my "chunky" folder. It contains the reorganized result files and phong's chunky (http://www.phong.org/chunky/) (Windows version). The command line I used is in the instructions.txt file.

I had to partially rename the result files to reorganize them into the sample folders. In addition I needed to change all r.wav strings inside the result files to .wav before chunky could work. I batch processed the files with Notepad++. I believe it was a "safe" edit.

The package is here: http://www.hydrogenaudio.org/forums/index.php?showtopic=88033
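For the record, the r.wav → .wav batch edit described above could equally be done with a few lines of Python. This is a sketch, not what Alex B actually ran; the folder name and the .txt extension are assumptions for illustration:

```python
from pathlib import Path

def fix_result_files(folder):
    """Replace every "r.wav" occurrence with ".wav" inside each result
    file, mirroring the Notepad++ batch edit described above."""
    for path in Path(folder).rglob("*.txt"):
        text = path.read_text()
        path.write_text(text.replace("r.wav", ".wav"))
```

It is a "safe" edit in the same sense: only the literal string r.wav is touched, everywhere it occurs.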
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 14:48:57
Quote
For processing the result .txt files with chunky I organized them into sample folders. I removed the results that were marked "invalid" and results that apparently had a fixed newer version (marked as such). I had a duplicate problem with romor's results (a couple of duplicates in a subfolder), but I decided to keep the newer result files. I got 566 remaining result files. Assuming I did not make lots of mistakes, I wonder what can cause the difference. Did you disqualify more results after creating the rar package, or does "531 samples" mean something other than the total number of result files?


Sounds like you didn't eliminate the listeners with more than 4 invalid results.

The filtering rules on the page are:

*    If the listener ranked the reference worse than 4.5 on a sample, the listener's results for that sample were discarded.
*    If the listener ranked the low anchor at 5.0 on a sample, the listener's results for that sample were discarded.
*    If the listener ranked the reference below 5.0 on more than 4 samples, all of that listener's results were discarded.

You'll have to modify chunky to get that behavior.
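For anyone redoing the triage by hand, the three rules could be sketched roughly like this. The result-file parsing is omitted, and the `results` structure (listener → list of (reference, low-anchor) score pairs) is a made-up shape for illustration, not chunky's actual data model:

```python
def post_screen(results):
    """Apply the published filtering rules.

    results: {listener: [(ref_score, anchor_score), ...]} with scores
    on the 1.0-5.0 scale; codec scores would ride along but are not
    needed for the screening itself.
    """
    kept = {}
    for listener, samples in results.items():
        # Rule 3: discard the listener entirely if the reference was
        # rated below 5.0 on more than 4 samples.
        if sum(1 for ref, anchor in samples if ref < 5.0) > 4:
            continue
        # Rules 1 and 2: per-sample discards.
        kept[listener] = [
            (ref, anchor) for ref, anchor in samples
            if ref >= 4.5       # reference not ranked worse than 4.5
            and anchor < 5.0    # low anchor not ranked at 5.0
        ]
    return kept
```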
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 14:49:25
For comparison I uploaded a rar package of my "chunky" folder. It contains the reorganized result files and phong's chunky (http://www.phong.org/chunky/) (Windows version). The command line I used is in the instructions.txt file.

I had to partially rename the result files to reorganize them into the sample folders. In addition I needed to change all r.wav strings in filenames to .wav before chunky could work. I batch processed the files with Notepad++. I believe it was a "safe" edit.

The package is here: http://www.hydrogenaudio.org/forums/index.php?showtopic=88033


Thanks, I didn't have the triaged results here, so this was welcome. By the way, chunky has quite dangerous behavior: by default, it squashes all listeners together per sample for the overall results. In other words, it's discarding most of the information in the test, as if only a single listener had done all samples! The per-sample results don't suffer from that, so those should be fine.

Edit: Whoops, I indeed missed some results that should have been discarded.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 14:59:02
Sounds like you didn't eliminate the listeners with more than 4 invalid results.

The filtering rules on the page are:

*    If the listener ranked the reference worse than 4.5 on a sample, the listener's results for that sample were discarded.
*    If the listener ranked the low anchor at 5.0 on a sample, the listener's results for that sample were discarded.
*    If the listener ranked the reference below 5.0 on more than 4 samples, all of that listener's results were discarded.

You'll have to modify chunky to get that behavior.


Ah, good point. There were two discarded listeners; I got those. I saw one result with a rated reference that didn't cause an invalidation, so I got that correctly.

But there are a few results with 5.0's for the reference. After discarding those, I'm at 559 samples now.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 15:02:36
Sounds like you didn't eliminate the listeners with more than 4 invalid results.


I removed two folders (= listeners) before doing the tasks I mentioned:

- 09 (too many invalid results. The listener never answered any email)
- 27 (something went wrong, or a cheater)

I trusted the comments in the folder and file names. I did not look inside each and every result file.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 15:09:30
But there are a few results with 5.0's for the reference. After discarding those, I'm at 559 samples now.

Perhaps "low anchor" would be more accurate. 
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 15:17:53
Sounds like you didn't eliminate the listeners with more than 4 invalid results.


I removed two folders (= listeners) before doing the tasks I mentioned:

- 09 (too many invalid results. The listener never answered any email)
- 27 (something went wrong, or a cheater)

I trusted the comments in the folder and file names. I did not look inside each and every result file.


Ah, okay!

(moving and amending from my edited post, since others already replied. Sorry)
The users who should have been excluded according to that rule are 09, 27, and 22, but IgorC decided to keep 22 (because 22 didn't understand the procedure at first but got better later). I also expected 21 to be filtered (because he rated only the low anchor on almost all the samples: 23/30 are either low-anchor-only or invalid, including many of the really obvious ones).
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 15:35:48
But there are a few results with 5.0's for the reference. After discarding those, I'm at 559 samples now.

I found six "low anchor = 5.0" instances (I output a CSV file from chunky and sorted the data by the low anchor column in Excel).

My math says 560. 

(or did you actually remove the "rated but accepted reference" instance?)
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 15:42:27
But there are a few results with 5.0's for the reference. After discarding those, I'm at 559 samples now.

I found six "low anchor = 5.0" instances (I output a CSV file from chunky and sorted the data by the low anchor column in Excel).

My math says 560. 

(or did you actually remove the "rated but accepted reference" instance?)


No. But after running chunky I only had 565, not 566 files. It appears to reject one input file for some reason (this is on Linux).

A lesson here is that the post-screened data set should be published, too, because it's easy to make mistakes there, and it makes things easier for people wanting to do other/further analysis. But considering the comment from NullC, the results on the site are probably correct.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 16:14:33
Regarding the bitrate table,

I guess that CELT/Opus is not supported in any program that can display and/or export accurate bit rate data.

If the bitrate needs to be calculated from the file size, should the size of the Ogg container data be subtracted from the file size before performing the calculation? What would be the correct amount?

Would the bitrate value then be comparable with the values that foobar shows for the other contenders? (It is quite simple to export bitrate data from foobar.)
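For what it's worth, the Ogg container overhead can be measured exactly by walking the stream's pages: per RFC 3533, each page costs a 27-byte header plus one lacing byte per segment, and everything else is codec payload. A pure-Python sketch (no third-party tools; whole-file parsing only, no error recovery):

```python
def ogg_overhead(data):
    """Walk an Ogg stream; return (container_bytes, payload_bytes).

    Every Ogg page is: a 27-byte header starting with "OggS", a segment
    count at offset 26, the segment (lacing) table, then the payload.
    """
    pos = container = payload = 0
    while pos < len(data):
        assert data[pos:pos + 4] == b"OggS", "not at a page boundary"
        nsegs = data[pos + 26]
        body = sum(data[pos + 27:pos + 27 + nsegs])  # lacing values
        container += 27 + nsegs
        payload += body
        pos += 27 + nsegs + body
    return container, payload
```

The codec-only bitrate would then be payload_bytes * 8 / duration; whether to also exclude the Vorbis/Opus header packets (which are part of the payload, not the container) is a separate question.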
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 16:15:40
Sounds like you didn't eliminate the listeners with more than 4 invalid results.


I removed two folders (= listeners) before doing the tasks I mentioned:

- 09 (too many invalid results. The listener never answered any email)
- 27 (something went wrong, or a cheater)

I trusted the comments in the folder and file names. I did not look inside each and every result file.


# 27 are my results. I do not know if something went wrong, but I am definitely not a cheater.
Over a week ago, I sent Igor some wave files he asked for, but he has not answered my email yet.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 17:54:07
# 27 are my results. I do not know if something went wrong, but I am definitely not a cheater.
Over a week ago, I sent Igor some wave files he asked for, but he has not answered my email yet.


I think it's really unfortunate that Igor released a file with the word cheater in it.  There are so many ways for a result to go weird which have nothing to do with "cheating".

Your results can be excluded purely based on the previously published confused-reference criteria (2, 4, 9, 22, 30 invalid), so that should close the question of whether excluding those results was correct, and it should have been left at that. Even with good and careful listeners this can happen, and it's nothing anyone should take too personally.

Though your results are pretty weird: you ranked the reference fairly low (e.g. 3) on a couple of comparisons where many people found the reference and codec indistinguishable. I think you also failed to reverse your preference on some samples where the other listeners changed their preference (behavior characteristic of a non-blind test?).

I don't mean to cause offense, but were you listening via speakers or could you have far less HF sensitivity than most of the other listeners (if you are male and older than most participants then the answer to that might be yes)?  Any other ideas why your results might be very different overall and also on specific samples?
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 18:10:25
Regarding the bitrate table,
I guess that CELT/Opus is not supported in any program that can display and/or export accurate bit rate data.
If the bitrate needs to be calculated from the file size should the size of the ogg container data be reduced from the file size before performing the calculation? What would be the correct amount?
Would the bitrate value then be comparable with the values that foobar shows for the other contenders? (It is quite simple to export bitrate data from foobar.)


If you wish to remove container overhead for the Vorbis and Opus files you can use a tool like ogg-dump from oggztools to extract all the packet sizes.

On a few samples Vorbis suffers a bit because the Vorbis headers are fairly large compared to an 8-second 64 kbit/s file (e.g. Sample01), but I don't think the container overhead is all that significant.
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 18:13:06
Yes, I was too strict. Sorry about it.

Some of the listeners preferred Nero over Vorbis or vice versa. Some of them rated Vorbis higher than the HE-AAC codecs.
Others preferred Apple HE-AAC over CELT on the second half of the samples. These variations are all fine.
Finally, on average Opus/CELT was better for all listeners with enough results.
It was very strange that you ranked Opus as low as the low anchor (like sample 10 and many others) where ALL other listeners scored it very well!
Your average scores (including the 5 invalid samples):
Vorbis - 3.53
Nero - 3.15
Apple - 3.51
CELT - 2.34


Maybe your hardware has some issues.

Earlier I also wrote to you to re-run the whole test, because there were 5 invalid results and the whole test was discarded.
Title: Multiformat listening test @ ~64kbps: Results
Post by: markanini on 2011-04-12 18:52:51
I figured ratings would vary between testers depending on which of pre-echo, lowpass, ringing, warble and grittiness is more objectionable. Furthermore, on the Bohemian Rhapsody sample, source warbling had me very confused for a while.
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 19:42:45
# 27 are my results. I do not know if something went wrong, but I am definitely not a cheater.
Over a week ago, I sent Igor some wave files he asked for, but he has not answered my email yet.


I think it's really unfortunate that Igor released a file with the word cheater in it.  There are so many ways for a result to go weird which have nothing to do with "cheating".

Your results can be excluded purely based on the previously published confused-reference criteria (2, 4, 9, 22, 30 invalid), so that should close the question of whether excluding those results was correct, and it should have been left at that. Even with good and careful listeners this can happen, and it's nothing anyone should take too personally.

Though your results are pretty weird: you ranked the reference fairly low (e.g. 3) on a couple of comparisons where many people found the reference and codec indistinguishable. I think you also failed to reverse your preference on some samples where the other listeners changed their preference (behavior characteristic of a non-blind test?).

I don't mean to cause offense, but were you listening via speakers or could you have far less HF sensitivity than most of the other listeners (if you are male and older than most participants then the answer to that might be yes)?  Any other ideas why your results might be very different overall and also on specific samples?


This was the first test of this kind I have done. I quickly realized that I do not hear much difference with my speakers, so I tested the samples with a pair of good earphones. I "think" I can hear differences in HF quite well. BTW, I am male and 26 years old.
Yes, OK, there might be a special case: I can hear high frequencies better with my left ear, where I have a slight tinnitus (resulting from loud fireworks).
Title: Multiformat listening test @ ~64kbps: Results
Post by: lessthanjoey on 2011-04-12 19:51:58
I've done some more testing with headphones after this was finished and also realized that my speakers were limiting my initial impressions. I can pick up differences significantly more easily through headphones than speakers. I guess next time I'll have a more valuable contribution!
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 20:09:04
Yes, I was too strict. Sorry about it.

Some of the listeners preferred Nero over Vorbis or vice versa. Some of them rated Vorbis higher than the HE-AAC codecs.
Others preferred Apple HE-AAC over CELT on the second half of the samples. These variations are all fine.
Finally, on average Opus/CELT was better for all listeners with enough results.
It was very strange that you ranked Opus as low as the low anchor (like sample 10 and many others) where ALL other listeners scored it very well!
Your average scores (including the 5 invalid samples):
Vorbis - 3.53
Nero - 3.15
Apple - 3.51
CELT - 2.34


Maybe your hardware has some issues.

Earlier I also wrote to you to re-run the whole test, because there were 5 invalid results and the whole test was discarded.


Hi Igor,
on sample 10 I voted this way, because I found this part of the sample SUPER annoying:

http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_4_celt_cut.wav
http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_org_cut.wav

From this point on, the "glitch" gets less annoying but stays through to the end of the sample.
Maybe it is only this annoying to me, or is it a decoding error? Can you please check this?

Thanks,
Christoph
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-12 20:11:07
Thanks for organizing the tests, guys! Sorry for being picky, but I'm not convinced about the analysis. To ease my mind, it would be great if you could comment on the following.

  • Please provide the number of valid results (i.e. listeners) per sample (excluding "27", see below).
  • How did you compute the overall average score of a codec and its confidence intervals? Taking the mean of all listeners' results? That would mean a sample with more listeners (i.e. probably sample01) has a greater influence than the last few samples (which still needed listeners shortly before the end of the test). This is probably not a good approach; weighting each sample equally in the overall score seems to be the way to go for me (but it probably doesn't make a difference here, but still...).
  • Nothing personal, but if a listener like "27" consistently scores in opposite direction as the average (as shown by Igor), a thorough post-screening analysis (like Spearman rank correlation < some value) would - and has to - exclude such results.
Edit: Christoph, why are the samples you uploaded at 96 kHz? Did you do the test that way?

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 20:21:37
motion_blur,

You can download the results of all listeners and compare them with yours: http://listening-tests.hydrogenaudio.org/igorc/miscellaneous/results.zip
They are too different.

Also, why do you post samples with a sample rate of 96 kHz?



Hi Chris,
I also had a hard time understanding the bootstrap analysis.
Please wait for a detailed answer on it.

As for Christoph's results, all of them were excluded. http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88023&view=findpost&p=751768
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 20:30:48
Here is the raw data for a bitrate table. The bitrates are calculated from the physical file sizes and exact durations of the lossless reference files. The container overhead is not taken into account, but the situation is the same for every contender. I can create the finished table if no one else volunteers, but perhaps not today. I have already spent too much time with this.

Code: [Select]
		bytes	duration	kbps
FOLDER .\Sample01\
FILE sample01.flac 742,802 8.029 740.12
FILE sample01_1.ogg 74,594 8.029 74.32
FILE sample01_2.m4a 62,553 8.029 62.33
FILE sample01_3.m4a 68,891 8.029 68.64
FILE sample01_4.oga 68,270 8.029 68.02
FILE sample01_5.m4a 54,640 8.029 54.44
FOLDER .\Sample02\
FILE sample02.flac 2,834,017 25.000 906.89
FILE sample02_1.ogg 232,073 25.000 74.26
FILE sample02_2.m4a 192,460 25.000 61.59
FILE sample02_3.m4a 211,283 25.000 67.61
FILE sample02_4.oga 210,511 25.000 67.36
FILE sample02_5.m4a 159,226 25.000 50.95
FOLDER .\Sample03\
FILE sample03.flac 960,531 16.717 459.67
FILE sample03_1.ogg 154,038 16.717 73.72
FILE sample03_2.m4a 103,701 16.717 49.63
FILE sample03_3.m4a 142,545 16.717 68.22
FILE sample03_4.oga 143,250 16.717 68.55
FILE sample03_5.m4a 108,151 16.717 51.76
FOLDER .\Sample04\
FILE sample04.flac 1,880,667 19.858 757.65
FILE sample04_1.ogg 162,906 19.858 65.63
FILE sample04_2.m4a 147,510 19.858 59.43
FILE sample04_3.m4a 171,758 19.858 69.19
FILE sample04_4.oga 170,527 19.858 68.70
FILE sample04_5.m4a 126,836 19.858 51.10
FOLDER .\Sample05\
FILE sample05.flac 2,405,162 29.323 656.18
FILE sample05_1.ogg 267,027 29.323 72.85
FILE sample05_2.m4a 258,347 29.323 70.48
FILE sample05_3.m4a 250,533 29.323 68.35
FILE sample05_4.oga 257,966 29.323 70.38
FILE sample05_5.m4a 185,075 29.323 50.49
FOLDER .\Sample06\
FILE sample06.flac 1,936,163 17.468 886.72
FILE sample06_1.ogg 128,628 17.468 58.91
FILE sample06_2.m4a 143,713 17.468 65.82
FILE sample06_3.m4a 152,934 17.468 70.04
FILE sample06_4.oga 148,598 17.468 68.05
FILE sample06_5.m4a 112,631 17.468 51.58
FOLDER .\Sample07\
FILE sample07.flac 1,725,279 25.838 534.18
FILE sample07_1.ogg 280,547 25.838 86.86
FILE sample07_2.m4a 196,327 25.838 60.79
FILE sample07_3.m4a 231,898 25.838 71.80
FILE sample07_4.oga 223,721 25.838 69.27
FILE sample07_5.m4a 163,560 25.838 50.64
FOLDER .\Sample08\
FILE sample08.flac 1,732,476 20.455 677.58
FILE sample08_1.ogg 159,867 20.455 62.52
FILE sample08_2.m4a 165,652 20.455 64.79
FILE sample08_3.m4a 172,542 20.455 67.48
FILE sample08_4.oga 171,391 20.455 67.03
FILE sample08_5.m4a 131,021 20.455 51.24
FOLDER .\Sample09\
FILE sample09.flac 3,588,564 27.481 1044.67
FILE sample09_1.ogg 281,690 27.481 82.00
FILE sample09_2.m4a 235,189 27.481 68.47
FILE sample09_3.m4a 250,493 27.481 72.92
FILE sample09_4.oga 236,652 27.481 68.89
FILE sample09_5.m4a 174,125 27.481 50.69
FOLDER .\Sample10\
FILE sample10.flac 3,176,903 29.207 870.18
FILE sample10_1.ogg 413,776 29.207 113.34
FILE sample10_2.m4a 255,898 29.207 70.09
FILE sample10_3.m4a 267,479 29.207 73.26
FILE sample10_4.oga 242,965 29.207 66.55
FILE sample10_5.m4a 184,898 29.207 50.64
FOLDER .\Sample11\
FILE sample11.flac 2,034,667 20.017 813.18
FILE sample11_1.ogg 183,494 20.017 73.34
FILE sample11_2.m4a 173,358 20.017 69.28
FILE sample11_3.m4a 181,262 20.017 72.44
FILE sample11_4.oga 173,385 20.017 69.30
FILE sample11_5.m4a 128,182 20.017 51.23
FOLDER .\Sample12\
FILE sample12.flac 1,369,056 15.001 730.11
FILE sample12_1.ogg 175,658 15.001 93.68
FILE sample12_2.m4a 145,147 15.001 77.41
FILE sample12_3.m4a 131,690 15.001 70.23
FILE sample12_4.oga 131,032 15.001 69.88
FILE sample12_5.m4a 97,925 15.001 52.22
FOLDER .\Sample13\
FILE sample13.flac 3,199,288 30.002 853.09
FILE sample13_1.ogg 267,568 30.002 71.35
FILE sample13_2.m4a 266,484 30.002 71.06
FILE sample13_3.m4a 268,730 30.002 71.66
FILE sample13_4.oga 253,476 30.002 67.59
FILE sample13_5.m4a 189,903 30.002 50.64
FOLDER .\Sample14\
FILE sample14.flac 3,244,477 24.494 1059.68
FILE sample14_1.ogg 236,053 24.494 77.10
FILE sample14_2.m4a 214,877 24.494 70.18
FILE sample14_3.m4a 209,514 24.494 68.43
FILE sample14_4.oga 207,971 24.494 67.93
FILE sample14_5.m4a 156,055 24.494 50.97
FOLDER .\Sample15\
FILE sample15.flac 2,332,219 29.543 631.55
FILE sample15_1.ogg 269,799 29.543 73.06
FILE sample15_2.m4a 217,455 29.543 58.89
FILE sample15_3.m4a 256,557 29.543 69.47
FILE sample15_4.oga 260,016 29.543 70.41
FILE sample15_5.m4a 186,255 29.543 50.44
FOLDER .\Sample16\
FILE sample16.flac 631,914 6.634 762.03
FILE sample16_1.ogg 71,240 6.634 85.91
FILE sample16_2.m4a 58,878 6.634 71.00
FILE sample16_3.m4a 57,764 6.634 69.66
FILE sample16_4.oga 56,862 6.634 68.57
FILE sample16_5.m4a 45,967 6.634 55.43
FOLDER .\Sample17\
FILE sample17.flac 1,794,257 15.472 927.74
FILE sample17_1.ogg 136,374 15.472 70.51
FILE sample17_2.m4a 126,772 15.472 65.55
FILE sample17_3.m4a 138,673 15.472 71.70
FILE sample17_4.oga 131,054 15.472 67.76
FILE sample17_5.m4a 100,027 15.472 51.72
FOLDER .\Sample18\
FILE sample18.flac 2,403,680 20.155 954.08
FILE sample18_1.ogg 164,209 20.155 65.18
FILE sample18_2.m4a 172,550 20.155 68.49
FILE sample18_3.m4a 180,669 20.155 71.71
FILE sample18_4.oga 173,027 20.155 68.68
FILE sample18_5.m4a 128,988 20.155 51.20
FOLDER .\Sample19\
FILE sample19.flac 2,473,098 25.271 782.90
FILE sample19_1.ogg 188,316 25.271 59.61
FILE sample19_2.m4a 203,905 25.271 64.55
FILE sample19_3.m4a 213,815 25.271 67.69
FILE sample19_4.oga 211,536 25.271 66.97
FILE sample19_5.m4a 159,900 25.271 50.62
FOLDER .\Sample20\
FILE sample20.flac 2,208,744 19.887 888.52
FILE sample20_1.ogg 137,666 19.887 55.38
FILE sample20_2.m4a 162,528 19.887 65.38
FILE sample20_3.m4a 171,667 19.887 69.06
FILE sample20_4.oga 167,556 19.887 67.40
FILE sample20_5.m4a 127,993 19.887 51.49
FOLDER .\Sample21\
FILE sample21.flac 2,401,753 19.908 965.14
FILE sample21_1.ogg 179,423 19.908 72.10
FILE sample21_2.m4a 180,686 19.908 72.61
FILE sample21_3.m4a 182,050 19.908 73.16
FILE sample21_4.oga 167,027 19.908 67.12
FILE sample21_5.m4a 127,049 19.908 51.05
FOLDER .\Sample22\
FILE sample22.flac 2,831,537 22.143 1023.00
FILE sample22_1.ogg 200,308 22.143 72.37
FILE sample22_2.m4a 200,216 22.143 72.34
FILE sample22_3.m4a 188,506 22.143 68.10
FILE sample22_4.oga 188,741 22.143 68.19
FILE sample22_5.m4a 140,889 22.143 50.90
FOLDER .\Sample23\
FILE sample23.flac 1,216,626 11.686 832.88
FILE sample23_1.ogg 121,623 11.686 83.26
FILE sample23_2.m4a 102,927 11.686 70.46
FILE sample23_3.m4a 119,684 11.686 81.93
FILE sample23_4.oga 106,219 11.686 72.72
FILE sample23_5.m4a 77,692 11.686 53.19
FOLDER .\Sample24\
FILE sample24.flac 1,870,069 17.025 878.74
FILE sample24_1.ogg 134,142 17.025 63.03
FILE sample24_2.m4a 135,416 17.025 63.63
FILE sample24_3.m4a 153,654 17.025 72.20
FILE sample24_4.oga 147,069 17.025 69.11
FILE sample24_5.m4a 110,437 17.025 51.89
FOLDER .\Sample25\
FILE sample25.flac 2,734,360 28.727 761.47
FILE sample25_1.ogg 281,634 28.727 78.43
FILE sample25_2.m4a 242,678 28.727 67.58
FILE sample25_3.m4a 252,085 28.727 70.20
FILE sample25_4.oga 243,928 28.727 67.93
FILE sample25_5.m4a 182,075 28.727 50.70
FOLDER .\Sample26\
FILE sample26.flac 2,599,998 22.092 941.52
FILE sample26_1.ogg 223,182 22.092 80.82
FILE sample26_2.m4a 180,466 22.092 65.35
FILE sample26_3.m4a 191,940 22.092 69.51
FILE sample26_4.oga 185,322 22.092 67.11
FILE sample26_5.m4a 141,355 22.092 51.19
FOLDER .\Sample27\
FILE sample27.flac 2,574,403 21.612 952.95
FILE sample27_1.ogg 200,562 21.612 74.24
FILE sample27_2.m4a 187,622 21.612 69.45
FILE sample27_3.m4a 193,290 21.612 71.55
FILE sample27_4.oga 178,160 21.612 65.95
FILE sample27_5.m4a 137,567 21.612 50.92
FOLDER .\Sample28\
FILE sample28.flac 1,739,752 19.144 727.02
FILE sample28_1.ogg 159,467 19.144 66.64
FILE sample28_2.m4a 162,526 19.144 67.92
FILE sample28_3.m4a 176,282 19.144 73.67
FILE sample28_4.oga 163,339 19.144 68.26
FILE sample28_5.m4a 123,291 19.144 51.52
FOLDER .\Sample29\
FILE sample29.flac 2,409,128 28.505 676.13
FILE sample29_1.ogg 215,868 28.505 60.58
FILE sample29_2.m4a 233,228 28.505 65.46
FILE sample29_3.m4a 258,227 28.505 72.47
FILE sample29_4.oga 239,592 28.505 67.24
FILE sample29_5.m4a 180,755 28.505 50.73
FOLDER .\Sample30\
FILE sample30.flac 2,648,660 30.000 706.31
FILE sample30_1.ogg 227,521 30.000 60.67
FILE sample30_2.m4a 247,638 30.000 66.04
FILE sample30_3.m4a 254,019 30.000 67.74
FILE sample30_4.oga 251,772 30.000 67.14
FILE sample30_5.m4a 189,944 30.000 50.65
The codecs:
_1. Vorbis
_2. Nero
_3. Apple
_4. Opus (CELT)
_5. low anchor

The FLAC bitrate may be somewhat interesting. It gives some indication of the sample's complexity.

The same data in Excel format is available here: http://www.hydrogenaudio.org/forums/index.php?showtopic=88033
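The kbps column above is simply file size in bits divided by duration; no container overhead is subtracted. A quick sanity check against the Sample01 rows (values taken from the table above):

```python
def kbps(size_bytes, duration_s):
    """Bitrate in kilobits per second from file size and play time."""
    return size_bytes * 8 / duration_s / 1000

# Sample01 rows from the table above:
print(round(kbps(742802, 8.029), 2))  # FLAC reference -> 740.12
print(round(kbps(74594, 8.029), 2))   # Vorbis         -> 74.32
```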
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 20:41:47
Thanks for organizing the tests, guys! Sorry for being picky, but I'm not convinced about the analysis. To ease my mind, it would be great if you could comment on the following.

  • Please provide the number of valid results (i.e. listeners) per sample (excluding "27", see below).
  • How did you compute the overall average score of a codec and its confidence intervals? Taking the mean of all listeners' results? That would mean a sample with more listeners (i.e. probably sample01) has a greater influence than the last few samples (which still needed listeners shortly before the end of the test). This is probably not a good approach; weighting each sample equally in the overall score seems to me the way to go (it probably doesn't make a difference here, but still...).
  • Nothing personal, but if a listener like "27" consistently scores in opposite direction as the average (as shown by Igor), a thorough post-screening analysis (like Spearman rank correlation < some value) would - and has to - exclude such results.


Edit: Christoph, why are the samples you uploaded at 96 kHz? Did you do the test that way?

Chris


@Edit: Oh sorry, I just quickly cut it with Audacity and did not notice it was still configured that way. But anyway, I hope you can hear what I mean.
Maybe most people only concentrated on the beginning of the samples? The part with the glitch is well into the sample.

Yes, I know that my results do not meet the standards and are therefore excluded.
And even if I were included, I would just be one of the outliers and would not influence the median of the scoring much.
http://www.physics.csbsju.edu/stats/complex.box.defs.gif (http://www.physics.csbsju.edu/stats/complex.box.defs.gif)

But I want to know what I did differently and what I can change next time.
Title: Multiformat listening test @ ~64kbps: Results
Post by: _m²_ on 2011-04-12 20:43:06
Some presentation suggestions:
1. Codec versions and settings should be in the results or one clearly marked click away. I don't consider what is there now to be clearly marked.
2. Links to results of older tests would be welcome.
3. I can't wait for the bitrate table.
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 20:44:13
Thank you for your help, AlexB. If you can do the complete bitrate analysis, it will be useful.
I haven't had time to put together the bitrate table these days.
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 20:46:43
Some presentation suggestions:
1. Codec versions and settings should be in the results or one clearly marked click away. I don't consider what is there now to be clearly marked.
2. Links to results of older tests would be welcome.
3. I can't wait for the bitrate table.


Thank you for the observations.
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-12 20:52:23
Christoph, do you mean the slightly washed out bass drum? To me (and probably most other listeners) the artifacts of the other codecs in the first 15 seconds appeared much more severe. I don't have the decoded items here. Can someone check if Christoph's CELT decodes match his/her own?

And, since you said this is your first listening test of this kind: did you do training sessions? Did you read e.g. this guideline (http://www.ecodis.de/audio/guideline_high.html)? The way you choose your loops (especially length) has a great impact on your ability to identify artifacts.

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-12 20:54:24
I've checked. The decoder on Christoph's system is fine.

P.S. I've also pointed Christoph to this guide: http://ff123.net/64test/practice.html (http://ff123.net/64test/practice.html)
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 20:57:30
I figured ratings would vary between testers depending on which of pre-echo, lowpass, ringing, warble and grittiness is more objectionable. Furthermore, on the Bohemian Rhapsody sample the warbling in the source had me very confused for a while.


The bigger difference just comes from which samples were tested.  A great many listeners only listened to the first few samples, so of course their preferences will be skewed by the correlation with the samples they tested.

If you look at the 10 listeners who had all 30 valid results (so no sample imbalance), you'll see that the overall preferences agree pretty strongly:

These are just the ranks of the averages (no comment on the significance):

Garf    Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k
hlm    Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis> AAC-LC@48k
IgorC  Opus > Apple_HE-AAC > Vorbis > Nero_HE-AAC > AAC-LC@48k
KW      Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k
04_anon Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k
06_anon Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k
14_anon Opus > Apple_HE-AAC > Vorbis > Nero_HE-AAC > AAC-LC@48k
25_anon Opus > Apple_HE-AAC > Vorbis > Nero_HE-AAC > AAC-LC@48k
26_anon Apple_HE-AAC > Opus > Nero_HE-AAC > Vorbis > AAC-LC@48k
30_anon Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k

The sample to sample variance in rank is a lot greater than the listener to listener variance in rank (scores might be another matter— but listeners don't score things the same, and because the score scale is non-linear I don't know of any intuitively correct way to deal with that other than using ranks).



> d <- read.listener.file("comp_data.txt")
> aggregate(d$value, list(codec=d$codec,listener=d$listener),mean)
          codec listener        x
1    AAC-LC@48k  04_anon 1.033333
2  Apple_HE-AAC  04_anon 3.550000
3  Nero_HE-AAC  04_anon 3.453333
4          Opus  04_anon 3.900000
5        Vorbis  04_anon 3.310000
6    AAC-LC@48k  06_anon 1.793333
7  Apple_HE-AAC  06_anon 4.186667
8  Nero_HE-AAC  06_anon 3.820000
9          Opus  06_anon 4.460000
10      Vorbis  06_anon 3.603333
11  AAC-LC@48k  14_anon 1.050000
12 Apple_HE-AAC  14_anon 3.283333
13  Nero_HE-AAC  14_anon 2.666667
14        Opus  14_anon 3.600000
15      Vorbis  14_anon 3.110000
16  AAC-LC@48k  25_anon 1.293333
17 Apple_HE-AAC  25_anon 3.183333
18  Nero_HE-AAC  25_anon 2.500000
19        Opus  25_anon 3.503333
20      Vorbis  25_anon 2.960000
21  AAC-LC@48k  26_anon 1.800000
22 Apple_HE-AAC  26_anon 4.866667
23  Nero_HE-AAC  26_anon 4.666667
24        Opus  26_anon 4.766667
25      Vorbis  26_anon 4.573333
26  AAC-LC@48k  30_anon 1.086667
27 Apple_HE-AAC  30_anon 3.110000
28  Nero_HE-AAC  30_anon 2.656667
29        Opus  30_anon 3.333333
30      Vorbis  30_anon 2.476667
31  AAC-LC@48k    Garf 1.923333
32 Apple_HE-AAC    Garf 4.093333
33  Nero_HE-AAC    Garf 3.963333
34        Opus    Garf 4.203333
35      Vorbis    Garf 3.916667
36  AAC-LC@48k      hlm 1.533333
37 Apple_HE-AAC      hlm 3.476667
38  Nero_HE-AAC      hlm 3.113333
39        Opus      hlm 3.656667
40      Vorbis      hlm 2.616667
41  AAC-LC@48k    IgorC 1.056667
42 Apple_HE-AAC    IgorC 3.003333
43  Nero_HE-AAC    IgorC 2.753333
44        Opus    IgorC 3.583333
45      Vorbis    IgorC 2.940000
46  AAC-LC@48k      KW 1.376667
47 Apple_HE-AAC      KW 4.040000
48  Nero_HE-AAC      KW 3.816667
49        Opus      KW 4.236667
50      Vorbis      KW 3.190000
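The rank orderings listed earlier follow directly from these per-listener means; a minimal sketch using two rows copied from the aggregate() output above:

```python
# Per-listener mean scores copied from the aggregate() output above.
means = {
    "04_anon": {"AAC-LC@48k": 1.033, "Apple_HE-AAC": 3.550,
                "Nero_HE-AAC": 3.453, "Opus": 3.900, "Vorbis": 3.310},
    "26_anon": {"AAC-LC@48k": 1.800, "Apple_HE-AAC": 4.867,
                "Nero_HE-AAC": 4.667, "Opus": 4.767, "Vorbis": 4.573},
}

def rank_order(scores: dict) -> list:
    """Codecs sorted from best to worst mean score."""
    return [codec for codec, _ in sorted(scores.items(),
                                         key=lambda kv: -kv[1])]

print(" > ".join(rank_order(means["04_anon"])))
# Opus > Apple_HE-AAC > Nero_HE-AAC > Vorbis > AAC-LC@48k
print(" > ".join(rank_order(means["26_anon"])))
# Apple_HE-AAC > Opus > Nero_HE-AAC > Vorbis > AAC-LC@48k
```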
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 21:06:22
2. Links to results of older tests would be welcome.

http://listeningtests.t35.com (http://listeningtests.t35.com).

I have mirrored Roberto's and Sebastian's old test sites. Sebastian's tests are also available here: http://listening-tests.hydrogenaudio.org/sebastian/ (http://listening-tests.hydrogenaudio.org/sebastian/)

Quote
3. I can't wait for the bitrate table.

Actually, a more useful presentation would be a comparison like this: http://www.hydrogenaudio.org/forums/index....st&p=593735 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322&view=findpost&p=593735)
I.e. bitrates that represent real life usage, not the bitrates of these short test samples.

I am planning to do it, but the lack of application support for Opus (CELT) will make the process quite a bit more complex than before.
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 21:11:34
Christoph, do you mean the slightly washed out bass drum? To me (and probably most other listeners) the artifacts of the other codecs in the first 15 seconds appeared much more severe. I don't have the decoded items here. Can someone check if Christoph's CELT decodes match his/her own?

And, since you said this is your first listening test of this kind: did you do training sessions? Did you read e.g. this guideline (http://www.ecodis.de/audio/guideline_high.html)? The way you choose your loops (especially length) has a great impact on your ability to identify artifacts.

Chris


Hi C.R.,

yes I read the guideline before the test, but usually compared only 2-3 loops per sample.
It is interesting that you are not that much annoyed by this part.
I can clearly hear it and i just did a spectrum analysis and it is also visible.
http://dl.dropbox.com/u/745331/spectrum.png (http://dl.dropbox.com/u/745331/spectrum.png)
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-12 21:38:37
yes I read the guideline before the test, but usually compared only 2-3 loops per sample.

I wonder how many listeners did it like that. It seems there are a lot of things we should put in a checklist for all to read before a test. Such as "listen to the entire sample" and "use headphones"... Maybe by coincidence you only listened to sections where CELT does a bit worse than the other codecs?

Quote
It is interesting that you are not that much annoyed by this part.
I can clearly hear it and i just did a spectrum analysis and it is also visible.
http://dl.dropbox.com/u/745331/spectrum.png (http://dl.dropbox.com/u/745331/spectrum.png)

Weird, I don't even see this in my own spectrogram of the file you uploaded.    What frequency range is the highlighted part in? In other words: please label your axes!

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 21:44:48
Quote
Please provide the number of valid results (i.e. listeners) per sample (excluding "27", see below).


Will be addressed when the per-sample graphs are made. You can obtain this data yourself easily if you can't wait - the results are public.

  • How did you compute the overall average score of a codec and its confidence intervals? Taking the mean of all listeners' results? That would mean a sample with more listeners (i.e. probably sample01) has a greater influence than the last few samples (which still needed listeners shortly before the end of the test). This is probably not a good approach; weighting each sample equally in the overall score seems to me the way to go (it probably doesn't make a difference here, but still...).


This is already addressed and explained on the results page. Note that equal sample weighting, by only including complete results, does not change the results in the slightest.

That being said, the only solution to this is to put some infrastructure in place to force an equal number of listeners per sample in the next tests. Any kind of post-processing to equalize the sample weights is probably as controversial as not having them equal in the first place. The samples that weren't included in the test also had unequal weights compared to those that were, if you know what I mean.


Quote
  • Nothing personal, but if a listener like "27" consistently scores in opposite direction as the average (as shown by Igor), a thorough post-screening analysis (like Spearman rank correlation < some value) would - and has to - exclude such results.


As explained in this thread, this listener was in fact screened.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-12 21:47:22
Sorry for being picky, but I'm not convinced about the analysis.


The paired statistical tests are pretty incontrovertible. I've since run the same analysis with a number of different balancing and post-filtering rules, and every time it has come out the same way.

If it's any consolation, Opus bombs considerably on the couple of cases where it does poorly (though its sample-by-sample variance is still not as large as the other codecs', it has stronger outliers). This is undoubtedly due to a mixture of encoder immaturity, not taking full advantage of VBR, and just one of the annoying tradeoffs that come from creating a low-latency codec. (The mode Opus was used in here has a total of 22.5 ms of latency, including the overlap but ignoring any serialization delay related to VBR.)

I've noticed that there seems to be some misunderstanding promoted around here related to confidence intervals. Even ignoring the issues with non-pairwise comparisons, assumptions of normality, etc., there seems to be a misapprehension that the confidence intervals must not overlap at all for the result to be deemed significant at whatever P-value was used to draw the bars. This is clearly incorrect.

For example, consider 5% error bars on the mean of codec A and 5% bars on the mean of codec B, where the lower bar of A is the same as the upper bar of B. Is there a 1/20 (p=0.05) chance that the difference in means arose from noise? _NO_. If we assume that the errors are independent, the chance of that is more like 1/400 (0.05^2). Of course, the errors are not completely independently distributed, but this fact also invalidates the assumptions used to set the error bars in the first place. Another approach would be to compare the mean of one value with the error bars on the mean of the other and vice versa; this isn't ideal either, but it does avoid squaring the P-value.

Blocked pair-wise parametric tests are much better for this reason and others, but they don't result in pretty graphs.
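As a numeric check of the just-touching-bars example, under the same (imperfect) assumption of independent normal errors with equal standard errors, the actual two-sided p-value works out far below 0.05 (this sketch and its variable names are my own, not from the analysis scripts):

```python
import math

def norm_sf(z: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Two independent means with equal standard error s, each with 95%
# bars of half-width 1.96*s.  "Just touching" means the difference of
# means is 2 * 1.96 * s, while the standard error of the difference is
# sqrt(2) * s, giving a z-score of 2 * 1.96 / sqrt(2).
z = 2 * 1.96 / math.sqrt(2.0)      # ≈ 2.77
p = 2 * norm_sf(z)                 # two-sided p-value, ≈ 0.006
print(round(z, 2), round(p, 4))
```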
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-12 21:56:39
yes I read the guideline before the test, but usually compared only 2-3 loops per sample.

I wonder how many listeners did it like that. It seems there are a lot of things we should put in a checklist for all to read before a test. Such as "listen to the entire sample" and "use headphones"... Maybe by coincidence you only listened to sections where CELT does a bit worse than the other codecs?

Quote
It is interesting that you are not that much annoyed by this part.
I can clearly hear it and i just did a spectrum analysis and it is also visible.
http://dl.dropbox.com/u/745331/spectrum.png (http://dl.dropbox.com/u/745331/spectrum.png)

Weird, I don't even see this in my own spectrogram of the file you uploaded.    What frequency range is the highlighted part in? In other words: please label your axes!

Chris


I did the spectrogram with foobar2000 and a log scale, sadly with no labels. I looked at it with a linear scale and the gap goes approximately from 7 kHz to 9 kHz.
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-12 22:12:22
Sorry, Christoph, can't reproduce it. What you describe must sound like a notch filter, i.e. a missing frequency band. Haven't noticed anything of that sort during or after the test. What OS are you using? 64-bit?

Thanks, Garf and NullC, for the explanations.

Note that equal sample weighting, by only including complete results, does not change the results in the slightest.

That's good to hear. Still, if you find some time, would you mind creating a closeup average-codec-score plot using only the complete results, just like the plot on the results page? 

Thanks,

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-12 22:20:37
Here's the bitrate table:

(http://i224.photobucket.com/albums/dd212/AB2K/ha/bitrate_table.png)

In Excel format:

http://www.hydrogenaudio.org/forums/index....st&p=751818 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88033&view=findpost&p=751818)
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 22:24:45
Some presentation suggestions:
1. Codec versions and settings should be in the results or one clearly marked click away. I don't consider what is now to be clearly marked.
2. Links to results of older tests would be welcome.
3. I can't wait for the bitrate table.


I added the bitrate table (thanks AlexB!), but that's as far as I'll go. If people want nicer webpages they need to find someone who is actually skilled at making nice HTML/CSS.
Title: Multiformat listening test @ ~64kbps: Results
Post by: saintdev on 2011-04-12 23:21:33
Here's the bitrate table:

I uploaded the Excel file to Google Docs (https://spreadsheets.google.com/pub?hl=en&hl=en&key=0AsR9EdcdIH_bdFVEdjNEb2E2RGNlQTNDVFRQQldiV1E&single=true&gid=0&output=html).

Link to the actual spreadsheet on Docs:
Google Docs Bitrate Table (https://spreadsheets.google.com/ccc?key=0AsR9EdcdIH_bdFVEdjNEb2E2RGNlQTNDVFRQQldiV1E&hl=en)
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-12 23:37:51
Note that equal sample weighting, by only including complete results, does not change the results in the slightest.

That's good to hear. Still, if you find some time, would you mind creating a closeup average-codec-score plot using only the complete results, just like the plot on the results page? 

Thanks,

Chris


The 10 listeners that did all samples with all results valid (N=300):

(http://listening-tests.hydrogenaudio.org/igorc/nonblocked_means_complete.png)

The results are just as highly significant:

Code: [Select]
bootstrap.py v1.0 2011-02-03
Copyright (C) 2011 Gian-Carlo Pascutto <gcp@sjeng.org>
License Affero GPL version 3 or later <http://www.gnu.org/licenses/agpl.html>

Reading from: bs1.txt
Read 5 treatments, 300 samples => 10 comparisons
Means:
      Vorbis   Nero_HE-AAC  Apple_HE-AAC          Opus    AAC-LC@48k
       3.270         3.341         3.679         3.924         1.395

Unadjusted p-values:
          Nero_HE-AAC   Apple_HE-AAC  Opus          AAC-LC@48k  
Vorbis        0.297         0.000*        0.000*        0.000*      
Nero_HE-AAC   -             0.000*        0.000*        0.000*      
Apple_HE-AAC  -             -             0.000*        0.000*      
Opus          -             -             -             0.000*      

Apple_HE-AAC is better than Vorbis (p=0.000)
Apple_HE-AAC is better than Nero_HE-AAC (p=0.000)
Opus is better than Vorbis (p=0.000)
Opus is better than Nero_HE-AAC (p=0.000)
Opus is better than Apple_HE-AAC (p=0.000)
AAC-LC@48k is worse than Vorbis (p=0.000)
AAC-LC@48k is worse than Nero_HE-AAC (p=0.000)
AAC-LC@48k is worse than Apple_HE-AAC (p=0.000)
AAC-LC@48k is worse than Opus (p=0.000)

p-values adjusted for multiple comparison:
          Nero_HE-AAC   Apple_HE-AAC  Opus          AAC-LC@48k  
Vorbis        0.297         0.000*        0.000*        0.000*      
Nero_HE-AAC   -             0.000*        0.000*        0.000*      
Apple_HE-AAC  -             -             0.000*        0.000*      
Opus          -             -             -             0.000*      

Apple_HE-AAC is better than Vorbis (p=0.000)
Apple_HE-AAC is better than Nero_HE-AAC (p=0.000)
Opus is better than Vorbis (p=0.000)
Opus is better than Nero_HE-AAC (p=0.000)
Opus is better than Apple_HE-AAC (p=0.000)
AAC-LC@48k is worse than Vorbis (p=0.000)
AAC-LC@48k is worse than Nero_HE-AAC (p=0.000)
AAC-LC@48k is worse than Apple_HE-AAC (p=0.000)
AAC-LC@48k is worse than Opus (p=0.000)
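I don't know bootstrap.py's exact resampling scheme, but a minimal paired-bootstrap sketch of a difference-of-means test (hypothetical scores, function names my own) could look like this:

```python
import random

def paired_bootstrap_p(scores_a, scores_b, iters=10000, seed=1):
    """Two-sided bootstrap p-value for the mean of paired differences.

    scores_a and scores_b are per-trial scores for two codecs on the
    same (listener, sample) pairs; we resample the paired differences
    with replacement and ask how often the resampled mean falls on
    either side of zero.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    below = 0
    for _ in range(iters):
        mean = sum(rng.choice(diffs) for _ in range(n)) / n
        if mean <= 0:
            below += 1
    frac = below / iters
    return 2 * min(frac, 1 - frac)  # two-sided percentile p-value

# Hypothetical paired scores: codec A clearly ahead of codec B.
a = [4.0, 3.8, 4.2, 2.8, 3.9, 4.1, 3.7, 2.9, 4.0, 3.6]
b = [3.0] * 10
print(paired_bootstrap_p(a, b) < 0.05)  # True
```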
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-13 00:41:37
Sorry, Christoph, can't reproduce it. What you describe must sound like a notch filter, i.e. frequency band missing. Haven't noticed anything of that sort during and after the test. What OS are you using? 64-bit?

Thanks, Garf and NullC, for the explanations.

Note that equal sample weighting, by only including complete results, does not change the results in the slightest.

That's good to hear. Still, if you find some time, would you mind creating a closeup average-codec-score plot using only the complete results, just like the plot on the results page? 

Thanks,

Chris


Yes, Windows 7 64-bit. But this is not an OS issue, I think. The files are as they are. If you like, you can use Audacity to take a look at the spectra; you can see the gap here too.
http://dl.dropbox.com/u/745331/spectrum2.png (http://dl.dropbox.com/u/745331/spectrum2.png)
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-13 00:59:31
Hi Igor,
on sample 10 I voted this way, because I found this part of the sample SUPER annoying:
http://dl.dropbox.com/u/745331/64kbs%20tes..._4_celt_cut.wav (http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_4_celt_cut.wav)
http://dl.dropbox.com/u/745331/64kbs%20tes...e10_org_cut.wav (http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_org_cut.wav)
From this point on, the "glitch" gets less annoying but stays through to the end of the sample.
Maybe it is only that annoying for me, or is it a decoding error? Can you please check this?
Thanks,
Christoph


So, even after listening to the difference signal in order to set my loop points, I had a somewhat difficult time A/Bing your sample vs. the original. I certainly don't hear anything super-annoying.

Can you try taking the two signals (Sample10r.wav and Sample10_4.wav)  and lowering their volume level by 6dB in an audio editor, then try to ABX them on the segment where you thought the artifact was most obvious and see if you can find it?

The peak value of the signal is very high in this file, and I'm speculating your audio software might be adding a small amount of gain and causing clipping.

This would also be consistent with your overall preference ranking, since I think you managed to order the codecs from least peak-value amplification to most (then the low anchor).
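The -6 dB check suggested above can be sketched numerically (hypothetical float samples normalized to ±1.0 full scale; function names are my own):

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear gain factor."""
    return 10 ** (db / 20)

def peak(samples) -> float:
    """Absolute peak level of a block of samples."""
    return max(abs(s) for s in samples)

# A decoded signal whose peaks exceed full scale will clip on playback
# without headroom; attenuating by 6 dB (gain ≈ 0.501) restores margin.
decoded = [0.4, -1.08, 0.9, 1.02, -0.3]    # hypothetical float samples
gain = db_to_gain(-6)
attenuated = [s * gain for s in decoded]
print(peak(decoded) > 1.0, peak(attenuated) > 1.0)  # True False
```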




Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-13 04:42:16
The test is finished, results are available here:
http://listening-tests.hydrogenaudio.org/igorc/results.html (http://listening-tests.hydrogenaudio.org/igorc/results.html)
Summary: CELT/Opus won, Apple HE-AAC is better than Nero HE-AAC, and Vorbis has caught up with Nero HE-AAC.


Hey all, you may be interested in a presentation I did which summarizes some of the results I found while looking at the data.
http://people.xiph.org/~greg/opus/ha2011/ (http://people.xiph.org/~greg/opus/ha2011/)

I'll probably be adding a bit to it over the next few days as I analyze more things. I'm posting the link here first, before linking it up elsewhere, in the hope that if I've done something obviously dumb the folks here will cluestick me. Thanks!


Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 08:27:46
Hey all, you may be interested in a presentation I did which summarizes some of the results I found while looking at the data.
http://people.xiph.org/~greg/opus/ha2011/ (http://people.xiph.org/~greg/opus/ha2011/)

I'll probably be adding a bit to it over the next few days as I analyze more things. I'm posting the link here first, before linking it up elsewhere, in the hope that if I've done something obviously dumb the folks here will cluestick me. Thanks!


One thing which catches my attention here is that Nero's VBR seems to flex much more than Apple's, but on this sample set it neglected to increase the bitrate, i.e. it somehow failed to recognize that these samples were difficult.

What exactly does the "Per-sample distribution" graph show?
Title: Multiformat listening test @ ~64kbps: Results
Post by: motion_blur on 2011-04-13 10:06:43
Hi Igor,
on sample 10 I voted this way, because I found this part of the sample SUPER annoying:
http://dl.dropbox.com/u/745331/64kbs%20tes..._4_celt_cut.wav (http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_4_celt_cut.wav)
http://dl.dropbox.com/u/745331/64kbs%20tes...e10_org_cut.wav (http://dl.dropbox.com/u/745331/64kbs%20test/Sample10_org_cut.wav)
From this point on, the "glitch" gets less annoying but stays through to the end of the sample.
Maybe it is only that annoying for me, or is it a decoding error? Can you please check this?
Thanks,
Christoph


So, even after listening to the difference signal in order to set my loop points, I had a somewhat difficult time A/Bing your sample vs. the original. I certainly don't hear anything super-annoying.

Can you try taking the two signals (Sample10r.wav and Sample10_4.wav)  and lowering their volume level by 6dB in an audio editor, then try to ABX them on the segment where you thought the artifact was most obvious and see if you can find it?

The peak value of the signal is very high in this file, and I'm speculating your audio software might be adding a small amount of gain and causing clipping.

This would also be consistent with your overall preference ranking, since I think you managed to order the codecs from least peak-value amplification to most (then the low anchor).


OK, cut me out :-)

I just found the problem! It's not my ears, it's not the decoder, it's my hardware.

The first clue I got was when I played the Opus sample through my speakers (not headphones) and it did not sound bad at all. Then I experimented a bit and could trace the problem to the headphone jack of my sound system. This seems to be a common issue:
http://forums.logitech.com/t5/Speakers/Z-5...ack/td-p/440105 (http://forums.logitech.com/t5/Speakers/Z-5500-Very-poor-sound-quality-from-the-headphone-jack/td-p/440105)

Sorry I caused you so much trouble. I want to thank all the people who prepared this test, especially Igor. This was really great work.
I hope next time I can participate with valid results. I should really put some tape on this darn headphone jack so I do not use it again accidentally ;-)

Best,
Christoph
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-13 11:38:46
3. I can't wait for the bitrate table.

Actually, a more useful presentation would be a comparison like this: http://www.hydrogenaudio.org/forums/index....st&p=593735 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=57322&view=findpost&p=593735)
I.e. bitrates that represent real life usage, not the bitrates of these short test samples.

I am planning to do it, but the lack of application support for Opus (CELT) will make the process quite a bit more complex than before.

I have done it now.

I have used these two test sets, Various and Classical, for a long time. To see how these test tracks are handled by other encoders see for example the above linked post and my "LAME 3.98.2 VBR bitrate test, all -V settings in 0.5 step increments" thread: http://www.hydrogenaudio.org/forums/index....showtopic=67523 (http://www.hydrogenaudio.org/forums/index.php?showtopic=67523)

The bitrates are from foobar2000, except the Opus (CELT) bitrates, which are calculated from the exact file sizes and original durations. The encoding settings and encoder versions are the same as were used in the listening test. I resampled the source files to 48 kHz before encoding the Opus tracks (as was done in the listening test). The other test tracks have the original 44.1 kHz sample rate.

(http://i224.photobucket.com/albums/dd212/AB2K/ha/table_various.png)

(http://i224.photobucket.com/albums/dd212/AB2K/ha/table_classical.png)

(http://i224.photobucket.com/albums/dd212/AB2K/ha/table_overall.png)

(http://i224.photobucket.com/albums/dd212/AB2K/ha/chart_various.png)

(http://i224.photobucket.com/albums/dd212/AB2K/ha/chart_classical.png)
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 12:41:17
The result page is now updated with per-sample graphs (http://listening-tests.hydrogenaudio.org/igorc/samples.html).
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-13 12:41:37
The bitrates are from foobar2000, except the Opus (CELT) bitrates, which are calculated from the exact file sizes and original durations. The encoding settings and encoder versions are the same as were used in the listening test. I resampled the source files to 48 kHz before encoding the Opus tracks (as was done in the listening test). The other test tracks have the original 44.1 kHz sample rate.


Any idea if fb2000 is including the container overhead? It's not much, but e.g. on the Opus files it's 1 kbit/sec, which would explain the entire difference of the means here.

Also, I'm quite perplexed by the file that runs at 82 kbit/sec: unless the file is quite short, the Opus encoder shouldn't currently be able to do that.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 12:54:04
The bitrates are from foobar2000, except the Opus (CELT) bitrates, which are calculated from the exact file sizes and original durations. The encoding settings and encoder versions are the same as were used in the listening test. I resampled the source files to 48 kHz before encoding the Opus tracks (as was done in the listening test). The other test tracks have the original 44.1 kHz sample rate.


Any idea if fb2000 is including the container overhead? It's not much, but e.g. on the Opus files it's 1 kbit/sec, which would explain the entire difference of the means here.


Depends on the format. IIRC, AAC/MP4 bitrate includes the container, but Peter wasn't sure about Vorbis, and I'm not sure either.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Alex B on 2011-04-13 14:46:01
Any idea if fb2000 is including the container overhead? It's not much, but e.g. on the Opus files it's 1 kbit/sec, which would explain the entire difference of the means here.

I don't know how foobar2000 gets the bitrate data, but here's a small comparison. The A values are from foobar2000. The B values are calculated from the file sizes and durations. (foobar2000 displays only integer values.)

(http://i224.photobucket.com/albums/dd212/AB2K/ha/overhead.png)


Quote
Also, I'm quite perplexed by the file that runs at 82kbit/sec:  Unless the file is quite short the Opus encoder shouldn't currently be able to do that.

The duration is 2 min 37 s. I have just rechecked the test files. The bitrate values appear to be correct. I'll send you a new PM soon.
Title: Multiformat listening test @ ~64kbps: Results
Post by: mixminus1 on 2011-04-13 14:55:21
Thanks much to all for their work in both setting up the test, and processing the results - it was very educational!

Is there a Win32 binary of the CELT/Opus encoder available?  I'm only seeing source code packages on the CELT website.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 15:04:44
Thanks much to all for their work in both setting up the test, and processing the results - it was very educational!

Is there a Win32 binary of the CELT/Opus encoder available?  I'm only seeing source code packages on the CELT website.


It's linked from the test website, in the "Where can I download the encoders?" section.
Title: Multiformat listening test @ ~64kbps: Results
Post by: mixminus1 on 2011-04-13 15:15:02
:facepalm:  Good God...

Thanks, Garf, I was scouring the results page like mad...clueless noob!
Title: Multiformat listening test @ ~64kbps: Results
Post by: romor on 2011-04-13 16:35:19
@Garf: can you please re-upload the results you posted on page 1 yesterday?
It was text showing all results merged (grouped by sample, columns by codec).
If it's possible, and if you can include the tester number as the first column, it would be great.

Thanks
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-13 17:12:07
AlexB, thank you for the bitrate verification.
I really appreciate it.


Just a little observation. Your list is split into two parts (Various and Classical). Various contains many musical genres except classical. This way all the other genres together get only 50% of the weight, while the other 50% belongs to overweighted classical music.

Imagine the situation if I calculated the average bitrate on (Various vs. Metal) because it is my main musical genre.
Every person has his/her own personal "Various vs. favorite genre".

All genres are enough popular and should be weighted equally.

I can't imagine that 50% of people listen classic (or metal) music. 

P.S. It will be great to add a few (really brutal one like trash) metal albums to your bitrate table. There is only one metal album Evanscence which is not quite heavy enough. Some encoders tend to inflate the bitrate a lot  on brutal genres.
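To make the weighting point concrete, here is a small sketch with made-up numbers (none of these bitrates come from the actual table): the same four genres give noticeably different averages depending on whether everything non-classical is lumped into "Various" or each genre gets equal weight.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical per-album average bitrates (kbps), grouped by genre.
# The numbers are invented purely to illustrate the weighting argument.
bitrates = {
    'classical': [58, 60, 59],
    'rock':      [66, 68],
    'pop':       [64],
    'metal':     [72],
}

# A two-way split (Various vs. Classical) gives classical 50% of the weight:
various = [b for g, xs in bitrates.items() if g != 'classical' for b in xs]
two_way = mean([mean(bitrates['classical']), mean(various)])

# Weighting every genre equally spreads the weight across all four genres:
per_genre = mean([mean(xs) for xs in bitrates.values()])
```

With these invented numbers the two schemes already disagree by more than 2 kbps, which is the kind of distortion being argued about.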
Title: Multiformat listening test @ ~64kbps: Results
Post by: pdq on 2011-04-13 17:51:26
Whether or not classical should be considered to be of equal importance to all other genres combined is a matter of opinion. For example, suppose you weight it by longevity? 
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-13 18:01:04
Whether or not classical should be considered to be of equal importance to all other genres combined is a matter of opinion.

No, it's a matter of fact.
Only 5 of the 30 tested samples were classical. Rock and pop are much more popular than classical nowadays.

I argue about genres not because I think some of them are better or have more longevity, but because genre has an influence on bitrates.

Title: Multiformat listening test @ ~64kbps: Results
Post by: pdq on 2011-04-13 18:53:46
Whether or not classical should be considered to be of equal importance to all other genres combined is a matter of opinion.

No, it's a matter of fact.
Only 5 of the 30 tested samples were classical. Rock and pop are much more popular than classical nowadays.

I argue about genres not because I think some of them are better or have more longevity, but because genre has an influence on bitrates.

Sorry, that was meant to be a joke.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 18:59:48
@Garf: can you please reupload results you posted on page 1 yesterday


I deleted that file because it contained mistakes in the post-screening, as discussed earlier in this thread.

The page NullC linked has the processed results in various formats with correct screening. You should be able to massage that into whatever format you want.
Title: Multiformat listening test @ ~64kbps: Results
Post by: romor on 2011-04-13 19:28:30
file: http://people.xiph.org/~greg/opus/ha2011/2...it_test.tar.bz2 (http://people.xiph.org/~greg/opus/ha2011/2011_multiformat_64kbit_test.tar.bz2)

Sorry, found it in 'summary_complete' folder, separated sample per file
thanks
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-13 19:53:49
file: http://people.xiph.org/~greg/opus/ha2011/2...it_test.tar.bz2 (http://people.xiph.org/~greg/opus/ha2011/2011_multiformat_64kbit_test.tar.bz2)
Sorry, found it in 'summary_complete' folder, separated sample per file
thanks


'summary_complete' is the listeners who submitted results for all 30 samples; 'summary_all' is all listeners. Both are after post-screening.

I'll be updating the file later today and will include a readme with descriptions.

Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-13 21:45:01
Bitrate verification on my set of albums:
(http://i55.tinypic.com/fkbgqa.png)

http://www.hydrogenaudio.org/forums/index....st&p=752009 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88033&view=findpost&p=752009)
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-13 22:46:34
Thanks, Garf, for the plot! And thanks, Christoph, for clearing up your (very unfortunate) headphone jack issue.

Hey all, you may be interested in a presentation I did which summarizes some of the results I found while looking at the data.
http://people.xiph.org/~greg/opus/ha2011/ (http://people.xiph.org/~greg/opus/ha2011/)

I'll probably be adding a bit to it over the next few days as I analyze some more things.  I'm posting the link here first before linking it up elsewhere in the hopes that if I've done something obviously dumb that the folks here will cluestick me.  Thanks!

*raising my hand*

Please don't write "HE-AAC" but something like "two popular HE-AAC encoders" in

Quote
Now, these results demonstrate that Opus's performance against HE-AAC, one of the strongest (but highest-latency) codecs at this bitrate, is very strong, besting its quality on the majority of individual audio samples and receiving a higher average score overall.

Sorry, but being an AAC developer I have to stress this. The Nero and Apple encoders have never been proven to be the best encoders around.

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-13 22:48:48
Quote
Also, I'm quite perplexed by the file that runs at 82kbit/sec:  Unless the file is quite short the Opus encoder shouldn't currently be able to do that.

The duration is 2 min 37 s. I have just rechecked the test files. The bitrate values appear to be correct. I'll send you a new PM soon.


Thank you very much for posting these figures.  There was indeed a misbehavior in our encoder.

The current opus encoder has very simple VBR which is mostly designed for low latency constrained VBR usage. In full VBR mode we simply turn off the constraint, but otherwise it's the same.  Because of this our VBR should be very constant compared to other VBR codecs— and you can see this on the samples used in this test.  A few of your samples, however, showed bitrate spikes which should not have been possible with our VBR system.

The way the VBR currently works is that in the middle of encoding a frame the size of the current entropy-coded variable-rate part of the frame is added to an offset value. This target is boosted for frames containing transients, then clamped to the permissible rates and then used as the target size for the whole frame. The error between the target rate and the requested rate is used to control the offset with a simple low-pass linear controller.    (Notice that there is no psy-model in any of this except a dumb transient boost, which is why we're confident that the Opus encoder will improve a lot for high-latency uses like this in the future. This is also a good area if someone would like to help improve the Opus encoder.)

Opus can encode digital silence with two bytes per frame and our encoder does so when it is fed digital silence.  My explicit intention in the above model was to ignore these silence frames for the purpose of the closed-loop rate control, so that files with lots of silence would end up undershooting the rate but the presence of silence would not cause huge rate jumps either.  Somehow, I don't know if a patch was lost or I just had a blonde day, I actually failed to do this.  Instead, during long spans of digital silence the encoder would shoot the offset through the roof in a futile attempt to use more than two bytes per frame.  Once the audio began again it would _greatly_ overshoot the target for a little while until the closed-loop control caught up with it. The pink panther file begins with 275 frames of digital silence, and the first non-silent frame was encoded at 423 kbit/sec. It took the encoder hundreds of frames to get back to a sane rate.
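The bug and the fix can be illustrated with a toy model of the controller described above (names and constants are invented for illustration; this is not libcelt code):

```python
SILENCE_BITS = 16  # Opus codes digital silence in two bytes per frame

def run_rate_control(frames, target, gain=0.05, skip_silence=True):
    """Toy closed-loop VBR rate control.

    frames: list of (natural_bits, is_silent) pairs.
    Each frame is coded at roughly natural_bits + offset; the error
    against the target rate feeds back into the offset through a
    simple low-pass linear controller.  With skip_silence=True
    (the fix), silence frames do not drive the loop.
    """
    offset = 0.0
    sizes = []
    for natural_bits, is_silent in frames:
        if is_silent:
            sizes.append(SILENCE_BITS)
            if skip_silence:
                continue          # the fix: silence is ignored by the loop
            coded = SILENCE_BITS
        else:
            coded = max(SILENCE_BITS, natural_bits + offset)
            sizes.append(coded)
        offset += gain * (target - coded)  # low-pass linear controller
    return sizes

# 100 frames of digital silence followed by audio, 1000-bit target:
frames = [(1000, True)] * 100 + [(1000, False)] * 10
fixed = run_rate_control(frames, target=1000)
buggy = run_rate_control(frames, target=1000, skip_silence=False)
```

With the fix the first audio frame is coded right at the target; without it the offset has been wound up by a hundred silence frames and the first audio frame massively overshoots, just like the 423 kbit/sec frame at the start of the pink panther file.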

Since this actually requires digital silence (not just quiet frames), and long spans of it to matter, it isn't actually triggered on many things.  I'm pretty sure this change has no significant effect on any of the samples used in this test, for example... and on the collection of 20 or so CDs (commercial and live recordings) that I've been using for automated Q/A it never managed to trigger a jump large enough to get my attention.

The behavior is now fixed: http://git.xiph.org/?p=celt.git;a=commitdi...1dd48d13e734125 (http://git.xiph.org/?p=celt.git;a=commitdiff;h=fdd867534a8f53ddb3f2845fc1dd48d13e734125)

With the fix the file is encoded with an average rate of 62.890kbit/sec, instead of ~82kbit/sec.

If you're interested in measuring the bitrates on a fixed encoder, I'd be glad to build windows binaries for you.  I can also trivially back-port this fix to the encoder version that was used in the test (though I think we haven't changed anything important since then) if you'd like to see that.

Thank you for taking the time to test it out and report results.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-13 22:57:06
Please don't write "HE-AAC" but something like "two popular HE-AAC encoders" in

Sorry, but being an AAC developer I have to stress this. The Nero and Apple encoders have never been proven to be the best encoders around.


From the previous tests, the Nero and Apple encoders are proven to be the best AAC encoders, so with the data at hand I strongly believe you are wrong.

If you believe otherwise, feel free to give evidence to the contrary. Pointing out this hypothetical better encoder would be a good start, and is sure to generate a lot of interest on this forum.
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-13 23:06:18
From the previous tests, the Nero and Apple encoders are proven to be the best AAC encoders, so with the data at hand I strongly believe you are wrong.

Has anyone ever seriously blind-tested e.g. Dolby's and Fraunhofer's HE-AAC encoders in the last few years, especially at these bit rates? I'd be more than happy to test them here, provided I can choose their versions.

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-14 00:10:40
Quote
Has anyone ever seriously blind-tested e.g. Dolby's and Fraunhofer's HE-AAC encoders in the last few years, especially at these bit rates? I'd be more than happy to test them here, provided I can choose their versions.


The CT/Dolby encoder was last tested here (AFAIK): http://www.mp3-tech.org/tests/aac_48/results.html (http://www.mp3-tech.org/tests/aac_48/results.html)
The latest version I have is 8.2.0 (found in the latest Winamp version). Igor took a look at this, and concluded it didn't seem to have improved much, if at all, since the 7.x series. I don't know if that is an accurate assessment, but it should be easily verifiable. I don't know of any newer versions, and 8.2 is from 2009.

Nobody ever tested Fraunhofer's HE-AAC encoder. Their MP3 encoder is available freely, and was tested in the past, but I couldn't find a free AAC encoder, which probably explains the complete lack of interest.

I'm not sure what you mean by "I'd be more than happy to test them here", but if you are able to provide copies of those codecs so they can be tested further including in future listening tests, that is probably a good idea. And if they do win those, your criticism is founded. Until then, I see little point.
Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-14 00:58:07
Sorry, but being an AAC developer I have to stress this. The Nero and Apple encoders have never been proven to be the best encoders around.


I don't know much about all the HE-AAC implementations around, but keep in mind that we're talking about two implementations that have matured over several years compared to the Opus *reference* implementation that is still very immature in terms of tuning. So if anything, I think the comparison would be biased against Opus. Not to mention that Opus makes quite a few quality-affecting compromises to achieve its low delay. To be fair, I'm probably just as surprised as you are with these results.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-14 05:00:07
I'm stunned by the CELT/Opus results! I would have assumed that your toolbox is smaller than usual when you are targeting low delay. And now CELT even beats the others by lengths.
Thanks for the great work, guys!


We were surprised when we started getting competitive with high-latency codecs too, but we've had a little while to get used to that.  HE-AAC was a bit more surprising, especially since so many things were going against us (the immaturity of our encoder, its inadequacy for this application (VBR), etc.)

Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.  There have been a lot of people that liked codecs like MPC at high rates for the reduced time-domain-liability they imply.  Simply getting many little details right helps reduce the harm of low-latency.  E.g. we use a trivial time domain pre-whitening before the transform so that the quantization noise from spectral leakage is less exposed.

We also expanded the low latency toolbox some. For example,  we have short and long frames without window switching (and the look-ahead latency that requires).  We're also using other things like range coding and high-dimensional vector quantization which might not have been great options a number of years ago.  Our decoder is currently quite a bit slower than AAC decoders (though it's not optimized, so it's hard to say how much the ultimate difference will be),  but since we're mostly targeting interactive use we were able to "pay" for decoder complexity increases with encoder decreases:  We're an order of magnitude faster encode side than our high latency competition (no explicit psy-model!). With some fairly modest compromises you can make an Opus audio mode (the CELT part) encoder which is basically the same speed as the decoder. Though cpu cycle hungry, Opus uses a very small amount of memory, which eliminates one of the embedded hurdles Vorbis suffered.

I also like to think that the ultra-low latency support has fostered some beneficial discipline: Every single fractional-bit of signaling overhead and frame redundancy counts a lot with 2.5ms frames. While it's not so important with larger frames, waste is waste, and Opus has very little of it. A very high percentage of the bits in opus frames go directly to improving the audio band SNR, and very few go to shuffling around coding modes or switching between inconsequentially different alternatives.  Signaling bits are sometimes helpful, sometimes _very_ helpful... but bits spent coding signal data are pretty much always helpful. We came up with a number of clever ways of eliminating signaling, and Opus is able to provide a true hard CBR (no bitres!) in audio modes which is super efficient (uses every bit in the frame) and actually sounds really good.

For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.  Speech is another matter and Opus should do quite well there down to very low rates, owing to the merger of Skype's SILK.

Opus should also scale well to higher rates— it is not using any highly parametric techniques that don't respond to additional bits— though the lack of a mature encoder will probably still give other codecs the edge in many cases.  This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...

I also think that aggressive tuning of these HE-AAC encoders could put them back in the lead, or at least strongly tie Opus. E.g. from my own listening I think a lot of the difference between Nero and Apple in the test was due to the lowpass difference.  That said, the HE-AAC format is mature (and by some accounts now stagnant), and we have a lot of low-hanging fruit.

I feel that Vorbis loses for a sad reason at this rate and lower:  The Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.

Cheers,
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-14 05:57:47
One thing which catches my attention here is that Nero's VBR seems to flex much more than Apple's, but on this sample set it neglected to increase the bitrate, i.e. it somehow failed to recognize that these samples were difficult.

What exactly does the "Per-sample distribution" graph show?

Hm,
Nero had moderate bitrate variation in previous public tests as well.

I feel that Vorbis loses for a sad reason at this rate and lower:  The Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.

Yes, but look at how AoTuv 6b juggles the bitrate. It deserves a "Bravo".
Some listeners really like Vorbis (including me).
My results are:
Vorbis - 2.94
Nero -2.75
Apple - 3.00
Opus - 3.58.

It's also a matter of taste, as it says here:
I figured ratings would vary between testers depending on which of pre-echo, lowpass, ringing, warble and grittiness is most objectionable. Furthermore, on the Bohemian Rhapsody sample, source warbling had me very confused for a while.

Title: Multiformat listening test @ ~64kbps: Results
Post by: saratoga on 2011-04-14 06:29:43
We also expanded the low latency toolbox some. For example,  we have short and long frames without window switching (and the look-ahead latency that requires).  We're also using other things like range coding and high-dimensional vector quantization which might not have been great options a number of years ago.  Our decoder is currently quite a bit slower than AAC decoders (though it's not optimized, so it's hard to say how much the ultimate difference will be),  but since we're mostly targeting interactive use we were able to "pay" for decoder complexity increases with encoder decreases:  We're an order of magnitude faster encode side than our high latency competition (no explicit psy-model!). With some fairly modest compromises you can make an Opus audio mode (the CELT part) encoder which is basically the same speed as the decoder. Though cpu cycle hungry, Opus uses a very small amount of memory, which eliminates one of the embedded hurdles Vorbis suffered.


Is the decoder spec finalized yet?  Would be interesting to see about getting a fixed point version running in Rockbox.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-14 08:30:52
Is the decoder spec finalized yet?  Would be interesting to see about getting a fixed point version running in Rockbox.


We're in a soft-freeze right now. We're really not planning on changing it, but we're also not yet making any promises not to: if something awful comes up we will. The whole process is dependent on progress in the IETF now, which appears to have gone super-political and thus slow, for the moment.  The Ogg encap for it is certainly not final at the moment; it's been due for a redo for over a year now, but that was pushed back because we were working on finishing the codec itself.

So for the CELT part the reference implementation (libcelt) is both a fixed point implementation and a floating point implementation, through the magic of unholy C macros. The SILK part has split fixed/float code, and the combination is float only at the moment, but I think this is mostly just a build system issue.  I'd be glad to work with whomever to get it working on whatever.  Feel free to hop into the #celt  channel on irc.freenode.net.

Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-14 09:13:14
I don't know much about all the HE-AAC implementations around, but keep in mind that we're talking about two implementations that have matured over several years compared to the Opus *reference* implementation that is still very immature in terms of tuning. So if anything, I think the comparison would be biased against Opus.


We only test implementations because I don't think we can sensibly test a specification. The best available implementation of Opus was compared to the best available implementations (that we know of) of HE-AAC. Where exactly is there a bias against Opus here?

You are arguing from the belief that Opus implementations can improve faster and more than HE-AAC implementations can. There may be arguments to support that belief, but this only means that a future test might be expected to show an increasing advantage for Opus. It certainly doesn't mean the current test is biased.

Quote
Not to mention that Opus makes quite a few quality-affecting compromises to achieve its low delay. To be fair, I'm probably just as surprised as you are with these results.


I agree (on both).
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-14 10:46:43
I'm not sure what you mean by "I'd be more than happy to test them here", but if you are able to provide copies of those codecs so they can be tested further including in future listening tests, that is probably a good idea. And if they do win those, your criticism is founded. Until then, I see little point.

Yes, pointing to encoders and settings is what I mean. Sorry for not making it clear. I'll let you know once the encoders of my choice become available somewhere for all to try out.

Maybe some background information: my (and a colleague's) full-time job over the last 3 years has been to improve Fraunhofer's HE-AAC encoder. I'm quite confident that I actually made some progress  at least over Fraunhofer's older encoder versions. I'm not expecting my "latest greatest" work to win over Opus in a test like this one (because the latter sounds really good), but I hope that it would be tied on average.

I'll have to disagree (of course), Jean-Marc. The fact that Apple's HE-AAC implementation is quite new doesn't convince me it's fully mature. From Wikipedia (http://en.wikipedia.org/wiki/Advanced_Audio_Coding):

Quote
As of September 2009, Apple has added support for HE-AAC (which is fully part of the MP4 standard) but iTunes still lacks support for true VBR encoding.

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-14 11:41:21
You are arguing from the belief that Opus implementations can improve faster and more than HE-AAC implementations can. There may be arguments to support that belief, but this only means that a future test might be expected to show an increasing advantage for Opus. It certainly doesn't mean the current test is biased.


Sorry, wrong choice of words. I meant to say that Opus (as a spec) was at a sort of disadvantage in this test compared to AAC, which had more mature encoders. I do not in any way suggest that the test itself had a problem or that IgorC or you should have done anything differently. Mainly I was responding to Chris' comment about Apple AAC not being the best encoder out there.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-14 12:19:44
Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.


But not enough, given the presence of block-switching in Opus

I could imagine the more frequent transmission of band energies is very useful. It should also help the folding. IIRC, SBR has issues with not being able to adapt fast enough time-domain-wise in some circumstances.

Quote
I also like to think that the ultra-low latency support has fostered some beneficial discipline: Every single fractional-bit of signaling overhead and frame redundancy counts a lot with 2.5ms frames. While it's not so important with larger frames, waste is waste, and Opus has very little of it. A very high percentage of the bits in opus frames go directly to improving the audio band SNR, and very few go to shuffling around coding modes or switching between inconsequentially different alternatives.  Signaling bits are sometimes helpful, sometimes _very_ helpful... but bits spent coding signal data are pretty much always helpful. We came up with a number of clever ways of eliminating signaling, and Opus is able to provide a true hard CBR (no bitres!) in audio modes which is super efficient (uses every bit in the frame) and actually sounds really good.


I think you have a good advantage over (LC)AAC here: AAC allows very fine control of the quantization, but at a severe signaling cost. At least in AAC codec design, you can spend a very long time on the complicated question of doing the joint R/D optimization quickly, and end up not using the fine control at all because it just eats too many bits. This gets worse at low bitrates. It doesn't help that AAC only uses Huffman coding, and not arithmetic/range coding. H.264 also has many possible modes but I presume CABAC helps mitigate the signaling cost (maybe someone who is more familiar with that particular codec can confirm/deny).

Is the almost complete lack of signaling in Opus related to the decision not to make range coding contexts? If I read the spec correctly, almost all of the range coding assumes uniform distribution.

The 3GPP reference code (which is actually Fraunhofer fastaac, as far as I know) shows that you can make a decent AAC codec even ignoring most of the psycho-acoustics or greatly simplifying them, so I'm not surprised you eliminated the explicit psymodel entirely. It's a surprisingly small part of the codec's efficiency (I'm not saying it's not important - it is - but less than you would think at first). VBR is another matter, though it's also surprisingly hard to make consistently good decisions there.

Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.


This is a bit surprising to me because of the above. Are there technical limitations that cause this?

Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...


Did the codec used in this test use the tonal pre/postfilter from Broadcom?

Quote
I feel that Vorbis loses for a sad reason at this rate and lower:  The Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.


I understand this as saying that Vorbis would be easily competitive if it had something like SBR or folding. The experience with LC-AAC vs HE-AAC seems to support that.
Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-14 14:04:05
Is the almost complete lack of signaling in Opus related to the decision not to make range coding contexts? If I read the spec correctly, almost all of the range coding assumes uniform distribution.


The CELT part of Opus uses range coding more for convenience than absolute necessity. I once did some simulations on using simpler coding (e.g. Golomb) instead of range coding and the loss was about 2-3 bits/frame. Of course, some of the features we later added would have been a pain to implement without range coding, but nothing impossible. The most important symbols we code either have flat probabilities or use a Laplace distribution, which Golomb codes model well.
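As an illustration of that last point (sketch code, not from libcelt): a Rice code, the power-of-two special case of Golomb coding, handles a zig-zag-mapped Laplace (two-sided geometric) source quite naturally, giving the likeliest small residuals the shortest codewords.

```python
def zigzag(x):
    """Map signed ints to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * x if x >= 0 else -2 * x - 1

def rice_encode(value, k):
    """Rice code (Golomb with divisor 2**k): unary quotient,
    a '0' terminator, then the k-bit binary remainder."""
    q, r = value >> k, value & ((1 << k) - 1)
    bits = '1' * q + '0'
    if k:
        bits += format(r, '0%db' % k)
    return bits

# Small residuals (the likely ones under a Laplace prior) get short codes:
codes = [rice_encode(zigzag(x), 1) for x in (0, -1, 1, -3)]
```

The parameter k plays the role the Laplace scale would play in a range coder: pick it to match the expected magnitude and the code length tracks the ideal entropy within a fraction of a bit per symbol, which matches the 2-3 bits/frame loss figure mentioned above.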

Quote
Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.


This is a bit surprising to me because of the above. Are there technical limitations that cause this?


The reason I'd expect us to eventually lose at some lower bit-rate is simply the fact that we have no SBR and (at even lower rates), no parametric stereo. But I'm fine with that. Opus was never intended to even go as low as 64 kb/s for stereo music so I'm already pretty happy with our performance.

Quote
Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...


Did the codec used in this test use the tonal pre/postfilter from Broadcom?


Yes. It probably helped a bit, but it doesn't work miracles. This is one of the sacrifices we make for having low delay (the lower MDCT overlap causes more leakage).

Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-14 17:47:07
Low-latency work implies some serious compromises, but it's not all bad.  Small transform sizes automatically give you better time domain noise shaping, for example.

But not enough, given the presence of block-switching in Opus
I could imagine the more frequent transmission of band energies is very useful. It should also help the folding. IIRC, SBR has issues with not being able to adapt fast enough time-domain-wise in some circumstances.


If we'd only been comparing ourselves to G.719/G.722.1c we probably wouldn't have done as much as we've done for transients; only through thoroughly unfair comparisons of our CBR behavior to things like Vorbis were we motivated enough to really do something about it here.

Amusingly, we don't do the kind of block switching that increases the coarse energy temporal resolution currently (the format allows it, we just don't do it).

The format supports frame sizes of 2.5, 5, 10, and 20 ms; all use the same 2.5 ms window-overlap.  The format can switch on the fly between any of these sizes, but the current encoder doesn't do this automatically (you can ask it to).  We needed the sizes to cover all the latency use-cases, but they're potentially useful for coding even if you don't care about latency.  There are clearly cases where switching to a higher signaling rate than the 20 ms frames give you is beneficial.

The switching we do have: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so e.g. a 20 ms frame would have 8 of them.  They are grouped by band and normalized by band, so the coarse energy resolution doesn't necessarily go up, and the side-information rate doesn't go up (much).  During quantization we can apply special T/F processing to boost or lower (for transient frames) the effective time-domain resolution on a band-by-band basis.

At higher rates when our 32-bit algebraic codebook limitation arises (we artificially limit the VQ symbols to ~32 bits to avoid the need to do 64-bit arithmetic, plus some other limits to tame memory requirements), bands get subdivided in dimensions and for transient blocks (or blocks which have been time-boosted) the subdivision is set up so that it subdivides in time.  When this subdivision occurs, additional energy data is coded (basically the balance of the energy on each half, so that the resulting vectors retain the unit norm required by our spherical VQ), and in that case the energy resolution increases.
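A minimal sketch of that subdivision step (illustrative only, ignoring quantization and the actual libcelt data layout): each half is renormalized to unit norm for the spherical VQ, and a single balance angle carries the energy ratio so the decoder can undo the split.

```python
import math

def split_band(vec):
    """Split a unit-norm band vector into two halves, renormalize
    each half to unit norm, and return the balance angle that encodes
    how the energy is shared between the halves."""
    n = len(vec) // 2
    a, b = vec[:n], vec[n:]
    ea = math.sqrt(sum(x * x for x in a))
    eb = math.sqrt(sum(x * x for x in b))
    theta = math.atan2(eb, ea)  # the extra energy data that gets coded
    unit = lambda v, e: [x / e for x in v] if e else list(v)
    return unit(a, ea), unit(b, eb), theta

def merge_band(a, b, theta):
    """Decoder side: rescale the unit halves by cos/sin of the balance."""
    return [x * math.cos(theta) for x in a] + [x * math.sin(theta) for x in b]

norm = math.sqrt(3 * 3 + 4 * 4 + 0 * 0 + 1 * 1)
vec = [x / norm for x in (3, 4, 0, 1)]     # a unit-norm "band"
a, b, theta = split_band(vec)
rec = merge_band(a, b, theta)
```

Because the input band is itself unit-norm, cos(theta) and sin(theta) are exactly the two halves' energies, so the merge reconstructs the original vector; only one scalar of side information is added per split.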

Regardless of the energy resolution, the sparseness preservation code makes a fair bit of effort to produce output which has the same time domain distribution as the original signal.

Quote
I think you have a good advantage over (LC)AAC here: AAC allows very fine control of the quantization, but at a severe signaling cost. At least in AAC codec design, you can spend a very long time on the complicated question of doing the joint R/D optimization quickly, and end up not using the fine control at all because it just eats too many bits. This gets worse at low bitrates. It doesn't help that AAC only uses Huffman coding, and not arithmetic/range coding. H.264 also has many possible modes but I presume CABAC helps mitigate the signaling cost (maybe someone who is more familiar with that particular codec can confirm/deny).

Is the almost complete lack of signaling in Opus related to the decision not to make range coding contexts? If I read the spec correctly, almost all of the range coding assumes uniform distribution.


Few of our signaling parameters have uniform distribution.  We don't use _adaptive_ contexts for most of the signaling because most of the signaling is a single symbol per frame (and we can't adapt across frames since we need to tolerate loss), but we do have static probabilities, which allow R/D decisions on the signaling and make uncommon options very cheap (tiny fractions of a bit when not in use).  In the cases where there really are multiple correlated signaling symbols (the per-band T/F changes and the band bitrate-boost symbols come to mind), we do adapt the probability.
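To make the "tiny fractions of a bit" point concrete: with a static probability model, an ideal entropy coder spends about -log2(p) bits on a symbol of probability p. A quick sketch (the probabilities are made up for illustration, not Opus's actual tables):

```python
import math

def cost_bits(p):
    """Ideal entropy-coded cost, in bits, of a symbol of probability p."""
    return -math.log2(p)

# Hypothetical static probabilities for a rarely-used option flag:
p_off, p_on = 0.95, 0.05
print("flag off: %.3f bits" % cost_bits(p_off))   # well under 0.1 bits
print("flag on:  %.3f bits" % cost_bits(p_on))
```

So a flag that is off 95% of the time costs under a tenth of a bit per frame when unused, which is why signaling rare options with static probabilities is so cheap.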

The coarse energy is all entropy coded, with a static PDF that agrees pretty well with most data. Again, loss robustness prevents us from having much useful adaptation, and the autoregressive inter/intra frame prediction at least makes sure the mean of the assumed distribution is right.

The VQ is uniform coded and it counts for most of the bits in the frame, as you mention— but after dumping a lot of data I found that the actual symbols themselves were quite uniform.  There might have been something we could have done with the signs if we used an alternative algebraic representation, but having very predictable bitrates from our VQ at lower resolutions turned out to be helpful for the bit allocation behavior in any case.
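For reference, the size of a PVQ codebook (integer vectors of dimension N whose absolute values sum to K, sign patterns included) satisfies a simple well-known recurrence, and a uniformly coded symbol costs log2 of that size; this is the quantity the ~32-bit cap mentioned earlier applies to. A sketch (illustrative, not the codec's actual enumeration code):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def pvq_size(n, k):
    """Number of PVQ codewords: integer vectors of dimension n
    whose absolute values sum to k (sign patterns included)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return pvq_size(n - 1, k) + pvq_size(n, k - 1) + pvq_size(n - 1, k - 1)

# A uniformly coded PVQ symbol costs log2(codebook size) bits:
for n, k in [(4, 2), (8, 8), (16, 24)]:
    print("N=%2d K=%2d: %.1f bits" % (n, k, math.log2(pvq_size(n, k))))
```

The recurrence splits on the first coefficient being zero, same-signed, or newly signed, which is why three smaller terms appear.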

Quote
Quote
For music at lower rates, I expect that HE-AAC would win— we simply start to fall apart once the number of bytes in the frame gets too small.

This is a bit surprising to me because of the above. Are there technical limitations that cause this?


Well, a couple.  For one, I understand that HE-AAC is using 40 ms frames (1024 samples at half-rate, no?).  That is a lot of effective signaling reduction that we miss, even if we're more efficient to start with.  Our shorter transforms and tiny window make the transform very leaky.  The reduced compaction means that signals are naturally less sparse, so at the very limits of resolution they fall apart more suddenly.

Because we always preserve the energy to at least 6 dB resolution, the energy rate does not fall as the overall rate decreases, so at low enough rates we're spending a lot of bits there.  In particular, the energy bit rate sometimes bursts rather high and uses up almost all of the bits, which is bad for quality even if it only happens fairly rarely.  A smarter encoder than our current one could use dynamic programming to do R/D optimization of the coarse energy, but since that amounts to distorting the 6 dB resolution data, it would only be applicable at very low rates.  A smarter encoder could also adjust the end-band position (low-passing) to skip coding HF when it would be inaudible, but I think both of these would require a reliable psy-model and a lot of tuning in order not to be a liability.

You'd think that reduced side information would be a benefit at low rates, but it isn't always: we can't, e.g., precisely place a single tone in a band without coding enough resolution for the whole band.  When you just don't have enough bits, using them exactly where you want them matters more than when you have bits to spare, and we have fairly little control.  (And what control we do have, the encoder doesn't make great use of currently.)  We also don't have parametric stereo other than a kind of band-energy intensity-only stereo.  (We have quite clever stereo coding overall, but it isn't the sort of clever that makes very low rates work well; plus, I don't think we've ever heard a parametric stereo we actually liked.)

Quote
Quote
This is especially true for exposed multi-tonal samples like the sample 02 in this set, though multi-tonal killers are fairly uncommon and I expect that we can fix them with VBR in the same way large block codecs fix transients...

Did the codec used in this test use the tonal pre/postfilter from Broadcom?


Yes, but it's not really that helpful for that kind of sample. The filter does a fairly narrow comb-shaped noise shaping. It can make a dramatic improvement on simple harmonic signals (like speech, or exposed tonal instruments such as a trumpet or clarinet, even with background sound), but on samples with many exposed tones that aren't simply harmonically related it doesn't do much.  Those signals also probably throw off the encoder's search, so even if some weak use of the filter could improve things there, it probably isn't being used usefully right now.

Quote
Quote
I feel that Vorbis loses for a sad reason at this rate and lower: the Vorbis toolbox doesn't have any good tools to avoid having to low-pass at rates well below its initial design goals; at higher rates it obviously does much better than it did here.  The lack of efficient tools for low-rate HF is a real shortcoming at this rate, but not one which is all that interesting from an engineering/competitive basis.


I understand this as saying that Vorbis would be easily competitive if it had something like SBR or folding. The experience with LC-AAC vs HE-AAC seems to support that.


Yes, primarily. Or even if it could get away with higher-dimensional VQ at acceptable memory/complexity it would be somewhat better off.  Like MP3, Vorbis's only way to cope with low-rate signals is eventually throwing out hunks of the spectrum.  It's better than MP3 in this regard, as it has more control over where it throws things away, but ultimately leaving holes in the spectrum is not a great thing to do.


Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-04-14 23:39:05
The switching we do have works like this: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so a 20 ms frame would contain 8 of them.  The short blocks are grouped by band and normalized by band, ...

Sounds similar to AAC, actually. May I ask, what do you mean by grouping and normalizing?

Quote
I understand that HE-AAC is using 40ms frames (1024 samples at half-rate, no?)

Yes, at 48 kHz output sampling rate and when using dual-rate SBR (46 ms in the test configuration). But since there's 50% block overlap, a frame spans up to 80 ms (up to 93 ms in the test). Which is a bit on the high side if you ask me, but that's how it is. Maybe that explains why CELT does so well in this test: with its 20-ms framing it might actually be closer to the optimum than HE-AAC.

Chris
Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-15 05:49:23
The switching we do have works like this: for any of the {5, 10, 20} ms sizes there is a 'transient frame' bit that switches to the 2.5 ms transform, so a 20 ms frame would contain 8 of them.  The short blocks are grouped by band and normalized by band, ...

Sounds similar to AAC, actually. May I ask, what do you mean by grouping and normalizing?


It means that if (e.g.) a 20 ms frame is split into 8 short blocks, then there's only *one* energy value encoded per band. That value is the sum of the energies of all the short blocks. Normalizing is what CELT does in each band before applying the PVQ encoding (normalization happens for all frames, not just transients).
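A minimal sketch of that gain-shape idea (illustrative Python, not CELT's code): each band sends an energy (the "gain") plus a unit-norm "shape" vector that goes to the PVQ stage, and for a transient frame the single per-band energy is pooled across the short blocks:

```python
import math

def encode_band(band):
    """Gain-shape split: the band's energy (gain) is coded separately
    from a unit-norm shape vector that goes to the PVQ stage."""
    gain = math.sqrt(sum(v * v for v in band))
    shape = [v / gain for v in band] if gain > 0 else band
    return gain, shape

def decode_band(gain, shape):
    """Decoder side: scale the (de)quantized shape back up."""
    return [gain * v for v in shape]

def grouped_gain(short_blocks):
    """Transient frame: one energy value per band, pooled over all
    the short blocks, per the description above."""
    return math.sqrt(sum(v * v for blk in short_blocks for v in blk))

gain, shape = encode_band([3.0, 4.0])
assert abs(gain - 5.0) < 1e-9
assert all(abs(a - b) < 1e-9
           for a, b in zip(decode_band(gain, shape), [3.0, 4.0]))
```

The point of the split is that the gain can be coded at a guaranteed minimum resolution while the shape is quantized with whatever bits remain.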

Quote
Quote
I understand that HE-AAC is using 40ms frames (1024 samples at half-rate, no?)

Yes, at 48 kHz output sampling rate and when using dual-rate SBR (46 ms in the test configuration). But since there's 50% block overlap, a frame spans up to 80 ms (up to 93 ms in the test). Which is a bit on the high side if you ask me, but that's how it is. Maybe that explains why CELT does so well in this test: with its 20-ms framing it might actually be closer to the optimum than HE-AAC.


For a 20 ms frame size, the CELT window is only 22.5 ms, so about 4x shorter than HE-AAC's. That makes a big difference. That's probably the single biggest limitation imposed by the low-delay constraint, and that's why I was really surprised by the quality we were able to get in this test. Had we not had this constraint, just increasing the MDCT overlap could have provided a big improvement in quality.
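The timing arithmetic in this exchange can be checked quickly. The figures below are assumptions taken from the thread: a 44.1 kHz output rate for the test material, and dual-rate SBR so the AAC core runs at half rate:

```python
# CELT: frame plus the fixed short overlap
celt_frame_ms   = 20.0
celt_overlap_ms = 2.5
celt_window_ms  = celt_frame_ms + celt_overlap_ms      # 22.5 ms

# HE-AAC: 1024-sample core frames at half the output rate,
# with 50% MDCT overlap so a window spans two frames
sr_out  = 44100            # output sample rate in the test (assumed)
sr_core = sr_out / 2       # dual-rate SBR: core runs at half rate
heaac_frame_ms  = 1024 / sr_core * 1000                # ~46 ms
heaac_window_ms = 2048 / sr_core * 1000                # ~93 ms

print(celt_window_ms, round(heaac_frame_ms, 1), round(heaac_window_ms, 1))
```

The ~93 ms versus 22.5 ms window spans give roughly the 4x ratio quoted in the post.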
Title: Multiformat listening test @ ~64kbps: Results
Post by: .alexander. on 2011-04-15 12:40:43
(http://i54.tinypic.com/2646qlj.jpg) (http://i51.tinypic.com/c3pyd.jpg)

The second graph seems to be consistent with "Results from Dr. Christian Hoene for ITU-T Workshop last September". PEAQ is not supposed to work well for HE-AAC, but Vorbis has bad scores as well.

Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-15 15:17:56
The second graph seems to be consistent with "Results from Dr. Christian Hoene for ITU-T Workshop last September". PEAQ is not supposed to work well for HE-AAC, but Vorbis has bad scores as well.


PEAQ is known to be horrible at comparing codecs. At best it can help tuning a codec when the tuning being done is not related to psycho-acoustics. We've known for a long time that it tends to give CELT higher scores than it deserves, so we've never really relied on it for comparing to other codecs.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Xanikseo on 2011-04-20 16:34:59
Bitrate verification on my set of albums:
(http://i55.tinypic.com/fkbgqa.png)

http://www.hydrogenaudio.org/forums/index....st&p=752009 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88033&view=findpost&p=752009)

IgorC, in your set of albums, which version of aoTuV did you use to encode to Vorbis? I assume that, as it is your own collection, you may not always re-encode the whole collection when new versions of the encoder come out (I certainly don't). I find that aoTuV b6.02 produces files consistently larger than those produced by aoTuV b5.7, which may explain why the average bitrate in the listening test is so high. I wouldn't be surprised if b5.7 produced files averaging ~64-68 kbps, but I would be less certain about b6.02... Then again, it could just be coincidence that, the times I checked, b6.02 tended to produce larger files...

EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Garf on 2011-04-20 18:02:29
IgorC, in your set of albums, which version of aoTuV did you use to encode to Vorbis? I assume that, as it is your own collection, you may not always re-encode the whole collection when new versions of the encoder come out (I certainly don't). I find that aoTuV b6.02 produces files consistently larger than those produced by aoTuV b5.7, which may explain why the average bitrate in the listening test is so high. I wouldn't be surprised if b5.7 produced files averaging ~64-68 kbps, but I would be less certain about b6.02... Then again, it could just be coincidence that, the times I checked, b6.02 tended to produce larger files...

EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.


The bitrates were independently verified here:

http://www.hydrogenaudio.org/forums/index....st&p=751888 (http://www.hydrogenaudio.org/forums/index.php?showtopic=88023&view=findpost&p=751888)
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-20 18:19:05
IgorC, in your set of albums, which version of aoTuV did you use to encode to vorbis?

It was the latest version, aoTuV Beta 6.02.
Title: Multiformat listening test @ ~64kbps: Results
Post by: Zarggg on 2011-04-20 19:09:46
EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.

"Listen[ing] for pleasure" was not the intended goal of this test, based on my understanding. I believe it was to measure the usability of the tested codecs for implementations that require a low encoding rate (whatever they may be, but streamed content over low bandwidth -- such as telephony -- immediately springs to mind).
Title: Multiformat listening test @ ~64kbps: Results
Post by: Xanikseo on 2011-04-20 21:20:10
EDIT: I just realised I'm being a bit silly, who listens to 64kbps files for pleasure anyway? You must have used the current version.

"Listen[ing] for pleasure" was not the intended goal of this test, based on my understanding. I believe it was to measure the usability of the tested codecs for implementations that require a low encoding rate (whatever they may be, but streamed content over low bandwidth -- such as telephony -- immediately springs to mind).

Oh, I know that. It's just that I know he wouldn't keep 64 kbps encodes of his albums lying around on his computer, so he must have newly encoded the files to 64 kbps with the latest encoder; hence why I felt a bit silly asking which encoder version he used.

Anyway, I'm glad it's the most recent version he used: now I don't have to worry about songs taking up more space on my DAP now that I'm using b6.02.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-21 19:15:24
Bitrate verification on my set of albums:
(http://i55.tinypic.com/fkbgqa.png)

http://www.hydrogenaudio.org/forums/index....st&p=752009 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88033&view=findpost&p=752009)


To satisfy my curiosity, can you run the Opus numbers (with the same settings) on a build with the recent VBR fix?

Title: Multiformat listening test @ ~64kbps: Results
Post by: jmvalin on 2011-04-21 19:53:09
Bitrate verification on my set of albums:
(http://i55.tinypic.com/fkbgqa.png)

http://www.hydrogenaudio.org/forums/index....st&p=752009 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=88033&view=findpost&p=752009)


To satisfy my curiosity, can you run the Opus numbers (with the same settings) on a build with the recent VBR fix?


NullC, your VBR fix hasn't made it to Opus yet :-)
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-21 23:52:50
NullC,

h*tp://www.mediafire.com/?s7i9usu2qr27pcg

The bitrate is slightly lower.
Title: Multiformat listening test @ ~64kbps: Results
Post by: NullC on 2011-04-22 13:31:36
NullC,
h*tp://www.mediafire.com/?s7i9usu2qr27pcg
The bitrate is slightly lower.


66.29 -> 65.82. Good, as expected. Not quite enough to get it lower than Vorbis, but lower than the AAC encoders.

Thanks for testing that.


Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-04-28 17:38:30
Some statistics from this public test:
18 listeners were kind enough to answer the questions. All of them are men; there was at least one woman, but she didn't answer the questions.

Age:
Min – 20
Max – 46
Average – 28.61

Some observations by age group:
20-25 years: a few young participants; good performance.
26-29 years: good performance as well, with good observation. Excellent.
30-34 years: very accurate observation, with patience; they take their time to perform the test. The results are as good as those of the younger groups (and maybe a bit better overall).
>35 years: there were too few participants, but they performed well too.

At least for this test there is no connection between the quality of the results and age. It's more about experience.


Headphones/speakers:
78% - headphones
11% - speakers
11% - headphones and speakers

Most participants had Sennheiser headphones (44.4%).
Some participants realized that headphones are better for spotting artifacts.

Soundcard:
Most users had on-board soundcards (59%). That's perfectly fine, at least to me; nowadays on-board solutions are actually very good.

Operating System:
Windows 7 – 52.9%
Vista – 11.8%
XP – 23.5%
Linux – 5.9%
Mac OS X – 5.9%

Computer:
Desktop – 52.9%
Mobile (laptop-like) – 41.2%
HTPC – 5.9%

Fan noise:
Low – 76.5%
High – 17.6%
In between – 5.9%

Quiet room:
1) Yes – 58.9%
2) Moderate – 23.5%
3) No – 17.6%

Time of testing:
Morning – 10%
Afternoon – 16.7%
Evening – 36.7%
Night - 30%
Various – 6.7%

The place (room):
Home – 70.6%
Office – 11.8%
Computer room – 11.8%
Treated room – 5.9%

Previous participation in public tests:
For 58.9% of people it was the first time they had participated in such a test, but almost all of them had performed their own blind tests.
5.9% had previously participated in 1 public test
17.6% – in 3 of them
17.6% – in 6 or more
Title: Multiformat listening test @ ~64kbps: Results
Post by: The Sheep of DEATH on 2011-05-05 07:34:00
Hi all. I might be behind on the times, but was this test intended to replace the 80kbps test proposed last year? I ask because I'm unsure whether to continue tuning GXLame (a project I started as a proof of concept and also to be somewhat competitive at lower bitrates). Thanks, and my profoundest apologies if it is deemed that I am going off-topic.
Title: Multiformat listening test @ ~64kbps: Results
Post by: IgorC on 2011-05-05 21:45:52
Hi, DS

Hi all. I might be behind on the times, but was this test intended to replace the 80kbps test proposed last year?

Do you mean the 128 kbps AAC public test that was postponed? It wasn't cancelled entirely.
As far as I know there are many competitive AAC encoders (Nero, CT, Apple, possibly FhG), so the next public test should be AAC-only (maybe at 96 kbps). Only after that can a multiformat public test be conducted.

Those were my thoughts; it could turn out differently. It's up to all of Hydrogenaudio, and maybe other members want to conduct the next test. It would be great if we could all work together, because I couldn't have conducted this test without Garf's help. It is time-consuming and not easy.

I ask because I'm unsure whether to continue tuning GXLame (a project I started as a proof of concept and also to be somewhat competitive at lower bitrates)

There was interest in your tunings; some people have reported in your topic, and I will give it a shot too.
Any available codec can be included in a public test if there is enough interest.
Title: Multiformat listening test @ ~64kbps: Results
Post by: snadge on 2011-06-09 14:08:12
Just want to say thanks for all your hard work doing these tests for us to refer to...

THANK YOU
Title: Multiformat listening test @ ~64kbps: Results
Post by: C.R.Helmrich on 2011-06-15 22:55:37
Hi all. I might be behind on the times, but was this test intended to replace the 80kbps test proposed last year? I ask because I'm unsure whether to continue tuning GXLame (a project I started as a proof of concept and also to be somewhat competitive at lower bitrates). Thanks, and my profoundest apologies if it is deemed that I am going off-topic.

Sheep, allow me to redirect you here (http://www.hydrogenaudio.org/forums/index.php?showtopic=89070&st=0&gopid=759505).

Chris