HydrogenAudio

Hydrogenaudio Forum => Validated News => Topic started by: Sebastian Mares on 2007-08-16 00:00:17

Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 00:00:17
The much-awaited results of the public Multiformat Listening Test @ 64 kbps are ready, at least partially. So far I have only uploaded an overall plot along with a zoomed version; the details will be available tomorrow. You can also download the encryption key on the results page, which is located here:

http://www.listening-tests.info/mf-64-1/results.htm (http://www.listening-tests.info/mf-64-1/results.htm)
http://www.listening-tests.info/mf-64-1/resultsz.png (http://www.listening-tests.info/mf-64-1/resultsz.png)

Nero and WMA Professional 10 are tied, and WMA Professional 10 is tied with Vorbis. Vorbis, however, performed worse than Nero. Of course, the high anchor is best and the low anchor loses.

This one goes to the experts: How would you rank codecs in such a situation, where A=B and B=C, but C<A?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: guruboolez on 2007-08-16 00:06:13
Wow, thanks a lot for posting these results so fast.
WMAPro is competitive against HE-AAC at 64 kbps... a great result for this new format. What were Microsoft's listening test results on this subject? (I forget.)

EDIT: correct link is http://www.listening-tests.info/mf-64-1/results.htm (http://www.listening-tests.info/mf-64-1/results.htm)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: -Nepomuk- on 2007-08-16 00:27:31
Compared to the last 48 kbit/s listening test, 64 kbit/s only brings slightly better results.

iTunes at 96 kbit/s is transparent for most users in both tests.

WMA is not interesting for me.

Nero HE-AAC scored 3.64 points at 48 kbit/s; now we see 3.74 points at 64 kbit/s.
This is not very impressive to me. I thought Nero would perform better at 64 kbit/s.
Of course, it is still usable, e.g. for portable devices or good-quality web radio.

Vorbis is also better at 64 kbit/s (3.16 to 3.32 points).

So I can go with iTunes at 96 kbit/s for high-quality use (maybe Nero performs better at this bitrate?), and 48-64 kbit/s for medium-quality use.

Maybe 80 kbit/s will hit a 4.xx score?

I think the next test should be a 96-112 kbit/s multiformat test, also including LAME.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: rjamorim on 2007-08-16 00:28:38
Very interesting, Sebastian. Congratulations, and thank-you very much!
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 00:32:17
EDIT: correct link is http://www.listening-tests.info/mf-64-1/results.htm (http://www.listening-tests.info/mf-64-1/results.htm)


They're actually both correct, but now I agree that the first form I posted doesn't make sense anymore since the listening tests have their own page. That htaccess redirection was good for the time when the tests were in subfolders of the MaresWEB site.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-16 00:34:51
Nice!

I'm a little surprised that Vorbis is on par with the others. During the test I had a feeling it would be worse. Now I need to check my own results.


A QUESTION:

Pardon my ignorance, but is there any automated way to combine my own decrypted txt results into one table?
(in order to feed it to ff123's ANOVA calculator)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 00:45:24
All you need is Chunky! http://www.phong.org/chunky/ (http://www.phong.org/chunky/)

And if you need a guide:

http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)
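For anyone who prefers scripting the merge kdo asks about, it can be sketched in a few lines. This is a rough sketch, not part of Chunky: it only assumes the "<codec> Rating: <score>" line format shown in the decrypted-result example posted later in this thread, and the function name and the "4L" label are made up for illustration.

```python
import re

def collect_ratings(texts):
    """Collect 'Rating:' lines from a list of ABC/HR result-file texts
    into one table: one dict per file, keyed by codec label.
    The '<codec> Rating: <score>' format is an assumption based on the
    example result snippet posted in this thread."""
    table = []
    for text in texts:
        row = {}
        for m in re.finditer(r"^(\S+) Rating: ([\d.]+)", text, re.MULTILINE):
            row[m.group(1)] = float(m.group(2))
        table.append(row)
    return table

# Illustrative input mimicking a decrypted result file:
example = (
    "2L File: Sample08\\Sample08.wav\n"
    "2L Rating: 4.5\n"
    "2L Comment: blah\n"
    "4L Rating: 1.0\n"
)
print(collect_ratings([example]))  # [{'2L': 4.5, '4L': 1.0}]
```

From there the rows are easy to dump as the tab-separated table the ANOVA calculator expects.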
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-16 00:48:39
All you need is Chunky! http://www.phong.org/chunky/ (http://www.phong.org/chunky/)

And if you need a guide:

http://www.rarewares.org/rja/ListeningTest.pdf (http://www.rarewares.org/rja/ListeningTest.pdf)

Thanks!
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: guruboolez on 2007-08-16 01:21:41
My personal results:
Code: [Select]
WMAPro  high  Vorbis  low  HEAAC
2.3     3.7   2.0     1.0  3.2
2.0     3.0   1.5     1.0  2.5
2.0     2.5   2.5     1.0  1.7
2.8     4.3   3.2     1.5  3.8
2.5     4.5   2.8     1.0  1.8
2.7     2.5   2.0     1.0  1.5
1.8     5.0   1.5     1.0  3.0
1.8     3.5   3.0     1.0  2.2
2.0     3.5   3.0     1.0  2.3
3.5     3.0   3.0     1.0  2.0
2.0     3.0   2.0     1.0  1.7
1.5     2.3   1.3     1.0  1.5
4.0     3.0   3.5     1.0  4.0
3.5     3.0   2.5     1.0  2.8
2.1     1.5   3.0     1.0  2.0
3.0     4.5   2.0     1.5  3.0
1.2     3.5   2.0     1.0  1.5
3.5     3.0   2.0     1.0  2.0

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Tukey's HSD:  0.574

Means:

high    WMAPro  Vorbis  HEAAC    low     
  3.29    2.46    2.38    2.36    1.06 

-------------------------- Difference Matrix --------------------------

        WMAPro  Vorbis  HEAAC    low     
high      0.839*  0.917*  0.933*  2.239*
WMAPro              0.078    0.094    1.400*
Vorbis                      0.017    1.322*
HEAAC                                1.306*
-----------------------------------------------------------------------

high is better than WMAPro, Vorbis, HEAAC, low
WMAPro is better than low
Vorbis is better than low
HEAAC is better than low

For the first time in listening tests my personal results are more egalitarian than the collective ones... no winner nor loser for my ears.

A direct comparison between my average scores and the collective one:

Code: [Select]
          collective   guruboolez  (diff)
low          1.55         1.06     -0.49
HE-AAC       3.74         2.36     -1.38
VORBIS       3.32         2.38     -0.94
WMAPRO       3.52         2.46     -1.06
high         4.59         3.29     -1.30
            ______       ______    ______
             3.34         2.31     -1.03
Compared to the whole group of testers, my global evaluation of all competitors is clearly harsher (-1.03 points on average), especially for the high anchor (-1.3 points) and HE-AAC (the biggest deviation, with -1.38 points). It confirms the lack of sympathy I feel for the SBR trick (there are several complaints in my log files against the "SBR texture/noise"). I'm more disappointed by the high anchor, which doesn't sound great to my ears. I expected more from LC-AAC two years after my previous test at 96 kbps.

WMAPro is a weird case. I'm not familiar at all with this format (I haven't tested it since its last metamorphosis in WMP11) and the new kind of distortion it produces. I disliked it at the beginning but was much more enthusiastic after some time. Indeed, the second half of the tested samples was scored better than the first, while for all other competitors the second half was at best the same. In other words, my grading was harsher during the second half, but WMAPro's scores grew drastically in this severe period.
WMAPro's artifacts were close to HE-AAC's; it has stronger smearing (cf. kraftwerk, eig...) and shares the same kind of SBR-ish issue (noise packets altering tonal sounds, cymbals...), but often with less annoyance. It also has a kind of "noise sharpening" (for people who know this foobar2000 plug-in) which tends to add some energy to the high frequencies. The sound is often a bit brighter than the reference to my ears. It's unexpected, and not necessarily a good thing, but I find it rather pleasant in some situations, and certainly more enjoyable than stereo reduction, pre-echo, lowpass or noise filtering. I simply fear that this kind of enhancement would quickly become tiresome (like noise sharpening, IMO). That's why I wonder whether I would still consider WMAPro so kindly with additional experience of this encoder and its own texture...

I was never fond of Vorbis at <80 kbps, so I'm not surprised to see it inferior to HE-AAC with >95% confidence. It often sounds coarse and fat, with serious stereo issues (and a bit lowpassed too, though a smaller lowpass would maybe increase the ringing...). I'm simply disappointed that, for my taste, no other format could currently outdistance it.


As a consequence I'm disappointed. Maybe I expected a miracle too soon after reading other people's comments. I will see in a future test whether 80 or 96 kbps is more enjoyable for my taste.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-08-16 01:43:25
Compared to the last 48 kbit/s listening test, 64 kbit/s only brings slightly better results.

iTunes at 96 kbit/s is transparent for most users in both tests.

WMA is not interesting for me.

Nero HE-AAC scored 3.64 points at 48 kbit/s; now we see 3.74 points at 64 kbit/s.
This is not very impressive to me. I thought Nero would perform better at 64 kbit/s.
Of course, it is still usable, e.g. for portable devices or good-quality web radio.

Vorbis is also better at 64 kbit/s (3.16 to 3.32 points).

So I can go with iTunes at 96 kbit/s for high-quality use (maybe Nero performs better at this bitrate?), and 48-64 kbit/s for medium-quality use.

Maybe 80 kbit/s will hit a 4.xx score?

I think the next test should be a 96-112 kbit/s multiformat test, also including LAME.


It's technically not valid to compare results between tests, although the ratings differences do seem to make some sense.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: guruboolez on 2007-08-16 01:52:44
It's technically not valid to compare results between tests, although the ratings differences do seem to make some sense.

I think it's not completely pointless to note that both the high and low anchors (which haven't changed in the meantime, the iTunes version excepted) are now rated slightly worse than before (the samples are harder and/or the listeners a bit more sensitive on average). A direct comparison between the 48 kbps and 64 kbps performance should take this difference into account. It slightly increases the difference between the 48 and 64 kbps encodings.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kwanbis on 2007-08-16 01:58:48
How would you rank codecs in such a situation, where A=B and B=C, but C<A?

Not an expert, but at least mathematically, if A=B and B=C, then A=C.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-16 02:00:32
Very interesting. After all, my results are not so different from the average, except that my ratings span a wider range.

Here are my ratings:
Code: [Select]
WMApro    high    Vorbis    low    Nero
% 2.78    4.89    2.78    2.03    3.89

Code: [Select]
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Tukey's HSD:  0.804

Means:

high    Nero    Vorbis  WMApro  low     
  4.89    3.89    2.78    2.78    2.03 

-------------------------- Difference Matrix --------------------------

        Nero    Vorbis  WMApro  low     
high      1.000*  2.111*  2.111*  2.861*
Nero                1.111*  1.111*  1.861*
Vorbis                      0.000    0.750 
WMApro                                0.750 
-----------------------------------------------------------------------

high is better than Nero, Vorbis, WMApro, low
Nero is better than Vorbis, WMApro, low

Kudos to Nero!  A clear winner according to me. I must like the SBR sort of trickery; I ranked it "annoying" only twice.
(And I guess Nero needs some work on the classical orchestra sample "macabre".)

WMA Pro is disappointing. I'm not impressed. All the narrow-stereo problems turned out to be WMA.

Vorbis is not worse than WMA, but it seems to me that it hasn't really improved very much (at this bitrate) over the last couple of years.

Both WMA and Vorbis tend to distort lower frequencies, which is very easy for me to notice on natural acoustic instruments (guitars, violin, trumpet, also voice). Too distorted sometimes, even worse than the low anchor.

(I am not so sensitive to high-frequency artifacts. At least I typically don't find them annoying.)

The high anchor is very good. Almost transparent. However, I didn't really concentrate very much on it; otherwise I could have given it a few more "4"s. But very impressive anyway.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kennedyb4 on 2007-08-16 02:58:22
It seems that iTunes at 96 VBR has outscored iTunes at 128 CBR from the previous multiformat test.

That's a substantial improvement, unless the difficulty of the samples is not comparable.

Guru's results make me think that prolonged exposure to various artifacts might cause scores to drop over time.

Thanks to all organizers and participants.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-08-16 05:42:15
This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?


I think you have to just stick with your description and refer to the graph.  Otherwise the explanation becomes unwieldy.  A=B and B=C because if you repeated the test, there's a fair chance (more than 1 in 20) that A would score higher than B, or that C would score higher than B.  But we say A>C because there's less than a 1 in 20 chance that a repeat test would show the opposite.

BTW, these results do seem to contradict the NSTL results, but they can actually both be consistent because neither yielded a clear winner between nero he-aac and wma pro 10.
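The non-transitive "tie" can be seen directly from how Tukey's HSD is applied: every pair of means is compared against a single threshold. A minimal sketch follows; the means are the collective averages quoted earlier in the thread, but the HSD value of 0.30 is hypothetical (the real overall HSD was not posted), and the function name is invented.

```python
from itertools import combinations

def tukey_verdicts(means, hsd):
    """Compare every pair of codec means against Tukey's HSD threshold.
    Pairs whose means differ by less than the threshold are tied."""
    verdicts = {}
    for (a, ma), (b, mb) in combinations(means.items(), 2):
        verdicts[(a, b)] = "tied" if abs(ma - mb) < hsd else (
            f"{a} > {b}" if ma > mb else f"{b} > {a}")
    return verdicts

# Collective averages from this test; the HSD of 0.30 is hypothetical.
means = {"Nero": 3.74, "WMAPro": 3.52, "Vorbis": 3.32}
v = tukey_verdicts(means, hsd=0.30)
print(v[("Nero", "WMAPro")])    # tied
print(v[("WMAPro", "Vorbis")])  # tied
print(v[("Nero", "Vorbis")])    # Nero > Vorbis
```

Because "tied" just means "closer than the threshold", it is not transitive, which is exactly the A=B, B=C, A>C situation.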
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: vinnie97 on 2007-08-16 07:39:38
Guru, my taste mirrors yours on Vorbis...anything below 80 kbps and the codec is displeasing with the artifacts.  At 80 kbps, without a reference, my tin ears (a place where our similarities vanish) simply couldn't be happier.  *This* is the reason that I request that we stick with the original plan and do an 80 kbps multiformat test next.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Slacker on 2007-08-16 09:17:56
Little Question: How do I use the key to see my results? 
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-16 09:34:11
Little Question: How do I use the key to see my results? 

Open java abc/hr and go to menu Tools/Process result files.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alexxander on 2007-08-16 09:46:48
I used the key, decrypted the results through the Java ABC/HR menu Tools/Process, and got 18 text files. Some of the resulting text files don't include all 5 ratings (I rated all 5 tracks of all 18 samples). Is this some kind of bug?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-08-16 10:07:50
Wow, many more results than I expected!
Thank you, Mares, for organizing the test!
Thanks to all participants for doing the test!

I used the key, decrypted the results through the Java ABC/HR menu Tools/Process, and got 18 text files. Some of the resulting text files don't include all 5 ratings (I rated all 5 tracks of all 18 samples). Is this some kind of bug?

I also suspect that Java ABC/HR has some bugs in processing encrypted results. I just never had time to check.

It seems that Itunes at 96 VBR has outscored Itunes 128 CBR from the previous multi-format test.
That's a substantial improvement unless the difficulty of the samples is not comparable.

Different samples, different participants. Just look at how personal results posted here differ from the average.
Results from different listening tests are just not easily comparable.


How would you rank codecs in such a situation, where A=B and B=C, but C<A?
not an expert, but at leas mathematically if A=B and B=C, A=C.

The operators = and < have a different meaning in this case. If the average score of A is greater than the average score of B, then B=A means there is a chance greater than a threshold x that in another test B could have a higher average score. B<A means that the chance that in another test B is on average better than A is less than x (x is predefined by the procedure used for ranking). This is roughly speaking; the correct definitions would be more complicated.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alex B on 2007-08-16 11:53:36
Here are my personal results:

Code: [Select]
% Sample Averages:
WMA    High    Vorbis    Low    Nero
2.60    4.00    1.70    1.00    3.30
2.00    3.50    2.00    1.00    3.00
2.80    4.00    2.30    1.00    2.70
3.40    4.00    3.10    1.00    3.70
2.40    3.60    2.20    1.00    2.30
2.10    3.50    1.70    1.00    2.50
1.70    2.50    2.00    1.00    1.70
2.20    3.40    3.00    1.00    2.60
1.60    3.20    2.30    1.00    2.60
3.10    3.50    2.80    1.00    2.60
2.60    3.50    2.40    1.00    2.80
1.80    3.40    2.00    1.00    1.80
2.90    3.80    2.30    1.00    2.60
3.00    3.90    2.00    1.00    2.70
2.00    3.70    2.30    1.00    1.70
3.00    4.00    2.10    1.20    2.10
2.30    3.50    2.80    1.00    1.80
3.40    4.00    3.40    1.00    3.10

% Codec averages:
% 2.49    3.61    2.36    1.01    2.53

I too am a bit disappointed. I would have expected a few pleasant surprises where the new codecs reached an almost transparent listening experience. For me, only the high anchor would be usable, even though it is far from transparency.

Out of curiosity, I played some of the samples through my big & good hi-fi speakers. I knew that only headphones can reveal codec problems properly, but I was still surprised at how much better the encoded samples sounded through a standard stereo speaker system in a casual listening situation. I suppose the normal room echoes get mixed with the pre-echo and other codec faults, and the listener's brain subconsciously "calculates" a new "combined acoustic space" which does not sound completely wrong.

WMA Pro behavior is interesting. It clearly produces more distortion than the other encoders (I mean constant distortion like an analog amp produces when it is played too loud) and behaves rather oddly with some samples. Despite these problems it was occasionally the best contender.

When the WMA Pro samples are inspected with an audio analyzer, it looks like the MS developers are very optimistic about how high the frequencies are that their codec can successfully fit into 64 kbps files. WMA Pro uses a lowpass filter at around 20 kHz. However, I suspect that the highest frequency range is more an artificial byproduct of the MS version of "HE" than a real attempt to represent the original sound faithfully. The WMA Pro samples produce quite altered waterfall displays at about 15-20 kHz when compared with the reference.
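The analyzer check described above can be approximated in a few lines: look for the frequency above which the spectrum collapses. This is only a sketch; the function, its threshold, and the synthetic noise standing in for a decoded sample are all invented for illustration.

```python
import numpy as np

def effective_lowpass(signal, rate, drop_db=30.0):
    """Estimate the effective lowpass cutoff of a decoded sample: the
    highest frequency whose spectral magnitude is still within drop_db
    of the median level. A rough sketch of what an analyzer shows."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    floor = np.median(spectrum[spectrum > 0]) * 10 ** (-drop_db / 20.0)
    above = np.nonzero(spectrum > floor)[0]
    return freqs[above[-1]] if len(above) else 0.0

# Synthetic stand-in for a decoded file: noise hard-lowpassed at 16 kHz.
rng = np.random.default_rng(0)
rate, n = 44100, 1 << 16
spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, d=1.0 / rate)
spec[freqs > 16000] = 0
sig = np.fft.irfft(spec, n)
print(round(effective_lowpass(sig, rate)))  # close to 16000
```

Running the same estimate on real decoded samples (via scipy.io.wavfile, say) would show whether the energy near 20 kHz is genuine content or a thin artificial shelf.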


Edit: encoder > contender & a couple of typos
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 12:49:07
I used the key and decrypted results through java abc/hr menu Tools/Process and got 18 text files. Some resulting text files don't include all 5 ratings in text file (I rated all 5 tracks s of all 18 samples). Is this some kind of bug?


  Now this is weird!

OK, I uploaded all user comments - you can either browse here (http://www.listening-tests.info/mf-64-1/miscellaneous/results/) or download everything as signed, solid and locked RAR (http://www.listening-tests.info/mf-64-1/miscellaneous/results.rar). Notice that those were the comments used for evaluating. Please check if you find all five codecs rated in my decrypted result files.

An updated HTML results file will be online this evening.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: thana on 2007-08-16 14:12:42
I downloaded the RAR file and tried to process the results with Chunky, but I always get this error:

Code: [Select]
G:\listeningtest\chunky-0.8.4-win>chunky.exe --codec-file="codecs.txt" -n --ratings=results --warn -p 0.05
Parsing result files...
Traceback (most recent call last):
  File "chunky", line 639, in ?
  File "chunky", line 595, in main
  File "abchr_parser.pyc", line 634, in __init__
  File "abchr_parser.pyc", line 646, in _handleTargets
  File "abchr_parser.pyc", line 697, in __init__
abchr_parser.Error: Sample directory names must end in a number.

but they do end in numbers as you can see:

Code: [Select]
G:\listeningtest\chunky-0.8.4-win>dir
25.05.2004  21:26            49.152 chunky.exe
16.08.2007  15:00                60 codecs.txt
25.05.2004  21:26            45.123 datetime.pyd
25.05.2004  21:26           712.726 library.zip
25.05.2004  21:26           135.234 pyexpat.pyd
25.05.2004  21:26           974.915 python23.dll
16.08.2007  13:40    <DIR>          Sample01
15.08.2007  23:37    <DIR>          Sample02
15.08.2007  23:38    <DIR>          Sample03
15.08.2007  23:38    <DIR>          Sample04
15.08.2007  23:27    <DIR>          Sample05
15.08.2007  23:38    <DIR>          Sample06
15.08.2007  23:39    <DIR>          Sample07
15.08.2007  23:42    <DIR>          Sample08
15.08.2007  23:42    <DIR>          Sample09
15.08.2007  23:42    <DIR>          Sample10
15.08.2007  23:43    <DIR>          Sample11
15.08.2007  23:43    <DIR>          Sample12
15.08.2007  23:43    <DIR>          Sample13
15.08.2007  23:43    <DIR>          Sample14
15.08.2007  23:44    <DIR>          Sample15
15.08.2007  23:44    <DIR>          Sample16
15.08.2007  23:27    <DIR>          Sample17
15.08.2007  23:50    <DIR>          Sample18
25.05.2004  21:26            16.384 w9xpopen.exe
25.05.2004  21:26            49.218 _socket.pyd
25.05.2004  21:26            57.407 _sre.pyd
25.05.2004  21:26           495.616 _ssl.pyd
25.05.2004  21:26            36.864 _winreg.pyd

What am I doing wrong?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-16 14:20:14
i downloaded the rar file and tried to process the results with chunky but i always get this error:

What I did was this: I made a new empty folder and moved all the sample subfolders there, and I also added a switch to Chunky, something like --directory=".\empty_folder"
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alex B on 2007-08-16 15:08:57
i downloaded the rar file and tried to process the results with chunky but i always get this error: ...

The "sample01", "sample 02" etc folders must be inside an empty base folder.

After strugling with the same problem for a while I found that the following worked:

First I saved the "codecs.txt" file in the chunky program folder.

Then I created a subfolder named "res" under my chunky program folder and placed the sample folders inside the empty "res" folder.

After that I opened a command prompt and went to this "res" folder:
C:\Documents and Settings\Alex B>L:
L:\>CD 64test\chunky\res\
L:\64test\chunky\res>

and used this command line:
L:\64test\chunky\res>..\chunky.exe --codec-file=..\codecs.txt -n --ratings=results --warn -p 0.05

(italics=prompt, bold=command line)

Chunky didn't like one of the text lines in the source files:
Unrecognized line: "Ratings on a scale from 1.0 to 5.0"
However, despite the warnings it created apparently correct result files.
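The folder shuffling in both workarounds can be scripted. A rough sketch: the folder names follow the directory listing above, and isolate_sample_dirs is an invented helper, not part of Chunky.

```python
import pathlib
import shutil

def isolate_sample_dirs(base, dest_name="res"):
    """Move the SampleNN folders into an otherwise empty subfolder so
    that Chunky sees only numbered sample directories (the workaround
    described in the posts above). Returns the destination path."""
    base = pathlib.Path(base)
    dest = base / dest_name
    dest.mkdir(exist_ok=True)
    for entry in sorted(base.iterdir()):
        if entry.is_dir() and entry.name.startswith("Sample"):
            shutil.move(str(entry), str(dest / entry.name))
    return dest
```

Run it from the Chunky folder, then invoke chunky.exe from inside the returned directory with the command line quoted above.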
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Rio on 2007-08-16 15:11:39
This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?


I suggest it would be politically (and mathematically) correct that it is like if A>B and B>C then A>C.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-08-16 15:58:36
Stupid question alert:

If I ranked the reference, will the result text file say so? Or will it just not show a result for that file?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 16:00:11
The decrypted result files will then contain the rating you gave for the reference.

Edit: It will look like this:

[...]
2L File: Sample08\Sample08.wav
2L Rating: 4.5
2L Comment: blah
[...]
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: pdq on 2007-08-16 16:07:31

This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?


I suggest it would be politically (and mathematically) correct that it is like if A>B and B>C then A>C.

I would rather say that A>C, and that B is approximately equal to A and approximately equal to C, but it is not necessarily A>B>C, since there is a possibility that either B>A or B<C (but not both).
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benski on 2007-08-16 16:09:36

This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?


I suggest it would be politically (and mathematically) correct that it is like if A>B and B>C then A>C.


No.

There is a chance that A>B but also a chance that A<B.
There is a chance that B>C but also a chance that B<C.
A>C


To rank them, A and B are tied for first.  C is third.
Given the data set, the "true" rank has three possibilities.  ABC, BAC, ACB.  However, more samples would be necessary to determine this.

One thing I've always disliked about these tests is that, given the subjective nature of the ratings, the deviation in participants' rating style is likely larger than the standard deviation.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-08-16 16:13:19
Stupid question alert (again):

I'm trying to work out which samples are which contenders.

I realize number 3 is Vorbis, and that number 4 must be low anchor. But I'm confused about the others...
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: guruboolez on 2007-08-16 16:17:16
You can use MrQuestionMan, foobar2000 or several other tools to check these files:

1: WMAPro (losslessly compressed due to the lack of a WMA CLI decoder)
2: high anchor (iTunes LC-AAC at ~100 kbps)
3: Vorbis (Ogg file format)
4: low anchor (iTunes LC-AAC at 48 kbps)
5: HE-AAC (Nero Digital AAC at ~64 kbps).
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-08-16 16:19:22
1: WMAPro (losslessly compressed)
2: high anchor (LC-AAC at 96 kbps)
3: vorbis (ogg fileformat)
4: low anchor (LC-AAC at 48 kbps)
5: HE-AAC


Thanx.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-08-16 16:20:10
One thing I've always disliked about these tests is that, given the subjective nature of the ratings, the deviation in participants' rating style is likely larger than the standard deviation.


In the analysis, each listener is treated as a separate "block", which takes into account the fact that different listeners have individual rating styles.
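A toy illustration of what "blocking" buys: centering each listener's scores on their own mean removes the rating-style offset that was just raised as a concern, which is (in very simplified form) what treating listeners as blocks accounts for. The function is a sketch, not ff123's actual procedure.

```python
def center_by_listener(ratings):
    """Treat each listener as a block: subtract the listener's own mean
    rating from each of their scores, so that harsh and generous raters
    become directly comparable. A simplified sketch of blocked analysis."""
    centered = []
    for row in ratings:  # one row of codec scores per listener
        mean = sum(row) / len(row)
        centered.append([round(score - mean, 3) for score in row])
    return centered

# A harsh rater and a generous rater with the same codec preference:
print(center_by_listener([[1.0, 2.0, 3.0],
                          [3.0, 4.0, 5.0]]))
# [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```

After centering, both listeners contribute the same relative preference, so differences in absolute rating style no longer inflate the spread between codecs.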
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Whelkman on 2007-08-16 17:26:06
In the analysis, each listener is treated as a separate "block", which takes into account the fact that different listeners have individual rating styles.

Thanks. I wondered about this. I doubt I applied consistent "objective" ratings across the board, but codecs were always ranked compared to each other.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 21:39:05
Does anyone know how to make Excel refer to the current table when creating a plot? I have a document with 19 tables, and I thought about plotting the results for the first sample and then copying and pasting that plot for the other 17, changing only the values. However, if I copy and paste a plot, the pasted plots still refer to the source table. And if I then change the data source, some of the plot formatting is lost, such as the margins, the vertical grid and the grid color.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-16 23:31:31
Uploaded the plots for each sample. The corresponding text is still missing, though there isn't much to say, since all three were tied in almost every case.

Off-Topic: That listening test page needs rework badly. The design could be better and maybe offer some help for newbies.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: mezenga on 2007-08-17 00:17:04
Does anyone know how to make Excel refer to the current table when creating a plot?
Maybe join all 19 tables into one big table and make a single small table for the plot. This small table would switch its content among the 19 blocks of the big table. That would be my approach for a dynamic plot.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-08-17 00:18:39
Interesting.  HE-AAC had some clear wins over WMA Pro 10, whereas there were none the other way around. Poets of the Fall and Bachpsichord are particularly striking. The choice of samples is pretty critical in these tests.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: echo on 2007-08-17 00:36:18

How would you rank codecs in such a situation, where A=B and B=C, but C<A?

not an expert, but at leas mathematically if A=B and B=C, A=C.

Mathematically yes, but this is not math, this is statistics. 

To put it in simple terms, without any statistical talk: A is probably equal to B, and B is probably equal to C, while A is greater than C. Think of "equal" as "roughly equal".
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: rockcake on 2007-08-17 05:32:10
I'd also like to give a big thank-you to Sebastian for organising another test (and publishing the results amazingly quickly!), especially under difficult circumstances, e.g. HDD failure, widespread apathy, moving house, etc.  You're a legend! 
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: TechVsLife on 2007-08-17 07:43:56
I'd also like to give a big thankyou to Sebastian for organising another test (and publishing the results amazingly quickly!), especially under difficult circumstances e.g. HDD failure, widespread apathy, moving house etc. etc. You're a legend!

Or does such superhuman generosity border on insanity? Is his undying fame worth the terrible price he pays--with his very life etc. (Life itself is a 64 kbps lossy compression where you have to pick carefully what to carry to get to a half-decent harmony, but discerning ears will always be able to pick up the falseness, especially in critical passages.)


But seriously, thanks for the hard work, even if insane,
--and how about the next test! (128 kbps mp3?).
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: IgorC on 2007-08-17 08:05:48
Thanks for the test. Nero has done good work.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alexxander on 2007-08-17 09:30:42
Congrats, Nero!

I can't believe WMA Pro 10 is true CBR, because it has good results compared to the VBR samples. If it really is, there would be room for improvement (by going VBR).
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: halb27 on 2007-08-17 10:21:59
... I can't believe WMA Pro 10 is true CBR because it has good results compared to the VBR samples. If it really is there would be room for improvement (by going VBR)

It's rather the other way around. The belief in VBR's universal superiority simply has no good basis. Moreover, there seems to be a common misconception that a constant frame bitrate (CBR) means a constant audio data bitrate, which is simply wrong. Maybe WMA Pro 10 CBR offers a higher degree of audio data bitrate variation than, for instance, MP3 CBR. But even without it, there's really no reason to think that a constant bitrate automatically means reduced quality.
There's no contradiction with the fact that Vorbis, NeroAAC, MPC and LAME 3.98 are good at VBR.
Everything depends on codec principles and, maybe to a larger extent, implementation details.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-08-17 10:22:24
Congrat Nero!
I can't believe WMA Pro 10 is true CBR because it has good results compared to the VBR samples. If it really is there would be room for improvement (by going VBR)

Thanks!

Considering CBR: CBR is more dependent on the choice of samples. It is expected that Nero would perform a bit better on this sample set if CBR at 64 kbps were used (most probably not enough to be statistically better than WMA). From this test it can also be concluded that Nero's VBR mode doesn't have big flaws.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alexxander on 2007-08-17 11:29:09
...Moreover, there seems to be a common misconception that a constant frame bitrate (CBR) means a constant audio data bitrate, which is simply wrong. Maybe WMA 10 Pro CBR offers a higher degree of audio data bitrate variation than, for instance, MP3 CBR...

So CBR actually means constant frame bitrate? I thought CBR referred to a constant audio data bitrate, like plain old PCM: for example, sampling 8000 times per second at fixed intervals with 8 bits per sample. Then, if the frame bitrate is constant but the audio bitrate varies within a frame, it's actually VBR, just on a different timescale. It all depends on the exact definitions and the correct use of terms (as always).

Thanks for clearing up.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Ivan Dimkovic on 2007-08-17 11:40:16
CBR, in this context, means: "a fixed bit rate within a fixed (predictable) period, or a fixed amount of data".

Most "CBR" codecs are actually variable bitrate, but they have a relatively small "bit buffer" which is constant in size and known a priori, and that allows for variation in the frame bit rate. Within those limits, the codec has full freedom to allocate bits.

Even within a single frame, bits are allocated variably, depending on the psychoacoustic threshold, etc.

So, in a nutshell, "CBR" in a modern audio codec is very different from "CBR" in the PCM sense: both frames and individual samples are coded with different, variable accuracies.
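As an illustration of the bit-buffer idea above, here is a toy Python sketch of constant-rate framing with a bounded bit reservoir. All numbers are invented for illustration; no real codec works exactly like this.

```python
# Toy sketch of "CBR" with a bit reservoir. Each frame nominally gets
# frame_budget bits, but the encoder may borrow from a bounded,
# fixed-size reservoir, so individual frames vary in size while the
# long-term rate stays constant. Numbers are illustrative only.

def allocate(demands, frame_budget, reservoir_max):
    reservoir = reservoir_max // 2   # start half full
    allocations = []
    for want in demands:
        # Spend at most the nominal budget plus whatever is banked.
        grant = min(want, frame_budget + reservoir)
        # Unused budget is banked (up to the cap); overspend is borrowed.
        reservoir = min(reservoir + frame_budget - grant, reservoir_max)
        allocations.append(grant)
    return allocations

# A transient (frame 4) demands far more bits than the quiet frames.
demands = [900, 900, 900, 4000, 900, 900]
alloc = allocate(demands, frame_budget=1536, reservoir_max=4096)
print(alloc)  # the transient frame gets well over the nominal 1536 bits
```

The quiet frames bank their unused bits, so the transient can spend far more than the per-frame budget while the stream's average rate stays fixed.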
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alex B on 2007-08-17 12:45:01
Sebastian,

Could you possibly post the average results per sample as a table like this (in the original sample order):

Code: [Select]
WMA    High    Vorbis    Low    Nero
2.60    4.00    1.70    1.00    3.30
2.00    3.50    2.00    1.00    3.00
2.80    4.00    2.30    1.00    2.70
3.40    4.00    3.10    1.00    3.70
2.40    3.60    2.20    1.00    2.30
2.10    3.50    1.70    1.00    2.50
1.70    2.50    2.00    1.00    1.70
2.20    3.40    3.00    1.00    2.60
1.60    3.20    2.30    1.00    2.60
3.10    3.50    2.80    1.00    2.60
2.60    3.50    2.40    1.00    2.80
1.80    3.40    2.00    1.00    1.80
2.90    3.80    2.30    1.00    2.60
3.00    3.90    2.00    1.00    2.70
2.00    3.70    2.30    1.00    1.70
3.00    4.00    2.10    1.20    2.10
2.30    3.50    2.80    1.00    1.80
3.40    4.00    3.40    1.00    3.10


I would like to draw a chart in the following format, but it would be quite laborious to grab the values from the result images.

Alex B's personal results:
(http://www.adart.pp.fi/ha/pix/64_ab_chart.png)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-08-17 13:05:40
Could you possibly post the average results per sample as a table like this (in the original sample order):

It is possible to get that data by running Chunky on the complete test results, which are available as a .rar.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alex B on 2007-08-17 13:37:21
It is possible to get that data by running Chunky on the complete test results, which are available as a .rar.

I already did that, but I wasn't sure if any corrections were needed. Here's what Chunky calculated (please correct me if the values are not right):

Code: [Select]
% Sample Averages:
WMA    High    Vorbis    Low    Nero
3.32    4.78    2.09    1.81    4.21
3.39    4.58    2.21    1.50    3.93
3.84    4.61    3.91    1.22    3.88
3.87    4.70    3.81    1.80    4.27
3.45    4.84    3.25    1.61    3.92
3.21    4.69    2.94    1.37    3.37
2.79    4.55    3.20    1.31    2.78
3.55    4.80    3.36    1.86    4.01
3.30    4.60    3.80    1.47    3.76
4.25    4.47    4.22    1.59    4.33
3.84    4.71    3.73    1.48    3.92
2.92    4.13    2.94    1.45    2.74
3.90    4.47    3.34    1.34    3.85
3.54    4.26    3.29    1.30    3.84
3.16    4.50    3.50    1.50    3.36
3.67    4.86    2.78    2.03    3.63
3.54    4.49    3.78    1.85    3.49
3.87    4.58    3.60    1.41    3.96

% Codec averages:
3.52    4.59    3.32    1.55    3.74


and here's the chart:
(http://www.adart.pp.fi/ha/pix/64_global_chart.png)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-08-17 14:02:38
This looks much better now. Nero is in its place.
The only way to check the values is to compare them to those that Mares published.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: lexor on 2007-08-19 15:44:25
This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?

That is actually not a contradiction as such (though further expert opinion on the actual statistical metric used is needed).

You think that is a contradiction because such a situation doesn't happen in "normal" number systems, like the integers or reals. What you are noticing is the breakdown of total order; not all valuations have that property.

Take the integers: if you pick two integers at random, there is "the" way to count from one to the other, precisely because there is a total order and you know what is less/greater than what, what equals what, and what follows what.
On the other hand, take the complex numbers: this is the first number system students are usually exposed to in school that has no total order on its elements (though school teachers don't usually mention that). Given two random complex numbers there isn't "the" way to count from one to the other; in fact there are infinitely many ways, all correct in some sense.

So while I don't know whether the underlying statistical measure produces a set of values with a total order, your example (if not subject to some freak error) shows that it doesn't, and should be read as the ranking:

1) HE = WMA
2) Ogg

My immediate intuition would be to use equivalence classes to solve this problem.
1) Make individual comparisons between every possible pair
2) Look at the ones with strict inequalities
3) Pick the largest of them all (strictly greater, not >=)
4) Rank that first
5) Add all that are directly equal to it to its equivalence class (not ones that are equal only by some chain of equalities)
6) Remove them from further consideration
7) From the remaining, rank the next largest as number 2
8) Go to 5 and repeat for the rest of the ranking.
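A rough Python sketch of that procedure, under one interpretation of "largest" (the codec with the most strict wins). The pairwise relations below are hypothetical inputs mirroring the thread's A=B, B=C, C<A situation, not the test's actual statistics.

```python
def rank_with_ties(codecs, better, equal):
    """Rank codecs into equivalence classes.

    better(a, b) -> True if a is strictly (significantly) better than b.
    equal(a, b)  -> True if a and b are statistically tied.
    Both would normally come from pairwise significance tests.
    """
    remaining = set(codecs)
    ranking = []
    while remaining:
        # "Largest" interpreted as: most strict wins among the rest.
        top = max(sorted(remaining),
                  key=lambda c: sum(better(c, o) for o in remaining if o != c))
        # Its equivalence class: everything *directly* tied to it
        # (not tied merely through a chain of equalities).
        cls = {top} | {c for c in remaining if c != top and equal(top, c)}
        ranking.append(sorted(cls))
        remaining -= cls
    return ranking

# Hypothetical relations: Nero = WMA, WMA = Vorbis, but Vorbis < Nero.
ties = {("Nero", "WMA"), ("WMA", "Vorbis")}
wins = {("Nero", "Vorbis")}

def better(a, b):
    return (a, b) in wins

def equal(a, b):
    return (a, b) in ties or (b, a) in ties

print(rank_with_ties(["Nero", "WMA", "Vorbis"], better, equal))
```

With these inputs it prints [['Nero', 'WMA'], ['Vorbis']]: Nero and WMA share first place and Vorbis comes second, matching the ranking above.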
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Jillian on 2007-08-19 16:06:06
How about using approximation instead of equality?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: halb27 on 2007-08-19 22:37:08

This one goes to the experts:

How would you rank codecs in such a situation, where A=B and B=C, but C<A?

That is actually not a contradiction as such...

Same opinion here.
For clearly defined objects A, B, C, a clearly defined identity, and a clearly defined '<' relation, it would be a contradiction.
Here A and B correspond to the quality of Nero HE-AAC and that of WMA Pro, and C corresponds to that of Vorbis: quality as measured by this test.
The problem is in the meaning of '=' and '<', as these are rough quality-comparison operators which can easily produce such a pseudo-contradiction.

The zoomed view is a major evil to me, as it overstates such a rough '<' comparison.
From the absolute view, ranging from 1.0 to 5.0, it's easy to say 'all three of these encoders yield roughly the same quality, with Vorbis a tiny bit behind.'
This is what is most important in practice, because with these codecs you usually don't have a choice of which one to use on a mobile device. No matter which one you use, you get state-of-the-art 64 kbps technology regarding quality.

When it comes to elaborating the differences between the encoders, personal preferences matter much more than the overall small quality differences found in the test.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Woodinville on 2007-08-21 20:48:06

How would you rank codecs in such a situation, where A=B and B=C, but C<A?

Not an expert, but at least mathematically, if A=B and B=C, then A=C.



You can not assume transitivity in test results.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Nikaki on 2007-08-22 15:20:08
Interesting results. I only use Vorbis for high bitrates (my own music collection on my PC's hard disk), since I expected low bitrates (for a small, portable mp3 player) to kill Vorbis. Seems like I was wrong.

With those results, low bitrate Vorbis internet streams make more sense now!
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: TechVsLife on 2007-08-22 16:40:43
Wouldn't it make more sense just to go with one of the winners for low bitrates? (Also note that on some of the subtests, Ogg did much worse relative to the others.) Or are there other tradeoffs here?



Interesting results. I only use Vorbis for high bitrates (my own music collection on my PC's hard disk), since I expected low bitrates (for a small, portable mp3 player) to kill Vorbis. Seems like I was wrong.

With those results, low bitrate Vorbis internet streams make more sense now!
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Junon on 2007-08-22 17:10:43
Wouldn't it make more sense just to go with one of the winners for low bitrates?  (Also note that on some of the subtests, ogg did much more poorly relative to the others.)  Or are there other tradeoffs here?

Well, for streaming purposes you're right: at 64 kbit/s, WMA 10 Pro and Nero AAC seem to be better choices than Vorbis here. For low bitrates in general this isn't necessarily the case, because Vorbis, unlike the other two codecs, doesn't rely on tricks to artificially improve the quality: HE-AAC includes Spectral Band Replication, and WMA 10 Pro makes use of a very similar approach. For portable players these techniques can be quite a burden, since they heavily drain batteries. Of course, Vorbis isn't a saint concerning power hunger either, but judging from what I've read so far it isn't as demanding as HE-AAC. I haven't seen any figures on WMA 10 Pro's decoding performance at low bitrates yet, so I'll abstain from commenting further. A Zune owner might be able to shed some light on this matter.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: TechVsLife on 2007-08-22 18:16:17
Thanks, I forgot about that whole dimension to the problem. It's like comparing compression utilities by size reduction and forgetting about speed. It would be useful to have some power consumption index, but I guess that may vary greatly by device.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Woodinville on 2007-08-27 22:31:13
Out of curiousity, why is there no castinettes in this test?  In my recollection pre-echo was a large problem with several of the codecs, and not with some others. It would seem unreasonable to suppress this issue.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: [JAZ] on 2007-08-27 22:46:43
Out of curiousity, why is there no castinettes in this test?  In my recollection pre-echo was a large problem with several of the codecs, and not with some others. It would seem unreasonable to suppress this issue.


AFAIR, it's "castanets", not castinettes, and there are other samples in it that show that issue. You'll definitely find comments about preecho in several test comments (i know, i wrote some)

Edit: typos.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Woodinville on 2007-08-28 18:03:56
Quote

Out of curiousity, why is there no castinettes in this test?  In my recollection pre-echo was a large problem with several of the codecs, and not with some others. It would seem unreasonable to suppress this issue.


AFAIR, it's "castanets", not castinettes, and there are other samples in it that show that issue. You'll definitely find comments about preecho in several test comments (i know, i wrote some)

Edit: typos.


Indeed there are some other sources, but these other sources also have ring tones and such, which mask a lot of pre-echo. I don't doubt you could still hear some. Oh, and thank you for offering the spelling corrections, I am elucidated. Yeah.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-08-28 18:08:39
Well, I asked for sample suggestions long before the test started...
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-08-31 15:28:36
How many of you who took the test used headphones? How many used in-ear phones?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-08-31 15:35:22
headphones for me
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: elmar3rd on 2007-08-31 15:54:09
How many of you who took the test used headphones? How many used in-ear phones?

Cheap Sennheiser HD 60 TV headphones and an AC'97 onboard soundcard.

I was recently thinking of an additional questionnaire in future listening tests, e.g. about listening environment, age, ...
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: IgorC on 2007-08-31 17:19:44
Headphones for 17 samples. Loudspeakers for the White America sample because I couldn't hear the difference in headphones. Masking? Noise canceling?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-08-31 17:44:13
good headphones (mid-price Beyerdynamic)

with some generic on-board sound on a laptop.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-08-31 19:43:26
Headphones: Koss PortaPro
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Alex B on 2007-08-31 20:08:18
I too used the Koss PortaPro headphones, which are quite suitable for testing encoders at this quality level.

...and as I wrote before:
Out of curiosity, I played some of the samples through my big & good Hi-Fi speakers. I did know that only headphones can reveal codec problems properly, but I was still surprised about how much better the encoded samples sounded through a standard stereo speaker system in a casual listening situation. I suppose that the normal room echoes get mixed with pre-echo and other codec faults and the listener's brain "calculates" subconsciously a new "combined acoustic space", which does not sound completely wrong.

These speakers have a price tag of about EUR 2000. The PortaPros are about EUR 50 or less.


Headphones for 17 samples.  Loudspeakers for White America samples because I couldn't hear the diff in headphones. Masking, noise canceling?



My comments about White America:

"I hate this sample. My previous sample was the quiet bibilolo and I had set the volume level louder than normal. I didn't remember to reduce the level before starting this. It was like an explosion inside my eardrums. I hope I didn't damage my hearing...

In general the sample is overcompressed and very distorted. It does not have much that encoders could hide or alter. A bit more pre-echo & distortion does not change the ugly nature of this sample."
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-09-03 10:58:48
It seems to me that in most cases headphones reveal more differences than speakers, and that most participants in public listening tests use headphones.
Also, while doing this listening test, I discovered that in some cases I could hear more differences with the cheap Creative EP-630 earphones than with the Sennheiser HD 650. I guess it is because of the blocking of outside noise (though there was not much outside noise, since I was in a room with doors and windows closed, using a quiet HTPC).
Or maybe it has something to do with neutralizing the effects of head- and pinna-related filtering.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-09-03 11:17:28
It seems to me that in most cases headphones reveal more differences than speakers, and that most participants in public listening tests use headphones.
Also, while doing this listening test, I discovered that in some cases I could hear more differences with the cheap Creative EP-630 earphones than with the Sennheiser HD 650. I guess it is because of the blocking of outside noise (though there was not much outside noise, since I was in a room with doors and windows closed, using a quiet HTPC).
Or maybe it has something to do with neutralizing the effects of head- and pinna-related filtering.


Could it be because the cheaper ones don't produce all frequencies as well/evenly, which reveals high frequency artifacts better...?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-09-03 12:23:51
Could it be because the cheaper ones don't produce all frequencies as well/evenly, which reveals high frequency artifacts better...?

It is true that these headphones have different frequency responses:
http://www.headphone.com/technical/product...ild-a-graph.php (http://www.headphone.com/technical/product-measurements/build-a-graph.php)
http://www.hydrogenaudio.org/forums/index....ost&id=3535 (http://www.hydrogenaudio.org/forums/index.php?act=Attach&type=post&id=3535)
But it is a question whether this is the only thing that causes the different perception of differences.

And the difference between discovering artifacts with speakers or with headphones is, IMO, about more than just the different frequency responses of speakers and headphones. There are examples of double-talk artifacts that sound terrible on headphones and are hard to perceive on speakers. One example is sample 16 in this test.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Whelkman on 2007-09-03 19:28:01
I used a pair of Sennheiser PX 100s (http://www.headphone.com/guide/by-headphone-type/ear-pad-type/sennheiser-px-100.php) for the test.


"I hate this sample. My previous sample was the quiet bibilolo and I had set the volume level louder than normal. I didn't remember to reduce the level before starting this. It was like an explosion inside my eardrums. I hope I didn't damage my hearing...

In general the sample is overcompressed and very distorted. It does not have much that encoders could hide or alter. A bit more pre-echo & distortion does not change the ugly nature of this sample."

I composed something similar but deleted it prior to submission due to its unprofessionalism. Of all the samples I spent significant time on, I had the most difficulty with that one, precisely because the sample itself was so offensive to my ears. Even with the volume levels down, the song remains a screeching blob.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-09-06 13:54:01
All of a sudden, I have got a small question -- about the error bars on all the plots.

If we compare the plots for two different samples, the error bars are shorter for the sample with more listeners. This makes sense. (More listeners --> more representative statistics --> less error)  Ok.

But if we look at just one plot (any one of the plots), it seems the error bars of all 5 contenders have exactly the same size. Are they actually exactly the same? Is it how it's supposed to be due to the design of the test?
Are there any circumstances when error bars could have different size for different contenders?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-09-06 15:29:07
Within a sample plot, all bars should have the same size - always.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: kdo on 2007-09-06 16:06:10
Within a sample plot, all bars should have the same size - always.

Somehow this feels counter-intuitive.

Imagine an extreme case where one contender is rated 3.0 by ALL listeners (i.e., all of them give exactly the same rating), but another contender gets different ratings between 1.0 and 5.0.
Why should the error bars be equal?

(I don't doubt the results, just want to understand a little deeper.)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-09-06 17:20:24
Maybe someone with more knowledge in statistics can answer your question.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: robert on 2007-09-06 18:00:51
Who said all bars should be equal? What do you want the bars to represent?

some boxplot example: http://www.physics.csbsju.edu/stats/box2.html (http://www.physics.csbsju.edu/stats/box2.html)
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-09-06 23:06:13
In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: naylor83 on 2007-09-06 23:11:46
In my results (and Roberto's, Guru's and ff123's), the bars for the various contenders of the same sample will have the same length.


If the bars are supposed to indicate the quartiles they should vary a bit. But I haven't checked what those bars are supposed to be...
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-09-07 03:48:21
For this type of analysis, the error bars are all the same size.  Another way you can do the analysis is to have a different confidence range for every comparison.  So  for the 5 codecs (including the anchors), you would have 10 different numbers.  This can be represented well in matrix table format, but not nicely in a graph format.  If you want to get matrix type confidence ranges, download the bootstrap program from my site:

http://ff123.net/bootstrap/ (http://ff123.net/bootstrap/)

which performs this type of analysis.  In practice, the two types of analyses yield very similar results.
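For readers curious what such an analysis looks like, here is a minimal Python sketch of a bootstrap confidence range for a single codec pair. The per-listener ratings are invented for illustration, and ff123's actual tool is of course more elaborate.

```python
import random

random.seed(1)  # deterministic for the example

# Hypothetical per-listener ratings of two codecs on one sample.
codec_a = [3.5, 4.0, 3.0, 4.5, 3.8, 3.2, 4.1, 3.9]
codec_b = [3.0, 3.6, 3.1, 4.0, 3.2, 2.9, 3.8, 3.5]

def bootstrap_diff_ci(a, b, n_boot=10000, alpha=0.05):
    """Percentile-bootstrap CI for mean(a) - mean(b).

    Listeners are resampled with replacement as pairs, preserving the
    fact that the same listener rated both codecs."""
    n = len(a)
    diffs = []
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_diff_ci(codec_a, codec_b)
# If the whole interval lies above zero, codec A is rated significantly
# higher than codec B at roughly the 95% level.
print(f"difference CI: [{lo:.2f}, {hi:.2f}]")
```

Doing this for every pair of the 5 contenders gives the 10 per-pair confidence ranges mentioned above, which fit a matrix better than a single plot.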
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: robert on 2007-09-07 10:58:53
So the bars do not represent the distribution of the data collected for each codec; for example, one codec could be rated 5.0 by all listeners and you would still add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Moguta on 2007-09-08 02:56:31
I would've loved to see MP3 involved in this test.  We know that Vorbis, AAC, and WMA are better, but just as a comparison it's always interesting to see how the newer, improved codecs rate nowadays against our friendly ol' MP3 fomat, to know exactly how much of an improvement there is.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: ff123 on 2007-09-08 03:38:00
So the bars do not represent the distribution of data collected for each codec, as, for example, you could have one codec rated by all people 5.0 and you'll add bars to it. I find this confusing. What is the meaning of the painted bars? How should I read them?


If the bottom of the bar of one codec does not touch the top of the bar of another codec, you can state with at least 95% confidence that the first codec is better than the second one.

The bars being all the same size means that you might lose a bit of power in making statistical distinctions between codecs. But I think that's more than balanced by having nice, easy-to-look-at pictures instead of tables of numbers.

There are some who assert (and they have a point) that even if there are statistical differences between codecs, it may not make a practical difference if the ratings are relatively close to each other (close being determined by looking at the pictures and making a judgment).
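To illustrate how every codec can get an identically sized bar, here is a sketch of the usual blocked-ANOVA approach (Fisher-LSD style), where the bar half-width comes from the pooled residual error rather than each codec's own variance. The ratings below are invented; this is not the test's actual analysis script.

```python
import math

# rows = listeners, columns = codecs (hypothetical ratings)
ratings = [
    [3.5, 4.5, 3.0],
    [3.8, 4.8, 3.4],
    [3.2, 4.2, 2.9],
    [3.9, 4.6, 3.3],
]
n_listeners = len(ratings)
n_codecs = len(ratings[0])

grand = sum(map(sum, ratings)) / (n_listeners * n_codecs)
codec_means = [sum(row[j] for row in ratings) / n_listeners
               for j in range(n_codecs)]
listener_means = [sum(row) / n_codecs for row in ratings]

# Residual sum of squares after removing listener and codec effects.
ss_resid = sum(
    (ratings[i][j] - codec_means[j] - listener_means[i] + grand) ** 2
    for i in range(n_listeners) for j in range(n_codecs))
df_resid = (n_listeners - 1) * (n_codecs - 1)
mse = ss_resid / df_resid

# Standard error of a codec mean, from the *pooled* MSE: the same
# number for every codec, hence identical bars on the plot. (A real
# analysis would multiply by a t quantile to get the 95% half-width.)
se = math.sqrt(mse / n_listeners)
print(f"shared standard error: {se:.3f}")
```

Because the half-width is computed from the pooled residual, even a codec rated identically by every listener gets the same nonzero bar.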
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: muaddib on 2007-09-26 13:40:29
It seems that iTunes at 96 VBR has outscored iTunes 128 CBR from the previous multi-format test.
That's a substantial improvement, unless the difficulty of the samples is not comparable.
Different samples, different participants. Just look at how the personal results posted here differ from the average.
Results from different listening tests are just not easily comparable.

Sorry for bringing this up again, but I have one more note about this. iTunes 96 kbps VBR was used in this test at 64 kbps and in the previous one at 48 kbps. Some samples were used in both tests, but the score for those samples is not the same (example: Toms Diner 4.70 vs 4.86), even though the decoded sample is the same. Even participants involved in both tests didn't give the same rating (examples: Alex B 4.0 vs 4.2, haregoo 5.0 vs 4.5).
Unfortunately, it is not possible to get fully consistent results.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2007-09-26 14:28:45
Yes, this is normal. It depends on mood, listening conditions (maybe different headphones or a different soundcard, possible noise from the neighbors, etc.), and health (maybe the listener just got over a cold, or still has one while testing).
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benwaggoner on 2008-04-10 23:25:00
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benski on 2008-04-10 23:29:48
Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


I would agree here.  Streaming is the main use so far for 64kbps.  Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benwaggoner on 2008-04-11 02:57:41

Say, are there any plans for doing a new test here?

As I mentioned elsewhere, it's not a good apples-to-apples comparison to pit CBR WMA against quality VBR in other codecs. The WMA family supports quality VBR, as well as 2-pass CBR and bitrate VBR modes.

And for a streaming test, CBR is really the appropriate encoding mode. While fixed quality is an interesting thing to look at, it excludes rate control, which is a very important part of codec design, and a place where a lot of engineering effort goes.


I would agree here.  Streaming is the main use so far for 64kbps.  Low bitrates are interesting for portable devices, but the CPU usage (and hence battery life) of the winners of this test (HE-AAC and WMA Pro) leaves a lot to be desired.

How are you measuring CPU use/battery drain of the codecs? We've done a ton of work for the mobile implementations of WMA Pro to get the CPU hit low enough to make it feasible for phone use. I haven't done any formal testing with recent devices though.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2008-04-11 18:43:32
The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA. Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes are also not an option for short samples: 2-pass encoding must be done on complete tracks, and the samples then have to be extracted from the encoded full tracks.

A pure CBR test could be interesting for streaming indeed.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benwaggoner on 2008-04-11 21:20:28
The reason why WMA was tested in CBR mode is that Microsoft seems to recommend CBR over VBR for WMA.

Do we? Do you have a link - I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.

Quote
Also, IIRC, VBR produced target bitrates that deviated from the average bitrate of the other encoders by more than 10%. 2-pass modes for short samples are also not an option - using 2-pass must be done on complete tracks and then samples have to be extracted out of the encoded full tracks.

Hmmm. How short are the clips you're using? If you can give me a reproducible test for this, I'll pass it on to our engineers. In my experience, VBR audio comes out within 1% of the target, but I'm normally encoding at least 60 second clips.

2-pass VBR peak limited might work better in this case. But if you need to use CBR, at least use 2-pass.

Quote
A pure CBR test could be interesting for streaming indeed.

Great, I'd love to see that as well.

For the WMA codecs, the proper mode to use for that (unless it's a test of live encoders) would be 2-pass CBR. We are able to get a meaningful reduction in peak QP with 2-pass CBR.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: Sebastian Mares on 2008-04-11 21:45:13
The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason for doing so: obtaining the best quality results.

If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.

Could you explain to me what multi-pass CBR is supposed to do? I thought multi-pass encoding was only good for ABR. For CBR you always assign the same number of bits (I don't know if WMA has something like a bit reservoir; in case it does, I imagine that would be the only variable that multi-pass encoding could influence).
As for bitrate-based VBR (which I call ABR), I would prefer to encode full tracks and then extract the sample from that. Otherwise the test has little or no use.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benwaggoner on 2008-04-11 23:28:44
The test performed by NSTL featured WMA in CBR mode. Since you explicitly instructed NSTL what settings to use, one would assume you had a reason why you did this: obtain best quality results.
That test was done before my time, but my understanding is that we used 1-pass CBR in that case, as that was the only rate-controlled mode supported by HE-AAC, and the goal was to have an apples-to-apples test. It was never meant to be a demonstration of best practices. 1-pass CBR is certainly the most challenging codec mode, so it's interesting to test, but it's nothing I use other than for live encoding.

Quote
If that is not the case, well, sorry. IIRC, WMA did not offer a quality based VBR mode that produced files with the target bitrate.
Understood. I just want to help make future tests a more scenario-relevant comparison.

Quote
Could you explain me what multi-pass CBR is supposed to do? I thought multi-pass encoding was good for ABR only. For CBR you always assign the same number of bits (don't know if WMA has something like a bit reservoir -in case it does, I imagine that could be the only variable thing that could be influenced by multi-pass encoding).
Correct. With 2-pass CBR, you're able to essentially request a bigger bit reservoir in advance of complex audio, to keep the worst-case QP lower. With 2-pass VBR, we essentially calculate the QP that will produce the closest to the optimum bitrate, and then vary the per-block QPs a little in order to hit the target. But in essence, an unconstrained 2-pass VBR is a lot like a "magic" way to figure out what quality level to use to give a file of the requested size.

Quote
As for bitrate based VBR (which I call ABR) I would prefer to encode full tracks and then extract the sample from that. Otherwise the test has no or less usage.
Makes sense to me.

Moderation: Fixed quotes.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: hellokeith on 2008-04-12 08:27:31
Quote
Do we? Do you have a link? I'd like to have that corrected. Speaking for Microsoft, I recommend that content that needs CBR be encoded as 2-pass CBR, and that otherwise 2-pass VBR be used. We've done a lot of work around 2-pass audio encoding.


Hi Ben,

Nice to see you here at HA. I think you'll find this place somewhat subdued compared to AVSF.

Interesting that you speak of 2-pass VBR WMA. I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable. I think it is perhaps underrated/underused, though it wasn't trivial to get the VBS command-line options all sorted out. The reason I ended up with ~128 kbps 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.
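For anyone who wants to try this: the options above belong to WMCmd.vbs, the scripted command-line front-end shipped with Windows Media Encoder 9 (hence "VBS"). A full invocation might look like the sketch below; the file names and the script path are placeholders that depend on your installation, so treat this as the shape of the command rather than a verified recipe.

```shell
:: Sketch of a 2-pass VBR WMA 9 Standard encode at a ~128 kbps target.
:: -a_mode 3 selects bitrate-based VBR; -a_setting is
:: bitrate_samplerate_channels. Paths and file names are placeholders.
cscript WMCmd.vbs ^
    -input track.wav -output track.wma ^
    -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2
```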
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: IgorC on 2008-04-12 12:55:33
I tried 1-pass and 2-pass CBR WMA10 at 64 kbit/s in the past. I didn't share the results here. There were miscellaneous changes, but I couldn't ABX the difference.
So even if 2-pass has a bigger reservoir and other kinds of "magic", it seems to make no sense for audio CBR encoding. If anyone disagrees, please provide samples where 2-pass CBR is better than 1-pass for WMA10.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: benwaggoner on 2008-04-13 06:19:00
Quote
Nice to see you here at HA. I think you'll find this place somewhat subdued compared to AVSF.

Thank goodness!

Quote
Interesting that you speak of 2-pass VBR WMA. I have been using -a_codec WMA9STD -a_mode 3 -a_setting 128_44_2 for more than a year with excellent results on my portable. I think it is perhaps underrated/underused, though it wasn't trivial to get the VBS command-line options all sorted out. The reason I ended up with ~128 kbps 2-pass VBR WMA was that during my testing, I found it maintained the best stereo imaging during intricate percussion/cymbal passages.

Cool, glad it's working out for you.

I'd probably recommend using -a_mode 4 and setting a peak bitrate instead of leaving it entirely unconstrained, since devices may have a maximum supported rate. For Zune, it's 320 kbps for audio-only files and 192 kbps for soundtracks in WMV files, IIRC.

Stuff like stereo separation is a great thing to use VBR for, since it gets you the bits where you need them. I think people spend so much time sweating the hard clips that they can miss that most of any full track isn't that hard.
Title: Multiformat Listening Test @ 64 kbps - FINISHED
Post by: vinnie97 on 2008-04-14 08:30:17
I'm still anxiously awaiting the forthcoming ~80 kbps multiformat test, especially now that Aoyumi has just released aoTuV beta 5.5 to breathe more life into Vorbis.