Topic: Multiformat Listening Test @ 48 kbps - FINISHED

Multiformat Listening Test @ 48 kbps - FINISHED

Reply #25
... German Speech sample (average bitrate 31 kbps), and the codec still performed best on average, even compared to contenders at much higher bitrates ...
Interesting that you mentioned this sample. Surprisingly, I found all contenders unusable on it, even though it is "only" a speech sample. The strong echo effect made the speech unpleasant and very different from the original, as if the speaker had been moved into a cave. I actually found the low anchor better than the contenders on this sample.

My results:

Code:
ABC/HR for Java, Version 0.52b, 08 December 2006
Testname: Sample16: spmg54_1

Tester: Alex B

1R = Sample16\Sample16_1.wav
2L = Sample16\Sample16_5.wav
3L = Sample16\Sample16_4.wav
4L = Sample16\Sample16_6.wav
5L = Sample16\Sample16_3.wav
6R = Sample16\Sample16_2.wav

Ratings on a scale from 1.0 to 5.0

---------------------------------------
General Comments:
---------------------------------------
1R File: Sample16\Sample16_1.wav
1R Rating: 1.0
1R Comment: strong echo
---------------------------------------
2L File: Sample16\Sample16_5.wav
2L Rating: 4.5
2L Comment:
---------------------------------------
3L File: Sample16\Sample16_4.wav
3L Rating: 1.4
3L Comment: some echo
---------------------------------------
4L File: Sample16\Sample16_6.wav
4L Rating: 1.8
4L Comment: low anchor - most lowpassed, but more pleasant than the echo effect that the other contenders produce
---------------------------------------
5L File: Sample16\Sample16_3.wav
5L Rating: 1.3
5L Comment: distorted, some echo
---------------------------------------
6R File: Sample16\Sample16_2.wav
6R Rating: 1.3
6R Comment: echo
---------------------------------------

ABX Results:
Original vs Sample16\Sample16_5.wav
    8 out of 8, pval = 0.0030


---- Detailed ABX results ----
Original vs Sample16\Sample16_5.wav
Playback Range: 06.662 to 09.154
    2:15:23 AM p 1/1 pval = 0.5
    2:15:47 AM p 2/2 pval = 0.25
    2:15:57 AM p 3/3 pval = 0.125
    2:16:02 AM p 4/4 pval = 0.062
    2:16:06 AM p 5/5 pval = 0.031
    2:16:10 AM p 6/6 pval = 0.015
    2:16:16 AM p 7/7 pval = 0.0070
    2:16:30 AM p 8/8 pval = 0.0030
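For readers new to ABX: the pval column in the log above appears to be the one-sided binomial probability of scoring at least that well by pure guessing (each trial a 50/50 coin flip). For an all-correct run of n trials that is 1/2^n, so 7/7 is 1/128 ≈ 0.0078 and 8/8 is 1/256 ≈ 0.0039; the log's 0.0070 and 0.0030 look like rounded-down versions of those values. A minimal sketch, where the function name `abx_pvalue` is purely illustrative and not part of ABC/HR:

```python
from math import comb


def abx_pvalue(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials by pure guessing (one-sided binomial, p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


# Running totals as in the detailed log: every answer correct so far.
for n in range(1, 9):
    print(f"{n}/{n}  pval = {abx_pvalue(n, n):.4f}")

# A run with mistakes, e.g. 12 correct out of 16 trials.
print(f"12/16  pval = {abx_pvalue(12, 16):.4f}")
```

The same function covers runs with mistakes, which is where a tool's p-value actually matters more than the simple 1/2^n rule.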

Reply #26
Great test, thanks!
Two interesting points:

• iTunes AAC CBR at 96 kbps as the high anchor gets the same score as iTunes AAC VBR at 128 kbps did when it was tested as a competitor.

The obvious reason for this is the low overall quality. With most samples I found iTunes 96 kbps to be in the 3-4 range instead of 4.5 or above. I had no difficulty distinguishing it. I ABXed the high anchor a few times, but that only confirmed what I already knew after the initial listening. In my opinion the codecs in the 128 kbps test were clearly better and almost transparent on many samples.

Quote
• Quality varies a lot between samples. This means that at such a low bitrate, VBR doesn't guarantee constant quality.
eig, aquatisme, spmg54 and bibilolo were all ranked under 3.0/5 for all competitors (the one exception being bibilolo at 3.16 with Vorbis). On the other hand, samples like locomotive breath, symphony metal, debussy and white america are close to transparency.
In other words, current encoders can sound pretty good but also pretty poor. Quality is simply unstable, and VBR doesn't really help.

I agree, the quality is quite unstable and in my opinion generally too low for a pleasant listening experience, even on a typical portable device. I guess that e.g. 64 kbps would already be much better for these encoders.

Reply #27
@Alex B> You're not the only one to dislike the coding effects of all competitors on the German spoken voice. The average mark for this sample is very low (maybe the lowest of all 20 samples). That's really interesting: voice is often considered "non-complex", easy-to-encode material. As a consequence, ultra-low bitrates are often used for DVD ripping and are de facto considered "good enough" for this kind of material. This sample is probably not representative of DVD soundtracks; nonetheless, it should warn people about the damage such low bitrates can do to video DVD rips. From my experience, I must say that I have often heard such very disturbing artefacts on soundtracks encoded at this bitrate.

> about iTunes AAC@96: I was tempted myself to lower its mark to match the ABC/HR scale (i.e. below "perceptible but not annoying"). But I didn't, in order to keep some headroom for ranking the other competitors. Therefore I didn't bother with the high anchor and gave it a high score each time. I wouldn't give this encoder the same score in a different situation (like a 96 kbps listening test or, worse, a 180 kbps test with iTunes@96 as the low anchor). This limits cross-comparison between different listening tests.

Reply #28
I also ranked that one very low (my highest was 1.5); it was pretty unbearable. Even with ABX I had such a hard time picking out iTunes at 96 that I just stopped trying so hard on it. I generally ranked it 4-4.5 (when I bothered). Guess my ears aren't as golden as guruboolez's.

How did you rank TomsDiner though, guru?

Opposite for me. At ~96 kbps, Vorbis does a better job than HE-AAC IMO. I've only tested it with 2 songs, but Vorbis still outperformed HE-AAC.


You do realize that Vorbis was tested at 48kbps, don't you? HE-AAC has been shown to not really be worth it above about 80kbps.

Reply #29
How did you rank TomsDiner though, guru?


Code:
ABC/HR for Java, Version 0.52b, 07 December 2006
Testname: Sample05: TomsDiner

Tester:

1L = Sample05\Sample05_5.wav
2R = Sample05\Sample05_6.wav
3R = Sample05\Sample05_4.wav
4L = Sample05\Sample05_1.wav
5L = Sample05\Sample05_2.wav
6L = Sample05\Sample05_3.wav

Ratings on a scale from 1.0 to 5.0

---------------------------------------
General Comments:
---------------------------------------
2R File: Sample05\Sample05_6.wav
2R Rating: 2.5
2R Comment: very smooth sound; less aggressive to my ears than the reference. Nonetheless not acceptable as coding tool...
---------------------------------------
3R File: Sample05\Sample05_4.wav
3R Rating: 3.5
3R Comment:
---------------------------------------
4L File: Sample05\Sample05_1.wav
4L Rating: 3.5
4L Comment:
---------------------------------------
5L File: Sample05\Sample05_2.wav
5L Rating: 3.0
5L Comment:
---------------------------------------
6L File: Sample05\Sample05_3.wav
6L Rating: 2.5
6L Comment:
---------------------------------------

ABX Results:


Do we know which number corresponds to each encoder?

Reply #30
WMA Std. performs very well compared to iTunes, especially considering that, theoretically, there is nothing in the WMA Std. format that would provide better efficiency than AAC-LC.

Stanley, if you are reading this: no SBR and no intensity stereo? Obviously you're going to rank well below the competitors, and I'm wondering whether anything is planned in this regard.
(yes, Lame is in the exact same situation, with no code to handle low bitrates...)

Reply #31
I agree, the quality is quite unstable and in my opinion generally too low for a pleasant listening experience, even on a typical portable device. I guess that e.g. 64 kbps would already be much better for these encoders.


I agree with that guess, especially in the case of aoTuV beta 5. After the codec's release I ABXed a few samples to find a bitrate that suits my portable player. Although my hearing is untrained (I haven't done much ABXing so far), it wasn't too hard to distinguish the 48 kbps samples from the originals. But I already had serious trouble hearing differences between most 64 kbps samples and the .wav source files, which led me to choose this bitrate for the portable player. Too bad I don't have the logs anymore; they would have been good examples of the improvement in quality from -q-1 to -q0.


Besides, thanks @everyone for the comments on my statement about CBR vs. VBR in my first reply. I was under the apparently false impression that VBR is always an indicator of better quality, so I didn't even bother carrying out any tests to arrive at an objective conclusion.

Reply #32
The link printed in the graph is broken.


No it's not.  Apache is set to redirect to the correct document and it works fine for me.

Do we know which number corresponds to each encoder?


Damn, deleted my list already and I forgot to post. IIRC, it should be:

1, Vorbis
2, Nero
3, WMA Std.
4, WMA Pro.
5, High Anchor
6, Low Anchor


Reply #34
 
How could it be obvious? Testers have only been able to discover their own results since the encryption key was published (i.e. a few hours ago).

Reply #35
For me, the contents of a sample zip file made the order obvious:

Code:
#  Archive U:\48test\Sample01.zip
2006-11-19 23:54        Folder        Folder  Sample01
2006-11-19 23:49       4880852       4881597  Sample01\Sample01.wv
2006-11-19 23:17        258820        256617  Sample01\Sample01_1.ogg
2006-11-19 23:19        275938        269579  Sample01\Sample01_2.mp4
2006-11-19 23:48       3869678       3870273  Sample01\Sample01_3.wv
2006-11-19 23:48       4865678       4866423  Sample01\Sample01_4.wv
2006-11-19 23:33        570634        520673  Sample01\Sample01_5.m4a
2006-11-19 23:39        310726        260186  Sample01\Sample01_6.m4a
#
# Total                   Size        Packed  Files
#                     15032326      14925348  8

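The numbers 5 (high anchor) and 6 (low anchor) are obvious (they are the two .m4a files, and the high anchor is the larger one), and so are 3 and 4 (WMA Standard & Pro), the only contenders shipped as .wv files.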

Also, the test results are presented in the same order (1-6).
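The deduction from the zip listing can be written down as a small script. This is only a sketch of the reasoning in the post: the mapping from file extension to encoder is an assumption taken from this thread, `guess_roles` and `EXT_GUESS` are hypothetical names, and it only works on the download package, not on the scrambled layout used during the actual ABC/HR session.

```python
import re

# (file name, unpacked size in bytes) copied from the Sample01.zip listing above.
LISTING = [
    ("Sample01.wv", 4880852),
    ("Sample01_1.ogg", 258820),
    ("Sample01_2.mp4", 275938),
    ("Sample01_3.wv", 3869678),
    ("Sample01_4.wv", 4865678),
    ("Sample01_5.m4a", 570634),
    ("Sample01_6.m4a", 310726),
]

# Assumed meaning of each extension, following the reasoning in the post.
EXT_GUESS = {"ogg": "Vorbis", "mp4": "Nero", "wv": "WMA (Std. or Pro)"}


def guess_roles(listing):
    """Map contender number -> guessed encoder, using only names and sizes."""
    roles = {}
    m4a = []  # (size, number) for the two .m4a anchors
    for name, size in listing:
        m = re.fullmatch(r"Sample\d+_(\d)\.(\w+)", name)
        if m is None:
            continue  # no _N suffix: this is the reference
        number, ext = int(m.group(1)), m.group(2)
        if ext == "m4a":
            m4a.append((size, number))
        else:
            roles[number] = EXT_GUESS.get(ext, "unknown")
    if len(m4a) == 2:
        # Same clip, so the bigger file should be the higher-bitrate anchor.
        (_, low), (_, high) = sorted(m4a)
        roles[high] = "high anchor"
        roles[low] = "low anchor"
    return dict(sorted(roles.items()))


for number, role in guess_roles(LISTING).items():
    print(number, role)
```

Note that the listing alone cannot tell WMA Std. from WMA Pro; that part still relies on the list posted earlier.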


Reply #37
The testers should not know which number is which encoder, otherwise they are at risk of bias.

No risk of that, thanks to the well-conceived ABC/HR methodology.

Anyway, it's good that over 20 results per sample were submitted in only 2 weeks. We can expect the next test to go well.

Reply #38
For me, the contents of a sample zip file made the order obvious:

Indeed, it can't be more obvious... I'm a nut 

Reply #39

The testers should not know which number is which encoder, otherwise they are at risk of bias.

No risk of that, thanks to the well-conceived ABC/HR methodology.

As haregoo said, there's no risk. In the actual test the order is scrambled and cannot be determined by examining the sample files.
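How ABC/HR for Java randomizes its layout internally isn't described in this thread. As a hypothetical illustration of the idea, the slot numbers a tester sees can simply be a per-session shuffle of the file numbers, which is why the logs above show slot 1 holding Sample16_1 but slot 2 holding Sample16_5. The helper name `scramble_slots` is made up; only Python's standard `random` module is used.

```python
import random


def scramble_slots(files, seed=None):
    """Hypothetical helper: assign the coded files to numbered test slots in a
    random order and return the slot -> file map a results log would list."""
    rng = random.Random(seed)  # seeded per session so the layout can be replayed
    order = list(files)
    rng.shuffle(order)
    return {slot: name for slot, name in enumerate(order, start=1)}


files = [f"Sample16_{i}.wav" for i in range(1, 7)]
layout = scramble_slots(files, seed=2006)
for slot, name in layout.items():
    print(slot, name)
```

Since the tester only sees slot numbers, knowing which file number belongs to which encoder does not reveal which slot is which during grading.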

Reply #40
It's not directly stated what is required and what is optional.

That tutorial should probably be updated to at least show screenshots of abchr-java.  ABX is emphasized so much on HA that it may come as a surprise to some people that not all double-blind tests require this type of repeated result.


It would be nice if next time all conditions were described _clearly_ in readme.txt, because not everyone can follow all the tips that were discussed in the topics.

Reply #41


The link printed in the graph is broken.


No it's not.  Apache is set to redirect to the correct document and it works fine for me.

Thanks, working now 


I did not change anything, so I wonder why it didn't work before but does now.

It would be nice if next time all conditions were described _clearly_ in readme.txt, because not everyone can follow all the tips that were discussed in the topics.


Nowhere did I write that ABX tests are mandatory in order to participate.

Reply #42
Nowhere did I write that ABX tests are mandatory in order to participate.

One could just as validly say:
Nowhere did you write that the ABX test can be omitted.....



Reply #45
@Alex B> You're not the only one to dislike the coding effects of all competitors on the German spoken voice. The average mark for this sample is very low (maybe the lowest of all 20 samples). That's really interesting: voice is often considered "non-complex", easy-to-encode material. As a consequence, ultra-low bitrates are often used for DVD ripping and are de facto considered "good enough" for this kind of material. This sample is probably not representative of DVD soundtracks; nonetheless, it should warn people about the damage such low bitrates can do to video DVD rips. From my experience, I must say that I have often heard such very disturbing artefacts on soundtracks encoded at this bitrate.


Agreed.

The German Speech sample is a typical example of a sound sample that generates "double-speak" artifacts, which are actually a type of pre-echo artifact.

TNS was actually invented mainly with this problem in mind, so speech samples like this are by no means easy to encode.
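For anyone wondering how TNS (temporal noise shaping) attacks pre-echo: its core is linear prediction run across frequency instead of time. A sharp event in time shows up as a smooth, predictable oscillation across the spectral coefficients, so the encoder can code the prediction residual, and the decoder's inverse filter then shapes the quantization noise in time to follow the signal's envelope instead of smearing it ahead of the attack. The sketch below is a simplified illustration under stated assumptions, not the AAC tool itself: a plain DCT-II stands in for the codec's MDCT, the function names are made up, and coefficient quantization and per-band frequency ranges are left out.

```python
import math


def dct2(x):
    """Plain DCT-II, used here as a stand-in for the codec's MDCT."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion. Returns a[0..order] with a[0] = 1 so that
    e[k] = sum_j a[j] * X[k - j] is the prediction error across frequency."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def tns_encode(spec, a):
    """Encoder side: FIR prediction-error filter run along the spectrum."""
    return [spec[k] + sum(a[j] * spec[k - j] for j in range(1, len(a)) if k - j >= 0)
            for k in range(len(spec))]


def tns_decode(res, a):
    """Decoder side: the inverse (all-pole) filter restores the spectrum."""
    out = []
    for k in range(len(res)):
        out.append(res[k] - sum(a[j] * out[k - j] for j in range(1, len(a)) if k - j >= 0))
    return out


N = 64
frame = [0.0] * N
frame[40] = 1.0  # a click late in the frame: the classic pre-echo situation
spec = dct2(frame)
a = levinson(autocorr(spec, 4), 4)
res = tns_encode(spec, a)
gain = sum(v * v for v in spec) / sum(v * v for v in res)
print(f"prediction gain across frequency: {gain:.1f}x")
```

The large prediction gain on a click is exactly the case TNS targets; on a stationary tone the spectrum is peaky rather than oscillating, the gain stays small, and an encoder would simply leave TNS off.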

Reply #46
> about iTunes AAC@96: I was tempted myself to lower its mark to match the ABC/HR scale (i.e. below "perceptible but not annoying"). But I didn't, in order to keep some headroom for ranking the other competitors. Therefore I didn't bother with the high anchor and gave it a high score each time. I wouldn't give this encoder the same score in a different situation (like a 96 kbps listening test or, worse, a 180 kbps test with iTunes@96 as the low anchor). This limits cross-comparison between different listening tests.


I wonder whether most participants did the same, i.e. didn't bother giving the high anchor an accurate score and just gave it 5?
And was it also the case that most people didn't bother and just gave the low anchor a 1?


Reply #48
I wonder whether most participants did the same, i.e. didn't bother giving the high anchor an accurate score and just gave it 5?
And was it also the case that most people didn't bother and just gave the low anchor a 1?

Not me, I certainly didn't. My poor ears had a hard time distinguishing even the HE-AAC contender from the reference in most cases. The low anchor sounded much crappier than the contenders IMHO, except perhaps on 1 sample.

Thanks for this great test, Sebastian!

Reply #49
WMA Std. performs very well compared to iTunes, especially considering that, theoretically, there is nothing in the WMA Std. format that would provide better efficiency than AAC-LC.



Out of curiosity, how do you know that, and could you point me to the documentation?
-----
J. D. (jj) Johnston

 