Topic: Multiformat Listening Test @ 128 kbps - FINISHED

Multiformat Listening Test @ 128 kbps - FINISHED

Reply #175
Quote
I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks?  If not, then I might have a look at it again if I have time.  But if it's been fixed, then I won't have to worry.

There's still headroom for progress, I'd say. During my [url=http://www.hydrogenaudio.org/forums/index.php?showtopic=38792]last listening evaluation[/url], Vorbis was still unsharp and noisy on micro-attack/short-impulse samples. Most often the bitrate doesn't go very high on such samples (Vorbis tends to inflate the bitrate when needed - but not here).
Try with this sample for a good start


Reply #176
Some statistical analysis by AMTuring about this listening test...


@Guru: very nice avatar, btw...


Reply #177
Quote
Some statistical analysis by AMTuring about this listening test...

Oh please, can we keep the crackpot science out of this forum? This person wasn't banned here for no reason. Just juggling scientific words around doesn't magically make anything you say sensible, let alone correct.

Quote
It is well known that some songs are more difficult to encode than others, and they result in lower quality encoded files regardless of the encoder used. So the assumption of equal means amongst experiments is violated.

Bzzzt. This was a VBR test. Meaning, although the average was 128 kbps (or slightly more), the codecs could spend as many bits as necessary to keep all clips at a constant quality. This means you cannot immediately assume the means aren't equal; in fact, it should be the opposite.

So, what happens if we actually look at the data? (Note that he provides many graphics to 'illustrate' his points, except the ones where, well, the data doesn't support his claims anywhere.) The variance of the means of the samples is much less than the difference between the codecs themselves (excluding Shine, which is CBR).

In other words, VBR works. I would have thought that that was "well known" by now.

Quote
The following table shows the Tukey HSD applied to the ranks.

Say what? You cannot apply plain Tukey HSD to rank scores; it's a parametric test. Now, I'm willing to argue that we shouldn't use parametric analysis (because the top end of the results clips at 5.0, and you can see this by observing that the lower rated the codec, the higher the variance). However, if anything, parametric analysis gives stronger results. If you want to use rank scores, let's actually use the rank-score version of Tukey HSD to analyze the results:

Code: [Select]
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Nonparametric Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Nonparametric Tukey's HSD:  25.894

Ranksums:

Vorbis  iTunes  WMA      Nero    LAME   
 73.00    62.50    49.50    48.50    36.50 

-------------------------- Difference Matrix --------------------------

        iTunes  WMA      Nero    LAME   
Vorbis    10.500  23.500  24.500  36.500*
iTunes            13.000  14.000  26.000*
WMA                          1.000  13.000 
Nero                                12.000 
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Gee, where did those "extra" conclusions go?
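For reference, the rank sums in that table come from ranking each listener's scores and summing per codec. Here's a minimal Python sketch with synthetic ratings (the listener/codec layout matches the test, but the scores are made up, not the real test data):

```python
import random

def rank_with_ties(values):
    """Rank one listener's scores (1 = lowest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(ratings):
    """ratings: one row of scores per listener, one column per codec."""
    sums = [0.0] * len(ratings[0])
    for row in ratings:
        for codec, r in enumerate(rank_with_ties(row)):
            sums[codec] += r
    return sums

# Synthetic example: 18 listeners x 5 codecs, like the test layout.
random.seed(0)
ratings = [[round(random.uniform(4.0, 5.0), 1) for _ in range(5)]
           for _ in range(18)]
sums = rank_sums(ratings)
# Rank sums over all codecs always total n_listeners * k * (k + 1) / 2.
assert abs(sum(sums) - 18 * 5 * 6 / 2) < 1e-9
```

A pair of codecs is then declared different only when the absolute rank-sum difference exceeds the nonparametric HSD (25.894 in the output above), which is why only the two starred entries survive.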
Let's compare this to the means with parametric Tukey HSD:

Code: [Select]
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Tukey's HSD:  0.110

Means:

Vorbis  iTunes  WMA      Nero    LAME   
  4.79    4.74    4.70    4.68    4.60 

-------------------------- Difference Matrix --------------------------

        iTunes  WMA      Nero    LAME   
Vorbis    0.049    0.090    0.106    0.193*
iTunes              0.041    0.056    0.143*
WMA                          0.016    0.103 
Nero                                  0.087 
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Coincidence? Hardly. If you derive the rank scores from the means, how can you expect a different conclusion? How would you expect throwing away information to increase the significance? It won't, unless you use a completely wrong analysis method.


Reply #178
Maybe it is a good idea to post your reply on Doom9 also, Garf.


Reply #179
I posted a reply on doom9, suggesting he use bootstrap resampling if he's hard up to analyze the data some more.
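For anyone who wants to try that suggestion: a minimal sketch of bootstrap resampling over listeners, using made-up per-listener score differences (not the test data) to get a confidence interval on the mean difference between two codecs:

```python
import random

def bootstrap_ci(diffs, n_boot=10000, alpha=0.05, seed=1):
    """Percentile CI for the mean of per-listener score differences,
    resampling listeners with replacement."""
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    return (means[int(n_boot * alpha / 2)],
            means[int(n_boot * (1 - alpha / 2)) - 1])

# Made-up per-listener differences, score(A) - score(B), 18 listeners:
diffs = [0.3, 0.1, 0.0, 0.4, 0.2, -0.1, 0.3, 0.2, 0.1, 0.0,
         0.2, 0.3, -0.2, 0.1, 0.4, 0.0, 0.2, 0.1]
low, high = bootstrap_ci(diffs)
# If the interval excludes 0, the difference holds up under resampling.
```

The nice part is that this makes no distributional assumptions at all, which sidesteps the clipping-at-5.0 problem entirely.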

ff123


Reply #180
Quote
Quote
Some statistical analysis by AMTuring about this listening test...

Bzzzt.


I like this 


Reply #181
Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure it's reasonable to normalize like that (it assumes that the quality of a codec scales 1:1 with bitrate), but it's an interesting thought.
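That normalization, as a formula: scale each mean score by target bitrate over actual mean bitrate. (The 1:1 quality-per-bitrate assumption is the caveat above, and the example bitrate here is a hypothetical placeholder, not a figure from the test.)

```python
def normalize(score, actual_kbps, target_kbps=128.0):
    """Scale a mean score to a common target bitrate, assuming
    quality scales linearly (1:1) with bitrate."""
    return score * target_kbps / actual_kbps

# e.g. a codec scoring 4.74 at a hypothetical 135 kbps mean bitrate
# gets penalized for spending extra bits:
adjusted = normalize(4.74, 135.0)
```

So a codec that overshot the 128 kbps target sees its score pulled down, and one that undershot sees it pulled up.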


Reply #182
Quote
Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure it's reasonable to normalize like that (it assumes that the quality of a codec scales 1:1 with bitrate), but it's an interesting thought.


The only fair way to test this would be to use CBR.


Reply #183
Hi!

I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.
Then I realized that the Vorbis or AAC quality settings must be wrong. First, Vorbis uses a comma and not a dot for the decimal separator.
Second, the Ogg file is smaller than the AAC file, approx. 0,5 - 0,9 MB less.
Then I looked at the average bitrates in foobar2000; the Ogg bitrate is 20 kbit lower than the AAC.

e.g.: 158 kbps vs 138 kbps

I encoded the same track and another track with the Vorbis -q 4,99 setting, and the bitrate difference was 1-2 kbit and the file size difference was 0-0,1 MB.

Overall, your tests don't give any indication of which codec is better if the codec settings are wrong.
Wavpack -hh or TAK -pMax
OggVorbis aoTuVb6.03 -q 4


Reply #184
The Vorbis command line uses a comma or a dot depending on the localization of the OS where it is running.
As you found yourself, if the wrong setting had been used, the bitrate would have come out very different.


Reply #185
Quote
Second, the Ogg file is smaller than the AAC file, approx. 0,5 - 0,9 MB less.
Then I looked at the average bitrates in foobar2000; the Ogg bitrate is 20 kbit lower than the AAC.

e.g.: 158 kbps vs 138 kbps

Overall, your tests don't give any indication of which codec is better if the codec settings are wrong.
Given that many modern codecs perform better (or only) in VBR or ABR modes, listening tests reporting a bitrate, such as this one, are generally based on the principle of tuning the codec settings so that the mean bitrate over the set of audio files being tested lands on that bitrate, or as close to it as possible. Thus, variation between different encodes of one file is to be expected and does not invalidate the methodology (inasmuch as there doesn't seem to be another solution; this is as close to fair as is possible).
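That tuning process can be sketched as a simple bisection over the quality parameter. Note that `mean_bitrate` below is a hypothetical stand-in; a real run would encode every corpus file at that quality setting and average the resulting bitrates:

```python
def mean_bitrate(quality):
    # Hypothetical monotonic quality -> kbps curve, for illustration
    # only; stands in for "encode the whole corpus and average".
    return 64.0 + 20.0 * quality

def tune_quality(target_kbps, lo=0.0, hi=10.0, tol_kbps=0.5):
    """Bisect the quality parameter until the corpus mean bitrate is
    within tol_kbps of the target (assumes bitrate rises with quality)."""
    while True:
        q = (lo + hi) / 2
        kbps = mean_bitrate(q)
        if abs(kbps - target_kbps) <= tol_kbps:
            return q
        if kbps < target_kbps:
            lo = q
        else:
            hi = q

q = tune_quality(128.0)
```

The per-file bitrates at the chosen setting then scatter above and below 128 kbps, which is exactly the variation between individual encodes described above.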


Reply #186
Quote
I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.

Sorry, but which test are we talking about? The test discussed in this thread is MP3 only.

Edit: Thanks lvqcl! Must be http://soundexpert.org/encoders-128-kbps then, which is from mid 2006.

Chris
If I don't reply to your reply, it means I agree with you.




Reply #188
Thanks for pointing out my silly mistake; the irony is not lost on me…  That’s what I get for barely reading! Re(re)located to the correct thread.