Multiformat Listening Test @ 128 kbps - FINISHED


Reply #175 – 2006-01-23 13:52:26

Quote

I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks? If not, then I might have a look at it again if I have time. But if it's been fixed, then I won't have to worry.

There's still headroom for progress, I'd say. During my last listening evaluation (http://www.hydrogenaudio.org/forums/index.php?showtopic=38792), Vorbis was still unsharp and noisy on micro-attack/short-impulse samples. Most often the bitrate doesn't go really high on such samples (Vorbis tends to inflate the bitrate when needed, but not here).
Try with this sample for a good start

Reply #176 – 2006-02-16 07:38:47

Some statistical analysis by AMTuring about this listening test...

@Guru: very nice avatar, btw...

Reply #177 – 2006-02-16 09:04:32

Quote

Some statistical analysis by AMTuring about this listening test...

Oh please, can we keep the crackpot science out of this forum? This person wasn't banned here for no reason. Just juggling around scientific words doesn't magically make anything you say sensible, let alone correct.

Quote

It is well known that some songs are more difficult to encode than others, and they result in lower quality encoded files regardless of the encoder used. So the assumption of equal means amongst experiments is violated.

Bzzzt. This was a VBR test. Meaning that, although the average was 128 kbps (or slightly more), the codecs could spend as many bits as necessary to keep all clips at a constant quality. This means you cannot immediately assume the means aren't equal; in fact, it should be the opposite.

So, what happens if we actually look at the data? (Note that he provides many graphs to 'illustrate' his points, except for the ones where, well, the data doesn't support his claims anywhere.) The variance of the means of the samples is much less than the difference between the codecs themselves (excluding Shine, which is CBR).

In other words, VBR works. I would have thought that that was "well known" by now.

Quote

The following table shows the Tukey HSD applied to the ranks.

Say what? You cannot apply plain Tukey HSD to rank scores; it's a parametric test. Now, I'm willing to argue that we shouldn't use parametric analysis (because the top end of the results clips at 5.0, and you can see this by observing that the lower rated the codec, the higher the variance). However, if anything, parametric analysis gives stronger results. If you use rank scores, let's actually use the rank score version of Tukey HSD to analyze the results:

Code:

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Nonparametric Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Nonparametric Tukey's HSD:  25.894

Ranksums:

Vorbis   iTunes   WMA      Nero     LAME     
 73.00    62.50    49.50    48.50    36.50   

-------------------------- Difference Matrix --------------------------

         iTunes   WMA      Nero     LAME     
Vorbis    10.500   23.500   24.500   36.500* 
iTunes             13.000   14.000   26.000* 
WMA                          1.000   13.000  
Nero                                 12.000  
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Gee, where did those "extra" conclusions go?
Let's compare this to the means with parametric Tukey HSD:

Code:

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Tukey's HSD:   0.110

Means:

Vorbis   iTunes   WMA      Nero     LAME     
  4.79     4.74     4.70     4.68     4.60   

-------------------------- Difference Matrix --------------------------

         iTunes   WMA      Nero     LAME     
Vorbis     0.049    0.090    0.106    0.193* 
iTunes              0.041    0.056    0.143* 
WMA                          0.016    0.103  
Nero                                  0.087  
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Coincidence? Hardly. If you derive the rank scores from the means, how can you expect a different conclusion? How would you expect throwing away information to increase the significance? It won't, unless you use a completely wrong analysis method.
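
The comparison of the two FRIEDMAN outputs above can be re-checked with a short sketch. It only re-uses the numbers printed in the outputs (rank sums, the mean difference matrix, and both critical values); it does not recompute the critical values, and the helper name `significant_pairs` is made up for this illustration.

```python
from itertools import combinations

codecs = ["Vorbis", "iTunes", "WMA", "Nero", "LAME"]

# Rank sums and critical difference copied from the nonparametric output.
rank_sums = {"Vorbis": 73.0, "iTunes": 62.5, "WMA": 49.5, "Nero": 48.5, "LAME": 36.5}
rank_hsd = 25.894

# Pairwise mean differences copied from the parametric difference matrix.
# The printed means are rounded to two decimals, so the matrix values are
# used directly instead of subtracting the rounded means.
mean_diffs = {
    ("Vorbis", "iTunes"): 0.049, ("Vorbis", "WMA"): 0.090,
    ("Vorbis", "Nero"): 0.106, ("Vorbis", "LAME"): 0.193,
    ("iTunes", "WMA"): 0.041, ("iTunes", "Nero"): 0.056,
    ("iTunes", "LAME"): 0.143, ("WMA", "Nero"): 0.016,
    ("WMA", "LAME"): 0.103, ("Nero", "LAME"): 0.087,
}
mean_hsd = 0.110


def significant_pairs(diffs, critical):
    """Return the pairs whose absolute difference exceeds the critical value."""
    return sorted(pair for pair, d in diffs.items() if abs(d) > critical)


# Rebuild the rank-sum difference matrix from the rank sums themselves.
rank_diffs = {(a, b): rank_sums[a] - rank_sums[b] for a, b in combinations(codecs, 2)}

rank_result = significant_pairs(rank_diffs, rank_hsd)
mean_result = significant_pairs(mean_diffs, mean_hsd)

print("rank-based:", rank_result)
print("mean-based:", mean_result)
```

Both lists contain the same two pairs (Vorbis over LAME, iTunes over LAME), which is the point being made: ranking the same data does not produce extra significant differences.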

Reply #178 – 2006-02-16 10:00:41

Maybe it is a good idea to post your reply on Doom9 also, Garf.

Reply #179 – 2006-02-16 13:56:02

I posted a reply on doom9, suggesting he use bootstrap resampling if he's hard up to analyze the data some more.

ff123
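
For readers unfamiliar with the bootstrap resampling mentioned above, here is a minimal sketch of a paired percentile bootstrap for the difference in mean rating between two codecs. The ratings are HYPOTHETICAL numbers invented for this example (they are not the listening test's data), and `bootstrap_ci` is an illustrative helper, not part of any tool discussed in this thread.

```python
import random

# HYPOTHETICAL per-sample ratings for two codecs (18 samples, 1.0-5.0 scale).
# Invented for illustration only; NOT the listening test's results.
codec_a = [4.9, 4.8, 5.0, 4.7, 4.6, 4.9, 4.8, 4.5, 5.0,
           4.9, 4.7, 4.8, 4.6, 4.9, 5.0, 4.8, 4.7, 4.9]
codec_b = [4.7, 4.6, 4.8, 4.5, 4.6, 4.4, 4.7, 4.3, 4.9,
           4.6, 4.5, 4.7, 4.2, 4.8, 4.7, 4.6, 4.4, 4.8]


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of the paired differences a - b."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


observed = sum(x - y for x, y in zip(codec_a, codec_b)) / len(codec_a)
low, high = bootstrap_ci(codec_a, codec_b)
print(f"mean difference {observed:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the two codecs would be judged different at roughly the chosen level; unlike Tukey HSD, this makes no normality assumption, which matters when scores clip at 5.0.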

Reply #180 – 2006-02-16 16:45:55

Quote

Quote
Some statistical analysis by AMTuring about this listening test...

Bzzzt.

I like this

Reply #181 – 2006-03-22 18:35:07

Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure if it's reasonable to normalize like that (it's assuming that quality of a codec is directly 1:1 with bitrate), but it's an interesting thought.
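
The post does not say how the normalization was done. A plausible reading, given the "1:1 with bitrate" remark, is normalized = raw score × 128 / average bitrate. Under that ASSUMPTION, and assuming the raw scores are the means from the Tukey HSD output earlier in the thread (with AoTuV being the "Vorbis" column), the sketch below backs out the average bitrates the normalized scores would imply. The function name is hypothetical.

```python
TARGET_KBPS = 128.0

# Raw means as printed in the parametric Tukey HSD output in this thread.
raw_means = {"iTunes": 4.74, "AoTuV": 4.79, "LAME": 4.60, "WMA": 4.70}

# Normalized scores as given in the post above.
normalized = {"iTunes": 4.83, "AoTuV": 4.55, "LAME": 4.5, "WMA": 4.84}


def implied_bitrate(raw, norm, target=TARGET_KBPS):
    """Invert the ASSUMED rule norm = raw * target / actual to get 'actual'."""
    return target * raw / norm


implied = {codec: implied_bitrate(raw_means[codec], normalized[codec])
           for codec in raw_means}

for codec, kbps in implied.items():
    print(f"{codec}: ~{kbps:.1f} kbps implied")
```

Under this reading, codecs that came in above 128 kbps are penalized and those below are rewarded in direct proportion, which is exactly the linear quality-versus-bitrate assumption the poster flags as questionable.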

Reply #182 – 2006-03-22 18:37:37

Quote

Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure if it's reasonable to normalize like that (it's assuming that quality of a codec is directly 1:1 with bitrate), but it's an interesting thought.

The only fair way to test this would be to use CBR.

Reply #183 – 2011-09-24 22:51:45

Hi!

I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.
Then I realized that the Vorbis or AAC q settings are bad. First, Vorbis uses a comma, not a dot, as the decimal separator.
Second, the Ogg file is approx. 0,5 - 0,9 MB smaller than the AAC file.
Then I looked at the average bitrates in foobar2000: the Ogg's bitrate is 20 kbps lower than the AAC's.

e.g.: 158 kbps vs 138 kbps

I encoded the same track and another track with Vorbis at -q 4,99, and the bitrate difference was 1-2 kbps and the file size difference was 0-0,1 MB.

Overall, your tests don't show which codec is better if the codec settings are wrong.

Reply #184 – 2011-09-24 23:36:22

The Vorbis command line uses a comma or a dot depending on the locale of the OS it is running on.
As you've found, if the setting being used was wrong, the bitrate would have been much different.
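
To illustrate why a wrong separator would show up as a clearly different bitrate, here is a small sketch. It ASSUMES the encoder reads the quality value the way C's locale-aware `strtod` does, i.e. it parses the longest valid numeric prefix using the current locale's decimal separator; the thread does not confirm how the Vorbis tools actually parse it. `parse_prefix` is a hypothetical helper that only mimics that behaviour.

```python
import re


def parse_prefix(text, decimal_sep):
    """Parse the longest leading number in `text`, strtod-style.

    `decimal_sep` stands in for the active locale's decimal separator.
    Anything after the first character that does not fit is ignored.
    """
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    match = re.match(pattern, text.strip())
    if match is None:
        return 0.0
    return float(match.group(0).replace(decimal_sep, "."))


# Separator matches the locale: the full value is read.
print(parse_prefix("4,99", ","))
print(parse_prefix("4.99", "."))

# Separator does not match the locale: parsing stops at the separator.
print(parse_prefix("4,99", "."))
print(parse_prefix("4.99", ","))
```

With a mismatched separator the value collapses to 4.0 in this model, almost a full quality step lower, so the resulting files would not land within 1-2 kbps of the intended setting.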

Reply #185 – 2011-09-25 11:30:55

Quote from: Mardel on 2011-09-24 22:51:45

Second, the Ogg file is approx. 0,5 - 0,9 MB smaller than the AAC file.
Then I looked at the average bitrates in foobar2000: the Ogg's bitrate is 20 kbps lower than the AAC's.

e.g.: 158 kbps vs 138 kbps
…
Overall, your tests don't show which codec is better if the codec settings are wrong.

Given that many modern codecs perform better, or only, in VBR or ABR modes, listening tests reporting a bitrate, such as this one, are generally based on the principle of tuning the codec settings to obtain that bitrate, or as close as possible to it, as the mean bitrate over the set of audio files being tested. Thus, variations between different encodes of one file are to be expected and do not invalidate the methodology (inasmuch as there doesn’t seem to be another solution; this is as close to fair as is possible).
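
The tuning principle described above can be sketched as a search over the quality setting until the mean bitrate across the file set hits the target. Everything in the example is HYPOTHETICAL: `file_bitrate` is a toy stand-in for actually encoding a file, and the "complexity" factors are invented; only the procedure (tune the setting to the set-wide mean, accept per-file variation) comes from the post.

```python
# HYPOTHETICAL per-file "complexity" factors standing in for real audio files.
complexities = [0.7, 0.9, 1.0, 1.2, 1.3]


def file_bitrate(quality, complexity):
    """HYPOTHETICAL toy VBR model: harder files get more bits at a given quality."""
    return 64.0 + 16.0 * quality * complexity


def mean_bitrate(quality):
    """Mean bitrate over the whole file set at one quality setting."""
    rates = [file_bitrate(quality, c) for c in complexities]
    return sum(rates) / len(rates)


def tune_quality(target_kbps, lo=0.0, hi=10.0):
    """Bisect on quality; assumes mean bitrate rises monotonically with quality."""
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if mean_bitrate(mid) < target_kbps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


quality = tune_quality(128.0)
per_file = [file_bitrate(quality, c) for c in complexities]

print(f"quality setting: {quality:.3f}")
print(f"mean bitrate:    {mean_bitrate(quality):.2f} kbps")
print("per-file kbps:   ", [round(r, 1) for r in per_file])
```

In this toy model the mean lands on 128 kbps while individual files sit well below or above it, which is why comparing the bitrates of a single track across codecs says little about whether the settings were matched.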

Reply #186 – 2011-09-25 13:06:24

Quote from: Mardel on 2011-09-24 22:51:45

I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.

Sorry, but which test are we talking about? The test discussed in this thread is MP3 only.

Edit: Thanks lvqcl! Must be http://soundexpert.org/encoders-128-kbps then, which is from mid 2006.

Chris

Reply #187 – 2011-09-25 13:13:07

Quote from: C.R.Helmrich on 2011-09-25 13:06:24

Sorry, but which test are we talking about? The test discussed in this thread is MP3 only.

IIRC that post was moved from http://www.hydrogenaudio.org/forums/index....showtopic=77708 to this thread.

Reply #188 – 2011-09-25 13:31:33

Thanks for pointing out my silly mistake; the irony is not lost on me… That’s what I get for barely reading! Re(re)located to the correct thread.
