
Topic: Multiformat Listening Test @ 128 kbps - FINISHED (Read 140952 times)

  • guruboolez
  • Members (Donating)
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #175
Quote
I haven't been following developments in aoTuV lately, but has Aoyumi improved the block switching algorithm to reduce smearing on microattacks?  If not, then I might have a look at it again if I have time.  But if it's been fixed, then I won't have to worry.

There's still headroom for progress, I'd say. During my [url=http://www.hydrogenaudio.org/forums/index.php?showtopic=38792]last listening evaluation[/url], Vorbis was still unsharp and noisy on micro-attack/short-impulse samples. Most often the bitrate doesn't go very high on such samples (Vorbis tends to inflate the bitrate when needed, but not here).
Try with this sample for a good start

  • kurtnoise
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #176
Some statistical analysis by AMTuring about this listening test...


@Guru: very nice avatar, btw...

  • Garf
  • Developer (Donating)
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #177
Quote
Some statistical analysis by AMTuring about this listening test...

Oh please, can we keep the crackpot science out of this forum? This person wasn't banned here for no reason. Juggling scientific words around doesn't magically make anything you say sensible, let alone correct.

Quote
It is well known that some songs are more difficult to encode than others, and they result in lower quality encoded files regardless of the encoder used. So the assumption of equal means amongst experiments is violated.

Bzzzt. This was a VBR test. Meaning, although the average was 128 kbps (or slightly more), the codecs could spend as many bits as necessary to keep all clips at a constant quality. This means you cannot immediately assume the means aren't equal; in fact, it should be the opposite.

So, what happens if we actually look at the data? (Note that he provides many graphics to 'illustrate' his points, except the ones where, well, the data doesn't support his claims.) The variance of the means of the samples is much less than the difference between the codecs themselves (excluding Shine, which is CBR).

In other words, VBR works. I would have thought that that was "well known" by now.

Quote
The following table shows the Tukey HSD applied to the ranks.

Say what? You cannot apply plain Tukey HSD to rank scores, it's a parametric test. Now, I'm willing to argue that we shouldn't use parametric analysis (because the top end of the results clips at 5.0, and you can see this by observing that the lower rated the codec, the higher the variance). However, if anything parametric analysis gives stronger results. If you use rank scores, let's actually use the rank score version of Tukey HSD to analyze the results:

Code: [Select]
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Nonparametric Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Nonparametric Tukey's HSD:  25.894

Ranksums:

Vorbis   iTunes    WMA      Nero     LAME
 73.00    62.50    49.50    48.50    36.50 

-------------------------- Difference Matrix --------------------------

         iTunes  WMA      Nero    LAME   
Vorbis    10.500  23.500  24.500  36.500*
iTunes            13.000  14.000  26.000*
WMA                        1.000  13.000 
Nero                                12.000 
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Gee, where did those "extra" conclusions go?
Let's compare this to the means with parametric Tukey HSD:

Code: [Select]
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Tukey HSD analysis

Number of listeners: 18
Critical significance:  0.05
Tukey's HSD:  0.110

Means:

Vorbis  iTunes  WMA      Nero    LAME   
  4.79    4.74    4.70    4.68    4.60 

-------------------------- Difference Matrix --------------------------

         iTunes  WMA      Nero    LAME   
Vorbis    0.049    0.090    0.106    0.193*
iTunes              0.041    0.056    0.143*
WMA                          0.016    0.103 
Nero                                  0.087 
-----------------------------------------------------------------------

Vorbis is better than LAME
iTunes is better than LAME

Coincidence? Hardly. If you derive the rank scores from the means, how can you expect a different conclusion? How would you expect throwing away information to increase the significance? It won't, unless you use a completely wrong analysis method.
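For anyone who wants to reproduce the ranking step, here's a rough Python sketch (with invented scores, not the actual test data) of how rank sums like the ones in the FRIEDMAN output above are derived from raw 1-5 listener scores:

```python
# Rough sketch (invented scores, NOT the real test data) of how rank sums
# are derived: each listener's scores are ranked across codecs (1 = worst),
# ties get the average rank, and the per-listener ranks are summed per codec.

def rank_within_listener(scores):
    """Rank one listener's scores across codecs, averaging ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(listeners):
    """listeners: one score list per listener, same codec order in each."""
    sums = [0.0] * len(listeners[0])
    for scores in listeners:
        for codec, r in enumerate(rank_within_listener(scores)):
            sums[codec] += r
    return sums

# Three hypothetical listeners rating [Vorbis, iTunes, LAME]:
print(rank_sums([[4.8, 4.7, 4.5], [4.9, 4.9, 4.6], [4.6, 4.8, 4.4]]))
# prints [7.5, 7.5, 3.0]
```

The difference matrix then just compares pairs of these rank sums against the nonparametric HSD threshold.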
  • Last Edit: 16 February, 2006, 04:23:57 AM by Garf

  • bug80
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #178
Maybe it is a good idea to post your reply on Doom9 also, Garf.

  • ff123
  • Developer (Donating)
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #179
I posted a reply on doom9, suggesting he use bootstrap resampling if he's hard up to analyze the data some more.

ff123

  • detokaal
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #180
Quote
Quote
Some statistical analysis by AMTuring about this listening test...

Bzzzt.


I like this 

  • torok
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #181
Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure if it's reasonable to normalize like that (it assumes that the quality of a codec scales 1:1 with bitrate), but it's an interesting thought.
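Just to make the arithmetic explicit, here is a tiny sketch of that linear normalization; the 133 kbps figure below is a placeholder I invented, not one of the test's actual per-codec averages:

```python
# Sketch of the linear bitrate normalization described above: scale each
# mean score by (target bitrate / actual mean bitrate). The 133 kbps
# figure is an invented placeholder, not a bitrate from the test.

TARGET_KBPS = 128.0

def normalize(score, mean_kbps, target_kbps=TARGET_KBPS):
    # Assumes quality scales 1:1 with bitrate, which (as noted above)
    # is itself a questionable assumption.
    return score * target_kbps / mean_kbps

print(round(normalize(4.70, 133.0), 2))  # prints 4.52
```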

  • Busemann
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #182
Quote
Did anyone else notice that when you normalize for bitrate, you get this:

iTunes = 4.83
AoTuV = 4.55
lame = 4.5
WMA = 4.84

Which looks like it makes iTunes and WMA tied for first and the rest in second. Now, I'm not sure if it's reasonable to normalize like that (it assumes that the quality of a codec scales 1:1 with bitrate), but it's an interesting thought.


The only fair way to test this would be to use CBR.

  • Mardel
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #183
Hi!

I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.
Then I realized that the Vorbis or AAC -q settings are wrong. The first thing is that Vorbis uses a comma, not a dot, for the decimal separator.
The second thing is that the Ogg file is smaller than the AAC, by approx. 0.5-0.9 MB.
Then I looked at the average bitrates in foobar2000: the Ogg's bitrate is about 20 kbps lower than the AAC's.

e.g.: 158 kbps vs 138 kbps

I encoded the same track and another track with Vorbis at -q 4,99, and the bitrate difference was only 1-2 kbps and the file size difference was 0-0.1 MB.

Overall, your test doesn't indicate which codec is better if the codec settings are wrong.
  • Last Edit: 25 September, 2011, 06:35:28 AM by db1989
Wavpack -hh or TAK -pMax
OggVorbis aoTuVb6.03 -q 4

  • [JAZ]
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #184
The Vorbis command line uses a comma or a dot depending on the locale of the OS it is running on.
As you've found, if the setting being used had been wrong, the bitrate would have been very different.
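As a side note on the comma/dot issue: a front-end that wants to be robust to either locale could simply accept both forms of the -q value. A trivial, purely hypothetical sketch:

```python
# Hypothetical sketch: accept a Vorbis -q value with either decimal
# separator ("4.99" or "4,99"), sidestepping the OS locale issue.

def parse_quality(s):
    return float(s.replace(",", "."))

print(parse_quality("4,99"))  # prints 4.99
```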

  • db1989
  • Global Moderator
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #185
The second thing is that the Ogg file is smaller than the AAC, by approx. 0.5-0.9 MB.
Then I looked at the average bitrates in foobar2000: the Ogg's bitrate is about 20 kbps lower than the AAC's.

e.g.: 158 kbps vs 138 kbps

Overall, your test doesn't indicate which codec is better if the codec settings are wrong.
Given that many modern codecs perform better, or only, in VBR or ABR modes, listening tests that report a bitrate, such as this one, are generally based on the principle of tuning the codec settings so that the mean bitrate over the whole set of test files hits that bitrate, or comes as close to it as possible. Thus, variations between different encodes of individual files are to be expected and do not invalidate the methodology (inasmuch as there doesn't seem to be another solution; this is as close to fair as is possible).
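As a toy illustration of that tuning principle (all per-file bitrates below are invented), picking the quality setting whose mean bitrate over the test set lands closest to the target might look like:

```python
# Toy sketch of the tuning principle described above: choose the encoder
# setting whose MEAN bitrate over the whole test set is closest to the
# target, accepting per-file variation. All numbers are invented.

def mean_bitrate(bitrates_kbps):
    return sum(bitrates_kbps) / len(bitrates_kbps)

def pick_setting(candidates, target_kbps=128.0):
    """candidates maps a setting name to its per-file mean bitrates (kbps)."""
    return min(candidates,
               key=lambda s: abs(mean_bitrate(candidates[s]) - target_kbps))

settings = {"-q 4.0": [118, 126, 131, 122],   # mean 124.25 kbps
            "-q 4.25": [125, 133, 138, 129]}  # mean 131.25 kbps
print(pick_setting(settings))  # prints -q 4.25
```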

  • C.R.Helmrich
  • Developer
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #186
I looked at the 128 kbps listening test on your site, and I can't accept that aoTuV Vorbis is worse than AAC.

Sorry, but which test are we talking about? The test discussed in this thread is MP3 only.

Edit: Thanks lvqcl! Must be http://soundexpert.org/encoders-128-kbps then, which is from mid 2006.

Chris
  • Last Edit: 25 September, 2011, 08:28:02 AM by C.R.Helmrich
If I don't reply to your reply, it means I agree with you.

  • lvqcl
  • Developer
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #187
Sorry, but which test are we talking about? The test discussed in this thread is MP3 only.


IIRC that post was moved from http://www.hydrogenaudio.org/forums/index....showtopic=77708 to this thread.

  • db1989
  • Global Moderator
Multiformat Listening Test @ 128 kbps - FINISHED
Reply #188
Thanks for pointing out my silly mistake; the irony is not lost on me…  That’s what I get for barely reading! Re(re)located to the correct thread.