## 128kbps Extension Test - FINISHED

##### Reply #162 –

Great test Roberto.

From my point of view, I'd prefer an encoder that doesn't trip up very badly very often, even if its average score were a little lower.

Now, WMA Pro tripped up badly once. Perhaps that was bad luck, and with other samples another codec would trip up instead, so the statistical information isn't perfect.

However, I tabulated the mean scores (read from your graphs) and estimated the standard deviation.

Assuming all test samples are similarly distributed in terms of encoder variability, and assuming a "normal" or "gaussian" distribution, the average minus one sigma and average minus two sigma give a guide to the worst behaviour we're likely to see:

| Track | AAC | Lame | MPC | Vorbis | WMAPro | Blade |
|---|---|---|---|---|---|---|
| 41_30sec | 4.36 | 3.3 | 4.33 | 4.2 | 3.97 | 1.4 |
| ATrain | 4.41 | 3.78 | 4.37 | 4.17 | 4.48 | 3.05 |
| Bachpsic | 4.5 | 3.41 | 4.66 | 4.51 | 4.8 | 2.9 |
| Blackwat | 4.62 | 3.92 | 4.71 | 4.38 | 4.56 | 2.18 |
| death2 | 4.35 | 3.62 | 4.67 | 4.18 | 2.7 | 1.27 |
| flooress | 4.08 | 3.68 | 4.52 | 4.57 | 4.25 | 1.7 |
| layla | 4.15 | 3.59 | 4.4 | 4.24 | 4.45 | 1.83 |
| macabre | 4.59 | 4.06 | 4.55 | 4.54 | 4.86 | 3.16 |
| midnight | 4.56 | 3.42 | 4.43 | 4.26 | 4.38 | 2.39 |
| thear1 | 4.69 | 4.16 | 4.48 | 4.11 | 4.44 | 2.41 |
| thesourc | 4.61 | 4.33 | 4.62 | 4.43 | 4.87 | 2.36 |
| waiting | 4.13 | 2.71 | 4.35 | 3.78 | 3.88 | 1.99 |
| AvgScore | 4.42 | 3.67 | 4.51 | 4.28 | 4.30 | 2.22 |
| Std.Dev | 0.21 | 0.44 | 0.13 | 0.22 | 0.59 | 0.62 |
| -1 sigma | 4.21 | 3.23 | 4.37 | 4.06 | 3.71 | 1.60 |
| -2 sigma | 4.00 | 2.79 | 4.24 | 3.84 | 3.11 | 0.99 |
| -3 sigma | 3.79 | 2.35 | 4.10 | 3.61 | 2.52 | 0.37 |

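-1 sigma pt: p(new sample > this value) = 84.13%, so p(new sample < this value) = 15.87%

-2 sigma pt: p(new sample > this value) = 97.72%, so p(new sample < this value) = 2.28%

-3 sigma pt: p(new sample > this value) = 99.87%, so p(new sample < this value) = 0.13%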

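| | AAC | Lame | MPC | Vorbis | WMAPro | Blade |
|---|---|---|---|---|---|---|
| sd/sqrt12 | 0.06 | 0.13 | 0.04 | 0.06 | 0.17 | 0.18 |
| errorbar | 0.12 | 0.25 | 0.08 | 0.13 | 0.34 | 0.36 |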

The probabilities listed with the table come from the cumulative normal distribution. They describe the chances of getting a value worse than the -1 sigma point etc. if you chose a new sample at random and had the same listeners test it.
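For anyone who wants to check those percentages, here is a minimal sketch using Python's standard `statistics.NormalDist`. It assumes, as the post does, that scores are normally distributed; the helper name `tail_probabilities` is just made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

def tail_probabilities(k):
    """Return (p_worse, p_better) for a point k sigmas below the mean.

    p_worse  = chance a new sample scores below the mean - k*sigma point
    p_better = chance a new sample scores above that point
    """
    p_worse = std_normal.cdf(-k)
    return p_worse, 1.0 - p_worse

for k in (1, 2, 3):
    p_worse, p_better = tail_probabilities(k)
    print(f"-{k} sigma: p(worse) = {p_worse:.2%}, p(better) = {p_better:.2%}")
```

The 84.13%, 97.72% and 99.87% figures are the chance of doing better than each point; the chance of doing worse is the remainder.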

This is the result at the average minus 2-sigma point:

The errorbar row is the estimated error in the mean score, taken as 2 * (Std.Dev / sqrt(12)) for the 12 samples. That's what I'd use to find the best rated codec overall on a mean score basis.
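As a rough sketch of that arithmetic, the snippet below recomputes the two rows from the rounded Std.Dev values in the table. The function name `error_bar` and the dictionary layout are only illustrative, and the inputs are the posted two-decimal figures rather than the raw listener data.

```python
import math

N_SAMPLES = 12  # number of test tracks in the table

# Std.Dev row as posted (already rounded to two decimals).
std_dev = {
    "AAC": 0.21, "Lame": 0.44, "MPC": 0.13,
    "Vorbis": 0.22, "WMAPro": 0.59, "Blade": 0.62,
}

def standard_error(sd, n=N_SAMPLES):
    """Estimated standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

def error_bar(sd, n=N_SAMPLES):
    """Two standard errors of the mean, as used for the errorbar row."""
    return 2 * standard_error(sd, n)

for codec, sd in std_dev.items():
    print(f"{codec:7s} sd/sqrt(n) = {standard_error(sd):.2f}  errorbar = {error_bar(sd):.2f}")
```

Two codecs whose mean scores differ by less than their error bars probably shouldn't be ranked against each other on mean score alone.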

Just my thoughts. Many thanks to those who tested. (I didn't have the time, or probably the artifact training, to join in.)

By my criterion of not failing badly, MPC wins, followed by AAC, Vorbis, WMAPro, LAME and Blade.
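That ordering can be reproduced from the per-track scores with a short sketch like the one below. It assumes the sample (n-1) standard deviation, which appears to match the posted Std.Dev row, and the scores are the values read off the graphs, so small rounding differences are possible. The name `worst_case` is hypothetical.

```python
from statistics import mean, stdev

codecs = ["AAC", "Lame", "MPC", "Vorbis", "WMAPro", "Blade"]

# Per-track mean scores as tabulated in the post (read from the graphs).
scores = {
    "41_30sec": [4.36, 3.30, 4.33, 4.20, 3.97, 1.40],
    "ATrain":   [4.41, 3.78, 4.37, 4.17, 4.48, 3.05],
    "Bachpsic": [4.50, 3.41, 4.66, 4.51, 4.80, 2.90],
    "Blackwat": [4.62, 3.92, 4.71, 4.38, 4.56, 2.18],
    "death2":   [4.35, 3.62, 4.67, 4.18, 2.70, 1.27],
    "flooress": [4.08, 3.68, 4.52, 4.57, 4.25, 1.70],
    "layla":    [4.15, 3.59, 4.40, 4.24, 4.45, 1.83],
    "macabre":  [4.59, 4.06, 4.55, 4.54, 4.86, 3.16],
    "midnight": [4.56, 3.42, 4.43, 4.26, 4.38, 2.39],
    "thear1":   [4.69, 4.16, 4.48, 4.11, 4.44, 2.41],
    "thesourc": [4.61, 4.33, 4.62, 4.43, 4.87, 2.36],
    "waiting":  [4.13, 2.71, 4.35, 3.78, 3.88, 1.99],
}

def worst_case(values, k=2):
    """Mean minus k sample standard deviations (the 'not failing badly' score)."""
    return mean(values) - k * stdev(values)

# Regroup the rows into one list of scores per codec.
per_codec = {c: [row[i] for row in scores.values()] for i, c in enumerate(codecs)}

# Rank codecs from best to worst by their mean - 2 sigma point.
ranking = sorted(codecs, key=lambda c: worst_case(per_codec[c]), reverse=True)

for codec in ranking:
    vals = per_codec[codec]
    print(f"{codec:7s} mean = {mean(vals):.2f}  -2 sigma = {worst_case(vals):.2f}")
```

Changing `k` to 1 or 3 shows how sensitive the ordering is to how pessimistic you want to be.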

(Edit: Note, I posted the wrong image originally, so please refresh if the top graph doesn't match the scores or this order)

DickD

P.S. Hmm, I wonder if WMAPro did badly only because it was using 2 passes to aim at 128 kbps for the specific short sample tested. Perhaps it's fairer to use it in a one-pass mode that averages at 128 kbps over many albums.