## New Public Multiformat Listening Test (Jan 2014)

Alright, nice. So the variability already drops a lot because of that. What analysis did you use to treat samples and listeners separately, i.e. for the original graph you posted? Multi-way ANOVA? I'd be curious to see the p-values (corrected for multiple comparisons) then. I agree they're overstated in the original results. I have my reservations about ANOVA as well, because the ratings are clipped at 5.0, but a bootstrap with dependent samples is out of my league, so I think it's the best we can do for now.

I tried a blocked bootstrap estimate of the confidence intervals, using the 280 raw results.

The result is almost the same as the squashed version. You said that "All the information about variability that you get from multiple listeners is forever gone", but judging from this, that information is not lost by the squashing.
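For anyone who wants to try something similar, here is a minimal sketch of a blocked bootstrap for percentile confidence intervals. It is one plausible reading of "blocked bootstrap", not necessarily the procedure used for the numbers in this thread. It assumes the block is the music sample, so all listener results for a sample are resampled together. The function name `blocked_bootstrap_ci`, its parameters, and the toy data are hypothetical, since the 280 raw results are not reproduced here.

```python
import random
from statistics import mean


def blocked_bootstrap_ci(blocks, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for each codec's mean score, resampling whole blocks.

    blocks: list of blocks; each block is a list of rows; each row is a
            list of scores, one per codec (same codec order everywhere).
    """
    rng = random.Random(seed)
    n_codecs = len(blocks[0][0])
    boot_means = [[] for _ in range(n_codecs)]
    for _ in range(n_boot):
        # Draw as many blocks as we have, with replacement. Rows inside a
        # block always travel together, which preserves their dependence.
        drawn = [rng.choice(blocks) for _ in blocks]
        rows = [row for block in drawn for row in block]
        for c in range(n_codecs):
            boot_means[c].append(mean(row[c] for row in rows))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    cis = []
    for c in range(n_codecs):
        ordered = sorted(boot_means[c])
        cis.append((ordered[lo_idx], ordered[hi_idx]))
    return cis


# Toy data only (NOT the listening-test results): 6 samples, 3 listeners
# each, 2 codecs, with scores clipped to the 1.0-5.0 rating scale.
demo_rng = random.Random(1)
toy_blocks = [
    [[min(5.0, max(1.0, demo_rng.gauss(mu, 0.5))) for mu in (4.4, 3.7)]
     for _listener in range(3)]
    for _sample in range(6)
]
cis = blocked_bootstrap_ci(toy_blocks, n_boot=500)
for name, (lo, hi) in zip(("codec A", "codec B"), cis):
    print(f"{name}: 95% CI [{lo:.2f}, {hi:.2f}]")
```

Because the percentile method makes no normality assumption, it is not bothered by the clipping at 5.0 in the way ANOVA might be.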

As for the p-values, a program to compute them would be much harder to write than the CI estimation, but the results shouldn't be very different from the ANOVA of the squashed version.

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/

Blocked ANOVA analysis

Number of listeners: 20

Critical significance: 0.05

Significance of data: 3.91E-013 (highly significant)

ANOVA Table for Randomized Block Designs Using Ratings

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F | p |
| --- | --- | --- | --- | --- | --- |
| Total | 99 | 18.63 | | | |
| Testers (blocks) | 19 | 6.48 | | | |
| Codecs eval'd | 4 | 6.87 | 1.72 | 24.74 | 3.91E-013 |
| Error | 76 | 5.28 | 0.07 | | |

Fisher's protected LSD for ANOVA: 0.166

Means:

| CVBR | TVBR | FhG | CT | Nero |
| --- | --- | --- | --- | --- |
| 4.42 | 4.36 | 4.26 | 4.08 | 3.69 |
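As a sanity check, the F value and the LSD figure can be reproduced from the rounded numbers in the ANOVA table. This is only a sketch: the critical t value for 76 degrees of freedom is hard-coded from a t-table (an assumption, roughly 1.992) rather than computed, and the inputs are the rounded figures printed above, so the last digit can differ from the program's output.

```python
import math

# Values copied from the ANOVA table above (already rounded by the program).
ss_codecs, df_codecs = 6.87, 4
ss_error, df_error = 5.28, 76
n_blocks = 20  # rows per codec in the squashed data

ms_codecs = ss_codecs / df_codecs
ms_error = ss_error / df_error
f_stat = ms_codecs / ms_error

# Two-sided 5% critical value of Student's t with 76 df.
# Assumption: taken from a t-table, not computed here.
T_CRIT = 1.992
lsd = T_CRIT * math.sqrt(2 * ms_error / n_blocks)

means = {"CVBR": 4.42, "TVBR": 4.36, "FhG": 4.26, "CT": 4.08, "Nero": 3.69}
names = list(means)

# A pair differs significantly when its mean difference exceeds the LSD.
significant = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(means[a] - means[b]) > lsd
]

print(f"F   = {f_stat:.1f}")
print(f"LSD = {lsd:.3f}")
for a, b in significant:
    print(f"{a} vs {b}: difference {abs(means[a] - means[b]):.2f}")
```

With these inputs F comes out at about 24.7 and the LSD at about 0.166. FhG vs CT (difference 0.18) lands just above the threshold, while CVBR vs FhG (0.16) lands just below it.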

p-value Matrix

| | TVBR | FhG | CT | Nero |
| --- | --- | --- | --- | --- |
| CVBR | 0.523 | 0.068 | 0.000* | 0.000* |
| TVBR | | 0.229 | 0.001* | 0.000* |
| FhG | | | 0.028* | 0.000* |
| CT | | | | 0.000* |

CVBR is better than CT, Nero

TVBR is better than CT, Nero

FhG is better than CT, Nero

CT is better than Nero
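On the question of p-values corrected for multiple comparisons: FRIEDMAN reports unadjusted pairwise values, so the following is only a rough sketch of what a correction would do to the matrix above. It assumes Holm's step-down method (chosen here for illustration; nobody in this thread specified one), and it assumes that entries printed as 0.000 are at most 0.0005, since the exact values are not shown. The helper `holm_adjust` is hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Adjusted values must never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Pairwise p-values from the matrix above. Entries shown as 0.000 are
# replaced by 0.0005, an assumed upper bound given the rounding.
raw = {
    ("CVBR", "TVBR"): 0.523,
    ("CVBR", "FhG"): 0.068,
    ("CVBR", "CT"): 0.0005,
    ("CVBR", "Nero"): 0.0005,
    ("TVBR", "FhG"): 0.229,
    ("TVBR", "CT"): 0.001,
    ("TVBR", "Nero"): 0.0005,
    ("FhG", "CT"): 0.028,
    ("FhG", "Nero"): 0.0005,
    ("CT", "Nero"): 0.0005,
}

adj = dict(zip(raw, holm_adjust(list(raw.values()))))
still_significant = [pair for pair, p in adj.items() if p < 0.05]

for pair, p in adj.items():
    print(f"{pair[0]} vs {pair[1]}: raw {raw[pair]:.4f}, Holm {p:.4f}")
```

Under these assumptions the only borderline result is FhG vs CT, whose raw 0.028 becomes 0.112 after adjustment. The other starred comparisons stay well below 0.05.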