New Public Multiformat Listening Test (Jan 2014)
Reply #130 – 2013-12-11 20:06:25
Alright, nice. So the variability does already drop a load due to that. What's the analysis you used for analyzing samples and listeners separately, i.e. the original graph you posted? Multi-way ANOVA? I'd be curious to see the (corrected for multiple comparisons) p-values then. I agree they're overstated in the original results.

I have my reservations about ANOVA as well, due to the clipping at 5.0, but doing a bootstrap with dependent samples is out of my league so I think it's the best we can do for now.

I tried blocked bootstrap confidence interval estimation on the 280 raw results. The outcome is almost the same as for the squashed version. You've said that "All the information about variability that you get from multiple listeners is forever gone", but from this I can say that the information is not lost by the squashing. As for the p-values, the program would be much harder to write than the CI estimation, but the result shouldn't be very different from the ANOVA of the squashed version.

FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 20
Critical significance: 0.05
Significance of data: 3.91E-013 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of          Degrees      Sum of    Mean
variation          of Freedom   squares   Square   F       p

Total              99           18.63
Testers (blocks)   19            6.48
Codecs eval'd       4            6.87     1.72     24.74   3.91E-013
Error              76            5.28     0.07
---------------------------------------------------------------
Fisher's protected LSD for ANOVA: 0.166

Means:

CVBR     TVBR     FhG      CT       Nero
4.42     4.36     4.26     4.08     3.69

---------------------------- p-value Matrix ---------------------------

         TVBR     FhG      CT       Nero
CVBR     0.523    0.068    0.000*   0.000*
TVBR              0.229    0.001*   0.000*
FhG                        0.028*   0.000*
CT                                  0.000*
-----------------------------------------------------------------------

CVBR is better than CT, Nero
TVBR is better than CT, Nero
FhG is better than CT, Nero
CT is better than Nero
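
For anyone who wants to check how the numbers in that output hang together, here is a rough Python sketch. It only redoes the arithmetic from the rounded values printed in the table (mean square = sum of squares / degrees of freedom, F = codec mean square / error mean square, and Fisher's LSD from the error mean square). It is not the FRIEDMAN program's code. The variable names are mine, and the t critical value for 76 degrees of freedom is hardcoded as an approximation, which is an assumption.

```python
import math

# Values copied from the (rounded) ANOVA table in the program output.
n_listeners = 20
ss_codecs, df_codecs = 6.87, 4
ss_error, df_error = 5.28, 76

# Assumption: two-sided 5% critical value of Student's t with 76 degrees
# of freedom, hardcoded as roughly 1.992 rather than computed.
T_CRIT = 1.992

# Mean squares and the F statistic for the codec effect.
ms_codecs = ss_codecs / df_codecs
ms_error = ss_error / df_error
f_stat = ms_codecs / ms_error

# Fisher's LSD: smallest difference between two codec means (each averaged
# over n_listeners blocks) that counts as significant at the 5% level.
lsd = T_CRIT * math.sqrt(2 * ms_error / n_listeners)

# Codec means as printed, already sorted from best to worst.
means = {"CVBR": 4.42, "TVBR": 4.36, "FhG": 4.26, "CT": 4.08, "Nero": 3.69}
names = list(means)

# Every pair whose difference in means exceeds the LSD.
better = [(a, b)
          for i, a in enumerate(names)
          for b in names[i + 1:]
          if means[a] - means[b] > lsd]

print(f"F = {f_stat:.2f}, LSD = {lsd:.3f}")
for a, b in better:
    print(f"{a} is better than {b}")
```

Because the inputs are the rounded table entries, F comes out near 24.7 rather than exactly 24.74, while the LSD rounds to the 0.166 reported above. The pairs that exceed the LSD are the same seven listed in the "is better than" summary.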