I'm no great specialist in statistical analysis, but I feel that something is fundamentally wrong with the analysis I am doing here. I have stumbled upon a simple and very basic question.

Per-sample means of grades and their bootstrapped confidence intervals have a clear and simple meaning. But the overall average of all grades received for a codec, with its corresponding confidence interval, seems meaningless to me, much like the average temperature across a hospital; at the very least, such values are hard to interpret and compare. It seems more reasonable to me to compute the final codec averages from the per-sample means only, not from all grades. While the resulting averages from the two methods are almost identical, the confidence intervals, the interpretation and the methods of further analysis are different.

So my question, in short: what population do we consider when computing the overall codec average, the population of grades or the population of per-sample means?

I would be thankful if somebody could clear this up for me.
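To make the two options concrete, here is a minimal Python sketch (using NumPy) that bootstraps the codec average both ways. Everything in it is an assumption for illustration: the grades are synthetic, the design is balanced (every listener grades every sample), and the names `bootstrap_ci`, `grades`, `pooled` and `per_sample` are hypothetical, not taken from the actual test.

```python
import numpy as np

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean of a 1-D collection of values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: each row of idx is one bootstrap replicate.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lo, hi

# Synthetic data (assumption): 10 samples, 20 listeners, grades on a 1-5 scale.
rng = np.random.default_rng(1)
n_samples, n_listeners = 10, 20
sample_quality = rng.uniform(2.0, 5.0, size=n_samples)      # per-sample difficulty
noise = rng.normal(0.0, 0.5, size=(n_samples, n_listeners))  # listener scatter
grades = np.clip(sample_quality[:, None] + noise, 1.0, 5.0)

# Option A: the population is all individual grades.
pooled = bootstrap_ci(grades.ravel())

# Option B: the population is the per-sample means.
per_sample = bootstrap_ci(grades.mean(axis=1))

print("all grades:       mean=%.3f  CI=[%.3f, %.3f]" % pooled)
print("per-sample means: mean=%.3f  CI=[%.3f, %.3f]" % per_sample)
```

With a balanced design the two point estimates coincide, but the intervals answer different questions. Option A treats every grade as an independent draw, so its interval shrinks with the total number of grades. Option B treats the samples as the units, so its interval reflects how much the codec's quality varies from sample to sample.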
There's a problem there. I know it's easy to design a codec where most subjects will give it a pass, i.e. not make any useful distinction, but a few listeners will hate, hate, hate the results.

This will widen the confidence interval, no matter how you look at it.
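A rough Python sketch of that effect, with made-up grades on a 1-5 scale: the function `ci_width` and both grade lists are hypothetical, chosen only to show how a handful of very low grades stretches a bootstrapped interval.

```python
import numpy as np

def ci_width(grades, n_boot=5000, alpha=0.05, seed=0):
    """Width of a percentile-bootstrap CI for the mean of `grades`."""
    rng = np.random.default_rng(seed)
    grades = np.asarray(grades, dtype=float)
    # Each row of idx selects one bootstrap resample of the listeners.
    idx = rng.integers(0, grades.size, size=(n_boot, grades.size))
    boot_means = grades[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return hi - lo

# Hypothetical listener grades for one codec (assumption, not real data).
everyone_passes = np.array([4.5] * 10 + [5.0] * 10)        # 20 listeners, all content
few_hate_it = np.array([4.5] * 9 + [5.0] * 8 + [1.0] * 3)  # 3 of 20 hate it

width_pass = ci_width(everyone_passes)
width_hate = ci_width(few_hate_it)
print("CI width, everyone passes: %.3f" % width_pass)
print("CI width, a few hate it:   %.3f" % width_hate)
```

The three low grades pull the mean down only moderately, but they raise the spread of the grades a great deal, and it is the spread that sets the width of the interval.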