Multiformat @ 128kbps - test discussion
Reply #72 – 2004-04-04 14:40:30
Concerning now the number of samples, I think the question that arises is: "How many listeners per sample are needed to create accurate and statistically valid results?" If the answer to that question is "around 15 listeners", then I don't think that increasing the number of samples would help all that much (although it would be more than helpful under different testing conditions). The distribution of the listeners among the different samples won't be ideal, so some samples would be evaluated by too few listeners, making the results for those specific samples statistically invalid.

IMO it's much more important to get a meaningful overall result than meaningful results for every single sample. I might be wrong, but in my understanding a low number of listeners for a single sample just leads to bigger error bars. If all error bars overlap, the result won't be meaningful for that sample, but it will still be statistically valid. When calculating the total result (average rankings + error bars), 18 samples with 20 results on average should be as good as 12 samples with 30 results on average. I believe that more total results (i.e. more than the 18*20 or 12*30 in the example) will be submitted if people can choose from more samples. Please, could someone knowledgeable correct me if my assumptions about the way results are calculated are wrong?
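
To make the 18*20 vs. 12*30 arithmetic concrete, here is a rough sketch. It is not the way the test results are actually calculated; it assumes every rating is independent and has the same spread (the value of SIGMA is made up), and it ignores that some samples are harder than others. Under those assumptions the per-sample error bars get wider with fewer listeners, but the error bar on the overall average depends only on the total number of results:

```python
import math

# Hypothetical spread of individual ratings on the 1-5 scale (an assumption,
# not a measured value from any test).
SIGMA = 1.0

def standard_error(sigma, n):
    """Standard error of the mean of n independent ratings."""
    return sigma / math.sqrt(n)

def overall_standard_error(sigma, samples, listeners_per_sample):
    """Standard error of the average over all per-sample means,
    assuming the samples are independent and equally noisy."""
    per_sample = standard_error(sigma, listeners_per_sample)
    return per_sample / math.sqrt(samples)

for samples, listeners in [(18, 20), (12, 30)]:
    print(
        f"{samples} samples x {listeners} listeners: "
        f"per-sample SE = {standard_error(SIGMA, listeners):.3f}, "
        f"overall SE = {overall_standard_error(SIGMA, samples, listeners):.3f}"
    )
```

With SIGMA = 1.0 the per-sample standard error comes out at about 0.224 for 20 listeners and 0.183 for 30, while the overall one is about 0.053 in both cases, because both setups have 360 results in total. If the listeners really do submit more results when there are more samples to choose from, the overall error bar would shrink further.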