Cool, I'm the first to post in this forum ;-)
I've written a command-line tool in C to perform Friedman-type analyses of codec ratings. The source and win32 executable can be found in a zipfile at:
to perform an analysis of Roel's first AQ test. It should not be hard to insert this code into a server tool which performs automatic analysis after a listener submission.
I'll have to take a look at your utility; I'm sure it could come in quite handy. Thanks for distributing the source and such.
The more I read about ABC/HR (hidden reference), the more I think this would be a nice way to test for small impairments (maybe not desirable for testing at 128 kbit/s or lower, though). Short description: A is always the reference. B and C are randomly assigned to be the test sample or the reference file. The listener rates B against A, and C against A. For each test, the listener is thus rating the reference file along with the test sample.
Such a method makes it easier to post-screen listeners as described in ITU-R BS.1116-1.
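Here is a minimal C sketch of the ABC/HR bookkeeping described above, assuming a 1.0 to 5.0 rating scale. The struct and function names are hypothetical (they are not from the friedman tool), and the screening rule is a deliberately crude stand-in for the BS.1116-1 post-screening procedure: it only counts trials where the listener graded the hidden reference below the coded sample.

```c
#include <assert.h>
#include <stdlib.h>

/* One ABC/HR trial. A is always the open reference; the hidden reference
   is placed at B or C at random, and the coded sample at the other. */
typedef struct {
    int ref_is_b;   /* 1: B = hidden reference, C = coded; 0: the reverse */
    double score_b; /* rating of B against A (assumed 1.0 to 5.0) */
    double score_c; /* rating of C against A (assumed 1.0 to 5.0) */
} abchr_trial;

/* Randomly place the hidden reference at B or C (seed with srand() first). */
static void abchr_assign(abchr_trial *t)
{
    t->ref_is_b = rand() % 2;
}

/* Difference grade: coded-sample score minus hidden-reference score.
   Negative means the listener heard an impairment in the coded sample. */
static double abchr_diff_grade(const abchr_trial *t)
{
    assert(t->score_b >= 1.0 && t->score_b <= 5.0);
    assert(t->score_c >= 1.0 && t->score_c <= 5.0);
    double ref   = t->ref_is_b ? t->score_b : t->score_c;
    double coded = t->ref_is_b ? t->score_c : t->score_b;
    return coded - ref;
}

/* Hypothetical screening rule (much cruder than BS.1116-1): count the
   trials in which the hidden reference was graded below the coded sample,
   i.e. the listener picked the wrong file as impaired. */
static int abchr_misidentified(const abchr_trial *trials, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (abchr_diff_grade(&trials[i]) > 0.0)
            count++;
    return count;
}
```

A listener with a high misidentification count across many trials would be a candidate for exclusion; the actual threshold is left open here.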
I've updated the command-line statistical tool so that it can perform a blocked ANOVA analysis, along with the corresponding Fisher (LSD) multiple-comparison analysis. The blocked ANOVA makes stronger assumptions (normally distributed ratings across listeners, an equal-interval rating scale), but it is more powerful than the Friedman test if those assumptions are met.
1. I forgot to output the ANOVA table (there is a specific format people are used to seeing), but I did verify the final results against the example in my book.
2. Add Tukey's HSD for both the blocked ANOVA and the Friedman rank analysis. This will produce more conservative results than the Fisher comparisons, because Tukey's HSD is a simultaneous multiple comparison: the set of results taken as a whole is significant at the desired p, rather than each individual comparison.
3. Add a test for normality (if I can find the appropriate reference).
I've included Roel's AQ test 1 data in the archive. A blocked ANOVA analysis can be run on it by typing:
friedman -a aq1.txt
The zip archive is located at:
http://www.worldzonesupport.com/~fastforw/friedman/friedman110.zip
Err, I think you forgot to include the AQ test 1 data. There's no .txt file in there.
I'd like to point out two things just so there's no confusion:
a) the AQ test data is not normally distributed, so the results you'll get from a blocked ANOVA on it are not reliable (the equal-interval assumption may also be doubtful)
b) the message 'The following comparisons are each true with 95.0 percent confidence' is misleading because it hides the number of comparisons actually done and the significance level of each result.
For example: if I look at the results I see 4 pairs, each with 95% confidence. As explained before, the chance that at least one of those is wrong is greater than 5%. Omitting that data makes it impossible to determine exactly how much greater.
Edit: for example, if I know there were 8 codecs, then using the data (all at 95% confidence) I'd estimate that the chance that all four presented results are correct is as low as 24%. I assume it's actually higher, but there's no way to tell from the output.
I think that presenting the data in that way makes it too easy for someone who isn't aware of the details behind it to draw a wrong conclusion.
Of course, depending on what you want to achieve with the utility this may or may not matter.
(ack, two boards, two posts, two threads?)