If the confidence intervals are so tight there is no way these two are statistically equal.
Quite the opposite - the confidence intervals are so broad that they all overlap - so there are no winners and no losers in that sample.
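To illustrate what "they all overlap" means, here is a minimal sketch, not the method used for the test's plots. It assumes a normal-approximation 95% interval around each codec's mean rating, and the ratings are hypothetical placeholders, not data from the test. With only a handful of listeners a t-based interval would be wider still.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(ratings, level=0.95):
    """Normal-approximation confidence interval for the mean rating."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    centre = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return (centre - half_width, centre + half_width)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical ratings from six listeners for two codecs (illustration only).
codec_a = [4.2, 3.1, 4.8, 2.9, 4.0, 3.5]
codec_b = [3.9, 4.5, 3.0, 4.1, 3.3, 4.6]

ci_a = confidence_interval(codec_a)
ci_b = confidence_interval(codec_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

The fewer listeners there are, the larger the half-width gets, which is why a small sample tends to produce intervals that all overlap.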
BTW, how many listeners do you consider too few?
To make me happy, I need at least 20 valid results/sample.
Is there a way you can upload the text files with the individual ranks for each sample tested? It is a real pain to build the tables manually from the XML files, and I have to exclude ranked references from the start. I'm asking because maybe I can help with providing statistical results for these samples.
First, download the .rar package containing all the XMLs. Decompress it to an empty folder.
Then install Python and Phong's wonderful Chunky:
http://www.phong.org/chunky/
In the folder where you decompressed the RAR, run:
python "C:\path\to\chunky" -n --codec-file="C:\path\to\codec\list\codecs.txt" --ratings=results --warn -p 0.05
The codecs.txt should be:
1, Vorbis
2, MPC
3, Lame
4, iTunes
5, Atrac3
6, WMA
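If you'd rather not type the file by hand, a small sketch like this writes the same six lines. The helper name write_codec_list is made up for illustration, and the "number, name" layout is simply copied from the listing above.

```python
def write_codec_list(path, names):
    """Write one 'index, name' line per codec, numbering from 1."""
    with open(path, "w", encoding="ascii") as handle:
        for index, name in enumerate(names, start=1):
            handle.write(f"{index}, {name}\n")


CODECS = ["Vorbis", "MPC", "Lame", "iTunes", "Atrac3", "WMA"]

# Example call (the path is up to you):
# write_codec_list("codecs.txt", CODECS)
```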
It'll create all the result tables (ready to be fed to friedman.exe) in that folder, and will discard the ranked results that haven't been ABXed at the 0.05 significance level. Chunky is just too wonderful to be true! OMG!
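For anyone curious what a Friedman analysis does with such a table, here is a rough sketch of the textbook Friedman chi-square statistic. It is not friedman.exe's actual code, it leaves out the tie correction and the p-value lookup, and the table is hypothetical (one row per listener, one column per codec), not real test data.

```python
def average_ranks(row):
    """Rank the values in one row (1 = lowest); tied values share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for n listeners (rows) by k codecs (columns), no tie correction."""
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_of_squares = sum(r * r for r in rank_sums)
    return 12.0 * sum_of_squares / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical ratings: three listeners, three codecs, everyone agrees on the order.
table = [
    [2.0, 3.5, 4.8],
    [1.5, 3.0, 4.0],
    [2.2, 2.9, 4.5],
]
print(friedman_statistic(table))
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; a large value means the codecs' rank sums differ more than chance would explain.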
Regards;
Me.