It disturbs me that you don't ABX a fixed number of rounds.
Perhaps "disturbs" is the wrong word, but why do you not pick a number of rounds and stick with it for all testing?
Are you watching the results and adjusting the number of rounds "on the fly"?
I did 2 rounds on those ABX tests; those results are from my second test.
Anyway, I did another round, this time in ABC/HR (ABC Hidden Reference), which makes the test harder.
LAME 3.97 -V0 --vbr-new
1 of 1, p = 0.500
2 of 2, p = 0.250
2 of 3, p = 0.500
3 of 4, p = 0.313
4 of 5, p = 0.188
5 of 6, p = 0.109
5 of 7, p = 0.227
6 of 8, p = 0.145
7 of 9, p = 0.090
7 of 10, p = 0.172
7 of 11, p = 0.274
8 of 12, p = 0.194
9 of 13, p = 0.133
10 of 14, p = 0.090
11 of 15, p = 0.059
12 of 16, p = 0.038
13 of 17, p = 0.025
14 of 18, p = 0.015
15 of 19, p = 0.010
16 of 20, p = 0.006
ABC/HR Version 1.0, 6 May 2004
Testname:
1L = C:\Temp\Dope Hat V0.wav
---------------------------------------
General Comments:
---------------------------------------
ABX Results:
Original vs C:\Temp\Dope Hat V0.wav
16 out of 20, pval = 0.006
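
For anyone wondering where the p column in the log comes from: as far as I can tell it matches the one-sided binomial probability of getting at least that many trials right by pure guessing (50% per trial). Here is a minimal Python sketch under that assumption; the function name abx_p_value is mine, not something taken from ABC/HR.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing.

    Assumes each trial is an independent 50/50 guess, i.e. a one-sided
    binomial test. This is an assumption about how the tool computes p,
    checked only against the numbers in the log.
    """
    # Count the guess sequences with `correct` or more hits ...
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    # ... out of all 2**trials equally likely sequences.
    return favourable / 2 ** trials


# A few (correct, trials) pairs taken from the log
for correct, trials in [(1, 1), (7, 10), (12, 16), (16, 20)]:
    print(f"{correct} of {trials}, p = {abx_p_value(correct, trials):.3f}")
```

For those four lines this prints 0.500, 0.172, 0.038 and 0.006, the same as the log. Exact ties in the last digit, such as 3 of 4 (0.3125), may print as 0.312 instead of 0.313 depending on the rounding rule used.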