Quote: One more thing: I'm not sure if it hurts to look at the progress of the first 5 trials for the 28-trial profile.
If you want the tool to be statistically sound, then don't let the listener see the progress at all, even at the look points. What does it add to the test anyhow? You've now taken steps to ensure that the listener isn't wasting time (very nice solution, btw). As far as I can tell, saving wasted time was the only valid reason for allowing the listener to watch the progress in the first place. As I've said before, knowing the progress of the test compromises the independence of the trials and should be avoided if possible.
Are there any other profiles that might be useful?
Quote: Perhaps a traditional 12/16 test with a look point at 6/6. This still gives 95% confidence. It could also terminate if more than 4 incorrect choices were made.
To answer your other point, there is no statistical advantage to being able to look at progress. This is purely driven by convenience and time savings. However, if I can make it easier and faster to perform ABX trials with only a slight cost in power, I think that's a good tradeoff.
Sorry if I seem to be pressing this... but what are the time savings of allowing the listener to track his progress? How does it make the test easier and faster? (I'm not talking about the automated look points here; they are a good trade-off. I'm just referring to the listener being able to see his score all, or part of, the time. I think this introduces potential, though probably not very serious, problems.)
The time savings arise because at each look point you can decide whether or not to stop the test early. With a strictly fixed test, the listener isn't allowed to know the results until all the trials have been completed.
I'd still argue that the same time savings could be achieved by allowing the software to deal with the look points automatically. Once a look point was reached, the test would either terminate (because the desired confidence was reached) or go on as if nothing had happened. The listener would not have to know his exact score. This is indeed different from a strictly fixed test where the listener isn't allowed to know the results until all the trials have been completed... but not by much.
Here is the lookup table I would use for the 28-trial profile:
*0 wrong: at least 6 of 6 (can't have fewer than 6 trials with 0 wrong)
1 wrong: at least 9 of 10 (can't have fewer than 10 trials with 1 wrong)
*2 wrong: at least 10 of 12
3 wrong: at least 13 of 16
*4 wrong: at least 14 of 18
5 wrong: at least 17 of 22
*6 wrong: at least 17 of 23
7 wrong: at least 19 of 26
*8 wrong: at least 20 of 28
Notes:
* = look points
1. overall test significance is 0.05
2. listener is not allowed to perform ABX trials past the max of 28.
3. listener is allowed to see trials 1 through 5 in addition to the early-decision look points
4. ABX is terminated if listener gets 9 or more trials wrong.
5. listener can terminate at any time, with overall results taken from the above table.
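For anyone who wants to check the overall significance of a look-point table without a simulator, here is a minimal sketch in Python. It is not the tool's actual code. It assumes the test can only be passed at the starred look points (6 of 6, 10 of 12, 14 of 18, 17 of 23, 20 of 28), that the listener is purely guessing (p = 0.5), and it ignores the peek at trials 1 through 5 and voluntary termination. The names LOOKS_28 and overall_alpha are made up for illustration.

```python
from fractions import Fraction

# Look points taken from the table above: trial number -> minimum correct.
LOOKS_28 = {6: 6, 12: 10, 18: 14, 23: 17, 28: 20}

def overall_alpha(looks):
    """Exact chance that a pure guesser (p = 0.5) passes at any look point."""
    alive = {0: Fraction(1)}   # correct-so-far -> probability of still running
    passed = Fraction(0)
    for trial in range(1, max(looks) + 1):
        nxt = {}
        for correct, prob in alive.items():
            for gain in (0, 1):            # wrong or right, each with p = 1/2
                key = correct + gain
                nxt[key] = nxt.get(key, Fraction(0)) + prob / 2
        alive = nxt
        if trial in looks:
            need = looks[trial]
            # Sessions that reach the threshold stop here and count as passes.
            passed += sum(p for c, p in alive.items() if c >= need)
            alive = {c: p for c, p in alive.items() if c < need}
    return passed

alpha = overall_alpha(LOOKS_28)
print(float(alpha))
```

Because everything is done in exact fractions, there is no rounding to worry about. Under these assumptions the 28-trial table comes out just under 0.05 (about 0.049), in line with the stated overall significance. Other profiles can be checked by passing a different dictionary.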
It is true that, if the listener cannot hear a difference, his knowing the progress will not change anything.
So my question now is: how does allowing the listener to track his progress help the test? (curiosity is one reason, but are there others that I'm missing?)
This should be a mode that allows 5/5 with total significance = 0.049567:
at least 5 of 5
at least 10 of 12
at least 15 of 19
at least 17 of 22
Not that different from the 28 profile (though shorter), but with 5/5 possibility. Might be good for finding obvious differences.
On difficult samples, I like to know if my efforts are enough. If my score is not as good as it should be, I can try to listen more carefully (but causing more fatigue). For me this information is very useful!
Quote: It is true that, if the listener cannot hear a difference, his knowing the progress will not change anything.
Of course, you are only talking about the first 5 trials?
The accurate value appears to be 0.05080.. (according to my program/calculation).
How about something like the following, where the last look point is also spaced 6 trials from the next-to-last look point, instead of only 3 trials:
5 of 5: 0.031
10 of 12: 0.019
15 of 19: 0.010
19 of 25: 0.007
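The per-look figures in that list look like plain one-sided binomial tails for a guessing listener, so they can be recomputed directly. A small Python sketch, assuming p = 0.5 and that each value is the chance of scoring at least that well in a fixed test of that length (the function name tail is just for illustration):

```python
from fractions import Fraction
from math import comb

def tail(correct, trials):
    """P(at least `correct` right out of `trials`) for a pure guesser."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return Fraction(hits, 2 ** trials)

# The four look points proposed above.
for correct, trials in [(5, 5), (10, 12), (15, 19), (19, 25)]:
    print(f"{correct} of {trials}: {float(tail(correct, trials)):.3f}")
```

Note that these are the tails at each look taken on its own. The total alpha of the sequential test is not their sum, because a session that already stopped at an earlier look cannot be counted again at a later one.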
Hmm, I can't verify this using my simulator. I made the total alpha precise to 4 digits and increased the simulations to 1 million, but come up with 0.0496.
Quote: It is true that, if the listener cannot hear a difference, his knowing the progress will not change anything.
Quote: Of course, you are only talking about the first 5 trials?
If the true value of p = 0.5, then it doesn't matter how much the listener knows; he will always be guessing. That's all I was trying to say.
It does matter if he can stop the test when it's advantageous to him. In fact, with probability 1 a guessing listener could pass any traditional ABX test if he is allowed to take enough trials.
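To put some numbers on this, here is a rough Python sketch of the "stop when it's advantageous" strategy. It assumes a guessing listener (p = 0.5) who checks the ordinary fixed-test p-value after every single trial and stops the first time it is 0.05 or less. It only covers a finite cap on the number of trials, so it illustrates how the pass rate grows rather than proving the probability-1 limit. The names critical_value and pass_probability are made up for this example.

```python
from math import comb

def critical_value(trials, level=0.05):
    """Smallest score whose one-sided binomial p-value is <= level."""
    tail = 0
    for correct in range(trials, -1, -1):
        tail += comb(trials, correct)
        if tail > level * 2 ** trials:
            return correct + 1     # may be trials + 1: no score passes yet
    return 0

def pass_probability(max_trials, level=0.05):
    """Chance a guesser reaches p <= level at some point within max_trials."""
    alive = {0: 1.0}               # correct-so-far -> probability, not yet passed
    passed = 0.0
    for trial in range(1, max_trials + 1):
        nxt = {}
        for correct, prob in alive.items():
            nxt[correct] = nxt.get(correct, 0.0) + prob / 2
            nxt[correct + 1] = nxt.get(correct + 1, 0.0) + prob / 2
        need = critical_value(trial, level)
        passed += sum(p for c, p in nxt.items() if c >= need)
        alive = {c: p for c, p in nxt.items() if c < need}
    return passed

for cap in (5, 16, 50, 200):
    print(cap, round(pass_probability(cap), 3))
```

With a cap of 5 trials the only way to pass is 5 of 5, so the chance is 1/32. As the cap is raised the chance keeps climbing past the nominal 0.05, which is why a table with a fixed maximum and a limited set of look points is needed.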
Quote: 5. listener can terminate at any time, with overall results taken from the above table.
The last point is a little dubious to me. But it shouldn't affect the results too much.
But in a nutshell, I run N number of simulations of a 28 total-trial ABX session. At each look point, including the 28th trial, I count the number of times that the number of correct answers equals or exceeds the specified entry at that look point. I call this a "hit." If I get a hit at a look point, I terminate and go on to the next simulation run. Then I count all the hits and divide by the number of simulations to get the total alpha.
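The procedure just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet macro itself. It assumes the look points of the 28-trial table (6 of 6, 10 of 12, 14 of 18, 17 of 23, 20 of 28) and a listener who guesses right half the time. LOOKS and simulate_alpha are hypothetical names.

```python
import random

# Look points assumed from the 28-trial table: trial -> minimum correct.
LOOKS = {6: 6, 12: 10, 18: 14, 23: 17, 28: 20}

def simulate_alpha(looks, runs, seed=1):
    """Fraction of simulated guessing sessions that score a hit at a look point."""
    rng = random.Random(seed)
    last = max(looks)
    hits = 0
    for _ in range(runs):
        correct = 0
        for trial in range(1, last + 1):
            correct += rng.getrandbits(1)    # guesser: right half the time
            if trial in looks and correct >= looks[trial]:
                hits += 1                    # a "hit": count it once
                break                        # and terminate this session
    return hits / runs

print(simulate_alpha(LOOKS, 100_000))
```

The result depends on the seed and the number of runs. With 100,000 runs the standard error is roughly 0.0007, so more runs are needed to resolve the fourth decimal place.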
The only thing I can think of right now is that there is a rounding error in the calculation (there are a lot of sums in the calculation). From this standpoint, the simulation should be more accurate.
Any thoughts on the non-even spreading of the alpha error?
I might need some explaining on the macros in your spreadsheet.
Edit: anyway, there doesn't seem to be any reason to believe that the simulation would produce an oscillating effect like that, so I have to think that this is an artifact of the binomial calculation!
>alpha:=(correct,trials)->evalf(sum(binomial(trials,k)*1/2^trials,k=correct..trials));
>for i from 0 to 20 do
> alpha(i,20);
> end do;
1.
.9999990463256835937500000
.9999799728393554687500000
.9997987747192382812500000
.9987115859985351562500000
.9940910339355468750000000
.9793052673339843750000000
.9423408508300781250000000
.8684120178222656250000000
.7482776641845703125000000
.5880985260009765625000000
.4119014739990234375000000
.2517223358154296875000000
.1315879821777343750000000
.05765914916992187500000000
.02069473266601562500000000
.005908966064453125000000000
.001288414001464843750000000
.0002012252807617187500000000
.00002002716064453125000000000
.9536743164062500000000000*10^-6
Quote: Edit: anyway, there doesn't seem to be any reason to believe that the simulation would produce an oscillating effect like that, so I have to think that this is an artifact of the binomial calculation!
What do you mean?! Here are accurate values of alphas (again from Maple):
[CODE]
>alpha:=(correct,trials)->evalf(sum(binomial(trials,k)*1/2^trials,k=correct..trials));
[/CODE]
10 million simulations using the corrected random number generator:

trials  correct  total alpha  look point?  looks
     5        5       0.0313  no           no looks
     6        6       0.0491  yes          look at trial 6
     7        7       0.0156  no           look at trial 6
     8        7       0.0390  no           look at trial 6
     9        8       0.0273  no           look at trial 6
    10        9       0.0214  no           look at trial 6
    11        9       0.0406  no           look at trial 6
    12       10       0.0491  yes          look at trial 6, 12
    13       11       0.0295  no           look at trial 6, 12
    14       11       0.0417  no           look at trial 6, 12
    15       12       0.0356  no           look at trial 6, 12
    16       13       0.0326  no           look at trial 6, 12
    17       13       0.0424  no           look at trial 6, 12
    18       14       0.0491  yes          look at trial 6, 12, 18
    19       14       0.0495  no           look at trial 6, 12, 18
    20       15       0.0430  no           look at trial 6, 12, 18
    21       16       0.0399  no           look at trial 6, 12, 18
    22       16       0.0487  no           look at trial 6, 12, 18
    23       17       0.0491  yes          look at trial 6, 12, 18, 23
    24       18       0.0435  no           look at trial 6, 12, 18, 23
    25       18       0.0490  no           look at trial 6, 12, 18, 23
    26       19       0.0462  no           look at trial 6, 12, 18, 23
    27       20       0.0449  no           look at trial 6, 12, 18, 23
    28       20       0.0491  no           look at trial 6, 12, 18, 23