HydrogenAudio

Hydrogenaudio Forum => General Audio => Topic started by: KikeG on 2003-11-10 13:39:12

Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-10 13:39:12
Edit: this discussion was started there (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=15&t=14960).

I like viewing my ABX results during the test and stopping once I have reached a low enough p. That's what some people call a "sequential" ABX test. As Pio2001 says, for this kind of test, getting p = 5% is not enough to pass. It seems (someone correct me if I'm wrong, please) that for a sequential test, getting p < 1% is enough to say that the test has been passed.
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 13:55:04
Quote
I like viewing my ABX results during the test and stopping once I have reached a low enough p. That's what some people call a "sequential" ABX test. As Pio2001 says, for this kind of test, getting p = 5% is not enough to pass. It seems (someone correct me if I'm wrong, please) that for a sequential test, getting p < 1% is enough to say that the test has been passed.

Does it mean that 5% is a completely useless value? Or does it mean that with 5-15%, there is still some (serious) presumption of an audible difference?
I'm really interested in this.
I generally perform ABX tests quickly, making some mistakes I could avoid by being more meticulous (listening carefully to A, listening carefully to B, etc., then validating my choice). I'm sure that I can avoid most of them and obtain very good ABX scores, because sometimes I do take the time to perform a precise, long and boring test. Quick tests give me pvals of 5...15%, and meticulous ones are < 1%.
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-10 14:18:52
5% is valid only if either you have fixed the total number of trials in advance, or you don't look at the results until the whole test is finished (or both, obviously).

I wouldn't consider a p of 10% a very reliable indication of audible difference.
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-10 14:22:03
Is there a rule of thumb for tests without a fixed number of trials to get a more realistic result?

Something like: "If the last trial was successful, it doesn't count."
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 14:25:54
Quote
5% is valid only if either you have fixed the number of trials in advance, or you don't look at the results until the test is finished (or both, obviously)

I wouldn't consider a p of 10% a very reliable indication of audible difference, either.

Should I understand that in your opinion, pval = 10% and pval = 95% mean the same thing? Or is there some nuance between "valid" and "invalid" results?

I'm not a statistician, just an average user. To common sense, a test that indicates a difference with only a 10% chance of guessing is something to take seriously (especially for high-bitrate encodings and for very short samples).
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-10 14:51:04
Quote
Should I understand that in your opinion, pval = 10% and pval = 95% mean the same thing? Or is there some nuance between "valid" and "invalid" results?

I don't understand what you mean, sorry. pval=5% (or 0.05) means a 95% confidence level, and pval=10% (or 0.1) means a 90% confidence level.

As a rule, a test is considered to be passed only if you achieve p<5% on a non-sequential test. It seems that p<1% is enough for sequential tests, in order to compensate for the effects Pio2001 talked about.
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-10 14:53:03
Quote
Is there a rule of thumb for tests without a fixed number of trials to get a more realistic result?

Something like: "If the last trial was successful, it doesn't count."

I don't quite understand what you mean in this last sentence. I think I explained it in my previous posts: 5% is valid only under certain conditions. If they aren't met, you must go for 1% or lower.
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-10 15:10:15
Quote
I don't understand very well what you mean in this last sentence.

I'll try to explain better:

IMO, especially for difficult samples, it's unpredictable how many trials are needed: on the one hand, performance often improves after a few trials (training effect); on the other hand, at some point it starts to get worse because of fatigue / boredom / impatience.

Because of this I would prefer to perform tests this way:

I aim to reach a certain pval score and finish as soon as I've reached it. Example:

I want to reach p = 0.1 or lower. My results:

0 of 1, p = 1.000
1 of 2, p = 0.750
2 of 3, p = 0.500
3 of 4, p = 0.313
4 of 5, p = 0.188
4 of 6, p = 0.344
5 of 7, p = 0.227
6 of 8, p = 0.145
7 of 9, p = 0.090

Since I haven't fixed the number of trials beforehand, the result isn't really 0.090, as you explained. My question is:

How would it be possible to get a valid result (p = 0.1 or lower) without fixing the number of trials beforehand?

"If the last trial was successful, it doesn't count." would mean that one more trial needs to be done; if it is successful (-> 8/10), the result p = 0.090 is correct, otherwise the test will continue.
Title: Probability of passing a sequencial ABX test
Post by: ScorLibran on 2003-11-10 15:18:33
Quote
I want to reach p = 0.1 or lower...

p = 0.1 = 10%.

I think you mean p = 0.01 = 1% for a sequential test, right?

(Forgive me if I'm not reading this correctly.)
Title: Probability of passing a sequencial ABX test
Post by: Lev on 2003-11-10 15:19:40
Quote
Usually, a p <= 0.05 is considered a significant result. This is pretty close though, which to me indicates that there is a very good chance that more testing would result in a statistically significant result.

So, Gabriel scored p = 0.084, or, 17 out of 26.

To me, that is nowhere near significant. On any given day, if I flip a coin 26 times, I could get 17 heads. Yet Gabriel's confidence level is ~92%, which seems excessive. I guess it's just the English I'm having trouble with, i.e. the word 'confidence'...

How is the P Value calculated, just out of interest?
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-10 15:43:14
Quote
Quote
I want to reach p = 0.1 or lower...

p = 0.1 = 10%.

I think you mean p = 0.01 = 1% for a sequential test, right?

It was just an example and I chose 0.1, not 0.01 to save space.
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-10 15:44:53
Quote
How is the P Value calculated, just out of interest?

B) (http://www.hydrogenaudio.org/forums/index.php?showtopic=14679&hl=pval) <- click!
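
For readers who can't follow the link: going by the definition used later in this thread (the chance of getting this score or a better one by simply guessing), the p-value is a one-sided binomial tail with a 50/50 chance per trial. Below is a minimal Python sketch under that assumption. The name abx_p_value is made up for illustration and is not taken from ABC/HR, WinABX or foobar2000.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring `correct` or more out of `trials` by pure guessing."""
    # Count the equally likely right/wrong sequences with at least `correct` hits.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Scores quoted in this thread
for correct, trials in [(7, 9), (17, 26), (18, 26)]:
    print(f"{correct} of {trials}, p = {abx_p_value(correct, trials):.3f}")
```

This prints 0.090, 0.084 and 0.038, which agrees with the figures posted above for 7 of 9, 17 of 26 and 18 of 26.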
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 16:33:22
Quote
Quote
Should I understand that in your opinion, pval = 10% or pval = 95% mean the same thing? Or is there some nuance between "valid" statement and "invalid" results?

I don't understand what you mean sorry. pval=5% (or 0.05) means 95% confidence value, and pval=10% (or 0.1) means 90% confidence value.

As a rule, a test is considered to be passed only if you achieve p<5% on a non-sequential test. It seems that p<1% is enough for sequential tests, in order to compensate for the effects Pio2001 talked about.

What I mean is: if a test finishes with a score of xx/16 and pval = 0.10, how do you regard this test? You're asking for pval = 0.05 to consider a test "passed", but what about 0.10 or 0.15? Would you consider such a score bad? Useless? Without real significance? In other words, does pval = 0.15 (some errors during ABX) have the same meaning as pval = 0.95 (a lot of errors during ABX)?
Title: Probability of passing a sequencial ABX test
Post by: AstralStorm on 2003-11-10 16:45:39
I'd very much appreciate an option (in ABC/HR and its Java counterpart) to clear the ABX results after changing the selected time,
since I like to use ABX to find differences, and misses make the score go bad before I find the part I feel I'm able to ABX.
Maybe an option to clear the results? That would help to reduce the warm-up effect.
(You can do the test any number of times before recording the results.)
Title: Probability of passing a sequencial ABX test
Post by: Gabriel on 2003-11-10 16:54:53
Based on the current discussion, I tried an experiment:
Completely random guessing with ABC/HR (not even listening to the files).
Result:
18 of  26, p = 0.038

I am wondering if there could be something wrong with the computations...
Title: Probability of passing a sequencial ABX test
Post by: Gabriel on 2003-11-10 16:58:04
Still continuing the experiment:
54 of  85, p = 0.008
(still random)
....

72 of 114, p = 0.003
...

78 of 127, p = 0.006

That is incredible: I can randomly generate meaningful results.

2 possibilities:
*I am gifted and am able to do some divination
*we can not trust the current results of abc/hr
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 17:01:08
Quote
Based on the current discussion, I tried an experiment:
Completely random guessing with ABC/HR (not even listening to the files).
Result:
18 of  26, p = 0.038

I am wondering if there could be something wrong with the computations...

I sometimes had the same results.
Like AstralStorm, I find the lack of a RESET function annoying. Therefore, I artificially restart a test by reaching 100 trials, then performing another 16 trials. It's sometimes amusing to see the pval score at 100 trials. With a good but not perfect start (something like 18/26), the final pval of xx/100 is sometimes below 0.05!

There must be something wrong with the p-value, especially when many trials are performed.
Title: Probability of passing a sequencial ABX test
Post by: Gabriel on 2003-11-10 17:04:30
....
119 of 200, p = 0.004

...

(am I god?)
Title: Probability of passing a sequencial ABX test
Post by: ErikS on 2003-11-10 17:15:16
Quote
....
119 of 200, p = 0.004

...

(am I god?)

Go and buy a lottery ticket while your luck still holds!
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 17:25:07
Quote
....
119 of 200, p = 0.004

...

(am I god?)

No, you missed the right answer about 80 times, but the pval tells you that there is little chance you were guessing. 7/8 is less significant than 60/100... In other words, if you're planning to perform a difficult ABX test, it's easier to obtain significant results by targeting 100 trials than 8.

Note that foobar2000 ABX component didn't compute pvalue after 20 trials.
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 17:49:43
In ABX tests, if you make one error in every three trials, then you will have:

6/9: p = 0.254
10/15: p = 0.151 (15%)
20/30: p = 0.049 (<5%)
30/45: p = 0.018 (<2%)

more trials = more significant results.


It's good to know that. If you try to ABX something difficult and want to prove that you're right, better 50 trials than 16 ;-)
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-10 18:00:19
Quote
119 of 200, p = 0.004

I wonder what WinABX uses to generate random numbers. This might mean that there is a deficiency in it.
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 18:17:39
Quote
....
119 of 200, p = 0.004

...

(am I god?)

What version of ABCHR are you using? There was some doubt whether the random number generator used in older versions is reliable or not.

The result does indeed mean: The probability to score 119 or better out of 200 by guessing is 0.0043.
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 18:28:43
Quote
Does it mean that 5% is a complete useless value? Or does it mean that with 5-15%, there are still some (serious) presumptions about an audible difference?
I'm really interested about it.

Then you could read the Statistics For Abx-thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175) (long!).

But to give you an idea of how much the results are affected: think of a guessing tester who stops the test as soon as he reaches 0.95 confidence or the maximal length (=: m) of the test. The probabilities of him passing the test are:

m=10 => p-val = 0.0508
m=20 => p-val = 0.0987
m=30 => p-val = 0.1295
m=50 => p-val = 0.1579
m=100 => p-val = 0.2021

See this excel sheet (http://stud4.tuwien.ac.at/~e0025119/CorrPVal5.xls) for reference.
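
The scenario above can also be sketched in a few lines of Python. This is not the macro from the Excel sheet, only an illustration under my own assumptions: the guesser is right with probability 1/2 on every trial, the p-value is the one-sided binomial tail, and he stops the first time it is at or below 0.05. The function names are hypothetical.

```python
from math import comb

def passes(correct: int, trials: int, alpha: float) -> bool:
    """True if the one-sided binomial p-value of this score is <= alpha."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail <= alpha * 2 ** trials

def prob_guesser_passes(max_trials: int, alpha: float = 0.05) -> float:
    """Chance that a pure guesser reaches p <= alpha at some point within max_trials."""
    # alive[c] = probability of having c correct answers so far without having passed
    alive = {0: 1.0}
    passed = 0.0
    for n in range(1, max_trials + 1):
        still_alive = {}
        for c, pr in alive.items():
            for c_next in (c, c + 1):  # wrong guess or right guess, 50/50
                if passes(c_next, n, alpha):
                    passed += pr / 2  # the tester stops here and claims success
                else:
                    still_alive[c_next] = still_alive.get(c_next, 0.0) + pr / 2
        alive = still_alive
    return passed

for m in (10, 20, 30, 50, 100):
    print(f"m={m} => {prob_guesser_passes(m):.4f}")
```

For m=10 the only ways to pass are 5/5 and then 7/8, which gives 1/32 + 5/256 = 13/256, about 0.0508, the same as the first row of the table. I have not checked the other rows by hand.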
Title: Probability of passing a sequencial ABX test
Post by: ff123 on 2003-11-10 18:31:06
Quote
Still continuing the experiment:
54 of  85, p = 0.008
(still random)

Verified that this is the correct p using simulation:

http://ff123.net/abx/abx.php (http://ff123.net/abx/abx.php)

If you were to repeat this 85 trial test many, many times, you would find that you can get a score of 54/85 or better (by guessing randomly), with a probability of 0.008.

I have put the output of ABC/HR several times through a random number generator "runs test" and it has passed.

During one of the previous versions, Hans Heijden thought the random number generation was suspicious (it showed moderate evidence against randomness in a runs test for his particular sequence), but when I tried it myself, it passed.  I changed the random number generator for good measure, though, so as not to rely on random().  The built-in random function, at least with Visual C++ 6.0, did not appear to give completely random initial numbers when using time values which were fairly close together (on older versions of ABC/HR, I kludged this by initializing random() twice).

I purposely force the cumulative calculation of p in ABC/HR to prevent cherry picking, but one improvement that it really could use is the addition of p-value "profiles," to allow for statistically valid sequential testing to occur.  A typical profile which Continuum and I analyzed would allow for a maximum of 28 trials.

ff123

Edit:  ABC/HR also uses the Mersenne Twister to generate random numbers
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 18:34:14
Just a few thoughts on this whole discussion: In general I think there's no need to be all that dogmatic about the issue of "when can a test be considered meaningful?". You see, that's why all those programs compute the p-val instead of just saying "test passed!" or "test failed!". The p-val tells you exactly one thing: What is the chance of achieving this result (or an even better one) by simply guessing. Now, if someone writes "I did an ABX test, and I received a p-val of 0.1", then it's up to the reader to decide whether he considers this good enough or not.
You might very well say "Damn, if I hand a few headphones to 10 deaf monkeys, I have a nice 65% chance that at least one of them receives that result, so how the hell can this be meaningful?" or you might just say "Well, if I really had been guessing, there's a 90% chance to do worse, that should be enough".
Of course, that depends on the circumstances. So if you just do some quick private tests (this might apply to guruboolez's question), and you're pretty sure you hear something, and you don't need perfect results anyway, then even a p-val of 0.1 might be enough for you. On the other hand, if you're trying to prove in public that flac sounds worse than wav (or something like that), you'd better make sure you can support that with a strong p-val if you expect people to believe your claims.

Now to some of the more recent posts:
Quote
Still continuing the experiment:
54 of 85, p = 0.008
(still random)
....

72 of 114, p = 0.003
...

78 of 127, p = 0.006

That is incredible: I can randomly generate meaningful results.

2 possibilities:
*I am gifted and am able to do some divination
*we can not trust the current results of abc/hr

Quote
....
119 of 200, p = 0.004

...

Well, I'd have to put my money on the first possibility. All the p-vals are correct. If you did this test systematically (like always saying "X is A"), then there might of course be the third possibility, that the random number generator favors A over B.
Quote
In ABX tests, if you make one error in every three trials, then you will have:

6/9: p = 0.254
10/15: p = 0.151 (15%)
20/30: p = 0.049 (<5%)
30/45: p = 0.018 (<2%)

more trials = more significant results.

This is correct.
Quote
It's good to know that. If you try to ABX something difficult, and to prove that you're right, better 50 trials than 16 ;-)

This is not. Actually these results make perfect sense. By guessing, you might very well guess two thirds of the trials correctly if you only do a few. But it's extremely improbable that you can maintain this two-thirds-streak for like 100 trials, if you really are only guessing. Conversely, if someone really manages to get two thirds right for 100 trials, you can be pretty sure he heard a difference.
So, if you can't hear any difference but you just really want to have a great ABX result, you should do quite the opposite of what guruboolez suggests. If your result is already good at 16 trials, then by no means continue to 50; you'll only mess it up.
Quote
I'd very much appreciate an option (in ABC/HR and its Java counterpart) to clear ABX results after changing selected time,
as I like to use ABX to find differences as misses make the score go bad before I find the part I feel I'm able to ABX.
Maybe an option to clear the results? That would help to reduce warm-up effect.
(you can do the test any number of times before recording the results)

I don't think that's a good idea. It would make the results much less meaningful. If someone gets a p-val of 0.05 with one test, this is a pretty reliable result. But if he restarts the test 15 times, chances are he will get a p-val of 0.05 at least once (supposing he fixed the number of trials).
Also, I don't think the lack of a restart function poses much of a problem. You don't need to get a "perfect" score of 8/8 every time. If you messed up some trials in the beginning, but after that you can hear the difference reliably, you can just do some more trials and the p-val will decrease rapidly. A short example: you start your test, and you can't hear a difference in the beginning. On top of that, you have some serious bad luck, so you get only 2/8 correct (p-val of 0.96). But after that you can hear the difference very reliably, so you do some more trials (which will probably go much quicker than in the beginning), and you manage to get 15/16 correct. Summed up, that's 17/24 with a respectable p-val of 0.03.
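
To put a rough number on the restart problem, here is a small Python sketch. It assumes that every attempt is an independent guessing run which reaches the target p-val with probability exactly alpha. With real ABX scores the chance per run is a bit lower because only whole numbers of correct answers are possible, so read the results as an upper bound. The function name is made up for illustration.

```python
def chance_of_fluke(alpha: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent guessing runs reaches p <= alpha."""
    # Complement of "every single attempt fails to reach the target".
    return 1 - (1 - alpha) ** attempts

print(f"{chance_of_fluke(0.05, 15):.2f}")  # 15 restarts, target p-val 0.05
print(f"{chance_of_fluke(0.10, 10):.2f}")  # 10 deaf monkeys, target p-val 0.1
```

This prints 0.54 and 0.65: a guesser who may restart 15 times is more likely than not to end up with a "passed" test, and the second number is the 65% from the monkey example.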
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-10 18:36:18
The randomizer in WinABX does seem to be deficient. By repeatedly choosing a-b-a-b-a-b-... I got 114/202 (pval=0.039). With foobar2000's ABX component (which uses the Mersenne Twister to generate random numbers) I only got 106/202 with that strategy, which corresponds to a pval of ~0.25.

Edit: One might wonder why I did 202 trials and not 200... well, I simply didn't stop in time
Title: Probability of passing a sequencial ABX test
Post by: guruboolez on 2003-11-10 19:14:12
Quote
This is not. Actually these results make perfect sense. By guessing, you might very well guess two thirds of the trials correctly if you only do a few. But it's extremely improbable that you can maintain this two-thirds-streak for like 100 trials, if you really are only guessing. Conversely, if someone really manages to get two thirds right for 100 trials, you can be pretty sure he heard a difference.

Good point. But it clearly means that we have to be careful with pval. For example, when KikeG said that he wouldn't trust pval > 0.05 (too much), this means that if people want to convince him, it's better to send him a 30/45 than a 10/15. Or, put differently, if you have difficulty maintaining good concentration and achieving a good ABX score over 16 trials, then rather than performing another test, you should resume the first one and go on to 45...50 trials. This supposes, of course, that the tester is able to keep two thirds right over 50 trials. I'm sure that I could do it with some difficult samples: when 16/16 is strictly impossible, 30/45 isn't too difficult (not for ABXing FLAC & PCM, of course ;-)). I have often "failed" ABX tests: I did three, four or five different sessions of 16 trials, and all were 11/16 or 12/16. If I had decided to merge the small tests into one big 60-trial test, the conclusion would have changed from "failed" to "succeeded".

I agree with your first comment ("there's no need to be all that dogmatic about the issue of "when can a test be considered meaningful?""). ABX scores are nothing without precise comments about the conditions of the test. For example, I have often had 10/12 on anchor-like encodings, but 12/12 on high-quality lossy encodings. The first is so easy that I need 30 seconds for 12 trials (and make stupid mistakes, sometimes with keyboard shortcuts), and the second is so hard that I need 15 minutes to perform it, taking "breaks" in order to keep my ears fresh.
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-10 19:56:27
Quote
Quote
Does it mean that 5% is a complete useless value? Or does it mean that with 5-15%, there are still some (serious) presumptions about an audible difference?
I'm really interested about it.

Then you could read the Statistics For Abx-thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175) (long!).

But to give you an idea of how much the results are affected: think of a guessing tester who stops the test as soon as he reaches 0.95 confidence or the maximal length (=: m) of the test. The probabilities of him passing the test are:

m=10 => p-val = 0.0508
m=20 => p-val = 0.0987
m=30 => p-val = 0.1295
m=50 => p-val = 0.1579
m=100 => p-val = 0.2021

See this excel sheet (http://stud4.tuwien.ac.at/~e0025119/CorrPVal5.xls) for reference.

Thanks. This is exactly the answer to my question. IMO this should be integrated into ABX utilities: beforehand, you would enter the confidence you want to reach and whether you want to perform a fixed number of trials or to stop once a certain confidence / a maximum number of trials is reached.

Do you know how these "corrected" values are calculated?
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 20:06:16
Quote
Do you know how these "corrected" values are calculated?

I wrote the sheet so I hope I know it! 

You can read the source (comes with macros) and try to figure out what means what.

Or look at this post (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175&view=findpost&p=31356), for a detailed (and hopefully more understandable) explanation.
Title: Probability of passing a sequencial ABX test
Post by: Mac on 2003-11-10 20:18:48
Quote
The randomizer in WinABX does seem to be deficient, I've got 114/202. With foobar2000's ABX component I only got 106/202 with that strategy.

So with Foobar you had 52.5% correct guesses, and with WinABX you got 56.4% correct?

Unless my 1 minute google search (http://www.google.com/search?q=coin+flip+statistics+deviation&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8) was wrong, both of these are within the +/- 7.1% standard deviation you would expect in a correct/wrong scenario with 202 tests. 

I think your claims about the deficiency of WinABXs randomness are unfounded.
Title: Probability of passing a sequencial ABX test
Post by: AstralStorm on 2003-11-10 20:28:22
Heh, 116/200 is nearly 35% chance of missing according to my calculator.

P-val calculator is certainly wrong.
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 20:37:52
The problem is obvious: in his random test, Gabriel is always more right than wrong! There is no way for this to happen by chance. If the generated sequence is truly random, you should sometimes get more right than wrong, and sometimes more wrong than right.

People seem to consider high confidence to be common when the number of trials rises. No way! High confidence is high confidence, and by definition, a common result has a low confidence! This is the definition of "common" and "confidence".
The example assumes that a correct answer is obtained two times out of three. It is nearly impossible to maintain this just by chance. Sooner or later you'll get two wrong results out of three, and the confidence will collapse.

Maybe it would be interesting to see the logs with every choice of the program and the user. One hypothesis is that the random generator is bad, and there is a correlation between the user's choices and the program's choices. Note that even if the user chooses random answers, there can be a correlation, because people have a very poor idea of randomness and, when asked to guess at random, usually generate a uniform distribution of answers rather than a random one. A human list doesn't fluctuate in the long term. A random list does. But note also that as long as the program is really random, all correlation must disappear, because comparing a random list with a non-random one must lead to another random list.
The other hypothesis is that the total number of successes recorded by the program is wrong. Maybe if we check each answer we'll find 50 right answers out of 100 while the program counts 70 of them. The last hypothesis would be that in the final results, the program records a different list than the one it actually generated. Example: X is A. The user says X is B, and the program records "Program: B, user: B, right answer".

Mac, my probability courses are far away, but if I'm not mistaken, the probability of being outside the standard deviation is 2%, which is OK, since we got a 4% probability here (114 out of 202) for something inside. Can someone confirm this?
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 20:39:40
Quote
Quote
The randomizer in WinABX does seem to be deficient, I've got 114/202. With foobar2000's ABX component I only got 106/202 with that strategy.

So with Foobar you had 52.5% correct guesses, and with WinABX you got 56.4% correct?

Unless my 1 minute google search (http://www.google.com/search?q=coin+flip+statistics+deviation&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8) was wrong, both of these are within the +/- 7.1% standard deviation you would expect in a correct/wrong scenario with 202 tests. 

I think your claims about the deficiency of WinABXs randomness are unfounded.

I'm not sure exactly which link you are referring to. Anyway, the confidence 0.9608 for 114/202 is an exact value (approximating it with the normal distribution gives 0.954).

Maybe your "+/-"-interval is considering a 2/10 and an 8/10 result as equally important? This, however, is not how it's done in our case.
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 20:42:37
Quote
Heh, 116/200 is nearly 35% chance of missing according to my calculator.

P-val calculator is certainly wrong.

???

The correct confidence value is 0.98593!

What are you calculating?
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 20:47:54
Quote
Quote
Heh, 116/200 is nearly 35% chance of missing according to my calculator.

P-val calculator is certainly wrong.

???

The correct confidence value is 0.98593!

My calculator agrees :

(http://perso.numericable.fr/laguill2/pictures/prob.png)
Title: Probability of passing a sequencial ABX test
Post by: Mac on 2003-11-10 20:57:16
I was going on the standard deviation of a 202-trial binomial distribution being 7.1, meaning any number of correct guesses between 94 and 108 is dead on target, and anything between 87 and 115 isn't completely unexpected. As both 106 (Foobar) and 114 (WinABX) fell into this range, I saw no problem with that. I admit I forgot all my statistics work the day after the exam on it, so I could be wrong
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 21:00:36
I think one little suggestion is necessary here: Please don't jump to conclusions. Two or three examples are not enough to conclude that some random number generator is faulty. Especially, if you don't do your tests carefully. Continuum's comments show that it's all too easy to "prove" that some program is faulty: Just press the buttons long enough, and it's pretty likely you dip below pval=0.05 at least once.

Quote
The problem is obvious : in his random test, Gabriel is always more right than wrong ! There is no way for this to happen by chance.

How did you conclude that? Maybe I am missing something here, but if I understand it correctly, Gabriel posted 4 intermediate results, out of 200! Certainly we can't conclude that he was always more right than wrong. And it's overwhelmingly probable that you will be more right than wrong 4 times in a 200 trials test.
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 21:12:04
You're right.

I'd like to see a graph of p (probability) versus n (number of trials). Does p decrease ? Does it constantly fluctuate and sometimes (not often) reach low values ?

...I forgot one thing about confidence. The confidence level that is needed must depend on the number of tests performed by someone. For example if I perform one test per day, accept a 5% result as valid, and pass one test out of two...
After 40 days, I get 20 successes, but 5 % is one chance out of 20 !
Thus it is very probable that one of my 20 correct results is flawed !
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-10 21:13:56
Quote
Unless my 1 minute google search (http://www.google.com/search?q=coin+flip+statistics+deviation&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8) was wrong, both of these are within the +/- 7.1% standard deviation you would expect in a correct/wrong scenario with 202 tests. 

Standard deviation alone does not give you the answer to the question if the behaviour that I have encountered wasn't normal. Instead, you need to perform a certain statistical test.

The basis of my statement that there seems to be a deficiency (note the 'seems', as when formally evaluated, my test would only be valid at ~92% confidence, which isn't generally considered high enough) is the following.

The probability of getting 113 or fewer trials correct by guessing is 0.960839995. Thus, the probability of getting 114 or more of them correct is less than 0.04.

Now, since I didn't expect beforehand that the number would be higher or lower than the mean value of 101, I must also include the event that I get 88 or fewer correct answers in the critical region, making my statement valid only at 92% confidence.
Quote
I think your claims about the deficiency of WinABXs randomness are unfounded.

Well, you could help debunking them by performing a simple test.

Following the a-b-a-b-a-b-... strategy, do ~200 trials and post your results.

If you still doubt my methodology, I can write a formal mathematical description of my test.
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-10 21:14:48
Quote
both of these are within the +/- 7.1% standard deviation you would expect in a correct/wrong scenario with 202 tests. 

I think I finally understand your calculation (blame my poor continuous probability knowledge). I believe there are two things wrong:
1. +/- is uninteresting. We are only concerned about good results.
2. 7.1 (=sqrt(0.5*0.5*202)) is not a percentage but an absolute number, so 114 is well outside it.
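
Both points can be checked numerically. The following Python sketch recomputes the figures for the 114/202 result under the assumption of pure 50/50 guessing. The normal approximation is taken without a continuity correction, which is my guess at how the 0.954 mentioned earlier was obtained. All variable names are illustrative.

```python
from math import comb, erf, sqrt

n, hits = 202, 114
mean = n * 0.5
sigma = sqrt(n * 0.5 * 0.5)  # standard deviation in trials, not in percent

# Exact chance of 113 or fewer correct answers when guessing
exact_cdf = sum(comb(n, k) for k in range(hits)) / 2 ** n

# Normal approximation of the same quantity (no continuity correction)
z = (hits - 1 - mean) / sigma
normal_cdf = 0.5 * (1 + erf(z / sqrt(2)))

one_sided = 1 - exact_cdf   # 114 or more correct
two_sided = 2 * one_sided   # also counts the equally extreme low scores

print(f"sigma = {sigma:.2f} trials, 114 is {(hits - mean) / sigma:.2f} sigma above the mean")
print(f"exact confidence = {exact_cdf:.4f}, normal approximation = {normal_cdf:.4f}")
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

The one-sided figure answers "how likely is a score this good by guessing", which is what the ABX tools report. The two-sided figure is the one that fits a check of the random number generator, where a surprisingly low score would be just as suspicious as a surprisingly high one.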
Title: Probability of passing a sequencial ABX test
Post by: AstralStorm on 2003-11-10 21:30:57
Check with what probability you can get this result with a random generator (PRNG will probably suffice).
If you get the result in ~5% of the half of the guesses (in this case 101), then they're random (p=~0.5)

It's not that the test gets harder at 100th try than at 20th. (of course given the results so far at 10/20 or 50/100)
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 21:54:58
Quote
Well, you could help debunk them by performing a simple test.

Following the a-b-a-b-a-b-... strategy, do ~200 trials and post your results.

OK, I tried it once, decided that it's too much work (having to click 400 times for each result), wrote a little program which remote-controls WinABX, and here's a few results:

1. 88/200, pval=96.1%
2. 110/200, pval=8.9%
3. 92/200, pval=88.5%
4. 77/200, pval=99.9%
5. 120/200, pval=0.3%
6. 92/200, pval=88.5%
7. 104/200, pval=31%
8. 91/200, pval=91%
9. 99/200, pval=58.4%
10. 104/200, pval=31%

edit: I did a few more tests, using different strategies (choosing always A or always B), and they seem to indicate that there's no problem with WinABX's RNG.
Title: Probability of passing a sequencial ABX test
Post by: Mac on 2003-11-10 22:18:12
Erg, I mixed myself up a little. When saying +/- I meant that a result of 80 out of 200 is equivalent to a result of 120 out of 200, as the likelihood of success and failure is equal. By 7.1% I meant that 2 standard deviations away from the mean was 14.2, or 7.1%.

It seems schnofler beat me to the test, but here are my 2 results from WinABX:

Choosing all A: 98/200, p=63.8
Choosing all B: 101/200 p=47.2

The P value may be totally screwed, but I see no problems with the randomness of it, hence I stick with saying your claim is unfounded
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 22:20:06
Quote
Choosing all A: 98/200, p=63.8
Choosing all B: 101/200 p=47.2

The P value may be totally screwed

No, it's not.
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 22:20:46
Here's the graph for the Pval of my 200 answers :

(http://perso.numericable.fr/laguill2/pictures/prob1.png)
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 22:23:30
Schnofler, could you post your program, or the log file for 2,000 trials (if the probabilities are not too long to be computed, or don't overflow)? I'd like to plot a larger graph...

Edit : 2,000 should be enough
Title: Probability of passing a sequencial ABX test
Post by: Gabriel on 2003-11-10 22:34:05
another test:
15 of  24, p = 0.154
16 of  25, p = 0.115
17 of  26, p = 0.084
18 of  27, p = 0.061
19 of  28, p = 0.044
20 of  29, p = 0.031
21 of  30, p = 0.021

another one:
27 of  44, p = 0.087


I tried a set of 140 choices, and only 25 times during the test was my p-value .5 or higher. If it was random, shouldn't it be moving around .5?
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 22:42:22
Quote
Schnofler, could you post your program, or the log file for 2,000 trials (if the probabilities are not too long to be computed, or don't overflow)? I'd like to plot a larger graph...


I don't really understand what you mean, so here (http://www.rz.uni-frankfurt.de/~bkuckuck/winabx_log_01.txt)'s a log file with 10 tests of 200 trials each, and this (http://www.rz.uni-frankfurt.de/~bkuckuck/winabx_log_02.txt) and this one (http://www.rz.uni-frankfurt.de/~bkuckuck/winabx_log_03.txt) are log files with two giant tests of 2000 trials.

edit: I'm not so keen on posting the program itself, because that would mean I'd have to make it usable for anyone but myself. (It's an absolutely awful hack. I wrote another program years ago, which does something similar, and I just replaced some parts to make it control WinABX).
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-10 22:46:52
Quote
OK, I tried it once, decided that it's too much work (having to click 400 times for each result), wrote a little program which remote-controls WinABX, and here's a few results:

Nice work!

Quote
4. 77/200, pval=99.9%
5. 120/200, pval=0.3%


I'd say these two are a good indication that something is wrong with the PRNG.

Given that we wanted to test for an abnormal probability of a test returning pval of less than 1% or more than 99%, at a confidence level of 98% it can be claimed that it's bigger than 2% (which it should be equal to).

However, for the results to be statistically valid this value should have been set before the test...
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-10 23:00:01
Anyway, I'd like to know what KikeG used to generate random numbers in his program.

My guess is that it was rand(), and in that case then I'd suggest replacing it with something better.
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 23:24:26
Thanks, Schnofler, this is exactly what I wanted. I've plotted your data :

(http://perso.numericable.fr/laguill2/pictures/prob2.png)

(http://perso.numericable.fr/laguill2/pictures/prob3.png)

Pval, in %, vs trial number, of Schnofler's data

The second graph behaves exactly how I would expect it to (randomly), but the first one looks strange and needs an analysis. Maybe there is a mathematical explanation.

Actually, the % of right answers varies randomly, but the variations get slower as the number of trials grows. We must also take into account that the p value is very sensitive to the % of right answers. Once in some hundreds of trials, it can fall below 1 % or rise above 99%.
Defining the interval of % of right answers that leads to 1% < pval < 99 %, we should study the probability that the % of right answers stays outside this interval once it is out of it.
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-10 23:28:54
Note in the first graph that if we only had the first 500 trials, we would have concluded that the Pval either decreases to 0 or increases to 100 % as the test goes on, and pushing the simulation to 1000 trials would have confirmed it. We would have deduced that for sure, once the Pval rises or falls, nothing can bring it back to 50 %. Pushing to 1500 trials would only have confirmed this again, but... at 2000, it decreases again.

(That's what is called "listening fatigue", you know, after 1500 trials, I got tired and didn't hear the difference anymore. But I'm sure I can reproduce the result for 2000 trials if I get one or two more coffees...) ... that's what an audiophile could say, if the curve were inverted (0 % instead of 100 %).

Ah! Random numbers!
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-10 23:36:00
Quote
The second graph behaves exactly how I would expect it to (randomly), but the first one looks strange and needs an analysis. Maybe there is a mathematical explanation.

I sure hope there are some people on this board with a profound knowledge of stochastics. The p-value is a pretty wicked function of the (hopefully) random 0-1-sequence the program should produce. It might be quite hard to phrase properties that graph should have. Well, I'm too tired now, anyway. Good night everyone.
Title: Probability of passing a sequencial ABX test
Post by: phong on 2003-11-11 01:27:57
There are many "weak" pseudorandom number generators that have patterns when looking at a subset of the bits produced.  In the case of the simplest, the lowest bit simply toggles, resulting in a pattern of even-odd-even-odd.  In a somewhat more sophisticated algorithm, perhaps there is a 60% chance of the number being even on an even trial and a 60% chance of an odd number on an odd trial.  The pattern is not obvious at first, but becomes significant in a large number of trials in a situation such as this:
Code: [Select]
if (rand() % 2) {
   x = a;
} else {
   x = b;
}

Making a good pseudorandom number generator is a Hard Problem™; there are certainly many PhD or at least Master's theses on the subject.  Intel was nice enough to include a widget on their more recent chipsets that generates true random numbers from thermal noise on a resistor.  AMD also makes one available in their 76x series of chipsets.  I do not know if an equivalent/compatible implementation is/will be available on other chipsets/platforms.

In Linux the kernel keeps track of certain quasi-random events from the "real world" such as interrupt times, network traffic, time between keystrokes/mouse movements, etc. and stores them in an "entropy pool" which programs can draw from by reading from /dev/random.  Entropy is a limited resource though, so occasionally some program requiring secure random numbers will stall waiting for more entropy (gnupg for example, will ask you to use the computer if it can't get enough entropy to generate a private key).  If a hardware RNG is present and support is turned on in the kernel, it will get entropy from that on a regular basis.  I do not know the equivalent for Windows (I assume CryptoAPI or something would have something like that).

So, getting good random numbers in a reliable (let alone portable) way is Hard.
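To make the "lowest bit simply toggles" case concrete, here is a small sketch. The generator below is a toy of my own (a linear congruential generator modulo 2^31 with an odd multiplier and an odd increment), not the rand() of any particular compiler, and all names are hypothetical. With those parameters the lowest bit of the state flips on every step, so an a-b-a-b guesser facing `r & 1` scores either everything or nothing, while the same guesser facing the top bit scores about half.

```c
#include <assert.h>
#include <stdint.h>

/* Toy LCG (hypothetical, for illustration only): state = state*a + c mod 2^31.
   Because a and c are both odd, the lowest bit of the state alternates. */
uint32_t toy_lcg_next(uint32_t *state)
{
    *state = (*state * 1103515245u + 12345u) & 0x7fffffffu;
    return *state;
}

/* Number of hits for a guesser who answers 0,1,0,1,... when X is chosen
   from the lowest bit (use_high_bit == 0) or from bit 30 (use_high_bit != 0). */
int alternating_guess_hits(uint32_t seed, int trials, int use_high_bit)
{
    uint32_t state = seed;
    int hits = 0;

    assert(trials >= 0);
    for (int i = 0; i < trials; i++) {
        uint32_t r = toy_lcg_next(&state);
        int x = use_high_bit ? (int)((r >> 30) & 1u) : (int)(r & 1u);
        int guess = i & 1;          /* the a-b-a-b strategy */
        if (x == guess)
            hits++;
    }
    return hits;
}
```

Whether a given real-world generator behaves like this toy is something you would have to test for that generator; the sketch only shows why taking the low bit is the risky choice.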
Title: Probability of passing a sequencial ABX test
Post by: ff123 on 2003-11-11 01:58:23
Quote
There are many "weak" pseudorandom number generators that have patterns when looking at a subset of the bits produced.  In the case of the simplest, the lowest bit simply toggles, resulting in a pattern of even-odd-even-odd.  In a somewhat more sophisticated algorithm, perhaps there is a 60% chance of the number being even on an even trial and a 60% chance of an odd number on an odd trial.  The pattern is not obvious at first, but becomes significant in a large number of trials in a situation such as this:
Code: [Select]
if (rand() % 2) {
   x = a;
} else {
   x = b;
}

I noticed that the rand() function in Microsoft Visual C++ 6.0 has a bias if the code is written this way (even-odd algorithm).  But if the integer outputs are binned into 10 ranges (representing digits 0-9) then the bias disappears.  So in addition to the double initialization kludge, I had to be careful not to use the even-odd algorithm.  All-in-all, I can't say that I'm impressed by the rand() function.

The Mersenne twister now used in ABC/HR doesn't have this problem, but I still don't rely on an even-odd algorithm.

ff123
Title: Probability of passing a sequencial ABX test
Post by: Pio2001 on 2003-11-11 04:05:38
Here's how the p value behaves, according to the binomial table (http://www.kikeg.arrakis.es/winabx/bino_dist.zip)

(http://perso.numericable.fr/laguill2/pictures/binomial5.png)

The yellow area on the left is the area where pval < 5 %, and the blue area on the right is the one where pval > 95 %. Each intermediate zone is 5 % wide.
The number of successes in a sequential random ABX test starts from the bottom of the graph, and at each step up, it randomly goes one step to the right or one step to the left.
At each point, pval represents the probability of being to the left of the point by chance. Thus at any time, there is an equal chance of being in any zone. Since all zones gather at the center of the graph, the probability of being there is high.

Here's the same graph, but with 1% wide bands. Left area: pval < 1%, right area: pval > 99%.

(http://perso.numericable.fr/laguill2/pictures/binomial1.png)

Here, I added a third graph on it.

(http://perso.numericable.fr/laguill2/pictures/binomial1scotch.png)

It represents the same thing, but with 10 % wide bands. All central bands are white. We can see that if, at the 50th trial, pval falls below 1%, there is one chance out of 5 for it to stay below 1 % for the next 50 trials, because the 1% line of the "1%" graph stays to the right of the 20% line of the "10 %" graph until trial number 100.

The pval table is too small to simulate it for 2000 trials.
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-11 08:33:28
Quote
But to give you an idea how much the results are affected: Think of a guessing tester who stops the test as soon as he reaches 0.95 confidence or the maximal length ( =: m) of the test. The probability for him to pass the test is:

m=10 => p-val = 0.0508
m=20 => p-val = 0.0987
m=30 => p-val = 0.1295
m=50 => p-val = 0.1579
m=100 => p-val = 0.2021

See this excel sheet (http://stud4.tuwien.ac.at/~e0025119/CorrPVal5.xls) for reference.

So, according to this table, going for a 0.99 confidence (0.01 or 1% pvalue), one would have a probability of 0.0327 (3.27%) of reaching it by chance within 40 trials, right? So I think I'm right that a person who passed this test within 40 trials would also have over 95% confidence of hearing a true difference.

I know little about excel macros, sorry, so what would be the value for a 100 trials test?
EDIT: I think I figured it out in the VBA code, and reran the calculations. It would be 0.05162, so I guess this would not meet the required 95% confidence. The max. no. of trials allowed to pass this test would be 93, with a probability of passing it by chance of 0.04989. 94 trials would be 0.05039, over 5%.

So, for a 16 trials max. sequential test, it would be enough to get a "calculated" p<3%.

I don't really know much about statistics, I'm afraid.
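For those who would rather not dig through the Excel macros, the corrected values can be sketched in a few lines of C. This is my own reconstruction of the idea, not Continuum's VBA code, and the names and the 200-trial cap are assumptions. It tracks the probability of every score a guesser can have without having stopped yet, and at each trial moves the scores whose displayed pval is at or below the target into the "passed" total.

```c
#include <assert.h>

#define MAX_LEN 200   /* arbitrary cap on the test length, assumed here */

/* Probability that a pure guesser reaches a displayed pval <= alpha at some
   trial n <= max_trials, stopping the first time that happens. */
double guesser_pass_probability(int max_trials, double alpha)
{
    static double alive[MAX_LEN + 1];  /* P(k correct so far and not yet stopped) */
    static double plain[MAX_LEN + 1];  /* ordinary binomial P(k correct)          */
    double passed = 0.0;

    assert(max_trials >= 1 && max_trials <= MAX_LEN);
    assert(alpha > 0.0 && alpha < 1.0);

    alive[0] = plain[0] = 1.0;
    for (int n = 1; n <= max_trials; n++) {
        /* one more fair-coin trial, updated in place from the top down */
        alive[n] = plain[n] = 0.0;
        for (int k = n; k >= 1; k--) {
            alive[k] = 0.5 * alive[k] + 0.5 * alive[k - 1];
            plain[k] = 0.5 * plain[k] + 0.5 * plain[k - 1];
        }
        alive[0] *= 0.5;
        plain[0] *= 0.5;

        /* walk down from k = n; tail is the displayed pval P(X >= k) */
        double tail = 0.0;
        for (int k = n; k >= 0; k--) {
            tail += plain[k];
            if (tail > alpha)
                break;
            passed += alive[k];   /* these testers stop here as "passed" */
            alive[k] = 0.0;
        }
    }
    return passed;
}
```

guesser_pass_probability(10, 0.05) and guesser_pass_probability(20, 0.05) should give about 0.0508 and 0.0987, matching the first two rows of the quoted table. Calling it with alpha = 0.01 is one way to re-check the 40, 93 and 94 trial figures above.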
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-11 08:36:42
About the WinABX PRNG: from v0.4 it also uses the Mersenne twister generator. However, it uses an even-odd algorithm. Maybe I should change this? What version did you test?
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-11 10:11:07
Quote
About the WinABX PRNG: from v0.4 it also uses the Mersenne twister generator. However, it uses an even-odd algorithm. Maybe I should change this? What version did you test?

I have tested the latest version, but I only did one test.

Could you try to reproduce this behaviour (abnormally high or low results when the a-b-a-b-... pattern is followed) with the current implementation, and again when 0 or 1 is chosen by comparing the pseudorandom number to half of its maximum value?

Maybe it's a good idea to implement a cryptographically secure PRNG for WinABX?
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-11 10:58:58
Quote
Could you try to reproduce this behaviour (abnormally high or low results when the a-b-a-b-... pattern is followed) with the current implementation, and again when 0 or 1 is chosen by comparing the pseudorandom number to half of its maximum value?

I tried with the v0.41 version, using the a-b-a-b strategy, and got 95/200, p=78%. But now I've updated the algorithm, dropping the even-odd approach and using the same algorithm as abc/hr, and then tried again. The funny thing is that I got 108/200, p=14%, and during this test I got a score as good as 78/134, p=3.5%. But as Continuum and Pio2001 have explained, this is not significant in this kind of test. I tried again using an always-"A" strategy, and got 104/200, p=31%.

Edit: tried again an a-b-a-b strategy with the last version, and got 99/200, p=58.4%. I guess that's what happens with random numbers: they're random, and a score more "perfect" than another (foobar vs WinABX) doesn't mean anything unless many tests or many trials are averaged...

Another edit: One passed test (going for p<5%) out of 20 random tests is just what statistics predict; even 1 passed test out of 10 is possible if you are a bit lucky. What is quite unlikely is to pass when only a single random test is run. However, it's possible if you are very lucky.
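To put rough numbers on this, here is a tiny sketch. It assumes every test is independent and is passed by a guesser with probability exactly alpha, which is a simplification: real ABX scores are discrete, so the true per-test chance usually sits a bit below the nominal 5%. The function name is made up for the example.

```c
#include <assert.h>

/* Chance that at least one of `tests` independent tests is passed by
   guessing, when each one is passed with probability alpha. */
double chance_of_some_pass(int tests, double alpha)
{
    double all_fail = 1.0;

    assert(tests >= 0);
    assert(alpha >= 0.0 && alpha <= 1.0);
    for (int i = 0; i < tests; i++)
        all_fail *= 1.0 - alpha;   /* every test so far failed */
    return 1.0 - all_fail;
}
```

Under those assumptions chance_of_some_pass(20, 0.05) is about 0.64 and chance_of_some_pass(10, 0.05) is about 0.40, against 0.05 for a single test.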

The updated version (v0.42) is available at http://www.kikeg.arrakis.es/winabx/winabx.zip

It would be good if you tested it same way you did with the old version.

Quote
Maybe it's a good idea to implement a cryptographically secure PRNG for WinABX?

I don't know if this would have any advantage for the issues under discussion.
Title: Probability of passing a sequencial ABX test
Post by: Moneo on 2003-11-11 11:49:31
Quote
I tried with the v0.41 version, using the a-b-a-b strategy, and got 95/200, p=78%. But now I've updated the algorithm, dropping the even-odd approach and using the same algorithm as abc/hr, and then tried again. The funny thing is that I got 108/200, p=14%, and during this test I got a score as good as 78/134, p=3.5%. But as Continuum and Pio2001 have explained, this is not significant in this kind of test. I tried again using an always-"A" strategy, and got 104/200, p=31%.

Yes, neither of these is statistically significant.
Quote
It would be good if you tested it same way you did with the old version.

Maybe you could post the sources for both old and new random number generation routines, including the initial seeding? It isn't exactly fun to click the mouse 400 times, and I don't know windows programming well enough to write an application that would control WinABX
Quote
Quote
Maybe it's a good idea to implement a cryptographically secure PRNG for WinABX?

I don't know if this would have any advantage for the issues under discussion.

It would essentially rule out the possibility of "cheating" by noticing any patterns in the PRNG.
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-11 14:37:04
Quote
Maybe you could post the sources for both old and new random number generation routines, including the initial seeding?

The PRNG used is publicly available; it's a Mersenne Twister PRNG that implements MT19937. Extracted from the source code:

"This is a Mersenne Twister pseudorandom number generator
with period 2^19937-1 with improved initialization scheme,
modified on 2002/1/26 by Takuji Nishimura and Makoto Matsumoto."

Initialization in all versions is at program startup, and every time the test files are reloaded in the newest versions (I guess the latter isn't necessary):

Quote
void InitSeed(void)
{
   init_genrand((unsigned long)time(NULL));
}



RN generation in v0.40 and v0.41 was:

Quote
/* Returns a value in [0, base); for base == 2 this is just the lowest bit. */
int Rand(int base)
{
   return genrand_int32()%base;
}


and in new v0.42 is:

Quote
#define MAX_GENRAND_REAL 0xffffffff

/* Scales the 32-bit output into [0, base), so the high-order bits decide. */
int Rand(int base)
{
   return (int)((genrand_int32()/(MAX_GENRAND_REAL+1.0))*base);
}
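
Side by side, the difference between the two Rand() variants is simply which bits of the 32-bit number end up deciding the result. The sketch below restates both mappings as functions of a raw value r, so it can be tried without the Mersenne Twister source; r stands in for the output of genrand_int32(), and the two function names are mine, not WinABX's.

```c
#include <assert.h>
#include <stdint.h>

/* v0.40/v0.41 style: the remainder. For base == 2 only the lowest bit counts. */
int pick_by_modulo(uint32_t r, int base)
{
    assert(base > 0);
    return (int)(r % (uint32_t)base);
}

/* v0.42 style: scale r from [0, 2^32) into [0, base).
   For base == 2 only the highest bit counts. */
int pick_by_scaling(uint32_t r, int base)
{
    assert(base > 0);
    return (int)((r / 4294967296.0) * base);
}
```

For example, r = 0x80000000 gives 0 with the modulo version and 1 with the scaling version when base is 2. Whether that matters for MT19937 in practice is a separate question, and the results posted in this thread don't suggest a big difference.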



Quote
It isn't exactly fun to click the mouse 400 times, and I don't know windows programming well enough to write an application that would control WinABX


OK, I was talking in general, not only to you. However, I guess the results won't be very different; in simulations over thousands of trials they give very similar results.

Quote
It would essentially rule out a possibility of "cheating" by noticing any patterns in the prng.


I think this would be really difficult to notice, if there were any.
Title: Probability of passing a sequencial ABX test
Post by: schnofler on 2003-11-11 15:10:56
Ok, I tested the new version, here you go:

1. 100/200, pval=52.8%
2. 90/200, pval=93.1%
3. 90/200, pval=93.1%
4. 103/200, pval=36.2%
5. 83/200, pval=99.3%
6. 107/200, pval=17.9%
7. 101/200, pval=47.2%
8. 89/200, pval=94.8%
9. 102/200, pval=41.6%
10. 85/200, pval=98.6%

And some longer tests:

1. 985/2000, pval=75.5%
2. 1012/2000, pval=30.3%
3. 1001/2000, pval=49%
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-12 14:18:02
The 3 following posts have been moved from "Upsampling output, Any theoretical advantages?" (http://www.hydrogenaudio.org/forums/index.php?showtopic=15179&)
__________________________________________________________

Quote
I'm not so sure about that, it could be significant due to the p=3.3% reached during the test on just 11 trials.

I'm quite sure that Continuum's thoughts he talked about here (http://www.hydrogenaudio.org/forums/index.php?showtopic=15151&hl=p+value) and here (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175&st=50&&#entry31356) are correct, so without fixing the trial number before starting, reaching 9/11 doesn't mean "probability you are guessing" is 3.3 %, rather something higher like > 5%.
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-12 14:35:12
Quote
I'm quite sure that Continuum's thoughts he talked about here (http://www.hydrogenaudio.org/forums/index.php?showtopic=15151&hl=p+value) and here (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175&st=50&&#entry31356) are correct, so without fixing the trial number before starting, reaching 9/11 doesn't mean "probability you are guessing" is 3.3 %, rather something higher like > 5%.

Yes, I took this into account. According to Continuum's corrected p-values table, the probability of passing the test going for a p below around 3.5% on just 11 trials would be very close to 5%, so this result would have some significance. That's why I say that if lucpes repeats the test and gets a similarly good score, it would be quite a reliable indication that he heard a difference. For example, if he got 9/11 again, that would be 18/22, an "uncorrected" p=0.2%. That would be well below the required 5% even if corrected (please someone correct me if I'm wrong, I'm still not totally confident in my interpretation and extrapolation of the table data).

Edit: At first sight, it seems that these conclusions agree with ff123's "decision" table in the long ABX statistics thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=3175&view=findpost&p=30785)

However, I should spend more time trying to understand better and verify all these things, time that I lack now.
Title: Probability of passing a sequencial ABX test
Post by: tigre on 2003-11-12 15:42:56
Continuum's corrected p-values are only correct with the following preconditions:

- before the test, a maximum number of trials (N) is fixed.
- A p-value (P) is set (like 0.05).
-- If P is reached, the test stops immediately as "passed", the "probability you're guessing" is the "corrected p-value" from Continuum's table
-- If after N trials P isn't reached, the test is "failed".

If the precondition is something like "I try to reach p = 0.01. I don't know how many runs I'll perform, but if I'm frustrated I'll give up", it's hard to impossible to tell the true "probability you're guessing".

If Lucpes had said before the test "I want to get p=0.04 or better and perform not more than 13 trials", he would have stopped after reaching 9/11 = 0.033. According to Continuum's table this would mean a corrected p-value of 0.063 => "Probability you're guessing" = 6.3%

If he had said "I want to get p=0.05 or better and perform not more than 40 trials", he would have stopped after reaching 9/11 = 0.033. => corrected p-value: 0.145 => "Probability you're guessing" = 14.5%

From these 2 examples (best case vs. worst case covered by the table) we see that without defining preconditions it's hard to interpret the result.
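If you don't trust table lookups, the two examples can also be cross-checked by brute force: let a coin-flipping "listener" take the test many times under the stated preconditions and count how often he passes. The sketch below does that. The xorshift32 generator is only a stand-in so the example is self-contained, and every name and the 64-trial cap are assumptions of mine.

```c
#include <assert.h>
#include <stdint.h>

#define MC_MAX_TRIALS 64   /* arbitrary cap, assumed for this sketch */

/* Stand-in generator (xorshift32), used only to keep the example self-contained. */
static uint32_t mc_next(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Smallest score k out of n whose displayed pval P(X >= k) is <= alpha;
   returns n + 1 when no score is good enough. */
static int pass_mark(int n, double alpha)
{
    double prob[MC_MAX_TRIALS + 1];
    double tail = 0.0;
    int mark = n + 1;

    prob[0] = 1.0;
    for (int i = 1; i <= n; i++) {
        prob[i] = 0.0;
        for (int j = i; j >= 1; j--)
            prob[j] = 0.5 * prob[j] + 0.5 * prob[j - 1];
        prob[0] *= 0.5;
    }
    for (int k = n; k >= 0; k--) {
        tail += prob[k];
        if (tail > alpha)
            break;
        mark = k;
    }
    return mark;
}

/* Fraction of `runs` simulated guessers who reach pval <= alpha within
   max_trials trials, stopping as soon as they do. */
double simulate_guesser(int max_trials, double alpha, long runs, uint32_t seed)
{
    int mark[MC_MAX_TRIALS + 1];
    long passes = 0;
    uint32_t state = seed ? seed : 1u;   /* xorshift32 must not start at 0 */

    assert(max_trials >= 1 && max_trials <= MC_MAX_TRIALS && runs > 0);
    for (int n = 1; n <= max_trials; n++)
        mark[n] = pass_mark(n, alpha);

    for (long r = 0; r < runs; r++) {
        int correct = 0;
        for (int n = 1; n <= max_trials; n++) {
            correct += (int)(mc_next(&state) >> 31);   /* top bit as the coin */
            if (correct >= mark[n]) {
                passes++;
                break;
            }
        }
    }
    return (double)passes / (double)runs;
}
```

With a few hundred thousand runs, simulate_guesser(13, 0.04, ...) should settle near 0.063 and simulate_guesser(40, 0.05, ...) near 0.145, give or take the usual simulation noise.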
Title: Probability of passing a sequencial ABX test
Post by: KikeG on 2003-11-12 16:15:12
Quote
Continuum's corrected p-values are only correct with the following preconditions:

- before the test, a maximum number of trials (N) is fixed.
- A p-value (P) is set (like 0.05).
-- If P is reached, the test stops immediately as "passed", the "probability you're guessing" is the "corrected p-value" from Continuum's table
-- If after N trials P isn't reached, the test is "failed".

...

If Lucpes had said before the test "I want to get p=0.04 or better and perform not more than 13 trials", he would have stopped after reaching 9/11 = 0.033. According to Continuum's table this would mean a corrected p-value of 0.063 => "Probability you're guessing" = 6.3%

But, given that he had stopped when his p=3.3%, does it really matter what p he wanted to reach before the test started? And does it matter what number of trials max. he was planning to perform? These things were just in his mind, and I'd say final results don't depend on these. I'd like Continuum to explain this, because I'm not really confident on my interpretation, and I'm a little bit confused right now and maybe I'm skipping something.

Anyway, I'm not saying he passed the test, I guess because he didn't stop at the p=3.3% point, and even if he had, the corrected p could be slightly over 5%. But if he repeated the test and got similarly good results, I think one could say he passed without much doubt.
Title: Probability of passing a sequencial ABX test
Post by: Continuum on 2003-11-12 17:51:56
I'm not a statistics guru, but here is what I think on this topic (I can try to ask someone more experienced in this area, next week):

It all depends on the question you ask. Keep in mind that we cannot calculate the probability that the listener was guessing.

First consider a fixed test like the one tigre describes. We can exactly calculate the probability that a guessing tester would pass this test (-> "corrected p-val"). In statistics results are usually considered significant when this value is below 0.05 (or for stricter tests 0.01).

In the first situation, however, we lose information about when the test is stopped. For example, we could say that a 6/6 result is better than a 7/8, and one could rephrase the question to: "What is the probability for a guessing tester to achieve a 'better' score (in our peculiar ordering)?"

Think of the following situation: A listener decides on a "classic p-val" (as displayed in the program) he wants to achieve, say 0.05, and stops the test as soon as this value is reached. The maximal trial number is not fixed at this moment -- but it would not change the strategy of a guessing listener anyway!
Let's say he reached 0.04 at 20 trials. The probability for a guessing tester to score a better result, that is to reach 0.05 with at most 20 trials is 0.098.

To what extent this is a sensible thing to do, I don't know.