## Probability of passing a sequential ABX test

##### 2003-11-10 13:39:12
Edit : this discussion was started there.

I like viewing my ABX results during the test, and stopping when I have reached a low enough p. That's what some people call a "sequential" ABX test. As Pio2001 says, for such a test, getting p = 5% is not enough to pass the test. It seems (someone correct me if I'm wrong, please) that, for such a sequential test, getting p < 1% is enough to say that the test has been passed.

##### Reply #1 – 2003-11-10 13:55:04
Quote
I like viewing my ABX results during the test, and stopping when I have reached a low enough p. That's what some people call a "sequential" ABX test. As Pio2001 says, for such a test, getting p = 5% is not enough to pass the test. It seems (someone correct me if I'm wrong, please) that, for such a sequential test, getting p < 1% is enough to say that the test has been passed.

Does it mean that 5% is a completely useless value? Or does it mean that with 5-15%, there are still some (serious) presumptions about an audible difference?
I generally perform ABX tests quickly, making some mistakes I could avoid by being more meticulous (listening carefully to A, listening carefully to B, etc., then validating my choice). I'm sure that I can avoid most of them and obtain very good ABX scores, because sometimes I take the time to perform a precise, long and boring test. Quick tests gave me p-values of 5 to 15%, and meticulous ones are < 1%.

##### Reply #2 – 2003-11-10 14:18:52
5% is valid only if either you have fixed in advance the total number of trials to perform, or you don't look at the results until the whole test is finished (or both, obviously).

I wouldn't consider a p of 10% a very reliable indication of audible difference.

##### Reply #3 – 2003-11-10 14:22:03
Is there a rule of thumb for tests without a fixed number of trials to get a more realistic result?

Something like: "If the last trial was successful, it doesn't count."
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello

##### Reply #4 – 2003-11-10 14:25:54
Quote
5% is valid only if either you have fixed in advance the number of trials to perform, or you don't look at the results until the test is finished (or both, obviously)

I wouldn't consider a p of 10% a very reliable indication of audible difference, either.

Should I understand that in your opinion, pval = 10% or pval = 95% mean the same thing? Or is there some nuance between "valid" statement and "invalid" results?

I'm not a statistician, just an average user. To common sense, a test indicating a difference with only a 10% chance of guessing is something to take into serious consideration (especially for high-bitrate encodings and very short samples).

##### Reply #5 – 2003-11-10 14:51:04
Quote
Should I understand that in your opinion, pval = 10% or pval = 95% mean the same thing? Or is there some nuance between "valid" statement and "invalid" results?

I don't understand what you mean sorry. pval=5% (or 0.05) means 95% confidence value, and pval=10% (or 0.1) means 90% confidence value.

As a rule, a test is considered to be passed only if you achieve p<5% on a non-sequential test. It seems that p<1% is enough for sequential tests, in order to compensate for the effects Pio2001 talked about.

##### Reply #6 – 2003-11-10 14:53:03
Quote
Is there a rule of thumb for tests without a fixed number of trials to get a more realistic result?

Something like: "If the last trial was successful, it doesn't count."

I don't understand very well what you mean in this last sentence. I think I explained it in my previous posts. 5% is valid only under certain conditions. If not, you must go for at least 1%.

##### Reply #7 – 2003-11-10 15:10:15
Quote
I don't understand very well what you mean in this last sentence.

I'll try to explain better:

IMO, especially for difficult samples, it's unpredictable how many trials are needed: on one hand, performance often improves after a few trials (training effect); on the other hand, at some point it starts to get worse because of fatigue / boredom / impatience.

Because of this I would prefer to perform tests this way:

I aim to reach a certain pval score and finish as soon as I've reached it. Example:

I want to reach p = 0.1 or lower. My results:

0 of 1, p = 1.000
1 of 2, p = 0.750
2 of 3, p = 0.500
3 of 4, p = 0.313
4 of 5, p = 0.188
4 of 6, p = 0.344
5 of 7, p = 0.227
6 of 8, p = 0.145
7 of 9, p = 0.090

Since I haven't fixed the number of trials beforehand, the result isn't really 0.090, as you explained. My question is:

How would it be possible to get a valid result (p = 0.1 or lower) without fixing the number of trials beforehand?

"If the last trial was successful, it doesn't count." would mean that one more trial needs to be done; if it is successful (-> 8/10), the result p = 0.090 is correct, otherwise the test will continue.

##### Reply #8 – 2003-11-10 15:18:33
Quote
I want to reach p = 0.1 or lower...

p = 0.1 = 10%.

I think you mean p = 0.01 = 1% for a sequential test, right?

(Forgive me if I'm not reading this correctly.)

##### Reply #9 – 2003-11-10 15:19:40
Quote
Usually, a p <= 0.05 is considered a significant result. This is pretty close though, which to me indicates that there is a very good chance that more testing would result in a statistically significant result.

So, Gabriel scored p = 0.084, or, 17 out of 26.

To me, that is nowhere near significant. On any given day, if I flip a coin 26 times, I could get 17 heads. Yet Gabriel's confidence level is ~92%, which seems extortionate. I guess it's just the English I'm having trouble with, i.e. the word 'confidence'...

How is the P Value calculated, just out of interest?

##### Reply #10 – 2003-11-10 15:43:14
Quote
Quote
I want to reach p = 0.1 or lower...

p = 0.1 = 10%.

I think you mean p = 0.01 = 1% for a sequential test, right?

It was just an example and I chose 0.1, not 0.01 to save space.

##### Reply #11 – 2003-11-10 15:44:53
Quote
How is the P Value calculated, just out of interest?

B) <- click!
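
The linked explanation is not preserved here. As a hedged sketch, assuming the tools report the one-sided binomial tail (the chance of scoring at least this many correct out of n trials by pure guessing, each trial being a 50/50 call), the figures quoted in this thread can be reproduced as follows. The function name `abx_p_value` is illustrative and not taken from ABC/HR or any other tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: P(at least `correct` hits in `trials` fair guesses)."""
    # Count every outcome with `correct` or more hits,
    # out of 2**trials equally likely guess sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct, trials in [(7, 9), (17, 26), (18, 26)]:
        print(f"{correct} of {trials}, p = {abx_p_value(correct, trials):.3f}")
```

Run as a script, this prints p = 0.090, 0.084 and 0.038 for 7 of 9, 17 of 26 and 18 of 26, which agrees with the values quoted in the posts above.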

##### Reply #12 – 2003-11-10 16:33:22
Quote
Quote
Should I understand that in your opinion, pval = 10% or pval = 95% mean the same thing? Or is there some nuance between "valid" statement and "invalid" results?

I don't understand what you mean sorry. pval=5% (or 0.05) means 95% confidence value, and pval=10% (or 0.1) means 90% confidence value.

As a rule, a test is considered to be passed only if you achieve p<5% on a non-sequential test. It seems that p<1% is enough for sequential tests, in order to compensate for the effects Pio2001 talked about.

What I mean is: if a test finishes with a score of xx/16 and pval = 0.10, how will you consider this test? You're asking for pval=0.05 to consider a test "passed", but what about 0.10, or 0.15? Will you consider such a score bad? Useless? Without real significance? In other words, does pval=0.15 (some errors during ABX) have the same meaning as pval=0.95 (a lot of errors during ABX)?

##### Reply #13 – 2003-11-10 16:45:39
I'd very much appreciate an option (in ABC/HR and its Java counterpart) to clear ABX results after changing the selected time. I like to use ABX to find differences, and misses make the score go bad before I find the part I feel I'm able to ABX.
Maybe an option to clear the results? That would help to reduce the warm-up effect.
(You can do the test any number of times before recording the results.)
ruxvilti'a

##### Reply #14 – 2003-11-10 16:54:53
Based on the current discussion, I tried an experiment:
Completely random guessing with abc/hr (not even listening to the files).
Result:
18 of  26, p = 0.038

I am wondering if there could be something wrong with the computations...

##### Reply #15 – 2003-11-10 16:58:04
Still continuing the experiment:
54 of  85, p = 0.008
(still random)
....

72 of 114, p = 0.003
...

78 of 127, p = 0.006

That is incredible: I can randomly generate meaningful results.

2 possibilities:
* I am gifted and am able to do some divination
* we cannot trust the current results of abc/hr

##### Reply #16 – 2003-11-10 17:01:08
Quote
Based on the current discussion, I tried an experiment:
Completely random guessing with abc/hr (not even listening to the files).
Result:
18 of  26, p = 0.038

I am wondering if there could be something wrong with the computations...

I have sometimes had the same results.
Like Astral Storm, I find the lack of a RESET function annoying. Therefore, I artificially restart a test by reaching 100 trials, then performing another 16 trials. It's sometimes amusing to see the p-value at 100 trials. With a good but not perfect basis (something like 18/26), the final p-value of xx/100 is sometimes below 0.05!

There must be something wrong with pvalue, especially when many trials were performed.

##### Reply #17 – 2003-11-10 17:04:30
....
119 of 200, p = 0.004

...

(am I god?)

##### Reply #19 – 2003-11-10 17:25:07
Quote
....
119 of 200, p = 0.004

...

(am I god?)

No. You missed the right answer about 80 times, but the p-value tells you that such a score is unlikely to come from guessing. 7/8 is less significant than 60/100... In other words, if you're planning to perform a difficult ABX test, it's easier to obtain significant results by targeting 100 trials than 8.

Note that foobar2000 ABX component didn't compute pvalue after 20 trials.

##### Reply #20 – 2003-11-10 17:49:43
In ABX tests, if you make one error every three trials, you will get these p-values:

6/9 = 0.250
10/15 = 0.150 (15%)
20/30 = 0.049 (<5%)
30/45 = 0.018 (<2%)

more trials = more significant results.

It's good to know that. If you try to ABX something difficult, and to prove that you're right, better 50 trials than 16 ;-)

##### Reply #21 – 2003-11-10 18:00:19
Quote
119 of 200, p = 0.004

I wonder what WinABX uses to generate random numbers. This might mean that there is a deficiency in it.

##### Reply #22 – 2003-11-10 18:17:39
Quote
....
119 of 200, p = 0.004

...

(am I god?)

What version of ABCHR are you using? There was some doubt whether the random number generator used in older versions is reliable or not.

The result does indeed mean: The probability to score 119 or better out of 200 by guessing is 0.0043.

##### Reply #23 – 2003-11-10 18:28:43
Quote
Does it mean that 5% is a completely useless value? Or does it mean that with 5-15%, there are still some (serious) presumptions about an audible difference?

But to give you an idea how much the results are affected: think of a guessing tester who stops the test as soon as he reaches 0.95 confidence or the maximal length ( =: m) of the test. The probabilities of him passing the test are:

m=10 => p-val = 0.0508
m=20 => p-val = 0.0987
m=30 => p-val = 0.1295
m=50 => p-val = 0.1579
m=100 => p-val = 0.2021

See this excel sheet for reference.
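
The Excel sheet itself is not reproduced here. As a rough sketch of how such figures can be obtained, the code below tracks a pure guesser trial by trial and accumulates the chance that the displayed p-value ever drops to 0.05 or below before trial m. It assumes the p-value is the one-sided binomial tail and that "reaching 0.95 confidence" means p <= 0.05; the names `tail_p` and `pass_probability` are made up for this illustration.

```python
from math import comb


def tail_p(correct: int, trials: int) -> float:
    """P(at least `correct` hits in `trials` fair guesses)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def pass_probability(max_trials: int, alpha: float = 0.05) -> float:
    """Chance that a pure guesser, peeking after every trial, ever sees p <= alpha."""
    # alive[k] = probability of having k hits so far without having stopped yet.
    alive = {0: 1.0}
    passed = 0.0
    for n in range(1, max_trials + 1):
        nxt = {}
        for k, prob in alive.items():
            # Each guess is right or wrong with probability 1/2.
            nxt[k] = nxt.get(k, 0.0) + prob / 2
            nxt[k + 1] = nxt.get(k + 1, 0.0) + prob / 2
        alive = {}
        for k, prob in nxt.items():
            if tail_p(k, n) <= alpha:
                passed += prob   # the tester stops here and declares success
            else:
                alive[k] = prob  # not significant yet, so the test goes on
    return passed


if __name__ == "__main__":
    for m in (10, 20, 30, 50, 100):
        print(f"m={m} => {pass_probability(m):.4f}")
```

For m = 10 this gives 13/256, about 0.0508, which agrees with the first row of the table above (the guesser passes either with 5 of 5 or with 7 of 8). The larger rows are not re-derived by hand here.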

##### Reply #24 – 2003-11-10 18:31:06
Quote
Still continuing the experiment:
54 of  85, p = 0.008
(still random)

Verified that this is the correct p using simulation:

http://ff123.net/abx/abx.php

If you were to repeat this 85 trial test many, many times, you would find that you can get a score of 54/85 or better (by guessing randomly), with a probability of 0.008.
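
A minimal simulation along these lines can be sketched in Python. This is not the script behind the link above, only an illustration of the same idea: repeat an 85-trial guessing test many times and count how often the score is 54 or better. The function name `simulate_p`, the repeat count and the seed are arbitrary choices.

```python
import random


def simulate_p(correct: int, trials: int, repeats: int = 100_000, seed: int = 1) -> float:
    """Estimate P(at least `correct` hits in `trials` coin-flip guesses) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    at_least = 0
    for _ in range(repeats):
        # One simulated test: `trials` random bits, each 1 bit counts as a correct guess.
        score = bin(rng.getrandbits(trials)).count("1")
        if score >= correct:
            at_least += 1
    return at_least / repeats


if __name__ == "__main__":
    print(f"54 of 85, simulated p = {simulate_p(54, 85):.3f}")
```

The estimate should land close to the 0.008 reported by ABC/HR, with some scatter that shrinks as `repeats` grows.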

I have put the output of ABC/HR several times through a random number generator "runs test" and it has passed.

During one of the previous versions, Hans Heijden thought the random number generation was suspicious (it showed moderate evidence against randomness in a runs test for his particular sequence), but when I tried it myself, it passed.  I changed the random number generator for good measure, though, so as not to rely on random().  The built-in random function, at least with Visual C++ 6.0, did not appear to give completely random initial numbers when using time values which were fairly close together (on older versions of ABC/HR, I kludged this by initializing random() twice).
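
For readers unfamiliar with it, a runs test can be sketched as below. This assumes the Wald-Wolfowitz runs test with the usual normal approximation; the exact variant applied to the ABC/HR output is not stated in this thread, and `runs_test_z` is a name invented for the example. A z-score far from zero (beyond roughly ±2) is evidence against randomness: too many runs means the sequence alternates too often, too few means it clumps.

```python
import math
import random


def runs_test_z(bits: list[int]) -> float:
    """Wald-Wolfowitz runs test: z-score of the observed number of runs."""
    n1 = sum(bits)        # count of 1s
    n0 = len(bits) - n1   # count of 0s
    if n0 == 0 or n1 == 0 or len(bits) < 3:
        raise ValueError("need at least 3 values with both symbols present")
    # A run is a maximal block of identical symbols.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    n = n0 + n1
    mean = 2 * n0 * n1 / n + 1
    var = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var)


if __name__ == "__main__":
    rng = random.Random(0)
    sequence = [rng.getrandbits(1) for _ in range(1000)]
    print(f"z = {runs_test_z(sequence):.2f}")
```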

I purposely force the cumulative calculation of p in ABC/HR to prevent cherry picking, but one improvement that it really could use is the addition of p-value "profiles," to allow for statistically valid sequential testing to occur.  A typical profile which Continuum and I analyzed would allow for a maximum of 28 trials.

ff123

Edit:  ABC/HR also uses the Mersenne Twister to generate random numbers