Topic: Statistics For Abx

Statistics For Abx

Reply #50
Quote
As I said, the oscillation of the difference between the simulation and the calculation is very suspicious.  And I don't see how it could have come from the simulation.

The calculated numbers shown above by continuum are the right values. (It would be extremely unlikely for Maple to be wrong here anyhow).

This may be a stupid question. But when you run the simulation multiple times what is the variation? Could it account for the differences?

Statistics For Abx

Reply #51
Quote
Quote
As I said, the oscillation of the difference between the simulation and the calculation is very suspicious.  And I don't see how it could have come from the simulation.

The calculated numbers shown above by continuum are the right values. (It would be extremely unlikely for Maple to be wrong here anyhow).

This may be a stupid question. But when you run the simulation multiple times what is the variation? Could it account for the differences?

I corrected the oscillation (it was the simulation!) using a different random number generator.

I ran 19 cases of the 28-trial profile, using 1 million simulated tests each:

Average = 0.04918
std dev = 0.0002
std error of mean = 0.00005

Just to be sure we're still on the same page regarding the 28-trial profile, here it is again:

required correct at look points:

6 of 6
10 of 12
14 of 18
17 of 23
20 of 28

 

Statistics For Abx

Reply #52
Quote
Any thoughts on the non-even spreading of the alpha error?

I can't see any problem with it. Maybe there are some subtle effects on the chances of type-2 errors. You could run some simulations to check that out.

My thought is that you shouldn't limit the design to evenly spread alpha errors. Arguably, the overall Pr(type-1 error) is all you really need to constitute a valid significance test anyhow.

Statistics For Abx

Reply #53
Quote
Quote
Any thoughts on the non-even spreading of the alpha error?

I can't see any problem with it. Maybe there are some subtle effects on the chances of type-2 errors. You could run some simulations to check that out.

My thought is that you shouldn't limit the design to evenly spread alpha errors. Arguably, the overall Pr(type-1 error) is all you really need to constitute a valid significance test anyhow.

The problem I see with a non-even alpha spreading is that the listener will probably not be aware of what's going on, i.e., why the test gets more difficult (in the case of Continuum's proposed profile) as it progresses.

ff123

Statistics For Abx

Reply #54
Investigating the rand() error a little further, the problem isn't in the rand() function itself, but in the way I used it.

To get a random 0 or a 1, one proper way to do it is:

response = (int)(2.0 * (rand() / (RAND_MAX + 1.0)));

But I coded it as:

response = rand() % 2;

It's interesting that the latter method doesn't work, because it would seem like there's an equal number of 0's and 1's.  Oh well, chalk it up to yet another thing I don't understand.

ff123

Edit:  the only thing I can think of is that Microsoft's rand() doesn't generate an equal number of even and odd numbers!  Well, another lesson learned.
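For illustration, here is a minimal C sketch (not the code from this thread; the seed and counts are arbitrary) comparing the two ways of reducing rand() to a single bit. Besides the raw count of 1's, it counts how often consecutive bits alternate, which is a simple check for serial correlation; in some rand() implementations the lowest bit has a very short period, so rand() % 2 can look balanced overall yet be strongly correlated from call to call.
Code: [Select]
/* Sketch: compare two ways of reducing rand() to one random bit.      */
#include <stdio.h>
#include <stdlib.h>

static int bit_scaled(void) { return (int)(2.0 * (rand() / (RAND_MAX + 1.0))); }
static int bit_mod(void)    { return rand() % 2; }

static void check(const char *name, int (*next)(void))
{
    const long n = 1000000L;
    long ones = 0, flips = 0;
    int  prev;

    srand(1);                      /* same arbitrary seed for both methods */
    prev = next();
    ones = prev;
    for (long i = 1; i < n; i++) {
        int b = next();
        ones  += b;
        flips += (b != prev);      /* count consecutive bits that differ   */
        prev   = b;
    }
    printf("%-7s ones: %ld of %ld   alternations: %ld\n", name, ones, n, flips);
}

int main(void)
{
    check("scaled", bit_scaled);
    check("mod 2",  bit_mod);
    return 0;
}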

Statistics For Abx

Reply #55
Quote
But going back to the 28-trial case, I still get 0.0491. So there was something wrong with my random numbers, but there must still be something wrong with your calculation (could still be roundoff errors).
There are definitely no rounding errors. I executed the LookPVal algorithm in Maple and got the exact result:
Code: [Select]
>LookPVal(28, array([0.95, 0.95, 0.98, 0.98,0.98]), array([6, 12, 18, 23,28]));
> evalf[40](%);
                              1704631
                              --------
                              33554432

             .05080196261405944824218750000000000000000

(The Excel output was
Code: [Select]
at least  6 of  6
at least  10 of  12
at least  14 of  18
at least  17 of  23
at least  20 of  28
5,08019626140594E-02
)
Still, there could be a logical flaw somewhere. (Then again, given how close the results are, I would be surprised.)


Quote
1. No looks allowed for trials 1 through 5

Why not? Statistically, they are irrelevant, but they are helpful for people like me (see my previous post).

Statistics For Abx

Reply #56
First Step: Interpret the Pascal triangle as ABX results

Definition: The Pascal triangle A (up to degree n) is an n*n matrix whose strictly upper triangular part is filled with 0's and whose first column is filled with 1's. The remaining elements are calculated as A_i,j := A_i-1,j-1 + A_i-1,j (i for row, j for column).

1  0  0  0  0  0
1  1  0  0  0  0
1  2  1  0  0  0
1  3  3  1  0  0
1  4  6  4  1  0
1  5 10 10  5  1
.............


Now consider an ABX test: with one trial, there are two possible outcomes, 0/1 and 1/1. Each has a probability of 1/2.
With two trials there are four results (correct-correct, correct-false, false-correct, false-false). These can be simplified to three cases: 0/2, 1/2 and 2/2. The corresponding probabilities are 1/4, 1/2 and 1/4, or in other words, 1/2^2, 2/2^2 and 1/2^2.

Theorem: The probability of scoring exactly k correct results out of n ABX trials (if one is guessing), written P(k/n), is A_n+1,k+1 / 2^n.
Proof: The statement holds for n=1.
Suppose it holds for n-1 ABX trials. Then
P(k/n) = P(k/(n-1))*P(0/1)       +  P((k-1)/(n-1))*P(1/1)
       = A_n,k+1 /2^(n-1) * 1/2  +  A_n,k /2^(n-1) * 1/2
       = A_n+1,k+1 /2^n
holds as well. qed.
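For what it's worth, a minimal C sketch of the theorem (the number of trials N = 6 is arbitrary): build the triangle row by row and divide the final row by 2^N to get the probabilities of exactly k correct under guessing.
Code: [Select]
/* Build the Pascal triangle row by row; entry k of row N, divided by   */
/* 2^N, is P(exactly k correct of N trials) for a guessing listener.    */
#include <stdio.h>
#include <math.h>

int main(void)
{
    enum { N = 6 };                    /* number of trials (arbitrary)   */
    double row[N + 1] = { 1.0 };       /* 0 trials: one path, 0 correct  */

    for (int n = 1; n <= N; n++)
        for (int k = n; k > 0; k--)    /* in-place update, right to left */
            row[k] += row[k - 1];      /* A(n+1,k+1) = A(n,k) + A(n,k+1) */

    for (int k = 0; k <= N; k++)
        printf("P(%d of %d) = %g\n", k, N, row[k] / pow(2.0, N));
    return 0;
}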


Second Step: Add look-points

Next we'll add look-points: let's say 4/4, 4/5 and 5/5 are winning conditions. Furthermore,
P(4/5) = P(4/4)/2 + P(3/4)/2 and P(5/5) = P(4/4)/2.
The probability of reaching a winning condition, P(4/4 or 4/5 or 5/5), is not P(4/4) + P(4/5) + P(5/5), but
  P(4/4) + P(4/5 and not 4/4) + P(5/5 and not 4/4 and not 4/5)
  (because scoring 5/5 without first scoring 4/4 is impossible)
= P(4/4) + P(4/5 and 3/4)
= P(4/4) + P(4/5 | 3/4) * P(3/4)
= 1/2^4  + P(4/5 | 3/4) * P(3/4)
= 1/16   + 1/2 * 4/16
= 1/16   + 2/16 = 3/16 = 0.1875


Third Step: The Pascal triangle approach

An easier way to calculate this would be to recalculate the Pascal triangle, line after line, but remove (set to 0) the corresponding value for each winning condition (note that even the 1's in the main diagonal line are changed):
5th line (=4th trial):
1  0  0  0  0  0
1  1  0  0  0  0
1  2  1  0  0  0
1  3  3  1  0  0
1  4  6  4  0  0
gives us a changed 6th line (=5th trial):
1  5 10 10  4  0
where we remove the 4/5 and 5/5 (note that 5/5 has probability 0, as calculated above):
1  5 10 10  0  0
the 7th line would be:
1  6 15 20 10  0  0

The sought-after probability is the sum of the removed values, each divided by 2^trialnumber:
P(4/4 or 4/5 or 5/5) = 1 / 2^4  +  4 / 2^5  +  0 / 2^5 = 3/16

But what have we done? By removing the "1" in the 5th line, we changed P(4/5) to P(4/5 and not 4/4), because the path 4/4 -> 4/5 is taken away.
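A minimal C sketch of the procedure just described (not the LookPVal code itself; the look-point arrays are the 4/4, 4/5 example above). Swapping in other profiles, e.g. 2/2, 3/4, 4/6, gives the corresponding overall alpha the same way.
Code: [Select]
/* At each look point, add the probability of the cells that satisfy    */
/* the winning condition to alpha and zero them out, so those paths     */
/* are never counted again in later rows.                               */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const int look_trial[]   = { 4, 5 };  /* trial numbers of look points */
    const int look_correct[] = { 4, 4 };  /* correct answers required     */
    const int n_looks = 2, n_max = 5;

    double row[64]   = { 1.0 };           /* paths not yet terminated     */
    double alpha     = 0.0;
    int    next_look = 0;

    for (int t = 1; t <= n_max; t++) {
        for (int k = t; k > 0; k--)       /* next triangle row, in place  */
            row[k] += row[k - 1];

        if (next_look < n_looks && t == look_trial[next_look]) {
            for (int k = look_correct[next_look]; k <= t; k++) {
                alpha += row[k] / pow(2.0, t);  /* winning paths end here */
                row[k] = 0.0;                   /* remove them            */
            }
            next_look++;
        }
    }
    printf("overall alpha = %g  (text: 3/16 = 0.1875)\n", alpha);
    return 0;
}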


Fourth Step: Implementing the algorithm

To calculate a new line (variable name: Result) of the modified Pascal triangle we need only the last line, which is stored in the variable LastResult. To sum up all the probabilities of winning conditions, I've used the variable Prob, which is set to 0 at the beginning and increased as winning situations occur.

For Trial = 1 To n                      'Run through all required lines of the triangle
  If NextLook <= UBound(LookTimes) Then  'Is a look-time left?
  ...
    If Trial = LookTimes(NextLook) Then  'Is this trial a lookup-point?

The second if-clause checks whether the current trial has a winning condition.
 
Note: For my program I used a slightly different approach: winning conditions are not specified directly (like 8/10), but calculated indirectly by passing a requested confidence for each trial which is a look-point.
By adding
...
        Wend
       
        Debug.Print ("at least " & Str(k) & " of " & Str(Trial))
        'lists the included winning conditions

        If k <= Trial Then
...

you can review the winning conditions used (in the debug window).

I hope this explains what my program is doing.

Edit: Corrected wrong indices in proof. Removed smilie.
(This board screwed my double-spaces...)
Edit 2: more corrections
Edit 3: Yet another correction: I erroneously used conditional probabilities on a few occasions.

Statistics For Abx

Reply #57
I still don't know why the simulation doesn't agree with the calculation.  Let's try a very simple one.  Can you calculate the overall alpha for the following lookpoints, maximum 6 trials:

2 of 2
3 of 4
4 of 6

The exact answer is 0.453125

My program yields 0.4530 with 10 million simulations
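For reference, a minimal C sketch of this kind of simulation (not ff123's actual simulator): the simulated listener guesses every trial and stops at the first look point whose criterion is met, and the fraction of runs that pass at any look point estimates the overall alpha.
Code: [Select]
/* Monte Carlo estimate of the overall alpha for the 2/2, 3/4, 4/6      */
/* profile above (exact value 0.453125).                                */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int  look_trial[]   = { 2, 4, 6 };  /* look points                */
    const int  look_correct[] = { 2, 3, 4 };  /* required correct answers   */
    const int  n_looks = 3;
    const long n_sims  = 10000000L;           /* 10 million simulated tests */
    long passes = 0;

    srand(1);                                 /* arbitrary fixed seed       */
    for (long s = 0; s < n_sims; s++) {
        int correct = 0, trial = 0;
        for (int l = 0; l < n_looks; l++) {
            while (trial < look_trial[l]) {
                /* random guess: scale the whole rand() value, not rand()%2 */
                correct += (int)(2.0 * (rand() / (RAND_MAX + 1.0)));
                trial++;
            }
            if (correct >= look_correct[l]) { passes++; break; }
        }
    }
    printf("estimated alpha = %.4f\n", (double)passes / n_sims);
    return 0;
}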

ff123

Statistics For Abx

Reply #58
Quote
I still don't know why the simulation doesn't agree with the calculation.  Let's try a very simple one.  Can you calculate the overall alpha for the following lookpoints, maximum 6 trials:

2 of 2
3 of 4
4 of 6

The exact answer is 0.453125

My program yields 0.4530 with 10 million simulations

ff123

at least  2 of  2
at least  3 of  4
at least  4 of  6
0,453125 -> exact

Statistics For Abx

Reply #59
Quote
at least  2 of  2
at least  3 of  4
at least  4 of  6
0,453125 -> exact

This is really stumping me.

Can you try two more easy tests?

at least 2 of 2
at least 3 of 6

exact answer is 0.671875
simulation yields: 0.6715 after 10 million sims.

Also:

at least 2 of 4
at least 3 of 6

exact answer is 0.75
simulation yields: 0.7501 after 10 million sims.

The Pascal triangle method is really interesting.  I'm trying to verify that it works using excel right now.

ff123

Statistics For Abx

Reply #60
Yay!

I made up my own spreadsheet using the Pascal triangle method and tried to get the simple examples to agree. I finally ended up with an answer of 0.049155 for the 28-trial profile.

ff123

Statistics For Abx

Reply #61
So there is a mistake in the code. If you can find it, tell me.

Edit: ARGH! Found it! I forgot to reset the values to 0 for winning conditions. This problem couldn't occur with my first version, but now happens under certain circumstances.

I had to add a line:
          For k = k To Trial
            Prob = Prob + (LastResult(k) + LastResult(k + 1)) / 2 ^ Trial
            Result(k + 1) = 0                'THIS LINE WAS MISSING

          Next k


Here is the corrected version: http://www.freewebz.com/aleph/CorrPVal3.xls

I have added a new version, which makes setting up tests easier. (LookPval2)

Statistics For Abx

Reply #62
Great!  The code looks like it should be very easy to incorporate into abchr, and it will be a lot faster than simulation.

ff123

Statistics For Abx

Reply #63
Another thing to consider is that this test allows no obvious conclusions beyond the given 0.95 or 0.99 confidence, because the test is either passed or failed. Not that it was much different with old ABX tests, but a 16/16 result allowed claiming a difference with confidence >0.9998. Maybe some extremely high-confidence test modes should be added. (At least some people think it's necessary, otherwise we wouldn't see that many 16/16 or 29/30 results.)

Just an idea.

Statistics For Abx

Reply #64
I think this is a major improvement to ABX testing. Great job!

Statistics For Abx

Reply #65
Quote
I've been thinking some more about the in-between-look terminations. Since the listener cannot make a decision to continue the test after he terminates it, I think I have calculated things wrong. For example, if the listener gets a look at trial 6, but then stops at trial 8, then all the other looks at trial 12, 18, 23, and 28 should not be counted towards the overall alpha.


I think there are three options:
1. Show the progress to the listener, but do not allow him to quit the test here (or only with the worst case assumed for the next look-point).
2. As (1), but don't show anything. This is the strictest option.
3. Don't show the progress, but allow the listener to quit with his current result (at the point-in-between). This might lead to statistical problems* (alpha might be higher).

Edit:
*) I have checked this and it is true. Let's say the listener achieved 13/18 at the third look point in the 28 profile, which is insufficient to pass the test.
If he continues under option 1 or 2, the probability of succeeding is the same as that of passing a test with look-points 4/5 and 7/10 (= (17-13)/(23-18) and (20-13)/(28-18)), i.e. 0.25586.
But if option 3 is applied, the user can effectively stop after a 1/1 on the next trial (even though he cannot see it). So if he stops after the next trial (i.e. after the 19th trial in total) he succeeds with probability 0.5!

Therefore, option 3 is statistically unusable (or everything would have to be recalculated, which would be quite difficult).

Statistics For Abx

Reply #66
I'm having difficulty understanding your point.  Are you saying that my in-between values are correct or not?

My thinking came about because I was trying to decide what should be done if the listener performs without knowing progress up to trial 5, and then terminates.  What should the program show if he got all 5 correct?  It should show an unadjusted alpha of 0.031.  In other words, if the listener cannot make a decision to continue or not based on information he has seen, there should be no adjustment.

BTW, I will probably go ahead and show progress for trials 1 through 4.  Yes, the listener can perform a Bayesian analysis and decide to stop if he gets all 4 wrong, but since this test is mainly interested in type 1 errors, that should not be a problem.

ff123

Statistics For Abx

Reply #67
Quote
I'm having difficulty understanding your point. Are you saying that my in-between values are correct or not?
I'm not sure how you calculated them. Could you explain it a little more? But I suspect you would run into the same problem.

Quote
My thinking came about because I was trying to decide what should be done if the listener performs without knowing progress up to trial 5, and then terminates. What should the program show if he got all 5 correct? It should show an unadjusted alpha of 0.031. In other words, if the listener cannot make a decision to continue or not based on information he has seen, there should be no adjustment.
Yes, without any information the user should be allowed to stop at 5/5. (Which would theoretically be the same as if he had chosen this test method -- 5/5 -- right from the beginning.)
But if more information is available in later trials, a guessing listener might use it to his advantage; see the example in my last post, where the user cannot be allowed to stop after 19 trials (with his real score).

Quote
BTW, I will probably go ahead and show progress for trials 1 through 4. Yes, the listener can perform a Bayesian analysis and decide to stop if he gets all 4 wrong, but since this test is mainly interested in type 1 errors, that should not be a problem.
Personally, I like the idea of seeing the results for the first trials. But then the listener can't be allowed to stop at 5/5, as this would obviously increase the total alpha -- just like adding a look point at 5/5:
Code: [Select]
at least  6 of  6
at least  10 of  12
at least  14 of  18
at least  17 of  23
at least  20 of  28
4,91552352905273E-02
at least  2 of  1
at least  3 of  2
at least  4 of  3
at least  5 of  4
at least  6 of  5
at least  6 of  6
at least  10 of  12
at least  14 of  18
at least  17 of  23
at least  20 of  28
4,91552352905273E-02
at least  2 of  1
at least  3 of  2
at least  4 of  3
at least  5 of  4
at least  5 of  5
at least  6 of  6
at least  10 of  12
at least  14 of  18
at least  17 of  23
at least  20 of  28
6,20170868933201E-02

Statistics For Abx

Reply #68
Here's how I was calculating the in-between points:

For Trial 13, for example, I would include the look points at trials 6 and 12 in the total alpha calculation, but not the look points at trials 18 and 23.  In this respect, it would be just like calculating the overall alpha after stopping at trial 28.  So in my simulator, I would enter

6 of 6
10 of 12
11 of 13

to get a total alpha of 0.0295

Ok, I'm reviewing your 13/18 case.  Using my current table, if the listener is allowed to stop at 19, he does have a 50% chance of randomly getting the next one right and passing the test overall.  But what's wrong with that?  To get 13 of 18, he had to be pretty close to an overall significance of 0.05 in the first place (sim says about 0.062).

ff123

Edit:  still thinking about seeing trials 1 - 4

Statistics For Abx

Reply #69
Quote
Ok, I'm reviewing your 13/18 case. Using my current table, if the listener is allowed to stop at 19, he does have a 50% chance of randomly getting the next one right and passing the test overall. But what's wrong with that? To get 13 of 18, he had to be pretty close to an overall significance of 0.05 in the first place (sim says about 0.062).
Yes, but nominally, without the option to stop at 19 his chances are far lower, i.e. in a strict 28-trial look point test, they are only about 0.25586. (see above)

The other in-between points might have the same problem.

Statistics For Abx

Reply #70
Regarding trials 1-4, I am thinking about how I have simulated it.  Right now, if I enter 1 of 1 as a look point, then the simulation assumes the listener terminates if he gets a 1 of 1.  But that isn't how it would really work.  In real life, the listener should never terminate, no matter what results he gets on trials 1-4.  So I think allowing the listener to see those first 4 trials is ok (trial 5 should be blinded).

ff123

Statistics For Abx

Reply #71
Quote
Yes, but nominally, without the option to stop at 19 his chances are far lower, i.e. in a strict 28-trial look point test, they are only about 0.25586. (see above)

The other in-between points might have the same problem.

I'm considering things purely from a simulation point of view right now.  I.e., what does the simulation say the overall probability of getting 14 of 19 is when he is allowed to look at trials 6, 12 and 18 (and terminate early) and then allowed to stop at trial 19?

The simulator says 0.0495 probability of terminating at any of the look points or at trial 19 with an adequate score.

ff123

Statistics For Abx

Reply #72
Quote
I'm considering things purely from a simulation point of view right now. I.e., what does the simulation say the overall probability of getting 14 of 19 is when he is allowed to look at trials 6, 12 and 18 (and terminate early) and then allowed to stop at trial 19?

The simulator says 0.0495 probability of terminating at any of the look points or at trial 19 with an adequate score.

But what if things add up? The problem is that the listener can choose whether or not to abort the test early, and therefore has an advantage. It is clear to me that the total alpha would increase -- maybe not above 0.05, but it would definitely be higher than what we calculated earlier.

Statistics For Abx

Reply #73
How should one modify the simulation?

Right now the simulation says that the listener always terminates at a look point if the total alpha is 0.05 or less.  This is as much to the listener's advantage as possible.

But there is another approach to all this.  So far we have investigated the "frequentist" approach.  The Bayesian approach could be just as interesting.  Say that the listener bases his decision on whether or not to continue on his past performance.  What should his decision be at each look point?

ff123

Statistics For Abx

Reply #74
Quote
How should one modify the simulation?
If you mean to account for in-between termination, I have no idea right now. It would be possible to calculate the best strategy at each (unsuccessful) look-point, but it would be rather difficult.

Quote
Right now the simulation says that the listener always terminates at a look point if the total alpha is 0.05 or less. This is as much to the listener's advantage as possible.
Yes, and this is good. But this only includes the look-points. Allowing the user to terminate in-between causes nontrivial problems (at least for me).

Quote
But there is another approach to all this. So far we have investigated the "frequentist" approach. The Bayesian approach could be just as interesting. Say that the listener based his decision on whether or not to continue based on his past performance. What should be his decision at each look point?
When either option 1 or 2 from my previous post is used, it doesn't matter, I think, because his decision is obvious: stop when the target is reached, otherwise continue. I don't see what he should do differently.