Maths of ABX tests

2016-06-01 22:39:56

This is likely due to my naive/limited understanding of statistics, but how does one go about hypothesis testing an ABX test with hypotheses

H_0: p>0.5
H_1: p=0.5

Here p is the probability of a correct choice in an individual ABX trial. Under this framing, the number of correct choices in a sample of trials would have to be closer to the expected value for random guessing as the significance level is increased.

This is as opposed to the trivial problem of

H_0: p=0.5
H_1: p>0.5

This is because the intention of the experiment is not to determine, to the highest level of confidence, that A and B are distinguishable, but that they are perceptually identical. I can't work out how one would determine the minimum number of trials required for this at a specified significance level, or what the distribution of the number of correct choices over n trials is when p>0.5, since it does not seem to be as simple as a mere binomial.

Re: Maths of ABX tests

Reply #1 – 2016-06-02 00:09:37

Quote from: pixelwitch on 2016-06-01 22:39:56

This is likely due to my naive/limited understanding of statistics, but how does one go about hypothesis testing an ABX test with hypotheses

Ah, the old argument from incredulity. If we are deciding a yes/no question in a properly blinded test then the binomial distribution is what applies whether you find it credible or not. Period.

Re: Maths of ABX tests

Reply #2 – 2016-06-02 00:38:46

Quote from: pixelwitch on 2016-06-01 22:39:56

This is because the intention of the experiment is not to determine, to the highest level of confidence, that A and B are distinguishable, but that they are perceptually identical. I can't work out how one would determine the minimum number of trials required for this at a specified significance level, or what the distribution of the number of correct choices over n trials is when p>0.5, since it does not seem to be as simple as a mere binomial.

Showing that p=0.5 would require infinitely many trials since the width of the confidence interval will only asymptotically approach 0. Which makes sense, since with a finite number of trials you'd never expect to hit the expectation value exactly.
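
To put rough numbers on how slowly the interval narrows, here is a minimal sketch in Python. It assumes the normal (Wald) approximation to the binomial and an observed proportion of exactly 0.5; the function name `ci_half_width` is only illustrative.

```python
from statistics import NormalDist

def ci_half_width(n, p_hat=0.5, confidence=0.95):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * (p_hat * (1 - p_hat) / n) ** 0.5

# Assumed scenario: the listener always scores exactly half of the trials.
for n in (10, 100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9}: p = 0.5 +/- {ci_half_width(n):.4f}")
```

The half-width shrinks like 1/sqrt(n): quadrupling the number of trials only halves it, and it never reaches zero for any finite n.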

I think this is just not a very productive way to set up your problem.

Re: Maths of ABX tests

Reply #3 – 2016-06-02 01:02:16

Does that mean that there is no means of determining whether P(not-guessing) < p-value ?
If so, is the only alternative to test for distinguishability, and accept transparency otherwise?

Re: Maths of ABX tests

Reply #4 – 2016-06-02 04:14:20

Quote from: pixelwitch on 2016-06-02 01:02:16

Does that mean that there is no means of determining whether P(not-guessing) < p-value ?

You could show that P was within some bound around 50% at a certain confidence level. This won't help you prove that two things are identical though. They could simply be different at a level you are not powered to detect.
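
As a rough illustration of "not powered to detect", the sketch below computes how often a listener with a given true per-trial success probability would pass a one-sided binomial test. The 16-trial length, the 5% significance level, and the function names are assumptions chosen for the example, not anything prescribed in this thread.

```python
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n, alpha=0.05):
    """Smallest score k whose tail probability under guessing (p = 0.5) is <= alpha."""
    for k in range(n + 1):
        if tail_prob(n, k, 0.5) <= alpha:
            return k
    return None  # no score reaches significance with this few trials

def power(n, true_p, alpha=0.05):
    """Chance that a listener with success probability true_p passes the test."""
    k = critical_value(n, alpha)
    return 0.0 if k is None else tail_prob(n, k, true_p)

# Hypothetical listeners with small to large real abilities.
for true_p in (0.55, 0.6, 0.7, 0.9):
    print(f"true p = {true_p}: power with 16 trials = {power(16, true_p):.2f}")
```

A listener who really is right 60% of the time fails such a test far more often than not, which is why a failed test alone says little about small differences.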

Quote from: pixelwitch on 2016-06-02 01:02:16

If so, is the only alternative to test for distinguishability, and accept transparency otherwise?

If you're asking for a way to prove the absence of a thing in general, that is going to be quite hard. Usually failing to detect a difference under relevant conditions is a close enough approximation.

Re: Maths of ABX tests

Reply #5 – 2016-06-02 12:01:17

Quote from: saratoga on 2016-06-02 04:14:20

Quote from: pixelwitch on 2016-06-02 01:02:16
Does that mean that there is no means of determining whether P(not-guessing) < p-value ?

You could show that P was within some bound around 50% at a certain confidence level. This won't help you prove that two things are identical though. They could simply be different at a level you are not powered to detect.

Are you referring to accepting transparency if
P(trials - number correct <= X <= number correct) < significance level,
or perhaps,
P(trials/2 <= X <= number correct) < significance level?

With the issue being that the minimum number of trials required at each significance level is substantially higher e.g. 7 vs 6366 for a 1% significance level?
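
For the conventional test, the figure of 7 can be checked with a short Python sketch: it finds the fewest trials for which a perfect score has probability at most the significance level under pure guessing. The function name is illustrative. The 6366 figure depends on exactly how the transparency criterion above is defined, so it is not reproduced here.

```python
def min_trials_perfect_score(alpha):
    """Fewest trials n such that P(all n correct | p = 0.5) = 0.5**n <= alpha."""
    n = 0
    while 0.5 ** n > alpha:
        n += 1
    return n

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: at least {min_trials_perfect_score(alpha)} trials")
```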

Re: Maths of ABX tests

Reply #6 – 2016-06-02 14:48:15

One of the "issues" (not really for practical purposes) is that significantly ">0.5" does not say how much. If you make 1 000 000 000 000 trials with 500 000 000 000 successes, then you can be confident that you are between 49.9999 and 50.0001 percent. Which is a worry if your life depends on some probability being "1/2" and not "500001 in a million".
The following two statements are not the same:
Given iid trials with a fixed probability >1/2, then there is a number N of trials such that the test should conclude.
Given iid trials with a probability >1/2 and a test with N trials, you should get a conclusion.
Since in the latter case N is given, there is an interval (containing 1/2) of probabilities which is so narrow that N trials is way too little.
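
The arithmetic behind the 1 000 000 000 000 trial example can be sketched as follows, assuming the normal approximation and a band of two standard errors either side of the observed proportion (roughly 95% confidence). The function name is illustrative.

```python
from math import sqrt

def two_sigma_interval(successes, trials):
    """Observed proportion plus/minus two standard errors (normal approximation)."""
    p_hat = successes / trials
    se = sqrt(p_hat * (1 - p_hat) / trials)  # standard error of the proportion
    return p_hat - 2 * se, p_hat + 2 * se

low, high = two_sigma_interval(500_000_000_000, 1_000_000_000_000)
print(f"{100 * low:.4f}% to {100 * high:.4f}%")
```

Because the standard error falls as 1/sqrt(N), any fixed N leaves a band around 1/2 inside which a true probability cannot be told apart from guessing.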

But there is a different issue with the OP's framing, and that is to distinguish between ">1/2 vs =1/2" and ">1/2 vs at most 1/2". That is the one-sided vs. two-sided test thing.

Re: Maths of ABX tests

Reply #7 – 2016-06-02 17:40:12

Bayes Factors for ABX tests

You use exactly the same equation.
P(X = x | H₀)

In the "simple" case H₀: p=0.5, in your case: H₀: p>0.5.

So you integrate for p from 0.5 to 1.0.
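
A minimal sketch of that integration in Python, under an assumption the post does not spell out: that p is equally likely to be anywhere between 0.5 and 1 (a uniform prior, which has density 2 on that interval). The midpoint rule is used only to keep the example dependency-free, and all function names are illustrative.

```python
from math import comb

def likelihood(n, k, p):
    """P(X = k | p) for n trials: the binomial probability mass."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def marginal_p_above_half(n, k, steps=20_000):
    """P(X = k | p > 0.5), averaging the likelihood over a uniform prior on [0.5, 1].

    Midpoint-rule integration; the factor 2 is the assumed prior density.
    """
    width = 0.5 / steps
    total = sum(likelihood(n, k, 0.5 + (i + 0.5) * width) for i in range(steps))
    return 2 * total * width

def bayes_factor(n, k):
    """Evidence for 'p > 0.5' relative to 'p = 0.5'; values below 1 favor guessing."""
    return marginal_p_above_half(n, k) / likelihood(n, k, 0.5)

for k in (5, 8, 10):
    print(f"{k}/10 correct: Bayes factor = {bayes_factor(10, k):.2f}")
```

With 5 out of 10 correct the factor comes out below 1, so the data favor guessing; with 10 out of 10 it is large and favors p > 0.5. A different prior over p would give different numbers.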

Re: Maths of ABX tests

Reply #8 – 2016-06-02 18:02:29

Ah, I had actually considered integrating between 0.5 and 1, but retracted it in an edit due to concerns with the whether

Or if it was even describing what I thought it was.

Re: Maths of ABX tests

Reply #9 – 2016-06-02 21:39:56

Quote from: xnor on 2016-06-02 17:40:12

Bayes Factors for ABX tests

You use exactly the same equation.
P(X = x | H₀)

Does that mean that in Bayesian statistics the analogues of the hypotheses in the test are reversible without changing the test?

Beyond that, I now grasp that the aforementioned summation should be 1/2 not 1, due to the single tail, and what it encompasses. But I'm unsure as to whether I'd be correct in stating that the equivalent hypothesis test would be demonstrating that given:

H₀: p>0.5
H₁: p=0.5

n=number of trials
X=number of correct choices
k=number of correct choices in the experiment

Re: Maths of ABX tests

Reply #10 – 2016-06-03 22:19:03

Quote from: pixelwitch on 2016-06-02 21:39:56

Does that mean that in Bayesian statistics the analogues of the hypotheses in the test are reversible without changing the test?

In null hypothesis testing you have a null hypothesis. You assume the null hypothesis as a given, and then try to reject it by calculating the probability of the observed or a more extreme result.
The significance threshold applied to that p-value acts as a more or less arbitrary decision point for when to reject.

This is quite problematic. For one, you can reject a null due to tiny differences (they could be completely irrelevant to what you're trying to show) given a large enough number of trials. Secondly, you will reject the null in many cases where the null is actually true.
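
The first point can be illustrated with a hypothetical listener who is right 50.1% of the time. This sketch assumes the normal approximation to the binomial with a continuity correction, and assumes the listener scores exactly the average number of correct answers; the names are illustrative.

```python
from statistics import NormalDist

def one_sided_p_value(k, n):
    """Approximate P(X >= k | p = 0.5) via the normal approximation (for large n)."""
    mean = n / 2
    sd = (n ** 0.5) / 2
    z = (k - 0.5 - mean) / sd  # continuity correction
    return 1 - NormalDist().cdf(z)

true_p = 0.501  # hypothetical, practically irrelevant ability
for n in (1_000, 100_000, 10_000_000):
    k = round(true_p * n)  # the score such a listener would get on average
    print(f"n = {n:>10}: {k} correct, p-value = {one_sided_p_value(k, n):.3g}")
```

The same tiny effect is nowhere near significant at a thousand trials and overwhelmingly significant at ten million.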

Bayesian hypothesis testing allows you to pit two hypotheses against each other. A vs B or B vs A does not matter, the results will just be the inverse of each other (or the negative of each other if using logs).
Whichever hypothesis fits the observed result better will be favored.

Re: Maths of ABX tests

Reply #11 – 2016-06-03 22:48:55

The problem with your H1 is that p=0.5 does not mean you will get X=k=5 successes for a 10 trial experiment. You will get another probability distribution for all k from 0 to 10.
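
That distribution is easy to tabulate. A small Python sketch for a pure guesser (p = 0.5) over 10 trials, with an illustrative function name:

```python
from math import comb

def pmf_fair(n):
    """P(X = k | p = 0.5) for every possible score k = 0..n."""
    return [comb(n, k) / 2**n for k in range(n + 1)]

for k, prob in enumerate(pmf_fair(10)):
    print(f"k = {k:>2}: {prob:.4f}")
```

Even for a pure guesser, exactly 5 out of 10 turns up only about a quarter of the time (252/1024).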

Also, it's 2 times the integral, so that the probabilities over all k sum to 1.

(edit: the images in the linked thread are online again)

Re: Maths of ABX tests

Reply #12 – 2016-06-03 23:24:42

Quote from: xnor on 2016-06-03 22:48:55

The problem with your H1 is that p=0.5 does not mean you will get X=k=5 successes for a 10 trial experiment. You will get another probability distribution for all k from 0 to 10.

Sorry, I'm not sure what you mean by this.

Re: Maths of ABX tests

Reply #13 – 2016-06-04 17:49:42

In the typical ABX null hypothesis test you do not calculate the probability of H₁: p > 0.5 given the data. You calculate the probability of the data (X=x) or a more extreme result (X>x) given p = 0.5.
A leap is then made to make conclusions about the alternative hypothesis H₁. I've already mentioned some problems with this before.
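
For reference, that calculation can be sketched in a few lines of Python. It computes the probability of scoring at least as well as observed when p = 0.5; the function name and the 12-of-16 example are illustrative.

```python
from math import comb

def abx_p_value(correct, trials):
    """P(X >= correct | p = 0.5): chance of doing at least this well by guessing."""
    return sum(comb(trials, i) for i in range(correct, trials + 1)) / 2**trials

print(f"12 of 16 correct: p-value = {abx_p_value(12, 16):.4f}")
```

Note that this is a statement about the data given guessing, not about the probability that the listener can hear a difference.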

When you do a 10 trial test, you can calculate P(X=5|p>0.5) but the leap to make any conclusions about your H₁ would be ginormous.

Flip a fair coin 10 times. On average, how many times will you get a 5 heads/tails result?

Notice