
ABX

These may be the most 'duh!!' questions anyone on this board has ever asked, but I am genuinely ignorant: what is ABX? How is ABXing done? So far, I have only learnt that ABXing must be done to achieve objective and accurate results when comparing test samples.

I could not find a description anywhere, and searching this forum didn't help.

If anyone is not sufficiently turned off by my questions to post a reply, can he/she kindly direct me to some resources?
Thanks in advance! 

ABX

Reply #1
This is a FAQ.

First go to ff123's page.

There are two major types of blind listening tests:

ABA: Also called "odd man out". The tester listens to a sequence of two original samples (A) and one encoded sample (B), presented in random order, e.g. ABA, AAB or BAA. The tester must identify the B sample.
Use the WinABX v0.22 ABA version to perform the tests.

ABX: The tester first listens to the original sample (A), then to the encoded sample (B), and then to sample X, which is randomly chosen to be either A or B. The tester must identify X: is it A or B?
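To make the procedure concrete, here is a minimal sketch of an ABX trial loop (my own illustration; real tools like WinABX handle the actual audio playback and switching, and play() below is just a hypothetical stand-in):

[code]
import random

def play(label: str, sample: str) -> None:
    # Placeholder for actually auditioning a sample.
    print(f"[{label}] now playing: {sample}")

def abx_trial(a: str, b: str) -> bool:
    """One ABX trial: present A, B, then a hidden X; return True if
    the listener names X correctly."""
    x_is_a = random.random() < 0.5  # X is secretly either A or B
    play("A", a)
    play("B", b)
    play("X", a if x_is_a else b)
    answer = input("X is (a/b)? ").strip().lower()
    return answer == ("a" if x_is_a else "b")

if __name__ == "__main__":
    correct = sum(abx_trial("original.wav", "encoded.wav") for _ in range(8))
    print(f"{correct}/8 correct")
[/code]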

To achieve a statistically significant result (p <= 0.5%) with the chosen AB pair of samples, the tester must perform a sequence of at least 8 ABX trials, or at least 5 ABA trials.

Edit:
Thanks to KikeG and ff123 for the heads-up and for correcting me here. The last statement should actually read:

To achieve a statistically significant result with the chosen AB pair of samples, the tester must perform a sequence of:
at least 5 ABX trials, or at least 3 ABA trials, for p <= 5%;
at least 8 ABX trials, or at least 5 ABA trials, for p <= 1%.
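For the curious, here is where those numbers come from (a sketch of my own, not from any ABX tool): the chance of getting every trial right by pure guessing is g^n, where g is the per-trial guess probability, 1/2 for ABX and 1/3 for ABA (three possible orderings).

[code]
def min_trials(guess_prob: float, threshold: float) -> int:
    """Smallest n such that guess_prob**n <= threshold."""
    n = 1
    while guess_prob ** n > threshold:
        n += 1
    return n

for name, g in (("ABX", 1 / 2), ("ABA", 1 / 3)):
    for p in (0.05, 0.01):
        n = min_trials(g, p)
        print(f"{name}: {n} all-correct trials for p <= {p} (p = {g ** n:.4%})")

# ABX: 5 all-correct trials for p <= 0.05 (p = 3.1250%)
# ABX: 7 all-correct trials for p <= 0.01 (p = 0.7813%)
# ABA: 3 all-correct trials for p <= 0.05 (p = 3.7037%)
# ABA: 5 all-correct trials for p <= 0.01 (p = 0.4115%)
[/code]

Strictly speaking, 7 all-correct ABX trials already clear the 1% bar; the 8-trial figure quoted above is explained in Reply #7 below (it leaves room for one wrong guess).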


ABX

Reply #3
Quote
For a short definition, see: http://www.rane.com/par-a.html

Oh, good link! Bookmarked.


Quote
...neither the tester nor the listener (can be the same)...

Curious, so the test subject is officially called the "listener".
I assumed tester = listener in my post above.

ABX

Reply #4
Quote
To achieve a statistically significant result (p <= 0.5%)

Ehm... that's wrong.

It should be p <= 5% or p <= 0.05 (same thing).

ABX

Reply #5
Quote
Quote
To achieve a statistically significant result (p <= 0.5%)

Ehm... that's wrong.

It should be p <= 5% or p <= 0.05 (same thing).

Isn't it just a matter of taste, somehow? If you want more certainty (and have enough time and patience), why not go for p <= 0.01?

One more thing that worries me about this:

Let's say I do an ABX test on a hard-to-spot sample/setting. I'll need some practice/training rounds to be prepared. When I start the test and my 1st choice is wrong (0/1), I say "this belongs to the training" and press Reset. If the 1st choice is right but the second is wrong (1/2), I say "I lost concentration and need to focus better" and press Reset.

So the true result will be 3/3 instead of 5/5 (p = 12.5% instead of 3.125%), or 5/6 instead of 7/8. [EDIT]With ABA it'd be even worse.[/EDIT] So IMO it's a good idea (and especially a good idea to tell people you don't know) to go for p <= 0.01.
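A quick simulation (my own sketch, not part of the original post) confirms that arithmetic: a pure guesser who silently restarts whenever the first or second trial goes wrong will report a "perfect" 5/5 about 12.5% of the time, four times the nominal 3.125%.

[code]
import random

def reported_score(n_trials: int = 5) -> int:
    """One session by a pure guesser who presses Reset whenever the
    first or second trial of a session is wrong ("training rounds"),
    and reports only the session that survives."""
    while True:
        results = [random.random() < 0.5 for _ in range(n_trials)]
        if not (results[0] and results[1]):
            continue  # Reset: pretend this session never happened
        return sum(results)

random.seed(1)
runs = 200_000
fives = sum(reported_score() == 5 for _ in range(runs))
print(f"reported 5/5 in {fives / runs:.1%} of sessions")  # ~12.5%, not 3.125%
[/code]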
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello

ABX

Reply #6
Quote
Isn't it just a matter of taste, somehow? If you want more certainty (and have enough time and patience), why not go for p <= 0.01?

In a strictly proper test, the bar is set beforehand, and you determine whether you achieved it after the test is complete. One problem with setting the bar at 0.01 instead of at 0.05 is that you have a greater chance of not getting there; and no fair lowering the bar to 0.05 post-test if you fail to reach 0.01!

Also, if you set the bar too high, you run the risk of making an error in the opposite direction (i.e., saying there isn't a difference when there really is one).  However, in practice, almost nobody here worries about these types of errors, being much more concerned instead with not making the mistake of saying there is a difference when there really isn't one.
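To put rough numbers on that trade-off (my own sketch; the 90% per-trial detection rate is an assumed figure, not from this thread): in 8 ABX trials, 7/8 correct clears the 5% bar but not the 1% bar, so the 1% bar effectively demands a perfect 8/8.

[code]
from math import comb

def p_at_least(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n trials (success prob p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Under pure guessing (p = 0.5):
print(f"7/8 by luck: {p_at_least(8, 7, 0.5):.2%}")  # ~3.52% -> clears 5%, not 1%
print(f"8/8 by luck: {p_at_least(8, 8, 0.5):.2%}")  # ~0.39% -> clears 1%

# An assumed listener who truly hears the difference 90% of the time:
print(f"passes the 5% bar: {p_at_least(8, 7, 0.9):.0%}")  # ~81%
print(f"passes the 1% bar: {p_at_least(8, 8, 0.9):.0%}")  # ~43%
[/code]

So under that assumption, the stricter bar would fail a genuine hearer more than half the time, which is exactly the opposite-direction error described above.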

Quote
Let's say I do an ABX test on a hard-to-spot sample/setting. I'll need some practice/training rounds to be prepared. When I start the test and my 1st choice is wrong (0/1), I say "this belongs to the training" and press Reset. If the 1st choice is right but the second is wrong (1/2), I say "I lost concentration and need to focus better" and press Reset.


That's called cherry-picking, and for a test to be strictly proper, that sort of thing can't be allowed. That is, every trial must count. So if you think you need training, train yourself before you make your first choice. The best way to avoid cherry-picking is either to make all the trials blind (so you can't see your results in real time) or to use some sort of ABX profile (which nobody has implemented so far).

ff123

 

ABX

Reply #7
Quote
Quote
To achieve a statistically significant result (p <= 0.5%)

Ehm... that's wrong.

It should be p <= 5% or p <= 0.05 (same thing).

Yeah! Thanks.

I wonder what I was thinking... I totally messed up the number of trials.

It should be just 5%, of course.

What I failed to remember is that those numbers (8 ABX and 5 ABA) were the number of required trials IF one of the trials was guessed wrong.
Actually, that happens to me a lot with near-transparent samples.

So basically, 5 correctly guessed trials in ABX and 3 in ABA are enough if the difference is obvious.

If the difference is not obvious, then I'd better go for 8 ABX and 5 ABA to start with. Then I look at the result: if all are guessed right, very well, p < 1%. If only one trial is guessed wrong (it happens to me very often), still good, p < 5%. If more trials are wrong, continue testing or give up.
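Those thresholds check out. Here is a small sketch (mine, using the same binomial tail as earlier) of the exact probabilities of such scores arising from pure guessing:

[code]
from math import comb

def p_at_least(n: int, k: int, g: float) -> float:
    """Chance of at least k lucky guesses in n trials (guess prob g)."""
    return sum(comb(n, i) * g**i * (1 - g)**(n - i) for i in range(k, n + 1))

# ABX, 8 trials (guess probability 1/2):
print(f"8/8 ABX: {p_at_least(8, 8, 1/2):.2%}")  # 0.39% -> p < 1%
print(f"7/8 ABX: {p_at_least(8, 7, 1/2):.2%}")  # 3.52% -> p < 5%

# ABA, 5 trials (guess probability 1/3):
print(f"5/5 ABA: {p_at_least(5, 5, 1/3):.2%}")  # 0.41% -> p < 1%
print(f"4/5 ABA: {p_at_least(5, 4, 1/3):.2%}")  # 4.53% -> p < 5%
[/code]

One caveat from Reply #6 still applies: deciding to "continue testing" after a weak score is itself a form of moving the bar, so the number of trials should really be fixed before the first one.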