Topic: How do I actually perform ABX tests?

How do I actually perform ABX tests?

I have the ABX plugin with foobar, and I did some playing around with two files and didn't really know if I was doing the test correctly. Am I supposed to be comparing the difference between A and B or am I supposed to be looking for similarities between X and Y?

Reply #1
You have to decide if X or Y is A or B.

Reply #2
A and B are known, X and Y are randomized. After listening intently to all four A, B, X and Y, you should be able to say whether A is X or Y, and same for B.

You must repeat this process a number of times ("trials") in a single ABX session, preferably 10-20. The reported probability that you were guessing will either hover around 50% or approach zero. In the first case, you were most likely guessing, so the test gives no evidence that you can hear a difference between A and B. In the second case, there's a good chance you can hear the difference.
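To make the "hovers around 50% or approaches zero" point concrete, here is a minimal sketch of how such a probability can be computed for one session. It assumes each trial is an independent 50/50 guess when no difference is audible; the function name `abx_p_value` is made up for illustration and is not taken from any plugin.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    by pure guessing (assumed 50/50 on every trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the guess sequences with `correct` or more hits,
    # then divide by the total number of possible sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# A 16-trial session: a score near half reads as guessing,
# a high score makes guessing an unlikely explanation.
for correct in (8, 12, 14, 16):
    print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

A score of 8/16 gives a value near 0.6, while 14/16 gives roughly 0.002, which is the "approaches zero" case described above.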

Try it once with completely different songs to be sure of what you're doing.

ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not.
ABX tests are wholly incapable of determining which item has better quality.

Reply #3
ABX tests prove beyond a shadow of a doubt whether a person can hear a difference or not.


Well... no. A 'shadow of a doubt' is exactly what remains in a statistics-based result. We quantify how large that shadow is via the p value: when we conclude that a difference was heard, the significance threshold is our willingness to risk a false positive, i.e. a score that good arising from guessing alone. For a threshold of p=.05 (a typical, though not necessarily appropriate, value for such tests) we accept a 1-in-20 chance that a pure guesser would pass anyway. That's the shadow hanging over our conclusion.

You can shrink this, and your conclusion can lie far beyond any *reasonable* doubt, but it never actually reaches zero.
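As a rough illustration of shrinking the shadow without ever removing it, the sketch below works out how many correct answers a session needs to get under a chosen threshold, and how small the p value gets for a perfect score. It assumes independent 50/50 guessing under "no audible difference"; the names `p_value` and `min_correct_needed` are hypothetical helpers, not part of any ABX tool.

```python
from math import comb

def p_value(correct: int, trials: int) -> float:
    """Chance of at least `correct` hits in `trials` pure 50/50 guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct_needed(trials: int, alpha: float = 0.05):
    """Smallest score whose p value is at or below `alpha`,
    or None if even a perfect score cannot get there."""
    for correct in range(trials + 1):
        if p_value(correct, trials) <= alpha:
            return correct
    return None

for trials in (4, 10, 16):
    print(f"{trials} trials: need {min_correct_needed(trials)} correct for p <= 0.05")

# A perfect score keeps shrinking the p value, but it stays above zero.
for trials in (5, 10, 20, 40):
    print(f"{trials}/{trials} correct: p = {0.5 ** trials:.3g}")
```

With only 4 trials even a perfect score (p = 1/16) misses the .05 threshold, which is one reason to run more trials per session.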

Reply #4
pf, nitpickins.

But of course, yes, there are facts, and then there are statistics.

Reply #5
Or, as Mark Twain would say, "Lies, Damned Lies and Statistics".

Reply #6
Also, ABX tests are designed to demonstrate perceived differences.  They aren't really intended to determine if things sound the same, let alone prove that things sound the same.


Reply #8
pf, nitpickins.

But of course, yes, there are facts, and then there are statistics.


http://xkcd.com/882/


ooh I like that!

But I think, nitpickingly speaking, that one would have to perform the *green* test 20 times to make that 1-in-20 point.  OR show that a different color gets a 'significant' result in the next 20 rounds.
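For what the comic's 20 jelly bean colors amount to numerically, here is a small sketch. It assumes the 20 tests are independent and that there is no real effect in any of them; both function names are made up for this example, and the Bonferroni correction shown is just one common remedy, not something proposed in this thread.

```python
def chance_of_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests comes out
    'significant' at level `alpha` when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Stricter per-test threshold that keeps the overall risk near `alpha`."""
    return alpha / num_tests

for n in (1, 5, 20):
    print(f"{n:2d} tests: {chance_of_any_false_positive(n):.1%} chance of a fluke hit, "
          f"corrected per-test threshold {bonferroni_threshold(n):.4f}")
```

With 20 tests the chance of at least one fluke "hit" comes out at roughly 64%, so a single significant color out of 20 is about what chance alone would produce.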


Reply #9
http://xkcd.com/882/



Ha! That's good.

Replace "acne" with "a rare form of bone cancer" and "green jelly beans" with "fluoridated water" and you have a real news story I have read about. The fluoride scaremonger sites reduce it to simply "fluoride causes bone cancer", but when you look into it, it turns out an undergraduate took a much larger, existing study, which found no correlation, and broke it down into a number of smaller subgroups. Sure enough, a certain age category of young boys showed a (slight) correlation between fluoridated water intake and this particular cancer, yet girls the same age didn't, nor did any other age group.

I would think there must be a name for this kind of error, does anyone know what it is?

Reply #10
I would think there must be a name for this kind of error, does anyone know what it is?


I've seen it called clusters (if memory serves). Take any randomly distributed dataset, like cancer incidence over a big enough area. Even if it's perfectly random, there will be clusters within this data set. So a particular town could have a high incidence of, say, brain cancer and also happen to have overhead power lines running through it. The brain thinks these two have to be related even though it's just statistical noise. There are huge problems with this in statistics for obvious reasons.

Edit: From a quick google, my memory is failing me: it's not called clusters. It's a sampling error problem that comes from using a small section of a population (and thus a serious problem in small tests if they are not repeated elsewhere). You can't know a priori whether your small sample happens to contain people/things from one of these clusters. This all assumes that you're looking at something that is normally distributed, rather than something with a fat-tailed distribution (i.e. more chance of extreme events than one expects with a normal distribution), which complicates things further.

Sorry for all the edits, just woke up.

Reply #11
Clusters are something else.

Dunno what the universal term for this is, but in insurance and in certain branches of economics, it is called «selection». Simply, you select the dice after you have rolled them. (Google «adverse selection»: there, you rush to action while everyone else still treats the dice as random. In this case: you do N trials and report the best, while the uninformed public thinks it is a random draw.)

I guess the prototypical joke is this science demonstration at some public fair:
Scientist equips the audience with dice and tells them to roll N times and record the outcome. 
Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions.
Journalist asks to get a picture for the newspaper story of the guy who was so good at rolling dice.
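The dice joke is easy to act out in code. This is only a sketch under made-up numbers (200 people, 30 rolls each, a fixed seed so the run is repeatable); `count_sixes_per_person` is a hypothetical helper name.

```python
import random

def count_sixes_per_person(people: int = 200, rolls: int = 30, seed: int = 7):
    """Every person rolls a fair die `rolls` times; return each person's
    number of sixes. Nobody here has any skill, by construction."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) == 6 for _ in range(rolls))
            for _ in range(people)]

counts = count_sixes_per_person()
average = sum(counts) / len(counts)
best = max(counts)

# The "selection" step: pick the winner after the dice have been rolled.
print(f"average sixes per person: {average:.2f} (expected {30 / 6:.2f})")
print(f"best roller: {best} sixes, the one the journalist wants a picture of")
```

The best roller's count sits well above the average purely because it was picked after the fact, which is the same thing that happens when only the best of N trials gets reported.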

Reply #12
It's the "look-elsewhere effect". Related to the law of truly large numbers: with a large enough number of samples, even very rare events are likely to occur. If you select sample sets from a larger total, some will contain such very rare events.

 

Reply #13
I guess the prototypical joke is this science demonstration at some public fair:
Scientist equips the audience with dice and tells them to roll N times and record the outcome. 
Scientist collects the data, draws the histogram on the overhead projector, explains the theory and opens for questions.
Journalist asks to get a picture for the newspaper story of the guy who was so good at rolling dice.



It's the "look-elsewhere effect". Related to the law of truly large numbers: with a large enough number of samples, even very rare events are likely to occur. If you select sample sets from a larger total, some will contain such very rare events.



Yeah, my memory is crap; I did all this stuff in college and forgot the names of it. The phrase "look-elsewhere effect" doesn't ring any bells for me though, I think we called it something else. It was always presented to us as illness "clusters", i.e. a town with, say, a high rate of mental retardation that also happened to have fluoridated water, as a lesson in cause, effect and spurious correlation. I think it's slightly different from what you're talking about, SoAnIs: it's more about finding unusual rates of something within a subset of the population that aren't consistent with the population rate than about finding a rare event within a sample. Something along the lines of "Let me pick my sample and I can prove anything."

It's like how in a randomly distributed data set there will be clusters of a particular event happening or not happening. Say heart disease was randomly distributed and we looked at a nation's distribution of it: we would by chance find towns and villages that have very high or very low rates of heart disease. Some people take these high or low rates and automatically assume that there needs to be some causal factor behind them, when really they can just be a product of chance. Like the journalist in Porcus' joke.
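A quick simulation shows the towns-and-villages effect. Everything in it is an assumption for illustration: 500 towns of 1000 residents each, one identical 1% disease rate everywhere, and a fixed seed; `simulate_town_counts` is a made-up name.

```python
import random

def simulate_town_counts(num_towns: int = 500, residents: int = 1000,
                         rate: float = 0.01, seed: int = 1):
    """Give every resident of every town the same chance `rate` of having
    the disease, and return the number of cases in each town."""
    rng = random.Random(seed)
    return [sum(rng.random() < rate for _ in range(residents))
            for _ in range(num_towns)]

counts = simulate_town_counts()
expected = 1000 * 0.01  # cases per town if every town hit the average exactly

print(f"expected cases per town: {expected:.0f}")
print(f"lowest town: {min(counts)} cases, highest town: {max(counts)} cases")
# No town differs from any other in its underlying risk, yet the extremes
# would look like "clusters" to anyone hunting for a local cause.
```

Picking the highest town after the fact and then looking for something unusual about it (power lines, water supply) is the sample-picking described above.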