Understanding ABX Test Confidence Statistics
Reply #21 – 2015-02-01 22:07:34
> When an overall positive ABX result is recorded, it has, by design, passed the false-positive aspect of the test.

Nope. I've multiple times made a perfect score by just randomly clicking; given enough attempts or enough participants, somebody always will (see the first sketch at the end of this post). Now imagine what a spectrum analyzer does... An online ABX test only works if you have honest participants who not only refrain from cheating but also point out and accept problems with the test files (like the time offset in the AVS AIX test files).

> A positive overall result means you have statistically successfully identified the audible difference, i.e. your results are not false positives.

Nope. You really should read up on statistics again. A significant result only means that such a score would be unlikely under pure guessing; it does not rule out cheating, flawed test files, or plain luck across many participants.

> There really is no concern given to false negatives in these tests, i.e. how many of the trial results are due to the many, many reasons that people don't hear differences when real, measurable differences actually exist - these are false negatives. They can happen for all sorts of reasons.

What is a "real" difference? A measurable difference certainly does not mean that there is an audible difference anyway. And nope, there is concern given to false negatives, for example by including low anchors in the test files. But again, in an online test you can only assume that people try their best and also list the equipment they actually used. Even if they don't, false negatives still matter less than false positives, because we do not accept the null hypothesis anyway: a failed ABX never proves the absence of an audible difference. Again, read up on statistics.

> To get more technical: in forced-choice, binary classification tests, specificity and sensitivity are the necessary measures needed to judge the reliability and performance of the test. From Wikipedia: "Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as classification function. Sensitivity (also called the true positive rate, or the recall rate in some fields) measures the proportion of actual positives which are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition), and is complementary to the false negative rate. Specificity (sometimes called the true negative rate) measures the proportion of negatives which are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition), and is complementary to the false positive rate." In any valid test, false negatives and false positives should be given equal consideration. Most audio DBTs are almost solely focused on eliminating false positives - look at the recent changes to Foobar ABX. As a result, nobody has any handle on the error rate embedded in the results due to false negatives. Asking people to accept test results whose error rate is unknown is simply asking for their blind faith.

It certainly takes faith to accept (positive) results from demonstrably dishonest people. But that aside, it seems you are trivializing this - black/white thinking, as you did above. What would an online test look like where you can calculate specificity for each participant? I would really be interested in your answer (one hypothetical design is sketched at the end of this post). It's hard enough (= impossible) to get honest people to do the required number of trials and send in their results regardless of success.

> So, if you want to produce results that aren't based on blind belief, then include controls for false negatives and produce these stats along with the results.

Nope. No faith required. Where you need faith, or let's better call it gullibility, is with the (most of the time) positive, dishonest sighted listening tests.
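To put a number on the "randomly clicking" point above, here is a minimal sketch in plain Python (no particular ABX tool assumed). It computes the probability of a given score under pure guessing, and the probability that at least one of many independent guessers posts a perfect run:

```python
from math import comb

def p_value(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct answers in n trials by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# A "perfect" 8/8 score happens by chance in 1 of 256 attempts...
print(p_value(8, 8))  # 0.00390625

# ...so among, say, 200 guessers (or 200 retries by one guesser),
# at least one perfect score is more likely than not:
print(1 - (1 - p_value(8, 8)) ** 200)  # ~0.54
```

The 200 is an arbitrary illustrative number; the point is only that a low per-test false-positive rate says nothing once you allow unlimited, unreported attempts.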
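The false-negative side is just as easy to compute. A sketch, assuming a typical pass criterion of at least 12 correct out of 16 trials (actual tools may use different thresholds; under guessing this criterion has a one-sided p of about 0.038): a listener who genuinely hears the difference on 70% of trials still fails more often than not.

```python
from math import comb

def binom_cdf_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability that a listener with a true 70% per-trial detection rate
# scores 11/16 or worse, i.e. fails the 12-of-16 criterion:
print(binom_cdf_at_most(11, 16, 0.7))  # ~0.55
```

This is the sense in which a null result must not be read as "no audible difference": the test's power against a real but imperfect listener can be well under 50%.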
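As for what a per-participant specificity calculation could even look like: standard ABX has no "negative" response class (every answer is a forced choice between A and B), so one would have to switch to a same/different paradigm with identical-pair catch trials. The sketch below is purely hypothetical - the `Trial` structure and the trial counts are invented for illustration - and it says nothing about the real problem, which is getting honest participants to complete the required trials and report all results.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    pair_is_different: bool   # ground truth: were the two stimuli actually different?
    answered_different: bool  # participant's response

def sensitivity_specificity(trials: list[Trial]) -> tuple[float, float]:
    """Per-participant rates for a same/different test with identical-pair catch trials.

    sensitivity = different pairs correctly called different / all different pairs
    specificity = identical pairs correctly called same / all identical pairs
    """
    tp = sum(t.pair_is_different and t.answered_different for t in trials)
    fn = sum(t.pair_is_different and not t.answered_different for t in trials)
    tn = sum(not t.pair_is_different and not t.answered_different for t in trials)
    fp = sum(not t.pair_is_different and t.answered_different for t in trials)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical session: 12 genuinely different pairs, 8 identical catch pairs.
session = [Trial(True, True)] * 9 + [Trial(True, False)] * 3 \
        + [Trial(False, False)] * 6 + [Trial(False, True)] * 2
print(sensitivity_specificity(session))  # (0.75, 0.75)
```

Note the design choice: the catch trials exist precisely because in plain ABX the false-negative rate of an individual listener is unobservable, which is part of why "just report specificity" is easier demanded than done.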