History and Accreditation of ABX Testing/Invention?
Reply #63 – 2014-12-09 15:45:26
Quote:
I don't buy this "it's so difficult - how can anyone be expected to manage this" argument. I assume the context is that we're taking people who think sighted testing is OK, and are trying to get them to blind test? So they're coming from sighted tests where there's no easy way to listen to the same segment twice, no easy way to synchronise sources, sometimes no easy way to switch quickly at all, etc. Then you give them a double-blind test, solve all these problems, but it's suddenly too hard to hear a difference? Come on. We both know why it's too hard to hear the difference: no one is telling them what to expect any more. Boo Hoo.

Well, you need to "buy it," because I have concrete data to back what I said, contrary to your statement. When we had this very discussion and these tests on AVS Forum, Mark Henninger, who is an AVS writer and influential member, repeatedly ridiculed the idea that anyone could pass these tests. He proceeded to post results showing random outcomes:

Quote:
In these tests, I tried to increase the number of iterations in order to decrease the margin of error.

PC -> Optical S/PDIF -> Pioneer Elite SC-55 -> Sony MDR-1R

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/15 08:40:55
File A: E:\AVS\Foobar ABX\Jangling Keys\keys jangling band resolution limited 4416 2496.wav
File B: E:\AVS\Foobar ABX\Jangling Keys\keys jangling full band 2496.wav
08:40:55 : Test started.
08:41:50 : 01/01 50.0%
08:42:08 : 01/02 75.0%
[...]
08:47:00 : 13/29 77.1%
08:47:07 : 13/30 81.9%
08:47:13 : Test finished.
----------
Total: 13/30 (81.9%)

and...

foo_abx 1.3.4 report
foobar2000 v1.3.2
2014/07/15 09:14:40
File A: E:\AVS\Foobar ABX\Jangling Keys\keys jangling band resolution limited 3216 2496.wav
File B: E:\AVS\Foobar ABX\Jangling Keys\keys jangling full band 2496.wav
09:14:40 : Test started.
09:15:33 : 00/01 100.0%
09:15:45 : 00/02 100.0%
09:16:20 : 01/03 87.5%
[...]
09:22:30 : 11/29 93.2%
09:22:45 : 11/30 95.1%
09:22:48 : Test finished.
----------
Total: 11/30 (95.1%)
-- Mark Henninger

After a bunch of back and forth, and after seeing my results and technique for passing such tests, he posted this remarkable outcome: http://www.avsforum.com/forum/91-audio-the...ml#post25871786

Quote:
Laptop? Practice? Well, I decided to give my laptop a try since Amir did so well using his. Lo and behold, I had little difficulty with the 16/32 key jangling test. Not quite perfect, but I suspect a bit more practice would get me up to perfect. My laptop is a Sony Vaio PCG-41412L with the HD upgraded to an SSD. All audio enhancements are off. I used a pair of Sony MDR-1R headphones. The results speak for themselves; I found a critical segment that revealed an audible difference. I've had some practice, which helped, just as Amir suggested. Now I can pass an ABX test I previously failed. I'll tackle the 16/44 test next. Oh, and it was a piece of cake to pick out the differences in the 16/16 and 22/16 tests.

foo_abx 1.3.4 report
foobar2000 v1.3.3
2014/07/19 11:26:49
File A: C:\Users\mark_000\Downloads\keys jangling band resolution limited 3216 2496.wav
File B: C:\Users\mark_000\Downloads\keys jangling full band 2496.wav
11:26:49 : Test started.
11:27:29 : 00/01 100.0%
11:28:58 : 00/02 100.0%
11:29:46 : 00/03 100.0%
11:29:59 : 01/04 93.8%
11:30:06 : 01/05 96.9%
11:30:16 : 02/06 89.1%
11:30:26 : 03/07 77.3%
11:30:34 : 04/08 63.7%
11:30:45 : 05/09 50.0%
11:31:00 : 06/10 37.7%
11:31:10 : 07/11 27.4%
11:31:29 : 08/12 19.4%
11:31:41 : 09/13 13.3%
11:32:05 : 10/14 9.0%
11:32:20 : 10/15 15.1%
11:32:30 : 11/16 10.5%
11:32:41 : 12/17 7.2%
11:32:52 : 13/18 4.8%
11:33:07 : 13/19 8.4%
11:33:16 : 14/20 5.8%
11:33:28 : 15/21 3.9%
11:33:40 : 16/22 2.6%
11:33:58 : 17/23 1.7%
11:34:12 : 18/24 1.1%
11:34:25 : Test finished.
----------
Total: 18/24 (1.1%)
-- Mark Henninger

He went from being a casual listener with extreme bias as to the outcome of the test to a more critical listener and neutral experimenter.
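An aside on the percentages in those logs: the figure foo_abx prints is the probability of scoring at least that many correct answers by pure guessing, i.e. a one-sided binomial p-value with p = 0.5 per trial. A minimal sketch in Python (the function name is my own illustration, not part of foo_abx):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` of `trials` ABX answers
    right by guessing alone (each trial is a fair coin flip)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# The totals quoted above:
print(round(abx_p_value(13, 30) * 100, 1))  # 81.9 -- indistinguishable from guessing
print(round(abx_p_value(18, 24) * 100, 1))  # 1.1  -- very unlikely to be chance
```

This is why 13/30 reads as a null result while 18/24 is significant at roughly the 1% level: the question is never the raw hit rate, but how improbable that hit rate would be for a guesser.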
That was all it took for the outcome to reverse.

So no, we are not discussing the people who do sighted tests. They couldn't care less about double-blind tests of any kind. We are discussing people who shout from the mountaintop about double-blind tests, yet who:

1. Hardly participate in any of them.
2. Defend poor implementations of tools.
3. Lack the professional experience to know and understand how users behave in such tests.
4. Lack the professional experience to know how to create tests that show small differences.
5. Routinely violate best practices in the industry, as JJ mentioned and as I have said until I am blue in the face.

The problem is "us." We say we believe in double-blind testing yet demonstrate little interest in properly conducting such tests. Our excuse? Oh, the other guy uses sighted tests. What does that have to do with what we do? Let's set the best example here. Let's make the tools and experiments as good as we can make them. Let's all participate and run these tests, instead of accusing each other of not knowing how to run a test like this by comparing a track to silence. If we are not willing to reform ourselves, then let's not remotely go to the place where we say we want to reform others.