Unexpected results from ABX test

I have just completed an ABX test between a ripped flac file and a 320kbps ogg vorbis file derived directly from it. I ripped my own CD to flac and converted the test file to ogg vorbis using the converter within foobar2000. I did not expect to be able to discern any difference, and indeed I could not. I got bored with the process very quickly and guessed my way to the end of the trial. I used very modest Sennheiser HD202 headphones plugged into the PC to conduct the test.

If you examine the time codes in the log file you should be able to deduce that I spent most of the time at the start of the trial setting up a start and end time, and then listening to A and then B over the selected section of the track. I confess I am not a trained listener, and generally if I cannot easily discern any difference I quickly move on. I am quite happy with my Spotify Premium subscription, which I understand is 320kbps ogg vorbis quality when 'extreme' quality is selected in the user settings.

So in the ABX test I scored 19/24 correct 'guesses'. The log shows a 0.3% chance that I was guessing, but as explained above I was most certainly guessing. You will have to take my word for it that this was the first and only test comparing these two tracks.
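For reference, the 0.3% in the log is consistent with a one-sided binomial tail: the chance of scoring 19 or more out of 24 when every answer is a fair coin flip. The sketch below assumes that is how the figure is derived (the function name is made up for illustration; it is not taken from foo_abx).

```python
from math import comb

def p_value_guessing(correct: int, trials: int) -> float:
    """P(score >= correct) when every trial is an independent 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Score from the log above: 19 correct out of 24.
print(f"{p_value_guessing(19, 24):.1%}")  # 0.3%
```

So roughly 1 in 300 pure-guessing runs of 24 trials would score this well or better. That is rare, but not impossible.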

My position is that although the test does not see through my guessing, I still believe that it would be very difficult to positively prove listening differences that were not present. It is just a surprise that my results were not nearer 12/24.

I reproduce the log file below, and if desired I can place the compared files in my dropbox for a short period (although I do not wish to breach any copyright restrictions).

I would welcome any comments about my results.

foo_abx 2.0.4 report
foobar2000 v1.3.17
2018-01-04 12:04:06

File A: 206 5th mvt- Allegro ordinario (Tempo I).flac
SHA1: e7d581d6c2513e471db04a083887eb6dd0d8a0a0
File B: 206 5th mvt- Allegro ordinario (Tempo I).ogg
SHA1: 60f5d3a412540b371dc4d1523765431ad096e0e9

Output:
WASAPI (event) : Speakers (Realtek High Definition Audio), 16-bit
Crossfading: NO

12:04:06 : Test started.
12:13:30 : 01/01
12:13:47 : 02/02
12:14:02 : 03/03
12:14:12 : 04/04
12:14:15 : 05/05
12:14:17 : 05/06
12:14:19 : 06/07
12:14:21 : 07/08
12:14:33 : 07/09
12:14:37 : 08/10
12:14:40 : 09/11
12:14:42 : 10/12
12:14:45 : 10/13
12:14:46 : 11/14
12:14:48 : 11/15
12:14:50 : 12/16
12:14:51 : 13/17
12:14:53 : 13/18
12:14:55 : 14/19
12:14:56 : 15/20
12:14:58 : 16/21
12:15:00 : 17/22
12:15:01 : 18/23
12:15:03 : 19/24
12:15:03 : Test finished.

 ----------
Total: 19/24
Probability that you were guessing: 0.3%

 -- signature --
b5ab8581c0b4b9d96c0ab983d1af2434bfd85352


Re: Unexpected results from ABX test

Reply #2
Quote
My position is that although the test does not see through my guessing, I still believe that it would be very difficult to positively prove listening differences that were not present.
Reminds me of little Johnny's contention about ABX "false positives" and Arny resorting to diabolical cheating by merely guessing.
He posts as mmerrill99 and lord knows who else on other fora these days.
As greynol suggested, rinse and repeat
Loudspeaker manufacturer

Re: Unexpected results from ABX test

Reply #3
Quote
So in the ABX test I scored 19/24 correct 'guesses'. The log shows a 0.3% chance that I was guessing, but as explained above I was most certainly guessing. You will have to take my word for it that this was the first and only test comparing these two tracks.

The statistics you are quoting are only accurate if you fix the number of trials before doing the first test. If you set out to do exactly 24 trials and guessed every time, then you'd have a less than 1% chance of scoring 19/24 or better. If you just guess for a while and stop when you are ahead, you can easily get relatively high ratios just from a coin flip.

For a related idea, see: https://en.m.wikipedia.org/wiki/Gambler%27s_ruin
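To put a rough number on the stopping effect, here is a minimal sketch that compares two guessers exactly (no simulation). One always completes 24 trials. The other checks the running score after every trial and stops as soon as the one-sided p-value is at or below 5%. The stopping policy, the 5% threshold and the function names are assumptions for illustration only.

```python
from math import comb

def tail(correct: int, trials: int) -> float:
    """P(score >= correct) for a pure guesser."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def prob_significant(max_trials: int, alpha: float = 0.05, peek: bool = True) -> float:
    """Chance that a pure guesser ends with p <= alpha.

    peek=True : check after every trial and stop at the first 'significant' score.
    peek=False: check only once, after max_trials.
    """
    dist = {0: 1.0}   # dist[c] = probability of c correct so far, not yet stopped
    stopped = 0.0
    for n in range(1, max_trials + 1):
        nxt = {}
        for c, p in dist.items():
            for c2 in (c, c + 1):          # wrong guess or right guess, 50/50
                nxt[c2] = nxt.get(c2, 0.0) + p / 2
        dist = nxt
        if peek or n == max_trials:
            for c in list(dist):
                if tail(c, n) <= alpha:
                    stopped += dist.pop(c)  # this path stops here and reports
    return stopped

fixed = prob_significant(24, peek=False)
peeking = prob_significant(24, peek=True)
print(f"fixed 24 trials: {fixed:.1%}, stop when ahead: {peeking:.1%}")
```

The fixed-length figure is about 3.2% (17/24 or better). The stop-when-ahead figure comes out higher, because every extra look at the score is another chance to get lucky.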

Re: Unexpected results from ABX test

Reply #4
If one is really 'guessing' without bias, some trials will come out as 24/24 and some as 0/24 and others anywhere in between. 50% is only an average over a large number of random guesses, even if the majority usually come out near there.
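As a rough illustration of how the scores of an unbiased guesser spread out over 24 trials, the sketch below computes the exact binomial probabilities (function name hypothetical).

```python
from math import comb

def prob_exact_score(correct: int, trials: int) -> float:
    """P(score == correct) for a pure guesser."""
    return comb(trials, correct) / 2 ** trials

n = 24
print(f"exactly 12/24: {prob_exact_score(12, n):.1%}")
print(f"10 to 14 / 24: {sum(prob_exact_score(k, n) for k in range(10, 15)):.1%}")
print(f"24/24        : {prob_exact_score(24, n):.1e}")
```

Only about one run in six lands exactly on 12/24, roughly two in three land between 10 and 14, and a perfect 24/24 by guessing is about a 1 in 17 million event.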

Re: Unexpected results from ABX test

Reply #5
Quote
The statistics you are quoting are only accurate if you fix the number of trials before doing the first test. If you set out to do exactly 24 trials and guessed every time, then you'd have a less than 1% chance. If you just guess for a while and stop when you are ahead, you can easily get relatively high ratios just from a coin flip.
This is a reason to use Bayesian inference instead. Using classical methods like p-values it is possible to bias results by stopping at a convenient time. However, the "stopping rule principle" means that in a Bayesian analysis the reason for stopping can be safely ignored. With the current system, the number of trials must be specified in advance or cheating is indeed possible.
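One simple way such an analysis could look is a Bayes factor comparing "pure guessing" (hit rate exactly 0.5) against "some unknown hit rate". The sketch below assumes a uniform prior on the hit rate between 0 and 1. That prior is my choice for illustration, not something prescribed in this thread, and a different prior would give a different number.

```python
from math import comb

def bayes_factor(correct: int, trials: int) -> float:
    """Evidence for 'unknown hit rate' over 'pure guessing' (values > 1 favour ability)."""
    # Likelihood of the score if every trial is a 50/50 guess.
    like_guessing = comb(trials, correct) / 2 ** trials
    # Binomial likelihood averaged over a uniform prior on the hit rate:
    # every score from 0 to `trials` is then equally likely.
    like_unknown_rate = 1 / (trials + 1)
    return like_unknown_rate / like_guessing

print(round(bayes_factor(19, 24), 1))  # 19/24 as in the original post
print(round(bayes_factor(12, 24), 2))  # a dead-average score, for comparison
```

Under these assumptions the 19/24 score gives a factor of about 16 in favour of a real ability, which still has to be weighed against how plausible that ability was beforehand.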

Neither method deals with the problem of undisclosed results. Someone could easily do four sets of 16 trials and get relatively normal results but only publish the fifth set which was unusual.
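A back-of-envelope sketch of the selective-reporting problem: if a pure guesser runs several independent sets of 16 trials, how likely is it that at least one set looks "significant" at the 5% level? The 5% threshold and the function names are assumptions for illustration.

```python
from math import comb

def tail(correct: int, trials: int) -> float:
    """P(score >= correct) for a pure guesser."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_score_for(alpha: float, trials: int) -> int:
    """Smallest score whose one-sided p-value is at or below alpha."""
    return next(k for k in range(trials + 1) if tail(k, trials) <= alpha)

def chance_of_a_lucky_set(sets: int, trials: int, alpha: float = 0.05) -> float:
    """P(at least one of `sets` independent guessing runs reaches p <= alpha)."""
    p_single = tail(min_score_for(alpha, trials), trials)
    return 1 - (1 - p_single) ** sets

print(min_score_for(0.05, 16))                  # 12, i.e. 12/16 or better
print(f"{chance_of_a_lucky_set(1, 16):.1%}")    # one set
print(f"{chance_of_a_lucky_set(5, 16):.1%}")    # best of five sets
```

A single set reaches 12/16 by luck a bit under 4% of the time. With five sets to choose from, the chance that at least one does is closer to 18%.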

Re: Unexpected results from ABX test

Reply #6
Quote
So in the ABX test I scored 19/24 correct 'guesses'. The log shows a 0.3% chance that I was guessing, but as explained above I was most certainly guessing.
Welcome to the wonderful world of probability and statistics!  :D  

IMO, if there's a significant "quality difference" you should get it right 24/24 (or 100/100). If you can't hear the "defect" every single time, it's probably not worth worrying about, and you're probably not going to hear it in everyday listening when you don't have "A" to compare it to. Or if somebody says "MP3 sounds terrible", they'd better get it right every single time!

But, there's always some probability of guessing them all correctly. 

Re: Unexpected results from ABX test

Reply #7
I tend to think that if you have to really concentrate to try and discern differences then that itself says something even in cases where a difference can be discerned. Sometimes there is a discernible difference but one which is pretty much irrelevant for most people.

Re: Unexpected results from ABX test

Reply #8
Thanks for the responses. I will have a go at the comparison again at some stage and if I have to resort to guessing my way to the end of the trial again I will expect results nearer 8/16.

Unexpected results such as mine could be offered up as indicative of a real difference, especially if someone was eager to hear a difference. In my case the time gaps between the results, of just 1 or 2 seconds, were just long enough to vote without actually listening at all, so that gives my guessing game away.
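For anyone who wants to check a log for this kind of give-away, here is a minimal sketch that pulls the per-trial timestamps out of a foo_abx-style report and lists the seconds between consecutive votes. It assumes the "HH:MM:SS : NN/NN" line format shown in the log above and a test that does not run past midnight. The function name and the 3-second cut-off are made up for illustration.

```python
import re
from datetime import datetime

# Matches scored-trial lines such as "12:14:15 : 05/05".
TRIAL_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) : (\d+)/(\d+)$")

def trial_gaps(log_text: str) -> list[tuple[int, int]]:
    """Return (trial_number, seconds_since_previous_trial) for each trial after the first."""
    stamps = []
    for line in log_text.splitlines():
        m = TRIAL_LINE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%H:%M:%S")
            stamps.append((int(m.group(3)), when))
    return [
        (trial, int((when - prev_when).total_seconds()))
        for (_, prev_when), (trial, when) in zip(stamps, stamps[1:])
    ]

# Four lines copied from the log in the first post.
sample = """12:14:12 : 04/04
12:14:15 : 05/05
12:14:17 : 05/06
12:14:19 : 06/07"""

gaps = trial_gaps(sample)
print(gaps)                                   # [(5, 3), (6, 2), (7, 2)]
print([trial for trial, secs in gaps if secs <= 3])  # trials answered within 3 seconds
```

Gaps this short leave no time to play A, B and X, so they are a reasonable hint that the votes were guesses.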

I find that if any difference is not easily discernible, it is difficult to stay focused and listen carefully without resorting to plain guessing.

Re: Unexpected results from ABX test

Reply #9
A while ago I posted a string of ABX tests (here: https://hydrogenaud.io/index.php/topic,113949.0.html), and although my score wasn't quite as good as yours, I did score 9/12 when just pressing A or B randomly. I then did the exact same thing 20 more times, but never got as good a result as this.
So like others have said, statistically speaking, there will be 24/24 if you try long enough. Derren Brown (who you should look up if you don't know him, as his TV shows are amazing) flipped a coin and got 10 heads out of 10 eventually.
"What is asserted without evidence can be dismissed without evidence"
- Christopher Hitchens
"It is always more difficult to fight against faith than against knowledge"
- Sam Harris

Re: Unexpected results from ABX test

Reply #10
If we always got the expected result, we wouldn't have to test anything. 

Re: Unexpected results from ABX test

Reply #11
That's why people have hopes in all kinds of gambling because the chance is not zero.
However, it is not an excuse to dismiss the importance of blind tests.

For those who think retesting similar things is a waste of time: those lengthy flame wars, circular reasoning and unlimited excuses to avoid blind tests in this and other forums are not only time-wasting, but also disgusting.

I'd be happy if HA adopted the Bayes approach as the new standard though, but of course, it is useful only if people are honest, or the test design is hard to cheat. For this reason I am also interested in learning spectrogram faking techniques :))

 

Re: Unexpected results from ABX test

Reply #12
Quote
Neither method deals with the problem of undisclosed results. Someone could easily do four sets of 16 trials and get relatively normal results but only publish the fifth set which was unusual.

Or, assuming "way more good faith" and yet getting the same problem:
Four people would get the normal result, which in their honest opinion is nothing to write forum posts about. Then every now and then someone will, by chance, get a strange score. And a certain fraction of them would by chance get it twice.

By the way, on the Bayesian approach:
It does not seem controversial to advise the OP to try the test again, in order to find out whether we can write it off as chance.
But that advice is a (rough!) stopping rule.