## What does "probability that you were guessing" mean?

##### 2015-09-24 18:22:32
Hi guys, I'm new to audio decoding and ABX testing. So what does it mean? Does a bigger value mean better hearing, or vice versa? I just did a test between a 320 kbps MP3 and a 320 kbps AAC. Here is the result. Is it good or bad?

foo_abx 2.0.1 report
foobar2000 v1.3.9 beta 3
2015-09-25 00:45:34

File A: 01 - &Z .mp3
File B: 01 - &Z .m4a
SHA1: 9a87f5cd6634ba3a2060046b8ae4ffb938713f6e

Output:
DS : Primary Sound Driver

00:45:34 : Test started.
00:46:35 : 01/01
00:47:15 : 02/02
00:48:14 : 03/03
00:49:17 : 04/04
00:50:04 : 05/05
00:51:05 : 05/06
00:53:09 : 06/07
00:54:28 : 06/08
00:59:35 : 07/09
01:00:23 : 08/10
01:00:58 : 09/11
01:01:58 : 09/12
01:02:25 : 09/13
01:02:25 : Test finished.

----------
Total: 9/13
Probability that you were guessing: 13.3%

-- signature --
5a7c986d6c1f8d882e1ef78f193de0ab3d198bf3
I will think about tomorrow's problem tomorrow

##### Reply #1 – 2015-09-24 18:26:50
Did you set out to do exactly 13 trials in advance? Because you must fix the number of trials before doing the first one or else the probability will be wrong.

A high probability of guessing means you failed the test.

##### Reply #2 – 2015-09-24 18:37:06
Yeah, I set it to 13 because 16 is too much for me.
So does that mean the test was a success?

##### Reply #3 – 2015-09-24 18:43:33
Quote
Yeah, I set it to 13 because 16 is too much for me.
So does that mean the test was a success?

You succeeded in performing the test correctly, but there is insufficient evidence that you are able to tell them apart. We would look for a probability of 5% or less.

By the way, if you had been watching your results and quit when you got 09/11, your result would have looked a lot better (9/11 works out to about 3.3%), but the test would have been invalid because you did not do the number of trials you set out to do (and you would have been cheating by looking at the results while performing the test).
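
To put a rough number on why stopping early invalidates the test, here is a minimal Python sketch. It assumes the reported figure is the chance that fair coin flips score at least as well (as described elsewhere in this thread), and it walks through every possible sequence of 13 guesses to count how often a pure guesser would see 5% or less at some point if allowed to peek after each trial. The function names are made up for illustration and are not part of foo_abx.

```python
from itertools import product
from math import comb


def tail_probability(correct: int, trials: int) -> float:
    """Chance that fair coin flips give `correct` or more hits in `trials`."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def chance_guesser_passes_by_peeking(max_trials: int, alpha: float = 0.05) -> float:
    """Exact chance that a pure guesser sees a value <= alpha at some point
    when checking the running score after every trial and stopping there."""
    passing = 0
    for outcome in product((0, 1), repeat=max_trials):  # every guess sequence
        correct = 0
        for trials_done, hit in enumerate(outcome, start=1):
            correct += hit
            if tail_probability(correct, trials_done) <= alpha:
                passing += 1  # this guesser would have stopped here and "passed"
                break
    return passing / 2 ** max_trials


# With all 13 trials run, a guesser needs 10/13 or better to get under 5%.
fixed = tail_probability(10, 13)
peeking = chance_guesser_passes_by_peeking(13)

print(f"9/11 on its own: {tail_probability(9, 11):.1%}")
print(f"guesser passes with 13 fixed trials: {fixed:.1%}")
print(f"guesser passes when allowed to stop early: {peeking:.1%}")
```

The last figure comes out higher than the 5% cut-off, which is the point: the threshold only means what it says when the number of trials is fixed before the first one.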

##### Reply #4 – 2015-09-24 19:03:43
Consider letting the computer do this test. It just randomly chooses A or B and will on average get half of them right, but sometimes it will get 9/13, 10/13 ... or even 13/13 right.

The "probability of guessing" tells you exactly that.
13.3% means that if we repeated the test with random guessing 100 times, on average 13.3 of those runs would score 9/13 or better.
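
The 13.3% figure can be reproduced by hand. The sketch below assumes the value is the upper tail of a fair-coin binomial distribution, which matches the "9/13 or better" description above. The helper names are illustrative only and do not come from foo_abx.

```python
from math import comb
from typing import Optional


def probability_of_guessing(correct: int, trials: int) -> float:
    """Chance that random guessing scores `correct` or better out of `trials`."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials  # 2**trials equally likely guess sequences


def min_correct_to_pass(trials: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest score whose tail probability is <= alpha, or None if no score is."""
    for correct in range(trials + 1):
        if probability_of_guessing(correct, trials) <= alpha:
            return correct
    return None


print(f"{probability_of_guessing(9, 13):.1%}")  # 13.3%
print(min_correct_to_pass(13))                  # 10
```

For 13 trials there are 8192 equally likely guess sequences, and 1093 of them contain nine or more correct answers, hence 13.3%. With the usual 5% cut-off, 10/13 would have been the lowest passing score.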

We generally accept 5%, so to put it simply: score lower than 5% and you've passed.
"I hear it when I see it."

##### Reply #5 – 2015-09-24 19:46:07
Quote
So what does it mean?

Not what it says, unfortunately. It is not the probability that you were guessing, it is the probability that guessing would get you a result as good as this or better.

So what it is doing is comparing your performance to repeated coin tossing. If you cannot demonstrate with confidence that you outdo the coin, then you should not claim you can identify the difference.
High Voltage socket-nose-avatar

##### Reply #7 – 2015-09-25 00:34:36
@AndyH-ha: Well, nobody here is saying it is a definite test. But given a limited number of true/false trials that one person does, I don't think you can do any better. I think most of us do weigh the results based on the claim. A claim regarding audibility between an uncompressed vs. losslessly compressed file is very different from a low bitrate lossy file vs. the original.

Quote
He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look.

I tend to say worthy of investigation, if the hypothesis is "interesting" that is.

##### Reply #8 – 2015-09-25 15:31:36
"Probability that you were guessing" is misleading; I'd say "Probability that you were just lucky" or just "P-value". Suppose you can't distinguish anything: you can still sometimes get your score (9/13) or a better one (10/13 or more) by pure chance. The p-value is the probability of that coincidence (13.3%).

##### Reply #9 – 2015-09-25 15:37:44
You must also take scores like 1/16 into account.
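
One reading of this point: a very low score such as 1/16 is not "lucky" in the usual sense, yet the one-sided figure still has a value for it. The sketch below assumes a fair-coin binomial model and compares both tails for that score. The helper names are illustrative, and the thread does not show what foo_abx itself prints for such a result.

```python
from math import comb


def at_least(correct: int, trials: int) -> float:
    """Chance that coin flips score `correct` or better (the upper tail)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def at_most(correct: int, trials: int) -> float:
    """Chance that coin flips score `correct` or worse (the lower tail)."""
    return sum(comb(trials, k) for k in range(0, correct + 1)) / 2 ** trials


print(f"1/16 or better by coin flips: {at_least(1, 16):.3%}")
print(f"1/16 or worse by coin flips:  {at_most(1, 16):.3%}")
```

The upper tail is close to 100%, because almost any sequence of coin flips does at least that well, while the lower tail is well under 1%. A score that bad is itself very unlikely to come from guessing, which is why wording built around "luck" gets awkward at the low end.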

Probability that you can do the same or better by simply guessing (or flipping a fair coin) is the best description so far.
Is 24-bit/192kHz good enough for your lo-fi vinyl, or do you need 32/384?

##### Reply #10 – 2015-09-25 16:32:39
The discussion on the wording comes up every now and again, but I think Peter kept the old wording because it is concise and only slightly "wrong".
It's only audiophile if it's inconvenient.

##### Reply #11 – 2015-09-25 17:34:54
As I see it, using precise wording starts becoming overly tedious.

Case in point, I now want to amend "simply" in my above post to "randomly." As it has been discussed in the past, you may actually be hearing a difference, even though you think you're just guessing. The fair coin removes that problem, but what happens if there are more than two choices, or does foobar yet permit more than two choices?

##### Reply #12 – 2015-09-25 17:43:26
The word "guessing" implies some insight, when it is no smarter than flipping a coin. It is certainly confusing for newcomers.

##### Reply #13 – 2015-09-25 18:03:28
"Guessing" is not the problem; I understand it as random guessing, not educated guessing. The problem is implying that we calculate the probability of you guessing.

You can achieve any result by randomly clicking buttons, and the probability of you guessing would have to be 100% in each such test, but that is not what is calculated.

##### Reply #14 – 2015-09-25 18:11:58
Still concise(?) and not even slightly wrong:

Probability that you can do the same or better by flipping a coin: 5%

##### Reply #15 – 2015-09-25 18:40:04
I've always assumed it means: Likelihood your success was simply due to dumb luck.

##### Reply #16 – 2015-09-25 18:46:51
I've always thought of it as the inverse of how seriously to take the results.

##### Reply #17 – 2015-09-25 20:44:46
Quote
I've always assumed it means: Likelihood your success was simply due to dumb luck.

If by dumb luck you mean either good luck or bad luck then that should get the job done, even if it isn't technically accurate.

##### Reply #18 – 2015-09-25 21:11:58
To fix the ambiguity of luck meaning either just good or good/bad, how about the words "random chance" instead:

Likelihood your success was simply due to random chance.

##### Reply #19 – 2015-09-25 21:25:22
One more refinement:
"success was" -> "successes were" in the event you got at least 2 correct. "Success" won't work if all your results were wrong.

##### Reply #20 – 2015-09-25 21:49:50
Would anyone mind if I removed the silly "!" topic icon?  The OP's username should raise questions about whether this discussion is urgent.

##### Reply #21 – 2015-09-25 21:50:10
I'd be baffled as to what "successes" meant and would think "I only took this test once. What do they mean by successes, plural?"

"Likelihood your results were simply due to random chance." or "Likelihood your score was simply due to random chance."

##### Reply #22 – 2015-09-25 22:01:17
I still much prefer (but hey, it ain't my call):

"Probability that you can do the same or better by flipping a coin:"

It makes perfect sense for all possible scores (0/N, 1/N ... N/N).

...because that is essentially the definition of the value...expressed in lay terms.

##### Reply #23 – 2015-09-25 23:19:27
Likelihood this score level or higher* was actually attained simply due to random chance.

The word "actually" could be omitted; however, I think it is helpful to newbs who might go into this thinking they are dead sure their score proves they can hear a difference. It gently explains to them that they need to be open-minded to the possibility they are wrong.

*("or even more extreme than this" would be technically better but sounds too awkward)