## Topic: Minimum number of required ABX trials

• saratoga
##### Reply #25 – 11 January, 2012, 09:37:18 PM
Before posting like that, please re-read my posts and you will see that I refer to this issue as credibility, Mister Wisdom himself.

I'm having a lot of trouble understanding your writing, but you said this:

5/5 (5 correct tries of total 5 tries) will give p=0.03125 (~3%) (p<0.05). -> Excellent result!

I'm pointing out to you that this is not true in general, for the reasons explained in the link you provided. And in this particular case, it's absolutely wrong. Since you did not retract that above, I assume you still believe it to be true. Could you explain why?
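
For reference, the 3% figure quoted above is the one-sided binomial p-value of a single ABX test, i.e. the chance of scoring at least that well by guessing. A minimal Python sketch, assuming every trial is an independent 50/50 guess under the null hypothesis (the function name `abx_p_value` is made up for illustration):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the guess sequences that score `correct` or better, out of 2**trials.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

print(abx_p_value(5, 5))    # 0.03125
print(abx_p_value(12, 16))  # about 0.038
```

This number describes one test taken in isolation. It says nothing about how many tests were run to get it, which is the point in dispute here.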

We can play this game a long time.  Why?
I can ask you exactly the same question: why not?

I've actually already provided you with an explanation in my own words, and a citation from your own link also explaining why. It's up to you to explain why I and your own link are wrong.

We're mixing two different things: credibility and statistics.

I would say credibility at a minimum involves either supporting your positions or admitting you are wrong.

• IgorC
##### Reply #26 – 11 January, 2012, 09:55:47 PM
A lot of discussion here, but it won't change the disagreement. We can continue to play "now catch me".
I'm done discussing '5/5', as I consider that I have explained my point clearly, which is: 5/5 is more than acceptable (considering credibility as well as other factors).

• sauvage78
##### Reply #27 – 11 January, 2012, 10:01:41 PM
Even if I think 5 trials is too low to convince others & agree with greynol & co, I fear IgorC is speaking to a wall due to lack of real life ABXing practice from his contradictors.

If the ABXer knows exactly what artefact he is listening for, 5 trials is more than enough to convince himself. The problem only starts if he tries to convince others. 5 trials might not be enough by itself (without knowing the ABXer).

Because the number of trials necessary to convince others could be arbitrary depending on who you try to convince, you have to rely on statistics. But statistics are not perfect.

Maths & statistics are a common language for us to understand each other. Because ABX tests are made by humans, despite maths & statistics you cannot draw an absolute TRUTH from them.

ABXing is not pure maths; you wouldn't need humans if it was.

Edit:
Blindly trusting in the fact that the statistics behind ABXing would provide a universal scientific truth is a placebo effect by itself & it completely defeats the original purpose of ABXing. Instead of comparing it to flipping a coin, you should compare it to playing poker ... ABXing is here to provide you with a pair of Aces ... but even with a high probability of winning, you can lose with a pair of Aces ...

The only moment you know that an ABX test is 100% correct is when you have double tested it & your results agree. Any other claim about quality is a statistical approximation, no matter how close the approximation is.
Rip & Check: EAC Secure [Low/C2]+CUETools [AR Confidence 2+]
NAS (Backup): Flac -4 (for Speed) | CDImage+CUE with F2K
DAP (Playback): Opus 128Kbps | Tracks with F2KM on Android (LG G5)
Video: VP10 (2160p30@24Mbps/2160p60@48Mbps) Asap !!!

• saratoga
##### Reply #28 – 11 January, 2012, 10:06:09 PM
A lot of discussion here, but it won't change the disagreement. We can continue to play "now catch me".
I'm done discussing '5/5', as I consider that I have explained my point clearly, which is: 5/5 is more than acceptable (considering credibility as well as other factors).

If you're not willing to explain why you believe this so that I can help you, I'll just leave you with your link's explanation of why this is not true in general:

Quote
Here's another example where numbers can fool us. If we test 20 cables, one by one, in order to know if they have an effect on the sound, and if we consider that p < 0.05 is a success, then in the case where no cable have any actual effect on the sound, since we run 20 tests, we should all the same expect in average one accidental success among the 20 tests ! In this case we can absolutely not tell that the cable affects the sound with a probability of 95%, even while p is inferior to 5 %, since anyway, this success was expected. The test failed, that's all.
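
The arithmetic behind the quoted cable example can be checked in a few lines. This is only a sketch: it assumes the 20 tests are independent and that each has exactly a 5% chance of passing by luck, and the function name is invented for illustration.

```python
def chance_of_accidental_success(tests: int, alpha: float) -> float:
    """P(at least one test passes by luck) when no test has any real effect."""
    return 1.0 - (1.0 - alpha) ** tests

tests, alpha = 20, 0.05
print(tests * alpha)                               # expected accidental successes: 1.0
print(chance_of_accidental_success(tests, alpha))  # about 0.64
```

So under those assumptions, one "success" among 20 tests is what guessing alone would be expected to produce.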

For someone so concerned about credibility, you're oddly unwilling to swallow your own pride to save it.

• greynol
• Global Moderator
##### Reply #29 – 11 January, 2012, 10:13:34 PM
IgorC is speaking to a wall due to lack of real life ABXing practice from his contradictors.

Are you really trying to tell us that IgorC's detractors have no ABX experience?  Simply reading posts by these people will put that myth to rest.

Blindly trusting in the fact that the statistics behind ABXing...

I don't see this taking place here.
13 February 2016: The world was blessed with the passing of a truly vile and wretched person.

• saratoga
##### Reply #30 – 11 January, 2012, 10:13:50 PM
Even if I think 5 trials is too low to convince others & agree with greynol & co, I fear IgorC is speaking to a wall due to lack of real life ABXing practice from his contradictors.

Are you claiming that I (or greynol) have not performed ABX tests?

If the ABXer knows exactly what artefact he is listening for, 5 trials is more than enough to convince himself. The problem only starts if he tries to convince others. 5 trials might not be enough by itself (without knowing the ABXer).

5 trials is enough IMO for a single test, but I would not be convinced if I did 5 trials across multiple samples and compared the results.

Because the number of trials necessary to convince others could be arbitrary depending on who you try to convince, you have to rely on statistics. But statistics are not perfect.

Maths & statistics are a common language for us to understand each other. Because ABX tests are made by humans, despite maths & statistics you cannot draw an absolute TRUTH from them.

Who is trying to draw an absolute truth here? As I said before, there is better than a 1 in 4 chance of getting a result like this flipping a coin. You really find p < 0.30 all that convincing?
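
The "better than 1 in 4" figure can be sanity-checked with a quick coin-flip simulation. The sketch below assumes the session consisted of 10 samples with 5 trials each (the sample count is taken from a later post in this thread, so treat it as an assumption) and asks how often pure guessing produces at least one perfect 5/5. The function name, run count, and seed are arbitrary choices.

```python
import random

def simulate_lucky_perfect_score(samples=10, trials=5, runs=20_000, seed=1):
    """Fraction of simulated sessions where guessing yields at least one perfect score."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # A sample counts as "passed" only if every one of its trials is guessed right.
        if any(all(rng.random() < 0.5 for _ in range(trials)) for _ in range(samples)):
            hits += 1
    return hits / runs

estimate = simulate_lucky_perfect_score()
exact = 1 - (1 - 0.5 ** 5) ** 10   # closed form for the same question, about 0.27
print(f"simulated {estimate:.3f}, exact {exact:.3f}")
```

The simulated fraction should land close to the closed-form value, which is where a figure like p < 0.30 comes from.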

• IgorC
##### Reply #31 – 11 January, 2012, 10:46:10 PM
I think I understand what sauvage78 wants to say. It's not that you (saratoga and greynol) haven't done ABX tests, but rather that in real life the claimant will post wrong ABX results or, in the worst case, will cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.

I prefer the game "let's figure it out together" to "you catch me now".

• saratoga
##### Reply #32 – 11 January, 2012, 10:50:40 PM
It's not that you (saratoga and greynol) haven't done ABX tests, but rather that in real life the claimant will just prefer to cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.

I think it's unfair to imply that the OP is lying. I certainly believe he has correctly and truthfully reported the numbers, and that he is trying his best. I'm just pointing out that the statistics do not support his conclusion.

• sauvage78
##### Reply #33 – 11 January, 2012, 10:54:31 PM
Quote
Are you claiming that I (or greynol) have not performed ABX tests?

I don't know; as I said, I don't trust anyone until I have double tested him. I just haven't double tested either saratoga or greynol. Maybe there are logs floating around. I don't know & don't care.

But I wouldn't worry about that if I were you because, if I recall correctly, I have already disagreed with IgorC in the past about Vorbis vs Nero quality (I think Nero AAC was better). So for me the "credibility" of IgorC is not "perfect", as we don't always agree. Does this mean he is wrong or dishonest? Does this mean I am better than him? No, it means ABXing is not a perfect science.

So in the end my only real objection to you & greynol is & has always been that "perfectly" ABXing takes time. Perfection doesn't exist when ABXing &, knowing this, anyone who seriously wants to ABX has to compromise between theory & practice & take the middle road.

It's one thing to post 2 or 3 logs with 16 trials from time to time. It's another thing when you want to create a big test; when you try to run a big test, you understand that you have to lower your requirements & make compromises in order for the test to be physically doable.

In the end all I am saying is that telling a newbie that 16 trials is the solution to his 5 trials problem is not the solution. If 5 trials is too low to convince others, you do not have to publish tests with 16 trials if you think it is a waste of time. There is a sweet spot in between 5 & 16 that should be enough to convince others of your honesty, & if others are not satisfied with 8 to 12 trials depending on your success (or whatever your personal methodology is) ... as long as you conform to the TOS ... well, let them go to hell, you did your best.

• IgorC
##### Reply #34 – 11 January, 2012, 10:57:31 PM
It's not that you (saratoga and greynol) haven't done ABX tests, but rather that in real life the claimant will just prefer to cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.

I think it's unfair to imply that the OP is lying. I certainly believe he has correctly and truthfully reported the numbers, and that he is trying his best. I'm just pointing out that the statistics do not support his conclusion.

Oh, no, I wasn't referring to the OP in particular, but in general.

• saratoga
##### Reply #35 – 11 January, 2012, 11:02:05 PM
Quote
Are you claiming that I (or greynol) have not performed ABX tests?

I don't know; as I said, I don't trust anyone until I have double tested him. I just haven't double tested either saratoga or greynol. Maybe there are logs floating around. I don't know & don't care.

If you don't know, it's probably better not to make up things about people. Some might consider that dishonest.

So in the end my only real objection to you & greynol is & has always been that "perfectly" ABXing takes time. Perfection doesn't exist when ABXing &, knowing this, anyone who seriously wants to ABX has to compromise between theory & practice & take the middle road.

"Doing this correctly is hard" is not an objection, it's a cop-out. If audio were an easy thing, people wouldn't make a hobby or a science out of it. But it's not easy, and so we do. If you dislike how difficult it is, take up fishing.

In the end all I am saying is that telling a newbie that 16 trials is the solution to his 5 trials problem is not the solution. If 5 trials is too low to convince others, you do not have to publish tests with 16 trials if you think it is a waste of time. There is a sweet spot in between 5 & 16 that should be enough to convince others of your honesty, & if others are not satisfied with 8 to 12 trials depending on your success (or whatever your personal methodology is) ... as long as you conform to the TOS ... well, let them go to hell, you did your best.

Funny, I said:

Looking at this test, I think the OP did more trials than he needed to, but he did them on the wrong samples. It seems like he only thinks he can ABX one or two of the 10 samples, so the obvious thing to do would have been to do about 10-12 trials of each of those few samples.

So no disagreement from me!

• sauvage78
##### Reply #36 – 11 January, 2012, 11:02:56 PM
IgorC:
I was more trying to say that, while TOS8 is very useful to keep the forum social, it is not necessary to set the requirements too high for newbies, because it might lead them to not ABX at all.

ABXing is already long & boring; you don't have to make it even worse by making ABXing rules too hard to follow.

• IgorC
##### Reply #37 – 11 January, 2012, 11:03:32 PM
It's one thing to post 2 or 3 logs with 16 trials from time to time. It's another thing when you want to create a big test; when you try to run a big test, you understand that you have to lower your requirements & make compromises in order for the test to be physically doable.

Exactly. It's fatigue.
So it's OK to ask once or twice for ABX logs with 8 tries or so. Once we know who the person is, I will believe him/her with 5 tries or just pure ABC/HR logs on a large enough number of samples.

• greynol
• Global Moderator
##### Reply #38 – 11 January, 2012, 11:04:08 PM
My favorite part was ranking the encoders to see which one produced the smallest 320 kbps CBR file.

It seems the fact that the test was done @320kbit mp3 may have been lost on some people.

• IgorC
##### Reply #39 – 11 January, 2012, 11:06:08 PM
IgorC:
I was more trying to say that, while TOS8 is very useful to keep the forum social, it is not necessary to set the requirements too high for newbies, because it might lead them to not ABX at all.

ABXing is already long & boring; you don't have to make it even worse by making ABXing rules too hard to follow.

Agree.

• saratoga
##### Reply #40 – 11 January, 2012, 11:15:45 PM
I was more trying to say that, while TOS8 is very useful to keep the forum social, it is not necessary to set the requirements too high for newbies, because it might lead them to not ABX at all.

So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense? What's the point of that? IMO everyone should be held to the same standards and treated like an adult. It's condescending to pretend that some people can't handle real discussion.

My favorite part was ranking the encoders to see which one produced the smallest 320 kbps CBR file.

It seems the fact that the test was done @320kbit mp3 may have been lost on some people.

But you might hurt his feelings by letting him know about the truth!

• greynol
• Global Moderator
##### Reply #41 – 11 January, 2012, 11:19:00 PM
So it's OK to ask once or twice for ABX logs with 8 tries or so. Once we know who the person is, I will believe him/her with 5 tries or just pure ABC/HR logs on a large enough number of samples.

You've clearly missed the point and perhaps it's my fault, though I did go out of my way to say I was talking about this specific situation as in this specific test with this specific poster.  It is in no way supposed to be interpreted as some universal law that will be applied throughout the forum.

I demanded a larger number of trials and samples before conclusions about sound quality be reached. Sixteen trials per test and p<0.01 were offered up as metrics that would accomplish this goal and hopefully better acquaint the OP with ABX testing. Would twelve or even ten trials be OK? Sure, it's certainly better than five.
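
To put numbers on the trial counts mentioned here, the sketch below lists the lowest score that gets a one-sided p-value under a given threshold for 5, 10, 12 and 16 trials. It assumes independent 50/50 guessing under the null hypothesis; `min_correct_needed` is a name made up for this example, not something from any ABX tool.

```python
from math import comb

def min_correct_needed(trials: int, alpha: float):
    """Smallest score with one-sided p-value below alpha, or None if even a perfect score falls short."""
    tail = 0        # number of guess sequences scoring at least k correct
    needed = None
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials >= alpha:
            break   # scoring k or better is no longer rare enough
        needed = k
    return needed

for n in (5, 10, 12, 16):
    print(n, min_correct_needed(n, 0.05), min_correct_needed(n, 0.01))
# 5 5 None
# 10 9 10
# 12 10 11
# 16 12 14
```

Under these assumptions five trials can never reach p<0.01, while ten or twelve trials can, but only with at most one miss at twelve and none at ten.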

It has also been suggested that the OP need not spend so much time on specific samples that are transparent and focus his efforts on samples that weren't (assuming that there actually were some!).  I agree with this too.

There really needn't be so much uproar over having the OP understand double-blind testing and interpretation of results and know that this is a prerequisite to posting conclusions about sound quality.  If it's too much trouble, then that's fine too, but if you can't substantiate your conclusions about sound quality with objective data then don't make them.

• greynol
• Global Moderator
##### Reply #42 – 11 January, 2012, 11:25:46 PM
But you might hurt his feelings by letting him know about the truth!

Nah, we're to believe that the OP has either identified new killer samples (which have not been shared, BTW), has golden ears or both.

...and no, I didn't ever think, let alone claim, that the OP is lying.

• IgorC
##### Reply #43 – 11 January, 2012, 11:46:55 PM
So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense? What's the point of that? IMO everyone should be held to the same standards and treated like an adult. It's condescending to pretend that some people can't handle real discussion.

First try to explain it to him/her, then we will see.

• dhromed
##### Reply #44 – 12 January, 2012, 04:47:13 AM
So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense? What's the point of that? IMO everyone should be held to the same standards and treated like an adult.

I think it's very important to "pat them on the head" first, and then sternly tell them they're not making sense, as per 2Bdecided's sentiments in this age-old thread (index.php?showtopic=11442):

Quote
We've got to allow people who don't know any better (and sometimes even those of us who do!) to make unsubstantiated claims at first, so that other members can point out that they're unsubstantiated, and suggest a fair way of testing them. This doesn't mean we accept unsubstantiated claims as truth, but it does mean that people sometimes need to be allowed to post them as a starting point for discussion and investigation. "I think X" is an unsubstantiated claim, but it's OK if it leads on to "How can I test if it's true?"

Catching flies with honey etcetera.

I personally don't need a pat on the head before I'm told I speak gibberish, but then I'm a technical person familiar with the myriad posting styles on technical forums. The fact that a person came here with a grand testing project and went out of his way to provide test results -- however useless -- is already very special. It may feel like a child is coming up to you asking you to like his crappy kid doodles, and sure, you don't have time for that, but unlike artistic skill, a semblance of proper testing methodology can be explained in a few short posts.

• nesf
##### Reply #45 – 12 January, 2012, 05:08:32 AM
From a complete newbie perspective: It could have been handled in a more friendly fashion, but someone has to point out the flaws in the methodology of the test early for the person to go on and do more useful tests for the community. You catch more flies with honey, sure, but if your child is holding their pencil wrong when writing their letters you correct them early, before the habit forms. You then make sure they get a real good patting on the head any time you see them doing it correctly though. Perhaps the user should have been told how to do it correctly for two samples and then given a lot of praise for doing it right when they came back with their results?

Anyway, thanks for the thread, very interesting.

• apodtele
##### Reply #46 – 12 January, 2012, 11:09:14 AM

I just want to point out that the p-value is only a measure of confidence that one sees a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of the quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage. The closer it is to 50%, the harder it gets.

Do we care about 55%, or 60%, or 80% of correct guesses as a hypothesis? This is what should be the subject of this thread. Then we can calculate the number of necessary tests to test this hypothesis.
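
One way to make that calculation concrete is a small power analysis: pick a hit rate you care about, then look for the smallest number of trials at which a listener with that hit rate would pass the test most of the time. The sketch below is only an illustration under assumed settings (one-sided test at p<0.05, a target power of 80%, independent trials); none of these values come from the thread, and the function names are invented.

```python
from math import comb

def critical_score(trials, alpha):
    """Smallest score with one-sided p-value below alpha under 50/50 guessing (None if unreachable)."""
    tail, needed = 0, None
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials >= alpha:
            break
        needed = k
    return needed

def power(trials, true_rate, alpha=0.05):
    """Chance that a listener with hit rate `true_rate` reaches the critical score."""
    k_min = critical_score(trials, alpha)
    if k_min is None:
        return 0.0
    return sum(comb(trials, k) * true_rate ** k * (1 - true_rate) ** (trials - k)
               for k in range(k_min, trials + 1))

def trials_needed(true_rate, alpha=0.05, target_power=0.8, max_trials=1000):
    """Smallest trial count reaching the target power, or None if not found within max_trials."""
    for n in range(1, max_trials + 1):
        if power(n, true_rate, alpha) >= target_power:
            return n
    return None

for rate in (0.8, 0.6, 0.55):
    print(rate, trials_needed(rate))
```

Running it shows the required number of trials climbing steeply as the hit rate approaches 50%, which is the "closer to 50%, the harder it gets" effect described above.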

• krabapple
##### Reply #47 – 12 January, 2012, 12:00:12 PM
It looks like we're groping towards a discussion of statistical power and/or Bayesian concepts like prior probability....interesting stuff.

Apodtele's link to Goodman's article prompted me to look for recent citations.  Now I've got these on my reading list:

http://www.ncbi.nlm.nih.gov/pubmed/19921345

http://www.ncbi.nlm.nih.gov/pubmed/21356064

http://www.ncbi.nlm.nih.gov/pubmed/15101426

• Porcus
##### Reply #48 – 12 January, 2012, 04:07:25 PM

I just want to point out that the p-value is only a measure of confidence that one sees a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of the quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage. The closer it is to 50%, the harder it gets.

While I do agree with a great deal of the content of the article, I do not buy into your subsequent point. Sure, it would be a good thing to be able to present some «Probability that I can guess X=A or X=B correctly in an ABX trial with these two samples, is in [range]», but I guess that we are better off with a slightly lesser ambition for the TOS#8 rule.

The «purpose» of TOS#8 is of course up to interpretation, so here is mine: To get rid of unfounded statements. At least a certain kind of them, namely those which claim some kind of audible difference (usually a «better than»-statement). Enter ABX: if you cannot tell the difference in a blind setting, then your statement concerning the audible differences, is unfounded. Ideally, we would want to measure whether your statements are true, but we don't do that -- if you pass the «I can tell the difference» test, then you are free to speak all kinds of nonsense about what the difference is.

And then comes the interpretation of an ABX session. Again, we would like to test whether the statements are true, but again, we settle for a lower level of ambition: how often would the coin outperform you at identifying? We set a certain standard for this.

Quote
Using too many tests will always produce a good p-value.

If you do guess better than the coin, it does. And this is an issue if we are worried about a case where a journalist boosts a small and unimportant difference into a «Scientists say there is no shadow of doubt anymore: A is different from B» headline. But that is a luxury.

• saratoga
##### Reply #49 – 12 January, 2012, 05:15:58 PM
From a complete newbie perspective: It could have been handled in a more friendly fashion, but someone has to point out the flaws in the methodology of the test early for the person to go on and do more useful tests for the community. You catch more flies with honey, sure, but if your child is holding their pencil wrong when writing their letters you correct them early, before the habit forms. You then make sure they get a real good patting on the head any time you see them doing it correctly though. Perhaps the user should have been told how to do it correctly for two samples and then given a lot of praise for doing it right when they came back with their results?

I think we were pretty friendly in general; it's just that by the time the OP told anyone what he was doing he had already made up his mind and it was too late to help him come up with a better test. IMO it wasn't until I pointed out that he had gone 5 out of 5 trials the wrong way on one test that he started to realize what ABX testing was supposed to do, and by then he was frustrated at the wasted effort and just gave up.

Of course, since it wasn't just the OP, but also some experienced posters in this thread who misunderstood how ABX testing should work, perhaps our documentation needs to be made more accessible.  I'm not exactly sure how that should be done though.  I'm hesitant to put up guides to doing tests that imply that there is an absolute right way to do testing.  Maybe some examples of sound test setups for comparing different codecs on a problem sample would help people understand though.  I get the feeling that the wall of text on most of our links discourages people.