HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: sauvage78 on 2012-01-11 23:41:26

Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-11 23:41:26
The minimum number of trials depends on how successful you are.
How quickly you succeed shows how confident you are in yourself.

The time & the number of successful trials are tied; you should never separate them when judging an ABX log.

With the F2K ABX component, 8 successful trials in a row (& if all are successful in a row, it usually means quick) is the minimum for me.

As soon as you begin to fail you can easily increase to 10 or 12 to try to "erase" your failures.
In this case, if you fail once or twice you can usually still get a significant result, although it usually means the ABXing was hard, & consequently longer as you begin to hesitate.

Usually if you begin to fail more than 3 times in 12 trials, it gets so hard & you hesitate so much that it begins to take forever to ABX. At this stage I usually give up & declare that I cannot ABX it, as in general it means I am not sure that the part of the audio I am focusing on actually contains any real artefact.
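
For reference, the thresholds mentioned above can be checked with a few lines of Python. This is only a minimal sketch, assuming every trial is an independent 50/50 guess when no difference is audible; the function name `abx_p_value` is made up for illustration and is not part of any ABX tool.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Scores discussed in the post: a clean 8/8, then 10 or 12 trials with failures.
for correct, trials in [(8, 8), (9, 10), (10, 12), (9, 12), (8, 12)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
# 8/8: p = 0.0039
# 9/10: p = 0.0107
# 10/12: p = 0.0193
# 9/12: p = 0.0730
# 8/12: p = 0.1938
```

Under that assumption, one failure in 10 or two failures in 12 still stay below 0.05, while three or more failures in 12 do not.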
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-11 23:44:03
It is expected that you choose the number of trials in advance rather than continue testing in hopes that you get the desired result.  This is spelled out pretty clearly in the link (http://www.hydrogenaudio.org/forums/index.php?showtopic=16295) for those who bother to read it.
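
To illustrate why the number of trials should be fixed in advance, here is a rough Python sketch. It assumes a listener who is purely guessing, looks at the p-value after every trial, and stops as soon as it drops below 0.05. The function names are hypothetical, and the calculation is exact (no random simulation).

```python
from math import comb

def p_value(correct: int, trials: int) -> float:
    """Chance of at least `correct` right answers in `trials` coin-flip guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def false_positive_rate_with_peeking(max_trials: int, alpha: float = 0.05) -> float:
    """Chance that a pure guesser sees p < alpha at some point within max_trials."""
    alive = {0: 1.0}   # alive[c]: probability of c correct so far, not yet stopped
    stopped = 0.0      # probability of having already stopped on a "significant" score
    for n in range(1, max_trials + 1):
        nxt = {}
        for c, prob in alive.items():
            for gain in (0, 1):  # wrong or right guess, each with probability 1/2
                nxt[c + gain] = nxt.get(c + gain, 0.0) + prob / 2
        alive = {}
        for c, prob in nxt.items():
            if p_value(c, n) < alpha:
                stopped += prob   # the guesser stops here and reports success
            else:
                alive[c] = prob
    return stopped

print(false_positive_rate_with_peeking(5))  # 0.03125
print(false_positive_rate_with_peeking(8))  # 0.05078125
```

With a cap of 5 trials the guesser "passes" 1 time in 32; with a cap of 8 it is already 13/256, above the nominal 5%, and the figure can only grow as the cap is raised.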
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-11 23:45:53
The minimum number of trials depends on how successful you are.
How quickly you succeed shows how confident you are in yourself.

The time & number of trials are tied; you should never separate them when judging an ABX log.

With the F2K ABX component, 8 successful trials in a row (& if all are successful in a row, it usually means quick) is the minimum for me.


Additionally, if you're going to do more samples, you should do more trials per sample, since the odds of being wrong by chance increase the more times you test.  So 8 trials might be good for you for one sample, but if you do 2 samples, then you have roughly doubled your probability of being wrong by chance.

The problem with a test like this is that there are a lot of samples (10 total!) and only a few trials per sample.  So the odds of getting these results by just tossing a coin are actually fairly high.  If I'm doing my math right, you have greater than a 1/4 chance that at least one of these 10 tests will return 5/5 correct choices even with just random guessing!
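
The "greater than 1/4" figure can be reproduced with a short Python sketch, assuming 10 independent tests of 5 trials each and a listener who guesses every time. The function name is hypothetical.

```python
def chance_of_any_perfect_score(tests: int, trials_per_test: int) -> float:
    """Chance that at least one of several all-guessing tests comes out perfect."""
    p_single = 0.5 ** trials_per_test      # one perfect test by luck, e.g. 1/32
    return 1 - (1 - p_single) ** tests     # complement of "no test is perfect"

print(round(chance_of_any_perfect_score(1, 5), 3))   # 0.031
print(round(chance_of_any_perfect_score(10, 5), 3))  # 0.272
print(round(chance_of_any_perfect_score(2, 8) / chance_of_any_perfect_score(1, 8), 2))  # 2.0
```

The last line shows that going from one sample to two at 8 trials each roughly doubles the chance of a lucky perfect score.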
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-11 23:56:45
Well, there is the theory & there is real-life practice. I agree that choosing the number of trials before the test & choosing a higher number of trials to avoid randomness is better ... but just try to organize a real test with several samples, several encoders (& maybe even several bitrates) ... you will quickly realize that the time it takes grows exponentially, especially as the part where you actually decide what to test also takes time & usually goes unnoticed by beginners who read the test ...

In short, theoretical perfection is indeed better, but time is the enemy. If you cannot take the necessary time to ABX correctly, you'd better refrain from making any quality claims.
That's why serious ABXing on a large scale is rare.

For years I have wanted to re-ABX Apple AAC & CELT & so far I have always been unable to run the test, as it would need 3 days of work in a row, which is discouraging as I use 100% lossless anyway.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 00:05:34
In short, theoretical perfection is indeed better, but time is the enemy. If you cannot take the necessary time to ABX correctly, you'd better refrain from making any quality claims.


You can mitigate this to some extent by carefully picking what you will ABX.  If A/B comparisons of a sample and control reveal no apparent differences, it's probably not worthwhile to ABX it.  Just announce that you can't tell the difference and move on to something else.  Failing an ABX test proves nothing, so there's no sense doing it needlessly.

Looking at this test, I think the OP did more trials than he needed to, but he did them on the wrong samples.  It seems like he only thinks he can ABX one or two of the 10 samples, so the obvious thing to do would have been to do about 10-12 trials of each of those few samples.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 00:10:12
He should be conducting sets of 16 trials at first to get a feeling about how these things work, rather than being discouraged into taking shortcuts.

At this point in time it is only taking him about two minutes per test set and he's telling us the results are legitimate.  He complains that doing something other than a half-baked job will take too long and you're giving the impression (intentional or not) that what he's doing is OK.
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 00:13:08
I never said I judged this test valid, I only gave my opinion about what is a valid methodology for me. I usually only trust ABXers whom I have personally double tested, so I don't know if it's valid for me & honestly I don't care as I ain't gonna re-test this.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 00:17:36
I never said I judged this test valid, I only gave my opinion about what is a valid methodology for me. I usually only trust ABXers whom I have personally double tested, so I don't know if it's valid for me & honestly I don't care as I ain't gonna re-test this.



Was this directed at me?  If so, sorry, I don't think I understand what you mean.  If not, please ignore.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 00:20:25
I usually only trust ABXers whom I have personally double tested, so I don't know if it's valid for me & honestly I don't care as I ain't gonna re-test this.

...but this discussion isn't about you.

Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 00:27:54
saratoga:
Yes, I had the feeling that you were thinking that I was defending the topic starter by stating that "in some cases (complete success on an identified artefact) fast ABXing with a low number of trials (but not too low indeed) can be perfectly valid". That's not the case, I am not defending the topic starter. I am just trying to help him, as HA users can be relentless while he seems to be making some effort to conform to the TOS.

Edit:
Obviously his number of trials is too low & his "multiply my results" method is wrong; it's especially wrong as soon as you begin to fail.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 00:32:06
Yes, I had the feeling that you were thinking that I was defending the topic starter by stating that "in some cases (complete success on an identified artefact) fast ABXing with low (but not too low indeed) can be perfectly valid".


Are you confusing me with another, maybe deleted post?  I never said that . . .

It's not the case, I am not defending the topic starter.


I never thought you were 

I admit I'm totally confused by your last 2 posts.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 00:36:55
I hate to say it but your testing was flawed.  5-6 trials for a single song isn't nearly enough.  You should conduct the test and make at least 10 determinations as to which song is which.  5 guesses is way too short as the results can be skewed.  Increasing the sample number (i.e. how many times you pick which song is which) is necessary.  People gave you links in your original post where you did absolutely no testing but it looks like you didn't fully read them.

Completely False!
5 tries are not only enough but also an excellent indicator.  Re-read the ABX FAQ again: http://www.hydrogenaudio.org/forums/index.php?showtopic=16295

Flipping a coin 5 times takes very little time, but that doesn't mean the results will be flawed, because it takes much longer to get the same side of the coin 5 times in a row: on average 32 attempts of 5 tries each (160 tries) to get a false positive with just random clicking.  A listener would have to be really insane to listen to anything 160 times, hence it's not a realistic scenario.

5/5 (5 correct tries of total 5 tries) will give p=0.03125 (~3%) (p<0.05). -> Excellent result!
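
The figures above can be put side by side in a small Python sketch. It assumes repeated 5-trial sessions by a pure guesser and models the wait for the first lucky 5/5 as a geometric distribution; the function names are made up for illustration.

```python
from math import ceil, log

TRIALS_PER_SESSION = 5
p_perfect = 0.5 ** TRIALS_PER_SESSION   # chance of a lucky 5/5 in one session

def mean_sessions_until_perfect() -> float:
    """Average number of sessions a guesser needs before the first 5/5."""
    return 1 / p_perfect

def median_sessions_until_perfect() -> int:
    """Smallest number of sessions giving at least a 50% chance of one 5/5."""
    return ceil(log(0.5) / log(1 - p_perfect))

print(p_perfect)                                           # 0.03125
print(mean_sessions_until_perfect())                       # 32.0
print(mean_sessions_until_perfect() * TRIALS_PER_SESSION)  # 160.0
print(median_sessions_until_perfect())                     # 22
```

So 32 sessions (160 tries) is the average wait, but in this model half of all guessers get their first lucky 5/5 within 22 sessions, and about 3% get it in the very first one.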

Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 00:42:23
deleted post?

No posts have been deleted.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 00:50:04
Completely False!
5 tries are not only enough but also an excellent indicator.  Re-read the ABX FAQ again: http://www.hydrogenaudio.org/forums/index.php?showtopic=16295


From that link:

Quote
Here's another example where numbers can fool us. If we test 20 cables, one by one, in order to know if they have an effect on the sound, and if we consider that p < 0.05 is a success, then in the case where no cable have any actual effect on the sound, since we run 20 tests, we should all the same expect in average one accidental success among the 20 tests ! In this case we can absolutely not tell that the cable affects the sound with a probability of 95%, even while p is inferior to 5 %, since anyway, this success was expected. The test failed, that's all.


Or hell, just read my post above where I said the same thing
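
The cable example in the quote comes down to two small calculations, sketched here in Python. The sketch assumes 20 independent tests, each with exactly a 5% chance of a false success when nothing is audible; the function names are hypothetical.

```python
def expected_false_positives(tests: int, alpha: float = 0.05) -> float:
    """Average number of accidental successes when nothing is really audible."""
    return tests * alpha

def chance_of_at_least_one(tests: int, alpha: float = 0.05) -> float:
    """Chance that one or more of the tests succeeds by accident."""
    return 1 - (1 - alpha) ** tests

print(expected_false_positives(20))          # 1.0
print(round(chance_of_at_least_one(20), 2))  # 0.64
```

Under those assumptions, one accidental success among 20 tests is the expected outcome, and there is roughly a 64% chance of seeing at least one.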
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 00:53:30
it will require much more time to get 5 times the same side of the coin.

Completely false! link (http://www.hydrogenaudio.org/forums/index.php?showtopic=6651&st=25&p=70284&#entry70284)

Five in a row can happen right off the bat or can happen somewhere later on.  If we were to assume that all of his results were guesses then all of a sudden five in a row at some point in time doesn't even remotely touch upon the unreasonable.

...then there is the glaring zero out of five which you seemed to have overlooked!

Now of course there will be situations where artifacts are so obvious that five out of five would be considered acceptable, but that's where reproducibility comes in.  Unfortunately we don't have clips that are 30 seconds or less for others to verify.

Are five trials enough in this situation?
No, they are not!

Until an administrator or another moderator says otherwise, this is the way things stand with regards to this discussion in its current state.
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 01:31:58
Well, I know this topic isn't about me, but my opinion on the topic is that ABXing is not pure statistics ... as soon as you select your sample & bitrate it is flawed statistics, because you usually select the bitrate & sample in order to get failure from the start. Unlike heads-or-tails coin flipping, nobody knows the average probability of failure\success of a target ABX test, because it depends both on how hard the ABX test is (samples\bitrate\encoders ...) & how good the listener is (ears\experience\patience ...). The coin is flawed, so applying pure math is good, but it has its real-life limits.

So is there a minimum number of trials to identify an artefact? For yourself the answer is NO, not really ... well, usually 2 or 3 to be honest.
For yourself & yourself only, what matters is not the number of trials but the fact that you can identify the artefact. Identifying the artefact means that you know WHEN it happens in the sample & that you can DESCRIBE it. Knowing WHAT happens & WHEN it happens is what matters the most to me, because it means that you can ABX it for yourself 100% of the time no matter the number of trials.

The number of trials is only useful to convince others that you're not talking complete bullshit. This is why a minimum number of trials to get meaningful statistical value is useful.

Science means that you can repeat the experiment. Once you know for yourself what you hear, the number of trials & how fast you can repeat your success is only useful to convince others.
Obviously you need a higher number of trials to convince others than to convince yourself, because they are lazy & won't double check your results.

So between 5 trials, which is only good for yourself, & 16 trials, which is overkill for you, there is a real-life "in between" which is statistically valid & usually it is between 8 & 12 depending on how successful you are.

PS: Sorry if I was a little too extensive about myself...

Edit: Typo:expensive>extensive, as you can see saratoga my english is not perfect, so sometimes there are communication breakdowns. Sorry.
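
For the "in between" range discussed in this post, a short Python sketch can list how many correct answers are needed at each trial count. It assumes a one-sided binomial test against pure guessing with a 0.05 threshold, which is the threshold used elsewhere in this thread; the function names are made up for illustration.

```python
from math import comb

def p_value(correct: int, trials: int) -> float:
    """Chance of at least `correct` right answers in `trials` coin-flip guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct_for_significance(trials: int, alpha: float = 0.05):
    """Smallest score with p < alpha, or None if even a perfect score is not enough."""
    for correct in range(trials + 1):
        if p_value(correct, trials) < alpha:
            return correct
    return None

for trials in (5, 8, 10, 12, 16):
    print(trials, min_correct_for_significance(trials))
# 5 5
# 8 7
# 10 9
# 12 10
# 16 12
```

With 5 trials there is no room for a single mistake, while 12 trials tolerate two and 16 tolerate four. Below 5 trials the function returns None, since even a perfect score stays above 0.05.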
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 01:45:33
it will require much more time to get 5 times the same side of the coin.

Completely false! link (http://www.hydrogenaudio.org/forums/index.php?showtopic=6651&st=25&p=70284&#entry70284)

It's not that easy. That link concerns different issues: credibility (whether it was an ironic or double-meaning post), bugs, etc.  We don't know that for sure. Do we?
Otherwise You're shooting yourself right in the foot (if not somewhere else as well). Because after years of applying TOS8, You mention a post where 12/13 isn't valid?
Sorry, with all respect, refer yourself (even if You are an Admin) to TOS8 and show me where 5/5 wasn't good enough and where one HA member (with a few years of registration) has the right to demand from a new participant something that isn't really required by the rules (>5 tries).

If one day I post my ABX logs with just 5/5, please do not ask me for more than that. Take it or leave it. Why? Because it's the only thing we can do. Trust.
If one person cheats and provides You with 20/20, it won't make it more valid than another guy's mere but true 5/5, will it?

Second, please, if you quote somebody's post, quote the complete part, because quoting only one part completely changes the sense of the original post.
Like this:

Quote
To flip the coin 5 times requires a really short time. But it doesn't mean that the results will be flawed. Because it will require much more time to get 5 times the same side of the coin.

Do You still think that it's not true?

Quote
Five in a row can happen right off the bat or can happen somewhere later on.

Quote
If we were to assume that all of his results were guesses then all of a sudden five in a row at some point in time isn't unreasonable.

Again: credibility issues, more than statistics.  The listener usually tries several samples, not just one and case closed.

Quote
...then there is the glaring zero out of five which you seemed to have overlooked!

I didn't overlook it. My statements are of a general nature.


P.S. In the end, TOS8 works for everybody in the same way.  You have to provide a result with p<0.05. We can believe You... or not. But then it is a question of credibility and not statistics.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 01:50:15
Are five trials enough in this situation?
No, they are not!

Until an administrator or another moderator says otherwise, this is the way things stand with regards to this discussion in its current state.

 
So let's remove all previous public tests as well as all personal ABX logs with only 5/5 results because they are no longer valid.
Please show me which TOS rule states that 5/5 isn't enough?
Greynol, don't do that. That's not the way.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 01:54:33
The difference between you and the OP is that you have earned a reputation.

If you came to our forum telling us, "5/5 is all you'll ever get from me take it or leave it," without providing samples and never demonstrating that you actually understand how to interpret double-blind test results, you might end up finding yourself in a position where you would no longer be allowed to post here.

Someone can flip a coin five times and only five times and get the same result.  Someone can flip a coin five times and only five times and guess wrong about the result every time as well.  You are wrong to say otherwise, and you did in fact say otherwise.  I suggest you take the time to help groom the OP to be more like you rather than try and then fail to play "gotcha" with me.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 01:56:56
Greynol, don´t do that. That´s not the way.

You may start another discussion or I can split this one if you wish.  This is no longer up for debate in this thread.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 02:00:21
Are five trials enough in this situation?
No, they are not!

Until an administrator or another moderator says otherwise, this is the way things stand with regards to this discussion in its current state.

 
So let's remove all previous public tests as well as all personal ABX logs with only 5/5 results because they are no longer valid.
Please show me which TOS rule states that 5/5 isn't enough?
Greynol, don't do that. That's not the way.


As I have pointed out twice before, the results in this thread collectively do not reach p<0.05. 

Please, take the time to understand what people are saying before you accuse them of such nonsense.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 02:09:18
Are five trials enough in this situation?
No, they are not!

Until an administrator or another moderator says otherwise, this is the way things stand with regards to this discussion in its current state.

 
So let's remove all previous public tests as well as all personal ABX logs with only 5/5 results because they are no longer valid.
Please show me which TOS rule states that 5/5 isn't enough?
Greynol, don't do that. That's not the way.


As I have pointed out twice before, the results in this thread collectively do not reach p<0.05. 

Please, take the time to understand what people are saying before you accuse them of such nonsense.

Before posting in such a way, please re-read my posts and You will see that I refer to this issue as credibility, Mister Wisdom himself.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 02:19:58
Before posting in such a way, please re-read my posts and You will see that I refer to this issue as credibility, Mister Wisdom himself.


I'm having a lot of trouble understanding your writing, but you said this:

5/5 (5 correct tries of total 5 tries) will give p=0.03125 (~3%) (p<0.05). -> Excellent result!


I'm pointing out to you that this is not in general true for the reasons the link you provided explained.  And in this particular case, it's absolutely wrong.  Since you did not retract that above, I assume you still believe it to be true.  Could you explain why?
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 02:29:54
Before posting in such a way, please re-read my posts and You will see that I refer to this issue as credibility, Mister Wisdom himself.


I'm having a lot of trouble understanding your writing, but you said this:

5/5 (5 correct tries of total 5 tries) will give p=0.03125 (~3%) (p<0.05). -> Excellent result!


I'm pointing out to you that this is not in general true for the reasons the link you provided explained.  And in this particular case, it's absolutely wrong.  Since you did not retract that above, I assume you still believe it to be true.  Could you explain why?

We can play this game for a long time.  Why?
I can ask You exactly the same question: Why not?

We're mixing two different things: credibility and statistics.
Statistics say: p<0.05 implies 95% validity, hence acceptable. But with one condition: credibility, in other words that the OP didn't run it many times.
We're mixing these two things as if they were the same.
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 02:30:47
I don't even need a log anymore to trust /mnt or martel, not because of reputation ... but because I have double tested some of their results (when I needed samples) & most of the time I agree with them, & when I don't, maybe they are more skilled than me, maybe I don't listen where I should.

So 3 trials or 300 trials, other than to build credibility & share results, it doesn't matter. What matters is that you can redo the test yourself & find the same results.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 02:37:18
Before posting in such a way, please re-read my posts and You will see that I refer to this issue as credibility, Mister Wisdom himself.


I'm having a lot of trouble understanding your writing, but you said this:

5/5 (5 correct tries of total 5 tries) will give p=0.03125 (~3%) (p<0.05). -> Excellent result!


I'm pointing out to you that this is not in general true for the reasons the link you provided explained.  And in this particular case, it's absolutely wrong.  Since you did not retract that above, I assume you still believe it to be true.  Could you explain why?

We can play this game for a long time.  Why?
I can ask You exactly the same question: Why not?


I've actually already provided you with an explanation in my own words, and a citation from your own link also explaining why.  It's up to you to explain why I and your own link are wrong.

We're mixing two different things: credibility and statistics.


I would say credibility at a minimum involves either supporting your positions or admitting you are wrong.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 02:55:47
A lot of discussion here but it won't change the disagreement.  We can continue to play "now catch me".
I'm quitting the '5/5' discussion as I consider that I have explained my point clearly, which is: 5/5 is more than acceptable (considering credibility as well as other factors).
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 03:01:41
Even if I think 5 trials is too low to convince others & agree with greynol & co, I fear IgorC is speaking to a wall due to a lack of real-life ABXing practice from his contradictors.

If the ABXer knows exactly what artefact he is listening for, 5 trials is more than enough to convince himself. The problem only starts if he tries to convince others. 5 trials might not be enough by itself (without knowing the ABXer).

Because the number of trials necessary to convince others could be arbitrary depending on who you try to convince, you have to rely on statistics. But statistics are not perfect.

Maths & statistics are a common language for us to understand each other. Because ABX tests are made by humans, despite maths & statistics you cannot draw an absolute TRUTH from them.

ABXing is not pure maths; you wouldn't need humans if it was.

Edit:
Blindly trusting that the statistics behind ABXing provide a universal scientific truth is a placebo effect in itself & it completely defeats the original purpose of ABXing. Instead of comparing it to flipping a coin, you should compare it to playing poker ... ABXing is here to provide you with a pair of Aces ... but even with a high probability of winning, you can lose with a pair of Aces ...

The only moment you know that an ABX test is 100% correct is when you have double tested it & your results agree. Any other claim about quality is a statistical approximation, no matter how close the approximation is.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 03:06:09
A lot of discussion here but it won't change the disagreement.  We can continue to play "now catch me".
I'm quitting the '5/5' discussion as I consider that I have explained my point clearly, which is: 5/5 is more than acceptable (considering credibility as well as other factors).


If you're not willing to explain why you believe this so that I can help you, I'll just leave you with your link's explanation of why this is not true in general:

Quote
Here's another example where numbers can fool us. If we test 20 cables, one by one, in order to know if they have an effect on the sound, and if we consider that p < 0.05 is a success, then in the case where no cable have any actual effect on the sound, since we run 20 tests, we should all the same expect in average one accidental success among the 20 tests ! In this case we can absolutely not tell that the cable affects the sound with a probability of 95%, even while p is inferior to 5 %, since anyway, this success was expected. The test failed, that's all.


For someone so concerned about credibility, you're oddly unwilling to swallow your own pride to save it.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 03:13:34
IgorC is speaking to a wall due to lack of real life ABXing practice from his contradictors.

Are you really trying to tell us that IgorC's detractors have no ABX experience?  Simply reading posts by these people will put that myth to rest.

Blindly trusting in the fact that the statistics behind ABXing...

I don't see this taking place here.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 03:13:50
Even if I think 5 trials is too low to convince others & agree with greynol & co, I fear IgorC is speaking to a wall due to lack of real life ABXing practice from his contradictors.


Are you claiming that I (or greynol) have not performed ABX tests?

 

If the ABXer knows exactly what artefact he is listening for, 5 trials is more than enough to convince himself. The problem only starts if he tries to convince others. 5 trials might not be enough by itself (without knowing the ABXer).


5 trials is enough IMO for a single test, but I would not be convinced if I did 5 trials across multiple samples and compared the results. 

Because the number of trials necessary to convince others could be arbitrary depending on who you try to convince, you have to rely on statistics. But statistics are not perfect.

Maths & statistics are a common language for us to understand each other. Because ABX tests are made by humans, despite maths & statistics you cannot draw an absolute TRUTH from them.


Who is trying to draw an absolute truth here?  As I said before, there is better than a 1 in 4 chance of getting a result like this flipping a coin.  You really find p < 0.30 all that convincing?
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 03:46:10
I think I understand what sauvage78 wants to say. It's not that You (saratoga and greynol) haven't done ABX tests, but rather that in real life the person making the claim will post wrong ABX results or, in the worst case, will cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.

I prefer the game "let's figure it out together" to "you catch me now".
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 03:50:40
It's not that You (saratoga and greynol) haven't done ABX tests, but rather that in real life the person making the claim will just prefer to cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.


I think it's unfair to imply that the OP is lying.  I certainly believe he has correctly and truthfully reported the numbers, and that he is trying his best.  I'm just pointing out that the statistics do not support his conclusion.
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 03:54:31
Quote
Are you claiming that I (or greynol) have not performed ABX tests?

I don't know; as I said, I don't trust anyone until I have double tested him. I just haven't double tested either saratoga or greynol. Maybe there are logs floating around. I don't know & don't care.

But I wouldn't worry about that if I were you, because if I recall correctly I have already disagreed with IgorC in the past about vorbis vs nero quality (I think Nero AAC was better). So for me the "credibility" of IgorC is not "perfect" as we don't always agree. Does this mean he is wrong or dishonest, does this mean I am better than him? No, it means ABXing is not a perfect science.

So in the end my only real objection to you & Greynol is & has always been that "perfectly" ABXing takes time; that perfection doesn't exist when ABXing; & that, knowing this, anyone who seriously wants to ABX has to compromise between theory & practice & take the middle road.

It's one thing to post 2 or 3 logs with 16 trials from time to time. It's another thing when you want to create a big test; when you try to run a big test you understand that you have to lower your requirements & compromise in order for the test to be physically doable.

In the end all I am saying is that telling a newbie that 16 trials is the solution to his 5 trials problem is not the solution. If 5 trials is too low to convince others, you do not have to publish tests with 16 trials if you think it is a waste of time. There is a sweet spot between 5 & 16 that should be enough to convince others of your honesty, & if others are not satisfied with 8 to 12 trials depending on your success (or whatever your personal methodology is) ... as long as you conform to the TOS ... well, let them go to hell, you did your best.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 03:57:31
It's not that You (saratoga and greynol) haven't done ABX tests, but rather that in real life the person making the claim will just prefer to cheat (maybe even cheat him/herself) once he/she is tired of performing unsuccessful ABX tests. So again we are at the same place: credibility & statistics.


I think it's unfair to imply that the OP is lying.  I certainly believe he has correctly and truthfully reported the numbers, and that he is trying his best.  I'm just pointing out that the statistics do not support his conclusion.

Oh, no, I wasn't referring to the OP in particular but speaking in general.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 04:02:05
Quote
Are you claiming that I (or greynol) have not performed ABX tests?

I don't know; as I said, I don't trust anyone until I have double tested him. I just haven't double tested either saratoga or greynol. Maybe there are logs floating around. I don't know & don't care.


If you don't know, it's probably better not to make up things about people.  Some might consider that dishonest.

So in the end my only real objection to you & Greynol is & has always been that "perfectly" ABXing takes time; that perfection doesn't exist when ABXing; & that, knowing this, anyone who seriously wants to ABX has to compromise between theory & practice & take the middle road.


"Doing this correctly is hard" is not an objection, it's a cop-out.  If audio were an easy thing, people wouldn't make a hobby or a science out of it.  But it's not easy, and so we do.  If you dislike how difficult it is, take up fishing.

In the end all I am saying is that telling a newbie that 16 trials is the solution to his 5 trials problem is not the solution. If 5 trials is too low to convince others, you do not have to publish tests with 16 trials if you think it is a waste of time. There is a sweet spot between 5 & 16 that should be enough to convince others of your honesty, & if others are not satisfied with 8 to 12 trials depending on your success (or whatever your personal methodology is) ... as long as you conform to the TOS ... well, let them go to hell, you did your best.


Funny, I said:

Quote from: saratoga
Looking at this test, I think the OP did more trials than he needed to, but he did them on the wrong samples. It seems like he only thinks he can ABX one or two of the 10 samples, so the obvious thing to do would have been to do about 10-12 trials of each of those few samples.


So no disagreement from me!
Title: Minimum number of required ABX trials
Post by: sauvage78 on 2012-01-12 04:02:56
IgorC:
I was more trying to say that while TOS8 is very useful to keep the forum social, it is not necessary to set too high a requirement for newbies, because it might lead them to not ABX at all.

ABXing is already long & boring, you don't have to make it even worse by making ABXing rules too hard to follow.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 04:03:32
It's one thing to post 2 or 3 logs with 16 trials from time to time. It's another thing when you want to create a big test & when you try to run a big test you understand that you have to lower your requirements & make compromises in order for the test to be physically do-able.

Exactly. It's fatigue.
So it's OK to ask once or twice for ABX logs with 8 tries or so. Once we know who the person is, I will believe him/her with 5 tries or just pure ABC/HR logs on a large enough number of samples.
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 04:04:08
My favorite part was ranking the encoders to see which one produced the smallest 320 kbps CBR file.

It seems the fact that the test was done @ 320 kbit MP3 may have been lost on some people.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 04:06:08
IgorC:
I was more trying to say that while TOS8 is very useful to keep the forum social, it is not necessary to set too high a requirement for newbies, because it might lead them to not ABX at all.

ABXing is already long & boring, you don't have to make it even worse by making ABXing rules too hard to follow.

Agree.
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 04:15:45
I was more trying to say that while TOS8 is very useful to keep the forum social, it is not necessary to set too high a requirement for newbies, because it might lead them to not ABX at all.


So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense?  What's the point of that?  IMO everyone should be held to the same standards and treated like an adult.  It's condescending to pretend that some people can't handle real discussion.

My favorite part was ranking the encoders to see which one produced the smallest 320 kbps CBR file.

It seems the fact that the test was done @ 320 kbit MP3 may have been lost on some people.


But you might hurt his feelings by letting him know about the truth!
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 04:19:00
So it's OK to ask once or twice for ABX logs with 8 tries or so. Once we know who the person is, I will believe him/her with 5 tries or just pure ABC/HR logs on a large enough number of samples.

You've clearly missed the point and perhaps it's my fault, though I did go out of my way to say I was talking about this specific situation as in this specific test with this specific poster.  It is in no way supposed to be interpreted as some universal law that will be applied throughout the forum.

I demanded a larger number of trials and samples before conclusions about sound quality be reached.  Sixteen trials per test and p<0.01 were offered up as metrics that would accomplish this goal and hopefully better acquaint the OP with ABX testing.  Would twelve or even ten trials be OK?  Sure, it's certainly better than five.

It has also been suggested that the OP need not spend so much time on specific samples that are transparent and focus his efforts on samples that weren't (assuming that there actually were some!).  I agree with this too.

There really needn't be so much uproar over having the OP understand double-blind testing and interpretation of results and know that this is a prerequisite to posting conclusions about sound quality.  If it's too much trouble, then that's fine too, but if you can't substantiate your conclusions about sound quality with objective data then don't make them.
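
For anyone wanting to check the figures being discussed, here is a minimal Python sketch of the usual one-sided binomial calculation behind an ABX score: the probability of doing at least that well by guessing alone. The function names are made up for illustration, and the 0.01 threshold is simply the one mentioned above, not an official rule. It assumes the number of trials was fixed before the test started.

```python
from math import comb
from typing import Optional


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials` by pure guessing."""
    # Count the guess patterns with `correct` or more hits, out of 2**trials patterns.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def min_correct(trials: int, alpha: float) -> Optional[int]:
    """Smallest score with p-value below `alpha`, or None if even a perfect run fails."""
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) < alpha:
            return correct
    return None


for n in (5, 8, 10, 12, 16):
    print(n, "trials: need", min_correct(n, 0.01), "correct for p < 0.01")
```

Under these assumptions, 5 out of 5 gives p = 1/32 (about 0.031), so five trials can never reach p < 0.01, while 12 trials need 11 correct and 16 trials need 14 correct.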
Title: Minimum number of required ABX trials
Post by: greynol on 2012-01-12 04:25:46
But you might hurt his feelings by letting him know about the truth!

Nah, we're to believe that the OP has either identified new killer samples (which have not been shared, BTW), has golden ears or both.

...and no, I didn't ever think, let alone claim, that the OP is lying.
Title: Minimum number of required ABX trials
Post by: IgorC on 2012-01-12 04:46:55
So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense?  What's the point of that?  IMO everyone should be held to the same standards and treated like an adult.  It's condescending to pretend that some people can't handle real discussion.

The first time, try to explain it to him/her; then we'll see.
Title: Minimum number of required ABX trials
Post by: dhromed on 2012-01-12 09:47:13
So what, if you're new here we should give you 10 posts where you can say nonsense and everyone just kind of pats you on the head and pretends you're making sense?  What's the point of that?  IMO everyone should be held to the same standards and treated like an adult.


I think it's very important to "pat them on the head" first, and then sternly tell them they're not making sense, as per 2Bdecided's sentiments in this age-old thread (index.php?showtopic=11442):

Quote
We've got to allow people who don't know any better (and sometimes even those of us who do!) to make unsubstantiated claims at first, so that other members can point out that they're unsubstantiated, and suggest a fair way of testing them. This doesn't mean we accept unsubstantiated claims as truth, but it does mean that people sometimes need to be allowed to post them as a starting point for discussion and investigation. "I think X" is an unsubstantiated claim, but it's OK if it leads on to "How can I test if it's true?"


Catching flies with honey etcetera.

I personally don't need a pat on the head before I'm told I speak gibberish, but then I'm a technical person familiar with the myriad posting styles on technical forums. The fact that a person came here with a grand testing project and went out of his way to provide test results -- however useless -- is already very special. It may feel like a child is coming up to you asking you to like his crappy kid doodles, and sure, you don't have time for that, but unlike artistic skill, a semblance of proper testing methodology can be explained in a few short posts.
Title: Minimum number of required ABX trials
Post by: nesf on 2012-01-12 10:08:32
From a complete newbie perspective: It could have been handled in a more friendly fashion, but someone has to point out the flaws in the methodology of the test early for the person to go on and do more useful tests for the community. You catch more flies with honey, sure, but if your child is holding their pencil wrong when writing their letters you correct them early before the habit forms. You then make sure they get a real good patting on the head any time you see them doing it correctly though. Perhaps the user should have been told how to do it correctly for two samples and then given a lot of praise for doing it right when they came back with their results?

Anyway, thanks for the thread, very interesting.
Title: Minimum number of required ABX trials
Post by: apodtele on 2012-01-12 16:09:14
Please read Fallacy of p-value (http://www.annals.org/content/130/12/995.abstract).

I just want to point out that the p-value is only a measure of confidence that one hears a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage. The closer it is to 50%, the harder it gets.

Do we care about 55%, or 60%, or 80% of correct guesses as a hypothesis? This is what should be the subject of this thread. Then we can calculate the number of necessary tests to test this hypothesis.
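
The calculation suggested here can be sketched roughly as follows. This is an illustration only: it assumes every trial is independent, that the listener has a fixed true hit rate, and that the number of trials is fixed in advance. The function names and the defaults of 5% significance and 80% power are arbitrary choices for the example, not anything the forum prescribes.

```python
from math import comb
from typing import Optional


def critical_count(trials: int, alpha: float) -> Optional[int]:
    """Smallest score whose guessing-only tail probability is <= alpha, or None."""
    tail = 0  # number of guess patterns giving at least k correct answers
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials > alpha:
            # k correct is no longer significant, so k + 1 is the threshold
            return k + 1 if k < trials else None
    return 0


def power(trials: int, threshold: int, true_rate: float) -> float:
    """Chance of scoring at least `threshold` when each trial is right with `true_rate`."""
    return sum(
        comb(trials, k) * true_rate ** k * (1 - true_rate) ** (trials - k)
        for k in range(threshold, trials + 1)
    )


def trials_needed(true_rate: float, alpha: float = 0.05,
                  target_power: float = 0.8, max_trials: int = 1000) -> Optional[int]:
    """Smallest fixed trial count that reaches the target power, or None if not found."""
    # max_trials should stay at or below 1000 so comb() still converts to a float
    for trials in range(1, max_trials + 1):
        threshold = critical_count(trials, alpha)
        if threshold is not None and power(trials, threshold, true_rate) >= target_power:
            return trials
    return None


for rate in (0.8, 0.6, 0.55):
    print(f"true hit rate {rate:.0%}: {trials_needed(rate)} trials")
```

The point of the sketch is the trend rather than the exact numbers: a listener who is right 80% of the time needs far fewer trials than one who is right 55% of the time, which is why the hypothesis has to be chosen before the trial count can be.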
Title: Minimum number of required ABX trials
Post by: krabapple on 2012-01-12 17:00:12
It looks like we're groping towards a discussion of statistical power and/or Bayesian concepts like prior probability....interesting stuff.

Apodtele's link to Goodman's article prompted me to look for recent citations.  Now I've got these on my reading list:

http://www.ncbi.nlm.nih.gov/pubmed/19921345 (http://www.ncbi.nlm.nih.gov/pubmed/19921345)

http://www.ncbi.nlm.nih.gov/pubmed/21356064 (http://www.ncbi.nlm.nih.gov/pubmed/21356064)

http://www.ncbi.nlm.nih.gov/pubmed/15101426 (http://www.ncbi.nlm.nih.gov/pubmed/15101426)
Title: Minimum number of required ABX trials
Post by: Porcus on 2012-01-12 21:07:25
Please read Fallacy of p-value (http://www.annals.org/content/130/12/995.abstract).

I just want to point out that the p-value is only a measure of confidence that one hears a difference, however small it is. The p-value is not a measure of how big the difference is. The p-value is not a measure of quality. Using too many tests will always produce a good p-value. The goal, however, is to accurately estimate the percentage. The closer it is to 50%, the harder it gets.



While I do agree with a great deal of the content of the article, I do not buy into your subsequent point. Sure, it would be a good thing to be able to present some «Probability that I can guess X=A or X=B correctly in an ABX trial with these two samples is in [range]», but I guess that we are better off with a slightly lesser ambition for the TOS#8 rule.

The «purpose» of TOS#8 is of course up to interpretation, so here is mine: To get rid of unfounded statements. At least a certain kind of them, namely those which claim some kind of audible difference (usually a «better than»-statement). Enter ABX: if you cannot tell the difference in a blind setting, then your statement concerning the audible differences, is unfounded. Ideally, we would want to measure whether your statements are true, but we don't do that -- if you pass the «I can tell the difference» test, then you are free to speak all kinds of nonsense about what the difference is.


And then comes the interpretation of an ABX session. Again, we would like to test whether the statements are true, but again, we settle for a lower level of ambition: how often would the coin outperform you at identifying? We set a certain standard for this.


Quote
Using too many tests will always produce a good p-value.


If you do guess better than the coin, it does. And this is an issue if we are worried about a case where a journalist boosts a small and unimportant difference into a «Scientists say there is no shadow of doubt anymore: A is different from B» headline. But that is a luxury.
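
A small Python sketch of this effect, assuming a listener who scores exactly 55% in every session (the session sizes and the function name are made up for the example):

```python
from math import comb


def guessing_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials` by coin flipping."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


results = []
for trials in (20, 100, 500, 1000):
    correct = trials * 55 // 100  # the same 55% hit rate at every session size
    p = guessing_p_value(correct, trials)
    results.append((trials, correct, p))
    print(f"{correct}/{trials} correct ({correct / trials:.0%}): p = {p:.4f}")
```

The hit rate stays at 55% throughout, yet the p-value falls from roughly 0.41 at 20 trials to well under 0.01 at 1000. The p-value says how surprised the coin should be, not how large the difference is.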
Title: Minimum number of required ABX trials
Post by: saratoga on 2012-01-12 22:15:58
From a complete newbie perspective: It could have been handled in a more friendly fashion, but someone has to point out the flaws in the methodology of the test early for the person to go on and do more useful tests for the community. You catch more flies with honey, sure, but if your child is holding their pencil wrong when writing their letters you correct them early before the habit forms. You then make sure they get a real good patting on the head any time you see them doing it correctly though. Perhaps the user should have been told how to do it correctly for two samples and then given a lot of praise for doing it right when they came back with their results?


I think we were pretty friendly in general, it's just that by the time the OP told anyone what he was doing he had already made up his mind and it was too late to help him come up with a better test.  IMO it wasn't until I pointed out that he had gone 5 out of 5 trials the wrong way on one test that he started to realize what ABX testing was supposed to do, and by then he was frustrated at the wasted effort and just gave up.

Of course, since it wasn't just the OP, but also some experienced posters in this thread who misunderstood how ABX testing should work, perhaps our documentation needs to be made more accessible.  I'm not exactly sure how that should be done though.  I'm hesitant to put up guides to doing tests that imply that there is an absolute right way to do testing.  Maybe some examples of sound test setups for comparing different codecs on a problem sample would help people understand though.  I get the feeling that the wall of text on most of our links discourages people.
Title: Minimum number of required ABX trials
Post by: nesf on 2012-01-13 00:59:27
A Dummies guide to doing some basic two-sample and three-sample tests might be an idea, discussing options such as the number of tests and showing how the number required changes as you add samples. My stats knowledge is more regressions and similar stuff, so this isn't something I know a lot about, which makes this thread quite interesting for me.

I wasn't saying ye were unfriendly, more that if you want to convert people like the OP you might need to adopt cult-like levels of friendliness.