Topic: How do you listen to an ABX test?

How do you listen to an ABX test?

Reply #250
Wow, I'm glad that you laid all this out in black & white for all to see - it really proves the experimenter bias that underlies the thinking of many DBT supporters on this forum.



or that you can't/don't read.

How do you listen to an ABX test?

Reply #251
jkeny seems to think a lot of people would 'randomly guess' for nefarious purposes -- to somehow game the cumulative results toward negative.

Nope, wrong again - there are many reasons why people would just guess, not just nefarious ones, but does it really matter what the motivation/intent behind guessing is? The point is to have a robust test design that can sense random guessing. I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.

So far, no reply to this suggestion. Oh, I see pelmazo replied to it while I was posting this - yes, the usual reply: "Go ahead & do it"
No interest whatsoever! That's expected.

How do you listen to an ABX test?

Reply #252
You're welcome. Except it doesn't prove any bias. It just describes how ABX testing works. There is not and cannot be any symmetry between positive and negative results. Your expectations are completely off, and your insistence just shows your ignorance.

Now, please do me a favor and disseminate my statement as widely as possible as "proof" that the objectivists are biased. You show how tempting it is for you. It would only reinforce my conviction, and my amusement.

Yep, I've bookmarked it, thanks - now don't go changing it

That seems like a good note to end on!!

How do you listen to an ABX test?

Reply #253
The point is that there is no way of knowing from the published test results whether they are simply random guesses completed by a deaf monkey. The same goes for many other situations where the test isn't actually taken, i.e. there is no serious attempt at listening involved. Yet all these tests are treated as valid & lumped into the null "evidence" pile.


You *stipulate* that 'no serious attempt at listening' was involved.  But that's not been my experience.

I can stipulate that there are tests where 'positive' difference might have been due to some other factor.    They get lumped into the positive 'evidence' file.  Depending on who the lumper is.



In fact, much of the formal/professional audio DBT literature consists of ever-more-contrived protocols to account for possible sources of false positives/negatives.

And to date, have the large claims of the high-end folks -- the ones you're courting as customers -- been borne out by these tests?

Nope.  No veils lifted, no 'night and day' differences.

But by all means, if you feel you've identified yet another hole in the dike that needs to be plugged, do so, and publish your own protocols and multisubject results.


Just  don't expect anyone with a clue (or any journal)  to take your (or anyone's)  sighted claims of DAC audio quality as good evidence.

How do you listen to an ABX test?

Reply #254
As I said, I have no problem with anyone doing their own personal blind test on my DACs as I have done
You don't mention anything like this in the interview and quite frankly, your personal blind tests are a fabrication (good biz practice apparently) without any evidence of such.

this is fine for me & for most people who want to make buying decisions.
Not a single person buying your biochemically engineered "organic" DAC would ever consider a blind test. That simply isn't your type of customer...and you know it.
If you had any real blind test data showing you identified/preferred your DAC, it would be pasted on every forum and your website. That's the reason why it isn't.

The rest of this is just jkeny playing science & aping what he thinks the grown-ups do in their laboratories but not really understanding much of it
Bingo. You have nothing to gain by blind/ABXing your orgasmic DAC...and nothing to lose either, given the appeal of such "engineering".

cheers,

AJ
Loudspeaker manufacturer

How do you listen to an ABX test?

Reply #255
But by all means continue to rabbit on without me!!

How do you listen to an ABX test?

Reply #256
I've asked over & over again what proctor would be accepted but got no answer. Care to answer this?


Sure:

JJ

Me

John Vanderkooy

Stan Lipshitz

and 100's of others who are probably more conveniently physically sited
Pretty much as I suspected - you need to get a grip on reality if you think any of those named people would be remotely interested in proctoring.


Prove it.

Quote
It makes any positive ABX results impossible - just as the whole silly issue of proctoring was designed to do.


False and illogical claim.  If you were able to follow instructions, your results would be the same, proctor or not.

Quote
You guys have backed yourself up into a corner of reality that is untenable.


It is you who are cornered, Mr. Kenny.

Quote
All ABX tests need to be proctored, right?


Not at all.


Quote
Quote
Besides, aren't you in this for the sake of knowledge and truth?  Wouldn't just knowing be enough?  It has been for me on many occasions. For example, I didn't spill the beans about the first ABX test ever done until someone had done the second dozen or more. Didn't want to bias them.


Do you not think I have done my own personal blind tests & am satisfied with my personal conclusions? That is truth for me


Do tell about those alleged blind tests.

How do you listen to an ABX test?

Reply #257
So as is done on this thread, let's summarise:

1) - it's claimed that a null result "proves" nothing


That is how it works. That is not a claim, it is simple logic. Absence of proof is not proof of absence; put another way, a negative hypothesis is difficult or impossible to prove. That's one reason why we try to prove the positive outcome with every test.

Quote
2) - it is admitted that an accumulation of null results is a strong indication of there being no ACTUAL audible difference


That is just common sense.

One other thing. Your co-conspirator is plastering AVS with claims that the percentage of right answers means nothing. I quoted and linked that earlier, so it is a proven fact that he is saying this.

The percentage of right answers can be used to discern a number of worthwhile things:

(1) A percentage of right answers significantly below random guessing means that the test was compromised. The usual error at this point is communication among the listeners.

(2) Statistically significant results with a low percentage of right answers mean that the audible effect is subtle.

Point (2) is interesting because virtually every person who has actually done a significant number of DBTs finds this to be intuitively clear, and it is a result that can be duplicated by varying the strength of the audible difference over a range around the threshold of reliable detection. Yet we have someone who claims to have done a number of DBTs, but keeps spewing this preposterous nonsense. Doesn't compute. One explanation is that the majority of his alleged DBTs were shams and the ABX logs that were shown were the results of trickery.
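For concreteness, here's a minimal sketch (plain Python, stdlib only; the trial counts are invented examples, not data from any test discussed here) of the binomial arithmetic behind both points:

Code

from math import comb

def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of scoring k or
    more correct answers by random guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lower_tail(k, n, p=0.5):
    """P(X <= k): used to spot scores suspiciously far BELOW chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

# Point (1): 2/16 correct is significantly below chance (p ~ 0.002),
# which itself signals a compromised test, e.g. listeners comparing notes.
print(f"2/16 correct, lower tail: {lower_tail(2, 16):.4f}")

# Point (2): 60/100 correct is a low hit rate, yet p ~ 0.03 --
# statistically significant, so the audible effect is real but subtle.
print(f"60/100 correct, upper tail: {upper_tail(60, 100):.4f}")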

Quote
3) - thus the number of null results has a direct bearing on how strong this indication is perceived to be - the higher the number the stronger the indication


Other things matter too, such as the quality and rigor of the test procedures.

Quote
4) - treating all null results as valid & piling them all into the valid null results pile is knowingly skewing the overall number of null results towards (2) & (3) above


However, if one can come up with some worthwhile refinement on the test that would make positive results more probable, the game starts all over with a fresh score card.

How do you listen to an ABX test?

Reply #258
Hi jkeny,

I think you’ve shown more “perseverance in the face of adversity” than I did. ;-)
Although many people are open-minded, and therefore neither hard-line subjectivists nor hard-line objectivists, "preaching to the choir" within either group doesn't have much power to persuade.
It is the power to convince a skeptic that is most interesting in this thread.

Many here have said they are skeptical of certain claims but would be convinced that a difference that is heard is real if: a proctored* ABX is performed with p<##, where *proctor and ## need to be agreed upon. There is no chance of getting 100% being convinced. krabapple wants a proctor, Mr. Krueger doesn’t. Some want p<0.05, some p<0.01 (from other threads/forums).

But you are an ABX skeptic, or at least you believe many are not performed well, and I would agree. But showing examples of bad ABX tests does not mean all are bad.

Under what conditions would you accept the result and allow it to convince you of something (e.g. no audible difference)? You have mentioned including controls. Would you accept a result with controls that showed no audible difference between two DACs? If you chose the listeners and the listening conditions and there were controls, would you accept the result? If not, where are you going with this?

Before you request analysis of the probability of Type II errors, let's state up front that that requires a measure, or at least a sensible estimate, of the effect size, that is, the probability of noticing a difference in a trial for the population being tested (e.g. "critical listeners", "experienced, trained listeners", "humans"). But to do that you must have data from previous work to calculate the probability. You have to do the experiment before you do the experiment = impossible. The first step would be to produce one positive test, and then the statistical calculations can begin.

If you can't describe what would convince you, then I wonder why you continue to pursue it. Why do you have such "perseverance in the face of adversity"?
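To make the circularity concrete, here is a hedged sketch (plain Python, invented numbers) of the power calculation just described; it can only run once an effect size, i.e. a per-trial detection probability, is assumed:

Code

from math import comb

def tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, p_true, alpha=0.05):
    """Power of an n-trial ABX test against chance (p = 0.5): find the
    smallest score whose chance-tail is below alpha, then ask how often
    a listener with true hit rate p_true reaches that score."""
    k_crit = next(k for k in range(n + 1) if tail(k, n, 0.5) <= alpha)
    return tail(k_crit, n, p_true)

# Assumed (hypothetical) effect size: the listener hears the difference
# on 70% of trials. Power climbs past ~0.8 only around 40 trials.
for n in (16, 25, 40, 60):
    print(f"n={n:3d}  power={power(n, 0.70):.2f}")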

Cheers.

How do you listen to an ABX test?

Reply #259
I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.
That would confuse a listener who was listening - if A or B suddenly became a previously unheard C!

However, the concept is well used in other ways that don't confuse, and is called a hidden reference. It's not a new idea.


In the strictest sense, a blind test can't prove that something is inaudible. There are pages written about this. However, where someone claims to hear a difference, continues to believe they hear the difference during the blind test, but their answers show they could not detect it, that's about as close as you can get to proof that the person was imagining the difference.

Cheers,
David.

How do you listen to an ABX test?

Reply #260
I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.

I already told you 2 months ago (in the thread about amir's demonstration that he has no clue about statistics) that dishonest people can still distort the results to their liking. But you aren't here to discuss or learn anything, just make noise. So /ignore.
"I hear it when I see it."

How do you listen to an ABX test?

Reply #261
I'm not an audiophile bi$man but I play one on TV, so I'll address some of this in case John is just too busy elsewhere. Clearly you are new at this also.

Under what conditions would you accept the result and allow it to convince you of something (e.g. no audible difference)?

Specifically regarding audiophile beliefs, like those about DACs: positive results only. Negatives (nulls) are unacceptable, since we "know" that there are audible differences between mass market and organically grown DACs. How do we know? Long term, completely uncontrolled sighted "listening", where we are relaxed and get invaluable, completely impartial, detached input from the Mrs, etc.
IOW not waterboarded, unable to know the name brand and what I read on audiophile sites, etc.

You have mentioned including controls.

Yes, as part of the adaptation, you must learn the terms so you can throw them out there, even though you have no clue what they mean. Leading to some very amusing exchanges with JJ on AVS.

If you chose the listeners and the listening conditions and there were controls, would you accept the result?

Momentarily. However, without continued therapy for the condition, there will be the inevitable relapse.

If not, where are you going with this?

Bank of Ireland?

Why do you have such “perseverance in the face of adversity”?

They banned Scientologists in Germany, didn't they? Lucky you!

cheers,

AJ
Loudspeaker manufacturer

How do you listen to an ABX test?

Reply #262
Hi AJ,
Just as one bad ABX doesn't spoil the whole bunch...
One bad Subjectivist (ML) doesn't spoil the whole bunch.
I'm curious what JK's conditions to be persuaded would be. He has been posting concerns here.
Cheers, SAM

 

How do you listen to an ABX test?

Reply #263
It is the power to convince a skeptic that is most interesting in this thread.

Many here have said they are skeptical of certain claims but would be convinced that a difference that is heard is real if: a proctored* ABX is performed with p<##, where *proctor and ## need to be agreed upon. There is no chance of getting 100% being convinced. krabapple wants a proctor, Mr. Krueger doesn’t. Some want p<0.05, some p<0.01 (from other threads/forums).

Given that everybody is skeptical about something, I think it is not that hard to understand why people show different amounts of resistance to getting convinced of something they find implausible.

Changing the topic sometimes helps illustrate the problem. Say you are dealing with claims by numerous people that they can run the 100m distance in 7 seconds or better. They say that they have done it routinely, and some even provide time measurements of runs they claim to have done. You are skeptical that anyone can complete this distance in 7 seconds, given your current knowledge, so the claims don't convince you. You don't need a degree in human biology for that; some common sense is enough. The question now is, what will convince you? Do you think a proctor helps? Any proctor? Do you need to be present yourself?

Even if there is an impartial proctor, does he have access to all relevant aspects of the test? For example, if the role of the proctor is only to check that the time measurement was correct, you may suspect that the run didn't start at 0 velocity. The trick may be that the runner enters the 100m stretch at full velocity. Or the track goes steeply downhill. What about doping? What if the runner isn't a human being at all, but a purpose built machine...

You see that it can be quite difficult to work out every cheating possibility in advance. It is correspondingly difficult to say in advance what you would accept as proof. Every set of conditions you provide can be used as a basis for working out a cheating strategy. Take sports again as an example: The anti-doping rules have illustrated this for decades. Knowing what the rules are allows developing strategies to circumvent them. If the incentive is large enough this will most certainly be done in practice. This can only be counterbalanced by updating the rules according to the experiences made along the way.

Back to our topic: We have seen that at least for some "players", the incentive seems to be large enough to employ any cheating method they can find. Some people fight a bitter battle and feel compelled to use any device they can get hold of. In a climate like that, one proctor will probably not be enough. You will need an elaborate system of rules, and the first problem is to get everybody to accept them. We see how often it fails at that step already.

Many people don't want to go that far. They resolve to say: I don't care how and whether you can run the 100m in 7 seconds. I don't believe it, no matter what you say. My time is too precious to waste it on elaborate testing rules designed to prevent you from cheating. You can just fuck off.

Would that be closed-minded, even arrogant? Perhaps, but at that point I have quite a bit of sympathy with such a stance. I find it perfectly reasonable. Being accused of politically motivated bias in such a situation doesn't really disturb me much. Not if the claim that I'm supposed to take seriously is so far off any reasonable expectation that rather solid evidence would be called for.

How do you listen to an ABX test?

Reply #264
Hi jkeny,

I think you’ve shown more “perseverance in the face of adversity” than I did. ;-)
Although many people are open-minded, and therefore neither hard-line subjectivists nor hard-line objectivists, "preaching to the choir" within either group doesn't have much power to persuade.
It is the power to convince a skeptic that is most interesting in this thread.

Many here have said they are skeptical of certain claims but would be convinced that a difference that is heard is real if: a proctored* ABX is performed with p<##, where *proctor and ## need to be agreed upon. There is no chance of getting 100% being convinced. krabapple wants a proctor, Mr. Krueger doesn’t. Some want p<0.05, some p<0.01 (from other threads/forums).
Thanks for the perseverance comment.
I believe that both krabapple & Arny want a proctor - Arny just reserves the option to invoke it when he sees fit - as many here do.

Quote
But you are an ABX skeptic, or at least you believe many are not performed well, and I would agree. But showing examples of bad ABX tests does not mean all are bad.
Sure, but the issue is we don't know what percentage are bad & I suspect that many are. On the other hand the attitude shown here is that they don't care about this. I wouldn't care either if null results were simply just discarded & of no importance but that is not the case. The accumulated body of null results is used as evidence of inaudibility. Look at ArnyK's jangling keys test - flawed as it was - the claim was that so many have done this test in 15 years & nobody has produced a positive result. So these flawed files were tested how many thousands of times over 15 years & not one person picked up the audible flaw. This body of null results was often cited as evidence that there is no audible difference with high-res. Irrespective of whether Amir's results are due to high-res differences, his repeated ABX positive test results show that there is an audible difference between the two jangling keys files that stood for 15 years as being audibly identical.

A similar thing happened for Ethan Winer's loopback test files which stood for a similar amount of time without any positive results. These files were the evidence that Winer used to claim that a Soundblaster audio card is audibly indistinguishable from a professional audio system for recording. He looped a recording back through the card many times & put online test files extracted after different numbers of loopback generations as "proof" that many trips through D/A-A/D are not audible. Again, positive results were reported around the same time & Winer then changed his files.

The point being - in both of these cases ACTUAL audible differences existed in the test files but this was undiscovered during the claimed many thousands of blind tests run during the 15 years previous. So what was the problem? Why no positive results over this time? Why, when some positive results are reported do others then find similar positive results?

This gives me a large question mark over the capability of such tests to reveal small differences & led me to ask what the level of false negatives is for ABX testing. I know from my own experience of running ABX & other blind tests how easy it is to get bored & lose focus & not actually listen. It's a difficult task to retain concentration on the same short piece of repeated music at the level of analytic hearing required in this form of listening. This lapse is often not even something that people are aware of - it's not like reading, where you find that you need to re-read the same paragraph a number of times because your mind has wandered - in audio, a lapse in focus generally goes unnoticed. I figured that including some internal controls in the test could begin to reveal how prevalent this might be & build a profile of just how reliable these tests are.

There seems to be a great reluctance to address such a mechanism.
Quote
Under what conditions would you accept the result and allow it to convince you of something (e.g. no audible difference)? You have mentioned including controls. Would you accept a result with controls that showed no audible difference between two DACs? If you chose the listeners and the listening conditions and there were controls, would you accept the result? If not, where are you going with this?

Before you request analysis of the probability of Type II errors, let's state up front that that requires a measure, or at least a sensible estimate, of the effect size, that is, the probability of noticing a difference in a trial for the population being tested (e.g. "critical listeners", "experienced, trained listeners", "humans"). But to do that you must have data from previous work to calculate the probability. You have to do the experiment before you do the experiment = impossible. The first step would be to produce one positive test, and then the statistical calculations can begin.

If you can't describe what would convince you, then I wonder why you continue to pursue it. Why do you have such "perseverance in the face of adversity"?

Cheers.

As I said, once I see reasonable internal controls that can be used to give an indication of when someone is actually listening vs when someone is not (for whatever reason), then I will accept the results of blind tests conducted by A.N. Others. I even have to be aware of this in my own blind testing, as I know this is a very insidious issue.

Let me give you an example - I find that this is somewhat like reading a book - normal reading is different to proof reading. In normal reading you may pick up some spelling/typo glitches, but it's the story & flow that is of importance. In proof reading (which I find is the equivalent of blind testing) you are reading to pick up typos/spellings & grammar issues. It's very easy to lose focus doing this & regular breaks are needed. You also have to be aware of when you lose focus.

What I'm saying is that before I accept that the book has been properly proof read by a stranger, I would want to include misspellings/typos/grammar mistakes throughout the book & if they weren't reported I would know that the proof-reading wasn't done properly. If some were found but not others, I would know that focus had been lost at times, etc.

At the moment, for the majority of blind tests run by non-specialists, no-one has a way of judging the validity of the null results.
I have given, here & in past threads, some examples of how I would implement internal controls in ABX software & what they could be, so I'm not sure why you say "If you can't describe what would convince you, then I wonder why you continue to pursue it." I was hoping that internal controls might be considered a good idea & that people would work together to come up with some workable solutions, rather than every suggestion I made being shot down & the idea dismissed as unimportant or dismissed in other ways. The reaction to my questioning of the validity of null results suggests to me that people have invested so much in these null results that they are unwilling to objectively examine the testing & find a way of separating out tests that should be eliminated from valid null results - they seem to feel threatened by the very concept.

How do you listen to an ABX test?

Reply #265
I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.
That would confuse a listener who was listening - if A or B suddenly became a previously unheard C!
By changing the volume of X every now & then you are introducing a small change which should be audible. I doubt this would confuse - how are hidden references used in blind tests? If, on the other hand, it wasn't noticed, would this not indicate that the sound file was not being listened to analytically at that particular point in the test? If all occurrences of this control went unnoticed, would it not suggest that there was a loss of focus (or some other reason for not hearing the audible difference) throughout the test?

Quote
However, the concept is well used in other ways that don't confuse, and called a hidden reference. It's not a new idea.


In the strictest sense, a blind test can't prove that something is inaudible. There are pages written about this. However, where someone claims to hear a differences, continues to believe they hear the difference during the blind test, but their answers show they could not detect the difference, that's about as close to proof that that person was imagining the difference that you can get.

Cheers,
David.

Sure, hidden references are recommended for blind tests in the standards documents - why?
They aren't included for no reason - they are a way for the test to check itself - something that is sadly missing in ABX testing.
This is what I'm trying to get across here.
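As an illustration only - this is my sketch of the suggestion, not an existing Foobar ABX feature - a test runner could interleave control trials, in which X carries the agreed audible condition, with the real trials, and score the controls separately:

Code

import random

def make_trial_plan(n_real=16, n_controls=4, seed=None):
    """Mix normal ABX trials with 'control' trials in which X carries a
    known audible impairment (e.g. a small level offset). All names and
    counts here are hypothetical."""
    rng = random.Random(seed)
    kinds = ["real"] * n_real + ["control"] * n_controls
    rng.shuffle(kinds)
    return [{"kind": k, "x_is": rng.choice("AB")} for k in kinds]

def score(plan, answers, flagged):
    """answers[i] is the listener's A/B call for trial i; flagged is the
    set of trial indices where the listener reported hearing the agreed
    control condition."""
    n_real = sum(t["kind"] == "real" for t in plan)
    correct = sum(1 for i, t in enumerate(plan)
                  if t["kind"] == "real" and answers[i] == t["x_is"])
    controls = [i for i, t in enumerate(plan) if t["kind"] == "control"]
    caught = sum(1 for i in controls if i in flagged)
    print(f"real trials: {correct}/{n_real} correct")
    # Missed controls suggest the listener wasn't listening analytically
    # during those trials, so a null result from them is suspect.
    print(f"controls noticed: {caught}/{len(controls)}")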


How do you listen to an ABX test?

Reply #266
....
Back to our topic: We have seen that at least for some "players", the incentive seems to be large enough to employ any cheating method they can find. ....

A good start in eliminating cheating is to look at examples of cheating that have been perpetrated in the past. Have you got these examples?

How do you listen to an ABX test?

Reply #267
I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.

I already told you 2 months ago (in the thread about amir's demonstration that he has no clue about statistics) that dishonest people can still distort the results to their liking. But you aren't here to discuss or learn anything, just make noise. So /ignore.

Sorry, what has your reply got to do with the extract from my post you quoted?

How do you listen to an ABX test?

Reply #268
A good start in eliminating cheating is to look at examples of cheating that have been perpetrated in the past. Have you got these examples?

Yes, although I am not in a position to prove that it was being done consciously and on purpose. It is often conceivable that the individuals were simply incompetent or negligent rather than malicious, but sometimes that's hard to believe, and I'm compelled to assume that there must have been an element of maliciousness. Sometimes I'd say that they were deliberately ignorant and obstinate. I can't look into other people's heads, but it doesn't matter for the end result.

Examples are: Fiddling with the statistics, trying to extract a significance where there isn't one. Moving the goalpost after the fact. Trying to selectively exclude unwelcome results using dubious arguments. Inventing reasons for dismissing a test after the fact, even though they previously had accepted the terms. Not revealing a clue which gave away the result, thereby pretending the test was valid when it wasn't.

How do you listen to an ABX test?

Reply #269
I already suggested a way to include controls in ABX testing that randomly puts an agreed audible condition in random trials as one way to sense this occurrence.
That would confuse a listener who was listening - if A or B suddenly became a previously unheard C!

However, the concept is well used in other ways that don't confuse, and called a hidden reference. It's not a new idea.


In the strictest sense, a blind test can't prove that something is inaudible. There are pages written about this. However, where someone claims to hear a differences, continues to believe they hear the difference during the blind test, but their answers show they could not detect the difference, that's about as close to proof that that person was imagining the difference that you can get.

Cheers,
David.


Thanks.

One would have thought none of this needed saying at this point.

But one would again have been wrong.

Those who imagine they are bravely tilting 'in the face of adversity' are going to keep showing up here, unaware of prior work, or mis-characterizing it, or insisting that they've invented the new work. Some may even actually know a thing or two, but not a thing or three.

Note to self, must keep that in mind.

How do you listen to an ABX test?

Reply #270
Sure, but the issue is we don't know what percentage are bad & I suspect that many are.


You've been a member here at HA since 2005...plenty of time to witness many, many posted ABX results. Can you point to all the 'suspect' ones right here on HA? Have you any sense of how many returned 'positive' (which I presume you're OK with) vs 'negative' (which is where all your suspicion seems to point) results?


Quote
On the other hand the attitude shown here is that they don't care about this. I wouldn't care either if null results were simply just discarded & of no importance but that is not the case. The accumulated body of null results is used as evidence of inaudibility. Look at ArnyK's jangling keys test - flawed as it was - the claim was that so many have done this test in 15 years & nobody has produced a positive result.
So these flawed files were tested how many thousands of times over 15 years



Good question.

Quote
& not one person picked up the audible flaw. This body of null results was often cited as evidence that there is no audible difference with high-res. Irrespective of whether Amir's results are due to high-res differences, his repeated ABX positive test results show that there is an audible difference between the two jangling keys files that stood for 15 years as being audibly identical.

A similar thing happened for Ethan Winer's loopback test files which stood for a similar amount of time without any positive results.



How many reports existed?

Quote
The point being - in both of these cases ACTUAL audible differences existed in the test files but this was undiscovered during the claimed many thousands of blind tests


You sure about that number you're throwing around?  Downloading some files does not mean tests were done and results were reported every time.

Btw, if you find all *that* suspect, what are we skeptics to make of it when suddenly *multiple* reports of positive difference, within a short period of time, appear online for differences previously mooted to be difficult if not impossible to discern?



Quote
run during the 15 years previous. So what was the problem? Why no positive results over this time? Why, when some positive results are reported do others then find similar positive results?

This gives me a large question mark over the capability of such tests to reveal small differences & led me to ask what the level of false negatives is for ABX testing. I know from my own experience of running ABX & other blind tests how easy it is to get bored & lose focus & not actually listen. It's a difficult task to retain concentration on the same short piece of repeated music at the level of analytic hearing required in this form of listening. This lapse is often not even something that people are aware of - it's not like reading, where you find that you need to re-read the same paragraph a number of times because your mind has wandered - in audio, a lapse in focus generally goes unnoticed. I figured that including some internal controls in the test could begin to reveal how prevalent this might be & build a profile of just how reliable these tests are.


'Internal controls' can be a training run beforehand, where a difference is introduced at the start and then incrementally decreased until some threshold is reached. This is not a new idea. Neither is a 'phantom switch', where A and B are actually the same, though the listener doesn't know it. This is not a new idea either.

Best practice for an ABX-type test (which does not admit 'internal' negative and positive controls -- the sort that would prove the subject is 'really listening' -- in the sense of including them *within a single test*) includes training the subjects beforehand. For a given A and B, the only way to implement 'controls' of the sort you demand would be to do *two* ABX 'training' runs, keeping A the same as your experimental A but changing B: in one, B *must* differ from A (a positive result is expected from the magnitude of the difference; this could be an incremental difference test, thereby also checking sensitivity), and in the other, B *cannot* differ (a phantom switch). These would of course mean you are using a *different* B, in some sense, than your experimental B. They would merely demonstrate that the ABX setup works and that the subject's hearing is intact.
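A descending 'training run' of the sort described above might look like this sketch; the step sizes and the listener callable are invented for illustration:

Code

import random

def training_staircase(listener, start_db=6.0, step_db=1.0, floor_db=0.25):
    """Present a level difference that starts obvious and shrinks after
    each correct answer, stopping at the first miss. The last size
    answered correctly is a rough sensitivity estimate, and the run
    confirms the rig and the subject's hearing before the real test.
    listener(diff_db) stands in for one trial; True means correct."""
    diff, last_correct = start_db, None
    while diff >= floor_db:
        if not listener(diff):
            break                   # first miss ends the descent
        last_correct = diff
        diff -= step_db             # make the next trial harder
    return last_correct             # None: even the obvious case failed

# Toy subject with a ~1.5 dB threshold; below it, answers are guesses.
def sim_listener(diff_db, threshold=1.5):
    return diff_db >= threshold or random.random() < 0.5

print("smallest difference reliably heard:", training_staircase(sim_listener))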



Quote
There seems to be a great reluctance to address such a mechanism.



So say you.  There are also comparison tests that are better suited to testing different propositions.  Your own ignorance of them is not a sign of 'reluctance'.

There does seem to be a great reluctance of DAC makers -- and champions of certain DACs -- to run and publish DBTs involving their gear, though.


Meanwhile there have been many, many attempts to critique ABX...perhaps the most recent being the 'cognitive load' ploy from Meridian (which is amusingly at odds with your critique -- if anything it suggests listeners are *trying too hard*). Your belief that an accumulation of nulls is due to many subjects not 'really listening' is simply that: your belief, absent some good evidence. It does not accord with either my personal ABX experiences or with what I have seen on this forum, which is perhaps the largest repository of ABX results online.

How do you listen to an ABX test?

Reply #271
By changing the volume of X every now & then you are introducing a small change which should be audible. I doubt this would confuse
With respect, there speaks the voice of one who doesn't know what they're talking about. You really haven't thought this through at all.

Quote
- how are hidden references used in blind tests?

...

Sure, hidden references are recommended for blind tests in the standards documents - why?
They aren't included for no reason - they are a way for the test to check itself - something that is sadly missing in ABX testing.
This is what I'm trying to get across here.
Known audio problems (which aren't hidden references - apologies for using the word for the opposite concept!) are included in the double-blind medium scale listening tests carried out right here.

You're posting in the very same forum and sub-forum as the results from these tests. Go on - be a little bit curious and go and look at the one from last year. FAAC q30 was the audibly different thing that everyone should have been able to spot. You might find some of my own comments aren't 100 miles away from yours. The difference being, of course, that I spent some time doing over 60 sets of double-blind comparisons before commenting.


It's strange to want to debate something that you know so little about. I can understand you wanting to ask questions to learn about it from the people who invented it and the people who have done it a lot. That would make sense. But to be so keen to find folks who understand something of which you yourself have so little experience and understanding, just to tell them how wrong and flawed the thing is - it's just weird.

In most real-life situations, most people would keep quiet until they'd learned a bit more.

Still, the faceless world of the internet does weird things to people. For better and worse.

Cheers,
David.

How do you listen to an ABX test?

Reply #272
A good start in eliminating cheating is to look at examples of cheating that have been perpetrated in the past. Have you got these examples?

Yes, although I am not in a position to prove that it was being done consciously and on purpose. It is often conceivable that the individuals were simply incompetent or negligent rather than malicious, but sometimes that's hard to believe, and I'm compelled to assume that there must have been an element of maliciousness. Sometimes I'd say that they were deliberately ignorant and obstinate. I can't look into other people's heads, but it doesn't matter for the end result.

Examples are: Fiddling with the statistics, trying to extract a significance where there isn't one. Moving the goalpost after the fact. Trying to selectively exclude unwelcome results using dubious arguments. Inventing reasons for dismissing a test after the fact, even though they previously had accepted the terms. Not revealing a clue which gave away the result, thereby pretending the test was valid when it wasn't.

And my suggestion for internal controls would eliminate some of this revisionism, don't you think? If it was shown in the results that the hidden references were differentiated in the test then it would go a long way towards showing that the test was at least sensitive enough to reveal the impairment level in these hidden references.

How do you listen to an ABX test?

Reply #273
By changing the volume of X every now & then you are introducing a small change which should be audible. I doubt this would confuse
With respect, there speaks the voice of one who doesn't know what they're talking about. You really haven't thought this through at all.

Quote
- how are hidden references used in blind tests?

...

Sure, hidden references are recommended for blind tests in the standards documents - why?
They aren't included for no reason - they are a way for the test to check itself - something that is sadly missing in ABX testing.
This is what I'm trying to get across here.
Known audio problems (which aren't hidden references - apologies for using the word for the opposite concept!) are included in the double-blind medium scale listening tests carried out right here.

You're posting in the very same forum and sub-forum as the results from these tests. Go on - be a little bit curious and go and look at the one from last year. FAAC q30 was the audibly different thing that everyone should have been able to spot. You might find some of my own comments aren't 100 miles away from yours. The difference being, of course, that I spent some time doing over 60 sets of double-blind comparisons before commenting.


It's strange to want to debate something that you know so little about. I can understand you wanting to ask questions to learn about it from the people who invented it and the people who have done it a lot. That would make sense. But to be so keen to find folks who understand something of which you yourself have so little experience and understanding, just to tell them how wrong and flawed the thing is - it's just weird.

In most real-life situations, most people would keep quiet until they'd learned a bit more.

Still, the faceless world of the internet does weird things to people. For better and worse.

Cheers,
David.

David, you seem like a reasonable person & I'm sorry if I come across as somebody who doesn't know what they're talking about - I believed that this discussion was the kind of debate in which I would learn something. But OK, let me ask: why is it considered inappropriate to include the internal controls I suggested in the Foobar ABX software? You say it will confuse, but I can't see how - can you explain your thinking some more, please?

Maybe my suggested control in Foobar ABX is not practical or not workable or not useful, but so far I don't see this.

Irrespective of the practicality of my suggestion, is the principle of trying to separate valid from invalid null results objected to? I haven't seen this actually answered. If the principle is agreed, is there no interest in working towards a way of doing this?

I see this applies to the blind test I was directed to:
Quote
Kamedo2, please, correct "Post-screening":

Quote

If you rank the low anchor at 5.0, your result of the sample will be invalid.
If you rank the mid-low anchor at 5.0, your result of the sample will be invalid.
If you rank the low anchor higher than the mid-low anchor, your result of the sample will be invalid.
If you rank the reference worse than 4.5, your result of the sample will be invalid.
If you rank the reference worse than 5.0 on 25% or more of submitted results, all of your results will be invalid.
If you submit 25% or more invalid results, all of your results will be invalid.


I just can't see why the same attempt at valid/invalid test discrimination isn't applied to ABX testing.

BTW, I'm under no illusion that I'm introducing anything novel - I have cited the ITU standards documents many times in reference to the inclusion of internal controls in blind tests - I just wanted to see how this could be applied to Foobar ABX tests

Maybe my test won't work & I haven't really thought through the practicality of using it - I know that ABX is a forced-choice test, but if the listener is made aware that some random hidden anchors/controls will be included & they should just register when they discern them, would this be workable?
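For what it's worth, the post-screening rules quoted above are mechanical enough to express in a few lines. A sketch - key names like 'low_anchor' are my invented labels for the stimuli rated on the 1.0-5.0 scale used in those tests:

Code

def sample_valid(r):
    """r maps stimulus -> rating on the 1.0-5.0 scale for one sample.
    (The key names are illustrative labels, not the tests' own.)"""
    if r["low_anchor"] >= 5.0:                 # low anchor rated transparent
        return False
    if r["mid_low_anchor"] >= 5.0:
        return False
    if r["low_anchor"] > r["mid_low_anchor"]:  # anchors ranked upside down
        return False
    if r["reference"] < 4.5:                   # hidden reference marked bad
        return False
    return True

def listener_valid(samples):
    """Whole-listener screening: reject everything if 25% or more of the
    results are invalid, or the reference is rated below 5.0 on 25% or
    more of the submitted results."""
    n = len(samples)
    bad = sum(1 for s in samples if not sample_valid(s))
    ref_low = sum(1 for s in samples if s["reference"] < 5.0)
    return bad / n < 0.25 and ref_low / n < 0.25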

How do you listen to an ABX test?

Reply #274
In an ABX test (or any listening test, sighted or blind) you try to figure out what the difference is. Mostly (not always, but mostly) you then try to detect that difference again and again. That's how you pass an ABX test. That's how I do it, anyway.

A level difference is almost certainly not what you're looking for. Worse, a small level difference might not be perceived as a level difference, but as some other change. I've been tricked by this myself. It can sound like more or less bass, treble, clarity, sound stage - you name it. It's got to be comparatively big before you can unambiguously say "the main change, maybe the only change, is the level." By that stage, it's obvious, and even the person who wasn't really listening properly will notice it, so it probably doesn't help you. Worse though is that at lower levels, it could interfere with finding the exact difference that the attentive listener has homed in on.


You can't force someone to listen. But the context in which an ABX test makes most sense is when someone thinks they hear a difference in a sighted test (i.e. normal listening). Then you take the test to prove (first to yourself) that you really hear something. In that circumstance, I can't see why someone wouldn't listen.

I know some people do do it, but having failed to hear any difference in a sighted test, I've never felt the need to "prove" to myself that I don't hear a difference by failing an ABX test. What I have done sometimes is simply not been sure. Then I've done the ABX test, and carefully (not casually, but carefully) made my best guess when I've still not been sure, and looked at the results to see if there was anything in my hunches.

Proper psychoacoustic tests do the same thing BTW, varying the amplitude of the difference until they find the point at which your guesses are correct, say, 70% of the time. I've created, run, and taken part in those tests, and when you find yourself doing it, sometimes it's like magic right at that threshold because you can't "hear" the difference enough to be certain you heard anything, but you can guess, and those guesses are mostly right.

Cheers,
David.
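The "say, 70%" figure above is no accident: it is close to where a common adaptive rule settles. As one example (a standard 2-down/1-up staircase, not necessarily the exact rule used in the tests David describes), the level stops drifting where two correct answers in a row are a coin flip:

Code

# 2-down/1-up: harder after two consecutive correct answers, easier
# after any wrong one. Equilibrium: P(two correct in a row) = 0.5,
# i.e. p**2 = 0.5, so p = sqrt(0.5) ~ 0.707 -- about 70% correct.
p = 0.5 ** 0.5
print(f"2-down/1-up staircase converges near {p:.1%} correct")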