HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: item on 2012-10-08 19:21:38

Title: Overcoming the Perception Problem
Post by: item on 2012-10-08 19:21:38
I've noticed an increase in forum debate about the validity of transferring the credibility of ABX from the physical domain to perception testing. I'm wondering if anyone has found a way past this issue?

The purpose of blind testing is to subtract subjectivity from the effect of - for instance - a drug trial: to assess a medication's impact on a subject's physiology without interference from their psychology. But what about when the purpose of a test is subjective perception? How do we then subtract the effect of the method to arrive at a meaningful outcome?

While we would like to remove expectation bias from the equation, if the conditions under which this is done also change the perceptive state of the listener, the test is invalidated as surely as it would be by tissue-sample contamination.

Recent large-scale public experiments by Lotto Labs (http://www.lottolab.org/) demonstrated that perceptual acuity is dramatically altered by test conditions: for instance, that time contraction/dilation effects are experienced when exposed to colour fields. In one experiment, two groups were asked to perform an identical fine-grained visual acuity test. One group was pre-emptively 'manipulated' by filling in a questionnaire designed to lower their self-esteem. This 'less confident' group consistently performed worse on the test than the unmanipulated one: their acuity was significantly impaired by a subtle psychological 'tweak' that wasn't even in effect during the test.

It seems undeniable that the much grosser differences between the mental states of sighted and 'blind' listening - considered generously - cast serious doubt on the results thus obtained.

The harder line is that blind perception tests are a fundamental misappropriation of methodology. In psychology it's axiomatic that for many experiments the subject must be unaware of the nature of the test (see Milgram). If a normalised state is not cunningly contrived, results are at best only indicative of what a subject thinks they should do; at worst, entirely invalid.

When probing hearing, the point is that the test must not change the mental state of the listener.

The contrast between outcomes of sighted and blind listening tests is as stark as those demonstrating suggestibility (see McGurk), but giving too much credence to such an intrinsically unsound experimental approach (not spotting this difficulty) does no favours to our credibility at all.

The only way past the dilemma seems to be direct mechanical examination of the mind during 'normal' listening to explore why the experiences of sighted and unsighted listening differ. This seems to be an interesting question. 

In the meantime, the idea that - despite the method problem - results from blind ABX are valid is at least supported by the majority of data derived from home testing, Audio DiffMaker et al, so we needn't get hung up on it.
Title: Overcoming the Perception Problem
Post by: Soap on 2012-10-08 19:38:48
You appear to be confused.

Despite common shorthand, ABX is not a test.  It is a method.  A method equally at home in a variety of settings.

Second you appear to be picking and choosing your literature.  You attack the idea of audio testing through a broadside against tests where the subjects are aware they are taking a test.  Ignoring the irrelevant mention of purposeful manipulation of tester state (are you honestly proposing that HA's "default" ABX routine somehow systematically influences the self esteem of participants?) you are ignoring the fact that there is no literature supporting your presupposition that tester awareness diminishes perceptual acuity.

Thirdly, you (without backing) conclude that "direct mechanical examination of the mind during 'normal' listening" must be employed, failing to defend, in the slightest, the idea that perception can be differentiated from subconscious neural activity.  I can directly measure my pulse rate through a variety of methods.  That is far from even hinting, much less proving, conscious perception of my pulse rate.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-08 19:52:29
To rephrase: what, in any trial, is blind testing designed to filter out?

Not an attack - no need to be combative! - it's self-evident that psych evaluations often depend crucially on the subject not being aware of the purpose of the test. Why?

I think you may have misunderstood the Lotto Labs experiments I referred to: maybe check them out. They aren't about manipulating the tester's state: they explore how (surprisingly) easily perceptual states are changed by environmental conditions: changing the subject's mind changes the subject's mind . . . .

The degree to which perception can be differentiated from subconscious neural activity is a whole different (and tangential) question.
Title: Overcoming the Perception Problem
Post by: Soap on 2012-10-08 19:59:49
To rephrase then: what, in any trial, is blind testing designed to filter out?

Bias
What doesn't need attacking, because it's self-evident, is that psych evaluations often depend crucially on the subject not being aware of the purpose of the test. Why is that?

So a test of perception is now a "psych" test?  (Whatever that lump is.)

I think you may have misunderstood the Lotto Labs experiments I referred to: maybe check them out. They aren't about manipulating the tester's state: they explore how (surprisingly) easily perceptual states are changed by environmental conditions: changing the subject's mind changes the subject's mind . . . .

Not in the slightest.  They aren't about manipulating the tester's state, but they are dependent upon manipulation of the tester's state. 

Regardless, you dodged the question.  Call it "tester's state", call it "environmental conditions", call it what you will.  Where is your argument, much less your evidence, that the manner of ABX testing practiced creates a systematic bias in "environmental conditions"?  The Lotto experiments were dependent on such a systematic influence.  If you can't demonstrate one They Are Not Relevant.

The degree to which perception can be differentiated from subconscious neural activity is a whole different (and tangential) question.


Agreed, it is totally off topic and undefended.  But it is one you brought up.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-08 23:32:53
To rephrase then: what, in any trial, is blind testing designed to filter out?

Bias

Partly, yes: more specifically, in the clinical domain 'blindness' separates the psychological from the physiological. From the perspective of a drugs trial, psychological factors are generally extraneous and need to be excised from the process. From the perspective of an auditory trial, 'psychological factors' are the subject of the test.

I think you may have misunderstood the Lotto Labs experiments I referred to: maybe check them out. They aren't about manipulating the tester's state: they explore how (surprisingly) easily perceptual states are changed by environmental conditions: changing the subject's mind changes the subject's mind . . . .


Not in the slightest. They aren't about manipulating the tester's state, but they are dependent upon manipulation of the tester's state. Regardless, you dodged the question.  Call it "tester's state", call it "environmental conditions", call it what you will.  Where is your argument, much less your evidence, that the manner of ABX testing practiced creates a systematic bias in "environmental conditions"?  The Lotto experiments were dependent on such a systematic influence.  If you can't demonstrate one They Are Not Relevant.


Again, the state of the tester is irrelevant: it's the subject we're interested in - the testees, if you will - and the test environment. You're not making yourself clear: are you saying you don't like the conclusions of the experiments I referred to? Or that environment makes no difference to perception? Or are you claiming that listening to music for pleasure, through known equipment, is materially the same as listening analytically under test conditions, 'blind' to what you're hearing?

The degree to which perception can be differentiated from subconscious neural activity is a whole different (and tangential) question.


Agreed, it is totally off topic and undefended.  But it is one you brought up.

As it seems to me that you introduced this irrelevant topic - and apparently vice versa - can we agree to move on?! I don't want to get bogged down redefining the problem, but would like to begin discussing solutions . . .
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 00:30:37
Part of Beau Lotto's 'Public Perception' project was documented by the BBC Horizon programme in 2011: http://www.bbc.co.uk/programmes/b013c8tb (http://www.bbc.co.uk/programmes/b013c8tb)
http://www.lottolab.org/programmes-article_humanperception.asp (http://www.lottolab.org/programmes-article_humanperception.asp)

He expressed surprise that such a subtle a priori manipulation of the subject's self-esteem (of all things!) would depress visual acuity to the extent it did. Particularly germane to this discussion is that the test required subjects to distinguish two subtly different colours.

That particular experiment neatly illustrates the problem, but the issue doesn't hinge on any single experiment: the point is general to all such tests. If the method in any way modifies the 'normal' state of the listener, the data will be invalid, and subsequent statistical analysis is a fool's errand. Attempts to borrow the credibility of drug trials for the purpose of a perception test fatally misunderstand the purpose of such testing and are (at best) sloppy.

Although it's tempting to reach for the conclusion that almost everything is identical, there is an equally valid interpretation of results generated by DBT tests which invariably demonstrate a diminution of differences (oranges become more like lemons, Stradivari become more like toys, speakers become more like speakers) that seem apparent when sighted: namely that the method itself results in a diminution of differences. The more 'blunt tool' results emerge from blind perception tests, the less credible they look.
Title: Overcoming the Perception Problem
Post by: DVDdoug on 2012-10-09 00:59:52
I just don't see how making a good scientific listening test blind ever makes the results less reliable.  There are, however, cases where making the test non-blind clearly makes it less reliable.

Quote
In psychology it's axiomatic that for many experiments the subject must be unaware of the nature of the test (see Milgram).
In my limited study of psychology, a lot of psychological experiments seem to involve outright deception.  All of the lying to the subjects has always made me a bit uncomfortable.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-09 03:51:10
Partly, yes: more specifically, in the clinical domain 'blindness' separates the psychological from the physiological. From the perspective of a drugs trial, psychological factors are generally extraneous and need to be excised from the process. From the perspective of an auditory trial, 'psychological factors' are the subject of the test.


They can be so in medical DBTs too, e.g., the effectiveness of a drug or treatment as a pain reliever.


Quote
Although it's tempting to reach for the conclusion that almost everything is identical, there is an equally valid interpretation of results generated by DBT tests which invariably demonstrate a diminution of differences (oranges become more like lemons, Stradivari become more like toys, speakers become more like speakers) that seem apparent when sighted: namely that the method itself results in a diminution of differences. The more 'blunt tool' results emerge from blind perception tests, the less credible they look.


This is an argument from incredulity.  You just can't believe that your/our sighted perceptions could be so *wrong*. 

Explain, then, how people DO get it so wrong, in the case where two bottles of the same wine 'taste' vastly different, depending on how they are labelled. Or in the case where the listener 'hears' a vast difference between unit A and unit B, when in fact unit B has never even been put into the circuit.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-09 04:54:56
@item:
Perhaps you could share with us a little about who you are so that we can put your point of view into proper perspective.

This site has TOS #8 in place to keep the signal-to-noise ratio high. As has been aptly pointed out, sighted tests provide absolutely no guarantee of reliability, whereas positive double-blind tests do.  The only concern on the table that could possibly have any validity (I am being generous) is that double-blind testing might reduce the sensitivity of the person taking the test.  This is perfectly fine, since a failed DBT is not used as universal proof that two things must sound the same, which is generally where those arguing on behalf of placebophiles get tripped up.  FWIW, as a professional tester I can tell you that I actually pay closer attention to detail when I am consciously involved in a test, despite DBT skeptics and snake oil salesmen telling me that I can't or don't.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-09 09:57:21
a failed DBT is not used as universal proof that two things must sound the same


Quoted for truth. Or more generally: we are seeking evidence for differences. That a test fails to verify a difference does not imply that it falsifies one.


There are many cases where a DBT would fail to demonstrate (real) differences. E.g., there could be a difference only 1 out of 3 could ever detect. Then you (as one person) would more likely than not fail. Or it could be that your hit rate is only slightly above the 50/50 mark. Better than coinflipping, but you need a bit of data to actually verify it. Not enough data --> no verification.
... but then: would any of these issues be resolved by simply equipping the test subject -- or the person administering the test -- with a set of bias-provoking prejudices?

Now there are cases where DBTs are not too feasible. Heart transplants, for example. You don't open a patient, remove the heart and blindfold the surgeon before flipping the coin over whether the old or the new heart is to be inserted (and then make sure the patient does not know). And who has tried to DBT different live venues? That does not mean that the double-blindness is bad, just that it might be hard to obtain.
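To make the sample-size point concrete, here is a quick binomial sketch (the hit rates and trial counts below are purely illustrative, not from any actual test):

```python
from math import comb

def binom_p_value(hits: int, trials: int) -> float:
    """One-sided p-value: the chance of scoring `hits` or more correct
    out of `trials` if the listener is purely guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(hits, trials + 1)) / 2 ** trials

# A listener whose true hit rate is only slightly above the 50/50 mark:
print(binom_p_value(10, 16))   # ~0.227: 10/16 correct fails the usual 5% bar
print(binom_p_value(60, 100))  # ~0.028: the same ~60% rate passes at 100 trials
```

So a 60%-ish hit rate looks like coinflipping over a short test and only becomes verifiable with more data, which is exactly the "not enough data --> no verification" case.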
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-09 10:53:38
Part of Beau Lotto's 'Public Perception' project was documented by the BBC Horizon programme in 2011: http://www.bbc.co.uk/programmes/b013c8tb (http://www.bbc.co.uk/programmes/b013c8tb)
http://www.lottolab.org/programmes-article_humanperception.asp (http://www.lottolab.org/programmes-article_humanperception.asp)

He expressed surprise that such a subtle a priori manipulation of the subject's self-esteem (of all things!) would depress visual acuity to the extent it did.

[...]

Although it's tempting to reach for the conclusion that almost everything is identical, there is an equally valid interpretation of results generated by DBT tests which invariably demonstrate a diminution of differences (oranges become more like lemons, Stradivari become more like toys, speakers become more like speakers) that seem apparent when sighted: namely that the method itself results in a diminution of differences.



You can of course link to reliable scientific metastudies documenting that 'invariably' and that the differences that disappear are for real? Not that the effect is proven every now and then in certain setups, but that it 'invariably' is so?

And still this is no excuse for uncontrolled tests. Under the (hardly controversial) hypothesis that test setups can manipulate the test subjects in a manner crucially affecting the results, one takes measures to test ceteris paribus; that is, at least until one can establish that a certain manipulation introduced to the test setup is more likely to get you true answers (not only cocksure answers) and can be administered reliably. Good if we can find one (and in certain setups one can isolate the effect of this and correct for it, which is unlikely to happen if a marketing guy is about to convince you).

Look, the scientific conventions are themselves biased towards the null hypothesis: one has accepted standards which by themselves fail to accept lots of true answers, because one does not want to accept false answers. That means that a lot of facts will have the status of unconfirmed hypotheses for a long time. If the only test result that could possibly indicate a difference is prone to indicate a false difference stemming from mumbo-jumbo marketing, then one should not accept it as evidence. Too bad if the difference is for real, but that's the cost of not being gullible. (Too bad if the millions I was offered from Nigeria yesterday were for real too, but heck ...)
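One can even quantify that bias towards the null. A sketch, assuming a simple one-sided binomial ABX test (the 60% hit rate is an illustrative assumption, nothing more):

```python
from math import comb

def power(true_p: float, trials: int, alpha: float = 0.05) -> float:
    """Chance that a listener with true hit rate `true_p` passes a
    one-sided binomial ABX test of `trials` trials at level `alpha`."""
    # Smallest score whose tail probability under pure guessing is <= alpha.
    crit = next(k for k in range(trials + 1)
                if sum(comb(trials, j) for j in range(k, trials + 1)) / 2 ** trials <= alpha)
    # Probability this listener reaches that score.
    return sum(comb(trials, k) * true_p ** k * (1 - true_p) ** (trials - k)
               for k in range(crit, trials + 1))

print(power(0.6, 16))   # ~0.17: a genuinely better-than-chance listener usually fails
print(power(0.6, 100))  # the same listener passes more often than not at 100 trials
```

That is the convention working as intended: many true positives are left unconfirmed precisely so that false positives are kept out.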
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 11:14:46
Although it's tempting to reach for the conclusion that almost everything is identical, there is an equally valid interpretation of results generated by DBT tests which invariably demonstrate a diminution of differences (oranges become more like lemons, Stradivari become more like toys, speakers become more like speakers) that seem apparent when sighted: namely that the method itself results in a diminution of differences. The more 'blunt tool' results emerge from blind perception tests, the less credible they look.


This is an argument from incredulity.  You just can't believe that your/our sighted perceptions could be so *wrong*.


Easy tiger: that's a straw man - heck, can we do better than reach for these lazy forumspeak dismissals? Not an argument from incredulity: I'm simply describing results generated by DBT perception testing in the least controversial manner possible, but referring to an equally valid interpretation.

Explain, then, how people DO get it so wrong, in the case where two bottles of the same wine 'taste' vastly different, depending on how they are labelled. Or in the case where the listener 'hears' a vast difference between unit A and unit B, when in fact unit B has never even been put into the circuit.


That's the interesting question, isn't it? It's desirable to remove the powerful filter of expectation bias in such tests - in an attempt to reach the 'actual experience' rather than the 'perceived experience' of the subject (becoming a slippery concept at this point). It's self-evident that such bias operates negatively: we don't need further proof that people see and hear things that aren't there if they receive sufficient suggestion. Blah blah QED.

But what is expectation bias, and why is it such a strong force in our perception? Broadly, EB is a key part of our mental mechanism for predictive modelling and pattern-building. We are very powerfully hardwired to gather clues from our immediate environment from which to build a framework for what happens next.

Sensory deprivation has profoundly disorienting effects - particularly time sense - because it removes the markers needed to build perceptive frameworks. Similarly, being fed false or uncertain cues, or being deprived of them altogether (the crucial 'blind' part of a perception test) may impair construction of a perceptive framework: the ability to identify and model characteristic differences between unknown stimuli.

Removing expectation bias throws out the baby with the bath-water.

Beyond doubt, though, is that creating this environment is a major shift in the mental state of the subject: there is precedent for such 'panic response' modes accelerating (adrenal reflexes) and repressing (quiz show contestant) brain function, but the overwhelmingly homogenised results generated by DBT point strongly to the latter. Perhaps there is useful research on this somewhere: I'm not aware of it.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 11:42:37
@item:
Perhaps you could share with us a little about who you are so that we can put your point of view into proper perspective.


What do you mean by 'proper perspective'? Am I looking at an ad hominem warmup or a chat-up line?

This site has TOS #8 in place to keep the signal to noise ratio high. As has been aptly pointed out, sighted tests provide absolutely no guarantee of reliability, whereas positive double-blind tests do.  The only concern on the table that could possibly have any validity (I am being generous) is that double-blind testing might reduce the sensitivity of the person taking the test.  This is perfectly fine since a failed DBT is not used as universal proof that two things must sound the same which is generally where those arguing on behalf of placebophiles get tripped up.


Positive DBT is inherently cast-iron. The problem is that negative results equally indict the efficacy of the method: DBT perception tests generate results with poor resolution, conforming suspiciously well to the 'bad test' model - i.e. they generate positives for gross phenomena but fail to recognise fine-grained distinctions. Wrong sieve size is a plausible diagnosis. Given that the test is misappropriated from a different domain and therefore - by definition - crudely tampers with its objective, this isn't surprising.

FWIW, as a professional tester I can tell you that I actually pay closer attention to detail when I am consciously involved in a test, despite DBT skeptics and snake oil salesmen telling me that I can't or don't.

That's exactly the point: test conditions create an environment in which you have to 'pay closer attention' - in reality, listen in an entirely different way, disorientated and deprived of cues. For a psych test, that's inadmissible.

Again, the purpose of DBT is to remove subjectivity as a factor. It can't legitimately be applied with any degree of precision to a study of subjectivity. Negative DBT results in the physiological domain are always open to question, but in this domain they aren't even interesting, and it's an embarrassment to the cause to see such faith placed in them.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 12:23:31
You can of course link to reliable scientific metastudies documenting that 'invariably' and that the differences that disappear, are for real? Not that the effect is proven every now and then in certain setups, but that it 'invariably' is so?

And still this is no excuse for uncontrolled tests. Under the (hardly controversial) hypothesis that test setups can manipulate the test subjects in a manner crucially affecting the results, one takes measures to test ceteris paribus; that is, at least until one can establish that a certain manipulation introduced to the test setup is more likely to get you true answers (not only cocksure answers) and can be administered reliably. Good if we can find one (and in certain setups one can isolate the effect of this and correct for it, which is unlikely to happen if a marketing guy is about to convince you).


It would have been interesting to learn from that experiment the ratio of self-esteem depression to acuity. Then again, how is that measured? Beau Lotto is a frequent speaker on TED: his Public Perception project is worth a look.

This forum is a good repository of DBT acuity depression 'invariability' - or, by another interpretation of the same results - The Truth. However, the latter interpretation places an uncomfortably unquestioning faith in the application of the method.

Visual acuity is more amenable to study: direct function of grey matter percentage! Auditory perception is highly perturbable and poorly reportable. It requires subtle - and direct - interrogation of mental processes. You can't inspect Brownian motion while shaking the flask.

Look, the scientific conventions are themselves biased towards the null hypothesis: one has accepted standards which by themselves fail to accept lots of true answers, because one does not want to accept false answers. That means that a lot of facts will have the status of unconfirmed hypotheses for a long time. If the only test result that could possibly indicate a difference is prone to indicate a false difference stemming from mumbo-jumbo marketing, then one should not accept it as evidence. Too bad if the difference is for real, but that's the cost of not being gullible. (Too bad if the millions I was offered from Nigeria yesterday were for real too, but heck ...)

'Null' definition depends on the intent of the test. It's perfectly proper that we should rail against commercial exploitation: if you tell someone an amplifier costs £5000, they will likely believe it sounds better than a £500 one, even if the labels are swapped. But there's a disturbing lack of rigour that muddies the waters when the intent of widespread homebrew tests is retaliatory, not exploratory. It's ironic that negative DBT results are spun, with evangelical zeal, in ways as misleading as any manufacturer's hype - particularly when based on a spurious borrowing.

Fundamentalists of either stripe cling to mechanical measurements and DBT negatives on one side of the fence, and the primacy of experience on the other. Personally, I'm reluctant to come down on one side or the other, because of fundamental conflicts and flaws in the position of both camps. But I do think we should be open-eyed and honest about it: this is one time being blind doesn't help.
Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-09 12:35:33
The only people I know of to shun the DBT method of testing audio equipment are those who live off it in some way (editors/writers of hifi magazines, hifi salesmen) and people who believe in their own superiority over the common plebs in terms of hearing. The first kind knows that utilizing DBT in their reviews would make sales plummet, with the loss of income through paid reviews and advertising. The second kind are often technologically disabled, and are more prone to explain things to themselves (and, more dangerously, to others) through pure magic and rituals than to actually learn what is going on, because the bubble they live in would burst.
Now, you say DBT testing would somehow influence the listener and he wouldn't hear the difference because of reasons. Bear in mind that these people often claim the "sky-earth" difference between two DACs, for example, so I hardly believe that testing that difference over the course of time (a month, a year) would involve any stress, and that they wouldn't hear even the tiniest difference, if it really exists.
That argument is so invalid - if you are so easily affected by switching buttons from A to B, X to Y, then I am sure that every listening to the same song is a new experience and it sounds different altogether. And that difference either is there or is not; it does not exist only when we are casually listening to music. Humans can't telepathically affect the bitstream in DACs or optical cables yet. It doesn't care what you are feeling, it just streams and decodes, over and over again, every time you play the song.
I really don't care how ABX testing in medical research works - I'm not into medicine at all, and for Hydrogenaudio's sake, it shouldn't matter. The only thing that matters is the audio ABX test, which lets individuals see if they really can hear a difference between two codecs, or two DACs, if they have the equipment to set this up. Individuals set up the testing environment as they prefer (I like drinking cocoa, for example), and the test is straightforward in its results - either you can hear the difference, or you can't. If you can't, that doesn't mean there is none; it just means that you can't hear it. Someone else might.
So, why do you try so hard to convince us that ABX isn't valid method?
Title: Overcoming the Perception Problem
Post by: aethelberht on 2012-10-09 13:17:02
What I can perhaps maybe possibly gather from your posts is that blind perception experiments are crude. What I don't gather is how the negative results have any impact on the positive results.

Ideally you would stop rambling and be more concise, but at the very least, you should explain explicitly what it is that is faulty. You begin by questioning "the credibility of ABX from the physical domain to perception testing." If it is this broad challenge, then consider the fact that Signal Detection Theory and discrimination experiments, ABX being one of them, have widespread use in the speech perception literature. Are you suggesting that subjects performing well in spite of the lack of cues leads to flawed conclusions?
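(For reference, the sensitivity index d' that Signal Detection Theory applies to such discrimination experiments is straightforward to compute - a sketch, with made-up hit and false-alarm rates:)

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d': separation, in standard deviations, between
    the 'signal' and 'noise' distributions implied by the two rates."""
    z = NormalDist().inv_cdf  # convert a rate to a z-score
    return z(hit_rate) - z(false_alarm_rate)

print(d_prime(0.8, 0.2))  # ~1.68: a reasonably sensitive observer
print(d_prime(0.5, 0.5))  # 0.0: performance indistinguishable from chance
```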

But the issue closer at hand seems to be the much narrower one, of the retaliatory use of negative ABX results as evidence that things sound the same. And yet you keep responding to comments about the scientific (in)validity of this with comments such as "The problem is that negative results equally indict the efficacy of the method" without explaining how this is so.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 13:25:04
The only people I know of to shun the DBT method of testing audio equipment are those who live off it in some way (editors/writers of hifi magazines, hifi salesmen) and people who believe in their own superiority over the common plebs in terms of hearing. The first kind knows that utilizing DBT in their reviews would make sales plummet, with the loss of income through paid reviews and advertising. The second kind are often technologically disabled, and are more prone to explain things to themselves (and, more dangerously, to others) through pure magic and rituals than to actually learn what is going on, because the bubble they live in would burst.
Now, you say DBT testing would somehow influence the listener and he wouldn't hear the difference because of reasons. Bear in mind that these people often claim the "sky-earth" difference between two DACs, for example, so I hardly believe that testing that difference over the course of time (a month, a year) would involve any stress, and that they wouldn't hear even the tiniest difference, if it really exists.
That argument is so invalid - if you are so easily affected by switching buttons from A to B, X to Y, then I am sure that every listening to the same song is a new experience and it sounds different altogether. And that difference either is there or is not; it does not exist only when we are casually listening to music. Humans can't telepathically affect the bitstream in DACs or optical cables yet. It doesn't care what you are feeling, it just streams and decodes, over and over again, every time you play the song.
I really don't care how ABX testing in medical research works - I'm not into medicine at all, and for Hydrogenaudio's sake, it shouldn't matter. The only thing that matters is the audio ABX test, which lets individuals see if they really can hear a difference between two codecs, or two DACs, if they have the equipment to set this up. Individuals set up the testing environment as they prefer (I like drinking cocoa, for example), and the test is straightforward in its results - either you can hear the difference, or you can't. If you can't, that doesn't mean there is none; it just means that you can't hear it. Someone else might.
So, why do you try so hard to convince us that ABX isn't valid method?

I think perhaps you underestimate the general level of the public's intelligence. Anyone who buys a piece of audio equipment knows that most visitors to their house will point out that this system sounds pretty similar to the last one they were excited about.

At the back of their mind, most buyers know that past a certain basic level of competence, expensive equipment is all counting the number of angels dancing on a pinhead. But people go to see illusionists because the illusion is fun. People buy fancy boxes because there is pride of ownership - and, away from the hype, alone in their living room, for whatever reason, there is an absolutely real, fundamental sense of pleasure in music reproduction that may, or may not, derive from the measured performance of the boxes. There is also the unshakeable fact that humans are status-driven, and that audio equipment is a status symbol, just like a car.

This may all be wrong, but it will persist. It is unaffected by our little erudite discussions about what ultimately can be measured or perceived.

I also personally know a number of editors, writers, salesmen and manufacturers: some do think they are superior to the plebs, but not usually because of their hearing - it's just their character. Similarly, I know a number of judges and car salesmen and postmen who feel exactly the same way. It's also completely untrue that DBT is unused in the audio industry: it's a standard tool for many makers and reviewers.

The sole, specific point I'm making is that DBT is rarely used in perception testing for obvious reasons outlined above, and attempting to smear its credibility from the physiological domain is intellectually dishonest. And that the abundance of negative results indicates coarse granularity in the test method as much as it supports any particular paradigm.
Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-09 13:54:06
The sole, specific point I'm making is that DBT is rarely used in perception testing for obvious reasons outlined above, and attempting to smear its credibility from the physiological domain is intellectually dishonest. And that the abundance of negative results indicates coarse granularity in the test method as much as it supports any particular paradigm.


The only reason there are so many "failed" ABX tests is simple: people tend to believe in magickal beings living in their amps and speakers and wires and headphones. But when put in front of a magnifying glass, those little creatures tend to disappear. The fault lies with humans believing in ghosts, not with the testing method. And that's it.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 14:01:06
What I can perhaps maybe possibly gather from your posts is that blind perception experiments are crude. What I don't gather is how the negative results have any impact on the positive results.

Ideally you would stop rambling and be more concise, but at the very least, you should explain explicitly what it is that is faulty. You begin by questioning "the credibility of ABX from the physical domain to perception testing." If it is this broad challenge, then consider the fact that Signal Detection Theory and discrimination experiments, ABX being one of them, have widespread use in the speech perception literature. Are you suggesting that subjects performing well in spite of the lack of cues leads to flawed conclusions?

But the issue closer at hand seems to be the much narrower one, of the retaliatory use of negative ABX results as evidence that things sound the same. And yet you keep responding to comments about the scientific (in)validity of this with comments such as "The problem is that negative results equally indict the efficacy of the method" without explaining how this is so.


Abstract:
A positive DBT result establishes reliably that two outcomes or entities differ.
A negative means - equally - either a) the two objects are identical, or b) that the method doesn't permit resolution of their differences.
Inherent in DBT is that 'blindness' de-normalises test conditions: the mechanism by which expectation bias is removed represses acuity by removing cues from which perceptive frameworks are built.
DBT is designed to separate subjective and objective responses and provide hard-to-falsify positive outcomes. Its suitability for physiological testing is not transferable to psychological testing but negative outcomes are aggressively touted as meaningful.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 14:09:09
The only reason there are so many "failed" ABX tests is simple: people tend to believe in magickal beings living in their amps and speakers and wires and headphones. But when put in front of a magnifying glass, those little creatures tend to disappear. The fault lies with humans believing in ghosts, not with the testing method. And that's it.


Sure, people believe in crazy things.

I'm more interested in what happens when subjects are put in front of a magnifying glass: whatever disappears from their mind in those conditions is just as interesting as whatever does or doesn't 'appear' (sonically) in the test room.
Title: Overcoming the Perception Problem
Post by: aethelberht on 2012-10-09 14:14:55
"A negative means - equally - either a) the two objects are identical, or b) that the method doesn't permit resolution of their differences."

Means equally to whom? You? "To be sure, statements have been made in the literature to the effect that human listeners simply cannot perceive certain auditory properties of speech sounds, and this has, of course, been grist for the psychophysical mill. Apart from dismissing such extreme claims..." (Rapp 1986 (http://www.haskins.yale.edu/sr/sr086/SR086_01.pdf))

"Its suitability for physiological testing is not transferable to psychological testing"
You haven't established this at all.

"but negative outcomes are aggressively touted as meaningful."
And when it's not so touted? Should we throw all positive experimental babies out with the bathwater because of overaggressive interpretation in other cases?
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-09 14:15:20
Positive DBT is inherently cast-iron. The problem is that negative results equally indict the efficacy of the method, and that DBT perception tests are anathema: they generate results with poor resolution: they conform suspiciously well to the 'bad test' model: ie, they generate positives for gross phenomena but fail to recognise fine-grained distinctions. Wrong sieve size is a plausible diagnosis. Given that the test is misappropriated from a different domain and therefore - by definition - crudely tampers with its objective, this isn't surprising.
With respect, this is demonstrably wrong. The core tests that probe the very limits of human hearing use blind testing, and deliver results that match predictions from the known physiology of the ear. To get these results takes careful training - people need to learn what to listen for before they can hear as well as the physiology would predict.


You've written many words, but like most blind testing bashing, it comes down to this: "when people are under test, they listen differently so we can't know what they really hear. If they don't know what they are listening to, they are even more stressed." What if they do know what they are listening to, like in most hi-fi magazine reviews? The people are still "under test", yet seem to hear just fine? Given that knowing what you are listening to is both the only differentiating variable, and a known feature that will give completely unreliable results, you are either wrong (that's my guess), or are correct and have just kicked audio into a "never possible to know" philosophical world.

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-09 14:37:35
Positive DBT is inherently cast-iron. The problem is that negative results equally indict the efficacy of the method, and that DBT perception tests are anathema: they generate results with poor resolution: they conform suspiciously well to the 'bad test' model: ie, they generate positives for gross phenomena but fail to recognise fine-grained distinctions. Wrong sieve size is a plausible diagnosis. Given that the test is misappropriated from a different domain and therefore - by definition - crudely tampers with its objective, this isn't surprising.
With respect, this is demonstrably wrong. The core tests that probe the very limits of human hearing use blind testing, and deliver results that match predictions from the known physiology of the ear. To get these results takes careful training - people need to learn what to listen for before they can hear as well as the physiology would predict.


You've written many words, but like most blind testing bashing, it comes down to this: "when people are under test, they listen differently so we can't know what they really hear. If they don't know what they are listening to, they are even more stressed." What if they do know what they are listening to, like in most hi-fi magazine reviews? The people are still "under test", yet seem to hear just fine? Given that knowing what you are listening to is both the only differentiating variable, and a known feature that will give completely unreliable results, you are either wrong (that's my guess), or are correct and have just kicked audio into a "never possible to know" philosophical world.

Cheers,
David.

'Knowing what you are listening to' is the troublesome variable. The power of suggestion derives from erecting a preliminary framework with given reference points. The mind obediently attaches incoming sense data to that superstructure because it's hard, slow work to build a model from scratch. Proprioception is another example of the mind building consistent models of external reality only via trial-and-error cycles (Oliver Sacks has a fine description of this in 'A Leg to Stand On').

Deprived of reference points, acuity suffers. Not badly enough to become deaf, obviously - but badly enough to diminish large variables to small ones, and make small ones vanish entirely - which isn't a bad one-line summary of DBT perception results: particularly with reference to hearing which - being driven by feebler mental horsepower - is more prone to suggestion (and more in need of supporting frameworks) than sight (hence McGurk).

Either this is wrong (as you say), or truly accurate testing of this type will come later, when we can directly, mechanically examine - and analyse - brain response without tampering with the subject's psychological state. Certainly not a 'never possible to know' scenario.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-09 14:53:36
DBT is designed to separate subjective and objective responses and provide hard-to-falsify positive outcomes.


I don't know why you introduced new terms, namely subjective and objective at this point.

You seem to be confused. Blind testing is merely used to control the various possible influences in a test. Objectivity and subjectivity need not be relevant.

For example I can control influences on a test so that subjectivity is maximized and objectivity is minimized. Or not. Or the opposite.

The opposite of a blind test is thus a completely uncontrolled test in which I have no idea what is or is not influencing the outcome.

This seems to be such a poorly-conditioned situation that it raises the question: why are you wasting your time doing this? ;-)

In the audio world non-blind testing is commonly used to make the outcome of the alleged test strongly influenced by expectations or other needs such as the need for an anecdote for a review promoting a certain product.


Quote
Its suitability for physiological testing is not transferable to psychological testing but negative outcomes are aggressively touted as meaningful.


Again you seem to be confused. Do you think that audio testing is physiological, psychological, or technical?  You seem to have excluded testing for technical purposes for some reason. This seems strange because audio listening tests are usually represented as being tests of audio products, not people.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-09 16:04:01
This forum is a good repository of DBT acuity depression 'invariability' - or, by another interpretation of the same results - The Truth.


Ehm ... are you sure you don't confuse The Truth with Conclusions Inferred From Reliable Evidence?

Scene I:
(1) You have an entertainment show, where you roll six dice in front of a large bunch of people.
(2) You take a sneak peek and know they are 1-2-3-4-5-6.
(3) I guess 1-2-3-4-5-6, and it turns out to be The Truth.
(4) Journalist calls up and wants an interview with this clairvoyant guy.

Obviously, it need not be that I am clairvoyant. The journalist has no control over the experiment, and doesn't know how far outside the null hypothesis this is (for that, one would at the very least need to know the number of people). Solution: do a controlled test.


Scene II: The journalist has no idea what a controlled test environment is, and puts you in front of me, rolling the dice.
(1) You roll six dice.
(2) You take a sneak peek and know they are 1-2-3-4-5-6.
(3) I am thinking really hard now. I guess ... 1? Your face lights up. I am meandering ... maybe another one, or maybe not, maybe a 2? Your face lights up. Et cetera.
(4) Wow! News story!

Here the issue is not the randomness. The issue is that the test is not double-blinded, and your behaviour influences my answers.


Now suppose for the sake of the discussion that clairvoyance isn't impossible. Suppose, for the sake of the discussion, that I am indeed clairvoyant -- I only need to think very hard first. In that case, that is The Truth. But is there any evidence for it? Sure not.
Title: Overcoming the Perception Problem
Post by: ExUser on 2012-10-09 17:09:48
I think perhaps you underestimate the general level of the public's intelligence. Anyone who buys a piece of audio equipment knows that most visitors to their house will point out that this system sounds pretty similar to the last one they were excited about.
I am a DJ and heavily-involved in my local electronic music scene. The amount of disinformation and outright lies that people who are professionally involved in audio reproduction believe is absolutely mind-boggling, even (especially?) without their paychecks revolving around selling hardware.
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-09 17:43:56
Deprived of reference points, acuity suffers. Not badly enough to become deaf, obviously - but badly enough to diminish large variables to small ones, and make small ones vanish entirely - which isn't a bad one-line summary of DBT perception results: particularly with reference to hearing which - being driven by feebler mental horsepower - is more prone to suggestion (and more in need of supporting frameworks) than sight (hence McGurk).
Are you claiming people are more prone to hearing things that aren't there than seeing things that aren't there - in respect of the brain "filling in gaps" to create what turns out to be an incorrect model of the real world? I'm not sure that's true.


Quote
Either this is wrong (as you say), or truly accurate testing of this type will come later, when we can directly, mechanically examine - and analyse - brain response without tampering with the subject's psychological state. Certainly not a 'never possible to know' scenario.
...but they'd already hacked cats' heads about to find out what signals went in/out of the auditory nerve long before it was possible to do this in a humane way. It's relevant because cats have cochleae similar to ours, and it was found that, like us, the losses (what you can't hear) derive from the air-to-neural transduction process.

So we've got cut up cats, predictions from physiology, and blind tests on humans all delivering the same "what difference is just audible" data - but you don't trust the blind tests?


Yet you'll trust the brain response. That's strange, since no one is doubting that placebo is a real brain response - it's just not a response to what you hear!  And if we measure some brain response when people are not aware of what they're listening to (i.e. some response to A that is absent to B), either this will be associated with a conscious audible difference, or not. If not, who cares. If so, then having reported hearing it when not knowing what they were listening to, they've just passed a blind test!

What does the brain scan add? Think this through. Draw a flowchart of the possibilities if it helps.

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: dhromed on 2012-10-09 19:11:29
'Knowing what you are listening to' is the troublesome variable.

Exactly. It's very troublesome. So we remove that variable in order to get better data.

Deprived of reference points, acuity suffers

You mean the "reference point" of knowing the brand of amplifier playing? Of knowing the encoder? We remove them, because they muddy the data. The only reference point you need is the first piece of audio that plays in your test.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-09 22:07:26
@item:
Perhaps you could share with us a little about who you are so that we can put your point of view into proper perspective.

What do you mean by 'proper perspective'? Am I looking at an ad hominem warmup or a chat-up line?

Let's just focus on the first part of my question and not worry about the second part.

Please share with us a little about who you are and who you might represent.

Quote
FWIW, as a professional tester I can tell you that I actually pay closer attention to detail when I am consciously involved in a test, despite DBT skeptics and snake oil salesmen telling me that I can't or don't.

That's exactly the point: test conditions create an environment in which you have to 'pay closer attention' - in reality, listen in an entirely different way, disorientated and deprived of cues.

Disorientated and deprived of cues?  You appear not to realize that double-blind testing can (and often does!) provide for the listener to audition the known subjects/samples as known.  To elaborate on the part that you quoted, during testing I pay closer attention to detail explicitly in order to listen for differences.  If I am listening casually, I do just that, which is to say relax and enjoy.  I have no doubt that my casual listening is done with less acuity. To put it another way, when was the last time someone accused you of paying close attention because you forgot something you were told and were expected to remember?

If part of my enjoyment has to do with knowing that my equipment or sample is XYZ then that is my business.  If I think I am actually perceiving something differently as a result I respect the forum and refrain from discussing it unless I can comply with its rules.

Quote
For a psych test, that's inadmissible.
Perhaps, perhaps not; I really don't care either way.

Quote
Again, the purpose of DBT is to remove subjectivity as a factor. It can't legitimately be applied with any degree of precision to a study of subjectivity.
...which is nonsense and another obvious display of ignorance.  MUSHRA is well accepted as a double blind test that provides for subjective grading.

Quote
Negative DBT results in the physiological domain are always open to question, but in this domain they aren't even interesting, and it's an embarrassment to the cause to see such faith placed in them.
According to you.  Personally I am really not interested in counting angels dancing on the head of a pin in order to describe the possibility of a difference when there exists no verified objective test data to confirm it.  This is especially the case when the difference is characterized as "night-and-day."

Back on the idea of perceiving differences as a result of having a priori knowledge of what is being heard, I don't think anyone with enough understanding is denying its power to foster truly tangible differences in the mind.  We simply aren't interested in reading about them here.  There are plenty of other places where you can indulge yourself.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-10 01:25:22
The sole, specific point I'm making is that DBT is rarely used in perception testing for obvious reasons outlined above, and attempting to smear its credibility from the physiological domain is intellectually dishonest.


And as I told you before,  double blind protocols are common in pain perception studies, where self-reports of perception are the outputs.  You going to address all that, or punt again? 

Quote
And that the abundance of negative results indicates coarse granularity in the test method as much as it supports any particular paradigm.


It 'indicates' that to you because you believe it should, not because it's necessarily true. And as I said before, this amounts to nothing more than an argument from incredulity, if not ignorance.

As 2bdecided already noted, DBTs can support audio signal level differences down to the physical limits of human hearing. Their *granularity* is quite good, if the protocol is good, your windy assertions notwithstanding. 

And now, Mr Joined-in-August,  I trust you won't mind if I sit back and await your predictable departure (back?) to more woozy audio forums, where you'll claim you were driven out of HA for 'unorthodoxy' or 'thinking outside the box'.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-10 13:09:52
While I certainly agree that putting humans in a lab situation might change their behaviour and even their perception (although, as Greynol points out, it is often likely to sharpen awareness), I still am not getting why this should ever be an argument against attempting to obtain a controlled environment. Certainly there are cases where it isn't feasible. E.g., if I spend a month rebuilding my living room to get rid of vibrations, damping standing waves (and adding extra insulation against the cold outside), there is no practically feasible way to ABX the old room against the new one [note*]. But I still don't see why the best practice of research would be to have the listening tests carried out after hearing me boast about how much work it was and how much I paid for it.

@item,
could you suggest a design of experiment where blinding the listener or the test administrator would be outright bad for the experiment?


Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-10 21:04:46
Abstract:
A positive DBT result establishes reliably that two outcomes or entities differ.
A negative means - equally - either a) the two objects are identical, or b) that the method doesn't permit resolution of their differences.


No, you are wrong about the second part. You don't set up a DBT of lossy audio encoders on supertweeters with FR from 35 kHz to 80 kHz. That method is not going to give anything useful. It's idiotic. Do you really think that people can't run valid DBT audio tests? It has been done numerous times in the past.

If you set up the test correctly, as expected, there will be only "yes, I can definitely hear the difference between A and B" and "no, I can't hear shit". It doesn't matter if the difference is in reality so subtle you can't hear it (lossless vs. high-bitrate lossy), because the only thing you are testing is whether you can hear that difference, and the results are YES and NO. There is no MAYBE UNDER CERTAIN CIRCUMSTANCES, because you then recreate those circumstances and repeat the test. As many times as you want, as long as you need.

The trouble is when you realize that you have all that 24/96 music, with a fancy expensive amp and loudspeakers hand-made from Siberian wood which grew in Tunguska at the crater (not to forget the hand-made speaker drivers), and you can't hear the difference between a 128 kbit AAC file and that high-bitrate lossless file. Then the imagination kicks in.

And you, sir, are the product of that mentality, which is contagious.
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-11 11:58:57
It's like The Princess and the Pea. Audiophiles all want to be Princesses, able to feel the pea no matter how many mattresses are between it and them.

Similarly, no matter how good the audio is, and how small a change you make, that change is always audible.

"I can hear a difference" = "I'm a real princess - I can feel the pea"!



I don't believe even a "Real Princess" could feel that pea - it only happens in fairytales, so I guess that means most hi-fi magazines are just that: fairytales.

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-11 14:20:41
If you set up the test correctly, as expected, there will be only "yes, I can definitely hear the difference between A and B" and "no, I can't hear shit". It doesn't matter if the difference is in reality so subtle you can't hear it (lossless vs. high-bitrate lossy), because the only thing you are testing is whether you can hear that difference, and the results are YES and NO.


Ehem ... this is not how statistical tests work.

And even if it were, your description would only be valid if this “you” is what is supposed to be tested, and arguably not even then.
Title: Overcoming the Perception Problem
Post by: skamp on 2012-10-11 14:28:53
If ABXing negatively alters one's ability to hear differences, it's only a problem if you're using negative results to prove that there is no difference, which is a fallacy in any case: while a positive ABX result shows with a high degree of probability that there IS an audible difference, a negative result never proves anything.

I don't have a problem, however, using the utter lack of positive results as probable evidence of inaudibility, when the alleged difference is claimed to be obvious ("night and day") during sighted tests. Rather, I'll ask "where's your proof?", and shrug when you fail to come up with one.

The effects of sighted bias on what is heard are at least as damaging as the supposedly negative effects on perception of ABX tests. The big difference (no pun intended) is that while sighted tests prove nothing interesting with any degree of certainty, positive ABX results show a high level of probability of unambiguous audibility. Can you think of a better methodology?
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-11 16:17:28
I believe our skeptic has flown the coop.
Title: Overcoming the Perception Problem
Post by: dhromed on 2012-10-11 17:05:21
Not everyone is as much a netizen as most of us. Perhaps he just went camping.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-11 18:24:29
If ABXing negatively alters one's ability to hear differences, it's only a problem if you're using negative results to prove that there is no difference, which is a fallacy in any case: while a positive ABX result shows with a high degree of probability that there IS an audible difference, a negative result never proves anything.


Basically, what a 'negative' ABX result means is that the hypothesis 'there is an audible difference' was not supported at the chosen significance level. That level (typically p < 0.05, i.e. less than a 1-in-20 chance of scoring that well by guessing alone) caps the false-positive rate; a negative does not demonstrate that no audible difference exists.
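The exact-binomial arithmetic behind these significance levels can be sketched in a few lines of Python. This is my illustration of the standard calculation; the trial counts and the function name are hypothetical, not taken from any test reported in this thread.

```python
# Minimal sketch of the one-sided binomial p-value used to score an ABX
# test: the probability of getting at least `correct` of `trials` right
# by pure guessing (chance = 1/2 per trial).
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """P(X >= correct) for X ~ Binomial(trials, 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # ~0.038: passes at the usual 0.05 level
print(abx_p_value(9, 16))   # ~0.40: entirely consistent with guessing
```

Note how asymmetric the outcomes are: 12/16 is strong evidence of an audible difference, while 9/16 says almost nothing either way, which is the point being made about negative results.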
Title: Overcoming the Perception Problem
Post by: googlebot on 2012-10-11 21:44:41
While the OP's reasoning and the inferences he claims from his cited studies are certainly flawed, he touches on a valid point: double-blind testing of isolated senses puts a subject into an artificial mode of perception that is different from our usual perception of the world, which is always a multisensory blend. It should also be beyond question that the human brain makes extensive use of inter-sensory correlation while forming a consolidated mental representation of the outside world.

An ABX test can tell you what an attentive mind, with artificially blocked non-auditory senses, can at best differentiate through the remaining, isolated auditory channel. I do not question that this is probably as objective as it can get when the auditory channel alone is all you want to map. I do question how much can be inferred about the experience of actual listening situations in people's homes, where not only other senses (seeing your carefully composed system) but also a history of attached memories, associations and whatnot are constantly part of your perception of the world. I'm not claiming that aiming to be objective in a subjective environment is senseless, but it might also make sense that well-situated older men write magazines for each other, reporting on their experiences of trying to transform their surplus of dollars back into some sense of meaning. Subjective prose that isn't castrated by some blinded, pain-in-the-ass protocol like ABX might actually be a more "objective" guide for identifying a perfectly matching audio system for a member of any common enough group of individuals.

Long-term HA usage might turn your mind into something that is unable to extract joy from owning expensive audio gear. A history of personal ABX comparisons can attach enough associations that this road is simply closed. Not everyone might like that, and I have come to a point where I think both are fine. I have also come to believe that a man convinced that his gold cables sound better in a sighted test, even when he is unable to verify the same results blindly, is probably not lying to us.

PS & BTW: Has it already been shown that results from double-blind listening tests correlate with the results of sighted tests over a large enough pool of listeners and different setups?
Title: Overcoming the Perception Problem
Post by: [JAZ] on 2012-10-11 22:37:01
Double-blind testing of isolated senses puts a subject into an artificial mode of perception [...]
An ABX test can tell you, what an attentive mind, with artificially blocked non-auditory senses, can differentiate at best through the remaining, isolated auditory channel.  [...]
I do question how much can be inferred regarding to the experience of actual listening situations in peoples' homes,[...]

A double-blind test does not need to be any different for the subject than a sighted test. The only point is that the subject should not know which of the two experiences being evaluated he is having.

For example, if he likes to listen with headphones, he can do so. If he likes to listen lying in bed, or on a couch, looking at some big speakers, with high or low illumination, with drinks or not... there's no limitation on that, as long as the same is used for each experience. To be even less distracted, just have another person run the ABX program for you (i.e. keeping it double-blind, not substituting for the ABX program). Whether it takes five minutes or five hours (because you want to completely forget about doing the ABX) is up to the subject.


I have also come to believe that a man convinced that his gold cables sound better in a sighted test, even when he is unable to verify the same results blindly, is not lying to us.

If you omitted the word "sound", I could give it a pass; but since "sound" implies a specific event that originates outside and is perceived by the ear, I cannot agree.  One thing is a feeling, another a perception.  Your mind state can make you more receptive to perceptions, but if that is the case, then you can pass an ABX. Otherwise, there's no proof you're actually perceiving a difference, while there would be reason to believe the placebo effect is at work.

Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-11 22:58:57
I would be careful not to limit the word perceive.  Thoughts, feelings, differences in the mind are real and are perceived as such, even if the experience causing the perception came from somewhere other than the ears.  When the word sound is being used then that clearly implies that the experience is coming from the ears and only the ears.

...or am I going overboard in making the terminology more placebophile-friendly?
Title: Overcoming the Perception Problem
Post by: sld on 2012-10-12 03:38:02
Long term HA usage might turn your mind into something that has become unable to extract joy from owning expensive audio gear.

On the contrary, I enjoy my audio gear knowing that I didn't get fleeced by myself or by someone else.

I don't know if you can actually break down your definition of "Long term HA usage" into specific activities, but ABX doesn't affect musical experiences in any way. If and when you question the quality of your gear, or want to espouse its superiority, you speak the scientific lingua franca by relating your ABX results. Only after that is established can you go on to sighted tests or spectral analysis and bring up discussions of any flavour as to why there were or weren't any differences.

Most people don't care for comparisons for a variety of reasons. They don't see the need to make claims, and therefore don't need to perform any tests to verify their claims. But in such cases, even a casual discussion of audio gear among friends would really be a discussion of the variety of placebos that people fall prey to.

If the difference is really night and day, it should reassure you that your ABX results would reach the necessary statistical significance in the minimum number of trials and a very short test duration, just to shut the mouths of your sceptics. Yes? And if you had the financial resources to purchase expensive audio gear, it wouldn't be a pain to get some quality switching equipment to facilitate your testing. I see anything otherwise as scientific laziness.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-12 06:11:02
I extract all the joy I could ever need from simply experiencing music.
Title: Overcoming the Perception Problem
Post by: bandpass on 2012-10-12 06:44:24
A double-blind test does not need to be any different for the subject than a sighted test. The only point is that the subject should not be aware of which of the two experiences being evaluated he is taking.

Indeed, using the word 'blind' (with its natural connotations of being disconcerting, stressful, etc.), where what is actually meant is a lack of knowledge or awareness, is highly misleading, and something upon which many an audiophile builds an argument.  The scientific community would do well to come up with a better term: nescient testing, perhaps; testing from a position of nescience.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-12 06:59:16
While the OP's reasoning and the inference he claims from his cited studies are certainly flawed, he touches on a valid point: Double-blind testing of isolated senses puts a subject into an artificial mode of perception that is different from our usual perception of the world, which is always a multisensory blend.


Except,  DBT doesn't do that.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-12 09:30:03
Among my friends, we have been blind testing ... hm, certain consumer-grade goods. The scientific standards are of course abysmal, and it is evident to anyone and everyone that, as the evening passes, the experiments become increasingly worthless (it could be noted, though, that the test is likely getting increasingly more double-blind). We know that professionals try to mitigate the effect in question by spitting out the [fermented|distilled] [barley water|grape juice], a move which in our case would rather constitute a major bias-inducing effect for those who have not already tasted that particular glass.

Anyway, one of us routinely flunks the blind tests with a major smile and a “look at how much money I just saved”. A high-end salesman's worst nightmare, that guy, but I cannot help thinking sometimes that the testing itself actually makes him go for the null. It is interesting to see, though, how high his hit rate is when he proclaims that “this is awful, it's gotta be the expensive one”.


(Maybe we should one day actually blind test whether the sound in his living room actually sucks as much as we think it does, or whether it is just the placebo from having seen him gradually move his speakers way into the corners over the years, but we should maybe try to keep that test apart from the one I just described.)
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-12 10:08:11
@Porcus:

Long term HA usage might turn your mind into something that has become unable to extract joy from owning expensive audio gear.
Oh no, if I had the money and no moral compulsion to do anything better with it, I'd get great joy from shiny audio gear. It's jewellery or decoration or an amazing human creation. I can already get joy from not-that-cheap and not-audibly-perfect audio products by enjoying them for what they are.

Whether I'd ever pay to upgrade 96kHz audio files to 192kHz files is another thing. You can't see the difference, and you can't hear the difference, so...

David.
Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-12 14:26:18
Ehem ... this is not how statistical tests work.
And even if it were, your description would only be valid if this “you” is what is supposed to be tested, and arguably not even then.


When doing a personal test of a codec or its parameters, I am testing them for my usage, for myself. And then it's either "I can hear the difference" or "I can't"; I don't bother with whether someone else can hear it. And I don't care, really, if these music files sound different to anyone else - I am using them, not trading them, not uploading them.
And I understand that these results can't be used as a general truth - "130 kbit tvbr aac is enough for everyone", for example - but for my usage, these tests are more than statistically enough.

I am sorry if I am missing the point - but isn't the point of an ABX test to see if YOU can hear the difference between two files? It is not for finding the difference between the files - a simple filesize or binary comparison can be used for that.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-12 15:04:27
Ehem ... this is not how statistical tests work.
And even if it were, your description would only be valid if this “you” is what is supposed to be tested, and arguably not even then.


When doing a personal test of a codec or its parameters, I am testing them for my usage, for myself. And then it's either "I can hear the difference" or "I can't".


A (one-sided) statistical test has two outcomes: either the alternative hypothesis passes and should be accepted, or the alternative hypothesis is not accepted (in which case you keep the null). Up to the threshold of the test (significance could of course always be an artifact of chance), the latter should be the conclusion if there is no positive evidence for the alternative (in this case, “you can hear the difference”).

That should occur if
(I) you have no better chance than a random draw, or
(II) your chances are better than a random draw, but you don't have sufficient data to prove it.

Notice that before testing, you are always in one of those two situations. Even if it turns out, once the samples are played, that you score 100/100, you are in case II until that fact is established.
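The two cases can be made concrete with a small sketch (Python, not from the thread; the 0.05 threshold is an assumed convention) of the one-sided binomial test that underlies an ABX score: the null hypothesis is pure guessing (p = 0.5), and “you can hear the difference” is accepted only when scoring that well by guesswork would be sufficiently improbable.

```python
from math import comb

def p_value(correct, trials):
    """One-sided binomial p-value: probability of scoring `correct`
    or better out of `trials` by pure guessing (p = 0.5)."""
    total = 2 ** trials
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / total

# Case II in action: an 80% hit rate that is still indistinguishable
# from guessing because there is too little data...
print(round(p_value(4, 5), 4))   # 0.1875 -> keep the null
# ...while a similar underlying ability over more trials passes:
print(round(p_value(7, 8), 4))   # 0.0352 -> below 0.05, accept the alternative
```

The point of the 4/5 example is exactly Porcus's case II: the listener may well hear a difference, but five trials cannot prove it.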



I am sorry if I am missing the point - but isn't the point of an ABX test to see if YOU can hear the difference between two files?


Not necessarily. Suppose I want to establish a “sufficiently good” (for whatever purpose) end-user format. Then I am not satisfied with your score on your music, unless I am only targeting you as a customer. Even if your music does not have artifacts nasty enough for you to detect (or find annoying), it might be different with other ears and other signals. (Of course, you then need to use the appropriate method (test / design of experiment) to check whether the accuracy is better than random, but that is a practical obstacle.)

If 5 percent of the listeners hear differences on 10 percent of their music collection, is then the format “transparent”? I think not. It may be good enough for the purpose, by all means, but it does not mean that there are no audible differences.
Title: Overcoming the Perception Problem
Post by: googlebot on 2012-10-12 21:44:48
Except,  DBT doesn't do that.


Why do sighted tests regularly lead to different results, then? Just calling it "bias that should be eliminated" doesn't change the fact.

Imagine the following test setup: a test subject is presented music supposedly sourced from either a Sansa Clip or his favorite Burmester rack. You present an expensive-looking switch to him that is basically a dummy: it only inserts a small pause and connects to the Clip at all times. Now imagine you get a statistically significant result that the subject rates the sound quality consistently higher when he believes it to be coming from his Burmester rack / not coming from the Sansa Clip.

Now do a second test, this time double-blind with both sources actually connected. Imagine the subject now fails to identify a difference.

What can we draw from this, especially when the subject was an honest type, sincerely motivated to rate the quality exactly as he perceived it in the first setup, without trying to prove or defy anything?

First, per HA habit, the subject should stop claiming that his Burmester setup sounds better than a Sansa Clip, as proven by the DBT. HA usually stops here.

But maybe one shouldn't. The belief that the sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception. In addition, the subject's usual mode of listening is reflected much better in the first setup than in the second (the DBT).
Title: Overcoming the Perception Problem
Post by: Nick.C on 2012-10-12 21:51:06
The belief that the sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception.
Is that not pretty much a summary of the placebo effect?
Title: Overcoming the Perception Problem
Post by: googlebot on 2012-10-12 21:58:23
Is that not pretty much a summary of the placebo effect?


Yes. Does this change anything? Placebos have been shown to have significant causal effects.

Title: Overcoming the Perception Problem
Post by: dhromed on 2012-10-12 22:12:55
But is there a problem?
Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-12 22:14:41
Not necessarily. Suppose I want to establish a “sufficiently good” (for whatever purpose) end-user format. Then I am not satisfied with your score on your music, unless I am only targeting you as a customer. Even if your music does not have artifacts nasty enough for you to detect (or find annoying), it might be different with other ears and other signals. (Of course, you then need to use the appropriate method (test / design of experiment) to check whether the accuracy is better than random, but that is a practical obstacle.)
If 5 percent of the listeners hear differences on 10 percent of their music collection, is then the format “transparent”? I think not. It may be good enough for the purpose, by all means, but it does not mean that there are no audible differences.


You misunderstood me.
I am conducting the ABX test for MYSELF. I am not conducting an ABX test to gain statistical knowledge of whether people can hear a difference.
I understand you have to allow for some statistical chance of error when doing multiple-user tests, but I am talking about a single person making a test for his (or her) own advantage and knowledge.
That margin, when testing codecs for personal knowledge, is irrelevant, IMO.
Title: Overcoming the Perception Problem
Post by: Nick.C on 2012-10-12 22:21:35
Yes. Does this change anything? Placebos could be shown to have significant causal effect.
Yes, but in the case of sighted vs. ABX, placebo biases the sighted test in favour of the source for which the test subject has a preconceived preference.
Title: Overcoming the Perception Problem
Post by: googlebot on 2012-10-12 23:43:46
I do not see how calling the phenomenon "preconceived preference" changes anything. It is a variable one tries to eliminate in many tests, but why here? The subject, in his usual environment, produces different results than the same subject in a modified ("bias eliminating") environment. If the subject wants to compare a Burmester vs. a Teac vs. a Sansa Clip for future usage in his usual environment, and as the person he/she is, a sighted test might be more appropriate than a DBT for identifying the product with the best performance as perceived by this subject.
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-13 03:32:09
The sighted-test difference is not coming from the equipment; its origin is within the test subject. The results do not reveal anything about the equipment. Doing sighted tests just reinforces the individual's bias.

If the purpose of the test is to make the subject feel good about his preferences, then maybe the sighted test is useful: The person has just spent significant money. Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.
Title: Overcoming the Perception Problem
Post by: Nick.C on 2012-10-13 09:43:43
@googlebot: You are now allowing the results to be heavily skewed in favour of the equipment that the test subject "wants to be the best" (for whatever reason - cost, etc).

In your example it is no longer about whether any differences can be heard by the test subject in an objective test, but rather about whether the test subject states a preference for the output of their preferred equipment in a blatantly subjective test.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-13 14:16:10
Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.

...unless buyer's remorse has sunk in and the person is now biased against the new purchase.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-13 14:25:06
I am conducting the ABX test for MYSELF.

Even still, a failed test only fails to demonstrate that an individual can distinguish a difference during that instance. Training and/or rest may affect the outcome of a future test, as examples.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-13 22:05:52
Except,  DBT doesn't do that.


Why do sighted tests regularly lead to different results, then? Just calling it "bias that should be eliminated" doesn't change the fact.


You wrote: Double-blind testing of isolated senses...

DBT does not necessarily 'isolate' any senses.  You can see, hear, taste, touch, smell.  All that has changed is what you *know*. 


Quote
Imagine the following test setup: a test subject is presented music supposedly sourced from either a Sansa Clip or his favorite Burmester rack. You present an expensive-looking switch to him that is basically a dummy: it only inserts a small pause and connects to the Clip at all times. Now imagine you get a statistically significant result that the subject rates the sound quality consistently higher when he believes it to be coming from his Burmester rack / not coming from the Sansa Clip.


Now do a second test, this time double blind with both sources actually connected. Imagine the subject now fails to identify a difference.


What can we draw from this, especially when the subject was an honest type, sincerely motivated to rate the quality exactly as he perceived it in the first setup, without trying to prove or defy anything?



The first time, he failed to identify that there was in fact no difference, and we can reasonably attribute that to sighted bias.  The second time, he may well have successfully identified that there was no difference, or he may have failed to identify a real but small difference.

Quote
First, per HA habit, the subject should stop claiming that his Burmester setup sounds better than a Sansa Clip, as proven by the DBT. HA usually stops here.

But maybe one shouldn't. The belief that the sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception. In addition, the subject's usual mode of listening is reflected much better in the first setup than in the second (the DBT).


This is no different from putting the same cheap wine in differently-priced bottles.  Subjects often think the pricier wine tastes better.  So, what does that tell us about the *wine*?  What do your listener's *beliefs* about a piece of gear tell us about the *gear*? What claims can reasonably be made about the relative performance of A and B? 



Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-14 13:35:29
I am conducting the ABX test for MYSELF.

Even still, a failed test only fails to demonstrate that an individual can distinguish a difference during that instance. Training and/or rest may affect the outcome of a future test, as examples.


Yes, but I am conducting the test at that one point in time. And the results are valid for that test.
Of course you should do what, 16 full trials? But they don't have to be on the same day. Or in the same week.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-14 23:31:39
Yes, but I am conducting the test at that one point in time.


It is strange to read your initial postings in this thread now, after you have tried to downplay the test's applicability to a one-time personal experience with clear-cut, full sensitivity/specificity.
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-15 12:20:30
I think Googlebot is making a valid philosophical point. If the shiny thing sounds best to you (because of placebo), and you want the thing that sounds best to you, you can be very grateful to the shiny thing and placebo for delivering it.

Though you'd better not probe too deeply - because, while the only guaranteed way to completely remove placebo is to take away the knowledge of what you're listening to, you can certainly reduce placebo (or change the direction in which it operates) by introducing doubt as to whether something really does sound better.

This latter effect is probably the cause of the audiophile's never ending upgrade path.


The downsides to this are many. e.g.
1) your entire investment can be rendered worthless to you by anything that causes placebo to break down - that's a pretty risky investment.
2) if you had blind tested before purchase, you would probably have chosen the cheapest well-made thing that sounded as good as everything else - saving you money, and giving you your own unshakable placebo effect in enjoying that equipment - you see, ABX-lovers can enjoy their own placebo experience after having chosen the equipment. They've proven scientifically that it's as good as it needs to be, and then placebo can add to the subjective perception that it's as good as they could possibly perceive it to be.
3) some nice-looking equipment sounds objectively awful, and doesn't work very well. While you might be able to convince yourself that it sounds wonderful, you'll still have the pain of unreliability, quirky/difficult operation, and the anxiety of damaging or wearing away your music collection every time you play an LP (if sighted testing led you to choose vinyl over CD).

However, sighted listening equipment purchasing is great for the economy (you just keep spending money), and it avoids time consuming things (e.g. proper listening tests), and difficult questions such as "how well can you hear anyway?"

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: pisymbol on 2012-10-15 13:48:53
I think Googlebot is making a valid philosophical point. If the shiny thing sounds best to you (because of placebo), and you want the thing that sounds best to you, you can be very grateful to the shiny thing and placebo for delivering it.

[...]


I believe, as a friend pointed out, you mean "expectation bias", not "placebo".

Your environment certainly plays a big role when testing gear.

I mean look, the DBT is CLEARLY pointless in Evan's Sound Room (the "couple" test is a better metric):

https://www.youtube.com/watch?v=ovr1TvQSQII (https://www.youtube.com/watch?v=ovr1TvQSQII)

(safe for work)
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-15 17:14:50
If ABXing negatively alters one's ability to hear differences, it's only a problem if you're using negative results to prove that there is no difference, which is a fallacy in any case: while a positive ABX result shows with a high degree of probability that there IS an audible difference, a negative result never proves anything.


Basically, what a 'negative' ABX result means is that the hypothesis 'there is an audible difference' was not supported, with a 'p' chance (typically 1 in 20) that an audible difference nevertheless exists.


This doesn't seem correct to me. What skamp said is, I think, accurate. One can apply the statistical analysis you mention to a test where the test subject, the listener, showed a strong ability to differentiate between the two sources; however, one can't apply the same statement if he or she had only *random* results.

Besides there not being any actual audible difference between the two sources to mortal ears, other possibilities for such random results might include:

A. The listener wasn't trying very hard, or was sleepy/fatigued/ill, etc.
B. The listener was mischievous and *intentionally* gave random results.
C. The test conditions, such as the resolution/accuracy of the loudspeakers used, weren't up to the task that day, etc.

What's important to note is that these three possibilities, A, B, and C, are *precluded* when the listener successfully *does* differentiate between the two DUTs. That's why one can correctly apply the statistics to such an outcome only. Sure, there's a one-in-twenty chance the listener's results were just dumb luck; however, there's a 95% chance it was because they truly could hear a difference.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-15 20:51:32
You're right that different terms apply when we are talking about rejecting vs. accepting the *null* hypothesis (in this case, the 'no difference' hypothesis).  Rejecting the null H when it is true is a Type I error; accepting the null H when it is false is a Type II error.
When we get results, we do statistics to calculate the probability that those results would have been obtained 'by chance'.  This is the p value.  We compare the p value to a pre-determined, more or less arbitrary (though traditions exist) maximum p value, usually 1 in 20 (0.05): the alpha value.  So if our p < alpha, we reject the null H ('null H not supported'); otherwise not.

Alpha and p are really values for the probability of making a Type I error: alpha is the pre-set threshold for an 'acceptable' chance of a Type I error, and p is the calculated value for the obtained results.  If we get p < alpha, then we say the chance that we made a Type I error, while by no means eliminated, is within our comfort zone.

It's true that my original post was really talking about a Type II error.  But in either case we use statistics to call our results 'random' or not, so I don't see how you can say that statistics only work for 'positive' ABX results.  Or maybe I'm just not understanding what you are getting at.  I didn't disagree with what skamp wrote... at least, not intentionally!
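The p-vs-alpha decision above can be sketched in a few lines (Python, not from the thread; alpha = 0.05 is the conventional value assumed here, and 16 trials is just a common ABX session length). For a 16-trial ABX, 12 correct answers is the point where p first drops below alpha:

```python
from math import comb

ALPHA = 0.05  # pre-set maximum acceptable chance of a Type I error

def abx_p(correct, trials):
    """Probability of `correct` or more hits out of `trials` by chance (p0 = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for hits in (11, 12):
    p = abx_p(hits, 16)
    verdict = "reject null H" if p < ALPHA else "keep null H"
    print(f"{hits}/16: p = {p:.4f} -> {verdict}")
```

11/16 gives p of roughly 0.105 (keep the null), while 12/16 gives roughly 0.038 (reject it), which is why 12-of-16 is often quoted as a passing ABX score.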
Title: Overcoming the Perception Problem
Post by: skamp on 2012-10-15 22:49:00
B. The listener was mischievous and *intentionally* gave random results.


I don't see how you can say that statistics only work for 'positive' ABX results.  Or maybe I'm just not understanding what you are getting at.  I didn't disagree with what skamp wrote...at least, not intentionally!


What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless, whereas he could hardly act in bad faith in the other direction.
Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-15 22:57:15
There are ways of cheating to get positive ABX results such as altering the log or using some sort of workaround that un-blinds the test subjects.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-16 03:58:07
What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless. Whereas he could hardly act in bad faith in the other direction.


Sure he could, if he's determined and the test isn't very carefully proctored.  So once you assume 'bad faith', the results either way are useless: all reported ABX results on HA could be considered invalid if you assume that cheating was involved.

(Btw, if the stats show that the answers are *more* wrong than they should be by chance, that can also be a useful thing to know.)

I've lost track, but why exactly are we going down this 'what if they're answering randomly on purpose' road?  The  supposed 'perception problem' is not one of bad faith.



Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-16 04:32:05
[Trying to bring this back on topic]

There is not a big distinction between consciously giving random results (acting unethically / in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike, so they don't bring their "A game" and simply "phone it in". That's another form of expectation bias, and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me: you never know for sure why the results are random.

Here's an example, for all: if asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and would therefore be biased? If you were to participate, do you really, honestly think you'd be giving it your best possible effort and that there's no way your bias could be influencing your selections, at least at a subconscious level?
Title: Overcoming the Perception Problem
Post by: Woodinville on 2012-10-16 04:33:13
[Trying to bring this back on topic]

There is not a big distinction between consciously giving random results (acting unethically / in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike, so they don't bring their "A game" and simply "phone it in". That's another form of expectation bias, and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me: you never know for sure why the results are random.


Baloney, that's what positive controls are for. You did build both negative and positive controls into your test, right?
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-16 04:39:53
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-16 06:19:59
Here's an example, for all: if asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and would therefore be biased?


Sure. Turning to medicine: what if we were to test the effect of homeopathy? I would have a fairly negative expectation bias, especially if I was told it was actually done the homeopathically “proper” way. This, and the particular case you mention, could be mitigated by not telling the subjects what specifically is being tested. I assume you really shouldn't tell them anyway.

And if you have anything of the sort at hand, introducing a third thingy with a known effect could help the analysis. I.e., if you have A, B and C, where the difference between A and C is well-established and quantified, and the listeners are biased as you describe (or merely not sufficiently randomly drawn; in practice you would have to deal with self-selection), then you might check whether they can distinguish A and C better or worse than “the known average”. That could have been done in the homeopathy case as well. The problem is, a failure only tells you that you have no test.
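One way to read this as a procedure (a minimal sketch in Python, not from the thread; the scores, trial count and verdict wording are invented for illustration): treat the known-audible A-vs-C pair as a positive control, and only interpret a null result on the pair actually under test if the listener passed that control.

```python
from math import comb

def guessing_p(correct, trials):
    """Chance of `correct`-or-better hits out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def interpret(control_score, test_score, trials, alpha=0.05):
    """control_score: hits on the known-audible A-vs-C pair (positive control).
    test_score: hits on the pair actually under test."""
    if guessing_p(control_score, trials) >= alpha:
        return "no test: listener failed the positive control"
    if guessing_p(test_score, trials) < alpha:
        return "difference detected"
    return "no difference detected (under a validated test)"

# Hypothetical session: the listener nails the control but not the test pair,
# so the null result on the test pair is worth something.
print(interpret(control_score=15, test_score=9, trials=16))
```

If the control score itself is indistinguishable from guessing, the session tells you nothing about the pair under test, which is exactly the "you have no test" outcome described above.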
Title: Overcoming the Perception Problem
Post by: itisljar on 2012-10-16 09:54:51
Yes, but I am conducting the test at that one point in time.

It is strange to read your initial postings in this thread now, after you have tried to downplay the test's applicability to a one-time personal experience with clear-cut, full sensitivity/specificity.


Downplay?
If I am conducting an ABX test of codec settings because I think I hear a difference, how is simply encoding a wav file, loading it into foobar and running the ABX comparator not valid?
Where does perception come into it? I am listening to music either on speakers or headphones (mostly headphones) - the speakers being Brand A and the headphones Brand B - so where exactly does my (mis)perception kick in? I am sorry, but your theory isn't explained very well - tell me where, in my case, the ABX test is failing.
Or, for that matter, for anyone doing the same exact test? Or a technically properly designed DAC/speaker test with an ABX switchbox?
And correct me if I am wrong, but an ABX test is primarily a personal experience from which the results can be collected and statistically processed. The more personal results, the more accurate the statistics.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-16 15:51:49
Could it be shown already that results from double-blind listening tests correlate with the results of sighted tests over a large enough pool of listeners and different setups?


I think we've seen some (probably unintended) very large-scale sighted test collections to judge by.

The most recent one was the introduction of DVD-A and SACD.

Understanding of the relevant perceptual mechanisms and DBTs were accurate predictors of their success in the mainstream marketplace: They failed.

There was also an apparently unintended sub-experiment, in which about half the media brought to market in those so-called hi-rez formats turned out to be actually at, or merely approximating, the technical performance of the medium they pretended to upgrade: CD audio. Nobody got the joke from pure listening evaluations until people started blowing the whistle based on technical testing.

I take that as a reiteration of the original DBTs that basically said no audible differences due to the technically enhanced medium.

I conclude that if you want to make money by bringing some purported technical improvement in sound quality to the market, run ABX tests first; if they strongly tend toward null results, save your time, money and career and leave your proposed enhancement in the lab.
Title: Overcoming the Perception Problem
Post by: knutinh on 2012-10-16 21:02:24
Self-reporting about one's mental state surely carries some issues.

1) Is it conceivable that "hirez audio"/"snake-oil cables"/... is somehow registered by low-level audio perception, then passed on to some subconscious part of our brain, but never to the conscious part? I guess it is within "conceivable". What would this mean in practice? It would imply that every audiophile claim about "A sounding better than B" was delusional, as their conscious brain would never have access to this information. It might mean that those who happen to listen via carbon-nano-kevlar cables are somehow "happier" than those of us who purchase simple stuff. Or it might mean that those individuals are more inclined to want a coke after listening than the rest of us. One might devise tests that partially probe this, but I think that the search space and the low probability of interesting results make it a bad career move.

2) Is it conceivable that some individuals react to the testing environment by decreasing their sensitivity to phenomena that they can otherwise easily distinguish? Perhaps. But involuntary (unknowing) participation in experiments does not seem to support it. Furthermore, if your capabilities are shaken by sitting alone in front of the ABX plugin of foobar, how are you ever able to listen critically?

I don't get your point about not understanding the bias removal. When I spend hard-earned money on wine or loudspeakers or whatever, I want to know what I am getting. I want to know if it tastes different to me, or if it is perhaps a case of "the emperor's new clothes". I know that I am prone to such biases, and I want to test the product using perception both with and without them.

-k
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-16 22:28:46
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.


Start with two signals that are by any reasonable measure vastly different.  Then slightly less different.  Then slightly less different again.  Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.

Seriously, this is a non-issue for most listeners.  In most cases I've read about where the DBT result was a failure to reject the null H, the listener *believes* they hear a difference both before and *during the test* as well.  In other cases they complain that the difference they thought they heard 'sighted' suddenly seems harder to hear when they're listening blind.  In neither case is 'bias towards not hearing' a credible factor.

Btw, have you read Pio's HA sticky thread about blind listening tests?

http://www.hydrogenaudio.org/forums/index.php?showtopic=16295 (http://www.hydrogenaudio.org/forums/index.php?showtopic=16295)
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-17 00:30:30
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.


Start with two signals that are by any reasonable measure vastly different.  Then slightly less different.  Then slightly less different again.  Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.

Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.

Can't say I recall ever reading of any DBT in the audio press which did this pre-screening you've just described, but I hope it, or some similar procedure to preclude this particular bias, is standard procedure in the academic world. [I can't edit my original post at this late date; however, I didn't stress enough that the "mischievous" behavior of the listener may (possibly) be at a subconscious level. He/she would pass a lie detector test that they were "doing their best"; that is, they aren't "frauds".]

Quote
Seriously, this is a non-issue for most listeners.
So rather than applying the time-consuming control you just described, we could simply ask potential participants if they might be biased on a conscious or subconscious level, instead.  Ha-ha!

Quote
Btw, have you read Pio's HA sticky thread about blind listening tests?


Yes. Take Rule #1 for example:
Quote
Rule 1 : It is impossible to prove that something doesn't exists. The burden of the proof is on the side of the one pretending that a difference can be heard.
If you believe that a codec changes the sound, it is up to you to prove it, passing the test. Someone pretending that a codec is transparent can't prove anything.


So even though random results don't prove anything, which I agree is correct, you seem to think that statistical analysis may be applied to them?! Huh? That's what I don't get. If no firm conclusion can be drawn, one way or the other, how on earth can one describe the probability/certainty of "no evidence/proof found, at least in this instance"? NOTHING was established in the first place and nothing was proven, so how can you describe the certainty of this "nothing", this randomness, as a percentage?! For all we know the randomness is caused by problems with the test design, such as A, B, and/or C (all things which can be immediately ruled out if the results go the other way, where the listener can successfully hear a difference most/all of the time), and we have no way of knowing the "percentage of likelihood" of problems such as A, B, or C. [But I do like that control you suggested for B. Bravo!]
Title: Overcoming the Perception Problem
Post by: saratoga on 2012-10-17 03:03:00
So even though random results don't prove anything, which I agree is correct, you seem to think that statistical analysis may be applied to them?! Huh? That's what I don't get. If no firm conclusion can be drawn, one way or the other, how on earth can one describe the probability/certainty of "no evidence/proof found, at least in this instance".


It's probably helpful if you explain how you got from "proving a negative" to "knowledge does not exist".  Some of those steps may be questionable.

Leaving aside whatever broader metaphysical point you were grasping at, I think the meaning of Pio's quote is that no one can prove that you can't hear something, so it's up to you to show that you can.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-17 04:53:41
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.


Start with two signals that are by any reasonable measure vastly different.  Then slightly less different.  Then slightly less different again.  Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.


Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.


um....no. Unless by 'bad apples' you mean people with significant hearing loss, which this control will indeed identify. Look, you've already admitted you have no knowledge of how the science is done.  I was merely telling you what Woodinville meant by 'positive control', and he's right, a rigorous experiment typically employs a positive as well as negative control. 


Quote
Can't say I recall ever reading of any DBT in the audio press which ever did this pre-screening you've just described, but I hope it or some similar procedure to preclude this particular bias is standard procedure in the academic world. [I can't edit my original post at this late date, however I didn't stress enough that the "mischievous" behavior of the listener may be (possibly) at a subconscious level. He/she would pass a lie detector test that they were "Doing their best", that is; they aren't "frauds".]


Do you recall the audio press DBT subjects ever NOT claiming they heard a difference, sighted?  I don't.

So, against the body of perceptual psychology data, tallying the ways humans are alert for 'difference' whether it exists or not, you posit a population who consciously assert things sound different, yet unconsciously think they sound the *same*.


Title: Overcoming the Perception Problem
Post by: greynol on 2012-10-17 05:28:47
Let me get this straight, the subconscious mind is overruling decisions by the conscious mind that is presumably able to detect a difference?  What is the basis for such a counter-intuitive notion?
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-17 05:35:55
So even though random results don't prove anything, which I agree is correct, you seem to think that statistical analysis may be applied to them?! Huh? That's what I don't get. If no firm conclusion can be drawn, one way or the other, how on earth can one describe the probability/certainty of "no evidence/proof found, at least in this instance".


It's probably helpful if you explain how you got from "proving a negative" to "knowledge does not exist".  Some of those steps may be questionable.

Sorry, I can't help you there, since I don't know what it is that I wrote that you equate with "knowledge does not exist."

Quote
Leaving aside whatever broader metaphysical point you were grasping at, I think the meaning of Pio's quote is that no one can prove that you can't hear something, so its up to you to show that you can.


I'm pretty sure Pio meant exactly what James Randi means when he says "You can't prove a negative." Randi: "You can't prove a negative" (http://www.youtube.com/watch?v=qWJTUAezxAI)

You can test 1000 subjects, listeners (or reindeer), and all it shows is that none of them, on that day under those test conditions, could hear a difference with any statistical significance beyond random guessing. If however they show an ability to hear a difference with strong statistical significance, then the test does prove (or at least presents evidence for) the conclusion that on that day, with that music etc., some people can indeed hear a difference, and we are 95% confident this wasn't because of just dumb luck.
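The "95% confident" figure is the standard exact binomial computation on ABX trials: how likely is a score at least this good under pure guessing? A minimal sketch (the 14-of-16 score is an invented example):

```python
from math import comb

def guessing_p_value(correct: int, trials: int) -> float:
    """One-sided P(X >= correct) under pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# 14/16 correct would be very surprising under guessing (p ~ 0.002),
# so "dumb luck" is rejected at far better than the usual 95% level.
print(guessing_p_value(14, 16))
```

A score at the 0.05 threshold or below is what posters here mean by "statistical significance beyond random guessing".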

Title: Overcoming the Perception Problem
Post by: saratoga on 2012-10-17 05:43:30
Sorry, I can't help you there, since I don't know what it is that I wrote that you equate as "knowledge does not exist."


Easy enough then.

I'm pretty sure Pio meant exactly what James Randi means when he says "You can't prove a negative." Randi: "You can't prove a negative" (http://www.youtube.com/watch?v=qWJTUAezxAI)


Most people probably aren't going to watch a 10-minute video just to figure out what you're trying to say, so if it's important, you might want to explain yourself.

You can test 1000 subjects, listeners (or reindeer), and all it shows is that none of them, on that day under those test conditions, could hear a difference with any statistical significance beyond random guessing. If however they show an ability to hear a difference with strong statistical significance, then the test does prove (or at least presents evidence for) the conclusion that on that day, with that music etc., some people can indeed hear a difference, and we are 95% confident this wasn't because of just dumb luck.


That's correct.  What was it you didn't understand?
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-17 06:36:58
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.


Start with two signals that are by any reasonable measure vastly different.  Then slightly less different.  Then slightly less different again.  Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.


Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.


um....no. Unless by 'bad apples' you mean people with significant hearing loss, which this control will indeed identify.

Yes, that's essentially what I meant. The bad apples you weed out in pre-screening are the ones who aren't showing an ability to differentiate between the two sources with a pre-established small difference which should be audible to most.

You don't know if it is due to their poor hearing, malicious intent to skew the test, a lack of understanding of how to vote or what they are to do, or a preconceived notion (a bias) that it is ridiculous to think that power cords on CD players make an audible difference, so that they are lackadaisically selecting A vs. B and not really giving it their all [even though, unbeknownst to them, they haven't even started to hear those pairings yet, since they are still in a pre-screening stage designed to weed the biased people out!]

Quote
Look, you've already admitted you have no knowledge of how the science is done.

No, I said I wasn't a scientist. I do, however, understand that in a good scientific experiment there should be ways to prevent all forms of bias, even if that bias may be at a subconscious level and/or one thinks "that sort of bias isn't likely to have an impact on the results". There may be unforeseen reasons, not yet thought through, why it does have an impact. Test subjects not giving their full, 100%-focused attention, or rushing to give answers compared to the rest, was just one example.

The whole reason we do double-blind testing, not single-blind, is this exact same reason. There's no reason to think that a competent test administrator passing out the test forms and pencils would act or speak in a way that would influence the subjects, or give away the identity of A or B; however, to be absolutely sure there's nothing we may have overlooked, we make them blind too!

We don't just get rid of the forms of bias we suspect might have an impact, we get rid of ALL biases as best we can.
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-17 07:22:35
Do you recall the audio press DBT subjects ever NOT claiming they heard a difference, sighted?  I don't.

I do - that is, individual test subjects in a DBT who thought the likelihood that there would be an audible difference was rather slim, pre-test - but if you want the exact name of the magazine article, date, page etc., I don't have that stored in my memory. I suspect it is one of these (http://home.provide.net/~djcarlst/abx_peri.htm), however.

Tests conducted on BAS and SMWTMS members would be first on my list to check, but I don't have the time to devote to such a task. IIRC, individual results were broken down into "believers" and "non-believers", i.e. people who very well may be biased.
----

[Saratoga]
Quote
What was it you didn't understand ?
Answer: How a confidence level expressed as a percentage can be applied to the results of a test where the random outcome basically means "Oh well, the results are inconclusive; nothing was proven by this test."

The video is 7m 50s, not ten minutes (and the first 30 seconds can be safely skipped). It best explains what is meant by the expression "you can't prove a negative", since it is explained by the man who actually coined it. He explains it much better than I can, using a humorous example, but I'd rather not paraphrase the master. He does explain why nothing is proven by such a test.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-17 18:51:04
Quote
Rule 1 : It is impossible to prove that something doesn't exists. The burden of the proof is on the side of the one pretending that a difference can be heard.
If you believe that a codec changes the sound, it is up to you to prove it, passing the test. Someone pretending that a codec is transparent can't prove anything.


So even though random results don't prove anything, which I agree is correct, you seem to think that statistical analysis may be applied to them?! Huh? That's what I don't get. If no firm conclusion can be drawn, one way or the other, how on earth can one describe the probability/certainty of "no evidence/proof found, at least in this instance"? NOTHING was established in the first place and nothing was proven, so how can you describe the certainty of this "nothing", this randomness, as a percentage?! For all we know the randomness is caused by problems with the test design, such as A, B, and/or C (all things which can be immediately ruled out if the results go the other way, where the listener can successfully hear a difference most/all of the time), and we have no way of knowing the "percentage of likelihood" of problems such as A, B, or C. [But I do like that control you suggested for B. Bravo!]



There are actually a lot of issues here.

You might want to have a look at http://en.wikipedia.org/wiki/P-value#Misunderstandings (http://en.wikipedia.org/wiki/P-value#Misunderstandings) .

Then there is considerable disagreement on what a 'probability' really is. Frequentists refuse to attach probabilities to hypotheses (they are either plainly true or plainly false), while from a Bayesian point of view, such probabilities are acceptable quantifications of how well informed we are. Different interpretations of probability might amount to a proper disagreement over concepts, but even when they don't, they might confuse the terminology.

Then there is 'random' vs. 'uniform'. Whether you are guessing yes/no at 50/50, or have a hit rate of 95 percent, there is still some randomness left. Statistical analysis can be applied to check whether a claim that 'this is not uniform guessing' is at all trustworthy. BTW, 'randomness' (for a suitable interpretation of the word) caused by test design could be both 'noise' and 'bias'.
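Porcus's point that even a 95 percent hit rate still contains randomness can be made concrete with an interval estimate for the underlying hit rate. A minimal sketch using the Wilson score interval (the 19-of-20 score is an invented example):

```python
from math import sqrt

def wilson_interval(hits: int, trials: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = hits / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# Even 19/20 correct leaves real uncertainty about the true hit rate:
lo, hi = wilson_interval(19, 20)
print(f"95% CI for the hit rate: [{lo:.2f}, {hi:.2f}]")  # roughly [0.76, 0.99]
```

The interval never collapses to a point with a finite number of trials, which is exactly the "randomness left" even at a 95 percent hit rate.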
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-17 20:14:42
Thanks, Porcus. I'll check it out.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-18 14:04:36
Do you recall the audio press DBT subjects ever NOT claiming they heard a difference, sighted?  I don't.


Right. A potentially disturbing fraction of all DBT articles published in the mainstream and *underground* audio press were done within 50 miles of my house by people I know, including me.

We all heard differences in sighted evaluations, and often did sighted evaluations before we did the DBTs. 

Sighted evaluations are a good cheerleading technique to get people engaged in the DBTs.

Quote
So, against the body of perceptual psychology data, tallying the ways humans are alert for 'difference' whether it exists or not, you posit a population who consciously assert things sound different, yet unconsciously think they sound the *same*.


I guess the question is, "Knowing what you knew about the tests, did you kinda like know in your heart that no difference would be heard?"  For me the answer would probably be yes, especially after however many years of testing. As for the actual test subjects, they were all over the map - some believing that a difference would be heard, some skeptical, and some believing for sure that none would be heard.

If we fast-forward to the tests that were on my now-departed www.pcabx.com web site, those tests included training files, so that listeners were led in logical steps from positively hearing relevant differences, based on files that were augmented so that the differences would be clearly heard, to raw files that encapsulated the audible difference in technically correct ways.

A key point is that if you do the training files right, the listener doesn't know when the audible difference will become inaudible. In the beginning he can hear it "clear as a bell" and someplace along the way, in a set of files that he knows not which, his ability to hear the difference just vanishes. I would say that this is the best guarantee that his unconscious state has been made as irrelevant as possible, and quite irrelevant at that.

All this listener training didn't really change the outcomes. So much of what the high end has staked their credibility on is so far away from the now well-known thresholds of hearing that making logical improvements in the listening test process, even fairly heroic efforts like the extensive listener training described above, just can't help their cause.
Title: Overcoming the Perception Problem
Post by: mzil on 2012-10-18 19:16:56
Do you recall the audio press DBT subjects ever NOT claiming they heard a difference, sighted?  I don't.

I do - that is, individual test subjects in a DBT who thought the likelihood that there would be an audible difference was rather slim, pre-test - but if you want the exact name of the magazine article, date, page etc., I don't have that stored in my memory. I suspect it is one of these (http://home.provide.net/~djcarlst/abx_peri.htm), however.

OK, update. My memory was correct. It was one of those, specifically:

Masters, I. G. and Clark, D. L., "Do All Amplifiers Sound the Same?", Stereo Review, pp. 78-84 (January 1987) [I read it at the time it was published, BTW.]

From it:

"The kind of listeners was important as well, and so the sample was made up both of people who professed to be able to hear differences between amplifiers, the 'Believers,' and of those who doubted their existence, the Skeptics'. "

"NOTES...

2. Believers believe that amplifiers sound significantly different, Skeptics are skeptical of that claim."

Interestingly, some of the Skeptics in the pre-test, open (sighted) warm-up sessions (I presume in a room filled with Believers who were insistent they indeed heard differences) apparently "jumped ship" and thought they too could hear differences (hmm, "peer-pressure placebo effect", anyone?); however, not all of them. So we have good documentation of at least one study where there were no controls used to preclude any bias, conscious or unconscious, of some test subjects thinking they shouldn't expect to hear a difference and therefore perhaps not trying as hard (just one possible example of how such a bias might influence a test; there could very well be others). Moreover, we know that 10 of the 25 test subjects stated upfront that they were "Skeptics", predisposed to thinking (I call that "biased") that amplifiers don't sound significantly different.

It is no longer on the web, however, there was a snapshot stored on the web archive "wayback machine" site I found, and here's a link to it:
http://web.archive.org/web/20060323085504/http://bruce.coppola.name/audio/Amp_Sound.pdf (http://web.archive.org/web/20060323085504/http://bruce.coppola.name/audio/Amp_Sound.pdf)

----

The author agrees with Randi, "you can't prove a negative":

"After completion of the blind (cable-swap) or double-blind (ABX) testing, the listeners were given their scores...High scores can prove differences were audible, but random scores can never prove that all amplifiers sound alike." [emphasis mine]

The author assigned a "Probability Results Due To Chance [and not audible differences]" score to all instances where the percentage of correct answers was greater than 50%; however, none is given when it is 50% or lower. This makes perfect sense to me. You can't assign a certainty of "95%", or whatever, to a conclusion of "I didn't conclude anything" - but my perception is that I am in the minority with that view here in this thread, and as I said I don't have the time to devote to this, especially since it seems to be a battle I'd have to fight alone.
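The arithmetic behind the Stereo Review convention can be checked directly: the one-sided "probability the results are due to chance" is 0.5 or more whenever the score is at or below 50% correct, so printing it would convey nothing. A minimal sketch (the two scores are invented):

```python
from math import comb

def prob_due_to_chance(correct: int, trials: int) -> float:
    """One-sided P(X >= correct) assuming the listener guesses (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(prob_due_to_chance(13, 16))  # ~0.011: low, reportable as evidence of audibility
print(prob_due_to_chance(8, 16))   # ~0.598: a score at/below 50% always gives >= 0.5
```

By the symmetry of the fair-coin binomial, any score of half the trials or fewer yields a tail probability of at least 0.5, which is why the magazine left those cells blank.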

Bye all!

P.S.

[Arny]
Quote
In the beginning he can hear it "clear as a bell" and someplace along the way, in a set of files that he knows not which, his ability to hear the difference just vanishes. I would say that this is the best guarantee that his unconscious state has been made as irrelevant as possible, and quite irrelevant at that
[emphasis mine]

YES! Finally. That's the way to do it, IMHO.

Quote
So much of what the high end has staked their credibility on is so far away from the now well-known thresholds of hearing that making logical improvements in the listening test process, even fairly heroic efforts like the extensive listener training described above, just can't help their cause.

Agreed. And just to set the record straight, I'm not attempting to help anyone's "cause" except for science. [And thanks for inventing ABX, by the way. That was a huge contribution, unlike this relatively trivial matter!]




Title: Overcoming the Perception Problem
Post by: item on 2012-10-19 17:34:50
Sorry - been away; lots of noise (most of it, fascinatingly, irrelevant) generated in the interim. To respond to a few points . . .

Are you claiming people are more prone to hearing things that aren't there than seeing things that aren't there - in respect of the brain "filling in gaps" to create what turns out to be an incorrect model of the real world? I'm not sure that's true.


Not what I said, or McGurk shows. Please re-read.

...but they've already hacked cats' heads about to find out what signals went in/out of the auditory nerve, long before it was possible to do this in a humane way. It's relevant because they have similar cochleae to us, and it's found that, like us, the losses (what you can't hear) derive from the air-to-neural transduction process. So we've got cut-up cats, predictions from physiology, and blind tests on humans all delivering the same "what difference is just audible" data - but you don't trust the blind tests? Yet you'll trust the brain response. That's strange, since no one is doubting that placebo is a real brain response - it's just not a response to what you hear!  And if we measure some brain response when people are not aware of what they're listening to (i.e. some response to A that is absent to B), either this will be associated with a conscious audible difference, or not. If not, who cares. If so, then having reported hearing it when not knowing what they were listening to, they've just passed a blind test! What does the brain scan add? Think this through. Draw a flowchart of the possibilities if it helps.
Cheers,
David.

We don't report a fraction of our brain response. Humans are not cats. And placebo is not relevant to either.
Title: Overcoming the Perception Problem
Post by: krabapple on 2012-10-19 17:53:09
It's best to be careful drawing conclusions from measured 'brain response' (neural imaging). One of the 2012 Ig Nobel prize winners illustrates the point:

Quote
NEUROSCIENCE PRIZE: Craig Bennett, Abigail Baird, Michael Miller, and George Wolford [USA], for demonstrating that brain researchers, by using complicated instruments and simple statistics, can see meaningful brain activity anywhere — even in a dead salmon.

REFERENCE: "Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Multiple Comparisons Correction," Craig M. Bennett, Abigail A. Baird, Michael B. Miller, and George L. Wolford, Journal of Serendipitous and Unexpected Results, vol. 1, no. 1, 2010, pp. 1-5.


http://www.improbable.com/ig/winners/ (http://www.improbable.com/ig/winners/)


And why, pray tell, would you claim 'placebo is irrelevant' to brain response? The placebo effect (and expectation bias) ARE brain responses. The point is that 'brain responses' are not perfect correlates of objective reality. Just because the brain registers a 'difference' doesn't mean one exists in fact.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-19 18:33:08
@item:
Perhaps you could share with us a little about who you are so that we can put your point of view into proper perspective.

What do you mean by 'proper perspective'? Am I looking at an ad hominem warmup or a chat-up line?

Let's just focus about the first part of my question and not worry about the second part. Please share with us a little about who you are and who you might represent.


OK: I'll stop worrying about being chatted up, and concentrate on worrying about the incoming ad hominem. Although, to keep my options open - in case you're just cheekily playing hard to get - I'm a 43-year-old Gemini vegetarian: mainly single, with a good sense of humour and a two-bedroom flat in a nice part of South London. I'm not representing anyone: I like music, and I think how we listen is interesting. I'm not presenting myself as an authority. I think it's important not to ignore facts, and to be impartial when reasoning from inferences.

Disorientated and deprived of cues?  You appear not to realize that double-blind testing can (and often does!) provide for the listener to audition the subjects/samples that are known as they are known.

I think that I don't understand this is to do with the way you've said what you're saying, but if it's because it's cleverer than can be understood by me to understand, then I'm sorry.

To elaborate on the part that you quoted, during testing I pay closer attention to details explicitly to listen for differences.  If I am listening casually, I do just that, which is to say relax and enjoy.  I have no doubt that my casual listening is done so with less acuity.

Brraap! Entirely different mode of listening - which is part of the point I'm making. Inconveniently, perception in general happens largely subconsciously - consciously directed modes are incredibly slow and weak by comparison. Co-ordination, peripheral vision, muscle memory: all harness the fastest, most primal parts of the brain. So, no: like sex, relaxed is better. That was also a conclusion of the Beau Lotto experiment quoted earlier, among many others, of course.

To put it another way, when was the last time someone accused you of paying close attention because you forgot something you were told and were expected to remember?

I can't remember.
But what a beautiful diversionary analogy!

If part of my enjoyment has to do with knowing that my equipment or sample is XYZ then that is my business.  If I think I am actually perceiving something differently as a result I respect the forum and refrain from discussing it unless I can comply with its rules.

That's good to hear.

Perhaps, perhaps not; I really don't care either way. . . . .  MUSHRA is well accepted as a double blind test that provides for subjective grading. I don't think anyone with enough understanding is denying its power to foster truly tangible differences in the mind.  We simply aren't interested in reading about them here.  There are plenty of other places where you can indulge yourself.

It's evident that your interest and expertise lie in statistical analysis of experimental data. That's very exciting and important and everything, but we're discussing the circumstances under which that data is generated. Given that these experiments fundamentally address the nature of perception - i.e., are psychological in nature - it's curious to care so little about getting the experiment right. If the experiment is wrong, so is the data. Such loving analysis would then be so much turd-polishing.
Title: Overcoming the Perception Problem
Post by: item on 2012-10-19 18:45:36
The sole, specific point I'm making is that DBT is rarely used in perception testing for obvious reasons outlined above, and attempting to smear its credibility from the physiological domain is intellectually dishonest.


And as I told you before, double blind protocols are common in pain perception studies, where self-reports of perception are the outputs. You going to address all that, or punt again?


And you posit an equation between pain and hearing?!

Quote
And that the abundance of negative results indicates coarse granularity in the test method as much as it supports any particular paradigm.


Quote
It 'indicates' that to you because you believe it should, not because it's necessarily true. And as I said before, this amounts to nothing more than an argument from incredulity, if not ignorance.


Certainly not: the point explicitly made is that the results are open to two interpretations. You are insisting on one interpretation so dogmatically that the other, equally valid one, is being dismissed. Consideration of alternatives is what I like to call argument by method.

And now, Mr Joined-in-August,  I trust you won't mind if I sit back and await your predictable departure (back?) to more woozy audio forums, where you'll claim you were driven out of HA for 'unorthodoxy' or 'thinking outside the box'.

Let's not snipe!
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-19 19:22:27
The sole, specific point I'm making is that DBT is rarely used in perception testing for obvious reasons outlined above, and attempting to smear its credibility from the physiological domain is intellectually dishonest.


And as I told you before, double blind protocols are common in pain perception studies, where self-reports of perception are the outputs. You going to address all that, or punt again?


And you posit an equation between pain and hearing?!


I guess krabapple will claim an inclusion between 'pain perception studies' and 'perception testing'. Guess which way.
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-19 20:56:23
I could be missing the point, or dozens of them, but it seems to me the hypothesis has been presented that the conditions necessary for proper DBT themselves alter perception in such a way as to strongly bias against what people are physically capable of sensing about the external world (i.e. that part not inside their head). The evidence for that hypothesis is that people regularly report perceptions that they are unable to repeat under DBT conditions.

We do know, because it has been demonstrated many times, that perception of signals which can be successfully and consistently identified by test subjects can be strongly overridden by an expectation introduced into the trials. Now subjects often report signals as being what they expect to hear, rather than what the signals really are, and even report the expected signal when they are led to believe they will receive it but have been given nothing.

These particular findings seem to present a case for being skeptical of claims made from sighted tests. The proposition here is that the expectations introduced in the tests are equivalent to the expectations introduced by really knowing which signal is being received. This proposition is at least somewhat supported by the fact that the expectation can be introduced by letting the subjects see what they believe are the sources of the signals (e.g. the cables, the amplifiers, the wine bottles) when the signals are actually from something else.

As far as I can see, the hypothesis that perception is really so much better and more pure outside of these restrictive test conditions is useless to science unless, and until, someone can think up a (repeatable) means to positively test it. Maybe the gods open deeper levels of perception to those filled with wine, love, and sympathy, and stop up the ears of those playing with that nasty science idea, but unless the gods decide to openly reveal themselves, we are unlikely to ever know. We can posit possibilities until the sun burns out but will never get any closer to knowing.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-19 21:30:58
I could be missing the point, or dozens of them, but it seems to me the hypothesis has been presented that the conditions necessary for proper DBT themselves alter perception in such a way as to strongly bias against what people are physically capable of sensing about the external world (i.e. that part not inside their head). The evidence for that hypothesis is that people regularly report perceptions that they are unable to repeat under DBT conditions.

We do know, because it has been demonstrated many times, that perception of signals which can be successfully and consistently identified by test subjects can be strongly overridden by an expectation introduced into the trials. Now subjects often report signals as being what they expect to hear, rather than what the signals really are, and even report the expected signal when they are led to believe they will receive it but have been given nothing.

These particular findings seem to present a case for being skeptical of claims made from sighted tests.


You trying for a Master's standing in understatement of what should be obvious to anybody with real world experience? ;-)

People who ignore expectation bias, along with the other systematic biases that afflict most amateur listening tests, are just showing how little they know about the real world.

The three biggies are matching levels, listening to exactly the same musical selections, and managing expectation bias. Most audiophile listening evaluations ignore all 3.

Given the endemic nature of this sort of ignorant and sometimes willfully irrational behavior, most of these discussions about the alleged failings of well-controlled subjective testing can be dismissed out of hand.

IME trying to teach audiophiles how to do reasonable subjective tests is like trying to teach pigs to fly: the usual result is that you at minimum upset the emotional state of the pig. ;-)
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-19 22:31:14
My point was not that many reported tests involve confounding variables but that a hypothesis is useful only to speculative philosophers and metaphysicians unless there is some way to definitively differentiate its results from other possibilities. In physics that may be a matter of making small, cumulative steps that refine results. Maybe the theory will eventually be broken at the sixteenth decimal place, and something totally different revealed, but exact testing is needed to get to that point. In the matter under consideration, I don't recall any proposed means of removing belief and expectation bias from sighted tests, no matter how well all other variables are controlled.

Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-22 12:49:44
It's best to be careful drawing conclusions from measured 'brain response' (neural imaging). One of the 2012 Ig Nobel prize winners illustrates the point:

Quote
NEUROSCIENCE PRIZE: Craig Bennett, Abigail Baird, Michael Miller, and George Wolford [USA], for demonstrating that brain researchers, by using complicated instruments and simple statistics, can see meaningful brain activity anywhere — even in a dead salmon.

REFERENCE: "Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Multiple Comparisons Correction," Craig M. Bennett, Abigail A. Baird, Michael B. Miller, and George L. Wolford, Journal of Serendipitous and Unexpected Results, vol. 1, no. 1, 2010, pp. 1-5.


http://www.improbable.com/ig/winners/ (http://www.improbable.com/ig/winners/)
Thanks - this is great stuff. They clearly had fun writing the "method" part of the write-up...
http://www.jsur.org/ar/jsur_ben102010.pdf (http://www.jsur.org/ar/jsur_ben102010.pdf)

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-22 12:53:07
Certainly not: the point explicitly made is that the results are open to two interpretations. You are insisting on one interpretation so dogmatically that the other, equally valid one, is being dismissed.
I'm not sure they can be equally valid. One has been proven to be true sometimes (people swearing they hear a difference when nothing has been changed), the other cannot be tested.

Does an untestable hypothesis even have a place in science?

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-22 13:44:12
Does an untestable hypothesis even have a place in science?


That question is certainly up for discussion. Popper would answer a resounding 'no': untestable hypotheses are unscientific. That is not to say they are unimportant (imagine there is a $DEITY that sentences you to salvation or damnation based on the colour of your shoes – you can hardly dismiss that as unimportant), just to say that they are outside the realm of science. That is, kinda, twisting the question around: does science even have a place in the discussion of untestables?

Then on the other hand, you have cases which are in principle testable, but you won't ever find the data – or it may or may not be that enough information will ever be revealed (you don't know yet). Form a “We believe that this happened because ...” hypothesis. Likely there is a grey area between “testable”, “will become testable, just wait and see”, “may or may not ever become testable” and “won't ever be testable”. And what if it is “in principle testable”, but you are fully aware that hardly anyone makes a decent attempt at it? Is that a failure of science as such? Is it unscientific to base yourself on such a hypothesis? (Warning: potentially leading trick question.)
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-23 10:51:49
I think that's an intelligent response, and this could be an interesting discussion.

However, I'm reminded of the reality of blind and sighted testing, and I think we're way off on a tangent.

Sighted listening tests can have all the problems of "stress" and "altered perception" that item ascribes to blind tests. If item is right, the very fact that we are not listening purely for the enjoyment of the music (or why ever you normally listen to your stereo), and are listening, at least partly, with a view to forming a judgement - that simple fact has nullified our attempt to form a correct judgement related to the practice of normal listening.

Blind, sighted, whatever - irrelevant. Asking the question has made the question unanswerable.


And yet, in practice, it's only expectation bias that seems to be a problem. Everything else has an effect on what I perceive, but not a systematic effect which makes me consistently prefer the wrong thing - not if the test is designed properly.

I think Arny quietly makes the same excellent point over and over again, such that readers miss the power and relevance of it: people who haven't even tried sighted and blind and double blind testing really don't know what they're talking about. While the philosophical discussions we've had here are interesting, and consideration of the "what ifs" might be great fun, it's a complete and utter waste of time compared to getting some practical experience of the issues surrounding these things - and learning about your own responses to sighted vs blind testing. That's the real eye (ear!) opener. Do that - experience the magnitude of the expectation bias problem vs every other possible problem - and then come back and argue the philosophy of the situation if you still think it's relevant.

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: dhromed on 2012-10-23 11:17:48
Sighted listening tests can have all the problems of "stress" and "altered perception" that item ascribes to blind tests. If item is right, the very fact that we are not listening purely for the enjoyment of the music (or why ever you normally listen to your stereo), and are listening, at least partly, with a view to forming a judgement - that simple fact has nullified our attempt to form a correct judgement related to the practice of normal listening.


Item's argument is "I hypothesize incorrectly that there is a problem with DBT for audio, based on flawed reasoning and fantasy. Now please demonstrate that this isn't so." It's a cop-out. We cannot learn anything from this line of thought.

Had I not believed that item is sincere, I would have accused him/her of eloquently trolling the board.
Title: Overcoming the Perception Problem
Post by: StephenPG on 2012-10-23 11:56:29
This might explain it, if it's the same Item?


Item Audio (http://www.itemaudio.com/index.php)

Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-23 12:26:51
I for one do not discard the statement “Putting people in a testing lab alters their behaviour”. Not even if “perception” is part of “behaviour”. Greynol (post #9) puts forth the opinion that people are more alert in a testing lab – in which case the behaviour does indeed change.

But it is hardly controversial to claim that the test setup matters. Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored.

- If the lab setup misses some (real, as opposed to imagined) differences only every now and then (unsystematically), then it simply requires more trials to pin them down. That is not much of an issue as long as the number is reasonable. If it is/were so that a sighted test “detects” a difference in two trials but is useless to prove any differences in a thousand, and a DBT detects and reliably establishes in fifteen, then the latter is superior.
And even then, the practitioners would have a gut feeling on how many trials would suffice. A hidden piece of information saying that under ideal circumstances you could reduce the # of trials from 15 to 8 ... ? well, that is just a matter of cost-efficiency.

- It is known and completely uncontroversial that a statistical test with a finite number of trials is inherently prone to miss real effects which are sufficiently weak. If the lab setup completely misses some real differences – either because they for some reason show up so much more rarely that we cannot detect them, or because for some mysterious reason they are completely gone – well, too bad; but compare this to a test setup which we know from the outset distorts the result so much that it is useless from the beginning.


I guess the latter is only food for that typical anti-science stance of “error in one, error in all”, used to lump together every science that has ever made an inaccurate statistical prediction or refrained from making one that would by coincidence have hit. The pet of denialists of evolution, geosciences, smoking-induced cancer, not to mention Godwin's ineffable application of the infinite monkey theorem. Statistical science gets things wrong. It is a method of reducing errors, of beating them down by numbers, but not of getting rid of them all at one omniscient stroke – that's a luxury reserved for those who have had a G-d-given truth revealed to them once and for all. To those, any misprediction – or prediction that failed to be made – is just proof that science cannot be trusted any more than the previous doomsday that passed unnoticed.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-23 20:09:04
I for one do not discard the statement “Putting people in a testing lab alters their behaviour”. Not even if “perception” is part of “behaviour”. Greynol (post #9) puts forth the opinion that people are more alert in a testing lab – in which case the behaviour does indeed change.


The above is an incomplete statement of the claims we're dealing with.

The actual claim is “Putting people in a testing lab always alters their behavior in such a way that they are always transformed from highly sensitive detectors of audible differences into lumps of coal who are deaf to all but the grossest of auditory stimuli".

My point is that the alteration of behavior is always negative for audible differences, according to the DBT critics.

No known instances of even the best of their listeners ever beating the sensitivity-robbing failings of the DBT system, ever.

That would appear to be their story. ;-)
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-24 09:06:55
The actual claim is “Putting people in a testing lab always alters their behavior in such a way that they are always transformed from highly sensitive detectors of audible differences into lumps of coal who are deaf to all but the grossest of auditory stimuli".


And even then one would have to establish that the highly sensitive detectors of audible differences aren't swamped by the highly sensitive detection of placebo. One could in principle be lucky, in that the information revealed in sighted tests overshadows the misinformation, but why bet on it? (It isn't that hard to check, given time and resources and ... a slight willingness to sacrifice honesty – what about a sighted version of http://www.matrixhifi.com/ENG_contenedor_ppec.htm (http://www.matrixhifi.com/ENG_contenedor_ppec.htm) bar a white lie about which gear was playing? Edit at the end of the posting.)

Or we could just put the snakeoil investment plan on hold until we get better (and less uncomfortable) brain scans.


Edit: should have bookmarked this one, but google is my friend:
[blockquote]We also heard David Wilson's fascinating presentation of his conception of system hierarchy. He compared a pair of Wilson Sophias driven by a Parasound stereo power amplifier with a competitor's flagship speaker and an extremely powerful premium-priced amplifier. Not, as he explained, because he thought the Sophias sounded better, but to prove that meaningful comparisons could be made between systems assembled according to different priorities. This was a demo aimed at his hi-fi dealer clientele, after all (it's a trade show, remember?), but there's a kicker: after we all confirmed that we could hear meaningful differences, Wilson whipped a fake component shell off the digital source and revealed that with the Wilson speakers we weren't listening to the $20,000 CD player that had been used for the competitor's speakers, but an Apple iPod playing uncompressed WAV files![/blockquote]
http://www.stereophile.com/news/011004ces/ (http://www.stereophile.com/news/011004ces/) , found via http://www.head-fi.org/t/486598/testing-au...laims-and-myths (http://www.head-fi.org/t/486598/testing-audiophile-claims-and-myths) .
Title: Overcoming the Perception Problem
Post by: Calvin on 2012-10-24 09:36:42
Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored.


I am no scientist nor technician (nor a native English speaker either, as you may guess; I am sorry), but I really don't understand how the ABX method is expected to ban “placebo” as a variable of the testing procedure. If I understand the way the tests are run, it is not “blind” at all. People know which are the A's and which are the B's, and they are asked to match them to X's. If your inner belief is (possible placebo effect) that there is an “audible” difference between A's and B's and the difference is not “audible”, you will fail to get a result beyond mere random guessing. In this case we can affirm the placebo effect is avoided (but at the cost of not being able to tell randomness from a maybe statistically relevant result). But what if your inner belief is that there is no difference at all? I guess in this case placebo would affect the result of the test. My point is that this method does not seem “scientific” at all to me, it is completely “asymmetric”, as shown by the fact that in the best hypothetical case it can verify something only if it's obvious under test conditions and is unable to falsify any hypothesis, from the less than obvious to the absolutely impossible.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-24 09:49:17
Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored.


I am no scientist nor technician (nor a native English speaker either as you may guess, I am sorry) but I really don't understand how the ABX method is expected to ban “placebo” as a variable of the testing procedure. If I understand the way the tests are run it is not “blind” at all.


You certainly have a few points. Brief and itemized:
- The three-letter acronym you quoted was “DBT”, not “ABX” 
- ABXing can be done sighted or not. (“can” means “can”, not “should”)
- Yes, those prejudiced to believe that there is no difference will be inclined to report no difference. But
(i) As we cannot, strictly speaking, prove negatives any more than we can disprove a “before this universe was born, I was incarnated as a Russell's teapot (http://en.wikipedia.org/wiki/Russell%27s_teapot)” (yes this is totally unsymmetric, that is well known ... and widely accepted as appropriate) – the conclusion from a negative is simply “Do not reject the null hypothesis”. As opposed to “Null hypothesis proven”, which is not valid.
(ii) We can remedy by introducing a comparison with known difference. (E.g., if test subject cannot differentiate between first-release and a brickwalled remaster, then you should not be surprised if they cannot identify a 96kb/s lossy of one of them.)
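Porcus's points (i) and (ii) can be sketched in a few lines of code. This is a toy model, not anyone's actual test software: the per-trial detection probabilities are assumed purely for illustration, and the "positive control" is the known-audible comparison suggested in (ii).

```python
import random
from math import comb

def binomial_p(correct, trials):
    """One-sided p-value: chance of scoring >= `correct` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def abx_session(p_detect, trials, rng):
    """Toy listener: hears the difference with probability p_detect per trial,
    otherwise flips a coin."""
    correct = 0
    for _ in range(trials):
        if rng.random() < p_detect:
            correct += 1              # genuinely identified X
        elif rng.random() < 0.5:
            correct += 1              # lucky guess
    return correct

rng = random.Random(2012)
trials = 16

# (i) A listener who hears nothing scores at chance; the conclusion from a
# negative result is only "do not reject the null hypothesis".
null_score = abx_session(0.0, trials, rng)

# (ii) Positive control with a known-audible difference: if even this fails,
# the session itself, not the comparison under test, is suspect.
control_score = abx_session(0.9, trials, rng)

print("null:", null_score, "p =", round(binomial_p(null_score, trials), 3))
print("control:", control_score, "p =", round(binomial_p(control_score, trials), 3))
```

The asymmetry Calvin worries about is visible here: the null listener's score supports no conclusion either way, while the positive control gives the negative result some interpretive weight.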
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-24 09:55:09
There is little point to any listening test, blind or sighted, where the listener is pre-disposed to ignore any audible differences.

The onus is on those who say/believe that they hear an audible difference to demonstrate it in a double-blind test. Those who say there is no audible difference have no reason to take the test.

(Though, occasionally, people who don't think they hear an audible difference will take the test for the heck of it, or to satisfy their own curiosity, and in doing so sometimes prove to themselves and others that there is a barely audible difference).

Cheers,
David.
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-24 11:34:45
My main interest in ABX software is to determine if a treatment I am considering is worthwhile or not. This is mainly in doing "restoration" of old recordings. There is generally, if not always, more than one way to deal with a problem.

As an easy example, consider declicking a recording made from an old LP, although declicking is not where I tend to do such comparison testing. At one extreme, one can proceed manually, click by click, trying to get the best result each time. At the other extreme one can run a number of batch steps on the entire recording, without regard to the particular treatment of any individual click. Batch declicking ALWAYS modifies many transients that are not clicks. This can easily be verified objectively from the data, without regard to what it sounds like.

The point of my making the comparison is that it could mean the difference between twenty-five hours of careful, repetitive-stress-damage-producing work and a half hour of automated computer processing. The preliminary tests, to decide what to do, involve selecting a couple of short passages that seem likely to show differences, then doing them in two or more ways, then testing to find out if I can tell any difference in the finished products. As with testing lossy compression against an original, it is usually easy to show a physical difference. The question is whether or not that difference is audible.

Sometimes I am gratified to find that the fast easy way is just as good as the long hard way, although I always wonder about the possibility that other people might hear something I don't. There have been many times where I cannot identify any difference in the treatments. As far as I can tell, I make choices during the test because I have to say either X or Y in order to go forward. My random scores say the different approaches make for identical results, audibly speaking. This is the result I often want; it lets the work proceed with less effort and time.

However, on a number of occasions, although I cannot identify any difference, I have come up with perfect matches on every guess. This could happen randomly, even though the probability is very small. But is randomness a reasonable explanation for it to occur every once in a while -- for one individual?
Title: Overcoming the Perception Problem
Post by: 2Bdecided on 2012-10-24 11:40:28
However, on a number of occasions, although I can not identify any difference, I have come up with perfect matches every guess. This could happen randomly, even though the probability is very small. But is randomness a reasonable explanation for it to occur every once in a while -- for one individual?
About 1-in-20 for p=0.05
Title: Overcoming the Perception Problem
Post by: Calvin on 2012-10-24 12:55:48
You certainly have a few points. Brief and itemized:
- The three-letter acronym you quoted was “DBT”, not “ABX” 
- ABXing can be done sighted or not. (“can” means “can”, not “should”)
- Yes, those prejudiced to believe that there is no difference, will be inclined to report no difference. But
(i) As we cannot, strictly speaking, prove negatives any more than we can disprove a “before this universe was born, I was incarnated as a Russell's teapot (http://en.wikipedia.org/wiki/Russell%27s_teapot)” (yes this is totally unsymmetric, that is well known ... and widely accepted as appropriate) – the conclusion from a negative is simply “Do not reject the null hypothesis”. As opposed to “Null hypothesis proven”, which is not valid.


Yes, I’m sorry, I omitted a step. I was talking about the way double blind procedures are said to be applied to ABX and other analogous methods of conducting listening tests. The Russell's teapot argument has nothing to do, in my opinion, with the effectiveness of the way you plan to rid your test results of influences caused by subjective reactions not correlated with the stimulus you are testing. In the case of listening tests, in my opinion, the usual way has logical flaws.
DBT procedures imply that the examiner and the examinees are both unaware of any relevant information that might influence (consciously or unconsciously) the result of the test. DBT procedures were developed to run tests where “the mind” might influence results, but where you can also verify results by direct observation of the examinee. This is impossible for perceptual listening tests, and their results may also be influenced (being perceptual) by almost everything. What I mean is that the level of “blindness” must be set to maximum, and you have to cross-check your results and your test procedures, and if you use a statistical approach you need to be very rigorous.

Quote
(ii) We can remedy by introducing a comparison with known difference. (E.g., if test subject cannot differentiate between first-release and a brickwalled remaster, then you should not be surprised if they cannot identify a 96kb/s lossy of one of them.)


Yes, that is a way I was thinking about. But I have the feeling that even in highly regarded scientific circles (such as the AES) this is not the approach (though I may be proven wrong), and everybody prefers to run a raw ABX test without too many worries. I think the attitude towards the whole thing is the one expressed, e.g., by 2Bdecided's last post, and in my opinion it is not a useful one.

Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-24 13:38:46
Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored.


I am no scientist nor technician (nor a native English speaker either as you may guess, I am sorry) but I really don't understand how the ABX method is expected to ban “placebo” as a variable of the testing procedure. If I understand the way the tests are run it is not “blind” at all. People know what are A's and what are B's and they are asked to match them to X's.


The test is sighted for As and Bs, but blind for Xs. Only Xs are scored.

Quote
If your inner belief is (possible placebo effect) that there is an “audible” difference between A's and B's and the difference is not “audible” you will fail to get a result beyond mere odd.


odd? I hope you mean random.

Random scores for identifying the X's are consistent with the idea that ABX is a DBT.

Quote
In this case we can affirm placebo effect is avoided (but at the cost of not being able to tell odd from a maybe statistically relevant result). But what if your inner belief is that there is no difference at all?


Often, the sighted part of the test is sufficient to dispel that notion.

Quote
I guess in this case placebo would affect the result of the test.


You can check that out by not telling the listeners what A & B actually are, or even simply lying to them. Simple enough, and it's been done many times. Doesn't seem to improve the results.

Quote
My point is that this method does not seem “scientific” at all to me, it is completely “asymmetric” as shown by the fact that in the best hypothetical case can verify something only if it's obvious under test condition and is unable to falsify any hypothesis from the less than obvious to the absolutely impossible.


It appears to me that you don't understand the test. I sense a language problem.

You're staking your argument on the word obvious, which seems to be used in a very vague way.

For example, aren't all reliably audible differences in some sense obvious?

The ABX critics will say that ABX works well enough for differences that are very, very obvious, but not at all for differences that are in their judgement merely obvious.

Your obvious is my subtle or vice-versa! ;-)
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-24 13:43:25
There is little point to any listening test, blind or sighted, where the listener is pre-disposed to ignore any audible differences.


Right, and the listener training scheme I point out in an earlier post filters those people out.

Quote
The onus is on those who say/believe that they hear an audible difference to demonstrate it in a double-blind test. Those who say there is no audible difference have no reason to take the test.


Agreed.

Quote
(Though, occasionally, people who don't think they hear an audible difference will take the test for the heck of it, or to satisfy their own curiosity, and in doing so sometimes prove to themselves and others that there is a barely audible difference).


Agreed.

The sighted aspects of ABX can help that happen more often.
Title: Overcoming the Perception Problem
Post by: Calvin on 2012-10-24 15:20:33
odd? I hope you mean random.

Yes, I am sorry; please replace the word "odd" in my post with "random guessing". Please do not think mine is an attack on the ABX method in itself; I just have doubts aroused by the way the tests are in many cases actually done, and maybe by the fact I do not understand the procedure (and the language). So before I reply to this post saying something stupid, please give me a link to this:

Quote
Right, and the listener training scheme I point out in an earlier post filters those people out.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-24 17:25:40
odd? I hope you mean random.

Yes, I am sorry; please replace the word "odd" in my post with "random guessing". Please do not think mine is an attack on the ABX method in itself; I just have doubts aroused by the way the tests are in many cases actually done, and maybe by the fact I do not understand the procedure (and the language). So before I reply to this post saying something stupid, please give me a link to this:

Quote
Right, and the listener training scheme I point out in an earlier post filters those people out.




http://www.hydrogenaudio.org/forums/index....st&p=811818 (http://www.hydrogenaudio.org/forums/index.php?showtopic=97365&view=findpost&p=811818)
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-25 08:40:39
However, on a number of occasions, although I cannot identify any difference, I have come up with perfect matches on every guess. This could happen randomly, even though the probability is very small. But is randomness a reasonable explanation for it to occur every once in a while -- for one individual?
About 1-in-20 for p=0.05


The statistics course I took was interesting, and kind of fun, but it was a very long time ago. I've had no use for trying to remember any of it since then except for very simple circumstances. With my usual ten trials for a sample, that comes out to p=0.001 that all 10 of my correct guesses were the result of chance, or likely to happen 1 time in 1000, no? If it happened with two consecutive ten-trial tests, each on a different album, it would be the product, 0.001 × 0.001, or 1 in 1,000,000?

I'm not sure what the calculation would be if, instead of consecutive tests, the two peculiar outcomes of ten-trial tests were separated by ten or twelve different albums for which the tests were more easily comprehended. I.e., instead of ten trials of correct guesses where I am unable to tell how I make the correct choice, each of the intervening album tests seemed reasonably clear: I can't tell any difference and the results are about 50/50, or the results are, more or less, ten perfect guesses, but I can tell why.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-25 10:50:58
With my usual ten trials for a sample, that comes out to p=0.001 that all 10 of my correct guesses were the result of chance, or likely to happen 1 time in 1000, no? If it happened with two consecutive ten-trial tests, each on a different album, it would be the product, 0.001 × 0.001, or 1 in 1,000,000?


The coin has 50/50 chance of getting one right. Under the null (i.e., if you were guessing like a fair coin), the probability of getting N out of N right, is 1:2^N. With N=10, the coin has got the chance of 1/1024. For N=20, the coin has got the chance of  1/(1024*1024), i.e. slightly less than one in a million.

If you get 9 of 10 right: there are 1024 different outcomes; one with 10/10 right and ten with precisely 9/10. The probability of getting 9 or more by coin-flipping is 11/1024, or just above 1 percent. (That is: if you are targeting a p=.01 threshold, decide to do 11 trials rather than 10. Also, 8/10 barely misses the .05 threshold.)
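As a minimal sketch of the arithmetic above (plain Python; the function name is my own, not from any ABX tool):

```python
from math import comb

def abx_p_value(correct, trials):
    """Exact one-sided p-value under the null: the probability that a
    fair coin scores `correct` or better out of `trials` guesses."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2**trials

print(abx_p_value(10, 10))  # 1/1024, roughly 0.001
print(abx_p_value(9, 10))   # 11/1024, just above 1 percent
print(abx_p_value(8, 10))   # 56/1024, about 0.055 -- misses the .05 threshold
print(abx_p_value(20, 20))  # 1/1048576, slightly less than one in a million
```

This is the same 1:2^N tail sum written out, so the 10/10, 9/10 and 20/20 figures above drop straight out of it.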


I'm not sure what the calculation would be if, instead of consecutive tests, the two peculiar outcomes of ten-trial tests were separated by ten or twelve different albums for which the tests were more easily comprehended.


By “for which the tests were more easily comprehended”, do you mean “for which the differences are easier to catch” or ... ?

If you are testing fourteen albums, score 50/50 on thirteen of them and 9/10 on the fourteenth, that just misses the .05 threshold. Is your question then whether the fourteenth is different from the others?
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-25 12:39:13
To try to state it more clearly.

First, I am talking about samples treated two or more different ways, so there are data differences, but I don't know if there will be audible differences. I run ABX tests to find out what I can or can't hear.

I do a ten-trial test on a sample. I cannot (consciously) hear any difference, therefore I have no idea whether X is A or B. I make a choice based on how I feel about it at the moment, which is indeed rather vague and may or may not be identical to flipping a coin and using the result. Anyway, I get all ten correct.

There is some probability that I did this simply through random chance. Now I choose another sample. Let's make it from a different album (also recorded from an LP) to make it less likely that I am somehow biasing things with the album I just used. Also let's choose some different treatment to use on this sample, to make sure there isn't a lurking bias in how I prepared the two copies of the sample.

Now I run an ABX test with ten trials on this new material. Again, I can't hear any difference but I make an attempt to guess something. Again I get all ten correct, I know not how.

If looked at in isolation, the probability for this second test's results is the same as the probability for the first test's results. However, if both are random results, together the probability is much smaller. It is at least as small as getting 20 random correct guesses on one test of twenty trials, no? If I had run 200 independent tests to get those two peculiar ones, they might seem less peculiar, or, perhaps better to say, it might seem less likely that there is some unconscious but non-random factor operating.

The question presented in the last post has to do with such 10/10 tests happening every once in a while. How is the probability of N such results computed if there are five, six, or twelve (but not hundreds) of non-weird tests conducted between those that produce this peculiar result? Is there any difference, considering all the tests done, whether the weird ones occur consecutively or at random intervals among more probable results?

The more probable results were defined in the last post as (1) I get good scores because I can recognize the differences and (2) I get random scores because I can't recognize the difference. The "weird" tests are those where I can't identify a difference but get all correct.

People often do things without conscious intention and without conscious awareness. Parents respond in a quite unreasonable way to something their child does and are totally unaware that they are repeating the irrational emotional behavior of their own parents, who probably learned it from their parents, etc.

Emotions affect perception. Expectation and belief affect perception. The main concern here at HA is the tendency of these factors to produce perceptions that do not conform to sensory input. The ABX test is supposed to block that.

Too many of the "weird" results described above might be evidence of the opposite effect: the differences are not below the threshold of detection but for some reason are below the threshold of consciousness.

That may be a rather outré hypothesis, but that is the reason for my questions. Having experienced it myself from time to time (with no way to know if it is or is not really just random), I've often wondered about this when people report high identification scores without really saying whether or not they knew what they were hearing -- or maybe I just haven't paid close enough attention to what has been written.

Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-25 13:54:45
To try to state it more clearly.

First, I am talking about samples treated two or more different ways, so there are data differences, but I don't know if there will be audible differences. I run ABX tests to find out what I can or can't hear.

I do a ten-trial test on a sample. I cannot (consciously) hear any difference, therefore I have no idea whether X is A or B. I make a choice based on how I feel about it at the moment, which is indeed rather vague and may or may not be identical to flipping a coin and using the result. Anyway, I get all ten correct.

There is some probability that I did this simply through random chance. Now I choose another sample. Let's make it from a different album (also recorded from an LP) to make it less likely that I am somehow biasing things with the album I just used. Also let's choose some different treatment to use on this sample, to make sure there isn't a lurking bias in how I prepared the two copies of the sample.

Now I run an ABX test with ten trials on this new material. Again, I can't hear any difference but I make an attempt to guess something. Again I get all ten correct, I know not how.


AFAIK, you are not doing a test with a guaranteed null outcome, like say a comparison of interconnects.

The results you've described could be attributed to listener learning.

The big question is what happens when you run the third and fourth sets of 10?
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-25 17:19:33
If looked at in isolation, the probability for this second test's results is the same as the probability for the first test's results. However, if both are random results, together the probability is much smaller. It is at least as small as getting 20 random correct guesses on one test of twenty trials, no?


Let me address this first – I know you ask more interesting questions below. You have a null hypothesis that you are equivalent to a fair coin. The p-value is calculated under this hypothesis. But if the stopping time is stochastically dependent on the history, then you can no longer think in these 2^n terms. If you have prespecified “do 10 and then do 10 on another sample pair”, then the probability that the coin would get everything right is 1:2^20, or about one in a million.

But – and I know this is out of line with your hypothetical results – if you fail the first test, then run a second, and stop because that one turned out with a low “standalone” p-value, then what? That's a different story. To calculate the coin's probability, you need to specify the stopping rule precisely, and the easy way of doing that is to require the user to prescribe a number of experiments in advance: do this, then carry the result to your local statistician.
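A quick simulation illustrates why the stopping rule matters. This is my own sketch (not anyone's actual test protocol): a pure guesser keeps trying fresh sample pairs and stops at the first 10/10, and the false-positive rate comes out far above the 0.001 standalone p-value of any single 10/10 result.

```python
import random

def guessing_abx(trials=10):
    """Score of one ABX run by a pure guesser (a fair coin)."""
    return sum(random.getrandbits(1) for _ in range(trials))

def first_perfect_score(max_pairs=14, trials=10):
    """Keep testing fresh sample pairs; stop at the first 10/10."""
    return any(guessing_abx(trials) == trials for _ in range(max_pairs))

random.seed(1)
runs = 50_000
rate = sum(first_perfect_score() for _ in range(runs)) / runs
# Under the null, the chance that at least one of 14 pairs comes out
# 10/10 is 1 - (1 - 1/1024)**14, about 1.4% -- an order of magnitude
# above the standalone p of roughly 0.001 for a single 10/10.
print(rate)
```

The per-pair p-value hasn't changed; what changed is that the guesser gets fourteen chances and reports only the hit.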




The question presented in the last post has to do with such 10/10 tests happening every once in a while. How is the probability of N such results computed if there are five, six, or twelve (but not hundreds) of non-weird tests conducted between those that produce this peculiar result? Is there any difference, considering all the tests done, whether the weird ones occur consecutively or at random intervals among more probable results?

The more probable results were defined in the last post as (1) I get good scores because I can recognize the differences and (2) I get random scores because I can't recognize the difference. The "weird" tests are those where I can't identify a difference but get all correct.



I'll present a setup, and it may or may not be what you have in mind:

Suppose that you have many distinct pairs: signal1A+signal1B, signal2A+signal2B, etc., up to signalNA+signalNB. Suppose that you ABX each pair “10 times” – call that a “pair” in the following.  Calculate p-values, pair by pair.  Then one question is: how likely is it that one of these pairs would have a certain “low” p-value?  I.e., you want the distribution – under the null – of the smallest of p1, p2, ..., pN, and then the appropriate quantile of that distribution as a benchmark.  Again it is crucial to specify the test for each pair – and the total number of pairs – in advance.  (Otherwise, an ignorant or dishonest user could simply stop adding pairs the first time the numbers look favourable.)
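For independent pairs, the distribution of the smallest p-value has a simple closed form, sketched below (function name my own):

```python
def prob_some_pair_passes(alpha, n_pairs):
    """Under the null (every pair pure guessing), the probability that
    the smallest of n independent p-values falls below alpha:
    P(min p <= alpha) = 1 - (1 - alpha)**n."""
    return 1 - (1 - alpha) ** n_pairs

# A single 10/10 has standalone p = 1/1024, but across 14 pairs the
# chance that *some* pair scores 10/10 by luck is about 1.4 percent:
print(prob_some_pair_passes(1 / 1024, 14))
```

So a p-value that looks impressive on its own can be quite ordinary once you count how many pairs were tried.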

More generally, you might for each single pair #n form an alternative hypothesis Hn: “Detectable difference between the A and B of pair #n.” Then ask the questions:
- what is the number of “false positive” pairs reported if you calculate each p-value and cherry-pick?
- what should the threshold be to reject the “Every Hn false” null?
- what is the expected number of true Hn's, given the data?


There is a theory for testing multiple hypotheses simultaneously.  One not uncommon way is the Bonferroni correction, see http://en.wikipedia.org/wiki/Bonferroni_correction (http://en.wikipedia.org/wiki/Bonferroni_correction) and the references therein.  Another approach is Schweder / Spjøtvoll (1982) (free version here) (http://folk.uio.no/tores/Publications_files/Schweder_Spjotvoll_1982.pdf).
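A minimal sketch of the Bonferroni correction mentioned above, applied to the fourteen-album scenario from earlier in the thread (function name and numbers are illustrative):

```python
def bonferroni_reject(p_values, family_alpha=0.05):
    """Bonferroni: reject hypothesis i only when p_i <= alpha / N,
    which caps the family-wise error rate at alpha."""
    n = len(p_values)
    return [p <= family_alpha / n for p in p_values]

# Fourteen pairs: thirteen near 50/50 (p close to 1) and one 10/10.
p_vals = [1.0] * 13 + [1 / 1024]
print(bonferroni_reject(p_vals))
# Only the 10/10 pair survives: 1/1024 <= 0.05/14 (about 0.0036).
# A 9/10 score (p = 11/1024, about 0.011) would NOT survive the
# corrected threshold, even though it passes 0.05 on its own.
```

Bonferroni is conservative; the Schweder/Spjøtvoll approach linked above is one of the less crude alternatives.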




People often do things without conscious intention and without conscious awareness.


Yes.  They take notice when something “incredible” happens. Surely this cannot be a coincidence? Yes it can, if the number of trials is big – and did you count how many “attempts” Chance made at getting your attention?

If one can test again, then fine: do that. Use random/arbitrary discoveries as a basis for forming hypotheses, and then set up a new single test.  (Does not work in the science of history ... unless you are so unfortunate that (i) people don't learn, and (ii) you have a hotline to a dictator who thinks your experiment idea sounds funny.)
Title: Overcoming the Perception Problem
Post by: AndyH-ha on 2012-10-26 13:04:14
So it really isn't possible to say much, statistically, about some casual observations of what passes one by. One has to adopt some particular statistical model and collect data in accordance with the model's requirements. I suspect, in regard to this question, the most one could say after collecting enough data is that the number of 10/10 scores, where I believe I hear no difference between A and B, is either within normal variations or is unusually common, depending upon how the numbers add up. If one wanted to actually get a handle on the idea of whether or not (some) people can hear, and respond to, small differences, without being aware that they can hear them, one would have to come up with some more clever experiments.

Most of the time when I am convinced that I do not hear a difference, I give up part way through. Perhaps that indicates some kind of psychological bias against finding that there is (seems to be) a difference in the treatments. I think it just indicates boredom. I don't think I ever even bother to check the score for the trials I did complete. Those that I mentioned, 10 correct out of 10 when I have no idea which is which, are exceptions where I had some particular reason, or whim, to produce a score.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-10-26 14:24:23
Most of the time when I am convinced that I do not hear a difference, I give up part way through. Perhaps that indicates some kind of psychological bias against finding that there is (seems to be) a difference in the treatments. I think it just indicates boredom. I don't think I ever even bother to check the score for the trials I did complete. Those that I mentioned, 10 correct out of 10 when I have no idea which is which, are exceptions where I had some particular reason, or whim, to produce a score.


I see a different issue here. When we make a change we are hoping for a difference that we don't have to resort to high-effort testing to hear.

I've done ABX tests that were positive for audible differences without ever actually hearing what I thought was really a difference. There was a technical difference that on a good day may have been large enough to hear, but it was on the borderline.  I walked away not exactly a fan of working to obtain that difference. ;-)
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-26 18:02:11
So it really isn't possible to say much, statistically, about some casual observations of what passes one by. One has to adopt some particular statistical model and collect data in accordance with the model's requirements.


Well ... even when you cannot get the one and only true p-value out of a dataset without making explicit or implicit assumptions, the working practitioner need not be completely lost, of course.  In sciences where you cannot redo experiments, and have to take the data you get, statistical analyses are still possible, but they will be more vulnerable to the assumptions of the statistical model. For example, just because you cannot run the dinosaur age over again, that doesn't mean statistics couldn't be a useful tool in paleontology. And it is hardly controversial to claim that the Wall Street crash 25 years and a week ago is sufficient to reject a hypothesis of random-walk Gaussian log-returns.

Among financial analysts there was a tongue-in-cheek expression, “this month's million-year event”; on the one hand, if you dig through your data looking for something that looks weird, you will find it, but on the other, using “worst day” as a test statistic isn't that far from what you could have chosen ex ante. You may of course argue that if you just pick up the ex post extreme events, you should have taken into account e.g. the mercantile exchange as well (if nothing happened there, there would be – grossly oversimplified – twice as many “normal” days, right?), but that only contributes a minor tweak to an insane p-value.

There are a couple of types of inference we must avoid. One is “a man produces so many sperm that the p-value of your having precisely that genetic combination is one in a hundred million (even given that we know which ejaculation you were conceived from)” [no pun intended for the p here ...]. The fallacy: shouldn't there have been a winner in the raffle/tombola? Another: suppose you have a lottery based on betting on a random draw of one number from 1 to (large) N. It isn't given that anyone will have bet on the correct one, and an ex ante probability (under the null) of someone winning requires some (statistical) knowledge of the bets. One bet? A billion bets? It matters.
Title: Overcoming the Perception Problem
Post by: Porcus on 2012-10-26 18:35:03
Here you have two apparent statistical issues – well, assuming the article is to be believed: http://www.guardian.co.uk/world/2012/oct/2...l-sweden-murder (http://www.guardian.co.uk/world/2012/oct/20/thomas-quick-bergwall-sweden-murder) . The case in brief: convicted of several murders without much more evidence than confessions; in many cases the prosecution even argued that his confessions were indeed reliable (he was, after all, insane).

The prosecution claimed that Bergwall had provided information that only the police and the murderer could know about. Let us disregard the claims that he could indeed have picked it up in the newspaper, and let us interpret “know about” as “guess without knowing”. Arguing by the infinite monkey theorem, he could of course have guessed sufficiently close, given a large number of attempts.
Had this been a listening test, it would have been the fallacy of reporting only your positives. Either on a single pair to be ABXed – or, alternatively, going to hydrogenaudio with the one nice ABX log out of a hundred.
(I suggested a partial foobar2000 fix against this: http://www.hydrogenaudio.org/forums/index....showtopic=96006 (http://www.hydrogenaudio.org/forums/index.php?showtopic=96006) )

Consider then Bergwall's claim that he could guess the right answer from leading questions.
That's lack of blinding. The test administrator is biased, he wants to have his product sold, and the listener can tell from the grin on his face that this time, X = his expensive product.
Title: Overcoming the Perception Problem
Post by: onlyconnect on 2012-12-01 14:16:52
OK: I'll stop worrying about being chatted up, and concentrate on worrying about the incoming ad hominem. Although, to keep my options open - in case you're just cheekily playing hard to get - I'm a 43-year-old Gemini vegetarian: mainly single, with a good sense of humour and a two-bedroom flat in a nice part of South London. I'm not representing anyone: I like music, I think how we listen is interesting. I'm not presenting myself as an authority. I think it's important not to ignore facts, and to be impartial when reasoning from inferences.


Umm, I know you on another forum as a trade member under the name "Item audio". Is that not relevant?

Tim
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-12-02 00:21:28
Consider then Bergwall's claim that he could guess the right answer from leading questions.


Seems like a very believable claim.

Further light would be shed by a critical review of a recording of the interrogation.

Quote
That's lack of blinding. The test administrator is biased, he wants to have his product sold, and the listener can tell from the grin of his face that this time, X = his expensive product.


No further effort need be expended to support a claim that single-blind tests are very likely to be compromised. It is an accepted fact.

Making listening evaluations double blind is usually easy enough. If there is any doubt, simply redo the evaluation as a DBT.
Title: Overcoming the Perception Problem
Post by: jaythesail on 2012-12-02 00:43:57
If I may include a side note to the discussion...

As a rather average punter I have a lot of respect for those who continually (and usually politely) hold the fort against the continual onslaught of misinformation, bad logic and just plain bs. To the regulars - I salute you.

For myself and my few audio enthusiast friends, ABX testing showing whether A is better than B isn't so significant. It is the ABX testing showing there is no difference that is more important. We know that there isn't some next higher level of musical enjoyment that we are missing out on because we haven't invested mega bucks into cables, conditioning sprays or whatever else someone might want to sell. Clarity to cut through the BS and focus on the simple stuff that matters.
Title: Overcoming the Perception Problem
Post by: Arnold B. Krueger on 2012-12-02 13:05:43
If I may include a side note to the discussion...

As a rather average punter I have a lot of respect for those who continually (and usually politely) hold the fort against the continual onslaught of misinformation, bad logic and just plain bs. To the regulars - I salute you.

For myself and my few audio enthusiast friends, ABX testing showing whether A is better than B isn't so significant.


Good for you to say that because ABX is not designed for testing preferences.

Quote
It is the ABX testing showing there is no difference that is more important. We know that there isn't some next higher level of musical enjoyment that we are missing out on because we haven't invested mega bucks into cables, conditioning sprays or whatever else someone might want to sell. Clarity to cut through the BS and focus on the simple stuff that matters.


When we invented ABX we knew that it was not a tool for testing preferences, so we were a bit worried about how to handle the age-old question of which sounds better. We were frankly stunned when we found that it was often very hard to hear differences among very different components when deprived of sighted and other cues.
Title: Overcoming the Perception Problem
Post by: StephenPG on 2012-12-02 15:35:17
Recording - room/loudspeaker and nothing else...

It really is that simple.