Topic: Overcoming the Perception Problem (Read 88529 times)

Overcoming the Perception Problem

Reply #50
The belief that sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception.
Is that not pretty much a summary of the placebo effect?

Reply #51
Is that not pretty much a summary of the placebo effect?

Yes. Does this change anything? Placebos could be shown to have a significant causal effect.

Reply #52
But is there a problem?

Reply #53
Not necessarily. Suppose I want to establish a “sufficiently good” (for whatever purpose) end-user format. Then I am not satisfied with your score on your music, unless I am only targeting you as a customer. Even if your music does not have nasty enough artifacts for you to detect (or find annoying), it might be different with other ears and other signals. (Of course, you then need to use the appropriate method (test / design of experiment) to check whether the accuracy is better than random, but that is a practical obstacle.)
If 5 percent of the listeners hear differences on 10 percent of their music collection, is then the format “transparent”? I think not. It may be good enough for the purpose, by all means, but it does not mean that there are no audible differences.
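To put rough numbers on the "5 percent of listeners, 10 percent of their music" case, here is a minimal sketch. It assumes the two shares are independent, so that a randomly picked listener/track pair exposes a difference with probability 0.05 × 0.10. The function name `pairs_needed` and the 95% target are illustrative choices, not something stated in the post.

```python
from math import ceil, log

def pairs_needed(listener_share, track_share, confidence=0.95):
    """Number of random listener/track pairs needed so that, with the given
    confidence, at least one pair is a case where a difference is audible.
    Assumes (hypothetically) that the two shares are independent."""
    hit = listener_share * track_share          # chance a single pair is audible
    # Solve 1 - (1 - hit)**n >= confidence for the smallest integer n.
    return ceil(log(1 - confidence) / log(1 - hit))

print(pairs_needed(0.05, 0.10))
```

Under those assumptions a handful of listeners and tracks would very likely miss the audible cases entirely, which is one way to read the point that "good enough for the purpose" and "no audible differences" are separate claims.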

You misunderstood me.
I am conducting an ABX test for MYSELF. I am not conducting an ABX test to gain statistical knowledge of whether people can hear a difference.
I understand you have to allow for some statistical chance of error when testing multiple users, but I am talking about a single person running the test for his (or her) own benefit and knowledge.
That margin, if testing codecs for personal knowledge, is irrelevant, IMO.

Reply #54
Yes. Does this change anything? Placebos could be shown to have a significant causal effect.
Yes, but in the case of sighted vs ABX, placebo causes the sighted test to be biased in favour of the source of which the test subject has a preconceived preference.

Reply #55
I do not see how calling the phenomenon "preconceived preference" changes anything. It is a variable one tries to eliminate in many tests, but why here? The subject, in their usual environment, produces different results than the same subject in a modified ("bias eliminating") environment. If the subject wants to compare a Burmester vs. a Teac vs. a Sansa Clip for future usage in their usual environment and as the person he/she is, a sighted test might be more appropriate than a DBT to identify the product with the best perceived (by this subject) performance.

Reply #56
The sighted test difference is not coming from the equipment; its origin is within the test subject. The results do not reveal anything about the equipment. Doing sighted tests just reinforces the individual's bias.

If the purpose of the test is to make the subject feel good about his preferences, then maybe the sighted test is useful: The person has just spent significant money. Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.

Reply #57
@googlebot: You are now allowing the results to be heavily skewed in favour of the equipment that the test subject "wants to be the best" (for whatever reason - cost, etc).

In your example it is no longer about whether any differences can be heard by the test subject in an objective test, but rather whether the test subject states a preference for the output of their preferred equipment in a blatantly subjective test.

Reply #58
Buyer's remorse is starting to hit hard. Ahh, relief! The sighted test says he made the right choices after all.

...unless buyer's remorse has sunk in and the person is now biased against the new purchase.

Reply #59
I am conducting an ABX test for MYSELF.

Even still, a failed test only fails to demonstrate that an individual can distinguish a difference during that instance. Training and/or rest may affect the outcome of a future test, as examples.

Reply #60
Except,  DBT doesn't do that.

Why do sighted tests regularly lead to different results, then? Just calling it "bias that should be eliminated" doesn't change the fact.

You wrote: Double-blind testing of isolated senses...

DBT does not necessarily 'isolate' any senses. You can see, hear, taste, touch, smell. All that has changed is what you *know*.

Imagine the following test setup: A test subject is presented music supposedly sourced from either a Sansa Clip or his favorite Burmester rack. You present an expensive looking switch to him, that's basically a dummy and that only inserts a small pause, but connects to the Clip at all times. Now imagine, you'd get a statistically significant result, that the subject rates the sound quality consistently higher, when he believes it to be coming from his Burmester rack / not coming from the Sansa Clip.

Now do a second test, this time double blind with both sources actually connected. Imagine the subject now fails to identify a difference.

What can we draw from this, especially when the subject was an honest type, sincerely motivated to rate the quality exactly as he perceived it in the first setup, without trying to prove or defy anything?

The first time he failed to identify that there was in fact no difference, and we can reasonably attribute that to sighted bias.  The second time he may well have successfully identified that there was no difference, or may have failed to identify a real, but small, difference.

First, as is HA habit, the subject should stop claiming that his Burmester setup sounds better than a Sansa Clip, as proven by the DBT. HA usually stops here.

But maybe one shouldn't. The belief that sound was coming from an impressively crafted sound system was able to significantly alter the subject's perception. In addition, the subject's usual mode of listening is reflected much better in the first setup than in the second (DBT).

This is no different from putting the same cheap wine in differently-priced bottles. Subjects often think the pricier wine tastes better. So, what does that tell us about the *wine*? What do your listener's *beliefs* about a piece of gear tell us about the *gear*? What claims can reasonably be made about the relative performance of A and B?

Reply #61
I am conducting an ABX test for MYSELF.

Even still, a failed test only fails to demonstrate that an individual can distinguish a difference during that instance. Training and/or rest may affect the outcome of a future test, as examples.

Yes, but I am conducting the test at that one point of time. And the results are valid for that test.
Of course you should take what, 16 full turns? But they don't have to be the same day. Or week.
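For the 16-trial case mentioned here, the chance of reaching a given score by pure guessing can be worked out directly. This is a minimal sketch, assuming each trial is an independent 50/50 guess under the "no audible difference" hypothesis. The function name `abx_p_value` is illustrative and not taken from any ABX tool.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of scoring at least `correct` out of `trials`
    when every answer is a coin flip."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Print the guessing probability for a few scores out of 16.
for score in (8, 11, 12, 14, 16):
    print(f"{score}/16 correct: p = {abx_p_value(score, 16):.5f}")
```

Because the trials are treated as independent, the arithmetic does not care whether the 16 trials are taken in one sitting or spread over several days, as long as the total number of trials is fixed in advance.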

Reply #62
Yes, but I am conducting the test at that one point of time.

Strange to read your initial postings in this thread now after you have tried to downplay the applicability of the test to a one-time personal experience with a clear-cut full sensitivity/specificity.

Reply #63
I think Googlebot is making a valid philosophical point. If the shiny thing sounds best to you (because of placebo), and you want the thing that sounds best to you, you can be very grateful to the shiny thing and placebo for delivering it.

Though you'd better not probe too deeply - because, while the only guaranteed way to completely remove placebo is to take away the knowledge of what you're listening to, you can certainly reduce placebo (or change the direction in which it operates) by introducing doubt as to whether something really does sound better.

This latter effect is probably the cause of the audiophile's never ending upgrade path.

The downsides to this are many. e.g.
1) your entire investment can be rendered worthless to you by anything that causes placebo to break down - that's a pretty risky investment.
2) if you had blind tested before purchase, you would probably have chosen the cheapest well-made thing that sounded as good as everything else - saving you money, and giving you your own unshakable placebo effect in enjoying that equipment - you see, ABX-lovers can enjoy their own placebo experience after having chosen the equipment. They've proven scientifically that it's as good as it needs to be, and then placebo can add to the subjective perception that it's as good as they could possibly perceive it to be.
3) some nice looking equipment sounds objectively awful, and doesn't work very well. While you might be able to convince yourself that it sounds wonderful, you'll still have the pain of unreliability, quirky/difficult operation, and the anxiety of damaging or wearing your music collection away every time you play an LP (if sighted testing led you to choose vinyl over CD).

However, sighted listening equipment purchasing is great for the economy (you just keep spending money), and it avoids time consuming things (e.g. proper listening tests), and difficult questions such as "how well can you hear anyway?"


Reply #64
I believe as a friend pointed out you mean "expectation bias" not "placebo."

Your environment certainly plays a big role when testing gear.

I mean look, the DBT is CLEARLY pointless in Evan's Sound Room (the "couple" test is a better metric):

(safe for work)

Reply #65
If ABXing negatively alters one's ability to hear differences, it's only a problem if you're using negative results to prove that there is no difference, which is a fallacy in any case: while a positive ABX result shows with a high degree of probability that there IS an audible difference, a negative result never proves anything.

Basically, what a 'negative' ABX result means is that the hypothesis 'there is an audible difference' was not supported, with a 'p' chance (typically 1 in 20) that an audible difference nevertheless exists.

This doesn't seem correct to me. What skamp said, I think, is accurate. One can apply the statistical analysis you mention to a test where the test subject, the listener, showed a strong ability to differentiate between the two sources; however, one can't apply the same statement if he or she only had *random* results.

Besides there not being any actual audible differences between the two sources to mortal ears, other possibilities for such random results might include:

A. The listener wasn't trying very hard or was sleepy/fatigued/ill etc.
B. The listener was mischievous and *intentionally* gave random results.
C. The test conditions, such as the resolution/accuracy of the loudspeakers used, weren't up to the task, that day, etc.

What's important to note is that these three possibilities, A, B, and C, are *precluded* when the listener successfully *does* differentiate between the two DUTs. That's why one can correctly apply the statistics to such an outcome, only. Sure, there's a one in twenty chance the listener's results were just dumb luck, however there's a 95% chance it was because they truly could hear a difference.

Reply #66
You're right that different terms apply when we are talking about rejecting vs accepting the *null* hypothesis (in this case, the 'no difference' hypothesis). Rejecting null H when it is true is a Type I error, accepting null H when it is false is a Type II.
When we get results we do statistics to calculate the probability that those results would have been obtained 'by chance'. This is the p value. We compare the p value to a pre-determined, more or less arbitrary (though traditions exist) maximum p value, usually 1 in 20 (0.05), the alpha value. So if our p < alpha, we reject the null H ('null H not supported'), otherwise not.

Alpha and p are really values for the probability of making a Type I error -- alpha is the pre-set threshold for 'acceptable' chance of Type I error, p is the calculated value for the obtained results. If we get a p < alpha, then we say the chance that we made a Type I error, while by no means eliminated, is within our comfort zone.
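The p-versus-alpha rule described above can be turned into a pass mark for an ABX session. This is a rough sketch, assuming independent 50/50 guesses under the null hypothesis. The helper names `guess_tail` and `pass_mark` are made up for illustration.

```python
from math import comb

def guess_tail(correct, trials):
    """p value: chance of at least `correct` right answers from guessing alone."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def pass_mark(trials, alpha=0.05):
    """Smallest score with p < alpha, or None if even a perfect score
    cannot get below alpha with this few trials."""
    for correct in range(trials + 1):
        if guess_tail(correct, trials) < alpha:
            return correct
    return None

for n in (4, 10, 16):
    mark = pass_mark(n)
    if mark is None:
        print(f"{n} trials: no score reaches p < 0.05")
    else:
        print(f"{n} trials: need {mark} correct (p = {guess_tail(mark, n):.4f})")
```

Since scores are whole numbers, the p value at the pass mark generally sits below alpha rather than exactly on it, so the real Type I rate of the rule is somewhat smaller than the nominal threshold.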

It's true that my original post was really talking about a Type II error. But in either case we use statistics to call our results 'random' or not, so I don't see how you can say that statistics only work for 'positive' ABX results. Or maybe I'm just not understanding what you are getting at. I didn't disagree with what skamp said, at least not intentionally!
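The Type II side can be sketched the same way, but only by assuming something the test itself cannot tell you: the listener's true per-trial hit rate. In the sketch below, `true_hit_rate` is that hypothetical input, and the 16-trial, 12-correct pass mark is just the conventional p < 0.05 setup.

```python
from math import comb

def binom_tail(k_min, n, p):
    """Chance of at least k_min successes in n independent trials with success rate p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def type_ii_rate(true_hit_rate, trials=16, pass_mark=12):
    """Chance that a listener with the assumed true hit rate still fails the test."""
    return 1.0 - binom_tail(pass_mark, trials, true_hit_rate)

# Hypothetical hit rates: pure guessing, a subtle difference, an obvious one.
for rate in (0.5, 0.7, 0.9):
    print(f"true hit rate {rate:.1f}: chance of a failed test = {type_ii_rate(rate):.3f}")
```

Unlike alpha, this number is not fixed by the test design alone. It depends on how audible the difference really is to that listener, which is why a failed test cannot be given a single "chance that a difference nevertheless exists".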

Reply #67
B. The listener was mischievous and *intentionally* gave random results.

I don't see how you can say that statistics only work for 'positive' ABX results. Or maybe I'm just not understanding what you are getting at. I didn't disagree with what skamp said, at least not intentionally!

What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless. Whereas he could hardly act in bad faith in the other direction.

Reply #69
What good are statistics if the listener acted in bad faith? If he decided to answer randomly, the result is meaningless. Whereas he could hardly act in bad faith in the other direction.

Sure he could, if he's determined, and the test isn't very carefully proctored.  So once you assume 'bad faith' then the results either way are useless. All reported ABX results on HA could be considered invalid, if you assume that cheating was involved.

(Btw if the stats show that  answers are *more* wrong than they should be by chance, that can also be a useful thing to know)
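A score that is *worse* than chance can be checked with a two-sided version of the same calculation. This is a minimal sketch, again assuming independent 50/50 guesses under the null hypothesis. The function name `two_sided_p` is illustrative.

```python
from math import comb

def two_sided_p(correct, trials):
    """Chance, under pure guessing, of a score at least as far from
    trials/2 as the observed one, in either direction."""
    distance = abs(correct - trials / 2)
    extreme = [k for k in range(trials + 1) if abs(k - trials / 2) >= distance]
    return sum(comb(trials, k) for k in extreme) / 2 ** trials

# 2/16 is as unlikely under guessing as 14/16.
print(two_sided_p(2, 16), two_sided_p(14, 16))
```

A very low score is as improbable under guessing as a very high one, so it suggests the listener heard something and mislabelled it (or answered wrong on purpose), not that A and B were indistinguishable.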

I've lost track, but why exactly are we going down this 'what if they're answering randomly on purpose' road?  The  supposed 'perception problem' is not one of bad faith.

Reply #70
[Trying to bring this back on topic]

There is not a big distinction between consciously selecting random results  (acting unethically/ in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike so they don't bring their "A game" and simply "phone it in". That's another form of expectation bias and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me. You never know for sure why the results are random.

Here's an example, for all: If asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and you'd therefore be biased? If you were to participate, do you really, honestly think you'd be giving it your best effort possible and that there's no way your bias could be influencing your selections, at least at a subconscious level?

Reply #71
[Trying to bring this back on topic]

There is not a big distinction between consciously selecting random results  (acting unethically/ in bad faith) and simply not trying very hard because one thinks, perhaps at least subconsciously, that A and B *should* sound alike so they don't bring their "A game" and simply "phone it in". That's another form of expectation bias and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me. You never know for sure why the results are random.

Baloney, that's what positive controls are for. You did build both negative and positive controls into your test, right?
J. D. (jj) Johnston

Reply #72
Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned?  Please be specific, thanks.

Reply #73
Here's an example, for all: If asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and you'd therefore be biased?

Sure. Turning to medicine: what if we were to test the effect of homeopathy? I would have a fairly negative expectation bias, especially if I was told it was actually done the homeopathically “proper” way. This, and the particular case you mention, could have been mitigated by not telling the subjects what specifically was being tested. I assume you really shouldn't.

And if you have such a thing at hand, introducing a third thingy with known effect could help the analysis. I.e., if you have A, B and C where the difference between A and C is well-established and quantified, and the listeners are biased as you describe (or merely not sufficiently randomly drawn – in practice you would have to deal with self-selection) then you might check if they can distinguish A and C better or worse than “the known average”. That could have been done in the homeopathy case as well. Problem is, if they can't, it only tells you that you have no test.
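One way to use such a known A-versus-C pair is as a gate on how a failed A-versus-B session gets read. The sketch below is only an illustration of that idea. It assumes 16 trials per comparison and a p < 0.05 pass rule, and the function `interpret` and its wording are hypothetical, not an established protocol.

```python
from math import comb

def guess_p(correct, trials):
    """Chance of at least `correct` right answers from guessing alone."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def interpret(control_correct, test_correct, trials=16, alpha=0.05):
    """Read an A-vs-B score in the light of the A-vs-C positive control."""
    test_passed = guess_p(test_correct, trials) < alpha
    control_passed = guess_p(control_correct, trials) < alpha
    if test_passed:
        return "difference heard"
    if control_passed:
        return "no difference shown, but listener and setup were demonstrably sensitive"
    return "inconclusive: control failed, so the session shows nothing"

print(interpret(control_correct=15, test_correct=9))
print(interpret(control_correct=8, test_correct=9))
```

The second call is the "you have no test" outcome: when the known difference is missed too, the random A-versus-B score could equally come from a tired, unmotivated, or biased listener.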

Reply #74
Yes, but I am conducting the test at that one point of time.

Strange to read your initial postings in this thread now after you have tried to downplay the applicability of the test to a one-time personal experience with a clear-cut full sensitivity/specificity.

If I am conducting an ABX test of codec settings when I think I hear a difference, how is simply encoding a wav file, loading it into foobar and running the ABX comparator not valid?
Where does perception come into it? I am listening to music on either speakers or headphones (mostly headphones), speakers being Brand A and headphones Brand B. Where exactly does my (mis)perception kick in? I am sorry, but your theory isn't very well explained. Tell me where, in my case, the ABX test is failing.
Or, for that matter, for anyone doing the exact same test? Or a technically properly designed DAC/speaker test with an ABX switchbox?
And correct me if I am wrong, but an ABX test is primarily a personal experience from which the results can be collected and statistically processed. The more personal results, the more accurate the statistics.
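The "more results, more accurate statistics" point can be illustrated with a confidence interval on the hit rate. This is a sketch using the Wilson score interval, which is a swapped-in standard technique and not something the post names. It assumes all pooled trials are independent and share one underlying hit rate, which is a strong assumption when different listeners and samples are mixed.

```python
from math import sqrt

def wilson_interval(correct, trials, z=1.96):
    """Approximate 95% Wilson score interval for the true hit rate."""
    p = correct / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

# Same 75% observed hit rate, with ten times as many trials in the second case.
for correct, trials in ((12, 16), (120, 160)):
    low, high = wilson_interval(correct, trials)
    print(f"{correct}/{trials}: hit rate between {low:.3f} and {high:.3f}")
```

The interval narrows as trials accumulate, but it describes the pooled hit rate only. It says nothing about whether any one listener in the pool can hear the difference.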