Does an untestable hypothesis even have a place in science?
Sighted listening tests can have all the problems of "stress" and "altered perception" that item ascribes to blind tests. If item is right, the very fact that we are not listening purely for the enjoyment of the music (or for whatever reason you normally listen to your stereo), but at least partly with a view to forming a judgement, has already nullified our attempt to form a correct judgement about normal listening.
I for one do not discard the statement “Putting people in a testing lab alters their behaviour”. Not even if “perception” is part of “behaviour”. Greynol (post #9) puts forth the opinion that people are more alert in a testing lab – in which case the behaviour does indeed change.
The actual claim is “Putting people in a testing lab always alters their behavior in such a way that they are always transformed from highly sensitive detectors of audible differences into lumps of coal who are deaf to all but the grossest of auditory stimuli".
Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored.
Quote from: Porcus on 23 October, 2012, 07:26:51 AM
"Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored."

I am no scientist nor technician (nor a native English speaker, as you may guess; I am sorry), but I really don't understand how the ABX method is expected to eliminate "placebo" as a variable of the testing procedure. If I understand the way the tests are run, it is not "blind" at all.
However, on a number of occasions, although I can not identify any difference, I have come up with perfect matches every guess. This could happen randomly, even though the probability is very small. But is randomness a reasonable explanation for it to occur every once in a while -- for one individual?
You certainly have a few points. Brief and itemized:

- The three-letter acronym you quoted was "DBT", not "ABX". ABXing can be done sighted or not. ("can" means "can", not "should".)
- Yes, those prejudiced to believe that there is no difference will be inclined to report no difference. But:

(i) Since we cannot, strictly speaking, prove negatives any more than we can disprove a "before this universe was born, I was incarnated as a Russell's teapot" claim (yes, this is totally asymmetric; that is well known ... and widely accepted as appropriate), the conclusion from a negative is simply "Do not reject the null hypothesis". As opposed to "Null hypothesis proven", which is not valid.
(ii) We can remedy this by introducing a comparison with a known difference. (E.g., if a test subject cannot differentiate between a first release and a brickwalled remaster, then you should not be surprised if they cannot identify a 96 kb/s lossy encode of one of them.)
Quote from: Porcus on 23 October, 2012, 07:26:51 AM
"Indeed, DBT procedures are introduced for precisely that reason – it is well documented that placebo is too significant to be ignored."

I am no scientist nor technician (nor a native English speaker, as you may guess; I am sorry), but I really don't understand how the ABX method is expected to eliminate "placebo" as a variable of the testing procedure. If I understand the way the tests are run, it is not "blind" at all. People know which are A's and which are B's, and they are asked to match them to X's.
If your inner belief is (possible placebo effect) that there is an “audible” difference between A's and B's and the difference is not “audible” you will fail to get a result beyond mere odd.
In this case we can affirm the placebo effect is avoided (but at the cost of not being able to tell odd from a maybe statistically relevant result). But what if your inner belief is that there is no difference at all?
I guess in this case placebo would affect the result of the test.
My point is that this method does not seem “scientific” at all to me; it is completely “asymmetric”, as shown by the fact that in the best hypothetical case it can verify something only if it is obvious under test conditions, and it is unable to falsify any hypothesis, from the less-than-obvious to the absolutely impossible.
There is little point to any listening test, blind or sighted, where the listener is pre-disposed to ignore any audible differences.
The onus is on those who say/believe that they hear an audible difference to demonstrate it in a double-blind test. Those who say there is no audible difference have no reason to take the test.
(Though, occasionally, people who don't think they hear an audible difference will take the test for the heck of it, or to satisfy their own curiosity, and in doing so sometimes prove to themselves and others that there is a barely audible difference).
odd? I hope you mean random.
Right, and the listener training scheme I point out in an earlier post filters those people out.
Quote from: Arnold B. Krueger on 24 October, 2012, 08:38:46 AM
"odd? I hope you mean random."

Yes, I am sorry; please change the word "odd" in my post to "random guessing". Please do not think mine is an attack on the ABX method in itself; I just have doubts aroused by the way the tests are in many cases actually done, and maybe by the fact that I do not understand the procedure (and the language). So before I reply to this post saying something stupid, please give me a link to this:

Quote: "Right, and the listener training scheme I point out in an earlier post filters those people out."
Quote from: AndyH-ha on 24 October, 2012, 06:34:45 AM
"However, on a number of occasions, although I can not identify any difference, I have come up with perfect matches every guess. This could happen randomly, even though the probability is very small. But is randomness a reasonable explanation for it to occur every once in a while -- for one individual?"

About 1 in 20 for p = 0.05.
With my usual ten trials for a sample, that comes out to p ≈ 0.001 that all 10 of my correct guesses were the result of chance, or likely to happen about 1 time in 1000 (1 in 1024, to be exact), no? If it happened with two consecutive ten-trial tests, each on a different album, it would be the product, 0.001 × 0.001, or about 1 in 1,000,000?
I'm not sure what the calculation would be if, instead of consecutive tests, the two peculiar outcomes of ten-trial tests were separated by ten or twelve different albums for which the tests were more easily comprehended.
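The arithmetic above can be checked directly. Under the null hypothesis of pure guessing, each trial is a fair coin flip, so the p-value is a one-sided binomial tail. A minimal sketch in Python (the function name is my own, not from any ABX tool):

```python
from math import comb

def abx_p_value(correct, trials, p_guess=0.5):
    """Probability of getting at least `correct` right out of `trials`
    under the null hypothesis of pure guessing (binomial tail)."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))

# One 10-trial test, all correct: (1/2)^10
print(abx_p_value(10, 10))   # 0.0009765625, i.e. 1 in 1024

# Two independent 10/10 tests are equivalent to one 20/20 test:
print(abx_p_value(20, 20))   # about 1 in 1,048,576
```

Two independent 10/10 runs multiply to the same figure as a single 20/20 run, which matches the "product of the probabilities" reasoning above.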
To try to state it more clearly:

First, I am talking about samples treated two or more different ways, so there are data differences, but I don't know if there will be audible differences. I run ABX tests to find out what I can or can't hear.

I do a ten-trial test on a sample. I cannot (consciously) hear any difference, therefore I have no idea whether X is A or B. I make a choice based on how I feel about it at the moment, which is indeed rather vague and may or may not be identical to flipping a coin and using the result. Anyway, I get all ten correct.

There is some probability that I did this simply through random chance. Now I choose another sample. Let's make it from a different album (also recorded from an LP) to make it less likely that I am somehow biasing things with the album I just used. Also, let's choose some different treatment to use on this sample, to make sure there isn't a lurking bias in how I prepared the two copies of the sample.

Now I run an ABX test with ten trials on this new material. Again, I can't hear any difference, but I make an attempt to guess something. Again I get all ten correct, I know not how.
If looked at in isolation, the probability of this second test's result is the same as that of the first test's result. However, if both are random results, together the probability is much smaller; it is at least as small as getting 20 random correct guesses on one test of twenty trials, no?
The question presented in the last post has to do with such 10/10 tests happening every once in a while. How is the probability of N such results computed if there are five, six, or twelve (but not hundreds) non-weird tests conducted between those getting this peculiar result? Is there any difference, considering all the tests done, if all the weird ones occur consecutively compared to if they occur at random intervals among more probable results?

The more probable results have been defined in the last post as (1) I get good scores because I can recognize the differences and (2) I get random scores because I can't recognize the difference. The "weird" tests are those where I can't identify a difference but get all correct.
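As for whether spacing matters: if the tests are independent and the null is pure guessing, the order in which the perfect runs occur is irrelevant; only the number of perfect runs among the total number of tests matters. A hedged sketch (function name is my own) that computes the chance of seeing at least N perfect 10-trial runs anywhere among M tests:

```python
from math import comb

def prob_at_least(n_perfect, n_tests, p_single=0.5**10):
    """Probability that at least `n_perfect` of `n_tests` independent
    10-trial ABX runs come out 10/10 by pure guessing. Because the runs
    are independent, their ordering does not enter the calculation."""
    return sum(comb(n_tests, k) * p_single**k * (1 - p_single)**(n_tests - k)
               for k in range(n_perfect, n_tests + 1))

# Two 10/10 runs anywhere among 12 tests (consecutive or scattered):
print(prob_at_least(2, 12))   # roughly 6 in 100,000
```

So two 10/10 runs scattered among twelve tests have the same probability as two consecutive ones among those twelve; what changes the figure is the total number of tests, because more tests give more chances for a fluke.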
People often do things without conscious intention and without conscious awareness.
Most of the time when I am convinced that I do not hear a difference, I give up part way through. Perhaps that indicates some kind of psychological bias against finding that there is (seems to be) a difference in the treatments. I think it just indicates boredom. I don't think I ever even bother to check the score for the trials I did complete. Those that I mentioned, 10 correct out of 10 when I have no idea which is which, are exceptions where I had some particular reason, or whim, to produce a score.
So it really isn't possible to say much, statistically, about some casual observations of what passes one by. One has to adopt some particular statistical model and collect data in accordance with the model's requirements.