Topic: "Audiophile" listening event @ Definitive Audio in Seattle

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #225
The questions here are what is practiced, what is practical, what is confusing, and the degree (if any) to which long-term listening allows one to hear differences that aren't apparent during short-term listening. Which is a lot of questions. There are also questions of experimental design, e.g., ABXes of 44.1 kHz sampling that are in fact ABXes of 44.1 kHz sampling using certain algorithms with certain source material under certain conditions, and that produce contradictory results as a consequence.


What makes you think that long-term listening is at all revealing of differences? Tom Nousaine has shown that a pool of experienced listeners cannot even reliably detect 4% distortion under long-term listening conditions (link), even though 4% distortion can be discerned with 100% accuracy under formal ABX test conditions (as shown later in the same article). Clearly, this suggests that ABX tests are in fact highly revealing of non-imaginary differences.


I think it depends on the type of difference. The brain doesn't analyze them all the same way. As I think I pointed out in another post, frequency response aberrations are analyzed almost immediately by what appears to be highly specialized neural circuitry. The same is likely true of timbre, which would include of course the detection of partials and harmonic distortion. What goes into long-term memory is for most of us an analyzed and simplified version of results. This however is useful when the phenomenon is too complex to analyze in real time.

Let me give you a real world example of that, involving vision, rather than sound. A few days ago, I repainted my bedroom. A previous sloppy paint job had left a ragged paint border on the edges of the wooden floor. After I pulled up the dropcloths and paint, I immediately noticed a new spot of paint, no more than a few millimeters wide. Sure enough, when I probed it it turned out to be a small fleck of paint that had fallen off the dropcloth. I would not have been able to make that determination in a real time A/B test, not without nauseatingly painstaking repetition, anyway. But the variations in the border had been recorded in my long-term memory, so I saw the difference immediately.

I think there is potentially an opposite extreme, as well -- steady-state distortions for which the brain compensates so rapidly that accommodation could occur even within the span of an ABX test.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #226
It's not so much that I'm faulting ABX, as that I'm faulting any test that suffers from a particular constraint, sighted or unsighted. One of those is, as you say, the selection of source material. In practice, I find that this is particularly troubling with some resonant phenomena. It's no secret for example that some ribbons have serious distortion spikes at certain frequencies.


But then you have to fault every possible scientific test, don't you?  They all suffer from constraints, every single one of them.

Of course that's why scientists repeat tests many times and refuse to accept them until other scientists have repeated them with the same results.
And even then they never report anything as an absolute truth but say things like "The probability of X lying somewhere between Y and Z is 99%", or if they don't then they aren't being scientists when that happens.

Ed


Sure. One of the things you learn pretty quickly in science is to take experimental results with the proverbial grain of salt. I have in my lifetime witnessed the discovery of magnetic monopoles and cold fusion, not to mention any number of other phenomena that somehow vanished when others analyzed or attempted to reproduce the experiment. As Arnold said here, all scientific results are provisional, something that the popular press frequently doesn't understand.

Another thing you learn to recognize is the beautiful result -- the result that is compelling because it is astronomically impossible unless a theory conforms in some basic respect to the truth. And the elegant theory. It's surprising how often one can tell that a theory is wrong right out of the gate, or wonderfully right.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #227
I repainted my bedroom.

Once you know where to look for that spot of paint, ABXing becomes trivial.  No one has suggested that it is not OK for people to form subjective opinions, just that they should be verified through objective means.

Thank you for your anecdote.  Keep trying.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #228
Let me give you a real world example of that, involving vision, rather than sound. A few days ago, I repainted my bedroom. A previous sloppy paint job had left a ragged paint border on the edges of the wooden floor. After I pulled up the dropcloths and paint, I immediately noticed a new spot of paint, no more than a few millimeters wide. Sure enough, when I probed it it turned out to be a small fleck of paint that had fallen off the dropcloth. I would not have been able to make that determination in a real time A/B test, not without nauseatingly painstaking repetition, anyway. But the variations in the border had been recorded in my long-term memory, so I saw the difference immediately.


That would be analogous to ABX'ing a lossy encode against a wave file that you have listened to for years.  That might make you more sensitive to any artifacts but a positive result still means you heard them and the blind test still works.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #229
Once you know where to look for that spot of paint, ABXing becomes trivial.  No one has suggested that it is not OK for people to form subjective opinions, just that they should be verified through objective means.

Thank you for your anecdote.  Keep trying.

Weren't you the one who made that point already? Someone did. Anyway, I agreed: an ABX test conducted after long term listening could be a way of dealing with this potential inaccuracy. But I was answering Specific Impulses's question, "What makes you think that long-term listening is at all revealing of differences." The next question is, what kind of experiment could demonstrate or refute that assertion, and have any such experiments been conducted. Other than young Mozart's listening session at the Vatican, I mean.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #230
But I was answering Specific Impulses's question, "What makes you think that long-term listening is at all revealing of differences."

Which is trivial and not particularly useful or interesting if someone is trying to argue that ABX is flawed because some artifacts can only be revealed through long term exposure.  From what I've experienced, a common "artifact" being offered by placebophiles is listening fatigue.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #231
But I was answering Specific Impulses's question, "What makes you think that long-term listening is at all revealing of differences."

Which is trivial and not particularly useful or interesting if someone is trying to argue that ABX is flawed because some artifacts can only be revealed through long term exposure.  From what I've experienced, a common "artifact" being offered by placebophiles is listening fatigue.


Listening fatigue is, I think, another issue. Are you aware of any experiments that seek to quantify that, or the putative benefits of long-term listening? Because in their absence, I don't think there's much of an argument against these two assertions. At first glance, both types of experiment seem to me difficult to design, since statistically significant variations could potentially be attributed to multiple factors.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #232
If someone is going to claim fatigue as a difference, the burden falls on him to demonstrate it.

The ABX skeptics throw up a lot of theories.  They are nowhere near as prolific when it comes to providing evidence.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #233
If someone is going to claim fatigue as a difference, the burden falls on him to demonstrate it.


An AES paper authored by Soren Bech that resulted from research connected with the 1980s Eureka Project demonstrated that listener fatigue in listening tests was an interfering variable. IIRC, his conclusion was that a blind test should be limited to less than one hour, to minimize the effect of the variable. I don't have the reference available right now, but I can post it if requested.

John Atkinson
Editor, Stereophile



"Audiophile" listening event @ Definitive Audio in Seattle

Reply #234
An AES paper authored by Soren Bech that resulted from research connected with the 1980s Eureka Project demonstrated that listener fatigue in listening tests was an interfering variable. IIRC, his conclusion was that a blind test should be limited to less than one hour, to minimize the effect of the variable.


I don't see the relevance with regard to this discussion.

Let a subject S claim that item A sounds better than B. Common sense dictates that for such a claim S must at least be able to distinguish A and B. S can easily demonstrate this by double blindly identifying A and B. Using short time spans to not make it unnecessarily bothersome for him is fine (good practice, no objections).

But what if S claims that an audible difference is there, but at the same time is unable to identify A and B or even a difference between both (mapping to X) under the latter conditions? At this point it is already likely that S is just pig-headed (unable/unwilling to modify a belief even if surrounded by facts).

But give S another chance to explain why an audible difference should still be assumed, although he could not identify it. S claims that the difference is only audible during prolonged listening. Grant him that in a second, unconstrained double blind installment. If he fails again, a sane conclusion is: there just was no audible difference, S is just fishing for reasons to retroactively justify his mental immobility. The other option is a circular fallacy: there is a difference, but S cannot identify it short-term (because it is only exposed long-term) and S cannot identify it long-term (because long-term testing is too fatiguing).

 

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #235
Using short time spans to not make it unnecessarily bothersome for him is fine (good practice, no objections).


By short timespans, you refer to the sample length, and not the length of a single session?

Because I'd think that doing many short AB[X] sessions would both satisfy the "long-term" requirement that S wants, as well as compensate for listener fatigue.

My point is that when a subject claims a difference is only heard in the long term (i.e., after becoming very familiar with the material), listener fatigue is not a valid complaint, as the ABX sessions can easily be broken down and spread out across days, even weeks. In fact, that would also compensate for tons of other hard-to-control environmental aspects, including the subject's mood, his ear wax or whether he got a good night's sleep.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #236
Yes, I agree.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #237
If someone is going to claim fatigue as a difference, the burden falls on him to demonstrate it.


An AES paper authored by Soren Bech that resulted from research connected with the 1980s Eureka Project demonstrated that listener fatigue in listening tests was an interfering variable. IIRC, his conclusion was that a blind test should be limited to less than one hour, to minimize the effect of the variable. I don't have the reference available right now, but I can post it if requested.


The paper in question appears to be online and freely available:

Soren Bech AES paper URL 

Table 1 is particularly interesting. Means of bias reduction that it lists include:

"Use Blind Listening Tests"

and

"Use short, looped recordings with consistent characteristics"

The word *fatigue* does not appear in the body of the paper. It appears only in the title of a work in the footnotes. I see no examples of common synonyms for fatigue, either.

It appears that Mr. Atkinson would do well to actually read this paper and take it to heart, as it contains a number of excellent criticisms of how Stereophile does its listening tests.




"Audiophile" listening event @ Definitive Audio in Seattle

Reply #238
If someone is going to claim fatigue as a difference, the burden falls on him to demonstrate it.


An AES paper authored by Soren Bech that resulted from research connected with the 1980s Eureka Project demonstrated that listener fatigue in listening tests was an interfering variable. IIRC, his conclusion was that a blind test should be limited to less than one hour, to minimize the effect of the variable. I don't have the reference available right now, but I can post it if requested.


The paper in question appears to be online and freely available:
Soren Bech AES paper URL 


I was referring to this paper, reprinted in the July 1992 issue of the Journal of the AES: http://www.aes.org/e-lib/browse.cfm?elib=7040 . I am at work right now and this issue is at home, so I will check the reference this evening.

John Atkinson
Editor, Stereophile

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #239
Because I'd think that doing many short AB[X] sessions would both satisfy the "long-term" requirement that S wants, as well as compensate for listener fatigue.

My point is that when a subject claims a difference is only heard in the long term (i.e., after becoming very familiar with the material), listener fatigue is not a valid complaint, as the ABX sessions can easily be broken down and spread out across days, even weeks. In fact, that would also compensate for tons of other hard-to-control environmental aspects, including the subject's mood, his ear wax or whether he got a good night's sleep.


That was the point I was making to greynol. Consider a formal test to examine the audibility of a small but possibly audible effect, which may not be audible on all kinds of music. You need a very large number of trials to bring the power of statistical analysis to bear, both on the audibility or lack thereof, and the interdependence between the effect and the music program.

Let's say that 200 ABX trials would give you an answer to a desired level of statistical significance. If you were to continue the testing with a single subject until he had done all 200 trials in a single session, I can confidently predict that the results would be no different from chance. But you haven't got meaningful results. What you need to do is allow the subject to do, say, 20 trials, then rest before doing another 20 trials, and so on. Because the time of day may well also be an interfering variable, it might also be best to allow the subject to do each set of 20 trials on consecutive days at the same time. This, for example, is how blind testing is done at B&O's research lab, which Tom Nousaine and I visited last spring.

John Atkinson
Editor, Stereophile
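
To make the session arithmetic in the reply above concrete, here is a minimal sketch that pools several short ABX sessions and computes an exact one-sided binomial p-value against the guessing hypothesis (p = 0.5 per trial). The function names and the per-session scores are hypothetical, made up only to show the calculation; they are not data from any test mentioned in this thread. Pooling also assumes the trials are independent, which is itself an assumption.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value: chance of scoring at least
    `correct` out of `trials` by pure guessing (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

def pooled_p_value(session_scores, trials_per_session: int) -> float:
    """Pool several equal-length sessions into one binomial test."""
    total_correct = sum(session_scores)
    total_trials = trials_per_session * len(session_scores)
    return abx_p_value(total_correct, total_trials)

# Hypothetical scores: ten sessions of 20 trials each, one per day.
scores = [12, 11, 13, 10, 12, 14, 11, 12, 13, 12]
print(f"{sum(scores)}/200 correct, p = {pooled_p_value(scores, 20):.4f}")
```

With these made-up scores, no single 20-trial session reaches p < 0.05 (the best, 14/20, gives about 0.058), yet the pooled 120/200 does. That is the sense in which many short sessions can stand in for one long, fatiguing one.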

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #240
If someone is going to claim fatigue as a difference, the burden falls on him to demonstrate it.


An AES paper authored by Soren Bech that resulted from research connected with the 1980s Eureka Project demonstrated that listener fatigue in listening tests was an interfering variable. IIRC, his conclusion was that a blind test should be limited to less than one hour, to minimize the effect of the variable. I don't have the reference available right now, but I can post it if requested.


The paper in question appears to be online and freely available:
Soren Bech AES paper URL 


I was referring to this paper, reprinted in the July 1992 issue of the Journal of the AES: http://www.aes.org/e-lib/browse.cfm?elib=7040 . This issue is at home, so I will check the reference this evening.


I just checked the contents of this paper, and find a similar lack of mention of the word fatigue or common synonyms. 

The two parameters that were studied in the referenced paper were related to the results of standard hearing tests and listener training by means of participation in the experiment.

The paper concluded:

"The results show that there is no correlation between the mean hearing threshold level and the mean standard deviation of the ratings for a group of subjects which are screened to ensure that their hearing threshold level does not exceed 15 dB re ISO 389 [4] from 250 to 8000 Hz."

Note that the test subjects were between the ages of 18 and 28 years old.

"The effects of training are also discussed. The results suggest that 65% of the subjects will reach an asymptotic performance as measured by the magnitude of the error variance and the loudspeaker test statistic after four experiments. The results further indicate that the remaining 35% of the subjects will require a total of seven to eight training experiments before they stabilize in performance."

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #241
I think it depends on the type of difference. The brain doesn't analyze them all the same way.


Let's call the above what it is: It is unfounded speculation that there is an elusive kind of distortion that  can be heard in long term tests and can't be heard in short term tests.

That is all it is - unfounded speculation with AFAIK zero reliable real world evidence to back it up.

Quote
As I think I pointed out in another post, frequency response aberrations are analyzed almost immediately by what appears to be highly specialized neural circuitry.


I notice that you are ignoring the fact that the counter-evidence already cited was *not* based on frequency response, AKA linear distortion. It was based on nonlinear distortion. Whether you know it or not, there are only two known kinds of distortion in this particular universe: linear distortion and nonlinear distortion. Other relevant facts are that both linear and nonlinear distortion change the spectral balance of sounds and the ear is primarily a spectrum analyzer.

Quote
What goes into long-term memory is for most of us an analyzed and simplified version of results.


Common sense would take the above as a good explanation of why long term listening is so demonstrably insensitive to small differences.

AFAIK the only thing that long term listening is good for is finding those small portions of real-world recordings that reliably elicit the perception of an audible difference. The literature of good listening tests repeatedly shows that once those small segments are found, short-term listening to small snippets is the most effective way to demonstrate the presence of a difference.


"Audiophile" listening event @ Definitive Audio in Seattle

Reply #242
That was the point I was making to greynol. Consider a formal test to examine the audibility of a small but possibly audible effect, which may not be audible on all kinds of music.


It is almost a given that small differences are only audible on a very small percentage of all music.  In some cases the number of recordings where the difference is audible is a tiny fraction of all recordings and the portion of recordings where it is audible is only a tiny percentage of those few recordings.

Quote
You need a very large number of trials to bring the power of statistical analysis to bear, both on the audibility or lack thereof, and the interdependence between the effect and the music program.


No.  Once you have formed your short list of candidate relevant recordings, you need to do more listening to just them to find the few most relevant segments. This can be done primarily using sighted evaluations.  Blind tests need only be done in the final qualification stage. Only a small number of trials need to be done for each candidate selection. You usually only need a few final selections to run your actual test.

The other thing is that if perchance you actually find something that takes, say, 50 trials per individual to obtain statistical significance, the listeners invariably report that they think that you are investigating something that makes no practical difference even though they obtained positive results. In many cases the listeners have no perception that they obtained statistically significant results. They thought they were just guessing.
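
For a sense of scale on the 50-trial figure, the sketch below finds the smallest score that reaches one-sided significance against pure guessing. The function name and the 0.05 threshold are assumptions chosen for illustration; the post does not say which criterion was used.

```python
from math import comb

def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose one-sided binomial p-value under guessing
    (p = 0.5) is at or below alpha; returns trials + 1 if none is."""
    tail = 0  # running count of outcomes with at least k correct
    best = trials + 1
    for k in range(trials, -1, -1):
        tail += comb(trials, k)
        if tail / 2 ** trials > alpha:
            break
        best = k
    return best

for n in (16, 50):
    k = min_correct_for_significance(n)
    print(f"{n} trials: at least {k} correct ({k / n:.0%})")
```

Under these assumptions a 50-trial run needs only 32 correct (64%), against 12 of 16 (75%) for a short run. A listener who is wrong on more than a third of the trials could plausibly feel he is guessing even while the total is statistically significant.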




"Audiophile" listening event @ Definitive Audio in Seattle

Reply #243
That was the point I was making to greynol. Consider a formal test to examine the audibility of a small but possibly audible effect, which may not be audible on all kinds of music. You need a very large number of trials to bring the power of statistical analysis to bear, both on the audibility or lack thereof, and the interdependence between the effect and the music program.

How convenient.  People claim to be fatigued by sighted listening to otherwise transparent lossy or CDDA but not by sighted listening to a high resolution counterpart, but they cannot be tested on this because unsighted listening is fatiguing???

EDIT: I fixed my quote.  It made no impact on my statement, however.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #244
I think it depends on the type of difference. The brain doesn't analyze them all the same way.


Let's call the above what it is: It is unfounded speculation that there is an elusive kind of distortion that  can be heard in long term tests and can't be heard in short term tests.

That is all it is - unfounded speculation with AFAIK zero reliable real world evidence to back it up.

That's not what I said above. I said that the brain doesn't analyze all auditory differences in the same way. There is abundant, even overwhelming evidence for this, beginning with the fact that comb filtering is often interpreted by the brain as spatial consequences of reflections or the HRTF rather than as response anomalies, even when the anomalies are of a magnitude that surpasses the known thresholds for audibility. It does this processing almost in real time. You can see some of the anomalies *while* you move your head; once your head is stationary, the brain quickly compensates for the response differences.

I made no reference to some kind of mystery distortion, nor do I posit one.

Quote
Quote
As I think I pointed out in another post, frequency response aberrations are analyzed almost immediately by what appears to be highly specialized neural circuitry.


I notice that you are ignoring the fact that the counter-evidence already cited was *not* based on frequency response, AKA linear distortion. It was based on nonlinear distortion. Whether you know it or not, there are only two known kinds of distortion in this particular universe: linear distortion and nonlinear distortion. Other relevant facts are that both linear and nonlinear distortion change the spectral balance of sounds and the ear is primarily a spectrum analyzer.

You have stated my own assumptions. I had thought them so obvious as to be trivial.

Quote
What goes into long-term memory is for most of us an analyzed and simplified version of results.
Quote

Common sense would take the above as a good explanation of why long term listening is so demonstrably insensitive to small differences.

AFAIK the only thing that long term listening is good for is finding those small portions of real-world recordings that reliably elicit the perception of an audible difference. The literature of good listening tests repeatedly shows that once those small segments are found, short-term listening to small snippets is the most effective way to demonstrate the presence of a difference.

I think the key phrase here is "AFAIK." It is, I grant, a distinct possibility. After all, in identifying the snippets that elicit the perception of an audible difference, long term memory would seem to have done the analytical work that gives it utility. That could be true even if the analysis was performed by somebody else. To use an analogy, someone not familiar with my bedroom would have great difficulty finding the paint chip if he were presented with images of the entire border, but would rapidly identify it if comparing preselected closeups of the specific region.

I suspect that there are still some differences that would be more identifiable in long term tests, based on the fact that even short audio snippets put heavy demands on short term memory. But that must remain speculation; I haven't seen any objective evidence for it, and some subjective observations, while perhaps food for thought, would not be allowable here and certainly don't have scientific weight.

We are both speculating here, not necessarily a bad thing, since it lays the ground for further research.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #245
but they cannot be tested on this because unsighted listening is fatiguing???


You're sceptical about the mental effort required to give one's full attention to a complete ABX test?

I can't quantify it, though. When I want to have a go, I use foobar's ABX'er, but commonly tend to quit after a single incomplete run because I almost never hear a difference and my attention span doesn't last very long when I suspect the futility of my efforts.

Other people may have greater discipline to complete several sittings.

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #246
The other thing is that if perchance you actually find something that takes, say, 50 trials per individual to obtain statistical significance, the listeners invariably report that they think that you are investigating something that makes no practical difference even though they obtained positive results. In many cases the listeners have no perception that they obtained statistically significant results. They thought they were just guessing.


That, I think, is one of the most interesting facts in this thread.

I wonder to what extent such differences, which are apparently analyzed by some part of the brain but apparently not others, affect our experience of listening to music. It's akin to what I was wondering about 320 kbit/s MP3s. As I said earlier, never having done any formal comparisons, I've never noticed any difference between high bit rate MP3s and the uncompressed source material. But assuming that my ears aren't broken and I could be trained to identify them, would that training then be the sole determinant of my listening experience, or are the differences that I don't notice consciously influencing and perhaps diminishing my subjective experience, because perceived on a lower level?

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #247
That was the point I was making to greynol. Consider a formal test to examine the audibility of a small but possibly audible effect, which may not be audible on all kinds of music. You need a very large number of trials to bring the power of statistical analysis to bear, both on the audibility or lack thereof, and the interdependence between the effect and the music program.

How convenient.  People claim to be fatigued by sighted listening to otherwise transparent lossy or CDDA but not by sighted listening to a high resolution counterpart, but they cannot be tested on this because unsighted listening is fatiguing???


As I said in the message of mine that you deleted, I have not said that. Nor does it follow from the message of mine from which you quoted. I was making a general argument concerning the possibility of listener fatigue acting as an interfering variable in formal listening tests and what can be done to prevent that from occurring.

John Atkinson
Editor, Stereophile

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #248
I wonder to what extent such differences, which are apparently analyzed by some part of the brain but apparently not others, affect our experience of listening to music. It's akin to what I was wondering about 320 kbit/s MP3s. As I said earlier, never having done any formal comparisons, I've never noticed any difference between high bit rate MP3s and the uncompressed source material. But assuming that my ears aren't broken and I could be trained to identify them, would that training then be the sole determinant of my listening experience, or are the differences that I don't notice consciously influencing and perhaps diminishing my subjective experience, because perceived on a lower level?


Reality is that differences that you can ABX in a heartbeat often have zero effect on your listening pleasure.

For example, ABX the same musical selection, one unchanged, the other attenuated by a dB or two.  In open listening both are equally enjoyable. Yet you can easily ABX them and hear a difference.

Just enough nonlinear distortion to be easily ABX-able can have similar effects, even high-order distortion. As I pointed out in a previous post, one of the subjective effects of modest amounts of common forms of nonlinear distortion might be a slight shift in timbre.

While we call common forms of nonlinear distortion "grunge", it doesn't sound like grunge in modest amounts - enough to be just barely reliably detectable.  You hear it as a barely discernible difference that you can't detect at all without a ready undistorted reference. Listen to it all day, and you'll never know that it is there.  The things that actually detract from your listening pleasure are pretty gross.
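
For anyone wanting to try the level-difference comparison described above, here is a minimal sketch of the gain arithmetic. The helper names are hypothetical and the sample list is a toy stand-in for real audio. It assumes floating-point samples, so it ignores dither and clipping.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def attenuate(samples, db: float):
    """Return a copy of `samples` scaled down by `db` decibels."""
    gain = db_to_amplitude_ratio(-abs(db))
    return [s * gain for s in samples]

for db in (1.0, 2.0):
    print(f"-{db:g} dB -> gain {db_to_amplitude_ratio(-db):.3f}")

# Toy example: attenuate three floating-point samples by 1 dB.
quieter = attenuate([0.5, -0.25, 1.0], 1.0)
```

A 1 dB cut scales the amplitude by roughly 0.89 and a 2 dB cut by roughly 0.79. The attenuated copy and the untouched original are then the A and B of the comparison.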

"Audiophile" listening event @ Definitive Audio in Seattle

Reply #249
As I said in the message of mine that you deleted, I have not said that. Nor does it follow from the message of mine from which you quoted. I was making a general argument concerning the possibility of listener fatigue acting as an interfering variable in formal listening tests and what can be done to prevent that from occurring.

You didn't suggest that ABX testing causes fatigue?  If you did then your complaints are a non sequitur.  My question was not posed to you personally.  If it were I would have addressed you personally.  This is a public forum, John; my comments are open for anyone to respond.