How do you listen to an ABX test?

Reply #25
Only true if you haven't seen it happen in many places at many times. Many of us have been there and done that. I am surprised that you are surprised... ;-)

Start here: Neil Young Hates MP3s for fun and profit

The highest profile example of this sort of thing we've seen lately is probably the story of Neil Young, the Kickstarter web site, and the Pono digital player which you are invited to Google.

Hey, thanks for the invitation to use Google... ;-)
Your being surprised at my being surprised made me try to answer my questions above on my own, so I took you up on your invite and read a bunch of stuff. I was already familiar with most of it, but I read some new stuff, including the Wired article (new to me). The flaws in that article are so numerous that listing *all* of them and explaining why each is a flaw would result in a post longer than the article. I'm not opposed to writing, but that task would be boring... you know, fish in a barrel.

But I'm still left with multiple questions, the biggest being: why do *YOU* do ABX tests? By *YOU*, I mean the 7 who have answered on this thread (mzil, castleofargh, Cubist Castle, pelmazo, xnor, krabapple, eric.w) and the OP (Arnold B. Krueger). I can think of 3 answers for me personally:
1-I want to buy something (perhaps expensive) and I want to be sure;
2-I want to make a claim of an audible difference on a hobbyist website (like HA) or in a non-scientific magazine (like Wired);
3-I want to publish in a journal.

For me, practicality would be a big thing. To validly compare 2 files would be easy to describe and easy to do. To compare hardware would be easy to describe, but a big hassle to do. So comparing formats (AAC vs. RBCD vs. HiRes) for cases 1 (buy from iTunes Store or CD or HDTracks?) and 2 (write about it informally) would be clear. But case 3 and *all* hardware comparisons would require a cost-benefit decision, one that I imagine the 8 people who have posted would have had to make. Why have *YOU* done it? Telling me about salespeople's claims or bad articles doesn't explain why you would. This is not meant as a challenge; I'm curious.

Also about the Wired link: when I buy a "mixed" book of puzzles, I skip the "easy" and go for challenging (to me, of course). The Wired article is a waste of anybody's time, IMO. It's not clear why you linked it. What article or claim (about Pono or HiRes or whatever you find relevant) do you find most challenging to answer? And what is your answer? (I don't expect you to repeat a large effort... I'd guess you could just link a challenging article and link a response you've already made, if you have time.) TIA


Reply #26


(1) Lately one of the big drivers for doing DBTs has been the fact that the thresholds for the audibility of jitter are complex and not nailed down tightly enough to please me.
(2) The general driver for my doing DBTs is that I don't always trust my more casual perceptions and need periodic reality checks.
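To give a feel for why jitter thresholds are hard to pin down, here is a minimal sketch (an illustration added here, not something from the thread) of the standard small-jitter approximation: sinusoidal timing error with peak value J applied to a pure tone of frequency f produces a pair of sidebands at roughly 20·log10(π·f·J) dB relative to the tone. The function name is hypothetical, and the formula only covers this simplest case; real jitter spectra, program material, and masking are what make the audibility question complex.

```python
import math

def jitter_sideband_db(tone_hz: float, peak_jitter_s: float) -> float:
    """Approximate level of each jitter sideband, in dB relative to the tone.

    Assumes small, purely sinusoidal jitter on a pure tone (narrowband
    phase-modulation approximation): peak phase deviation is
    beta = 2*pi*f*J, and each sideband is about beta/2 of the carrier.
    """
    beta = 2 * math.pi * tone_hz * peak_jitter_s
    return 20 * math.log10(beta / 2)

# Sideband levels for a 10 kHz tone at a few peak jitter values.
for jitter_ns in (0.1, 1.0, 10.0):
    level = jitter_sideband_db(10_000, jitter_ns * 1e-9)
    print(f"10 kHz tone, {jitter_ns:>4} ns peak jitter: {level:6.1f} dB")
```

Each tenfold increase in jitter raises the sidebands by 20 dB; whether a given level is audible is exactly the part this formula cannot answer.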

Quote
Also about the Wired link: when I buy a "mixed" book of puzzles, I skip the "easy" and go for challenging (to me, of course).
The Wired article is a waste of anybody's time, IMO. It's not clear why you linked it.
What article or claim (about Pono or HiRes or whatever you find relevant) do you find most challenging to answer?


For a reasonable, well-informed person, none of them is challenging to answer.

However, the world is full of poorly informed people who may also have poor critical reasoning skills.

Quote
And what is your answer? (I don't expect you to repeat a large effort...
I'd guess you could just link a challenging article and link a response you've already made.... if you have time.) TIA


The last such thing that I have put much effort into is this article:

AES Conference paper about alleged problems with digital players and high resolution audio




Reply #27
One of the reasons I conduct ABX tests on myself is to document for others that I can easily replicate, with strong statistical significance, the ability to distinguish between two files posted as part of a Hi-res vs. standard CD quality (16/44) challenge, where the Hi-res version is claimed to be "better" by some of the "golden-eared"/"trained" con artists and snake oil peddlers who frequent the audio forums. The aim is to discredit their posted test results and show everyone that there was simply some tiny difference between the two files, for example a minor level difference and/or a time-alignment difference. I did this on the AVS forum last year with their AIX Records challenge.
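The kind of "tiny difference" described above (a level mismatch or a time offset between two supposedly equivalent files) can be checked numerically before any listening. Below is a minimal sketch, assuming both files have already been decoded to lists of float samples in the range -1.0..1.0 at the same sample rate; the helper names are hypothetical, and a real tool would read WAV files and search for sub-sample offsets too.

```python
import math
import random

def rms_db(samples):
    """RMS level in dB relative to full scale (samples in -1.0..1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def best_lag(a, b, max_lag):
    """Lag in samples that maximizes the cross-correlation of b against a.
    A positive result means b is delayed relative to a."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best

# Synthetic example: b is a copy of a, 0.9x in amplitude and 3 samples late.
rng = random.Random(1)
a = [rng.gauss(0, 0.1) for _ in range(2000)]
b = [0.0] * 3 + [0.9 * x for x in a[:-3]]
print(f"level difference: {rms_db(b) - rms_db(a):.2f} dB, offset: {best_lag(a, b, 10)} samples")
```

A brute-force search over a small lag window is slow but easy to verify; FFT-based correlation would be the usual choice for full-length tracks.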


Reply #28
I do some ABX tests when in doubt. Over the years I did a series of ABX tests to pick the file format for my portable gear. Some people decide they were born knowing; some ask on the web and trust answers from all over the place. I decided that on such occasions, since it was for my ears, I should be the one doing the test. I knew from a few failures, where I had fooled myself, that sighted tests in audio are, more often than not, just useless crap! So I go for ABX when it can answer my question, and for something else when it cannot, or when measurements can settle the question.

I also use ABX when I'm curious about my own limits, such as making tracks with added noise, or with one piece of music played over another at different loudness levels, to find out where noise really matters to me in practice. Arny posted a few cool files over the years that my curiosity just couldn't resist (the last one I tried was about jitter, I think). That kind of stuff. I'm ignorant and curious, so opportunities are all around me.

And I guess I'm the opposite of mzil, as I never, ever do an ABX test to prove something to anybody. In fact, I have never published a single result on a forum. At most I would mention that I failed or passed an ABX test, but that's it.
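Making test tracks "with added noise at different loudness" comes down to scaling the noise so its RMS sits a chosen number of dB below the music. Here is a minimal sketch, assuming the music and the noise are already decoded to equal-length lists of float samples; the function names are hypothetical, file I/O is omitted, and clipping of the mixed result is not handled.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def add_noise_at(signal, noise, level_db):
    """Mix noise into signal so the added noise RMS sits level_db relative
    to the signal RMS (e.g. -60 means 60 dB below). Lengths must match."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must be the same length")
    gain = rms(signal) / rms(noise) * 10 ** (level_db / 20)
    return [s + gain * n for s, n in zip(signal, noise)]

# Example: a 441 Hz test tone at 44.1 kHz with Gaussian noise 60 dB below it.
sig = [0.5 * math.sin(2 * math.pi * 441 * n / 44_100) for n in range(44_100)]
noise = [random.Random(0).gauss(0, 1) for _ in range(1)] * 0 or [
    g for g in (random.Random(0).gauss(0, 1) for _ in range(1))
]
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(len(sig))]
mixed = add_noise_at(sig, noise, -60)
```

Generating a ladder of such files (say -40, -50, -60, -70 dB) and ABXing each against the clean version is one way to find the level where the noise stops mattering to a given listener.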


Reply #29
Thanks Arnold B. Krueger! (can I call you Arny? Everyone else seems to.) I'll read the Stuart paper and the 2 threads about it on HA that I found.
Thanks mzil, I'll read the AIX thread on AVS. Are you m.zillch on AVS?
Wow, that'll be a lot of reading... but I enjoy it.
Thanks castleofargh. Your usage seems like the closest to what I would do. I wouldn't put sighted testing (my own) at the bottom of the list, although I know it carries a high risk of bias. Salespeople, listening claims from people with unknown or suspect motives, and zero information fall below it for me. But when I use the word blatant, I know how I mean it, and some things would be blatant to me (iPhone speaker vs. Audioengine 2 = no brainer, blatant). Because I know I'm prone to bias, I too will do some blind ABXing of audio file formats. Did you mention in the forums your ABX pass/fail results for your portable file format choice?
Any suggestions on how ABXphile people test headphones? I can't really do that blind. Or can I?

EDIT: mzil, did you mean "AVS/AIX High-Resolution Audio Test: Take 2" or are your ABX comments in the first part?


Reply #30
I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other. Testing of headphones would fall into the category of personal preference, and you are certainly entitled to your personal preference.

Of course, sighted evaluation of headphones is subject to bias, but this may be a case where it is OK to be influenced by factors other than how they sound.


Reply #31
I'm very suspicious of myself because of my own cognitive biases, and of the ideas I tend to develop when nothing keeps them in check. It's easy enough to form an opinion about something when I get it and try it without any controls. Most of the time, unless there is something really wrong, I will just confirm that my preconceptions were right (that's what a brain does all day long). Then I'll put together some form of controlled test, like ABX, when I can. And poof! Half of it blows up in my face. So from my own experience, not using a controlled test means taking a big chance of being wrong and making a fool of myself, something that very obviously doesn't bother many audiophiles...
But I guess it's not ignorance about audio so much as ignorance about how humans really are. And a big ego will always get in the way of self-evaluation.
To me, most serious studies in the sciences are done with blind tests for a reason. The same reason there are marketing schools.

I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other. Testing of headphones would fall into the category of personal preference, and you are certainly entitled to your personal preference.

Of course, sighted evaluation of headphones is subject to bias, but this may be a case where it is OK to be influenced by factors other than how they sound.

You can't ABX headphones. Even if they felt the same on my head and I closed my eyes, the switching delay would be too long for that purpose.



Reply #33
Thanks mzil, I'll read the AIX thread on AVS. Are you m.zillch on AVS? Wow, that'll be a lot of reading... but I enjoy it.
Yes. This post cuts to the chase:

http://www.avsforum.com/forum/91-audio-the...ml#post28355562


Does this post reference the defective files that Amir and Fremer listened to before claiming world-changing positive results?

BTW, I have disqualified the test files I provided that were mentioned in post #150 of the same thread, on the grounds that their downsampling involved digital filters with unrealistically narrow transition bands.

The transition band I used for that particular downsampling job was about 100 Hz wide, while a typical real-world 44.1 kHz DAC has a transition band that might be several kHz wide. The lowest quality setting that CEP 2.1 provides comes pretty close to delivering a 1.5 kHz transition band at 44.1 kHz. The highest quality setting is what I used.

Graphics here: Link to Transition band graphics in uploads forum
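To put the 100 Hz vs. 1.5 kHz transition bands in perspective, here is a minimal sketch using Kaiser's well-known empirical estimate for windowed-sinc FIR lowpass filters. This is a stand-in technique: how CEP 2.1's resampler is actually designed is not described in the thread, and the 100 dB stopband attenuation is an assumption chosen for illustration. The function name is hypothetical.

```python
import math

def kaiser_fir_taps(transition_hz, fs_hz, stopband_db):
    """Kaiser's empirical estimate of the number of taps a windowed-sinc
    lowpass FIR needs for a given transition width and stopband attenuation."""
    delta_omega = 2 * math.pi * transition_hz / fs_hz  # transition width, rad/sample
    order = (stopband_db - 8.0) / (2.285 * delta_omega)
    return math.ceil(order) + 1  # taps = filter order + 1

FS = 44_100  # assumed rate for illustration
for width in (100, 1_500):
    taps = kaiser_fir_taps(width, FS, stopband_db=100)
    print(f"{width:>5} Hz transition: ~{taps} taps, "
          f"~{1000 * taps / FS:.1f} ms impulse response")
```

Because the sample rate cancels out of taps/fs, the impulse-response length depends mainly on the transition width: a band 15 times narrower means a filter (and pre/post ringing) roughly 15 times longer, which is one reason a 100 Hz transition band is unrepresentative of real-world playback filters.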


Reply #34
Thanks mzil, I'll read the AIX thread on AVS. Are you m.zillch on AVS? Wow, that'll be a lot of reading... but I enjoy it.
Yes. This post cuts to the chase: http://www.avsforum.com/forum/91-audio-the...ml#post28355562
Does this post reference the defective files that Amir and Fremer listened to before claiming world-changing positive results?
The two files cover exactly the same time code, about 1m52s into "Mosaic", the AIX Records song that Dr. Waldrep provided for the AVS forum challenge. These are the newer A2 and B2 versions, which are said to have corrected a small level mismatch found in the originally released versions, hence the "2" in both of their designations.

I stopped reading anything from Fremer back in the 1980s/90s, and from Amir some time last year, so I can't comment on their propaganda. Krabapple may know more about what they claim.


Reply #35
But I'm still left with multiple questions, the biggest being: why do *YOU* do ABX tests? By *YOU*, I mean the 7 who have answered on this thread (mzil, castleofargh, Cubist Castle, pelmazo, xnor, krabapple, eric.w) and the OP (Arnold B. Krueger).


To find out whether something really sounds different to me, typically when there is debate about whether it sounds different to *anyone*.

Sometimes it does (p < 0.05), sometimes not.

If you hang around audio forums long enough, plenty of such debates arise.

Perhaps you are a bit new to this?
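For anyone wondering where a figure like "p < 0.05" comes from in an ABX run: under the null hypothesis the listener is guessing, so each trial is a 50/50 coin flip and the score follows a binomial distribution. Here is a minimal sketch of the exact one-sided p-value, using only the Python standard library; the function name is hypothetical.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of scoring at least `correct` out of `trials`
    by pure guessing (each trial a 50/50 coin flip)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for correct in (10, 11, 12, 13):
    print(f"{correct}/16 correct: p = {abx_p_value(correct, 16):.4f}")
```

With 16 trials, 12 correct is the smallest score that gets under 0.05, while 11/16 does not, which is why a run that "looks mostly right" can still be indistinguishable from guessing.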


Reply #36
People don't talk about it very much, but fb2k ABX is a fantastic sighted listening aid as much as it is a double-blind testing tool [just click A and B and never even examine X]. It lets you pick whatever files you want, synchronizes their playback [assuming they were made properly], optionally applies DSP or ReplayGain, switches at any point you want, loops a favorite section, and, most importantly, switches nearly instantaneously between A and B at the listener's discretion. Echoic memory is fleeting, and being able to flip between two options so quickly and easily greatly improves one's sensitivity.

Putting this switching control in the listener's hands is also important. When I read about A/B testing where the switchovers are not under the test listener's control, but are instead made at predetermined time marks or by the test administrator, I always think to myself how much better the listeners would have done had they been the ones pressing the button. Bob Stuart's recent paper, an examination of digital filters that trashes CD quality sound in order to promote Hi-re$, would be an example of that.
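The key property described above, instant listener-controlled switching between time-aligned sources with an optional loop, can be modelled with a single shared playhead. Below is a toy sketch, not foobar2000's actual implementation: the class name is hypothetical, sources are plain sample lists, and a real player would work on audio buffers and smooth the switch to avoid clicks.

```python
class SyncedSwitcher:
    """Toy model of listener-controlled A/B switching: both sources share one
    playhead, so switching changes what you hear but not where you are."""

    def __init__(self, a, b, loop_start=0, loop_end=None):
        if len(a) != len(b):
            raise ValueError("sources must be the same length (time-aligned)")
        self.sources = {"A": a, "B": b}
        self.current = "A"
        self.loop_start = loop_start
        self.loop_end = len(a) if loop_end is None else loop_end
        self.pos = loop_start

    def switch(self, name):
        # Instant switch: the shared playhead position is left untouched.
        self.current = name

    def read(self, n):
        """Return the next n samples from the current source, looping."""
        out = []
        for _ in range(n):
            out.append(self.sources[self.current][self.pos])
            self.pos += 1
            if self.pos >= self.loop_end:
                self.pos = self.loop_start
        return out

# Usage: loop samples 2..5, listen to A, then flip to B mid-loop.
player = SyncedSwitcher(list(range(10)), [100 + x for x in range(10)], 2, 6)
first = player.read(3)
player.switch("B")
second = player.read(3)
```

Because the position never resets on a switch, the listener compares the same moment of music in A and B with no gap, which is what keeps the comparison within echoic memory.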


Reply #37
… most serious studies in sciences are done with blind tests for a reason.
Well, I know what you mean - human studies where knowledge of test parameters could influence results - but *most* studies don’t fall in this category. You don’t need to do blind testing with protons, yeast or fruit flies… you just need to control the variables. ;-)
I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other. Testing of headphones would fall into the category of personal preference, and you are certainly entitled to your personal preference.

Of course, sighted evaluation of headphones is subject to bias, but this may be a case where it is OK to be influenced by factors other than how they sound.
You can't ABX headphones. Even if they felt the same on my head and I closed my eyes, the switching delay would be too long for that purpose.
Sorry to pick nits, but I *could* ABX headphones if I wanted to publish the results; it would just be too much trouble for a personal purchase. That's why I'm asking some basic questions: I'm trying to understand how others do it, and whether there are shortcuts (hardware/software) to minimise the hassle. I find the responses here informative. Thanks, all.
Echoic memory is fleeting and being able to flip between two options so quickly and easily greatly improves one's sensitivity.
It seems a lot of people are very focussed on echoic memory. It would seem that this is crucially important for *some* tests, but inappropriate, perhaps counterproductive, for others. If I want to compare two pure tones with (a) slightly differing amplitudes and constant pitch, or (b) slightly differing pitch and constant amplitude, I would certainly be concerned with echoic memory. But if I wanted you to identify which instrument is played for an E plucked on a lute or mandolin, you would need a long enough sample to identify it, and you could probably remember your answer for hours or days. My understanding is that auditory memory goes through 3 stages: perceptual auditory storage (aka echoic memory), which lasts up to 300 ms; synthesised auditory memory, lasting 1 to 30 sec; and generated abstract memory, which can last very long. For headphones or speakers, am I looking for subtle differences that will disappear from memory quickly, or are the differences “abstractable” so I can remember them? I suspect the latter may be important to me. I agree with pdq that non-sonic factors will play an important subjective role, e.g. comfort and (for me) cost.
If you hang around audio forums long enough, plenty of such debates arise.

Perhaps you are a bit new to this?
LOL, yes, I’m new to audio forums (since Dec.) and only recently started posting. There is a low SNR on most forums, so I thought I’d ask some questions of my own. I have to admit that there is some short-lived entertainment value (à la the Jerry Springer Show) in the “punch-outs”, but they tend to distract when seeking info. Thanks for responding about your use.
People don't talk about it very much, but fb2k ABX is a fantastic sighted listening aid as much as it is a double-blind testing tool [just click A and B and never even examine X]. It lets you pick whatever files you want, synchronizes their playback [assuming they were made properly], optionally applies DSP or ReplayGain, switches at any point you want, loops a favorite section, and, most importantly, switches nearly instantaneously between A and B at the listener's discretion.
That’s a useful tip. Thanks. I use Macs and downloaded a program called ABXTester (not *nearly* as nice as fb2k that you describe). I also have Parallels (running Windows 7) on one of my Macs. Does anyone know if that works well?

… I always think to myself how much better the listeners would have done had they been the ones pressing the button. Bob Stuart's recent paper, an examination of digital filters that trashes CD quality sound in order to promote Hi-re$, would be an example of that.
Are you saying you believe their results would have been better with listener-switching? I’ll read the paper soon, and the HA threads about it, to further my education.


Reply #38
I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other.


AFAIK, there is no controversy over whether or not headphones sound different from each other. The measured differences in important areas are well above known, and even highly conservative, thresholds of audibility. Furthermore, headphones feel different on the head, so there are irreducible non-audible factors in the evaluation.

ABX was not designed for headphone or speaker testing. It was designed for those situations where there is a serious question as to whether an audible difference even exists.
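Since ABX answers only "can this listener tell A from B at all?", its bookkeeping is simple: X is secretly assigned to A or B at random on each trial, and the listener's answers are scored against that hidden sequence. Below is a minimal sketch of that bookkeeping; the function names are hypothetical, and it deliberately says nothing about preference, which is why it suits the "does a difference even exist?" question rather than headphone or speaker selection.

```python
import random

def make_abx_trials(n_trials, seed=None):
    """Hidden assignment of X for each trial: 'A' or 'B', chosen at random."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]

def score(hidden, answers):
    """Count the trials where the listener's answer matched the hidden X."""
    if len(hidden) != len(answers):
        raise ValueError("one answer per trial")
    return sum(h == a for h, a in zip(hidden, answers))

# Usage: a listener who just guesses, scored against a hidden sequence.
hidden = make_abx_trials(16, seed=42)
guesses = make_abx_trials(16, seed=7)
print(score(hidden, guesses), "/ 16 by guessing")
```

The random, hidden assignment is the whole point: any strategy that ignores what X actually sounds like (always answering "A", alternating, etc.) can only score at chance level on average.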

Quote
Testing of headphones would fall into the category of personal preference, and you are certainly entitled to your personal preference.


I think that some attributes of headphones rise above mere personal preference, such as comfort (particularly long-term comfort), dynamic range, nonlinear distortion, isolation, and smoothness of response.

Frequency response at the ear drum is subject to natural variations based on how the headphones and particularly earphones interface with the ear.

Quote
Of course, sighted evaluation of headphones is subject to bias, but this may be a case where it is OK to be influenced by factors other than how they sound.


IME, people's preferences for headphones are strongly affected by frequency response, which is easy to manage with equalization, provided the headphones are reasonably smooth and extended and have good dynamic range. Unlike with speakers, the frequency response of headphones is well described by a single value per frequency for the acoustic signal, which is the air pressure in the ear canal.
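Because a headphone's response reduces to one level per frequency, a basic EQ correction is just "target minus measured" per band. Here is a minimal sketch under stated assumptions: the measured values are hypothetical eardrum-referenced levels, the flat target is chosen only for illustration (not a recommended target curve), and the boost cap reflects the proviso above that EQ only works well when the headphones are reasonably smooth, extended, and have dynamic range to spare.

```python
def eq_gains_db(measured_db, target_db, max_boost_db=6.0):
    """Per-band EQ gain in dB that moves a measured response toward a target.

    Cuts are applied in full; boosts are capped at max_boost_db so the EQ
    doesn't eat too far into headroom and dynamic range.
    """
    gains = {}
    for freq, level in measured_db.items():
        correction = target_db[freq] - level
        gains[freq] = min(correction, max_boost_db)
    return gains

# Hypothetical measurement (Hz -> dB) against a flat target.
measured = {100: 2.0, 1000: 0.0, 3000: -4.0, 8000: -9.0}
target = {100: 0.0, 1000: 0.0, 3000: 0.0, 8000: 0.0}
print(eq_gains_db(measured, target))
```

In the example the 8 kHz dip would need 9 dB of boost but is capped at 6 dB, a reminder that a deep, narrow dip is better treated as a sign of an unsmooth response than as something to EQ away.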


Reply #39
Are you saying you believe their results would have been better with listener-switching?


One of the founding principles of ABX testing as most people here know it is that the most sensitive results are obtained when the listening test allows the listener to interact with the process.

Experience shows that the selection of the segment of music used in the comparison is a very important parameter.

To establish context, I'm referring to this article:

The Audibility of Typical Digital Audio Filters in a High-Fidelity Playback System

The conference paper is based on a straw man argument against ABX. The ABX test that they criticize is the 1950 version, which was not interactive. The ABX test that has been widely used for audio component testing for approximately the past 38 years is highly interactive.

I find it curious that in his recent 3/16/2015 response, John Stuart continues to make this rather grotesque error, even though he has been publicly corrected on it since 2/23/2015. Old dogs and new tricks, or a tacit admission that without the error his criticism of ABX simply has no basis?

If you look at Stuart's 3/16/2015 response, you might find a number of criticisms of the listening test that he used in his conference paper. The music segments used in the conference paper appear to have been chosen quite arbitrarily, and they were listened to non-interactively.

Quote
I’ll read the paper soon, and the HA threads about it, to further my education.


You may wish to review these posts about modern ABX testing of audio components:

Forum post explaining modern ABX testing #1

Forum post explaining modern ABX testing #2


Reply #40
Echoic memory is fleeting and being able to flip between two options so quickly and easily greatly improves one's sensitivity.

It seems a lot of people are very focused on echoic memory.


As is a lot of the scientific literature.

Quote
It would seem that this is crucially important for *some* tests, but inappropriate, perhaps counterproductive, for others. If I want to compare two pure tones with (a) slightly differing amplitudes and constant pitch, or (b) slightly differing pitch and constant amplitude, I would certainly be concerned with echoic memory.


I disagree. IME, those are all simple attributes that can be dealt with by abstract memory. I don't think that people have to memorize pure tones at every possible frequency to know what both pure and impure tones sound like. I'm under the impression that when listening to tones I discern abstract properties such as steadiness of pitch and loudness in something pretty close to real time, and I don't need a memory of what every different frequency sounds like to do this. Along those lines, I have learned different rules for what a pure tone sounds like at vastly different frequencies, but over wide ranges it's the rules, not any specific memory, that dictate my judgement.

Quote
But if I wanted you to identify which instrument is played for an E plucked on a lute or mandolin, you would need a long enough sample to identify it, and you could probably remember your answer for hours or days.


I again disagree. A lot hinges on which properties of the lute note change between the samples. If the changing property is one that is familiar to me, such as pitch, loudness or timbre, then things work more or less as suggested above. But if the property that changes between the samples is unfamiliar to me, then at least initially echoic memory probably has a lot to do with it. As I listen to the comparison more, I often find that the changing property is added to my internal multidimensional list of properties of lute notes, and then I can detect those new differences based on synthesized auditory memories.
 

Quote
My understanding is that auditory memory goes through 3 stages: perceptual auditory storage (aka echoic memory), which lasts up to 300 ms; synthesized auditory memory, lasting 1 to 30 sec; and generated abstract memory, which can last very long.


Agreed. That is a workable list. It may not be complete.

Quote
For headphones or speakers, am I looking for subtle differences that will disappear from memory quickly or are the differences “abstractable” so I can remember them? I suspect the latter may be important to me. I agree with pdq that non-sonic factors will play an important subjective role, e.g. comfort and (for me) cost.


I think that things like purchase decisions would ideally be based on abstractable differences. I think that most subjective reviews pretend to be based on abstractable differences but due to the low quality listening tests that those reviews are based on, many of the purported abstractable differences are either wrong, poorly expressed, or purely based on imagination.


Reply #41
It seems a lot of people are very focused on echoic memory.

As is a lot of the scientific literature.

Hmmm. References? I just double-checked: auditory memory researchers do not focus so heavily on echoic memory. It is one link in the chain. Check out Nelson Cowan's work. He's the best known (and most cited) researcher in this area.
Quote
It would seem that this is crucially important for *some* tests, but inappropriate, perhaps counterproductive, for others. If I want to compare two pure tones with (a) slightly differing amplitudes and constant pitch, or (b) slightly differing pitch and constant amplitude, I would certainly be concerned with echoic memory.

I disagree. IME, those are all simple attributes that can be dealt with by abstract memory. I don't think that people have to memorize pure tones at every possible frequency to know what both pure and impure tones sound like. I'm under the impression that when listening to tones I discern abstract properties such as steadiness of pitch and loudness in something pretty close to real time, and I don't need a memory of what every different frequency sounds like to do this. Along those lines, I have learned different rules for what a pure tone sounds like at vastly different frequencies, but over wide ranges it's the rules, not any specific memory, that dictate my judgement.

In order to do a differential threshold test, as I describe, you must use echoic memory. You can't abstract a loudness. Sorry, it won't work. If you allow even a few seconds between samples, the measured threshold will come out too high.

Quote
But if I wanted you to identify which instrument is played for an E plucked on a lute or mandolin, you would need a long enough sample to identify it, and you could probably remember your answer for hours or days.

I again disagree. A lot hinges on which properties of the lute note change between the samples. If the changing property is one that is familiar to me, such as pitch, loudness or timbre, then things work more or less as suggested above. But if the property that changes between the samples is unfamiliar to me, then at least initially echoic memory probably has a lot to do with it. As I listen to the comparison more, I often find that the changing property is added to my internal multidimensional list of properties of lute notes, and then I can detect those new differences based on synthesized auditory memories.

Your identification of the lute, as a lute, relies on abstract memory, and your remembering that identification has nothing to do with echoic memory; it is pure abstract memory.

Quote
My understanding is that auditory memory goes through 3 stages: perceptual auditory storage (aka echoic memory), which lasts up to 300 ms; synthesized auditory memory, lasting 1 to 30 sec; and generated abstract memory, which can last very long.

Agreed. That is a workable list. It may not be complete.

Nelson Cowan calls it complete. How would you complete it?

Quote
For headphones or speakers, am I looking for subtle differences that will disappear from memory quickly or are the differences “abstractable” so I can remember them? I suspect the latter may be important to me. I agree with pdq that non-sonic factors will play an important subjective role, e.g. comfort and (for me) cost.

I think that things like purchase decisions would ideally be based on abstractable differences. I think that most subjective reviews pretend to be based on abstractable differences but due to the low quality listening tests that those reviews are based on, many of the purported abstractable differences are either wrong, poorly expressed, or purely based on imagination.
I think I mostly agree, but what do you mean by "due to the low quality listening tests that those reviews are based on"? For me, many of the pretty words used in audio reviews don't have meaning. But some things like "soundstage", I assume to mean an ability to localise the source of sounds in 3D (2D?). I have done this with live recordings of a small number of instruments. I think the term so often used in the VR literature is "presence". If I can abstract some ideas that relate to a feeling of immersion or presence, I might be able to compare headphones. But as has been extensively discussed in the VR literature, these factors are *heavily* influenced by other factors (biases). So then I come full circle: using very bias-able methods to decide. Oh well, that's life.

Reply #42
In order to do a differential threshold test, as I describe, you must use echoic memory. You can't abstract a loudness.

If you allow even a few seconds between samples, the threshold value will be incorrectly too high.


That hinges on what is considered "Too high".  For a few tenths of a dB I do need to hear the samples very close together. For several dB, I can walk into a room stone cold and guess the SPL value with a reasonable tolerance.  Most important differences in quality usually involve a fair number of dB.

Quote
Your identification of the lute, as a lute, relies on abstract memory, and your remembering your identification has nothing to do with echoic memory - pure abstract memory.


I was not talking about the identification of a lute sound as sounding like a lute, but rather about some more subtle way in which a particular lute sound differs from the usual lute sound. Note that a lute sounds like a lute over a fairly wide range of SPLs, timbres and fundamental frequencies, even though you may never have heard that timbre, fundamental frequency or SPL before.

Real world example: an amplifier or an MP3 encoder makes an unfamiliar kind of audible error when processing a certain kind of lute note. It is a new kind of error, of a class I've never heard before, and one that MP3 encoders in particular are prone to. It sounds like a lute, but it sounds wrong in a new and different way.

What I'm describing is a shift from relying on auditory memory, to learning an abstraction, and then to relying on that abstraction.

Quote
Quote
My understanding is that auditory memory goes through 3 stages: perceptual auditory storage (aka echoic memory), which lasts up to 300 ms; synthesized auditory memory, lasting 1 to 30 sec; and generated abstract memory, which can last very long.

Agreed. That is a workable list. It may not be complete.

Nelson Cowan calls it complete. How would you complete it?


What about working memory?  Cowan seems to believe in that.

Quote
I think I mostly agree, but what do you mean by "due to the low quality listening tests that those reviews are based on"?


Listening tests that are not based on close, matched comparisons but need to be.
Listening tests done by people who aren't really that familiar with listening tests or the item being compared.
Listening tests that actually involve small differences and need to be done blind to be valid, but aren't.

Quote
But some things like "soundstage", I assume to mean an ability to localise the source of sounds in 3D (2D?). I have done this with live recordings of a small number of instruments. I think the term so often used in the VR literature is "presence".


Agreed.  IME soundstaging is frequently abused. It can be a catch-all.

Quote
If I can abstract some ideas that relate to a feeling of immersion or presence, I might be able to compare headphones.


You might, but first you might want to answer the question: if you hold everything else reasonably constant, do most headphones even soundstage differently?

Why should headphones of a general kind (say, closed and sealed to the head) even soundstage differently?

How much of the perception of different soundstaging could be due to unmatched frequency response or just unmatched levels?
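Of the two, unmatched levels are the easier confound to remove before comparing. A minimal Python sketch, assuming both clips are already loaded as equal-rate lists of float samples (`rms`, `level_difference_db` and `match_level` are hypothetical helpers, not any particular tool's API): it measures the broadband RMS level difference in dB and scales one clip to match the other. It only equalizes overall level; it does nothing about frequency-response differences between headphones.

```python
import math

def rms(samples):
    """Root-mean-square value of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Broadband level of clip b relative to clip a, in dB (positive: b is louder)."""
    return 20.0 * math.log10(rms(b) / rms(a))

def match_level(reference, other):
    """Return a copy of `other` scaled so its RMS level equals that of `reference`."""
    gain = rms(reference) / rms(other)
    return [s * gain for s in other]

# Illustration: a synthetic tone and a copy of it played 1 dB hotter.
n = 1000
ref = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
hot = [s * 10 ** (1 / 20) for s in ref]

print(round(level_difference_db(ref, hot), 3))  # 1.0
matched = match_level(ref, hot)
print(abs(round(level_difference_db(ref, matched), 6)))  # 0.0
```

Broadband RMS is a simple choice; a weighted or loudness-based measure may match perceived level better, but that is beyond this sketch.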

Quote
But as has been extensively discussed in the VR literature, these factors are *heavily* influenced by other factors (biases). So then I come full circle: using very bias-able methods to decide. Oh well, that's life.


In reality, get things right within a few dB, and as the listener keeps listening, his FR biases and preferences change and things start sounding more familiar, and therefore right, to him.

Some call that "Equipment break-in". ;-)

Reply #43
In order to do a differential threshold test, as I describe, you must use echoic memory. You can't abstract a loudness.

If you allow even a few seconds between samples, the threshold value will be incorrectly too high.

That hinges on what is considered "Too high".  For a few tenths of a dB I do need to hear the samples very close together. For several dB, I can walk into a room stone cold and guess the SPL value with a reasonable tolerance.  Most important differences in quality usually involve a fair number of dB.

"Too high" means wrong. For testing with human subjects, you can't measure a differential threshold if the person can't directly "compare", which requires echoic memory. You may have an unusual ability to estimate and therefore abstract the SPL, but most people can't. I'm not talking about "most important differences in quality", I named a specific test that would require attention to echoic memory limits. But not for you ;-)

Quote
Your identification of the lute, as a lute, relies on abstract memory, and your remembering your identification has nothing to do with echoic memory - pure abstract memory.

I was not talking about the identification of a lute sound as sounding like a lute,

But I was, as an example (lute vs. mandolin) of a case where echoic memory can be ignored.

Quote
Quote
My understanding is that auditory memory goes through 3 stages: perceptual auditory storage (aka echoic memory), which lasts up to 300 ms; synthesized auditory memory, lasting 1 to 30 sec; and generated abstract memory, which can last very long.

Agreed. That is a workable list. It may not be complete.

Nelson Cowan calls it complete. How would you complete it?


What about working memory?  Cowan seems to believe in that.

It's not so much that he "believes" in it; that's the current understanding of general memory. Working memory is not specific to the auditory system, and an auditory experience must have been processed into "generated abstract memory" before you can "place" it in working memory or long-term memory.

Quote
If I can abstract some ideas that relate to a feeling of immersion or presence, I might be able to compare headphones.

You might, but first you might want to answer the question: if you hold everything else reasonably constant, do most headphones even soundstage differently?
...
How much of the perception of different soundstaging could be due to unmatched frequency response or just unmatched levels?

Last question first: I wouldn't be surprised if most or all of soundstaging relates to FR. As for the first question: since FRs differ a great deal from one headphone to another, I expect they would soundstage differently.

So the normal differential threshold is nonlinear, but constant above about 70 dB, where it is between 0.3 and 0.5 dB. Can you distinguish that difference after, say, 10 s? If so, I'd be truly impressed. But some subjects are outside the normal range. We could use you instead of an SPL meter. ;-)
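To put 0.3 to 0.5 dB in perspective, a quick Python conversion of level differences in dB into sound-pressure (amplitude) and intensity ratios, using the standard 20·log10 and 10·log10 definitions. The 3 dB row is only an illustrative "several dB" step, not a figure from the thread.

```python
def db_to_amplitude_ratio(db):
    """Sound-pressure (amplitude) ratio for a level difference in dB: 10**(dB/20)."""
    return 10 ** (db / 20)

def db_to_intensity_ratio(db):
    """Intensity (power) ratio for a level difference in dB: 10**(dB/10)."""
    return 10 ** (db / 10)

# The 0.3-0.5 dB threshold quoted above, plus an illustrative "several dB" step.
for step in (0.3, 0.5, 3.0):
    print(f"{step} dB -> pressure x{db_to_amplitude_ratio(step):.3f}, "
          f"intensity x{db_to_intensity_ratio(step):.3f}")
```

So 0.3 dB is roughly a 3.5% change in sound pressure (about 7% in intensity), and 0.5 dB roughly 5.9% (about 12% in intensity), while 3 dB is close to a doubling of intensity.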

Reply #44
I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other.
  AFAIK there is no controversy over whether or not headphones sound different from each other. ...  ABX was not designed for headphone or speaker testing. It was designed for those situations where there is a serious question as to whether an audible difference even exists.


Just because B might be easily distinguishable from A doesn't mean we no longer need to worry if some form of bias might be influencing listeners in their decision making regarding sound quality evaluations. Although I agree ABX testing itself ("Is there any difference or not?") may be a waste of time in both headphone and speaker testing (in most circumstances), don't take my agreement as any form of endorsement of the idea that double blind testing itself is no longer necessary with headphone/speaker testing [not that I'm claiming it is easy or even possible for most of us to pull off]. Double blind protocols are still VERY important, and that is why researchers (like S. Olive) use them in both speaker and headphone quality/preference testing [or at least attempt to as much as possible].
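For the "Is there any difference or not?" question itself, ABX scores are commonly judged against pure guessing with a one-sided binomial test. A minimal Python sketch, assuming independent trials and a guessing listener who is right with probability 0.5 on each one (`abx_p_value` is a hypothetical helper, not the calculation of any particular ABX tool):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial tail: probability of at least `correct` right answers
    in `trials` independent ABX trials if the listener is purely guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials

print(f"{abx_p_value(12, 16):.4f}")  # 0.0384
print(f"{abx_p_value(8, 10):.4f}")   # 0.0547
```

12 of 16 lands below a conventional 0.05 cutoff, while 8 of 10 does not quite clear it; the 0.05 cutoff is a convention chosen here for illustration, not something ABX itself prescribes.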

Reply #45
I doubt that most people would bother ABX testing of headphones because headphones do sound very different from each other.
  AFAIK there is no controversy over whether or not headphones sound different from each other. ...  ABX was not designed for headphone or speaker testing. It was designed for those situations where there is a serious question as to whether an audible difference even exists.


Just because B might be easily distinguishable from A doesn't mean we no longer need to worry if some form of bias might be influencing listeners in their decision making regarding sound quality evaluations.


I totally agree with that. When a difference is known to exist, preference testing makes sense, but good preference testing takes other forms than ABX.

Quote
Although I agree ABX testing itself ("Is there any difference or not?") may be a waste of time in both headphone and speaker testing (in most circumstances), don't take my agreement as any form of endorsement of the idea that double blind testing itself is no longer necessary with headphone/speaker testing [not that I'm claiming it is easy or even possible for most of us to pull off]. Double blind protocols are still VERY important, and that is why researchers (like S. Olive) use them in both speaker and headphone quality/preference testing [or at least attempt to as much as possible].


While, strictly speaking, they are not preference tests, I wonder how ABC/HR and MUSHRA would work out for situations where it is correctly assumed that audible differences exist.

Reply #46
"Too high" means wrong.


This isn't about right or wrong.

Quote
For testing with human subjects, you can't measure a differential threshold if the person can't directly "compare", which requires echoic memory.


Looks like proof by assertion to me. Can you do better?

Quote
You may have an unusual ability to estimate and therefore abstract the SPL, but most people can't.


Decades of training were required. Not everybody wants to do that, and not everybody has the opportunity.



Reply #47
Just because B might be easily distinguishable from A doesn't mean we no longer need to worry if some form of bias might be influencing listeners in their decision making regarding sound quality evaluations. Although I agree ABX testing itself ("Is there any difference or not?") may be a waste of time in both headphone and speaker testing (in most circumstances), don't take my agreement as any form of endorsement of the idea that double blind testing itself is no longer necessary with headphone/speaker testing [not that I'm claiming it is easy or even possible for most of us to pull off]. Double blind protocols are still VERY important, and that is why researchers (like S. Olive) use them in both speaker and headphone quality/preference testing [or at least attempt to as much as possible].

Certainly someone doing research as their job (e.g. Olive) would need to put in the time/expense/effort, and his job gives him the means to do so. And Olive has done lots of interesting and IMO important work. Are you suggesting blind protocols are needed for personal headphone decisions? I'm familiar with their "virtual headphone" method, where they inverse-filter Senn HD 518s and play models of other headphones through them. Do you know of other methods they may have used?

OT question: why is "double blind" always stated when, for example, fb2k isn't double? When I use "blind" at work, it's always assumed that no cues from any source (including people) are provided other than the controlled stimulus, whether only the subject is blinded (technically single, I guess), the experimenter as well (then double), or sometimes also the person doing the analysis (triple). Just curious.

Also, do you know anything about fb2k on a VM on a Mac, or a similar program for the Mac?
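On the single vs. double blind question above: the point of the second "blind" is that nobody who knows which sample is X can cue the listener. A minimal Python sketch of how a software ABX session can fill that role, assuming a simple design in which a random number generator picks X independently for each trial and the listener's answers are scored only against that hidden list (`make_abx_session` and `score` are hypothetical names, not fb2k internals):

```python
import random

def make_abx_session(trials, seed=None):
    """Hidden identity of X ("A" or "B") for each trial, drawn independently by an RNG.
    Only the program holds this list, so no experimenter is present to cue the listener."""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(trials)]

def score(hidden, answers):
    """Number of trials on which the listener's answer matches the hidden X."""
    if len(hidden) != len(answers):
        raise ValueError("exactly one answer per trial is required")
    return sum(h == a for h, a in zip(hidden, answers))

hidden = make_abx_session(16, seed=1)
guesses = make_abx_session(16, seed=2)  # stand-in for a listener who is only guessing
print(score(hidden, guesses), "of", len(hidden), "correct")
```

In this design the program plays the experimenter's part, which is why such a setup is often described as functionally double blind even though only one person is involved.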

Reply #48
While, strictly speaking, they are not preference tests, I wonder how ABC/HR and MUSHRA would work out for situations where it is correctly assumed that audible differences exist.

ABC/HR and MUSHRA were intended to make qualitative assessments of impairments relative to an "original" or reference, as you know. But, as you point out, the qualitative scoring system seems a plausible method for comparing 2 devices without a specific reference. A reread of the ITU docs should reveal whether a different analysis is required (do you use no reference, or arbitrarily assign one device to be the reference?).

 

Reply #49
"Too high" means wrong.

This isn't about right or wrong.

Well, you are very experienced with ABX testing. If I were to tell you I did a test with the old AIX AVS files (with unmatched levels), and said "I know the levels are different, but I took that into account before responding", would you tell me I did it "wrong", or would you be more gentle? Would you say "best practice would indicate..." or "I'd suggest a better way", or would you blast me? I've read many of your posts, and I can't use the word "gentle" to describe them. You brought up the significance of memory in posts 6 and 10 above. mzil mentions echoic memory in post 37. I respond that echoic memory is important for certain types of tests, but not all, and question the strong focus by so many on it. In post 41, you say the scientific literature focuses on it.
... and now you seem to argue that it can be ignored (if you are the subject). I'm confused about your position.

Quote
For testing with human subjects, you can't measure a differential threshold if the person can't directly "compare", which requires echoic memory.

Looks like proof by assertion to me. Can you do better?

Of course. Guilty as charged. I'm glad you pointed out "proof by assertion", and I hope I'm called on it whenever I do it. I apologize that I can't provide a list of references today, but I'm happy to do so on the weekend. Of course, you have also used "proof by assertion" several times above, and when I request references or clarification, you ignore me. That's okay, it's not your job to help me... I just kinda thought it'd be nice. I suspect that some of what I'll provide would have been in your response to my request (post 42) for references about echoic memory.
Quote
You may have an unusual ability to estimate and therefore abstract the SPL, but most people can't.

Decades of training were required. Not everybody wants to do that, and not everybody has the opportunity.

Sounds like you have "golden ears". No problem with TOS #8 though, I'd guess, because you make no claim of quality. I'm just happy that you would agree that Peter Aczel's lie #10 is no lie. He says: "The Golden Ears want you to believe that their hearing is so keen, so exquisite, that they can hear tiny nuances of reproduced sound too elusive for the rest of us." I don't know about "Golden Ears" (capitalized), but normal variability plus, as you point out, training do make some people more sensitive than the rest of us.
;-) Don't get angry. I'm playing with you a little. I know you use  "Golden Ears" as a derogatory term, and I just want to underscore your example of yourself, as someone who hears certain characteristics better than most. :-)