

Other Listening Test Methodologies?

Reply #25
The problem with MUSHRA is that the results are practically useless for any given individual, regardless of whether that individual was an actual testee.  It really only shines for large test groups.  ABX has the same problem of not being universally applicable (perhaps even more so), but MUSHRA is a one-shot test.  If you happen to guess the reference and the anchors are easy, the rankings for subjects that fall in between could very well be noise.  A second test of the same subjects by the same testee could come out completely different.  A solution could be to conduct multiple trials of the same subjects and toss them if the testee cannot give reasonably consistent results.  Unfortunately, MUSHRA tests already take a lot of time to complete as they are.  Besides, those who conduct them are interested in general performance anyway.

It boils down to choosing the right tool for the job.  In the case of demonstrating an ability to detect differences, the consensus of reliable experts seems to point at ABX.

Other Listening Test Methodologies?

Reply #26
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.

Say that I have an MP3 codec that can code at rates of 16 kbps to 320 kbps. How do I establish (for a given song or set of songs) a realistic estimate of the bitrate that is barely transparent for me, using as little listening time as possible? Intuitively, it must be some kind of divide and conquer, where I divide the range into two, check for audible errors, then repeat on the upper or lower half again and again until I have narrowed the range to my satisfaction.
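As a rough illustration of that divide-and-conquer idea, here is a minimal sketch. It assumes a hypothetical oracle function is_distinguishable(kbps) that runs a short ABX block on the encode at that bitrate and returns True if you can still hear a difference, and it assumes audibility is monotonic in bitrate, which real codecs only approximate:

```python
# Minimal bisection sketch, assuming a hypothetical oracle
# is_distinguishable(kbps) and monotonic audibility over bitrate.

def lowest_transparent_bitrate(is_distinguishable, lo=16, hi=320, tol=16):
    # Invariant: lo is distinguishable, hi is (assumed) transparent.
    while hi - lo > tol:
        mid = (lo + hi) // 2
        if is_distinguishable(mid):
            lo = mid  # still audible at mid: the threshold is higher
        else:
            hi = mid  # transparent at mid: the threshold is at or below mid
    return hi  # lowest bitrate not reliably distinguished, within tol

# Hypothetical dry run: pretend everything below 192 kbps is audible.
print(lowest_transparent_bitrate(lambda kbps: kbps < 192))
```

Note that each oracle call would itself have to be a statistically solid ABX block; a single wrong answer sends the search into the wrong half for good, which is exactly the listener-variability worry raised a few posts down.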

In other cases, I might have some room correction algorithm that allows me to tweak the 1-dimensional "amount of correction". How do you balance the evils?

-k

Other Listening Test Methodologies?

Reply #27
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.


There is the rub - setting up a mechanism for continuously varying the sound alteration.

This is doable for simple things like hearing thresholds - the widely used up/down test depends on having such a mechanism.

Dolby made such a mechanism for jitter, and IME that's not too hard to do.  Did anybody say "Deep pockets"?  ;-)

But the arbitrary case taxes the mind.

It is far easier to set up a list of target percentages of sound alteration and make files that fit the list.

Other Listening Test Methodologies?

Reply #28
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.


There is the rub - setting up a mechanism for continuously varying the sound alteration.

This is doable for simple things like hearing thresholds - the widely used up/down test depends on having such a mechanism.

Dolby made such a mechanism for jitter, and IME that's not too hard to do.  Did anybody say "Deep pockets"?  ;-)

But the arbitrary case taxes the mind.

It is far easier to set up a list of target percentages of sound alteration and make files that fit the list.

In my mind, a sensibly chosen range of 100 is practically the same as continuous. I.e., if you can create the degradation offline in MATLAB or something else as a sorted set of 100 files, then any testing methodology that gives me good estimates of JND etc. on those is interesting. Disk space is cheap. CPU cycles are cheap. My life-span (and attention-span) is finite...

I am more concerned with: how do you "train" properly (start out with the most degraded sample?), how do you avoid listening fatigue (minimize the listening time?), and how do you cope with the inherent variability of listeners ("backtrack" to some degree?).
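One answer to all three questions at once is an adaptive staircase. The sketch below is a hedged example, not an established recipe for this exact setup: it assumes the sorted set of 100 files (index 0 = most degraded, 99 = least), starts with the most degraded sample, takes big steps first to save listening time, and backtracks automatically when the listener slips. A 2-down/1-up transformed staircase of this kind converges near the 70.7%-correct point of the psychometric function (Levitt's rule); run_trial(level) is a hypothetical one-trial blind test returning True on a correct response:

```python
import random

def staircase(run_trial, start=0, step=8, reversals_wanted=8):
    level, streak, direction = start, 0, None
    reversals = []
    while len(reversals) < reversals_wanted:
        if run_trial(level):
            streak += 1
            if streak < 2:
                continue              # need two correct in a row to advance
            streak, move = 0, +1      # harder: toward less degradation
        else:
            streak, move = 0, -1      # easier: toward more degradation
        if direction is not None and move != direction:
            reversals.append(level)   # record each turn-around point
            if len(reversals) >= 4:
                step = max(1, step // 2)  # finer steps once settled in
        direction = move
        level = max(0, min(99, level + move * step))
    return sum(reversals[-6:]) / 6    # estimate: mean of the last reversals

# Hypothetical dry run: a listener whose hit rate falls off around level 50.
sim = lambda lvl: random.random() < max(0.5, 0.99 - 0.98 * max(0, lvl - 40) / 40)
print(staircase(sim))
```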

Other Listening Test Methodologies?

Reply #29
If you prefer the same version 75% or more of the time, then it is audible.

If I guess right 75% of the time on a set of 8 coin flips, does that make me clairvoyant?

I did it yesterday, so I guess I am!  Thanks for confirming my suspicion.

Seriously though, were you not satisfied with the answers you got when you raised this point nearly ten years ago?
http://www.hydrogenaudio.org/forums/index....st&p=163040

FWIW, I've done these types of tests and think they are much more fun than MUSHRA, but they were given in order to establish a preference between two things without the presence of a reference.


No it doesn't.  You used too few trials, and 2AFC is not ABX.  The stats involved are different.  Coin flipping isn't a model of it. Quite simply, you misunderstood. And they are best for preference testing.  I made no bones about that. Preference testing has its place.  Testing for a difference is still done.  If your results are below the 75% level then you aren't perceiving anything upon which a preference could be based.

Other Listening Test Methodologies?

Reply #30
One that makes sense to me is a variation used in the food industry: two-alternative forced choice (2AFC). Also, generally you have a reliably perceived difference if the testee scores 75% correct choices, regardless of the number of trials.

You present two choices, A and B.  The testee must choose one.  The parameter in the food industry is something like: choose the sweeter of the pair.  In audio you could ask a person to choose the sample with the most bass, or simply the version they prefer, or the one that sounds most real.

I think it is nicer than ABX, as it is more like how people listen for differences when not doing blind tests.  They listen to a couple of things and pick the one they prefer.  Also, you are not straining to hear whether something is different or whether it matches some reference.  You know for certain the two tracks presented are in fact different.  You just pick the one you prefer, or the one with whatever quality is being tested for.  So you hear two versions, know they are different, pick a preference.  Of course, which version is presented first varies randomly.  If you prefer the same version 75% or more of the time, then it is audible.
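For what the mechanics of such a session might look like, here is a minimal sketch under stated assumptions: get_choice(first, second) is a hypothetical stand-in for playing both clips and collecting the listener's pick, and the simulated listener at the bottom exists only to exercise the loop:

```python
import random

def run_2afc(clip_a, clip_b, get_choice, trials=20):
    wins_for_a = 0
    for _ in range(trials):
        # Randomize presentation order on every trial, as described above.
        first, second = random.sample([clip_a, clip_b], 2)
        chosen = get_choice(first, second)
        wins_for_a += (chosen == clip_a)
    return wins_for_a / trials  # compare against the 75% criterion

def simulated_listener(first, second):
    # Toy listener: prefers clip "A" 80% of the time, order-blind.
    return "A" if random.random() < 0.8 else "B"

print(run_2afc("A", "B", simulated_listener))
```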


Proving once again that tests are a lot more fun if they aren't real tests. One of the characteristics of a real test is that it must provide a means for people to fail. Sorry about that!

You seem to be sort of dancing around tests that are more like MUSHRA or ABC/HR.  They aren't preference tests, but they are more like preference tests than ABX is.

It's really about the right tool for the job. ABX seems to still be king if you want to know if there is an audible difference, but it is horrible for preference testing.

If you want to look at proper blind preference testing, please see http://www.sensorysociety.org/ssp/wiki/Category:Methodology/  The Triangle test is pretty close to ABX.


So what gave you the idea this was a test you couldn't fail?  Fail to meet the 75% level, and whatever preference you are testing for isn't perceivable as different.  Come on, Arnie, read this more carefully.  It is used successfully in the food and other industries.

Other Listening Test Methodologies?

Reply #31
You used too few trials, and 2AFC is not ABX.  The stats involved are different.  Coin flipping isn't a model of it.

1. You never specified a minimum number of trials.

2. Why couldn't I use a coin to determine what my answer would be when taking the test?  Perhaps you can provide the statistical model that illustrates how simple common sense isn't applicable here. If there are two choices then a coin flip is a perfectly valid comparison to illustrate how well the test resists guessing (see the sketch after these points).

3. Tell me why one couldn't simply impose limitations on ABX in order to perform 2AFC. I contend that any differences prohibiting ABX from being used to perform 2AFC lie in the nature of the test subjects (tasting vs. hearing), and those differences would preclude 2AFC from being used in listening tests.  The point is that you can't distill the essence of a sound clip for consumption like you can a sample for tasting, without someone crying foul over whether the selection was sensitive enough.
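To make the coin-flip point concrete, the probability of reaching a 75% hit rate by pure guessing (p = 0.5 per trial) depends entirely on the trial count, which is why a fixed cutoff means little without a stated minimum number of trials. A small self-contained calculation:

```python
from math import ceil, comb

def p_guess_at_least(n, fraction=0.75):
    """P(at least ceil(fraction * n) correct out of n) under pure guessing."""
    k_min = ceil(fraction * n)
    return sum(comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

for n in (8, 16, 32):
    print(n, round(p_guess_at_least(n), 4))
# 8  -> 0.1445  (a guesser clears 75% about one time in seven)
# 16 -> 0.0384
# 32 -> 0.0035
```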

The point of all this is whether a test for preference, where it is fair to assume that a reasonable difference exists between the subjects, can also be used to demonstrate that there is simply a difference between two test subjects when that difference may only just cross the threshold of perceptibility (if that).  Unless I am mistaken, ABX is a superset of 2AFC and as such is far better equipped to demonstrate differences.

Other Listening Test Methodologies?

Reply #32
I accept the premise that ABX will show differences with the highest resolution; however, we all understand that, especially with components, it's quite a difficult test to set up. There are several complaints regarding "stress" of the listener and "lack of familiarity", as well as that the test is "too clinical, doesn't capture the soul of the music".  Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference, etc.), could there be a test that is a subjective "blind" test? Basically, asking respondents where they hear differences between several blinded options? For example, everyone's favorite cable debate. (I think this could also be done for amps or music sources or other specific components, but it would obviously be more difficult. I think this test methodology does not offer anything regarding speakers.)

We take a range of cables: a "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (to force a different sound), and a duplicate of one of the previous cables.  We "blind" these samples and ask testers to specifically describe (using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3-point scale to rate such things as "transparency in the highs", "real-sounding instruments", etc. Users are allowed to listen/test however they want.

Assumptions:
Sample size would have to be large
Form would have to include everything people might want in order to describe the differences (no "write-ins" allowed)
Form would require testers to fill out details of their listening system (covariates)
Cables would have to be shipped from person to person
Cables must stay blinded, only labelled as 1, 2, 3, 4, etc.
Cable properties (resistance, capacitance, etc.) should be measured and used as covariates

Variances between the duplicate cables are used as your "listening error" (see the sketch after the list below). Covariates are analyzed to see whether or not they contribute to the variance in the data.

Weaknesses of this method: you'll never have statistical power for specific covariates, for example a given set of speakers or a specific amplifier, but perhaps we could include topologies, cost, or other measures to see if "systems greater than $50k" give different results than "systems under $10k".
We are relying on people to leave the cables blinded. It might be possible for someone to cheat.
We would need to rely on "trained listeners", but the sample size would have to be high, and this would need high community acceptance.
The test should be anonymous; however, with the amount of information we would need to collect, people may still fear a lack of anonymity.
The sample size needed to avoid false positives has to be very large.
The form itself guides answers, so it would have to be very carefully reviewed to be as neutral as possible.
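As a sketch of how the duplicate pair could anchor the analysis, here is one crude possibility; all names and numbers are hypothetical, for illustration only. ratings[label] holds each listener's 3-point scores for one blinded cable, and labels "3" and "4" are the physically identical duplicates, so the spread of their paired rating differences estimates pure listener noise:

```python
import statistics

ratings = {                    # hypothetical 3-point scores, six listeners
    "1": [3, 2, 3, 3, 2, 3],   # expensive cable
    "2": [2, 2, 3, 2, 2, 3],   # cheap cable
    "3": [2, 3, 2, 3, 2, 2],   # duplicate A
    "4": [3, 2, 2, 3, 2, 3],   # duplicate B (same cable as A)
}

def paired_diffs(a, b):
    return [x - y for x, y in zip(a, b)]

def exceeds_noise_floor(test_diffs, dup_diffs, factor=2.0):
    """Crude screen: a contrast is interesting only if its mean difference
    beats `factor` times the noise estimated from the duplicate pair."""
    noise = statistics.stdev(dup_diffs)
    return abs(statistics.mean(test_diffs)) > factor * noise

dup = paired_diffs(ratings["3"], ratings["4"])       # identical cables
contrast = paired_diffs(ratings["1"], ratings["2"])  # expensive vs. cheap
print(exceeds_noise_floor(contrast, dup))
```

A real analysis would fold the covariates into a proper mixed model, but the duplicate pair as a built-in noise floor is the part the design hinges on.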

Do people think this test methodology would be valuable?

Other Listening Test Methodologies?

Reply #33
It's just a big departure from how people actually listen.

I'm not sure what you mean by this. True, to get the most sensitivity to small differences one listens to short segments while rapidly switching between them, but there is absolutely no reason that one could not do ABX testing by listening to the entire piece from beginning to end each time.


There is a very good reason not to listen to each alternative that long. The human memory for small audible details falls off a cliff after 2 to 8 seconds, depending on the experiment.

Quote
It's all a matter of what you want to accomplish and how much time you are willing to put into it.


If you feel that way...  I like to succeed when success is at all possible!

Quote
The main thing is repetition. If the segments are short, that's also unnatural in terms of the length of a listening segment.


The only reasonable argument is that repetition is unnatural in that it provides unnaturally sensitive results.  That actually makes sense in some cases because it is more like natural listening.

Quote
True, one could listen to entire pieces ("Let's A/B Tosca! --see you in a month!")


For your trouble, you get stinky results.

Quote
But ears acclimate to the current sound environment. ABX depends on listening memory--unlikely to be effective for listening sessions which resembled "normal" listening.


Nobody has explained to me how you compare 2 sonic alternatives without depending on listening memory. Someone actually published an article in one of the consumer audio magazines suggesting that you listen to one alternative through one speaker and the other through another speaker and stand halfway between them...

Not so much!


Quote
I tend to listen to an album all the way through, so my typical realistic session would be in the neighborhood of 30-60 minutes. I could provide Likert responses with some confidence, but compared to an album I heard an hour ago? It doesn't seem feasible. Not because it takes too long, but because aural memory will not function effectively across such long spans. I could be wrong, I'd be interested in published data if anyone has done it.


"This is your brain on Music" by Levetin.

"Achilles' Ear? Inferior Human Short-Term and Recognition Memory in the Auditory Modality" by James Bigelow and Amy Poremba. Someone pointed this one out to me and I've only read the abstract, but it seems pretty good.

Other Listening Test Methodologies?

Reply #34
I accept the premise that ABX will show differences with the highest resolution,


Yes, if a same/different result is what you want.

Quote
however we all understand that, especially with components, it's quite a difficult test to set up.


The fact that we have a recognized tool for the purpose, FOOBAR2000, makes a big difference.

Quote
There are several complaints regarding "stress" of the listener and "lack of familiarity", as well as that the test is "too clinical, doesn't capture the soul of the music".


Typically heard from people who have never done an ABX test themselves, or were disappointed when their hobby horse theory was not proven out as overwhelmingly as they desired.

Quote
Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference, etc.),


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!

Quote
We take a range of cables "known good"expensive cable, a cheap cable, a cheap cable with a capacitor in line (Force a different sound) and duplicate of whichever of the previous cables.  We "blind" these samples and ask testers to specifically describe (Using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3 pt scale to rate such things as "transparency in the highs", and "real sound instruments" etc etc. Users are allowed to listen/test however they want.



Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: stick to things that actually sound different.  Things whose measured differences are so large that the known thresholds of hearing predict a possibly favorable outcome.


And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.

 

Other Listening Test Methodologies?

Reply #35
I accept the premise that ABX will show differences with the highest resolution,


Yes, if a same/different result is what you want.


Personally, I think same/different is 75% of it. But for the industry to move in a positive direction, additional information is also necessary.
Quote
Quote
however we all understand that, especially with components, it's quite a difficult test to set up.


The fact that we have a recognized tool for the purpose, FOOBAR2000, makes a big difference.

Quote
There are several complaints regarding "stress" of the listener and "lack of familiarity", as well as that the test is "too clinical, doesn't capture the soul of the music".


Typically heard from people who have never done an ABX test themselves, or were disappointed when their hobby horse theory was not proven out as overwhelmingly as they desired.

Quote
Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference, etc.),


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!

Yes, and seeing the plethora of "digital only" tests, testing only files is relatively easy. The difficulty comes in testing things that are not files, such as amplifiers, speakers, etc. I agree that ABX is a great test for its purpose. I think it does the job fantastically. I'm asking how to collect data for things that aren't that easy to measure, and then how do we know better or worse? (Yes, I know MUSHRA exists, and I honestly think that wouldn't be a bad way either.)
Quote
Quote
We take a range of cables: a "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (to force a different sound), and a duplicate of one of the previous cables.  We "blind" these samples and ask testers to specifically describe (using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3-point scale to rate such things as "transparency in the highs", "real-sounding instruments", etc. Users are allowed to listen/test however they want.



Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: stick to things that actually sound different.  Things whose measured differences are so large that the known thresholds of hearing predict a possibly favorable outcome.


And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.

I agree with you, and I do not want to rehash any part of that debate. My question is: would the above proposed test provide valuable data? (Forget that it's cables; I just used them as an example.) Obviously this would be extremely difficult with speakers, but not impossible (you couldn't ship them from person to person while keeping them blinded). Unfortunately, in my opinion, speakers are where it's most critical to see what differences, if any, exist, and not only as a yes or no.

All of this stems from my desire to build or buy a nice set of loudspeakers, and my intense desire for them to be the best possible speakers my money can buy. The lack of objective data on what is and isn't actually different between products is frustrating to me. ABX has existed for a long time, and has not and cannot provide the data I'm looking for, so is there another option that is actually feasible? Perhaps one that people are more willing to participate in?

Other Listening Test Methodologies?

Reply #36


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!


Yes, and seeing the plethora of "digital only" tests, testing only files is relatively easy. The difficulty comes in testing things that are not files, such as amplifiers, speakers, etc. I agree that ABX is a great test for its purpose. I think it does the job fantastically. I'm asking how to collect data for things that aren't that easy to measure, and then how do we know better or worse? (Yes, I know MUSHRA exists, and I honestly think that wouldn't be a bad way either.)

Quote
Quote
We take a range of cables: a "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (to force a different sound), and a duplicate of one of the previous cables.  We "blind" these samples and ask testers to specifically describe (using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3-point scale to rate such things as "transparency in the highs", "real-sounding instruments", etc. Users are allowed to listen/test however they want.


Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: stick to things that actually sound different.  Things whose measured differences are so large that the known thresholds of hearing predict a possibly favorable outcome.

And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.


I agree with you, and I do not want to rehash any part of that debate. My question is: would the above proposed test provide valuable data? (Forget that it's cables; I just used them as an example.) Obviously this would be extremely difficult with speakers, but not impossible (you couldn't ship them from person to person while keeping them blinded). Unfortunately, in my opinion, speakers are where it's most critical to see what differences, if any, exist, and not only as a yes or no.


The better modern DACs and ADCs are several times more than good enough to transcribe anything audible that happens in the electrical domain (cables, amps, digital players) into a file.  ABX the file. Done.

Quote
All of this stems from my desire to build or buy a nice set of loudspeakers, and my intense desire for them to be the best possible speakers my money can buy. The lack of objective data on what is and isn't actually different between products is frustrating to me. ABX has existed for a long time, and has not and cannot provide the data I'm looking for, so is there another option that is actually feasible? Perhaps one that people are more willing to participate in?


There are fairly complete technical tests of loudspeakers on the web:

The Stereophile web site:

http://www.stereophile.com/category/floor-...speaker-reviews

http://www.stereophile.com/category/stand-...speaker-reviews

Soundstage: Canadian National Research Council Labs tests

http://www.soundstagenetwork.com/index.php...6&Itemid=18

Subwoofer technical reviews:

http://www.data-bass.com/home




Other Listening Test Methodologies?

Reply #37
I think some critics of ABX, who propose "superior" methods, haven't tried it.

I've run tests (of equipment) where I intended to just listen to full pieces of music and switch slowly, because that's what participants claimed to want - they were sniffy about fast switching ruining their musical enjoyment.

First they tried the full pieces of music, and found that despite familiar music, a fairly relaxed setting, and plenty of time, the previously "obvious" difference disappeared simply because they didn't know what they were listening to. THEN they were begging me to switch sources every second.

Proving an audible difference comes first. Deciding which is "better" comes second.

Cheers,
David.

Other Listening Test Methodologies?

Reply #38
I think some critics of ABX, who propose "superior" methods, haven't tried it.


Amen brother!  Almost invariably they will expose themselves. Just saw it happen today over at AVS.

The other category of bogus critics are those who did try it but it didn't agree with their beliefs and/or prejudices, and since they fancy themselves to be infallible, ABX has to be wrong. A lot of high end "Authorities" (we all know their names) fit into that category. 

Audiophiles as a group may include more people with Narcissistic Personality Disorder (DSM-IV) than the general public, and this is exactly a behavior pattern of those unfortunates who are so afflicted.

ABX demands an ego-less approach to listening.