HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: UltimateMusicSnob on 2013-09-20 14:13:04

Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-20 14:13:04
Has anyone come across other listening test methodologies they thought were highly revealing, useful, and rigorous? In a few papers I've seen (mostly AES) there are methodological details that vary here and there, but mostly researchers are doing double-blind tests of direct comparisons between two files.

There could be a role, for example, in tests of the form usually seen in medicine. Instead of taking one person and exposing them to both stimuli as a single data point, take a lot of persons, put them in different groups, and then collect Likert-scale data on their response to one class of stimuli. You'd probably want 100-300 persons per group, and of course expose them still double-blind, but only to one format.

This would present a calibration problem, of course, since who's to say what each number on the Likert scale represents. Still, with careful random sampling and a large sample, it's possible that the variations in judgment here would wash out in a large sample n (or they might add so much noise that no significant result is obtained). Analysis would look for systematic/significant differences between groups.

As a variation, a long-term experiment might ask a subject to listen to a single exposure of at least 10 minutes or so each day, and rate it on a Likert scale, then come back on successive days until at least 30 and preferably over 100 data points are obtained. Still lots of statistical noise here, but that's one of the points of large sample sets.

In the past such tests would have had huge practical barriers, but I'll bet it could be done now on the Internet.
Title: Other Listening Test Methodologies?
Post by: 2Bdecided on 2013-09-20 14:41:01
Not quite what you're saying (because there are still comparisons), but closer...
http://soundexpert.org/home

Please see past HA discussions about soundexpert.


The classic BS 1116 test isn't just ABX, it's ABC...

Quote
In the preferred and most sensitive form of this method, one subject at a time is involved and the selection of one of three stimuli (“A”, “B”, “C”) is at the discretion of this subject. The known reference is always available as stimulus “A”. The hidden reference and the object are simultaneously available but are “randomly” assigned to “B” and “C”, depending on the trial.

The subject is asked to assess the impairments on “B” compared to “A”, and “C” compared to “A”, according to the continuous five-grade impairment scale. One of the stimuli, “B” or “C”, should be indiscernible from stimulus “A”; the other one may reveal impairments. Any perceived differences between the reference and the other stimuli must be interpreted as an impairment.


from...
http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1116-1-199710-I!!PDF-E.pdf


Note those first words "In the preferred and most sensitive form of this method..." - people have tried a lot of ways of doing this, and we end up with ABX or very similar because it's the most sensitive method.


Another sensitive method is the Three-Alternative Forced Choice comparison test (3-AFC). Present A, B and C, where two of those are the original and one is the coded version. Pick the odd one out. If the user picks correctly, move to a higher quality version. If the user picks incorrectly, move to a lower quality version. Great for training and finding thresholds of audibility, but fraught with problems in terms of moving up and down the "quality" scale in a useful way. Easy for simple masking experiments. Harder and less reliable for finding the transparency threshold of a codec, though it's one possible tool.
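
A rough sketch of the up/down rule described above, in Python. Everything specific here is an assumption made for illustration: the simulated listener (`p_correct`, a logistic curve around a made-up `threshold`), the integer quality `level` scale, and the parameter values. It only shows the mechanics of stepping quality after each odd-one-out trial, not a validated procedure.

```python
import math
import random

def p_correct(level, threshold, slope=1.0, chance=1.0 / 3.0):
    """Hypothetical listener: chance of picking the odd one out at a quality level.

    Higher level means higher quality, so the coded version is harder to spot.
    Pure guessing among three stimuli gives 1/3.
    """
    # Logistic detection curve written with tanh so extreme levels cannot overflow.
    detect = 0.5 * (1.0 - math.tanh(slope * (level - threshold) / 2.0))
    return chance + (1.0 - chance) * detect

def run_staircase(threshold, start_level=0, max_level=20, trials=200, seed=0):
    """Simulate the rule: correct pick -> higher quality, wrong pick -> lower quality."""
    rng = random.Random(seed)
    level = start_level
    history = []
    for _ in range(trials):
        correct = rng.random() < p_correct(level, threshold)
        history.append((level, correct))
        if correct:
            level = min(max_level, level + 1)   # harder: move up the quality scale
        else:
            level = max(0, level - 1)           # easier: move down the quality scale
    return history

if __name__ == "__main__":
    run = run_staircase(threshold=10)
    late_levels = [lvl for lvl, _ in run[100:]]
    print("mean level over the last 100 trials:", sum(late_levels) / len(late_levels))
```

The sketch assumes the quality steps form an evenly spaced, one-dimensional scale. That is exactly the part that is hard to guarantee for real codec settings.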

Cheers,
David.
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-20 19:49:45
Great info, thank you. Depending on how confident I am, I have conducted a de facto ABC round from time to time in the foobar ABX interface. Nothing forces the user to check both A and B before deciding, so one way I've proceeded is to check A alone, and then decide which of X or Y is actually A.

I'm also thinking of a case where a control group of large sample n gets *only* a referent file, like a CD-mastered copy, and then rates that on a variety of quality-oriented Likert scales. There *is* an implicit comparison between that and "everything else I listen to besides this test file", but subjects would not A/B. Then other treatment groups get *only* a manipulated file derived from the referent, and respond on the same survey measures. The hope would be to reject the null hypothesis of no difference between the aggregated measures for each group, using a t-test.
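
For illustration, a minimal Python sketch of that between-group comparison. It assumes the Likert responses are coded 1 to 5 and treated as interval data, uses Welch's unequal-variance form of the t statistic, and approximates the p-value with a normal distribution, which is only reasonable for large groups such as the 100-300 per group mentioned earlier. The function name and the sample ratings are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def welch_t(group_a, group_b):
    """Return (t, approximate two-sided p) for two independent groups of ratings.

    The p-value uses a normal approximation to the t distribution, which is
    an assumption that only holds reasonably well for large samples.
    """
    var_a = variance(group_a) / len(group_a)
    var_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(var_a + var_b)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p

if __name__ == "__main__":
    # Made-up ratings: control hears the referent, treatment hears the manipulated file.
    control = [3, 4, 4, 5, 3, 4] * 50
    treatment = [2, 3, 3, 4, 2, 3] * 50
    print(welch_t(control, treatment))
```

A rank-based test may suit ordinal Likert data better. The t-test is used here only because it is the one named above.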
Title: Other Listening Test Methodologies?
Post by: testyou on 2013-09-21 07:19:54
But you are comparing a qualitative judgment between two different groups using two different samples?  Would you think it likely to obtain any meaningful result?

With ABX test: you can reject null hypothesis by correctly identifying x with significance, or fail to reject and imply transparency.  This gives you a strong result.
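
As a sketch of what "with significance" means here: under the null hypothesis the listener is guessing, so each ABX trial is a fair coin flip and the p-value is a one-sided binomial tail. The function name `abx_p_value` is just an illustrative choice.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

if __name__ == "__main__":
    # 12 of 16 correct gives roughly 0.038, below the usual 0.05 cutoff.
    print(abx_p_value(12, 16))
```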
Title: Other Listening Test Methodologies?
Post by: C.R.Helmrich on 2013-09-21 09:30:19
Quote
I'm also thinking of a case where a control group of large sample n gets *only* a referent file, like a CD-mastered copy, and then rates that on a variety of quality-oriented Likert scales. There *is* an implicit comparison between that and "everything else I listen to besides this test file", but subjects would not A/B. Then other treatment groups get *only* a manipulated file derived from the referent, and respond on the same survey measures.

I think what you're describing here is provided by the Absolute Category Rating (ACR) configuration of a P.800 test: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-P.800-199608-I!!PDF-E

By the way, there is also the MUSHRA methodology, which is very similar to BS.1116: http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1534-1-200301-I!!PDF-E.pdf

Chris
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-21 14:46:25
Quote
But you are comparing a qualitative judgment between two different groups using two different samples? Would you think it likely to obtain any meaningful result?

With ABX test: you can reject null hypothesis by correctly identifying x with significance, or fail to reject and imply transparency. This gives you a strong result.

Possibly just two different samples, but I was thinking of classes of samples. One group does only, say, MP3 128, the other does only AAC 128, but given those settings, what I'd do then is encode each subject's own personal listening library. Collect the Likert scales plus demographics and personal listening information (dollars spent, type of equipment, genres, hours/day, etc.). Then with many hundreds of subjects over a period of weeks, collect a lot of data. Then potentially you'd get results like "Punk rockers, the elderly and those who listened primarily in the car showed no variance between the treatment groups, while young listeners playing pop on headphones preferred AAC."
I actually filled out an online survey solicited by SONY with questions about my listening habits and preferences, and it did go to different formats as well, just without any actual listening tests. This information would be of great interest to SONY, but getting the methodology together is difficult.

I'm trying to get away from the unrealistic spot listening encouraged by foobar-type ABXing. I never listen to 30 seconds of music, and I certainly don't play pieces over and over. I listen to one piece, and then move on. My appreciation of the audio quality is cumulative over time, not locked to a variety of spots.

[Caveat: Perhaps this goes to successful ABXing: I ***Do*** listen to the same spot over and over, when I'm practicing a new piece on a live instrument. Getting the septuplet flourish in the Chopin fingered right and phrased beautifully, is precisely the exercise of ABXing: short excerpt, repeated listening, listening as a perfectionist for every detail.]

I listened to YouTube music videos with my children for a good two hours the other evening, and my ears acclimated to it. Then I switched to one of my own Redbook-ripped-to-HD tracks--the soundstage leaped out of the speakers, a dramatic contrast. This could conceivably be expanded as a methodology: "Listen all your usual ways all day. Then stop for ten minutes and listen to the prescribed track in the prescribed way. Then rate the quality on these dimensions. No re-listening."

The ABX result IS a strong result, the best there is, but for a highly artificial listening situation. To put it another way, ABX is a lab experiment, well-controlled and rigorous, but with artificial conditions and poor generalizability. Something closer to a field experiment would also be useful. The data would be noisier for sure, but the potential for results directly applicable to development and product offerings would make it worth it to find out if significant results are possible.
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-21 14:59:59
Quote
I'm also thinking of a case where a control group of large sample n gets *only* a referent file, like a CD-mastered copy, and then rates that on a variety of quality-oriented Likert scales. There *is* an implicit comparison between that and "everything else I listen to besides this test file", but subjects would not A/B. Then other treatment groups get *only* a manipulated file derived from the referent, and respond on the same survey measures.

I think what you're describing here is provided by the Absolute Category Rating (ACR) configuration of a P.800 test: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-P.800-199608-I!!PDF-E

By the way, there is also the MUSHRA methodology, which is very similar to BS.1116: http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1534-1-200301-I!!PDF-E.pdf

Chris

Excellent, that's what I was looking for. The unit of treatment exposure in the telephone tests is the "conversation", which is a highly realistic representation of the normal mode of use for that population. One long stimulus, followed by subject responses on a number of quality dimensions. Good procedure.
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2013-09-23 13:29:38
Quote
Has anyone come across other listening test methodologies they thought were highly revealing, useful, and rigorous? In a few papers I've seen (mostly AES) there are methodological details that vary here and there, but mostly researchers are doing double-blind tests of direct comparisons between two files.


I'm trying to figure out where you are going here. Your statement "mostly researchers are doing double-blind tests of direct comparisons between two files" is extremely general and seems to describe a very large family of procedures that seem hard to criticize. If not double blind, then what, sighted? If not direct comparisons, then what, indirect comparisons?

If you want to see a general treatment of comparison methodologies, look at the classic:

http://www.amazon.com/Sensory-Evaluation-Techniques-Fourth-Edition/dp/0849338395

Sensory Evaluation Techniques, Fourth Edition [Hardcover]
Morten C. Meilgaard (Author), B. Thomas Carr (Author), Gail Vance Civille (Author)

The current favorite for perceptual coding testing seems to be MUSHRA.

http://en.wikipedia.org/wiki/MUSHRA

Quote
There could be a role, for example, in tests of the form usually seen in medicine. Instead of taking one person and exposing them to both stimuli as a single data point, take a lot of persons, put them in different groups, and then collect Likert-scale data on their response to one class of stimuli.


Been there, done that.

It has its moments. One key issue to bear in mind is that the test needs to be tailored and optimized for the question at hand. For example: "Do these two files sound different at all?" is thought by many to be a prerequisite for, and a very different question than: "Which of these files do I prefer?"

Quote
You'd probably want 100-300 persons per group, and of course expose them still double-blind, but only to one format.


Lotsa luck with finding 300 qualified listeners.

Seems to me you need to do more study of subjective testing technology to date. The study of hearing developed blind testing for a decade or more before modern testing methodologies were popularized for audio in the middle 1970s. The Journal of the Acoustical Society of America is a good resource, and just to keep you on your toes they have an ABX test which is substantially different from the one we use in audio, and for good reasons.

Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-23 17:55:04
Quote
If not double blind, then what, sighted?

No, not sighted. Here I was just listing the normal methodological procedures.

Quote
If not direct comparisons, then what, indirect comparisons?

No, as described earlier I'm thinking about tests in which the subjects themselves do not compare side-by-side, they only rate, as is commonly the case in drug tests. The point of comparison occurs during analysis of results, not data collection.

Quote
There could be a role, for example, in tests of the form usually seen in medicine. Instead of taking one person and exposing them to both stimuli as a single data point, take a lot of persons, put them in different groups, and then collect Likert-scale data on their response to one class of stimuli.

Been there, done that.

Excellent, do you have any citations?

Quote
You'd probably want 100-300 persons per group, and of course expose them still double-blind, but only to one format.

Lotsa luck with finding 300 qualified listeners.

Obviously a tough problem, but that's why I mention the possibility of leveraging the Internet. It *would* be expensive, but conceivably a service like Zoomerang could provide me a sample population.

Quote
Seems to me you need to do more study of subjective testing technology to date.

Yes.....that's why I posted the thread....
Title: Other Listening Test Methodologies?
Post by: saratoga on 2013-09-23 18:24:10
Actually, in clinical trials, direct comparisons are used if at all feasible. The reason is that if you don't do direct comparisons, your sample sizes will need to be enormous. So you could get together huge numbers of listeners and spend months and years doing a test, but it's probably easier to just design a better test that doesn't cost millions of dollars to run.
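
A back-of-the-envelope Python sketch of why direct (within-subject) comparisons need fewer listeners. It uses the standard normal-approximation sample size formulas for a two-sided test. The effect size `delta`, rating spread `sigma`, and within-listener correlation `rho` are made-up illustrative values, not measurements from any listening test.

```python
from math import ceil
from statistics import NormalDist

def _z_sum(alpha, power):
    # Critical value for a two-sided test plus the quantile for the desired power.
    nd = NormalDist()
    return nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)

def n_per_group_independent(delta, sigma, alpha=0.05, power=0.8):
    """Listeners needed in EACH group when every listener hears only one format."""
    return ceil(2.0 * (_z_sum(alpha, power) * sigma / delta) ** 2)

def n_paired(delta, sigma, rho, alpha=0.05, power=0.8):
    """Listeners needed when each one rates both formats (direct comparison)."""
    sigma_diff_sq = 2.0 * sigma ** 2 * (1.0 - rho)   # variance of the paired difference
    return ceil(_z_sum(alpha, power) ** 2 * sigma_diff_sq / delta ** 2)

if __name__ == "__main__":
    # Assumed: a 0.2-point mean difference on a scale whose ratings have SD 1.0.
    print(n_per_group_independent(delta=0.2, sigma=1.0))   # per group
    print(n_paired(delta=0.2, sigma=1.0, rho=0.8))         # total listeners
```

Under these assumed numbers the independent-groups design needs 393 listeners per group and the paired design needs 79 in total. The gap grows as a listener's two ratings become more strongly correlated.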
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-23 19:28:41
Quote
Actually, in clinical trials, direct comparisons are used if at all feasible. The reason is that if you don't do direct comparisons, your sample sizes will need to be enormous. So you could get together huge numbers of listeners and spend months and years doing a test, but it's probably easier to just design a better test that doesn't cost millions of dollars to run.

Yes, the cost/benefit ratio is probably never going to work out. It's the artificiality of the ABX that works against generalizability for the research questions in audio, though. The procedure is rigorous, the data is useful, absolutely. It's just a big departure from how people actually listen.
Title: Other Listening Test Methodologies?
Post by: pdq on 2013-09-23 19:45:34
Quote
It's just a big departure from how people actually listen.

I'm not sure what you mean by this. True, to get the most sensitivity to small differences one listens to short segments while rapidly switching between them, but there is absolutely no reason that one could not do ABX testing by listening to the entire piece from beginning to end each time. It's all a matter of what you want to accomplish and how much time you are willing to put into it.
Title: Other Listening Test Methodologies?
Post by: saratoga on 2013-09-23 19:48:14
Quote
It's just a big departure from how people actually listen.


How so?
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-23 20:04:49
Quote
It's just a big departure from how people actually listen.

I'm not sure what you mean by this. True, to get the most sensitivity to small differences one listens to short segments while rapidly switching between them, but there is absolutely no reason that one could not do ABX testing by listening to the entire piece from beginning to end each time. It's all a matter of what you want to accomplish and how much time you are willing to put into it.

The main thing is repetition. If the segments are short, that's also unnatural in terms of the length of a listening segment. True, one could listen to entire pieces ("Let's A/B Tosca! --see you in a month!").
But ears acclimate to the current sound environment. ABX depends on listening memory--unlikely to be effective for listening sessions which resembled "normal" listening. I tend to listen to an album all the way through, so my typical realistic session would be in the neighborhood of 30-60 minutes. I could provide Likert responses with some confidence, but compared to an album I heard an hour ago? It doesn't seem feasible. Not because it takes too long, but because aural memory will not function effectively across such long spans. I could be wrong, I'd be interested in published data if anyone has done it.
Title: Other Listening Test Methodologies?
Post by: saratoga on 2013-09-23 20:10:14
Quote
It's just a big departure from how people actually listen.

I'm not sure what you mean by this. True, to get the most sensitivity to small differences one listens to short segments while rapidly switching between them, but there is absolutely no reason that one could not do ABX testing by listening to the entire piece from beginning to end each time. It's all a matter of what you want to accomplish and how much time you are willing to put into it.

The main thing is repetition.

You don't have to actually repeat the test. You could do all sorts of methodologies where one does an ABX comparison a single time and then uses multiple samples. I suspect you'll find that it's just a more complex way to arrive at the same answer though.

Quote
I tend to listen to an album all the way through, so my typical realistic session would be in the neighborhood of 30-60 minutes. I could provide Likert responses with some confidence, but compared to an album I heard an hour ago? It doesn't seem feasible. Not because it takes too long, but because aural memory will not function effectively across such long spans. I could be wrong, I'd be interested in published data if anyone has done it.

If all you care about is how things sound compared to your long term memory, then accuracy is probably not too important. Even relatively large differences will not be apparent over such time periods. Or to put this another way, differences that matter over such long time periods are generally so obvious when A/B'ed that ABX is unnecessary.
Title: Other Listening Test Methodologies?
Post by: pdq on 2013-09-23 20:15:22
One of the excuses that is given when someone is unable to back up a claim of audibility using ABX testing, is that the usual protocol of switching back and forth rapidly between two versions makes it more difficult rather than less difficult to tell the difference, because it is so different than how one usually listens to music. They talk of things like "fatigue factor".

The counter argument is that if they are able to hear the difference only when listening to much longer segments separated by much longer times, there is no reason ABX cannot be performed in that way. Of course, those people never take up this challenge, or if they do then they are unwilling to report the results. 
Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-23 22:18:30
Quote
One of the excuses that is given when someone is unable to back up a claim of audibility using ABX testing, is that the usual protocol of switching back and forth rapidly between two versions makes it more difficult rather than less difficult to tell the difference, because it is so different than how one usually listens to music. They talk of things like "fatigue factor".

The counter argument is that if they are able to hear the difference only when listening to much longer segments separated by much longer times, there is no reason ABX cannot be performed in that way. Of course, those people never take up this challenge, or if they do then they are unwilling to report the results. 

One of the benefits of Likert scale data is that the researcher (if they obtained significant results) would have more than just "could they tell the difference" to use.
"How *much* better is A than B?", for example requires at least ordinal data. Detection is just the first step. Some of the protocols cited above get into this area, which strikes me as useful.
Title: Other Listening Test Methodologies?
Post by: MichaelW on 2013-09-24 00:32:22
Quote
One of the benefits of Likert scale data is that the researcher (if they obtained significant results) would have more than just "could they tell the difference" to use.
"How *much* better is A than B?", for example requires at least ordinal data. Detection is just the first step. Some of the protocols cited above get into this area, which strikes me as useful.


Perhaps one of the reasons ABX and MUSHRA rule is that more elaborate data would be useful in areas where preferences for different kinds of imperfections matter. With digital encoding, it is pretty trivial to get (audibly) perfect representation of the source, using lossless if necessary at only a minor cost in file size. So it is easy to get to a point where preferences could not be relevant. Same with electronics, as I understand.

So the large scale clinical trials would only be of interest, I think, to makers of loudspeakers and devisers of multichannel systems.
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2013-09-24 12:44:04
Quote
If not double blind, then what, sighted?

No, not sighted. Here I was just listing the normal methodological procedures.

If not direct comparisons, then what, indirect comparisons?

No, as described earlier I'm thinking about tests in which the subjects themselves do not compare side-by-side, they only rate, as is commonly the case in drug tests. The point of comparison occurs during analysis of results, not data collection.

There could be a role, for example, in tests of the form usually seen in medicine. Instead of taking one person and exposing them to both stimuli as a single data point, take a lot of persons, put them in different groups, and then collect Likert-scale data on their response to one class of stimuli.

Been there, done that.

Excellent, do you have any citations?


Bailar, John C. III, Mosteller, Frederick, "Guidelines for Statistical Reporting in Articles for Medical Journals", Annals of Internal Medicine, 108:266-273, (1988).
Bücklein, R., "The Audibility of Frequency Response Irregularities" (1962), reprinted in English in Journal of the Audio Engineering Society, Vol. 29, pp. 126-131 (1981)
Burstein, Herman, "Approximation Formulas for Error Risk and Sample Size in ABX Testing", Journal of the Audio Engineering Society, Vol. 36, p. 879 (1988)
Burstein, Herman, "Transformed Binomial Confidence Limits for Listening Tests", Journal of the Audio Engineering Society, Vol. 37, p. 363 (1989)
Carlstrom, David, Greenhill, Laurence, Krueger, Arnold, "Some Amplifiers Do Sound Different", The Audio Amateur, 3/82, p. 30, 31, also reprinted in Hi-Fi News & Record Review, Link House Magazines, United Kingdom, Dec 1982, p. 37.
CBC Enterprises, "Science and Deception, Parts I-IV", Ideas, October 17, 1982, CBC Transcripts, P. O. Box 500, Station A, Toronto, Ontario, Canada M5W 1E6
Clark, D. L., Krueger, A. B., Muller, B. F., Carlstrom, D., "Lipshitz/Jung Forum", Audio Amateur, Vol. 10 No. 4, pp. 56-57 (Oct 1979)
Clark, D. L., "Is It Live Or Is It Digital? A Listening Workshop", Journal of the Audio Engineering Society, Vol.33 No.9, pp.740-1 (September 1985)
Clark, David L., "A/B/Xing DCC", Audio, APR 01 1992 v 76 n 4, p. 32
Clark, David L., "High-Resolution Subjective Testing Using a Double-Blind Comparator", Journal of the Audio Engineering Society, Vol. 30 No. 5, May 1982, pp. 330-338.
Diamond, George A., Forrester, James S., "Clinical Trials and Statistical Verdicts: Probable Grounds for Appeal", Annals of Internal Medicine, 98:385-394, (1983).
Downs, Hugh, "The High-Fidelity Trap", Modern HI-FI & Stereo Guide, Vol. 2 No. 5, pp. 66-67, Maco Publishing Co., New York (December 1972)
Frick, Robert, "Accepting the Null Hypothesis", Memory and Cognition, Journal of the Psychonomic Society, Inc., 23(1), 132-138, (1995).
Fryer, P.A., "Loudspeaker Distortions: Can We Hear Them?", Hi-Fi News and Record Review, Vol. 22, pp 51-56 (1977 June)
Gabrielsson and Sjögren, "Perceived Sound Quality of Sound Reproducing Systems", Journal of the Acoustical Society of America, Vol 65, pp 1019-1033 (1979 April)
Gabrielsson, "Dimension Analyses of Perceived Sound Quality of Sound Reproducing Systems", Scand. J. Psychology, Vol. 20, pp. 159-169 (1979)
Greenhill, Laurence, "Speaker Cables: Can you Hear the Difference?", Stereo Review, (Aug 1983)
Greenhill, L. L. and Clark, D. L., "Equipment Profile", Audio, (April 1985)
Grusec, Ted, Thibault, Louis, Beaton, Richard, "Sensitive Methodologies for the Subjective Evaluation of High Quality Audio Coding Systems", Presented at Audio Engineering Society UK DSP Conference 14-15 September 1992, available from Government of Canada Communications Research Center, 3701 Carling Ave., Ottawa, Ontario, Canada K1Y 3Y7.
Hirsch, Julian, "Audio 101: Physical Laws and Subjective Responses", Stereo Review, April 1996
Hudspeth, A. J., and Markin, Vladislav S., "The Ear's Gears: Mechanoelectrical Transduction By Hair Cells", Physics Today, 47:22-8, Feb 1994.
ITU-R BS.1116, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems", Geneva, Switzerland (1994).
Lipshitz, Stanley P., and Vanderkooy, John, "The Great Debate: Subjective Evaluation", Journal of the Audio Engineering Society, Vol. 29 No. 7/8, Jul/Aug 1981, pp. 482-491.
Masters, I. G. and Clark, D. L., "Do All Amplifiers Sound the Same?", Stereo Review, pp. 78-84 (January 1987)
Masters, Ian G. and Clark, D. L., "Do All CD Players Sound the Same?", Stereo Review, pp.50-57 (January 1986)
Masters, Ian G. and Clark, D. L., "The Audibility of Distortion", Stereo Review, pp.72-78 (January 1989)
Meyer, E. Brad, "The Amp-Speaker Interface (Tube vs. solid-state)", Stereo Review, pp.53-56 (June 1991)
Nousaine, Thomas, "Wired Wisdom: The Great Chicago Cable Caper", Sound and Vision, Vol. 11 No. 3 (1995)
Nousaine, Thomas, "Flying Blind: The Case Against Long Term Testing", Audio, pp. 26-30, Vol. 81 No. 3 (March 1997)
Nousaine, Thomas, "Can You Trust Your Ears?", Stereo Review, pp. 53-55, Vol. 62 No. 8 (August 1997)
Olive, Sean E., et al, "The Perception of Resonances at Low Frequencies", Journal of the Audio Engineering Society, Vol. 40, p. 1038 (Dec 1992)
Olive, Sean E., Schuck, Peter L., Ryan, James G., Sally, Sharon L., Bonneville, Marc E., "The Detection Thresholds of Resonances at Low Frequencies", Journal of the Audio Engineering Society, Vol. 45, p. 116-128, (March 1997)
Pease, Bob, "What's All This Splicing Stuff, Anyhow?", Electronic Design, (December 27, 1990) Recent Columns, http://www.national.com/rap/
Pohlmann, Ken C., "6 Top CD Players: Can You Hear the Difference?", Stereo Review, pp.76-84 (December 1988)
Pohlmann, Ken C., "The New CD Players, Can You Hear the Difference?", Stereo Review, pp.60-67 (October 1990)
Schatzoff, Martin, "Design of Experiments in Computer Performance Evaluation", IBM Journal of Research and Development, Vol. 25 No. 6, November 1981
Shanefield, Daniel, "The Great Ego Crunchers: Equalized, Double-Blind Tests", High Fidelity, March 1980, pp. 57-61
Simon, Richard, "Confidence Intervals for Reporting Results of Clinical Trials", Annals of Internal Medicine, 105:429-435, (1986).
Spiegel, D., "A Defense of Switchbox Testing", Boston Audio Society Speaker, Vol. 7 no. 9 (June 1979)
Stallings, William M., "Mind Your p's and Alphas", Educational Researcher, November 1995, pp. 19-20
Toole, Floyd E., "Listening Tests - Turning Opinion Into Fact", Journal of the Audio Engineering Society, Vol. 30, No. 6, June 1982, pp. 431-445.
Toole, Floyd E., "The Subjective Measurements of Loudspeaker Sound Quality & Listener Performance", Journal of the Audio Engineering Society, Vol. 33, pp. 2-32 (1985 Jan/Feb)
Toole, Floyd E., and Olive, Sean E., "The Detection of Reflections in Typical Rooms", Journal of the Audio Engineering Society, Vol. 39, pp. 539-553 (1989 July/Aug)
Toole, Floyd E., and Olive, Sean E., "Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Tests, and Other Interesting Things", 97th AES Convention (San Francisco, Nov. 10-13, 1994), [3893 (H-5)], 20 pages.
Toole, Floyd E., and Olive, Sean E., "The Modification of Timbre By Resonances: Perception & Measurement", Journal of the Audio Engineering Society, Vol 36, pp. 122-142 (1988 March).
Warren, Richard M., "Auditory Illusions and their Relation to Mechanisms Enhancing Accuracy of Perception", Journal of the Audio Engineering Society, Vol. 31 No. 9 (1983 September).

Acoustical Society of America, Hearing: Its Psychology and Physiology, American Institute of Physics
Andersen, Hans Christian, "The Emperor's New Clothes" Andersen's Fairy Tales, with biographical sketch of Hans Christian Andersen by Thomas W. Handford. Illustrated by True Williams and others., Chicago, Belford, Clarke (1889)
Armitage, Statistical Methods in Medicine, Wiley (1971)
Burlington, R., and May, D. Jr., Handbook of Probability and Statistics with Tables, Second Edition, McGraw Hill NY (1970)
Fisher, Ronald Aylmer, Sir, Statistical Methods and Scientific Inference, 3d ed., rev. and enl., New York Hafner Press (1973)
Frazier, Kendrick, ed., Paranormal Borderlands of Science, Prometheus Books (1981)
Grinnell, Frederick, The Scientific Attitude, Boulder, Westview Press (1987)
Hanushek, E., and Jackson, J., Statistical Methods for Social Scientists, Academic Press NY (1977)
Kockelmans, Joseph J., Phenomenology and Physical Science - An Introduction to the Philosophy of Physical Science, Duquesne Press, Pittsburgh PA (1966)
Lakatos, Imre, The Methodology of Scientific Research Programmes, Vol. 1 , Cambridge University Press (1978)
McBurney, Donald H., Collings, Virginia B., Introduction to Sensation/Perception, Prentice Hall, Inc., Englewood Cliffs, NJ 07632 (1977)
Moore, Brian C. J., An Introduction to the Psychology of Hearing, 3rd Edition , Academic Press, London ; New York (1989)
Mosteller and Tukey, "Quantitative Methods", chapter in Handbook of Social Psychology, Lindzey G., and Aronson, Eds., Addison-Wesley (1964)
Neave, H. R., Statistical Tables, Allen & Unwin, London (1978)
Norman, Geoffrey, R., PDQ Statistics, B. C. Decker Toronto, C. V. Mosby St. Louis, (1986)
Rock, Irwin, An Introduction to Perception, Macmillan Publishing Company, New York NY (1975)
Scharf, Bertram, and Reynolds, George S., Experimental Sensory Psychology, Scott, Foresman and Company, Glenview IL (1975)

Title: Other Listening Test Methodologies?
Post by: UltimateMusicSnob on 2013-09-24 21:12:25
Quote
There could be a role, for example, in tests of the form usually seen in medicine. Instead of taking one person and exposing them to both stimuli as a single data point, take a lot of persons, put them in different groups, and then collect Likert-scale data on their response to one class of stimuli.


Been there, done that.
Excellent, do you have any citations?

Bailar, John C. III, Mosteller, Frederick, "Guidelines for Statistical Reporting in Articles for Medical Journals", Annals of Internal Medicine, 108:266-273, (1988).
Buchlein, R., "The Audibility of Frequency Response Irregularities" (1962), reprinted in English in Journal of the Audio Engineering Society, Vol. 29, pp. 126-131 (1981)
Burstein, Herman, "Approximation Formulas for Error Risk and Sample Size in ABX Testing", Journal of the Audio Engineering Society, Vol. 36, p. 879 (1988)
Burstein, Herman, "Transformed Binomial Confidence Limits for Listening Tests", Journal of the Audio Engineering Society, Vol. 37, p. 363 (1989)

etc
etc
etc

Actually, this report (http://home.provide.net/~djcarlst/abx_peri.htm) would have been just fine, but thanks for the full list anyway!
Title: Other Listening Test Methodologies?
Post by: esldude on 2013-11-17 01:44:48
One that makes sense to me is a variation used in the food industry.  Two alternative forced choice. Also generally you have a reliably perceived difference if the testee scores 75% correct choices regardless of the number of trials. 

You present two choices.  A and B.  The testee must choose one.  The parameter in food industry is something like choose the sweeter of the pair.  In audio you could ask a person to choose the sample with most bass, or simply the version you prefer, or that sounds most real. 

I think it nicer than ABX as it is more how people listen for differences when not doing blind tests.  They listen to a couple things and pick the one they prefer.  Also you are not straining to hear if something is different or if it matches some references.  You know for certain each of the two tracks presented are in fact different.  You just pick the one you prefer or the one with whatever quality is being tested for.  So you hear two versions, know they are different, pick a preference.  Of course which version is presented first varies randomly.  If you prefer the same version 75% or more of the time, then it is audible.
Title: Other Listening Test Methodologies?
Post by: saratoga on 2013-11-17 02:00:18
You present two choices.  A and B.  The testee must choose one.  The parameter in food industry is something like choose the sweeter of the pair.  In audio you could ask a person to choose the sample with most bass, or simply the version you prefer, or that sounds most real. 

I think it nicer than ABX


The two are not really comparable because they test two different things.  ABX is a test of transparency, hence you must have the reference.  What you're describing above is a test of preference, hence no reference is necessary.

Basically, you're asking two different questions, which will not in general necessarily give you the same answer.  Neither is nicer, it just depends what you want to know.

Title: Other Listening Test Methodologies?
Post by: greynol on 2013-11-17 02:24:18
If you prefer the same version 75% or more of the time, then it is audible.

If I guess right 75% of the time on a set of 8 coin flips, does that make me clairvoyant?

I did it yesterday, so I guess I am!  Thanks for confirming my suspicion.

Seriously though, were you not satisfied with the answers you got when you raised this point nearly ten years ago?
http://www.hydrogenaudio.org/forums/index....st&p=163040 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=16298&view=findpost&p=163040)

FWIW, I've done these types of tests and think they are much more fun than mushra, but they were given in order to establish a preference between two things without the presence of a reference.
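
To put a number on the coin-flip objection, here is a minimal sketch that assumes every trial is an independent 50/50 guess. The function name is made up for illustration; it simply sums the binomial tail.

```python
from math import comb

def chance_of_at_least(correct, trials, p_guess=0.5):
    """Probability of getting `correct` or more right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# 6 of 8 is exactly 75% correct.
print(chance_of_at_least(6, 8))  # 37/256, about 0.145
```

Under that assumption, roughly one pure guesser in seven reaches 75% on 8 trials, so the 75% figure only means something once the number of trials is stated.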
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2013-11-18 03:28:17
But ears acclimate to the current sound environment.


So let people acclimate to the current sound environment before you start the test!

Quote
ABX depends on listening memory


Good case of demonizing ABX for a property of all comparative listening evaluations.

Here's your challenge - how do you do comparative listening without depending on listening memory?

Quote
-unlikely to be effective for listening sessions which resembled "normal" listening.


Here's a news flash - normal listening is horrifically unreliable once you figure out how to determine how reliable it actually is.

We live in a world where self-deceit is very common. People do sighted listening evaluations and think they hear all sorts of things. But there is no way to know how reliable a sighted evaluation really was, because sighted listening involves senses other than hearing, and their influence easily substitutes for what was actually heard.

Quote
I tend to listen to an album all the way through, so my typical realistic session would be in the neighborhood of 30-60 minutes. I could provide Likert responses with some confidence,


You may have confidence but silly boy that I am, I notice that you know which alternative you are listening to by other means than listening, and I rightfully cry foul!


The history of ABX is that first we started doing blind tests, and we quickly encountered the problems with memory for small differences. We then devised ABX to maximize the sensitivity of our blind testing. The reason why we never encountered these problems before is that our listening tests weren't just listening tests, they were also tests that involved knowing what we listened to by other means than listening. Guess what, tests are harder if the right answers aren't posted on the blackboard during the test!

Quote
but compared to an album I heard an hour ago? It doesn't seem feasible. Not because it takes too long, but because aural memory will not function effectively across such long spans.


There you go! The most sensitive form of aural memory is all over with in about 2 seconds. There is actually a cascade of different kinds of aural memory, but they last for different amounts of time. Generally they become less sensitive to small differences the longer the amount of time involved.

Quote
I could be wrong, I'd be interested in published data if anyone has done it.


The best book I've found that describes how we remember sounds is "This Is Your Brain on Music" by Levitin. It is full of citations of proper scientific research. It is readily available for about $15.
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2013-11-18 03:31:10
One that makes sense to me is a variation used in the food industry.  Two alternative forced choice. Also generally you have a reliably perceived difference if the testee scores 75% correct choices regardless of the number of trials. 

You present two choices.  A and B.  The testee must choose one.  The parameter in food industry is something like choose the sweeter of the pair.  In audio you could ask a person to choose the sample with most bass, or simply the version you prefer, or that sounds most real. 

I think it nicer than ABX as it is more how people listen for differences when not doing blind tests.  They listen to a couple things and pick the one they prefer.  Also you are not straining to hear if something is different or if it matches some references.  You know for certain each of the two tracks presented are in fact different.  You just pick the one you prefer or the one with whatever quality is being tested for.  So you hear two versions, know they are different, pick a preference.  Of course which version is presented first varies randomly.  If you prefer the same version 75% or more of the time, then it is audible.


Proving once again that tests are a lot more fun if they aren't real tests. One of the characteristics of a real test is that they must provide a means for people to fail the test. Sorry about that!

You seem to be sort of dancing around tests that are more like Mushra or ABC/hr.  They aren't preference tests but they are more like preference tests than ABX.

It's really about the right tool for the job. ABX seems to still be king if you want to know if there is an audible difference, but it is horrible for preference testing.

If you want to look at proper blind preference testing, please see http://www.sensorysociety.org/ssp/wiki/Category:Methodology/ (http://www.sensorysociety.org/ssp/wiki/Category:Methodology/)  The Triangle test is pretty close to ABX.
Title: Other Listening Test Methodologies?
Post by: greynol on 2013-11-18 04:18:51
The problem with mushra is that the results are practically useless for any given individual, regardless of whether the individual was an actual testee or not.  It really only shines for large test groups.  ABX has the same problem of not being universally applicable (perhaps even more so), but mushra is a one-shot test.  If you happen to guess the reference and the anchors are easy, the ranking for subjects that fall in-between could just very well be noise.  A second test of the same subjects by the same testee could be completely different.  A solution could be to conduct multiple trials of the same subjects and toss them if the testee cannot give reasonably consistent results.  Unfortunately, mushra tests already take a lot of time to complete as they are.  Besides, those who conduct them are interested in general performance anyway.

It boils down to choosing the right tool for the job.  In the case of demonstrating an ability to detect differences, the consensus of reliable experts seems to point at ABX.
Title: Other Listening Test Methodologies?
Post by: knutinh on 2013-11-19 12:09:01
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.

Say that I have this mp3 codec that can code at rates of 16kbps to 320kbps. How do I establish (for a given song/set of songs) a realistic estimate of the bitrate that is barely transparent for me, using as little listening time as possible? Intuitively, it must be some kind of divide and conquer, where I divide the range into two, check for audible errors, then repeat on the upper or lower half again and again until I have narrowed the range to my satisfaction.
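
The divide-and-conquer idea can be sketched as a bisection over a sorted list of bitrates. This is only a sketch under a strong assumption: audibility is monotonic, so if one rate is transparent then every higher rate is too. `sounds_different` is a hypothetical callback standing in for a whole blind session (for example an ABX run) at one bitrate, and the list of rates is made up.

```python
def lowest_transparent(bitrates, sounds_different):
    """Bisect a sorted bitrate list for the lowest rate the listener cannot
    tell from the original.

    `sounds_different(rate)` is a hypothetical callback: run a blind test at
    that rate and return True if the listener reliably heard a difference.
    """
    lo, hi = 0, len(bitrates) - 1
    if sounds_different(bitrates[hi]):
        return None  # even the highest rate is audibly degraded
    while lo < hi:
        mid = (lo + hi) // 2
        if sounds_different(bitrates[mid]):
            lo = mid + 1  # threshold lies above mid
        else:
            hi = mid      # mid is transparent; look lower
    return bitrates[lo]

rates = [16, 32, 48, 64, 96, 128, 160, 192, 256, 320]
# Stand-in for a real listener: pretend anything below 128 kbps is audible.
print(lowest_transparent(rates, lambda r: r < 128))  # 128
```

With ten candidate rates this needs five listening sessions instead of ten. The catch is that a real listener is noisy, so a single wrong session sends the search into the wrong half with no way back.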

In other cases, I might have some room correction algorithm that allows me to tweak the 1-dimensional "amount of correction". How do you balance the evils?

-k
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2013-11-20 14:22:54
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.


There is the rub - setting up a mechanism for continuously varying a sound-altering mechanism.

This is doable for simple things like hearing thresholds - the widely used up/down test depends on having such a mechanism.

Dolby made such a mechanism for jitter, and IME that's not too hard to do.  Did anybody say "Deep pockets"?  ;-)

But the arbitrary case taxes the mind.

It is far easier to set up a list of target percentages of sound alteration and make files that fit the list.
Title: Other Listening Test Methodologies?
Post by: knutinh on 2013-11-20 18:29:45
I want a testing method that allows me to "scan" through a continuously variable sound-altering mechanism to (economically) search for thresholds of Just Noticeable Difference or Unbearable Level of Difference.


There is the rub - setting up a mechanism for continuously varying a sound-altering mechanism.

This is doable for simple things like hearing thresholds - the widely used up/down test depends on having such a mechanism.

Dolby made such a mechanism for jitter, and IME that's not too hard to do.  Did anybody say "Deep pockets"?  ;-)

But the arbitrary case taxes the mind.

It is far easier to set up a list of target percentages of sound alteration and make files that fit the list.

In my mind, a sensibly chosen range of 100 is practically the same as continuous. I.e. if you can create the degradation in MATLAB or something else offline into a sorted set of 100 files, then any testing methodology that gives me good estimates of JND etc. on those is interesting. Disk space is cheap. CPU cycles are cheap. My life-span (and attention-span) is finite...

I am more concerned with how you "train" properly (start out with the most degraded sample?), how you avoid listening fatigue (minimize the listening time?), and how you cope with the inherent variability of listeners ("backtrack" to some degree?)
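
One way those three concerns are often handled together is an adaptive up/down staircase over the sorted files. Below is a minimal sketch of a 2-down/1-up rule, which is commonly described as converging near the 70.7% correct point. It starts at the most degraded file (which doubles as training), shrinks its step at every reversal, and backtracks after any miss. `detects` is a hypothetical callback for one blind trial; the step sizes and stopping counts are arbitrary choices for illustration.

```python
def staircase(detects, n_levels=100, step=8, max_reversals=8, max_trials=200):
    """2-down/1-up staircase over file indices 0 (least degraded) to
    n_levels - 1 (most degraded).

    `detects(level)` is a hypothetical callback: run one blind trial at that
    level and return True if the listener answered correctly.
    Returns the mean level at the reversals, or None if none occurred.
    """
    level = n_levels - 1          # start with the most degraded file
    streak, direction, reversals = 0, 0, []
    for _ in range(max_trials):   # hard cap guards against listening fatigue
        if len(reversals) >= max_reversals:
            break
        if detects(level):
            streak += 1
            if streak < 2:
                continue          # need two in a row before making it harder
            streak, move = 0, -1
        else:
            streak, move = 0, +1  # a single miss backtracks to an easier file
        if direction and move != direction:
            reversals.append(level)
            step = max(1, step // 2)  # shrink the step at each reversal
        direction = move
        level = min(n_levels - 1, max(0, level + move * step))
    return sum(reversals) / len(reversals) if reversals else None

# Stand-in listener who hears the degradation from file 40 upwards.
estimate = staircase(lambda level: level >= 40)
print(estimate)
```

With the idealized listener above the estimate settles just below file 40 in about 30 trials. A real listener answers inconsistently near threshold, which is exactly what the reversal averaging is meant to absorb.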
Title: Other Listening Test Methodologies?
Post by: esldude on 2013-11-27 06:10:23
If you prefer the same version 75% or more of the time, then it is audible.

If I guess right 75% of the time on a set of 8 coin flips, does that make me clairvoyant?

I did it yesterday, so I guess I am!  Thanks for confirming my suspicion.

Seriously though, were you not satisfied with the answers you got when you raised this point nearly ten years ago?
http://www.hydrogenaudio.org/forums/index....st&p=163040 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=16298&view=findpost&p=163040)

FWIW, I've done these types of tests and think they are much more fun than mushra, but they were given in order to establish a preference between two things without the presence of a reference.


No it doesn't.  You used too few trials, and 2afc is not ABX.  The stats involved are different.  Coin flipping isn't a model of it. Quite simply, you misunderstood. And they are best for preference testing.  I made no bones about that. Preference testing has its place.  Testing for a difference is still done.  If your results are below the 75% level then you aren't perceiving anything upon which a preference could be based.
Title: Other Listening Test Methodologies?
Post by: esldude on 2013-11-27 06:13:57
One that makes sense to me is a variation used in the food industry.  Two alternative forced choice. Also generally you have a reliably perceived difference if the testee scores 75% correct choices regardless of the number of trials. 

You present two choices.  A and B.  The testee must choose one.  The parameter in food industry is something like choose the sweeter of the pair.  In audio you could ask a person to choose the sample with most bass, or simply the version you prefer, or that sounds most real. 

I think it nicer than ABX as it is more how people listen for differences when not doing blind tests.  They listen to a couple things and pick the one they prefer.  Also you are not straining to hear if something is different or if it matches some references.  You know for certain each of the two tracks presented are in fact different.  You just pick the one you prefer or the one with whatever quality is being tested for.  So you hear two versions, know they are different, pick a preference.  Of course which version is presented first varies randomly.  If you prefer the same version 75% or more of the time, then it is audible.


Proving once again that tests are a lot more fun if they aren't real tests. One of the characteristics of a real test is that they must provide a means for people to fail the test. Sorry about that!

You seem to be sort of dancing around tests that are more like Mushra or ABC/hr.  They aren't preference tests but they are more like preference tests than ABX.

It's really about the right tool for the job. ABX seems to still be king if you want to know if there is an audible difference, but it is horrible for preference testing.

If you want to look at proper blind preference testing, please see http://www.sensorysociety.org/ssp/wiki/Category:Methodology/ (http://www.sensorysociety.org/ssp/wiki/Category:Methodology/)  The Triangle test is pretty close to ABX.


So what gave you the idea this was a test you couldn't fail?  Fail to meet the 75% level, and whatever preference you are testing for isn't perceivable as different.  Come on Arnie read this more carefully.  It is used successfully in the food and other industries. 
Title: Other Listening Test Methodologies?
Post by: greynol on 2013-11-27 14:59:16
You used too few trials, and 2afc is not ABX.  The stats involved are different.  Coin flipping isn't a model of it.

1. You never specified a minimum number of trials.

2. Why couldn't I use a coin to determine what my answer would be when taking the test?  Perhaps you can provide the statistical model that illustrates how simple common sense isn't applicable here. If there are two choices then a coin flip is a perfectly valid comparison to illustrate how well the test is resistant to guessing.

3. Tell me why one couldn't simply impose limitations on ABX in order to perform 2afc. I contend that any differences prohibiting ABX from being used to perform 2afc will lie in the nature of the test subjects (tasting vs. hearing) and those differences would preclude 2afc from being used in listening tests.  The point is that you can't distill the essence of a sound clip for consumption like you can a sample for tasting without someone crying foul over whether the selection was sensitive enough.

The point to all this is whether a test for preference, where it is fair to assume that a reasonable difference exists between the subjects, can also be used to demonstrate that there is simply a difference between two test subjects when that difference may only cross the threshold of perceptibility (if that).  Unless I am mistaken, ABX is a superset of 2afc and as such is far better equipped to demonstrate differences.
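
On point 1, the minimum number of trials and the pass mark are tied together. As a sketch, assuming a guesser is right half the time on each independent trial and taking the conventional 5% cutoff (an assumption, not something stated in the thread), the smallest passing score for a given number of trials can be computed like this. The helper names are made up.

```python
from math import comb

def tail(correct, trials):
    """Chance of `correct` or more right out of `trials` when guessing 50/50."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

def min_correct(trials, alpha=0.05):
    """Smallest score a guesser reaches with probability <= alpha, else None."""
    for c in range(trials + 1):
        if tail(c, trials) <= alpha:
            return c
    return None

for n in (8, 16, 32):
    c = min_correct(n)
    print(f"{n} trials: need {c} correct ({c / n:.0%})")
```

Under those assumptions 8 trials need 7 correct (87.5%), while 16 trials need 12 correct, which is exactly 75%. So a fixed 75% mark is too lenient for short runs and only becomes a sensible criterion around 16 trials or more.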
Title: Other Listening Test Methodologies?
Post by: drivemusicnow on 2014-02-27 09:39:17
I accept the premise that ABX will show differences with the highest resolution; however, we all understand that, especially with components, it's quite a difficult test to set up. There are several complaints regarding "stress" of the listener and "lack of familiarity", as well as that the test is "too clinical, doesn't capture the soul of the music".  Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference etc), is there, or could there be, a test that is a subjective "blind" test? Basically, asking respondents where they hear differences between several blinded options? For example, everyone's favorite cable debate. (I think this could also be done for amps or music sources or other specific components, but would obviously be more difficult. I think this test methodology does not offer anything regarding speakers.)

We take a range of cables: a "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (to force a different sound), and a duplicate of one of the previous cables.  We "blind" these samples and ask testers to specifically describe (using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3 pt scale to rate such things as "transparency in the highs", and "real-sounding instruments" etc etc. Users are allowed to listen/test however they want.

Assumptions:
Sample size would have to be large
Form would have to include everything that people might want to describe the differences (no 'write ins' allowed)
Form would require testers to fill out details of their listening system (covariates)
Cables would have to be shipped from person to person
Cables must stay blinded, only labelled as 1, 2, 3, 4 etc
Cable properties (resistance/capacitance etc) should be measured and used as covariates

Variances between the duplicate cables are used as your "listening error". Covariates are analyzed to see whether they contribute to the variability in the data.
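
As a rough sketch of the duplicate-cable idea: the average disagreement between the two hidden copies of the same cable gives a noise floor, and differences between genuinely different cables only mean something if they clearly exceed it. The ratings below are made up purely for illustration, and the layout (one row per listener, one column per blinded cable) is an assumption.

```python
from statistics import mean

# Made-up 3-point ratings for illustration only: one row per listener,
# columns are blinded cables 1..4, where cables 1 and 4 are secretly the same.
ratings = [
    (2, 3, 1, 3),
    (1, 2, 1, 2),
    (3, 3, 2, 2),
    (2, 1, 1, 2),
]
DUPLICATE = (0, 3)  # column indices of the hidden duplicate pair

def mean_abs_diff(rows, i, j):
    """Average absolute rating difference between columns i and j."""
    return mean(abs(r[i] - r[j]) for r in rows)

noise = mean_abs_diff(ratings, *DUPLICATE)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    d = mean_abs_diff(ratings, i, j)
    print(f"cable {i + 1} vs {j + 1}: {d:.2f} (duplicate-pair noise floor: {noise:.2f})")
```

This only compares averages. A real analysis would still need a significance test and the covariates, but it shows why the duplicate has to stay hidden: it is the only estimate of how much ratings move when nothing has changed.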

Weakness of this method: you'll never have statistical power for specific covariates, for example a given set of speakers or a specific amplifier, but perhaps we could include topologies, cost, or other measures to see if "systems greater than $50k" produce different results compared to "systems under $10k".
We are relying on people to leave the cables blinded. It might be possible for someone to cheat.
We would need to rely on "trained listeners", but the sample size would have to be high, so the test would need high community acceptance.
Test should be anonymous, however with the amount of information we would need to collect, people will still fear lack of anonymity.
The sample size to avoid false positives has to be very large.
Form itself guides answers, so this would have to be very highly reviewed to be as neutral as possible.

Do people think this test methodology would be valuable?
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2014-02-27 21:09:03
It's just a big departure from how people actually listen.

I'm not sure what you mean by this. True, to get the most sensitivity to small differences one listens to short segments with rapidly switching between them, but there is absolutely no reason that one could not do ABX testing by listening to the entire piece from beginning to end each time.


There is a very good reason to not listen to each alternative that long. The human memory for small audible details falls off a cliff after from 2 to 8 seconds depending on the experiment.

Quote
It's all a matter of what you want to accomplish and how much time you are willing to put into it.


If you feel that way...  I like to succeed when success is at all possible!

Quote
The main thing is repetition. If the segments are short, that's also unnatural in terms of the length of a listening segment.


The only reasonable argument is that repetition is unnatural in that it provides unnaturally sensitive results.  That actually makes sense in some cases because it is more like natural listening.

Quote
True, one could listen to entire pieces ("Let's A/B Tosca! --see you in a month!")


For your trouble, you get stinky results.

Quote
But ears acclimate to the current sound environment. ABX depends on listening memory--unlikely to be effective for listening sessions which resembled "normal" listening.


Nobody has explained to me how you compare 2 sonic alternatives without depending on listening memory. Someone actually published an article in one of the consumer audio magazines suggesting that you listen to one alternative through one speaker and the other to another speaker and stand halfway between them...

Not so much!


Quote
I tend to listen to an album all the way through, so my typical realistic session would be in the neighborhood of 30-60 minutes. I could provide Likert responses with some confidence, but compared to an album I heard an hour ago? It doesn't seem feasible. Not because it takes too long, but because aural memory will not function effectively across such long spans. I could be wrong, I'd be interested in published data if anyone has done it.


"This Is Your Brain on Music" by Levitin.

"Achilles’ Ear? Inferior Human Short-Term and Recognition Memory in the Auditory Modality" by James Bigelow and Amy Poremba (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089914)
- someone pointed this one out to me and I've only read the abstract, but it seems pretty good.
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2014-02-27 21:50:49
I accept the premise that ABX will show differences with the highest resolution,


Yes, if a same/different result is what you want.

Quote
however we all understand that, especially with components, it's a quite difficult test to setup.


The fact that we have a recognized tool for the purpose, FOOBAR2000, makes a big difference.

Quote
There are several complaints regarding "stress" of the listener, and "lack of familiarity" as well as that the test is "too clinical, doesn't capture the soul of the music".


Typically heard from people who have never done an ABX test themselves, or were disappointed when their hobby horse theory was not proven out as overwhelmingly as they desired.

Quote
Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference etc), is there,


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!

Quote
We take a range of cables "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (Force a different sound) and duplicate of whichever of the previous cables.  We "blind" these samples and ask testers to specifically describe (Using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3 pt scale to rate such things as "transparency in the highs", and "real sound instruments" etc etc. Users are allowed to listen/test however they want.



Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: Stick to things that actually sound different.  Things that are measurably so large that the known thresholds of hearing predict a possibly favorable outcome.


And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.
Title: Other Listening Test Methodologies?
Post by: drivemusicnow on 2014-02-27 22:53:55
I accept the premise that ABX will show differences with the highest resolution,


Yes, if a same/different result is what you want.


Personally, I think same/different is 75% of it. But for the industry to move in a positive direction, additional information is also necessary.
Quote
Quote
however we all understand that, especially with components, it's a quite difficult test to setup.


The fact that we have a recognized tool for the purpose, FOOBAR2000, makes a big difference.

Quote
There are several complaints regarding "stress" of the listener, and "lack of familiarity" as well as that the test is "too clinical, doesn't capture the soul of the music".


Typically heard from people who have never done an ABX test themselves, or were disappointed when their hobby horse theory was not proven out as overwhelmingly as they desired.

Quote
Assuming one is looking for valuable data, but trying to accommodate how people listen in their homes (whether they set up their own ABX or whatever in the house to tell the difference etc), is there,


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!

Yes, and seeing the plethora of "digital only" tests, testing only files is relatively easy. The difficulty comes in testing things that are not files, such as amplifiers, speakers, etc. I agree that ABX is a great test for its purpose. I think it does the job fantastically. I'm asking how to collect data for things that aren't that easy to measure. And then how do we know better or worse? (Yes, I know MUSHRA exists, and I honestly think that wouldn't be a bad way either.)
Quote
Quote
We take a range of cables "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (Force a different sound) and duplicate of whichever of the previous cables.  We "blind" these samples and ask testers to specifically describe (Using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3 pt scale to rate such things as "transparency in the highs", and "real sound instruments" etc etc. Users are allowed to listen/test however they want.



Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: Stick to things that actually sound different.  Things that are measurably so large that the known thresholds of hearing predict a possibly favorable outcome.


And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.

I agree with you! and do not want to rehash any part of that debate. My question is would the above proposed test provide valuable data. (Forget that it's cables, I just used them as an example) Obviously this would be extremely difficult with speakers, but not impossible (you couldn't ship them from person to person while keeping them blinded). Unfortunately, in my opinion, speakers are where it's most critical to see what differences if any exist, but not only yes or no.

All of this is stemming from my desire to build or buy a nice set of loudspeakers and my intense desire for them to be the best possible speakers my money can buy. The lack of objective data to know what is and isn't actually different between products is frustrating to me. ABX has existed for a long time, and has not and can not provide the data I'm looking for, so is there another option that is actually feasible? Perhaps one that people are more willing to participate in?
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2014-02-28 03:30:26


(1) Download FOOBAR2000
(2) Download ABX plug-in for FOO (from same site)

(3) Obtain by whatever means files that exemplify the area of investigation

(4) Test!


Yes, and seeing the plethora of "digital only" tests, testing only files is relatively easy. The difficulty comes in testing things that are not files, such as amplifiers, speakers, etc. I agree that ABX is a great test for its purpose. I think it does the job fantastically. I'm asking how to collect data for things that aren't that easy to measure. And then how do we know better or worse? (Yes, I know MUSHRA exists, and I honestly think that wouldn't be a bad way either.)

Quote
Quote
We take a range of cables "known good" expensive cable, a cheap cable, a cheap cable with a capacitor in line (Force a different sound) and duplicate of whichever of the previous cables.  We "blind" these samples and ask testers to specifically describe (Using a created form) where each cable is best. Load the form with all of the lovely audiophile terms that we know and love, giving a 3 pt scale to rate such things as "transparency in the highs", and "real sound instruments" etc etc. Users are allowed to listen/test however they want.


Oh, did I say "hobby horse theory"?

Did someone say something about the clear and audible differences between good cables?

Friendly advice: Stick to things that actually sound different.  Things that are measurably so large that the known thresholds of hearing predict a possibly favorable outcome.

And don't dismiss me as having a closed mind. I've probably done as many if not more cable listening tests than the next 3 people you know or have heard of.


I agree with you! and do not want to rehash any part of that debate. My question is would the above proposed test provide valuable data. (Forget that it's cables, I just used them as an example) Obviously this would be extremely difficult with speakers, but not impossible (you couldn't ship them from person to person while keeping them blinded). Unfortunately, in my opinion, speakers are where it's most critical to see what differences if any exist, but not only yes or no.


The better modern DACs and ADCs are several times more than good enough to transcribe anything audible that happens in the electrical domain (cables, amps, digital players) into a file.  ABX the file. Done.

Quote
All of this is stemming from my desire to build or buy a nice set of loudspeakers and my intense desire for them to be the best possible speakers my money can buy. The lack of objective data to know what is and isn't actually different between products is frustrating to me. ABX has existed for a long time, and has not and can not provide the data I'm looking for, so is there another option that is actually feasible? Perhaps one that people are more willing to participate in?


There are fairly complete technical tests of loudspeakers on the web:

The Stereophile web site:

http://www.stereophile.com/category/floor-...speaker-reviews (http://www.stereophile.com/category/floor-loudspeaker-reviews)

http://www.stereophile.com/category/stand-...speaker-reviews (http://www.stereophile.com/category/stand-loudspeaker-reviews)

Soundstage Canadian National Research Council Labs tests:

http://www.soundstagenetwork.com/index.php...6&Itemid=18 (http://www.soundstagenetwork.com/index.php?option=com_content&view=article&id=16&Itemid=18)

Subwoofer technical reviews:

http://www.data-bass.com/home (http://www.data-bass.com/home)



Title: Other Listening Test Methodologies?
Post by: 2Bdecided on 2014-02-28 09:48:44
I think some critics of ABX, who propose "superior" methods, haven't tried it.

I've run tests (of equipment) where I intended to just listen to full pieces of music and switch slowly, because that's what participants claimed to want - they were sniffy about fast switching ruining their musical enjoyment.

First they tried the full pieces of music, and found that despite familiar music, a fairly relaxed setting, and plenty of time, the previously "obvious" difference disappeared simply because they didn't know what they were listening to. THEN they were begging me to switch sources every second.

Proving an audible difference comes first. Deciding which is "better" comes second.

Cheers,
David.
Title: Other Listening Test Methodologies?
Post by: Arnold B. Krueger on 2014-02-28 13:26:41
I think some critics of ABX, who propose "superior" methods, haven't tried it.


Amen brother!  Almost invariably they will expose themselves. Just saw it happen today over at AVS.

The other category of bogus critics are those who did try it, found that it didn't agree with their beliefs and/or prejudices, and, since they fancy themselves to be infallible, concluded that ABX has to be wrong. A lot of high end "Authorities" (we all know their names) fit into that category.

Audiophiles as a group may include more people with Narcissistic Personality Disorder (DSM-4; a relevant article: http://behavenet.com/node/21653) than the general public does, and this is exactly a behavior pattern of those unfortunates who are so afflicted.

ABX demands an ego-less approach to listening.