Issues with Blind-Testing Headphones and Speakers

Reply #25
In the late '80s/early '90s I was a Denon dealer, and they were one of the top names in CD players back then. For some reason there was a top-of-the-line unit we were throwing in the trash [because it had fallen from a shelf and had a bashed-in corner, making it toast, for example]. Out of curiosity I dismantled the remains and discovered, hidden from view and completely obscured by the main internal circuit board, a very thick slab of steel: a metal plate much heavier than almost all other brands' entire CD players. Heck, it was even heavier than some of their own receivers. It wasn't attached to anything and served no electrical or heat dissipation purpose. I'm confident its sole purpose was to make the product's heft and "build quality" seem greater.

[I think it was the DCD-3300 or DCD-3000, IIRC, but I'm not 100% sure at this point.]

Plus think of how many forum reviews of amps and receivers we've all read where they felt it to be useful to describe the unit's heft, which they note with some pride, as if that should convey to us something about its quality.

I suppose one could be generous and assume that was a large mu-metal shield. 

Reply #26
^Maybe to protect against a nuclear blast EMP fired from directly below the unit only (not from the sides or above).

Reply #27
Never said "strong".

So? You implied how weight may have skewed the results.

Not just weight. Headphone comfort, cushion size, shape [some contenders are circular and others are ovals of differing shapes and sizes], air seal, headband size, foam thickness, width, configuration [for example rigid/soft/cloth strap/contact area and shape], head clamp pressure, room noise attenuation curve, and in some instances noise canceling electronics' background hiss [which Olive acknowledges was a problem which needed to be overcome and implies he was confident he successfully did].

I know I've rejected certain headphones over the years not solely but at least in part due to headband issues, having nothing to do with the sound itself, which I liked. It's perceptible and in some instances can be a tad annoying over long periods of listening. [I no longer have a lot of hair up top, so maybe I'm more sensitive to it now than others, but this was even true when I was younger and had a full head of hair.]

In some headphone measurements the inward clamping pressure or contact pressure isn't even applied by the headband at all, since that will of course vary with the size setting selected and the user's head size. Instead the headband is completely bypassed and the pressure is achieved by a more repeatable, uniform means set to a specific value, often expressed in newtons (N):

There seems to be much less focus these days on how this variable pressure alters the sound, compared to the headphone research before the '90s. Exaggerated contact pressure usually ensures a better seal, true, but it also increases level and bass response, which is why images of people adding additional force by hand are not uncommon in this random collection of people wearing headphones:
Images of people wearing headphones

Reply #28
Here's a 'best guess' at the headphone ranking, from the other thread on HA:

HP1 - LCD-2 (Audeze)
HP2 - K701
HP3 - Bose
HP4 - K550
HP5 - Beats
HP6 - Crossfade (v-moda)

So, how well do the weight and feel of each DUT correlate to this ranking?

Reply #29
Not just weight. Headphone comfort, cushion size, shape [some contenders are circular and others are ovals of differing shapes and sizes], air seal, headband size, foam thickness, width, configuration [for example rigid/soft/cloth strap/contact area and shape], head clamp pressure,

But the LCD is not the most comfortable headphone out of the bunch ... not by far. For me personally, the weight alone would be reason enough not to buy it.

Have you even looked at the research? Because there is a comfort ranking and the LCD is the worst out of the bunch.

@krabapple: Bose > K550 > K701 > Crossfade > Beats > LCD2 (least comfortable).
I don't see a correlation.
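
For anyone who wants to put a number on "I don't see a correlation", here is a minimal sketch that computes a Spearman rank correlation between the two rankings posted in this thread. It assumes the 'best guess' sound ranking from Reply #28 is right, which nobody here has confirmed, and the function name `spearman_rho` is just an illustrative helper, not something from the Harman papers.

```python
# Rankings as posted in this thread (1 = best). The sound ranking is the
# unconfirmed 'best guess' from Reply #28; the comfort ranking is the one
# quoted in Reply #29.
sound_rank = {"LCD-2": 1, "K701": 2, "Bose": 3, "K550": 4, "Beats": 5, "Crossfade": 6}
comfort_rank = {"Bose": 1, "K550": 2, "K701": 3, "Crossfade": 4, "Beats": 5, "LCD-2": 6}


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items, no ties."""
    n = len(rank_a)
    d_squared = sum((rank_a[name] - rank_b[name]) ** 2 for name in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(sound_rank, comfort_rank)
print(f"Spearman rho = {rho:.3f}")
```

With these six items the result comes out close to zero (about -0.09). With only six headphones that cannot settle anything, but it is at least consistent with the comfort ranking not tracking the sound ranking.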


Quote
room noise attenuation curve

If we compare the LCD to the K701 they are not that different, but the K701 is brighter, has a bit of bass roll-off, higher distortion ... and therefore was ranked lower.
The Bose has extreme isolation but was ranked in the upper half, so I don't see a correlation here either.

Quote
and in some instances noise canceling electronics' background hiss [which Olive acknowledges was a problem which needed to be overcome and implies he was confident he successfully did].

Yes, the NC circuitry in the Bose. I was surprised that they even used a NC headphone in the test.
It doesn't seem to be an outlier in terms of sound preference, but you can simply remove that headphone and you will still have 5 passive ones that don't produce noise.

Quote
In some headphone measurements the inward clamping pressure or contact pressure isn't even applied by the headband at all, since that will of course vary with the size setting selected and the user's head size. Instead the headband is completely bypassed and the pressure is achieved by a more repeatable, uniform means set to a specific value, often expressed in newtons (N):

Doesn't matter, since they also did ear-canal measurements across several subjects.

Also, for the different EQ-curves preference test they only used one headphone at a time, so all the differences between headphones are irrelevant. It's all in the papers.


Quote
There seems to be much less focus on how this variable pressure alters the sound these days, compared to the headphone research before the 90's.

It's kinda in their research since they looked into bass response consistency ... and critiqued one of their "own" headphones for it.
In fact, they did not shy away from criticizing AKG headphones at all.


Quote
Exaggerated contact pressure usually ensures a better seal, true, but it also increases level and bass response, which is why images of people adding additional force by hand are not uncommon in this random collection of people wearing headphones

That's a photo thing, because nobody is listening like that afaik. Also, in most photos they don't even apply force.


edit: Maybe an op should move this into the thread linked by mzil a few posts ago.
"I hear it when I see it."

Reply #30
Not just weight. Headphone comfort, cushion size, shape [some contenders are circular and others are ovals of differing shapes and sizes], air seal, headband size, foam thickness, width, configuration [for example rigid/soft/cloth strap/contact area and shape], head clamp pressure,

But the LCD is not the most comfortable headphone out of the bunch ... not by far. For me personally, the weight alone would be reason enough not to buy it.


So you just admitted in this quote that at least some people, and you seem to include yourself, can discern differences between headphones they are wearing based on head comfort/feel, independently of seeing with their eyes which brand they are currently wearing, or listening to the sound it produces. We don't need to ponder as to how, exactly, comfort/feel may influence people positively or negatively towards one headphone or another in a "blind" test, any more than we need to ponder how in a blind taste test of Coke versus Pepsi using a paper cup for one versus using a glass for the other might influence the test results. We must serve them the same way, regardless of the test results falling neatly into our preconceived notions of how they "should", based on some other factor(s) we can measure.

I can't for the life of me understand why you don't think head feel needs to be controlled for, regardless of the test results falling neatly into a pattern which matches "flatness of response", considering Olive himself went to great lengths to eliminate any influence from touch via fingers, not head, by his clever use of additional handles being added to each pair of 'phones so users could fiddle with their positioning and ear seal without sensing the headphone cup shape/size...via their fingers.

I would agree discussing if people can discern differences between headphones' comfort and feel is tangential to the thread's topic of level matching and ideally should be split off.

Reply #31
So, how well do the weight and feel of each DUT correlate to this ranking?

I've sold headphones professionally for over 20 years and can assure you comfort is subjective. Some customers I've dealt with may like one design yet others may dislike the exact same pair.

For instance, I was briefly examining some Audeze in a high end store yesterday and I thought they seemed acceptably comfortable and not too heavy for me, yet xnor just wrote:
Quote
But the LCD is not the most comfortable headphone out of the bunch ... not by far. For me personally, the weight alone would be reason enough not to buy it.

Reply #32
So you just admitted in this quote that at least some people, and you seem to include yourself, can discern differences between headphones they are wearing based on head comfort/feel, independently of seeing with their eyes which brand they are currently wearing, or listening to the sound it produces.

Yes, of course you can discern headphones by comfort alone. But the research was not to detect if people can distinguish an LCD from a Bose ... it wasn't an ABX test (which would be trivial considering the 10 dB frequency response differences) ... but which sound signature they preferred. They still tried to reduce influence from comfort (see below).
I said it before and I will repeat it again: the LCD was ranked as the worst in terms of comfort, but as best (out of the bunch) in terms of sound.


Quote
We don't need to ponder as to how, exactly, comfort/feel may influence people positively or negatively towards one headphone or another in a "blind" test, any more than we need to ponder how in a blind taste test of Coke versus Pepsi using a paper cup for one versus using a glass for the other might influence the test results. We must serve them the same way, regardless of the test results falling neatly into our preconceived notions of how they "should", based on some other factor(s) we can measure.

That's quite problematic.
a) Comfort is part of the headphone experience. It is not like a glass through which you serve some drink, but more like part of the drink.
b) Changing the earpads' shape will alter the sound. Changing the pad material will alter the sound. Changing the clamping force will alter the sound. Even equalizing the weight probably would indirectly alter the sound.

Trying to "serve them the same way" is like transplanting speaker drivers out of their enclosure into a standard test enclosure. That doesn't work for obvious reasons.


Quote
I can't for the life of me understand why you don't think head feel needs to be controlled for, regardless of the test results falling neatly into a pattern which matches "flatness of response", considering Olive himself went to great lengths to eliminate any influence from touch via fingers, not head, by his clever use of additional handles being added to each pair of 'phones so users could fiddle with their positioning and ear seal without sensing the headphone cup shape/size...via their fingers.

Exactly, he did as much as he could without altering the sound of the headphones. For further research, like solely comparing different EQ curves, you can use a single headphone - which is what they did.


I've sold headphones professionally for over 20 years and can assure you comfort is subjective. Some customers I've dealt with may like one design yet others may dislike the exact same pair.

Of course!

I hope that clears it up.
"I hear it when I see it."

Reply #33
Yes, of course you can discern headphones by comfort alone. But the research was not to detect if people can distinguish an LCD from a Bose ... it wasn't an ABX test (which would be trivial considering the 10 dB frequency response differences) ... but which sound signature they preferred. They still tried to reduce influence from comfort (see below).

An ABX test is a subset of another kind of test: a double blind test. The fact that Olive put plastic handles on the headphones proves he agrees with me that the identity of the headphones' shape or size should be obscured as best possible and he admits that in terms of tactile sensation, short of anesthetizing the listeners' skin from the neck up [ha-ha], he couldn't come up with a way to make the test truly blind.

I think he did a much better job than most but whenever we read of non-blind tests we need to always consider that there were possibly expectation biases at play, not fully dictating but rather influencing the decisions, possibly at a subconscious level.

Reply #34
I've sold headphones professionally for over 20 years and can assure you comfort is subjective. Some customers I've dealt with may like one design yet others may dislike the exact same pair.


And that's one of the reasons there are different headphones and in fact different everything! Everyone is different with everything not just headphones.

Reply #35
Here's a 'best guess' at the headphone ranking, from the other thread on HA:

HP1 - LCD-2 (Audeze)
HP2 - K701
HP3 - Bose
HP4 - K550
HP5 - Beats
HP6 - Crossfade (v-moda)

So, how well do the weight and feel of each DUT correlate to this ranking?


You can assign a specific value in terms of weight, say in grams, or, as another example, an "accuracy score" to a frequency response using some de rigueur weighting curve, no problem. This was first done by Consumers Union, publishers of Consumer Reports magazine, in I believe the 1970s, and is still done by major brands like my buddies at Etymotic Research, with only minor modifications. But how on earth do you assign a numeric value to "comfort/feel" when we all seem to be in agreement that we all feel differently about it?

http://www.etymotic.com/technology/hwmra.html

Here's a random observation that caught my eye. Maybe people dig "big circular" earpads over oval shape as their top comfort priority however they also dig cushy foam? Maybe?

Going by the rank you posted above:

HP1- big circular
HP2- big circular
HP3- oval (but super cushy)
HP4- big circular
HP5- oval
HP6- oval

Just a thought.

Plus of course sound quality may have had an important if not overriding role too.

Whenever customers asked me, "I've been reading up on headphones and I've learned all about frequency response, diffuse field equalization, free field equalization to a lateral target source, free field equalization to a 30 degree off axis target, free field equalization to a straight forward target source, square wave reproduction, harmonic distortion, bass extension, treble extension, channel balance, channel balance per frequency, HATS measurements, G.R.A.S cheek and ear measurements, KEMAR measurements...do tell, what is the most important factor?"
My answer was always the same "Comfort, hands down. If a headphone sounds "great" but is uncomfortable to wear, what does it matter if it sounds great?"

Comfort/feel plays a very important role in our assessment of many things, not just headphones, even if we are trained to "ignore that". It's just human nature.

Reply #36
An ABX test is a subset of another kind of test: a double blind test.

So what? It still wasn't an ABX test ...

Quote
The fact that Olive put plastic handles on the headphones proves he agrees with me that the identity of the headphones' shape or size should be obscured as best possible and he admits that in terms of tactile sensation, short of anesthetizing the listeners' skin from the neck up [ha-ha], he couldn't come up with a way to make the test truly blind.

I don't understand what your problem is.

You started this with how you have problems with Harman studies, but after a while it was obvious you didn't even read the research first, and now you try to turn this around by saying Olive's work proves that what you said (which is basically what I just said before) is right?
WTF is going on?!
"I hear it when I see it."

Reply #37
His headphones preference test wasn't truly blind. To me that's important and worthy of discussion, but if you disagree, whatever.

I also have an independent gripe about the very concept of what "level matching" means when we have grossly differing response curves, as we often do with headphones and speakers.

These are two separate distinct problems in headphone research, IMHO.

Reply #38
His headphone preference test wasn't truly blind. To me that's important and worthy of discussion, but if you disagree, whatever.

What does a truly blind test with headphones look like?
"I hear it when I see it."

Reply #39
I'm currently of the mind that it is nearly impossible to conduct a truly fair test of headphones under completely blind conditions with "perfect level matching (at least according to the weighting curve we're currently using this decade)".

He freely admits his test wasn't blind but dismisses it with a joke about having to anesthetize people above the neck to correct the problem. Some people's takeaway message is, "Well, since we can't do that, then these existing tests are scientifically valid." My takeaway is, "Since we can't make this method truly blind we'll never know for sure what impact headphone comfort/feel played in biasing people's decisions."

If his basic notion is that frequency response is what we really want to assess, then that should be done electrically via EQ: simulate the Audeze, AKG, Bose, etc. response curves and feed them into the exact same pair of headphones, which have been pre-calibrated to deliver an otherwise neutral response for that individual test subject via a probe mic at the DRP (eardrum reference point).
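
A rough sketch of the arithmetic behind that idea, in case it helps: the EQ needed to make one headphone imitate another is, band by band, the target response minus the response of the pair actually being worn. All the numbers below are invented for illustration, and `simulation_eq` is a hypothetical helper, not anything taken from the Harman work.

```python
# Hypothetical responses in dB at a few frequencies, as they might be
# measured at the listener's eardrum with a probe mic. Invented numbers.
freqs_hz = [50, 200, 1000, 4000, 10000]
reference_db = [2.0, 0.5, 0.0, 3.0, -4.0]  # the single pair actually worn
target_db = [-3.0, 0.0, 0.0, 6.0, -1.0]    # the headphone being simulated


def simulation_eq(target, reference):
    """Per-band EQ gain (dB) that makes `reference` measure like `target`."""
    return [t - r for t, r in zip(target, reference)]


eq_db = simulation_eq(target_db, reference_db)

# Applying the EQ to the worn pair should reproduce the target curve.
simulated_db = [r + g for r, g in zip(reference_db, eq_db)]

for f, g in zip(freqs_hz, eq_db):
    print(f"{f:>6} Hz: EQ {g:+.1f} dB")
```

This only covers magnitude response at a handful of points. A real implementation would need a dense measurement per listener and a filter design step, and it ignores distortion and anything else that EQ cannot copy.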

That test isn't perfect either, but in my opinion conducting a "blind" test of Coke vs Pepsi, but using a paper cup for one, yet a glass for the other, doesn't fly, despite any arguments to the contrary that "it shouldn't matter because they won't focus on that" or "it is too difficult to do it any other way".

Reply #40
His headphones preference test wasn't truly blind. To me that's important and worthy of discussion, but if you disagree, whatever.

I also have an independent gripe about the very concept of what "level matching" means when we have grossly differing response curves, as we often do with headphones and speakers.

These are two separate distinct problems in headphone research, IMHO.

Maybe we should break this thread into two as well. I'm not kidding. These two topics have almost nothing to do with each other.

I also don't know why ITU-R BS.1770 was chosen over other standards, say A-weighting or ISO 226, as examples, which clearly more closely reflect Fletcher-Munson equal loudness contours:

Not to say that any one weighting is the correct one.

Reply #41
Here are four of the contenders I could find with their raw, uncorrected curves, normalized at 1 kHz. Notice how we have to shift some levels by 10 dB or so if we instead decide to normalize at, say, 4 kHz, where the ear is more sensitive. But who's to say that's necessarily the correct place either? What if the test subject happens to focus on the sound of a bass instrument in the song, not some 4 kHz-centric instrument? What then? Are we "level matched" for their listening?
http://graphs.headphone.com/graphCompare.p...11&scale=20

Clearly the level match will differ greatly depending on which weighting you go with.
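
To make the point concrete, here is a small sketch with two made-up response curves (not the ones in the graph linked above) showing how the gain needed to "match" two headphones changes with the frequency you pick as the reference. The function name `matching_gain_db` is hypothetical.

```python
# Hypothetical raw responses in dB SPL at the same drive voltage.
# These are invented values, not measurements of any real headphone.
response_a = {100: 95.0, 1000: 90.0, 4000: 98.0}
response_b = {100: 88.0, 1000: 90.0, 4000: 88.0}


def matching_gain_db(a, b, freq_hz):
    """Gain (dB) to apply to headphone B so it equals A at `freq_hz`."""
    return a[freq_hz] - b[freq_hz]


gains = {f: matching_gain_db(response_a, response_b, f) for f in response_a}

for f, g in gains.items():
    print(f"match at {f} Hz: B needs {g:+.1f} dB")
```

In this toy case the pair is already matched at 1 kHz, needs 10 dB at 4 kHz, and 7 dB at 100 Hz, so "level matched" means three different things depending on where the listener's attention happens to sit.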

Reply #42
The only way I know of to do proper level matching between two sources with differing frequency responses, such as is found with headphones or speakers, in a blind test where you don't want to inadvertently disclose identities, is to have a randomly chosen gain level for your volume knob for each and every listening trial. The test subject needs to be told that the gain has indeed been assigned randomly and that they'll never know what to expect as to how quickly the volume will change as they rotate the knob clockwise up from zero (full mute), so they should go slowly at first so as to not blast their ears out.

On some trials just a 1/3 rotation will achieve a very loud level yet for other trials, even if from the very same source which is of course always obscured from them, they'll have to turn it way up to 3/4 rotation to achieve the same sense of volume, even though it may (or may not) be the very same source. Only in this way will there be no tactile feedback to the test subject to potentially disclose identities, yet they can set volume at will and we don't have to worry about which weighting curve to use. They, in a sense, are using their own unique, customized weighting, yet they have no idea how much gain they had to apply to achieve what pleases them.

For example, if they are exposed to what is, unbeknownst to them, a somewhat bass-shy source which they have to really crank up to achieve any semblance of a full, rich sound (due to the Fletcher-Munson equal loudness contours), they'll never know it, because the amount they need to rotate that volume knob is randomly assigned each time and they won't know the source needed more than what they typically apply.
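
Here is a minimal simulation of that randomized-knob scheme, just to show the bookkeeping. It assumes an idealized listener who always turns the knob until the level at the ear hits their preferred value, and all names and numbers (`run_trial`, the 0 to -20 dB offset range, the 100 dB sensitivity) are made up for illustration.

```python
import random


def run_trial(rng, preferred_level_db, source_sensitivity_db,
              offset_range_db=(-20.0, 0.0)):
    """Simulate one trial with a hidden, randomly assigned gain offset.

    Returns (knob_db, hidden_offset_db), where knob_db is the setting the
    idealized listener ends up at.
    """
    hidden_offset_db = rng.uniform(*offset_range_db)
    # The listener adjusts until:
    #   source sensitivity + hidden offset + knob == preferred level
    knob_db = preferred_level_db - source_sensitivity_db - hidden_offset_db
    return knob_db, hidden_offset_db


rng = random.Random(0)  # seeded only so the demo is repeatable
for trial in range(3):
    knob, offset = run_trial(rng, preferred_level_db=75.0,
                             source_sensitivity_db=100.0)
    print(f"trial {trial}: hidden offset {offset:+.1f} dB, knob at {knob:+.1f} dB")
```

The same source ends up at a different knob position on every trial, so the knob position by itself says nothing about which source is playing, while the level at the ear is whatever the listener chose.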

Reply #43
I also don't know why ITU-R BS.1770 was chosen over other standards, say A-weighting or ISO 226, as examples, which clearly more closely reflect Fletcher-Munson equal loudness contours:

a) Equal loudness contours (ISO 226) deal with loudness of pure tones, not noise or music.
b) A-weighting is based on the old 40 phon Fletcher-Munson curves, so if it is used at all then ideally at very low levels with pure tones. Despite that it is still being (ab)used for noise measurements.
c) BS.1770 deals with loudness monitoring, so out of the family of weighting filters above its "K" filter is the only one that actually fits your purpose.
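
For reference, point (b) can be checked numerically. The sketch below evaluates the standard analog A-weighting magnitude formula (the one given in IEC 61672, which is not quoted anywhere in this thread) to show how heavily it discounts low frequencies. It does not attempt the BS.1770 "K" filter, whose coefficients should be taken from the recommendation itself.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at `freq_hz`, normalized to about 0 dB at 1 kHz."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    # The +2.00 dB term is the usual normalization offset for 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


for f in (50.0, 100.0, 1000.0, 4000.0):
    print(f"{f:>7.0f} Hz: {a_weighting_db(f):+.1f} dB")
```

At 100 Hz the curve is already about 19 dB down relative to 1 kHz, which is why a level match computed with it says very little about how loud the bass will seem.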
"I hear it when I see it."

Reply #44
While a tactile DBT is not 'truly blind', it seems that what you, mzil, want is something like a PCA (principal component analysis) to see what degree of influence, if any, tactile sensation has on the preference rankings. Have I got that right?

Reply #45
I honestly don't really see the point, because given the new target curves you can build different types of headphones that sound "equally" awesome.
People will choose whatever type of headphone that they prefer anyway, be it in-ear, on-ear, around-ear, ... light or heavy, (p)leather or velour pads, and so on ...


And such a test would basically be just the comfort ranking that I mentioned before.
"I hear it when I see it."

Reply #46
c) BS.1770 deals with loudness monitoring, so out of the family of weighting filters above its "K" filter is the only one that actually fits your purpose.

Isn't it interesting that in examining the inflection point of the K-weighting, it just so happens to occur at exactly 1000 Hz? Funny how, in our analysis of humans' perception of loudness to determine that exact point, it just so happens to fall at a very convenient number because our number system is based on 10. If one didn't know any better they might think it was selected for convenience as an easy-to-remember ballpark figure rather than measured with precise audiometric accuracy! Wouldn't that be absurd. [sarcasm]

"Fits my purpose"? Not at all, in fact it seems rotten. It was based on a broadcast standard for mono which the inventors described as such:

"It may come as a surprise that the ITU uses such basic filtering to define the difference between RMS and loudness, but as they put it, "for typical monophonic broadcast material, a simple energy-based loudness measure is similarly robust compared to more complex measures that may include detailed perceptual models”. The ITU calls such a filter 'K-weighting',

I'll take the "complex methods which involves perceptual models" any day, and even using them I still think the concept of precisely level matching is sketchy; all we can do is "ballpark" levels and keep our fingers crossed it will cover the narrow band of the audible spectrum our test listener happens to focus on. If they happen to focus on a bass instrument we are dead in the water since upper and mid frequencies are given priority status in all these weighting schemes (which I agree is proper looking at the big picture), and the very concept of claiming the two systems are "level matched" regardless of frequency of interest is laughable.

"NOTE 1 – Users should be aware that measured loudness is an estimation of subjective loudness and involves some degree of discrepancy depending on listeners, audio material and listening conditions."

WHAT!? "Estimation"!?..."Discrepancy"?! "Varies by listener, conditions, and material"?! But, but, but I thought this newfangled method was guaranteed to play everything at the same level, no?! ITU, you are admitting there are scenarios where that might not be true? Then, since I know any two divergent frequency response curves have to be able to be precisely level matched, because that is important for my blind studies, I'll just keep on shopping for a better weighting system which promises me it has no flaws, since I've concocted in my mind that there must be one. Bye-bye, ITU BS.1770.

ITU word document

Reply #47
Claiming a speaker or headphone with a mountainous frequency response has a particular "level" you can match to another one is like saying Da Vinci's Mona Lisa has a particular color value.

All you can do is take a tiny little section of these things and say "I've decided this is the important part to match and to hell with all of the rest".

Reply #48
Isn't it interesting that in examining the inflection point of the K-weighting, it just so happens to occur at exactly 1000 Hz?

Yep, only that it doesn't...


If one didn't know any better they might think it was selected for convenience as an easy-to-remember ballpark figure rather than measured with precise audiometric accuracy! Wouldn't that be absurd. [sarcasm]

Well, obviously you don't know better.


"Fits my purpose"? Not at all, in fact it seems rotten.

As I said, out of the weighting filters above this is the only one that was actually made for this purpose. I never said it is perfect.


"It may come as a surprise that the ITU uses such basic filtering to define the difference between RMS and loudness, but as they put it, "for typical monophonic broadcast material, a simple energy-based loudness measure is similarly robust compared to more complex measures that may include detailed perceptual models”. The ITU calls such a filter 'K-weighting',

You obviously did not read BS.1770 either, because in the appendix you can see correlation with subjective loudness ratings. There is a stereo and multichannel dataset with correlation r=0.98, even better than the first monophonic dataset.


I'll take the "complex methods which involves perceptual models" any day

Please do. Which algorithms do you suggest? Does it achieve better correlation?


"NOTE 1 – Users should be aware that measured loudness is an estimation of subjective loudness and involves some degree of discrepancy depending on listeners, audio material and listening conditions."

WHAT!? "Estimation"!?..."Discrepancy"?! "Varies by listener, conditions, and material"?! But, but, but I thought this newfangled method was guaranteed to play everything at the same level, no?! ITU, you are admitting there are scenarios where that might not be true? Then, since I know any two divergent frequency response curves have to be able to be precisely level matched, because that is important for my blind studies, I'll just keep on shopping for a better weighting system which promises me it has no flaws, since I've concocted in my mind that there must be one. Bye-bye, ITU BS.1770.

Until you stop being a moron I will also say bye-bye.


ITU word document

That's the 6-year-old, outdated version.
"I hear it when I see it."

Reply #49
Until you stop being a moron I will also say bye-bye.

Your completely unprovoked personal attack calling me a "moron" will not be forgotten.
