Topic: Transparent Gear and Testing

Transparent Gear and Testing

Reply #50
I respectfully disagree. If a person can show with strong statistical significance that they can reliably hear a difference using normal music, not special test tones or signals that don't represent real world use, then it is their call, not mine, as to whether this is important to them or not.
"Night and day diffrerence" has no exact definition, whereas "audible vs inaudible on music" does.

I'd rather focus on exposing the possible errors* of their test, and how these errors were used as tells, rather than endlessly arguing if "a little bit different" is, or isn't, of great importance.

Speaking of Stuart et al, he played the card "you used inadequate playback gear which obscured the subtle differences" preemptively when he wrote at the end of his paper's summary:

"an audio chain used for such experiments must be capable of high-fidelity reproduction."

Now if anyone attempts to replicate his study using less than the DSP7200SE speakers that were used, with a response to over 32kHz and a price tag of *gulp* $46,000, all he has to do is say "See, I told ya. Your system was inadequate".

Hmm, I wonder how many other researchers have access to $46K speakers?


*- Dither? The wrong kind was used with a lame explanation as to why.

- Filter slopes used? Unusual and not "typical" at all, according to Arny.

- Level matching to 0.1dB or less? No discussion in the paper; everyone but me just "assumes" he did it. And since he foolishly might think response over 20k has meaning, he also might have used weighting that included content above 20k [ITU does this, I believe] to determine the level matching, which is dead wrong and will skew things in the truly audible range (see the sketch after this list for what I mean by checking the match over the audible band only).

- Time alignment and latency after processing the signal, causing audible tells, like I demonstrated in the AVS AIX test comparison through both my ABX scores and Audacity analysis [which I learned just for the occasion, BTW]? Not mentioned.
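
For anyone who wants to see what I mean, here's a minimal sketch in Python of checking a level match over the truly audible band only. Everything in it is my own construction for illustration (the 20kHz cutoff, the 8th-order Butterworth, the 0.1dB tolerance), not anything described in Stuart's paper:

Code:
# Minimal sketch: compare two versions of a track by RMS level measured
# over the audible band only, ignoring any ultrasonic content.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def audible_band_rms_db(x, fs, cutoff_hz=20000.0):
    """RMS level in dB after discarding content above cutoff_hz."""
    sos = butter(8, cutoff_hz, btype='low', fs=fs, output='sos')
    y = sosfiltfilt(sos, x)
    return 20.0 * np.log10(np.sqrt(np.mean(y ** 2)))

def level_match_ok(a, b, fs, tol_db=0.1):
    """True if versions a and b match within tol_db where it counts."""
    diff = audible_band_rms_db(a, fs) - audible_band_rms_db(b, fs)
    return abs(diff) <= tol_db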

Transparent Gear and Testing

Reply #51
Sigh.

Would you mind reiterating what the success rate was?  Now tell us how this doesn't say something about the extent of the difference that was audible.

Quote
*- Dither? The wrong kind was used with a lame explanation as to why.

The results showed this to be irrelevant.

Quote
Filter slopes used? Unusual and not "typical" at all, according to Arny.

According to Arny?  He hasn't exactly demonstrated mastery over digital filters.  I'm afraid you'll have to do better.

Quote
Time alignment and latency after processing the signal, causing audible tells, like I demonstrated in the AVS AIX test comparison through both my ABX scores and Audacity analysis [which I learned just for the occasion, BTW]?

Again, I refer you to the success rate of the test.

Inexact level matching could have made the difference; however, I think it's probably safe to assume this wasn't an issue. It wasn't like they were comparing different hardware.

Transparent Gear and Testing

Reply #52
New post instead of editing my previous (apologies in advance if I decide to merge them later).

Intermodulation distortion was not ruled out as a cause for differences heard.

Transparent Gear and Testing

Reply #53
Quote
Sigh.

Would you mind reiterating what the success rate was?

56% in aggregate form over many trials, if I understand correctly. I mocked how unimportant this would be (from my perspective at least) here:

"Imagine the sales pitch: "Ladies and Gentlemen, come one, come all, wait till you hear our incredible new Hi-Re$ sound system that blows CD away! True, you may not be able to hear the difference on your own, as an individual, but merely invite seven of your closest friends over, listen through my necessary $46,000 speakers*, use my specially prepared samples only, cast your votes over several listening trials, sum your totals, and then finally examine the results in aggregate form, AND BINGO! 56% correct responses don't lie (instead of a random coin flip's 50% results) and it conclusively shows, with statistical significance, that yes, you made the right decision to only buy THE BEST!" - not a real quote

*- "audio chain used for such experiments must be capable of high-fidelity reproduction" [This is a real quote. It's the last line of the paper's abstract, protecting him from any subsequent failed attempts to replicate his findings, by others who may discredit his paper: https://secure.aes.org/forum/pubs/conventions/?ID=416 ] "But your test setup didn't use $46K speakers, now did it?!" He'll protest.
- "high fidelity" defined by me, or authorized agents of Meridian Audio, details not specified nor provided upon request
- offer not valid under test supervision by a disinterested third party
- must use exact, unpublished, unreleased down converted samples held in my possession
- alternate forms of conversion, music, or the use of superior dither disallowed
- any attempt to measure the down converted sample's level match to the original, possibly showing a minor mismatch, as was found in the initial AIX records' down conversion samples for the AVSforum tests, for example, is disallowed. "

Quote
Now tell us how this doesn't say something about the extent of the difference that was audible.
I never said the difference was strong; in fact, I pointed out how trivial it was in my mockery of the 56% figure. However, it does technically exist (assuming his test was fair, which I'm not thoroughly convinced of) and shows statistical significance based on the number of trials. [Although, as I understand it, not a single person in the test showed an ability on their own to hear a significant difference; it was only when all the results were pooled together, and that's what I was making fun of in the quote above. Would an audiophile (with such hi-res gear) really brag to their friends about that?]
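
To put rough numbers on the pooled-vs-individual point, here's a quick check in Python. The trial counts below are made up for illustration (I don't have the paper's exact per-listener numbers in front of me); the point is that the same 56% rate is statistically significant over enough pooled trials while being indistinguishable from coin flipping over one listener's session:

Code:
from scipy.stats import binomtest

# Hypothetical trial counts, both at a 56% success rate:
pooled = binomtest(224, n=400, p=0.5, alternative='greater')
single = binomtest(14, n=25, p=0.5, alternative='greater')
print("400 pooled trials:", pooled.pvalue)     # well under 0.05, "significant"
print("25 individual trials:", single.pvalue)  # ~0.35, looks like a coin flip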

My point is that arguing over what is and isn't "important" is a judgement call that will reach no conclusion. Arguing over "is or isn't audible on music" is, however, a more exact thing, and hence of more interest to me personally. YMMV.

Quote
Inexact level matching could have made the difference; however, I think it's probably safe to assume this wasn't an issue. It wasn't like they were comparing different hardware.

Software filters, not just hardware filters, cause level changes too, when the level matching procedure uses an inappropriate form of frequency band weighting.

Transparent Gear and Testing

Reply #54
Quote
Speaking of Stuart et al, he played the card "you used inadequate playback gear which obscured the subtle differences" preemptively when he wrote at the end of his paper's summary:

"an audio chain used for such experiments must be capable of high-fidelity reproduction."

Now if anyone attempts to replicate his study using less than the DSP7200SE speakers that were used, with a response to over 32kHz and a price tag of *gulp* $46,000, all he has to do is say "See, I told ya. Your system was inadequate".


A speaker does not have to be outrageously expensive to have reasonably flat super-high frequency response.

All of Adam Audio's studio monitors with their folded ribbon tweeter have treble response to ~50kHz at -3dB, and there are plenty of other speakers available with similar treble extension at similarly non-crazy prices. The practical application of this is debatable, of course, but you don't have to spend $46K to get that kind of frequency range.

And considering those are some of the most highly-rated studio monitors out there, anyone claiming that they are incapable of "high-fidelity playback" must be out of their mind.

Transparent Gear and Testing

Reply #55
Quote
Software filters, not just hardware filters, cause level changes too, when the level matching procedure uses an inappropriate form of frequency band weighting.

With anti-aliasing/imaging filters this is a bit of a stretch.

Transparent Gear and Testing

Reply #56
KozmoNaut, I was merely mentioning one specific aspect of the $46k speakers that were used. I didn't mean to imply that ultra-high-frequency response is all that matters, but as soon as anyone attempts to replicate his study with speakers that don't go up to 32kHz or beyond, I'll bet Stuart will be quick to point it out.

Transparent Gear and Testing

Reply #57
Quote
Software filters, not just hardware filters, cause level changes too, when the level matching procedure uses an inappropriate form of frequency band weighting.


IIRC, forum member David (?) mentioned that the current incarnation of ReplayGain used in Foobar2K ABX [which I don't follow closely, so perhaps things have changed] uses weighted filtering that takes ultrasonics into consideration in its estimation of level. So, for instance, in comparing song 'A' with ultrasonic content to the same song 'B' with everything above, say, 20kHz stripped away, it inappropriately lowers the overall level of 'A', because it weighs those ultrasonics as part of the overall power level of the music.
[I can't remember what thread this discussion took place in, unfortunately.] The end result is that the "level matched" audible levels of 'A' and 'B' will be off over the audible <20kHz range. Arny's "keys jangling" files, as I recall, might be an example of where this problem would show up (I'm not sure), and it wouldn't surprise me if Meridian's files suffer from the same issue. However, to the best of my knowledge, Meridian has never made available the actual processed and unprocessed files, nor spoken a single word about how, and to what level of accuracy, the files were level matched.
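
Here's a toy demonstration of the mechanism in Python (my own construction, not ReplayGain's actual weighting code, and not Meridian's files, since those were never released): give version 'A' some ultrasonic energy, "match" it to the stripped version 'B' on full-band power, and 'A' comes out audibly quieter than 'B' in the band that matters:

Code:
import numpy as np

fs = 96000
t = np.arange(fs) / fs
audible = np.sin(2 * np.pi * 1000 * t)       # the content you can hear
ultra = 0.5 * np.sin(2 * np.pi * 30000 * t)  # ultrasonic content

a = audible + ultra   # "hi-res" version
b = audible           # version with everything above 20kHz stripped

rms = lambda x: np.sqrt(np.mean(x ** 2))
gain = rms(b) / rms(a)   # naive "level match" on full-band power

# The matched copy of A is now quieter than B in the audible band:
offset_db = 20 * np.log10(rms(gain * audible) / rms(audible))
print(f"A's audible-band level after matching: {offset_db:+.2f} dB")  # about -1 dB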

Transparent Gear and Testing

Reply #58
Quote
KozmoNaut, I was merely mentioning one specific aspect of the $46k speakers that were used. I didn't mean to imply that ultra-high-frequency response is all that matters, but as soon as anyone attempts to replicate his study with speakers that don't go up to 32kHz or beyond, I'll bet Stuart will be quick to point it out.

Yes, but as KozmoNaut points out, your price inclusion is a red herring. It wouldn't be hard to find a far less expensive speaker with that high extension. Heck, there are $40 planars that extend to 40k with less drama than that dome.
The problem with any repeatability test is the lack of system details regarding the whole mess. But if >30k response is required for the "benefits" of MQA, it might be DOA, as very few speakers on the market, even "audiophile" ones, would qualify.
If you read the comments on the paper at the AES site, there has been some backpedalling that would make Deion Sanders proud, including an admission that the whole "cognitive load" aspect might have been a load of BS:
Quote
However, we accept that the use of the term “cognitive load” was perhaps over-reaching as we used it.

I guess the "smearing" aspect remains a "hypothesis":
Quote
Point 3 is perhaps worded unhelpfully generally, but it is not untrue that our results are consistent with such a temporal smearing hypothesis; we do not claim that our results support this hypothesis.

No mention of whether the claim in their own manual about TPDF being transparent will be, ummm, corrected to:
Quote
Turning to the comments on dither: we know that in order to approach transparency TPDF is the minimum that should be accepted.

"Approach transparency". 

cheers,

AJ
Loudspeaker manufacturer


Transparent Gear and Testing

Reply #60
Yes, eric.w, good find. David's response below it, and your testing, confirm what I warned of.

Transparent Gear and Testing

Reply #61
Quote
Yes, but as KozmoNaut points out, your price inclusion is a red herring.

If I perhaps gave you or anyone else the impression I respect Stuart or his grossly overpriced speakers, you are mistaken.

What we do know is he preemptively played the "your gear is inadequate" card, right in the abstract itself. As to what makes his gear "hi-fi" enough, none of us really know. He can conveniently manufacture whatever it needs to be when badmouthing any subsequent studies by others, should he need to. No reason for him to show his cards prematurely before the showdown when he doesn't need to.

From the opening abstract of the paper by Stuart et al:
"Two main conclusions are offered: first, there exist audible signals that cannot be encoded transparently by a standard CD; and second, an audio chain used for such experiments must be capable of high-fidelity reproduction. " [bold text emphasis mine]

"Must be"? Funny how that "lesser, not as high-fi audio chain", which didn't cut it and failed to show any differences, I guess, never eneded up being discussed in his paper. 

Transparent Gear and Testing

Reply #62
But he's not really saying that you must have expensive audiophile-approved speakers. I read it as him making the point that unless your entire signal chain is actually capable of super-high-frequency signal reproduction, there is absolutely no chance you will hear a difference, whether it's technically audible or not.

A lot of gear will impose an upper limit on frequency response, maybe because response above 20kHz is simply not specified and the manufacturer only cared about 20Hz-20kHz, or maybe because they put a lowpass filter in there to avoid ultrasonic harmonic interference in the audible range.

Or if there's a DSP in there, the frequency range is limited by the sampling rate. For instance, even though my speakers technically go to 50kHz, they will never ever get an input above 24kHz, because my DSP crossover uses 48kHz sampling-rate ADCs/DACs.
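
A quick numeric illustration of that sampling limit (the 30kHz figure is just my example): at 48kHz, nothing above 24kHz can be represented, and if an ultrasonic signal somehow reached the converter without anti-alias filtering, it would fold back into the audible range rather than pass through:

Code:
import numpy as np

fs = 48000                                # DSP crossover sample rate
n = np.arange(fs)                         # one second of samples
x = np.sin(2 * np.pi * 30000 * n / fs)    # a 30kHz sine, sampled at 48kHz

spectrum = np.abs(np.fft.rfft(x))
peak_hz = np.argmax(spectrum) * fs / len(x)
print(peak_hz)   # 18000.0 -- it folded back to fs - 30000, not 30kHz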

Transparent Gear and Testing

Reply #63
Quote
I feel it would be a stronger case if there were some sort of study out there that pulled 50 or 100 people and did a double-blind study about the transparency of, say, the ODAC.


Back in the 1970s and 1980s, when we invented ABX, we did DBT power amp comparisons involving about 25 different people and found our classic "no differences" results. The work was repeated with a similar-sized group of people in the 1990s, with similar results.

The ODAC is a far cleaner piece of gear than most of the gear we tested then.

Quote
There are many ways to ignore the results, but I think it would convince some people. I only came up with this thread because somebody I knew felt this would be the nail in the coffin for him.


Don't underestimate the power of greed and denial.

Transparent Gear and Testing

Reply #64
Quote
I respectfully disagree. If a person can show with strong statistical significance that they can reliably hear a difference using normal music, not special test tones or signals that don't represent real world use, then it is their call, not mine, as to whether this is important to them or not.
"Night and day difference" has no exact definition, whereas "audible vs. inaudible on music" does.



But in the real world, real audiophiles make real claims like 'night and day'.  All the time.

'Audible' doesn't matter so much as 'HOW audible'

Btw what's the exact definition  of 'strong statistical significance' and 'reliably'?  And 'normal music'?



Quote
I'd rather focus on exposing the possible errors* of their test, and how these errors were used as tells, rather than endlessly arguing if "a little bit different" is, or isn't, of great importance.


I'd rather focus on what claims are being promulgated in the real world to consumers. In that regard, I don't much care if a trained listener in a lab, using the most sensitive conditions, was able to detect a difference with a p-value just a bit less than 0.05. That's academically and scientifically interesting, but I want to know if the grand claims promoted every month since the 1970s by, say, the Michael Fremers and John Atkinsons of the world are likely to be true. 

So, if an audiophile blowhard says, 'I auditioned Cable A and Cable B and wowie zowie,  Cable B was clearly better sounding',  I want to see them replicate that result, with everything the same except 1) blind  2) level matched 3) random order 4) proctored.*  Not some random and trained subject in a lab setting with gear and materials they haven't heard before.  And not some Internet cowboy reporting unproctored results.

Audiophile blowhards should be able to ace such  a test.  Their gear, their materials, their claim.  The results shouldn't be borderline.

Hmm, why don't we see more of such tests?



*if this sounds familiar, I'm thinking of the Mike Levigne cable tests on AVSF, one of the few instances where my wish was granted. The Zipser amp trials are another.

Transparent Gear and Testing

Reply #65
Quote
Hmm, why don't we see more of such tests?


Because it takes months of listening to create solid day/night differences in your head.
"I hear it when I see it."

Transparent Gear and Testing

Reply #66
Isn't it usually going to be the new thing you haven't yet bought that's going to make an improvement over something you already have?  And if you already have the latest and greatest, you can always tape another bag of pebbles to something.

Transparent Gear and Testing

Reply #67
Quote
'Audible' doesn't matter so much as 'HOW audible'

What "levels of audibility" scale do you personally like best? What test instrument do you use to measure it and how do you calibrate it?  Or do you simply take people's word for it on an arbitrary scale you've concocted like Olive does?
If a person says a volume difference of .5 dB is "night and day", how can you prove they are lying? You can't. If they can prove they can hear it, which they probably can under the right conditions, then you are stuck. It is a pointless path to follow that get's you nowhere.
"Night and day" has no measurable defininition. "Audible" and "inaudible" do.

I'm with jj: preference is inviolate, and arguing that everyone fits into the same "annoying to pleasurable" scale is absurd, in my view [I'm not sure how jj feels on that part]. As an analogy, some people are easily bothered when the room temperature is just a couple of degrees above 72 degrees F, whereas other people don't care at all if it is 10 degrees hotter. Neither group is "wrong"; they are just different, and I don't care if polls of 100 million other people say on average a difference of 2 degrees should be deemed "hardly annoying at all" whereas 10 degrees is "night and day". All I care about is whether, under scientifically controlled conditions, that two-degree guy can show an ability to feel a difference, yes or no. How much does it annoy him? I don't care. It's his business.

Quote
Btw what's the exact definition  of 'strong statistical significance' and 'reliably'?  And 'normal music'?

Are you asking what I personally use when I'm taking money from audiophiles by betting them that their claims are bogus?

 

Transparent Gear and Testing

Reply #68
Quote
'Audible' doesn't matter so much as 'HOW audible'

What "levels of audibility" scale do you personally like best? What test instrument do you use to measure it and how do you calibrate it?  Or do you simply take people's word for it on an arbitrary scale you've concocted like Olive does?
If a person says a volume difference of .5 dB is "night and day", how can you prove they are lying?


You can never prove that a person is intentionally presenting false claims as being true (i.e., meeting the actual formal definition of lying) without some usually hard-to-obtain evidence about what they know and mean.

Quote
You can't.


It is like any other negative hypothesis: difficult or impossible to prove. 

Quote
If they can prove they can hear it, which they probably can under the right conditions, then you are stuck. It is a pointless path to follow that gets you nowhere.


It may be ironic that everybody I know who has provided reliable evidence that they can hear a 0.5 dB difference with music will generally stop well short of calling it "Night and Day". 

Quote
"Night and day" has no measurable definition.


Agreed. Typically, it's hyperbole.

Quote
"Audible" and "inaudible" do.


But they vary with the circumstance.

Quote
I'm with jj: preference is inviolate, and arguing that everyone fits into the same "annoying to pleasurable" scale is absurd, in my view [I'm not sure how jj feels on that part]. As an analogy, some people are easily bothered when the room temperature is just a couple of degrees above 72 degrees F, whereas other people don't care at all if it is 10 degrees hotter. Neither group is "wrong"; they are just different, and I don't care if polls of 100 million other people say on average a difference of 2 degrees should be deemed "hardly annoying at all" whereas 10 degrees is "night and day". All I care about is whether, under scientifically controlled conditions, that two-degree guy can show an ability to feel a difference, yes or no. How much does it annoy him? I don't care. It's his business.


There may be a rational and relevant scale for evaluating preferences: preferences that are based on reliable perceptions, and those that are not.

Said briefly, "If there is no reliable audible difference, then there can be no rational preference".


Quote
Quote
Btw what's the exact definition  of 'strong statistical significance' and 'reliably'?  And 'normal music'?

Are you asking what I personally use when I'm taking money from audiophiles by betting them that their claims are bogus?


This could be begging the question. What is "an exact definition"?  Got an exact definition for that? ;-)

Seriously, as humans we constantly work in a universe of inexactitude, yet we often operate reliably and rationally in it.

My definition of "normal music" depends on the circumstance, but I vacillate between something that comes from a commercial recording and something that is not pathological.