Topic: Next page in the hi-rez media scam: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #100
My thread on head-fi went nowhere (surprise surprise), so I might as well schlep it over here.

I took Reiss's data at face value and plopped it into a random-intercept logistic regression, using "training" as a study-level predictor. Sure enough "training" is significant at an insanely low level, but you can kind of tell that just by eyeballing the data. Under this model, though, the intercept is not significant at the 0.05 level, so you cannot say that untrained individuals did significantly better than a coin flip. Sure enough the Theiss data stick out as having the biggest difference between the model fit and the observed proportion. If I split out the data by "training" and run separate beta-binomial models, there is again nothing to suggest that 0.5 is an unexpected result for a study of non-trained participants, and Theiss again sticks out like a sore thumb. There's really no ammo in here for casual hi-res proponents to use, unless they happen to do blind tests after training (whatever this training is)...
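For anyone who wants to poke at the beta-binomial side of this, below is roughly what I mean; it's only a sketch, and the per-study counts are invented placeholders, not Reiss's actual numbers.

Code:
# Rough sketch of the per-group beta-binomial check described above.
# The (correct, trials) counts below are invented placeholders, NOT Reiss's data.
import numpy as np
from scipy.stats import betabinom
from scipy.optimize import minimize

studies = [(52, 100), (198, 400), (61, 120), (77, 160)]  # hypothetical untrained-listener studies
k = np.array([s[0] for s in studies])
n = np.array([s[1] for s in studies])

def neg_loglik(params):
    # Parameterize by mean mu and precision phi, then convert to alpha/beta.
    mu, phi = params
    a, b = mu * phi, (1.0 - mu) * phi
    return -np.sum(betabinom.logpmf(k, n, a, b))

res = minimize(neg_loglik, x0=[0.5, 10.0],
               bounds=[(1e-3, 1 - 1e-3), (1e-2, 1e4)])
mu_hat, phi_hat = res.x
print(f"estimated mean proportion correct: {mu_hat:.3f}")
# If the estimate hugs 0.5 and 0.5 sits comfortably inside a bootstrap or
# profile-likelihood interval, there's nothing here to suggest the untrained
# group beats a coin flip.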

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #101
You seem to be quite adept at statistics. Maybe you can help me.

Mixing tests for frequency and bit depth seems to me like mixing tests for two different medications. If you squeeze out a bit of significance this way, what does this actually mean? That either medication is effective? Or both? Or that you still can't say?

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #102
You seem to be quite adept at statistics. Maybe you can help me.

Mixing tests for frequency and bit depth seems to me like mixing tests for two different medications. If you squeeze out a bit of significance this way, what does this actually mean? That either medication is effective? Or both? Or that you still can't say?

That sounds a bit mixed up, in that the ability to differentiate is an outcome rather than a treatment. The medication here would seem to be the training, and sample rate and bit-depth differentiations are cancers. It's certainly possible for one medication to affect multiple cancers, and it's certainly possible to model the possible remission outcomes ({0,0}, {1,0}, {0,1}, {1,1}) in a single model. He seems to punt on the bit-depth question here, though.

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #103
[...] the intercept is not significant at the 0.05 level, so you cannot say that untrained individuals did significantly better than a coin flip.
In anyone else's opinion (apart from them audiophools, that is), that alone speaks volumes as to why this subject is, and always will be - to those interested in science and not myths - the same old dead-horse flogging as before. But audiophools are, as usual, more than happy to try to raise said equine from the dead.  ::)

In my own personal case, if anything, it only makes me stand by my own HA avatar and not change it whatsoever in the foreseeable future. :-D

My thread on head-fi went nowhere (surprise surprise)
That, as any sensible person has come to learn, is just an utter exercise in futility; akin to Sisyphus's job: that figure from Greek mythology who is forever pushing a boulder uphill, only to see it roll back down every time he nears the hilltop.

But looking at it from an audiophool's perspective, it's certainly not a pleasant experience to have plain statistics proving that all the time and money you've spent on that gold-plated coin is actually a total waste, and that it goes against proven scientific methods.

Hence they blindly refuse to acknowledge such methods - as keenly as a medieval peasant would if shown a smartphone (or even a typewriter, for that matter) - and carry on claiming that their very own sample of the aforementioned coin has been providing them with more heads than tails, or vice versa.
So, in the end, their hocus-pocus cult will always win within their circles, according to their sad, self-indulgent opinion.
Listen to the music, not the media it's on.
União e reconstrução

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #104
The medication here would seem to be the training, and sample rate and bit-depth differentiations are cancers.
I would rather have considered bit depth and sample rate as two different medications for the "illness" that you could perhaps call "audible non-transparency".

Quote
It's certainly possible for one medication to affect multiple cancers, and it's certainly possible to model the possible remission outcomes ({0,0}, {1,0}, {0,1}, {1,1}) in a single model. He seems to punt on the bit-depth question here, though.
If he'd focus on bit-depth, several of the studies he looked at would be irrelevant.

The overall question inherent in HRA is whether increasing the sample rate and/or bit depth audibly improves the quality of playback. The secondary question is whether the improvement, should there be any, is perceptible in enough situations and by enough people to make it worthwhile to support in the consumer market. The secondary question isn't addressed in the study, of course, but the press release makes it clear that it is on the researcher's mind.

My question is whether studies which administer two different medications, some only one, some both combined, to cure one illness, can be combined in a meta-study, and what this means for the result. I understand Reiss like this: He says that the medications work, but he doesn't say which one. That's a curious result to me. I wonder what it means. Does it mean that the medications work when administered together? Does it mean that either of the two works alone? Does it mean anything at all?

And another question is on my mind: Reiss tries to judge how much each individual study was subject to errors. Some studies are labelled as neutral, others as more prone to Type I errors, the remainder as prone to Type II errors. I don't see how this influenced his results, however. Can this be factored into the result somehow? Would it be wise to do so, given that this judgment will be somewhat speculative? What if the studies really had such errors, can the impact on the overall result of the meta-study be controlled?

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #105
So, in the end, their hocus-pocus cult will always win within their circles, according to their sad, self-indulgent opinion.
Now there is the study saying trained people heard a difference. Every audiophile sees himself as well trained, even if deaf as a piece of wood. All others are ignorant!

'I'm not saying it was aliens, but it was aliens!'
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #106
What if the studies really had such errors, can the impact on the overall result of the meta-study be controlled?
I would hope so.
Yet another issue I raised was the validity of the studies themselves, rather than assessing them just because they had the right statistical criteria. Maybe they were hearing IM. So is that what "Hi Rez" training involves?
Of course believers will believe and encourage others to "just listen and decide for themselves" (the exact opposite of the controlled studies). Then it's entirely possible for 73-year-old audiophile believers to "hear" and vociferously defend the benefits of Hi-Rez without that horrific Redbook 20 kHz low-pass filtering, using acoustically large panel speakers like these (purple trace):

 ::)

cheers,

AJ
Loudspeaker manufacturer

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #107
Re Type I and Type II errors: I think he's slightly misusing the terms, as those presuppose that the model's assumptions hold, whereas he seems to be addressing the possibility that the modeling assumptions were violated.

As for whether one half of the hi-res "cocktail" could be used to justify the effects of the other, I'm not so keen on that, given that the theoretical mechanisms by which they should be detectable are completely different, and I see nothing in the analysis that points to any kind of link/dependence. Ideally you'd use models that had something like an interaction term for sample rate x bit depth, but he didn't include anything like that. I'll work on something.
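To be concrete about what such an interaction term looks like, here's a minimal sketch in statsmodels formula syntax; the data frame and column names are toy placeholders, not anything from Reiss's dataset.

Code:
# Minimal sketch of a logistic model with a sample-rate x bit-depth interaction.
# The per-trial table below is a toy placeholder, not real data.
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.DataFrame({
    "correct":  [1, 0, 1, 0, 0, 1, 1, 0],   # 1 = stimulus identified correctly
    "hi_rate":  [0, 0, 0, 0, 1, 1, 1, 1],   # 1 = elevated sample rate
    "hi_depth": [0, 0, 1, 1, 0, 0, 1, 1],   # 1 = elevated bit depth
})

# 'hi_rate * hi_depth' expands to both main effects plus their interaction.
model = smf.logit("correct ~ hi_rate * hi_depth", data=trials).fit(disp=0)
print(model.summary())
# A meaningful interaction coefficient would be evidence that the two halves of
# the hi-res "cocktail" aren't simply additive; with the designs of most of the
# included studies you can't even estimate it.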

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #108

Yet another issue I raised was the validity of the studies themselves, rather than assessing them just because they had the right statistical criteria. Maybe they were hearing IM. So is that what "Hi Rez" training involves?
 


Reiss does visit the topic of 'what was heard' in a way, when he discusses his Table 2B.   What bugs me about *that* work is that he more or less subjectively bins the potential biases into 'low risk' ,'high risk', and 'unclear'. 

That's an important choice because it affects his argument about Type II errors.

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #109
Mixing tests for frequency and bit depth seems to me like mixing tests for two different medications.

I see your point. To better fit the situation before us, we must further specify that the medications are very strongly different from each other. For example, they might be medications that target vastly different diseases of different parts of the body, or even diseases that are unique to different species of animal.

The reason for this is that bit depth and sample rate are orthogonal properties of a digital signal. By orthogonal, it is meant that their implementation and effects are usually completely different and independent from each other.  They can be interchanged or made dependent, but only by means of very intentional and complex processing.  Their basic nature is to be independent of each other. One can be dramatically changed without having any effect at all on the other.

Any meta study that conflates them would seem to have a fatal flaw.




Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #110
he thinks his discussion of Type II errors in audio testing is a new thing
I'd like to point out that "concerns" about Type II errors seem to be part of the audiophile pundit narrative. 

Audiophile pundits have an important challenge, which is to explain why the alleged sonic improvements that they repeatedly and loudly portray as being obvious and highly significant either vastly diminish or even completely disappear when good experimental controls are used.

The obvious solution is to attack the experiment. Criticizing bad design or execution, when done intelligently and honestly, can be very helpful.

The problem is that many recent examples of criticism of others' experiments seem to have serious problems of their own. For example, Table 1 of the paper "A Meta-Analysis of High Resolution Audio Perceptual Evaluation" tries to sort the test methodologies into categories called "ABX", "2IFC", "Same/Different" and others. The problem is that ABX may be performed as a 1IFC, a 2IFC and/or a Same/Different test, and that some of the tests called "ABX" were references to two different tests, one developed at Bell Labs in the early 1950s and the other developed independently in the 1970s. The author's confusion in the area of experimental methodology is further exemplified by the following passage:

"Authors have noted that ABX tests have a high cognitive load [11], which might lead to false negatives (Type II errors). An
alternative, 1IFC Same-different tasks, was used in many tests"

But can't ABX be used as a 1IFC Same/Different test? The ca. 1970 ABX test was initially implemented as a 1IFC same/different test. Provisions to perform 2IFC tests were added to the ABX implementation in order to help listeners improve the accuracy of their results over the results that they were obtaining with 1IFC tests. This was a working strategy, to the extent that many listeners preferred the 2IFC option. 1IFC is often used to this day when listeners become highly accurate and don't need to refer to more than one sample interval per trial to obtain accurate results.

How can a test address its own shortcomings, particularly when the alleged shortcomings only came to light years if not decades after its publication?




Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #111
So what is a "1IFC" test?
Kevin Graf :: aka Speedskater


Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #113
So what is a "1IFC" test?

To expand on AJ's correct info, it is a test where the listener listens to one unknown, and is forced to respond that it is "A" or "B".

An early prototype of the ABX Comparator implemented this scheme. Listeners didn't like it because it depended on the listener's memory over a longer period of time. They correctly determined that listening for small differences in sound quality over longer periods of time is a more difficult test.

The next version of the ABX Comparator implemented the listener-preferred 2IFC test, with as many opportunities to listen to the known references and the unknown X, and to compare sounds, as the listener felt the need for. This minimized the time over which sounds had to be remembered, since any of the known references and the unknown could be listened to in any order. Any sound could immediately follow any other, including itself.

Needless to say, I'm completely mystified by those who favor 1IFC over 2IFC, as the 2IFC option, once added to the ABX Comparator, was strongly preferred by all of the listeners due to the memory-time issue. In any case the basic listening task is Same/Different.


Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #114
The author's confusion in the area of experimental methodology is further exemplified by the following passage:

"Authors have noted that ABX tests have a high cognitive load [11], which might lead to false negatives (Type II errors). An
alternative, 1IFC Same-different tasks, was used in many tests"

Reference [11] in this snippet from Reiss' paper is of course the paper by Jackson/Capp/Stuart that has been discussed here before. Precisely the part where they mention the alleged "cognitive load" problem seems to be a thinly veiled act of revenge against Meyer/Moran. Some critique can also be found on the AES discussion page. None of this, however, seems to have registered with Reiss.

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #115

Reference [11] in this snippet from Reiss' paper is of course the paper by Jackson/Capp/Stuart that has been discussed here before. Precisely the part where they mention the alleged "cognitive load" problem seems to be a thinly veiled act of revenge against Meyer/Moran. Some critique can also be found on the AES discussion page. None of this, however, seems to have registered with Reiss.

Correct. Reiss's meta study seems to fail on the grounds that the tests that were used to make up his study are not consistent with each other. IOW it's not a collection of tests of apples, but rather a conflation of tests of just about every fruit and vegetable in the store.

I continue to assert that testing the audibility of various sample rates and word lengths is actually pretty simple at this point in life, but people appear to be afraid to use good procedures because they already know that good procedures don't give them the results that they need or desire.

BTW for an example of summarizing years of subjective testing of audio gear:

Ten years of A/B/X Testing

Experience from many years of double-blind listening tests of audio equipment is summarized. The results are generally consistent with threshold estimates from psychoacoustic literature, that is, listeners often fail to prove they can hear a difference after non-controlled listening suggested that there was one. However, the fantasy of audible differences continues despite the fact of audibility thresholds.

Author: Clark, David L.
Affiliation: DLC Design, Farmington Hills, MI
AES Convention:91 (October 1991) Paper Number:3167
Publication Date:October 1, 1991 Import into BibTeX
Subject:Listening Tests
Permalink: http://www.aes.org/e-lib/browse.cfm?elib=5549

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #116
The primary  purpose of oversampling relates to improving dynamic range. 
It is very feasible to have a digital filter that is very effective and also does not use oversampling.  IOW it operates at the identical same clock frequency as the data it processes.

No, Arny, no.

When the aim is digital anti-imaging, aka reconstruction filtering, oversampling is mandatory. A digital filter cannot operate above Fs/2. A digital reconstruction filter's task is exactly to suppress everything above the original Fs/2. So Fs has to be increased before the filter can do this.

Quite how someone can survive in this hobby for decades without getting this eludes me. Do you actually know how sampling works, at all?


Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #117
The primary  purpose of oversampling relates to improving dynamic range. 
It is very feasible to have a digital filter that is very effective and also does not use oversampling.  IOW it operates at the identical same clock frequency as the data it processes.

No, Arny, no.

When the aim is digital anti-imaging, aka reconstruction filtering, oversampling is mandatory. A digital filter cannot operate above Fs/2. A digital reconstruction filter's task is exactly to suppress everything above the original Fs/2. So Fs has to be increased before the filter can do this.

Quite how someone can survive in this hobby for decades without getting this eludes me. Do you actually know how sampling works, at all?

You appear to be answering a question that you made up.  I did not mention any specific kind of filter. I was talking about the most common primary purpose for oversampling.

More specifically, I was talking about this:

https://en.wikipedia.org/wiki/Oversampling

"
Resolution
In practice, oversampling is implemented in order to achieve cheaper higher-resolution A/D and D/A conversion.[1] For instance, to implement a 24-bit converter, it is sufficient to use a 20-bit converter that can run at 256 times the target sampling rate. Combining 256 consecutive 20-bit samples can increase the signal-to-noise ratio at the voltage level by a factor of 16 (the square root of the number of samples averaged), effectively adding 4 bits to the resolution and producing a single sample with 24-bit resolution.[3]
"

But since you answered your own question by citing yourself as the superior authority, please provide an independent authoritative source that supports your claim that all digital filters must be oversampled and there is no other purpose for doing so.







Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #118
<snip>

My question is whether studies which administer two different medications, some only one, some both combined, to cure one illness, can be combined in a meta-study, and what this means for the result. I understand Reiss like this: He says that the medications work, but he doesn't say which one. That's a curious result to me. I wonder what it means. Does it mean that the medications work when administered together? Does it mean that either of the two works alone? Does it mean anything at all?

I think rrod´s answer is spot on.
If CD quality (i.e. 16 bit, 44.1 kHz) is considered to be transparent (audio-perception-wise), then anything "beyond CD quality" qualifies as "hi-res", be it "more bits" or "higher sample rate" or "more bits and higher sample rate". Usually one would not do a meta-analysis on every possible factor at once, but as the underlying null hypothesis is as described, it is justified.
And it is one of the reasons why Reiss strongly recommends further research.

Quote
And another question is on my mind: Reiss tries to judge how much each individual study was subject to errors. Some studies are labelled as neutral, others as more prone to Type I errors, the remainder as prone to Type II errors. I don't see how this influenced his results, however. Can this be factored into the result somehow? Would it be wise to do so, given that this judgment will be somewhat speculative? What if the studies really had such errors, can the impact on the overall result of the meta-study be controlled?

Again I think rrod's correct in stating that Reiss's usage of the terms might be slightly different from the normal meaning; at least wrt Type I Errors, as some of the concerns would normally be treated in an evaluation of test validity.
Regarding Type II Errors - it is usually one of the reasons to do a meta-analysis, because by combining the results (if applicable) the statistical power will be raised. A high risk of Type II Errors means low power, so by doing a meta-analysis the impact of Type II Errors will be lowered overall.

Regarding Type I Errors I have to dig deeper into his material, as he wrote about some recalculation and transformation of data. At least he mentioned several methods (and used them) to control the familywise Type I Error rate (this concerns the multiple comparison problem), but that might have been used only for his own subgroup analysis.
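For reference, the kind of familywise correction in question works roughly like this; the p-values below are arbitrary placeholders, just to show the mechanics.

Code:
# Illustration of familywise Type I error control across several subgroup
# comparisons. The raw p-values are arbitrary placeholders.
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.034, 0.049, 0.210, 0.630]

reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
for p, pa, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}  Holm-adjusted p = {pa:.3f}  reject H0: {r}")
# After the Holm adjustment only the strongest comparisons (if any) survive,
# which is the whole point of controlling the familywise error rate.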

Btw, you obviously missed the AES press release from 29.06.2016 ........

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #119
If CD quality (i.e. 16 bit, 44.1 kHz) is considered to be transparent (audio-perception-wise), then anything "beyond CD quality" qualifies as "hi-res", be it "more bits" or "higher sample rate" or "more bits and higher sample rate". Usually one would not do a meta-analysis on every possible factor at once, but as the underlying null hypothesis is as described, it is justified.
If that were true, it could be extended to even more factors associated with Red Book CD, for example filter characteristics, jitter levels, error rates, etc.

I'm not a statistics expert, but the notion that this is a valid and justified approach doesn't seem very convincing to me.

Quote
And it is one of the reasons why Reiss strongly recommends further research.
Besides being self-evident, it's also a very cheap get-out-of-jail answer in such a discussion. This recommendation didn't prevent him from publicly drawing conclusions from his research that go markedly beyond what he has shown.

Quote
Again I think rrod's correct in stating that Reiss's usage of the terms might be slightly different from the normal meaning; at least wrt Type I Errors, as some of the concerns would normally be treated in an evaluation of test validity.
Well, maybe, but does this change anything? Type I errors lead to false positives, and problems with test validity are prone to have the same effect.

Take the Plenge study, which I referred to earlier, as an example. One might suspect from their data that there was something special about the Cauer filter, even though none of the trials reached their significance level. That's speculation, of course, as they didn't try to get to the bottom of it, as far as one can tell today. We can merely try to make sense of their given data. The Cauer filter may have had something in its in-band behavior that allowed slightly easier detection. It could have been in-band ripple, or earlier rolloff. It is known that both can potentially be audible. That would constitute a potential for false positives, i.e. Type-I errors in Reiss' parlance.

More generally, we do have some cumulative evidence that filter characteristics, for reconstruction filters, may in some cases introduce audible effects. I'm not talking only about elusive concepts like "time smearing", "pre-ringing" or phase distortion, but mainly about ordinary stuff like in-band ripple, stopband attenuation, slope, etc.

For the purposes of Plenge et al., they may have ignored the potential for Type-I errors, since these wouldn't have affected their interpretation of their results. If they couldn't get significance in the possible presence of Type-I errors, elimination of Type-I errors would only have gotten them further away from significance. Hence there was little incentive to investigate the reason for the slightly different result of the Cauer filter.

However, if there were such errors, it would matter for the type of analysis Reiss did 35 years later. They would increase the significance levels of his analysis.

Quote
Btw, you obviously missed the AES press release from 29.06.2016 ........
I had seen it. What makes it obvious to you that I must have missed it?

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #120
I had seen it. What makes it obvious to you that I must have missed it?
For the benefit of those who may not have: http://www.aes.org/press/?ID=362
“Our study finds high-resolution audio has a small but important advantage in its quality of reproduction over standard audio content. "
I asked Reiss about the dichotomy between the paper's finding of "discrimination" of "unknown cause" and this specious claim of an "important advantage over standard audio".
He said he stands by the press release and leaves it to the reader whether "discrimination" of "unknown" cause, as stated in the paper, implies an "advantage over standard audio" per the press release, for "audiophiles" who want music "as close to the real thing".
I think it clearly removes any facade of impartiality and purely academic curiosity.

cheers,

AJ
Loudspeaker manufacturer

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #121
Similar press releases have been issued by the AES and by Reiss' university. I think it is fair to assume that Reiss wrote both, perhaps with the help of some staff member. It is fairly rare for the AES to announce a scientific paper with a press release. In this case, I assume the fact that Reiss is the AES vice chair of publications had something to do with it.

It is of course also clear that press releases aren't peer reviewed, whereas the article was. This must also have had some impact on the actual wording used. I think the peer review could have been better in this case, but if Reiss had included in the paper the kind of conclusions he offered in the press release, the reviewers would have objected (I hope).

Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #122

I had seen it. What makes it obvious to you that I must have missed it?

I think he's saying that he's so taken with Reiss's critically flawed article and hyped-up press release that he can't understand how anybody who read it wouldn't want to join him in bowing and scraping at the throne of high resolution as the panacea for what ails audio.

Quote
For the benefit of those who may not have: http://www.aes.org/press/?ID=362
“Our study finds high-resolution audio has a small but important advantage in its quality of reproduction over standard audio content. "

I must have missed the preference testing. What I saw was very weak evidence that, on occasional sightings of a blue moon, people kinda mighta heard a weak difference that might actually be the result of choosing p = 0.05 as his criterion for statistical significance.

Quote
I asked Reiss about the dichotomy between the paper's finding of "discrimination" of "unknown cause" and this specious claim of an "important advantage over standard audio".

He said he stands by the press release and leaves it to the reader whether "discrimination" of "unknown" cause, as stated in the paper, implies an "advantage over standard audio" per the press release, for "audiophiles" who want music "as close to the real thing".

I think it clearly removes any facade of impartiality and purely academic curiosity.

Agreed. The reality is that there have been at least 4 attempts to take some kind of high resolution audio mainstream. All failed in the marketplace, probably with adverse financial and professional consequences.

(1) HDCD as developed and promoted by Prof. Keith O. Johnson and Michael "Pflash" Pflaumer of Pacific Microsonics Inc.

(2) HDCD as promoted by Microsoft, web site discontinued in 2005.

(3) DVD-A

(4) SACD


Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #123
<snip>
If that were true, it could be extended to even more factors associated with Red Book CD, for example filter characteristics, jitter levels, error rates, etc.

I think you miss the argument, which is related to the concept of internal validity.
If an experiment tries to examine a difference in audibility between various formats, the independent variable has to be the format itself; all other effects are confounders that have to be blocked out or, if that is impossible, randomised.

So, every technically weak point that is related to the format counts, everything else does not.

Quote
I'm not a statistics expert, but the notion that this is a valid and justified approach doesn't seem very convincing to me.

I hope the explanation above helps a bit; it is basically not a question of statistics.
In which way the data should be transformed, what to do in the case of heterogeneous data, which underlying model to use for choosing test statistics, and how to correct for the multiple comparison problem - those are statistical questions.

Quote
Besides being self-evident, it's also a very cheap get-out-of-jail answer in such a discussion.

"cheap get-out-of-jail answer" might be a suitable phrase for forums, but in science? What else should an author do if a meta-analysis or systematic review did not find overwhelming evidence for a hypothesis?
The last time I looked it up, the percentage of Cochrane meta-analyses (systematic reviews) in which the authors end with a recommendation for further research was roughly 50%.
Given the unwillingness in the scientific community to replicate experiments and (even more importantly) to publish the results of replications, it is no surprise.
I'd rather be surprised if the audio field alone were an exception.


Quote
This recommendation didn't prevent him from publicly drawing conclusions from his research that go markedly beyond what he has shown.

I haven't finished my analysis of Reiss's paper yet, so I can't judge at the moment whether your assertion is correct. At a glance I think "markedly beyond" is exaggerated.

Quote
Well, maybe, but does this change anything? Type I errors lead to false positives, and problems with test validity are prone to have the same effect.

Both may lead to wrong results or wrong conclusions, but it is nevertheless better not to confuse these things. Internal validity means an experiment/test measures the effect it is intended to measure, and if that part is flawed, statistics can't correct for it.
Otoh, if the statistical analysis is flawed or the test was underpowered, that is nothing technical measures could help with.

Quote
More generally, we do have some cumulative evidence that filter characteristics, for reconstruction filters, may in some cases introduce audible effects. I'm not talking only about elusive concepts like "time smearing", "pre-ringing" or phase distortion, but mainly about ordinary stuff like in-band ripple, stopband attenuation, slope, etc.

Besides the fact that "in-band ripple" in the frequency domain is related to "pre-ringing" in the time domain: if it is not directly related to the format under test (i.e. unavoidable within the limits of that format), then it should be treated as a confounder.

Quote
<snip>
However, if there were such errors, it would matter for the type of analysis Reiss did 35 years later. They would increase the significance levels of his analysis.

It depends, but I agree that experiments like Plenge's (like every other that uses non-music stimuli) can't really support a conclusion about preferences while listening to music delivered in hi-res.
But, as said before, I haven't finished my analysis yet, so I can't say which part of Dr. Reiss's conclusion is backed up by the data.

Quote
Quote
Btw, you obviously missed the AES press release from 29.06.2016 ........
I had seen it. What makes it obvious to you that I must have missed it?

A sentence from your German blog:
Quote
Die AES hat meines Wissens auch keine Presserklärung darüber herausgegeben. ("To my knowledge, the AES has not issued a press release about it either.")


Re: A Meta-Analysis of High Resolution Audio Perceptual Evaluation

Reply #124
I think you miss the argument, which is related to the concept of internal validity.
If an experiment tries to examine a difference in audibility between various formats, the independent variable has to be the format itself; all other effects are confounders that have to be blocked out or, if that is impossible, randomised.

So, every technically weak point that is related to the format counts, everything else does not.
Well, this applies to an individual study. For a meta-analysis, I would expect the "independent variable" to be the same for each input study. If it isn't, I don't know anymore what the result actually says, nor whether the combining of the individual results makes sense. That may be my fault, and perhaps someone manages to enlighten me here.

On top of that, it seems to me that following your argument would mean that some studies would have to be excluded from the meta-study. For example in the case of Jackson [11], it appears that their choice of filter characteristics and dithering method may well be the main factor in their result. So if you think those factors aren't part of the independent variable, which I would concur with, then there must be doubts regarding the internal validity of this test.

A similar argument can be made for other tests. In some cases there is a distinct possibility that the "other effects" you referred to may in fact not have been blocked out sufficiently well. Such other effects could be intermodulation, or artefacts of the filters, amongst other things. As I pointed out already, such potential problems may not have rendered the original test invalid in the sense that its result needs to be dismissed, because the test's own conclusion may not have been endangered by such an effect. However its use in a meta-study is a different case, where such errors can play a different role.

To make this somewhat theoretical argument a bit clearer, consider this thought experiment with 3 hypothetical studies:
  • The first study finds a slight statistical likelihood for being able to distinguish between a particular low-pass filter at 20 kHz being on or off, however the significance level isn't reached, so the study concludes that the null hypothesis couldn't be rejected. The testers are unaware that it was the filter's in-band behavior that was borderline-detectable.
  • The second study finds a slight statistical likelihood for being able to distinguish between a particular dither at 16 bit and no dither at 24 bit. Again, the significance level isn't reached, so the study concludes that the null hypothesis couldn't be rejected. The testers are unaware that the dither used leads to noise modulation with low-level signals, which some testers were just barely able to pick up.
  • The third study finds a slight statistical likelihood for being able to distinguish between 192/24 and 44.1/16 playback, however once more, the significance level isn't reached to allow rejection of the null hypothesis. This time, the factor that just about allowed detection with a few of the testers, was some intermodulation in the speakers which were used in the test, which was triggered by the ultrasonic content present in 192/24.
You would probably agree that all three studies had a flaw. In each case the flaw increased the likelihood of false positives, i.e. there was a risk that the test would conclude wrongly that high-res was audibly different from standard resolution. However, in none of the tests was the effect strong enough to make it cross the significance line, so it didn't change the test conclusion. All tests concluded that the null hypothesis couldn't be rejected. Hence there was no need and no incentive to investigate whether there had been any flaws in the test that could have led to false positives.

Now let's do a meta-analysis of the three tests combined. Let's suppose that the increased statistical strength obtained by the combined results now causes the significance line to be crossed. The conclusion would have to be that the null hypothesis can be rejected, i.e. one would have reason to believe that the subjects really could hear a difference between hi-res and standard res.

In a sense this result is correct, because in each case there was a factor that caused a just barely audible difference. If each of the tests had been done with more trials, each individual test might well have crossed the significance line by itself.

However, the conclusion would be wrong, because it would have been the result of the flaws in the tests. In each case the subjects' ability to distinguish the stimuli was because of secondary effects that compromised the test's internal validity.

So even though the individual tests reached the right conclusion, because the error wasn't strong enough to tip the balance, the meta-analysis arrives at a wrong result.
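The arithmetic of that thought experiment is easy to reproduce. A small sketch with invented counts (30 correct out of 50 per study, nothing to do with the actual studies):

Code:
# Numerical version of the thought experiment above: three studies that are
# individually non-significant become significant once their trials are pooled.
# The counts are invented for illustration only.
from scipy.stats import binomtest

studies = [(30, 50), (30, 50), (30, 50)]  # (correct, trials) per hypothetical study

for i, (k, n) in enumerate(studies, 1):
    p = binomtest(k, n, p=0.5, alternative="greater").pvalue
    print(f"study {i}: {k}/{n} correct, one-sided p = {p:.3f}")  # ~0.10 each

k_total = sum(k for k, _ in studies)
n_total = sum(n for _, n in studies)
p_pooled = binomtest(k_total, n_total, p=0.5, alternative="greater").pvalue
print(f"pooled: {k_total}/{n_total} correct, one-sided p = {p_pooled:.3f}")  # well below 0.05
# If each study's small excess over 50% came from its own flaw (filter, dither,
# IM), the pooled "significant" result simply inherits all three flaws.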

Quote
"cheap get-out-of-jail answer" might be a suitable phrase for forums, but in science? What else should an author do if a meta-analysis or systematic review did not find overwhelming evidence for a hypothesis?
That's why I directed the argument at you. Reiss is of course entitled to recommend further research. I just would have wished he didn't oversell his own results.

Quote
Besides the fact that "in-band ripple" in the frequency domain is related to "pre-ringing" in the time domain: if it is not directly related to the format under test (i.e. unavoidable within the limits of that format), then it should be treated as a confounder.
I fully agree. I think the studies used by Reiss should be put under some scrutiny regarding this possibility.

Quote
Quote
What makes it obvious to you that I must have missed it?
A sentence from your German blog:
Quote
Die AES hat meines Wissens auch keine Presserklärung darüber herausgegeben.
That was my state of knowledge at the point in time when I wrote it. Shortly afterwards, but well before our exchange here, I became aware of the AES press release and duly amended my blog post with an update, which you appear to have missed. I chose to correct it in the form of an update and leave the original text there, for transparency.