It's important to understand that what JJ considers a listening test and what the ABX/Hydrogenaudio skeptics crowd considers a listening test are two very different things.
Perhaps JJ can explain what he considers a listening test and how it differs from the Hydrogenaudio standpoint. I was somehow under the impression they were not that different.
Including positive and negative controls, lots of training for the test as well as familiarity with the equipment and music, and equipment validation are the biggies. Test evaluation might be an issue, too. Many tests, including some of the MPEG tests and 1116, assume that the entire population reacts the same way to impairments. While basic masking is universal, what people dislike when they can hear something is NOT universal.
Pio's post does make mention of relegating ABX testing to practice trials, so training is touched upon at least indirectly. I don't see that we should go out of our way to engage in some debate by proxy. Maybe those players who are members here can have the debate here. Those who are not members can certainly join so long as they do so in compliance with our rules, namely TOS12. Personally, I'm not interested in advocating for yet another thread dedicated to trolling TOS8, however (somewhat related to those who aren't welcome back per TOS12).
With all due respect to Mr. J., while his criticism of many of our public mass listening tests is valid, we do not stick to that approach dogmatically. Our intent in those tests is to get ordinary-citizen feedback regarding codec quality. All of his criticisms can be addressed without violating TOS8 in any way; they just make tests harder to conduct. We're aiming to maximize the audience we get feedback from, not to maximize the quality of the results. Furthermore, as an Internet-based community, some criticisms are nigh impossible to address, such as equipment validation.
I think the criticisms were made in good faith without the intent of demeaning what we do. I fear that this thread will become divisive.
Lots of the personal listening tests are by people with considerable training. As for ABX tests by members here, the goal is to determine whether a given file or system is good enough for that individual. In this case training may not even be desirable, let alone necessary. I think it comes down to what you want to measure and how you analyze the results.
Honestly, I think our procedure is fine, given what we're trying to achieve. We get statistically significant results. There's no need to change anything. We can run tests with altered procedure, should there be a desire, but what would the goal of such a test be?
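For anyone unfamiliar with how "statistically significant" is usually established here: a forced-choice ABX result is typically checked with a one-sided binomial test against the 50% guessing rate. A minimal sketch in Python (the 12-of-16 figures are illustrative only, not from any particular HA test):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: probability of getting at least
    `correct` hits out of `trials` purely by guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# e.g. 12 correct answers out of 16 trials
print(f"p = {abx_p_value(12, 16):.4f}")  # ~0.0384, below the usual 0.05 threshold
```

Note this only tells you the chance of the score arising by guessing; it says nothing about training, controls, or equipment, which is exactly where jj's criticisms bite.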
Would it help if I put air-quotes around "guilty" and "charged"?
Quote from: krabapple on 23 November, 2012, 10:09:07 PM
Would it help if I put air-quotes around "guilty" and "charged"?
Not that it has anything to do with attracting TOS8 bashing, but you should suggest a new title that is compliant with TOS #6. The current one doesn't make the grade, with or without scary quotes.
Also, if we're talking about forum policy, this discussion belongs in site related discussion, not listening tests. Please read the subforum descriptions if you haven't already.
How does what I said have anything to do with TOS 8 bashing?
I am frankly surprised at the apparent offense taken to what I said. I'm simply describing standard practice.
Quote
Also, if we're talking about forum policy, this discussion belongs in site related discussion, not listening tests. Please read the subforum descriptions if you haven't already.
Seems to me it's a bit of both, and Listening Tests is the more specific of the two. But feel free to move it wherever you think it fits best.
One of the big failures of that kind of testing is the forced ranking. Such tests assume that relative rankings are transitive. We all know better.
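The intransitivity point can be made concrete: three listeners can each hold perfectly consistent individual rankings of codecs A, B and C, yet the group's pairwise majority preference forms a cycle, so no single forced ranking represents the panel. A small sketch with invented ballots (the classic Condorcet cycle):

```python
# Three hypothetical listeners, each ranking codecs A, B, C (best first)
ballots = [("A", "B", "C"),
           ("B", "C", "A"),
           ("C", "A", "B")]

def majority_prefers(x: str, y: str) -> bool:
    """True if more listeners rank x above y than y above x."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")
# All three print True: A beats B, B beats C, C beats A -- a cycle,
# so any forced total ranking misrepresents the group's preferences.
```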
On the other hand, listening testers might self-select anyway, so those who go to the trouble of taking such tests may well find a request for additional documentation of their listening experience, training, etc. to be reasonable. And such documentation would be extremely useful for treating HA test results as an adjunct to clinical-/institutional-grade listening tests of the sort that jj describes.
Sorry, "transitive" doesn't describe your central idea well enough, and I'm quite sure people will interpret it in different (read: wrong) ways. You're questioning not only HA's methodology but the whole of ABC/HR, and hence all the previous tests that were used for standardization of lossy encoders. But that's not an issue. Everybody is free to believe and express ideas freely.
With the talk about "including positive and negative controls", isn't this base already covered? We've been including low and high anchors for a while now. Is there more to this criticism than just the two forms of anchor?
A negative control is A vs. A, presented as ABX or ABC/HR, of course. If that's what you mean by 'high anchor', that's good. A positive control might be a low anchor, but you would then perhaps want multiple anchors. So anchors that are not tests of identity can all be positive controls IF they should all be audible. Basically, you want a positive control at a level equal to your desired test sensitivity. Yes, I know, this isn't the easiest thing in the world to spec. But any test result has to show the results of the controls. Anchors are generally for a different purpose, that of relating one test to another, of course.
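The point about matching the positive control to the desired sensitivity can be quantified. Under a simple binomial model (assumed here for illustration, with independent trials), you can ask: if a listener genuinely hears an impairment some fraction of the time, how often will they actually clear the usual p < 0.05 bar?

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_hits_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest hit count whose one-sided p-value (vs. 50% guessing) is < alpha."""
    for k in range(trials + 1):
        if binom_tail(trials, k, 0.5) < alpha:
            return k
    return trials + 1  # alpha unreachable with this few trials

def power(trials: int, hit_rate: float, alpha: float = 0.05) -> float:
    """Chance a listener with the given per-trial hit rate passes the test."""
    k = min_hits_for_significance(trials, alpha)
    return binom_tail(trials, k, hit_rate)

# e.g. a positive control heard correctly 70% of the time, over 16 trials
print(f"power = {power(16, 0.7):.2f}")
```

With these illustrative numbers, a control that is genuinely audible 70% of the time yields significance only about 45% of the time in 16 trials, which is why a failed positive control doesn't automatically condemn the listener, and why the control results have to be reported alongside the main result.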