Should HA promote a more rigorous listening test protocol?
Reply #24 – 2012-11-27 14:05:13
Quote:
"A negative control is A vs. A, presented as ABX or ABC/hr, of course. If that's what you mean by 'high anchor', that's good. A positive control might be a low anchor, but you would then perhaps want multiple anchors. So anchors that are not tests of identity can all be positive controls IF they should all be audible. Basically, you want a positive control at a level equal to your desired test sensitivity. Yes, I know, this isn't the easiest thing in the world to spec. But any test result has to show the results of the controls. Anchors are generally for a different purpose, that of relating one test to another, of course."

I think I understand now. We're talking about Control as in the Control Condition in a Controlled Experiment, where the Control is used for comparison against the Test Condition.

Negative Control in this case does not refer to negative or positive numbers, but to a Null Condition where no difference should be expected. This means that the Negative Control is there to catch False Positives (where listeners falsely detect non-transparency). We are comparing the original sample (or possibly the high anchor) with itself, so we should expect no difference. This screens out testers who claim to discern a difference when they cannot, whether they believe they can because of expectation bias or something similar, or are simply tempted to score somewhat at random.

All the recent HA public listening tests include in their methodology a rule for excluding the results of any sample & tester pair in which the reference sample is rated as impaired. Given that ABC/HR is used, in cases of uncertainty (i.e. non-obvious flaws) a tester should perform an ABX to verify that a difference is discernible before committing their ranking.

It then seems obvious that a Positive Control is a sample that should be audibly inferior to the reference for all listeners, but not necessarily inferior to all the samples under test.
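To make the "verify with ABX" step concrete: the question an ABX run answers is how likely the tester's score would be under pure guessing. This is just a one-sided binomial calculation; the sketch below is not any HA tool, just the underlying arithmetic.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value for an ABX run: the probability of
    getting at least `correct` out of `trials` right by guessing
    (chance of success on each trial is 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

For example, 14 correct out of 16 trials gives abx_p_value(14, 16) ≈ 0.0021, comfortably below the usual 0.05 criterion, while 8/16 is no better than coin-flipping. A tester who cannot get a small p-value here has no business ranking that sample as impaired.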
The Positive Control is there principally to catch False Negatives (where people think something is transparent when it isn't). It's difficult to get the idea that 'negative' = 'bad' out of one's mind; in this context 'negative' means 'good', as in 'unable to detect a difference from the reference'.

In some cases the positive control is a low-pass filtered sample. In the recent speech codec comparisons conducted by Google to evaluate Opus (in its SILK and Hybrid modes) against other speech codecs, there were both a 3.5 kHz LPF and a 7 kHz LPF in the test, which should function as Positive Controls and as something of an anchor for comparison between different listening tests.

In recent HA tests the low anchor has consistently been scored low by all participants who weren't excluded, if I recall correctly, which tends to indicate that False Negatives (false transparency results) have been screened out. Usually the low anchor falls below all the tested codecs on every sample. There may be scope for an intermediate anchor whose quality falls consistently within the range of impairments expected from the codecs under test. The problem may be that the nature of an anchor's impairment is consistent, making it too easy to identify.

We usually do plot the low anchor in HA public listening tests, but not the reference, though one or two tests did use a high anchor that was not the original audio and plotted it. Where a ranked reference results in exclusion from the results, the screened results will by construction place the Negative Control (for False Positives) at the screening level (typically 5.0), making a plot of those values trivial.
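The screening rule described above (drop every sample & tester result set in which the hidden reference was rated as impaired) is simple enough to sketch. This is only an illustration of the logic, not the actual scripts used in HA tests; the data layout and names ('reference', the 5.0 threshold) are assumptions.

```python
def screen_results(ratings, threshold=5.0):
    """Drop every (listener, sample) result set in which the hidden
    reference was rated below `threshold`, i.e. a False Positive:
    the listener 'heard' an impairment in the unmodified audio.

    `ratings` maps (listener, sample) -> {condition: score}, where
    one condition is named 'reference'. Layout is illustrative only.
    """
    return {key: scores for key, scores in ratings.items()
            if scores.get('reference', threshold) >= threshold}
```

After this pass, every surviving result set has its reference pinned at the screening level or above, which is why plotting the Negative Control is trivial: it sits at 5.0 by construction.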