What is the justification for the "dashed" portion of the curve? Shouldn't it be a flat line once you reach "imperceptible"? If not, once something is imperceptible, how can it become "more imperceptible"?
Without that dashed section, assessment of quality beyond perception is just impossible.
Exactly! Which is as it should be - there is no change in "quality" beyond the point of perception, unless you're defining "quality" to mean something imperceptible.
If you want to build human-hearing-oriented audio metrics for the region beyond the perception point (p-point), you will need some psychometric relationship in that region, which is impossible by definition – you can't research perception beyond the p-point.
Each masking threshold was determined by a 3-interval, forced-choice task, using a one-up, two-down transformed staircase tracking method. This procedure yields the threshold at which the listener will detect the target 70.7% of the time [Levitt, 1971]. The process is as follows.

For each individual measurement, the subject is played three stimuli, denoted A, B, and C. Two presentations consist of the masker only, whilst the third consists of the masker and target. The order of presentation is randomised, and the subject is required to identify the odd one out, thus determining whether A, B, or C contains the target. The subject is required to choose one of the three presentations in order to continue with the test, even if this choice is pure guesswork, hence the title "forced-choice task." If the subject fails to identify the target signal, the amplitude of the target is raised by 1 dB for the next presentation. If the subject correctly identifies the target signal twice in succession, then the amplitude of the target is reduced by 1 dB for the next presentation. Hence the amplitude of the target should oscillate about the threshold of detection, as shown in Figure 6.5. In practice, mistakes and lucky guesses by the listener typically cause the amplitude of the target to vary over a greater range than that shown. A reversal (denoted by an asterisk in Figure 6.5) indicates the first incorrect identification following a series of successes (upper asterisks), or the first pair of correct identifications following a series of failures (lower asterisks). The amplitudes at which these reversals occur are averaged to give the final masked threshold. An even number of reversals must be averaged, since an odd number would cause a positive or negative bias. Throughout these tests, the final six (out of eight) reversals were averaged to calculate each masked threshold.

The initial amplitude of the target is set such that it should be easily audible. Before the first reversal, whenever the subject correctly identifies the target twice, the amplitude is reduced by 6 dB. After the first reversal, whenever the subject fails to identify the target, the amplitude is increased by 4 dB. After the second reversal, whenever the subject correctly identifies the target twice, the amplitude is reduced by 2 dB. After the third reversal, the amplitude is always changed by 1 dB, and the following six reversals are averaged to calculate each masked threshold. This procedure allows the target amplitude to rapidly approach the masked threshold, and then finely track it. If the target amplitude were changed in 1 dB steps initially, then the descent to the masked threshold would take considerably longer, and add greatly to listener fatigue. In the case where the listener fails to identify the target initially, the target amplitude is increased by 6 dB for each failed identification, up to the maximum allowed by the replay system (90 dB peak SPL at the listener's head).
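To make the step-size schedule above concrete, here is a minimal sketch of such a track in Python. The listener model is a toy psychometric function with forced-choice guessing, and all names and parameters are illustrative – this is not the actual test software, just the one-up, two-down logic as I read it:

```python
import random

def simulate_track(true_threshold, start_db=40.0, max_db=90.0,
                   total_reversals=8, avg_last=6, seed=1):
    """One-up, two-down staircase (Levitt, 1971), converging on the
    70.7%-correct point. Step size shrinks 6 -> 4 -> 2 -> 1 dB over
    the first three reversals, as described above."""
    rng = random.Random(seed)

    def detects(level_db):
        # Toy sigmoid around the true threshold, plus 1-in-3 lucky
        # guesses from the 3-interval forced choice.
        p = 1.0 / (1.0 + 10.0 ** ((true_threshold - level_db) / 4.0))
        return rng.random() < p or rng.random() < 1.0 / 3.0

    steps = [6.0, 4.0, 2.0, 1.0]   # dB step by number of reversals so far
    level, direction, run, reversals = start_db, 0, 0, []

    while len(reversals) < total_reversals:
        if detects(level):
            run += 1
            move = -1 if run == 2 else 0   # down only after two correct
        else:
            move = +1                      # up after any single miss
        if move:
            run = 0
            if direction and move != direction:
                reversals.append(level)    # direction change = reversal
            direction = move
            step = steps[min(len(reversals), len(steps) - 1)]
            level = min(max_db, level + move * step)

    # Average the final six of the eight reversals for the threshold.
    return sum(reversals[-avg_last:]) / avg_last

print(f"estimated masked threshold: {simulate_track(20.0):.1f} dB")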
I repeat my original assertion - the curve should be a flat line when it reaches the point labeled "imperceptible".
> human-hearing-oriented
> beyond perception point

Does not compute.
You are talking about "quality margin", but there is no such thing as absolute quality. Quality is, essentially, a measure of fitness for a particular purpose. That is, the notion of quality is always related to a particular application, or a defined set of applications.

So, what kind of application does your extrapolated curve relate to? If the purpose is simply to compare two codecs or devices as applied to the perceived quality of audio reproduction, then the part of the curve below the "imperceptible" threshold should be sufficient. What purpose does the extrapolated part serve then?
If this extrapolation is sane (I have no idea if it is), then one could predict the outcome of exhaustive, expensive listening experiments from small ones, and say something clever about the likelihood of a given flaw ever being detected, right?
This has been discussed on the forum on more than one occasion. While Serge may take his method seriously, HA does not.
Just to be clear, your graph example shows grades where the default noise level (0 dB) is quite objectionable, and reducing the noise makes it less and less so – correct?

But with codec testing, you do kind of the opposite. The default noise level (0 dB) is usually indistinguishable/transparent, or very nearly so, and to build the "worse quality" part of the curve (the part where people can hear the noise), you have to amplify the coding noise – correct?
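A minimal sketch of what "amplifying the coding noise" means in practice, assuming the signals are already time- and gain-aligned; the function name and the dB convention here are my own illustration, not SE's actual processing chain:

```python
import numpy as np

def stimulus_with_scaled_noise(original, coded, gain_db):
    """Extract the coding noise as a difference signal and add it back
    at an adjustable level: gain_db = 0 reproduces the codec output,
    positive values amplify the noise until it becomes audible,
    negative values attenuate it."""
    diff = coded - original              # the coding noise
    scale = 10.0 ** (gain_db / 20.0)     # dB -> linear amplitude
    return original + scale * diff

# Example: elevate the coding noise by 12 dB for a listening trial.
fs = 48000
t = np.arange(fs) / fs
original = np.sin(2 * np.pi * 440 * t)
coded = original + 0.001 * np.random.randn(fs)  # stand-in for codec output
trial = stimulus_with_scaled_noise(original, coded, gain_db=12.0)
```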
People in this thread are saying the scale beyond "imperceptible" makes no sense. I'm not sure if that's true or not. What you're "measuring" (I put that in quotes - see later) is how far the coding noise sits below the threshold of audibility. (or above, if it's audible at the default level). If the second-order curve theory holds true, then to do this you only need sufficient points on the curve where the difference is audible. Points on the curve where the difference is inaudible don't help because it does become a flat line there.
There are several accepted ways to judge the threshold of audibility; I used the one quoted above.

It seems to me that your method is far kinder to listeners. If your second-order curve fitting can be justified, then it's a really neat way of finding the threshold of audibility (the crossover from 5.0 "imperceptible" to 4.9 "just perceptible but not annoying" on the usual scale) without even having to test at that (difficult) level.
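If I've understood the fit-and-extrapolate step, it looks something like the sketch below. The quadratic model and the five-grade scale come from this thread; the data points are invented purely for illustration:

```python
import numpy as np

# Invented illustrative data: gain applied to the coding noise (dB)
# vs. mean subjective grade (ITU-R five-grade impairment scale).
gain_db = np.array([6.0, 12.0, 18.0, 24.0])
grade = np.array([4.6, 3.9, 2.8, 1.6])

# Second-order fit over the audible region: grade = a*g**2 + b*g + c.
a, b, c = np.polyfit(gain_db, grade, deg=2)

# Threshold of audibility: where the fitted curve crosses grade 5.0,
# i.e. the 5.0 "imperceptible" / 4.9 "perceptible, not annoying" boundary.
roots = [r.real for r in np.roots([a, b, c - 5.0]) if abs(r.imag) < 1e-9]
threshold = min(roots, key=lambda r: abs(r - gain_db.min()))
print(f"grade 5.0 crossing at about {threshold:+.1f} dB noise gain")
```

So the threshold drops out of points that are all comfortably audible, without ever asking listeners to grade at the threshold itself.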
So far so good. What I'm less convinced of is the implication that a given codec has so much "headroom", and that this is a "good thing". e.g. on the range of content tested, at a given bitrate/setting, a given codec might be transparent even with the noise elevated by 12 dB. It scores well in your test. Fair enough. IMO it would be wrong to draw too much from this conclusion. e.g.

1. It's tempting to think this means it's suitable for transcoding, but it might not be – it might fall apart when transcoded.
2. It's tempting to think this means that audible artefacts will be rarer (and/or less bad) with this codec than with one where the noise becomes audible when elevated by 3 dB, but this might be very wrong – this wonderful codec which keeps coding noise 12 dB below the threshold of audibility on the content tested might fall apart horribly on some piece of content that hasn't been tested.
SE testing methodology is new and questionable, but all the assumptions look reasonable, and the SE ratings look promising – at least to me. Time will tell.
Seeing as all this will be entirely dependent on the short-term spectrum of both signal and interferer, I wonder how you can develop any "metric" that is not specifically designed for one track, or one short bit of music.
In your example, I see no accounting for spectra, which is a key factor for the human auditory system.
Quote from: Woodinville on 26 November, 2010, 02:25:40 AM
> Seeing as all this will be entirely dependent on the short-term spectrum of both signal and interferer, I wonder how you can develop any "metric" that is not specifically designed for one track, or one short bit of music.

The metric works as long as you can measure Diff.Level (always) and estimate the annoyance of the difference signal in some sound excerpt (not always – for long excerpts the term "basic audio quality" could be inapplicable). In short: if listening tests are valid for the excerpt, the metric is valid too.
Um, I don't think so. I can measure a difference level that is exactly the same, i.e. the same exact SNR, and have enormously different perceived quality. See the "13 dB miracle", please.
Exactly – the "different perceived quality" will be revealed during listening tests, and this will be reflected by the psychometric curve above. So the same Diff.Level will be mapped to different points on the subjective scale because of the different curves.
So, then, this curve of yours is only useful to compare like to like. This is, simply put, not very useful. I don't get your point here.
Inaudible impairments, or impairments near the threshold of audibility, require a new method to assess the quality. A variable amplification of the impairments to provide detection can be realized with the help of the difference signal. In a large listening test, the coding margin for 14 test items was measured. A time-varying filter bank to modify the difference signal and to enhance the listening conditions is described.
NB: Just after posting I found an old HA post from Serge. Apparently it's his own paper, although the paper states "Feiten, Bernhard" as the author from Deutsche Telekom.
That's a mighty big if. For years people have requested verification and none has been forthcoming.