Topic: Objective difference measurements to predict listening test results?


Reply #25
4 predictions? More like 1. You're effectively holding AAC, hitting MP3 sloppily and missing Opus.

There are 4 (Table 7). At the moment they are of little practical use for the general public. They are here to show the method of error assessment for the model/predictions, the method that will be used throughout the research.

That's way more words than needed to admit that you're talking about 3 minor variants of AAC, not 3 unique predictions, no?

Three variants of AAC, plus Lame. If cluster analysis shows other codecs in this group, then predictions will be possible for those codecs as well.
keeping audio clear together - soundexpert.org


Reply #26
With what is basically a difference signal, I would expect that if these difference signals have high correlation, then they also sound similar. That's not the objection though. The objection is that you always need listening tests first, but then you already have the scores (which you could inter- or even extrapolate if you have enough data points)...
"I hear it when I see it."


Reply #27
... The objection is that you always need listening tests first ...

Old listening tests are also suitable.
keeping audio clear together - soundexpert.org


Reply #28
Yes, doesn't matter if new or old.

I'm still not sure how the Df score is helping...
"I hear it when I see it."


Reply #29
I'm still not sure how the Df score is helping...

Assuming the Df-SQ relationship exists, here are the benefits we get:

1. More productive listening tests. Organization of a listening test can start by choosing many more contenders than usual – 10-20, for example. After the contenders process some sound material, cluster analysis will show the groups with similar Df signatures. For each group it suffices for 4-5 contenders to actually be tested by listeners; the scores of the others can be computed with known accuracy. New contenders (say, with different settings) can be added to the test at any time later if they fall into an existing group.

2. Objective measurements with high correlation to subjective scores. The similarity of Df signatures between analog audio devices is much closer than between codecs. It means that, using a relatively large and varied sound sequence, we can compute a Df value that says a lot (but not everything, of course) about the audio performance of a device and matches well with the auditory impression of that device. A single parameter that is both highly informative and relatively easy to measure.
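The grouping step in point 1 can be sketched with a toy computation. Everything below is invented for illustration – the Df values, the correlation threshold, and the greedy grouping rule are assumptions, not SoundExpert's actual cluster analysis:

```python
import numpy as np

def signature_groups(sigs, r_min=0.95):
    """Greedily label rows whose Pearson correlation with a group seed exceeds r_min."""
    labels = [-1] * len(sigs)
    next_label = 0
    for i, seed in enumerate(sigs):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(sigs)):
            if labels[j] == -1 and np.corrcoef(seed, sigs[j])[0, 1] > r_min:
                labels[j] = next_label
        next_label += 1
    return labels

# Hypothetical per-sample Df values (dB) for 6 contenders over 5 sound samples.
df_signatures = np.array([
    [-21.0, -18.5, -24.0, -19.5, -22.0],  # codec A, setting 1
    [-20.5, -18.0, -23.5, -19.0, -21.5],  # codec A, setting 2
    [-20.8, -18.2, -23.8, -19.2, -21.8],  # codec A, setting 3
    [-15.0, -25.0, -17.0, -26.0, -14.0],  # codec B, setting 1
    [-15.5, -24.5, -17.5, -25.5, -14.5],  # codec B, setting 2
    [-30.0, -12.0, -28.0, -11.0, -29.0],  # codec C
])

print(signature_groups(df_signatures))  # -> [0, 0, 0, 1, 1, 2]
```

Contenders sharing a label could then share one Df-to-score trend line, so only a few of them would need real listeners.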
keeping audio clear together - soundexpert.org


Reply #30
For (2), I have to say I am not satisfied with the current low number of degrees of freedom. If you draw a line from only 4 points, only 2 degrees of freedom will be left. Removing one further point results in 1 degree of freedom. It means the prediction will be very unreliable.
In our particular case we already have 4 predictions and can roughly assess their reliability (Table 7): the max error is 5.86% and the RMS error is 0.10. Having only 3 points (one of them – lame – not quite suitable according to cluster analysis), this is not a bad result. Thus, we have an instrument for assessing the reliability of predicted scores and can research how this reliability depends on, for example, the number of points or the average distance between Df sequences. For that purpose other listening tests should be examined (with more codecs tested).
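The two reliability figures quoted above (max error as a percentage of the actual score, RMS error in score units) can be computed mechanically; the four predicted/actual score pairs below are invented for illustration, not the real Table 7 values:

```python
# Invented predicted/actual listening-test score pairs; NOT the real Table 7 data.
actual    = [4.80, 4.65, 4.70, 4.20]
predicted = [4.72, 4.75, 4.62, 4.45]

errors = [p - a for p, a in zip(predicted, actual)]
max_error_pct = max(abs(e) / a * 100 for e, a in zip(errors, actual))
rms_error = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"max error: {max_error_pct:.2f}%, RMS error: {rms_error:.2f}")
```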

I wouldn't trust an assessment of reliability, even if it includes the max error and RMS error, if the data is derived from only four samples.
Statisticians wouldn't use such a small sample size.

This is the observed result.


Imagine testing more codecs, and adding the results to the graph. It can be like this.


Or it can be like this.


Small sample sizes are vulnerable to heavy-tailed distributions.
My listening test includes 74 tracks. Why not use the individual track data, to gain over 70 degrees of freedom?
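The degrees-of-freedom arithmetic behind this objection can be made explicit: a straight-line fit estimates two parameters (slope and intercept), leaving n − 2 residual degrees of freedom. A trivial sketch:

```python
def line_fit_dof(n_points, n_params=2):
    """Residual degrees of freedom after fitting n_params parameters."""
    return n_points - n_params

print(line_fit_dof(4))   # 4 codec averages     -> 2
print(line_fit_dof(3))   # drop one point       -> 1
print(line_fit_dof(74))  # 74 per-track points  -> 72
```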


Reply #31
Hmmm... and I'm guessing the strangely high current maximum scores for e.g. aoTuV B4.51 on the 128-kbps SE test site (over 14 when 5.0 is already defined as transparent) are due to the same problem, right?

See discussion in this HA thread, edit: especially this post.

Chris
If I don't reply to your reply, it means I agree with you.


Reply #32
Sorry for the time lapse; the research goes on.

As the method and instruments of the research are now quite well defined, I'm going to examine 7-10 more listening test cases (the more the better). Almost any listening test can be used for this purpose if it is possible to download the test sound samples and the exact codec contenders. One such well-documented test (thanks to Kamedo2 again) was conducted in July 2014. I have already started to process its results. Probably other HA listening tests could also be used. If you have info about any listening test with codecs and samples available, please share.
keeping audio clear together - soundexpert.org


Reply #33
Or it can be like this.


Small sample sizes are vulnerable to heavy-tailed distributions.
My listening test includes 74 tracks. Why not use the individual track data, to gain over 70 degrees of freedom?
I'm pretty sure the result of using individual tracks will be like the one above.


Hmmm... and I'm guessing the strangely high current maximum scores for e.g. aoTuV B4.51 on the 128-kbps SE test site (over 14 when 5.0 is already defined as transparent) are due to the same problem, right?
The ratings on that page are computed analytically on the basis of 3 points from real listening tests. So, when the number of returned grades is not sufficient, the rating can change a lot with each new grade added. The average rating (3) is less prone to this effect, but the high (4) and low (5) ones are.
keeping audio clear together - soundexpert.org


Reply #34
The HA July 2014 listening test has been added to the research as Case #2. Below is the Df vs. QScore scatter plot for 40 samples and 4 codecs:



Next will be the Public AAC Listening Test @ 96 kbps (July 2011).
keeping audio clear together - soundexpert.org


Reply #35
And that is showing that e.g. -17 dB can mean anything, really? Quality scores range from 1 to 5.

What the difference shows is just an obvious general trend: if there are little or no measurable differences, then quality is likely to be higher than if there are huge measurable differences.
But that does not make it very useful for assessing perceived sound quality.
"I hear it when I see it."


Reply #36
And that is showing that e.g. -17 dB can mean anything, really? Quality scores range from 1 to 5.

What the difference shows is just an obvious general trend: if there are little or no measurable differences, then quality is likely to be higher than if there are huge measurable differences.
But that does not make it very useful for assessing perceived sound quality.
In our case, knowing the "trend" means that you have some equation which connects measurable differences with quality score averages. And thus, knowing the average of the differences, say -17 dB, you can compute the average score with known accuracy.
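As a sketch of what such an equation could look like, assuming a simple linear Df-to-score trend within one cluster (the four (Df, score) pairs below are invented, not SoundExpert data):

```python
import numpy as np

# Invented (average Df in dB, average listening-test score) pairs for four codecs.
df_db  = np.array([-25.0, -21.0, -17.0, -13.0])
scores = np.array([4.8, 4.4, 4.0, 3.6])

slope, intercept = np.polyfit(df_db, scores, 1)  # least-squares trend line
print(round(slope * -17.0 + intercept, 2))       # predicted score at -17 dB -> 4.0
```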
keeping audio clear together - soundexpert.org


Reply #37
The Public AAC Listening Test @ 96 kbps (July 2011) has been added to the research as Case #3. Now a disturbing surprise: for the two close AAC codecs Nero and CT (close according to the distance measure, Fig.303), the lower Df value corresponds to the lower subjective quality score.



Looking for the next listening test to explore.
keeping audio clear together - soundexpert.org


Reply #38
The next listening test to explore will be the Public MP3 Listening Test @ 128 kbps (October 2008).

The older the test, the harder it is to find the samples and contenders. For that test I found everything except three samples:

 - Vangelis_Chariots_of_Fire
 - White_Stripes_Hypnotize
 - sfbay

Please share if somebody still has them.
keeping audio clear together - soundexpert.org


Reply #39
The Public AAC Listening Test @ 96 kbps (July 2011) has been added to the research as Case #3. Now a disturbing surprise: for the two close AAC codecs Nero and CT (close according to the distance measure, Fig.303), the lower Df value corresponds to the lower subjective quality score.

That is not disturbing, and not a surprise at all. It is exactly what xnor has said, and what you inadvertently asserted yourself: you can only get a generic prediction like "if the difference is below -20 dB, the score is likely to be above 3". And that's all.

Since you are not making an acoustic test (i.e. you are not judging multiple relevant aspects of the audio), you cannot get an absolute position, but only a likely position.

Concretely, the problem is as follows:
The higher the difference is, the more relevant this difference is compared to other audible artifacts, so it gets a higher weight when deciding a result.
The lower it is, the less likely it is to be the main problem, so you can get a broader range of results when judging by ear.

Re: Objective difference measurements to predict listening test results?

Reply #40
That is not disturbing, and not a surprise at all. It is exactly what xnor has said, and what you inadvertently asserted yourself: you can only get a generic prediction like "if the difference is below -20 dB, the score is likely to be above 3". And that's all.

Since you are not making an acoustic test (i.e. you are not judging multiple relevant aspects of the audio), you cannot get an absolute position, but only a likely position.

Concretely, the problem is as follows:
The higher the difference is, the more relevant this difference is compared to other audible artifacts, so it gets a higher weight when deciding a result.
The lower it is, the less likely it is to be the main problem, so you can get a broader range of results when judging by ear.
All audible artifacts happen ONLY because the initial waveform has changed. There are no artifacts without changes in the waveform. This is obvious, I think. So every artifact leaves its “tracks” in the waveform, and any such “tracks” can be measured with Df. The real problem is that in the general case there is no simple dependency between the magnitude of the audible artifacts and the magnitude of their “tracks” in the waveform. In other words, smaller “tracks” can cause greater audible artifacts and vice versa. This is because not all “tracks” are equal or similar. And here I contend that if the “tracks” are similar, then there is a simple dependency between the magnitude of the artifacts (quality scores) and the size of their “tracks” (Df). An example of such a dependency can be seen in Fig.18. It is not a “likely position” at all; it is a very clear relationship, and the more samples used, the clearer it gets.

Taking all this into account, your explanation of the “disturbing surprise” seems unconvincing to me. I think quite the opposite: the lower the Df (closer to -Inf), the shorter the range of results when judging by ear, just because the artifacts become harder to notice at low difference levels. Anyway, the research is aimed exactly at testing my hypothesis, and if you are right we will see more such disturbing occurrences. But 3 cases are really insufficient to draw any conclusions.

BTW, judging from the results of the Public MP3 Listening Test @ 128 kbps (October 2008), you participated in it. Maybe you still have the samples from that test somewhere. I found 11 of them and am still missing 3. Sebastian told me that sfbay is San Francisco Bay Blues by Eric Clapton from Unplugged. So I know all three tracks, but it's necessary to find the parts used during the test. I could probably figure out those problem parts by listening, but it would be better to know for sure.
keeping audio clear together - soundexpert.org


Reply #41
You can draw a line through all kinds of widely scattered data, even random data ... and you may also detect a trend.

Unless you can lower the variance of the data (by applying some psychoacoustic processing to your Df), the results will continue to be all over the place and the Df therefore of limited use.
In other words, you want to see data points along and close to (ideally on) this average trend line.

It already takes quite a while to compute the Df. During that time one can probably just listen to the files and actually assess the quality (because the Df as it is does not assess subjective quality).
"I hear it when I see it."


Reply #42
You can draw a line through all kinds of widely scattered data, even random data ... and you may also detect a trend.

Unless you can lower the variance of the data (by applying some psychoacoustic processing to your Df), the results will continue to be all over the place and the Df therefore of limited use.
In other words, you want to see data points along and close to (ideally on) this average trend line.
Moving the data points closer to the average trend line is neither possible nor necessary. Various sound samples naturally have different Df values, exactly as the samples have different quality scores during a listening test. The relationship/trend between Df and QS is the same no matter what you use for its computation – individual data points or their averages. The more points, the better.

Applying psychoacoustic processing to Df is the approach already used in PEAQ and the like. It also has its pros and cons.

Quote
It already takes quite a while to compute the Df. During that time one can probably just listen to the files and actually assess the quality (because the Df as it is does not assess subjective quality).
Df is intended to measure distortion of the initial waveform. It is a quantitative objective parameter like THD, IMD etc. But in some cases it can be converted into a quality score. Defining such special cases is my goal at the moment. I think it would be great if it were possible to replace listening tests with measurements in some cases and to be sure that the error of such a conversion stays below some predefined value.
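For reference, the general shape of such a waveform-difference metric can be sketched as the RMS level of (degraded − reference) relative to the reference, in dB. This is only an illustration of the idea; SoundExpert's actual Df algorithm, including any alignment steps, is not reproduced here:

```python
import numpy as np

def difference_level_db(reference, degraded):
    """RMS level of the difference signal relative to the reference, in dB."""
    diff = degraded - reference
    return 20 * np.log10(np.sqrt(np.mean(diff ** 2)) / np.sqrt(np.mean(reference ** 2)))

fs = 48000
t = np.arange(fs) / fs                           # 1 second at 48 kHz
ref = np.sin(2 * np.pi * 440 * t)                # reference tone
deg = ref + 0.01 * np.sin(2 * np.pi * 3000 * t)  # add a small spurious tone
print(round(difference_level_db(ref, deg), 1))   # -> -40.0
```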
keeping audio clear together - soundexpert.org


Reply #43
Df is intended to measure distortion of the initial waveform.
Your artifact amplification/difference method has been thoroughly debunked on this forum time and again. Why you continue to trumpet it here is anyone's guess.

It is a quantitative objective parameter like THD, IMD etc.
...and every bit as useless for judging the quality of psychoacoustic-based compression.


Reply #44
Moving the data points closer to the average trend line is neither possible nor necessary.
Then I don't see why you keep trying to assess quality with Df, because while your code has the potential to be extended in that direction, you don't seem interested. So it simply is not a subjective quality measure.

Various sound samples naturally have different Df values, exactly as the samples have different quality scores during a listening test.
Yes, like matching random numbers with more random numbers and then drawing a weak trend line through the average.


I think it would be great if it were possible to replace listening tests with measurements in some cases and to be sure that the error of such a conversion stays below some predefined value.
The only way I could see this remotely working is by narrowing your data down extremely, e.g. to the same lossy format using the same mode and possibly even the same track. In other words, the Df may tell you that a 256 CBR MP3 will probably sound better than a 192 CBR MP3 (encoded with the same encoder). So you'd have a different trend line for each such set of parameters.
The usefulness of this information is questionable, however.
"I hear it when I see it."


Reply #45
The only way I could see this remotely working is by narrowing your data down extremely, e.g. to the same lossy format using the same mode and possibly even the same track. In other words, the Df may tell you that a 256 CBR MP3 will probably sound better than a 192 CBR MP3 (encoded with the same encoder). So you'd have a different trend line for each such set of parameters.
The usefulness of this information is questionable, however.
Yes, this is the core idea of using Df measurements in the sphere of subjective evaluations (the main area for Df is objective measurements). Df can tell that one codec sounds better than another only if (1) the same set of samples is used and (2) the codecs have a similar type of processing, or in other words a similar type of waveform distortion. For any such specific case a unique trend line exists. Condition (2) can be met not only for different settings of the same codec but also for different codecs which use similar psychoacoustic models. To avoid guessing whether two types of waveform degradation are similar or not, a measure of similarity is introduced (based on distance correlation). It helps to expand the applicability of the method far beyond codec evaluations. For any two audio devices, if their types of waveform distortion are similar, then the device with the lower Df will sound better. The experiments with portable players revealed high similarity of their waveform degradation (far below the figures for codecs), so their sound quality can easily be compared just by Df measurements, given a sufficient amount of real-world music material.
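The similarity measure is said to be based on distance correlation; a generic, textbook distance correlation (Székely's dCor, not necessarily SoundExpert's exact formulation) can be computed like this:

```python
import numpy as np

def distance_correlation(x, y):
    """Textbook sample distance correlation between two 1-D sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(distance_correlation(x, 2 * x + 1), 3))  # perfectly linear -> 1.0
```

Two Df sequences with a distance correlation close to 1 would count as "similar waveform degradation" in the sense used above.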
keeping audio clear together - soundexpert.org


Reply #46
Condition (2) can be met not only for different settings of the same codec but also for different codecs which use similar psychoacoustic models.

What I find interesting about this need for similarity-in-lossy-technique is that it suggests that Df measurements may have absolutely zero correlation with audibility; rather, they may be detecting encoder settings which change along with bitrate.

In other words, have you tested Df against known-inaudible changes? Such as AAC @ 256 with a 20 kHz cutoff vs AAC @ 256 with an 18 kHz cutoff? If Df says the second one is lower quality, that's not helpful.
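As a toy probe of that question, one can brick-wall lowpass a wideband signal at the two cutoffs and feed the results to a plain difference-level metric. This is a hypothetical numpy sketch, not an actual AAC encode and not SoundExpert's Df:

```python
import numpy as np

def brickwall_lowpass(x, cutoff_hz, fs):
    """Zero out all FFT bins above cutoff_hz (idealized lowpass)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, len(x))

fs = 48000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise as a stand-in signal
a = brickwall_lowpass(x, 20000, fs)
b = brickwall_lowpass(x, 18000, fs)

# Difference level of the 18 kHz version relative to the 20 kHz one, in dB.
diff_db = 20 * np.log10(np.linalg.norm(a - b) / np.linalg.norm(a))
print(f"{diff_db:.1f} dB")  # roughly -10 dB for flat noise, despite near-inaudibility
```

A metric of this shape reports a sizeable "distortion" for the cutoff change even though, for most listeners and material, such a change would be hard to hear – which is exactly the concern raised above.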
Creature of habit.


Reply #47
"Similar" may not be good enough. I mean that is what the data seems to have shown so far.
So again, ideally you'd want the exact same distortion just with different amplitudes.
"I hear it when I see it."


Reply #48
Your artifact amplification/difference method has been thoroughly debunked on this forum time and again. Why you continue to trumpet it here is anyone's guess.
These discussions help me to improve my research and methodology.
keeping audio clear together - soundexpert.org


Reply #49
Hmm, I totally borked the quotes, but I think what I intended to say can be deciphered; QuickEdit isn't working. I'm probably blocking some scripts somewhere.

Point being that, because it only works on similar encoders, it suggests Df may be detecting inaudible changes which, while correlated with the encoder settings, aren't actually important to subjective quality.
Creature of habit.