Title: Codec overall average
Post by: Serge Smirnoff on 2013-04-13 13:07:26

I'm no great specialist in statistical analysis, but I feel that something is fundamentally wrong with the analysis I do here (http://soundexpert.org/news/-/blogs/opus-aac-and-vorbis-in-64-kbit-s-section#results). I have stumbled upon a simple and very basic question.

Per-sample means of grades and their bootstrapped confidence intervals have a clear and simple meaning. But the overall average of all grades received for a codec, with its corresponding confidence interval, seems meaningless to me, much like the average temperature across a hospital; at the very least, such values are hard to interpret and compare. It seems more reasonable to me to compute the final codec averages from per-sample means only, not from all grades. While the averages produced by the two methods are almost identical, the confidence intervals, the interpretation and the methods of further analysis differ. So my question, in short: which population do we consider when computing the overall codec average, the population of grades or the population of per-sample means?

I would be thankful if somebody could clear this up for me.
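
To make the two candidate populations concrete, here is a minimal Python sketch. The grades, the sample names and the helper functions `bootstrap` and `percentile_ci` are all made up for illustration and are not SoundExpert data or code; a plain percentile bootstrap is assumed. Method 1 resamples individual grades, method 2 resamples per-sample means.

```python
import random
from statistics import mean

# Hypothetical grades (1..5 scale) for one codec, keyed by sample name.
# The number of grades per sample differs on purpose.
grades = {
    "sample_a": [4, 5, 4, 4, 5, 4, 5, 4],
    "sample_b": [2, 3, 2, 3],
    "sample_c": [4, 3, 4, 4, 3, 4],
}

def bootstrap(statistic, data, n_boot=2000, seed=0):
    """Return n_boot replicates of statistic over resamples of data."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]

def percentile_ci(replicates, alpha=0.05):
    """Percentile interval taken from a list of bootstrap replicates."""
    ordered = sorted(replicates)
    lo = ordered[int((alpha / 2) * len(ordered))]
    hi = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lo, hi

# Method 1: population of grades. Pool every grade and resample grades.
pooled = [g for gs in grades.values() for g in gs]
pooled_mean = mean(pooled)
pooled_ci = percentile_ci(bootstrap(mean, pooled))

# Method 2: population of per-sample means. Resample the per-sample means.
sample_means = [mean(gs) for gs in grades.values()]
profile_mean = mean(sample_means)
profile_ci = percentile_ci(bootstrap(mean, sample_means))

print(f"pooled grades:    mean {pooled_mean:.2f}, CI {pooled_ci}")
print(f"per-sample means: mean {profile_mean:.2f}, CI {profile_ci}")
```

When every sample receives the same number of grades the two means coincide, but the intervals still answer different questions: the first describes uncertainty over grades, the second uncertainty over the choice of samples.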

Title: Codec overall average
Post by: Woodinville on 2013-04-14 22:08:08

This would certainly reveal more about codec performance. In fact, comparing a per-sample mean with the overall mean often tells a story by itself.

Also: confidence intervals tell a lot. When you find a sample with a wide confidence interval, it usually means that different listeners respond very differently to the distortions in that sample.

Title: Codec overall average
Post by: Serge Smirnoff on 2013-04-15 01:02:04

OK. Taking advantage of the experimental nature of the SE project and its forever-beta state, I'm going to introduce a non-standard audio quality analysis and comparison at SE. Here is a draft:

1. Treat each sound sample as revealing some aspect(s) of codec performance. The mean and confidence interval of the sample's grades are quantitative estimators of those aspect(s). A collection of such means defines the quality profile of the codec. This quality profile is specific to the particular samples used, the listening conditions and the listeners. Comparing codecs is in fact comparing their quality profiles.

2. The integral parameter of a quality profile is the mean of its collection of means. Since no assumptions can be made about the distribution of the means in the collection, only non-parametric estimators are allowed. A bootstrap confidence interval could be sufficient, though.

3. To compare different codecs (their quality profiles), some simple and clear criteria are necessary. They could be, for example, as follows:
[blockquote]codec A is considered better than codec B if ALL means of profile A are higher than the corresponding means of profile B (Low Criterion)[/blockquote]
For uncompromising audio purists and statisticians there could be a more rigorous criterion:
[blockquote]the same as the Low Criterion, but with the additional requirement that the corresponding confidence intervals do not overlap (High Criterion)[/blockquote]

Three degrees of “better” follow from this:

if the overall mean of a codec is higher but the Low Criterion is NOT met (some samples were graded higher, some lower), the codec is “conditionally better”; comparing per-sample means could reveal those “conditions” (particular weaknesses of the codec)
if the overall mean of a codec is higher and the Low Criterion is met (all samples were graded higher), the codec is “better”
if the overall mean of a codec is higher and the High Criterion is met (all samples were graded higher, without overlapping intervals), the codec is “unconditionally better”

“Worse” can be introduced accordingly if necessary.

What are the possible weaknesses or downsides of this approach?
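
The draft criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the profiles, the codec names and the functions `intervals_overlap` and `compare` are hypothetical, both profiles are assumed to cover the same samples, and the overall mean is taken as the mean of per-sample means, as in item 2.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

def compare(profile_a, profile_b):
    """Classify codec A against codec B using the draft criteria.

    Each profile maps a sample name to (mean, (ci_low, ci_high)).
    """
    samples = sorted(profile_a)
    overall_a = sum(profile_a[s][0] for s in samples) / len(samples)
    overall_b = sum(profile_b[s][0] for s in samples) / len(samples)
    if overall_a <= overall_b:
        return "not better"
    # Low Criterion: every per-sample mean of A is higher than B's.
    low = all(profile_a[s][0] > profile_b[s][0] for s in samples)
    # High Criterion: Low Criterion plus no overlapping intervals.
    high = low and not any(
        intervals_overlap(profile_a[s][1], profile_b[s][1]) for s in samples
    )
    if high:
        return "unconditionally better"
    if low:
        return "better"
    return "conditionally better"

# Hypothetical quality profiles: sample -> (mean, confidence interval).
codec_x = {"s1": (4.2, (4.0, 4.4)), "s2": (3.9, (3.6, 4.2)), "s3": (4.5, (4.3, 4.7))}
codec_y = {"s1": (3.5, (3.2, 3.8)), "s2": (3.7, (3.4, 4.0)), "s3": (3.9, (3.6, 4.2))}
codec_z = {"s1": (3.0, (2.8, 3.2)), "s2": (4.1, (3.9, 4.3)), "s3": (3.2, (3.0, 3.4))}
codec_w = {"s1": (2.0, (1.8, 2.2)), "s2": (2.5, (2.2, 2.8)), "s3": (3.0, (2.7, 3.3))}

print(compare(codec_x, codec_w))  # all means higher, no intervals overlap
print(compare(codec_x, codec_y))  # all means higher, s2 intervals overlap
print(compare(codec_x, codec_z))  # higher overall, but lower on s2
```

Note that in this sketch the confidence intervals enter only through the High Criterion; the other two verdicts depend on the per-sample means alone.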

Title: Codec overall average
Post by: Woodinville on 2013-04-15 10:29:02

There's a problem there. I know it's easy to design a codec where most subjects will give it a pass, i.e. not make any useful distinction, but a few listeners will hate, hate, hate the results.

This will widen the confidence interval, no matter how you look at it.

So it's a bit harder than what you propose.
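
The scenario described above can be illustrated with a small Python sketch. The two listener panels are invented so that both have the same mean grade, and `bootstrap_ci` is a hypothetical helper using a plain percentile bootstrap; it is not taken from any real listening test.

```python
import random
from statistics import mean

def bootstrap_ci(grades, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of grades."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(grades, k=len(grades))) for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

def width(interval):
    return interval[1] - interval[0]

# Hypothetical panels of 20 listeners grading one sample on a 1..5 scale.
consistent = [4] * 12 + [5] * 8   # everyone roughly agrees
polarized = [5] * 17 + [1] * 3    # most pass it, a few hate it

ci_consistent = bootstrap_ci(consistent)
ci_polarized = bootstrap_ci(polarized)

print(f"consistent: mean {mean(consistent):.2f}, CI width {width(ci_consistent):.2f}")
print(f"polarized:  mean {mean(polarized):.2f}, CI width {width(ci_polarized):.2f}")
```

Both panels have the same mean, so a criterion based on per-sample means alone cannot tell them apart; only the interval width hints at the few listeners who strongly dislike the result.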

Title: Codec overall average
Post by: Serge Smirnoff on 2013-04-15 11:45:21

Sorry, I didn't quite understand your example. Confidence intervals are not used directly for inference about overall quality in the proposed metric. Only per-sample means matter, and only taken together. Can you describe your example in more detail?

HydrogenAudio

Hydrogenaudio Forum => Scientific Discussion => Topic started by: Serge Smirnoff on 2013-04-13 13:07:26