HydrogenAudio

Lossy Audio Compression => MPC => Topic started by: ScorLibran on 2004-02-08 17:43:01

Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-08 17:43:01
This is a question that has floated through my mind for most of a year, but only yesterday became clearer to me.

Actually, it's a two-part question...

... 1.  Is MPC commonly accepted among the HA community as the best psychoacoustic encoding format?  (i.e., the most efficient at achieving perceptual transparency.)

... 2.  If so, why?

The first item I've heard stated quite frequently, but I've never seen any results of "transparency threshold tests" that would reveal the superior efficiency of MPC.  I've heard that MPC uses superior encoding technology, but I'm referring more to the end result of such development efforts: the perceived sound quality, as measured against other codecs at the point of perceptual transparency for a significant number of people.

These concerns on my part were born from a post I made here (http://www.hydrogenaudio.org/forums/index.php?showtopic=18491&view=findpost&p=182025), where it was brought up that MPC statistically tied the other formats at 128kbps, and that no other test results were known to exist.  The thread portion ended up in the recycle bin, but I'm taking the chance that my concerns about calling MPC "the best" weren't the reason it was put there.

Hence, I want to bring up this idea in a different context in the off-topic forum (in the hope that this will be the correct area for it).

What I'd like to see, for instance, for the education of myself and others, would be a results summary like the following (though this is a very simplistic example)...

Format............Perceptual Transparency Threshold (nominal bitrate across samples tested)
MPC.................nnn kbps
AAC.................nnn kbps
Vorbis..............nnn kbps

...and so forth

Granted, VBR is more efficient at mid-bitrates and up, and quality-based VBR modes aren't bitrate-centric, but we need some means of measurement and comparison between codecs in this context.  So if we can't call it "nominal bitrate", then perhaps "average filesize per minute of audio across all samples".

Perceptual Transparency Threshold could have a fixed target, like >90% samples with 5.0 subjective ratings, and non-differentiable from reference with ABX testing.
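Either way, the "average filesize per minute" measure suggested above reduces to the same arithmetic as a nominal bitrate. A minimal sketch (the function name and the example figures are invented for illustration):

```python
# Express an encoded sample's size as a nominal bitrate, so VBR results
# can still be lined up on one axis. 1 kbps = 1000 bits/s, the usual
# convention for audio bitrates.
def nominal_kbps(filesize_bytes: int, duration_seconds: float) -> float:
    return filesize_bytes * 8 / 1000 / duration_seconds

# e.g. a 3,000,000-byte file holding 3 minutes of audio:
print(round(nominal_kbps(3_000_000, 180), 1))  # -> 133.3
```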

This kind of test has been discussed before, and has been mostly viewed as having little "real-world value".  And I agree. Roberto's tests are much more relevant for most music listeners, and for determining the best formats for useful purposes like streaming audio, portable players, etc.

Many of us (including me) have trouble testing even these bitrate ranges, so higher ones would be even more tedious, and would answer not as many pressing questions.

My point, though, is how can MPC be called "the best for achieving transparency" without a test such as this?  (Because so far it's been shown to be only "among the best" at lower rates.)
Title: Why is MPC perceived to be the best?
Post by: sthayashi on 2004-02-08 18:53:57
1) The answer appears to be yes.
2) This result came about long before you or I joined HA, possibly before HA was even founded.

There are two specific attitudes I've found here that border on violating TOS #8.  You've picked up on one of those attitudes.  The other is that --aps uses the most efficient settings for mp3.

Many people have reiterated the above sentiments (myself included), but most of them don't know what formal testing was done to establish MPC and Lame --APS as the best.  The generic response tends to be, "They are the best; if you don't think so, scientifically* prove it wrong" (assuming ABX is scientific).

As for your table, the perceived threshold for transparency may be lower than you think.  Many people think that 128kbps is transparent.  Certainly, even more people will think that 192 is transparent.  But most of this is person dependent.  Although it's been a while since I've tested, 160 mp3 may very well fit the description of transparent for me, where I can't distinguish 90% of the samples from the original.

Likewise though, there are people on this very board who can hear the difference between the higher settings.  So creating a table would be impossible, since it's highly dependent on the person.

But I do agree with you that the lack of data concerning MPC is mildly disturbing.
Title: Why is MPC perceived to be the best?
Post by: ExUser on 2004-02-08 20:51:31
There's never been a hard-and-fast test as to which is the best. The problem is the same reason roberto hasn't attempted a test above 128kbps: on representative samples, you run into transparency.

So, the basis for Musepack's high status around here is pretty straightforward: on all the problem samples that kill most encoders, Musepack (at Q5+) tends to do better than the others. This was especially pronounced before AAC and Vorbis were contenders, when LAME was being tuned, --r3mix happened, and Frank was still heavily tuning Musepack and so on. Musepack is also less technically complex, being a subband encoder. Tuning it takes less effort than would tuning a transform encoder like Vorbis, AAC, or MP3. It also has fewer intrinsic problems with things like pre-echo.

I am uncertain if the problem samples comparisons would give the same results now. However, you're not going to get comparative information by any other means than problem samples. The issue there is that different encoding methods give different problem samples, so one that breaks Musepack horribly might not break AAC at all.

It would be nice if one of the older members here would say a word or two. The opinions I've reiterated here were "fresh" when I joined. They've sort of been passed down to later HA members by proxy.
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-08 21:52:20
Thanks for the input, guys. 

If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such.  I would think a variety of samples, mostly problem samples plus a few that were not previously considered "problem samples", might provide a fairly thorough test package without being too unmanageable.  Granted, there's no way to provide a comprehensive, "across the board" appraisal of the formats (as there never is), but testing 10 samples across a variety of music styles at near-transparent encoding rates could at least provide something to point to and say "MPC is transparent at lower encoding rates than the other formats with these samples to the ears of the people who tested them", for instance.

And different people have different transparency thresholds, certainly.  So like with any other listening test, the statistical validity of the results would depend on the number of testers providing results.  The problem is finding people willing to listen to samples encoded at close to the edge of transparency and try to pick them out from the references with any consistency.

Special baselines would have to be set for interpreting test results, a little different from previous tests, I would think.  For instance, "the threshold of perceptual transparency for a tester will be at the 90th percentile".  Interpret results by starting at the low anchor and working through the results in order of increasing encode rate within each format.  When a tester cannot differentiate 9 out of 10 samples, you've found their threshold for that format.  Repeat for each tester.

Then you average these threshold points across all testers, and you'd get a number (with an error margin) for each format.  Throw them into a table to show which one wins (or how many tie for the "win" based on error margin overlaps).
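The averaging step just described can be sketched in a few lines. The per-tester threshold figures below are invented, and the error margin is a plain normal-approximation 95% confidence interval, which is only one way to define the margin mentioned above:

```python
from statistics import mean, stdev

def summarize(thresholds_kbps: list[float]) -> tuple[float, float]:
    """Mean transparency threshold across testers, plus a ~95% error margin."""
    n = len(thresholds_kbps)
    m = mean(thresholds_kbps)
    # Normal approximation: 1.96 standard errors on either side.
    margin = 1.96 * stdev(thresholds_kbps) / n ** 0.5 if n > 1 else 0.0
    return m, margin

# Invented per-tester thresholds (kbps) for one format:
avg, err = summarize([128.0, 160.0, 144.0, 176.0, 152.0])
print(f"{avg:.0f} kbps ± {err:.0f}")  # -> 152 kbps ± 16
```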

Now, we're talking a minimum of 10 samples, let's say 5 codecs (LAME, QT-AAC, MPC, Vorbis, and WMA Pro), and at least 4 encoding rates (quality settings) for each format (perhaps targeting kilobitrates of 128, 144, 160 and 192).

10 samples x 5 formats x 4 rates = 200 test groups. 

Now it's unmanageable. 

The alternative to this kind of (likely unpopular) test, in my opinion, is that no one can say that "MPC is better".  Because based on current evidence, AAC, MPC, Vorbis and WMA Pro are equally good at sub-transparent rates, and I have yet to see evidence of which format wins in the range of perceptual transparency.

Unless someone can think of a way to provide evidence another way?
Title: Why is MPC perceived to be the best?
Post by: ExUser on 2004-02-08 22:13:53
The smartest way to do it would be to encode the problem samples in MPC q5 and q6, get the average bitrate, then approximate LAME, AAC, Vorbis, et al to that bitrate. If the test is focused on MPC, go with q5 and q6. Then you have only two rates. 10x5x2 isn't unmanageable. It's a little larger than Roberto's tests, but the problem samples tend to be shorter and more specific.
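The matching step described above could look something like this. The file sizes, durations, and the setting-to-bitrate table are all invented, and picking the nearest-bitrate setting is just one plausible rule:

```python
def mean_bitrate_kbps(sizes_bytes, durations_s):
    """Average nominal bitrate over a set of encoded samples."""
    rates = [b * 8 / 1000 / d for b, d in zip(sizes_bytes, durations_s)]
    return sum(rates) / len(rates)

def closest_setting(target_kbps, setting_to_kbps):
    """Pick the encoder setting whose average bitrate lands nearest the target."""
    return min(setting_to_kbps, key=lambda s: abs(setting_to_kbps[s] - target_kbps))

# Invented MPC q5 encodes of two problem samples (bytes, seconds):
target = mean_bitrate_kbps([420_000, 615_000], [20.0, 28.0])
# Invented average bitrates for some other encoder's quality settings:
print(closest_setting(target, {"-q4": 128.0, "-q5": 160.0, "-q6": 192.0}))  # -> -q5
```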

Problem samples are not representative, though, which will cause scoring issues.  Furthermore, because MPC is so widely used and accepted as the ultimate in lossy compression, many samples have been collected specifically to show where it fails.  Thus, we'd need to adjust the problem-sample set to balance out that bias and be a little more relevant.

Then, once we've determined where the codecs fail, start ramping up the quality levels and see which one becomes transparent first and such.

Using ABC/HR, it's no big deal if people can't tell that a certain sample sounds different. Just leave both sliders at 5, and you're happy. You won't get as many "valid" results, but you'll figure out around where the transparency level lies.

It's odd that this isn't on-topic. I'm thinking it should be moved either to general audio or to listening test discussion.

---

The thing with MPC transparency is that it has sort of been proven, in a very roundabout manner. It's just that the results are so distant from the present day that it's hard to connect the two. I'm of the opinion that we keep the "axiom" that Musepack is the best lossy audio codec, because what evidence we do have tends to corroborate it. It's the codec that's been tuned primarily for nothing but high-end transparency. And if a codec tuned for transparency works well at mid-bitrates (128), then although it doesn't follow in a mathematical sense that it achieves its goal of transparency, it's very likely.

I think there's enough "proof" that MPC is the best solution out there that we can keep the idea for the time being until more proof comes. There's yet to be proof that there's a better solution, and there are many technical reasons why Musepack is likely the most transparent solution. (Most having to do with the fact that it is much easier to tune, being subband-based, and the fact, like I've said, that the goal of the tuners was perceptual transparency)
Title: Why is MPC perceived to be the best?
Post by: ddrawley on 2004-02-09 02:38:17
The proof of MPC quality is neither vague nor simply perceived. It is, however, hard to find links to the tests that confirmed this fact. MPC seems to have fallen out of favor and popular use. This does not change the fact that, with research, two years ago I found MPC to be the undisputed leader in 'transparent' lossy codecs.

This link gives a little background:

http://xmixahlx.com/audio/sc/questions.html

This seems to be a very strong link:

http://www.ff123.net/dogies/dogies_plots.html

And another:

http://mp3.radified.com/mp3.htm

These links only represent about 15 mins work on my part. I have done more lengthy research in the past.

I am led to believe that MPC has not risen to the top due to its larger files and the lack of an active developer and champion.
Title: Why is MPC perceived to be the best?
Post by: Eli on 2004-02-09 02:45:50
It's more than the quality of the codec that draws me to MPC (though this is certainly its strongest point). But MPC also has APE tagging, gapless playback, native ReplayGain support, and low-CPU decoding (which would mean better battery life for portables, if that ever happens)...
Title: Why is MPC perceived to be the best?
Post by: rjamorim on 2004-02-09 03:07:15
Quote
If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such.

May I suggest you conduct such a test?

I would give as much support as I can.
Title: Why is MPC perceived to be the best?
Post by: indybrett on 2004-02-09 04:09:41
This is what I would like to see compared. It would take some really golden ears to do this.

mpc -q5
vorbis gt3b1 -q5
lame -aps
nero aac -transparent
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-09 06:36:29
Quote
Quote
If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such.

May I suggest you conduct such a test?

I would give as much support as I can.

Well, that's an offer I can't refuse! 

Unless anyone else would want to run one sooner, I'll have (or make) time this coming spring to do this.

As mentioned, my goal is to see, and not just for MPC, where the average transparency threshold for each of the top five codec "families" resides, based on a variety of samples and variety of listeners (as many as possible...this is critical).  ABX on every sample would be required for this test, which would make it more tedious, unfortunately.  And using a rating scale would have less meaning, and may not even be needed.  The goal is not to compare how "good" each codec sounds compared to the reference, but rather simply find the "point of threshold" for each one on each sample, then average results across samples and then across participants.

The same statistical analysis that Roberto uses should be followed, to make the results as meaningful as possible.

This would not be a bitrate-centric test, but rather an attempt to "slowly turn up the quality dial", so to speak, until the participant can no longer ABX at p<0.05.  We obviously have to start low enough that a significant number of people can actually ABX samples with that p-value or lower.  Then try (as) gradually (as possible) increasing bitrates until they can't ABX them with confidence.  This would be done with each format in the test.
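For reference, the p<0.05 criterion above is just a one-sided binomial test on the tester's ABX trial log: the p-value is the chance of scoring at least that well by guessing. A sketch (the trial counts are invented):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """P(at least `correct` right answers in `trials` coin-flip guesses)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12 of 16 correct clears the p < 0.05 bar; 11 of 16 does not:
print(round(abx_p_value(12, 16), 3))  # -> 0.038
print(round(abx_p_value(11, 16), 3))  # -> 0.105
```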

So, everyone please post whether you think...

1. this test would be possible/manageable,
2. with enough participation, results from this test would be meaningful enough to justify it, and
3. the best method and approach that could balance manageability and effectiveness.


A few preliminaries of how I'd like to do it (or see it done)...If all goes well, we could show an average perceptual transparency threshold for the music samples tested.  And something recent to point to when someone asks "where does MPC/MP3/Vorbis/WMA-Pro/AAC become transparent" or "which codec is the most efficient at mid-high bitrates"?  (Since Roberto has done so much to provide the same info in other bitrate ranges.)


Issues/Questions/Brainstorming:
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-09 08:00:19
Looks like double-nested QUOTES puts the auto-quoting function out of sorts.

Quote
Issues/Questions/Brainstorming:
  • 5 formats x 3 encoding rates x 8 samples = 120 test groups.  Is this feasible?  If not...


Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here? A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance. Otherwise just start at whatever whole number 'quality' setting (where applicable) gets you closest to 160kbps, and go from there.

Quote
  • How can people be convinced to participate in a listening test that would be more tedious than most previous tests, when some of those tests even had trouble getting enough participants themselves?


Make participation compulsory for continued HA posting rights? 

In all seriousness, maybe those who participate could receive a special 'Official HA Codec Tester' tag under their name. It is amazing what people will do for web board titles.
Title: Why is MPC perceived to be the best?
Post by: 2Bdecided on 2004-02-09 11:51:31
Just how many people are going to give you anything except 5.0 for all samples?

(I can think of some - but if you slashdot it to get a large number of listeners, I bet the percentage is low!)

There's also the possibility that people who can hear problems with various samples at the settings you suggest have already reported it here. Maybe you could somehow analyse this data?

Some people will have already made themselves more sensitive to one codec's artefacts than another's. This would likely bias your test.


If you're going to do this, try a small scale pre-test first to alert yourself to the possible problems. And ask advice from Roberto and ff123.

It's a pity the r3mix forum is dead, because it would be helpful to look at his "archive quality" test results too - I've saved them somewhere, if you're interested. I don't have the discussion surrounding them though.

Cheers,
David.
Title: Why is MPC perceived to be the best?
Post by: Der_Iltis on 2004-02-09 16:04:35
Is there any site you would suggest where I can find a scientific description of how psychoacoustic models work and what different kinds exist (subband, ???, SBR, etc.)? I'm really interested in it but don't want to spend too much money on books.
Title: Why is MPC perceived to be the best?
Post by: bubka on 2004-02-09 16:08:33
Quote
This is what I would like to see compared. It would take some really golden ears to do this.

mpc -q5
vorbis gt3b1 -q5
lame -aps
nero aac -transparent

yeah, and throw in vorbis 1.01 -q6 for the hell of it
Title: Why is MPC perceived to be the best?
Post by: sthayashi on 2004-02-09 16:47:05
Quote
How can people be convinced to participate in a listening test that would be more tedious than most previous tests, when some of those tests even had trouble getting enough participants themselves?

I can't help you with this.  Hopefully your test will be more popular than the one I attempted.

There are two additional codecs that ought to be tested, WavPack Hybrid and OptimFrog DualStream.  These are codecs that have never been formally tested in lossy modes, and Somebody should do it™

Given the nature of the test though, it may be too much.

Also, be sure not to crap out rjamorim's tests (or vice-versa).  I can give you the samples I used for my test if you'd like, which can save you a sample call.  PM me if you're interested
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-09 20:30:18
This problem has bothered me as well in the past.  Due to the posts I have read by Guruboolez, Garf and many other gifted ABXers here at HA, there isn't much doubt in my mind that MPC is, in fact, the best for quality at higher bitrates.  However, assumptions like this just don't mesh very well with the core values of HA.org; thus, IMO, they should be proven with a public listening test.  Excellent work ScorLibran, I will be happy to participate in your test this spring!  As always you add a healthy dose of extra rationality to this forum (not that there isn't enough to begin with ).

We should start making some preparations for this test right away.  It will be exceedingly time-consuming to come up with settings for all these different encoders that have the same nominal bitrates.  I suggest, since we are mainly testing Musepack here, starting at --quality 4, then going to --quality 4.1, --quality 4.2, etc., until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.  Finding equal nominal bitrates for all the encoders should be easy at --quality 4, since we already know the correct settings for 128kb/s nominal on most encoders, but finding these values for --quality 4.x (where x≠0) will probably be considerably more difficult.  We will have to encode a LOT of samples to do this.

-Tyler

P.S.  Lexx is the shit!
Title: Why is MPC perceived to be the best?
Post by: MGuti on 2004-02-09 21:00:19
If all the samples were problem samples, this would be easy. However, there's no point in using strictly problem samples.

I recommend splitting the test up by encoder. If it's a strictly ABX test, then it won't be quite as time-consuming, IMO, as a normal test. You can either tell or you can't. I would like to participate and would be glad to contribute to this (a nice little 'official tester' tag beneath my name would be nice).

Good luck. Once you decide on some samples I'll do some private testing and come up with suggestions.
Title: Why is MPC perceived to be the best?
Post by: ChristianHJW on 2004-02-09 21:03:16
To make this test sensible, you have to remove the 'noise', i.e. the people who don't have the necessary training to differentiate between these codecs.

I recommend achieving this by either

- doing a pretest, where users have to tell which is the 320 kbps MP3 and which the original CD ( quite easy  )

- add the original source ( CD ) to the listening test, and null every vote that ranks the original worse than one of the compressed samples

Just my 2 cents ....
Title: Why is MPC perceived to be the best?
Post by: Continuum on 2004-02-09 21:08:40
Some remarks:

I think neither MP3 nor Vorbis is a contender against MPC at high bitrates. Lame APS is very good but still ABX-able on many, many samples (not only problem cases), and even the Insane preset fails obviously on pre-echo material (e.g. castanets, but not only that one).
Vorbis' HF problems were easily noticeable up to -q 8 for me on nearly every quite noisy sample. Pre-echo handling is better than MP3's but still noticeable. Personally, I've only tested version 1, but I can't imagine that a huge leap forward has been made by either 1.0.1 or GT.
Wavpack does consume a lot more bits than MPC standard. Yet there are some deficiencies (according to Guru, post (http://www.hydrogenaudio.org/forums/index.php?showtopic=15073&view=findpost&p=170893)).

I can't really comment on AAC, as I've hardly tested that format. Considering that some (earlier versions of) Nero presets failed miserably with classical content, again according to Guru, and that iTunes is not really tuned for high bitrates (or is it? VBR?), I think it's still quite safe to assume that MPC is the best encoder for transparent lossy.


Quote
I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5
I don't think that would work. IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-09 21:19:44
Quote
Quote
I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5
I don't think that would work. IIRC there are some optimizations that kick in at quality level 5 (and are important to quality). 

So that may mean that we will find that MPC becomes statistically transparent at exactly --quality 5.  What would be the problem with that result?

edit: typo
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-09 23:07:18
Quote
We should start making some preparations for this test right away.  It will be exceedingly time consuming  to come up with settings for all these different encoders that all have the same nominal bitrates.  I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.  Finding equal nominal bitrates for all the encoders, should be easy at --quality 4, we already know the correct settings for 128kb/s nominal on most encoders, but trying to find these values for --quality 4.x (where x≠0) will probably be considerably more difficult.  We will have to encode a LOT of samples to do this.


While the above process would be very scientific in approach, the resulting number of test groups would be far more than (most) anyone would want to deal with, especially if the goal is to include similar tests for other codecs. You would run the very real risk of boring your test audience into giving up early, and of inducing listening fatigue that could skew the results.

Keep things simple. I would wager most people use whole-number 'quality' settings (where applicable) or other significant steps (like --alt-preset standard or extreme). To that end, these settings should be raised or lowered by these significant steps until transparency is found or lost. It would be just as relevant to prove that, in general, transparency of a given codec occurs between 'quality' 4.0 and 5.0, and far simpler to test. Future testing could isolate which particular setting is the threshold, if any single one exists.
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-10 00:16:58
Quote
Quote
We should start making some preparations for this test right away.  It will be exceedingly time consuming  to come up with settings for all these different encoders that all have the same nominal bitrates.  I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.  Finding equal nominal bitrates for all the encoders, should be easy at --quality 4, we already know the correct settings for 128kb/s nominal on most encoders, but trying to find these values for --quality 4.x (where x≠0) will probably be considerably more difficult.  We will have to encode a LOT of samples to do this.


While the above process would be very scientific in approach, the resulting number of test groups would be far more than (most) anyone would want to deal with, especially if the goal is to include similar tests for other codecs. You would run the very real risk of boring your test audience into giving up early, and initiating listening fatigue that could skew the results.

Keep things simple. I would wager most people use whole number 'quality' settings (where applicable) or other significant steps (like --alt-preset standard or extreme). To that end, these settings should be raised or lowered by these significant steps until transparency is found or lost. It would be just as relevant to prove that, in general, transparency of a given codec occurs between 'quality' 4.0 and 5.0, and be far simpler to test. Future testing could isolate which particular setting is the threshold, if any single one exists.

I agree with you in terms of tester fatigue/boredom.  However, I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

We certainly wouldn't find the exact point at which MPC becomes transparent, which is what ScorLibran originally intended:

Quote
Format............Perceptual Transparency Threshold (nominal bitrate across samples tested)
MPC.................nnn kbps
AAC.................nnn kbps
Vorbis..............nnn kbps

...and so forth
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-10 00:52:11
Quote
I agree with you in terms of tester fatigue/boredom.  However I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

From what I have read here, even this outcome is being questioned for lack of definitive proof for or against.
Quote
We certainly wouldn't find the exact point at which MPC becomes transparent, which ScorLibran originally intended...

I do not think you are going to find a single exact point for each of these codecs where all samples become transparent. The best you could hope for is a tight range that generally results in transparency. Any of the various problem samples currently at hand show that issues at quality X might or might not be fixed with quality Y.
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-10 01:08:27
Quote
Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here?

True, but I would not base this test on how the codecs are marketed.  They can call whatever they like "transparent" or "CD quality".  But ABX results don't lie. 

Quote
A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance.

That's exactly what I had in mind.

Quote
Just how many people are going to give you anything except 5.0 for all samples?

(I can think of some - but if you slashdot it to get a large number of listeners, I bet the percentage is low!)

There's also the possibility that people who can hear problems with various samples at the settings you suggest have already reported it here. Maybe you could somehow analyse this data?

Some people will have already made themselves more sensitive to one codec's artefacts than another's. This would likely bias your test.

A rating scale wouldn't even be used for this kind of test.  Only ABX.  If the tester can get p<0.05, then they "move up" to the next higher encoding rate for the format.  If they can't, then their transparency threshold for that sample encoded in that format lies below this rate and above the last one they could differentiate.

And "artifact familiarity" won't have a statistical impact if there are enough testers.  Some people would be "attuned" to a format's particular artifacts, but many others won't be.

Quote
There are two additional codecs that ought to be tested, WavPack Hybrid and OptimFrog DualStream. These are codecs that have never been formally tested in lossy modes, and Somebody should do it™

I agree that they should be tested at some point against the ones pointed out previously in this thread.  But it should really be done in a future test, because  a) there will be enough test groups as it is with the formats discussed, and b) I want to first pare down these five most commonly used formats before tackling others.

Quote
We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.

I agree.  And the more I think about it and try to "envision" what the test would be like, I'm thinking we should have one test run for each format, close enough together to minimize unfairness by "version variance" between encoders.

And I don't want to only have 3 rates, as I previously stated.  It wouldn't be enough.  "Vorbis -q 4 isn't transparent to me, but -q 5 is."  OK, so transparency for this tester on this sample in this format has been narrowed to within 32-52kbps of the "line".  Not accurate enough.  I want the scale to be as granular as possible.

As you point out in your example, I'd like to know a sample's transparency to a particular person with a format to within 10kbps or so.

Quote
I recommend splitting the test up by encoder. If it's a strictly ABX test, then it won't be quite as time consuming, IMO, as a normal test. You can either tell or you can't.

My thoughts exactly.  I'm hoping it'll make the whole thing more manageable in "smaller chunks".  But it would have to be almost a "marathon" of tests.  If we wait a month between testing each format, then too many people would say "Yeah, but you tested the old MPC against the new Vorbis v1.3", etc.  We could, if possible, prepare for testing all the formats at once (over, say, 6-8 weeks), then we could fire off one test, 11 days, then 3 days to compile and publish results, then fire off the next test, 11 days, 3 days to compile/publish, ...and so forth.  Prep time in between tests would be minimal if we were set up at the beginning as much as possible.  The whole thing, with 5 formats, would take about 10 weeks.

Quote
To make this test sensible, you have to remove the 'noise', i.e. the people who don't have the necessary training to differentiate between those codecs.

I recommend to achieve this by either

- doing a pretest, where users have to tell which is the 320 kbps MP3 and which is the original CD (quite easy)

- add the original source ( CD ) to the listening test, and null every vote that ranks the original worse than one of the compressed samples

Pre-tests may be required to determine which particular format variants would be the "most fair to test at mid-high bitrates", but there would be no subjective rating.  ABX only.  It would not be possible to "rate a reference" in this kind of test.  With each encoder setting tested, it's just p<0.05 or p>0.05.  The former shows an audible difference (so the setting is not transparent for that listener); the latter is consistent with transparency.  Maybe we should define a "gray area" of 0.05<p<0.07, perhaps, to show an "exploded view" of the threshold when compiling results.  I'm not sure of how much value this would hold, though.  It can always be determined at the end, and even shown both ways if preferred.
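For anyone wondering where the 0.05 cutoff comes from: the p-value of an ABX run is just the one-sided binomial tail, i.e. the chance of scoring at least that well by coin-flipping. A minimal Python sketch (the function name is mine, not from any ABX tool):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided p-value for an ABX run: the probability of getting
    at least `correct` answers right out of `trials` by pure
    guessing (p = 0.5 per trial)."""
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials

# 12 correct out of 16 trials lands just under the usual cutoff:
p = abx_p_value(12, 16)   # ~0.038, i.e. p < 0.05
```

So a listener who scores 12/16 on a given setting counts as hearing a difference, while 11/16 (p ~0.105) does not.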

Quote
I think it's still quite safe to assume that MPC is the best encoder for transparent lossy.

It's that word, "assume", that we will be killing with this test.  If MPC wins, no need to "assume" any more.    If not, or if it ties for the top position with other formats, then "assumptions" can summarily be corrected.

Quote
IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).

Then, as Tyler says, that is simply the nature of MPC.  As in Roberto's tests, we should seek to minimize worrying too much about how formats "scale" their quality settings.  If MPC would indeed perform better with a more shallow quality "slope" between q4.1 and q5, then maybe it should be modified to do just that.

This idea is simply to test the best encoder version that each of these formats brings to the table when measured at the threshold of perceptual transparency.  And as mentioned before, we could spend the next few weeks pre-testing the different versions of each encoder (especially the ones with newer versions), and picking ideal samples for this kind of test.
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-10 01:18:34
Quote
Quote

I agree with you in terms of tester fatigue/boredom.  However I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

From what I have read here, even this outcome is being questioned for lack of definitive proof for or against.

True, however I highly doubt we would find this to be otherwise, especially if the conditions are going to be as loose as 90% of samples, as was previously suggested.  Have you ever tried to ABX a non-problem sample at --quality 5?  Yes there might be some worth in simply testing the current settings of each encoder which are supposed to be transparent, but the test would be extremely tedious and I doubt the results would be that interesting.  Perhaps a compromise?

--quality 4
--quality 4.33
--quality 4.66
and we would only do --quality 5 and above if necessary (unlikely, IMO, but not impossible).

or --quality 4
--quality 4.25
--quality 4.5
etc.

I would like the increments to be small; however, the answer to the question "How small is too small?" is not immediately apparent to me.
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-10 01:30:07
Quote
I do not think you are going to find a single exact point for each of these codecs where all samples become transparent. The best you could hope for is a tight range that generally results in transparency. Any of the various problem samples currently at hand show that issues at quality X might or might not be fixed with quality Y.

Correct.  We would discover a different transparency threshold for each format, for each sample, by each tester.  Then build a table of results, showing the average transparency threshold for each format based on the averaged results from all testers on all tested samples.
Title: Why is MPC perceived to be the best?
Post by: Doctor on 2004-02-10 01:43:29
For narrowing down quality settings use binary search, e.g.:

q5 pass
q4 fail
q4.5 fail
q4.75 pass
q4.625 pass

ad nauseam. At some point it will be narrow enough to warrant a stop.
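The sequence above is a standard bisection; a hypothetical sketch in Python (the callback is a stand-in for an actual ABX session, and the names are mine):

```python
def find_threshold(is_transparent, lo=4.0, hi=5.0, tol=0.125):
    """Bisect the quality scale for the transparency threshold.

    `is_transparent(q)` stands in for running an ABX session at
    quality `q`: True means the listener could NOT tell the encode
    from the original.  Assumes `lo` fails and `hi` passes.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_transparent(mid):
            hi = mid   # passed: threshold is at or below mid
        else:
            lo = mid   # failed: threshold must be above mid
    return hi          # lowest setting known to pass
```

Replaying the example (q4.5 fails, q4.75 passes, q4.625 passes) homes in on q4.625 after only three listening rounds, versus ten with a fixed 0.1 grid.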

To trim down the total number of tests, shrink your goal. If you seek to verify that MPC is superior to all codecs, don't try to calculate the "transparency threshold" for each codec, just locate a bitrate that is transparent under MPC and not under the codec in question.

Finally, if you do separate tests by codec, you will open yourself up to placebo. If people strongly believe that a particular setup is transparent, they may relax and not look hard for differences. The solution to this is to provide, each time, a set of flacs without identifying what the codecs are, and reveal that only after the test.

For example, here's a plan:

Round 1:

MPC -q5 and Vorbis, LAME, WMA, AAC, whatever, bitrate-matched to MPC

Suppose ABX reveals that MPC, Vorbis and AAC pass, LAME and WMA fail => these two are dropped from future tests.

Round 2: MPC -q4 and -q4.5; two each of Vorbis and AAC bitrate-matched to the two MPC's

Suppose everything at MPC -q4 bitrate fails, Vorbis fails, MPC -q4.5 and AAC counterpart pass => drop Vorbis.

Round 3: MPC -q4.125, -q4.25, -q4.375 with AAC counterparts.

=> should settle it.
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-10 01:59:20
Quote
Quote
Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here?
True, but I would not base this test on how the codecs are marketed.  They can call whatever they like "transparent" or "CD quality".  But ABX results don't lie. 

My point was perhaps missed. You will need to start somewhere, preferably close to the threshold of transparency to save time. By starting from a particular codec's 'advertised' transparent setting, you immediately put to test the veracity of this claim. By 'advertised' I mean using official encoder settings, like with Nero's AAC Transparent, as well as common knowledge settings like Vorbis Q6 and MPC Q5. Later tests would merely isolate the general location of this threshold. This would be a fair starting point for all involved. If an 'advertised' transparent setting is used, no claims of bias could be lodged. Transparent is transparent is transparent; if someone can hear the difference, then this claim is false. If this happens to result in some codecs utilizing an overly high bitrate, then so be it. The next test (below) will show if this was truly necessary.
Quote
Quote
A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance.

That's exactly what I had in mind.

This is merely the second step of the process, following the initial step above.  Neither test is intended to stand solely by itself, although claims of transparency could be proven or disproven for the settings used in the first step.

As to issues of encoder version variance: establish a cutoff point (such as today) where only the major or stable versions of official encoders that were generally available as of that date will be used. Valid or not, the assumption can be made that later encoder releases should, if nothing else, be no worse than prior versions. Maintain the test as a snapshot of the world of audio compression on this, a normal and unimportant day.

Talk of 3 or more settings used per codec is already making my ears ache and my mind wander.  Keeping the test groups to a small, friendly number would open your test up to a larger and more active audience.  This would be good for maintaining relevance.  Anything else would merely be a test of those few individuals willing to stick it out, and they are almost guaranteed to break the curve (you know who you are).
Title: Why is MPC perceived to be the best?
Post by: Vertigo on 2004-02-10 03:00:15
Do we ask why god is omnipotent?  He just is....same with MPC. 
Title: Why is MPC perceived to be the best?
Post by: rjamorim on 2004-02-10 03:07:31
Quote
Do we ask why god is omnipotent?  He just is....same with MPC. 

ABX please
Title: Why is MPC perceived to be the best?
Post by: Dologan on 2004-02-10 03:22:41
The god part or the MPC part?
Title: Why is MPC perceived to be the best?
Post by: Mr_Rabid_Teddybear on 2004-02-10 03:33:13
Quote
Quote
Do we ask why god is omnipotent?  He just is....same with MPC. 

ABX please

....When worlds collide.......

recommended  medicine:
"Philip K. Dick: Valis"
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-10 03:59:40
Quote
For narrowing down quality settings use binary search, e.g.:

q5 pass
q4 fail
q4.5 fail
q4.75 pass
q4.625 pass

ad nauseam. At some point it will be narrow enough to warrant a stop.

To trim down the total number of tests, shrink your goal. If you seek to verify that MPC is superior to all codecs, don't try to calculate the "transparency threshold" for each codec, just locate a bitrate that is transparent under MPC and not under the codec in question.

Finally, if you do separate tests by codec, you will open yourself up to placebo. If people strongly believe that a particular setup is transparent, they may relax and not look hard for differences. The solution to this is to provide, each time, a set of flacs without identifying what the codecs are, and reveal that only after the test.

For example, here's a plan:

Round 1:

MPC -q5 and Vorbis, LAME, WMA, AAC, whatever, bitrate-matched to MPC

Suppose ABX reveals that MPC, Vorbis and AAC pass, LAME and WMA fail => these two are dropped from future tests.

Round 2: MPC -q4 and -q4.5; two each of Vorbis and AAC bitrate-matched to the two MPC's

Suppose everything at MPC -q4 bitrate fails, Vorbis fails, MPC -q4.5 and AAC counterpart pass => drop Vorbis.

Round 3: MPC -q4.125, -q4.25, -q4.375 with AAC counterparts.

=> should settle it.

Excellent system Doctor.  I bet this would be a much simpler and more efficient method than my brute force technique.  However, we need to guess at which --quality values are going to be used ahead of time so we can find the corresponding command lines for the other codecs.  Once we have some idea which --quality values will *probably* be tested, we should try to find their nominal bitrates.  To do this I propose that several of us encode a whole bunch of random tracks at a specific --quality value and then post the average bitrate we got  (you can calculate with this) (http://www.musepack-source.de/downloads/windows/tools/kbps.zip) along with the number of tracks encoded (try not to include tracks of silence or tracks with long periods of silence).  Then we can compute the overall mean: multiply each person's track count by their average bitrate, sum those products, sum the track counts, and divide the first total by the second.

Example:

Person 1:  A = [Number of tracks]; B = [Average bitrate]
Person 2:  C = [Number of tracks]; D = [Average bitrate]
Person 3:  E = [Number of tracks]; F = [Average bitrate]

Total Average = (A×B+C×D+E×F)/(A+C+E)

I already tested 394 tracks at --quality 4.1 and got an average bitrate of 128kb/s (oddly enough).  Don't follow my lead, please, as I'm not sure if we will be using --quality 4.1 yet.  I did this before Doctor's post.
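The procedure above is a track-count-weighted mean; a minimal Python sketch (the function name is mine):

```python
def pooled_average_bitrate(results):
    """Pool several testers' reports into one mean bitrate.

    `results` holds (track_count, average_bitrate) pairs, one per
    person, i.e. (A, B), (C, D), (E, F) in the example above.
    """
    total_tracks = sum(n for n, _ in results)
    weighted_sum = sum(n * kbps for n, kbps in results)
    return weighted_sum / total_tracks

# The 394-track/128kb/s report pooled with two made-up reports:
avg = pooled_average_bitrate([(394, 128.0), (100, 131.5), (250, 126.2)])
```

Weighting by track count keeps someone who encoded 20 tracks from counting as much as someone who encoded 400.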
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-10 14:43:36
Quote
Excellent system Doctor.  I bet this would be a much simpler and more efficient method than my brute force technique.

However, if we did the test this way, we would not be able to get statistics on how many people find the sample transparent at --quality x.x.  The following idea would be very difficult to achieve:

Quote
Correct. We would discover a different transparency threshold for each format, for each sample, by each tester. Then build a table of results, showing the average transparency threshold for each format based on the averaged results from all testers on all tested samples.
Title: Why is MPC perceived to be the best?
Post by: Doctor on 2004-02-11 01:17:48
Quote
However, if we did the test this way, we would not be able to get statistics on how many people find the sample transparent at --quality x.x.

That's what I meant by "shrinking the goal" - I strongly suspect people will be dropping out of a 120-sample barrage. ;-)

Also, if another codec raises interest we can later take these results and continue in a different direction.

Regarding matching bitrates, we might simply take the average bitrate of the samples we are testing. However, this may be a stupid idea because these samples - the killer ones - are probably interacting with the codec's bitrate management in peculiar ways.

So, we probably should take a "representative" set of recordings and rely on that.
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-11 03:53:01
Quote
Regarding matching bitrates, we might simply take the average bitrate of the samples we are testing. However, this may be a stupid idea because these samples - the killer ones - are probably interacting with the codec's bitrate management in peculiar ways.

Exactly!  That's why MPC had an average bitrate of 140+kbit/s over all samples in the last (128kbit/s) multiformat listening test.  We need to do rigorous testing ahead of time to make sure we have comparable bitrates for all the codecs to be tested.  I think we should start working toward this right away.  Again, the first thing that must be decided is the likely --quality values we will use during the test (I know we can't be sure what we will and won't use, but I don't mind testing more than I have to).  For example:

--quality 4 - 5.5 (just to be sure) in increments of 0.1 or
--quality 4 - 5.5 in increments of 0.2
etc . . . 

-Tyler
Title: Why is MPC perceived to be the best?
Post by: Eli on 2004-02-11 03:55:02
Wouldn't a problem sample set make the most sense? My understanding is that the subband encoding has fewer problem samples. Otherwise there really isn't much point, as other codecs, like AAC, also perform very well.

But as I said above, MPC has other strengths that AAC doesn't, and you don't need to ABX to hear gapless playback or enjoy the benefits of a better tagging system...
Title: Why is MPC perceived to be the best?
Post by: music_man_mpc on 2004-02-11 04:00:35
Quote
Wouldn't a problem sample set make the most sense? My understanding is that the subband encoding has fewer problem samples. Otherwise there really isn't much point, as other codecs, like AAC, also perform very well.

No, no, no!  The point of this test is to find out where (the approximate --quality value and average bitrate) MPC becomes transparent for most people, on most samples, and how that compares to other leading codecs (AAC, Ogg, MP3, etc.).  So a test suite of all problem samples would completely defeat the purpose... I don't mean to sound too harsh.
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-02-11 04:24:19
Well, the time frame for this test should really be May-June.  That'll give Roberto time to conduct the tests already on his schedule, then we can begin serious discussions about this test some time in April.

Everybody's got really good ideas here.  We'll have plenty of time to talk more about them in the official pre-test thread, when the time's right.

As complex as this test would be, there has to be a way to get it all done.  I've also gotten some really good insights from Roberto.  When the time's right, we can all determine the best approach for this test, and then go from there with sample selection and determining encoder versions and settings.

To sum up.....deviating from how I titled this thread, I don't want the test to be "MPC-centric", but rather a determination of average transparency threshold for each of the formats mentioned, all following the exact same testing methodology.  We shouldn't try going into too much more detail than we have already.  The only thing for sure at this point is that the test would have to be divided into "phases" somehow, to make it manageable for everyone who participates.  I don't want people keeling over from "ABX-overdose" and blaming me for it. 

Thanks to everyone here for your input.  And thanks to Roberto for sharing his vast knowledge on this subject, and offering his assistance in what's going to be a pretty big endeavor.  I've hinted at it previously, but now I'm officially announcing my intention to conduct the test, in the time frame mentioned above.



Edit:  Adjusted the planned time frame for the test.
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-11 05:22:02
Quote
Quote
Regarding matching bitrates, we might simply take the average bitrate of the samples we are testing. However, this may be a stupid idea because these samples - the killer ones - are probably interacting with the codec's bitrate management in peculiar ways.

Exactly!  That's why MPC had an average bitrate of 140+kbit/s over all samples in the last (128kbit/s) multiformat listening test.  We need to do rigorous testing ahead of time to make sure we have comparable bitrates for all the codecs to be tested.  I think we should start working toward this right away.  Again, the first thing that must be decided is the likely --quality values we will use during the test (I know we can't be sure what we will and won't use, but I don't mind testing more than I have to)...

Not to step on ScorLibran's excellent summation, but I felt that this might be a valid point to mull over for the actual test. Doctor appears to have the same reservations.

Matching the bitrates for the test samples should not be a concern, and the resulting bitrates at transparency are merely a formality. This test is about determining the general threshold (setting) of transparency for all of the involved codecs. Some codecs might yield complete transparency with an average below 200kbps, while others are never fully transparent (it could happen...) even above 300kbps. Or vice-versa.

Going to this point, my understanding of why Musepack was considered the best is that when it did fail, the failure generally was not catastrophic.  A tendency toward low bitrates just happened to be an added plus.  As long as the bitrate was below lossless, transparency was of primary importance.
Title: Why is MPC perceived to be the best?
Post by: SometimesWarrior on 2004-02-11 05:41:29
Whoa, I haven't posted here in 6 months! I hope people still take what I have to say seriously.

This thread started out as a simple question: is MPC really the best lossy format for achieving transparency at a reasonable bitrate? Then, the follow-up question: if not, then what format is?

I think answering the first question is more important than answering, for example, "What are the exact quality settings for each encoder so that, on problem samples, 80% of the population represented by the test group will be unable to distinguish between the encoded and original sample?"

A test for transparency must be done differently than a test for quality. For low-bitrate quality tests, we are examining the encoder's ability to intelligently throw out information that is less critical for the reproduction of the musical sample. An encoder that is incapable of producing a transparent encoding can still win a low-bitrate contest, if it degrades more pleasantly than its competitors. For transparency tests, the encoder can't underestimate the audibility of any kind of distortion. Though I've never tuned an audio encoder myself, from what I've read here by codec developers, the tuning needs to be done somewhat differently. Think about the Vorbis discussions that claim Monty is working on improving low bitrates, but not high bitrates. Consider that the tuning done for the alt-presets provided no benefit for lower bitrates in Lame. Hopefully, someone who has done codec tuning will quickly and mercilessly correct me if my assumption is wrong.

Also consider the behavior of a codec when it fails on a problem sample: increasing the bitrate rarely helps. Practically any sample that defeats --alt-preset standard will beat --alt-preset extreme. Garf has expressed his findings that any sample besting MPC --standard will also fail with --xtreme. Problem samples for Vorbis may be non-transparent up through --quality 9. A problem sample can be overcome by telling the encoder to dump buckets of bits on it, but that's beside the point, because the codec has already failed to provide transparency at a quality setting designed to be transparent.

If you want to find out which encoder achieves transparency 90% of the time at the lowest bitrate, you can probably extrapolate from Roberto's 128kbps tests. However, Musepack isn't considered "the best" because it hits the 90% marker at a lower bitrate than its competitors. Musepack gets the crown for being closest to the lossy encoder's ultimate goal of 100% transparency at a reasonable bitrate.

Let me provide a contrived example. Format ABC gets 90% transparency at 120kbps average, XYZ at 150kbps. However, XYZ gets 99% transparency at 180kbps, whereas ABC needs 300kbps. This would be the case if ABC degraded more gracefully than XYZ, but XYZ was tuned to handle extreme test-sample cases more thoroughly than ABC. In this situation, XYZ would be championed as the best encoder when transparency is the goal.

Here's my point: if we're testing for transparency, the exact bitrate or quality setting of the encoder is unimportant, because when an encoder fails, adding 20kbps or 40kbps often won't solve the problem.  We can simply pick one setting from each encoder that gives the encoder enough "breathing room" to not run out of bits, and then see which encoder can get over the most problem-sample hurdles.  This would mean Musepack "standard", AAC "transparent", and Vorbis "quality 5" (or something along those lines).  Perhaps even "xtreme", "extreme", and "quality 6", so that we're really testing the encoder's ability to overcome the worst-case scenario, rather than its ability to trim bits as close to the wire as possible.

Now, all of this rambling does nothing to address the very difficult issue of picking samples for a fair multi-codec transparency test...
Title: Why is MPC perceived to be the best?
Post by: ChangFest on 2004-02-11 15:18:26
Quote
Here's my point: if we're testing for transparency, the exact bitrate or quality setting of the encoder is unimportant, because when an encoder fails, adding 20kbps or 40kbps often won't solve the problem.


I don't believe the point of this test is the consideration of "problem" samples because of the exact point you're mentioning: problem samples being problems at all bitrates.
Title: Why is MPC perceived to be the best?
Post by: Eli on 2004-02-11 15:57:08
Quote
Quote
Wouldn't a problem sample set make the most sense? My understanding is that the subband encoding has fewer problem samples. Otherwise there really isn't much point, as other codecs, like AAC, also perform very well.

No, no, no!  The point of this test is to find out where (the approximate --quality value and average bitrate) MPC becomes transparent for most people, on most samples, and how that compares to other leading codecs (AAC, Ogg, MP3, etc.).  So a test suite of all problem samples would completely defeat the purpose... I don't mean to sound too harsh.

I disagree. Just because one codec is transparent at a bitrate 10-20kbps lower than another doesn't make it a superior codec, as the size difference really isn't that significant. However, if one codec handles problem samples a lot better, then it, IMHO, is a better codec.
Title: Why is MPC perceived to be the best?
Post by: ChangFest on 2004-02-11 22:04:53
Quote
However, if one codec handles problem samples a lot better, then it, IMHO, is a better codec.


Once again, I don't think the point of this test is to find the "best" codec.  The test is for the determination of the bitrate/transparency threshold for today's most popular codecs.  ScorLibran's title of the thread doesn't really support the reason for the test anymore (IMO).
Title: Why is MPC perceived to be the best?
Post by: Eli on 2004-02-11 22:16:40
Quote
Quote
However, if one codec handles problem samples a lot better, then it, IMHO, is a better codec.


Once again, I don't think the point of this test is to find the "best" codec.  The test is for the determination of the bitrate/transparency threshold for today's most popular codecs.  ScorLibran's title of the thread doesn't really support the reason for the test anymore (IMO).

Well, I'm not sure what the point of the test is then. The original question was: what is the best codec (or why is MPC considered the best)? Somehow that got twisted into a listening test to see at what bitrate codecs become transparent. Most people encode with a little extra headroom anyway, so is this even an important question? The big contenders here are MPC and AAC (and Vorbis, I guess). We all know that they can achieve transparency at a relatively reasonable bitrate, so is it meaningful to say that one does it at ~180, one at 150 and one at 200 (random #s)? IMHO a test would do better to assume that the modern codecs do transparency relatively well at a reasonable bitrate, but ask which one performs best on problem samples, so that you know your music library has the best fidelity you can get with lossy encoding.
Title: Why is MPC perceived to be the best?
Post by: SometimesWarrior on 2004-02-11 23:38:06
Quote
IMHO a test would do better to assume that the modern codecs do transparency relatively well at a reasonable bitrate, but ask which one performs best on problem samples, so that you know your music library has the best fidelity you can get with lossy encoding.

I agree. I'd much rather know which lossy encoder is least likely to fail, given enough headroom to transparently encode normal music. Even if it requires 20-30kbps more on average, I would still use it. If a test could find the exact threshold for transparency on typical audio samples for a typical listener, I would still encode at an arbitrarily higher bitrate to account for somewhat more difficult audio samples, possible sound hardware upgrades, and my steadily-improving ability to hear encoding artifacts. Most people would probably do the same, unless they plan to re-encode often. For this reason, I think a "bitrate for transparency" statistic would not be particularly useful.
Title: Why is MPC perceived to be the best?
Post by: Doctor on 2004-02-11 23:59:54
Hm, I feel a flamefest waiting to happen. Before emotions get out of hand, another chapter from software engineering.

ScorLibran, by virtue of volunteering to organize and run the test, decides how exactly the test is executed. He is the dictator of the test. We can discuss details, but he is free to accept or reject our ideas.

On the other hand, the test is worthless without several dozen participants. So, he better be a benevolent dictator. If he sets unreachable or dubious goals, people will refuse to participate, making the test less conclusive.

So, let's work out something we can calmly agree on.
Title: Why is MPC perceived to be the best?
Post by: Doctor on 2004-02-12 00:10:16
Concerning the goal of the test. It is obviously interesting to know where true transparency is for each codec. But going from "is MPC the best" to "what settings are adequate for each codec" is like saying "I need a house, so let me build a castle". It is called feature creep. We have a budget: patience of testers. We run over the budget, we get no results.

I proposed a scheme that can tell whether MPC is transparent at a lower bitrate than everything else in ~18 codec/quality setups. If the testing is performed on 5 samples, that's 90 sample tests, in three phases of 30. I'd like to see a scheme that can find the transparency setting for each codec without using more than a hundred samples.
Title: Why is MPC perceived to be the best?
Post by: Kalamity on 2004-02-12 04:23:13
I would not consider this thread flammable, let alone engulfed in flames. However, it would seem that some of the more central points for this test have yet to be hammered out to the satisfaction of all posters.

What is the real question being asked here? How can the test be conducted to most efficiently answer this question?

After some thought, I am even now more interested in a test to see what people think of how different codecs handle all known problem samples. Testing each codec at two settings: give each encoder a crack at handling all known problem samples for all codecs. The best would have the least number of accurately ABX'ed samples, averaged out over the number of samples that are known to affect that codec. I think this could be fair, as even codecs with greater numbers of problem samples might still handle them well enough for the 'masses' to not notice. Would this be a valid form of handicapping?

I realize that the above is not really an original proposal, though the inner workings might be. This might best follow the test for establishing 'transparent' settings for all involved, though it would be no less relevant as even now individuals are using the 'advertised' transparent settings as such, proven or not.
Title: Why is MPC perceived to be the best?
Post by: 2Bdecided on 2004-02-12 11:18:57
Before spending hours discussing and thinking about this, try a little pre-test... (I originally typed reality check instead of pre-test!)

Pick an audio sample. It probably doesn't matter what it is, as long as it's not a known "codec killer", but isn't too simple either.

Decide whether to go for "standard" or "extreme" settings - for this pre-test, try standard I think.

Encode this sample using, say, MusePack standard, --alt-preset standard, ogg -q5, AAC transparent etc.

Take these four (or more) samples, and ask people to ABX them against the original.

Look at the results. Consider each negative ABX result as a "5.0" grade, and each positive ABX result as a "4.5" grade. Do a statistical analysis. Are the results significant?

If not, I propose you need to think carefully about your "full" test.


You can set up this pre-test in an hour or so - do the encoding, and post the original and decoded FLACs, or make a .bat file to do the job on the user's machine if that's easier.

You may decide to use more samples, but this is a pre-test: don't use too many! I think try one, look at the result, try another, look at the results, maybe try a third, look at all the results, and then decide on what the real test should be.


Just my advice - but you're doing the work - I'm just typing waffle!

Cheers,
David.
Title: Why is MPC perceived to be the best?
Post by: Continuum on 2004-02-12 11:40:33
Quote
Consider each negative ABX result as a "5.0" grade, and each positive ABX result as a "4.5" grade. Do a statistical analysis. Are the results significant?

Why do you suggest such a procedure? There have to be more apt analysis methods. Some binomial distributions come to mind. 

I agree with your (insinuated!) point though. Group tests of high bitrate modes are likely to fail.
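For what it's worth, the binomial approach can be sketched as a one-sided sign test: under the null hypothesis that the listener is guessing, each ABX trial is a fair coin flip, so the p-value is the probability of getting at least that many trials right by chance (plain Python; the 12-of-16 figures are just an illustration):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value: the probability of scoring at least
    `correct` out of `trials` by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 or more correct out of 16 trials gives p < 0.05,
# i.e. the listener probably heard a real difference.
p = abx_p_value(12, 16)
print(round(p, 4))  # -> 0.0384
```

This sidesteps the 4.5/5.0 grading question entirely: each listener's trial counts go straight into the significance test.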
Title: Why is MPC perceived to be the best?
Post by: 2Bdecided on 2004-02-12 11:51:36
Quote
Quote
Consider each negative ABX result as a "5.0" grade, and each positive ABX result as a "4.5" grade. Do a statistical analysis. Are the results significant?

Why do you suggest such a procedure? There have to be more apt analysis methods.

You're right - I'm sure one of our resident statistical geniuses (that's not the plural, is it?) will respond in full...


As for high bitrate tests, I've just re-posted the results of the only (very old, quite flawed) test I know of here:
http://www.hydrogenaudio.org/forums/index.php?showtopic=18223&st=25&#entry183841

IIRC some of the results were statistically significant, even though people weren't required to ABX (some listeners did anyway). You would expect enforced ABX to filter out some of the noise. The placing of the high anchor by most listeners suggests that there isn't actually that much noise here though.

EDIT: where "is it transparent or not" is the question, ABX is probably essential. Where only a ranking is required, blind tests, large numbers of listeners and useful statistical analysis could be enough to cancel out placebo and still get useful results. ABX is still useful because it raises the quality of the results.

Before anyone launches into a TOS-8 attack on my lack of respect for ABX, remember that it (or something very like it) is only essential to prove (to a certain probability) that an individual hears a difference. In BS-1116 listening tests, hidden anchors and statistical processing do the job to give meaningful results for the population. You may not know categorically whether a certain individual actually heard a difference for a certain sample+encoder, but you don't need to.

Cheers,
David.
Title: Why is MPC perceived to be the best?
Post by: fanerman91 on 2004-04-18 04:18:38
This "Really Big Codec Test" sounds exciting... did momentum for it die out?  It looks like a ton of work, but it would be incredibly helpful for a lot of people if it were carried out... What happened?
Title: Why is MPC perceived to be the best?
Post by: ScorLibran on 2004-04-18 05:43:00
Quote
This "Really Big Codec Test" sounds exciting... did momentum for it die out?  It looks like a ton of work, but it would be incredibly helpful for a lot of people if it were carried out... What happened?

As I said on the previous page, the timeframe will be May-June, but may be pushed a little farther out than that to take other scheduling issues into account (June-July?).

And it will be a great deal of work, especially luring in enough participants to make the results statistically significant.  But contrary to popular belief, this won't necessarily be a "high-bitrate" test.  People don't like testing high bitrates because they can't distinguish artifacts there (or can only do so with great difficulty).  That means a lower bitrate range should be used anyway - maybe 96kbps to 192kbps (VBR wherever possible/available), so that most people can distinguish variances more easily in at least part of the range.  Listeners will actually stop testing once they can't tell the test sample from the reference, since finding their transparency threshold for that sample and format will be the entire goal of the test.

I'll tentatively plan on starting the official discussion for this test soon after Roberto finishes his dial-up bitrate test.
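To make that stopping rule concrete, here's a rough sketch of how a listener's per-format result could be tabulated (the bitrates and outcomes are invented; True means the listener could still ABX the sample at that setting):

```python
def transparency_threshold(abx_results):
    """Given {bitrate: could_distinguish} for one listener, one sample and
    one format, return the lowest tested bitrate at and above which every
    setting was indistinguishable from the reference, or None if even the
    highest tested bitrate could be ABX'ed."""
    threshold = None
    # Walk from the highest bitrate down; stop at the first one the
    # listener can still distinguish from the reference.
    for bitrate in sorted(abx_results, reverse=True):
        if abx_results[bitrate]:
            break
        threshold = bitrate
    return threshold

# Hypothetical listener: distinguishes 96 and 128 kbps, but not 160 or 192.
results = {96: True, 128: True, 160: False, 192: False}
print(transparency_threshold(results))  # -> 160
```

Averaging these per-listener thresholds over many listeners and samples would give exactly the kind of "Perceptual Transparency Threshold" table proposed at the start of the thread.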
Title: Why is MPC perceived to be the best?
Post by: damiandimitri on 2004-04-19 12:04:25
Quote
Consider each negative ABX result as a "5.0" grade, and each positive ABX result as a "4.5" grade. Do a statistical analysis. Are the results significant?


Why do you suggest such a procedure? There have to be more apt analysis methods. Some binomial distributions come to mind. 

I agree with your (insinuated!) point though. Group tests of high bitrate modes are likely to fail.



If you want good statistical results, you should make the good and bad results more different than 4.5 and 5 - better to take 1 and 10, for example. Or:
good 10
don't know 5 << if it exists
bad 0

This way you will get better (clearer) results from your statistics.
Title: Why is MPC perceived to be the best?
Post by: tigre on 2004-04-19 12:46:32
Quote
If you want good statistical results, you should make the good and bad results more different than 4.5 and 5 - better to take 1 and 10, for example. Or:
good 10
don't know 5 << if it exists
bad 0

This way you will get better (clearer) results from your statistics.

Probably not. The main problem with listening tests at settings/bitrates aiming for transparency is certainly not the scale used for rating.

If 40% of listeners rate the original lower than the encoded version (in an ABC/HR situation) because the difference they hear is based on imagination, you still need a big number of participants to get results that show a significant difference between encoders, no matter what scale you use for rating.

As said before, one other big problem is the way test samples are chosen. Using known problem samples will cause bias against the encoder most commonly used by people with good training in hearing artifacts. Choosing samples randomly won't help either, because you need a big number of them to find samples where people can hear differences at all.
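A quick simulation illustrates the point (all the rates here are assumptions chosen for illustration, not measurements):

```python
import random

def simulate_mean_gap(n_listeners, imagine_rate, real_detection_rate, seed=0):
    """Mean rating gap (reference minus encoded, on the 4.5/5.0 scale) when
    `imagine_rate` of the non-detecting listeners imagine a difference and
    `real_detection_rate` genuinely hear one. All rates are assumptions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_listeners):
        if rng.random() < real_detection_rate:
            total += 0.5   # genuinely hears it: encoded version rated 4.5
        elif rng.random() < imagine_rate:
            # Imagines a difference and downgrades one version at random,
            # so half the time the hidden reference gets the lower grade.
            total += 0.5 if rng.random() < 0.5 else -0.5
        # else: hears nothing, rates both 5.0, gap is 0
    return total / n_listeners

# With 40% imagined differences and 10% real detection, the true mean gap
# is only 0.05, buried in the imaginers' noise. Rescaling the grades
# (e.g. 0/10 instead of 4.5/5.0) multiplies signal and noise alike, so
# the scale itself doesn't help - only more listeners do.
print(simulate_mean_gap(10000, 0.40, 0.10))
```

The simulated gap hovers around the tiny true value, which is why high-bitrate group tests need so many participants to reach significance.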