Why is MPC perceived to be the best?

2004-02-08 17:43:01

This is a question that has floated through my mind for most of a year, but only yesterday became more clear to me.

Actually, it's a two-part question...

... 1. Is MPC commonly accepted among the HA community as the best psychoacoustic encoding format? (i.e., the most efficient at achieving perceptual transparency.)

... 2. If so, why?

The first item I've heard stated quite frequently, but have never seen any results of "transparency threshold tests" that would reveal the superior efficiency of MPC. I've heard that MPC uses superior encoding technology, but I'm referring more to the end result of such development efforts...the perceived sound quality, as measured against other codecs at the point of perceptual transparency for a significant number of people.

These concerns on my part were born from a post I made here, where the points of MPC statistically tying other formats at 128kbps, but no other known test results existing, were brought up. The thread portion ended up in the recycle bin, but I'm taking the chance that my concerns about calling MPC "the best" weren't the reason it was put there.

Hence, I want to bring up this idea in a different context in the off-topic forum (in the hope that this will be the correct area for it).

What I'd like to see, for instance, for the education of myself and others, would be a results summary like the following (though this is a very simplistic example)...

Format............Perceptual Transparency Threshold (nominal bitrate across samples tested)
MPC.................nnn kbps
AAC.................nnn kbps
Vorbis..............nnn kbps

...and so forth

Granted, VBR is more efficient at mid-bitrates and up, and quality-based VBR modes aren't bitrate centric, but we need some means of measurement and comparison between codecs in this context, so if not calling it "nominal bitrate", then perhaps "average filesize per minute of audio across all samples"

Perceptual Transparency Threshold could have a fixed target, like >90% samples with 5.0 subjective ratings, and non-differentiable from reference with ABX testing.

This kind of test has been discussed before, and has been mostly viewed as having little "real-world value". And I agree. Roberto's tests are much more relevant for most music listeners, and for determining the best formats for useful purposes like streaming audio, portable players, etc.

Many of us (including me) have trouble testing even these bitrate ranges, so higher ones would be even more tedious, and would answer not as many pressing questions.

My point, though, is how can MPC be called "the best for achieving transparency" without a test such as this? (Because so far it's been shown to be only "among the best" at lower rates.)

Reply #1 – 2004-02-08 18:53:57

1) The answer appears to be yes.
2) This result came about long before you or I joined HA, possibly before HA was even founded.

There are two specific attitudes I've found here that border on violating TOS #8. You've picked up on one of those attitudes. The other is that --aps uses the most efficient settings for mp3.

Many people have reiterated the above sentiments (myself included), but most of them don't know what formal testing was done to establish MPC and Lame --APS as the best. The generic response tends to be, "They are the best, if you don't think so, scientifically* prove it wrong" (assuming ABX is scientific).

As for your table, the perceived threshold for transparency may be lower than you think. Many people think that 128kbps is transparent. Certainly, even more people will think that 192 is transparent. But most of this is person dependent. Although it's been a while since I've tested, 160 mp3 may very well fit the description of transparent for me, where I can't distinguish 90% of the samples from the original.

Likewise though, there are people on this very board who can hear the difference between the higher settings. So creating a table would be impossible, since it's highly dependent on the person.

But I do agree with you that the lack of data concerning MPC is mildly disturbing

Reply #2 – 2004-02-08 20:51:31

There's never been a hard and fast test as to which is the best. The problem is the same reason why Roberto hasn't attempted a test at greater than 128kbps: on representative samples, you run into transparency.

So, the basis for Musepack's high status around here is pretty straightforward: on all the problem samples that kill most encoders, Musepack (at Q5+) tends to do better than the others. This was especially pronounced before AAC and Vorbis were contenders, when LAME was being tuned, --r3mix happened, and Frank was still heavily tuning Musepack and so on. Musepack is also less technically complex, being a subband encoder. Tuning it takes less effort than would tuning a transform encoder like Vorbis, AAC, or MP3. It also has fewer intrinsic problems with things like pre-echo.

I am uncertain if the problem samples comparisons would give the same results now. However, you're not going to get comparative information by any other means than problem samples. The issue there is that different encoding methods give different problem samples, so one that breaks Musepack horribly might not break AAC at all.

It would be nice if one of the older members here would say a word or two. The opinions I've reiterated here were "fresh" when I joined. They've sort of been passed down to later HA members by proxy.

Reply #3 – 2004-02-08 21:52:20

Thanks for the input, guys.

If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such. I would think a variety of samples, mostly problem samples plus a few that were not previously considered "problem samples", might provide a fairly thorough test package, without being too unmanageable. Granted, there's no way to provide a comprehensive, "across the board" appraisal of the formats (as there never is), but testing 10 samples across a variety of music styles at near-transparent encoding rates could at least provide something to point to and say "MPC is transparent at lower encoding rates than the other formats with these samples to the ears of the people who tested them", for instance.

And different people have different transparency thresholds, certainly. So like with any other listening test, the statistical validity of the results would depend on the number of testers providing results. The problem is finding people willing to listen to samples encoded at close to the edge of transparency and try to pick them out from the references with any consistency.

Special baselines would have to be set for interpreting test results, a little different than previous tests, I would think. For instance, "the threshold of perceptual transparency for a tester will be at the 90th percentile". Interpreting results by starting at the low anchor and putting results in order by increasing encode rates in each format. When a tester cannot differentiate 9 out of 10 samples, then you've found their threshold for that format. Repeat for each tester.

Then you average these threshold points across all testers, and you'd get a number (with an error margin) for each format. Throw them into a table to show which one wins (or how many tie for the "win" based on error margin overlaps).

Now, we're talking a minimum of 10 samples, let's say 5 codecs (LAME, QT-AAC, MPC, Vorbis, and WMA Pro), and at least 4 encoding rates (quality settings) for each format (perhaps targeting kilobitrates of 128, 144, 160 and 192).

10 samples x 5 formats x 4 rates = 200 test groups.

Now it's unmanageable.

The alternative to this kind of (likely unpopular) test, in my opinion, is that no one can say that "MPC is better". Because based on current evidence, AAC, MPC, Vorbis and WMA Pro are equally good at sub-transparent rates, and I have yet to see evidence of which format wins in the range of perceptual transparency.

Unless someone can think of a way to provide evidence another way?

Reply #4 – 2004-02-08 22:13:53

The smartest way to do it would be to encode the problem samples in MPC q5 and q6, get the average bitrate, then approximate LAME, AAC, Vorbis, et al to that bitrate. If the test is focused on MPC, go with q5 and q6. Then you have only two rates. 10x5x2 isn't unmanageable. It's a little larger than Roberto's tests, but the problem samples tend to be shorter and more specific.
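
To make the bitrate-matching step concrete, here is a small Python sketch of how the target could be computed. The file sizes, durations and the helper name are invented for illustration; real numbers would come from the actual q5/q6 encodes.

```python
def average_bitrate_kbps(encodes):
    """encodes is a list of (size_in_bytes, duration_in_seconds) pairs.
    Returns the overall bitrate in kbps (total bits / total seconds / 1000),
    which weights longer samples more heavily than a plain mean of
    per-file bitrates would."""
    total_bits = sum(size * 8 for size, _ in encodes)
    total_seconds = sum(duration for _, duration in encodes)
    return total_bits / total_seconds / 1000

# Hypothetical MPC q5 encodes of three short problem samples.
mpc_q5 = [(420_000, 20.0), (610_000, 30.0), (230_000, 10.0)]
target = average_bitrate_kbps(mpc_q5)
print(f"Target for the other encoders: about {target:.0f} kbps")
```

The settings of the other encoders would then be adjusted until their average over the same samples lands near this figure.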

Problem samples are not representative though, which will cause scoring issues. Furthermore, because MPC is so widely used and accepted as the ultimate in lossy compression, there have been many samples collected to display where it fails. Thus, we'd need to adjust the problem sample set to balance out the imbalances and be a little more relevant.

Then, once we've determined where the codecs fail, start ramping up the quality levels and see which one becomes transparent first and such.

Using ABC/HR, it's no big deal if people can't tell that a certain sample sounds different. Just leave both sliders at 5, and you're happy. You won't get as many "valid" results, but you'll figure out around where the transparency level lies.

It's odd that this isn't on-topic. I'm thinking it should be moved either to general audio or to listening test discussion.

---

The thing with MPC transparency is that it has sort of been proven, in a very roundabout manner. It's just that the results are so distant from the present day that it's hard to connect the two. I'm of the opinion that we keep the "axiom" that Musepack is the best lossy audio codec, because what evidence we do have tends to corroborate that. It's the codec that's been primarily tuned for nothing but high-end transparency. And if a codec tuned for transparency works well at midbitrates (128), although it may not follow in a mathematical sense that it would achieve its goal of transparency, it's probably very likely.

I think there's enough "proof" that MPC is the best solution out there that we can keep the idea for the time being until more proof comes. There's yet to be proof that there's a better solution, and there are many technical reasons why Musepack is likely the most transparent solution. (Most having to do with the fact that it is much easier to tune, being subband-based, and the fact, like I've said, that the goal of the tuners was perceptual transparency)

Reply #5 – 2004-02-09 02:38:17

The proof of MPC quality is neither vague nor simply perceived. It is, however, hard to find links to the tests that confirmed this fact. MPC seems to have fallen out of favor and popular use. This does not change the fact that, with research, two years ago, I found MPC to be the undisputed leader in 'transparent' lossy codecs.

This link gives a little background:

http://xmixahlx.com/audio/sc/questions.html

This seems to be a very strong link:

http://www.ff123.net/dogies/dogies_plots.html

And another:

http://mp3.radified.com/mp3.htm

These links only represent about 15 mins work on my part. I have done more lengthy research in the past.

I am led to believe that MPC has not risen to the top due to the larger files and lack of an active developer and champion.

Reply #6 – 2004-02-09 02:45:50

It's more than the quality of the codec that draws me to MPC (though this is certainly its strongest point). But MPC also has APE tagging, gapless playback, native replay gain support, low-CPU decoding (would mean better battery life on portables, if this ever happens)...

Reply #7 – 2004-02-09 03:07:15

Quote

If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such.

May I suggest you conduct such a test?

I would give as much support as I can.

Reply #8 – 2004-02-09 04:09:41

This is what I would like to see compared. It would take some really golden ears to do this.

mpc -q5
vorbis gt3b1 -q5
lame -aps
nero aac -transparent

Reply #9 – 2004-02-09 06:36:29

Quote

Quote
If nothing "formal" was ever done, I'd still like to see some kind of initiative to provide evidence that "MPC is the best" if it's going to be claimed as such.

May I suggest you conduct such a test?

I would give as much support as I can.

Well, that's an offer I can't refuse!

Unless anyone else would want to run one sooner, I'll have (or make) time this coming spring to do this.

As mentioned, my goal is to see, and not just for MPC, where the average transparency threshold for each of the top five codec "families" resides, based on a variety of samples and variety of listeners (as many as possible...this is critical). ABX on every sample would be required for this test, which would make it more tedious, unfortunately. And using a rating scale would have less meaning, and may not even be needed. The goal is not to compare how "good" each codec sounds compared to the reference, but rather simply find the "point of threshold" for each one on each sample, then average results across samples and then across participants.

The same statistical analysis as Roberto uses should be adhered to to make the results as meaningful as possible.

This would not be a bitrate-centric test, but rather an attempt to "slowly turn up the quality dial", so to speak, until the participant can no longer ABX at p<0.05. We obviously have to start low enough that a significant number of people can actually ABX samples with that p-value or lower. Then try (as) gradually (as possible) increasing bitrates until they can't ABX them with confidence. This would be done with each format in the test.
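
For reference, the p<0.05 criterion can be checked with a one-sided binomial test. The Python sketch below assumes 16 trials per ABX session, which is only an example figure and not something settled for this test.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value: the chance of getting at least `correct`
    answers right out of `trials` by pure guessing (probability 0.5 each)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

# Assumed session length of 16 trials; show scores around the cutoff.
for correct in (11, 12, 13):
    p = abx_p_value(correct, 16)
    print(f"{correct}/16 correct: p = {p:.4f}, passes p<0.05: {p < 0.05}")
```

Under that assumption a tester needs at least 12 correct out of 16 to count as having differentiated the sample.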

So, everyone please post whether you think...

1. this test would be possible/manageable,
2. with enough participation, results from this test would be meaningful enough to justify it, and
3. the best method and approach that could balance manageability and effectiveness.

A few preliminaries of how I'd like to do it (or see it done)...

- Five formats: LAME MP3, WMA Pro, MPC, AAC, Vorbis. These are the most discussed formats in HA, and the most used in the world, I think. One variant/version of each. Which variant of AAC (and possibly Vorbis) should be used may require a pre-test.
- At least 3 encoding rates/quality settings. I'd like to see more than 2 to be able to represent (at least minimally) what could be considered a "range" of encoding rates with each format. Target lower and upper range limits would probably be 128kbps and 192kbps, based on enough sub-4 ratings on Roberto's 128kbps Extension Test, and on the fact that many people say they prefer settings which generally yield rates that seem to center around 192kbps (160-224kbps) for maximum efficiency. (Speaking of which...providing evidence to support people's beliefs about "maximum efficiency" would be exactly the point of this test.)
- 8 samples should be sufficient to cover enough musical variety to make the results meaningful for most people, I would hope, while keeping the numbers down in the interest of manageability.
- ABX testing, with a target p-value of 0.05 or less, for each sample. Each encode rate (by format) would be double-blind tested to find not its "level of transparency", but rather which side of the "threshold of transparency" it's on for each particular tester.
- Results would be gathered for each tester, each format, each encode rate, and each sample. Averages would be compiled by sample, then across encode rates, then across participants, then the end-numbers for each format would be graphed with statistical error margins shown. (Since there would be no subjective aspect to this test, the error margins could perhaps instead compensate for a limited number of participants, to make the results meaningful for a majority of people.)

If all goes well, we could show an average perceptual transparency threshold for the music samples tested. And something recent to point to when someone asks "where does MPC/MP3/Vorbis/WMA-Pro/AAC become transparent" or "which codec is the most efficient at mid-high bitrates"? (Since Roberto has done so much to provide the same info in other bitrate ranges.)

Issues/Questions/Brainstorming:

- 5 formats x 3 encoding rates x 8 samples = 120 test groups. Is this feasible? If not, we could break this up into a separate test for each format, or maybe into 2 "phases"...2 formats in one phase and 3 in the other. Or maybe ideally, have one test per format, and a more granular scale...perhaps 5 encode rates? This would stretch the schedule, though, allowing at least 15 days (+ or -) per format plus time to have pre- and post-test discussions, determine samples, compile results at the end, etc.
- How can people be convinced to participate in a listening test that would be more tedious than most previous tests, when some of those tests even had trouble getting enough participants themselves?
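
The workload for the two layouts mentioned above can be compared with a few lines of Python. The variable names are just for illustration; the counts are the ones from this post.

```python
samples = 8
formats = 5

# All five formats in a single test, 3 encode rates each.
combined = formats * 3 * samples
# One test per format, 5 encode rates each.
per_format = 5 * samples

print("single combined test:", combined, "test groups")
print("one test per format: ", per_format, "test groups per test,",
      formats * per_format, "in total")
```

Splitting by format means more test groups overall, but each individual test stays much smaller than the combined one.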

Reply #10 – 2004-02-09 08:00:19

Looks like double-nested QUOTES puts the auto-quoting function out of sorts.

Quote

Issues/Questions/Brainstorming:
5 formats x 3 encoding rates x 8 samples = 120 test groups. Is this feasible? If not...

Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here? A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance. Otherwise just start at whatever whole number 'quality' setting (where applicable) gets you closest to 160kbps, and go from there.

Quote

How can people be convinced to participate in a listening test that would be more tedious than most previous tests, when some of those tests even had trouble getting enough participants themselves?

Make participation compulsory for continued HA posting rights?

In all seriousness, maybe those who participate could receive a special 'Official HA Codec Tester' tag under their name. It is amazing what people will do for web board titles.

Reply #11 – 2004-02-09 11:51:31

Just how many people are going to give you anything except 5.0 for all samples?

(I can think of some - but if you slashdot it to get a large number of listeners, I bet the percentage is low!)

There's also the possibility that people who can hear problems with various samples at the settings you suggest have already reported it here. Maybe you could somehow analyse this data?

Some people will have already made themselves more sensitive to one codec's artefacts than another's. This would likely bias your test.

If you're going to do this, try a small scale pre-test first to alert yourself to the possible problems. And ask advice from Roberto and ff123.

It's a pity the r3mix forum is dead, because it would be helpful to look at his "archive quality" test results too - I've saved them somewhere, if you're interested. I don't have the discussion surrounding them though.

Cheers,
David.

Reply #12 – 2004-02-09 16:04:35

Is there any site you would suggest where I can find a scientific description of how psychoacoustic models work and what different kinds exist (subband, SBR, etc.)? I'm really interested in it but don't want to spend too much money on books.

Reply #13 – 2004-02-09 16:08:33

Quote

This is what I would like to see compared. It would take some really golden ears to do this.

mpc -q5
vorbis gt3b1 -q5
lame -aps
nero aac -transparent

yeah, and throw in vorbis 1.01 -q6 for the hell of it

Reply #14 – 2004-02-09 16:47:05

Quote

How can people be convinced to participate in a listening test that would be more tedious than most previous tests, when some of those tests even had trouble getting enough participants themselves?

I can't help you with this. Hopefully your test will be more popular than the one I attempted.

There are two additional codecs that ought to be tested, WavPack Hybrid and OptimFrog DualStream. These are codecs that have never been formally tested in lossy modes, and Somebody should do it™

Given the nature of the test though, it may be too much.

Also, be sure not to crap out rjamorim's tests (or vice-versa). I can give you the samples I used for my test if you'd like, which can save you a sample call. PM me if you're interested

Reply #15 – 2004-02-09 20:30:18

This problem has bothered me as well in the past. Due to the posts I have read by Guruboolez, Garf and many other gifted ABXers here at HA there isn't much doubt in my mind that MPC is, in fact, the best for quality at higher bitrates. However, assumptions like this just don't mesh very well with the core values of HA.org; thus, IMO, they should be proved with a public listening test. Excellent work ScorLibran, I will be happy to participate in your test this spring! As always you add a healthy dose of extra rationality to this forum (not that there isn't enough to begin with).

We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5. Finding equal nominal bitrates for all the encoders, should be easy at --quality 4, we already know the correct settings for 128kb/s nominal on most encoders, but trying to find these values for --quality 4.x (where x≠0) will probably be considerably more difficult. We will have to encode a LOT of samples to do this.

-Tyler

P.S. Lexx is the shit!

Reply #16 – 2004-02-09 21:00:19

if all the samples were problem samples, this would be easy. however, there's no point in using strictly problem samples.

i recommend splitting the test up by encoder. if it's a strictly ABX test, then it won't be quite as time consuming, IMO, as a normal test. you can either tell or you can't. I would like to participate and would be glad to contribute to this (a nice little 'official tester' beneath my name would be nice).

good luck. once you decide on some samples i'll do some private testing and come up with suggestions.

Reply #17 – 2004-02-09 21:03:16

To make this test sensible, you have to remove the 'noise', i.e. the people who don't have the necessary training to differentiate between those codecs.

I recommend to achieve this by either

- doing a pretest, e.g. users have to tell which is the 320 kbps MP3 and which is the original CD (quite easy)

- adding the original source (CD) to the listening test, and nulling every vote that ranks the original worse than one of the compressed samples
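
The second option could be applied to the collected results roughly as in this Python sketch. The record layout, the tester labels and the ratings are all hypothetical.

```python
def valid_results(results):
    """Keep only results where the hidden original was rated at least as
    high as every compressed sample in the same trial."""
    kept = []
    for r in results:
        if all(r["reference"] >= score for score in r["encoded"].values()):
            kept.append(r)
    return kept

# Hypothetical ratings on a 1.0-5.0 scale.
results = [
    {"tester": "A", "reference": 5.0, "encoded": {"mpc": 4.5, "mp3": 3.8}},
    {"tester": "B", "reference": 4.0, "encoded": {"mpc": 5.0, "mp3": 4.2}},
    {"tester": "C", "reference": 5.0, "encoded": {"mpc": 5.0, "mp3": 5.0}},
]
kept = valid_results(results)
print([r["tester"] for r in kept])
```

Tester B rated the original below a compressed sample, so that vote is dropped; a tie, as for tester C, is kept.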

Just my 2 cents ....

Reply #18 – 2004-02-09 21:08:40

Some remarks:

I think neither MP3 nor Vorbis is a contender of MPC at high bitrates. Lame APS is very good but still abx-able at many, many samples (not only problem cases), and even the Insane preset fails obviously on pre-echo material (e.g. castanets, but not only that one).
Vorbis' HF problems were easily noticeable up to -q 8 for me on nearly every quite noisy sample. Pre-echo handling is better than MP3's but still noticeable. Personally, I've only tested version 1, but I can't imagine that a huge leap forward has been made by either 1.0.1 or GT.
Wavpack does consume a lot more bits than MPC standard. Yet there are some deficiencies (according to Guru's post).

I can't really comment on AAC, as I hardly tested that format. Considering that some (earlier versions of) Nero presets failed miserably with classical content, again according to Guru, and that iTunes is not really tuned for high bitrates (or is it? VBR?), I think it's still quite safe to assume that MPC is the best encoder for transparent lossy.

Quote

I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5

I don't think that would work. IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).

Reply #19 – 2004-02-09 21:19:44

Quote

Quote
I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5
I don't think that would work. IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).

So that may mean that we will find that MPC becomes statistically transparent at exactly --quality 5. What would be the problem with that result?

edit: typo

Reply #20 – 2004-02-09 23:07:18

Quote

We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5. Finding equal nominal bitrates for all the encoders, should be easy at --quality 4, we already know the correct settings for 128kb/s nominal on most encoders, but trying to find these values for --quality 4.x (where x≠0) will probably be considerably more difficult. We will have to encode a LOT of samples to do this.

While the above process would be very scientific in approach, the resulting number of test groups would be far more than (most) anyone would want to deal with, especially if the goal is to include similar tests for other codecs. You would run the very real risk of boring your test audience into giving up early, and initiating listening fatigue that could skew the results.

Keep things simple. I would wager most people use whole number 'quality' settings (where applicable) or other significant steps (like --alt-preset standard or extreme). To that end, these settings should be raised or lowered by these significant steps until transparency is found or lost. It would be just as relevant to prove that, in general, transparency of a given codec occurs between 'quality' 4.0 and 5.0, and be far simpler to test. Future testing could isolate which particular setting is the threshold, if any single one exists.

Reply #21 – 2004-02-10 00:16:58

Quote

Quote
We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here to start at --quality 4, then go to --quality 4.1, --quality 4.2 etc, until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5. Finding equal nominal bitrates for all the encoders, should be easy at --quality 4, we already know the correct settings for 128kb/s nominal on most encoders, but trying to find these values for --quality 4.x (where x≠0) will probably be considerably more difficult. We will have to encode a LOT of samples to do this.

While the above process would be very scientific in approach, the resulting number of test groups would be far more than (most) anyone would want to deal with, especially if the goal is to include similar tests for other codecs. You would run the very real risk of boring your test audience into giving up early, and initiating listening fatigue that could skew the results.

Keep things simple. I would wager most people use whole number 'quality' settings (where applicable) or other significant steps (like --alt-preset standard or extreme). To that end, these settings should be raised or lowered by these significant steps until transparency is found or lost. It would be just as relevant to prove that, in general, transparency of a given codec occurs between 'quality' 4.0 and 5.0, and be far simpler to test. Future testing could isolate which particular setting is the threshold, if any single one exists.

I agree with you in terms of tester fatigue/boredom. However I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

We certainly wouldn't find the exact point at which MPC becomes transparent, which ScorLibran originally intended:

Quote

Format............Perceptual Transparency Threshold (nominal bitrate across samples tested)
MPC.................nnn kbps
AAC.................nnn kbps
Vorbis..............nnn kbps

...and so forth

Reply #22 – 2004-02-10 00:52:11

Quote

I agree with you in terms of tester fatigue/boredom. However I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

From what I have read here, even this outcome is being questioned for lack of definitive proof for or against.

Quote

We certainly wouldn't find the exact point at which MPC becomes transparent, which ScorLibran originally intended...

I do not think you are going to find a single exact point for each of these codecs where all samples become transparent. The best you could hope for is a tight range that generally results in transparency. Any of the various problem samples currently at hand show that issues at quality X might or might not be fixed with quality Y.

Reply #23 – 2004-02-10 01:08:27

Quote

Some of these codecs have an 'advertised' setting that is supposed to be transparent, coincidentally averaging around 160-200kbps. Perhaps holding them to this would be appropriate here?

True, but I would not base this test on how the codecs are marketed. They can call whatever they like "transparent" or "CD quality". But ABX results don't lie.

Quote

A pass or failure would determine an appropriate direction (lower or higher) for a second test to determine operational tolerance.

That's exactly what I had in mind.

Quote

Just how many people are going to give you anything except 5.0 for all samples?

(I can think of some - but if you slashdot it to get a large number of listeners, I bet the percentage is low!)

There's also the possibility that people who can hear problems with various samples at the settings you suggest have already reported it here. Maybe you could somehow analyse this data?

Some people will have already made themselves more sensitive to one codec's artefacts than another's. This would likely bias your test.

A rating scale wouldn't even be used for this kind of test. Only ABX. If the tester can get p<0.05, then they "move up" to the next higher encoding rate for the format. If they can't, then their transparency threshold for that sample encoded in that format lies below this rate and above the last one they could differentiate.
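
The "move up until they can't ABX it" rule might look like this in Python. It is a sketch for one tester, one sample and one format; the rates and p-values are invented, and the function name is just for the example.

```python
def bracket_threshold(p_values, alpha=0.05):
    """p_values maps encode rate (kbps) -> ABX p-value.
    Walk up from the lowest rate and stop at the first rate the tester
    cannot ABX. Returns (last_rate_heard, first_rate_not_heard); either
    end is None if the walk never reaches it."""
    last_heard = None
    for rate in sorted(p_values):
        if p_values[rate] < alpha:
            last_heard = rate
        else:
            return last_heard, rate
    return last_heard, None

# Hypothetical p-values for one sample in one format.
session = {128: 0.011, 144: 0.038, 160: 0.227, 176: 0.402}
print(bracket_threshold(session))
```

Here the threshold would sit somewhere between 144 and 160 kbps, so the spacing of the tested rates directly limits how precisely it can be located.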

And "artifact familiarity" won't have a statistical impact if there are enough testers. Some people would be "attuned" to a format's particular artifacts, but many others won't be.

Quote

There are two additional codecs that ought to be tested, WavPack Hybrid and OptimFrog DualStream. These are codecs that have never been formally tested in lossy modes, and Somebody should do it™

I agree that they should be tested at some point against the ones pointed out previously in this thread. But it should really be done in a future test, because a) there will be enough test groups as it is with the formats discussed, and b) I want to first pare down these five most commonly used formats, before tackling others.

Quote

We should start making some preparations for this test right away. It will be exceedingly time consuming to come up with settings for all these different encoders that all have the same nominal bitrates. I suggest, since we are mainly testing Musepack here, that we start at --quality 4, then go to --quality 4.1, --quality 4.2 etc., until we reach statistical transparency, so to speak, presumably somewhere between --quality 4 and 5.

I agree. And the more I think about it and try to "envision" what the test would be like, I'm thinking we should have one test run for each format, close enough together to minimize unfairness by "version variance" between encoders.

And I don't want to only have 3 rates, as I previously stated. It wouldn't be enough. "Vorbis -q 4 isn't transparent to me, but -q 5 is." OK, so transparency for this tester on this sample in this format has been narrowed to within 32-52kbps of the "line". Not accurate enough. I want the scale to be as granular as possible.

As you point out in your example, I'd like to know a sample's transparency to a particular person with a format to within 10kbps or so.

Quote

I recommend splitting the test up by encoder. If it's a strictly ABX test, then it won't be quite as time consuming, IMO, as a normal test. You can either tell or you can't.

My thoughts exactly. I'm hoping it'll make the whole thing more manageable in "smaller chunks". But it would have to be almost a "marathon" of tests. If we wait a month between testing each format, then too many people would say "Yeah, but you tested the old MPC against the new Vorbis v1.3", etc. We could, if possible, prepare for testing all the formats at once (over, say, 6-8 weeks), then we could fire off one test, 11 days, then 3 days to compile and publish results, then fire off the next test, 11 days, 3 days to compile/publish, ...and so forth. Prep time in between tests would be minimal if we were set up at the beginning as much as possible. The whole thing, with 5 formats, would take about 10 weeks.
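The timeline arithmetic above can be checked with a short Python sketch. The 11-day and 3-day figures and the count of five formats come from the post; the start date and the placeholder format names are assumptions, since no date or test order has been fixed.

```python
from datetime import date, timedelta

def build_schedule(formats, start, listen_days=11, compile_days=3):
    """Run the tests back to back: listen, compile/publish, then the next one."""
    schedule = []
    opens = start
    for name in formats:
        closes = opens + timedelta(days=listen_days)
        published = closes + timedelta(days=compile_days)
        schedule.append((name, opens, closes, published))
        opens = published                 # next test starts on publication day
    return schedule

formats = [f"format {i}" for i in range(1, 6)]   # placeholder names
start = date(2004, 3, 1)                          # hypothetical start date
schedule = build_schedule(formats, start)

total_days = (schedule[-1][3] - start).days
print(total_days, "days =", total_days / 7, "weeks")
```

Five cycles of 14 days come to 70 days, so the 10-week estimate holds as long as no extra prep time is needed between tests.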

Quote

To make this test sensible, you have to remove the 'noise', i.e. the people who don't have the necessary training to differentiate between those codecs.

I recommend achieving this by either

- doing a pretest, where users have to work out which is the 320 kbps MP3 and which is the original CD ( quite easy )

- add the original source ( CD ) to the listening test, and null every vote that ranks the original worse than one of the compressed samples

Pre-tests may be required to determine which particular format variants would be the "most fair to test at mid-high bitrates", but there would be no subjective rating. ABX only. It would not be possible to "rate a reference" in this kind of test. With each encoder setting tested, it's just p<0.05 or p>0.05. The former shows that the tester can hear a difference, so the setting is not transparent for them; the latter is consistent with transparency. Maybe we should define a "gray area" of 0.05<p<0.07, perhaps, to show an "exploded view" of the threshold when compiling results. I'm not sure how much value this would hold, though. It can always be determined at the end, and even shown both ways if preferred.
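As a rough illustration of how the results could be compiled with and without a gray area, here is a small Python sketch. A low p-value means the tester reliably told the encode from the original. The band labels, the handling of p exactly equal to 0.05, and the list of p-values are all assumptions made up for the example.

```python
from collections import Counter

def classify(p, alpha=0.05, gray_upper=0.07):
    """Sort one ABX p-value into a reporting band."""
    if p < alpha:
        return "heard a difference"      # setting is not transparent for this tester
    if p < gray_upper:
        return "gray area"               # close to the cutoff, reported separately
    return "no difference shown"         # consistent with transparency

# Invented p-values from six testers on one sample at one setting.
p_values = [0.002, 0.038, 0.055, 0.105, 0.402, 0.598]

with_gray = Counter(classify(p) for p in p_values)
without_gray = Counter(classify(p, gray_upper=0.05) for p in p_values)

print(dict(with_gray))
print(dict(without_gray))
```

Setting gray_upper equal to alpha collapses the middle band, so the same data can be shown both ways when the results are published.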

Quote

I think it's still quite safe to assume that MPC is the best encoder for transparent lossy.

It's that word, "assume", that we will be killing with this test. If MPC wins, no need to "assume" any more. If not, or if it ties for the top position with other formats, then "assumptions" can summarily be corrected.

Quote

IIRC there are some optimizations that kick in at quality level 5 (and are important to quality).

Then, as Tyler says, that is simply the nature of MPC. As in Roberto's tests, we should avoid worrying too much about how formats "scale" their quality settings. If MPC would indeed perform better with a shallower quality "slope" between q4.1 and q5, then maybe it should be modified to do just that.

This idea is simply to test the best encoder version that each of these formats brings to the table when measured at the threshold of perceptual transparency. And as mentioned before, we could spend the next few weeks pre-testing the different versions of each encoder (especially the ones with newer versions), and picking ideal samples for this kind of test.


Reply #24 – 2004-02-10 01:18:34

Quote

Quote

I agree with you in terms of tester fatigue/boredom. However I can already tell you the result of this test if we used integer values for --quality:

--quality 4 = not transparent
--quality 5 = transparent

From what I have read here, even this outcome is being questioned for lack of definitive proof for or against.

True, however I highly doubt we would find this to be otherwise, especially if the conditions are going to be as loose as 90% of samples, as was previously suggested. Have you ever tried to ABX a non-problem sample at --quality 5? Yes there might be some worth in simply testing the current settings of each encoder which are supposed to be transparent, but the test would be extremely tedious and I doubt the results would be that interesting. Perhaps a compromise?

--quality 4
--quality 4.33
--quality 4.66
and we would only do --quality 5 and above if necessary (unlikely, IMO, but not impossible).

or --quality 4
--quality 4.25
--quality 4.5
etc.

I would like the increments to be small, however the answer to the question "How small is too small?" is not immediately apparent to me.
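To compare the candidate step sizes side by side, here is a minimal Python sketch that generates evenly spaced settings between --quality 4 and --quality 5. The helper name is made up, and rounding to two decimals is an assumption (it gives 4.67 where the list above writes 4.66).

```python
def quality_steps(low, high, steps_per_unit):
    """Evenly spaced settings from `low` up to, but not including, `high`."""
    span = high - low
    count = int(span * steps_per_unit)
    return [round(low + i * span / count, 2) for i in range(count)]

thirds = quality_steps(4, 5, 3)
quarters = quality_steps(4, 5, 4)
tenths = quality_steps(4, 5, 10)

for name, grid in (("thirds", thirds), ("quarters", quarters), ("tenths", tenths)):
    flags = [f"--quality {q:g}" for q in grid]
    print(name, len(grid), "settings:", ", ".join(flags))
```

Each extra setting is another ABX run per sample per tester, so the number of settings printed for each grid is a fair proxy for how much listening the finer increments would cost.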
