MPC v1.15f (alpha version) ready for download

Reply #25 – 2003-01-01 19:45:38

Quote

JohnV, I agree with your explanation but in my opinion, the issue could be explained in a simpler way.

Take Vorbis (a transform codec) and MPC (a subband codec).

Force both of them to a 64kbps-ish bitrate.

What does Vorbis do ? It encodes fewer tonal components, with less precision. Result: More tonal signal => less noise.
Consequence: better efficiency on tonal signals at that bitrate, softer sound and smeared transients.

What does MPC do ? It encodes most subband samples with less precision. Result: More noise => less tonal signal.
Consequence: better efficiency on noisy/aggressive signals at that bitrate, harsher sound.

The reason why mpc often sounds worse on highly tonal instruments at this TOO LOW bitrate, is that the codec IS FORCED to add noise almost everywhere to gain bits... and that noise is greater than the allowed masking threshold, almost everywhere in the music.

That's also the reason why, if you lower the bitrate really too much, then MPC - most of the time - will artifact almost everywhere.

Nearly completely wrong.

The problem is situated in the psycho model and has nothing to do
with the problem of a 2048 tap single overlap MDCT vs. a 512 tap
15-time overlap MDCT.

In some situations a noise level of -30 dB is audible and it is difficult
to detect these situations by numerical means. This problem is
independent of the encoder! MPC can encode with noise levels
down to -86 dB, MP3 down to -106 dB. But this doesn't play any role, because
the problem is that the psycho model says -23 dB when -33 dB is needed.
You can increase SMR by 10 dB, but this increases bitrate for all music by
about 120 kbps, so this is possible for MPC and also Ogg Vorbis (partially
also for MP3), but it is no real solution.
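
As a rough sanity check on that figure: under the textbook rule that each extra bit of quantizer resolution buys about 6.02 dB of signal-to-noise ratio, 10 dB more SMR costs roughly 1.66 bits per coded sample. The sketch below turns that into a bitrate. The function name, the 44.1 kHz stereo input and the `coded_fraction` parameter are assumptions made for illustration; none of them comes from the Musepack encoder.

```python
import math

# Rule of thumb for a uniform quantizer: one extra bit is worth about 6.02 dB.
DB_PER_BIT = 20.0 * math.log10(2.0)

def extra_bitrate_kbps(delta_smr_db, sample_rate=44100, channels=2, coded_fraction=1.0):
    """Estimate the extra bitrate needed to lower the noise floor by delta_smr_db.

    coded_fraction is a hypothetical knob: the share of samples that
    actually has to carry the extra precision (1.0 = every sample).
    """
    extra_bits_per_sample = delta_smr_db / DB_PER_BIT
    coded_samples_per_second = sample_rate * channels * coded_fraction
    return extra_bits_per_sample * coded_samples_per_second / 1000.0

full_band = extra_bitrate_kbps(10.0)
partial = extra_bitrate_kbps(10.0, coded_fraction=0.82)
print(f"all samples: {full_band:.1f} kbps, 82% of samples: {partial:.1f} kbps")
```

If every sample had to carry the extra precision, the cost would be about 146 kbps. With about 82% of the samples affected (an arbitrary fraction picked for this example) it lands close to the 120 kbps mentioned above, so the order of magnitude is plausible.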

It is true that MP3, AAC and Ogg Vorbis can compress tonal signals a little bit better
than MP1, MP2 and Musepack without LPC.

Another remark:
Be careful with any posting here in the forum. Most postings are a dangerous
mixture of wrong information and partially wrong information. Correct information
is very rare and highly correlated with some names I don't want to mention.

Most is pure speculation and waiting for an account from an opposing point of view.

Reply #26 – 2003-01-02 01:42:09

Quote

Nearly completely wrong.

Give 'em hell Frank!

Quote

Most is pure speculation and waiting for an account from an opposing point of view.

It's called antagonism. The temptation to make comments just to invoke a response happens to the best of us.

Reply #27 – 2003-01-02 08:54:36

Frank, It's good to see you back here, I really appreciate your knowledge and very informative posts.

About your comments, I think that the situation here is not as bad as you say, although sometimes that happens. Also, note that everybody can be and has been wrong sometimes, including me, you, the admins, etc. Please don't take this as an offense, I just wanted to say that everybody has the 'right' to make an occasional mistake. As long as it is corrected...

Reply #28 – 2003-01-04 16:42:54

Quote

Quote
Theoretically what can we expect from this new tonality estimation?

Maybe Musepack not freaking out on 2000 year old string instruments anymore that guruboolez must have dug out of the Chinese mud...

Hmm.. I take offence at this post.. perhaps it's just me, but isn't this representative of a highly intolerant and uneducated view?

The erhu isn't unknown, on the contrary, it's a staple of most Chinese orchestral music. Just because you don't listen to Chinese orchestral music ( not that I really like it myself .. ) or the fact that you don't know much about it doesn't give you the right to diss on it.. In fact, the lack of knowledge all the more means that you shouldn't really degrade something you don't know. To say that it was dug out of the Chinese mud....

Reply #29 – 2003-01-05 00:00:20

Don't be hurt. You really just misinterpreted him.

Reply #30 – 2003-01-06 09:30:22

Quote

Quote
JohnV, I agree with your explanation but in my opinion, the issue could be explained in a simpler way.

Take Vorbis (a transform codec) and MPC (a subband codec).

Force both of them to a 64kbps-ish bitrate.

What does Vorbis do ? It encodes fewer tonal components, with less precision. Result: More tonal signal => less noise.
Consequence: better efficiency on tonal signals at that bitrate, softer sound and smeared transients.

What does MPC do ? It encodes most subband samples with less precision. Result: More noise => less tonal signal.
Consequence: better efficiency on noisy/aggressive signals at that bitrate, harsher sound.

The reason why mpc often sounds worse on highly tonal instruments at this TOO LOW bitrate, is that the codec IS FORCED to add noise almost everywhere to gain bits... and that noise is greater than the allowed masking threshold, almost everywhere in the music.

That's also the reason why, if you lower the bitrate really too much, then MPC - most of the time - will artifact almost everywhere.

Nearly completely wrong.

The problem is situated in the psycho model and has nothing to do
with the problem of a 2048 tap single overlap MDCT vs. a 512 tap
15-time overlap MDCT.

In some situations a noise level of -30 dB is audible and it is difficult
to detect these situations by numerical means. This problem is
independent of the encoder! MPC can encode with noise levels
down to -86 dB, MP3 down to -106 dB. But this doesn't play any role, because
the problem is that the psycho model says -23 dB when -33 dB is needed.
You can increase SMR by 10 dB, but this increases bitrate for all music by
about 120 kbps, so this is possible for MPC and also Ogg Vorbis (partially
also for MP3), but it is no real solution.

It is true that MP3, AAC and Ogg Vorbis can compress tonal signals a little bit better
than MP1, MP2 and Musepack without LPC.

Another remark:
Be careful with any posting here in the forum. Most postings are a dangerous
mixture of wrong information and partially wrong information. Correct information
is very rare and highly correlated with some names I don't want to mention.

Most is pure speculation and waiting for an account from an opposing point of view.

Thank you for your answer, Frank.
Hmm. It wasn't my goal to create antagonism when I posted my message. I didn't expect such opposition, but your post was a pleasant surprise :-)
There will always be speculation going on. Of course I'm part of it. Nobody can see the whole picture with all details - and as you certainly know, it takes time and effort to dig into another person's source code, especially for a codec. For the rest however, you can safely assume I know what I'm talking about.

Okay, to the point now.

You're saying that basically, 32xPQF and MDCT are equally useful for ~64 kbps audio compression, and only the psymodel makes the difference ? I can't agree with that.

However, I really appreciated your insight about the difficulty of accurately choosing the SMR in the psymodel.

1) In theory yes: with optimal prediction in the time domain, subbanding would get all the advantages of MDCT (MDCT would then become mostly useless for both audio and video). Only the psymodel would then matter. Unfortunately, in my experience, at very low bitrates subband coding (+LPC) is - at best - not really adequate to beat MDCT.

2-a ) Even though Musepack's current psymodel is suboptimal for low bitrates, most of the SMR under-estimation is certainly caused by the aggressive command-line settings that are used to reach ~64kbps in the first place. So the problem *does happen* in the psycho model, but it is *not* the fault of the psycho model itself. After all, how can you encode properly if you can't reach the SMR you need :-(

2-b ) In other words: In the --quality 5 preset, Musepack will use some headroom to compensate for possible SMR inaccuracies and reach (near-)transparency in most cases. At 64kbps (0.7 bit per sample !), there's simply no room for this "luxury" !
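
The 0.7 figure is easy to verify, assuming 44.1 kHz stereo input (CD audio; the sample rate is my assumption here, and the helper name is just for illustration):

```python
def bits_per_sample(bitrate_bps, sample_rate=44100, channels=2):
    """Average number of bits available for each PCM sample of the input."""
    return bitrate_bps / (sample_rate * channels)

low = bits_per_sample(64_000)    # the forced low-bitrate case discussed here
mid = bits_per_sample(128_000)   # for comparison only
print(f"64 kbps: {low:.2f} bit/sample, 128 kbps: {mid:.2f} bit/sample")
```

At 64 kbps that gives about 0.73 bit per sample, against 16 bits per sample in the uncompressed source, which is why there is so little left over for safety margins.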

I actually did (and still do) experiment with audio compression prototypes, and I'm still convinced that the reason why this kind of subband codec (Musepack) will usually need more bits (compared with Vorbis) to encode a tonal signal satisfactorily (NOT transparently), results from the combination of two facts:
- the encoding process is being done in (near-) time domain, and
- the audio signal cannot always be predicted accurately (even if no transient happens). Compare this with MP3/Vorbis where similarity between successions of the same MDCT spectral coefficient can be exploited more easily.
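
To illustrate the second point with a toy example (this is NOT Musepack's or anyone's actual predictor, just a minimal sketch with made-up names and frequencies): a single steady sine obeys x[n] = 2*cos(w)*x[n-1] - x[n-2] exactly, so a fixed 2-tap predictor leaves no residual. As soon as the signal contains anything the predictor was not tuned for, a residual appears, and that residual has to be paid for in bits.

```python
import math

def sine(freq_hz, n, sample_rate=44100.0, amp=1.0):
    """n samples of a sine wave starting at phase zero."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    return [amp * math.sin(w * i) for i in range(n)]

def two_tap_residual(signal, freq_hz, sample_rate=44100.0):
    """Error of the fixed predictor x[n] ~ 2*cos(w)*x[n-1] - x[n-2]."""
    c = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    return [signal[i] - (c * signal[i - 1] - signal[i - 2])
            for i in range(2, len(signal))]

def peak(values):
    return max(abs(v) for v in values)

# A steady 1 kHz tone, and the same tone plus a weaker 1370 Hz partial.
pure = sine(1000.0, 512)
mixed = [a + b for a, b in zip(pure, sine(1370.0, 512, amp=0.3))]

pure_peak = peak(two_tap_residual(pure, 1000.0))    # only rounding error remains
mixed_peak = peak(two_tap_residual(mixed, 1000.0))  # the second partial leaks through
print(f"pure tone residual: {pure_peak:.2e}, two-tone residual: {mixed_peak:.2e}")
```

A higher-order, adaptive predictor would of course handle the two-tone case too. The point is only that prediction quality depends on how well the model matches the signal, and real instruments are rarely that well behaved.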

In other words, at low bitrates a pure transform-domain codec can't encode a tonal signal perfectly, but at decoding time it still generates a sine bank that mimics the tonal signal reasonably well - except for the usual stereo, flanger, pre/post-shoot/echo or transient problems of course.

How could a subband codec possibly lower the bitrate to the same point ? I really don't see where the bits could be taken from. Obviously, @ 0.7 bit per sample on average, and with few masking effects to exploit, noise HAS TO be added pretty much everywhere, and at too high a level. This can be avoided if, and only if, the prediction algorithm is exceptional and the entropy coding efficient enough (arith. coding or sample grouping - as in SV8) ! In that case the codec would make most MDCT-based codecs obsolete, even in the worst low-bitrate situations.

Now an interesting (I think) remark: we all know that a transform codec does the same thing as Musepack (put noise bursts into narrow bands of the signal). What's interesting is that at low bitrates (~64 kbps) the "killed" and most aggressively quantized MDCT coefficients smooth out the time-domain waveform in a subtle way, adding a kind of noise that sounds quite different (less "rough/icy", more "watery/soft") from that of a subband codec (which creates band-limited bursts of white noise). The "texture" of the sound is an important point in low-bitrate encoding IMHO, and comes down to each user's personal preference.

My point is that if, at low bitrates, Musepack sounds quite good, it's only because of its accurate psy model and good entropy coding, which compensate (to a certain extent) for the bigger burden of time-domain encoding at forced low bitrate. Of course, at higher average "bits-per-sample", the time-domain encoding will become a strong point.

At low bitrates, the efficiency of time-domain coding is closely tied to the type of signal involved.

About the SMR again - we have seen that Musepack, to reach ~64kbps, MUST sometimes take the risk of estimating too low an SMR. To get there, one has to force the codec to use overly aggressive SMR reductions in many, many places. This can be heard - and again, I don't think this can be called a psymodel "problem".

Now my claim is that with (for example) Vorbis, the encoding is easier at these bitrates, because we can also cut into time resolution to gain bits - and for that reason, the encoder can afford to use safer (and statistically higher) SMR values on highly tonal sounds.

Of course, as soon as masking or violent transients are involved, mpc might take the lead again.

To sum it up, I do NOT think that taking even more risks (in the psymodel) in estimating the suitable SMR would solve the problem of bit shortage. I think most improvements would come from predicting the signal better - but again, for me compression has always been "prediction + arithmetic coding", so I admit my opinion might be biased ;-)

Keep up the excellent work, Frank. Your codec is King B)

* * *
To complete my answer and clear up any confusion (for non-mpc users) concerning my post:
- don't get me wrong, Musepack, to me does a beautiful job at audio compression. I personally think that subbanding is a more elegant solution than MDCT.
- it's just an experiment, with low bitrates reached using non-transparent mpc settings.

[I've been editing this post to make it clearer to read]
