QMF + MDCT vs. MDCT only

I was reading through ffmpeg's ATRAC3 decoder source code to learn more about the format.  I noticed it uses a QMF to split a signal into 4 equal subbands, and then uses a 512 point MDCT to code each. 

What is the advantage of this approach versus the more common 2048 point direct MDCT? Is it to save memory by using smaller transforms, or is there some coding advantage? It seems like an odd way to make a codec.

Reply #1
I was reading through ffmpeg's ATRAC3 decoder source code to learn more about the format.  I noticed it uses a QMF to split a signal into 4 equal subbands, and then uses a 512 point MDCT to code each. 

What is the advantage of this approach versus the more common 2048 point direct MDCT? Is it to save memory by using smaller transforms, or is there some coding advantage? It seems like an odd way to make a codec.


Given the age of the codec and the requirement for hardware, maybe there was an off-the-shelf chip that would do the 512-point iMDCT (or a cheap DSP chip that had limited MIPS), and it was cheaper to buy 4 of those than one chip that could do the whole thing.

Reply #2
According to the Sony documents I've found, the original ATRAC codecs were implemented on 16-bit fixed-point DSP cores using software emulation of 32-bit fixed-point operations for the MDCT, so I do not think hardware acceleration was a factor. Additionally, different ATRAC variants have used different arrangements of the QMF over the years, suggesting again that the QMF itself was desirable and not a workaround for hardware limitations of some particular era.

Reply #3
According to the Sony documents I've found, the original ATRAC codecs were implemented on 16-bit fixed-point DSP cores using software emulation of 32-bit fixed-point operations for the MDCT, so I do not think hardware acceleration was a factor. Additionally, different ATRAC variants have used different arrangements of the QMF over the years, suggesting again that the QMF itself was desirable and not a workaround for hardware limitations of some particular era.


There were several ideas. The one Brandenburg and I did was to change the time/frequency tradeoff in different frequency ranges.

That was also possible in ATRAC (which was later), MPEG-2 AAC SSR profile, etc.

None of these have ever achieved common use. There is a fundamental property of any tree filterbank that eats you alive, and that's that a tree filterbank must, in general, have a longer impulse response for a given frequency resolution than a direct filterbank. In practice, pre-echo and/or frequency aliasing bite you on the butt.

There isn't any desirable use for QMF or PQMF, including in MP3, where it was REQUIRED because somebody INSISTED WE USE THEIR FILTERBANK OR ELSE.

Which is a ridiculous reason, but that's the reason, boys and women.

It would have been possible for MP3 to have been a 512 MDCT, and have substantially better performance, but noooo, that wasn't ok, it didn't use the PQMF so it wasn't ok.

There's a long story there, but Guinness or good Zinfandel are required, perhaps along with the Animals' "We Gotta Get Out of This Place" and/or "Money for Nothing" by Dire Straits.
-----
J. D. (jj) Johnston

Reply #4
There isn't any desirable use for QMF or PQMF, including in MP3, where it was REQUIRED because somebody INSISTED WE USE THEIR FILTERBANK OR ELSE.

Which is a ridiculous reason, but that's the reason, boys and women.

It would have been possible for MP3 to have been a 512 MDCT, and have substantially better performance, but noooo, that wasn't ok, it didn't use the PQMF so it wasn't ok.


Hmm ok. I was hoping for a more satisfying reason than "they're idiots" but I guess that will do.

Reply #5
There were several ideas. The one Brandenburg and I did was to change the time/frequency tradeoff in different frequency ranges.

There isn't any desirable use for QMF or PQMF,...

Someone (Jürgen, IIRC) once told me that the reason for the 4-band QMF was to be able to apply band-wise temporal gain envelopes to the signal, i.e. flatten the bands temporally, before doing the MDCT in order to reduce pre-echoes. I guess that's not needed any more since we have TNS now?

Chris

Reply #6
There were several ideas. The one Brandenburg and I did was to change the time/frequency tradeoff in different frequency ranges.

There isn't any desirable use for QMF or PQMF,...

Someone (Jürgen, IIRC) once told me that the reason for the 4-band QMF was to be able to apply band-wise temporal gain envelopes to the signal, i.e. flatten the bands temporally, before doing the MDCT in order to reduce pre-echoes. I guess that's not needed any more since we have TNS now?

Chris


Well, that was one of the alleged ideas. It turned out to be a whole lot harder to do than was originally imagined. 

TNS is much more useful, and can be applied as needed to the frequency coefficients.

The SSR profile also allows one to turn off the highest band to scale sampling rates.  That was, at least, the theory.  It would work. I'm not sure it ever turned out to happen.
-----
J. D. (jj) Johnston

Reply #7
There isn't any desirable use for QMF or PQMF, including in MP3, where it was REQUIRED because somebody INSISTED WE USE THEIR FILTERBANK OR ELSE.

Which is a ridiculous reason, but that's the reason, boys and women.

It would have been possible for MP3 to have been a 512 MDCT, and have substantially better performance, but noooo, that wasn't ok, it didn't use the PQMF so it wasn't ok.


Hmm ok. I was hoping for a more satisfying reason than "they're idiots" but I guess that will do.


Um, no, they're not idiots.  Any number of people have tried various split-plus-mdct filterbanks.

Including myself and Brandenburg (Hybrid coder or something like that, published circa 1990). They have mostly turned out to be flops, a pain to deal with, etc.

The MP3 filterbank in specific, though, was not idiocy, it was overt, intentional politics, intended, I think, to make Layer 3 unattractive compared to Layer 2.

We all know how that worked out.
-----
J. D. (jj) Johnston

Reply #8
Um, no, they're not idiots.  Any number of people have tried various split-plus-mdct filterbanks.

Including myself and Brandenburg (Hybrid coder or something like that, published circa 1990). They have mostly turned out to be flops, a pain to deal with, etc.

The MP3 filterbank in specific, though, was not idiocy, it was overt, intentional politics, intended, I think, to make Layer 3 unattractive compared to Layer 2.

We all know how that worked out.


I was referring to ATRAC3.  I've heard all about MP3 over the years.  But it sounds like there was some technical reason after all.

Quote
Someone (Jürgen, IIRC) once told me that the reason for the 4-band QMF was to be able to apply band-wise temporal gain envelopes to the signal, i.e. flatten the bands temporally, before doing the MDCT in order to reduce pre-echoes.


Is this the "gain control" feature the ATRAC docs mention?



Reply #11
There isn't any desirable use for QMF


Do you think all this is applicable to HE-AAC?


Sorry, let me be a bit more specific: there is no use for a tree-structure filterbank followed by a bunch of MDCTs in a waveform codec. When you start to do modelling or SBR, you're not trying to do high-quality, and I do think the rules change.

When we get to stuff like SBR, that's a different animal.

There is a basic problem in tree-structure filter banks if you're trying to achieve high quality coding, and that is that the length of the impulse response must be longer than that of a single filterbank doing the same resolution.

The "gain control" mechanisms just haven't panned out too well. They should apply some kind of control a lot like TNS, but consider (for both) what happens over time with the time aliasing. It's not pleasant to figure out and deal with.
-----
J. D. (jj) Johnston

Reply #12
I was reading through ffmpeg's ATRAC3 decoder source code to learn more about the format.  I noticed it uses a QMF to split a signal into 4 equal subbands, and then uses a 512 point MDCT to code each. 

What is the advantage of this approach versus the more common 2048 point direct MDCT? Is it to save memory by using smaller transforms, or is there some coding advantage? It seems like an odd way to make a codec.


For ATRAC3, the reason is to apply a time-domain gain control tool to the 4 QMF outputs for better temporal compensation. This time-domain gain control tool is the analogue of the TNS tool used in the MDCT frequency domain.

As in the MP3 encoder/decoder: splitting the signal with a 32-band filterbank before applying the MDCT allows low-pass filtering of the audio signal by simply zeroing the outputs of the upper bands of the 32-band filterbank before applying the 32 individual 18-point MDCTs.

Notice that you can't achieve this by just zeroing the output of upper coefficients of a single block of 2048 MDCT in AAC as you can do in MP3. It will lead to distortions in the reconstructed time domain signal.

Reply #13
Notice that you can't achieve this by just zeroing the output of upper coefficients of a single block of 2048 MDCT in AAC as you can do in MP3. It will lead to distortions in the reconstructed time domain signal.


Uh, yes you can, as long as it's consistent across blocks. The "distortions" are the absence of the high frequencies, which is kinda expected when you filter them out.

If you do this only in 1 block of many, yeah, it's going to be ... weird. And wrong.
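
A minimal sketch of this point, assuming a textbook sine-windowed MDCT with 50% overlap rather than AAC's actual filterbank. The names `mdct_frames` and `imdct_frames`, the block size and the tone frequencies are all illustrative choices, not taken from any codec. The upper half of the lines is zeroed in every block, and the result is compared against the low tone alone:

```python
import math

N = 64  # lines per block; each block spans 2*N input samples (illustrative size)

# Sine window (meets the Princen-Bradley condition) and the MDCT cosine table.
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]
COS = [[math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
        for n in range(2 * N)] for k in range(N)]

def mdct_frames(x):
    """Windowed MDCT of x with hop N; returns one list of N lines per block."""
    frames = []
    for start in range(0, len(x) - 2 * N + 1, N):
        seg = [WINDOW[n] * x[start + n] for n in range(2 * N)]
        frames.append([sum(c * s for c, s in zip(row, seg)) for row in COS])
    return frames

def imdct_frames(frames):
    """Windowed inverse MDCT with overlap-add; exact for samples N .. len-N-1."""
    out = [0.0] * (N * (len(frames) + 1))
    for t, lines in enumerate(frames):
        for n in range(2 * N):
            v = sum(lines[k] * COS[k][n] for k in range(N))
            out[t * N + n] += WINDOW[n] * (2.0 / N) * v
    return out

length = 12 * N
low = [math.cos(math.pi * 8.5 / N * n) for n in range(length)]    # centre of line 8
high = [math.cos(math.pi * 56.5 / N * n) for n in range(length)]  # centre of line 56
signal = [a + b for a, b in zip(low, high)]

frames = mdct_frames(signal)
unfiltered = imdct_frames(frames)

# Zero lines 32..63 in EVERY block: a consistent low-pass.
filtered = imdct_frames([f[:N // 2] + [0.0] * (N // 2) for f in frames])

interior = range(N, length - N)  # samples covered by two overlapping blocks
recon_err = max(abs(unfiltered[n] - signal[n]) for n in interior)
lowpass_err = max(abs(filtered[n] - low[n]) for n in interior)
removed = max(abs(filtered[n] - signal[n]) for n in interior)
print(recon_err, lowpass_err, removed)
```

Here `recon_err` sits at rounding level, and `lowpass_err` stays small next to the unit-amplitude high tone that was removed. The tones are deliberately far from the cutoff; content close to a hard cutoff is where the aliasing discussed in the following replies shows up.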
-----
J. D. (jj) Johnston

Reply #14
Woodinville, I think that zeroing a band of MDCT coefficients will bring some nonlinear distortion (or modulation, if you will) near the cutoff frequency, even when applied consistently to all blocks. The amount and spectral distribution will depend on the window shape. But the same should be true for PQMF, no? So, I don't agree with your argument, wong keat wai, either.

Reply #15
Woodinville, I think that zeroing a band of MDCT coefficients will bring some nonlinear distortion (or modulation, if you will) near the cutoff frequency, even when applied consistently to all blocks. The amount and spectral distribution will depend on the window shape. But the same should be true for PQMF, no? So, I don't agree with your argument, wong keat wai, either.


Depends on how sharp the edge of the zeroing is. If it's windowed over some small number of lines, then the aliasing will be very minimal.
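
One way to picture "windowed over some small number of lines" is a per-line gain mask with a short taper instead of a hard step. This is only a sketch: the raised-cosine shape, the function names and the numbers (1024 lines, cutoff at line 768, 8-line taper) are assumptions made for illustration, not something specified in the thread.

```python
import math

def lowpass_gains(num_lines, cutoff_line, taper_lines):
    """Per-line gains: 1 below the transition, 0 above, raised cosine in between.

    The transition is centred on cutoff_line and spans taper_lines lines.
    """
    start = cutoff_line - taper_lines / 2.0
    gains = []
    for k in range(num_lines):
        pos = (k + 0.5 - start) / taper_lines  # 0 at start of taper, 1 at its end
        if pos <= 0.0:
            gains.append(1.0)
        elif pos >= 1.0:
            gains.append(0.0)
        else:
            gains.append(0.5 * (1.0 + math.cos(math.pi * pos)))
    return gains

def apply_gains(frames, gains):
    """Scale every block with the same gains so the filter is consistent over time."""
    return [[g * c for g, c in zip(gains, frame)] for frame in frames]

gains = lowpass_gains(num_lines=1024, cutoff_line=768, taper_lines=8)
print(gains[762:774])
```

The same `gains` list is applied to every block, which is the "consistent across blocks" condition from the earlier reply.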
-----
J. D. (jj) Johnston

Reply #16
Yes, definitely. PQMF may be somewhat better for such a cut-off because it has "gentler" filters (not sure on this though - never compared).
Anyway, I agree that PQMF probably doesn't have any technical meaning in MP3 - it would work (somewhat) better with just plain MDCT. It could definitely be helpful for independent adjustment of T/F resolution in frequency bands, but it wasn't implemented after all.

Reply #17
You can actually filter pretty well in the MDCT domain. Think of the "MDCT signal" as a 2D signal (time + frequency). For each "frequency line" you can use a certain "compact" 2D impulse response that spans 3 samples in time and, say, 7 samples in frequency. If you want a perfectly good lowpass filter with a certain transition band you only need to touch MDCT samples near the transition band and do a little 2D convolution. For zero-phase filters with smooth transitions (w.r.t. frequency/amplitude response) this can be approximated well by simple band-dependent scaling of the samples.

I also wouldn't say "nonlinear" at least not in a mathematical sense. MDCT -> zeroing some bands -> iMDCT is a perfectly linear operation. :-) You will just experience some aliasing near the rather harsh cut-off point.
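
The linearity claim can be checked numerically. The sketch below uses a single unwindowed block and a toy size (N = 8), both assumptions chosen to keep it short; windowing and overlap-add are themselves linear steps, so they would not change the conclusion. `zero_upper_bands` is a hypothetical helper name.

```python
import math
import random

N = 8  # lines per block; one block covers 2*N samples (toy size)
COS = [[math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
        for n in range(2 * N)] for k in range(N)]

def mdct(block):
    """Unwindowed MDCT of one block of 2*N samples."""
    return [sum(COS[k][n] * block[n] for n in range(2 * N)) for k in range(N)]

def imdct(lines):
    """Unwindowed inverse MDCT; the output still contains time aliasing."""
    return [sum(COS[k][n] * lines[k] for k in range(N)) / N for n in range(2 * N)]

def zero_upper_bands(block, keep=N // 2):
    """MDCT -> zero the lines from `keep` upward -> iMDCT, for a single block."""
    lines = mdct(block)
    return imdct(lines[:keep] + [0.0] * (N - keep))

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2 * N)]
y = [random.uniform(-1, 1) for _ in range(2 * N)]
a, b = 0.7, -1.3

# f(a*x + b*y) versus a*f(x) + b*f(y)
combined = zero_upper_bands([a * u + b * v for u, v in zip(x, y)])
separate = [a * u + b * v
            for u, v in zip(zero_upper_bands(x), zero_upper_bands(y))]
max_diff = max(abs(p - q) for p, q in zip(combined, separate))
print(max_diff)
```

The two results agree to within rounding error, which is what superposition requires. Whatever the hard cutoff does near the edge is therefore aliasing, not nonlinear distortion in the mathematical sense.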

The nice thing about typical PQMFs is that the overlap of filter responses between neighbouring bands is small relative to the bands' bandwidth. Say, you're using MP3's 32 band filter bank. Try this:
  • Keep all samples between band 0 (inclusive) and band 24 (exclusive)
  • Apply a simple zero-phase filter in band 24 with impulse response [-1 0 9 16 9 0 -1]./32 or alternatively this 2nd-order Butterworth filter: H(z) = 0.2929 * (1 + 2 z^-1 + z^-2)/(1 + 0.1715 z^-2)
  • Set all samples between band 25 (inclusive) and band 32 (exclusive) to zero.


This corresponds to a cutoff of 24.5/32 * fs/2 Hz ("twenty-four and a half bands"); for fs = 44.1 kHz this is 16.88 kHz. The amount of resulting aliasing is insignificantly low because the "transition happens" in the center of one band.
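
The recipe above can be sketched as follows. The code assumes the subband samples are already available as 32 lists, one per band, and it uses the FIR variant of the band-24 filter with zero padding at the ends. `cutoff_hz`, `zero_phase_fir` and `lowpass_subbands` are illustrative names, not functions from any MP3 library.

```python
def cutoff_hz(bands, total_bands, fs):
    """Cutoff when `bands` (possibly fractional) of `total_bands` are kept."""
    return bands / total_bands * fs / 2.0

# Taps from the post: [-1 0 9 16 9 0 -1] / 32, symmetric about the centre tap.
HALF_BAND = [t / 32.0 for t in (-1, 0, 9, 16, 9, 0, -1)]

def zero_phase_fir(samples, taps=HALF_BAND):
    """Centred (zero-phase) FIR over one band's samples, zero-padded at the ends."""
    half = len(taps) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, tap in enumerate(taps):
            idx = i + j - half
            if 0 <= idx < len(samples):
                acc += tap * samples[idx]
        out.append(acc)
    return out

def lowpass_subbands(subbands):
    """subbands[b] holds band b's samples over time (32 bands assumed)."""
    result = [list(s) for s in subbands[:24]]          # bands 0..23 untouched
    result.append(zero_phase_fir(subbands[24]))        # band 24 filtered
    result += [[0.0] * len(s) for s in subbands[25:]]  # bands 25..31 zeroed
    return result

print(cutoff_hz(24.5, 32, 44100))  # 16882.03125
```

The taps sum to 1, so a constant subband signal passes unchanged, and they cancel on an alternating-sign input, so the top of the band's own spectrum is removed. That places the transition in the middle of band 24.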

There's actually very little difference between what a fixed-window-size MDCT and a PQMF do. This 2D convolution idea also applies to PQMFs. The only difference is that if you filter there's less interaction between neighbouring bands but more interaction between neighbouring frames (time slots). So, instead of a 3x7 kernel, you'd use a 7x3 kernel per band.

Reply #18
Anyway, I agree that PQMF probably doesn't have any technical meaning in MP3 - it would work (somewhat) better with just plain MDCT.

Oh, it has a technical outcome, at least: it makes the undesired part of the standard more complex to implement.
-----
J. D. (jj) Johnston

Reply #19
Not to mention a lot slower to decode since you run through a filterbank and an iMDCT.