I'm currently experimenting with a new (MDCT-based) filterbank. It looks very promising.
Pros:
- perfect reconstruction
- completely MDCT-based (no need to worry about designing QMF filters)
- even more flexible (frame- and frequency-adaptive) selectable time/frequency resolutions
Such an approach could result in a highly scalable codec. Any comments?
Interesting. Could you shed some more light on how to achieve a more flexible time/frequency resolution? And what criteria are you using for choosing a specific resolution?
It's basically the same as Vorbis' MDCT for the first stage. Then the temporal resolution of some frequency regions is increased by treating the coefficients of the first MDCT as another time signal, which is MDCT-transformed again in smaller chunks. Due to the frequency/time duality, this second "decorrelation" increases time resolution for those bands.
To compensate for the time domain alias introduced in the first stage some "butterflies" can be applied in the new "subband transform domain" between frames.
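A minimal sketch of the two-stage idea described above, assuming a sine-windowed orthonormal MDCT for both stages. The function names, block sizes (N = 64, M = 8) and the chosen band (bins 32 to 63) are hypothetical and only for illustration; this is not the poster's actual implementation, and the inter-frame "butterflies" are left out.

```python
import numpy as np

def mdct_basis(N):
    """Orthonormal sine-windowed MDCT basis, shape (N, 2N)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    cosines = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    return np.sqrt(2.0 / N) * cosines * window

def mdct(x, N):
    """MDCT with hop N; len(x) must be a multiple of N. Zero-pads both ends."""
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    count = len(padded) // N - 1
    blocks = np.stack([padded[i * N:i * N + 2 * N] for i in range(count)])
    return blocks @ mdct_basis(N).T          # shape (count, N)

def imdct(X, N):
    """Inverse of mdct(): windowed overlap-add, then drop the padding."""
    basis = mdct_basis(N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, coeffs in enumerate(X):
        out[i * N:(i + 2) * N] += coeffs @ basis
    return out[N:-N]

def analyze(x, N, lo, hi, M):
    """Stage 1: MDCT of size N. Stage 2: bins lo..hi-1 of every frame are
    treated as a short 'time signal' and MDCT-transformed with size M."""
    X = mdct(x, N)
    sub = np.stack([mdct(row[lo:hi], M) for row in X])
    X[:, lo:hi] = 0.0                        # band now lives only in `sub`
    return X, sub

def synthesize(X, sub, N, lo, hi, M):
    """Undo stage 2 for the band, then undo stage 1."""
    Y = X.copy()
    for i, s in enumerate(sub):
        Y[i, lo:hi] = imdct(s, M)
    return imdct(Y, N)

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
X, sub = analyze(x, N=64, lo=32, hi=64, M=8)
y = synthesize(X, sub, N=64, lo=32, hi=64, M=8)
print(np.allclose(x, y))
```

Note that the zero padding at the band edges makes the second stage slightly oversampled here (5 blocks of 8 coefficients for 32 input bins); a real codec would need a different treatment of the band boundaries to stay critically sampled.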
Criteria: (1) high spectral resolution vs. (2) high temporal resolution.
(1) makes it possible to shape the quantization noise accurately in the spectral domain. It usually concentrates energy in a few bins (compact energy representation) for tonal/stationary parts.
(2) allows accurate control of the quantization noise's temporal shape and is better suited for quantizing noisy parts at low SNRs.
This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.
Quote: To compensate for the time domain alias introduced in the first stage some "butterflies" can be applied in the new "subband transform domain" between frames.
Since you use an orthonormal transform at both stages, I do not see why these antialiasing butterflies are necessary.
I understand that (1) and (2) are the properties of the resulting transform, but I was interested in the criteria that you use to achieve high spectral resolution for stationary/tonal parts and high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?
However, there is one definite "con" that you did not mention: side information. You have to tell the decoder what transform structure you used. For adaptive framing, this side information is negligible, but for adaptive frequency decompositions, this is not the case, even when you use entropy coding of the side info.
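To get a rough feel for the side-information point, here is a back-of-the-envelope sketch. It assumes fixed-length codes (so it is an upper bound before any entropy coding), and all numbers (4 selectable resolutions, 32 bands, 44.1 kHz, hop of 1024 samples) are hypothetical, not taken from any actual codec.

```python
import math

def side_info_bits_per_frame(num_units, num_choices):
    """Bits needed to signal one of `num_choices` resolutions for each of
    `num_units` independently switched units, using fixed-length codes."""
    return num_units * math.ceil(math.log2(num_choices))

def side_info_bitrate(bits_per_frame, sample_rate, hop):
    """Side-information rate in bit/s for a given frame hop in samples."""
    return bits_per_frame * sample_rate / hop

# Adaptive framing only: one decision per frame.
framing_bits = side_info_bits_per_frame(num_units=1, num_choices=4)

# Adaptive frequency decomposition: one decision per band, per frame.
band_bits = side_info_bits_per_frame(num_units=32, num_choices=4)

print(framing_bits, side_info_bitrate(framing_bits, 44100, 1024))
print(band_bits, side_info_bitrate(band_bits, 44100, 1024))
```

Under these assumptions the per-band signalling costs 32 times as much as the per-frame switch, which is why it matters mostly at low bitrates.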