Topic: downside of transform coders / enhancement idea (Read 4253 times)

downside of transform coders / enhancement idea

What do you think are the weaknesses of common transform coders for high-quality encoding? Can it be summarized as "lack of high temporal resolution => pulpy transients"?
I'm just curious about others' opinions on this.

I'm currently experimenting with a new (MDCT based) filterbank.
It looks very promising.

A comparison with a hybrid QMF/Wavelet+MDCT approach like ATRAC:

pros:
- perfect reconstruction
- completely MDCT based (no need to design QMF filters)
- even more flexible (frame- and frequency-adaptive) selectable
  time/frequency resolutions.

cons:
- slightly more transform work to do
- introduces a higher delay (depending on whether I use
  temporal alias reduction or not)

Such an approach could result in a highly scalable
codec. Any comments?


Reply #1
Actually, I don't really understand why the MDCT is called a transform. It belongs to the class of perfect-reconstruction (PR) cosine-modulated subband filterbanks. It is basically a type of QMF filterbank!


Reply #2
Quote
I'm currently experimenting with a new (MDCT based) filterbank.
It looks very promising.

pros:
- perfect reconstruction
- completely MDCT based (no need to design QMF filters)
- even more flexible (frame- and frequency-adaptive) selectable
  time/frequency resolutions.

Such an approach could result in a highly scalable
codec. Any comments?


Interesting. Could you shed some more light on how to achieve a more flexible time/frequency resolution? And what criteria are you using for choosing a specific resolution?

Greetz,


Reply #3
Quote
Interesting. Could you shed some more light on how to achieve a more flexible time/frequency resolution? And what criteria are you using for choosing a specific resolution?

Hard to explain in a few words...
It's basically the same as Vorbis' MDCT for the first stage. Then the temporal resolution of some frequency regions is increased by treating the MDCT coefficients of the first stage as another time signal, which is again MDCT-transformed in smaller chunks. Due to the frequency/time duality, this second "decorrelation" increases time resolution for those bands. To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.
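As an illustration of the idea (my own sketch, not Sebastian's code), here is a forward-direction-only cascade in Python/NumPy: one long first-stage MDCT, then short 50%-overlapped MDCTs over a sub-range of the first-stage coefficients. All sizes are made up for the example; edge frames and the alias-reduction butterflies are omitted, so the second stage here is not exactly critically sampled.

```python
import numpy as np

def mdct(x):
    """Plain (unwindowed) MDCT: 2N input samples -> N coefficients."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ x

rng = np.random.default_rng(0)

# First stage: one long MDCT over a 2048-sample block -> 1024 frequency bins.
block = rng.standard_normal(2048)
stage1 = mdct(block)

# Second stage: treat the upper half of the spectrum (512 coefficients) as a
# "time" signal and re-transform it with short, 50%-overlapped MDCTs. This
# trades spectral resolution for temporal resolution in that region.
sub = stage1[512:1024]
frames = np.stack([mdct(sub[s:s + 64]) for s in range(0, len(sub) - 64 + 1, 32)])
# frames: 15 "time" slots x 32 coarse bins inside the original block
```

The re-transform swaps the roles of time and frequency for that coefficient range, which is the frequency/time duality mentioned above.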

To give an example of what time/frequency resolutions are possible:

1024-sample-block {
    0-4134 Hz: 192 bands with a single sample each
    4134-11025 Hz: 40 bands with 8 samples each
    11025-22050 Hz: 16 bands with 32 samples each
}
// followed by
128-sample-block {
    0-11025 Hz: 64 bands with a single sample each
    11025-22050 Hz: 16 bands with 4 samples each
}
// and so on

In this example, "butterflies" can be used for the 4134-22050 Hz region between
these two blocks to remove the temporal alias introduced in the first MDCT stage
(due to windowing).
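The bookkeeping in this example checks out: for each block, band counts times samples per band add up to the block size, and the 4134 Hz edge follows from 192 of 1024 bins at a 44.1 kHz sample rate. A quick sanity check (my addition, not from the post):

```python
fs = 44100  # sample rate assumed for the 22050 Hz band edges

# (bands, samples per band) for the 1024-sample block
layout_1024 = [(192, 1), (40, 8), (16, 32)]
assert sum(b * s for b, s in layout_1024) == 1024

# (bands, samples per band) for the 128-sample block
layout_128 = [(64, 1), (16, 4)]
assert sum(b * s for b, s in layout_128) == 128

# the first region of the 1024 block spans 192 of the 1024 bins:
edge = 192 / 1024 * fs / 2
print(round(edge))  # 4134
```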

Criteria: (1) high spectral resolution vs (2) high temporal resolution:
With (1) it's possible to spectrally shape the quantization noise accurately, and it usually concentrates the energy of tonal/stationary parts in a few bins (a compact energy representation).
(2) allows accurate control of the quantization noise's temporal shape and is better suited for quantizing noisy parts at low SNRs.
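The energy-compaction point in (1) is easy to demonstrate with an orthonormal DCT (used here as a simple stand-in for the MDCT analysis): a stationary tone aligned with a basis vector lands in a single bin, while an impulse (a transient) smears its energy across all bins. The sizes below are arbitrary:

```python
import numpy as np
from scipy.fft import dct

M = 256
n = np.arange(M)

# Tonal/stationary case: a cosine aligned with DCT-II basis vector k0.
k0 = 37
tone = np.cos(np.pi * (n + 0.5) * k0 / M)
T = dct(tone, type=2, norm='ortho')
tone_frac = T[k0] ** 2 / np.sum(T ** 2)     # energy share of the peak bin

# Transient case: a unit impulse.
imp = np.zeros(M)
imp[0] = 1.0
I = dct(imp, type=2, norm='ortho')
imp_frac = np.max(I ** 2) / np.sum(I ** 2)  # energy share of the largest bin

print(tone_frac > 0.999, imp_frac < 0.01)   # compact vs. spread
```

This is why long transforms suit tonal parts (few bins to code, fine noise shaping in frequency) while short transforms suit transients (fine noise shaping in time).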

edit: fixed example

bye,
Sebastian


Reply #4
Quote
It's basically the same as Vorbis' MDCT for the first stage. Then the temporal resolution of some frequency regions is increased by treating the MDCT coefficients of the first stage as another time signal, which is again MDCT-transformed in smaller chunks. Due to the frequency/time duality, this second "decorrelation" increases time resolution for those bands.


This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.

Quote
To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.

Since you use an orthonormal transform at both stages, I do not see why these anti-aliasing butterflies are necessary?

Quote
Criteria: (1) high spectral resolution vs (2) high temporal resolution:
With (1) it's possible to spectrally shape the quantization noise accurately, and it usually concentrates the energy of tonal/stationary parts in a few bins (a compact energy representation).
(2) allows accurate control of the quantization noise's temporal shape and is better suited for quantizing noisy parts at low SNRs.

I understand that (1) and (2) are the properties of the resulting transform, but I was interested in the criteria that you use to achieve high spectral resolution for stationary/tonal parts and high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?

I would be very interested in your results. I agree that with these adaptations you could achieve a high level of scalability. However, there is one definite "con" that you did not mention: side information. You have to tell the decoder what transform structure you used. For adaptive framing, this side information is negligible, but for adaptive frequency decompositions, this is not the case, even when you use entropy coding of the side info.

Greetz,

Petracci


Reply #5
Quote
This is similar to the method of Purat and Noll, in "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", ICASSP '96.

Hmm... is this paper publicly available?
I tried CiteSeer, but I was not able to download it.

Quote
Quote
To compensate for the time-domain alias introduced in the first stage, some "butterflies" can be applied in the new "subband transform domain" between frames.
Since you use an orthonormal transform at both stages, I do not see why these anti-aliasing butterflies are necessary?

They are necessary because each of the N MDCT coefficients affects up to 2N time samples. Consider MDCT-transforming a unit impulse: the pulse is covered by two windows and therefore affects two MDCT spectra. The butterflies can be used to reduce this alias effect after the second stage, so that only a single pulse appears in the transformed version.
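The spreading effect can be seen directly: place a unit impulse in the overlap region of two 50%-overlapped MDCT frames, and both frames pick up significant energy. A small illustrative sketch (my own, with arbitrary sizes):

```python
import numpy as np

def mdct(x):
    """Plain MDCT: 2N windowed samples -> N coefficients."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ x

N = 64
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window

x = np.zeros(3 * N)
x[N + N // 2] = 1.0             # impulse in the overlap of frames 0 and 1

X0 = mdct(w * x[0:2 * N])       # frame 0 covers samples [0, 2N)
X1 = mdct(w * x[N:3 * N])       # frame 1 covers samples [N, 3N)

# The single pulse shows up in BOTH spectra (and in many bins of each):
print(np.sum(X0 ** 2) > 1.0, np.sum(X1 ** 2) > 1.0)
```

Overlap-add of the inverse transforms cancels this aliasing in the time domain, but after a second transform stage the cancellation no longer happens automatically, which is what the butterflies are for.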

Quote
I understand that (1) and (2) are the properties of the resulting transform, but I was interested in the criteria that you use to achieve high spectral resolution for stationary/tonal parts and high temporal resolution for non-stationary/transient parts. Are you using perceptual entropy, transient detection, or analysis-by-synthesis methods?

I'm not doing anything right now. I've just tinkered around with this filterbank idea, checking the impulse responses of the inverse filterbank for different pulses in the transform domain. I posted it here to discuss this approach. Well, I did not give many details in the first place, but it's hard to explain.

The main difference compared to other hybrid approaches is that the first stage decomposes the signal with a very high spectral resolution and then kind of reverts it for some bands, whereas common hybrid filterbanks decompose the signal into broader subbands in the first stage and do further band-splitting in the second stage.
(But we can always reduce the alias effect of the first stage after the second stage by applying alias-reduction butterflies.)

Quote
However, there is one definite "con" that you did not mention: side information. You have to tell the decoder what transform structure you used. For adaptive framing, this side information is negligible, but for adaptive frequency decompositions, this is not the case, even when you use entropy coding of the side info.

Yes, therefore I don't think it makes much sense to allow all transform variants; just a few that prove to be a good choice in most situations. The side information will be negligible for just 8 transform variants, for example.
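For scale (my numbers, not from the post): signalling one of 8 variants costs 3 bits per frame, which at 1024-sample frames and a 44.1 kHz sample rate comes to about 129 bit/s even without entropy coding of the side info:

```python
import math

fs = 44100          # sample rate (Hz), assumed
frame = 1024        # samples per frame, as in the example above
variants = 8        # number of transform variants to signal

bits_per_frame = math.ceil(math.log2(variants))   # 3 bits
side_info_bps = bits_per_frame * fs / frame
print(bits_per_frame, round(side_info_bps))       # 3 129
```

Against typical target bitrates of tens of kbit/s, ~0.13 kbit/s of side info is indeed negligible.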

Maybe you want to check my view of the MDCT, which explains to some extent why the butterflies after the second stage can be used to cancel the first set of butterflies:
see thread here

bye,
Sebastian

edit: fixed some typos (probably not all)