Arrgh! Subband codec = :wtf:

2001-10-04 17:37:12

I've read Andre's own site about MPC, and I've heard over and over again that MPC is a subband codec, which is somehow superior to transform codecs--at least at higher bitrates.

Now I sort of know what a transform codec does (but spare me the mathematics :dazed:)

But all I know about a 'subband' codec is that it isn't a transform codec.

I mean, 'subband', 'subband', no explanations anywhere about what a 'subband' is.

wtf? :wtf:

Now if somebody would tell me...
Thanks

Reply #1 – 2001-10-04 18:13:15

Quote

Originally posted by Joe Bloggs
But all I know about a 'subband' codec is that it isn't a transform codec. I mean, 'subband', 'subband', no explanations anywhere about what a 'subband' is.

In order to form subbands, you split up the signal using lowpass / highpass filter combinations. After splitting, you typically "critically downsample" the resulting signals. The idea is to keep the total number of samples per second constant, no matter how many subbands you create.
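A minimal sketch of the split-and-downsample idea described above. The two-tap Haar pair used here is an assumption chosen for brevity (real codecs use much longer filters), and the function names are made up for illustration. It shows that the total number of samples stays constant however many subbands are created, and that the split itself loses nothing.

```python
import math

def haar_split(x):
    """Split x (even length) into a low and a high subband, each at half the rate."""
    s = math.sqrt(0.5)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]   # lowpass + downsample by 2
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # highpass + downsample by 2
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split: rebuild the full-rate signal from two subbands."""
    s = math.sqrt(0.5)
    x = []
    for l, h in zip(low, high):
        x.append(s * (l + h))
        x.append(s * (l - h))
    return x

signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.9 * n) for n in range(64)]

low, high = haar_split(signal)         # 2 subbands, 32 samples each
low_low, low_high = haar_split(low)    # split the low band again: 3 subbands in total
total_samples = len(low_low) + len(low_high) + len(high)

rebuilt = haar_merge(haar_merge(low_low, low_high), high)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Here `total_samples` equals the input length and `max_error` is at rounding-noise level. Any loss in a real codec comes from quantising the subband samples, not from the split.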

Claiming that one or the other is better is bogus. MP3 uses both. First it generates subbands, then it applies an MDCT (=transformation!) on each subband to further improve spectral resolution.

Christian

Reply #2 – 2001-10-04 18:22:00

Most encoders divide the 0-22 kHz frequency band into smaller subbands. This is done with a so-called filterbank, which consists of bandpass filters. For example, both MP3 and MPC use a filterbank which divides the full frequency band into 32 subbands (narrowband signals).

Now, MP3 also uses the Modified Discrete Cosine Transform (MDCT) to transform the subband signals to the frequency domain, so it's called a transform codec. The MDCT coefficients are then quantized.

MPC or MP2 quantizes the subband samples directly, without the frequency transform step in between, so those are called subband coders.

MDCT in mp3 is a lossy process (due to the fact that quantization is also lossy).

A subband coder can't compete at lower bitrates because the transform step gives an edge in coding efficiency.

Reply #3 – 2001-10-04 21:48:58

Quote

Originally posted by cbuchner1
After splitting, you typically "critically downsample" the resulting signals. The idea is to keep the total number of samples per second constant, no matter how many subbands you create.

This is the part which has always bothered me in subband coding. I've read 2 different books and they seem to tell me different things. One book tells me that each subband is downsampled to 1/32 of the original sampling frequency (assuming 32 subbands which is true for most subband coders). Another book told me that each subband is downsampled to the Nyquist frequency of the subband, i.e. the first subband will be sampled at 1/32, second subband at 1/16, etc and the highest frequency subband will not be downsampled at all.

I'm not quite sure which is correct, but I've always wondered: if the first book is correct, how would the signal survive being downsampled to a sampling frequency below the Nyquist frequency?

Any help appreciated, thanks.

Reply #4 – 2001-10-04 22:07:03

Quote

Originally posted by tangent

Another book told me that each subband is downsampled to the Nyquist frequency of the subband, i.e. the first subband will be sampled at 1/32, second subband at 1/16, etc and the highest frequency subband will not be downsampled at all.

After moving the bandpass signal down to baseband ("frequency shift", "demodulation") you can safely sample it at the Nyquist rate of (2 * BW), where BW is the bandwidth of the subband.
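To put numbers on the (2 * BW) rule, here is a small sketch assuming a 44.1 kHz input and 32 equal-width subbands (both values are assumptions for illustration). It also shows where a tone from a high subband lands after decimation: it is not lost, it just appears at a lower frequency inside the baseband. The helper `alias_frequency` is a made-up name.

```python
FS = 44100.0   # assumed input sample rate in Hz
BANDS = 32     # assumed number of equal-width subbands

bandwidth = (FS / 2) / BANDS        # width of one subband in Hz
subband_rate = 2 * bandwidth        # Nyquist rate for that bandwidth
total_rate = BANDS * subband_rate   # samples per second summed over all subbands

def alias_frequency(f_hz, rate_hz):
    """Frequency at which a tone at f_hz appears after sampling at rate_hz."""
    folded = f_hz % rate_hz
    return folded if folded <= rate_hz / 2 else rate_hz - folded

tone_even_band = alias_frequency(10000.0, subband_rate)  # tone in subband 14
tone_odd_band = alias_frequency(10500.0, subband_rate)   # tone in subband 15
```

With these assumptions every subband is sampled at FS/32 (1378.125 Hz), and the 32 subbands together carry exactly FS samples per second. The 10000 Hz tone lands at 353.125 Hz, its offset from the lower edge of its band. The 10500 Hz tone lands at 525 Hz, its offset from the upper edge, because odd-numbered bands come out spectrally mirrored.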

As far as I know so-called "polyphase filterbanks" are used in subband generation. And I have no clue how these work internally. Maybe you can find some literature on this.

Reply #5 – 2001-10-05 00:32:26

So a subband is a bandwidth-limited signal existing in the time domain - correct?

Seems only having 32 bands would limit the psychoacoustic effects you could exploit, especially if they're evenly spaced in the spectrum.

-h

Reply #6 – 2001-10-05 00:46:38

Quote

Originally posted by h
Seems only having 32 bands would limit the psychoacoustic effects you could exploit, especially if they're evenly spaced in the spectrum.

Interestingly enough, if you read the MPC author's website it appears that MPC uses more techniques to exploit psychoacoustic effects than the majority of transform codecs (AAC perhaps excluded). One of the more interesting ones is temporal post-masking (and the codec still doesn't cause post-echo issues like most other codecs), which I don't believe most encoders take advantage of. Again, I'm not sure about AAC though.

Reply #7 – 2001-10-05 06:35:44

Quote

Interestingly enough, if you read the MPC author's website it appears that MPC uses more techniques to exploit psychoacoustic effects than the majority of transform codecs (AAC perhaps excluded). One of the more interesting ones is temporal post-masking (and the codec still doesn't cause post-echo issues like most other codecs), which I don't believe most encoders take advantage of. Again, I'm not sure about AAC though.

Is the overall process anything like this? It's all I've been able to gather:

- break input into subbands via FFT low/highpassing, leave as spectral coefficients
- (MP3/AAC would perform an additional MDCT on the spectra here?)
- perform masking operations on spectra
- quantise/store spectra

Unless there's something I've completely missed, the only reason MPC seems capable of handling pre/post echo so much better than the transformation mob is a better system of detecting it - excessively quantising the subbands should still introduce similar artifacts, MDCT or not.

I think I've missed something.

-h

Reply #8 – 2001-10-05 07:10:28

Psychoacoustic effects can be exploited very well in a subband codec, since even MP3 and AAC codecs deal with 'scalefactor bands' (i.e. groups of frequency lines, usually 32, or 49 in AAC) too. Dealing with single frequency lines would be a waste of time and wouldn't be good for compression.

Regarding post masking - this effect (and other psychoacoustic effects) is not dependent on particular codec name. LAME uses postmasking (at least it had a switch when I last checked), PsyTEL AAC, too. We can only guess, but I am quite sure that postmasking is used in the FhG advanced codecs (like MP3Enc, or AAC). Postmasking is also used in the Dolby AC-2 and AC-3 reference implementations.

MPC's advantage in pre-echo handling comes from the subband filtering itself: its potential for pre-echo is much smaller than that of a long MDCT. The long MDCT used in MP3 (1152 samples) and AAC (2048 samples) is much more sensitive to pre-echo (in AAC the quantization error is smeared over the 2048-sample window). Transform codecs solve this with block switching or by increasing the local bitrate, while MPC handles it by increasing the bitrate only.

The MDCT is not a "lossy" process from a perceptual point of view. Maybe the result of MDCT-IMDCT won't be bit-for-bit identical to the original, but the distortions are not significant for the listener. MDCT filterbanks are also called 'perfect reconstruction' filterbanks. Maybe the hybrid filterbank in MP3 adds some problems, but the filterbank in AAC (pure MDCT) is a perfect-reconstruction one.
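The 'perfect reconstruction' property can be checked with a small sketch. The sine window, the tiny block size and all function names below are assumptions for illustration, not taken from any particular codec. Each block of 2N samples yields N coefficients, and overlap-adding the inverse transforms of neighbouring blocks cancels the aliasing, as long as nothing is quantised in between.

```python
import math

def sine_window(n_half):
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n + n_half]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]

def basis(n, k, n_half):
    """MDCT cosine basis function."""
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def mdct(block):
    """Windowed forward MDCT: 2N input samples -> N coefficients."""
    n_half = len(block) // 2
    w = sine_window(n_half)
    return [sum(w[n] * block[n] * basis(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]

def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients -> 2N aliased time samples."""
    n_half = len(coeffs)
    w = sine_window(n_half)
    return [2.0 / n_half * w[n] * sum(coeffs[k] * basis(n, k, n_half) for k in range(n_half))
            for n in range(2 * n_half)]

def roundtrip(signal, n_half):
    """MDCT then IMDCT on 50%-overlapping blocks, summed by overlap-add."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = imdct(mdct(signal[start:start + 2 * n_half]))
        for n, value in enumerate(block):
            out[start + n] += value
    return out

N_HALF = 8
signal = [math.sin(0.2 * n) + 0.3 * math.cos(1.7 * n) for n in range(64)]
rebuilt = roundtrip(signal, N_HALF)

# The first and last N_HALF samples are covered by only one block, so skip them.
max_error = max(abs(a - b) for a, b in zip(signal[N_HALF:-N_HALF], rebuilt[N_HALF:-N_HALF]))
```

In this sketch `max_error` comes out at floating-point rounding level, so any audible loss has to come from quantising the coefficients rather than from the transform.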

MPC, MP2 - Subband Filtering
MP3 - Subband filtering + MDCT on each subband
AAC, PAC, AC-3 - pure MDCT (AC-3 has MDST, too)
AAC SSR, ATRAC - QMF + MDCT
etc...

Reply #9 – 2001-10-05 07:26:16

Quote

The advantage of the MPC in pre-echo handling is because subband filtering does not suffer from potential pre-echo (it is much smaller than long MDCT) - long MDCT applied to MP3 (1152 samples) and AAC (2048 samples) is much more sensitive to pre-echo (quantization error is smeared over 2048 sample window). While transform codecs solve this with block switching, or with increasing of the local bitrate - MPC handles this by increasing bitrate only.

So a subband-only codec can just throw bits at the stepped coefficients that an attack generates, whereas codecs that apply a further MDCT on said coefficients are working against themselves in this situation. Is that getting close?

I think I'd love a diagram like:
- "Here's some PCM input"
- "Here is a spectrogram of 8 sample subbands from the input"
- "Here are said subbands after a subband coder has completed its masking/quantising"
- "Here are said subbands after undergoing an MDCT, masking/quantising, and iMDCT as per a transform codec"
- "Here's the final PCM generated by both"

Yes I know no one's likely to do it, but I've always preferred visual aids over theoretical ones. Are there any books I should be reading on this?

Edit: Just noticed that AAC works with a single MDCT operation on the input - how is that different to MPC creating subbands then operating on the spectra therein? Why would the MDCT series be more prone to pre-echo than the Fourier series that a subband contains? So many questions!

-h

Reply #10 – 2001-10-05 07:50:57

Quote

Originally posted by h

- break input into subbands via FFT low/highpassing, leave as spectral coefficients
- (MP3/AAC would perform an additional MDCT on the spectra here?)
- perform masking operations on spectra
- quantise/store spectra

Unless there's something I've completely missed, the only reason MPC seems capable of handling pre/post echo so much better than the transformation mob is a better system of detecting it - excessively quantising the subbands should still introduce similar artifacts, MDCT or not.

Not really.

Most coders use a Polyphase Quadrature Mirror Filterbank (PQMF) to split the input signal into subbands. As far as I understand it, it does not use a transform to split the signal but a set of FIR (Finite Impulse Response) filters, applied as matrix operations. And if I'm not wrong, this is how it works:

In the frequency domain, an ideal bandpass filter looks like a rectangular pulse function, where G(f)=0 when f is outside the bandpass range and G(f)=1 when f is within it. This function G(f) is multiplied with the frequency spectrum of the input signal to get a bandpassed frequency spectrum. This translates to a convolution in the time domain, where the input signal is convolved with the filter's impulse response (the inverse transform of G(f)).
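As a rough illustration of filtering by time-domain convolution, the sketch below builds a low-pass FIR filter from a truncated, Hann-windowed sinc (the impulse response of an ideal rectangular G(f)) and convolves two test tones with it. The cutoff, tap count and function names are arbitrary assumptions, and a low-pass is used instead of a bandpass to keep it short.

```python
import math

def lowpass_taps(cutoff, num_taps):
    """Windowed-sinc low-pass FIR; cutoff is a fraction of the sample rate (0..0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Impulse response of the ideal rectangular frequency response.
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hann window to soften the truncation of the infinitely long sinc.
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    return taps

def convolve(x, h):
    """Direct-form convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

taps = lowpass_taps(cutoff=0.1, num_taps=63)
low_tone = [math.sin(2 * math.pi * 0.02 * n) for n in range(512)]   # inside the passband
high_tone = [math.sin(2 * math.pi * 0.30 * n) for n in range(512)]  # well outside it

# Skip the start-up and tail regions where the filter is only partly filled.
low_out = rms(convolve(low_tone, taps)[63:512])
high_out = rms(convolve(high_tone, taps)[63:512])
```

The low tone passes at close to its original level while the high tone is strongly attenuated. That is the time-domain counterpart of multiplying the spectrum by G(f).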

Coders will then use FFT (for subband coders) or MDCT (for transform coders) to analyse each subband.

MPC, being a subband coder, does not perform masking and quantisation on the spectra but on the subband. The spectrum is used to determine masking levels to modify the ATH curve, then the signal level of each subband is measured. In cases where the signal level falls below the masking curve, the entire subband is not coded. If there are signals above the masking curve, the subband is then quantised depending on the frequency of the subband (lower frequency subbands are usually given more bits) and the signal-to-mask ratio (signals with a higher SMR will be given more bits).
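A toy version of that allocation rule, to make the idea concrete. This is not MPC's actual algorithm: the function name, the rule of roughly 6 dB of signal-to-noise ratio per quantiser bit, and the example SMR values are all assumptions, and the frequency-dependent weighting mentioned above is left out.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gained per extra bit of a uniform quantiser

def allocate_bits(smr_db):
    """Hypothetical rule: skip masked subbands, otherwise use just enough
    bits to push the quantisation noise below the masking threshold."""
    bits = []
    for smr in smr_db:
        if smr <= 0.0:
            bits.append(0)  # signal is at or below the mask: subband not coded
        else:
            bits.append(math.ceil(smr / DB_PER_BIT))
    return bits

# Made-up signal-to-mask ratios (dB) for five subbands.
smr_per_subband = [18.0, 7.5, -3.0, 0.0, 25.0]
allocation = allocate_bits(smr_per_subband)  # [3, 2, 0, 0, 5]
```

Subbands with a higher SMR get more bits, and masked subbands cost nothing.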

MPC does not need to handle pre/post-echos because the signals are left in the time domain when stored. Only in the transform coders where the signals are quantised and stored in the frequency domain would you run into problems with pre/post-echos.

Disclaimer: I don't claim that all I said above is true, and I am hardly an expert on this subject. But this is information I have had to gather from my own readings since I've hardly ever seen the real experts come into the forums to explain all these details to us.

Reply #11 – 2001-10-05 07:56:21

Ok, when you apply a 1024-point MDCT to 2048 samples, you get 1024 frequency coefficients.

That is excellent from the frequency-resolution point of view, but the time resolution is far too coarse (>20 ms). So when you introduce quantization error, it is smeared over the entire frame of 20 ms. The human ear is sensitive to pre-echo errors of 2 ms; imagine what it sounds like when the error is 20 ms long.

The possible solution is to increase the bitrate in order to decrease the coding error, or to switch to a shorter window (for example 8*256).

MPC gets 32 subbands, but with a time resolution of WINDOW_SIZE/32. So the time resolution of the subband codec is much better from the start. Each subband stays in the time domain, so there is no smearing in time. Of course, the frequency resolution is worse, and therefore these kinds of codecs are not useful for very low bitrate coding (96 kbit/s).
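For a feel of the durations involved, here is a quick calculation assuming a 44.1 kHz sample rate (an assumption; the window lengths are the ones quoted in this thread). It reads "WINDOW_SIZE/32" as one subband sample for every 32 input samples, and it ignores the length of the subband filter's own impulse response, so the subband figure is a best case.

```python
FS = 44100  # assumed sample rate in Hz

def duration_ms(samples):
    """Length of a run of input samples in milliseconds."""
    return 1000.0 * samples / FS

aac_long_window = duration_ms(2048)   # about 46.4 ms
mp3_long_window = duration_ms(1152)   # about 26.1 ms
aac_short_window = duration_ms(256)   # about 5.8 ms
subband_step = duration_ms(32)        # about 0.73 ms between subband samples
```

Even the short window is longer than the 2 ms pre-echo sensitivity mentioned above, while the spacing of the subband samples is well below it.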

Reply #12 – 2001-10-05 08:01:32

Quote

MPC does not need to handle pre/post-echos because the signals are left in the time domain when stored. Only in the transform coders where the signals are quantised and stored in the frequency domain would you run into problems with pre/post-echos.

That clears up a lot of confusion. I had no idea MPC was storing subbands in the time domain, which led to a lot of confusion as to how the input was being processed and stored. Echo is inherently a problem in the frequency domain, and since I thought MPC was storing coefficients I didn't see (until now) how it was overcoming it.

Does that imply that subband coders could benefit from better linear PCM compression methods, ala Monkey's Audio being better than LPAC?

Thanks anyway! That post cleared up a lot of things.

-h

Reply #13 – 2001-10-05 11:34:00

Quote

Originally posted by Ivan Dimkovic
Regarding post masking - this effect (and other psychoacoustic effects) is not dependent on particular codec name. LAME uses postmasking (at least it had a switch when I last checked), PsyTEL AAC, too. We can only guess, but I am quite sure that postmasking is used in the FhG advanced codecs (like MP3Enc, or AAC). Postmasking is also used in the Dolby AC-2 and AC-3 reference implementations.

Yes, LAME does have a switch for this, but unfortunately it doesn't do much good. It barely saves any bits, and does not really allow for much increased quality (via saved bits) at a given bitrate. I imagine the main reason for this is that postmasking does not lend itself exceptionally well to transform coders with poor time resolution (MP3). The fact that LAME still suffers from audible post-echo on quite a few samples suggests that it is not really efficient enough in this regard to exploit this effect properly.

Quote

Originally posted by h
Does that imply that subband coders could benefit from better linear PCM compression methods, ala Monkey's Audio being better than LPAC?

I believe this would probably be the case. On some of the posts where the author explained exactly what he has improved upon over MP2, he often mentions using much more advanced lossless compression methods to keep the bitrate lower, so this would make sense.

Reply #14 – 2001-10-05 11:38:39

Quote

Originally posted by Dibrom

Yes, LAME does have a switch for this, but unfortunately it doesn't do much good. It barely saves any bits, and does not really allow for much increased quality (via saved bits) at a given bitrate. I imagine the main reason for this is that postmasking does not lend itself exceptionally well to transform coders with poor time resolution (MP3). The fact that LAME still suffers from audible post-echo on quite a few samples suggests that it is not really efficient enough in this regard to exploit this effect properly.

The method used in LAME is rather simple, and I agree, post-masking is very hard to exploit on long windows, like the long MP3 window (MDCT, 1152 samples)...

But post-masking could be very effective, I remember that my AAC implementation encoded fatboy.wav at ~320 kbits/s without postmasking and ~250 kbits/s with postmasking - which is a very good result.

Reply #15 – 2001-10-05 11:59:26

Quote

Originally posted by Ivan Dimkovic
The method used in LAME is rather simple, and I agree, post-masking is very hard to exploit on long windows, like the long MP3 window (MDCT, 1152 samples)...

But post-masking could be very effective, I remember that my AAC implementation encoded fatboy.wav at ~320 kbits/s without postmasking and ~250 kbits/s with postmasking - which is a very good result.

Yeah, I would expect AAC to be able to take advantage of this effect much better than MP3. I didn't realize that even with AAC the bit savings would be quite that impressive though. 70 kbps really is a pretty significant amount if there is no perceived loss in quality.

Reply #16 – 2001-10-05 12:05:22

Quote

Originally posted by h

Does that imply that subband coders could benefit from better linear PCM compression methods, ala Monkey's Audio being better than LPAC?

It's possible to apply linear predictive DPCM, but it's unlikely that you will get any compression benefits because each subband is further downsampled to 1/32 of the original sampling frequency, reducing most of the intersample redundancy.
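A rough way to see why wider sample spacing leaves little for a predictor to exploit. The sketch below builds a smooth, highly correlated test signal (first-order filtered noise, purely an assumption for illustration) and compares the correlation between neighbouring samples before and after keeping only every 32nd sample. This is bare decimation of a full-band signal, not a real subband from a filterbank, so it only illustrates the trend.

```python
import random

def lag1_correlation(x):
    """Normalised correlation between each sample and the next one."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def prediction_gain(rho):
    """Ideal gain of a one-tap linear predictor for correlation rho."""
    return 1.0 / (1.0 - rho * rho)

random.seed(0)
signal = []
state = 0.0
for _ in range(32000):
    state = 0.95 * state + random.gauss(0.0, 1.0)  # smooth, strongly correlated signal
    signal.append(state)

decimated = signal[::32]  # keep every 32nd sample

rho_full = lag1_correlation(signal)
rho_decimated = lag1_correlation(decimated)
gain_full = prediction_gain(rho_full)
gain_decimated = prediction_gain(rho_decimated)
```

At the full rate, neighbouring samples are almost identical and a one-tap predictor removes most of the signal energy. After decimation the correlation is small and the gain is close to 1, meaning hardly any saving.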

Reply #17 – 2001-10-05 12:51:05

About lossless compression in MPC vs MP2, I'd point out that to my knowledge there is no lossless compression at all in MP2.

But does anyone know why? I'm wondering because I know that mp2 was designed to be fast, but Huffman is very inexpensive in the decompression step.

Reply #18 – 2001-10-05 13:27:30

Thanks anyway for all the replies... I think my understanding of subband codecs has increased 'a lot'

PS. The forum didn't let me put that many EmoIcons in one post! :insane: :roflmao: :rofl:

Thanks also for making my two threads the most popular here so far!! :flipoff: :evilgrin:

Reply #19 – 2001-10-06 07:34:56

Quote

Originally posted by Joe Bloggs
PS. The forum didn't let me put that many EmoIcons in one post!

There is a reason for this. I put a limit on the amount of icons one could have per post. I may lower this limit even more. Please do not abuse this feature, I'd like to keep things on topic and the signal to noise ratio high. Nothing personal.

The post in question has been split off to the off topic section.

Reply #20 – 2001-10-06 13:24:17

Quote

Originally posted by gabriel
About lossless compression in MPC vs MP2, I'd point out that to my knowledge there is no lossless compression at all in MP2.

But does anyone know why? I'm wondering because I know that mp2 was designed to be fast, but Huffman is very inexpensive in the decompression step.

It's probably because Huffman won't achieve much compression in the MP2 stream. You could apply Huffman coding to the frame headers or the granule data, but that wouldn't give you much savings. Of course, Huffman coding the linear prediction error of the granule data would give you more savings than directly Huffman coding the samples, but as I mentioned in an earlier post, I doubt the savings would be large.
