[Total Beginner] Do I Have This Right?

2015-02-12 02:27:55

I'm new to all of this, so I want to make sure I have the fundamentals down first. I would be happy if people correct me wherever I'm wrong.

You start off with a PCM waveform and read it in one segment of fixed length at a time. You do an MDCT transform on this segment after you have applied a windowing function; we need this later for TDAC/overlap. Here you can zero out some of the values; this is called decimation. Those values aren't perceptually important for the IMDCT. You can also do some fancy stuff with psychoacoustics here, if you like. Because this block has a bunch of zero values in it, it's a great candidate for entropy encoding, such as range coding or Huffman.

EDIT: Bah, I forgot that your input has to be read in an overlapping manner. So each sample will effectively be read and compressed twice. Once on the right half of the transform and then again on the left. They have to match up and be added to each other when TDAC happens.

Reply #1 – 2015-02-14 21:10:21

The decimation is not related to perceptual importance. It's just throwing out every other sample. Basically, MDCT is a DCT with frequency-domain decimation. You might also encounter the term "decimation" in descriptions of fast (factored) transform algorithms. In either context, it is not directly related to perceptual coding and data compression.
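To make the distinction concrete, here is a minimal Python sketch of decimation in the plain signal-processing sense. The helper name `decimate` and the sample values are made up for illustration; a real decimator would normally low-pass filter first, which is left out here.

```python
def decimate(samples, factor=2):
    """Keep every `factor`-th sample and drop the rest (no anti-alias filter)."""
    return samples[::factor]

# Made-up sample values, purely for illustration.
samples = [10, 11, 12, 13, 14, 15, 16, 17]
kept = decimate(samples)

print(kept)                     # half as many values as the input
print(len(samples), len(kept))
```

Note that the output is shorter than the input. That is different from setting some values to zero, which keeps the length unchanged.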

The perceptual importance comes into play when the frequency components are quantized. This is where lossy compression takes place - in simple words, the magnitudes of some frequency components are represented by smaller ranges of integer numbers, and consequently require fewer bits to encode. Which frequencies to encode with which precision is determined by a psychoacoustic model.

Usually the highest frequencies are encoded with the least accuracy. Long runs of zeroes may appear in the high-frequency part of the spectrum because of this, and because the high frequencies usually do not carry much energy to begin with. Additionally, explicit low-pass filtering may be used, which zeroes out the upper part of the spectrum. But it is not correct to think of perceptual coding as simply "zeroing out" certain frequencies. Most of the frequency components are not removed, but replaced with approximate values.
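A rough sketch of that quantization step in Python, under loud assumptions: the coefficient values and the step sizes below are invented for illustration and do not come from any real psychoacoustic model or codec. Larger step sizes stand in for "less accuracy" at the higher frequencies.

```python
def quantize(coeffs, steps):
    """Map each coefficient to a small integer by dividing by its step size."""
    return [int(round(c / s)) for c, s in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Decoder side: turn the integers back into approximate coefficients."""
    return [v * s for v, s in zip(levels, steps)]

# Hypothetical spectrum, low frequencies first (made-up numbers).
coeffs = [41.7, -23.2, 12.9, -6.4, 3.1, -1.8, 0.9, -0.4]
# Hypothetical step sizes: coarser toward the high frequencies.
steps = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]

levels = quantize(coeffs, steps)
approx = dequantize(levels, steps)

print(levels)  # [42, -23, 6, -3, 1, 0, 0, 0]
print(approx)  # [42.0, -23.0, 12.0, -6.0, 4.0, 0.0, 0.0, 0.0]
```

Only the weak high-frequency values end up as zero. Everything else survives as an approximation, and the run of zeroes at the end is what the entropy coder later exploits.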

Regarding the windowing function that is applied to the data before the transform, it is not needed for aliasing cancellation per se. It is used to minimize the window boundary artifacts caused by quantization of the frequency components, which could otherwise become very noticeable.
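A small pure-Python experiment can illustrate this. It is a sketch, not how a real codec is written: it uses the direct O(N^2) MDCT formula, a 2/N scale on the inverse, and two windows that both satisfy w[n]^2 + w[n+N]^2 = 1 (a sine window, and a flat window of value sqrt(1/2)). The function names, the block size, and the test signal are all made up for illustration. With no quantization in between, both windows should reconstruct the overlapped samples to within rounding error.

```python
import math

def mdct(block):
    """Direct-form MDCT: 2N windowed time samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time samples that still contain aliasing."""
    N = len(coeffs)
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def flat_window(N):
    # Rectangular window scaled so that w[n]^2 + w[n+N]^2 = 1.
    return [math.sqrt(0.5)] * (2 * N)

def roundtrip(signal, N, window):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop size N."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        block = [signal[start + n] * window[n] for n in range(2 * N)]
        rec = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += rec[n] * window[n]
    return out

N = 8
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(6 * N)]

errors = {}
for name, win in (("sine", sine_window(N)), ("flat", flat_window(N))):
    out = roundtrip(signal, N, win)
    # The first and last N samples are covered by only one block, so skip them.
    errors[name] = max(abs(a - b) for a, b in zip(signal[N:-N], out[N:-N]))
    print(name, "window, max interior error:", errors[name])
```

Each interior sample is transformed twice, once in the second half of one block and once in the first half of the next, and the aliasing terms from the two blocks cancel when they are added. The difference between the two windows only shows up once the coefficients are quantized, which this sketch deliberately leaves out.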

Reply #2 – 2015-02-15 06:42:22

Thank you for replying. I'll study your post and respond with more questions (I'll have them).