Title: **Scale factor explaination**

Post by:**pratheekp** on **2011-07-08 18:15:47**

Hi,

Can anybody explain the concept of scale factors in AAC to me? Specifically, how are the scale factors determined for the spectral values?

Also, in the distortion control loop, how does incrementing the scale factors and then scaling with them reduce distortion?

Please explain.

Title: **Scale factor explaination**

Post by:**benski** on **2011-07-08 19:51:34**

scale factors are basically the MDCT coefficients, with some math involved to quantize the values to reduce entropy. The amount of quantization necessary for each band/scale-factor (and therefore the amount of precision it will have) is determined as part of your psychoacoustic algorithm.

Title: **Scale factor explaination**

Post by:**alexeysp** on **2011-07-09 12:13:52**

I think benski's answer is not very clear. I'll try to explain a little further.

The scale factors are just what they're called - the scale factors. The resolution is controlled by scaling.

Assume we have a set of real numbers in the range [0, 1]. If we quantize them to, say, 10-bit integers with a uniform quantizer, we'll obtain a corresponding set of integers in the range [0, 1023]. If, however, we scale the initial set by a factor of 0.5 prior to quantization, the resulting integers will fit into [0, 511], occupying no more than 9 bits. Every multiplication by 0.5 effectively shaves a single bit off the result, at the cost of increased error in the reconstruction of the original set.
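The bit-shaving effect described above can be tried out with a minimal Python sketch. The function names, the 1001 sample points, and the use of truncation rather than rounding are assumptions made for illustration only; nothing here is taken from an actual codec.

```python
def quantize(values, scale, levels=1023):
    """Scale each value in [0, 1], then truncate to an integer level."""
    return [int(v * scale * levels) for v in values]


def dequantize(codes, scale, levels=1023):
    """Undo the scaling to get an approximation of the original values."""
    return [c / (scale * levels) for c in codes]


def max_error(values, scale):
    """Largest reconstruction error over the whole set for a given scale."""
    restored = dequantize(quantize(values, scale), scale)
    return max(abs(v - r) for v, r in zip(values, restored))


values = [i / 1000 for i in range(1001)]   # sample points covering [0, 1]

full = quantize(values, 1.0)   # largest code is 1023, which needs 10 bits
half = quantize(values, 0.5)   # largest code is 511, which needs 9 bits

print(max(full).bit_length(), max_error(values, 1.0))
print(max(half).bit_length(), max_error(values, 0.5))
```

The second line printed shows one bit fewer and a larger maximum error than the first, which is the trade-off in question.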

Now what *exactly* is called a "scale factor" may differ from implementation to implementation. In the above example we could say that the scale factor is 0.5, or we could say that it's 511, or we could call the exponent log2(0.5) = -1 a scale factor, or we could define the scale factor as the reciprocal value (1/0.5), etc. Also, in actual audio compression algorithms the quantization is non-uniform, and the scale factors are not necessarily restricted to integer powers of two. But the principle is still the same.

As for the particular scale factor values, as benski said, they are assigned by the encoder according to a psychoacoustic model that determines the allowed amount of error in each spectrum band from the computed masking thresholds.
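As a rough illustration of how an allowed error per band can translate into a scale factor, here is a small Python sketch. It is a simplification and not how any particular encoder works: real encoders use iterative rate and distortion loops, and the candidate step sizes, the band values, and the `allowed_noise` numbers below are all made up.

```python
def quantization_noise(band, step):
    """Energy of the error left after rounding band values to multiples of step."""
    return sum((x - round(x / step) * step) ** 2 for x in band)


def coarsest_step(band, allowed_noise, candidates):
    """Pick the largest candidate step whose noise stays within the allowance."""
    for step in sorted(candidates, reverse=True):
        if quantization_noise(band, step) <= allowed_noise:
            return step
    return min(candidates)   # nothing fits: fall back to the finest step


candidates = [2.0 ** (k / 4) for k in range(-8, 17)]   # steps from 0.25 to 16
band = [12.3, -7.8, 3.1, 0.4]                          # hypothetical band values

strict = coarsest_step(band, allowed_noise=0.5, candidates=candidates)
relaxed = coarsest_step(band, allowed_noise=20.0, candidates=candidates)
print(strict, relaxed)
```

A band that the model says can hide more noise ends up with a step at least as coarse as a band with a tight allowance, and a coarser step means fewer bits.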

Title: **Scale factor explaination**

Post by:**pratheekp** on **2011-07-12 07:52:08**

Thanks to both of you for this useful information.

Title: **Scale factor explaination**

Post by:**pratheekp** on **2011-07-12 08:01:36**

Hi,

But can you explain how every multiplication by 0.5 increases the error? We pass the scale factor to the decoder as side information, and the decoder multiplies the inverse-quantized spectral coefficients by it. So where does the error come from? Please explain.

Thank you

Title: **Scale factor explaination**

Post by:**C.R.Helmrich** on **2011-07-12 09:21:02**

The increased error happens in the quantization (i.e. rounding) of the pre-scaled MDCT coefficients. Example:

*Encoder*

Input MDCT coefficients: [3, 1, 4, 1, 5, 9, 2, 6, 5]

Divide by scale factor sf1 = 2: [1.5, 0.5, 2, 0.5, 2.5, 4.5, 1, 3, 2.5]

(AAC: apply a power law on the MDCT coefficients here)

Quantize coefficients by e.g. truncating to integer: [1, 0, 2, 0, 2, 4, 1, 3, 2]

*Decoder*

(AAC: apply inverse power law on the quantized coefficients here)

Multiply by scale factor sf1 = 2: [2, 0, 4, 0, 4, 8, 2, 6, 4]

What you get is the dequantized (or inverse quantized) MDCT spectrum. Notice that it has only 5 different values, whereas the original MDCT had 7 different values.

If you used a scale factor sf2 = 4 = sf1 / 0.5, the quantization would leave only 3 different values, so sf2 leads to more distortion (error) than sf1.
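The example above is easy to reproduce in a few lines of Python. This is only a sketch of the toy quantizer from this post (divide, truncate, multiply back); the function names are made up, and the AAC power law is left out.

```python
def encode(coeffs, sf):
    """Divide by the scale factor and truncate to integer (toy quantizer)."""
    return [int(c / sf) for c in coeffs]


def decode(codes, sf):
    """Multiply the quantized values back by the scale factor."""
    return [q * sf for q in codes]


mdct = [3, 1, 4, 1, 5, 9, 2, 6, 5]

for sf in (2, 4):
    restored = decode(encode(mdct, sf), sf)
    # Sum of squared differences between original and dequantized spectrum.
    error = sum((a - b) ** 2 for a, b in zip(mdct, restored))
    print(sf, restored, len(set(restored)), error)
```

With sf = 2 the dequantized spectrum has 5 distinct values and a squared error of 6; with sf = 4 it drops to 3 distinct values and the squared error rises to 22.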

Chris

Title: **Scale factor explaination**

Post by:**pratheekp** on **2011-07-12 09:45:06**

Thanks Chris, that was a really good explanation.

Title: **Scale factor explaination**

Post by:**dduarr** on **2015-03-31 08:08:47**

Are spectral coefficients and scalefactor bands the same thing? I'm confused by the scalefactor bands.

Which ones are actually Huffman-coded into the bitstream?

Thank you.

Title: **Scale factor explaination**

Post by:**alexeysp** on **2015-03-31 10:40:47**

Quote

Are spectral coefficients and scalefactor bands the same thing? I'm confused by the scalefactor bands.

With AAC you have a block of 1024 audio samples, which are transformed to the frequency domain, giving 1024 frequency components (let's forget about overlapping and short blocks for now). These frequency components are then grouped into 49 bands (for 44.1 and 48 kHz; the count depends on the sampling rate), which approximately correspond to the critical bands of the human auditory system (http://en.wikipedia.org/wiki/Bark_scale). Each band is assigned its own scale factor. So every scale factor band contains multiple frequency components (coefficients), and all components within a band are quantized with the same scale factor.
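To make the relationship between coefficients and bands concrete, here is a tiny Python sketch. The band boundaries, the scale factor values, and the plain divide-and-round quantizer are all hypothetical; the real band tables come from the standard and are much longer.

```python
def split_into_bands(coeffs, offsets):
    """Slice the coefficient list at the given band boundaries."""
    return [coeffs[a:b] for a, b in zip(offsets, offsets[1:])]


def quantize_bands(coeffs, offsets, scale_factors):
    """Quantize every coefficient with the scale factor of its band."""
    bands = split_into_bands(coeffs, offsets)
    return [[round(x / sf) for x in band]
            for band, sf in zip(bands, scale_factors)]


# Hypothetical layout: 16 coefficients in 4 bands that widen with frequency.
offsets = [0, 2, 4, 8, 16]
scale_factors = [1.0, 1.0, 2.0, 4.0]   # one per band, chosen arbitrarily
coeffs = [float(i) for i in range(16)]

quantized = quantize_bands(coeffs, offsets, scale_factors)
print([len(band) for band in quantized])   # coefficients per band
```

There is one scale factor per band and many coefficients per band, so the two are clearly different things.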

Quote

Which ones are actually Huffman-coded into the bitstream?

The quantized spectral coefficients are Huffman-coded and packed into a frame, along with the scale factors.

Title: **Scale factor explaination**

Post by:**dduarr** on **2015-03-31 12:59:02**

Thank you very much, alexeysp.

Title: **Scale factor explaination**

Post by:**dduarr** on **2015-05-06 08:31:29**

What is the difference between lines and scale factor bands in AAC? I don't understand what lines are in the FDK AAC library. Please help me understand. Thank you.

Title: **Re: Scale factor explaination**

Post by:**swijayakoon** on **2016-05-16 11:03:35**

Dear Alexeysp,

1. In MPEG-4 AAC, is the block size 1024 or 2048?

2. The relationship between the 49 bands and the 24 critical bands is not clear to me. How is the spectrum divided into 49 bands when there are 24 critical bands?

3. Can you please tell me the final encoded AAC (MPEG-4) frame structure, with all the necessary bits?

4. Is it enough to have only the MDCT, the psychoacoustic model, and the quantizer in an AAC encoder, neglecting other components such as TNS, PNS, LTP, etc.?

Thank you

Title: **Re: Scale factor explaination**

Post by:**Woodinville** on **2016-05-21 07:01:13**

Scale factors set the step size inside the power-law quantizer.
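A minimal Python sketch of that idea, assuming the form in which the AAC quantizer is commonly described: a 3/4 power law on the scaled magnitude, with each scale factor increment changing the step size by a factor of 2^(1/4). The scale factor offset and the rounding constant that real encoders use are left out, and the function names are made up.

```python
import math


def step_gain(sf):
    """Step size for scale factor sf; each increment multiplies it by 2**0.25."""
    return 2.0 ** (sf / 4)


def quantize(x, sf):
    """Compress the scaled magnitude with a 3/4 power law, then round."""
    q = int((abs(x) / step_gain(sf)) ** 0.75 + 0.5)
    return int(math.copysign(q, x))


def dequantize(q, sf):
    """Expand with the inverse 4/3 power law and re-apply the step size."""
    return math.copysign(abs(q) ** (4 / 3), q) * step_gain(sf)


for sf in (0, 8):
    q = quantize(1000.0, sf)
    print(sf, q, dequantize(q, sf))
```

With the larger scale factor the quantized integer is smaller (cheaper to code), and the reconstructed value lands further from the original 1000.0.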

MPEG-2 AAC works on a 1024-sample block size, although internally it may handle it differently.

The 49 scale factor bands have to do both with ERBs and with phase roll during LR/MS coding when time delay is involved.

I wouldn't even TRY to describe MPEG-4.

If you don't have TNS, you don't have a completely psychoacoustic encoder. On the other hand, there are patents to be honored there. Good luck with those.

You might look at a 1992 ICASSP paper by Johnston (and Ferreira?) for some ideas on a lot of the MPEG-2 AAC structure. The MPEG-4 structure is not fixed; there are too many tools, too many targets, and too much system to mess with, in my opinion.
