## Vorbis Noise Normalization

##### Reply #1

The idea is simple, really. A psychoacoustic model basically estimates how much distortion (quantization noise) can be tolerated in specific time/frequency regions. But at low signal-to-noise ratios, simple undithered scalar quantization introduces nonlinear artefacts. This is what makes it sound "metallic". Also, for us humans it's very important to preserve the "energy" of the signal (think of energy = sum of squared samples). MPEG-4 AAC even has a tool called PNS (perceptual noise substitution), which preserves only the energy level and nothing else.

Back to Vorbis. The Vorbis encoder simply computes the sum of squared original samples over blocks of (I think) 32 samples in the high-frequency area and compares it to the sum of squared quantized samples. If the latter sum is less than the former, we lost some energy, and some samples are then "promoted" to have higher values. This goes both ways. People used to complain about an "HF boost" with Vorbis, which can partly be explained by quantization noise adding to the loudness, which is directly linked to "energy". Example:

original:

[ -2.35 -0.84 0.65 -1.39 1.25 -0.49 -0.85 0.89 ]

rounded (quantized):

[ -2 -1 1 -1 1 0 -1 1 ]

The sum of squares for the original is about 11.9. The sum of squares of the rounded signal is only 10. We can try to fix that by "rounding" -0.49 to -1 instead of 0. Then the sum of squares equals 11, which is closer to 11.9. In addition, we could "promote" the 4th sample (-1.39) to -2 instead of -1. This would add another 3 to the sum of squared samples (4 instead of 1). But 14 is a little too much. We should probably stick to

[ -2 -1 1 -1 1 -1 -1 1 ]

as our quantized signal. So, we're not only interested in finding a quantized vector that is close to the original, but also in finding one that has about the same "length", i.e. Euclidean norm (think of it as vector quantization).
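To make the example concrete, here is a minimal Python sketch of the promotion step. It is only an illustration of the idea, not the actual libvorbis code: the function names (`energy`, `round_half_away`, `normalize_noise`) and the greedy ordering (try the samples that lost the most magnitude first) are my own assumptions, and it only handles the "lost energy" direction.

```python
import math


def energy(v):
    """Sum of squared samples."""
    return sum(x * x for x in v)


def round_half_away(x):
    """Round to the nearest integer, ties away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def normalize_noise(original):
    """Hypothetical greedy sketch: round every sample, then promote
    samples whose magnitude shrank, as long as each promotion moves
    the quantized energy closer to the original energy."""
    quantized = [round_half_away(x) for x in original]
    target = energy(original)

    # Candidates are samples that lost magnitude when rounded.
    # Assumption: try the ones that lost the most first.
    candidates = sorted(
        (i for i, x in enumerate(original) if abs(quantized[i]) < abs(x)),
        key=lambda i: abs(original[i]) - abs(quantized[i]),
        reverse=True,
    )

    for i in candidates:
        current = energy(quantized)
        if current >= target:
            break  # no energy deficit left to repair
        sign = 1 if original[i] >= 0 else -1
        promoted = quantized[i] + sign  # one step further from zero
        new_energy = current - quantized[i] ** 2 + promoted ** 2
        # Accept the promotion only if it reduces the energy error.
        if abs(new_energy - target) < abs(current - target):
            quantized[i] = promoted
    return quantized


original = [-2.35, -0.84, 0.65, -1.39, 1.25, -0.49, -0.85, 0.89]
result = normalize_noise(original)
print(result)          # [-2, -1, 1, -1, 1, -1, -1, 1]
print(energy(result))  # 11
```

With the vector from the example, only the -0.49 sample gets promoted. Promoting the 4th sample as well would push the energy to 14, so the sketch rejects it.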

This technique also seems to reduce the metallic artefacts.