HydrogenAudio

Lossy Audio Compression => Ogg Vorbis => Ogg Vorbis - Tech => Topic started by: dsimcha on 2010-03-16 15:12:17

Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-16 15:12:17
I've become very curious about how Vorbis noise normalization works.  My (very vague) understanding is that it somehow avoids the metallic artifacts of MP3 and makes Vorbis's degradation when perceptual transparency can't be achieved sound much more like (less annoying) Gaussian noise.  I can't seem to find a much more detailed explanation anywhere, though.  Can someone please either explain it or point me to a decent explanation?  I'm looking for a moderately technical explanation.  Ideally, I want one that is intended for people with significant math background, etc., but is an overview rather than describing every small detail to someone who might want to implement noise normalization.
Title: Vorbis Noise Normalization
Post by: SebastianG on 2010-03-16 17:46:22
The idea is simple, really. A psychoacoustic model basically estimates how much distortion/quantization noise can be tolerated in specific time/frequency regions. But at low signal-to-noise ratios we can observe that simple undithered scalar quantization introduces nonlinear artefacts. This is what makes it sound "metallic". Also, for us humans it's very important to preserve the "energy" of the signal (think of energy = sum of squared samples). MPEG-4 AAC even has a tool called PNS (perceptual noise substitution), which preserves only the energy level and nothing else.

Back to Vorbis. The Vorbis encoder simply computes the sum of squared original samples over blocks of (I think) 32 samples in the high-frequency area and compares it to the sum of squared quantized samples. If the latter sum is less than the former, we lost some energy, and some samples are then "promoted" to have higher values. This goes both ways. People used to complain about an "HF boost" with Vorbis, which can partly be explained by quantization noise adding to the loudness, which is directly linked to "energy". Example:

Code: [Select]
original:
[ -2.35  -0.84   0.65  -1.39   1.25  -0.49  -0.85   0.89 ]

rounded (quantization)
[ -2  -1   1  -1   1  0  -1   1 ]

The sum of squares for the original is 11.9. The sum of squares of the second signal is only 10. We can try to fix that by "rounding" -0.49 to -1 instead of 0. Then we have a sum of squares which equals 11. This is closer to 11.9. In addition, we could "promote" the 4th sample to -2 instead of -1. This would add another 3 to the sum of squared samples. But 14 is a little too much. We should probably stick to
Code: [Select]
[ -2  -1   1  -1   1  -1  -1   1 ]

as our quantized signal. So, we're not only interested in finding a quantized vector that is close to the original but also in finding one that has about the same "length" (think of it as vector quantization).
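
The worked example above can be reproduced with a small Python sketch. This is only an illustration of the energy-matching idea, not the libvorbis code: the function names are made up, the step size is fixed at 1, and it only promotes samples (it does not handle the "goes both ways" case of lowering them). At each step it tries moving one sample one step farther from zero and keeps the move that brings the sum of squares closest to the original.

```python
def energy(v):
    """Sum of squared samples."""
    return sum(x * x for x in v)


def promote_to_match_energy(original):
    """Round to integers, then greedily promote samples while that
    brings the quantized energy closer to the original energy.
    Hypothetical helper for illustration only."""
    quantized = [int(round(x)) for x in original]
    target = energy(original)
    while True:
        current = energy(quantized)
        current_err = abs(current - target)
        best = None
        for i, (x, q) in enumerate(zip(original, quantized)):
            sign = 1 if x >= 0 else -1
            candidate = q + sign  # one step farther from zero
            new_err = abs(current - q * q + candidate * candidate - target)
            # Tie-break on the extra distortion this promotion causes.
            key = (new_err, abs(candidate - x))
            if new_err < current_err and (best is None or key < best[0]):
                best = (key, i, candidate)
        if best is None:
            return quantized  # no promotion improves the energy match
        _, i, candidate = best
        quantized[i] = candidate


original = [-2.35, -0.84, 0.65, -1.39, 1.25, -0.49, -0.85, 0.89]
print(round(energy(original), 1))         # 11.9
print(promote_to_match_energy(original))  # [-2, -1, 1, -1, 1, -1, -1, 1]
```

Only the sample at -0.49 gets promoted: going from 10 to 11 reduces the energy error, while any further promotion (to 14 or more) would overshoot.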

This technique also seems to reduce the metallic artefacts.
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-17 13:00:42
Thanks.  This basically makes sense.  I'd been experimenting with some listening tests lately and unfortunately you ruined my blissful inability to ABX -q 3 Vorbis.  Now that I'm aware of noise normalization I definitely hear the high frequency boost in a few tracks with a lot of high frequency material and can ABX it.  However, apparently noise normalization is turned off at -q 4, so given a passage with enough high frequency material (read:  only in a few pathological cases), I can ABX -q 4 by listening for a drop in high frequency content.

On the other hand, I don't find these slight changes in the level of high frequency material at all annoying, since it sounds as if the music could have just been mixed slightly differently.  I'd never be able to pick it out in casual listening and I can only even ABX it on samples with lots and lots of high frequency content.  On the other hand, I can ABX Nero AC3 and LAME VBR at similar average bitrates much more easily and the artifacts are subjectively much more annoying, as even in casual listening I would realize I was probably hearing digital artifacts, not just minutely different frequency response.
Title: Vorbis Noise Normalization
Post by: Garf on 2010-03-17 14:12:47
I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.

The change at -q4 might be due to different stereo models, to give an example.
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-17 14:24:36
Quote
I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.

The change at -q4 might be due to different stereo models, to give an example.


I did these tests using libvorbis 1.2.2 (the version included with Foobar), which I thought had all of the more important aoTuv stuff folded in. 

As far as stereo models, I thought that only -q 6 and above use lossless stereo coupling.
Title: Vorbis Noise Normalization
Post by: Garf on 2010-03-17 14:26:46
Quote
As far as stereo models, I thought that only -q 6 and above use lossless stereo coupling.


Yes, which implies that below -q 6 the stereo isn't lossless and can use different frequency thresholds.
Title: Vorbis Noise Normalization
Post by: lvqcl on 2010-03-17 15:42:12
Quote
I did these tests using libvorbis 1.2.2 (the version included with Foobar)

?? foobar2000 doesn't have any encoder included.

Quote
which I thought had all of the more important aoTuv stuff folded in

IIRC official libvorbis still incorporates aoTuV beta2 code (i.e. the same as libvorbis 1.1.0).
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-17 16:16:17
Quote
I did these tests using libvorbis 1.2.2 (the version included with Foobar)

?? foobar2000 doesn't have any encoder included.


My bad.  It does have a UI for one though.  I forgot that I had set this up a long time ago.  Will see if this has been corrected in the most recent versions of AoTuv.
Title: Vorbis Noise Normalization
Post by: googlebot on 2010-03-17 16:27:06
I never understood why AoTuv has been there so long. It's not a classical experimental branch, but widely used as a preferable choice. So why doesn't the author commit to mainline directly? He seems to know what he is doing.
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-17 16:28:18
Quote
I thought the problem of the HF energy being increased by the quantization in noise normalization was fixed long ago in aoTuv, so I wonder if what you're hearing has anything at all to do with it to begin with.


BTW, can you please define "long ago"?  Do you mean before Beta 2, long enough ago that libvorbis would have incorporated it, or not?
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-17 16:48:28
Quote
I never understood why AoTuv has been there so long. It's not a classical experimental branch, but widely used as a preferable choice. So why doesn't the author commit to mainline directly? He seems to know what he is doing.


Yeah, I haven't kept up with the development of Vorbis for the past few years.  I'm just starting to catch up now.  I was under the impression until today that AoTuv was a normal experimental branch and that everything worthwhile had been merged into libvorbis 1.1 and 1.2.
Title: Vorbis Noise Normalization
Post by: lvqcl on 2010-03-17 17:02:59
Changes described at http://www.geocities.jp/aoyoume/aotuv/ and http://www.geocities.jp/aoyoume/aotuv/old_beta.html :

Quote
aoTuV Beta5.5:
# Noise Normalization was reviewed. As a result, the bug is revised.

aoTuV Beta5:
# The action of noise normalization has been improved.  This has an effect in the sound roughness and tremor problem etc. in the low bitrate.

Beta4:
# Tuning of Masking relation and Noise Normalization. These mainly influence balance and the quantity of distortion which can be heard.
Title: Vorbis Noise Normalization
Post by: dsimcha on 2010-03-18 00:03:14
Quote
Changes described at http://www.geocities.jp/aoyoume/aotuv/ and http://www.geocities.jp/aoyoume/aotuv/old_beta.html :

Quote
aoTuV Beta5.5:
# Noise Normalization was reviewed. As a result, the bug is revised.

aoTuV Beta5:
# The action of noise normalization has been improved.  This has an effect in the sound roughness and tremor problem etc. in the low bitrate.

Beta4:
# Tuning of Masking relation and Noise Normalization. These mainly influence balance and the quantity of distortion which can be heard.



Thanks.  Tried aoTuv and it's truly amazing.  Now I can't even consistently ABX -q 2.  I can (barely) get it on some songs with lots and lots of high frequency content, but it's hard enough that I'm confident it will be transparent in more casual listening.  I keep everything as FLAC anyhow on my hard drive, since storage on a PC is so cheap and plentiful that it's not even worth taking a chance of quality loss here.  The Vorbis files are only for my portable player, where I listen with crappy headphones in crappy listening environments anyhow.

Now, to find more stuff to put into my newfound space on my Sansa...
Title: Vorbis Noise Normalization
Post by: HotshotGG on 2010-03-18 13:13:37
Quote
I've become very curious about how Vorbis noise normalization works. My (very vague) understanding is that it somehow avoids the metallic artifacts of MP3 and makes Vorbis's degradation when perceptual transparency can't be achieved sound much more like (less annoying) Gaussian noise. I can't seem to find a much more detailed explanation anywhere, though. Can someone please either explain it or point me to a decent explanation? I'm looking for a moderately technical explanation. Ideally, I want one that is intended for people with significant math background, etc., but is an overview rather than describing every small detail to someone who might want to implement noise normalization.


SebastianG summed it up pretty well. In addition to that, it uses a sorting technique (probably a quicksort or a bubblesort) to redistribute the energy to neighboring bands. It's by-band noise energy.
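
One plausible reading of the sorting step, sketched in Python. This is an assumption-laden illustration, not the libvorbis implementation: the function name, the `threshold` parameter, and its default value are all hypothetical. Within one block, the energy of the samples that rounded to zero is pooled, those samples are sorted by original magnitude, and the largest ones are promoted to +/-1 until the pooled energy is used up.

```python
def promote_zeros_by_rank(original, threshold=0.5):
    """Hypothetical sketch: spend the energy lost to zeroed samples
    on the largest of them, picked by sorting on magnitude."""
    quantized = [int(round(x)) for x in original]
    # Indices of samples that were rounded away to zero.
    zero_idx = [i for i, q in enumerate(quantized) if q == 0]
    # Energy the block lost because of those samples.
    lost = sum(original[i] ** 2 for i in zero_idx)
    # Visit the zeroed samples from largest to smallest original magnitude.
    for i in sorted(zero_idx, key=lambda i: abs(original[i]), reverse=True):
        if lost < threshold:
            break  # not enough pooled energy left to justify another +/-1
        quantized[i] = 1 if original[i] > 0 else -1
        lost -= 1.0  # a unit sample carries energy 1
    return quantized


block = [0.4, -0.45, 0.3, 0.2, -0.1, 0.48, 0.35, 0.05]
print(promote_zeros_by_rank(block))  # [0, 0, 0, 0, 0, 1, 0, 0]
```

Plain rounding would turn this whole block into silence even though it carries about 0.86 of energy. Here that energy is concentrated into the single largest sample instead. Python's `sorted` is used for brevity; which sort algorithm the encoder uses does not matter for the idea.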