HydrogenAudio

Hydrogenaudio Forum => Scientific Discussion => Topic started by: charleski on 2008-05-19 18:07:17

Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-19 18:07:17
I hope this is the correct forum for this. I did a search and couldn't find anything directly concerning this issue, though I might have missed something buried in some of the threads; if so, forgive me.

I've recently been refreshing my knowledge of dither and noise-shaping, and one thing occurred to me. Noise-shaping works by pushing quantisation noise away from the areas where the ear is most sensitive. But noise-shaping is also very fragile (as shown by Hicks, 95 (http://www.digitalsignallabs.com/noiseb.ps)). So what is the correct thing to do when producing audio that you know is going to be subjected to lossy compression?

I did a test of my own which basically seems to show that you might as well forget about noise-shaping completely for music which will be compressed.
First, I created (at 32bit 48kHz) a -80dB 1kHz test tone in Audition.
I then converted that to a 16bit 48kHz file using 0.6bits of triangular dither and Audition's 'D' noise-shaping (which is very similar to many of the psychoacoustic noise-shaping patterns used by other manufacturers).
This file was then converted to mp3 using LAME 3.98 beta 8 at a variety of different bitrates.
The mp3s were then imported back into Audition and compared.
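Here's roughly what the dithering step looks like in code (a minimal numpy sketch; Audition's 'D' curve is proprietary, so a generic second-order error-feedback shaper with made-up coefficients stands in for it):
Code
import numpy as np

fs, dur = 48000, 5.0
t = np.arange(int(fs * dur)) / fs
x = 10 ** (-80 / 20) * np.sin(2 * np.pi * 1000 * t)      # -80dB, 1kHz tone

q = 1.0 / 32768.0                                         # one 16-bit LSB
# 0.6 bits of TPDF dither: difference of two uniform sources, +/-0.6 LSB peak
d = 0.6 * q * (np.random.rand(len(x)) - np.random.rand(len(x)))

def shape_quantize(x, d, q, b=(1.0, -0.5)):
    """Quantise to step q with error-feedback noise shaping (illustrative b)."""
    y = np.empty_like(x)
    e = np.zeros(len(b))                  # history of quantisation errors
    for n in range(len(x)):
        v = x[n] - np.dot(b, e)           # subtract filtered past errors
        y[n] = np.round((v + d[n]) / q) * q
        e = np.roll(e, 1)
        e[0] = y[n] - v                   # newest error at the front
    return y

y = shape_quantize(x, d, q)               # 16-bit values, still stored as float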

Here is a comparison of the FFT of the original waveform (green) with that of the waveform compressed with -V0 (blue):
(http://img529.imageshack.us/img529/3812/mp3noiseshapinget6.th.jpg) (http://img529.imageshack.us/my.php?image=mp3noiseshapinget6.jpg)
Ouch

As Hicks showed, manipulating the audio after dithering destroys almost all the advantage gained by noise-shaping and leaves the HF noise.

As a check, I went back to the original and re-converted it to 16bits using 0.6bits of triangular dither with no noise-shaping, then compressed it to mp3. The following shows the comparison between the original file (green) and the sound when converted to -V0 mp3 (blue).
(http://img517.imageshack.us/img517/6204/mp3noiseshaping2ek8.th.jpg) (http://img517.imageshack.us/my.php?image=mp3noiseshaping2ek8.jpg)
Finally, a comparison of -V0 mp3s produced from 16bit files without (green) and with (blue) noise-shaping applied during the dithering:
(http://img45.imageshack.us/img45/104/mp3noiseshaping3fh1.th.jpg) (http://img45.imageshack.us/my.php?image=mp3noiseshaping3fh1.jpg)
The mp3 from a noise-shaped source appears to show around a 4dB lower noise-floor through the 2-4kHz band, but has a pronounced HF peak from 15-19kHz.

File sizes were (at -V0):
No noise-shaping: 95808 bytes
With noise-shaping: 77280 bytes
So the noise-shaped file did compress significantly better. (Maybe as a result of the slightly lower noise-floor in the upper mid-range?)

So... Is noise-shaping good or bad for lossy compressed audio? Could there be some interaction going on here with the algorithms inside the mp3 process? My first reaction on looking at the spectra was that you might as well just use flat dither and forget about noise-shaping, but looking at the file sizes has me wondering. Was there anything I did wrong in my tests? Should I have tested something else? (I plan to do the same thing using Nero's AAC encoder later on.)
Title: Noise-shaping curves and downstream lossy compression
Post by: pdq on 2008-05-19 18:12:35
I'm a little confused here. You should only need to add dither when converting to a lesser bit depth. But if you are next going to convert to lossy compression, why are you first decreasing the bit depth? Why not just use the full precision going into your lossy compression?
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-19 18:25:26
I'm a little confused here. You should only need to add dither when converting to a lesser bit depth. But if you are next going to convert to lossy compression, why are you first decreasing the bit depth? Why not just use the full precision going into your lossy compression?

Because I found that LAME didn't support the 32bit file. [Edit: I first thought it merely truncated it, but checked and found that was a result of the frontend I was using. I tried again using it directly on the command line and got an 'Unsupported data format: 0x0003' error.]
Also, anyone producing material for distribution will need to prepare a 16/44.1 final master, and adding an extra master for conversion to lossy formats is both more expense and more opportunity for things to go wrong (i.e. '24bit' files that are just 16bits padded out, etc.). The important question is: what's the best way to create your final 16/44.1 master that will produce the best quality on both CD and mp3/AAC, etc?
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-05-19 21:39:54
-Don't dither, just feed the higher resolution audio into the codec.
(I do that when using oggenc)

-What do your spectrograms tell us? It seems to me you're using an mp3 decoder (yeah, DEcoder) which doesn't have a noise-shaping output. This way the advantages of the noise-shaped codec input will obviously be lost
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-19 23:35:32
-Don't dither, just feed the higher resolution audio into the codec.
(I do that when using oggenc)

-What do your spectrograms tell us? It seems to me you're using an mp3 decoder (yeah, DEcoder) which doesn't have a noise-shaping output. This way the advantages of the noise-shaped codec input will obviously be lost

I'm using the Fraunhofer mp3 decoder built into Audition 3.0. I think that's representative of a good-quality commercial mp3 implementation. MP3 decoders may use noise-shaping to alter the quantisation error introduced in their coding process, but this is a different thing from the noise-shaping used in word-length-reduction dither. You'll have to explain how noise-shaping in the decoding stage would retain the noise shape introduced before compression, rather than simply disrupting it, as any signal manipulation would.

I can't see any point at all in feeding a 24/96 signal through a lossy compressor. If you need to use lossy compression to reduce the size of your files, then 16/44.1 must obviously be the first step; I think the issue of the audibility of high sample-rates and word-lengths has already been done to death on this forum. If you want to retain 24/96 for whatever reason then you should be using lossless compression, and this discussion has nothing to do with that. A correctly-dithered reduction to 16bits is a highly effective form of lossy audio compression.

Dithering down to 16bits is essential for mp3, at least. I discovered that LAME will accept 32bit data in type 1 format (4-byte integer PCM, as opposed to the default type 3 floating-point format). I fed this into LAME at -V0 and the result wasn't pretty:
(http://img181.imageshack.us/my.php?image=mp3woditherkt7.jpg)
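For reference: WAV format tag 0x0001 is integer PCM and 0x0003 is IEEE float, which is what that error message refers to. A sketch of writing the same tone both ways, assuming the Python soundfile package:
Code
import numpy as np
import soundfile as sf                     # assumed: pip install soundfile

fs = 48000
x = 10 ** (-80 / 20) * np.sin(2 * np.pi * 1000 * np.arange(fs * 5) / fs)

sf.write('tone_pcm32.wav', x, fs, subtype='PCM_32')  # format tag 0x0001
sf.write('tone_float.wav', x, fs, subtype='FLOAT')   # format tag 0x0003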
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-05-20 07:03:09
Downsampling from 96k to 44.1 or 48k might be useful before entering an audio codec; reducing bit length should simply be avoided. Why go down to 16 bits when the encoder internally works with floats/long integers?
Adding dither will just add to the noisefloor....

Most people enter and leave the mp3 world at 44.1/16bits and so are unaware of the fact that mp3's dynamic range, unlike CD's, is not limited to 96dB!
Make a similar test file with the sine amplitude raised by +24dB, create an MP3, and use mp3gain to alter the volume by -24dB.
Now make a spectrogram of the mp3 decoder's float output....
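Something like this sketch, assuming lame and mp3gain are on your PATH (mp3gain's -g switch works in 1.5dB steps, so -16 steps = -24dB):
Code
import subprocess
import numpy as np
import soundfile as sf                     # assumed: pip install soundfile

fs = 48000
t = np.arange(fs * 5) / fs
x = 10 ** ((-80 + 24) / 20) * np.sin(2 * np.pi * 1000 * t)   # tone raised 24dB
sf.write('hot_tone.wav', x, fs, subtype='PCM_16')

subprocess.run(['lame', '-V0', 'hot_tone.wav', 'hot_tone.mp3'], check=True)
subprocess.run(['mp3gain', '-g', '-16', 'hot_tone.mp3'], check=True)  # -24dB
# Now decode hot_tone.mp3 with a float-output decoder and look at the spectrum.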


Also, you post lots of spectrograms, but what is shown:
the bit length of the codec output? Was dither applied to the codec output? If so, what type of dither?
I guess the ogg spectrograms aren't from 16 bit output.
The MP3 output seems to be white-noise dithered....
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-20 18:03:39
Downsampling from 96k to 44.1 or 48k might be useful before entering an audio codec; reducing bit length should simply be avoided. Why go down to 16 bits when the encoder internally works with floats/long integers?
Adding dither will just add to the noisefloor....
Take a look again at the spectrum of the 32bit mp3 decoded with Fraunhofer's decoder and the spew of quantisation distortion introduced. The decoder's output was still truncated even if I used Open As... to open it as a 32bit file. I fiddled with foobar and was able to decode that same mp3 properly back to a 32bit wav, which produced a far more acceptable result, albeit with a 500Hz component 56dB below the signal. I then changed the foobar option to output 16bits with no dither (which is the default option in foobar) and produced a wav with the same spectrum seen from the Fraunhofer decode.

I can see your reasoning: an mp3 is just a stream of floating-point coefficients, so in theory the word-length of the source should only affect the noise-floor of the output. In practice, though (and I think it's fair to take the Fraunhofer decoder as representative of the commercial decoders most people use to play their music), something else is going on, and feeding the decoder with a signal from a long-word-length source introduces significant distortion.

The reality is that when it comes to distributing your music, the norm is 16bits. Distributing at longer word-lengths is more likely to harm your music than help it. It doesn't matter how great your final master sounds in the studio; if it sounds like sh*t on an iPod you aren't going to impress people. In an ideal world all music would be compressed straight from 32bit masters and every (mp3/aac/ogg) player would noise-shape and dither correctly in hardware on playback. The world ain't ideal, though :/. This is why skipping the dither really isn't an option.

Quote
Most people enter and leave the mp3 world at 44.1/16bits and so are unaware of the fact that mp3's dynamic range, unlike CD's, is not limited to 96dB!
Make a similar test file with the sine amplitude raised by +24dB, create an MP3, and use mp3gain to alter the volume by -24dB.
Now make a spectrogram of the mp3 decoder's float output....
Yep, the results should be identical.


Quote
Also, you post lots of spectrograms, but what is shown:
the bit length of the codec output? Was dither applied to the codec output? If so, what type of dither?
I guess the ogg spectrograms aren't from 16 bit output.
The MP3 output seems to be white-noise dithered....

The Vorbis filter reports all .ogg files as being 32bits, so I assume it decodes them as such.
I decoded the mp3 files prepared from 16bit dithered wavs back to wav using foobar with the option set to 16 bits, no dither, and got identical results to those shown from the Fraunhofer decoder, so I assume those are the settings it's using.
Significantly, while I was playing around in foobar I also decoded these files to 32bit wavs (32bit output, no dither) and got the following result: (both mp3s prepared at -V0 from 16bit sources, the source for the green curve dithered with 0.6 bits of flat TPDF dither, the blue used 0.6 bits of TPDF with Audition Type 'D' noise-shaping)
(http://img136.imageshack.us/img136/6533/32bitmp3noiseshaping1kn5.th.jpg) (http://img136.imageshack.us/my.php?image=32bitmp3noiseshaping1kn5.jpg)
Aha!
The shape of the noise-shaping can now be seen (refer to the first spectrum in my original post) and is massively exaggerated. Furthermore, the low-frequency noise seems to be reduced in the noise-shaped example (its level fluctuated greatly, but the capture shown is reasonably representative).

I changed the output setting to 24bits and decoded these mp3s back to wav again, producing a similar result, but with less exaggeration of the noise-shaping (since the overall noisefloor is higher).  As before, green=no noise-shaping, blue=Type 'D' noise-shaping.
(http://img362.imageshack.us/img362/8268/24bitmp3noiseshaping1ly4.th.jpg) (http://img362.imageshack.us/my.php?image=24bitmp3noiseshaping1ly4.jpg)

This is certainly very interesting. Remember that both the source files for the spectra shown were 16bits. Yet these data seem to imply that in order to reproduce the noise-shaping characteristics of the source, the mp3 should be decoded to a higher word-length. A 24bit word-length decode is, at least, a practical option, as 24bit DACs are common these days (though I doubt your common-or-garden Realtek chip produces anywhere near the SNR of a quality 24bit DAC, but at least it shouldn't truncate the signal).

Of course, I suspect the vast majority of hardware players are stuck with a 16bit decode chain, so the relevance of this is moot, though it's certainly interesting.
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-22 17:23:35
I got access to a couple of systems with many different commercial dithering algorithms installed as part of different packages and managed to test them. Although I didn't expect to see any difference from the noise-shaping algorithm built into Audition, I thought I should give them a chance.

I restricted the tests to mp3 and aac, as the intent is to see what effect noise-shaping has when used with codecs in common commercial use. I will probably return to Ogg with some later tests, though, as it has some interesting properties. Mp3 decoding was performed using the Fraunhofer decoder built into Audition. AAC decoding was performed using this (http://www.free-codecs.com/Nero_MPEG-4_filter_download.htm) filter frontend interfacing with a Nero aac decoder .dll (version 2.5.9.991 dated 23/02/2005). I assume that both represent the current state-of-the-art in terms of commercial decoding, but I plan to set up some further tests later in order to test a few commonly-used players directly. Mp3 encoding was performed using LAME 3.98 beta 8 at -V0; AAC encoding was performed using neroaacenc v 1.1.34.2, August 2007, at -q 1.

As before, the input was a -80dB 1kHz sine wave created at 32bits (float) which was converted to 16bits using the dither/noise-shaping algorithm under test, saved as a 16bit .wav and then encoded as detailed above. The graphs show the spectra of the decoded file using the tested noise-shaping algorithm in blue compared to the spectrum of the same tone dithered to 16bits using 0.6 bits of flat triangular PDF dither (green line).

Apogee UV22hr (Normal, no auto-blanking)
mp3
(http://img528.imageshack.us/my.php?image=lamev0uv22hrms5.jpg)
[Continued in next post]

Waves IDR (Normal)
mp3
(http://img329.imageshack.us/img329/807/lamev0idrstdag0.th.jpg) (http://img329.imageshack.us/my.php?image=lamev0idrstdag0.jpg)
aac
(http://img186.imageshack.us/img186/3974/neroq1idrstdwe7.th.jpg) (http://img186.imageshack.us/my.php?image=neroq1idrstdwe7.jpg)
Note: these files were dithered using an old (2005) standalone IDR plugin. The IDR incorporated into Waves' final-output compressors produced inconsistent results which varied with the host and appears to use a slightly different algorithm.

Pow-r (Type 3)
mp3
(http://img293.imageshack.us/img293/5168/lamev0powr3cr9.th.jpg) (http://img293.imageshack.us/my.php?image=lamev0powr3cr9.jpg)
aac
(http://img169.imageshack.us/img169/5134/neroq1powr3go5.th.jpg) (http://img169.imageshack.us/my.php?image=neroq1powr3go5.jpg)
The Pow-r implementation used was that incorporated into Logic. As Logic (bizarrely) still cannot read 32-bit files, the test-tone was recreated inside Logic using its own tone generator plugin.

All algorithms except UV22hr (which uses shaped dither noise rather than the error feedback of true noise-shaping) exhibit the same failure when passed through a lossy encode->16bit decode process. For reference, here are the original spectra of the different dithering algorithms compared to flat TPDF dither:
(green - flat TPDF; red - IDR; purple - MBIT+; yellow- Pow-r#3; blue - UV22hr)
(http://img175.imageshack.us/img175/994/unencodedgrnnnsredidrpuhn2.th.jpg) (http://img175.imageshack.us/my.php?image=unencodedgrnnnsredidrpuhn2.jpg)
All efforts at more extreme noise-shaping are restricted to around a 4dB improvement in the critical upper midrange while the high-frequency noise hump remains, though the hf filtering performed by LAME removes much (or, in the case of UV22hr, all) of that. I'm not convinced that any of them show a significant improvement over simple flat dither under the conditions imposed here.
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-05-22 17:44:50
In the spectrograms, I see some effect of the noise-shaped dither, so I think it's good to use noise-shaped dither.
However, it's better to just encode from a 24bit (or float) source.

On small tests I did with the ogg codec (float in/out) I got very good results with this codec tandem: the noisefloor is way lower than in your spectra.

command line encoder:
oggenc.exe
OggEnc v1.0.2
© 2000-2005 Michael Smith <msmith@xiph.org>

Decoder:
OggdropXPd V1.6.11e

However, don't use this oggdrop version as an encoder. It seems like somehow, somewhere, LSBs are lost, raising the spectrum, but it doesn't sound like dither noise! At the very low volumes we're looking at, things like that can break all results.
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-22 19:52:21
In the spectrograms, I see some effect of the noise-shaped dither, so I think it's good to use noise-shaped dither.
However, it's better to just encode from a 24bit (or float) source.
Is a 4dB drop in noisefloor over the 1-5kHz band worth the increased noise above 16kHz? Far more importantly, is it worth the bits wasted to encode this high-frequency noise? Personally, I don't think so. The HF noise is a trade-off that is justified in uncompressed signals by a dramatic reduction of the noisefloor in the area where the human ear is most sensitive. When the benefits of noise-shaping are reduced by half or more, yet the penalties remain, those penalties aren't worth it.

As far as high-bit-depth compression goes, I think the real message is that it's better to decode to a bit depth higher than that used for the encoding. Certainly, as far as mp3s go, if you encode from a 24bit source, you must decode to at least 24bits, which is something that requires special configuration of the decoder that will only be possible in certain situations. Otherwise you get quantisation distortion, which is significantly worse than the uncorrelated noise added by dither.

Quote
On small tests I did with the ogg codec (float in/out) I got very good results with this codec tandem: the noisefloor is way lower than in your spectra.
Since most of the spectra I posted were of 16bit decodes, it's not surprising that a 32bit file will show a far lower noisefloor. No-one listens to music at 32bits though. 24bits is the limit of all current DACs (and 24bits surpasses the dynamic range of both the human ear and all current audio technology). Decoding to 32bits is of interest to see how the codec performs, but is irrelevant in terms of listening to music, as the signal is just truncated (and thereby distorted) in the process of feeding it to the DAC for conversion to analogue. If your DAC is 24bits, then the quantisation distortion will be at a low enough level to be imperceptible in the vast majority of cases. But if it's only 16bits, or a cheap '24bit' converter that's only really capable of 16bits, then the distortion products caused by the truncation will be objectionable.

The noisefloor seen on an FFT plot is determined by the size of the FFT used. A mechanism known as processing gain causes the SNR of the FFT to increase by 10log10(M/2), where M is the number of points in the FFT. You thus need to choose an FFT size that gives a noisefloor low enough for the products of quantisation distortion to be clearly visible; beyond that, the actual FFT noisefloor is merely an artifact of the parameters used in the measurement. All the spectra I posted used an FFT size of 8192 with Blackman-Harris windowing, giving a measurement noisefloor 36dB below the quantisation limit. (See the paragraphs at the end of Kester, 2005 (http://www.analog.com/en/content/0,2886,761%255F795%255F88014%255F0,00.html).)
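As a quick check of that figure (this ignores the window's equivalent noise bandwidth, which shifts it by a couple of dB):
Code
import math
M = 8192
print(10 * math.log10(M / 2))   # ~36.1dB of processing gain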

Quote
However, don't use this oggdrop version as an encoder. It seems like somehow, somewhere, LSBs are lost, raising the spectrum, but it doesn't sound like dither noise! At the very low volumes we're looking at, things like that can break all results.

I haven't examined Ogg further because it appears to be using optimisations that make measurement of low-level noise or distortion products difficult, if not impossible. It certainly operates in a very different way to mp3 and aac.
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-05-23 06:07:17
Is a 4dB drop in noisefloor over the 1-5kHz band worth the increased noise above 16kHz? Far more importantly, is it worth the bits wasted to encode this high-frequency noise? Personally, I don't think so.

When normal-volume sounds are present, the dither amplitude is so low compared to the audio that it'll be masked completely and won't have any effect on the VBR encoding rate.
Even at low volumes, I doubt the HF dither noise will contribute much to the required VBR rate. A good encoder uses the same hearing-threshold frequency curve as the noise-shaped dither.
Best: whenever you can, input longer word lengths to the encoder; this way no bits are spent encoding dither.


if you encode from a 24bit source, you must decode to at least 24bits, which is something that requires special configuration of the decoder that will only be possible in certain situations. Otherwise you get quantisation distortion, which is significantly worse than the uncorrelated noise added by dither.

For mp3 it's irrelevant what the source was. The decoder works internally with floats or long ints; any mp3 output at 16 bits has quantization in its output stage. Using a dither algorithm in this output stage trades this nasty quantization distortion for noise.
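A sketch of the two output stages (numpy assumed; a quiet test tone stands in for the decoder's internal float signal):
Code
import numpy as np

fs = 44100
q = 1.0 / 32768.0
decoded = 10 ** (-80 / 20) * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)

truncated = np.floor(decoded / q) * q                  # bare 16-bit truncation
tpdf = q * (np.random.rand(fs) - np.random.rand(fs))   # +/-1 LSB TPDF dither
dithered = np.round((decoded + tpdf) / q) * q          # dithered output stage
# An FFT of 'truncated' shows harmonic spikes; 'dithered' shows a flat floor.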
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-25 05:07:42
When normal-volume sounds are present, the dither amplitude is so low compared to the audio that it'll be masked completely and won't have any effect on the VBR encoding rate.
Even at low volumes, I doubt the HF dither noise will contribute much to the required VBR rate. A good encoder uses the same hearing-threshold frequency curve as the noise-shaped dither.
Best: whenever you can, input longer word lengths to the encoder; this way no bits are spent encoding dither.
No, long word-lengths are not the answer.

Quantisation distortion is readily audible and FAR worse than any dither noise. I really don't understand your fascination with encoding at long word-lengths. I have shown that, unless you can guarantee that decoding happens at a word-length the same or longer, you merely introduce truncation with all the attendant distortion artifacts.

There is no reason to encode lossy files at a word-length greater than 16bits. You gain no benefit by doing so. If you want to cater to an audience that requires more than the ~100dB dynamic range that a properly-dithered 16bit signal provides, then you will be providing them with LOSSLESSLY encoded files; you won't be using mp3s!

LAME's built-in low-pass filter does indeed 'compensate' for hf dither noise, but it also filters out any real signal in that region. Nero's implementation of the aac spec faithfully reproduces the noise, and no doubt wastes bits doing so.

Quote
For mp3 it's irrelevant what the source was. The decoder works internally with floats or long ints; any mp3 output at 16 bits has quantization in its output stage. Using a dither algorithm in this output stage trades this nasty quantization distortion for noise.
No, I'm sorry, but you're wrong. My tests have shown that mp3 decoding differs according to both the source and output word-length, and I've presented the evidence. I'm still working on testing real-world decoders, but I seem to have a solution and will post my results soon.

But please, can we lay this 'encode at a long word-length' stuff to rest? It's not productive.
1) No DAC can reproduce 32bit signals without distortion; these need to be dithered down to 24bits at least.
2) 16bits remains the standard for distribution of music, and unless you listen using a $500k system in a specially-constructed sound-isolated room, 16 bits is good enough.

[Edit]And finally, I've provided evidence for everything I've said. Some of my findings have surprised me, and certainly your comments spurred me to look at how decoders work when specifically told to output at higher bit-depths, which produced some interesting results. But I showed you what the problems were with encoding at over 16bits very early on, and I think you need to go back and read over this. Merely restating your argument about encoding at a high bit-depth is a waste of time, because it's wrong for everyone who doesn't explicitly set up all their decoders to compensate for this, and it's wrong in a way that's far, far, far more noticeable than dither noise would ever be.
Title: Noise-shaping curves and downstream lossy compression
Post by: SebastianG on 2008-05-25 10:19:24
Quote
Quantisation distortion is readily audible and FAR worse than any dither noise

That's why we dither!


My understanding is that you have a "long word" source and want to create MP3 files. Is this correct?

If so, you already got the correct answer a couple of times now: Encode the "long-word" version directly to MP3. The intermediate step (converting to 16 bits prior encoding) makes absolutely no sense whatsoever. It won't hurt too much either but it doesn't make any sense. You don't gain anything by doing it.

In case you have to generate a 16 bit version (i.e. for burning a CD) and are only concerned that somebody might create MP3s from it, then the correct answer would be: use a good noise shaping method, because it makes the noise more inaudible, which should result in the MP3 encoder ignoring it even more due to psychoacoustics. The effect is that you end up with an MP3 that has a very low noise floor, because some of the dither/quantization noise has been removed as inaudible by the encoder.

That's the theory. The practical side is that there're probably many MP3 decoders available that don't dither.

Think of the MP3 encoding process as something that adds some artefacts. These artefacts are not tied to any word length resolution; they're arbitrary. So when one of your original 16 bit samples was 2042, the encoding part may add "the artefact" 2.1828, which results in the sample 2044.1828, which can't be represented with 16 bit accuracy. So the decoder always needs to convert high resolution samples to low resolution samples -- unless you let the decoder output floating point data directly. The correct way of converting it to 16 bits includes dithering, of course -- regardless of what your source accuracy was.
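In code, the per-sample picture looks something like this (a sketch; the artefact value is just the one from the example above):
Code
import random

decoded = 2042 + 2.1828        # 2044.1828: no longer a 16-bit value

truncated = int(decoded)       # 2044; the error is correlated with the signal
tpdf = random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)
dithered = round(decoded + tpdf)   # 2044 or 2045; the error becomes noise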


Cheers,
SG
Title: Noise-shaping curves and downstream lossy compression
Post by: charleski on 2008-05-29 00:27:42
PC player tests:
Obviously it's critical to investigate whether the results seen earlier are representative of the output of players commonly used for compressed music. I looked at three players that are widely used: iTunes, Winamp and foobar 2k. Although Winamp offers a variety of different output modules, none appear to offer 24bit output, and the default Nullsoft modules were used. Unless specifically noted, the output from foobar was measured using the default 16bit setting. All EQ or other DSP in the players was bypassed. The test-tones used were a -60dB sine wave generated at 32bits and dithered to 24bits using 0.6bits of triangular dither, or to 16bits using flat triangular dither or Audition Type 'D' noise-shaping. Mp3s were encoded with LAME at -V0; aac was encoded with Nero at -q 1 (program versions as given in previous posts).

To capture the output of these players I used an Emu 1616m soundcard, placing an ASIO send insert on the stereo Wave input to the DSP mixer (before the faders) and then recording this signal using Audition. This seems to provide a faithful reproduction of the output signal, with no A/D conversion or gain changes. Below is a spectrum comparing the original 24bit wave (green) with that recorded from foobar when the output is set to 24bits (blue).
(http://img140.imageshack.us/img140/2179/foobar24wavvj9.th.jpg) (http://img140.imageshack.us/my.php?image=foobar24wavvj9.jpg)

As before, encoding at word lengths longer than 16bits led to significant quantisation distortion at the output in all cases except that in which foobar was set to produce 24bit output. For example, the following spectrum shows the original 24bit signal (green) compared to the output from foobar (red), iTunes (purple) and WinAmp (blue) when decoding an aac file created at 24bits.
(http://img229.imageshack.us/img229/6350/foobar16ituneswinamp24aou7.th.jpg) (http://img229.imageshack.us/my.php?image=foobar16ituneswinamp24aou7.jpg)
Both foobar and iTunes show clear and marked distortion products typical of truncation. The output from iTunes, however, demonstrates what appear to be more complicated intermodulation products.

Encoding 24bit signals directly results in quantisation distortion on decoding. Obviously there are several people on this forum who believe that DCT-based encoding systems are immune to considerations of word-length. The facts say different and make it clear that dithering down to 16bits before encoding is essential to avoid ugly distortion when the music is played. These facts aren't subtle, they're blatantly obvious.

I generated a lot of data in these tests, and will only present the important results here.

The following set of spectra show the output of foobar (yellow), iTunes (red) and Winamp (blue) when fed with an mp3 from a 16bit noise-shaped input. All show the same destruction of the benefits of noise-shaping seen above with the Fraunhofer decoder integrated into Audition.
(http://img412.imageshack.us/img412/151/foobarituneswinampnsmp3nj7.th.jpg) (http://img412.imageshack.us/my.php?image=foobarituneswinampnsmp3nj7.jpg)
The Winamp decoder did appear to generate small intermittent distortion spikes at 2 and 4kHz, one of which is visible in the spectrum. Neither foobar nor iTunes showed the same deficit.

The following is a similar comparison using an aac file encoded from the same 16bit noise-shaped test-tone. As before, foobar's output is in green, iTunes in red and Winamp in blue.
(http://img135.imageshack.us/img135/5431/foobarituneswinampnsaacun6.th.jpg) (http://img135.imageshack.us/my.php?image=foobarituneswinampnsaacun6.jpg)
The distortion spikes produced by Winamp are more pronounced when decoding aac, and it's apparent that Winamp suffers serious problems somewhere in its decoding or output stages. Both foobar and Winamp faithfully reproduce the high-frequency noise introduced by noise-shaping, but iTunes appears to use some form of low-pass filter that rejects this.

When foobar's output was changed to 24bits, the benefits of noise-shaping were exaggerated in a similar fashion to that seen in the earlier posts. The following spectrum shows the original 16bit noise-shaped signal (green) compared to foobar @24bits decoding an mp3 (red) and aac (blue).
(http://img339.imageshack.us/img339/1503/foobar24nsmp3aacab2.th.jpg) (http://img339.imageshack.us/my.php?image=foobar24nsmp3aacab2.jpg)
It does appear that there is some form of optimisation happening here, which is turned on when the signal level drops below a certain floor.
The slightly higher noise-floor present in a signal which has used flat triangular dither does not appear to trigger this decoding optimisation, as shown here (as before, original signal in green, foobar@24 mp3 decode in red, foobar@24 aac decode in blue):
(http://img504.imageshack.us/img504/161/foobar24nnsmp3aacog3.th.jpg) (http://img504.imageshack.us/my.php?image=foobar24nnsmp3aacog3.jpg)
Title: Noise-shaping curves and downstream lossy compression
Post by: SebastianG on 2008-05-29 08:37:31
Quote
Encoding 24bit signals directly results in quantisation distortion on decoding.

Not necessarily!
Well, you always need to quantize to 16 bits within the decoder if you want to get a 16 bit signal! So you always get quantization errors! The difference is that -- when dithering is in use -- "nasty" quantization errors turn into "friendly" quantization errors.

Quote
Obviously there are several people on this forum who believe that DCT-based encoding systems are immune to considerations of word-length. The facts say different and make it clear that dithering down to 16bits before encoding is essential to avoid ugly distortion when the music is played. These facts aren't subtle, they're blatantly obvious.

Facts? No, it's just your conclusion. Encoding from 24 bit signals is not the problem. It's the decoders. They always need to dither. What you seem to be experiencing is that the "16 bit dither" you applied prior to encoding gets passed through mp3 encoding & decoding, which prevents bad decoders from producing nasty quantization errors in this particular case.

Consider this: You checked Foobar's 24 Bit output and everything seems fine. Now, converting this 24 bit result to 16 bit is simple. You'd agree with me that with dithered 24->16 conversion there's no way that "truncation artefacts" pop up. So, the problem with decoding it to 16 bit is that some decoders fail to dither. They simply truncate. If you want a decoder to output 16 bit it should generate a high resolution version and dither it down to 16 bits. That's what we've been telling you.

BTW: Foobar's dithering is optional IIRC. Are you sure that you turned it on? Because it doesn't seem like you did.

Cheers.
SG
Title: Noise-shaping curves and downstream lossy compression
Post by: Polouess on 2008-05-30 16:31:02
Encoding from 24 bit signals is not the problem. It's the decoders. They always need to dither. What you seem to be experiencing is that the "16 bit dither" you applied prior to encoding gets passed through mp3 encoding & decoding, which prevents bad decoders from producing nasty quantization errors in this particular case.
[...]
So, the problem with decoding it to 16 bit is that some decoders fail to dither. They simply truncate.


Yeah, but these bad decoders you're talking about include iTunes and Adobe Flash Player (Win XP). In my own tests these programs introduce quantization distortion when decoding 24 bit wave files, and mp3 files encoded from a 24 bit source. Which means iTunes and Flash Player don't dither, they truncate!

Why, you ask? Because 99% of the music played through them comes from properly dithered 16 bit sources, and the dither noise is preserved well enough even through mp3 encoding. So the developers just didn't care about this topic.
Even the mp3 decoders built into the professional sequencers Sequoia and Nuendo truncated my test files to 16 bits.

My conclusion from this is:
If it's just for your personal use with foobar2000 or perhaps some suitable winamp output plugin, then you might encode your 24bit music directly. This would save you some bits the encoder would have wasted for the dither noise, so you'd have a slight advantage even if you dither to 16 bit at playback.

But if you want to encode your music for everybody, then you should in fact feed the encoder with dithered 16 bit, in order to avoid truncation errors at playback with, as I fear, almost any commercial player.


This is similar to the encoder clipping issue: of course the mp3 format can store levels higher than full scale (and lower than -96 dB). But none of the players out there (except foobar2000 and perhaps Winamp) care about it! They just clip everything that surpasses 0dBFS (and truncate everything that goes below -96 dBFS). This is true for freeware players as well as professional sequencers.
Encode from 16 bit dithered, and attenuate modern music by, say, 3 dB (prior to dithering!) to avoid encoder clipping.
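Here's a minimal numpy sketch of that chain (the function name and the -1 to +1 float scaling convention are mine, not from any particular tool):
Code
import numpy as np

def prepare_for_lossy(x, atten_db=3.0):
    """Attenuate, then TPDF-dither to 16 bits; feed the result to the encoder."""
    q = 1.0 / 32768.0
    x = x * 10 ** (-atten_db / 20)                     # headroom for the codec
    tpdf = q * (np.random.rand(len(x)) - np.random.rand(len(x)))
    return np.clip(np.round((x + tpdf) / q), -32768, 32767) * q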
Title: Noise-shaping curves and downstream lossy compression
Post by: SebastianG on 2008-05-30 17:30:14
Why, you ask? Because 99% of the music played through them comes from properly dithered 16 bit sources, and the dither noise is preserved well enough even through mp3 encoding.

There's no guarantee for that.

So the developers just didn't care about this topic.

Maybe they didn't know.

Encode from 16 bit dithered, and attenuate modern music by, say, 3 dB (prior to dithering!) to avoid encoder clipping.

Then you'd have technically inferior MP3 files due to the unnecessary step of adding noise prior to encoding. No thank you. I prefer to blame decoders ...

Cheers.
SG
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-05-31 06:48:14
But if you want to encode your music for everybody, then you should in fact feed the encoder with dithered 16 bit, in order to avoid truncation errors at playback...

You can't model the lossy codec as a straight-through wire! If that were the case, you would be right.

Instead, the lossy codec's output isn't exactly equal to its input. Not only does the codec remove some audio parts (which it figures aren't audible), but also, because it splits the signal into frequency bands, the output differs from the input. These differences are float values, up to hundreds of LSBs (in 16 bit terms).
So even for a 16 bit source, the codec output is basically float, and to avoid truncation, dither should be applied.

I once added dither to open source decoders myself, plots with and without dither in output can be found at
http://dither123.dyndns.org/ (http://dither123.dyndns.org/)
PS: At the moment I'm upgrading these routines with noise shaping techniques I learned from Sebastian G
Title: Noise-shaping curves and downstream lossy compression
Post by: Polouess on 2008-05-31 09:32:40
So even for a 16 bit source, the codec output is basically float, and to avoid truncation, dither should be applied.


The codec output might be "basically float", but this doesn't mean the encoder is improving the SNR, right?
So truncating doesn't hurt too much here, because there's still enough noise left to avoid ugly truncation artifacts (I'm not sure about heavily noise-shaped dither, though, and thus agree with the OP).

You can prove me wrong with an example where a dithered decode of a 16 bit source mp3 performs visibly or audibly better than the truncated one.

Of course decoder dither would be the best thing, but the OP's goal was best possible playback on commonly used players, and I don't think that just blaming decoders is a practicable solution here...
Title: Noise-shaping curves and downstream lossy compression
Post by: A Dawg on 2008-06-03 06:03:46
I have encoded 24 bit wavs I made of my vinyl collection into a lossy compression. WMApro is something I recently discovered and the only 24 bit audio I can play through my 360. I think it sounds 200 times better than mp3 and 5 times better than uncompressed CD audio. (I made those numbers up, but you get my gist, I hope.)
Has anyone here worked with wmapro in 24/96? I don't see how my 130 meg flacs get compressed into 7 meg files and still sound so good.
Title: Noise-shaping curves and downstream lossy compression
Post by: benski on 2008-06-03 06:17:27
How are you generating the decoded output for Winamp?  And what version?  Any third party plugins? (like in_mad, in_!mpg123 or the in_mp4 from rarewares?)  There should be no distortion problems unless you've somehow enabled the EQ or pre-amp or a DSP or something like that.  From my own tests, Winamp MP3 decoding (@16bit) differs from Foobar (@16bit) by only about 5 samples per million (and just LSB differences, at that) so I'm surprised by your results.
If you are using the disk writer output plugin, you can enable 24bit playback in winamp preferences->playback (check "Allow 24 bit")
Title: Noise-shaping curves and downstream lossy compression
Post by: DualIP on 2008-06-04 08:39:47
I just did some testing myself and, to some extent, the topic starter is right!

Let's assume this setup:
1) We have a higher-resolution source, which is reduced in word length to 16 bits, using dither.
2) This 16 bit signal is fed through a lossy codec.
3) The codec works internally at higher than 16 bit resolution.
4) At the codec output, the word length is reduced to 16 bits without using dither.
From my tests, I've drawn these conclusions:

-Some of the dither noise added in 1) survives the codec.
However, the codec might get rid of some of the dither noise by switching off some frequency bands.
(Previous screenshots show, in the mp3 case, the removal of dither content above 18kHz.)

-If enough dither noise survives the codec, the word-length reduction in 4):
a) does not introduce quantization distortion.
b) adds white noise to the source at a level around -96 dB. This raises the noisefloor compared to the signal created in 1); basically, the white noise from step 4) is added to the dither noise from 1).
When 1) uses noise-shaped dithering, the noise addition of step 4) ruins the noise spectrum carefully created in step 1): the "valleys" in the spectrum where the ear is most sensitive are filled up.

-If not enough dither noise survives the codec, the word-length reduction in step 4) introduces quantization distortion.


So when you do have the setup mentioned in 1)...4), it's best to use in step 1) white dither noise at a somewhat higher than normal level (see the sketch below).
Why white noise instead of noise-shaped?
-HF frequency bands might get lost. In the case of noise-shaped dither, too much dither might be filtered out, resulting in quantization distortion in step 4).
-The spectrum shaped in step 1) gets lost when white noise is added in 4).
Why a somewhat higher level?
-To compensate for dither noise getting lost (or becoming too correlated) in the codec.
This way, enough dither is still present in step 4) to linearize the quantizer.
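Here's a toy numpy simulation of steps 1)...4). The 'codec' here is just a brutal low-pass in the FFT domain, not a real mp3 encoder, but it's enough to show what happens to the dither:
Code
import numpy as np

fs = 44100
q = 1.0 / 32768.0
x = 10 ** (-80 / 20) * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)

def tpdf(n, amp=1.0):                      # amp > 1 = "somewhat higher level"
    return amp * q * (np.random.rand(n) - np.random.rand(n))

x16 = np.round((x + tpdf(len(x))) / q) * q        # step 1: dither to 16 bits
X = np.fft.rfft(x16)                              # steps 2+3: float 'codec'
X[int(16000 * len(x16) / fs):] = 0                # discard bands above 16kHz
y = np.fft.irfft(X, len(x16))
out = np.floor(y / q + 0.5) * q                   # step 4: no dither, just round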
Title: Noise-shaping curves and downstream lossy compression
Post by: jido on 2008-06-04 10:01:30
I just did some testing myself and, to some extent, the topic starter is right!
[...]
So when you do have the setup mentioned in 1)...4), it's best to use in step 1) white dither noise at a somewhat higher than normal level.

So you are basically using a random function (white noise) as dither???
Does it work well with 1-bit white noise?

Are the ripped disks from our CD collection affected by this decoding issue?
Title: Noise-shaping curves and downstream lossy compression
Post by: Enig123 on 2008-06-04 10:48:51
Well, the problem addressed in this thread is quite interesting for lossy codecs. I read the whole thread and think the topic starter is quite right. I was wondering if commercial CDs are affected by the dither method applied at the last step, and, if so, whether we can apply a simple step (maybe additive white noise? at what amplitude?) before we do lossy coding to get rid of it.
Title: Noise-shaping curves and downstream lossy compression
Post by: pdq on 2008-06-04 11:36:22
Why not make this part of the encoder, i.e. have the encoder compensate for lack of dither during decoding?
Title: Noise-shaping curves and downstream lossy compression
Post by: 2Bdecided on 2008-06-04 11:51:46
No, long word-lengths are not the answer.

Quantisation distortion is readily audible and FAR worse than any dither noise. I really don't understand your fascination with encoding at long word-lengths. I have shown that, unless you can guarantee that decoding happens at a word-length the same or longer, you merely introduce truncation with all the attendant distortion artifacts.
Yes, but...

1. mp3 encoding adds artefacts. Often, they are inaudible. It depends on bitrate, encoder, and content. It's not possible to pick a bitrate and encoder that is artefact-free for all content and all listeners.

2. Quantisation distortion due to truncation from 24-bits to 16-bits is just another artefact. For most content, most of the time, it is entirely irrelevant - completely inaudible, and at a far lower level (subjectively and objectively) than the mp3 encoding artefacts.

3. There is no guarantee that your dither will survive encoding to act as effective dither at the output. Strictly speaking, the signal should be re-dithered at the output - it's just luck that most signals (dithered or not!) adequately self-dither most of the time. Your original dither isn't really "dither" at the output of the decoder - it's a noisy signal which might do the job well enough.

4. When encoding, dither costs bits. Noise shaped dither costs even more bits. This is well agreed, I think. If bits are in short supply, dither could reduce quality by taking bits away from regions that need those bits more. This is speculation - I think it's unlikely, but possible.

5. The people who care most about sound quality aren't using mp3.

6. Of the ones who are using mp3, the ones who care most about sound quality are using decoders with 24-bit or 16-bit dithered outputs.

7. Once you have added dither at the 16-bit level, you can't take it away again.

This all suggests to me that the smart thing to do is to feed the highest resolution you have into the lossy encoder.

If you must use 16-bits, then either don't noise shape at all, or use something like UV22 which adds extra noise only at such high frequencies that the mp3 encoder's low pass filter will remove it before encoding (hopefully in the floating point domain).
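UV22 itself is proprietary, but a simple stand-in with a similar flavour is high-passed TPDF dither, made by differencing successive uniform noise samples (a numpy sketch, not the UV22 algorithm):
Code
import numpy as np

def hp_tpdf_dither(n, q=1.0 / 32768.0):
    r = np.random.rand(n + 1) - 0.5
    return q * (r[1:] - r[:-1])    # TPDF, rising ~6dB/octave towards Nyquist

def dither_to_16(x):
    q = 1.0 / 32768.0
    return np.round((x + hp_tpdf_dither(len(x))) / q) * q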

btw, the most annoying result of all is to have dither noise which is partially encoded, because it just "tickles" the ATH curve in the encoder. That sounds really annoying, if you turn the volume up loud enough to be able to hear it. It's a great potential reason to avoid ATH-shaped dither if you intend to mp3 encode something - but it's also a reason to avoid dither altogether. I came across it in my mp3 decoder tests, where I stumbled onto some of these issues and tried to examine them in detail...

http://mp3decoders.mp3-tech.org/lsb.html (http://mp3decoders.mp3-tech.org/lsb.html)
http://mp3decoders.mp3-tech.org/24bit.html (http://mp3decoders.mp3-tech.org/24bit.html)
http://mp3decoders.mp3-tech.org/24bit2.html (http://mp3decoders.mp3-tech.org/24bit2.html)

Hope this helps.

Cheers,
David.
Title: Noise-shaping curves and downstream lossy compression
Post by: SebastianG on 2008-06-04 13:01:58
This all suggests to me that the smart thing to do is to feed the highest resolution you have into the lossy encoder.

Thank you 2B for the support.

I have to comment on one thing, though:
[...] or use something like UV22 which adds extra noise only at such high frequencies

UV22 dither has absolutely no advantage over noise shaping. The UV22 dither signal may not contain anything below 20 kHz, but quantization adds white noise with a rectangular PDF. So the overall noise you get is really UV22 + white RPDF noise.

Cheers,
SG
Title: Noise-shaping curves and downstream lossy compression
Post by: 2Bdecided on 2008-06-04 14:02:58
In-band it's supposedly a few dB lower than dither without noise shaping.

More importantly, you don't start loading sfb21 with noise, as you would with noise shaping.


Of course 24-bit is a better choice - but sometimes people want to make CDs, knowing that they can be converted to mp3. (More often I suspect the opposite is true!).

Cheers,
David.
Title: Re: Noise-shaping curves and downstream lossy compression
Post by: Kraeved on 2024-02-27 15:09:17
A reader of this topic may have a question: how do I dither without noise shaping by means of foobar2000, since its native method uses noise shaping? There are a couple of DSPs with TPDF dither and an optional high-pass filter:

* Smart Dither (https://foobar.hyv.fi/?view=foo_dsp_dither), which leaves digital silence untouched,

(https://i5.imageban.ru/out/2024/02/27/c16fc4ff703c92e6ba9685a482a25aba.png)

* mda Dither (https://foobar.hyv.fi/?view=foo_dsp_mdadither), which replicates the DAW plugin (https://patchstorage.com/mda-dither/) by Paul Kellett (by default it uses its 'second-order noise shaped' mode).

(https://i1.imageban.ru/out/2024/02/27/a4ce4e26f0654b523c24c0c74c36eda1.png)