
16bit to 8bit optimal quantization

Hi everybody!

I designed a new filter to reduce quantization noise and I'd like to compare it to other algorithms out there, specifically for 16-bit to 8-bit conversion. My filter consists of a simple correction of the original wave samples before quantization. It's designed to work off-line, but I think I could make a buffered realtime version of it too. It might also work in other ranges, like 24-bit to 16-bit, but I haven't implemented that. Here's an example:

The original sample (its narrow-band spectrum makes it especially sensitive to quantization noise):

http://www.synchrnzr.com/research/original-sample.wav

Here's what we get from a standard quantization, for example Audacity's:

http://www.synchrnzr.com/research/standard-conversion.wav

Here's what I get using my method:

http://www.synchrnzr.com/research/sync-conversion.wav

The quantization noise floor is reduced from -46.963 dB to -53.227 dB in this example. Because the noise floors are so high, the 6.264 dB difference is easily noticeable. I've been working on audio for many mobile phone games and I'm a bit tired of using crappy 8-bit samples. I haven't found any other method or software that does a better conversion, which is why I finally made my own algorithm.

I don't think that comparing it with UV22 or IDR was a great idea, because dithering isn't the point here; it's all about noise reduction (dithering could be added after the noise reduction). However, I might try to implement the 24-bit to 16-bit conversion and compare the results anyway. They might be as useless as they are interesting, I guess.

sync

16bit to 8bit optimal quantization

Reply #1
Can you explain the theory behind it somewhere? There are some DSP experts around here who may give you good advice on polishing your algorithm.

16bit to 8bit optimal quantization

Reply #2
Hi!

I designed a new filter to reduce quantization noise

Do you mean "filter" as in "linear & time-invariant system"?

and I'd like to compare it to other algorithms out there, specifically in the 16 to 8 bit field.

I used to offer a tool for this (the link is dead now) that allowed you to tweak many parameters, including the noise shaping filter itself.

My filter consists on a simple correction of the original wave samples before quantization.

Can you be more specific?

Quantization noise floor is reduced from -46.963dB to -53.227dB in this example.

How do you measure this? What Audacity settings did you use? At what sampling rate do you operate? (Sorry, I can't test your WAV files atm.)

[...] because dithering isn't the point here, it's all about noise reduction (dithering might be added after noise reduction)

What's your definition of noise reduction in this case?

edit: I listened to the files. The 8 bit versions contain harmonic distortions. Here's my attempt ("0.6 bits" of rectangular dithering and a psychoacoustic noise shaper).

Cheers,
SG

16bit to 8bit optimal quantization

Reply #3
Hi!

Your sample sounds very good! Thanks for your interest.

About your questions:

- It's a time-invariant, discrete-time linear filter (FIR)

- No problem, I've compared your sample instead

- This filter adjusts the bias of the input samples before quantization to minimize the output error. The added noise sums to zero in each segment, and it's far too low in frequency and level to be audible overall. I'm still experimenting to get better results, so I can't provide a completely defined algorithm yet. I've still got many ideas running through my head that I want to test...

- To measure the noise I'm taking the RMS of the difference between the source and output signals:

Raw quantization noise : output - input = noise
Processed quantization noise: (output + correction) - (input + correction) = noise'

I'm subtracting the waves in Audacity to get the noise signals and then using my own software to get the RMS power of the noise. It's important to note that the resulting wave must be compared to the 16-bit corrected wave, not the original one. (A minimal code sketch of this measurement follows after these points.)

- As you can guess from the above, I'm talking about the numerical difference between the input and output signals (which also affects audible noise in other ways). I don't care about psychoacoustics right now, only numbers.

In your sample, dithering raises the noise floor to -37.813 dB. Though it sounds way better than my examples and is more pleasant to the ear, the noise is clearly higher (obviously, adding noise to the signal raises the noise floor!). I guess a corrected wave might need less dithering noise than the original one to avoid quantization artifacts, so the overall quality of the sound might be slightly improved. I'd need to check this to be sure, though.

If you wonder why I'm worrying about numbers instead of sound, it's just because I'm mostly thinking of samples for a sampler/mixer/tracker. Dithering might be better applied to the final output. Applying it to the individual samples would make noise appear and disappear each time a sample starts and stops playing.

sync

16bit to 8bit optimal quantization

Reply #4
Perhaps you could be even more specific.

As you can guess from the above, I'm talking about the numerical difference between the input and output signals (which also affects audible noise in other ways). I don't care about psychoacoustics right now, only numbers.

That is why you fail -- just kidding
Seriously, the harmonic distortions are a bit annoying. Try to get rid of them.

In your sample, dithering raises the noise floor to -37.813 dB. Though it sounds way better than my examples and is more pleasant to the ear, the noise is clearly higher (obviously, adding noise to the signal raises the noise floor!)

The important thing to note here is that by adding a bit of white noise prior to quantization you increase the sum of squared input-output differences (total power), but you also usually decrease the spectral peaks of the noise and flatten out the noise floor (i.e. you reduce harmonic distortion). FYI: the noise floor of "-37 dB" isn't due to dithering; it's primarily the effect of noise shaping. It also depends on how you measure the noise power -- specifically, what kind of frequency weighting is involved.

I guess that a corrected wave might need less dithering noise than the original one to avoid quantization artifacts [...]

Since you didn't explain what you're actually doing I can't comment on it.

If you wonder why I'm worrying about numbers instead of sound, it's just because I'm mostly thinking of samples for a sampler/mixer/tracker.

Well, this is not an argument against dithering, but it is an argument against some types of noise shaping. Try playing back my version at a lower speed (like half speed); you'll hear a lot of high-frequency noise.

Why do you want to use 8-bit samples for this? Doesn't the sampler/mixer/tracker support anything else? I guess this is about saving space. What kind of space? Disk space? RAM? What's your platform? Do you only use this sampler/mixer/tracker, or are you coding it yourself? The more details you give, the better the answers might be.

Cheers,
SG

16bit to 8bit optimal quantization

Reply #5
Totally agree. Harmonic distortion is certainly an annoying artifact in my samples. But as I mentioned, I'm focusing on numerical accuracy for now (therefore, I'm using a flat frequency weighting).

Thanks for the explanation. I was wondering how a simple 0.6-bit dither could add so much noise to the signal. I only know the basics of noise shaping; I haven't got much experience in this field.

About the last questions... I don't really like dealing with 8-bit samples, but most mobile phone games have a very small amount of memory available for sample data (most of them don't even use any kind of digital sound, just MIDI songs and effects). Also, many old units might not even support 16-bit sound. On the other hand, I also made a sound engine that is used in some games by Digital Legends. It supports 8-bit as well as 16-bit samples. However, I can't use compressed samples, and the problem now is space. This is one of the games in which I'm using samples:

http://www.synchrnzr.com/gallery.onesequel.html

However, most of the samples used here were 16-bit. I used 8-bit samples for the overdriven guitars, whose signal power is very high, and also converted other samples with well-balanced harmonic content, where the noise isn't very noticeable. I mentioned it somewhere in the making-of.

I'm currently working on another title which requires MP3-quality music, but there isn't much space available, neither in Flash nor in RAM. So I've been forced to convert many samples to 8-bit, and they didn't sound very good. I was thinking of an easy way to reduce that noise and I came up with this bias compensation method... while taking a shower. I tried dithering before, but as I said, the result isn't good. Mixing is done in 32-bit and quantized to 16-bit; the dithering noise of the 8-bit samples is clearly audible, and the noise "follows" each 8-bit sample, which isn't very pleasant. I've tried this new approach and I've finally converted some samples to 8-bit without adding much noise to the result. However, I won't do it with bass samples like this kickdrum, because the noise is still too high compared to the signal.

It's not that a mobile phone game ever sounds that good, but I've always liked research. Also, this kind of noise reduction might be used with higher bit-depth samples, so it might be useful in other fields. Reducing the noise floor by 6 dB also means getting extra detail from the original signal without adding any kind of noise. Dithering is used to recover extra detail too, but by adding noise to the signal, and that's the part I don't really like. I know I can't get rid of it entirely, but I hope that after correcting the sample values to minimize the quantization error, less noise will be needed to flatten the noise floor.

Now that I think of it, I should be able to compute the optimal dither amplitude, which should match the highest deviation, i.e. the maximum amplitude of the quantization noise. Before applying the correction the deviation is even higher, which might explain why a corrected sample would need less dithering noise to flatten the noise floor. This is turning out to be quite interesting.

I've got a lot of work these days, but I'll test everything and show the results whenever I can.

sync

16bit to 8bit optimal quantization

Reply #6
Assuming you're in charge of the software sampler, you could create a new, simple-to-handle sample format that
  • uses subtractive dithering. It won't increase the noise power at all; it'll just flatten the noise floor and avoid harmonic distortion.
  • uses per-block normalizing: normalize a block of, say, 256 samples prior to quantization to 8 bits and remember the inverse gain factors. Restoring the 16-bit samples on-the-fly during mixing is fairly easy: you just combine the "compressed" 8-bit samples with this "volume envelope". Apart from saving disk space and RAM, it'll keep the noise floor low during quiet parts of the sample. (A rough sketch of this follows below.)

You could go even further into "psychoacoustic land" without any effect on the decoding complexity, and/or use some simple forms of linear prediction for redundancy removal. But that might just be overkill.

Cheers,
SG

16bit to 8bit optimal quantization

Reply #7
I thought of using block normalizing and envelopes, but it doesn't help with bass samples like this one, as the noise is still quite audible during the loud parts too, though it would save space. I haven't tried subtractive dithering, but it sounds useful. I'll take a look at it; thanks for pointing it out.

I also thought about using an LP (linear prediction) format to reduce sample size. Beyond that, I'd prefer to use a good codec like Tremor and get rid of realtime-mixed music altogether. I don't think it would take that much CPU time, but I'm not allowed to do that for now. In any case, engine development is stopped at the moment, so I won't be adding any of these features soon...

On the other hand, those things would only affect games using my engine. I'm also providing music and FX for many J2ME projects that only support simple WAV sample playback, which is why I was trying to find a general solution. The noise is very noticeable here too, because the samples are mainly played over clean synthesized MIDI music, so I've got the same problem with dithering.

sync

 

16bit to 8bit optimal quantization

Reply #8
What matters is what it sounds like, not what it measures like. SebG's suggestion is the way to go - aggressive noise shaping with some dither - vary the dither to trade noise for distortion.

However, if you add lots of these samples together, or if the amplifier is weak and distortion-prone (as in a mobile phone), you might get clipping or audible distortion - all that high frequency noise has to be amplified and output, even if it's inaudible! This will limit how far you can go with the noise shaping.

A-law/u-law would help greatly. 10- or 12-bit samples are useful too, if available (I assume not).

Cheers,
David.

16bit to 8bit optimal quantization

Reply #9
Another option (I don't mean to drag this completely off topic) might be to use ADPCM. Yeah, I know even with 4-bit ADPCM there's a fair whack of noise in it, but maybe it would be suitable for the environment you're targeting. In terms of cost there's pretty much bugger-all cost for ADPCM. And of course there are also higher compression ratios available on material that isn't going to suffer too much from some additional noise.

And without going completely bonkers, you could probably do some multi-band ADPCM stuff, or even go the whole hog and implement either an MDCT or a PQMF. Assuming you don't want to change the pitch of this stuff, you could do your entire audio pipeline in the frequency domain with one single common transform at the end, and as a bonus you gain access to lots of other stuff that would be very expensive in the time domain. That way you'd have access to higher compression ratios with not much (if not actually less) additional processing overhead. Of course, not using some expensive entropy decoder can make the decode stage blindingly fast.
I mean really very fast; for example, something like this to decode 12 samples with range -1..1:
[blockquote]// Read a 21-bit packed field and split it into three 7-bit codebook indices,
// then expand each index into four quantized values via a lookup table.
int x = pBits->Read(21);
int c0 = x & 127;           // index for band samples 0-3
int c1 = (x >> 7) & 127;    // index for band samples 4-7
int c2 = (x >> 14) & 127;   // index for band samples 8-11
pQ[iBand + 0] = DecodeTable_3[c0][0];
pQ[iBand + 1] = DecodeTable_3[c0][1];
pQ[iBand + 2] = DecodeTable_3[c0][2];
pQ[iBand + 3] = DecodeTable_3[c0][3];
pQ[iBand + 4] = DecodeTable_3[c1][0];
pQ[iBand + 5] = DecodeTable_3[c1][1];
pQ[iBand + 6] = DecodeTable_3[c1][2];
pQ[iBand + 7] = DecodeTable_3[c1][3];
pQ[iBand + 8] = DecodeTable_3[c2][0];
pQ[iBand + 9] = DecodeTable_3[c2][1];
pQ[iBand + 10] = DecodeTable_3[c2][2];
pQ[iBand + 11] = DecodeTable_3[c2][3];
[/blockquote]
Yeah, there'd still be a per-stream application of scale factors afterwards, but that's trivial and easily applied when performing the sum into the common transform input.

Or maybe (and I've been thinking about this myself for some really cycle-starved machines where I can't do the above) use some bit reduction method based on the LossyWav stuff and simply store each block with however many bits are actually required. That's assuming I've understood what's going on in the LossyWav pre-processing; I've not been following its development that closely.

Anyway..

16bit to 8bit optimal quantization

Reply #10
but it doesn't help with bass samples like this one

Have you thought about halving the sampling rate and using 16 bits instead? You don't really need a sampling frequency of 44100 Hz for this.

aggressive noise shaping with some dither

His use case is to quantize samples that may be played back at different speeds, so using an ATH-based noise shaper like I did is probably a bad idea. That was before I knew what the sample would be used for. But a noise shaper that tries to hide the noise "behind the signal" should do the trick.

With the coding tools mentioned (ADPCM-like coding with nice noise shaping and subtractive dithering) I'm confident that you can already get good sound at really low rates. Decoding simplicity is retained, so you can do the decoding on-the-fly during mixing.

Cheers,
SG

16bit to 8bit optimal quantization

Reply #11
Hi people! Thanks for your answers!

However, we're losing focus; the point was just getting a better result when converting raw WAV samples to lower bit depths. If I could use u-law/a-law or any other format I'd do it. I could use them with my engine, but as I said before, most J2ME phones only support raw WAV files (some phones support other formats, but they aren't usually used, for compatibility with older units).

So this thread is about converting an n-bit linear PCM sample to an n'-bit linear PCM sample, where n' < n.

Have you thought about halving the sampling rate and using 16 bits instead? You don't really need sampling frequency of 44100 for this.


Yes, that's what I've been doing with kickdrums and basses. Though the interpolation used is just linear (to save CPU time), it sounds good enough. Downsampling them to 22050 or even 11025 Hz helps a lot.

I haven't been able to check my latest theories; I've been too busy with other jobs. I'll report on everything as soon as I can get my hands on it.

sync

16bit to 8bit optimal quantization

Reply #12
[...] but as I said before, most J2ME phones only support raw WAV files (some phones support other formats, but they aren't usually used, for compatibility with older units).

It's not entirely impossible to add your own "sample import code", I suppose.

So this thread is about converting an n-bit linear PCM sample to an n'-bit linear PCM sample, where n' < n.

There's not much left to talk about, then.

Cheers,
SG

16bit to 8bit optimal quantization

Reply #13
I designed a new filter to reduce quantization noise and I'd like to compare it to other algorithms out there, specifically for 16-bit to 8-bit conversion. My filter consists of a simple correction of the original wave samples before quantization. It's designed to work off-line, but I think I could make a buffered realtime version of it too. It might also work in other ranges, like 24-bit to 16-bit, but I haven't implemented that. Here's an example:


If you want good 8-bit quality, you should consider noise shaping. By carefully optimising several samples at a time, you'd be able to minimise the high-frequency error at the cost of more low-frequency error. The result would be poorer SNR, but better sound quality.

16bit to 8bit optimal quantization

Reply #14
[...] By carefully optimising several samples at a time [...]

I actually hadn't thought about the fact that this could be better than "just" filtering the noise. After some thinking I came to the conclusion that minimization of the perceptually weighted error signal in this context (linear scalar quantization) can be reduced to the lattice problem known as CVP (the closest vector problem). The "noise filtering" approach is just an approximation, so it's possible that an informed tree search over the next block of samples might improve the weighted SNR a little. I think I'm going to test this...

Cheers,
SG

16bit to 8bit optimal quantization

Reply #15
So you are using a small set of drum/instrument sounds that are triggered in a MIDI/tracker-like application for music on a mobile phone, where the playback engine/CPU pretty much forces you to use n'-bit PCM for delivery, mixed in a 16-bit sound system, while memory/bandwidth requirements force you to make n' smaller than your original source and smaller than what you would ideally like?


I guess one ideal would be rendering the entire tune as 16-bit audio, then properly dithering the mix to n-bit audio with or without noise shaping. But this can't be done, because the reason for using samples in the first place is space/bandwidth, and once the mix has been dithered you can no longer separate it into samples/note-ons, since the contribution from each sample is no longer time-invariant.

If a kick-drum can be used 100 times in a song, I am guessing that the benefit of any "large-scale" optimization of that single sample will be minuscule, due to the many different combinations of samples in the mix that will typically occur at different times?

What about polyphony? Is it possible to generate a set of low-level "noise" signals (one or two bits worth) that can be used as approximations to the ideal noise-shaped pattern that one would have used if every sample could be stored individually?

Are there no time-variant filters, amplitude envelopes or any of the other luxuries that often accompany wavetable playback? Can the sample be played back at different speeds at all?

If the playback engine is sufficiently limited and the cost of brute-force optimizing once is small enough, I guess you could make an algorithm that treats the wavetable capabilities of the cell phone as a resource (e.g. 32 channels of 8- to 16-bit sounds) and weighs the total waveform/MIDI storage cost against a perceptual model giving the cost of the error? Sounds like a supercomputing project to me, although having access to the source in its "synthetic form" must be an enormous advantage compared to blind encoding ;-)


If the reason for using samples is re-using audio information in a large number of tunes and/or having the music respond dynamically to stuff going on in the game, then what I have said is pretty much irrelevant.

-k

16bit to 8bit optimal quantization

Reply #16
I've been following this with some interest, but I have to admit much of it is over my head. My apologies if this is not helpful.

I was thinking it might be useful, instead of pushing the dither/quantisation noise into the high frequencies, to put the noise into frequency ranges that are masked by the signal. For a kick drum, for example, the dither/quantisation noise would go into the low-frequency range.

[Edit]
Nevermind. I tried adding LF noise to the original sample and then 'bitcrushing' it. Fail. Still a lot of HF quantisation noise/artifacts.

16bit to 8bit optimal quantization

Reply #17
But a noise shaper that tries to hide the noise "behind the signal" should do the trick.
Yes, if the dynamic noise shaped version of lossyWAV ever becomes a reality, that would be an ideal basis for this - just force the bits_to_remove to 8, and you'd get the "best" that could be done.

Cheers,
David.

16bit to 8bit optimal quantization

Reply #18
I designed a new filter to reduce quantization noise and I'd like to compare it to other algorithms out there, specifically for 16-bit to 8-bit conversion. My filter consists of a simple correction of the original wave samples before quantization. It's designed to work off-line, but I think I could make a buffered realtime version of it too. It might also work in other ranges, like 24-bit to 16-bit, but I haven't implemented that. Here's an example:
If you want good 8-bit quality, you should consider noise shaping. By carefully optimising several samples at a time, you'd be able to minimise the high-frequency error at the cost of more low-frequency error. The result would be poorer SNR, but better sound quality.


You may want to check out these two publications:

R. A. Wannamaker, "Psychoacoustically Optimal Noise Shaping", J. Audio Eng. Soc., vol. 40, no. 7/8, pp. 611-620, July 1992.

C. R. Helmrich, M. Holters, and U. Zölzer, "Improved Psychoacoustic Noise Shaping for Requantization of High-Resolution Digital Audio", Proc. 31st AES International Conference: New Directions in High-Resolution Audio, paper no. 27, June 2007.