Converting 24/96 to 16/44.1 with SoX

2011-02-27 00:58:14

To convert 24-bit/96kHz audio data to 16-bit/44.1kHz, I plan to use SoX...

sox input.wav -b 16 -r 44.1k output.wav

I have one concern, though...

Should I also use the dither option?

e.g. sox input.wav -b 16 -r 44.1k output.wav dither

Is that really necessary? And if so, why?

Thanks.

Reply #1 – 2011-02-27 01:12:19

Apply dither whenever reducing bit depth, to ameliorate the bad effects of quantization error.

Dither is irrelevant to sample rate changes; there it just adds a (very small) amount of unnecessary noise.

Reply #2 – 2011-02-27 01:19:39

So in this case I should apply dither, even having this small disadvantage of having some white noise addition...?

Reply #3 – 2011-02-27 08:10:09

It isn’t a disadvantage in any real sense. It will be impossible to hear unless
1) you concentrate on a segment with no audio signal, and
2) you have the amplifier volume turned so high that it would be painful if there were any significant audio signal present.

In addition, most audio applications can “shape” the dither so that it mostly falls at very high frequencies that the majority of people cannot hear.

When you reduce bit depth (from 24 to 16) there will be resulting distortion. This has two aspects, the change (error) in the waveform itself and an unpleasant addition to the sound because of the error. The waveform error is expressed in the output signal as noise, but noise related to (correlated to) the audio from which it is derived. This is a quality of noise that is generally found to be unpleasant.

Adding dither before reducing the bit depth randomizes the error. This completely eliminates the unpleasant sound aspect of the error. Instead it will result in a benign white noise kind of background sound. There is thus new noise from two sources in the bit depth reduced audio. The first is the dither noise added prior to bit reduction. The second is the error noise (quantization error) of the bit reduction process. Without the first, randomizing noise, the second noise adds an unpleasant aspect to the audio. With the dither, the total noise is much less objectionable than the quantization noise alone.

This is mathematically correct, and observable with the proper equipment. It is doctrinally correct in the church of quality audio: always add dither when reducing the bit depth. However, I challenge you or anyone else to be able to ABX any real 16 bit music, dithered against non-dithered -- unless perhaps some really bad software is used to do the bit reduction. Going to lower bit depths, such as to 8 bit, frequently makes the dither vs non-dither difference obvious and so this is the way it is demonstrated.
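
To make the "randomizes the error" point concrete, here is a minimal Python sketch (an illustration only, not SoX's code). It assumes a scale where 1.0 is one LSB of the target format, a hypothetical 1 kHz test tone peaking at 0.4 LSB, and TPDF dither built from two uniform random values. Without dither the tone rounds away to silence; with dither it survives underneath the noise.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1.0 = one LSB of the target format)."""
    return math.floor(x + 0.5)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 48000

# Hypothetical test tone: 1 kHz at 48 kHz, peaking at only 0.4 LSB.
signal = [0.4 * math.sin(2 * math.pi * 1000 * i / 48000) for i in range(n)]

# No dither: every sample is closer to 0 than to +/-1, so the tone vanishes.
plain = [quantize(s) for s in signal]

# TPDF dither: the difference of two uniform values has a triangular
# distribution spanning +/-1 LSB; it is added before rounding.
dithered = [quantize(s + rng.random() - rng.random()) for s in signal]

def projection(out, ref):
    """How much of 'ref' is present in 'out' (1.0 = fully preserved)."""
    return sum(o * r for o, r in zip(out, ref)) / sum(r * r for r in ref)

print(set(plain))                      # only the value 0 remains
print(projection(dithered, signal))    # close to 1: the tone is still there
```

The dithered output is noisier sample by sample, but on average it tracks the input, which is why the error no longer sounds like distortion tied to the music.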

Reply #4 – 2011-02-27 09:28:42

Quote from: krafty on 2011-02-27 00:58:14

To convert 24-bit/96kHz audio data to 16-bit/44.1kHz, I plan to use SoX...

sox input.wav -b 16 -r 44.1k output.wav

SoX automatically dithers when it is appropriate to do so. So you don't need to specify the dither option unless you want to select a noise-shaping algorithm to use, e.g.

sox input.wav -b 16 -r 44.1k output.wav dither -f shibata

But unless you're 100% sure that you need to use a specific algorithm for the type of audio you have, just let SoX 'do the right thing' per your command above.

Cheers,
Rob

Reply #5 – 2011-02-27 18:14:50

Quote from: bandpass on 2011-02-27 09:28:42

SoX automatically dithers when it is appropriate to do so. So you don't need to specify the dither option unless you want to select a noise-shaping algorithm to use, e.g.

sox input.wav -b 16 -r 44.1k output.wav dither -f shibata

The latest SoX even has an -a option for dither, so it doesn't unnecessarily add dither noise to silence.

sox input.wav -b 16 -r 44.1k output.wav dither -a -f shibata

Reply #6 – 2011-04-15 18:14:07

I have converted a 24/96 file.
One option omitting "dither".
The other option using "dither" at the end of the command line.

My question is... if using the "dither" option, the man page says that it will use TPDF by default. According to bandpass, SoX will apply dither automatically if you omit the dither option. How come the two files are not bit-identical, then?

Without dither option:

sox -S stay.wav -b 16 -r 44.1k stay-reduced.wav

Input File     : 'stay.wav'
Channels       : 2
Sample Rate    : 96000
Precision      : 24-bit
Duration       : 00:43:39.98 = 251518062 samples ~ 196498 CDDA sectors
File Size      : 1.51G
Bit Rate       : 4.61M
Sample Encoding: 24-bit Signed Integer PCM

In:100%  00:43:39.98 [00:00:00.00] Out:116M  [      |      ] Hd:5.5 Clip:0    
Done.

With dither option:

sox -S stay.wav -b 16 -r 44.1k stay-reduced-dither.wav dither

Input File     : 'stay.wav'
Channels       : 2
Sample Rate    : 96000
Precision      : 24-bit
Duration       : 00:43:39.98 = 251518062 samples ~ 196498 CDDA sectors
File Size      : 1.51G
Bit Rate       : 4.61M
Sample Encoding: 24-bit Signed Integer PCM

In:100%  00:43:39.98 [00:00:00.00] Out:116M  [      |      ] Hd:5.5 Clip:0    
Done.

The two files are not bit-identical:

[krafty@linuxstation Process]$ cmp -b stay-reduced.wav stay-reduced-dither.wav
stay-reduced.wav stay-reduced-dither.wav differ: byte 55, line 1 is 101 A 100 @
[krafty@linuxstation Process]$

foobar2000 bit-compare tool:

Differences found in 1 out of 1 track pairs.

Comparing:
"H:\dld\Stay On These Roads\Process\stay-reduced-dither.wav"
"H:\dld\Stay On These Roads\Process\stay-reduced.wav"
Differences found: 103994569 sample(s), starting at 0.0000454 second(s), peak: 0.0000610 at 0.0005442 second(s), 2ch

Reply #7 – 2011-04-15 18:36:33

When you dither you are adding pseudo-random noise. Unless the base value for the noise generator is set to be the same starting value each time, the added values will not match.
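
A small Python sketch of that seeding behaviour, for illustration only: `tpdf_dither` and the seed values are made up here and say nothing about how SoX generates its noise internally.

```python
import random

def tpdf_dither(n, seed):
    """Return n TPDF dither values (in LSB) from a generator started at 'seed'."""
    rng = random.Random(seed)
    return [rng.random() - rng.random() for _ in range(n)]

run_a = tpdf_dither(8, seed=1)
run_b = tpdf_dither(8, seed=1)   # same starting value
run_c = tpdf_dither(8, seed=2)   # different starting value

print(run_a == run_b)   # identical noise, so identical output files
print(run_a == run_c)   # different noise, so the files differ in the low bits
```

Two conversions that start the generator from different values add different noise, so their outputs cannot be expected to compare bit-identical even though both are dithered "the same way".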

Reply #8 – 2011-04-15 18:42:44

You mean, an offset is created? But the files are pretty much the same, aren't they?
Should I continue to use the dither switch or just leave it alone for SoX...?
What I want to know is.. there's nothing really happening using dither option or not... SoX is applying dither without the switch 'dither'...?

Reply #9 – 2011-04-15 18:51:26

Properly applied, dither only affects the LSB of the result, so the effect is extremely small. Even though the change is small, the result is not necessarily bit-identical every time you apply it.

Without dithering there is a chance that you could hear some quantization distortion. With dithering this is replaced by some added high-frequency noise, which is either inaudible or much less annoying than the distortion that it replaces.
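
To put a number on "extremely small", the peak difference from the foobar2000 report earlier in the thread (0.0000610) can be converted to 16-bit steps. This is a rough worked example in Python; it assumes the comparison tool reports differences on a scale where full scale is 1.0.

```python
import math

peak_diff = 0.0000610        # peak difference from the bit-compare report
lsb_16 = 1 / 32768           # one 16-bit step, with full scale taken as 1.0

diff_in_lsb = peak_diff / lsb_16
diff_in_dbfs = 20 * math.log10(peak_diff)

print(diff_in_lsb)    # roughly 2 LSB
print(diff_in_dbfs)   # roughly -84 dBFS
```

A worst-case difference of about 2 LSB is what you would expect when two files each carry their own independent dither of up to about 1 LSB.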

Reply #10 – 2011-04-15 19:55:43

Any time one reduces bit depth there is an error for each sample, the quantization error. This can be considered as distortion; it is experienced as noise. Going from 24 bit to 16 bit that error is very small and the distortion/noise is extremely unlikely to be heard in any real music. It can be heard with very low signal level test tones. When reducing from 16 or 24 bit to 8 bit the error is much larger and more readily noticed.

The trend of that error is exactly in step with the signal and thus makes a noticeable change in it (under the conditions where it can be heard at all). This is, conceptually, a slightly different aspect of distortion, the sound of the error correlated with the signal. Generally it is experienced as something unpleasant.

Adding dither randomizes the error. This eliminates that correlated aspect of distortion, producing a white noise sort of background that is neutral rather than unpleasant. This is why dithering is considered better than non dithering, the sound is different and “better.” Additionally, if reasonable noise shaping is used, most of the noise is at frequencies few people can hear under any condition.

Since this distortion isn’t audible to begin with, in any real music at usable listening levels, the rule that dither must be used is more a matter of doctrine than of function.

Dither is random noise added to the signal before reducing the bit depth. Many sources (e.g. cassettes and LPs) already have considerable noise such as tape hiss. Even the best live recordings get some noise from the equipment, especially microphone preamplifiers. This might not make the best dither but it acts in the same way, to largely decorrelate the quantization error from the signal.

I’ve occasionally been asking anyone who believes dither is always necessary to submit any samples that can be discriminated in blind testing. Perhaps a few people believe they can find something, but I have yet to hear about any that are audible to very many people.

Reply #11 – 2011-04-16 09:12:45

Quote from: krafty on 2011-04-15 18:42:44

You mean, an offset is created? But the files are pretty much the same, aren't they?
Should I continue to use the dither switch or just leave it alone for SoX...?
What I want to know is.. there's nothing really happening using dither option or not... SoX is applying dither without the switch 'dither'...?

Since it is generally accepted that dither should be applied in this case, SoX invokes dither for you automatically—this is 'fail safe' operation. If, for some reason, you're sure that you don't want dither, then use 'sox -D'. Explicit invocation of dither is useful if you want to select a particular dither algorithm. Bottom line:

- usually, just let SoX choose when (and how) to dither
- if you're curious, use the -V option to see when this is
- if you want repeatability (running the same conversion twice should produce identical files), use the -R option to prevent reseeding the random numbers that are used in dither

Reply #12 – 2011-04-16 21:26:33

bandpass and others, thanks a lot, now I understand.

Reply #13 – 2011-04-16 22:32:57

I also tackled exactly this problem with SoX some time ago. Not everything was clear to me, so thanks to Andy, bandpass & others for the good explanations.

Reply #14 – 2011-04-17 16:17:15

In:100%  00:39:09.00 [00:00:00.00] Out:104M  [      |      ] Hd:1.2 Clip:10   
sox WARN rate: rate clipped 6 samples; decrease volume?
sox WARN dither: dither clipped 4 samples; decrease volume?
Done.

In this case, what did sox just do, did it decrease the volume automatically? By how much? A little tiny amount? I can't see differences of volume between the two waveforms.

Reply #15 – 2011-04-17 17:03:21

No, it doesn't decrease the volume automatically as this takes a little longer and is not always what you want to do. It can be enabled though with the -G (guard) option; with -V it will tell you how many dBs of attenuation had to be applied to prevent clipping (dB "not reclaimed").

OTOH, if the number of samples clipped is small (as in your example) you might decide not to worry about it (perhaps based on visual/aural inspection of the clipped area).

Reply #16 – 2011-04-17 17:20:01

I see. I used this option and it returned...

sox INFO sox: effects chain: input      96000Hz 2 channels
sox INFO sox: effects chain: gain       96000Hz 2 channels
sox INFO sox: effects chain: rate       44100Hz 2 channels
sox INFO sox: effects chain: gain       44100Hz 2 channels
sox INFO sox: effects chain: dither     44100Hz 2 channels
sox INFO sox: effects chain: output     44100Hz 2 channels
In:100%  00:39:09.00 [00:00:00.00] Out:0     [      |      ]        Clip:0
sox INFO gain: 0.0828dB not reclaimed
In:100%  00:39:09.00 [00:00:00.00] Out:104M  [      |      ] Hd:1.3 Clip:0

What does "not reclaimed" mean?
Why does the "Out" field say 104M when the actual file is 396MB?

Reply #17 – 2011-04-17 17:32:55

As above, it's the amount of attenuation that had to be applied to prevent clipping i.e. in this case, very little.

IIRC, the "out" count is in samples; file size is roughly that number multiplied by the number of channels (stereo = 2) and the sample size (16 bits = 2 bytes).

Edit: also sox reports real megs, computers report computer megs which are a bit bigger.
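
The arithmetic can be checked with a few lines of Python. This is only a sanity check: the duration is taken from the progress line above (00:39:09.00), the output is assumed to be 44.1kHz 16-bit stereo, and the WAV header is ignored.

```python
def pcm_bytes(samples_per_channel, channels=2, bytes_per_sample=2):
    """Size of the raw PCM data, ignoring the small WAV header."""
    return samples_per_channel * channels * bytes_per_sample

duration_s = 39 * 60 + 9.00            # 00:39:09.00 from the progress line
samples = round(duration_s * 44100)    # samples per channel after resampling
size = pcm_bytes(samples)

print(samples)           # 103590900, which rounds to the "104M" in the Out field
print(size / 1e6)        # about 414 in decimal megabytes (10**6 bytes)
print(size / 2**20)      # about 395 in binary megabytes (2**20 bytes)
```

So the Out figure counts samples per channel, and the binary-megabyte figure is close to the 396MB that the file manager reports.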

Reply #18 – 2011-04-17 18:34:22

.... define "real" in relation to "megs" in this context, please.

Reply #19 – 2011-04-17 20:28:31

The real-world, standardized meaning of mega, as used by engineers of virtually all disciplines, i.e. 10⁶; as opposed to the crazy world of computer jargon where it sometimes means 10⁶, sometimes 2²⁰, maybe even 1000×2¹⁰, and you're never quite sure which.

Reply #20 – 2011-04-17 22:45:04

Does "not reclaimed" mean that there will be a volume adjustment, without "restoring" the lost peak? I have seen that after using this option, at least 1 peak was still clipping or "cut".

Reply #21 – 2011-04-17 23:54:05

SoX sample rate change does not perform any individual-peak processing. Peaks may disappear, appear lower, or even higher, due to the removal of the high frequencies in the signal that cannot be retained at the lower sample rate, and due to lower frequencies being sampled at fewer, and different points in time.

Without the -G option there is the possibility that an increased peak will be clipped; with the -G option, SoX guarantees that no peaks will be lost due to clipping at the possible expense of a usually small attenuation across the whole file. The mechanism by which it does this is, roughly speaking, to apply a 3dB attenuation prior to the rate change, and a gain stage after the rate change that "reclaims" as much of the 3dB as it can without clipping the signal; hence, anything that can't be reclaimed by the gain stage is a net attenuation i.e. volume reduction. In this instance, it is 0.08dB, which is imperceptible.
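
The mechanism described above can be sketched in Python roughly as follows. This is a toy model of the idea, not SoX's implementation: `guarded` and `toy_process` are hypothetical names, and the "rate change" is replaced by a stand-in that simply raises every sample by 1%.

```python
import math

def guarded(samples, process, headroom_db=3.0):
    """Attenuate, process, then reclaim as much gain as possible without clipping."""
    pre = 10 ** (-headroom_db / 20)              # e.g. -3 dB before processing
    out = process([s * pre for s in samples])
    peak = max(abs(s) for s in out)
    # Reclaim at most the full pre-attenuation, but never push the peak past 1.0.
    post = min(1 / pre, 1 / peak) if peak > 0 else 1 / pre
    out = [s * post for s in out]
    not_reclaimed_db = -20 * math.log10(pre * post)   # net attenuation left over
    return out, not_reclaimed_db

def toy_process(xs):
    """Stand-in for a rate change that happens to raise peaks by 1%."""
    return [x * 1.01 for x in xs]

full_scale_input = [0.0, 0.5, 1.0, -1.0, -0.5]
result, lost_db = guarded(full_scale_input, toy_process)

print(max(abs(v) for v in result))   # peak sits at (about) full scale, not above
print(lost_db)                       # a small net attenuation, under 0.1 dB
```

If the processing had not raised any peak, the whole 3dB would be reclaimed and the "not reclaimed" figure would be zero.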

Reply #22 – 2011-04-18 00:32:10

So SoX considers that amount so small it doesn't reclaim it, but if it were to lower the volume by 4dB, it would claim back those 4dB after processing the gain... interesting!!! I really like this tool! In the past I used Secret Rabbit Code (libsamplerate) + foobar2000 to do those conversions, but now I'm glad there is SoX.

Reply #23 – 2011-04-18 00:56:17

Does this 24/96 original file happen to be a vinyl rip by any chance? If that's so, no dither is even necessary, as the analog noise floor is well above the low bit of a 16-bit encode. No point in adding noise when there's sufficient noise anyhow.
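
As a rough back-of-the-envelope check in Python: the 16-bit figure comes from the standard 6.02N + 1.76 dB rule for a full-scale sine, while the vinyl noise floor of -65 dBFS is purely an assumed, illustrative number (a real rip should be measured).

```python
bits = 16

# Quantization noise relative to a full-scale sine: -(6.02 * N + 1.76) dB.
quant_noise_dbfs = -(6.02 * bits + 1.76)

# ASSUMPTION: illustrative noise floor for a vinyl rip, not a measured value.
assumed_vinyl_noise_dbfs = -65.0

margin_db = assumed_vinyl_noise_dbfs - quant_noise_dbfs

print(quant_noise_dbfs)   # about -98 dB
print(margin_db)          # about 33 dB: surface noise far above the 16-bit floor
```

With that much existing noise, the source already acts as its own dither in the sense described earlier in the thread.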

Reply #24 – 2011-04-18 00:57:17

Yes, it's vinyl rip.
But as Andy-Ha said... always dither when going down in bit depth.
