Topic: Scaling down 96 kHz/24 bit to 44.1 kHz/16 bit?

An easy question. Is the software included with the Zoom H4 capable of downsampling from 96/24 to 44.1/16 at high quality? I haven't got the recorder yet, so the answer might be almost too trivial.

If not, are there any GPL programs that would do it at the highest possible perceptual quality? To make things more difficult, I use Mac OS X. However, installing GNU/Linux isn't a problem if such an app exists only there.


Reply #2
try SSRC - http://shibatch.sourceforge.net/

Awesome!

It compiled on my PowerBook without problems, and the dithering options were just what I was looking for.

Code:
powerbook% ./ssrc_hp 
Shibatch sampling rate converter version 1.30(high precision)

http://shibatch.sourceforge.net/

usage: ssrc [<options>] <source wav file> <destination wav file>
options : --rate <sampling rate>     output sample rate
          --att <attenuation(dB)>    attenuate signal
          --bits <number of bits>    output quantization bit length
          --tmpfile <file name>      specify temporal file
          --twopass                  two pass processing to avoid clipping
          --normalize                normalize the wave file
          --quiet                    nothing displayed except error
          --dither [<type>]          dithering
                                       0 : no dither
                                       1 : no noise shaping
                                       2 : triangular spectral shape
                                       3 : ATH based noise shaping
                                       4 : less dither amplitude than type 3
          --pdf <type> [<amp>]       select p.d.f. of noise
                                       0 : rectangular
                                       1 : triangular
                                       2 : Gaussian

Now I only have to figure out what those dithering and noise types mean in terms of perceived dynamic range and quality.

Reply #3
Just use --dither 3 --pdf 1
.halverhahn

Reply #4
Just use --dither 3 --pdf 1

Out of luck. 96 kHz -> 44.1 kHz is not supported:

Code:
powerbook% ./ssrc_hp --rate 44.1 --bits 16 --twopass --normalize --dither 3 --pdf 1 foo.wav bar.wav
Shibatch sampling rate converter version 1.30(high precision)

frequency : 96000 -> 44
attenuation : 0dB
bits per sample : 24 -> 16
nchannels : 2
length : 9967488 bytes, 17.3047 secs
dither type : ATH based noise shaping, triangular p.d.f, amp = 0.9

Warning: ATH based noise shaping for destination frequency 44Hz is not available, using triangular dither
Pass 1
Resampling from 96000Hz to 44Hz is not supported.
44/gcd(96000,44)=11 must be divided by 2 or 3.

And not much better for 96 kHz -> 48 kHz either:

Code:
powerbook% ./ssrc_hp --rate 48 --bits 16 --twopass --normalize --dither 3 --pdf 1 foo.wav bar.wav
Shibatch sampling rate converter version 1.30(high precision)

frequency : 96000 -> 48
attenuation : 0dB
bits per sample : 24 -> 16
nchannels : 2
length : 9967488 bytes, 17.3047 secs
dither type : ATH based noise shaping, triangular p.d.f, amp = 0.9

Warning: ATH based noise shaping for destination frequency 48Hz is not available, using triangular dither
Pass 1
ssrc_hp(1044) malloc: *** vm_allocate(size=2097156096) failed (error code=3)
ssrc_hp(1044) malloc: *** error: can't allocate region
ssrc_hp(1044) malloc: *** set a breakpoint in szone_error to debug
zsh: bus error  ./ssrc_hp --rate 48 --bits 16 --twopass --normalize --dither 3 --pdf 1


Oh well, at least the fast version seems to work:

Code:
powerbook% ./ssrc --rate 32 --bits 16 --twopass --dither 3 --pdf 1 foo.wav bar.wav
Shibatch sampling rate converter version 1.30

frequency : 96000 -> 32
attenuation : 0dB
bits per sample : 24 -> 16
nchannels : 2
length : 9967488 bytes, 17.3047 secs
dither type : ATH based noise shaping, triangular p.d.f, amp = 0.9

Warning: ATH based noise shaping for destination frequency 32Hz is not available, using triangular dither
Pass 1
100% processed, ETA =   0sec
peak : -41.6046dB

Pass 2
100% processed

I got my Zoom H4 only a couple of hours ago, so I'm just playing around and getting to know the equipment. That's why the peak level is so low.

Reply #5
--rate 44100

Reply #6
Well, the normal version seems to produce files of around 3 kilobytes, so it isn't working properly either. It only seemed to work because it didn't print any error messages. And yes, I used the CFLAGS += -DBIGENDIAN setting in the Makefile when compiling. Otherwise it would have ended with a segmentation fault right after printing the "Shibatch sampling rate converter version 1.30" line.

--rate 44100

Oh, it was that easy!

It works now, the file is large enough and even the high precision version works.

Code:
powerbook% ./ssrc_hp --rate 44100 --bits 16 --normalize --dither 3 --pdf 1 foo.wav bar.wav
Shibatch sampling rate converter version 1.30(high precision)

frequency : 96000 -> 44100
attenuation : 0dB
bits per sample : 24 -> 16
nchannels : 2
length : 9967488 bytes, 17.3047 secs
dither type : ATH based noise shaping, triangular p.d.f, amp = 0.9

Pass 1
100% processed, ETA =   0sec
peak : -1.005dB

Pass 2
100% processed, ETA =   0sec

Reply #7
In case you haven't already noticed: the rate is specified in Hz, not kHz.

edit: Oh, too late. I oughta press preview first

Cheers!
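
For anyone who hits the same units mix-up, the rule quoted in the error message can be checked with a few lines of Python. This is only a sketch of the rule as the SSRC 1.30 error text states it, not SSRC's actual code. The function name is made up, and treating a ratio of 1 as acceptable is an assumption based on the 32 Hz and 48 Hz runs, which did not print that error.

```python
from math import gcd

def ssrc_rate_check(src_hz: int, dst_hz: int) -> bool:
    """Hypothetical helper mirroring the rule in the SSRC error message:
    dst / gcd(src, dst) must be divisible by 2 or 3.
    Accepting a ratio of 1 is an assumption, inferred from the 32 Hz and
    48 Hz runs in this thread, which got past this check."""
    ratio = dst_hz // gcd(src_hz, dst_hz)
    return ratio == 1 or ratio % 2 == 0 or ratio % 3 == 0

# --rate 44 means 44 Hz: 44 / gcd(96000, 44) = 11, so it is rejected.
print(ssrc_rate_check(96000, 44))      # False
# --rate 44100 means 44.1 kHz: 44100 / gcd(96000, 44100) = 147 = 3 * 49.
print(ssrc_rate_check(96000, 44100))   # True
```

So 44.1 kHz passes the stated rule only once the rate is given in Hz.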

Reply #8
Experimenting with the noise pdf settings now.

Triangular is the cleanest, but it has the greatest amount of hiss. Gaussian makes less hiss, but replaces it with a smooth mixture of crackles. The rectangular setting makes fewer crackles and almost no hiss, but it's fragmented so roughly that it might be quite annoying, just like poorly implemented noise cancellation on a phone line.

Interesting tradeoffs. Hard to choose. Or then again, perhaps not.  The triangular noise setting makes the cleanest impression after all.

Reply #9
Uhhmm.... your target is 16/44. With ATH-based noise shaping you shouldn't hear the dither/quantization noise unless you turn the volume up really high.

Or maybe you're using a weird EQ setting for playback that boosts high frequencies too much. Since the ATH noise shaper pushes the quantization noise into the higher parts of the spectrum, it might be audible in your setup as bright hiss.

The "color" of the noise can be controlled via --dither. Crackling is usually the result of not having enough dither (--pdf, scale too low).

Just to be clear:
"--pdf" controls the dither signal
"--dither" controls the noise shaping filters
SSRC's author mixed up the terms dither and noise shaping when naming the options.

halverhahn's advice is a very reasonable choice of parameters.

edit: If you don't find a pleasing "--dither" setting you may want to design your own noise shaping filters. See http://www.hydrogenaudio.org/forums/index....showtopic=47980
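
To make the distinction concrete, here is a minimal Python sketch of the dither step alone, with no noise shaping at all. It is not how SSRC implements --pdf; the function names, the dither amplitudes and the Gaussian sigma are all assumptions picked for illustration.

```python
import random

def dither_sample(pdf: str, rng: random.Random) -> float:
    """Return one dither value in LSB units for the named p.d.f.
    The amplitudes are illustrative assumptions, not SSRC's values."""
    if pdf == "rectangular":            # uniform over +/- 0.5 LSB
        return rng.uniform(-0.5, 0.5)
    if pdf == "triangular":             # sum of two uniforms, +/- 1 LSB
        return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    if pdf == "gaussian":               # sigma of 0.5 LSB is arbitrary
        return rng.gauss(0.0, 0.5)
    raise ValueError(pdf)

def requantize(samples, bits, pdf="triangular", seed=0):
    """Map floats in [-1.0, 1.0) to signed integers of the given bit depth,
    adding dither before rounding. No noise shaping is applied here."""
    rng = random.Random(seed)
    scale = 2 ** (bits - 1)
    lo, hi = -scale, scale - 1
    out = []
    for x in samples:
        v = round(x * scale + dither_sample(pdf, rng))
        out.append(max(lo, min(hi, v)))   # clip to the legal range
    return out

print(requantize([0.0, 0.25, -0.5], 8, pdf="triangular"))
```

The p.d.f. only changes the statistics of the added noise. Where that noise ends up in the spectrum is a separate matter, decided by the shaping filter.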

Reply #10
Uhhmm.... your target is 16/44. With ATH-based noise shaping you shouldn't hear the dither/quantization noise unless you turn the volume up really high.

I listened to the cleanness of the different approaches using --rate 22050 --bits 8, which might explain it a bit.

edit: If you don't find a pleasing "--dither" setting you may want to design your own noise shaping filters. See http://www.hydrogenaudio.org/forums/index....showtopic=47980

Hmm. Do you mean that it can actually be embedded in the two dimensional const double shapercoefs[8][21] array within ssrc.c? The format looks almost the same, I wonder how that is done the right way.

I've already started to like ssrc_hp, so if I come up with something better I will definitely contribute it to the LGPL source code.

Reply #11
Hmm. Do you mean that it can actually be embedded in the two dimensional const double shapercoefs[8][21] array within ssrc.c? The format looks almost the same, I wonder how that is done the right way.

Either that (*) or you could just use the requant tool that's packaged with iiirdsgn (resample to 24/44 via ssrc and requantize to 16/44 via requant. It should accept input from stdin as well.)

(* SSRC only supports FIR noise shapers. So you can't use poles (those red crosses), i.e. the denominator must be 1. Then you can take the numerator coefficients (without the leading 1) and add these as a new filter in ssrc.c if you like.)

But this is really overkill. You should be fine with the standard SSRC noise shaping filters (I haven't actually checked their response, though.)

Cheers!
Sebastian
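
For the curious, an FIR noise shaper of this kind can be sketched as an error-feedback quantizer. The Python below is only an illustration under stated assumptions: the function name is made up, the dither is plain TPDF, and the sign convention for the coefficients may well differ from what ssrc.c expects, so check the source before copying taps across.

```python
import random

def shape_and_quantize(samples, coefs, seed=0):
    """Quantize samples (already scaled to LSB units) to integers using
    TPDF dither and an FIR error-feedback noise shaper.

    coefs are the numerator taps of the noise filter without the leading 1,
    so the noise transfer function is 1 + coefs[0]*z^-1 + coefs[1]*z^-2 + ...
    This sign convention is an assumption and may differ from ssrc.c."""
    rng = random.Random(seed)
    history = [0.0] * len(coefs)     # past errors, most recent first
    out = []
    for x in samples:
        # Add the filtered past errors to the input before quantizing.
        v = x + sum(c * e for c, e in zip(coefs, history))
        d = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        y = round(v + d)
        err = y - v                  # total error made on this sample
        if history:
            history = [err] + history[:-1]
        out.append(y)
    return out

# First-order example: noise transfer function 1 - z^-1 (a simple highpass).
print(shape_and_quantize([0.3] * 8, [-1.0]))
```

With the single tap -1.0 the low-frequency part of the error cancels from sample to sample, which is the basic idea behind pushing the noise towards frequencies where the ear is less sensitive.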

Reply #12
Uhhmm.... your target is 16/44. With ATH-based noise shaping you shouldn't hear the dither/quantization noise unless you turn the volume up really high.

I listened to the cleanness of the different approaches using --rate 22050 --bits 8 which might explain it a bit.

I'm not sure that I consider your method of testing to be valid. The part about going to 8 bits is probably OK since it should basically just make the dither more audible. However, changing the rate to 22050 shifts the dither to an entirely different region of the audio spectrum and makes it quite a bit different sounding than your real target rate of 44100. If you are planning on ending up at 22050 then by all means test it the way you did, but if you are going to 44100 then you should test it at 44100. Just my opinion anyway.

Reply #13
The part about going to 8 bits is probably OK since it should basically just make the dither more audible. However, changing the rate to 22050 shifts the dither to an entirely different region of the audio spectrum and makes it quite a bit different sounding than your real target rate of 44100.

I tested it again with --rate 44100 --bits 8 and yes, it was harder to hear the differences. But basically only the frequency of the hiss got higher, while the rest of the features remained. With both the Gaussian and rectangular settings, the grainy parts of the recording distorted its loud peaks.

Reply #14
But can you hear a difference with 16 bit? That's your target, and it's what is really important.

Reply #15
This could be considered analogous to listening to a low-bitrate MP3 in order to learn what defects to listen for before listening to it at the targeted bitrate. If the OP knows what a certain kind of dithering sounds like at a low bit depth, but can't hear it at normal bit depth, then he knows that it is working for him.

Reply #16
But can you hear a difference with 16 bit? That's your target, and it's what is really important.

Haven't tested that at all. My point was that the most pleasing artifacts are generated with the ATH setting; depending on the settings, they sound like something recorded on very old C-cassette tapes. So when I choose that one at a smaller granularity, i.e. higher resolution, I know that should the artifacts be audible, they'll be a lot more bearable to me than the rest.

By the way, what does the abbreviation ATH mean? All time high noise shaping? It takes the highest possible frequency and embeds extra noise to it?

Reply #17
ATH = absolute threshold of hearing

It should be noted, by the way (and someone correct me if I am wrong) that if there is even a small amount of high frequency noise already in the audio (small being on the order of one bit at the final resolution) then dithering is not only unneeded, it adds unnecessary noise to the audio, thus actually degrading it. Of course, if there is this much noise in your 24 bit data then what was the point of using 24 bit resolution.

And just out of curiosity, how listenable were your well dithered 8 bit files? There has been a lot of discussion about whether you can hear the difference between 16 bit audio and higher resolution, so I was just wondering what sort of a benchmark we get at 8 bits.

Reply #18
And just out of curiosity, how listenable were your well dithered 8 bit files? There has been a lot of discussion about whether you can hear the difference between 16 bit audio and higher resolution, so I was just wondering what sort of a benchmark we get at 8 bits.

At 44.1 kHz very much so, at 22.05 kHz not so much. Hear it yourself on my sample page.

Reply #19
Speaking of bit depth, I've found 12 bits to be quite sufficient for the vast majority of music, especially with dithering.
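
As a rough yardstick for comparing bit depths, the ratio between full scale and one quantization step works out to about 6 dB per bit. The Python sketch below just does that arithmetic; the function name is made up, and the figure is only a rule of thumb, since with proper dither signals smaller than one step can still come through.

```python
import math

def step_ratio_db(bits: int) -> float:
    """Ratio in dB between full scale and one quantization step."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 12, 16, 24):
    print(bits, "bits:", round(step_ratio_db(bits), 1), "dB")
```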

 

Reply #20
ATH = absolute threshold of hearing

Interesting. So it's a kind of lossy algorithm that meets the average ear? Can it be adjusted a bit lower within the source code so that it would meet those golden ears as well?

Reply #21
It should be noted, by the way (and someone correct me if I am wrong) that if there is even a small amount of high frequency noise already in the audio (small being on the order of one bit at the final resolution) then dithering is not only unneeded, it adds unnecessary noise to the audio, thus actually degrading it. Of course, if there is this much noise in your 24 bit data then what was the point of using 24 bit resolution.
Dithering is needed to avoid truncation distortion when reducing the wordlength (e.g. 48 to 24 bit). It's a choice: you either add (dither) noise or you get extra distortion (which is correlated to the signal). IMHO a dithered signal is closer to "lossless" than a truncated signal.
The "self dithering effect" is often used as an excuse for not using dither. Quote from Bruno Putzeys (ex. Philips engineer):
"There's no such thing as self dithering. There are no "natural" noise sources of which the fourier transform of the PDF has a string of zeros to coincide with the spikes of the fourier transform of a staircase."

If you're sure you don't want to use dither, that's fine. But please don't blame dithering for "adding unnecessary noise" since it's either noise or distortion.
There ain't no such thing as a free lunch  .

Reply #22
ATH = absolute threshold of hearing

Interesting. So it's a kind of lossy algorithm that meets the average ear? Can it be adjusted a bit lower within the source code so that it would meet those golden ears as well?


Hmm, I don't think "lossy algorithm" is quite right.

Dither is essential to avoid truncation distortion when reducing bit-depth or doing any other mathematical function on fixed-point audio (including down-sampling without a reduction of bit-depth), and ATH noise shaping is a way of making the perceived noise about 12 to 18 dB quieter than flat dither.

Dither doesn't cause any loss of signal, in fact sufficient dither prevents the loss (and the distortion products) that truncation distortion can allow. It does this at the expense of a broad-bandwidth noise that is not tonal or correlated to the original signal. It's a mathematical function that doesn't depend on psychoacoustics at all, so it won't have unexpected artifacts - just the well-understood dither noise.

If you accept that you have to have sufficient dither for correct reproduction (which is mathematically demonstrable), it makes some sense to shape the noise to approximately minimise its perceptual loudness.

If you're interested, you could read a post I made in my previous incarnation on this forum a few years ago (nickname: DickD). It includes graphical depictions countering several common but erroneous assumptions: for example, that when you make CD audio quieter using ReplayGain, any signals below -96 dB in the original are suddenly lost (not true), or that CDs have a dynamic range of 96 dB (in theory it is infinite, given an infinite averaging time, and perceptually it is around 120 dB in practice given the ear's effective averaging time).

That post shows you the shape of typical ATH-shaped dither compared to flat dither (foobar2000, even back in version 0.6, performs similarly to SSRC) and demonstrates that tonal and noise-like signals (like snare drums) well below single-bit amplitude can be perceived (with audible samples and graphical representations).
Dynamic – the artist formerly known as DickD
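
The claim about signals below single-bit amplitude is easy to try in a few lines of Python. This is a toy sketch, not a reproduction of that post: the tone level (0.4 of one step), its frequency, the sample count and the seed are arbitrary assumptions, and the dither is plain TPDF without any noise shaping.

```python
import math
import random

def quantize(x, dither, rng):
    """Round to the nearest integer step, optionally adding TPDF dither first."""
    d = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5) if dither else 0.0
    return round(x + d)

rng = random.Random(1)
n = 20000
# A sine whose peak is only 0.4 of one quantization step.
tone = [0.4 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]

plain = [quantize(x, False, rng) for x in tone]
dithered = [quantize(x, True, rng) for x in tone]

def tone_gain(y):
    """Least-squares estimate of how much of the tone survives in y."""
    return sum(a * b for a, b in zip(y, tone)) / sum(b * b for b in tone)

print("undithered:", tone_gain(plain))     # every sample rounds to 0
print("dithered:  ", tone_gain(dithered))  # near 1: the tone is still there
```

Without dither the tone is simply gone, because every sample rounds to zero. With dither it survives at essentially full level underneath the added noise.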

Reply #23
It should be noted, by the way (and someone correct me if I am wrong) that if there is even a small amount of high frequency noise already in the audio (small being on the order of one bit at the final resolution) then dithering is not only unneeded, it adds unnecessary noise to the audio, thus actually degrading it. Of course, if there is this much noise in your 24 bit data then what was the point of using 24 bit resolution.
Dithering is needed to avoid truncation distortion when reducing the wordlength (e.g. 48 to 24 bit). It's a choice: you either add (dither) noise or you get extra distortion (which is correlated to the signal). IMHO a dithered signal is closer to "lossless" than a truncated signal.
The "self dithering effect" is often used as an excuse for not using dither. Quote from Bruno Putzeys (ex. Philips engineer):
"There's no such thing as self dithering. There are no "natural" noise sources of which the fourier transform of the PDF has a string of zeros to coincide with the spikes of the fourier transform of a staircase."

If you're sure you don't want to use dither, that's fine. But please don't blame dithering for "adding unnecessary noise" since it's either noise or distortion.
There ain't no such thing as a free lunch  .


I stand corrected.

Reply #24
If you're sure you don't want to use dither, that's fine. But please don't blame dithering for "adding unnecessary noise" since it's either noise or distortion.

That's exactly what I experienced with the 22050 Hz / 8 bit settings! It was either noise or distortion. And I have to admit that the noise gives a far more pleasing impression than the distortion. YMMV.

 