Question about PCM with 32-bit float


2016-07-29 03:43:08

I am looking to be corrected if I am wrong.

With 16 and 24 integer float, clipping occurs if the digital signal being sampled exceeds 0 dB. They both give the amplitude of the signal with a minimum that is below the quantization floor and a maximum that is 0 dB. Anything above 0 dB is clipped.

With 32 bit float, we get the same dynamic range as 24 bit integer but it can actually record accurate signal above 0 dB giving some head room if either the recording gain is set too high or the filter we run results in signal above 0 dB.

That head room is only temporary, meaning that when exporting to 16-bit for the final product, any signals above 0 dB have to be reduced or they are clipped.

So 32-bit float only gives us headroom to avoid clipping if gain set too high, but doesn't actually provide dynamic range beyond 24-bit. Is that correct?

Secondly, if the sound card being used is 24-bit integer, then using 32-bit float in the recording software does not give any protection from input gain being too high - just protection against post recording filters that might otherwise clip. Is that correct?

I've read several links that talk about difference between 24-bit integer and 32-bit float, some of them seem contradictory.

Finally, from what I've read, no recording hardware actually has the fidelity to give us more than 20-bits of dynamic range. Meaning if we record at 24 bit float, we won't actually have more than 120 dB of dynamic range even if such dynamic range were to exist in the analog signal (which is kind of doubtful)

Is that correct?

Thank you for any corrections.

Re: Question about PCM with 32-bit float

Reply #1 – 2016-07-29 04:02:37

Quote

So 32-bit float only gives us headroom to avoid clipping if gain set too high, but doesn't actually provide dynamic range beyond 24-bit. Is that correct?

No... There is a HUGE dynamic range. I don't know what the largest floating-point number and the smallest floating-point number (which is a tiny fractional number) is, but the dynamic range is essentially infinite as far as it relates to sound. I'm not even sure if Excel or a calculator would give you an accurate dB calculation.
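For what it's worth, the figure can be estimated without Excel. A minimal sketch, assuming standard IEEE 754 single precision (largest finite value (2 - 2^-23) * 2^127, smallest normal value 2^-126, smallest subnormal 2^-149); the name `rangeDb` is made up for illustration:

```javascript
// Limits of IEEE 754 single precision, written as exact powers of two.
const maxFloat32 = (2 - 2 ** -23) * 2 ** 127; // largest finite value, about 3.4e38
const minNormal = 2 ** -126;                  // smallest normal value, about 1.18e-38
const minSubnormal = 2 ** -149;               // smallest nonzero value, about 1.4e-45

// Ratio of two amplitudes expressed in decibels.
function rangeDb(largest, smallest) {
  return 20 * Math.log10(largest / smallest);
}

const normalRangeDb = rangeDb(maxFloat32, minNormal);
const fullRangeDb = rangeDb(maxFloat32, minSubnormal);
console.log(normalRangeDb.toFixed(0), fullRangeDb.toFixed(0));
```

That comes out at roughly 1529 dB over normal values and roughly 1668 dB if subnormals are counted, so far more than any audio signal needs.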

This isn't directly related to your question, but 0dB in 16-bit is +32,767 or -32,768, and in floating-point 0dB is 1.0. So with typical audio files floating-point samples are all below one in magnitude, and with integer formats the only samples below one are zeros.
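To make the two scales concrete, here is a small sketch. It assumes the common convention of dividing 16-bit samples by 32768 (some software divides by 32767 instead), and the helper names `int16ToFloat` and `levelDbfs` are hypothetical:

```javascript
// Assumed convention: divide by 32768 so that -32768 maps to exactly -1.0.
function int16ToFloat(sample) {
  return sample / 32768;
}

// Level of a single sample value relative to full scale (1.0), in dB.
function levelDbfs(value) {
  return 20 * Math.log10(Math.abs(value));
}

console.log(int16ToFloat(-32768)); // -1
console.log(int16ToFloat(32767));  // 0.999969482421875
console.log(levelDbfs(1.0));       // 0
console.log(levelDbfs(2.0));       // a bit over +6 dB, only possible in float
```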

Quote

Secondly, if the sound card being used is 24-bit integer, then using 32-bit float in the recording software does not give any protection from input gain being too high - just protection against post recording filters that might otherwise clip. Is that correct?

That's correct. 0dB is defined as the digital maximum (integer) from your ADC (or into your DAC). ADCs (and DACs) are integer devices, but your driver or software can convert to floating-point or any other bit-depth. (Or in the case of DACs, the driver can convert the data to 24-bit integer before it's sent to the DAC.)

Quote

Finally, from what I've read, no recording hardware actually has the fidelity to give us more than 20-bits of dynamic range. Meaning if we record at 24 bit float, we won't actually have more than 120 dB of dynamic range even if such dynamic range were to exist in the analog signal (which is kind of doubtful)

I believe that's true.

Re: Question about PCM with 32-bit float

Reply #2 – 2016-07-29 05:01:24

Quote from: AliceWonderMiscreations on 2016-07-29 03:43:08

With 16 and 24 integer float, clipping occurs if the digital signal being sampled exceeds 0 dB. They both give the amplitude of the signal with a minimum that is below the quantization floor and a maximum that is 0 dB. Anything above 0 dB is clipped.

There's no such thing as "integer float" - it's either fixed point integer or floating point, but integer clips at 0dBFS, whereas (as stated above) the way it's typically implemented 32 bit floating point allows for hundreds of dB's (or thousands of dB's with 64 bit floating point) of headroom above 0dB.

Quote

With 32 bit float, we get the same dynamic range as 24 bit integer but it can actually record accurate signal above 0 dB giving some head room if either the recording gain is set too high or the filter we run results in signal above 0 dB.

32 bit float has a ridiculous amount of dynamic range from an audio perspective and somewhat more precision than 24 bit integer. But since an ADC is in fixed point there is no way to "record" at greater than 0dB - it will be clipped by the ADC before it is converted to floating point. But you can boost a signal above 0dB in floating point or create/process signals that go above 0dB without clipping.
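A rough illustration of the last point, boosting above 0dB and coming back down. This is only a sketch: `clampInt16` is a hypothetical stand-in for what a fixed-point stage does, and the gain is chosen as a power of two so that the float arithmetic is exact:

```javascript
// Hard limit for 16-bit fixed point: anything outside the range is clipped.
function clampInt16(v) {
  return Math.max(-32768, Math.min(32767, Math.round(v)));
}

const gain = 4; // about +12 dB; a power of two, so float scaling is exact

// Fixed point: a sample near full scale, boosted and then attenuated again.
const intSample = 30000;
const intBoosted = clampInt16(intSample * gain);   // clips at 32767
const intRestored = clampInt16(intBoosted / gain); // 8192, the original is lost

// 32-bit float: the same operations on the equivalent value (full scale = 1.0).
const floatSample = Math.fround(30000 / 32768);
const floatBoosted = Math.fround(floatSample * gain);   // about 3.66, above "0 dB"
const floatRestored = Math.fround(floatBoosted / gain); // identical to floatSample

console.log(intSample, intRestored);
console.log(floatSample === floatRestored);
```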

But note that many commonly used processing plugins will clip - particularly those emulating analog gear because they sometimes deliberately emulate analog clipping. IOW, you don't want to go above 0dB unless you really know what you are doing.

Quote

That head room is only temporary, meaning that when exporting to 16-bit for the final product, any signals above 0 dB have to be reduced or they are clipped.

Correct.
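As a sketch of the two outcomes at export time (reduce first, or get clipped): the functions `normalizePeak` and `exportInt16` below are hypothetical, the scaling by 32768 with a clamp is just one common convention, and no dither is applied.

```javascript
// Scale a float buffer so that its largest peak sits at the given ceiling.
function normalizePeak(samples, ceiling = 1.0) {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  if (peak === 0) return Float32Array.from(samples); // silence: nothing to scale
  return Float32Array.from(samples, (s) => (s * ceiling) / peak);
}

// Convert to 16-bit, clipping whatever still exceeds full scale.
function exportInt16(samples) {
  return Int16Array.from(samples, (s) =>
    Math.max(-32768, Math.min(32767, Math.round(s * 32768)))
  );
}

const mix = Float32Array.of(0.5, -2.0, 1.5, 0.25); // contains overs above 1.0
const clipped = exportInt16(mix);                  // overs are flattened
const reduced = exportInt16(normalizePeak(mix));   // relative levels are preserved
console.log(Array.from(clipped), Array.from(reduced));
```

In the clipped export the two overs end up pinned at the integer limits, while the normalized export keeps the ratios between the samples.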

Quote

So 32-bit float only gives us headroom to avoid clipping if gain set too high, but doesn't actually provide dynamic range beyond 24-bit. Is that correct?

Again, 32 bit floating point has ridiculous dynamic range, but ultimately you are indeed limited by the dynamic range of the output format.

Quote

Secondly, if the sound card being used is 24-bit integer, then using 32-bit float in the recording software does not give any protection from input gain being too high - just protection against post recording filters that might otherwise clip. Is that correct?

Correct.

Re: Question about PCM with 32-bit float

Reply #3 – 2016-07-29 05:22:09

Sorry integer float was a typo - I meant integer

Re: Question about PCM with 32-bit float

Reply #4 – 2016-07-29 11:47:00

Quote from: AliceWonderMiscreations on 2016-07-29 03:43:08

I am looking to be corrected if I am wrong.

With 16 and 24 integer ~~float~~ , clipping occurs if the digital signal being sampled exceeds 0 dB.

The preferable terminology might be: "Clipping occurs when the signal exceeds digital FS"

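With integer data, digital FS is the lowest (-FS) or highest (+FS) code the format can hold. In offset binary that is all zeroes (-FS) or all ones (+FS); in two's complement, as used for 16- and 24-bit PCM files, it is 100...0 (-FS) and 011...1 (+FS).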

With floating point data, digital FS is dependent on both the exponent part of the data, and the mantissa part. Since the magnitude represented by the exponent part can easily exceed the magnitude of integer data, a far larger range of data may be handled, but at some cost in precision.
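To see the exponent and mantissa parts directly, one can pull a 32-bit float apart. A minimal sketch, assuming IEEE 754 single precision (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits); `decodeFloat32` is a hypothetical helper and only handles normal numbers sensibly:

```javascript
// Split a number, stored as IEEE 754 single precision, into its three fields.
// Zero, subnormals, infinities and NaN use special exponent codes and are
// not interpreted here.
function decodeFloat32(x) {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, x);          // store as 32-bit float (big-endian by default)
  const bits = view.getUint32(0); // read the same four bytes back as an integer
  return {
    sign: bits >>> 31,                      // 1 bit
    exponent: ((bits >>> 23) & 0xff) - 127, // 8 bits, bias removed
    mantissa: bits & 0x7fffff,              // 23 stored fraction bits
  };
}

console.log(decodeFloat32(1.0));  // { sign: 0, exponent: 0, mantissa: 0 }
console.log(decodeFloat32(0.5));  // { sign: 0, exponent: -1, mantissa: 0 }
console.log(decodeFloat32(-1.5)); // { sign: 1, exponent: 0, mantissa: 4194304 }
```

Halving a sample only changes the exponent, which is why the relative precision stays the same at low levels.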

In general DACs and ADCs operate at their core in integer or fixed point. A converter with support for floating point data usually obtains it by means of some built-in numerical conversion facility.

Floating point arithmetic is mostly advantageous in audio editing programs, where intermediate data can be very large or very small. Because they are just software, support for floating point data formats with high precision can be inexpensive and relatively easy to implement.

Re: Question about PCM with 32-bit float

Reply #5 – 2016-08-01 02:28:07

The headroom provided above 0 dB full scale in 32-bit float audio files is not temporary.
It's there as long as the file itself exists. But it is worth noting that when the file is exported down to even 32-bit integer, those overs are clipped in the new 32-bit integer file. The 32-bit float file remains unchanged by the export. It's a good archival format. And WavPack is available to compress it losslessly.

And as far as hardware converters go, the theoretical limitations are at about 144 dB of dynamic range and 32-bit float can accommodate that. Not all audio is recorded, however. Plenty of electronic music is generated, mixed, processed, edited, authored, and mastered within computers and 32-bit float and even 64-bit float operations are common now.
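The 144 dB figure matches the usual rule of thumb of about 6 dB per bit. A quick sketch of that arithmetic, using the plain ratio 20*log10(2^N) and ignoring the extra 1.76 dB that the textbook sine-wave SNR formula adds (`integerRangeDb` is a made-up name):

```javascript
// Ratio between full scale and one quantization step for an N-bit integer, in dB.
function integerRangeDb(bits) {
  return 20 * Math.log10(2 ** bits);
}

for (const bits of [16, 20, 24]) {
  console.log(bits, integerRangeDb(bits).toFixed(1));
}
```

This gives about 96 dB for 16 bits, 120 dB for 20 bits and 144 dB for 24 bits.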

Re: Question about PCM with 32-bit float

Reply #6 – 2016-08-01 10:46:08

Quote from: Nystagmus on 2016-08-01 02:28:07

32-bit float … It's a good archival format

Not really: it takes up more space than integer formats and can lack essential information, i.e. what compression/limiting/etc. should be used to obtain something that can actually be sent to a DAC.

Re: Question about PCM with 32-bit float

Reply #7 – 2022-07-28 13:21:24

I can see that 32bit float can handle a much higher dynamic range than 32bit integer. It's easy to verify this with low level signals in a DAW.

But how does 32bit float having less 'precision' work? How does it affect the signal/audio fidelity?
For example, if I compare a 32bit integer and a 32bit float file that's derived from it in Foobar's bit-comparator, it says there's no difference. That's because Foobar converts all files to 32bit float for comparison?
If I compare them in DeltaWave, the 'Correlated Null' is 218dB, which is still a lot. I thought that 32bit float can only match integer precision down to 24 or 25 bit depth. Is that not correct?

Is there an example where a 32bit float encoding would show itself as worse than 32bit integer for audio?
The theory is still over my head, so I'm looking at this from a somewhat more practical/experimental angle.

Re: Question about PCM with 32-bit float

Reply #8 – 2022-07-28 13:25:29

My attempt to explain it:
https://www.audiosciencereview.com/forum/index.php?threads/zoom-f6-portable-field-recorder-review.15668/post-507610

Re: Question about PCM with 32-bit float

Reply #9 – 2022-07-28 17:11:01

Quote from: Brand on 2022-07-28 13:21:24

For example, if I compare a 32bit integer and a 32bit float file that's derived from it in Foobar's bit-comparator, it says there's no difference. That's because Foobar converts all files to 32bit float for comparison?

foobar2000 works in 32-bit float internally, so yeah.

Quote from: Brand on 2022-07-28 13:21:24

I thought that 32bit float can only match integer precision down to 24 or 25 bit depth. Is that not correct?

That doesn't mean that the error in the 26th bit is "large". If you compare 26 bit integer to 25 bit integer (truncated for the sake of simplicity of the argument), the error is the 26th bit. If you convert a 26 bit integer signal to 32 bit float and then back again, some values will be right and some will be wrong, and the average noise will be lower.
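The "some right, some wrong" part is easy to try. A small sketch, assuming `Math.fround` (round to the nearest 32-bit float) is an acceptable stand-in for the conversion; it looks at 1000 consecutive values in the upper half of the 26-bit signed range, i.e. from 2^24 upwards:

```javascript
// Walk through 1000 consecutive integers starting at 2^24 and check which
// ones come back unchanged after rounding to 32-bit float.
const start = 2 ** 24;
let exact = 0;
let worstError = 0;
for (let n = start; n < start + 1000; n++) {
  const roundTripped = Math.fround(n); // nearest value a 32-bit float can hold
  if (roundTripped === n) exact++;
  worstError = Math.max(worstError, Math.abs(roundTripped - n));
}
console.log(exact, worstError);
```

Half of the values survive and the others are off by one step, while every value below 2^24 in magnitude survives untouched.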

Quote from: Brand on 2022-07-28 13:21:24

Is there an example where a 32bit float encoding would show itself as worse than 32bit integer for audio?

"worse" to the human ear? No, we are way beyond human hearing. But yes, there are signals that would be accurately represented in 32 bit integer, but not in 32 bit float.

Quote from: Brand on 2022-07-28 13:21:24

The theory is still over my head, so I'm looking at this from a somewhat more practical/experimental angle.

Suppose that you see 678. Normally that means "6*100 + 7*10 + 8*1". But if this is a column saying "all figures in percent", it instead means a hundredth of this. So by shouting out "here, [different convention] applies!" you can get three consecutive digits to mean something else. There will be 1000 different possible numbers to represent - but (usually) a different thousand than 0 to 999.

Let us assume that you have three nonnegative digits to play with then.

For something not far off floating-point binary, think in terms of decimal numbers and the "scientific" notation 1.2*10^3 for 1200. The "most significant" part of this is the exponent "3" that determines the magnitude. Then the leading digit 1. And least significant is the decimal digit 2.
So shout it out: whenever you store three nonnegative integers "ECD", that shall from now on be read as C.D*10^E.
Here I will use quotation marks around "ECD" to signify that I am indeed using that format! So you want to represent 1200 (<-- no quotation marks, meaning: ordinary one thousand two hundred) - and you do that by "312" meaning 1.2*10^3.

You see already that you can get a much larger range. 1200 is four digits. "312" is three. You can get up to "999" that means 9.9*10^9, which is big.
But you lose resolution above 100. Every number up to 99 can be written as say 67=6.7*10^1 represented as "167", and 100 can be represented as "210" - but the next number is "211" that stands for 110. In fact, you cannot represent 101 to 109 this way. And that is to be expected - after all you have only a thousand distinct numbers so if you get some new ones, some old will have to go. When you are in the millions range, the resolution is a hundred thousand: "610" for a million, "611" for 1.1 million etc.

Actually you gain resolution for numbers < 10. For example, 3.4 is not an ordinary integer, but you can represent it as "034" to mean 3.4*10^0. Below 1 you get at odds with "scientific notation", which would represent 0.2 as 2*10^(-1) whereas this would represent it as "0.2*10^0", but forget about that and grasp the point:
* In ordinary representation, "ECD" is to be read as 100E+10C+D and represents integers from 0 to 999
* By letting "ECD" mean something else, you can get it to represent a different set of numbers.

(Now if you remember the old pocket calculators that would write overflows with an e08 or something? That works this way, but with more decimals after the D and with another digit in the exponent.)

IEEE 754 floating-point does something similar, but with powers of 2 rather than 10 (that makes a few things simpler), with sign - and with exceptions to cover infinities, not-a-number things etc.
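The three-digit "ECD" toy format from above can be played with directly. A minimal sketch (the function name `decodeEcd` is made up, and the arithmetic is arranged as (10*C + D) * 10^E / 10 only to keep the intermediate values whole numbers):

```javascript
// Decode a three-digit code "ECD" as C.D * 10^E.
function decodeEcd(code) {
  const [e, c, d] = Array.from(code, Number);
  return ((10 * c + d) * 10 ** e) / 10;
}

// Collect every value the 1000 possible codes can represent.
const representable = new Set();
for (let i = 0; i < 1000; i++) {
  representable.add(decodeEcd(String(i).padStart(3, "0")));
}

console.log(decodeEcd("312")); // 1200
console.log(decodeEcd("999")); // 9900000000
console.log(decodeEcd("034")); // 3.4
console.log(representable.has(100), representable.has(105), representable.has(110));
```

The last line shows the gap described above: 100 and 110 are in the set, 105 is not.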

Re: Question about PCM with 32-bit float

Reply #10 – 2022-07-28 20:20:59

Something to add, specific to your post:

Quote from: Brand on 2022-07-28 13:21:24

For example, if I compare a 32bit integer and a 32bit float file

It is important to know how the two files are generated/converted/created. The generator must be able to handle 32-bit integer and float with full precision for a meaningful comparison. It is usually achieved by using 64-bit float internal math because it can handle all possible 32-bit integer and float values losslessly.
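A quick way to convince yourself of that, as a sketch relying on the fact that JavaScript numbers are 64-bit floats and that `Math.fround` rounds to the nearest 32-bit float:

```javascript
const int32Max = 2147483647; // largest 32-bit signed integer, 2^31 - 1

// In 64-bit float, neighbouring int32 values stay distinct.
console.log(int32Max - 1 !== int32Max); // true
// Rounded to 32-bit float, the same value lands on 2^31 instead.
console.log(Math.fround(int32Max));     // 2147483648

// Every 32-bit float is exactly representable as a 64-bit float,
// so rounding to 32 bits a second time changes nothing.
const asFloat32 = Math.fround(0.1);
console.log(Math.fround(asFloat32) === asFloat32); // true
```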

Quote

If I compare them in DeltaWave

DeltaWave's methodology is based on FFT, which means averaging a lot of samples. The "correlated null" value is not useful for what you want to understand. It is much easier and more appropriate to look at a single value.
https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_fround
Change the code in the tutorial in this way:

Code:

let a = Math.fround(9876543);
let b = Math.fround(987654321);
let c = Math.fround(9876543210);
let d = Math.fround(16777216);
let e = Math.fround(-16777211);
let f = Math.fround(987.654321);

...and the results look like this:

Code:

9876543
987654336
9876543488
16777216
-16777211
987.654296875

a: No difference, but this value is invalid for signed 24-bit integer which only has a range from -8388608 to 8388607.

b: Error = 15
32-bit integer can losslessly represent this value, but 32-bit float can't, because a fixed number of bits is reserved for the exponent in order to represent a huge dynamic range (about 1500dB for 32-bit float).

c: Error = 278
9876543210 will clip or overflow when stored as 32-bit integer, so despite the larger error, float is still light years better than integer. When converting to integer, this value must be normalized/reduced to 2147483647 or smaller, so error will also be reduced proportionally.

d: No difference. 32-bit float can losslessly represent every integer value from -16777216 to 16777216 (25 bits).

e: No difference, as an example that float can represent negative values in the same way as positive values.

f: While there is some error, 32-bit float is still much more precise than integer, which will round to 988.
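The size of these errors follows from the spacing between neighbouring 32-bit float values, which doubles with every power of two. A rough sketch of that relationship; the helper `float32Spacing` is hypothetical and assumes a nonzero value in the normal range:

```javascript
// Gap between adjacent 32-bit floats around x: the power of two at or
// below |x|, scaled down by the 23 stored mantissa bits.
function float32Spacing(x) {
  const a = Math.abs(x);
  let p = 1;
  while (p * 2 <= a) p *= 2; // climb for values of 2 and above
  while (p > a) p /= 2;      // descend for values below 1
  return p * 2 ** -23;
}

for (const x of [987654321, 9876543210, 987.654321]) {
  const error = Math.abs(Math.fround(x) - x);
  // With round-to-nearest, the error never exceeds half the spacing.
  console.log(x, float32Spacing(x), error);
}
```

The spacing is 64 around 987654321 and 1024 around 9876543210, so errors of 15 and 278 are within the expected half-step.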

So you can see that float is much more consistent when representing large and small values. 1500dB dynamic range is a waste for audio, but until generic hardware/software can use arbitrary mantissa/exponent bit assignment without performance penalty, it is a necessary compromise when using float. Some GPU/AI processors have more flexibility on floating point types with lower bit-depth (16-24) and different mantissa/exponent bit assignment though.

Outside the processing chain, for a static file format, it depends on the audio content. 32-bit float starts to lose precision (compared to 32-bit integer) once the sample values go beyond +/-16777216, which is about -42dBFS on the 32-bit integer scale. If a specific audio file mostly contains smaller values, float will have an advantage; otherwise 32-bit integer will have better numeric precision. For this reason, it is often recommended to normalize the audio file to somewhere between 0dBFS and -1dBFS when exporting to integer in order to optimize precision.

You can use the program I made to check the distribution of sample values in your audio files, something DeltaWave won't do:
https://hydrogenaud.io/index.php/topic,114816.msg1010860.html#msg1010860
The program won't tell you whether the bits are data or garbage, of course. It just shows the distribution.

Re: Question about PCM with 32-bit float

Reply #11 – 2022-07-29 15:14:45

Thanks for the explanations! I still have to process it all.

FWIW, the 32bit integer file that I used for comparison is the RMAA test signal.

Re: Question about PCM with 32-bit float

Reply #12 – 2022-07-29 15:43:40

Quote from: Brand on 2022-07-29 15:14:45

FWIW, the 32bit integer file that I used for comparison is the RMAA test signal.

That's no good. RMAA's int32 file is converted from the float32 signal, and if you carefully look at the "stereo crosstalk" spectrogram, all integer formats are clipped, the only unclipped one is the float signal. Luckily RMAA only uses that signal to measure crosstalk and ignores the clipping. The clipping can be easily identified even with a low precision analyzer like Spek. Notice the vertical lines in the integer file, those parts are clipped.