Topic: Use of non 8/16/24-bit audio, like 12-bit audio

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #50
It does say "nBitsPerSample" which I would assume is a typo for "wBitsPerSample"?
The names are no big deal; they are supposed to be the same thing. Basically one must rely on nBlockAlign and nAvgBytesPerSec, along with the other fields, to determine how many bytes of data should be read at a time, to avoid issues like reading across the boundaries of different samples/channels and such. Putting wBitsPerSample at a higher priority could be dangerous.
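As a minimal sketch in C of that priority order (not taken from any real player; the struct just mirrors the fmt-chunk fields discussed here), one could derive the container size from nBlockAlign and only sanity-check wBitsPerSample against it:
Code: [Select]
#include <stdint.h>

/* Hypothetical fmt-chunk fields, as discussed above. */
typedef struct {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
} FmtChunk;

/* Returns the container size in bytes per sample, or -1 if the header is
   inconsistent. nBlockAlign/nAvgBytesPerSec are authoritative;
   wBitsPerSample is only checked, never trusted on its own. */
int container_bytes_per_sample(const FmtChunk *f)
{
    if (f->nChannels == 0 || f->nBlockAlign % f->nChannels != 0)
        return -1;
    if (f->nAvgBytesPerSec != f->nSamplesPerSec * f->nBlockAlign)
        return -1;                                 /* fields disagree   */
    int bytes  = f->nBlockAlign / f->nChannels;    /* container size    */
    int needed = (f->wBitsPerSample + 7) / 8;      /* implied minimum   */
    return (needed > bytes || bytes == 0) ? -1 : bytes;
}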

Quote
Well at least the first sentence suggests that 8 bits in 16 is invalid. (Edit: for WAVE_FORMAT_PCM, at least.)
Entire document preserved at http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf
It may have something to do with the fact that 8-bit .wav files use unsigned integers, to discourage people from directly casting 8-bit data to 16-bit without treating the bytes as unsigned.

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #51
Basically one must rely on nBlockAlign and nAvgBytesPerSec, along with the other fields, to determine how many bytes of data should be read at a time, to avoid issues like reading across the boundaries of different samples/channels and such. Putting wBitsPerSample at a higher priority could be dangerous.
I interpret your last sentence to mean that a player cannot just blindly accept wBitsPerSample = 20 without checking that the byte count per sample is 3 (or at least > 2) - is that about right?

Pasting from the spec:
Quote from: spec1.0
Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size.
This is what I think rules out 8-in-16 or 15-in-24: if wBitsPerSample is 8, 15 or 20, then i must correspond to 8, 16 or 24 bits per channel respectively, as anything more would not be the "smallest" number of bytes.
Quote from: spec1.0
The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.
So a 20-bit signal in a 24-bit container should have the four least significant bits set to zero, and the sign bit should be the MSB.
And then the 12-bit example confirms this: doesn't that unambiguously resolve a 12-bit sample to 0000 1000 0001 0000 - and to a sign bit = MSB = 0?
(Yes, being a chicken, I intentionally gave a palindromic example to guard against my sooner-or-later inevitable endianness blunder. Edited away one, even.)
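As a quick sketch of that packing rule (my own illustration, not the spec's sample code): a signed 12-bit value goes into the top 12 bits of a 16-bit integer, the low 4 bits are zeroed, and the integer is written least significant byte first.
Code: [Select]
#include <stdint.h>

/* Pack a signed 12-bit sample (-2048..2047) into a 16-bit container:
   amplitude in bits 15..4, bits 3..0 zero. */
uint16_t pack12(int16_t sample12)
{
    return (uint16_t)((uint16_t)sample12 << 4);
}

/* The container is then stored least significant byte first. */
void write_le16(uint16_t i, uint8_t out[2])
{
    out[0] = (uint8_t)(i & 0xFF);
    out[1] = (uint8_t)(i >> 8);
}
For the palindromic example above, pack12(0x081) gives 0x0810, i.e. 0000 1000 0001 0000, with the sign bit as the MSB.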


I keep asking myself why there is a "20" at all if it is really nothing but a 24 with zeroes in the least significant bits. I have a gut feeling that they wanted to be sufficiently up-front to rule out any disasters caused by "bright" hacks like storing 12-bit stereo in 3 bytes.

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #52
The smallest unit is a byte, not a bit, a nibble or something else. For example in this post:
https://hydrogenaud.io/index.php?topic=121906.msg1006400#msg1006400
The .wav file is still stored as an 8-bit format; it can't be as small as a .dsf or .dff file in uncompressed form.

Endianness is another thing, but .wav must be little-endian.

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #53
Sounds weird. The 1.0 WAVE specification explicitly mentions both 12- and 20-bit WAVE_FORMAT_PCM, says that 12-bit samples in 16-bit WAVE should have the least significant half of the least significant byte zeroed, and that the sample amplitude should be in the most significant bits of the integer.
Okay, interesting. I really wonder what is meant by the earlier quote then
Quote
"resolve the ambiguity of wBitsPerSample for WAVE_FORMAT_PCM"

Anyway, patching this into FLAC is a small change, and a check for these bits to be empty is already in place.
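For what it's worth, a check like that could look as simple as this (a sketch of the idea, not flac's actual code):
Code: [Select]
#include <stdint.h>

/* Returns nonzero if the padding bits (the low container_bits - valid_bits
   bits) of a sample read from its container are all zero, as the WAVE spec
   requires, e.g. container_bits = 24, valid_bits = 20. */
int padding_bits_are_zero(uint32_t sample_in_container,
                          unsigned container_bits, unsigned valid_bits)
{
    uint32_t mask = (valid_bits >= container_bits)
                  ? 0
                  : ((1u << (container_bits - valid_bits)) - 1u);
    return (sample_in_container & mask) == 0;
}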
Music: sounds arranged such that they construct feelings.

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #54
Okay, interesting. I really wonder what is meant by the earlier quote then
Quote
"resolve the ambiguity of wBitsPerSample for WAVE_FORMAT_PCM"

*googling*

According to https://wavefilegem.com/how_wave_files_work.html , it refers to this sentence:
Quote
The wBitsPerSample field specifies the number of bits of data used to represent each sample of each channel
Read stand-alone, out of context and without taking note of the discussion on the next two pages, it isn't far-fetched that someone could interpret it as: well, 20 valid bits require 24 bits "used to represent" them.
This blog seems to agree: https://www.appletonaudio.com/blog/tag/wave-file-format/

But heck, here the International Telecommunication Union speaks about 5 bytes for 20-bit stereo (!)
Quote
NOTE 1 – The original WAVE specification permits, for example, 20-bit samples from two channels to be packed into 5 bytes, sharing a single byte for the least significant bits of the two channels. This Recommendation specifies a whole number of bytes per audio sample in order to reduce ambiguity in implementations and to achieve maximum interchange compatibility.
Sounds crazy, but at least an ITU recommendation is a source for something, although I am not confident as to precisely what.
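Just to illustrate what that 5-byte packing would imply (the note doesn't spell out the exact layout, so the split below - two channels' top 16 bits followed by a shared byte for their least significant 4 bits - is purely an assumed layout for illustration):
Code: [Select]
#include <stdint.h>

/* Unpack one assumed 5-byte group into two signed 20-bit samples.
   Layout assumed here: bytes 0-1 = top 16 bits of L, bytes 2-3 = top 16
   bits of R, byte 4 = the 4 least significant bits of each channel. */
void unpack_20bit_pair(const uint8_t b[5], int32_t *left, int32_t *right)
{
    uint32_t l = ((uint32_t)b[0] << 12) | ((uint32_t)b[1] << 4) | (b[4] >> 4);
    uint32_t r = ((uint32_t)b[2] << 12) | ((uint32_t)b[3] << 4) | (b[4] & 0x0F);

    /* Sign-extend from 20 bits to 32. */
    *left  = (l & 0x80000) ? (int32_t)(l | 0xFFF00000u) : (int32_t)l;
    *right = (r & 0x80000) ? (int32_t)(r | 0xFFF00000u) : (int32_t)r;
}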

The diagram "Data Packing for 16-Bit Stereo PCM" on page 59 in the 1.0 PDF is also a bit at odds with the subsequent part on what a "Sample" is: it shows "16-Bit Stereo" as a "Sample" of 32 bits. There might be issues for those who read specs with no more sense of context than how a compiler reads code ...

I am tempted to read the Microsoft article as "not our fault, someone else, not us, started making mistakes", and that could very well be there for at least one of two reasons: wanting to blame someone else, and not wanting to put the blame on any particular someone. Speculation, though.

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #55
There might be issues for those who read specs with no more sense of context than how a compiler reads code ...

To be fair, that is how technical specifications are supposed to be read, because in technical specifications, avoiding ambiguity is crucial, and it turns out that how a compiler reads code is better at avoiding ambiguity than how meat computers read meat scribbles.

That is also why "legalese" is essentially a programming language for law, and why it's so arduous to read.


Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #57
It should be mentioned that WAVEFORMAT/WAVEFORMATEX/WAVEFORMATEXTENSIBLE describe both a physical format and a communication format (i.e. the format used to communicate audio to/from the soundcard).

As such, the meaningful audio data in the bytes is one thing, and the bytes actually transmitted are another.

Usually, in a file on disk, one would store as few bytes as possible, except in situations where reading/writing that file could cause more difficulties than just making it slightly bigger.
(The case of 5 bytes for a stereo 20-bit track is feasible because it is easy to read 5 bytes and split byte 3 into two halves. A stereo 13-bit file would be quite difficult to read.)

That's why a 24-bit file on disk would usually use 24 bits per channel, while for communication the common format is 24-in-32 (i.e. wBitsPerSample 24 but nBlockAlign 4 * numChannels, so 4 bytes per channel).

Code: [Select]
/// Number of channels (mono=1, stereo=2)
uint16_t wChannels;
/// Sampling rate [Hz]
uint32_t dwSamplesPerSec;
/// Number of bytes required for one second of audio
uint32_t dwAvgBytesPerSec;
/// Block align (i.e. bytes of one frame, all channels)
uint16_t wBlockAlign;
/// Bits per sample (i.e. bits of one sample for a single channel)
uint16_t wBitsPerSample;

WAVEFORMATEXTENSIBLE added this, aside from channel layout and the ability to specify a specific subformat:
Code: [Select]
uint16_t numberOfValidBits;

numberOfValidBits does actually allow non-byte-aligned values. The original format was not supposed to support that, and of course the bits would still be packed into a whole number of bytes.
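As a rough sketch of the 24-in-32 communication case mentioned above (assuming the usual convention that the valid bits sit in the most significant bits of the container, as in the spec quotes earlier in the thread; not code from any actual driver):
Code: [Select]
#include <stddef.h>
#include <stdint.h>

/* Expand packed 3-byte little-endian samples (24 valid bits) into 32-bit
   containers: valid bits left-justified, low byte zero. This corresponds to
   24 valid bits with a block align of 4 bytes per channel. */
void pack24_into_32(const uint8_t *src, int32_t *dst, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        uint32_t v = (uint32_t)src[3 * i]               /* LSB           */
                   | ((uint32_t)src[3 * i + 1] << 8)
                   | ((uint32_t)src[3 * i + 2] << 16);  /* MSB, sign bit */
        dst[i] = (int32_t)(v << 8);  /* 24 valid bits in the top of the container */
    }
}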

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #58
Yes. For example, my Sound Blaster, which has 24-bit AD/DA converters and is advertised as 24-bit in marketing materials, uses 32-bit ASIO. The card supports ASIO digital loopback recording ("What U Hear"), and in this case the resulting .wav file actually contains 32 valid bits without zero padding.
Code: [Select]
Device: Creative ASIO

Features:
Input channels: 14
Output channels: 18
Input latency: 196
Output latency: 96
Min buffer size: 48
Max buffer size: 32768
Preferred buffer size: 96
Granularity: 8
ASIOOutputReady - supported
Sample rate:
 8000 Hz -  not supported
 11025 Hz - not supported
 16000 Hz - not supported
 22050 Hz - not supported
 32000 Hz - not supported
 44100 Hz - supported
 48000 Hz - supported
 88200 Hz - supported
 96000 Hz - supported
 176400 Hz - not supported
 192000 Hz - not supported
 352800 Hz - not supported
 384000 Hz - not supported
Input channels:
 channel: 0 (Mix FL) - Int32LSB
 channel: 1 (Mix FR) - Int32LSB
 channel: 2 (Mix RL) - Int32LSB
 channel: 3 (Mix RR) - Int32LSB
 channel: 4 (Mix FC) - Int32LSB
 channel: 5 (Mix LFE) - Int32LSB
 channel: 6 (Mix RC or SL) - Int32LSB
 channel: 7 (Mix RC or SR) - Int32LSB
 channel: 8 (Mic /Mic2 L) - Int32LSB
 channel: 9 (Mic /Mic2 R) - Int32LSB
 channel: 10 (Digital-In L) - Int32LSB
 channel: 11 (Digital-In R) - Int32LSB
 channel: 12 (Aux L) - Int32LSB
 channel: 13 (Aux R) - Int32LSB
Output channels:
 channel: 0 (Front L/R) - Int32LSB
 channel: 1 (Front L/R) - Int32LSB
 channel: 2 (Rear L/R) - Int32LSB
 channel: 3 (Rear L/R) - Int32LSB
 channel: 4 (Front C/Sub) - Int32LSB
 channel: 5 (Front C/Sub) - Int32LSB
 channel: 6 (Rear C/Top) - Int32LSB
 channel: 7 (Rear C/Top) - Int32LSB
 channel: 8 (Side L/R) - Int32LSB
 channel: 9 (Side L/R) - Int32LSB
 channel: 10 (FX 1 L/R) - Int32LSB
 channel: 11 (FX 1 L/R) - Int32LSB
 channel: 12 (FX 2 L/R) - Int32LSB
 channel: 13 (FX 2 L/R) - Int32LSB
 channel: 14 (FX 3 L/R) - Int32LSB
 channel: 15 (FX 3 L/R) - Int32LSB
 channel: 16 (FX 4 L/R) - Int32LSB
 channel: 17 (FX 4 L/R) - Int32LSB

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #59
Yes. For example my Sound Blaster with 24-bit AD/DA converters and advertised as 24-bit in marketing materials uses 32-bit ASIO.

ASIO supports (or at least supported) the sample types listed below; note that there are values for 16, 18, 20, 24 and 32 bits.
LSB/MSB refers to byte order (endianness). The first number is the bit depth of the container and the second, if present, is the bit depth of the actual audio.

Also note that some hardware could opt to use 24 bits even if it could only provide 20 bits of precision.
Code: [Select]
case ASIOSTInt16LSB:
case ASIOSTInt32LSB16:      // 32 bit data with 16 bit alignment
case ASIOSTInt16MSB:
case ASIOSTInt32MSB16:      // 32 bit data with 16 bit alignment
    fullname = fullname + " : 16 bit";
    break;
case ASIOSTInt32LSB18:      // 32 bit data with 18 bit alignment
case ASIOSTInt32MSB18:      // 32 bit data with 18 bit alignment
    fullname = fullname + " : 18 bit";
    break;
case ASIOSTInt32LSB20:      // 32 bit data with 20 bit alignment
case ASIOSTInt32MSB20:      // 32 bit data with 20 bit alignment
    fullname = fullname + " : 20 bit";
    break;
case ASIOSTInt24LSB:        // used for 20 bits as well
case ASIOSTInt32LSB24:      // 32 bit data with 24 bit alignment
case ASIOSTInt24MSB:        // used for 20 bits as well
case ASIOSTInt32MSB24:      // 32 bit data with 24 bit alignment
    fullname = fullname + " : 24 bit";
    break;
case ASIOSTInt32LSB:
case ASIOSTInt32MSB:
    fullname = fullname + ": 32 bit";
    break;
case ASIOSTFloat32LSB:      // IEEE 754 32 bit float, as found on Intel x86 architecture
    fullname = fullname + ": 32 bit float";
    break;
case ASIOSTFloat64LSB:      // IEEE 754 64 bit double float, as found on Intel x86 architecture
    fullname = fullname + ": 64 bit float";
    break;
case ASIOSTFloat32MSB:      // IEEE 754 32 bit float, Big Endian architecture
case ASIOSTFloat64MSB:      // IEEE 754 64 bit double float, Big Endian architecture

 

Re: Use of non 8/16/24-bit audio, like 12-bit audio

Reply #60
20-bit files - FLAC, WAVE, AIFF ... - are less widespread, so the question for a FLAC testbench would be how many players can handle 20-bit FLAC files.

24-bit FLAC files can of course carry 20-bit PCM signals by zeroing out the four least significant bits, and without any size penalty (FLAC just flags "four wasted bits!"). Indeed, some applications do that for compatibility reasons (not only for FLAC compatibility, but also for other formats), making 20-bit FLAC even scarcer.
24-bit FLAC files carrying 20-bit PCM will be decoded to 24-bit WAV. (Well, at least I know of no decoder that will scan the FLAC file, notice that the number of wasted bits is always at least four - but not always at least eight - and therefore output a 20-bit file.)

(Several lossless formats support only multiples of 8. At the other end, WavPack provides for any (integer) number of bits from 1 to 32.)
Sorry to be pedantic, but if I'm understanding correctly there is a space penalty to the wasted-bits flag in this case: 4 bits per frame (it's unary-coded, so storing 20-bit signals in 24 bits adds 4 bits to each flag). A maximal encoder could use this to its advantage, but the benefit is so small, on input so rare, that no one should bother IMO; it would save ~1 KiB per 3-minute 48 kHz track that started life as 20 bits stored in 24 bits. MD5 is bytewise, so there should be no major time difference between encoding 20-bit input vs 20-bit-in-24-bit input, FWIW.
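A back-of-the-envelope check of that ~1 KiB figure, using the same assumptions as above (FLAC's default block size of 4096 samples and 4 extra unary bits per flag) - just arithmetic, nothing authoritative:
Code: [Select]
#include <stdio.h>

int main(void)
{
    double seconds     = 3 * 60;
    double sample_rate = 48000.0;
    double block_size  = 4096.0;                              /* FLAC default       */
    double frames      = seconds * sample_rate / block_size;  /* ~2109 frames       */
    double extra_bits  = frames * 4.0;                        /* 4 extra unary bits */
    printf("%.0f frames -> %.1f KiB of wasted-bits overhead\n",
           frames, extra_bits / 8.0 / 1024.0);                /* ~1.0 KiB           */
    return 0;
}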

The most basic level of lossless that we should care about should be no more than the signal itself, not how wide the integer that held the signal before encoding was. But that ideal is trickier than it sounds, and it makes sense that most encoders include the width of the source as part of the encoding:
  • The true bit depth is not known until everything is read, while the source depth is known in advance.
  • What is rarely lost in a bloated wasted-bits flag is always gained in the sample-size encoding per frame (FLAC uses 3 bits per frame; arbitrary widths up to 32 bits would take 5 bits; an ideal variable-length encoding would use 1.x bits per frame on average, taking into account how common 16/24 is and how rare everything else is).