Understanding PCM

First I just wanted to let everyone know that I think the people here are really knowledgeable.  I've been reading some of the other topics here and you people really know your stuff.

I know my question may seem pretty basic, but I can't seem to find a good answer anywhere:  What exactly is stored in the data for linear PCM audio?  Specifically, I'm asking about the data stored in a .wav file.  I've found all kinds of good info about the headers in wav files: how they are a type of RIFF file, and how the header specifies the file size, the kind of encoding, the sample rate, etc.  But when I look at that first 8 bits of data (or 16, 20, or 24), what am I seeing?  What does it mean?  I'm asking because I'm working on a program to manipulate .wav data, but I don't know what to do with the data because I don't know what it means.  Any help with this would be greatly appreciated, even if it is just a link to somewhere that explains this concept, because I couldn't find it.  Thanks.


Reply #2
No, that site is similar to the ones I've already found, which talk about the header info, the different chunks, etc.  Then there is this big blanket statement that "the rest of it is the data, with 1 unsigned byte for 8 bit sampling and 2 - 2's complement signed bytes for 16 bit sampling."

What I want to know is:  What does the data mean?  If I'm looking at 16 bit data and the data stores 5000 (decimal), what does that mean?  What if the data stores 15000 (decimal)?  What about 0?  I'm trying to understand how the data that is sampled corresponds to the sound output.  If I were going to go in and use a hex editor to change the data in a wav file, what would I need to change?  What do the different values correspond to?  That is what I'm trying to find.  But thanks for the help.

Reply #3
"16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767."

32767 is the maximum possible value of the signal. -32768 is the minimum possible value of the signal. 0 is silence. Those values are the sampled and quantized values of the analog signal they represent. Play a little with a wave editor such as Cool Edit, and look at the binary values of some waveforms you generate yourself to help you understand how it works.
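To make those numbers a bit more concrete, here is a minimal Python sketch that expresses a 16-bit sample as a fraction of full scale and as a level in dB relative to full scale. The function names and the choice of 32768 as the full-scale reference are my own assumptions for illustration, not something defined by the wav format itself.

```python
import math

# Assumption: use 32768 (the magnitude of the most negative 16-bit value)
# as the full-scale reference, so -32768 maps to exactly -1.0.
FULL_SCALE = 32768


def sample_to_fraction(sample: int) -> float:
    """Express a signed 16-bit sample as a fraction of full scale."""
    return sample / FULL_SCALE


def sample_to_dbfs(sample: int) -> float:
    """Level of one sample relative to full scale, in dB (silence is -inf)."""
    if sample == 0:
        return float("-inf")
    return 20 * math.log10(abs(sample) / FULL_SCALE)


for value in (0, 5000, 15000, 32767, -32768):
    fraction = sample_to_fraction(value)
    level = sample_to_dbfs(value)
    print(f"{value:7d}  fraction={fraction:+.4f}  level={level:.1f} dBFS")
```

So a stored 5000 is simply an instantaneous signal value of about 15% of the largest value the format can hold, and 15000 is about 46%.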

Reply #4
Hi Glycerinesoda;  I don't know if this will answer your question, but let me take a shot.

Suppose I create a simple wave file using the numbers you give as an example.  It would look like this:



There are five "samples", with values of 0, 5000, 15000, 5000, 0.  Now I open it in a hex editor, and the file looks like this:



Now, 5000 (decimal) converts to 1388 (hex), 15000 (dec) converts to 3A98.  Since a wav file uses "little-endian" byte ordering, the 1388 becomes 88 13 and the 3A98 becomes 98 3A.  Now look at the last bytes of data of the file.  You'll see:

00 00 88 13 98 3A 88 13 00 00 - which is, of course, the raw data that makes up the samples in the wav file.  You could, at this point, use any means you're comfortable with, even BASIC, to generate numbers to plug in there to create whatever waveform you needed. (Which could be a chore.)
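If you would rather not type the bytes in by hand, the same five samples can be produced with Python's standard `struct` and `wave` modules. This is only a sketch: the mono channel count and the 44100 Hz sample rate are assumptions, since the settings of the original example file are not given here.

```python
import io
import struct
import wave

samples = [0, 5000, 15000, 5000, 0]

# "<" means little-endian, "h" means signed 16-bit integer.
raw = struct.pack("<5h", *samples)
print(" ".join(f"{byte:02X}" for byte in raw))
# prints: 00 00 88 13 98 3A 88 13 00 00

# Write a complete wav file into memory (assumed: mono, 16 bit, 44100 Hz).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
    wav_file.setframerate(44100)
    wav_file.writeframes(raw)

wav_bytes = buffer.getvalue()
tail = wav_bytes[-10:]            # the sample data sits at the end of the file

# Reading the bytes back gives the original sample values again.
decoded = list(struct.unpack("<5h", tail))
print(decoded)
```

Replacing `buffer` with a file name such as `"test.wav"` (a hypothetical name) writes the file to disk so it can be opened in a hex editor or wave editor.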

Hope this helps some.  Dex

BTW - If you compare this data to what's shown on the link that KikeG posted, you'll see the breakdown of how this works.

Reply #5
O.K., I think I'm starting to understand this a little more.  Each sample is directly related to the size of the waveform at the moment of the sample, meaning the largest possible positive signal will give the largest signed (or unsigned) number in the data, and vice versa.  A period of silence gives you all 0's, since there is no sound wave.

So what this all means to me is that if I want to create a program to manipulate the aspects of sound that we hear (shift pitch, etc.) then I'm going to have to work on a larger scale than byte-wise (or 2 byte-wise for 16 bit sound), aren't I?  Thanks for the help everyone.  I see now that I have a little more work cut out for myself.

Reply #6
Yes, if you are going to do math with the samples you will have to provide elbow room. If you use regular 32-bit integers, you are pretty safe. However, when you convert back to 16-bit you will need to watch for overflow (clipping).
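A minimal sketch of that idea in Python, using a simple gain change as the "math". The helper names are made up for illustration. Python integers never overflow, so the elbow room comes for free here; in a language like C you would do the multiplication in a 32-bit integer (or in floating point) and then clamp as shown.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clip_to_int16(value: int) -> int:
    """Clamp a value to the range that fits in a signed 16-bit sample."""
    return max(INT16_MIN, min(INT16_MAX, value))


def apply_gain(samples, gain):
    """Scale every sample by gain, then clip the result back to 16 bit."""
    return [clip_to_int16(round(sample * gain)) for sample in samples]


louder = apply_gain([0, 5000, 15000, -20000], 3)
print(louder)
```

With a gain of 3, the 5000 becomes 15000 without trouble, but 15000 and -20000 would land outside the 16-bit range, so they are clipped to 32767 and -32768. That flattening of the peaks is the clipping distortion to watch for.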

The favorite player of this forum, Foobar2000, has switched to double precision (64 bit) floating point processing to have far more elbow room than necessary. ;-)

Reply #7
In my opinion THE way for people who are new to any audio-related matter is to read up on the following:

What are tones (physically)?  For instance, something like this: they are density fluctuations of a medium (air, water, etc.). An oscillating source hits a lot of particles (molecules) of the medium, producing a longitudinal density wave whose shape carries the information forward until it reaches you. The density wave sets your eardrum (and whatever is behind it) oscillating, and the resulting impulse reaches your brain.

The density wave can also make a coil inside a permanent ring magnet oscillate, which induces a current whose shape is characteristic of the given sound.
An analog-to-digital converter (A/D converter) can turn this current into a digital bit pattern, which can be saved in different ways. Quality is lost here, because an infinitely exact value has to be reduced to a value with a limited, adjustable accuracy.
-> PCM (pulse-code modulation)

-> Sampling gives the sample rate: only a limited number of points of the continuous sound wave can be saved (otherwise too much disk space would be needed). Nyquist: a signal can be restored from its samples if the sample rate is more than twice the highest frequency it contains. Human hearing reaches roughly 20 kHz, so 44.1 kHz (44100 values per second), which covers tones up to 22.05 kHz, is enough to save the audible information without loss (meaning the original wave shape can be restored).

-> Quantization gives the bit depth: every sampled value has to be saved, so you have to sort a lot of different values into a given number of drawers. This determines the accuracy of your values, because only a limited number of possible values (drawers) is available. The number of values is given by the bit depth; you may know 16 bit for .wav (the container format by MS that can hold PCM-coded information), which allows 2^16 = 65536 values. If it is easier for you, you can think of it (simplified) as the number of digits each value is stored with.
Best scale to express levels: log (-> dB). Linear PCM itself stores the values on a linear scale.
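The sampling and "drawers" ideas above can be sketched in a few lines of Python. The function names are invented for this illustration, and the rounding rule (round to nearest, then clamp) is just one simple way to quantize, not a description of what any particular A/D converter does.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


def quantization_levels(bits: int) -> int:
    """Number of distinct values (drawers) available at a bit depth."""
    return 2 ** bits


def quantize(x: float, bits: int) -> int:
    """Put a value from [-1.0, 1.0] into the nearest signed integer drawer."""
    scale = 2 ** (bits - 1)
    code = round(x * scale)
    # Clamp, because +1.0 would land one step above the largest code.
    return max(-scale, min(scale - 1, code))


print(nyquist_frequency(44100))   # 22050.0
print(quantization_levels(16))    # 65536
print(quantize(0.5, 16), quantize(0.5, 8))
```

The same input value of 0.5 ends up as 16384 at 16 bit and as 64 at 8 bit: the fewer drawers there are, the coarser the steps between neighboring values.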

How do the frequency and the amplitude characterize the tone?
Frequency gives the pitch. Amplitude is clear, I think: higher volume/sound level (higher velocity of the density distribution -> higher impulse transfer when hitting the microphone membrane -> higher induced current -> higher A/D converter value).
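On the digital side, the two properties show up directly when a tone is generated as 16-bit samples. A minimal sketch, where the function name, the 441 Hz frequency, and the peak value of 15000 are arbitrary choices for illustration:

```python
import math


def sine_samples(frequency_hz, peak, sample_rate_hz, count):
    """Generate count 16-bit sample values of a sine tone.

    frequency_hz sets the pitch; peak (0..32767) sets the loudness.
    """
    return [
        round(peak * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz))
        for n in range(count)
    ]


# One full period of a 441 Hz tone at 44100 samples per second is 100 samples.
tone = sine_samples(441, 15000, 44100, 100)
print(tone[:5])
print(max(tone), min(tone))
```

Doubling `frequency_hz` fits two periods into the same 100 samples (the pitch goes up an octave), while changing `peak` only stretches the numbers up and down without changing the pitch.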

Maybe google for some further info. OK, not maybe: just do it as soon as you can, as this is basic stuff.

 

Reply #8
I made a mistake:

Of course amplitude is not given by the velocity of the density distribution; the velocity is determined by the medium.
Amplitude is given by the pressure of the density distribution, which is higher when the source oscillates more strongly.