Trying to reconstruct a headerless WAV file

2006-06-08 04:22:12

Hi everyone! I noticed that you people really know your stuff, so I brought my headache here for treatment.

I have a SanDisk SDMX1-1024R voice recorder. It's not terrible, but it occasionally produces a headerless, unplayable file, which I've been storing away to mess with on a rainy day. It's quite a thunderstorm tonight.

Yesterday I ended up with a 37355354-byte file that appears to be just data, with no header. This seems to be the right size, or approximately the right size, for the data, so I hope nothing has been truncated. I have tried manually constructing a header for the file, but I've had no success playing it back (the only sound is popping).

The files are Microsoft ADPCM WAVs (wFormatTag = 0x0002). There are no settings to change, so all the files produced by this device have similar headers, which are 90 bytes long. Here is the full header of a working file it produced, hex formatted:

Code: [Select]

52 49 46 46 52 88 00 00 57 41 56 45 66 6D 74 20 32 00 00 00 02 00 01 00 40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 00 01 00 00 00 02 00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01 30 FF 88 01 18 FF 66 61 63 74 04 00 00 00 A0 09 01 00 64 61 74 61 00 88 00 00

This file is 34906 bytes long, of which 90 is header and 34816 is data.
I'm sure many of you are familiar with WAV header formatting, but I need as much help as I can get, so I'll break it down for everyone's (including my own) quick reference. ASCII letters are big endian, all other numbers are little endian.

Code: [Select]

52 49 46 46    ChunkID: The ASCII letters "RIFF".

52 88 00 00    ChunkSize: 0x00008852, that's decimal 34898. This is the size of the rest of the file, minus these first eight bytes for ChunkID and ChunkSize.

57 41 56 45    Format: The ASCII letters "WAVE".

66 6D 74 20    Subchunk1ID: The ASCII letters "fmt ".

32 00 00 00    Subchunk1Size: 0x00000032, or decimal 50, the size in bytes of the rest of this "fmt " subchunk.

02 00          wFormatTag: 0x0002. This specifies that this WAV file is WAVE_FORMAT_ADPCM.

01 00          nChannels: 0x0001 for mono.

40 1F 00 00    nSamplesPerSec: 0x00001F40 or decimal 8000 samples per second.

00 10 00 00    nAvgBytesPerSec: 0x00001000 or decimal 4096. I don't understand yet why it's this number and not decimal 4000.

00 01          nBlockAlign: 0x0100 or decimal 256. I don't completely understand it, but I have a reference that states: "Playback software needs to process a multiple of <nBlockAlign>  bytes of data at a time, so that the value of <nBlockAlign>  can be used for buffer alignment."

04 00          wBitsPerSample: 0x0004 for 4 bits per sample in ADPCM.

20 00          cbSize: 0x0020 or decimal 32, the size in bytes of the following extended information specifically relating to the ADPCM format.

F4 01          nSamplesPerBlock: 0x01F4 or decimal 500. I think that this means there are 500 samples stored across each 256 byte block. Note that the ratio of nSamplesPerBlock to nBlockAlign is 1.953125, the same as the ratio of nSamplesPerSec to nAvgBytesPerSec. 

07 00          nNumCoef: 0x0007, the number of coefficient sets used in encoding.

00 01 00 00    The first coefficient set, 0x0100 and 0x0000, decimal 256 and 0. Note that these and all the coefficient sets are signed values.

00 02 00 FF    0x0200, 0xFF00. decimal 512, -256.

00 00 00 00    0x0000, 0x0000. decimal 0, 0.

C0 00 40 00    0x00C0, 0x0040. decimal 192, 64.

F0 00 00 00    0x00F0, 0x0000. decimal 240, 0.

CC 01 30 FF    0x01CC, 0xFF30. decimal 460, -208.

88 01 18 FF    The seventh and last coefficient set. 0x0188, 0xFF18. decimal 392, -232. 

66 61 63 74    Subchunk2ID: The ASCII letters "fact".

04 00 00 00    Subchunk2Size: 0x00000004, indicating that the rest of the contents of this subchunk will take up 4 bytes.

A0 09 01 00    Subchunk2Content: 0x000109A0 or decimal 68000. My reference says this "specifies the time length of the data in samples."

64 61 74 61    Subchunk3ID: The ASCII letters "data".

00 88 00 00    Subchunk3Size: 0x00008800 or decimal 34816. This is the end of the WAV header, and it contains the number of bytes in the rest of the file (the actual data). Note that the ratio of Subchunk2Content to Subchunk3Size is 1.953125, the same ratio seen before: the ratio of nSamplesPerBlock to nBlockAlign and the ratio of nSamplesPerSec to nAvgBytesPerSec.
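For anyone who wants to check the breakdown above mechanically, here is a minimal Python sketch that unpacks the same 90 bytes with the standard `struct` module. The function name `parse_header` and the dictionary keys are labels made up for this example, and it assumes exactly this 90-byte layout ("fmt ", "fact", "data" in that order), not WAV files in general.

```python
import struct

# The 90-byte header of the working file, copied from the hex dump above.
HEADER_HEX = (
    "52 49 46 46 52 88 00 00 57 41 56 45 66 6D 74 20 32 00 00 00 02 00 01 00 "
    "40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 00 01 00 00 00 02 "
    "00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01 30 FF 88 01 18 FF 66 61 "
    "63 74 04 00 00 00 A0 09 01 00 64 61 74 61 00 88 00 00"
)


def parse_header(raw):
    """Unpack the fixed 90-byte layout described above (assumed layout)."""
    if len(raw) != 90:
        raise ValueError("expected exactly 90 header bytes")
    # "<" means little endian with no padding; 4s is a 4-byte ASCII tag.
    riff, chunk_size, wave = struct.unpack_from("<4sI4s", raw, 0)
    fmt_id, fmt_size = struct.unpack_from("<4sI", raw, 12)
    (tag, channels, rate, avg_bytes, block_align, bits, cb_size,
     samples_per_block, n_coef) = struct.unpack_from("<HHIIHHHHH", raw, 20)
    # Coefficient pairs are signed 16-bit values, hence the lowercase "h".
    coefs = [struct.unpack_from("<hh", raw, 42 + 4 * i) for i in range(n_coef)]
    fact_id, fact_size, fact_samples = struct.unpack_from("<4sII", raw, 70)
    data_id, data_size = struct.unpack_from("<4sI", raw, 82)
    if (riff, wave, fmt_id, fact_id, data_id) != (
            b"RIFF", b"WAVE", b"fmt ", b"fact", b"data"):
        raise ValueError("chunk IDs are not where this layout expects them")
    return {
        "chunk_size": chunk_size,
        "fmt_size": fmt_size,
        "format_tag": tag,
        "channels": channels,
        "samples_per_sec": rate,
        "avg_bytes_per_sec": avg_bytes,
        "block_align": block_align,
        "bits_per_sample": bits,
        "cb_size": cb_size,
        "samples_per_block": samples_per_block,
        "coefs": coefs,
        "fact_samples": fact_samples,
        "data_size": data_size,
    }


header = parse_header(bytes.fromhex(HEADER_HEX))
for name, value in header.items():
    print(f"{name:18} {value}")
```

One relation falls out of the parsed numbers: 8000 samples per second, divided by 500 samples per block, times 256 bytes per block, is exactly 4096. That matches nAvgBytesPerSec, so the field appears to be the byte rate implied by the block size rather than a simple "half a byte per sample" figure.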

The only values that change in the headers of this device's WAVs are ChunkSize (at offset 4-7), the "fact" content I'm calling Subchunk2Content (at offset 78-81), and the Subchunk3Size (at offset 86-89). Again, this final variable is simply the size of the data in bytes, and the other two variables should be derivable from it. In decimal, one can add 82 to Subchunk3Size to get ChunkSize, and one can multiply Subchunk3Size by 1.953125 to get Subchunk2Content.
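The derivation in the paragraph above can be sketched in Python as follows. `build_header` is a hypothetical helper, the fixed "fmt " payload is copied from the working file, and rounding the sample count down when the data is not a whole number of blocks is an assumption of this sketch, not something the format description here settles.

```python
import struct

# The 50-byte "fmt " payload, which never changes on this device
# (copied from the working file's header).
FMT_PAYLOAD = bytes.fromhex(
    "02 00 01 00 40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 "
    "00 01 00 00 00 02 00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 "
    "CC 01 30 FF 88 01 18 FF"
)
SAMPLES_PER_BLOCK = 500
BLOCK_ALIGN = 256


def build_header(data_size):
    """Return a 90-byte header for `data_size` bytes of MS ADPCM data."""
    chunk_size = data_size + 82  # everything after the first 8 bytes
    # Integer form of data_size * 1.953125; rounds down (assumption).
    fact_samples = data_size * SAMPLES_PER_BLOCK // BLOCK_ALIGN
    return (
        struct.pack("<4sI4s", b"RIFF", chunk_size, b"WAVE")
        + struct.pack("<4sI", b"fmt ", len(FMT_PAYLOAD)) + FMT_PAYLOAD
        + struct.pack("<4sII", b"fact", 4, fact_samples)
        + struct.pack("<4sI", b"data", data_size)
    )


# Rebuild the working file's header from its data size alone.
print(build_header(34816).hex(" ").upper())
```

Calling `build_header(34816)` should reproduce the working header byte for byte, which is a quick way to confirm that only the three size fields vary.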

So, again, I have this headerless file that is 37355354 bytes long (0x0239FF5A). That would give a value of 37355436 (0x0239FFAC) for ChunkSize. A problem shows up when deriving Subchunk2Content: 37355354 * 1.953125 = 72959675.78125, which is not an integer. I tried to wing it anyway, rounding down to 72959675 (0x045946BB) and then rounding up to 72959676 (0x045946BC). This was useless: both values gave me just the popping sound.

You'll notice that 37355354 is not a multiple of the nBlockAlign, 256. So it seems that something has been truncated, or perhaps necessary padding has not been added. I suspect I must bring the data block up to the next multiple, which would be 37355520.
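The block arithmetic behind that observation is easy to check. This is a small sketch with a made-up helper name, `block_stats`, using the 256-byte block and 500 samples per block from the working header.

```python
BLOCK_ALIGN = 256
SAMPLES_PER_BLOCK = 500


def block_stats(size):
    """Split a byte count into whole 256-byte blocks plus a remainder."""
    full_blocks, remainder = divmod(size, BLOCK_ALIGN)
    return {
        "full_blocks": full_blocks,
        "remainder": remainder,
        # Size after padding up to the next block boundary.
        "padded_size": size if remainder == 0 else (full_blocks + 1) * BLOCK_ALIGN,
        # Size after dropping the partial block instead.
        "truncated_size": full_blocks * BLOCK_ALIGN,
        "samples_in_full_blocks": full_blocks * SAMPLES_PER_BLOCK,
    }


for name, value in block_stats(37355354).items():
    print(f"{name:24} {value}")
```

For 37355354 bytes this gives 145919 full blocks with 90 bytes left over, a padded size of 37355520 and a truncated size of 37355264. The leftover of 90 happens to equal the header length of this device's files. That may be a coincidence, but it could also mean the file holds a whole number of blocks plus 90 bytes of something else, which would be worth ruling out before padding anything.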

If this were a straightforward uncompressed file, I would already be doing just that. But I don't understand how the compression works, how I must handle the coefficients, whether I can just start adding 00's, or what. Also I am confused by the popping sounds... I would expect that if I just needed to add padding to the end of the file, it would play fine until I got there. If I need to pad the beginning, are 00's sufficient or do I need to work with the coefficients?

So, can anyone help me, or steer me in a useful direction from here? Any pointers would be appreciated.

Reply #1 – 2006-06-08 05:53:00

The header says ADPCM, right? In which case it's not compressed.

Anyway, do you have MATLAB? You could read in the file as an array, and use the "wavwrite" function to repack it in a new container. If not, there are probably other programs out there that can handle raw PCM.

Reply #2 – 2006-06-08 07:21:05

ADPCM is not raw PCM, it actually is lossy compressed.

Reply #3 – 2006-06-08 07:24:35

ADPCM compression is not lossy. Sure, it might have been created lossily from a higher-bit-depth source, but ADPCM is no more lossy than 8-bit bitmaps.

Reply #4 – 2006-06-08 07:46:53

Well, yeah, it's a kind of encoding. Anyway, purecane - SoX can handle raw files (including several kinds of ADPCM), perhaps you'll have some luck with it.

EDIT: I just did a little test. I recorded an MS ADPCM encoded WAV file, stripped the header with RIFFStrip, renamed it to .vox (which is a headerless ADPCM format), and tried to convert it with SoX. The resulting file was full of noise, but I could hear the original recording through it. VOX seems to use a slightly different ADPCM encoding than MS, but maybe the resulting file's header would be of use to you...

Reply #5 – 2006-06-08 08:59:32

You generally need to specify all the parameters and encoding type with headerless files.

Reply #6 – 2006-06-08 12:47:58

Thanks for the help so far. Mike Giacomelli, do you know if the MATLAB wavwrite function needs raw PCM, or could it handle ADPCM?

rutra80, the .vox idea did work the way you described, although the result was a little bit terrifying. At least I can tell that the data really is intact down there. I'm getting a picture of what's happening here. The .vox format is Dialogic OKI ADPCM, and my reference indicates that this format uses a cbSize of 0, which means it's not storing coefficients in the file, and so maybe not using them at all (although they could be hardcoded in the codec instead). If SoX doesn't expect them when we force the .vox format, then it's hit and miss: sometimes the coefficients are supposed to be 0 anyway, so the results might be the same on certain samples. So some of the content comes through and is almost clear enough to understand.

Anyway, the .vox approach doesn't get close enough to finish with, so I added my "nearest guess" header:

Code: [Select]

52 49 46 46 AC FF 39 02 57 41 56 45 66 6D 74 20 32 00 00 00 02 00 01 00 40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 00 01 00 00 00 02 00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01 30 FF 88 01 18 FF 66 61 63 74 04 00 00 00 BC 46 59 04 64 61 74 61 5A FF 39 02

and then made another attempt with sox:

Code: [Select]

sox -a -t .wav -r 8000 -c 1 infile -t .wav outfile

This threw thousands of warnings, one for each sample, I think, all identical:

Code: [Select]

sox: MSADPCM bpred >= nCoef, arbitrarily using 0

I haven't gone through the source code yet to see exactly what that's all about.

But upon listening, the results were similar to the .vox results, except with more pops and less white noise. Hints of the underlying content are coming through. I'm guessing these points are where "arbitrarily using 0" coincides with where 0 was the coefficient used.
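A possible way to investigate that warning: in MS ADPCM, each mono block starts with a one-byte predictor index into the coefficient table, so with 7 coefficient sets that byte should be 0 to 6, and "bpred >= nCoef" reads like a complaint that it was not. The sketch below counts how many block-start bytes are valid for every possible starting offset within a 256-byte block. The helper names are made up, and this is only a diagnostic idea under the assumption that the blocks themselves are intact; it has not been run against the actual file.

```python
def count_valid_predictors(data, offset, block_align=256, n_coef=7):
    """Return (valid, total) for the block-start bytes found from `offset`."""
    starts = data[offset::block_align]  # first byte of each would-be block
    valid = sum(1 for b in starts if b < n_coef)
    return valid, len(starts)


def best_offsets(data, block_align=256, n_coef=7, top=3):
    """Rank starting offsets by the fraction of valid predictor bytes."""
    scores = []
    for offset in range(block_align):
        valid, total = count_valid_predictors(data, offset, block_align, n_coef)
        if total:
            scores.append((valid / total, offset))
    # Highest fraction first; ties broken by the smaller offset.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(offset, fraction) for fraction, offset in scores[:top]]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:  # path to the headerless file
        raw = f.read()
    for offset, fraction in best_offsets(raw):
        print(f"offset {offset:3d}: {fraction:.1%} of block starts look valid")
```

If one offset scores near 100% and offset 0 does not, the blocks probably do not begin at the first byte of the file, and the header's data chunk would need to start at that offset instead. If no offset stands out, the problem lies elsewhere.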

I'm still expecting that I will have to pad up the file to a multiple of 256, and I'll have to play with actually encoding each byte correctly rather than just pushing in 00's, because the nature of ADPCM is that the sound from each byte depends on the previous byte, and the difference between the two. So arbitrary 00's are unfortunately wrong.
