Hi everyone! I noticed that you all really know your stuff, so I brought my headache here for treatment.
I have a SanDisk SDMX1-1024R voice recorder. It's not terrible, but it occasionally produces a headerless, unplayable file, which I've been storing away to mess with on a rainy day. It's quite a thunderstorm tonight.
Yesterday I ended up with a 37355354 byte file that appears to be just data, with no header. This seems the right size, or approximately the right size, for the data, so I hope nothing has been truncated. I have tried manually constructing a header for the file, but I've had no success in playing it back (the only sound is popping).
The files are Microsoft ADPCM WAVs (wFormatTag = 0x0002). There are no settings to change, so all the files produced by this device have similar headers, which are 90 bytes long. Here is the full header of a working file it produced, hex formatted:
52 49 46 46 52 88 00 00 57 41 56 45 66 6D 74 20 32 00 00 00 02 00 01 00 40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 00 01 00 00 00 02 00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01 30 FF 88 01 18 FF 66 61 63 74 04 00 00 00 A0 09 01 00 64 61 74 61 00 88 00 00
This file is 34906 bytes long, of which 90 is header and 34816 is data.
I'm sure many of you are familiar with WAV header formatting, but I need as much help as I can get, so I'll break it down for everyone's (including my own) quick reference. The four-character ASCII tags are stored in reading order; all other numbers are little endian.
52 49 46 46 ChunkID: The ASCII letters "RIFF".
52 88 00 00 ChunkSize: 0x00008852, that's decimal 34898. This is the size of the rest of the file, minus these first eight bytes for ChunkID and ChunkSize.
57 41 56 45 Format: The ASCII letters "WAVE".
66 6D 74 20 Subchunk1ID: The ASCII letters "fmt ".
32 00 00 00 Subchunk1Size: 0x00000032, or decimal 50, the size in bytes of the rest of this "fmt " subchunk.
02 00 wFormatTag: 0x0002. This specifies that this WAV file is WAVE_FORMAT_ADPCM.
01 00 nChannels: 0x0001 for mono.
40 1F 00 00 nSamplesPerSec: 0x00001F40 or decimal 8000 samples per second.
00 10 00 00 nAvgBytesPerSec: 0x00001000 or decimal 4096. I don't understand yet why it's this number and not decimal 4000.
00 01 nBlockAlign: 0x0100 or decimal 256. I don't completely understand it, but I have a reference that states: "Playback software needs to process a multiple of <nBlockAlign> bytes of data at a time, so that the value of <nBlockAlign> can be used for buffer alignment."
04 00 wBitsPerSample: 0x0004 for 4 bits per sample in ADPCM.
20 00 cbSize: 0x0020 or decimal 32, the size in bytes of the following extended information specifically relating to the ADPCM format.
F4 01 nSamplesPerBlock: 0x01F4 or decimal 500. I think that this means there are 500 samples stored across each 256 byte block. Note that the ratio of nSamplesPerBlock to nBlockAlign is 1.953125, the same as the ratio of nSamplesPerSec to nAvgBytesPerSec.
07 00 nNumCoef: 0x0007, the number of coefficient sets used in encoding.
00 01 00 00 The first coefficient set, 0x0100 and 0x0000, decimal 256 and 0. Note that these and all the coefficient sets are signed values.
00 02 00 FF 0x0200, 0xFF00. decimal 512, -256.
00 00 00 00 0x0000, 0x0000. decimal 0, 0.
C0 00 40 00 0x00C0, 0x0040. decimal 192, 64.
F0 00 00 00 0x00F0, 0x0000. decimal 240, 0.
CC 01 30 FF 0x01CC, 0xFF30. decimal 460, -208.
88 01 18 FF The seventh and last coefficient set. 0x0188, 0xFF18. decimal 392, -232.
66 61 63 74 Subchunk2ID: The ASCII letters "fact".
04 00 00 00 Subchunk2Size: 0x00000004, indicating that the rest of the contents of this subchunk will take up 4 bytes.
A0 09 01 00 Subchunk2Content: 0x000109A0 or decimal 68000. My reference says this "specifies the time length of the data in samples."
64 61 74 61 Subchunk3ID: The ASCII letters "data".
00 88 00 00 Subchunk3Size: 0x00008800 or decimal 34816. This is the end of the WAV header, and it contains the number of bytes in the rest of the file (the actual data). Note that the ratio of Subchunk2Content to Subchunk3Size is 1.953125, the same ratio seen before in nSamplesPerBlock to nBlockAlign and in nSamplesPerSec to nAvgBytesPerSec.
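For anyone who wants to check the breakdown above without reading hex by eye, here is a minimal Python sketch that unpacks the same 90 bytes with the standard struct module. The function name parse_header and the dictionary keys are just labels I picked; the offsets assume the exact layout of this device's header and would not hold for WAV files in general. It also checks one relationship that seems to explain the 4096: nSamplesPerSec * nBlockAlign / nSamplesPerBlock.

```python
import struct

# The 90-byte header of the working file, copied from the hex dump in this post.
HEADER_HEX = """
52 49 46 46 52 88 00 00 57 41 56 45 66 6D 74 20 32 00 00 00 02 00 01 00
40 1F 00 00 00 10 00 00 00 01 04 00 20 00 F4 01 07 00 00 01 00 00 00 02
00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01 30 FF 88 01 18 FF 66 61
63 74 04 00 00 00 A0 09 01 00 64 61 74 61 00 88 00 00
"""
header = bytes.fromhex("".join(HEADER_HEX.split()))

def parse_header(h):
    """Unpack the fixed-layout header; offsets assume this device's files only."""
    riff, chunk_size, wave = struct.unpack_from("<4sI4s", h, 0)
    fmt_id, fmt_size = struct.unpack_from("<4sI", h, 12)
    (tag, channels, rate, avg_bytes, block_align, bits,
     cb_size, samples_per_block, num_coef) = struct.unpack_from("<HHIIHHHHH", h, 20)
    # Coefficient pairs are signed 16-bit values, starting right after nNumCoef.
    coefs = [struct.unpack_from("<hh", h, 42 + 4 * i) for i in range(num_coef)]
    fact_off = 20 + fmt_size
    fact_id, fact_size, fact_samples = struct.unpack_from("<4sII", h, fact_off)
    data_id, data_size = struct.unpack_from("<4sI", h, fact_off + 8 + fact_size)
    return {
        "ChunkSize": chunk_size, "wFormatTag": tag, "nChannels": channels,
        "nSamplesPerSec": rate, "nAvgBytesPerSec": avg_bytes,
        "nBlockAlign": block_align, "wBitsPerSample": bits, "cbSize": cb_size,
        "nSamplesPerBlock": samples_per_block, "coefs": coefs,
        "fact_samples": fact_samples, "data_size": data_size,
    }

fields = parse_header(header)
for name, value in fields.items():
    print(f"{name}: {value}")

# 8000 samples/s * 256 bytes/block / 500 samples/block = 4096 bytes/s
derived_avg = fields["nSamplesPerSec"] * fields["nBlockAlign"] // fields["nSamplesPerBlock"]
print("derived nAvgBytesPerSec:", derived_avg)
```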
The only values that change in the headers of this device's WAVs are ChunkSize (at offset 4-7), the "fact" content I'm calling Subchunk2Content (at offset 78-81), and the Subchunk3Size (at offset 86-89). Again, this final variable is simply the size of the data in bytes, and the other two variables should be derivable from it. In decimal, one can add 82 to Subchunk3Size to get ChunkSize, and one can multiply Subchunk3Size by 1.953125 to get Subchunk2Content.
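To make that derivation concrete, here is a rough Python sketch that rebuilds the 90-byte header from nothing but the data size. build_header is a name I made up, and the constants are simply the values from the working file. One assumption to flag loudly: the sample count is computed as whole 256-byte blocks times 500, which equals the 1.953125 multiplication only when the data size is a multiple of 256.

```python
import struct

BLOCK_ALIGN = 256          # nBlockAlign from the working header
SAMPLES_PER_BLOCK = 500    # nSamplesPerBlock from the working header
COEFS = [(256, 0), (512, -256), (0, 0), (192, 64),
         (240, 0), (460, -208), (392, -232)]

def build_header(data_size):
    """Build a header like this device's, given the size of the data in bytes."""
    # Assumption: only whole blocks contribute samples to the "fact" value.
    fact_samples = (data_size // BLOCK_ALIGN) * SAMPLES_PER_BLOCK
    fmt_body = struct.pack(
        "<HHIIHHHHH",
        0x0002,             # wFormatTag: Microsoft ADPCM
        1,                  # nChannels
        8000,               # nSamplesPerSec
        4096,               # nAvgBytesPerSec
        BLOCK_ALIGN,        # nBlockAlign
        4,                  # wBitsPerSample
        32,                 # cbSize
        SAMPLES_PER_BLOCK,  # nSamplesPerBlock
        len(COEFS),         # nNumCoef
    )
    for coef1, coef2 in COEFS:
        fmt_body += struct.pack("<hh", coef1, coef2)
    out = b"RIFF" + struct.pack("<I", data_size + 82) + b"WAVE"
    out += b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
    out += b"fact" + struct.pack("<II", 4, fact_samples)
    out += b"data" + struct.pack("<I", data_size)
    return out

rebuilt = build_header(34816)
print(len(rebuilt))
print(rebuilt.hex(" ").upper())
```

Called with 34816, this should reproduce the working file's header byte for byte, which is a handy check before pointing it at a broken file.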
So, again, I have this headerless file that is 37355354 bytes long (0x0239FF5A). That would give a value of 37355436 (0x0239FFAC) for ChunkSize. A problem shows up when deriving Subchunk2Content: 37355354 * 1.953125 = 72959675.78125, which is not an integer. I tried to wing it anyway, first rounding down to 72959675 (0x045946BB), then rounding up to 72959676 (0x045946BC). This was useless: both values gave me just the popping sound.
You'll notice that 37355354 is not a multiple of nBlockAlign, 256. So it seems that something has been truncated, or perhaps necessary padding has not been added. I suspect I must bring the data block up to the next multiple, which would be 37355520.
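Here is a small Python sketch of that block arithmetic, just to have the numbers in one place. block_report is a made-up helper name; the two constants come from the working header.

```python
BLOCK_ALIGN = 256          # nBlockAlign from the header
SAMPLES_PER_BLOCK = 500    # nSamplesPerBlock from the header

def block_report(data_size):
    """Split a byte count into whole blocks plus leftover bytes."""
    full_blocks, leftover = divmod(data_size, BLOCK_ALIGN)
    padding = (BLOCK_ALIGN - leftover) % BLOCK_ALIGN
    return {
        "full_blocks": full_blocks,
        "leftover_bytes": leftover,
        "padding_to_next_block": padding,
        "padded_size": data_size + padding,
        "samples_in_full_blocks": full_blocks * SAMPLES_PER_BLOCK,
    }

report = block_report(37355354)
for name, value in report.items():
    print(f"{name}: {value}")
```

For 37355354 this gives 145919 whole blocks with 90 bytes left over, and 166 bytes of padding to reach 37355520. The leftover happens to equal the 90-byte header length. That may be a coincidence, but it might also mean the file has 90 bytes of something sitting where the header belongs, in which case the real data would start at offset 90 and nothing would need padding. That is only a guess from the arithmetic.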
If this were a straightforward uncompressed file, I would already be doing just that. But I don't understand how the compression works, how I must handle the coefficients, whether I can just start adding 00's, or what. Also I am confused by the popping sounds... I would expect that if I just needed to add padding to the end of the file, it would play fine until I got there. If I need to pad the beginning, are 00's sufficient or do I need to work with the coefficients?
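One thing that might help narrow this down: as far as I understand the Microsoft ADPCM layout, each mono block starts with a 7-byte preamble (a one-byte predictor index into the coefficient table, then a 16-bit delta and two 16-bit starting samples), and the remaining 249 bytes hold 498 four-bit samples, which accounts for the 500 samples per block. If that holds, the first byte of every block should be 0 through 6, and blocks that start at the wrong offset would decode as noise. The Python sketch below scores every possible starting offset by how often that first byte looks valid. The function names and the max_blocks limit are my own inventions, and this is a heuristic check, not a decoder.

```python
def predictor_hit_rate(data, offset, block_align=256, num_coef=7, max_blocks=2000):
    """Fraction of blocks, starting at offset, whose first byte is a valid predictor index."""
    hits = 0
    total = 0
    for pos in range(offset, len(data) - block_align + 1, block_align):
        if total >= max_blocks:
            break
        total += 1
        if data[pos] < num_coef:
            hits += 1
    return hits / total if total else 0.0

def best_offsets(data, block_align=256, top=3):
    """Rank candidate start offsets 0..block_align-1 by predictor hit rate."""
    scored = [(predictor_hit_rate(data, off, block_align), off)
              for off in range(block_align)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]

# Synthetic demo: 90 junk bytes, then 10 blocks that each begin with predictor index 3.
demo = b"\xff" * 90 + (bytes([3]) + b"\xff" * 255) * 10
print(best_offsets(demo))

# With a real file (the path here is hypothetical):
# with open("broken.wav", "rb") as f:
#     print(best_offsets(f.read()))
```

If one offset stands out with a hit rate near 1.0 on the real file, that would suggest where the first block begins. The data from that offset onward, trimmed to a whole number of 256-byte blocks, would then be what the header's data size should describe.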
So, can anyone help me, or steer me in a useful direction from here? Any pointers would be appreciated.