
Topic: 2-bit per sample ADPCM variant?

2-bit per sample ADPCM variant?
I honestly do not know where to post, but since WavPack's algorithm is a bit close to ADPCM, I thought of asking it here. If I am in the wrong place, or I shouldn't have even asked, then please forgive me and just delete this post.

I'm not doing anything particularly interesting, but I am experimenting with my own audio codec, which is very specialized in purpose. It's too long to explain here; if you need some background, please check this: This is what I'm up to. It's already working, and the sound quality at 2.32 bits per sample (32 kHz sampling rate, partially stereo) is already quite acceptable. An example at 2.32 bits per sample (a total of 248 kbit/s once compressed lossily with JPEG), encoded by the latest alpha version, is attached.

I figured that ADPCM might be more suitable for my purpose than reinventing the wheel, which is what led to the current sound quality, but ADPCM's 4 bits per 16-bit sample is still a bit too high for what I intend. Is it possible to adapt ADPCM so that it works with 2 bits per 8-bit sample? I honestly do not care much about the resulting audio quality, as long as the result is identifiable and the data is not very sensitive to corruption. For example, replacing some bits in an Opus file will either trigger silence or make the decoder skip all corrupted frames (I don't know, I never tried), but replacing some bytes in a WAV file will only result in audible clicks or otherwise harmless distortion. I would prefer that the codec react to corruption the way WAV does.

Can anyone please at least give me hints, or something?


  • polemon
Re: 2-bit per sample ADPCM variant?
Reply #1
2.32 bits/sample? So there's dithering involved, etc.?

Btw. this is really interesting!

Re: 2-bit per sample ADPCM variant?
Reply #2
Actually, no, the program does not dither at all. It outputs 8-bit samples. (I did think about supporting 16-bit audio, but the codec itself is pretty noisy because of the limitations I'm working with, so I gave up, thinking that bit depth isn't the problem here, but rather the limitations of the codec and JPEG compression.) It's very close to delta PCM encoding, but with a variable and adaptive jump size. If you encode an audio file and look at the far right of the image, you will see binary pixels (either white or black, no greys). That's the jump size for that frame (the row of pixels), which the program computes as the optimal one for that frame. A smaller than optimal jump size produces muffled sound with less punch, while a larger than optimal jump size produces really noisy audio. You can change the encoder's tendency to choose larger or smaller than optimal jump sizes using the slider in the settings :)
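As a rough illustration of the per-frame jump size idea described above (not the actual encoder, whose details are not given in this thread), the sketch below codes each sample as one of four deltas scaled by a frame-wide step and picks the step with the lowest squared error. The delta multipliers in `LEVELS`, the candidate steps, the starting value of 128, and all function names are assumptions made for the example.

```python
# Assumed 2-bit delta multipliers; the real codec's table is not known.
LEVELS = (-1.5, -0.5, 0.5, 1.5)


def _clamp(value):
    """Keep the running prediction inside the 8-bit sample range."""
    return max(0.0, min(255.0, value))


def encode_frame(samples, step, start=128):
    """Encode one frame with a fixed step; return (codes, last_pred, sq_error)."""
    codes, pred, err = [], float(start), 0.0
    for s in samples:
        # Pick the 2-bit code whose delta lands closest to the target sample.
        code = min(range(4), key=lambda c: abs(s - (pred + LEVELS[c] * step)))
        pred = _clamp(pred + LEVELS[code] * step)
        codes.append(code)
        err += (s - pred) ** 2
    return codes, pred, err


def decode_frame(codes, step, start=128):
    """Rebuild the samples by accumulating the scaled deltas."""
    out, pred = [], float(start)
    for c in codes:
        pred = _clamp(pred + LEVELS[c] * step)
        out.append(pred)
    return out, pred


def best_step(samples, candidates=(1, 2, 4, 8, 16, 32), start=128):
    """Try every candidate step and keep the one with the least squared error."""
    return min(candidates, key=lambda st: encode_frame(samples, st, start)[2])
```

In this sketch a step that is too small cannot keep up with fast changes (the muffled case), and a step that is too large overshoots on every sample (the noisy case), which is why the search is done per frame.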

Unfortunately, even if I implement dithering, the data integrity errors that JPEG compression introduces into the image will produce noise anyway, so I don't think it would improve the audio quality. I mean, WavPack at 145 kbps performs transparently, while my program at 248 kbps (JPEG compressed) is like an out-of-tune analog FM radio, LOL. Not encoding the image as JPEG shows the true quality of the audio codec, but really, it's still worse than WavPack.

The main problem I'm facing is data integrity. JPEG can and will alter the values of the pixels, so my codec can't just sit around and expect that the bits it writes into the image will be the same when it reads them back. WavPack performs really well at 145 kbps because (slight) alterations to the audio stream are not expected at all, and I still haven't tested how WavPack reacts to modified samples. I think it'll just skip decoding that frame. My codec, on the contrary, tries to continue decoding while making the error sound as unnoticeable as possible.

Some codecs like DSD produce high bit depth output even though they store data as 1 bit per sample, because they use a ridiculously high sample rate in the megahertz range. My program stores 32 kHz stereo 8-bit audio as 32 kHz mid-side 2.32-bit audio, so... yeah. It's kind of like my own implementation of ADPCM.

I am honestly very happy to see someone interested in this pretty useless project of mine. I originally made this program to send small files via Facebook Messenger (which is free in my country) back when I didn't have a proper internet connection and only had a mobile phone (Android FB Messenger can't send files, just images). A bit of research and countless failures resulted in an additional lossy audio encoding mode, so that music can be uploaded to Facebook (which does not allow music). In the end, I developed 4 audio codecs for storing audio in images that can withstand the changes that JPEG compression makes to the image: a linear codec, which directly maps 8-bit samples to pixel brightness (once JPEG compressed, most high frequencies are lost); a logarithmic codec (a complete failure); a DPCM-based codec (better, but not that great); and finally, this ADPCM-like codec. Then support for stereo was added (the two images produced can be played together to get stereo audio, or just one image to get mono). Finally, what is currently being implemented is support for incremental 32 kHz audio: you get 16 kHz sample rate stereo if you have two images, but 32 kHz sample rate partially stereo if you have 4 images. By "partially", I mean that frequencies above 8 kHz will be mono to conserve the number of pixels used.

All I'm trying to do here is improve the sound quality while maintaining the number of pixels used per sample (which is currently 1 pixel per sample), or maybe even reducing it.
  • Last Edit: 11 January, 2018, 09:23:05 AM by OrthographicCube

  • j7n
Re: 2-bit per sample ADPCM variant?
Reply #3
CoolEdit Pro includes a 2-bit IMA ADPCM codec. The format is apparently decodable with ffmpeg. But it is not exactly error tolerant. A bit error usually causes a wild swing of the volume level into clipping until the codec resets itself on the next block.
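To make the error behaviour above concrete, here is a toy 2-bit adaptive decoder. It is only a sketch: it does not use the real IMA step or index tables, and the bit layout (bit 1 as sign, bit 0 as magnitude), the step adaptation rule, and the name `decode_block` are all assumptions. The point is that the step size is decoder state, so one wrong code skews every later sample in the block.

```python
def decode_block(codes, first_sample=0, step=4):
    """Decode one block of 2-bit codes with an adaptive step (toy rules)."""
    out, pred = [], first_sample
    for c in codes:
        large = c & 1                      # assumed: bit 0 selects a large delta
        delta = step // 2 + (step if large else 0)
        pred += -delta if c & 2 else delta  # assumed: bit 1 is the sign
        pred = max(-32768, min(32767, pred))
        # Grow the step after a large delta, shrink it after a small one.
        step = min(4096, step * 2) if large else max(1, step * 3 // 4)
        out.append(pred)
    return out


codes = [1, 1, 0, 2, 3, 0, 2, 0]
corrupted = list(codes)
corrupted[5] ^= 1                          # a single flipped bit

clean = decode_block(codes)
damaged = decode_block(corrupted)
# Samples before the flipped bit match; every sample after it is off,
# because both the prediction and the step size were thrown off.
```

Starting each block from header values (here `first_sample` and `step`) is what limits the damage to one block.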

Re: 2-bit per sample ADPCM variant?
Reply #4
That's awesome. My program does exactly the same thing when it encounters data corruption, although in this case the original 8-bit data is decoded as 64-bit integers, so clipping is not an issue, and the program automatically fixes the "swinging of volume level" internally, so the output does not contain that artifact even if most of the data has been modified.
However, if you look closely at the output when severe data corruption is present and the song is loud enough for quantization patterns to be visible, you can see how the program tries to cope with the "volume level swinging" (I have highlighted the pattern):

Here's what the waveform looks like if the fix is disabled and the source image is heavily corrupted by JPEG:

Fortunately, due to how the program handles (and actually anticipates) data corruption, the result is only an audible but not annoying hiss, even with hundreds of data errors in a second.

I have included two example outputs of the program. The first ("izayoi") is an example of how the codec sounds with exactly the same settings as the original example, just on a different song sample, with tons of errors in the decoded source image because of JPEG compression. I guess up to 30% of the audio samples no longer have the value that the encoder originally wrote. I have done some experiments to confirm that a significant number of pixels are corrupted, but I still haven't done any numerical computations.

The second ("venus") is the exact same song as in the original post (which also contains tons of pixel corruption), but this time the output is NOT compressed with JPEG, so this is exactly the audio that the encoder outputs, without any damage done by JPEG compression.

I'll check out that version of ADPCM, although CoolEdit is proprietary. Any chance I can see working source code, or at least an explanation of how the 2-bit version differs from the 4-bit version? Sorry for being annoying XD Thanks!
  • Last Edit: 11 January, 2018, 10:50:42 AM by OrthographicCube

Re: 2-bit per sample ADPCM variant?
Reply #5
So I was able to implement a real ADPCM encoder/decoder in MATLAB using the code from but of course I adapted it for 2-bit encoded samples. Honestly, the normal 4-bit variant isn't that bad. Sure, it already shows signs of the worse to come at 2 bits per sample, but it's not that bad. However, once I implemented my own 2-bit version... wow. It sounds really bad, and the image still hasn't been touched by JPEG compression! It sounds worse than my own "variable" DPCM variant. The high frequencies are really painful to the ears, not to mention that the inherent hiss is not hiss at all; it sounds like, I dunno, sandpaper!

Really, really bad. So I believe I will stick to my own codec, and maybe just fine-tune the numbers a bit more--who knows, I may achieve better audio just by changing the algorithm a bit.

Attached are samples of the same song, same part, featuring my own nameless encoder (not JPEG compressed) @ 2.32 bits per sample, and my 2-bit version of ADPCM (not JPEG compressed) @ 2 bits per sample. I don't need ABX tests to prove this, do I? I mean, it sounds too horrible to not notice!

Thanks for everybody's responses, I appreciate it!^_^

  • j7n
Re: 2-bit per sample ADPCM variant?
Reply #6
I agree that it sounds bad, and significantly worse at 32 kHz than at the 44 kHz that I tried. Have you explored the possibility of using Opus at a very low bitrate but encoding the result losslessly at maybe 2 shades per pixel?

Re: 2-bit per sample ADPCM variant?
Reply #7
I did, but unfortunately, it doesn't work.

Here's the problem. Facebook not only converts uploaded images to JPEG, but also scales an image down if it gets too big.
Fortunately, my program does support lossless data storage options (black for 0, white for 1, plus an 8-bit checksum for every 1016 bits), and storing 16 kbps Opus for a song shorter than 3:30 in an image does indeed work fine, even when uploaded to Facebook. Using #000000 (black) and #FFFFFF (white) is enough that even if JPEG alters the pixels slightly, quantization can still round them back to their original values. However, we all know how bad 16 kbps Opus sounds.
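A minimal sketch of that storage scheme follows, assuming greyscale pixel values from 0 to 255. The checksum function is not specified in this thread, so the byte sum modulo 256 used here is an assumption, as are the function names and the MSB-first bit order.

```python
BLOCK_BYTES = 127  # 1016 data bits, followed by one 8-bit checksum


def checksum8(block):
    """Assumed checksum: sum of the block's bytes modulo 256."""
    return sum(block) & 0xFF


def bytes_to_pixels(data):
    """Map each bit to a pixel: 0 -> black (0), 1 -> white (255)."""
    pixels = []
    for i in range(0, len(data), BLOCK_BYTES):
        block = data[i:i + BLOCK_BYTES]
        for byte in block + bytes([checksum8(block)]):
            for bit in range(7, -1, -1):
                pixels.append(255 if (byte >> bit) & 1 else 0)
    return pixels


def pixels_to_bytes(pixels):
    """Threshold at mid-grey, then strip and verify each block's checksum."""
    bits = [1 if p >= 128 else 0 for p in pixels]
    raw = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for b in bits[i:i + 8]:
            value = (value << 1) | b
        raw.append(value)
    data, ok = bytearray(), True
    for i in range(0, len(raw), BLOCK_BYTES + 1):
        framed = raw[i:i + BLOCK_BYTES + 1]
        block, stored = framed[:-1], framed[-1]
        ok = ok and checksum8(block) == stored
        data += block
    return bytes(data), ok
```

With only two levels, a pixel has to drift by more than about half the brightness range before the threshold misreads it, which is why mild JPEG damage is rounded away.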

Unfortunately, storing 2 bits per pixel is not feasible due to JPEG compression. This time, if JPEG alters the image too much, quantization cannot restore a pixel to its original value, so data corruption is introduced. And if data that must not be altered, for example an Opus stream, is altered, very annoying audible corruption is produced.
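The difference between 1 and 2 bits per pixel can be put in numbers. The sketch below assumes evenly spaced grey levels over 0 to 255 and nearest-level decoding; the helper names are made up for the example.

```python
def levels(bits):
    """Evenly spaced grey levels for the given number of bits per pixel."""
    count = 2 ** bits
    return [round(255 * i / (count - 1)) for i in range(count)]


def nearest_symbol(value, bits):
    """Index of the grey level closest to a (possibly altered) pixel value."""
    lv = levels(bits)
    return min(range(len(lv)), key=lambda i: abs(value - lv[i]))


def margin(bits):
    """Largest pixel error that nearest-level decoding can still absorb."""
    return 255 / (2 ** bits - 1) / 2


# 1 bit per pixel: levels 0 and 255, errors up to 127.5 are rounded away.
# 2 bits per pixel: levels 0, 85, 170, 255, so the margin shrinks to 42.5.
```

Under these assumptions a pixel error of 60 is harmless at 1 bit per pixel but turns symbol 1 into symbol 2 at 2 bits per pixel.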

My codec still works acceptably even though it stores 2.32 bits of compressed audio samples per pixel, because it was made to not glitch too much even under heavy corruption. Opus, on the other hand, was made to handle packet loss, but not this much packet loss.

Also, my codec was designed so that you get increasing quality the more "parts" of the audio you have. With 1 picture you get 16 kHz mono, with 2 pictures you get 16 kHz stereo, and with 4 pictures you get 32 kHz (partially) stereo audio. With Opus, you get 1 image that contains a 16 kHz Opus stream, with no further quality improvements possible. Ramping the bitrate up, for example to 48 kbps, which is perfectly acceptable in this kind of application, would make the image too big, so Facebook would scale it down and destroy the data. Also, I can't just cut the Opus stream into, for example, two images, because then with only 1 image you can't even listen to the entire song.
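One common way to get the "one image is mono, two images are stereo" behaviour is a mid/side split, sketched below. This only illustrates the layering idea; how the actual program lays out its images is not described in this thread, and the function names are assumptions.

```python
def to_mid_side(left, right):
    """Split stereo into a mono-compatible mid layer and a side layer."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_layers(mid, side=None):
    """Rebuild audio from whichever layers are available."""
    if side is None:
        # Only the base layer is present: play it on both channels (mono).
        return list(mid), list(mid)
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

The base layer alone is always a complete, listenable song, and each extra layer only adds detail, which is the property a single split Opus stream does not have.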

(Please note that I do not intend to actually post music on Facebook. Who would? This is only for educational purposes and, as you may have already realized, is pretty useless in everyday use. However, I do upload images to Facebook, set to private, for testing, so that I can see what happens to the image once it is uploaded. All compressed samples of the original codec that I attached to my posts have been uploaded to Facebook (privately), compressed to JPEG by Facebook itself, downloaded from Facebook in JPEG form, and then decoded. The samples are decoded from a complete set of 4 images so that the best quality can be heard. However, complete audio can still be decoded even if you have just the base image.)