Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Hi, everybody.

I have been developing a (pretty useless, I know) audio encoder that stores data in images. This isn't "steganography", as it doesn't try to hide the data. It just uses images as its storage medium. I'm just doing this as a hobby. (Well, originally, I used this to send small ZIP files losslessly via Facebook, because Facebook is free in my country, I don't have paid internet, and I need to send small files. Later on, I developed an ADPCM-based lossy codec so it can also save audio lossily into images.)

Here is the latest version if you want to try it out, but please don't bother.
https://sites.google.com/site/orthographiccube/home/software/bitmapper2
(Version 3 is coming soon, with a complete rewrite of the program from C# to Java + JavaFX [it looks so gorgeous, and it works faster!]. Currently, only the lossless data storage option is implemented in version 3, but the original lossy audio codecs should be implemented soon enough.)

It works, but the dilemma here is that while the program saves PNG images (call it the FLAC [or WavPack, depending on your preference] of the image formats), the audio data should be able to survive when the PNG is converted to JPEG (the MP3 of the image formats). The audio data does survive the lossy conversion, and thanks to the ADPCM-based codec it uses, the distortion to the data is kept to a minimum. Audible rumble and noise are present, but fortunately, the primitive DC offset correction in the decoder keeps this tolerable.

Anyway, the real problem is that (for example) Facebook downsizes images if they are uploaded too large. Resizing the image destroys the file (image) header pixels and loses audio samples, making the image useless. So keeping the image small enough is the first priority. I achieved this by adding mid-side stereo coding, which produces 2 separate images (the "mid" channel is playable on its own but gives mono audio), AND by limiting the encoded audio to less than 4m30s of 8-bit audio at a 16 kHz sampling rate.
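
A minimal sketch of the mid-side idea described above, in Python. The function names are hypothetical and this is not the program's actual code; it only shows why the "mid" image alone still plays as mono, while adding the "side" image restores stereo.

```python
def ms_encode(left, right):
    """Convert left/right sample lists into mid and side channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side=None):
    """Rebuild left/right. With no side channel, both outputs equal mid (mono)."""
    if side is None:
        side = [0.0] * len(mid)
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


mid, side = ms_encode([10, 20, -4], [6, 20, 4])
print(ms_decode(mid))        # mono playback from the mid image only
print(ms_decode(mid, side))  # stereo playback from the image pair
```

Floats are used here so that the round trip is exact; an 8-bit implementation would need to decide how to handle the halving.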

The problem is, well, 8-bit isn't that bad, it's just a bit of noise, but I really need to do something about the 16 kHz sampling rate limitation (not a limitation imposed by my program, but rather by the image size allowed by, for example, Facebook). So I want to store more samples while keeping the same image size, or maybe create more images for one audio file. I am trying my best to avoid creating more images, since it makes the format inconvenient for distribution, and I want the images to be incremental to decode: you shouldn't need ALL the images to hear the whole audio. Just one image should provide decent-sounding audio, and each additional image should gradually improve the audio quality.
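
To put rough numbers on the size constraint, here is a small Python sketch. It assumes one sample per pixel and a square image, and it ignores header pixels and any per-row parameters, so the real images would be slightly larger. The function names are made up for illustration.

```python
import math


def pixels_needed(duration_s, sample_rate_hz, samples_per_pixel=1):
    """Number of pixels needed to hold one channel of audio."""
    return math.ceil(duration_s * sample_rate_hz / samples_per_pixel)


def square_side(pixels):
    """Smallest square side length (in pixels) that holds `pixels` pixels."""
    if pixels <= 0:
        return 0
    return math.isqrt(pixels - 1) + 1


for rate in (16000, 32000):
    n = pixels_needed(270, rate)  # 4m30s = 270 seconds
    print(rate, n, square_side(n))
```

Under these assumptions, 4m30s at 16 kHz needs 4,320,000 pixels per channel (a 2079 x 2079 square), and doubling the sampling rate to 32 kHz doubles the pixel count to 8,640,000 (a 2940 x 2940 square).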

Currently, the program encodes a sample as a kind of ADPCM sample... with 5 different possible values per sample, one sample per pixel. This is not that bad: if JPEG alters a pixel too much, a stored value of 4 could become 3, but the data is decoded in an "analog electronics" manner, so it doesn't produce corrupt audio, just a bit of a "click" like a vinyl record (it isn't that audible, really). Using only two possible values per pixel would make sure that JPEG never alters a value so much that a 1 becomes a 0 (or vice versa), but that's like... DPCM, the variant used by the Nintendo Entertainment System, and we all know how bad that sounds.
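
The post does not give the codec's details, so the following Python sketch is only one plausible reading of a 5-value differential scheme. The mapping of symbols to steps (`DELTAS`), the fixed `step`, and the `leak` factor (standing in for a simple DC offset correction) are all assumptions. It shows why a symbol misread by one level shifts the output by one step instead of corrupting the stream.

```python
DELTAS = (-2, -1, 0, 1, 2)  # hypothetical meaning of the 5 symbols


def clamp(v, lo=-128.0, hi=127.0):
    return max(lo, min(hi, v))


def decode(symbols, step=4.0, leak=0.995):
    """Accumulate deltas. leak < 1 slowly pulls the output back to zero,
    so an offset caused by a misread symbol fades instead of persisting."""
    out, value = [], 0.0
    for s in symbols:
        value = clamp(value * leak + DELTAS[s] * step)
        out.append(value)
    return out


def encode(samples, step=4.0, leak=0.995):
    """Pick, per sample, the symbol whose delta lands closest to the target."""
    symbols, value = [], 0.0
    for x in samples:
        predicted = value * leak
        best = min(range(5), key=lambda s: abs(predicted + DELTAS[s] * step - x))
        value = clamp(predicted + DELTAS[best] * step)
        symbols.append(best)
    return symbols


clean = decode([4, 4, 2, 0], leak=1.0)
damaged = decode([4, 3, 2, 0], leak=1.0)  # second symbol misread as 3
print(clean, damaged)
```

With `leak=1.0` the misread symbol leaves a constant offset of one step, which would be heard as a click followed by a DC shift; with a leak slightly below 1 that offset decays over time.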

So I have several ideas in mind. I'm thinking of using more images (resulting in 4 images, so it saves stereo audio for 0 to 8kHz, and mono 8kHz to 16kHz) OR maybe there is some way I can use ADPCM with only 2 or 1 bit per sample... or maybe a completely new codec that will not result in corruption when a pixel gets modified.... I dunno.

Does anybody have any possible ideas? (I'll make sure to credit you if I ever release the software to the public :) )

Thanks!

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #1
If you upload a jpeg file of small dimensions/filesize, is that passed on unprocessed?

If so, perhaps you could dissect the JPEG format to figure out how, e.g., a 640x480-pixel, 200 kB JPEG file is organized and which bits can be flipped while it remains a legal JPEG file. I am guessing that you might get close to 200 kB worth of entropy through such a file. Then it is only a matter of compressing your source audio with any regular codec to fit that file size.

-k

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #2
Thank you very much for the reply.

Yes, I considered that already, but unfortunately, many sites, including Facebook, reencode the image to fit their image encoding settings, so I can't expect that the encoded image and decoded image are identical, both pixel-value-wise and binary-wise.

That's why the image decoder accounts for pixel changes and rounds the brightness to the nearest color that it recognizes (there are 5 accepted values, represented as 5 shades of grey). Even if the program rounds to the wrong value (for example, an original value of 4 becomes 4.6 after JPEG encoding and is interpreted as 5), my ADPCM-based audio codec will only introduce a faint click (resulting in low-frequency rumble if wrong round-offs are frequent), without the program crashing or the audio becoming undecodable (try bitflipping random bits on an MP3 or Opus and see what happens :) )
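
As an illustration of the rounding step described above, here is a short Python sketch. The five grey levels in `LEVELS` are an assumption (evenly spread over 0-255); the actual shades used by the program are not stated in the thread.

```python
LEVELS = (0, 64, 128, 191, 255)  # hypothetical grey value for each symbol


def symbol_to_grey(symbol):
    """Grey value written to the image for a symbol in 0..4."""
    return LEVELS[symbol]


def grey_to_symbol(grey):
    """Round a (possibly JPEG-altered) grey value to the nearest known level."""
    return min(range(len(LEVELS)), key=lambda s: abs(grey - LEVELS[s]))


print(grey_to_symbol(180))  # slightly darkened pixel, still read as symbol 3
print(grey_to_symbol(150))  # heavily altered pixel, misread as symbol 2
```

Spreading the levels as far apart as possible gives each symbol the largest margin against JPEG noise, which is the trade-off between 5 levels and 2 levels mentioned earlier in the thread.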

However, learning about how JPEG compresses pixel data could help me optimize the image encoding in a way that more information is preserved. Thanks!

 

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #3
You can also try to add more payload and/or make it more robust using forward error correction. The other thing is that the type of compromise you can make and the type of audio you want to compress have a lot of influence on what you can use; in some situations, even Codec 2 at 700 bit/s can be an option. Also try reading up on digital modulation for radio communications, as it has a lot in common with what you want to do.
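
As a concrete example of forward error correction, here is a sketch of the classic Hamming(7,4) code in Python. It is only an illustration of the general idea, not something the program is known to use: every 4 data bits are stored as 7 bits, and any single flipped bit within those 7 can be corrected on decode.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                   # simulate one bit damaged by recompression
print(hamming74_decode(word))  # the original 4 data bits are recovered
```

The cost is 75% extra storage, so a scheme like this trades image capacity for robustness; stronger codes used in radio links make the same trade at better rates.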

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #4
Does anybody have any possible ideas?
Here's one: make, say, an Android application that can "scan" your image/sound files (from a display, or printed) and automagically plays them. Kind of like QR-codes, but not for text/links - for audio! :)

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #5
It's really sad that there are countries where net neutrality is violated in such a way in the first place :-(

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #6
You can also try to add more payload and/or make it more robust using forward error correction. The other thing is that the type of compromise you can make and the type of audio you want to compress have a lot of influence on what you can use; in some situations, even Codec 2 at 700 bit/s can be an option. Also try reading up on digital modulation for radio communications, as it has a lot in common with what you want to do.
Thank you very much for your reply.
As of now, lossless encoding mode (to store any file assuming it isn't too big) uses 8 bits every 1016 bits to store the checksum (custom algorithm) for those bits, and will report any mismatch.
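
A rough Python sketch of that kind of block checksum follows. It assumes the layout is 1016 data bits (127 bytes) followed by an 8-bit checksum, giving 128-byte blocks, and it uses a plain byte sum as a stand-in, since the custom checksum algorithm is not described here.

```python
DATA_BYTES = 127  # 1016 data bits per block (assumed layout)


def checksum8(data):
    """Stand-in 8-bit checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


def pack(payload):
    """Append one checksum byte after every 127 bytes of payload."""
    out = bytearray()
    for i in range(0, len(payload), DATA_BYTES):
        chunk = payload[i:i + DATA_BYTES]
        out += chunk
        out.append(checksum8(chunk))
    return bytes(out)


def find_bad_blocks(packed):
    """Return the indices of blocks whose checksum does not match."""
    bad = []
    stride = DATA_BYTES + 1
    for n, i in enumerate(range(0, len(packed), stride)):
        block = packed[i:i + stride]
        if checksum8(block[:-1]) != block[-1]:
            bad.append(n)
    return bad


packed = bytearray(pack(bytes(range(256))))
packed[130] ^= 0x01             # damage one byte in the second block
print(find_bad_blocks(packed))  # reports which block failed
```

A checksum like this only detects damage; it cannot repair it, which is where forward error correction would come in.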

For lossy audio encoding, I have discovered something.

Opus has gotten good enough that a whole song encoded as 16 kbps Opus and stored in a lossless-mode image stays within Facebook's size limitation, while sounding a wee bit better than mono 16 kHz images. The problem is, the resulting image will not be incremental.

The default lossy mode gives you mono audio from 1 image, and adding the other image to form an image pair lets you hear stereo audio. Unfortunately, I don't see a way to spread the audio across 2 images without cutting the Opus file in half. Also, for every row in the image, 8 bits (8 pixels) are used to set the "ADPCM jump parameter" thingy (I don't know what to call it LOL), which gives the codec its "Adaptive" quality and improves audio quality, making it acceptable even if there are quiet and very loud parts in the song.

My goal is to store stereo audio at a 32 kHz sampling rate in up to 4 images, while keeping the images incremental.

Still thinking of ways to achieve this :)

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #7
Does anybody have any possible ideas?
Here's one: make, say, an Android application that can "scan" your image/sound files (from a display, or printed) and automagically plays them. Kind of like QR-codes, but not for text/links - for audio! :)
That is an interesting idea, but unfortunately, while you may indeed see this as a kind of QR code, the resulting image is usually extremely dense, and conventional phone cameras will not be able to see the individual pixels. Also, lighting will play a very important role here. QR codes may be able to store several characters reliably, but this project attempts to store whole files and music in images. It's a lot bigger :)

Re: Idea for a Corruption-Tolerant Very-Low-Bitrate Audio Codec

Reply #8
One problem is that you are a little sparse on technical details, such as how much data you store in an image. From what you describe, it seems possible to store convincing 32 kHz sample rate stereo in 3 images with your mode if you sacrifice HF stereo: basically, the 0-8 kHz part of the audio is stereo and the 8-16 kHz part is mono. This is a typical technique for lossy codecs.
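
A minimal Python sketch of how such a 3-image layering could work. It uses a 2-tap Haar split as a crude stand-in for a proper band-splitting filter (its frequency separation is poor, so "0-8 kHz" and "8-16 kHz" are only approximate here), and all function names are hypothetical. Each layer carries half-rate samples, so three layers cost 3 x 16,000 samples per second instead of the 4 x 16,000 needed for full 32 kHz stereo.

```python
def haar_split(x):
    """Split an even-length signal into half-rate low and high bands."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high


def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out


def encode_layers(left, right):
    """Return (mid_low, side_low, mid_high), one layer per image."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    mid_low, mid_high = haar_split(mid)
    side_low, _ = haar_split(side)  # high-band stereo is discarded
    return mid_low, side_low, mid_high


def decode_layers(mid_low, side_low=None, mid_high=None):
    """Decode with whatever layers are available; missing ones count as silence."""
    zeros = [0.0] * len(mid_low)
    mid = haar_merge(mid_low, mid_high if mid_high is not None else zeros)
    side = haar_merge(side_low if side_low is not None else zeros, zeros)
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


layers = encode_layers([4, 8, 2, 6], [0, 4, 2, 6])
print(decode_layers(layers[0]))  # image 1 only: low-band mono
print(decode_layers(*layers))    # all 3 images: stereo lows plus mono highs
```

This keeps the images incremental: the first image alone plays, and each extra image adds either stereo width or high-frequency content.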

Also, after reading this: https://www.facebook.com/help/266520536764594 it seems feasible to store up to approx. 95 KB in a faux JPEG image that Facebook doesn't recompress; basically, the image has the correct headers and possibly a little more if you want it to be decodable.