New Lossy Audio Codec

While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.

So I wrote a shell script which does the following:
1) Takes a 10 second sample of an MP3 and converts it to 8bit 44100Hz raw PCM
2) Arranges the data into a square image and Jpegs it
3) Unjpegs it and converts back to raw PCM data
4) Creates a WAV from the raw sound

I used imagemagick and sox to perform all the necessary conversions.
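For anyone who wants to play along without the shell script, here is a minimal Python sketch of steps 2 and 3 only: laying the samples out as a square image and reading them back. It is not the original script. The function names and the padding value of 128 (silence for unsigned 8-bit PCM) are my own assumptions, and the actual JPEG round trip would still happen between the two calls with whatever image tool you like.

```python
import math

def samples_to_square(samples: bytes, pad: int = 128):
    """Arrange 8-bit PCM samples row by row into a square greyscale image.

    Returns (side, rows). The tail is padded with `pad` (assumed silence
    level for unsigned 8-bit audio) so the data fills the square exactly.
    """
    side = math.isqrt(len(samples))
    if side * side < len(samples):
        side += 1
    padded = samples + bytes([pad]) * (side * side - len(samples))
    rows = [padded[r * side:(r + 1) * side] for r in range(side)]
    return side, rows

def square_to_samples(rows, length: int) -> bytes:
    """Flatten the image rows back into a sample stream and drop the padding."""
    return b"".join(rows)[:length]

# Tiny demo: 10 samples need a 4x4 image (16 pixels, 6 of them padding).
pcm = bytes(range(10))
side, rows = samples_to_square(pcm)
restored = square_to_samples(rows, len(pcm))
```

With the numbers from the post, 10 seconds at 44100Hz is 441000 samples, which by this layout would need a 665x665 image.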

Looking just at compression, JPEG performs very poorly compared to MP3. Obviously changing the JPEG quality factor made a big difference, but even at terrible quality the images were pretty large compared to the MP3.

We sat down, whipped out the abx program from the LAME source and very quickly decided that JPEG is not a great audio codec. At 95% quality the music was alright - similar quality to a 64kbps MP3. The music degraded quickly as we increased the compression. At 75% the music started sounding really horrible - with weird artifacts unlike anything I had heard before. The samples were more or less recognisable down to about 20% quality factor; any less and we couldn't tell Al Di Meola from Springbok Nude Girls.

Several conclusions can be drawn from this test:
- Procrastination leads people to do all sorts of insane things
- MP3, Vorbis and the rest do all sorts of magic unrelated to just dumping data
- JPEG's habit of dividing an image into 8x8 pixel blocks produces some very strange artifacts, including what sounded like pre- and post-echoes with up to a second delay

At this point some of the involved parties started blaming the fact that we were transcoding for the bad quality of the sound. Another student blamed my speaker cables. It was an interesting experiment. I was very surprised that the sound didn't come out completely mangled.

Reply #1
Why didn't you use original wav source?
I have absolutely no idea if it would have made a difference, but doesn't JPG perform better with "smooth" rather than "dithered" data? Maybe the mp3 encoding makes the data more "dithered" (very unscientific description, but I'm tired; in fact I don't know if I'm talking only BS), like it has some of the dc-coeffs 0...
Juha Laaksonheimo

 

Reply #2
JPEG performs very poorly on sound, because it's not continuous across 8-pixel boundaries. Thus you get discontinuities every 8 samples, basically adding a square wave to your sound.

The more you compress, the more it becomes unusable.
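To see the block-boundary effect in isolation, here is a rough 1-D Python sketch. It is only a stand-in for JPEG, not the real thing: JPEG works on 2-D 8x8 blocks and quantises coefficients rather than zeroing them, and the `crush` helper and its `keep` parameter are names I made up for illustration. The point is just that each block of 8 samples is transformed independently, so throwing away detail leaves a step at every boundary.

```python
import math

N = 8  # block length, matching JPEG's 8-pixel block width

def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1 / n)
        s += sum(c * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out

def crush(signal, keep=1):
    """Transform each block of N samples on its own and keep only the
    first `keep` coefficients (a crude substitute for heavy quantisation)."""
    out = []
    for start in range(0, len(signal), N):
        coeffs = dct(signal[start:start + N])
        coeffs = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
        out.extend(idct(coeffs))
    return out

# A smooth ramp turns into two flat steps with a jump at sample 8.
ramp = [float(i) for i in range(16)]
flat = crush(ramp, keep=1)
```

Keeping all 8 coefficients gives the ramp back unchanged; the fewer you keep, the more the output looks like a staircase with one step per block.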

You should definitely try JPEG2000 !!

Reply #3
There was a program mentioned at HA probably about a year ago that did the same thing (converted wavs to jpgs and back). I forget the name of the program; I have it on my computer, but seeing that I'm at a buddy's house, I can't look it up. Maybe someone else remembers what I'm talking about.
"You can fight without ever winning, but never win without a fight."  Neil Peart  'Resist'

Reply #4
Have you tried the other way around? Compressing a picture using MP3?

Reply #5
Do you think the RIAA's searches include pictures? If so, could this be the future of trading music on P2P networks? I'm sure I'm not the first person to have this thought. Could you also use PNG?

Reply #6
PNG would be lossless, but would have much poorer compression than any lossless audio codec.  With JPEG compression, you could produce better compression by using a "blur" filter, but I imagine it would have the same (or similar) effect as a combination of a lowpass plus echoes added before and after displaced by a number of samples equal to the width of the image.
I am *expanding!*  It is so much *squishy* to *smell* you!  *Campers* are the best!  I have *anticipation* and then what?  Better parties in *the middle* for sure.
http://www.phong.org/


Reply #8
danchr asked above whether I had tried compressing an image using MP3. I hadn't yet, but I decided to try it. Turns out, LAME --alt-preset standard isn't an all-bad image encoder. Compression is a bit disappointing, but the picture quality isn't bad at all.

I exported a picture I took of my dog to PPM, stripped the header then converted it to wav with sox. I ran lame on it then used sox to export the raw sound data. I needed to strip off a pile of bytes at the beginning (about 2 lines worth) then re-add the ppm header.
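The header juggling can be sketched in Python roughly like this. It assumes a binary (P6) PPM, and both helper names are hypothetical. The `offset` in `realign` is whatever number of leading bytes the decoded audio turns out to be shifted by; in the post it was found by hand (about two image lines), so treat it as a value you tune, not something this code works out for you.

```python
def split_ppm(data: bytes):
    """Split a binary (P6) PPM into (header_bytes, pixel_bytes)."""
    if data[:2] != b"P6":
        raise ValueError("not a binary PPM")
    pos, fields = 2, 0
    while fields < 3:  # width, height, maxval
        # Skip whitespace and '#' comment lines before each field.
        while data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#":
            if data[pos:pos + 1] == b"#":
                pos = data.index(b"\n", pos)
            pos += 1
        while data[pos:pos + 1].isdigit():
            pos += 1
        fields += 1
    pos += 1  # a single whitespace byte ends the header
    return data[:pos], data[pos:]

def realign(decoded: bytes, offset: int, length: int) -> bytes:
    """Drop `offset` leading bytes from the decoded stream, then trim or
    zero-pad to `length` so the pixel data fits the original header."""
    body = decoded[offset:offset + length]
    return body + bytes(length - len(body))

# Demo with a 2x1 image; the "codec" here just prepends two junk bytes.
ppm = b"P6\n2 1\n255\n" + bytes(range(6))
header, pixels = split_ppm(ppm)
decoded = b"\x00\x00" + pixels
rebuilt = header + realign(decoded, offset=2, length=len(pixels))
```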

LAME came in with an average bitrate of 130 kbps, which created a 340KB file from a 921KB original. Compared to JPEG, it's pretty poor compression, but a ratio of roughly 3:1 isn't that bad. The output image is a little bit softer, with odd aliasing artifacts on fine details. The colour saturation also seems to have been increased.

Maybe with a carefully built image it will be possible to see stuff like pre-echo artifacts.

If anybody is interested you can get the input and output images from here:
Original Image
Output Image

Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

Reply #9
Quote
While studying for a Fourier Analysis test some of my flatmates and I were discussing how well JPEG would encode music. Since both lossy audio codecs (MP3, Vorbis, etc) and JPEG operate on the same basic idea (discarding unimportant data in the frequency domain) we decided it would be an interesting thing to test.


That simply won't work well, because audio coders exploit irrelevancy according to human psychoacoustics, adding noise in frequency regions that are masked by outer-inner ear transfer and inner-ear processing.

Good audiovisual coders exploit visual irrelevancy, so you will end up with noise allocated in regions that do not correspond to psychovisual masking criteria.

Reply #10
Quote
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?

Reply #11
Not meaning to be offensive, but I know more pleasant ways to waste my time

Reply #12
Quote
Quote
Needless to say, the fact that I use a generic power cord on my PC means the MP3 sounds terrible.

You aren't kidding... are you?

That really made me laugh

Reply #13
The artifacts in the dog picture are mostly horizontal lines. When the picture is converted to sound, is it scanned line by line?

It's too bad that the sound is one-dimensional while the picture is two-dimensional.
This experiment with jpeg compression will mostly show the effects of the descanning-filtering-rescanning. I think that any other filter, as long as it is a function of the neighbouring pixels (blur, artistic effects...), would have given the same kind of sound artifacts.
The "pre and post echoes up to one second delay" come from the fact that you "listen" to the picture line after line. When one dot is blurred, it spreads into the lines above and below it in the picture, and those lines are converted into sound data playing long before or long after the central dot.

You should get a fine pre/post echo effect applying a vertical motion blur on the picture instead of a jpeg compression 
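A quick way to convince yourself of this, as a Python sketch: take a single click, treat the sample stream as an image of some width, blur it vertically, and flatten it again. The `vertical_blur` helper is a made-up 3-tap box filter, not any particular image editor's motion blur, and it assumes the number of samples is a whole multiple of the width.

```python
def vertical_blur(samples, width):
    """Average each pixel with the pixels directly above and below it,
    treating `samples` as an image scanned row by row."""
    height = len(samples) // width
    out = list(samples)
    for i in range(len(samples)):
        row = i // width
        acc, count = samples[i], 1
        if row > 0:                 # pixel one row up = `width` samples earlier
            acc += samples[i - width]
            count += 1
        if row < height - 1:        # pixel one row down = `width` samples later
            acc += samples[i + width]
            count += 1
        out[i] = acc / count
    return out

width = 4
signal = [0.0] * 20
signal[9] = 3.0                      # one click at sample 9
blurred = vertical_blur(signal, width)
echoes = [i for i, v in enumerate(blurred) if v > 0]  # [5, 9, 13]
```

The click comes back at its original position plus one copy `width` samples earlier and one `width` samples later, so the echo spacing is set by the image width and not by anything in the music.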

Reply #14
Yes Roberto that was it.  Now I know I'm not crazy.  Well, not that crazy anyway.
"You can fight without ever winning, but never win without a fight."  Neil Peart  'Resist'

Reply #15
Gday..

just read this.. and it reminds me of a little util/prog
called Camouflage.. it disguises the mp3 as a jpeg.

it was a huge thing among the streamload community
a couple of years back.. still in use as far as i can see.

and i can't hear any "damage" on them

there are a few of those progs, and camouflage
is the best one..

for those who want to try this out
http://www.freewaredownloads.de/cgi-bin/de...tail.cgi?ID=228
or do a google.
use a pic as a template.. the compression adds the mp3 file
together with the pic in a jpeg container..
with an option to add a pwd. when you uncamouflage the file
you get the option to extract the mp3 or the pic..
the file becomes ca. 10Kb bigger..



Reply #16
That's just 'glueing' two files together, not encoding sound as jpeg. You are just hiding mp3 data inside a jpeg, from what I remember?
< w o g o n e . c o m / l o l >

Reply #17
Yup. And from what I remember it's not limited to MP3 and JPEG. I think you could stuff pretty much anything in.

Reply #18
Gday..

@Mac.. i believe i wrote "reminds me"..
i am totally aware of the fact that camouflage just
writes a container with a different extension..
it does not encode/transcode the data..

Reply #19
actually i tried the other way round

  picture -> mp3 -> picture

i don't think i will spend too much time on that, but here is a short, easy proof of concept:

  http://eugene.ath.cx/graphic2mp3/

ok ok... running mp3 on the raw rgb data would have been better; compressing the tga header and then fixing up the resulting wav data with a new tga header suxx...

but hey, u can see the image !!!!!!!!

but why on earth is it turned around 180 degrees?!?!

anyways, was fun...

Eugene

Reply #20
ah, i forgot the sizes :-(

tga (wav) size: 5760 kb

jpg size before mp3 conversion (90%) : 420 kb

jpg size after mp3 conversion (90%) : 1460 kb

  (as expected, more entropy in the decoded mp3 -> wav -> tga)

mp3 size : 992 kb (--alt-preset standard) 544 kb (--alt-preset 128)

  (as expected, a compressor does better when it knows about the data it operates on than a generic compressor or a compressor built for such a different format)

Eugene

Reply #21
why not try splitting up each second of audio into bitmaps, then making them into an avi and compressing that with xvid (1-pass, 100%)?
here's the catch: the bitmaps have to be nearly lossless copies of the original (~10 sec) wave after being rebuilt..... if someone could make the 10 bitmaps for me i could do the rest
just a crazy idea, using video codecs to store music....... (with minimal loss of data)
thankz, tuxp3
in short:
10sec mp3 -> raw pcm audio -> (10x) bmp pix -> 10-frame avi -> xvid 100% quality -> then back
(as little sound data loss as possible)

Reply #22
The problem is that if you write the audio file to the uncompressed image one line at a time, left to right, then you're going to hit a JPEG block boundary every 8 pixels, and you get 8 different parts of the audio within each block.

I think the best bet would be to walk through the image buffer following a Hilbert curve - that way you will get the highest correlation between samples that are close to each other in the audio and pixels that are close to each other in the image. You'll need to pad your data with zeros to make the image dimensions a power of 2.
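In case it helps, a small Python sketch of the Hilbert layout. `d2xy` is the commonly published iterative index-to-coordinate conversion; `to_image` and `from_image` are hypothetical helper names of mine, and the grid is just a list of lists, not a real image object. It assumes the side length `n` is a power of 2 and pads with zeros as suggested above.

```python
def d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.
    n must be a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:               # rotate/flip the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def to_image(samples, n, pad=0):
    """Write samples into an n x n grid along the Hilbert curve."""
    grid = [[pad] * n for _ in range(n)]
    for d, value in enumerate(samples):
        x, y = d2xy(n, d)
        grid[y][x] = value
    return grid

def from_image(grid, count):
    """Read `count` samples back out of the grid in curve order."""
    n = len(grid)
    out = []
    for d in range(count):
        x, y = d2xy(n, d)
        out.append(grid[y][x])
    return out

samples = list(range(13))          # 13 samples, zero-padded into a 4x4 grid
grid = to_image(samples, 4)
restored = from_image(grid, len(samples))
```

Consecutive samples always land in horizontally or vertically adjacent pixels, which is the property that keeps neighbouring audio inside the same image block for as long as possible.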

Reply #23
Oh yeah, and do 8 bit audio and a monochrome image. I know JPEG only does colour images but you can convert to colour before compressing and back to mono after decompressing. The colour channels in the JPEG will compress down to virtually nothing.

Results will still be poor but will probably be the best you're going to get.

You can Google for a Hilbert Curve if you don't know what it is.

Reply #24
Also, if you want to feed a color picture to an audio codec, make sure to feed each color as a separate audio channel (24 bit -> 8-bit 3-channel). Should be relatively simple with a generic image editor, or if you can make the codec recognize 24 bits as three samples.
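A minimal Python sketch of the channel split, assuming the usual interleaved layout of raw 24-bit RGB (R, G, B, R, G, B, ...). The helper names are made up. Note that with this layout the raw bytes already line up as 3-channel 8-bit audio frames, so the split is mainly useful if your tool wants one file per channel.

```python
def split_channels(rgb: bytes):
    """Split interleaved 24-bit RGB into three 8-bit planes (R, G, B)."""
    return rgb[0::3], rgb[1::3], rgb[2::3]

def merge_channels(r: bytes, g: bytes, b: bytes) -> bytes:
    """Interleave three equal-length 8-bit planes back into RGB pixels."""
    out = bytearray(len(r) * 3)
    out[0::3], out[1::3], out[2::3] = r, g, b
    return bytes(out)

# Two pixels: (10, 20, 30) and (11, 21, 31).
pixels = bytes([10, 20, 30, 11, 21, 31])
red, green, blue = split_channels(pixels)
rebuilt = merge_channels(red, green, blue)
```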