No gain? That's impossible! For wave data, a frequency representation is much more efficient than a byte-by-byte one.
It must give some gain anyway! I compared files compressed as 128 kbps MP3 with the originals using WaveLab's comparison function: the differences are rather subtle (at least for 16-bit data).
I think that a "bad quality" version of the audio can serve as a very efficient point of reference.
I dunno, maybe it was a stupid idea after all. But no one has tried. Maybe there is something in it yet.
Ivan Dimkovic is right about information entropy.
Lossless audio encoders are a very interesting case here - as they can losslessly compress a sound file into a smaller number of bits than its information entropy predicts.
While this sounds impossible, it isn't, because some of the information contained in the waveform is actually contained in the decoding algorithm.
Thus lossless audio compressors will do very badly with the vast majority of possible waveforms, but very well with the ones that represent meaningful audio data.
By the way, if you could make an algorithm which reduces any data by just 1 bit, you could also apply it several times, and therefore compress anything down to zero bits.
Wrong! The reason why you can usually compress a sound file is because PCM coding needs more bits than what the actual entropy of the data would require.
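A rough way to see the gap between PCM's fixed 16 bits per sample and what the data needs is to estimate entropy from a histogram. This is only a sketch: the test tone, its amplitude and the trivial "previous sample" predictor are assumptions made for the example, and a first-order histogram is an estimate, not the true entropy of the source.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """First-order entropy estimate in bits per symbol, from a histogram."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Assumed test signal: one second of a 220 Hz tone, 16-bit PCM at 44.1 kHz.
RATE = 44100
samples = [round(8000 * math.sin(2 * math.pi * 220 * t / RATE)) for t in range(RATE)]

# Simplest possible predictor: "the next sample equals the previous one".
# Only the prediction error (residual) would then need to be coded.
residual = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

sample_entropy = empirical_entropy(samples)
residual_entropy = empirical_entropy(residual)
print("PCM word size:      16 bits/sample")
print(f"sample histogram:   {sample_entropy:.2f} bits/sample")
print(f"residual histogram: {residual_entropy:.2f} bits/sample")
```

Both estimates come out below 16 bits, and the residual needs fewer bits than the raw samples, because the predictor (part of the algorithm) already accounts for some of the structure. Real lossless coders use much better predictors than this one.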
A 1-bit compressed file contains just the information: "YES, it is the exact data which is known to the algorithm". Unfortunately this can only be done for one set of input data, and it will expand every other possible input by 1 bit (a first bit that says "NO, this is something else"). When you allow more space for the algorithm you can make it more clever, but still, all data-dependent information will be in the compressed file. Otherwise you cannot unpack the files.
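The YES/NO scheme described above is small enough to write out and check exhaustively. In this sketch the 8-bit `KNOWN` string is a made-up example input, and bit strings are represented as Python strings of "0" and "1" purely for readability.

```python
from itertools import product

KNOWN = "10110011"  # hypothetical 8-bit input that the codec was built around

def compress(bits: str) -> str:
    # "1" means YES, it is the known data; otherwise "0" (NO) plus the raw data.
    return "1" if bits == KNOWN else "0" + bits

def decompress(code: str) -> str:
    return KNOWN if code == "1" else code[1:]

# Every possible 8-bit input must survive a round trip.
all_inputs = ["".join(p) for p in product("01", repeat=len(KNOWN))]
assert all(decompress(compress(x)) == x for x in all_inputs)

original_bits = sum(len(x) for x in all_inputs)
compressed_bits = sum(len(compress(x)) for x in all_inputs)

# Counting argument: there are fewer strings shorter than n bits than there
# are n-bit inputs, so no lossless codec can shrink all of them.
shorter_strings = sum(2 ** k for k in range(len(KNOWN)))

print("total bits before:", original_bits)
print("total bits after: ", compressed_bits)
print("n-bit inputs:", len(all_inputs), "shorter strings available:", shorter_strings)
```

One input shrinks from 8 bits to 1, the other 255 each grow by 1 bit, so the total over all inputs goes up, not down.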
The result is not so far from the original, compared to the bitrate.
Quote: Wrong! The reason why you can usually compress a sound file is because PCM coding needs more bits than what the actual entropy of the data would require.
I didn't know that and find it very interesting. Do you have any references? I am not doubting you, I would just like to know more.
Quote: The result is not so far from the original, compared to the bitrate.
The result isn't very far, perceptually, from the original, but that doesn't mean that there isn't still a huge amount of data in the difference file. Some of the frequencies in the original will not reduce substantially in amplitude when the encoded file is subtracted from the original.
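The "lossy file plus difference file" idea can be sketched without an MP3 codec. Note the big assumption: coarse quantisation is used here as a stand-in for the lossy encoder, and it is not a model of MP3, which discards information on a perceptual basis and leaves a very different residual. The tone and the step size are likewise invented for the example.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """First-order entropy estimate in bits per symbol, from a histogram."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Assumed original: one second of a 220 Hz tone, 16-bit PCM at 44.1 kHz.
RATE = 44100
original = [round(8000 * math.sin(2 * math.pi * 220 * t / RATE)) for t in range(RATE)]

STEP = 64  # assumed coarseness of the stand-in "lossy" coder
lossy = [STEP * round(s / STEP) for s in original]

# The difference file is whatever the lossy version got wrong.
difference = [o - l for o, l in zip(original, lossy)]

# Lossy version + difference file gives back the original exactly.
restored = [l + d for l, d in zip(lossy, difference)]
assert restored == original

difference_entropy = empirical_entropy(difference)
print(f"largest error:   {max(abs(d) for d in difference)}")
print(f"difference file: {difference_entropy:.2f} bits/sample (histogram estimate)")
```

Even though every error is small next to the 8000-amplitude signal, the difference file still costs several bits per sample, which is the point made above about "close" not meaning "cheap to correct".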
I know, but IMO it would be nice to run a few tests to make sure this really is a completely wrong approach.
I agree completely. However, your 1-bit file obviously contains precisely one bit of information (in the Shannon sense), whereas the original contained more, which shows that the compressor decreased the information entropy of the original file. This information can't, by definition, disappear: it's stored in the algorithm itself. There is as much (or more) information in (Algorithm+Compressed File) as in (Original File). If there weren't, there couldn't be a 1->1 mapping between compressed files and uncompressed files.
It is impossible to make a compression algorithm that can compress random data where the resulting compressed data plus the size of the algorithm is smaller than the original data.
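This is easy to try with an off-the-shelf compressor. The sketch below uses Python's `zlib` on pseudo-random bytes and on highly repetitive bytes; the sizes and the seed are arbitrary choices, and one experiment with one compressor illustrates the claim rather than proving it. (The pseudo-random bytes do come from a short generator program, but a general-purpose compressor has no way to discover that.)

```python
import random
import zlib

SIZE = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

random_data = bytes(rng.getrandbits(8) for _ in range(SIZE))
repetitive_data = b"abcd" * (SIZE // 4)

random_packed = zlib.compress(random_data, 9)
repetitive_packed = zlib.compress(repetitive_data, 9)

# Both must decompress back to exactly what went in.
assert zlib.decompress(random_packed) == random_data
assert zlib.decompress(repetitive_packed) == repetitive_data

print(f"random:     {SIZE} -> {len(random_packed)} bytes")
print(f"repetitive: {SIZE} -> {len(repetitive_packed)} bytes")
```

The repetitive input collapses to a tiny fraction of its size, while the random-looking input does not shrink in any useful way, and that is before the size of the decompressor is counted.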
Steve Tate <firstname.lastname@example.org> suggests a good challenge for programs that are claimed to compress any data by a significant amount:
Here's a wager for you: First, send me the DEcompression algorithm. Then I will send you a file of whatever size you want, but at least 100k. If you can send me back a compressed version that is even 20% shorter (80k if the input is 100k) I'll send you $100. Of course, the file must be able to be decompressed with the program you previously sent me, and must match exactly my original file. Now what are you going to provide when... er... if you can't demonstrate your compression in such a way?
So far no one has accepted this challenge (for good reasons).
Mike Goldman <email@example.com> makes another offer:
I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant. Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state. With this offer, you can tune your algorithm to my data. You tell me the parameters of size in advance. All I get to do is arrange the bits within my file according to the dictates of my whim. As a processing fee, I will require an advance deposit of $100 from any contestant. This deposit is 100% refundable if you meet the challenge.
Suppose I copy all my MP3s (and OGGs, MPCs, etc.) into a big database indexed by their MD5 sum. Then I replace all my MP3 files with 128-bit binary files containing this sum. I have "compressed" my MP3 collection to a couple of hundred kilobytes. However, that doesn't mean that they would be any good without the database (part of the algorithm), which runs to tens of gigabytes.
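The MD5 database trick takes only a few lines to write down. In this sketch the in-memory `database` dict and the fake track contents are stand-ins invented for the example; the point is only to show where the bytes end up.

```python
import hashlib

database = {}  # hypothetical stand-in for the multi-gigabyte database

def compress(data: bytes) -> bytes:
    """Store the data in the database and return its 128-bit MD5 digest."""
    digest = hashlib.md5(data).digest()
    database[digest] = data
    return digest

def decompress(digest: bytes) -> bytes:
    """Only works if the database (part of the algorithm) is available."""
    return database[digest]

track = b"fake mp3 payload " * 1000  # made-up file contents
stub = compress(track)

database_bytes = sum(len(k) + len(v) for k, v in database.items())
print(f"original:   {len(track)} bytes")
print(f"compressed: {len(stub)} bytes")
print(f"database:   {database_bytes} bytes")
assert decompress(stub) == track
```

Every file shrinks to 16 bytes, but the decompressor now carries a full copy of everything it can ever restore, so (Algorithm+Compressed File) is larger than the original.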