ELI5 - How FLAC Works
Reply #9 – 2015-11-16 19:34:26
I've actually done some digging into how FLAC works, so I can give specific answers as to how FLAC is able to make things small. The easiest way it can reduce the file size is by recognizing when the LSBs are all zero and removing the unnecessary bits. (This is more common in high-resolution audio, where e.g. a 24-bit file may only have data in the top 20 bits.) After removing unnecessary bits from the LSB, the next step is stereo decorrelation. Someone else already gave a good explanation of that, so I won't go into detail. No compression happens in this step, but the mid/side representation is often easier to compress than the left/right representation, so the result is a smaller file. After LSB reduction and stereo decorrelation comes prediction. For each block, FLAC comes up with an equation (the "predictor") that can make a pretty good guess for the value of the next sample based on the values of the previous few samples. It subtracts the guess from the actual value of the sample to come up with the difference (the "residual"). It's similar to stereo decorrelation, except it's taking the difference between the prediction and the actual audio instead of the difference between left and right. Just like stereo decorrelation, no compression happens here. The final step is entropy coding. After the stereo decorrelation and prediction, the remaining data (the residual) should be mostly small numbers. This is where most of the compression happens: instead of using the same number of bits to store each sample's residual, FLAC uses a variable number of bits, where small numbers take fewer bits. Let's say you have some hypothetical audio format in base-10, where each sample is represented by nine digits. There's no spaces or punctuation in this format, but I'll space it out so you can read it. Here's what some audio might look like in this format:291781996 624029795 053942663 489114292 357264357 776028602 095338092 180003862 838432156 694551172 If prediction worked well, the residual is a bunch of small numbers. There's still no spaces or punctuation, so you still need nine digits for each number, even though they're all small numbers. Here's what the residual might look like:000000012 000000005 000000000 000000023 000000017 000000101 000000037 000000041 000000017 000000003 Entropy coding is just coming up with a clever way to say how many digits each number needs, so you can make small numbers take less space. There's still no spaces or punctuation, so instead I'll use another digit to say how many digits are in the number.2 12 1 5 0 2 23 2 17 3 101 2 37 2 41 2 17 1 3 Even if I remove all of the spaces, you can still figure out what each number should be. You can even try it yourself, if you'd like: 212150223217310123724121713 The entropy coding in FLAC works on a similar principle, although it operates in binary instead of base 10. Any questions?