ELI5 - How FLAC Works

Topic: ELI5 - How FLAC Works (Read 11155 times)

2015-11-14 08:28:48

I am curious about how FLAC works. Although it does compress the size of the files, it uses lossless compression. If it is indeed compressing the file, what exactly does it take out to make this happen, to make the file size smaller?

I’ve read the Wikipedia page on lossless compression, and I am afraid that it is still a bit too technical for me at my current level of understanding.

I am still a newbie at this. Perhaps you could use the Reddit.com method of ELI5? Which means Explain Like I’m 5 (https://www.reddit.com/r/explainlikeimfive/).

Thanks,

jcarruth

Reply #1 – 2015-11-14 10:25:53

Quote from: jcarruth on 2015-11-14 08:28:48

If it is indeed compressing the file, what exactly does it take out to make this happen, to make the file size smaller?

Let's take an easy example: the beginning and end of a file are usually silence. You can simply say: hey, the first second is silent. That takes just a few bytes instead of storing the whole silent second. That is one of the compression strategies FLAC uses.
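
A minimal sketch of the silence trick, loosely modelled on FLAC's "constant" subframe (one value plus a count). The function names and the tuple format are made up for illustration; real FLAC stores this in a binary subframe header, not a Python tuple.

```python
def encode_block(samples):
    """Return ('CONSTANT', value, count) if every sample is the same,
    otherwise ('VERBATIM', samples)."""
    if samples and all(s == samples[0] for s in samples):
        return ("CONSTANT", samples[0], len(samples))
    return ("VERBATIM", list(samples))


def decode_block(block):
    """Undo encode_block exactly."""
    if block[0] == "CONSTANT":
        _, value, count = block
        return [value] * count
    return list(block[1])


silence = [0] * 44100                 # one second of digital silence at 44.1 kHz
print(encode_block(silence))          # ('CONSTANT', 0, 44100)
```

Three items replace 44,100 samples, and decoding gives back exactly the same list.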

I don't think I can explain the main strategy ELI5, but I'll try: the second strategy FLAC uses is prediction. It is like the stock market: when you see the history, you can often predict roughly where the next value will be. FLAC saves a 'trend history', so to speak, and uses it to predict each value when the file is played back. This prediction is not always accurate, of course (just like the stock market), so it has to be corrected to the real value. This is done with a 'residual' that is saved separately. The residual consists mostly of small numbers and can be saved with a smart mathematical method called a Rice code. I really can't explain this so-called residual coding ELI5, sorry.
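
The prediction idea can be sketched in a few lines. This assumes FLAC's simplest kind of predictor, the fixed order-2 one, which guesses the next sample as 2*s[n-1] - s[n-2] (i.e. "keep going at the current slope"). FLAC can also compute custom LPC coefficients per block, which is not shown; the function names are hypothetical.

```python
def residual_order2(samples):
    """Keep the first two samples as-is ("warm-up"), then store only the
    difference between each real sample and the guess 2*s[n-1] - s[n-2]."""
    warmup = samples[:2]
    residual = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                for n in range(2, len(samples))]
    return warmup, residual


def restore_order2(warmup, residual):
    """Playback side: make the same guess, then add the correction."""
    samples = list(warmup)
    for r in residual:
        guess = 2 * samples[-1] - samples[-2]
        samples.append(guess + r)
    return samples


wave = [0, 10, 19, 27, 34, 40, 45, 49, 52, 54]   # a smooth curve
warmup, res = residual_order2(wave)
print(res)   # [-1, -1, -1, -1, -1, -1, -1, -1]
```

On this smooth curve every guess is off by exactly one, so eight samples become eight tiny corrections, and `restore_order2(warmup, res)` rebuilds the original exactly.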

However, the prediction method assumes there is some sort of 'trend' in the file. If the file is highly random (i.e. with little trend) FLAC can't compress the file. In that case, FLAC will mark (part of) the file as incompressible, and just save it as-is with a small overhead. This is called verbatim encoding.
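
A rough sketch of that "predict or store as-is" decision. It assumes a deliberately crude cost model (every residual value stored at the width of the largest one, plus 5 bits to record that width) rather than FLAC's real Rice-coded cost; the cost model, the 5-bit field and the names are assumptions for illustration only.

```python
def signed_width(v):
    """Bits needed to store v in two's complement."""
    return (v if v >= 0 else ~v).bit_length() + 1


def choose_subframe(samples, bits_per_sample=16):
    """Crude choice between predicted (order-2 residual) and verbatim storage."""
    verbatim_bits = len(samples) * bits_per_sample
    residual = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                for n in range(2, len(samples))]
    width = max((signed_width(r) for r in residual), default=1)
    # 2 warm-up samples stored raw + 5 bits for the width + fixed-width residual
    predicted_bits = 2 * bits_per_sample + 5 + len(residual) * width
    return "PREDICTED" if predicted_bits < verbatim_bits else "VERBATIM"


print(choose_subframe([0, 10, 19, 27, 34, 40, 45, 49, 52, 54]))   # PREDICTED
print(choose_subframe([32767, -32768] * 8))                        # VERBATIM
```

The alternating full-scale signal has no trend to exploit: its residual is larger than the samples themselves, so storing it verbatim wins.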

Hope that helps! Lossless compression involves a lot of math, which can make it hard to understand.

Reply #2 – 2015-11-14 12:47:49

This isn't going to pass the ELI5 test, but here goes.

Another strategy is "stereo decorrelation".

Stereo sound has two separate channels of audio: left, and right. Quite often, the channels are very similar, because we don't really like to listen to music that's got completely different sounds in each ear. So, when the channels are similar ("correlated"), you can do a little trick, either with analog electrical circuitry or with math.

Instead of working with independent left and right channels, you can combine them such that one channel becomes the "mono" info and the other is the "stereo" info (or perhaps you want to think of it as the "stereo-making" info).

First, you mix the left and right together to get a "sum" channel, perhaps easier to think of as the average of the left and right. It's the simplest way of turning stereo into mono, and you can listen to this channel by itself and it will usually sound fine—nothing missing, at least, though there may be some weird sonic effects now and then.

At the same time, you can mix the left and right in a different way—just flipping one of them upside-down before taking the average—in order to get a "difference" channel. If you listen to the difference channel, it will sound weird, like a mono version where all the sounds that were close to the center are now very quiet or nonexistent.

So now instead of left and right (L/R), you have sum and difference. This is also known as middle and side (mid-side or M/S). The more similar ("correlated") the left and right were, the quieter the side channel will be. True mono material, with zero difference between the channels, will have a totally silent side channel, just a flat line, which as mentioned above is going to be very easy to compress, so your mono material can be stored in half the space. Even when the side channel is not completely flat, it's still hovering close to zero, with lots of small numbers which can be compacted more efficiently, saving you tons of space.

What's great about this mid-side representation is that it's completely reversible. You can get your left and right channels back from the mid and side just by doing more math. No loss.
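
In integer form, the mid/side round trip can be sketched like this. The trick of restoring the bit that `>> 1` throws away (it always equals the low bit of the side channel, because L+R and L-R are both even or both odd) is, as far as I know, how FLAC's mid/side mode stays lossless; treat the code as an illustration, not as libFLAC's source. The function names are hypothetical.

```python
def to_mid_side(left, right):
    mid = [(lv + rv) >> 1 for lv, rv in zip(left, right)]   # average, rounded down
    side = [lv - rv for lv, rv in zip(left, right)]          # difference
    return mid, side


def to_left_right(mid, side):
    left, right = [], []
    for m, s in zip(mid, side):
        m = (m << 1) | (s & 1)      # put back the bit that >> 1 dropped
        left.append((m + s) >> 1)
        right.append((m - s) >> 1)
    return left, right


left = [100, 200, -50, 7]
right = [98, 203, -50, 4]
mid, side = to_mid_side(left, right)
print(side)                                        # [2, -3, 0, 3]
assert to_left_right(mid, side) == (left, right)   # nothing lost
```

The side channel is a row of small numbers because the two channels are nearly identical; for true mono it would be all zeros.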

Both lossless and lossy encoders make use of this technique whenever it is more efficient to do so. They can switch back and forth between left/right and mid/side, whatever compresses better, and at playback time any mid/side info is converted back to left & right. The lossy encoders (e.g. MP3 encoders in "joint stereo" mode) will actually do it in such a way that the mid channel gets to have most of the space ("bandwidth") so it's near-perfect, while the side channel gets less room and is lower-quality. The smarter lossy encoders will be very clever about doing this only when overall quality will be acceptable.

This is, more-or-less, how stereo works in broadcast radio, vinyl records, and the AAC+ streaming format. In analog mixing and amplification gear, this technique is also exploited in various ways for noise reduction.

Reply #3 – 2015-11-14 15:20:15

Quote from: jcarruth on 2015-11-14 08:28:48

I am curious about how FLAC works. Although it does compress the size of the files, it uses lossless compression. If it is indeed compressing the file, what exactly does it take out to make this happen, to make the file size smaller?

I’ve read the Wikipedia page on lossless compression, and I am afraid that it is still a bit too technical for me at my current level of understanding.

FLAC uses some of the same technology as general-purpose compression tools like ZIP, which shrink data files by identifying redundant data and replacing it with shorter, unique symbols. Decompression finds those symbols and converts them back into the original data.

For example, English text is composed of words that could be spelled with fewer letters. Most humans are pretty good at identifying the shortened spellings and replacing them with the longer versions.
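
A quick way to see the "redundancy" idea is Python's built-in `zlib` module, which implements DEFLATE, the same family of compression used in ZIP files. The sample text is made up, and exact sizes vary, so none are quoted here.

```python
import os
import zlib

text = b"the cat sat on the mat. " * 200      # very repetitive data
packed = zlib.compress(text)
assert zlib.decompress(packed) == text         # lossless: exact round trip
print(len(text), "->", len(packed))            # shrinks dramatically

noise = os.urandom(len(text))                  # random bytes: no redundancy
print(len(noise), "->", len(zlib.compress(noise)))   # about the same size, often slightly larger
```

Random data has nothing repeated to replace, which is the same wall general-purpose tools hit with audio samples.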

Lossless compression was developed and widely implemented on PCs during the early-to-mid 80s, when CPU power and storage costs made it advantageous. These were the early days of digital audio, so curious souls experimented with lossless compression of audio signals. It was quickly found that, due to its basic nature, audio data could only be compressed to a limited degree with the existing tools. FLAC was an attempt to find out whether more specialized approaches would work.

Lossy compression was developed to overcome the limited compression that is possible using lossless techniques.

Reply #4 – 2015-11-14 21:19:17

Quote from: jcarruth on 2015-11-14 08:28:48

what exactly does it take out to make this happen, to make the file size smaller?

Lossless compression like FLAC does not remove any of the information that was encoded within the original file. It simply rewrites (and encodes) the information to a new data format that takes less space. But now, a FLAC decoder is necessary to read and play the file. The data within the original file is transformed, but the audio information which was being represented by that data is preserved completely.

Reply #5 – 2015-11-14 21:49:41

Quote from: jcarruth on 2015-11-14 08:28:48

I am still a newbie at this. Perhaps you could use the Reddit.com method of ELI5? Which means Explain Like I’m 5 (https://www.reddit.com/r/explainlikeimfive/).

Music isn't fully random, and so future values can be predicted from past values and from other channels (think stereo music where the instruments play in both channels, easy to guess what the other channel will sound like in that case). If you can predict a value exactly from other values, you don't have to transmit it. The decoder can just compute it. That is where the space savings come from.

Reply #6 – 2015-11-15 19:13:02

ktf, mjb2006, Arnold B. Krueger, trout, saratoga,

Thank you for your explanations. I think I have a much better grasp of how FLAC can make a file smaller without actually removing any music.

And thanks to mjb2006, I now have a little idea of how noise reduction works (as in noise-cancellation headphones? I love those.)

I can see that this is not easily explained to a 5-year-old, but you guys did pretty good breaking it down into easily-understandable language, and I appreciate that.

And Saratoga, I think even a 5-year-old would have understood your explanation. Great!

Thanks to all of you.

Reply #7 – 2015-11-16 17:54:45

Nope, it is not easily explained to a 5-yr-old! Not easily explained to me, 58 years older than that!

Lossless compression does not compress the music: it compresses the data. Something like FLAC may be tweaked for the kind of data that is likely to occur in music rather than, say, in text files, but that is the only difference, and the only reason that you flac your music but zip your data.

As to the hows, we can easily understand that, if a particular bit sequence is repeated numerous times, we just substitute a shorter pattern for its subsequent occurrences. Beyond that, it is cryptography and maths ...and I got thrown out of the maths class.

Thought arising: I wonder how well flac works for general computer data. Did anyone ever try?

Reply #8 – 2015-11-16 18:33:00

Quote from: Thad E Ginathom on 2015-11-16 17:54:45

Thought arising: I wonder how well flac works for general computer data. Did anyone ever try?

Not very well, though I didn't write down any figures: https://www.hydrogenaud.io/forums/index.php...st&p=854411

Reply #9 – 2015-11-16 19:34:26

I've actually done some digging into how FLAC works, so I can give specific answers as to how FLAC is able to make things small.

The easiest way it can reduce the file size is by recognizing when the least significant bits (LSBs) are all zero and removing those unnecessary bits. (This is more common in high-resolution audio, where e.g. a 24-bit file may only have data in the top 20 bits.)
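
A sketch of that check, assuming integer samples in a Python list: OR all the samples of a block together, and the number of trailing zero bits in the result is how many low bits can be dropped from every sample. FLAC calls these "wasted bits"; the function name here is hypothetical.

```python
def wasted_bits(samples):
    """Number of low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in samples:
        combined |= s
    if combined == 0:
        return 0            # all-zero block: better stored as a constant anyway
    return (combined & -combined).bit_length() - 1   # count trailing zeros


samples = [x << 4 for x in (1000, -37, 5, 0)]   # 4 unused low bits
k = wasted_bits(samples)
shrunk = [s >> k for s in samples]             # what gets encoded
print(k, shrunk)                               # 4 [1000, -37, 5, 0]
assert [s << k for s in shrunk] == samples     # decoder shifts back
```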

After removing unnecessary bits from the LSB, the next step is stereo decorrelation. Someone else already gave a good explanation of that, so I won't go into detail. No compression happens in this step, but the mid/side representation is often easier to compress than the left/right representation, so the result is a smaller file.

After LSB reduction and stereo decorrelation comes prediction. For each block, FLAC comes up with an equation (the "predictor") that can make a pretty good guess for the value of the next sample based on the values of the previous few samples. It subtracts the guess from the actual value of the sample to come up with the difference (the "residual"). It's similar to stereo decorrelation, except it's taking the difference between the prediction and the actual audio instead of the difference between left and right. Just like stereo decorrelation, no compression happens here.
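
FLAC's "fixed" predictors (orders 0 to 4) use small, fixed coefficient lists; the sketch below tries each one and keeps the order whose residual is smallest in total. The selection rule (sum of absolute residuals) is a simple assumption for illustration, encoders may estimate this differently, and the LPC predictors FLAC also supports are not shown. Names are hypothetical.

```python
# Coefficients of FLAC's fixed predictors, orders 0 to 4 (newest sample first).
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residual(samples, order):
    """Actual sample minus predicted sample, after `order` warm-up samples."""
    coeffs = FIXED_COEFFS[order]
    return [
        samples[n] - sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
        for n in range(order, len(samples))
    ]


def best_fixed_order(samples):
    """Order whose residual has the smallest total magnitude (ties: lowest order)."""
    return min(FIXED_COEFFS,
               key=lambda k: sum(abs(r) for r in fixed_residual(samples, k)))


print(best_fixed_order([5] * 10))                    # 1: flat line
print(best_fixed_order(list(range(0, 30, 3))))       # 2: straight ramp
print(best_fixed_order([n * n for n in range(10)]))  # 3: parabola
```

Each higher order "knows" one more kind of trend, so a ramp is predicted perfectly from order 2 and a parabola from order 3.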

The final step is entropy coding. After the stereo decorrelation and prediction, the remaining data (the residual) should be mostly small numbers. This is where most of the compression happens: instead of using the same number of bits to store each sample's residual, FLAC uses a variable number of bits, where small numbers take fewer bits.

Let's say you have some hypothetical audio format in base-10, where each sample is represented by nine digits. There are no spaces or punctuation in this format, but I'll space it out so you can read it. Here's what some audio might look like in this format:


If prediction worked well, the residual is a bunch of small numbers. There are still no spaces or punctuation, so you still need nine digits for each number, even though they're all small numbers. Here's what the residual might look like:

000000012 000000005 000000000 000000023 000000017 000000101 000000037 000000041 000000017 000000003

Entropy coding is just coming up with a clever way to say how many digits each number needs, so you can make small numbers take less space. There are still no spaces or punctuation, so instead I'll use another digit to say how many digits are in the number.

212 15 0 223 217 3101 237 241 217 13

Even if I remove all of the spaces, you can still figure out what each number should be. You can even try it yourself, if you'd like: 212150223217310123724121713
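
The decimal scheme above can be written out in code. This assumes, as the example stream implies, that a count digit of 0 stands for the value 0, and that every number is non-negative with at most nine digits (so its count fits in one digit). The function names are hypothetical.

```python
def encode(numbers):
    """Prefix each non-negative number with its digit count; 0 becomes just '0'."""
    out = []
    for n in numbers:
        digits = str(n) if n else ""
        out.append(str(len(digits)) + digits)
    return "".join(out)


def decode(stream):
    numbers, i = [], 0
    while i < len(stream):
        count = int(stream[i])                       # how many digits follow
        numbers.append(int(stream[i + 1:i + 1 + count] or "0"))
        i += 1 + count
    return numbers


residual = decode("212150223217310123724121713")
print(residual)   # [12, 5, 0, 23, 17, 101, 37, 41, 17, 3]
assert encode(residual) == "212150223217310123724121713"
```

Ten numbers take 27 digits instead of 90, and the decoder never needs a separator because each count says exactly where the next number starts.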

The entropy coding in FLAC works on a similar principle, although it operates in binary instead of base 10.
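
A binary counterpart, sketched as a Rice code with parameter k: each value is first "folded" from signed to unsigned (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4), then the high part is written in unary (that many 0s, ended by a 1) and the low k bits are written as-is. My understanding is that FLAC folds and codes residuals this way, but the function names are hypothetical and bits are shown as a string for readability.

```python
def fold(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unfold(u):
    return u >> 1 if u % 2 == 0 else -(u >> 1) - 1


def rice_encode(values, k):
    out = []
    for v in values:
        u = fold(v)
        out.append("0" * (u >> k) + "1")                      # quotient in unary, ended by a 1
        if k:
            out.append(format(u & ((1 << k) - 1), f"0{k}b"))  # remainder in exactly k bits
    return "".join(out)


def rice_decode(bits, k, count):
    values, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "0":
            q += 1
            i += 1
        i += 1                                                # skip the terminating 1
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append(unfold((q << k) | r))
    return values


bits = rice_encode([0, -1, 2, 1], k=1)
print(bits, len(bits))    # 10110010010 11  (vs 64 bits as four 16-bit samples)
```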

Any questions?

Reply #10 – 2015-11-16 22:05:22

Quote from: Octocontrabass on 2015-11-16 19:34:26

The easiest way it can reduce the file size is by recognizing when the LSBs are all zero and removing the unnecessary bits. (This is more common in high-resolution audio, where e.g. a 24-bit file may only have data in the top 20 bits.)

What FLAC does not catch (and neither does WavPack) seems to be unused MSBs (most significant bits). AFAIK, DTS on CD leaves the two MSBs zero (in order not to blast static at full volume), so a reduction of 12.5 percent would be easy if they implemented such a check. But FLAC and WavPack fall about a percentage point short (or was it long?), which is fairly decent. TAK is a point weaker. OptimFrog manages to get over 13 percent even in the fastest (uh... least slow) mode.
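
The 12.5 percent figure is just 2 unused bits out of 16. A sketch of the kind of MSB check being discussed (hypothetical, not something FLAC or WavPack does), assuming signed samples whose unused top bits are copies of the sign bit: find the narrowest two's-complement width that still holds every sample in a block.

```python
def bits_needed(samples):
    """Narrowest two's-complement width that holds every sample in the block."""
    return max(((s if s >= 0 else ~s).bit_length() + 1 for s in samples), default=1)


dts_like = [-8192, 8191, 0, -1]            # values that fit in 14 bits
print(bits_needed(dts_like))               # 14
print((16 - bits_needed(dts_like)) / 16)   # 0.125, i.e. 12.5 percent
```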

Reply #11 – 2015-11-17 04:23:32

This was an enjoyable read. Threads like this are why I joined the site. And I love the FLAC format; I think it's my favorite format.

Reply #12 – 2015-11-17 04:47:16

MSB removal probably isn't in FLAC because it's not a very useful compression strategy. When prediction works well, which it does for most audio you'll want to compress, the residual will mostly have more zeroes in the MSB than the input signal. The "wasted" MSBs will remain zero after prediction and be removed by the entropy coder, so the difference between needing to represent 16 bits of range versus 14 will be almost nonexistent.

Adding a special case for white noise (and DTS) with unused MSBs means more work during encoding and decoding, and potentially worse compression on most audio due to the additional parameter that must be signaled.

Reply #13 – 2015-11-17 07:47:37

I kinda understand why one did not want to provide special provisions for DTS on CD, but back then one could not know that it would die out. But there are good and bad arguments, and these are bad:

Quote from: Octocontrabass on 2015-11-17 04:47:16

Adding a special case for white noise (and DTS) with unused MSBs means more work during encoding and decoding, and potentially worse compression on most audio due to the additional parameter that must be signaled.

The operation of padding 14 bits with two zeroes at the beginning of the word: how computationally complex did you say that is? Compared with padding them at the end?
Not to mention that extra field populated with a "2" at the beginning of the file, that takes less space than my ReplayGain tags?

Reply #14 – 2015-11-17 21:36:01

Quote from: Porcus on 2015-11-17 07:47:37

The operation of padding 14 bits with two zeroes at the beginning of the word: how computationally complex did you say that is? Compared with padding them at the end?

The two are roughly equal in complexity, but the potential benefits are very different. Prediction tends to leave noise in the residual's LSBs, which can't be compressed by the entropy coder. Removing LSBs early has a very large effect on the compressed size.

I'm not sure if anyone has ever tried removing the MSBs before prediction. I'm doubtful it would have much of an effect on anything outside of contrived test signals (and DTS), since FLAC's entropy coding already adapts somewhat to the range and distribution of the residual.

I suspect the compression you see with OptimFROG is due to a better entropy coder rather than MSB removal; DTS isn't random, so a very good entropy coder (and predictor?) should be able to reduce it better than white noise.

Quote from: Porcus on 2015-11-17 07:47:37

Not to mention that extra field populated with a "2" at the beginning of the file, that takes less space than my ReplayGain tags?

LSB removal is done on individual blocks, not the whole file. MSB removal, if it were added to the format, would probably be the same. Even if it uses one bit per block (to signal the full bit depth), that's still 80 bytes per minute for CD audio at the "--best" compression level. That's reasonable if it turns out to be more useful than I've anticipated, but I'm not sure it'll be smaller than ReplayGain tags.
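
The "80 bytes per minute" estimate can be checked with a few lines of arithmetic, assuming 44.1 kHz audio, 4096-sample blocks (which is what I believe "--best" uses at CD sample rates) and one extra bit per block.

```python
SAMPLE_RATE = 44100   # CD audio, samples per second per channel
BLOCK_SIZE = 4096     # assumed block size for the "--best" preset

blocks_per_minute = SAMPLE_RATE * 60 / BLOCK_SIZE
bytes_per_minute = blocks_per_minute / 8      # one flag bit per block
print(round(blocks_per_minute), round(bytes_per_minute, 1))   # 646 80.7
```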

Reply #15 – 2015-11-18 09:50:25

Quote from: Octocontrabass on 2015-11-17 21:36:01

I'm not sure if anyone has ever tried removing the MSBs before prediction.

I was sure I had heard something related. TBeck mentions this on OptimFrog: https://www.hydrogenaud.io/forums/index.php...mp;#entry854974
I have no idea whether it searches for scalings less than 1, but it should be trivial to implement once it searches for "arbitrary".

Then I see it was only the experimental mode - but since then, there is a new froggie, which is the one I used now. Too lazy to repeat on an old.

Quote from: Octocontrabass on 2015-11-17 21:36:01

I suspect the compression you see with OptimFROG is due to a better entropy coder rather than MSB removal; DTS isn't random, so a very good entropy coder (and predictor?) should be able to reduce it better than white noise.

Potentially. I interpreted the fact that TAK at -p4m is not close to 12.5% as an indication of the opposite, as it is generally a good encoder. Then OTOH, since it is worse than FLAC on these files (which it generally is not), it could be an idiosyncrasy affecting the entropy encoding part.

Quote from: Octocontrabass on 2015-11-17 21:36:01

LSB removal is done on individual blocks, not the whole file. MSB removal, if it were added to the format, would probably be the same.

There are potentially good reasons to implement wasted bits at file level, since there are "formats" that waste bits ("nobody" stores a 20- or 22-bit signal in anything narrower than a 24-bit container), and that does not change mid-file. But I have to agree that since LSB removal is already considered worth implementing on individual blocks, it makes less sense to have one at file level.
But there are situations where peaks are not normalized to full scale. One is that when FLAC came around there was already this DTS-on-CD format that only uses the bottom 14 bits; the other is when users whose playback systems do not support ReplayGain convert everything to 24 bits with RG normalization applied. I just let fb2k search for %replaygain_track_gain% LESS -6 and it returns more than half of my CD collection. (New algorithm, EBU R128.)

Reply #16 – 2015-11-18 11:53:46

Quote from: Porcus on 2015-11-18 09:50:25

But I have to agree that since LSB removal is already considered worth implementing on individual blocks, it makes less sense to have one at file level.

Doing this at file level is not a good idea: you lose the possibility to stream. Every parameter critical to properly decode a file has to be in the block header.

Reply #17 – 2015-11-18 12:39:46

Quote from: ktf on 2015-11-18 11:53:46

Doing this at file level is not a good idea: you lose the possibility to stream. Every parameter critical to properly decode a file has to be in the block header.

Good point. So, if I store a CD in a 24-bit file, then the device that receives the stream does not even (have to) know that the file is 24 bits?

Reply #18 – 2015-11-18 15:58:49

So much for ELI5

Reply #19 – 2015-11-18 17:35:41

How about this ELI5:
"I want this and this and this candy" can also be said as
"I want 3 of those candy".
But since "candy" is used so often, let's call it "cdy" and so on, which could eventually make
"3 cdy".

Which has little to do with audio, but still...

Reply #20 – 2015-11-18 20:49:37

Quote from: Porcus on 2015-11-18 09:50:25

Quote from: Octocontrabass on 2015-11-17 21:36:01
I'm not sure if anyone has ever tried removing the MSBs before prediction.

I was sure I had heard something related. TBeck mentions this on OptimFrog: https://www.hydrogenaud.io/forums/index.php...mp;#entry854974
I have no idea whether it searches for scalings less than 1, but it should be trivial to implement once it searches for "arbitrary".

Based on the description, it's just a more universal form of LSB removal, where it can also catch a signal that was multiplied by some arbitrary value instead of just a power of 2.
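
A sketch of that "more universal" idea, assuming the simplest possible version: find the greatest common divisor of all samples in a block and divide it out. This is not how OptimFROG does it (its method isn't described in this thread); it only catches exact integer multiples, so a signal that was scaled and then rounded or dithered would need something cleverer. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def common_factor(samples):
    """Largest integer dividing every sample (0 if the block is all zeros)."""
    return reduce(gcd, (abs(s) for s in samples), 0)


def shrink(samples):
    f = common_factor(samples)
    if f <= 1:
        return 1, list(samples)          # nothing to gain
    return f, [s // f for s in samples]  # exact division, so // is safe


def expand(factor, reduced):
    return [s * factor for s in reduced]


samples = [15, -21, 33, 0]   # e.g. a signal multiplied by 3
factor, reduced = shrink(samples)
print(factor, reduced)       # 3 [5, -7, 11, 0]
```

Plain LSB removal finds nothing here, since 3 is odd, while the common-factor check recovers the smaller original values exactly.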

Quote from: Porcus on 2015-11-18 09:50:25

Potentially. I interpreted the fact that TAK at -p4m is not close to 12.5% as an indication of the opposite, as it is generally a good encoder. Then OTOH, since it is worse than FLAC on these files (which it generally is not), it could be an idiosyncrasy affecting the entropy encoding part.

The entropy coder in FLAC is based around the assumption that the data it compresses will have a smooth distribution that favors values close to zero more strongly than values far away from zero. It can adapt somewhat based on how strongly the distribution peaks towards zero-values, but it's not very precise. TAK probably uses a similar entropy coder.
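
The "adapts somewhat" part comes from choosing a Rice parameter k for each stretch of residual: a small k suits values piled up near zero, a larger k suits bigger values. This sketch picks k by brute force; real encoders may estimate it instead, and the range 0 to 14 assumes a 4-bit parameter field with one code reserved. Names are hypothetical.

```python
def rice_bits(values, k):
    """Exact size in bits of `values` Rice-coded with parameter k."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1   # fold signed -> unsigned
        total += (u >> k) + 1 + k             # unary quotient + stop bit + k remainder bits
    return total


def best_rice_parameter(values, max_k=14):
    """Brute-force search for the cheapest parameter."""
    return min(range(max_k + 1), key=lambda k: rice_bits(values, k))


quiet = [0, 1, -1, 2, 0, -2, 1, 0]
loud = [300, -250, 410, -390]
print(best_rice_parameter(quiet), best_rice_parameter(loud))   # 0 9
```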

DTS doesn't have either of those properties: the distribution is not smooth (values used in block headers will be more common), and it doesn't favor values close to zero (aside from the block headers, the compressed data is close to random). I suspect OptimFROG is able to take advantage of the non-smooth distribution.

Quote from: Porcus on 2015-11-18 09:50:25

But there are situations where peaks are not normalized to full scale. One is that when FLAC came around there was already this DTS-on-CD format that only uses the bottom 14 bits; the other is when users whose playback systems do not support ReplayGain convert everything to 24 bits with RG normalization applied. I just let fb2k search for %replaygain_track_gain% LESS -6 and it returns more than half of my CD collection. (New algorithm, EBU R128.)

Prediction usually moves those peaks down even further, so FLAC's entropy coder already adapts to situations where the peak is smaller than the maximum possible value. MSB removal would still improve compression for DTS-on-CD and less-than-full-scale uncompressible noise, since it could chop off the extra MSBs and store the remaining bits uncompressed, but most audio won't see any benefit. (Plus, DTS on CD isn't uncompressed audio, so it probably wasn't a target for FLAC's compression strategies.)

Incidentally, baked-in ReplayGain is one of the situations that TBeck's experimental encoding mode would be especially good for (as long as no dithering is applied).

Reply #21 – 2015-11-19 06:52:23

Quote from: Porcus on 2015-11-18 12:39:46

Good point. So, if I store a CD in a 24-bit file, then the device that receives the stream does not even (have to) know that the file is 24 bits?

I don't think I fully understand your question.

There are quite a few things in the block (frame) header; the specification is at http://xiph.org/flac/format.html#frame and the header stores:
- Blocking strategy (fixed or variable)
- Block size
- Sample rate
- Sample size (bit depth)
- Number of channels and stereo decorrelation algorithm

Then for each channel, you have a subframe header. The wasted bits flag (for zero LSBs) is in fact stored in the subframe header, so it can be different for each channel. The other thing stored in the subframe header is the subframe type (constant, verbatim, fixed predictor or LPC). The constant subframe is just one constant value, the verbatim subframe is unencoded PCM, the fixed predictor subframe holds warm-up samples and a residual, and the LPC subframe holds warm-up samples, predictor coefficients, coefficient precision, coefficient shift and the residual.

The residual itself is the largest part of the block, and has its own header. This header holds the residual coding type and the partition order. Each partition starts with a Rice parameter, followed by the Rice-encoded residual itself.
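
Why partitions help can be sketched by counting bits: splitting the residual into 2**order pieces costs one parameter field per piece, but lets a quiet stretch and a loud stretch each get their own Rice parameter. This is a simplification, assuming equal partitions and 4-bit parameters and ignoring that the first real partition is shortened by the warm-up samples; the names are hypothetical.

```python
def rice_bits(values, k):
    """Exact size in bits of `values` Rice-coded with parameter k."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1   # fold signed -> unsigned
        total += (u >> k) + 1 + k
    return total


def partitioned_cost(residual, order, param_bits=4):
    """Bits for a residual split into 2**order equal partitions, each with its
    own best Rice parameter stored in `param_bits` bits."""
    size = len(residual) >> order
    total = 0
    for p in range(1 << order):
        part = residual[p * size:(p + 1) * size]
        total += param_bits + min(rice_bits(part, k) for k in range(15))
    return total


# A residual that is quiet for a while, then loud (e.g. a drum hit).
residual = [1, -1, 0, 2] * 4 + [300, -280, 350, -310] * 4
print(partitioned_cost(residual, 0), partitioned_cost(residual, 1))   # 324 228
```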

Quite a lot of parameters to store!
