Skip to main content

Notice

Please note that most of the software linked on this forum is likely to be safe to use. If you are unsure, feel free to ask in the relevant topics, or send a private message to an administrator or moderator. To help curb the problems of false positives, or in the event that you do find actual malware, you can contribute through the article linked here.
Topic: New FLAC compression improvement (Read 2583 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

New FLAC compression improvement

Hi all,

Last summer, I've been busy thinking of ways to improve on FLACs compression within the specification and without resorting to variable blocksizes, which are in the spec but might not be properly implemented on many decoders. I discussed this old post with its writer: https://hydrogenaud.io/index.php?topic=106545.50

He wrote a small python script to explore the idea of a integer least-squares solver. It's included as ils.py. I explored this idea, read a lot of literature and came up with a different solution.

The current FLAC LPC stage is a classic, textbook example so to speak. This method was designed for speech encoding. The ILS method SebastianG proposed tries to find a more optimal approach. While the current approach comes close to generating least-squares solutions, this could perhaps be improved by a more direct approach.

However, FLACs entropy encoder doesn't encode 'squared' values, it's cost is more linear. That is why I developed a iteratively reweighted least squares solver, which, by weighing, doesn't come to a least squares solution but a so-called least absolute deviation solution. A proof-of-concept is also attached as irils-calculate-improvement.py. It gives mostly equal results to the current approach on most material, but improves on certain, mostly synthetic material like electronic music.

I even got as far as implementing it in FLAC, which can be found here: https://github.com/ktmf01/flac/tree/irls However, this implementation doesn't perform as well as the python proof-of-concept, and as my free time ran out last summer, I haven't been able to improve on it.

Maybe some people here with love for both FLAC and math might be interested in looking into this. I'll look at this again when  I have some time to spare and a mind ready for some challenge  :D

I'd love to hear questions, comments, etc.
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #1
Well, did you test it in real world scenarios and what were the results? Don't just drop this here and go away :)
Error 404; signature server not available.

Re: New FLAC compression improvement

Reply #2
That is a very good question. I probably didn't explain myself well enough in that regard.

The C code on github is not working properly I think. It isn't an improvement over the existing FLAC codebase. So, I haven't reached any real-world compression gains yet.

However, the Python proof-of-concept is promising, albeit with a caveat. To simplify the proof-op-concept, it only performs simple rice bits calculation, without partitioning, and it does not do stereo decorrelation. So, I can only compare single-channel FLAC files with rice partitioning turned off. This is why the gains in the proof-of-concept might be more than what would be achievable with rice partitioning and stereo decorrelation.

For most material, my FLAC files got about 0.2% smaller (which would be 0.1% of the original WAV size). In one case, with electronic music (Infected Mushroom - Trance Party) I got an improvement of 1.2% (which would be 0.6% of the original file size).

So, I think this is promising, but I haven't been able to achieve this in C yet.

Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #3
Hi all,

Last week I've had some spare time and a mind ready for a challenge, so I took a look at the code (https://github.com/ktmf01/flac/tree/irls). Sadly, it seems it still needs quite a bit of work before it performs as well as current FLAC. I'll share my findings here, so I can read them back later and for anyone interested.

How FLAC has worked since the beginning
So, FLAC uses LPC (linear predictive coding) to predict a sample based on previous samples, and stores the error/residual with a so called rice code. Finding a good predictor is done by modelling the input samples as autoregressive, which means there is some correlation between the current sample and past samples. There is a 'textbook way' of calculating the model parameters (which we'll take as the predictor coefficients) by using the Yule-Walker equations. These equations form a Toeplitz matrix, which can be solved quickly with Levinson-Durbin recursion. While there a few shortcuts taken, this should result in a solution which is close to a least-squares solution of the problem, and is very fast to compute.

What could be improved about this
While this is all good and fun, the predictor resulting from this process is not optimal for several reasons. First, the process minimizes the square error, while for the smallest file size, we want the shortest rice code. Second, as FLAC input is usually audio and not some steady-state signal, the optimal predictor changes over time. It might be better to ignore a (short) part of the input when trying to find a predictor. In other words: it might be better to find a good predictor for half the signal, then a mediocre predictor for all of the signal. Third, minimizing the square error puts an emphasis on single outlier samples, which messes up the prediction of all other samples, while this single sample will not fit in any predictor at all.

What has been already improved
Ignoring a short part of the signal is exactly what a patch I submitted a few years ago does. It added partial_tukey and punchout_tukey windows/apodizations, which ignore part of the signal. This is a brute-force approach, the exact same approach is tried on every block of samples.

What I have been trying last year
Now, last summer, I wanted to try a new approach in finding a predictor. This started out as a different way to find a least squares solution to the problem (without taking shortcuts), but I started to realize least squares is not equal to smallest rice code. As the rice code takes up the most space in a FLAC file, that should be the ultimate aim if I want compression gains. I figured that compared to a least squares solution for a predictor, using a least absolution deviation (LAD for short) solution is more resistant to 'outliers'. As these outliers mess up prediction, I thought this might work well, so I implemented code into FLAC to do this. This is done through iteratively reweighted least squares, or IRLS for short. This code works, but it is very slow and does not (yet) compress better.

What is wrong with this approach
After reviewing my code last week, I realise the LAD (least absolute deviation) IRLS implementation that I made is still not minimizing the rice code. For one, the rice code in FLAC is partitioned, which means that a large error at the beginning of the block can grow the rice code length differently than a large error at the end of a block for example. Depending on the rice parameter, a slightly larger error does not cost any extra bits at all (when the error is smaller than 2^(rice parameter - 1)), it might cost one extra bit (when the error is the same size as 2^(rice parameter)) or it might take a few bits (when the error is much larger than 2^(rice parameter)). I could not find any existing documented IRLS weighting (or a so called norm) that works with rice codes.

What I want to improve about it
So, the next step is to improve the IRLS weighting. This weighting procedure should ideally incorporate knowledge of the rice parameter (as this says something about whether bits can be saved or not) but this is not known during weighting. I think using a moving average* of the residual might be a good way to guess the rice parameter during the weighting stage. I could also use a partitioned average*, but as the number of rice partitions is not known beforehand, just as the rice parameter, the size of the partitions to average on will probably not match the rice partition size, and we might get strange behaviour on partition boundaries. With a moving average window the problem is similar (choosing the size of the window). The optimal window size correlates with the rice partition size, but this size is not known during the weighting stage, but at least this moving average doesn't 'jump' on arbitrary boundaries.

Using a moving average*, the weighting procedure can incorporate knowledge about which errors are considered too large to try and fix, and which are too small to bother with. If the error on a sample is much smaller than the moving average it can be ignored, and if the error on a sample is much larger than the moving average it can be ignored too. Only if the error is in a certain band, it should be weighted such that the next iteration tries to decrease this error.

In the current IRLS weighing scheme is least absolute deviation. To get from the default least squares to least absolute deviation, the weights are the inverse of the residual (1/r). For very small r, the weight 1/r is capped at some value. I can change this value depending on the moving average. Above this value, I will try using a weight of 1/r². The effect I think this will have is sketched in the image I have attached

If this doesn't work, maybe a moving average can instruct the algorithm to ignore a block. This is how the partial_tukey and punchout_tukey work, but here the method is not applied brute-force, but with some knowledge of the samples itself. If the moving average is much larger than the average of the whole block, that part is ignored. This way, short bursts of outliers (for example, the attack of a new tone) can be ignored in the predictor modelling.

*when I say average, I mean average of the absolute value of the residual from the last iteration.

X
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #4
The last few days I've been busy working on this, and there is some progress. Sadly, the code is very slow, and I don't think this will improve much. This improvement fits a certain niche of FLAC users who want maximum compression within the FLAC format and don't care how long encoding takes, perhaps reencoding in the background.

Attached you'll a 64-bit Windows executable compiled by MinGW with -march=native on a Intel Kaby Lake-R processor and a PDF with a graph of my results. I haven't used -march before, but if I understand correctly, this code should run on fairly recent 64-bit CPUs with AVX2 and FMA3. To make testing a little easier, I have added a new preset, -9, which uses the new code.

The following graph shows the results of encoding with (from left to right) setting -5, -8, -8e, -8ep and finally -9.


The new preset is -8e + a new 'apodization function', irls(2 11). It is technically not an apodization, but in this way, the integration into the FLAC tool is pretty clean. The function has two parameters, the number of iterations per LPC order, and the number of orders. So, with irls(2 11), the encoder does two iterations at order 2, two iterations at order 3, all the way to 2 iterations at order 12. Sadly, there is still something wrong in the code, so I would recommend not using anything else then 11 order at this time. Using more iterations is very well possible, and gives a little gain at the cost of much slowdown.

Last time I improved FLAC compression the improvement was about 0.2% with no slowdown. This one is another 0.2%, but at the cost of a 60x slowdown.

NOTE: Please use the attached binary with caution! There has little testing been done on it!
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #5
Not convinced of the usefulness, but interesting yes. Do you have any idea of what orders actually make for the improvement?

But I am curious why your graph wasn't visible here. Hotlinking forbidden?  Anyway, it is at
http://www.audiograaf.nl/misc_stuff/irls-compression.png if any other HA kittehs should think the same:

High Voltage socket-nose-avatar

Re: New FLAC compression improvement

Reply #6
Very interesting stuff but even with a 5900x this is a hard nut.
flac-native -9 indeed compresses the albums i tried better as CUEtools flake -8 while this is slightly better on average as flac -8 ep.
The speed is to slow for my taste to consider it for use in my workflows but it is a very nice idea.
Many thanks for the effort and the working version :)
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #7
To late to edit the above, sorry. I did configure my frontend wrong by keeping -ep in the command.
The above counts for flac-native -9 -ep
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #8
But I am curious why your graph wasn't visible here. Hotlinking forbidden?  ...
Graph shows for me, so I suspect it's something in your browser settings preventing it being shown. It's happened to me before over the years.

Re: New FLAC compression improvement

Reply #9
Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes

flac-native -9
7.586.738.858 Bytes

flac-native -9 -ep
7.581.988.737 Bytes
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #10
The hotlinking is broken because the link is http and the forum is https, and any browser which enforces security rules will block http resources, rather than downgrade the page security.

Re: New FLAC compression improvement

Reply #11
Some unscientific numbers for 29 random CD format albums:
[...]
I was shocked to see that the difference between flake and this new LPC analysis method was so small, but I just realised that is probably because flake uses a smaller padding block by default. I tried to get CUEtools.Flake working here, but for some reason I can't. Wombat, can you check whether padding is indeed smaller with CUEtools.Flake? If there are no tracks longer than 20 minutes in your test, the difference should be 4096 bytes per track.
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #12
You can use my old compile. If you want to use the encoder from the recent CUEtools download you need to copy the additional file Newtonsoft.Json.dll
Not to sure about padding. With foobars Optimize file layout + minimize i get this for a single album:
Total file size decreased by 29953 bytes for the CUEtools -8 version and Total file size decreased by 62721 bytes for the flac-native -9 version. I doubt this is much.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #13
8pe vs 9  (+0.14% compression gain on 1 album).
~55-60x vs ~20x on my laptop with 6 cores/12 threads.

As for me, 9 is ok.

P.S. 9pe brings +0.08% compression gain at ~16x comparing to -9. Ok too.

Re: New FLAC compression improvement

Reply #14
You can use my old compile.
[...]
Not to sure about padding. With foobars Optimize file layout + minimize i get this for a single album:
[...]
You are right. I did a comparison and CUEtools.Flake does a much better job than regular flac. It seems my time would have been better spend figuring out why CUEtools.Flake compressed so much better, but perhaps (hopefully) these gains stack, or my work will be for nothing.

Anyway, I've also rewritten my code to be able to use irls as postprocessing. So, when FLAC has found the best predictor in the normal way (through the regular LPC process, with the partial_tukey and punchout_tukey windows), it will try using that result as a starting point for IRLS, instead of starting over. In the graph this is called irlspost, with the number of iterations between round brackets. Using current -8e + some 4 extra windows (punchout_tukey()) like CUEtools.Flake does, and adding 3 iterations comes close to using -9 but is 4x faster.

X
Music: sounds arranged such that they construct feelings.

 
SimplePortal 1.0.0 RC1 © 2008-2021