HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: ktf on 2020-11-04 20:47:09

Title: New FLAC compression improvement
Post by: ktf on 2020-11-04 20:47:09
Hi all,

Last summer, I've been busy thinking of ways to improve FLAC's compression within the specification and without resorting to variable blocksizes, which are in the spec but might not be properly implemented in many decoders. I discussed this old post with its writer: https://hydrogenaud.io/index.php?topic=106545.50

He wrote a small Python script to explore the idea of an integer least-squares solver. It's included as ils.py. I explored this idea, read a lot of literature and came up with a different solution.

The current FLAC LPC stage is, so to speak, a classic textbook implementation; the method was originally designed for speech encoding. The ILS method SebastianG proposed tries to find a more optimal predictor. While the current approach comes close to generating least-squares solutions, it could perhaps be improved upon by a more direct approach.

However, FLAC's entropy encoder doesn't encode 'squared' values; its cost is more linear. That is why I developed an iteratively reweighted least squares solver which, by reweighting, doesn't arrive at a least squares solution but at a so-called least absolute deviation solution. A proof-of-concept is also attached as irils-calculate-improvement.py. It gives results mostly equal to the current approach on most material, but improves on certain, mostly synthetic material like electronic music.
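For anyone who wants the gist without opening the attached scripts, here is a minimal sketch of the idea (my own simplification, assuming numpy; the attached proof-of-concept does more):

Code:
import numpy as np

def lad_lpc(x, order=8, iterations=10, eps=1e-6):
    # Fit LPC coefficients minimizing sum(|residual|) via IRLS.
    # Each row of A holds `order` past samples, b the sample to predict.
    # Solving weighted least squares with w = 1/|r| makes the weighted
    # squared error equal the absolute error, so the iteration moves
    # from the least-squares solution towards least absolute deviation.
    A = np.array([x[i:i + order] for i in range(len(x) - order)], dtype=float)
    b = np.asarray(x[order:], dtype=float)
    w = np.ones(len(b))
    for _ in range(iterations):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
        r = b - A @ coef
        w = 1.0 / np.maximum(np.abs(r), eps)  # cap 1/|r| for tiny residuals
    return coef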

I even got as far as implementing it in FLAC, which can be found here: https://github.com/ktmf01/flac/tree/irls However, this implementation doesn't perform as well as the Python proof-of-concept, and as my free time ran out last summer, I haven't been able to improve on it.

Maybe some people here with love for both FLAC and math might be interested in looking into this. I'll look at this again when I have some time to spare and a mind ready for some challenge :D

I'd love to hear questions, comments, etc.
Title: Re: New FLAC compression improvement
Post by: itisljar on 2020-11-05 09:38:57
Well, did you test it in real-world scenarios, and what were the results? Don't just drop this here and go away :)
Title: Re: New FLAC compression improvement
Post by: ktf on 2020-11-05 17:16:55
That is a very good question. I probably didn't explain myself well enough in that regard.

The C code on GitHub is not working properly, I think. It isn't an improvement over the existing FLAC codebase. So, I haven't reached any real-world compression gains yet.

However, the Python proof-of-concept is promising, albeit with a caveat. To simplify the proof-of-concept, it only performs a simple Rice bit-count calculation, without partitioning, and it does not do stereo decorrelation. So, I can only compare against single-channel FLAC files with Rice partitioning turned off. This is why the gains in the proof-of-concept might be larger than what would be achievable with Rice partitioning and stereo decorrelation.
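For reference, the kind of cost the proof-of-concept measures looks roughly like this (my own sketch of a standard unpartitioned Rice bit count, not the exact code from the script):

Code:
def rice_bits(residuals, k):
    # Zigzag-fold each signed residual to unsigned, then count
    # unary quotient + stop bit + k remainder bits.
    bits = 0
    for r in residuals:
        u = 2 * r if r >= 0 else -2 * r - 1
        bits += (u >> k) + 1 + k
    return bits

def best_rice_bits(residuals, max_k=14):
    # Cost of the whole block with the single best Rice parameter
    # (no partitioning, matching the simplification described above).
    return min(rice_bits(residuals, k) for k in range(max_k + 1))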

For most material, my FLAC files got about 0.2% smaller (which would be 0.1% of the original WAV size). In one case, with electronic music (Infected Mushroom - Trance Party) I got an improvement of 1.2% (which would be 0.6% of the original file size).

So, I think this is promising, but I haven't been able to achieve this in C yet.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-07 19:03:15
Hi all,

Last week I had some spare time and a mind ready for a challenge, so I took a look at the code (https://github.com/ktmf01/flac/tree/irls). Sadly, it seems it still needs quite a bit of work before it performs as well as current FLAC. I'll share my findings here, so I can read them back later, and for anyone interested.

How FLAC has worked since the beginning
So, FLAC uses LPC (linear predictive coding) to predict a sample based on previous samples, and stores the error/residual with a so-called Rice code. Finding a good predictor is done by modelling the input samples as autoregressive, which means there is some correlation between the current sample and past samples. There is a 'textbook way' of calculating the model parameters (which we'll take as the predictor coefficients) using the Yule-Walker equations. These equations form a Toeplitz matrix, which can be solved quickly with Levinson-Durbin recursion. While a few shortcuts are taken, this should result in a solution which is close to a least-squares solution of the problem, and it is very fast to compute.
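For readers who haven't seen it, the textbook recursion looks roughly like this (a simplified numpy sketch; real FLAC additionally windows the input and quantizes the coefficients):

Code:
import numpy as np

def levinson_durbin(autoc, order):
    # Solve the Yule-Walker (Toeplitz) system for LPC coefficients,
    # given autocorrelation values autoc[0..order].
    err = autoc[0]                 # error of the zero-order predictor
    lpc = np.zeros(order)
    for i in range(order):
        # Reflection coefficient for extending the model by one order
        k = (autoc[i + 1] - np.dot(lpc[:i], autoc[i:0:-1])) / err
        lpc[:i] -= k * lpc[:i][::-1]
        lpc[i] = k
        err *= 1.0 - k * k
    return lpc, err                # predict x[n] ~ sum(lpc[j] * x[n-1-j])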

What could be improved about this
While this is all good and fun, the predictor resulting from this process is not optimal, for several reasons. First, the process minimizes the squared error, while for the smallest file size we want the shortest Rice code. Second, as FLAC input is usually audio and not some steady-state signal, the optimal predictor changes over time. It might be better to ignore a (short) part of the input when trying to find a predictor. In other words: it might be better to find a good predictor for half the signal than a mediocre predictor for all of the signal. Third, minimizing the squared error puts an emphasis on single outlier samples, which messes up the prediction of all other samples, while such a single sample will not fit any predictor at all.

What has been already improved
Ignoring a short part of the signal is exactly what a patch I submitted a few years ago does. It added partial_tukey and punchout_tukey windows/apodizations, which ignore part of the signal. This is a brute-force approach: the exact same windows are tried on every block of samples.

What I have been trying last year
Now, last summer, I wanted to try a new approach to finding a predictor. This started out as a different way to find a least squares solution to the problem (without taking shortcuts), but I started to realize least squares is not equal to smallest Rice code. As the Rice code takes up the most space in a FLAC file, that should be the ultimate aim if I want compression gains. I figured that, compared to a least squares solution for a predictor, a least absolute deviation (LAD for short) (https://en.wikipedia.org/wiki/Least_absolute_deviations) solution is more resistant to 'outliers'. As these outliers mess up prediction, I thought this might work well, so I implemented code in FLAC to do this. This is done through iteratively reweighted least squares (https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares), or IRLS for short. This code works, but it is very slow and does not (yet) compress better.

What is wrong with this approach
After reviewing my code last week, I realise the LAD (least absolute deviation) IRLS implementation that I made is still not minimizing the Rice code. For one, the Rice code in FLAC is partitioned, which means that a large error at the beginning of the block can grow the Rice code length differently than a large error at the end of a block, for example. Depending on the Rice parameter, a slightly larger error does not cost any extra bits at all (when the error is smaller than 2^(Rice parameter - 1)); it might cost one extra bit (when the error is about the same size as 2^(Rice parameter)); or it might take a few bits (when the error is much larger than 2^(Rice parameter)). I could not find any existing documented IRLS weighting (or so-called norm) that works with Rice codes.
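To make that cost structure concrete, the per-residual cost works like this (a sketch of the standard scheme; FLAC's escape codes and partition headers are ignored):

Code:
def rice_cost(r, k):
    # Bits for one signed residual at Rice parameter k
    u = 2 * r if r >= 0 else -2 * r - 1   # zigzag fold to unsigned
    return (u >> k) + 1 + k               # unary quotient + stop bit + k LSBs

# With k = 4: every |error| below 8 costs the same 5 bits, doubling the
# error from there adds roughly one bit, and a real outlier grows
# linearly with its magnitude.
print([rice_cost(r, 4) for r in (1, 7, 8, 16, 64, 256)])
# -> [5, 5, 6, 7, 13, 37]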

What I want to improve about it
So, the next step is to improve the IRLS weighting. This weighting procedure should ideally incorporate knowledge of the Rice parameter (as this says something about whether bits can be saved or not), but that is not known during weighting. I think using a moving average* of the residual might be a good way to guess the Rice parameter during the weighting stage. I could also use a partitioned average*, but as the number of Rice partitions is not known beforehand (just like the Rice parameter), the size of the partitions to average over will probably not match the Rice partition size, and we might get strange behaviour on partition boundaries. With a moving average window the problem is similar: choosing the size of the window. The optimal window size correlates with the Rice partition size, which is not known during the weighting stage either, but at least a moving average doesn't 'jump' on arbitrary boundaries.

Using a moving average*, the weighting procedure can incorporate knowledge about which errors are considered too large to try and fix, and which are too small to bother with. If the error on a sample is much smaller than the moving average it can be ignored, and if the error on a sample is much larger than the moving average it can be ignored too. Only if the error is in a certain band should it be weighted such that the next iteration tries to decrease it.

The current IRLS weighting scheme is least absolute deviation. To get from the default least squares to least absolute deviation, the weights are the inverse of the absolute residual (1/|r|). For very small r, the weight 1/r is capped at some value; I can change this cap depending on the moving average. Above this value, I will try using a weight of 1/r². The effect I think this will have is sketched in the image I have attached.
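In code, the weight update I have in mind would look something like this (a sketch of the idea only - the window size and the lo/hi thresholds are made up, and this is not what the linked branch currently does):

Code:
import numpy as np

def next_weights(residual, window=128, lo=0.25, hi=4.0):
    # Weights for the next IRLS iteration, relative to a moving
    # average of |r| (a stand-in for the unknown Rice parameter).
    a = np.abs(residual)
    avg = np.convolve(a, np.ones(window) / window, mode='same')
    floor = np.maximum(lo * avg, 1e-9)  # avoid dividing by zero on silence
    w = 1.0 / np.maximum(a, floor)      # capped 1/|r|, cap tied to the average
    big = a > hi * avg
    w[big] = 1.0 / a[big] ** 2          # damp hopeless outliers much harder
    return w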

If this doesn't work, maybe a moving average can instruct the algorithm to ignore a part of the block. This is how partial_tukey and punchout_tukey work, but here the method would not be applied brute-force, but with some knowledge of the samples themselves. If the moving average is much larger than the average of the whole block, that part is ignored. This way, short bursts of outliers (for example, the attack of a new tone) can be ignored in the predictor modelling.

*when I say average, I mean average of the absolute value of the residual from the last iteration.

[attached image: sketch of the proposed weighting]
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-16 20:06:44
The last few days I've been busy working on this, and there is some progress. Sadly, the code is very slow, and I don't think this will improve much. This improvement fits a certain niche of FLAC users who want maximum compression within the FLAC format and don't care how long encoding takes, perhaps reencoding in the background.

Attached you'll find a 64-bit Windows executable, compiled with MinGW using -march=native on an Intel Kaby Lake-R processor, and a PDF with a graph of my results. I haven't used -march before, but if I understand correctly, this code should run on fairly recent 64-bit CPUs with AVX2 and FMA3. To make testing a little easier, I have added a new preset, -9, which uses the new code.

The following graph shows the results of encoding with (from left to right) setting -5, -8, -8e, -8ep and finally -9.
(http://www.audiograaf.nl/misc_stuff/irls-compression.png)

The new preset is -8e + a new 'apodization function', irls(2 11). It is technically not an apodization, but this way the integration into the FLAC tool is pretty clean. The function has two parameters: the number of iterations per LPC order, and the number of orders. So, with irls(2 11), the encoder does two iterations at order 2, two iterations at order 3, and so on up to two iterations at order 12. Sadly, there is still something wrong in the code, so I would recommend not using anything other than 11 orders at this time. Using more iterations is very well possible, and gives a little extra gain at the cost of a large slowdown.

Last time I improved FLAC compression (https://hydrogenaud.io/index.php?topic=106545.0) the improvement was about 0.2% with no slowdown. This one is another 0.2%, but at the cost of a 60x slowdown.

NOTE: Please use the attached binary with caution! Little testing has been done on it!
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-17 01:06:23
Not convinced of the usefulness, but interesting yes. Do you have any idea of what orders actually make for the improvement?

But I am curious why your graph wasn't visible here. Hotlinking forbidden?  Anyway, it is at
http://www.audiograaf.nl/misc_stuff/irls-compression.png if any other HA kittehs should think the same:
(https://pics.me.me/thumb_this-is-relevant-to-my-interests-icanhaschee2dorgercom-varied-interests-50253454.png)
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 02:41:21
Very interesting stuff, but even with a 5900X this is a hard nut.
flac-native -9 indeed compresses the albums I tried better than CUEtools flake -8, which in turn is slightly better on average than flac -8 -ep.
The speed is too slow for my taste to consider it for use in my workflows, but it is a very nice idea.
Many thanks for the effort and the working version :)
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 03:51:10
Too late to edit the above, sorry. I configured my frontend wrong by keeping -ep in the command.
The above applies to flac-native -9 -ep.
Title: Re: New FLAC compression improvement
Post by: jaybeee on 2021-06-17 09:09:48
But I am curious why your graph wasn't visible here. Hotlinking forbidden?  ...
The graph shows for me, so I suspect it's something in your browser settings preventing it from being shown. It's happened to me before over the years.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 21:27:44
Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes

flac-native -9
7.586.738.858 Bytes

flac-native -9 -ep
7.581.988.737 Bytes
Title: Re: New FLAC compression improvement
Post by: kode54 on 2021-06-18 04:41:54
The hotlinking is broken because the link is http and the forum is https, and any browser which enforces security rules will block http resources, rather than downgrade the page security.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-18 08:57:17
Some unscientific numbers for 29 random CD format albums:
[...]
I was shocked to see that the difference between flake and this new LPC analysis method was so small, but I just realised that is probably because flake uses a smaller padding block by default. I tried to get CUEtools.Flake working here, but for some reason I can't. Wombat, can you check whether padding is indeed smaller with CUEtools.Flake? If there are no tracks longer than 20 minutes in your test, the difference should be 4096 bytes per track.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-18 14:37:15
You can use my old compile (https://hydrogenaud.io/index.php?topic=106446.msg903939#msg903939). If you want to use the encoder from the recent CUEtools download, you need to copy the additional file Newtonsoft.Json.dll.
Not too sure about padding. With foobar2000's "Optimize file layout + minimize" I get this for a single album:
total file size decreased by 29953 bytes for the CUEtools -8 version, and by 62721 bytes for the flac-native -9 version. I doubt this is much.
Title: Re: New FLAC compression improvement
Post by: IgorC on 2021-06-18 21:36:23
-8pe vs -9 (+0.14% compression gain on 1 album).
~55-60x vs ~20x on my laptop with 6 cores/12 threads.

As for me, -9 is ok.

P.S. -9pe brings +0.08% compression gain at ~16x compared to -9. Ok too.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-19 19:48:26
You can use my old compile (https://hydrogenaud.io/index.php?topic=106446.msg903939#msg903939).
[...]
Not to sure about padding. With foobars Optimize file layout + minimize i get this for a single album:
[...]
You are right. I did a comparison and CUEtools.Flake does a much better job than regular flac. It seems my time would have been better spent figuring out why CUEtools.Flake compressed so much better, but perhaps (hopefully) these gains stack - or my work will have been for nothing.

Anyway, I've also rewritten my code to be able to use IRLS as postprocessing. So, when FLAC has found the best predictor in the normal way (through the regular LPC process, with the partial_tukey and punchout_tukey windows), it will try using that result as a starting point for IRLS, instead of starting over. In the graph this is called irlspost, with the number of iterations between round brackets. Using the current -8e + 4 extra windows (punchout_tukey()), like CUEtools.Flake does, and adding 3 iterations comes close to using -9 but is 4x faster.

[attached graph: irlspost results]
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 08:27:17
I found out why Flake compressed so much better than FLAC. Apparently, this has been a thing for at least 14 years: flake computes the autocorrelation with double precision, and FLAC with single precision. This makes quite a difference on some tracks, especially piano music.
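A toy illustration of the effect (my own demonstration, not FLAC's actual code): on a strongly correlated signal the autocorrelation values are large and nearly equal, so single precision loses exactly the digits the Levinson-Durbin recursion depends on.

Code:
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(65536))   # heavily low-frequency test signal

def autoc(x, maxlag, dtype):
    x = x.astype(dtype)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(maxlag + 1)])

rel_err = np.abs(autoc(x, 12, np.float32) - autoc(x, 12, np.float64)) \
          / np.abs(autoc(x, 12, np.float64))
print(rel_err)   # float32 keeps only ~7 digits of these near-equal values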

See the flac-dev mailing list for the complete story: http://lists.xiph.org/pipermail/flac-dev/2021-June/006470.html

See the graph below for the results.

[attached graph: results]
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 09:55:15
Nice catch!

So the worst-slowdown CPUs for going double would be those with SSE (which get the dark-blue benefits over the green) but no SSE2 (thus getting the light-blue penalty vs the red) - they will, at -8, have a slowdown by a factor of 2.5 or so?
But at a benefit.


Thinking aloud about near-equivalences between "new" and "old" for those worst-case SSE-but-not-SSE2 CPUs:
If I may presume that the most interesting (to end-users!) compression levels are -0, -8 and the default - signifying "speed please!", "compression please!" and "I don't care, it works":
* For the SSE-but-not-SSE2 crowd, going from dark-blue -6 to light-blue -5 would give as good as the same outcome. Of course you don't change the default out of the [light|dark] blue, but for those who choose the default from the "I don't care, it works" camp, it shouldn't matter much if a new build gives them the performance equivalent of bumping -5 to -6.
* And if the using-whatever-defaults users start to care, they can start using options. You didn't include any -3, but what would the light-blue -3 be? Hunch: close to dark-blue -5?
* It is also a stretch to redefine -8 or reassign "--best" to something other than -8 (even if the documentation always said "currently" synonymous with -8), but the -7 suddenly looks damn good - and that goes for the red as well.


So maybe at the conclusion of your efforts one should rethink what the numbers - and "--best" (and "--fast"?) - stand for. -6 to -8 use -r 6, which is not the subset maximum; and 4096 is not the maximum subset blocksize.


(And dare I say it, but as impressive as TAK performs, one may wonder how much of it would be feasible within other formats too.)
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 10:26:25
So the worst-slowdown CPUs for going double, would be those with SSE (which get the dark-blue benefits over the green) but no SSE2 (thus getting the light-blue penalty vs the red) - they will at -8 have a slowdown by a factor of 2.5 or so?
Yes. SSE2 has been around for 20 years and is present on all 64-bit capable CPUs, so this should not affect many users. I'm not sure about the VSX part (for POWER8 and POWER9 CPUs), but I think the penalty will be comparable to the SSE-but-not-SSE2 case if nobody updates these routines before the next release. I don't have the hardware, so I can't develop for it.


Quote
* And if the using-whatever-defaults users start to care, they can start using options. You didn't include any -3, but what would the light-blue -3 be? Hunch: close to dark-blue -5?
-3 disables stereo decorrelation and is in a completely different ballpark. See http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%204.pdf The graph from that PDF is below; see the red circle for -3. Using -3 is pretty much pointless, but perhaps some archaic decoders can only decode without stereo decorrelation. I think I'd rather leave that untouched.

[attached graph from the linked PDF]
Quote
* It is also a stretch to redefine -8 or reassign "--best" to something else than -8 (even if the documentation always said "Currently" synonymous with -8) but the -7 suddenly looks damn good - and that goes for the red as well.
Yes, but the name "best" does imply that it is the best. It isn't: -8e is better and -8ep is even better, but it is the best preset. I wouldn't want to change that.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 11:41:17
the name "best" does imply that is the best. It isn't,

Let's hear it for the good(?) old  --super-secret-totally-impractical-compression-level ! :))
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 12:06:24
But on -3 I think you turned two arguments on their head, or am I just having a bad hair day?

* -3 improves over -2 at both speed and compression simultaneously. That is not pointless, that is good.
If anything of those settings is pointless from a performance perspective, it is -2.

* if your new -3 would happen to turn out as good as the old -5, it would mean that users who want "old -5" performance can happily go "new -3".
But that does not prevent anyone with a special need for -3 to keep using -3.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 12:24:23
Presets -0, -1 and -2 only use fixed subframes. These are really easy to decode. If one wants maximum performance with only fixed subframes, -2 is the way to go. It is the odd one out performance-wise, but I think it is like using TAK's -p0m instead of -p1: slower encoding, same decoding speed and less compression, but there is some limitation in the encoding process which should help decoding. I don't know for sure; I've never studied the TAK format.

Anyway, from a performance perspective, I think -0, -1, -2 and -3 are all pointless. The gain from -4 to -8 is less than 0.5 percentage points (~1%), while the gain from -3 to -4 is 2 percentage points (~4%). However, in the very early days of FLAC support on hardware devices through Rockbox, some levels gave a longer battery life than others, and some devices with very limited hardware didn't run fast enough to decode LPC. There is probably some documentation around referring to FLAC presets in terms of decoding performance. That's why I think the presets should only be changed as long as decoding performance is not affected.

Very archaic indeed, but FLAC has been broadly accepted for quite a long time now, so I guess it's part of the deal.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 12:54:02
But again, allowing users to migrate to a lower-number setting does not hurt any compatibility.

Admittedly, I only speculated through gut-feeling extrapolation as to what your new -3 would be able to do.
But let me just point at the graph you posted. If I understand your light-blue -5 vs dark-blue -6 correctly, it means that an SSE (no SSE2) user who as of today prefers -6 could get a performance hit - which could be as good as eliminated by switching to -5 in the new version.

Going -6 to -5 is ... fine. Going the other way, forcing users to choose a higher number, has a risk - but -6 on an old version to -5 on a new version means they keep their performance and maybe even get slightly better battery life upon decoding. So that is a case for just adopting the double.

Of course it isn't that straightforward - not everyone has a collection represented by your graphs.
Title: Re: New FLAC compression improvement
Post by: Rollin on 2021-06-24 12:58:11
In the very early days of FLAC support on hardware devices through Rockbox, some levels gave a longer battery life than others, and some devices with very limited hardware didn't run fast enough to decode LPC
Can someone name at least one such slow device? According to these results (https://www.rockbox.org/wiki/CodecPerformanceComparison) FLAC -8 decoding is even faster than mp3 decoding on all tested hardware.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 13:11:05
Can someone name at least one such slow device? According to these results (https://www.rockbox.org/wiki/CodecPerformanceComparison) FLAC -8 decoding is even faster than mp3 decoding on all tested hardware.
Seeing that table, I think I remembered wrong. That table was what I remembered but couldn't find: FLAC compression levels being referred to in a benchmark on decoding performance.

Anyway, I'd like the opinion of the people reading this on the following (quote from the mailing list: http://lists.xiph.org/pipermail/flac-dev/2021-June/006470.html)

Quote
Code is here: https://github.com/ktmf01/flac/tree/autoc-sse2 Before I send a pull request, I'd like to discuss a choice that has to be made.
I see a few options
- Don't switch to autoc[] as doubles, keep current speed and ignore possible compression gain
- Switch to autoc[] as doubles, but keep current intrinsics routines. This means some platforms (with only SSE but not SSE2 or with VSX) will get less compression, but won't see a large slowdown.
- Switch to autoc[] as doubles, but remove current SSE and disable VSX intrinsics for someone to update them later (I don't have any POWER8 or POWER9 hardware to test). This means all platforms will get the same compression, but some (with only SSE but not SSE2 or with VSX) will see a large slowdown.

Thanks in advance for your replies and comments on this.

So, switching from single precision to double precision, I'm faced with a choice: should I keep the fast but single-precision routines for SSE and VSX, so those platforms keep the same speed but get no compression benefit? Or should these be removed/disabled so all platforms (ARM, POWER, ia32, AMD64) get the same compression, but at varying costs?
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-24 16:21:55
Very good findings and development, very well presented here, ktf - thanks!
Changing any generic behaviour directly in flac may be hard to sustain if other authors don't like you or your ideas.
This may be a silly idea, but why not optimize flake further and make a ktf flac encoder for enthusiasts that way?
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 23:16:56
Anyway, I'd like the opinion of the people reading this on the following

I think "don't switch" (= do nothing) is the worst. Well, with the reservation that you might want to wait until there is a new version with other improvements too.

Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version?  (I understand from what you mentioned above, those platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines? --SSEonlyplatform?
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-30 09:56:40
Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version?  (I understand from what you mentioned above, those platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines? --SSEonlyplatform?
The first is possible (but I don't see the benefit of doing that?); the second would mean changing the libFLAC interface and sacrificing binary compatibility, so I'd rather avoid that.

Anyway, I've finished polishing the changes and I've sent a pull request through github: https://github.com/xiph/flac/pull/245 Further technical discussion is probably best placed there.

Here's a MinGW 64-bit Windows compile for anyone to test. This does not have the IRLS enhancement and thus no compression level 9 like the previous binary, but it compresses better on presets -3 to -8, mostly on classical music. I've changed the vendor string (I forgot that on the previous binary) to "libFLAC Hydrogenaud.io autoc-double testversion 20210630", so if you try this on any part of your music collection, it is possible to check which files have been encoded with this binary.

Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-30 18:02:37
First is possible (but I don't see the benefit of doing that?)
If I understood correctly, all CPUs that would get the slowdown are 32-bit CPUs. Making the change for the 64-bit executable will be a benefit to several processors and not disadvantage anyone. If I got it right?

Then afterwards one can decide what to do for the 32-bit executable.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-07-01 03:20:47
Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing
This SSE improvement indeed is a very nice finding!
The compile you offer ends up, for the 29 CDs, with a size of:
7.592.544.746 Bytes
The small difference against flake is mainly due to padding this time, I guess.
Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-07-01 11:45:02
The difference was about a megabyte per CD. Now it is down to a megabyte in total. Dunno if you use tracks or images.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-07-02 18:22:15
I tried some 24/96 albums and the new binary improved compression very well. Even better than CUEtools flake.
Really looking forward to a version where you add the IRLS weighting back in, via a dedicated switch for example.
Maybe also interesting: the -8 -ep size for these 29 albums:
7.584.945.065 Bytes
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-07-02 21:19:48
Really looking forward to a version where you add the IRLS weighting back in, via a dedicated switch for example.
Sadly, the gains do not stack well. Most of the gains from the IRLS patch are the same gains that this autocorrelation precision doubling also gets. It still helps, but not as much as I posted earlier.

On the upside, I've now programmed with intrinsics, and it went pretty well. It was a nice challenge, so maybe I can speed up the IRLS code quite a bit to compensate for the lower gains. Also, the autocorrelation precision is relevant in levels -3 through -7 as well, and -5 is relevant to the numbers in the Hydrogenaud.io wiki Lossless comparison (https://wiki.hydrogenaud.io/index.php?title=Lossless_comparison)  :))
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-07-03 16:03:49
Looks like http://www.audiograaf.nl/downloads.html has not seen its final edition yet   :)
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-01 15:37:30
@ktf , you are probably more busy looking into https://hydrogenaud.io/index.php?topic=121349.msg1001309 , but still:
I've tested your double-precision build a bit, back and forth across compression levels, on a varied (but missing chartbusters) corpus of 96k/24 files; 96/24 because I conjecture FLAC was primarily developed for CDDA, so I tried something else to look for surprises. All stereo, though. Comparison done against official flac.exe 1.3.1.

Your -5 does pretty well I think. Results at the end, but first: minor curious things going on:

* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output. The same is the case for 1.3.1: "-A tukey(0,5)" makes no difference.
(My locale uses comma for decimal separator. By using the wrong one I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3). Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well known? Also, from observation, the -8 order scores better than permutations of it.

* It seems that -b 4608 improves slightly - however I couldn't reproduce that on CDDA signals.  Also, 16384 did not improve over 8192 on the 96/24 corpus.


So, the improvement figures. It improves quite a bit on -5. Note: sizes are computed on the entire corpus, but times are not and are just an indication; for times, I only took a single album (NIN: The Slip) over to an SSD, hence the low numbers, and they were run only once.

* PCM is ~ 13.6 GB, compresses to 7.95 GB using 1.3.1 -5, and from then on we have MB savings as follows
80 MB going 1.3.1 -5  (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent, or 1.9 percentage points, at around zero speed penalty. Oh, and the new -7 clocked in at 42 seconds; compare the old -8's 43.

* I tested a bit of  -e and  -p on both new and old. A minor surprise:
- -5 at 1.3.1: -p (41 seconds) saves 2 MB, -e (instead of -p, of course; 36 seconds) saves 35.
- -5 at yours: both quite pointless, savings 0.4 MB (taking 45 seconds) and 0.6 MB (39 seconds).
Your build from -8 -b 8192 -r 8 (64 seconds, only one more than -8): adding "-p" (220 seconds) saves 2 MB, adding "-e" (instead, takes 214 seconds) saves 11.
So your "-5" picks up nearly all the advantages of -p or -e ... ?!

* For the hell of it, I started a comparison of -8 -b 8192 -l 32 -p -e, mistakenly thinking it would finish overnight ... I aborted it after a quarter run, having done only the "Lindberg" part below, and the autoc-double seems to improve a little less here than in -8 mode. To give you an idea of what improves or not, fb2k-reported bitrates for that part:
2377 for 1.3.1 -5
2361 for 1.3.1 -8
2349 for 1.3.1 -8 -b 8192 -l 32 -p -e
2329 for your -3
2314 for your -5
2310 for your -8
2305 for your -8 -b 8192 (roundoff takes out the -p or -e improvements)
2302 for your -8 -b 8192 -l 32
2301 for your -8 -b 8192 -l 32 -p -e


I can give more details and more results but won't bother if all this is well in line with expectations.
FLAC sizes: I removed all tags from the .wav files, used --no-seektable --no-padding and compared file sizes, since differences were often within fb2k-reported rounded-off bitrates.
Corpus (PCM sizes, -5 bitrates)
3.42 GB -> 2314 various classical/jazz tracks downloaded from the Lindberg 2L label's free "test bench", http://www.2l.no/hires/
2.64 GB -> 2161 the "Open" Goldberg Variations album, https://opengoldbergvariations.org/
3.18 GB -> 3035 Kayo Dot: "Hubardo" double album (avant-rock/-metal with all sorts of instruments)
1.41 GB -> 2913 NIN: The Slip
1.24 GB -> 2829 Cult of Luna: The Raging River (post-hc / sludge metal)
1.14 GB -> 2588 Cascades s/t (similar style, but not so dense, a bit more atmosphere parts yield the lower compressed bitrate)
0.57 GB -> 2724 The Tea Party: Tx 20 EP (90's Led Zeppelin / 'Moroccan Roll' from Canada)
Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-08-02 12:54:01
* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3) .  Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well-known? Also from the observation that the -8 order scores better than permuting.
Some observations by trial and error:

The order of the windows works best from widest to narrowest in terms of the time domain. For example, windows that don't take arguments, from widest to narrowest, are Rectangular, Welch, Triangular/Hann, Blackman, then Flat top.
https://en.wikipedia.org/wiki/Window_function#A_list_of_window_functions
The left (blue) plots are time domain, and the right (orange) ones are frequency domain; windows that occupy more of the blue region are wider.

Windows taking arguments, like Tukey, when used repeatedly also work best from widest to narrowest. I believe Tukey(0) is the same as Rectangular and Tukey(1) is the same as Hann, according to this:
https://www.mathworks.com/help/signal/ref/tukeywin.html

I don't understand partial Tukey and punchout Tukey; they look like combinations of several Tukey windows, judging by the source code.

-e may adversely affect compression ratio with the above ordering (plus it is super slow anyway).

-6 to -8 already specify one or more windows, so don't use them if you are planning to use a custom window ordering.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-02 13:57:53
-e may adversely affect compression ratio with the above ordering

So the above test runs indicate the opposite: it improves every album, even at hard settings with the "good order", but - and this is the surprise - the improvement is near zero at the new build's "-5".
Gut feeling is that the new -5 is the oddball, and in a benevolent way, in that it picks up a lot of improvement without resorting to -e or -p.

I have very limited experience with -e and -p for this obvious reason:
(plus it is super slow anyway)
On this machine, -e is not as slow as -p, and it does better (though not for every album; going by fb2k-reported bitrates, I have yet to see -p being more than 1 kbit/s better).
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-04 08:29:18
* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output.  So is the case for 1.3.1:  "-A tukey(0,5)" makes no difference.
(My locale uses comma for decimal separator.  By using the wrong one I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)
That is because -A tukey(0,5) is the default. So, both should produce bit-identical output.

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3) .  Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well-known? Also from the observation that the -8 order scores better than permuting.
It is not something well known. However, the differences should be very, very small. The reason the order might matter is that FLAC estimates the frame size for each apodization; it does not fully calculate it. If two apodizations give the same frame size estimate, the one that is evaluated first is taken. The estimate might be a bit off though, which means that swapping the order can change the resulting filesize.

At least, that is how I understand it. This means that this influence should be minor, as two equal estimates do not occur often and the actual difference should not be large, as the estimate is usually quite good. There could be something else at work though.

* PCM is ~ 13.6 GB, compresses to 7.95 GB using 1.3.1 -5, and from then on we have MB savings as follows
80 MB going 1.3.1 -5  (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent or 1.9 percent points and at around zero speed penalty. Oh, and new -7 clocked in at 42 seconds, compare to old -8.
Interesting. This is too little material to go on, I think, but the change from float to double for the autocorrelation calculation had the most effect on classical music, and almost none on more 'noisy' material. For example, see track 11 in this PDF: http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics-per-track.pdf which is also NIN. There is almost no gain. That it does work well with The Slip has mostly to do with the higher bit depth (24-bit) and not so much with the higher samplerate, I'd say.

Quote
So your "-5" picks up nearly all the advantages of -p or -e ... ?!
-e does a search for the best order; without -e, it uses a by-product of the construction of the predictor to guess the best order. It could be that the higher precision also results in a better guess, but since the release of 1.3.1 there has also been another change to this guess: https://github.com/ktmf01/flac/commit/c97e057ee57d552a3ccad2d12e29b5969d04be97

I can only guess why -p loses its advantage. Perhaps because the predictor is more accurate, the default high precision is used better and searching for a lower precision does not trade off well anymore?
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-04 08:32:58
fb2k-reported bitrates here. Not only is -5 nearly catching up with -6, but look at -4. And: -7 nearly catching -8.
If it were solely due to material, one would have expected 1.3.1 to show the same. Not quite. (There, -6 is quite good; that is in line with previous anecdotal observations.)


2644 for -3 (useless or not, your build's -3 produces smaller files than 1.3.1 does at -8)
2606 for -4, which shaves off 38 from -3 (2703 with 1.3.1, improving 20 from its -3)
2602 for -5, only improves 4 over -4 (but: 2692 with 1.3.1)
2599 for -6, only improves 3 over -5 (1.3.1: 2674, improves by 18 over -5)
2590 for -7, improves by 9 over -6 (1.3.1: 2671, a small improvement over -6)
2590 for -8, calculated improvement 0.48 (1.3.1: 2666, improves more over -7 than -7 does over -6)
2580 for -8 -b 8192 -l 32, which is subset because 96 kHz (1.3.1: 2670, worse than -8)


And then facepalming over myself:
"-7 without apodization"
save for the inevitable default, doh.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-04 10:03:45
All right, you posted while I thought the error message I got was that I had been logged out. Then since you mentioned something about what should benefit, I looked over the various albums, and I have seen the ten percent improvement mark!
And that is not classical or anything, it is the Tea Party four-song EP (yes the full EP, not an individual track). So I went for the current Rarewares 1.3.3 build, which does no better than 1.3.1.
Major WTF! You'll get a PM in a minute.


As for the order of apodization functions: yes, small difference. 0.04 percent to 0.08 percent.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-04 20:07:06
After doing some research with @Porcus, it became clear that I was wrong on the following:

That it does work well with The Slip has mostly to do with the higher bit depth (24-bit) and not so much with the higher samplerate, I'd say.

As it turns out, it is the high samplerate and not the bit depth that makes the difference. However, further research showed that it is actually the absence of high-frequency content in high-samplerate files that makes this difference. As a test, I took a random audio file (44.1kHz/16-bit) from my collection and encoded it with and without the double-precision changes, and the difference in compression was 0.2%. When I applied a steep 4kHz low-pass to the file, this difference rose to 10%. To be clear, this is not the difference between the input file and the low-passed file, but the difference between the two FLAC binaries on the same low-passed file.
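One way to see why (my own reasoning, with a toy example assuming numpy and scipy): low-passing makes neighbouring samples nearly identical, so the Yule-Walker matrix becomes nearly singular, and its condition number tells you roughly how many digits the solver loses - single precision only has about seven to begin with.

Code:
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import butter, lfilter

rng = np.random.default_rng(0)
x = rng.standard_normal(65536)            # full-bandwidth noise
b, a = butter(8, 4000 / 22050)            # steep-ish 4 kHz low-pass at fs=44.1k
y = lfilter(b, a, x)

def yule_walker_cond(sig, order=12):
    r = [np.dot(sig[:len(sig) - k], sig[k:]) for k in range(order)]
    return np.linalg.cond(toeplitz(r))    # conditioning of the Toeplitz system

print(yule_walker_cond(x), yule_walker_cond(y))  # low-passed is far worse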
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 09:20:03
Two more tests started when I went away for a few days.

* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?). Also tested against compression_level 12.
I don't know how subset-compliant ffmpeg is on hirez ... also, comparing file sizes could be kBs off; I had to run metaflac removing everything on the ffmpeg-generated files - and even then it turns out that there would still be bytes to save with fb2k's minimize-file-size utility.

* The other test was to see how close "sane" settings get to insane ones. For that purpose I let it run (for six days!) compressing at -p -e --lax -l 32 -r 15 -b 8192 -A [quite a few]. TL;DR for that test: your new build gets much closer; the return on going bonkers is smaller with the double-precision build than it was with 1.3.1. Not unexpected, as better compression means closing in on the theoretical minimum.


Results:

1.3.1:
8066 MB for -8, that is actually 0.15 percent better than -8 -b 8192 -l 32 (also subset!).
7966 for the week-long (well, four days) max-lax insanity; that shaves off more than a percent and ... and finally gets 1.3.1 to beat TAK -p0, which is 8017, this "for reference"  O:)

The double precision build, and ffmpeg:
7873 MB for -5
7836 MB for -8
7831 MB for ffmpeg at compression_level 8 (fb2k reports 2 kbit/s better than your -8)
7815 MB for -8 --lax -b 8192 -r 15
7809 MB for ffmpeg at compression_level 12
7808 MB for -8 -b 8192 -l 32 (subset! -l 32 does better than -r 15, and 1 kbit/s better than ffmpeg)
7779 MB for the five days max lax insanity setting.

So are there differences in how ffmpeg does?
-8 vs compression_level 8: no clear pattern. Of seven files, ffmpeg wins 4, loses 3. Largest differences: ffmpeg gains 17 for Cascades, yours gains 15 for Nine Inch Nails.
-8 -b 8192 -l 32 vs compression_level 12: ffmpeg wins 3, loses 4, but the largest differences favor ffmpeg: 41 for The Tea Party. Again yours has the upper hand, by 13, on NIN.

Now which signals *do* improve from the insane setting? Absolutely *not* the jazz/classical - nor the highest-bitrate signal (Kayo Dot); they are within 5 kbit/s of -8.
It is The Tea Party, gaining 98 kbit/s over the subset, which again gains 46 over -8. 

So the TTP EP indicates there *is* something to be found for music that is not extreme.  Note that TTP is the shortest (in seconds) file of them all, and that could make for larger variability.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 09:26:27
By the way, TBeck speaks about splitting windows and that this is possible within the FLAC spec (https://hydrogenaud.io/index.php?topic=44229.msg402752#msg402752). This is beyond me (I stopped learning Fourier analysis long before getting hands-on and I totally suck at code), but ... ?
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-12 14:15:15
Thanks for digging that one up. Possibility A is what I implemented quite a few years ago in FLAC 1.3.1: partial_tukey and punchout_tukey. TBeck was talking about using various variations of the triangle window; this works a little differently. partial_tukey uses only a part of the signal (hence the name) for LPC analysis; punchout_tukey masks a part of the signal (it 'punches out' a part of the signal) by muting it for LPC analysis. This works exactly as described in the post you link:

If so, then one frame will often contain parts with different signal characteristics, which had better be predicted separately. But this does not happen.

This is my hypothesis as to why windowing helps FLAC that much: it suppresses the contribution of one or two (potential) subframes at the frame borders to the predictor calculation and hence improves the quality of the (potential) subframe within the center. At least this one now gets "cleaner" (not polluted by the other subframes) or better adapted predictor coefficients, and overall the compression increases.

Possibility B would make use of a variable blocksize. This is possible within the FLAC format, and flake (and cuetools.flake) implement this. However, as this has never been implemented in the reference encoder, it might be that there are (embedded) FLAC decoders that cannot properly handle this. I have wanted to play with variable blocksizes for a long time, but if it were to succeed, this might create a division of "old" fixed-blocksize FLAC files and "new" variable-blocksize FLAC files, where the latter is unplayable by certain devices/software.

I cannot (yet) substantiate this fear. Moreover, implementing this in libFLAC would require *a lot* of work. Perhaps it would be better to experiment with new approaches in cuetools flake, if I run out of room for improvements in libFLAC
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 14:58:42
Yeah, so the following musing is probably just ... not much pursuing, but that hasn't stopped me from thinking aloud:

* So, given the worry that variable block size won't decode well, maybe the "safer" way would be, if not formally restricting "subset" to require fixed block size, then in practice sticking to it as the default in the reference encoder, so that variable block size, if supported, has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 decodes fine even at sample rates where it is subset?

* But within fixed block size and within subset, it is possible to calculate/estimate/guesstimate what block size is best. No idea if there is a quick way of estimating without fully encoding the whole file. But there seems to be a sweet spot - it is not simply that larger is better, in that 4096 apparently beats 4608 and 8192 beats 16384.
Now that might be due to the functions being optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?

(... is it then so that increasing n in partial_tukey(n) allows further splitting up?)


Anyway, CUETools flake may in itself support non-CDDA, but I doubt that it will be used outside CDDA when CUETools is restricted that way. Meaning, you might not get much testing done. ffmpeg, on the other hand ... ?
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-13 15:07:51
* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?)  Also tested against compression_level 12. 
Perhaps I will. As far as I know, ffmpeg's FLAC is based on Flake, like Cuetools.flake. However, Cuetools' flake has seen some development (like implementation of partial_tukey and punchout_tukey), unlike ffmpeg.

* So, given the worry that variable block size won't decode well, maybe the "safer" way would be to, if not formally restricting "subset" to require fixed block size, then in practice stick to it in the reference encoder as default, so that if variable block size is supported it has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 may decode fine even for sample rates when it is subset?
Yes, it might be unsafe to use -l 32 -b 8192 when encoding for maximum compatibility. However, properly decoding -l 32 -b 8192 is quite simple, as it is simply more of the same (longer block, longer predictor). Also, it is part of the FLAC test suite.

Variable blocksizes are not part of the FLAC test suite, and quite a few things change. For example, with a fixed blocksize, the frame header encodes the frame number, whereas with a variable blocksize, the sample number is encoded.

Quote
* But within fixed block size and within subset, it is possible to calculate/estimate/guesstimate what block size is best. No idea if there is a quick way of estimating without fully encoding the full file. But there seems to be some sweet-spot, not so that larger is better, in that 4096 apparently beats 4608 and 8192 beats 16384.

The problem is that it is probably impossible to know upfront what the optimal blocksize is for a whole file.

Quote
Now that might be due to the functions being optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?
I cannot answer that question without just trying a bunch of things and seeing. I will explain the idea behind these apodizations (which I will call partial windows).

I would argue that a certain set of partial windows @ 44.1kHz with blocksize 4096 should work precisely the same with that file @ 88.2kHz with a blocksize of 8192.

partial_tukey(2) adds two partial windows, one in which the first half of the block is muted and one in which the second half of the block is muted. partial_tukey(3) adds three windows: one in which the first two-thirds of the block is muted, one in which the last two-thirds is muted, and one in which the first and last thirds are muted.

punchout_tukey does the opposite of partial_tukey. Using punchout_tukey(2) and partial_tukey(2) together makes no sense, because you get the same windows twice: punchout_tukey(2) creates the same two windows, just swapped. punchout_tukey(3) adds three windows: one with the first third muted, one with the second third muted and one with the last third muted.
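If the description is hard to picture, here is roughly how the masks divide a block (my own simplification to hard rectangular masks; the real windows taper the edges with a Tukey shape):

Code:
import numpy as np

def partial_windows(blocksize, n):
    # Simplified partial_tukey(n): n masks, each keeping one
    # contiguous 1/n-th of the block and muting the rest.
    edges = np.linspace(0, blocksize, n + 1, dtype=int)
    wins = []
    for i in range(n):
        w = np.zeros(blocksize)
        w[edges[i]:edges[i + 1]] = 1.0
        wins.append(w)
    return wins

def punchout_windows(blocksize, n):
    # Simplified punchout_tukey(n): the complement - each mask mutes
    # one 1/n-th and keeps the rest (for n=2 this just swaps the two
    # partial masks, which is why combining them makes no sense).
    return [1.0 - w for w in partial_windows(blocksize, n)]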

If a block consists of a single, unchanging sound, partial_tukey and punchout_tukey do not improve compression. If a block has a transient in the middle (for example, the attack of a new note at roughly the same pitch), punchout_tukey(3) adds a window in which the middle third with this transient is muted. The LPC stage can then focus on accurately predicting the note and not bother with the transient. This is why adding these apodizations improves compression. As TBeck put it, the signal is cleaner. Of course this transient still has to be handled by the entropy coding stage, but a predictor that does part of the block very well beats one that does all of the block mediocrely.

So, choosing how many partial windows to add depends on the number of transients in the music, the samplerate and the blocksize. If the samplerate doubles, the blocksize can be doubled while keeping the same number of partial windows. If the samplerate doubles and the blocksize is kept the same, the number of partial windows can be halved. If the samplerate is kept the same and the blocksize is doubled, the number of partial windows should be doubled.

However, it depends on the music as well. Solo piano music at a slow tempo won't benefit from more partial windows, as there are few transients to 'evade'. Non-tonal music, on which prediction doesn't do much, won't benefit either. Very fast-paced tonal music might benefit from more partial windows. The current default works quite well on a mix of (44.1kHz 16-bit) music.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-13 15:56:13
* All right, I got the punchout now, thanks!


* I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test at that too.


* You mentioned solo piano, so let me mention the following: the Open Goldberg Variations routinely does not benefit from increasing the Rice parameter value (max; I never set min). By that I mean that settings that only differed by -r, say -r 6 vs -r 8, would often yield the same file (same .sha1) for that album. Yes, a full-album .flac file; I did not bother with tracks here.


* Meanwhile, I started another test using ffmpeg -compression_level 8 -lpc_type cholesky.  It improves over -compression_level 8 for every sample, -3 kbit/s on the total.  So at the ffmpeg camp they have been doing something bright.


* And then, since you posted in the https://hydrogenaud.io/index.php?topic=120906 thread, reminding me of that thing, I tried this new build on test signals: sine waves (at peak near full scale) - or tracks that have four sine waves in succession (peaks .72 to .75).

So even tracks that are damn close to a continuous sine benefit from double precision - but even then, ffmpeg beats it. Results:
-5 results:
388 for ffmpeg at -compression_level 5
376 for 1.3.1 at -5
362 for double precision at -5
-8:
360 for 1.3.1 at -8
343 for double precision at -8
330 for ffmpeg at -compression_level 8.  Same as for your build with -p or -e (-p -e is down to 321).
314 for ffmpeg -compression_level 8 -lpc_type cholesky

Lengths and track names - the "1:06" vs the final "1:04" must be gaps appended. Total length 9:16 including 12 seconds gap then.
1:38 Mastering calibration - 1 kHz
1:38 Mastering calibration - 10 kHz
1:38 Mastering calibration - 100 Hz
1:06 Frequency check - 20, 32, 40 & 64
1:06 Frequency check - 120, 280, 420, 640
1:06 Frequency check - 800, 1200, 2800 & 5000
1:04 Frequency check - 7500, 12000, 15000 & 20000
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-08-13 16:27:04
CUETools flake supports at least 24/192. When CUEtools was new I reported problems with high-bitrate material to Grigory Chudov and he fixed them immediately.
If you want to test recent behaviour, just use my 2.16 encoder compile. AFAIK nothing has changed in the flake encoder since.
Regarding non-default blocksizes: my Slimdevices Transporter, for example, can't play 24/96 material with 8192.
Title: Re: New FLAC compression improvement
Post by: Rollin on 2021-08-13 17:28:43
I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test at that too.
Yes, it does support higher resolutions. It is easy to use it as custom encoder in foobar2000.
(https://i.imgur.com/WreabtQ.png)
-q -8 --ignore-chunk-sizes --verify - -o %d
For compression levels higher than 8, --lax should be added to command line.

ffmpeg beats
Notice that ffmpeg uses a non-default block size: 4608 for 44100/16 with compression_level 8. And there is no option in ffmpeg to set the block size.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-13 20:45:10
More testing with CUETools.Flake.exe -8 -P 0 --no-seektable --and-maybe-some-more-options, after I (thanks to Rollin!) found out that yes it is an .exe and not just a .dll ... I don't feel smart right now.

Results given for three groups of material, all figures are fb2k-reported bitrates.
Maybe the most interesting news:
No miracles from variable block size (--vbr 4, at one instance also --vbr 2).  


* First, those test signals where 1.3.1 ended up at 360: four CUETools.Flake.exe runs all ended up at 359 (with or without --vbr 4, with or without -t search) - that is not on par with ktf's double precision build

* Then the 96k/24 corpus, with the Flake results slotted in at the relevant places; all encoders that are not specified are ktf's double precision build - to be clear, the libFLAC Hydrogenaud.io autoc-double testversion 20210630

2666 for reference flac.exe 1.3.1 at -8
2644 for -3
2609 for CUETools.Flake.exe at -8
2607 for CUETools.Flake.exe at -8 --vbr 4 -t search -r 8 -l 32
2606 for -4
2605 for CUETools.Flake.exe at -8 --vbr 4 (better than throwing in -t search -r 8 -l 32)
2602 for -5
2599 for -6
2590 for -7
2590 for -8 (0.48 better than -7, calculated from file size)
2588 ffmpeg -compression_level 8
2585 ffmpeg -compression_level 8 -lpc_type cholesky
2584 for -8 -b 8192 -r 8 -p
2584 for --lax -r 15 -8 -b 8192
2581 for -8 -b 16384 -l 32
2581 for -8 -b 8192 -r 8 -e (slightly smaller files than the "2581" above)
2581 ffmpeg  -compression_level 12 (again slightly smaller files than previous 2581)
2580 for -8 -b 8192 -l 32 (subset too, notice 8192 better than 16384)
2571 for --lax -p -e -l 32 -b 8192 -r 15 -A enough-for-five-days
2563 for TAK at the default -p2.


* And, here are some results for one multi-channel (5.1 in 6 channels) DVD-rip (48k/24), 80 minutes progressive rock:
4119 for 1.3.3 at -8
4091 for ktf's -8
4088 for ffmpeg -compression_level 8
4080 for ffmpeg -compression_level 8 -lpc_type cholesky
4065 for CUETools.Flake.exe at -8, and also at -8 --vbr 2 and at -8 --vbr 4 (I overwrote without checking file size differences)
And lax options, not sure what to make of them:
4044 for ffmpeg -compression_level 12 -lpc_type cholesky
4023 for CUETools.Flake.exe at -11 --lax
4006 for ktf's -8 -b 8192 -l 32 -r 8 --lax



Notice that ffmpeg uses a non-default block size: 4608 for 44100/16 with compression_level 8.
With reference FLAC I did not get 4608 to improve over 4096 at CDDA.  Only very rough testing (but more than this test CD ...). 
Also with 96k/24 I did not get 16384 to improve over 8192.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-14 17:18:01
* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?)  Also tested against compression_level 12. 
Perhaps I will. As far as I know, ffmpeg's FLAC is based on Flake, like CUETools' flake. However, CUETools' flake has seen some development since (like the implementation of partial_tukey and punchout_tukey), unlike ffmpeg.
Apparently, I was too quick to dismiss ffmpeg. As can be read https://hydrogenaud.io/index.php?topic=45013.msg412644#msg412644 (https://hydrogenaud.io/index.php?topic=45013.msg412644#msg412644), flake's developer actually did quite a bit of development after integrating flake into ffmpeg. Others did too.

It turns out this Cholesky factorization (enabled with -lpc_type cholesky and iterated with the option -lpc_passes) does pretty much what the IRLS approach this thread started with does. I am truly surprised by this, especially as it has been there since 2006 with apparently nobody here at hydrogenaudio using it. I quote from https://github.com/FFmpeg/FFmpeg/commit/ab01b2b82a8077016397b483c2fac725f7ed48a8 (emphasis mine)
Quote
optionally (use_lpc=2) support Cholesky factorization for finding the…

… lpc coeficients

  this will find the coefficients which minimize the sum of the squared errors,
  levinson-durbin recursion OTOH is only strictly correct if the autocorrelation matrix is a
  toeplitz matrix which it is only if the blocksize is infinite, this is also why applying
  a window (like the welch winodw we currently use) improves the lpc coefficients generated
  by levinson-durbin recursion ...

optionally (use_lpc>2) support iterative linear least abs() solver using cholesky
  factorization with adjusted weights in each iteration

compression gain for both is small, and multiple passes are of course dead slow

Originally committed as revision 5747 to svn://svn.ffmpeg.org/ffmpeg/trunk
That description perfectly matches IRLS (iteratively reweighted least squares). My IRLS code uses Cholesky factorization as well. I'll look into this if I take another look at my IRLS code for libFLAC.
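In numpy terms the shared idea fits in a few lines. The sketch below is only an illustration - the eps cutoff and pass count are arbitrary, and neither ffmpeg's code nor my branch works exactly like this - but pass 1 is the plain least-squares 'cholesky' solution, and every further pass reweights towards least absolute deviation:

Code: [Select]
import numpy as np

def irls_lpc(x, order, passes=3, eps=1.0):
    # Rows: predict x[n] from the 'order' preceding samples.
    A = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    b = x[order:]
    s = np.ones(len(b))                     # row scale = sqrt(weight)
    for _ in range(passes):
        As, bs = A * s[:, None], b * s
        # Solve the weighted normal equations via Cholesky factorization.
        L = np.linalg.cholesky(As.T @ As)
        coefs = np.linalg.solve(L.T, np.linalg.solve(L, As.T @ bs))
        # Reweight by 1/|residual| (clamped at eps), so the next pass
        # approximately minimizes the sum of absolute residuals.
        s = 1.0 / np.sqrt(np.maximum(np.abs(b - A @ coefs), eps))
    return coefs

The coefficients come out real-valued; the encoder still has to quantize them to integers, which is where the precision search (-p) enters later in this thread.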
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-16 00:13:18
ffmpeg already in 2006 ... hm.

This is probably my final test, and it is neither nice nor "fair": it is the single worst TAK-able track in my CD collection, Merzbow's "I Lead You Towards Glorious Times" off the Venereology album. For those not familiar with this kind of noise music, you can listen at YouTube (https://www.youtube.com/watch?v=OzWNJtN86kU) to understand why the bitrates are this insane.

For ktf's double-precision build there is a simple TL;DR: absolutely no difference on this track.
After I put every flac file through metaflac to remove everything including padding, ktf's build produces a bit-identical file to flac 1.3.1's, for each setting. (Those were: -0 through -8, -4 -e through -8 -e, and -4 -p -e through -8 -p -e. More than I intended, but there was a "wtf?" or two here.)

Also, some more bit-identicals:
flac.exe produces bit-identical files for the following groups: (-3, -4, -4 -e, -4 -p -e); (-6, -7, -8); (-6 -e, -7 -e, -8 -e); (-6 -p -e, -7 -p -e, -8 -p -e).
flake -11 and -11 --vbr 4 produce bit-identical files ... and they are quite a bit better than any other FLAC.


I deliberately picked a track that I knew would make for strange orderings, in that TAK cannot handle it (only -4 without e and m gets below the .wav size) - but look at how ffmpeg cannot agree with its own ordering. Also look at where flake put its -8 and -9. flac.exe only misses the order once, in that -0 produces a smaller file than -1.


Lazy screendump coming up. The "cholesky" is the same kind of option as above.
(https://i.imgur.com/HmOGKy4.png)
Oh, OptimFrog managed to get down to 53 222 937, but I don't have any fb2k component for it.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-17 18:56:28
This is probably my final test, and it is neither nice nor "fair": it is the single worst TAK-able track in my CD collection
Couldn't help myself: since there was another discussion on decoding efficiency, I tried the APE. "Revised" worst end of the file size list:

1418 TAK -p1
1424 Monkey's Insane
1424 Monkey's High
1424 TAK -p0
1424 Monkey's Extra High
1426 Monkey's Normal

... so even when TAK fails at beating PCM, it succeeds at improving over Monkey's.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-21 13:16:07
After the 'double autocorrelation' change, which was submitted to FLAC quite a while ago, I've been busy improving the IRLS code for which I started this topic. Source code can now be found here: https://github.com/ktmf01/flac/tree/autoc-double-irls

Please see the image below.

[attached image: graph comparing the two exes and CUEtools.Flake across presets]

Compared are the exe I dumped here on the 16th of June, the exe I'm attaching now, and CUEtools.Flake 2.1.6. Presets for FLAC are, from left to right, -4, -5, -6, -7, -8, -8e, -8ep and -9. Presets for Flake are -4, -5, -6, -7, -8 and -8 --vbr 4. As you can see, the largest difference is because of the 'double autocorrelation' change, which is clearly visible from -4 to -8ep. However, the change from old -9 to new -9 is what I've been working on. The IRLS code is now much faster, used more efficiently, and compresses slightly better.

Feel free to try the exe, look through the source, etc. Please be cautious with the exe: I've been quite busy tuning but have done little testing. I've only tested on CDDA material; maybe there are still some surprises on 96kHz/24-bit material left.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-22 05:06:34
Again my boring 29 CDs

IRLS beta -9
7.583.395.627 Bytes

IRLS beta -9ep
7.582.193.957 Bytes

older numbers
flac-native -9
7.586.738.858 Bytes

flac-native -9 -ep
7.581.988.737 Bytes

-9 speed has indeed improved, thanks!
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-22 20:53:23
maybe there are still some surprises on 96kHz/24-bit material left.
I said the Hydrogenaud.io autoc-double testversion 20210630 -7 was good on hi-rez; this one is even better - on some material. Your new -9 is slooow on this material, good thing I didn't test the first one.

tl;dr on the four hours of 96/24 specified below (no fancy compression options given!)

-9: spends 40 minutes to achieve 57.25%
-8e spends 15:44 to achieve 57.30% (compared to the Hydrogenaud.io autoc-double testversion 20210630 it shaves off .31 points at a cost of 8 seconds)
-8: spends 5:36 to achieve 57.33 (savings: 0.38 points, costs 12 seconds). ffmpeg at -8 gets inside that 0.38 interval, no matter whether it uses lpc_order 2 (spending 6:28) or 6 (at 17 minutes)
-7: spends 3:38 to achieve 57.37, which is still better than the autoc-double testversion's -8e
-6: spends 3:11 to achieve 57.83, which is not good compared to -7. Here and down to -4, the differences to the autoc-double testversion are at most .17
-5 spends 2:10 to achieve 57.92. -4 spends 2:00 to achieve 57.98. That's on par with ffmpeg -5, but twice as fast.

I tested CUETools.Flake -4 to -8, not so much variation, spending from 8:27 down to 3:04 for 58.23% to 58.48%.
I tested 1.3.1 at -8 -e (the -e by mistake); it took twelve minutes for 58.97 and was worst for all files - except that ordinary -8 was another half a point worse.


But a lot of the improvement is due to one album and one EP out of the four. Your new -7 is faster than 1.3.1 -8 and yields savings from half a percentage point up to 8.5 (!!) percentage points, and it is the biggest file that is least compressible.



Material: to get done in a day, I selected the following four hours from the above 96/24 corpus, in order of (in)compressibility:

* Kayo Dot: Hubardo. 93 minutes, prog.metal. Needs high bitrate despite not sounding as dense as the next one.
All about the same, all within half a percentage point. And this is the biggest file of them all.
Best: flake -8 at 65.72, then your new -9 at 65.73. (Heck, even OptimFrog -10 only beats this by 1 point.)

* Cult of Luna: The Raging River. 38 minutes sludge metal/post-hardcore.
Large variation, flake does not like this.
Best: New -9 at 59.45. -7 and up shaves a full percentage point over the autoc-double testversion. ffmpeg -8 about as good. ffmpeg -5 at 60.8. flake -8 at 62.59, 1.3.1 even a point worse at -8 -e.

* The Tea Party: Tx20. An EP, only 18 minutes Moroccan Roll. Earlier tests reveal: differs significantly between encoding options.
Large variations. Your -9: 53.95. Your -7 beats your new -6 by 3.2 points and your previous -8e by half that margin. ffmpeg varies by 3 points - here is the file where one more lpc pass makes for .1 rather than .02. Flake runs 60 to 61. flac 1.3.1 62 and 63.

* Open Goldberg Variations.  82 minutes piano, compresses to ~47 percent. Earlier tests reveal: doesn't use high Rice partition order.
Best: ffmpeg -8, but between 46.71 and 46.92 except flac -1.3.1 (add a point or more).



Done on an SSD, writing the files takes forty to sixty seconds. Percentages are file sizes without metadata, padding or seektables, but those don't matter on the percentages for such big files anyway.
Title: Re: New FLAC compression improvement
Post by: danadam on 2021-09-23 15:21:05
I don't know how subset-compliant ffmpeg is on hirez ...
AFAICT it should be on 88.2k - 192k by default and above that if you force block size to 16384.

From what I understand, prediction order is not limited when the sampling frequency is >48000; partition order is limited to 8, but that is the max ffmpeg uses at any level anyway; which leaves block size. ffmpeg uses a 105 ms block size, which translates to:
Code: [Select]
44100 - 4608
48000 - 4608
88200 - 8192
96000 - 8192
176400 - 16384
192000 - 16384
352800 - 32768
384000 - 32768
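That pattern looks like 'the largest common FLAC block size not exceeding 105 ms'. Here is a guessed reconstruction of the rule - an assumption, not ffmpeg's actual code - which at least reproduces the table above:

Code: [Select]
# Candidate sizes: the usual power-of-two FLAC block sizes plus 4608.
CANDIDATES = (1024, 2048, 4096, 4608, 8192, 16384, 32768)

def guessed_block_size(sample_rate, target_ms=105):
    target = sample_rate * target_ms // 1000
    return max(b for b in CANDIDATES if b <= target)

for sr in (44100, 48000, 88200, 96000, 176400, 192000, 352800, 384000):
    print(sr, guessed_block_size(sr))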
Notice that ffmpeg uses a non-default block size: 4608 for 44100/16 with compression_level 8. And there is no option in ffmpeg to set the block size.
"-frame_size 4096" works for me.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-25 16:10:45
@ktf : On my computer, -9 -e generates the same files as -9, and -9 -p -e the same (in an awful lot of time) as -9 -p; is that to be expected? I tried the 96/24 and a small CDDA set too.
Asking because it could depend on CPU-specifics.

(I wonder if combinations of -9, -e and/or -p will do calculations that couldn't lead to improvements. I don't know if there is any demand for any optimization; on one hand, who uses -p -e after all? on the other, yes those who use -9 -p apparently do get the same as -9 -e -p, so ...)


partition order is limited to 8 but this the max that ffmpeg is using on any level
[...]
"-frame_size 4096" works for me.
Interesting. Thx, that leaves room to test whether it matters - and if so, whether defaults are optimized.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-25 16:58:34
As you can see above, -ep only slightly improves compression, because the SSE double precision already doesn't leave much room. I use a Ryzen 5900X.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-26 04:00:44
@ktf : On my computer, -9 -e generates the same files as -9, and -9 -p -e the same (in an awful lot of time) as -9 -p; is that to be expected?
The same behaviour here. -e seems to do nothing with -9
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-26 09:29:08
Preset -9 is equivalent to -b 4096 -m -l 12 -e -r 6 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(3)", so it already includes -e.

irlspost-p takes the result from evaluating tukey(5e-1);partial_tukey(2);punchout_tukey(3) and iterates on it, with the final iteration also being evaluated with -p. Using -9p gives only very small gains, as irlspost-p already uses -p (and irls usually results in the smallest file).

If you want even better compression, I'd recommend either using -9 -r 8 in the case of electronic music (some chiptune can really gain a lot by using this) or -9 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(10)" or an even higher number for irlspost-p. I haven't seen much improvement with more than 10 iterations, but perhaps this is different for hi-res material.
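For concreteness, those two suggestions amount to command lines like these (input.wav is just a placeholder):

Code: [Select]
flac -9 -r 8 input.wav
flac -9 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(10)" input.wav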
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-26 14:56:40
Are there any known decoding problems going above -r 6?
Playing with the parameter I see no real gains, and even slightly worse results for 24/96 stuff.
I think the default -9 setting is well chosen. -p is simply too slow for its minimal gains.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-26 19:11:14
Are there any known decoding problems going above -r 6?
Not that I know of. ffmpeg and CUEtools.Flake use it in some presets by default. For most music it is not worth the trade-off, but for chiptune (Game Boy emulation) I've seen gains of 3 - 4%. There, the high partition order actually switches on low-frequency square wave transitions.
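To see why a high partition order pays off on such signals, consider a toy cost model (a sketch only - FLAC's real accounting has escape codes and other details this ignores). With more partitions, each partition's Rice parameter can follow the switching residual level instead of compromising across the whole block:

Code: [Select]
import numpy as np

def rice_bits(res, k):
    # Approximate Rice cost: zigzag-map signed residuals, then each sample
    # costs a unary quotient (plus stop bit) and k remainder bits.
    u = np.where(res >= 0, 2 * res, -2 * res - 1)
    return int(np.sum((u >> k) + 1 + k))

def partitioned_cost(res, partition_order):
    # Split into 2^partition_order partitions, pick the best Rice parameter
    # per partition, and charge 4 header bits per partition.
    parts = np.array_split(res, 2 ** partition_order)
    return sum(4 + min(rice_bits(p, k) for k in range(15)) for p in parts)

# A 'switching' residual: quiet and loud stretches, square-wave style.
rng = np.random.default_rng(0)
res = rng.integers(-2, 3, 4096) * np.repeat([1, 64], 2048)
print(partitioned_cost(res, 0), partitioned_cost(res, 8))  # order 8 wins big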

I think the default -9 setting is well chosen. -p is simply too slow for its minimal gains.
Yes, with irlspost-p I tried to get the gains I saw with -p, but with minimal speed loss. As the irlspost stuff takes the best predictor from the tukey windows and does a few iterations on it, the result from the IRLS process usually gives the smallest frame.

From the small difference between -9 and -9p you can see that these iterations sometimes give worse results instead of better (because an improvement between -9 and -9p implies that the gain comes from the regular tukey apodizations, and thus that the IRLS process did not improve upon them); it also shows that the process usually works very well.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-26 20:57:13
It seems that adding another "-e" still slows it down, so apparently some combinations force the encoder to do the same work twice.

Anyway, I will be putting the build to some overnight work out of curiosity, but avoiding the -p and -e.

Question: Does irlspost-p(N) correspond to ffmpeg's lpc_order N in terms of passes, or to N-1 or to N+1 or justforgeteverythingaboutcomparing?


Edit: Oh, and on the Rice: the Open Goldberg Variations album would frequently end up as the same files at a number of different -r settings.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-27 07:31:55
It seems that adding another "-e" still slows it down, so apparently some combinations force the encoder to do the same work twice.
Are you sure? I cannot think of a way that can happen

Quote
Question: Does irlspost-p(N) correspond to ffmpeg's lpc_order N in terms of passes, or to N-1 or to N+1 or justforgeteverythingaboutcomparing?
I assume you mean -lpc_passes? If you want to compare anything, I'd say comparing to irlspost(N) instead of irlspost-p(N) is more fair. irlspost(5) would correspond to -lpc_passes 6, but the algorithms are quite a bit different.

The basic idea is the same, but the execution is very different. In both implementations the basic weight is the inverse of the residual of the previous pass. This is the so-called L1-norm approach (https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares#L1_minimization_for_sparse_recovery). The implementation in ffmpeg adds a factor to the absolute value of the residual before inversion (https://github.com/FFmpeg/FFmpeg/blob/c3222931aba47b313a5b5b9f3796f08433c5f3b9/libavcodec/lpc.c#L261), a factor which grows smaller every iteration. I don't know what the idea behind that is, and it seems counterproductive to me. My implementation in libFLAC weighs according to the L1-norm but has a cut-off in place for small residuals; this is something suggested by most books I read on the subject. The cutoff is currently at 2 (https://github.com/ktmf01/flac/blob/97d52faec95ebee4b3ffc8512a0d9ee03e2a0d45/src/libFLAC/lpc.c#L257), but this is tuneable. This cut-off makes sure small residuals don't get too much attention, and it protects against division by zero. I have experimented with much larger values, and some music seems to compress better with values over 50 for example, so this still needs tuning.

My implementation also weighs with a moving average of the data. The current moving average window width is 128 samples (https://github.com/ktmf01/flac/blob/97d52faec95ebee4b3ffc8512a0d9ee03e2a0d45/src/libFLAC/lpc.c#L50); this too is tuneable, and it is related to the -r parameter. This is because of the way Rice partitions work: a large residual in one part of the block does not have to have the same impact in another part of the block. By using a moving average as a proxy for this effect, the IRLS algorithm can try to optimize the whole block for the minimal number of bits, instead of only the hardest-to-predict parts.
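In code terms, one plausible reading of that weighting is the sketch below - a guess reconstructed from the description above, not the actual patch:

Code: [Select]
import numpy as np

def irls_weights(residual, cutoff=2.0, ma_width=128):
    # L1-style IRLS weights with the two tweaks described above (a sketch,
    # not the libFLAC code; the exact way the moving average enters the
    # weight is a guess).
    mag = np.abs(residual).astype(float)
    # Moving average of the local residual magnitude, a stand-in for the
    # Rice parameter of the partition a sample would fall into.
    local = np.maximum(np.convolve(mag, np.ones(ma_width) / ma_width, mode="same"), 1.0)
    # Weight ~ 1 / (local * |r|): a weighted least-squares pass with these
    # weights approximately minimizes sum(|r_i| / local_i), i.e. each
    # residual counts relative to its neighbourhood. The cutoff keeps small
    # residuals from dominating and protects against division by zero.
    return 1.0 / (local * np.maximum(mag, cutoff))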
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-27 12:30:39
Are you sure? I cannot think of a way that can happen

I ran two rounds of -e and one of -p -e; it happened on all three. I will run them over a few times to see if it was a coincidence.

By the way, was there any reason that the new -5 to -8 should improve compression over the double precision version I tested above? (Because, apparently they do improve. Again only tested the 96/24 files.)


Quote
I assume you mean -lpc_passes?
Yes.

Quote
If you want to compare anything, I'd say comparing to irlspost(N) instead of irlspost-p(N) is more fair. irlspost(5) would correspond to -lpc_passes 6, but the algorithms are quite a bit different.

Depends on whether you think irlspost(N) is ready for test runs?
Naming suggests that irlspost(N) is like irlspost-p(N) but without the final "p" - and so that the algorithms that are "quite a bit different" are those two vs ffmpeg, and not irlspost vs irlspost-p?
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-27 20:37:58
By the way, was there any reason that the new -5 to -8 should improve compression over the double precision version I tested above? (Because, apparently they do improve. Again only tested the 96/24 files.)
Yes, the code has seen some improvements since I uploaded that binary. They have been included in the most recent binary.

Depends on whether you think irlspost(N) is ready for test runs?
Naming suggests that irlspost(N) is like irlspost-p(N) but without the final "p" - and so that the algorithms that are "quite a bit different" are those two vs ffmpeg, and not irlspost vs irlspost-p?
3 'apodizations' have been added. They aren't really apodizations, but that was a good way to introduce them, and they work in the exact same part of the code. Those three are irls(N), irlspost(N) and irlspost-p(N). They work with the exact same code, but the way in which they interact with the rest of the code is different.

irls(N) does iterations 'from scratch'; irlspost(N) takes the best of the previous apodizations as a starting point (hence the 'post', it works like post-processing); irlspost-p(N) is the same, but with a precision search just for the end result of the IRLS process. The inner process of all three is the same.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-29 16:33:14
First: false alarm on -e taking more time. Probably it was because I kept files on a spinning drive approaching full, and so I/O would always take more time on the second run (and I would do -9 before -9 -e).

So I re-started the overnight jobs to get accurate encoding times for the full seven-hour 96/24 corpus. Here are a few at 4096 except ffmpeg at its default 8192 and Flake at variable blocking:

* 2644 for flac.exe 1.3.1 at -8 -p -e. It spent 170 minutes on this, which I ran only because I had to test how low the new build could go and still beat it:
* 2641: In four minutes, your new -2 compresses better than 1.3.1 -8 -p -e. That is worth something eh? :-D
* 2605: as good as CUETools' Flake got it within subset. (Not timed, from earlier,  -8 --vbr)
* 2597: new -5 (five minutes)
* 2585: ffmpeg -compression_level 12 (No time, from earlier.)

Bearing in mind the improvement from 1.3.1 and Flake down to the next one, the compression improvements from -7 on aren't much:
* 2578: new -7 (< 7 minutes).
* 2577: ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 2 (45 minutes), narrowly beaten by your new -8:
* 2577: new -8 (10 minutes)
* 2575: new -8 -p -e takes 3h38min, that is slower than 2x realtime, and I got better compression from "-9 without -e" (not timed)
* 2575: 69 minutes for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 4
* 2574: 92 minutes for -9. The improvement over "without -e" is 0.48 kbit/s
* 2574: 93 minutes for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 6, narrowly beats -9. And another 24 minutes on -lpc_passes 8 only improved a fraction of a kbit/s.
* 2574: 84 minutes for "-9 with -e replaced by -r 8", beats -9 by 0.15 kbit/s - that is due to The Tea Party EP (which also by the way is better off at -b 2048)
* 2574: -9 -p improves 0.35 over -9 and is so slow I won't touch it again.


For what this limited testing is worth, it points at:
* -7 is so near -8 that ... Should one beef up -8 to get a proper difference? Try a different apodization function? Look for overlap, add a fourth, do -r 8, or ... ? On the other hand, it is slower than the stock compile.
* -8 -p -e must die. Die. Oh it doesn't even move. Oh yes it does ... in a few hours. But nobody uses it eh? Or well maybe somebody does because it is perceived as the "best" preset.
* For those who want to wait for -9, it is on par with what ffmpeg can do in the same amount of time (ffmpeg: -lpc_passes costs 6 minutes per pass; I didn't calculate that until after running only the even pass counts).
* While I don't think people will do -9 without having CPU time to spend, consider whether a new -9 should force -e included when it isn't in -0 to -8. Maybe better to keep it optional, as omitting it doesn't lead to unreasonable compression.
* The average improvements for the tougher modes are driven by a few signals - the TTP EP and Cult of Luna, primarily. That points towards an adaptive "this is enough!" mode, if anyone bothers to implement it. Or, actually, even if one doesn't want to go for the variable blocking strategy, one could do two full -5 runs just to guesstimate the best for-the-file blocksize before running -9, adding only ten percent to the overall time. We have an expensive -p and an expensive -e, so heck ...
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-01 10:56:19
More overnight testing. The question I was thinking of asking is: how much CPU time do you have to spend to s(h)ave the next megabyte? (Amounting to about a third of a kbit/s on this corpus.) You would expect this marginal time cost to increase as the ripe fruits get picked first, and the following suggests that it holds (compare to 1.3.1, where going from -8 to -8 -e saves 62 MB in a quarter of an hour, that is around fifteen seconds per megabyte):
* Going from new -5 (or -6, also just tested - it is not much better than -5) to new -7: a few seconds per megabyte s(h)aved. Did I say that new -7 is damn good?!
* new -7 to -8: a minute per megabyte s(h)aved
* -8 to -9: about ten minutes for the next megabyte. Sounds horrible, but it is better than going -8 to -8 -p -e.

But then, at this "ten minutes for the next" level (for those willing to spend that CPU time going from -8 to -9), interesting things happen:
There are alternatives that get you a megabyte per ten minutes, and several of them seem to add up nicely! Given that the "cost" of a megabyte just jumped by a factor of ten, you would expect the next one to jump further? No, not necessarily:
The following are all in about the same minutes-per-next-megabyte range and can be combined on top of -8:
* A manual "-9 without -e"
* adding "-r 8 -e" to the previous
* Two more apodization functions: I just naively continued the "partial_tukey(2);punchout_tukey(3)" pattern with -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);partial_tukey(4);punchout_tukey(5)"

That the marginal cost stays flat for a while suggests that there is some smart setting to be found that catches the lion's share of the improvement much more cheaply. What to try?
(Idea: maybe, rather than making 4 partial windows each of length 1/4, higher-order partial_tukeys could take the first and last 1/2 and the first and last 1/4, and leave the middle to a different function name?)



What I actually intended with these tests was something else: my idea was that with these improvements, there might be settings that are no longer well suited, because they only spend time doing what the new code already picks up. (Like -p ... pending more testing, the documentation could indicate that it is worth less than -e.) Apart from -7 getting so close to -8, I didn't find anything consistent. Some isolated strange things are going on; for instance, I tried two files with -9 -l 32 (which encodes at half real-time!) and one of them came out bigger than with -9.



Also, wishlist items:
* After displaying the compression ratio with three decimals (should be enough for most purposes, but ...), it wouldn't hurt to add a "(+)" if it is >1 and an "(=)" if it hits exactly the same audio size (which it sometimes will, say if recompressing with a -r 8 that makes no difference).
* --force-overwrite-iff-higher-compression . (Which spawns the question: if your irlsposts are post-processing algorithms, could they take as starting point an existing flac file without doing the initial compression?)
* Maybe abort with an error if the user gives -A "blah blah blah without a closing quotation mark -- "filename with space.wav" ? I learned it leaves flac.exe hanging, doing nothing, responding to nothing.
* For the documentation (https://xiph.org/flac/documentation_tools_flac.html): mention that flac accepts 5e-1, avoiding the locale-specific 0.5 vs 0,5 issue. Should be recommended.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-10-01 19:32:44
* A manual "-9 without -e"
Sadly, even a -9 without -e internally forces the -e on the IRLS code, because I haven't implemented a 'guess' method for the predictor order in the IRLS code. Perhaps I could check what would happen if the code just defaulted to the highest order calculated.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-03 20:34:30
Perhaps I could check what would happen if the code just defaulted to the highest order calculated.
Why not ... since it is not my time. Well actually: presets 0 through 8 all have the 2x2=4 combinations of -N, -N -p, -N -e and -N -p -e, so, not a bad idea either.

Did another test on a different computer. Earlier on we found that double precision did well on higher-frequency content (you did an aggressive lowpass to get the point across), and I was curious whether that carries over to even higher rez. Turns out yes - and so does your most recent update.
This is one track only, so all reservations taken. I went over to http://www.2l.no/hires/ and picked the Britten track in 352.8/24 as well as 88.2/24 (that's in the 96 column ...) and ran it with (I) stock 1.3.1, (II) your "native" as posted here, (III) your double precision as posted here, and (IV) your most recent IRLS.
Did -8 and -8 -e.  And then (V) on the new build: -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost(3)" (no -p!)  and -9. Then, (VI): replaced the irlspost(3) by irlspost(7) and for -9, bumped up to irlspost-p(7). And finally, your new -7.

Observed the gains:
(I) 4515 kbit/s at -8. Bitrate savings to -e: 42 resp. 19.
(II) Nine kbit/s better than 1.3.1, and savings to -e comparable to 1.3.1's, ending up at 6842 resp. 2127
(III) HUGE savings for the 352 file! Bitrates 6525 resp. 2101.
Going to -e saves another 30 resp ... rounded off to 0.
(IV) 6444 vs again 2101. And, rounded off to integer kbit/s, -e gains nothing.
(V) irlspost(3) makes no difference. -9 gains 5 resp. 10 kbit/s.
(VI) Just don't. One file up a couple hundred bytes, one down about the same.

Also, going -8 to -6 hurts the 352 the most. Same for -4 to -3 and -1 to -0. Going -8 to -7 does hardly anything, and neither does -6 to -5 to -4 or -2 to -1. Only -3 to -2 hits the 88.2 slightly more.
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-10-04 12:43:25
Observed the gains:
(I) 4515 kbit/s at -8. Bitrate savings to -e: 42 resp. 19.
(II) Nine kbit/s better than 1.3.1, and savings to -e comparable to 1.3.1's, ending up at 6842 resp. 2127
(III) HUGE savings for the 352 file! Bitrates 6525 resp. 2101.
Going to -e saves another 30 resp ... rounded off to 0.
(IV) 6444 vs again 2101. And, rounded off to integer kbit/s, -e gains nothing.
(V) irlspost(3) makes no difference. -9 gains 5 resp. 10 kbit/s.
(VI) Just don't. One file up a couple hundred bytes, one down about the same.
Could you explain this a little more? I can't follow. (I) has only 1 result for -8, but (II) and (III) have 2, which are in completely different ballparks? The number in the 6000-range is the 352.8kHz file and the one in the 2000-range the 88.2kHz number, I think? What is the 4000-range number for (I)?
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-04 14:27:44
Gosh, sorry. (4515 is the average of the two.) And then I got something wrong because (I) and (II) yield the same at -e. Restating (I) and (II):

(I): -8 yields 6884 and 2146. -8 -e yields 6842 vs 2127, improving 42 resp. 19.
(II): -8 yields 6869 and 2142. Adding a "-e": again 6842 resp 2127, so the "-e" improvements are down to 25 resp. 15.

That's still about the same ballpark when you compare to what gives the BIG differences on the 352.8 file. While the 88.2 varies with at most 46 kbit/s - that is still two percent! - the big one is reduced by six percent, and the contributions can be summarized in order of significance:

* 327 (that is nearly five percent) by going from your "native" build -8 -e to the double-precision build at -8
* 79 by going native -8 -e to IRLS -8 (without -e)
* Twenties to forties: adding "-e" in (I), (II), (III)
* A few kbit/s: Going 1.3.1 -8 to native -8; and, adding "-p" to IRLS -8;
* Zero to one, all on the IRLS build: adding "-e" to -8; further adding "-p" to -8 -e and to -9.

Here since I had to do things over again, I also let the IRLS beta do -p. The bitrates for the big file for the IRLS version are
6444 for -8 -e (gains 1208 bytes, that is 15 parts per million, over -8)
6439 for -9
6438 for -8 -p (improves over -8 -e, contrary to my uh, well-founded prejudices ...),
6437 for -8 -p -e
6437 for -9 -p

So the huge gain is the move to double precision; and then on top of that, your new build improves further, dominating the advantages that "-e" used to give. Then my interpretation was that new -8 is so good that "-e" no longer saves much; but, here -p did something.

(Hm, is there any way to make an "intelligent guess at doing halfway -e, halfway -p, halfway various -b" without going full brute force on it all? Let's not call the semi-expensive switch "--sex", rather ... hey, -x is not taken.)



Oh, and one more thing. On the seven-file corpus, where I said -9 -p is so slow I wouldn't touch it again: nevertheless, to get a more accurate timing of it, I gave it an overnight job to arrive at around 1x realtime, i.e.:
-9 -p = -9 -p -e took around twice the time of -8 -p -e.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-05 20:56:51
New testing with conflicting results, CDDA corpus and new higher-rez.

A bit of tuning on the previous 96/24 found that I could use -8 -Ax4 - the four being punchout_tukey(7) and partial_tukeys 7, 4 and 1 (the last being the default tukey(5e-1)) - to beat -9, comfortably and in a fraction of the time.
So, to see whether that was a more universal observation, I ditched the old corpus and found a new one. Results vary wildly from CDDA to hi-rez.

Second and third columns are improvements in 1/100 percent (not percentage points!): second column is CDDA, third is hirez; positives are savings. For each setting, the first row is the improvement over new -8, the "over previous" row is over the previous boldfaced setting, and the "-b 4608 impr:" row is relative to the setting in question. Fourth column is encoding time. For "reference": new -7 (damn good, I say!), new -5, and the original 1.3.1 -8 files.

*EDIT:* Facepalm, the table had encoding time only for the full thing, not CDDA. And -e is more expensive for higher rez, so -8 -e actually takes less time than the Ax4 (458 vs 475 seconds). Trying to edit manually:
Code: [Select]
Setting                CDDA   hirez   encoding time (s)
1.3.1 -8 to new -8    -11.7  -542.1   n/a
-5 to -8              -48.8  -167.6   149
-7 to -8               -3.7   -17.5   242
(This is -8)            0       0     347
  -b 4608 impr:        -0.6     2.2
-e over -8              1.3    13.5   869 (CDDA: 458)
  over previous         1.3    13.5
  -b 4608 impr:        -0.6     2.3
Ax4 over -8             4.5    35.7   838 (CDDA: 475)
  over previous         3.2    22.1
  -b 4608 impr:        -0.3     3.1
Ax4 -r8 over -8         4.6    35.6   894
  over previous         0.1    -0.1
  -b 4608 impr:        -0.3     3.4
Ax4 -r8 -e over -8      5.8    42.0   3429
  over previous         1.2     6.4
  -b 4608 impr:        -0.3     3.5
-9 over -8             10.9    29.1   2820 (CDDA: 1537)
  over previous         5.1   -12.9
  -b 4608 impr:        -1.4     3.7
-9 -r8 over -8         11.0    29.0   3794
  over previous         0.1    -0.1
  -b 4608 impr:        -1.3     3.7
... thanks to whomever posted the link to https://theenemy.dk/table/



What we can see:
* No reason to use -8 -e.  The "Ax4" beats -e quite a lot, at about the same time.
* -9 for CDDA: improves over Ax4 by even more than Ax4 improves over -e, but is expensive in time.
* -9 for hi-rez: loses to Ax4
* -b 4608 hurts CDDA slightly (unexpected), benefits hi-rez (expected).
* Net effect of -r 8 is around zero.



Music then.  
CDDA: I sorted my collection (tracks, not images!) on audio MD5, started somewhere and highlighted about 2 GB of FLAC (1.3.1) files consecutive in the sorting. That's Beastie Boys and Black Sabbath, Zappa and Muse, Skinny Puppy and Creedence ... but overall heavy on electric-guitar-driven rock, prog and metal.
hi-rez: I took a few albums and topped up with what was hi-rez from an Earache free sampler - which makes this even more heavily metal-oriented than the CDDA part. Got to two GB there too.

No classical this time, because I don't have much more above-CDDA material than what I have already used.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-08 14:22:07
Well as if I haven't already posted enough misunderstandings of mine, here I think I made another.

found that I could use -8 -Ax4 - the four being punchout_tukey(7) and partial_tukeys 7, 4 and 1 (the last being the default tukey(5e-1)) - to beat -9

Let's see: partial_tukey(N) creates N functions, and as I think @ktf tried to teach me, each of them apparently takes about as much computational effort as any other. (I casually tested ... it seems so.)
Back in the younger neolithic, this forum discussed how -Ax2 was interesting but not worth it; well, -8 is now effectively an Ax6, and the above is in effect a nineteen-window setting to beat -e.

And for -9 and IRLS: an -Ax19 -e to beat -Ax6 + irlspost + -e ...

(Looks like the battle between brute force and a sensible algorithm. Except, it seems, the sensible algorithm takes so much time that semi-Brutus can give it a few casual stabs and then go home.)
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-27 23:17:19
Anyone feel like testing the flac-irls-2021-09-21.exe posted at https://hydrogenaud.io/index.php?topic=120158.msg1003256#msg1003256 with the parameters below? Rather than me writing walls of text about something that could be spurious ...


For CDDA:
* Time (should be similar) and size: -8 against -8 -A "tukey(5e-1);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"  against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"  (I expect no big differences between the latter two.)
* Time (should not be too different) and size: -7 -p against -8 -A "welch;flattop;partial_tukey(3/0/999e-3);punchout_tukey(4/0/8e-2)" (or replace the welch by the tukey if you like)
* Time (should not be too different) and size (will differ!): -8 -e against -8 -p against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" 
Note, it is irlspost, not irlspost-p.


For samplerate at least 88.2:
* -8 against -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"
* For each of those two: How much does -e improve size?
* How much larger and faster than -e, is -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" ?

My tests indicate that the gauss(3e-3) combination impresses nobody on CDDA and makes very little difference on most hirez files - but for a few it could be a percent. And then the "-e" improvement was a WTF. But hi-rez performance is absolutely not consistent ... well, it is much better than the official release.
Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-10-30 11:33:20
But hi-rez performance is absolutely not consistent
I guess it's because most of the CDDA frequency bandwidth in typical audio files is used to encode audible signal with a more predictable harmonic and transient structure. On the other hand, hi-res files can have gross amounts of modulator noise from AD converters, idle tones, an obvious low pass at 20-24kHz, resampling artifacts, and occasionally real musical harmonics and transients in the ultrasonic frequencies. Those are quite different things and therefore hard to predict.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-31 19:23:39
and occasionally real musical harmonics and transients

Does my sensitive snout smell the sweet scents of sarcasm ... ?  :))   Indeed I suspect the same reasons.
And even if it weren't, it would still be well justified to tune the encoder more for CDDA than for formats with less market share.

Still, it would have been nice to find something that catches several phenomena cheaply. I ran mass testing on one hirez corpus and then observed the same thing on another - but with very few files in both - and I still wonder if it is too spurious.
Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-11-01 11:34:39
Does my sensitive snout smell the sweet scents of sarcasm ... ?  :))
Maybe, but speaking of sarcasm and unpredictable factors:
https://www.audiosciencereview.com/forum/index.php?threads/sound-liaison-pcm-dxd-dsd-free-compare-formats-sampler-a-new-2-0-version.23274/post-793538
Those who produce hi-res recordings couldn't hear those beeps in the first place?

I've seen that lossy video codecs can be tuned to optimize for film grain, anime and such, but I don't know whether that is relevant to lossless audio.
Title: Re: New FLAC compression improvement
Post by: rrx on 2021-11-05 12:33:06
Anyone feel like testing the flac-irls-2021-09-21.exe posted at https://hydrogenaud.io/index.php?topic=120158.msg1003256#msg1003256 with the parameters below? Rather than me writing walls of text about something that could be spurious ...


For CDDA:
* Time (should be similar) and size: -8 against -8 -A "tukey(5e-1);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"  against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"  (I expect no big differences between the latter two.)
* Time (should not be too different) and size: -7 -p against -8 -A "welch;flattop;partial_tukey(3/0/999e-3);punchout_tukey(4/0/8e-2)" (or replace the welch by the tukey if you like)
* Time (should not be too different) and size (will differ!): -8 -e against -8 -p against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" 
Note, it is irlspost, not irlspost-p.


For samplerate at least 88.2:
* -8 against -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"
* For each of those two: How much does -e improve size?
* How much larger and faster than -e, is -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" ?

My tests indicate that the gauss(3e-3) combination impresses nobody on CDDA and makes very little difference on most hirez files - but for a few it could be a percent. And then the "-e" improvement was a WTF. But hi-rez performance is absolutely not consistent ... well, it is much better than the official release.

I'm too lazy to put together an encoding-times table, but here are a few compression comparisons:

16/44.1 (https://i.imgur.com/ixrbCkI.jpeg), 16/44.1 (https://i.imgur.com/JwIAnnA.jpeg), 16/44.1 (https://i.imgur.com/UEHZ7I3.jpeg), 16/44.1 (https://i.imgur.com/BDfGP73.jpeg), 16/44.1 (https://i.imgur.com/OzH7fCJ.jpeg), 16/44.1 (https://i.imgur.com/7UZyTVJ.jpeg), 24/44.1 (https://i.imgur.com/bP4Ha7T.jpeg), 16/48 (https://i.imgur.com/oR40XPd.jpeg), 24/48 (https://i.imgur.com/0ISEpYK.jpeg), 24/96 (https://i.imgur.com/D3vBuu3.jpeg), 24/192 (https://i.imgur.com/hb4Vye3.jpeg), 24/192 (https://i.imgur.com/5HW8MEx.jpeg)

Based on what we have there, flac-irls-2021-09-21 -8 -p outperformed every other parameter set in eleven cases out of twelve, with -8 -e winning in one case. What surprised me was that flac-irls-2021-09-21 -8 -p (and in some cases just -8) also outperformed flaccl v2.1.9 -11 in eight cases out of twelve, flaccl curiously winning in both 24/192 cases and in two of the six 16/44.1 cases. All of the outperforming, mind, was by a very slim margin.

In terms of speed, I did a quick test with Amon Tobin's How Do You Live LP, 16/44.1.
For flac-irls-2021-09-21 -8 -p, total encoding time was 0:53.509, 50.33x realtime, while for flaccl -11 it was 0:23.978, 112.33x realtime.
Decoding time for flac-irls-2021-09-21 -8 -p tracks was 0:03.284, 818.845x realtime, and for flaccl -11 it was 0:03.496, 769.270x realtime.

P.S. Whoops, I missed -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)". Oh well.
Title: Re: New FLAC compression improvement
Post by: Adil on 2021-11-05 12:47:53
@rrx
What syntax do you use to view percentage rates in the "Compression" tab of Playlist View?
Title: Re: New FLAC compression improvement
Post by: rrx on 2021-11-05 12:53:03
@rrx
What syntax do you use to view percentage rates in the "Compression" tab of Playlist View?
Good question.
Code: [Select]
$if($or($strcmp($ext(%path%),cue),$stricmp($ext(%path%),ifo),$stricmp($info(cue_embedded),yes)),$puts(percent,$div($div($mul(100000000,%length_samples%,%bitrate%),%samplerate%),$mul($info(channels),%length_samples%,$if($strcmp($info(encoding),lossless),$info(Bitspersample),16))))$left($get(percent),$sub($len($get(percent)),3))','$right($get(percent),3),$puts(percent,$div($mul(800000,%filesize%),$mul($info(channels),%length_samples%,$if($stricmp($info(encoding),lossless),$info(Bitspersample),16))))$left($get(percent),$sub($len($get(percent)),3))','$right($get(percent),3))'%')
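For anyone wondering what that computes: stripped of the cuesheet special-casing, it is simply the file size as a percentage of the uncompressed PCM size. A plain-Python rendering of the second branch (a hypothetical helper, not foobar2000 code):

Code: [Select]
def compression_percent(filesize_bytes, channels, length_samples, bits_per_sample):
    # 800000 * filesize / (channels * samples * bps), with the title format's
    # final /1000 done by inserting a decimal point; this equals
    # 100 * (8 * filesize) / (uncompressed PCM bits).
    return 800000 * filesize_bytes / (channels * length_samples * bits_per_sample) / 1000

print(compression_percent(25_000_000, 2, 10_584_000, 16))  # -> about 59.05 (percent)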
Title: Re: New FLAC compression improvement
Post by: Adil on 2021-11-05 13:02:02
Thank you very much!
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-05 14:26:40
Besides the testing here: does anyone have an idea whether there will ever be another official FLAC release? Since the cancelled 1.3.4 version it has become very quiet.
Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-11-05 15:55:01
It would be bloody annoying if not, now that ktf has found the improvements that gave the independent implementations the upper hand over the official one for fifteen years, and fixed bugs that made files blow up (https://hydrogenaud.io/index.php?topic=121349.msg1001227#msg1001227).
Title: Re: New FLAC compression improvement
Post by: ktf on 2021-11-09 19:35:28
Okay, I have something fresh to chew on for those interested: a Windows 64-bit binary is attached. Code is here but needs cleaning up before this could be merged (https://github.com/ktmf01/flac/tree/autoc-double-irls-subblock).

I've rewritten the partial_tukey(n) and punchout_tukey(n) code into something new: subblock(n). Partial_tukey(n) and punchout_tukey(n) still exist; this is a (faster) reimplementation recycling as many calculations as possible. It is rather easy to use.


The main benefit is that it is a bit faster and also much cleaner to read. The reason for the weird tukey parameters (4e-2 etc.) is that the sloped parts of the main tukey and the partial tukeys have to be the same, as the code works by subtracting the autocorrelation calculated for a partial tukey from that of the full window to obtain the punchout tukey.
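A quick numerical illustration of why that subtraction is legitimate (a toy with rectangular sections and no slopes - matching the slopes is exactly what keeps it clean in the real code):

Code: [Select]
import numpy as np

def autoc(sig, max_lag):
    # Autocorrelation at lags 0..max_lag of an (already windowed) signal.
    return np.array([np.dot(sig[: len(sig) - k], sig[k:]) for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n, lags = 4096, 12
x = rng.standard_normal(n)

part = x.copy(); part[: n // 3] = 0; part[2 * n // 3 :] = 0   # middle section only
punch = x - part                                              # middle section muted

# Cross terms exist only within 'lags' samples of the section boundaries, so
# autoc(full) - autoc(partial) is a near-perfect proxy for autoc(punchout):
err = autoc(x, lags) - autoc(part, lags) - autoc(punch, lags)
print(np.max(np.abs(err)) / autoc(x, lags)[0])                # tiny relative error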

Attached is also a PDF with three lines:

From right to left are encoder presets -4, -5, -6, -7 and -8, with the darkgreen line extending further left with -9, and the lightblue line extending further left with -8 -A subblock(6), -9 and finally -9 -A subblock(6);irlspost-p(3). To make it even more complex: the darkgreen -9 was defined as -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(3)" with -e, while the lightblue -9 is defined as "subblock(3);irlspost-p(4)" without -e. That last change was a suggestion from @Porcus.

As you can see in the graph, this change makes presets -6, -7 and -8 a little faster. -9 is now 3 times as fast (mind that the scale isn't logarithmic) but compresses slightly worse. Perhaps I should increase to irlspost-p(5).

edit: I just realized that subblock is probably a confusing name, so perhaps I'll come up with another.
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-10 03:21:59
Still the same 29 CDs. Not sure -9 is convincing enough.

IRLS-subblock beta -8
7.590.501.741 Bytes

IRLS-subblock beta -8 -ep
7.584.176.739 Bytes

IRLS-subblock beta -9
7.588.567.011 Bytes

IRLS-subblock beta -9 -ep
7.584.089.533 Bytes

older results:
IRLS beta -9
7.583.395.627 Bytes

IRLS beta -9 -ep
7.582.193.957 Bytes

CUEtools flake -8
7.591.331.615 Bytes
Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-10 04:33:20
Missed that one, sorry. Only slightly slower than -9.

IRLS-subblock beta -8 -p
7.585.883.902 Bytes