Print Page - New FLAC compression improvement

Title: New FLAC compression improvement
Post by: ktf on 2020-11-04 20:47:09

Hi all,

Last summer, I've been busy thinking of ways to improve on FLACs compression within the specification and without resorting to variable blocksizes, which are in the spec but might not be properly implemented on many decoders. I discussed this old post with its writer: https://hydrogenaud.io/index.php?topic=106545.50

He wrote a small python script to explore the idea of a integer least-squares solver. It's included as ils.py. I explored this idea, read a lot of literature and came up with a different solution.

The current FLAC LPC stage is a classic, textbook example so to speak. This method was designed for speech encoding. The ILS method SebastianG proposed tries to find a more optimal approach. While the current approach comes close to generating least-squares solutions, this could perhaps be improved by a more direct approach.

However, FLACs entropy encoder doesn't encode 'squared' values, it's cost is more linear. That is why I developed a iteratively reweighted least squares solver, which, by weighing, doesn't come to a least squares solution but a so-called least absolute deviation solution. A proof-of-concept is also attached as irils-calculate-improvement.py. It gives mostly equal results to the current approach on most material, but improves on certain, mostly synthetic material like electronic music.

I even got as far as implementing it in FLAC, which can be found here: https://github.com/ktmf01/flac/tree/irls However, this implementation doesn't perform as well as the python proof-of-concept, and as my free time ran out last summer, I haven't been able to improve on it.

Maybe some people here with love for both FLAC and math might be interested in looking into this. I'll look at this again when I have some time to spare and a mind ready for some challenge :D

I'd love to hear questions, comments, etc.

Title: Re: New FLAC compression improvement
Post by: itisljar on 2020-11-05 09:38:57

Well, did you test it in real world scenarios and what were the results? Don't just drop this here and go away :)

Title: Re: New FLAC compression improvement
Post by: ktf on 2020-11-05 17:16:55

That is a very good question. I probably didn't explain myself well enough in that regard.

The C code on github is not working properly I think. It isn't an improvement over the existing FLAC codebase. So, I haven't reached any real-world compression gains yet.

However, the Python proof-of-concept is promising, albeit with a caveat. To simplify the proof-op-concept, it only performs simple rice bits calculation, without partitioning, and it does not do stereo decorrelation. So, I can only compare single-channel FLAC files with rice partitioning turned off. This is why the gains in the proof-of-concept might be more than what would be achievable with rice partitioning and stereo decorrelation.

For most material, my FLAC files got about 0.2% smaller (which would be 0.1% of the original WAV size). In one case, with electronic music (Infected Mushroom - Trance Party) I got an improvement of 1.2% (which would be 0.6% of the original file size).

So, I think this is promising, but I haven't been able to achieve this in C yet.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-07 19:03:15

Hi all,

Last week I've had some spare time and a mind ready for a challenge, so I took a look at the code (https://github.com/ktmf01/flac/tree/irls (https://github.com/ktmf01/flac/tree/irls)). Sadly, it seems it still needs quite a bit of work before it performs as well as current FLAC. I'll share my findings here, so I can read them back later and for anyone interested.

How FLAC has worked since the beginning
So, FLAC uses LPC (linear predictive coding) to predict a sample based on previous samples, and stores the error/residual with a so called rice code. Finding a good predictor is done by modelling the input samples as autoregressive, which means there is some correlation between the current sample and past samples. There is a 'textbook way' of calculating the model parameters (which we'll take as the predictor coefficients) by using the Yule-Walker equations. These equations form a Toeplitz matrix, which can be solved quickly with Levinson-Durbin recursion. While there a few shortcuts taken, this should result in a solution which is close to a least-squares solution of the problem, and is very fast to compute.

What could be improved about this
While this is all good and fun, the predictor resulting from this process is not optimal for several reasons. First, the process minimizes the square error, while for the smallest file size, we want the shortest rice code. Second, as FLAC input is usually audio and not some steady-state signal, the optimal predictor changes over time. It might be better to ignore a (short) part of the input when trying to find a predictor. In other words: it might be better to find a good predictor for half the signal, then a mediocre predictor for all of the signal. Third, minimizing the square error puts an emphasis on single outlier samples, which messes up the prediction of all other samples, while this single sample will not fit in any predictor at all.

What has been already improved
Ignoring a short part of the signal is exactly what a patch I submitted a few years ago does. It added partial_tukey and punchout_tukey windows/apodizations, which ignore part of the signal. This is a brute-force approach, the exact same approach is tried on every block of samples.

What I have been trying last year
Now, last summer, I wanted to try a new approach in finding a predictor. This started out as a different way to find a least squares solution to the problem (without taking shortcuts), but I started to realize least squares is not equal to smallest rice code. As the rice code takes up the most space in a FLAC file, that should be the ultimate aim if I want compression gains. I figured that compared to a least squares solution for a predictor, using a least absolution deviation (LAD for short) (https://en.wikipedia.org/wiki/Least_absolute_deviations) solution is more resistant to 'outliers'. As these outliers mess up prediction, I thought this might work well, so I implemented code into FLAC to do this. This is done through iteratively reweighted least squares (https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares), or IRLS for short. This code works, but it is very slow and does not (yet) compress better.

What is wrong with this approach
After reviewing my code last week, I realise the LAD (least absolute deviation) IRLS implementation that I made is still not minimizing the rice code. For one, the rice code in FLAC is partitioned, which means that a large error at the beginning of the block can grow the rice code length differently than a large error at the end of a block for example. Depending on the rice parameter, a slightly larger error does not cost any extra bits at all (when the error is smaller than 2^(rice parameter - 1)), it might cost one extra bit (when the error is the same size as 2^(rice parameter)) or it might take a few bits (when the error is much larger than 2^(rice parameter)). I could not find any existing documented IRLS weighting (or a so called norm) that works with rice codes.

What I want to improve about it
So, the next step is to improve the IRLS weighting. This weighting procedure should ideally incorporate knowledge of the rice parameter (as this says something about whether bits can be saved or not) but this is not known during weighting. I think using a moving average* of the residual might be a good way to guess the rice parameter during the weighting stage. I could also use a partitioned average*, but as the number of rice partitions is not known beforehand, just as the rice parameter, the size of the partitions to average on will probably not match the rice partition size, and we might get strange behaviour on partition boundaries. With a moving average window the problem is similar (choosing the size of the window). The optimal window size correlates with the rice partition size, but this size is not known during the weighting stage, but at least this moving average doesn't 'jump' on arbitrary boundaries.

Using a moving average*, the weighting procedure can incorporate knowledge about which errors are considered too large to try and fix, and which are too small to bother with. If the error on a sample is much smaller than the moving average it can be ignored, and if the error on a sample is much larger than the moving average it can be ignored too. Only if the error is in a certain band, it should be weighted such that the next iteration tries to decrease this error.

In the current IRLS weighing scheme is least absolute deviation. To get from the default least squares to least absolute deviation, the weights are the inverse of the residual (1/r). For very small r, the weight 1/r is capped at some value. I can change this value depending on the moving average. Above this value, I will try using a weight of 1/r². The effect I think this will have is sketched in the image I have attached

If this doesn't work, maybe a moving average can instruct the algorithm to ignore a block. This is how the partial_tukey and punchout_tukey work, but here the method is not applied brute-force, but with some knowledge of the samples itself. If the moving average is much larger than the average of the whole block, that part is ignored. This way, short bursts of outliers (for example, the attack of a new tone) can be ignored in the predictor modelling.

*when I say average, I mean average of the absolute value of the residual from the last iteration.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-16 20:06:44

The last few days I've been busy working on this, and there is some progress. Sadly, the code is very slow, and I don't think this will improve much. This improvement fits a certain niche of FLAC users who want maximum compression within the FLAC format and don't care how long encoding takes, perhaps reencoding in the background.

Attached you'll a 64-bit Windows executable compiled by MinGW with -march=native on a Intel Kaby Lake-R processor and a PDF with a graph of my results. I haven't used -march before, but if I understand correctly, this code should run on fairly recent 64-bit CPUs with AVX2 and FMA3. To make testing a little easier, I have added a new preset, -9, which uses the new code.

The following graph shows the results of encoding with (from left to right) setting -5, -8, -8e, -8ep and finally -9.
(http://www.audiograaf.nl/misc_stuff/irls-compression.png)

The new preset is -8e + a new 'apodization function', irls(2 11). It is technically not an apodization, but in this way, the integration into the FLAC tool is pretty clean. The function has two parameters, the number of iterations per LPC order, and the number of orders. So, with irls(2 11), the encoder does two iterations at order 2, two iterations at order 3, all the way to 2 iterations at order 12. Sadly, there is still something wrong in the code, so I would recommend not using anything else then 11 order at this time. Using more iterations is very well possible, and gives a little gain at the cost of much slowdown.

Last time I improved FLAC compression (https://hydrogenaud.io/index.php?topic=106545.0) the improvement was about 0.2% with no slowdown. This one is another 0.2%, but at the cost of a 60x slowdown.

NOTE: Please use the attached binary with caution! There has little testing been done on it!

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-17 01:06:23

Not convinced of the usefulness, but interesting yes. Do you have any idea of what orders actually make for the improvement?

But I am curious why your graph wasn't visible here. Hotlinking forbidden? Anyway, it is at
http://www.audiograaf.nl/misc_stuff/irls-compression.png if any other HA kittehs should think the same:
(https://pics.me.me/thumb_this-is-relevant-to-my-interests-icanhaschee2dorgercom-varied-interests-50253454.png)

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 02:41:21

Very interesting stuff but even with a 5900x this is a hard nut.
flac-native -9 indeed compresses the albums i tried better as CUEtools flake -8 while this is slightly better on average as flac -8 ep.
The speed is to slow for my taste to consider it for use in my workflows but it is a very nice idea.
Many thanks for the effort and the working version :)

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 03:51:10

To late to edit the above, sorry. I did configure my frontend wrong by keeping -ep in the command.
The above counts for flac-native -9 -ep

Title: Re: New FLAC compression improvement
Post by: jaybeee on 2021-06-17 09:09:48

Quote from: Porcus on 2021-06-17 01:06:23

But I am curious why your graph wasn't visible here. Hotlinking forbidden? ...

Graph shows for me, so I suspect it's something in your browser settings preventing it being shown. It's happened to me before over the years.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-17 21:27:44

Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes

flac-native -9
7.586.738.858 Bytes

flac-native -9 -ep
7.581.988.737 Bytes

Title: Re: New FLAC compression improvement
Post by: kode54 on 2021-06-18 04:41:54

The hotlinking is broken because the link is http and the forum is https, and any browser which enforces security rules will block http resources, rather than downgrade the page security.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-18 08:57:17

Quote from: Wombat on 2021-06-17 21:27:44

Some unscientific numbers for 29 random CD format albums:
[...]

I was shocked to see that the difference between flake and this new LPC analysis method was so small, but I just realised that is probably because flake uses a smaller padding block by default. I tried to get CUEtools.Flake working here, but for some reason I can't. Wombat, can you check whether padding is indeed smaller with CUEtools.Flake? If there are no tracks longer than 20 minutes in your test, the difference should be 4096 bytes per track.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-18 14:37:15

You can use my old compile (https://hydrogenaud.io/index.php?topic=106446.msg903939#msg903939). If you want to use the encoder from the recent CUEtools download you need to copy the additional file Newtonsoft.Json.dll
Not to sure about padding. With foobars Optimize file layout + minimize i get this for a single album:
Total file size decreased by 29953 bytes for the CUEtools -8 version and Total file size decreased by 62721 bytes for the flac-native -9 version. I doubt this is much.

Title: Re: New FLAC compression improvement
Post by: IgorC on 2021-06-18 21:36:23

8pe vs 9 (+0.14% compression gain on 1 album).
~55-60x vs ~20x on my laptop with 6 cores/12 threads.

As for me, 9 is ok.

P.S. 9pe brings +0.08% compression gain at ~16x comparing to -9. Ok too.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-19 19:48:26

Quote from: Wombat on 2021-06-18 14:37:15

You can use my old compile (https://hydrogenaud.io/index.php?topic=106446.msg903939#msg903939).
[...]
Not to sure about padding. With foobars Optimize file layout + minimize i get this for a single album:
[...]

You are right. I did a comparison and CUEtools.Flake does a much better job than regular flac. It seems my time would have been better spend figuring out why CUEtools.Flake compressed so much better, but perhaps (hopefully) these gains stack, or my work will be for nothing.

Anyway, I've also rewritten my code to be able to use irls as postprocessing. So, when FLAC has found the best predictor in the normal way (through the regular LPC process, with the partial_tukey and punchout_tukey windows), it will try using that result as a starting point for IRLS, instead of starting over. In the graph this is called irlspost, with the number of iterations between round brackets. Using current -8e + some 4 extra windows (punchout_tukey()) like CUEtools.Flake does, and adding 3 iterations comes close to using -9 but is 4x faster.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 08:27:17

I found out why Flake compressed so much better than FLAC. Apparently, this has been a thing for at least 14 years: flake computes autocorrelation with double precision, and FLAC with single precision. this makes quite a difference on some tracks, especially piano music.

See the flac-dev mailinglist for the complete story: http://lists.xiph.org/pipermail/flac-dev/2021-June/006470.html

See the graph below for the results.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 09:55:15

Nice catch!

So the worst-slowdown CPUs for going double, would be those with SSE (which get the dark-blue benefits over the green) but no SSE2 (thus getting the light-blue penalty vs the red) - they will at -8 have a slowdown by a factor of 2.5 or so?
But at a benefit.

Thinking aloud about what near-equivalences between "new" and "old" for those worst-case SSE-but-not-SSE2 CPUs:
If I may presume that the most interesting (to end-users!) compression levels are -0, -8 and the default - signifying "speed please!", "compression please!" and "I don't care, it works":
* For the SSE-but-not-SSE2, going from dark-blue -6 to light-blue -5 would get as good as the same outcome. Of course you don't change the default out of the [light|dark]blue, but for those who choose default from the "I don't care, it works" it shouldn't matter much if a new build gives them the performance equivalent of bumping -5 to -6.
* And if the using-whatever-defaults users start to care, they can start using options. You didn't include any -3, but what would the light-blue -3 be? Hunch: close to dark-blue -5?
* It is also a stretch to redefine -8 or reassign "--best" to something else than -8 (even if the documentation always said "Currently" synonymous with -8) but the -7 suddenly looks damn good - and that goes for the red as well.

So maybe at the conclusion of your efforts one should rethink what the numbers - and "--best" (and "--fast"?) stand for. -6 to -8 use -r 6 which is not the subset maximum; and "4096" is not the maximum subset blocksize.

(And dare I say it, but as impressive as TAK performs, one may wonder how much of it would be feasible within other formats too.)

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 10:26:25

Quote from: Porcus on 2021-06-24 09:55:15

So the worst-slowdown CPUs for going double, would be those with SSE (which get the dark-blue benefits over the green) but no SSE2 (thus getting the light-blue penalty vs the red) - they will at -8 have a slowdown by a factor of 2.5 or so?

Yes. SSE2 has been around for 20 years and is present on all 64-bit capable CPU's, so this should not affect many users. I'm not sure about the VSX part (for POWER8 and POWER9 CPU's) but I think the penalty will be comparable to the SSE-but-not-SSE2 part if nobody updates these routines before the next release. I don't have the hardware, so I can't develop it.

Quote

* And if the using-whatever-defaults users start to care, they can start using options. You didn't include any -3, but what would the light-blue -3 be? Hunch: close to dark-blue -5?

-3 disables stereo decorrelation and is in a completely different ballpark. See http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%204.pdf Graph from that PDF is below, see red circle for -3. Using -3 is pretty much pointless, but perhaps some archaic decoders can only decode without stereo decorrelation. I think I'd rather leave that untouched.

Quote

* It is also a stretch to redefine -8 or reassign "--best" to something else than -8 (even if the documentation always said "Currently" synonymous with -8) but the -7 suddenly looks damn good - and that goes for the red as well.

Yes, but the name "best" does imply that is the best. It isn't, -8e is better and -8ep is even better, but it is the best preset. I wouldn't want to change that.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 11:41:17

Quote from: ktf on 2021-06-24 10:26:25

the name "best" does imply that is the best. It isn't,

Let's hear it for the good(?) old --super-secret-totally-impractical-compression-level ! :))

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 12:06:24

But on -3 I think you turned two arguments on their head, or am I just having a bad hair day?

* -3 improves over -2 at both speed and compression simultaneously. That is not pointless, that is good.
If anything of those settings is pointless from a performance perspective, it is -2.

* if your new -3 would happen to turn out as good as the old -5, it would mean that users who want "old -5" performance can happily go "new -3".
But that does not prevent anyone with a special need for -3 to keep using -3.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 12:24:23

Preset -0, -1 and -2 are only using fixed subframes. These are really easy to decode. If one wants maximum performance with only fixed subframes, -2 is the way to go, It is the odd one out performance wise, but I think is is like using TAK's -p0m instead of -p1: slower encoding, same decoding speed and less compression, but there is some limitation in the encoding process which should help decoding. I don't know for sure, I've never studied the TAK format.

Anyway, from a performance perspective, I think -0, -1, -2 and -3 are all pointless. The gain from -4 to -8 is less than 0.5%point (~1%) while the gain from -3 to -4 is 2%point (~4%). However, in the very early days of FLAC support on hardware devices through Rockbox, some levels gave a longer battery life than others, and some devices with very limited hardware didn't run fast enough to decode LPC. There is probably some documentation around referring to FLAC presets in terms of decoding performance. That's why I think the presets should only be changed as long as decoding performance is not affected.

Very archaic indeed, but FLAC has been broadly accepted for quite a long time now, so I guess it's part of the deal

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 12:54:02

But again, allowing users to migrate to a lower-number setting does not hurt any compatibility.

Admittedly, I only speculated through gut-feeling extrapolation, as of what your new -3 would be able to do.
But let me just point at the graph you posted. If I understand your light-blue -5 vs dark-blue -6 correct, it means that an SSE (no SS2) user who as of today prefers -6 could get a performance hit - which could be as good as eliminated by switching to -5 in the new version.

Going -6 to -5 is ... fine. Going the other way, forcing users to choose a higher number, that has a risk - but -6 on an old version to -5
on a new version means they keep their performance and maybe even get a slightly better battery life upon decoding. So that is a case for just adopting the double.

Of course it isn't that straight - not everyone has a collection represented by your graphs.

Title: Re: New FLAC compression improvement
Post by: Rollin on 2021-06-24 12:58:11

Quote from: ktf on 2021-06-24 12:24:23

n the very early days of FLAC support on hardware devices through Rockbox, some levels gave a longer battery life than others, and some devices with very limited hardware didn't run fast enough to decode LPC

Can someone name at least one such slow device? According to these results (https://www.rockbox.org/wiki/CodecPerformanceComparison) FLAC -8 decoding is even faster than mp3 decoding on all tested hardware.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-24 13:11:05

Quote from: Rollin on 2021-06-24 12:58:11

Can someone name at least one such slow device? According to these results (https://www.rockbox.org/wiki/CodecPerformanceComparison) FLAC -8 decoding is even faster than mp3 decoding on all tested hardware.

Seeing that table I think I remembered wrong. That table was what I remembered but couldn't find, FLAC compression levels being referred to in a benchmark on decoding performance.

Anyway, I'd like the opinion of the people reading this on the following (quote from the mailing list: http://lists.xiph.org/pipermail/flac-dev/2021-June/006470.html)

Quote

Code is here: https://github.com/ktmf01/flac/tree/autoc-sse2 Before I send a push request, I'd like to discuss a choice that has to be made.
I see a few options
- Don't switch to autoc[] as doubles, keep current speed and ignore possible compression gain
- Switch to autoc[] as doubles, but keep current intrinsics routines. This means some platforms (with only SSE but not SSE2 or with VSX) will get less compression, but won't see a large slowdown.
- Switch to autoc[] as doubles, but remove current SSE and disable VSX intrinsics for someone to update them later (I don't have any POWER8 or POWER9 hardware to test). This means all platforms will get the same compression, but some (with only SSE but not SSE2 or with VSX) will see a large slowdown.

Thanks in advance for your replies and comments on this.

So, switching from single precision to double precision, I'm faced with a choice: should I keep the fast but single-precision routines for SSE and VSX, so they keep the same speed but no compression benefit? Or should these be removed/disabled so all platforms (ARM, POWER, ia32, AMD64) get the same compression, but at varying costs?

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-06-24 16:21:55

Very good findings and developement you very well presented here ktf, thanks!
Changing any generic behaviour you suggest directly to flac may be hard to keep if other authors don't like you or ideas of yours.
This may be a silly idea but why not optimize flake further and doing a ktf flac encoder for enthusiasts that way?

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-24 23:16:56

Quote from: ktf on 2021-06-24 13:11:05

Anyway, I'd like the opinion of the people reading this on the following

I think the "don't switch" (= do nothing) is the worst. Well with the reservation that you might want to wait until there is a new version with other improvements too.

Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version? (I understand from what you mentioned above, those platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines? --SSEonlyplatform?

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-06-30 09:56:40

Quote from: Porcus on 2021-06-24 23:16:56

Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version? (I understand from what you mentioned above, those platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines? --SSEonlyplatform?

First is possible (but I don't see the benefit of doing that?), the second would mean changing the libFLAC interface and sacrificing binary compatibility, so I'd rather avoid that.

Anyway, I've finished polishing the changes and I've sent a pull request through github: https://github.com/xiph/flac/pull/245 Further technical discussion is probably best placed there.

Here's a mingw 64-bit Windows compile for anyone to test. This does not have the IRLS enhancement and thus no compression level 9 like the previous binary, but it compressed better on preset -3 to -8, mostly on classical music. I've changed the vendor string (I forgot that on the previous binary) to "libFLAC Hydrogenaud.io autoc-double testversion 20210630", so if you try this on any part of your music collection, it is possible to check which files have been encoded with this binary.

Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-06-30 18:02:37

Quote from: ktf on 2021-06-30 09:56:40

First is possible (but I don't see the benefit of doing that?)

If I understood correctly, all CPUs that would get the slowdown are 32-bit CPUs. Making the change for the 64-bit executable will be a benefit to several processors and not disadvantage anyone. If I got it right?

Then afterwards one can decide what to do for the 32-bit executable.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-07-01 03:20:47

Quote from: ktf on 2021-06-30 09:56:40

Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing

This SSE improvement indeed is a very nice finding!
The compile you offer ends for the 29 CDs with a size of:
7.592.544.746 Bytes
The small difference against flake is mainly due to padding this time i guess.

Quote from: Wombat on 2021-06-17 21:27:44

Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-07-01 11:45:02

The difference was about a megabyte per CD. Now it is down to a megabyte in total. Dunno if you use tracks or images.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-07-02 18:22:15

I tried some 24/96 albums and the new binary improved compression very well. Even better as CUEtools flake.
Really looking forward for a version you add the IRLS weighing back in over a dedicated switch for example.
Maybe also interesting the -8 -ep size for these 29 albums:
7.584.945.065 Bytes

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-07-02 21:19:48

Quote from: Wombat on 2021-07-02 18:22:15

Really looking forward for a version you add the IRLS weighing back in over a dedicated switch for example.

Sadly, the gains do not stack well. Most of the gains from the IRLS patch are the same gains that this autocorrelation precision doubling also gets. It still works, but not as much as I posted earlier.

On the upside, I've now programmed with intrinsics, and it went pretty well. It was a nice challenge, so maybe I can speed up the IRLS code quite a bit to compensate for the lower gains. Also, the autocorrelation precision is also relevant in levels -3, -4, -5, -6 and -7, and -5 is relevant to the numbers in the Hydrogenaud.io wiki Lossless comparison (https://wiki.hydrogenaud.io/index.php?title=Lossless_comparison) :))

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-07-03 16:03:49

Looks like http://www.audiograaf.nl/downloads.html has not seen its final edition yet :)

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-01 15:37:30

@ktf , you are probably more busy looking into https://hydrogenaud.io/index.php?topic=121349.msg1001309 , but still:
I've tested your double-precision build a little bit back and forth about compression levels on a varied (but missing chartbusters) corpus of 96k/24 files, 96/24 because I conjecture FLAC was primarily developed for CDDA, and so try something else to look for surprises. All stereo, though. Comparison done to official flac.exe 1.3.1.

Your -5 does pretty well I think. Results at the end, but first: minor curious things going on:

* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output. So is the case for 1.3.1: "-A tukey(0,5)" makes no difference.
(My locale uses comma for decimal separator. By using the wrong one I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3) . Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well-known? Also from the observation that the -8 order scores better than permuting.

* It seems that -b 4608 improves slightly - however I couldn't reproduce that on CDDA signals. Also, 16384 did not improve over 8192 on the 96/24 corpus.

So, the improvement figures. It improves quite a bit on -5. Note, sizes are done on the entire corpus, but times are not and are just an indication; for times, I only took a single album (NIN: The Slip) over to an SSD, hence the low numbers, and they are run only once.

* PCM is ~ 13.6 GB, compresses to 7.95 GB using 1.3.1 -5, and from then on we have MB savings as follows
80 MB going 1.3.1 -5 (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent or 1.9 percent points and at around zero speed penalty. Oh, and new -7 clocked in at 42 seconds, compare to old -8.

* I tested a bit of -e and -p on both new and old. A minor surprise:
- 5 at 1.3.1: -p (41 seconds) saves 2 MB, -e (instead of -p of course; 36 seconds) saves 35.
- 5 at yours: Both quite pointless, savings 0.4 (taking 45 seconds) and 0.6 (39 seconds).
Your build from -8 -b 8192 -r 8 (64 seconds, only one more than -8): adding "-p" (220 seconds) saves 2 MB, adding "-e" (instead, takes 214 seconds) saves 11.
So your "-5" picks up nearly all the advantages of -p or -e ... ?!

* For the hell of it, I started a comparison of -8 -b 8192 -l 32 -p -e, mistakenly thinking it would finish overnight ... I aborted it after a quarter run, having done only the"Lindberg" part below, and the autoc-double seems to improve a slight bit less here than in -8 mode. To get you an idea of what improves or not, fb2k-reported bitrates for that part:
2377 for 1.3.1 -5
2361 for 1.3.1 -8
2349 for 1.3.1 -8 -b 8192 -l 32 -p -e
2329 for your -3
2314 for your -5
2310 for your -8
2305 for your -8 -b 8192 (roundoff takes out the -p or -e improvements
2302 for your -8 -b 8192 -l 32
2301 for your -8 -b 8192 -l 32 -p -e

I can give more details and more results but won't bother if all this is well in line with expectations.
FLAC sizes: I removed all tags from the .wav files, used --no-seektable --no-padding and compared file sizes, since differences were often within fb2k-reported rounded-off bitrates.
Corpus (PCM sizes, -5 bitrates)
3.42 GB -> 2314 various classical/jazz tracks downloaded from the Lindberg 2L label's free "test bench", http://www.2l.no/hires/
2.64 GB -> 2161 the "Open" Goldberg Variations album, https://opengoldbergvariations.org/
3.18 GB -> 3035 Kayo Dot: "Hubardo" double album (avant-rock/-metal with all sorts of instruments)
1.41 GB -> 2913 NIN: The Slip
1.24 GB -> 2829 Cult of Luna: The Raging River (post-hc / sludge metal)
1.14 GB -> 2588 Cascades s/t (similar style, but not so dense, a bit more atmosphere parts yield the lower compressed bitrate)
0.57 GB -> 2724 The Tea Party: Tx 20 EP (90's Led Zeppelin / 'Moroccan Roll' from Canada)

Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-08-02 12:54:01

Quote from: Porcus on 2021-08-01 15:37:30

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3) . Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well-known? Also from the observation that the -8 order scores better than permuting.

Some observations by trial and error:

The order of windows work best from widest to narrowest in terms of time domain. For example, windows that don't take arguments, from widest to narrowest are Rectangular, Welch, Triangular/Hann, Blackman, then Flat top.
https://en.wikipedia.org/wiki/Window_function#A_list_of_window_functions
The left (blue) plots are time domain, and right (orange) ones are frequency domain, windows occupy more blue regions are wider.

Windows taking arguments like Tukey, when used repeatly, also work best from widest to narrowest, I believe Tukey(0) is same as Rectangular and Tukey(1) is same as Hann, according to this:
https://www.mathworks.com/help/signal/ref/tukeywin.html

I don't understand partial Tukey and punchout Tukey, look like these are combinations of several Tukey windows by looking at the source codes.

-e may adversely affect compression ratio with the above ordering (plus it is super slow anyway).

-6 to -8 already specified one or more windows, so don't use them if you are planning to use custom windows ordering.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-02 13:57:53

Quote from: bennetng on 2021-08-02 12:54:01

-e may adversely affect compression ratio with the above ordering

So the above test runs indicate the opposite: it improves every album, and even at hard settings with the "good order", but - and this is the surprise - it the improvement is near zero at the new build's "-5".
Gut feeling is that the new -5 is the oddball, and in a benevolent way in that it picks up a lot of improvement without resorting to -e or -p

I have very limited experience with -e and -p for this obvious reason:

Quote from: bennetng on 2021-08-02 12:54:01

(plus it is super slow anyway)

On this machine, not so slow as -p, and better (though not for every album; but, bitrates reported by fb2k, I have yet to see -p being more than 1 kbit/s better).

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-04 08:29:18

Quote from: Porcus on 2021-08-01 15:37:30

* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output. So is the case for 1.3.1: "-A tukey(0,5)" makes no difference.
(My locale uses comma for decimal separator. By using the wrong one I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)

That is because -A tukey(0,5) is the default. So, both should produce bit-identical output.

Quote from: Porcus on 2021-08-01 15:37:30

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3) . Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well-known? Also from the observation that the -8 order scores better than permuting.

It is not something well known. However, the differences should be very, very small. The reason the order might matter is that FLAC estimates the frame size for each apodization, it does not fully calculate it. If two apodization give the same frame size estimate, the one that is evaluated first is taken. The estimate might be a bit off though, which means that swapping the order can change the resulting filesize.

At least, that is how I understand it. This means that this influence should be minor, as having two estimates does not occur often and the actual difference should not be large, as the estimate is usually quite good. There could be something else at work though.

Quote from: Porcus on 2021-08-01 15:37:30

* PCM is ~ 13.6 GB, compresses to 7.95 GB using 1.3.1 -5, and from then on we have MB savings as follows
80 MB going 1.3.1 -5 (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent or 1.9 percent points and at around zero speed penalty. Oh, and new -7 clocked in at 42 seconds, compare to old -8.

Interesting. This is too little material to go on I think, but the change from float to double for autocorrelation calculation had most effect on classical music, and almost none on more 'noisy' material. For example, see track 11 in this PDF: http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics-per-track.pdf which is also NIN. There is almost no gain. That is does work well with the Slip has mostly to do with the higher bitdepth (24-bit) and not so much with the higher samplerate I´d say.

Quote

So your "-5" picks up nearly all the advantages of -p or -e ... ?!

-e does a search for the best order, without -e it uses an by-product of the construction of the predictor to guess the best order. It could be that the higher prediction also results in a better guess, but since the release of 1.3.1 there's also another change to this guess: https://github.com/ktmf01/flac/commit/c97e057ee57d552a3ccad2d12e29b5969d04be97

I can only guess why -p loses it's advantage. Perhaps because the predictor is more accurate, the default high precision is used better and searching for a lower precision does not trade-off well anymore?

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-04 08:32:58

fb2k-reported bitrates here. Not only is -5 nearly catching up with -6, but look at -4. And: -7 nearly catching -8.
If it were solely due to material, one would have expected 1.3.1 to show the same. Not quite. (There, -6 is quite good, that is in line with previous anecdotal observations.)

2644 for -3 useless or not, your build's -3 produces smaller files than 1.3.1 does at -8
2606 for -4 that shaves off 38 from -3. (2703 with 1.3.1 improves 20 from its -3)
2602 for -5 only improves 4 over -4. (But: 2692 with 1.3.1)
2599 for -6 only improves 3 over -5. (1.3.1: 2674, improves by 18 over -5.)
2590 for -7 improves by 9 over -6. (1.3.1: 2671, small improvement over -6.)
2590 for -8 calculated improvement 0.48. (1.3.1: 2666, improves more over -7 than -7 vs -6).
2580 for -8 -b 8192 -l 32 which is subset because 96 kHz. (1.3.1: 2670, worse than -8)

And then facepalming over myself:

Quote from: Porcus on 2021-08-01 15:37:30

"-7 without apodization"

save for the inevitable ~~doh~~default.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-04 10:03:45

All right, you posted while I thought the error message I got was that I had been logged out. Then since you mentioned something about what should benefit, I looked over the various albums, and I have seen the ten percent improvement mark!
And that is not classical or anything, it is the Tea Party four-song EP (yes the full EP, not an individual track). So I went for the current Rarewares 1.3.3 build, which does no better than 1.3.1.
Major WTF! You got PM in a minute.

As for the order of apodization functions: Yes, small difference. .04 percent to .08 percent.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-04 20:07:06

After doing some research with @Porcus, it became clear that I was wrong on the following:

Quote from: ktf on 2021-08-04 08:29:18

That is does work well with the Slip has mostly to do with the higher bitdepth (24-bit) and not so much with the higher samplerate I´d say.

As it turns out, it is the high samplerate and not the bitdepth that makes the difference. However, further research showed that it is actually the absence of high-frequency content in high samplerate files that make this difference. As a test, I took a random audio file (44.1kHz/16-bit) from my collection, and encoded it with and without the double-precision changes, and the difference in compression was 0.2%. When I applied a steep 4kHz low-pass on the file, this difference rose to 10%. To be clear, this is not the difference between the input file and the low-passed file, but the differences between the two FLAC binaries on the same low-passed file.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 09:20:03

Two more tests started when I went away for a few days.

* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?) Also tested against compression_level 12.
I don't know how subset-compliant ffmpeg is on hirez ... also, comparing file sizes could be kB's off, I had to run a metaflac removing everything on the ffmpeg-generated files - and then it turns out that there would still be bytes to save with fb2k's minimize file size utility.

* The other test was to see how close that "sane" settings get to insane ones. For that purpose I let it run (for six days!) compressing at -p -e --lax -l 32 -r 15 -b 8192 -A [quite a few]. TL;DR for that test: your new build gets much closer, the return on going bonkers is getting smaller with the double-precision build than it was with 1.3.1. Not unexpected, as better compression --> closing in on theoretical minimum.

Results:

1.3.1:
8066 MB for -8, that is actually 0.15 percent better than -8 -b 8192 -l 32 (also subset!).
7966 for the weeklong (well four days) max lax insanity, that shaves off more than a percent and ... and finally gets 1.3.1 to beat TAK -p0 which is 8017, this "for reference" O:)

The double precision build, and ffmpeg:
7873 MB for -5
7836 MB for -8
7831 MB for ffmpeg at compression_level 8 (fb2k reports 2 kbit/s better than your -8)
7815 MB for -8 --lax -b 8192 -r 15
7809 MB for ffmpeg at compression_level 12
7808 MB for -8 -b 8192 -l 32 (subset! -l 32 does better than -r 15, and 1 kbit/s better than ffmpeg)
7779 MB for the five days max lax insanity setting.

So are there differences at how ffmpeg does?
-8 vs compression_level 8: No clear pattern. Of seven files, ffmpeg wins 4 loses 3. Largest differences: ffmpeg gains 17 for Cascades, yours gains 15 for Nine Inch Nails.
-8 -b 8192 -l 32 vs compression_level 12: ffmpeg wins 3 loses 4, but the largest differences favor ffmpeg: 41 for The Tea Party. Again yours has the upper hand of 13 on NIN.

Now which signals *do* improve from the insane setting? Absolutely *not* the jazz/classical - nor is it the highest bitrate signal (Kayo Dot), they are within 5 kbit/s from -8.
It is The Tea Party, gaining 98 kbit/s over the subset, which again gains 46 over -8.

So the TTP EP indicates there *is* something to be found for music that is not extreme. Note that TTP is the shortest (in seconds) file of them all, and that could make for larger variability.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 09:26:27

By the way, TBeck speaks about splitting windows and that this is possible within the FLAC spec (https://hydrogenaud.io/index.php?topic=44229.msg402752#msg402752). This is beyond me (I stopped learning Fourier analysis long before getting hands-on and I totally suck at code), but ... ?

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-12 14:15:15

Thanks for digging that one up. Possibility A is what I implemented quite a few years ago in FLAC 1.3.1, partial_tukey and punchout_tukey. TBeck was talking about using various variations of the triangle window, this works a little different. Partial tukey uses only a part of the signal (hence partial_tukey) for LPC analysis, punchout tukey masks a part of the signal (it 'punches' out a part of the signal) by muting it for LPC analysis. This works exactly as described in the post you link

Quote from: TBeck on 2006-06-14 05:10:14

If so, then one frame often will contain parts with different signal characteristics, which better should be predicted seperately. But this does not happen.

This is my hypothese, why windowing helps FLAC that much: It surpresses the contribution of of one or two (potential) subframes at the frame borders to the predictor calculation and hence improves the quality of the (potential) subframe within the center. At least this one now gets "cleaner" (not polluted by the other subframes) or better adapted predictor coefficients and overall the compression increases.

Possibility B would make use of a variable blocksize. This is possible within the FLAC format, and flake (and cuetools.flake) implement this. However, as this has never been implemented in the reference encoder, it might be that there are (embedded) FLAC decoders that cannot properly handle this. I have wanted to play with variable blocksizes for a long time, but if it were to succeed, this might create a division of "old" fixed-blocksize FLAC files and "new" variable-blocksize FLAC files, where the latter is unplayable by certain devices/software.

I cannot (yet) substantiate this fear. Moreover, implementing this in libFLAC would require *a lot* of work. Perhaps it would be better to experiment with new approaches in cuetools flake, if I run out of room for improvements in libFLAC

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-12 14:58:42

Yeah, so the following musing is probably just ... not much pursuing, but that hasn't stopped me from thinking aloud:

* So, given the worry that variable block size won't decode well, maybe the "safer" way would be to, if not formally restricting "subset" to require fixed block size, then in practice stick to it in the reference encoder as default, so that if variable block size is supported it has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 may decode fine even for sample rates when it is subset?

* But within fixed block size and within subset, it is possible to calculate/estimate/guesstimate what block size is best. No idea if there is a quick way of estimating without fully encoding the full file. But there seems to be some sweet-spot, not so that larger is better, in that 4096 apparently beats 4608 and 8192 beats 16384.
Now that might be due to the functions being optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?

(... it is then so that increasing n in partial_tukey(n) allows further splitting up?)

Anyway, CUETools flake may in itself support non-CDDA, but I doubt that it will be used outside CDDA when CUETools is restricted that way. Meaning, you might not get much testing done. ffmpeg, on the other hand ... ?

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-13 15:07:51

Quote from: Porcus on 2021-08-12 09:20:03

* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?) Also tested against compression_level 12.

Perhaps I will. As far as I know, ffmpeg's FLAC is based on Flake, like Cuetools.flake. However, Cuetools' flake has seen some development (like implementation of partial_tukey and punchout_tukey), unlike ffmpeg.

Quote from: Porcus on 2021-08-12 14:58:42

* So, given the worry that variable block size won't decode well, maybe the "safer" way would be to, if not formally restricting "subset" to require fixed block size, then in practice stick to it in the reference encoder as default, so that if variable block size is supported it has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 may decode fine even for sample rates when it is subset?

Yes, it might be unsafe to use -l 32 -b 8192 when encoding for maximum compatibility. However, properly decoding -l 32 -b 8192 is quite simple, as it is simply more of the same (longer block, longer predictor). Also, it is part of the FLAC test suite.

Variable blocksizes are not part of the FLAC test suite, and quite a few things change. For example, with a fixed blocksize, the frame header encodes the frame number, whereas with a variable blocksize, the sample number is encoded.

Quote

* But within fixed block size and within subset, it is possible to calculate/estimate/guesstimate what block size is best. No idea if there is a quick way of estimating without fully encoding the full file. But there seems to be some sweet-spot, not so that larger is better, in that 4096 apparently beats 4608 and 8192 beats 16384.

The problem is that it is probably impossible to know upfront what the optimal blocksize is for a whole file.

Quote

Now that might be due to the functions being optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?

I cannot answer that question without just trying a bunch of things and see. I will explain the idea behind these apodizations (which I will call partial windows)

I would argue that a certain set of partial windows @ 44.1kHz with blocksize 4096 should work precisely the same with that file @ 88.2kHz with a blocksize of 8192.

partial_tukey(2) adds two partial windows, one in which the first half of the block is muted and one in which the second half of the block is muted. partial_tukey(3) adds three windows, one in which the first 2/3th of the block is muted, one in which the last 2/3th of the block is muted and one in the first and last 1/3th of the block is muted.

punchout_tukey does the opposite of partial_tukey. Using punchout_tukey(2) and partial_tukey(2) together makes no sense, because you get the same windows twice, because punchout_tukey(2) creates the same two windows but swapped. punchout_tukey(3) adds 3 windows, one with the first 1/3th muted, one with the second 1/3th muted and one with the third 1/3th muted.

If a block consists of a single, unchanging sound, partial_tukey and punchout_tukey do not improve compression. If a block has a transient in the middle of the block (for example, the attack of a new note at roughly the same pitch) punchout_tukey(3) adds a window in which the middle with this transient is muted. The LPC stage can now focus on accurately predicting the note and not bother with the transient. This is why adding these apodizations improves compression. As TBeck put it, the signal is cleaner. Of course this transient still has to be handled by the entropy coding stage, but a predictor that does part of the block very good beats one that does of all of the block mediocre on compression.

So, choosing how many partial windows to add depends on the number of transients in the music, the samplerate and the blocksize. If the samplerate doubles, the blocksize can be doubled keeping the same number of partial windows. If the samplerate doubles and the blocksize is kept the same, the number of partial windows can be halved. If you samplerate is kept the same and the blocksize is doubled, the number of partial windows should be doubled.

However, it depends on the music as well. Solo piano music at a slow tempo won't benefit from more partial windows as there are few transients to 'evade'. Non-tonal music on which prediction doesn't do much won't benefit either. Very fast paced tonal music might benefit from more partial windows. The current default works quite well on a mix of (44.1kHz 16-bit) music.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-13 15:56:13

* All right, I got the punchout now, thanks!

* I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test at that too.

* You mentioned solo piano, so let me mention the following: The Open Goldberg Variations does routinely not benefit from increasing the Rice parameter value (max, I never set min). By that I mean that settings that only differed by -r, say -r 6 vs -r 8, would often yield the same file (same .sha1) for that album. Yes a full album .flac file, I did not bother about tracks here.

* Meanwhile, I started another test using ffmpeg -compression_level 8 -lpc_type cholesky. It improves over -compression_level 8 for every sample, -3 kbit/s on the total. So at the ffmpeg camp they have been doing something bright.

* And then, since you posted in the https://hydrogenaud.io/index.php?topic=120906 thread, reminding me of that thing, I tried this new build on test signals: sine waves (at peak near full scale) - or tracks that have four sine waves in succession (peaks .72 to .75).

So even tracks that are damn close to continuing sine, benefit from double precision - but even then, ffmpeg beats it. Results:
-5 results:
388 for ffmpeg at -compression_level 5
376 for 1.3.1 at -5
362 for double precision at -5
-8:
360 for 1.3.1 at -8
343 for double precision at -8
330 for ffmpeg at -compression_level 8. Same as for your build with -p or -e (-p -e is down to 321).
314 for ffmpeg -compression_level 8 -lpc_type cholesky

Lengths and track names - the "1:06" vs the final "1:04" must be gaps appended. Total length 9:16 including 12 seconds gap then.
1:38 Mastering calibration - 1 kHz
1:38 Mastering calibration - 10 kHz
1:38 Mastering calibration - 100 Hz
1:06 Frequency check - 20, 32, 40 & 64
1:06 Frequency check - 120, 280, 420, 640
1:06 Frequency check - 800, 1200, 2800 & 5000
1:04 Frequency check - 7500, 12000, 15000 & 20000

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-08-13 16:27:04

CUETools flake supports at least 24-192. When CUEtools was new i reported problems with highbitrate material to Grigory Chudov and he fixed it immediately.
If you want to test recent behaviour just use my 2.16 encoder compile. AFAIK nothing changed for the flake encoder since.
Regarding non default blocksizes My Slimdevices Transporter for example can't play 24-96 material with 8192.

Title: Re: New FLAC compression improvement
Post by: Rollin on 2021-08-13 17:28:43

Quote from: Porcus on 2021-08-13 15:56:13

I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test at that too.

Yes, it does support higher resolutions. It is easy to use it as custom encoder in foobar2000.
(https://i.imgur.com/WreabtQ.png)
-q -8 --ignore-chunk-sizes --verify - -o %d
For compression levels higher than 8, --lax should be added to command line.

Quote from: Porcus on 2021-08-13 15:56:13

ffmpeg beats

Notice, that ffmpeg uses non-default block size. 4608 for 44100/16 with compression_level 8. And there is no option in ffmpeg to set block size.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-13 20:45:10

More testing with CUETools.Flake.exe -8 -P 0 --no-seektable --and-maybe-some-more-options, after I (thanks to Rollin!) found out that yes it is an .exe and not just a .dll ... I don't feel smart right now.

Results given for three groups of material, all figures are fb2k-reported bitrates.
Maybe the most interesting news:
No miracles from variable block size (--vbr 4, at one instance also --vbr 2).

* First, those test signals where 1.3.1 ended up at 360: four CUETools.Flake.exe runs all ended up at 359 (with or without --vbr 4, with or without -t search), that is not on par with ktf's double precision build

* Then the 96k/24 corpus, put the Flake into the relevant orders; all encoders that are not specified, are ktf's double precision build - to be clear, it is the libFLAC Hydrogenaud.io autoc-double testversion 20210630

2666 for reference flac.exe 1.3.1 at -8
2644 for -3
2609 for CUETools.Flake.exe at -8
2607 for CUETools.Flake.exe at -8 --vbr 4 -t search -r 8 -l 32
2606 for -4
2605 for CUETools.Flake.exe at -8 --vbr 4 (better than throwing in -t search -r 8 -l 32)
2602 for -5
2599 for -6
2590 for -7
2590 for -8 (0.48 better than -7, calculated from file size)
2588 ffmpeg -compression_level 8
2585 ffmpeg -compression_level 8 -lpc_type cholesky
2584 for -8 -b 8192 -r 8 -p
2584 for --lax -r 15 -8 -b 8192
2581 for -8 -b 16384 -l 32
2581 for -8 -b 8192 -r 8 -e (slightly smaller files than the "2581" above)
2581 ffmpeg -compression_level 12 (again slightly smaller files than previous 2581)
2580 for -8 -b 8192 -l 32 (subset too, notice 8192 better than 16384)
2571 for --lax -p -e -l 32 -b 8192 -r 15 -A enough-for-five-days
2563 for TAK at the default -p2.

* And, here are some results for one multi-channel (5.1 in 6 channels) DVD-rip (48k/24), 80 minutes progressive rock:
4119 for 1.3.3 at -8
4091 for ktf's -8
4088 for ffmpeg -compression_level 8
4080 for ffmpeg -compression_level 8 -lpc_type cholesky
4065 for CUETools.Flake.exe at -8, and also at -8 --vbr 2 and at -8 --vbr 4 (I overwrote without checking file size differences)
And lax options, not sure what to make out of them:
4044 for ffmpeg -compression_level 12 -lpc_type cholesky
4023 for CUETools.Flake.exe at -11 --lax
4006 for ktf's -8 -b 8192 -l 32 -r 8 --lax

Quote from: Rollin on 2021-08-13 17:28:43

Notice, that ffmpeg uses non-default block size. 4608 for 44100/16 with compression_level 8.

With reference FLAC I did not get 4608 to improve over 4096 at CDDA. Only very rough testing (but more than this test CD ...).
Also with 96k/24 I did not get 16384 to improve over 8192.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-08-14 17:18:01

Quote from: ktf on 2021-08-13 15:07:51

Quote from: Porcus on 2021-08-12 09:20:03
* One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?) Also tested against compression_level 12.
Perhaps I will. As far as I know, ffmpeg's FLAC is based on Flake, like Cuetools.flake. However, Cuetools' flake has seen some development (like implementation of partial_tukey and punchout_tukey), unlike ffmpeg.

Apparently, I was too quick dismissing ffmpeg. As can be read https://hydrogenaud.io/index.php?topic=45013.msg412644#msg412644 (https://hydrogenaud.io/index.php?topic=45013.msg412644#msg412644), flake's developer actually did quite a bit of development after integrating flake into ffmpeg. Also, others did.

It turns out this cholesky factorization (which can be used with the option -lpc_passes) does pretty much what the IRLS approach does which this thread starts with. I am truly surprised with this, especially as it has been there since 2006 with apparently nobody here at hydrogenaudio using it. I quote from https://github.com/FFmpeg/FFmpeg/commit/ab01b2b82a8077016397b483c2fac725f7ed48a8 (emphasis mine)

Quote

optionally (use_lpc=2) support Cholesky factorization for finding the…

… lpc coeficients

this will find the coefficients which minimize the sum of the squared errors,
levinson-durbin recursion OTOH is only strictly correct if the autocorrelation matrix is a
toeplitz matrix which it is only if the blocksize is infinite, this is also why applying
a window (like the welch winodw we currently use) improves the lpc coefficients generated
by levinson-durbin recursion ...

optionally (use_lpc>2) support iterative linear least abs() solver using cholesky
factorization with adjusted weights in each iteration

compression gain for both is small, and multiple passes are of course dead slow

Originally committed as revision 5747 to svn://svn.ffmpeg.org/ffmpeg/trunk

That description perfectly matches IRLS (iteratively reweighted least squares). My IRLS code uses Cholesky factorization as well. I'll look into this if I take another look at my IRLS code for libFLAC.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-16 00:13:18

ffmpeg already in 2006 ... hm.

This is probably my final test, and it is neither nice nor "fair": it is the single worst TAK-able track in my CD collection, Merzbow's "I lead you towards glorious times" off the Veneorology album. For those not familiar with this kind of noise music, you can listen at YouTube (https://www.youtube.com/watch?v=OzWNJtN86kU) to understand why these insane bitrates.

For ktf's double-precision build there is a simple TL;DR: absolutely no difference on this track.
After I put every flac file through metaflac to remove everything including padding, ktf's build produces bit-identically the same file as flac 1.3.1, for each setting. (Those were: -0 through -8, for -4 -e though -8 -e, for -4 -p -e through -8 -p -e. More than I intended, but a "wtf?" or two here.)

Also, some more bit-identicals:
flac.exe produce bit-identical files for the following groups: (-3, -4, -4 -e, -4 -p -e); (-6, -7, -8); (-6 -e, -7 -e, -8 -e); (-6 -p -e, -7 -p -e, -8 -p -e).
flake -11 and -11 --vbr 4 produce bit-identical files ... and they are quite a bit better than any other FLAC.

I deliberately picked a track that I knew would make for strange orderings in that TAK cannot handle it (only -4 without e and m gets within .wav size) - but look at how ffmpeg cannot agree with its own ordering. Also look at where flake put its -8 and -9. flac.exe only misses the order once, in that -0 produces smaller file than -1.

Lazy screendump coming up. The "cholesky" is the same kind of option as above.
(https://i.imgur.com/HmOGKy4.png)
Oh, OptimFrog managed to get down to 53 222 937, but I don't have any fb2k component for it.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-08-17 18:56:28

Quote from: Porcus on 2021-08-16 00:13:18

This is probably my final test, and it is neither nice nor "fair": it is the single worst TAK-able track in my CD collection

Couldn't help myself since there was another discussion on decoding efficiency, tried the ape. "Revised" worst end of the file size list:

1418 TAK -p1
1424 Monkey's Insane
1424 Monkey's High
1424 TAK -p0
1424 Monkey's Extra High
1426 Monkey's Normal

... so even when TAK fails at beating PCM, it succeeds at improving over Monkey's.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-21 13:16:07

After the 'double autocorrelation' change which has been submitted to FLAC quite a while ago, I've been busy improving the IRLS code for which I started this topic. Source code can now be found here: https://github.com/ktmf01/flac/tree/autoc-double-irls

Please see the image below

Compared are the exe I dumped here the 16th of June, the exe I'm attaching now and CUEtools.Flake 2.1.6. Presets for FLAC are, from left to right, -4, -5, -6, -7, -8, -8e, -8ep and -9. Presets for Flake are -4, -5, -6, -7, -8, -8 --vbr 4. As you can see, the largest difference is because of the 'double autocorrelation' change, which is clearly visible from -4 to -8ep. However, the change from old -9 to new -9 is what I've been working on. The IRLS code is now much faster, used more efficiently and compresses slightly better.

Feel free to try to exe, look through the source etc. Please be cautious with the exe, I've been quite busy tuning but have done little testing. I've only tested on CDDA material, maybe there are still some surprises on 96kHz/24-bit material left.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-22 05:06:34

Again my boring 29 CDs

IRLS beta -9
7.583.395.627 Bytes

IRLS beta -9ep
7.582.193.957 Bytes

older numbers
flac-native -9
7.586.738.858 Bytes

flac-native -9 -ep
7.581.988.737 Bytes

-9 speed has indeed improved,thanks!

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-22 20:53:23

Quote from: ktf on 2021-09-21 13:16:07

maybe there are still some surprises on 96kHz/24-bit material left.

I said the Hydrogenaud.io autoc-double testversion 2021063 -7 was good on hi-rez, this one is even better - on some material. Your new -9 is slooow on this material, good I didn't test the first one.

tl;dr on the below specified four hours of 96/24 (no fancy compression options given!)

-9: spends 40 minutes to achieve 57.25%
-8e spends 15:44 to achieve 57.30% (compared to Hydrogenaud.io autoc-double testversion 2021063 it shaves off .31 points at a cost of 8 seconds)
-8: spends 5:36 to achieve 57.33 (savings: 0.38 points, costs 12 seconds). ffmpeg at -8 gets inside that 0.38 interval, no matter whether it uses lpc_order 2 (spending 6:28) or 6 (at 17 minutes)
-7: spends 3:38 to achieve 57.37, which is still better than the autoc-double testversion's -8e
-6: spends 3:11 to achieve 57.83, that is not good compared to -7. Here and down to -4, the differences to the autoc-double testversion is at most .17
-5 spends 2:10 to achieve 57.92. -4 spends 2:00 to achieve 57.98. That's on par with ffmpeg -5, but twice as fast.

I tested CUETools.Flake -4 to -8, not so much variation, spending from 8:27 down to 3:04 for 58.23% to 58.48%.
I tested 1.3.1 at -8 -e (the -e by mistake), took twelve minutes for 58.97 and was worst for all files - except ordinary -8 half a point worse.

But a lot of the improvement is due to an album and an EP out of four. Your new -7 is faster than 1.3.1 -8 and yields savings by half a percent point up to 8.5 (!!) percentage points, and it is the biggest file that is least compressible.

Material: to get done in a day, I selected the following four hours from the above 96/24 corpus, in order of (in)compressibility:

* Kayo Dot: Hubardo. 93 minutes, prog.metal. Needs high bitrate despite not sounding as dense as the next one.
All about the same, all within half a percent point. And this is the biggest file of them all
Best: flake -8 at 65.72, then your new -9 at .73. (Heck, even OptimFrog -10 only beats this by 1 point.)

* Cult of Luna: The Raging River. 38 minutes sludge metal/post-hardcore.
Large variation, flake does not like this.
Best: New -9 at 59.45. -7 and up shaves a full percentage point over the autoc-double testversion. ffmpeg -8 about as good. ffmpeg -5 at 60.8. flake -8 at 62.59, 1.3.1 even a point worse at -8 -e.

* The Tea Party: Tx20. An EP, only 18 minutes Moroccan Roll. Earlier tests reveal: differs significantly between encoding options.
Large variations. Your -9: 53.95. Your -7 beats your new -6 by 3.2 points and your previous -8e by half that margin. ffmpeg varies by 3 points - here is the file where one more lpc pass makes for .1 rather than .02. Flake runs 60 to 61. flac 1.3.1 62 and 63.

* Open Goldberg Variations. 82 minutes piano, compresses to ~47 percent. Earlier tests reveal: doesn't use high Rice partition order.
Best: ffmpeg -8, but between 46.71 and 46.92 except flac -1.3.1 (add a point or more).

Done on an SSD, writing the files takes forty to sixty seconds. Percentages are file sizes without metadata, padding or seektables, but those don't matter on the percentages for such big files anyway.

Title: Re: New FLAC compression improvement
Post by: danadam on 2021-09-23 15:21:05

Quote from: Porcus on 2021-08-12 09:20:03

I don't know how subset-compliant ffmpeg is on hirez ...

AFAICT it should be on 88.2k - 192k by default and above that if you force block size to 16384.

From what I understand prediction order is not limited when sampling frequency >48000, partition order is limited to 8 but this the max that ffmpeg is using on any level, which leaves block size. ffmpeg is using 105 ms block size, which is translated to:

Code: [Select]

44100 - 4608
48000 - 4608
88200 - 8192
96000 - 8192
176400 - 16384
192000 - 16384
352800 - 32768
384000 - 32768

Quote from: Rollin on 2021-08-13 17:28:43

Notice, that ffmpeg uses non-default block size. 4608 for 44100/16 with compression_level 8. And there is no option in ffmpeg to set block size.

"-frame_size 4096" works for me.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-25 16:10:45

@ktf : On my computer, -9 -e generates same files as -9, and -9 -p -e the same (in an awful lot of time) as -9 -p; is that to be expected? I tried the 96/24 and a small CDDA set too.
Asking because it could depend on CPU-specifics.

(I wonder if combinations of -9, -e and/or -p will do calculations that couldn't lead to improvements. I don't know if there is any demand for any optimization; on one hand, who uses -p -e after all? on the other, yes those who use -9 -p apparently do get the same as -9 -e -p, so ...)

Quote from: danadam on 2021-09-23 15:21:05

partition order is limited to 8 but this the max that ffmpeg is using on any level
[...]
"-frame_size 4096" works for me.

Interesting. Thx, that leaves room to test whether it matters - and if so, whether defaults are optimized.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-25 16:58:34

As you can see above -ep only slightly improves compression because the SSE double precision already doesn't leave much room. I use a Ryzen 5900x.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-26 04:00:44

Quote from: Porcus on 2021-09-25 16:10:45

@ktf : On my computer, -9 -e generates same files as -9, and -9 -p -e the same (in an awful lot of time) as -9 -p; is that to be expected?

The same behaviour here. -e seems to do nothing with -9

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-26 09:29:08

Preset -9 is equivalent to -b 4096 -m -l 12 -e -r 6 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(3)", so it already includes -e.

irlspost-p takes the result from evaluating tukey(5e-1);partial_tukey(2);punchout_tukey(3) and iterates on it, with the final iteration also being evaluated with -p. Using -9p gives only very small gains as irlspost-p already uses p (and irls usually results in the smallests file)

If you want even better compression, I'd recommend either using -9 -r 8 in the case of electronic music (some chiptune can really gain a lot by using this) or -9 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(10)" or an even higher number for irlspost-p. I haven't seen much improvement with more than 10 iterations, but perhaps this is different for hi-res material.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-09-26 14:56:40

Are there any known decoding problems going above -r 6?
Playing with the parameter i see no real gains and even slightly worse results for 24-96 stuff.
I think the default -9 setting is well chosen. -p is simply to slow for its minimal gains.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-26 19:11:14

Quote from: Wombat on 2021-09-26 14:56:40

Are there any known decoding problems going above -r 6?

Not that I know of. ffmpeg and CUEtools.Flake use it in some presets by default. For most music it is not worth the trade-off, but for chiptune (Game Boy emulation) I've seen gains of 3 - 4%. There, the high partition order actually switches on low-frequency square wave transitions.

Quote from: Wombat on 2021-09-26 14:56:40

I think the default -9 setting is well chosen. -p is simply to slow for its minimal gains.

Yes, with the irlspost-p I tried to get the gains I saw with using -p but for minimal speed loss. As the irlspost stuff takes the best predictor from the tukey windows and does a few iterations on them, the result from the irls process usually gives the smallest frame.

From the small difference between -9 and -9p you can see that these iterations sometimes give results worse instead of better (because a improvement between -9 and -9p implies that this is from the regular tukey apodizations, and thus that the IRLS process did not improve upon them), it also shows that the process usually works very well.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-26 20:57:13

It seems that adding another "-e" still slows it down, so apparently some combinations force the encoder to do the same work twice.

Anyway, I will be putting the build at some overnight work out of curiosity, but avoid the -p and -e.

Question: Does irlspost-p(N) correspond to ffmpeg's lpc_order N in terms of passes, or to N-1 or to N+1 or justforgeteverythingaboutcomparing?

Edit: Oh, and on the Rice: The Open Goldberg Variations album would frequently end up in same files at a number of different -r settings.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-27 07:31:55

Quote from: Porcus on 2021-09-26 20:57:13

It seems that adding another "-e" still slows it down, so apparently some combinations force the encoder to do the same work twice.

Are you sure? I cannot think of a way that can happen

Quote

Question: Does irlspost-p(N) correspond to ffmpeg's lpc_order N in terms of passes, or to N-1 or to N+1 or justforgeteverythingaboutcomparing?

I assume you mean -lpc_passes? If you want to compare anything, I'd say comparing to irlspost(N) instead of irlspost-p(N) is more fair. irlspost(5) would correspond to -lpc_passes 6, but the algorithms are quite a bit different.

The basic idea is the same, but the execution is very different. In both implementations the basic weight is the inverse of the residual of the pass before. This is the so called L1-norm (https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares#L1_minimization_for_sparse_recovery). The implementation in ffmpeg has a factor summed to the absolute of the residual before inversion (https://github.com/FFmpeg/FFmpeg/blob/c3222931aba47b313a5b5b9f3796f08433c5f3b9/libavcodec/lpc.c#L261) which grows smaller every iteration. I don't know what the idea for that is, and it seems counterproductive to me. My implementation in libFLAC weighs according to the L1-norm, but has a cut-off in place for small residuals, this is something suggested by most books I read on the subject. Cutoff is currently at 2 (https://github.com/ktmf01/flac/blob/97d52faec95ebee4b3ffc8512a0d9ee03e2a0d45/src/libFLAC/lpc.c#L257), but this is something that is tuneable. This cut-off makes sure small residuals don't get too much attention, and it protects against division by zero. I have experimented with much larger values, and some music seems to compress better with values over 50 for example, so this still needs tuning.

My implementation also weighs with a moving average of the data. Current moving average window width is 128 samples (https://github.com/ktmf01/flac/blob/97d52faec95ebee4b3ffc8512a0d9ee03e2a0d45/src/libFLAC/lpc.c#L50), but this also something that is tuneable. This is because the way rice partitions work: a large residual in one part of the block does not have to have the same impact in another part of the block. By using a moving average as a proxy for this effect, the IRLS algorithm can try to optimize the whole block for the minimal number of bits, instead of only the hardest-to-predict parts. This moving average window width is also tunable, and is related to the -r parameter.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-27 12:30:39

Quote from: ktf on 2021-09-27 07:31:55

Are you sure? I cannot think of a way that can happen

I ran two rounds -e and one -p -e, happened on all three. Will run them over a few times to see if it was a coincidence.

By the way, was there any reason that the new -5 to -8 should improve compression over the double precision version I tested above? (Because, apparently they do improve. Again only tested the 96/24 files.)

Quote

I assume you mean -lpc_passes?

Yes.

Quote

If you want to compare anything, I'd say comparing to irlspost(N) instead of irlspost-p(N) is more fair. irlspost(5) would correspond to -lpc_passes 6, but the algorithms are quite a bit different.

Depends on whether you think irlspost(N) is ready for test runs?
Naming suggests that irlspost(N) is like irlspost-p(N) but without the final "p" - and so that the algorithms that are "quite a bit different" are those two vs ffmpeg, and not irlspost vs irlspost-p?

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-09-27 20:37:58

Quote from: Porcus on 2021-09-27 12:30:39

By the way, was there any reason that the new -5 to -8 should improve compression over the double precision version I tested above? (Because, apparently they do improve. Again only tested the 96/24 files.)

Yes, the code has seen some improvements since I uploaded that binary. They have been included in the most recent binary.

Quote from: Porcus on 2021-09-27 12:30:39

Depends on whether you think irlspost(N) is ready for test runs?
Naming suggests that irlspost(N) is like irlspost-p(N) but without the final "p" - and so that the algorithms that are "quite a bit different" are those two vs ffmpeg, and not irlspost vs irlspost-p?

3 'apodizations' have been added. They aren't apodizations, but that was a good way to introduce it and it works in the exact same part of the code. Those three are irls(N), irlspost(N) and irlspost-p(N). They work with the exact same code, but the way in which they interact with the rest of the code is different.

irls(N) does iterations 'from scratch', irlspost(N) takes the best of the previous apodizations as a starting point (hence the post, it works like post-processing), irlspost-p(N) is the same but with precision search just for the end results of the IRLS process. The inner process of all three is the same.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-09-29 16:33:14

First: false alarm on -e taking more time. Probably it was because I kept files on a spinning drive approaching full, and so I/O would always take more time on the second run (and I would do -9 before -9 -e).

So I re-started the overnight jobs to get accurate encoding times for the full seven-hour 96/24 corpus. Here are a few at 4096 except ffmpeg at its default 8192 and Flake at variable blocking:

* 2644 for flac.exe 1.3.1 at -8 -p -e . It spent 170 minutes on this, only because I had to test how low the new could go and still beat it:
* 2641: In four minutes, your new -2 compresses better than 1.3.1 -8 -p -e. That is worth something eh? :-D
* 2605: as good as CUETools' Flake got it within subset. (Not timed, from earlier, -8 --vbr)
* 2597: new -5 (five minutes)
* 2585: ffmpeg -compression_level 12 (No time, from earlier.)

Bearing in mind the improvement from 1.3 1 and Flake down to the next one, the compression improvements from -7 aren't much:
* 2578: new -7 (< 7 minutes).
* 2577: ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 2 (45 minutes), narrowly beaten by your new -8:
* 2577: new -8 (10 minutes)
* 2575: new -8 -p -e takes 3h38min, that is slower than 2x realtime, and I got better compression from "-9 without -e" (not timed)
* 2575: 69 minutes for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 4
* 2574: 92 minutes for -9. The improvement over "without -e" is 0.48 kbit/s
* 2574: 93 minutes for ffmpeg -compression_level 12 -lpc_type cholesky -lpc_passes 6, narrowly beats -9. And another 24 minutes on -lpc_passes 8 only improved a fraction of a kbit/s.
* 2574: 84 minutes for "-9 with -e replaced by -r 8", beats -9 by 0.15 kbit/s - that is due to The Tea Party EP (which also by the way is better off at -b 2048)
* 2574: -9 -p improves 0.35 over -9 and is so slow I won't touch it again.

For what this limited testing is worth, it points at:
* -7 is so near -8 that ... Should one beef up -8 to get a proper difference? Try to look for a different apodization function? Look for overlap, add a fourth, do -r 8 or ... ? On the other thand, it is slower than the stock compile.
* -8 -p -e must die. Die. Oh it doesn't even move. Oh yes it does ... in a few hours. But nobody uses it eh? Or well maybe somebody does because it is perceived as the "best" preset.
* For those who want to wait for -9, it is on par with with what ffmpeg can do in the same amount of time (ffmpeg: -lpc_passes costs 6 minutes per pass. I didn't calculate until after running the even-only).
* While I don't think people will do -9 without having CPU time to spend, consider whether a new -9 should force -e included, when it isn't in -0 to -8. Maybe better keep it optional, as omitting it doesn't lead to unreasonable compressions.
* The average improvements for the tougher modes, are driven by a few signals - the TTP EP and Cult of Luna, primarily. That points towards an adaptive "this is enough!" mode, if anyone bothers to implement it. Or, actually, even if one doesn't want to go for the variable blocking strategy, then one could do two full -5 runs only to guesstimate the best for-the-file blocksize before running -9, only adding ten percent to the overall time. We have an expensive -p and an expensive -e, so heck ...

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-01 10:56:19

More overnight testings. The question I was thinking of asking is: how much CPU time do you have to spend to s(h)ave the next megabyte? (Amounting to about a third of a kbit/s on this corpus.) You would expect this marginal time cost to increase as you first select the ripe fruits to pick, and the following suggest that it holds (compare to 1.3.1, where going from -8 to -8 -e saves 62 MB in a quarter of an hour, that is around fifteen seconds per):
* Going new -5 (or -6, also just tested, is not much better than -5) to new -7: a few seconds per megabyte s(h)aved. Did I say that new -7 is damn good?!
* new -7 to -8: a minute per megabyte s(h)aved
* -8 to -9: about ten minutes for the next megabyte. Sounds horrible, but it is better than going -8 to -8 -p -e.

But then at this "ten minutes for the next" (for those willing to spend that CPU time going from -8 to -9) - then interesting things happen:
There are alternatives that get you a megabyte per ten minutes, and several of them seem to add up nicely! Given that the "cost" a megabyte just jumped by a factor of ten, you would expect the next to be jumping further? No, not necessarily:
The following are about in the same minutes-for-the-next-megabyte and can be combined on top of -8:
* A manual "-9 without -e"
* adding "-r 8 -e" to the previous
* Two more apodization functions: I just naively continued the "partial_tukey(2);punchout_tukey(3)" pattern with A tukey(5e-1);partial_tukey(2);punchout_tukey(3);partial_tukey(4);punchout_tukey(5)

That the marginal cost stays flat for a while, suggests that there is some smart setting to be found that catches the lion's share of the improvement much cheaper. What to try?
(Idea: maybe try an approach where higher order partial_tukeys are done by rather than making 4 each of length 1/4, take the first and last 1/2, and first and last 1/4 and leave the middle to a different function name?)

What I actually intended with these tests, was something else: My idea was that with these improvements, there might be settings that are no longer well suited because they only take time doing what the new code already picks up. (Like -p ... pending more testing, the documentation could indicate that this is less worth than -e.) Apart from -7 getting so close to -8, I didn't find anything consistent. Some isolated strange things are going on; like, I tried two files with -9 -l 32 (which encodes at half real-time!) and one of them became bigger than -9.

Also, wishlist items:
* After displaying compression ratio with three decimals (should be enough for most purposes, but ...), it wouldn't hurt with a "(+)" if it is >1 and "(=)" if it hits exactly the same audio size (which it sometimes will, say if recompressing with a -r 8 that makes no difference).
* --force-overwrite-iff-higher-compression . (Which spawns the question: if your irlsposts are post-processing algorithms, could they take as starting point an existing flac file without doing the initial compression?)
* Maybe abort with error if user gives -A "blah blah blah without closing quotation mark -- "filename with space.wav" ? I learned it leaves flac.exe hanging doing nothing responding nothing.
* For the documentation (https://xiph.org/flac/documentation_tools_flac.html): the fact that flac accepts 5e-1, avoiding locale-specific 0.5 vs 0,5. Should be recommended.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-10-01 19:32:44

Quote from: Porcus on 2021-10-01 10:56:19

* A manual "-9 without -e"

Sadly, even a -9 without -e internally forces the -e on the IRLS code because I haven't implemented a 'guess' method for the predictor order for IRLS code. Perhaps I could check what would happen if the code just defaulted to the highest order calculated.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-03 20:34:30

Quote from: ktf on 2021-10-01 19:32:44

Perhaps I could check what would happen if the code just defaulted to the highest order calculated.

Why not ... since it is not my time. Well actually: presets 0 through 8 all have the 2x2=4 combinations of -N, -N -p, -N -e and -N -p -e, so, not a bad idea either.

Did another test on a different computer. Earlier on we found that double precision did well on higher frequency content (you did an aggressive lowpass to get the point across), and I was curious whether it carries over to even higher-rez. Turns out yes - and also your most recent update does.
This is one track only, so all reservations taken. I went over to http://www.2l.no/hires/ and picked the Britten track in 352.8/24 as well as 88.2/24 (that's in the 96 column ...) and ran it with (I) stock 1.3.1, (II) your "native" as posted here, (III) your double precision as posted here, and (IV) your most recent IRLS.
Did -8 and -8 -e. And then (V) on the new build: -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost(3)" (no -p!) and -9. Then, (VI): replaced the irlspost(3) by irlspost(7) and for -9, bumped up to irlspost-p(7). And finally, your new -7.

Observed the gains:
(I) 4515 kbit/s at -8. Bitrate savings to -e: 42 resp. 19.
(II) Nine kbit/s better than 1.3.1, and Savings to -e comparble as for 1.3.1, ending up at 6842 resp 2127
(III) HUGE savings for the 352 file! Bitrates 6525 resp. 2101.
Going to -e saves another 30 resp ... rounded off to 0.
(IV) 6444 vs again 2101. And, rounded off to integer kbit/s, -e gains nothing.
(V) irlspost(3) makes no difference. -9 gains 5 resp. 10 kbit/s.
(VI) Just don't. One file up a couple hundred bytes, one down about the same.

Also, going -8 to -6 hurts the 352 the most. Same about -4 to -3 and -1 to -0. Going -8 to -7 does hardly anything, neither does -6 to -5 to -4 or -2 to -1. Only -3 to -2 hits the 88.2 slightly more.

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-10-04 12:43:25

Quote from: Porcus on 2021-10-03 20:34:30

Observed the gains:
(I) 4515 kbit/s at -8. Bitrate savings to -e: 42 resp. 19.
(II) Nine kbit/s better than 1.3.1, and Savings to -e comparble as for 1.3.1, ending up at 6842 resp 2127
(III) HUGE savings for the 352 file! Bitrates 6525 resp. 2101.
Going to -e saves another 30 resp ... rounded off to 0.
(IV) 6444 vs again 2101. And, rounded off to integer kbit/s, -e gains nothing.
(V) irlspost(3) makes no difference. -9 gains 5 resp. 10 kbit/s.
(VI) Just don't. One file up a couple hundred bytes, one down about the same.

Could you explain this a little more, I can't follow. (I) has only 1 result for -8 but (II) and (III) have 2 which are in completely different ballparks? The number in the 6000-range is the 384kHz file and the one in the 2000-range the 88.2kHz number, I think? What is the 4000-range number for (I)?

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-04 14:27:44

Gosh, sorry. (4515 is the average of the two.) And then I got something wrong because (I) and (II) yield the same at -e. Restating (I) and (II):

(I): -8 yields 6884 and 2146. -8 -e yields 6842 vs 2127, improving 42 resp. 19.
(II): -8 yields 6869 and 2142. Adding a "-e": again 6842 resp 2127, so the "-e" improvements are down to 25 resp. 15.

That's still about the same ballpark when you compare to what gives the BIG differences on the 352.8 file. While the 88.2 varies with at most 46 kbit/s - that is still two percent! - the big one is reduced by six percent, and the contributions can be summarized in order of significance:

* 327 (that is nearly five percent) by going from your "native" build -8 -e to the double-precision build at -8
* 79 by going native -8 -e to IRLS -8 (without -e)
* Twenties to forties: adding "-e" in (I), (II), (III)
* A few kbit/s: Going 1.3.1 -8 to native -8; and, adding "-p" to IRLS -8;
* Zero to one, all on the IRLS build: adding "-e" to -8; further adding "-p" to -8 -e and to -9.

Here since I had to do things over again, I also let the IRLS beta do -p. The bitrates for the big file for the IRLS version are
6444 for -8 -e (gains 1208 bytes, that is 15 parts per million, over -8)
6439 for -9
6438 for -8 -p (improves over -8 -e, contrary to my uh, well-founded prejudices ...),
6437 for -8 -p -e
6437 for -9 -p

So the huge gain is the move to double precision; and then on top of that, your new build improves further, dominating the advantages that "-e" used to give. Then my interpretation was that new -8 is so good that "-e" no longer saves much; but, here -p did something.

(Hm, is there any way to make an "intelligent guess at doing halfway -e, halfway -p, halfway various -b" without going full brutal on it all? Let's not call the semiexpensive switch "--sex", rather ... hey, -x is not taken.)

Oh, and one more thing. On the seven-file corpus, where I said -9 -p is so slow I wouldn't touch it again: nevertheless, to get a more accurate timing of it, I gave it an overnight job to arrive at around 1x realtime, i.e.:
-9 -p = -9 -p -e took around twice the time of -8 -p -e.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-05 20:56:51

New testing with conflicting results, CDDA corpus and new higher-rez.

A bit of tuning on the previous 96/24 found that I could use -8 -Ax4, where the four being punchout_tukey(7) and partial_tukeys 7, 4 and 1 (=the default tukey(5e-1) to beat -9 - and comfortably and a fraction of the time.
So to see if that was a more universal observation, I ditched the old corpus and found a new. Results vary wildly from CDDA to hi-rez.

Second and third columns are improvement in 1/100 percent (not percentage points!) over new -8 and over the previous boldfaced setting in the column; the "-b 4608" is over the setting in question.

Second column is CDDA, third is hirez. Positives are savings. Fourth column is encoding time. For "reference": new -7 (damn good I say!), new -5, and the original 1.3.1 -8 files.

*EDIT:* Facepalm, the table had only encoding time for the full thing, not CDDA. And -e is more expensive for higher rez, so that -8 -e actually takes less time than the Ax4 (458 vs 475 seconds). Trying to edit manually:

*Setting etc*	*CDDA*	*hirez*	*encoding time*




1.3.1 -8 to new -8	-11.7	-542.1	n/a


-5 to -8	-48.8	-167.6	149


-7 to -8	-3.7	-17.5	242


(This is -8)	0	0	347

-b 4608 impr:	-0.6	2.2


-e over -8	1.3	13.5	869	CDDA: 458
over previous	1.3	13.5
-b 4608 impr:	-0.6	2.3


Ax4 over -8	4.5	35.7	838	CDDA: 475
over previous	3.2	22.1
-b 4608 impr:	-0.3	3.1


Ax4 -r8 over -8	4.6	35.6	894
over previous	0.1	-0.1
-b 4608 impr:	-0.3	3.4


Ax4 -r8 -e over -8	5.8	42.0	3429
over previous	1.2	6.4
-b 4608 impr:	-0.3	3.5


-9 over -8	10.9	29.1	2820	CDDA: 1537
over previous	5.1	-12.9
-b 4608 impr:	-1.4	3.7


-9 -r8 over -8	11.0	29.0	3794
over previous	0.1	-0.1
-b 4608 impr:	-1.3	3.7

... thanks to whomever posted the link to https://theenemy.dk/table/

What we can see:
* No reason to use -8 -e. The "Ax4" beats -e quite a lot, at about the same time.
* -9 for CDDA: improvement over Ax4 even more than Ax4 improves over -e, but expensive in time.
* -9 for hi-rez: loses to Ax4
* -b 4608 hurts CDDA slightly (unexpected), benefits hi-rez (expected).
* Net effect of -r 8 is around zero.

Music then.
CDDA, I sorted my collection (tracks, not images!) on audio MD5, started somewhere and highlighted about 2 GB FLAC (1.3.1) files consecutive in the sorting. That's Beastie Boys and Black Sabbath, Zappa and Muse, Skinny Puppy and Creedence ... but overall heavy on electric guitar-driven rock, prog and metal.
hi-rez, I took a few albums and topped up with what was hi-rez from an Earache free sampler - makes this even heavier metal oriented than the CDDA part. Got to two GB there too.

No classical this time, because I have not much more > CDDA than what I have already used.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-08 14:22:07

Well as if I haven't already posted enough misunderstandings of mine, here I think I made another.

Quote from: Porcus on 2021-10-05 20:56:51

found that I could use -8 -Ax4, where the four being punchout_tukey(7) and partial_tukeys 7, 4 and 1 (=the default tukey(5e-1) to beat -9

Let's see: partial_tukey(N) creates N functions and as I think @ktf tried to teach me, each of them apparently takes about as much computational effort as any other. (I casually tested ... it seems so.)
Back in the younger neolithic, this forum discussed how -Ax2 was interesting but not worth it; well -8 is now an Ax6 and the above is a nineteen to beat -e.

And for -9 and IRLS: a -Ax19 -e to beat -Ax6+irlspost+-e ...

(Looks like the battle between brute force and a sensible algorithm. Except, it seems, the sensible algorithm takes so much time that semi-Brutus can give it a few casual stabs and then go home.)

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-27 23:17:19

Anyone feels like testing the flac-irls-2021-09-21.exe posted at https://hydrogenaud.io/index.php?topic=120158.msg1003256#msg1003256 with parameters as below? Rather than me writing walls of text about something that could be spurious ...

For CDDA:
* Time (should be similar) and size: -8 against -8 -A "tukey(5e-1);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)" against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)" (I expect no big differences between the latter two.)
* Time (should not be too different) and size: -7 -p against -8 -A "welch;flattop;partial_tukey(3/0/999e-3);punchout_tukey(4/0/8e-2)" (or replace the welch by the tukey if you like)
* Time (should not be too different) and size (will differ!): -8 -e against -8 -p against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)"
Note, it is irlspost, not irlspost-p.

For samplerate at least 88.2:
* -8 against -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"
* For each of those two: How much does -e improve size?
* How much larger and faster than -e, is -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" ?

My tests indicate that the gauss(3e-3) combination is impresses nobody on CDDA, makes very little difference on most hirez files - but for a few it could be a percent. And, then the "-e" improvement was a WTF. But hi-rez performance is absolutely not consistent ... well it is much better than the official release.

Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-10-30 11:33:20

Quote from: Porcus on 2021-10-27 23:17:19

But hi-rez performance is absolutely not consistent

I guess it's because most of the CDDA frequnecy bandwidth in typical audio files is used to encode audible signal with more predictable harmonic and transient structure. On the other hand, hi-res files can have gross amount of modulator noise from AD converters, idle tones, obvious low pass at 20-24kHz, resampling artifacts, and occasionally real musical harmonics and transients in the ultrasonic frequencies. They are quite different things and therefore hard to predict.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-10-31 19:23:39

Quote from: bennetng on 2021-10-30 11:33:20

and occasionally real musical harmonics and transients

Does my sensitive snout smell the sweet scents of sarcasm ... ? :)) Indeed I suspect the same reasons.
And even if it weren't, it would still be well justified to tune the encoder more for CDDA than for formats with less market share.

Still it would have been nice to find something that catches several phenomena cheaply. I ran mass-testing on one hirez corpus and then observed the same thing on another - but due to very few files in both - and I still wonder if it is too spurious.

Title: Re: New FLAC compression improvement
Post by: bennetng on 2021-11-01 11:34:39

Quote from: Porcus on 2021-10-31 19:23:39

Does my sensitive snout smell the sweet scents of sarcasm ... ? :))

Maybe, but speaking of sarcasm and unpredictable factors:
https://www.audiosciencereview.com/forum/index.php?threads/sound-liaison-pcm-dxd-dsd-free-compare-formats-sampler-a-new-2-0-version.23274/post-793538
Those who produce hi-res recording couldn't hear those beeps in the first place?

I've seen lossy video codecs can be tuned to optimize for film grain, Anime and such, but don't know if it is relevant to lossless audio or not.

Title: Re: New FLAC compression improvement
Post by: rrx on 2021-11-05 12:33:06

Quote from: Porcus on 2021-10-27 23:17:19

Anyone feels like testing the flac-irls-2021-09-21.exe posted at https://hydrogenaud.io/index.php?topic=120158.msg1003256#msg1003256 with parameters as below? Rather than me writing walls of text about something that could be spurious ...

For CDDA:
* Time (should be similar) and size: -8 against -8 -A "tukey(5e-1);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)" against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)" (I expect no big differences between the latter two.)
* Time (should not be too different) and size: -7 -p against -8 -A "welch;flattop;partial_tukey(3/0/999e-3);punchout_tukey(4/0/8e-2)" (or replace the welch by the tukey if you like)
* Time (should not be too different) and size (will differ!): -8 -e against -8 -p against -8 -A "welch;partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)"
Note, it is irlspost, not irlspost-p.

For samplerate at least 88.2:
* -8 against -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)"
* For each of those two: How much does -e improve size?
* How much larger and faster than -e, is -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2);irlspost(1)" ?

My tests indicate that the gauss(3e-3) combination is impresses nobody on CDDA, makes very little difference on most hirez files - but for a few it could be a percent. And, then the "-e" improvement was a WTF. But hi-rez performance is absolutely not consistent ... well it is much better than the official release.

I'm too lazy to be putting together the encoding times table, but here's a few compression comparisons:

16/44.1 (https://i.imgur.com/ixrbCkI.jpeg), 16/44.1 (https://i.imgur.com/JwIAnnA.jpeg), 16/44.1 (https://i.imgur.com/UEHZ7I3.jpeg), 16/44.1 (https://i.imgur.com/BDfGP73.jpeg), 16/44.1 (https://i.imgur.com/OzH7fCJ.jpeg), 16/44.1 (https://i.imgur.com/7UZyTVJ.jpeg), 24/44.1 (https://i.imgur.com/bP4Ha7T.jpeg), 16/48 (https://i.imgur.com/oR40XPd.jpeg), 24/48 (https://i.imgur.com/0ISEpYK.jpeg), 24/96 (https://i.imgur.com/D3vBuu3.jpeg), 24/192 (https://i.imgur.com/hb4Vye3.jpeg), 24/192 (https://i.imgur.com/5HW8MEx.jpeg)

Based on what we have there, flac-irls-2021-09-21 -8 -p outperformed every other parameter set in eleven cases out of twelve, -8 -e winning in one case. What surprised me was that flac-irls-2021-09-21 -8 -p (and in some cases just -8) also outperformed flaccl v2.1.9 -11 in eight cases out of twelve, flaccl curiously winning in both 24/192 cases and in two 16/44.1 cases out of six. All of outperforming, mind, was achieved by a very slim margin.

In terms of speed, I did a quick test with Amon Tobin's How Do You Live LP, 16/44.1.
For flac-irls-2021-09-21 -8 -p, total encoding time was 0:53.509, 50.33x realtime, while for flaccl -11 it was 0:23.978, 112.33x realtime.
Decoding time for flac-irls-2021-09-21 -8 -p tracks was 0:03.284, 818.845x realtime, and for flaccl -11 it was 0:03.496, 769.270x realtime.

P.S. Whoops, I missed -8 -A "gauss(3e-3);partial_tukey(2/0/999e-3);punchout_tukey(3/0/8e-2)". Oh well.

Title: Re: New FLAC compression improvement
Post by: Adil on 2021-11-05 12:47:53

@rrx
What syntax do you use to view percentage rates in the "Compression" tab of Playlist View?

Title: Re: New FLAC compression improvement
Post by: rrx on 2021-11-05 12:53:03

Quote from: Adil on 2021-11-05 12:47:53

@rrx
What syntax do you use to view percentage rates in the "Compression" tab of Playlist View?

Good question.

Code: [Select]

$if($or($strcmp($ext(%path%),cue),$stricmp($ext(%path%),ifo),$stricmp($info(cue_embedded),yes)),$puts(percent,$div($div($mul(100000000,%length_samples%,%bitrate%),%samplerate%),$mul($info(channels),%length_samples%,$if($strcmp($info(encoding),lossless),$info(Bitspersample),16))))$left($get(percent),$sub($len($get(percent)),3))','$right($get(percent),3),$puts(percent,$div($mul(800000,%filesize%),$mul($info(channels),%length_samples%,$if($stricmp($info(encoding),lossless),$info(Bitspersample),16))))$left($get(percent),$sub($len($get(percent)),3))','$right($get(percent),3))'%')

Title: Re: New FLAC compression improvement
Post by: Adil on 2021-11-05 13:02:02

Thank you very much!

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-05 14:26:40

Besides the testing here has anyone an idea if there ever will be another official flac release? Since the cancelled 1.34 version it became very silent.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2021-11-05 15:55:01

Would be bloody annoying if not, when ktf has found the improvements that gave the independent implementations the upper hand over the official for fifteen yeart, and fixed bugs that made files blow up (https://hydrogenaud.io/index.php?topic=121349.msg1001227#msg1001227).

Title: Re: New FLAC compression improvement
Post by: ktf on 2021-11-09 19:35:28

Okay, I have something fresh to chew on for those interested: a Windows 64-bit binary is attached. Code is here but needs cleaning up before this could be merged (https://github.com/ktmf01/flac/tree/autoc-double-irls-subblock).

I've rewritten the partial_tukey(n) and punchout_tukey(n) code into something new: subblock(n). Partial_tukey(n) and punchout_tukey(n) still exist, this is a (faster) reimplementation recycling as much calculations as is possible. It is rather easy to use:

using flac -A subblock(2) is roughly similar to using flac -A tukey(7e-2);partial_tukey(2/0/14e-2)
using flac -A subblock(3) is roughly similar to using flac -A tukey(5e-2);partial_tukey(2/0/1e-1);partial_tukey(3/0/15e-2);punchout_tukey(3/0/15e-2)
using flac -A subblock(4) is roughly similar to using flac -A tukey(4e-2);partial_tukey(2/0/8e-2);partial_tukey(3/0/12e-2);punchout_tukey(3/0/12e-2);partial_tukey(4/0/16e-2);punchout_tukey(4/0/16e-2)

The main benefit is that it is a bit faster and also much cleaner to read. The reason for the weird tukey parameters (4e-2 etc.) is that the sloped parts for the main tukey and the partial tukeys have to be the same, as the code works by subtracting the autocorrelation calculated for the partial tukey to calculate the punchout tukey.

Attached is also a PDF with three lines:

the darkblue one with triangles is the current git (without any of the patches mentioned in this thread)
the green one with squares is with all patches up until now
the lightblue one with 'diamonds' is the binary I'm attaching, in which settings -6, -7, -8 and -9 use the new subblock apodization

From right to left are encoder presets -4, -5, -6, -7 and -8, with the darkgreen extending further left with -9 and the lightblue extending further left with -8 -A subblock(6), -9 and finally -9 -A subblock(6);irlspost-p(3). To make it even more complex. The darkgreen -9 was defined as -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3);irlspost-p(3)" with -e, the lightblue -9 is defined as "subblock(3);irlspost-p(4)" without -e. That last change was a suggestion from @Porcus

As you can see in the graph, this change makes presets -6, -7 and -8 a little faster. -9 is now 3 times as fast (mind that the scale isn't logarithmical) but compression is slightly worse. Perhaps I should increase to irlspost-p(5).

edit: I just realize that subblock is probably a confusing name, so perhaps I'll come up with another.

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-10 03:21:59

Still the same 29 CDs. Not sure -9 is convincing enough.

IRLS-subblock beta -8
7.590.501.741 Bytes

IRLS-subblock beta -8 -ep
7.584.176.739 Bytes

IRLS-subblock beta -9
7.588.567.011 Bytes

IRLS-subblock beta -9 -ep
7.584.089.533 Bytes

older results:
IRLS beta -9
7.583.395.627 Bytes

IRLS beta -9 -ep
7.582.193.957 Bytes

CUEtools flake -8
7.591.331.615 Bytes

Title: Re: New FLAC compression improvement
Post by: Wombat on 2021-11-10 04:33:20

Missed that one, sorry. Only slightly slower as -9

IRLS-subblock beta -8 -p
7.585.883.902 Bytes

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-25 13:54:51

So the build posted in https://hydrogenaud.io/index.php/topic,122179.msg1014061.html#msg1014061 has implemented the "subblock" as "subdivide_tukey(n)" and also the double precision fix. I ran it through the same high resolution corpus as in Reply #54 (https://hydrogenaud.io/index.php/topic,120158.msg1003288.html#msg1003288). I forgot all about padding this time, all numbers are with default padding, file sizes as reported by Windows.

TL;DR up to -8:
* New -7 is damn good and takes about the same time as old -8. Consistent with results on the test builds posted here.
* -6 is still like -7 -l 8, right? The thing is, -6 was completely useless on this material. -7 would be within seconds (actually, re-runs indicate that the seven seconds quoted below could be a high estimate of the time difference), and the size gain over -6 would dwarf the -5 to -6
* Nothing says -l 12 is the sweet spot. It could be higher than 12.
(It is known that higher -l is not unambiguously better, too high -l might lead to bad estimates and bad choices made; Indeed, results are ambiguous going from -7 -l 14 to -7 -l 15. But, all files improved going from -8 -l 14 to -8 -l 15.
Obvious first step is to test whether -l 13 or -l 14 should be considered for presets -7 or -8.)

TL;DR for those who want more than -8:
* -8e and -8p get completely murdered here. In particular, higher subdivide_tukey achieves smaller files in shorter time. I've had some surprises on the relationship between -8e and -8p, but on this corpus, either can be improved upon at half the time.
* Somewhat higher -l appears to improve even more and cheaper. While all files are bigger with -8 -l 14 than with -8 -l 15 and the improvement is cheap per MB, I would be careful about drawing conclusions on what -l to recommend. After all, small corpus and only high resolution.,
* size gains (= s(h)avings!) are often concentrated on only a few signals.

Results. Initially I was curious to see about what -8e -A "subdivide_tukey(n)" would outcompress -8 -A "subdivide_tukey(n+1)". Turns out ... well, numbers are telling. Uncompressed wave is 14 620 215 098 file size (seven hours three minutes). FLAC sizes are file sizes, no tags but default padding used.
1.3.4 -8: 464 seconds for 57.85 percent of wav size.
New -3: 244 seconds for 57.35. Not all albums improve over old -8. Kayo Dot is up nearly a percent, from 2 264 254 762 to 2 285 220 291
New -5: 311 seconds for 56.34.
New -6: 434 seconds for 56.27. Not attractive! 123 seconds more than -5 for 10.3 MB gains - spend a few seconds more and gain 47.
New -7: 441 seconds for 55.95. Time pretty close to old -8. All improved over old -8 (Kayo Dot only .7 percent though.)
New -7 -l 13: 463 seconds for 55.93. Just because how -7 improves over -6, I wondered whether it makes sense to stretch the -l even further. It seems to do so: a third of the size gain in an eleventh of the time penalty.
New -7 -l 14: 463 seconds here too. 55.92, and The Tea Party EP accounts for > 1.5 MB of the gain over -l 13. Indicates that FLAC rarely chooses this high order? Anyway, -l 14 takes out half the size gain between -7 and -8.
New -7 -l 15: 477 seconds for 55.91. Ambiguous result: remove TTP and sizes would be bigger than -7 -l 14. TTP still .7 GB bigger than -8.

-8 and above follow, all figures are 55.8xy hence a third decimal for these small differences. "-8 -st4" is short for -8 -A "subdivide_tukey(4)" etc.
plain -8: 673 seconds for 55.895. The TTP EP accounts for more than half of the size gain over -7, (4 422 584), despite being only 4.2 percent of total duration.
-8 -l 14: 734 seconds for 55.860. 70 percent of the size gain over -8 is TTP.
-8 -l 15: 749 seconds for 55.848. All files improve over -8 -l 14 (contrary to using -7), but again most (1MB) of the size gain is TTP.
-8e: 38 minutes and 55.879. Larger than -8 -A "subdivide_tukey(4)" (one file is a kilobyte smaller though), and slower than -8 -A "subdivide_tukey(7)"
-8p: 48 minutes and 55.873. All the -8p files are larger than with -8 -A "subdivide_tukey(5)". (Ambiguous against (4).)
-8 st4: 15 minutes, 55.870. Again, TTP accounts for most of the size gain over -8: 1 983 892 of 3 660 748. (Which in turn is ~1/40th of a percentage point. All this over plain -8, no -e nor -p)
-8 st5: 20 minutes, 55.855. Takes this much to beat -8 -l 14. TTP accounts for nearly half the 2 MB size gain over st=4 and is now 19 kilobytes smaller than -8 -l 14, Cult of Luna (nine percent of duration) thirty percent.
-8 st6: 26 minutes, 55.845. Takes this much to beat -8 -l 15 (on average - the TTP is bigger than the -8 -l 15 version). TTP nearly half, TTP&CoL three quarters of size gain over st=5.
-8 st7: 36 minutes, 55.838. TTP still bigger than -8 l 15. TTP&CoL three quarters of size gain over st=6.
-8 st8: 41 minutes, 55.833. TTP&CoL three quarters of size gain.
-8 st9: 51 minutes, 55.829. TTP&CoL nearly three quarters of size gain.
-8 st10:59 minutes, 55.825. TTP&CoL are seventy percent of the half a megabyte gained.
-8 st11:90 minutes, 55.823. Ditto. That was a jump in time.
-8e -A "subdivide_tukey(5)": 93 minutes, 55.842. Between -8 st6 and 7 in size.

Why go to these extremes? More to (t)establish that -e is not worth it. An initial test on one file indicated that the "-e" would overtake "the next n in subdivide_tukey(n)" around 5. Then I ran an overnight test, and lo and behold: -8e -A "subdivide_tukey(5)" puts itself between 6 and 7 (Cult of Luna makes for half the gain over subdivide_tukey(6)) - and is so slow that well, here you got -8 -A "subdivide_tukey(11)" just to compare. Not saying it is at all useful.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-25 14:58:55

Oh, one more thing tested: Does an additional flattop help the -8? Spoiler: not worth it.

Reason to ask: The subdivide_tukey sets out to test many windowing functions by designing them to recycle calculations. It is based on a successful choice, but the recent modification is designed for speed. Interesting question is then if forcing in an additional function makes for big improvements - possibly costly yes. If it does not: fine!
Choosing flattop as it is quite different from tukey and, back in the day when both CPUs and FLAC were slower, it was often included when people tested "Ax2".

Result: Does not take much extra time, doesn't make much improvement. Tried both orders, remember -8 took 673 seconds:
-8 -A "subdivide_tukey(3);flattop". 698 seconds, improves 194 kilobytes or 0.0013 percentage points.
-8 -A "flattop;subdivide_tukey(3)". slightly slower, a kilobyte bigger.
(Didn't bother to check individual files.)

Conclusion: subdivide_tukey is fine! At least on this high resolution corpus.

Oh, and:

Quote from: Porcus on 2022-08-25 13:54:51

(It is known that higher -l is not unambiguously better.

... in size. (Speed totally disregarded.)

Title: Re: New FLAC compression improvement
Post by: Wombat on 2022-08-26 03:59:03

On the 29 CD corpus this git build pretty much behaves the same as the IRLS-subblock beta above.
-8 -p (7.585.569.422 Bytes) is absolutely usable with that while -ep for my taste is much to slow to justify.

I tried that -8 -A "subdivide_tukey(9)" and with these CD Resolution files it seems not to scale as good.
7.587.283.854 Bytes

Title: Re: New FLAC compression improvement
Post by: ktf on 2022-08-26 19:35:48

I'll have a go too :D

Here's a comparison of FLAC 1.3.4 and current git (freshly compiled with the same toolchain) for 16-bit material. Compression presets are -0 at the top left through -8 at the bottom right.

Here's the same thing for 24-bit material

Here's a comparison on the same material, but starting with -4 at the top left and with a few additions

The same thing for 24-bit material

edit: I just forgot to say, these additions seem to have made -e useless, I have plans to make -p 'useless' as well. -e and -p are brute-force approaches, and I see a possibility to have something outsmart -p fast enough to be universally applicable. I've worked with @SebastianG on this. As soon as the next FLAC release is out and the code is no longer in need of fixes, maybe I can work on that. In the meantime, I've posted a math question on StackExchange (https://math.stackexchange.com/questions/4488974/efficient-way-of-solving-a-matrix-equation-with-integer-solution) but sadly with little reply. Maybe someone here can help out.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-26 22:31:51

24 bits, but still 44.1 kHz or 48 kHz? Strange things seem to happen at higher sampling rates.
(Which brings me to a completely different question: why is the subset requirement less restrictive for higher sampling rates? Getting streamability would be a harder task when you have to process more data per second, so why allow for an additional computational burden in the algorithm precisely where it becomes harder in the data as well? Is it so that, twenty years ago when there hardly was any high-res, one accepted that a "FLAC" player could just specify that it didn't accept higher resolutions?)

As for the StackExchange question ... integer programming is not my thing, and it isn't easy [enter joke about being NP-hard ...].
Also here the objective is - well at least nearly - to find a predictor that makes the encoded residual short. And the integer solution to the matrix eq is strictly speaking a solution to a different problem?
(Typically, how much of the time is spent on solving for the predictor and how much is spent on packing the residual?)

Title: Re: New FLAC compression improvement
Post by: ktf on 2022-08-27 09:07:49

Quote from: Porcus on 2022-08-26 22:31:51

24 bits, but still 44.1 kHz or 48 kHz? Strange things seem to happen at higher sampling rates.

I'll rerun with upsampled material.

Quote

(Which brings me to a completely different question: why is the subset requirement less restrictive for higher sampling rates? Getting streamability would be a harder task when you have to process more data per second, so why allow for an additional computational burden in the algorithm precisely where it becomes harder in the data as well? Is it so that, twenty years ago when there hardly was any high-res, one accepted that a "FLAC" player could just specify that it didn't accept higher resolutions?)

I don't know what the rationale was. Looking at the testbench results (https://wiki.hydrogenaud.io/index.php?title=FLAC_decoder_testbench) it is clear many hardware players still simply ignore high-res material.

I can only conjecture the idea here was that the blocksize for CDDA translates to a higher blocksize if you want the same number of blocks per second. If it is still music sampled at higher rates, this makes sense: if a certain blocksize and predictor order makes sense for music in general, than double the sample rate would imply double the blocksize and double the predictor order. This would imply music waveforms remain stable (i.e. well-predictable) for about 100ms.

This is actually how ffmpeg encodes FLAC: it doesn't set a blocksize, but a timebase. For 44.1kHz material this maxes out the blocksize at 4906, at 96kHz it picks 8192 and at 192kHz is uses 16384. It sticks to standard blocksizes.

Quote

As for the StackExchange question ... integer programming is not my thing, and it isn't easy [enter joke about being NP-hard ...].
Also here the objective is - well at least nearly - to find a predictor that makes the encoded residual short. And the integer solution to the matrix eq is strictly speaking a solution to a different problem?

The objective is to find a predictor that produces a residual that can be stored with the least bits, yes. Translating this into a mathematical description is rather hard: it isn't a least-squares problem, but it isn't a least-absolute-deviation problem either.

Quote

(Typically, how much of the time is spent on solving for the predictor and how much is spent on packing the residual?)

For the not-so-brute-force presets (like 3, 4 and 5) about one third of the time is spent on finding a predictor, one third on calculating the residual and one third on other things like reading, writing, logic etc. preset 5 only brute-forces the stereo decorrelation part, nothing else.

Preset 8 is closer to half of the time spent on finding a predictor, half of the time calculating a residual and a negligible amount of time on other things. For -8ep pretty much all of the time is spent calculating residuals for various ways of expressing the same predictor.

Now for the reason I think the integer solution is a possible improvement: look at the 24-bit graphs I just posted. The size of a single subframe at preset 8 is 24 * 4096 * 0.715 = 70000 bits. Assuming the predictor order is 10 on average , the predictor takes up 10*15 = 150 bits, which is 0.2%. With -p, the precision can at most be lowered from 15 to 5, meaning the predictor cannot grow smaller than 50 bits, saving 0.15%. However, in many cases, using -p gives you more than that theoretical 0.15%. Why is this?

I think the problem is that the predictor is calculated in double precision floating point numbers, and the rounding treats each predictor coefficient as independent. However, these coefficients aren't independent. As p simply successively tries coarser roundings, it might hit a point where the most important coefficients 'round the right way'. That would explain why the possible savings are higher than one would expect from only looking at the space saved by storing smaller predictor coefficients.

Now, one way to round coefficients smarter is by not treating each coefficient individually but by treating them together, as a vector. That's why I think this is a closest vector problem, but I'm not sure.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-27 10:15:20

Quote from: ktf on 2022-08-27 09:07:49

As p simply successively tries coarser roundings, it might hit a point where the most important coefficients 'round the right way'. That would explain why the possible savings are higher than one would expect from only looking at the space saved by storing smaller predictor coefficients.

Now, one way to round coefficients smarter is by not treating each coefficient individually but by treating them together, as a vector. That's why I think this is a closest vector problem, but I'm not sure.

And here is where I was - for no good reason but "came to think of" - suspecting:
* The closest vector would not solve the ultimate problem
* Rounding off is a way of "just moving coefficients in some direction, hoping for an improvement, and comparing"
So, conjecture: part of the success of -p is not saving space for the coefficients, but "randomly" finding a better fit (I think that is what you are suggesting too?).
But then: unless the closest vector is truly a good choice for the ultimate problem, it is not at all given that you should go to great efforts finding it. It is not at all given that you should go to great efforts minimising an L1 norm. For example, it could be that searching in a direction that happens to improve would pay off.
If it tries each coefficient independent, does it then (for order 10) try 10 individual round-offs first? And if one is for the worse, does it round the other direction?

(... calling for a post-processor that takes an encoded FLAC file and works from there)

Title: Re: New FLAC compression improvement
Post by: ktf on 2022-08-27 19:59:28

Here is the upsampled material. This is the exact same material as for the 16-bit tests, but upsampled. So, there is no content above 20kHz.

Here's presets -0 through -8

Here's -4, -5, -6, -7, -8, -8 -A subdivide_tukey(4), -8 -A subdivide_tukey(6), -8e and -8ep.

The difference here is indeed stunning.

Quote from: Porcus on 2022-08-27 10:15:20

And here is where I was - for no good reason but "came to think of" - suspecting:
* The closest vector would not solve the ultimate problem

Yes, you are right. When the ultimate problem is 'get a quantized predictor that produces the smallest possible subframe' then yes, this is not that solution. However, it might be used as a part in that ultimate solution.

Quote

* Rounding off is a way of "just moving coefficients in some direction, hoping for an improvement, and comparing"
So, conjecture: part of the success of -p is not saving space for the coefficients, but "randomly" finding a better fit (I think that is what you are suggesting too?).

To remain lossless, the predictor in the FLAC format uses integer coefficients only. So, if one doesn't use integer programming to get to an integer solution all the way, there has to be some rounding. And yes, what you conjecture is what I suggested.

Quote

But then: unless the closest vector is truly a good choice for the ultimate problem, it is not at all given that you should go to great efforts finding it.

Yes, I'm not sure that this will help at all, but it seems good enough of an idea to pursue. It just seems that there should be a better way to get a quantized predictor than some simple rounding.

Quote

It is not at all given that you should go to great efforts minimising an L1 norm. For example, it could be that searching in a direction that happens to improve would pay off. If it tries each coefficient independent, does it then (for order 10) try 10 individual round-offs first? And if one is for the worse, does it round the other direction?

I'd like to get off the brute-force approach for a while. If you want to check individual round-offs, that would take 24 residual calculations at least for a 12th order predictor, which would make it about twice as slow as just using -p (which tries 10 different roundings). Furthermore, that misses relations between coefficients. It might be that rounding the 1st coefficient up means that the second coefficient should be rounded down to compensate, if they are linked. Finding such a relation would require a lot more brute-force.

However, as it seems to me, these relations are embedded in the matrix that is solved to find the (unquantized) predictor. If it is not solved directly and the result then rounded, but used directly to find a integer vector, than these relations can be used to find a predictor that comes closer to the original values.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-28 10:38:56

Yeah, so ... getting a gradient by say, starting with solving for a z, rounded off (floor/closest/whatever) to a vector x of integers, then
calculate total size S(x); then proceeding to
calculate S(x+e_i)-S(x) for each standard unit vector e_i,
and one will have done 1 + 12 = 13 full compressions just to get a direction of steepest descent.

Instead, we conjecture that a good choice for x has Ax-b small - and then starting with the naive round-off of the "exact argmin" z,, we want to get closer, cheaper than by full compression?

Then for the "outsmart" part: the z vector minimizes a norm that, well we know it isn't likely to be the best? Which spawns the question: if another norm (say L1) is conjectured to be a better choice (but computationally nowhere as tractable, cf the IRLS buils), and you want to round off in a direction that improves - why not use the roundoffs to pursue an improvement in that one?

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-29 13:31:47

Back to the graphs. I didn't realize that FLAC 1.3.4 also has a "useless -6". But there are other settings that are ... well, "easy to improve upon at costs the user would already have accepted". Including -8, but the "-8" name is kinda reserved - it is easier to play with -6 and -7.

If we disregard presets that "serve special purposes" (that would be 0 to 3), then a "reasonable choice of presets" would make for a convex graph: the return (in terms of bytes saved) should diminish in CPU effort, because the encoder presets should pick the low-hanging fruit first.
In particular: if we extrapolate the line from -4 to -5 forth and it later (that is, further to the right in these graphs) it hits or overshoots a dot (say -7), then I would claim "if you are willing to wait the extra time taken for the improvement of -5 over -4, then you would also accept the time taken for the improvement to -7". This does assume that the user's your time/size trade-off is constant, which I think is reasonable for users who are indeed willing to go beyond default (and especially for those willing to wait for -8)

Which is what I mean by "useless -6":
* If you are willing to wait for -6 (over -5), you should be willing to wait for -7.
* And if you are willing to wait for -5 (from -4, the previous dot in the convex minorant!), then "at least nearly" you would be willing to wait for -7 for CDDA (at least you shouldn't complain much!), and you should definitely in your graph of upsampled material.
Furthermore:
* For CDDA: If you are willing to go from -7 to -8, you should also be willing to run -8p.
* For the upsamples: If you are willing to go from -7 to -8, then you should at least nearly accept -A subdivide_tukey(6)

Now here is a question:
Can the flac encoder easily accommodate different presets depending on signal? Like what ffmpeg does:

Quote from: ktf on 2022-08-27 09:07:49

This is actually how ffmpeg encodes FLAC: it doesn't set a blocksize, but a timebase. For 44.1kHz material this maxes out the blocksize at 4906, at 96kHz it picks 8192 and at 192kHz is uses 16384. It sticks to standard blocksizes.

Reference FLAC seems well tuned to select 4096 (4906 was a typo) for CDDA; the "also-standard" 4608 seems not to be of much use (maybe some curious people can test if it is an idea for 48k ... hm, another idea could be -b <list of blocksizes>, but right now you are not in the mood for brute force I see ;-) .)

Say, -8 could be unchanged for signals up to 64k (threshold could be determined by a bit of testing), and from then on and up, selection -8 would invoke also -b 8192 -l 14 (and maybe a higher subdivide_tukey). But that comes at a risk of people complaining over slowdowns. After all, git -8 is so much slower than 1.3.4 -8 on high resolution (although it IMHO pays off!) that maybe, maybe, a split-by-sample-rate option should rather speed up high resolution.
So, alternatives might be e.g. a semi-documented --Best with capital B, Don't call it --super-secret-totally-impractical-compression-level, because it is likely to be "practical" to everyone who would find -8 worth it. The "-9" is a bit dangerous to spend ... or well, "-9" could stand for "higher compression than -8, but beware it will change in git whenever we feel like testing a possible improvement" (and thus you are free to alter -9 to an IRLS in a test build too).

Title: Re: New FLAC compression improvement
Post by: MrRom92 on 2022-08-29 17:56:59

As a somewhat basic end user who doesn’t understand the vast majority of the technical discussion here over the past few days - I do like the idea of some sort of preset that always enables the most extreme form of compression, with no other considerations to processing speed or anything else. With the caveat that it’s experimental and may change across future builds. If I’m not misunderstanding that this is what’s being proposed?

If it has a cute name like —super-secret-totally-impractical-compression-level, then even better!

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-29 19:00:43

Quote from: MrRom92 on 2022-08-29 17:56:59

If I’m not misunderstanding that this is what’s being proposed?

Actually it isn't precisely what is being done. Sometimes we/I cannot help ourselves/myself putting on some stupidly slow compression job before going away for the week, and by posting it here I have probably fueled that misunderstanding big time.

But if you want the most extreme form of compression, then FLAC is not the codec. FLAC was created for low decoding footprint - i.e. to be played back on ridiculously low power devices - it takes less computing power than e.g. mp3 to decode! If you look at ktf's lossless tests at http://www.audiograaf.nl/downloads.html , you will see that there are codecs that out-compress FLAC but require a lot more computational effort.

FLAC is extremely "asymmetric": There is hardly any limit to how slow you can get FLAC encoding by going beyond the presets (the -0 through -8) and directly accessing the filter bank and ordering brute-force calculation of as much as possible - yet it will still decode at ultra-light footprint by fifteen year old decoder in a device that was considered low-power even at that time. This compression improvement thread is more about finding improvements within practical speeds - or at least, "sort of practical" speeds, again sometimes I cannot . Surely I have exceeded the "practical" every once in a while, but still.

There are even compressors that aren't even remotely useful for playback and that you won't find in ktf's comparisons up there. Look at https://hydrogenaud.io/index.php/topic,122040.msg1010086.html#msg1010086 when I put an encoder at work for like twelve hours plus on this single track (https://merzbow.bandcamp.com/track/i-lead-you-towards-glorious-times) to achieve something that would take twice as much time decoding than playing (well on an old CPU!) - but at a size much smaller than any playback-useful codec.
On the other hand, sometimes FLAC makes wonders (https://hydrogenaud.io/index.php/topic,122179.msg1014245.html#msg1014245) that other codecs can not.

"--super-secret-totally-impractical-compression-level" was indeed a name used in some very old flac versions. I don't know when it disappeared.

Title: Re: New FLAC compression improvement
Post by: Aleron Ives on 2022-08-29 21:38:25

I tried -8 -p in ye olde FLAC 1.2.1 (-e was already enabled with -8 back then), and not only did it make compression much slower, but the compression ratio actually got worse! :))

I admire your tenacity with trying to find better compression options when the potential gains are so small.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-29 22:15:00

Quote from: Aleron Ives on 2022-08-29 21:38:25

when the potential gains are so small.

... and worth so little in terms of hard drive cost.

But humans still play chess. Sometimes on their smartphones, which have more computing power than Deep Blue which 25 years ago defeated Garry Kasparov.

And computers still play chess. Heck, in the major computer chess championship, they even play on when one has already won the match. (https://chess24.com/en/watch/live-tournaments/tcec-season-20-superfinal-2021/1/1/100)

Title: Re: New FLAC compression improvement
Post by: MrRom92 on 2022-08-29 23:49:02

Hahah, yeah I don’t know if there will ever be any practical use to this but I guess my idea of “fun” is pushing the impractical to its limits. Some people like souping up a Honda Civic to drive like a racecar, I like seeing what can be done within the confines of the FLAC format, without breaking the standard or changing formats entirely. Different strokes for different folks…

As of now I use -8 -e -p as a standard across everything I do. On my old system it was pretty painfully slow and just compressing a 16/44.1 album could run slower than real time. A 24/192 release could take days on end. -8 was already fast enough to not even think twice about no matter the source file.

On my new system -8 -e -p runs in about the same amount of time it took my old system to handle a regular -8. Pretty amazing. Now I want to grind things down to a halt again!

Title: Re: New FLAC compression improvement
Post by: ktf on 2022-08-30 06:17:14

Quote from: MrRom92 on 2022-08-29 23:49:02

On my new system -8 -e -p runs in about the same amount of time it took my old system to handle a regular -8. Pretty amazing. Now I want to grind things down to a halt again!

Have you tried -8ep --lax -l 32 yet?

Title: Re: New FLAC compression improvement
Post by: MrRom92 on 2022-08-30 16:48:34

Quote from: ktf on 2022-08-30 06:17:14

Quote from: MrRom92 on 2022-08-29 23:49:02
On my new system -8 -e -p runs in about the same amount of time it took my old system to handle a regular -8. Pretty amazing. Now I want to grind things down to a halt again!

Have you tried -8ep --lax -l 32 yet?

I have not, but I do appreciate the suggestion!
I should clarify, I don’t want to generate non-subset files or break the standard in any way that might cause unexpected behaviors/incompatibilities… I’m more interested in finding the best possible compression, just within the confines of an otherwise “normal” FLAC file. Would -8ep be as far as things can be pushed given those considerations?

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-08-30 17:36:34

Quote from: MrRom92 on 2022-08-30 16:48:34

I don’t want to generate non-subset files

If sample rate is > 48 kHz, you can use -l 32 and still stay within subset. You can get the "Lindberg" I have used for testing in this thread, for free from http://www.2l.no/hires/ .

-8pe -r 8 -l 32
would be quite slow. But then you have not accessed the bank of flac's apodization functions other than through -8. You will find some examples here in this thread, but with the git build you can as well throw in another "-A subdivide_tukey(<unreasonable number>)" just for the excitement of watching paint dry.

Title: Re: New FLAC compression improvement
Post by: MrRom92 on 2022-08-30 20:11:35

Quote from: Porcus on 2022-08-30 17:36:34

Quote from: MrRom92 on 2022-08-30 16:48:34
I don’t want to generate non-subset files
If sample rate is > 48 kHz, you can use -l 32 and still stay within subset. You can get the "Lindberg" I have used for testing in this thread, for free from http://www.2l.no/hires/ .

-8pe -r 8 -l 32
would be quite slow. But then you have not accessed the bank of flac's apodization functions other than through -8. You will find some examples here in this thread, but with the git build you can as well throw in another "-A subdivide_tukey(<unreasonable number>)" just for the excitement of watching paint dry.

Thank you, I will play around with that for a bit and see how things compare! :)

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-09-18 21:23:10

Quote from: MrRom92 on 2022-08-29 23:49:02

Now I want to grind things down to a halt again!

Can I then tempt you with the following output line that ffmpeg presented to me, after a day of hard work?

size= 72kB time=00:00:00.17 bitrate=3440.1kbits/s speed=4.08e-06x

If it goes on like this, it will get a full second encoded in less than three days. Is that "down to a halt" enough, even with data density as high as 96/24? ;)
(I used a long line with several slowing-down options, including the "-multi_dim_quant" option, which must be either a horrible piece of work or a successful trolling. Even a -compression_level 8 -multi_dim_quant 1 spends 41 minutes to encode a second. And compresses like flac -4 - to the extent you can conclude from a three-second corpus :-o )

Title: Re: New FLAC compression improvement
Post by: mycroft on 2022-09-19 07:32:21

Amusing, comparing apples with oranges. Nothing new here, move along.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-09-19 09:25:41

Doing a web search, there is a seven year old ticket (https://trac.ffmpeg.org/ticket/4773) on this fault in ffmpeg's FLAC encoder, so this is not comparing apples to anything of nutritional value.

Anyway it was an attempted proof of concept to users who say they want the absolute best FLAC compression ratio - no they don't, as there is no practical limit to how slow encoding can get. The attempt was futile as goes the "best" (try for yourself with the attachment!), and there seems to be nothing ffmpeg can do to outcompress 1.4.0 (https://hydrogenaud.io/index.php/topic,122949.msg1015740.html#msg1015740).

And I still wonder why it is necessary for you to spew toxic over HA without even offering solutions you claim exist (https://hydrogenaud.io/index.php/topic,122094.msg1008187.html#msg1008187). At that stage you just disappear, only to return with the same attitude. Oh, and do as I say (https://hydrogenaud.io/index.php/topic,122094.msg1007983.html#msg1007983) not as I do (https://hydrogenaud.io/index.php/topic,121478.msg1015597.html#msg1015597).

Title: Re: New FLAC compression improvement
Post by: mycroft on 2022-09-19 09:33:52

Just tried random wav file with your beloved 1.4.0 flac encoder.
And ffmpeg beats it at ease at -compression_level 12. Even tried -8e flag.

But be arrogant as always and live with it, can not expect anything more from such kind.

Title: Re: New FLAC compression improvement
Post by: mycroft on 2022-09-19 09:35:15

And that ticket is irrelevant, but teaching pig new skills are impossible.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-09-19 13:26:37

Thank you for the well-documented test using -compression_level 12 to compare CDDA (or something you deliberately left out was different).

People are poking fun at how a certain codec developer posted a test corpus of three given - in the early 2000s when when computing power was expensive. Fast-forward twenty years and a test corpus of thirty-eight given CDs and specifically sticking to subset, is irrelevant against a claim of "random wav file", unnamed and of undocumented resolution, and such that the combination of signal and setting cannot possibly be. If then you wanted a random file I attached one. Did you test it?

And, what is irrelevant about that ticket? It reports that a certain setting takes uselessly long time to encode, are you saying that speed is not relevant to an encoder?

Title: Re: New FLAC compression improvement
Post by: doccolinni on 2022-09-19 19:43:18

Quote from: mycroft on 2022-09-19 09:33:52

Just tried random wav file with your beloved 1.4.0 flac encoder.

Just hacked into your computer to check that you really did do that, and it turns out that you completely made it up. You didn't encode a single thing.

Also, you really don't need to download porn nowadays.

Title: Re: New FLAC compression improvement
Post by: itisljar on 2022-09-20 09:45:51

So, I've taken randomly "Bravo Hits 57" compilation and created wav image files with external CUE files. It's modern mastering pop music with some acoustic tracks. Encoders are:

e:\WORK>flac -v
flac 1.4.0

e:\WORK>ffmpeg
ffmpeg version N-107264-g23fde3c7df-gc471cc7474+3 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
Writing application : Lavf59.25.100
(ffmpeg was compiled on this machine couple of months ago)

Parameters for encoding were these:

flac -8 and flac -8e
ffmpeg -compression_level 12

Results are as follows:

Code: [Select]

Various Artists - Bravo Hits 57 CD1.wav	766,37 M
Various Artists - Bravo Hits 57 CD1 (ffmpeg-l12).flac	539,54 M
Various Artists - Bravo Hits 57 CD1 (flac -8).flac	538,24 M
Various Artists - Bravo Hits 57 CD1 (flac -8e).flac	538,12 M

Code: [Select]

Various Artists - Bravo Hits 57 CD2.wav	740,79 M
Various Artists - Bravo Hits 57 CD2 (ffmpeg-l12).flac	526,02 M
Various Artists - Bravo Hits 57 CD2 (flac -8).flac	525,00 M
Various Artists - Bravo Hits 57 CD2 (flac -8e).flac	524,91 M

So, to note, I didn't measure time, but flac -8 was subjectively fastest, I think ffmpeg was a bit faster than flac -8e. ffmpeg showed encoding speed as 40x, not bad.

To conclude, there may be cases where ffmpeg flac encoder will "win", but it all depends on type of music, mastering, noise levels. To conclude it has better compression all the time over official flac is stupid.

Title: Re: New FLAC compression improvement
Post by: smok3 on 2022-09-20 14:49:07

Tiny compression differences seem to be due padding, test on 43 minute 16bit 48kHz wav:

Code: [Select]

ls -lahS --block-size=K *.flac                                                                         
-rwxrwxrwx 1 b b 246494K Sep 20 15:14 flac14best.flac
-rwxrwxrwx 1 b b 246442K Sep 20 15:18 ffmpeg12.flac
-rwxrwxrwx 1 b b 246430K Sep 20 15:14 flac14bestNoPadding.flac
-rwxrwxrwx 1 b b 246321K Sep 20 15:14 flac14bestNoPadding8e.flac

Title: Re: New FLAC compression improvement
Post by: Porcus on 2022-09-20 16:08:51

ktf & co killed -e with release 1.4.0, at least for CDDA; in my test, -8p was faster than -8e (https://hydrogenaud.io/index.php/topic,122949.new.html#info_1016018), and the difference from -8 twice as big. That is 2 * [small number] though.

Of course -e could be useful for slowing down flac if you want to give it a handicap in a competition against a weaker opponent ;D

Title: Re: New FLAC compression improvement
Post by: Bogozo on 2022-09-20 16:20:57

On some files ffmpeg 12 can beat 1.4.0 -8. But on the same files 1.4.0 -8 -r 8 can beat ffmpeg 12, while still being significantly faster.

Title: Re: New FLAC compression improvement
Post by: Porcus on 2023-01-01 16:41:41

I once discovered that with ffmpeg's flac encoder, higher -lpc_passes doesn't necessarily translate to better compression (the phenomenon isn't much visible at "sane" number of iterations) - so during the holidays I went back to the flac-irls-2021-09-21.exe build and ran a few days of encoding tests - this time on the CDDA corpus in my signature.

tl;dr:
A "superior" choice of windowing functions gives a size advantage that the irlspost runs cannot catch up with - indicating thatboth the least-squares and least-absolute-value approaches are too vulnerable to the outliers that the windowing functions happen to discard.

What I did:
* picked a few settings as "starting point", and for each: added atop an irlspost(N) or irlspost-p(N) for N=1 to 9 (which is already beyond sanity I think) or to 14 or even to 19, and recorded the sizes per album
* did "the same" with ffmpeg, with compression levels 8, 8 with additional precision, and 12.

Some findings:
* Even more: Atop a "powerful" setting (example: -8p), increasing the passes count will improve file size until way past sanity. (Exceptions there are, but for two thirds of the albums, the 19th & final made for the smallest files.)
* Atop a "bad setting", on the other hand ... I chose -l5 -q5 -b4097 -A "irlspost(%l)" in order to fix coefficient precision and keep the "Rice partitioning equal" (i.e. no partitioning at all, that's equal). The number of passes that made for smallest files were, per album, 6 4 4 5 5 6 5 4 5 4 4 5 4 6 9 4 4 3 3 6 4 4 4 5 3 2 9 8 4 3 9 3 4 9 4 4 5 3 (where 9 was max run in this part of the test).
* ffmpeg's "powerful settings" aren't equally robust as reference flac's. Starting out with -lpc_passes atop -compression_level 12, a mighty thirteen passes do more often than not generate larger files than a mere two passes do. Not that it is much
* Both with ffmpeg and with reference flac's "not so powerful" settings, the following holds with very few (and small in size difference) exceptions: Size has a \_/ relationship with the irlspost count. Increasing the count would first improve monotonically to a minimum and then make monotonically worse.

I could post some size data, but ... nah, I don't think they are quantitatively much to gain insight from.

Learnings?
Above I indicated the following qualified guess:
The reason the irls passes "cease to improve" is that also this method needs a good choice of apodization function. (Here I have outright presumed that the IRLS iterations are on the windowed data - @ktf, is that correct?)
Assinging more windowing functions is a "kinda brute-force" way to arbitrarily remove or downweigh parts of the signal, and this suggests that there might be some theoretical benefit from (1) initial run, (2) re-windowing based on what are the actual outliers. Emphasis on theoretical because nothing says this is an efficient use of CPU time - the use of windowing functions might accomplish "nearly the same much faster" for all that I know.

(Hm, ktf: if you feel like compiling the irlspost routine into the 1.4.x, I could redo the test with current -8 windowing - if only to confirm the suspicion.)

HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: ktf on 2020-11-04 20:47:09