Topic: New FLAC compression improvement

Re: New FLAC compression improvement

Reply #25
Anyway, I'd like the opinion of the people reading this on the following

I think "don't switch" (= do nothing) is the worst option. With the reservation that you might want to wait until there is a new version with other improvements too.

Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version? (I understand from what you mentioned above that the platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines, something like --SSEonlyplatform?

Re: New FLAC compression improvement

Reply #26
Then opening some cans with possible dumb questions:
* is it possible to make the switch for a 64-bit-only version? (I understand from what you mentioned above that the platforms affected cannot run the 64-bit version anyway?)
* is it possible to make an option to turn off those routines, something like --SSEonlyplatform?
The first is possible (but I don't see the benefit of doing that?); the second would mean changing the libFLAC interface and sacrificing binary compatibility, so I'd rather avoid that.

Anyway, I've finished polishing the changes and I've sent a pull request on GitHub: https://github.com/xiph/flac/pull/245 Further technical discussion is probably best placed there.

Here's a mingw 64-bit Windows compile for anyone to test. This does not have the IRLS enhancement and thus no compression level 9 like the previous binary, but it compresses better on presets -3 through -8, mostly on classical music. I've changed the vendor string (I forgot that on the previous binary) to "libFLAC Hydrogenaud.io autoc-double testversion 20210630", so if you try this on any part of your music collection, you can check which files have been encoded with this binary.
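For instance, the vendor string of an encoded file can be read back with metaflac (assuming a metaflac binary is at hand):

metaflac --show-vendor-tag yourfile.flac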

Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #27
The first is possible (but I don't see the benefit of doing that?)
If I understood correctly, all CPUs that would get the slowdown are 32-bit CPUs. Making the change only in the 64-bit executable would then benefit several processors and disadvantage no one. Or did I get that wrong?

Then afterwards one can decide what to do for the 32-bit executable.

Re: New FLAC compression improvement

Reply #28
Feel free to test this and report back your findings. I think the results will be quite a bit closer to what CUEtools.Flake is producing
This SSE improvement indeed is a very nice finding!
With the compile you offer, the 29 CDs end up at a size of:
7.592.544.746 bytes
The small difference against Flake is mainly due to padding this time, I guess.
Some unscientific numbers for 29 random CD format albums:

CUEtools flake -8
7.591.331.615 Bytes

flac-native -8
7.623.712.629 Bytes
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #29
The difference was about a megabyte per CD. Now it is down to a megabyte in total. Dunno if you use tracks or images.

Re: New FLAC compression improvement

Reply #30
I tried some 24/96 albums and the new binary improved compression very well - even better than CUEtools flake.
Really looking forward to a version where you add the IRLS weighting back in, behind a dedicated switch for example.
Maybe also interesting: the -8 -ep size for these 29 albums:
7.584.945.065 Bytes
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #31
Really looking forward to a version where you add the IRLS weighting back in, behind a dedicated switch for example.
Sadly, the gains do not stack well. Most of the gains from the IRLS patch are the same gains that this autocorrelation precision doubling also gets. It still works, but not as much as I posted earlier.

On the upside, I've now programmed with intrinsics, and it went pretty well. It was a nice challenge, so maybe I can speed up the IRLS code quite a bit to compensate for the lower gains. Also, the autocorrelation precision is relevant in levels -3, -4, -5, -6 and -7 too, and -5 is relevant to the numbers in the Hydrogenaud.io wiki Lossless comparison  :))
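For the curious, a minimal sketch of the flavour of such code (an illustration only, not the actual patch; the function name is made up):

#include <emmintrin.h>  /* SSE2 intrinsics */

/* One autocorrelation lag in double precision:
   returns the sum over i of data[i] * data[i + lag]. */
static double autoc_lag_sse2(const double *data, int len, int lag)
{
    __m128d acc = _mm_setzero_pd();
    int i = 0;
    for (; i + 2 <= len - lag; i += 2) {
        __m128d a = _mm_loadu_pd(data + i);        /* two samples        */
        __m128d b = _mm_loadu_pd(data + i + lag);  /* two lagged samples */
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));   /* multiply-accumulate */
    }
    double pair[2];
    _mm_storeu_pd(pair, acc);
    double sum = pair[0] + pair[1];
    for (; i < len - lag; i++)                     /* scalar tail */
        sum += data[i] * data[i + lag];
    return sum;
}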
Music: sounds arranged such that they construct feelings.


Re: New FLAC compression improvement

Reply #33
@ktf, you are probably more busy looking into https://hydrogenaud.io/index.php?topic=121349.msg1001309 , but still:
I've tested your double-precision build a bit, back and forth across compression levels, on a varied (but chartbuster-free) corpus of 96kHz/24-bit files - 96/24 because I conjecture FLAC was primarily developed for CDDA, so trying something else might turn up surprises. All stereo, though. Comparison done against official flac.exe 1.3.1.

Your -5 does pretty well, I think. Results at the end, but first, some minor curious things going on:

* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output. The same holds for 1.3.1: "-A tukey(0,5)" makes no difference.
(My locale uses a comma as decimal separator. By using the wrong one, I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3). Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well known? Also, the -8 order scores better than its permutations.

* It seems that -b 4608 improves slightly - however, I couldn't reproduce that on CDDA signals. Also, 16384 did not improve over 8192 on the 96/24 corpus.


So, the improvement figures. It improves quite a bit on -5. Note: sizes are for the entire corpus, but times are not and are just an indication; for the timings I only took a single album (NIN: The Slip) over to an SSD, hence the low numbers, and each setting was run only once.

* PCM is ~13.6 GB and compresses to 7.95 GB using 1.3.1 -5; from there on, the MB savings are as follows:
80 MB going from 1.3.1 -5 (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going from 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going from your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent, or 1.9 percentage points, at around zero speed penalty. Oh, and the new -7 clocked in at 42 seconds; compare that to the old -8.

* I tested a bit of -e and -p on both new and old. A minor surprise:
-5 at 1.3.1: -p (41 seconds) saves 2 MB; -e (instead of -p, of course; 36 seconds) saves 35.
-5 at yours: both quite pointless, saving 0.4 MB (taking 45 seconds) and 0.6 MB (39 seconds) respectively.
Your build from -8 -b 8192 -r 8 (64 seconds, only one more than -8): adding "-p" (220 seconds) saves 2 MB; adding "-e" (instead; takes 214 seconds) saves 11.
So your "-5" picks up nearly all the advantages of -p or -e ... ?!

* For the hell of it, I started a comparison of -8 -b 8192 -l 32 -p -e, mistakenly thinking it would finish overnight ... I aborted it after a quarter of the run, having done only the "Lindberg" part below, and the autoc-double seems to improve a slight bit less here than in -8 mode. To give you an idea of what improves and what doesn't, fb2k-reported bitrates for that part:
2377 for 1.3.1 -5
2361 for 1.3.1 -8
2349 for 1.3.1 -8 -b 8192 -l 32 -p -e
2329 for your -3
2314 for your -5
2310 for your -8
2305 for your -8 -b 8192 (roundoff takes out the -p or -e improvements)
2302 for your -8 -b 8192 -l 32
2301 for your -8 -b 8192 -l 32 -p -e


I can give more details and more results, but won't bother if all this is well in line with expectations.
FLAC sizes: I removed all tags from the .wav files, used --no-seektable --no-padding, and compared file sizes, since differences were often within fb2k's rounded-off bitrates.
Corpus (PCM sizes, -5 bitrates)
3.42 GB -> 2314 various classical/jazz tracks downloaded from the Lindberg 2L label's free "test bench", http://www.2l.no/hires/
2.64 GB -> 2161 the "Open" Goldberg Variations album, https://opengoldbergvariations.org/
3.18 GB -> 3035 Kayo Dot: "Hubardo" double album (avant-rock/-metal with all sorts of instruments)
1.41 GB -> 2913 NIN: The Slip
1.24 GB -> 2829 Cult of Luna: The Raging River (post-hc / sludge metal)
1.14 GB -> 2588 Cascades s/t (similar style, but not so dense, a bit more atmosphere parts yield the lower compressed bitrate)
0.57 GB -> 2724 The Tea Party: Tx 20 EP (90's Led Zeppelin / 'Moroccan Roll' from Canada)

Re: New FLAC compression improvement

Reply #34
* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3). Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well known? Also, the -8 order scores better than its permutations.
Some observations by trial and error:

The order of windows works best from widest to narrowest in the time domain. For example, the windows that don't take arguments, from widest to narrowest, are Rectangular, Welch, Triangular/Hann, Blackman, then Flat top.
https://en.wikipedia.org/wiki/Window_function#A_list_of_window_functions
The left (blue) plots are the time domain and the right (orange) ones the frequency domain; windows that occupy more of the blue region are wider.

Windows taking arguments, like Tukey, also work best from widest to narrowest when used repeatedly. I believe Tukey(0) is the same as Rectangular and Tukey(1) is the same as Hann, according to this:
https://www.mathworks.com/help/signal/ref/tukeywin.html
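For reference, a sketch of the textbook Tukey definition matching that page (illustration only, not libFLAC's source):

#include <math.h>

/* Tukey window with taper parameter alpha in [0,1]:
   alpha = 0 gives Rectangular, alpha = 1 gives Hann. */
static void tukey_window(double *w, int N, double alpha)
{
    const double L = N - 1;
    for (int n = 0; n < N; n++) {
        double x = (n > L / 2) ? L - n : (double)n;  /* exploit symmetry */
        if (alpha > 0.0 && x < alpha * L / 2.0)
            w[n] = 0.5 * (1.0 + cos(M_PI * (2.0 * x / (alpha * L) - 1.0)));
        else
            w[n] = 1.0;                              /* flat middle part */
    }
}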

I don't understand partial Tukey and punchout Tukey; judging by the source code, they look like combinations of several Tukey windows.

-e may adversely affect compression ratio with the above ordering (plus it is super slow anyway).

-6 to -8 already specify one or more windows, so don't use them if you are planning a custom window ordering.

Re: New FLAC compression improvement

Reply #35
-e may adversely affect compression ratio with the above ordering

So the above test runs indicate the opposite: it improves every album, even at hard settings with the "good order" - but, and this is the surprise, the improvement is near zero at the new build's "-5".
Gut feeling is that the new -5 is the oddball, and in a benevolent way, in that it picks up a lot of improvement without resorting to -e or -p.

I have very limited experience with -e and -p for this obvious reason:
(plus it is super slow anyway)
On this machine, it is not as slow as -p, and better (though not for every album; but, going by fb2k-reported bitrates, I have yet to see -p come out more than 1 kbit/s ahead).

Re: New FLAC compression improvement

Reply #36
* -l 12 -b 4096 -m -r 6 (that is, "-7 without apodization") and -l 12 -b 4096 -m -r 6 -A tukey(0,5) produce bit-identical output. The same holds for 1.3.1: "-A tukey(0,5)" makes no difference.
(My locale uses a comma as decimal separator. By using the wrong one, I also found out that "tukey(0.5)" is interpreted as "tukey(0)", which appears to be outright harmful to compression.)
That is because -A tukey(0,5) is the default, so both should produce bit-identical output.

* Order of the apodization functions matters! -A partial_tukey(2);tukey(0,5);punchout_tukey(3) is not the same as -A tukey(0,5);partial_tukey(2);punchout_tukey(3). Having observed that, I vaguely remember someone saying there is no reason they should be equal, so maybe this is well known? Also, the -8 order scores better than its permutations.
It is not something well known. However, the differences should be very, very small. The reason the order might matter is that FLAC estimates the frame size for each apodization; it does not fully calculate it. If two apodizations give the same frame size estimate, the one that was evaluated first is taken. The estimate might be a bit off though, which means that swapping the order can change the resulting filesize.

At least, that is how I understand it. It means this influence should be minor, as two equal estimates do not occur often and the actual difference should not be large, because the estimate is usually quite good. There could be something else at work, though.
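In pseudocode terms, my understanding is something like this (identifiers made up, not libFLAC's actual code):

#include <limits.h>

/* estimate_frame_bits() is a hypothetical stand-in for the estimator. */
static unsigned pick_window(unsigned num_apodizations,
                            unsigned (*estimate_frame_bits)(unsigned a))
{
    unsigned best_bits = UINT_MAX, best = 0;
    for (unsigned a = 0; a < num_apodizations; a++) {
        unsigned est = estimate_frame_bits(a); /* an estimate, not the exact size */
        if (est < best_bits) {  /* strict '<': on a tie, the earlier window stays */
            best_bits = est;
            best = a;
        }
    }
    return best;
}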

* PCM is ~13.6 GB and compresses to 7.95 GB using 1.3.1 -5; from there on, the MB savings are as follows:
80 MB going from 1.3.1 -5 (24 seconds on The Slip) to 1.3.1 -8 (43 seconds)
193 MB going from 1.3.1 -8 to your -5 (25 seconds). Heck, even your -3 (20 seconds) is 65 MB better than 1.3.1 -8.
38 MB going from your -5 to your -8 (63 seconds). Note how this makes "-8" less of a bounty now. The big thing is how your -5 improves over 1.3.1 -5 by 3.34 percent, or 1.9 percentage points, at around zero speed penalty. Oh, and the new -7 clocked in at 42 seconds; compare that to the old -8.
Interesting. This is too little material to go on, I think, but the change from float to double for the autocorrelation calculation had the most effect on classical music and almost none on more 'noisy' material. For example, see track 11 in this PDF: http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics-per-track.pdf which is also NIN - there is almost no gain. That it does work well with The Slip has mostly to do with the higher bitdepth (24-bit) and not so much with the higher samplerate, I'd say.

Quote
So your "-5" picks up nearly all the advantages of -p or -e ... ?!
-e does a search for the best order; without -e, it uses a by-product of the construction of the predictor to guess the best order. It could be that the higher precision also results in a better guess, but since the release of 1.3.1 there has also been another change to this guess: https://github.com/ktmf01/flac/commit/c97e057ee57d552a3ccad2d12e29b5969d04be97
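Roughly, the idea of that guess (a simplified illustration, not the code in the linked commit): Levinson-Durbin yields the prediction error for every order up to the maximum as a by-product, so one can estimate the total bits per order and pick the minimum.

#include <math.h>

/* error[] holds the per-order prediction error from Levinson-Durbin;
   header_bits_per_order is a simplified stand-in for the coefficient cost. */
static int guess_best_order(const double *error, int max_order,
                            int blocksize, double header_bits_per_order)
{
    int best_order = 1;
    double best_bits = HUGE_VAL;
    for (int order = 1; order <= max_order; order++) {
        /* expected bits/sample of the residual scales with 0.5*log2(error) */
        double bps = 0.5 * log2(error[order - 1]);
        if (bps < 0.0)
            bps = 0.0;
        double total = bps * blocksize + header_bits_per_order * order;
        if (total < best_bits) {
            best_bits = total;
            best_order = order;
        }
    }
    return best_order;
}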

I can only guess why -p loses its advantage. Perhaps because the predictor is more accurate, the default high precision is used better, and searching for a lower precision no longer trades off well?
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #37
fb2k-reported bitrates here. Not only is -5 nearly catching up with -6, but look at -4. And: -7 nearly catching -8.
If it were solely due to the material, one would have expected 1.3.1 to show the same. Not quite. (There, -6 is quite good, which is in line with previous anecdotal observations.)


2644 for -3: useless or not, your build's -3 produces smaller files than 1.3.1 does at -8
2606 for -4: shaves off 38 from -3. (2703 with 1.3.1, which improves 20 over its -3)
2602 for -5: only improves 4 over -4. (But: 2692 with 1.3.1)
2599 for -6: only improves 3 over -5. (1.3.1: 2674, improving 18 over -5.)
2590 for -7: improves 9 over -6. (1.3.1: 2671, a small improvement over -6.)
2590 for -8: calculated improvement 0.48. (1.3.1: 2666, improving more over -7 than -7 does over -6.)
2580 for -8 -b 8192 -l 32, which is subset because 96 kHz. (1.3.1: 2670, worse than its -8)


And then facepalming over myself:
"-7 without apodization"
... save for the inevitable - doh - default.

Re: New FLAC compression improvement

Reply #38
All right, you posted while I thought the error message I got meant I had been logged out. Then, since you mentioned something about what should benefit, I looked over the various albums - and I have seen the ten percent improvement mark!
And that is not classical or anything; it is the Tea Party four-song EP (yes, the full EP, not an individual track). So I went for the current Rarewares 1.3.3 build, which does no better than 1.3.1.
Major WTF! You'll get a PM in a minute.


As for the order of apodization functions: yes, small difference - .04 percent to .08 percent.

Re: New FLAC compression improvement

Reply #39
After doing some research with @Porcus, it became clear that I was wrong on the following:

That it does work well with The Slip has mostly to do with the higher bitdepth (24-bit) and not so much with the higher samplerate, I'd say.

As it turns out, it is the high samplerate and not the bitdepth that makes the difference. However, further research showed that it is actually the absence of high-frequency content in high-samplerate files that makes this difference. As a test, I took a random audio file (44.1kHz/16-bit) from my collection and encoded it with and without the double-precision changes; the difference in compression was 0.2%. When I applied a steep 4kHz low-pass to the file, this difference rose to 10%. To be clear, that is not the difference between the input file and the low-passed file, but the difference between the two FLAC binaries on the same low-passed file.
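For anyone who wants to try something similar: this is not my exact procedure, but chaining ffmpeg's lowpass biquad a few times approximates a steep filter:

ffmpeg -i input.wav -af "lowpass=f=4000,lowpass=f=4000,lowpass=f=4000,lowpass=f=4000" lowpassed.wav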
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #40
Two more tests started when I went away for a few days.

* One is against ffmpeg; the TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?). Also tested against compression_level 12.
I don't know how subset-compliant ffmpeg is on hirez ... also, comparing file sizes could be a few kB off: I had to run metaflac to remove everything from the ffmpeg-generated files - and even then it turns out there would still be bytes to save with fb2k's minimize-file-size utility.

* The other test was to see how close "sane" settings get to insane ones. For that purpose I let it run (for six days!) compressing at -p -e --lax -l 32 -r 15 -b 8192 -A [quite a few]. TL;DR for that test: your new build gets much closer; the return on going bonkers is smaller with the double-precision build than it was with 1.3.1. Not unexpected, as better compression means closing in on the theoretical minimum.


Results:

1.3.1:
8066 MB for -8; that is actually 0.15 percent better than -8 -b 8192 -l 32 (also subset!).
7966 for the week-long (well, four days) max-lax insanity; that shaves off more than a percent and ... finally gets 1.3.1 to beat TAK -p0, which is 8017, this "for reference"  O:)

The double precision build, and ffmpeg:
7873 MB for -5
7836 MB for -8
7831 MB for ffmpeg at compression_level 8 (fb2k reports 2 kbit/s better than your -8)
7815 MB for -8 --lax -b 8192 -r 15
7809 MB for ffmpeg at compression_level 12
7808 MB for -8 -b 8192 -l 32 (subset! -l 32 does better than -r 15, and 1 kbit/s better than ffmpeg)
7779 MB for the five days max lax insanity setting.

So are there differences in how ffmpeg does?
-8 vs compression_level 8: no clear pattern. Of seven files, ffmpeg wins 4, loses 3. Largest differences: ffmpeg gains 17 for Cascades, yours gains 15 for Nine Inch Nails.
-8 -b 8192 -l 32 vs compression_level 12: ffmpeg wins 3, loses 4, but the largest differences favor ffmpeg: 41 for The Tea Party. Again yours has the upper hand, by 13, on NIN.

Now which signals *do* improve from the insane setting? Absolutely *not* the jazz/classical - nor the highest-bitrate signal (Kayo Dot); they are within 5 kbit/s of -8.
It is The Tea Party, gaining 98 kbit/s over the subset setting, which in turn gains 46 over -8.

So the TTP EP indicates there *is* something to be found for music that is not extreme. Note that TTP is the shortest (in seconds) of them all, and that could make for larger variability.


Re: New FLAC compression improvement

Reply #42
Thanks for digging that one up. Possibility A is what I implemented quite a few years ago in FLAC 1.3.1: partial_tukey and punchout_tukey. TBeck was talking about using various variations of the triangle window; this works a little differently. partial_tukey uses only a part of the signal (hence the name) for LPC analysis; punchout_tukey masks a part of the signal (it 'punches out' a part) by muting it for LPC analysis. This works exactly as described in the post you link:

If so, then one frame will often contain parts with different signal characteristics, which would better be predicted separately. But this does not happen.

This is my hypothesis why windowing helps FLAC that much: it suppresses the contribution of one or two (potential) subframes at the frame borders to the predictor calculation and hence improves the quality of the (potential) subframe in the center. At least this one now gets "cleaner" (not polluted by the other subframes) or better-adapted predictor coefficients, and overall the compression increases.

Possibility B would make use of a variable blocksize. This is possible within the FLAC format, and flake (and CUEtools.Flake) implements it. However, as it has never been implemented in the reference encoder, there might be (embedded) FLAC decoders that cannot properly handle it. I have wanted to play with variable blocksizes for a long time, but if it were to succeed, it might create a division between "old" fixed-blocksize FLAC files and "new" variable-blocksize FLAC files, where the latter are unplayable on certain devices/software.

I cannot (yet) substantiate this fear. Moreover, implementing this in libFLAC would require *a lot* of work. Perhaps it would be better to experiment with new approaches in CUEtools flake if I run out of room for improvements in libFLAC.
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #43
Yeah, so the following musing is probably not worth much pursuing, but that hasn't stopped me from thinking aloud:

* So, given the worry that variable blocksize won't decode well, maybe the "safer" way would be, if not formally restricting "subset" to require a fixed blocksize, then in practice sticking to it as the default in the reference encoder, so that variable blocksize - if ever supported - has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 will decode fine even for sample rates where it is subset?

* But within fixed blocksize and within subset, it is possible to calculate/estimate/guesstimate which blocksize is best. No idea if there is a quick way of estimating without fully encoding the whole file. But there seems to be some sweet spot, rather than larger being better, in that 4096 apparently beats 4608 and 8192 beats 16384.
Now, that might be because the functions were optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?

(... is it then so that increasing n in partial_tukey(n) allows further splitting up?)


Anyway, CUETools flake may in itself support non-CDDA, but I doubt it will see much use outside CDDA when CUETools is restricted that way. Meaning, you might not get much testing done. ffmpeg, on the other hand ... ?

 

Re: New FLAC compression improvement

Reply #44
* One is against ffmpeg; the TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?). Also tested against compression_level 12.
Perhaps I will. As far as I know, ffmpeg's FLAC encoder is based on Flake, like CUEtools.Flake. However, CUEtools' flake has seen some development since (like the implementation of partial_tukey and punchout_tukey), unlike ffmpeg's.

* So, given the worry that variable blocksize won't decode well, maybe the "safer" way would be, if not formally restricting "subset" to require a fixed blocksize, then in practice sticking to it as the default in the reference encoder, so that variable blocksize - if ever supported - has to be actively switched on.
But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 will decode fine even for sample rates where it is subset?
Yes, it might be unsafe to use -l 32 -b 8192 when encoding for maximum compatibility. However, properly decoding -l 32 -b 8192 is quite simple, as it is just more of the same (a longer block, a longer predictor). Also, it is part of the FLAC test suite.

Variable blocksizes are not part of the FLAC test suite, and quite a few things change. For example, with a fixed blocksize, the frame header encodes the frame number, whereas with a variable blocksize, the sample number is encoded.
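Sketched from the format spec, the distinction sits in a single header bit:

/* FLAC frame header, after the sync code: one bit selects the blocking
   strategy. */
if (blocking_strategy_bit == 0) {
    /* fixed blocksize: a UTF-8-style coded frame number (< 2^31) */
} else {
    /* variable blocksize: a UTF-8-style coded sample number (< 2^36) */
}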

Quote
* But within fixed blocksize and within subset, it is possible to calculate/estimate/guesstimate which blocksize is best. No idea if there is a quick way of estimating without fully encoding the whole file. But there seems to be some sweet spot, rather than larger being better, in that 4096 apparently beats 4608 and 8192 beats 16384.

The problem is that it is probably impossible to know upfront what the optimal blocksize is for a whole file.

Quote
Now, that might be because the functions were optimized by testing at 4096? If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?
I cannot answer that question without just trying a bunch of things and seeing. I will explain the idea behind these apodizations (which I will call partial windows).

I would argue that a certain set of partial windows @ 44.1kHz with blocksize 4096 should work precisely the same with that file @ 88.2kHz with a blocksize of 8192.

partial_tukey(2) adds two partial windows: one in which the first half of the block is muted and one in which the second half is muted. partial_tukey(3) adds three windows: one in which the first 2/3 of the block is muted, one in which the last 2/3 is muted, and one in which the first and last 1/3 are muted.

punchout_tukey does the opposite of partial_tukey. Using punchout_tukey(2) and partial_tukey(2) together makes no sense: you get the same windows twice, because punchout_tukey(2) creates the same two windows, just swapped. punchout_tukey(3) adds three windows: one with the first 1/3 muted, one with the second 1/3 muted and one with the third 1/3 muted.
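As a crude 0/1-mask sketch of the above (illustration only; the real windows taper the slice edges with a Tukey roll-off rather than switching hard, and this is not libFLAC's code):

/* Window k (0-based) of partial_tukey(n) keeps only slice k of the block;
   with punchout = 1 the same slice is muted instead. */
static void partial_or_punchout(double *w, int N, int n, int k, int punchout)
{
    const int start = k * N / n;
    const int end   = (k + 1) * N / n;
    for (int i = 0; i < N; i++) {
        int inside = (i >= start && i < end);
        w[i] = (inside ^ punchout) ? 1.0 : 0.0;
    }
}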

If a block consists of a single, unchanging sound, partial_tukey and punchout_tukey do not improve compression. If a block has a transient in the middle (for example, the attack of a new note at roughly the same pitch), punchout_tukey(3) adds a window in which the middle part with this transient is muted. The LPC stage can then focus on accurately predicting the note and not bother with the transient. This is why adding these apodizations improves compression; as TBeck put it, the signal is cleaner. Of course the transient still has to be handled by the entropy coding stage, but a predictor that does part of the block very well beats one that does all of the block mediocrely.

So, choosing how many partial windows to add depends on the number of transients in the music, the samplerate and the blocksize. If the samplerate doubles, the blocksize can be doubled while keeping the same number of partial windows. If the samplerate doubles and the blocksize is kept the same, the number of partial windows can be halved. If the samplerate is kept the same and the blocksize is doubled, the number of partial windows should be doubled.

However, it also depends on the music. Solo piano music at a slow tempo won't benefit from more partial windows, as there are few transients to 'evade'. Non-tonal music, on which prediction doesn't do much, won't benefit either. Very fast-paced tonal music might benefit from more partial windows. The current default works quite well on a mix of (44.1kHz 16-bit) music.
Music: sounds arranged such that they construct feelings.

Re: New FLAC compression improvement

Reply #45
* All right, I got the punchout now, thanks!


* I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test on that too.


* You mentioned solo piano, so let me mention the following: the Open Goldberg Variations routinely does not benefit from increasing the Rice parameter value (max; I never set min). By that I mean that settings differing only by -r, say -r 6 vs -r 8, would often yield the same file (same .sha1) for that album. Yes, a full-album .flac file; I did not bother with tracks here.


* Meanwhile, I started another test using ffmpeg -compression_level 8 -lpc_type cholesky. It improves over -compression_level 8 for every sample, -3 kbit/s on the total. So the ffmpeg camp has been doing something bright.


* And then, since you posted in the https://hydrogenaud.io/index.php?topic=120906 thread, reminding me of that thing, I tried this new build on test signals: sine waves (peaking near full scale) - or tracks that have four sine waves in succession (peaks .72 to .75).

So even tracks that are damn close to a continuous sine benefit from double precision - but even then, ffmpeg beats it. Results:
-5 results:
388 for ffmpeg at -compression_level 5
376 for 1.3.1 at -5
362 for double precision at -5
-8:
360 for 1.3.1 at -8
343 for double precision at -8
330 for ffmpeg at -compression_level 8.  Same as for your build with -p or -e (-p -e is down to 321).
314 for ffmpeg -compression_level 8 -lpc_type cholesky

Lengths and track names - the "1:06" vs the final "1:04" must be appended gaps. Total length 9:16, including 12 seconds of gaps.
1:38 Mastering calibration - 1 kHz
1:38 Mastering calibration - 10 kHz
1:38 Mastering calibration - 100 Hz
1:06 Frequency check - 20, 32, 40 & 64
1:06 Frequency check - 120, 280, 420, 640
1:06 Frequency check - 800, 1200, 2800 & 5000
1:04 Frequency check - 7500, 12000, 15000 & 20000

Re: New FLAC compression improvement

Reply #46
CUETools flake supports at least 24-bit/192kHz. When CUEtools was new, I reported problems with high-bitrate material to Grigory Chudov and he fixed them immediately.
If you want to test recent behaviour, just use my 2.16 encoder compile; AFAIK nothing has changed in the flake encoder since.
Regarding non-default blocksizes: my Slimdevices Transporter, for example, can't play 24-96 material with blocksize 8192.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: New FLAC compression improvement

Reply #47
I don't know if CUETools' flake even supports higher resolutions, but if so, how do I call it outside CUETools? I could run the test on that too.
Yes, it does support higher resolutions. It is easy to use it as a custom encoder in foobar2000.

-q -8 --ignore-chunk-sizes --verify - -o %d
For compression levels higher than 8, --lax should be added to command line.

ffmpeg beats
Notice that ffmpeg uses a non-default block size: 4608 for 44100/16 with compression_level 8. And there is no option in ffmpeg to set the block size.

Re: New FLAC compression improvement

Reply #48
More testing with CUETools.Flake.exe -8 -P 0 --no-seektable --and-maybe-some-more-options, after I (thanks to Rollin!) found out that yes, it is an .exe and not just a .dll ... I don't feel smart right now.

Results given for three groups of material, all figures are fb2k-reported bitrates.
Maybe the most interesting news:
no miracles from variable block size (--vbr 4, in one instance also --vbr 2).


* First, those test signals where 1.3.1 ended up at 360: four CUETools.Flake.exe runs all ended up at 359 (with or without --vbr 4, with or without -t search); that is not on par with ktf's double-precision build.

* Then the 96k/24 corpus, with Flake slotted into the relevant places in the ranking; all encoders not otherwise specified are ktf's double-precision build - to be clear, the libFLAC Hydrogenaud.io autoc-double testversion 20210630:

2666 for reference flac.exe 1.3.1 at -8
2644 for -3
2609 for CUETools.Flake.exe at -8
2607 for CUETools.Flake.exe at -8 --vbr 4 -t search -r 8 -l 32
2606 for -4
2605 for CUETools.Flake.exe at -8 --vbr 4 (better than throwing in -t search -r 8 -l 32)
2602 for -5
2599 for -6
2590 for -7
2590 for -8 (0.48 better than -7, calculated from file size)
2588 ffmpeg -compression_level 8
2585 ffmpeg -compression_level 8 -lpc_type cholesky
2584 for -8 -b 8192 -r 8 -p
2584 for --lax -r 15 -8 -b 8192
2581 for -8 -b 16384 -l 32
2581 for -8 -b 8192 -r 8 -e (slightly smaller files than the "2581" above)
2581 ffmpeg  -compression_level 12 (again slightly smaller files than previous 2581)
2580 for -8 -b 8192 -l 32 (subset too, notice 8192 better than 16384)
2571 for --lax -p -e -l 32 -b 8192 -r 15 -A enough-for-five-days
2563 for TAK at the default -p2.


* And, here are some results for one multi-channel (5.1 in 6 channels) DVD-rip (48k/24), 80 minutes progressive rock:
4119 for 1.3.3 at -8
4091 for ktf's -8
4088 for ffmpeg -compression_level 8
4080 for ffmpeg -compression_level 8 -lpc_type cholesky
4065 for CUETools.Flake.exe at -8, and also at -8 --vbr 2 and at -8 --vbr 4 (I overwrote without checking file size differences)
And lax options; not sure what to make of them:
4044 for ffmpeg -compression_level 12 -lpc_type cholesky
4023 for CUETools.Flake.exe at -11 --lax
4006 for ktf's -8 -b 8192 -l 32 -r 8 --lax



Notice that ffmpeg uses a non-default block size: 4608 for 44100/16 with compression_level 8.
With reference FLAC I did not get 4608 to improve over 4096 on CDDA. Only very rough testing (but more than this test CD ...).
Also with 96k/24, I did not get 16384 to improve over 8192.


Re: New FLAC compression improvement

Reply #49
* One is against ffmpeg; the TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?). Also tested against compression_level 12.
Perhaps I will. As far as I know, ffmpeg's FLAC encoder is based on Flake, like CUEtools.Flake. However, CUEtools' flake has seen some development since (like the implementation of partial_tukey and punchout_tukey), unlike ffmpeg's.
Apparently, I was too quick to dismiss ffmpeg. As can be read at https://hydrogenaud.io/index.php?topic=45013.msg412644#msg412644, flake's developer actually did quite a bit of development after flake was integrated into ffmpeg. Others did too.

It turns out this Cholesky factorization (which can be used with the option -lpc_passes) does pretty much what the IRLS approach this thread started with does. I am truly surprised by this, especially as it has been there since 2006 with apparently nobody here at Hydrogenaudio using it. I quote from https://github.com/FFmpeg/FFmpeg/commit/ab01b2b82a8077016397b483c2fac725f7ed48a8 (emphasis mine):
Quote
optionally (use_lpc=2) support Cholesky factorization for finding the lpc coefficients

  this will find the coefficients which minimize the sum of the squared errors,
  levinson-durbin recursion OTOH is only strictly correct if the autocorrelation matrix is a
  toeplitz matrix which it is only if the blocksize is infinite, this is also why applying
  a window (like the welch window we currently use) improves the lpc coefficients generated
  by levinson-durbin recursion ...

optionally (use_lpc>2) support iterative linear least abs() solver using cholesky
  factorization with adjusted weights in each iteration

compression gain for both is small, and multiple passes are of course dead slow

Originally committed as revision 5747 to svn://svn.ffmpeg.org/ffmpeg/trunk
That description perfectly matches IRLS (iteratively reweighted least squares). My IRLS code uses Cholesky factorization as well. I'll look into this when I take another look at my IRLS code for libFLAC.
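For readers unfamiliar with the term, the correspondence is roughly this (my summary, not code from either project): to minimize the sum of absolute residuals, IRLS repeatedly solves a weighted least-squares problem, each solve done via Cholesky factorization of the normal equations:

\[ c^{(t+1)} = \arg\min_c \sum_i w_i^{(t)} \Big( x_i - \sum_{j=1}^{p} c_j\, x_{i-j} \Big)^2, \qquad w_i^{(t)} = \frac{1}{\max\big(|e_i^{(t)}|,\ \varepsilon\big)} \]

Weighting the squared error by \(1/|e_i|\) makes it approximate the absolute error, which is exactly the "iterative linear least abs() solver ... with adjusted weights in each iteration" from the commit message.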
Music: sounds arranged such that they construct feelings.