Re: New FLAC compression improvement
Reply #44 – 2021-08-13 15:07:51
> One is against ffmpeg; TL;DR for that test is that your build at -8 nearly catches ffmpeg's compression_level 8 (maybe look at that source code for a tip?) Also tested against compression_level 12.

Perhaps I will. As far as I know, ffmpeg's FLAC encoder is based on Flake, like CUETools.Flake. However, CUETools' Flake has seen some development since (like the implementation of partial_tukey and punchout_tukey), unlike ffmpeg's.

> So, given the worry that variable block size won't decode well, maybe the "safer" way would be to, if not formally restrict "subset" to require fixed block size, then in practice stick to it as the default in the reference encoder, so that variable block size, if it is supported at all, has to be actively switched on. But by the same "who has even tested this?" consideration, maybe it is even unsafe to bet that -l 32 -b 8192 will decode fine for the sample rates at which it is subset?

Yes, it might be unsafe to use -l 32 -b 8192 when encoding for maximum compatibility. However, properly decoding -l 32 -b 8192 is quite simple, as it is simply more of the same (a longer block, a longer predictor), and it is part of the FLAC test suite. Variable blocksizes are not part of the FLAC test suite, and quite a few things change. For example, with a fixed blocksize the frame header encodes the frame number, whereas with a variable blocksize the sample number is encoded.

> But within fixed block size and within subset, it is possible to calculate/estimate/guesstimate what block size is best. No idea if there is a quick way of estimating without fully encoding the full file. But there seems to be some sweet spot, not simply "larger is better", in that 4096 apparently beats 4608 and 8192 beats 16384. Now, might that be due to the functions being optimized by testing at 4096?

The problem is that it is probably impossible to know upfront what the optimal blocksize is for a whole file.
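To illustrate the fixed/variable difference: the frame header stores the frame or sample number as a variable-length "coded number" in a UTF-8-like scheme, extended up to 7 bytes (36 value bits) so it can hold sample numbers. A minimal sketch of that encoding in Python (my own illustration of the scheme from the format spec, not libFLAC code):

```python
def encode_coded_number(value: int) -> bytes:
    """Encode a FLAC frame header coded number (frame number for fixed
    blocksize streams, sample number for variable blocksize streams)
    using the extended UTF-8-like variable-length scheme."""
    if value < 0x80:
        # Single byte, 0xxxxxxx, exactly as in plain UTF-8/ASCII.
        return bytes([value])
    # A k-byte encoding carries (7 - k) value bits in the first byte
    # and 6 bits in each of the (k - 1) continuation bytes.
    for nbytes in range(2, 8):
        payload_bits = (7 - nbytes) + 6 * (nbytes - 1)
        if value < (1 << payload_bits):
            break
    out = bytearray(nbytes)
    for i in range(nbytes - 1, 0, -1):
        out[i] = 0x80 | (value & 0x3F)  # continuation byte: 10xxxxxx
        value >>= 6
    # First byte: nbytes leading 1-bits, a 0-bit, then the remaining value bits.
    out[0] = ((0xFF << (8 - nbytes)) & 0xFF) | value
    return bytes(out)
```

For values below 0x80 this coincides with ASCII; the 7-byte form (first byte 0xFE) exists only for the sample numbers that variable blocksize streams need, which is one of the "quite a few things" a decoder must additionally handle.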
> If so, is there any particular combination of partial_tukey(x);partial_tukey(y);punchout_tukey(s);punchout_tukey(t) that would be worth looking into for 4608 / 8192?

I cannot answer that question without just trying a bunch of things and seeing. I would argue that a certain set of partial windows at 44.1kHz with a blocksize of 4096 should work precisely the same on that file at 88.2kHz with a blocksize of 8192.

Let me explain the idea behind these apodizations (which I will call partial windows). partial_tukey(2) adds two partial windows: one in which the first half of the block is muted and one in which the second half of the block is muted. partial_tukey(3) adds three windows: one in which the first 2/3rds of the block is muted, one in which the last 2/3rds is muted, and one in which the first and last 1/3rd are muted.

punchout_tukey does the opposite of partial_tukey. punchout_tukey(3) adds three windows: one with the first 1/3rd muted, one with the second 1/3rd muted and one with the third 1/3rd muted. Using punchout_tukey(2) together with partial_tukey(2) makes no sense, because punchout_tukey(2) creates the same two windows as partial_tukey(2), just swapped, so you would evaluate the same windows twice.

If a block consists of a single, unchanging sound, partial_tukey and punchout_tukey do not improve compression. But if a block has a transient in the middle (for example, the attack of a new note at roughly the same pitch), punchout_tukey(3) adds a window in which the middle part containing the transient is muted. The LPC stage can then focus on accurately predicting the note without being bothered by the transient. This is why adding these apodizations improves compression; as TBeck put it, the signal is cleaner. Of course, the transient still has to be handled by the entropy coding stage, but a predictor that does part of the block very well beats one that does all of the block mediocrely.
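My description above can be sketched in code. This is a deliberately simplified illustration (equal-sized, non-overlapping parts, a plain Tukey taper; the exact window shapes and overlap handling in libFLAC differ in detail):

```python
import math

def tukey(n, p=0.5):
    # Standard Tukey (tapered cosine) window of length n; p is the taper fraction.
    t = int(p * (n - 1) / 2)
    w = []
    for i in range(n):
        if i < t:
            w.append(0.5 * (1 + math.cos(math.pi * (i / t - 1))))
        elif i > n - 1 - t:
            w.append(0.5 * (1 + math.cos(math.pi * ((i - (n - 1 - t)) / t))))
        else:
            w.append(1.0)
    return w

def partial_tukey(n_parts, blocksize, p=0.5):
    # One window per part: a Tukey taper over that part, zero (muted) elsewhere.
    windows = []
    part = blocksize // n_parts
    for i in range(n_parts):
        w = [0.0] * blocksize
        w[i * part:(i + 1) * part] = tukey(part, p)
        windows.append(w)
    return windows

def punchout_tukey(n_parts, blocksize, p=0.5):
    # The opposite: a full-block Tukey window with one part muted per window,
    # tapering into the muted region.
    full = tukey(blocksize, p)
    part = blocksize // n_parts
    windows = []
    for i in range(n_parts):
        w = list(full)
        hole = tukey(part, p)
        for j in range(part):
            w[i * part + j] *= 1.0 - hole[j]
        windows.append(w)
    return windows
```

For example, the second window of punchout_tukey(3) mutes the middle third of the block, which is what lets the LPC stage ignore a transient sitting there.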
So, choosing how many partial windows to add depends on the number of transients in the music, the samplerate and the blocksize. If the samplerate doubles and the blocksize is doubled with it, the number of partial windows can stay the same. If the samplerate doubles and the blocksize is kept the same, the number of partial windows can be halved. If the samplerate is kept the same and the blocksize is doubled, the number of partial windows should be doubled.

However, it depends on the music as well. Solo piano music at a slow tempo won't benefit from more partial windows, as there are few transients to 'evade'. Non-tonal music, on which prediction doesn't do much, won't benefit either. Very fast-paced tonal music might benefit from more partial windows. The current default works quite well on a mix of (44.1kHz, 16-bit) music.
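The three scaling rules above all reduce to one observation: the useful number of partial windows scales with the block's duration in seconds (blocksize divided by samplerate). A small helper to make that concrete (scaled_partial_windows is a hypothetical name of my own, not a libFLAC function, and it ignores the music-dependent part):

```python
def scaled_partial_windows(base_windows, base_rate, base_blocksize, rate, blocksize):
    # Scale the partial window count with block duration in seconds:
    # same duration -> same count; double duration -> double count.
    base_duration = base_blocksize / base_rate
    duration = blocksize / rate
    return max(1, round(base_windows * duration / base_duration))
```

So a setting tuned for 4096 samples at 44.1kHz carries over unchanged to 8192 samples at 88.2kHz, doubles when only the blocksize doubles, and halves when only the samplerate doubles, matching the rules stated above.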