FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #275 – 2023-03-24 18:54:22

Tested:
30-ish GB of decoded HDCD rips. Yeah I know I shouldn't have made that irreversible mistake fifteen years ago, but here we are.
So these are effectively 17-ish bits in a 24-bit container: 95 percent of the tracks have a peak below 0.8 when scanned with oversampling.

* Why test these? Just to see whether there are any surprises with slightly unusual signals.
* Were there any? Not really. -4 isn't particularly good; I have earlier on questioned whether -5 is really much of an improvement over -4, but here it is. Anyway, that question has probably not made any great impact, I mean who uses -4?

What I did was to re-encode FLAC files (with overwrite) on an SSD. That is why I quote the times per encoded gigabyte. "142" means reference FLAC 1.4.2 Win64, 134 means 1.3.4 Win64, both Xiph builds.
Numbers then. Size relative to 1.4.2 -5, then setting, then comment with time taken. Sizes are file sizes, with tags and default padding.

+1.574%     142-4     ~25 sec/GB (encoded GB)
+0.119%     134-5
ref. point  142-5     ~30 sec/GB; 31 808 619 711 bytes
-0.255%     134-7
-0.306%     134-8
-0.347%     142-7     ~37 sec/GB
-0.397%     142-8     ~1 min/GB
-0.399%     134-8p
-0.412%     142-8e    ~3 min/GB
-0.428%     142-"all the sevens but no p" (see below), also ~3 min/GB
-0.461%     134-8pe   ~20 min/GB
-0.480%     142-8p    2 min 40 s/GB
-0.507%     142-8pe   also in the ~20 min/GB ballpark
-0.514%     142-"-p all the sevens", about the same time as -8pe

That "all the sevens" - and why not "8"? It is not because it is good! It is because I wanted to come up with a command that was easy to remember, takes about as much time as "-e" and outcompresses -e. That supports the claim that "-e" should not be used on typical music with normal resolutions: if you are even willing to wait for -e, then there are better things around. (It is known that -e still has something going for it on higher sampling rates, potentially.)
The actual option line is -7r7 -A "flattop;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" with an additional "-p" for the last line. And a -f to overwrite, but that goes for everything.
For those who ask "why -7 and not -8?": it wouldn't make any difference. -8 is -7 with a different (and heavier) "-A", and the moment I write "-A" here I override that by specifying yet a different (and heavier!!) -A.

Quote from: Porcus on 2022-10-24 21:49:05

Quote from: bennetng on 2022-10-24 18:30:28
Did you really type "flatopp"?
Damn, only one way to find out, and that is not done in two minutes ...

Here I made sure to get it right!

But anyway, bottom line is what we knew: 1.4.x improves, and -e does not deliver at these resolutions. -p is nearly as expensive, but much better.

Reply #276 – 2023-03-26 10:10:02

Can anyone try to replicate the following observation, using their fave build and CPU?

Prediction order (the "-l" switch) makes for a big time penalty somewhere above -l12.

Being --lax settings, they may have gone under everyone's radar for good reason. But the impact is unexpectedly big here, see plot below.

Here is what I did, using the "timer64" tool - but PowerShell wizards can probably come up with something built-in (and *n*x users, you likely know what to do):
for /l %l IN (6,1,32) DO timer64 flac --lax -fr0 -ss -l %l filename*.flac >> logfile.txt
for /l %l IN (6,1,32) DO timer64 flac --lax -fpr0 -ss -l %l filename*.flac >> logfile.txt
... re-encoding yes (that's the -f), so in principle that means every successive encode has to read a more complicated FLAC file, but (1) FLAC decodes so quick it shouldn't matter, and (2) anyway a jump would be a surprise. The "-r0" to ensure that the partitioning is done the same for every run.
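For those without timer64, the two loops above could be sketched in Python roughly as follows. This is only a sketch: the helper names (build_command, time_orders) are made up here, it assumes a flac executable on the PATH, and it measures wall-clock time only. The flags are exactly the ones from the command lines above. Note that cmd.exe hands the filename*.flac wildcard to flac as-is, so from Python you would expand it yourself, e.g. with glob.glob.

```python
import subprocess
import time


def build_command(order, files, extra_flags=("-fr0",)):
    """Build the flac command line for one prediction order (-l)."""
    return ["flac", "--lax", *extra_flags, "-ss", "-l", str(order), *files]


def time_orders(files, orders=range(6, 33), extra_flags=("-fr0",), run=subprocess.run):
    """Re-encode the files once per order and return {order: seconds}.

    range(6, 33) mirrors "for /l %l IN (6,1,32)", i.e. 6 to 32 inclusive.
    The run parameter is only there so the loop can be tried without flac.
    """
    timings = {}
    for order in orders:
        start = time.perf_counter()
        run(build_command(order, files, extra_flags), check=True)
        timings[order] = time.perf_counter() - start
    return timings
```

Calling time_orders(files) corresponds to the first loop and time_orders(files, extra_flags=("-fpr0",)) to the second one with -p.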

Timings on a quick run on one album (Swordfishtrombones) - this fanless computer is cooling constrained and timings have proven quite unreliable, but I ran -l15 and -l16 (indicated in the oval) several times on several files and that particular jump is quite consistent. For -p, the impact is more dramatic already at -l 13.

Reply #277 – 2023-03-26 14:01:55

Quote from: Porcus on 2023-03-26 10:10:02

Prediction order (the "-l" switch) makes for a big time penalty somewhere above -l12.

Makes perfect sense. Loops are only unrolled until order 12, not for orders above that.

Reply #278 – 2023-03-26 17:53:30

Quote from: ktf on 2023-03-26 14:01:55

Loops are only unrolled until order 12

That is something I need translated ... (or maybe I should just not bother my pretty little head with those details).

Reply #279 – 2023-03-26 18:46:36

Sounds like code optimization (-funroll-loops) that's only beneficial until you reach a max LPC order of 12.

Reply #280 – 2023-03-26 19:15:06

Quote from: Porcus on 2023-03-26 17:53:30

That is something I need translated

Good thing we're on the Internet!

https://en.wikipedia.org/wiki/Loop_unrolling

Reply #281 – 2023-03-26 19:18:51

I don't know how to explain this in simple terms, but let's say that for each order up to and including 12, there is code optimized for that specific order. For orders above 12, there is generic code.

A compiler can optimize loops in code much better if it knows in advance how often that loop will be traversed. It can 'unroll' a loop. In the generic code, the CPU will have to check after each addition and/or multiplication whether it needs to do another one for this sample, or whether it can move on to the next sample. When a loop is unrolled, there are simply a number of additions and multiplications after one another before encountering a check.

So, generic code looks like this:

Code:

repeat the following code for each sample {
     repeat the following code for each order {
          do multiplication
          do addition
     }
}

In FLAC, this is unrolled for orders up to and including 12 to the following.

Code:

[...]
Use this code for order 2:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 3:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 4:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

This is pretty much what happens for residual calculation, strictly up to order 12. This is the change you're seeing for the red line, because when using -p the residual calculation code dominates the execution time. Just look at the code here: https://github.com/xiph/flac/blob/master/src/libFLAC/lpc.c#L1101
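As a toy illustration of the generic-versus-specialised split described above, here is a sketch in Python. The function names are made up for this example and are not libFLAC's. The real code is C, works on quantized integer coefficients with a shift, and gets its speed from what the compiler can do with a known loop length, none of which Python reproduces. The sketch only shows that both forms compute the same residual.

```python
def residual_generic(samples, coeffs):
    """Generic form: an inner loop over the order for every sample."""
    order = len(coeffs)
    out = []
    for i in range(order, len(samples)):
        prediction = 0
        for j in range(order):  # loop-end check after every multiply-add
            prediction += coeffs[j] * samples[i - 1 - j]
        out.append(samples[i] - prediction)
    return out


def residual_order3(samples, coeffs):
    """Order-3 specialisation: the inner loop is written out by hand."""
    c0, c1, c2 = coeffs
    out = []
    for i in range(3, len(samples)):
        prediction = (c0 * samples[i - 1]
                      + c1 * samples[i - 2]
                      + c2 * samples[i - 3])
        out.append(samples[i] - prediction)
    return out


def residual(samples, coeffs):
    """Use the specialised routine when one exists, else the generic one."""
    specialised = {3: residual_order3}  # FLAC has one per order up to 12
    return specialised.get(len(coeffs), residual_generic)(samples, coeffs)
```

With more than 12 coefficients there is no entry in the table, so the generic routine runs, which is the jump seen in the timings.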

For the blue line, the change between 15 and 16 is a little more complicated. This has to do with the autocorrelation calculation, which can be optimized in groups of 4, more or less. So, there is code for order below 8, below 12 and below 16. You see this with the blue line because, when not using -p (or -e), the autocorrelation calculation dominates the execution time. Look at the code here: https://github.com/xiph/flac/blob/68f605bd281a37890ed696555a52c6180457164f/src/libFLAC/lpc.c#L158

Reply #282 – 2023-03-26 19:54:41

Ah, OK. So it could be in the code and it could be done at compile time - meaning that some builds might potentially behave differently? Or maybe not. Anyway, problem "solved" ...

... except for those who might think that hey, if they are willing to consider a different lossless codec than FLAC, then non-subset FLAC is at least as compatible maybe?

Reply #283 – 2023-04-12 11:50:36

To the "showcases" division, a file downloaded from <stupidlyhi-rez site that also offers a DSD file at 16x the "ordinary" 2822400>
Resolutions 384/24 and 384/32 (huh, that says encoded with 1.3.1 that one too).
Duration: 5:04.

The 32-bit file:
480 434 934 bytes (12637 kbps) for the one downloaded that says it was encoded with reference libFLAC 1.3.1 20141125
310 731 527 bytes for 1.4.2 at -8per7 -b8192 -A <quite some but not running overnight>

The 24-bit file is maybe more interesting, since flac.exe 1.3.1 supports 24 bits and it is easier to track the options used:
244 474 825 bytes for 1.3.1 at -5 (matches the 6428 kbit/s as downloaded).
240 441 708 bytes for 1.3.1 at -8pe (this is hi-rez, -e matters more than -p on this track)
I couldn't get 1.4.2 with fixed predictor to beat any of these, but ...
231 107 835 bytes for 1.4.2 at -3r0 -l4 deliberately using as weak LPC as ...
182 464 188 bytes for 1.4.2 at -3 taking more than 20 percent off what 1.3.1 could achieve
177 297 877 bytes for ffmpeg at default
175 257 853 bytes for 1.4.2 at -3e - see, "-e" matters quite a lot even here.
164 436 893 bytes for 1.4.2 at -5 default.
164 426 962 bytes for ffmpeg at -compression_level 8
153 320 636 bytes for ffmpeg at -compression_level 12
148 262 872 bytes for 1.4.2 at -8
145 371 133 bytes for 1.4.2 at -8e; -e matters less here!
142 006 237 bytes for 1.4.2 at -8per7 -b8192 -A <quite some but not running overnight>

Reply #284 – 2023-04-15 20:06:33

Quote from: Wombat on 2023-04-13 01:54:34

Here are compiles using a more generic CPU optimization x86-64-v3 instead of haswell while using similar capabilities up to AVX2 in the hope for better performance across more modern CPU types.
Inside are builds with Clang 16.0.1, GCC 12.2.0 and a "disable-asm-optimizations" version with faster 16bit performance but slower 24bit performance for apps like CUETools or EAC.

Thanks. It is the first time that Clang got the best overall results with my i3-12100 in both CDDA and hi-res. The GCC builds are pretty similar to the builds in last October.

Reply #285 – 2023-04-15 20:44:45

Thank you for the feedback. Here on my 5900x GCC is still slightly faster. I may add exact benchmarking is hard on this system because every run is different.

Reply #286 – 2023-04-15 22:06:31

Quote from: Porcus on 2023-04-12 11:50:36

To the "showcases" division, a file downloaded from <stupidlyhi-rez site

According to @bennetng, it is not only the resolution itself that is dumb, it is the combination of resolution and "crappy quality junior MIDI sequencing stuff", so take it with a grain of salt. Anyway, stupid resolutions are a waste of space, and it is at least not bad that FLAC 1.4.x wastes less space.

Also, you got artefacts like the following, for the 384/24 file:
308 548 254 for WavPack -hx, but -hx4 compresses to less than half of that: 151 671 854
209 to 222 for Monkey's ... which isn't happy about stupidly high resolutions and gets beaten by FLAC -3
FLAC -8e beats OptimFROG --preset 0.

Something is "as should be" though: When I "faked it as 192 kHz" (that is, same samples just telling the file they are at half speed) so that TAK can handle it, it compresses better than anything but default-and-up OptimFROG. Which at preset 10 can get it to < 119. Not quite as far to half the monkey.

Reply #287 – 2023-04-16 08:09:04

The original sample rate is likely 96kHz. See how clean the upper spectrum is, it is impossible for anything originally recorded in 384kHz with an ADC. Basically it is the same thing I mentioned previously regarding the use of -e, even if the content is not "crappy quality junior MIDI sequencing stuff".
https://hydrogenaud.io/index.php/topic,123025.msg1018144.html#msg1018144

Another thing is that the 24-bit file on the website is clipped (the vertical cyan lines). The website sells conversion software and it is a pretty pathetic way to sell the product with such a careless conversion.

Here are some examples of what clip-free 24-bit conversions should look like when using the 32-bit file as input.

Reply #288 – 2023-04-16 15:45:12

Quote from: bennetng on 2023-04-16 08:09:04

The original sample rate is likely 96kHz. See how clean the upper spectrum is, it is impossible for anything originally recorded in 384kHz with an ADC.

It makes me wonder if it is possible to exploit without a brute-force -e.
How are upsamples done in practice and how would that translate to "a good linear predictor"? Say for the sake of the illustration: my "upsampling" just copies each sample - and say if you put weights ABCDEFGH on the eight past signals of the original, you could just try
0, A, 0, B, 0, C, 0, D, 0, E, 0, F, 0, G, 0, H
for the upsampled.
I have a hunch that it isn't as simple. Of course if one were using an iterative estimation, that could be one starting point to try.
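For the sample-copying case only, the 0, A, 0, B, ... idea can be checked with a small sketch. The helper names are made up, and real upsamplers use interpolation filters rather than copying, so this says nothing about the file in question. Under the copying assumption, though, the zero-interleaved weights give exactly the original residual, with each value appearing twice.

```python
def predict_residual(signal, weights):
    """Residual of a linear predictor; weights[j] applies to the sample j+1 steps back."""
    order = len(weights)
    return [signal[i] - sum(w * signal[i - 1 - j] for j, w in enumerate(weights))
            for i in range(order, len(signal))]


def interleave_zeros(weights):
    """Turn A, B, C, ... into 0, A, 0, B, 0, C, ..."""
    out = []
    for w in weights:
        out.extend([0, w])
    return out


original = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
weights = [2, -1]  # arbitrary example weights on the two past samples

# "Upsampling" by copying each sample once, as in the thought experiment
upsampled = [s for s in original for _ in range(2)]

res_orig = predict_residual(original, weights)
res_up = predict_residual(upsampled, interleave_zeros(weights))
```

Here res_up is res_orig with every entry doubled, so the predictor transfers, but only because the upsampling is a pure copy.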

Quote from: bennetng on 2023-04-16 08:09:04

Another thing is that the 24-bit file on the website is clipped (the vertical cyan lines). The website sells conversion software and it is a pretty pathetic way to sell the product with such a careless conversion.

Vicious.
(I love it!)

So whoever did this went to lengths to force libflac 1.3.1 to handle a 32-bit signal (I mean, flac.exe couldn't!) but the 24-bit signal behaves differently ... what for? 32-bits rounding up the 24th to overflow? Anyway, foo_bitcompare reports peak difference of -20.48 dB, which is suspiciously high but not high enough for my ears to bother trying to ABX it out.

(So next time I would worry, do I really want to give much publicity to a site that helps you "upconvert" your music to something nonsensical just to audiophool customers, then ... probably someone will uncover the ways to ridicule them. Let's see:
* this one
* Sound Liaison going 768 kHz ... for what?! It was recorded at much less. Oh, it captures the all-important noise of the analog tape they put their digital recording through
* nativedsd putting out high resolution test files that are just zero-padded to a higher bit depth)

Reply #289 – 2023-04-16 16:08:48

Quote from: Porcus on 2023-04-16 15:45:12

It makes me wonder if it is possible to exploit without a brute-force -e.

My guess is the provided -A options are not designed for this kind of abrupt and deep cutoff, but even if more options are provided we still need to try them out manually.

Quote

So whoever did this went to lengths to force libflac 1.3.1 to handle a 32-bit signal (I mean, flac.exe couldn't!) but the 24-bit signal behaves differently ... what for? 32-bits rounding up the 24th to overflow? Anyway, foo_bitcompare reports peak difference of -20.48 dB, which is suspiciously high but not high enough for my ears to bother trying to ABX it out.

It may not be a flac issue; my theory is that the clipping was introduced before encoding. It could be an oversight when doing the DAW export, or a bug in that guy's software converter.

Some other findings I would like others to confirm/test:

"Wait for Spring" 32/384:
Upper: @Wombat 's Clang
Lower: Retune by @ktf
Method: Encode the original file to a new file in PowerShell.

-8 -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(2e-2)"
323473111 bytes, 49 seconds
337529462 bytes, 103 seconds

-8l32 -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
348323578 bytes, 105 seconds
337529462 bytes, 103 seconds

-8e -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
307255778 bytes, 266 seconds
303683449 bytes, 1551 seconds

-8p -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
317620242 bytes, 429 seconds
332639121 bytes, 879 seconds

I don't know why -l32 makes no difference on the retune, perhaps it detects the bloat?

Reply #290 – 2023-04-16 17:25:41

Quote from: bennetng on 2023-04-16 16:08:48

It may not be a flac issue

I didn't intend to suggest it was. Rather, whoever made this converter went to lengths to bend flac-before-32-bit-support to accommodate it, but couldn't be bothered to check for clipping.

Quote from: bennetng on 2023-04-16 16:08:48

I don't know why -l32 makes no difference on the retune, perhaps it detects the bloat?

That is because the retune's "-8" does in fact select "-l 32" whenever the sampling rate is high enough for it to be subset-compliant. "-r 8" as well, since that part of it got a speedup.
But also, the retune did tweak the algorithm to select LPC order, with consequences you can find in that thread. Next test build is likely different.

Reply #291 – 2023-04-16 17:34:30

Quote from: Porcus on 2023-04-16 17:25:41

That is because the retune's "-8" does in fact select "-l 32" whenever the sampling rate is high enough for it to be subset-compliant. "-r 8" as well, since that part of it got a speedup.
But also, the retune did tweak the algorithm to select LPC order, with consequences you can find in that thread. Next test build is likely different.

Ah yes, I've completely forgotten these details.

Reply #292 – 2023-04-17 18:58:09

The retune can be quite efficient with other settings without increasing -l.
"Wait for Spring" 32/384:

-8b8192 -l12 -A "subdivide_tukey(5/1);blackman;gauss(5e-2);gauss(2e-2)"
314252255 bytes, 52 secs

-8b8192 -l12 -A "subdivide_tukey(12/2);blackman;gauss(5e-2);gauss(2e-2)"
304400309 bytes, 203 secs

-8eb8192 -l12 -A "subdivide_tukey(5/1);blackman;gauss(5e-2);gauss(2e-2)"
302974489 bytes, 322 secs

Some characteristics of a file can be obtained in a much cheaper way by scanning the whole file, like EBU loudness stats and this kind of resampling. It should be possible to make a separate app to gather this information and then generate a suggested command-line batch file to feed the encoder.

For example, loudly mastered files could benefit from using fixed -q values:
https://hydrogenaud.io/index.php/topic,123025.msg1018053.html#msg1018053

Reply #293 – 2023-04-29 14:27:10

@Porcus

have you ever experimented with the other apodization functions in flac? apart from the widely used subdivide_tukey, there are a bazillion other functions there and you can actually use multiple of them together

this, together with, -r 8 -p -l 12, seems to give slightly better or equal to -l8 compression, yet noticeably faster

any ideas? oh, I also found that for some tracks, like electronic or digital music, much smaller block sizes like 128 or 256 were actually giving phenomenal compression compared to the default L8 and 4096 size... in some usual tracks, I tested 2048 as block size and it worked better, but then it craps out in another set... I wonder if there's a way to find which block size could be optimal for a particular track/set of tracks without manually brute forcing them...

any ideas? I wonder if that -e extended model search does this, because it is not documented anywhere exactly what it does apart from the "expensive!!!" prompt in FLAC docs

MOD note: This post was merged from another topic.

Reply #294 – 2023-05-01 00:03:34

Brief while out on the road, @darkalex :
Moderator moved this to here. You will find several suggestions in this thread.

Reply #295 – 2023-05-15 21:37:27

Another example showing that -e is more suitable for these kinds of test signals with a lot of unused spectral space.
https://www.soundonsound.com/techniques/sos-audio-test-files-downloads

Reply #296 – 2023-05-16 06:31:23

Got any idea of any "easy" way to improve on the guesstimation?
flac.exe can hardly employ a full signal analysis, but I suppose some blockwise variability metric might be put at work.

Reply #297 – 2023-05-16 07:31:44

I want to ask the same question as well. My comments about -e are based on observation, not because I precisely know how -e works. Just wondering how much time can be saved if a two-pass approach is allowed.

Some audio interfaces (e.g. Merging) have additional filters to reduce ultrasonic noise from ADC modulators when using >= 176.4k recording rates, also one cannot rule out that some hi-res releases have ultrasonic noise attenuated by using a DAW, and filter characteristics would vary depending on the mastering engineers' preferences. The same applies to DSD to PCM transcoding as well, different software may have different filters.

Reply #298 – 2023-05-16 14:17:37

can the dev or someone who knows the source code of FLAC chime in and clarify this for us?

the -e function seems to be the most complex in FLAC and except the ambiguous 1 liner description, we have no idea how it works or what it does

anyone?

Reply #299 – 2023-05-16 15:02:14

It is actually rather simple. Normally, FLAC uses some well-known math: calculating LPC coefficients with the Yule-Walker equations solved with Levinson-Durbin recursion. This goes back many decades, to 1960.

Anyway, this recursion gives us a set of models, one for each order. If you specified a max LPC predictor order of 12 (this is the default for compression level 8) this means 12 models are returned, each with an associated error. This error does not correlate exactly with compression, but it works rather well. The error is slightly weighted to account for the fact that choosing a higher order gives a slight amount of overhead.

If you do not specify -e, FLAC picks the model of which the weighted error is lowest. If you specify -e, FLAC does not use the error and simply tries all generated models one by one, picking the one giving the highest compression. This takes quite a while of course.

If you specify more than one apodization function, FLAC does this procedure (generating models and subsequently trying 1 or all) once for each apodization.
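The procedure described above can be sketched in Python roughly like this. It is a minimal sketch, not libFLAC's code: the recursion is the textbook Levinson-Durbin form on floating-point autocorrelation values, while the error weighting (penalty) and the "actual size" measure (trial_cost, a sum of absolute residuals plus a per-order overhead) are stand-ins assumed for illustration.

```python
def levinson_durbin(r, max_order):
    """Return [(coeffs, error)] for orders 1..max_order from autocorrelation r.

    coeffs[j] is the weight on the sample j+1 steps back.
    """
    models, coeffs, error = [], [], r[0]
    for order in range(1, max_order + 1):
        if error <= 0:  # nothing left to predict; stop early
            break
        acc = r[order] - sum(coeffs[j] * r[order - 1 - j] for j in range(order - 1))
        k = acc / error  # reflection coefficient
        coeffs = [coeffs[j] - k * coeffs[order - 2 - j] for j in range(order - 1)] + [k]
        error *= 1.0 - k * k
        models.append((coeffs, error))
    return models


def pick_by_estimate(models, penalty=0.01):
    """Without -e: pick the order whose weighted error is lowest.

    The weighting error * (1 + penalty * order) is an assumption of this sketch.
    """
    best = min(range(len(models)),
               key=lambda i: models[i][1] * (1.0 + penalty * (i + 1)))
    return best + 1


def trial_cost(signal, coeffs, warmup_cost=2.0):
    """Stand-in for 'encode and measure': sum of |residual| plus per-order overhead."""
    order = len(coeffs)
    total = order * warmup_cost
    for i in range(order, len(signal)):
        prediction = sum(c * signal[i - 1 - j] for j, c in enumerate(coeffs))
        total += abs(signal[i] - prediction)
    return total


def pick_by_trial(models, signal):
    """With -e: try every model and keep the one with the smallest actual cost."""
    best = min(range(len(models)), key=lambda i: trial_cost(signal, models[i][0]))
    return best + 1
```

With several apodization functions, the same two steps would simply run once per window, each window giving its own autocorrelation values r. The trial version calls trial_cost once per model, which is where the extra time of -e goes.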

Notice