Topic: FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #275
Tested:
30-ish GB of decoded HDCD rips. Yeah, I know I shouldn't have made that irreversible mistake fifteen years ago, but here we are.
So these are effectively 17-ish bits in a 24-bit container - 95 percent of the tracks have a peak below 0.8 when scanned with oversampling.

* Why test these? Just to see whether there are any surprises with slightly unusual signals.
* Were there any? Not really. -4 isn't particularly good; I have earlier on questioned whether -5 is really much of an improvement over -4, but here it is. Anyway, that question has probably not made any great impact, I mean who uses -4?

What I did was to re-encode FLAC files (with overwrite) on an SSD. That is why I quote the times per encoded gigabyte. "142" means reference FLAC 1.4.2 Win64, 134 means 1.3.4 Win64, both Xiph builds.
Numbers then. Size relative to 1.4.2 -5, then setting, then comment with time taken. Sizes are file sizes, with tags and default padding.

+1.574%     142-4     ~25 sec per encoded GB
+0.119%     134-5
ref. point  142-5     ~30 sec/GB; 31 808 619 711 bytes
-0.255%     134-7
-0.306%     134-8
-0.347%     142-7     ~37 sec/GB
-0.397%     142-8     ~1 minute/GB
-0.399%     134-8p
-0.412%     142-8e    ~3 minutes/GB
-0.428%     142-"all the sevens but no p" (see below), also ~3 minutes/GB
-0.461%     134-8pe   ~20 min/GB
-0.480%     142-8p    2min40s/GB
-0.507%     142-8pe   also in the ~20 min/GB ballpark
-0.514%     142-"-p all the sevens", about the same time as -8pe
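For anyone who wants to redo the arithmetic on their own corpus, here is a minimal sketch of how the percentages relate to byte counts. The only figure taken from the post is the 1.4.2 -5 reference size; the function names are made up for illustration, and the byte differences it prints are back-calculated from the rounded percentages, so they are approximate.

```python
# Reference size: total bytes of the corpus encoded with 1.4.2 -5 (quoted in the post).
REF_BYTES = 31_808_619_711


def relative_size_percent(size_bytes, ref_bytes=REF_BYTES):
    """Signed size difference against the reference, in percent."""
    return (size_bytes - ref_bytes) / ref_bytes * 100.0


def implied_bytes(percent, ref_bytes=REF_BYTES):
    """Approximate byte count implied by a listed percentage."""
    return round(ref_bytes * (1.0 + percent / 100.0))


if __name__ == "__main__":
    # Percentages copied from the list above; the byte differences are derived.
    for label, percent in [("142-4", 1.574), ("142-8", -0.397), ("142-8p", -0.480)]:
        delta = implied_bytes(percent) - REF_BYTES
        print(f"{label}: {delta / 1e6:+.0f} MB (decimal) against 142-5")
```

Feeding real file sizes into `relative_size_percent` gives figures directly comparable with the list.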

That "all the sevens" - and why not "8"? It is not because it is good! It is because I wanted to come up with a command that is easy to remember, takes about as much time as "-e" and outcompresses -e. That supports the claim that "-e" should not be used on typical music at normal resolutions: if you are even willing to wait for -e, then there are better things around. (It is known that -e may still have something going for it at higher sampling rates.)
The actual option line is -7r7 -A "flattop;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" with an additional "-p" for the last line. And a -f to overwrite, but that goes for everything.
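Since the -A argument contains semicolons and parentheses, quoting it is the fiddly part. A small sketch of assembling that option line as an argument list, so no shell quoting is needed when it is handed to `subprocess.run`. The option strings are copied from the line above; the function and parameter names are just for illustration.

```python
import shlex

# Apodization list exactly as given in the post; it must stay one single argument.
APODIZATION = "flattop;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)"


def all_the_sevens(files, precision_search=False):
    """Build the argv for the "all the sevens" setting; -f overwrites in place."""
    argv = ["flac", "-f", "-7r7", "-A", APODIZATION]
    if precision_search:
        argv.append("-p")  # the extra "-p" used for the last line of the list
    return argv + list(files)


if __name__ == "__main__":
    # Prints a POSIX-shell-quoted version of the command, for inspection only.
    print(shlex.join(all_the_sevens(["example.flac"], precision_search=True)))
```

Passing the list itself to `subprocess.run` sidesteps the quoting differences between cmd, PowerShell and *n*x shells.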
For those who ask "why -7 and not -8?": it wouldn't make any difference. -8 is -7 with a different (and heavier) "-A", and the moment I write "-A" here I override that by specifying yet another (and heavier!!) -A.

Did you really type "flatopp"?
Damn, only one way to find out, and that is not done in two minutes ...
Here I made sure to get it right!  O:)

But anyway, the bottom line is what we knew: 1.4.x improves, and -e does not deliver at these resolutions. -p is nearly as expensive, but much better.

Re: FLAC v1.4.x Performance Tests

Reply #276
Can anyone try to replicate the following observation, using their fave build and CPU?

Prediction order (the "-l" switch) makes for big time penalty somewhere above -l12.

Being --lax settings, they may have gone under everyone's radar for good reason. But the impact is unexpectedly big here, see plot below.

Here is what I did, using the "timer64" tool - but PowerShell wizards can probably come up with something built-in (and *n*x users, you likely know what to do):
for /l %l IN (6,1,32) DO timer64 flac --lax -fr0 -ss -l %l filename*.flac >> logfile.txt
for /l %l IN (6,1,32) DO timer64 flac --lax -fpr0 -ss -l %l filename*.flac >> logfile.txt
... re-encoding, yes (that's the -f), so in principle every successive encode has to read a more complicated FLAC file, but (1) FLAC decodes so quickly that it shouldn't matter, and (2) a jump would be a surprise anyway. The "-r0" is there to ensure that the partitioning is done the same way for every run.
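For those without timer64, a rough cross-platform sketch of the same sweep. It assumes `flac` is on the PATH and measures wall-clock time only, which is less detailed than what timer64 reports. The flags are the ones from the two loops above; the function names are hypothetical.

```python
import glob
import subprocess
import time


def build_command(order, files, precision_search=False):
    """argv for one re-encode at prediction order `order` (the -l switch)."""
    flags = "-fpr0" if precision_search else "-fr0"
    return ["flac", "--lax", flags, "-ss", "-l", str(order), *files]


def time_command(argv):
    """Run one command and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    files = sorted(glob.glob("filename*.flac"))  # same pattern as in the loops above
    if not files:
        raise SystemExit("no input files matched")
    for precision_search in (False, True):
        for order in range(6, 33):  # -l 6 through -l 32, like "for /l %l IN (6,1,32)"
            seconds = time_command(build_command(order, files, precision_search))
            print(f"-p={precision_search} -l {order}: {seconds:.2f} s")
```

Like the original loops, this overwrites the input files on every run, so point it at copies.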

Timings on a quick run on one album (Swordfishtrombones) - this fanless computer is cooling-constrained and its timings have proven quite unreliable, but I ran -l15 and -l16 (indicated in the oval) several times on several files and that particular jump is quite consistent. For -p, the impact is more dramatic already at -l 13.
[Plot: encoding time against -l setting, with and without -p]

Re: FLAC v1.4.x Performance Tests

Reply #277
Prediction order (the "-l" switch) makes for big time penalty somewhere above -l12.
Makes perfect sense. Loops are only unrolled up to order 12, not for orders above that.
Music: sounds arranged such that they construct feelings.


Re: FLAC v1.4.x Performance Tests

Reply #279
Sounds like a code optimization (-funroll-loops) that's only beneficial up to a max LPC order of 12.


Re: FLAC v1.4.x Performance Tests

Reply #281
I don't know how to explain this in simple terms, but let's say that for each order up to and including 12, there is code optimized for that specific order. For orders above 12, there is generic code.

A compiler can optimize loops in code much better if it knows in advance how often that loop will be traversed. It can 'unroll' a loop. In the generic code, the CPU will have to check after each addition and/or multiplication whether it needs to do another one for this sample, or whether it can move on to the next sample. When a loop is unrolled, there are simply a number of additions and multiplications after one another before encountering a check.

So, generic code looks like this:
Code:
repeat the following code for each sample {
     repeat the following code for each order {
          do multiplication
          do addition
     }
}

In FLAC, this is unrolled for orders up to and including 12, to the following.
Code:
[...]
Use this code for order 2:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 3:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 4:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

This is pretty much what happens for residual calculation, strictly up to order 12. This is the change you're seeing for the red line, because when using -p the residual calculation code dominates the execution time. Just look at the code here: https://github.com/xiph/flac/blob/master/src/libFLAC/lpc.c#L1101
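To make the generic-versus-specialized distinction concrete, here is a toy sketch of a residual calculation in both shapes. It is not the libFLAC code: the names are invented, only order 2 gets a specialized routine, and Python will not show the speed difference that a compiled, unrolled C loop does. It only illustrates that both forms compute the same thing while the specialized one has no inner loop to check.

```python
def residual_generic(data, coeffs, shift):
    """Generic form: the inner loop runs once per coefficient, for any order."""
    order = len(coeffs)
    out = []
    for i in range(order, len(data)):
        prediction = 0
        for j in range(order):  # loop bound only known at run time
            prediction += coeffs[j] * data[i - j - 1]
        out.append(data[i] - (prediction >> shift))
    return out


def residual_order2(data, coeffs, shift):
    """Specialized form for order 2: multiply-adds written out, no inner loop."""
    c0, c1 = coeffs
    out = []
    for i in range(2, len(data)):
        prediction = c0 * data[i - 1] + c1 * data[i - 2]
        out.append(data[i] - (prediction >> shift))
    return out


def residual(data, coeffs, shift):
    """Dispatch: use the specialized routine when one exists, else the generic one."""
    if len(coeffs) == 2:
        return residual_order2(data, coeffs, shift)
    return residual_generic(data, coeffs, shift)


if __name__ == "__main__":
    samples = [3, -1, 4, 1, -5, 9, 2, -6]  # arbitrary illustrative samples
    print(residual(samples, [5, -3], 2) == residual_generic(samples, [5, -3], 2))
```

In the real encoder the dispatch covers every order up to 12, which is why the cost jumps once -l goes past that.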

For the blue line, the change between 15 and 16 is a little bit more complicated. This has to do with the autocorrelation calculation, which can be optimized in groups of 4, more or less. So, there is code for orders below 8, below 12 and below 16. You see this with the blue line, because when not using -p (or -e) the autocorrelation calculation dominates the execution time. Look at the code here: https://github.com/xiph/flac/blob/68f605bd281a37890ed696555a52c6180457164f/src/libFLAC/lpc.c#L158
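The grouping can be sketched the same way. This is a toy model assuming the thresholds described above (orders below 8, below 12, below 16, generic beyond that) and assuming each fixed-width routine simply computes a full group of lags and discards the surplus. The names are hypothetical and the linked lpc.c remains the authority on what libFLAC actually does.

```python
def autocorrelation_generic(data, lag):
    """autoc[k] = sum of data[i] * data[i - k]; inner bound depends on lag and i."""
    autoc = [0] * lag
    for i in range(len(data)):
        for k in range(min(lag, i + 1)):
            autoc[k] += data[i] * data[i - k]
    return autoc


def autocorrelation_fixed(data, lag, width):
    """Always compute `width` lags (fixed inner loop length), then truncate."""
    padded = [0] * width + list(data)  # zero padding removes the bounds check
    autoc = [0] * width
    for i in range(width, len(padded)):
        for k in range(width):  # constant trip count, so it can be unrolled
            autoc[k] += padded[i] * padded[i - k]
    return autoc[:lag]


def kernel_width(order):
    """Smallest fixed width covering order + 1 lags, or None for the generic path."""
    for width in (8, 12, 16):
        if order + 1 <= width:
            return width
    return None


def autocorrelation(data, order):
    """Dispatch to a fixed-width routine when possible, else to the generic one."""
    width = kernel_width(order)
    if width is None:
        return autocorrelation_generic(data, order + 1)
    return autocorrelation_fixed(data, order + 1, width)


if __name__ == "__main__":
    for order in (7, 11, 15, 16):
        print(order, kernel_width(order))
```

With these thresholds, order 15 is the last one served by a fixed-width routine and order 16 is the first to fall through to the generic path, which lines up with the jump between -l15 and -l16.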

 

Re: FLAC v1.4.x Performance Tests

Reply #282
Ah, OK. So it could be in the code and it could be done at compile time - meaning that some builds might potentially behave differently? Or maybe not. Anyway, problem "solved" ...

... except for those who might think that hey, if they are willing to consider a different lossless codec than FLAC, then non-subset FLAC is at least as compatible maybe?  O:)