FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #350 – 2023-07-28 17:08:51

Quote from: Porcus on 2023-07-28 16:44:05

BTW, should it matter at all what CPU is used for compiling?

(Compatibility issues here are only AVX for one Rarewares build and which ones of yours? Plus AVX512 for your v4?)

The CPU shouldn't play a role for compiling until you tell the compiler to use "native" optimization and it tries to detect the CPU in use.
The last binaries are all AVX2 if not stated otherwise.

Re: FLAC v1.4.x Performance Tests

Reply #351 – 2023-07-29 13:41:58

Some multithreading results posted in that thread: https://hydrogenaud.io/index.php/topic,124437.msg1030783.html#msg1030783

In the last line of the reply, the "three -j1 runs discarded" (from that table) were done with Wombat's "GCC" build, to check whether times were about the same as for the "v3" build posted above. Not much of a difference to write home about. The "v3" showed only small variations from ktf's build, and the GCC build might also have had some inconsistent timing, having been run immediately after a different setting (say its -5r7 ran after another build doing something that was more than twice as intensive).

Anyway, "v3" didn't make for miracles on that computer.

Re: FLAC v1.4.x Performance Tests

Reply #352 – 2023-07-29 14:47:37

Many thanks for more numbers! Although the last v3+v4 were more meant to check if AVX-512 helps.
If this was your i5-7500T it also doesn't like the -falign-functions=32 compiler flag.

Re: FLAC v1.4.x Performance Tests

Reply #353 – 2023-07-30 23:34:26

It was the i5-7500T - which also, I misread it, is 4 cores 4 threads.

But here are some figures from my i5-1135G7. Lower is better (and more negative is better). Only one run each, so take with a grain of salt.
I have put two different things in the table here as well: typically, the percentages are the "overhead penalty" of actual time vs the idealized "j1 time / # of threads". Edit: The "j9" is corrected to be 8 threads, which is what the CPU has. The -j9 setting was to verify that nothing too stupid happens.
But the "time w4 vs w3" line has nothing to do with overhead; it measures how much the time changed moving to your v4 from your v3. You see benefits at -8, but the other way around for -5, -3 and -2er7. At -2r0 things are going so fast anyway that I won't trust the numbers without multiple runs, but at the very least the sign is negative more often than positive.

The two rightmost columns are the same encoding parameter plus the "-M" to check if the overhead is as nasty as previous figures suggested (it is).
I also ran a range of -0r0 -b <something>, and there is no sign that the "v4" is worth it.
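For anyone wanting to redo the arithmetic behind the two kinds of percentages described above, here is a minimal sketch. The function names are made up for illustration, and the sample numbers are hypothetical rather than taken from the table.

```python
def overhead_penalty(actual_time, j1_time, threads):
    """Excess of the measured time over the idealized j1_time / threads."""
    ideal = j1_time / threads
    return actual_time / ideal - 1.0


def time_change(new_time, old_time):
    """Relative change in time going from one build to another (negative = faster)."""
    return new_time / old_time - 1.0


# Hypothetical example: 100 s single-threaded, 60 s with two threads.
print(f"{overhead_penalty(60.0, 100.0, 2):.0%}")  # 20%
# Hypothetical example: a build that takes 90 s where the other took 100 s.
print(f"{time_change(90.0, 100.0):.0%}")          # -10%
```

An overhead of 0% would mean perfect scaling; 100% at -j2 would mean two threads are no faster than one.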

-8:	j1 time/diff	j2 ovrhd/diff	j3 ovrhd/diff	j4 ovrhd/diff	j5 ovrhd/diff	j8 ovrhd/diff	j9 ovrhd/diff	-Mj1 time/diff	j2 ovrhd/diff
v5	121	8%	23%	40%	55%	124%	138%	83	97%
Wombat3	113	16%	29%	43%	70%	138%	158%	81	96%
Wombat4	102	23%	34%	46%	75%	153%	155%	75	111%
time w4 vs w3	−9%	−4%	−5%	−7%	−6%	−4%	−10%	−8%	0%
-5:
v5	49	24%	56%	80%	105%	239%	246%	40	88%
Wombat3	46	29%	56%	77%	123%	248%	243%	39	91%
Wombat4	47	46%	55%	111%	127%	257%	270%	41	84%
time w4 vs w3	1%	14%	0%	20%	3%	3%	9%	5%	1%
-3:
v5	37	26%	68%	100%	148%	312%	307%	38	92%
Wombat3	36	25%	62%	109%	146%	320%	317%	38	83%
Wombat4	36	32%	66%	120%	162%	335%	345%	38	99%
time w4 vs w3	−1%	4%	1%	4%	5%	2%	5%	0%	8%
-2er7:
v5	70	16%	38%	52%	75%	180%	212%	50	102%
Wombat3	65	20%	39%	64%	86%	188%	227%	48	104%
Wombat4	66	17%	37%	64%	94%	203%	229%	48	95%
time w4 vs w3	2%	−1%	1%	1%	6%	7%	2%	1%	−4%
-2r0:
v5	44	24%	57%	85%	153%	377%	320%	38	91%
Wombat3	39	46%	81%	123%	180%	394%	362%	38	101%
Wombat4	38	55%	63%	111%	205%	379%	360%	36	120%
time w4 vs w3	−1%	5%	−12%	−7%	7%	−4%	−2%	−5%	4%

Re: FLAC v1.4.x Performance Tests

Reply #354 – 2023-07-31 00:33:14

I guess the original v5 is without AVX2. So for high compression, at least, the speedup from non-AVX2 to AVX2 to AVX-512 scales well.
For -5 the numbers are a bit surprising.
Again thanks for the numbers!

Re: FLAC v1.4.x Performance Tests

Reply #355 – 2023-08-31 13:41:49

In order not to hijack @ktf 's comparison update thread with too many FLAC specifics:
The fastest flac.exe encoding is now faster than decoding. I replicated this on a small sample of 192/24 files on an i5-7500T, repeated a few times to give you "ballpark" timings. Everything done on an internal SSD to the same SSD.
FLAC files: created using -0, or for the even faster times: -0 --no-md5

16 seconds / 14 seconds: decoding, respectively, -0 files / -0 --no-md5 files
13 seconds / 8 seconds: encoding -0 / -0 --no-md5
10 seconds / 6 seconds: flac -t on -0 / --no-md5

I did much of the same thing on a larger corpus with 96 kHz files too, similar results.

Now if I add -b4096 to everything, numbers improve slightly. Same order as above:
13/11 decoding
11/7 encoding
9/5.5 test

Add a very few tenths of a second to get the difference between -0b4096 and -1b4096, the latter being faster than -0 as well - actually, -2b4096 (optimizing joint stereo / dual mono over -0's dual mono only) was at -0 speed, give or take a few tenths of a second. It looks like -b2048 could be just as fast as -b4096, which would be in line with what I have seen before - but I am not going to pretend to have that kind of accuracy.

Re: FLAC v1.4.x Performance Tests

Reply #356 – 2023-09-22 13:35:04

Didn't read the whole thread, so apologies if this was mentioned...
I'm seeing significantly higher bitrates with FLAC 1.4.2 and 1.4.3 for "simple" signals, like sine waves, compared to FLAC 1.3.4, both at -8 compression.
For example, a 1kHz 48k-16bit sine tone is at 164 kbps with 1.3.4 and at 231 kbps with 1.4.3.

Pink noise, on the other hand, has almost the same bitrate with both versions.

Is this known/expected?

Re: FLAC v1.4.x Performance Tests

Reply #357 – 2023-09-22 15:34:44

Quote from: Brand on 2023-09-22 13:35:04

Is this known/expected?

No, not really. I'm able to reproduce this.

FLAC 1.4.0 had some major changes that benefit most sources, but not all. Apparently the sine wave you mention is one of the cases that do not benefit. However, a 1kHz sine sampled at 44.1kHz or a 1.01kHz sine sampled at 48kHz shows a much smaller loss.

Re: FLAC v1.4.x Performance Tests

Reply #358 – 2023-09-23 11:23:45

Just tried a 10-second full-scale dithered 1kHz sine at 16/48.

1.4.3
253kbps -8
146kbps -8p
114kbps -8e

1.3.4
186kbps -8
157kbps -8e
143kbps -8p

[edit]Attached a 4567Hz sine, 1.4.3's -8e performs much better than 1.3.4.

Re: FLAC v1.4.x Performance Tests

Reply #359 – 2023-09-23 13:03:15

One thing is that "-8" differs between versions, because it is now synonymous with a different set of options, with different apodization functions.
But -5 differs too. bennetng's file recompressed:
322377 bytes with 1.3.4 win64 at -5 (379745 (bigger!) by adding --lax -l32)
381923 bytes with 1.4.2 win64 at -5 (382278 (bigger!) by adding --lax -l32)

Adding -p, something similar happens:
222297 bytes with 1.3.4 win64 at -5p
268901 bytes with 1.4.2 win64 at -5p

-e instead of -p reverses the order, now 1.4 makes smaller:
276571 bytes with 1.3.4 win64 at -5e
269589 bytes with 1.4.2 win64 at -5e

-pe then, 1.3.4 is back winning, but not at -l32:
184466 bytes with 1.3.4 win64 at -5pe (down to 183542 by adding --lax -l32)
188771 bytes with 1.4.2 win64 at -5pe (down to 180144 by adding --lax -l32)

Edit: Could get it down as far as this:
161713 bytes with 1.4.2 win64 at -pe --lax -l32 -A<tonsofthem>. -r4 or -r15 didn't matter, -r3 inflated it one byte
139241 by adding -b 32768 -r15
136487 by adding -b 65535 , confirming that once the predictor is good enough, ... or am I interpreting it wrong?

Since -e makes the difference, is there something about the model guesstimation algorithm?

Re: FLAC v1.4.x Performance Tests

Reply #360 – 2023-09-23 14:43:59

https://hydrogenaud.io/index.php/topic,123025.msg1025264.html#msg1025264
https://hydrogenaud.io/index.php/topic,123025.msg1027285.html#msg1027285
A good thing about flac 1.4 is that the effect of -8e is quite predictable when the input files have a lot of unused spectral spaces, so the first thing to try with simple sine waves is to use -e.

Anyway, the tunings are still based on a very large corpus instead of a very specific set of test samples, like waveforms from the South Pole or 384-channel brainwaves.

Re: FLAC v1.4.x Performance Tests

Reply #361 – 2023-09-23 15:37:42

Yes, I think tuning for specific samples will result in a bad overall tuning.

Re: FLAC v1.4.x Performance Tests

Reply #362 – 2023-09-23 16:05:16

Yeah. The observation that -e improves on certain material does suggest that the model selection algorithm could be improved, but until one can actually capture both these and those signals, it is not a good idea to chase the oddballs.

429428 bytes for the above file with TAK -p4m
420210 bytes for wavpack -hhx6

More sines tested: https://hydrogenaud.io/index.php/topic,122444.0.html

Re: FLAC v1.4.x Performance Tests

Reply #363 – 2023-09-23 22:33:41

A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)

-e -l <N> would never pick order = N. At most order = N-1. Or do I misinterpret the flac -a output? (I read the "order=" part, which does not start at 0. order=7 means coefficients enumerated 0 through 6, and was the highest I got out of -l7.)
Anyway, a bunch of .ana files attached.

Also, with -b48000 - so the blocks "nearly repeat", but not exactly - the predictor coefficients do vary quite a lot between the frames. However as a sine should be perfectly replicable with order = 2 - presuming sufficient precision, which I am too lazy to check out - there would be a whole lot of different predictor vectors that would make for equally good prediction.

Re: FLAC v1.4.x Performance Tests

Reply #364 – 2023-09-24 06:49:01

Quote from: Porcus on 2023-09-23 22:33:41

I read the "order=" part, which does not start at 0. order=7 means coefficients enumerated 0 through 6, and was the highest I got out of -l7.

out of -l8. Argh.

Re: FLAC v1.4.x Performance Tests

Reply #365 – 2023-09-24 11:40:34

Quote from: Porcus on 2023-09-23 22:33:41

A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)

This is only on the sine waves, right?

Quote

Also, with -b48000 - so the blocks "nearly repeat", but not exactly - the predictor coefficients do vary quite a lot between the frames. However as a sine should be perfectly replicable with order = 2 - presuming sufficient precision, which I am too lazy to check out - there would be a whole lot of different predictor vectors that would make for equally good prediction.

Analyzing a sine wave in the LPC stage gets one step near a singularity (or somesuch, I'm not too familiar with the terminology) and round-off errors get a tremendous influence. That is why optimizing for sine waves doesn't work, why the differences between 1.3.4 and 1.4.3 are so large and why I don't think it is a good idea to spend too much time on this.
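One possible way to illustrate the "near a singularity" remark, as a sketch under simplifying assumptions and not a description of what libFLAC actually computes: for an ideal, unwindowed, infinitely long sine, the normalized autocorrelation at lag k is cos(k*q) with q = 2*pi*f/F. The 3x3 Toeplitz matrix built from lags 0 to 2 then has a determinant of zero in exact arithmetic, so solving for more than two predictor coefficients is ill-posed and round-off decides the outcome. The helper names below are hypothetical.

```python
import math


def det3_toeplitz(r0, r1, r2):
    """Determinant of [[r0, r1, r2], [r1, r0, r1], [r2, r1, r0]]."""
    return (r0 * (r0 * r0 - r1 * r1)
            - r1 * (r1 * r0 - r1 * r2)
            + r2 * (r1 * r1 - r0 * r2))


def ideal_sine_autocorr(freq, sample_rate, lag):
    """Normalized autocorrelation of an ideal sine (assumption: no window, no noise)."""
    return math.cos(2.0 * math.pi * freq / sample_rate * lag)


# 1 kHz sine at 48 kHz: the determinant is zero up to round-off.
r = [ideal_sine_autocorr(1000.0, 48000.0, k) for k in range(3)]
print(det3_toeplitz(*r))

# For comparison, a made-up decaying autocorrelation is comfortably non-singular.
print(det3_toeplitz(1.0, 0.5, 0.25))
```

Windowing, dither and finite block length all move the real matrix away from exactly singular, but only slightly, which would fit with small numerical differences between versions having a large effect.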

Re: FLAC v1.4.x Performance Tests

Reply #366 – 2023-09-24 17:41:33

Wouldn't it be a good idea to have it optimized for sines of standard scales and round ones? 440Hz, 1000Hz etc.

Re: FLAC v1.4.x Performance Tests

Reply #367 – 2023-09-24 20:18:30

Quote from: ktf on 2023-09-24 11:40:34

Quote from: Porcus on 2023-09-23 22:33:41
A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)
This is only on the sine waves, right?

Yes. And furthermore, I forgot that you can have non-fixed frames of order 4 or less. Trying those ... -l3, it alternates between order 2 and 3, and -l2 uses order 2.
Anyway, since there are several equally good predictor vectors on a sine, I couldn't even call it out as something you can improve upon, but it looked strange: if -el 6 finds the best predictor to be of order 5, you would expect the very same algorithm to select the very same predictor at -el 5 - unless the algorithm is written so that it starts out one step below and then goes down one and up one, or something like that.

Quote from: ktf on 2023-09-24 11:40:34

Analyzing a sine wave in the LPC stage gets one step near a singularity (or somesuch, I'm not too familiar with the terminology) and round-off errors get a tremendous influence. That is why optimizing for sine waves doesn't work, why the differences between 1.3.4 and 1.4.3 are so large and why I don't think it is a good idea to spend too much time on this.

Yeah, well, sines are special in this sense. Although they are quite easy to model when you know it is a sine, they are hardly the Billboard chartbusters - and those who know they are about to compress sines, can invoke -e ...

But as part of a bigger picture, there might be some information value from the test even if you are not going to chase sines per se. -e still makes a difference with high resolution signals, and some synth too - and if I understand @bennetng right, it appears to be down to signals that have very little content at the top.
So if that is the hypothesis: "The model selection algorithm doesn't work that well for signals with very little content in the top octave-or-so" (or was "octave-or-so" too far off?)
- then this experiment gives support to it.

Re: FLAC v1.4.x Performance Tests

Reply #368 – 2023-09-24 20:46:52

Quote from: rutra80 on 2023-09-24 17:41:33

Wouldn't it be a good idea to have it optimized for sines of standard scales and round ones? 440Hz, 1000Hz etc.

The thing about a signal that is a single sine is that you can predict it with two coefficients, one of which is -1.
You have a predictor x(N+1) = x(N) * 2 cos q - x(N-1), and with q set to 2*pi*f/F, where f is the sine's frequency and F (>2f) is the sampling frequency, you are all good - provided that the format can offer high enough precision for that 2 cos q.

So if you want a flac encoder that beats the official reference on sines and does nothing else worse, then - with reservation for that precision - you can do that by including a special algorithm for the order two predictor, and spending extra effort on every signal doing that number-crunching only to ditch it because the signal isn't close to a sine.
I have absolutely no idea how much it would slow down the encoder.
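The order-two recurrence above can be checked numerically. This is a floating-point sketch only: it ignores the rounding of samples to integers, and the 12-bit coefficient quantization is an arbitrary choice for illustration, not the precision FLAC actually uses.

```python
import math


def sine(freq, sample_rate, n, amplitude=1.0, phase=0.3):
    """n samples of a sine; amplitude and phase are arbitrary illustration values."""
    q = 2.0 * math.pi * freq / sample_rate
    return [amplitude * math.sin(q * k + phase) for k in range(n)]


def order2_residual(x, coeff):
    """Residual of the predictor x[k+1] = coeff * x[k] - x[k-1]."""
    return [x[k + 1] - (coeff * x[k] - x[k - 1]) for k in range(1, len(x) - 1)]


freq, rate = 1000.0, 48000.0
x = sine(freq, rate, 1000)

exact = 2.0 * math.cos(2.0 * math.pi * freq / rate)
worst_exact = max(abs(e) for e in order2_residual(x, exact))

# Hypothetical quantization of the coefficient to 12 fractional bits.
quantized = round(exact * 2 ** 12) / 2 ** 12
worst_quantized = max(abs(e) for e in order2_residual(x, quantized))

print(worst_exact)      # round-off level only
print(worst_quantized)  # noticeably larger: the precision caveat in action
```

The second number shows why the precision of 2 cos q matters: with a coarsely quantized coefficient the residual no longer vanishes, even for a perfect sine.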

(Actually when I first started playing around with codec performance, I was indeed surprised at how badly codecs compress sines - I'd have this hunch that if you started to fiddle around with lossless compression, one would try to get those signals going first. But there are reasons why the engineering approaches have rather targeted real-world applications where data perfectly suiting the models are just not going to show up.)

Re: FLAC v1.4.x Performance Tests

Reply #369 – 2023-09-25 08:00:27

Instead of changing the source code someone can simply write a "tips and tricks" guide or something similar. For example, video codecs also have guides for optimizing some specific contents (e.g. anime).

I think one can see that tones can be generated at arbitrary frequencies, can be multi-tone, can be sweeps etc., and this will add a lot of complications. Also, if one has to use --lax to get close to subset -e performance, that is not very practical either.

For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.

So, something like -8b8192 -A "subdivide_tukey(3);blackman" may often help CDDA > 24/192 clean upsampling without a lot of trial and error or subdividing too many tukeys or doing some difficult gausswork, or increasing other symmetric parameters which may further slow down decoding.

Re: FLAC v1.4.x Performance Tests

Reply #370 – 2023-09-25 09:11:35

Quote from: bennetng on 2023-09-25 08:00:27

For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.

Yeah, "often" - I didn't find it too predictable though ...

If one wants to enhance the encoder, then one could make it accept a -b"2048,4096,8192,16384" to do four encodes and pick the smallest file. (That's constant block size - variable is a way to go to implement, and besides you might for compatibility want to stick to constant.) I have no idea if such a construct would be effectively multithread-able.
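Until the encoder does this itself, the "several encodes, keep the smallest" idea can be approximated from outside with a wrapper script. This is a rough sketch assuming a flac binary on the PATH; the output naming scheme and the function names are hypothetical, and it makes one full encode per block size, with no multithreading and no error recovery.

```python
import os
import subprocess


def flac_command(src, dst, block_size, preset="-8"):
    """Build a flac command line using -b (block size), -f (overwrite), -o (output)."""
    return ["flac", preset, "-b", str(block_size), "-f", "-o", dst, src]


def smallest(sizes):
    """Return the (block_size, size_in_bytes) pair with the smallest size."""
    return min(sizes.items(), key=lambda item: item[1])


def encode_best_block_size(src, block_sizes=(2048, 4096, 8192, 16384)):
    """Encode src once per block size, keep the smallest file, delete the rest."""
    sizes = {}
    for b in block_sizes:
        dst = f"{src}.b{b}.flac"  # hypothetical naming scheme
        subprocess.run(flac_command(src, dst, b), check=True)
        sizes[b] = os.path.getsize(dst)
    best, best_size = smallest(sizes)
    for b in block_sizes:
        if b != best:
            os.remove(f"{src}.b{b}.flac")
    return best, best_size
```

Usage would be something like encode_best_block_size("track.wav"). As far as I know, block sizes above 4608 fall outside the subset at sample rates of 48 kHz and below, so for such material the list would have to be shortened or --lax added.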

What I wonder though: say you got a CDDA signal that is fairly well compressed using an 8-order predictor. Now upsample it to 88.2. Wouldn't that suggest that you could try to "compress as undersampled" and optimize coefficients on N-2, N-4, ..., N-14, and pin N-1, N-3, N-<odd> to zero?

Also, one could think up some smarter "-M-alike" thing to get out the larger parts of the "-pemr8 -A <tonsofthem>" gain for cheap. Say, brute-force some frames at "smart intervals" based on the performance of the previous brute-force'd choice and the encoder's guesstimate.

... but, someone's got to do it - and neither CPU nor storage is as expensive as when the codecs were new.
To all you computer science professors reading this - wouldn't there be some fun student projects from all this?

(Hm, "all" = "none" I guess.)

Re: FLAC v1.4.x Performance Tests

Reply #371 – 2023-09-25 11:17:07

Quote from: Porcus on 2023-09-25 09:11:35

Quote from: bennetng on 2023-09-25 08:00:27
For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.
Yeah, "often" - I didn't find it too predictable though ...

Let's compare it to "-8" alone, and convert your 38 CDs to 24/192 using SoX best quality with -2dB gain reduction to reduce the chance of clipping. Use -8b8192 -A "subdivide_tukey(3);blackman"

Re: FLAC v1.4.x Performance Tests

Reply #372 – 2023-09-25 13:10:56

Quote from: bennetng on 2023-09-25 11:17:07

Quote from: Porcus on 2023-09-25 09:11:35
Quote from: bennetng on 2023-09-25 08:00:27
For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.
Yeah, "often" - I didn't find it too predictable though ...
Let's compare it to "-8" alone, and convert your 38 CDs to 24/192 using SoX best quality with -2dB gain reduction to reduce the chance of clipping. Use -8b8192 -A "subdivide_tukey(3);blackman"

SoX best quality ... It's going to take the day to work itself through the gigabytes, so I hope this, which I copied from someone out on the 'net, is good enough?
rate -v -b 95.4 -b 45 -a <samplerate>
and with -b 24 and -v 0.666 (two dB was not enough, and -v 0.7 wasn't either)
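To relate the volume factors to the -2dB suggestion, here is a small conversion sketch. It assumes the -v 0.666 and -v 0.7 above are SoX's linear amplitude factor (not the -v quality switch of the rate effect), and uses the usual 20*log10 amplitude convention.

```python
import math


def db_to_linear(db):
    """Amplitude factor corresponding to a gain in dB."""
    return 10.0 ** (db / 20.0)


def linear_to_db(gain):
    """Gain in dB corresponding to a linear amplitude factor."""
    return 20.0 * math.log10(gain)


print(round(db_to_linear(-2.0), 3))   # 0.794
print(round(linear_to_db(0.7), 2))    # -3.1
print(round(linear_to_db(0.666), 2))  # -3.53
```

So under that assumption, 0.666 is roughly 3.5 dB of attenuation, noticeably more than the 2 dB originally suggested.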

Edit:
Anyway, this is a "controlled" experiment, unlike obtaining high resolution files from a label/artist made with who knows what software and source file.

Re: FLAC v1.4.x Performance Tests

Reply #373 – 2023-09-25 13:37:45

So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.

-8
38.1 GB (41002782313 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
37.7 GB (40507157807 bytes)
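For reference, the relative saving between the two totals above works out as follows. A minimal sketch; the function name is just for illustration.

```python
def relative_saving(baseline_bytes, candidate_bytes):
    """Fraction of the baseline size saved by the candidate setting."""
    return 1.0 - candidate_bytes / baseline_bytes


plain_8 = 41002782313   # -8
tuned = 40507157807     # -8b8192 -A "subdivide_tukey(3);blackman"

print(f"{relative_saving(plain_8, tuned):.2%}")  # 1.21%
```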

5 files out of 306 still clipped with -2dB gain. Yes, if one keeps throwing Merzbow at the chain, which requires something like -10dB, it would harm the stats, so don't do that. Clipping should be kept as low as possible, or avoided entirely (e.g. by using RG max non-clip gain).

For simplicity and speed, I did all the conversions within foobar and foo_dsp_resampler.

Re: FLAC v1.4.x Performance Tests

Reply #374 – 2023-09-25 18:56:26

Same as above but using RetroArch "Higher" quality, 16/192. Same -2dB gain, without dither. 3 out of 306 files clipped.

-8
20.2 GB (21735937324 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
20.0 GB (21563020744 bytes)

-8e
19.8 GB (21326877503 bytes)

Notice