Topic: FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #25
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)

Then read the entire post:

I did that (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

So if you care more about speed than compression, just lower the preset and you will get both better and faster compression.

And if you only care about speed and energy usage while the size doesn't matter at all, don't compress at all - just use WAV. Can't beat that.
FLAC seems pretty energy efficient and quite fast to decode compared to something like Monkey's Audio which just eats energy and is sloooooooooow.

Encoding is another story, but if your stuff is already encoded, you could just leave it be. Even on a UPS it has no major effect on battery life - and that's while running an Icecast server; yes, I had a power outage while using Icecast and running a bunch of other things as well as a FLAC encode.

Re: FLAC v1.4.x Performance Tests

Reply #26
Nothing in this thread tells us about energy efficiency. Something slower may use fewer joules to do its job than something that finishes faster.

Re: FLAC v1.4.x Performance Tests

Reply #27
For a given executable, with a more CPU-intensive setting I would not be surprised if it translates quite well. (As encoding goes - decoding is a different matter, and for FLAC a quite irrelevant one.)
But if you have a more modern CPU, a TDP of 15 watts is quite common, which amounts to some 2.52 kWh if run for a whole week.
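(That figure is just the back-of-the-envelope arithmetic 15 W × 24 h × 7 days = 2520 Wh ≈ 2.5 kWh, assuming the CPU sits at its full TDP around the clock.)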

Of course, cost per GB is only relevant when your drive is getting full. Meaning, as Chibisteven points out: leave existing encodes as they are.

Re: FLAC v1.4.x Performance Tests

Reply #28
I wanted to test the effect on lossyWAV encodes. So I lazily picked thirteen songs from various artists in my collection (honestly, there really isn't a whole lot of variety here) and tested it as follows:

  • The parameters for non-lossy encodes for both 1.3.4 and 1.4.1 were -8p.
  • The parameters for lossyWAV were -q S.
  • The parameters for lossy encodes for both 1.3.4 and 1.4.1 were -8p -b 512 --keep-foreign-metadata (a sketch of the corresponding command lines follows below).
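For reference, a rough sketch of what those settings correspond to on the command line - file names here are placeholders, and lossyWAV's *.lossy.wav output naming is my assumption:

Code: [Select]
rem lossless encode (done with both flac 1.3.4 and 1.4.1)
flac -8p track.wav -o track.flac
rem lossyWAV pre-processing at -q S, then FLAC on 512-sample blocks,
rem keeping the lossyWAV chunk via --keep-foreign-metadata
lossyWAV track.wav -q S
flac -8p -b 512 --keep-foreign-metadata track.lossy.wav -o track.lossy.flac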

Here are the results:

Format        Size (bytes)   Percentage
wav           534185270      100%
1.3.4         371429142      69.532%
1.4.1         370814508      69.417%
lossy-1.3.4   154850676      28.988%
lossy-1.4.1   154596798      28.941%

More or less what I expected. Someone with a better / more varied testing playlist should probably redo the test, though.

Re: FLAC v1.4.x Performance Tests

Reply #29
Why -6 does so badly - and why keep it as it is:
(i.e. which part of the settings is responsible)

Compared to -5, -6 goes up from -r5 to -r6 and tries another set of windowing functions.
But prediction order stays at 8.
Going up to -7 increases the prediction order to 12. It spends only a tiny bit more time, and compresses quite a lot better. See the charts (made with a build nearly the same as 1.4) at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 , 3rd diagram.

Question: why is the difference to -5 small and the difference to -7 large?
* It is not the -r5 to -r6. In the 38 CD corpus in my signature, -5 -r6 improved by 0.0044 percent.
* It is the prediction order. Trying -5 -l10 and -5 -l12 ditches the two other measures (-r and windowing function) and increases the prediction order. The former nearly catches -6 in size, the latter overtakes it. Both encode significantly faster on both Intel and Ryzen.
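A minimal sketch of that comparison, for anyone who wants to repeat it on their own files (run in a folder of .wav files; -f overwrites the .flac output from the previous run, and du is the same GNU tool used in the scripts later in this thread):

Code: [Select]
rem -6 keeps the prediction order at 8; -5 -l10 / -5 -l12 raise only the order
flac -6 -f *.wav
du *.flac
flac -5 -l 10 -f *.wav
du *.flac
flac -5 -l 12 -f *.wav
du *.flac
flac -7 -f *.wav
du *.flac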

Why keep -6 then, when you can compress better at shorter time?
Decoding CPU footprint.
FLAC already decodes faster than pretty much anything, so is that necessary? ... well, FLAC has some "special olympics" settings: -0 and -3 are dual mono, -0 to -2 use the fixed predictors. Starting at -4 you have the 8th-order predictor and stereo decorrelation (adaptive for -4, brute-forced at higher presets). Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0, not at -1): -7 and -8 take - percentage-wise - more CPU to decode, and that is because of the -l12. Even for a "fastest there is", there is a reason to have a couple of "fastest among the fastest".

So there you go: a special-purpose setting for those who want something that, decoding-wise, "cannot be distinguished from the default" (not completely true - there is a difference, the -r ...).
A beefed-up default is just that ... nothing wrong with having one for those so inclined. Keep it. The rest of us ... just don't use it.

Re: FLAC v1.4.x Performance Tests

Reply #30
Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0, not at -1): -7 and -8 take - percentage-wise - more CPU to decode, and that is because of the -l12. Even for a "fastest there is", there is a reason to have a couple of "fastest among the fastest".
Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 will be the slowest. I zoomed in on the PDF plot a lot, and the triangle mark for -3 still appears to sit at 0% decoding CPU time. Anyway, here are my results.

Test settings:
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-3
Code: [Select]
Stats by codec:
  FLAC: 1506.230x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.008
  Speed (x realtime): 1506.230

-8
Code: [Select]
Stats by codec:
  FLAC: 1501.024x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.060
  Speed (x realtime): 1501.024

.wav as a reference
Code: [Select]
Stats by codec:
  PCM: 59556.250x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.000
  Decoding time: 0:00.379
  Speed (x realtime): 59556.250

The benchmark is very sensitive; for example, if I use a RAM drive and set "Buffer entire file into memory" to "no":
Code: [Select]
Stats by codec:
  PCM: 31524.633x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:00.717
  Speed (x realtime): 31524.633

Re: FLAC v1.4.x Performance Tests

Reply #31
Decoding speed varies from album to album, but the relative differences among 0-8 are similar. Pay attention to -3. The files are all encoded with flac 1.4.1.
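(For anyone repeating this: a hedged sketch, not from the original post, of how the per-preset files can be prepared for the decode benchmark - --output-prefix is a standard flac option, and the folder names are arbitrary:)

Code: [Select]
rem inside a .bat file the loop variable needs doubled percent signs (%%p);
rem on the command line it would be a single %p
for %%p in (0 1 2 3 4 5 6 7 8) do (
  md preset%%p 2>nul
  flac -%%p -f --output-prefix=preset%%p\ *.wav
)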

MA Recordings - Rediscovered Memories
https://www.discogs.com/release/7300636-Various-Rediscovered-Memories
0 (x realtime): 2251.000 min, 2271.072 max, 2261.496 average
1 (x realtime): 2190.539 min, 2195.430 max, 2193.093 average
2 (x realtime): 2180.179 min, 2197.860 max, 2187.784 average
3 (x realtime): 1606.081 min, 1608.317 max, 1606.900 average
4 (x realtime): 1625.529 min, 1627.217 max, 1626.215 average
5 (x realtime): 1621.599 min, 1625.611 max, 1623.098 average
6 (x realtime): 1587.640 min, 1590.859 max, 1589.449 average
7 (x realtime): 1536.414 min, 1539.106 max, 1538.185 average
8 (x realtime): 1534.826 min, 1536.911 max, 1535.957 average

FINAL FANTASY XIII Original Soundtrack, Disc 4
https://vgmdb.net/album/15980
0 (x realtime): 2144.344 min, 2153.433 max, 2149.787 average
1 (x realtime): 2043.644 min, 2054.518 max, 2048.156 average
2 (x realtime): 2024.009 min, 2040.479 max, 2033.885 average
3 (x realtime): 1506.466 min, 1509.097 max, 1508.142 average
4 (x realtime): 1556.885 min, 1559.473 max, 1558.123 average
5 (x realtime): 1564.512 min, 1565.964 max, 1565.423 average
6 (x realtime): 1558.401 min, 1560.904 max, 1559.655 average
7 (x realtime): 1499.939 min, 1500.600 max, 1500.309 average
8 (x realtime): 1499.042 min, 1502.150 max, 1500.238 average

黃耀明 - 信望愛
https://music.apple.com/hk/album/%E4%BF%A1%E6%9C%9B%E6%84%9B/1356525393
0 (x realtime): 2145.599 min, 2162.814 max, 2156.138 average
1 (x realtime): 2045.117 min, 2054.068 max, 2048.449 average
2 (x realtime): 2036.280 min, 2054.765 max, 2046.030 average
3 (x realtime): 1475.368 min, 1477.189 max, 1476.618 average
4 (x realtime): 1528.601 min, 1532.385 max, 1529.803 average
5 (x realtime): 1544.374 min, 1546.838 max, 1545.812 average
6 (x realtime): 1527.229 min, 1529.417 max, 1528.687 average
7 (x realtime): 1422.072 min, 1423.042 max, 1422.549 average
8 (x realtime): 1423.491 min, 1424.172 max, 1423.799 average

Full reports attached.

Re: FLAC v1.4.x Performance Tests

Reply #32
@ktf: When you wrote that -0 to -2 got faster, does that apply to decoding as well? If the improved algorithm is "reversible into an improved decompression algorithm", then that would explain it. (Edit: oh, I could have set up a RAM disk myself ... maybe)

Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 will be slowest. I zoomed the pdf plot a lot and see that the triangle mark in -3 is still sitting at 0% decoding CPU time.
It looks like the left and right borders by construction go through the fastest and slowest point, but even then: the scale is logarithmic, so there is no 0. The next (or "previous") step left of 0.5% would be 0.4%.
-3 appears to be sitting at "point fortysomething".

 

Re: FLAC v1.4.x Performance Tests

Reply #33
@ktf: When you wrote that -0 to -2 got faster, does that apply to decoding as well?
Nope. The two things are fundamentally different problems, improvements to one are rarely applicable to the other.

My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.

edit: I've always been baffled by -3 being decoded faster than -2. The decoding results here make much more sense. In the past, @Porcus found that blocksize has a profound influence on decoding speed. That might be related to training of the branch predictor. It might be that newer CPUs have branch prediction with a larger capacity, in which case this training only has to happen once per file instead of once per block. In that case, the influence of block size would be much smaller.
Music: sounds arranged such that they construct feelings.

Re: FLAC v1.4.x Performance Tests

Reply #34
My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.
Just tested on a super old and low-end Lumia 520 with LineageOS and the foobar2000 APK. I concatenated the first track of each of the three albums mentioned in Reply #31 into a single 11m53s file and encoded it at every preset from 0 to 8. I tried to use the first two tracks from each album, but foobar crashed during the test - probably not enough RAM (only 512 MB).

foobar's console shows a floppy-disk icon that is supposed to save the output to text files, but it couldn't save anything when I tried, so here are screenshots instead. The speed transition seems smoother.

[Screenshots of the benchmark output for presets 0 through 8 attached.]

Re: FLAC v1.4.x Performance Tests

Reply #35
An AMD thing then?
* From http://www.audiograaf.nl/losslesstest/ I see that @ktf used an AMD on edition 3, 4, 5, but an Intel on edition 1, 2.
* In editions 3ff, -3 decodes fastest
* In edition 2, I see from http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%202.pdf figure 2.2 (using 2.1 to verify that -3 is the one on the 58 percent mark) that 0, 1, 2 decoded faster.

I think presets 0, 1, 2, 3 (and 4, 5) meant the same thing then as now.

Re: FLAC v1.4.x Performance Tests

Reply #36
GCC 12.2.0 john33 vs Case, plus Xiph build:
Zero Wing OST, VGM format rendered by foo_gep at 16/44
https://www.youtube.com/playlist?list=PLPAbo-cOSKYzgI65IOneAq_oTzOjI6k4F

case -8p
Total encoding time: 0:14.562, 113.62x realtime

case -8p -b2304
Total encoding time: 0:17.172, 96.35x realtime

john33 -8p
Total encoding time: 0:14.890, 111.12x realtime

john33 -8p -b2304
Total encoding time: 0:17.297, 95.65x realtime

xiph -8p
Total encoding time: 0:14.969, 110.53x realtime

xiph -8p -b2304
Total encoding time: 0:17.203, 96.18x realtime

The OST is not very long (27m53s) so others can do a longer test, perhaps with some temperature tests too.


File size:
Code: [Select]
wav         291,876,908   100.000%
-8p         164,386,268   56.3204%
-8p -b2304  163,492,544   56.0142%

Decoding speed:
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-8p
Speed (x realtime): 1430.445 min, 1433.292 max, 1432.189 average

-8p -b2304
Speed (x realtime): 1467.472 min, 1469.977 max, 1468.835 average

Re: FLAC v1.4.x Performance Tests

Reply #37
Interesting, thanks. I just posted a clang compile that seems a little faster on my system, but without any extensive speed validation.

Re: FLAC v1.4.x Performance Tests

Reply #38
Zero Wing OST
-8p
Total encoding time: 0:19.640, 84.24x realtime
-8p -b2304
Total encoding time: 0:22.609, 73.18x realtime

So slower, but previous Clang builds were also slower on my machine, as well as on other members' machines.

Re: FLAC v1.4.x Performance Tests

Reply #39
OK, thanks.

Re: FLAC v1.4.x Performance Tests

Reply #40
...so, after all these tests, which build (and encoding parameters) should I use on my new i5-12600 for BOTH speed and size?
Hybrid Multimedia Production Suite will be a platform-independent open source suite for advanced audio/video content production.
Official git: https://www.forart.it/HyMPS/

Re: FLAC v1.4.x Performance Tests

Reply #41
for BOTH speed and size ?

Use -0, that gives you max speed and max size.  ;D

Oh, you wanted small size? On a more serious note, then:
You need to take some kind of stand on how much CPU time it is worth spending to save a mega-/gigabyte. Compared to fifteen years ago, the extra time spent on doing -7 instead of -5 is much lower - but the space saved is also much less costly.
Also, if you have a spinning drive (people do have those for larger collections ...) - or, even more so, a NAS - then I/O will eat a lot of the speed, so encoding time might not be a concern unless you are considering -8p.

Generally the "most economic" suggestion is to not recompress until you have to: leave your FLAC files as they are, and only when your drive is closing in on full, recompress using a quite heavy setting. If you are on a spinning drive, that is sure as hell going to get you fragmentation though - in case that matters anymore.


Edit: for new files you can of course choose whatever, but you cannot tell flac.exe to "recompress those which were compressed with -5 and leave the -8p alone" - it doesn't know that. If, on the other hand, you use foobar2000 to recompress, you can filter on those not created by 1.4.x - but that does not recompress in place. What I would do: (1) run a verification on everything to make sure no FLAC file is corrupted, (2) make sure the backup is synced, (3) have foo_audiomd5 write an actual audio MD5 to a file that can be checked afterwards, (4) run flac -f to recompress the whole thing in place, (5) verify against foo_audiomd5's files.
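A minimal cmd sketch of steps (1) and (4) above - it assumes flac.exe is on the PATH and ignores subfolders, and -8 is just an example setting (the foo_audiomd5 steps happen inside foobar2000):

Code: [Select]
rem (1) verify every file first
flac -t *.flac
rem (4) re-encode in place; -f is needed to overwrite the existing files
flac -f -8 *.flac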

Re: FLAC v1.4.x Performance Tests

Reply #42
Mostly academic, but try Merzbow's Pulse Demon / Venereology with -l 12 -b 512 -r 8 without windowing.

Re: FLAC v1.4.x Performance Tests

Reply #43
Settings above -8, anyone?

Come on, there are a few of you out there. Care to run an overnight job on at least a handful of CDs? There are a few settings that might give better or worse return on CPU time. I'll explain the choices at the end, at the risk of posting spoilers.


* I've not been so concerned about single songs, so I've been converting a whole image to .wav, removing all tags if they were carried over, getting them into the same folder on an SSD, and running the second script. But if you use the first one, you will not get the logfile spammed with long dir listings.
* There are elegant PowerShell solutions, but I'm too stuck in cmd and bash. So for timing (on Windows), I'm using timer64 from the https://sourceforge.net/projects/sevenmax/ package. The syntax I used (it will overwrite .flac files), e.g.:
   <directory-to-timer64>\timer64.exe flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
and then du or dir.
With timer64.exe, flac.exe and the .wav files in the same directory, I would run the following (let's keep it simple: no FOR loop, which brings in the single/double percent-sign issue - a loop variant is sketched after the first script. It should preferably be run a couple of times, as timings are not ... exact, but then it might take quite a while).
Copy it into a flactest.bat file in the same directory as timer64.exe, flac.exe and the .wav files. (Or modify accordingly.)
Open a cmd window, cd to the directory, and run .\flactest.bat - or you can double-click it.

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
The last one is going to be slow ... but in my testing, more than 3x as fast as the infamous -8ep.
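(For completeness, the same idea as a FOR loop in a .bat file - this is where the doubled percent signs mentioned above come in. A hedged sketch, covering only a few single-token settings:)

Code: [Select]
rem inside a .bat file the loop variable needs %%; on the command line a single %
for %%o in (-7pf -8f -8pf) do (
  .\timer64.exe .\flac %%o *.wav >> logfile.log
  du *.flac >> logfile.log
)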

Afterwards you will get a logfile.log with, for each setting, the timing and the total .flac file size, in that order. I think the "Global Time =" figure is the most interesting.

If you are interested in how each file compresses, that is, the individual .flac sizes and not only the aggregate, you can replace the du command by the old-fashioned DOS command dir and run this instead:

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log

Then you will get tons of text out. The most interesting here, I guess, is still the total at the bottom of each dir listing.


So why these choices? Apart from the -3p, which is there just to heat up the CPU a little so that the -7p doesn't get a no-throttling advantage?
-7 is synonymous with -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2), and -8 with the same but "(3)" at the end. The next step would be (4) then? But at some stage, the "-p" will be more worth it. When? That's why the initial -7p too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - the expectation is that -r7 offers better value for money, of course, but by how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones. Like -8 -A "tukey(5e-1);subdivide_tukey(3)"; isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that one is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So, trying that instead of subdivide_tukey(4), hoping to catch more compression cheaper?
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent decimal notation, use this.)
Then there is -p, and combinations.


Re: FLAC v1.4.x Performance Tests

Reply #45
I did a little test with different builds posted here in the forum, on my AMD Ryzen 5 3600X under Windows 11.
The file used is an almost 2-hour-long DJ set of electronic music.
The fastest build was flac-1.4.1-win64-znver3 (Case).

Code: [Select]
Codec      :     PCM (WAV)
Duration   :     57:21:749
Sample rate:     48000 Hz
Channels   :     2
Bits per sample: 16

Igor Pavlov's timer64 has been used to measure the time.

Code: [Select]
timer64.exe flac -8p

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
Global  Time =    53.220
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-x64-znver2-GCC1220 (john33)
Code: [Select]
Global  Time =    53.621
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-win64-gcc12 (Case)
Code: [Select]
Global  Time =    56.626
wrote 425812106 bytes, ratio=0,644

FLAC-1.4.1_Win64_GCC122 (NetRanger)
Code: [Select]
Global  Time =    59.990
wrote 425812106 bytes, ratio=0,644

flac-1.4.1-x64-AVX2 (john33)
Code: [Select]
Global  Time =    73.045
wrote 425812100 bytes, ratio=0,644

flac-1.4.1-x64-AVX2-clang1500
Code: [Select]
Global  Time =    78.772
wrote 425812103 bytes, ratio=0,644

FLAC-1.4.1_Win64_Intel 19.2 (rarewares)
Code: [Select]
Global  Time =    79.738
wrote 425812100 bytes, ratio=0,644

FLAC-1.4.1_Win64_CLANG15 (NetRanger)
Code: [Select]
Global  Time =   100.662
wrote 425812106 bytes, ratio=0,644

Re: FLAC v1.4.x Performance Tests

Reply #46
Nearly a factor of two, that is quite a lot.

(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)

Re: FLAC v1.4.x Performance Tests

Reply #47
(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)

Oops, yes, it's 57 minutes, not two hours. My mistake.

Official build from Xiph against flac-1.4.1-win64-znver3 (Case).

Official Xiph
Code: [Select]
Global  Time =    53.937
wrote 425812106 bytes, ratio=0,644

win64-znver3 (Case)
Code: [Select]
Global  Time =    51.475
wrote 425812103 bytes, ratio=0,644

Re: FLAC v1.4.x Performance Tests

Reply #48
Testing the Acer (Ryzen-equipped) laptop, as that has more consistent timings (two fan settings, on + off?) - Intel considerations at the end;

... with the following:

Settings above -8, anyone?
[...]
So why these choices? Apart from the -3p, which is there just to heat up the CPU a little so that the -7p doesn't get a no-throttling advantage?
-7 is synonymous with -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2), and -8 with the same but "(3)" at the end. The next step would be (4) then? But at some stage, the "-p" will be more worth it. When? That's why the initial -7p too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - the expectation is that -r7 offers better value for money, of course, but by how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones. Like -8 -A "tukey(5e-1);subdivide_tukey(3)"; isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that one is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So, trying that instead of subdivide_tukey(4), hoping to catch more compression cheaper?
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent decimal notation, use this.)
Then there is -p, and combinations.

The following was done on all the 38 CDs, with no attempt at checking differences between musical genres. I did note, though, that -7p sizes varied from larger than -8 to smaller than -8 -A subdivide_tukey(4), but if you are willing to wait for -7p you might as well select something better.

First observations:
* -r7 isn't ruled out at first observation; only later does it turn out not to be worth it on the Ryzen - but maybe it is on the Intel, see below. However, -r8 offers very little over -r7: about 1/15th of the size improvement per CPU second on the Ryzen, and not worth it on the Intel either, at least not until you slow things down considerably with the apodization functions. So in the following, I ditch all the -r8 runs.
* -A "tukey(666e-3);subdivide_tukey(3/333e-3)" improved over -A "tukey(5e-1);subdivide_tukey(3)" at about the same time - YMMV. I remove the latter, as it is only going to make the numbers look weird. Maybe I should rather have removed the "customized" one?!
* -7p is not worth it. Better to go up to -8 -A subdivide_tukey(4 or 5).

So with those deletions, I ordered by compressed size and calculated: how many bytes do I save per extra second it costs to move up one step?
* -r7: not that much - it was cheaper to add another apodization function. Only tried up to subdivide_tukey(5).
* -p becomes worth it around the subdivide_tukey(5) point: if you consider going up from -8 -A subdivide_tukey(4) to (5), consider another doubling of encoding time to -8p instead, as that pays off nearly the same per extra second;
* ... but, in this test, the extra tukey also is worth about the same.

Some numbers after more deletions - but kept a doubtful one, to be explained below the table:
Size (bytes)   Seconds   Saved per second   Setting
11969604531       833          -            -8
11968502388       940        10300          -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11967556575      1155         4399          -8 -A "subdivide_tukey(4)"
11966463371      1555         2733          -8 -A "subdivide_tukey(5)"
11961291433      3003         3572          -8p (note the jump in time when using -p)
11960179719      3350         3204          -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11959250164      5131          522          -8p -A "subdivide_tukey(4)"
11958125424      7796          422          -8p -A "subdivide_tukey(5)"
The "saved per second" means: if you go from the previous setting to this, it is going to take more seconds, (e.g. 107 extra from -8 to the next), but how many bytes do you save per second?
Now the fact that the "2733" is less than the next "3572" indicates you should not use -8 -A subdivide_tukey(5): if you think those 400 seconds are worth the savings, you should rather spend even more seconds going to -8p. Deleting that row, the "3572" also changes - to 3390 - because it is relative to the "previous" row, which is now a different one. But YMMV here; it surely depends on material and hardware, and maybe even on which of the compiles posted here you use.

For comparison, adding a "-r 7" to the -8p -A "subdivide_tukey(4)" line would save 132 bytes per extra second, so you might as well go to -8p -A "subdivide_tukey(5)"; and going from -r7 to -r8 makes for only 11.


Evidence from the Ryzen then:
* If you want something heavier than -8, but are not willing to wait for -8p, try another tukey as described, or for simplicity, upping the game to -A subdivide_tukey(4).
* If you want something heavier than -8p, try the same thing
* If you are willing to wait for something that takes >10x as long as -8 (>3x as long as -8p), then I haven't checked much. Maybe at this stage you could consider -r7 - maybe -r7 is worth it "even earlier" on the Intel (see next).



Over to the Intel-equipped Dell laptop, I found something similar, except:
* timings vary too much to be reliable "at every line", but applying the rule of "when in doubt, delete the one that the Ryzen results suggest I delete", I end up with something that by and large gives the same picture - except possibly -r7. And with the reservation for unreliable timings taken:
* -r7 doesn't seem as worthless as on the Ryzen. Actually you can consider -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r7 rather than going to subdivide_tukey(4), or at least rather than (5). Or, for short, if you don't feel like mistyping all that: -8p -r7 might be worth it. YMMV.


Oh, and for reference, compared to WavPack's -x4 vs -x:
* Going up to -8p is about like going -hx to -hx4 with WavPack in terms of gains per second. (-hx4 is considered very slow, but the gains are higher; -8p is not that slow, but saves about proportionally less).
* From -8p to -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" is about like going -hhx to -hhx4.