HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: bennetng on 2022-09-22 18:21:22

Title: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-22 18:21:22
https://vgmdb.net/album/5066
https://www.youtube.com/playlist?list=PLyxqUxT0goirVyF6FILtXMKpcQfOwCC5d
100.00% 696,947,036 Einhander.wav
65.900% 459,284,936 flac141 -8p.flac
65.817% 458,712,584 flac141 -8p -b2304.flac

CUETools.Flake 2.2.2 failed to encode with --vbr, but now I can predict what kinds of files can benefit from changing the default block size.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 11:39:56
At Netranger's request, here's a new topic for v1.4.x testing...

A short test of Case's 1.4.1 build on my Intel Core i7-7700 CPU @ 3.60GHz:
(NB: compared to this test (https://hydrogenaud.io/index.php/topic,122949.msg1015601.html#msg1015601), I changed the test procedure by adding --silent to the flac options, which gives a small speed increase and less variation between test runs.)
Code: [Select]
FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 29.407 seconds, Encoding speed = 367.67x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 29.780 seconds, Encoding speed = 363.06x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -8 (852992 Bytes)
- Average time  = 45.762 seconds, Encoding speed = 236.27x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141_Case.exe -8 (844800 Bytes)
- Average time  = 46.053 seconds, Encoding speed = 234.77x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)
So I can't confirm ktf's assumption (https://hydrogenaud.io/index.php/topic,123014.msg1016154.html#msg1016154) (Binary might be an unmeasurable tiny amount faster because the binary is slightly smaller).
At least as far as Case's builds are concerned.

Meanwhile I've upgraded my main computer to the next CPU generation: Intel Core i7-8700 CPU @ 3.20GHz
Apart from being a little faster, the results are comparable:
Code: [Select]
FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 27.269 seconds, Encoding speed = 396.50x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -8 (852992 Bytes)
- Average time  = 42.676 seconds, Encoding speed = 253.35x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 27.487 seconds, Encoding speed = 393.36x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -8 (844800 Bytes)
- Average time  = 42.478 seconds, Encoding speed = 254.53x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)
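
The "Encoding speed" figures above are audio duration divided by wall-clock encoding time. A minimal sketch of that arithmetic follows. It assumes the audio duration can be back-computed from one of the reported runs; the function names are illustrative and are not taken from the test script used in this thread.

```python
def realtime_factor(audio_seconds: float, encode_seconds: float) -> float:
    """Encoding speed as a multiple of realtime."""
    return audio_seconds / encode_seconds


def percent_faster(time_a: float, time_b: float) -> float:
    """How much shorter run A is than run B, in percent of B's time."""
    return (time_b - time_a) / time_b * 100.0


# Assumption: duration inferred from the i7-8700 run of 1.4.0 at -7
# (27.269 s at 396.50x realtime), not taken from the WAV header.
audio_seconds = 27.269 * 396.50

# 1.4.1 at -8 (42.478 s) versus 1.4.0 at -8 (42.676 s) on the same CPU.
speed_141_8 = realtime_factor(audio_seconds, 42.478)
gain = percent_faster(42.478, 42.676)
print(f"1.4.1 -8: {speed_141_8:.2f}x realtime, {gain:.2f}% faster than 1.4.0 -8")
```

With differences this small, run-to-run variation matters more than the formula, which is why the later posts report averages over several rounds.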

Title: Re: FLAC v1.4.x Performance Tests
Post by: .halverhahn on 2022-09-23 12:44:20
Once more, my performance test of different flavors of FLAC 1.4.1 (Win64) on my Win10 laptop with an i7-1185G7.
Using a 16-bit/44.1 kHz WAV file, size: 2.710.211.996 bytes, FLAC @ -8

Long story short:

flac141case-hashwell: ~56.5s
flac141xiph: ~60.8s
flac141rarewares-avx2: ~57.6s

Code: [Select]
PS C:\temp\FLAC141> Measure-Command { .\flac141case-hashwell.exe -8 image.wav -f -o image.flac.141case-hashwell.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924066 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 56
Milliseconds      : 506
Ticks             : 565069945
TotalDays         : 0,000654016140046296
TotalHours        : 0,0156963873611111
TotalMinutes      : 0,941783241666667
TotalSeconds      : 56,5069945
TotalMilliseconds : 56506,9945



PS C:\temp\FLAC141> Measure-Command { .\flac141xiph.exe -8 image.wav -f -o image.flac.141xiph.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924066 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 0
Milliseconds      : 813
Ticks             : 608136756
TotalDays         : 0,000703861986111111
TotalHours        : 0,0168926876666667
TotalMinutes      : 1,01356126
TotalSeconds      : 60,8136756
TotalMilliseconds : 60813,6756



PS C:\temp\FLAC141> Measure-Command { .\flac141rarewares-avx2.exe -8 image.wav -f -o image.flac.141rarewares-avx2.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924072 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 57
Milliseconds      : 670
Ticks             : 576700994
TotalDays         : 0,000667478002314815
TotalHours        : 0,0160194720555556
TotalMinutes      : 0,961168323333333
TotalSeconds      : 57,6700994
TotalMilliseconds : 57670,0994



PS C:\temp\FLAC141>
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 13:29:37
i3-12100, single wav CDDA image, all tested in RAM drive:

Case -8
Total encoding time: 0:17.078, 264.75x realtime
Case -7p
Total encoding time: 0:23.672, 191.00x realtime

john33 avx2 -8
Total encoding time: 0:18.203, 248.39x realtime
john33 avx2 -7p
Total encoding time: 0:28.234, 160.14x realtime

xiph -8
Total encoding time: 0:17.734, 254.95x realtime
xiph -7p
Total encoding time: 0:22.922, 197.25x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-09-23 15:14:01
Your plain -8 numbers are pretty similar to my Ryzen 5900x.
Still for -8 -p the official xiph binaries are clearly the fastest.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 15:40:03
Yes, xiph is the fastest on 8p for me as well:

xiph:
Total encoding time: 0:45.406, 99.57x realtime
Case:
Total encoding time: 0:50.281, 89.92x realtime
john33 avx2:
Total encoding time: 1:03.641, 71.04x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 17:07:34
And here's another round of tests with -7 (Intel Core i7-8700 CPU @ 3.20GHz):
Code: [Select]
FLAC Binary: xiph-140\flac.exe -7 (328704 Bytes)
- Average time  = 28.104 seconds (10 rounds), Encoding speed = 384.72x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: xiph-141\flac.exe -7 (299520 Bytes)
- Average time  = 27.900 seconds (10 rounds), Encoding speed = 387.53x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 27.559 seconds (10 rounds), Encoding speed = 392.32x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 27.707 seconds (10 rounds), Encoding speed = 390.22x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-john-avx2.exe -7 (1212928 Bytes)
- Average time  = 30.217 seconds (10 rounds), Encoding speed = 357.81x
- FLAC file size = 1.167.014.370 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-case-haswell.exe -7 (860160 Bytes)
- Average time  = 25.416 seconds (10 rounds), Encoding speed = 425.40x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-case-gcc12.exe -7 (781312 Bytes)
- Average time  = 26.174 seconds (10 rounds), Encoding speed = 413.08x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)
tl;dr:
With the official xiph binary ktf is right: 141 is a bit faster than 140.
Case's binaries still outperform xiph on my setup, but the difference is small (see note below).
Case's Haswell build is the fastest on my setup at -7. I will re-run my tests with -8.

NB. I realized that in my test here (https://hydrogenaud.io/index.php/topic,122949.msg1015601.html#msg1015601) I mislabeled the flac140.exe binary. It was NOT the official binary from xiph! Sorry for that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 17:14:25
Here's another 1.4.1 compile with newer GCC (12.2.0). Over here it's faster than Xiph build even with -p.
Yes. Both -8 and -8p are improved. Same test as my previous posts:
-8
Total encoding time: 0:16.547, 273.24x realtime
-8p
Total encoding time: 0:44.312, 102.03x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 17:32:09
Code: [Select]
FLAC Binary: xiph-141\flac.exe (299520 Bytes)
- Average time  = 41.038 seconds (5 rounds), Encoding speed = 263.46x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
- Average time  = 38.473 seconds (5 rounds), Encoding speed = 281.03x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-john-avx2.exe (1212928 Bytes)
- Average time  = 46.881 seconds (5 rounds), Encoding speed = 230.63x
- FLAC file size = 1.166.206.863 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-case-gcc12.exe (781312 Bytes)
- Average time  = 39.323 seconds (5 rounds), Encoding speed = 274.95x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

Speed ranking with -8 here: Case-Haswell -> Case-GCC12 -> Xiph -> John-AVX2
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-09-23 17:43:01
Case and Xiph is ~102x for -8 -p on the Ryzen 5900x, nice.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 18:25:13
And finally with -8p (although out of my league):
Code: [Select]
FLAC Binary: flac141-case-gcc12.exe (781312 Bytes)
- Average time  = 127.483 seconds (5 rounds), Encoding speed = 84.81x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
- Average time  = 128.832 seconds (5 rounds), Encoding speed = 83.92x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: xiph-141\flac.exe (299520 Bytes)
- Average time  = 134.315 seconds (5 rounds), Encoding speed = 80.50x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: flac141-john-avx2.exe (1212928 Bytes)
- Average time  = 190.607 seconds (3 rounds), Encoding speed = 56.72x
- FLAC file size = 1.165.475.622 Bytes (= 61,107% of WAV size)
Here Case's GCC12 build is the fastest, but both of Case's binaries are faster than Xiph on my setup.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-23 21:22:26
Got Windows screaming false positives on the various compiles posted here, so ... no 1.4.1 speed tests from me until next set of definitions downloaded.
(Submitting false positives reports to Microsoft isn't outright easy?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 09:09:24
1.4.0 compiles then, augmenting the Intel figures from here (https://hydrogenaud.io/index.php/topic,122949.msg1015694.html#msg1015694) with Ryzen 2500U and also did flac -0 for those who want the fastest. Reply number is from that thread.

I don't know how much the individual figures can be trusted - and if I ever do this again, it will not be on 1.4.0 - but the overall picture among the fastest is pretty clear: the Case compile wins on -0 and -5, and the official build wins at -7p.  I suspect it is when it calls for several residual compressions then?


-0. Why @john33's newer Intel compile is so good compared to the other builds posted I don't know - maybe order of runs matter, if the CPU has just struggled less or more, but just speculations. Adding up it overtakes the Xiph build.
Intel   Ryzen (much cheaper)   numbers are ordered by Intel
211      258      Case compile (x64) from reply 57
238      312      Xiph (x64 only)
240      293      john33's reply 82 with newer compile
240      332      NetRanger GCC-64 from Reply 10
254      348      NetRanger GCC-32 from Reply 10
258      n/a      Rarewares-x64 from john33's Reply 34 (first link)
264      367      Rarewares-x86-nonXP from john33's Reply 34 (first link)
269      360      NetRanger CLANG14-64 from Reply 15
274      360      NetRanger CLANG15-w64 from Reply 68
276      365      Rarewares-x86 from john33's Reply 34 (second link)
282      381      NetRanger CLANG14-32 from Reply 15
295      383      NetRanger CLANG15-w32
Rarewares-x64 produced strange numbers. Didn't go back and check again.


-5:
Intel   Ryzen   numbers are ordered by Intel
256      353      Case compile (x64) from reply 57
271      372      Xiph (x64 only)
298      375      john33's reply 82 with newer compile
300      412      Rarewares-x64 from john33's Reply 34 (first link)
321      481      NetRanger CLANG14-64 from Reply 15
328      462      Rarewares-x86-nonXP from john33's Reply 34 (first link)
330      464      Rarewares-x86 from john33's Reply 34 (second link)
336      454      NetRanger GCC-32 from Reply 10
344      447      NetRanger GCC-64 from Reply 10
347      450      NetRanger CLANG14-32 from Reply 15
366      n/a      NetRanger CLANG15-w64 from Reply 68
395      457      NetRanger CLANG15-w32
Deleted a nonsense result from CLANG15. Did Windows Update run or something?

-7p. Results start to vary, I wonder how much they can be trusted. Xiph was run immediately after all the -5 were done, maybe -5 was not enough to heat up the CPU that much. Will not re-run any 1.4.zero .
Intel   Ryzen    numbers are ordered by Intel and here the Ryzen numbers are not much in order.
 831      1279      Xiph (x64 only)
 880      1539      Case compile (x64)
 882      1426      NetRanger GCC-64
1006      1632      NetRanger GCC-32
1015      1600      NetRanger CLANG14-64
1029      1442      john33's reply 82 with newer compile
1035      1431      Rarewares-x64
1089      1888      Rarewares-x86
1170      1573      NetRanger CLANG14-32
1284      1862      Rarewares-x86-nonXP
1508      1783      NetRanger CLANG15-64
n/a       1797      NetRanger CLANG15-32
The CLANG15-32 run on Intel was aborted by the user, as was the CLANG14-32 run on Ryzen; the latter was aborted after two runs had completed with consistent figures, so it was included.

Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 10:38:50
If you get weird times with john33 AVX2 compile, it might be CPU overheating - AVX2 is extremely heavy on power and heat, if there are any cooling deficiencies, CPU will throttle.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 10:51:57
Quite possible. Everything was done on laptops. Dell with Intel and consumer-grade Acer with Ryzen. I've had three generations of Dell Latitude, and consumer-grade Dell before that - Dell fan control was and remains a mystery.

A "speed concern" is also, how much are you doing at the time? If one is compressing only a few albums, one will wait for it to finish and then time is annoying - and then half of them might be done before the worst throttling kicks in, maybe?
If one is compressing a lot of them, overnight at full steam, that is something else - but speed is not that crucial as long as it only makes the difference between done at 0430 and done at 0510.
If one is migrating to (new) FLAC - running for days and weeks - then both duration and heat (slowdown and fan speed on a computer you are using every day) would be a big thing again.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 10:57:19
Long story short: on all of the Intels I've tried, from Gen.4 to Gen.8, the Haswell build by Case is the fastest one. Using -6 because I like to use -6, best speed/compression ratio IMHO.
But an older Flac 1.3.3 x64 build by Case is slightly faster with only 152940 bytes more out of 583 MB:

Flac 1.3.3 x64 by Case 11/08/2019
Code: [Select]
flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 611507403 bytes, ratio=0,734

Kernel  Time =     0.812 =    6%
User    Time =    10.687 =   88%
Process Time =    11.500 =   94%    Virtual  Memory =     14 MB
Global  Time =    12.140 =  100%    Physical Memory =     13 MB


Flac 1.4.1 x64 Haswell by Case 23/09/2022
Code: [Select]
flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 611354463 bytes, ratio=0,734

Kernel  Time =     0.687 =    5%
User    Time =    11.671 =   92%
Process Time =    12.359 =   97%    Virtual  Memory =     14 MB
Global  Time =    12.685 =  100%    Physical Memory =     17 MB
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 11:13:07
Using -6 because I like to use -6, best speed/compression ratio IMHO.
What CPU? All testing I have done so far, indicates that -6 is the useless one; -7 is quite close to -6 on time (on Intel) and quite close to -8 at size. Visualised by ktf at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 - see the third diagram that starts at -4.
-6 was even more strikingly bad with 1.3.4 it seems.

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-24 11:30:16
Don't know about specific CPUs but -6 is pretty poor on compression ratio. -7 is the best for speed vs compression ratio IMO. As for throttling, run HWiNFO in background so one knows that the CPU is throttled. Pretty much a non-issue for desktop systems with AVX2 stress tests on all cores.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 11:47:04
It depends on CPU and cooling, some are designed to be low on power & heat, yet my i7-4790K with a huge tower air cooler hits 100C with Prime95 AVX2 stress test and is very close to Tcase with everyday AVX2 apps.
With current turbo technology you may miss turbo targets when it's warmer and will get varying results.
Laptops are always cooling deficient.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 11:56:59
I've tried with -7 and 1.3.3 is still faster than 1.4.1:

Flac 1.3.3 x64 by Case 11/08/2019
Code: [Select]
flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 609653841 bytes, ratio=0,732

Kernel  Time =     0.828 =    6%
User    Time =    12.250 =   91%
Process Time =    13.078 =   97%    Virtual  Memory =     14 MB
Global  Time =    13.345 =  100%    Physical Memory =     13 MB

Flac 1.4.1 x64 Haswell by Case 23/09/2022
Code: [Select]
flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 609301753 bytes, ratio=0,732

Kernel  Time =     0.593 =    3%
User    Time =    14.000 =   93%
Process Time =    14.593 =   97%    Virtual  Memory =     14 MB
Global  Time =    14.967 =  100%    Physical Memory =     13 MB
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 12:13:00
I've tried with -7 and 1.3.3 is still faster than 1.4.1:

Yes.

Quote from: https://xiph.org/flac/2022/09/09/flac-1-4-0-released.html
  • Compression for presets 3 through 8 has improved with only a small decrease in encoding speed, while presets 0, 1 and 2 got faster.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 12:31:29
Edit: If you want a competition to -6, try -5 -l 10 and see what happens. Compare to both old -6 and new -6. (Reason: it seems it is the low -l that makes -6 underperform.)

1.3.3 is still faster than 1.4.1:
Sure, but compare to 1.3.3 at -8 and see which one is fastest and compresses best?
I did that (https://hydrogenaud.io/index.php/topic,122949.msg1015396.html#msg1015396) (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

1.4 starts using double-precision coefficients, which gave the competition an upper hand for fifteen years. (https://hydrogenaud.io/index.php/topic,120158.msg999746.html#msg999746) Double precision takes a bit more time, but compresses better. The impact is greatest on high resolution material (though something similar has been seen on lower resolution when there is not much content in the top octave). See that thread, first my reply 33 and then ktf testing upsampled material in reply 94.

(If 1.4 takes more time at high presets, then why not on 0, 1, 2? Code improvements.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-24 13:23:56
OK here are my speed and temperature tests, running 109 tracks in -8pe to make the test longer and use all cores.

john33 AVX2
Total encoding time: 8:57.484, 38.06x realtime
Case Haswell
Total encoding time: 7:04.062, 48.24x realtime
Xiph
Total encoding time: 5:59.062, 56.98x realtime
Case GCC 12.2.0
Total encoding time: 5:55.157, 57.60x realtime

CPU package max temperature:
Case Haswell: 92C, the only one with "Yes" on "Power Limit Exceeded", no throttling though, it's a non-k i3 after all.

Xiph: 91C

Case GCC 12.2.0 88C

john33 AVX2 87C

The screenshot on Reply #17 can be used as a reference idle temperature. All tests ran on the stock cooler.
(https://hydrogenaud.io/index.php?action=dlattach;attach=23452;image)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 13:37:39
Double precision takes a bit more time, but compresses better.
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 13:48:57
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)

Then read the entire post:

I did that (https://hydrogenaud.io/index.php/topic,122949.msg1015396.html#msg1015396) (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

So if you care more about speed than compression, just lower the preset and you will get both better and faster compression.

And if you only care about speed and energy usage while the size doesn't matter at all, don't compress at all - just use WAV. Can't beat that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Chibisteven on 2022-09-24 14:04:06
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)

Then read the entire post:

I did that (https://hydrogenaud.io/index.php/topic,122949.msg1015396.html#msg1015396) (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

So if you care more about speed than compression, just lower the preset and you will get both better and faster compression.

And if you only care about speed and energy usage while the size doesn't matter at all, don't compress at all - just use WAV. Can't beat that.
FLAC seems pretty energy efficient and quite fast to decode compared to something like Monkey's Audio which just eats energy and is sloooooooooow.

Encoding is another story but if your stuff is already encoded, you could just leave it be.  Even on a UPS it has no major effect on battery life and that's running an Icecast server and yes I had a power outage while using Icecast and running a bunch of other things as well as FLAC encoding.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 17:16:22
Nothing in this thread tells us about energy efficiency. Something slower may use fewer joules to do its job than something that finishes faster.
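
To illustrate the point: energy is average power multiplied by time, so a slower run at lower power draw can come out ahead. The sketch below uses made-up power and time figures purely for illustration; nothing in this thread measured package power per build.

```python
def energy_joules(avg_power_watts: float, seconds: float) -> float:
    """Energy used by a run: average power (W) times duration (s)."""
    return avg_power_watts * seconds


# Hypothetical figures, NOT measured in this thread.
fast_build = energy_joules(avg_power_watts=65.0, seconds=40.0)
slow_build = energy_joules(avg_power_watts=45.0, seconds=50.0)

print(f"fast build: {fast_build:.0f} J, slow build: {slow_build:.0f} J")
```

Settling the question for real would need a power reading (for example the package power reported by a monitoring tool) logged alongside each timed run.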
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 17:43:33
For a given executable with more CPU intensive setting, I would not be surprised if it translates quite well. (As encoding goes - decoding is a different matter, and for FLAC a quite irrelevant one.)
But if you have a more modern CPU, a TDP of 15 watts is quite common, which amounts to some 2.52 kWh if run for a whole week.

Of course, GB cost is only relevant when your drive is getting full. Implying, as Chibisteven points out, leave encodes as they are.
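
The 2.52 kWh figure above can be checked with a few lines. This is only a worked version of the arithmetic; it treats the 15 W TDP as if it were the constant power draw, which is an assumption, since TDP is a thermal design figure and not a measured consumption.

```python
def kilowatt_hours(power_watts: float, hours: float) -> float:
    """Energy in kWh for a constant power draw over a number of hours."""
    return power_watts * hours / 1000.0


HOURS_PER_WEEK = 7 * 24

# Assumption: the CPU draws exactly its 15 W TDP for the whole week.
week_energy = kilowatt_hours(15.0, HOURS_PER_WEEK)
print(f"{week_energy:.2f} kWh per week")
```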
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 23:21:34
I wanted to test the effect on lossyWAV encodes. So I lazily picked thirteen songs from various artists in my collection (honestly, there really isn't a whole lot of variety here) and tested it as follows:


Here are the results:

Format        Size (bytes)   Percentage
wav           534185270      100%
1.3.4         371429142      69.532%
1.4.1         370814508      69.417%
lossy-1.3.4   154850676      28.988%
lossy-1.4.1   154596798      28.941%

More or less what I expected. Someone with a better / more varied testing playlist should probably redo the test, though.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-25 12:20:32
Why -6 does so badly - and why keep it as it is:
(i.e. what part of the settings)

Compared to -5, -6 goes up from -r5 to -r6 and tries another set of windowing functions.
But prediction order stays at 8.
Going up to -7 increases the prediction order to 12. It spends only a tiny bit more time, and compresses quite a lot better. See charts with nearly-the-same-as 1.4 at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 , 3rd diagram.

Question: why is the difference to -5 small and the difference to -7 large?
* It is not the -r5 to -r6. In the 38 CD corpus in my signature, -5 -r6 improved 0.0044 percent.
* It is the prediction order. Trying -5 -l10 and -5 -l12 ditches the two other measures (-r and windowing function) and increases the prediction order. The former nearly catches -6 at size, the latter overtakes it. Both encode significantly faster on both Intel and Ryzen.

Why keep -6 then, when you can compress better at shorter time?
Decoding CPU footprint.
FLAC already decodes faster than pretty much anything, so is that necessary? ... well, FLAC has some "special olympics" settings: -0 and -3 are dual mono, -0 to -2 are fixed predictor. Starting at -4 you have the 8th order predictor and stereo decorrelation (adaptive for -4, brute-force'd at higher). Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0 not at -1): -7 and -8 take - percentwise - more CPU decoding, and that is because the -l12. Even for a "fastest there is", there is a reason for a couple of "fastest among the fastest".

So there you go, a special purpose setting for those who want something that, decoding-wise "cannot be distinguished from the default" (not completely true? There is a difference, the -r ...).
A beefed-up default is just ... nothing wrong with having one such for those inclined. Keep it. The rest of us ... don't use it.
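
For anyone who wants to repeat the -5 -l10 / -5 -l12 versus -6 comparison on their own files, here is a rough sketch. It assumes a `flac` executable on the PATH and a single WAV file; the function names and the output naming scheme are made up for illustration and this is not the procedure used for the figures above. A single timed run per setting is noisy, so treat the numbers as indicative only.

```python
import subprocess
import time
from pathlib import Path

# Settings discussed above; -l sets the maximum LPC (prediction) order.
CANDIDATES = {
    "-5": ["-5"],
    "-5 -l10": ["-5", "-l", "10"],
    "-5 -l12": ["-5", "-l", "12"],
    "-6": ["-6"],
    "-7": ["-7"],
}


def build_command(flac_exe: str, options: list[str], wav: Path, out: Path) -> list[str]:
    """Assemble a flac command line: quiet (--silent), overwrite (-f), named output (-o)."""
    return [flac_exe, *options, "--silent", "-f", "-o", str(out), str(wav)]


def compare(wav: Path, flac_exe: str = "flac") -> dict[str, tuple[float, int]]:
    """Encode the WAV once per candidate; return {label: (seconds, output bytes)}."""
    results = {}
    for label, options in CANDIDATES.items():
        # Hypothetical naming scheme: image.wav -> image.-5-l10.flac
        out = wav.with_name(f"{wav.stem}.{label.replace(' ', '')}.flac")
        start = time.perf_counter()
        subprocess.run(build_command(flac_exe, options, wav, out), check=True)
        results[label] = (time.perf_counter() - start, out.stat().st_size)
    return results
```

Calling `compare(Path("image.wav"))` and printing the result gives one time and one size per setting; repeating it a few times and averaging would be closer to how the encoder tests earlier in this thread were run.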
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-25 13:25:04
Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0 not at -1): -7 and -8 take - percentwise - more CPU decoding, and that is because the -l12. Even for a "fastest there is", there is a reason for a couple of "fastest among the fastest".
Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 will be slowest. I zoomed the pdf plot a lot and see that the triangle mark in -3 is still sitting at 0% decoding CPU time. Anyway here are my results.

Test settings:
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-3
Code: [Select]
Stats by codec:
  FLAC: 1506.230x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.008
  Speed (x realtime): 1506.230

-8
Code: [Select]
Stats by codec:
  FLAC: 1501.024x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.060
  Speed (x realtime): 1501.024

.wav as a reference
Code: [Select]
Stats by codec:
  PCM: 59556.250x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.000
  Decoding time: 0:00.379
  Speed (x realtime): 59556.250

The benchmark is very sensitive, for example, if I use RAM drive and set "Buffer entire file into memory" to "no":
Code: [Select]
Stats by codec:
  PCM: 31524.633x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:00.717
  Speed (x realtime): 31524.633
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-25 15:19:50
Decoding speed varies from album to album, but relative differences among 0-8 are similar. Pay attention to -3. The files are all encoded with flac 1.4.1

MA Recordings - Rediscovered Memories
https://www.discogs.com/release/7300636-Various-Rediscovered-Memories
0 (x realtime): 2251.000 min, 2271.072 max, 2261.496 average
1 (x realtime): 2190.539 min, 2195.430 max, 2193.093 average
2 (x realtime): 2180.179 min, 2197.860 max, 2187.784 average
3 (x realtime): 1606.081 min, 1608.317 max, 1606.900 average
4 (x realtime): 1625.529 min, 1627.217 max, 1626.215 average
5 (x realtime): 1621.599 min, 1625.611 max, 1623.098 average
6 (x realtime): 1587.640 min, 1590.859 max, 1589.449 average
7 (x realtime): 1536.414 min, 1539.106 max, 1538.185 average
8 (x realtime): 1534.826 min, 1536.911 max, 1535.957 average

FINAL FANTASY XIII Original Soundtrack, Disc 4
https://vgmdb.net/album/15980
0 (x realtime): 2144.344 min, 2153.433 max, 2149.787 average
1 (x realtime): 2043.644 min, 2054.518 max, 2048.156 average
2 (x realtime): 2024.009 min, 2040.479 max, 2033.885 average
3 (x realtime): 1506.466 min, 1509.097 max, 1508.142 average
4 (x realtime): 1556.885 min, 1559.473 max, 1558.123 average
5 (x realtime): 1564.512 min, 1565.964 max, 1565.423 average
6 (x realtime): 1558.401 min, 1560.904 max, 1559.655 average
7 (x realtime): 1499.939 min, 1500.600 max, 1500.309 average
8 (x realtime): 1499.042 min, 1502.150 max, 1500.238 average

黃耀明 - 信望愛
https://music.apple.com/hk/album/%E4%BF%A1%E6%9C%9B%E6%84%9B/1356525393
0 (x realtime): 2145.599 min, 2162.814 max, 2156.138 average
1 (x realtime): 2045.117 min, 2054.068 max, 2048.449 average
2 (x realtime): 2036.280 min, 2054.765 max, 2046.030 average
3 (x realtime): 1475.368 min, 1477.189 max, 1476.618 average
4 (x realtime): 1528.601 min, 1532.385 max, 1529.803 average
5 (x realtime): 1544.374 min, 1546.838 max, 1545.812 average
6 (x realtime): 1527.229 min, 1529.417 max, 1528.687 average
7 (x realtime): 1422.072 min, 1423.042 max, 1422.549 average
8 (x realtime): 1423.491 min, 1424.172 max, 1423.799 average

Full reports attached.
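
To put these numbers next to the "decoding CPU time" axis of the chart linked earlier, an x-realtime speed can be turned into a percentage of realtime (100 divided by the speed). The sketch below does that for the average figures above. It assumes the single-threaded benchmark setting shown earlier, so the percentage refers to one core; the helper name is illustrative.

```python
# Average decoding speeds (x realtime) copied from the tables above.
AVERAGES = {
    "Rediscovered Memories": {0: 2261.496, 3: 1606.900, 4: 1626.215, 8: 1535.957},
    "FINAL FANTASY XIII Disc 4": {0: 2149.787, 3: 1508.142, 4: 1558.123, 8: 1500.238},
    "信望愛": {0: 2156.138, 3: 1476.618, 4: 1529.803, 8: 1423.799},
}


def cpu_percent(x_realtime: float) -> float:
    """Share of realtime spent decoding, in percent (one thread assumed)."""
    return 100.0 / x_realtime


for album, speeds in AVERAGES.items():
    slowdown = (1.0 - speeds[8] / speeds[0]) * 100.0
    print(
        f"{album}: -0 {cpu_percent(speeds[0]):.4f}%  -8 {cpu_percent(speeds[8]):.4f}%  "
        f"(-8 decodes {slowdown:.1f}% slower than -0)"
    )
```

On all three albums -3 decodes slower than -4, which is the oddity worth paying attention to.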
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-25 15:37:46
@ktf: When you wrote that -0 to -2 got faster (https://hydrogenaud.io/index.php/topic,122949.msg1015314.html#msg1015314), does that apply to decoding as well? If the improved algorithm is "reversible into an improved decompression algorithm", then that would explain. (Edit: oh I could have set up a RAM disk myself ... maybe)

Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 will be slowest. I zoomed the pdf plot a lot and see that the triangle mark in -3 is still sitting at 0% decoding CPU time.
It looks like the left and right borders by construction go through the fastest and slowest point, but even then: the scale is logarithmic, so there is no 0. The next (or "previous") step left of 0.5% would be 0.4%.
-3 appears to be sitting at "point fortysomething".
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-09-25 18:03:08
@ktf: When you wrote that -0 to -2 got faster (https://hydrogenaud.io/index.php/topic,122949.msg1015314.html#msg1015314), does that apply to decoding as well?
Nope. The two things are fundamentally different problems, improvements to one are rarely applicable to the other.

My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.

edit: I've always been baffled by -3 being decoded faster than -2. The decoding results here make much more sense. In the past, @Porcus found that blocksize has a profound influence on decoding speed. Might be related to training of branch prediction. It might be that newer CPUs have branch prediction with a larger capacity, in which this training only has to happen once every file instead of once every block. In that case, influence of block size might be much smaller.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 10:02:26
My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.
Just tested on a super old and low-end Lumia 520 (https://www.gsmarena.com/nokia_lumia_520-5322.php) with LineageOS and foobar2000 APK. Concatenated the first track of the three albums mentioned in Reply #31 (https://hydrogenaud.io/index.php/topic,123025.msg1016404.html#msg1016404). 11m53s single file encoded from 0 to 8. I tried to use the first two tracks for each album but then foobar crashed during test, probably not enough RAM (only 512MB).

foobar's console shows a floppy disk icon which is supposed to save the output to text files, I tried but it can't save anything, so here are the screenshots. The speed transition seems smoother.

[Screenshots of the foobar2000 console, one per preset, 0 through 8]
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-26 10:30:09
An AMD thing then?
* From http://www.audiograaf.nl/losslesstest/ I see that @ktf used an AMD on edition 3, 4, 5, but an Intel on edition 1, 2.
* In editions 3ff, -3 decodes fastest
* In edition 2, I see from http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%202.pdf figure 2.2 (using 2.1 to verify that -3 is the one on the 58 percent mark) that 0, 1, 2 decoded faster.

I think presets 0, 1, 2, 3 (and 4, 5) were synonyms for the same thing then as now.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 15:47:02
GCC 12.2.0 john33 (https://hydrogenaud.io/index.php/topic,123014.msg1016445.html#msg1016445) vs Case (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265), plus Xiph build:
Zero Wing OST, VGM format rendered by foo_gep at 16/44
https://www.youtube.com/playlist?list=PLPAbo-cOSKYzgI65IOneAq_oTzOjI6k4F

case -8p
Total encoding time: 0:14.562, 113.62x realtime

case -8p -b2304
Total encoding time: 0:17.172, 96.35x realtime

john33 -8p
Total encoding time: 0:14.890, 111.12x realtime

john33 -8p -b2304
Total encoding time: 0:17.297, 95.65x realtime

xiph -8p
Total encoding time: 0:14.969, 110.53x realtime

xiph -8p -b2304
Total encoding time: 0:17.203, 96.18x realtime

The OST is not very long (27m53s) so others can do a longer test, perhaps with some temperature tests too.


File size:
Code: [Select]
wav         291,876,908   100.000%
-8p         164,386,268   56.3204%
-8p -b2304  163,492,544   56.0142%

Decoding speed:
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-8p
Speed (x realtime): 1430.445 min, 1433.292 max, 1432.189 average

-8p -b2304
Speed (x realtime): 1467.472 min, 1469.977 max, 1468.835 average
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-09-26 17:25:42
Interesting, thanks. I just posted a clang compile that seems a little faster on my system, but not with any extensive validation done speed-wise.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 18:54:37
Zero Wing OST
-8p
Total encoding time: 0:19.640, 84.24x realtime
-8p -b2304
Total encoding time: 0:22.609, 73.18x realtime

So slower, but previous Clang builds were also slower on my machine, as well as on other members' machines.
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-09-26 20:37:08
OK, thanks.
Title: Re: FLAC v1.4.x Performance Tests
Post by: forart.eu on 2022-09-27 11:14:04
...so, after all tests, which build (and encoding parameters) should I use on my new i5-12600 for BOTH speed and size ?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-27 12:05:19
for BOTH speed and size ?

Use -0, that gives you max speed and max size.  ;D

Oh, you wanted small size? On a more serious note, then:
You need to take some kind of stand on how much CPU time it is worth to save a mega-/gigabyte. Compared to fifteen years ago, the extra time spent on doing -7 instead of -5, is much lower - but the space saved is much less costly.
Also if you have a spinning drive (people do have that for larger collections ...) - or even more, a NAS - then I/O will take out a lot of the speed, so that might not be a concern unless you are considering -8p.

Generally the "most economic" suggestion is to not recompress until you have to: leave your FLAC files as they are, and only when your drive is closing in on full, recompress using a quite heavy setting. If you are on a spinning drive, that is sure as hell to get you fragmentation though - in case that matters anymore.


Edit: for new files, you can of course choose whatever, but you cannot tell flac.exe to "recompress those which were compressed with -5 and leave the -8p alone" - it doesn't know that. If on the other hand you use foobar2000 to recompress, you can filter on those not created by 1.4.x. But that does not recompress in-place. What I would do, would be to (1) run a verification on everything to make sure no FLAC file is corrupted, (2) make sure the backup is sync'ed, (3) foo_audiomd5 writes an actual audio md5 to a file that can be checked afterwards, (4) flac -f to recompress the whole thing in-place, (5) verify against foo_audiomd5's files.
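The five steps above could be scripted roughly like this. It is only a sketch under assumptions: it builds the command lines instead of running them, it uses `metaflac --show-md5sum` (the MD5 stored in the FLAC header) as a stand-in for foo_audiomd5, it leaves the backup sync in step (2) to you, and the function name and the choice of -8 are made up for illustration.

```python
def plan_recompress(flac_files, preset="-8"):
    """Return the command lines for a test / fingerprint / recompress / re-check pass.

    Nothing is executed here; the caller decides how to run the commands.
    """
    plan = []
    for path in map(str, flac_files):
        # (1) test-decode the existing file so a corrupt one is caught first
        plan.append(["flac", "-t", "--silent", path])
        # (3) record the audio MD5 (stand-in for foo_audiomd5, an assumption)
        plan.append(["metaflac", "--show-md5sum", path])
        # (4) recompress in place; --verify decodes while encoding
        plan.append(["flac", "-f", preset, "--verify", path])
        # (5) read the MD5 again and compare it with the value from step (3)
        plan.append(["metaflac", "--show-md5sum", path])
    return plan


if __name__ == "__main__":
    for cmd in plan_recompress(["album.flac"]):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once you are happy with the plan; stopping at the first failure is the point of doing step (1) before step (4).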
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-27 15:02:01
Mostly academic but try Merzbow Pulse Demon / Venereology with -l 12 -b 512 -r 8 without windowing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-27 21:50:39
Settings above -8, anyone?

Come on, there are a few of you out there. Care to run an overnight job on at least a handful of CDs? There are a few settings that might be better or worse return on CPU time. I'll explain the choices at the end, at the risk of posting spoilers.


* I've not been so concerned about single songs, so I've been converting a whole image to .wav, removing all tags if they were carried over, getting them into the same folder in an SSD, and running the second script. But if you use the first one, you will not get the logfile spammed down with long dir listings.
* There are elegant PowerShell solutions, but I'm too stuck in cmd and bash. So for timing (on Windows), I'm using timer64 off the https://sourceforge.net/projects/sevenmax/ package.  Syntax (will overwrite flac files) that I used, e.g.
   <directory-to-timer64>\timer64.exe flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
and then du or dir.
With timer64.exe, flac.exe and the .wav files in the same directory, I would run the following (let's keep it simple, no FOR loop that makes for the single/double percentage sign issue - should preferably be run a couple of times as timings are not ... exact, but then it might take quite a while).
Copy, put in a flactest.bat file in the same directory as timer64.exe, flac.exe and the .wav files. (Or modify accordingly.)
Open a cmd window, cd to the directory, and run .\flactest.bat - or you can double click it. 

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
The last one is going to be slow ... but in my testing, more than 3x as fast as the infamous -8ep.

Afterwards, you will get a logfile.log that lists, for each setting in order, the timing and the total .flac file size. I think the "Global time =" figure is the most interesting.
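If you would rather not maintain 43 hand-written lines, the same batch file can be generated. A minimal sketch, assuming the same directory layout as above; the function and list names are invented here, and the output matches the script above except for whitespace inside the options.

```python
# Option combinations tried on top of each base preset, in the order used above.
EXTRAS = [
    "",
    "-r7",
    "-r8",
    '-A "tukey(5e-1);subdivide_tukey(3)"',
    '-A "tukey(666e-3);subdivide_tukey(3/333e-3)"',
    '-A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7',
    '-A "subdivide_tukey(4)"',
    '-A "subdivide_tukey(4)" -r 7',
    '-A "subdivide_tukey(4)" -r 8',
    '-A "subdivide_tukey(5)"',
]


def flactest_lines(size_cmd="dir"):
    """Build the lines of flactest.bat; size_cmd is "dir" or "du"."""
    lines = [r".\flac -3pf *.wav"]  # untimed warm-up run
    runs = ["-7pf"] + [
        f"{base} {extra}".strip() for base in ("-8f", "-8pf") for extra in EXTRAS
    ]
    for opts in runs:
        lines.append(rf".\timer64.exe .\flac {opts} *.wav >> logfile.log")
        lines.append(f"{size_cmd} *.flac >> logfile.log")
    return lines


if __name__ == "__main__":
    print("\n".join(flactest_lines()))
```

Redirect the output into flactest.bat and run it as described. Adding a new setting is then one more entry in `EXTRAS`.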

If you are interested in how each file compresses, that is, the individual .flac sizes and not only the aggregate, you can replace the du command by the old-fashioned DOS command dir and run this instead:

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log

Then you will get tons of text out. Most interesting here I guess is still the total at the bottom of each dir.


So why these choices? Apart from the -3p that is there just to heat up the CPU a little in order not to let the -7p have an advantage of no-throttling?
-7 is synonymous to -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2) and -8 to same but "(3)" at the end. Next step would be (4) then? But at some stage, the "-p" will be more worth it. When? That's why the initial -7p too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - expected is that -r7 offers better value for money, of course, but how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones. Like -8 -A "tukey(5e-1);subdivide_tukey(3)"; isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So, trying that instead of the subdivide_tukey(4), hoping to catch more compression cheaper?
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": Here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent code, use this).
Then there is -p , and combinations.
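To make "closer to a rectangle" concrete, here is a sketch of the textbook Tukey (tapered cosine) window, where the parameter is the fraction of the window that is tapered. It is not copied from libFLAC, and the 0.1 used for comparison is only an illustrative "steep" value, not the parameter flac actually uses inside subdivide_tukey(3).

```python
import math


def tukey(n_samples, p):
    """Textbook Tukey window; p in [0, 1] is the tapered fraction of the window."""
    if p <= 0:
        return [1.0] * n_samples  # p = 0 degenerates to a rectangle
    edge = p * (n_samples - 1) / 2.0  # length of each cosine taper, in samples
    window = []
    for n in range(n_samples):
        d = min(n, n_samples - 1 - n)  # distance to the nearest end of the block
        if d < edge:
            window.append(0.5 * (1.0 - math.cos(math.pi * d / edge)))
        else:
            window.append(1.0)
    return window


def flat_fraction(n_samples, p):
    """Share of the window that sits at exactly 1.0 (the rectangular part)."""
    return sum(1 for x in tukey(n_samples, p) if x == 1.0) / n_samples


if __name__ == "__main__":
    for p in (0.5, 0.1):
        print(p, flat_fraction(4096, p))
```

The smaller the parameter, the larger the flat middle part, which is why adding an explicit tukey(5e-1) gives the encoder a genuinely different window to try.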
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-28 13:55:22
and then du or dir
f**c, "du" isn't in Windows out of the box?! That's what I get for installing sysinternals first day ...

Use the bottom one then.

Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-09-28 23:33:45
I did a little test with different builds posted here in the forum and my AMD Ryzen 5 3600X under Windows 11.
The file used is an almost 2 hours long DJ set of electronic music.
The fastest build was flac-1.4.1-win64-znver3 (Case).

Code: [Select]
Codec      :     PCM (WAV)
Duration   :     57:21:749
Sample rate:     48000 Hz
Channels   :     2
Bits per sample: 16

Igor Pavlov's timer64 has been used to measure the time.

Code: [Select]
timer64.exe flac -8p

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
Global  Time =    53.220
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-x64-znver2-GCC1220 (john33)
Code: [Select]
Global  Time =    53.621
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-win64-gcc12 (Case)
Code: [Select]
Global  Time =    56.626
wrote 425812106 bytes, ratio=0,644

FLAC-1.4.1_Win64_GCC122 (NetRanger)
Code: [Select]
Global  Time =    59.990
wrote 425812106 bytes, ratio=0,644

flac-1.4.1-x64-AVX2 (john33)
Code: [Select]
Global  Time =    73.045
wrote 425812100 bytes, ratio=0,644

flac-1.4.1-x64-AVX2-clang1500
Code: [Select]
Global  Time =    78.772
wrote 425812103 bytes, ratio=0,644

FLAC-1.4.1_Win64_Intel 19.2 (rarewares)
Code: [Select]
Global  Time =    79.738
wrote 425812100 bytes, ratio=0,644

FLAC-1.4.1_Win64_CLANG15 (NetRanger)
Code: [Select]
Global  Time =   100.662
wrote 425812106 bytes, ratio=0,644
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-28 23:46:15
Nearly a factor of two, that is quite a lot.
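For reference, the spread can be worked out from the "Global Time" figures posted above. A small sketch; the dictionary simply restates those numbers, and the variable names are mine.

```python
# Global Time in seconds for flac -8p, as posted by music_1 above.
TIMES = {
    "flac-1.4.1-win64-znver3 (Case)": 53.220,
    "flac-1.4.1-x64-znver2-GCC1220 (john33)": 53.621,
    "flac-1.4.1-win64-gcc12 (Case)": 56.626,
    "FLAC-1.4.1_Win64_GCC122 (NetRanger)": 59.990,
    "flac-1.4.1-x64-AVX2 (john33)": 73.045,
    "flac-1.4.1-x64-AVX2-clang1500": 78.772,
    "FLAC-1.4.1_Win64_Intel 19.2 (rarewares)": 79.738,
    "FLAC-1.4.1_Win64_CLANG15 (NetRanger)": 100.662,
}

fastest = min(TIMES, key=TIMES.get)
slowest = max(TIMES, key=TIMES.get)
ratio = TIMES[slowest] / TIMES[fastest]

if __name__ == "__main__":
    print(f"{slowest} takes {ratio:.2f}x as long as {fastest}")
```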

(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-09-29 00:11:17
(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)

Oops yes it's 57 minutes not two hours. My mistake.

Official build from Xiph against flac-1.4.1-win64-znver3 (Case).

Official Xiph
Code: [Select]
Global  Time =    53.937
wrote 425812106 bytes, ratio=0,644

win64-znver3 (Case)
Code: [Select]
Global  Time =    51.475
wrote 425812103 bytes, ratio=0,644
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 10:00:36
Testing the Acer (Ryzen-equipped) laptop, as that has more consistent timings (two fan settings, on + off?) - Intel considerations at the end;

... with the following:

Settings above -8, anyone?
[...]
So why these choices? Apart from the -3p that is there just to heat up the CPU a little in order not to let the -7p have an advantage of no-throttling?
-7 is synonymous to -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2) and -8 to same but "(3)" at the end. Next step would be (4) then? But at some stage, the "-p" will be more worth it. When? That's why the initial -7p too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - expected is that -r7 offers better value for money, of course, but how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones. Like -8 -A "tukey(5e-1);subdivide_tukey(3)"; isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So, trying that instead of the subdivide_tukey(4), hoping to catch more compression cheaper?
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": Here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent code, use this).
Then there is -p , and combinations.

The following was done on all the 38 CDs with no attempt at checking differences between musical genres. I did note though, that -7p sizes varied from larger than -8 to smaller than -8 -A subdivide_tukey(4), but if you are willing to wait for -7p you might as well select something better.

First observations:
* -r7 isn't ruled out at first observation; only later does it turn out not to be worth it on the Ryzen - but maybe it is on the Intel, see below. However, -r8 offers very little over -r7: about 1/15th of the size improvement per CPU second on the Ryzen, and not worth it on the Intel either, at least not until you slow it down considerably with the apodization functions. So in the following, I ditch all the -r8.
* -A "tukey(666e-3);subdivide_tukey(3/333e-3)" improved over -A  "tukey(5e-1);subdivide_tukey(3)" at about the same time - YMMV. I remove the latter, as it is only going to make numbers look weird. Maybe I should rather have removed the "customized" one?!
* -7p is not worth it. Better go up to -8 -A subdivide_tukey(4 or 5)

So with those deletions, I ordered by compressed size, and calculated: how many bytes can I save per second it costs moving up one step?
* -r7: not that much - it was cheaper to add another apodization function. Only tried up to subdivide_tukey(5).
* -p becomes worth it around at the subdivide_tukey(5) point: if you consider going up from -8 -A subdivide_tukey(4) to 5, consider another doubling of encoding time to -8p instead, as that pays off nearly the same per extra second;
* ... but, in this test, the extra tukey also is worth about the same.

Some numbers after more deletions - but kept a doubtful one, to be explained below the table:
Size (bytes) | seconds | saved per second | setting
11969604531 | 833 | | -8
11968502388 | 940 | 10300 | -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11967556575 | 1155 | 4399 | -8 -A "subdivide_tukey(4)"
11966463371 | 1555 | 2733 | -8 -A "subdivide_tukey(5)"
11961291433 | 3003 | 3572 | -8p (note jump in time when using -p)
11960179719 | 3350 | 3204 | -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11959250164 | 5131 | 522 | -8p -A "subdivide_tukey(4)"
11958125424 | 7796 | 422 | -8p -A "subdivide_tukey(5)"
The "saved per second" means: if you go from the previous setting to this, it is going to take more seconds, (e.g. 107 extra from -8 to the next), but how many bytes do you save per second?
Now the fact that the "2733" is less than the next "3572" indicates you should not use the -8 -A subdivide_tukey(5) - if you think those 400 seconds are worth the savings, you should rather spend even more seconds going to -8p. Deleting that row, the "3572" will also change - to 3390 - because it is relative to the "previous" row, which is now another one.  But YMMV here, it surely depends on material and hardware and maybe even on which of the compiles posted here you use.

For comparison, adding a "-r 7" to the -8p -A "subdivide_tukey(4)" line would save 132 bytes per extra second, so you could as well go to  -8p -A "subdivide_tukey(5)"; and, going from -r7 to -r8 makes for only 11.
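The "saved per second" column can be recomputed from the sizes and timings in the table above. A short sketch; the row data is copied from the table, while the function name and the rounding to whole bytes are my own choices.

```python
# (size in bytes, encoding time in seconds, setting), ordered by compressed size
ROWS = [
    (11969604531, 833, "-8"),
    (11968502388, 940, '-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11967556575, 1155, '-8 -A "subdivide_tukey(4)"'),
    (11966463371, 1555, '-8 -A "subdivide_tukey(5)"'),
    (11961291433, 3003, "-8p"),
    (11960179719, 3350, '-8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11959250164, 5131, '-8p -A "subdivide_tukey(4)"'),
    (11958125424, 7796, '-8p -A "subdivide_tukey(5)"'),
]


def saved_per_second(rows):
    """Bytes saved per extra second when moving from each row to the next one."""
    return [
        round((prev[0] - cur[0]) / (cur[1] - prev[1]))
        for prev, cur in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    print(saved_per_second(ROWS))
    # Drop the -8 -A "subdivide_tukey(5)" row: -8p is then compared
    # against -8 -A "subdivide_tukey(4)" instead.
    pruned = ROWS[:3] + ROWS[4:]
    print(saved_per_second(pruned))
```

A step is worth keeping only if its figure is not lower than the figure of the step after it; otherwise skipping straight to the next setting buys more bytes per second.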


Evidence from the Ryzen then:
* If you want something heavier than -8, but are not willing to wait for -8p, try another tukey as described, or for simplicity, upping the game to -A subdivide_tukey(4).
* If you want something heavier than -8p, try the same thing
* If you are willing to wait for something that takes >10x as long as -8 (>3x as long as -8p), then I haven't checked much. Maybe at this stage you could consider -r7 - maybe -r7 is worth it "even earlier" on the Intel (see next).



Over to the Intel-equipped Dell laptop, I found something similar, except:
* timings vary too much to be reliable "at every line", but applying the rule of "when in doubt, delete one that the Ryzen results suggest I delete" I end up with something that by and large appears to give the same expression - except possibly -r7. And with the reservation for unreliable timings taken:
* -r7 doesn't seem as worthless as on the Ryzen. Actually you can consider -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r7 rather than going to subdivide_tukey(4), or at least rather than going to (5). Or, for short, if you don't want to risk a typo: -8p -r7 might be worth it. YMMV.


Oh, and for reference, compared to WavPack's -x4 vs -x
* Going up to -8p is about like going -hx to -hx4 with WavPack in terms of gains per second. (-hx4 is considered very slow, but the gains are higher; -8p is not that slow, but saves about proportionally less).
* From -8p to -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" is about like going -hhx to -hhx4.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 11:54:29
appears to give the same expression
"impression" ...
(https://ih1.redbubble.net/image.2299611267.5326/tb,1000x1000,small-pad,750x1000,f8f8f8.jpg)

Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 15:08:21
To whom it may concern:
On my quest to find a setting for v1.4.1 that rivals my favourite encoding setup (v1.3.3 with -7 = my sweet spot for encoding time and compressed file size), I found these settings for v1.4.1.
The goal was to achieve the same (or better) encoding speed with better compression.
Code: [Select]
Reference:
FLAC Binary: flac133_case.exe
FLAC Option: -7
 Average time  = 22.682 seconds (5 rounds), Encoding speed = 476.67x
 FLAC file size = 1.168.025.916 Bytes (= 61,241% of WAV size)
For my timings I only used Case's Haswell build, since this was the fastest on my computer.
I haven't varied the windowing functions, because frankly I have little idea what I'm going to do there...
Code: [Select]
a) FLAC Option: -l11 -b4096 -m -r6 -A subdivide_tukey(2)
 Average time  = 22.927 seconds (3 rounds), Encoding speed = 471.58x <= worse encoding speed: 477x -> 472x
 FLAC file size = 1.167.741.823 Bytes (= 61,226% of WAV size) <= better compression: 0.015 percent points

b) FLAC Option: -l11 -b4096 -m -r5 -A subdivide_tukey(2)
 Average time  = 22.134 seconds (5 rounds), Encoding speed = 488.48x <= better encoding speed: 477x -> 488x
 FLAC file size = 1.167.807.739 Bytes (= 61,229% of WAV size) <= better compression: 0.012 percent points
 
c) FLAC Option: -l11 -b3072 -m -r5 -A subdivide_tukey(2)
 Average time  = 21.051 seconds (5 rounds), Encoding speed = 513.62x <= better encoding speed: 477x -> 514x
 FLAC file size = 1.167.708.945 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

d) FLAC Option: -l11 -b3584 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.729 seconds (3 rounds), Encoding speed = 521.58x <= best encoding speed: 477x -> 522x
 FLAC file size = 1.167.755.713 Bytes (= 61,227% of WAV size) <= better compression: 0.014 percent points

e) FLAC Option: -l11 -b3328 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.866 seconds (3 rounds), Encoding speed = 518.16x <= better encoding speed: 477x -> 518x
 FLAC file size = 1.167.700.585 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points
So it's basically -l11 and -r5 with variations of block size between 3072 and 4096 samples.
I also tested these settings with another list of WAV files (some 3 hrs of playing time, mostly rock music) and the ranking of the results was the same.
So it's going to be either d) (block size = 0x0E00) or e) (blocksize = 0x0D00) for me.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 15:41:59
The "natural" result first: As -r6 does -r5 plus another little try, a) compresses better than b). And it spends only a split second more. 

More odd is that a) outcompresses -7.  a) is basically just -7 with the "12th" order parameter forced to 0. The only explanation I can come up with, is the fact that FLAC guesstimates first and calculates more exactly when it has picked what it thinks is best, and here it seems that - contrary to guesstimate - putting that parameter to zero actually improves. It is known since back in the day that too high -l could lead to this, but it is a bit surprising that it still kicks in between 11 and 12, if only by 0.015 percentage points.

The others are alternative block sizes.  All of those are divisible by ... well all are indeed divisible by 256, so they don't put restrictions on -r.
Have you tried -b 2048 (edit: or 2304)? Sometimes that improves. 2048 means twice as many blocks as default -b 4096, so twice as big block overhead - but the other side of the coin is that each block has its own predictor, which means it could offer a better fit.

Also you don't need to type out all this. Synonyms:
a) -7 -l 11
b) -7 -l 11 -r 5
c) -7 -l 11 -r 5 -b 3072
d) -7 -l 11 -r 5 -b 3584
e) -7 -l 11 -r 5 -b 3328

Edit: Did you try "cde with -r 6"? Expected: tiny improvement at tiny cost, just like a) over b). Any case of -r5 outcompressing -r6 would, I suppose, be "due to guesstimation error".
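The divisibility remark can be checked quickly: a residual partition order r splits a block into 2^r equal partitions, so the block size has to be divisible by 2^r. A sketch that only looks at this divisibility rule; the encoder applies further limits that are ignored here, and the helper names are invented.

```python
def max_partition_order(blocksize):
    """Largest r such that blocksize is divisible by 2**r."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    r = 0
    while blocksize % 2 == 0:
        blocksize //= 2
        r += 1
    return r


def blocks_per_second(blocksize, sample_rate=44100):
    """How many blocks (and thus frame headers) one second of audio needs."""
    return sample_rate / blocksize


if __name__ == "__main__":
    for b in (4096, 3584, 3456, 3328, 3072, 2304, 2048):
        print(b, max_partition_order(b), round(blocks_per_second(b), 2))
```

By this rule 3072, 3584 and 3328 all allow at least -r 8, while 3456 (27 * 128) stops at 7. That does not matter at the default -r of -5, but it would for -r 8.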
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 16:50:44
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.

-5
Total encoding time: 0:48.563, 499.02x realtime
2066932342

-6
Total encoding time: 0:58.828, 411.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025

-7
Total encoding time: 1:07.625, 358.36x realtime
2056437870

PS: 3456 = 1152 * 3
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 17:08:06
@Porcus:
Quote
More odd is that a) outcompresses -7
It does not. It outcompresses -7 from v1.3.3.
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -7
 Average time  = 25.416 seconds (10 rounds), Encoding speed = 425.40x <= worse speed: 477x -> 425x
 FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size) <= better compression: 0.053 percent points
which was ruled out because it is much slower than -7 on v1.3.3

I also tried blocksizes of 2048 and >3584, but results are worse.

I'm aware of the synonyms, but I prefer to use the "full" settings in my test, just to see all the parameters and don't have to remember what "-7" stands for.

Going to test cde with -r6 asap...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 17:31:06
https://datatracker.ietf.org/doc/draft-ietf-cellar-flac/
Code: [Select]
10.1.1.  Blocksize bits

   Following the frame sync code and blocksize strategy bit are 4 bits
   referred to as the blocksize bits.  Their value relates to the
   blocksize according to the following table, where v is the value of
   the 4 bits as an unsigned number.  In case the blocksize bits code
   for an uncommon blocksize, this is stored after the coded number, see
   section uncommon blocksize (#uncommon-blocksize).

      +=================+===========================================+
      | Value           | Blocksize                                 |
      +=================+===========================================+
      | 0b0000          | reserved                                  |
      +-----------------+-------------------------------------------+
      | 0b0001          | 192                                       |
      +-----------------+-------------------------------------------+
      | 0b0010 - 0b0101 | 144 * (2^v), i.e. 576, 1152, 2304 or 4608 |
      +-----------------+-------------------------------------------+
      | 0b0110          | uncommon blocksize minus 1 stored as an   |
      |                 | 8-bit number                              |
      +-----------------+-------------------------------------------+
      | 0b0111          | uncommon blocksize minus 1 stored as a    |
      |                 | 16-bit number                             |
      +-----------------+-------------------------------------------+
      | 0b1000 - 0b1111 | 2^v, i.e. 256, 512, 1024, 2048, 4096,     |
      |                 | 8192, 16384 or 32768                      |
      +-----------------+-------------------------------------------+

                                  Table 13
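Read as code, the table amounts to the following. This is a sketch of the mapping only, not a frame header parser; the function name and the `extra` argument (the separately stored "blocksize minus 1" value) are assumptions for illustration.

```python
def blocksize_from_bits(v, extra=None):
    """Map the 4 blocksize bits (as an unsigned number v) to a block size.

    For v = 6 or 7 the size is "uncommon" and stored separately as
    blocksize minus 1 (8-bit or 16-bit); pass that stored value as extra.
    """
    if not 0 <= v <= 15:
        raise ValueError("v must fit in 4 bits")
    if v == 0:
        raise ValueError("0b0000 is reserved")
    if v == 1:
        return 192
    if 2 <= v <= 5:
        return 144 * 2 ** v  # 576, 1152, 2304 or 4608
    if v in (6, 7):
        if extra is None:
            raise ValueError("uncommon blocksize needs the stored value")
        return extra + 1
    return 2 ** v  # 256 ... 32768


if __name__ == "__main__":
    print(blocksize_from_bits(0b0100))        # one of the 144 * 2^v sizes
    print(blocksize_from_bits(0b1100))        # a power of two
    print(blocksize_from_bits(0b0111, 3455))  # uncommon size, 16-bit field
```

So 2304 and 4096 have their own codes, whereas something like 3456 has to be written out as an uncommon block size in every frame header.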
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 17:48:31
It does not. It outcompresses -7 from v1.3.3.
Ah, I cannot read.

But, try 1.3.3 at -5. Reason: You say -7 was your sweet spot, but then it is relevant: how much did you actually pay (in seconds and milliseconds) for the -7 compression improvement over -5?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 19:19:13
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.

-5
Total encoding time: 0:48.563, 499.02x realtime
2066932342

-6
Total encoding time: 0:58.828, 411.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025

-7
Total encoding time: 1:07.625, 358.36x realtime
2056437870

PS: 3456 = 1152 * 3
The tests above were done using Case's GCC 12.2.0 build (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265). I tried to disable AVX in BIOS to compare the differences but flac.exe simply crashed.

Tests below used the Xiph build which does not crash, the two sets of results showed the differences of using AVX or not.

5
Total encoding time: 0:54.640, 443.52x realtime
2066932338
Total encoding time: 0:50.906, 476.06x realtime
2066932342

6
Total encoding time: 1:09.578, 348.30x realtime
2062800850
Total encoding time: 1:01.516, 393.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:52.922, 457.92x realtime
2060849025
Total encoding time: 0:50.953, 475.62x realtime
2060849021

7
Total encoding time: 1:16.656, 316.14x realtime
2056437872
Total encoding time: 1:10.250, 344.97x realtime
2056437869
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 21:31:42
@Porcus:
Here are the -r6 results, side-by-side with the previously posted -r5:
(better/worse and faster/slower always compared to the "reference" v1.3.3 -7)
Code: [Select]
c5) FLAC Option: -l11 -b3072 -m -r5 -A subdivide_tukey(2)
 Average time  = 21.051 seconds (5 rounds), Encoding speed = 513.62x <= better encoding speed: 477x -> 514x
 FLAC file size = 1.167.708.945 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

c6) FLAC Option: -l11 -b3072 -m -r6 -A subdivide_tukey(2)
 Average time  = 22.011 seconds (3 rounds), Encoding speed = 491.21x <= faster encoding (477x -> 491x)
 FLAC file size = 1.167.688.587 Bytes (= 61,223% of WAV size) <= better compression: 0.018 percent points


d5) FLAC Option: -l11 -b3584 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.729 seconds (3 rounds), Encoding speed = 521.58x <= best encoding speed: 477x -> 522x
 FLAC file size = 1.167.755.713 Bytes (= 61,227% of WAV size) <= better compression: 0.014 percent points

d6) FLAC Option: -l11 -b3584 -m -r6 -A subdivide_tukey(2)
 Average time  = 21.606 seconds (3 rounds), Encoding speed = 500.41x <= faster encoding (477x -> 500x)
 FLAC file size = 1.167.713.160 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points


e5) FLAC Option: -l11 -b3328 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.866 seconds (3 rounds), Encoding speed = 518.16x <= better encoding speed: 477x -> 518x
 FLAC file size = 1.167.700.585 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

e6) FLAC Option: -l11 -b3328 -m -r6 -A subdivide_tukey(2)
 Average time  = 21.926 seconds (3 rounds), Encoding speed = 493.11x <= faster encoding (477x -> 493x)
 FLAC file size = 1.167.669.980 Bytes (= 61,222% of WAV size) <= better compression: 0.019 percent points

tldr; speed is clearly slower with -r6 while compression gains are "nothing to write home about"  ;)

Talking about -5: I found my sweet spot at -7 because I had used -8 ever since I went with FLAC, and here the gain in speed was remarkable while I didn't care about the compression loss of 0.039 percent points. Using -5 would boost my encoding speed by some 50% while losing 0.43% of disk space. Really worth considering if you're planning to re-encode your whole collection.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 22:07:43
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.
-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025
Just tried sundance's fastest setting on my data using the same Case GCC 12.2.0 build:

-l11 -b3584 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:54.484, 444.79x realtime
2058772911
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 22:33:40
That's faster here too, but it loses compression, so it misses my goal (to achieve the same or better encoding speed with better compression than -7 on v1.3.3)...
In the end, you'll have to make up your mind what you're after...  ;)
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l12 -b3456 <= bennetng setting
 Average time  = 16.319 seconds (3 rounds), Encoding speed = 662.55x <= way faster (477x -> 662x)
 FLAC file size = 1.168.849.826 Bytes (= 61,284% of WAV size) <= worse compression: -0.043 percent points
But your blocksize is very close to my fastest setting along with better compression:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l11 -b3456 -m -r5 -A subdivide_tukey(2) <= bennetng block size
 Average time  = 20.825 seconds (3 rounds), Encoding speed = 519.18x <= faster encoding (477x -> 519x)
 FLAC file size = 1.167.698.586 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 23:00:22
My data with -l11 -b3456 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:55.047, 440.24x realtime
2058918957

Also worth noting that my data set's uncompressed size is 4274945852 bytes, so it is quite different from your data set.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 23:27:47
@bennetng:
Why do you think a larger data set makes a remarkable difference? Because it covers a greater variety of music/genre/styles or something I didn't think of yet? Btw. my encoding times are one-file-at-a-time, single core, running the timer64'd flac binary in a console window.
Just out of curiosity: How did you find your "magic" blocksize of 3456? Did you try all blocksizes with an interval of 128 bytes or was this a "lucky punch"?  ;) 
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 06:26:51
@bennetng:
Why do you think a larger data set makes a remarkable difference? Because it covers a greater variety of music/genre/styles or something I didn't think of yet? Btw. my encoding times are one-file-at-a-time, single core, running the timer64'd flac binary in a console window.
Just out of curiosity: How did you find your "magic" blocksize of 3456? Did you try all blocksizes with an interval of 128 bytes or was this a "lucky punch"?  ;)
What I meant was that your compressed data set is in general around 61.2% of PCM, but mine is in general around 48.2%; even with -5 it is still around 48.35% (see Reply #52). This means that the baseline compression ratios are quite different. So mixing both data sets yields something like 54.7% for better corpus averaging. For example, one of ktf's plots also showed a similar ratio:
http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf

The table in Reply #54 shows some common blocksizes including 576, 1152, 2304 or 4608. 1152 is used in presets 0-2, and 4096 for 3-8. So 3456 is just a convenient number I got from the original presets.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 07:21:19
The 7 CDs used:


Hdcd Sampler Volume 2
https://www.discogs.com/release/6921177-Various-Hdcd-Sampler-Volume-2


Kaitou Saint Tail Original Soundtrack (Disc 1)
https://vgmdb.net/album/61630


Ondekoza (VDR-25231)
Can't find a suitable link for this specific CD, but in general Japanese arrangements featuring Taiko and Shamisen.
https://en.wikipedia.org/wiki/Ondekoza


Persona 2: Innocent Sin ~ The Errors of Their Youth
https://vgmdb.net/album/4383


Picture Of Primitive Hunting (Chinese Ancient Music)
https://www.discogs.com/release/16142867-Various-%E5%8E%9F%E5%A7%8B%E7%8B%A9%E7%8C%8E%E5%9B%BE-%E4%B8%AD%E5%9B%BD%E5%8F%A4%E4%B9%90-Picture-Of-Primitive-Hunting-Chinese-Ancient-Music


Tchaikovsky : 1812, Marche slave
https://www.amazon.com/Tchaikovsky-Marche-slave-Peter-Ilyich/dp/B000001GDT


何婉盈 / Elaine
https://youtu.be/kB7vQQ7hmcg
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-30 08:18:57
Seems your test corpus is easier to encode than my stuff (80% Classic Rock, some 10% Blues, no Classical tracks, no speech). Sadly, a lot of the music from the 90s and later suffers from heavy dynamic range compression (DR < 6) and is a challenge for audio compression.
Just finished some tests with a random selection of 160 audio files (all CDDA) from my collection (WAV file size = 6.111.491.436 bytes) and the results don't differ much from my regular test set:
FLAC file size = 3.736.404.536 Bytes (= 61,137% of WAV size, avg. bitrate = 863 kbps)
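
The percentage and the average bitrate in these reports are two views of the same ratio. A minimal sketch in Python, assuming plain CDDA input (44.1 kHz, 16 bit, stereo) and ignoring the few bytes of WAV header; the function name is made up for illustration:

```python
# CD audio (CDDA): 44100 samples/s, 16 bits per sample, 2 channels.
CDDA_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps uncompressed


def average_bitrate_kbps(flac_bytes: int, wav_bytes: int) -> float:
    """Average FLAC bitrate implied by the compressed/uncompressed size ratio."""
    return flac_bytes / wav_bytes * CDDA_KBPS


# Sizes quoted above for the random selection of 160 CDDA files.
ratio = 3_736_404_536 / 6_111_491_436
print(f"{ratio:.3%} of WAV size")
print(round(average_bitrate_kbps(3_736_404_536, 6_111_491_436)), "kbps")  # 863 kbps
```

Read the other way round, an average of 918 kbps on CDDA corresponds to roughly 65% of the WAV size.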
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 08:51:46
A lot of metal here, CDDA averages 918 or something with 1.3.x at -8.
Classical music section encodes to the low 600s even if there are (literal!) tons of loud organ pipes.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-30 09:12:52
My neighbour was nice enough to give me a Classical CD (Mozart) for this test:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l11 -b3456 -m -r5 -A subdivide_tukey(2)
 Average time  = 5.128 seconds (3 rounds), Encoding speed = 553.66x
 FLAC file size = 220.191.908 Bytes (= 43,964% of WAV size, avg. bitrate = 620 kbps)

FLAC Binary: flac133_case.exe (718848 Bytes)
FLAC Option: -7
 Average time  = 5.712 seconds (3 rounds), Encoding speed = 496.99x
 FLAC file size = 220.654.634 Bytes (= 44,056% of WAV size, avg. bitrate = 622 kbps)
So compression of this kind of music is in bennetng's ballpark. And still outperforms v1.3.3 -7  :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 13:11:59
Some free or pay-what-you-can CD resolution audio if people want to test other genres (and broaden their taste).

https://akelei.bandcamp.com/album/de-zwaarte-van-het-doorstane
https://soundcloud.com/blackmooncircle/sets/andromeda (hit the arrow and the download button, you will get .wav I think)
https://blacktapeforabluegirl.bandcamp.com/album/highlights-name-your-price
original master: https://bongripper.bandcamp.com/album/satan-worshipping-doom
same, a remaster: https://bongripper.bandcamp.com/album/satan-worshipping-doom-2020-remaster
https://kavakon.bandcamp.com/album/virgin-lava
https://nadja.bandcamp.com/album/autopergamene
https://thereisnoreasonforanyofthistohavehappened.bandcamp.com/album/til-eru-hr-sem-hafa-aldrei-veri-menn-og-munu-aldrei-ver-a-au-lifi-enn
https://udom.bandcamp.com/album/and-be-no-more
https://zeitgeber-aus.bandcamp.com/album/heteronomy

and Bach's organ works, I used one of them in my test corpus: http://www.blockmrecords.org/bach/download.htm
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 13:31:22
Here is a very biased corpus with only EDM music showing -b2880 is optimal. I would try something divisible by 512 or 576 without further subdividing, as the differences are too small.

EINHÄNDER ORIGINAL SOUNDTRACK
https://vgmdb.net/album/14

Dariusburst Original Soundtrack
https://vgmdb.net/album/16136

carpe diem "SENKO no RONDE" ORIGINAL SOUND TRACKS Volume 2
https://vgmdb.net/album/4419

BORDER DOWN -Sound Tracks-
https://vgmdb.net/album/311

PCM
2822769308

-6
Total encoding time: 0:40.313, 396.94x realtime
1937591945
68.6415%

-l11 -b3456 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.360, 428.32x realtime
1933346426
68.4911%

-l11 -b2560 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.734, 424.07x realtime
1933285028
68.4889%

-l11 -b3072 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.500, 426.72x realtime
1933147042
68.4841%

-l11 -b2880 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.938, 421.79x realtime
1933105922
68.4826%

-7
Total encoding time: 0:45.453, 352.05x realtime
1932486517
68.4607%
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 14:20:41
-b 3456 makes for larger files in my corpus (on first few tests). 0.05 to 0.06 percent, so not much, but not the other way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-01 13:33:12
What FLAC settings should you avoid?
(... if you have my CDDA test corpus and no special considerations.)


Idea: How many bytes do you get for spending an extra second encoding? 
You would pick the low-hanging fruit first. That means, settings that pick "expensive" improvements should be avoided until you are squeezing the last drops out. For example, in my tests, -e is not worth it because you can get the same improvement cheaper elsewhere. (At least, up to settings "nobody" will want to use.) It surely is material dependent; for example, @Gravity Stupor has posted two examples here (https://hydrogenaud.io/index.php/topic,122949.msg1015422.html#msg1015422) and here (https://hydrogenaud.io/index.php/topic,122949.msg1016038.html#msg1016038) about -e actually not being completely dead.


Some initial tl;dr's:
* Avoid -e.
* Avoid -6.
* -p: only  for "-8p", as -7p is not worth it. If you want something in between -8 and -8p, then -8 -A subdivide_tukey(4) is not a bad thing - and maybe even -8 -A subdivide_tukey(5) also makes sense, but somewhere around there you would rather jump to -8p.
* -r8 is not worth it. -r7 ... that depends. I tried an Intel-equipped Dell business laptop and a Ryzen-equipped Acer consumer laptop, and I wouldn't use -r7 on the latter, it took too much time. The Intel Dell shows quite a bit of timing variability, but it seems it does the -r part a bit faster for some strange reason - anyway, I guess it is only in consideration for those who are already at -8p even if I could find a "non-p" setting where it wasn't hopeless?

Note that -8 is "-7 but changing the subdivide_tukey from 2 to 3" [there are some fine details about that, but forget those], so the natural continuation would be to increase that number - and if you for simplicity apply the rule of thumb that above -5, there is -7, -8, -8p and higher subdivide_tukey(N), you won't do that much wrong.
That said, -7 -l 11 can serve the purpose @sundance tested it to obtain. And myself I found a higher-than-8 customized setting that worked fairly well - the -A "tukey(666e-3);subdivide_tukey(3/333e-3)" which takes the -8 windowing functions, the -5 tapering function, makes them more different from each other and combines them. But that might be a spurious result. But these are fine-tunings, and don't change the impression that the "natural choices" are pretty good with the exceptions in the above bullet items.


Assumptions made:
* FLAC subset or bust - and you will anyway choose -4 or higher
* If you think saving B bytes for a second extra encoding time is acceptable improvement, you will also accept waiting another second for another B bytes saving.  Only when the savings per second falls, you've had enough.
The dubious part of this assumption is that it requires you to behave as if you knew the outcome in advance.
* FLAC decodes fast enough for you: you don't care about decoding CPU footprint.  See https://hydrogenaud.io/index.php/topic,123025.msg1016398.html#msg1016398 and the bottom chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf .
Nothing decodes as fast as FLAC, but if you want something for mass-decoding from SSD or the like, you might want to use -6 or oddballs like -6p or -8p -l 8.
* My hardware and test corpus (and official 1.4 build) are sane enough for testing :-)

One implication is that curves like the 3rd diagram at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 "should be convex" (i.e.: stretch a ribbon around it, you won't select a point that is above the ribbon).  You see that -6 violates that: it lies above the straight line from the -5 to the -7 points.  And so you should not choose -6: if you are willing to wait for the improvement -5 to -6, you would also be willing to wait for -7.
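
The "ribbon" test can be sketched in a few lines of Python. This is only an illustration of the convexity argument, not anything from the flac tools: the function name is made up, and the input is bennetng's single-run -5/-6/-7 result (Case GCC 12.2.0 build) posted earlier in this thread, so it inherits all the timing noise of one run.

```python
def efficient_settings(results):
    """Return the settings on the lower convex frontier of (seconds, bytes).

    results maps a setting name to (encoding time in seconds, file size in bytes).
    A setting is dropped if it is slower without being smaller, or if it lies
    on or above the straight line between its two neighbours on the frontier.
    """
    points = sorted((secs, size, name) for name, (secs, size) in results.items())
    frontier = []
    for secs, size, name in points:
        # Slower but not smaller than something already kept: dominated.
        if frontier and size >= frontier[-1][1]:
            continue
        while len(frontier) >= 2:
            t0, s0, _ = frontier[-2]
            t1, s1, _ = frontier[-1]
            # Positive cross product: the middle point lies below the chord.
            cross = (t1 - t0) * (size - s0) - (s1 - s0) * (secs - t0)
            if cross > 0:
                break
            frontier.pop()  # middle point is on or above the ribbon
        frontier.append((secs, size, name))
    return [name for _, _, name in frontier]


timings = {
    "-5": (48.563, 2066932342),
    "-6": (58.828, 2062800852),
    "-7": (67.625, 2056437870),
}
print(efficient_settings(timings))  # ['-5', '-7']
```

On that data -6 drops out, in line with the argument above. Adding the "-l 12 -b 3456" result from the same post (47.531 s, 2060849025 bytes) would also push out -5, since it was both faster and smaller there.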

I am in the land of imprecise measurements, as this involves dividing by the difference between times that may vary between runs.  So I did three measurements, picked the median time for each "genre section" (see signature link), each for an Intel-equipped Dell business laptop and a much cheaper Ryzen-equipped Acer Aspire.  Even so, the latter gives the more reliable timings.


Settings tested:
-4 (well even more) to -8 with or without -e or -p. 
-8 with higher subdivide_tukey(N); that is the same as -7 with higher subdivide_tukey(N): -7 implies N=2, upping it to N=3 yields standard -8, so also N=4 to N=8 are a natural continuation. (Edit: to test. Not saying the borderline between natural to use and not, lies precisely between N=8 and N=9.)
-8 with an additional function; partial_tukey(2) was tested for having a different taper parameter than subdivide_tukey(3), but "more promising" was  -A "tukey(666e-3);subdivide_tukey(3/333e-3)" that adds "only one" tukey function, but changes the tapering "in opposite directions" to make them more different.
-8ep
Also various -r settings, but not on everything.
Finally, a few results with the lighter-than-7 -7 -l 11 because @sundance started on it for time saving purposes. -7 -l 11 isn't a bad thing! (Could also have tested -l 10 then, but I didn't.)
With reference to @sundance 's testing: alternative -b settings were of no use here.


To the results:
What eliminates -e, is that you can get better compression cheaper. Smaller and faster than -8e are -8 -A subdivide_tukey(5) to (7). Smaller and faster than -8ep are -8p -A subdivide_tukey(5) to (7).
What eliminates -6, is the "convexity" argument: if you are willing to pay for the improvement from -5 to -6, then the improvement from -6 to -7 is so much cheaper: the next byte saved costs less time than the previous.
What eliminates -r 8 is the same as -6. Although, I have not checked whether -8p -A subdivide_tukey(big number) -r 8 can be improved upon by -8p -A subdivide_tukey(bigger number) when "big" is too high.
And I don't think -r7 is useful until you are already at least at -8p.
Since -r7 is "questionable", one may wonder whether -r5 saves a lot of time with insignificant size difference? Turns out that it doesn't save much time. If I allow myself to deviate a little bit from the assumptions, I would say "at best you won't bother".

When to use -p? As said above, somewhere around -8 -A subdivide_tukey(4) to (5) you will rather take the (quite big!) time cost to get the benefit, as increasing the subdivide_tukey parameter will increase time at small benefit.
At -8p you are at about the same s(h)avings per second as going from WavPack -hx to -hx4.

Then the convexity argument could even rule out -5 in favour of "either -4 or -7" - even more so if one finds a good "lighter -7 alternative". Of course -5 will be used quite a lot out of being the default, but the argument for -4 would be as follows: If you think -5 is better than -7, as the time saved makes up for the size, you get about as good a time saving per megabyte to go to -4. (Not to -3, reveals a brief check.)

The above considerations don't appear to depend heavily on what "genre section" of my corpus I used. Sure some features are more pronounced on this or that, and there might possibly be an odd "error" in the sense that if I had deleted myself down to only one, I would have eliminated otherwise - but the big picture remains.

Lighter than -7, you said? Actually, -7 -l 11 might be considered. It saves some ten percent time on the total; but more significant, the extra time -5 to -7 is cut by a third. But the size gain from -5 to -7 then? Oh, you only forego ten percent of that.

And finally, how do all these considerations compare to FLAC 1.3? Not tested so much, but as you can expect: the double precision improvement picks some low-hanging fruit, so going -5 to -7 is not as lucrative as it was. On the Ryzen: 1.3.x -5 to -7 would save you 143 kilobytes per extra second taken, and this number is now reduced to 105. For -7 to -8, you would save 21 kilobytes per extra second of encoding, now down to 8.


Some numerical examples: on the Acer/Ryzen, not the most expensive CPU but I got more consistent timings.
-5 takes 419 seconds for 12034842978 bytes, -7 takes 603 seconds for 11976656017 bytes. Size difference divided by time difference becomes 316 kilobytes.
But if instead we went only to -7 -l 11, that would be 535 seconds for 11982589839 bytes. Now we can calculate two sizediff/timediff ratios: -5 to -7 -l 11 becomes 449, while -7 -l 11 to -7 becomes 87.
Those are as they should be, 449 is > 87; had the order been the other way around, -7 -l 11 would have been outright bad.
-7 to -8: saves about 30k per second. Again, makes sense that this is < 87, that means we pick those two fruits in the correct order.
-8 to -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"  (mentioned in a post above): saves about 10k per second
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"  to -8 -A subdivide_tukey(4): saves about 4k.
Going forth to (5) and then from there to -8p and then from -8p to -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)": around 3k at each step.
-8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" to -8p -A subdivide_tukey(4): down to half a k. Only slightly less from (4) to (5)

What about -r7 being bad (on the Ryzen)? From -8p to -8p -r7 you only save like 0.36 kilobytes per extra second. And from -8p -r7 to -8p -r8: only 22 bytes. But it looks like the Intel i7 does -r7 faster than the Ryzen does and can be worth it somewhere.
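
For anyone who wants to redo the arithmetic on their own timings, here is a small Python sketch of the "bytes saved per extra second" calculation, fed with the Ryzen figures quoted above. The function name is made up. Because it uses the whole-second times as printed, the first step comes out at about 450 kB/s where the post says 449, presumably a rounding difference in the times.

```python
def marginal_rates(results):
    """Bytes saved per extra second between consecutive settings.

    results is a list of (name, seconds, bytes), ordered from fastest to slowest.
    """
    rates = []
    for (n0, t0, s0), (n1, t1, s1) in zip(results, results[1:]):
        rates.append((f"{n0} -> {n1}", (s0 - s1) / (t1 - t0)))
    return rates


ryzen = [
    ("-5", 419, 12034842978),
    ("-7 -l 11", 535, 11982589839),
    ("-7", 603, 11976656017),
]

for step, rate in marginal_rates(ryzen):
    print(f"{step}: {rate / 1000:.0f} kB per extra second")

# Skipping the intermediate setting gives the direct -5 -> -7 figure.
direct = marginal_rates([ryzen[0], ryzen[2]])[0][1]
print(f"-5 -> -7: {direct / 1000:.0f} kB per extra second")  # 316

# The settings are picked "in the correct order" when the rates keep falling.
rates = [rate for _, rate in marginal_rates(ryzen)]
print(all(a > b for a, b in zip(rates, rates[1:])))  # True
```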


Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-01 23:16:17
Oh, and:
To speed up -7, one could of course consider dropping the subdivide_tukey(2) (that would then default to subdivide_tukey(1)=tukey(5e-1)). But that gives worse results than going down to -7 -l 11.

Then a stupid error, essentially a common factor of three due to three runs:
And finally, how do all these considerations compare to FLAC 1.3? Not tested so much, but as you can expect: the double precision improvement picks some low-hanging fruit, so going -5 to -7 is not as lucrative as it was. On the Ryzen: 1.3.x -5 to -7 would save you 143 kilobytes per extra second taken, and this number is now reduced to 105. For -7 to -8, you would save 21 kilobytes per extra second of encoding, now down to 8.
Wrong, that was sum over three runs. Luckily, the relationships between them are all good: you can multiply them all by three (well that gets you mean rather than the median I have elsewhere used, but, no big deal).
-5 takes 419 seconds for 12034842978 bytes, -7 takes 603 seconds for 11976656017 bytes. Size difference divided by time difference becomes 316 kilobytes.
This is correct with 1.4.1: Divide 316 by three and you get 105-ish. (Three quite even runs and roundoffs ...)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 08:55:39
@john33
I tried some files from the 2L website.
http://www.2l.no/hires/
Code: [Select]
filename                              MD5 sum
-------------------------------------------------------------------------------
2L-038_01_stereo_FLAC_44k_16b.flac             80b5c0c20168c21073c95699a0bbf992
2L-064_stereo192kHz_01_08.flac                 576ed036eb1ffe78bdb40a131e6dd23f
2L-120_01_stereo.mqacd.mqa.flac                622afbe406784c4cd54226350b2289c9
2L-125_stereo-352k-24b_04.flac                 3f3ffbdb84654e7fb22767fedcfaa30e
2L-139_01_stereo.mqa.flac                      78f2e3afc697cbc4d3ff43567e968187
2L48SACD_14_stereo_96k.flac                    1fb4209b9db97a0089baf37ca5846214
The files are then converted to wav without changing the original bit-depth and sample rate on a RAM drive, then converted to flac with these settings.
X

Then I disabled and enabled AVX support in BIOS, and CPU-Z reported that AVX, AVX2 and FMA3 are being affected. Some flac builds crashed with AVX disabled, so here are the tests on some non-crashing builds:

Xiph
Total encoding time: 2:32.250, 14.12x realtime
425513526 bytes
Total encoding time: 1:12.562, 29.63x realtime
425513433 bytes

Free encoder pack
Total encoding time: 2:17.781, 15.60x realtime
425513499 bytes
Total encoding time: 1:36.328, 22.32x realtime
425513499 bytes

https://hydrogenaud.io/index.php/topic,123014.msg1016215.html#msg1016215
Total encoding time: 2:26.344, 14.69x realtime
425513526 bytes
Total encoding time: 1:13.328, 29.32x realtime
425513471 bytes

https://www.rarewares.org/files/lossless/flac-1.4.1-x64.zip
Total encoding time: 2:35.859, 13.79x realtime
425513666 bytes
Total encoding time: 2:36.781, 13.71x realtime
425513632 bytes

The rarewares build is the only one that showed almost no speed difference. Is this expected?
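
To put a number on "almost no speed difference", the encoding times above can be turned into speedup factors. A rough Python sketch; it assumes the first time listed for each build is the run with AVX disabled and the second the run with AVX enabled, and the helper names are made up:

```python
def to_seconds(stamp: str) -> float:
    """Convert flac's 'm:ss.mmm' total encoding time to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def avx_speedup(without_avx: str, with_avx: str) -> float:
    """How many times faster the AVX-enabled run was."""
    return to_seconds(without_avx) / to_seconds(with_avx)


# (time with AVX disabled, time with AVX enabled), in the order assumed above.
timings = {
    "Xiph": ("2:32.250", "1:12.562"),
    "Free encoder pack": ("2:17.781", "1:36.328"),
    "rarewares generic x64": ("2:35.859", "2:36.781"),
}

for build, (off, on) in timings.items():
    print(f"{build}: {avx_speedup(off, on):.2f}x")
```

On these figures the Xiph build roughly doubles its speed with AVX available, while the generic rarewares build stays within about one percent.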
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-10-02 09:02:56
The rarewares build is generic with no cpu optimisations so I'm not really surprised.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 10:18:19
Thanks. Here are results with some AVX-only builds.

Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-znver2-GCC1220.zip
Total encoding time: 1:13.328, 29.32x realtime
425513429 bytes

znver3 (https://hydrogenaud.io/index.php/topic,123014.msg1016407.html#msg1016407)
Total encoding time: 1:11.891, 29.91x realtime
425513429 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-AVX2%20-GCC1220.zip
Total encoding time: 1:12.250, 29.76x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

It seems that the Ryzen builds have no compatibility issue with my Intel CPU.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-02 13:07:22
Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

As far as I can tell, the only difference between these two builds is GCC version (12.2.0 for the former, 7.3.0 for the latter). @Case is that correct?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 15:05:02
https://hydrogenaud.io/index.php/topic,123025.msg1016339.html#msg1016339
The differences are quite consistent on multi-thread tests as well.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Case on 2022-10-02 17:22:07
Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

As far as I can tell, the only difference between these two builds is GCC version (12.2.0 for the former, 7.3.0 for the latter). @Case is that correct?
That is correct. The builds use identical configuration settings but different compiler version.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-02 21:45:33
That is correct. The builds use identical configuration settings but different compiler version.

Not huge, but a pretty decent improvement on GCC's part. Obviously enough to make it jump from last to 1st place in that particular list.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-03 01:36:40
One quick and dirty metaflac performance test. I apply album RG to 20 24bit albums. 18,63GB altogether on my Ryzen 5900x.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here: (https://hydrogenaud.io/index.php/topic,123014.msg1016842.html#msg1016842)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 02:28:59
Finally found some time to set up GCC. I used the flags Case kindly offered but set -Ofast instead of -O3 in CFLAGS.
The binary is even a bit faster over here. Maybe others want to try because -Ofast may include optimizations that make problems. It is flac from current git.

btw. if this is faster it would be nice if Case could do a 1.4.1 official flac and metaflac. I am not so sure about the preconfigured scripts I use. If you want recent git versions Netranger is the man. He has a routine for that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 09:10:14
Yes, faster on my i3-12100 too. The first result is 4 CDs combined into a single .wav, the second result is the same 4 CDs split into 85 files. All tests used -8p.

Case GCC 12.2.0
Total encoding time: 2:23.610, 101.98x realtime
1465547067 bytes
Total encoding time: 0:33.735, 434.16x realtime
1465976688 bytes

Wombat GCC 12.2.0 Ofast
Total encoding time: 2:19.765, 104.79x realtime
1465547074 bytes
Total encoding time: 0:32.875, 445.52x realtime
1465977283 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-05 11:30:01
Code: [Select]
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 15:21:14
Maybe others want to try because -Ofast may include optimizations that make problems.
Googled a bit and it seems that -Ofast is not an optimization on processor architecture, it does optimization by changing some floating point logic. Would like to see comments from some developers about whether it is safe to use or not in the case of FLAC.
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-10-05 16:04:33
Actual definition is:
Quote
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 16:25:25
The created files are exactly the same here using -O3 or -Ofast.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-05 16:32:22
floating point logic. Would like to see comments from some developers about whether it is safe to use or not in the case of FLAC.

No reason it should be unsafe. You use whatever-you-like to come up with a predictor, good or bad - the difference between a "good" and a "bad" one being the size of the residual. At the end of this post, (https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227) ktf refers to this Stackexchange question (https://math.stackexchange.com/questions/4488974/efficient-way-of-solving-a-matrix-equation-with-integer-solution) where the point is described: it rounds off, and maybe that gives a slightly suboptimal predictor, which may be a reason why -p makes for a better performance "than it should".

Different compiles leading to slightly different .flac files has been a FAQ item for ages, and lo and behold they still do even if Wombat did not experience such differences right here.  https://hydrogenaud.io/index.php/topic,122949.msg1015699.html#msg1015699 . Unlike Monkey's - where signal and mode (Normal, High etc) give a unique encoded file except header/footer (tags) stuff and thus Monkey's chooses an MD5 on the encode and not the PCM - there are literally millions of potential FLAC files that represent the same audio.
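
A quick way to check that two differently sized .flac files claim the same audio is to compare the MD5 of the unencoded audio that FLAC stores in the STREAMINFO block (the last 16 of its 34 bytes). A minimal Python sketch with made-up function names; it assumes the file starts directly with the "fLaC" marker (no ID3v2 tag in front), and it only reads what the encoder wrote, so it is no substitute for a real decode test such as flac -t:

```python
def streaminfo_md5(data: bytes) -> str:
    """Return the PCM MD5 stored in a FLAC stream's STREAMINFO block, as hex."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (or tag data precedes the marker)")
    block_type = data[4] & 0x7F           # low 7 bits; top bit is the last-block flag
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    streaminfo = data[8:8 + length]
    return streaminfo[18:34].hex()         # bytes 18..33 hold the MD5 signature


def same_audio_signature(path_a: str, path_b: str) -> bool:
    """True if both files carry the same stored PCM MD5."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return streaminfo_md5(a.read(42)) == streaminfo_md5(b.read(42))
```

An all-zero MD5 means the encoder did not store a signature, so two such files would compare equal without that meaning anything.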
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 16:49:10
It is already shown on other tests that different compiles created different files which decoded to the same PCM output, no MD5 error and such, like this:
[edit: corrected wrong link]
https://hydrogenaud.io/index.php/topic,123025.msg1016817.html#msg1016817

But this article indicated that for example, Infinities and NaNs can be treated differently:
https://simonbyrne.github.io/notes/fastmath/

So it is not only a math precision issue; it depends on what the higher-level code expects the program to do. I am not talking about the risk of producing non-bit-perfect FLAC files, but about the risk of significant slowdowns or crashes when encoding some inputs, especially crafted ones.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Vladeimir on 2022-10-05 17:44:10
The FMA intrinsics are compiled with "-ffast-math".

https://github.com/xiph/flac/blob/master/src/libFLAC/include/private/cpu.h#L110
https://github.com/xiph/flac/blob/master/src/libFLAC/lpc_intrin_fma.c#L46

I am not sure why the SSE and AVX ones are not.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 18:03:56
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
That's why a -Ofast compile from Case's environment could be interesting.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-05 18:05:16
The FMA intrinsics are compiled with "-ffast-math".
[...]
I am not sure why the SSE and AVX ones are not.
Because the SSE and AVX code is written with intrinsics, while the FMA code is plain C compiled for FMA targets. For SSE and AVX, instructions need not be reordered, but with FMA they do, so -fassociative-math is needed, which is part of -ffast-math.
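As a side note on why reordering needs an explicit flag: floating-point addition is not associative, so a compiler may only regroup a sum when it has been told that is acceptable. A minimal illustration with Python doubles (the values are arbitrary, chosen only to make the effect obvious):

```python
big, small = 1e20, 1.0

left_to_right = (big + small) - big   # small is lost when added to big first
reordered = (big - big) + small       # algebraically the same expression

print(left_to_right, reordered)
```

The two results differ even though the expressions are equal on paper, which is the kind of change -fassociative-math permits.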
Title: Re: FLAC v1.4.x Performance Tests
Post by: Vladeimir on 2022-10-05 18:27:44
That makes total sense, thank you.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-06 04:58:34
A gcc -Ofast compile of flac and metaflac from reference lib flac 1.4.1.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 11:22:46
subdivide_tukey(N/taper), testing the tapering parameter

Background: As explained in the docs (https://xiph.org/flac/documentation_tools_flac.html), subdivide_tukey(N) by default tapers a fraction of 0.5 divided by N. The "/" is not a division slash here, only a separator character: subdivide_tukey(N/P) means that a fraction P/N is tapered.
P defaults to 0.5 - as has been the case for the default windowing function up to level -5, and before 1.4 also up to -8.
The question is why divide the 0.5 (or other P) by N? Must be: ktf &co have tested it and found it improves.

Indeed, I found some evidence that it "is not enough": if you bother to tweak, make the tapering parameter even smaller. 
Test for yourselves, it is likely material-dependent. And use scientific notation like 8e-2 rather than the locale-dependent 0.08 or 0,08.

What I did: ran tapering parameters 8e-2, 16e-2, upwards in steps of 8, with N=3, 4, 5. So, subdivide_tukey(3/8e-2), subdivide_tukey(3/16e-2), ... and then bumping the "3" up to 4 and 5. (Not hitting -8 exactly - I did subdivide_tukey(3/48e-2) and not 3/50e-2 , but I had already a standard -8 ... not that it mattered.) Why stop at 5? Because testing indicated that around there, -8p becomes more attractive.

Results: For all of N=3 (as in -8!), N=4 and N=5, the taper parameter that made for the smallest files was 24e-2, i.e. 0.24 rather than the default 0.5. Then 0.32 was marginally better than 0.16.

This points at a smaller taper parameter than the one third in the "arbitrarily tested" combination in Reply 48 (https://hydrogenaud.io/index.php/topic,123025.msg1016625.html#msg1016625).

Also checked (this preliminary): as in Reply 48, combining with a bigger single tukey.
Hypothesis: because single tukey has always had the default parameter 0.5 - this after quite a bit of testing back in the day - there is no good reason that this small tapering should be good for a single tukey run, ==> reason why it works is for the subdivisions ==> if you want to improve, try one with a bigger taper parameter like -5 uses.
This to be tested with -p, because - I guess! - there is more of a demand for something that squeezes out more than -8p without wasting the month, than for something between -8 and -8p. (However, since -7 is so good, it might be a case for beefing up -8 further.)
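For readers who want to see what the tapering parameter does, below is a sketch of a textbook Tukey (tapered cosine) window, plus the P/N relation described above. It is an assumption that this matches libFLAC closely enough for illustration; how libFLAC builds its subdivided windows is not reproduced here, and the function names are made up.

```python
import math

def tukey(n, p):
    """Tukey (tapered cosine) window of n samples; p is the tapered fraction."""
    if p <= 0:
        return [1.0] * n
    if p >= 1:
        p = 1.0
    edge = p * (n - 1) / 2.0          # length of each cosine ramp, in samples
    w = []
    for i in range(n):
        d = min(i, n - 1 - i)         # distance to the nearest end
        if d < edge:
            w.append(0.5 * (1.0 - math.cos(math.pi * d / edge)))
        else:
            w.append(1.0)
    return w

def effective_taper(n_sub, p=0.5):
    """Fraction tapered by subdivide_tukey(n_sub/p), as described in the post."""
    return p / n_sub

print(effective_taper(3), effective_taper(3, 0.24))
```

A smaller p leaves more of the block at full weight, so the window gets closer to rectangular.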
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 15:11:38
Code: [Select]
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
Could you do the same test on the Xiph build as well? Because it is interesting that your speed ranking is opposite to mine. For me Wombat's build is the fastest while Case's GCC 7.3.0 is the slowest.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-06 15:46:11
One quick and dirty metaflac performance test. I apply album RG to 20 24bit albums. 18,63GB altogether on my Ryzen 5900x.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here: (https://hydrogenaud.io/index.php/topic,123014.msg1016842.html#msg1016842)
2:12 for the -Ofast version.
All files I created on the Ryzen are the same as those from the -O3 compile. The further -Ofast optimizations work well together with the flac code.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 16:34:11
@bennetng: Here you go:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
- Average time =  25.288 seconds (5 rounds), Encoding speed = 427.56x (a little faster today... ;)
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: xiph-141\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.598 seconds (5 rounds), Encoding speed = 391.77x
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
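The figures in these listings are tied together by simple arithmetic. A small sketch (function names are made up; the total audio duration is not stated in the thread, so it is back-derived here from the Haswell run above):

```python
def encoding_speed(audio_seconds, encode_seconds):
    """Speed as a multiple of realtime, e.g. 427.56 means 427.56x."""
    return audio_seconds / encode_seconds

def bitrate_kbps(file_bytes, audio_seconds):
    """Average bitrate in kbit/s (1 kbit = 1000 bit)."""
    return file_bytes * 8 / audio_seconds / 1000

# Back-derived from the Haswell run above: 25.288 s at 427.56x realtime.
audio_seconds = 25.288 * 427.56

print(round(encoding_speed(audio_seconds, 27.598), 2))   # Xiph build
print(round(bitrate_kbps(1_167_014_383, audio_seconds)))
```

With roughly three hours of audio this reproduces the speed reported for the Xiph build and the ~863 kbps figure to within rounding.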
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 17:03:54
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 18:31:08
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.
Four generations, four years - but nothing revolutionary in the architecture? I agree that it is kinda unexpected.
Thinking aloud:
* You are both running them single-threaded? They differ: 6 cores 12 threads  vs 4 cores 8 threads.
* Is there any reason that e.g. RAM should matter?
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 18:46:17
My test environment is this:
CPU: Intel Core i7-8700 CPU @ 3.20GHz
RAM: 2 x 16 GB DDR4-2666 (1333 MHz) SK-Hynix
HDD: Samsung SSD 860 EVO 500GB

Both the source WAVs and the created FLACs come from/go to that SSD.
Btw. is there any (= freeware, trusted, non-system-cluttering) RAM disk solution for Windows 10 to recommend?
And yes, I'm running the test single core, file by file, from a console window. Timing is done with Igor Pavlov's timer64.exe.
But I did not try to select a specific CPU (I think there are tools for this), but let the Windows task manager do its job. So when you watch the task manager, there's not a single CPU constantly at 100% while the test is running, but CPUs are swapped.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 19:50:38
In my case, single or multi-thread does not affect speed ranking. For example Case's GCC 7.3.0 Haswell compile is always the slowest in both single and multi-thread tests.

For RAM, I am using a budget motherboard which only supports DDR4, even though the CPU supports DDR5. DDR4 has been mainstream for more than 5 years. I am using 2x8GB DDR4 3200.

As for AVX, AVX2 and FMA3, the 2013 Intel Haswell (4th gen) already supports all of them, and I was using i3-4160 before February this year.

In fact, Intel 12th gen does not officially support AVX-512, but some of the older Core i does, even though flac 1.4.x does not seem to use AVX-512 at all.

I am using this RAM disk:
https://sourceforge.net/projects/imdisk-toolkit/
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 20:20:05
I am using this RAM disk:
https://sourceforge.net/projects/imdisk-toolkit/
Since you use (and like) it, I'm gonna give it a try.
Does this RAM Disk hold your WAVs and FLACs during your performance tests?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 20:48:46
Does this RAM Disk hold your WAVs and FLACs during your performance tests?
Yes, all files are in the RAM drive, but I don't use timer64, I use foobar's console for timing. To enforce a single encoder instance, either combine everything into a single file, or do this in foobar's converter dialog:
https://hydrogenaud.io/index.php/topic,123025.msg1016809.html#msg1016809

Also, if relevant, I always use FAT32 to format the RAM drive, as NTFS is a more complex file system and occupies more space when formatted. The limitation is FAT32 only allows up to 4GB for a single file. If you have 32GB it should be no issue to create at least a 24GB RAM drive, but a single file cannot exceed 4GB if formatted in FAT32.

Make sure "Create virtual disk in physical memory" is selected when creating the RAM disk.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 21:13:56
Tried in foobar2000 like you suggested (single thread, 40 WAVs):
-> Total encoding time: 0:39.531, 273.51x realtime (single thread)
-> Total encoding time: 0:06.688, 1616.65x realtime (allow multiple threads), around 6x faster (matches the 6 cores)
But the single thread encode is way slower compared to flac.exe started in a console window: 0:25.288

Btw. Tested the RAM disk (NTFS) and the encoding time improved by around 200 msec (0.8%) + you extend your SSDs lifetime...
Going to repeat the test with FAT32...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 21:36:37
My previous tests with CDDA including -7 and other settings:
https://hydrogenaud.io/index.php/topic,123025.msg1016652.html#msg1016652

The important thing is relative speed ranking, for example, is Case's GCC v7.3.0 Haswell compile still the fastest when the test method is changed?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 21:43:38
In fact, Intel 12th gen does not officially support AVX-512, but some of the older Core i does, even though flac 1.4.x does not seem to use AVX-512 at all.

Sundance's i7-8gen (https://ark.intel.com/content/www/us/en/ark/products/126686/intel-core-i78700-processor-12m-cache-up-to-4-60-ghz.html) doesn't either, it seems. Your i3-12gen here, (https://ark.intel.com/content/www/us/en/ark/products/134584/intel-core-i312100-processor-12m-cache-up-to-4-30-ghz.html) the same instruction set extensions are listed.

However the 12th generation boasts the fancy name of "Gaussian & Neural Accelerator" which, at the risk of just parroting marketing spin,  "is an ultra-low power accelerator block designed to run audio and speed-centric AI workloads. Intel® GNA is designed to run audio based neural networks at ultra-low power, while simultaneously relieving the CPU of this workload."
Not sure if anything will utilize that?!
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 21:49:26
I disabled GNA in BIOS in all tests.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-07 09:25:44
Seems I'm quite limited here with my hp BIOS @ Elitedesk 800 G4.
There are no settings to disable any of the extended CPU features; I can only toggle "Multithreading" and "VTx"...  :(
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 09:30:46
The FMA intrinsics are compiled with "-ffast-math".
[...]
I am not sure why the SSE and AVX ones are not.
Because the SSE and AVX code is written with intrinsics, while the FMA code is plain C compiled for FMA targets. For SSE and AVX, instructions need not be reordered, but with FMA they do, so -fassociative-math is needed, which is part of -ffast-math.
Does it mean using -Ofast globally can affect something completely unrelated, like the progress indicator? Are there in-code directives to prevent such global optimizations in certain parts of the code?

My experience with vectorization is limited to GPU shaders and game engines, without touching low-level stuff like intrinsics.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-07 15:14:34
Btw. Tested the RAM disk (NTFS) and the encoding time improved by around 200 msec (0.8%) + you extend your SSDs lifetime...
Going to repeat the test with FAT32...
I use the softperfect RAMdisk and exFAT is clearly the fastest with it but has a very big overhead due to its 64k cluster size. It shouldn't matter unless you use lots of small files on it.
Until recently Windows had an uppercase renaming bug with exFAT. That has now been fixed.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 16:10:30
Seems I'm quite limited here with my hp BIOS @ Elitedesk 800 G4.
There are no settings to disable any of the extended CPU features; I can only toggle "Multithreading" and "VTx"...  :(
Motherboards being sold separately like the ones from Asus, Gigabyte, MSI and such usually offer more options.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 17:51:16
Size         seconds  saved per second  setting
11969604531      833                    -8
11968502388      940             10300  -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11967556575     1155              4399  -8 -A "subdivide_tukey(4)"
11966463371     1555              2733  -8 -A "subdivide_tukey(5)"
11961291433     3003              3572  -8p (note jump in time when using -p)
11960179719     3350              3204  -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11959250164     5131               522  -8p -A "subdivide_tukey(4)"
11958125424     7796               422  -8p -A "subdivide_tukey(5)"
How about this setting on your corpus?
-8 -A "tukey(75e-2);subdivide_tukey(3/25e-2)"
Of course I asked this because it works better on my corpus (about one day of duration), and I adjusted the corpus weighting so that the compression ratio is roughly 55%. You can also try other values which don't require rounding, for example 666e-3 may mean something like 0.66600000858306884765625 in single float.
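A quick way to check which parameter values survive storage in a single-precision float unchanged (a sketch only; whether flac actually parses the parameter as single or double is not checked here, and the helper name is made up):

```python
import struct
from decimal import Decimal

def as_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

for text in ("666e-3", "75e-2", "25e-2", "125e-3"):
    value = float(text)
    stored = as_float32(value)
    print(text, Decimal(stored), "exact" if stored == value else "rounded")
```

Values that are sums of a few negative powers of two, such as 0.75, 0.25 and 0.125, come through unchanged, while 0.666 does not.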

Title: Re: FLAC v1.4.x Performance Tests
Post by: Octocontrabass on 2022-10-07 19:15:32
GNA [...] Not sure if anything will utilize that?!
It's like a GPU but much smaller. FLAC won't use it.

Does it mean using -Ofast globally can affect something completely irrelevant like progress indicator and such?
Potentially yes, but in practice most floating-point code doesn't rely on the compiler precisely following the floating-point standards. If the progress indicator is affected, you probably wouldn't be able to see what's different.

The real problem with -Ofast is that it can insert code that switches the CPU into a faster but not standards-compliant mode, and this can affect any program that loads a library compiled with -Ofast.
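The post does not name the mode; the assumption here is that it refers to flush-to-zero / denormals-are-zero, which GCC's fast-math startup code enables on x86. The sketch below shows, in ordinary Python doubles, the IEEE 754 guarantee that such a mode removes:

```python
import sys

smallest_normal = sys.float_info.min      # about 2.2e-308 for doubles
a = 1.5 * smallest_normal
b = 1.0 * smallest_normal

diff = a - b                              # 0.5 * smallest_normal: subnormal

# Under standard IEEE 754 arithmetic, a != b guarantees a - b != 0.
# With flush-to-zero enabled, diff would be 0.0 and this guarantee is gone.
print(a != b, diff != 0.0, diff < smallest_normal)
```

Code that divides by such a difference after checking a != b is the classic victim.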
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 20:13:51
I somehow understand why this Audition bug happened and what the OptimFROG author wanted to correct:
https://hydrogenaud.io/index.php/topic,114816.msg1009053.html#msg1009053
My program will definitely fail if done in the -Ofast way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-08 10:01:44
How about this setting on your corpus?
-8 -A "tukey(75e-2);subdivide_tukey(3/25e-2)"
Of course I asked this because it works better on my corpus (about one day of duration),

Improves - because of the 25e-2.  The difference between 75e-2 and 666e-3 in the single tukey is ambiguous over genre (the latter is better in the classical section, the former in the "other"), but the overall impact is less than a part per million.
Tested same with "-p" added.

But lowering the subdivide_tukey tapering parameter helps and I think it should be even lower. I tested and found -A subdivide_tukey(24e-2) (https://hydrogenaud.io/index.php/topic,123025.msg1017066.html#msg1017066) to be a good one without the additional -A tukey, but preliminary testing indicates that 25e-2 is "too high" in the presence of that.

The 666 & 333 were not "optimal" choices - they were picked more out of the idea that if I wanted to deviate from 1/2 and 1/2 parameters, then "2/3 and 1/3" would be the next idea. I surely tested both 666&333 and 333&666, but I didn't do any exhaustive testing. So why then state with this three-decimal "accuracy"? Hey, 3/333e-3 is easy to remember. (And then the metal swine selected 666 over 667 for kinda the same reason.)


You can also try other values which don't require rounding, for example 666e-3 may mean something like 0.66600000858306884765625 in single float.
The predictor is rounded off to integer, so decimals beyond some kth will in the very least not matter very often. Quick testing on 11 CD images, starting from your 0.75 & 0.25, I got bit-identical files if I tweaked the fifth decimal, but the fourth would matter. I mean, not "matter" much, but yield different files.
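A simplified illustration of why far-out decimals stop mattering (this is not libFLAC's quantizer, which chooses precision and shift in its own way; the coefficients and the shift of 10 bits are hypothetical):

```python
def quantize(coeffs, shift):
    """Scale float coefficients by 2**shift and round to integers."""
    return [round(c * (1 << shift)) for c in coeffs]

coeffs = [1.8312, -0.9127, 0.0764]            # hypothetical float predictor
nudged = [c + 1e-6 for c in coeffs]           # tiny change, e.g. from a window tweak
shifted = [c + 1e-2 for c in coeffs]          # larger change

print(quantize(coeffs, 10) == quantize(nudged, 10))
print(quantize(coeffs, 10) == quantize(shifted, 10))
```

A tiny change in the float predictor usually lands on the same integers and hence the same file; a larger one does not.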
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-09 11:49:11
Tested: To "-8" and above, added a tukey to a subdivide_tukey, various taperings tested. Do these choices make (much) different impact across genres? (No!)
-8p -A "tukey(Q);subdivide_tukey(N/P)" for N=3, 4, 5 and various P and Q.
Also without "-p".

Of course it doesn't matter much! On one hand, you can shrug it off as nothing by saying that for N=3, the extra tukey - with "optimal" parameters - saves 0.01 percent over standard -8p, and good/bad parameters make for only half of this. Nothing to care about? On the other hand, it is only slightly less than going up to N=4, and slightly more than going from N=4 to N=5. Each of those cost much more time.
So if standard -8p is not enough for you - well for the sports of it I guess - and you are ready to type in some -A manually, you might as well consider this. Same if you want to go up from -8 but without all the way to -8p; then you can just remove the "p" from the below, your material is likely to make more difference than that.

tl;dr: if adding an additional tukey to get ~half the benefit of higher subdivide_tukey at a fraction of the extra time, make its tapering parameter bigger than default (well maybe default if you are at very high compression) and the subdivide_tukey taper parameter very small.
If you like to think in 1/16ths terms: after a bit tweaking, you could try something like 11, 10, 9 or 10, 9, 8 combined with a 1/8 as follows:
N=3: -8p -A "tukey(6875e-4);subdivide_tukey(3/125e-3)" <---- 11/16ths & 1/8th, or reduce the first to 10/16ths for classical music
N=4: -8p -A "tukey(6250e-4);subdivide_tukey(4/125e-3)" <---- 10/16ths & 1/8th, or reduce the first to 9/16ths for classical music. Yes keep the 1/8th.
N=5: -8p -A "tukey(5625e-4);subdivide_tukey(5/125e-3)" <---- 9/16ths & 1/8th, or reduce the first to 5e-1 for classical music. Again keep the 1/8th.

But the genre differences between classical, heavier/metal and "other" didn't cause much drama - not even "relatively" to the very small impact of it all. That is kinda reassuring; even if classical music could use N/2e-1, it gained virtually nothing going down to N/<one eighth>.


So just to explain what I did here:
Also checked (this preliminary): as in Reply 48, combining with a bigger single tukey.
Hypothesis: because single tukey has always had the default parameter 0.5 - this after quite a bit of testing back in the day - there is no good reason that this small tapering should be good for a single tukey run, ==> reason why it works is for the subdivisions ==> if you want to improve, try one with a bigger taper parameter like -5 uses.
This to be tested with -p
I first made the "arbitrary" selection (files with "j" in the name) and then ran the test on the remainder, distinguishing between the classical music, the heavy rock/metal and the "other".
The P and Q are "7e-2", "14e-2" etc., i.e. 0.07 apart, though only the "most reasonable" ones tested on the big corpus. Then tweaked the parameters slightly from the "best", if only to see if small tweaks led to unexpectedly big changes. (They did not.)

Results: Well not unexpected given Reply 48: Make the Q and P tapering parameters quite far from each other, as tukey(<big Q>);subdivide_tukey(N/<small P>). The "big" does not mean close to 1, though.
Genre differences: Nothing dramatic - nothing "relatively dramatic" relative to the .01 percent impact either. Sure there is a clear pattern in that the heavier music wants smaller P, down below 0.1, and also slightly bigger Q, but not much - and the classical music calls for slightly lower Q. But the "overall" minimum is not far (in kilobytes) from each genre's minimum.

So the first runs ended up with
N=3: -8p -A tukey(70e-2);subdivide_tukey(3/14e-2)
N=4: -8p -A tukey(56e-2);subdivide_tukey(4/14e-2)
N=5: -8p -A tukey(49e-2);subdivide_tukey(5/14e-2)
Tweaking it and looking at genre differences, I ended up with something like up there with the tl;dr. It was the classical music section that made the "56" and "49" win, and it is the heavier section that pulls the other direction. The 14 was a bit too high except for classical music where it mattered very very little, like a few kb on 4 giga.
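For anyone scripting such sweeps, a small helper can build the -A strings in locale-independent notation from exact fractions. This is only a sketch; the function names are made up, and it prints the commands rather than running flac:

```python
from fractions import Fraction

def sci(frac):
    """Write an exact decimal fraction as '<digits>e-<k>', e.g. 11/16 -> '6875e-4'."""
    for k in range(0, 20):
        scaled = frac * 10 ** k
        if scaled.denominator == 1:
            return f"{scaled.numerator}e-{k}" if k else str(scaled.numerator)
    raise ValueError(f"{frac} has no short exact decimal form")

def apodization(n, tukey_frac, sub_frac):
    """Build the -A argument for one extra tukey plus subdivide_tukey(n/...)."""
    return f"tukey({sci(tukey_frac)});subdivide_tukey({n}/{sci(sub_frac)})"

for n, sixteenths in ((3, 11), (4, 10), (5, 9)):
    print("-8p -A", '"' + apodization(n, Fraction(sixteenths, 16), Fraction(1, 8)) + '"')
```

The helper writes 10/16 as 625e-3 rather than 6250e-4, which is the same value.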
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-09 16:22:48
With only -8 subdivide_tukey(3/x) I got these figures, from best to worst compression:

Difficult content (~70.64% compression ratio)

3/1875e-4
2531723115 bytes

3/2e-1
2531723128 bytes

3/22e-2
2531723205 bytes

3/25e-2
2531724460 bytes

3/125e-3
2531724619 bytes

-8
2531763292 bytes

Difficult contents are the usual electronic music in my collection, and some loudness war songs. Simple contents include speech, classical, ethnic and songs with simple accompaniment.

Simple content (~43.13% compression ratio)

3/25e-2
1799030423 bytes

3/2e-1
1799035729 bytes

3/22e-2
1799046777 bytes

3/1875e-4
1799054794 bytes

3/125e-3
1799080973 bytes

-8
1799116764 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-09 18:45:33
With only -8 subdivide_tukey(3/x) I got these figures, from best to worst compression:

Here is where I actually got a weirdness: .21 was worse than both .20 and .22. And the effect was not due to one genre. Tested a few more because in Reply #93 (https://hydrogenaud.io/index.php/topic,123025.msg1017066.html#msg1017066) I found .32 to be better than .16 over all three, so the below results point at a parameter slightly less than expected.

Anyway, disregarding .21 and doing (nearly) only your parameters, results are not outrageously far from yours, but slightly different - I suspect your speech content makes some impact?
 2e-1 was the best for both classical music and the "other" section. .22 was better than .1875 in these two genre sections. .2 also won the overall.

For my heavier material, go lower: 3/125e-3 is better than .1875 better than .2 better than .22 better than .25
Also checked 1e-1, which narrowly lost to 125e-3.

Impact of choosing "wrong": With your "simple" content, even the difference between the two best was like 3 parts per million. For my classical music, everything from .1875 and up would be within that interval, and same for the "other" genre.
But for your "difficult" material, everything from 125e-3 and up fell within one ppm, and my material needed 4ppm.
Not much still.


The low tapering parameter I found in Reply 115 just underlines that with an additional tukey, you want the two tukeys to be different.
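The parts-per-million figures for bennetng's numbers can be recomputed as follows (sizes copied from the quoted post; the helper name is made up):

```python
def ppm(size, best):
    """Extra size relative to the best result, in parts per million."""
    return (size - best) / best * 1e6

# Sizes in bytes for -8 -A "subdivide_tukey(3/x)", copied from the quoted post.
simple = {"25e-2": 1799030423, "2e-1": 1799035729, "22e-2": 1799046777,
          "1875e-4": 1799054794, "125e-3": 1799080973}
difficult = {"1875e-4": 2531723115, "2e-1": 2531723128, "22e-2": 2531723205,
             "25e-2": 2531724460, "125e-3": 2531724619}

for name, sizes in (("simple", simple), ("difficult", difficult)):
    best = min(sizes.values())
    for x, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{name:9s} 3/{x:8s} +{ppm(size, best):6.2f} ppm")
```

For the simple content the runner-up is about 3 ppm behind the best, and for the difficult content all five parameters stay within 1 ppm of each other.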
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-10 04:11:56
Guess this is my last try.
I am now also using the ./configure options Case suggested, plus -Ofast and -fipa-pta, which was suggested elsewhere. -fipa-pta optimizes a tiny bit and saves a few kB in the binaries at the cost only of compile time.

Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-10 08:22:47
On my set of test files, your latest (really hopefully not last) build is right between Case's gcc v12.2 and gcc v7.3 builds:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes) = gcc v7.3
FLAC Option: -7
 Average time =  25.268 seconds (3 rounds), Encoding speed = 427.89x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat2.exe (784384 bytes)
FLAC Option: -7
 Average time =  25.710 seconds (3 rounds), Encoding speed = 420.54x
 FLAC size = 1.167.014.381 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  26.100 seconds (3 rounds), Encoding speed = 414.26x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

And, fwiw, I was able to get some speed gain compared to plain -7 (on my test set [classic rock music]) @ almost no cost with smaller block size:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7 -b3584
 Average time =  23.949 seconds (3 rounds), Encoding speed = 451.46x <= faster encoding (428x -> 451x) [ compared to -7]
 FLAC size = 1.167.032.442 bytes (= 61,189% of WAV size, ~863 kbps) <= min. worse compression: 0.001 percent points
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-10 17:07:28
Added more electronic and loudness-war content, hand-picked to include only the highest-bitrate files, but not containing any noise music. Around 74.5% compression ratio.

1.3.1 (Xiph)

-8 -b2304
3200412387 bytes

-8
3202131236 bytes

1.3.2 (Xiph)

-8 -b2304
3200203911 bytes

-8
3201989505 bytes

1.4.1 (Case GCC 12.2.0)

-8 -b2304
3199429338 bytes

-8
3201122995 bytes

-8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
3201407279 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-10 18:22:45
Yikes, I suck at PowerShell ...

Can anyone hack together for me a script that does the following:

FOR every *.flac IN (D:\given path pattern...\*.flac) DO flac <parameters> with output <same filename except that in E: rather than D>
and measures total CPU time and total time including I/O?

Point being: how much "compression effort" is "free in time" because it compresses while busy writing?
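Not PowerShell, but a rough Python sketch of the same idea, under stated assumptions: the flac parameters and paths are placeholders, the function names are made up, and the CPU figure comes from os.times(), whose child-process fields stay at zero on Windows. There the wall time is still valid, but CPU time would need another tool (timer64, mentioned earlier in the thread, reports it).

```python
import os
import subprocess
import time
from pathlib import Path

def time_batch(sources, out_dir, make_command):
    """Run make_command(src, dst) for every source file, one at a time.

    Returns (wall_seconds, child_cpu_seconds). CPU time comes from
    os.times(), whose child fields stay at zero on Windows, so there the
    second value is not meaningful and another tool would be needed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cpu_before = os.times()
    wall_before = time.perf_counter()
    for src in sources:
        dst = out_dir / Path(src).name        # same file name, other location
        subprocess.run(make_command(str(src), str(dst)), check=True)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    cpu = ((cpu_after.children_user - cpu_before.children_user)
           + (cpu_after.children_system - cpu_before.children_system))
    return wall, cpu

def flac_command(src, dst):
    # Hypothetical parameters; replace "-8" with whatever is being tested.
    return ["flac", "--silent", "-8", "-f", "-o", dst, src]

# Example (paths are placeholders):
# wall, cpu = time_batch(Path("D:/music").glob("*.flac"), "E:/out", flac_command)
# print(f"wall {wall:.1f} s, CPU {cpu:.1f} s, waiting on I/O etc. {wall - cpu:.1f} s")
```

For a single-threaded encoder, the gap between wall time and CPU time is a first estimate of how much time was spent waiting on reads and writes.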
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-11 17:50:16
lol. Why look so far. :) When searching the web for compiler options guess where it leads to?
far far away (https://hydrogenaud.io/index.php/topic,118008.msg982030.html#msg982030)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-11 20:18:51
lol. Why look so far. :) When searching the web for compiler options guess where it leads to?
far far away (https://hydrogenaud.io/index.php/topic,118008.msg982030.html#msg982030)
"This is likely to be my last build"

Cue 2022:
Guess this is my last try.

Porcus quoting self:
Quote from: Porcus
rehab is for quitters

 O:)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-11 21:04:20
You got me  :-[  but somehow it is too much fun  :)
Guess I have to try some more and maybe a 'skylake' version for sundance to test when I am at my PC later.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-11 21:09:14
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476

This might make compiling with -march=native much more rewarding when combined with --disable-asm-optimizations. I've changed the code in such a way that it is much easier for a compiler to vectorize. Currently the intrinsics routines cannot really be tuned by a compiler, but with this change a compiler can use the C code to get an even better result.

I've seen improvements of over 10% with preset 8, when run with -march=native. One could then use AVX512 for example. I don't have access to hardware with AVX512, so I can't say whether that would make sense.

Also, as a bug was found in libFLAC that affects playback with gstreamer, I won't wait long with releasing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-11 22:30:56
much easier to vectorize by a compiler.

Speaking of which, GCC 12 vectorizes even at -O2:
Quote from: https://gcc.gnu.org/gcc-12/changes.html
Vectorization is enabled at -O2 which is now equivalent to the original -O2 -ftree-vectorize -fvect-cost-model=very-cheap.

Not sure how much of an impact this particular change has here given that pretty much everyone just builds FLAC with -O3, but it's interesting nonetheless.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 01:47:09
I took the flags GrieverV uses back with the 1.3.3 version. I left out the math flags because they are part of -Ofast already.
The compile options as single steps are hard to measure here but using them all together creates clearly smaller binaries with a small speed advantage.
A -8p single file encode is now at ~110x vs ~108x or ~1070x vs ~1058x for multiple files in foobar.
I have attached a gcc skylake tuned version also. No difference for me on the 5900x against the haswell tuning but others may test.
And now let's see what ktf's git offers :)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 02:53:43
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476
I compiled reference libFLAC git-3d55a9dc 20221009 in 3 ways but left all additional flags in, sorry.
Nonetheless strange results.
I added --disable-asm-optimizations to ../configure
Compare the numbers to my posts above.

mtune=native is really slow!
408x
40x
Since I have a Zen 3 5900x I tried mtune=znver3 and it crawls exactly as slowly.

Finally, with mtune=haswell the numbers are almost normal, but not fast.
981.51x
104x

It may be that it collides with the additional flags, but I wonder why mtune=haswell works.

Edit: the same slowness for mtune=native without fancy additional flags
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-12 06:36:56
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476
I compiled reference libFLAC git-3d55a9dc 20221009 in 3 ways but left all additional flags in, sorry.
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Checkout branch libFLAC-fast-math
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 07:41:07
@Wombat : Tested your builds here:
Code: [Select]
Reference:
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
 Average time =  25.384 seconds (5 rounds), Encoding speed = 425.94x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat-manyflags.exe (718848 bytes)
FLAC Option: -7
 Average time =  25.283 seconds (5 rounds), Encoding speed = 427.65x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat-manyflags-skylake.exe (712192 bytes)
FLAC Option: -7
 Average time =  26.346 seconds (5 rounds), Encoding speed = 410.39x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
So your "manyflags" build with GrieverV settings is a little faster than Case's Haswell build here. But, oddly enough, your Skylake build is slower although my 8th gen i7 is a family member...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 11:15:14
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476

This might make compiling with -march=native much more rewarding when combined with --disable-asm-optimizations. I've changed the code in such a way that it is much easier for a compiler to vectorize. Currently the intrinsics routines cannot really be tuned by a compiler, but with this change a compiler can use the C code to get an even better result.

I've seen improvements of over 10% with preset 8, when run with -march=native. One could then use AVX512 for example. I don't have access to hardware with AVX512, so I can't say whether that would make sense.

Also, as a bug was found in libFLAC that affects playback with gstreamer, I won't wait long with releasing.
Intel's way of dealing with AVX-512 in 12th gen Core i is completely unfair to the non-K i5 and i3, as they don't use E-cores, so there should be no compatibility issue with AVX-512. @Porcus 's CPU should support AVX-512?
On this CPU, an 11th generation i7 mobile

Also thanks for looking into the -ffast-math issue.
Title: Re: FLAC v1.4.x Performance Tests
Post by: cid42 on 2022-10-12 11:57:22
...
Intel's way of dealing with AVX-512 in 12th gen Core i is completely unfair to the non-K i5 and i3, as they don't use E-cores, so there should be no compatibility issue with AVX-512. @Porcus 's CPU should support AVX-512?
Earlier 12th gen parts didn't have AVX512 fused off, so some motherboard+BIOS combinations allowed you to enable AVX512 if you disabled the E-cores. It was disabled across the board so Intel didn't have to validate it, and because otherwise cheaper models would perform better than expensive models in some situations, which would not be a good look: marketing nonsense. AFAIK newer 12th gen runs have unfortunately disabled AVX512 properly.

Muddying the waters a bit more is that Zen 4's AVX512 implementation differs in some key areas (some better some worse, some instruction-chaining performs well/poorly on one arch but not the other, etc), adds to the benchmarking fun: https://mersenneforum.org/showthread.php?t=28102
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-12 12:17:00
Heck, this sounds like "fun" ...

Question:
As go differences between compiles: if build X is faster than build Y, is that
* due to "fewer instructions" executed (--> less heat generated)
or
* due to "instructions queued more efficiently" and some CPU-internal parallelization (--> same heat generated in shorter time)

- or a combination of both? 

On a cooling-constrained setup (laptop!) that makes differences - which depend critically on how much you actually are FLACing at one run:
For someone who acquires a lossless album, does the tagging, and then (re-)compresses it to get everything from a tiny improvement to a large depending on the source file - that is when you will actually watch the thing run to the end, right? - then one might be pretty much done with the album before the CPU needs to wipe sweat? Long-term energy usage that would need to be dissipated during an overnight job is simply not the yardstick then.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 14:49:31
So your "manyflags" build with GrieverV settings is a little faster than Case's Haswell build here. But, oddly enough, your Skylake build is slower although my 8th gen i7 is a family member...
Nice. It may be that GCC 12.2.0 handles skylake differently than older versions did, and that even though your 8700 is a Coffee Lake, it does better with the haswell optimizations.

Also thanks for looking into the -ffast-math issue.
What exactly was this math issue?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 15:58:24
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Checkout branch libFLAC-fast-math
WOW! Great job!
This time really the fast-math files :)

Compiled with
haswell
~1290x
~135x

and

native
~1280x
~136x

Most likely only measuring tolerance. It identifies as reference libFLAC 1.4.1 20220922. Is it OK to offer it here?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 16:12:37
What exactly was this math issue?
As mentioned by ktf:
https://github.com/xiph/flac/pull/476
There are a lot of online resources on this denormal topic, for example in this interactive demo:
https://www.h-schmidt.net/FloatConverter/IEEE754.html
You can toggle the checkboxes to see the numeric representations. Specifically, when all "Exponent" checkboxes are empty and the mantissa is non-zero, the represented values are called denormals (or subnormals). One of the things -ffast-math does is set denormals to zero. Depending on the programmer's intent, this may break some code, as the values are no longer the intended ones.
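
To make the "all exponent bits empty" rule concrete, here is a minimal Python sketch that pulls apart the bit fields of a single-precision value, the same layout the linked demo shows. The helper names (`float32_fields`, `is_subnormal`, `flush_to_zero`) are made up for this illustration, and `flush_to_zero` only imitates in software what a flush-to-zero CPU mode would do; it is not how libFLAC or GCC implement anything.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit fields of x rounded to IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def is_subnormal(x):
    # Subnormal (denormal): exponent field all zeros, mantissa not zero.
    _, exponent, mantissa = float32_fields(x)
    return exponent == 0 and mantissa != 0

def flush_to_zero(x):
    # Hypothetical helper: imitates a flush-to-zero mode by replacing
    # any subnormal value with zero and leaving everything else alone.
    return 0.0 if is_subnormal(x) else x

print(float32_fields(1.0))   # normal number: exponent field is 127
print(is_subnormal(1e-40))   # below the smallest normal float32 (~1.18e-38)
print(flush_to_zero(1e-40))  # the tiny value is lost entirely
```

Code that relies on such tiny values staying non-zero (for example as a divisor) is the kind that can misbehave once they are flushed.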

Even if this version of flac is safe to do math in this way, the issue mentioned is that the -ffast-math logic could affect other code which is unrelated to flac, and that code may require proper denormal support.

A separate process (e.g. foobar2000 loading flac.exe) should be safe, as the exe is being loaded as a separate process.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 16:29:12
Thanky! Didn't have a problem yet with my frontends but ktf's effort is surely most welcome.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-12 19:14:58
As go differences between compiles: if build X is faster than build Y, is that
* due to "fewer instructions" executed (--> less heat generated)
or
* due to "instructions queued more efficiently" and some CPU-internal parallelization (--> same heat generated in shorter time)

- or a combination of both? 
It is a combination. What instructions execute more efficiently varies highly between CPUs. In fact, the resource you linked on AVX512 in Zen 4 lists quite a few such issues. Certain instructions are executed directly on a specific part of the CPU, while others need to be decoded into several instructions. On another CPU, other instructions might have dedicated silicon. This dedicated silicon might be more power hungry, like AVX512.

WOW! Great job!
This time really the fast-math files :)
If I read this correctly, you're seeing a 20% speedup, right?

Quote
Most likely only measuring tolerance . It identifies as reference libFLAC 1.4.1 20220922. Is it ok to offer it here?
Yes, sure. You probably downloaded a tarball instead of checking out git. It can only generate the proper version string when checked out with git. No worries though, this is very close to libFLAC 1.4.1.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 19:50:08
Indeed ~20%! The several additional flags optimize C code further and they seem to work well here.

I used https://github.com/ktmf01/flac.git so it gave me the wrong files, but the zip of the fast-math branch, downloaded manually, worked.
Attached the version i tested above.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-12 20:18:19
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Checkout branch libFLAC-fast-math

I compiled this (flac git-cb822660 20221012) on Linux using -march=znver3 -Ofast.  I get the same performance with or without asm optimizations.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 20:33:35
Tested ktf's fastmath build:
(sorry, test results were corrupted, will re-test asap)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 20:49:34
Added more electronic and loudness war contents, hand-picked to only include the highest bitrate files, but does not contain noise music. Around 74.5% compression ratio.

1.3.1 (Xiph)

-8 -b2304
3200412387 bytes

-8
3202131236 bytes

1.3.2 (Xiph)

-8 -b2304
3200203911 bytes

-8
3201989505 bytes

1.4.1 (Case GCC 12.2.0)

-8 -b2304
3199429338 bytes

-8
3201122995 bytes

-8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
3201407279 bytes
Yes, somewhat bigger file size, see the quoted data for comparison.
-8 -b2304
The two speeds are single and multi-thread results.

Case GCC 12.2.0
Total encoding time: 1:39.094, 245.88x realtime
Total encoding time: 0:29.609, 822.91x realtime
3199429338 bytes

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 1:23.563, 291.58x realtime
Total encoding time: 0:25.078, 971.59x realtime
3200178267 bytes

[EDIT] Added -8p -b2304 tests, only multi-thread:

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 1:21.890, 297.54x realtime
3196718833 bytes

Case GCC 12.2.0
Total encoding time: 1:30.437, 269.42x realtime
3196159402 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 21:33:50
... now the corrected results for ktf's fastmath build:
(somehow an orphaned flac file wasn't deleted before starting the test and was accounted in the total FLAC size)
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes) = Reference
FLAC Option: -7
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-ktf-fastmath.exe (665600 bytes)
FLAC Option: -7
 Average time =  21.760 seconds (5 rounds), Encoding speed = 496.87x   <= faster encoding (426x -> 497x)
 FLAC size = 1.167.045.858 bytes (= 61,189% of WAV size, ~863 kbps) <= on-par compression: -0.001 percent points
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-12 21:36:26
@bennetng : 1/40th of a percent bigger files. Actually, if you want that much compression improvement by tweaking parameters, you will likely have to pay more than those nineteen percent time penalty? Going to the Case compile seems to be the cheapest bytes saved?
That is about tenfold the savings in @sundance 's test run?

How does it fare with -8p [and your fave -b]? Asking because "p" brute-forces "a certain task", so there is something it does particularly much of.
(That can be said about -8e as well, and -8 --lax -r 12 also? In the latter case, no other -b please. Not saying they are useful for anything but testing.)
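
The rough figures in this post can be checked against the numbers posted above. This is only a back-of-the-envelope sketch in Python; `percent_change` is a helper made up for the purpose, and the inputs are bennetng's single-thread -8 -b2304 results and sundance's -7 results.

```python
def percent_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# bennetng, -8 -b2304: fast-math build vs Case GCC 12.2.0 (sizes in bytes)
bennetng_size = percent_change(3200178267, 3199429338)
# bennetng, single thread: Case (1:39.094) vs fast-math (1:23.563), in seconds
bennetng_time = percent_change(99.094, 83.563)
# sundance, -7: fast-math build vs Case haswell build (sizes in bytes)
sundance_size = percent_change(1167045858, 1167014383)

print(f"bennetng size increase: {bennetng_size:.4f}%")   # roughly 1/43 of a percent
print(f"bennetng time penalty:  {bennetng_time:.1f}%")
print(f"sundance size increase: {sundance_size:.4f}%")
print(f"ratio of the two size increases: {bennetng_size / sundance_size:.1f}")
```

So "1/40th of a percent" and "nineteen percent" hold up as round figures, and "tenfold" comes out closer to ninefold.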
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 21:58:49
@bennetng : 1/40th of a percent bigger files. Actually, if you want that much compression improvement by tweaking parameters, you will likely have to pay more than those nineteen percent time penalty? Going to the Case compile seems to be the cheapest bytes saved?
That is about tenfold the savings in @sundance 's test run?

How does it fare with -8p [and your fave -b]? Asking because "p" brute-forces "a certain task", so there is something it does particularly much of.
(That can be said about -8e as well, and -8 --lax -r 12 also? In the latter case, no other -b please. Not saying they are useful for anything but testing.)
As mentioned in the quoted box of my previous test, the corpus used was heavily biased to the very high bitrate files (~74.5% compression ratio). I just conveniently reused this corpus because it is still in my foobar playlist, so it can be considered as a special case, and -b2304 is suitable for this brutal set of files.

I think the significance of ktf's latest tweak is it offers obvious speed boost for different types of CPUs, and makes -8 much cheaper. -8 is an important preset that many people actually use.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 02:39:17
I tested compiling without any flags except -Ofast -m64 -march=haswell to check whether the increased size of the resulting files is due to the gcc optimizations. The resulting files are identical, so it must be the new flac code itself.
My single wav testfile is 2.729.717.132 Bytes consisting of several cd images of different genres.
It compresses to 1.526.366.181 Bytes and 1.526.597.886 Bytes so a 0,015% file increase.

I was also asked for the additional flags. It is no secret and i copied them more or less from Case and GrieverV.
With the new fast-math code, everything after -fno-stack-protector taken together makes almost no difference: less than with the older flac code, or even none at all.

-Ofast -m64 -march=haswell -fipa-pta -funroll-loops -fno-stack-protector -fno-common -fno-plt -fno-semantic-interposition -falign-functions=32 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -flto -ffat-lto-objects -pipe
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 05:13:01
Oops. The post above fails to mention that this 0,015% file size increase is for -8p. I must have forgotten to mention it because I only use -8p for every test here.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 08:14:51
Another corpus with a more typical compression ratio. Faster overall speed than the previous corpus with an extreme ratio.

PCM (17 files)
4223331916 bytes

-8

ktf-fast-math-noasm-manyflags-haswell
multi: 0:17.453, 1371.78x realtime
single: 1:17.250, 309.92x realtime
2511293691 bytes
59.462%

Case GCC 12.2.0
multi: 0:20.859, 1147.79x realtime
single: 1:33.891, 254.99x realtime
2510290651 bytes
59.439%

-8 -A subdivide_tukey(3/2e-1)

ktf-fast-math-noasm-manyflags-haswell
multi: 0:17.422, 1374.22x realtime
single: 1:17.203, 310.11x realtime
2511287164 bytes
59.462%

Case GCC 12.2.0
multi: 0:20.984, 1140.95x realtime
single: 1:34.297, 253.89x realtime
2510265971 bytes
59.438%

-8p

ktf-fast-math-noasm-manyflags-haswell
multi: 0:53.156, 450.40x realtime
single: 3:44.875, 106.46x realtime
2509518160 bytes
59.420%

Case GCC 12.2.0
multi: 1:01.547, 389.00x realtime
single: 4:17.797, 92.87x realtime
2508762363 bytes
59.402%
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 08:53:06
Ops. Above post misses that these 0,015% file increase is for -8p.
Are we killing the double precision here?!?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 10:18:55
Multi-thread tests only, with slower settings to stress test temperature and power limit.

-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime
2507621979 bytes

-8p -A subdivide_tukey(7)

Case GCC 12.2.0
Total encoding time: 5:12.453, 76.62x realtime
2507754046 bytes

Case GCC 7.3.0
Total encoding time: 5:22.157, 74.31x realtime
2507754044 bytes

So with the latest tweak, (9) is bigger than (7).

My previous tests showed that Case GCC 7.3.0 can make the CPU very hot, and exceed power limit with -8pe, but without -e the max temperature is similar to other builds in this test (83C max), also within power limit.

Still, these temperatures are far below TjMAX (100C) which will trigger thermal throttling.

Tests were done using Intel stock cooler at 27C ambient temperature.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 11:46:20
Another corpus with a more typical compression ratio. Faster overall speed than the previous corpus with an extreme ratio.

PCM (17 files)
4223331916 bytes

-8p

ktf-fast-math-noasm-manyflags-haswell
multi: 0:53.156, 450.40x realtime
single: 3:44.875, 106.46x realtime
2509518160 bytes
59.420%

Case GCC 12.2.0
multi: 1:01.547, 389.00x realtime
single: 4:17.797, 92.87x realtime
2508762363 bytes
59.402%
FLAC-git-90d7fdb3_20221012_Win_GCC122 (https://hydrogenaud.io/index.php/topic,123176.msg1017465.html#msg1017465)
multi: 1:00.453, 396.03x realtime
single: 4:19.515, 92.25x realtime
2509518286 bytes

Multi-thread tests only, with slower settings to stress test temperature and power limit.

-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime
2507621979 bytes
FLAC-git-90d7fdb3_20221012_Win_GCC122
Total encoding time: 5:25.297, 73.59x realtime
2509390559 bytes

So, the older version at -8p produced smaller files than the newer version at -8p -A subdivide_tukey(9)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 13:58:20
-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime

Look at the times. Could it be that the guesstimations now fail to distinguish between functions to apply? Cf my uneducated outburst at 149 (https://hydrogenaud.io/index.php/topic,123025.100.html) ...
@ktf : would this even be possible?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-10-13 14:09:42
Attached the version i tested
ktf-fast-math-noasm-manyflags-haswell-Wombat.7z (https://hydrogenaud.io/index.php?action=dlattach;topic=123025.0;attach=23688)
This build is quite slow at decoding to WAV.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-13 14:46:22
Look at the times. Could it be that the guesstimations now fails to distinguish between functions to apply? Cf my uneducated outburst at 149 (https://hydrogenaud.io/index.php/topic,123025.100.html) ...
@ktf : would this even be possible?
I'm not sure why, but the last commit is faulty (https://github.com/xiph/flac/commit/90d7fdb3e1f058bd7b94330afc872cf277eae541). I've been doing some testing seeing the results here, and reverting that commit results in compression being back at 1.4.1 levels.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 14:49:33
Another corpus with completely different files.
Upper: ktf-fast-math-noasm-manyflags-haswell
Lower: Case GCC 12.2.0
Single thread, 31 files, 4278658628 bytes PCM size

-5
Total encoding time: 0:44.234, 548.34x realtime
2314954727 bytes
Total encoding time: 0:51.359, 472.27x realtime
2314955036 bytes

-6
Total encoding time: 0:54.000, 449.17x realtime
2310798337 bytes
Total encoding time: 1:01.671, 393.30x realtime
2310700059 bytes

-7
Total encoding time: 1:00.390, 401.64x realtime
2304079795 bytes
Total encoding time: 1:10.094, 346.04x realtime
2304028372 bytes

-8
Total encoding time: 1:19.453, 305.28x realtime
2303392314 bytes
Total encoding time: 1:37.187, 249.57x realtime
2302612016 bytes

-8 -A "tukey(7e-1);subdivide_tukey(3/2e-1)"
Total encoding time: 1:23.250, 291.35x realtime
2303079539 bytes
Total encoding time: 1:47.063, 226.55x realtime
2302352189 bytes

-8e
Total encoding time: 4:18.547, 93.81x realtime
2302133626 bytes
Total encoding time: 4:24.266, 91.78x realtime
2302133736 bytes

-8p
Total encoding time: 3:40.109, 110.19x realtime
2301560531 bytes
Total encoding time: 4:22.265, 92.48x realtime
2300997902 bytes

So -5 and -8e are the ones with marginally smaller sizes.

Not completely on topic but something interesting:
https://twitter.com/foone/status/1126996260026605568?lang=en
https://github.com/flyinghead/flycast/issues/644
Some 15 years ago I reported a similar issue on another Dreamcast emulator (Makaron) and the author talked about the same thing: precision differences between the Hitachi SH-4 CPU and generic x86 CPUs.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 14:53:15
This build is quite slow at decoding to WAV.
No difference here to older versions or even Case 12.2.0

Total:
  Decoded length: 21:29:32.933
  Opening time: 0:00.001
  Decoding time: 0:48.545
  Speed (x realtime): 1593.811
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 14:58:48
Total:
  Decoded length: 21:29:32.933
  Opening time: 0:00.001
  Decoding time: 0:48.545
  Speed (x realtime): 1593.811
Looks like a foo_benchmark report but foobar does not use flac.exe to decode files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 15:11:40
Ha! Didn't know that.
In frontah a 1.6GB flac decodes to wav in ~10 sec. with the Case and my manyflag fast-math version. I don't see a problem. You may have a better test.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-13 20:52:39
Okay, I've done some testing with the last commit removed, which is now current flac git. See the image below

[Image: graph comparing four compiles across presets -8 to -4]

What you see here is three different compiles of FLAC 1.4.1 and one of current git, all with GCC 12.2. Each line has, from left to right, presets -8, -7, -6, -5 and -4.

The light blue one is FLAC 1.4.1 as is, quite similar to what can be downloaded from xiph.org. The dark blue line adds -march=x86-64-v3. That is a 'vendor-neutral' shorthand for including all SSE, AVX, AVX2 and FMA3 instruction set extensions. Most recent CPUs (less than about 6 years old) have those. You can see that is a bit faster, but not much. If you add --disable-asm-optimizations, you get the red line, which is much slower. This is because GCC isn't able to properly optimize.

Now, with the recent changes, you get the green line when using those last options: disabling of specially crafted SSE/AVX/FMA routines so the compiler can try to do better, combined with saying it can use all SSE, AVX, AVX2 and FMA.

For -8 the difference is rather small but for other presets it is quite interesting.
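
For anyone wondering whether their own CPU can run a -march=x86-64-v3 build, here is a small Python sketch for Linux. It assumes the flag names used in /proc/cpuinfo, where "abm" stands in for LZCNT and "xsave" for OSXSAVE, and it only checks the extensions that the v3 level adds on top of the earlier levels (besides AVX, AVX2 and FMA this also covers BMI1, BMI2, F16C and MOVBE). The function names are made up for this example.

```python
# Extensions added by the x86-64-v3 level, spelled as in Linux /proc/cpuinfo.
# Assumption of this sketch: "abm" represents LZCNT and "xsave" represents OSXSAVE.
V3_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}

def missing_v3_flags(cpu_flags):
    """Return the sorted list of v3 extensions absent from cpu_flags."""
    return sorted(V3_FLAGS - set(cpu_flags))

def read_cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the flag list of the first CPU, or [] if it cannot be read."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1].split()
    except OSError:
        pass
    return []

if __name__ == "__main__":
    flags = read_cpuinfo_flags()
    if not flags:
        print("Could not read CPU flags (not Linux?)")
    else:
        missing = missing_v3_flags(flags)
        print("x86-64-v3 capable" if not missing else f"missing: {missing}")
```

A CPU that reports any missing flag would likely crash with an illegal-instruction error on such a build, so a generic build is the safer choice there.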
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 21:31:56
The graph cannot show size differences, I mean even the dots obscure that; but: are the sizes so close to equal that we can be pretty sure the builds do, per file, (as good as) the same thing?
That none of the compiles would make round-offs in the model estimation that leads to them selecting a less elaborate compression?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-14 06:24:57
There will always be round-off differences, but these are no longer significant. With the builds posted here recently, a difference was clearly visible in my graphs. Don't have them around anymore though.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-14 08:08:17
For -8 the difference is rather small

In part because the first axis is speed (not time). It seems that if green is X seconds faster than blue at -7, then it would be about X seconds faster at -8 as well. And not far from X for -5 either. (Quick and dirty "calculations" from quick and dirty graph reading - you got the actual times?)
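
The speed-versus-time point can be illustrated with made-up numbers. In this Python sketch the corpus length (10 hours of audio) and the 10 seconds saved are purely hypothetical, not read from the graph; the helper names are invented for the example.

```python
def encode_time(duration_s, speed):
    """Seconds needed to encode duration_s seconds of audio at 'speed' x realtime."""
    return duration_s / speed

def speed_after_saving(duration_s, speed, saved_s):
    """Speed (x realtime) once the encode takes saved_s seconds less."""
    return duration_s / (encode_time(duration_s, speed) - saved_s)

DURATION = 36000.0  # hypothetical corpus: 10 hours of audio
SAVED = 10.0        # hypothetical: the faster build saves 10 s at every preset

for speed in (200.0, 400.0):
    new_speed = speed_after_saving(DURATION, speed, SAVED)
    print(f"{speed:.0f}x -> {new_speed:.1f}x (gain {new_speed - speed:.1f}x)")
```

The same 10 seconds moves a 400x preset by 50x but a 200x preset by only about 12x, which is why equal time savings look small at -8 on a speed axis.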

Anyway: does this give any information on what part of the job the fast compile actually does fast?
(Does it matter? Maybe not, but one is curious eh?)

There will always round-off differences, but these are no longer significant.
Looks that way! But visuals sometimes make for optical illusions.

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-14 11:30:37
In part because the first axis is speed (not time). It seems that if green is X seconds faster than blue at -7, then it would be about X seconds faster at -8 as well. And not far from X for -5 either. (Quick and dirty "calculations" from quick and dirty graph reading - you got the actual times?)

Anyway: does this give any information on what part of the job the fast compile actually does fast?
(Does it matter? Maybe not, but one is curious eh?)
The pixel resolution is good up to .002% size difference and I am seeing something like 190/195x vs 210x in -8. Relative speed differences may vary in different CPUs so I would rather do my own tests later on.

In general, if the compiler is allowed to change anything which is logically correct with unlimited precision, then additional steps should be made to limit the intermediate values within a smaller range to avoid significant loss of accuracy.
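
A tiny Python illustration of why reordering that is "logically correct with unlimited precision" can change results in floating point. Python itself never reorders these sums; the two orders are simply written out by hand to imitate what a compiler with relaxed math rules would be allowed to do.

```python
# Three values whose exact sum is 1.0.
a, b, c = 1e16, 1.0, -1e16

in_order = (a + b) + c    # the 1.0 is absorbed by 1e16 before it can survive
reordered = (a + c) + b   # the large values cancel first, so the 1.0 survives

print(in_order)   # → 0.0
print(reordered)  # → 1.0

# The same effect at everyday magnitudes: grouping changes the last bit.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # → False
```

Keeping intermediate values within a smaller range, as suggested above, reduces how much such reordering can cost.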
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-14 15:24:00
Size is back to 1.4.1 levels with current git, and speed is in between.
-8 -p is now at ~123x and ~1199x realtime.
reference libFLAC git-0665053c 20221013 attached.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-14 16:43:53
fast-math-noasm-manyflags-haswell-git
Same test condition as Reply #155 (https://hydrogenaud.io/index.php/topic,123025.msg1017484.html#msg1017484)

-5
Total encoding time: 0:41.406, 585.79x realtime
2314954944 bytes

-6
Total encoding time: 0:51.437, 471.55x realtime
2310700076 bytes

-7
Total encoding time: 1:00.016, 404.14x realtime
2304028346 bytes

-8
Total encoding time: 1:23.453, 290.64x realtime
2302612081 bytes

-8 -A "tukey(7e-1);subdivide_tukey(3/2e-1)"
Total encoding time: 1:33.125, 260.46x realtime
2302352273 bytes

-8e
Total encoding time: 4:06.922, 98.23x realtime
2302133843 bytes

-8p
Total encoding time: 3:58.109, 101.86x realtime
2300997970 bytes

So around 10-20% faster than Case GCC 12.2.0 with almost identical file sizes.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-14 18:19:58
Time for 1.4.2?  ;)
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-14 19:08:42
@ktf: Pardon my ignorance, just to make sure I understood what's going on right now:
There was a coding error that crept in which you found and corrected as reported in #154.
This glitch resulted in higher encoding speed @ a little worse compression. After correction, the encoded file size is back to what is to be expected and we lost some speed.
-> But the FLACs produced by the faulty binary are still fine? (at least they have the same MD5 hash as the ones from the reference encoder)
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-14 20:10:49
Time for 1.4.2?  ;)
Yes, but for another reason: https://github.com/xiph/flac/issues/471

But the FLACs produced by the faulty binary are still fine?
Yes, nothing wrong with those.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-15 00:40:07
@ktf  - whenever I specify the CFLAGS or CXXFLAGS environment variables, I see some of the default optimizations, such as -O3 and -funroll-loops, disappear from the makefile.  I also see they are prepended, so any conflicting flags will be overridden by the defaults.


So after messing with compile flags, these seem to get me the fastest encode times.  There may be others I haven't tried that might improve things further.  Using gcc-11 seemed to have a slight edge over gcc-12.  I also have to manually edit the Makefile to remove the -fstack-protector-strong flags.
Code: [Select]
export CC="/usr/bin/gcc-11"  
export CXX="/usr/bin/g++-11"
export CFLAGS="-march=native -O3 -funroll-loops -pipe -flto -fomit-frame-pointer -fno-stack-protector"
export CXXFLAGS="-march=native -O3 -funroll-loops -pipe -flto -fomit-frame-pointer -fno-stack-protector"
export LDFLAGS="-Wl,-s"
./configure --disable-asm-optimizations --disable-altivec

flac v1.3.4 - default Debian build.
Code: [Select]
time flac -V -8 *.wav
File-01.wav: Verify OK, wrote 24439771 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24699891 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39523493 bytes, ratio=0.701
File-04.wav: Verify OK, wrote 40045704 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15922434 bytes, ratio=0.366

real 0m4.398s
user 0m4.144s
sys 0m0.244s

 
time flac -V -8 -e *.wav
File-01.wav: Verify OK, wrote 24435335 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24662789 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39491754 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40039188 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15782279 bytes, ratio=0.363

real 0m10.772s
user 0m10.523s
sys 0m0.240s


time flac -V -8 -p *.wav
File-01.wav: Verify OK, wrote 24419397 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24686505 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39490413 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40014362 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15900707 bytes, ratio=0.366

real 0m9.328s
user 0m9.006s
sys 0m0.316s


time flac -V -8 -e -p *.wav
File-01.wav: Verify OK, wrote 24415349 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24649517 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39456681 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40008204 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15754777 bytes, ratio=0.362

real 1m0.028s
user 0m59.751s
sys 0m0.276s


time flac -V -8 -e -p -b 2304 *.wav
File-01.wav: Verify OK, wrote 24435705 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24730440 bytes, ratio=0.563
File-03.wav: Verify OK, wrote 39359937 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40034300 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15796351 bytes, ratio=0.363

real 1m15.970s
user 1m15.594s
sys 0m0.376s

 
time flac -V -8 -b 2304 *.wav
File-01.wav: Verify OK, wrote 24465811 bytes, ratio=0.728
File-02.wav: Verify OK, wrote 24780187 bytes, ratio=0.564
File-03.wav: Verify OK, wrote 39445204 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40077690 bytes, ratio=0.819
File-05.wav: Verify OK, wrote 15970568 bytes, ratio=0.367

real 0m4.687s
user 0m4.353s
sys 0m0.320s

 
time flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav
File-01.wav: Verify OK, wrote 24439771 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24699891 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39523493 bytes, ratio=0.701
File-04.wav: Verify OK, wrote 40045704 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15922434 bytes, ratio=0.366

real 0m4.421s
user 0m4.119s
sys 0m0.292s


flac git-0665053c 20221013
Code: [Select]
time flac -V -8 *.wav
File-01.wav: Verify OK, wrote 24434850 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24616849 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39438963 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40041267 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15382115 bytes, ratio=0.354

real 0m4.620s
user 0m4.294s
sys 0m0.292s
 

time flac -V -8 -e *.wav
File-01.wav: Verify OK, wrote 24431769 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24615103 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39419532 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40036367 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15367594 bytes, ratio=0.354

real 0m12.232s
user 0m11.943s
sys 0m0.276s
 

time flac -V -8 -p *.wav
File-01.wav: Verify OK, wrote 24416121 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24604533 bytes, ratio=0.560
File-03.wav: Verify OK, wrote 39416318 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40010908 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15358013 bytes, ratio=0.353

real 0m11.245s
user 0m10.935s
sys 0m0.293s
 

time flac -V -8 -e -p *.wav
File-01.wav: Verify OK, wrote 24412782 bytes, ratio=0.726
File-02.wav: Verify OK, wrote 24602298 bytes, ratio=0.560
File-03.wav: Verify OK, wrote 39387742 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40005678 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15343571 bytes, ratio=0.353

real 1m19.520s
user 1m18.936s
sys 0m0.408s
 

time flac -V -8 -e -p -b 2304 *.wav
File-01.wav: Verify OK, wrote 24433519 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24697968 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39305681 bytes, ratio=0.697
File-04.wav: Verify OK, wrote 40031779 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15459132 bytes, ratio=0.356

real 1m45.081s
user 1m44.448s
sys 0m0.416s
 

time flac -V -8 -b 2304 *.wav
File-01.wav: Verify OK, wrote 24460714 bytes, ratio=0.728
File-02.wav: Verify OK, wrote 24720984 bytes, ratio=0.563
File-03.wav: Verify OK, wrote 39357368 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40072725 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15503783 bytes, ratio=0.357

real 0m4.857s
user 0m4.491s
sys 0m0.356s
 

time flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav
File-01.wav: Verify OK, wrote 24438161 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24619892 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39500724 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40043969 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15389983 bytes, ratio=0.354

real 0m5.573s
user 0m5.337s
sys 0m0.225s

Compression results.  File names appended with version and options used.
e.g.  B = -b 2304, A = -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
Code: [Select]
24412782 File-01.flac-1.4.1-EP
24415349 File-01.flac-1.3.4-EP
24416121 File-01.flac-1.4.1-P
24419397 File-01.flac-1.3.4-P
24431769 File-01.flac-1.4.1-E
24433519 File-01.flac-1.4.1-EPB
24434850 File-01.flac-1.4.1
24435335 File-01.flac-1.3.4-E
24435705 File-01.flac-1.3.4-EPB
24438161 File-01.flac-1.4.1-A
24439771 File-01.flac-1.3.4
24439771 File-01.flac-1.3.4-A
24460714 File-01.flac-1.4.1-B
24465811 File-01.flac-1.3.4-B
33605420 File-01.wav
 
24602298 File-02.flac-1.4.1-EP
24604533 File-02.flac-1.4.1-P
24615103 File-02.flac-1.4.1-E
24616849 File-02.flac-1.4.1
24619892 File-02.flac-1.4.1-A
24649517 File-02.flac-1.3.4-EP
24662789 File-02.flac-1.3.4-E
24686505 File-02.flac-1.3.4-P
24697968 File-02.flac-1.4.1-EPB
24699891 File-02.flac-1.3.4
24699891 File-02.flac-1.3.4-A
24720984 File-02.flac-1.4.1-B
24730440 File-02.flac-1.3.4-EPB
24780187 File-02.flac-1.3.4-B
43911884 File-02.wav
 
39305681 File-03.flac-1.4.1-EPB
39357368 File-03.flac-1.4.1-B
39359937 File-03.flac-1.3.4-EPB
39387742 File-03.flac-1.4.1-EP
39416318 File-03.flac-1.4.1-P
39419532 File-03.flac-1.4.1-E
39438963 File-03.flac-1.4.1
39445204 File-03.flac-1.3.4-B
39456681 File-03.flac-1.3.4-EP
39490413 File-03.flac-1.3.4-P
39491754 File-03.flac-1.3.4-E
39500724 File-03.flac-1.4.1-A
39523493 File-03.flac-1.3.4
39523493 File-03.flac-1.3.4-A
56417468 File-03.wav
 
40005678 File-04.flac-1.4.1-EP
40008204 File-04.flac-1.3.4-EP
40010908 File-04.flac-1.4.1-P
40014362 File-04.flac-1.3.4-P
40031779 File-04.flac-1.4.1-EPB
40034300 File-04.flac-1.3.4-EPB
40036367 File-04.flac-1.4.1-E
40039188 File-04.flac-1.3.4-E
40041267 File-04.flac-1.4.1
40043969 File-04.flac-1.4.1-A
40045704 File-04.flac-1.3.4
40045704 File-04.flac-1.3.4-A
40072725 File-04.flac-1.4.1-B
40077690 File-04.flac-1.3.4-B
48963980 File-04.wav
 
15343571 File-05.flac-1.4.1-EP
15358013 File-05.flac-1.4.1-P
15367594 File-05.flac-1.4.1-E
15382115 File-05.flac-1.4.1
15389983 File-05.flac-1.4.1-A
15459132 File-05.flac-1.4.1-EPB
15503783 File-05.flac-1.4.1-B
15754777 File-05.flac-1.3.4-EP
15782279 File-05.flac-1.3.4-E
15796351 File-05.flac-1.3.4-EPB
15900707 File-05.flac-1.3.4-P
15922434 File-05.flac-1.3.4
15922434 File-05.flac-1.3.4-A
15970568 File-05.flac-1.3.4-B
43467356 File-05.wav
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 12:22:34
It would be interesting to check the "fake hi-res improvement" as well. Same corpus as Reply #155 (https://hydrogenaud.io/index.php/topic,123025.msg1017484.html#msg1017484). Speed not shown as I don't have enough RAM drive space.

RetroArch 88200Hz, highest quality, 24-bit, no dither, no RG (i.e. with intersample over induced clipping)

Case GCC 12.2.0 (-8)
5606032877 bytes

fast-math-noasm-manyflags-haswell-git (-8)
5606121407 bytes

v1.3.2 Xiph x64 (-8)
7179114110 bytes

v1.3.2 (-8p)
7174033037 bytes

v1.3.2 (-8e)
7026991603 bytes

RetroArch 88200Hz, lower quality, 24-bit, no dither, with RG (i.e. no clipping but with some ultrasonic leakage)

Case GCC 12.2.0 (-8)
6946871439 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6946870597 bytes

v1.3.2 (-8)
7200099834 bytes

v1.3.2 (-8p)
7196121884 bytes

v1.3.2 (-8e)
7138462138 bytes

RetroArch 88200Hz, normal quality, 24-bit, no dither, with RG (i.e. no clipping but with minor ultrasonic leakage)

Case GCC 12.2.0 (-8)
6231623126 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6231620978 bytes

I don't have a lot of "real" hi-res files to test apart from the free ones. Files above 48kHz may use -b beyond 4608 so others can test it too.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-15 14:49:11
One tiny thing: files generated by the git version are a few bits larger only because of the longer version number string.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 15:09:59
In that case the file size difference should be consistent across different settings, but take for example the post right above:

RetroArch 88200Hz, normal quality, 24-bit, no dither, with RG (i.e. no clipping but with minor ultrasonic leakage)

Case GCC 12.2.0 (-8)
6231623126 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6231620978 bytes

The git version is 2148 bytes smaller, while in another test:

RetroArch 88200Hz, highest quality, 24-bit, no dither, no RG (i.e. with intersample over induced clipping)

Case GCC 12.2.0 (-8)
5606032877 bytes

fast-math-noasm-manyflags-haswell-git (-8)
5606121407 bytes

The git version is 88530 bytes bigger. Negligible differences, but not consistent.
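
The comparison above is a signed subtraction per test. A minimal Python sketch using only the byte counts quoted in this post (the dictionary layout and function name are illustrative, not from any tool):

```python
# Total -8 sizes in bytes quoted above: (Case GCC 12.2.0, git build).
totals = {
    "normal quality, with RG": (6231623126, 6231620978),
    "highest quality, no RG": (5606032877, 5606121407),
}

def git_minus_case(case_bytes, git_bytes):
    """Signed size difference; negative means the git build is smaller."""
    return git_bytes - case_bytes

deltas = {name: git_minus_case(*pair) for name, pair in totals.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+d} bytes")
```

A longer version string would be expected to push both deltas in the same direction, so opposite signs suggest the difference comes from somewhere else.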
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-15 15:33:47
Sorry, that was not related to these files especially. The longer git string is 6 or 7 bits. I didn't check it further.

This is for 18.6GB of -8 -p HiBitrate files I used for RG testing:

19.713.863.889 Bytes flac 1.4.1
19.713.858.079 Bytes fast-math-noasm-manyflags-haswell-git
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 18:27:46
http://www.2l.no/hires/
Code: [Select]
http://www.lindberg.no/hires/test/2L-145/2L-45_stereo_01_FLAC_352k_24b.flac
http://www.lindberg.no/hires/test/2L-139/2L-139_stereo_FLAC_176k_24b_01.flac
http://www.lindberg.no/hires/test/2L-106/2L-106_stereo_PCM-96k_MAGNIFICAT_04.flac
http://www.lindberg.no/hires/test/2L38_01_96kHz.flac

(https://hydrogenaud.io/index.php?action=dlattach;attach=23555;image)

Upper: fast-math-noasm-manyflags-haswell-git
Lower: Case GCC 12.2.0
1579150352 bytes PCM size

-8
Total encoding time: 0:31.921, 49.25x realtime
772176807 bytes
Total encoding time: 0:30.422, 51.67x realtime
772176835 bytes

-8e
Total encoding time: 2:08.391, 12.24x realtime
772158544 bytes
Total encoding time: 1:47.328, 14.64x realtime
772158545 bytes

-8p
Total encoding time: 2:39.969, 9.82x realtime
771495942 bytes
Total encoding time: 2:10.266, 12.06x realtime
771495961 bytes

-8 -b16384
Total encoding time: 0:31.125, 50.51x realtime
769125599 bytes
Total encoding time: 0:29.109, 54.01x realtime
769125592 bytes

-8 -b16384 -A subdivide_tukey(5)
Total encoding time: 1:01.484, 25.57x realtime
769070756 bytes
Total encoding time: 0:52.516, 29.93x realtime
769070764 bytes

Case GCC 12.2.0 is consistently faster with these hi-res files on my i3-12100.
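
To check the "consistently faster" claim against the numbers, the times can be converted to seconds and divided. A minimal Python sketch; the helper name `to_seconds` and the dictionary are my own, the values are copied from the results above:

```python
def to_seconds(stamp):
    """Convert an 'm:ss.mmm' encoding time string to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

# (git build time, Case build time) per setting, copied from the results above.
times = {
    "-8": ("0:31.921", "0:30.422"),
    "-8e": ("2:08.391", "1:47.328"),
    "-8p": ("2:39.969", "2:10.266"),
    "-8 -b16384": ("0:31.125", "0:29.109"),
    "-8 -b16384 -A subdivide_tukey(5)": ("1:01.484", "0:52.516"),
}

# Ratio above 1 means the Case build finished sooner than the git build.
speedup = {
    setting: to_seconds(git) / to_seconds(case)
    for setting, (git, case) in times.items()
}
for setting, ratio in speedup.items():
    print(f"{setting}: Case is {ratio:.2f}x as fast as the git build")
```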
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 20:06:30
Upper: fast-math-noasm-manyflags-haswell-git
Lower: Case GCC 12.2.0
1579150352 bytes PCM size
-8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"
Total encoding time: 0:41.172, 38.18x realtime
769068252 bytes
Total encoding time: 0:36.953, 42.54x realtime
769068236 bytes

Combining cheap windows sometimes can produce good results.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-15 20:35:32
hann is tukey(1) - the tukey windowing is a rectangle with cosine tapering, and when the "rectangle" hits zero width there is only the cosine left: https://en.wikipedia.org/wiki/List_of_window_functions#Tukey_window
So again you got "two very differently tapered tukeys".
flattop is a weirdo, it is even negative somewhere, but at HA fifteen years ago, it would do well in combination with tukey, so it is an obvious "try this if you want another".

For high resolution - and I got a bit of my testing material from 2L too! - I recall that a gauss window sometimes did "surprisingly" well. That is, surprise compared to it being of very little value for CDDA.
Note, welch = parabola and gauss = exp(parabola).
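
The window relations above can be checked numerically. The sketch below uses the textbook definitions from the linked Wikipedia list, not FLAC's own source, so endpoint conventions may differ from what the encoder does; the function names and the `sigma` parameter are my own choices:

```python
import math

def tukey(n, p):
    """Tukey window: flat top with cosine tapers; p is the tapered fraction (0..1)."""
    if p <= 0:
        return [1.0] * n
    p = min(p, 1.0)
    taper = p * (n - 1) / 2.0  # taper length on each side, in samples
    window = []
    for i in range(n):
        d = min(i, n - 1 - i)  # distance to the nearest edge
        if d < taper:
            window.append(0.5 * (1.0 - math.cos(math.pi * d / taper)))
        else:
            window.append(1.0)
    return window

def hann(n):
    """Hann window: a single raised cosine over the whole block."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]

def welch(n):
    """Welch window: a parabola that is 0 at the edges and 1 in the middle."""
    half = (n - 1) / 2.0
    return [1.0 - ((i - half) / half) ** 2 for i in range(n)]

def gauss(n, sigma):
    """Gauss window: exp() of a downward parabola; sigma sets the width."""
    half = (n - 1) / 2.0
    return [math.exp(-0.5 * ((i - half) / (sigma * half)) ** 2) for i in range(n)]

# With p = 1 the flat part has zero width and only the cosine remains.
difference = max(abs(a - b) for a, b in zip(tukey(16, 1.0), hann(16)))
print(f"max |tukey(1) - hann| = {difference:.2e}")
```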
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-16 02:36:41
Tested a loud 24-96 album and the performance hit is pretty hefty indeed. -8 -p single file.

Case 1.4.1
26.71x realtime

1.4.1 manyflags -Ofast
27.63x realtime

fast-math-noasm-manyflags-haswell-git
21.36x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-16 10:37:31
Oh, 96/24 ... Attached: a few seconds of 96/24, squeezed down to < 2 MB.

Not at all randomly selected: 1.4.1's double precision makes savings like 17 percent at -7, that is YUGE. (But, it is not so that I cherry-picked the best-looking few seconds in the track. Though I did avoid the most dense part.)

* Compared to CDDA, it is easier to beat -p by stacking up with -A [functions]. For the full track, I could beat -p at half the encoding time
* -b [something] can often make a difference on 96/24, but default looks good on this clip. Also the gains from -r are not jaw-dropping either.

At https://hydrogenaud.io/index.php/topic,120158.msg1003288.html#msg1003288 I used the entire EP for testing, but of course I cannot share more than a clip. Buy it :-)

Music: "Temptation", 2017 remake, by Canadian band The Tea Party (who, name-wise, suffered halfway the same fate as ISIS ...)
Known for "Moroccan roll" style hard rock, however this track is largely industrial synth. I bought the EP from https://teaparty.com/tx20 , (2.99 Canadian $ - and 1.99 in MP3 for those of you who happily let others do the lossless testing). You can listen there or on Spotify, https://open.spotify.com/album/6Q3GV4HsGwPzQ9a2TA8cg0

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-16 11:41:08
Tested a loud 24-96 album and the performance hit is pretty hefty indeed.
If it is unavoidable then I can still use 1.4.1 for hi-res and 1.4.2 for CDDA.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-16 19:18:49
flattop is a weirdo, it is even negative somewhere, but at HA fifteen years ago, it would do well in combination with tukey, so it is an obvious "try this if you want another".
Flattop tends to work better when the upper spectrum is more empty. An example of DSD to flac conversion with different filtering.
-8 -A subdivide_tukey(3/2e-1) with and without flattop

818465962 25kHz flattop.flac
818511667 25kHz.flac (.0055842% bigger)

886963408 Multistage flattop.flac
886996117 Multistage.flac (.0036878% bigger)

====================================

Another example, the "raw DXD" file:
http://www.2l.no/hires/DXD-DSD/index.html
-8 -A subdivide_tukey(3/2e-1) with and without flattop

209196492 JGH flattop.flac
209215084 JGH no flattop.flac (.0088873% bigger)

With optimal -b the effect is even bigger.
-8 -b16384 -A subdivide_tukey(3/2e-1) with and without flattop

207277309 JGH flattop.flac
207300539 JGH no flattop.flac (.0112072% bigger)
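
For reference, the "% bigger" figures above are the size difference relative to the smaller (flattop) file. A minimal Python sketch with the byte counts from this post; the function name is my own:

```python
def percent_bigger(smaller, bigger):
    """How much larger `bigger` is than `smaller`, in percent of `smaller`."""
    return (bigger - smaller) / smaller * 100.0

# (with flattop, without flattop) in bytes, from the listings above.
pairs = {
    "25kHz": (818465962, 818511667),
    "Multistage": (886963408, 886996117),
    "JGH": (209196492, 209215084),
    "JGH -b16384": (207277309, 207300539),
}
for name, (with_flattop, without) in pairs.items():
    gain = percent_bigger(with_flattop, without)
    print(f"{name}: {gain:.7f}% bigger without flattop")
```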
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-17 09:04:22
The weakness of the git version is 24-bit.
Upper: Case GCC 12.2.0
Lower: fast-math-noasm-manyflags-haswell-git
-8p

16/96
Total encoding time: 0:05.204, 59.04x realtime
Total encoding time: 0:04.922, 62.43x realtime

16/192
Total encoding time: 0:12.406, 24.76x realtime
Total encoding time: 0:11.859, 25.91x realtime

16/352
Total encoding time: 0:26.329, 11.67x realtime
Total encoding time: 0:25.657, 11.97x realtime

24/96
Total encoding time: 0:12.828, 23.95x realtime
Total encoding time: 0:15.469, 19.86x realtime

24/192
Total encoding time: 0:27.688, 11.09x realtime
Total encoding time: 0:34.485, 8.91x realtime

24/352
Total encoding time: 0:53.562, 5.73x realtime
Total encoding time: 1:26.813, 3.53x realtime
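
The pattern is easier to see as a ratio per format. A minimal Python sketch using the "x realtime" figures above; the dictionary layout is my own:

```python
# "x realtime" figures from the -8p runs above: (Case GCC 12.2.0, git build).
speeds = {
    "16/96": (59.04, 62.43),
    "16/192": (24.76, 25.91),
    "16/352": (11.67, 11.97),
    "24/96": (23.95, 19.86),
    "24/192": (11.09, 8.91),
    "24/352": (5.73, 3.53),
}

# Ratio above 1 means the git build is faster than Case for that format.
ratios = {fmt: git / case for fmt, (case, git) in speeds.items()}
for fmt, ratio in ratios.items():
    print(f"{fmt}: git/Case = {ratio:.2f}")
```

In this data the git build is slightly ahead for every 16-bit format and behind for every 24-bit one, with the gap growing with sample rate.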
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-18 02:38:05
flac git-92928f28 20221017

Clang 16 vs GCC 12.  Both compiled with the same cflags/cxxflags.

Compiled with GCC 12.2.0
Code: [Select]
flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658916991 bytes, ratio=0.600

/usr/local/bin/flac -V -8 the_fragile_album.wav
Encode Time: 1:34.67



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658690092 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -e the_fragile_album.wav
Encode Time: 1:56.36



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658391508 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -p the_fragile_album.wav
Encode Time: 1:55.56



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658115494 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -e -p the_fragile_album.wav
Encode Time: 7:12.51



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659488938 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -b 2304 the_fragile_album.wav
Encode Time: 1:22.54



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659082364 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -A tukey(5e-1);partial_tukey(2);punchout_tukey(3) the_fragile_album.wav
Encode Time: 1:25.99

Clang 16.0.0
Code: [Select]
flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658916991 bytes, ratio=0.600

/usr/local/bin/flac -V -8 the_fragile_album.wav
Encode Time: 1:06.14



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658690092 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -e the_fragile_album.wav
Encode Time: 1:56.12



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658391508 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -p the_fragile_album.wav
Encode Time: 2:00.89



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658115494 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -e -p the_fragile_album.wav
Encode Time: 11:54.96



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659488936 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -b 2304 the_fragile_album.wav
Encode Time: 0:58.16



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659082364 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -A tukey(5e-1);partial_tukey(2);punchout_tukey(3) the_fragile_album.wav
Encode Time: 0:51.12
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 06:58:11
Wow, night and day speed differences! Effect of Remove all assembler (https://github.com/xiph/flac/commit/75ef7958df603ca6de29fa00e82615e0da017903) and / or Assume Clang supports x86 intrinsics up to FMA (https://github.com/xiph/flac/commit/90c0562d4eb302b01d9b82c75a7f6a66261c5546)?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-18 13:08:30
Compiling with --disable-asm-optimizations, Clang cuts performance in half for 24-96 files compared to pure GCC.
Is --disable-asm-optimizations the right way at the moment?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 14:55:36
Ha! Didn't know that.
In frontah a 1.6GB flac decodes to wav in ~10 sec. with the Case and my manyflag fast-math version. I don't see a problem. You may have a better test.
flac 1.4.1 = Case GCC 12.2.0
flac git-0665053c 20221013 = fast-math-noasm-manyflags-haswell-git
8h46m48s single flac file (~3502MB) encoded with -l0 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:20,06

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:23,49

Same file encoded with -l6 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:20,61

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:21,39

Same file encoded with -l12 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:22,18

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:22,95

Same file encoded with --lax -l32 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:25,00

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:25,73

The timing was done using this method:
https://stackoverflow.com/a/9938411

I modified the script a bit; drag and drop a flac file onto the cmd file to test the same way I did. Even preset -5 uses -l8, so the differences should be small in most cases. I don't know whether it is CPU dependent or not.
Title: Re: FLAC v1.4.x Performance Tests
Post by: .halverhahn on 2022-10-18 15:28:26
The timing was done using this method:
https://stackoverflow.com/a/9938411

Or just use PowerShell and Measure-Command ;)

e.g.:
Code: [Select]
PS C:\TEMP\FLAC141> Measure-Command { .\flac141xiph.exe -8 input.flac -f -o output.flac141xiph.flac | Out-Default }
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-18 15:58:21
Oh. If now I could get PowerShell to do something as simple as FOR /R %f IN (*.flac) ...

... any help at https://hydrogenaud.io/index.php/topic,123025.msg1017305.html#msg1017305 ?
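
Not PowerShell, but one portable way to get the recursive loop plus timing is a few lines of Python. This is only a sketch: the function names are my own, and `time_all` assumes a flac executable named `flac` is on PATH.

```python
import subprocess
import time
from pathlib import Path

def find_flac(root):
    """Recursively list *.flac files under root, like FOR /R %f IN (*.flac)."""
    return sorted(Path(root).rglob("*.flac"))

def time_command(args):
    """Run a command, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start

def time_all(root, flac="flac"):
    """Time `flac -ts` (test, silent) per file; `flac` on PATH is an assumption."""
    return {path: time_command([flac, "-ts", str(path)]) for path in find_flac(root)}
```

Calling `time_all(".")` returns a dictionary of seconds per file, which can then be summed or printed as needed.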
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 16:05:18
The speed difference when decoding lower -l is real.

-l0

PS H:\> measure-command {h:\flac-case -ts image.flac | out-default}
TotalSeconds      : 20.2462909

PS H:\> measure-command {h:\flac-git -ts image.flac | out-default}
TotalSeconds      : 23.5196237

-l8

PS H:\> measure-command {h:\flac-case -ts image.flac | out-default}
TotalSeconds      : 21.362982

PS H:\> measure-command {h:\flac-git -ts image.flac | out-default}
TotalSeconds      : 22.1673663
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-20 00:40:29
So it seems in my test with Clang 16 vs GCC 12, Clang only seems to have the advantage with larger files being written to the storage device.  In my case, I'm using a single SSD in a ZFS pool.  I noticed encoding with the GCC compiled version, times were consistent within 1 second after multiple rounds of the same test, however, with the Clang compiled version, times seemed to vary more between rounds.  I wonder why Clang seems to do better when files are written to a disk...

Each round of tests was run in order with these options:
Code: [Select]
flac -d *.flac
flac -V -8 *.wav
flac -V -8 -e *.wav
flac -V -8 -p *.wav
flac -V -8 -e -p *.wav
flac -V -8 -b 2304 *.wav
flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav


Nine Inch Nails - The Fragile (single file for whole album)
GCC 12.2.0 - R/W to disk
Code: [Select]
Decode Time: 0:37.55 
Encode Time: 1:34.83
Encode Time: 1:56.33
Encode Time: 1:53.33
Encode Time: 7:09.77
Encode Time: 1:30.12
Encode Time: 1:24.01

Nine Inch Nails - The Fragile (single file for whole album)
Clang 16.0.0 - R/W to disk
Code: [Select]
Decode Time: 0:36.79 
Encode Time: 1:19.53
Encode Time: 2:28.14
Encode Time: 2:26.93
Encode Time: 11:54.74
Encode Time: 1:23.08
Encode Time: 0:59.60

Nine Inch Nails - The Fragile (single file for whole album)
GCC 12.2.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.28 
Encode Time: 0:20.43
Encode Time: 0:56.15
Encode Time: 0:53.01
Encode Time: 6:11.10
Encode Time: 0:22.19
Encode Time: 0:24.88

Nine Inch Nails - The Fragile (single file for whole album)
Clang 16.0.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.52 
Encode Time: 0:23.80
Encode Time: 1:27.26
Encode Time: 1:29.84
Encode Time: 10:47.43
Encode Time: 0:24.70
Encode Time: 0:28.96

And because someone mentioned The Tea Party! 
This test mixed the album Transmission as individual tracks and Interzone Mantras as a single file.
GCC 12.2.0 R/W to disk
Code: [Select]
Decode Time: 0:42.11 
Encode Time: 1:18.81
Encode Time: 1:58.11
Encode Time: 1:52.34
Encode Time: 7:26.50
Encode Time: 1:25.81
Encode Time: 1:27.76

Clang 16.0.0 R/W to disk
Code: [Select]
Decode Time: 0:43.53 
Encode Time: 1:27.22
Encode Time: 2:30.46
Encode Time: 2:35.19
Encode Time: 12:20.65
Encode Time: 1:15.82
Encode Time: 1:00.24

GCC 12.2.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.37 
Encode Time: 0:20.91
Encode Time: 0:57.80
Encode Time: 0:53.64
Encode Time: 6:24.09
Encode Time: 0:22.74
Encode Time: 0:25.49

Clang 16.0.0 - R/W ramdisk
Code: [Select]
Decode Time: 0:05.67 
Encode Time: 0:23.27
Encode Time: 1:29.83
Encode Time: 1:29.42
Encode Time: 11:07.48
Encode Time: 0:25.76
Encode Time: 0:25.12
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 13:35:29
flac1013 = Wombat fast-math-noasm-manyflags-haswell-git
flac1021 = john33 flac-1.4.1-git-6abf272-20221021
flac141 = Case GCC 12.2.0

24-bit transcoding (96-352kHz)

PS H:\> measure-command{h:\flac1013 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  60.2480702


PS H:\> measure-command{h:\flac1021 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  56.8414738


PS H:\> measure-command{h:\flac141 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  55.1369771


16-bit transcoding (48kHz)

PS H:\> measure-command{h:\flac1013 *.flac -fs8}|select totalseconds

TotalSeconds
------------
   76.156825


PS H:\> measure-command{h:\flac1021 *.flac -fs8}|select totalseconds

TotalSeconds
------------
  86.8591274


PS H:\> measure-command{h:\flac141 *.flac -fs8}|select totalseconds

TotalSeconds
------------
  83.7913263


Decoding files encoded with -8 (16-bit 48kHz). For unknown reasons there is always a startup delay in the first, non-repeated decoding command despite using a RAM disk, so both first and second runs are posted.

PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  19.7201548


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  16.9045647


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  20.4929391


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  17.6664767


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  19.6514963


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  16.8442894


Decoding files encoded with -8 -b16384 (24-bit 96-352kHz)


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
   11.280577


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.4720569


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  11.5633101


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.7760674


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  11.2297442


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.4221105
  
  
Decoding files encoded with -l0, mixed bit-depth and sample rate:

PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  14.9447091


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  12.1270033


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
   13.809708


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  10.9962077


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  13.2682311


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
   10.469793
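
The startup delay mentioned above can be quantified by subtracting the second run from the first for each pair. A minimal Python sketch with the TotalSeconds values from this post (the grouping labels are my own shorthand); it measures the delay but says nothing about its cause:

```python
# (first run, second run) TotalSeconds pairs from the decoding tests above.
runs = {
    "-8, 16-bit 48kHz": {
        "flac1013": (19.7201548, 16.9045647),
        "flac1021": (20.4929391, 17.6664767),
        "flac141": (19.6514963, 16.8442894),
    },
    "-8 -b16384, 24-bit": {
        "flac1013": (11.280577, 8.4720569),
        "flac1021": (11.5633101, 8.7760674),
        "flac141": (11.2297442, 8.4221105),
    },
    "-l0, mixed": {
        "flac1013": (14.9447091, 12.1270033),
        "flac1021": (13.809708, 10.9962077),
        "flac141": (13.2682311, 10.469793),
    },
}

# First-run penalty in seconds for every build and test.
delays = [
    first - second
    for builds in runs.values()
    for first, second in builds.values()
]
print(f"startup delay: min {min(delays):.2f} s, max {max(delays):.2f} s")
```

The delay comes out at about 2.8 seconds in every pair, regardless of build or file set, so comparing second runs against each other looks like the fairer comparison.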
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-21 14:37:41
Seems that with the next release, building without asm optimizations is good for 16-bit-only apps like CUETools (besides HDCD).
Attached a current git version of both ways to compile.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 15:12:22
flac1013 = Wombat fast-math-noasm-manyflags-haswell-git
flac1021 = john33 flac-1.4.1-git-6abf272-20221021
flac141 = Case GCC 12.2.0

24-bit transcoding (96-352kHz)

PS H:\> measure-command{h:\flac1013 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  60.2480702


PS H:\> measure-command{h:\flac1021 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  56.8414738


PS H:\> measure-command{h:\flac141 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  55.1369771
PS H:\> measure-command{h:\flac1021wombat *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  55.8933077


PS H:\> measure-command{h:\flac1021wombat-noasm *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  60.1707838

Same files in same sample rates, but 16-bit

PS H:\> measure-command{h:\flac1021wombat *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  37.7704419


PS H:\> measure-command{h:\flac1021wombat-noasm *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  35.4706544
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 16:07:52
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-10-21 16:09:51
I tested my AMD Ryzen 5 3600X again with different builds to see if there is any encoding speedup.  8)

Code: [Select]
flac -8p

Source
Code: [Select]
Codec      :     PCM (WAV)
Duration   :     57:21:749
Sample rate:     48000 Hz
Channels   :     2
Bits per sample: 16

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    56.570

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 425812103 bytes, ratio=0,644
Global  Time =    54.792

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    57.293

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    52.017
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-21 16:18:00
@music_1 : If you test -8, -8r8 and -8e rather than -8p, what happens?
Asking because -p brute-forces part of the process, -e a different one. Of course -8 is faster than either (and -8e is not very useful anymore!), but it is interesting to see whether the order of compiles stays the same. If not, then one build makes one part of the job more efficient and another build a different part.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 16:34:33
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
flac1021znver2john33
24/48
Total encoding time: 1:03.859, 200.51x realtime
16/48
Total encoding time: 0:29.985, 427.03x realtime

My i3-12100 must be a remarked Ryzen  :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-10-21 16:48:00
flac -8

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426124832 bytes, ratio=0,645
Global  Time =    18.246

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426124828 bytes, ratio=0,645
Global  Time =    17.863

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    17.792

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426124836 bytes, ratio=0,645
Global  Time =    17.647

flac -8r8

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426124602 bytes, ratio=0,645
Global  Time =    20.921

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426124598 bytes, ratio=0,645
Global  Time =    20.196

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 426124606 bytes, ratio=0,645
Global  Time =    20.341

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426124606 bytes, ratio=0,645
Global  Time =    20.960

flac -8e

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426050030 bytes, ratio=0,645
Global  Time =    51.351

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426050026 bytes, ratio=0,645
Global  Time =    52.222

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 426050035 bytes, ratio=0,645
Global  Time =    50.218

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426050035 bytes, ratio=0,645
Global  Time =    51.782
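
To see whether the order of compiles stays the same across settings, the "Global Time" values can be sorted per setting. A minimal Python sketch using the numbers from music_1's two posts; the short build labels are my own:

```python
# "Global Time" in seconds per setting and build, from music_1's two posts.
times = {
    "-8": {"Xiph": 18.246, "Case znver3": 17.863,
           "john33 haswell": 17.792, "john33 znver2": 17.647},
    "-8r8": {"Xiph": 20.921, "Case znver3": 20.196,
             "john33 haswell": 20.341, "john33 znver2": 20.960},
    "-8e": {"Xiph": 51.351, "Case znver3": 52.222,
            "john33 haswell": 50.218, "john33 znver2": 51.782},
    "-8p": {"Xiph": 56.570, "Case znver3": 54.792,
            "john33 haswell": 57.293, "john33 znver2": 52.017},
}

# Builds ordered from fastest to slowest for each setting.
ranking = {setting: sorted(builds, key=builds.get) for setting, builds in times.items()}
for setting, order in ranking.items():
    print(f"{setting}: {' < '.join(order)}")
```

In these single runs the order is not stable: a different build comes out fastest under -8r8 and -8e than under -8 and -8p, though most gaps are small enough that run-to-run variation may account for part of it.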
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-21 17:22:26
My i3-12100 must be a remarked Ryzen  :))

The only 12th generation Intel here? Plot thickens.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 17:54:47
Thanks. Here are results with some AVX-only builds.

Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-znver2-GCC1220.zip
Total encoding time: 1:13.328, 29.32x realtime
425513429 bytes

znver3 (https://hydrogenaud.io/index.php/topic,123014.msg1016407.html#msg1016407)
Total encoding time: 1:11.891, 29.91x realtime
425513429 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-AVX2%20-GCC1220.zip
Total encoding time: 1:12.250, 29.76x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

It seems that the Ryzen builds have no compatibility issue with my Intel CPU.
No joke. Looks like a znver3 build would even be better.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 14:35:17
flac 1.4.2 arrived. Here is my attempt with GCC 12.2.0 and the same flags as before. One build has disable-asm-optimizations, for faster 16-bit encoding, for use in CUETools for example.
btw. configure spits out "unrecognized options: --enable-sse" Is it meant to be that way and the compiler decides? In that case different CPU versions could benefit more from the compiler choices?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-22 15:46:05
btw. configure spits out "unrecognized options: --enable-sse" Is it meant to be that way and the compiler decides? In that case different CPU versions could benefit more from the compiler choices?
--enable-sse was a misnomer. It was actually 'force sse2'. This option has been removed. See here (https://github.com/xiph/flac/issues/486) and the changelog (https://github.com/xiph/flac/blob/master/CHANGELOG.md).

It didn't do anything for 64-bit compiles anyway, only for 32-bit compiles.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 15:49:58
Thanks. Now Clang 15.0.3 (MSYS2) creates faster binaries for the first time. Attached some.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-22 16:32:24
Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.

1000-1177kbps compressed, 119 files, 12h55m50s

-8 -q8
6,305,740,006 bytes

-8 -q9
6,305,212,370 bytes

-8 -q10
6,305,587,831 bytes

-8
6,307,265,332 bytes

950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower bitrate files are much harder to predict, -p will make more sense.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-22 18:15:57
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
flac1021znver2john33
24/48
Total encoding time: 1:03.859, 200.51x realtime
16/48
Total encoding time: 0:29.985, 427.03x realtime

My i3-12100 must be a remarked Ryzen  :))
flac 1.4.1 Case GCC 12.2.0
24/48: 1:02.718, 204.16x realtime
16/48: 0:32.359, 395.70x realtime

flac 1.4.2 Wombat Clang 15.0.3
24/48: 1:01.844, 207.04x realtime
16/48: 0:30.000, 426.82x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 18:51:21
Nice. For 16-44.1, GCC 12.2.0 with asm disabled is the fastest. Clang does badly with it disabled. It will be interesting to see how fast a build from Case and his clean environment is. I depend on MSYS2.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-23 02:40:54
Pick your favorites out of the packages :)

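Ryzen 5900x

Clang
-8 -p single file 24-96: 28.75x realtime
-8 -p single file 16-44.1: 112.03x realtime
metaflac Replaygain 18,6GB high-bitrate files: 2:20 minutes

GCC
-8 -p single file 24-96: 27.97x realtime
-8 -p single file 16-44.1: 106.54x realtime
metaflac Replaygain 18,6GB high-bitrate files: 2:09 minutes

GCC disable-asm-optimizations
-8 -p single file 24-96: 21.48x realtime
-8 -p single file 16-44.1: 132.45x realtime
metaflac Replaygain 18,6GB high-bitrate files: 2:09 minutes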
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 12:13:26
Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.
950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower bitrate files are much harder to predict, -p will make more sense.
All flac 1.4.2, multi-thread. All builds produce the same file sizes.

Upper: -8p
4455579306 bytes

Lower: -8 -q10 -b2880
4456433345 bytes

Wombat GCC 12.2.0
Total encoding time: 1:19.875, 457.10x realtime
Total encoding time: 0:28.890, 1263.81x realtime

Wombat GCC 12.2.0 noasm

Total encoding time: 1:23.219, 438.74x realtime
Total encoding time: 0:29.890, 1221.53x realtime

Wombat Clang 15.0.3
Total encoding time: 1:24.375, 432.72x realtime
Total encoding time: 0:29.797, 1225.34x realtime

Xiph
Total encoding time: 1:26.812, 420.58x realtime
Total encoding time: 0:31.203, 1170.13x realtime

Finally different from a real Ryzen.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 12:53:19
24-bit, 88.2-352.8kHz, 14 files, multi-thread, -8p

Wombat GCC 12.2.0
Total encoding time: 1:40.266, 40.59x realtime
1916812600 bytes

Wombat GCC 12.2.0 noasm
Total encoding time: 1:54.453, 35.56x realtime
1916812600 bytes

Wombat Clang 15.0.3
Total encoding time: 1:39.859, 40.76x realtime
1916812631 bytes

Xiph
Total encoding time: 1:41.250, 40.20x realtime
1916812621 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 14:07:37
I'll have a go too.

Here are the results for testing the 'Xiph' Win64 binary

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 19:20:41
High resolution coming up. Prepare to be impressed if you haven't already seen what 1.4.x can do to high resolution.

* No classical music in this corpus - that behaves differently (much smaller benefits from going above -7), so this is arguably a bit 1.4-friendly
* All stereo. Vast majority 96/24; a little bit of it is 88.2, and one track is 96/16.
* 178 files. Nearly all my non-classical high resolution stereo downloads. (DSD test files excluded, but who the hell uses those for anything but WavPack worship?)
* File sizes with tags removed. Which I forgot to do about the FLACs, so I removed afterwards, saw a decrease of 22 670 684, and adjusted. I think this means padding is removed too. Maybe a bit unfair to the APEv2 tagged formats, which don't need padding, but more interesting to discussing the codecs themselves. (But, I used MD5 ... as if that matters much.)
* Everything that isn't stated as a different codec or as 1.3 is 1.4.1 or 1.4.2

64.286%  ALAC (refalac 1.75)
63.589%  FLAC 1.3 at -5
62.647%  TAK -p0
62.058%  FLAC 1.3 at -8e
61.799%  -3 comfortably beats old -8pe too, not only -8e
61.477%  TTA
60.746%  Monkey's Normal
60.745%  Monkey's High
60.798%  WavPack -hx
59.920%  -5
59.868%  Monkey's Insane
59.742%  Monkey's Extra High
59.571%  TAK -p1
59.313%  MPEG-4 ALS at default
58.979%  -7
59.047%  TAK -p2
58.673%  -8e is faster and compresses better than -8p. but there are better options
58.649%  avoid this -8pe. it takes ten times -8e
58.560%  -7r7 -A "subdivide_tukey(6)"
58.525%  -7r7 -A "subdivide_tukey(6)" -l 13
58.525%  -7r7 -A "subdivide_tukey(6)" -l 14 (yes slightly smaller than -l 13)
58.489%  -7r7 -A "subdivide_tukey(6)" -l 15
58.477%  about -8e speed: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14
58.475%  -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 -b 8192
58.454%  -7r7 -A "subdivide_tukey(7);tukey(7e-2)" -l 16 -b 8192
58.059%  WavPack -hx4
57.879%  TAK -p3
57.753%  TAK -p4m
57.412%  OptimFROG --preset 2 (default setting)
.
Evidence from this and some other non-rigorous testing on a part of these 178 files:
* This is damn good, although it's not gonna touch TAK nor WavPack -hx4 (where for the WavPack I tried only those two settings, -hx and -hx4 this time).
Actually, if I take the high-rez part of ktf's comparison (http://audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%205%20-%20hires.html) and "manually" imagine a FLAC improvement like this, it doesn't seem to beat TAK -p1, so this is likely "more FLAC-friendly material" - relatively at least.
* That "all the sevens" thing: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" - with or without -l 14 (which is a "seven" good enough to remember) - were at first arbitrarily chosen to see "what does it take to get around -8e time". As you see, it is much better ... urhmh ... to the extent anything around there is "much".
* -8e beats -8p at both time and size. Maybe surprising - to those who haven't already tested it. ( @bennetng , you just tested hi-rez -8p: how does it work with your material?)
* -l 13 to -l 15 have something to them, but careful: It does not seem to be the case directly off -7 or -8. Say -8 -l 13 is not good, but -8 -A [something slow] -l 13 is. A bit of testing indicates that -l 13 starts saving space at -A subdivide_tukey(5) and -l 14 at (6).
With high-res classical music, -l 13 is the setting that improves over -7.
* -b 8192 also needs "-A [something slow]", it seems also to do harm when applied to -7 or -8 plain. But it doesn't help much here.
* -r7 is a good thing, but not at my high-resolution classical music; there, the sixth and seventh order are seldom used at all. (But at worst I found, -r7 at classical makes for 0.2 parts per million in size and only costs a bit of time, so ... if you want one monster slow setting, include -r7. -r8 doesn't improve much over -r7 it seems.)
* Monkey's wasn't developed for high resolution. Well we knew that - none of the codecs were. However, on the Planet of the Apes there is nothing to do about it within a given compression mode: then a given signal yields a given encoded bitstream with no room for improving anything but speed.

Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-23 19:35:24
What are logical values for the x_tukey options?  Seems I can put anything and the encoder accepts it.  I see above you're using (7e-1), I've seen (3/2e-1).  I tried something random like (9/7z-1) and it works.  I've had good results using whole numbers, but what about these other values?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 20:01:38
I tried something random like (9/7z-1) and it works.
It is a bit tricky that the encoder accepts anything and silently drops stuff it doesn't understand.

If you want to know what it does, read the explanation at the bottom of this page: https://xiph.org/flac/documentation_tools_flac.html

TL;DR: for starters, just use whole numbers like subdivide_tukey(5) or something. If you feel like it, you can specify a second value, a fraction between 0 and 1, like subdivide_tukey(5/0.2). The decimal separator of that second value is locale-specific, so it is subdivide_tukey(5/0,2) on many non-English PCs. Using scientific notation (2e-1) is a way around that. For other apodizations, see the linked document.
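A small sketch of the scientific-notation workaround, in Python: it rewrites a decimal fraction so that it contains no decimal separator at all, which sidesteps the locale question. The helper names `sci_no_point` and `subdivide_tukey_spec` are made up for illustration and are not part of flac; the output is only the string one would pass after -A.

```python
from decimal import Decimal
from typing import Optional


def sci_no_point(value: str) -> str:
    """Rewrite a decimal string such as "0.2" as "2e-1" (no "." or ",")."""
    sign, digits, exponent = Decimal(value).as_tuple()
    mantissa = "".join(str(d) for d in digits)
    return f"{'-' if sign else ''}{mantissa}e{exponent}"


def subdivide_tukey_spec(n: int, p: Optional[str] = None) -> str:
    """Build an apodization string; "/" separates N from the optional P."""
    if p is None:
        return f"subdivide_tukey({n})"
    return f"subdivide_tukey({n}/{sci_no_point(p)})"


if __name__ == "__main__":
    print(subdivide_tukey_spec(5))
    print(subdivide_tukey_spec(5, "0.2"))
```

Passing the fraction as a string avoids any float formatting surprises on the Python side.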
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 20:08:14
* Monkey's wasn't developed for high resolution. Well we knew that - none of the codecs were. However, on the Planet of the Apes there is nothing to do about it within a given compression mode: then a given signal yields a given encoded bitstream with no room for improving anything but speed.
If FLAC hadn't been changed in a non-backwards compatible way back in 2007 (https://www.ietf.org/archive/id/draft-ietf-cellar-flac-07.html#name-addition-of-5-bit-rice-para), it would have done terribly at 24-bit too. Luckily that change went in before Josh left. Now it is way too late to make such a change.
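To illustrate why the parameter width matters, here is a rough Python sketch of the cost of a Rice code. It assumes the usual zigzag folding of signed residuals, and it assumes that a 4-bit parameter field allows parameters up to 14 and a 5-bit field up to 30 (with the top value reserved as an escape). The function names are hypothetical, and the escape-to-raw-binary alternative is ignored here.

```python
from typing import List


def rice_bits(residual: int, k: int) -> int:
    """Bits needed for one residual with Rice parameter k."""
    # Fold signed values to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    folded = 2 * residual if residual >= 0 else -2 * residual - 1
    # Unary quotient, one stop bit, then k binary remainder bits.
    return (folded >> k) + 1 + k


def best_rice_bits(residuals: List[int], max_k: int) -> int:
    """Smallest total size over all parameters 0..max_k."""
    return min(
        sum(rice_bits(r, k) for r in residuals) for k in range(max_k + 1)
    )


if __name__ == "__main__":
    loud = [1 << 20]  # a residual magnitude that 24-bit audio can produce
    print("max parameter 14:", best_rice_bits(loud, 14), "bits")
    print("max parameter 30:", best_rice_bits(loud, 30), "bits")
```

With the parameter capped at 14, the unary part grows out of hand for large residuals; a larger parameter keeps it short.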
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 20:30:17
Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?
Asking because the hdcd.exe utility appeared around 2007. So quite soon there would indeed be quite a few CD-sourced 24 bit (with at least four wasted) files.
And so if that were a problem, it would likely have manifested itself - unless reference FLAC would use Rice-4 on those signals then.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 20:56:41
I've seen (3/2e-1)
Just a point: "/" is not a division slash, it is a separator between arguments, where the first is mandatory.
tukey(P) takes only one argument in, and that is a number between 0 and 1. The subdivide_tukey can be specified as subdivide_tukey(N) and optionally subdivide_tukey(N/P) - but then again, the "/P" has nothing to do with division. As ktf says, for starters stick to N and remember that higher N will slow down.

What does this tukey function do? For the block of the signal - 4096 samples, typically - it keeps the middle 1-P fraction and it downweights the beginning and end according to a cosine function. Turns out, it typically gives a much better predictor than not applying any weight - that would be "rectangle".
The subdivide_tukey "generates more functions" in a way that recycles lots of calculations. It takes time to try them all, but it doesn't make for more complicated decoding. They are several simple attempts, and the encoder picks the one that happens to fit best. The decoder doesn't know how hard the encoder tried.
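For the curious, a minimal Python sketch of a textbook Tukey (tapered cosine) window as described above: the middle 1-P fraction stays at weight 1 and each end is tapered by half a cosine. This is the textbook shape, not a copy of libFLAC's window code, so rounding at the taper edges may differ.

```python
import math
from typing import List


def tukey(n_samples: int, p: float) -> List[float]:
    """Tukey window: p = 0 gives "rectangle", p = 1 gives a Hann-like shape."""
    window = [1.0] * n_samples
    # Length (in samples) of the taper at each end.
    edge = p * (n_samples - 1) / 2.0
    for n in range(n_samples):
        distance = min(n, n_samples - 1 - n)  # distance to the nearest end
        if distance < edge:
            window[n] = 0.5 * (1.0 - math.cos(math.pi * distance / edge))
    return window


if __name__ == "__main__":
    w = tukey(4096, 0.5)
    untouched = sum(1 for x in w if x == 1.0)
    print(f"{untouched} of {len(w)} samples keep full weight")
```

The encoder multiplies the block by such a window only for the analysis step; the decoder never sees it.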
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 21:05:30
@bennetng , you just tested hi-rez -8p: how does it work with your material?
Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p. Acoustic / unplugged materials as usual may benefit from higher -b, for 88.2k and above I would try 6144 to 16384 for these genres. Amplified / electronic / loudness war hi-res files still prefer lower -b. When -b is wrong, windowing also works poorly.

I also found something about decoded HDCD, I mean the "real" tracks which really make use of transient filter and peak extension. flac seems pretty good at dealing with HDCD. I used flac -a and see many wasted bits in a decoded HDCD image.

I investigated wasted bits with a normal 16-bit file, using foo_dsp_utility > Scale. 24-bit output with 0.5 scale is essentially a bit shift, so the bitrate remains almost the same, but then I tried 0.75, 0.625 and 0.875 and flac can still get a lot of wasted bits. However, with dither or something like -0.1dB gain flac can no longer see wasted bits. foo_dsp_utility > Add Noise kills wasted bits as well, like 0.000001 noise with 16 to 24-bit conversion.
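The arithmetic behind this can be sketched in Python. Assuming the 16-to-24-bit conversion multiplies each sample by 256 and then by the scale factor, a scale of 0.75 is a multiplication by 192 = 3 x 64, which leaves six low bits at zero in every sample. The function names are made up for this sketch and the sample values are arbitrary.

```python
from typing import List


def wasted_bits(samples: List[int]) -> int:
    """Number of low-order bits that are zero in every sample."""
    combined = 0
    for s in samples:
        combined |= s
    if combined == 0:
        return 0
    # Isolate the lowest set bit and take its position.
    return (combined & -combined).bit_length() - 1


def scale_to_24bit(samples16: List[int], scale: float) -> List[int]:
    """Convert 16-bit samples to 24-bit and apply a gain factor."""
    return [round(s * 256 * scale) for s in samples16]


if __name__ == "__main__":
    samples = [1, -3, 1000, -32768, 12345]
    for scale in (0.5, 0.75, 0.625, 0.875, 10 ** (-0.1 / 20)):
        print(f"scale {scale:.6f}: "
              f"{wasted_bits(scale_to_24bit(samples, scale))} wasted bits")
```

A gain like -0.1dB is not a multiple of a power of two after scaling, so the low bits fill up and nothing is wasted any more.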
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 21:22:41
Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p.
Comparing -8e to -8p?
For CDDA, -8p is better and -8e is (not always (https://hydrogenaud.io/index.php/topic,122949.msg1015422.html#msg1015422)) so much outdone that you wouldn't use it.
For high resolution, -e cannot be so easily written off, at least it is better than -p - in my tests, that is.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-24 02:12:16
Comparing -8e to -8p?
Only one of the 20 24-96 albums I use for the Replaygain benchmark comes out smaller with -8 -e vs -8 -p.
The size difference is small, but the speed difference is not.
The GCC version is much faster in multithreading with these than the Clang version, if anybody wants to know.

-8 -p
233.22x realtime
19.713.844.242 Bytes

-8 -e
292.17x realtime
19.721.033.993 Bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-24 06:25:41
Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?
That bit is directly what libFLAC does too. Seems silly to change it when there are already so many versions out that do this at 16 bit. Also, it would probably hurt compression a bit and ffmpeg doesn't care anyway, it even uses 5-bit Rice parameters for 16-bit audio in extreme cases.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 07:56:10
Here is what I can do with -e: resample a CDDA album to 24/88.2 with different resamplers. Care was taken to avoid clipping.
Code: [Select]
1052063534 SoX best 8p.flac
1050564778 SoX best 8e.flac

1214873052 RetroArch normal 8p.flac
1215123562 RetroArch normal 8e.flac

1335883144 RetroArch lower 8p.flac
1336339394 RetroArch lower 8e.flac
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 08:06:48
This is slightly puzzling. -e being helpful in particular when there is no high frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 08:22:49
No contradiction because the difference is night and day when compared to 1.3.x. No -p or -e.

1052844799 SoX best flac 142 -8.flac
1257694431 SoX best flac 134 -8.flac
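To put a number on "night and day", a quick Python calculation with the two sizes above (the function name is only for illustration):

```python
def percent_saved(old_size: int, new_size: int) -> float:
    """Size reduction of new_size relative to old_size, in percent."""
    return 100.0 * (old_size - new_size) / old_size


if __name__ == "__main__":
    flac_134 = 1257694431  # SoX best, flac 1.3.4 -8
    flac_142 = 1052844799  # SoX best, flac 1.4.2 -8
    print(f"1.4.2 is {percent_saved(flac_134, flac_142):.2f}% smaller")
```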
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-24 09:39:54
Don't want to crash the hires party, but here are some test results with v1.4.2 binaries floating around here.
As always, the CPU is an Intel Core i7-8700 CPU @ 3.20GHz and the test corpus is mostly classic rock CDDA material.
Code: [Select]
FLAC Binary: xiph-142\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.940 seconds (5 rounds), Encoding speed = 386.97x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-noasm-wombat_2022-10-23.exe (665600 bytes)
FLAC Option: -7
- Average time =  22.560 seconds (5 rounds), Encoding speed = 479.25x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-wombat_2022-10-23.exe (737280 bytes)
FLAC Option: -7
- Average time =  25.519 seconds (5 rounds), Encoding speed = 423.68x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-Clang1503-Ofast-wombat_2022-10-23.exe (613376 bytes)
FLAC Option: -7
- Average time =  24.699 seconds (5 rounds), Encoding speed = 437.75x
- FLAC size = 1.167.014.372 bytes (= 61,188% of WAV size, ~863 kbps)
So here wombat's gcc build w/noasm is the fastest, on par with his 141-fastmath build from 2022-10-14.
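For comparison, a short Python sketch that turns the times above into speed relative to the Xiph binary, and checks that time multiplied by encoding speed gives the same corpus duration for every build. The dictionary keys are shortened labels, not the real file names.

```python
from typing import Dict

# Average -7 encoding times (seconds) and speeds, copied from the results above.
TIMES = {
    "xiph-142": 27.940,
    "wombat gcc noasm": 22.560,
    "wombat gcc": 25.519,
    "wombat clang": 24.699,
}
SPEEDS = {
    "xiph-142": 386.97,
    "wombat gcc noasm": 479.25,
    "wombat gcc": 423.68,
    "wombat clang": 437.75,
}


def relative_speed(times: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Speed of each build as a multiple of the baseline build."""
    return {name: times[baseline] / t for name, t in times.items()}


def implied_durations(times: Dict[str, float],
                      speeds: Dict[str, float]) -> Dict[str, float]:
    """Audio duration in seconds implied by each time/speed pair."""
    return {name: times[name] * speeds[name] for name in times}


if __name__ == "__main__":
    for name, factor in relative_speed(TIMES, "xiph-142").items():
        print(f"{name}: {factor:.3f}x the Xiph speed")
    for name, seconds in implied_durations(TIMES, SPEEDS).items():
        print(f"{name}: corpus duration {seconds:.0f} s")
```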
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-24 10:26:56
This is slightly puzzling. -e being helpful in particular when there is no high frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
The improvements in 1.4.0 weren't related to the guesstimation. 1.4.0 improved the accuracy with which predictors were formed, the guesstimation (that -e circumvents by brute-forcing) is in which order to pick.

FLAC's LPC encoding works by first calculating autocorrelation. These calculated autocorrelation numbers are then crunched to form a set of predictors, one for each prospective LPC order (for preset 8 that is order 1 through 12). These predictors have become more accurate with the release of 1.4.0. However, the encoder still has to guess which order will result in the smallest representation. This guesstimation remains unchanged.
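The pipeline described above can be sketched in Python with a textbook autocorrelation plus Levinson-Durbin recursion, which yields one predictor per order along with the remaining prediction error for each. This is a generic illustration under that assumption, not libFLAC's code; windowing, quantization of the coefficients and the actual order guess are left out.

```python
from typing import List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Autocorrelation of x for lags 0..max_lag."""
    return [
        sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: List[float],
                    max_order: int) -> Tuple[List[List[float]], List[float]]:
    """Return (predictors, errors); predictors[k-1] holds the order-k
    coefficients, so that x[n] is predicted as sum(a[j] * x[n-1-j])."""
    predictors, errors = [], []
    a: List[float] = []
    err = r[0]
    for k in range(1, max_order + 1):
        if err <= 0.0:
            break  # signal is already perfectly predicted
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err
        a = [a[j] - refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
        predictors.append(a)
        errors.append(err)
    return predictors, errors


if __name__ == "__main__":
    signal = [0.5 ** n for n in range(64)]  # each sample is half the previous
    preds, errs = levinson_durbin(autocorrelation(signal, 4), 4)
    for order, (coeffs, e) in enumerate(zip(preds, errs), start=1):
        print(order, [round(c, 6) for c in coeffs], e)
```

The error per order is the kind of figure an order guess can be based on; -e instead tries the orders for real and keeps the smallest result.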
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 11:26:23
@ktf what is the problem with this file?
https://hydrogenaud.io/index.php/topic,123219.msg1018107.html#msg1018107

BTW... (https://hydrogenaud.io/index.php/topic,123025.msg1018053.html#msg1018053)
Code: [Select]
34586615 -8p.flac
34590833 -8 -b2304 -q10.flac
34598936 -8e.flac
34606376 -8.flac
50697404 Desert Rose.wav
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 17:15:10
Try this with 24-bit and >= 88.2kHz, ideally with appropriate -b:

-8 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"

Much faster than -8e and seems to give good results, including "real" and "fake" hi-res and DSD transcodes with appropriate ultrasonic filtering. May not work well with clipped and loudness war hi-res though. -8p for 24-bit is just too slow. -8e is about two times as fast but still slow.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 17:48:34
This is slightly puzzling. -e being helpful in particular when there is no high frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
The improvements in 1.4.0 weren't related to the guesstimation. 1.4.0 improved the accuracy with which predictors were formed, the guesstimation (that -e circumvents by brute-forcing) is in which order to pick.

Yes, but: for CDDA it seems you (& the rest of the developers) effectively killed the need for -e - and without touching the guesstimation algorithm, then how?
I can only guess that the better you hit (and double precision does that!), the closer you get to what FLAC reasonably can achieve, and the smaller are the improvements available by any means.
Now with the files I tested above, FLAC can beat TAK -p2. Not proving anything, but mildly suggesting that at least here, you really hit something close to optimum - but all of a sudden there is a brute-force switch that appears more attractive than for CDDA.
And that is kinda puzzling.

Also -p not doing well ... actually, I might be fooled here by -p having become damn slow, which leads to writing it off as "not worth it". Question: are the new versions doing the -p routine entirely in double precision even after precision is truncated down to 15...5 bits - and can that explain the slowdown? (And if so: is that even necessary? Is there anything gained to stay in double precision after you have calculated a proto-predictor to be rounded off?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 18:30:28
58.477%about -8e speed: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14
58.475%-7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 -b 8192
Did you really type "flatopp"?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-24 19:30:51
It is a bit tricky that the encoder accepts anything and silently drops stuff it doesn't understand.

If you want to know what it does, read the explanation at the bottom of this page: https://xiph.org/flac/documentation_tools_flac.html

TL;DR: for starters, just use whole numbers like subdivide_tukey(5) or something. If you feel like it, you can specify a second value, a fraction between 0 and 1, like subdivide_tukey(5/0.2). The decimal separator of that second value is locale-specific, so it is subdivide_tukey(5/0,2) on many non-English PCs. Using scientific notation (2e-1) is a way around that. For other apodizations, see the linked document.

Just a point: "/" is not a division slash, it is a separator between arguments, where the first is mandatory.
tukey(P) takes only one argument in, and that is a number between 0 and 1. The subdivide_tukey can be specified as subdivide_tukey(N) and optionally subdivide_tukey(N/P) - but then again, the "/P" has nothing to do with division. As ktf says, for starters stick to N and remember that higher N will slow down.

What does this tukey function do? For the block of the signal - 4096 samples, typically - it keeps the middle 1-P fraction and it downweights the beginning and end according to a cosine function. Turns out, it typically gives a much better predictor than not applying any weight - that would be "rectangle".
The subdivide_tukey "generates more functions" in a way that recycles lots of calculations. It takes time to try them all, but it doesn't make for more complicated decoding. They are several simple attempts, and the encoder picks the one that happens to fit best. The decoder doesn't know how hard the encoder tried.

Thanks for the information.  I did a test of different combinations of options, to see what kind of times vs compression I would get.  I started the test before I asked the question, and it took over a day to complete.  I had put in a couple of random values for subdivide_tukey(X/Xe-1).  It looks like "subdivide_tukey(21/15e-1)" shouldn't work, as that would exceed what should work for tukey, but it did seem to be helpful here.

Code: [Select]
Ratio:      Size:             Enc Time   Options used
59.41%   464.176M   8:32:57    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(21)  
59.41%   464.178M   5:15:19    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(21)  
59.41%   464.185M   5:34:39    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(17)  
59.41%   464.187M   3:29:06    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(17)  
59.42%   464.205M   3:15:35    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(13)  
59.42%   464.206M   2:03:37    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(13)  
59.43%   464.281M   31:07.50    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(21/15e-1)  
59.43%   464.281M   46:29.04    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(21)  
59.43%   464.283M   31:08.72    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(21)  
59.43%   464.292M   30:45.12    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(17)  
59.43%   464.293M   20:43.93    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(17)  
59.43%   464.297M   20:36.87    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(17/12e-1)  
59.43%   464.315M   18:20.15    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(13)  
59.43%   464.316M   12:19.42    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(13)  
59.43%   464.320M   12:20.79    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(13/9e-1)  
59.43%   464.342M   1:35:41    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(9)  
59.43%   464.343M   23:23.17    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(21)  
59.43%   464.344M   59:58.54    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(9)  
59.44%   464.352M   15:32.61    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(17)  
59.44%   464.373M   9:18.64    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(13)  
59.45%   464.444M   30:08.22    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(5)  
59.45%   464.447M   19:02.92    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(5)  
59.45%   464.466M   6:05.76    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(9/6e-1)  
59.45%   464.477M   9:01.27    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(9)  
59.45%   464.479M   6:05.86    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(9)  
59.46%   464.525M   4:39.14    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(9)  
59.47%   464.595M   2:00.80    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(5/3e-1)  
59.47%   464.598M   2:55.68    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(5)  
59.47%   464.600M   2:00.44    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(5)  
59.47%   464.644M   1:35.94    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(5)  
59.48%   464.661M   6:08.93    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(21)  
59.48%   464.663M   4:25.36    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(21)  
59.48%   464.670M   4:09.56    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(17)  
59.48%   464.671M   3:01.99    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(17)  
59.48%   464.691M   2:34.32    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(13)  
59.48%   464.692M   1:54.71    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(13)  
59.48%   464.723M   3:34.07    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(21)  
59.48%   464.730M   2:28.18    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(17)  
59.49%   464.748M   1:33.87    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(13)  
59.50%   464.849M   1:21.94    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(9)  
59.50%   464.850M   1:02.65    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(9)  
59.51%   464.896M   0:52.11    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(9)  
59.51%   464.962M   0:32.64    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(5)  
59.51%   464.963M   0:26.19    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(5)  
59.52%   465.009M   0:22.85    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(5)  
59.54%   465.197M   0:12.73    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(3)   # Same as preset -8 #
59.59%   465.591M   0:08.72    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(2)   # Same as preset -7 #
59.74%   466.717M   0:07.24    -b 4096 -m -l 8 -r 6 -A subdivide_tukey(2)   # Same as preset -6 #
59.90%   467.965M   0:05.40    -b 4096 -m -l 8 -r 5   # Same as preset -5 #
64.92%   507.177M   0:03.73    -b 1152 -l 0 -r 3 --no-mid-side   # Same as preset -0 #
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-24 21:24:30
Nice. For 16-44.1, GCC 12.2.0 with asm disabled is the fastest. Clang does badly with it disabled. It will be interesting to see how fast a build from Case and his clean environment is. I depend on MSYS2.

I noticed this as well.  With asm enabled, Flac performs much better, but still a little behind Flac compiled with GCC.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 21:49:05
Thanks for the information.  I did a test of different combinations of options, to see what kind of times vs compression I would get.  I started the test before I asked the question, and it took over a day to complete.  I had put in a couple of random values for subdivide_tukey(X/Xe-1).  It looks like "subdivide_tukey(21/15e-1)" shouldn't work, as that would exceed what should work for tukey, but it did seem to be helpful here.

There is a complication particular to subdivide_tukey(N):
subdivide_tukey(1) is the same as a single tukey(.5), but subdivide_tukey(N) tapers off a fraction of .5/N (that slash is a division slash!).
It wants to generate N small tukey humps (and more!) - and with a small tukey hump, it doesn't make that much sense to taper away a big fraction of the total from every small one. But the consequence is that 21/15e-1 tapers off 1.5/21ths = 1/14ths or around 7 percent of the window, or around 3.5 percent at each end.

That  .5 divided by N is also the reason why I was testing something like
-A "subdivide_tukey(5/125e-1);tukey(666e-3)"
The subdivide_tukey tapers very little (that is, is fairly close to rectangle) so I combine it with a tukey that tapers a lot (that is, tapers as much as the left third and the right third and leaves only the middle third un-downweighted). Why do that? Making the functions different. There is no use trying two identical functions, and very little use trying two near-identical ones.
And that function is faster than subdivide_tukey(6/P).

Before partial_tukey, punchout_tukey and subdivide_tukey, you had to type in each and every function and it would only do one function per one you typed. subdivide_tukey(5) does a lot (and it does them faster than typing N individual ones into it).
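The taper arithmetic above, as a tiny Python sketch. It simply follows the description in this post (total taper = P divided by N, default P = .5) and has not been checked against the libFLAC source; the function name is made up.

```python
from typing import Tuple


def taper_fraction(n: int, p: float = 0.5) -> Tuple[float, float]:
    """Return (total tapered fraction, fraction tapered at each end)
    for subdivide_tukey(N/P), following the P/N rule described above."""
    total = p / n
    return total, total / 2.0


if __name__ == "__main__":
    for n, p in ((1, 0.5), (5, 0.5), (21, 1.5)):
        total, per_end = taper_fraction(n, p)
        print(f"subdivide_tukey({n}/{p}): {100 * total:.1f}% in total, "
              f"{100 * per_end:.1f}% at each end")
```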


Did you really type "flatopp"?
Damn, only one way to find out, and that is not done in two minutes ...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 07:54:50
I'd take a look at how the "best" settings compared to others with the attached simple signal :D

-b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(21)
4970861 bytes 3:22.843, 0.14x realtime

-b512 -r8 -q5
4583068 bytes 0:00.171, 175.43x realtime

WavPack hhx6
4376064 bytes 0:01.313, 22.84x realtime

APE insane
5018164 bytes 0:00.391, 76.72x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 09:21:51
I'd take a look at how the "best" settings compared to others with the attached simple signal :D
If the signal is so simple that 7-Zip outperforms all audio codecs, I don't think it is fit for such a test
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 11:05:34
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 14:02:02
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
Venereolog'ing the compressors: https://hydrogenaud.io/index.php/topic,122040.msg1010086.html#msg1010086

The ultra-useless sac compressor could shave more than a quarter off the smallest OptimFROG.
I got xz down to 40 972 856, not beating more than the second-smallest .sac file. Well xz took only 20 seconds.
7z was around ten percent bigger than xz.

There must be some long-term repeating patterns there. Long-term as far as audio goes. AFAIunderstand, an OptimFROG block can be several seconds long (and an insane monkey nearly half a minute, without being able to make any sense out of that track). At the very least, xz got the flac file down by ten percent.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 17:02:46
With -l12 -r8 -b512 -q5 I got 424599071 bytes in Venereology at 207x speed.

Then the slow stuff:
424286964 bytes with subdivide_tukey(21)
424266692 bytes with subdivide_tukey(14)
424261436 bytes with subdivide_tukey(11/1)
424249707 bytes with subdivide_tukey(21/3)
424238652 bytes with subdivide_tukey(15/2)

11/1 is about two times faster than others, other settings have similar speed, at least in this test.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 19:16:08
* Monkey's Audio needs more bits than uncompressed PCM. TAK makes it below 1411 on a setting.
* 58176764: the .wav file
* 56535466 for the smallest .flac I could get
* 53920654 for the smallest .wv I could get
* 53222937 for the smallest frog
* 50613838 for sac at default
* 49046602 for the twelve-hour sac compression. But hey, it decodes in less than ten minutes. (Realtime playback? Please, no irrelevant questions! O:)  )
I see, only track 3 is used.
-l3 -b512 -r8 -q5 -e -A subdivide_tukey(15/2)
52292911 bytes
Higher -l does nothing for this track apart from slowing things down. The whole album would need a higher value though.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 19:46:42
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
I wouldn't call Merzbow proper material to base a test on, except when part of a well-rounded corpus. In this particular case, you have a signal for which 7z outperforms audio codecs by a factor of more than 10.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 20:20:49
I wouldn't call Merzbow proper material to base a test on, except when part of a well-rounded corpus. In this particular case, you have a signal for which 7z outperforms audio codecs by a factor of more than 10.
But you can see the potential of flac (vs other lossless audio formats, not 7z or xz) in these special signals when the optimal parameters are being used. Assuming there were a multi-pass encoder (can't use pipe?) or a "long-term analyzer", would that mean end users no longer need to guess the optimal settings? Not that I want something like this but just curious about it.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 20:53:46
I think implementing variable blocksize encoding would solve most of it. However, as the recent problems with variable blocksize encoding in CUETools have shown, this is no easy task. Many other things in FLAC are already being 'brute-forced'. It is rather high on my list because it would make the reference encoder cover more of the FLAC specification, which is something I pursue because of the IETF standardization effort.

I think that is a better idea than encoding a file several times with different fixed blocksizes to see which one results in the smallest file. -q is already brute-forced per-subframe when you use -p.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 21:55:49
variable blocksize encoding [...] is rather high on my list
And I was about to write that well, this makes a case for variable block size, which is now waiting in line somewhere between positions 2048 and 4096 on the to do list ...

Actually, with in particular partial_tukey(2) and thus subdivide_tukey(2), I've been thinking: if that is what the encoder selects - just dropping half of the block when designing the predictor - then that is a case where you would try to split the block in two, keep the predictor where it is good, and look for another for the rest? Same goes for the (4).

Here I am making an assumption that could be fiddled with, namely that we are not using "adjacent" samples to calculate the predictor of a certain frame. That isn't god-given either.

As for CUETools, their solution is to remove the variable block size option, https://github.com/gchudov/cuetools.net/pull/223 . So without it finding its way into reference flac, it won't be ... anywhere?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 22:07:32
... but for the Merzbow track, the magic is actually not in the -b, but in the -r. Try options
--lax  -r 15
That's right, no -8, no -p, no -e, no -b (edit: wrote that wrong!)
I got 51954953 bytes. Throwing in a -b 16383 reduces it slightly to 51923540 bytes.

The reason that -b512 looks so good is not that it is a good block size; it is that it leaves only a few samples per Rice partition within the subset's maximum partition order of 8. Relaxing that ...

So that gives another "potential case for" a variable block size.
* test -r9. If that improves, then dammit we are out of subset, and the easiest way to get back in is to halve the -b. One doesn't even have to recalculate the predictor to get that improvement.
* of course, a "smarter way" starting from current -7 and -8 would be to see if Rice pt order 6 is actually used and good for something for that frame. If it is, try order 7. If it helps, try 8. If that helps, ... previous item.

BUT: -r 8 only rarely improves, right?
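
The arithmetic behind the -b512 observation can be sketched as follows. This is only an illustration, assuming that Rice partition order n splits a block's residual into 2^n equal partitions; the helper name samples_per_partition is made up here, and further constraints the real encoder applies are not modelled.

```python
def samples_per_partition(block_size, order):
    """Samples in each Rice partition, or None if 2**order does not divide the block.

    Assumption: partition order n means 2**n equal-sized partitions per block.
    """
    parts = 1 << order
    if block_size % parts:
        return None
    return block_size // parts


for block_size, order in [(512, 8), (4096, 8), (16384, 13), (32768, 14)]:
    n = samples_per_partition(block_size, order)
    print(f"-b {block_size} -r {order}: {n} samples per partition")
```

Under that assumption, -b512 -r8 works out to two samples per partition, whereas the default -b4096 at the same order gets sixteen.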

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-26 06:49:31
--lax  -r 15
That's right, no -8, no -p, no -e, no -b (edit: wrote that wrong!)
I got 51954953 bytes. Throwing in a -b 16383 reduces it slightly to 51923540 bytes.
Well 16384 "of course".
The reason for -r 15 was to try even bigger block size, but as I "discovered" that 65536 is invalid and max is 65535, a -1 snuck in when I typed it here. I wonder if anything can utilize the 15th order then, when 65536 is invalid.
51924581 for --lax -r 15 -b 32768.

So here is a weird thing:
51924584 for --lax -r 13,14 -b 32768
52709806 for --lax -r 13,13 -b 32768 and same for -r 12,13, and -r 12,12 is as "bad as" 54771887
52765170 for --lax -r 14,14 -b 32768 and unsurprisingly the same for -r 14,15. But ... but ... same also for -r 15,15?! (Maybe I need to compute some Rice encodings by hand?)

Now 16384
51923763 for --lax -r 12,13 -b 16384 (good!)
52710096 for --lax -r 11,12 -b 16384 which is lower than the next line, so here the 11 is used while for 32768 the 12 was ... well if it was, it didn't impact size.
52710316 for --lax -r 12,12 -b 16384
52763867 for --lax -r 13,13 -b 16384 and unsurprisingly the same for -r 13,14. Also for -r 14,14.

Well impact isn't monstrous. Still a bit strange.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-26 09:04:23
Another example with non-experimental music:
https://youtu.be/wJW9AIwHSGY

40388588 16 Syvalion arrange MMIX.wav
38322604 xz.xz
34931072 7z.7z
33814392 --lax -l15 -8 -r15 -b32768.flac
33742999 --lax -l32 -8 -r15 -b32768.flac
33691294 --lax -l15 -8 -r10 -b8192.flac
33627273 -8.flac
33627254 -8 -r8.flac
33627254 -8 -r7.flac
33622698 --lax -l15 -8 -r8.flac
33617925 --lax -l15 -8 -r8 -b3456.flac
33602205 -8 -r7 -q7.flac
33597671 --lax -l32 -8 -r15 -b2304.flac
33594087 --lax -l15 -8 -r8 -q7.flac
33594087 --lax -l15 -8 -r7 -q7.flac
33588219 -8pe.flac
33578671 -8 -r7 -q7 -b2304.flac
33574773 --lax -l15 -8 -r7 -q7 -b2304.flac
33568744 --lax -l20 -8 -r7 -b2304 -q7.flac
33558642 --lax -l32 -8 -r7 -b2304 -q7.flac
33558642 --lax -l32 -8 -r15 -b2304 -q7.flac

Basically, Merzbow is mostly clipped at max/min PCM values.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-26 21:47:26
I found another case where -e is much better than -p. I converted my test signal archive (sine, multitone, sweep etc) to flac from different sources including analog and digital I/O of sound cards, smartphones, cassette decks, disc players, motherboard codecs with GPU noise injected and so on. The sizes listed below are at -8.

 2371488 ALC892_44kHz.flac
 2551014 ALC892_48kHz.flac
 4632494 ALC892_96kHz.flac
11118051 Bifrost toslink_192kHz.flac
 7752605 Bifrost toslink_96kHz.flac
 8721549 Bifrost USB 96kHz.flac
 5805120 DN-C635 to Multiface II analog.flac
 2613358 headphone 44k.flac
 2791264 headphone 48k.flac
 5096341 headphone 96k.flac
 1305056 Hi Gain 24bit 44k.flac
 1405998 Hi Gain 24bit 48k.flac
 4874672 HTC one sv 1644.flac
 4954664 HTC one sv 1648.flac
  692250 imd sweep.flac
 1661020 MZ-R3 J-Test 1644.flac
 2064002 no gpu_44kHz.flac
 2189291 no gpu_48kHz.flac
 3859889 no gpu_96kHz.flac
 2504677 rca 44k.flac
 2676443 rca 48k.flac
 5714205 rca 88k asio 5ms.flac
 4838871 rca 96k.flac
 3478233 rca CD analog.flac
 9674809 rca dolby off 2496.flac
10425569 rca dolby on 2496.flac
 2328929 Result_44kHz.flac
 2420994 Result_48kHz.flac
 4837262 Result_96kHz.flac
 1994047 SMSL_48kHz.flac
 3280415 via2444.flac
 1433300 via2444spdif.flac
 3503452 via2448.flac
 1535871 via2448spdif.flac
 6628764 via2496.flac
 2459044 via2496spdif.flac

Total size:

-8
146195011 bytes

-8p
145864737 bytes

-8e
145252942 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-26 22:37:19
If you compress the at most 48 kHz vs > 48 kHz separately?
For 88.2 and up, my files compress to smaller size with -8e than -8p, although I see Wombat's 20 GB fare the opposite way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 05:38:33
I refined the list to exclude 16-bit files and digitally recorded files (SPDIF), and added more analog files that I found in my archive, so all of them are 24-bit now. 24-bit means the recording bit-depth, the playback device may only support 16-bit or has no bit-depth (e.g cassette deck).

88.2 to 192k, but mostly 96k:

-8
200533202 bytes
-8p
199737874 bytes
-8e
199401868 bytes

44 and 48k:

-8
118779872 bytes
-8p
118049531 bytes
-8e
117806233 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 06:55:26
These test files have no copyright concern so I uploaded them, I may remove them later though.
https://1drv.ms/u/s!AvzB71jO7t0-gYwqfkhypQsi5ZNCJQ?e=MIITzy
Some files are better with 8e while others are better with 8p. The archive above was encoded with --lax to reduce the upload size.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 16:32:34
Just encoded the original digital signals...

H:\>flac -f *.wav -8e
Test signal (44 kHz 16-bit).wav: wrote 1742181 bytes, ratio=0.153
Test signal (44 kHz 24-bit).wav: wrote 2819275 bytes, ratio=0.165

H:\>flac -f *.wav -8p
Test signal (44 kHz 16-bit).wav: wrote 1820499 bytes, ratio=0.160
Test signal (44 kHz 24-bit).wav: wrote 2950729 bytes, ratio=0.173

H:\>flac -f *.wav -8
Test signal (44 kHz 16-bit).wav: wrote 1904695 bytes, ratio=0.168
Test signal (44 kHz 24-bit).wav: wrote 3007455 bytes, ratio=0.176

Looks like -e just likes clean and stable signals with a lot of empty spectral content regardless of bit-depth. Recording through the analog chain will somewhat pollute these test signals, but not enough to change the overall result that -e works better with these signals.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 14:33:45
Here is an example that -p works better than -e:
https://www.soundliaison.com/index.php/studio-masters/856-ray-carmen-gomes-inc
The 768kHz file is free for download. --lax is required for 768kHz files.

PS H:\> measure-command{h:\flac -f *.wav --lax -8e -b16384}|select totalseconds
wrote 641447814 bytes, ratio=0.663

TotalSeconds
------------
  52.4976358


PS H:\> measure-command{h:\flac -f *.wav --lax -8p -b16384}|select totalseconds
wrote 641307624 bytes, ratio=0.663

TotalSeconds
------------
   68.536428


The additional windows are not very effective if the spectrum does not have a smoothly decaying trend at higher frequencies. Blindly using a higher subdivide_tukey(n) is just a waste of time.

PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(6)"}|select totalseconds
wrote 641444963 bytes, ratio=0.663

TotalSeconds
------------
  37.5487237


PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(5);tukey(75e-2);gauss(5e-2);blackman"}|select totalseconds
wrote 641444667 bytes, ratio=0.663

TotalSeconds
------------
  33.4128744


PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(5);welch;hann;flattop"}|select totalseconds
wrote 641443864 bytes, ratio=0.663

TotalSeconds
------------
   33.674331


Usually, the quickest way to reduce size is increasing -l, but setting -l too high can harm decoding speed.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Bogozo on 2022-10-28 15:47:16
Here is an example that -p works better than -e:
https://www.soundliaison.com/index.php/studio-masters/856-ray-carmen-gomes-inc
The 768kHz file is free for download. --lax is required for 768kHz files.
Looks like conversion from DSD made without high-frequency noise filtering, not normal PCM.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 16:09:18
The file was originally recorded in DXD, then played through a Studer tape machine and re-digitized at 768kHz using an RME interface with an AKM ADC. ADCs these days are mostly multibit delta-sigma, hence the rise of noise. You can also see a faint dip at 150kHz. That is where the tape bias was: someone at ASR complained about it, and Sound Liaison filtered out the bias tone.

Read this for the whole story:
https://www.audiosciencereview.com/forum/index.php?threads/finally-music-we-can-buy-in-768-khz-sampling-rates.29544/

You can see the rise of noise when the RME interface is operating at high sample rates even when using PCM.
https://archimago.blogspot.com/2019/03/measurements-look-at-audio-ultra-high.html
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-28 19:59:18
48 kHz sample rate. Heavy (quite heavy indeed) metal.

Questions:
* For high sampling rates, 1.4.x would lead to quite impressive improvements. For CDDA, 1.4.0 "at preset N" would by and large beat 1.3.x "at preset N+1" - for high resolution, new -3 or -4 would beat old -8pe. What about for 48 kHz?
tl;dr: New -4 beat old -8p (didn't try -8ep).
* Block size. Is it a good idea to go up to the next standard block size (namely -b 4608) to closer "maintain time per block"? (Relative to 4096 samples for 44.1 kHz.)
tl;dr: from -7 up it did improve, but at -8p the improvement was only 0.003 percent, so ... do you care?
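
The "maintain time per block" idea is simple arithmetic. A small sketch (the helper name block_ms is hypothetical) that converts a block size to its duration:

```python
def block_ms(block_size, sample_rate):
    """Duration of one block in milliseconds."""
    return 1000.0 * block_size / sample_rate


print(round(block_ms(4096, 44100), 1))  # 92.9
print(round(block_ms(4096, 48000), 1))  # 85.3
print(round(block_ms(4608, 48000), 1))  # 96.0
```

So at 48 kHz, -b 4608 covers about as much time per block as the default 4096 does at 44.1 kHz.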

Corpus: Took everything I had of lossless 48 kHz. foobar2000 reports 58 percent 24-bit and 42 percent 16-bit. 
Not a well-balanced corpus: Mostly heavier metal, indeed a quarter of the forty-ish GB came from one single publisher of doom/stoner samplers.

Results for 739 files, sorted by file size. All are 1.4.1 except those marked "1.3.4"
46230128389  3b4608
46223003175  3
45103714620  1.3.4 -8b4608
45098718825  1.3.4 -8
45075623983  1.3.4 -8pb4608
45067893180  1.3.4 -8p
45048763872  4b4608
45045173684  4
44984600667  5b4608
44978677031  5
Below this line, -b 4608 will improve
44610097727  7
44607112273  7b4608
44587426732  8
44585780154  8r7
44584859496  8r8
44583580023  8b4608
44582524413  8e
44581743757  8r7b4608
44580834028  8r8b4608
44578633297  8eb4608
44572663788  8r7 -A "subdivide_tukey(5)"
44567943410  8r7 -A "subdivide_tukey(5)" -b4608
44561132601  8p
44559756763  8pb4608
.
Not sure if 4608 is worth the effort compared to just encoding and being done with it; the biggest impact here is 0.01 percent. But anyway, to the questions I raised, this test indicates the following:
* So -3 is not enough to slay 1.3.x, but it seems you don't have to go much above 44.1 to see how even low 1.4.x presets are better than anything 1.3 could accomplish.
* This is material where -r8 does matter, and that suggests that smaller block size could be advantageous.
* But still, -b 4608 improves once the predictor already is good enough, which requires 1.4.x. And with 1.4.x it does happen earlier (i.e. for lower subdivide_tukey) than for 96/24, see below. At default -5, -b 4608 was slightly harmful, but even at -7 it would help.
Actually, the biggest difference that -b 4608 made was at -8r7 -A subdivide_tukey(5); that was the only one that crossed the 0.01 percent mark. 
But then the difference is down to 0.003 percent at -8p, so ... mostly academic interest this.
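
For anyone wanting to re-check the percentages, here is a short sketch that recomputes the -b 4608 savings from the totals in the results list (the helper name saving_percent is made up for illustration):

```python
# Total sizes in bytes, copied from the 1.4.1 results listed above.
sizes = {
    "7": 44610097727,
    "7b4608": 44607112273,
    "8r7 -A": 44572663788,
    "8r7 -A -b4608": 44567943410,
    "8p": 44561132601,
    "8pb4608": 44559756763,
}


def saving_percent(before, after):
    """Size reduction of `after` relative to `before`, in percent."""
    return 100.0 * (before - after) / before


for base, variant in [("7", "7b4608"), ("8r7 -A", "8r7 -A -b4608"), ("8p", "8pb4608")]:
    print(f"{base} -> {variant}: {saving_percent(sizes[base], sizes[variant]):.4f} %")
```

This gives roughly 0.007, 0.011 and 0.003 percent respectively.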

However the corpus may reduce the benefit of -b 4608. At least, for 96/24 it was the classical music that benefited the most from adjusting block size, and heavier music (like here) did not benefit as much.


High resolution coming up.
[...]
* -l 13 to -l 15 have something to them, but careful: It does not seem to be the case directly off -7 or -8. Say -8 -l 13 is not good, but -8 -A [something slow] -l 13 is. A bit of testing indicates that -l 13 starts saving space at -A subdivide_tukey(5) and -l 14 at (6).
With high-res classical music, -l 13 is the setting that improves over -7.
* -b 8192 also needs "-A [something slow]", it seems also to do harm when applied to -7 or -8 plain. But it doesn't help much here.

... for 48 kHz and 4608, it didn't need "something slow".
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 20:20:29
How about dividing your metals into two groups?
[1] A lot of fast drumming especially the higher pitched ones with strong transients (hi-hat, snare, rim shot...)
[2] Mostly heavy in guitar and bass but in general slower paced.
Does one of them benefit from a different block size than the other? (edit: including -b below 4096)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-29 12:14:54
-e requires fairly low noise to work. Spek's spectrogram only has about 14 bits of dynamic range. The attached noisy.flac looks identical to clean.flac which can be misleading. Better use other tools like SoX to view the spectrum.

H:\>flac -f *.wav -8p
clean.wav: wrote 562708 bytes, ratio=0.425
noisy.wav: wrote 687112 bytes, ratio=0.519

H:\>flac -f *.wav -8e
clean.wav: wrote 515394 bytes, ratio=0.390
noisy.wav: wrote 687255 bytes, ratio=0.519

The advantage of -e is gone after I added some -80dB noise.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-29 20:17:17
How about dividing your metals into two groups?
[1] A lot of fast drumming especially the higher pitched ones with strong transients (hi-hat, snare, rim shot...)
[2] Mostly heavy in guitar and bass but in general slower paced.
Does one of them benefit from a different block size than the other? (edit: including -b below 4096)

I took ~10 GB (238 files) from the "least distorted" end of it. 

Now higher block size is not that good:
* I had to go all the way to -A <something higher> again ...
* ... and enter a "p" in there, and 4096 rules.  Even with -8p -A <something higher>.

Everywhere in the following, -A is short for -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop".

10396908384   -7b2048
10390256536   -8b2048
10387465483   -8b2048 -A
10380212639   -8b2304 -A
10374886478   -8pb2048
10372850371   -8pb2048 -A
10370624591   -7b4608
10370154789   -7
10368208665   -8b3072 -A
10366619584   -8pb2304 -A
10364101954   -8b4608
10363963614   -8
10360997863   -8 -A
10360867169   -8b4608 -A   <-- the only case where a different block size helps!
10359300604   -8pb3072 -A
10357464670   -8pb4608
10356736011   -8p
10354455085   -8pb4608 -A
10353968178   -8p -A

.

You see that -b 2048 degrades -8p -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop" down to worse compression than -7. 2304 is much better.

What included: Prog, heavy post-rock and not-so-growling guitars.  Not much Iron Maiden-alike metal - well, https://zephaniahband.bandcamp.com/track/destiny was part of it (not the album, this track from a sampler).  And there is music like https://houseofmythology.bandcamp.com/track/the-power-of-love from former black metal act Ulver.  (Yes former, just listen.)
And a couple of bootleg albums too (here is a long one, Nine Inch Nails & David Bowie: https://ninlive.com/shows/1995/19951011.html) and a couple of vinyl rips. 

What omitted: to give you an idea, https://bspliveseries.bandcamp.com/track/you-write-your-name-in-my-skin-live-2 .  Yes the drum machine provides for some transients, not all slow - but the guitar is not strummed in fast succession.  Lots of slow heavy music in the remaining 30 GB where -b 4608 could be of help.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-29 21:52:16
Thanks, good to have some data.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 09:44:54
Yeah, your results with lower block sizes [table below now includes a couple of -b 3456 too] have puzzled me a bit in general, as I only very rarely experience the same. So I tried a few more settings and found out that on this corpus, -r7 and even -r8 have some impact. That should indicate that by halving the block size, you get for free a "better" partitioning for Rice'ing - but that is not enough by far: halving block size inflates files by around .15 percentage points (tenfold what the -A thing improves!) Hm ... have you tried, whenever lower block size improves, whether -r7 or even -r8 would make for the same benefit?

Some more tests - to compare with other "minor impact" changes in parameters - are filled in to the table below.
Also some tests are not included in the table, as I scripted them with, *cough* different padding and I didn't bother to go back and re-do it. They are not comparable with the table, but they are comparable "with each other", so I give size orderings (worse to better):
-8eb2304  >  -8eb2304 -A  >  -8eb8192  >  -8eb8192 -A  >  -8eb4608  >  -8e  >  -8eb4608 -A  >  -8e -A  >  -8eb4608 -A
Then for -3/-5, it seems that 3000s fare well, but default is never far from best.
Then for -2, ordering: b512 > b1024 > b1152 default > b8192 with --lax > b3456 > b4608 > b2048 > b3072 > b3072 > b4096 > b2304.  I guess those who contemplate -l 0 have other concerns than .15 percentage points size impact though.

Table then, this time with compression ratios. Again the "-A" signifies -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop".

100,000%   .wav
 63,004%   -8 --no-mid-side (i.e. dual mono)
 62,075%   -5b8192 --lax
 62,026%   -5b2048
 61,971%   -5
 61,660%   -8Mb2048
 61,615%   -8r2 -b2048
 61,601%   -7b2048
 61,573%   -8r4 -b2048
 61,561%   -8b2048
 61,552%   -8r8 -b2048
 61,545%   -8b2048 -A
 61,507%   -8r2
 61,505%   -7 -l 11
 61,502%   -8b2304 -A
 61,489%   -8M
 61,470%   -8pb2048
 61,466%   -8b8192 --lax
 61,462%   -7b3456
 61,458%   -8pb2048 -A
 61,445%   -7b4608
 61,442%   -7
 61,431%   -8b3072 -A
 61,429%   -8b3456
 61,422%   -8r4
 61,421%   -8pb2304 -A
 61,413%   -8b3456 -A
 61,406%   -8b4608
 61,406%   -8
 61,403%   -8r7
 61,400%   -8r8
 61,388%   -8 -A
 61,387%   -8b4608 -A
 61,378%   -8pb3072 -A
 61,367%   -8pb4608
 61,365%   -8pb3456
 61,363%   -8p
 61,349%   -8pb4608 -A
 61,346%   -8p -A
 61,342%   -8r9 -l 13 --lax
.
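
The ".15 percentage points" and "tenfold" figures can be read straight off the table. A small sketch of the arithmetic, using three of the ratios listed above:

```python
# Compression ratios in percent of WAV size, taken from the table above.
ratio = {"-8": 61.406, "-8b2048": 61.561, "-8 -A": 61.388}

halving_cost = ratio["-8b2048"] - ratio["-8"]  # cost of halving the block size
window_gain = ratio["-8"] - ratio["-8 -A"]     # gain from the extra -A windows

print(f"halving block size: +{halving_cost:.3f} percentage points")
print(f"extra windows:      -{window_gain:.3f} percentage points")
print(f"ratio of the two:   {halving_cost / window_gain:.1f}")
```

That is 0.155 points lost against 0.018 points gained, a factor of a bit under nine.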

The table includes dual mono and -M (which selects decorrelation strategy adaptively) and carelessly I did not think over some of it being mono already - but that material (namely the NIN+Bowie bootleg) amounts only to 4.5 percent of the total file size.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-30 12:31:56
I compiled a CDDA list that I thought would work best with -b3456 without using --lax. Turns out -b2304 won. They are mostly J-pop, not necessarily very fast but in general have percussion with good transients and not too much reverb.
Code: [Select]
https://youtu.be/wZ6D6ikU7Qs
https://youtu.be/L-0cJqZ5WU4
https://youtu.be/MM8RufZr5lw
https://youtu.be/pYnLO7MVKno
Total length 8h56m42s, around 63.98% of original size when compressed.

-8b4608 -r8
3635937892 bytes

-8b2048
3635016402 bytes

-8
3634663623 bytes

-8 -r8
3634586639 bytes

-8b4608 -r8 -A subdivide_tukey(5/2e-1)
3634481652 bytes

-8b3456
3634303829 bytes

-8b3456 -r8
3634260975 bytes

-8b3072 -r8
3634036597 bytes

-8p -b4608 -r8
3633576341 bytes

-8b2304
3633523804 bytes

-8b2304 -r7
3633489680 bytes

-8b2304 -r8
3633480956 bytes

-8b3456 -r8 -A subdivide_tukey(5/2e-1)
3633136956 bytes

-8b2304 -r8 -A subdivide_tukey(5/2e-1)
3632658660 bytes

Looks like the 1024- and 1152-based block sizes are not really correlated with sample rates, otherwise -b2048 should not perform this badly. The -b3456 thing in my previous test may have some stuff with fewer transients. If you still want to try CUETools.Flake, -8 --vbr 4 will be more effective with -r 8 and -s search.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 13:11:33
Why -8b2304 is so much better than -8b2048, putting them on opposite sides of the 3456/4096 ...
... probably we are in for another brute force.

Edit: Also -r8 helps -8, much more than it helps -8b<lower>, "the 8th r" doesn't help that much over the seventh ...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-30 16:46:03
I compiled a CDDA list that I thought would work best with -b3456 without using --lax. Turns out -b2304 won. They are mostly J-pop, not necessarily very fast but in general have percussion with good transients and not too much reverb.
Code: [Select]
https://youtu.be/wZ6D6ikU7Qs
https://youtu.be/L-0cJqZ5WU4
https://youtu.be/MM8RufZr5lw
https://youtu.be/pYnLO7MVKno
Total length 8h56m42s, around 63.98% of original size when compressed.
[...]
If you still want to try CUETools.Flake, -8 --vbr 4 will be more effective with -r 8 and -s search.
Files were transcoded from APE, flac and WavPack instead of wav, multi-thread.

flac 1.4.2

-8b2304 -r8 -p
Total encoding time: 2:05.453, 256.68x realtime
3630058884 bytes

-8b2304 -r8 -pe
Total encoding time: 24:46.406, 21.66x realtime
3629421254 bytes

CUETools.Flake 2.2.2 (MD5 mismatch in 2 files)

-8 -r 8 -b 4608
3635394735 bytes

-8 -r 8 -b 2048
3635167153 bytes

-8 -r 8
3634138483 bytes

-8 -r 8 -b 3456
3633820714 bytes

-8 -r 8 -b 3072
3633680354 bytes

-8 -r 8 -b 2304
3633559971 bytes

The fixed block sizes tests above all have around 500x speed.

-8 -r 8 --vbr 4
Total encoding time: 1:26.344, 372.94x realtime
3627508218 bytes

-8 -r 8 --vbr 4 -s search
Total encoding time: 3:54.438, 137.35x realtime
3625410640 bytes

So vbr is an amazing thing... if there is no MD5 mismatch. Perhaps just scan for integrity after encoding, and re-encode the corrupted files with another encoder. In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
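
A post-encode integrity scan of the kind described can be scripted. The following is a minimal sketch, assuming the reference flac binary is on the PATH and relying on its test mode (flac -t, which decodes and checks the file without writing output); the function name failed_flac_files and the injectable run parameter are illustrative choices, not part of any tool.

```python
import subprocess


def failed_flac_files(paths, run=subprocess.run):
    """Return the paths for which `flac -t` exits with a non-zero status.

    `-t` decodes and verifies the file without writing output; `-s` silences
    the progress display. `run` is injectable so the logic can be tried out
    without the flac binary being installed.
    """
    failed = []
    for path in paths:
        result = run(["flac", "-t", "-s", path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Typical use would be something like failed_flac_files(glob.glob("*.flac")), after which any file in the returned list gets re-encoded with another encoder.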
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 17:45:16
Encoding from .ape will skew the timings, as .ape takes even longer time decoding than encoding. But as long as conditions are equal for each run, cardinal time figures are nothing but indications anyway, in this thread where the number of compiles x CPUs probably match the number of FLAC options humans have ever hand-coded ...

In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
I always do. What if the process is aborted for whatever stupid reason, leaving a partial file?
Sure there is the -V , but for mass conversion: running a foo_bitcompare on all, to obtain one single line saying no differences, that is more idiot-proof than a human (myself) reading flac.exe's output.

Of course until fb2k v2 & foo_bitcompare are updated to treat 32-bit integer losslessly (that isn't the case yet I think?!) one has to use a different approach for those ... but then they aren't many. I don't have music in that format.

(And even with that zealous attitude of mine ... the first floating-point .wav's I downloaded, were Audition's format, and I should have WavPack'ed them using official wavpack.exe rather than through foobar2000.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-31 23:17:35
-0 does not pick the fastest block size!  Test done on CDDA with official 1.4.1 x64.

Since different block sizes have been tested, I did that for -0 and -2. Recall the difference between those is that -0 does dual-mono while -2 brute-forces the stereo decorrelation strategy. Both have an implicit -r3 which I have not touched.
I took all multiples of 512 and 576 up to 4608. Recall that -0 to -2 use 1152=2*576.

Computer: my friend's Ryzen-equipped not-so-expensive Acer consumer laptop, which delivers more consistent results than my Intels.
Corpus: As in my signature ... well nearly: by mistake one file had a second copy. 39 CDs. But I corrected sizes, they are 38.

Timings are median of 6 runs (i.e. average between the two middle ones); first I did each setting separate, three runs; then I did three runs of -0b512, three of -0b576 etc.
Results, thanks to https://theenemy.dk/table/ , are sorted by time.
"BS" - for "BigSlow" but surely intended to be read as "bullshit" yes - indicates it is both bigger & slower than the one immediately above. "bs" indicates that if all the "BS" were removed, this would become a BS.
0b3072   275,19   13 304 034 333
0b3456   277,32   13 304 683 018   BS
0b2880   277,32   13 304 059 146   bs
0b2560   278,28   13 305 036 984   BS
0b2048   278,28   13 303 957 236
0b2304   280,21   13 301 667 605
0b1728   283,91   13 316 001 267   BS
0b1536   288,60   13 321 899 365   BS
0b1152   292,01   13 332 311 287   BS
0b1024   293,64   13 342 207 656   BS
0b4096   295,91   13 304 264 090   bs
0b4608   299,06   13 307 182 585   BS
2b1728   298,27   12 724 538 882
2b1536   301,65   12 730 340 579   BS
2b2880   301,34   12 713 304 251
0b0512   301,34   13 442 182 994   BS - well really, this is a "-0" down in the "-2" bunch
2b2560   302,90   12 714 138 123   bs - really a capital BS, only "saved by" the "-0"
2b3072   302,78   12 713 394 065   bs
2b2304   304,22   12 710 560 120
2b1152   302,78   12 740 658 828   BS
2b3456   303,08   12 714 271 305   bs
2b2048   308,75   12 712 682 405   bs
0b0576   317,19   13 418 999 908   BS - another "-0" here
2b1024   323,56   12 750 559 594   bs - this too saved from "BS" by a "-0"
2b0512   327,24   12 851 045 240   BS
2b0576   357,17   12 827 742 003   bs
2b4608   385,73   12 717 232 770   bs
2b4096   386,28   12 714 139 624   bs
.
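
One way to read the BS/bs marks is as a dominance check: a setting is uninteresting if some other setting is at least as fast and at least as small. The sketch below implements that reading (which is an interpretation, not necessarily how the marks were assigned) on a few of the -0 rows; the function name undominated is made up.

```python
def undominated(rows):
    """Settings for which no other row is at least as fast and as small
    (and strictly better in at least one of the two)."""
    keep = []
    for name, time, size in rows:
        dominated = any(
            t <= time and s <= size and (t < time or s < size)
            for _, t, s in rows
        )
        if not dominated:
            keep.append(name)
    return keep


# A few -0 rows from the table above: (setting, time, size in bytes)
rows = [
    ("0b3072", 275.19, 13304034333),
    ("0b3456", 277.32, 13304683018),
    ("0b2880", 277.32, 13304059146),
    ("0b2048", 278.28, 13303957236),
    ("0b2304", 280.21, 13301667605),
    ("0b1152", 292.01, 13332311287),
]
print(undominated(rows))  # ['0b3072', '0b2048', '0b2304']
```

On this subset the surviving settings are the ones that carry no mark in the table.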
Inferences:  Well I don't really believe this to be any universal truth. Why should -0b2048 be faster than -0b2304 while -2b2048 is slower than -2b2304? But some patterns are obvious. For example, the "extremes" are quite slow, and not the best. 
* The smallest files are obtained with -b2304 in both settings. I guess the only reason for -2 is to avoid using multiplication while still squeezing more bytes out, so ... there you go. And -b2304 was only a couple of percent off the fastest block size as well.
* -0 to -0b2304 saves 4 percent time and a quarter percent size. -0 to -0b3456 (tripling block size) saves six percent in speed then.

I have no idea why -2b4096 and -2b4608 are that slow.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-31 23:56:09
Block size impact on "-5" and "-7" speeds. (Atop -5 because -5 is default, atop -7 because -7 is good.)

Damn me I forgot to record sizes on this one, but a couple of manual checks indicate that nah, nothing here that is worth it in cost/benefit terms.
But "interesting" it may be, even if the time impact from default is just a few percent saved - and -b 3456 seems to be fastest both at -5 and -7.

Same computer, compile, files (CDDA) and setup as the previous posting, but this time it is only median of three runs, and this time I managed to remove the 39th. Sorted by time:

5b3456   385,554
5b3072   386,659
5b2880   389,014
5b2304   390,926
5b1728   391,627
5b2560   391,999
5b2048   393,695
5b1536   395,008
5b1152   403,659
5b4608   411,612
5b4096   413,726
5b1024   414,359
5b0576   457,707
5b0512   466,306

7b3456   566,989
7b3072   571,83
7b2880   571,947
7b2560   576,015
7b2304   581,654
7b2048   589,71
7b4608   590,918
7b4096   592,326
7b1728   594,442
7b1536   605,249
7b1152   636,17
7b1024   650,702
7b0576   777,917
7b0512   805,091


So whoever came up with -b3456 (@bennetng, I think?) might have a bonus in this.
However, I never got -b3456 to produce the best compression - and to whoever came up with -7 -l 11 to speed up -7 slightly (@sundance , I think?), that one made for both faster encode and smaller files than -7b3456 in this corpus.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 15:00:09
I don't have enough RAM disk space to benchmark encoding, data listed here are decoding speed, using foo_benchmark and single thread in RAM disk. Previous tests indicate relative decoding speed vs encoding parameters can be highly CPU-dependent, so keep this in mind.


Some people may choose -0 because it decodes faster.

-0
8618108022 bytes, 1797.101x realtime

-0b2048
8603509098 bytes, 1836.416x realtime

-0b2304
8602686728 bytes, 1841.398x realtime

-0b3072
8605905301 bytes, 1845.695x realtime

-0b3456
8606959660 bytes, 1838.831x realtime

-0b4096
8607581496 bytes, 1851.142x realtime

-0b4608
8610113289 bytes, 1857.613x realtime

-3 and -8 are the lowest and highest presets that default to -b4096. As for --no-mid-side, most, if not all, of the test materials are in normal stereo.

-3b1152
8214442723 bytes, 1454.438x realtime

-3b2048
8179862763 bytes, 1523.949x realtime

-3b2304
8177409458 bytes, 1523.017x realtime

-3b3072
8178587836 bytes, 1508.774x realtime

-3b3456
8179963450 bytes, 1523.766x realtime

-3
8182244258 bytes, 1517.872x realtime

-3b4608
8186319940 bytes, 1524.288x realtime

At -8 I think there is no need to test anything below -b2048, except for Merzbow fans.

-8b2048
7951523906 bytes, 1377.707x realtime

-8b2304
7945563198 bytes, 1360.829x realtime

-8b3072
7938910470 bytes, 1350.458x realtime

-8b3456
7936429482 bytes, 1323.356x realtime

-8
7932854429 bytes, 1331.976x realtime

-8b4608
7932931331 bytes, 1320.299x realtime

Finally, -5:

-5b2048
7987232470 bytes, 1420.886x realtime

-5b2304
7983765417 bytes, 1417.812x realtime

-5b3072
7983025331 bytes, 1423.270x realtime

-5b3456
7983619874 bytes, 1410.768x realtime

-5
7984866814 bytes, 1408.704x realtime

-5b4608
7988286008 bytes, 1408.448x realtime
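
For anyone who wants to read off the extremes per preset, a short Python sketch follows. The figures are copied from the lists above; the grouping by preset and the variable names are only illustrative.

```python
# (setting, size in bytes, decoding speed in x realtime), copied from the lists above.
results = {
    "-0": [
        ("-0", 8618108022, 1797.101),
        ("-0b2048", 8603509098, 1836.416),
        ("-0b2304", 8602686728, 1841.398),
        ("-0b3072", 8605905301, 1845.695),
        ("-0b3456", 8606959660, 1838.831),
        ("-0b4096", 8607581496, 1851.142),
        ("-0b4608", 8610113289, 1857.613),
    ],
    "-3": [
        ("-3b1152", 8214442723, 1454.438),
        ("-3b2048", 8179862763, 1523.949),
        ("-3b2304", 8177409458, 1523.017),
        ("-3b3072", 8178587836, 1508.774),
        ("-3b3456", 8179963450, 1523.766),
        ("-3", 8182244258, 1517.872),
        ("-3b4608", 8186319940, 1524.288),
    ],
    "-8": [
        ("-8b2048", 7951523906, 1377.707),
        ("-8b2304", 7945563198, 1360.829),
        ("-8b3072", 7938910470, 1350.458),
        ("-8b3456", 7936429482, 1323.356),
        ("-8", 7932854429, 1331.976),
        ("-8b4608", 7932931331, 1320.299),
    ],
    "-5": [
        ("-5b2048", 7987232470, 1420.886),
        ("-5b2304", 7983765417, 1417.812),
        ("-5b3072", 7983025331, 1423.270),
        ("-5b3456", 7983619874, 1410.768),
        ("-5", 7984866814, 1408.704),
        ("-5b4608", 7988286008, 1408.448),
    ],
}

summary = {}
for preset, rows in results.items():
    smallest = min(rows, key=lambda row: row[1])  # fewest bytes
    fastest = max(rows, key=lambda row: row[2])   # highest decoding speed
    summary[preset] = (smallest[0], fastest[0])
    print(f"{preset}: smallest = {smallest[0]}, fastest decode = {fastest[0]}")
```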

Decode from RAM disk (what I did in this test) is still slower than "Load whole file into memory first" with around 1400-1500x decoding speed at -8 and 2000-2100x at -0, but I don't have enough RAM to do this.

Corpus total length 23h20m25s, around 53.52% at -8 to match ktf's graphs.
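
The 53.52% figure can be cross-checked from the corpus length and the -8 size listed above. This is a rough Python sketch that assumes the whole corpus is CDDA (44.1 kHz, 16-bit, stereo) and ignores WAV headers; both are assumptions, not something stated in the test setup.

```python
# Assumption: plain CDDA audio, i.e. 44100 samples/s, 2 bytes per sample, 2 channels.
BYTES_PER_SECOND = 44100 * 2 * 2

duration_s = 23 * 3600 + 20 * 60 + 25      # corpus length 23h20m25s in seconds
pcm_bytes = duration_s * BYTES_PER_SECOND  # uncompressed audio data, headers ignored
flac_8_bytes = 7932854429                  # size at -8, from the list above

ratio_percent = 100.0 * flac_8_bytes / pcm_bytes
print(f"uncompressed: {pcm_bytes} bytes")
print(f"-8 ratio: {ratio_percent:.2f}%")
```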

The playlist is deliberately built to achieve balance, with classical, electronic, ethnic, jazz, new age, pop, speech etc., including Eastern and Western works. The attached corpus.txt is not very well organized and shows a lot of "game music", but these are mostly individual tracks from different albums, while some other files are big images that do not show track names. Anyway, "game music" is just all kinds of genres used in games, except that they don't have too many vocals. This highly deliberate effort gave the intended results at -8 in terms of file sizes, I suppose. The lower presets are most likely limited by -l and -r, but -b1152 is still too low for the lower presets.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 15:22:21
In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
I always do. What if the process is aborted for whatever stupid reason, leaving a partial file?
[...]
(And even with that zealous attitude of mine ... the first floating-point .wav's I downloaded, were Audition's format, and I should have WavPack'ed them using official wavpack.exe rather than through foobar2000.)
I have many WavPack files directly saved with Audition without going through wav. WavPack saves markers and loops and they can be read by other software like Sound Forge and Reaper, therefore I also have 16 and 24-bit WavPack files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 17:30:58
I don't have enough RAM disk space to benchmark encoding, data listed here are decoding speed
[...]
Decode from RAM disk (what I did in this test) is still slower than "Load whole file into memory first" with around 1400-1500x decoding speed at -8 and 2000-2100x at -0, but I don't have enough RAM to do this.
Corpus total length 23h20m25s, around 53.52% at -8 to match ktf's graphs.
Even CUETools.Flake is happy with the corpus and showed no error.

-8 --vbr 4
7925706124 bytes, 1317.942x realtime

-8 -r 8 --vbr 4
7925683221 bytes, 1333.292x realtime

Adding -s search makes the encoding speed comparable to -8p in flac 1.4.2 and therefore not tested.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-01 19:01:02
Some people may choose -0 because it decodes faster.

Just to have repeated this for the record: in ktf's test done on an AMD processor (http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf), -3 decodes faster than -0.

Also -1 and -2 are -0M and -0m respectively, so the only difference in decoding are the transformations from mid+side / left+side / right+side to dual mono - and potentially the following: In the above link, we see that -2 decodes slightly faster than -1, and that is likely due to better compression and thus less data to handle and unpack (what else could it be?)
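
A minimal Python sketch of those inverse transformations, assuming the stereo modes as described in the FLAC format documentation. The function names are made up for illustration and are not libFLAC's API.

```python
def encode_mid_side(left, right):
    """Encoder side: mid is the floored average, side is the difference."""
    return (left + right) >> 1, left - right


def decode_mid_side(mid, side):
    """Restore left/right; the bit dropped from the average equals the low bit of side."""
    mid = (mid << 1) | (side & 1)
    return (mid + side) >> 1, (mid - side) >> 1


def decode_left_side(left, side):
    """Left is stored as-is, right is recovered from side = left - right."""
    return left, left - side


def decode_right_side(side, right):
    """Right is stored as-is, left is recovered from side = left - right."""
    return side + right, right


for left, right in [(3, -2), (-3, 2), (1000, 999), (-32768, 32767)]:
    side = left - right
    print(
        decode_mid_side(*encode_mid_side(left, right)),
        decode_left_side(left, side),
        decode_right_side(side, right),
    )
```

Each mode costs the decoder only an addition or subtraction (plus shifts for mid+side) per sample pair, which fits the small decoding speed differences between -0, -1 and -2.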

In your test, -3 is slower than -0, so this is where your Intel behaves ... not like AMD ;-)


Then:
-3: It seems that 1152 is slow and everything else is about equal. Don't know how much variation you would get by a re-run.
-8: I don't know why lower block sizes are faster here, but it might be that they use less complicated Rice partitioning - that is, 4096 could get another subdivision of two, compared to 2048? Just thinking aloud.
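
To illustrate the "another subdivision" thought: a Rice partition order k splits the residual of a block into 2^k equally long partitions, so the block size has to be divisible by 2^k. The Python sketch below only shows that arithmetic. The helper names are hypothetical, the order 6 is just an example value, and none of this confirms or refutes the guess about decoding speed.

```python
def max_partition_order(block_size, limit=15):
    """Largest k such that 2**k divides block_size, capped at the 4-bit field maximum."""
    order = 0
    while order < limit and block_size % 2 == 0:
        block_size //= 2
        order += 1
    return order


def partition_length(block_size, order):
    """Number of samples per partition at the given partition order."""
    return block_size >> order


for block_size in (2048, 2304, 3456, 4096, 4608):
    print(
        block_size,
        "max order by divisibility:", max_partition_order(block_size),
        "samples per partition at order 6:", partition_length(block_size, 6),
    )
```

At the same order, a 4096 block has partitions twice as long as a 2048 block, so it needs one more subdivision to reach the same partition length.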
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-11-07 21:22:16
So I realize there's a very steep wall of diminishing returns when using encoding options beyond simply using -8.
I randomly tested this about a week ago.  Using Nine Inch Nails - The Fragile, full album as a single wave file.

CPU = AMD 5850U
Original wave size = 1046.52 MiB
-8 = 20 seconds, 628.39 MiB  (60.04%)
-m -b 4096 -p -r7 -l18 -A subdivide_tukey(21/15e-1) = 283 minutes, 626.29 MiB  (59.84%) -0.02%
-m -b 4096 -p -e -r7 -l24 -A subdivide_tukey(7) = 564 minutes, 625.95  (59.81%) -0.23%
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB  (59.76%) -0.37%

6.25 days to shave off just under 3 MiB  compared to just using -8!
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-11-08 00:39:45
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB 
8972 minutes! That is what I call a performance test  8)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 07:24:52
So I realize there's a very steep wall of diminishing returns when using encoding options beyond simply using -8.
You can argue for "beyond -5" and "beyond -7" there as well. Try those!

Yes there is no "practical limit" to how slow you can get the encoding. I have also once run it over a few days I was away (the -A enough-for-five-days (https://hydrogenaud.io/index.php/topic,120158.msg1001834.html#msg1001834) line here). Your test was on a high resolution thing (otherwise the -l would have called you to invoke --lax), where -e often makes more difference than to CDDA.
Still, try the following, which won't take nearly as much time:
-8p
-8e
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch"
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch" -p
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch" -e
and see how they compare.
If it does well ... well, how to tell? You would need to run all possible combinations brute-force. That is what takes time.


-m -b 4096 -p -r7 -l18 -A subdivide_tukey(21/15e-1) = 283 minutes, 626.29 MiB  (59.84%) -0.02%
-m -b 4096 -p -e -r7 -l24 -A subdivide_tukey(7) = 564 minutes, 625.95  (59.81%) -0.23%
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB  (59.76%) -0.37%
There was something very unreasonable about your differences, as you earned only half a megabyte between the two latter. And indeed you have not quoted them correctly. They are
-0.20 not 0.02
-0.23 yep
-0.27 not 0.37. Well it is really 0.28 after roundoff.
So going from slow to ultra-slow gives you slightly less than 0.08 - not the mighty 0.37-0.02=0.35 your numbers could suggest.

(Oh, but it is percentage points - you get nearly half a percent ;) )
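
The corrected figures can be reproduced from the sizes in the quoted lines with a few lines of Python. Sizes are in MiB as posted; the variable names are only illustrative.

```python
wav_mib = 1046.52       # original WAV
baseline_mib = 628.39   # plain -8
slow_runs_mib = [626.29, 625.95, 625.47]  # the three slow settings, in posting order

# Gain over -8, in percentage points of the original WAV size.
gains_pp = [100.0 * (baseline_mib - size) / wav_mib for size in slow_runs_mib]
print([round(gain, 2) for gain in gains_pp])

# Going from the first slow setting to the slowest one.
print(round(gains_pp[-1] - gains_pp[0], 2))

# The slowest run compared to the -8 file itself rather than to the WAV.
relative_percent = 100.0 * (baseline_mib - slow_runs_mib[-1]) / baseline_mib
print(round(relative_percent, 2))
```

The last figure is the "nearly half a percent": the saving measured against the size of the -8 file instead of the WAV.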
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-11-08 08:15:47
Apologies for the typos on the percentages.  My window to edit my post closed before I realized.  It's CDDA audio, so I should have specified the --lax option was used.

I had run other random tests, but only included a few.  I find that running either -e or -p in combination with subdivide_tukey gives better compression results than using -e and -p combined, while being faster.  I also noticed using higher values can sometimes result in worse compression than using smaller values.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 12:11:22
Oh, I forgot that The Fragile is a double CD  :-[

I find that running either -e or -p in combination with subdivide_tukey gives better compression results than using -e and -p combined, while being faster.
My experience is that for CDDA, you can outdo -e in shorter time by subdivide_tukey around 5. Above that, go for -8p. And that -8p -A subdivide_tukey(reasonably high) still will outdo -8pe.
For high resolution, the -e still isn't useless.

The thing is, brute-forcing "roundoffs" (well -e tries to put the upper coefficients equal to zero, in which case you don't have to store them, that saves space) does not only find the best trade-off between the bits the coefficients take up and the bits the coefficients will save - it also "unpredictably" moves the coefficients somewhere in a direction which every now and then is "better without us knowing why before actually calculating it". Brute-forcing means to actually go through the encoding for a lot of different (possibly only-slightly-different) coefficient vectors, and even if the resolution doesn't really save much, it could "by chance" be better. And even more so if FLAC's way of "guesstimating first to pick the best which is then calculated thoroughly" is not-so-good - and that has evidently been tested on CDDA.

I also noticed using higher values can sometimes result in worse compression than using smaller values.
Higher number in subdivide_tukey(N)? That also changes the effective tapering, which does not necessarily make things better or worse, but if you do many tests you should see both directions. Your 21/15e-1 means that the "full tukey" part of it has a taper of 1.5/21 = 1/14, which is around 0.07 and in my tests would be too close to a rectangle. Hence my suggestion to include a separate tukey with more tapering.
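
The taper arithmetic as a tiny Python sketch. It assumes, as the 1.5/21 calculation above implies, that for subdivide_tukey(N/P) the window spanning the whole block ends up with a taper of P/N; the function name is hypothetical.

```python
from fractions import Fraction


def full_window_taper(n, p):
    """Assumed taper of the whole-block window for subdivide_tukey(n/p): p / n."""
    return Fraction(p) / n


taper = full_window_taper(21, "1.5")  # subdivide_tukey(21/15e-1)
print(taper, float(taper))

# For comparison, the separate window suggested earlier: tukey(7e-1).
print(float(Fraction("0.7")) / float(taper))
```

Under that assumption the whole-block window of 21/15e-1 is tapered about ten times less than tukey(7e-1).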
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 23:23:16
And just to get some idea on what makes differences on The Fragile:
* Three MB saved -5 to -7.
* Another half a MB-ish saved -7 to -8.
* Another half a MB-ish -8 to -8p. (Midway in between: -8r7.)
* Another half a MB-ish -8p to -8pr8
* Another half a MB-ish for stacking up with -A "subdivide_tukey(12);tukey(7e-1);punchout_tukey(4);welch;hann;flattop"