HydrogenAudio

Lossless Audio Compression => FLAC => Topic started by: bennetng on 2022-09-22 18:21:22

Title: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-22 18:21:22
https://vgmdb.net/album/5066
https://www.youtube.com/playlist?list=PLyxqUxT0goirVyF6FILtXMKpcQfOwCC5d
100.00% 696,947,036 Einhander.wav
65.900% 459,284,936 flac141 -8p.flac
65.817% 458,712,584 flac141 -8p -b2304.flac

CUETools.Flake 2.2.2 failed to encode with --vbr, but now I can predict what kinds of files can benefit from changing the default block size.
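For reference, the percentages above are simply the FLAC size divided by the WAV size. A quick Python check, with the sizes copied from the post, also gives the absolute saving from -b2304:

```python
# Sizes in bytes, copied from the post above.
WAV = 696_947_036
FLAC_8P = 459_284_936        # flac141 -8p
FLAC_8P_B2304 = 458_712_584  # flac141 -8p -b2304


def ratio_percent(compressed: int, original: int) -> float:
    """Compressed size as a percentage of the original WAV size."""
    return 100.0 * compressed / original


r_default = ratio_percent(FLAC_8P, WAV)
r_b2304 = ratio_percent(FLAC_8P_B2304, WAV)
saved = FLAC_8P - FLAC_8P_B2304

print(f"-8p:        {r_default:.3f}%")
print(f"-8p -b2304: {r_b2304:.3f}%")
print(f"saved by -b2304: {saved:,} bytes")
```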
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 11:39:56
At NetRanger's request, here's a new topic for v1.4.x testing...

A short test of Case's 1.4.1 build on my Intel Core i7-7700 CPU @ 3.60GHz:
(NB. Compared to this test (https://hydrogenaud.io/index.php/topic,122949.msg1015601.html#msg1015601), I changed the test procedure (added --silent to the flac options), which gave a small speed increase and less variation between test runs.)
Code:
FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 29.407 seconds, Encoding speed = 367.67x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 29.780 seconds, Encoding speed = 363.06x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -8 (852992 Bytes)
- Average time  = 45.762 seconds, Encoding speed = 236.27x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141_Case.exe -8 (844800 Bytes)
- Average time  = 46.053 seconds, Encoding speed = 234.77x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)
So I can't confirm ktf's assumption (https://hydrogenaud.io/index.php/topic,123014.msg1016154.html#msg1016154) that the binary might be an unmeasurably tiny amount faster because it is slightly smaller.
At least as far as Case's builds are concerned.

Meanwhile I've upgraded my main computer to the next CPU generation: Intel Core i7-8700 CPU @ 3.20GHz
Apart from being a little faster, the results are comparable:
Code:
FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 27.269 seconds, Encoding speed = 396.50x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -8 (852992 Bytes)
- Average time  = 42.676 seconds, Encoding speed = 253.35x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 27.487 seconds, Encoding speed = 393.36x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -8 (844800 Bytes)
- Average time  = 42.478 seconds, Encoding speed = 254.53x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

Title: Re: FLAC v1.4.x Performance Tests
Post by: .halverhahn on 2022-09-23 12:44:20
One more round of my performance tests of different flavors of FLAC 1.4.1 (Win64) on my Win10 i7-1185G7 laptop.
Using a 16-bit/44.1 kHz WAV file (size: 2.710.211.996 bytes), flac at -8.

Long story short:

flac141case-hashwell: ~56.5s
flac141xiph: ~60.8s
flac141rarewares-avx2: ~57.6s

Code:
PS C:\temp\FLAC141> Measure-Command { .\flac141case-hashwell.exe -8 image.wav -f -o image.flac.141case-hashwell.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924066 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 56
Milliseconds      : 506
Ticks             : 565069945
TotalDays         : 0,000654016140046296
TotalHours        : 0,0156963873611111
TotalMinutes      : 0,941783241666667
TotalSeconds      : 56,5069945
TotalMilliseconds : 56506,9945



PS C:\temp\FLAC141> Measure-Command { .\flac141xiph.exe -8 image.wav -f -o image.flac.141xiph.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924066 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 0
Milliseconds      : 813
Ticks             : 608136756
TotalDays         : 0,000703861986111111
TotalHours        : 0,0168926876666667
TotalMinutes      : 1,01356126
TotalSeconds      : 60,8136756
TotalMilliseconds : 60813,6756



PS C:\temp\FLAC141> Measure-Command { .\flac141rarewares-avx2.exe -8 image.wav -f -o image.flac.141rarewares-avx2.flac | Out-Default }

flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

image.wav: wrote 1619924072 bytes, ratio=0,598


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 57
Milliseconds      : 670
Ticks             : 576700994
TotalDays         : 0,000667478002314815
TotalHours        : 0,0160194720555556
TotalMinutes      : 0,961168323333333
TotalSeconds      : 57,6700994
TotalMilliseconds : 57670,0994



PS C:\temp\FLAC141>
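For anyone who wants to repeat this kind of test outside PowerShell, here is a minimal Python sketch of the same idea: time the encoder over several rounds, average, and express the result as x realtime. It assumes a plain PCM WAV that Python's `wave` module can read; the function names are illustrative, not part of any existing tool.

```python
import subprocess
import time
import wave
from statistics import mean


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def average_runtime(cmd: list[str], rounds: int = 5) -> float:
    """Run cmd `rounds` times and return the mean wall-clock time in seconds.
    Output is discarded so console printing does not affect the timing."""
    times = []
    for _ in range(rounds):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return mean(times)
```

Usage example (file names are hypothetical): `average_runtime(["flac", "--silent", "-8", "-f", "-o", "image.flac", "image.wav"], rounds=5)`, then divide `wav_duration_seconds("image.wav")` by the result to get the x realtime figure. Averaging several rounds, as sundance does, smooths out run-to-run variation from turbo behaviour and background tasks.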
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 13:29:37
i3-12100, single wav CDDA image, all tested in RAM drive:

Case -8
Total encoding time: 0:17.078, 264.75x realtime
Case -7p
Total encoding time: 0:23.672, 191.00x realtime

john33 avx2 -8
Total encoding time: 0:18.203, 248.39x realtime
john33 avx2 -7p
Total encoding time: 0:28.234, 160.14x realtime

xiph -8
Total encoding time: 0:17.734, 254.95x realtime
xiph -7p
Total encoding time: 0:22.922, 197.25x realtime
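These lines give both the wall-clock time and the x realtime factor; multiplying them back together recovers the length of the CD image (about 75 minutes), a handy sanity check that every run used the same input. A small Python parser for the lines exactly as posted (the regular expression assumes only the format shown here):

```python
import re

LINE = re.compile(r"(\d+):(\d+(?:\.\d+)?), ([\d.]+)x realtime")


def parse_total(line: str) -> tuple[float, float]:
    """Parse 'Total encoding time: M:SS.mmm, N.NNx realtime' into (seconds, speed)."""
    m = LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    minutes, seconds, speed = m.groups()
    return int(minutes) * 60 + float(seconds), float(speed)


posted = {
    "Case -8": "Total encoding time: 0:17.078, 264.75x realtime",
    "Case -7p": "Total encoding time: 0:23.672, 191.00x realtime",
    "john33 avx2 -8": "Total encoding time: 0:18.203, 248.39x realtime",
    "xiph -8": "Total encoding time: 0:17.734, 254.95x realtime",
}
for label, line in posted.items():
    secs, speed = parse_total(line)
    print(f"{label}: {secs:.3f} s, implied audio length {secs * speed / 60:.1f} min")
```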
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-09-23 15:14:01
Your plain -8 numbers are pretty similar to my Ryzen 5900x.
Still for -8 -p the official xiph binaries are clearly the fastest.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 15:40:03
Yes, xiph is the fastest at -8p for me as well:

xiph:
Total encoding time: 0:45.406, 99.57x realtime
Case:
Total encoding time: 0:50.281, 89.92x realtime
john33 avx2:
Total encoding time: 1:03.641, 71.04x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 17:07:34
And here's another round of tests with -7 (Intel Core i7-8700 CPU @ 3.20GHz):
Code:
FLAC Binary: xiph-140\flac.exe -7 (328704 Bytes)
- Average time  = 28.104 seconds (10 rounds), Encoding speed = 384.72x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: xiph-141\flac.exe -7 (299520 Bytes)
- Average time  = 27.900 seconds (10 rounds), Encoding speed = 387.53x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac140_Case.exe -7 (852992 Bytes)
- Average time  = 27.559 seconds (10 rounds), Encoding speed = 392.32x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141_Case.exe -7 (844800 Bytes)
- Average time  = 27.707 seconds (10 rounds), Encoding speed = 390.22x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-john-avx2.exe -7 (1212928 Bytes)
- Average time  = 30.217 seconds (10 rounds), Encoding speed = 357.81x
- FLAC file size = 1.167.014.370 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-case-haswell.exe -7 (860160 Bytes)
- Average time  = 25.416 seconds (10 rounds), Encoding speed = 425.40x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)

FLAC Binary: flac141-case-gcc12.exe -7 (781312 Bytes)
- Average time  = 26.174 seconds (10 rounds), Encoding speed = 413.08x
- FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size)
tldr;
With the official xiph binary ktf is right: 141 is a bit faster than 140.
Case's binaries still outperform xiph on my setup, but the difference is small (see note below).
Case's Haswell build is the fastest on my setup at -7. I will re-run my tests with -8.

NB. I realized that in my test here (https://hydrogenaud.io/index.php/topic,122949.msg1015601.html#msg1015601) I mislabeled the flac140.exe binary. It was NOT the official binary from xiph! Sorry for that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-23 17:14:25
Here's another 1.4.1 compile with a newer GCC (12.2.0). Over here it's faster than the Xiph build, even with -p.
Yes. Both -8 and -8p are improved. Same test as my previous posts:
-8
Total encoding time: 0:16.547, 273.24x realtime
-8p
Total encoding time: 0:44.312, 102.03x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 17:32:09
Code:
FLAC Binary: xiph-141\flac.exe (299520 Bytes)
- Average time  = 41.038 seconds (5 rounds), Encoding speed = 263.46x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
- Average time  = 38.473 seconds (5 rounds), Encoding speed = 281.03x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-john-avx2.exe (1212928 Bytes)
- Average time  = 46.881 seconds (5 rounds), Encoding speed = 230.63x
- FLAC file size = 1.166.206.863 Bytes (= 61,145% of WAV size)

FLAC Binary: flac141-case-gcc12.exe (781312 Bytes)
- Average time  = 39.323 seconds (5 rounds), Encoding speed = 274.95x
- FLAC file size = 1.166.206.855 Bytes (= 61,145% of WAV size)

Speed ranking with -8 here: Case-Haswell -> Case-GCC12 -> Xiph -> John-AVX2
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-09-23 17:43:01
Case and Xiph are both at ~102x for -8 -p on the Ryzen 5900x, nice.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-23 18:25:13
And finally with -8p (although out of my league):
Code:
FLAC Binary: flac141-case-gcc12.exe (781312 Bytes)
- Average time  = 127.483 seconds (5 rounds), Encoding speed = 84.81x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
- Average time  = 128.832 seconds (5 rounds), Encoding speed = 83.92x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: xiph-141\flac.exe (299520 Bytes)
- Average time  = 134.315 seconds (5 rounds), Encoding speed = 80.50x
- FLAC file size = 1.165.475.620 Bytes (= 61,107% of WAV size)

FLAC Binary: flac141-john-avx2.exe (1212928 Bytes)
- Average time  = 190.607 seconds (3 rounds), Encoding speed = 56.72x
- FLAC file size = 1.165.475.622 Bytes (= 61,107% of WAV size)
Here Case's GCC12 build is the fastest, and both of Case's binaries are faster than Xiph on my setup.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-23 21:22:26
Windows keeps flagging false positives on the various compiles posted here, so no 1.4.1 speed tests from me until the next set of definitions is downloaded.
(Submitting false positive reports to Microsoft isn't exactly easy, is it?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 09:09:24
1.4.0 compiles, then, augmenting the Intel figures from here (https://hydrogenaud.io/index.php/topic,122949.msg1015694.html#msg1015694) with a Ryzen 2500U; I also ran flac -0 for those who want the fastest. Reply numbers refer to that thread.

I don't know how much the individual figures can be trusted - and if I ever do this again, it will not be on 1.4.0 - but the overall picture among the fastest is pretty clear: the Case compile wins on -0 and -5, and the official build wins at -7p.  I suspect it is when it calls for several residual compressions then?


-0. Why @john33's newer Intel compile is so good compared to the other builds posted, I don't know. Maybe the order of runs matters, if the CPU had just struggled more or less, but that is just speculation. Adding up the Intel and Ryzen figures, it overtakes the Xiph build.
Intel   Ryzen (much cheaper)   numbers are ordered by Intel
211      258      Case compile (x64) from reply 57
238      312      Xiph (x64 only)
240      293      john33's reply 82 with newer compile
240      332      NetRanger GCC-64 from Reply 10
254      348      NetRanger GCC-32 from Reply 10
258      n/a      Rarewares-x64 from john33's Reply 34 (first link)
264      367      Rarewares-x86-nonXP from john33's Reply 34 (first link)
269      360      NetRanger CLANG14-64 from Reply 15
274      360      NetRanger CLANG15-w64 from Reply 68
276      365      Rarewares-x86 from john33's Reply 34 (second link)
282      381      NetRanger CLANG14-32 from Reply 15
295      383      NetRanger CLANG15-w32
Rarewares-x64 produced strange numbers. Didn't go back and check again.
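"Adding up" here means summing the Intel and Ryzen columns. A quick way to reproduce that ranking, in the table's own units; Rarewares-x64 has no Ryzen figure and is left out:

```python
# -0 results from the table above: (Intel, Ryzen); None marks "n/a".
results = {
    "Case compile (x64)": (211, 258),
    "Xiph (x64 only)": (238, 312),
    "john33 newer compile": (240, 293),
    "NetRanger GCC-64": (240, 332),
    "NetRanger GCC-32": (254, 348),
    "Rarewares-x64": (258, None),
    "Rarewares-x86-nonXP": (264, 367),
    "NetRanger CLANG14-64": (269, 360),
    "NetRanger CLANG15-w64": (274, 360),
    "Rarewares-x86": (276, 365),
    "NetRanger CLANG14-32": (282, 381),
    "NetRanger CLANG15-w32": (295, 383),
}

# Only builds with both figures can be added up.
totals = {name: a + b for name, (a, b) in results.items() if b is not None}
ranking = sorted(totals, key=totals.get)
for name in ranking:
    print(f"{totals[name]:4d}  {name}")
```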


-5:
Intel   Ryzen   numbers are ordered by Intel
256      353      Case compile (x64) from reply 57
271      372      Xiph (x64 only)
298      375      john33's reply 82 with newer compile
300      412      Rarewares-x64 from john33's Reply 34 (first link)
321      481      NetRanger CLANG14-64 from Reply 15
328      462      Rarewares-x86-nonXP from john33's Reply 34 (first link)
330      464      Rarewares-x86 from john33's Reply 34 (second link)
336      454      NetRanger GCC-32 from Reply 10
344      447      NetRanger GCC-64 from Reply 10
347      450      NetRanger CLANG14-32 from Reply 15
366      n/a      NetRanger CLANG15-w64 from Reply 68
395      457      NetRanger CLANG15-w32
Deleted a nonsense result from CLANG15. Did Windows Update run or something?

-7p. Results start to vary; I wonder how much they can be trusted. Xiph was run immediately after all the -5 runs were done, so maybe -5 was not enough to heat up the CPU that much. Will not re-run anything on 1.4.0.
Intel   Ryzen    numbers are ordered by Intel and here the Ryzen numbers are not much in order.
 831      1279      Xiph (x64 only)
 880      1539      Case compile (x64)
 882      1426      NetRanger GCC-64
1006      1632      NetRanger GCC-32
1015      1600      NetRanger CLANG14-64
1029      1442      john33's reply 82 with newer compile
1035      1431      Rarewares-x64
1089      1888      Rarewares-x86
1170      1573      NetRanger CLANG14-32
1284      1862      Rarewares-x86-nonXP
1508      1783      NetRanger CLANG15-64
n/a       1797      NetRanger CLANG15-32
User-aborted runs: CLANG15-32 on Intel, and CLANG14-32 on Ryzen. The latter was aborted after two runs had completed with consistent figures, so it was included.

Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 10:38:50
If you get weird times with john33 AVX2 compile, it might be CPU overheating - AVX2 is extremely heavy on power and heat, if there are any cooling deficiencies, CPU will throttle.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 10:51:57
Quite possible. Everything was done on laptops. Dell with Intel and consumer-grade Acer with Ryzen. I've had three generations of Dell Latitude, and consumer-grade Dell before that - Dell fan control was and remains a mystery.

Another "speed concern" is how much you are encoding at a time. If one is compressing only a few albums, one will wait for it to finish, and then time is annoying - but then half of them might be done before the worst throttling kicks in, maybe?
If one is compressing a lot of them overnight at full steam, that is something else - but speed is not that crucial when it only makes the difference between done at 04:30 and done at 05:10.
If one is migrating to (new) FLAC, running for days and weeks, then both duration and heat (slowdown and fan speed on a computer you are using every day) would be a big thing again.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 10:57:19
Long story short: on all of the Intels I've tried, from Gen.4 to Gen.8, the Haswell build by Case is the fastest one. Using -6 because I like to use -6, best speed/compression ratio IMHO.
But an older Flac 1.3.3 x64 build by Case is slightly faster with only 152940 bytes more out of 583 MB:

Flac 1.3.3 x64 by Case 11/08/2019
Code:
flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 611507403 bytes, ratio=0,734

Kernel  Time =     0.812 =    6%
User    Time =    10.687 =   88%
Process Time =    11.500 =   94%    Virtual  Memory =     14 MB
Global  Time =    12.140 =  100%    Physical Memory =     13 MB


Flac 1.4.1 x64 Haswell by Case 23/09/2022
Code:
flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 611354463 bytes, ratio=0,734

Kernel  Time =     0.687 =    5%
User    Time =    11.671 =   92%
Process Time =    12.359 =   97%    Virtual  Memory =     14 MB
Global  Time =    12.685 =  100%    Physical Memory =     17 MB
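The "152940 bytes more out of 583 MB" figure checks out if MB is read as binary megabytes (MiB) of the 1.4.1 output. Putting the size and wall-clock differences side by side, using the "Global Time" lines:

```python
MIB = 1024 ** 2

size_133 = 611_507_403  # bytes written by 1.3.3 x64 (Case), -6
size_141 = 611_354_463  # bytes written by 1.4.1 x64 Haswell (Case), -6
global_133 = 12.140     # "Global Time" (wall clock), seconds
global_141 = 12.685

extra = size_133 - size_141
print(f"1.3.3 is {extra:,} bytes larger, {100 * extra / size_141:.3f}% "
      f"of {size_141 / MIB:.0f} MiB")
print(f"1.4.1 takes {100 * (global_141 / global_133 - 1):.1f}% longer")
```

So at -6 on this single run, 1.3.3 gives up about 0.025% in size and saves roughly 4.5% in wall-clock time; a single measurement, not an average.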
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 11:13:07
Using -6 because I like to use -6, best speed/compression ratio IMHO.
What CPU? All testing I have done so far indicates that -6 is the useless one: -7 is quite close to -6 on time (on Intel) and quite close to -8 on size. Visualised by ktf at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 - see the third diagram, the one that starts at -4.
-6 was even more strikingly bad with 1.3.4 it seems.

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-24 11:30:16
Don't know about specific CPUs, but -6 is pretty poor on compression ratio. -7 is the best for speed vs. compression ratio IMO. As for throttling, run HWiNFO in the background so you know when the CPU is throttled. It's pretty much a non-issue for desktop systems, even with AVX2 stress tests on all cores.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 11:47:04
It depends on CPU and cooling, some are designed to be low on power & heat, yet my i7-4790K with a huge tower air cooler hits 100C with Prime95 AVX2 stress test and is very close to Tcase with everyday AVX2 apps.
With current turbo technology you may miss turbo targets when it's warmer and will get varying results.
Laptops are always cooling deficient.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 11:56:59
I've tried with -7 and 1.3.3 is still faster than 1.4.1:

Flac 1.3.3 x64 by Case 11/08/2019
Code:
flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 609653841 bytes, ratio=0,732

Kernel  Time =     0.828 =    6%
User    Time =    12.250 =   91%
Process Time =    13.078 =   97%    Virtual  Memory =     14 MB
Global  Time =    13.345 =  100%    Physical Memory =     13 MB

Flac 1.4.1 x64 Haswell by Case 23/09/2022
Code:
flac 1.4.1
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

flac.wav: wrote 609301753 bytes, ratio=0,732

Kernel  Time =     0.593 =    3%
User    Time =    14.000 =   93%
Process Time =    14.593 =   97%    Virtual  Memory =     14 MB
Global  Time =    14.967 =  100%    Physical Memory =     13 MB
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 12:13:00
I've tried with -7 and 1.3.3 is still faster than 1.4.1:

Yes.

Quote from: https://xiph.org/flac/2022/09/09/flac-1-4-0-released.html
  • Compression for presets 3 through 8 has improved with only a small decrease in encoding speed, while presets 0, 1 and 2 got faster.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 12:31:29
Edit: If you want a competitor to -6, try -5 -l 10 and see what happens. Compare to both old -6 and new -6. (Reason: it seems it is the low -l that makes -6 underperform.)

1.3.3 is still faster than 1.4.1:
Sure, but compare to 1.3.3 at -8 and see which one is fastest and compresses best?
I did that (https://hydrogenaud.io/index.php/topic,122949.msg1015396.html#msg1015396) (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

1.4 starts using double-precision coefficients, something that had given the competition the upper hand for fifteen years. (https://hydrogenaud.io/index.php/topic,120158.msg999746.html#msg999746) Double precision takes a bit more time, but compresses better. The impact is greatest on high-resolution material (though something similar has been seen at lower resolutions when there is not much content in the top octave). See that thread, first my reply 33 and then ktf testing upsampled material in reply 94.

(If 1.4 takes more time at high presets, then why not on 0, 1, 2? Code improvements.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-24 13:23:56
OK, here are my speed and temperature tests, running 109 tracks at -8pe to make the test longer and use all cores.

john33 AVX2
Total encoding time: 8:57.484, 38.06x realtime
Case Haswell
Total encoding time: 7:04.062, 48.24x realtime
Xiph
Total encoding time: 5:59.062, 56.98x realtime
Case GCC 12.2.0
Total encoding time: 5:55.157, 57.60x realtime

CPU package max temperature:
Case Haswell: 92C, the only one with "Yes" on "Power Limit Exceeded"; no throttling though, it's a non-K i3 after all.
Xiph: 91C
Case GCC 12.2.0: 88C
john33 AVX2: 87C

The screenshot on Reply #17 can be used as a reference idle temperature. All tests ran on the stock cooler.
(https://hydrogenaud.io/index.php?action=dlattach;attach=23452;image)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-09-24 13:37:39
Double precision takes a bit more time, but compresses better.
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 13:48:57
I don't really care about compression (=storage) but I do care about speed and CPU (=electricity bill) especially these days ;)

Then read the entire post:

I did that (https://hydrogenaud.io/index.php/topic,122949.msg1015396.html#msg1015396) (corpus in my signature), and 1.4.0 at -7 was faster and compressed better than 1.3.4 at both -8 and -8p. (And -8e too, naturally.)

So if you care more about speed than compression, just lower the preset and you will get both better and faster compression.

And if you only care about speed and energy usage while the size doesn't matter at all, don't compress at all - just use WAV. Can't beat that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Chibisteven on 2022-09-24 14:04:06
FLAC seems pretty energy efficient and quite fast to decode compared to something like Monkey's Audio which just eats energy and is sloooooooooow.

Encoding is another story, but if your stuff is already encoded, you could just leave it be. Even on a UPS it has no major effect on battery life, and that's while running an Icecast server. And yes, I had a power outage while using Icecast and running a bunch of other things, FLAC encoding included.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2022-09-24 17:16:22
Nothing in this thread tells us about energy efficiency. Something slower may use fewer joules to do its job than something that finishes faster.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-24 17:43:33
For a given executable with more CPU intensive setting, I would not be surprised if it translates quite well. (As encoding goes - decoding is a different matter, and for FLAC a quite irrelevant one.)
But if you have a more modern CPU, a TDP of 15 watts is quite common, which amounts to some 2.52 kWh if run for a whole week.

Of course, GB cost is only relevant when your drive is getting full. Implying, as Chibisteven points out, leave encodes as they are.
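The 2.52 kWh figure is just power times time (15 W × 168 h). The same arithmetic illustrates rutra80's point that speed alone says nothing about energy: a slower build drawing less power can use fewer joules. A sketch, assuming a constant draw equal to the stated wattage (real package power varies with load and turbo, and TDP is not a measured draw); the fast/slow figures are hypothetical:

```python
HOURS_PER_WEEK = 24 * 7


def energy_kwh(watts: float, hours: float) -> float:
    """Energy for a constant power draw, in kilowatt-hours."""
    return watts * hours / 1000.0


week = energy_kwh(15, HOURS_PER_WEEK)
print(f"15 W for a week: {week:.2f} kWh")

# Hypothetical figures: a build that takes 10% longer but draws
# 20% less power still uses less energy for the same job.
fast_build = energy_kwh(50, 1.0)
slow_build = energy_kwh(40, 1.1)
print(f"fast: {fast_build:.3f} kWh, slow: {slow_build:.3f} kWh")
```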
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-09-24 23:21:34
I wanted to test the effect on lossyWAV encodes. So I lazily picked thirteen songs from various artists in my collection (honestly, there really isn't a whole lot of variety here) and tested it as follows:


Here are the results:

Format         Size (bytes)   Percentage
wav               534185270      100%
1.3.4             371429142   69.532%
1.4.1             370814508   69.417%
lossy-1.3.4       154850676   28.988%
lossy-1.4.1       154596798   28.941%

More or less what I expected. Someone with a better / more varied testing playlist should probably redo the test, though.
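The percentage column is each size relative to the WAV total; recomputing it and taking the differences shows how much 1.4.1 saves in absolute terms on this set, for both plain and lossyWAV-processed input:

```python
sizes = {
    "wav": 534_185_270,
    "1.3.4": 371_429_142,
    "1.4.1": 370_814_508,
    "lossy-1.3.4": 154_850_676,
    "lossy-1.4.1": 154_596_798,
}
base = sizes["wav"]
for name, size in sizes.items():
    print(f"{name:12} {size:>11,} {100 * size / base:7.3f}%")

print("1.4.1 saves", sizes["1.3.4"] - sizes["1.4.1"], "bytes (lossless)")
print("1.4.1 saves", sizes["lossy-1.3.4"] - sizes["lossy-1.4.1"], "bytes (lossyWAV)")
```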
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-25 12:20:32
Why -6 does so badly - and why keep it as it is:
(i.e. what part of the settings)

Compared to -5, -6 goes up from -r5 to -r6 and tries another set of windowing functions.
But prediction order stays at 8.
Going up to -7 increases the prediction order to 12. It spends only a tiny bit more time, and compresses quite a lot better. See charts with nearly-the-same-as 1.4 at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 , 3rd diagram.

Question: why is the difference to -5 small and the difference to -7 large?
* It is not the -r5 to -r6. In the 38 CD corpus in my signature, -5 -r6 improved 0.0044 percent.
* It is the prediction order. Trying -5 -l10 and -5 -l12 ditches the two other measures (-r and windowing function) and increases the prediction order. The former nearly catches -6 at size, the latter overtakes it. Both encode significantly faster on both Intel and Ryzen.
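For context on the -r bullet: flac's -r sets the (maximum) residual partition order, i.e. into how many pieces (2^order) a block's residual is split, each piece getting its own Rice parameter. The Python sketch below illustrates why a higher order only pays off when the residual's loudness changes within the block. It is a simplification, not libFLAC's code: it assumes equal-sized partitions, ignores the predictor warm-up samples, assumes 4-bit parameters with an upper limit of 14, and counts bits exactly instead of estimating them.

```python
def zigzag(r: int) -> int:
    """Fold a signed residual to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(values: list[int], k: int) -> int:
    """Bits to Rice-code unsigned values with parameter k:
    unary quotient (u >> k), one stop bit, then k low bits."""
    return sum((u >> k) + 1 + k for u in values)


def partition_cost(residuals: list[int], order: int,
                   max_k: int = 14, param_bits: int = 4) -> int:
    """Total bits with 2**order equal partitions, each using its own best k."""
    parts = 2 ** order
    size = len(residuals) // parts
    folded = [zigzag(r) for r in residuals]
    total = 0
    for i in range(parts):
        chunk = folded[i * size:(i + 1) * size]
        total += param_bits + min(rice_bits(chunk, k) for k in range(max_k + 1))
    return total


def best_order(residuals: list[int], max_order: int) -> tuple[int, dict[int, int]]:
    """Try partition orders 0..max_order (those that divide the block evenly)."""
    costs = {p: partition_cost(residuals, p)
             for p in range(max_order + 1) if len(residuals) % (2 ** p) == 0}
    return min(costs, key=costs.get), costs
```

With a quiet first half and a loud second half (for example `[0, 1, -1, 0] * 8 + [100, -120, 90, -80] * 8`), splitting once wins; splitting further only adds parameter overhead. That is consistent with the tiny gain measured for -5 -r6 above.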

Why keep -6 then, when you can compress better at shorter time?
Decoding CPU footprint.
FLAC already decodes faster than pretty much anything, so is that necessary? ... well, FLAC has some "special olympics" settings: -0 and -3 are dual mono, -0 to -2 are fixed predictor. Starting at -4 you have the 8th order predictor and stereo decorrelation (adaptive for -4, brute-force'd at higher). Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0 not at -1): -7 and -8 take - percentwise - more CPU decoding, and that is because the -l12. Even for a "fastest there is", there is a reason for a couple of "fastest among the fastest".

So there you go, a special purpose setting for those who want something that, decoding-wise "cannot be distinguished from the default" (not completely true? There is a difference, the -r ...).
A beefed-up default is just ... nothing wrong with having one such for those inclined. Keep it. The rest of us ... don't use it.
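The "fixed predictor" used by -0 to -2 is a small set of polynomial predictors of order 0 to 4 with hard-coded integer coefficients, so apart from the warm-up samples only the chosen order needs to be stored; from -4 upwards, LPC of up to order 8 (or 12 with -l 12) is used instead. The sketch below computes the fixed-predictor residuals and picks the order with the smallest sum of absolute residuals. That selection rule is an assumption made for illustration, not a description of libFLAC's code.

```python
# Coefficients of the fixed predictors, applied to x[n-1], x[n-2], ...
FIXED_COEFFS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
    4: [4, -6, 4, -1],
}


def fixed_residual(x: list[int], order: int) -> list[int]:
    """Residual x[n] - prediction[n] for n >= order (the first `order`
    samples are stored verbatim as warm-up samples)."""
    coeffs = FIXED_COEFFS[order]
    return [
        x[n] - sum(c * x[n - 1 - i] for i, c in enumerate(coeffs))
        for n in range(order, len(x))
    ]


def pick_fixed_order(x: list[int]) -> int:
    """Lowest order whose residual has the smallest sum of absolute values
    (an illustrative cost proxy)."""
    return min(FIXED_COEFFS, key=lambda o: sum(abs(r) for r in fixed_residual(x, o)))
```

Each order is the previous order's residual differenced once more, which is why a straight line is predicted perfectly by order 2 and a parabola by order 3.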
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-25 13:25:04
Look at the decoding chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf (and remember it starts at -0 not at -1): -7 and -8 take - percentwise - more CPU decoding, and that is because the -l12. Even for a "fastest there is", there is a reason for a couple of "fastest among the fastest".
Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 the slowest. I zoomed into the PDF plot a lot and saw that the triangle mark for -3 is still sitting at 0% decoding CPU time. Anyway, here are my results.

Test settings:
Code:
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-3
Code:
Stats by codec:
  FLAC: 1506.230x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.008
  Speed (x realtime): 1506.230

-8
Code:
Stats by codec:
  FLAC: 1501.024x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:15.060
  Speed (x realtime): 1501.024

.wav as a reference
Code:
Stats by codec:
  PCM: 59556.250x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.000
  Decoding time: 0:00.379
  Speed (x realtime): 59556.250

The benchmark is very sensitive. For example, if I use a RAM drive and set "Buffer entire file into memory" to "no":
Code:
Stats by codec:
  PCM: 31524.633x realtime
Total:
  Decoded length: 6:16:47.267
  Opening time: 0:00.001
  Decoding time: 0:00.717
  Speed (x realtime): 31524.633
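foobar2000's speed figure is essentially the decoded length divided by the decoding time; recomputing it from the rounded times shown gives values close to, but not exactly equal to, the reported ones. It also puts the -3 vs -8 gap into perspective: about a third of a percent here. A quick check in Python:

```python
def to_seconds(text: str) -> float:
    """Convert 'H:MM:SS.mmm' or 'M:SS.mmm' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


length = to_seconds("6:16:47.267")  # "Decoded length" in every report above
decode_times = {"-3": "0:15.008", "-8": "0:15.060", "wav (buffered)": "0:00.379"}
for label, t in decode_times.items():
    print(f"{label}: about {length / to_seconds(t):,.0f}x realtime")

# Reported speeds: -8 relative to -3
print(f"-8 vs -3: {100 * (1501.024 / 1506.230 - 1):+.2f}%")
```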
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-25 15:19:50
Decoding speed varies from album to album, but relative differences among 0-8 are similar. Pay attention to -3. The files are all encoded with flac 1.4.1

MA Recordings - Rediscovered Memories
https://www.discogs.com/release/7300636-Various-Rediscovered-Memories
0 (x realtime): 2251.000 min, 2271.072 max, 2261.496 average
1 (x realtime): 2190.539 min, 2195.430 max, 2193.093 average
2 (x realtime): 2180.179 min, 2197.860 max, 2187.784 average
3 (x realtime): 1606.081 min, 1608.317 max, 1606.900 average
4 (x realtime): 1625.529 min, 1627.217 max, 1626.215 average
5 (x realtime): 1621.599 min, 1625.611 max, 1623.098 average
6 (x realtime): 1587.640 min, 1590.859 max, 1589.449 average
7 (x realtime): 1536.414 min, 1539.106 max, 1538.185 average
8 (x realtime): 1534.826 min, 1536.911 max, 1535.957 average

FINAL FANTASY XIII Original Soundtrack, Disc 4
https://vgmdb.net/album/15980
0 (x realtime): 2144.344 min, 2153.433 max, 2149.787 average
1 (x realtime): 2043.644 min, 2054.518 max, 2048.156 average
2 (x realtime): 2024.009 min, 2040.479 max, 2033.885 average
3 (x realtime): 1506.466 min, 1509.097 max, 1508.142 average
4 (x realtime): 1556.885 min, 1559.473 max, 1558.123 average
5 (x realtime): 1564.512 min, 1565.964 max, 1565.423 average
6 (x realtime): 1558.401 min, 1560.904 max, 1559.655 average
7 (x realtime): 1499.939 min, 1500.600 max, 1500.309 average
8 (x realtime): 1499.042 min, 1502.150 max, 1500.238 average

黃耀明 (Anthony Wong) - 信望愛 (Faith, Hope, Love)
https://music.apple.com/hk/album/%E4%BF%A1%E6%9C%9B%E6%84%9B/1356525393
0 (x realtime): 2145.599 min, 2162.814 max, 2156.138 average
1 (x realtime): 2045.117 min, 2054.068 max, 2048.449 average
2 (x realtime): 2036.280 min, 2054.765 max, 2046.030 average
3 (x realtime): 1475.368 min, 1477.189 max, 1476.618 average
4 (x realtime): 1528.601 min, 1532.385 max, 1529.803 average
5 (x realtime): 1544.374 min, 1546.838 max, 1545.812 average
6 (x realtime): 1527.229 min, 1529.417 max, 1528.687 average
7 (x realtime): 1422.072 min, 1423.042 max, 1422.549 average
8 (x realtime): 1423.491 min, 1424.172 max, 1423.799 average

Full reports attached.
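To make the "pay attention to -3" point concrete, the average columns can be normalized to -8 per album. A short sketch using the averages above; on this CPU it shows -3 decoding slower than -4 and -5 on all three albums, while -0 is the fastest everywhere:

```python
averages = {  # average x realtime for presets -0 .. -8, copied from above
    "MA Recordings - Rediscovered Memories":
        [2261.496, 2193.093, 2187.784, 1606.900, 1626.215, 1623.098, 1589.449, 1538.185, 1535.957],
    "FINAL FANTASY XIII OST, Disc 4":
        [2149.787, 2048.156, 2033.885, 1508.142, 1558.123, 1565.423, 1559.655, 1500.309, 1500.238],
    "黃耀明 - 信望愛":
        [2156.138, 2048.449, 2046.030, 1476.618, 1529.803, 1545.812, 1528.687, 1422.549, 1423.799],
}
for album, speeds in averages.items():
    ref = speeds[8]
    rel = ", ".join(f"-{p}: {100 * s / ref:.1f}%" for p, s in enumerate(speeds))
    print(album, "->", rel)
```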
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-25 15:37:46
@ktf: When you wrote that -0 to -2 got faster (https://hydrogenaud.io/index.php/topic,122949.msg1015314.html#msg1015314), does that apply to decoding as well? If the improved algorithm is "reversible into an improved decompression algorithm", then that would explain it. (Edit: oh I could have set up a RAM disk myself ... maybe)

Looks like the graph suggests that -3 should have the fastest decoding speed and -7 or -8 will be slowest. I zoomed the pdf plot a lot and see that the triangle mark in -3 is still sitting at 0% decoding CPU time.
It looks like the left and right borders go, by construction, through the fastest and slowest points, but even then: the scale is logarithmic, so there is no 0. The next (or "previous") step left of 0.5% would be 0.4%.
-3 appears to be sitting at "point fortysomething".
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-09-25 18:03:08
@ktf: When you wrote that -0 to -2 got faster (https://hydrogenaud.io/index.php/topic,122949.msg1015314.html#msg1015314), does that apply to decoding as well?
Nope. The two things are fundamentally different problems, improvements to one are rarely applicable to the other.

My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.

edit: I've always been baffled by -3 being decoded faster than -2. The decoding results here make much more sense. In the past, @Porcus found that blocksize has a profound influence on decoding speed. That might be related to the training of branch prediction. It might be that newer CPUs have branch predictors with a larger capacity, in which case this training only has to happen once per file instead of once per block. In that case, the influence of block size might be much smaller.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 10:02:26
My most plausible explanation here is simply differences between CPUs. Perhaps newer CPUs can better precondition the code used for decoding fixed subframes. Branch prediction and other front-end wizardry are a major factor in the performance of today's CPUs.
Just tested on a super old and low-end Lumia 520 (https://www.gsmarena.com/nokia_lumia_520-5322.php) with LineageOS and the foobar2000 APK. I concatenated the first track of each of the three albums mentioned in Reply #31 (https://hydrogenaud.io/index.php/topic,123025.msg1016404.html#msg1016404) into a single 11m53s file and encoded it with presets 0 to 8. I tried to use the first two tracks of each album, but then foobar crashed during the test, probably because there was not enough RAM (only 512MB).

foobar's console shows a floppy disk icon that is supposed to save the output to a text file. I tried, but it can't save anything, so here are screenshots instead. The speed transition between presets seems smoother.

[Screenshots: foobar2000 decoding speed results for presets 0 to 8]
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-26 10:30:09
An AMD thing then?
* From http://www.audiograaf.nl/losslesstest/ I see that @ktf used an AMD on editions 3, 4 and 5, but an Intel on editions 1 and 2.
* In editions 3ff, -3 decodes fastest
* In edition 2, I see from http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%202.pdf figure 2.2 (using 2.1 to verify that -3 is the one on the 58 percent mark) that 0, 1, 2 decoded faster.

I think presets 0, 1, 2, 3 (and 4, 5) meant the same settings then as they do now.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 15:47:02
GCC 12.2.0 john33 (https://hydrogenaud.io/index.php/topic,123014.msg1016445.html#msg1016445) vs Case (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265), plus Xiph build:
Zero Wing OST, VGM format rendered by foo_gep at 16/44
https://www.youtube.com/playlist?list=PLPAbo-cOSKYzgI65IOneAq_oTzOjI6k4F

case -8p
Total encoding time: 0:14.562, 113.62x realtime

case -8p -b2304
Total encoding time: 0:17.172, 96.35x realtime

john33 -8p
Total encoding time: 0:14.890, 111.12x realtime

john33 -8p -b2304
Total encoding time: 0:17.297, 95.65x realtime

xiph -8p
Total encoding time: 0:14.969, 110.53x realtime

xiph -8p -b2304
Total encoding time: 0:17.203, 96.18x realtime

The OST is not very long (about 27m35s), so others could run a longer test, perhaps with some temperature measurements too.


File size:
Code: [Select]
wav         291,876,908   100.000%
-8p         164,386,268   56.3204%
-8p -b2304  163,492,544   56.0142%
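The "x realtime" and percentage figures can be cross-checked from the WAV size alone. This is a minimal sketch, assuming a canonical 44-byte WAV header and 16-bit stereo PCM at 44.1 kHz (the "16/44" mentioned above); foobar2000 computes speed from its own track length, so the last digit may differ slightly (this gives about 113.63x for the 14.562 s run, against the reported 113.62x).

```python
# Assumptions: canonical 44-byte RIFF/WAV header, 16-bit stereo PCM at 44.1 kHz.
WAV_HEADER_BYTES = 44
BYTES_PER_SECOND = 44100 * 2 * 2  # sample rate * channels * bytes per sample


def duration_seconds(wav_bytes: int) -> float:
    """Playing time implied by the size of a 16-bit stereo 44.1 kHz WAV file."""
    return (wav_bytes - WAV_HEADER_BYTES) / BYTES_PER_SECOND


def realtime_factor(wav_bytes: int, encode_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock encoding time."""
    return duration_seconds(wav_bytes) / encode_seconds


def percent_of_wav(flac_bytes: int, wav_bytes: int) -> float:
    """Compressed size as a percentage of the WAV size."""
    return 100.0 * flac_bytes / wav_bytes


wav = 291_876_908
print(f"{duration_seconds(wav):.1f} s of audio")
print(f"case -8p: {realtime_factor(wav, 14.562):.2f}x realtime")
print(f"-8p:        {percent_of_wav(164_386_268, wav):.4f}%")
print(f"-8p -b2304: {percent_of_wav(163_492_544, wav):.4f}%")
```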

Decoding speed:
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.12
Settings:
  High priority: yes
  Buffer entire file into memory: yes
  Warm-up: yes
  Passes: 5
  Threads: 1
  Postprocessing: none

-8p
Speed (x realtime): 1430.445 min, 1433.292 max, 1432.189 average

-8p -b2304
Speed (x realtime): 1467.472 min, 1469.977 max, 1468.835 average
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-09-26 17:25:42
Interesting, thanks. I just posted a clang compile that seems a little faster on my system, but not with any extensive validation done speed-wise.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-26 18:54:37
Zero Wing OST
-8p
Total encoding time: 0:19.640, 84.24x realtime
-8p -b2304
Total encoding time: 0:22.609, 73.18x realtime

So it's slower here, but previous Clang builds were also slower on my machine, and on other members' machines as well.
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-09-26 20:37:08
OK, thanks.
Title: Re: FLAC v1.4.x Performance Tests
Post by: forart.eu on 2022-09-27 11:14:04
...so, after all these tests, which build (and encoding parameters) should I use on my new i5-12600 for BOTH speed and size?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-27 12:05:19
for BOTH speed and size ?

Use -0, that gives you max speed and max size.  ;D

Oh, you wanted small size? On a more serious note, then:
You need to take some kind of stand on how much CPU time it is worth to save a megabyte or gigabyte. Compared to fifteen years ago, the extra time spent on doing -7 instead of -5 is much lower - but the space saved is also worth much less.
Also, if you have a spinning drive (people do have those for larger collections ...) - or even more so, a NAS - then I/O will eat up a lot of the speed difference, so encoding speed might not be a concern unless you are considering -8p.

Generally, the "most economic" suggestion is to not recompress until you have to: leave your FLAC files as they are, and only when your drive is getting close to full, recompress using a quite heavy setting. If you are on a spinning drive, that is sure as hell going to get you fragmentation though - in case that matters anymore.


Edit: for new files you can of course choose whatever you like, but you cannot tell flac.exe to "recompress those that were compressed with -5 and leave the -8p ones alone" - it doesn't know that. If, on the other hand, you use foobar2000 to recompress, you can filter on files not created by 1.4.x, but that does not recompress in-place. What I would do is: (1) run a verification on everything to make sure no FLAC file is corrupted, (2) make sure the backup is synced, (3) use foo_audiomd5 to write an actual audio MD5 to a file that can be checked afterwards, (4) run flac -f to recompress the whole thing in-place, (5) verify against foo_audiomd5's files.
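A lighter-weight cross-check for steps (3) and (5), not a replacement for foo_audiomd5 or for a full decode test with flac -t: every FLAC file carries an MD5 of the unencoded audio in its STREAMINFO block, and a re-encode of unchanged audio should write the same value. This Python sketch only reads that stored signature; the helper names are hypothetical, and it assumes the file starts directly with the "fLaC" marker (no non-standard ID3v2 tag in front).

```python
# Sketch: read the audio MD5 a FLAC encoder stores in STREAMINFO, so it can be
# compared before and after an in-place recompression.
# Layout assumed: "fLaC" marker, 4-byte metadata block header, then the 34-byte
# STREAMINFO block whose last 16 bytes are the MD5 of the unencoded audio.


def streaminfo_md5(data: bytes) -> str:
    """Return the STREAMINFO MD5 as hex; all zeros means none was stored."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = data[4] & 0x7F              # low 7 bits; top bit is the "last block" flag
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length != 34:      # STREAMINFO is block type 0, always first
        raise ValueError("first metadata block is not STREAMINFO")
    streaminfo = data[8:8 + 34]
    if len(streaminfo) != 34:
        raise ValueError("truncated STREAMINFO")
    return streaminfo[18:34].hex()           # 10 + 8 bytes of other fields precede the MD5


def read_md5(path: str) -> str:
    with open(path, "rb") as f:
        return streaminfo_md5(f.read(42))    # 4 + 4 + 34 bytes


def snapshot(paths) -> dict:
    """Map each file path to its stored audio MD5."""
    return {p: read_md5(p) for p in paths}


def mismatches(before: dict, after: dict) -> list:
    """Paths whose MD5 changed (or disappeared) between two snapshots."""
    return sorted(p for p in before if after.get(p) != before[p])
```

Usage idea: take `before = snapshot(glob.glob("**/*.flac", recursive=True))`, run the in-place recompression, then check that `mismatches(before, snapshot(before))` is empty.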
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-27 15:02:01
Mostly academic but try Merzbow Pulse Demon / Venereology with -l 12 -b 512 -r 8 without windowing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-27 21:50:39
Settings above -8, anyone?

Come on, there are a few of you out there. Care to run an overnight job on at least a handful of CDs? There are a few settings that might give a better or worse return on CPU time. I'll explain the choices at the end, at the risk of posting spoilers.


* I've not been so concerned about single songs, so I've been converting a whole image to .wav, removing all tags if they were carried over, putting the files into the same folder on an SSD, and running the second script. But if you use the first one, you will not get the logfile spammed with long dir listings.
* There are elegant PowerShell solutions, but I'm too stuck in cmd and bash. So for timing (on Windows), I'm using timer64 from the https://sourceforge.net/projects/sevenmax/ package. The syntax I used (it will overwrite existing .flac files) is, e.g.:
   <directory-to-timer64>\timer64.exe flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
and then du or dir.
With timer64.exe, flac.exe and the .wav files in the same directory, I would run the following. Let's keep it simple: no FOR loop, which brings in the single/double percent sign issue. It should preferably be run a couple of times, as timings are not ... exact, but then it might take quite a while.
Copy it into a flactest.bat file in the same directory as timer64.exe, flac.exe and the .wav files. (Or modify accordingly.)
Open a cmd window, cd to the directory, and run .\flactest.bat - or you can double click it. 

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
du *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
du *.flac >> logfile.log
The last one is going to be slow ... but in my testing, more than 3x as fast as the infamous -8ep.

Afterwards, you will get a logfile.log with for each setting, timing and total .flac file size in order. I think the "Global time =" figure is the most interesting.

If you are interested in how each file compresses, that is, the individual .flac sizes and not only the aggregate, you can replace the du command with the old-fashioned DOS command dir and run this instead:

Code: [Select]
.\flac -3pf *.wav 
.\timer64.exe .\flac -7pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8f -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -r8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(5e-1);subdivide_tukey(3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r 7  *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 7 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(4)" -r 8 *.wav >> logfile.log
dir *.flac >> logfile.log
.\timer64.exe .\flac -8pf -A "subdivide_tukey(5)" *.wav >> logfile.log
dir *.flac >> logfile.log

Then you will get tons of text out. The most interesting figure here, I guess, is still the total at the bottom of each dir listing.


So why these choices? (Apart from the -3p, which is there just to heat up the CPU a little so that -7p doesn't get the advantage of running without throttling.)
-7 is synonymous with -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2), and -8 with the same but "(3)" at the end. Is the next step (4), then? But at some stage, "-p" will be more worth it. When? That's why the initial -7p is there too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - expected is that -r7 offers better value for money, of course, but how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones, like -8 -A "tukey(5e-1);subdivide_tukey(3)". Isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that one is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So I'm trying that instead of subdivide_tukey(4), hoping to catch more compression more cheaply.
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": Here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent code, use this).
Then there is -p , and combinations.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-28 13:55:22
and then du or dir
f**c, "du" isn't in Windows out of the box?! That's what I get for installing sysinternals first day ...

Use the bottom one then.

Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-09-28 23:33:45
I did a little test with different builds posted here in the forum and my AMD Ryzen 5 3600X under Windows 11.
The file used is an almost 2-hour-long DJ set of electronic music.
The fastest build was flac-1.4.1-win64-znver3 (Case).

Code: [Select]
Codec      :     PCM (WAV)
Duration   :     57:21:749
Sample rate:     48000 Hz
Channels   :     2
Bits per sample: 16

Igor Pavlov's timer64 has been used to measure the time.

Code: [Select]
timer64.exe flac -8p

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
Global  Time =    53.220
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-x64-znver2-GCC1220 (john33)
Code: [Select]
Global  Time =    53.621
wrote 425812103 bytes, ratio=0,644

flac-1.4.1-win64-gcc12 (Case)
Code: [Select]
Global  Time =    56.626
wrote 425812106 bytes, ratio=0,644

FLAC-1.4.1_Win64_GCC122 (NetRanger)
Code: [Select]
Global  Time =    59.990
wrote 425812106 bytes, ratio=0,644

flac-1.4.1-x64-AVX2 (john33)
Code: [Select]
Global  Time =    73.045
wrote 425812100 bytes, ratio=0,644

flac-1.4.1-x64-AVX2-clang1500
Code: [Select]
Global  Time =    78.772
wrote 425812103 bytes, ratio=0,644

FLAC-1.4.1_Win64_Intel 19.2 (rarewares)
Code: [Select]
Global  Time =    79.738
wrote 425812100 bytes, ratio=0,644

FLAC-1.4.1_Win64_CLANG15 (NetRanger)
Code: [Select]
Global  Time =   100.662
wrote 425812106 bytes, ratio=0,644
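To see the spread between builds at a glance, the Global Time figures above can be normalised to the fastest build. This Python sketch only rearranges the posted single-run numbers, so it inherits their timing noise; the build labels are copied as posted.

```python
# "Global Time" in seconds for flac -8p on the same test file, as posted above.
timings = {
    "flac-1.4.1-win64-znver3 (Case)": 53.220,
    "flac-1.4.1-x64-znver2-GCC1220 (john33)": 53.621,
    "flac-1.4.1-win64-gcc12 (Case)": 56.626,
    "FLAC-1.4.1_Win64_GCC122 (NetRanger)": 59.990,
    "flac-1.4.1-x64-AVX2 (john33)": 73.045,
    "flac-1.4.1-x64-AVX2-clang1500": 78.772,
    "FLAC-1.4.1_Win64_Intel 19.2 (rarewares)": 79.738,
    "FLAC-1.4.1_Win64_CLANG15 (NetRanger)": 100.662,
}

fastest = min(timings.values())
for build, t in sorted(timings.items(), key=lambda item: item[1]):
    # 1.00 for the fastest build; 2.00 would mean it took twice as long
    print(f"{t / fastest:4.2f}x the time of the fastest  {build}")
```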
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-28 23:46:15
Nearly a factor of two, that is quite a lot.

(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-09-29 00:11:17
(But you didn't test the official build? Also, you say nearly two hours, but it says 57 minutes?)

Oops yes it's 57 minutes not two hours. My mistake.

Official build from Xiph against flac-1.4.1-win64-znver3 (Case).

Official Xiph
Code: [Select]
Global  Time =    53.937
wrote 425812106 bytes, ratio=0,644

win64-znver3 (Case)
Code: [Select]
Global  Time =    51.475
wrote 425812103 bytes, ratio=0,644
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 10:00:36
Testing the Acer (Ryzen-equipped) laptop, as that has more consistent timings (two fan settings, on + off?) - Intel considerations at the end;

... with the following:

Settings above -8, anyone?
[...]
So why these choices? (Apart from the -3p, which is there just to heat up the CPU a little so that -7p doesn't get the advantage of running without throttling.)
-7 is synonymous with -l 12 -b 4096 -m -r 6 -A subdivide_tukey(2), and -8 with the same but "(3)" at the end. Is the next step (4), then? But at some stage, "-p" will be more worth it. When? That's why the initial -7p is there too; I expect it not to be worth it, but I see some surprises on some material.
Also there is this thing about -r8, which people often use to squeeze the most out of the files - expected is that -r7 offers better value for money, of course, but how much? When does it pay off to nudge up that parameter?
Before going to subdivide_tukey(4), there are a few stranger ones, like -8 -A "tukey(5e-1);subdivide_tukey(3)". Isn't there already a single tukey function in subdivide_tukey(3)? Yes, but that one is steeply tapered: closer to a rectangle than to the tukey(onehalf) that is the -5 default. So I'm trying that instead of subdivide_tukey(4), hoping to catch more compression more cheaply.
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)": Here I'm making them more different, to see if that helps. (666e-3 = "0.666" or "0,666" depending on locale - don't use locale-dependent code, use this).
Then there is -p , and combinations.

The following was done on all 38 CDs, with no attempt at checking differences between musical genres. I did note, though, that -7p sizes varied from larger than -8 to smaller than -8 -A subdivide_tukey(4), but if you are willing to wait for -7p you might as well select something better.

First observations:
* -r7 isn't ruled out at first glance; only later does it turn out not to be worth it on the Ryzen - but maybe it is on the Intel, see below. However, -r8 offers very little over -r7: about 1/15 of the size improvement per CPU second on the Ryzen, and not worth it on the Intel either, at least not until you slow things down considerably with the apodization functions. So in the following, I ditch all the -r8 runs.
* -A "tukey(666e-3);subdivide_tukey(3/333e-3)" compressed better than -A "tukey(5e-1);subdivide_tukey(3)" at about the same time - YMMV. I removed the latter, as it would only make the numbers look weird. Maybe I should rather have removed the "customized" one?!
* -7p is not worth it. Better go up to -8 -A subdivide_tukey(4 or 5)

So with those deletions, I ordered by compressed size, and calculated: how many bytes can I save per second it costs moving up one step?
* -r7: not that much - it was cheaper to add another apodization function. Only tried up to subdivide_tukey(5).
* -p becomes worth it around the subdivide_tukey(5) point: if you consider going up from -8 -A subdivide_tukey(4) to (5), consider instead another doubling of encoding time to -8p, as that pays off nearly the same per extra second;
* ... but, in this test, the extra tukey also is worth about the same.

Some numbers after more deletions - but kept a doubtful one, to be explained below the table:
Size (bytes)  Seconds  Saved per second  Setting
11969604531       833                 -  -8
11968502388       940             10300  -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11967556575      1155              4399  -8 -A "subdivide_tukey(4)"
11966463371      1555              2733  -8 -A "subdivide_tukey(5)"
11961291433      3003              3572  -8p (note the jump in time when using -p)
11960179719      3350              3204  -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11959250164      5131               522  -8p -A "subdivide_tukey(4)"
11958125424      7796               422  -8p -A "subdivide_tukey(5)"
The "saved per second" column means: going from the previous setting to this one takes more seconds (e.g. 107 extra from -8 to the next), so how many bytes do you save per extra second?
Now, the fact that the "2733" is less than the next "3572" indicates that you should not use -8 -A subdivide_tukey(5): if you think those 400 seconds are worth the savings, you should rather spend even more seconds going to -8p. Deleting that row, the "3572" also changes - to 3390 - because it is relative to the "previous" row, which is now a different one. But YMMV here; it surely depends on material and hardware, and maybe even on which of the builds posted here you use.
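The "saved per second" arithmetic and the rule of thumb for deleting rows can be sketched in Python. The numbers are copied from the table above; the pruning function is only my reading of the procedure (drop any step that pays less per second than the step after it, which amounts to keeping the lower convex hull of the size-versus-time points), not anything flac itself does.

```python
# (total size in bytes, total seconds, setting), ordered by decreasing size.
rows = [
    (11969604531, 833, '-8'),
    (11968502388, 940, '-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11967556575, 1155, '-8 -A "subdivide_tukey(4)"'),
    (11966463371, 1555, '-8 -A "subdivide_tukey(5)"'),
    (11961291433, 3003, '-8p'),
    (11960179719, 3350, '-8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11959250164, 5131, '-8p -A "subdivide_tukey(4)"'),
    (11958125424, 7796, '-8p -A "subdivide_tukey(5)"'),
]


def saved_per_second(rows):
    """Bytes saved per extra second when stepping from each row to the next."""
    return [
        round((prev[0] - cur[0]) / (cur[1] - prev[1]))
        for prev, cur in zip(rows, rows[1:])
    ]


def prune(rows):
    """Repeatedly drop a setting whose step pays less than the following step."""
    rows = list(rows)
    changed = True
    while changed:
        changed = False
        rates = saved_per_second(rows)
        for i in range(len(rates) - 1):
            if rates[i] < rates[i + 1]:
                del rows[i + 1]  # rates[i] is the step *into* rows[i + 1]
                changed = True
                break
    return rows


print(saved_per_second(rows))
kept = prune(rows)
print([setting for _, _, setting in kept])
print(saved_per_second(kept))
```

With these numbers only -8 -A "subdivide_tukey(5)" gets dropped, and the -8p step is then recomputed as 3390 bytes per second, matching the figure above.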

For comparison, adding "-r 7" to the -8p -A "subdivide_tukey(4)" line would save 132 bytes per extra second, so you might as well go to -8p -A "subdivide_tukey(5)"; and going from -r7 to -r8 makes for only 11.


Evidence from the Ryzen then:
* If you want something heavier than -8, but are not willing to wait for -8p, try another tukey as described, or for simplicity, upping the game to -A subdivide_tukey(4).
* If you want something heavier than -8p, try the same thing
* If you are willing to wait for something that takes >10x as long as -8 (>3x as long as -8p), then I haven't checked much. Maybe at this stage you could consider -r7 - maybe -r7 is worth it "even earlier" on the Intel (see next).



Over to the Intel-equipped Dell laptop, I found something similar, except:
* timings vary too much to be reliable "at every line", but applying the rule of "when in doubt, delete one that the Ryzen results suggest I delete" I end up with something that by and large appears to give the same expression - except possibly -r7. And with the reservation for unreliable timings taken:
* -r7 doesn't seem as worthless as on the Ryzen. Actually, you can consider -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" -r7 rather than going to subdivide_tukey(4), or at least instead of (5). Or, for short, if you don't feel like risking typos: -8p -r7 might be worth it. YMMV.


Oh, and for reference, compared to WavPack's -x4 vs -x
* Going up to -8p is about like going from -hx to -hx4 with WavPack in terms of gains per second. (-hx4 is considered very slow, but the gains are higher; -8p is not that slow, but saves about proportionally less.)
* Going from -8p to -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" is about like going from -hhx to -hhx4.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 11:54:29
appears to give the same expression
"impression" ...
(https://ih1.redbubble.net/image.2299611267.5326/tb,1000x1000,small-pad,750x1000,f8f8f8.jpg)

Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 15:08:21
To whom it may concern:
On my quest to find a setting for v1.4.1 that rivals my favourite encoding setup (v1.3.3 with -7 = my sweet spot for encoding time and compressed file size), I found these settings for v1.4.1.
The goal was to achieve the same (or better) encoding speed with better compression.
Code: [Select]
Reference:
FLAC Binary: flac133_case.exe
FLAC Option: -7
 Average time  = 22.682 seconds (5 rounds), Encoding speed = 476.67x
 FLAC file size = 1.168.025.916 Bytes (= 61,241% of WAV size)
For my timings I only used Case's Haswell build, since this was the fastest on my computer.
I haven't varied the windowing functions, because frankly I have little idea what I'd be doing there...
Code: [Select]
a) FLAC Option: -l11 -b4096 -m -r6 -A subdivide_tukey(2)
 Average time  = 22.927 seconds (3 rounds), Encoding speed = 471.58x <= worse encoding speed: 477x -> 472x
 FLAC file size = 1.167.741.823 Bytes (= 61,226% of WAV size) <= better compression: 0.015 percent points

b) FLAC Option: -l11 -b4096 -m -r5 -A subdivide_tukey(2)
 Average time  = 22.134 seconds (5 rounds), Encoding speed = 488.48x <= better encoding speed: 477x -> 488x
 FLAC file size = 1.167.807.739 Bytes (= 61,229% of WAV size) <= better compression: 0.012 percent points
 
c) FLAC Option: -l11 -b3072 -m -r5 -A subdivide_tukey(2)
 Average time  = 21.051 seconds (5 rounds), Encoding speed = 513.62x <= better encoding speed: 477x -> 514x
 FLAC file size = 1.167.708.945 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

d) FLAC Option: -l11 -b3584 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.729 seconds (3 rounds), Encoding speed = 521.58x <= best encoding speed: 477x -> 522x
 FLAC file size = 1.167.755.713 Bytes (= 61,227% of WAV size) <= better compression: 0.014 percent points

e) FLAC Option: -l11 -b3328 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.866 seconds (3 rounds), Encoding speed = 518.16x <= better encoding speed: 477x -> 518x
 FLAC file size = 1.167.700.585 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points
So it's basically -l11 and -r5, with block sizes varied between 3072 and 4096 samples.
I also tested these settings with another list of WAV files (some 3 hrs of playing time, mostly rock music), and the ranking of the results was the same.
So it's going to be either d) (block size = 0x0E00) or e) (blocksize = 0x0D00) for me.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 15:41:59
The "natural" result first: As -r6 does -r5 plus another little try, a) compresses better than b). And it spends only a split second more. 

More odd is that a) outcompresses -7. a) is basically just -7 with the "12th" order parameter forced to 0. The only explanation I can come up with is that FLAC guesstimates first and calculates more exactly only once it has picked what it thinks is best, and here it seems that - contrary to the guesstimate - setting that parameter to zero actually improves compression. It has been known since back in the day that too high an -l could lead to this, but it is a bit surprising that it still kicks in between 11 and 12, if only by 0.015 percentage points.

The others are alternative block sizes. All of those are divisible by ... well, all are indeed divisible by 256, so they don't put restrictions on -r.
Have you tried -b 2048 (edit: or 2304)? Sometimes that improves things. 2048 means twice as many blocks as the default -b 4096, so twice the block overhead - but the other side of the coin is that each block has its own predictor, which means it could offer a better fit.
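Why "divisible by 256" means "no restriction on -r": splitting a block into 2^r residual partitions only works if the block size is divisible by 2^r, and Subset streams cap the partition order at 8 (the format itself allows up to 15). A minimal Python sketch under those two assumptions; it ignores the further rule that the first partition must be longer than the predictor order.

```python
# Sketch: the highest Rice partition order (-r) a given block size permits.
# Assumptions: 2**order partitions must divide the block evenly; Subset caps
# the order at 8, the format at 15. Predictor-order constraints are ignored.


def max_partition_order(blocksize: int, subset: bool = True) -> int:
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    # blocksize & -blocksize isolates the largest power of two dividing blocksize
    trailing_zeros = (blocksize & -blocksize).bit_length() - 1
    return min(trailing_zeros, 8 if subset else 15)


for bs in (4096, 3584, 3328, 3072, 3456, 2304, 1152):
    print(bs, max_partition_order(bs))
```

By this rule, sizes like 3456 (= 27 * 128) or 1152 would top out at order 7, so an -r 8 request could not be fully honoured there.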

Also you don't need to type out all this. Synonyms:
a) -7 -l 11
b) -7 -l 11 -r 5
c) -7 -l 11 -r 5 -b 3072
d) -7 -l 11 -r 5 -b 3584
e) -7 -l 11 -r 5 -b 3328

Edit: Did you try "cde with -r 6"? Expected: tiny improvement at tiny cost, just like a) over b). Any case of -r5 outcompressing -r6 would, I suppose, be "due to guesstimation error".
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 16:50:44
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.

-5
Total encoding time: 0:48.563, 499.02x realtime
2066932342

-6
Total encoding time: 0:58.828, 411.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025

-7
Total encoding time: 1:07.625, 358.36x realtime
2056437870

PS: 3456 = 1152 * 3
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 17:08:06
@Porcus:
Quote
More odd is that a) outcompresses -7
It does not. It outcompresses -7 from v1.3.3.
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -7
 Average time  = 25.416 seconds (10 rounds), Encoding speed = 425.40x <= worse speed: 477x -> 425x
 FLAC file size = 1.167.014.383 Bytes (= 61,188% of WAV size) <= better compression: 0.053 percent points
which was ruled out because it is much slower than -7 on v1.3.3

I also tried block sizes of 2048 and >3584, but the results were worse.

I'm aware of the synonyms, but I prefer to use the "full" settings in my test, just to see all the parameters and not have to remember what "-7" stands for.

Going to test cde with -r6 asap...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 17:31:06
https://datatracker.ietf.org/doc/draft-ietf-cellar-flac/
Code: [Select]
10.1.1.  Blocksize bits

   Following the frame sync code and blocksize strategy bit are 4 bits
   referred to as the blocksize bits.  Their value relates to the
   blocksize according to the following table, where v is the value of
   the 4 bits as an unsigned number.  In case the blocksize bits code
   for an uncommon blocksize, this is stored after the coded number, see
   section uncommon blocksize (#uncommon-blocksize).

van Beurden & Weaver      Expires 31 March 2023                [Page 29]
Internet-Draft                    FLAC                    September 2022

      +=================+===========================================+
      | Value           | Blocksize                                 |
      +=================+===========================================+
      | 0b0000          | reserved                                  |
      +-----------------+-------------------------------------------+
      | 0b0001          | 192                                       |
      +-----------------+-------------------------------------------+
      | 0b0010 - 0b0101 | 144 * (2^v), i.e. 576, 1152, 2304 or 4608 |
      +-----------------+-------------------------------------------+
      | 0b0110          | uncommon blocksize minus 1 stored as an   |
      |                 | 8-bit number                              |
      +-----------------+-------------------------------------------+
      | 0b0111          | uncommon blocksize minus 1 stored as a    |
      |                 | 16-bit number                             |
      +-----------------+-------------------------------------------+
      | 0b1000 - 0b1111 | 2^v, i.e. 256, 512, 1024, 2048, 4096,     |
      |                 | 8192, 16384 or 32768                      |
      +-----------------+-------------------------------------------+

                                  Table 13
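Table 13 can be turned into a small encoder for the 4-bit field. This is a Python sketch of the table as quoted, not code from any FLAC implementation; it returns the 4-bit value plus the extra big-endian bytes that, per the draft, follow the coded frame/sample number for uncommon sizes. With it, -b 3456 costs two extra header bytes per frame, while 2304 and 4096 need none.

```python
# Sketch of Table 13: map a block size to its 4-bit frame-header code and the
# extra "uncommon blocksize minus 1" bytes (big-endian) that follow the coded number.


def blocksize_bits(blocksize: int):
    if blocksize == 192:
        return 0b0001, b""
    for v in range(0b0010, 0b0101 + 1):      # 144 * 2**v: 576, 1152, 2304, 4608
        if blocksize == 144 * 2 ** v:
            return v, b""
    for v in range(0b1000, 0b1111 + 1):      # 2**v: 256 ... 32768
        if blocksize == 2 ** v:
            return v, b""
    if 1 <= blocksize <= 256:                 # 8-bit uncommon blocksize
        return 0b0110, (blocksize - 1).to_bytes(1, "big")
    if 1 <= blocksize <= 65536:               # 16-bit uncommon blocksize
        return 0b0111, (blocksize - 1).to_bytes(2, "big")
    raise ValueError("blocksize out of range")


for bs in (192, 1152, 2304, 3456, 4096):
    code, extra = blocksize_bits(bs)
    print(f"{bs:5d}: code {code:04b}, {len(extra)} extra byte(s)")
```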
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-29 17:48:31
It does not. It outcompresses -7 from v1.3.3.
Ah, I cannot read.

But try 1.3.3 at -5. Reason: you say -7 was your sweet spot, but then the relevant question is: how much did you actually pay (in seconds and milliseconds) for the -7 compression improvement over -5?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 19:19:13
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.

-5
Total encoding time: 0:48.563, 499.02x realtime
2066932342

-6
Total encoding time: 0:58.828, 411.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025

-7
Total encoding time: 1:07.625, 358.36x realtime
2056437870

PS: 3456 = 1152 * 3
The tests above were done using Case's GCC 12.2.0 build (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265). I tried to disable AVX in BIOS to compare the differences but flac.exe simply crashed.

The tests below used the Xiph build, which does not crash; each pair of results shows the difference between using AVX and not using it.

5
Total encoding time: 0:54.640, 443.52x realtime
2066932338
Total encoding time: 0:50.906, 476.06x realtime
2066932342

6
Total encoding time: 1:09.578, 348.30x realtime
2062800850
Total encoding time: 1:01.516, 393.95x realtime
2062800852

-l 12 -b 3456
Total encoding time: 0:52.922, 457.92x realtime
2060849025
Total encoding time: 0:50.953, 475.62x realtime
2060849021

-7
Total encoding time: 1:16.656, 316.14x realtime
2056437872
Total encoding time: 1:10.250, 344.97x realtime
2056437869
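The "x realtime" figures are the audio duration divided by the wall-clock encoding time. A minimal Python sketch (the helper names are made up for illustration, and the 6h43m54s duration is assumed exact to the second) reproduces the first set of results:

```python
def seconds_from_hms(text: str) -> int:
    """Parse a duration such as '6h43m54s' into whole seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total, number = 0, ""
    for ch in text:
        if ch.isdigit():
            number += ch
        else:
            total += int(number) * units[ch]
            number = ""
    return total


def seconds_from_clock(text: str) -> float:
    """Parse an encoder time such as '1:07.625' (minutes:seconds) into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def realtime_factor(audio: str, encode: str) -> float:
    """Audio duration divided by encoding time."""
    return seconds_from_hms(audio) / seconds_from_clock(encode)


# Encoding times from the Case GCC 12.2.0 runs above
runs = {"-5": "0:48.563", "-6": "0:58.828", "-l 12 -b 3456": "0:47.531", "-7": "1:07.625"}
for setting, t in runs.items():
    print(f"{setting:>14}: {realtime_factor('6h43m54s', t):.2f}x")
```

It prints 499.02x, 411.95x, 509.86x and 358.36x, matching the reported numbers.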
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 21:31:42
@Porcus:
Here are the -r6 results, side-by-side with the previously posted -r5:
(better/worse and faster/slower always compared to the "reference" v1.3.3 -7)
Code: [Select]
c5) FLAC Option: -l11 -b3072 -m -r5 -A subdivide_tukey(2)
 Average time  = 21.051 seconds (5 rounds), Encoding speed = 513.62x <= better encoding speed: 477x -> 514x
 FLAC file size = 1.167.708.945 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

c6) FLAC Option: -l11 -b3072 -m -r6 -A subdivide_tukey(2)
 Average time  = 22.011 seconds (3 rounds), Encoding speed = 491.21x <= faster encoding (477x -> 491x)
 FLAC file size = 1.167.688.587 Bytes (= 61,223% of WAV size) <= better compression: 0.018 percent points


d5) FLAC Option: -l11 -b3584 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.729 seconds (3 rounds), Encoding speed = 521.58x <= best encoding speed: 477x -> 522x
 FLAC file size = 1.167.755.713 Bytes (= 61,227% of WAV size) <= better compression: 0.014 percent points

d6) FLAC Option: -l11 -b3584 -m -r6 -A subdivide_tukey(2)
 Average time  = 21.606 seconds (3 rounds), Encoding speed = 500.41x <= faster encoding (477x -> 500x)
 FLAC file size = 1.167.713.160 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points


e5) FLAC Option: -l11 -b3328 -m -r5 -A subdivide_tukey(2)
 Average time  = 20.866 seconds (3 rounds), Encoding speed = 518.16x <= better encoding speed: 477x -> 518x
 FLAC file size = 1.167.700.585 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points

e6) FLAC Option: -l11 -b3328 -m -r6 -A subdivide_tukey(2)
 Average time  = 21.926 seconds (3 rounds), Encoding speed = 493.11x <= faster encoding (477x -> 493x)
 FLAC file size = 1.167.669.980 Bytes (= 61,222% of WAV size) <= better compression: 0.019 percent points

tl;dr: encoding is clearly slower with -r6, while the compression gains are "nothing to write home about" ;)

Regarding -5: I settled on -7 as my sweet spot because I had been using -8 ever since I went with FLAC, and here the gain in speed was remarkable while I didn't care about the compression loss of 0.039 percentage points. Using -5 would boost my encoding speed by some 50% while losing 0.43% of disk space. Really worth considering if you're planning to re-encode your whole collection.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 22:07:43
Here's what I got with a concatenated 6h43m54s .wav file over 7 CDs in different genres.
-l 12 -b 3456
Total encoding time: 0:47.531, 509.86x realtime
2060849025
Just tried sundance's fastest setting on my data using the same Case GCC 12.2.0 build:

-l11 -b3584 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:54.484, 444.79x realtime
2058772911
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 22:33:40
That's faster here too, but it loses compression, so it doesn't meet my goal (the same or better encoding speed with better compression than -7 on v1.3.3)...
In the end, you'll have to make up your mind what you're after...  ;)
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l12 -b3456 <= bennetng setting
 Average time  = 16.319 seconds (3 rounds), Encoding speed = 662.55x <= way faster (477x -> 662x)
 FLAC file size = 1.168.849.826 Bytes (= 61,284% of WAV size) <= worse compression: -0.043 percent points
But your block size, used with my fastest settings, comes very close in speed and gives better compression:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l11 -b3456 -m -r5 -A subdivide_tukey(2) <= bennetng block size
 Average time  = 20.825 seconds (3 rounds), Encoding speed = 519.18x <= faster encoding (477x -> 519x)
 FLAC file size = 1.167.698.586 Bytes (= 61,224% of WAV size) <= better compression: 0.017 percent points
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-29 23:00:22
My data with -l11 -b3456 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:55.047, 440.24x realtime
2058918957

Also worth noting: my data set's uncompressed size is 4274945852 bytes, so it is quite different from yours.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-29 23:27:47
@bennetng:
Why do you think a larger data set makes a remarkable difference? Because it covers a greater variety of music/genre/styles or something I didn't think of yet? Btw. my encoding times are one-file-at-a-time, single core, running the timer64'd flac binary in a console window.
Just out of curiosity: how did you find your "magic" block size of 3456? Did you try all block sizes in steps of 128 samples, or was this a "lucky punch"?  ;)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 06:26:51
What I meant was that your data set generally compresses to around 61.2% of PCM, while mine compresses to around 48.2%; even with -5 it is still around 48.35% (see Reply #52). So the baseline compression ratios are quite different, and mixing both data sets would yield something like 54.7%, a better corpus average. For example, one of ktf's plots also showed a similar ratio:
http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf
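To make the comparison concrete, here is a small Python sketch (the helper name is illustrative) that turns byte counts into a "% of PCM" figure and reproduces the 54.7% as the plain, unweighted average of the two corpus ratios. Pooling the raw byte counts instead would weight the larger corpus more heavily; which average is "better" is a judgment call.

```python
def ratio_percent(compressed: int, uncompressed: int) -> float:
    """Compressed size as a percentage of the PCM size."""
    return 100.0 * compressed / uncompressed


# bennetng's corpus: -l11 -b3456 -m -r5 -A subdivide_tukey(2) output vs. PCM size
mine = ratio_percent(2058918957, 4274945852)
print(f"{mine:.2f}%")  # 48.16%

# Unweighted mean of the two corpus-level ratios quoted above
mixed = (61.2 + 48.2) / 2
print(f"{mixed:.1f}%")  # 54.7%
```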

The table in Reply #54 lists the common block sizes, including 576, 1152, 2304 and 4608. 1152 is used in presets 0-2 and 4096 in presets 3-8, so 3456 (= 1152 * 3) is just a convenient number derived from the original presets.
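For reference, the frame-header block-size field of the FLAC format (Table 13 above) only has dedicated codes for 192, 576 * 2^k (576 to 4608) and powers of two from 256 to 32768; anything else, such as 3456, is stored as "blocksize minus 1" in an extra 8- or 16-bit field. The Python sketch below is my reading of that table plus the subset rule (block size at most 16384, and at most 4608 when the sample rate is 48 kHz or lower); the function names are hypothetical.

```python
def blocksize_code(blocksize: int) -> tuple[int, int]:
    """Return (4-bit frame-header code, extra header bytes) for a block size."""
    if blocksize == 192:
        return 0b0001, 0
    for v in range(2, 6):          # 0b0010..0b0101: 576, 1152, 2304, 4608
        if blocksize == 144 * 2**v:
            return v, 0
    for v in range(8, 16):         # 0b1000..0b1111: 256 .. 32768
        if blocksize == 2**v:
            return v, 0
    if 1 <= blocksize <= 256:      # 0b0110: blocksize - 1 in an extra 8-bit field
        return 0b0110, 1
    if 1 <= blocksize <= 65536:    # 0b0111: blocksize - 1 in an extra 16-bit field
        return 0b0111, 2
    raise ValueError("block size out of range")


def is_subset_blocksize(blocksize: int, sample_rate: int) -> bool:
    """Subset limit on block size, as I understand the FLAC subset rules."""
    limit = 4608 if sample_rate <= 48000 else 16384
    return blocksize <= limit


for bs in (1152, 3456, 4096, 4608):
    code, extra = blocksize_code(bs)
    print(bs, f"0b{code:04b}", f"+{extra} byte(s)", is_subset_blocksize(bs, 44100))
```

So a 3456-sample block costs two extra header bytes per frame, which should be negligible next to the compressed audio in that frame.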
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 07:21:19
The 7 CDs used:


Hdcd Sampler Volume 2
https://www.discogs.com/release/6921177-Various-Hdcd-Sampler-Volume-2


Kaitou Saint Tail Original Soundtrack (Disc 1)
https://vgmdb.net/album/61630


Ondekoza (VDR-25231)
Can't find a suitable link for this specific CD, but in general Japanese arrangements featuring Taiko and Shamisen.
https://en.wikipedia.org/wiki/Ondekoza


Persona 2: Innocent Sin ~ The Errors of Their Youth
https://vgmdb.net/album/4383


Picture Of Primitive Hunting (Chinese Ancient Music)
https://www.discogs.com/release/16142867-Various-%E5%8E%9F%E5%A7%8B%E7%8B%A9%E7%8C%8E%E5%9B%BE-%E4%B8%AD%E5%9B%BD%E5%8F%A4%E4%B9%90-Picture-Of-Primitive-Hunting-Chinese-Ancient-Music


Tchaikovsky : 1812, Marche slave
https://www.amazon.com/Tchaikovsky-Marche-slave-Peter-Ilyich/dp/B000001GDT


何婉盈 / Elaine
https://youtu.be/kB7vQQ7hmcg
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-30 08:18:57
Seems your test corpus is easier to encode than my stuff (80% Classic Rock, some 10% Blues, no Classical tracks, no speech). Sadly, lots of the music from the 90s and later suffers from heavy dynamic range compression (DR < 6), which makes it a challenge for lossless audio compression.
Just finished some tests with a random selection of 160 audio files (all CDDA) from my collection (WAV file size = 6.111.491.436 bytes) and the results don't differ much from my regular test set:
FLAC file size = 3.736.404.536 Bytes (= 61,137% of WAV size, avg. bitrate = 863 kbps)
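The "avg. bitrate" figures follow directly from the size ratio: CDDA PCM runs at 44100 Hz * 16 bit * 2 channels = 1411.2 kbps, so a FLAC file at r% of the WAV size averages about r% of that. A quick Python check, which ignores the few dozen bytes of WAV header (an approximation):

```python
CDDA_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbps for 16-bit stereo at 44.1 kHz


def flac_kbps(percent_of_wav: float) -> float:
    """Approximate average FLAC bitrate for CDDA from the '% of WAV size' figure."""
    return CDDA_KBPS * percent_of_wav / 100


for pct in (61.137, 43.964, 44.056):
    print(f"{pct}% -> {flac_kbps(pct):.0f} kbps")
```

This gives 863, 620 and 622 kbps, the same values the test logs report.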
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 08:51:46
A lot of metal here; CDDA averages 918 kbps or something with 1.3.x at -8.
The classical music section encodes to the low 600s even though there are (literal!) tons of loud organ pipes.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-09-30 09:12:52
My neighbour was nice enough to give me a Classical CD (Mozart) for this test:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 Bytes)
FLAC Option: -l11 -b3456 -m -r5 -A subdivide_tukey(2)
 Average time  = 5.128 seconds (3 rounds), Encoding speed = 553.66x
 FLAC file size = 220.191.908 Bytes (= 43,964% of WAV size, avg. bitrate = 620 kbps)

FLAC Binary: flac133_case.exe (718848 Bytes)
FLAC Option: -7
 Average time  = 5.712 seconds (3 rounds), Encoding speed = 496.99x
 FLAC file size = 220.654.634 Bytes (= 44,056% of WAV size, avg. bitrate = 622 kbps)
So compression of this kind of music is in bennetng's ballpark, and it still outperforms v1.3.3 -7  :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 13:11:59
Some free or pay-what-you-can CD resolution audio if people want to test other genres (and broaden their taste).

https://akelei.bandcamp.com/album/de-zwaarte-van-het-doorstane
https://soundcloud.com/blackmooncircle/sets/andromeda (hit the arrow and the download button, you will get .wav I think)
https://blacktapeforabluegirl.bandcamp.com/album/highlights-name-your-price
original master: https://bongripper.bandcamp.com/album/satan-worshipping-doom
same, a remaster: https://bongripper.bandcamp.com/album/satan-worshipping-doom-2020-remaster
https://kavakon.bandcamp.com/album/virgin-lava
https://nadja.bandcamp.com/album/autopergamene
https://thereisnoreasonforanyofthistohavehappened.bandcamp.com/album/til-eru-hr-sem-hafa-aldrei-veri-menn-og-munu-aldrei-ver-a-au-lifi-enn
https://udom.bandcamp.com/album/and-be-no-more
https://zeitgeber-aus.bandcamp.com/album/heteronomy

and Bach's organ works, I used one of them in my test corpus: http://www.blockmrecords.org/bach/download.htm
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-09-30 13:31:22
Here is a very biased corpus, EDM music only, on which -b2880 comes out best. I would pick something divisible by 512 or 576 rather than subdividing further, as the differences are too small.

EINHÄNDER ORIGINAL SOUNDTRACK
https://vgmdb.net/album/14

Dariusburst Original Soundtrack
https://vgmdb.net/album/16136

carpe diem "SENKO no RONDE" ORIGINAL SOUND TRACKS Volume 2
https://vgmdb.net/album/4419

BORDER DOWN -Sound Tracks-
https://vgmdb.net/album/311

PCM
2822769308

-6
Total encoding time: 0:40.313, 396.94x realtime
1937591945
68.6415%

-l11 -b3456 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.360, 428.32x realtime
1933346426
68.4911%

-l11 -b2560 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.734, 424.07x realtime
1933285028
68.4889%

-l11 -b3072 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.500, 426.72x realtime
1933147042
68.4841%

-l11 -b2880 -m -r5 -A subdivide_tukey(2)
Total encoding time: 0:37.938, 421.79x realtime
1933105922
68.4826%

-7
Total encoding time: 0:45.453, 352.05x realtime
1932486517
68.4607%
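Recomputing the percentages from the byte counts above is a quick sanity check. The Python sketch below sorts the runs by output size and factors each tested block size into the 512 or 576 family mentioned above; the variable names are illustrative.

```python
PCM_BYTES = 2822769308

# (setting, encoding time in s, output bytes), copied from the runs above
runs = [
    ("-6", 40.313, 1937591945),
    ("-l11 -b3456 -m -r5 -A subdivide_tukey(2)", 37.360, 1933346426),
    ("-l11 -b2560 -m -r5 -A subdivide_tukey(2)", 37.734, 1933285028),
    ("-l11 -b3072 -m -r5 -A subdivide_tukey(2)", 37.500, 1933147042),
    ("-l11 -b2880 -m -r5 -A subdivide_tukey(2)", 37.938, 1933105922),
    ("-7", 45.453, 1932486517),
]

for setting, seconds, size in sorted(runs, key=lambda r: r[2]):
    print(f"{100 * size / PCM_BYTES:8.4f}%  {seconds:7.3f} s  {setting}")

# Factor each tested block size into the 576 or 512 family
for bs in (2560, 2880, 3072, 3456):
    base = 576 if bs % 576 == 0 else 512
    print(bs, "=", base, "*", bs // base)
```

-7 stays the smallest overall, and among the -l11 runs -b2880 (= 576 * 5) is smallest, ahead of -b3072 by only about 41 KB.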
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-09-30 14:20:41
-b 3456 makes for larger files in my corpus (in the first few tests): 0.05 to 0.06 percent, so not much, but never the other way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-01 13:33:12
What FLAC settings should you avoid?
(... if you have my CDDA test corpus and no special considerations.)


Idea: How many bytes do you get for spending an extra second encoding? 
You would pick the low-hanging fruit first. That means, settings that pick "expensive" improvements should be avoided until you are squeezing the last drops out. For example, in my tests, -e is not worth it because you can get the same improvement cheaper elsewhere. (At least, up to settings "nobody" will want to use.) It surely is material dependent; for example, @Gravity Stupor has posted two examples here (https://hydrogenaud.io/index.php/topic,122949.msg1015422.html#msg1015422) and here (https://hydrogenaud.io/index.php/topic,122949.msg1016038.html#msg1016038) about -e actually not being completely dead.


Some initial tl;dr's:
* Avoid -e.
* Avoid -6.
* -p: only for "-8p", as -7p is not worth it. If you want something in between -8 and -8p, then -8 -A subdivide_tukey(4) is not a bad choice - and maybe -8 -A subdivide_tukey(5) also makes sense, but somewhere around there you would rather jump to -8p.
* -r8 is not worth it. -r7 ... that depends. I tried an Intel-equipped Dell business laptop and a Ryzen-equipped Acer consumer laptop, and I wouldn't use -r7 on the latter; it took too much time. The Intel Dell shows quite a bit of timing variability, but it seems to do the -r part a bit faster for some strange reason. Anyway, I guess -r7 is only worth considering for those who are already at -8p, even if I could find a "non-p" setting where it wasn't hopeless?

Note that -8 is "-7 but changing the subdivide_tukey from 2 to 3" [there are some fine detail about that, but forget those], then the natural continuation would be to increase that number - and if you for simplicity apply the rule of thumb that above -5, there is -7, -8, -8p and higher subdivide_tukey(N), you won't do that much wrong.
That said, -7 -l 11 can serve the purpose @sundance tested it to obtain. And myself I found a higher-than-8 customized setting that worked fairly well - the -A "tukey(666e-3);subdivide_tukey(3/333e-3)" which takes the -8 windowing functions, the -5 tapering function, makes them more different from each other and combines them. But that might be a spurious result. But these are fine-tunings, and don't change the impression that the "natural choices" are pretty good with the exceptions in the above bullet items.


Assumptions made:
* FLAC subset or bust - and you will anyway choose -4 or higher
* If you think saving B bytes for an extra second of encoding time is an acceptable improvement, you will also accept waiting another second for another B bytes of savings. Only when the savings per second fall have you had enough.
The dubious part of this assumption is that it requires you to behave as if you knew the outcome in advance.
* FLAC decodes fast enough for you: you don't care about decoding CPU footprint.  See https://hydrogenaud.io/index.php/topic,123025.msg1016398.html#msg1016398 and the bottom chart at http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf .
Nothing decodes as fast as FLAC, but if you want something for mass-decoding from SSD or the like, you might want to use -6 or oddballs like -6p or -8p -l 8.
* My hardware and test corpus (and official 1.4 build) are sane enough for testing :-)

One implication is that curves like the 3rd diagram at https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227 "should be convex" (i.e.: stretch a ribbon around it, and you won't select a point that lies above the ribbon). You see that -6 violates that: it lies above the straight line from the -5 point to the -7 point. So you should not choose -6: if you are willing to wait for the improvement from -5 to -6, you would also be willing to wait for -7.
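The convexity test can be made mechanical: sort settings by encoding time, drop any setting that is slower but not smaller than an earlier one, then keep only the lower convex hull in the (time, size) plane. A Python sketch under those assumptions; as sample input it borrows bennetng's single-file -5/-6/-7 results from earlier in the thread (Case GCC 12.2.0 build), and the function names are made up for illustration.

```python
from typing import List, Tuple

Run = Tuple[str, float, int]  # (setting, encoding seconds, output bytes)


def cross(o: Run, a: Run, b: Run) -> float:
    """z-component of (a - o) x (b - o) in the (time, size) plane."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def efficient_frontier(runs: List[Run]) -> List[Run]:
    # 1) Drop dominated settings: slower (or equally fast) but not smaller.
    candidates, best = [], float("inf")
    for run in sorted(runs, key=lambda r: (r[1], r[2])):
        if run[2] < best:
            candidates.append(run)
            best = run[2]
    # 2) Lower convex hull: drop a point that lies on or above the chord between its neighbours.
    hull: List[Run] = []
    for run in candidates:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], run) <= 0:
            hull.pop()
        hull.append(run)
    return hull


def bytes_per_second(a: Run, b: Run) -> float:
    """Bytes saved per extra second of encoding when moving from a to b."""
    return (a[2] - b[2]) / (b[1] - a[1])


runs = [("-5", 48.563, 2066932342), ("-6", 58.828, 2062800852), ("-7", 67.625, 2056437870)]
print([r[0] for r in efficient_frontier(runs)])     # ['-5', '-7']
print(round(bytes_per_second(runs[0], runs[1])))    # -5 -> -6: about 402k bytes/s
print(round(bytes_per_second(runs[1], runs[2])))    # -6 -> -7: about 723k bytes/s
```

The second step is cheaper per byte than the first, which is exactly why -6 falls off the hull. Adding bennetng's -l 12 -b 3456 run (47.531 s, 2060849025 bytes) knocks -5 off as well, since that setting is both faster and smaller.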

I am in the land of imprecise measurements here, since this involves dividing by the difference between times that may vary between runs. So I did three measurements and picked the median time for each "genre section" (see signature link), on both an Intel-equipped Dell business laptop and a much cheaper Ryzen-equipped Acer Aspire. Of the two, the latter still gives the more reliable timings.


Settings tested:
-4 (well even more) to -8 with or without -e or -p. 
-8 with higher subdivide_tukey(N); that is the same as -7 with higher subdivide_tukey(N): -7 implies N=2, upping it to N=3 yields standard -8, so N=4 to N=8 are a natural continuation. (Edit: a natural continuation to test. I'm not saying the borderline between natural to use and not lies precisely between N=8 and N=9.)
-8 with an additional function; partial_tukey(2) was tested for having a different taper parameter than subdivide_tukey(3), but "more promising" was  -A "tukey(666e-3);subdivide_tukey(3/333e-3)" that adds "only one" tukey function, but changes the tapering "in opposite directions" to make them more different.
-8ep
Also various -r settings, but not on everything.
Finally, a few results with the lighter-than-7 -7 -l 11 because @sundance started on it for time saving purposes. -7 -l 11 isn't a bad thing! (Could also have tested -l 10 then, but I didn't.)
With reference to @sundance's testing: alternative -b values were of no use here.


To the results:
What eliminates -e is that you can get better compression cheaper. Smaller and faster than -8e are -8 -A subdivide_tukey(5) to (7). Smaller and faster than -8ep are -8p -A subdivide_tukey(5) to (7).
What eliminates -6, is the "convexity" argument: if you are willing to pay for the improvement from -5 to -6, then the improvement from -6 to -7 is so much cheaper: the next byte saved costs less time than the previous.
What eliminates -r 8 is the same as -6. Although, I have not checked whether -8p -A subdivide_tukey(big number) -r 8 can be improved upon by -8p -A subdivide_tukey(bigger number) when "big" is too high.
And I don't think -r7 is useful until you are already at least at -8p.
Since -r7 is "questionable", one may wonder whether -r5 saves a lot of time with insignificant size difference? Turns out that it doesn't save much time. If I allow myself to deviate a little bit from the assumptions and say "at best you won't bother"

When to use -p? As said above, somewhere around -8 -A subdivide_tukey(4) to (5) you will rather take the (quite big!) time cost to get the benefit, as increasing the subdivide_tukey parameter will increase time at small benefit.
At -8p you are at about the same s(h)avings per second as going from WavPack -hx to -hx4.

Then the convexity argument could even rule out -5 in favour of "either -4 or -7" - even more so if one finds a good "lighter -7 alternative". Of course -5 will be used quite a lot out of being the default, but the argument for -4 would be as follows: If you think -5 is better than -7, as the time saved makes up for the size, you get about as good a time saving per megabyte to go to -4. (Not to -3, reveals a brief check.)

The above considerations don't appear to depend heavily on what "genre section" of my corpus I used. Sure some features are more pronounced on this or that, and there might possibly be an odd "error" in the sense that if I had deleted myself down to only one, I would have eliminated otherwise - but the big picture remains.

Lighter than -7, you said? Actually, -7 -l 11 might be considered. It saves some ten percent of the total time; more significantly, the extra time from -5 to -7 is cut by a third. But what about the size gain from -5 to -7? You only forgo ten percent of it.

And finally, how do all these considerations compare to FLAC 1.3? Not tested much, but as you can expect: the double precision improvement picks some low-hanging fruit, so going from -5 to -7 is not as lucrative as it was. On the Ryzen: 1.3.x -5 to -7 would save you 143 kilobytes per extra second taken, and this number is now reduced to 105. For -7 to -8, you would save 21 kilobytes per extra second of encoding, now down to 8.


Some numerical examples: on the Acer/Ryzen, not the most expensive CPU but I got more consistent timings.
-5 takes 419 seconds for 12034842978 bytes, -7 takes 603 seconds for 11976656017 bytes. Size difference divided by time difference becomes 316 kilobytes.
But if we instead went only to -7 -l 11, that would be 535 seconds for 11982589839 bytes. Now we can calculate two sizediff/timediff ratios: -5 to -7 -l 11 becomes 449, while -7 -l 11 to -7 becomes 87.
Those are as they should be, 449 is > 87; had the order been the other way around, -7 -l 11 would have been outright bad.
-7 to -8: saves about 30k per second. Again, makes sense that this is < 87, that means we pick those two fruits in the correct order.
-8 to -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"  (mentioned in a post above): saves about 10k per second
-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"  to -8 -A subdivide_tukey(4): saves about 4k.
Going forth to (5) and then from there to -8p and then from -8p to -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)": around 3k at each step.
-8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)" to -8p -A subdivide_tukey(4): down to half a k. Only slightly less from (4) to (5)

What about -r7 being bad (on the Ryzen)? From -8p to -8p -r7 you only save like 0.36 kilobytes per extra second. And from -8p -r7 to -8p -r8: only 22 bytes. But it looks like the Intel i7 does -r7 faster than the Ryzen does and can be worth it somewhere.


Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-01 23:16:17
Oh, and:
To speed up -7, one could of course consider dropping the subdivide_tukey(2) (that would then default to subdivide_tukey(1)=tukey(5e-1)). But that gives worse results than going down to -7 -l 11.

Then a stupid error, essentially a common factor of three due to three runs:
And finally, how do all these considerations compare to FLAC 1.3? Not tested so much, but as you can expect: the double precision improvement picks some low-hanging fruit, so going -5 to -7 is not as lucrative as it was. On the Ryzen: 1.3.x -5 to -7 would save you 143 kilobytes per extra second taken, and this number is now reduced to 105. For -7 to -8, you would save 21 kilobytes per extra second of encoding, now down to 8.
Wrong, that was the sum over three runs. Luckily, the relationships between them all hold: you can multiply them all by three (well, that gets you the mean rather than the median I have used elsewhere, but no big deal).
-5 takes 419 seconds for 12034842978 bytes, -7 takes 603 seconds for 11976656017 bytes. Size difference divided by time difference becomes 316 kilobytes.
This is correct with 1.4.1: Divide 316 by three and you get 105-ish. (Three quite even runs and roundoffs ...)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 08:55:39
@john33
I tried some files in the 2L websites.
http://www.2l.no/hires/
Code: [Select]
filename                              MD5 sum
-------------------------------------------------------------------------------
2L-038_01_stereo_FLAC_44k_16b.flac             80b5c0c20168c21073c95699a0bbf992
2L-064_stereo192kHz_01_08.flac                 576ed036eb1ffe78bdb40a131e6dd23f
2L-120_01_stereo.mqacd.mqa.flac                622afbe406784c4cd54226350b2289c9
2L-125_stereo-352k-24b_04.flac                 3f3ffbdb84654e7fb22767fedcfaa30e
2L-139_01_stereo.mqa.flac                      78f2e3afc697cbc4d3ff43567e968187
2L48SACD_14_stereo_96k.flac                    1fb4209b9db97a0089baf37ca5846214
The files were then decoded to WAV on a RAM drive, keeping the original bit depth and sample rate, and encoded to FLAC again with these settings.

Then I disabled and enabled AVX support in the BIOS, and CPU-Z reported that AVX, AVX2 and FMA3 were affected. Some flac builds crashed with AVX disabled, so here are the tests on some non-crashing builds:

Xiph
Total encoding time: 2:32.250, 14.12x realtime
425513526 bytes
Total encoding time: 1:12.562, 29.63x realtime
425513433 bytes

Free encoder pack
Total encoding time: 2:17.781, 15.60x realtime
425513499 bytes
Total encoding time: 1:36.328, 22.32x realtime
425513499 bytes

https://hydrogenaud.io/index.php/topic,123014.msg1016215.html#msg1016215
Total encoding time: 2:26.344, 14.69x realtime
425513526 bytes
Total encoding time: 1:13.328, 29.32x realtime
425513471 bytes

https://www.rarewares.org/files/lossless/flac-1.4.1-x64.zip
Total encoding time: 2:35.859, 13.79x realtime
425513666 bytes
Total encoding time: 2:36.781, 13.71x realtime
425513632 bytes

The rarewares build is the only one that showed almost no speed difference; is this expected?
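The per-build effect is easier to see as a ratio of the two encoding times. A small Python sketch, assuming the first (slower) time of each pair is the AVX-disabled run; "msg1016215 build" is just a label for the build linked above.

```python
def clock_seconds(text: str) -> float:
    """Convert an 'm:ss.sss' encoding time into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


# (first run, second run) total encoding times per build, copied from the post
builds = {
    "Xiph": ("2:32.250", "1:12.562"),
    "Free encoder pack": ("2:17.781", "1:36.328"),
    "msg1016215 build": ("2:26.344", "1:13.328"),
    "rarewares generic": ("2:35.859", "2:36.781"),
}

speedups = {
    name: clock_seconds(first) / clock_seconds(second)
    for name, (first, second) in builds.items()
}
for name, ratio in speedups.items():
    print(f"{name:>18}: {ratio:.2f}x")
```

This gives roughly 2.10x, 1.43x, 2.00x and 0.99x, so the generic build is the only one whose time is unaffected.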
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-10-02 09:02:56
The rarewares build is generic with no cpu optimisations so I'm not really surprised.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 10:18:19
Thanks. Here are results with some AVX-only builds.

Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-znver2-GCC1220.zip
Total encoding time: 1:13.328, 29.32x realtime
425513429 bytes

znver3 (https://hydrogenaud.io/index.php/topic,123014.msg1016407.html#msg1016407)
Total encoding time: 1:11.891, 29.91x realtime
425513429 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-AVX2%20-GCC1220.zip
Total encoding time: 1:12.250, 29.76x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

It seems that the Ryzen builds have no compatibility issue with my Intel CPU.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-02 13:07:22
Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

As far as I can tell, the only difference between these two builds is GCC version (12.2.0 for the former, 7.3.0 for the latter). @Case is that correct?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-02 15:05:02
https://hydrogenaud.io/index.php/topic,123025.msg1016339.html#msg1016339
The differences are quite consistent on multi-thread tests as well.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Case on 2022-10-02 17:22:07
That is correct. The builds use identical configuration settings but different compiler version.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-02 21:45:33
Not huge, but a pretty decent improvement on GCC's part. Obviously enough to make it jump from last to 1st place in that particular list.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-03 01:36:40
One quick and dirty metaflac performance test: I applied album ReplayGain to 20 24-bit albums, 18,63 GB altogether, on my Ryzen 5900X.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here: (https://hydrogenaud.io/index.php/topic,123014.msg1016842.html#msg1016842)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 02:28:59
Finally found some time to set up GCC. I used the flags Case kindly offered but set CFLAGS to -Ofast instead of -O3.
The binary is even a bit faster here. Maybe others want to try it, because -Ofast may include optimizations that cause problems. It is flac from current git.

btw. if this is faster, it would be nice if Case could do a build of the official 1.4.1 release of flac and metaflac. I am not so sure about the preconfigured scripts I use. If you want recent git versions, Netranger is the man; he has a routine for that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 09:10:14
Yes, faster on my i3-12100 too. The first result is 4 CDs combined into a single .wav, the second result is the same 4 CDs split into 85 files. All tests used -8p.

Case GCC 12.2.0
Total encoding time: 2:23.610, 101.98x realtime
1465547067 bytes
Total encoding time: 0:33.735, 434.16x realtime
1465976688 bytes

Wombat GCC 12.2.0 Ofast
Total encoding time: 2:19.765, 104.79x realtime
1465547074 bytes
Total encoding time: 0:32.875, 445.52x realtime
1465977283 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-05 11:30:01
Code: [Select]
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz, Case's Haswell build (= GCC v7.3.0) is still the fastest, followed by Case's GCC v12.2 build.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 15:21:14
Maybe others want to try because -Ofast may include optimizations that make problems.
Googled a bit, and it seems that -Ofast is not a processor-architecture optimization; it optimizes by relaxing some floating-point rules. I would like to see comments from some developers on whether it is safe to use in the case of FLAC.
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2022-10-05 16:04:33
Actual definition is:
Quote
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 16:25:25
The created files are exactly the same here using -O3 or -Ofast.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-05 16:32:22
floating point logic. Would like to see comments from some developers about whether it is safe to use or not in the case of FLAC.

No reason it should be unsafe. You use whatever-you-like to come up with a predictor, good or bad - the difference between a "good" and a "bad" one being the size of the residual. At the end of this post, (https://hydrogenaud.io/index.php/topic,120158.msg1014227.html#msg1014227) ktf refers to this Stackexchange question (https://math.stackexchange.com/questions/4488974/efficient-way-of-solving-a-matrix-equation-with-integer-solution) where the point is described: it rounds off, and maybe that gives a slightly suboptimal predictor, which may be a reason why -p makes for a better performance "than it should".

Different compiles leading to slightly different .flac files has been a FAQ item for ages, and lo and behold they still do even if Wombat did not experience such differences right here.  https://hydrogenaud.io/index.php/topic,122949.msg1015699.html#msg1015699 . Unlike Monkey's - where signal and mode (Normal, High etc) give a unique encoded file except header/footer (tags) stuff and thus Monkey's chooses an MD5 on the encode and not the PCM - there are literally millions of potential FLAC files that represent the same audio.
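The point that any predictor gives a lossless file, with only the residual size changing, can be shown with a toy integer LPC in Python. This is a simplified sketch, not libFLAC's code: it keeps the warm-up samples verbatim, predicts with quantized integer coefficients followed by an arithmetic right shift (in the spirit of FLAC's LPC subframes), and skips Rice coding entirely. The test tone and both coefficient sets are hand-picked for illustration.

```python
import math


def residual(samples, coefs, shift):
    """Encoder side: integer LPC residual; the first len(coefs) samples are warm-up."""
    order = len(coefs)
    out = list(samples[:order])
    for n in range(order, len(samples)):
        pred = sum(c * samples[n - 1 - j] for j, c in enumerate(coefs)) >> shift
        out.append(samples[n] - pred)
    return out


def restore(res, coefs, shift):
    """Decoder side: rebuild the samples with exactly the same integer arithmetic."""
    order = len(coefs)
    out = list(res[:order])
    for n in range(order, len(res)):
        pred = sum(c * out[n - 1 - j] for j, c in enumerate(coefs)) >> shift
        out.append(res[n] + pred)
    return out


# Test tone: amplitude 1000, period 50 samples
signal = [round(1000 * math.sin(2 * math.pi * n / 50)) for n in range(200)]

shift = 12
good = [8127, -4096]  # about [2*cos(2*pi/50), -1] * 2**12: a near-ideal 2nd-order predictor
poor = [8192, -4096]  # [2.0, -1.0] * 2**12: a cruder predictor

for name, coefs in (("good", good), ("poor", poor)):
    res = residual(signal, coefs, shift)
    assert restore(res, coefs, shift) == signal  # lossless either way
    print(name, "max |residual| =", max(abs(r) for r in res[len(coefs):]))
```

Both coefficient sets decode to exactly the same samples; the cruder one just leaves larger residuals, which would cost more bits once entropy-coded. That is why builds whose floating-point math lands on slightly different coefficients can write different but equally valid .flac files.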
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-05 16:49:10
Other tests have already shown that different compiles create different files which decode to the same PCM output, with no MD5 errors and such, like this:
[edit: corrected wrong link]
https://hydrogenaud.io/index.php/topic,123025.msg1016817.html#msg1016817

But this article indicates that, for example, infinities and NaNs can be treated differently:
https://simonbyrne.github.io/notes/fastmath/

So it is not only a math precision issue; it depends on what the higher-level code expects the program to do. I am not talking about the risk of producing non-bit-perfect FLAC files, but about the risk of significant slowdowns or crashes when encoding some inputs, especially crafted ones.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Vladeimir on 2022-10-05 17:44:10
The FMA intrinsics are compiled with "-ffast-math".

https://github.com/xiph/flac/blob/master/src/libFLAC/include/private/cpu.h#L110
https://github.com/xiph/flac/blob/master/src/libFLAC/lpc_intrin_fma.c#L46

I am not sure why the SSE and AVX ones are not.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-05 18:03:56
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
That's why a -Ofast compile from Case's environment could be interesting.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-05 18:05:16
The FMA intrinsics are compiled with "-ffast-math".
[...]
I am not sure why the SSE and AVX ones are not.
Because the SSE and AVX code is written with intrinsics, whereas the FMA code is plain C targeted at FMA. For SSE and AVX the instructions need not be reordered, but for FMA they do, so -fassociative-math is needed, which is part of -ffast-math.
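Why reordering matters, as a minimal Python illustration (Python floats are IEEE-754 doubles; the compiler flags themselves cannot be shown here): floating-point addition is not associative, so an optimizer that may regroup sums, as -fassociative-math allows, can change the last bits of a result. In an encoder that could plausibly nudge a predictor coefficient across a rounding boundary, one possible reason why different builds write different but equally valid files.

```python
# IEEE-754 addition is not associative: regrouping changes the result.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)   # 0.6000000000000001
print(a + (b + c))   # 0.6

def ordered_sum(values):
    """Plain left-to-right accumulation, like a simple C loop.

    (Python 3.12+ built-in sum() uses compensated summation for floats,
    so an explicit loop is used here instead.)
    """
    total = 0.0
    for v in values:
        total += v
    return total

# Same numbers, different order: the 1.0 is lost when added to 1e16 first.
print(ordered_sum([1e16, 1.0, -1e16]))   # 0.0
print(ordered_sum([1e16, -1e16, 1.0]))   # 1.0
```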
Title: Re: FLAC v1.4.x Performance Tests
Post by: Vladeimir on 2022-10-05 18:27:44
That makes total sense, thank you.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-06 04:58:34
A gcc -Ofast compile of flac and metaflac from reference lib flac 1.4.1.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 11:22:46
subdivide_tukey(N/taper), testing the tapering parameter

Background: As explained in the docs (https://xiph.org/flac/documentation_tools_flac.html), subdivide_tukey(N) by default tapers a fraction of 0.5 divided by N. The "/" is not a division slash here, only a separator character: subdivide_tukey(N/P) means that a fraction P/N is tapered.
P defaults to 0.5 - as has been the case for the default windowing function up to level -5, and before 1.4 also up to -8.
The question is: why divide the 0.5 (or other P) by N? Presumably ktf & co. have tested it and found that it improves compression.
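For readers unfamiliar with the tapering parameter, here is a textbook Tukey (tapered cosine) window as a sketch. It follows the description above that subdivide_tukey(N/P) tapers a fraction P/N; the function names are hypothetical, and libFLAC's own window code (and how subdivide_tukey splits a block into sub-windows) may differ in detail.

```python
import math

def tukey(length, p):
    """Textbook Tukey window: p is the tapered fraction (0 = rectangle, 1 = Hann).

    Half of the taper sits at each end of the window.
    """
    p = min(max(p, 0.0), 1.0)
    if length == 1:
        return [1.0]
    w = []
    for n in range(length):
        x = n / (length - 1)  # position within the window, 0..1
        if p > 0 and x < p / 2:
            w.append(0.5 * (1 - math.cos(2 * math.pi * x / p)))
        elif p > 0 and x > 1 - p / 2:
            w.append(0.5 * (1 - math.cos(2 * math.pi * (1 - x) / p)))
        else:
            w.append(1.0)
    return w

def subdivide_taper(n, p=0.5):
    """Taper fraction for subdivide_tukey(N/P), as described in the post: P / N."""
    return p / n

print(subdivide_taper(3))        # default P = 0.5 with N = 3: one sixth
print(subdivide_taper(3, 0.24))  # the 24e-2 setting: about 0.08
w = tukey(4096, subdivide_taper(3, 0.24))
print(sum(1 for v in w if v < 1.0))  # samples actually attenuated by the taper
```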

Indeed, I found some evidence that it "is not enough": if you bother to tweak, make the tapering parameter even smaller. 
Test for yourselves, it is likely material-dependent. And use scientific notation like 8e-2 rather than the locale-dependent 0.08 or 0,08.

What I did: I ran tapering parameters 8e-2, 16e-2, upwards in steps of 8e-2, with N=3, 4, 5. So, subdivide_tukey(3/8e-2), subdivide_tukey(3/16e-2), ... and then bumping the "3" up to 4 and 5. (Not hitting -8 exactly - I did subdivide_tukey(3/48e-2) and not 3/50e-2, but I already had a standard -8 ... not that it mattered.) Why stop at 5? Because testing indicated that around there, -8p becomes more attractive.
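The sweep above can be generated rather than typed by hand. A minimal sketch with a hypothetical helper: it only builds argument lists (input files and output options would still be appended), and the upper end of 64e-2 is an assumption, since the post only says "upwards". Writing the values as e.g. 8e-2 keeps them independent of the locale's decimal separator.

```python
def sweep(ns=(3, 4, 5), step=8, max_hundredths=64):
    """Build flac argument lists for subdivide_tukey(N/P) with P = 8e-2, 16e-2, ...

    max_hundredths is an assumed upper end of the sweep.
    """
    runs = []
    for n in ns:
        for hundredths in range(step, max_hundredths + 1, step):
            window = f"subdivide_tukey({n}/{hundredths}e-2)"
            # Passed as a list (no shell), so the window spec needs no quoting.
            runs.append(["flac", "-8", "-A", window])
    return runs

for args in sweep()[:3]:
    print(" ".join(args))
```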

Results: For both N=3 (as in -8!), N=4 and N=5, the taper parameter that made for smallest files, was 24e-2 i.e. 0.24 rather than the default 0.5. Then 0.32 was marginally better than 0.16.

This points at a smaller taper parameter than the one third in the "arbitrarily tested" combination in Reply 48 (https://hydrogenaud.io/index.php/topic,123025.msg1016625.html#msg1016625).

Also checked (this is preliminary): as in Reply 48, combining with a bigger single tukey.
Hypothesis: because single tukey has always had the default parameter 0.5 - this after quite a bit of testing back in the day - there is no good reason that this small tapering should be good for a single tukey run, ==> reason why it works is for the subdivisions ==> if you want to improve, try one with a bigger taper parameter like -5 uses.
This to be tested with -p, because - I guess! - there is more of a demand for something that squeezes out more than -8p without wasting the month, than for something between -8 and -8p. (However, since -7 is so good, it might be a case for beefing up -8 further.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 15:11:38
Code: [Select]
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
Could you do the same test on the Xiph build as well? Because it is interesting that your speed ranking is opposite to mine. For me Wombat's build is the fastest while Case's GCC 7.3.0 is the slowest.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-06 15:46:11
One quick and dirty metaflac performance test: I applied album ReplayGain to 20 24-bit albums, 18.63 GB altogether, on my Ryzen 5900X.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here: (https://hydrogenaud.io/index.php/topic,123014.msg1016842.html#msg1016842)
2:12 for the -Ofast version.
All files I created on the Ryzen are identical to those from the -O3 compile. The further -Ofast optimizations work well with the flac code.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 16:34:11
@bennetng: Here you go:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
- Average time =  25.288 seconds (5 rounds), Encoding speed = 427.56x (a little faster today... ;)
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: xiph-141\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.598 seconds (5 rounds), Encoding speed = 391.77x
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
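A back-of-envelope check that the timing lines in this thread are consistent, assuming "Encoding speed = N x" means audio duration divided by wall-clock encoding time. Both builds imply the same corpus length of roughly 3 hours, and the file size then gives the quoted ~863 kbps.

```python
# Assumption: "Encoding speed = N x" means audio duration / wall-clock encode time.

def audio_seconds(avg_time_s, speed_x):
    """Duration of the test corpus implied by one timing line."""
    return avg_time_s * speed_x

def average_kbps(flac_bytes, duration_s):
    """Average FLAC bitrate in kbit/s (1 kbit = 1000 bits)."""
    return flac_bytes * 8 / duration_s / 1000

haswell = audio_seconds(25.288, 427.56)  # flac141-case-haswell.exe, -7
xiph = audio_seconds(27.598, 391.77)     # xiph-141 flac.exe, -7
print(round(haswell), round(xiph))       # both about 10812 s
print(round(average_kbps(1_167_014_383, haswell)))  # about 863
print(f"Haswell build is {27.598 / 25.288 - 1:.1%} faster on this corpus")
```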
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 17:03:54
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 18:31:08
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.
Four generations, four years - but nothing revolutionary in the architecture? I agree that it is kinda unexpected.
Thinking aloud:
* Are you both running them single-threaded? The CPUs differ: 6 cores/12 threads vs 4 cores/8 threads.
* Is there any reason that e.g. RAM should matter?
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 18:46:17
My test environment is this:
CPU: Intel Core i7-8700 CPU @ 3.20GHz
RAM: 2 x 16 GB DDR4-2666 (1333 MHz) SK-Hynix
HDD: Samsung SSD 860 EVO 500GB

Both the source WAVs and the created FLACs come from/go to that SSD.
Btw, can anyone recommend a RAM disk for Windows 10 (freeware, trusted, not cluttering the system)?
And yes, I'm running the test single core, file by file, from a console window. Timing is done with Igor Pavlov's timer64.exe.
But I did not pin the process to a specific core (I think there are tools for this); I let the Windows scheduler do its job. So when you watch Task Manager, no single core stays at 100% while the test is running; the load moves between cores.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 19:50:38
In my case, single or multi-thread does not affect speed ranking. For example Case's GCC 7.3.0 Haswell compile is always the slowest in both single and multi-thread tests.

For RAM, I am using a budget motherboard which only supports DDR4, even though the CPU supports DDR5. DDR4 has been mainstream for more than 5 years. I am using 2x8GB DDR4 3200.

As for AVX, AVX2 and FMA3, the 2013 Intel Haswell (4th gen) already supports all of them, and I was using i3-4160 before February this year.

In fact, Intel 12th gen does not officially support AVX-512, but some older Core i CPUs do, even though flac 1.4.x does not seem to use AVX-512 at all.

I am using this RAM disk:
https://sourceforge.net/projects/imdisk-toolkit/
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 20:20:05
I am using this RAM disk:
https://sourceforge.net/projects/imdisk-toolkit/
Since you use (and like) it, I'm gonna give it a try.
Does this RAM Disk hold your WAVs and FLACs during your performance tests?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 20:48:46
Does this RAM Disk hold your WAVs and FLACs during your performance tests?
Yes, all files are in the RAM drive, but I don't use timer64, I use foobar's console for timing. To enforce a single encoder instance, either combine everything into a single file, or do this in foobar's converter dialog:
https://hydrogenaud.io/index.php/topic,123025.msg1016809.html#msg1016809

Also, if relevant, I always format the RAM drive as FAT32, as NTFS is a more complex file system and occupies more space when formatted. The limitation is that FAT32 only allows files up to 4 GB. With 32 GB of RAM it should be no issue to create a RAM drive of at least 24 GB, but no single file on it can exceed 4 GB.

Make sure "Create virtual disk in physical memory" is selected when creating the RAM disk.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-06 21:13:56
Tried in foobar2000 like you suggested (single thread, 40 WAVs):
-> Total encoding time: 0:39.531, 273.51x realtime (single thread)
-> Total encoding time: 0:06.688, 1616.65x realtime (allow multiple threads), around 6x faster (matches the 6 cores)
But the single thread encode is way slower compared to flac.exe started in a console window: 0:25.288

Btw. Tested the RAM disk (NTFS) and the encoding time improved by around 200 msec (0.8%) + you extend your SSD's lifetime...
Going to repeat the test with FAT32...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 21:36:37
My previous tests with CDDA including -7 and other settings:
https://hydrogenaud.io/index.php/topic,123025.msg1016652.html#msg1016652

The important thing is relative speed ranking, for example, is Case's GCC v7.3.0 Haswell compile still the fastest when the test method is changed?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-06 21:43:38
In fact, Intel 12th gen does not officially support AVX-512, but some older Core i CPUs do, even though flac 1.4.x does not seem to use AVX-512 at all.

Sundance's i7-8gen (https://ark.intel.com/content/www/us/en/ark/products/126686/intel-core-i78700-processor-12m-cache-up-to-4-60-ghz.html) doesn't either, it seems. For your i3-12gen (https://ark.intel.com/content/www/us/en/ark/products/134584/intel-core-i312100-processor-12m-cache-up-to-4-30-ghz.html), the same instruction set extensions are listed.

However the 12th generation boasts the fancy name of "Gaussian & Neural Accelerator" which, at the risk of just parroting marketing spin,  "is an ultra-low power accelerator block designed to run audio and speed-centric AI workloads. Intel® GNA is designed to run audio based neural networks at ultra-low power, while simultaneously relieving the CPU of this workload."
Not sure if anything will utilize that?!
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-06 21:49:26
I disabled GNA in BIOS in all tests.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-07 09:25:44
Seems I'm quite limited here with the HP BIOS on my EliteDesk 800 G4.
There are no settings to disable any of the extended CPU features; I can only toggle "Multithreading" and "VTx"...  :(
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 09:30:46
The FMA intrinsics are compiled with "-ffast-math".
[...]
I am not sure why the SSE and AVX ones are not.
Because the SSE and AVX code is written with intrinsics, whereas the FMA code is plain C targeted at FMA. For SSE and AVX the instructions need not be reordered, but for FMA they do, so -fassociative-math is needed, which is part of -ffast-math.
Does it mean using -Ofast globally can affect something completely unrelated, like the progress indicator and such? Are there in-source directives to exclude certain parts of the code from such global optimizations?

My experience with vectorization is limited to GPU shaders and game engines, without touching low-level stuff like intrinsics.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-07 15:14:34
Btw. Tested the RAM disk (NTFS) and the encoding time improved by around 200 msec (0.8%) + you extend your SSD's lifetime...
Going to repeat the test with FAT32...
I use the SoftPerfect RAM Disk, and exFAT is clearly the fastest with it, but it has a big overhead due to its 64k cluster size. That shouldn't matter unless you put lots of small files on it.
Until recently, Windows had an uppercase-renaming bug with exFAT; that has been fixed.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 16:10:30
Seems I'm quite limited here with the HP BIOS on my EliteDesk 800 G4.
There are no settings to disable any of the extended CPU features; I can only toggle "Multithreading" and "VTx"...  :(
Motherboards sold separately, like those from Asus, Gigabyte, MSI and such, usually offer more options.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 17:51:16
Size (bytes)  Seconds  Saved per second  Setting
11969604531     833                      -8
11968502388     940         10300        -8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11967556575    1155          4399        -8 -A "subdivide_tukey(4)"
11966463371    1555          2733        -8 -A "subdivide_tukey(5)"
11961291433    3003          3572        -8p (note jump in time when using -p)
11960179719    3350          3204        -8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"
11959250164    5131           522        -8p -A "subdivide_tukey(4)"
11958125424    7796           422        -8p -A "subdivide_tukey(5)"
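The "saved per second" column in the table can be reproduced from the first two columns: for every row it equals the bytes saved relative to the row above, divided by the extra encoding seconds. A small sketch to check it (the helper name is hypothetical):

```python
# (size in bytes, encode time in seconds, setting) as listed in the table
rows = [
    (11969604531, 833, "-8"),
    (11968502388, 940, '-8 -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11967556575, 1155, '-8 -A "subdivide_tukey(4)"'),
    (11966463371, 1555, '-8 -A "subdivide_tukey(5)"'),
    (11961291433, 3003, "-8p"),
    (11960179719, 3350, '-8p -A "tukey(666e-3);subdivide_tukey(3/333e-3)"'),
    (11959250164, 5131, '-8p -A "subdivide_tukey(4)"'),
    (11958125424, 7796, '-8p -A "subdivide_tukey(5)"'),
]

def saved_per_second(rows):
    """Bytes saved versus the previous row, per extra second of encoding time."""
    result = []
    for (prev_size, prev_secs, _), (size, secs, _) in zip(rows, rows[1:]):
        result.append(round((prev_size - size) / (secs - prev_secs)))
    return result

print(saved_per_second(rows))  # [10300, 4399, 2733, 3572, 3204, 522, 422]
```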
How about this setting on your corpus?
-8 -A "tukey(75e-2);subdivide_tukey(3/25e-2)"
Of course I asked this because it works better on my corpus (about one day of duration), and I adjusted the corpus weighting so that the compression ratio is roughly 55%. You can also try other values which don't require rounding; for example, 666e-3 becomes 0.66600000858306884765625 in single float.
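A quick way to see which parameter values survive the conversion unchanged. This sketch takes from the post the premise that the window parameter ends up as a single-precision float; it only demonstrates IEEE-754 rounding via Python's struct module, not flac's actual parsing code.

```python
import struct
from decimal import Decimal

def as_float32(x):
    """Round a Python float (an IEEE-754 double) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

for text in ("666e-3", "75e-2", "25e-2"):
    value = float(text)
    single = as_float32(value)
    # Decimal(float) shows the exact binary value that is stored.
    print(text, Decimal(single), "exact" if single == value else "rounded")
# 666e-3 0.66600000858306884765625 rounded
# 75e-2 0.75 exact
# 25e-2 0.25 exact
```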

Title: Re: FLAC v1.4.x Performance Tests
Post by: Octocontrabass on 2022-10-07 19:15:32
GNA [...] Not sure if anything will utilize that?!
It's like a GPU but much smaller. FLAC won't use it.

Does it mean using -Ofast globally can affect something completely unrelated, like the progress indicator and such?
Potentially yes, but in practice most floating-point code doesn't rely on the compiler precisely following the floating-point standards. If the progress indicator is affected, you probably wouldn't be able to see what's different.

The real problem with -Ofast is that it can insert code that switches the CPU into a faster but not standards-compliant mode, and this can affect any program that loads a library compiled with -Ofast.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-07 20:13:51
I now somewhat understand why this Audition bug happened and what the OptimFROG author wanted to correct:
https://hydrogenaud.io/index.php/topic,114816.msg1009053.html#msg1009053
My own program would definitely fail if compiled the -Ofast way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-08 10:01:44
How about this setting on your corpus?
-8 -A "tukey(75e-2);subdivide_tukey(3/25e-2)"
Of course I asked this because it works better on my corpus (about one day of duration),

Improves - because of the 25e-2.  The difference between 75e-2 and 666e-3 in the single tukey is ambiguous over genre (the latter is better in the classical section, the former in the "other"), but the overall impact is less than a part per million.
Tested same with "-p" added.

But lowering the subdivide_tukey tapering parameter helps and I think it should be even lower. I tested and found -A subdivide_tukey(24e-2) (https://hydrogenaud.io/index.php/topic,123025.msg1017066.html#msg1017066) to be a good one without the additional -A tukey, but preliminary testing indicates that 25e-2 is "too high" in the presence of that.

The 666 & 333 were not "optimal" choices - they were picked more out of the idea that if I wanted to deviate from 1/2 and 1/2 parameters, then "2/3 and 1/3" would be the next idea. I surely tested both 666&333 and 333&666, but I didn't do any exhaustive testing. So why then state with this three-decimal "accuracy"? Hey, 3/333e-3 is easy to remember. (And then the metal swine selected 666 over 667 for kinda the same reason.)


You can also try other values which don't require rounding; for example, 666e-3 becomes 0.66600000858306884765625 in single float.
The predictor is rounded to integers, so decimals beyond some k-th place will, at the very least, rarely matter. Quick testing on 11 CD images, starting from your 0.75 & 0.25: I got bit-identical files when I tweaked the fifth decimal, but the fourth would matter. I mean, not "matter" much, but yield different files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-09 11:49:11
Tested: To "-8" and above, added a tukey to a subdivide_tukey, various taperings tested. Do these choices make (much) different impact across genres? (No!)
-8p -A "tukey(Q);subdivide_tukey(N/P)" for N=3, 4, 5 and various P and Q.
Also without "-p".

Of course it doesn't matter much! On one hand, you can shrug it off as nothing by saying that for N=3, the extra tukey - with "optimal" parameters - saves 0.01 percent over standard -8p, and good/bad parameters make for only half of this. Nothing to care about? On the other hand, it is only slightly less than going up to N=4, and slightly more than going from N=4 to N=5. Each of those cost much more time.
So if standard -8p is not enough for you - well, for the sport of it, I guess - and you are ready to type in some -A manually, you might as well consider this. Same if you want to go up from -8 but not all the way to -8p; then you can just remove the "p" from the below, since your material is likely to make more difference than that.

tl;dr: if adding an additional tukey to get ~half the benefit of higher subdivide_tukey at a fraction of the extra time, make its tapering parameter bigger than default (well maybe default if you are at very high compression) and the subdivide_tukey taper parameter very small.
If you like to think in 1/16ths terms: after a bit tweaking, you could try something like 11, 10, 9 or 10, 9, 8 combined with a 1/8 as follows:
N=3: -8p -A "tukey(6875e-4);subdivide_tukey(3/125e-3)" <---- 11/16ths & 1/8th, or reduce the first to 10/16ths for classical music
N=4: -8p -A "tukey(6250e-4);subdivide_tukey(4/125e-3)" <---- 10/16ths & 1/8th, or reduce the first to 9/16ths for classical music. Yes keep the 1/8th.
N=5: -8p -A "tukey(5625e-4);subdivide_tukey(5/125e-3)" <---- 9/16ths & 1/8th, or reduce the first to 5e-1 for classical music. Again keep the 1/8th.
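If you prefer to think in sixteenths, the -A strings can be generated exactly. A sketch with hypothetical helper names: every k/16 needs at most four decimal places, so it is exact in both decimal and binary. Trailing zeros are stripped, so 10/16 comes out as 625e-3, the same value as the 6250e-4 written above.

```python
from fractions import Fraction

def sixteenths(k):
    """Write k/16 exactly in scientific notation, e.g. 11 -> '6875e-4'."""
    scaled = Fraction(k, 16) * 10**4
    if scaled.denominator != 1:
        raise ValueError("not representable with four decimals")
    mantissa, exponent = int(scaled), -4
    while mantissa != 0 and mantissa % 10 == 0 and exponent < 0:
        mantissa //= 10
        exponent += 1
    return f"{mantissa}e{exponent}"

def window_arg(n, tukey_16ths, sub_16ths=2):
    """Build an -A argument of the form tukey(Q);subdivide_tukey(N/P)."""
    return f"tukey({sixteenths(tukey_16ths)});subdivide_tukey({n}/{sixteenths(sub_16ths)})"

for n, q in ((3, 11), (4, 10), (5, 9)):
    print(f'-8p -A "{window_arg(n, q)}"')
```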

But the genre differences between classical, heavier/metal and "other" didn't cause much drama - not even "relatively" to the very small impact of it all. That is kinda reassuring; even if classical music could use N/2e-1, it gained virtually nothing going down to N/<one eighth>.


So just to explain what I did here:
Also checked (this is preliminary): as in Reply 48, combining with a bigger single tukey.
Hypothesis: because single tukey has always had the default parameter 0.5 - this after quite a bit of testing back in the day - there is no good reason that this small tapering should be good for a single tukey run, ==> reason why it works is for the subdivisions ==> if you want to improve, try one with a bigger taper parameter like -5 uses.
This to be tested with -p
I first made the "arbitrary" selection (files with "j" in the name) and then ran the test on the remainder, distinguishing between the classical music, the heavy rock/metal and the "other".
The P and Q are "7e-2", "14e-2" etc., i.e. 0.07 apart, though only the "most reasonable" ones tested on the big corpus. Then tweaked the parameters slightly from the "best", if only to see if small tweaks led to unexpectedly big changes. (They did not.)

Results: Not unexpected, given Reply 48: make the Q and P tapering parameters quite far from each other, as in tukey(<big Q>);subdivide_tukey(N/<small P>). The "big" does not mean close to 1, though.
Genre differences: Nothing dramatic - nothing "relatively dramatic" relative to the .01 percent impact either. Sure, there is a clear pattern in that the heavier music wants a smaller P, down below 0.1, and also a slightly bigger Q, but not much - and the classical music calls for a slightly lower Q. But the "overall" minimum is not far (in kilobytes) from each genre's minimum.

So the first runs ended up with
N=3: -8p -A "tukey(70e-2);subdivide_tukey(3/14e-2)"
N=4: -8p -A "tukey(56e-2);subdivide_tukey(4/14e-2)"
N=5: -8p -A "tukey(49e-2);subdivide_tukey(5/14e-2)"
Tweaking it and looking at genre differences, I ended up with something like the tl;dr above. It was the classical music section that made the "56" and "49" win, and it is the heavier section that pulls in the other direction. The 14 was a bit too high except for classical music, where it mattered very, very little, like a few kB on 4 GB.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-09 16:22:48
With only -8 subdivide_tukey(3/x) I got these figures, from best to worst compression:

Difficult content (~70.64% compression ratio)

3/1875e-4
2531723115 bytes

3/2e-1
2531723128 bytes

3/22e-2
2531723205 bytes

3/25e-2
2531724460 bytes

3/125e-3
2531724619 bytes

-8
2531763292 bytes

Difficult content is the usual electronic music in my collection plus some loudness-war songs. Simple content includes speech, classical, ethnic and songs with simple accompaniment.

Simple content (~43.13% compression ratio)

3/25e-2
1799030423 bytes

3/2e-1
1799035729 bytes

3/22e-2
1799046777 bytes

3/1875e-4
1799054794 bytes

3/125e-3
1799080973 bytes

-8
1799116764 bytes
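To compare these results in the parts-per-million terms used in this thread, a small sketch (the helper name is hypothetical). It shows, for example, that on the simple content the two best settings are about 3 ppm apart, while on the difficult content everything from 3/125e-3 up stays within 1 ppm of the best.

```python
# Sizes in bytes for -8 -A "subdivide_tukey(3/x)", copied from the lists above
difficult = {
    "3/1875e-4": 2531723115,
    "3/2e-1": 2531723128,
    "3/22e-2": 2531723205,
    "3/25e-2": 2531724460,
    "3/125e-3": 2531724619,
    "-8": 2531763292,
}
simple = {
    "3/25e-2": 1799030423,
    "3/2e-1": 1799035729,
    "3/22e-2": 1799046777,
    "3/1875e-4": 1799054794,
    "3/125e-3": 1799080973,
    "-8": 1799116764,
}

def ppm_vs_best(results):
    """How much larger each result is than the smallest one, in parts per million."""
    best = min(results.values())
    return {name: (size - best) / best * 1e6 for name, size in results.items()}

for label, results in (("difficult", difficult), ("simple", simple)):
    for name, ppm in ppm_vs_best(results).items():
        print(f"{label:9} {name:10} {ppm:6.2f} ppm")
```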
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-09 18:45:33
With only -8 subdivide_tukey(3/x) I got these figures, from best to worst compression:

Here is where I actually got an oddity: .21 was worse than both .20 and .22. And the effect was not due to one genre. I tested a few more because in Reply #93 (https://hydrogenaud.io/index.php/topic,123025.msg1017066.html#msg1017066) I found .32 to be better than .16 over all three, so the results below point at a slightly lower parameter than expected.

Anyway, disregarding .21 and doing (nearly) only your parameters, results are not outrageously far from yours, but slightly different - I suspect your speech content makes some impact?
2e-1 was the best for both classical music and the "other" section. .22 was better than .1875 in these two genre sections. .2 also won overall.

For my heavier material, go lower: 3/125e-3 is better than .1875 better than .2 better than .22 better than .25
Also checked 1e-1, which narrowly lost to 125e-3.

Impact of choosing "wrong": With your "simple" content, even the difference between the two best was like 3 parts per million. For my classical music, everything from .1875 and up would be within that interval, and same for the "other" genre.
But for your "difficult" material, everything from 125e-3 and up fell within one ppm, and my material needed 4ppm.
Not much still.


The low tapering parameter I found in Reply 115 just underlines that with an additional tukey, you want the two tukeys to be different.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-10 04:11:56
Guess this is my last try.
I am now also using the ./configure options Case suggested, plus -Ofast and -fipa-pta as suggested elsewhere. -fipa-pta optimizes a tiny bit and saves some KB in the binaries, at the cost of only compile time.

Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-10 08:22:47
On my set of test files, your latest (really hopefully not last) build is right between Case's gcc v12.2 and gcc v7.3 builds:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes) = gcc v7.3
FLAC Option: -7
 Average time =  25.268 seconds (3 rounds), Encoding speed = 427.89x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat2.exe (784384 bytes)
FLAC Option: -7
 Average time =  25.710 seconds (3 rounds), Encoding speed = 420.54x
 FLAC size = 1.167.014.381 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  26.100 seconds (3 rounds), Encoding speed = 414.26x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

And, fwiw, I was able to get some speed gain compared to plain -7 (on my test set [classic rock music]) at almost no cost with a smaller block size:
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7 -b3584
 Average time =  23.949 seconds (3 rounds), Encoding speed = 451.46x <= faster encoding (428x -> 451x) [compared to -7]
 FLAC size = 1.167.032.442 bytes (= 61,189% of WAV size, ~863 kbps) <= marginally worse compression: 0.001 percentage points
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-10 17:07:28
Added more electronic and loudness-war content, hand-picked to include only the highest-bitrate files, but no noise music. Around 74.5% compression ratio.

1.3.1 (Xiph)

-8 -b2304
3200412387 bytes

-8
3202131236 bytes

1.3.2 (Xiph)

-8 -b2304
3200203911 bytes

-8
3201989505 bytes

1.4.1 (Case GCC 12.2.0)

-8 -b2304
3199429338 bytes

-8
3201122995 bytes

-8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
3201407279 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-10 18:22:45
Yikes, I suck at PowerShell ...

Can anyone hack together for me a script that does the following:

FOR every *.flac IN (D:\given path pattern...\*.flac) DO flac <parameters> with output <same filename except that in E: rather than D>
and measures total CPU time and total time including I/O?

Point being: how much "compression effort" is "free in time" because it compresses while busy writing?
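A hedged Python sketch of the requested loop (not PowerShell). Assumptions: flac is on PATH, the D: to E: mapping keeps the relative folder structure, and mirror_path/encode_tree are hypothetical names; only standard flac options (-o, -f, --silent) are used. Wall-clock time includes all I/O; CPU time of the finished flac processes comes from os.times(), which only fills in the children_* fields on Unix-like systems, so on Windows the CPU figure will read 0 and only wall time is meaningful.

```python
import os
import subprocess
import sys
import time
from pathlib import Path

def mirror_path(src, src_root, dst_root):
    """Map src under src_root to the same relative path under dst_root."""
    return Path(dst_root) / Path(src).relative_to(src_root)

def encode_tree(src_root, dst_root, flac_args, flac_exe="flac"):
    """Re-encode every *.flac below src_root into dst_root.

    Returns (file_count, wall_seconds, child_cpu_seconds).
    """
    files = sorted(Path(src_root).rglob("*.flac"))
    start_wall = time.perf_counter()
    start_cpu = os.times()
    for src in files:
        dst = mirror_path(src, src_root, dst_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        cmd = [flac_exe, *flac_args, "--silent", "-f", "-o", str(dst), str(src)]
        subprocess.run(cmd, check=True)
    end_cpu = os.times()
    wall = time.perf_counter() - start_wall
    # CPU time of waited-for child processes; reported as 0 on Windows.
    cpu = (end_cpu.children_user - start_cpu.children_user) + (
        end_cpu.children_system - start_cpu.children_system)
    return len(files), wall, cpu

def main(argv):
    if len(argv) < 2:
        print("usage: encode_tree.py SRC_ROOT DST_ROOT [flac options...]")
        return
    count, wall, cpu = encode_tree(argv[0], argv[1], argv[2:])
    print(f"{count} files, wall {wall:.2f} s, child CPU {cpu:.2f} s")

if __name__ == "__main__":
    main(sys.argv[1:])
```

For example: python encode_tree.py D:\music E:\music -8p. Comparing wall time with CPU time gives a rough idea of how much time goes to I/O rather than compression.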
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-11 17:50:16
lol. Why look so far. :) When searching the web for compiler options guess where it leads to?
far far away (https://hydrogenaud.io/index.php/topic,118008.msg982030.html#msg982030)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-11 20:18:51
lol. Why look so far. :) When searching the web for compiler options guess where it leads to?
far far away (https://hydrogenaud.io/index.php/topic,118008.msg982030.html#msg982030)
"This is likely to be my last build"

Cue 2022:
Guess this is my last try.

Porcus quoting self:
Quote from: Porcus
rehab is for quitters

 O:)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-11 21:04:20
You got me  :-[  but somehow it's too much fun  :)
Guess I have to try some more, and maybe a 'skylake' version for sundance to test when I am at my PC later.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-11 21:09:14
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476

This might make compiling with -march=native much more rewarding when combined with --disable-asm-optimizations. I've changed the code in such a way that it is much easier to vectorize by a compiler. Currently the intrinsics routines cannot really be tuned by a compiler, but with this change a compiler can use the C code to get an even better result.

I've seen improvements of over 10% with preset 8, when run with -march=native. One could then use AVX512 for example. I don't have access to hardware with AVX512, so I can't say whether that would make sense.

Also, as a bug was found in libFLAC that affects playback with gstreamer, I won't wait long with releasing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2022-10-11 22:30:56
much easier to vectorize by a compiler.

Speaking of which, GCC 12 vectorizes even at -O2:
Quote from: https://gcc.gnu.org/gcc-12/changes.html
Vectorization is enabled at -O2 which is now equivalent to the original -O2 -ftree-vectorize -fvect-cost-model=very-cheap.

Not sure how much of an impact this particular change has here given that pretty much everyone just builds FLAC with -O3, but it's interesting nonetheless.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 01:47:09
I took the flags GrieverV uses back with the 1.3.3 version. I left out the math flags because they are part of -Ofast already.
The compile options as single steps are hard to measure here but using them all together creates clearly smaller binaries with a small speed advantage.
A -8p single file encode is now at ~110x vs ~108x or ~1070x vs ~1058x for multiple files in foobar.
I have also attached a gcc skylake-tuned version. No difference for me on the 5900X versus the haswell tuning, but others may test.
And now lets see what ktf's git offers :)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 02:53:43
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476
I compiled reference libFLAC git-3d55a9dc 20221009 in 3 ways but left all additional flags in, sorry.
Nonetheless strange results.
I added --disable-asm-optimizations to ../configure
Compare the numbers to my posts above.

mtune=native is really slow!
408x
40x
Since I have a Zen 3 5900X, I tried mtune=znver3 and it crawls exactly as slowly.

Finally, with mtune=haswell the numbers are almost normal, but not fast.
981.51x
104x

It may be it collides with the additional flags but i wonder why mtune=haswell works.

Edit: the same slowness for mtune=native without fancy additional flags
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-12 06:36:56
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476
I compiled reference libFLAC git-3d55a9dc 20221009 in 3 ways but left all additional flags in, sorry.
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Check out branch libFLAC-fast-math.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 07:41:07
@Wombat : Tested your builds here:
Code: [Select]
Reference:
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
 Average time =  25.384 seconds (5 rounds), Encoding speed = 425.94x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat-manyflags.exe (718848 bytes)
FLAC Option: -7
 Average time =  25.283 seconds (5 rounds), Encoding speed = 427.65x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-wombat-manyflags-skylake.exe (712192 bytes)
FLAC Option: -7
 Average time =  26.346 seconds (5 rounds), Encoding speed = 410.39x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
So your "manyflags" build with GrieverV settings is a little faster than Case's Haswell build here. But, oddly enough, your Skylake build is slower although my 8th gen i7 is a family member...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 11:15:14
I'm planning something to keep y'all busy: https://github.com/xiph/flac/pull/476

This might make compiling with -march=native much more rewarding when combined with --disable-asm-optimizations. I've changed the code in such a way that it is much easier to vectorize by a compiler. Currently the intrinsics routines cannot really be tuned by a compiler, but with this change a compiler can use the C code to get an even better result.

I've seen improvements of over 10% with preset 8, when run with -march=native. One could then use AVX512 for example. I don't have access to hardware with AVX512, so I can't say whether that would make sense.

Also, as a bug was found in libFLAC that affects playback with gstreamer, I won't wait long with releasing.
Intel's way of dealing with AVX-512 in 12th-gen Core i is completely unfair to the non-K i5 and i3, as they don't have E-cores, so there should be no compatibility issue with AVX-512. Should @Porcus 's CPU support AVX-512?
On this CPU, an 11th generation i7 mobile

Also thanks for looking into the -ffast-math issue.
Title: Re: FLAC v1.4.x Performance Tests
Post by: cid42 on 2022-10-12 11:57:22
...
Intel's way of dealing with AVX-512 in 12th-gen Core i is completely unfair to the non-K i5 and i3, as they don't have E-cores, so there should be no compatibility issue with AVX-512. Should @Porcus 's CPU support AVX-512?
Early 12th-gen chips didn't have AVX512 fused off, so some motherboard+BIOS combinations let you enable AVX512 if you disabled the E-cores. Intel disabled it across the board so they didn't have to validate it. Otherwise cheaper models would have outperformed more expensive ones in some situations, which would not be a good look: marketing nonsense. AFAIK newer 12th-gen production runs have, unfortunately, fused off AVX512 properly.

Muddying the waters further, Zen 4's AVX512 implementation differs in some key areas: some are better, some worse, and some instruction chains perform well on one architecture but poorly on the other. That adds to the benchmarking fun: https://mersenneforum.org/showthread.php?t=28102
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-12 12:17:00
Heck, this sounds like "fun" ...

Question:
As for differences between compiles: if build X is faster than build Y, is that
* due to "fewer instructions" executed (--> less heat generated)
or
* due to "instructions queued more efficiently" and some CPU-internal parallelization (--> same heat generated in shorter time)

- or a combination of both? 

On a cooling-constrained setup (laptop!) that makes a difference, and how much depends critically on how much you are actually FLACing in one run:
Take someone who acquires a lossless album, does the tagging, and then (re-)compresses it for anything from a tiny to a large improvement depending on the source file. That is when you will actually watch the thing run to the end, right? Might one then be pretty much done with the album before the CPU needs to wipe its sweat? The long-term energy that would need to be dissipated during an overnight job is simply not the yardstick then.
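The two cases in the question can be written in terms of energy E (the heat to get rid of), average power P̄ and run time t. This is a simplified model that ignores idle power, boost clocks and the cooler's thermal mass:

```latex
\begin{aligned}
E &= \bar{P}\,t \\
\text{fewer instructions:}\quad \bar{P}_X \approx \bar{P}_Y,\ t_X < t_Y &\;\Rightarrow\; E_X < E_Y \\
\text{same work, better overlapped:}\quad E_X \approx E_Y,\ t_X < t_Y &\;\Rightarrow\; \bar{P}_X \approx \bar{P}_Y\,\frac{t_Y}{t_X} > \bar{P}_Y
\end{aligned}
```

Roughly speaking, a short album-sized job is limited by how much energy the heatsink can soak up. An overnight batch is limited by the sustained power the cooling can remove.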
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 14:49:31
So your "manyflags" build with GrieverV settings is a little faster than Case's Haswell build here. But, oddly enough, your Skylake build is slower although my 8th gen i7 is a family member...
Nice. It may be that GCC 12.2.0 handles -march=skylake differently from older versions. And even though your 8700 is a Coffee Lake, it does better with the Haswell optimizations.

Also thanks for looking into the -ffast-math issue.
What exactly was this math issue?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 15:58:24
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Checkout branch libFLAC-fast-math
WOW! Great job!
This time really the fast-math files :)

Compiled with
haswell
~1290x
~135x

and

native
~1280x
~136x

That is most likely within measurement tolerance. It identifies itself as reference libFLAC 1.4.1 20220922. Is it OK to offer it here?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 16:12:37
What exactly was this math issue?
As mentioned by ktf:
https://github.com/xiph/flac/pull/476
There are a lot of online resources on this denormal topic, for example in this interactive demo:
https://www.h-schmidt.net/FloatConverter/IEEE754.html
You can toggle the checkboxes to see the numeric representations. Specifically, when all "Exponent" checkboxes are empty (and the mantissa is non-zero), the represented values are called denormals (or subnormals). One of the things -ffast-math does is set denormals to zero. Depending on the programmer's intent, that may break some code, as the values are no longer the intended ones.

Even if this version of flac is safe to do its math this way, the issue is that the -ffast-math logic can affect other code that is unrelated to flac, and that code may require proper denormal support.

Running flac.exe as a separate process (e.g. foobar2000 calling flac.exe) should be safe, as the exe runs in its own process.
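To make the "Exponent checkboxes empty" description concrete, here is a small Python sketch. It decodes a value's IEEE 754 single-precision bit fields, flags subnormals, and imitates what flush-to-zero does to them. The helper names are hypothetical. The flushing is only a simulation: on real hardware it is done through CPU control flags. Per the GCC manual, -funsafe-math-optimizations (implied by -ffast-math) may link in startup code that changes the default FPU control settings for the whole program.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, exponent, mantissa) bit fields of x stored as IEEE 754 binary32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def is_subnormal32(x: float) -> bool:
    """Subnormal: every exponent bit is zero but the mantissa is not."""
    _, exp, man = float32_fields(x)
    return exp == 0 and man != 0

def flush_to_zero32(x: float) -> float:
    """Simulated FTZ: a subnormal binary32 value becomes a (signed) zero."""
    if is_subnormal32(x):
        return -0.0 if x < 0 else 0.0
    return struct.unpack("<f", struct.pack("<f", x))[0]

smallest_normal = 2.0 ** -126   # FLT_MIN: exponent field is 1
tiny = smallest_normal / 4      # 2**-128 only fits as a subnormal

print(float32_fields(smallest_normal))  # (0, 1, 0)
print(float32_fields(tiny))             # exponent field 0, mantissa non-zero
print(flush_to_zero32(tiny))            # 0.0 instead of roughly 2.9e-39
```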
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 16:29:12
Thanky! Didn't have a problem yet with my frontends but ktf's effort is surely most welcome.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-12 19:14:58
As for differences between compiles: if build X is faster than build Y, is that
* due to "fewer instructions" executed (--> less heat generated)
or
* due to "instructions queued more efficiently" and some CPU-internal parallelization (--> same heat generated in shorter time)

- or a combination of both? 
It is a combination. What instructions execute more efficiently varies highly between CPUs. In fact, the resource you linked on AVX512 in Zen 4 lists quite a few such issues. Certain instructions are executed directly on a specific part of the CPU, while others need to be decoded into several instructions. On another CPU, other instructions might have dedicated silicon. This dedicated silicon might be more power hungry, like AVX512.

WOW! Great job!
This time really the fast-math files :)
If I read this correctly, you're seeing a 20% speedup, right?

Quote
Most likely only measuring tolerance . It identifies as reference libFLAC 1.4.1 20220922. Is it ok to offer it here?
Yes, sure. You probably downloaded a tarball instead of checking out git. It can only generate the proper version string when checked out with git. No worries though, this is very close to libFLAC 1.4.1.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-12 19:50:08
Indeed ~20%! The several additional flags optimize C code further and they seem to work well here.

I used https://github.com/ktmf01/flac.git, which gave me the wrong files, but the zip of the fast-math branch, downloaded manually, worked.
Attached the version i tested above.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-12 20:18:19
You've compiled the wrong branch. The branch you're compiling is one without the mentioned optimizations. Checkout branch libFLAC-fast-math

I compiled this (flac git-cb822660 20221012) on Linux using -march=znver3 -Ofast.  I get the same performance with or without asm optimizations.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 20:33:35
Tested ktf's fastmath build:
(sorry, test results were corrupted, will re-test asap)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 20:49:34
Added more electronic and loudness-war content, hand-picked to include only the highest-bitrate files, but no noise music. Around 74.5% compression ratio.

1.3.1 (Xiph)

-8 -b2304
3200412387 bytes

-8
3202131236 bytes

1.3.2 (Xiph)

-8 -b2304
3200203911 bytes

-8
3201989505 bytes

1.4.1 (Case GCC 12.2.0)

-8 -b2304
3199429338 bytes

-8
3201122995 bytes

-8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
3201407279 bytes
Yes, somewhat bigger file size, see the quoted data for comparison.
-8 -b2304
The two speeds are single and multi-thread results.

Case GCC 12.2.0
Total encoding time: 1:39.094, 245.88x realtime
Total encoding time: 0:29.609, 822.91x realtime
3199429338 bytes

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 1:23.563, 291.58x realtime
Total encoding time: 0:25.078, 971.59x realtime
3200178267 bytes

[EDIT] Added -8p -b2304 tests, only multi-thread:

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 1:21.890, 297.54x realtime
3196718833 bytes

Case GCC 12.2.0
Total encoding time: 1:30.437, 269.42x realtime
3196159402 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-12 21:33:50
... now the corrected results for ktf's fastmath build:
(somehow an orphaned flac file wasn't deleted before starting the test and was accounted in the total FLAC size)
Code: [Select]
FLAC Binary: flac141-case-haswell.exe (860160 bytes) = Reference
FLAC Option: -7
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-ktf-fastmath.exe (665600 bytes)
FLAC Option: -7
 Average time =  21.760 seconds (5 rounds), Encoding speed = 496.87x   <= faster encoding (429x -> 497x)
 FLAC size = 1.167.045.858 bytes (= 61,189% of WAV size, ~863 kbps) <= on-par compression: -0.001 percent points
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-12 21:36:26
@bennetng : 1/40th of a percent bigger files. Actually, if you wanted that much compression improvement by tweaking parameters, wouldn't you likely have to pay more than that nineteen percent time penalty? Going to the Case compile seems to be the cheapest way to save bytes?
That is about tenfold the savings in @sundance 's test run?
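The "1/40th of a percent" and "nineteen percent" figures can be reproduced from bennetng's -8 -b2304 single-thread numbers. Here is a small Python sketch; the times are converted from m:ss by hand.

```python
# bennetng's -8 -b2304 single-thread results on the ~74.5% corpus.
case_time_s, case_bytes = 99.094, 3_199_429_338   # Case GCC 12.2.0, 1:39.094
fast_time_s, fast_bytes = 83.563, 3_200_178_267   # ktf-fast-math-noasm-manyflags-haswell, 1:23.563

size_increase_pct = (fast_bytes / case_bytes - 1) * 100   # how much bigger the fast build's output is
time_penalty_pct = (case_time_s / fast_time_s - 1) * 100  # how much longer the Case build takes

print(f"size: +{size_increase_pct:.4f}% (about 1/{1 / size_increase_pct:.0f} of a percent)")
print(f"time: +{time_penalty_pct:.1f}%")
```

This gives roughly +0.023% in size (about 1/43 of a percent) against roughly 18.6% more time for the Case build.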

How does it fare with -8p [and your fave -b]? Asking because "p" brute-forces "a certain task", so there is something it does particularly much of.
(That can be said about -8e as well, and -8 --lax -r 12 also? In the latter case, no other -b please. Not saying they are useful for anything but testing.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-12 21:58:49
@bennetng : 1/40th of a percent bigger files. Actually, if you wanted that much compression improvement by tweaking parameters, wouldn't you likely have to pay more than that nineteen percent time penalty? Going to the Case compile seems to be the cheapest way to save bytes?
That is about tenfold the savings in @sundance 's test run?

How does it fare with -8p [and your fave -b]? Asking because "p" brute-forces "a certain task", so there is something it does particularly much of.
(That can be said about -8e as well, and -8 --lax -r 12 also? In the latter case, no other -b please. Not saying they are useful for anything but testing.)
As mentioned in the quoted box of my previous test, the corpus used was heavily biased towards very high-bitrate files (~74.5% compression ratio). I just reused this corpus for convenience because it is still in my foobar playlist. It should be considered a special case, and -b2304 suits this brutal set of files.

I think the significance of ktf's latest tweak is it offers obvious speed boost for different types of CPUs, and makes -8 much cheaper. -8 is an important preset that many people actually use.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 02:39:17
I compiled with no flags except -Ofast -m64 -march=haswell to check whether the increased size of the resulting files is due to the GCC optimizations. The resulting files are identical, so it must be the new flac code itself.
My single wav testfile is 2.729.717.132 Bytes consisting of several cd images of different genres.
It compresses to 1.526.366.181 Bytes and 1.526.597.886 Bytes so a 0,015% file increase.

I was also asked for the additional flags. They are no secret; I copied them more or less from Case and GrieverV.
With the new fast-math code, all the flags after -fno-stack-protector together make almost no difference: less than with the older flac code, or even none at all.

-Ofast -m64 -march=haswell -fipa-pta -funroll-loops -fno-stack-protector -fno-common -fno-plt -fno-semantic-interposition -falign-functions=32 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -flto -ffat-lto-objects -pipe
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 05:13:01
Oops. The post above doesn't mention that the 0,015% file-size increase is for -8p. I must have forgotten because I use -8p for every test here.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 08:14:51
Another corpus with a more typical compression ratio. Faster overall speed than the previous corpus with an extreme ratio.

PCM (17 files)
4223331916 bytes

-8

ktf-fast-math-noasm-manyflags-haswell
multi: 0:17.453, 1371.78x realtime
single: 1:17.250, 309.92x realtime
2511293691 bytes
59.462%

Case GCC 12.2.0
multi: 0:20.859, 1147.79x realtime
single: 1:33.891, 254.99x realtime
2510290651 bytes
59.439%

-8 -A subdivide_tukey(3/2e-1)

ktf-fast-math-noasm-manyflags-haswell
multi: 0:17.422, 1374.22x realtime
single: 1:17.203, 310.11x realtime
2511287164 bytes
59.462%

Case GCC 12.2.0
multi: 0:20.984, 1140.95x realtime
single: 1:34.297, 253.89x realtime
2510265971 bytes
59.438%

-8p

ktf-fast-math-noasm-manyflags-haswell
multi: 0:53.156, 450.40x realtime
single: 3:44.875, 106.46x realtime
2509518160 bytes
59.420%

Case GCC 12.2.0
multi: 1:01.547, 389.00x realtime
single: 4:17.797, 92.87x realtime
2508762363 bytes
59.402%
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 08:53:06
Ops. Above post misses that these 0,015% file increase is for -8p.
Are we killing the double precision here?!?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 10:18:55
Multi-thread tests only, with slower settings to stress test temperature and power limit.

-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime
2507621979 bytes

-8p -A subdivide_tukey(7)

Case GCC 12.2.0
Total encoding time: 5:12.453, 76.62x realtime
2507754046 bytes

Case GCC 7.3.0
Total encoding time: 5:22.157, 74.31x realtime
2507754044 bytes

So with the latest tweak, -8p -A subdivide_tukey(9) gives bigger files than Case's build does with subdivide_tukey(7).

My previous tests showed that Case GCC 7.3.0 can make the CPU very hot and exceed the power limit with -8pe. Without -e, the maximum temperature is similar to the other builds in this test (83C max), also within the power limit.

Still, these temperatures are well below TjMAX (100C), at which thermal throttling is triggered.

Tests were done using Intel stock cooler at 27C ambient temperature.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 11:46:20
Another corpus with a more typical compression ratio. Faster overall speed than the previous corpus with an extreme ratio.

PCM (17 files)
4223331916 bytes

-8p

ktf-fast-math-noasm-manyflags-haswell
multi: 0:53.156, 450.40x realtime
single: 3:44.875, 106.46x realtime
2509518160 bytes
59.420%

Case GCC 12.2.0
multi: 1:01.547, 389.00x realtime
single: 4:17.797, 92.87x realtime
2508762363 bytes
59.402%
FLAC-git-90d7fdb3_20221012_Win_GCC122 (https://hydrogenaud.io/index.php/topic,123176.msg1017465.html#msg1017465)
multi: 1:00.453, 396.03x realtime
single: 4:19.515, 92.25x realtime
2509518286 bytes

Multi-thread tests only, with slower settings to stress test temperature and power limit.

-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime
2507621979 bytes
FLAC-git-90d7fdb3_20221012_Win_GCC122
Total encoding time: 5:25.297, 73.59x realtime
2509390559 bytes

So the older version (Case GCC 12.2.0) at plain -8p produced smaller files than the newer git version at -8p -A subdivide_tukey(9).
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 13:58:20
-8p -A subdivide_tukey(9)

ktf-fast-math-noasm-manyflags-haswell
Total encoding time: 4:32.344, 87.91x realtime
2509390434 bytes

Case GCC 12.2.0
Total encoding time: 8:23.812, 47.52x realtime

Look at the times. Could it be that the guesstimation now fails to distinguish which functions to apply? Cf. my uneducated outburst at 149 (https://hydrogenaud.io/index.php/topic,123025.100.html) ...
@ktf : would this even be possible?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brazil2 on 2022-10-13 14:09:42
Attached the version i tested
ktf-fast-math-noasm-manyflags-haswell-Wombat.7z (https://hydrogenaud.io/index.php?action=dlattach;topic=123025.0;attach=23688)
This build is quite slow at decoding to WAV.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-13 14:46:22
Look at the times. Could it be that the guesstimation now fails to distinguish which functions to apply? Cf. my uneducated outburst at 149 (https://hydrogenaud.io/index.php/topic,123025.100.html) ...
@ktf : would this even be possible?
I'm not sure why, but the last commit is faulty (https://github.com/xiph/flac/commit/90d7fdb3e1f058bd7b94330afc872cf277eae541). I've been doing some testing seeing the results here, and reverting that commit results in compression being back at 1.4.1 levels.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 14:49:33
Another corpus with completely different files.
Upper: ktf-fast-math-noasm-manyflags-haswell
Lower: Case GCC 12.2.0
Single thread, 31 files, 4278658628 bytes PCM size

-5
Total encoding time: 0:44.234, 548.34x realtime
2314954727 bytes
Total encoding time: 0:51.359, 472.27x realtime
2314955036 bytes

-6
Total encoding time: 0:54.000, 449.17x realtime
2310798337 bytes
Total encoding time: 1:01.671, 393.30x realtime
2310700059 bytes

-7
Total encoding time: 1:00.390, 401.64x realtime
2304079795 bytes
Total encoding time: 1:10.094, 346.04x realtime
2304028372 bytes

-8
Total encoding time: 1:19.453, 305.28x realtime
2303392314 bytes
Total encoding time: 1:37.187, 249.57x realtime
2302612016 bytes

-8 -A "tukey(7e-1);subdivide_tukey(3/2e-1)"
Total encoding time: 1:23.250, 291.35x realtime
2303079539 bytes
Total encoding time: 1:47.063, 226.55x realtime
2302352189 bytes

-8e
Total encoding time: 4:18.547, 93.81x realtime
2302133626 bytes
Total encoding time: 4:24.266, 91.78x realtime
2302133736 bytes

-8p
Total encoding time: 3:40.109, 110.19x realtime
2301560531 bytes
Total encoding time: 4:22.265, 92.48x realtime
2300997902 bytes

So -5 and -8e are the only presets where the fast-math build gives (marginally) smaller files.

Not completely on topic but something interesting:
https://twitter.com/foone/status/1126996260026605568?lang=en
https://github.com/flyinghead/flycast/issues/644
Some 15 years ago I reported a similar issue on another Dreamcast emulator (Makaron) and the author talked about the same thing: precision differences between the Hitachi SH-4 CPU and generic x86 CPUs.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 14:53:15
This build is quite slow at decoding to WAV.
No difference here compared with older versions, or even Case's 12.2.0 build.

Total:
  Decoded length: 21:29:32.933
  Opening time: 0:00.001
  Decoding time: 0:48.545
  Speed (x realtime): 1593.811
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-13 14:58:48
Total:
  Decoded length: 21:29:32.933
  Opening time: 0:00.001
  Decoding time: 0:48.545
  Speed (x realtime): 1593.811
Looks like a foo_benchmark report but foobar does not use flac.exe to decode files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-13 15:11:40
Ha! Didn't know that.
In frontah a 1.6GB flac decodes to wav in ~10 sec. with the Case and my manyflag fast-math version. I don't see a problem. You may have a better test.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-13 20:52:39
Okay, I've done some testing with the last commit removed, which is now current flac git. See the image below

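[Image: encoding speed vs. compression for three FLAC 1.4.1 compiles and current git, GCC 12.2, presets -8 to -4]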

What you see here is three different compiles of FLAC 1.4.1 and one of current git, all with GCC 12.2. Each line has, from left to right, presets -8, -7, -6, -5 and -4.

The light blue one is FLAC 1.4.1 as is, quite similar to what can be downloaded from xiph.org. The dark blue line adds -march=x86-64-v3. That is a 'vendor-neutral' shorthand for including all SSE, AVX, AVX2 and FMA3 instruction set extensions. Most recent CPUs (less than about 6 years old) have those. You can see that is a bit faster, but not much. If you add --disable-asm-optimizations, you get the red line, which is much slower. This is because GCC isn't able to properly optimize.

Now, with the recent changes, you get the green line when using those last options: disabling of specially crafted SSE/AVX/FMA routines so the compiler can try to do better, combined with saying it can use all SSE, AVX, AVX2 and FMA.

For -8 the difference is rather small but for other presets it is quite interesting.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-13 21:31:56
The graph cannot show small size differences (even the dots obscure them). But are the sizes so close to equal that we can be pretty sure the builds do (as good as) the same thing, per file?
In other words, can we be sure that none of the compiles makes round-offs in the model estimation that lead it to select a less elaborate compression?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-14 06:24:57
There will always be round-off differences, but these are no longer significant. With the builds posted here recently, a difference was clearly visible in my graphs. I don't have them around anymore, though.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-14 08:08:17
For -8 the difference is rather small

In part because the first axis is speed (not time). It seems that if green is X seconds faster than blue at -7, then it would be about X seconds faster at -8 as well. And not far from X for -5 either. (Quick and dirty "calculations" from quick and dirty graph reading - you got the actual times?)
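Porcus's point about the axis can be made concrete. Speed is audio length divided by encode time, so saving the same number of seconds is a bigger relative speed-up at a fast preset than at a slow one. The Python sketch below uses made-up numbers, not values read off the graph.

```python
def speed_factor(audio_s: float, encode_s: float) -> float:
    """Realtime factor: seconds of audio encoded per second of wall-clock time."""
    return audio_s / encode_s

AUDIO_S = 3600.0   # hypothetical: one hour of audio
SAVED_S = 2.0      # hypothetical: the faster build saves the same 2 s at both presets

for preset, slow_s in (("-7", 10.0), ("-8", 15.0)):   # hypothetical baseline times
    before = speed_factor(AUDIO_S, slow_s)
    after = speed_factor(AUDIO_S, slow_s - SAVED_S)
    print(f"{preset}: {before:.0f}x -> {after:.0f}x ({(after / before - 1) * 100:+.0f}%)")
```

The same 2-second saving shows up as +25% at the faster preset but only about +15% at the slower one.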

Anyway: does this give any information on what part of the job the fast compile actually does fast?
(Does it matter? Maybe not, but one is curious eh?)

There will always be round-off differences, but these are no longer significant.
Looks that way! But visuals sometimes make for optical illusions.

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-14 11:30:37
In part because the first axis is speed (not time). It seems that if green is X seconds faster than blue at -7, then it would be about X seconds faster at -8 as well. And not far from X for -5 either. (Quick and dirty "calculations" from quick and dirty graph reading - you got the actual times?)

Anyway: does this give any information on what part of the job the fast compile actually does fast?
(Does it matter? Maybe not, but one is curious eh?)
The pixel resolution is good up to .002% size difference and I am seeing something like 190/195x vs 210x in -8. Relative speed differences may vary in different CPUs so I would rather do my own tests later on.

In general, suppose the compiler is allowed to make any change that would be correct with unlimited precision, such as reordering floating-point additions. Then additional steps should be taken to keep intermediate values within a smaller range, to avoid a significant loss of accuracy.
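-ffast-math lets GCC reassociate floating-point operations (it implies -fassociative-math). This is exactly the "correct with unlimited precision" kind of change. The minimal Python illustration below shows why grouping matters when intermediate values span a large range. Here math.fsum stands in for "additional steps" in general; it is not what libFLAC does.

```python
import math

big = 1e16    # above 2**53, so neighbouring doubles here are 2 apart
small = 1.0

# Same three terms, two groupings: with unlimited precision both would be 1.0.
left = (big + small) - big    # small is rounded away when added to big first
right = small + (big - big)   # big cancels first, so small survives
print(left, right)            # 0.0 1.0

# A large intermediate value swallowing many small terms in a plain running sum.
values = [big] + [small] * 10 + [-big]
naive = 0.0
for v in values:
    naive += v                # each +1.0 is lost against 1e16
print(naive, math.fsum(values))  # 0.0 10.0 (fsum keeps track of the lost low-order bits)
```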
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-14 15:24:00
Size is back to 1.4.1 levels with current git, and speed is in between.
-8 -p is now at ~123x and ~1199x realtime.
reference libFLAC git-0665053c 20221013 attached.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-14 16:43:53
fast-math-noasm-manyflags-haswell-git
Same test condition as Reply #155 (https://hydrogenaud.io/index.php/topic,123025.msg1017484.html#msg1017484)

-5
Total encoding time: 0:41.406, 585.79x realtime
2314954944 bytes

-6
Total encoding time: 0:51.437, 471.55x realtime
2310700076 bytes

-7
Total encoding time: 1:00.016, 404.14x realtime
2304028346 bytes

-8
Total encoding time: 1:23.453, 290.64x realtime
2302612081 bytes

-8 -A "tukey(7e-1);subdivide_tukey(3/2e-1)"
Total encoding time: 1:33.125, 260.46x realtime
2302352273 bytes

-8e
Total encoding time: 4:06.922, 98.23x realtime
2302133843 bytes

-8p
Total encoding time: 3:58.109, 101.86x realtime
2300997970 bytes

So around 10-20% faster than Case GCC 12.2.0 with almost identical file sizes.
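Recomputing the per-preset ratios from the two single-thread tables gives a somewhat wider spread. The Case GCC 12.2.0 figures are the lower lines of Reply #155. The speed-up runs from roughly 7% (-8e) to roughly 24% (-5), with size differences of at most about a hundred bytes. Below is a throwaway Python sketch with the numbers typed in by hand; the "-8 -A" key stands for -8 -A "tukey(7e-1);subdivide_tukey(3/2e-1)".

```python
# (speed x realtime, output bytes): Case GCC 12.2.0 first, then
# fast-math-noasm-manyflags-haswell-git; single thread, same 31-file corpus.
results = {
    "-5":    ((472.27, 2_314_955_036), (585.79, 2_314_954_944)),
    "-6":    ((393.30, 2_310_700_059), (471.55, 2_310_700_076)),
    "-7":    ((346.04, 2_304_028_372), (404.14, 2_304_028_346)),
    "-8":    ((249.57, 2_302_612_016), (290.64, 2_302_612_081)),
    "-8 -A": ((226.55, 2_302_352_189), (260.46, 2_302_352_273)),
    "-8e":   ((91.78, 2_302_133_736), (98.23, 2_302_133_843)),
    "-8p":   ((92.48, 2_300_997_902), (101.86, 2_300_997_970)),
}

speedups = {}     # percent faster than the Case build
size_deltas = {}  # bytes larger (+) or smaller (-) than the Case build
for preset, ((case_x, case_b), (git_x, git_b)) in results.items():
    speedups[preset] = (git_x / case_x - 1) * 100
    size_deltas[preset] = git_b - case_b
    print(f"{preset:6} {speedups[preset]:+5.1f}% speed  {size_deltas[preset]:+4d} bytes")
```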
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-14 18:19:58
Time for 1.4.2?  ;)
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-14 19:08:42
@ktf: Pardon my ignorance, just to make sure I understood what's going on right now:
There was a coding error that crept in which you found and corrected as reported in #154.
This glitch resulted in higher encoding speed at slightly worse compression. After the correction, the encoded file size is back to what is expected, and we lost some speed.
-> But the FLACs produced by the faulty binary are still fine? (At least their audio MD5 matches that of the files from the reference encoder.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-14 20:10:49
Time for 1.4.2?  ;)
Yes, but for another reason: https://github.com/xiph/flac/issues/471

But the FLACs produced by the faulty binary are still fine?
Yes, nothing wrong with those.
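This is why a different file size doesn't mean a different result. The MD5 stored in a FLAC file's STREAMINFO block is computed over the unencoded audio samples, not the compressed bytes. Two builds that pick different predictors therefore still agree, as long as they decode to the same samples. Below is a Python sketch of that kind of digest for 16-bit stereo. It assumes, as I understand the format description, interleaved, signed, little-endian samples. The sample values are made up. In practice, `flac -t` checks a file against its stored MD5, and `metaflac --show-md5sum` prints it.

```python
import hashlib
import struct

def pcm16_md5(frames: list[tuple[int, int]]) -> str:
    """MD5 over interleaved, little-endian, signed 16-bit stereo samples
    (the decoded-audio digest idea; 16-bit stereo case only)."""
    md5 = hashlib.md5()
    for left, right in frames:
        md5.update(struct.pack("<hh", left, right))
    return md5.hexdigest()

# Hypothetical decoded output of two encoder builds: the .flac files may differ
# in size, but lossless decoding must give back exactly the same samples.
decoded_by_build_a = [(0, 0), (1200, -1199), (-32768, 32767)]
decoded_by_build_b = [(0, 0), (1200, -1199), (-32768, 32767)]
corrupted = [(0, 0), (1200, -1198), (-32768, 32767)]

print(pcm16_md5(decoded_by_build_a) == pcm16_md5(decoded_by_build_b))  # True
print(pcm16_md5(decoded_by_build_a) == pcm16_md5(corrupted))           # False
```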
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-15 00:40:07
@ktf - whenever I specify the CFLAGS or CXXFLAGS environment variables, I see some of the default optimizations, such as -O3 and -funroll-loops, disappear from the Makefile. I also see that my flags are prepended, so any conflicting flags get overridden by the defaults that follow.


So after messing with compile flags, these seem to get me the fastest encode times.  There may be others I haven't tried that might improve things further.  Using gcc-11 seemed to have a slight edge over gcc-12.  I also have to manually edit the Makefile to remove the -fstack-protector-strong flags.
Code: [Select]
export CC="/usr/bin/gcc-11"  
export CXX="/usr/bin/g++-11"
export CFLAGS="-march=native -O3 -funroll-loops -pipe -flto -fomit-frame-pointer -fno-stack-protector"
export CXXFLAGS="-march=native -O3 -funroll-loops -pipe -flto -fomit-frame-pointer -fno-stack-protector"
export LDFLAGS="-Wl,-s"
./configure --disable-asm-optimizations --disable-altivec

flac v1.3.4 - default Debian build.
Code: [Select]
time flac -V -8 *.wav
File-01.wav: Verify OK, wrote 24439771 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24699891 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39523493 bytes, ratio=0.701
File-04.wav: Verify OK, wrote 40045704 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15922434 bytes, ratio=0.366

real 0m4.398s
user 0m4.144s
sys 0m0.244s

 
time flac -V -8 -e *.wav
File-01.wav: Verify OK, wrote 24435335 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24662789 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39491754 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40039188 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15782279 bytes, ratio=0.363

real 0m10.772s
user 0m10.523s
sys 0m0.240s


time flac -V -8 -p *.wav
File-01.wav: Verify OK, wrote 24419397 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24686505 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39490413 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40014362 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15900707 bytes, ratio=0.366

real 0m9.328s
user 0m9.006s
sys 0m0.316s


time flac -V -8 -e -p *.wav
File-01.wav: Verify OK, wrote 24415349 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24649517 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39456681 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40008204 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15754777 bytes, ratio=0.362

real 1m0.028s
user 0m59.751s
sys 0m0.276s


time flac -V -8 -e -p -b 2304 *.wav
File-01.wav: Verify OK, wrote 24435705 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24730440 bytes, ratio=0.563
File-03.wav: Verify OK, wrote 39359937 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40034300 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15796351 bytes, ratio=0.363

real 1m15.970s
user 1m15.594s
sys 0m0.376s

 
time flac -V -8 -b 2304 *.wav
File-01.wav: Verify OK, wrote 24465811 bytes, ratio=0.728
File-02.wav: Verify OK, wrote 24780187 bytes, ratio=0.564
File-03.wav: Verify OK, wrote 39445204 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40077690 bytes, ratio=0.819
File-05.wav: Verify OK, wrote 15970568 bytes, ratio=0.367

real 0m4.687s
user 0m4.353s
sys 0m0.320s

 
time flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav
File-01.wav: Verify OK, wrote 24439771 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24699891 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39523493 bytes, ratio=0.701
File-04.wav: Verify OK, wrote 40045704 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15922434 bytes, ratio=0.366

real 0m4.421s
user 0m4.119s
sys 0m0.292s


flac git-0665053c 20221013
Code: [Select]
time flac -V -8 *.wav
File-01.wav: Verify OK, wrote 24434850 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24616849 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39438963 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40041267 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15382115 bytes, ratio=0.354

real 0m4.620s
user 0m4.294s
sys 0m0.292s
 

time flac -V -8 -e *.wav
File-01.wav: Verify OK, wrote 24431769 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24615103 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39419532 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40036367 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15367594 bytes, ratio=0.354

real 0m12.232s
user 0m11.943s
sys 0m0.276s
 

time flac -V -8 -p *.wav
File-01.wav: Verify OK, wrote 24416121 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24604533 bytes, ratio=0.560
File-03.wav: Verify OK, wrote 39416318 bytes, ratio=0.699
File-04.wav: Verify OK, wrote 40010908 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15358013 bytes, ratio=0.353

real 0m11.245s
user 0m10.935s
sys 0m0.293s
 

time flac -V -8 -e -p *.wav
File-01.wav: Verify OK, wrote 24412782 bytes, ratio=0.726
File-02.wav: Verify OK, wrote 24602298 bytes, ratio=0.560
File-03.wav: Verify OK, wrote 39387742 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40005678 bytes, ratio=0.817
File-05.wav: Verify OK, wrote 15343571 bytes, ratio=0.353

real 1m19.520s
user 1m18.936s
sys 0m0.408s
 

time flac -V -8 -e -p -b 2304 *.wav
File-01.wav: Verify OK, wrote 24433519 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24697968 bytes, ratio=0.562
File-03.wav: Verify OK, wrote 39305681 bytes, ratio=0.697
File-04.wav: Verify OK, wrote 40031779 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15459132 bytes, ratio=0.356

real 1m45.081s
user 1m44.448s
sys 0m0.416s
 

time flac -V -8 -b 2304 *.wav
File-01.wav: Verify OK, wrote 24460714 bytes, ratio=0.728
File-02.wav: Verify OK, wrote 24720984 bytes, ratio=0.563
File-03.wav: Verify OK, wrote 39357368 bytes, ratio=0.698
File-04.wav: Verify OK, wrote 40072725 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15503783 bytes, ratio=0.357

real 0m4.857s
user 0m4.491s
sys 0m0.356s
 

time flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav
File-01.wav: Verify OK, wrote 24438161 bytes, ratio=0.727
File-02.wav: Verify OK, wrote 24619892 bytes, ratio=0.561
File-03.wav: Verify OK, wrote 39500724 bytes, ratio=0.700
File-04.wav: Verify OK, wrote 40043969 bytes, ratio=0.818
File-05.wav: Verify OK, wrote 15389983 bytes, ratio=0.354

real 0m5.573s
user 0m5.337s
sys 0m0.225s

Compression results. File names have the version and options used appended,
e.g. E = -e, P = -p, B = -b 2304, A = -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"
Code: [Select]
24412782 File-01.flac-1.4.1-EP
24415349 File-01.flac-1.3.4-EP
24416121 File-01.flac-1.4.1-P
24419397 File-01.flac-1.3.4-P
24431769 File-01.flac-1.4.1-E
24433519 File-01.flac-1.4.1-EPB
24434850 File-01.flac-1.4.1
24435335 File-01.flac-1.3.4-E
24435705 File-01.flac-1.3.4-EPB
24438161 File-01.flac-1.4.1-A
24439771 File-01.flac-1.3.4
24439771 File-01.flac-1.3.4-A
24460714 File-01.flac-1.4.1-B
24465811 File-01.flac-1.3.4-B
33605420 File-01.wav
 
24602298 File-02.flac-1.4.1-EP
24604533 File-02.flac-1.4.1-P
24615103 File-02.flac-1.4.1-E
24616849 File-02.flac-1.4.1
24619892 File-02.flac-1.4.1-A
24649517 File-02.flac-1.3.4-EP
24662789 File-02.flac-1.3.4-E
24686505 File-02.flac-1.3.4-P
24697968 File-02.flac-1.4.1-EPB
24699891 File-02.flac-1.3.4
24699891 File-02.flac-1.3.4-A
24720984 File-02.flac-1.4.1-B
24730440 File-02.flac-1.3.4-EPB
24780187 File-02.flac-1.3.4-B
43911884 File-02.wav
 
39305681 File-03.flac-1.4.1-EPB
39357368 File-03.flac-1.4.1-B
39359937 File-03.flac-1.3.4-EPB
39387742 File-03.flac-1.4.1-EP
39416318 File-03.flac-1.4.1-P
39419532 File-03.flac-1.4.1-E
39438963 File-03.flac-1.4.1
39445204 File-03.flac-1.3.4-B
39456681 File-03.flac-1.3.4-EP
39490413 File-03.flac-1.3.4-P
39491754 File-03.flac-1.3.4-E
39500724 File-03.flac-1.4.1-A
39523493 File-03.flac-1.3.4
39523493 File-03.flac-1.3.4-A
56417468 File-03.wav
 
40005678 File-04.flac-1.4.1-EP
40008204 File-04.flac-1.3.4-EP
40010908 File-04.flac-1.4.1-P
40014362 File-04.flac-1.3.4-P
40031779 File-04.flac-1.4.1-EPB
40034300 File-04.flac-1.3.4-EPB
40036367 File-04.flac-1.4.1-E
40039188 File-04.flac-1.3.4-E
40041267 File-04.flac-1.4.1
40043969 File-04.flac-1.4.1-A
40045704 File-04.flac-1.3.4
40045704 File-04.flac-1.3.4-A
40072725 File-04.flac-1.4.1-B
40077690 File-04.flac-1.3.4-B
48963980 File-04.wav
 
15343571 File-05.flac-1.4.1-EP
15358013 File-05.flac-1.4.1-P
15367594 File-05.flac-1.4.1-E
15382115 File-05.flac-1.4.1
15389983 File-05.flac-1.4.1-A
15459132 File-05.flac-1.4.1-EPB
15503783 File-05.flac-1.4.1-B
15754777 File-05.flac-1.3.4-EP
15782279 File-05.flac-1.3.4-E
15796351 File-05.flac-1.3.4-EPB
15900707 File-05.flac-1.3.4-P
15922434 File-05.flac-1.3.4
15922434 File-05.flac-1.3.4-A
15970568 File-05.flac-1.3.4-B
43467356 File-05.wav
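To rerun any of the variants above, the suffix letters can be turned back into a flac command line. A minimal Python sketch: `options_for` and `SUFFIX_FLAGS` are hypothetical names, B and A come from the legend in the post, while E = -e, P = -p and the -8 base preset are assumptions based on flac's option names and the commands shown earlier.

```python
# Hypothetical helper, not part of flac. B and A are defined in the post;
# E -> -e, P -> -p and the -8 base preset are assumptions.
SUFFIX_FLAGS = {
    "E": ["-e"],
    "P": ["-p"],
    "B": ["-b", "2304"],
    "A": ["-A", "tukey(5e-1);partial_tukey(2);punchout_tukey(3)"],
}

def options_for(name):
    """Return (version, flac options) for a name like 'File-01.flac-1.4.1-EPB'."""
    parts = name.split(".flac-", 1)[1].split("-")
    version = parts[0]
    letters = parts[1] if len(parts) > 1 else ""  # no suffix means plain -8
    opts = ["-8"]
    for letter in letters:
        opts += SUFFIX_FLAGS[letter]
    return version, opts
```

The options come back as an argument list, which suits `subprocess.run` without any quoting. Typed into a shell instead, the -A string must be quoted because of the semicolons, as in the `time flac ...` commands above.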
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 12:22:34
It would be interesting to check the "fake hi-res improvement" as well. Same corpus as Reply #155 (https://hydrogenaud.io/index.php/topic,123025.msg1017484.html#msg1017484). Speed not shown as I don't have enough RAM drive space.

RetroArch 88200Hz, highest quality, 24-bit, no dither, no RG (i.e. with intersample over induced clipping)

Case GCC 12.2.0 (-8)
5606032877 bytes

fast-math-noasm-manyflags-haswell-git (-8)
5606121407 bytes

v1.3.2 Xiph x64 (-8)
7179114110 bytes

v1.3.2 (-8p)
7174033037 bytes

v1.3.2 (-8e)
7026991603 bytes

RetroArch 88200Hz, lower quality, 24-bit, no dither, with RG (i.e. no clipping but with some ultrasonic leakage)

Case GCC 12.2.0 (-8)
6946871439 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6946870597 bytes

v1.3.2 (-8)
7200099834 bytes

v1.3.2 (-8p)
7196121884 bytes

v1.3.2 (-8e)
7138462138 bytes

RetroArch 88200Hz, normal quality, 24-bit, no dither, with RG (i.e. no clipping but with minor ultrasonic leakage)

Case GCC 12.2.0 (-8)
6231623126 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6231620978 bytes

Apart from the free ones, I don't have many "real" hi-res files to test. Files above 48kHz may use -b values beyond 4608, so others could test that too.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-15 14:49:11
One tiny thing: files generated by the git version are a few bits larger, only because of the longer version string.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 15:09:59
In that case the file size difference should be consistent across settings, but take, for example, the post right above it:

RetroArch 88200Hz, normal quality, 24-bit, no dither, with RG (i.e. no clipping but with minor ultrasonic leakage)

Case GCC 12.2.0 (-8)
6231623126 bytes

fast-math-noasm-manyflags-haswell-git (-8)
6231620978 bytes

The git version is 2148 bytes smaller, while in another test:

RetroArch 88200Hz, highest quality, 24-bit, no dither, no RG (i.e. with intersample over induced clipping)

Case GCC 12.2.0 (-8)
5606032877 bytes

fast-math-noasm-manyflags-haswell-git (-8)
5606121407 bytes

The git version is 88530 bytes bigger. Negligible differences, but not consistent.
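A quick check of the arithmetic: if the longer version string were the only difference, "git minus Case" would be the same small positive number in every run. A few lines of Python over the sizes quoted above show the sign flipping instead, which presumably points to differences in the encoded audio itself.

```python
# -8 totals in bytes, copied from the two RetroArch comparisons above.
runs = {
    "highest quality, no RG": {"case": 5606032877, "git": 5606121407},
    "normal quality, with RG": {"case": 6231623126, "git": 6231620978},
}

# Positive delta = git build produced the bigger output.
deltas = {label: sizes["git"] - sizes["case"] for label, sizes in runs.items()}
for label, delta in deltas.items():
    print(f"{label}: git - Case = {delta:+d} bytes")
```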
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-15 15:33:47
Sorry, that was not related to these files especially. The longer git string is 6 or 7 bits. Didn't check it further.

This is for the 18.6GB of -8 -p HiBitrate files I used for RG testing:

19.713.863.889 Bytes flac 1.4.1
19.713.858.079 Bytes fast-math-noasm-manyflags-haswell-git
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 18:27:46
http://www.2l.no/hires/
Code: [Select]
http://www.lindberg.no/hires/test/2L-145/2L-45_stereo_01_FLAC_352k_24b.flac
http://www.lindberg.no/hires/test/2L-139/2L-139_stereo_FLAC_176k_24b_01.flac
http://www.lindberg.no/hires/test/2L-106/2L-106_stereo_PCM-96k_MAGNIFICAT_04.flac
http://www.lindberg.no/hires/test/2L38_01_96kHz.flac

(https://hydrogenaud.io/index.php?action=dlattach;attach=23555;image)

Upper: fast-math-noasm-manyflags-haswell-git
Lower: Case GCC 12.2.0
1579150352 bytes PCM size

-8
Total encoding time: 0:31.921, 49.25x realtime
772176807 bytes
Total encoding time: 0:30.422, 51.67x realtime
772176835 bytes

-8e
Total encoding time: 2:08.391, 12.24x realtime
772158544 bytes
Total encoding time: 1:47.328, 14.64x realtime
772158545 bytes

-8p
Total encoding time: 2:39.969, 9.82x realtime
771495942 bytes
Total encoding time: 2:10.266, 12.06x realtime
771495961 bytes

-8 -b16384
Total encoding time: 0:31.125, 50.51x realtime
769125599 bytes
Total encoding time: 0:29.109, 54.01x realtime
769125592 bytes

-8 -b16384 -A subdivide_tukey(5)
Total encoding time: 1:01.484, 25.57x realtime
769070756 bytes
Total encoding time: 0:52.516, 29.93x realtime
769070764 bytes

Case GCC 12.2.0 is consistently faster with these hi-res files on my i3-12100.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-15 20:06:30
Upper: fast-math-noasm-manyflags-haswell-git
Lower: Case GCC 12.2.0
1579150352 bytes PCM size
-8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"
Total encoding time: 0:41.172, 38.18x realtime
769068252 bytes
Total encoding time: 0:36.953, 42.54x realtime
769068236 bytes

Combining cheap windows sometimes can produce good results.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-15 20:35:32
hann is tukey(1) - the tukey windowing is a rectangle with cosine tapering, and when the "rectangle" hits zero width there is only the cosine left: https://en.wikipedia.org/wiki/List_of_window_functions#Tukey_window
So again you got "two very differently tapered tukeys".
flattop is a weirdo, it is even negative somewhere, but at HA fifteen years ago, it would do well in combination with tukey, so it is an obvious "try this if you want another".

For high resolution - and I got a bit of my testing material from 2L too! - I recall that a gauss window sometimes did "surprisingly" well. That is, surprise compared to it being of very little value for CDDA.
Note, welch = parabola and gauss = exp(parabola).
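To see these relationships concretely, here is a small Python sketch using the textbook definitions (as on the Wikipedia list of window functions linked above). It is not taken from libFLAC's source, whose edge indexing may differ slightly; the flattop coefficients are the common five-term set listed on that page. It checks that tukey(1) matches hann, that log(gauss) is a rescaled welch parabola, and that flattop goes below zero.

```python
import math

def tukey(length, alpha):
    """Tapered cosine: alpha=0 gives a rectangle, alpha=1 leaves only the cosine."""
    n_max = length - 1
    taper = alpha * n_max / 2.0  # width of each cosine-tapered edge, in samples
    w = []
    for n in range(length):
        m = min(n, n_max - n)  # distance from the nearer edge
        if m < taper:
            w.append(0.5 * (1.0 - math.cos(math.pi * m / taper)))
        else:
            w.append(1.0)  # flat "rectangle" part
    return w

def hann(length):
    n_max = length - 1
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_max)) for n in range(length)]

def welch(length):
    """Parabola: 1 at the centre, 0 at both ends."""
    half = (length - 1) / 2.0
    return [1.0 - ((n - half) / half) ** 2 for n in range(length)]

def gauss(length, stddev):
    """exp() of a downward parabola; stddev is relative to half the window length."""
    half = (length - 1) / 2.0
    return [math.exp(-0.5 * ((n - half) / (stddev * half)) ** 2) for n in range(length)]

def flattop(length):
    """Five-term flat top window; it dips below zero in its outer regions."""
    a0, a1, a2, a3, a4 = 0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368
    n_max = length - 1
    w = []
    for n in range(length):
        x = 2.0 * math.pi * n / n_max
        w.append(a0 - a1 * math.cos(x) + a2 * math.cos(2 * x)
                 - a3 * math.cos(3 * x) + a4 * math.cos(4 * x))
    return w
```

In flac itself these are selected by name with -A, e.g. tukey(P), hann, welch, gauss(STDDEV) and flattop, and can be stacked with semicolons.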
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-16 02:36:41
Tested a loud 24-96 album and the performance hit is pretty hefty indeed. -8 -p single file.

Case 1.4.1
26.71x realtime

1.4.1 manyflags -Ofast
27.63x realtime

fast-math-noasm-manyflags-haswell-git
21.36x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-16 10:37:31
Oh, 96/24 ... Attached: a few seconds of 96/24, squeezed down to < 2 MB.

Not at all randomly selected: 1.4.1's double precision makes savings like 17 percent at -7, that is YUGE. (But, it is not so that I cherry-picked the best-looking few seconds in the track. Though I did avoid the most dense part.)

* Compared to CDDA, it is easier to beat -p by stacking up with -A [functions]. For the full track, I could beat -p at half the encoding time.
* -b [something] can often make a difference on 96/24, but default looks good on this clip. Also the gains from -r are not jaw-dropping either.

At https://hydrogenaud.io/index.php/topic,120158.msg1003288.html#msg1003288 I used the entire EP for testing, but of course I cannot share more than a clip. Buy it :-)

Music: "Temptation", 2017 remake, by Canadian band The Tea Party (who, name-wise, suffered halfway the same fate as ISIS ...)
Known for "Moroccan roll" style hard rock, however this track is largely industrial synth. I bought the EP from https://teaparty.com/tx20 , (2.99 Canadian $ - and 1.99 in MP3 for those of you who happily let others do the lossless testing). You can listen there or on Spotify, https://open.spotify.com/album/6Q3GV4HsGwPzQ9a2TA8cg0

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-16 11:41:08
Tested a loud 24-96 album and the performance hit is pretty hefty indeed.
If it is unavoidable then I can still use 1.4.1 for hi-res and 1.4.2 for CDDA.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-16 19:18:49
flattop is a weirdo, it is even negative somewhere, but at HA fifteen years ago, it would do well in combination with tukey, so it is an obvious "try this if you want another".
Flattop tends to work better when the upper spectrum is more empty. An example of DSD to flac conversion with different filtering.
-8 -A subdivide_tukey(3/2e-1) with and without flattop

818465962 25kHz flattop.flac
818511667 25kHz.flac (.0055842% bigger)

886963408 Multistage flattop.flac
886996117 Multistage.flac (.0036878% bigger)

====================================

Another example, the "raw DXD" file:
http://www.2l.no/hires/DXD-DSD/index.html
-8 -A subdivide_tukey(3/2e-1) with and without flattop

209196492 JGH flattop.flac
209215084 JGH no flattop.flac (.0088873% bigger)

With optimal -b the effect is even bigger.
-8 -b16384 -A subdivide_tukey(3/2e-1) with and without flattop

207277309 JGH flattop.flac
207300539 JGH no flattop.flac (.0112072% bigger)
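The "% bigger" figures are relative to the flattop file. A small Python helper reproduces them from the byte counts above; `percent_bigger` is just an illustrative name.

```python
def percent_bigger(with_flattop, without_flattop):
    """Size increase of the no-flattop file, in percent of the flattop file."""
    return (without_flattop - with_flattop) / with_flattop * 100.0

# (with flattop, without flattop) in bytes, copied from the results above.
pairs = {
    "25kHz": (818465962, 818511667),
    "Multistage": (886963408, 886996117),
    "JGH": (209196492, 209215084),
    "JGH -b16384": (207277309, 207300539),
}
for label, (flat, plain) in pairs.items():
    print(f"{label}: {percent_bigger(flat, plain):.7f}% bigger")
```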
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-17 09:04:22
The weakness of the git version is 24-bit.
Upper: Case GCC 12.2.0
Lower: fast-math-noasm-manyflags-haswell-git
-8p

16/96
Total encoding time: 0:05.204, 59.04x realtime
Total encoding time: 0:04.922, 62.43x realtime

16/192
Total encoding time: 0:12.406, 24.76x realtime
Total encoding time: 0:11.859, 25.91x realtime

16/352
Total encoding time: 0:26.329, 11.67x realtime
Total encoding time: 0:25.657, 11.97x realtime

24/96
Total encoding time: 0:12.828, 23.95x realtime
Total encoding time: 0:15.469, 19.86x realtime

24/192
Total encoding time: 0:27.688, 11.09x realtime
Total encoding time: 0:34.485, 8.91x realtime

24/352
Total encoding time: 0:53.562, 5.73x realtime
Total encoding time: 1:26.813, 3.53x realtime
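Turning the "Total encoding time" stamps into ratios makes the 16-bit vs 24-bit pattern easier to see. A minimal Python sketch, assuming the stamps are always m:ss.sss (h:mm:ss.sss is accepted too); the dictionary simply copies the -8p numbers from the post above.

```python
def to_seconds(stamp):
    """'1:26.813' -> 86.813; also accepts h:mm:ss.sss."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total

# -8p times: (Case GCC 12.2.0, fast-math-noasm-manyflags-haswell-git)
timings = {
    "16/96": ("0:05.204", "0:04.922"),
    "16/192": ("0:12.406", "0:11.859"),
    "16/352": ("0:26.329", "0:25.657"),
    "24/96": ("0:12.828", "0:15.469"),
    "24/192": ("0:27.688", "0:34.485"),
    "24/352": ("0:53.562", "1:26.813"),
}

ratios = {}
for fmt, (case, git) in timings.items():
    ratios[fmt] = to_seconds(git) / to_seconds(case)  # > 1 means the git build is slower
    print(f"{fmt}: git/Case time ratio = {ratios[fmt]:.2f}")
```

All three 16-bit ratios come out slightly below 1, and all three 24-bit ratios above 1.2.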
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-18 02:38:05
flac git-92928f28 20221017

Clang 16 vs GCC 12.  Both compiled with the same cflags/cxxflags.

Compiled with GCC 12.2.0
Code: [Select]
flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658916991 bytes, ratio=0.600

/usr/local/bin/flac -V -8 the_fragile_album.wav
Encode Time: 1:34.67



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658690092 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -e the_fragile_album.wav
Encode Time: 1:56.36



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658391508 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -p the_fragile_album.wav
Encode Time: 1:55.56



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658115494 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -e -p the_fragile_album.wav
Encode Time: 7:12.51



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659488938 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -b 2304 the_fragile_album.wav
Encode Time: 1:22.54



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659082364 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -A tukey(5e-1);partial_tukey(2);punchout_tukey(3) the_fragile_album.wav
Encode Time: 1:25.99

Clang 16.0.0
Code: [Select]
flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658916991 bytes, ratio=0.600

/usr/local/bin/flac -V -8 the_fragile_album.wav
Encode Time: 1:06.14



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658690092 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -e the_fragile_album.wav
Encode Time: 1:56.12



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658391508 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -p the_fragile_album.wav
Encode Time: 2:00.89



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 658115494 bytes, ratio=0.599

/usr/local/bin/flac -V -8 -e -p the_fragile_album.wav
Encode Time: 11:54.96



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659488936 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -b 2304 the_fragile_album.wav
Encode Time: 0:58.16



flac git-92928f28 20221017
Copyright (C) 2000-2009  Josh Coalson, 2011-2022  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

the_fragile_album.wav: Verify OK, wrote 659082364 bytes, ratio=0.600

/usr/local/bin/flac -V -8 -A tukey(5e-1);partial_tukey(2);punchout_tukey(3) the_fragile_album.wav
Encode Time: 0:51.12
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 06:58:11
Wow, night and day speed differences! Effect of Remove all assembler (https://github.com/xiph/flac/commit/75ef7958df603ca6de29fa00e82615e0da017903) and / or Assume Clang supports x86 intrinsics up to FMA (https://github.com/xiph/flac/commit/90c0562d4eb302b01d9b82c75a7f6a66261c5546)?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-18 13:08:30
Compiling with --disable-asm-optimizations, Clang builds run at half the speed of a pure GCC build on 24-96 files.
Is --disable-asm-optimizations the right way at the moment?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 14:55:36
Ha! Didn't know that.
In frontah a 1.6GB flac decodes to wav in ~10 sec. with the Case and my manyflag fast-math version. I don't see a problem. You may have a better test.
flac 1.4.1 = Case GCC 12.2.0
flac git-0665053c 20221013 = fast-math-noasm-manyflags-haswell-git
8h46m48s single flac file (~3502MB) encoded with -l0 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:20,06

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:23,49

Same file encoded with -l6 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:20,61

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:21,39

Same file encoded with -l12 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:22,18

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:22,95

Same file encoded with --lax -l32 then decoded with -ts

flac 1.4.1
H:\>flac -ts H:\Image.flac
00:00:25,00

flac git-0665053c 20221013
H:\>flac -ts H:\Image.flac
00:00:25,73

The timing was done using this method:
https://stackoverflow.com/a/9938411

I modified the script a bit: drag and drop a flac file onto the cmd file to test it the same way I did. Even preset -5 uses -l8, so the differences should be small in most cases. I don't know whether it is CPU dependent or not.
Title: Re: FLAC v1.4.x Performance Tests
Post by: .halverhahn on 2022-10-18 15:28:26
The timing was done using this method:
https://stackoverflow.com/a/9938411

Or just use the Powershell and the Measure-Command ;)

e.g.:
Code: [Select]
PS C:\TEMP\FLAC141> Measure-Command { .\flac141xiph.exe -8 input.flac -f -o output.flac141xiph.flac | Out-Default }
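For a cross-platform alternative to the batch script or Measure-Command, a minimal Python sketch can time any command line. It records every run separately so a slow first (cold) run stays visible, and it measures wall-clock time, so process startup and disk I/O are included. `time_command` is a hypothetical helper name, and the flac paths in the comment are placeholders.

```python
import subprocess
import sys
import time

def time_command(cmd, runs=2):
    """Run `cmd` (a list of arguments) `runs` times; return wall-clock seconds per run.

    Output is discarded; check=True raises if the command fails.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    # Smoke test: time an empty Python process. For real use, something like
    # time_command([r"H:\flac-case.exe", "-ts", r"H:\image.flac"], runs=3)
    print(time_command([sys.executable, "-c", "pass"]))
```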
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-18 15:58:21
Oh. If now I could get PowerShell to do something as simple as FOR /R %f IN (*.flac) ...

... any help at https://hydrogenaud.io/index.php/topic,123025.msg1017305.html#msg1017305 ?
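This is not PowerShell, but as a portable stand-in for FOR /R %f IN (*.flac), a short Python sketch walks a folder tree and runs one command per matching file. `run_on_tree` and `demo` are hypothetical names. Unlike cmd.exe, this visits files in sorted order, and the flac invocation in the last comment is only an example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_on_tree(root, make_cmd, pattern="*.flac"):
    """Roughly cmd.exe's FOR /R root %f IN (pattern) DO ...: run make_cmd(path)
    for every file under root matching pattern, recursing into subfolders."""
    results = []
    for path in sorted(Path(root).rglob(pattern)):
        proc = subprocess.run(make_cmd(path))
        results.append((path, proc.returncode))
    return results

def demo():
    """Self-check: two nested .flac dummies and one .wav, a no-op command per match."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("a/one.flac", "b/c/two.flac", "b/skip.wav"):
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        return run_on_tree(tmp, lambda p: [sys.executable, "-c", "pass"])

# Real use might look like: run_on_tree(".", lambda p: ["flac", "-ts", str(p)])
```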
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-18 16:05:18
Speed differences when decoding lower -l are real.

-l0

PS H:\> measure-command {h:\flac-case -ts image.flac | out-default}
TotalSeconds      : 20.2462909

PS H:\> measure-command {h:\flac-git -ts image.flac | out-default}
TotalSeconds      : 23.5196237

-l8

PS H:\> measure-command {h:\flac-case -ts image.flac | out-default}
TotalSeconds      : 21.362982

PS H:\> measure-command {h:\flac-git -ts image.flac | out-default}
TotalSeconds      : 22.1673663
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-20 00:40:29
So it seems that in my test with Clang 16 vs GCC 12, Clang only has the advantage when larger files are written to the storage device. In my case, I'm using a single SSD in a ZFS pool. I noticed that when encoding with the GCC-compiled version, times were consistent within 1 second over multiple rounds of the same test; with the Clang-compiled version, however, times varied more between rounds. I wonder why Clang seems to do better when files are written to a disk...

Each round of test was done in order with these options:
Code: [Select]
flac -d *.flac
flac -V -8 *.wav
flac -V -8 -e *.wav
flac -V -8 -p *.wav
flac -V -8 -e -p *.wav
flac -V -8 -b 2304 *.wav
flac -V -8 -A "tukey(5e-1);partial_tukey(2);punchout_tukey(3)" *.wav


Nine Inch Nails - The Fragile (single file for whole album)
GCC 12.2.0 - R/W to disk
Code: [Select]
Decode Time: 0:37.55 
Encode Time: 1:34.83
Encode Time: 1:56.33
Encode Time: 1:53.33
Encode Time: 7:09.77
Encode Time: 1:30.12
Encode Time: 1:24.01

Nine Inch Nails - The Fragile (single file for whole album)
Clang 16.0.0 - R/W to disk
Code: [Select]
Decode Time: 0:36.79 
Encode Time: 1:19.53
Encode Time: 2:28.14
Encode Time: 2:26.93
Encode Time: 11:54.74
Encode Time: 1:23.08
Encode Time: 0:59.60

Nine Inch Nails - The Fragile (single file for whole album)
GCC 12.2.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.28 
Encode Time: 0:20.43
Encode Time: 0:56.15
Encode Time: 0:53.01
Encode Time: 6:11.10
Encode Time: 0:22.19
Encode Time: 0:24.88

Nine Inch Nails - The Fragile (single file for whole album)
Clang 16.0.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.52 
Encode Time: 0:23.80
Encode Time: 1:27.26
Encode Time: 1:29.84
Encode Time: 10:47.43
Encode Time: 0:24.70
Encode Time: 0:28.96

And because someone mentioned The Tea Party! 
This test mixed the album Transmission as individual tracks and Interzone Mantras as a single file.
GCC 12.2.0 R/W to disk
Code: [Select]
Decode Time: 0:42.11 
Encode Time: 1:18.81
Encode Time: 1:58.11
Encode Time: 1:52.34
Encode Time: 7:26.50
Encode Time: 1:25.81
Encode Time: 1:27.76

Clang 16.0.0 R/W to disk
Code: [Select]
Decode Time: 0:43.53 
Encode Time: 1:27.22
Encode Time: 2:30.46
Encode Time: 2:35.19
Encode Time: 12:20.65
Encode Time: 1:15.82
Encode Time: 1:00.24

GCC 12.2.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.37 
Encode Time: 0:20.91
Encode Time: 0:57.80
Encode Time: 0:53.64
Encode Time: 6:24.09
Encode Time: 0:22.74
Encode Time: 0:25.49

Clang 16.0.0 - R/W to ramdisk
Code: [Select]
Decode Time: 0:05.67 
Encode Time: 0:23.27
Encode Time: 1:29.83
Encode Time: 1:29.42
Encode Time: 11:07.48
Encode Time: 0:25.76
Encode Time: 0:25.12
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 13:35:29
flac1013 = Wombat fast-math-noasm-manyflags-haswell-git
flac1021 = john33 flac-1.4.1-git-6abf272-20221021
flac141 = Case GCC 12.2.0

24-bit transcoding (96-352kHz)

PS H:\> measure-command{h:\flac1013 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  60.2480702


PS H:\> measure-command{h:\flac1021 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  56.8414738


PS H:\> measure-command{h:\flac141 *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  55.1369771


16-bit transcoding (48kHz)

PS H:\> measure-command{h:\flac1013 *.flac -fs8}|select totalseconds

TotalSeconds
------------
   76.156825


PS H:\> measure-command{h:\flac1021 *.flac -fs8}|select totalseconds

TotalSeconds
------------
  86.8591274


PS H:\> measure-command{h:\flac141 *.flac -fs8}|select totalseconds

TotalSeconds
------------
  83.7913263


Decoding files encoded with -8 (16-bit 48kHz). For unknown reasons there is always a startup delay on the first, non-repeated decoding command despite using a RAM disk, so both first and second runs are posted.

PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  19.7201548


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  16.9045647


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  20.4929391


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  17.6664767


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  19.6514963


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  16.8442894


Decoding files encoded with -8 -b16384 (24-bit 96-352kHz)


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
   11.280577


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.4720569


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  11.5633101


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.7760674


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  11.2297442


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
   8.4221105
  
  
Decoding files encoded with -l0, mixed bit-depth and sample rate:

PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  14.9447091


PS H:\> measure-command{h:\flac1013 -ts *.flac}|select totalseconds

TotalSeconds
------------
  12.1270033


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
   13.809708


PS H:\> measure-command{h:\flac1021 -ts *.flac}|select totalseconds

TotalSeconds
------------
  10.9962077


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
  13.2682311


PS H:\> measure-command{h:\flac141 -ts *.flac}|select totalseconds

TotalSeconds
------------
   10.469793
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-21 14:37:41
Seems that with the next release, building without asm optimizations is good for 16-bit-only apps like CUETools (besides HDCD).
Attached are current git versions compiled both ways.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 15:12:22
24-bit transcoding (96-352kHz)

PS H:\> measure-command{h:\flac1021wombat *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  55.8933077


PS H:\> measure-command{h:\flac1021wombat-noasm *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  60.1707838

Same files in same sample rates, but 16-bit

PS H:\> measure-command{h:\flac1021wombat *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  37.7704419


PS H:\> measure-command{h:\flac1021wombat-noasm *.flac -fs8 -b16384 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"}|select totalseconds

TotalSeconds
------------
  35.4706544
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 16:07:52
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-10-21 16:09:51
I tested my AMD Ryzen 5 3600X again with different builds to see if there is any encoding speed-up.  8)

Code: [Select]
flac -8p

Source
Code: [Select]
Codec      :     PCM (WAV)
Duration   :     57:21:749
Sample rate:     48000 Hz
Channels   :     2
Bits per sample: 16

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    56.570

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 425812103 bytes, ratio=0,644
Global  Time =    54.792

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    57.293

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    52.017
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-21 16:18:00
@music_1 : If you test -8, -8r8 and -8e rather than -8p, what happens?
Asking because -p brute-forces part of the process, -e a different one. Of course -8 is faster than either (and -8e is not much useful anymore!), but it is interesting to see whether the order of compiles stays the same. If not, then one makes part of the job more efficient and another a different part of the job.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 16:34:33
flac1021znver2john33
24/48
Total encoding time: 1:03.859, 200.51x realtime
16/48
Total encoding time: 0:29.985, 427.03x realtime

My i3-12100 must be a remarked Ryzen  :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: music_1 on 2022-10-21 16:48:00
flac -8

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426124832 bytes, ratio=0,645
Global  Time =    18.246

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426124828 bytes, ratio=0,645
Global  Time =    17.863

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 425812106 bytes, ratio=0,644
Global  Time =    17.792

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426124836 bytes, ratio=0,645
Global  Time =    17.647

flac -8r8

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426124602 bytes, ratio=0,645
Global  Time =    20.921

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426124598 bytes, ratio=0,645
Global  Time =    20.196

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 426124606 bytes, ratio=0,645
Global  Time =    20.341

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426124606 bytes, ratio=0,645
Global  Time =    20.960

flac -8e

flac 1.4.1-win64 Xiph
Code: [Select]
wrote 426050030 bytes, ratio=0,645
Global  Time =    51.351

flac-1.4.1-win64-znver3 (Case)
Code: [Select]
wrote 426050026 bytes, ratio=0,645
Global  Time =    52.222

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=haswell)

Code: [Select]
wrote 426050035 bytes, ratio=0,645
Global  Time =    50.218

FLAC-1.4.1-git-6abf272-20221021 (john33)
Built on October 21, 2022, GCC 12.2.0
(Code Base : 1.4.1) (-Ofast -m64 -march=znver2)

Code: [Select]
wrote 426050035 bytes, ratio=0,645
Global  Time =    51.782
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-21 17:22:26
My i3-12100 must be a remarked Ryzen  :))

The only 12th generation Intel here? Plot thickens.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-21 17:54:47
Thanks. Here are results with some AVX-only builds.

Case GCC 12.2.0 (https://hydrogenaud.io/index.php/topic,123014.msg1016265.html#msg1016265)
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-znver2-GCC1220.zip
Total encoding time: 1:13.328, 29.32x realtime
425513429 bytes

znver3 (https://hydrogenaud.io/index.php/topic,123014.msg1016407.html#msg1016407)
Total encoding time: 1:11.891, 29.91x realtime
425513429 bytes

http://www.rarewares.org/files/lossless/flac-1.4.1-x64-AVX2%20-GCC1220.zip
Total encoding time: 1:12.250, 29.76x realtime
425513472 bytes

Case Haswell (https://hydrogenaud.io/index.php/topic,123014.msg1016228.html#msg1016228)
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

It seems that the Ryzen builds have no compatibility issues with my Intel CPU.
No joke. Looks like a znver3 build would be even better.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 14:35:17
flac 1.4.2 has arrived. Here is my attempt with GCC 12.2.0 and the same flags as before, including one build with disable-asm-optimizations for faster 16-bit encoding, for use in CUETools for example.
btw. configure spits out "unrecognized options: --enable-sse" Is it meant to be that way and the compiler decides? In that case different CPU versions could benefit more from the compiler choices?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-22 15:46:05
btw. configure spits out "unrecognized options: --enable-sse" Is it meant to be that way and the compiler decides? In that case different CPU versions could benefit more from the compiler choices?
--enable-sse was a misnomer. It was actually 'force sse2'. This option has been removed. See here (https://github.com/xiph/flac/issues/486) and the changelog (https://github.com/xiph/flac/blob/master/CHANGELOG.md).

It didn't do anything for 64-bit compiles anyway, only for 32-bit compiles.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 15:49:58
Thanks. Now Clang 15.0.3 (MSYS2) creates faster binaries for the first time. Attached some.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-22 16:32:24
Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.

1000-1177kbps compressed, 119 files, 12h55m50s

-8 -q8
6,305,740,006 bytes

-8 -q9
6,305,212,370 bytes

-8 -q10
6,305,587,831 bytes

-8
6,307,265,332 bytes

950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower-bitrate files are much harder to predict; -p will make more sense there.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-22 18:15:57
Deathblow with -p

24/48, 8 wav files, multi-thread, -8p

flac1021wombat-noasm
Total encoding time: 1:12.704, 176.12x realtime

flac1021wombat
Total encoding time: 0:59.812, 214.08x realtime

Same files but 16/48

flac1021wombat-noasm
Total encoding time: 0:29.735, 430.62x realtime

flac1021wombat
Total encoding time: 0:29.656, 431.77x realtime
flac1021znver2john33
24/48
Total encoding time: 1:03.859, 200.51x realtime
16/48
Total encoding time: 0:29.985, 427.03x realtime

My i3-12100 must be a remarked Ryzen  :))
flac 1.4.1 Case GCC 12.2.0
24/48: 1:02.718, 204.16x realtime
16/48: 0:32.359, 395.70x realtime

flac 1.4.2 Wombat Clang 15.0.3
24/48: 1:01.844, 207.04x realtime
16/48: 0:30.000, 426.82x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-22 18:51:21
Nice. For 16/44.1, GCC 12.2.0 with asm disabled is the fastest. Clang does badly with it disabled. It will be interesting to see how fast Case's builds from his clean environment are. I depend on MSYS2.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-23 02:40:54
Pick your favorites out of the packages :)

Ryzen 5900x
-8 -p single file 24-96
-8 -p single file 16-44.1
metaflac Replaygain 18,6GB Hibitrate files

Clang
28.75x realtime
112.03x realtime
2:20 minutes

GCC
27.97x realtime
106.54x realtime
2:09 minutes

GCC disable-asm-optimizations
21.48x realtime
132.45x realtime
2:09 minutes
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 12:13:26
Looks like -q is quite predictable for high bitrate CDDA transcodes. Here are some files from different lossless formats (ape, flac, tak, tta, wv) transcoded to flac, sorted by their original bitrates.
950-999kbps compressed, 142 files, 10h8m32s

-8 -q9
4,458,162,567 bytes

-8 -q10
4,457,902,583 bytes

-8 -q11
4,458,321,772 bytes

-8
4,459,051,184 bytes

Lower-bitrate files are much harder to predict; -p will make more sense there.
All flac 1.4.2, multi-thread. They all have the same file sizes.

Upper: -8p
4455579306 bytes

Lower: -8 -q10 -b2880
4456433345 bytes

Wombat GCC 12.2.0
Total encoding time: 1:19.875, 457.10x realtime
Total encoding time: 0:28.890, 1263.81x realtime

Wombat GCC 12.2.0 noasm

Total encoding time: 1:23.219, 438.74x realtime
Total encoding time: 0:29.890, 1221.53x realtime

Wombat Clang 15.0.3
Total encoding time: 1:24.375, 432.72x realtime
Total encoding time: 0:29.797, 1225.34x realtime

Xiph
Total encoding time: 1:26.812, 420.58x realtime
Total encoding time: 0:31.203, 1170.13x realtime

Finally different from a real Ryzen.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 12:53:19
24-bit, 88.2-352.8kHz, 14 files, multi-thread, -8p

Wombat GCC 12.2.0
Total encoding time: 1:40.266, 40.59x realtime
1916812600 bytes

Wombat GCC 12.2.0 noasm
Total encoding time: 1:54.453, 35.56x realtime
1916812600 bytes

Wombat Clang 15.0.3
Total encoding time: 1:39.859, 40.76x realtime
1916812631 bytes

Xiph
Total encoding time: 1:41.250, 40.20x realtime
1916812621 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 14:07:37
I'll have a go too.

Here are the results for testing the 'Xiph' Win64 binary

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 19:20:41
High resolution coming up. Prepare to be impressed if you haven't already seen what 1.4.x can do to high resolution.

* No classical music in this corpus - that behaves differently (much smaller benefits from going above -7), so this is arguably a bit 1.4-friendly
* All stereo. Vast majority 96/24; a little bit of it is 88.2, and one track is 96/16.
* 178 files. Nearly all my non-classical high resolution stereo downloads. (DSD test files excluded, but who the hell uses those for anything but WavPack worship?)
* File sizes with tags removed, which I forgot to do for the FLACs, so I removed them afterwards, saw a decrease of 22 670 684 bytes, and adjusted. I think this means padding is removed too. Maybe a bit unfair to the APEv2-tagged formats, which don't need padding, but more interesting for discussing the codecs themselves. (But I used MD5 ... as if that matters much.)
* Everything that isn't stated as a different codec or as 1.3 is 1.4.1 or 1.4.2

64.286% ALAC (refalac 1.75)
63.589% FLAC 1.3 at -5
62.647% TAK -p0
62.058% FLAC 1.3 at -8e
61.799% -3: comfortably beats old -8pe too, not only -8e
61.477% TTA
60.746% Monkey's Normal
60.745% Monkey's High
60.798% WavPack -hx
59.920% -5
59.868% Monkey's Insane
59.742% Monkey's Extra High
59.571% TAK -p1
59.313% MPEG-4 ALS at default
58.979% -7
59.047% TAK -p2
58.673% -8e: faster and compresses better than -8p, but there are better options
58.649% -8pe: avoid this, it takes ten times as long as -8e
58.560% -7r7 -A "subdivide_tukey(6)"
58.525% -7r7 -A "subdivide_tukey(6)" -l 13
58.525% -7r7 -A "subdivide_tukey(6)" -l 14 (yes, slightly smaller than -l 13)
58.489% -7r7 -A "subdivide_tukey(6)" -l 15
58.477% -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 (about -8e speed)
58.475% -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 -b 8192
58.454% -7r7 -A "subdivide_tukey(7);tukey(7e-2)" -l 16 -b 8192
58.059% WavPack -hx4
57.879% TAK -p3
57.753% TAK -p4m
57.412% OptimFROG --preset 2 (default setting)
Evidence from this and some other non-rigorous testing on a part of these 178 files:
* This is damn good, although it's not gonna touch TAK nor WavPack -hx4 (where for the WavPack I tried only those two settings, -hx and -hx4 this time).
Actually, if I take the high-rez part of ktf's comparison (http://audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%205%20-%20hires.html) and "manually" imagine a FLAC improvement like this, it doesn't seem to beat TAK -p1, so this is likely "more FLAC-friendly material" - relatively at least.
* That "all the sevens" thing: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" - with or without -l 14 (which is a "seven" good enough to remember) - were at first arbitrarily chosen to see "what does it take to get around -8e time". As you see, it is much better ... urhmh ... to the extent anything around there is "much".
* -8e beats -8p at both time and size. Maybe surprising - to those who haven't already tested it. ( @bennetng , you just tested hi-rez -8p: how does it work with your material?)
* -l 13 to -l 15 have something to them, but careful: It does not seem to be the case directly off -7 or -8. Say -8 -l 13 is not good, but -8 -A [something slow] -l 13 is. A bit of testing indicates that -l 13 starts saving space at -A subdivide_tukey(5) and -l 14 at (6).
With high-res classical music, -l 13 is the setting that improves over -7.
* -b 8192 also needs "-A [something slow]", it seems also to do harm when applied to -7 or -8 plain. But it doesn't help much here.
* -r7 is a good thing, but not at my high-resolution classical music; there, the sixth and seventh order are seldom used at all. (But at worst I found, -r7 at classical makes for 0.2 parts per million in size and only costs a bit of time, so ... if you want one monster slow setting, include -r7. -r8 doesn't improve much over -r7 it seems.)
* Monkey's wasn't developed for high resolution. Well, we knew that - none of the codecs were. However, on the Planet of the Apes there is nothing to do about it within a given compression mode: there, a given signal yields a given encoded bitstream with no room for improving anything but speed.

Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-23 19:35:24
What are logical values for the x_tukey options?  Seems I can put anything and the encoder accepts it.  I see above you're using (7e-1), I've seen (3/2e-1).  I tried something random like (9/7z-1) and it works.  I've had good results using whole numbers, but what about these other values?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 20:01:38
I tried something random like (9/7z-1) and it works.
It is a bit tricky that the encoder accepts anything and silently drops stuff it doesn't understand.

If you want to know what it does, read the explanation at the bottom of this page: https://xiph.org/flac/documentation_tools_flac.html

TL;DR: for starters, just use whole numbers like subdivide_tukey(5) or something. If you feel like it, you can specify a second fraction between 0 and 1, like subdivide_tukey(5/0.2). The second value is locale-specific, so it becomes subdivide_tukey(5/0,2) on many non-English PCs. Using scientific notation (2e-1) is a way around that. For other apodizations, see the linked document.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-23 20:08:14
* Monkey's wasn't developed for high resolution. Well, we knew that - none of the codecs were. However, on the Planet of the Apes there is nothing to do about it within a given compression mode: there, a given signal yields a given encoded bitstream with no room for improving anything but speed.
If FLAC hadn't been changed in a non-backwards-compatible way back in 2007 (https://www.ietf.org/archive/id/draft-ietf-cellar-flac-07.html#name-addition-of-5-bit-rice-para), FLAC would have done terribly at 24-bit too. Luckily that change went in before Josh left. Now it is way too late to make such a change.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 20:30:17
Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?
Asking because the hdcd.exe utility appeared around 2007. So quite soon there would indeed be quite a few CD-sourced 24 bit (with at least four wasted) files.
And so if that were a problem, it would likely have manifested itself - unless reference FLAC would use Rice-4 on those signals then.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 20:56:41
I've seen (3/2e-1)
Just a point: "/" is not a division slash, it is a separator between arguments, where the first is mandatory.
tukey(P) takes only one argument in, and that is a number between 0 and 1. The subdivide_tukey can be specified as subdivide_tukey(N) and optionally subdivide_tukey(N/P) - but then again, the "/P" has nothing to do with division. As ktf says, for starters stick to N and remember that higher N will slow down.

What does this tukey function do? For the block of the signal - 4096 samples, typically - it keeps the middle 1-P fraction and downweights the beginning and end according to a cosine function. Turns out, it typically gives a much better predictor than not applying any weight - that would be "rectangle".
The subdivide_tukey "generates more functions" in a way that recycles lots of calculations. It takes time to try them all, but it doesn't make for more complicated decoding. They are several simple attempts, and the encoder picks the one that happens to fit best. The decoder doesn't know how hard the encoder tried.
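For anyone who wants to see the shape, here is a minimal Python sketch of a generic Tukey (tapered cosine) window, using the definition above: P is the tapered fraction, split between both ends, P = 0 is "rectangle" and P = 1 is a Hann window. The function names are made up for illustration, and the discretization is the textbook one, not necessarily libFLAC's exact code.

```python
import math


def tukey(n: int, p: float) -> list[float]:
    """Generic Tukey (tapered cosine) window of length n.

    p is the total fraction of the window that is tapered, split equally
    between both ends: p = 0 gives "rectangle", p = 1 gives a Hann window.
    """
    if n < 2 or p <= 0:
        return [1.0] * n
    p = min(p, 1.0)
    taper = p * (n - 1) / 2  # length of each cosine ramp, in samples
    w = [1.0] * n
    for i in range(n):
        if i < taper:  # rising ramp at the start of the block
            w[i] = 0.5 * (1 - math.cos(math.pi * i / taper))
        elif i > (n - 1) - taper:  # falling ramp at the end of the block
            w[i] = 0.5 * (1 - math.cos(math.pi * ((n - 1) - i) / taper))
    return w


def apply_window(block: list[int], window: list[float]) -> list[float]:
    """Weight a block of samples before computing its autocorrelation."""
    return [s * w for s, w in zip(block, window)]


print([round(v, 3) for v in tukey(9, 0.5)])
# [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0]
```

With P = 0.666, as in tukey(666e-3), each cosine ramp covers roughly a third of the block and only the middle third keeps full weight.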
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-23 21:05:30
@bennetng , you just tested hi-rez -8p: how does it work with your material?
Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p. Acoustic / unplugged materials as usual may benefit from higher -b, for 88.2k and above I would try 6144 to 16384 for these genres. Amplified / electronic / loudness war hi-res files still prefer lower -b. When -b is wrong, windowing also works poorly.

I also found something about decoded HDCD, I mean the "real" tracks which really make use of transient filter and peak extension. flac seems pretty good at dealing with HDCD. I used flac -a and see many wasted bits in a decoded HDCD image.

I investigated wasted bits with a normal 16-bit file, using foo_dsp_utility > Scale. 24-bit output with 0.5 scale is essentially a bit shift, so the bitrate remains almost the same, but then I tried 0.75, 0.625 and 0.875 and flac could still find a lot of wasted bits. However, with dither or something like -0.1dB gain, flac can no longer see wasted bits. foo_dsp_utility > Add Noise kills wasted bits as well, e.g. 0.000001 noise with 16 to 24-bit conversion.
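The arithmetic behind these observations fits in a few lines of Python. "Wasted bits" here means trailing zero bits shared by every sample of a block, which is what flac can strip. The sample values and function names below are hypothetical, and the scaling is a plain multiply-and-round without dither, not foo_dsp_utility's actual processing.

```python
def wasted_bits(samples: list[int]) -> int:
    """Count the trailing zero bits shared by all samples in a block.

    ORing the samples keeps every bit that is set in any sample; the number
    of trailing zeros of the result is the shift common to all of them.
    An all-zero block is reported as 0 here.
    """
    acc = 0
    for s in samples:
        acc |= s
    if acc == 0:
        return 0
    return (acc & -acc).bit_length() - 1  # index of the lowest set bit


def scale_16_to_24(samples16: list[int], gain: float) -> list[int]:
    """Convert 16-bit samples to 24-bit (shift by 8) and apply a gain, no dither."""
    return [round(s * 256 * gain) for s in samples16]


block = [1, -3, 1000, 12345, -32768]  # hypothetical 16-bit samples
for gain in (0.5, 0.75, 0.625, 0.875, 10 ** (-0.1 / 20)):
    print(f"gain {gain:.4f}: {wasted_bits(scale_16_to_24(block, gain))} wasted bits")
# prints 7, 6, 5, 5 and 0 wasted bits
```

A 0.5 gain is a 7-bit shift on top of the 8-bit move to 24-bit; 0.75 = 3/4 leaves 6 zero bits, and 0.625 = 5/8 or 0.875 = 7/8 leave 5. A -0.1 dB gain (about 0.9886) produces odd values, so nothing is left to strip. Dither or added noise randomizes the low bits in the same way.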
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-23 21:22:41
Files in Reply #208 contain classical, jazz and pop stuff but in general -b is much more important than -p.
Comparing -8e to -8p?
For CDDA, -8p is better and -8e is (not always (https://hydrogenaud.io/index.php/topic,122949.msg1015422.html#msg1015422)) so much outdone that you wouldn't use it.
For high resolution, -e cannot be so easily written off, at least it is better than -p - in my tests, that is.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-10-24 02:12:16
Comparing -8e to -8p?
Only one of the 20 24-96 albums i use for the Replaygain benchmark comes out smaller with -8 -e vs -8 -p.
The size difference is not so much but speed is indeed.
The GCC version is much faster than the Clang version in multithreaded encoding of these, if anybody wants to know.

-8 -p
233.22x realtime
19.713.844.242 Bytes

-8 -e
292.17x realtime
19.721.033.993 Bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-24 06:25:41
Yeah, maybe I should not ask more questions on that document by now, but ... C3, you set the limit at 16 bits. How about 20?
That bit is directly what libFLAC does too. Seems silly to change it when there are already so many versions out that do this at 16 bit. Also, it would probably hurt compression a bit and ffmpeg doesn't care anyway, it even uses 5-bit Rice parameters for 16-bit audio in extreme cases.
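To illustrate why the parameter width matters at 24-bit, here is a hedged Python sketch of Rice coding cost. It assumes the layout of the FLAC format: the original partitioned Rice method has a 4-bit parameter where 1111 is an escape code, so the largest usable parameter is 14; the 5-bit variant allows up to 30. The residual values are invented, and the sketch ignores the escape code, which lets a partition be stored unencoded instead.

```python
def fold(r: int) -> int:
    """Map signed residuals to unsigned codes: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(residuals: list[int], k: int) -> int:
    """Bits to Rice-code the residuals with parameter k:
    unary quotient (u >> k), one stop bit, then k low-order bits per value."""
    return sum((fold(r) >> k) + 1 + k for r in residuals)


def best_rice_parameter(residuals: list[int], max_k: int) -> int:
    """Exhaustively pick the cheapest parameter in 0..max_k."""
    return min(range(max_k + 1), key=lambda k: rice_bits(residuals, k))


# Hypothetical residuals of the size a noisy 24-bit signal might produce.
res = [200000, -150000, 180000, -220000]
for max_k in (14, 30):  # 4-bit field (15 = escape) vs 5-bit field (31 = escape)
    k = best_rice_parameter(res, max_k)
    print(max_k, k, rice_bits(res, k))
# 14 14 149
# 30 18 80
```

With the parameter capped at 14, every residual pays a long unary prefix; with 16-bit audio the residuals are rarely large enough for that cap to bite.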
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 07:56:10
Here is what I can do with -e: resample a CDDA album to 24/88.2 with different resamplers. Care was taken to avoid clipping.
Code: [Select]
1052063534 SoX best 8p.flac
1050564778 SoX best 8e.flac

1214873052 RetroArch normal 8p.flac
1215123562 RetroArch normal 8e.flac

1335883144 RetroArch lower 8p.flac
1336339394 RetroArch lower 8e.flac
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 08:06:48
This is slightly puzzling. -e being helpful in particular when there is no high-frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 08:22:49
No contradiction because the difference is night and day when compared to 1.3.x. No -p or -e.

1052844799 SoX best flac 142 -8.flac
1257694431 SoX best flac 134 -8.flac
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2022-10-24 09:39:54
Don't want to crash the hires party, but here are some test results with v1.4.2 binaries floating around here.
As always, CPU is Intel Core i7-8700 CPU @ 3.20GHz, test corpus is mostly classic rock CDDA material.
Code: [Select]
FLAC Binary: xiph-142\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.940 seconds (5 rounds), Encoding speed = 386.97x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-noasm-wombat_2022-10-23.exe (665600 bytes)
FLAC Option: -7
- Average time =  22.560 seconds (5 rounds), Encoding speed = 479.25x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-gcc1220-Ofast+manyflags-wombat_2022-10-23.exe (737280 bytes)
FLAC Option: -7
- Average time =  25.519 seconds (5 rounds), Encoding speed = 423.68x
- FLAC size = 1.167.014.374 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac142-x64-Clang1503-Ofast-wombat_2022-10-23.exe (613376 bytes)
FLAC Option: -7
- Average time =  24.699 seconds (5 rounds), Encoding speed = 437.75x
- FLAC size = 1.167.014.372 bytes (= 61,188% of WAV size, ~863 kbps)
So here wombat's gcc build w/noasm is the fastest, on-par with his 141-fastmath build from 2022-10-14.
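The numbers in logs like this one are tied together by simple arithmetic, which is handy for sanity-checking results. The following is a small Python sketch that assumes 16-bit/44.1 kHz stereo input and ignores the WAV header; the helper names are made up:

```python
# 16-bit, 44.1 kHz, stereo PCM: 44100 * 16 * 2 bits per second
CDDA_KBPS = 44100 * 16 * 2 / 1000  # 1411.2 kbit/s


def flac_kbps(ratio: float, pcm_kbps: float = CDDA_KBPS) -> float:
    """Average FLAC bitrate from the compression ratio (FLAC size / WAV size)."""
    return ratio * pcm_kbps


def audio_seconds(elapsed_s: float, speed_x: float) -> float:
    """Length of the encoded audio implied by wall time and 'x realtime' speed."""
    return elapsed_s * speed_x


def speed_x(audio_s: float, elapsed_s: float) -> float:
    """Realtime factor: seconds of audio encoded per second of wall time."""
    return audio_s / elapsed_s


corpus_s = audio_seconds(27.940, 386.97)      # Xiph -7 run from the log above
print(round(flac_kbps(0.61188)))               # 863, matching "~863 kbps"
print(round(corpus_s / 3600, 2))               # about 3 hours of CDDA
print(round(speed_x(corpus_s, 22.560), 2))     # 479.25, the noasm build's figure
```

The other speed figures follow the same way: for example, 25.519 s for the same corpus corresponds to about 423.68x.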
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-24 10:26:56
This is slightly puzzling. -e being helpful in particular when there is no high-frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
The improvements in 1.4.0 weren't related to the guesstimation. 1.4.0 improved the accuracy with which predictors were formed; the guesstimation (which -e circumvents by brute-forcing) is about which order to pick.

FLAC's LPC encoding works by first calculating autocorrelation. These calculated autocorrelation numbers are then crunched to form a set of predictors, one for each prospective LPC order (for preset 8 that is order 1 through 12). These predictors have become more accurate with the release of 1.4.0. However, the encoder still has to guess which order will result in the smallest representation. This guesstimation remains unchanged.
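Here is a rough Python sketch of the pipeline described above: autocorrelation, one predictor per order via the textbook Levinson-Durbin recursion, then either a guess from the error estimates or a brute-force pass that actually encodes every order (the idea behind -e). The bit estimate used for the guess, the crude single-parameter Rice cost, the per-order overhead and all function names are assumptions for illustration, not libFLAC's code.

```python
import math
import random


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Plain (unwindowed) autocorrelation r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], max_order: int):
    """Textbook Levinson-Durbin: one predictor per order 1..max_order,
    plus the prediction error energy left at each order."""
    a: list[float] = []
    err = r[0]
    predictors, errors = [], []
    for p in range(1, max_order + 1):
        if err <= 0:
            break
        k = (r[p] - sum(a[j] * r[p - 1 - j] for j in range(p - 1))) / err
        a = [a[j] - k * a[p - 2 - j] for j in range(p - 1)] + [k]
        err *= 1 - k * k
        predictors.append(a)
        errors.append(err)
    return predictors, errors


def residual(x: list[int], coefs: list[float]) -> list[int]:
    """Integer residual: sample minus rounded prediction from the previous samples."""
    p = len(coefs)
    return [x[n] - round(sum(c * x[n - 1 - j] for j, c in enumerate(coefs)))
            for n in range(p, len(x))]


def rice_cost(res: list[int]) -> int:
    """Crude coded size: best single Rice parameter for the whole residual."""
    u = [2 * e if e >= 0 else -2 * e - 1 for e in res]
    return min(sum((v >> k) + 1 + k for v in u) for k in range(31))


def pick_orders(x: list[int], max_order: int = 8, overhead_per_order: int = 15 + 16):
    """Return (guessed order, brute-forced order, actual cost per order).
    overhead_per_order is an assumed 15-bit coefficient plus a 16-bit warm-up sample."""
    preds, errs = levinson_durbin(autocorrelation(x, max_order), max_order)
    n = len(x)
    orders = range(1, len(preds) + 1)
    # Guess from the error energies alone (simplified stand-in for an estimator).
    guess = min(orders, key=lambda p: 0.5 * n * math.log2(max(errs[p - 1], 1e-9) / n)
                + p * overhead_per_order)
    # Brute force: encode every order and keep the smallest.
    actual = {p: rice_cost(residual(x, preds[p - 1])) + p * overhead_per_order for p in orders}
    brute = min(actual, key=actual.get)
    return guess, brute, actual


# Hypothetical 4096-sample block from a stable AR(2) process plus noise.
rng = random.Random(1)
y = [0.0, 0.0]
for _ in range(4096 + 100):
    y.append(1.6 * y[-1] - 0.8 * y[-2] + rng.gauss(0, 100))
block = [round(v) for v in y[102:]]

preds, _ = levinson_durbin(autocorrelation(block, 8), 8)
print([round(c, 2) for c in preds[1]])  # order-2 predictor, close to [1.6, -0.8]
guess, brute, actual = pick_orders(block)
print(guess, brute)
```

Because the brute-force pass measures every order directly, it can never do worse than the guess under this cost model. The price is one full residual computation per order.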
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 11:26:23
@ktf what is the problem with this file?
https://hydrogenaud.io/index.php/topic,123219.msg1018107.html#msg1018107

BTW... (https://hydrogenaud.io/index.php/topic,123025.msg1018053.html#msg1018053)
Code: [Select]
34586615 -8p.flac
34590833 -8 -b2304 -q10.flac
34598936 -8e.flac
34606376 -8.flac
50697404 Desert Rose.wav
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 17:15:10
Try this with 24-bit and >= 88.2kHz, ideally with appropriate -b:

-8 -A "subdivide_tukey(3/2e-1);welch;hann;flattop"

Much faster than -8e and seems to give good results, including "real" and "fake" hi-res and DSD transcodes with appropriate ultrasonic filtering. May not work well with clipped and loudness war hi-res though. -8p for 24-bit is just too slow. -8e is about two times as fast but still slow.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 17:48:34
This is slightly puzzling. -e being helpful in particular when there is no high-frequency content suggests that here the guesstimation procedure isn't very good. Yet it is precisely in those cases where 1.4 makes for the big improvements (https://hydrogenaud.io/index.php/topic,120158.msg1014265.html#msg1014265).
The improvements in 1.4.0 weren't related to the guesstimation. 1.4.0 improved the accuracy with which predictors were formed; the guesstimation (which -e circumvents by brute-forcing) is about which order to pick.

Yes, but: for CDDA it seems you (& the rest of the developers) effectively killed the need for -e - and without touching the guesstimation algorithm, then how?
I can only guess that the better you hit (and double precision does that!), the closer you get to what FLAC reasonably can achieve, and the smaller are the improvements available by any means.
Now with the files I tested above, FLAC can beat TAK -p2. That doesn't prove anything, but it mildly suggests that at least here, you really hit something close to optimum - but all of a sudden there is a brute-force switch that appears more attractive than for CDDA.
And that is kinda puzzling.

Also -p not doing well ... actually, I might be fooled here by -p having become damn slow, which leads to writing it off as "not worth it". Question: are the new versions doing the -p routine entirely in double precision even after precision is truncated down to 15...5 bits - and can that explain the slowdown? (And if so: is that even necessary? Is there anything gained to stay in double precision after you have calculated a proto-predictor to be rounded off?)
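For context on what -q and -p act on: after the predictor is computed in floating point, its coefficients are rounded to integers of a given precision with a shared shift, and only those integers end up in the stream. -q sets that precision, while -p (per the flac documentation, an exhaustive search of LP coefficient quantization that overrides -q) tries several. Below is a simplified Python sketch with hypothetical coefficients; libFLAC's own routine is more involved (it carries rounding error forward between coefficients, for example).

```python
import math


def quantize(coefs: list[float], precision: int) -> tuple[list[int], int]:
    """Quantize LPC coefficients to signed `precision`-bit integers with one shared shift.

    Simplified sketch: plain rounding, no error feedback and no limits on the shift.
    """
    cmax = max(abs(c) for c in coefs)
    _, exp = math.frexp(cmax)          # cmax = m * 2**exp with 0.5 <= m < 1
    shift = (precision - 1) - exp      # leave one bit for the sign
    qmax = 2 ** (precision - 1) - 1
    q = [max(-qmax - 1, min(qmax, round(c * 2.0 ** shift))) for c in coefs]
    return q, shift


def max_error(coefs: list[float], precision: int) -> float:
    """Largest absolute difference between a coefficient and its quantized value."""
    q, shift = quantize(coefs, precision)
    return max(abs(c - v / 2.0 ** shift) for c, v in zip(coefs, q))


coefs = [1.8, -0.9]  # hypothetical order-2 predictor
for precision in (5, 10, 15):
    print(precision, quantize(coefs, precision), max_error(coefs, precision))
```

Lower precision costs fewer bits per coefficient but predicts less accurately. That is the trade-off the -p search weighs for every candidate precision.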
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-24 18:30:28
58.477%about -8e speed: -7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14
58.475%-7r7 -A "flatopp;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" -l 14 -b 8192
Did you really type "flatopp"?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-24 19:30:51
It is a bit tricky that the encoder accepts anything and silently drops stuff it doesn't understand.

If you want to know what it does, read the explanation at the bottom of this page: https://xiph.org/flac/documentation_tools_flac.html

TL;DR: for starters, just use whole numbers like subdivide_tukey(5) or something. If you feel like it, you can specify a second fraction between 0 and 1, like subdivide_tukey(5/0.2). The second value is locale-specific, so it becomes subdivide_tukey(5/0,2) on many non-English PCs. Using scientific notation (2e-1) is a way around that. For other apodizations, see the linked document.

Just a point: "/" is not a division slash, it is a separator between arguments, where the first is mandatory.
tukey(P) takes only one argument in, and that is a number between 0 and 1. The subdivide_tukey can be specified as subdivide_tukey(N) and optionally subdivide_tukey(N/P) - but then again, the "/P" has nothing to do with division. As ktf says, for starters stick to N and remember that higher N will slow down.

What does this tukey function do? For the block of the signal - 4096 samples, typically - it keeps the middle 1-P fraction and downweights the beginning and end according to a cosine function. Turns out, it typically gives a much better predictor than not applying any weight - that would be "rectangle".
The subdivide_tukey "generates more functions" in a way that recycles lots of calculations. It takes time to try them all, but it doesn't make for more complicated decoding. They are several simple attempts, and the encoder picks the one that happens to fit best. The decoder doesn't know how hard the encoder tried.

Thanks for the information. I did a test of different combinations of options to see what kind of times vs. compression I would get. I started the test before I asked the question, and it took over a day to complete. I had put in a couple of random values for subdivide_tukey(X/Xe-1). It looks like "subdivide_tukey(21/15e-1)" shouldn't work, as that would exceed what should work for tukey, but it did seem to be helpful here.

Code: [Select]
Ratio:      Size:             Enc Time   Options used
59.41%   464.176M   8:32:57    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(21)  
59.41%   464.178M   5:15:19    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(21)  
59.41%   464.185M   5:34:39    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(17)  
59.41%   464.187M   3:29:06    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(17)  
59.42%   464.205M   3:15:35    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(13)  
59.42%   464.206M   2:03:37    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(13)  
59.43%   464.281M   31:07.50    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(21/15e-1)  
59.43%   464.281M   46:29.04    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(21)  
59.43%   464.283M   31:08.72    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(21)  
59.43%   464.292M   30:45.12    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(17)  
59.43%   464.293M   20:43.93    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(17)  
59.43%   464.297M   20:36.87    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(17/12e-1)  
59.43%   464.315M   18:20.15    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(13)  
59.43%   464.316M   12:19.42    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(13)  
59.43%   464.320M   12:20.79    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(13/9e-1)  
59.43%   464.342M   1:35:41    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(9)  
59.43%   464.343M   23:23.17    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(21)  
59.43%   464.344M   59:58.54    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(9)  
59.44%   464.352M   15:32.61    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(17)  
59.44%   464.373M   9:18.64    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(13)  
59.45%   464.444M   30:08.22    -b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(5)  
59.45%   464.447M   19:02.92    -b 4096 -m -l 12 -r 7 -ep -A subdivide_tukey(5)  
59.45%   464.466M   6:05.76    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(9/6e-1)  
59.45%   464.477M   9:01.27    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(9)  
59.45%   464.479M   6:05.86    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(9)  
59.46%   464.525M   4:39.14    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(9)  
59.47%   464.595M   2:00.80    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(5/3e-1)  
59.47%   464.598M   2:55.68    -b 4096 -m -l 12 -r 8 -p -A subdivide_tukey(5)  
59.47%   464.600M   2:00.44    -b 4096 -m -l 12 -r 7 -p -A subdivide_tukey(5)  
59.47%   464.644M   1:35.94    -b 4096 -m -l 12 -r 6 -p -A subdivide_tukey(5)  
59.48%   464.661M   6:08.93    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(21)  
59.48%   464.663M   4:25.36    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(21)  
59.48%   464.670M   4:09.56    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(17)  
59.48%   464.671M   3:01.99    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(17)  
59.48%   464.691M   2:34.32    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(13)  
59.48%   464.692M   1:54.71    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(13)  
59.48%   464.723M   3:34.07    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(21)  
59.48%   464.730M   2:28.18    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(17)  
59.49%   464.748M   1:33.87    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(13)  
59.50%   464.849M   1:21.94    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(9)  
59.50%   464.850M   1:02.65    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(9)  
59.51%   464.896M   0:52.11    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(9)  
59.51%   464.962M   0:32.64    -b 4096 -m -l 12 -r 8 -A subdivide_tukey(5)  
59.51%   464.963M   0:26.19    -b 4096 -m -l 12 -r 7 -A subdivide_tukey(5)  
59.52%   465.009M   0:22.85    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(5)  
59.54%   465.197M   0:12.73    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(3)   # Same as preset -8 #
59.59%   465.591M   0:08.72    -b 4096 -m -l 12 -r 6 -A subdivide_tukey(2)   # Same as preset -7 #
59.74%   466.717M   0:07.24    -b 4096 -m -l 8 -r 6 -A subdivide_tukey(2)   # Same as preset -6 #
59.90%   467.965M   0:05.40    -b 4096 -m -l 8 -r 5   # Same as preset -5 #
64.92%   507.177M   0:03.73    -b 1152 -l 0 -r 3 --no-mid-side   # Same as preset -0 #
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-10-24 21:24:30
Nice. For 16/44.1, GCC 12.2.0 with asm disabled is the fastest. Clang does badly with it disabled. It will be interesting to see how fast Case's builds from his clean environment are. I depend on MSYS2.

I noticed this as well.  With asm enabled, Flac performs much better, but still a little behind Flac compiled with GCC.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-24 21:49:05
Thanks for the information. I did a test of different combinations of options to see what kind of times vs. compression I would get. I started the test before I asked the question, and it took over a day to complete. I had put in a couple of random values for subdivide_tukey(X/Xe-1). It looks like "subdivide_tukey(21/15e-1)" shouldn't work, as that would exceed what should work for tukey, but it did seem to be helpful here.

There is a complication particular to subdivide_tukey(N):
subdivide_tukey(1) is the same as a single tukey(.5), but subdivide_tukey(N) tapers off a fraction of .5/N (that slash is a division slash!).
It wants to generate N small tukey humps (and more!) - and with a small tukey hump, it doesn't make that much sense to taper away a big fraction of the total from every small one. But the consequence is that 21/15e-1 tapers off 1.5/21ths = 1/14ths or around 7 percent of the window, or around 3.5 percent at each end.

That  .5 divided by N is also the reason why I was testing something like
-A "subdivide_tukey(5/125e-1);tukey(666e-3)"
The subdivide_tukey tapers very little (that is, is fairly close to rectangle) so I combine it with a tukey that tapers a lot (that is, tapers as much as the left third and the right third and leaves only the middle third un-downweighted). Why do that? Making the functions different. There is no use trying two identical functions, and very little use trying two near-identical ones.
And that function is faster than subdivide_tukey(6/P).
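The P/N rule above is easy to check numerically. Here is a small Python sketch; the parser only shows that "/" separates arguments, and it is hypothetical rather than flac's own option parsing:

```python
def parse_apodization(spec: str) -> tuple[str, list[float]]:
    """Split e.g. 'subdivide_tukey(21/15e-1)' into its name and arguments.

    '/' only separates arguments; nothing is divided. Illustration only:
    this is not flac's own parser and it does no validation.
    """
    name, _, rest = spec.partition("(")
    args = [float(a) for a in rest.rstrip(")").split("/") if a]
    return name, args


def subdivide_taper_fraction(n: float, p: float = 0.5) -> float:
    """Total fraction of the window that is tapered, following the P/N rule above."""
    return p / n


name, (n, p) = parse_apodization("subdivide_tukey(21/15e-1)")
frac = subdivide_taper_fraction(n, p)
print(name, n, p, f"{frac:.1%} tapered, {frac / 2:.1%} at each end")
# subdivide_tukey 21.0 1.5 7.1% tapered, 3.6% at each end
```

The default P of 0.5 reproduces the other half of the rule: subdivide_tukey(1) tapers half the window, like a single tukey(.5).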

Before partial_tukey, punchout_tukey and subdivide_tukey, you had to type in each and every function, and it would only do one function per one you typed. subdivide_tukey(5) does a lot (and does them faster than typing each of them in individually).


Did you really type "flatopp"?
Damn, only one way to find out, and that is not done in two minutes ...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 07:54:50
I'd take a look at how the "best" settings compared to others with the attached simple signal :D

-b 4096 -m -l 12 -r 8 -ep -A subdivide_tukey(21)
4970861 bytes 3:22.843, 0.14x realtime

-b512 -r8 -q5
4583068 bytes 0:00.171, 175.43x realtime

WavPack hhx6
4376064 bytes 0:01.313, 22.84x realtime

APE insane
5018164 bytes 0:00.391, 76.72x realtime
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 09:21:51
I'd take a look at how the "best" settings compared to others with the attached simple signal :D
If the signal is so simple that 7-Zip outperforms all audio codecs, I don't think it is fit for such a test.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 11:05:34
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 14:02:02
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
Venereolog'ing the compressors: https://hydrogenaud.io/index.php/topic,122040.msg1010086.html#msg1010086

The ultra-useless sac compressor could shave more than a quarter off the smallest OptimFROG.
I got xz down to 40 972 856, not beating more than the second-smallest .sac file. Well xz took only 20 seconds.
7z was around ten percent bigger than xz.

There must be some long-term repeating patterns there. Long-term as far as audio goes. AFAIunderstand, an OptimFROG block can be several seconds long (and an insane monkey nearly half a minute, without being able to make any sense out of that track). In the very least, xz got the flac file down by ten percent.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 17:02:46
With -l12 -r8 -b512 -q5 I got 424599071 bytes in Venereology at 207x speed.

Then the slow stuff:
424286964 bytes with subdivide_tukey(21)
424266692 bytes with subdivide_tukey(14)
424261436 bytes with subdivide_tukey(11/1)
424249707 bytes with subdivide_tukey(21/3)
424238652 bytes with subdivide_tukey(15/2)

11/1 is about twice as fast as the others; the other settings have similar speed, at least in this test.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 19:16:08
* Monkey's Audio needs more bits than uncompressed PCM. TAK makes it below 1411 on a setting.
* 58176764: the .wav file
* 56535466 for the smallest .flac I could get
* 53920654 for the smallest .wv I could get
* 53222937 for the smallest frog
* 50613838 for sac at default
* 49046602 for the twelve-hour sac compression. But hey, it decodes in less than ten minutes. (Realtime playback? Please, no irrelevant questions! O:)  )
I see, only track 3 is used.
-l3 -b512 -r8 -q5 -e -A subdivide_tukey(15/2)
52292911 bytes
Higher -l does nothing for this track apart from slowing things down. The whole album would need a higher value though.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 19:46:42
Merzbow Pulse Demon and Venereology fit into this category. xz even works better than 7z.
I wouldn't call Merzbow proper material to base a test on, except when part of a well-rounded corpus. In this particular case, you have a signal for which 7z outperforms audio codecs by a factor of more than 10.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-25 20:20:49
I wouldn't call Merzbow proper material to base a test on, except when part of a well-rounded corpus. In this particular case, you have a signal for which 7z outperforms audio codecs by a factor of more than 10.
But you can see the potential of flac (vs. other lossless audio formats, not 7z or xz) on these special signals when the optimal parameters are used. Suppose there were a multi-pass encoder (which couldn't take piped input?) or a "long-term analyzer": would that mean there is no need for end users to guess the optimal settings? Not that I want something like this, I'm just curious about it.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2022-10-25 20:53:46
I think implementing variable blocksize encoding would solve most of it. However, as the recent problems with variable blocksize encoding in CUETools have shown, this is no easy task. Many other things in FLAC are already being 'brute-forced'. It is rather high on my list because it would make the reference encoder cover more of the FLAC specification, which is something I pursue because of the IETF standardization effort.

I think that is a better idea than encoding a file several times with different fixed blocksizes to see which one results in the smallest file. -q is already brute-forced per-subframe when you use -p.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 21:55:49
variable blocksize encoding [...] is rather high on my list
And I was about to write that well, this makes a case for variable block size, which is now waiting in line somewhere between positions 2048 and 4096 on the to do list ...

Actually, regarding partial_tukey(2) in particular, and thus subdivide_tukey(2), I've been thinking: if that is what the encoder selects (effectively dropping half of the block when designing the predictor), isn't that a case where you would try to split the block in two, keep the predictor for the half where it is good, and look for another one for the rest? The same goes for (4).

Here I am making an assumption that could be fiddled with, namely that we are not using "adjacent" samples to calculate the predictor of a certain frame. That isn't god-given either.
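The "split the block in two and see if that is cheaper" idea can be illustrated with a toy: cost the block coded whole, compare with the best cost of its two halves, and recurse. This is only a sketch under a made-up cost model (block_cost charges a fixed per-block header plus every sample at the block's peak bit width; it is not FLAC's real cost), and I'm not claiming any existing encoder chooses block sizes this way. Because each half is optimised first, the result is the cheapest power-of-two partition under that toy cost.

```python
def block_cost(block, header_bits=48):
    """Made-up cost model: a fixed per-block header overhead plus every sample
    stored at the block's peak width (magnitude bits + 1 sign bit)."""
    peak = max(abs(s) for s in block)
    return header_bits + (peak.bit_length() + 1) * len(block)


def best_split(block, min_size=256, cost=block_cost):
    """Return (cost, block_sizes): the cheaper of coding `block` whole or as
    two halves, each half optimised recursively (power-of-two splits only)."""
    whole = cost(block)
    n = len(block)
    if n < 2 * min_size or n % 2:
        return whole, [n]
    half = n // 2
    left_cost, left_sizes = best_split(block[:half], min_size, cost)
    right_cost, right_sizes = best_split(block[half:], min_size, cost)
    if left_cost + right_cost < whole:
        return left_cost + right_cost, left_sizes + right_sizes
    return whole, [n]


silence_then_loud = [0] * 1024 + [1000, -1000] * 512
print(best_split(silence_then_loud))  # (12384, [1024, 1024])
```

A block that is uniformly loud stays whole, because every split pays the header overhead again without lowering the per-sample width.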

As for CUETools, their solution is to remove the variable block size option, https://github.com/gchudov/cuetools.net/pull/223 . So without it finding its way into reference flac, it won't be ... anywhere?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-25 22:07:32
... but for the Merzbow track, the magic is actually not in the -b, but in the -r. Try options
--lax  -r 15
That's right, no -8, no -p, no -e, no -b (edit: wrote that wrong!)
I got 51954953 bytes. Throwing in a -b 16383 reduces it slightly to 51923540 bytes.

The reason -b512 looks so good is not that it is a good block size; it is that it leaves only a few samples per Rice partition within the subset's maximum partition order of 8. Relaxing that ...

So that gives another "potential case for" a variable block size.
* test -r9. If that improves, then dammit we are out of subset, and the easiest way to get back in is to halve the -b. One doesn't even have to recalculate the predictor to get that improvement.
* Of course, a "smarter way" starting from the current -7 and -8 would be to check whether Rice partition order 6 is actually used and useful for that frame. If it is, try order 7. If that helps, try 8. If that helps too, see the previous item.
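The arithmetic behind both items: with partition order r, the residual of a b-sample block is split into 2^r Rice partitions of b >> r samples each, except that the first partition is shorter by the predictor order (the warm-up samples are stored separately). So -b 4096 -r 9 and -b 2048 -r 8 give the same partition length, which is why halving -b gets you back into the subset. A minimal sketch (the function name is made up):

```python
def rice_partition_sizes(blocksize, partition_order, predictor_order):
    """Residual samples per Rice partition for one subframe (FLAC layout):
    2**partition_order partitions of blocksize >> partition_order samples,
    the first one shortened by the predictor order."""
    partitions = 1 << partition_order
    if blocksize % partitions:
        raise ValueError("block size must be divisible by 2**partition_order")
    length = blocksize >> partition_order
    if length <= predictor_order:
        raise ValueError("partitions must be longer than the predictor order")
    return [length - predictor_order] + [length] * (partitions - 1)


for b, r in [(4096, 8), (4096, 9), (2048, 8), (512, 8)]:
    sizes = rice_partition_sizes(b, r, predictor_order=0)
    print(f"-b {b:5} -r {r}: {len(sizes):3} partitions of {sizes[-1]} samples")
```

This prints 16, 8, 8 and 2 samples per partition, i.e. -b512 -r8 partitions as finely as -b4096 would at order 11, well outside the subset.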

BUT: -r 8 only rarely improves, right?

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-26 06:49:31
--lax  -r 15
That's right, no -8, no -p, no -e, no -b (edit: wrote that wrong!)
I got 51954953 bytes. Throwing in a -b 16383 reduces it slightly to 51923540 bytes.
Well 16384 "of course".
The reason for -r 15 was to try an even bigger block size, but since I had "discovered" that 65536 is invalid and the max is 65535, a -1 snuck in when I typed it here. I wonder whether anything can utilize the 15th order at all, when 65536 is invalid.
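On whether anything can use the 15th order: RFC 9639 (the FLAC format specification) requires the block size to be evenly divisible by the number of partitions, and each partition to be longer than the predictor order. If I read that right, order 15 is only legal at -b 32768 with a predictor of order 0, order 14 there only with predictor order 0 or 1, and -b 65535 (odd) allows order 0 only. That might explain why -r 14,14 and -r 15,15 came out identical, but it is an assumption that the encoder simply falls back to the largest legal order; the helper below (made-up name) only encodes the format rule.

```python
def max_partition_order(blocksize, predictor_order, field_limit=15):
    """Largest Rice partition order the FLAC format allows for a subframe:
    2**order must divide the block size, and blocksize >> order must exceed
    the predictor order. The 4-bit header field caps the order at 15."""
    best = None
    for order in range(field_limit + 1):
        if blocksize % (1 << order) or (blocksize >> order) <= predictor_order:
            break  # both conditions only get stricter as the order grows
        best = order
    if best is None:
        raise ValueError("no valid partition order for this block")
    return best


for b in (65535, 32768, 16384, 4096):
    print(b, [max_partition_order(b, p) for p in (0, 1, 2, 8)])
```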
51924581 for --lax -r 15 -b 32768.

So here is a weird thing:
51924584 for --lax -r 13,14 -b 32768
52709806 for --lax -r 13,13 -b 32768 and same for -r 12,13, and -r 12,12 is as "bad as" 54771887
52765170 for --lax -r 14,14 -b 32768 and unsurprisingly the same for -r 14,15. But ... but ... same also for -r 15,15?! (Maybe I need to compute some Rice encodings by hand?)

Now 16384
51923763 for --lax -r 12,13 -b 16384 (good!)
52710096 for --lax -r 11,12 -b 16384 which is lower than the next line, so here the 11 is used while for 32768 the 12 was ... well if it was, it didn't impact size.
52710316 for --lax -r 12,12 -b 16384
52763867 for --lax -r 13,13 -b 16384 and unsurprisingly the same for -r 13,14. Also for -r 14,14.

Well impact isn't monstrous. Still a bit strange.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-26 09:04:23
Another example with non-experimental music:
https://youtu.be/wJW9AIwHSGY

40388588 16 Syvalion arrange MMIX.wav
38322604 xz.xz
34931072 7z.7z
33814392 --lax -l15 -8 -r15 -b32768.flac
33742999 --lax -l32 -8 -r15 -b32768.flac
33691294 --lax -l15 -8 -r10 -b8192.flac
33627273 -8.flac
33627254 -8 -r8.flac
33627254 -8 -r7.flac
33622698 --lax -l15 -8 -r8.flac
33617925 --lax -l15 -8 -r8 -b3456.flac
33602205 -8 -r7 -q7.flac
33597671 --lax -l32 -8 -r15 -b2304.flac
33594087 --lax -l15 -8 -r8 -q7.flac
33594087 --lax -l15 -8 -r7 -q7.flac
33588219 -8pe.flac
33578671 -8 -r7 -q7 -b2304.flac
33574773 --lax -l15 -8 -r7 -q7 -b2304.flac
33568744 --lax -l20 -8 -r7 -b2304 -q7.flac
33558642 --lax -l32 -8 -r7 -b2304 -q7.flac
33558642 --lax -l32 -8 -r15 -b2304 -q7.flac

Basically, Merzbow is mostly clipped at max/min PCM values.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-26 21:47:26
I found another case where -e is much better than -p. I converted my test signal archive (sine, multitone, sweep, etc.) to flac from different sources, including analog and digital I/O of sound cards, smartphones, cassette decks, disc players, motherboard codecs with GPU noise injected and so on. The sizes listed below are at -8.

 2371488 ALC892_44kHz.flac
 2551014 ALC892_48kHz.flac
 4632494 ALC892_96kHz.flac
11118051 Bifrost toslink_192kHz.flac
 7752605 Bifrost toslink_96kHz.flac
 8721549 Bifrost USB 96kHz.flac
 5805120 DN-C635 to Multiface II analog.flac
 2613358 headphone 44k.flac
 2791264 headphone 48k.flac
 5096341 headphone 96k.flac
 1305056 Hi Gain 24bit 44k.flac
 1405998 Hi Gain 24bit 48k.flac
 4874672 HTC one sv 1644.flac
 4954664 HTC one sv 1648.flac
  692250 imd sweep.flac
 1661020 MZ-R3 J-Test 1644.flac
 2064002 no gpu_44kHz.flac
 2189291 no gpu_48kHz.flac
 3859889 no gpu_96kHz.flac
 2504677 rca 44k.flac
 2676443 rca 48k.flac
 5714205 rca 88k asio 5ms.flac
 4838871 rca 96k.flac
 3478233 rca CD analog.flac
 9674809 rca dolby off 2496.flac
10425569 rca dolby on 2496.flac
 2328929 Result_44kHz.flac
 2420994 Result_48kHz.flac
 4837262 Result_96kHz.flac
 1994047 SMSL_48kHz.flac
 3280415 via2444.flac
 1433300 via2444spdif.flac
 3503452 via2448.flac
 1535871 via2448spdif.flac
 6628764 via2496.flac
 2459044 via2496spdif.flac

Total size:

-8
146195011 bytes

-8p
145864737 bytes

-8e
145252942 bytes
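Relative differences are easier to compare across posts than raw byte totals. A tiny helper (names made up) that expresses each total as the percentage saved relative to a baseline; for the three totals above it works out to roughly 0.23 % for -8p and 0.64 % for -8e relative to -8.

```python
def percent_saved(sizes, baseline):
    """Percentage of bytes saved by each setting relative to `baseline`."""
    base = sizes[baseline]
    return {setting: 100.0 * (base - size) / base for setting, size in sizes.items()}


totals = {"-8": 146195011, "-8p": 145864737, "-8e": 145252942}
for setting, pct in percent_saved(totals, "-8").items():
    print(f"{setting:4} {totals[setting]:>12,} bytes  {pct:6.3f} % smaller than -8")
```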
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-26 22:37:19
What if you compress the files at 48 kHz and below separately from those above 48 kHz?
For 88.2 kHz and up, my files compress smaller with -8e than with -8p, although I see Wombat's 20 GB went the opposite way.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 05:38:33
I refined the list to exclude 16-bit files and digitally recorded files (SPDIF), and added more analog files that I found in my archive, so all of them are 24-bit now. "24-bit" refers to the recording bit depth; the playback device may only support 16-bit or have no bit depth at all (e.g. a cassette deck).

88.2 to 192k, but mostly 96k:

-8
200533202 bytes
-8p
199737874 bytes
-8e
199401868 bytes

44 and 48k:

-8
118779872 bytes
-8p
118049531 bytes
-8e
117806233 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 06:55:26
These test files have no copyright concern so I uploaded them, I may remove them later though.
https://1drv.ms/u/s!AvzB71jO7t0-gYwqfkhypQsi5ZNCJQ?e=MIITzy
Some files are better with 8e while others are better with 8p. The archive above was encoded with --lax to reduce the upload size.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-27 16:32:34
Just encoded the original digital signals...

H:\>flac -f *.wav -8e
Test signal (44 kHz 16-bit).wav: wrote 1742181 bytes, ratio=0.153
Test signal (44 kHz 24-bit).wav: wrote 2819275 bytes, ratio=0.165

H:\>flac -f *.wav -8p
Test signal (44 kHz 16-bit).wav: wrote 1820499 bytes, ratio=0.160
Test signal (44 kHz 24-bit).wav: wrote 2950729 bytes, ratio=0.173

H:\>flac -f *.wav -8
Test signal (44 kHz 16-bit).wav: wrote 1904695 bytes, ratio=0.168
Test signal (44 kHz 24-bit).wav: wrote 3007455 bytes, ratio=0.176

Looks like -e just likes clean, stable signals with a lot of empty spectral content, regardless of bit depth. Recording through the analog chain pollutes these test signals somewhat, but not enough to change the overall result that -e works better with them.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 14:33:45
Here is an example that -p works better than -e:
https://www.soundliaison.com/index.php/studio-masters/856-ray-carmen-gomes-inc
The 768kHz file is free for download. --lax is required for 768kHz files.

PS H:\> measure-command{h:\flac -f *.wav --lax -8e -b16384}|select totalseconds
wrote 641447814 bytes, ratio=0.663

TotalSeconds
------------
  52.4976358


PS H:\> measure-command{h:\flac -f *.wav --lax -8p -b16384}|select totalseconds
wrote 641307624 bytes, ratio=0.663

TotalSeconds
------------
   68.536428


The additional windows are not very effective if the spectrum does not have a smooth decaying trend in the higher frequencies. Blindly using a higher subdivide_tukey(n) is just a waste of time.

PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(6)"}|select totalseconds
wrote 641444963 bytes, ratio=0.663

TotalSeconds
------------
  37.5487237


PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(5);tukey(75e-2);gauss(5e-2);blackman"}|select totalseconds
wrote 641444667 bytes, ratio=0.663

TotalSeconds
------------
  33.4128744


PS H:\> measure-command{h:\flac -f *.wav --lax -8 -b16384 -A "subdivide_tukey(5);welch;hann;flattop"}|select totalseconds
wrote 641443864 bytes, ratio=0.663

TotalSeconds
------------
   33.674331


Usually, the quickest way to reduce size is increasing -l, but setting -l too high can harm decoding speed.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Bogozo on 2022-10-28 15:47:16
Here is an example that -p works better than -e:
https://www.soundliaison.com/index.php/studio-masters/856-ray-carmen-gomes-inc
The 768kHz file is free for download. --lax is required for 768kHz files.
Looks like a conversion from DSD made without high-frequency noise filtering, not normal PCM.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 16:09:18
The file was originally recorded in DXD, then played through a Studer tape machine and re-digitized at 768kHz using an RME interface with an AKM ADC. ADCs these days are mostly multibit delta-sigma designs, hence the rising noise. You can also see a faint dip at 150kHz: that is where the tape bias tone was, but someone on ASR complained about it and Sound Liaison filtered the bias tone out.

Read this for the whole story:
https://www.audiosciencereview.com/forum/index.php?threads/finally-music-we-can-buy-in-768-khz-sampling-rates.29544/

You can see the rise of noise when the RME interface is operating at high sample rates even when using PCM.
https://archimago.blogspot.com/2019/03/measurements-look-at-audio-ultra-high.html
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-28 19:59:18
48 kHz sample rate. Heavy (quite heavy indeed) metal.

Questions:
* For high sampling rates, 1.4.x would lead to quite impressive improvements. For CDDA, 1.4.0 "at preset N" would by and large beat 1.3.x "at preset N+1" - for high resolution, new -3 or -4 would beat old -8pe. What about for 48 kHz?
tl;dr: New -4 beat old -8p (didn't try -8ep).
* Block size. Is it a good idea to go up to the next standard block size (namely -b 4608) to more closely "maintain time per block"? (Relative to 4096 samples for 44.1 kHz.)
tl;dr: from -7 up it did improve, but at -8p the improvement was only 0.003 percent, so ... do you care?

Corpus: Took everything I had of lossless 48 kHz. foobar2000 reports 58 percent 24-bit and 42 percent 16-bit. 
Not a well-balanced corpus: Mostly heavier metal, indeed a quarter of the forty-ish GB came from one single publisher of doom/stoner samplers.

Results for 739 files, sorted by file size. All are 1.4.1 except those marked "1.3.4"
46230128389   -3b4608
46223003175   -3
45103714620   1.3.4 -8b4608
45098718825   1.3.4 -8
45075623983   1.3.4 -8pb4608
45067893180   1.3.4 -8p
45048763872   -4b4608
45045173684   -4
44984600667   -5b4608
44978677031   -5
Below this line, -b 4608 will improve
44610097727   -7
44607112273   -7b4608
44587426732   -8
44585780154   -8r7
44584859496   -8r8
44583580023   -8b4608
44582524413   -8e
44581743757   -8r7b4608
44580834028   -8r8b4608
44578633297   -8eb4608
44572663788   -8r7 -A "subdivide_tukey(5)"
44567943410   -8r7 -A "subdivide_tukey(5)" -b4608
44561132601   -8p
44559756763   -8pb4608
Not sure if 4608 is worth the effort compared to just encoding and being done with it; the biggest impact here is 0.01 percent. But anyway, as for the questions I raised, this test indicates the following:
* So -3 is not enough to slay 1.3.x, but it seems you don't have to go much above 44.1 to see how even low 1.4.x presets are better than anything 1.3 could accomplish.
* This is material where -r8 does matter, and that suggests that smaller block size could be advantageous.
* But still, -b 4608 improves once the predictor already is good enough, which requires 1.4.x. And with 1.4.x it does happen earlier (i.e. for lower subdivide_tukey) than for 96/24, see below. At default -5, -b 4608 was slightly harmful, but even at -7 it would help.
Actually, the biggest difference -b 4608 made was at -8r7 -A subdivide_tukey(5); that was the only one that crossed the 0.01 percent mark.
But then the difference is down to 0.003 percent at -8p, so ... mostly academic interest this.

However the corpus may reduce the benefit of -b 4608. At least, for 96/24 it was the classical music that benefited the most from adjusting block size, and heavier music (like here) did not benefit as much.


High resolution coming up.
[...]
* -l 13 to -l 15 have something to them, but careful: It does not seem to be the case directly off -7 or -8. Say -8 -l 13 is not good, but -8 -A [something slow] -l 13 is. A bit of testing indicates that -l 13 starts saving space at -A subdivide_tukey(5) and -l 14 at (6).
With high-res classical music, -l 13 is the setting that improves over -7.
* -b 8192 also needs "-A [something slow]", it seems also to do harm when applied to -7 or -8 plain. But it doesn't help much here.

... for 48 kHz and 4608, it didn't need "something slow".
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-28 20:20:29
How about dividing your metals into two groups?
[1] A lot of fast drumming especially the higher pitched ones with strong transients (hi-hat, snare, rim shot...)
[2] Mostly heavy in guitar and bass but in general slower paced.
Does one of them benefit from a different block size than the other? (edit: including -b below 4096)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-29 12:14:54
-e requires fairly low noise to work. Spek's spectrogram only has about 14 bits of dynamic range, so the attached noisy.flac looks identical to clean.flac there, which can be misleading. Better use other tools like SoX to view the spectrum.

H:\>flac -f *.wav -8p
clean.wav: wrote 562708 bytes, ratio=0.425
noisy.wav: wrote 687112 bytes, ratio=0.519

H:\>flac -f *.wav -8e
clean.wav: wrote 515394 bytes, ratio=0.390
noisy.wav: wrote 687255 bytes, ratio=0.519

The advantage of -e is gone after I added some -80dB noise.
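A rough illustration of the mechanism, with a swapped-in simplification. As I understand it, -e makes the encoder actually try every LPC order up to -l instead of trusting an estimate; this toy instead does an exhaustive search over FLAC's five fixed polynomial predictors (orders 0-4), scoring each by its Rice-coded size. The tone and the noise level (far louder than the -80 dB above) are arbitrary choices for a clear demo: on the clean tone a higher order wins, while with noise each extra difference amplifies the noise and a low order wins.

```python
import math
import random


def tone(n=4096, freq=100.0, rate=44100, amp=8000.0, noise=0.0, seed=1):
    """Integer samples of a sine, optionally with Gaussian noise (std = noise)."""
    rng = random.Random(seed)
    return [round(amp * math.sin(2 * math.pi * freq * i / rate) + rng.gauss(0.0, noise))
            for i in range(n)]


def fixed_residual(samples, order):
    """Residual of FLAC's fixed predictor of the given order (0-4):
    the order-th finite difference of the signal."""
    res = list(samples)
    for _ in range(order):
        res = [b - a for a, b in zip(res, res[1:])]
    return res


def rice_bits(residual, max_param=14):
    """Bits to Rice-code a residual as one partition, brute-forcing the parameter:
    4-bit parameter field + (unary quotient + stop bit + k low bits) per value."""
    folded = [2 * v if v >= 0 else -2 * v - 1 for v in residual]
    return min(4 + sum((u >> k) + 1 + k for u in folded) for k in range(max_param + 1))


def exhaustive_fixed_order(samples, bits_per_sample=16):
    """Encode with every fixed order and keep the cheapest; warm-up samples
    (one per order) are counted as stored verbatim."""
    costs = [order * bits_per_sample + rice_bits(fixed_residual(samples, order))
             for order in range(5)]
    return costs.index(min(costs)), costs


clean_order, _ = exhaustive_fixed_order(tone())
noisy_order, _ = exhaustive_fixed_order(tone(noise=100.0))
print("best order, clean tone:", clean_order)
print("best order, noisy tone:", noisy_order)
```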
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-29 20:17:17
How about dividing your metals into two groups?
[1] A lot of fast drumming especially the higher pitched ones with strong transients (hi-hat, snare, rim shot...)
[2] Mostly heavy in guitar and bass but in general slower paced.
Does one of them benefit from a different block size than the other? (edit: including -b below 4096)

I took ~10 GB (238 files) from the "least distorted" end of it. 

Now higher block size is not that good:
* I had to go all the way to -A <something higher> again ...
* ... and enter a "p" in there, and 4096 rules.  Even with -8p -A <something higher>.

Everywhere in the following, -A is short for -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop".

10396908384   -7b2048
10390256536   -8b2048
10387465483   -8b2048 -A
10380212639   -8b2304 -A
10374886478   -8pb2048
10372850371   -8pb2048 -A
10370624591   -7b4608
10370154789   -7
10368208665   -8b3072 -A
10366619584   -8pb2304 -A
10364101954   -8b4608
10363963614   -8
10360997863   -8 -A
10360867169   -8b4608 -A   <-- the only case where a different block size helps!
10359300604   -8pb3072 -A
10357464670   -8pb4608
10356736011   -8p
10354455085   -8pb4608 -A
10353968178   -8p -A


You see that -b 2048 degrades -8p -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop" down to worse compression than -7. 2304 is much better.

What included: Prog, heavy post-rock and not-so-growling guitars.  Not much Iron Maiden-alike metal - well, https://zephaniahband.bandcamp.com/track/destiny was part of it (not the album, this track from a sampler).  And there is music like https://houseofmythology.bandcamp.com/track/the-power-of-love from former black metal act Ulver.  (Yes former, just listen.)
And a couple of bootleg albums too (here is a long one, Nine Inch Nails & David Bowie: https://ninlive.com/shows/1995/19951011.html) and a couple of vinyl rips. 

What omitted: to give you an idea, https://bspliveseries.bandcamp.com/track/you-write-your-name-in-my-skin-live-2 .  Yes the drum machine provides for some transients, not all slow - but the guitar is not strummed in fast succession.  Lots of slow heavy music in the remaining 30 GB where -b 4608 could be of help.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-29 21:52:16
Thanks, good to have some data.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 09:44:54
Yeah, your results with lower block sizes [table below now includes a couple of -b 3456 too] have puzzled me a bit in general, as I only very rarely experience the same. So I tried a few more settings and found out that on this corpus, -r7 and even -r8 have some impact. That should indicate that by halving the block size, you get for free a "better" partitioning for Rice'ing - but that is not enough by far: halving block size inflates files by around .15 percentage points (tenfold what the -A thing improves!) Hm ... have you tried, whenever lower block size improves, whether -r7 or even -r8 would make for the same benefit?

Some more tests - to compare with other "minor impact" changes in parameters - are filled in to the table below.
Also some tests are not included in the table, as I scripted them with, *cough* different padding and I didn't bother to go back and re-do it. They are not comparable with the table, but they are comparable "with each other", so I give size orderings (worse to better):
-8eb2304  >  -8eb2304 -A  >  -8eb8192  >  -8eb8192 -A  >  -8eb4608  >  -8e  >  -8eb4608 -A  >  -8e -A  >  -8eb4608 -A
Then for -3/-5, it seems that 3000s fare well, but default is never far from best.
Then for -2, ordering: b512 > b1024 > b1152 default > b8192 with --lax > b3456 > b4608 > b2048 > b3072 > b3072 > b4096 > b2304.  I guess those who contemplate -l 0 have other concerns than .15 percentage points size impact though.

Table then, this time with compression ratios. Again the "-A" signifies -A "subdivide_tukey(4/125e-3);tukey(7e-1);flattop".

100,000%   .wav
63,004%    -8 --no-mid-side (i.e. dual mono)
62,075%    -5b8192 --lax
62,026%    -5b2048
61,971%    -5
61,660%    -8Mb2048
61,615%    -8r2 -b2048
61,601%    -7b2048
61,573%    -8r4 -b2048
61,561%    -8b2048
61,552%    -8r8 -b2048
61,545%    -8b2048 -A
61,507%    -8r2
61,505%    -7 -l 11
61,502%    -8b2304 -A
61,489%    -8M
61,470%    -8pb2048
61,466%    -8b8192 --lax
61,462%    -7b3456
61,458%    -8pb2048 -A
61,445%    -7b4608
61,442%    -7
61,431%    -8b3072 -A
61,429%    -8b3456
61,422%    -8r4
61,421%    -8pb2304 -A
61,413%    -8b3456 -A
61,406%    -8b4608
61,406%    -8
61,403%    -8r7
61,400%    -8r8
61,388%    -8 -A
61,387%    -8b4608 -A
61,378%    -8pb3072 -A
61,367%    -8pb4608
61,365%    -8pb3456
61,363%    -8p
61,349%    -8pb4608 -A
61,346%    -8p -A
61,342%    -8r9 -l 13 --lax

The table includes dual mono and -M (which selects decorrelation strategy adaptively) and carelessly I did not think over some of it being mono already - but that material (namely the NIN+Bowie bootleg) amounts only to 4.5 percent of the total file size.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-30 12:31:56
I compiled a CDDA list that I thought would work best with -b3456 without using --lax. Turns out -b2304 won. They are mostly J-pop, not necessarily very fast but in general have percussion with good transients and not too much reverb.
Code: [Select]
https://youtu.be/wZ6D6ikU7Qs
https://youtu.be/L-0cJqZ5WU4
https://youtu.be/MM8RufZr5lw
https://youtu.be/pYnLO7MVKno
Total length 8h56m42s, around 63.98% of original size when compressed.

-8b4608 -r8
3635937892 bytes

-8b2048
3635016402 bytes

-8
3634663623 bytes

-8 -r8
3634586639 bytes

-8b4608 -r8 -A subdivide_tukey(5/2e-1)
3634481652 bytes

-8b3456
3634303829 bytes

-8b3456 -r8
3634260975 bytes

-8b3072 -r8
3634036597 bytes

-8p -b4608 -r8
3633576341 bytes

-8b2304
3633523804 bytes

-8b2304 -r7
3633489680 bytes

-8b2304 -r8
3633480956 bytes

-8b3456 -r8 -A subdivide_tukey(5/2e-1)
3633136956 bytes

-8b2304 -r8 -A subdivide_tukey(5/2e-1)
3632658660 bytes

Looks like 1024- and 1152-based block sizes are not really correlated with sample rates; otherwise -b2048 should not perform this badly. The -b3456 result in my previous test may come from some material with fewer transients. If you still want to try CUETools.Flake, -8 --vbr 4 will be more effective with -r 8 and -s search.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 13:11:33
Why -8b2304 is so much better than -8b2048, putting them on opposite sides of the 3456/4096 ...
... probably we are in for another brute force.

Edit: Also -r8 helps -8, much more than it helps -8b<lower>, "the 8th r" doesn't help that much over the seventh ...
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-10-30 16:46:03
I compiled a CDDA list that I thought would work best with -b3456 without using --lax. Turns out -b2304 won. They are mostly J-pop, not necessarily very fast but in general have percussion with good transients and not too much reverb.
Code: [Select]
https://youtu.be/wZ6D6ikU7Qs
https://youtu.be/L-0cJqZ5WU4
https://youtu.be/MM8RufZr5lw
https://youtu.be/pYnLO7MVKno
Total length 8h56m42s, around 63.98% of original size when compressed.
[...]
If you still want to try CUETools.Flake, -8 --vbr 4 will be more effective with -r 8 and -s search.
Files were transcoded from APE, FLAC and WavPack instead of WAV, multi-threaded.

flac 1.4.2

-8b2304 -r8 -p
Total encoding time: 2:05.453, 256.68x realtime
3630058884 bytes

-8b2304 -r8 -pe
Total encoding time: 24:46.406, 21.66x realtime
3629421254 bytes

CUETools.Flake 2.2.2 (MD5 mismatch in 2 files)

-8 -r 8 -b 4608
3635394735 bytes

-8 -r 8 -b 2048
3635167153 bytes

-8 -r 8
3634138483 bytes

-8 -r 8 -b 3456
3633820714 bytes

-8 -r 8 -b 3072
3633680354 bytes

-8 -r 8 -b 2304
3633559971 bytes

The fixed block sizes tests above all have around 500x speed.

-8 -r 8 --vbr 4
Total encoding time: 1:26.344, 372.94x realtime
3627508218 bytes

-8 -r 8 --vbr 4 -s search
Total encoding time: 3:54.438, 137.35x realtime
3625410640 bytes

So vbr is an amazing thing... if there is no MD5 mismatch. Perhaps just scan for integrity after encoding, and re-encode the corrupted files with another encoder. In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
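The "x realtime" figures are just total audio length divided by wall-clock time, so they can be cross-checked against the 8h56m42s playlist length. A small sketch (function names are mine); the results land within about 0.01 of the posted figures, the small shortfall presumably because the displayed length is rounded to whole seconds.

```python
import re


def parse_duration(text):
    """Seconds from '8h56m42s'-style or 'm:ss.fff' / 'h:mm:ss' style strings."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?", text)
    if match and any(match.groups()):
        hours, minutes, seconds = match.groups()
        return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


def realtime_factor(audio_length, processing_time):
    """'x realtime' = length of the audio divided by the wall-clock time taken."""
    return parse_duration(audio_length) / parse_duration(processing_time)


runs = [("-8b2304 -r8 -p", "2:05.453"), ("-8b2304 -r8 -pe", "24:46.406"),
        ("-8 -r 8 --vbr 4", "1:26.344"), ("-8 -r 8 --vbr 4 -s search", "3:54.438")]
for label, elapsed in runs:
    print(f"{label:26} {realtime_factor('8h56m42s', elapsed):7.2f}x realtime")
```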
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-30 17:45:16
Encoding from .ape will skew the timings, as .ape takes even longer to decode than to encode. But as long as conditions are equal for each run, cardinal time figures are nothing but indications anyway, in this thread where the number of compiles x CPUs probably matches the number of FLAC options humans have ever hand-coded ...

In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
I always do. What if the process is aborted for whatever stupid reason, leaving a partial file?
Sure there is the -V , but for mass conversion: running a foo_bitcompare on all, to obtain one single line saying no differences, that is more idiot-proof than a human (myself) reading flac.exe's output.
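For the "scan everything after encoding" approach in a script rather than by reading flac.exe's output, a minimal sketch: run flac's test mode (-t, which decodes each file and, as far as I know, checks it against the MD5 stored in STREAMINFO) silently on every file and collect those with a non-zero exit status. It assumes a `flac` executable on the PATH and that a failed test returns a non-zero exit code; the function names are made up, and the `run` parameter only exists so the logic can be exercised without flac installed.

```python
import subprocess
from pathlib import Path


def failed_flac_tests(paths, run=subprocess.run):
    """Return the files for which `flac -t --silent` exits with a non-zero status."""
    failures = []
    for path in paths:
        result = run(["flac", "-t", "--silent", str(path)], capture_output=True)
        if result.returncode != 0:
            failures.append(path)
    return failures


def failed_in_folder(folder):
    """Test every .flac file below `folder`, recursively, in sorted order."""
    return failed_flac_tests(sorted(Path(folder).rglob("*.flac")))
```

An empty list from failed_in_folder means every file decoded and passed its check, which gives the same "one line saying no differences" convenience as a bit-compare run.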

Of course until fb2k v2 & foo_bitcompare are updated to treat 32-bit integer losslessly (that isn't the case yet I think?!) one has to use a different approach for those ... but then they aren't many. I don't have music in that format.

(And even with that zealous attitude of mine ... the first floating-point .wav's I downloaded, were Audition's format, and I should have WavPack'ed them using official wavpack.exe rather than through foobar2000.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-31 23:17:35
-0 does not pick the fastest block size!  Test done on CDDA with official 1.4.1 x64.

Since different block sizes have been tested, I did that for -0 and -2. Recall the difference between those is that -0 does dual-mono while -2 brute-forces the stereo decorrelation strategy. Both have an implicit -r3 which I have not touched.
I took all multiples of 512 and 576 up to 4608. Recall that -0 to -2 use 1152=2*576.

Computer: my friend's Ryzen-equipped not-so-expensive Acer consumer laptop, which delivers more consistent results than my Intels.
Corpus: As in my signature ... well, nearly: by mistake one file had a second copy, making it 39 CDs. But I corrected the sizes so that they are for the 38.

Timings are median of 6 runs (i.e. average between the two middle ones); first I did each setting separate, three runs; then I did three runs of -0b512, three of -0b576 etc.
Results, thanks to https://theenemy.dk/table/ , are sorted by time.
"BS" - for "BigSlow" but surely intended to be read as "bullshit" yes - indicates it is both bigger & slower than the one immediately above. "bs" indicates that if all the "BS" were removed, this would become a BS.
0b3072   275,19   13 304 034 333
0b3456   277,32   13 304 683 018   BS
0b2880   277,32   13 304 059 146   bs
0b2560   278,28   13 305 036 984   BS
0b2048   278,28   13 303 957 236
0b2304   280,21   13 301 667 605
0b1728   283,91   13 316 001 267   BS
0b1536   288,60   13 321 899 365   BS
0b1152   292,01   13 332 311 287   BS
0b1024   293,64   13 342 207 656   BS
0b4096   295,91   13 304 264 090   bs
0b4608   299,06   13 307 182 585   BS
2b1728   298,27   12 724 538 882
2b1536   301,65   12 730 340 579   BS
2b2880   301,34   12 713 304 251
0b0512   301,34   13 442 182 994   BS - well really, this is a "-0" down in the "-2" bunch
2b2560   302,90   12 714 138 123   bs - really a capital BS, only "saved by" the "-0"
2b3072   302,78   12 713 394 065   bs
2b2304   304,22   12 710 560 120
2b1152   302,78   12 740 658 828   BS
2b3456   303,08   12 714 271 305   bs
2b2048   308,75   12 712 682 405   bs
0b0576   317,19   13 418 999 908   BS - another "-0" here
2b1024   323,56   12 750 559 594   bs - this too saved from "BS" by a "-0"
2b0512   327,24   12 851 045 240   BS
2b0576   357,17   12 827 742 003   bs
2b4608   385,73   12 717 232 770   bs
2b4096   386,28   12 714 139 624   bs
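The BS/bs marking can be stated as a rule and checked mechanically. Walking down the list in the posted (time-sorted) order: a row is "BS" if its file is bigger than the row directly above it; otherwise it is "bs" if it is bigger than the smallest file seen so far (some faster setting already beats it); unmarked rows set a new smallest size. Applied to the sizes above, this rule reproduces every mark in the table. A small sketch with a made-up function name:

```python
def mark_rows(sizes):
    """Label rows given in time-sorted order: 'BS' = bigger than the row just above,
    'bs' = not BS but bigger than the smallest size seen above, '' = new smallest."""
    labels = []
    previous = None
    smallest = None
    for size in sizes:
        if previous is not None and size > previous:
            labels.append("BS")
        elif smallest is not None and size > smallest:
            labels.append("bs")
        else:
            labels.append("")
        if smallest is None or size < smallest:
            smallest = size
        previous = size
    return labels


first_six = [13304034333, 13304683018, 13304059146, 13305036984, 13303957236, 13301667605]
print(mark_rows(first_six))  # ['', 'BS', 'bs', 'BS', '', '']
```

The unmarked rows are exactly the settings no faster setting beats on size, i.e. the speed/size frontier.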
Inferences:  Well I don't really believe this to be any universal truth. Why should -0b2048 be faster than -0b2304 while -2b2048 is slower than -2b2304? But some patterns are obvious. For example, the "extremes" are quite slow, and not the best. 
* The smallest files are obtained with -b2304 in both settings. I guess the only reason for -2 is to avoid using multiplication while still squeezing more bytes out, so ... there you go. And -b2304 was only a couple of percent off the fastest block size as well.
* -0 to -0b2304 saves 4 percent time and a quarter percent size. -0 to -0b3572 (tripling block size) saves six percent in speed then.

I have no idea why -2b4096 and -2b4608 are that slow.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-10-31 23:56:09
Block size impact on "-5" and "-7" speeds. (Atop -5 because -5 is default, atop -7 because -7 is good.)

Damn me I forgot to record sizes on this one, but a couple of manual checks indicate that nah, nothing here that is worth it in cost/benefit terms.
But "interesting" it may be, even if the time impact from default is just a few percent saved - and -b 3456 seems to be fastest both at -5 and -7.

Same computer, compile, files (CDDA) and setup as the previous posting, but this time it is only median of three runs, and this time I managed to remove the 39th. Sorted by time:

5b3456   385,554
5b3072   386,659
5b2880   389,014
5b2304   390,926
5b1728   391,627
5b2560   391,999
5b2048   393,695
5b1536   395,008
5b1152   403,659
5b4608   411,612
5b4096   413,726
5b1024   414,359
5b0576   457,707
5b0512   466,306

7b3456   566,989
7b3072   571,83
7b2880   571,947
7b2560   576,015
7b2304   581,654
7b2048   589,71
7b4608   590,918
7b4096   592,326
7b1728   594,442
7b1536   605,249
7b1152   636,17
7b1024   650,702
7b0576   777,917
7b0512   805,091


So whoever came up with -b3456 (@bennetng, I think?) might have a bonus in this.
However, I never got -b3456 to produce the best compression - and to whomever came up with -7 -l 11 to speed up -7 slightly (@sundance , I think?), that one made for both faster encode and smaller files than -7b3456 in this corpus.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 15:00:09
I don't have enough RAM disk space to benchmark encoding, data listed here are decoding speed, using foo_benchmark and single thread in RAM disk. Previous tests indicate relative decoding speed vs encoding parameters can be highly CPU-dependent, so keep this in mind.


Some people may choose -0 because it decodes faster.

-0
8618108022 bytes, 1797.101x realtime

-0b2048
8603509098 bytes, 1836.416x realtime

-0b2304
8602686728 bytes, 1841.398x realtime

-0b3072
8605905301 bytes, 1845.695x realtime

-0b3456
8606959660 bytes, 1838.831x realtime

-0b4096
8607581496 bytes, 1851.142x realtime

-0b4608
8610113289 bytes, 1857.613x realtime

-3 and -8 are the lowest and highest presets that default to -b4096. As for --no-mid-side, most, if not all, of the test materials are normal stereo.

-3b1152
8214442723 bytes, 1454.438x realtime

-3b2048
8179862763 bytes, 1523.949x realtime

-3b2304
8177409458 bytes, 1523.017x realtime

-3b3072
8178587836 bytes, 1508.774x realtime

-3b3456
8179963450 bytes, 1523.766x realtime

-3
8182244258 bytes, 1517.872x realtime

-3b4608
8186319940 bytes, 1524.288x realtime

At -8 I think there is no need to test anything below -b2048, except for Merzbow fans.

-8b2048
7951523906 bytes, 1377.707x realtime

-8b2304
7945563198 bytes, 1360.829x realtime

-8b3072
7938910470 bytes, 1350.458x realtime

-8b3456
7936429482 bytes, 1323.356x realtime

-8
7932854429 bytes, 1331.976x realtime

-8b4608
7932931331 bytes, 1320.299x realtime

At last, -5:

-5b2048
7987232470 bytes, 1420.886x realtime

-5b2304
7983765417 bytes, 1417.812x realtime

-5b3072
7983025331 bytes, 1423.270x realtime

-5b3456
7983619874 bytes, 1410.768x realtime

-5
7984866814 bytes, 1408.704x realtime

-5b4608
7988286008 bytes, 1408.448x realtime

Decoding from a RAM disk (what I did in this test) is still slower than "Load whole file into memory first", which gives around 1400-1500x decoding speed at -8 and 2000-2100x at -0, but I don't have enough RAM to do this.

Corpus total length is 23h20m25s, and it compresses to around 53.52% at -8, to match ktf's graphs.

The playlist is deliberately built for balance, with classical, electronic, ethnic, jazz, new age, pop, speech etc., including both Eastern and Western works. The attached corpus.txt is not very well organized and shows a lot of "game music", but those are mostly individual tracks from different albums, while some other files are big images without track names. Anyway, "game music" is just all kinds of genres used in games, except that they don't have many vocals. This deliberate effort gave the intended results at -8 in terms of file size, I suppose. The lower presets are most likely limited by -l and -r, but -b1152 is still too low for them.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 15:22:21
In fact, after seeing this glitch I even scan files encoded with the Xiph encoders, if I am going to delete the original.
I always do. What if the process is aborted for whatever stupid reason, leaving a partial file?
[...]
(And even with that zealous attitude of mine ... the first floating-point .wav's I downloaded, were Audition's format, and I should have WavPack'ed them using official wavpack.exe rather than through foobar2000.)
I have many WavPack files directly saved with Audition without going through wav. WavPack saves markers and loops and they can be read by other software like Sound Forge and Reaper, therefore I also have 16 and 24-bit WavPack files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2022-11-01 17:30:58
I don't have enough RAM disk space to benchmark encoding, data listed here are decoding speed
[...]
Decode from RAM disk (what I did in this test) is still slower than "Load whole file into memory first" with around 1400-1500x decoding speed at -8 and 2000-2100x at -0, but I don't have enough RAM to do this.
Corpus total length 23h20m25s, around 53.52% at -8 to match ktf's graphs.
Even CUETools.Flake is happy with the corpus and showed no error.

-8 --vbr 4
7925706124 bytes, 1317.942x realtime

-8 -r 8 --vbr 4
7925683221 bytes, 1333.292x realtime

Adding -s (search) makes the encoding speed comparable to -8p in flac 1.4.2, so it was not tested.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-01 19:01:02
Some people may choose -0 because it decodes faster.

Just to repeat this for the record: in ktf's test done on an AMD processor (http://audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf), -3 decodes faster than -0.

Also, -1 and -2 are -0M and -0m respectively, so the only difference in decoding is the transformation from mid+side / left+side / right+side to dual mono - plus potentially the following: in the above link, we see that -2 decodes slightly faster than -1, and that is likely due to better compression and thus less data to read and unpack (what else could it be?)

In your test, -3 is slower than -0, so this is where your Intel behaves ... not like AMD ;-)


Then:
-3: It seems that 1152 is slow and everything else is about equal. Don't know how much variation you would get by a re-run.
-8: I don't know why lower block sizes are faster here, but it might be that they use less complicated Rice partitioning - that is, 4096 could get another subdivision of two, compared to 2048? Just thinking aloud.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-11-07 21:22:16
So I realize there's a very steep wall of diminishing returns when using encoding options beyond simply using -8.
I randomly tested this about a week ago, using Nine Inch Nails - The Fragile, the full album as a single WAV file.

CPU = AMD 5850U
Original wave size = 1046.52 MiB
-8 = 20 seconds, 628.39 MiB  (60.04%)
-m -b 4096 -p -r7 -l18 -A subdivide_tukey(21/15e-1) = 283 minutes, 626.29 MiB  (59.84%) -0.02%
-m -b 4096 -p -e -r7 -l24 -A subdivide_tukey(7) = 564 minutes, 625.95 MiB  (59.81%) -0.23%
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB  (59.76%) -0.37%

6.25 days to shave off just under 3 MiB  compared to just using -8!
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2022-11-08 00:39:45
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB 
8972 minutes! That is what i call a performance test  8)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 07:24:52
So I realize there's a very steep wall of diminishing returns when using encoding options beyond simply using -8.
You can argue for "beyond -5" and "beyond -7" there as well. Try those!

Yes, there is no "practical limit" to how slow you can make the encoding. I once ran it over a few days while I was away (the -A enough-for-five-days (https://hydrogenaud.io/index.php/topic,120158.msg1001834.html#msg1001834) line here). Your test was on a high-resolution file (otherwise the -l values would have required --lax), where -e often makes more difference than on CDDA.
Still, try the following, which won't take nearly as much time:
-8p
-8e
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch"
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch" -p
-m -b 8192 -r9 -l 15 -A "tukey(7e-1);punchout_tukey(4);subdivide_tukey(12);welch" -e
and see how they compare.
If one does well ... well, how to tell? You need to run all possible combinations brute-force. That is what takes time.


-m -b 4096 -p -r7 -l18 -A subdivide_tukey(21/15e-1) = 283 minutes, 626.29 MiB  (59.84%) -0.02%
-m -b 4096 -p -e -r7 -l24 -A subdivide_tukey(7) = 564 minutes, 625.95  (59.81%) -0.23%
-m -b 4096 -p -e -r7 -l32 -A subdivide_tukey(21/15e-1) = 8972 minutes, 625.47 MiB  (59.76%) -0.37%
There was something very unreasonable about your differences, as you gained only half a megabyte between the latter two. And indeed you have not quoted them correctly. They are
-0.20 not 0.02
-0.23 yep
-0.27 not 0.37. Well it is really 0.28 after roundoff.
So going from slow to ultra-slow gives you slightly less than 0.08 - not the mighty 0.37-0.02=0.35 your numbers could suggest.

(Oh, but it is percentage points - you get nearly half a percent ;) )
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2022-11-08 08:15:47
Apologies for the typos in the percentages. My window to edit the post closed before I realized. It's CDDA audio, so I should have specified that the --lax option was used.

I had run other random tests, but only included a few.  I find that running either -e or -p in combination with subdivide_tukey gives better compression results than using -e and -p combined, while being faster.  I also noticed using higher values can sometimes result in worse compression than using smaller values.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 12:11:22
Oh, I forgot that The Fragile is a double CD  :-[

I find that running either -e or -p in combination with subdivide_tukey gives better compression results than using -e and -p combined, while being faster.
My experience is that for CDDA, you can outdo -e in shorter time by subdivide_tukey around 5. Above that, go for -8p. And that -8p -A subdivide_tukey(reasonably high) still will outdo -8pe.
For high resolution, the -e still isn't useless.

The thing is, brute-forcing "roundoffs" (well -e tries to put the upper coefficients equal to zero, in which case you don't have to store them, that saves space) does not only find the best trade-off between the bits the coefficients take up and the bits the coefficients will save - it also "unpredictably" moves the coefficients somewhere in a direction which every now and then is "better without us knowing why before actually calculating it". Brute-forcing means to actually go through the encoding for a lot of different (possibly only-slightly-different) coefficient vectors, and even if the resolution doesn't really save much, it could "by chance" be better. And even more so if FLAC's way of "guesstimating first to pick the best which is then calculated thoroughly" is not-so-good - and that has evidently been tested on CDDA.

I also noticed using higher values can sometimes result in worse compression than using smaller values.
Higher number in subdivide_tukey(N)? That also changes the effective tapering, which does not necessarily make things better or worse, but if you do many tests you should see both directions. Your 21/15e-1 means that the "full tukey" part of it has a taper of 1.5/21 = 1/14, around 0.07, which in my tests would be too close to a rectangle. Hence my suggestion to include a separate tukey with more tapering.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2022-11-08 23:23:16
And just to get some idea on what makes differences on The Fragile:
* Three MB saved -5 to -7.
* Another half a MB-ish saved -7 to -8.
* Another half a MB-ish -8 to -8p. (Midway in between: -8r7.)
* Another half a MB-ish -8p to -8pr8
* Another half a MB-ish for stacking up with -A "subdivide_tukey(12);tukey(7e-1);punchout_tukey(4);welch;hann;flattop"

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-03-24 18:54:22
Tested:
30-ish GB of decoded HDCD rips. Yeah I know I shouldn't have made that irreversible mistake fifteen years ago, but here we are.
So these are effectively 17-ish bits in a 24-bit container - 95 percent of the tracks have a peak below .8, scanned with oversampling.

* Why test these? Just to see whether there are any surprises with slightly unusual signals.
* Were there any? Not really. -4 isn't particularly good; I have earlier questioned whether -5 is really much of an improvement over -4, but here it is. Anyway, that question has probably not made any great impact; I mean, who uses -4?

What I did was to re-encode FLAC files (with overwrite) on an SSD. That is why I quote the times per encoded gigabyte. "142" means reference FLAC 1.4.2 Win64, 134 means 1.3.4 Win64, both Xiph builds.
Numbers then. Size relative to 1.4.2 -5, then setting, then comment with time taken. Sizes are file sizes, with tags and default padding.

+1.574%     142-4      ~25 sec/GB (encoded GB)
+0.119%     134-5
ref. point  142-5      ~30 sec/GB; 31 808 619 711 bytes
-0.255%     134-7
-0.306%     134-8
-0.347%     142-7      ~37 sec/GB
-0.397%     142-8      ~1 min/GB
-0.399%     134-8p
-0.412%     142-8e     ~3 min/GB
-0.428%     142-"all the sevens but no p" (see below), also ~3 min/GB
-0.461%     134-8pe    ~20 min/GB
-0.480%     142-8p     2 min 40 s/GB
-0.507%     142-8pe    also in the ~20 min/GB ballpark
-0.514%     142-"-p all the sevens", about the same time as -8pe

That "all the sevens" - and why not "8"? It is not because it is good! It is because I wanted a command that was easy to remember, takes about as much time as "-e" and outcompresses -e. This supports the claim that "-e" should not be used on typical music at normal resolutions: if you are willing to wait for -e, there are better things around. (It is known that -e may still have something going for it at higher sampling rates.)
The actual option line is -7r7 -A "flattop;gauss(7e-2);tukey(7e-1);subdivide_tukey(7)" with an additional "-p" for the last line. And a -f to overwrite, but that goes for everything.
For those who ask "why -7 and not -8"? It wouldn't make any difference, -8 is -7 with a different (and heavier) "-A", and the moment I write "-A" here I override that by specifying yet a different (and heavier!!) -A.

Did you really type "flatopp"?
Damn, only one way to find out, and that is not done in two minutes ...
Here I made sure to get it right!  O:)

But anyway, the bottom line is what we knew: 1.4.x improves, and -e does not deliver at these resolutions. -p is nearly as expensive, but much better.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-03-26 10:10:02
Can anyone try to replicate the following observation, using their fave build and CPU?

Prediction order (the "-l" switch) makes for a big time penalty somewhere above -l12.

Being --lax settings, they may have gone under everyone's radar for good reason. But the impact is unexpectedly big here, see plot below.

Here is what I did, using the "timer64" tool - but PowerShell wizards can probably come up with something built-in (and *n*x users, you likely know what to do):
for /l %l IN (6,1,32) DO timer64 flac --lax -fr0 -ss -l %l filename*.flac >> logfile.txt
for /l %l IN (6,1,32) DO timer64 flac --lax -fpr0 -ss -l %l filename*.flac >> logfile.txt
... re-encoding, yes (that's the -f), so in principle every successive encode has to read a more complicated FLAC file, but (1) FLAC decodes so quickly it shouldn't matter, and (2) anyway a jump would be a surprise. The "-r0" is there to ensure that the partitioning is done the same way for every run.

Timings from a quick run on one album (Swordfishtrombones) - this fanless computer is cooling-constrained and its timings have proven quite unreliable, but I ran -l15 and -l16 (indicated in the oval) several times on several files, and that particular jump is quite consistent. For -p, the impact is more dramatic, already at -l 13.
(https://i.imgur.com/BpkBiWF.png)              
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-03-26 14:01:55
Prediction order (the "-l" switch) makes for a big time penalty somewhere above -l12.
Makes perfect sense. Loops are only unrolled until order 12, not for orders above that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-03-26 17:53:30
Loops are only unrolled until order 12
That is something I need translated ... or maybe I should just not bother my pretty little head with those details.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Replica9000 on 2023-03-26 18:46:36
Sounds like a code optimization (-funroll-loops) that's only beneficial up to the max LPC order of 12
Title: Re: FLAC v1.4.x Performance Tests
Post by: doccolinni on 2023-03-26 19:15:06
That is something I need translated

Good thing we're on the Internet!

https://en.wikipedia.org/wiki/Loop_unrolling
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-03-26 19:18:51
I don't know how to explain this in simple terms, but let's say that for each order up to and including 12, there is code optimized for that specific order. For orders above 12, there is generic code.

A compiler can optimize loops in code much better if it knows in advance how often that loop will be traversed. It can 'unroll' a loop. In the generic code, the CPU will have to check after each addition and/or multiplication whether it needs to do another one for this sample, or whether it can move on to the next sample. When a loop is unrolled, there are simply a number of additions and multiplications after one another before encountering a check.

So, generic code looks like this:
Code: [Select]
repeat the following code for each sample {
     repeat the following code for each order {
          do multiplication
          do addition
     }
}

In FLAC, this is unrolled for orders up to and including 12, like the following.
Code: [Select]
[...]
Use this code for order 2:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 3:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

Use this code for order 4:
repeat the following code for each sample {
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
     do multiplication
     do addition
}

This is pretty much what happens for residual calculation, strictly up to order 12. This is the change you're seeing for the red line, because when using -p the residual calculation code dominates the execution time. Just look at the code here: https://github.com/xiph/flac/blob/master/src/libFLAC/lpc.c#L1101

For the blue line, the change between 15 and 16 is a little bit more complicated. It has to do with the autocorrelation calculation, which can be optimized in groups of 4, more or less. So there is code for orders below 8, below 12 and below 16. You see this with the blue line, because when not using -p (or -e) the autocorrelation calculation dominates the execution time. Look at the code here: https://github.com/xiph/flac/blob/68f605bd281a37890ed696555a52c6180457164f/src/libFLAC/lpc.c#L158
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-03-26 19:54:41
Ah, OK. So it could be in the code and it could be done at compile time - meaning that some builds might potentially behave differently? Or maybe not. Anyway, problem "solved" ...

... except for those who might think that hey, if they are willing to consider a different lossless codec than FLAC, then non-subset FLAC is at least as compatible maybe?  O:)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-04-12 11:50:36
To the "showcases" division, a file downloaded from <stupidlyhi-rez site that also offers a DSD file at 16x the "ordinary" 2822400>
Resolutions 384/24 and 384/32 (huh, that says encoded with 1.3.1 that one too).
Duration: 5:04.

The 32-bit file:
480 434 934 bytes (12637 kbps) for the one downloaded that says it was encoded with reference libFLAC 1.3.1 20141125
310 731 527 bytes for 1.4.2 at -8per7 -b8192 -A <quite some but not running overnight>

The 24-bit file is maybe more interesting, since flac.exe 1.3.1 supports 24 bits and it is easier to track the options used:
244 474 825 bytes for 1.3.1 at -5 (matches the 6428 kbit/s as downloaded).
240 441 708 bytes for 1.3.1 at -8pe (this is hi-rez, -e matters more than -p on this track)
I couldn't get 1.4.2 with fixed predictor to beat any of these, but ...
231 107 835 bytes for 1.4.2 at -3r0 -l4 deliberately using as weak LPC as ...
182 464 188 bytes for 1.4.2 at -3 taking more than 20 percent off what 1.3.1 could achieve
177 297 877 bytes for ffmpeg at default
175 257 853 bytes for 1.4.2 at -3e - see, "-e" matters quite a lot even here.
164 436 893 bytes for 1.4.2 at -5 default.
164 426 962 bytes for ffmpeg at -compression_level 8
153 320 636 bytes for ffmpeg at -compression_level 12
148 262 872 bytes for 1.4.2 at -8
145 371 133 bytes for 1.4.2 at -8e; -e matters less here!
142 006 237 bytes for 1.4.2 at -8per7 -b8192 -A <quite some but not running overnight>
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-04-15 20:06:33
Here are compiles using a more generic CPU optimization x86-64-v3 instead of haswell while using similar capabilities up to AVX2 in the hope for better performance across more modern CPU types.
Inside are builds with Clang 16.0.1, GCC 12.2.0 and a "disable-asm-optimizations" version with faster 16bit performance but slower 24bit performance for apps like CUETools or EAC.
Thanks. It is the first time that Clang got the best overall results on my i3-12100, in both CDDA and hi-res. The GCC builds perform pretty similarly to last October's builds.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-04-15 20:44:45
Thank you for the feedback. Here on my 5900x GCC is still slightly faster. I might add that exact benchmarking is hard on this system because every run is different.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-04-15 22:06:31
To the "showcases" division, a file downloaded from <stupidlyhi-rez site
According to @bennetng, it is not only the resolution itself that is dumb, it is the combination of resolution and "crappy quality junior MIDI sequencing stuff" (https://hydrogenaud.io/index.php/topic,123889.msg1025140.html#msg1025140), so take it with a grain of salt. Anyway, stupid resolutions are a waste of space, and at least it is good that FLAC 1.4.x wastes less of it.

Also, you got artefacts like the following, for the 384/24 file:
308 548 254 for WavPack -hx, but -hx4 compresses to less than half of that: 151 671 854
209 to 222 for Monkey's ... which isn't happy about stupidly high resolutions and gets beaten by FLAC -3
FLAC -8e beats OptimFROG --preset 0.

Something is "as it should be" though: when I "faked it as 192 kHz" (that is, the same samples, just telling the file they are at half speed) so that TAK can handle it, it compresses better than anything but default-and-up OptimFROG, which at preset 10 can get it to < 119. Not quite down to half of Monkey's.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-04-16 08:09:04
The original sample rate is likely 96kHz. See how clean the upper spectrum is, it is impossible for anything originally recorded in 384kHz with an ADC. Basically it is the same thing I mentioned previously regarding the use of -e, even if the content is not "crappy quality junior MIDI sequencing stuff".
https://hydrogenaud.io/index.php/topic,123025.msg1018144.html#msg1018144

Another thing is that the 24-bit file on the website is clipped (the vertical cyan lines). The website sells conversion software and it is a pretty pathetic way to sell the product with such a careless conversion.

Here are some examples of what clip-free 24-bit conversions should look like when using the 32-bit file as input.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-04-16 15:45:12
The original sample rate is likely 96kHz. See how clean the upper spectrum is, it is impossible for anything originally recorded in 384kHz with an ADC.
It makes me wonder whether it is possible to exploit this without a brute-force -e.
How is upsampling done in practice, and how would that translate into "a good linear predictor"? Say, for the sake of illustration, that my "upsampling" just repeats each sample; then if you put weights ABCDEFGH on the eight past samples of the original, you could just try
0, A, 0, B, 0, C, 0, D, 0, E, 0, F, 0, G, 0, H
for the upsampled.
I have a hunch that it isn't as simple. Of course if one were using an iterative estimation, that could be one starting point to try.


Another thing is that the 24-bit file on the website is clipped (the vertical cyan lines). The website sells conversion software and it is a pretty pathetic way to sell the product with such a careless conversion.
Vicious.
(I love it!)

So whoever did this went to lengths to force libflac 1.3.1 to handle a 32-bit signal (I mean, flac.exe couldn't!) but the 24-bit signal behaves differently ... what for? 32-bits rounding up the 24th to overflow? Anyway, foo_bitcompare reports a peak difference of -20.48 dB, which is suspiciously high but not high enough for my ears to bother trying to ABX it out.

(So the next time I worry about whether I really want to give much publicity to a site that helps you "upconvert" your music to something nonsensical just for audiophool customers ... well, probably someone will uncover the ways to ridicule them. Let's see:
* this one
* Sound Liaison going 768 kHz ... for what?! It was recorded at much less. Oh, it captures the all-important noise of the analog tape they put their digital recording through
* nativedsd putting out high resolution test files that are just zero-padded to a higher bit depth)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-04-16 16:08:48
It makes me wonder if it is possible to exploit without a brute-force -e.
My guess is the provided -A options are not designed for this kind of abrupt and deep cutoff, but even if more options are provided we still need to try them out manually.
Quote
So whoever did this went to lengths to force libflac 1.3.1 to handle a 32-bit signal (I mean, flac.exe couldn't!) but the 24-bit signal behaves differently ... what for? 32-bits rounding up the 24th to overflow? Anyway, foo_bitcompare reports a peak difference of -20.48 dB, which is suspiciously high but not high enough for my ears to bother trying to ABX it out.
It may not be a flac issue; my theory is that the clipping was introduced before encoding, either through an oversight during DAW export or a bug in that guy's software converter.

Some other findings I would like others to confirm/test:

"Wait for Spring" 32/384:
Upper: @Wombat 's Clang
Lower: Retune by @ktf
Method: Encode the original file to a new file in PowerShell.

-8 -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(2e-2)"
323473111 bytes, 49 seconds
337529462 bytes, 103 seconds

-8l32 -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
348323578 bytes, 105 seconds
337529462 bytes, 103 seconds

-8e -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
307255778 bytes, 266 seconds
303683449 bytes, 1551 seconds

-8p -b16384 -A "subdivide_tukey(5);blackman;gauss(5e-2);gauss(3e-2)"
317620242 bytes, 429 seconds
332639121 bytes, 879 seconds

I don't know why -l32 makes no difference on the retune, perhaps it detects the bloat?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-04-16 17:25:41
It may not be a flac issue
I didn't intend to suggest it was. Rather, whoever made this converter went to lengths to bend flac-before-32-bit-support to accommodate it, but couldn't be bothered to check for clipping.


I don't know why -l32 makes no difference on the retune, perhaps it detects the bloat?
That is because the retune's "-8" does in fact select "-l 32" whenever the sampling rate is high enough for it to be subset-compliant. "-r 8" as well, since that part of it got a speedup.
But also, the retune did tweak the algorithm to select LPC order, with consequences you can find in that thread. Next test build is likely different.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-04-16 17:34:30
That is because the retune's "-8" does in fact select "-l 32" whenever the sampling rate is high enough for it to be subset-compliant. "-r 8" as well, since that part of it got a speedup.
But also, the retune did tweak the algorithm to select LPC order, with consequences you can find in that thread. Next test build is likely different.
Ah yes, I've completely forgotten these details :-[
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-04-17 18:58:09
The retune can be quite efficient with other settings without increasing -l.
"Wait for Spring" 32/384:

-8b8192 -l12 -A "subdivide_tukey(5/1);blackman;gauss(5e-2);gauss(2e-2)"
314252255 bytes, 52 secs

-8b8192 -l12 -A "subdivide_tukey(12/2);blackman;gauss(5e-2);gauss(2e-2)"
304400309 bytes, 203 secs

-8eb8192 -l12 -A "subdivide_tukey(5/1);blackman;gauss(5e-2);gauss(2e-2)"
302974489 bytes, 322 secs

Some characteristics of a file, like EBU loudness stats and signs of this kind of resampling, can be obtained much more cheaply by scanning the whole file. It should be possible to make a separate app that gathers this information and then generates a suggested command-line batch file to feed the encoder.

For example, loudly mastered files could benefit from using fixed -q values:
https://hydrogenaud.io/index.php/topic,123025.msg1018053.html#msg1018053
Title: Re: FLAC v1.4.x Performance Tests
Post by: darkalex on 2023-04-29 14:27:10
@Porcus

have you ever experimented with the other apodization functions in flac? apart from the widely used subdivide_tukey, there are a bazillion other functions there, and you can actually use several of them together

this, together with -r 8 -p -l 12, seems to give compression slightly better than or equal to -l8, yet noticeably faster

any ideas? oh, I also found that for some tracks, like electronic or digital music, much smaller block sizes like 128 or 256 were actually giving phenomenal compression compared to the default L8 and 4096 size... in some ordinary tracks, I tested 2048 as block size and it worked better, but then it craps out in another set... I wonder if there's a way to find which block size could be optimal for a particular track/set of tracks without manually brute-forcing them...

any ideas? I wonder if that -e extended model search does this, because it is not documented anywhere exactly what it does apart from the "expensive!!!" prompt in FLAC docs



MOD note: This post was merged from another topic.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-05-01 00:03:34
Brief while out on the road, @darkalex :
Moderator moved this to here. You will find several suggestions in this thread.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-05-15 21:37:27
Another example that -e is more suitable for these kinds of test signals with a lot of unused spectral spaces.
https://www.soundonsound.com/techniques/sos-audio-test-files-downloads
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-05-16 06:31:23
Got any idea of any "easy" way to improve on the guesstimation?
flac.exe can hardly employ a full signal analysis, but I suppose some blockwise variability metric might be put at work.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-05-16 07:31:44
I want to ask the same question as well. My comments about -e are based on observation, not because I precisely know how -e works. Just wondering how much time can be saved if a two-pass approach is allowed.

Some audio interfaces (e.g. Merging) have additional filters to reduce ultrasonic noise from ADC modulators when using >= 176.4k recording rates, also one cannot rule out that some hi-res releases have ultrasonic noise attenuated by using a DAW, and filter characteristics would vary depending on the mastering engineers' preferences. The same applies to DSD to PCM transcoding as well, different software may have different filters.
Title: Re: FLAC v1.4.x Performance Tests
Post by: darkalex on 2023-05-16 14:17:37
can the dev or someone who knows the source code of FLAC chime in and clarify this for us?

the -e function seems to be the most complex in FLAC and except the ambiguous 1 liner description, we have no idea how it works or what it does

anyone?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-05-16 15:02:14
It is actually rather simple. Normally, FLAC uses some well-known math: calculating LPC coefficients with the Yule-Walker equations solved with Levinson-Durbin recursion. This goes back many decades, to 1960.

Anyway, this recursion gives us a set of models, one for each order. If you specified a max LPC predictor order of 12 (this is the default for compression levels 7 and 8), 12 models are returned, each with an associated error. This error does not correlate exactly with compression, but it works rather well. The error is slightly weighted to account for the fact that choosing a higher order adds a slight amount of overhead.

If you do not specify -e, FLAC picks the model of which the weighted error is lowest. If you specify -e, FLAC does not use the error and simply tries all generated models one by one, picking the one giving the highest compression. This takes quite a while of course.

If you specify more than one apodization function, FLAC does this procedure (generating models and subsequently trying 1 or all) once for each apodization.
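To make ktf's description concrete, here is a minimal Python sketch of the idea: one Levinson-Durbin pass yields a model for every order, then either a cheap error-based estimate picks one (no -e) or every model is actually encoded and the smallest wins (-e). It assumes a rectangular window (a single apodization; with several -A windows the whole thing would repeat per window), unquantized floating-point coefficients, and one Rice parameter per block. The function names, the 16-bit per-order overhead and the bit estimate are hypothetical illustrations in the spirit of the "weighted error", not libFLAC's exact code.

```python
import math
import random

OVERHEAD_BITS_PER_ORDER = 16  # hypothetical cost of storing one coefficient


def autocorrelation(x, max_lag):
    """r[k] = sum of x[n] * x[n - k], i.e. a rectangular (no) window."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Solve the Yule-Walker equations; return (coeffs, error) for every order."""
    models, a, err = [], [], r[0]
    for k in range(1, max_order + 1):
        if err <= 0:
            break  # signal is already perfectly predicted
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        refl = acc / err  # reflection coefficient for this order
        a = [a[j] - refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
        models.append((a, err))
    return models


def residual(x, coeffs):
    """Prediction residual; coeffs[j] weights sample x[n - 1 - j]."""
    p = len(coeffs)
    return [x[n] - round(sum(c * x[n - 1 - j] for j, c in enumerate(coeffs)))
            for n in range(p, len(x))]


def rice_bits(res):
    """Size of the residual with the best single Rice parameter."""
    u = [2 * v if v >= 0 else -2 * v - 1 for v in res]  # zigzag to unsigned
    return min(sum((v >> k) + 1 + k for v in u) for k in range(15))


def actual_bits(x, coeffs):
    return rice_bits(residual(x, coeffs)) + len(coeffs) * OVERHEAD_BITS_PER_ORDER


def pick_by_estimate(models, n):
    """Without -e: trust the weighted prediction error, encode nothing."""
    def estimate(i):
        coeffs, err = models[i]
        bits_per_sample = max(0.0, 0.5 * math.log2(max(err, 1e-12) / (2 * n)))
        return bits_per_sample * (n - len(coeffs)) + len(coeffs) * OVERHEAD_BITS_PER_ORDER
    return min(range(len(models)), key=estimate)


def pick_exhaustive(x, models):
    """With -e: actually encode every order and keep the smallest."""
    return min(range(len(models)), key=lambda i: actual_bits(x, models[i][0]))


random.seed(1)
x = [int(1000 * math.sin(0.05 * n) + 300 * math.sin(0.31 * n) + random.gauss(0, 20))
     for n in range(1152)]
models = levinson_durbin(autocorrelation(x, 12), 12)
est = pick_by_estimate(models, len(x))
exh = pick_exhaustive(x, models)
print("estimate picks order", est + 1, "->", actual_bits(x, models[est][0]), "bits")
print("exhaustive picks order", exh + 1, "->", actual_bits(x, models[exh][0]), "bits")
```

The exhaustive pick can never be larger than the estimated pick, since it minimizes the real size over the same candidates; the price is encoding every order instead of one.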
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-05-16 16:54:44
The manual's exposition is not so clear though:

-e, --exhaustive-model-search
    Do exhaustive model search (expensive!)

OK, so a command called "--exhaustive-model-search" does exhaustive model search (which is expensive), but it does not specify which aspect of the model it exhausts. Maybe the easiest way (with slightly less than a fifty percent chance that @ktf will have to correct me) to understand what it brute-forces is to check what is brute-forced by the other switches (-p and -r), plus the additional knowledge that blocking (-b) is done before the LPC modelling even starts, is not part of the "model search", and isn't optimized at all in current flac.exe.

To a novice reader it might even be a bit confusing that precision can be set exactly with -q, and partitioning not only exactly but within a range with -r, while the LPC order (history taken into account!) can only be set as a maximum. I'm not saying that allowing a construction like -l 10,12 would be good for an end-user for anything but explaining something an end-user doesn't need to deal with (but it could be fun for testing).
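For comparison with -e, here is a rough Python sketch of the kind of brute force -r performs over a range of partition orders: split the residual into 2^order partitions, give each its own Rice parameter, and keep the cheapest order. Simplifying assumptions: zigzag-mapped residuals, a flat 4-bit parameter per partition, no escape codes, and the first partition shortened by the predictor order. The function names and the synthetic residual are hypothetical; this is not libFLAC's code.

```python
import random


def zigzag(v):
    """Map signed residuals to unsigned: 0, -1, 1, -2 ... -> 0, 1, 2, 3 ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def partition_bits(u):
    """4-bit Rice parameter plus the cheapest Rice coding of this partition."""
    return 4 + min(sum((v >> k) + 1 + k for v in u) for k in range(15))


def best_partition_order(res, blocksize, predictor_order, min_order, max_order):
    """Try every partition order in [min_order, max_order]; keep the smallest."""
    u = [zigzag(v) for v in res]
    sizes = {}
    for order in range(min_order, max_order + 1):
        parts = 1 << order
        if blocksize % parts or (blocksize >> order) <= predictor_order:
            break  # partitions must divide the block and hold some residuals
        size, start, bits = blocksize >> order, 0, 0
        for p in range(parts):
            n = size - predictor_order if p == 0 else size  # warm-up samples
            bits += partition_bits(u[start:start + n])
            start += n
        sizes[order] = bits
    return min(sizes, key=sizes.get), sizes


random.seed(0)
blocksize, lpc_order = 4096, 8
# Quiet first half, loud second half: a case where partitioning should pay off.
res = [int(random.gauss(0, 4 if i < 2000 else 400)) for i in range(blocksize - lpc_order)]
best, sizes = best_partition_order(res, blocksize, lpc_order, 0, 6)
for order, bits in sizes.items():
    print(f"-r {order}: {bits} bits")
print("best partition order:", best)
```

Each extra order doubles the number of partitions to evaluate, which is one reason why going from -r6 to -r7 costs time for diminishing returns.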


As for the error ... the error used to select the model is not the size of the encode; using that would amount to brute-forcing. It is what you want to obtain, but not what you want to do: you want something quicker. Hence the question whether there is some better way to do it that is still quick enough. There is some theoretical support for a logarithmic size measure, and since discrete logs can be obtained by bit-shift, it should be possible ... but "theoretical support" does not mean that it improves much on actual data over a method that has been tweaked to the point where it by and large works quite well.
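As a toy illustration of the logarithmic idea (not of what flac.exe does): a squared-error measure is dominated by a few large residuals, whereas a shift-based logarithmic measure tracks the Rice-coded size more closely. `ilog2`, `log_bits` and the two synthetic residual vectors are hypothetical examples chosen to make the contrast visible.

```python
def zigzag(v):
    return 2 * v if v >= 0 else -2 * v - 1


def ilog2(v):
    """floor(log2(v)) for v >= 1, using right shifts only."""
    n = 0
    while v > 1:
        v >>= 1
        n += 1
    return n


def squared_error(res):
    return sum(v * v for v in res)


def log_bits(res):
    """Shift-based size proxy: roughly the bit length of each mapped residual."""
    return sum(ilog2(zigzag(v) + 1) + 1 for v in res)


def rice_bits(res):
    """Actual size with the best single Rice parameter, for reference."""
    u = [zigzag(v) for v in res]
    return min(sum((v >> k) + 1 + k for v in u) for k in range(15))


spiky = [0] * 990 + [1000] * 10                       # mostly silent, a few spikes
steady = [30 if i % 2 else -30 for i in range(1000)]  # small but never zero

for name, res in (("spiky", spiky), ("steady", steady)):
    print(f"{name:6} squared={squared_error(res):>9} log={log_bits(res):>5} rice={rice_bits(res):>5}")
```

Here squared error prefers the steady residual, while both the log proxy and the real Rice size prefer the spiky one. Whether that matters on real music is exactly the open question above.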


order of 12 (this is default for compression level 12)
Level -8. (And -7.)

Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-06-21 13:27:58
-0 does not pick the fastest block size!  Test done on CDDA with official 1.4.1 x64.
Tested again with last Sunday's build: https://hydrogenaud.io/index.php/topic,123176.msg1029000.html#msg1029000 , which has some speedups committed.

The time penalty for -b1152 is still there, but no longer as pronounced (was >6 percent over 3072, now 2 percent). The same goes for the time penalty on -b4096 (down from +7.5% to +4%).
Still, block sizes in the 3000s were the fastest, followed by the 2000s (although with a catch in 'Global time' for -b2048), then the 1000s, then 4096 and 4608. A different CPU was used, an Intel this time.

Timings according to the aforementioned timer64.exe follow. Figures are not comparable to the ones in the above link (different fileset):

-b       'Process'  'Global time'
-b1024      626         667
-b1152      621         662
-b1536      626         666
-b2048      619         672
-b2304      616         653
-b2560      619         660
-b3072      609         645
-b3456      607         642
-b3584      611         649
-b4096      634         673
-b4608      628         662
Now sorted by block size, not by speed. Here I ran the Win32 version. Timings are medians over five runs, after an initial run (which would probably be affected by whatever ran before) was discarded. Each timing covers three -0fb<XXXX> encodes of the 38 CDs in my signature: twice from FLAC and once from WAV, on an SSD.
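The measurement method (discard a warm-up run, then take the median of the rest) is easy to script. A minimal Python sketch, assuming wall-clock time is what you want (comparable to timer64's 'Global time', not its 'Process' CPU time); `time_runs` is a hypothetical helper, and the stand-in command just starts Python so the sketch runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=5, warmup=1):
    """Wall-clock timings of cmd, in seconds.

    The first `warmup` runs are executed but discarded.
    Returns (median, fastest) of the remaining runs.
    """
    timings = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if i >= warmup:
            timings.append(time.perf_counter() - start)
    return statistics.median(timings), min(timings)


# Stand-in command; for a real test use something like
# ["flac", "-0", "-f", "--silent", "-b3072", "album.wav"] (hypothetical file name).
median, fastest = time_runs([sys.executable, "-c", "pass"], runs=3)
print(f"median {median:.3f} s, fastest {fastest:.3f} s")
```

The median resists a single run disturbed by background activity; reporting the fastest as well gives a quick sanity check on run-to-run variability.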
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-23 16:16:04
Just tried some flac 1.4.3 x64 builds from the release thread. Wombat's GCC build with asm optimizations and john33's AVX2 build have the best overall encoding speed for my i3-12100. Tried -8p with CDDA, also tried -8e and -8p with 24/48.
Title: Re: FLAC v1.4.3 (Release)
Post by: itisljar on 2023-06-24 12:21:22
Are AVX2 compiles actually faster?

I've tried both of them on one CD, and after multiple retries I've got these results:

non-avx:
Kernel  Time =     0.656 =    5%
User    Time =    10.187 =   86%
Process Time =    10.843 =   92%    Virtual  Memory =     15 MB
Global  Time =    11.745 =  100%    Physical Memory =     19 MB

avx2:
Kernel  Time =     0.718 =    6%
User    Time =    10.062 =   85%
Process Time =    10.781 =   91%    Virtual  Memory =     15 MB
Global  Time =    11.735 =  100%    Physical Memory =     16 MB

CPU is Ryzen 5 3600, 16 GB RAM. Files were on an SSD.

It's almost exactly the same if you look at global time. I've decided to keep the non-AVX one; the exe file is smaller :)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-24 15:14:39
Ryzen 5900x -8 -p -V CDDA
My GCC 13.1.0 x64-v3 ~110x, no-asm ~130x, Clang 16.0.5 ~108x
John33 x64 ~97x, AVX2 ~108x
Title: Re: FLAC v1.4.x Performance Tests
Post by: capma on 2023-06-24 16:57:43
i5-1135G7, x64-AVX2 is about 24% faster than the x64 build (both from RareWares).
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-24 17:10:54
Here are some multithreaded x64 benchmarks using foobar2000 2.0 x64, tested on RAM drive, i3-12100, 16GB RAM, Win10.

Transcoding 29 CDDA flac files with unknown encoding settings (7h 6m 50s, 3.84GB) to new flac files using -8p
Code: [Select]
john33                  422.54x
john33 AVX2             462.62x
NetRanger GCC           437.19x
NetRanger clang         435.33x
Wombat GCC with asm     474.53x
Wombat GCC no asm       427.39x
Wombat clang            436.26x
Xiph                    451.40x

Transcoding 33 24/96 flac files with unknown encoding settings (2h 47m 28s, 2.85GB) to new flac files using -8p
Code: [Select]
john33                   86.57x
john33 AVX2              93.45x
NetRanger GCC            87.63x
NetRanger clang          87.32x
Wombat GCC with asm      92.07x
Wombat GCC no asm        72.92x
Wombat clang             88.56x
Xiph                     88.76x

...and -8e
Code: [Select]
john33                   99.20x
john33 AVX2             105.12x
NetRanger GCC            99.68x
NetRanger clang          98.99x
Wombat GCC with asm     106.31x
Wombat GCC no asm        84.69x
Wombat clang            100.16x
Xiph                    102.15x
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-24 17:32:58
Thanks bennetng. Interestingly, the Clang compiles here still lose ground against GCC when used multithreaded.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-06-24 22:50:40
What does "multithreaded" mean here? Running one cmd start (or PowerShell Start-Job) for each of the 29 or 33 files and recording when it is done with them all?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-24 23:10:12
When I use foobar2000 multithreaded, the compiles can differ more in performance than, for example, a single instance in CUETools would suggest.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-25 04:37:59
What does "multithreaded" mean here? Running one cmd start (or PowerShell Start-Job) for each of the 29 or 33 files and recording when it is done with them all?
As I mentioned, I used foobar2000 2.0 x64: the benchmark results are from its console output, which means the decoding was also performed by foobar2000, and only the encoding was performed by flac.exe.
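Porcus's question describes a do-it-yourself alternative to foobar2000's converter: start one encoder process per file, a few at a time, and time until the last one finishes. A minimal Python sketch under that assumption; `encode_all` and `speed_vs_realtime` are hypothetical helpers, and the stand-in command replaces a real flac call so the sketch runs anywhere. Unlike foobar2000, this would leave decoding of FLAC input to flac.exe itself.

```python
import concurrent.futures
import os
import subprocess
import sys
import time


def encode_all(files, make_cmd, workers=None):
    """Run one encoder process per file, up to `workers` at a time.

    Returns the total wall-clock time (start until the last file is done)
    and the exit codes in input order.
    """
    workers = workers or os.cpu_count() or 1

    def run(path):
        return subprocess.run(make_cmd(path), stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode

    start = time.perf_counter()
    # Threads are enough here: each one just waits on its own child process.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(run, files))
    return time.perf_counter() - start, codes


def speed_vs_realtime(audio_seconds, wall_seconds):
    """The 'NNN.NNx' style figure: seconds of audio per second of wall time."""
    return audio_seconds / wall_seconds


# Stand-in command; for real use something like
# lambda f: ["flac", "-8p", "--silent", "-f", f] on a list of .wav files.
elapsed, codes = encode_all(["a.wav", "b.wav", "c.wav"],
                            lambda f: [sys.executable, "-c", "pass"])
print(f"{len(codes)} jobs in {elapsed:.2f} s, exit codes {codes}")
```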
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-06-25 08:06:45
OK, again I cannot read :-(
Thx.

Edit: just for reference, how much extra time would it take on your setup to transfer tags, pictures, ReplayGain? Just to set the perspective for real-life operations.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-25 10:46:34
I usually keep a minimal amount of non-audio data in FLAC files, so my test results may not be very useful.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-27 11:53:24
bennetng can you test this build with your intel CPU if it changed anything on speed for the 24/96 files?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-06-27 15:11:40
A preliminary run on CDDA at setting -8pr7 (overnight, chewing on my signature for hours straight and repeating, hoping to approach some thermal steady-state) indicates that 1.4.3 is like 13 to 14 percent faster than 1.4.2. Official x64 .exe.

And file sizes ... 1.4.3 saves some twenty bytes per CD :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-27 15:37:38
-r7 seems not a bad idea, while -r8 hardly does anything anymore. I have to try some.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-27 17:16:21
bennetng can you test this build with your intel CPU if it changed anything on speed for the 24/96 files?
Not testing everyone's compiles this time, but I included single-thread (the "Do not convert in multiple threads" checkbox) and multithread results, using foobar2000 2.0 x64. Not the same set of 24/96 files as in my previous post.
Code: [Select]
-8p                      single         multi
john33 AVX2              24.50x        97.20x
Wombat clang             22.52x        94.15x
Wombat GCC with asm      24.81x        97.18x
Wombat fa32              24.77x        99.34x

-8e                      single         multi
john33 AVX2              28.32x       114.46x
Wombat clang             25.53x       108.15x
Wombat GCC with asm      28.57x       114.95x
Wombat fa32              28.74x       115.01x

-8                       single         multi
john33 AVX2             101.97x       359.46x
NetRanger clang         101.59x       363.32x
Wombat clang            101.36x       369.37x
Wombat GCC with asm     103.38x       358.85x
Wombat fa32             103.38x       361.88x
Clang seems to have some multithread advantages when there is no -e or -p.
So what is fa32? How does it affect your Ryzen?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-27 17:41:08
I used to use the -falign-functions=32 (fa32) compiler flag but left it out when cleaning up my config.
On my Ryzen, benchmarking is a PITA because results vary on every run and depend on the daily mood.
I read a while back that -falign-functions=32 sometimes does well on Intel CPUs by making better use of the cache.
Many thanks for testing!
If you're interested, I can repost in the main 1.4.3 thread and add the no-asm binary.
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2023-06-27 17:45:06
I couldn't resist doing some speed tests with my favourite "-7" setting.
I'm still using the following setup with my line-up of 40 CDDA-WAVs (3 hours of playing time):
 CPU: Intel Core i7-8700 CPU @ 3.20GHz
 RAM: 2 x 16 GB DDR4-2666 (1333 MHz) SK-Hynix
 Drive: Samsung SSD 860 EVO 500GB
 
Code: [Select]
My fastest v1.4.2 build (reference):
flac142-x64-gcc1220-Ofast+manyflags-noasm-wombat_2022-10-23.exe (665600 bytes)
-> Average time =  22.635 seconds (5 rounds), Encoding speed = 477.66x

Code: [Select]
xiph-143\flac.exe (302592 bytes)
-> Average time =  22.715 seconds (5 rounds), Encoding speed = 475.99x

flac143-x64-gcc1310-O3-noasm-wombat_2023-06-23.exe (705024 bytes)
-> Average time =  21.904 seconds (5 rounds), Encoding speed = 493.61x

flac143-x64-gcc1310-O3-wombat_2023-06-23.exe (814592 bytes) <== FASTEST BUILD
-> Average time =  21.334 seconds (5 rounds), Encoding speed = 506.80x

flac143-avx2-john33.exe (1310602 bytes)
-> Average time =  22.167 seconds (5 rounds), Encoding speed = 487.76x

flac143-fa32-wombat.exe (816640 bytes)
-> Average time =  22.355 seconds (5 rounds), Encoding speed = 483.65x
So the build from "down under" with asm option is:
a) the fastest encoder in my setup
b) 6.5% faster than the official xiph build
c) 6.1% faster than my fastest v1.4.2 build
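The percentages follow directly from the encoding-speed figures. A quick Python check (the variable names are just labels for the numbers in the results) also confirms that speed times average time gives back the roughly three hours of audio in the test set.

```python
# Encoding speeds (x realtime) and average time from the results above.
xiph_143 = 475.99
fastest_142 = 477.66
wombat_143_asm = 506.80
wombat_143_asm_time = 21.334  # seconds per run


def percent_faster(fast, slow):
    """How much faster `fast` is than `slow`, in percent of `slow`."""
    return (fast / slow - 1.0) * 100.0


print(f"vs xiph 1.4.3:    {percent_faster(wombat_143_asm, xiph_143):.1f}%")
print(f"vs fastest 1.4.2: {percent_faster(wombat_143_asm, fastest_142):.1f}%")
# speed x time = seconds of audio in the test set
print(f"audio encoded: {wombat_143_asm * wombat_143_asm_time / 3600:.2f} h")
```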

However, it remains a mystery to me why wombat's fastest 1.4.2 was "noasm", while the fastest 1.4.3 was "asm"...

Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-06-27 17:53:29
The latest flac changes plus compiler versions do strange things. Single-thread performance on my 5900x within CUETools is still way faster with the no-asm version, ~130x vs 110x.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-06-27 19:28:07
However, it remains a mystery to me why wombat's fastest 1.4.2 was "noasm", while the fastest 1.4.3 was "asm"...
Because I greatly improved the asm (https://github.com/xiph/flac/pull/556), of course  :))
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-27 20:37:08
13 CDDA images, 3.93GB, two trials for each compile, on a RAM disk as usual. Decoding time in seconds, so lower is better. Measured in PowerShell with:
Code: [Select]
PS H:\> measure-command{h:\flac -ts *.flac}|select totalseconds
Code: [Select]
compile                  1st           2nd
john33 avx2          30.6907728    30.3865113
NetRanger clang       36.592295    36.5719044
Ozz                  38.4181653    38.4648698
Wombat clang         39.2339226    38.9330091
Wombat fa32           31.886242    31.6372516
Wombat GCC with asm   31.999076    31.6573886
xiph                 35.2711499    35.0125727
Title: Re: FLAC v1.4.x Performance Tests
Post by: sundance on 2023-06-28 07:57:05
Because I greatly improved the asm (https://github.com/xiph/flac/pull/556), of course  :))
Doesn't your improvement mainly affect hires files with compression levels -0 .. -4?
Anyway, on my setup it greatly improves CDDA files with -7! Well done! As soon as I find some time, I'm going to test it with -8.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-06-28 11:32:06
In 1.4.2 Wombat's asm build was also faster for my CPU.
https://hydrogenaud.io/index.php/topic,123025.msg1018104.html#msg1018104
The post right below it contains 24-bit tests too.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-06-30 14:14:43
Very slow figures recorded for the 23 June build posted by https://hydrogenaud.io/index.php/topic,124356.msg1029259.html#msg1029259

It took 2.5x as long running -8 as the slowest of the others posted in that thread, the Rarewares build without ASM optimizations (indicated as "x64" here (https://www.rarewares.org/lossless.php)), and 3x as long as the rest.
On -8pr7 it wasn't that dramatic: 1.6x the time of Rarewares w/o ASM, which was also here slowest among the others.
-5: somewhat in between.

@Ozz : any known explanation?

CPU this time: Intel Core i5-7500T. The other builds are much more even.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-04 11:53:01
I took some CDs and tried -8p vs -8pr7: -8pr7 averages ~2.5 kB smaller per album. Not worth the speed hit imho.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-07-04 19:23:21
I think -8p is already the most practical and simple "slow" setting for most CDDA contents. The Hi-res "playground" is much more fun with different combinations of -b and -A before looking into -e or the even slower -p and more symmetric settings like -l.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-07-05 17:54:18
To the "showcases" division, a file downloaded from <stupidly hi-rez site>
According to @bennetng , it is not only the resolution itself that is dumb, it is the combination of resolution and "crappy quality junior MIDI sequencing stuff" (https://hydrogenaud.io/index.php/topic,123889.msg1025140.html#msg1025140), so take it with a grain of salt.
Another unsuspecting victim using this song for audio format tests (see the second screenshot):
https://hydrogenaud.io/index.php/topic,124399.msg1029548.html#msg1029548
Title: Re: FLAC v1.4.x Performance Tests
Post by: Air KEN on 2023-07-06 13:28:25
@bennetng

What?
you are a malicious person.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-07-06 14:43:03
@bennetng

What?
you are a malicious person.
I meant the song you used (Wait for Spring) is not a real hi-res file, it is upsampled, and in a poor way. Read this:
https://hydrogenaud.io/index.php/topic,123025.msg1025264.html#msg1025264
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-06 21:15:04
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings from -8pr7 (heaviest) to -0 --no-md5-sum. I included official 1.4.2 for a baseline.

As readers of the thread probably know already, results do depend on CPU and should be taken with that grain of salt.

Figures quoted in the table are speed relative to realtime - higher is better this post! (Edit3, got that wrong first edit!)

'00 is -0 --no-md5-sum. Remarks per build in the last column.

Build            -8pr7   -8   -5  -2e   -0  '00  Remarks
1.4.2 official      50  180  351  401  524  626  1.4.2 is second-to-slowest except -8pr7 (near-tied with Wombat-Clang)
1.4.3 official      58  209  451  468  609  751  Always #3.
Wombat-Clang        50  215  447  426  568  700  -8 was the 'least consistent timing' in the pack
Wombat-GCC          59  224  494  540  667  826  Fastest on LPC predictors, second-fastest (and close!) on fixed
Rarewares           54  174  355  410  592  737  Over AVX2: penalty in seconds "not much bigger" for -8pr7 compared to -8
RW-AVX2             59  217  488  542  681  849  Fastest on fixed predictors, second-fastest (and close!) on LPC
Ozz                 34   69  193  369  500  619  Ooof. Not equally disastrous on fixed predictors.
Fastest builds were always one of Wombat-GCC and Rarewares with AVX2 optimizations. Both improved over official at every preset tested.
Differences between the two fastest: Using Wombat-GCC over Rarewares on -5 & -8 & -8pr7 would save you about a minute per hour - and using Rarewares over Wombat-GCC on the three other modes would save you about a minute per hour as well.
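One reading of "about a minute per hour" is processing time saved per hour of processing: a job that takes an hour at speed s takes s/S hours at the faster speed S. A small Python sketch using the two fastest builds from the table; the dictionary layout and function name are just for illustration.

```python
# Speeds (x realtime) of the two fastest builds, from the table above.
speeds = {
    #        (Wombat-GCC, RW-AVX2)
    "-8pr7": (59, 59),
    "-8":    (224, 217),
    "-5":    (494, 488),
    "-2e":   (540, 542),
    "-0":    (667, 681),
    "'00":   (826, 849),
}


def minutes_saved_per_hour(fast, slow):
    """A job taking 60 min at speed `slow` takes 60 * slow / fast min at `fast`."""
    return 60.0 * (1.0 - slow / fast)


for preset, (gcc, avx2) in speeds.items():
    fast, slow = max(gcc, avx2), min(gcc, avx2)
    winner = "Wombat-GCC" if gcc >= avx2 else "RW-AVX2"
    print(f"{preset:6} {winner:10} saves {minutes_saved_per_hour(fast, slow):.1f} min per hour")
```

Across these presets the saving ranges from zero (the -8pr7 tie) to just under two minutes per hour (-8).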

Computer: HP Prodesk with i5-7500T @ 2.70 GHz.
Corpus: the 38 CDs in my signature. One file per CD.
"Method": "One run" := encode the 38 CD .wav images AND two re-encodes from FLAC images. I first did one run to bring the CPU to a stable temperature, discarded that, then did three more runs and recorded the median of those three. For sanity-checking variability, I also computed speeds using the fastest of those three runs; had I used fastest-of-three, the numbers in the table would have been 0 to 2 higher, except Wombat-Clang at -8, which would have gone up from 215 to 222.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-07 01:19:55
Many thanks for the benches. Very interesting to see how CPUs like yours or sundance's behave!
Title: Re: FLAC v1.4.x Performance Tests
Post by: Case on 2023-07-07 09:20:35
By the way, @Case - you did provide some compile that in some situations performed well. Care to post an exe of 1.4.3 too?
I don't know why but I can't make GCC compiles that run as fast as Wombat's. But I made a Clang compile that seems to be faster for me than other compiles. That build is attached.

Very slow figures recorded for the 23 June build posted by https://hydrogenaud.io/index.php/topic,124356.msg1029259.html#msg1029259
I don't know what has been done to the code but MSVC compiles are super slow nowadays. I used to use MSVC to build 32-bit FLAC to be bundled with the Free Encoder Pack (as the official compile is out of the question thanks to its libflac.dll dependency) and its speed was on par with other builds. Not so anymore.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-07-07 09:31:27
I don't know what has been done to the code but MSVC compiles are super slow nowadays.
Is MSVC only super slow compared to 1.4.3 from other compilers, or also super slow compared to MSVC's 1.3.4?

I think the main problem is in auto-vectorization. GCC and Clang can auto-vectorize most code reasonably well, but it seems MSVC can't. I could of course manually vectorize all code, but I'd rather not: there have been some nasty, hard to find bugs (potentially with security implications) in manually vectorized code, so I try to keep use of it low.

From what I've heard, MSVC is incorporating LLVM/Clang anyway: https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2023-07-07 09:41:40
VS 2019 and later already support LLVM/Clang both with the (now out of date) version built in as an option, and with the latest version installed by creating a 'Directory.build.props' file and placing it in the same dir as the .sln.

The content of the Directory.build.props is:
Code: [Select]
<Project>
  <PropertyGroup>
    <LLVMInstallDir>C:\Program Files\LLVM\</LLVMInstallDir>
    <LLVMToolsVersion>16</LLVMToolsVersion>
  </PropertyGroup>
</Project>
The only issue I have with compiling via VS is that it declares the VC version as the compiler rather than the LLVM/Clang version. I use a version of the x265 video encoder that I compiled that way.

EDIT: As an aside, I think the MSVC 32 bit compiles suffered considerably speed-wise with the removal of the nasm code.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-07-07 11:57:52
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings from -8pr7 (heaviest) to -0 --no-md5-sum. I included official 1.4.2 for a baseline.

[...]
Not too different from mine, including the slowness of Ozz's build. I have the impression that GCC works better with heavier settings (-p and -e) and multithreading, and Clang is better on single thread and lighter settings, but the differences are rather small, unlike last year, when Clang was much slower than GCC (before ktf removed some assembly code that hindered Clang's performance?)

Now I have Linux installed (the real thing, not in a VM) and I also got an NVMe SSD, so I have three types of storage device (HDD, SATA SSD, NVMe SSD) on a single machine, in a dual-boot setup alongside Windows. I am thinking about doing some Linux benchmarks after figuring out how to build from source.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-14 15:11:09
Time penalty of -r, anyone? Someone with a computer that isn't cooling-constrained, could you test that on your fave presets?

I did some rough tests at https://hydrogenaud.io/index.php/topic,124437.msg1030120.html#msg1030120 and then re-did them, and at least up to r6, the impact is so small that the variation in between runs pretty much kills the comparison.
But, the size impact wasn't big on that test either.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2023-07-14 23:19:14
15:42 of CDDA on i7-4790K:

-8pr1 - 12s
-8pr5 - 12.5s
-8pr6 - 13,6s
-8pr7 - 14,9s
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-22 20:11:29
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings
[...]
Computer: HP Prodesk with i5-7500T @ 2.70 GHz.

Same computer, same corpus, different builds and this time, median of three. The triumphant match of the @Wombat builds, but ... but which one(s)?
It seems that the GCC builds benefit from "harder work": -p, -r7, more apodization functions and -e (which is no longer of much use for CDDA). But in this test, -8r7 was not hard enough in itself, and -8 -A subdivide_tukey(5) was not enough either; the combination -8r7 -A subdivide_tukey(5) would make GCC king of the hill. GCC builds have their day on -5r7 too, but I wouldn't trust the numbers that much.

The pattern is clear though. And 1.4.3 did surely speed up over 1.4.2.
This post I measured time (not speed), so a "+" means slower.

"-e" is no longer of much use for CDDA, but let's get it out of the way first; its numbers are even more Clang-unfriendly than -p:
-8er7
WombatSomeflagsGCCdisableasm  was fastest at 38 minutes
+0.4%   Wombat-GCC
+1.2%   WombatSomeflagsGCC
+4.3%   Xiph
+12%   RW
+26%   1.4.2
+27%   Case and then Wombat's Clangs at 28 and 29

-p settings with several windowing functions. Hard tasks. "Good for something, unlike -e" ...

-8pr7 -A subdivide_tukey(5)
Wombat-GCC was fastest at 94 minutes
+0.4%   WombatSomeflagsGCCdisableasm
+2.5%   Xiph
+3.2%   WombatSomeflagsGCC
+7.6%   RW
+19%   1.4.2
>+20%   Case and then Wombat's Clangs.

-8pr5 -A subdivide_tukey(5), stepping the "-r" down to 5:
WombatSomeflagsGCC was fastest at 70 minutes
+0.5%   Wombat-GCC
+4.5%   WombatSomeflagsGCCdisableasm
+6.0%   Case
+8to9%   Xiph and Wombat's Clangs
+14%   1.4.2
+15%   RW

Now remove the "p". -r7 first:
-8r7 -A subdivide_tukey(5)
WombatSomeflagsGCCdisableasm was fastest at 19 minutes
+1%ish   WombatSomeflagsGCC and Wombat-GCC
+3.5%   WombatSomeflagsClang
+5.0%   Xiph
+8.4%   Wombat-Clang
+13%   Case
+21%   1.4.2
+22%   RW

Turns out that the GCC builds are dethroned from here on:

-8r5 -A subdivide_tukey(5), stepping the "-r" down to 5:
Wombat-Clang was fastest at 15 minutes. Clang making the top here is a bit WTF.
+6.0%   WombatSomeflagsClang.  This was not the immediate next, so it was not a fluke?
+7to9%   Wombat, the GCCs
+11%   Case
+12%   Xiph
+36%   RW

Down to "normal" windowing functions. -r up to 7 again:
-8r7
Wombat-Clangs and Case are fastest, 9min41 sec (within .6 seconds of each other)
Then   Wombat's GCCs, +1.8 to 2.7 percent (10 to 15 seconds).
+4.8%   Xiph
+25%   1.4.2
+26%   RW

-8 (plain)
Wombat-Clang at 8min38, 5 seconds before Case
+6to8%   Wombat, the rest
+12%   Xiph (that's +61 seconds)
+30%   1.4.2
+36%   RW

-8r5, stepping the "-r" down to 5:
WombatSomeflagsClang this time. 8min12.
Then   Other Wombats, with GCC at the end
+12%   Case (that's quite a bit worse than -8 and -7r5, maybe just busy CPU?)
+15%   Xiph (that's +71 seconds)
+30ies   1.4.2 and RW, lagging worse in %. 
Rarewares spends > 1 minute more doing -8r5 than the other 1.4.3s spend on -8r7.

-7r5 now.  I didn't do standard -7
Pretty much the same pattern as -8r5: differences worse in % but slightly better in seconds, as should be expected since WombatSomeflagsClang is down to 5min26.
Exception: Case is now back at +3.6%, +20seconds
Rarewares is only 15 seconds off being beaten by some -8r5.

-7r3
Pattern continues? Wombat's Clangs win, ten seconds faster than -7r5.
+3.6%   Case, that is 17 seconds
Then:    Wombat-Clang and Wombat GCCs
+16%   Xiph
+40%   1.4.2
+47%   RW.


-5r7
Now, suddenly, Wombat's GCCs win again. 4min 12 to 15 seconds. No idea why; if it were -r7 alone, I would have expected it to happen on -8r7 too.
Then:    Wombat's Clangs and Xiph
+5.2%   Case.  That is +13 seconds.
+32%   RW.  Beaten by the fastest -7r5.
+34%   1.4.2

-5r5
All the Wombats then Case in at 4min 2to7 seconds. 
+5.4%   Xiph.  That is +13 seconds.
+34%   1.4.2 then RW, only narrowly beating the fastest -7r7

-5r3: Computer restarted for update during the Rarewares build, but pattern looks the same.


Might do some fixed-predictor run, but ... well?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-26 15:26:49
Always interesting what ideas you create for the way of testing :)
If you're bored one day, I can do an AVX-512 (x86-64-v4) compile, since one of your CPUs, even if a smaller one, has this extension. Somehow I missed any numbers about that.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-26 17:23:28
Please do.
Work computer rebooted again, so ... might run something over the upcoming week-end.
But it seems that "-e" is not very Clang-friendly even on lower settings.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-26 18:32:09
Build from today's git 5500690 of ktf's multi-threading version.
x86-64-v3 should have AVX-2 and x86-64-v4 should have AVX-512 support. I also used -falign-functions=32 as GCC compiler flag.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-26 18:49:02
Oh, so this does -j multithreading? (But you missed the v5 just posted?  ;) )
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-26 18:52:54
v3 and v4 are the names used in the official compiler flags (x86-64-v3, x86-64-v4). Both builds are today's v5.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-26 20:55:21
On the computer I'm at right now, your v4 exe does absolutely nothing. No text no processing no nothing. v3 works.

Meanwhile, it seems that fixed predictors aren't Clang's best friends either ... on this computer, at least.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-26 21:03:05
Yes, for v4 the AVX-512 support of the CPU is mandatory.
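On Linux, one way to check beforehand whether a CPU can run an x86-64-v3 or -v4 build is to compare its feature flags against the level definitions. A minimal Python sketch; the mapping to /proc/cpuinfo flag names is an assumption worth double-checking (e.g. `xsave` stands in for OSXSAVE, `abm` for LZCNT, `pni` for SSE3), and it only works where /proc/cpuinfo exists.

```python
# Extra CPU features required by each x86-64 microarchitecture level, using
# (assumed) flag names as Linux prints them in /proc/cpuinfo.
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def flags_from_cpuinfo(text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags):
    """Highest x86-64-vN level whose features (and all lower ones) are present."""
    level = 1  # baseline x86-64 assumed
    for n in (2, 3, 4):
        if not LEVEL_FLAGS[n] <= flags:
            break
        level = n
    return level


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("x86-64-v%d" % highest_level(flags_from_cpuinfo(f.read())))
    except OSError:
        print("no /proc/cpuinfo here (not Linux?)")
```

A CPU reporting level 3 here should run a v3 build but not a v4 one, which matches the "does absolutely nothing" behaviour described above.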
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-26 23:31:45
Ah! Back at the i5-11gen it works.
First impression is that the v4 doesn't help, but I haven't tried any really heavy job yet.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-27 11:40:56
Meanwhile, it seems that fixed predictors aren't Clang's best friends either ... on this computer, at least.

Confirmed.

Neither is "-e" (this is on CDDA where it should hardly be used though)
- and the following combination is just slayyyyying: fixed predictor, -e and high -r:

-2er7, here is where the fastest Clang takes fifty percent more time than the winner
5:36 for WombatSomeflagsGCCdisableasm. I have no idea why this one is ten percent faster than anything else - I re-ran and got pretty much the same figures again.
6:0x Wombat other GCC and Xiph.
7:02 Rarewares
8:22 to 8:25: The Clangs, including Case.
One more minute for 1.4.2.

-2er3 (equals -2e) to see if it is the "r7" that does it. It sure has an impact, but not so much on the order:
3:42ish for the Wombat GCCs
4:10 Xiph
4:34 The Clangs, including Case.
4:45 Rarewares and 1.4.2

-2 plain to see if it is the "-e".
3:19ish for the Wombat GCCs
3:32 Xiph
3:41 Rarewares
4:02 the Clangs, including Case, and also: 1.4.2


For the impact of -r7 without any "-e", here are -0 modifications. Yes GCCs rule:

-0r7:
3:18ish for the Wombat GCCs with Xiph trailing four or five seconds-ish
3:32 Rarewares
3:48ish the Wombat Clangs with Case trailing four or five seconds-ish
4:02 for 1.4.2

-0r1 to see whether it is the "r":
3:05 for the Wombat GCCs
3:14 to 3:16 Xiph and Rarewares
3:26 for the Wombat Clangs with Case trailing a couple of seconds
3:38 for 1.4.2

Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-28 01:55:21
-2er7, here is where the fastest Clang takes fifty percent more time than the winner
5:36 for WombatSomeflagsGCCdisableasm. I have no idea why this one is ten percent faster than anything else - I re-ran it and got pretty much the same figures again.
Builds compiled with --disable-asm-optimizations still seem to be much faster on many CPUs, but only for 16-bit material.
ktf first explained it here https://hydrogenaud.io/index.php/topic,123025.msg1017351.html#msg1017351
That's why I still add it to the package and suggest it for use inside CDDA apps like CUETools or EAC.
I guess the next package also needs a third version with the -falign-functions=32 compiler option, because newer CPUs like it but it slows older ones down.
No idea when the point is reached it becomes silly  :D
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-28 16:44:05
Blunder in Reply#338 and Reply#347:
Wrong Rarewares build picked for those tests. Should have run them with the AVX2 build, which is faster.

Anyway, the Clangs wouldn't look better if up against something even faster.
And it is kinda weird (all the more so since this is only one CPU): GCC builds win on fixed-predictor lightweight jobs, and GCC builds win on heavy jobs - but at something nearly as heavy as -8 (which equals -8r6), namely -8r5, Clang overtakes them; though not at -5r5.

No idea when the point is reached it becomes silly  :D
Long ago!

But if everyone here runs their tests with -8 and -8p - despite those being ones where a few percent savings may actually matter - it isn't clear what compiler an official build would benefit from using.
BTW, should it matter at all what CPU is used for compiling?

(Compatibility issues here are only AVX for one Rarewares build and which ones of yours? Plus AVX512 for your v4?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-28 17:08:51
BTW, should it matter at all what CPU is used for compiling?

(Compatibility issues here are only AVX for one Rarewares build and which ones of yours? Plus AVX512 for your v4?)
The CPU shouldn't play a role in compiling unless you tell the compiler to use "native" optimization, in which case it tries to detect the CPU in use.
The last binaries are all AVX2 if not stated otherwise.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-29 13:41:58
Some multithreading results posted in that thread: https://hydrogenaud.io/index.php/topic,124437.msg1030783.html#msg1030783

In the last line of the reply, the "three -j1 runs discarded" (from that table) were done with Wombat's "GCC" build, to check whether times were about the same for the "v3" build posted above. Not much difference to write home about. The "v3" varied very little from ktf's build, and the GCC build might also have had some inconsistent timing, having been run immediately after a different setting (say, its -5r7 ran after another build doing something more than twice as intensive).

Anyway, "v3" didn't make for miracles on that computer.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-29 14:47:37
Many thanks for more numbers! Although the last v3+v4 was more meant to check whether AVX-512 helps.
If this was your i5-7500T it also doesn't like the -falign-functions=32 compiler flag.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-07-30 23:34:26
It was the i5-7500T - which also, I misread it, is 4 cores 4 threads.

But here are some figures from my i5-1135G7. Lower is better (and more negative is better). Only one run each, so take with a grain of salt.
I have put two different things into the table here: most percentages are the "overhead penalty" of actual time vs the idealized "j1 time / # of threads". Edit: The "j9" is corrected to be 8 threads, which is what the CPU has. The -j9 setting was to verify that nothing too stupid happens.
But the "time w4 vs w3" line has nothing to do with overhead; it measures how much time changed moving from your v3 to your v4. You see benefits at -8, but the other way around for -5, -3 and -2er7. At -2r0 things are going so fast anyway that I won't trust the numbers without multiple runs, but at the very least the sign is negative more often than positive.

The two rightmost columns are the same encoding parameter plus the "-M" to check if the overhead is that nasty as previous figures suggested (it is).
I also ran a range of -0r0 -b <something>, and there is no sign that the "v4" is worth it.

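-8:           | j1 time/diff | j2 ovrhd/diff | j3 ovrhd/diff | j4 ovrhd/diff | j5 ovrhd/diff | j8 ovrhd/diff | j9 ovrhd/diff | -M j1 time/diff | -M j2 ovrhd/diff
v5            | 121 | 8%  | 23%  | 40%  | 55%  | 124% | 138% | 83  | 97%
Wombat3       | 113 | 16% | 29%  | 43%  | 70%  | 138% | 158% | 81  | 96%
Wombat4       | 102 | 23% | 34%  | 46%  | 75%  | 153% | 155% | 75  | 111%
time w4 vs w3 | −9% | −4% | −5%  | −7%  | −6%  | −4%  | −10% | −8% | 0%
-5:
v5            | 49  | 24% | 56%  | 80%  | 105% | 239% | 246% | 40  | 88%
Wombat3       | 46  | 29% | 56%  | 77%  | 123% | 248% | 243% | 39  | 91%
Wombat4       | 47  | 46% | 55%  | 111% | 127% | 257% | 270% | 41  | 84%
time w4 vs w3 | 1%  | 14% | 0%   | 20%  | 3%   | 3%   | 9%   | 5%  | 1%
-3:
v5            | 37  | 26% | 68%  | 100% | 148% | 312% | 307% | 38  | 92%
Wombat3       | 36  | 25% | 62%  | 109% | 146% | 320% | 317% | 38  | 83%
Wombat4       | 36  | 32% | 66%  | 120% | 162% | 335% | 345% | 38  | 99%
time w4 vs w3 | −1% | 4%  | 1%   | 4%   | 5%   | 2%   | 5%   | 0%  | 8%
-2er7:
v5            | 70  | 16% | 38%  | 52%  | 75%  | 180% | 212% | 50  | 102%
Wombat3       | 65  | 20% | 39%  | 64%  | 86%  | 188% | 227% | 48  | 104%
Wombat4       | 66  | 17% | 37%  | 64%  | 94%  | 203% | 229% | 48  | 95%
time w4 vs w3 | 2%  | −1% | 1%   | 1%   | 6%   | 7%   | 2%   | 1%  | −4%
-2r0:
v5            | 44  | 24% | 57%  | 85%  | 153% | 377% | 320% | 38  | 91%
Wombat3       | 39  | 46% | 81%  | 123% | 180% | 394% | 362% | 38  | 101%
Wombat4       | 38  | 55% | 63%  | 111% | 205% | 379% | 360% | 36  | 120%
time w4 vs w3 | −1% | 5%  | −12% | −7%  | 7%   | −4%  | −2%  | −5% | 4%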
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-07-31 00:33:14
I guess original v5 is without AVX-2. So for high compression the speedup from non to AVX-2 to AVX-512 scales well at least.
For -5 the numbers are a bit surprising.
Again thanks for the numbers!
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-08-31 13:41:49
In order not to hijack @ktf 's comparison update thread (https://hydrogenaud.io/index.php/topic,124676.0.html) with too many FLAC specifics:
The fastest flac.exe encoding is now faster than decoding. I replicated this on a small sample of 192/24 files on an i5-7500T, repeated a few times to give you "ballpark" timings. Everything done on an internal SSD to the same SSD.
FLAC files: created using -0, or for the even faster times: -0 --no-md5

16 seconds / 14 seconds: decoding, respectively, -0 files / -0 --no-md5 files
13 seconds / 8 seconds: encoding -0 / -0 --no-md5
10 seconds / 6 seconds: flac -t on -0 / --no-md5

I did much of the same thing on a larger corpus with 96 kHz files too, similar results.

Now if I add -b4096 to everything, numbers improve slightly. Same order as above:
13/11 decoding
11/7 encoding
9/5.5 test

The difference between -0b4096 and -1b4096 is only a very few tenths of a second, so the latter is faster than plain -0 as well. Actually, -2b4096 (optimizing joint stereo / dual mono over -0's dual mono only) ran at -0 speed, give or take a few tenths of a second. It looks like -b2048 could be just as fast as -b4096, which would be in line with what I have seen before - but I am not going to claim that kind of accuracy.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Brand on 2023-09-22 13:35:04
Didn't read the whole thread, so apologies if this was mentioned...
I'm seeing significantly higher bitrates with FLAC 1.4.2 and 1.4.3 for "simple" signals, like sine waves, compared to FLAC 1.3.4, both at -8 compression.
For example, a 1kHz 48k-16bit sine tone is at 164 kbps with 1.3.4 and at 231 kbps with 1.4.3.

Pink noise, on the other hand, has almost the same bitrate with both versions.

Is this known/expected?
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-09-22 15:34:44
Is this known/expected?
No, not really. I'm able to reproduce this.

FLAC 1.4.0 had some major changes that benefit most sources, but not all. Apparently the sine wave you mention is one of the cases that do not benefit. However, a 1kHz sine sampled at 44.1kHz or a 1.01kHz sine sampled at 48kHz show a much smaller loss.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-23 11:23:45
Just tried a 10 seconds full scale dithered 1kHz sine at 16/48.

1.4.3
253kbps -8
146kbps -8p
114kbps -8e

1.3.4
186kbps -8
157kbps -8e
143kbps -8p

[edit]Attached a 4567Hz sine, 1.4.3's -8e performs much better than 1.3.4.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-23 13:03:15
One thing is that "-8" differs, because it is now synonymous to something else with different apodization functions.
But -5 also. bennetng's file recompressed:
322377 bytes with 1.3.4 win64 at -5 (379745 (bigger!) by adding --lax -l32)
381923 bytes with 1.4.2 win64 at -5 (382278 (bigger!) by adding --lax -l32)

Adding -p, similar happens:
222297 bytes with 1.3.4 win64 at -5p
268901 bytes with 1.4.2 win64 at -5p
 
-e instead of -p reverses the order, now 1.4 makes smaller:
276571 bytes with 1.3.4 win64 at -5e
269589 bytes with 1.4.2 win64 at -5e

-pe then, 1.3.4 is back winning, but not at -l32:
184466 bytes with 1.3.4 win64 at -5pe (down to 183542 by adding --lax -l32)
188771 bytes with 1.4.2 win64 at -5pe (down to 180144 by adding --lax -l32)

Edit: Could get it down as far as this:
161713 bytes with 1.4.2 win64 at -pe --lax -l32 -A<tonsofthem>. -r4 or -r15 didn't matter, -r3 inflated it one byte
139241 by adding -b 32768 -r15
136487 by adding -b 65535, confirming that once the predictor is good enough, ... or am I interpreting it wrong?


Since -e makes the difference, is there something about the model guesstimation algorithm?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-23 14:43:59
https://hydrogenaud.io/index.php/topic,123025.msg1025264.html#msg1025264
https://hydrogenaud.io/index.php/topic,123025.msg1027285.html#msg1027285
A good thing about flac 1.4 is that the effect of -8e is quite predictable when the input files have a lot of unused spectral spaces, so the first thing to try with simple sine waves is to use -e.

Anyway, the tunings are still based on a very large corpus rather than a very specific set of test samples, like waveforms from the South Pole or 384-channel brainwaves.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-09-23 15:37:42
Yes, I think tuning for specific samples will result in a bad overall tuning.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-23 16:05:16
Yeah. The observation that -e improves on certain material does suggest that the model selection algorithm could be improved, but until one can actually capture both these and those signals, it is not a good idea to chase the oddballs.


429428 bytes for the above file with TAK -p4m
420210 bytes for wavpack -hhx6


More sines tested: https://hydrogenaud.io/index.php/topic,122444.0.html
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-23 22:33:41
A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)

-e -l <N> would never pick order = N. At most order = N-1. Or do I misinterpret the flac -a output? (I read the "order=" part, which does not start at 0. order=7 means coefficients enumerated 0 through 6, and was the highest I got out of -l7.)
Anyway, a bunch of .ana files attached.

Also, with -b48000 - so the blocks "nearly repeat", but not exactly - the predictor coefficients do vary quite a lot between the frames. However as a sine should be perfectly replicable with order = 2 - presuming sufficient precision, which I am too lazy to check out - there would be a whole lot of different predictor vectors that would make for equally good prediction.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-24 06:49:01
I read the "order=" part, which does not start at 0. order=7 means coefficients enumerated 0 through 6, and was the highest I got out of -l7.
out of -l8. Argh.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-09-24 11:40:34
A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)
This is only on the sine waves, right?

Quote
Also, with -b48000 - so the blocks "nearly repeat", but not exactly - the predictor coefficients do vary quite a lot between the frames. However as a sine should be perfectly replicable with order = 2 - presuming sufficient precision, which I am too lazy to check out - there would be a whole lot of different predictor vectors that would make for equally good prediction.
Analyzing a sine wave in the LPC stage gets one step near a singularity (or somesuch, I'm not too familiar with the terminology) and round-off errors get a tremendous influence. That is why optimizing for sine waves doesn't work, why the differences between 1.3.4 and 1.4.3 are so large and why I don't think it is a good idea to spend too much time on this.
Title: Re: FLAC v1.4.x Performance Tests
Post by: rutra80 on 2023-09-24 17:41:33
Wouldn't it be a good idea to have it optimized for sines of standard scales and round ones? 440Hz, 1000Hz etc.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-24 20:18:30
A weirdness or two, though. (Got 1.4.3 on this computer and replicated with that.)
This is only on the sine waves, right?
Yes. And furthermore, I forgot that you can have non-fixed frames of order 4 or less. Trying those ... -l3, it alternates between order 2 and 3, and -l2 uses order 2.
Anyway since there are several equally good predictor vectors on a sine, I couldn't even call it out as something you can improve upon, but it looked strange: if -el 6 finds the best predictor to be of order 5, you would expect the very same algorithm to select the very same predictor at -el 5 - unless the algorithm is written so that it starts out one step below and then goes up down one and up one, or something like that.



Analyzing a sine wave in the LPC stage gets one step near a singularity (or somesuch, I'm not too familiar with the terminology) and round-off errors get a tremendous influence. That is why optimizing for sine waves doesn't work, why the differences between 1.3.4 and 1.4.3 are so large and why I don't think it is a good idea to spend too much time on this.
Yeah, well, sines are special in this sense. Although they are quite easy to model when you know it is a sine, they are hardly the Billboard chartbusters - and those who know they are about to compress sines, can invoke -e ...

But as part of a bigger picture, there might be some information value from the test even if you are not going to chase sines per se. -e still makes a difference with high resolution signals, and some synth too - and if I understand @bennetng right, it appears to be down to signals that have very little content at the top.
So if that is the hypothesis: "The model selection algorithm doesn't work that well for signals with very little content in the top octave-or-so" (or was "octave-or-so" too far off?)
- then this experiment gives support to it.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-24 20:46:52
Wouldn't it be a good idea to have it optimized for sines of standard scales and round ones? 440Hz, 1000Hz etc.

The thing about a signal that is a single sine is that you can predict it with two coefficients, one of which is -1.
You have the predictor x(N+1) = 2 cos(q) * x(N) - x(N-1), and with q = 2*pi*f/F, where f is the sine's frequency and F (> 2f) is the sampling frequency, you are all good - provided that the format can offer high enough precision for that 2 cos q.
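The recurrence can be checked numerically. The sketch below is an illustration only. It assumes a simplified fixed-point model: the coefficient 2 cos q is rounded to an integer at a given right-shift, the -1 tap is scaled to the same shift, and the sum is floor-shifted. This is not libFLAC's actual quantization code. The sketch shows that the float recurrence is exact up to rounding, while the integer residual depends on how finely 2 cos q can be represented.

```python
import math


def sine(f: float, fs: float, n: int, amp: float = 32000.0, phase: float = 0.3) -> list[float]:
    """Unquantized sine: amp * sin(2*pi*f/fs * i + phase), i = 0..n-1."""
    w = 2 * math.pi * f / fs
    return [amp * math.sin(w * i + phase) for i in range(n)]


def recurrence_max_error(f: float, fs: float, n: int = 1000) -> float:
    """Largest deviation from x(N+1) = 2 cos(q) x(N) - x(N-1) on a float sine."""
    x = sine(f, fs, n)
    c = 2 * math.cos(2 * math.pi * f / fs)
    return max(abs(x[i] - (c * x[i - 1] - x[i - 2])) for i in range(2, n))


def int_residual_mean_abs(f: float, fs: float, shift: int, n: int = 4800) -> float:
    """Mean |residual| of the order-2 predictor on integer samples, with 2 cos(q)
    rounded to an integer scaled by 2**shift (simplified fixed-point model)."""
    x = [round(v) for v in sine(f, fs, n)]      # 16-bit-style integer samples
    scale = 1 << shift
    coef = round(2 * math.cos(2 * math.pi * f / fs) * scale)
    total = 0
    for i in range(2, n):
        # The -1 tap becomes -scale; Python's >> floors, like an arithmetic shift.
        pred = (coef * x[i - 1] - scale * x[i - 2]) >> shift
        total += abs(x[i] - pred)
    return total / (n - 2)


print("float recurrence max error:", recurrence_max_error(1000, 48000))
for s in (4, 8, 12, 15):
    print(f"shift {s:2d}: mean |residual| = {int_residual_mean_abs(1000, 48000, s):.2f}")
```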
 
So if you want a flac encoder that beats the official reference on sines and does nothing else worse, then - with reservation for that precision - you can do that by including a special algorithm for the order two predictor, and spending extra effort on every signal doing that number-crunching only to ditch it because the signal isn't close to a sine.
I have absolutely no idea how much it would slow down the encoder.

(Actually, when I first started playing around with codec performance, I was indeed surprised at how badly codecs compress sines. I had a hunch that anyone starting to fiddle around with lossless compression would try to get those signals working first. But there are reasons why the engineering approaches have rather targeted real-world applications, where data perfectly suiting the models are just not going to show up.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-25 08:00:27
Instead of changing the source code someone can simply write a "tips and tricks" guide or something similar. For example, video codecs also have guides for optimizing some specific contents (e.g. anime).

I think one can see that tones can be generated at arbitrary frequencies, can be multi-tone, can be sweeps etc., which adds a lot of complications. Also, if one has to use --lax to get close to subset -e performance, that is not very practical either.

For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.

So, something like -8b8192 -A "subdivide_tukey(3);blackman" may often help with CDDA > 24/192 clean upsampling, without a lot of trial and error, subdividing too many tukeys, doing some difficult gauss work, or increasing other symmetric parameters which may further slow down decoding.
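As an illustration only, this rule of thumb could be written down as a tiny helper that builds a flac argument list from the sample rate. The helper name is hypothetical, and the thresholds come from the post above. Reusing the same -A windows for >= 352.8 kHz is an assumption, since the post only says "some narrow windows".

```python
def suggest_flac_args(sample_rate: int) -> list[str]:
    """Hypothetical helper: the rule of thumb for cleanly upsampled hi-res
    material, expressed as flac command-line arguments."""
    args = ["-8"]
    if sample_rate >= 352800:
        block = 16384          # ">= 352.8k" suggestion
    elif sample_rate >= 176400:
        block = 8192           # "176.4/192k" suggestion
    else:
        return args            # rule of thumb does not cover lower rates: keep -8 as is
    # Window choice from the post; using it for >= 352.8 kHz is an assumption.
    return args + ["-b", str(block), "-A", "subdivide_tukey(3);blackman"]


print(["flac", *suggest_flac_args(192000), "in.wav"])
```

For 48 kHz and below, the FLAC subset limits the block size to 4608, which is one more reason the sketch leaves lower rates at the -8 default.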
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-25 09:11:35
For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.
Yeah, "often" - I didn't find it too predictable though ...

If one wants to enhance the encoder, one could make it accept -b"2048,4096,8192,16384" to do four encodes and pick the smallest file. (That's constant block size. Variable block size is a long way to go to implement, and besides, you might want to stick to constant for compatibility.) I have no idea whether such a construct could be effectively multithreaded.
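An external driver can already approximate the proposed multi-blocksize option without touching the encoder: run one flac process per block size and keep the smallest result. Because the encodes are independent, they parallelize trivially. The sketch assumes a flac binary on the PATH and uses only standard options (-8, -b, -f, -o, --silent); the helper names are hypothetical. The FLAC subset caps the block size at 4608 for sample rates of 48 kHz and below, so the larger sizes are meant for high-resolution input (or would need --lax).

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def flac_size(wav: str, blocksize: int, outdir: str) -> int:
    """Encode wav at -8 with a fixed block size and return the output size in bytes."""
    out = os.path.join(outdir, f"b{blocksize}.flac")
    subprocess.run(
        ["flac", "--silent", "-f", "-8", "-b", str(blocksize), "-o", out, wav],
        check=True,
    )
    return os.path.getsize(out)


def pick_smallest(blocksizes: list[int], encode_size: Callable[[int], int],
                  workers: int = 4) -> tuple[int, dict[int, int]]:
    """Run one encode per block size in parallel; return the winner and all sizes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = dict(zip(blocksizes, pool.map(encode_size, blocksizes)))
    return min(sizes, key=sizes.get), sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    with tempfile.TemporaryDirectory() as tmp:
        best, sizes = pick_smallest([2048, 4096, 8192, 16384],
                                    lambda b: flac_size(sys.argv[1], b, tmp))
        print(best, sizes)
```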

What I wonder though: say you got a CDDA signal that is fairly well compressed using an 8-order predictor. Now upsample it to 88.2. Wouldn't that suggest that you could try to "compress as undersampled" and optimize coefficients on N-2, N-4, ..., N-14, and pin N-1, N-3, N-<odd> to zero?
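The "compress as undersampled" idea amounts to a constrained LPC fit: solve for the even lags only and pin the odd lags to zero. This is a minimal sketch that assumes a plain least-squares (covariance-method) fit, not libFLAC's own analysis. The demo signal is synthetic and depends only on lags 2 and 4, so it shows the mechanics, not whether the idea pays off on real upsampled material.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_lags(x, lags):
    """Least-squares predictor x[n] ~ sum(c_k * x[n - lag_k]) using only the
    given lags; every lag not listed is implicitly pinned to zero."""
    p, k = max(lags), len(lags)
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for n in range(p, len(x)):
        past = [x[n - lag] for lag in lags]
        for i in range(k):
            b[i] += past[i] * x[n]
            for j in range(k):
                A[i][j] += past[i] * past[j]
    return solve(A, b)


# Synthetic signal that depends on even lags only: x[n] = 1.2 x[n-2] - 0.5 x[n-4] + noise.
rng = random.Random(0)
x = [0.0] * 4
for _ in range(20000):
    x.append(1.2 * x[-2] - 0.5 * x[-4] + rng.gauss(0.0, 1.0))

even = fit_lags(x, [2, 4])          # odd lags pinned to zero
full = fit_lags(x, [1, 2, 3, 4])    # unconstrained order-4 fit for comparison
print("even lags only:", [round(c, 3) for c in even])
print("all lags 1..4: ", [round(c, 3) for c in full])
```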

Also, one could think up some smarter "-M-alike" thing to get out the larger parts of the "-pemr8 -A <tonsofthem>" gain for cheap. Say, brute-force some frames at "smart intervals" based on the performance of the previous brute-force'd choice and the encoder's guesstimate.

 
... but, someone's got to do it - and neither CPU nor storage is as expensive as when the codecs were new.
To all you computer science professors reading this - wouldn't there be some fun student projects from all this?  ;)
(Hm, "all" = "none" I guess.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-25 11:17:07
For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.
Yeah, "often" - I didn't find it too predictable though ...
Let's compare it to "-8" alone, and convert your 38 CDs to 24/192 using SoX best quality with -2dB gain reduction to reduce the chance of clipping. Use -8b8192 -A "subdivide_tukey(3);blackman"
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-25 13:10:56
For upsampled hi-res using a clean resampler, -b8192 often helps 176.4/192k content, -b16384 often helps >= 352.8k content, when coupled with some narrow windows, the result will be comparable to -8e and often better than -8p with much faster encoding speed.
Yeah, "often" - I didn't find it too predictable though ...
Let's compare it to "-8" alone, and convert your 38 CDs to 24/192 using SoX best quality with -2dB gain reduction to reduce the chance of clipping. Use -8b8192 -A "subdivide_tukey(3);blackman"

SoX best quality ... It's going to take the day to work itself up the gigabytes, so I hope what I copied from someone out on the 'net is good enough?
rate -v -b 95.4 -b 45 -a <samplerate>
and with  -b 24 and -v 0.666 (two dB was not enough, and -v 0.7 wasn't either)


Edit:
Anyway, a "controlled" experiment unlike if I obtain high resolution files from label/artist with who knows what software and source file.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-25 13:37:45
So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.

-8
38.1 GB (41002782313 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
37.7 GB (40507157807 bytes)

5 files out of 306 still clipped with -2dB gain. Yes, if one keeps throwing Merzbow into the chain, which requires something like -10dB, it would harm the stats, so don't do that. Clipping should be kept as low as possible, or avoided altogether (e.g. by using RG max non-clip gain).

For simplicity and speed, I did all the conversions within foobar and foo_dsp_resampler.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-25 18:56:26
Same as above but using RetroArch "Higher" quality, 16/192, same -2dB gain without dither. 3 out of 306 files clipped.

-8
20.2 GB (21735937324 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
20.0 GB (21563020744 bytes)

-8e
19.8 GB (21326877503 bytes)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-25 23:17:47
While my computer is chugging off on upsamples ... warning for back-of-envelope quality calculations:

The thing about a signal that is a single sine is that you can predict it with two coefficients, one of which is -1.
You have the predictor x(N+1) = 2 cos(q) * x(N) - x(N-1), and with q = 2*pi*f/F, where f is the sine's frequency and F (> 2f) is the sampling frequency, you are all good - provided that the format can offer high enough precision for that 2 cos q.

... and, well: An "optimal" coefficient would be around 1.65313038269, i.e. 6771.22204748 * 2^-12, and although a simple -l2 does not catch it, -pl2 finds 6771 * 2^-12. Maybe, for all that I know, 27085 * 2^-14 or 54170 * 2^-15 would do better, maybe not.
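The numbers can be reproduced with a few lines of arithmetic. This assumes the file is the 4567 Hz sine at 48 kHz attached earlier in the thread; that pair reproduces the quoted 1.65313038269. The sketch only rounds 2 cos q at different shifts and reports the resulting coefficient error. It does not check whether those integers fit within the coefficient precision the FLAC format allows.

```python
import math

# Assumption: the test file is the 4567 Hz sine sampled at 48 kHz.
f, fs = 4567.0, 48000.0
coef = 2 * math.cos(2 * math.pi * f / fs)
print(f"2 cos q = {coef:.11f}")

for shift in (12, 14, 15):
    scaled = coef * 2 ** shift
    q = round(scaled)                    # nearest integer coefficient at this shift
    err = abs(q / 2 ** shift - coef)     # resulting absolute coefficient error
    print(f"shift {shift}: {scaled:.5f} -> {q}  (coefficient error {err:.2e})")
```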

@ktf : out of "curiosity", is it possible to force-feed libflac a given predictor vector and test (6771 * 2^-12, -1) vs (27085 * 2^-14, -1) vs (54170 * 2^-15, -1) on this file?
Asking because - semi-educated guesswork, but I might hit it - the explanation might be as follows:
* for -l2, where there would be a perfect predictor if resolution & accuracy were infinite, this is either "a format resolution issue" limited by the right-shift being at most 15 bits, or "an accuracy issue" where intermediate calculations should use an even more precise variable type.
* when order becomes > 2, we have a uniqueness issue on top. There are several "infinite-precision predictors" that would hit the sine exactly; the number of degrees of freedom should be some k = (order minus two) for a pure tone. With finite precision, that means walking through that grid to find a benevolent roundoff error. Not sure if that is worth coding - likely the alternative with -e, possibly with a few apodization functions too, works as a "try a few arbitrary ones", with the added benefit that if it finds a good one at lower order, it saves the space a coefficient would take.
But "uniqueness issue on top" does not rule out the resolution issue or the accuracy issue.

So if I am right about this, then question is, does it actually hit the best 2nd order predictor the format can offer. If it does not, then one can scratch one's head over whether the double precision change was enough. If it hits the best available 2nd order predictor, then improving over that is a mess ...
... but, can the algorithm detect whether you are close to a singularity? Because then the following is a thinkable option: if that is detected, invoke a more exhaustive search. And that might be an idea even for more general signals.
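One cheap way to detect closeness to a singularity is sketched below. It is offered as a sketch, not as something libFLAC is known to do: run Levinson-Durbin on the autocorrelation and watch the prediction error relative to the signal power. Since det(R_{p+1}) = det(R_p) * error_p, an error that collapses towards zero at order p means the (p+1)x(p+1) autocorrelation matrix is nearly singular. For a pure sine this already happens at order 2, whereas for white noise the error stays close to 1.

```python
import math
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) / n for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the prediction error
    after each order (errors[0] == r[0])."""
    a = [0.0] * (order + 1)
    err = r[0]
    errors = [err]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                     # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        errors.append(err)
    return a[1:], errors


def normalized_errors(x, order):
    """Prediction error after each order, relative to the signal power r[0]."""
    r = autocorr(x, order)
    _, errors = levinson_durbin(r, order)
    return [e / r[0] for e in errors]


n = 4096
w = 2 * math.pi * 1000 / 48000
sine = [math.sin(w * i + 0.3) for i in range(n)]
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print("sine :", [f"{e:.1e}" for e in normalized_errors(sine, 3)])
print("noise:", [f"{e:.3f}" for e in normalized_errors(noise, 3)])
```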
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-26 11:48:26
It is pretty easy to kill -e too. The plot is configured in a way that signal below 16-bit noise floor appears black:

-8
23.6 GB (25365043448 bytes)

-8e
23.5 GB (25329260329 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
23.4 GB (25181478740 bytes)

So, a 16/176.4 conversion using shaped dither makes -e a waste of time.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-26 12:18:45
It is pretty easy to kill -e too.
Pretty easy to generate signals where -e doesn't make any improvement? Sure.
But to find a model selection algorithm that (for your typical high resolution download from artist x or label y) makes -e redundant - that is something else.

As for what my computer is going to run overnight: I had the files initially written to flac -1, and encoding as -8b<N> -A <your choices> seems to save more %%s for 192 than for 384 or 96. Whether that is because the model selection works well up to a certain rate and bad from there, I don't know.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-26 12:30:04
Some characteristics of a file can be obtained much more cheaply by scanning the whole file, like EBU loudness stats and this kind of resampling. It should be possible to make a separate app to gather this information and then generate a suggested command-line batch file to feed the encoder.
So, those with skills, passion and patience can try to make such a 3rd party tool. Even if it is being built into the main project, it still requires a 2-pass approach.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-26 13:27:27
-8
23.6 GB (25365043448 bytes)

-8e
23.5 GB (25329260329 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
23.4 GB (25181478740 bytes)

flac 1.3.4 at -8
26.2 GB (28186839381 bytes)

Average users may just set the GUI slider to "best compression" which usually means -8.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-26 13:34:29
Some characteristics of a file can be obtained much more cheaply by scanning the whole file, like EBU loudness stats and this kind of resampling. It should be possible to make a separate app to gather this information and then generate a suggested command-line batch file to feed the encoder.
So, those with skills, passion and patience can try to make such a 3rd party tool. Even if it is being built into the main project, it still requires a 2-pass approach.
Is it obvious that you should scan the entire file, when blocks are encoded independently? Sure if you find patterns that way, but ...
Anyway, once someone wants to make anything out of a two-pass approach, then one could very well take a flac file as input, read the predictor vector, and rather than starting to encode from scratch, (1) try to improve on the predictor that is already stored, and (2) keep it if one doesn't find anything better.

Cf. ktf's attempt at IRLS post-processing: IIRC, if the reweighting pass did not improve things, it would be discarded in the end - unlike what I have seen with ffmpeg, where running more than a few passes would often make for worse compression.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-26 13:49:19
Is it obvious that you should scan the entire file, when blocks are encoded independently?
Then the encoder would need to check for such redundancies for every block of incoming data, which would be slower than a full-file scan.
[edit]
For example, clean 4x resampling often requires a digital filter that uses several hundred to a few thousand samples. This means a sample at position n could affect other samples at positions up to, for example, n +/- 512 samples. By using a suitable apodization, the affected sample range can be greatly reduced. So when doing such analyses on the fly, the encoder always needs to keep track of much more previous data and apply a math function to it to check the suitability of a specific encoding strategy.

So, let's say a 3-minute song works best with method A in the first minute, then methods B and C in the 2nd and 3rd minutes. If this is known in advance, the encoder only needs to switch encoding methods at specific times. A frequency analysis of the whole file is often pretty fast, for example with Spek (https://www.spek.cc/).

Quote
Anyway, once someone wants to make anything out of a two-pass approach, then one could very well take a flac file as input
Then the previously encoded flac files can be used as a pass, but encoding from PCM or transcoding from other formats won't benefit from it.

So, who wants to do it? Multipass encoding is quite common in video codecs, which are usually lossy.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-27 08:57:07
Here is an old arrangement I made in 2006, since I made it, it can be longer than 30 seconds. The plot is set to show stuff below 16-bit noise floor as purple/magenta.

I collected some public domain 44.1/48k samples, made some SoundFonts and arranged them in a DAW. One octave has 12 semitones, but it was/is pretty hard to find high-quality public domain sample sets (especially of ethnic instruments) that sampled every semitone. For example, a set may just have the white keys (7 per octave) or even fewer, like 2-3 per octave. The samples are then scaled in the synthesizer to cover the missing keys. There are, of course, effects like pitch bend and modulation, where one can continuously change the pitch of a sample. The result is an irregular cutoff plus some ultrasonic artifacts caused by resampling.

Download link:
https://1drv.ms/u/s!AvzB71jO7t0-gY1qXyHYew8N1nehTw

flac 1.3.4 -8
2193kbps

flac 1.4.3 -8
1780kbps

So if anything, the biggest optimizations have already been done, even for music with such a bizarre spectrum. -8 is the most important thing for typical users.

For those who are curious about the music itself, here is one of the live performances.
https://www.youtube.com/watch?v=dWDSK4FK-3k
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-09-27 14:28:56
I have to say, since flac does multithreading and all the apps I still use for encoding now benefit from it, I've been throwing -8ep at everything.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-27 14:54:22
As a cheapskate using a budget quad core CPU I can use either -e or -p, but not both. Yes, these switches are simple enough so for a fast computer they may by chance improve some types of files significantly. -8ep is painfully slow on 24-bit files.

BTW, since WavPack got multithread I am more willing to use up to -x4 too.

[edit]
Download link:
https://1drv.ms/u/s!AvzB71jO7t0-gY1qXyHYew8N1nehTw
-8b8192 -A "subdivide_tukey(5);welch;hann;flattop"
127272408 bytes
24.63x realtime

-8ep
128616781 bytes
1.75x realtime

Single thread time.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-09-28 01:34:46
I recompressed 5 random 24/96 albums with your -8b8192 -A "subdivide_tukey(5);welch;hann;flattop". Several files indeed come out smaller than with plain -8ep, but on average clearly not.
4.793.262k

Default -8ep, standard 4096 blocksize
4.788.876k

Out of curiosity -epb8192 -A "subdivide_tukey(5);welch;hann;flattop"
4.790.732

Somehow I believe the very best compromise for high-bitrate material is still out there.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-28 05:35:04
Yes, either file size or encoding speed, depending on which is more important to the individual user. Given the cost of disk space nowadays, I am happy to get 14x faster encoding at the expense of slightly bigger files, and I often don't like to use settings which may significantly slow down decoding either.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-28 09:34:16
@Wombat : How about against -8p and -8pb8192?

For CDDA, I made the following observation: The "5" in  -8 -A "subdivide_tukey(5)" would nearly be enough to catch -8p on size (and I think it did catch -8e) and be twice as fast. But if you were willing to take the slowdown from the "5", you should be willing to take the slowdown of the -8p too, as it would save more bytes per second:
https://hydrogenaud.io/index.php/topic,123025.msg1016625.html#msg1016625
More writeup here: https://hydrogenaud.io/index.php/topic,123025.msg1016761.html#msg1016761 (I wonder from vague memory: did 1.4.3 speed up -r7?)
Highly dependent on build, but 20 percent more time for using a CLANG build isn't that much when we are talking about settings that might be several times as slow as -8.
BTW, hann = tukey(1). Or, well, maybe only slightly different due to some round-off.
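A quick way to see why hann = tukey(1) up to round-off: with the textbook definitions, a Tukey window whose tapered fraction is 1 has no flat part left, and its two cosine tapers are exactly the two halves of a Hann window. This is a minimal sketch of those textbook formulas; libFLAC's own discretization of tukey(P) may differ in detail, which is one place such round-off differences could come from.

```python
import math

def hann(n_len):
    """Symmetric Hann window of length n_len (textbook definition)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def tukey(n_len, p):
    """Symmetric Tukey (tapered cosine) window; p = fraction of the window that is tapered.
    p = 0 gives a rectangle, p = 1 gives a Hann window."""
    if p <= 0:
        return [1.0] * n_len
    w = []
    for n in range(n_len):
        x = n / (n_len - 1)                      # position in [0, 1]
        if x < p / 2:                            # rising cosine taper
            w.append(0.5 * (1 - math.cos(2 * math.pi * x / p)))
        elif x > 1 - p / 2:                      # falling cosine taper
            w.append(0.5 * (1 - math.cos(2 * math.pi * (1 - x) / p)))
        else:                                    # flat top
            w.append(1.0)
    return w

N = 4096
max_diff = max(abs(a - b) for a, b in zip(tukey(N, 1.0), hann(N)))
print(f"max |tukey(1) - hann| over {N} samples: {max_diff:.2e}")  # floating-point round-off only
```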

For high resolution material, there is still something to -e - and as you point out, results vary across files.
The hypothesis is that it isn't the high resolution per se, it is that the ultrasonic part is pretty much empty - which would also explain why some dark ambient (low treble content) benefits from -e. Four most recent posts from this user: https://hydrogenaud.io/index.php?action=profile;area=showposts;u=125366
(edit: corrected link).
(Wonder how much one could tell by FFT-ing the signal and considering the top octave or so. Heck, the block will already be windowed.)
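A rough sketch of that FFT idea, assuming NumPy: window a block, take its power spectrum, and report the share of energy in the top octave (fs/4 up to Nyquist). The Hann window, the top-octave boundary and the function name are assumptions for illustration; nothing in this thread says the reference encoder does anything like this.

```python
import numpy as np

def top_octave_energy_fraction(block, fs):
    """Fraction of a block's spectral energy between fs/4 and fs/2 (the top octave)."""
    block = np.asarray(block, dtype=np.float64)
    windowed = block * np.hanning(len(block))   # taper the block before the FFT
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    total = power.sum()
    if total == 0.0:
        return 0.0                              # digital silence
    return float(power[freqs >= fs / 4].sum() / total)

# A 5 kHz tone at 96 kHz has (almost) nothing above 24 kHz, while white noise
# spreads its energy evenly, so roughly half of it lands in the top octave.
fs = 96_000
n = np.arange(8192)
tone_frac = top_octave_energy_fraction(np.sin(2 * np.pi * 5_000 * n / fs), fs)
noise_frac = top_octave_energy_fraction(np.random.default_rng(0).standard_normal(8192), fs)
print(tone_frac, noise_frac)
```

A near-zero fraction would flag blocks like upsampled CD material, where -e seems to matter most.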
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-28 10:09:49
Also, -b8192 at 88.2/96k is not as safe as at 176.4/192k, a portion of materials may still work best at -b4096 or -b4608. As for my arrangement with bizarre spectrum, honestly I did not know the performance of -8pe is really this bad when compared to -8 alone (1780kbps -> 1779kbps). I expected at least around 5-10kbps reduction.

https://hydrogenaud.io/index.php/topic,122949.msg1016038.html#msg1016038 and https://hydrogenaud.io/index.php/topic,122949.msg1016038.html#msg1016038 .
Both are same?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-28 10:21:47
Also, -b8192 at 88.2/96k is not as safe as at 176.4/192k, a portion of materials may still work best at -b4096 or -b4608.
My observation too. (Let's see how it still holds up when my computer is done with the upsamples to various frequencies ...)

As for my arrangement with bizarre spectrum, honestly I did not know the performance of -8pe is really this bad when compared to -8 alone (1780kbps -> 1779kbps).
That isn't -8pe bad, that is -8 good  :))

Both are same?
Oops. Edited! https://hydrogenaud.io/index.php?action=profile;area=showposts;u=125366 , four most recent posts.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-09-28 14:45:33
@Wombat : How about against -8p and -8pb8192?
I don't remember the albums i used yesterday but 5 random 24/96 albums.
-8pb8192
4.981.164k
-8p
4.979.130k
Higher -b values gave very mixed results in some tests I did before, and since my hardware player only properly supports a blocksize of 4096 with 24/96, this is OK :)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-28 15:45:59
and since my hardware player only supports a blocksize of 4096 with 24/96 properly
... you were the one with the Squeezebox that would choke on 4608?
Actually:  https://wiki.hydrogenaud.io/index.php?title=FLAC_decoder_testbench , if you feel like testing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2023-09-28 16:28:13
No, it is the Slimdevices Transporter that is vintage now. No need to add. The Squeezebox Touch has no problems.
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-09-29 07:55:36
@ktf : out of "curiosity", is it possible to force-feed libflac a given predictor vector and test (6771 * 2^-12, -1) vs (27085 * 2^-14, -1) vs (54170 * 2^-15, -1) on this file?
Sure, you can hack around in the encoder. It wouldn't make sense for any normal situation, because most of the time, the signal isn't the same throughout a whole file.

As said before, I think trying to optimize for sine waves is a waste of time: they trigger a specific condition in the encoder that isn't triggered on signals that either aren't a sine or change between the start and end of a block.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-29 08:46:23
@ktf : out of "curiosity", is it possible to force-feed libflac a given predictor vector and test (6771 * 2^-12, -1) vs (27085 * 2^-14, -1) vs (54170 * 2^-15, -1) on this file?
Sure, you can hack around in the encoder. It wouldn't make sense for any normal situation, because most of the time, the signal isn't the same throughout a whole file.

As said before, I think trying to optimize for sine waves is a waste of time: they trigger a specific condition in the encoder that isn't triggered on signals that either aren't a sine or change between the start and end of a block.

Yes, - or: those vectors (for this specific file!) would be a test whether this is the correct takeaway from it. Because that "specific condition" should not occur when searching for the best among second-order predictors (coefficient matrix is invertible at order = 2).
If those vectors improve, then you got a more general round-off issue at hand - which may in the larger picture be part of the explanation why -p sometimes does better than one would expect.

But if those two other vectors do not improve - or at least are so near-tied that nobody bothers with the difference - then the encoder finds the best 2nd order predictor within the limitations of the format (that is, the bound on resolution), and "your" conclusion is reasonable. I.e., the opportunities for improvement are not worth looking into further, as they are indeed due to this very specific degeneracy (namely, the least-squares minimizer x solves Ax=b but with A singular/non-invertible; for sines that happens when order > 2).
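The degeneracy claim can be checked on an idealised sine (infinitely long, not rounded to integers), whose normalised autocorrelation is r(k) = cos(wk). A minimal NumPy sketch: the order-2 normal-equation matrix is invertible and yields the exact recurrence x[n] = 2cos(w)x[n-1] - x[n-2], while the order-3 matrix has zero determinant. Real blocks are finite and quantized, so in practice the matrix is only nearly singular.

```python
import numpy as np

fs, f = 48_000, 4_567          # the 4567 Hz test tone discussed above
w = 2 * np.pi * f / fs

def autocorr_matrix(order):
    """Toeplitz matrix R[i, j] = r(i - j) = cos(w * (i - j)) of an ideal sine."""
    idx = np.arange(order)
    return np.cos(w * (idx[:, None] - idx[None, :]))

R2, R3 = autocorr_matrix(2), autocorr_matrix(3)
print("det order 2:", np.linalg.det(R2))   # equals sin(w)**2, clearly non-zero
print("det order 3:", np.linalg.det(R3))   # zero up to round-off: singular

# Order 2: unique solution of R a = [r(1), r(2)], i.e. the sine recurrence.
a = np.linalg.solve(R2, np.cos(w * np.array([1, 2])))
print("order-2 predictor:", a, "vs 2cos(w) =", 2 * np.cos(w))
```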
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-29 09:33:19
(for this specific file!)
More sines tested: https://hydrogenaud.io/index.php/topic,122444.0.html
To avoid misunderstanding: the method I mentioned above was undithered, but the file I posted on the last page was RPDF dithered.

I had actually forgotten that I posted a similar comment in another thread until you mentioned it, so I did not intentionally use the same settings in waveform generation. Also, even with a prime number, adding dither can still further clean up the noise floor.

So here is the undithered version.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-29 10:18:59
Sure it is undithered? Splitting into 1 second chunks, those are not identical - which they would have been for a sine of integer frequency. Or maybe it is down to the resolution of pi, so at certain points the sine gets rounded to integer in different ways?
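One way to probe the "resolution of pi" guess: a sine at an integer frequency repeats exactly every second in theory, so whether the one-second chunks come out identical depends on how precisely the generator computes the phase. This NumPy sketch compares double-precision phase with single-precision phase; the float32 variant is purely hypothetical, since nobody here knows what Audition 1.5 does internally.

```python
import numpy as np

FS = 48_000

def sine16(freq, seconds, dtype):
    """Undithered 16-bit sine whose phase and sin() are computed in the given float type."""
    n = np.arange(FS * seconds).astype(dtype)
    step = dtype(2 * np.pi * freq / FS)       # phase increment per sample
    x = np.sin(step * n)                       # stays in `dtype`
    return np.round(32767 * x.astype(np.float64)).astype(np.int16)

def differing_samples_per_second(x):
    """Count samples in each later second that differ from the first second."""
    first = x[:FS]
    return [int(np.count_nonzero(x[i:i + FS] != first)) for i in range(FS, len(x), FS)]

print("float64 phase:", differing_samples_per_second(sine16(4567, 3, np.float64)))
print("float32 phase:", differing_samples_per_second(sine16(4567, 3, np.float32)))
```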

Anyway, this and the previous file differ in 120 385 out of 480 000 samples.
Compressing with -pl2 -b48000 gives the same predictor vector as the previous file, (6771, -4096) * 2^-12.
-pl2 -b480 is also enough block size to find the same vector.
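The vectors in question are the ideal second-order sine predictor (2cos(w), -1) rounded at different precisions: 2cos(w) is about 1.65313 here, and rounding it at 2^-12, 2^-14 and 2^-15 gives 6771, 27085 and 54170. Below is a minimal sketch that computes integer FLAC-style LPC residuals for each (prediction = (sum of coefficient * past sample) >> shift). The test tone (amplitude 32767, phase wrapped per second) is an assumption, not the posted file. Note that 54170/2^15 is exactly 27085/2^14, so the last two behave identically. As far as I know the FLAC format also caps coefficient precision at 15 bits, so the finer vectors could not be stored as-is; this only checks the arithmetic.

```python
import math

FS, FREQ, SECONDS = 48_000, 4_567, 2    # two seconds are enough for a periodic test tone

# Undithered 16-bit test tone (amplitude 32767 is an assumption; the posted file may differ).
x = [round(32767 * math.sin(2 * math.pi * FREQ * (n % FS) / FS)) for n in range(FS * SECONDS)]

def quantize_predictor(shift):
    """Round the ideal 2nd-order sine predictor (2cos(w), -1) to integers scaled by 2**shift."""
    w = 2 * math.pi * FREQ / FS
    return round(2 * math.cos(w) * 2 ** shift), -(2 ** shift)

def abs_residual_sum(signal, coeffs, shift):
    """Integer LPC: residual = x[n] - ((c1*x[n-1] + c2*x[n-2]) >> shift)."""
    c1, c2 = coeffs
    total = 0
    for n in range(2, len(signal)):
        prediction = (c1 * signal[n - 1] + c2 * signal[n - 2]) >> shift  # arithmetic shift
        total += abs(signal[n] - prediction)
    return total

for shift in (12, 14, 15):
    coeffs = quantize_predictor(shift)
    print(shift, coeffs, abs_residual_sum(x, coeffs, shift))
```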
Title: Re: FLAC v1.4.x Performance Tests
Post by: ktf on 2023-09-29 10:47:44
Yes, - or: those vectors (for this specific file!) would be a test whether this is the correct takeaway from it. Because that "specific condition" should not occur when searching for the best among second-order predictors (coefficient matrix is invertible at order = 2).
If those vectors improve, then you got a more general round-off issue at hand - which may in the larger picture be part of the explanation why -p sometimes does better than one would expect.
Maybe try an (old) IRLS build. Could very well be those predictors show up. When I did some tests, those builds did very well on sines.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-29 10:49:57
Sure it is undithered?
The files were generated using Audition, and if you trust there is no bug, then it is undithered. Also, if you put my old, dithered version into xz, file size will be bigger. Also, as mentioned, the dithered version has a cleaner noise floor in the FFT view.

[edit]
Sines can be generated using foobar as well. Go to File > Add location, type tone://4567, 10
This seems to generate a full scale sine up to +/-1.0 (+/-32768), so be careful with the +1 offset.
The sample rate of the tone can be set by searching "tone" in Advanced preferences.

Also, adding dither to full scale sine may result in clipping.
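Both warnings come down to the 16-bit integer range, which runs from -32768 to +32767. A minimal sketch of a float-to-16-bit conversion, assuming the common x32768 scaling and plain TPDF dither of +/-1 LSB peak; this is not necessarily what foobar2000 does internally.

```python
import random

def to_int16(samples, scale=32768, dither=False, seed=0):
    """Convert floats in [-1.0, 1.0] to 16-bit ints; return (ints, number of clipped samples).
    dither=True adds TPDF dither (+/-1 LSB peak) before rounding."""
    rng = random.Random(seed)
    out, clipped = [], 0
    for s in samples:
        v = s * scale
        if dither:
            v += rng.random() - rng.random()   # triangular PDF in (-1, 1) LSB
        q = round(v)
        if q > 32767 or q < -32768:
            clipped += 1
            q = max(-32768, min(32767, q))
        out.append(q)
    return out, clipped

print(to_int16([1.0, -1.0, 0.5]))                       # +1.0 -> 32768 does not fit
print(to_int16([1.0, -1.0], scale=32767))               # one LSB of headroom: no clipping
print(to_int16([1.0] * 1000, scale=32767, dither=True)[1], "clipped with dither")
```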
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-29 12:12:46
Yes, - or: those vectors (for this specific file!) would be a test whether this is the correct takeaway from it. Because that "specific condition" should not occur when searching for the best among second-order predictors (coefficient matrix is invertible at order = 2).
If those vectors improve, then you got a more general round-off issue at hand - which may in the larger picture be part of the explanation why -p sometimes does better than one would expect.
Maybe try an (old) IRLS build. Could very well be those predictors show up. When I did some tests, those builds did very well on sines.
Same predictors at  --lax -pl2 -b48000 -A "<quitealot>;irlspost-p(1999)"

Sure it is undithered?
The files were generated using Audition, and if you trust there is no bug, then it is undithered.
So, sample value created by way of "multiplying sample number by 2pi to certain precision" and getting different round-off every now and then?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-09-29 12:26:12
Sure it is undithered?
The files were generated using Audition, and if you trust there is no bug, then it is undithered.
So, sample value created by way of "multiplying sample number by 2pi to certain precision" and getting different round-off every now and then?
Audition 1.5 was a 2004 product which is no longer supported, and I am not good enough to reverse engineer their code.

Among the freeware I know of, apart from foobar, Audacity and SoX also support tone generation, so you can try to generate some files and compare them.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-09-29 13:01:24
sox -D -n -r 48000 -c 1 -b 16 4567sox.wav synth 10 sine 4567

-D for no dithering. Differs a little from yours, so there is a "not exact scine'nce" pun waking here.
At least each one-second chunk is identical for this one (edit: not sure if that is a good thing for testing ...), but still it xz's to bigger than yours.



Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-02 19:06:55
So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.

-8
38.1 GB (41002782313 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
37.7 GB (40507157807 bytes)

5 files out of 306 still clipped with -2dB gain. Yes, if one keeps throwing Merzbow into the chain, which would require something like -10dB, it would harm the stats, so don't do that. Clipping should be kept as low as possible, or avoided altogether (e.g. by using RG max non-clip gain).

For simplicity and speed, I did all the conversions within foobar and foo_dsp_resampler.
Same corpus with same settings transcoded from pre-upsampled flac files (16/44 -> 24/192 with -2dB preamp), using foobar multi-file multi-thread, flac 1.4.3. Decoding times are single-threaded using foobar x64 benchmark, all tested on NVMe SSD, i3-12100.

-8p
Total encoding time: 1:49:49.297, 22.75x realtime
40794570053 bytes

-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"
Total encoding time: 19:30.984, 128.05x realtime
40502851907 bytes

-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:16.360, 117.47x realtime
40419431443 bytes

-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:18.485, 117.28x realtime
40406946503 bytes

-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:33.359, 115.93x realtime
40404848546 bytes
Decoding: 240.030x realtime

-8e
Total encoding time: 1:07:58.406, 36.76x realtime
40330004928 bytes
Decoding: 241.866x realtime

-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"
Total encoding time: 30:46.781, 81.19x realtime
40053310660 bytes
Decoding: 240.803x realtime
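As a cross-check, the realtime factors above follow from the corpus duration (41h 39m 06s) divided by the encoding time. A small sketch that recomputes them and ranks the settings by size relative to -8p, using only the figures in this post; the `to_seconds` helper is just for parsing the times.

```python
# Encoding results quoted above (same 41h 39m 06s corpus, upsampled to 24/192).
DURATION_S = 41 * 3600 + 39 * 60 + 6

results = {  # setting: (encode time "h:mm:ss.sss" or "mm:ss.sss", bytes)
    "-8p": ("1:49:49.297", 40794570053),
    '-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"': ("19:30.984", 40502851907),
    '-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': ("21:16.360", 40419431443),
    '-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': ("21:18.485", 40406946503),
    '-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': ("21:33.359", 40404848546),
    "-8e": ("1:07:58.406", 40330004928),
    '-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"': ("30:46.781", 40053310660),
}

def to_seconds(t):
    """Parse 'h:mm:ss.sss' or 'mm:ss.sss' into seconds."""
    secs = 0.0
    for part in t.split(":"):
        secs = secs * 60 + float(part)
    return secs

baseline = results["-8p"][1]
for setting, (t, size) in sorted(results.items(), key=lambda kv: kv[1][1]):
    realtime = DURATION_S / to_seconds(t)
    saving = (baseline - size) / baseline * 100
    print(f"{realtime:7.2f}x  {saving:5.2f}% smaller than -8p  {setting}")
```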

From the data above I think one can deduce how slow -8pe on this CPU would be, so I am not going to test it.

Gauss, like other windows, works best when coupled with an optimal blocksize. Also, the gauss parameter needs to be calculated for best performance. For example, the target 192kHz sample rate has a Nyquist frequency of 96kHz. If the resampler is lowpassed at about 23kHz like the red plot below, then the value should be 23/96, i.e. gauss(24e-2); for the green and yellow plots it would be 23e-2 and 25e-2.
[Attached image: resampler lowpass responses (red, green and yellow plots)]
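The rule of thumb in this post (gauss parameter approximately equal to resampler cutoff divided by Nyquist) is easy to script. A minimal sketch; the two-decimal rounding and the e-notation output are my assumptions to match how the settings are written here, and the rule itself is bennetng's heuristic, not something the flac documentation prescribes (it only documents the window as gauss(STDDEV)).

```python
def gauss_window_arg(cutoff_hz, sample_rate_hz):
    """Heuristic from the post: gauss STDDEV ~= resampler cutoff / Nyquist,
    rounded to two decimals and written in e-notation, e.g. 'gauss(24e-2)'."""
    nyquist = sample_rate_hz / 2
    hundredths = round(cutoff_hz / nyquist * 100)
    return f"gauss({hundredths}e-2)"

print(gauss_window_arg(23_000, 192_000))  # 23/96 = 0.2396 -> gauss(24e-2)
```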

Blackman, on the other hand, describes a faster-than-expected overall decay trend in the upper spectrum without doing anything very specific, so it can catch a wider variety of filter shapes.

If the hardware does not support higher blocksizes, slightly increasing -l and -r would help a lot, like -l14 to -l16 and -r7. The increased decoding complexity should not be a big deal especially for mains-powered devices.

-b16384 can still cause inflation at 192kHz depending on materials, use with caution.

For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-02 22:15:06
8192 beats 16384 size-weighted in my tests on upsampling those 38 CDs, but it varies quite a lot. Medians could very well tell a different story. I messed up something and it is still running, but I can report on those two block sizes at least.

Using -A "subdivide_tukey(3);blackman" on everything, then
** In overall size:
At 96 kHz, -b8192 beats -b16384, and -eb8192 beats -eb16384.
At 192 kHz, same happens.
At 384 kHz, -b8192 beats -b16384 by around 0.12 percent, but -eb16384 beats -eb8192 by around 0.18 percent.

192 kHz, let's look into that further: No -e here.
* Classical music benefits from larger blocksize -b16384, 12 albums to 2; all except harpsichord and (near-zero) Cage's percussion works. Total impact 0.32 percent (not percentage points!), varying from -0.15 (harpsichord) to 0.63 percent (Bruckner, vocals)
Median impact = 0.37 = median absolute value impact.
But then the rest:
* The heavier music: -b8192 wins by 7 albums against 3, switching sign on impact to signify that:
Total impact -0.14 percent, varying from -0.71 (Laibach, biggest benefit for -b8192) to 0.24 percent (Gojira, that benefits from 16384).
Median impact = -0.24. Remove the sign for median absolute value impact.
* The others. -b8192 wins by 9 albums against 5
Total impact -0.28 percent, max benefit from -b8192 is -1.31 percent (Wovenhand, in this release that is singer/songwriter) and then -0.99 (Sopor Aeternus, that is something completely different: darkwave) - and on the other end, benefiting most from larger blocksizes are the jazz albums: 0.41 percent for both Davis and Johansson. Those were near-mono before dithering I think.
Median impact = -0.32 percent. Median absolute impact: 0.38.
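The three figures per group above (size-weighted total, median, median absolute value) can disagree quite a bit, because big albums dominate the total. A minimal sketch of how they are computed, with the post's sign convention (positive = -b16384 is smaller). The toy sizes are made up purely to illustrate the arithmetic; they are not from the test corpus.

```python
from statistics import median

def impacts(albums):
    """albums: list of (size_b8192, size_b16384) in bytes.
    Returns (size-weighted total %, median %, median absolute %); positive favours -b16384."""
    per_album = [(a - b) / a * 100 for a, b in albums]
    total_a = sum(a for a, _ in albums)
    total_b = sum(b for _, b in albums)
    total = (total_a - total_b) / total_a * 100
    return total, median(per_album), median([abs(p) for p in per_album])

# Made-up toy sizes, only to show how the three figures can differ:
toy = [(1000, 990), (2000, 2010), (500, 498)]
print(impacts(toy))
```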


For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.

... but who knows how many hi-res files are "real".
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-03 06:20:36
With unknown hi-res content I usually stick with the welch;hann;flattop combo: different shapes and easy-to-memorize window names.

Here is an outlier that still likes -b4096 at 192k. subdivide_tukey(6/1) was used. At 44k -b1152 seems optimal.
[Attached image]
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-03 08:33:48
After some investigations the ones that still like -b4096 at 192kHz include ethnic plucking instruments (e.g. Shamisen, Pipa) and high pitched percussive sounds in either acoustic or synthesized manner.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-04 15:41:33
With unknown hi-res content I usually stuck with the welch;hann;flattop combo, different shapes and easy to memorize window names.
As hann is tukey(1), which is "the extreme end" (rectangle being the other), then I would be mildly surprised if it is any good compared to the tried and tested midways. 
Only "mildly" because who knows what works until it is tried ...

the ones that still like -b4096 at 192kHz
If you take those and compare
-b 4096 -r <R>
to
-b 8192 -r <R+1>
what do you get? Upping the partition order by 1 would double the partition number and maintain the partition size in samples - it would be more like the partial effect of -b alone. (The one you posted in #404 does not benefit from doubling block size and increasing the partitioning though.)
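The partition arithmetic behind that suggestion: a block is split into 2^order Rice partitions, so doubling the block size and raising the partition order by one keeps the nominal partition length unchanged. A small sketch; keep in mind -r sets the maximum partition order, so this is the finest split the encoder may choose, not the one it will pick.

```python
def rice_partition_size(blocksize, partition_order):
    """Nominal samples per residual partition: the block is split into 2**order equal parts.
    (In FLAC the first partition is shortened by the predictor order's warm-up samples.)"""
    if blocksize % (1 << partition_order):
        raise ValueError("blocksize must be divisible by 2**partition_order")
    return blocksize >> partition_order

for r in range(3, 9):
    print(f"-b4096 -r {r}: {rice_partition_size(4096, r):4d} samples, "
          f"-b8192 -r {r + 1}: {rice_partition_size(8192, r + 1):4d} samples")
```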
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-04 18:42:42
As hann is tukey(1), which is "the extreme end" (rectangle being the other), then I would be mildly surprised if it is any good compared to the tried and tested midways. 
Only "mildly" because who knows what works until it is tried ...
Both hann and blackman belong to the cosine-sum family, but blackman is one step narrower than hann, so hann would catch a decay trend that is less severe than what blackman catches. flattop by itself doesn't make too much sense due to its shape, so I don't use it alone; I use it with other windows so that its unique shape can catch something other windows can't. I only use this combo with >= 88.2k content, with subdivide_tukey no higher than 5.
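To make "one step narrower" and flattop's "unique shape" concrete, here is a sketch of the generic cosine-sum window w[n] = sum_k (-1)^k a_k cos(2 pi k n / (N-1)) with the textbook coefficients (the ones SciPy uses). I have not checked that libFLAC uses exactly these values. The mean value serves as a rough width measure, and flattop is the only one that dips below zero.

```python
import math

# Textbook cosine-sum coefficients (as used e.g. by SciPy); not checked against libFLAC.
COEFFS = {
    "hann": [0.5, 0.5],
    "blackman": [0.42, 0.5, 0.08],
    "flattop": [0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368],
}

def cosine_sum_window(coeffs, n_len):
    """w[n] = sum_k (-1)^k a_k cos(2*pi*k*n/(N-1)), symmetric, length n_len."""
    return [
        sum((-1) ** k * a * math.cos(2 * math.pi * k * n / (n_len - 1)) for k, a in enumerate(coeffs))
        for n in range(n_len)
    ]

N = 4096
windows = {name: cosine_sum_window(c, N) for name, c in COEFFS.items()}
for name, w in windows.items():
    # Mean value: a rough "how much of the block carries weight" measure (narrower -> smaller).
    print(f"{name:8s} mean={sum(w) / N:.3f} min={min(w):+.5f}")
```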

If you take those and compare
-b 4096 -r <R>
to
-b 8192 -r <R+1>
what do you get? Upping the partition order by 1 would double the partition number and maintain the partition size in samples - it would be more like the partial effect of -b alone. (The one you posted in #404 does not benefit from doubling block size and increasing the partitioning though.)
When checking this I found something interesting. In my previous post I was comparing -8p with -8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)" and assumed that if -8p produced a smaller file, then that file must be better with -b4096. There are 11 files (51 minutes) in this category. However, if I only compare -8 with -8b8192, only one file (Yu Miyake - WANDA WANDA) works better with -b4096, and only one more file (Cool Zone - The Blessed Place) works better with -b4096 when using -8 -A "subdivide_tukey(3);blackman;gauss(22e-2)". So the total duration of files that work better with -b4096 drops to 8m 21s, out of 1d 17h 39m 06s. That said, it doesn't mean these 11 files are better with -8p than with -8e:

-8e
969930824 bytes

-8p
996927661 bytes
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-04 19:55:02
So I tried transcoding some mp3 files I collected to flac; it turns out I found nothing special. Total 1158 files, 2d 16h 47m 31s, bitrates from 124kbps VBR to 320kbps CBR. All were RG scanned and converted with foobar's "prevent clipping according to peak". I used the Case Smart Dither plugin with the highpass filter unchecked. All files are transcoded to 16/44.

-8
22712634863 dithered
22547632564 truncated

-8e
22706853763 dithered
22540297898 truncated

-8p
22693665912 dithered
22527026089 truncated

Among the dithered files, 54 files (2h 33m 19s) are smaller with -8e.
Among the truncated files, 78 files (4h 11m 36s) are smaller with -8e.
49 of those 54 dithered files are also among the 78 truncated ones.
The remaining 5 dithered files don't share any similarity that I can think of.

Among all files that came out smaller with -8e, this piece of music shows the biggest file size difference compared to -8p. The source mp3 file is 128kbps CBR.

3621067 -8p dithered
3555347 -8e dithered
3172894 -8p truncated
3150222 -8e truncated

4 - 悲情城市 (A City of Sadness) Variation 1
https://www.discogs.com/release/3913204-SENS-%E3%82%BB%E3%83%B3%E3%82%B9-%E6%82%B2%E6%83%85%E5%9F%8E%E5%B8%82-A-City-Of-Sadness
https://youtu.be/O6ciikFXUQw

Regardless of how the YouTube Opus version was encoded, the mp3 file I have looks like this:
[Attached image]
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-05 08:04:33
As hann is tukey(1), which is "the extreme end" (rectangle being the other), then I would be mildly surprised if it is any good compared to the tried and tested midways. 
Only "mildly" because who knows what works until it is tried ...
Both hann and blackman belong to the cosine-sum family, but blackman is one step narrower than hann, so hann would catch a decay trend that is less severe than what blackman catches. flattop by itself doesn't make too much sense due to its shape, so I don't use it alone; I use it with other windows so that its unique shape can catch something other windows can't. I only use this combo with >= 88.2k content, with subdivide_tukey no higher than 5.
When filter length remains the same, the same window can work quite differently in different sample rates, especially with windows that don't take user-specified parameters. Changing -b can somehow change the effective filter length. While Tukey can be set to match Rectangular and Hann, Tukey should be used to do things that only Tukey can do.

The -1dB@25kHz DSD filter I mentioned in another thread (https://hydrogenaud.io/index.php/topic,124702.msg1033121.html#msg1033121) is also apodized with a cosine sum family window using a different configuration.
[Attached image]

Now, invert the graph and compare it to some DSD modulators:
https://pcmdsd.com/Software/PCM-DSD_Converter.html
[Attached image]
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-06 08:51:01
I collected some public domain 44.1/48k samples, made some SoundFonts and arranged in a DAW. One octave has 12 semitones, but it was/is pretty hard to find high quality public domain sample sets (especially of ethnic instruments) that sampled every semitone. For example, a set may just have the white keys (7 per octave) or even fewer, like 2-3 per octave.

I don't know if this would be an interesting test, to be honest - but there are a lot of twelve-tone (https://www.nytimes.com/2007/10/14/arts/music/14tomm.html) works composed over the last century.
I was surprised, though, that I didn't find more of them freely downloadable on the 'net. Yes, there are quite a few on the Internet Archive (https://archive.org/search?query=subject%3A%22Twelve-tone%22&page=2&and%5B%5D=mediatype%3A%22audio%22), but most seem to be vinyl rips in mp3.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-06 11:35:06
Codecs (at least the non-"AI" ones) don't care about high-level stuff that involves music theory. Codecs care more about production procedures like miking, post-production etc. For example, a performance on a drum set recorded at the back of the hall can be quite different from the same performance recorded at close distance. The former would have most of the transients smoothed out and be more reverberant, which is more compatible with -b4096/4608 at 44/48k. The latter could be something like -b2304. But then, if one uses heavy-handed dynamic range compression/limiting on a closely recorded drum set, the transients would be disrupted once again.

Classical music recordings generally don't use close miking, there could be a dedicated mic for the soloist of a Concerto but never one mic for each performer in a strings/brass/wind section. Even for the soloist (e.g. violin) the treatment can still be quite different from a solo violin used in a pop music arrangement.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-13 09:53:08
A surprising result: a case of
-e not making a single bit difference on an entire thirty-eight minutes album. Bit-identical .flac files created, not only same size.

But it is after sox resampling it to 64000 Hz, 24 bits (with dither) - it does not occur in the unaltered CD image file.  So something is going on in that resampling - but the encoder guessing spot-on for an entire album?  In fact, boosting block size I got it to happen to more albums too.

So what I was about to do - and redo, due to human error - was to upsample material to check how block sizes and -e react.  (Cf. this suggestion (https://hydrogenaud.io/index.php/topic,123025.msg1033219.html#msg1033219) from @bennetng - background below, to those who aren't reading this thread regularly.)
So in addition to trying a high sampling rate like 192k, I tried a small factor up, to see if merely half an octave empty spectrum on top would be enough.
Much to my surprise, the benefit of -e sometimes collapsed all the way to zero instead of increasing.

For one of the albums, the result was quite robust. This album: https://psycroptic.bandcamp.com/album/psycroptic-2  Like, put on track 4, it isn't like all the (sub)frames would be equal.
But it happens for more albums too. Including Miles Davis' Birth of the Cool (in the 'Complete Birth of the Cool' edition) - which doesn't sound at all like this to a human ear (encoders may beg to differ). But for Miles to match -e, I had to try -b32768, which reduces it to "only" 18604 chances to guess less-than-perfect at least once throughout the > 79 minutes.
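A sanity check on the "18604 chances" figure: each stereo frame has two subframes, so at -b32768 and 64 kHz that count corresponds to roughly 79.4 minutes of full-size frames, consistent with "> 79 minutes". This sketch assumes the album is counted as one continuous stream; per-track files would each end with an extra partial frame.

```python
import math

def subframe_count(total_samples, blocksize, channels):
    """Number of subframes = frames * channels; the last frame may be shorter."""
    return math.ceil(total_samples / blocksize) * channels

def minutes_for(subframes, blocksize, channels, fs):
    """Duration covered by that many full-size frames, in minutes."""
    return subframes / channels * blocksize / fs / 60

print(minutes_for(18604, 32768, 2, 64_000))   # about 79.4 minutes at 64 kHz
```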

So, too curious for my own good, I tried to investigate more about "when" the model selection algorithm would hit spot-on the same as a -e would. Psycroptic at 64 kHz again: 
-8 -l<something> with or without -e, tested -l 6 to -l 20:
-l 6: -e makes it differ in some eight subframes. 21 bytes.
-l 7 to -l 13: -e makes zero difference. -l 14, the difference is 84 bytes
-l 18: now the impact of -e has reached 0.1 percent
-l 20: here is where the estimation fails so much that it is bigger than -l 19. Indeed, bigger than -l 17.  Two percent bigger than with -e.

Next (mild) surprise coming up: give it a few extra apodization functions to choose between. More opportunities --> even a good guessing algorithm would more often fail to hit spot-on, eh?
No!  With -A "subdivide_tukey(5);blackman", it is closer to -e, if anything:
Still the range where it is spot on to -e is -l 7 to -l 13.  But the neighbouring -l6 and -l14 are closer (10 and 4 bytes size difference), so is -l18
Still -l 20 is the point where the guesstimation algorithm fails to make it smaller than one order lower, but at least it beats -l 17.

Going the other way, down to -7: confirming it.

With higher sampling rate, we surely get a difference - like, for this particular album upsampled to 192 kHz, then
-8p improves a percent (!) over -8, but
-8e improves more than four percent over -8. Percent, not points.
(Bitrates are like 2824, compared to 2054 for 1/3 of the sampling rate - uncompressed size ratio is 3, for a perspective.)



Background info:
It is "known" that the -e switch can make a difference when the top frequency range is empty - and that is why it often happens on high resolution files, as they are sometimes sourced from normal resolution - and I was curious how much "empty spectrum" it would take for -e to kick in.
Those of you who are not familiar with what -e does: the reference encoder can choose between a number of ways to calculate the predictor - and would want to choose the one that leads to smaller file. Without -e, it makes an estimate of size per alternative, chooses the one that comes out best, and only that is compressed fully; with -e, it brute-forces through them all.  On CDDA material, the (gu)es(s)timate is so good that -e is very rarely worth it - if you have the patience to wait for -e, you could instead spend that time on more efficient ways to squeeze out bits, unless the material is quite peculiar.  On high resolution signals, there are more such "peculiar" material where the (gu)es(s)timation isn't that good.  Identifying such cases could improve upon the algorithm ... if one bothers.
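The difference described above (estimate, then fully encode only the winner, versus brute-forcing every candidate) can be sketched as a toy model. Here the candidates are FLAC's fixed polynomial predictors (orders 0 to 4) rather than the LPC orders and apodizations the real encoder juggles. The estimate is the sum of absolute residuals, and the "actual" size is an exact Rice-coded size with one parameter per block. All of this is a simplification chosen for illustration, not libFLAC's code. By construction the exhaustive choice can never be larger; the question in this thread is how often the estimate misses.

```python
import math

def fixed_residual(x, order):
    """Residual of a fixed polynomial predictor of the given order (0-4):
    repeated first differences, e.g. order 2 -> x[n] - 2x[n-1] + x[n-2]."""
    r = list(x)
    for _ in range(order):
        r = [b - a for a, b in zip(r, r[1:])]
    return r

def rice_bits(residual):
    """Exact size in bits with the best single Rice parameter (no partitioning)."""
    zigzag = [2 * v if v >= 0 else -2 * v - 1 for v in residual]
    return min(sum(1 + k + (u >> k) for u in zigzag) for k in range(31))

def actual_bits(x, order, bps=16):
    return order * bps + rice_bits(fixed_residual(x, order))  # warm-up samples stored verbatim

def estimated_cost(x, order):
    """Cheap proxy for the size: sum of absolute residuals."""
    return sum(abs(v) for v in fixed_residual(x, order))

def choose_by_estimate(x, orders=range(5)):
    best = min(orders, key=lambda o: estimated_cost(x, o))
    return best, actual_bits(x, best)         # only the winner is fully encoded

def choose_exhaustively(x, orders=range(5)):  # -e, in spirit
    return min(((o, actual_bits(x, o)) for o in orders), key=lambda t: t[1])

block = [round(8000 * math.sin(0.05 * n) + 300 * math.sin(1.3 * n)) for n in range(4096)]
est_order, est_bits = choose_by_estimate(block)
exh_order, exh_bits = choose_exhaustively(block)
print("estimate:", est_order, est_bits, "exhaustive:", exh_order, exh_bits)
```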
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-13 11:39:08
In my previous test (Reply #408) I also found that MP3 files which occupied about 70% of the spectrum (15-16kHz cutoff) showed no clear advantage when using -e, though zero improvement is quite amusing.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-13 22:00:53
8192 beats 16384 size-weighted in my tests on upsampling those 38 CDs, but it varies quite a lot.
[...]
192 kHz, let's look into that further: No -e here.
* Classical music benefits from larger blocksize -b16384, 12 albums to 2

But the classical corpus benefits from higher block sizes even when compressing the CDDA (no upsampling, no DSPing, just --lax).
-7b8192 --lax beats -8 by 10 albums to 4, and also in total size. And the difference is about twice the size as between -7 and -8
Indeed, -7b8192 wins by the "same 10 albums to 4" against -8b4608 and against plain -7.

Even -7b4608 beats -8, though narrowly - sure, everything here is narrow: the difference between -7b8192 (best among these) and plain -7 (worst) is only 0.07 percent. That is also much less than the 0.32 percent by which blocksize 16384 edged out 8192 on material upsampled to 192 kHz.

But the point is, this is material where bigger blocks than default already improves with CDDA - only we don't test that as much, it being out of subset. No surprise that this is precisely where bigger blocks start improving first.


(Oh, and: Even if -8b16384 did beat plain -7, you can forget about that block size for CDDA.)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-14 14:12:36
A surprising result: a case of
-e not making a single bit difference on an entire thirty-eight minutes album. Bit-identical .flac files created, not only same size.

But it is after sox resampling it to 64000 Hz, 24 bits (with dither) - it does not occur in the unaltered CD image file.  So something is going on in that resampling - but the encoder guessing spot-on for an entire album?

Oh this is weird. Impact of -e atop -8:

2.8 megabytes: savings by -8e instead of -8 on the CDDA corpus. That is .012 percentage points of the 23 GB WAVE uncompressed size. Not much!
1.7 megabytes: After having resampled to 44.1/24, this is what -e saves. FLAC size does however increase by 80 percent, so compression ratio worsens from 51 percent to 64 percent. Not really strange.
1.25 megabytes: Resampling instead to 64/24, this is what -e saves. Going 64 instead of 44.1 increases uncompressed size by 45 percent, but FLAC size increases by only 22 percent over 44.1/24: that is slightly better than the overall compression ratio of 53 percent on the 64/24.
Going 64/24 makes the encoder compress better in percentage terms, and the impact of -e much smaller than for CDDA: only .0025 percentage points of WAVE.
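The "percentage points of WAVE" figures can be reproduced from the stated sizes. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and that the 64/24 WAVE size is the 23 GB CDDA corpus scaled by 24/16 for bit depth and 64/44.1 for sample rate.

```python
MB, GB = 1e6, 1e9                      # decimal units assumed
CDDA_WAV = 23 * GB                     # "23 GB WAVE uncompressed size" of the CD corpus

def share_of_wav_pct(saved_bytes, wav_bytes):
    return saved_bytes / wav_bytes * 100

# 16/44.1 -> 24/64: x1.5 from bit depth, x64/44.1 from sample rate.
wav_64_24 = CDDA_WAV * 1.5 * 64_000 / 44_100

cdda_pct = share_of_wav_pct(2.8 * MB, CDDA_WAV)
pct_64 = share_of_wav_pct(1.25 * MB, wav_64_24)
print(f"CDDA: -e saves {cdda_pct:.4f} % of WAVE")
print(f"64/24: -e saves {pct_64:.4f} % of WAVE")
print(f"uncompressed growth 44.1 -> 64 kHz: {64_000 / 44_100 - 1:.1%}")
```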

Now, pay attention to the magnitude, there is no decimal point here. Resampling to 96/24, the impact of -e is instead:
187 megabytes. (Lower when I add the blackman though, but > 160 anyway.)

OK, it is known that the FLAC encoder starts picking systematically wrong models when the signal looks like this. And the 187 grows overproportionally from then: Doubling sampling rate to 192 increases the -e impact to 691, blah blah blah.
But 64 is pretty much midway between 44.1 and 96. (Geometric mean is 65.066 kHz.)
And going to 64/24 instead of 44.1/24 makes it do better. Somewhere in that extra half octave from 64 to 96, something weird happens ...

(... is it the resampling algorithm? Should I try 87 and 89?)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-14 17:45:40
Don't compare 16/44 -> 24/44 and 16/44 -> 24/64 because the former does not involve any resampling. Try something like 16/44 -> 24/48 vs 16/44 -> 24/64, and apply the same amount of negative gain for clipping prevention.

44.1k -> 48k is a small enough ratio, yet it still retains the characteristics of clean resampling like a steep lowpass and intersample peaks. Try omitting the blackman as well, as subdivide_tukey should be much more effective for such a small resampling ratio.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-14 18:34:57
I think sox resamples if you specifically ask it to do so in a DSP chain with volume change and everything?

Try omitting the blackman as well
There is no blackman in that post except "(Lower when I add the blackman though, but > 160 anyway.)"
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-14 18:49:36
Quote
I think sox resamples if you specifically ask it to do so in a DSP chain with volume change and everything?
No. You can run SoX with -V followed by your commands, and SoX will list the effects used. For example, SoX may add dither automatically depending on the input commands; with -V you can confirm whether that is the case.

Resampling (with the rate effect) means changing the sample rate without changing playback speed and pitch. So if the sample rate remains unchanged, there is no resampling.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-14 19:45:59
sox -V in.wav out-rate-48000.wav gain -4 rate 48000
Code: [Select]
sox INFO sox: effects chain: input        44100Hz  2 channels
sox INFO sox: effects chain: gain         44100Hz  2 channels
sox INFO sox: effects chain: rate         48000Hz  2 channels
sox INFO sox: effects chain: dither       48000Hz  2 channels
sox INFO sox: effects chain: output       48000Hz  2 channels
sox -V in.wav out-rate-44100.wav gain -4 rate 44100
Code: [Select]
sox INFO rate: has no effect in this configuration
sox INFO sox: effects chain: input        44100Hz  2 channels
sox INFO sox: effects chain: gain         44100Hz  2 channels
sox INFO sox: effects chain: dither       44100Hz  2 channels
sox INFO sox: effects chain: output       44100Hz  2 channels
You can see that when there is no resampling (no rate in the chain), there is no lowpass filtering and no intersample peaks either.
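Checking whether rate actually ended up in the chain can be automated by parsing the -V output. A minimal Python sketch, assuming the "effects chain:" line layout shown in the logs above (it may differ between SoX versions); the helper names are hypothetical:

```python
import re

# Matches lines like "sox INFO sox: effects chain: rate   48000Hz  2 channels"
CHAIN_LINE = re.compile(r"effects chain:\s+(\w+)\s+(\d+)Hz")

def effects_chain(log_text):
    """Return [(effect, rate_hz), ...] in chain order from `sox -V` messages."""
    return [(m.group(1), int(m.group(2))) for m in CHAIN_LINE.finditer(log_text)]

def resamples(log_text):
    """True if a rate effect actually made it into the effects chain."""
    return any(name == "rate" for name, _ in effects_chain(log_text))
```

As far as I know SoX prints these messages on stderr, so in practice one would feed it the `.stderr` of a `subprocess.run([...], capture_output=True, text=True)` call. The "rate: has no effect" line is deliberately ignored; only the effects chain itself counts.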
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-15 17:58:09
Thanks for the correction. I ran it again with resampling to 48/24; just to pick a number between 64 and 96, I also resampled to 79/24. And the 192/24 figures are in.
Megabytes saved going from -8 to -8e:

0.326 for 48 kHz. Ridiculously low! So the resampling does something?
1.25 for 64 kHz. Low too. For comparison, -8p saves 10.3 megabytes (still only 0.04 percent!) over -8.
11.9 for 79 kHz.
187 for 96 kHz. That is around 0.6 percent (not percentage points), far more than the 53 megabyte impact of -p.
691 for 192 kHz. That is nearly 2 percent. For comparison, -8p saves half a percent.


Even though 48 kHz made for a very low -e impact, I did not find any more bit-identical files. I did not scan them all at higher block sizes though, which was necessary to get equal files for the 64 kHz Sodom, Springsteen and Davis files.

So I went back to the album that revealed the curious equality even for -8, producing a file bit-identical to -8e: the Psycroptic. I tested more sample rates. -8 gave files bit-identical to -8e for
61, 64 and 67 kHz. The former was even bit-identical when I lowered the block size to 2048, giving it twice as many chances to guess one wrong.
By doubling blocksize, thereby reducing the chances of guessing wrong, I also got -8 to agree with -8e on the 55 kHz and 58 kHz files. I managed the same for 70 and 73 and 76, but only when maxing out to block size -b65535.

So guessing spot-on what -e would find is still a bit peculiar, even though at 64 kHz I managed it for 4 of 38 albums ... ?


Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-16 05:53:36
Considering the most common sample rates, the smallest upsampling factor after 44.1 -> 48 is 48 -> 88.2 (1.8375x), and there -e is going to work. 44.1 -> 79 is only about 1.79x. Technically, even 44100 -> 44101 is a valid upsampling, but it is not something people would do for any practical purpose.
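The ratio ranking above can be checked by enumerating every upward conversion between common rates. A small Python sketch; the list of "common" rates is my assumption:

```python
from fractions import Fraction
from itertools import combinations

# Common consumer sample rates in Hz (an assumed list; extend as needed)
COMMON_RATES = [44100, 48000, 88200, 96000, 176400, 192000]

def upsampling_ratios(rates):
    """Sorted (ratio, src, dst) for every upward conversion between the rates."""
    return sorted((Fraction(dst, src), src, dst)
                  for src, dst in combinations(sorted(rates), 2))

if __name__ == "__main__":
    for ratio, src, dst in upsampling_ratios(COMMON_RATES)[:5]:
        print(f"{src / 1000:g} -> {dst / 1000:g} kHz: {float(ratio):.4f}x")
    print(f"44.1 -> 79 kHz: {79000 / 44100:.4f}x")
```

Exact fractions keep equal ratios equal: 44.1 -> 48, 88.2 -> 96 and 176.4 -> 192 all share 160/147 (about 1.0884x), and the next step up is 147/80 = 1.8375x for 48 -> 88.2 and 96 -> 176.4.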

This also matches my findings on transcoding 44.1k MP3s, as the overall lowpass cutoff is rarely below 15 kHz; I specifically set a floor of 128 kbps CBR and 124 kbps VBR. I did not encode these MP3 files myself but collected them, so they were likely made by many different encoders.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-16 12:38:56
So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.
16/48 -> 24/88.2 using these settings, without dither:

-8p
The initial conversion included transcoding from APE, FLAC and WavPack, upsampling and a -2 dB gain, and was therefore not timed, but previous tests showed -8p taking 1.6x as long as -8e.
32320498436 bytes

Below are transcoded from pre-upsampled flac files:

-8
Total encoding time: 9:23.046, 244.67x realtime
32347544572 bytes

-8e
Total encoding time: 33:32.563, 68.45x realtime
32324646092 bytes

-8b8192
Total encoding time: 9:11.843, 249.64x realtime
32303228485 bytes

-8b8192 -A "subdivide_tukey(3);welch;hann;flattop"[*1]
Total encoding time: 11:57.859, 191.90x realtime
32240773767 bytes

88 files (9971799931 bytes) are smaller with -8e[*2]; the same files at -8p used 9976078278 bytes.
218 files (22344420158 bytes) are smaller with -8p[*3]; the same files at -8e used 22352846161 bytes.

45 files (2264363917 bytes) in [*1] are bigger than the smaller of [*2] and [*3] for the same file.

The idea is that when the resampling ratio approaches 2x, other windows start to work as well, so -e matters more for the ability to use a lower block size for hardware compatibility. Also, -e needs a clean, low ultrasonic noise floor, so "analog upsampling", like playing 44.1k content through a DAC and recapturing the output with an ADC at 96k, is not going to work. For an example, download the 24/96 AMPT files below:
https://archimago.blogspot.com/search?q=ampt
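Putting the byte counts above on a common scale makes the point easier to see: on this corpus a larger block size alone saves more than either -e or -p. A small Python sketch using only the posted sizes; the dictionary keys are just labels:

```python
# File sizes in bytes as posted above (16/48 -> 24/88.2 corpus)
SIZES = {
    "-8": 32347544572,
    "-8p": 32320498436,
    "-8e": 32324646092,
    "-8b8192": 32303228485,
    "-8b8192 -A subdivide_tukey(3);welch;hann;flattop": 32240773767,
}

def savings_vs(baseline, sizes):
    """Bytes and percent saved by each setting relative to `baseline`."""
    base = sizes[baseline]
    return {name: (base - size, 100.0 * (base - size) / base)
            for name, size in sizes.items() if name != baseline}

if __name__ == "__main__":
    ranked = sorted(savings_vs("-8", SIZES).items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (saved, pct) in ranked:
        print(f"{name:50s} {saved:>13,d} bytes  {pct:.3f}%")
```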
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-18 10:27:37
@ktf, you may know whether the following is expected, or whether it might be a symptom of something; as you can see, it didn't happen with 1.3.4 on the sine sizes listed below. The official flac-1.4.3\Win32 and flac-1.4.3\Win64 builds produce different files (yeah, I know the FAQ has long tried to explain to users that this is normal, at least for builds compiled for and run on different architectures, but still: it appears to be new with 1.4).

Sines, like the ones in the numbers below, are no big deal by themselves, yes we know; but the question is whether this gives a trace leading to something overlooked or otherwise easy to improve.
And since it might be related to the model selection algorithm, could it explain why the retune you tested and proposed here did not work as well when HA users tested it on Windows?

Because 1.3.4 and 1.4.x use different apodization functions for -8, I included runs with -8 -A "tukey(5e-1)", with and without -e and -p too. You can see that it doesn't fare very well on 1.4.3. But at the bottom of the table, it looks like 1.3.4 does select the tukey (rather than partial/punchout) when brute-forced, while 1.4 does not.
Also it seems that 1.4 benefits more from -e; is that likely to be the double precision?  And if so: should one try double precision also elsewhere?

Code: [Select]
960044	4567Hz-undither.wav
824535 (for -0 and -0e, no matter what version)
374679 4567Hz-undither-5_143-Win64.flac
374539 4567Hz-undither-8_but_tukeyonly_143-Win64.flac
373646 4567Hz-undither-5_143-Win32.flac
373506 4567Hz-undither-8_but_tukeyonly_143-Win32.flac

326994 4567Hz-undither-5_134-Win64.flac
BIT-ID 4567Hz-undither-5_134-Win32.flac

275581 4567Hz-undither-5e_134-Win32.flac
BIT-ID 4567Hz-undither-5e_134-Win64.flac

273276 4567Hz-undither-8_143-Win64.flac
BIT-ID 4567Hz-undither-8_143-Win32.flac

262519 4567Hz-undither-5e_143-Win64.flac
261820 4567Hz-undither-5e_143-Win32.flac

261015 4567Hz-undither-8_134-Win32.flac
BIT-ID 4567Hz-undither-8_134-Win64.flac
BIT-ID 4567Hz-undither-8_but_tukeyonly_134-Win32.flac
BIT-ID 4567Hz-undither-8_but_tukeyonly_134-Win64.flac

254241 4567Hz-undither-8e_but_tukeyonly_143-Win64.flac
253416 4567Hz-undither-8e_but_tukeyonly_143-Win32.flac
220693 4567Hz-undither-8e_but_tukeyonly_134-Win32.flac
BIT-ID 4567Hz-undither-8e_but_tukeyonly_134-Win64.flac
BIT-ID 4567Hz-undither-8e_134-Win32.flac
BIT-ID 4567Hz-undither-8e_134-Win64.flac

179285 4567Hz-undither-8p_143-Win32.flac
BIT-ID 4567Hz-undither-8p_143-Win64.flac

174746 4567Hz-undither-8p_134-Win32.flac
BIT-ID 4567Hz-undither-8p_134-Win64.flac

170643 4567Hz-undither-8pe_but_tukeyonly_143-Win64.flac
170118 4567Hz-undither-8pe_but_tukeyonly_143-Win32.flac

150973 4567Hz-undither-8pe_but_tukeyonly_134-Win64.flac
BIT-ID 4567Hz-undither-8pe_but_tukeyonly_134-Win32.flac
BIT-ID 4567Hz-undither-8pe_134-Win64.flac
BIT-ID 4567Hz-undither-8pe_134-Win32.flac

147183 4567Hz-undither-8e_143-Win64.flac
145394 4567Hz-undither-8e_143-Win32.flac
124466 4567Hz-undither-8pe_143-Win64.flac
124455 4567Hz-undither-8pe_143-Win32.flac

Again, sines by themselves are not the most interesting, so overnight I ran a couple of versions of the 38 albums in my signature. 1.4.3 only! 

* CDDA with -5, -5e, -8, -8e:  Win32 vs Win64 size differences are virtually nothing - averaging to less than a part per million.  (Yes the "High Tech Choruses" with test signals, including a sine, got some more bytes difference.)
* Same files upsampled to 96/24, -8 and -8e: Win32 vs Win64 size differences averaging 103 (-8) vs 56 (-8e) parts per million, with the 32-bit build making the bigger files on most albums; still not much. Worst were some metal albums at -8: 657 (Gojira), 456 (Emperor), 440 (Psycroptic). (The impact of -e exceeded 2 percent with Judas Priest.)
* 96/24 upsamples, testing -8l16 and -8el16: smaller Win32/Win64 differences. But -8l16 averages a percent bigger files than -8 (max: +9.8%, and pretty much unaffected by Win32/Win64 build), so ... avoid for now.
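As I read it, the "parts per million" figures are relative size differences between the two builds' outputs. A minimal sketch of that arithmetic on a few of the 1.4.3 sine encodes listed above, where the gap is far larger than on music; taking the Win32 file as the reference is my choice:

```python
# (Win64, Win32) file sizes in bytes for some flac 1.4.3 sine encodes listed above
SINE_143 = {
    "-5": (374679, 373646),
    "-5e": (262519, 261820),
    "-8e": (147183, 145394),
    "-8pe": (124466, 124455),
}

def ppm_difference(a, b):
    """How much larger a is than b, in parts per million of b."""
    return (a - b) / b * 1_000_000

if __name__ == "__main__":
    for setting, (win64, win32) in SINE_143.items():
        print(f"{setting:5s} Win64 vs Win32: {ppm_difference(win64, win32):+.0f} ppm")
```

For comparison with the album averages: the -8e sine differs by more than 12,000 ppm (over 1 percent), against 56 ppm averaged over the 96/24 albums.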

The takeaway from the latter is that as long as a lot of "high resolution" material is fake or for other reasons has very little content in the inaudible octave(s), the encoder isn't yet ready for a higher "-l" in the presets. Even if, say, a Linux build on a different architecture doesn't show such results; indeed that would be a sign there is something to tweak for stability.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-21 23:17:47
Meanwhile, cf. Reply #420: The -8 vs -8e impact - is it the resampling or the 24 bit or something else?

Apparently, -8 gets closer to -8e for 24-bit files than for 16-bit files. That is, the savings of -8e over -8 are larger for 16 bits, even though the files are smaller.

I did two tests. One with this 38 CD corpus, and then, as a sanity check that it isn't an artefact of whatever sox command I have used, a few 44.1/24 downloads, with similar results.


Test 1: The 38.
2.8 megabytes -e impact on the original CDDA. Out of approx 12 GB.
2.4 megabytes -e impact after resampling to 44.0/16 and no dither.
24 bits: 1.7 megabytes -e impact after resampling CDDA to 44.0/24 and no dither.  File sizes 23 GB, bitrate 1362.
24 bits: 1.7 megabytes -e impact when resampling the 96/24 upsample back to 44.1, keeping the 24 bits.  Note, this keeps a volume adjustment -v0.666 that was in the 96/24.
Then:
3.8 megabytes -e impact when ffmpeg'ing it to 16 bits (no dither).  Keeping the volume at -v0.666.  File sizes 11 GB, bitrate 678.

Running it through the sox resampler itself isn't enough to get the close results I saw for CDDA -> 48/24 upsample or CDDA -> 64/24 upsample. There is something about how much below Nyquist the low-pass sits.


Test 2: 44.1/24 lossless purchases (well some free), twenty-two kinda-arbitrarily-selected (no classical music though), one track from each, 2hrs34min in total - not much, but still enough to get a sanity check.

File sizes, -8 and then -8e savings, with and without dither:
24 bit orig.:
1 835 828 150, -8e saves 122 759
no dither:
1 025 018 227, -8e saves 199 641
16 bit dither rectangle
1 025 689 601, -8e saves 206 382
16 bit dither triangle:
1 026 407 484, -8e saves 179 793
16 bit dither "improved_e_weighted" (that's noise shaping)
1 049 624 168, -8e saves 310 590. Still only 0.03 percent impact.
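In relative terms the -8e savings are tiny, but several times larger for the 16-bit versions than for the 24-bit original. A short Python sketch of that arithmetic, using only the posted numbers (sizes are at -8):

```python
# (label, size at -8 in bytes, bytes saved by -8e) for the 22 tracks above
RESULTS = [
    ("24 bit original", 1_835_828_150, 122_759),
    ("16 bit, no dither", 1_025_018_227, 199_641),
    ("16 bit, rectangle dither", 1_025_689_601, 206_382),
    ("16 bit, triangle dither", 1_026_407_484, 179_793),
    ("16 bit, improved_e_weighted", 1_049_624_168, 310_590),
]

def percent_saved(size, saved):
    """Savings as a percentage of the -8 file size."""
    return 100.0 * saved / size

if __name__ == "__main__":
    for label, size, saved in RESULTS:
        print(f"{label:28s} -8e saves {percent_saved(size, saved):.4f}%")
```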
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-22 05:30:17
I can't tell from the perspective of -e, but as noted in the readme file of my oldsCool software: if you only increase the bit depth of a file with a volume adjustment, the resulting file will have some missing bits, but with resampling the resulting file should have no missing bits. I am talking about typical audio files, not test signals like tones. Crafted signals can trigger false positives in LSB Trim, and I have no plan to fix that.

Different types of dither should be more relevant when the resampling ratios are higher (e.g. 2x) because with low resampling ratios, the spectral noise floor of dither will be covered up by the audio content.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-26 23:38:56
-e and the model selection algorithm. Evidence from upsamples (and ... higher block sizes)

TL;DR: some evidence that without -e, the reference encoder selects too high a prediction order (well, what I actually counted: the highest prediction order, 12, chosen too often) when
* the sampling rate is high and top octave is empty
OR
* the block size increases
or both.

Background:
We have by now a fairly good idea of when the model selection algorithm makes the brute-force search ("-e") superfluous: CDDA, at least for music that has a fair amount of treble. (Exceptions where -e helps quite a lot: the dark ambient/synth/drone examples posted by this user https://hydrogenaud.io/index.php?action=profile;area=showposts;u=125366 )
... and: at default block size.
For high sample rates, -e still has something to offer. This is quite likely due to very little content in the top octave(s): the upsampled material has next to nothing there (it is dithered ...). That is, I am testing "fake high resolution", but lots of high resolution files are ... not too real.

Question: when -e makes a big difference, does the algorithm select too low or too high a prediction order? (= how many past samples are used to predict the next one; the presets cap it at the 12 most recent samples.) IOW: when -e is called, does the order increase or decrease?
(A better question would be: how well does this correlate with actual improvement.  If anyone wants data ... I kept .ana files.)

Lazy test: how often is the maximum order of 12 chosen?  (Seems most subframes are in orders 9 to 12 ... depending on everything.)

Here is a table for CDDA. No upsampling no nothing.
Commands given: at @bennetng's suggestion, I added another apodization function. So this is -8 -A "subdivide_tukey(3);blackman" -b <blocksize> with and without -e:
(Edit: since I initially messed up something, I might have run some with the Win64 build and not the Win32 build. But although they make different files sometimes, the difference should be very small ... I hope.)
album       -b4096     -b4096 -e  -b8192     -b8192 -e  -b16384    -b16384 -e
BachHrpsch  25%        26%        35%        31%        46%        37%
BachOrgan   20%        22%        34%        36%        46%        48%
Bruckner    22%        24%        39%        41%        54%        55%
Cage        22%        24%        31%        33%        41%        38%
Handel      15%        17%        26%        26%        36%        33%
Mahler      46%        49%        66%        67%        77%        77%
Mozart+Ch   18%        19%        30%        32%        43%        43%
Valen       20%        23%        36%        38%        52%        53%
VaughWill   15%        16%        25%        27%        37%        39%
VivGuitar   20%        20%        42%        38%        60%        53%
Flute       14%        15%        24%        26%        35%        35%
GouldPiano  29%        31%        45%        47%        57%        58%
PhChoruses  27%        29%        43%        45%        56%        55%
ZabaHarp    60%        63%        77%        77%        83%        83%
Becker,J    36%        38%        51%        50%        61%        57%
Colosseum   3%         4%         9%         11%        20%        21%
Emperor     31%        33%        45%        43%        56%        51%
Gojira      22%        25%        36%        37%        48%        46%
ITWoods     54%        56%        68%        68%        74%        72%
JudasPriest 63%        65%        74%        74%        79%        78%
Kiss        31%        34%        47%        47%        58%        55%
Laibach     30%        29%        47%        42%        61%        51%
Psycroptic  5%         6%         12%        12%        22%        19%
Sodom       8%         9%         16%        17%        27%        26%
Amos,T      40%        41%        53%        51%        59%        54%
BeastieBoys 30%        30%        40%        36%        48%        39%
Bjrnstd     27%        28%        42%        40%        54%        47%
Brown,J     18%        18%        26%        23%        33%        26%
Davis,M     5%         6%         11%        12%        17%        18%
Johansson   19%        21%        34%        35%        47%        46%
Kraftwerk   31%        28%        40%        31%        45%        29%
Rudess,J    6%         10%        13%        18%        22%        26%
Sopor       33%        33%        46%        43%        56%        49%
Springsteen 20%        20%        31%        30%        41%        38%
TheThe      21%        21%        34%        30%        47%        38%
VanHelden   37%        35%        49%        40%        60%        44%
Waits       49%        50%        65%        64%        72%        68%
Wovenhand   25%        22%        35%        27%        48%        31%
Total       27%        28%        39%        39%        50%        46%
At -b4096, -e increases the number of order12 subframes for 32 out of 38 albums. As block size increases, the share of subframes with order = 12 does increase - but now -e starts pulling the share downwards instead.

Upsampling to 96/24, what happens?
album       -b4096     -b4096 -e  -b8192     -b8192 -e  -b16384    -b16384 -e
BachHrpsch  73.1%      63.0%      74.2%      63.9%      75.0%      64.6%
BachOrgan   81.5%      62.4%      82.7%      63.1%      83.2%      63.3%
Bruckner    65.3%      61.6%      71.9%      67.1%      76.1%      70.5%
Cage        70.8%      66.7%      76.1%      70.6%      79.0%      72.0%
Handel      75.5%      71.7%      79.9%      74.9%      82.9%      77.3%
Mahler      65.8%      57.5%      71.5%      61.6%      75.0%      63.9%
Mozart+Ch   73.6%      65.2%      77.5%      68.3%      79.8%      69.7%
Valen       64.3%      55.7%      67.3%      57.0%      69.5%      57.6%
VaughWill   62.5%      60.3%      70.0%      67.2%      75.2%      71.3%
VivGuitar   58.8%      57.8%      65.9%      64.3%      71.1%      67.9%
Flute       81.7%      71.6%      83.3%      72.5%      84.6%      72.8%
GouldPiano  87.2%      83.3%      88.9%      84.4%      89.7%      84.7%
PhChoruses  64.4%      53.1%      69.5%      56.6%      72.7%      58.4%
ZabaHarp    92.4%      85.5%      93.3%      86.2%      93.7%      86.3%
Becker,J    63.8%      43.2%      64.5%      43.5%      61.8%      42.3%
Colosseum   70.0%      55.9%      71.0%      56.8%      71.9%      57.3%
Emperor     65.1%      38.8%      65.1%      39.3%      63.6%      39.3%
Gojira      67.4%      47.7%      66.9%      47.7%      65.4%      46.4%
ITWoods     68.0%      44.2%      70.6%      45.6%      70.9%      45.9%
JudasPriest 61.0%      31.4%      62.1%      32.1%      62.7%      33.3%
Kiss        71.6%      52.8%      73.1%      53.8%      73.9%      54.4%
Laibach     67.0%      51.4%      69.0%      51.7%      70.9%      50.8%
Psycroptic  75.5%      57.5%      75.7%      57.9%      74.9%      57.7%
Sodom       69.2%      51.3%      69.7%      52.1%      69.9%      52.9%
Amos,T      69.3%      51.3%      70.9%      52.3%      71.2%      52.5%
BeastieBoys 66.0%      54.3%      68.0%      54.1%      69.2%      52.7%
Bjrnstd     66.5%      57.4%      69.1%      57.9%      71.8%      58.3%
Brown,J     48.9%      38.5%      51.7%      39.5%      53.8%      39.6%
Davis,M     47.6%      40.9%      49.2%      41.6%      50.8%      41.9%
Johansson   87.9%      81.2%      89.5%      81.9%      90.3%      81.9%
Kraftwerk   67.1%      53.2%      69.3%      52.5%      70.8%      50.3%
Rudess,J    74.0%      62.2%      76.7%      63.6%      78.6%      63.9%
Sopor       71.9%      56.8%      74.4%      57.8%      75.9%      57.9%
Springsteen 76.6%      73.9%      81.8%      78.3%      85.1%      80.9%
TheThe      69.6%      55.8%      71.8%      56.5%      73.2%      56.3%
VanHelden   60.4%      42.6%      62.5%      43.3%      63.6%      43.4%
Waits       85.5%      75.6%      87.4%      76.6%      88.6%      76.8%
Wovenhand   63.7%      44.7%      66.3%      45.0%      67.0%      43.2%
Total       69.1%      57.1%      71.8%      58.7%      73.3%      59.4%
Invoking -e reduces the number of order12 subframes for each and every file in all three blocksizes.

Size impact ... oh that depends on whether we are counting percents or percentage points.
Those for which -e saves the most percentage points - around 1 - are Emperor and Judas Priest; those are the ones where the difference in the table is largest, in percentage points.
Percentwise, Sodom and then Gojira are the ones where -e saves the most. Actually, if I go to 192/24, -e will lower the order of 3 out of 4 order-12 subframes in the Sodom album, and save seven percent of the space.

I could paste the 192/24 table, but ... are there numbers enough by now? Actually the percentage of order12 subframes is slightly lower here, with a few of them reduced quite a lot.


For those who enjoy doing statistics, I can grep and sed down .ana files to share-able size.
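For anyone who wants to redo the counting from .ana files: a minimal Python sketch that tallies subframe types and the share of subframes that are LPC at order 12. It assumes each subframe is one line of tab-separated key=value fields such as `subframe=0 ... type=LPC order=12 ...`; that layout is my assumption about what flac -a writes, so check it against your own files. The share here is taken over all subframes, which may not be the denominator used in the tables above.

```python
import re
from collections import Counter

TYPE_RE = re.compile(r"\btype=(\w+)")      # \b skips "residual_type="
ORDER_RE = re.compile(r"\border=(\d+)")    # \b skips "partition_order="

def subframe_stats(lines, max_order=12):
    """Tally subframe types and the share of subframes that are LPC at max_order."""
    types = Counter()
    at_max = 0
    for line in lines:
        if "subframe=" not in line:
            continue                        # frame headers, coefficient lines, etc.
        type_match = TYPE_RE.search(line)
        if type_match is None:
            continue
        kind = type_match.group(1)
        types[kind] += 1
        order_match = ORDER_RE.search(line)
        if kind == "LPC" and order_match and int(order_match.group(1)) == max_order:
            at_max += 1
    total = sum(types.values())
    return types, (at_max / total if total else 0.0)
```

Usage: open an .ana file and pass the file object (it iterates line by line) as `lines`. Counting FIXED subframes this way is equivalent to the `grep -c FIXED` used later in the thread, restricted to subframe lines.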
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-28 11:42:57
Here is a counterexample: a hi-res file with a full spectrum that showed a big improvement with -e. It comes from a Game Boy emulator using 16/96 output, the highest format supported by the emulator I used.
Parameters:
-8eb256 -A "subdivide_tukey(3/1)"

Then I used my hardware Game Boy and recorded its analog output at 192k. -e no longer has an advantage, and the file is not compressible with 7z due to analog noise and distortion.
Parameters:
-8pb1152 -A "subdivide_tukey(3/1)"

It would be somewhat relevant to @ktf 's Pokemon Gold entry in the hi-res corpus.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-28 13:48:50
-7 chooses larger orders than -7e. Indeed, -7 -l3 outcompresses -7.

5523203 bytes for -7r0
5497218 bytes for -7r0 -l3
4749741 bytes for -7e

So this is one where -r15 --lax improves quite a lot, but then -e improves even more, like 20 percent.

3919072 bytes for --lax -7r15
3754354 bytes for --lax -7r15 -l3
3151637 bytes for --lax -7r15 -e

Is it due to "non- -e" selecting a bad windowing function? Not really; it seems to boil down to prediction order. With a single -A at the end, i.e. no choice of windowing, the same three settings give:
3926169
3787128
3153724

Everything done with standard block size (-b2048 would improve), and with padding.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-28 14:17:39
Reply #426 made me think that -e can find some lower optimal -l values if the encoder cannot use more than 12.

And yes, windowing would be quite ineffective with these kinds of chiptunes with a lot of discontinuities (staircase waveform).
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-28 14:44:44
Quote
Reply #426 made me think that -e can find some lower optimal -l values if the encoder cannot use more than 12.
Not sure of the latter. Might - for all that I know - be that the encoder shouldn't use 12, and certainly not anything above.

Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-28 15:39:23
Rectangle is the simplest, fully open window, yet -7r15 -A rectangle is smaller than both -7r15 and -7r15 -A "".
https://download.ni.com/evaluation/pxi/Understanding%20FFTs%20and%20Windowing.pdf
Quote
Even if you use no window, the signal is convolved with a rectangular-shaped window of uniform height, by the nature of taking a snapshot in time of the input signal and working with a discrete signal. This convolution has a sine function characteristic spectrum. For this reason, no window is often called the uniform or rectangular window because there is still a windowing effect.
So I am talking about the concept of rectangular window = no window, and the windowing actually worsened the results.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-28 17:25:06
Not sure what you mean here. For one thing it isn't related to order.
Also, AFAIUnderstand: flac.exe defaults to the tukey(5e-1) window. That isn't "choice between tukey and rectangle", it is that single tukey window with no choice. So -A "" does not use "nothing" in any sense of the word, it tells the encoder to use the default. Which isn't rectangle.
(4881364 bytes) -7 is the same as -7 -A subdivide_tukey(2)
(4886263 bytes) -7 -A "" is the same as -7 -A tukey(5e-1)
(4872125 bytes) -7 -A rectangle is different from -7 -A ""
-A rectangle gives smaller files? OK, so this just happens to be one of the signals where downweighting the edges of the window worsens prediction. That happens every now and then, but not so often that rectangle is generally a good window.

Edit: I omitted the -r15, which is a different part of the encoding.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-28 18:44:48
I am talking about windowing from a broad perspective, because I learnt about different windows before studying flac. So before I looked into flac, I learnt that no windowing = rectangle.

For example, why did I make a long post in Reply #409? Because you talked about tukey(1)=hann and tukey(0)=rectangle. While that is correct, the information is from the perspective of flac's documents. In general, when people talk about windowing and say hann or rectangle, they don't consciously relate them to tukey.

Something like this when I talked about "no window":
https://www.audiosciencereview.com/forum/index.php?threads/smsl-su-9-balanced-dac-review.16150/post-536177
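For readers coming from the DAW side, the tukey(0)=rectangle and tukey(1)=hann relation is easy to see in code. A minimal Python sketch using a textbook tapered-cosine definition over a continuous position; flac's own window code may round the taper lengths differently, so values need not match it sample for sample:

```python
import math

def tukey(n_samples, alpha):
    """Tukey (tapered cosine) window: flat middle, cosine tapers over alpha of the length.

    alpha <= 0 gives a rectangle, alpha >= 1 gives a Hann window."""
    if n_samples == 1:
        return [1.0]
    alpha = min(alpha, 1.0)
    window = []
    for n in range(n_samples):
        x = n / (n_samples - 1)                # position in [0, 1]
        if alpha <= 0 or alpha / 2 <= x <= 1 - alpha / 2:
            window.append(1.0)                 # flat top
        elif x < alpha / 2:
            window.append(0.5 * (1 - math.cos(2 * math.pi * x / alpha)))        # rising taper
        else:
            window.append(0.5 * (1 - math.cos(2 * math.pi * (1 - x) / alpha)))  # falling taper
    return window

def hann(n_samples):
    """Hann window over n_samples points (n_samples >= 2)."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / (n_samples - 1))) for n in range(n_samples)]
```

With alpha = 0.5, which is what flac's tukey(5e-1) default refers to, the outer quarter of the block at each end is tapered and the middle half is left untouched.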
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-28 19:29:08
Audacity

Audition

Reaper

Tukey is often unavailable in DAWs for signal processing and visualization, even though it is very important in flac.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-29 11:00:53
Compression isn't the same as FFT-based signal processing. There is a connection in that autocorrelations can be recovered from the power spectrum by the inverse Fourier transform.

There is a history here. Nearly twenty years ago, Josh Coalson tested various windowing functions, also with the help of Hydrogenaudio, but I don't think that thread is still around after a couple of forum updates? Anyway, it turned out that the Tukey window, with a middle-of-the-road parameter between rectangle and Hann, performed better than most.
I have no idea why - but my own ignorance is surely part of it. Sure we can handwave that rectangle isn't supposed to be good, and in such a short window Tukey gives you more data than Hann, but we cannot merely point at this or that window solving this spectral leakage problem in a different application. Lag-windowing for compression is a different purpose.

Fast forward halfway to the present: the release of FLAC 1.3.1. ktf came up with the partial_tukey and punchout_tukey functions, which taper away parts of the signal when computing the autocorrelations, but asymmetrically (like deleting the middle). This works because one part of the signal can destroy the estimate for the rest, but not vice versa: for large residuals, the impact of a slightly worse prediction is small (as opposed to what least squares optimization presumes!).
But these partial/punchout sections are run in succession and tested, because we don't know beforehand which one actually is best. Maybe we could have, by doing some signal processing - but "conventional" processing would also be time-consuming, so it is not at all given it is worth it. And maybe even less with 1.4's subdivide_tukey functions, which recycle calculations and are thus faster.


But bottom line is, your typical signal processing has one rationale for this and that windowing function; compression has a different one, and FLAC has selected its default from the proof of the pudding.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-29 12:53:02
I was not talking about which window is "good" or "bad", because you mentioned:
Quote
Is it due to "non- -e" selecting a bad windowing function?
I have read flac's docs and know that flac defaults to tukey-based windows, which are often very effective. So your mention of "bad windowing" gave me a hint and made me think: since these kinds of chiptunes contain a lot of discontinuities, what if I just use no window?

Also, why did I mention blackman when doing resampling? Because resamplers often use windowed sinc in their resampling filters, and blackman is a reasonably suitable window for flac's rather short sample history.

Even without using FFT and only use direct convolution, different windows still have different frequency responses.

So by no means was I thinking that rectangle is good. What I was thinking when reading your reply in the quote box above was "how about trying rectangle if I can't even get the very effective subdivide_tukey to work?" If I had originally thought that rectangle was a candidate, I would already have used it in Reply #427 when I first attached the files.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-29 14:15:08
I tried Pokemon Gold (DMG-AAUJ-JPN) and rendered the emulator output at 16/96, trimming each BGM to no longer than 1 minute to avoid too many loops. "rect" in a file name means rectangle and "sub3-1" means subdivide_tukey(3/1); otherwise defaults are used.

In a few cases rectangle alone can beat the defaults, but highly unlikely in higher presets which use the much more complicated subdivide_tukey, so it makes more sense to use both subdivide_tukey and rectangle together.

All settings are subset.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-29 14:50:14
In the last table of yours: to isolate the impact of block size etc., you should maybe allow a higher -r when you double the block size. That way the partitions can keep their size. As far as I understand, the partition size must exceed the number of warm-up samples, so with -7/-8 that would be at least 16 samples, i.e. -r8 at 4096. To allow each chunk of 16 samples its own Rice parameter, it would be fairer to do
-b4096 -r8
-b2048 -r7
-b1024 -r6
-b512 -r5
-b256 -r4
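The list above follows from FLAC splitting a block into 2^r equal Rice partitions, so keeping partitions at 16 samples means r = log2(blocksize / 16). A small Python sketch (the helper name is hypothetical; keep in mind that -r sets the maximum partition order, and the encoder may still choose fewer partitions):

```python
def partition_order_for(blocksize, samples_per_partition=16):
    """Rice partition order that keeps each partition at samples_per_partition samples.

    A block is split into 2**order equal partitions, so this only works when
    blocksize / samples_per_partition is a power of two."""
    n_partitions, remainder = divmod(blocksize, samples_per_partition)
    if remainder or n_partitions < 1 or n_partitions & (n_partitions - 1):
        raise ValueError("blocksize / samples_per_partition must be a power of two")
    return n_partitions.bit_length() - 1

if __name__ == "__main__":
    for b in (4096, 2048, 1024, 512, 256):
        print(f"-b{b} -r{partition_order_for(b)}")
```

Block sizes like 1152 or 4608 do not split into power-of-two counts of 16-sample partitions, which is why the helper refuses them rather than rounding.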


Note that the impact of windowing should be expected to diminish as window size grows. The "old" large literature on windowing in LPC seems to be focused on applications to speech encoding - lossy, and with window sizes in the low hundreds. FLAC seems to have started out using 1152 sample blocks, but for presets >2 we are talking 4096.
(In this thread sometimes more ... or sometimes less.)

There is actually a point I forgot about regarding "non- -e" selecting a bad windowing function:
Fixed predictors. If you give -7e -A "", then it will not only choose between
order12 as computed by Tukey windowing
order11 as computed by Tukey windowing, etc. downwards
and if you give -7e -A rectangle it is not only the corresponding order12 vs order11 vs ...

Because those are also compared to the fixed predictors (and at worst VERBATIM). And if you select a "bad" function, then it will more often be outperformed by fixed predictors, than if you use a good one. Also if you select a function that sometimes is very good, sometimes very bad - then the latter effect will be offset by resorting to FIXED for the subframes where the function performs "very bad". Maybe even more so when using -e.

Maybe. Let's test on that emulator.flac
Blocksize 2048, since that is better than 4096. -7fb2048 -r15 --lax  -A "<windowfunction>", piped to flac -a  | grep -c FIXED. Looks like -e increases the usage of FIXED.
In the table below, the left number is FIXED hits without -e, the next is with. Sorted by their average, and you see rectangle is an extreme:

Code: [Select]
1900 2017 triangle
1900 2016 bartlett
1940 1975 flattop
1893 2019 tukey
1890 2019 welch
1900 2006 blackman
1893 2007 bartlett_hann
1891 2009 hann
1897 2003 kaiser_bessel
1892 2006 connes
1901 1986 blackman_harris_4term_92db
1899 1987 nuttall
1869 2001 hamming
1741 1963 rectangle
Also the +222 for rectangle is by far the largest difference. Next is 132. Difference is smallest for flattop.
Gut feeling says it is reasonable that those two stand out, but I dunno how to interpret this.

Oh: no VERBATIM here at all.
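The "+222" and "next is 132" figures are per-window differences between the two columns. A tiny Python sketch that recomputes and ranks them from the table:

```python
# FIXED subframe hits as (without -e, with -e) per -A window, from the table above
FIXED_HITS = {
    "triangle": (1900, 2017), "bartlett": (1900, 2016), "flattop": (1940, 1975),
    "tukey": (1893, 2019), "welch": (1890, 2019), "blackman": (1900, 2006),
    "bartlett_hann": (1893, 2007), "hann": (1891, 2009),
    "kaiser_bessel": (1897, 2003), "connes": (1892, 2006),
    "blackman_harris_4term_92db": (1901, 1986), "nuttall": (1899, 1987),
    "hamming": (1869, 2001), "rectangle": (1741, 1963),
}

def e_effect(hits):
    """(increase in FIXED hits when adding -e, window), largest increase first."""
    return sorted(((with_e - without_e, window)
                   for window, (without_e, with_e) in hits.items()), reverse=True)

if __name__ == "__main__":
    for delta, window in e_effect(FIXED_HITS):
        print(f"{window:28s} +{delta}")
```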
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-29 15:33:25
You suggest no more than 5 sets of encoding settings for my Pokemon Gold file and I will try them. To avoid confusion, you need to list complete commands, including presets (e.g. -8), -l, -r, -e, -b and -A.

Don't suggest --lax as I am more about practical uses.

To avoid further misunderstanding: it is not a challenge. I just want to see what you would try based on the results I provided. I think a 1.5-hour corpus should be a better representation of the Game Boy emulator's synth.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-29 15:48:31
Quote
You suggest no more than 5 sets
Not really. I suggest that if one wants to isolate the impact of windowing function, yet change the block size, then it would be more informative to restrict the partitioning for the smaller block sizes.

In the table above, what is it that makes -8r8 -b256 outperform -8r8?
Is it that the block size is smaller (256 vs 4096)? Or is it that the partition size is allowed to be smaller? Say, in a subframe compressed with a third-order predictor, a chunk of 16 samples could all of a sudden be allowed four distinct Rice parameters.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-29 16:41:01
Then you suggest some subset combinations of -l and -b to override -8r8 and I will encode the file and post the results.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-30 11:56:47
Try -8b256 -l1 on emulator.flac

I also tried it on the 1.5 hour Pokemon Gold file.
750156374 bytes
This is smaller than -8r8 -e, the second-to-last one in Reply #437.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-30 12:58:53
Quote
Try -8b256 -l1 on emulator.flac

Try -el0 ...

3349148 with -8b256 -r8 -l1 --no-padding emulator.flac
3221216 with an extra -e
3160490 with -8b1024 -r8 -l0 --no-padding emulator.flac -e. 3973069 without -e, that is quite a significant difference.
So without -e, it makes quite a few "wrong" choices among the fixed predictors. It turns out that it chooses fixed order 1 too often when it should have used fixed order 2.

-b1024 makes for 2770 subframes.
458 of these are CONSTANT, all of value 0.
Since I have ruled out LPC and none are VERBATIM, the rest are FIXED. The distribution over FIXED orders 0, 1, 2, 3, 4 is
9, 141, 2162, 0, 0 with -e
7, 1216, 1089, 0, 0 without -e.
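FLAC's fixed predictors of order 0 to 4 amount to repeated differencing of the signal: order 1 predicts each sample as the previous one, order 2 extrapolates a straight line, and so on. Without -e the encoder must guess one order per subframe from an estimate; with -e it, as I understand it, tries the orders and keeps the smallest result. The sketch below uses the sum of absolute residuals as the estimate. As far as I know libFLAC's guess is based on something similar, but this is a simplified stand-in, not its code. On a staircase (chiptune-like) signal, order 1 wins the estimate:

```python
def fixed_residual(x, order):
    """Residual of the fixed polynomial predictor of the given order (0..4).

    Each extra order takes one more finite difference, so the result is
    `order` samples shorter than x (those are the warm-up samples)."""
    r = list(x)
    for _ in range(order):
        r = [b - a for a, b in zip(r, r[1:])]
    return r

def guess_fixed_order(x, max_order=4):
    """Pick the order with the smallest sum of |residual| over a common range."""
    errors = {}
    for k in range(max_order + 1):
        r = fixed_residual(x, k)[max_order - k:]   # align all orders to the same samples
        errors[k] = sum(abs(v) for v in r)
    return min(errors, key=errors.get), errors

if __name__ == "__main__":
    staircase = [v for v in range(0, 50, 5) for _ in range(4)]
    print(guess_fixed_order(staircase))
```

Whether the order with the smallest summed error also gives the smallest Rice-coded subframe is exactly what -e checks; the counts above suggest it often does not for this material.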

Going from -8fb256 -r8 -l1 to -8b1024 -r8 -el0 saves much more than I could squeeze by going to -8b1024 -r8 -pel32 -A <quite a lot>: then I got it down to 3060547. Subset, yes.


Edit:
Obviously, the options I select will "undo everything that -8 invokes" (that's because -m is on by default ... BTW, -M is not that good here) - but when I start out with -8 and then "-8 & something" I do not bother to delete the "8" from the command-line at the end.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-30 13:36:11
Another way is -8b256 -l1 -A subdivide_tukey(3/1), which is still in the "3" class. Anyway, -e is cheap with -l1.

Game Boy can either do mono or hard-panned stereo, it cannot do gradual panning.

[edit]
3160490 with -8b1024 -r8 -l0 --no-padding emulator.flac -e.
OK, when doing this, the -A setting doesn't matter.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-30 16:08:16
Windowing with -A is done to calculate LPC predictor vectors, and with -l0 there are none such - there are just the hard-coded fixed vectors to choose between. And with -el0, that choice is done by calculating (nearly) all the way to the bottom and comparing size.
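A minimal sketch of that windowed-LPC path: window the block, compute its autocorrelation, solve for predictor coefficients with Levinson-Durbin. The window is a textbook Tukey window and there is no coefficient quantization, so this is not libFLAC's exact code; it only shows that the -A window enters solely through the LPC analysis, which -l0 never runs.

```python
import math


def tukey(n: int, alpha: float = 0.5) -> list[float]:
    """Textbook Tukey (tapered-cosine) window; alpha = 0 gives a rectangular window."""
    if n == 1:
        return [1.0]
    w = []
    for i in range(n):
        x = i / (n - 1)
        if alpha > 0 and x < alpha / 2:
            w.append(0.5 * (1 - math.cos(2 * math.pi * x / alpha)))
        elif alpha > 0 and x > 1 - alpha / 2:
            w.append(0.5 * (1 - math.cos(2 * math.pi * (1 - x) / alpha)))
        else:
            w.append(1.0)
    return w


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Predictor a (x[n] ~ sum a[j] * x[n-1-j]) and its residual error energy."""
    a: list[float] = []
    err = r[0]
    for i in range(order):
        if err <= 0:  # silent or perfectly predicted block
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1 - k * k
    return a + [0.0] * (order - len(a)), err


def lpc_for_block(samples: list[int], order: int, alpha: float = 0.5) -> list[float]:
    """Window, autocorrelate, solve. With -l0 nothing like this runs, so -A is moot."""
    w = tukey(len(samples), alpha)
    windowed = [s * wi for s, wi in zip(samples, w)]
    coeffs, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return coeffs
```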

Anyway, I think the interesting part of this is to isolate where the model selection algorithm could be improved - or, if the algorithm stands, where its deficiencies might be mitigated in other ways, for example by avoiding settings (like block sizes?!) where you "have to" use a lot of brute force.
We tested candidates for retuned presets: https://hydrogenaud.io/index.php/topic,123889.0.html .
Turns out, the model selection algorithm is apparently not ready to handle -l32 (https://hydrogenaud.io/index.php/topic,123889.25.html). So where else does it make a mess when you feed it other audio than the CDDA/normal-resolution signals the encoder has been fine-tuned for over the years?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-30 20:54:19
Just found another emulator with 192k support and more accurate emulation; for example, it reproduced the same big glitches at 0:06 and 0:12 that are on my hardware recording.

I used the fast --no-padding -eb2048 -r8 -l0 to encode, so just a bigger block size for the higher sample rate.
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-31 05:44:28
The same strategy works on Merzbow's Pulse Demon and Venereology as well: super-fast encodes and decent file sizes with -b512.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-31 08:44:12
Yeah, but we have been there before on Merzbow: (https://hydrogenaud.io/index.php/topic,123025.msg1018251.html#msg1018251)
... but for the Merzbow track, the magic is actually not in the -b, but in the -r. Try options
--lax  -r 15
That's right, no -8, no -p, no -e, no -b (edit: wrote that wrong!)
I got 51954953 bytes. Throwing in a -b 16383 reduces it slightly to 51923540 bytes.

The reason that -b512 looks so good is not that it is a good block size; it is that it leaves only a few samples per Rice partition within the subset's maximum partition order of 8. Relaxing that ...
It was of course 16384, not 16383.
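Why small partitions help, and why they eventually stop helping, in a minimal sketch: each partition gets its own best Rice parameter but pays for a parameter field. Assumptions: the 4-bit parameter field of FLAC's first residual coding method (parameters 0..14, 15 being the escape), equal-sized partitions (the first partition's warm-up shortening is ignored), no escape codes, and made-up residuals.

```python
def rice_bits(residuals: list[int], k: int) -> int:
    """Bits to Rice-code signed residuals with parameter k (no escape codes)."""
    total = 0
    for r in residuals:
        u = 2 * r if r >= 0 else -2 * r - 1  # fold signed -> unsigned
        total += (u >> k) + 1 + k
    return total


def partitioned_cost(residuals: list[int], order: int, param_bits: int = 4) -> int:
    """Split into 2**order equal partitions, each with its own best parameter.

    param_bits = 4 models the 4-bit parameter field (parameters 0..14; 15 = escape).
    """
    parts = 1 << order
    size, rem = divmod(len(residuals), parts)
    if rem:
        raise ValueError("residual count must be divisible by 2**order")
    max_k = (1 << param_bits) - 2
    total = 0
    for p in range(parts):
        chunk = residuals[p * size:(p + 1) * size]
        total += param_bits + min(rice_bits(chunk, k) for k in range(max_k + 1))
    return total


if __name__ == "__main__":
    # Hypothetical residuals: quiet first half, loud second half.
    res = [0] * 8 + [100] * 8
    for order in range(3):
        print(order, partitioned_cost(res, order))  # prints 0 140, then 1 88, then 2 96
```

Splitting once lets the quiet half use k = 0; splitting further only adds parameter overhead here.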
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-10-31 08:49:48
I read that, but I am talking about fast and decent compression without --lax, which is much more attractive.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-10-31 22:49:27
-e and the model selection algorithm. Evidence from upsamples (and ... higher block sizes)

Block sizes, then. Yes, and more -e as well. (And adding a blackman function, at @bennetng 's suggestion.)

TL;DR: for upsampled material,



A summary of compression ratios:
(https://i.imgur.com/p8O15eZ.png)
Only every other artist labeled by name. The bottom curves reveal that the settings don't matter that much, but for some albums it isn't insignificant in percents.
The apodization chosen for the "-A ..." is -A "subdivide_tukey(3);blackman", i.e. like -8 with an additional blackman. We shall see the impact of that extra function.

Some albums in particular:
Not a single subframe encoded as VERBATIM. Some CONSTANT subframes that are absolutely not 0, an artefact of the resampling.


The impact of block size:
First, to see if the resampling to 24-bit by itself makes a big impact, I also did a resample to a frequency not that far from 44.1: 64 kHz, 24 bits. (That is where the surprise of the -e came in, more about that later.)

(https://i.imgur.com/xz6uNG6.png)(https://i.imgur.com/h0uN9Ih.png)
It isn't the case that 44.1/16 -> 64/24 "changes everything": sure, the audio format makes a quantitative impact, and sure, the upsampling makes -b2048 worse and high block sizes less harmful, but the "shape of the graphs" is not too different.

Explanations and remarks:
.

More severe upsampling. Still sticking to percentage points, though that might not be the best metric.
Also I forgot the following, which I have sometimes suggested: when doubling block size, one could - to compare apples more closely to apples - try to maintain partition size by increasing the -r. So, -b4096 -r6 (as -8 does), -b8192 -r7, -b16384 -r8. I did not do that!
(https://i.imgur.com/O92HIgz.png)(https://i.imgur.com/cHePB8S.png)
.
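The partition-size bookkeeping behind that suggestion, as a tiny sketch: -r sets the maximum partition order, and the smallest partition is blocksize / 2^r samples, so each doubling of the block size needs one more -r to keep the same smallest partition (64 samples, as -8 has at -b4096 -r6). The function name is hypothetical.

```python
def rice_order_for(blocksize: int, samples_per_partition: int) -> int:
    """Partition order r with blocksize >> r == samples_per_partition."""
    ratio, rem = divmod(blocksize, samples_per_partition)
    if rem or ratio & (ratio - 1):
        raise ValueError("blocksize / samples_per_partition must be a power of two")
    return ratio.bit_length() - 1


for b in (4096, 8192, 16384):
    print(f"-b{b} -r{rice_order_for(b, 64)}")  # -b4096 -r6, -b8192 -r7, -b16384 -r8
```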

Impact of -e. Why bother?
First, confirming that -e makes a difference you don't see in CDDA.
Second: If the model selection does bad for a certain block size, then improving the algorithm could overturn everything above.

The 64 kHz upsample had an exceptionally small impact of -e (in some instances even producing bit-identical files), so it is not (only) the resampling procedure that causes the -e impact:
(https://i.imgur.com/vG8iHSj.png)(https://i.imgur.com/KBHvrYp.png)


More severe upsampling again. When upsampled an octave or two, graphs start to wiggle big time:
(https://i.imgur.com/fy5iXOp.png)
(https://i.imgur.com/zSknORv.png)
These are percentage points. "Triple-ish magnitude" in the 192 kHz chart is for the most bad-ass figures; for the Sodom album, -8e is seven percent smaller than -8. However with -b8192 -A "subdivide_tukey(3);blackman" it is down to slightly above five. And furthermore, -b8192 is better than -b4096 both with and without -e and the -e benefit is smaller.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-11-01 00:07:34
More block size testing. This time on much smaller material I obtained as 96/24 (well some percent were 88.2/24). Not saying the high resolution isn't "fake".

TL;DR:
Depends on corpus, especially if you are looking for a "winner" - but 8192 seems like a safer bet, always within plus/minus 0.4 percent of 4096.
Block size 16384 might perform outright badly, e.g. more than 1 percent worse than 4096 and 0.9 percent worse than 8192, and its benefits are predictably small: it halves the already low per-block overhead of 8192.
Impacts are not that different with -e or -p.
-p or -e best? For most files, -p makes for smaller files than -e, but the exceptions could have much bigger impact.

Settings:
Figures based on the latter. FLAC build used: https://hydrogenaud.io/index.php/topic,123176.msg1033168.html#msg1033168 with multi-threading.


Corpus and results:
* 7 first tracks merged to one file from Kimiko Ishizaka: Die Kunst der Fuge for solo piano, 96/24 free download from here. (https://kimikoishizaka.bandcamp.com/album/j-s-bach-the-art-of-the-fugue-kunst-der-fuge-bwv-1080)
 Block size 16384 saves 0.04% over 8192.
* 7 first tracks merged to one file from Nine Inch Nails: The Slip. 96/24 free download.
4096 best, 8192 costs 0.02%, 16384 costs 0.23%
* 7 first tracks merged to one file from Kayo Dot: Hubardo. 96/24 purchase from here (https://kayodot.bandcamp.com/album/hubardo), as used in https://hydrogenaud.io/index.php/topic,120158.msg1003288.html#msg1003288
8192 best, others cost 0.6 to 0.7 percent.
* EP merged to one file: The Tea Party Tx20. 96/24 purchase. As mentioned in the same thread.
@bennetng tells me it must be intentionally clipped for a rawer sound, and this is one where I know that 2048 is good. So 4096 beats the two others, by as much as 0.4 and 0.9 percent respectively.
And -e beats -p.

Tracks and the like:
* Nearly a minute in one file of Anal Trump: That Makes Me Smart!. 88.2/24, from https://analtrump.bandcamp.com/album/that-makes-me-smart
Grindcore, which also wants a smaller block size: the bigger ones cost 0.1 and nearly 0.4 percent, respectively.
And -e beats -p, making 0.3% smaller files.
* Track: Thy Catafalque: https://thycatafalque.bandcamp.com/track/erdgeist-2021
Melodious black metal, this is called. The only 16384 win outside the "classical" bracket, and very narrowly by 0.02%, and 4096 costs 0.15%
* 16 single tracks from https://www.discogs.com/release/25346929-Various-HDTracks-2022-Hi-Res-Sampler . Omitted tracks 2 and 9 which are normal sample rate.
. Tracks up to 8, plus 12 and 13: the "more classical" music, including modern classical. A higher block size is better, but the max impact is 0.1 percent over -b8192, which in turn saves nearly 0.3 percent over 4096.
. But: 14 (vocal music) and 18 (violin) narrowly prefer 8192.
. 10: Blues Company. Here 4096 wins by a "sizeable" margin: a quarter of a percent over 8192 and 0.74% over 16384.
. 17: Even "worse" is the piano jazz, the second-biggest impact in 4096's favour. 0.3 and 1.1 percent bigger.
. The remaining three: 8192 wins, but never more than by 0.2.


Counting it up, -b16384 "lands the most victories", but that is because there are so many classical/modern classical tracks here. Even so, -b8192 is about as good measured by unweighted average: better with the "lightest" -7-based settings AND with the heaviest -8p -A subdivide_tukey(5) etc. setting (but not without the -p).
For the heaviest setting tried, make the following comparison:
* Choose the block size that makes smallest for each file,
vs
* Choose always 4096 resp. always 8192, resp always 16384
The "always" make for 0.14% larger resp 0.03% larger resp. 0.04% larger in unweighted average, and it is a corpus that in unweighted average is likely imbalanced in favour of larger block sizes.
In comparison, -p saves 0.06%.
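The "best block size per file vs. always one block size" comparison above, as a small sketch. The input layout is an assumption (a dict of per-file sizes keyed by block size), and the numbers in the usage example are made up, only there to show the calculation. "Unweighted" means every file counts equally, regardless of length.

```python
def overhead_vs_oracle(sizes: dict[str, dict[int, int]]) -> dict[int, float]:
    """Unweighted mean % by which 'always block size b' exceeds each file's best size.

    sizes[file][blocksize] = encoded size in bytes; every file must have the same block sizes.
    """
    block_sizes = sorted(next(iter(sizes.values())))
    result = {}
    for b in block_sizes:
        excess = [100.0 * (per_file[b] / min(per_file.values()) - 1.0)
                  for per_file in sizes.values()]
        result[b] = sum(excess) / len(excess)
    return result


if __name__ == "__main__":
    toy = {  # made-up sizes, only to show the calculation
        "classical.flac": {4096: 1000, 8192: 990, 16384: 985},
        "grindcore.flac": {4096: 1000, 8192: 1010, 16384: 1040},
    }
    print(overhead_vs_oracle(toy))
```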

So ... for 96 kHz, block size 8192 could be something to consider, but even thinking of 16384 at that sampling rate is "for those so inclined" (as if compression improvements above -8 isn't already).
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-11-01 07:07:39
The higher the resampling ratio is, the smoother the resulting waveform is with more sine-looking patterns. To exploit the pattern a longer sample history lookup is required and -e reduced this requirement.

Plots (not preserved in this copy): 8k white noise; 8k white noise upsampled to 48k; the two plots overlaid.
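A toy demonstration of the smoothness argument, under my own assumptions: Gaussian white noise, 6x upsampling with a Hann-windowed sinc (not any particular resampler), and FLAC's fixed order-2 residual energy as a crude stand-in for predictability. For full-band white noise the order-2 residual carries about 6 times the signal energy (1 + 4 + 1); once the signal is band-limited to 1/6 of the new Nyquist, the same predictor leaves only a small fraction.

```python
import math
import random


def upsample(x: list[float], factor: int, half_taps: int = 16) -> list[float]:
    """Zero-stuff by `factor`, then low-pass at the original Nyquist frequency
    with a Hann-windowed sinc (an interpolating filter, so original samples survive)."""
    center = half_taps * factor
    h = []
    for m in range(-center, center + 1):
        t = m / factor  # offset in original-rate samples
        sinc = 1.0 if m == 0 else math.sin(math.pi * t) / (math.pi * t)
        h.append(sinc * (0.5 + 0.5 * math.cos(math.pi * m / (center + 1))))
    y = [0.0] * (len(x) * factor + 2 * center)
    for i, s in enumerate(x):  # only every factor-th stuffed sample is nonzero
        for j, hj in enumerate(h):
            y[i * factor + j] += s * hj
    return y[center:center + len(x) * factor]


def order2_residual_ratio(x: list[float]) -> float:
    """Energy of FLAC's fixed order-2 residual relative to the signal's energy."""
    res = [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]
    return sum(r * r for r in res) / sum(v * v for v in x[2:])


if __name__ == "__main__":
    random.seed(1)
    noise = [random.gauss(0.0, 1.0) for _ in range(300)]
    up = upsample(noise, 6)
    interior = up[len(up) // 4: 3 * len(up) // 4]  # skip filter edge effects
    print(round(order2_residual_ratio(noise), 2))     # around 6 for white noise
    print(round(order2_residual_ratio(interior), 3))  # far below 1
```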
Title: Re: FLAC v1.4.x Performance Tests
Post by: german87 on 2023-11-01 09:26:40
In general, the higher the oversampling factor, the better the sound quality, right?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-11-01 10:18:32
There are different resampling algorithms, but before talking about which is "good" or "bad": not all resamplers, nor all resampling settings, are designed to remove as much high-frequency content above the original sample rate's Nyquist frequency as possible. In such cases, flac's -e setting won't be very effective at reducing file size.

If people believe (hydrogenaud.io has TOS#8, so "believe" is not enough) that upsampling a CDDA file to something like 352.8 kHz before sending it to the DAC can improve perceived sound quality, they can do this with their own files on the fly using the playback software's DSP options instead of obtaining such files from content providers.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-11-01 13:38:13
The higher the resampling ratio is, the smoother the resulting waveform is with more sine-looking patterns. To exploit the pattern a longer sample history lookup is required and -e reduced this requirement.

This is where I get (mildly) surprised at what actually happens. Yes, "more sine-looking", but not "sine" in the sense that two past samples determine the entire thing. It gets closer to that when you take it to the extreme (way beyond the 6x in the waveform above), but still. (Also, we have seen that even with -e, reference FLAC doesn't predict sines with order 2, likely due to quantization to integers, so even if upsampling smoothed the signal to something sine-like, it wouldn't reduce the order all the way.)

When invoking -e (taking that as best shot for "best" predictor), I "often" see the following: When the model selection algorithm suggests 11, -e will select 12 for the CDDA original and 10 for the upsample. Hm?
If you could hypothetically choose the ten samples -2, -4, -6, ..., -20, you would likely get the interpolated samples predicted well too, but there is no provision for that in the FLAC format.
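For readers wondering how an order can be "suggested" without trying every order, here is a minimal sketch that guesses an LPC order from per-order prediction-error estimates (such as those Levinson-Durbin yields) plus a per-coefficient overhead. libFLAC makes a guess in this spirit, as far as I recall, but the Gaussian-style bits formula, its 0.5 scale factor and the overhead figure below are assumptions, not libFLAC's code. Because such an estimate never actually codes the residual, it can land an order or so away from what an exhaustive search (-e) finds.

```python
import math


def expected_bits_per_residual(error: float, n: int) -> float:
    """Rough bits/sample for residuals whose squared error sums to `error` over n samples
    (a Gaussian-style estimate; the 0.5 scale factor is an assumption)."""
    if error <= 0:
        return 0.0
    return max(0.0, 0.5 * math.log2(0.5 * error / n))


def guess_lpc_order(errors: list[float], n: int, bits_per_coeff: int) -> int:
    """errors[i] = prediction error energy at order i + 1 (e.g. from Levinson-Durbin).
    Pick the order minimising estimated residual bits plus coefficient overhead."""
    best_order, best_bits = 1, math.inf
    for i, err in enumerate(errors):
        order = i + 1
        bits = expected_bits_per_residual(err, n) * (n - order) + order * bits_per_coeff
        if bits < best_bits:
            best_order, best_bits = order, bits
    return best_order


if __name__ == "__main__":
    n = 4096
    # Hypothetical error curve that flattens out after order 2.
    print(guess_lpc_order([4096e6, 4096e4, 4096e4, 4096e4], n, bits_per_coeff=30))  # 2
```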

"Mildly" surprised only, I am used to getting surprises here. And we aren't comparing random apple to random apple, when one of them is well-tuned to its own home ground.
If we look at other codecs, going CDDA -> high resolution reveals that they were well tuned for CDDA. http://audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%206%20-%20hires.html : Monkey's (which uses big blocks) misbehaves for the highest resolutions, where WavPack isn't much good without -x. (Even -x1 improves so much one shouldn't wavpack hi-rez without it.)
Or maybe it is FLAC that is good because it doesn't spend bits modeling long-term patterns that aren't there? Even if we see that FLAC compression often can be improved quite a lot for those signals?
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-11-01 14:36:49
More sine-looking, but a complete sine is not necessary, or not even sine, just some curves. When the signal is bandlimited to a certain cutoff point, the samples cannot be bent to the opposite direction instantaneously because it will create a glitch which is not bandlimited. It must take several samples (depends on resampling ratio) to reach the opposite direction, and those "several samples" can be exploited even when a long term lookup is not possible.
Or maybe it is FLAC that is good because it doesn't spend bits modeling long-term patterns that aren't there? Even if we see that FLAC compression often can be improved quite a lot for those signals?
Maybe, especially when you see bloat with WavPack's -hh or APE's Extra High or Insane: they try too hard to find patterns, but it turns out there is nothing to exploit. I have observed similar phenomena elsewhere, for example in video codecs: the encoder can be set to analyze macroblocks of different sizes, and enabling every block size can result in bloat on certain content.
https://shopdelta.eu/h-265-video-coding-standard_l2_aid860.html

[edit]
Also flac allows a lot of user settings while many other codecs are preset-based. Other codecs either only provide hard-coded settings which cannot be tweaked, or the tweaks are not exposed to users.

Take WavPack as example, I would actually like to ask Bryant if it is possible to build a preset which is faster than -x4 to -x6 to specifically optimize for upsampled content without breaking backward compatibility.

flac is like "I can do a lot of things but you need to tell me how to do it right". Other codecs are like "trust me or don't use me".
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2023-11-01 16:30:26
More sine-looking, but a complete sine is not necessary, or not even sine, just some curves. When the signal is bandlimited to a certain cutoff point, the samples cannot be bent to the opposite direction instantaneously because it will create a glitch which is not bandlimited. It must take several samples (depends on resampling ratio) to reach the opposite direction, and those "several samples" can be exploited even when a long term lookup is not possible.

Yeah sure, and that explains why the in-betweens can be well compressed (and why the higher-res compression levels in the top graph in #450 are so much lower). But that is not precisely the same as to say that the optimal prediction length with twice as many samples is half the time. Sure you can predict the next few samples well, since you got the smoothness - but that is not the same as to predict one that is say 10/44100ths of a second away.

BTW, you got any idea how computationally costly it is to perform a simple signal analysis to check upper frequency? If that is where the model selection algorithm could be improved, I mean.


[edit]
Also flac allows a lot of user settings while many other codecs are preset-based. Other codecs either only provide hard-coded settings which cannot be tweaked, or the tweaks are not exposed to users.[/edit]
ktf's tests use the standard presets. But of course, those are tuned too, and retuned. But obviously not specifically for high resolution. Yet high-resolution material makes FLAC (1.4!) catch up well with heavier codecs, even though, as we have seen, quite significant improvements are often possible.

It is not the format. The quite unique thing is how the FLAC reference encoder allows a lot of user choice that other codec formats could very well support, had anyone bothered to implement it into the encoder.
ALAC? Not too well known, but CUETools' ALAC encoder lets you pick apodization functions among welch, hann, flattop and tukey, and also offers search efforts between flac.exe's default method and its "-e". I tested that as well (https://hydrogenaud.io/index.php/topic,123511.0.html) (corporate codecs should die, though!)

Monkey's certainly has a different philosophy, compressing to precisely the same bitstream as nineteen years ago. (There is a reservation for high resolution, actually (https://hydrogenaud.io/index.php/topic,123663.msg1022348.html#msg1022348).) It even stores its MD5 hash computed on the encoded data, not on the uncompressed audio.


Take WavPack as example, I would actually like to ask Bryant if it is possible to build a preset which is faster than -x4 to -x6 to specifically optimize for upsampled content without breaking backward compatibility.
The answer is yes, it is "possible" :-)
(Pending a successful design, you will have to use --threads .)
Title: Re: FLAC v1.4.x Performance Tests
Post by: bennetng on 2023-11-01 17:36:42
BTW, you got any idea how computationally costly it is to perform a simple signal analysis to check upper frequency? If that is where the model selection algorithm could be improved, I mean.
If the goal is to catch up with -e's performance without using -e, the analysis does not need to be very accurate. For example, you can generate a prime-number sine like the 4567 Hz one, but with some frequencies above 16 kHz, and the benefit of -e will diminish and be overtaken by -p. I have also already tested in the mp3 corpus that -e is not useful with a 16k cutoff, and you tried the 64k upsampled content too.

The usual spectrum analyzers in DAWs or visualization software often default to more than a thousand samples and can show the spectrum of a CD image within several seconds, and animated spectral analysis during playback is computationally trivial. For example, foobar's built-in spectrogram can be set to the lowest setting, 128, and one can still easily identify an 18 kHz cutoff on an mp3 file, more sensitively than -e can.
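A sketch of the kind of cheap pre-analysis being discussed; flac does not do this today, so everything here is hypothetical. It averages Hann-windowed 128-point power spectra (matching the small setting mentioned above) and reports the highest frequency whose power is within a threshold of the peak; the -60 dB default is an arbitrary assumption. A plain DFT keeps it self-contained; a real implementation would use an FFT.

```python
import cmath
import math


def highest_active_frequency(x: list[float], fs: float, n_fft: int = 128,
                             threshold_db: float = -60.0) -> float:
    """Average Hann-windowed power spectra over non-overlapping frames and return
    the highest bin frequency whose power is within threshold_db of the peak bin."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]  # periodic Hann
    n_bins = n_fft // 2 + 1
    power = [0.0] * n_bins
    for f in range(len(x) // n_fft):
        frame = [x[f * n_fft + i] * window[i] for i in range(n_fft)]
        for k in range(n_bins):
            s = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n_fft) for i in range(n_fft))
            power[k] += abs(s) ** 2
    peak = max(power)
    if peak == 0:
        return 0.0  # silence: nothing above any threshold
    limit = peak * 10 ** (threshold_db / 10)
    top = max(k for k in range(n_bins) if power[k] >= limit)
    return top * fs / n_fft
```

A signal band-limited well below Nyquist (an upsample, or a lowpassed mp3 decode) would then report a low "active" frequency from just a few frames.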
Title: Re: FLAC v1.4.x Performance Tests
Post by: saratoga on 2023-11-04 17:42:20
In general, the higher the oversampling factor, the better the sound quality, right?

The oversampling ratio on a typical DAC is 64-1024x, so whatever ratio is built into the file (1-2x) is basically irrelevant compared to that, except insofar as it complicates efficiently compressing the audio.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Kraeved on 2024-02-24 23:43:37
Dear @Wombat, where can I get your high-speed FLAC 1.4.3 build, suitable for a processor with the following features?

(https://i4.imageban.ru/out/2024/02/25/ef921b2d6df14d10ceb8beb722f16200.png)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2024-02-25 00:58:43
I once tried to make something faster for my J5005, but GCC-optimized compiles did roughly nothing. Since your T7250 is even slower and older, I am sorry.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2024-02-25 12:09:04
@Kraeved: I don't know what the Rarewares compiles (https://www.rarewares.org/lossless.php) require, but https://hydrogenaud.io/index.php/topic,123025.msg1029768.html#msg1029768 indicates that AVX is key to improvements over the official build.
Why don't you run a comparison? I don't think anyone else has posted figures on a (*looks up*) 2007 CPU.
Title: Re: FLAC v1.4.x Performance Tests
Post by: john33 on 2024-02-25 12:41:00
From memory, the non-AVX2 64 bit compiles are SSE-3 as the maximum requirement.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Kraeved on 2024-02-25 13:13:55
@Wombat, I mean, do you have a 1.4.3 GCC build without the AVX requirement? So far I use the x64 one from Rarewares.

Code: [Select]
FLAC v1.4.3 Release bundle 2023-06-23
Latest Release, flac.exe, metaflac.exe. win32 compile is XP friendly.

win32-nonXP Download (555kB)
win32 Download (689kB)
x64 Download (537kB)                <<<<<<<<<<<<<<<<<<<<--- This one.
x64-AVX2 Download (995kB)
Title: Re: FLAC v1.4.x Performance Tests
Post by: Wombat on 2024-02-25 15:58:26
I guess you mean a generic GCC 13.2.0 compile then. I attached one. Runs fine on my j5005.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Kraeved on 2024-02-28 14:39:49
Thank you, @Wombat. I compressed and re-compressed several WAV and FLAC albums but, indeed, noticed no difference in speed. Also, I would like to thank all those developers who contribute to future-proof solutions that run smoothly across the globe for decades, even on vintage computers, rather than just rolling out some binaries to show off.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2024-02-28 14:56:30
even on vintage computers.
flac for MS-DOS: https://hydrogenaud.io/index.php/topic,123374.0.html

Title: Re: FLAC v1.4.x Performance Tests
Post by: Kraeved on 2024-03-02 06:26:13
Those of you who have been aware of the development of FLAC all these years, please tell me if there are any significant changes between versions 1.3.4 and 1.4.3? I ask because the amazing FSLAC (https://hydrogenaud.io/index.php/topic,122390.0.html) lossy encoder by @C.R.Helmrich that works like LossyWAV is still based on version 1.3.4, and I'm worried.
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2024-03-02 07:59:00
1.4.x improves compression, especially on high resolution, but also on enough CD material to make the difference you see in the y-axis:
http://www.audiograaf.nl/losslesstest/revision%205/Average%20of%20all%20CDDA%20sources.pdf
http://www.audiograaf.nl/losslesstest/revision%206/Average%20of%20CDDA%20sources.pdf
The differences in time taken are due to a new computer (Intel CPU this round).
Title: Re: FLAC v1.4.x Performance Tests
Post by: Kraeved on 2024-03-12 22:58:46
How is it that SoX creates a file smaller than FLAC -8 and CUETools.Flake -8?
Title: Re: FLAC v1.4.x Performance Tests
Post by: Porcus on 2024-03-12 23:19:20
Padding (for future tags). Reference flac spends 8196 bytes on that by default. CUETools.Flake spends 4096.
You can of course reclaim the space, but then the entire file will have to be rewritten (which isn't much ...) if you add any tags later.

Actually, compare the -sox and the -flac8 in a text editor. You will see they are identical except at the beginning.
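To see where those bytes go without a text editor, here is a minimal metadata-block walker. The header layout (after the "fLaC" marker, each block starts with a 1-bit last-block flag, a 7-bit block type and a 24-bit big-endian length) follows the FLAC format specification. Reading 8196 as an 8192-byte PADDING payload plus its 4-byte block header is my interpretation, and the function names are hypothetical.

```python
BLOCK_TYPES = {0: "STREAMINFO", 1: "PADDING", 2: "APPLICATION", 3: "SEEKTABLE",
               4: "VORBIS_COMMENT", 5: "CUESHEET", 6: "PICTURE"}


def metadata_blocks(data: bytes):
    """Yield (type_name, payload_length) for each metadata block of a FLAC stream."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while True:
        header = data[pos]
        is_last = bool(header & 0x80)  # top bit: last metadata block
        block_type = header & 0x7F     # low 7 bits: block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")  # 24-bit payload length
        yield BLOCK_TYPES.get(block_type, f"RESERVED({block_type})"), length
        pos += 4 + length
        if is_last:
            break


def padding_bytes_on_disk(data: bytes) -> int:
    """Bytes spent on PADDING blocks, counting each 4-byte block header."""
    return sum(4 + n for name, n in metadata_blocks(data) if name == "PADDING")
```

Usage: read a file with `open(path, "rb").read()` and print `list(metadata_blocks(data))`; for the flac -8 and SoX outputs, the difference in size should show up in the PADDING entry.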