FLAC v1.4.x Performance Tests



Re: FLAC v1.4.x Performance Tests

Reply #300 – 2023-05-16 16:54:44

The manual's exposition is not so clear though:

-e, --exhaustive-model-search
Do exhaustive model search (expensive!)

OK, so a switch called "--exhaustive-model-search" does an exhaustive model search (which is expensive), but the manual does not specify which aspect of the model it exhausts. Maybe the easiest way to understand what it brute-forces (with maybe slightly less than a fifty percent chance that @ktf will have to correct me) is to check what is brute-forced by the other switches (-p and -r), and then add the knowledge that blocking (the -b) is done before the LPC modelling even starts, so it is not part of the "model search", and isn't optimized at all in current flac.exe.

To a novice reader it might even be a bit confusing that precision can be set exactly with -q, and partitioning not only exactly but within a range with -r, while the LPC order (history taken into account!) can only be set as a maximum. I'm not saying that allowing a construction like -l 10,12 would be good for an end-user for anything but explaining something an end-user doesn't need to deal with (but it could be fun for testing).

As for error ... the error used to select the model is not the size of the encode; that would amount to brute-forcing. The size is what you want to obtain, but computing it for every candidate is not what you want to do; you want something quicker. Hence the question whether there is some better way to do it that is still quick enough. There is some theoretical support for a logarithmic size measure, and since integer base-2 logs can be obtained by bit-shifting, it should be possible ... but "theoretical support" does not mean that it improves much on actual data over a method that has been tweaked to the level where it by and large works quite well.
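To make the distinction between "what you want to obtain" and "what you want to do" concrete, here is a small Python sketch. It is an illustration only and is not claimed to match flac's internals: it uses simple fixed (repeated-difference) predictors as a stand-in for LPC, a made-up test signal, and the sum of absolute residuals as the cheap error measure. It picks a predictor order once by that cheap proxy and once by actually computing the Rice-coded size for every candidate.

```python
# Illustration only: NOT flac's actual code. It contrasts a cheap error proxy
# with brute-force size computation when picking a fixed-predictor order.
import math

MAX_ORDER = 4  # fixed predictors of order 0..4 (repeated differencing)


def fixed_residuals(samples, order):
    """Residuals of the order-n difference predictor, aligned to sample MAX_ORDER."""
    res = list(samples)
    for _ in range(order):
        res = [b - a for a, b in zip(res, res[1:])]
    return res[MAX_ORDER - order:]


def rice_bits(residuals, k):
    """Exact bit count for Rice-coding the residuals with parameter k."""
    total = 0
    for r in residuals:
        u = 2 * r if r >= 0 else -2 * r - 1  # zigzag map to unsigned
        total += (u >> k) + 1 + k            # unary quotient + stop bit + k low bits
    return total


def exact_size(residuals):
    """Brute force: try every Rice parameter and keep the smallest size."""
    return min(rice_bits(residuals, k) for k in range(15))


def proxy_error(residuals):
    """Cheap proxy: sum of absolute residuals, no coding involved."""
    return sum(abs(r) for r in residuals)


# Hypothetical test signal: a sine plus small deterministic "noise".
samples = [round(3000 * math.sin(i / 9.0)) + (i * 7919) % 13 - 6 for i in range(4096)]

orders = range(MAX_ORDER + 1)
sizes = {n: exact_size(fixed_residuals(samples, n)) for n in orders}
proxies = {n: proxy_error(fixed_residuals(samples, n)) for n in orders}

order_by_size = min(orders, key=sizes.get)
order_by_proxy = min(orders, key=proxies.get)
print("order by exact size:", order_by_size, "bits:", sizes[order_by_size])
print("order by proxy:     ", order_by_proxy, "bits:", sizes[order_by_proxy])
```

The brute-force pick can never be larger than the proxy pick; the open question in the post is how often, and by how much, a better proxy would close the gap on real audio.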

Quote from: ktf on 2023-05-16 15:02:14

order of 12 (this is default for compression level 12)

Level -8. (And -7.)

Re: FLAC v1.4.x Performance Tests

Reply #301 – 2023-06-21 13:27:58

Quote from: Porcus on 2022-10-31 23:17:35

-0 does not pick the fastest block size! Test done on CDDA with official 1.4.1 x64.

Tested again with last Sunday's build (https://hydrogenaud.io/index.php/topic,123176.msg1029000.html#msg1029000), which has some speedups committed.

The time penalty for -b1152 is still there, but no longer as pronounced (was: >6 percent over -b3072, now 2 percent). That also goes for the time penalty on -b4096 (down from +7.5% to +4%).
Still, block sizes in the 3000s were the fastest, followed by the 2000s (although with a catch in 'Global time' for -b2048), followed by the 1000s, followed by 4096 and 4608. Different CPU used, an Intel this time.

Timings according to the aforementioned timer64.exe follow. Figures are not comparable to the ones in the above link (different fileset):

-b        'Process'   'Global time'
-b1024    626         667
-b1152    621         662
-b1536    626         666
-b2048    619         672
-b2304    616         653
-b2560    619         660
-b3072    609         645
-b3456    607         642
-b3584    611         649
-b4096    634         673
-b4608    628         662

Now sorted by block size, not by speed. Here I run the Win32 version. Timings are medians over five runs, after an initial run (which would probably be affected by whatever ran before) was discarded. Each time covers three -0fb<XXXX> encodes of the 38 CDs in my signature, encoded twice from FLAC and once from WAV, on an SSD.
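For readers who want to check the quoted percentages, the 'Process' column can be turned into a penalty relative to -b3072 with a few lines of Python. This is only a sketch: the numbers are copied from the table, the baseline choice follows the post, and the function name is made up for illustration.

```python
# 'Process' times copied from the table above (units as reported by timer64.exe).
process = {
    1024: 626, 1152: 621, 1536: 626, 2048: 619, 2304: 616, 2560: 619,
    3072: 609, 3456: 607, 3584: 611, 4096: 634, 4608: 628,
}

baseline = process[3072]  # the post compares against -b3072


def penalty_percent(block_size):
    """Extra 'Process' time relative to -b3072, in percent."""
    return 100.0 * (process[block_size] - baseline) / baseline


for b in sorted(process):
    print(f"-b{b}: {penalty_percent(b):+.1f}%")
```

Rounded to whole percent, this gives the 2 percent for -b1152 and 4 percent for -b4096 mentioned above.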

Re: FLAC v1.4.x Performance Tests

Reply #302 – 2023-06-23 16:16:04

Just tried some flac 1.4.3 x64 builds from the release thread. Wombat's GCC build with asm optimizations and john33's AVX2 build have the best overall encoding speed for my i3-12100. Tried -8p with CDDA, also tried -8e and -8p with 24/48.

Re: FLAC v1.4.3 (Release)

Reply #303 – 2023-06-24 12:21:22

Quote from: Hacker3X on 2023-06-24 08:57:40

Are AVX2 compiles actually faster?

I've tried both of them on one CD, and after multiple retries I've got these results:

non-avx:
Kernel Time  =  0.656 =   5%
User Time    = 10.187 =  86%
Process Time = 10.843 =  92%    Virtual Memory  = 15 MB
Global Time  = 11.745 = 100%    Physical Memory = 19 MB

avx2:
Kernel Time  =  0.718 =   6%
User Time    = 10.062 =  85%
Process Time = 10.781 =  91%    Virtual Memory  = 15 MB
Global Time  = 11.735 = 100%    Physical Memory = 16 MB

CPU is Ryzen 5 3600, 16 GB RAM. Files were on SSD drive.

It's almost exactly the same, if you look at global time. I've decided to keep the non-AVX one; the exe file is smaller.
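To put a number on "almost exactly the same", the two timer outputs can be parsed and compared. A rough Python sketch follows; the regular expression is just one way to pull out the figures, and the times are presumably in seconds.

```python
import re

# Output lines copied from the post above.
NON_AVX = """Kernel Time =    0.656 = 5%
User Time = 10.187 =   86%
Process Time = 10.843 =   92% Virtual Memory =    15 MB
Global Time = 11.745 = 100% Physical Memory =    19 MB"""

AVX2 = """Kernel Time =    0.718 = 6%
User Time = 10.062 =   85%
Process Time = 10.781 =   91% Virtual Memory =    15 MB
Global Time = 11.735 = 100% Physical Memory =    16 MB"""


def parse_times(text):
    """Return {'Kernel': t, 'User': t, 'Process': t, 'Global': t} (presumably seconds)."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+) Time\s*=\s*([\d.]+)", text)}


base, avx = parse_times(NON_AVX), parse_times(AVX2)
for name in ("Process", "Global"):
    diff = 100.0 * (base[name] - avx[name]) / base[name]
    print(f"{name}: AVX2 is {diff:.2f}% faster")
```

Both differences come out well under one percent, which on a single CD is likely within run-to-run noise.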

Re: FLAC v1.4.x Performance Tests

Reply #304 – 2023-06-24 15:14:39

Ryzen 5900x -8 -p -V CDDA
My GCC 13.1.0 x64-v3 ~110x, no-asm ~130x, Clang 16.0.5 ~108x
John33 x64 ~97x, AVX2 ~108x

Re: FLAC v1.4.x Performance Tests

Reply #305 – 2023-06-24 16:57:43

i5-1135G7, x64-AVX2 is about 24% faster than the x64 build (both from RareWares).

Re: FLAC v1.4.x Performance Tests

Reply #306 – 2023-06-24 17:10:54

Here are some multithreaded x64 benchmarks using foobar2000 2.0 x64, tested on RAM drive, i3-12100, 16GB RAM, Win10.

Transcoding 29 CDDA flac files with unknown encoding settings (7h 6m 50s, 3.84GB) to new flac files using -8p

john33                  422.54x
john33 AVX2             462.62x
NetRanger GCC           437.19x
NetRanger clang         435.33x
Wombat GCC with asm     474.53x
Wombat GCC no asm       427.39x
Wombat clang            436.26x
Xiph                    451.40x

Transcoding 33 24/96 flac files with unknown encoding settings (2h 47m 28s, 2.85GB) to new flac files using -8p

john33                   86.57x
john33 AVX2              93.45x
NetRanger GCC            87.63x
NetRanger clang          87.32x
Wombat GCC with asm      92.07x
Wombat GCC no asm        72.92x
Wombat clang             88.56x
Xiph                     88.76x

...and -8e

john33                   99.20x
john33 AVX2             105.12x
NetRanger GCC            99.68x
NetRanger clang          98.99x
Wombat GCC with asm     106.31x
Wombat GCC no asm        84.69x
Wombat clang            100.16x
Xiph                    102.15x
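For perspective, the "x" figures can be converted into wall-clock time. The sketch below assumes the figure means audio duration divided by elapsed time (so 100x means one hour of audio takes 36 seconds); that reading is an assumption, and the helper names are made up for illustration.

```python
def to_seconds(hours, minutes, seconds):
    """Duration given as h/m/s, in seconds."""
    return hours * 3600 + minutes * 60 + seconds


def wall_time(audio_seconds, speed_factor):
    """Elapsed time implied by a speed of `speed_factor` times realtime."""
    return audio_seconds / speed_factor


cdda = to_seconds(7, 6, 50)     # the 29 CDDA files
hires = to_seconds(2, 47, 28)   # the 33 24/96 files

print(f"CDDA  -8p, 474.53x: {wall_time(cdda, 474.53):.1f} s")
print(f"24/96 -8p,  93.45x: {wall_time(hires, 93.45):.1f} s")
print(f"24/96 -8e, 105.12x: {wall_time(hires, 105.12):.1f} s")
```

Under that assumption, the whole CDDA set at about 475x is done in under a minute, so differences of a few percent between builds amount to a second or two per batch.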

Re: FLAC v1.4.x Performance Tests

Reply #307 – 2023-06-24 17:32:58

Thanks bennetng. Interestingly, the Clang compiles here still lose ground against GCC when used multithreaded.

Re: FLAC v1.4.x Performance Tests

Reply #308 – 2023-06-24 22:50:40

What does "multithreaded" mean here? Running one cmd start (or PowerShell Start-Job) for each of the 29 or 33 files and recording when it is done with them all?

Re: FLAC v1.4.x Performance Tests

Reply #309 – 2023-06-24 23:10:12

When I use foobar multithreaded, the compiles can differ more in performance than, for example, a single instance with CUETools may suggest.

Re: FLAC v1.4.x Performance Tests

Reply #310 – 2023-06-25 04:37:59

Quote from: Porcus on 2023-06-24 22:50:40

What does "multithreaded" mean here? Running one cmd start (or PowerShell Start-Job) for each of the 29 or 33 files and recording when it is done with them all?

As I mentioned, I used foobar2000 2.0 x64, so the benchmark results are from foobar2000 2.0 x64's console output. That means the decoding was also performed by foobar2000 2.0 x64; only the encoding was performed by flac.exe.
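For comparison, the "one process per file" approach asked about in the quote could look roughly like the Python sketch below. It is not how foobar2000 works internally (there, foobar2000 decodes and hands the audio to flac.exe); it simply launches several flac processes in parallel and times the whole batch. The function names, the worker count and the output directory are assumptions for illustration; the flac switches used (-8, -p, -s, -f, -o) are standard ones.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_cmd(src, out_dir, flac="flac", preset="-8", extra=("-p",)):
    """Command line for one encode: -s is silent, -f overwrites, -o names the output."""
    dst = Path(out_dir) / (Path(src).stem + ".flac")
    return [flac, preset, *extra, "-s", "-f", "-o", str(dst), str(src)]


def encode_all(files, out_dir, workers=4, run=subprocess.run):
    """Run one flac process per file, `workers` at a time; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(lambda f: run(build_cmd(f, out_dir), check=True), files))
    return time.perf_counter() - start


# Hypothetical usage, assuming flac is on PATH and the "out" directory exists:
#   elapsed = encode_all(sorted(Path("in").glob("*.flac")), "out", workers=8)
```

Dividing the total audio duration by the elapsed time gives a figure comparable in spirit to the "x" numbers above, although the decoding work is counted differently.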

Re: FLAC v1.4.x Performance Tests

Reply #311 – 2023-06-25 08:06:45

OK, again I cannot read :-(
Thx.

Edit: just for reference, how much extra time would it take on your setup to transfer tags, pictures, ReplayGain? Just to set the perspective for real-life operations.

Re: FLAC v1.4.x Performance Tests

Reply #312 – 2023-06-25 10:46:34

I usually keep a minimal amount of non-audio data in FLAC files, so my test results may not be very useful.

Re: FLAC v1.4.x Performance Tests

Reply #313 – 2023-06-27 11:53:24

bennetng, can you test this build with your Intel CPU to see if it changes anything on speed for the 24/96 files?

Re: FLAC v1.4.x Performance Tests

Reply #314 – 2023-06-27 15:11:40

A preliminary run on CDDA at setting -8pr7 (overnight, chewing on my signature for hours straight and repeating, hoping to approach some thermal steady-state) indicates that 1.4.3 is like 13 to 14 percent faster than 1.4.2. Official x64 .exe.

And file sizes ... 1.4.3 saves some twenty bytes per CD

Re: FLAC v1.4.x Performance Tests

Reply #315 – 2023-06-27 15:37:38

r7 is not a bad idea it seems while r8 does almost nothing anymore. Have to try some.

Re: FLAC v1.4.x Performance Tests

Reply #316 – 2023-06-27 17:16:21

Quote from: Wombat on 2023-06-27 11:53:24

bennetng, can you test this build with your Intel CPU to see if it changes anything on speed for the 24/96 files?

Not testing everyone's compiles this time, but I included single-thread ("Do not convert in multiple threads" checkbox) and multithread results, using foobar2000 2.0 x64. Not the same set of 24/96 files as in my previous post.

-8p                      single         multi
john33 AVX2              24.50x        97.20x
Wombat clang             22.52x        94.15x
Wombat GCC with asm      24.81x        97.18x
Wombat fa32              24.77x        99.34x

-8e                      single         multi
john33 AVX2              28.32x       114.46x
Wombat clang             25.53x       108.15x
Wombat GCC with asm      28.57x       114.95x
Wombat fa32              28.74x       115.01x

-8                       single         multi
john33 AVX2             101.97x       359.46x
NetRanger clang         101.59x       363.32x
Wombat clang            101.36x       369.37x
Wombat GCC with asm     103.38x       358.85x
Wombat fa32             103.38x       361.88x

clang seems to have some multithread advantages when there is no -e or -p.
So what is fa32? How does it affect your Ryzen?

Re: FLAC v1.4.x Performance Tests

Reply #317 – 2023-06-27 17:41:08

I used to use the -falign-functions=32 (fa32) compiler flag before, but left it out when cleaning up my config.
On my Ryzen, benchmarking is a pita because it varies on every run and depends on its daily mood.
I read a while back that -falign-functions=32 sometimes does well on Intel CPUs by making better use of the cache.
Many thanks for testing!
If interested, I can repost in the main 1.4.3 thread and add the no-asm binary.

Re: FLAC v1.4.x Performance Tests

Reply #318 – 2023-06-27 17:45:06

I couldn't resist doing some speed tests with my favourite "-7" setting.
I'm still using the following setup with my line-up of 40 CDDA-WAVs (3 hours of playing time):
CPU: Intel Core i7-8700 CPU @ 3.20GHz
RAM: 2 x 16 GB DDR4-2666 (1333 MHz) SK-Hynix
HDD: Samsung SSD 860 EVO 500GB

My fastest v1.4.2 build (reference):
flac142-x64-gcc1220-Ofast+manyflags-noasm-wombat_2022-10-23.exe (665600 bytes)
-> Average time =  22.635 seconds (5 rounds), Encoding speed = 477.66x

xiph-143\flac.exe (302592 bytes)
-> Average time =  22.715 seconds (5 rounds), Encoding speed = 475.99x

flac143-x64-gcc1310-O3-noasm-wombat_2023-06-23.exe (705024 bytes)
-> Average time =  21.904 seconds (5 rounds), Encoding speed = 493.61x

flac143-x64-gcc1310-O3-wombat_2023-06-23.exe (814592 bytes) 		<== FASTEST BUILD
-> Average time =  21.334 seconds (5 rounds), Encoding speed = 506.80x

flac143-avx2-john33.exe (1310602 bytes)
-> Average time =  22.167 seconds (5 rounds), Encoding speed = 487.76x

flac143-fa32-wombat.exe (816640 bytes)
-> Average time =  22.355 seconds (5 rounds), Encoding speed = 483.65x

So the build from "down under" with asm option is:
a) the fastest encoder in my setup
b) 6.5% faster than the official xiph build
c) 6.1% faster than my fastest v1.4.2 build

However, it remains a mystery to me why wombat's fastest 1.4.2 was "noasm", while the fastest 1.4.3 was "asm"...
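The percentages in b) and c) follow directly from the average times. Here is a short Python check; the labels are shortened by hand and the helper name is made up for illustration.

```python
# Average times (seconds, 5 rounds) copied from the listing above.
avg_time = {
    "1.4.2 wombat noasm (reference)": 22.635,
    "1.4.3 xiph": 22.715,
    "1.4.3 wombat noasm": 21.904,
    "1.4.3 wombat asm": 21.334,
    "1.4.3 john33 avx2": 22.167,
    "1.4.3 wombat fa32": 22.355,
}

fastest = min(avg_time, key=avg_time.get)


def percent_faster(slow, fast):
    """How much faster `fast` is than `slow`, as a throughput ratio in percent."""
    return 100.0 * (avg_time[slow] / avg_time[fast] - 1.0)


print("fastest:", fastest)
print(f"vs xiph:  {percent_faster('1.4.3 xiph', fastest):.1f}%")
print(f"vs 1.4.2: {percent_faster('1.4.2 wombat noasm (reference)', fastest):.1f}%")

# Speed factor times average time gives the audio duration of the test set.
print(f"implied playing time: {477.66 * 22.635 / 3600:.2f} h")
```

The last line is a sanity check: 477.66x over 22.635 seconds corresponds to about three hours of audio, which agrees with the stated playing time of the 40 WAVs.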

Re: FLAC v1.4.x Performance Tests

Reply #319 – 2023-06-27 17:53:29

The latest flac changes plus compiler versions do strange things. Single-instance performance on my 5900x within CUETools is still way faster with the no-asm version, ~130x vs ~110x.

Re: FLAC v1.4.x Performance Tests

Reply #320 – 2023-06-27 19:28:07

Quote from: sundance on 2023-06-27 17:45:06

However, it remains a mystery to me why wombat's fastest 1.4.2 was "noasm", while the fastest 1.4.3 was "asm"...

Because I greatly improved the asm, of course

Re: FLAC v1.4.x Performance Tests

Reply #321 – 2023-06-27 20:37:08

PS H:\> measure-command{h:\flac -ts *.flac}|select totalseconds
13 CDDA images, 3.93GB, two trials for each compile, on RAM disk as usual. Decoding time in seconds so lower is better.

compile                  1st           2nd
john33 avx2          30.6907728    30.3865113
NetRanger clang       36.592295    36.5719044
Ozz                  38.4181653    38.4648698
Wombat clang         39.2339226    38.9330091
Wombat fa32           31.886242    31.6372516
Wombat GCC with asm   31.999076    31.6573886
xiph                 35.2711499    35.0125727
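A rough Python analogue of the Measure-Command one-liner, for anyone who wants more than two trials: the sketch below times a task several times, discards the first run and reports the median, along the lines of the methodology described earlier in the thread. The function names and defaults are assumptions for illustration.

```python
import statistics
import time


def timed_runs(task, runs=5, warmup=1):
    """Call task() warmup+runs times; return elapsed seconds for the kept runs."""
    kept = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        task()
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard the initial run(s): cold caches, earlier activity
            kept.append(elapsed)
    return kept


def median_time(task, runs=5, warmup=1):
    """Median elapsed time over the kept runs."""
    return statistics.median(timed_runs(task, runs, warmup))


# Hypothetical usage, assuming flac is on PATH and `files` is a list of paths:
#   import subprocess
#   median_time(lambda: subprocess.run(["flac", "-t", "-s", *files], check=True))
```

The median is less sensitive than the mean to a single run disturbed by background activity, which matters when builds differ by only a few percent.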

Re: FLAC v1.4.x Performance Tests

Reply #322 – 2023-06-28 07:57:05

Quote from: ktf on 2023-06-27 19:28:07

Because I greatly improved the asm, of course

Doesn't your improvement mainly affect hires files with compression levels -0 .. -4?
Anyway, on my setup it greatly improves CDDA files with -7! Well done! As soon as I find some time, I'm going to test it with -8

Re: FLAC v1.4.x Performance Tests

Reply #323 – 2023-06-28 11:32:06

In 1.4.2 Wombat's asm build was also faster for my CPU.
https://hydrogenaud.io/index.php/topic,123025.msg1018104.html#msg1018104
The post right below it contains 24-bit tests too.

Re: FLAC v1.4.x Performance Tests

Reply #324 – 2023-06-30 14:14:43

Very slow figures recorded for the 23 June build posted at https://hydrogenaud.io/index.php/topic,124356.msg1029259.html#msg1029259

Running -8, it took 2.5x as much time as the slowest of the other builds posted in that thread, the Rarewares build without ASM optimizations (indicated as "x64" here), and 3x as much as the others.
On -8pr7 it wasn't that dramatic: 1.6x the time of Rarewares w/o ASM, which was the slowest among the others here too.
-5: somewhat in between.

@Ozz : any known explanation?

CPU this time: Intel Core i7500T. The other builds are much more even.
