Topic: FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #325
I took some CDs and tried -8p vs -8pr7 and it averages ~2,5kb smaller size per album. Not worth the speed hit imho.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: FLAC v1.4.x Performance Tests

Reply #326
I think -8p is already the most practical and simple "slow" setting for most CDDA content. The hi-res "playground" is much more fun, with different combinations of -b and -A to try before looking into -e, the even slower -p, or more symmetric settings like -l.

Re: FLAC v1.4.x Performance Tests

Reply #327
To the "showcases" division, a file downloaded from <stupidlyhi-rez site
According to @bennetng , it is not only the resolution itself that is dumb, it is the combination of resolution and "crappy quality junior MIDI sequencing stuff", so take it with a grain of salt.
Another unsuspecting victim using this song for audio format tests (see the second screenshot):
https://hydrogenaud.io/index.php/topic,124399.msg1029548.html#msg1029548

Re: FLAC v1.4.x Performance Tests

Reply #328
@bennetng

What?
you are a malicious person.
SHURE SRH1840, SENNHEISER HD660S2, SENNHEISER HD 490 PRO, DT 1990 PRO, HiFiMAN Edition XS, Bowers & Wilkins P7, FiiO FT5, 水月雨 (MOONDROP) 空鳴 - VOID, Nakamichi Elite FIVE ANC, SONY WH-ULT900N (ULT WEAR) (made a Upgrade/Balanced Cable by myself)


Re: FLAC v1.4.x Performance Tests

Reply #330
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings from -8pr7 (heaviest) to -0 --no-md5-sum. I included official 1.4.2 for a baseline.

As readers of the thread probably know already, results do depend on CPU and should be taken with that grain of salt.

Figures quoted in the table are speed relative to realtime - higher is better in this post! (Edit3, got that wrong in the first edit!)

'00 is -0 --no-md5-sum.

Build             -8pr7     -8     -5    -2e     -0    '00  Remarks per build
1.4.2 official       50    180    351    401    524    626  1.4.2 is second-to-slowest except -8pr7 (near-tied to Wombat-Clang)
1.4.3 official       58    209    451    468    609    751  Always #3.
Wombat-Clang         50    215    447    426    568    700  -8 was the 'least consistent timing' in the pack
Wombat-GCC           59    224    494    540    667    826  Fastest on LPC predictors, second-fastest (and close!) on fixed
Rarewares            54    174    355    410    592    737  Over AVX2: Penalty in seconds "not much bigger" for -8pr7 compared to -8
RW-AVX2              59    217    488    542    681    849  Fastest on fixed predictors, second-fastest (and close!) on LPC
Ozz                  34     69    193    369    500    619  Ooof. Not equally disastrous on fixed predictors.
Fastest builds were always one of Wombat-GCC and Rarewares with AVX2 optimizations. Both improved over official at every preset tested.
Differences between the two fastest: Using Wombat-GCC over Rarewares on -5 & -8 & -8pr7 would save you about a minute per hour - and using Rarewares over Wombat-GCC on the three other modes would save you about a minute per hour as well.
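The "about a minute per hour" figure can be checked with a little arithmetic. The sketch below is only an illustration: the speeds are as read from the table in this post, and it assumes that "Rarewares" in the comparison means the RW-AVX2 build and that "per hour" means per hour of encoding time with the slower build.

```python
# Speeds relative to realtime, as read from the table in this post.
# Assumption: "Rarewares" in the comparison refers to the RW-AVX2 row.
SPEEDS = {
    "Wombat-GCC": {"-8pr7": 59, "-8": 224, "-5": 494, "-2e": 540, "-0": 667, "'00": 826},
    "RW-AVX2": {"-8pr7": 59, "-8": 217, "-5": 488, "-2e": 542, "-0": 681, "'00": 849},
}


def minutes_saved_per_hour(fast: float, slow: float) -> float:
    """Minutes saved per hour of encoding with the slower build.

    Encoding time is inversely proportional to speed, so the faster build
    needs slow/fast of the slower build's time for the same job.
    """
    return 60.0 * (1.0 - slow / fast)


for setting in SPEEDS["Wombat-GCC"]:
    gcc = SPEEDS["Wombat-GCC"][setting]
    rw = SPEEDS["RW-AVX2"][setting]
    if gcc == rw:
        winner = "tie"
    else:
        winner = "Wombat-GCC" if gcc > rw else "RW-AVX2"
    saved = minutes_saved_per_hour(max(gcc, rw), min(gcc, rw))
    print(f"{setting:>6}: {winner:<10} saves {saved:.1f} min per hour")
```

With these numbers the per-setting savings range from nothing (the -8pr7 tie) to a bit under two minutes, which fits "about a minute" as a rough average.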

Computer: HP Prodesk with i5-7500T @ 2.70 GHz.
Corpus: the 38 CDs in my signature. One file per CD.
"Method": "One run" := encode the 38 CD .wav images AND two re-encodes from FLAC images. First did a run to get the CPU to stable heat and discarded it, then did three more runs and recorded the median of those three. For sanity-checking variability: also computed speeds using the fastest of those three runs; had I used fastest-of-three, the numbers in the table would have been 0 to 2 higher, except Wombat-Clang at -8, which would have gone up from 215 to 222.
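For anyone wanting to repeat the procedure, here is a minimal sketch of the "discard a warm-up run, then take the median of three" idea. It is not the script used for the table; `timed_runs` and `speed_vs_realtime` are hypothetical helper names, and the workload in the demo is a stand-in for an actual encoder run over the corpus.

```python
import statistics
import time
from typing import Callable


def timed_runs(job: Callable[[], None], warmups: int = 1, repeats: int = 3) -> list[float]:
    """Run `job` warmups + repeats times and return the wall-clock seconds
    of the last `repeats` runs only."""
    for _ in range(warmups):
        job()  # warm-up run to get the CPU to stable heat; timing discarded
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        times.append(time.perf_counter() - start)
    return times


def speed_vs_realtime(audio_seconds: float, encode_seconds: float) -> float:
    """Speed relative to realtime: seconds of audio encoded per second spent."""
    return audio_seconds / encode_seconds


if __name__ == "__main__":
    # Stand-in workload; a real test would call the encoder on every file here.
    times = timed_runs(lambda: sum(range(200_000)))
    print("median:", statistics.median(times), "fastest:", min(times))
```

Reporting both the median and the fastest run, as done in this post, gives a cheap check on how noisy the timings are.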

Re: FLAC v1.4.x Performance Tests

Reply #331
Many thanks for the benches. Very interesting to see CPUs like yours or sundance's in action!

Re: FLAC v1.4.x Performance Tests

Reply #332
By the way, @Case - you did provide some compile that in some situations performed well. Care to post an exe of 1.4.3 too?
I don't know why but I can't make GCC compiles that run as fast as Wombat's. But I made a Clang compile that seems to be faster for me than other compiles. That build is attached.

Very slow figures recorded for the 23 June build posted by https://hydrogenaud.io/index.php/topic,124356.msg1029259.html#msg1029259
I don't know what has been done to the code but MSVC compiles are super slow nowadays. I used to use MSVC to build 32-bit FLAC to be bundled with the Free Encoder Pack (as the official compile is out of the question thanks to its libflac.dll dependency) and its speed was on par with other builds. Not so anymore.

Re: FLAC v1.4.x Performance Tests

Reply #333
I don't know what has been done to the code but MSVC compiles are super slow nowadays.
Is MSVC only super slow compared to 1.4.3 from other compilers, or also super slow compared to MSVC's 1.3.4?

I think the main problem is in auto-vectorization. GCC and Clang can auto-vectorize most code reasonably well, but it seems MSVC can't. I could of course manually vectorize all code, but I'd rather not: there have been some nasty, hard-to-find bugs (potentially with security implications) in manually vectorized code, so I try to keep use of it low.

From what I've heard, MSVC is incorporating LLVM/Clang anyway: https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170
Music: sounds arranged such that they construct feelings.

Re: FLAC v1.4.x Performance Tests

Reply #334
VS 2019 and later already support LLVM/Clang both with the (now out of date) version built in as an option, and with the latest version installed by creating a 'Directory.build.props' file and placing it in the same dir as the .sln.

The content of the Directory.build.props is:
Code:
<Project>
  <PropertyGroup>
    <LLVMInstallDir>C:\Program Files\LLVM\</LLVMInstallDir>
    <LLVMToolsVersion>16</LLVMToolsVersion>
  </PropertyGroup>
</Project>
The only issue I have with compiling via VS is that it declares the VC version as the compiler rather than the LLVM/Clang version. I use a version of the x265 video encoder that I compiled that way.

EDIT: As an aside, I think the MSVC 32 bit compiles suffered considerably speed-wise with the removal of the nasm code.

Re: FLAC v1.4.x Performance Tests

Reply #335
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings from -8pr7 (heaviest) to -0 --no-md5-sum. I included official 1.4.2 for a baseline.

[...]
Not too different from mine, including the slowness of Ozz's build. My impression is that GCC works better with heavier settings (-p and -e) and multithreading, while Clang is better on single thread and lighter settings, but the differences are rather small, unlike last year, when Clang was much slower than GCC (before ktf removed some assembly code that hindered Clang's performance?)

Now I have Linux installed (the real thing, not in a VM) and I also got an NVMe SSD, so I have three types of storage device (HDD, SATA SSD, NVMe SSD) on a single machine, alongside Windows in a dual-boot environment. I am thinking about doing some Linux benchmarks once I figure out how to build from source.

Re: FLAC v1.4.x Performance Tests

Reply #336
Time penalty of -r, anyone? Someone with a computer that isn't cooling-constrained, could you test that on your fave presets?

I did some rough tests at https://hydrogenaud.io/index.php/topic,124437.msg1030120.html#msg1030120 and then re-did them, and at least up to r6, the impact is so small that the variation in between runs pretty much kills the comparison.
But, the size impact wasn't big on that test either.

Re: FLAC v1.4.x Performance Tests

Reply #337
15:42 of CDDA on i7-4790K:

-8pr1 - 12s
-8pr5 - 12,5s
-8pr6 - 13,6s
-8pr7 - 14,9s
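To put these timings on the same footing as the speed tables earlier in the thread, they can be converted to speed relative to realtime and to a percentage penalty over -8pr1. A small sketch, using only the four figures posted above (decimal commas written as points):

```python
AUDIO_SECONDS = 15 * 60 + 42  # 15:42 of CDDA

# Encoding times in seconds, from the post above.
TIMES = {"-8pr1": 12.0, "-8pr5": 12.5, "-8pr6": 13.6, "-8pr7": 14.9}


def penalty_percent(t: float, baseline: float) -> float:
    """How much longer `t` is than `baseline`, in percent."""
    return 100.0 * (t / baseline - 1.0)


baseline = TIMES["-8pr1"]
for setting, t in TIMES.items():
    speed = AUDIO_SECONDS / t
    print(f"{setting}: {speed:5.1f}x realtime, {penalty_percent(t, baseline):+.1f}% vs -8pr1")
```

On these numbers, going from -r1 to -r5 costs about 4%, while -r7 costs about 24%, so most of the penalty sits in the last two steps.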

Re: FLAC v1.4.x Performance Tests

Reply #338
Left a computer running unattended for some days on CDDA material to compare Win-x64 1.4.3 builds, taken from https://hydrogenaud.io/index.php/topic,124356.0.html , at various settings
[...]
Computer: HP Prodesk with i5-7500T @ 2.70 GHz.

Same computer, same corpus, different builds and this time, median of three. The triumphant match of the @Wombat builds, but ... but which one(s)?
It seems that the GCC builds benefit from "harder work": -p, -r7, more apodization functions and -e (which is no longer much use for CDDA) - but in this test, -8r7 was not hard enough in itself and -8 -A subdivide_tukey(5) was not enough either; it took the combination -8r7 -A subdivide_tukey(5) to make GCC king of the hill. The GCC builds have their day on -5r7 too, but I wouldn't trust those numbers that much.

The pattern is clear though. And 1.4.3 did surely speed up over 1.4.2.
In this post I measured time (not speed), so a "+" means slower.

"-e" is no longer much use for CDDA, but let's get it out of the way first; the numbers are even more Clang-unfriendly than with -p:
-8er7
WombatSomeflagsGCCdisableasm  was fastest at 38 minutes
+0.4%   Wombat-GCC
+1.2%   WombatSomeflagsGCC
+4.3%   Xiph
+12%   RW
+26%   1.4.2
+27%   Case and then Wombat's Clangs at 28 and 29

-p settings with several windowing functions. Hard tasks. "Good for something, unlike -e" ...

-8pr7 -A subdivide_tukey(5)
Wombat-GCC was fastest at 94 minutes
+0.4%   WombatSomeflagsGCCdisableasm
+2.5%   Xiph
+3.2%   WombatSomeflagsGCC
+7.6%   RW
+19%   1.4.2
>+20%   Case and then Wombat's Clangs.

-8pr5 -A subdivide_tukey(5), stepping the "-r" down to 5:
WombatSomeflagsGCC was fastest at 70 minutes
+0.5%   Wombat-GCC
+4.5%   WombatSomeflagsGCCdisableasm
+6.0%   Case
+8to9%   Xiph and Wombat's Clangs
+14%   1.4.2
+15%   RW

Now remove the "p". -r7 first:
-8r7 -A subdivide_tukey(5)
WombatSomeflagsGCCdisableasm was fastest at 19 minutes
+1%ish   WombatSomeflagsGCC and Wombat-GCC
+3.5%   WombatSomeflagsClang
+5.0%   Xiph
+8.4%   Wombat-Clang
+13%   Case
+21%   1.4.2
+22%   RW

Turns out that the GCC builds are dethroned from here on:

-8r5 -A subdivide_tukey(5), stepping the "-r" down to 5:
Wombat-Clang was fastest at 15 minutes. Clang making the top here is a bit WTF.
+6.0%   WombatSomeflagsClang.  This was not the immediate next, so it was not a fluke?
+7to9%   Wombat, the GCCs
+11%   Case
+12%   Xiph
+36%   RW

Down to "normal" windowing functions. -r up to 7 again:
-8r7
Wombat-Clangs and Case are fastest, 9min41 sec (within .6 seconds of each other)
Then   Wombat's GCCs, +1.8 to 2.7 percent (10 to 15 seconds).
+4.8%   Xiph
+25%   1.4.2
+26%   RW

-8 (plain)
Wombat-Clang at 8min38, 5 seconds before Case
+6to8%   Wombat, the rest
+12%   Xiph (that's +61 seconds)
+30%   1.4.2
+36%   RW
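The seconds quoted in parentheses follow from the fastest time and the percentage. A quick sketch of that conversion (the helper names are made up for illustration); since the percentages in the lists are rounded, the result can be off by a second or so from the figure quoted in the post:

```python
def to_seconds(minutes: int, seconds: float = 0.0) -> float:
    """Convert a minutes/seconds timing to seconds."""
    return 60 * minutes + seconds


def extra_seconds(fastest_seconds: float, percent_slower: float) -> float:
    """Extra time in seconds for a build that is `percent_slower` % slower."""
    return fastest_seconds * percent_slower / 100.0


fastest = to_seconds(8, 38)  # Wombat-Clang at plain -8
print(f"Xiph at +12%: about {extra_seconds(fastest, 12):.0f} seconds extra")
print(f"RW at +36%: about {extra_seconds(fastest, 36):.0f} seconds extra")
```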

-8r5, stepping the "-r" down to 5:
WombatSomeflagsClang this time. 8min12.
Then   Other Wombats, with GCC at the end
+12%   Case (that's quite a bit worse than -8 and -7r5, maybe just busy CPU?)
+15%   Xiph (that's +71 seconds)
+30ies   1.4.2 and RW, lagging worse in %. 
Rarewares spends > 1 minute more doing -8r5 than the other 1.4.3s spend on -8r7.

-7r5 now.  I didn't do standard -7
Pretty much the same pattern as -8r5, with differences worse in % but slightly better in seconds, as they should be, since WombatSomeflagsClang is down to 5min26.
Exception: Case is now back at +3.6%, +20seconds
Rarewares is only 15 seconds off being beaten by some -8r5.

-7r3
Pattern continues? Wombat's Clangs win, ten seconds faster than -7r5.
+3.6%   Case, that is 17 seconds
Then:    Wombat-Clang and Wombat GCCs
+16%   Xiph
+40%   1.4.2
+47%   RW.


-5r7
Now a sudden win for Wombat's GCCs again.  4min 12to15 seconds.  No idea why; if it were -r7 alone, I would have expected it to happen on -8r7 too.
Then:    Wombat's Clangs and Xiph
+5.2%   Case.  That is +13 seconds.
+32%   RW.  Beaten by the fastest -7r5.
+34%   1.4.2

-5r5
All the Wombats then Case in at 4min 2to7 seconds. 
+5.4%   Xiph.  That is +13 seconds.
+34%   1.4.2 then RW, only narrowly beating the fastest -7r7

-5r3: Computer restarted for update during the Rarewares build, but pattern looks the same.


Might do some fixed-predictor run, but ... well?

Re: FLAC v1.4.x Performance Tests

Reply #339
It's always interesting to see what testing ideas you come up with :)
If you get bored one day, I can do an AVX-512 (x86-64-v4) compile, since one of your CPUs, even if it is a smaller one, has this extension. Somehow I didn't notice, or missed, any numbers for that.

Re: FLAC v1.4.x Performance Tests

Reply #340
Please do.
Work computer rebooted again, so ... might run something over the upcoming week-end.
But it seems that "-e" is not very Clang-friendly even on lower settings.

Re: FLAC v1.4.x Performance Tests

Reply #341
Build from today's git 5500690 of ktf and the multi-threading version.
x86-64-v3 should have AVX2 and x86-64-v4 should have AVX-512 support. I also used -falign-functions=32 as a GCC compiler flag.


Re: FLAC v1.4.x Performance Tests

Reply #343
v3 and v4 stand for the names of the official compiler flags. Both builds are today's v5.

Re: FLAC v1.4.x Performance Tests

Reply #344
On the computer I'm at right now, your v4 exe does absolutely nothing. No text no processing no nothing. v3 works.

Meanwhile, it seems that fixed predictors aren't Clang's best friends either ... on this computer, at least.

Re: FLAC v1.4.x Performance Tests

Reply #345
Yes, for v4 the AVX-512 support of the CPU is mandatory.
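As background for why the v4 exe does nothing on a CPU without AVX-512: the x86-64-v2/v3/v4 names are cumulative feature levels, and a binary built for a level assumes every feature up to that level. The sketch below only illustrates that logic. The feature lists follow the x86-64 psABI level definitions as I understand them, the lowercase names are illustrative labels rather than the exact strings any OS reports, and how to obtain the CPU's feature set is platform-specific and left out.

```python
# Features each x86-64 microarchitecture level adds on top of the previous one.
# Names are illustrative labels, not the exact strings any particular OS reports.
LEVEL_FEATURES: list[tuple[str, set[str]]] = [
    ("x86-64-v2", {"cmpxchg16b", "lahf_sahf", "popcnt", "sse3", "ssse3", "sse4_1", "sse4_2"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe", "osxsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_features: set[str]) -> str:
    """Return the highest level whose requirements, and those of all lower
    levels, are contained in `cpu_features`."""
    level = "x86-64"  # baseline, always assumed
    for name, required in LEVEL_FEATURES:
        if not required <= cpu_features:
            break  # levels are cumulative, so stop at the first gap
        level = name
    return level


if __name__ == "__main__":
    # Hypothetical CPU with AVX2 but without AVX-512.
    cpu = LEVEL_FEATURES[0][1] | LEVEL_FEATURES[1][1]
    print(highest_level(cpu))
```

A CPU that stops at x86-64-v3 in this check would be expected to run the v3 build but not the v4 one.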

Re: FLAC v1.4.x Performance Tests

Reply #346
Ah! Back at the i5-11gen it works.
First impression is that the v4 doesn't help, but I haven't tried any really heavy job yet.

Re: FLAC v1.4.x Performance Tests

Reply #347
Meanwhile, it seems that fixed predictors aren't Clang's best friends either ... on this computer, at least.

Confirmed.

Neither is "-e" (this is on CDDA where it should hardly be used though)
- and the following combination is just slayyyyying: fixed predictor, -e and high -r:

-2er7, here is where the fastest Clang takes fifty percent more time than the winner
5:36 for WombatSomeflagsGCCdisableasm. I have no idea why this one is ten percent faster than anything else - I re-ran and got pretty much the same figures over again.
6:0x Wombat other GCC and Xiph.
7:02 Rarewares
8:22 to 8:25: The Clangs, including Case.
One more minute for 1.4.2.
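The "fifty percent more time" claim can be checked directly from the m:ss timings in the -2er7 list. A small sketch, with hypothetical helper names:

```python
def parse_mmss(text: str) -> int:
    """Convert an 'm:ss' timing such as '5:36' to whole seconds."""
    minutes, seconds = text.split(":")
    return 60 * int(minutes) + int(seconds)


def percent_slower(time_text: str, winner_text: str) -> float:
    """How much longer `time_text` is than `winner_text`, in percent."""
    return 100.0 * (parse_mmss(time_text) / parse_mmss(winner_text) - 1.0)


print(f"Fastest Clang vs winner: {percent_slower('8:22', '5:36'):+.1f}%")
print(f"Rarewares vs winner: {percent_slower('7:02', '5:36'):+.1f}%")
```

The fastest Clang comes out just under 50% slower than the winner on these figures.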

-2er3 (equals -2e) to see if it is the "-r7" that does it. It sure makes an impact, but not so much on the order:
3:42ish for the Wombat GCCs
4:10 Xiph
4:34 The Clangs, including Case.
4:45 Rarewares and 1.4.2

-2 plain to see if it is the "-e".
3:19ish for the Wombat GCCs
3:32 Xiph
3:41 Rarewares
4:02 the Clangs, including Case, and also: 1.4.2


For the impact of -r7 without any "-e", here are -0 modifications. Yes GCCs rule:

-0r7:
3:18ish for the Wombat GCCs with Xiph trailing four or five seconds-ish
3:32 Rarewares
3:48ish the Wombat Clangs with Case trailing four or five seconds-ish
4:02 for 1.4.2

-0r1 to see whether it is the "r":
3:05 for the Wombat GCCs
3:14 to 3:16 Xiph and Rarewares
3:26 for the Wombat Clangs with Case trailing a couple of seconds
3:38 for 1.4.2


Re: FLAC v1.4.x Performance Tests

Reply #348
-2er7, here is where the fastest Clang takes fifty percent more time than the winner
5:36 for WombatSomeflagsGCCdisableasm. I have no idea why this one is ten percent faster than anything else - I re-ran and got pretty much the same figures over again.
The compiles with --disable-asm-optimizations still seem to be way faster on many CPUs, but only for 16-bit material.
ktf first explained it here https://hydrogenaud.io/index.php/topic,123025.msg1017351.html#msg1017351
That's why I still add it to the package and suggest it for use inside CDDA apps like CUETools or EAC.
I guess the next package also needs a third version with the -falign-functions=32 compiler option, because newer CPUs like it but it slows older ones down.
No idea when the point is reached it becomes silly  :D

 

Re: FLAC v1.4.x Performance Tests

Reply #349
Blunder in Reply#338 and Reply#347:
Wrong Rarewares build picked for those tests. Should have run them with the AVX2 build, which is faster.

Anyway, the Clangs wouldn't look better if up against something even faster.
And it is kinda weird (in particular since this is only one CPU): GCC builds win on lightweight fixed-predictor jobs, and GCC builds win on heavy jobs - but on jobs nearly as heavy as -8 (which equals -8r6), namely -8r5, Clang overtakes them; though not at -5r5.

No idea when the point is reached it becomes silly  :D
Long ago!

But if everyone here runs their tests with -8 and -8p - despite those being ones where a few percent savings may actually matter - it isn't clear what compiler an official build would benefit from using.
BTW, should it matter at all what CPU is used for compiling?

(Compatibility issues here are only AVX for one Rarewares build and which ones of yours? Plus AVX512 for your v4?)