Topic: FLAC v1.3.3

Re: FLAC v1.3.3

Reply #51
FLAC 1d5299d MSYS2/mingw-w64-x86_64 2020-03-23 (64bit, GCC 9.3)

Tuned for modern Intel CPUs and linked against similarly built libogg 1.3.4.

Simple recompression benchmark on an AMD 2700X:

This build:
Code: [Select]
./timer64.exe ./flac.exe --verify --best --exhaustive-model-search --qlp-coeff-precision-search ./test.flac -o test_grieverv.flac

test.flac: Verify OK, wrote 377669674 bytes, ratio=0.992


Kernel  Time =     1.171 =    0%
User    Time =   335.281 =   99%
Process Time =   336.453 =   99%    Virtual  Memory =     15 MB
Global  Time =   337.578 =  100%    Physical Memory =     14 MB

NetRanger's build:
Code: [Select]
./timer64.exe ./flac.exe --verify --best --exhaustive-model-search --qlp-coeff-precision-search ./test.flac -o test_netranger.flac

test.flac: Verify OK, wrote 377669674 bytes, ratio=0.992


Kernel  Time =     1.531 =    0%
User    Time =   409.140 =   99%
Process Time =   410.671 =   99%    Virtual  Memory =     15 MB
Global  Time =   411.926 =  100%    Physical Memory =     14 MB

Build information:
Spoiler (click to show/hide)

Re: FLAC v1.3.3

Reply #52
FLAC 1d5299d MSYS2/mingw-w64-x86_64 2020-03-25 (64bit, GCC 9.3)

Rebuilt with more aggressive GCC optimizations taken from GentooLTO.
Compared to the previous build, it's around 4.2% faster on the previously tested file, which saves another ~14 seconds.
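
For anyone who wants to redo the arithmetic, here is a minimal sketch. The function name is made up for illustration; only the Global Time values and the 4.2% figure come from the posts, and the 2020-03-25 time is an estimate derived from them, not a measurement.

```python
def time_saved(old_seconds, new_seconds):
    """Return (seconds saved, percent less wall-clock time) going from old to new."""
    saved = old_seconds - new_seconds
    return saved, 100.0 * saved / old_seconds

# Global Time values from reply #51 (AMD 2700X, same input file).
netranger = 411.926
grieverv_0323 = 337.578

saved, pct = time_saved(netranger, grieverv_0323)
print(f"2020-03-23 build vs NetRanger's: {saved:.1f} s saved, {pct:.1f}% less time")

# Reply #52 reports ~4.2% on top of that. The new time below is estimated
# from that percentage; it was not posted as a measurement.
estimated_0325 = grieverv_0323 * (1 - 0.042)
print(f"estimated 2020-03-25 time: {estimated_0325:.1f} s "
      f"({grieverv_0323 - estimated_0325:.1f} s saved)")
```

Note that "percent less time" and "percent faster" are not the same number: 18% less time corresponds to roughly 22% higher throughput.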

Build information:
Spoiler (click to show/hide)

Re: FLAC v1.3.3

Reply #53
Speedup is great, from average 125x with Netranger's on my i7-2600 to 145x with GrieverV's build here.


Re: FLAC v1.3.3

Reply #55
His build just doesn't work on my computer. It starts, shows help, but when it needs to encode, it just... exits.

Re: FLAC v1.3.3

Reply #56
His build just doesn't work on my computer. It starts, shows help, but when it needs to encode, it just... exits.

Code: [Select]
-march=haswell
This flag means the binary has been fully optimized for the Haswell CPU family and is unlikely to run on any other CPU unless it fully supports the instruction sets GCC enables for Haswell. Fortunately, I believe all Intel CPUs since Haswell support the Haswell-optimized code generated by GCC.
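
A rough way to check whether a CPU has the extensions such a binary may use is sketched below. The assumptions are loud ones: the flag list is only a subset of what GCC enables for -march=haswell, the names follow the spelling Linux uses in /proc/cpuinfo, and the helper names are hypothetical. On Windows the flag list would have to come from some other tool.

```python
# Subset of the extensions GCC enables for -march=haswell, written with the
# flag names Linux uses in /proc/cpuinfo. Illustrative, not complete.
HASWELL_FLAGS = {"avx", "avx2", "bmi1", "bmi2", "fma", "movbe", "f16c"}

def missing_haswell_flags(cpu_flags):
    """Return the Haswell-level flags absent from the given set of CPU flags."""
    return sorted(HASWELL_FLAGS - set(cpu_flags))

def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag list of the first CPU from a Linux-style cpuinfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

# Hand-written flag list resembling a pre-Haswell CPU (AVX but no AVX2, FMA
# or BMI). These flags are made up for illustration, not read from hardware.
older_cpu = {"sse2", "ssse3", "sse4_1", "sse4_2", "popcnt", "avx", "f16c"}
print(missing_haswell_flags(older_cpu))
```

If the list comes back non-empty, a binary built with -march=haswell can hit an instruction the CPU does not implement, which would be consistent with the encoder exiting as soon as real work starts.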

Re: FLAC v1.3.3

Reply #57
@GrieverV
Just tested your latest compile against Case's here on my i7-6700 with my set of 40 WAV files (1.77 GB).
All reading/writing is from/to SSD, but since I do at least 5 runs, the WAVs are likely to be cached.

Code: [Select]
flac_Griever_1d5299d.exe
Kernel  Time =     2.203 =    5%
User    Time =    38.453 =   91%
Process Time =    40.656 =   96%    Virtual  Memory =     14 MB
Global  Time =    42.164 =  100%    Physical Memory =     14 MB
Code: [Select]
Case's build:
Kernel  Time =     2.562 =    6%
User    Time =    35.390 =   90%
Process Time =    37.953 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.238 =  100%    Physical Memory =     14 MB
I have no idea what causes this difference (platform specific compile, compiler version used, compiler optimisations ...)
What I noticed is that your binary uses way less "kernel time" and more "user time" than Case's. But interpreting these figures is way beyond my knowledge...
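
For reading these reports: kernel time is CPU time the OS spends on behalf of the process (for example file I/O system calls), user time is CPU time spent in the program's own code, and process time is the sum of the two. A small sketch that parses the two reports above and separates these parts follows; the regex and function name are mine, and the figures are copied from this post.

```python
import re

# Matches lines such as "Kernel  Time =     2.203 =    5%".
TIMER_LINE = re.compile(r"^(Kernel|User|Process|Global)\s+Time\s*=\s*([\d.]+)", re.M)

def parse_timer(text):
    """Extract the four timings (in seconds) from a timer report."""
    return {name.lower(): float(value) for name, value in TIMER_LINE.findall(text)}

griever = parse_timer("""\
Kernel  Time =     2.203 =    5%
User    Time =    38.453 =   91%
Process Time =    40.656 =   96%    Virtual  Memory =     14 MB
Global  Time =    42.164 =  100%    Physical Memory =     14 MB
""")
case = parse_timer("""\
Kernel  Time =     2.562 =    6%
User    Time =    35.390 =   90%
Process Time =    37.953 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.238 =  100%    Physical Memory =     14 MB
""")

for name, t in (("GrieverV", griever), ("Case", case)):
    # Whatever part of the wall-clock (global) time is not CPU time was
    # spent waiting, e.g. on I/O or process start-up.
    waiting = t["global"] - t["process"]
    print(f"{name}: user {t['user']:.3f} s, kernel {t['kernel']:.3f} s, "
          f"waiting {waiting:.3f} s")

print(f"user time difference: {griever['user'] - case['user']:.3f} s")
```

Most of the gap between the two builds is in user time, so it comes from the encoder code itself and not from file access.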

Re: FLAC v1.3.3

Reply #58
@sundance
Their toolchain is likely very different so there could be a lot of possible explanations, but the most likely would be -march=haswell and GCC 4.9.

If I remember correctly, there was a big performance regression on GCC 5 with FLAC encoding and it's possible GCC never recovered it or further regressed. I've attached a build that was built with -march=haswell and -funroll-loops if you want to try it. As far as I could tell, there wasn't much difference with -funroll-loops alone when I tested.

Re: FLAC v1.3.3

Reply #59
Getting pretty close now:
Code: [Select]
flac_GrieverV_1d5299d_haswell.exe
Kernel  Time =     2.390 =    5%
User    Time =    36.468 =   90%
Process Time =    38.859 =   96%    Virtual  Memory =     14 MB
Global  Time =    40.137 =  100%    Physical Memory =     14 MB

Re: FLAC v1.3.3

Reply #60
Did some testing of my own with GrieverV's aggressively optimized Skylake 1d5299d 1.3.3 flac.exe. I don't use -ep, so I left it out and compared against the flac 1.3.2 I had been using until recently (the official Xiph generic Windows binary, flac-1.3.2-win.zip). SSD to SSD on an i7-7700K, it finishes in about 24% less time (110.1 s vs. 83.7 s)! (I know, apples and oranges, but I switched from this 1.3.2 to GrieverV's 1.3.3 just now and I'm going to stick with it, so this is my real-life improvement.)


Code: [Select]
PS C:\Program Files (x86)\foobar2000\Encoders\skylakeflac> Measure-Command { ..\flac.exe --verify --best E:\flac_reenc_AIO\Image.wav -o E:\flac_reenc_AIO\Image_new.flac }

flac 1.3.2
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

Image.wav: Verify OK, wrote 2816137583 bytes, ratio=0,789


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 50
Milliseconds      : 85
Ticks             : 1100852308
TotalDays         : 0,00127413461574074
TotalHours        : 0,0305792307777778
TotalMinutes      : 1,83475384666667
TotalSeconds      : 110,0852308
TotalMilliseconds : 110085,2308


Code: [Select]
PS C:\Program Files (x86)\foobar2000\Encoders\skylakeflac> Measure-Command { .\flac_1d5299d_skylake_2020-03-25.exe --verify --best E:\flac_reenc_AIO\Image.wav -o E:\flac_reenc_AIO\Image_new.flac }

flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

Image.wav: Verify OK, wrote 2816138302 bytes, ratio=0,789


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 23
Milliseconds      : 653
Ticks             : 836538482
TotalDays         : 0,000968215835648148
TotalHours        : 0,0232371800555556
TotalMinutes      : 1,39423080333333
TotalSeconds      : 83,6538482
TotalMilliseconds : 83653,8482


Re: FLAC v1.3.3

Reply #61
wow, this is really impressive. seems like there is always room for further optimizations.

flac2flac conversion of 25 tracks in foobar via wine on i5-ivy.

netranger compile:
Track converted successfully.
Total encoding time: 0:22.661, 239.69x realtime


grieverv compile:
Track converted successfully.
Total encoding time: 0:17.350, 313.07x realtime

thanks a lot @grieverv.


Re: FLAC v1.3.3

Reply #63
FLAC 7a35c52 MSYS2/mingw-w64-x86_64 2020-04-07 (64bit, GCC 9.3)

Aggressively optimized builds tuned for modern processors. Included is a generic x86_64 build and an Intel Haswell optimized build that will only run on Haswell or newer Intel CPUs and Zen or newer AMD CPUs. All FLAC tests passed for both builds.

Compared to my previous builds, the generic build is now built with -funroll-loops and FLAC is now configured with --enable-64-bit-words.

Build information:
Spoiler (click to show/hide)

This is likely to be my last build so I've included my makepkg configuration and PKGBUILDs for libogg and flac.

Re: FLAC v1.3.3

Reply #64
no changes in speed with latest netranger build - but grievervs build got tamed from 313x to 264x.

Track converted successfully.
Total encoding time: 0:20.609, 263.56x realtime

same setup as before.

Re: FLAC v1.3.3

Reply #65
@GrieverV:
Well done, mate! This time your Haswell compile is the fastest I ever tested!
These are my test results (under today's full moon conditions  ;) )
Code: [Select]
flac133_ GrieverV_7a35c528.exe:
Kernel  Time =     2.921 =    7%
User    Time =    35.781 =   89%
Process Time =    38.703 =   96%    Virtual  Memory =     14 MB
Global  Time =    40.034 =  100%    Physical Memory =     14 MB

flac133_ GrieverV_7a35c528_haswell.exe:
Kernel  Time =     2.359 =    6%
User    Time =    35.359 =   90%
Process Time =    37.718 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.087 =  100%    Physical Memory =     14 MB

flac133_Case.exe:
Kernel  Time =     2.812 =    7%
User    Time =    35.109 =   89%
Process Time =    37.921 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.223 =  100%    Physical Memory =     14 MB

@sanskrit44: What CPU are you running, and which of GrieverV's compiles did you test?


Re: FLAC v1.3.3

Reply #67
I see consistent gains on a 2700X with this test file: BIS1447-002-flac_16.flac
Code: [Select]
--best:                                                                 4.5%
--verify --best:                                                        5.8%
--verify --best --qlp-coeff-precision-search:                           2.5%
--verify --best --exhaustive-model-search --qlp-coeff-precision-search: 1.2%
The builds I tested were the generic x86_64 builds from 2020-03-25 and 2020-04-07. The only differences between the two are -funroll-loops and enabling 64-bit words for FLAC (the upstream commits do not have any effect).

@sanskrit44
Were your tests perhaps done at different times? Background processes can change the results and each build should be tested at least three times to ensure the results are consistent.
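
A minimal sketch of such a repeated measurement, assuming Python is available on the test machine. The helper name is hypothetical, and the command in the example is a placeholder so the script runs anywhere; swap in the encoder call you want to time.

```python
import statistics
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    """Run cmd several times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations

if __name__ == "__main__":
    # Placeholder command. For a real test replace it with something like
    # ["./flac.exe", "--verify", "--best", "-f", "test.flac", "-o", "out.flac"]
    cmd = [sys.executable, "-c", "pass"]
    runs = time_command(cmd, runs=3)
    print(f"min {min(runs):.3f} s, median {statistics.median(runs):.3f} s, "
          f"max {max(runs):.3f} s")
```

If min and max are far apart, something else was competing for the CPU or disk, and the comparison between builds should be repeated.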

Are you able to reproduce similar results with different FLAC files or the one linked above?

Re: FLAC v1.3.3

Reply #68
Are you able to reproduce similar results with different FLAC files or the one linked above?
i retested flac2flac with foobar and booted into a fresh system for each run.


mingw-w64-x86_64-flac-git-1.3.3.r3906.1d5299d6:

my set of 25 flacs:
Total encoding time: 0:17.488, 310.60x realtime

25x flac testfile:
Total encoding time: 0:32.065, 326.21x realtime



mingw-w64-x86_64-flac-git-1.3.3.r3909.7a35c528:

my set of 25 flacs:
Total encoding time: 0:16.978, 319.93x realtime

25x flac testfile:
Total encoding time: 0:30.801, 339.59x realtime


looking much better now :)

Re: FLAC v1.3.3

Reply #69
why are those called haswell builds when they are compiled with -mtune=skylake?


FLAC 1d5299d MSYS2/mingw-w64-x86_64 2020-03-25 (64bit, GCC 9.3)
Code: [Select]
PS C:\Program Files (x86)\foobar2000\Encoders> Measure-Command { .\flac.exe --verify --best E:\Musik_reenc_AIO\Image.w64 -o E:\Musik_reenc_AIO\imageout.flac }

flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

Image.w64: Verify OK, wrote 4116685875 bytes, ratio=0,779


Days              : 0
Hours             : 0
Minutes           : 2
Seconds           : 16
Milliseconds      : 485
Ticks             : 1364857502
TotalDays         : 0,00157969618287037
TotalHours        : 0,0379127083888889
TotalMinutes      : 2,27476250333333
TotalSeconds      : 136,4857502
TotalMilliseconds : 136485,7502


FLAC 7a35c52 MSYS2/mingw-w64-x86_64 2020-04-07 (64bit, GCC 9.3) haswell/skylake on i7-7700K
Code: [Select]
PS C:\Program Files (x86)\foobar2000\Encoders> Measure-Command { .\flacneu\flac.exe --verify --best E:\Musik_reenc_AIO\Image.w64 -o E:\Musik_reenc_AIO\imageout2.flac }

flac 1.3.3
Copyright (C) 2000-2009  Josh Coalson, 2011-2016  Xiph.Org Foundation
flac comes with ABSOLUTELY NO WARRANTY.  This is free software, and you are
welcome to redistribute it under certain conditions.  Type `flac' for details.

Image.w64: Verify OK, wrote 4116686510 bytes, ratio=0,779


Days              : 0
Hours             : 0
Minutes           : 1
Seconds           : 58
Milliseconds      : 800
Ticks             : 1188007802
TotalDays         : 0,00137500903009259
TotalHours        : 0,0330002167222222
TotalMinutes      : 1,98001300333333
TotalSeconds      : 118,8007802
TotalMilliseconds : 118800,7802

Re: FLAC v1.3.3

Reply #70
why are those called haswell builds when they are compiled with -mtune=skylake?

I only included the generic build's flags in the build info, since all that's done for Haswell is changing -march and removing -mtune. The binary is still built for Haswell and the relevant makepkg configuration is included in the download.

You can think of -mtune=skylake as a more modern default for x86_64 CPUs, since every Intel CPU from the last decade will likely show benefits over generic/x86_64; AMD Zen also shows improvements and I wouldn't be surprised if Bulldozer benefits, too.

 

Re: FLAC v1.3.3

Reply #71
Size difference?

Re: FLAC v1.3.3

Reply #72
Size difference?
Executables are larger due to enabling loop unrolling, FLAC compression is different due to -march=haswell.

I was able to narrow the FLAC compression difference down to the -mfma flag. AVX+AVX2 is fine, and FLAC built with -mfma and configured with --disable-avx --disable-asm-optimizations shows the same difference, so it's probably GCC/FMA's fault.

Re: FLAC v1.3.3

Reply #73
Yeah, I meant compression difference. I was surprised, as FLAC does integer math only, doesn't it? It made me wonder how a recompile could make the resulting files differ...

Re: FLAC v1.3.3

Reply #74
AFAIU, decoding is integer only but libFLAC uses floating point unless FLAC__INTEGER_ONLY_LIBRARY macro is defined.
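
To illustrate why FMA can change floating-point results at all: a fused multiply-add rounds once, after the addition, while a separate multiply and add round twice. The sketch below emulates the fused case with exact rational arithmetic in double precision. It is a generic illustration with values chosen to make the effect visible, not libFLAC code, and whether a compiler actually fuses a given expression depends on its flags.

```python
from fractions import Fraction

def mul_add_separate(a, b, c):
    """a*b rounded to double, then +c rounded again (no FMA)."""
    return a * b + c

def mul_add_fused(a, b, c):
    """Emulate FMA: compute a*b+c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# The exact product is 1 - 2**-60, which is too close to 1.0 to be
# represented as a double on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_add_separate(a, b, c))  # product rounds to 1.0 before the add
print(mul_add_fused(a, b, c))     # keeps the -2**-60 term
```

A tiny difference like this in an encoder's floating-point analysis stage can tip a decision between two candidate predictors, which is enough to produce a different (still lossless) file.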

For those watching this thread, I'll probably be releasing new builds soon with FMA disabled so compression is reproducible with other builds (confirmed MSVC reproduces same compression). I also noticed I had -D__USE_MINGW_ANSI_STDIO=1 disabled for the previous builds, although I'll leave that disabled since it incurs a very minor performance penalty that shouldn't be necessary for ogg/FLAC.

I'll have to build, test, package and benchmark before I release. However, I did see a ~4-5% gain on a 2700X with --verify --best after disabling FMA.

 