Hydrogenaudio Forums

Hydrogenaudio Forum => Scientific Discussion => Topic started by: enzo on 2018-04-29 23:10:23

Title: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-04-29 23:10:23
While working on the repacker for my multi-threaded MP3 converter, I noticed that LAME uses a relatively slow CRC implementation. Further investigations showed that FLAC, Ogg and Monkey's Audio use similar algorithms for their CRC needs.

I replaced these algorithms with one called slicing-by-8 to see if conversions using these codecs would benefit from it. Turned out the benefit for lossy codecs is marginal, but lossless codecs are sped up quite significantly due to them generating larger output files.
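For readers who have not come across the technique, here is a rough Python sketch of slicing-by-8 next to the classic one-byte-at-a-time table lookup. It is an illustration only, not the code from the patches (those are C and use each codec's own polynomial). The function names are made up, and the zlib CRC-32 polynomial is used purely so the result can be checked against `zlib.crc32`.

```python
import zlib

POLY = 0xEDB88320  # reflected zlib CRC-32 polynomial (assumption for this demo)


def make_tables():
    """Build the 8 lookup tables used by slicing-by-8."""
    tables = [[0] * 256 for _ in range(8)]
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        tables[0][i] = c
    # tables[k][i] is the CRC of byte i followed by k zero bytes.
    for i in range(256):
        c = tables[0][i]
        for k in range(1, 8):
            c = tables[0][c & 0xFF] ^ (c >> 8)
            tables[k][i] = c
    return tables


T = make_tables()


def crc32_bytewise(data, crc=0):
    """Classic approach: one table lookup per input byte."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = T[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32_slice8(data, crc=0):
    """Slicing-by-8: consume 8 bytes per step using 8 independent lookups."""
    crc ^= 0xFFFFFFFF
    n = len(data) - len(data) % 8
    for i in range(0, n, 8):
        lo = crc ^ int.from_bytes(data[i:i + 4], "little")
        hi = int.from_bytes(data[i + 4:i + 8], "little")
        crc = (T[7][lo & 0xFF] ^ T[6][(lo >> 8) & 0xFF] ^
               T[5][(lo >> 16) & 0xFF] ^ T[4][lo >> 24] ^
               T[3][hi & 0xFF] ^ T[2][(hi >> 8) & 0xFF] ^
               T[1][(hi >> 16) & 0xFF] ^ T[0][hi >> 24])
    for b in data[n:]:  # leftover bytes fall back to the bytewise loop
        crc = T[0][(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

The eight lookups in each step do not depend on each other, which is what lets a CPU overlap them in compiled code. In Python the interpreter overhead dominates, so the sketch shows the structure rather than the speed.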

Here are the relative speed ups on my system (Core i7 6900K) using default settings:

Codec            Encode   Decode
LAME             0.5%     -
Opus             0.5%     1%
Vorbis           0.5%     2%
Monkey's Audio   4%       -
FLAC             5%       5%
Ogg FLAC         10%      15%

Note that Ogg FLAC benefits twice. The FLAC frames are checksummed by a CRC16 while the Ogg pages are run through a CRC32. Hence the larger speed up.
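To make the two checksums concrete, here is a small bit-by-bit Python sketch. The parameters are taken from the published format descriptions as far as I know them: FLAC frames use a CRC-16 with polynomial 0x8005 and initial value 0, and Ogg pages use a CRC-32 with polynomial 0x04C11DB7, initial value 0 and no final XOR, both processed most significant bit first. The function names are hypothetical and this is in no way the optimized code from the patches.

```python
def crc_msb_first(data, width, poly, crc=0):
    """Generic non-reflected CRC, one bit at a time (slow but easy to follow)."""
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    for byte in data:
        crc ^= byte << (width - 8)
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & top_bit else (crc << 1)
            crc &= mask
    return crc


def flac_frame_crc16(frame_bytes):
    # Assumed FLAC frame footer parameters: x^16 + x^15 + x^2 + 1, init 0.
    return crc_msb_first(frame_bytes, 16, 0x8005)


def ogg_page_crc32(page_bytes):
    # Assumed Ogg page parameters: poly 0x04C11DB7, init 0, no final XOR.
    # A real Ogg page is checksummed with its CRC field set to zero.
    return crc_msb_first(page_bytes, 32, 0x04C11DB7)
```

In an Ogg FLAC file every audio byte passes through both functions, once inside a FLAC frame and once inside an Ogg page, which matches the observation that the container benefits twice.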

The patches to change the CRC algorithms to slicing-by-8 can be found here:


And here's the full article (https://freac.org/developer-blog-mainmenu-9/14-freac/277-fastcrc) about this.

Edit: I updated the FLAC patch to fix a bug when using the --enable-64-bit-words option. See here (https://github.com/xiph/flac/pull/57#issuecomment-385432631) for more information.
Edit2: A proof-of-concept FLAC build can now be found under post #9 (https://hydrogenaud.io/index.php/topic,115900.msg956501.html#msg956501) along with some performance numbers.
Edit3: The Monkey's Audio patch has been integrated into the official Monkey's Audio 4.34 (http://www.monkeysaudio.com/) release.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: saratoga on 2018-04-29 23:56:56
Neat! If I ever had free time I'd love to look at optimizing lame or vorbis. I think the last time I checked, a lot of lame optimizations were implemented for an athlon or P4.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: jsdyson on 2018-04-30 02:28:34
Excellent to see more real software development being discussed on this site! Software is a major component of audio processing nowadays and should be encouraged more. Sure, there are DSP sites, but audio-specific topics are sometimes not appropriate on general-purpose DSP sites. BTW, I think I am going to continue my DSP-related posts on audio gain control, NR, etc. on the DSPRelated site. It isn't a big deal, except for those who are starting a development effort. I don't have a strong financial incentive, so helping others is more important to me.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Porcus on 2018-04-30 07:19:20
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
(I recall someone (Gregory?) complaining over the time taken to create MD5 from the FLAC audio. But MD5 is not CRC* and encoding is done once.)
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-04-30 10:18:10
Quote
Neat! If I ever had free time I'd love to look at optimizing lame or vorbis. I think the last time I checked, a lot of lame optimizations were implemented for an athlon or P4.
For LAME there are TMKK's SSE optimizations (http://tmkk.undo.jp/lame/index_e.html) - the patch uses mostly SSE2, but can use some SSE3 and SSE4.1 instructions if available.

For Vorbis there is the Lancer version (https://hydrogenaud.io/index.php/topic,115774.0.html) of course. It uses SSE, SSE2 or SSE3 instructions.

If you would like to improve these patches, they both could use AVX and AVX2 optimizations which might provide significant performance improvements. Also, the LAME patch is using inline assembly which renders it not very portable. Reimplementing it with intrinsics would be great.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-04-30 10:19:47
Quote
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
Yes. I guess CRC was simply overlooked in previous attempts to optimize FLAC.

However, speeding up FLAC decoding involved more than just replacing the CRC algorithm. The CRC value was updated whenever a word (4 or 8 bytes) had been processed by the decoder. I changed that so that more bytes are processed at once. The CRC is now updated only when the read buffer is flushed and at the end of each frame.
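The idea of deferring the CRC update can be sketched as follows. This is a toy model with made-up names, and `zlib.crc32` stands in for FLAC's CRC-16 just to keep it short. The reader remembers where the not-yet-checksummed bytes start and only folds them into the CRC when the buffer is about to be replaced or when the frame ends.

```python
import zlib


class DeferredCrcReader:
    """Toy bit reader that checksums whole buffer spans instead of single words."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b""
        self._pos = 0          # next unread byte in the buffer
        self._crc_start = 0    # first consumed byte not yet in the CRC
        self.crc = 0
        self.crc_updates = 0   # how often the CRC routine was called

    def _fold(self):
        """Fold all consumed-but-unchecksummed bytes into the CRC in one call."""
        if self._pos > self._crc_start:
            span = self._buf[self._crc_start:self._pos]
            self.crc = zlib.crc32(span, self.crc)
            self.crc_updates += 1
        self._crc_start = self._pos

    def read_word(self, size=4):
        out = b""
        while len(out) < size:
            if self._pos == len(self._buf):
                self._fold()                   # buffer is about to be flushed
                nxt = next(self._chunks, None)
                if nxt is None:
                    break                      # end of input
                self._buf = nxt
                self._pos = 0
                self._crc_start = 0
            take = min(size - len(out), len(self._buf) - self._pos)
            out += self._buf[self._pos:self._pos + take]
            self._pos += take
        return out

    def end_frame(self):
        """Finish the current frame and return its CRC."""
        self._fold()
        crc, self.crc = self.crc, 0
        return crc
```

Reading 200 bytes in 4-byte words from three buffers triggers three CRC calls here, where a per-word update would need fifty. Longer spans are also what a slicing-by-8 routine needs to pay off.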

MD5 is a different beast and I see no way to speed it up significantly at this time. Nayuki did some MD5 optimizations (https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly), but his assembler version is only 10% faster than the C version - that would not make a noticeable difference when applied to FLAC's MD5 calculation.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: dutch109 on 2018-04-30 16:27:50
Thanks Enzo, for the valuable work and article.

From the article:
Quote
It's possible to speed up the CRC calculations even more using other methods such as using the PCLMULQDQ instruction on modern x86 CPUs. However, that would make the code depend on that platform and probably provide only marginal additional speed gains.
SSE 4.2 also has a CRC instruction that I expect would be considerably more efficient.

https://en.wikipedia.org/wiki/SSE4#SSE4.2
https://www.felixcloutier.com/x86/CRC32.html

All modern compilers support the associated intrinsics:
https://msdn.microsoft.com/en-us/library/bb514033(v=vs.120).aspx
https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/X86-Built-in-Functions.html
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=crc

However the polynomial value is hardcoded and I don't know if it matches the one used by the various audio formats.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-04-30 17:20:27
Quote
SSE 4.2 also has a CRC instruction that I expect would be considerably more efficient.
[...]
However the polynomial value is hardcoded and I don't know if it matches the one used by the various audio formats.
Unfortunately, the hardcoded polynomial is different from the one used by Ogg and Monkey's Audio, so that instruction cannot be used.

LAME and FLAC use CRC16, so the CRC32 instruction is not applicable there anyway.
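A quick way to see why the instruction cannot simply be dropped in: the SSE4.2 CRC32 instruction implements CRC-32C (Castagnoli polynomial 0x1EDC6F41), while Ogg is based on 0x04C11DB7. The sketch below runs the same bit-by-bit routine with both polynomials. It uses the reflected form for both only to isolate the effect of the polynomial (Ogg itself uses the non-reflected form), and the function name is hypothetical.

```python
CRC32C_POLY = 0x82F63B78  # bit-reflected 0x1EDC6F41, used by the SSE4.2 instruction
CRC32_POLY = 0xEDB88320   # bit-reflected 0x04C11DB7


def crc32_reflected(data, poly):
    """Bit-by-bit reflected CRC-32 with init and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


sample = b"123456789"
same = crc32_reflected(sample, CRC32C_POLY) == crc32_reflected(sample, CRC32_POLY)
print("checksums match:", same)
```

Because the polynomial is baked into the hardware, a checksum produced by the instruction would not validate against what an existing decoder expects.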
Title: Re: Speeding up codecs with faster CRC calculations
Post by: saratoga on 2018-04-30 17:31:14
Quote
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
(I recall someone (Gregory?) complaining over the time taken to create MD5 from the FLAC audio. But MD5 is not CRC* and encoding is done once.)

The stock FLAC decoder is (or at least was 10 years ago) relatively unoptimized. The numbers we have in Rockbox, where you can decode a FLAC file at 5 or 10 MHz, are for the ffmpeg FLAC decoder with CRC disabled and a lot of hand-written ARM assembly.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-04-30 17:55:51
I built some FLAC compiles for you to try out.

Built for Win64 with GCC 4.9.2 and -O3 -march=nehalem -funroll-loops. FLAC is configured with --enable-64-bit-words.

Here are some numbers for converting a 2.5-hour, 48 kHz, 16-bit stereo WAV:

Codec              Encode     Decode
FLAC               35.711s    13.646s
FLAC fastCRC       32.850s    12.411s
Ogg FLAC           38.531s    16.085s
Ogg FLAC fastCRC   33.517s    12.627s
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2018-04-30 18:57:49
Hi enzo,

just tested your x64 binary on my ancient i5-2400 with my usual set of test files.
I can confirm an encoding speed boost of 4.3% when compared with the latest git version.
Thank you very much for your efforts - quite interesting that no one thought of optimizing the CRC functions...

.sundance.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: TuNk77 on 2018-05-02 19:51:28
I tested with a small WAV file (219 MB) and got the following result with my Intel i5 4460 CPU (4x 3.2 GHz):

Using FLAC-1.3.2_Git-2018-04-08_Win_GCC730 I got: Total encoding time: 0:07.629, 170.86x realtime

Using flac-1.3.2-fastcrc-win64 I got: Total encoding time: 0:06.240, 208.89x realtime
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-05-02 20:43:28
Quote
Using FLAC-1.3.2_Git-2018-04-08_Win_GCC730 I got: Total encoding time: 0:07.629, 170.86x realtime
Using flac-1.3.2-fastcrc-win64 I got: Total encoding time: 0:06.240, 208.89x realtime
Great, but builds from different compilers and possibly built with different options are not really comparable. That's why I include a regular flac.exe and the flac-fastcrc.exe in my ZIP.

With my build being 22% faster in your run, I suspect the GCC 7.3 build was not compiled with ideal options. It should be closer to about 5% difference otherwise.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: TuNk77 on 2018-05-02 21:42:40
Quote
Using FLAC-1.3.2_Git-2018-04-08_Win_GCC730 I got: Total encoding time: 0:07.629, 170.86x realtime
Using flac-1.3.2-fastcrc-win64 I got: Total encoding time: 0:06.240, 208.89x realtime
Great, but builds from different compilers and possibly built with different options are not really comparable. That's why I include a regular flac.exe and the flac-fastcrc.exe in my ZIP.
With my build being 22% faster in your run, I suspect the GCC 7.3 build was not compiled with ideal options. It should be closer to about 5% difference otherwise.

Ah, I see. Sorry for not doing things properly.

EDIT:
I tested with the same wav-file again, I got these results using the flac exe's included in flac-1.3.2-fastcrc-win64
flac.exe: Total encoding time: 0:07.316, 178.17x realtime
flac fastcrc: Total encoding time: 0:06.006, 217.03x realtime
Title: Re: Speeding up codecs with faster CRC calculations
Post by: TuNk77 on 2018-05-02 22:52:17
Sorry for posting again, but I tested again with a larger WAV file (607 MB).
These are the results:

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\flac\U2 - Pop.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\flac\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:25.896, 139.46x realtime

--

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\U2 - Pop.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:16.100, 224.31x realtime
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-05-02 23:25:14
Quote
I tested with the same wav-file again, I got these results using the flac exe's included in flac-1.3.2-fastcrc-win64
flac.exe: Total encoding time: 0:07.316, 178.17x realtime
flac fastcrc: Total encoding time: 0:06.006, 217.03x realtime
OK, so that's still a 22% difference. I didn't expect that, as on my system the difference between those binaries is only about 5% to 7%. Would be interesting to see more results from others.
Quote
Destination file: F:\flac-1.3.2-fastcrc-win64\flac\U2 - Pop.flac
Total encoding time: 0:25.896, 139.46x realtime
--
Destination file: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\U2 - Pop.flac
Total encoding time: 0:16.100, 224.31x realtime
Here it could be that for the first run the .wav was not in the file system cache. The difference seems too big. Could you run the non-fast test again (maybe twice to see if the results are constant)?
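The methodology suggested here (repeat the measurement and do not trust a first run that may include reading the file from disk) can be sketched in a few lines of Python. The helper names are hypothetical. `task` is any callable; for an encoder it could be a small function that launches the command line via `subprocess.run`.

```python
import time


def time_runs(task, runs=5, warmup=1):
    """Run `task` a few times untimed (to warm caches), then time `runs` runs."""
    for _ in range(warmup):
        task()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return timings


def spread_percent(timings):
    """Relative gap between the slowest and fastest run, in percent."""
    fastest = min(timings)
    if fastest == 0:
        return 0.0
    return (max(timings) - fastest) / fastest * 100.0
```

If the spread between repeated runs is much smaller than the difference between two binaries, the difference is probably real. If a single run sticks out, as the 25.9 s result does, a cold cache is a likely explanation.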
Title: Re: Speeding up codecs with faster CRC calculations
Post by: TuNk77 on 2018-05-02 23:38:48
Quote
I tested with the same wav-file again, I got these results using the flac exe's included in flac-1.3.2-fastcrc-win64
flac.exe: Total encoding time: 0:07.316, 178.17x realtime
flac fastcrc: Total encoding time: 0:06.006, 217.03x realtime
OK, so that's still a 22% difference. I didn't expect that, as on my system the difference between those binaries is only about 5% to 7%. Would be interesting to see more results from others.
Destination file: F:\flac-1.3.2-fastcrc-win64\flac\U2 - Pop.flac
Total encoding time: 0:25.896, 139.46x realtime
--
Destination file: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\U2 - Pop.flac
Total encoding time: 0:16.100, 224.31x realtime
Here it could be that for the first run the .wav was not in the file system cache. The difference seems too big. Could you run the non-fast test again (maybe twice to see if the results are constant)?

Sure no problem, here are the results for the non-fast flac.exe:

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast1.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast1.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.020, 212.18x realtime

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast2.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast2.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.051, 211.80x realtime

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast3.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast3.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.004, 212.38x realtime

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast4.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast4.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.144, 210.65x realtime

CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast5.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast5.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:16.957, 212.97x realtime
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-05-02 23:46:51
Quote
Sure no problem, here are the results for the non-fast flac.exe:
Thanks! So it seems to settle around 212x while the fast version was 224x realtime. That's about exactly what I expected - 5 to 6% improvement.
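For anyone who wants to redo the arithmetic behind the percentages in this thread: with elapsed times the baseline goes in the numerator, with "x realtime" rates it goes in the denominator. A tiny Python sketch with made-up helper names:

```python
def speedup_from_times(baseline_s, candidate_s):
    """Percent speed-up when the inputs are elapsed times (smaller is better)."""
    return (baseline_s / candidate_s - 1.0) * 100.0


def speedup_from_rates(baseline_x, candidate_x):
    """Percent speed-up when the inputs are 'x realtime' rates (larger is better)."""
    return (candidate_x / baseline_x - 1.0) * 100.0


# Figures quoted in the posts above.
print(round(speedup_from_times(7.629, 6.240), 1))    # GCC 7.3 build vs. fastcrc build
print(round(speedup_from_rates(212.18, 224.31), 1))  # repeated non-fast run vs. fastcrc run
```

The second line uses the first of the five repeated non-fast runs (212.18x) against the 224.31x fastcrc run and lands in the 5 to 6% range mentioned above.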
Title: Re: Speeding up codecs with faster CRC calculations
Post by: AiZ on 2018-05-03 00:24:03
Hello,

I'm still running a Core 2 Duo E7200@2.86GHz and your binary works, even though that CPU is not supposed to support SSE4.2 and POPCNT.

Converting a 592MB wav file (time in seconds):
               Run1     Run2     Run3
flac           15.009   15.007   15.013
flac-fastcrc   13.912   13.961   14.092

Quite interesting.

    AiZ
Title: Re: Speeding up codecs with faster CRC calculations
Post by: AiZ on 2018-05-08 16:45:04
Hi,

I eventually got some parts to assemble my new PC, now sporting an incredible...
Pentium G5400.
Woohoo!  :D

Same wav file as above (time in seconds):
               Run1    Run2    Run3
flac           7.090   7.066   7.058
flac-fastcrc   6.180   6.127   6.237

Still great.

    AiZ
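With three runs per binary, averaging before comparing gives a single figure per machine. A minimal Python sketch using the run times from the two tables above (the helper name is hypothetical):

```python
from statistics import mean


def mean_speedup_percent(baseline_runs, candidate_runs):
    """Speed-up in percent, based on the mean elapsed time of each set of runs."""
    return (mean(baseline_runs) / mean(candidate_runs) - 1.0) * 100.0


core2duo = mean_speedup_percent([15.009, 15.007, 15.013], [13.912, 13.961, 14.092])
g5400 = mean_speedup_percent([7.090, 7.066, 7.058], [6.180, 6.127, 6.237])
print(f"Core 2 Duo E7200: {core2duo:.1f}%  Pentium G5400: {g5400:.1f}%")
```

Averaging assumes the runs are comparable. If one run is a clear outlier (for example a cold-cache first run), taking the median or dropping that run may be the safer choice.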
Title: Re: Speeding up codecs with faster CRC calculations
Post by: saratoga on 2018-05-08 16:58:30
Enzo,

Do you have any performance profile data for these codecs on current gen CPUs? I don't have time to dig into this right now, but I'd like to eventually and I'm curious if things have changed the last 10-15 years of new CPU hardware.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-05-09 13:41:59
Quote
Do you have any performance profile data for these codecs on current gen CPUs? I don't have time to dig into this right now, but I'd like to eventually and I'm curious if things have changed the last 10-15 years of new CPU hardware.
You mean regarding optimizations in general (with new instruction sets like AVX), yes? Not related to the CRC thing.

No, I didn't do any performance profiling. Same for me as for you: I'd like to take a deeper dive into this and try to further optimize codecs, but don't really have time to do it.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: saratoga on 2018-05-09 15:26:26
Yeah, was just curious what the breakdown of codec runtime was by function. I think I last looked at lame performance in 2006 on a 32 bit Pentium 4. Probably things have changed since then.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2018-12-04 14:02:37
Since it's been a while:
Has this great patch already been included to the official repository? (https://git.xiph.org/?p=flac.git;a=summary)
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-12-04 14:32:02
Quote
Has this great patch already been included to the official repository?
Yes, it's there: https://git.xiph.org/?p=flac.git;h=5579f29 :)

Should become part of the next official FLAC release.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2018-12-06 07:03:10
@Enzo:
Just ran my test set of FLACs again with John33's ICL18 x64 compile from 18.11.2018.
It took 68.852 sec compared to 70.646 sec with an ICL17 x64 compile from October 2017 (before the CRC patch).

That said, the compile you provided is still the fastest one (66.794 sec) on my i5-2400...
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2018-12-06 09:44:16
Quote
That said, the compile you provided is still the fastest one (66.794 sec) on my i5-2400...
Might be due to my build using --enable-64-bit-words which is a bit faster on x64. Not sure if John33's builds have that option enabled.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-09 10:24:59
Quote
Built for Win64 with GCC 4.9.2 and -O3 -march=nehalem -funroll-loops. FLAC is configured with --enable-64-bit-words

Since your build is still the fastest out there:
Did you try to compile the current FLAC version 1.3.3 with your settings?

Maybe I could try to set up the current version of GCC's C++ compiler and use your settings. You got any tips there?
What's the version you are using atm?
Title: Re: Speeding up codecs with faster CRC calculations
Post by: enzo on 2019-08-09 19:53:52
Quote
Did you try to compile the current FLAC version 1.3.3 with your settings?
I have compiled the new release and attached it here. Used the same settings as above and it's even slightly (~1.5%) faster than the fastcrc preview build (likely due to other optimizations added since 1.3.2).

If you'd like to try building it yourself:

I'm using my custom fre:ac Component Development Kit for compiling. It's still based on GCC 4.9.2 as I plan to update to a newer version only after the fre:ac 1.1 release. The toolkit can be found here: https://github.com/enzo1982/BoCA/releases

The script for building libogg and flac can be found here: https://github.com/enzo1982/freac/blob/ebbe9cb95b507617b5d3cd205329337be80a8012/tools/build-codecs
Changes for this build are replacing -march=nocona with -march=nehalem and removing the -mno-sse3 option in line 27 as well as adding --disable-shared in line 302.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-10 08:37:09
Hi enzo,

thank you so much for your build of FLAC v1.3.3. Your settings are AWESOME!
I tested it with my set of 40 WAV files (~ 1.9 GB) and compared it to your fastcrc version of v1.3.2 and Wombat's VS 2017 compile.
I used an i7-6700 @ 3.4 GHz, the WAVs stored on a Samsung EVO 850 SSD. Timings were done with Igor Pavlov's timer64.exe.

In this configuration, the difference between your fastcrc and the 1.3.3 build is marginal, with a tiny advantage for the 1.3.3 version, but that's within the range of measurement inaccuracies. But compared to the VS build, the difference is REMARKABLE:

Code:
flac132-fastcrc.exe (enzo):
Kernel  Time =     2.609 =    6%
User    Time =    35.421 =   89%
Process Time =    38.031 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.359 =  100%    Physical Memory =     14 MB

flac133_enzo.exe:
Kernel  Time =     2.234 =    5%
User    Time =    35.750 =   90%
Process Time =    37.984 =   96%    Virtual  Memory =     14 MB
Global  Time =    39.342 =  100%    Physical Memory =     17 MB

flac133_vs2017.exe:
Kernel  Time =     2.796 =    6%
User    Time =    39.218 =   89%
Process Time =    42.015 =   95%    Virtual  Memory =     14 MB
Global  Time =    43.897 =  100%    Physical Memory =     16 MB

Thanks also for the instructions to set up the toolchain for GCC. Will try that on the next rainy weekend... ;-)
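When comparing several builds, reports in the layout pasted above can be parsed automatically. The sketch below is Python with hypothetical helper names, and it assumes each block starts with a label line ending in a colon followed by a "Global  Time = ..." line, as in the listing above. It makes no claim about timer64's output format beyond what is shown in this thread.

```python
import re

GLOBAL_RE = re.compile(r"Global\s+Time\s*=\s*([0-9.]+)")


def parse_global_times(report):
    """Map each 'label:' line to the Global Time (seconds) reported below it."""
    results = {}
    label = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            label = line[:-1]
        else:
            match = GLOBAL_RE.match(line)
            if match and label is not None:
                results[label] = float(match.group(1))
    return results


def slower_by_percent(times, reference, other):
    """How much longer `other` took than `reference`, in percent."""
    return (times[other] / times[reference] - 1.0) * 100.0


report = """
flac133_enzo.exe:
Kernel  Time =     2.234 =    5%
Global  Time =    39.342 =  100%    Physical Memory =     17 MB

flac133_vs2017.exe:
Kernel  Time =     2.796 =    6%
Global  Time =    43.897 =  100%    Physical Memory =     16 MB
"""
times = parse_global_times(report)
print(round(slower_by_percent(times, "flac133_enzo.exe", "flac133_vs2017.exe"), 1))
```

Global Time is wall-clock time, so it includes disk I/O. Comparing User Time instead would isolate the CPU work of the encoder.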
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sanskrit44 on 2019-08-10 12:46:22
Quote
I have compiled the new release and attached it here.

thanks a lot, @enzo - very appreciated! i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares!  got to test this now in comparison 8)
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-10 17:41:30
Quote
i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares!
Same here! And thanks to enzo's excellent tuning of the compiler switches, the free GCC compiler outperforms those from Intel and Microsoft. Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
Code:
Kernel  Time =     2.671 =    5%
User    Time =    47.625 =   91%
Process Time =    50.296 =   96%    Virtual  Memory =     14 MB
Global  Time =    51.852 =  100%    Physical Memory =     14 MB
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Rollin on 2019-08-10 18:31:18
Quote
Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
flac.exe from Free_Encoder_Pack was made to work even on ancient CPUs like Pentium 2.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: lvqcl on 2019-08-10 18:46:26
Quote
Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
Probably the main reason for it is that it's 32-bit. Because yes, compatibility.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sanskrit44 on 2019-08-10 20:32:22
Quote
i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares!
Same here! And thanks to enzo's excellent tuning of the compiler switches, the free GCC compiler outperforms those from Intel and Microsoft.

i just made a little 1.33 comparison, not as profound as yours and under wine 64bit with fb2k. anyway here is the result. surprisingly, netrangers build is the fastest right now :)

netranger: 345x
enzo: 300x
wombat: 244x

enzo flac 1.32: 370x
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-11 12:30:21
Quote
Probably the main reason for it is that it's 32-bit. Because yes, compatibility.
Quote
flac.exe from Free_Encoder_Pack was made to work even on ancient CPUs like Pentium 2
That makes perfect sense! And it is very important for the targeted audience = Everyone!
Thanks for the explanation!
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Case on 2019-08-11 17:49:42
@sundance, care to try this build?
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-12 12:49:33
Atm, I just can run the test on my office computer:
Code:
i5-6500 @ 3.2 GHz, 16 GB RAM

flac133_NetRanger:
Kernel  Time =     3.109 =    5%
User    Time =    52.562 =   90%
Process Time =    55.671 =   95%    Virtual  Memory =     14 MB
Global  Time =    58.036 =  100%    Physical Memory =     14 MB

flac133_VS2017:
Kernel  Time =     3.109 =    6%
User    Time =    46.343 =   90%
Process Time =    49.453 =   96%    Virtual  Memory =     14 MB
Global  Time =    51.172 =  100%    Physical Memory =     13 MB

flac133_enzo.exe:
Kernel  Time =     2.890 =    6%
User    Time =    40.734 =   90%
Process Time =    43.625 =   96%    Virtual  Memory =     14 MB
Global  Time =    45.233 =  100%    Physical Memory =     14 MB

flac133_Case.exe:
Kernel  Time =     2.890 =    6%
User    Time =    39.796 =   90%
Process Time =    42.687 =   97%    Virtual  Memory =     14 MB
Global  Time =    43.916 =  100%    Physical Memory =     14 MB

Lo and behold! Another speed increase of ~3%...
As soon as I'm back on my own computer, I'll keep you posted on the results there.
What compiler/settings did you use?
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Case on 2019-08-12 13:04:14
I used GCC 4.9.3 (oldest one I had archived) with CFLAGS -O3 -m64 -march=haswell -funroll-loops -fno-stack-protector.

It looks like the newer the GCC version is the slower the compiled FLAC binary becomes. With the same settings as enzo used my compile was a tiny fraction slower. GCC 7.3 produced somewhat slower binary and GCC 9.1 somewhat slower than that.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-12 13:14:23
Did another run on an i7 at hand:
Code:
i7-4790 @ 3.6 GHz, 16 GB RAM

flac133_enzo.exe:
Kernel  Time =     2.046 =    4%
User    Time =    39.484 =   92%
Process Time =    41.531 =   97%    Virtual  Memory =     14 MB
Global  Time =    42.661 =  100%    Physical Memory =     14 MB

flac133_Case.exe:
Kernel  Time =     2.437 =    5%
User    Time =    42.265 =   92%
Process Time =    44.703 =   97%    Virtual  Memory =     14 MB
Global  Time =    45.705 =  100%    Physical Memory =     14 MB
Quite surprisingly, on that CPU your GCC 4.9.3 build was slower than enzo's AND even slower than on my i5 office computer, even though the i5 has a lower clock speed...
Now I'm really curious how it performs on my i7-6700 @ 3.4 GHz, which is the same CPU family as the office PC (Skylake)
Title: Re: Speeding up codecs with faster CRC calculations
Post by: sundance on 2019-08-12 16:39:09
Here we are with the results on my i7-6700 @ 3.4 GHz CPU. For both binaries, I did 6 runs (only 3 for the previous tests) and calculated the average value of timer64's "Global time":
Code:
flac133_enzo.exe:
Global Time: 39.174 sec

flac133_Case.exe:
Global Time: 38.058 sec
So we see a ~3% benefit also, like with the i5-Skylake CPU in my office.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Case on 2019-08-12 17:25:31
Thanks for testing. Weird that the i7-4790 performed the way it did as my i7-4771 shows my build to be faster.

Would be curious to know how these builds behave on AMD or older Intels. I'm a bit worried about processor specific optimization flags there.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: lvqcl on 2019-08-12 17:48:32
BTW, running configure with --enable-64-bit-words option should give some slight speedup for 64-bit executables.
Title: Re: Speeding up codecs with faster CRC calculations
Post by: Case on 2019-08-12 17:54:55
I used that in all GCC compile tests but neglected to mention it. Otherwise there would be no hope of getting speeds anywhere close to enzo's.