While working on the repacker for my multi-threaded MP3 converter, I noticed that LAME uses a relatively slow CRC implementation. Further investigations showed that FLAC, Ogg and Monkey's Audio use similar algorithms for their CRC needs.
I replaced these algorithms with one called slicing-by-8 to see if conversions using these codecs would benefit from it. Turned out the benefit for lossy codecs is marginal, but lossless codecs are sped up quite significantly due to them generating larger output files.
Here are the relative speed ups on my system (Core i7 6900K) using default settings:
Codec | Encode | Decode |
LAME | 0.5% | - |
Opus | 0.5% | 1% |
Vorbis | 0.5% | 2% |
Monkey's Audio | 4% | - |
FLAC | 5% | 5% |
Ogg FLAC | 10% | 15% |
Note that Ogg FLAC benefits twice. The FLAC frames are checksummed by a CRC16 while the Ogg pages are run through a CRC32. Hence the larger speed up.
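For readers who have not met slicing-by-8 before, here is a minimal Python sketch of the idea, written for readability and not taken from the patches (which are C). It assumes the parameters of the Ogg page checksum (polynomial 0x04C11DB7, most significant bit first, initial value 0, no final XOR). The function and table names are made up for this illustration. The byte-wise loop does one table lookup per input byte, while the sliced loop consumes eight bytes per iteration using eight precomputed tables.

```python
POLY = 0x04C11DB7   # generator polynomial of the Ogg page checksum
MASK = 0xFFFFFFFF


def make_tables(slices=8):
    """Build `slices` lookup tables with 256 entries each (MSB-first CRC-32)."""
    base = []
    for byte in range(256):
        crc = byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
        base.append(crc)
    tables = [base]
    for k in range(1, slices):
        prev = tables[k - 1]
        # tables[k][b] is the CRC state after byte b followed by k zero bytes
        tables.append([((v << 8) & MASK) ^ base[v >> 24] for v in prev])
    return tables


TABLES = make_tables()


def crc32_bytewise(data, crc=0):
    """Classic table-driven CRC: one lookup per input byte."""
    t0 = TABLES[0]
    for b in data:
        crc = ((crc << 8) & MASK) ^ t0[(crc >> 24) ^ b]
    return crc


def crc32_slicing8(data, crc=0):
    """Slicing-by-8: eight independent lookups per 8-byte block."""
    t = TABLES
    n = len(data) - len(data) % 8
    for i in range(0, n, 8):
        # The running CRC is folded into the first four bytes of the block.
        b0 = data[i] ^ (crc >> 24)
        b1 = data[i + 1] ^ ((crc >> 16) & 0xFF)
        b2 = data[i + 2] ^ ((crc >> 8) & 0xFF)
        b3 = data[i + 3] ^ (crc & 0xFF)
        crc = (t[7][b0] ^ t[6][b1] ^ t[5][b2] ^ t[4][b3] ^
               t[3][data[i + 4]] ^ t[2][data[i + 5]] ^
               t[1][data[i + 6]] ^ t[0][data[i + 7]])
    # Fewer than eight bytes left: finish with the byte-wise loop.
    return crc32_bytewise(data[n:], crc)
```

Both functions return the same value for any input. The gain in a C implementation comes from the eight lookups per block not depending on each other, so the CPU can overlap them. Python will not show that speed-up; the sketch only demonstrates that the results are equivalent.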
The patches to change the CRC algorithms to slicing-by-8 can be found here:
- LAME 3.100: lame-3.100-fastcrc.diff (https://freac.org/patches/lame-3.100-fastcrc.diff)
- FLAC 1.3.2: flac-1.3.2-fastcrc.diff (https://freac.org/patches/flac-1.3.2-fastcrc.diff)
- Ogg 1.3.3: libogg-1.3.3-fastcrc.diff (https://freac.org/patches/libogg-1.3.3-fastcrc.diff)
- Monkey's Audio 4.33: mac-sdk-4.33-fastcrc.patch (https://freac.org/patches/mac-sdk-4.33-fastcrc.patch)
And here's the full article (https://freac.org/developer-blog-mainmenu-9/14-freac/277-fastcrc) about this.
Edit: I updated the FLAC patch to fix a bug when using the --enable-64-bit-words option. See here (https://github.com/xiph/flac/pull/57#issuecomment-385432631) for more information.
Edit2: A proof-of-concept FLAC build can now be found under post #9 (https://hydrogenaud.io/index.php/topic,115900.msg956501.html#msg956501) along with some performance numbers.
Edit3: The Monkey's Audio patch has been integrated into the official Monkey's Audio 4.34 (http://www.monkeysaudio.com/) release.
Neat! If I ever had free time I'd love to look at optimizing lame or vorbis. I think the last time I checked, a lot of lame optimizations were implemented for an athlon or P4.
Excellent to see more real software development being talked about on this site!!! Software is a major component of audio processing nowadays, and should be encouraged more -- sure there are DSP sites, but audio-specific topics are not always appropriate on general-purpose DSP sites. BTW, I think that I am going to continue my DSP-related posts on audio gain control/NR/etc. on the DSPrelated site. It isn't a big deal, except for those who are starting a development effort. I don't have a strong financial incentive, so helping others is more important to me.
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
(I recall someone (Gregory?) complaining over the time taken to create MD5 from the FLAC audio. But MD5 is not CRC* and encoding is done once.)
Neat! If I ever had free time I'd love to look at optimizing lame or vorbis. I think the last time I checked, a lot of lame optimizations were implemented for an athlon or P4.
For LAME there are TMKK's SSE optimizations (http://tmkk.undo.jp/lame/index_e.html) - the patch uses mostly SSE2, but can use some SSE3 and SSE4.1 instructions if available.
For Vorbis there is the Lancer version (https://hydrogenaud.io/index.php/topic,115774.0.html) of course. It uses SSE, SSE2 or SSE3 instructions.
If you would like to improve these patches, they both could use AVX and AVX2 optimizations which might provide significant performance improvements. Also, the LAME patch is using inline assembly which renders it not very portable. Reimplementing it with intrinsics would be great.
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
Yes. I guess CRC was simply overlooked in previous attempts to optimize FLAC.
However, speeding up FLAC decoding involved more than just replacing the CRC algorithm. The CRC value was updated whenever a word (4 or 8 bytes) had been processed by the decoder. I changed that so that more bytes are processed at once. The CRC is now updated only when the read buffer is flushed and at the end of each frame.
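The change described above relies on the CRC being a streaming checksum: feeding it many small pieces or a few large ones gives the same result. A rough Python sketch of the two strategies, assuming FLAC's frame CRC-16 (polynomial 0x8005, most significant bit first, initial value 0); the function names, the word size and the buffer size are hypothetical and do not come from the FLAC sources:

```python
WORD = 8  # bytes per word, assuming a build with 64-bit words


def crc16_update(crc, data):
    """Bitwise CRC-16 with polynomial 0x8005, MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc_per_word(frame):
    """Old behaviour: update the CRC every time a word has been consumed."""
    crc = 0
    for i in range(0, len(frame), WORD):
        crc = crc16_update(crc, frame[i:i + WORD])
    return crc


def crc_deferred(frame, buffer_size=4096):
    """New behaviour: update only on buffer flush and at the end of the frame."""
    crc = 0
    start = 0  # first byte not yet covered by the CRC
    pos = 0
    while pos < len(frame):
        pos = min(pos + WORD, len(frame))  # consume a word, CRC untouched
        if pos - start >= buffer_size:     # read buffer is about to be flushed
            crc = crc16_update(crc, frame[start:pos])
            start = pos
    return crc16_update(crc, frame[start:pos])  # end of frame: cover the rest
```

Both functions return the same checksum. The deferred variant makes far fewer update calls over longer runs of bytes, which is what lets a multi-byte algorithm such as slicing-by-8 pay off.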
MD5 is a different beast and I see no way to speed it up significantly at this time. Nayuki did some MD5 optimizations (https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly), but his assembler version is only 10% faster than the C version - that would not make a noticeable difference when applied to FLAC's MD5 calculation.
Thanks Enzo, for the valuable work and article.
From the article:
It's possible to speed up the CRC calculations even more using other methods such as using the PCLMULQDQ instruction on modern x86 CPUs. However, that would make the code depend on that platform and probably provide only marginal additional speed gains.
SSE 4.2 also has a CRC instruction that I expect would be much more efficient.
https://en.wikipedia.org/wiki/SSE4#SSE4.2
https://www.felixcloutier.com/x86/CRC32.html
All modern compilers support the associated intrinsics:
https://msdn.microsoft.com/en-us/library/bb514033(v=vs.120).aspx
https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/X86-Built-in-Functions.html
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=crc
However the polynomial value is hardcoded and I don't know if it matches the one used by the various audio formats.
SSE 4.2 also has a CRC instruction that I expect would be much more efficient.
[...]
However the polynomial value is hardcoded and I don't know if it matches the one used by the various audio formats.
Unfortunately, the hardcoded polynomial is different from the one used by Ogg and Monkey's Audio, so that instruction cannot be used.
LAME and FLAC use CRC16, so the CRC32 instruction is not applicable there anyway.
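To illustrate why the polynomial matters: the SSE4.2 CRC32 instruction implements CRC-32C (Castagnoli polynomial 0x1EDC6F41, processed least significant bit first), while Ogg uses 0x04C11DB7 processed most significant bit first. A small Python sketch with plain bitwise loops shows that the two produce unrelated checksums for the same input. The helper names are made up for this example, and the initial value and final XOR used for CRC-32C here are the conventional ones applied in software, not something the instruction does by itself.

```python
def crc32_msb_first(data, poly, init=0, xorout=0):
    """Bitwise CRC-32, most significant bit first (the layout Ogg uses)."""
    crc = init
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ poly) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ xorout


def crc32_lsb_first(data, poly_reflected, init=0xFFFFFFFF, xorout=0xFFFFFFFF):
    """Bitwise CRC-32, least significant bit first (reflected polynomial)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly_reflected
            else:
                crc >>= 1
    return crc ^ xorout


OGG_POLY = 0x04C11DB7
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reversed form of 0x1EDC6F41


def ogg_crc(data):
    return crc32_msb_first(data, OGG_POLY)


def crc32c(data):
    return crc32_lsb_first(data, CRC32C_POLY_REFLECTED)


sample = b"123456789"
print(hex(ogg_crc(sample)), hex(crc32c(sample)))  # two different checksums
```

Because the generator polynomial is baked into the instruction, there is no way to make it compute the other checksum.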
Fun. Even for a codec optimized for decoding speed, you can gain five percent just for the CRC?
(I recall someone (Gregory?) complaining over the time taken to create MD5 from the FLAC audio. But MD5 is not CRC* and encoding is done once.)
The stock flac decoder is (or at least was 10 years ago) relatively unoptimized. The numbers we have in Rockbox where you can decode a flac file in 5 or 10 MHz are for the ffmpeg flac decoder with CRC disabled and a lot of hand written arm assembly.
I built some FLAC compiles for you to try out.
Built for Win64 with GCC 4.9.2 and -O3 -march=nehalem -funroll-loops. FLAC is configured with --enable-64-bit-words.
Here are some numbers converting a 2.5 hour, 48 kHz, 16-bit stereo WAV:
Codec | Encode | Decode |
FLAC | 35.711s | 13.646s |
FLAC fastCRC | 32.850s | 12.411s |
Ogg FLAC | 38.531s | 16.085s |
Ogg FLAC fastCRC | 33.517s | 12.627s |
Hi enzo,
just tested your x64 binary on my ancient i5-2400 with my usual set of test files.
I can confirm an encoding speed boost of 4.3% when compared with the latest git version.
Thank you very much for your efforts - quite interesting that no one thought of optimizing the CRC functions...
.sundance.
I tested with a small WAV-file (219 MB) and got the following result with my Intel i5 4460 CPU (4x 3.2 GHz):
Using FLAC-1.3.2_Git-2018-04-08_Win_GCC730 I got: Total encoding time: 0:07.629, 170.86x realtime
Using flac-1.3.2-fastcrc-win64 I got: Total encoding time: 0:06.240, 208.89x realtime
Using FLAC-1.3.2_Git-2018-04-08_Win_GCC730 I got: Total encoding time: 0:07.629, 170.86x realtime
Using flac-1.3.2-fastcrc-win64 I got: Total encoding time: 0:06.240, 208.89x realtime
Great, but builds from different compilers and possibly built with different options are not really comparable. That's why I include a regular flac.exe and the flac-fastcrc.exe in my ZIP.
With my build being 22% faster in your run, I suspect the GCC 7.3 build was not compiled with ideal options. It should be closer to about 5% difference otherwise.
Ah, I see. Sorry for not doing things properly.
EDIT:
I tested with the same wav-file again, I got these results using the flac exe's included in flac-1.3.2-fastcrc-win64
flac.exe: Total encoding time: 0:07.316, 178.17x realtime
flac fastcrc: Total encoding time: 0:06.006, 217.03x realtime
Sorry for posting again, but I tested again, with a larger wav-file (607 MB).
These are the results:
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\flac\U2 - Pop.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\flac\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:25.896, 139.46x realtime
--
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\U2 - Pop.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:16.100, 224.31x realtime
I tested with the same wav-file again, I got these results using the flac exe's included in flac-1.3.2-fastcrc-win64
flac.exe: Total encoding time: 0:07.316, 178.17x realtime
flac fastcrc: Total encoding time: 0:06.006, 217.03x realtime
OK, so that's still a 22% difference. I didn't expect that as on my system the difference between those binaries is only about 5% to 7%. Would be interesting to see more results from others.
Destination file: F:\flac-1.3.2-fastcrc-win64\flac\U2 - Pop.flac
Total encoding time: 0:25.896, 139.46x realtime
--
Destination file: F:\flac-1.3.2-fastcrc-win64\flac-fastcrc\U2 - Pop.flac
Total encoding time: 0:16.100, 224.31x realtime
Here it could be that for the first run the .wav was not in the file system cache. The difference seems too big. Could you run the non-fast test again (maybe twice to see if the results are constant)?
Sure no problem, here are the results for the non-fast flac.exe:
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast1.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast1.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.020, 212.18x realtime
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast2.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast2.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.051, 211.80x realtime
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast3.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast3.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.004, 212.38x realtime
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast4.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast4.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:17.144, 210.65x realtime
CLI encoder: flac.exe
Destination file: F:\flac-1.3.2-fastcrc-win64\U2 - Pop_nonefast5.flac
Encoder stream format: 44100Hz / 2ch / 16bps
Command line: "F:\flac-1.3.2-fastcrc-win64\flac.exe" -s --ignore-chunk-sizes -8 - -o "U2 - Pop_nonefast5.flac"
Working folder: F:\flac-1.3.2-fastcrc-win64\
Encoder process still running, waiting...
Encoder process terminated cleanly.
Track converted successfully.
Total encoding time: 0:16.957, 212.97x realtime
Sure no problem, here are the results for the non-fast flac.exe:
Thanks! So it seems to settle around 212x while the fast version was 224x realtime. That's about exactly what I expected - 5 to 6% improvement.
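For anyone who wants to redo the arithmetic on their own runs, here is a small Python sketch that uses the figures posted in this thread. The helper names are made up for this illustration.

```python
from statistics import mean

# "x realtime" figures of the five non-fast runs posted above
baseline_runs = [212.18, 211.80, 212.38, 210.65, 212.97]
fastcrc_run = 224.31


def speedup_percent(baseline, improved):
    """Relative gain in percent when higher is better (x realtime)."""
    return (improved / baseline - 1.0) * 100.0


def time_speedup_percent(baseline_seconds, improved_seconds):
    """The same gain expressed via wall-clock times (lower is better)."""
    return (baseline_seconds / improved_seconds - 1.0) * 100.0


gain = speedup_percent(mean(baseline_runs), fastcrc_run)

# The very first non-fast run (139.46x) was probably hit by a cold file cache.
cold_cache_gain = speedup_percent(139.46, fastcrc_run)

print(f"warm cache: {gain:.1f}%  cold first run: {cold_cache_gain:.1f}%")
```

Averaging the repeated runs gives a gain between 5% and 6%. Comparing against the single cold-cache run would have suggested a gain of around 60%, which is why repeating the baseline was worth it.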
Hello,
I'm still running a Core 2 Duo E7200@2.86GHz and your binary works, even though this CPU is not supposed to support SSE4.2 and POPCNT.
Converting a 592MB wav file (time in seconds):
| Run1 | Run2 | Run3 |
flac | 15.009 | 15.007 | 15.013 |
flac-fastcrc | 13.912 | 13.961 | 14.092 |
Quite interesting.
AiZ
Hi,
I eventually got some parts to assemble my new PC, now sporting an incredible...
Pentium G5400.
Woohoo! :D
Same wav file as above (time in seconds):
| Run1 | Run2 | Run3 |
flac | 7.090 | 7.066 | 7.058 |
flac-fastcrc | 6.180 | 6.127 | 6.237 |
Still great.
AiZ
Enzo,
Do you have any performance profile data for these codecs on current gen CPUs? I don't have time to dig into this right now, but I'd like to eventually and I'm curious if things have changed the last 10-15 years of new CPU hardware.
Do you have any performance profile data for these codecs on current gen CPUs? I don't have time to dig into this right now, but I'd like to eventually and I'm curious if things have changed the last 10-15 years of new CPU hardware.
You mean regarding optimizations in general (with new instruction sets like AVX), yes? Not related to the CRC thing.
No, I didn't do any performance profiling. Same for me as for you: I'd like to take a deeper dive into this and try to further optimize codecs, but don't really have time to do it.
Yeah, was just curious what the breakdown of codec runtime was by function. I think I last looked at lame performance in 2006 on a 32 bit Pentium 4. Probably things have changed since then.
Since it's been a while:
Has this great patch already been included to the official repository? (https://git.xiph.org/?p=flac.git;a=summary)
Has this great patch already been included to the official repository?
Yes, it's there: https://git.xiph.org/?p=flac.git;h=5579f29 :)
Should become part of the next official FLAC release.
@Enzo:
Just ran my test set of FLACs again with John33's ICL18 x64 compile from 18.11.2018.
It took 68.852 sec compared to 70.646 sec with an ICL17 x64 compile from October 2017 (before the CRC patch).
That said, the compile you provided is still the fastest one (66.794 sec) on my i5-2400...
That said, the compile you provided is still the fastest one (66.794 sec) on my i5-2400...
Might be due to my build using --enable-64-bit-words which is a bit faster on x64. Not sure if John33's builds have that option enabled.
Built for Win64 with GCC 4.9.2 and -O3 -march=nehalem -funroll-loops. FLAC is configured with --enable-64-bit-words
Since your build is still the fastest out there:
Did you try to compile the current FLAC version 1.3.3 with your settings?
Maybe I could try to set up the current version of GCC's C++ compiler and use your settings. You got any tips there?
What's the version you are using atm?
Did you try to compile the current FLAC version 1.3.3 with your settings?
I have compiled the new release and attached it here. Used the same settings as above and it's even slightly (~1.5%) faster than the fastcrc preview build (likely due to other optimizations added since 1.3.2).
If you'd like to try building it yourself:
I'm using my custom fre:ac Component Development Kit for compiling. It's still based on GCC 4.9.2 as I plan to update to a newer version only after the fre:ac 1.1 release. The toolkit can be found here: https://github.com/enzo1982/BoCA/releases
The script for building libogg and flac can be found here: https://github.com/enzo1982/freac/blob/ebbe9cb95b507617b5d3cd205329337be80a8012/tools/build-codecs
Changes for this build are replacing -march=nocona with -march=nehalem and removing the -mno-sse3 option in line 27 as well as adding --disable-shared in line 302.
Hi enzo,
thank you so much for your build of FLAC v1.3.3. Your settings are AWESOME!
I tested it with my set of 40 WAV files (~ 1.9 GB) and compared it to your fastcrc version of v1.3.2 and Wombat's VS 2017 compile.
I used an i7-6700 @ 3.4 GHz, the WAVs stored on a Samsung EVO 850 SSD. Timings were done with Igor Pavlov's timer64.exe.
In this configuration, the difference between your fastcrc and the 1.3.3 build is marginal, with a tiny advantage for the 1.3.3 version, but that's within the range of measurement inaccuracies. But compared to the VS build, the difference is REMARKABLE:
flac132-fastcrc.exe (enzo):
Kernel Time = 2.609 = 6%
User Time = 35.421 = 89%
Process Time = 38.031 = 96% Virtual Memory = 14 MB
Global Time = 39.359 = 100% Physical Memory = 14 MB
flac133_enzo.exe:
Kernel Time = 2.234 = 5%
User Time = 35.750 = 90%
Process Time = 37.984 = 96% Virtual Memory = 14 MB
Global Time = 39.342 = 100% Physical Memory = 17 MB
flac133_vs2017.exe:
Kernel Time = 2.796 = 6%
User Time = 39.218 = 89%
Process Time = 42.015 = 95% Virtual Memory = 14 MB
Global Time = 43.897 = 100% Physical Memory = 16 MB
Thanks also for the instructions to set up the toolchain for GCC. Will try that on the next rainy weekend... ;-)
I have compiled the new release and attached it here.
thanks a lot, @enzo - very appreciated! i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares! got to test this now in comparison 8)
i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares!
Same here! And thanks to enzo's excellent tuning of the compiler switches, the free GCC compiler outperforms those from Intel and Microsoft. Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
Kernel Time = 2.671 = 5%
User Time = 47.625 = 91%
Process Time = 50.296 = 96% Virtual Memory = 14 MB
Global Time = 51.852 = 100% Physical Memory = 14 MB
Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
flac.exe from Free_Encoder_Pack was made to work even on ancient CPUs like Pentium 2.
Btw, just tested the flac.exe in foobar's "Free_Encoder_Pack_2019-08-04" and sadly, this one is (by far) the slowest:
Probably the main reason for it is that it's 32-bit. Because yes, compatibility.
i was using your initial test-build up until now, because it is remarkably faster for me than the build from rarewares!
Same here! And thanks to enzo's excellent tuning of the compiler switches, the free GCC compiler outperforms those from Intel and Microsoft.
i just made a little 1.3.3 comparison, not as profound as yours and under wine 64bit with fb2k. anyway here is the result. surprisingly, netranger's build is the fastest right now :)
netranger: 345x
enzo: 300x
wombat: 244x
enzo flac 1.3.2: 370x
Probably the main reason for it is that it's 32-bit. Because yes, compatibility.
flac.exe from Free_Encoder_Pack was made to work even on ancient CPUs like Pentium 2
That makes perfect sense! And is very important for the targeted audience = Everyone!
Thanks for the explanation!
@sundance, care to try this build?
Atm, I just can run the test on my office computer:
i5-6500 @ 3.2 GHz, 16 GB RAM
flac133_NetRanger:
Kernel Time = 3.109 = 5%
User Time = 52.562 = 90%
Process Time = 55.671 = 95% Virtual Memory = 14 MB
Global Time = 58.036 = 100% Physical Memory = 14 MB
flac133_VS2017:
Kernel Time = 3.109 = 6%
User Time = 46.343 = 90%
Process Time = 49.453 = 96% Virtual Memory = 14 MB
Global Time = 51.172 = 100% Physical Memory = 13 MB
flac133_enzo.exe:
Kernel Time = 2.890 = 6%
User Time = 40.734 = 90%
Process Time = 43.625 = 96% Virtual Memory = 14 MB
Global Time = 45.233 = 100% Physical Memory = 14 MB
flac133_Case.exe:
Kernel Time = 2.890 = 6%
User Time = 39.796 = 90%
Process Time = 42.687 = 97% Virtual Memory = 14 MB
Global Time = 43.916 = 100% Physical Memory = 14 MB
Lo and behold! Another speed increase of ~3%...
As soon as I'm back on my own computer, I'll keep you posted on the results there.
What compiler/settings did you use?
I used GCC 4.9.3 (oldest one I had archived) with CFLAGS -O3 -m64 -march=haswell -funroll-loops -fno-stack-protector.
It looks like the newer the GCC version is, the slower the compiled FLAC binary becomes. With the same settings as enzo used, my compile was a tiny fraction slower. GCC 7.3 produced a somewhat slower binary and GCC 9.1 one somewhat slower than that.
Did another run on an i7 at hand:
i7-4790 @ 3.6 GHz, 16 GB RAM
flac133_enzo.exe:
Kernel Time = 2.046 = 4%
User Time = 39.484 = 92%
Process Time = 41.531 = 97% Virtual Memory = 14 MB
Global Time = 42.661 = 100% Physical Memory = 14 MB
flac133_Case.exe:
Kernel Time = 2.437 = 5%
User Time = 42.265 = 92%
Process Time = 44.703 = 97% Virtual Memory = 14 MB
Global Time = 45.705 = 100% Physical Memory = 14 MB
Quite surprisingly, on that CPU your GCC 4.9.3 build was slower than enzo's AND even slower compared to my i5 office computer despite having less core speed...
Now I'm really curious how it performs on my i7-6700 @ 3.4 GHz, which is the same CPU family as the office PC (Skylake)
Here we are with the results on my i7-6700 @ 3.4 GHz CPU. For both binaries, I did 6 runs (only 3 for the previous tests) and calculated the average value of timer64's "Global time":
flac133_enzo.exe:
Global Time: 39.174 sec
flac133_Case.exe:
Global Time: 38.058 sec
So we see a ~3% benefit also, like with the i5-Skylake CPU in my office.
Thanks for testing. Weird that the i7-4790 performed the way it did as my i7-4771 shows my build to be faster.
Would be curious to know how these builds behave on AMD or older Intels. I'm a bit worried about processor specific optimization flags there.
BTW, running configure with --enable-64-bit-words option should give some slight speedup for 64-bit executables.
I used that in all GCC compile tests but neglected to mention it. Otherwise there would be no hope of getting speeds anywhere close to enzo's.
@sundance, care to try this build?
Is this STILL the latest recommended FLAC binary for Windows with fastcrc mod?
Is this STILL the latest recommended FLAC binary for Windows with fastcrc mod?
fastcrc patch was included in official repository. Latest binaries are available on RareWares and here (https://hydrogenaud.io/index.php?topic=118008.msg994836#msg994836).