Topic: Multithreading (Read 24735 times)

Re: Multithreading

Reply #125
I don't really know.

It's the first time I've dealt with dlopen (used in the static OMP library).

At least the executable is working properly :)

Re: Multithreading

Reply #126
  • Probably not worth it
  • --mode peakset --blocksize-list 576,1152,1728,2304,2880,3456,4032,4608 --analysis-comp 8p --output-comp 8ep --queue 8192 --tweak 1 --merge 0
@cid42 so I tried really, really slow settings for fun on a 6-second file (I attached it).

Several times I got an "Error: Init failed" (this error is defined in common.c in flaccid) when using slow settings.
For example, if I use this command:
Code: [Select]
flaccid --in 12_-_Napalm_Death_-_You_Suffer.flac --lax --out out.flac --preserve-flac-metadata --queue 8192 --workers 13 --tweak 1 --merge 0 --analysis-apod subdivide_tukey\(3\) --output-apod subdivide_tukey\(6\) --analysis-comp mepl32r15 --output-comp mepl32r15 --mode peakset --blocksize-list 256,512
Ideally I would use more blocksizes in the list (something like `seq -s, 256 256 5120`), but I shortened it to get the error faster (exact same output).
Here is the output:
Code: [Select]
(null)
Processed 1/3
Processed 3/3
Error: Init failed
With that same command, if I limit myself to fewer blocksizes (and a bigger step) with `seq -s, 512 512 5120`, it works completely fine and the output is created in 155 seconds (not too long if you want to test).
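For anyone without `seq` at hand (on Windows, for example), here is a minimal Python sketch of how the two lists above are built. The function name `blocksize_list` is made up for this example and is not part of flaccid.

```python
def blocksize_list(first: int, step: int, last: int) -> str:
    """Comma-separated list equivalent to `seq -s, first step last`."""
    return ",".join(str(n) for n in range(first, last + 1, step))


# The list that triggers "Error: Init failed" in the post above.
failing = blocksize_list(256, 256, 5120)
# The list that works in the post above.
working = blocksize_list(512, 512, 5120)

for name, csv in (("failing", failing), ("working", working)):
    sizes = csv.split(",")
    print(f"{name}: {len(sizes)} blocksizes, smallest {sizes[0]}")
    print(f"  --blocksize-list {csv}")
```

The only difference between the two lists that matters for the bug report is the smallest entry (256 versus 512) and the number of entries.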

So I guessed flaccid might only work with blocksizes >= 512, and I tried slightly different settings (I only removed the `--queue` argument):
Code: [Select]
flaccid --in 12_-_Napalm_Death_-_You_Suffer.flac --lax --out out.flac --preserve-flac-metadata --workers 13 --tweak 1 --merge 0 --analysis-apod subdivide_tukey\(3\) --output-apod subdivide_tukey\(6\) --analysis-comp mepl32r15 --output-comp mepl32r15 --mode peakset --blocksize-list $(seq -s, 256 256 512)
The list still contains a blocksize of 256 and the run still ends with "Error: Init failed", but the output is different:
Code: [Select]
(null)
Processed 1/3
Processed 3/3
tweak(1) saved 20 bytes with 6 tweaks
# ~100 lines like this; then
Error: Init failed
I got this error after 73 seconds. The output FLAC file was almost complete: 50 ms shorter than the input (only one frame is missing?), MD5 not computed, and 2 placeholders in the seektable.


You surely guessed it, I wrote this message in the hope that the error can be fixed.
I also have a question: why is it needed to have all the blocksizes (to bruteforce) multiples of the first one of the list ?
It is surely easier to implement and parallelize, but I find it a waste of time when dealing with blocksize steps of 32, 64, 128 (since blocksizes from 32 to ~1024 are completely useless to bruteforce).


Many thanks for your software and time.


Re: Multithreading

Reply #127
I'm seeing the same thing here. I did some limited testing. In combination with the --lax option, I can reproduce it if the blocksize list has any combination that can equal 256. If the blocksize list has values smaller or larger than 256, but nothing that equals 256, there's no error.

Re: Multithreading

Reply #128
You surely guessed it, I wrote this message in the hope that the error can be fixed.
Thank you for making it easy by providing the exact file and settings to reproduce; the latest commit should now work. It looks to be the small (4-sample) partial frame at the end that fails, possibly because the strong lax settings provided can't technically create valid output with such a small frame. I've added a catch-all to the init_static_encoder function that detects when initialisation fails and tries again with less aggressive settings (presumably ./flac does something similar, or the different way it handles the last frame means it doesn't encounter the problem). It's not the most elegant solution, but it should catch edge cases including this one. I can't guarantee that all edge cases now work, or even that there isn't an underlying problem yet to resolve, as this catch-all may just be masking it.
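To illustrate the retry idea for readers following along, here is a rough Python sketch. It is not the code in flaccid: `init_encoder`, `InitError` and the rule that a frame shorter than the predictor order fails are all assumptions made up for this example.

```python
class InitError(Exception):
    """Stand-in for an encoder initialisation failure."""


def init_encoder(frame_len: int, max_lpc_order: int) -> dict:
    """Hypothetical initialiser, not flaccid's or libFLAC's real API."""
    # Assumed failure rule, for illustration only: a frame shorter than
    # the requested predictor order cannot be set up.
    if frame_len < max_lpc_order:
        raise InitError(f"order {max_lpc_order} too high for {frame_len} samples")
    return {"frame_len": frame_len, "max_lpc_order": max_lpc_order}


def init_with_fallback(frame_len: int, preferred_order: int,
                       fallback_order: int = 0) -> dict:
    """Try the aggressive settings first, then retry once with milder ones."""
    try:
        return init_encoder(frame_len, preferred_order)
    except InitError:
        # Catch-all: the retry may hide an underlying bug, as noted above.
        return init_encoder(frame_len, fallback_order)


print(init_with_fallback(4096, 32))  # normal frame keeps the aggressive setting
print(init_with_fallback(4, 32))     # tiny last frame falls back
```

The trade-off is the one already mentioned: the fallback keeps the encode going, but it also swallows the original failure, so a genuine bug could go unnoticed.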

I also have a question: why is it needed to have all the blocksizes (to bruteforce) multiples of the first one of the list ?
It means that regardless of the chosen blocks in a given chain, they always start and end on a discrete boundary that's a multiple of the smallest blocksize. Unconstrained bruteforce is not feasible, and an intelligent algorithm that competes well has not been found; peakset is a reasonable compromise.
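A toy Python sketch of why the shared grid helps: when every blocksize is a multiple of the smallest one, a chain can only start and end at grid positions, so a search has just `total / smallest` positions to consider. This is a plain dynamic program written for illustration, not flaccid's peakset code, and `toy_cost` is an invented cost function.

```python
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple


def best_chain(total: int, sizes: Sequence[int],
               cost: Callable[[int, int], int]) -> Tuple[int, List[int]]:
    """Cheapest chain of blocks covering [0, total), by dynamic programming."""
    base = sizes[0]
    # The constraint from the thread: every size is a multiple of the first.
    if any(size % base for size in sizes) or total % base:
        raise ValueError("sizes and total must be multiples of the first blocksize")

    @lru_cache(maxsize=None)
    def solve(pos: int) -> Tuple[int, Tuple[int, ...]]:
        # pos is always a multiple of base, so only total // base states exist.
        if pos == total:
            return 0, ()
        best = None
        for size in sizes:
            if pos + size <= total:
                tail_cost, tail = solve(pos + size)
                candidate = (cost(pos, size) + tail_cost, (size,) + tail)
                if best is None or candidate[0] < best[0]:
                    best = candidate
        return best

    total_cost, chain = solve(0)
    return total_cost, list(chain)


def toy_cost(pos: int, size: int) -> int:
    """Made-up cost: 16 'header' units per block, and samples cost double
    in any block that contains a pretend transient at sample 600."""
    rate = 2 if pos <= 600 < pos + size else 1
    return 16 + size * rate


total_cost, chain = best_chain(2048, [256, 512, 1024], toy_cost)
boundaries = [sum(chain[:i]) for i in range(len(chain) + 1)]
print(total_cost, chain, boundaries)
```

With arbitrary blocksizes the boundaries could fall on any sample, and the number of positions to evaluate grows accordingly, which is the infeasibility mentioned above.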

Re: Multithreading

Reply #129
Thanks for the quick update!

It indeed fixes the issue.

I built a static executable once again. Same specs as before: Linux x86-64, generic, static, PGO LTO optimized (O3 flto fprofile-instr-gen), stripped, UPX'd (compressed with -9 --ultra-brute).

Re: Multithreading

Reply #130
Built a static Linux 64-bit binary using libFLAC git-31ccd3df.   Targeted for AVX2 capable CPUs.  Also tried my hand at a static Win64 build.

Dear @Replica9000, if possible, consider building a Windows version without the need for AVX instructions.


• Join our efforts to make Helix MP3 encoder great again
• Opus complexity & qAAC dependence on Apple is an aberration from Vorbis & Musepack breakthroughs
• Let's pray that D. Bryant improve WavPack hybrid, C. Helmrich update FSLAC, M. van Beurden teach FLAC to handle non-audio data

Re: Multithreading

Reply #131
My test setup is Win 7 x64 in a VM, so I'm not sure how well these perform on real hardware. I included 32-bit and 64-bit builds, with an asm and a noasm version of each. On my Linux PC, the noasm builds are faster for 16-bit files, but the opposite seemed to be the case in the VM. Flaccid only supports 16-bit, but the couple of 24-bit files I tried encoded fine.


Re: Multithreading

Reply #132
Flaccid only supports 16-bit wav/raw input, as the wav formats are many and I never quite got the wav library working. But it should support arbitrary flac input (iirc), so a hack to encode any supportable wav would be to pipe/convert it to flac first. Not ideal, but workable. I might revisit this when flac gets a major update; for now I'm just busy with other things.
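A small Python sketch of that two-step workaround, using a temporary file instead of a pipe. It only builds and prints the two command lines; the flags are the ones already used in this thread (`flac -f ... -o ...`, `flaccid --in ... --out ... --preset ...`), while the function name and the intermediate filename are made up for this example.

```python
import shlex
from typing import List


def two_step_commands(wav_in: str, flac_out: str,
                      tmp: str = "intermediate.flac",
                      preset: str = "8p") -> List[List[str]]:
    """Commands for: wav -> flac (reference encoder), then flac -> flaccid."""
    # Step 1: let the reference encoder deal with the wav format.
    convert = ["flac", "-f", wav_in, "-o", tmp]
    # Step 2: feed the intermediate flac to flaccid.
    recompress = ["flaccid", "--in", tmp, "--out", flac_out, "--preset", preset]
    return [convert, recompress]


for command in two_step_commands("in.wav", "out.flac"):
    print(shlex.join(command))
```

To actually run the steps, each list could be passed to `subprocess.run(command, check=True)`, assuming both binaries are on the PATH.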

Re: Multithreading

Reply #133
@Replica9000, unfortunately, binaries from your archive crash upon launch on my end.
Are you sure they do not need SSE4 or something else that I might not have?


Re: Multithreading

Reply #134
@cid42 The files I used were indeed 24-bit FLAC. I read the GitHub page a while back and remembered something about 16-bit only. I forgot that it applied only to wav/raw PCM.

....some brave soul can convert their entire collection if they wish. I recommend checking the result with flac -t before you delete the originals, this is still alpha you know ;)
A while back I started recompressing my library using more aggressive settings, updating tags and embedded album art. About halfway through I started using flaccid. I've used it on thousands of tracks with no issues. :)
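For anyone scripting a library-wide conversion, here is a cautious Python sketch of the "check with flac -t before deleting" advice. `flac -t` is the reference encoder's test mode; the function name and the injectable `run` parameter are assumptions added for this example so the logic can be tried without touching real files.

```python
import subprocess
from pathlib import Path


def verify_then_remove(original: Path, recompressed: Path,
                       run=subprocess.run) -> bool:
    """Delete `original` only if `flac -t` accepts `recompressed`."""
    # A non-zero exit status is treated as a failed test.
    result = run(["flac", "-t", str(recompressed)])
    if result.returncode != 0:
        return False  # keep the original, something is wrong
    original.unlink()
    return True
```

A call such as `verify_then_remove(Path("old/track.flac"), Path("new/track.flac"))` would then remove the old file only after the new one passes the test. Keeping a backup during the first few runs is still a good idea.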



@Kraeved I had used a generic CPU flag.  I recompiled these binaries with the core2 flag.


Re: Multithreading

Reply #135
Thank you for the opportunity to get acquainted with Flaccid.

* hyperfine.exe --warmup 3 --runs 3 "commands"
* flac.exe x64 20240309, flake.exe x86 from CUETools 2.2.5
* in.wav, 44100 Hz 16 bit stereo, 99 423 788 bytes

Code: [Select]
  Command                                                           Mean time [s]        
 ---------------------------------------------------------------- ----------------
  flac -7 -f in.wav -o out.flac                                     8.716 ± 0.047       
  flaccid_w64 --preset 7 --in in.wav --out out.flac                 9.208 ± 0.176       
  flake -7 -f in.wav -o out.flac                                   10.496 ± 0.182      
  flaccid_w64_noasm --preset 7 --in in.wav --out out.flac          11.274 ± 0.041      
  flaccid_w32 --preset 7 --in ind.wav --out out.flac               16.057 ± 0.038      
  flaccid_w32_noasm --preset 7 --in ind.wav --out out.flac         17.986 ± 0.184 

Re: Multithreading

Reply #136
It looks like the binaries without asm optimizations are slower for you too.  Maybe it's something to do with Windows or the CPU flag used. 

On my system:
Code: [Select]
./flac -8p in.wav = 53.615s
./flac_noasm -8p in.wav = 44.648s
./flaccid --preset 8p --in in.wav --out out.wav = 53.606s
./flaccid_noasm --preset 8p --in in.wav --out out.wav = 44.261s

Edit: Binaries compiled with the core2 CPU flag. Maybe disabling asm optimizations is only beneficial with AVX or better.
Code: [Select]
./flac_core2 -8p in.wav = 55.375s
./flac_core2_noasm -8p in.wav = 2m23.837s
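To put the flac timings above side by side, here is a short Python sketch that converts them to seconds and prints the noasm/asm ratio for each pair. The numbers are copied from this post; the helper name `to_seconds` is made up for this example.

```python
import re


def to_seconds(text: str) -> float:
    """Convert timings such as '53.615s' or '2m23.837s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised timing: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Timings as posted above.
timings = {
    "flac": "53.615s",
    "flac_noasm": "44.648s",
    "flac_core2": "55.375s",
    "flac_core2_noasm": "2m23.837s",
}

for asm, noasm in (("flac", "flac_noasm"), ("flac_core2", "flac_core2_noasm")):
    ratio = to_seconds(timings[noasm]) / to_seconds(timings[asm])
    print(f"{noasm} takes {ratio:.2f}x the time of {asm}")
```

A ratio below 1 means the noasm build was faster; above 1 means it was slower.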