Topic: Multithreading (Read 34991 times)

Re: Multithreading

Reply #100
So, this is a build: Linux x86-64, generic (mtune), static, optimized (O3 flto fprofile-instr-gen), stripped, UPX'd (compressed with -9 --ultra-brute)
[...]
Command used for compiling (PGO gen):
From my experience, using PGO and/or LTO for compiling libFLAC shrinks the binary size and slows down encoding instead of speeding it up. I haven't seen a build where either (or both) improve speed.
Music: sounds arranged such that they construct feelings.

Re: Multithreading

Reply #101
Thank you for a static Linux build, doubt I could have figured that out.

--analysis-comp mepl32r15 --output-comp mepl32r15
You almost certainly want to prepend 8 to these options: if the first character is '0' to '8', it defines the compression level. Without specifying 8 you're using libFLAC's default of 5.
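To make the rule concrete, here is a minimal Python sketch of it. `leading_level` is a hypothetical helper written for illustration only, not flaccid's actual parser; it assumes nothing beyond what the post states (a leading '0' to '8' picks the level, otherwise the default of 5 applies).

```python
def leading_level(comp: str, default: int = 5) -> int:
    """Return the compression level implied by a setting string.

    Assumption for illustration: a leading '0'..'8' selects the level,
    anything else falls back to the default of 5 mentioned in the post.
    """
    if comp and comp[0] in "012345678":
        return int(comp[0])
    return default


for setting in ("mepl32r15", "8mepl32r15", "8p"):
    print(setting, "->", leading_level(setting))
```

If every parameter the preset would set is overridden manually afterwards, the leading digit should make no practical difference.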

I must say that it's quite unclear for me what the values --merge and --tweak do (even reading that comment and the help message from flaccid).
A value of 0 disables them; otherwise it's the minimum number of bytes a pass over the queue needs to save to trigger another pass. Merge and tweak iterate over the queue, so the bigger the queue, the easier it is for a pass to hit the byte target. Each successive pass tends to save fewer bytes than the last. A larger value targets only the low-hanging fruit for some cheap gains; a value of 1 means you want to minimise file size at any cost.
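The stopping rule can be sketched roughly like this in Python. It is a toy simulation, not flaccid's code: `run_passes` and the per-pass savings are made up for illustration, and only the threshold behaviour follows the description above.

```python
def run_passes(savings_per_pass, threshold):
    """Simulate iterative merge/tweak passes over a queue.

    savings_per_pass: bytes each successive pass would save (made-up numbers).
    threshold: 0 disables the feature; otherwise a pass must save at least
    this many bytes for another pass to be attempted.
    Returns (passes_run, total_bytes_saved).
    """
    if threshold == 0:
        return 0, 0
    passes = 0
    total = 0
    for saved in savings_per_pass:
        passes += 1
        total += saved
        if saved < threshold:
            break  # this pass missed the target, so no further pass runs
    return passes, total


# Hypothetical diminishing returns per pass.
savings = [6000, 2500, 900, 300, 40, 0]
print(run_passes(savings, 0))     # disabled
print(run_passes(savings, 1000))  # cheap gains only
print(run_passes(savings, 1))     # keep going while anything is saved
```

With these made-up numbers, a threshold of 1000 stops after three passes while a threshold of 1 runs all six for a few hundred extra bytes.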

The workers option seems to be ignored, and flaccid seg faults on large files.
How large and was it a seg fault or was an error message given? As far as I remember proper streamed read was implemented, it shouldn't be reading the entire file into memory before encoding or anything like that. If it was a large wav file then that might explain it, wav support currently is poor.

The last commit which I pushed after TheBigBadBoy first tried compiling the code fixed a regression in fixed mode where it only ran single-threaded. They may not have updated to include that commit before building the binary. Alternatively something else may have got lost in the shuffle of refactoring/tidying/minimising which were some of the last things I did in feb.

Re: Multithreading

Reply #102

The workers option seems to be ignored, and flaccid seg faults on large files.
How large and was it a seg fault or was an error message given? As far as I remember proper streamed read was implemented, it shouldn't be reading the entire file into memory before encoding or anything like that. If it was a large wav file then that might explain it, wav support currently is poor.

The last commit which I pushed after TheBigBadBoy first tried compiling the code fixed a regression in fixed mode where it only ran single-threaded. They may not have updated to include that commit before building the binary. Alternatively something else may have got lost in the shuffle of refactoring/tidying/minimising which were some of the last things I did in feb.

The WAV files are over 1GB. I also tried with the file already compressed to FLAC (629MB).
Code: [Select]
./flaccid --in NIN_The_Fragile.flac --out NIN_The_Fragile-fl.flac --workers 8 --preset 8
Segmentation fault
I end up with an incomplete 410MB file.

This is what I get at the end of strace
Code: [Select]
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f46b207a000} ---
+++ killed by SIGSEGV +++

I was gonna attempt to build a copy myself, but not having much luck at the moment.
Code: [Select]
./common.h:34:29: error: unknown type name 'FLAC__StaticEncoder'; did you mean 'FLAC__StreamEncoder'?

Re: Multithreading

Reply #103
From my experience, using PGO and/or LTO for compiling libFLAC shrinks the binary size and slows down encoding instead of speeding it up. I haven't seen a build where either (or both) improve speed.
Mmmmh... I highly doubt that PGO will slow down the encoding.
For LTO I don't really know, but I'll only trust numbers ^^

Thank you for a static Linux build, doubt I could have figured that out.
No problem. If you want I can also share the 4 static libraries needed to compile flaccid statically.
Once you have them it's just plug and play.
You almost certainly want to prepend 8 to these options: if the first character is '0' to '8', it defines the compression level. Without specifying 8 you're using libFLAC's default of 5.
In the flac manual, we can read
Code: [Select]
       -8, --compression-level-8
              Synonymous with -l 12 -b 4096 -m -r 6 -A subdivide_tukey(3)
I override them all with the string "mepl32r15" (and using apod-anal + out), so shouldn't change a thing even if I added '5' or '8' at the beginning (at least in `flac`, don't know about `flaccid`).

Thanks for explanations about tweak and merge.

I was gonna attempt to build a copy myself, but not having much luck at the moment.
Code: [Select]
./common.h:34:29: error: unknown type name 'FLAC__StaticEncoder'; did you mean 'FLAC__StreamEncoder'?
@Replica9000 You got the same error I had :))
You simply need the static_encoder branch (instead of the main): https://github.com/chocolate42/flac/tree/static_encoder

Re: Multithreading

Reply #104
Oh forgot a thing:
The last commit which I pushed after TheBigBadBoy first tried compiling the code fixed a regression in fixed mode where it only ran single-threaded. They may not have updated to include that commit before building the binary. Alternatively something else may have got lost in the shuffle of refactoring/tidying/minimising which were some of the last things I did in feb.
I have included that commit before compiling ^^
I kept the sources, and `git pull` says everything (flac mod + flaccid) is up to date.

Re: Multithreading

Reply #105
@TheBigBadBoy: That's true, if you've covered everything a preset will do it shouldn't matter. flaccid uses the libflac api and just makes sure to call the preset first if present before any manual options.

I wonder how widely the static binaries you've made can actually be used. Linux, or more specifically glibc, tends to be a pain, being forwards but not backwards compatible even with static builds. It's why projects like Prime95 use ancient versions of CentOS to build their static Linux binaries; searching online, that does seem to be the common solution.

@Replica9000: Thank you for the details about the seg fault, they might help me reproduce and fix it.

Re: Multithreading

Reply #106
@cid42 idk about the static libs, it just took a long time to read up on how to build them properly and to wait for them to build...
Right now they should be usable on any x86-64 Linux (I think).
I first tried to search that online, but found absolutely nothing (and Arch Linux does not provide any static libs with pacman, while Termux does - for aarch64/Android).

I tested flaccid again (this time without profiling data), with a faster encode `--mode peakset --blocksize-list 576,1152,1728,2304,2880,3456,4032,4608 --analysis-comp 8p --output-comp 8ep --queue 8192 --tweak 1 --merge 0 --workers 16`
Was quite fast, FLAC input was 4 min 36 sec 44.1kHz stereo s16, encoded in only 386 sec (6 min 26 sec).
I looked carefully this time, and **CPU usage never exceeded 100%**!
(Fully using 16 threads would show up as 1600% CPU usage)
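As a quick sanity check on those figures, here is a small Python sketch (not part of flaccid). It assumes CPU usage is reported top-style, where 100% means one fully busy core.

```python
# Figures quoted in the post above.
audio_seconds = 4 * 60 + 36      # 4 min 36 sec of audio
encode_seconds = 386             # wall-clock encode time
workers = 16

# Encoding speed relative to realtime (below 1.0 means slower than playback).
realtime_ratio = audio_seconds / encode_seconds

# Assumption: CPU usage is reported top-style, where 100% is one busy core.
cpu_ceiling_percent = workers * 100

print(f"{realtime_ratio:.3f}x realtime, ceiling {cpu_ceiling_percent}% CPU")
```

A reading stuck at 100% against a 1600% ceiling is what a single busy thread looks like, whatever `--workers` was set to.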

Anyway, if you need help building stuff or trying some other commit, don't hesitate to tag me.

I fully support this flaccid project. I like it, please continue ^^

Re: Multithreading

Reply #107
For some reason the static build is just not multithreaded, my normal build is multithreaded. Tried fixed/gasc/peakset modes (easy enough with presets 8/9/10) on a quad core with 4 workers, all pegged at 100% with the static build and 400% with the normal build.

Re: Multithreading

Reply #108
Thanks for letting me know, I'll look at it tomorrow. The cause is surely OMP, but it is quite weird I managed to compile flaccid without any warning or errors related to OMP.


Re: Multithreading

Reply #109
Fixed the seg fault and pushed to the repo. The problem was underallocating working memory when storing potential seekpoints: with an average blocksize of 4096 it would start using unallocated memory at ~71 minutes of 44.1 kHz input and quickly seg fault, when the code assumed enough space was reserved for ~90 minutes. Always use sizeof.
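To put those durations in terms of entries, here is a rough Python calculation. It assumes one potential seekpoint per frame, which is my reading of the post rather than something confirmed by the code; `frames_needed` is a hypothetical helper.

```python
import math


def frames_needed(minutes, sample_rate=44100, blocksize=4096):
    """Number of frames (rounded up) covering `minutes` of audio."""
    samples = minutes * 60 * sample_rate
    return math.ceil(samples / blocksize)


intended = frames_needed(90)   # what the code assumed was reserved
actual = frames_needed(71)     # roughly where it started to overrun
print(intended, actual, intended - actual)
```

Under that assumption the buffer came up short by roughly twelve thousand entries, which is the kind of gap that appears when a byte count is computed from the wrong element size.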

Until further notice avoid gasc mode (including preset 9), testing with a large file uncovered something causing corruption which may or may not be related to using a large file. Maybe there's something off about queue handling or something. If a fix isn't obvious gasc will probably soon be removed temporarily (or permanently if gasc can't be made more useful).

Now that libflac can multithread there's no point mirroring presets 0 to 8. I'm tempted to make flaccid's -0 the same as flac's -8p to set expectations, make presets >0 progressively stronger peakset settings, and maybe <0 progressively weaker variable encodes that may overlap in strength/speed with flac's presets.

Re: Multithreading

Reply #110
Thanks for the update!

I managed to get OMP working, and now `flaccid` is using around 1550% CPU with 16 workers (resulting in 5x faster encoding compared to single-threaded), good to see ^^ .

Same specs as before: Linux x86-64, generic (mtune), static, optimized (O3 flto fprofile-instr-gen), stripped, UPX'd (compressed with -9 --ultra-brute).
This build already includes the latest update (segfault on large files *fixed*).

But this time, I have a Warning:
Code: [Select]
Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
in other words, some code (EDIT: not libFLAC) uses `dlopen` (aka dynamic library open), so this static build is not really static (as would be any static build with libFLAC code).

So idk if people using my build need the same library version as me (libc.so.6) or the same glibc version as me (2.37-3).
Anyway, everyone already has glibc installed, I just don't know how often glibc is updated :/

Re: Multithreading

Reply #111
But this time, I have a Warning:
Code: [Select]
Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
in other words, libFLAC's code uses `dlopen` (aka dynamic library open)
No, it doesn't. I assume that dlopen is somewhere else, maybe in the OpenMP code, the gcc libs or libc?
Music: sounds arranged such that they construct feelings.

Re: Multithreading

Reply #112
Oh yeah, my bad...
`grep` found matches with the text "dlopen" in my static OMP library (libomp.a).
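For anyone without `grep` at hand, the same kind of check can be sketched in Python. `count_occurrences` and the fake archive contents are made up for illustration. Note that a plain byte match only shows the string is present in the file, not that the function is actually called.

```python
import tempfile
from pathlib import Path


def count_occurrences(path, needle=b"dlopen"):
    """Count raw byte matches of `needle` in a file (like a crude grep -c)."""
    return Path(path).read_bytes().count(needle)


# Demo on a throwaway file standing in for a static library (made-up bytes).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "fake_lib.a"
    sample.write_bytes(b"\x00\x01dlopen\x00dlsym\x00dlopen\x00")
    demo_hits = count_occurrences(sample)

print(demo_hits)
```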

And it also explains why I didn't have this warning when I first compiled (since OMP didn't work).

I have to admit that I don't know if there's a solution to "remove" this `dlopen()`, and if there is, how to find it.
I think I'll leave it at that (unless someone knows how).
Of course, if need be, I can rebuild it.

Re: Multithreading

Reply #113
dlopen issues sound like libc. A common solution is to build against a relatively ancient version of glibc, for example from an ancient version of CentOS (that version and likely future versions of glibc should be able to run it). Or maybe musl would be a good way to go now.

Re: Multithreading

Reply #114
@cid42 I don't think so, because it's not the first time I've compiled an executable statically (on the other hand, it's the first time I've used a static version of OMP).

You can see here that OMP is using `dlopen()`: https://github.com/llvm/llvm-project/blob/858a2865d3008b35f22597a411b2b4f7110aaa15/openmp/runtime/src/kmp_alloc.cpp#L1277 (one example among others)

Also, I only needed to compile 4 static libs in order to make a static build of flaccid: libcrypto.a, libFLAC-static.a, libogg.a, libomp.a.
The problem is really OMP, and even an "old" version from 2020 already has `dlopen()` calls. https://github.com/llvm-mirror/openmp

Re: Multithreading

Reply #115
The new flaccid binary @TheBigBadBoy provided seems to work well with the larger files. For the simple presets, even 9 and 9p seem to work (no corruption in my tests so far). But when I got into the complex settings, I found an error with merge when used in combination with queue set to anything above 256.
Code: [Select]
./flaccid --in NIN_The_Fragile.wav --out NIN_The_Fragile-fl.flac --workers 16 --mode peakset pr7l12m --queue 8192 --merge 1 --tweak 0
Processed 1/7
Processed 3/7
Processed 7/7
merge(1) saved 6721 bytes with 423 merges
Processed 1/7
Processed 3/7
Processed 7/7
merge(1) saved 2604 bytes with 152 merges
Processed 1/7
Processed 3/7
Processed 7/7
merge(1) saved 2764 bytes with 160 merges
Processed 1/7
Processed 3/7
Processed 7/7
merge(1) saved 3256 bytes with 189 merges
Processed 1/7
Processed 3/7
Processed 7/7
Processed 1/7
Processed 3/7
Processed 7/7
merge(1) saved 5227 bytes with 275 merges
flaccid: common.c:128: void simple_enc_encode(simple_enc *, flac_settings *, input *, uint32_t, uint64_t, int, stats *): Assertion `samples' failed.
Aborted

Re: Multithreading

Reply #116
Thanks, that assertion implies that 0 blocksize frames are trying to be encoded, which used to be a definite no but now are possible as the queue is implemented as an array. If it is what I think then it occurs because analysis and output settings are different and it just never got caught in testing. Strange that queue size matters, strange that it doesn't fail at the first instance of a 0 blocksize encode. There might be something deeper going on.

Re: Multithreading

Reply #117
Even with the same input file and options, it doesn't fail at the same point. Sometimes it fails early in the file, sometimes later. There were even a couple of successful runs.

Re: Multithreading

Reply #118
Only with merge, right? I think I've found a flaw in the logic.

edit: Another classic memory handling error, introduced at some point during the transition to an output queue and more commonly triggered by longer files.

Re: Multithreading

Reply #119
Only when using both merge and queue. Any time I remove one of those two options, there are no errors.

 

Re: Multithreading

Reply #121
As of the latest commit:
  • Fixed the merge bug, caused by a race condition when multithreading. A variable was erroneously shared between threads: rarely, a merge occurred but the counter didn't update, and if a pass ended with the counter at zero when it should have been >0, then kaboom. It seems to have been triggered more often with larger files and larger queues because the extra rope gave it more chance to hang itself
  • Reviewed the rest of the multithreading code, there's not much of it as most of it was combined into the queue code with only some short snippets in gset/gasc/peakset. It all seems sound
  • Updated the (still convoluted and dense) help/readme to answer questions TheBigBadBoy and Replica9000 recently had
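For readers unfamiliar with this class of bug, one common way to avoid a lost update on a shared counter is to give each worker its own private counter and combine the results afterwards, similar in spirit to an OpenMP reduction. The Python sketch below illustrates that pattern only. It is not the actual fix in flaccid, and `merge_chunk` and `count_merges` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_chunk(chunk):
    """Hypothetical worker: count the 'merges' in its own chunk only."""
    local_merges = 0  # private to this call, never shared between threads
    for can_merge in chunk:
        if can_merge:
            local_merges += 1
    return local_merges


def count_merges(flags, workers=4):
    """Split the work, let each worker count privately, then sum the results."""
    size = max(1, len(flags) // workers)
    chunks = [flags[i:i + size] for i in range(0, len(flags), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(merge_chunk, chunks))


# Made-up input: every seventh queue entry is mergeable.
flags = [i % 7 == 0 for i in range(10_000)]
total = count_merges(flags)
print(total)
```

Because no thread ever writes to another thread's counter, the total cannot end up at zero when merges did happen.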

I'll leave updating to the latest libflac until libflac multithreading is merged or there's a major compression benefit. Right now flaccid is 70 commits behind but there's nothing too relevant AFAICT. Haven't revisited gasc yet.


Re: Multithreading

Reply #123
Thanks for the update cid42!
And yes, updating the libFLAC version used by flaccid is not really useful (for now), as there would be no real benefit (the main "things" of flaccid being multithreading and compression).

@Replica9000 Thanks for the builds. There are matches of "dlopen" (dynamic library open) in your Linux build, which was quite intended (like mine).
But I just built it statically on Termux for aarch64 CPU architecture, and surprisingly I got no warning about dlopen when compiling (no strict glibc version to have), and there is no match for dlopen in the archive.
So I guess there's a way to have the same behaviour for a x86-64 executable, but I need to dig in more.

Re: Multithreading

Reply #124
@Replica9000 Thanks for the builds. There are matches of "dlopen" (dynamic library open) in your Linux build, which was quite intended (like mine).
But I just built it statically on Termux for aarch64 CPU architecture, and surprisingly I got no warning about dlopen when compiling (no strict glibc version to have), and there is no match for dlopen in the archive.
So I guess there's a way to have the same behaviour for a x86-64 executable, but I need to dig in more.

I only get one warning about dlopen when compiling:
Code: [Select]
warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

But it doesn't seem the static binary is calling for glibc
Code: [Select]
readlinkat(AT_FDCWD, "/proc/self/exe", "/home/replica/user-data/builds/f"..., 4096) = 74
openat(AT_FDCWD, "/sys/devices/system/cpu/possible", O_RDONLY|O_CLOEXEC) = 3

vs a dynamic binary
Code: [Select]
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libogg.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libmvec.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libgomp.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/sys/devices/system/cpu/possible", O_RDONLY|O_CLOEXEC) = 3

So isn't dlopen just making internal calls to the libs statically built into the binary?
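To compare traces like the two above more systematically, the `openat` lines can be filtered for shared libraries. This is a rough Python sketch: the regular expressions and the `shared_libs_opened` helper are my own assumptions about the strace line format shown here, not a general strace parser.

```python
import re

# Matches the quoted path argument of an openat(...) line as shown above.
OPENAT_PATH = re.compile(r'openat\([^,]+,\s*"([^"]+)"')
# A path ending in .so, optionally followed by numeric version suffixes.
SHARED_LIB = re.compile(r"\.so(\.\d+)*$")


def shared_libs_opened(strace_lines):
    """Return the shared-library paths opened according to strace output."""
    libs = []
    for line in strace_lines:
        match = OPENAT_PATH.search(line)
        if match and SHARED_LIB.search(match.group(1)):
            libs.append(match.group(1))
    return libs


static_trace = [
    'openat(AT_FDCWD, "/sys/devices/system/cpu/possible", O_RDONLY|O_CLOEXEC) = 3',
]
dynamic_trace = [
    'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libogg.so.0", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3',
]
print(shared_libs_opened(static_trace))
print(shared_libs_opened(dynamic_trace))
```

An empty list for one run only shows that no shared library was opened on the code paths that run exercised. It does not rule out a `dlopen` call on some other path.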