WavPack 5.6.4 Has Multithreading Option

Some of you might be aware that I have been working with the Allen Institute for Neural Dynamics team on the applicability of WavPack for compressing electrophysiology data collected from probes inserted into the brain. These probes generally have many channels (e.g., 384), and while the team was looking at FLAC they were referred to WavPack (thanks @ktf) because it natively handles up to 4096 channels. You can read about this on GitHub here.

One of the issues that came up in this project was that while the encoding was parallelized in the Python front end, the decoding was not, and therefore was slow. I realized that it would not be that complicated to parallelize multichannel decoding right inside libwavpack. All of the encoded data for every channel must already be resident before decoding starts and there’s no interaction between the various streams. I put together a proof-of-concept for this and it greatly sped up their decoding. While it wasn’t needed initially, I also extended this to encoding because the organization was similar. I refer to this as spatial multithreading.
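To make the spatial idea concrete, here is a minimal Python sketch. It is not libwavpack code: `decode_channel` is a hypothetical stand-in (plain delta decoding) for the real per-channel decoder, and `decode_all_channels` and its `workers` parameter are made-up names. It only shows the shape of the approach: because each channel's encoded data is already in memory and the channels do not interact, each one can be handed to a worker and the results collected in channel order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def decode_channel(deltas):
    """Hypothetical per-channel decoder: undo simple delta coding."""
    return list(accumulate(deltas))


def decode_all_channels(encoded_channels, workers=4):
    """Decode every channel independently; output keeps channel order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even if workers finish out of order
        return list(pool.map(decode_channel, encoded_channels))


if __name__ == "__main__":
    encoded = [[1, 1, 1], [10, -2, -2], [0, 5, 0]]
    print(decode_all_channels(encoded))
```

Note that in CPython the GIL keeps pure-Python workers from actually running in parallel, so this illustrates the structure rather than the speedup a C implementation would get.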

Once I had written all the code to have multiple worker threads in libwavpack I could not help but begin thinking about extending this to mono and stereo streams, which would have to be done temporally instead of spatially. This is more complex because it would only work with operations that span multiple WavPack blocks (which are already fairly long) so the command-line programs, and any other application that wanted to use this, would also have to change. And with encoding there’s an additional complication that WavPack, being continuously adaptive, uses state information from the end of the previous block to start encoding the next block.

But long story short, I found ways around most of these issues and created a set of command-line programs that include a --threads option to allow multithreaded encoding and decoding, and the results are pretty dramatic. The speed improvement is generally between 2x and 4x, depending on the modes used, and can be as high as about 6x with the maximum thread count (--threads=12). The only serious limitation that I wasn’t able to get around is that the hybrid mode (lossy or lossless) does not encode multithreaded with stereo or mono files (although most multichannel encoding is fine with it). Multithreaded decoding is available for all files.

To be clear, this is not any more efficient than older WavPack versions, and in fact is slightly less so because of the context-switching overhead. The speedup comes from using multiple cores or threads, so unless a single core can't keep up (which seems an unlikely situation), using this for realtime playback or recording does not make sense. It does make sense for some offline operations though, like verify or transcode, and on my system CD encoding with -hhx4 goes from 11x RT to 59x RT!

Other notes:

  • This includes a lot of new and changed code, so I expect bugs to appear, and it’s more important than ever to use the -v option when encoding to make sure everything worked.
  • If enabled, the default number of worker threads is 4 and that seems to be a reasonable compromise, even on computers that have fewer than 4 cores (see below). The maximum parameter is 12 workers, and this can give an additional speedup in some cases, though not all. For example, on battery power my new HP Dev One seems to throttle down the CPU speed when more than 4 cores are running, so there is no improvement beyond that. On AC power it doesn't.
  • I have tested this on an old Core 2 Duo system that served as my main PC until this happened and even there I get a nice speedup. This old system somewhat explains why I wasn’t too interested in this earlier (I was a little late to the multi-core party).
  • For multichannel encoding, the output files should be identical whether threading is used or not. For now, dynamic noise shaping (dns) in the hybrid lossless mode and the --merge-blocks (for lossyWAV) option prevent multithreaded operation (although I might be able to fix those eventually).
  • For mono or stereo encoding, the output files probably will not be identical for different numbers of worker threads, but the compression ratios will be roughly similar. I recommend always using at least -x1 for all mono or stereo encodes because with multithreading it's basically free. Again, multithreaded encoding does not work for mono or stereo files in any hybrid mode.
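For anyone wondering why carried-over state gets in the way of temporal threading, and why output can differ, here is a toy Python sketch. Everything in it is hypothetical: the first-order adaptive predictor, the function names, and in particular the "start each block from a fresh state" shortcut are illustrations only, and none of this is what libwavpack actually does. With `carry_state=True` each block needs the end state of the previous one, so blocks must be encoded in order. With `carry_state=False` the blocks are independent and could go to separate workers, but the residuals come out different.

```python
def _adapt(weight, prev, residual):
    """Nudge the predictor weight by one step based on the signs seen."""
    if prev and residual:
        weight += 1 if (residual > 0) == (prev > 0) else -1
    return weight


def encode(samples, state=None):
    """Toy adaptive predictor; returns (residuals, end_state)."""
    prev, weight = state if state is not None else (0, 0)
    residuals = []
    for s in samples:
        r = s - ((weight * prev) >> 4)  # weight is fixed-point, scaled by 16
        residuals.append(r)
        weight = _adapt(weight, prev, r)
        prev = s
    return residuals, (prev, weight)


def decode(residuals, state=None):
    """Exact inverse of encode(), given the same starting state."""
    prev, weight = state if state is not None else (0, 0)
    samples = []
    for r in residuals:
        s = r + ((weight * prev) >> 4)
        samples.append(s)
        weight = _adapt(weight, prev, r)
        prev = s
    return samples, (prev, weight)


def encode_blocks(samples, block_len, carry_state):
    """Encode block by block, optionally carrying state across blocks."""
    state, blocks = None, []
    for i in range(0, len(samples), block_len):
        residuals, end_state = encode(samples[i:i + block_len], state)
        blocks.append(residuals)
        state = end_state if carry_state else None
    return blocks


def decode_blocks(blocks, carry_state):
    """Decode blocks using the same state convention as the encoder."""
    state, out = None, []
    for residuals in blocks:
        samples, end_state = decode(residuals, state)
        out.extend(samples)
        state = end_state if carry_state else None
    return out


if __name__ == "__main__":
    signal = [100, 102, 104, 106, 108, 110, 112, 114]
    print(encode_blocks(signal, 4, carry_state=True))
    print(encode_blocks(signal, 4, carry_state=False))
```

Both variants decode back to the original samples; only the encoded residuals after the first block differ, which is the same flavor of "not identical, but still lossless" described above.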

Thanks in advance for any testing and/or feedback!

Re: WavPack 5.6.4 Has Multithreading Option

Reply #1
Cool!

Probably a good idea to instruct oss-fuzz to use the thread sanitizer (TSan) for fuzzing to catch some bugs. Perhaps you've already run tests with TSan locally?
Music: sounds arranged such that they construct feelings.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #2
Thanks!

I definitely run all the sanitizers (including tsan) locally quite often.

As for the oss-fuzz, that's probably a good thing to add too. Unfortunately an issue I can see is that it requires big files and big reads to invoke the multithreading (at least the temporal variety) and the fuzzers get slow and inefficient with large files (I spent a lot of time distilling the seeds down to the absolute smallest size I could). But there may be some way to spoof that (like using really small frames).

So, why? Did you see something?   :D

Re: WavPack 5.6.4 Has Multithreading Option

Reply #3
So, why? Did you see something?   :D
No, I just wanted to help  :)) I have been busy the last few months improving the code coverage of oss-fuzz for flac and fixing the bugs it found, so it really was the first thing that popped into my mind reading this.

(I spent a lot of time distilling the seeds down to the absolute smallest size I could).
Ah, yes. I guess I can consider myself lucky, I really only need to worry about coverage and fixing bugs with flac, I haven't really bothered seeding anything.

I wasn’t too interested in this earlier (I was a little late to the multi-core party).
I think WavPack is the first codec employing multithreading "within" a single input (at least with 'reference' software), so in that sense, you're the first to the party really!


Re: WavPack 5.6.4 Has Multithreading Option

Reply #4
Quickly testing one file (tested for ID3 import capability as you can see, but I might as well run --threads)

tl;dr:
Confirming that -x1 runs "virtually for free".


Some indicative times, recorded from WavPack's own timer - for any sort of rigor I would have to run this computer up to a stable temperature, and well ... not tonight.

-fx<N>, running the following loop with the x64 build, on an SSD:
for %x IN (0,1,4,6) DO (C:\bin\wavpack-5.6.4-win64\wavpack.exe --import-id3 -yfx%x ".\12277922_Galaxia_(Extended_Mix).aiff" & C:\bin\wavpack-5.6.4-win64\wavpack.exe --threads --import-id3 -yfx%x ".\12277922_Galaxia_(Extended_Mix).aiff")
0.9 vs 0.7 seconds for x0
1.6 vs 0.6 seconds for x1
6.5 vs 2.5 seconds for x4
9.3 vs 3.4 seconds for x6

Adding "mv", so --import-id3 -yfmvx%x,
1.8 vs 1.1 seconds for x0
2.3 vs 1.1 seconds for x1
7.3 vs 2.9 seconds for x4
9.9 vs 3.9 seconds for x6

Keep mv but use -g (normal mode):
2.9 vs 1.7 seconds for x0
4.0 vs 1.8 seconds for x1
30 vs 10 seconds for x4
49 vs 17 seconds for x6

h:
2.3 vs 1.3 seconds for x0
3.0 vs 1.6 seconds for x1
19.4 vs 6.8 seconds for x4
55 vs 19 seconds for x6

hh:
3.5 vs 2.0 seconds for x0
5.1 vs 2.2 seconds for x1
45 vs 17 seconds for x4
98 vs 32 seconds for x6
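
For anyone who prefers ratios to raw seconds, this small Python sketch turns the timings above into speedup factors. The numbers are copied from the lists above (single-threaded first, --threads second); the mode labels, `TIMES`, and the function names are only for illustration. It also checks the "virtually for free" observation in one specific sense: whether -x1 with --threads finishes no slower than -x0 without it.

```python
# Seconds copied from the timings above: {x level: (single-threaded, --threads)}
TIMES = {
    "f":    {0: (0.9, 0.7), 1: (1.6, 0.6), 4: (6.5, 2.5), 6: (9.3, 3.4)},
    "fmv":  {0: (1.8, 1.1), 1: (2.3, 1.1), 4: (7.3, 2.9), 6: (9.9, 3.9)},
    "g+mv": {0: (2.9, 1.7), 1: (4.0, 1.8), 4: (30, 10), 6: (49, 17)},
    "h":    {0: (2.3, 1.3), 1: (3.0, 1.6), 4: (19.4, 6.8), 6: (55, 19)},
    "hh":   {0: (3.5, 2.0), 1: (5.1, 2.2), 4: (45, 17), 6: (98, 32)},
}


def speedup(single, threaded):
    """How many times faster the threaded run was."""
    return single / threaded


def threaded_x1_beats_single_x0(rows):
    """True if -x1 with --threads took no longer than -x0 without it."""
    return rows[1][1] <= rows[0][0]


if __name__ == "__main__":
    for mode, rows in TIMES.items():
        for x, (single, threaded) in rows.items():
            print(f"{mode:5s} x{x}: {speedup(single, threaded):.1f}x")
        print(f"{mode:5s} threaded x1 <= single x0:",
              threaded_x1_beats_single_x0(rows))
```

On these (unrigorous, single-run) numbers the speedups run from roughly 1.3x to 3.1x, and the x1 check holds in every mode.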

 

Re: WavPack 5.6.4 Has Multithreading Option

Reply #5
I wasn’t too interested in this earlier (I was a little late to the multi-core party).
I think WavPack is the first codec employing multithreading "within" a single input (at least with 'reference' software), so in that sense, you're the first to the party really!
Good point! I'm still not sure how much sense multithreading makes there, but after I had spent so much time getting the infrastructure in, it seemed silly not to go all the way. Fortunately I did it in a way that it can easily be disabled in the build if it turns out to be not ready for prime time.

Quickly testing one file (tested for ID3 import capability as you can see, but I might as well run --threads)

tl;dr:
Confirming that -x1 runs "virtually for free".


Some indicative times, recorded from WavPack's own timer - for any sort of rigor I would have to run this computer up to a stable temperature, and well ... not tonight.
Thanks Porcus! Yeah, at this point I'm happy to see it not crash, so that's good. I've done a lot of multithreaded programming on embedded systems, but not on Windows.

Am considering making the base "extra" mode come on by default when "--threads" is specified. And of course, if this ends up universally stable, might be worth making threading the default too (and add "--no-threads" perhaps).