
WavPack 5.6.4 Has Multithreading Option

Some of you might be aware that I have been working with the Allen Institute for Neural Dynamics team on the applicability of WavPack for compressing electrophysiology data collected from probes inserted into the brain. These probes generally have many channels (e.g., 384), and while they were looking at FLAC they were referred to WavPack (thanks @ktf) because it natively handles up to 4096 channels. You can read about this on GitHub here.

One of the issues that came up in this project was that while the encoding was parallelized in the Python front end, the decoding was not, and therefore was slow. I realized that it would not be that complicated to parallelize multichannel decoding right inside libwavpack. All of the encoded data for every channel must already be resident before decoding starts and there’s no interaction between the various streams. I put together a proof-of-concept for this and it greatly sped up their decoding. While it wasn’t needed initially, I also extended this to encoding because the organization was similar. I refer to this as spatial multithreading.
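In case it helps to picture it, the spatial approach boils down to something like the minimal pthreads sketch below. The names are hypothetical and this is not the actual libwavpack code (which caps and reuses a small pool of workers rather than spawning one thread per stream), but the shape of the problem is the same: every stream's data is already in memory and each one can be unpacked independently.
Code:
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-stream decode job: all of this stream's encoded blocks
   are already resident, and streams don't interact, so each one can be
   unpacked independently of the others. */
typedef struct {
    const unsigned char *packed;    /* this stream's encoded data  */
    size_t packed_bytes;
    int *samples;                   /* where decoded samples go    */
    size_t sample_count;
} stream_job;

static void decode_stream(stream_job *job)
{
    (void) job;                     /* stand-in for the real per-stream unpacker */
}

static void *worker(void *arg)
{
    decode_stream((stream_job *) arg);
    return NULL;
}

/* Spatial multithreading, naive version: one worker per stream, then join.
   (A real implementation would use a small, fixed pool of reusable workers.) */
void decode_all_streams(stream_job *jobs, pthread_t *threads, int num_streams)
{
    for (int i = 0; i < num_streams; ++i)
        pthread_create(&threads[i], NULL, worker, &jobs[i]);

    for (int i = 0; i < num_streams; ++i)
        pthread_join(threads[i], NULL);
}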

Once I had written all the code to have multiple worker threads in libwavpack I could not help but begin thinking about extending this to mono and stereo streams, which would have to be done temporally instead of spatially. This is more complex because it would only work with operations that span multiple WavPack blocks (which are already fairly long) so the command-line programs, and any other application that wanted to use this, would also have to change. And with encoding there’s an additional complication that WavPack, being continuously adaptive, uses state information from the end of the previous block to start encoding the next block.

But long story made short, I found ways around most of these issues and created a set of command-line programs that include a --threads option to allow multithreaded encoding and decoding, and the results are pretty dramatic. The speed improvement is generally between 2x and 4x, depending on the modes used, and can be as high as about 6x with the maximum thread count (--threads=12). The only serious limitation that I wasn’t able to get around is that the hybrid mode (lossy or lossless) does not encode multithreaded with stereo or mono files (although most multichannel encoding is fine with it). Multithreaded decoding is available for all files.

To be clear, this is not any more efficient than older WavPack versions, and in fact is slightly less so because of the context-switching overhead. The speedup comes from using multiple cores or threads, so unless a single core can't keep up (which seems unlikely), using this for realtime playback or recording does not make sense. It does make sense for some offline operations though, like verify or transcode, and on my system CD encoding with --hhx4 goes from 11x RT to 59x RT!

Other notes:

  • This includes a lot of new and changed code, so I expect bugs to appear; it’s more important than ever to use the -v option when encoding to make sure everything worked.
  • If enabled, the default number of worker threads is 4, which seems to be a reasonable compromise, even on computers that have fewer than 4 cores (see below). The maximum is 12 workers, which can give an additional speedup in some cases, but not all. For example, my new HP Dev One seems to throttle down the CPU speed when more than 4 cores are running, but only on battery power, so there's no improvement beyond 4 threads there; on AC power it doesn't.
  • I have tested this on an old Core 2 Duo system that served as my main PC until this happened and even there I get a nice speedup. This old system somewhat explains why I wasn’t too interested in this earlier (I was a little late to the multi-core party).
  • For multichannel encoding, the output files should be identical whether threading is used or not. For now, dynamic noise shaping (dns) in the hybrid lossless mode and the --merge-blocks (for lossyWAV) option prevent multithreaded operation (although I might be able to fix those eventually).
  • For mono or stereo encoding, the output files probably will not be identical for different numbers of worker threads, however the compression ratios will be roughly similar. I recommend always using at least -x1 for all mono or stereo encodes because with multithreading it's basically free. Again, multithreaded encoding does not work for mono or stereo files in any hybrid mode.

Thanks in advance for any testing and/or feedback!

Edit: updated executables are now in this post. (I don't seem to be able to delete these).

 

Re: WavPack 5.6.4 Has Multithreading Option

Reply #1
Cool!

Probably a good idea to instruct oss-fuzz to use the thread sanitizer (TSan) for fuzzing to catch some bugs. Perhaps you've already run tests with TSan locally?
Music: sounds arranged such that they construct feelings.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #2
Cool!

Probably a good idea to instruct oss-fuzz to use the thread sanitizer (TSan) for fuzzing to catch some bugs. Perhaps you've already run tests with TSan locally?
Thanks!

I definitely run all the sanitizers (including tsan) locally quite often.

As for oss-fuzz, that's probably a good thing to add too. Unfortunately, an issue I can see is that it requires big files and big reads to invoke the multithreading (at least the temporal variety), and the fuzzers get slow and inefficient with large files (I spent a lot of time distilling the seeds down to the absolute smallest size I could). But there may be some way to spoof that (like using really small frames).

So, why? Did you see something?   :D

Re: WavPack 5.6.4 Has Multithreading Option

Reply #3
So, why? Did you see something?   :D
No, I just wanted to help  :)) I have been busy the last few months improving code coverage of oss-fuzz for flac and fixing the bugs it found, so it really was the first thing that popped into my mind reading this.

(I spent a lot of time distilling the seeds down to the absolute smallest size I could).
Ah, yes. I guess I can consider myself lucky: with flac I really only need to worry about coverage and fixing bugs, and I haven't really bothered seeding anything.

I wasn’t too interested in this earlier (I was a little late to the multi-core party).
I think WavPack is the first codec employing multithreading "within" a single input (at least with 'reference' software), so in that sense, you're the first to the party really!

Music: sounds arranged such that they construct feelings.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #4
Quickly testing one file (tested for ID3 import capability as you can see, but I might as well run --threads)

tl;dr:
Confirming that -x1 runs "virtually for free".


Some indicative times, recorded from WavPack's own timer - for any sort of rigor I would have to run this computer up to a stable temperature, and well ... not tonight.

-fx<N>, running the following loop with the x64 build, on an SSD:
for %x IN (0,1,4,6) DO (C:\bin\wavpack-5.6.4-win64\wavpack.exe --import-id3 -yfx%x ".\12277922_Galaxia_(Extended_Mix).aiff" & C:\bin\wavpack-5.6.4-win64\wavpack.exe --threads --import-id3 -yfx%x ".\12277922_Galaxia_(Extended_Mix).aiff")
0.9 vs 0.7 seconds for x0
1.6 vs 0.6 seconds for x1
6.5 vs 2.5 seconds for x4
9.3 vs 3.4 seconds for x6

Adding "mv", so --import-id3 -yfmvx%x,
1.8 vs 1.1 seconds for x0
2.3 vs 1.1 seconds for x1
7.3 vs 2.9 seconds for x4
9.9 vs 3.9 seconds for x6

Keep mv but use -g (normal mode):
2.9 vs 1.7 seconds for x0
4.0 vs 1.8 seconds for x1
30 vs 10 seconds for x4
49 vs 17 seconds for x6

h:
2.3 vs 1.3 seconds for x0
3.0 vs 1.6 seconds for x1
19.4 vs 6.8 seconds for x4
55 vs 19 seconds for x6

hh:
3.5 vs 2.0 seconds for x0
5.1 vs 2.2 seconds for x1
45 vs 17 seconds for x4
98 vs 32 seconds for x6

Re: WavPack 5.6.4 Has Multithreading Option

Reply #5
I wasn’t too interested in this earlier (I was a little late to the multi-core party).
I think WavPack is the first codec employing multithreading "within" a single input (at least with 'reference' software), so in that sense, you're the first to the party really!
Good point! I'm still not sure how much sense multithreading makes there, but after I had spent so much time getting the infrastructure in, it seemed silly not to go all the way. Fortunately I did it in a way that it can easily be disabled in the build if it turns out to be not ready for prime time.

Quickly testing one file (tested for ID3 import capability as you can see, but I might as well run --threads)

tl;dr:
Confirming that -x1 runs "virtually for free".


Some indicative times, recorded from WavPack's own timer - for any sort of rigor I would have to run this computer up to a stable temperature, and well ... not tonight.
Thanks Porcus! Yeah, at this point I'm happy to see it not crash, so that's good. I've done a lot of multithreading programming on embedded systems, but not on Windows.

Am considering making the base "extra" mode come on by default when "--threads" is specified. And of course, if this ends up universally stable, might be worth making threading the default too (and add "--no-threads" perhaps).

Re: WavPack 5.6.4 Has Multithreading Option

Reply #6
I wasn’t too interested in this earlier (I was a little late to the multi-core party).
I think WavPack is the first codec employing multithreading "within" a single input (at least with 'reference' software), so in that sense, you're the first to the party really!
Good point!

Except, TAK does it already. The -tn switch makes an impact even when processing a single file:
Code:
C:\tmp>\bin\TAK\Applications\Takc.exe -e -p4 -tn1 -overwrite NNN.wav
NNN.wav                             ..........  36.18%  256*

Compression:     36.18 %
Duration:         2.45 sec
Speed:          256.16 * real time

C:\tmp>\bin\TAK\Applications\Takc.exe -e -p4 -tn4 -overwrite NNN.wav
NNN.wav                             ..........  36.18%  692*

Compression:     36.18 %
Duration:         0.91 sec
Speed:          691.99 * real time

Re: WavPack 5.6.4 Has Multithreading Option

Reply #7
Tested the following, on one source file only (an AIFF available from https://soundcloud.com/kavakon/kava-kon-sakau-bar-heavy-mix )
For mono or stereo encoding, the output files probably will not be identical for different numbers of worker threads, however the compression ratios will be roughly similar.

I tested "all 364 possibilities" f, g, h, hh, -x0 to -x6, --threads=0 to --threads=12. With the reservation that follows from testing only one file, there is an unfortunate effect here:
The recommended "-x" is where the multi-threading does have adverse size impact.
Worst for -hx, any --threads (apart from 0) gave a size penalty of >2%
For -hhx, >1.34%
For -x, 0.66% to 1.34%
For -fx, 0.57% to 0.62%.
So yeah, well, -x is free but not free.
Also, for -x0, any --threads=4 or above was within 11 ppm of minimum size, and better than --threads=0 (except two cherry-pickings: -hh beat --hh --threads=11 and --hh --threads=5 by eighteen and two bytes respectively)

First, --threads=0 makes no difference compared to not using "--threads" at all (it always gives bitwise-identical .wv files). That is probably as you would expect.
But --threads=1 makes a difference.  Not sure if that was intended - it is still only one thread, right? And, typically it gave bigger files than --threads=0.  Not always though.

--threads=0 made for
* smallest files in the following settings: fx, fx2, x, x2, x4, x6, hx1, hx2, hx4, hx5, hx6, hhx1, hhx2
* BIGGEST files in the following settings: fx0 (+0.03% over smallest), x0 (+0.02%), x3, h (+0.02%), hx3, hhx3, hhx4 (+0.09%).  hhx4 is a setting people might want to use.

Settings where the "worst" makes a bigger impact over the "best" than the 0.09% of hhx4 are: the fx, x, hx, hhx as mentioned above, and anything with x2 (0.10% for fx2 to 0.90% for hx2) and [g|h]x[4|5|6], ranging from +0.14% to +0.24%.

The CPU has 8 threads, so --threads=9 to --threads=12 are oddballs (I guess it is too much work to probe the hardware). Anyway, one of those four settings - indicated by 9, A, B and C in the table - makes for the smallest files in the following settings: fx3, x, x3, h, hh, hhx3, hhx4. Again, hhx4 is something that could make a difference.


File sizes follow, sorted. The command included -v and --import-id3, in case anyone gets nearly the same.
Code:
17966398	 -hh -x6 --threads4
17966450 -hh -x6 --threads5
17967270 -hh -x6 --threads6
17967560 -hh -x6 --threads7
17967600 -hh -x6 --threads8
17967668 -hh -x6 --threads0
17967756 -hh -x6 --threads2
17967958 -hh -x6 --threadsC
17968006 -hh -x6 --threads1
17968026 -hh -x6 --threads9
17968488 -hh -x6 --threads3
17968490 -hh -x6 --threadsB
17968652 -hh -x6 --threadsA
17979998 -hh -x5 --threadsC
17980986 -hh -x5 --threadsB
17982024 -hh -x5 --threadsA
17982170 -hh -x5 --threads9
17983078 -hh -x5 --threads8
17983766 -hh -x5 --threads7
17984294 -hh -x5 --threads6
17985008 -hh -x5 --threads4
17985234 -hh -x5 --threads5
17987356 -hh -x5 --threads0
17987624 -hh -x5 --threads2
17987740 -hh -x5 --threads1
17988302 -hh -x5 --threads3
17995618 -h -x6 --threads0
17996732 -h -x6 --threads2
17997394 -h -x6 --threads1
17998212 -h -x6 --threads3
18002710 -h -x5 --threads0
18003940 -h -x5 --threads2
18004596 -h -x5 --threads1
18005428 -h -x5 --threads3
18006674 -h -x6 --threads5
18011782 -h -x5 --threads4
18012066 -h -x5 --threads7
18012914 -h -x6 --threads7
18015318 -h -x5 --threads5
18015976 -h -x5 --threads6
18016120 -h -x6 --threads8
18017020 -h -x6 --threads4
18017184 -h -x5 --threads8
18019110 -h -x5 --threads9
18023404 -h -x5 --threadsA
18026424 -h -x5 --threadsB
18028274 -h -x6 --threads9
18028480 -hh -x4 --threadsC
18029350 -h -x5 --threadsC
18030972 -hh -x4 --threadsB
18031734 -h -x6 --threads6
18033154 -hh -x4 --threadsA
18033974 -h -x6 --threadsC
18035038 -hh -x4 --threads9
18036618 -hh -x4 --threads8
18037902 -hh -x4 --threads7
18038380 -h -x6 --threadsA
18039316 -h -x6 --threadsB
18039978 -hh -x4 --threads6
18040884 -h -x4 --threads0
18041266 -hh -x4 --threads5
18041314 -h -x4 --threads1
18041348 -h -x4 --threads2
18041436 -h -x4 --threads3
18043078 -hh -x4 --threads4
18044320 -hh -x4 --threads2
18044324 -hh -x4 --threads3
18044466 -hh -x4 --threads1
18044554 -h -x4 --threads4
18044598 -hh -x4 --threads0
18046308 -h -x4 --threads5
18047586 -h -x4 --threads6
18048766 -h -x4 --threads7
18050652 -h -x4 --threads8
18057090 -h -x4 --threads9
18058732 -h -x4 --threadsA
18063230 -h -x4 --threadsB
18068098 -h -x4 --threadsC
18171702 -g -x6 --threads0
18171988 -g -x6 --threads1
18171990 -g -x6 --threads3
18172008 -g -x6 --threads2
18175502 -g -x6 --threads4
18176508 -g -x6 --threads5
18178420 -g -x6 --threads6
18180804 -g -x6 --threads7
18184430 -g -x6 --threads8
18187504 -g -x6 --threads9
18190894 -g -x6 --threadsB
18191862 -g -x6 --threadsC
18196432 -g -x6 --threadsA
18206414 -hh -x2 --threads0
18210310 -hh -x3 --threadsB
18210328 -hh -x3 --threads7
18210338 -hh -x3 --threads5
18210342 -hh -x3 --threads8
18210344 -hh -x3 --threadsA
18210344 -hh -x3 --threads9
18210368 -hh -x3 --threadsC
18210380 -hh -x3 --threads4
18210394 -hh -x3 --threads6
18210586 -hh -x3 --threads3
18210616 -hh -x3 --threads2
18210766 -hh -x3 --threads1
18211166 -hh -x3 --threads0
18226536 -hh -x1 --threads0
18260694 -g -x5 --threads2
18260780 -g -x5 --threads1
18261060 -g -x5 --threads0
18261076 -g -x5 --threads3
18268110 -hh -x2 --threads1
18273464 -hh -x2 --threads2
18277048 -hh -x2 --threads3
18280678 -g -x5 --threads6
18281794 -g -x4 --threads0
18281950 -g -x4 --threads1
18282254 -g -x4 --threads2
18282620 -g -x4 --threads3
18286700 -g -x5 --threads5
18287026 -hh -x2 --threads4
18287250 -g -x5 --threads4
18289230 -hh -x2 --threads9
18289744 -hh -x2 --threads7
18289816 -hh -x2 --threads5
18289874 -hh -x2 --threads8
18290282 -hh -x2 --threads6
18290642 -g -x5 --threads7
18290838 -hh -x2 --threadsC
18291320 -hh -x2 --threadsB
18291568 -hh -x2 --threadsA
18294614 -g -x5 --threadsC
18294914 -g -x5 --threads8
18295936 -g -x5 --threads9
18299050 -g -x4 --threads6
18299988 -g -x5 --threadsB
18300490 -g -x5 --threadsA
18301486 -g -x4 --threads4
18304786 -g -x4 --threads5
18305224 -g -x4 --threads7
18310760 -g -x4 --threadsC
18311384 -g -x4 --threads8
18314142 -g -x4 --threads9
18317336 -g -x4 --threadsA
18319496 -g -x4 --threadsB
18345958 -h -x3 --threads7
18345960 -h -x3 --threadsC
18345962 -h -x3 --threadsB
18345968 -h -x3 --threads9
18345992 -h -x3 --threads6
18345994 -h -x3 --threadsA
18345996 -h -x3 --threads8
18346000 -h -x3 --threads4
18346008 -h -x3 --threads5
18346210 -h -x3 --threads3
18346320 -h -x3 --threads2
18346382 -h -x3 --threads1
18346894 -h -x3 --threads0
18365224 -h -x2 --threads0
18371064 -h -x1 --threads0
18452494 -h -x2 --threads1
18471850 -h -x2 --threads2
18480962 -hh -x1 --threads1
18483476 -hh -x1 --threads2
18483642 -hh -x1 --threads3
18483940 -h -x2 --threads3
18502496 -hh -x1 --threads4
18502850 -hh -x1 --threads7
18502892 -hh -x1 --threads5
18502922 -hh -x1 --threads6
18502994 -hh -x1 --threads8
18503142 -hh -x1 --threadsC
18503178 -hh -x1 --threads9
18503206 -hh -x1 --threadsA
18503392 -hh -x1 --threadsB
18503564 -hh -x0 --threads9
18503582 -hh -x0 --threads4
18503646 -hh -x0 --threads6
18503704 -hh -x0 --threadsA
18503742 -hh -x0 --threads3
18503744 -hh -x0 --threadsC
18503750 -hh -x0 --threads1
18503752 -hh -x0 --threads8
18503766 -hh -x0 --threads7
18503792 -hh -x0 --threads0
18503794 -hh -x0 --threads5
18503816 -hh -x0 --threadsB
18503998 -hh -x0 --threads2
18522798 -h -x2 --threads5
18524750 -h -x2 --threads6
18525096 -h -x2 --threadsA
18525114 -h -x2 --threads8
18525270 -h -x2 --threads4
18525632 -h -x2 --threads7
18528106 -h -x2 --threadsB
18529356 -h -x2 --threadsC
18529942 -h -x2 --threads9
18621858 -g -x3 --threadsC
18621870 -g -x3 --threadsB
18621874 -g -x3 --threads8
18621890 -g -x3 --threads7
18621892 -g -x3 --threads5
18621894 -g -x3 --threads9
18621894 -g -x3 --threads6
18621904 -g -x3 --threadsA
18621914 -g -x3 --threads4
18622202 -g -x3 --threads3
18622430 -g -x3 --threads2
18622590 -g -x3 --threads1
18623474 -g -x3 --threads0
18629112 -g -x2 --threads0
18639844 -g -x2 --threads1
18642336 -g -x2 --threads2
18644196 -g -x2 --threads3
18647808 -g -x1 --threads0
18648506 -g -x2 --threads7
18648642 -g -x2 --threads8
18648766 -g -x2 --threads4
18648806 -g -x2 --threads5
18648864 -g -x2 --threadsB
18648934 -g -x2 --threads9
18648940 -g -x2 --threads6
18649362 -g -x2 --threadsC
18649390 -g -x2 --threadsA
18767168 -h -x1 --threads1
18767934 -h -x1 --threads2
18768062 -h -x1 --threads3
18770870 -g -x1 --threads1
18788350 -h -x1 --threads6
18788606 -h -x1 --threads7
18788866 -h -x1 --threads8
18788920 -h -x1 --threadsC
18789020 -h -x1 --threads5
18789166 -h -x1 --threads4
18789500 -h -x1 --threads9
18789600 -h -x1 --threadsB
18789686 -h -x1 --threadsA
18795458 -h -x0 --threads9
18795510 -h -x0 --threadsB
18795530 -h -x0 --threads7
18795530 -h -x0 --threadsA
18795552 -h -x0 --threads8
18795580 -h -x0 --threadsC
18795606 -h -x0 --threads6
18795632 -h -x0 --threads5
18795684 -h -x0 --threads4
18796362 -h -x0 --threads3
18797028 -h -x0 --threads2
18797802 -h -x0 --threads1
18799708 -h -x0 --threads0
18809416 -g -x1 --threads2
18831798 -g -x1 --threads3
18892180 -g -x1 --threads4
18893526 -g -x1 --threads5
18896490 -g -x1 --threads7
18896502 -g -x1 --threads6
18897408 -g -x1 --threads8
18897970 -g -x1 --threadsA
18898090 -g -x1 --threadsC
18898372 -g -x1 --threads9
18898520 -g -x1 --threadsB
18903674 -g -x0 --threadsC
18903724 -g -x0 --threadsB
18903732 -g -x0 --threads8
18903778 -g -x0 --threads9
18903792 -g -x0 --threads7
18903794 -g -x0 --threads5
18903830 -g -x0 --threads6
18903858 -g -x0 --threads4
18903880 -g -x0 --threadsA
18904900 -g -x0 --threads3
18905554 -g -x0 --threads2
18905764 -g -x0 --threads1
18907852 -g -x0 --threads0
19217028 -f -x6 --threads3
19217420 -f -x6 --threads2
19217608 -f -x6 --threads1
19218416 -f -x6 --threads0
19226180 -f -x6 --threads5
19227064 -f -x6 --threads4
19229064 -f -x6 --threads6
19229588 -f -x6 --threads7
19231542 -f -x6 --threads8
19231618 -f -x6 --threads9
19231780 -f -x6 --threadsA
19231802 -f -x6 --threadsB
19231858 -f -x6 --threadsC
19236974 -f -x5 --threads1
19237248 -f -x5 --threads3
19237432 -f -x5 --threads0
19238492 -f -x5 --threads2
19241172 -f -x5 --threads6
19242336 -f -x5 --threads5
19242442 -f -x5 --threads4
19243552 -f -x5 --threads7
19243578 -f -x5 --threads9
19243616 -f -x5 --threadsB
19243810 -f -x5 --threads8
19244176 -f -x5 --threadsA
19244806 -f -x5 --threadsC
19245262 -f -x4 --threads3
19246692 -f -x4 --threads1
19246970 -f -x4 --threads2
19248010 -f -x4 --threads0
19249508 -f -x4 --threads6
19250700 -f -x4 --threads5
19251072 -f -x4 --threadsB
19251670 -f -x4 --threads4
19251788 -f -x4 --threads7
19252452 -f -x4 --threadsA
19252488 -f -x4 --threads9
19252728 -f -x4 --threads8
19252898 -f -x4 --threadsC
19339416 -f -x3 --threadsB
19339424 -f -x3 --threads8
19339430 -f -x3 --threadsC
19339434 -f -x3 --threads5
19339440 -f -x3 --threads6
19339448 -f -x3 --threads7
19339450 -f -x3 --threadsA
19339464 -f -x3 --threads9
19339480 -f -x3 --threads4
19339842 -f -x3 --threads3
19340032 -f -x3 --threads2
19340068 -f -x3 --threads1
19340950 -f -x3 --threads0
19356152 -f -x2 --threads0
19368958 -f -x2 --threads1
19370280 -f -x2 --threads3
19371322 -f -x2 --threads2
19374952 -f -x2 --threads4
19375570 -f -x2 --threads9
19375590 -f -x2 --threads7
19375782 -f -x2 --threadsC
19376148 -f -x2 --threads5
19376278 -f -x2 --threadsB
19376292 -f -x2 --threads8
19376314 -f -x1 --threads0
19376324 -f -x2 --threads6
19376372 -f -x2 --threadsA
19487358 -f -x1 --threads1
19488134 -f -x1 --threads2
19493332 -f -x1 --threads3
19496996 -f -x1 --threads6
19497002 -f -x1 --threads8
19497006 -f -x1 --threadsA
19497020 -f -x1 --threads7
19497022 -f -x1 --threadsB
19497024 -f -x1 --threadsC
19497032 -f -x1 --threads5
19497038 -f -x1 --threads9
19497058 -f -x1 --threads4
21372156 -f -x0 --threadsC
21372156 -f -x0 --threads8
21372218 -f -x0 --threadsB
21372298 -f -x0 --threadsA
21372326 -f -x0 --threads6
21372328 -f -x0 --threads7
21372342 -f -x0 --threads9
21372458 -f -x0 --threads4
21372488 -f -x0 --threads5
21374204 -f -x0 --threads3
21374966 -f -x0 --threads2
21375374 -f -x0 --threads1
21378356 -f -x0 --threads0
46772270 aiff

Re: WavPack 5.6.4 Has Multithreading Option

Reply #8
Thanks @Porcus  for your always thorough investigation and insights!    8)

I decided that the “free” extra mode wasn’t going to be consistent enough (your file was particularly bad) so instead I bumped up the work done for -x1 and -x2 when threading was active so they now do take proportionally longer. They’re still a little random and they won’t generate the same results as the single-threaded mode (and still vary slightly with the number of threads), but they do now seem to provide a consistent and reliable improvement with not much cost in speed. I would be interested in seeing new results with your same 364 run test!   :)

The --threads option specifies the number of worker threads to spin up. So, as you discovered, --threads=0 does nothing (I may add --no-threads if I make threading the default). However, --threads=1 creates a single worker thread in addition to the default thread, and the default thread will do work if all the worker threads are busy. For this application there is no penalty in having more threads running than physical threads because the overhead is low and it makes it easier to keep all the hardware busy (which is why having 4 extra workers provides the best benefit on my 2-core machine).
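In code terms that dispatch rule amounts to roughly the following sketch (hypothetical helpers, not the actual libwavpack logic): the main thread only hands a job off if a worker is actually idle, and otherwise just does the work itself.
Code:
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int idle_workers;                    /* workers currently waiting for a job */

extern void hand_job_to_worker(void *job);  /* hypothetical: wakes one idle worker */
extern void do_job(void *job);              /* the actual encode/decode work       */

void dispatch_job(void *job)
{
    int use_worker = 0;

    pthread_mutex_lock(&lock);
    if (idle_workers > 0) {                 /* a worker is free, let it take the job */
        idle_workers--;
        use_worker = 1;
    }
    pthread_mutex_unlock(&lock);

    if (use_worker)
        hand_job_to_worker(job);
    else
        do_job(job);                        /* all workers busy: the default thread pitches in */
}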

One nitpick. You mention that the penalty for using threads in one case (-hx) was over 2%. This startled me at first, but then I realized you were talking about percentage of the final size, not the original size like we normally use. So since you’re getting over 60% compression (this file is almost mono and not a lot of HF) we’re talking less than 1% of the original size. In any event, the difference is now just around 0.34% with that file, and of course with threading you can afford to use far higher “extra” modes than before.

Anyway, I have posted the latest version over in the thread on 32-bit compression.

Also, there's a preprint available of the "brainwave" compression paper.


Re: WavPack 5.6.4 Has Multithreading Option

Reply #9
Also, there's a preprint available of the "brainwave" compression paper.
Nice! I've seen lossless audio compression algorithms (FLAC mostly) mentioned in papers, but usually that's a few lines or even only a single mention. This paper mentions WavPack 83 times and FLAC 26 times; that is something else. This is really about the use of audio codecs for (certain) measurement signals.

Thanks for sharing
Music: sounds arranged such that they construct feelings.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #10
Yes, basically this paper is promoting the use of audio compression for this type of data. I don't know if that's what they had in mind when they started, but the results were so good they continued down that path, including the WavPack lossy investigation.

Of course, whether that means they can convince anyone else that it actually makes sense is something else entirely.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #11
I put this to a (nasty?!) test in high mode:
of course with threading you can afford to use far higher “extra” modes than before.
Confirmed on as much as 3 hours of music on a cooling-constrained laptop where the fan runs loud. I had expected that with this much work and heat the gains would wane - and some is lost, but there is still plenty up for grabs.
To my surprise, I could even run -hx6 --threads, which takes twenty minutes with the fan working hard, and it would take less time than single-threaded -hx4. And this on only two physical cores, four threads.
Results first, details to follow beneath:

1434205185   bytes in   136    seconds with -h.
1434179299   bytes in   89    seconds with -h --threads, faster and smaller
1425829951   bytes in   145    seconds with -hx --threads. Slightly larger than single-threaded and not much time saved, in line with previous result. (fb2k spent around 100 seconds multi-threading the conversion to -mhx, but that was not recompression, saving the decoding CPU load.)
I.e.: 149k saved per extra second spent, by adding the "x".
1425063487   bytes in   186    seconds with -hx.
Bigger and slower than -hx2 --threads.  But FWIW: 182k saved per extra second spent, by adding the "x". 94k over -h --threads. 19k over -hx --threads.
1424546885   bytes in   162   seconds with -hx2 --threads.
I.e.: Smaller and faster than -hx.
1423255361   bytes in   210    seconds with -hx3 --threads. That is also smaller and faster than single-threaded -hx2
I.e.: 27k saved per extra second spent over -hx2 --threads
1415436011   bytes in   633    seconds with -hx4 --threads=8. threads=8 on a 4-thread CPU ...
1415406469   bytes in   732    seconds with -hx4 --threads=3. Not utilizing the full CPU!
1415406065   bytes in   648    seconds with -hx4 --threads.
I.e.: 18k saved per extra second spent over -hx3 --threads
1415381901   bytes in   1486    seconds with -hx4. So multi-threading with two physical cores halved the time, even though the fan was running at full noise for most of the job. (Savings by going single-threaded? 29 bytes per second. Not kilobytes!)
1414927445   bytes in   920    seconds with -hx5 --threads. (Even lower single-threaded savings.)
I.e.: 1.8k saved per extra second spent over -hx4 --threads
1413957109   bytes in   1285    seconds with -hx6 --threads. Also shorter time than -hx4.
I.e.: 2.8k saved per extra second spent over -hx5 --threads. (As single-threaded -hx5 took over half an hour, I didn't bother to run -hx6 single-threaded.)


So my "complaints" about multi-threading increasing size are not at all supported by this test:
Even if we disregard that single-threaded -hx is both bigger and slower than -hx2 --threads, and just look at the impact of dropping "--threads": the savings from going single-threaded are around 19k per extra second spent.  Now if you think that is good value for money, then go straight to -hx4 --threads for more savings.  Actually, the "last" and least valuable step up, from -hx3 --threads to -hx4 --threads, paid off 18k per second, which is so nearly tied to the 19 that ... well. 

Also I am surprised that the benefits from multi-threading are that large at the heavy end.  I would have expected the cooling constraint to kick in much harder when running for twenty minutes.
For those who wonder why - shouldn't four threads be faster than one?
The CPU is throttled down to get rid of the heat produced by the work done, and that work doesn't get less by being divided into two threads running simultaneously. If performance is limited by how much heat the CPU can dissipate, then the number of threads wouldn't matter. On -hx<big> --threads, the fan starts running during track 1 and hits full noise after "a couple of tracks". So 26 of them is enough to kill some of the speed gains. But far from all! Surprisingly far from, I'd say.

More details:
This was done on my retired Dell (business) laptop, equipped with an i5-6300U CPU, two cores and four threads. Not your water-cooled gaming rig, eh?

Corpus: this (free!) sampler. https://gizehrecords.bandcamp.com/album/anthology-a-free-label-sampler . Chosen because it is as big as four ordinary CDs, and sampler --> some diversity of signals (37 percent is 24-bit, says fb2k).

I chose "high" mode over "very high" because ... hm, the following considerations might not fare well if I tested them, but what I was thinking was that those who truly want -hhx6 are more likely to want the smallest files, while those who choose -h might rather want to listen to whether it is worth it.
As for x4, it is so slow that I won't insist it is every person's choice on CDDA, but x4 has been shown to work well on high resolution. That going -hx4 --threads (over 3!) would make for about as much bang for the buck as dropping multi-threading from -hx --threads just turned up after the fact.


Re: WavPack 5.6.4 Has Multithreading Option

Reply #12
DSD multithreaded.
Back on my usual computer, which is fanless and has unreliably varying timings, but maybe it still gives some information.
i5-1135G7, 4 cores and 8 threads.

Five files in two pieces of music tested.
DSD64, DSD128 and DSD256 of Carmen Gomes from https://www.soundliaison.com/index.php/6-compare-formats (all "blocks not padded with NULLs, MD5 will not match!")
DSD64 stereo and DSD64 5.1 of "Crossing" from https://www.oppodigital.com/hra/dsd-by-davidelias.aspx .

I ran both normal and high mode, --threads=0 through --threads=9 (yes, "9" on an 8-thread CPU), SSD to same SSD. Command:
for %l IN (g,h) DO (for /l %n IN (0,1,9) DO (for /r %f IN (*.dsf) DO \bin\wavpack-5.6.6-win64\wavpack.exe -%l --threads=%n "%f" -o "%~nf-%l--threads=%n.wv"))

Observation: Files often came out identical.
* All -g (normal mode) files came out as ten identical ones.
* So did the multi-channel "Crossing" in high mode.
* The other four originals came out as ten distinct files each in -h mode. The "Crossing" varied the most in size, 2.5 percent (not percentage points!), with some multi-threading smallest; the Carmen Gomes ones varied by around half a percent, with single-threading smallest.

Output copied and sorted appropriately follows. Likely there are timing variations that are due to how hard the previous compression task was, and how that affected throttling, but anyway: with -h there is quite a difference between --threads=0 and --threads=1.
Code:
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=0.wv in 1.95 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=1.wv in 1.53 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=2.wv in 1.26 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=3.wv in 1.26 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=4.wv in 1.07 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=5.wv in 0.98 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=6.wv in 0.98 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=7.wv in 0.99 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=8.wv in 1.01 secs (lossless, 40.30%)
created a-fool-for-you-carmen-gomes-inc-dsd64-g--threads=9.wv in 0.90 secs (lossless, 40.30%)

created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=0.wv in 10.43 secs (lossless, 53.58%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=1.wv in 6.63 secs (lossless, 53.53%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=2.wv in 5.49 secs (lossless, 53.50%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=3.wv in 5.18 secs (lossless, 53.50%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=4.wv in 4.43 secs (lossless, 53.39%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=5.wv in 4.07 secs (lossless, 53.45%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=6.wv in 3.82 secs (lossless, 53.43%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=7.wv in 3.66 secs (lossless, 53.44%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=8.wv in 3.48 secs (lossless, 53.41%)
created a-fool-for-you-carmen-gomes-inc-dsd64-h--threads=9.wv in 3.38 secs (lossless, 53.34%)

created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=0.wv in 3.65 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=1.wv in 3.03 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=2.wv in 2.53 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=3.wv in 2.44 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=4.wv in 2.13 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=5.wv in 1.96 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=6.wv in 1.88 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=7.wv in 1.93 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=8.wv in 1.96 secs (lossless, 40.38%)
created a-fool-for-you-carmen-gomes-inc-dsd128-g--threads=9.wv in 1.83 secs (lossless, 40.38%)

created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=0.wv in 20.51 secs (lossless, 55.34%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=1.wv in 12.93 secs (lossless, 55.28%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=2.wv in 11.21 secs (lossless, 55.25%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=3.wv in 9.49 secs (lossless, 55.23%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=4.wv in 8.67 secs (lossless, 55.13%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=5.wv in 7.99 secs (lossless, 55.15%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=6.wv in 7.55 secs (lossless, 55.14%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=7.wv in 7.20 secs (lossless, 55.14%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=8.wv in 6.79 secs (lossless, 55.12%)
created a-fool-for-you-carmen-gomes-inc-dsd128-h--threads=9.wv in 6.61 secs (lossless, 55.09%)

created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=0.wv in 7.06 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=1.wv in 5.38 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=2.wv in 4.66 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=3.wv in 4.54 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=4.wv in 3.90 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=5.wv in 3.58 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=6.wv in 3.44 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=7.wv in 3.52 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=8.wv in 3.51 secs (lossless, 40.41%)
created a-fool-for-you-carmen-gomes-inc-dsd256-g--threads=9.wv in 3.30 secs (lossless, 40.41%)

created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=0.wv in 42.80 secs (lossless, 57.13%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=1.wv in 25.30 secs (lossless, 57.10%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=2.wv in 20.55 secs (lossless, 57.07%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=3.wv in 18.63 secs (lossless, 57.05%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=4.wv in 16.88 secs (lossless, 56.98%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=5.wv in 15.59 secs (lossless, 56.98%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=6.wv in 14.89 secs (lossless, 56.96%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=7.wv in 14.00 secs (lossless, 56.96%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=8.wv in 13.35 secs (lossless, 56.95%)
created a-fool-for-you-carmen-gomes-inc-dsd256-h--threads=9.wv in 12.87 secs (lossless, 56.97%)


created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=0.wv in 3.02 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=1.wv in 2.43 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=2.wv in 2.05 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=3.wv in 1.93 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=4.wv in 1.79 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=5.wv in 1.58 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=6.wv in 1.53 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=7.wv in 1.49 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=8.wv in 1.57 secs (lossless, 31.32%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-g--threads=9.wv in 1.62 secs (lossless, 31.32%)

created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=0.wv in 17.52 secs (lossless, 47.92%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=1.wv in 10.49 secs (lossless, 47.78%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=2.wv in 8.75 secs (lossless, 47.99%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=3.wv in 7.83 secs (lossless, 47.77%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=4.wv in 7.37 secs (lossless, 48.47%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=5.wv in 6.47 secs (lossless, 47.13%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=6.wv in 6.17 secs (lossless, 48.46%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=7.wv in 5.82 secs (lossless, 48.46%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=8.wv in 5.58 secs (lossless, 48.46%)
created 08 - David Elias - Crossing - Morning Light Western Town (DSD64 2.0)-h--threads=9.wv in 5.34 secs (lossless, 48.46%)

created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=0.wv in 6.95 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=1.wv in 6.93 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=2.wv in 4.24 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=3.wv in 4.26 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=4.wv in 4.11 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=5.wv in 3.99 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=6.wv in 4.19 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=7.wv in 4.12 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=8.wv in 4.16 secs (lossless, 37.77%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-g--threads=9.wv in 4.11 secs (lossless, 37.77%)

created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=0.wv in 42.55 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=1.wv in 35.97 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=2.wv in 22.17 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=3.wv in 20.55 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=4.wv in 23.55 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=5.wv in 21.83 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=6.wv in 20.54 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=7.wv in 19.37 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=8.wv in 19.96 secs (lossless, 52.30%)
created 09 - David Elias - Crossing - Morning Light Western Town (DSD64 MCH)-h--threads=9.wv in 20.03 secs (lossless, 52.30%)
Overall compression figures:
46.2%, i.e. 53.8% saved, at -h --threads=0
61.6%, i.e. 38.4% saved, at -g
69.6%, i.e. 30.4% saved, for NTFS' "LZX" file compression - I should add, the files did not have any NTFS compression when I WavPack'ed them.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #13
Sorry I'm late to this party.

@bryant I'm glad Neural Dynamics got your developing juices going and it seems you had fun there, great work as usual, man!
On a side note, I use the hybrid mode for stereo CD ripping... :'(

@Porcus Did you use v5.6.6 or 5.6.4 (it shouldn't make a difference, right?) for your last test, please?
EDIT: Sorry I didn't read your posts with the right attention, it's in the command line (latest version used).
WavPack 5.7.0 -b384hx6cmv / qaac64 2.80 -V 100

Re: WavPack 5.6.4 Has Multithreading Option

Reply #14
5.6.6 in both replies #11 and #12. I realize now I lost that information while editing it all down ... I mean, I am all too talkative for anyone to want to read everything, so I see I edited a bit too tightly. Though the command line I pasted in #12 did give away that I used 5.6.6.

Never tested whether it made any difference to 5.6.4, which I used in reply #7.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #15
Thanks @DARcode !

I’m still looking at multithreaded hybrid mode, but it probably won’t make it into this release (I need to make sure it could never introduce an audible artifact, which is much easier to do with lossless mode!)   :D

Thanks again @Porcus for your thorough testing!

Pretty much everything is about what I expected (with a few caveats). Specifically:

  • The reason that the higher “x” modes benefit more from the multithreading is that there’s basically a fixed overhead hit from losing the continuity, which means it counts less and less with higher modes. And I think I’ve basically gotten rid of the nasty case you found with 5.6.4 where -x1 was lackluster (it’s basically -x2 now, and -x2 got bumped too).
  • The relatively poor performance improvement on DSD “normal” mode comes mostly from the fact that the parallelized part is so fast that the other bookkeeping parts become significant. In particular, the MD5 hash consumes almost 30% of the time at --threads=12 (and that, of course, can’t be parallelized). That mode is so fast anyway that I’m happy with the moderate improvement, and it is expected that the results are always identical (that mode uses no prior context).
  • I was surprised to see your result where DSD “high” compression improved with increasing threads (I only saw degradation). This indicates that, at least in cases like that, the compression could be improved with a single thread too. Something to look into when I run out of projects.  :D
  • Finally, with multichannel the technique is completely different (spatial instead of temporal) and there's a hard limit on the number of workers created. In fact, for that file everything beyond --threads=2 is ignored (it’s missing the LFE, so only 3 streams), so all the variation in the timings beyond that is just run-to-run variation. The modest improvement with multithreaded multichannel is somewhat compensated for by the benefit that it always generates identical results (relative to single-threaded) and works in more modes (e.g., hybrid lossy, which is used in the brain wave stuff).

Re: WavPack 5.6.4 Has Multithreading Option

Reply #16
Just a thought:

So, as you discovered, --threads=0 does nothing (I may add --no-threads if I make threading the default). However, --threads=1 creates a single worker thread in addition to the default thread, and the default thread will do work if all the worker threads are busy.
You could consider a shift of one here, so that --threads=1 does what in this beta would be --threads=0, etc.
One reason for that - apart from aligning to what TAK does (defaulting to -tn 1 for single-threading) and what ktf later has proposed for FLAC (-j1):
It would free --threads=0 for something else, and what I was thinking of is "let encoder decide". That would give users an option to nullify a previous option that is perhaps set by an alias command (like one can now do for -h -g), in favour of whatever the default ends up being after a couple of tweaks.

So that suggestion (/"loose idea"/whatever) will amount to
--threads=0 : Application decides. User must expect behaviour to change in the future. Maybe in 5.7.0, --threads=0 will be synonymous with single-threading, and in 5.9.0 it will be synonymous with --threads; it leaves you with that flexibility. Also, nothing says it would be the same over all uses; maybe wvunpack -vv could default to a different number of threads than encoding.
--threads (sans argument): A default number of threads, >1. If the manual says it defaults to 4, then you might be more conservative about changing this than about changing the --threads=0 behaviour, as --threads=0 is for users who trust defaults, while forcing a particular option is for users who expect this or that particular behaviour.
--threads=1: single-threaded. User does not want multi-threading.
--threads=<N>, for N>1: invoke N threads in total, like current --threads=<N-1>
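To make that concrete, the mapping from option value to worker count could look something like the sketch below. This is only an illustration of the proposal, not anything WavPack actually implements, and DEFAULT_WORKERS is a placeholder for whatever the tuned default ends up being.
Code:
#define DEFAULT_WORKERS 4   /* placeholder for the tuned default */

/* Proposed semantics:
 *   --threads  (no argument; represented as -1 here)  -> default multithreading, >1 threads total
 *   --threads=0                                        -> application decides, may change later
 *   --threads=1                                        -> single-threaded, no workers
 *   --threads=N, N>1                                   -> N threads total, i.e. N-1 workers
 */
int workers_from_threads_option(int threads_arg)
{
    if (threads_arg < 0)        /* --threads with no value */
        return DEFAULT_WORKERS;
    if (threads_arg == 0)       /* "let the encoder decide"; free to change between versions */
        return DEFAULT_WORKERS;
    return threads_arg - 1;     /* N total threads = the main thread plus N-1 workers */
}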

Re: WavPack 5.6.4 Has Multithreading Option

Reply #17
Thanks Porcus, yeah I pretty much agree with all of this. I definitely will fix the "off by one" issue (although I'm not sure I'll change the max to 13)...   :)

The only thing I don't like is having --threads=0 mean "use threads"; that's just too unintuitive for me. I'll either have --threads=0 do the same thing as --threads=1, or maybe even error (after all, it makes no more sense than --threads=-1, but I'm leaning toward 0 and 1 being the same).

And --threads (no arg) can mean "use multiple threads as guessed optimum" which may in the future include querying the OS for available threads (so I might use 8 on a 16-thread machine like mine). Only issue there is portability, but I do have code that works on Linux and Windows. Currently I'll make it 5 (4 workers) because that seems to be working reasonably well as a compromise (and handles up to 7.1).
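For what it's worth, the usual portable way to ask the OS for that count is something like the snippet below (a common idiom, not necessarily the code referred to above):
Code:
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Return the number of logical processors the OS reports, or 1 if unknown. */
int available_hardware_threads(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return (int) info.dwNumberOfProcessors;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int) n : 1;
#endif
}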
 

Re: WavPack 5.6.4 Has Multithreading Option

Reply #18
The only thing I don't like is having --threads=0 mean "use threads"; that's just too unintuitive for me. I'll either have --threads=0 do the same thing as --threads=1, or maybe even error (after all, it makes no more sense than --threads=-1, but I'm leaning toward 0 and 1 being the same).

Most programs I know (FFmpeg, zstd...) treat "0" as "let the program decide the optimal value". I find it way more intuitive than "0" being the same as "1".
Opus 96 kb/s (Android) / Vorbis -q5 (PC) / WavPack -hhx6m (Archive)

Re: WavPack 5.6.4 Has Multithreading Option

Reply #19
Most programs I know (FFmpeg, zstd...) treat "0" as "let the program decide the optimal value". I find it way more intuitive than "0" being the same as "1".
Interesting! I guess the only program I've used regularly with configurable multithreading is "make" and that treats -j0 as an error (and with no argument means "infinite jobs", which I'm not sure I believe).

But I agree that having "0" and "1" do the same thing is goofy. I'll come up with something reasonable.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #20
The only thing I don't like is having --threads=0 mean "use threads"
I actually intended it to mean "use the default, whatever that is", and for the next release, that is likely to be single-threaded I guess?
And, I intended to suggest, --threads sans argument would mean "do multithreading, select some number >1".

(But, another point (that ktf wasn't too enthusiastic about): if called to encode with a wildcard, say *.wav, one could consider sending each file off to a new thread, rather than multithreading within single files. If so, that probably calls for an option to invoke or switch off.)

Currently I'll make it 5 (4 workers) because that seems to be working reasonably well as a compromise (and handles up to 7.1).
Out of utter curiosity: If multithreading several channels, you then pass "stereo pairs" to a thread? FL+FR to one thread, BL+BR to another, SL+SR to the third and FC and LFE (dual mono) to the fourth? That didn't work too well for FLAC, but that was in 2.0 and not 7.1, and also FLAC has fairly short (default) block sizes which would then lead to frequent spawns and all that overhead.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #21
(But, another point (that ktf wasn't too enthusiastic about): if called to encode with a wildcard, say *.wav, one could consider sending each file off to a new thread, rather than multithreading within single files. If so, that probably calls for an option to invoke or switch off.)
Doing the parallelization in the command-line programs would obviously be better than how I’ve done it (in libwavpack) with respect to performance and what modes can and cannot work the same.  And it would actually be easier in some ways.

The big issue is how does one handle this in the UI? Can we display the progress of all the simultaneous conversions, or just for the total job? Error handling might become a lot more complicated too. I’m sure this is what ktf was thinking about; the parallelization part is easy but the interaction with the display becomes a cross-platform nightmare.

Quote
Currently I'll make it 5 (4 workers) because that seems to be working reasonably well as a compromise (and handles up to 7.1).
Out of utter curiosity: If multithreading several channels, you then pass "stereo pairs" to a thread? FL+FR to one thread, BL+BR to another, SL+SR to the third and FC and LFE (dual mono) to the fourth? That didn't work too well for FLAC, but that was in 2.0 and not 7.1, and also FLAC has fairly short (default) block sizes which would then lead to frequent spawns and all that overhead.
Yes, how you describe it is almost correct, except that the FC and LFE channels are mono streams and are therefore run on separate threads, for a total of 5 threads for 7.1 and 4 threads for 5.1.
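So the thread count follows directly from how the channels fall into stereo pairs plus leftover monos; as a toy illustration of that arithmetic (not WavPack's channel-assignment code):
Code:
/* Each stereo pair is one stream, each unpaired channel (FC, LFE) is a mono
   stream, and each stream can be handed to its own thread. */
int stream_count(int paired_channels, int mono_channels)
{
    return paired_channels / 2 + mono_channels;
}

/* 7.1: FL+FR, BL+BR, SL+SR paired (6 channels) + FC + LFE  -> stream_count(6, 2) == 5
   5.1: FL+FR, BL+BR paired (4 channels) + FC + LFE         -> stream_count(4, 2) == 4 */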

This works well with WavPack, although it does help to not use short buffers for the reason you mention (and I don’t spawn the threads each call because I found that on Windows that was too much overhead, so I leave the threads waiting on a condition variable when the library returns to the caller).
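The keep-the-workers-alive part is the classic persistent-worker pattern; a stripped-down POSIX sketch of the idea follows (hypothetical types, not the actual libwavpack implementation).
Code:
#include <pthread.h>

/* Hypothetical job/worker types. */
typedef struct {
    void (*run)(void *);            /* the per-stream encode/decode work */
    void *arg;
} job_t;

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    job_t *job;                     /* NULL while the worker is idle */
    int quit;
} worker_t;

/* Persistent worker: created once, then left waiting on a condition variable
   between library calls instead of being respawned for every call (per-call
   thread creation being too expensive, at least on Windows). */
static void *worker_loop(void *arg)
{
    worker_t *w = (worker_t *) arg;

    pthread_mutex_lock(&w->mutex);
    while (!w->quit) {
        if (!w->job) {
            pthread_cond_wait(&w->cond, &w->mutex);     /* sleep until given work */
            continue;
        }

        job_t *job = w->job;
        pthread_mutex_unlock(&w->mutex);

        job->run(job->arg);                             /* do the actual work unlocked */

        pthread_mutex_lock(&w->mutex);
        w->job = NULL;
        pthread_cond_signal(&w->cond);                  /* tell the caller this worker is idle */
    }
    pthread_mutex_unlock(&w->mutex);
    return NULL;
}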
 

Re: WavPack 5.6.4 Has Multithreading Option

Reply #22
So "5 (4 workers)" covering 7.1 means the mono streams are done so quickly that one thread is done in half time and respawned to take the other? Or was it "6 (5 workers)" for workers 1:front, 2:back, 3:side, 4:FC 5:LFE? Anyway, above my paygrade ... and the good thing about a "default" specified without a fixed numerical argument, is that it could be determined by encoder after having read the channel configuration. Could spawn the fifth worker if and only if the channel configuration suggests it.

As for UI reporting, and not wanting to navigate up lines in the console, it depends on how much you want the intermediate output to look like the usual single-threaded output, or whether you are fine with the final screen looking more like it. In the latter case you could make it as easy as dropping intermediate percentages, starting with a Multithreading several files (up to 4 at a time), detailed progress not shown and after the "final report" for the files done, keep a line saying say 9 files successfully converted, 1 failed, 8 to go (4 threads active) - once another file is done, that line can be deleted, the appropriate output for that file printed, and then the next line 10 files successfully converted, 1 failed, 7 to go (4 threads active).

Anyway, insignificant details thought aloud.

Re: WavPack 5.6.4 Has Multithreading Option

Reply #23
So "5 (4 workers)" covering 7.1 means the mono streams are done so quickly that one thread is done in half time and respawned to take the other? Or was it "6 (5 workers)" for workers 1:front, 2:back, 3:side, 4:FC 5:LFE? Anyway, above my paygrade ... and the good thing about a "default" specified without a fixed numerical argument, is that it could be determined by encoder after having read the channel configuration. Could spawn the fifth worker if and only if the channel configuration suggests it.
I think the confusing thing here is that I should stop talking about "workers", even though that's what the code spins up to offload jobs to, because when the "main" thread runs out of available "workers" it just does the work itself (unless there are many workers, in which case the bookkeeping is enough to keep it busy). So when I said "5 (4 workers)" I mean 5 available threads for doing work, and I should just keep the fine distinctions to myself (which caused the original confusion with --threads=1). And yes, the FC and LFE threads finish sooner and just go idle. And I already limit the number of workers spun up in the multichannel case to the maximum that could ever be used (in the single-stream case the number of threads used is determined by the size of the buffered operations requested and I have no way of knowing that in advance).
Quote
As for UI reporting, and not wanting to navigate up lines in the console, it depends on how much you want the intermediate output to look like the usual single-threaded output, or whether you are fine with the final screen looking more like it. In the latter case you could make it as easy as dropping intermediate percentages, starting with a Multithreading several files (up to 4 at a time), detailed progress not shown and after the "final report" for the files done, keep a line saying say 9 files successfully converted, 1 failed, 8 to go (4 threads active) - once another file is done, that line can be deleted, the appropriate output for that file printed, and then the next line 10 files successfully converted, 1 failed, 7 to go (4 threads active).
The funny thing is that doing whole directories of files from the command-line is something that I actually do often (and hybrid mode!), so this is something that I would actually use all the time. Even so, it still seems like a nightmare...   :D

Re: WavPack 5.6.4 Has Multithreading Option

Reply #24
Haha. What I was actually curious about when I asked was whether FC and LFE (being mono, so no question about mid+side) would run fast enough to also accommodate the "housekeeping". I didn't ask because ... well, the user doesn't need to know every "why this order" - but for that to be on the table at all, one would have to be in the situation where WavPack could "allocate threads to channel pairs", which seemed not to work as well in practice for reference flac, hence that question.
(For stereo-only as far as FLAC goes; I only recently found out the format does not support multi-channel decorrelation - not even the FL+FR part. Frankly I'm surprised that it keeps up with WavPack pretty much just as well on 5.1 as on stereo.)


As for the latter ... how often do I call a command line to (re)compress without using the "*"? I guess that is only when I download a single test file.
(For users who don't invoke command-lines at all, there is discussion at https://hydrogenaud.io/index.php/topic,124437.msg1030042.html#msg1030042 and Case's post referenced: a player like fb2k will pass audio to be processed in one thread per output file (queueing it until a thread becomes vacant), so one should think twice before making multi-threading a default.)