
Topic: More multithreading

Re: More multithreading

Reply #150
The FLAC format is unfit for multithreaded decoding. That is because reliably finding the next frame involves parsing the current one.
My usual uneducated question: why can't one parse frame 1, send it to thread 1, parse frame 2, send it to thread 2, etc.?

Re: More multithreading

Reply #151
That is possible of course, but of little use. Most of the time is spent parsing; the actual decoding takes only a tiny amount of time by comparison. The decoding threads would therefore idle a lot, and sleeping/waking a thread comes with considerable overhead.
Music: sounds arranged such that they construct feelings.

Re: More multithreading

Reply #152
Decoding speed is more important than encoding speed in most cases. Computation-heavy workloads are where multithreading pays off most, but even at a very low processing load we can benefit from it.

Intel Core i7-3770K (4 cores, 8 threads), 16 GB RAM, 240 GB SSD
The SSD in this test system is really slow (a 2012 model), so it is a significant bottleneck. Decode times below are in seconds.
Code: [Select]
HALAC Normal Decoding
1,245,704,379 bytes -> 1,857,654,566 bytes
HALAC Normal mt=1 : 10.074
HALAC Normal mt=2 :  6.590
HALAC Normal mt=4 :  4.914
HALAC Normal mt=8 :  4.303
-------------------
HALAC Fast Decoding
1,305,406,815 bytes -> 1,857,654,566 bytes
HALAC Fast mt=1 : 9.096
HALAC Fast mt=2 : 6.323
HALAC Fast mt=4 : 4.558
HALAC Fast mt=8 : 3.674
----------------------
FLAC Decoding
1,318,502,972 bytes -> 1,857,654,566 bytes
FLAC -0 : 14.307

Re: More multithreading

Reply #153
Decoding speed is more important than Encoding speed in most cases.
Can you name a few of those cases?

As I see it, encoded audio is usually decoded in real time for playback. On any modern desktop/laptop CPU, FLAC already reaches 1000x playback speed, including MD5 calculation. FLAC playback has happened on battery-operated devices for 20 years already, and it is often faster than decoding of lossy audio, see https://www.rockbox.org/wiki/CodecPerformanceComparison

The only audio-related use-case I can think of is verification of stored files, and in that case MD5 calculation is the bottleneck already. There are some non-audio use cases as described here; those would indeed benefit from faster decoding when MD5 is disabled.

So sure, FLAC decoding can be made faster and some people would benefit. But I wouldn't say it is more important than encoding speed in most cases, as it is already very, very fast.
Music: sounds arranged such that they construct feelings.

Re: More multithreading

Reply #154
For perspective: decoding CDDA at 1000x-ish speed means a maximal-size 4 GB WAVE file would run in half a minute-ish. (And it seems to me that 4 GB full of 24-bit content decodes faster than 16-bit content - a big file is not unlikely to be higher resolution.)
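Back-of-envelope: CDDA is 176,400 bytes per second, so a 4 GiB WAVE (the format's maximum) holds 4,294,967,296 / 176,400 ≈ 24,350 seconds of 16/44.1 audio - at 1000x that decodes in roughly 24 seconds.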
Why bother about one file? Because if you are decoding (or verifying) several files, you can use an application that spawns one thread per file and respawns a new one as each file finishes.

The only audio-related use-case I can think of is verification of stored files, and in that case MD5 calculation is the bottleneck already.
First, if faster verification is an objective, one could do what WavPack/Monkey's/OptimFROG can: implement some flac -t --no-md5 that merely runs through the frame checksums without decoding.
Whether that is worth it given FLAC's decoding speeds is a different question. Same for offloading MD5 to a different thread ...

But there are other audio-related uses. One is simple processing like ReplayGain scanning.
(Disregarding the fact that true-peak scan could be more intensive.)

Rewind to 2007-ish, when HDD costs would lead some users to choose Monkey's for their CD rips. If you configured EAC to rip to Monkey's (hacking around with Case's wapet for APEv2 tagging) and then ran an RG scan that decoded the .ape file, that would take quite some time. Of course it isn't much of an issue if you can run it overnight.
But the following more cumbersome procedure would cost less CPU: configure EAC to rip to FLAC -0 with metaflac computing ReplayGain and tagging it (recall, WAVE tagging was a big meh in that age!); then convert to Monkey's with tag transfer.
That works because one FLAC encoding and two decodings (one for RG and one for the conversion) would be cheaper on the CPU than a single Monkey's decoding. According to your tests back in 2009, encoding + 2x decoding FLAC -0 would go at 70x realtime on your hardware back then - faster than a single Monkey's Fast decode, which could take a minute for a CD. (Yet another disclaimer: not saying that is an awful wait either, when you are in a ripping process that takes longer anyway while you are changing CDs.)
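(Quick arithmetic: a 74-minute CD is 4,440 seconds of audio, and 4,440 / 70 ≈ 63 seconds for the whole encode-plus-two-decodes - against a minute or more for the Monkey's Fast decode alone.)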

Re: More multithreading

Reply #155
That is possible of course, but of little use. Most of the time is spent parsing; the actual decoding takes only a tiny amount of time by comparison. The decoding threads would therefore idle a lot, and sleeping/waking a thread comes with considerable overhead.
"Encode once, decode all the time". The principle is an important discourse in data compression. "Is the number of people who produce music(encode) too much or the number of people who listen(decode) too much?" The answer to his question supports this. Of course, in general data compression, there are also places where encode speed is important, and there are also places where decode speed is important. We can reach them as a result of a short search. However, the general opinion is that the decode speed is more important. Besides, it's not my idea. It is the general opinion of some authorities and the sector.
F_Score (universal score) = C + 2 * D + (S + F) / 10⁶

In addition, hashing can run independently at both the encode and the decode stage. But of course, if a codec has inherent dependencies, multithreading will not be effective. In the case of audio compression/decompression, we know that FLAC is fast enough; this is one of the biggest reasons for its wide adoption. What I have mentioned and shown here is that, contrary to what you said, the decode phase of audio data can also be performed quite efficiently with multiple threads. Just because something is fast doesn't mean it can't get faster. And faster never hurts.

Re: More multithreading

Reply #156
"Encode once, decode all the time".
Sure. I've used that argument myself. So I am kinda surprised to see a score formula that puts only twice as much weight on decoding as encoding - that doesn't translate well to "all the time".

Also, wall time and CPU time are not the same. If you are in for efficiency, then don't produce overhead.

Re: More multithreading

Reply #157
"Encode once, decode all the time".
But this doesn't explain why multithreaded decoding would be beneficial. When working with documents, images, etc., the file is decoded at once in its entirety, and there multithreading makes sense, because faster is better. But audio and video are usually decoded at playback speed, and decoding any faster than that only has lower CPU usage as a benefit. In that case, multithreading doesn't lower total CPU usage, but increases it.

Additionally, decoding is far more vulnerable to security bugs than encoding is, so I'd like to keep that code simple where possible. Besides, if this were a competition, there is no way FLAC would win: it is a 25-year-old format based on patent-free techniques. In effect, FLAC is working with 50-year-old techniques like Rice coding. Of course it doesn't stand a chance against techniques like ANS when looking only at speed and compression ratio.

So I'd prefer FLAC to compress well and be fast and stable. There are many codecs that compress better and there are faster codecs, but FLAC is an open, patent-free, very well documented standard; it has a lot of independent decoder implementations and several independent encoder implementations, compresses reasonably well (within 10% of state-of-the-art), has a stable, fast and open-source reference implementation, and is supported by a lot of hardware.
Music: sounds arranged such that they construct feelings.

Re: More multithreading

Reply #158
When working with documents, images, etc., the file is decoded at once in its entirety, and there multithreading makes sense, because faster is better. But audio and video are usually decoded at playback speed, and decoding any faster than that only has lower CPU usage as a benefit.
You're right about that. I'm looking at the matter more generally, in terms of data compression as a whole. That is also partly a carry-over from my earlier work on other kinds of data.

Besides, if this were a competition, there is no way FLAC would win: it is a 25-year-old format based on patent-free techniques. In effect, FLAC is working with 50-year-old techniques like Rice coding. Of course it doesn't stand a chance against techniques like ANS when looking only at speed and compression ratio.
Even if Rice coding is old (remember Solomon W. Golomb, with respect), I think it is close to ideal for audio compression. When used correctly, it can work more efficiently than Huffman, ANS and even AC (arithmetic coding). I have already said that, according to my tests, the compression ratio of Rice coding on audio data is better than that of ANS - in certain cases even in image compression. So if I can also solve the speed issue, I may use a custom Rice-derived coding scheme in the coming period.

So I'd prefer FLAC to compress well and be fast and stable. There are many codecs that compress better and there are faster codecs, but FLAC is an open, patent-free, very well documented standard; it has a lot of independent decoder implementations and several independent encoder implementations, compresses reasonably well (within 10% of state-of-the-art), has a stable, fast and open-source reference implementation, and is supported by a lot of hardware.
I have always liked and used FLAC. But this interpretation leads to the conclusion that there is no need to develop anything new and that it is not worth spending time on. Maybe that's the truth.

Re: More multithreading

Reply #159
So I'd prefer FLAC to compress well and be fast and stable. There are many codecs that compress better and there are faster codecs, but FLAC is an open, patent-free, very well documented standard; it has a lot of independent decoder implementations and several independent encoder implementations, compresses reasonably well (within 10% of state-of-the-art), has a stable, fast and open-source reference implementation, and is supported by a lot of hardware.
I have always liked and used FLAC. But this interpretation leads to the conclusion that there is no need to develop anything new and that it is not worth spending time on. Maybe that's the truth.
That is stretching it too far. I just stated a list of goals for FLAC and its strong points, nothing else. I listed them because I think developing multithreaded decoding provides little gain and might conflict with the goal of being stable. I'm not saying FLAC should get no more additions, and I am certainly not saying new codecs have no use.
Music: sounds arranged such that they construct feelings.

Re: More multithreading

Reply #160
I guess FLAC decoding is not multithreaded yet.
The FLAC format is unfit for multithreaded decoding. That is because reliably finding the next frame involves parsing the current one. One could offload MD5 calculation to a separate thread, and maybe parsing and decoding could be done in separate threads, but I currently don't really see a way to make more than 4 threads have any benefit at all, and even then the workload would be very uneven.
I agree that the juice probably isn't worth the squeeze wrt an MT decoder, but it isn't that dire. A seektable does provide a reliable way to chunk up the input. Even without a seektable, a decoder could use things like the max frame size (if available) to hand each thread a reasonable chunk of sequential frames and minimise overhead - or just use 1 MiB chunks, or even split the file into n parts for n threads. An MT decoder could be proof-of-concepted similarly to the hack job I did with flaccid, which I might do at some point just to answer some of these questions.

I doubt there's much benefit to be had unless the file is already in RAM, and MD5 would have to be ignored, as that just kills the whole idea - unless capping out at some double-digit speedup is the goal. It does rather mess up memory accesses.
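To make the chunking idea concrete, here is a bare-bones sketch in C with pthreads. decode_frames_in_range() is a made-up placeholder, not libFLAC API, and MD5 is left out entirely for the reason above - the running hash needs the decoded samples in stream order, which would serialize the workers again.
Code: [Select]
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const unsigned char *data;  /* whole file already read into RAM */
    size_t start, end;          /* byte range this worker owns */
} chunk_t;

/* Hypothetical: resync on the frame sync code at or after `start`,
   then parse+decode every frame that begins before `end`. */
static void decode_frames_in_range(const unsigned char *data,
                                   size_t start, size_t end)
{
    (void)data; (void)start; (void)end;
    /* ...actual frame parsing/decoding would go here... */
}

static void *worker(void *arg)
{
    chunk_t *c = arg;
    decode_frames_in_range(c->data, c->start, c->end);
    return NULL;
}

int main(void)
{
    enum { NTHREADS = 4 };
    size_t file_size = (size_t)100 << 20;     /* pretend: 100 MiB file */
    unsigned char *data = calloc(file_size, 1);
    pthread_t tid[NTHREADS];
    chunk_t chunk[NTHREADS];

    /* Naive equal split into n parts for n threads. A real splitter
       would cut at seekpoints (or use the max frame size from
       STREAMINFO) so every cut lands on a frame boundary and no
       frame gets decoded twice. */
    for (int i = 0; i < NTHREADS; i++) {
        chunk[i].data  = data;
        chunk[i].start = file_size * (size_t)i / NTHREADS;
        chunk[i].end   = file_size * (size_t)(i + 1) / NTHREADS;
        pthread_create(&tid[i], NULL, worker, &chunk[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    printf("all %d workers done\n", NTHREADS);
    free(data);
    return 0;
}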

Re: More multithreading

Reply #161
I guess FLAC decoding is not multithreaded yet.
The FLAC format is unfit for multithreaded decoding. That is because reliably finding the next frame involves parsing the current one. One could offload MD5 calculation to a separate thread, and maybe parsing and decoding could be done in separate threads, but I currently don't really see a way to make more than 4 threads have any benefit at all, and even then the workload would be very uneven.

As pointed out here, ffmpeg does some degree of multithreading on decoding - and that goes for FLAC too. I am running a longer test on a different computer with a newer ffmpeg, but on this laptop with ffmpeg 5.1.1, it seems the fastest decodes of "one CD image" (73 minutes, tagless, decoded to NUL; see the example command after the list) are, in order:
* ffmpeg decoding FLAC
* ffmpeg decoding TAK
* ffmpeg decoding TTA
* ffmpeg decoding WavPack -f / -g and ALAC, and flac.exe 1.4.2 on a dual-mono file (which has no MD5)
* ffmpeg on WavPack -h
* flac.exe and wvunpack.exe --threads (5.7.0) on -f, -g, -h and ffmpeg on WavPack -hh
* takc.exe (doesn't multithread decoding)
* wvunpack.exe --threads on -hh
* and from then on, the usual order of single-threaded wvunpack.exe, with refalac and tta meddling in before the heaviest ones.
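For anyone who wants to time this sort of thing themselves, ffmpeg's -benchmark flag with a discarded output is one way to do it. The file name here is just an example; the second line writes an actual WAV to the NUL device on Windows:
Code: [Select]
ffmpeg -benchmark -i image.flac -f null -
ffmpeg -benchmark -i image.flac -y -f wav NUL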

Re: More multithreading

Reply #162
I remember reading through this (or at least skimming it? which may be part of the issue), but for whatever reason I never realized you needed to pass an option to get multithreading in the latest builds. Maybe I just assumed it was already multithreaded by default? Or didn't care that much because things were already so speedy?

Well anyway, upon the recommendation from the git release thread, I tried using the -j option in EAC (with the latest posted build of FLAC, from 10/28).

A very unscientific experiment, mind you. First I tried my typical encoding settings, -8ep. The CD I was ripping had 2 tracks, each a bit over 35 minutes long. As the tracks ripped and the encoder popped up, each took about 2 minutes to finish encoding. As this was a burst-mode rip, the 2nd track started and finished ripping and began encoding while the first was still encoding. Multithreading in the old-fashioned sense, but I figured that was worth mentioning as it illustrates how long it was taking.


So, I then decided to try again with -j30 -8ep (leaving a couple threads open for background tasks didn’t seem like such a bad idea)


Started the rip again, and… WOW. Great googly moogly! These tracks ended up encoding in less than 10 seconds each. Over 90% CPU utilization in task manager, with temps and fan/pump speeds going up and everything! I like it!


In reality, it's still fairly useless in this particular application - one way or the other, you are waiting for something other than the encoder when ripping CDs. Even if it's a disc with an unusually long track, you're likely putting in your next disc while the last track is still compressing anyway. And most discs do not have 35-minute tracks. But there are absolutely other use cases for this, and it's incredibly impressive, even if it's only for personal satisfaction and e-peen measuring.

I like making the most of my hardware, so I see no reason not to leave -j30 in my normally used EAC settings.
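(For anyone replicating this: it is just a matter of prepending -j30 to the additional command-line options in EAC's compression settings. Something like the line below - the %source% / %dest% placeholders are EAC's, the rest is ordinary flac syntax; adjust to taste.)
Code: [Select]
-j30 -8ep -V -o %dest% %source%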


I’d like to really put this through its paces with an abnormally long 24/192 recording and see just how fast we can make an -8ep encode go.

Re: More multithreading

Reply #163
A 32-thread CPU to encode your CDs! And to keep you warm through the winter  8)

As has been pointed out, there is a good reason it does not multithread by default. Applications like fb2k will, when multiple files are to be converted, spawn one instance per CPU thread, and then all you get is the overhead.

I’d like to really put this through its paces with an abnormally long 24/192 recording and see just how fast we can make an -8ep encode go.
With > 48 kHz material you can use -8epr8 -l32 and still stay inside the FLAC subset.  Start with -8pr8 and give us bytes saved per extra second elapsed when adding each of the following (alone or in combination):
-e
-l 32
-A "subdivide_tukey(7/777e3);tukey(4e-1);flattop;gauss(3e-3)"
(*evil grin but hey it's doable*)
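One possible set of runs for that comparison - file names made up; time each command with your shell's timer and compare the resulting sizes:
Code: [Select]
flac -f -8pr8 -o base.flac input.flac
flac -f -8per8 -o plus_e.flac input.flac
flac -f -8pr8 -l 32 -o plus_l32.flac input.flac
flac -f -8pr8 -A "subdivide_tukey(7/777e3);tukey(4e-1);flattop;gauss(3e-3)" -o plus_apod.flac input.flac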

Re: More multithreading

Reply #164
A 32-thread CPU to encode your CDs! And to keep you warm through the winter  8)

As has been pointed out, there is a good reason it does not multithread by default. Applications like fb2k will, when multiple files are to be converted, spawn one instance per CPU thread, and then all you get is the overhead.

I’d like to really put this through its paces with an abnormally long 24/192 recording and see just how fast we can make an -8ep encode go.
With > 48 kHz material you can use -8epr8 -l32 and still stay inside the FLAC subset.  Start with -8pr8 and give us bytes saved per extra second elapsed when adding each of the following (alone or in combination):
-e
-l 32
-A "subdivide_tukey(7/777e3);tukey(4e-1);flattop;gauss(3e-3)"
(*evil grin but hey it's doable*)


Hey Porcus,
in terms of highest compression at reasonable times, which presets do you personally like that give better compression than FLAC -8 but aren't as slow as the -e (exhaustive model search) option?

Re: More multithreading

Reply #165
Hey Porcus,
in terms of highest compression at reasonable times, which presets do you personally like that give better compression than FLAC -8 but aren't as slow as the -e (exhaustive model search) option?
-8pr7. But "r7" only helps on "more dense" music ... for heavier rock/metal (that's my thing) or especially noise it can make a difference, but on classical music I have seen in testing that -r4 and above might even yield bit-identical files.  -r8 is also within the subset, so if you are really into noise music, then maybe.

Above that, add windowing functions. When I can let it encode in the background while fiddling around with something else, I put a  -A "subdivide_tukey(5);flattop"  at the end. But you might want to make a command alias or a bat file (example below) if you don't have the patience to actually type a long command line ...
There is a thread at https://hydrogenaud.io/index.php/topic,123025.0.html
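For instance, a tiny wrapper - flac8plus.bat is just a made-up name - so the long line only has to be typed once:
Code: [Select]
@echo off
rem flac8plus.bat - forward all arguments to flac with the heavy settings above
flac -8pr7 -A "subdivide_tukey(5);flattop" %*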

-e sometimes gives surprisingly good results on high-resolution material. Meaning, the default guess-and-encode has its limitations. No wonder it is best at "typical resolution", because that is what everything was developed on ...
And at sampling rates above 48 kHz, you can stay within the subset at, say, -l16.  But who bothers to check a compilation for precisely which files are hi-rez?