Topic: More multithreading

Re: More multithreading

Reply #126
I agree that it is a good idea to warn users that it will change in the future. My thinking aloud was: why not make that warning clearer? Because it isn't synonymous with any other setting, it cannot be expected to remain synonymous with any other setting. Just warn users about that from day one. But I'm not crying over it currently being the same as -j1.


Anyway, I see the tests here are on CDDA material. Maybe try high resolution and multichannel too? More for sanity checking (to see if there is anything completely bonkers going on) than for sheer numbers.
But the numbers ... might be unreliable, so maybe I should ask: were there any "general" changes made from v1 to v5?
Asking because when I tried the hi-res corpus below on -8Mper7 with -j1/-j2 (yes, "M" - a stupid setting, but the point was to test "-M"), I got the following times:
v1   2574 vs 2550
v5   2538 vs 2534
(Wombat's build: slightly faster. I didn't run more 8pe, not included below.)


So, on to the tests; v1 (first one posted in this thread) vs v5 vs Wombat's build from same source as v5, but with AVX-512 compiler flag ("v4", but I left that out not to confuse with ktf's builds):

High resolution. 62 minutes, 2848 MiB compressed (-5). Size contributions: 1008 MiB of 192/24 from the 2L testbench and Linn Records, 619 MiB of that infamous 768 kHz Carmen Gomes PR stunt, 350 MiB of DXD (that's only 5 minutes!) from the 2L testbench, and the rest is various-rate 32-bit integer from that sample rate converter site, plus one track in 192/16 from some French stoner band.
 
The first two cells were redone later; the computer was apparently still busy when I started. I also did -j9 as a sanity check (the CPU has only 8 threads): virtually the same times as -j8, so omitted from the table.
One surprise: v1 doing -0b4096 -j2 so well. Confirmed by a few re-runs.
But generally, v5 is superior on this material too. Take individual times with a grain of salt.

This time I used seconds. You can see where the benefits flatten out:
build@setting    j1   j2   j3   j4   j5   j8  Mj1  Mj2
v1@-0r0          22   23   18   17   17   17   22   22
v5               21   12   12   11   13   11   22   19
Wombat           21   13   12   12   12   13   23   19
v1@-0            21   20   18   17   17   17   23   21
v5               22   13   11   11   12   11   23   20
Wombat           22   13   12   12   12   12   23   20
v1@-0b4096       21   15   16   16   16   16   22   15
v5               21   12   10   11   10   10   22   19
Wombat           21   12   11   11   12   11   22   18
v1@-0er7         31   30   18   17   17   17   32   32
v5               31   17   14   12   11   12   32   29
Wombat           30   17   15   12   12   12   31   29
v1@-5            33   24   16   17   16   16
v5               33   18   14   11   12   11
Wombat           32   18   14   12   12   12
v1@-8           100  110   54   38   38   31
v5              100   53   39   32   32   30
Wombat           97   52   38   30   31   28
v1@-8e          477  484  249  183  178  153
v5              481  253  183  150  151  145
Wombat          476  247  181  149  148  141
v1@-8pr7        638  644  335  244  255  206
v5              642  332  241  201  201  194
Wombat          631  327  239  197  197  189
(Wombat builds produce slightly different files, size differences +/- 0.01%. )
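
To make the flattening-out easier to see, the times can be converted into speedup and per-thread efficiency relative to -j1. This is only a minimal sketch: it uses the v5 row for -8pr7 as read from the table above, and the names `V5_8PR7` and `speedup_table` are made up for illustration.

```python
# Wall-clock seconds for v5 at -8pr7, copied from the table above
# (keys are the -j values).
V5_8PR7 = {1: 642, 2: 332, 3: 241, 4: 201, 5: 201, 8: 194}


def speedup_table(times):
    """Return {threads: (speedup, efficiency)} relative to the -j1 time."""
    base = times[1]
    return {
        n: (base / t, base / t / n)
        for n, t in sorted(times.items())
    }


if __name__ == "__main__":
    for n, (speedup, efficiency) in speedup_table(V5_8PR7).items():
        print(f"-j{n}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

For this row the speedup goes from roughly 3.2x at -j4 to roughly 3.3x at -j8, so the four extra threads buy very little.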


5.1 multichannel. About an hour, DVD-sourced at 48 kHz (avoiding high resolution here, one test at a time).
Since -M was off the table, I cut down to fewer -j options too. Again I have omitted a -j9 done for sanity checking; it produced numbers consistent with -j8.

There is some weirdness for -0; I cannot rule out that the computer might not have been completely done with some other job or whatever. Also, I cannot access that computer at the moment to re-run it (I am on the road; I had it output numbers to a text file in the cloud).
build@setting    j1   j2   j4   j8
v1@-0            13    9    9   14
v5               13   21   15    6
Wombat           14    7    7   10
v1@-2er7         17   18    9   11
v5               17   10    8    7
Wombat           20    9    7    7
v1@-5            14    9    9    9
v5               14    8    6    7
Wombat           14    9    6    7
v1@-5            34   32   24   10
v5               33   20   11    9
Wombat           31   19   11    8
v1@-8e          118  117   59   35
v5              121   92   48   34
Wombat          115   91   47   32
v1@-8pr7        162  168   79   48
v5              161  129   67   46
Wombat          153   83   47   44
So, since I was looking for anomalies, and have been away from that computer since firing up that multichannel test ... well, I would have hoped I didn't have to re-run anything due to results like those for -0. But with that minor reservation, I think the picture (on this computer, with some Intel 4-core, 8-thread CPU) is getting quite clear. v5 does behave sanely, but I should count myself lucky if I save much time going beyond -j4.

Re: More multithreading

Reply #127
Current git of the multithreading version c1fc2c91, CPU generic.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: More multithreading

Reply #128
I have used the version above on my little J5005 machine and recompressed more than 300 GB over time using --threads=3, since it runs anyway. The files have mixed bitrates.
All files correctly bit-compare.
@ktf Do you already have a timeline for the merge with xiph master or even a final and multithreaded 1.4.4?

Re: More multithreading

Reply #129
Like 1.4.0, release it 09-09 in order not to confuse ISO-8601-illiterate Americans?  ;)
"a final and multithreaded 1.4.4"
Question arises, are the changes so minor that it will stay "1.4"?
Possible relevance: a "1.5.0" might justify some more discussion on what is to be included.

Speaking of the second digit:
1.3.4 has this error in Rice partitions with escape code zero (testbench file 64). And 1.3.4 is the last of the 1.3 series.
Is there a risk that 1.3.4 will be kept in production because of the breaking changes in 1.4.0? If so, should there be a maintenance 1.3.5 with this bugfix?

Re: More multithreading

Reply #130
"@ktf Do you already have a timeline for the merge with xiph master or even a final and multithreaded 1.4.4?"
Not really, no. Merge with master probably in a few weeks, release might be next year.

"Question arises, are the changes so minor that it will stay '1.4'?"
The reason to bump the 4 to a 5 would be a breaking API change. That isn't the case here.

"1.3.4 has this error in Rice partitions with escape code zero (testbench file 64). And 1.3.4 is the last of the 1.3 series.
Is there a risk that 1.3.4 will be kept in production because of the breaking changes in 1.4.0? If so, should there be a maintenance 1.3.5 with this bugfix?"
No, that won't happen. I don't have time to backport all fixes that have happened in the meantime.
Music: sounds arranged such that they construct feelings.

Re: More multithreading

Reply #131
"No, that won't happen. I don't have time to backport all fixes that have happened in the meantime."
Fair enough - it is not as if it creates invalid files (I think?)
But the changelog could maybe have been clearer in recommending an up- or downgrade if flac.exe version 1.3.4 errs out on a file.

Re: More multithreading

Reply #132
@ktf Thanks for the info.

Re: More multithreading

Reply #133
"Currently, passing a value of 0 is synonymous with a value of 1, but this might change in the future"
Maybe it would be better to change its meaning to "set to the number of available cores"?

Re: More multithreading

Reply #134
There is no platform-independent way to determine the 'amount of available cores': it is different for Windows, macOS, *nixes, microcontrollers etc., and it might also differ between CPU architectures. Also, with the advent of performance and efficiency cores, using all cores might not be beneficial. The same goes for hyperthreading and similar technologies.

So auto-selecting isn't as simple as it might seem.
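
As a small illustration of that platform dependence, here is a sketch in Python rather than in the encoder's own code. `os.cpu_count()` and `os.sched_getaffinity()` are real standard-library calls; the wrapper names `logical_cpus` and `usable_cpus` are made up. The two numbers can differ, and neither says anything about performance versus efficiency cores or hyperthreading.

```python
import os


def logical_cpus():
    """Logical CPUs the OS reports; None if it cannot be determined."""
    return os.cpu_count()


def usable_cpus():
    """CPUs this process may actually run on, where the platform can tell us.

    os.sched_getaffinity only exists on some Unix systems (e.g. Linux), so
    fall back to the plain logical count elsewhere.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs reported:", logical_cpus())
    print("CPUs usable by this process:", usable_cpus())
```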

Re: More multithreading

Reply #135
It's a de-facto standard that 0 means "as many cores as available"; if you don't want to do that, I suggest removing 0 as an option entirely. Either way, the default should be 1. IMO if a user requests "as many cores as available", it's on them if that's not the most effective option.

Re: More multithreading

Reply #136
"It's a de-facto standard that 0 means 'as many cores as available',"
Wasn't it so that "-threads 0" in ffmpeg means "let application decide"?

"Either way the default should be 1."
Obviously. Say fb2k will spawn one instance per available thread.
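
The options raised so far for what 0 could mean can be written down side by side. This is a purely hypothetical sketch, not how flac parses its option: `resolve_threads` and the `zero_policy` names are invented here, and "all" simply uses whatever `os.cpu_count()` reports.

```python
import os


def resolve_threads(requested, zero_policy="one"):
    """Map a user-supplied thread count to the number of threads to start.

    zero_policy is hypothetical and only covers options raised in this
    thread: "one" (0 behaves like 1, the documented current behaviour),
    "all" (0 means every logical CPU) and "reject" (0 is not accepted).
    """
    if requested < 0:
        raise ValueError("thread count must not be negative")
    if requested > 0:
        return requested
    if zero_policy == "one":
        return 1
    if zero_policy == "all":
        return os.cpu_count() or 1
    if zero_policy == "reject":
        raise ValueError("0 is not a valid thread count")
    raise ValueError(f"unknown zero_policy: {zero_policy!r}")


if __name__ == "__main__":
    print(resolve_threads(0))         # 0 treated as 1
    print(resolve_threads(0, "all"))  # machine dependent
```

Whichever policy is chosen, an explicit positive value is passed through unchanged, so the default of 1 is unaffected.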

Re: More multithreading

Reply #137
You could also read 0 as "default", which is still 1 thread.

Re: More multithreading

Reply #138
First, thanks for starting to develop multithreading.

I performed some tests and compared it to the multithreading behavior of https://www.rarewares.org/files/mp3/fpMP3Enc.zip.
fpFLAC2FLAC used all cores at 100% by default (with no options added), but flac.exe with the maximum number of threads selected uses the CPU in this way: [screenshot]

I would also like to ask whether there is any chance of adding some timers to the CLI output for testing purposes?
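
Until something like that exists, timing can be done from outside the encoder. A minimal sketch, assuming nothing about flac itself: `time_command` is a made-up helper that runs any command line once and reports wall-clock time, and the command in the example is only a placeholder.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute the
    # encoder command line you want to measure.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, {seconds:.2f} s")
```

This measures the whole process including start-up and file I/O, so very short runs will be dominated by overhead.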

Re: More multithreading

Reply #139
AMD Ryzen 5900x, 24 threads (--threads=24)

Re: More multithreading

Reply #140
"but flac.exe with maximum opted threads uses CPU in this way:"
Please explain what options you used.

FLAC encodes very fast, the system calls used to enable multithreading take some time to execute, and some things cannot be multithreaded. This means that when multithreading with a large number of threads, full CPU usage can only be reached when using a slow FLAC preset, like -8p. If you used the default compression level of 5, then what you are seeing is probably due to parsing or decoding (which cannot be multithreaded) being a bottleneck.
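
The effect of a part that cannot be multithreaded can be illustrated with Amdahl's law, a standard textbook model and not something taken from the flac source. In this sketch the parallel fractions are illustrative guesses, not measurements, and the model ignores the thread start-up cost mentioned above.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # The fractions are illustrative guesses, not measurements of flac.
    for label, p in (("fast preset", 0.60), ("slow preset", 0.95)):
        print(label, [round(amdahl_speedup(p, n), 2) for n in (1, 2, 4, 8, 32)])
```

The smaller the parallel share of the total work, the earlier extra threads stop helping, which is why a slower preset can keep more cores busy.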

Re: More multithreading

Reply #141
"Please explain what options you used."
5950X automatically runs at ~4.40-4.55 GHz; settings: -8 -V -j32 [screenshot]
5950X automatically throttles down to ~3.25-3.35 GHz; settings: -8 -V -e -p -j32 [screenshot]

 

Re: More multithreading

Reply #142
5950X automatically runs at 4.45 GHz; settings: -8 -V -p -j32 [screenshot]