
Tested: Lossless decoding speed, multithreaded - and fast verification

Why test wall time for multithreading? After all, multithreading doesn't use less CPU; it only cuts waiting time, and only if the CPU isn't already busy.  Conversion software that spawns one thread per file would be expected to be more efficient, so why not let it ... ?
But there are situations where you want to save to a compressed format, say in DAW plug-ins; then I'd expect even a two-second wait to be noticeable.  And if you are opening a project, then the same goes for decoding?

Anyway, swine got curious:
 * ffmpeg 7.0 now encodes faster: https://hydrogenaud.io/index.php/topic,125694.msg1042555.html#msg1042555 .  Its fastest encoding is now on par with single-threaded fastest official WavPack (which compresses better!), while 6.1.1 spent 60 percent more time compressing worse.  Seems to be due to one thread processing source and ... ?  (I have no idea of what overhead that creates.)
 * ffmpeg also decodes with some degree of multithreading, unknown to me until pointed out in that same thread. Aha ... so how fast?  The reference FLAC multithreading in git is only for encoding, not for decoding.
 * For multiple files, I just noticed that since decoding is fast, the penalty from a FOR loop is more than noticeable.  Hence decoders that can do wildcards are at an advantage.  Of course that advantage is very much real - heck, the time taken to type a FOR loop is also significant here! - but for measuring the decoder ... what then?

For a more apples to apples comparison, this is one (untagged) file, 73 minutes of CDDA on (internal) SSD. Corpus isn't super-important (I hope!):
I took the first ten and a half minutes of each of 7 CDs that are neither classical music nor metal, for the variety of signals: some old near-mono, some this and some that. They compressed better than your average, I guess; numbers are given at the end.
So think of it as one full compilation CD as image (no cuesheet, no tags!)

Test was done with the hyperfine benchmarking tool that I recently started using: I run the whole thing 11 times + warmup for the larger part, all with pause in between to keep the CPU reasonably stable, and then because some figures looked suspicious and I would anyway re-run a few (no big changes!), I included some like shn and mlp in that run and pasted them in order. (Since the fastest ran in 1.199 and then 1.200, that makes no difference.)  CPU: i5-1135-G7, 4 cores 8 threads.
I wish hyperfine could be set to report this nice summary with median instead of mean, for robustness to the whims of the OS - but laziness gets the better of me when the output looks as nice as this.  Reformatted slightly, and commented.

  ffmpeg -i =N.flac.-5.flac -f wav -y NUL ran
    1.03 ± 0.01 times faster than ffmpeg -i =N.flac.-0r0b4096--no-md5.flac -f wav -y NUL   <--- This dual mono FLAC with no MD5 was encoded to decode fast.  Seems ffmpeg ignores MD5.
    1.16 ± 0.01 times faster than ffmpeg -i =N.flac.-8l32.flac -f wav -y NUL   <--- -8l32 --lax, to be more precise.  I cannot force flac.exe to use a very high order, but this was intended to decode slower and it did
    1.32 ± 0.02 times faster than ffmpeg -i =N.tak.-p0.tak -f wav -y NUL   <--- TAK, fastest one
    1.43 ± 0.02 times faster than ffmpeg -i =N.tak.-p4m.tak -f wav -y NUL   <--- Why this is faster than -p2 ... could be number of frames
    1.59 ± 0.02 times faster than ffmpeg -i =N.tta -f wav -y NUL   <--- TTA is a surprise. Look at how much faster than the reference ...
    1.73 ± 0.02 times faster than ffmpeg -i =N.wv.ffmpeg-0.wv -f wav -y NUL   <--- WavPack.  ffmpeg decodes WavPack faster than multithreaded wvunpack.exe does
    1.83 ± 0.02 times faster than ffmpeg -i =N.wv.-f.wv -f wav -y NUL
    1.85 ± 0.03 times faster than ffmpeg -i =N.flac.-2e.flac -f wav -y NUL   <--- FLAC with smaller block size, that yields time penalty
    1.93 ± 0.03 times faster than .\wvunpack.exe -qy --threads=8 =N.wv.-f.wv  -o NUL   <--- wvunpack with --threads=8.  Only one faster than flac -d.
    2.08 ± 0.02 times faster than .\flac.exe -ss -d =N.flac.-0r0b4096--no-md5.flac -fo NUL   <--- FLAC with official decoder.  Here the absence of MD5 matters.  ffmpeg does it twice as fast.
    2.08 ± 0.09 times faster than ffmpeg -i =N.alac.refalac.m4a -f wav -y NUL   <--- ALAC compressed with refalac
    2.20 ± 0.02 times faster than ffmpeg -i =N.alac.ffmpeg.m4a -f wav -y NUL   <--- ALAC compressed with ffmpeg
    2.45 ± 0.03 times faster than ffmpeg -i =N.wv.-x.wv -f wav -y NUL   <--- WavPack default mode
    2.51 ± 0.03 times faster than ffmpeg -i =N.alac.cuetools8.m4a -f wav -y NUL   <--- ALAC compressed with CUETools, the slower preset "8".
    2.69 ± 0.03 times faster than .\wvunpack.exe -qy --threads =N.wv.ffmpeg-0.wv -o NUL   <--- WavPack by official wvunpack --threads (selecting the thread count itself).  Note, no -m used.
    2.69 ± 0.03 times faster than .\wvunpack.exe -qy --threads =N.wv.-f.wv -o NUL   <--- (-q for "quiet")
    2.87 ± 0.03 times faster than .\flac.exe -ss -d =N.flac.-5.flac -fo NUL   <--- (-ss for "silent")
    3.04 ± 0.03 times faster than .\flac.exe -ss -d =N.flac.-2e.flac -fo NUL   <--- Because block size 1152?
    3.30 ± 0.04 times faster than .\wvunpack.exe -qy --threads =N.wv.-x.wv -o NUL   <--- WavPack default mode (-x does not slow down decoding),
    3.47 ± 0.17 times faster than ffmpeg -i =N.wv.-hx2.wv -f wav -y NUL   <--- ffmpeg on a high mode .wv nearly catches official on a default mode
    3.52 ± 0.04 times faster than .\flac.exe -ss -d =N.flac.-8l32.flac -fo NUL   <--- heaviest flac
    4.18 ± 0.14 times faster than .\wvunpack.exe -qy --threads =N.wv.-hx2.wv -o NUL   <--- wvunpack takes 27 percent more time than ffmpeg
    4.59 ± 0.07 times faster than ffmpeg -i =N.wv.-hhx3.wv -f wav -y NUL
    5.19 ± 0.17 times faster than .\takc.exe -d -overwrite -tn4 =N.tak.-p0.tak NUL   <--- TAK. "-tn4" would turn on multithreaded encoding, but like FLAC it doesn't multithread decoding.  4x the time of ffmpeg!
    5.24 ± 0.13 times faster than .\wvunpack.exe -qy --threads =N.wv.-hhx3.wv -o NUL
    5.68 ± 0.14 times faster than .\takc.exe -d -overwrite -tn4 =N.tak.-p2.tak NUL
    5.76 ± 0.12 times faster than ffmpeg -i =N.shn -f wav -y NUL   <--- Shorten, for completeness. ffmpeg does that faster too.
    5.95 ± 0.15 times faster than .\takc.exe -d -overwrite -tn4 =N.tak.-p4m.tak NUL
    6.48 ± 0.16 times faster than .\wvunpack.exe -qy =N.wv.ffmpeg-0.wv -o NUL   <--- wvunpack, single-threaded
    6.67 ± 0.08 times faster than .\wvunpack.exe -qy =N.wv.-f.wv -o NUL
    6.95 ± 0.15 times faster than .\shorten.exe -x =N.shn NUL
    7.39 ± 0.08 times faster than .\refalac -D =N.alac.refalac.m4a -o NUL   <--- refalac spends 3.6x the time of ffmpeg
    8.28 ± 0.16 times faster than .\wvunpack.exe -qy =N.wv.-x.wv -o NUL
    9.18 ± 0.10 times faster than .\tta.exe -d =N.tta NUL   <--- TTA official spends 5.8x the time of ffmpeg.  Either one is good or the other is bad ... or could it be the large block size?
    9.75 ± 0.10 times faster than ffmpeg -i =N.ape.-c1000.ape -f wav -y NUL   <--- Monkey's is also faster with ffmpeg, but not that much
   10.55 ± 0.12 times faster than .\refalac -D =N.alac.ffmpeg.m4a -o NUL
   10.75 ± 0.24 times faster than .\wvunpack.exe -qy =N.wv.-hx2.wv -o NUL
   11.09 ± 0.12 times faster than .\refalac -D =N.alac.cuetools8.m4a -o NUL   <--- refalac spends 5.3x the time of ffmpeg on this heavier file.
   13.17 ± 0.30 times faster than .\MAC.exe =N.ape.-c1000.ape NUL -d
   13.68 ± 0.16 times faster than .\wvunpack.exe -qy =N.wv.-hhx3.wv -o NUL   <--- hh-eaviest WavPack file.  2.6x the multithreaded time.  3x ffmpeg time.
   14.72 ± 0.16 times faster than ffmpeg -i =N.ape.-c3000.ape -f wav -y NUL
   20.32 ± 0.41 times faster than .\MAC.exe =N.ape.-c3000.ape NUL -d
   23.38 ± 0.51 times faster than ffmpeg -i =N.mlp.mka -f wav -y NUL   <--- MLP. I was curious.
   45.37 ± 0.63 times faster than .\MAC.exe =N.ape.-c5000.ape NUL -d
   54.04 ± 0.57 times faster than ffmpeg -i =N.ape.-c5000.ape -f wav -y NUL   <--- The only one where ffmpeg ran slower than the official.  Yes I re-ran them.
It seems ffmpeg does that thing pretty universally, but not too well on Monkey's.
At this speed I might wonder whether there are significant differences due to whether/how the decoders ensure that the file is properly closed - even if it is null output. Speculations, but wavpack the encoder does close and reopen upon verification ...?
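
On the wish for a median-based summary: hyperfine can write its raw timings with --export-json, and the relative figures can then be recomputed from medians outside hyperfine. What follows is only a minimal sketch, assuming the export has a top-level "results" list whose entries carry "command" and "times" keys; the function names are made up for illustration.

```python
import json
import statistics


def load_export(path: str) -> dict:
    """Read a file written by `hyperfine --export-json <path> ...`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def rank_by_median(export: dict) -> list[tuple[str, float, float]]:
    """Return (command, median_seconds, ratio_to_fastest), fastest first."""
    rows = [
        (result["command"], statistics.median(result["times"]))
        for result in export["results"]
    ]
    rows.sort(key=lambda row: row[1])
    fastest = rows[0][1]
    return [(cmd, med, med / fastest) for cmd, med in rows]


if __name__ == "__main__":
    # Made-up timings, only to show the shape of the data.
    demo = {
        "results": [
            {"command": "decoder-a", "times": [2.0, 2.1, 9.0]},
            {"command": "decoder-b", "times": [1.0, 1.0, 1.2]},
        ]
    }
    for cmd, med, ratio in rank_by_median(demo):
        print(f"{ratio:6.2f} times the fastest  {med:.3f} s  {cmd}")
```

In the made-up data, the 9.0 s outlier would drag the mean of decoder-a up a lot, while its median stays at 2.1 s.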

Also tested:

MPEG4-ALS. ffmpeg crashed consistently on this file, of course that made for the "fastest" run and all the other figures wrong. Instead of correcting them: discard and another overnight run.

Extra time to write out.wav on same SSD compared to NUL
* zero-ish: wvunpack
* 0.07 to 0.11: ffmpeg (unreliably measured on .ape)
* 0.36 to 0.55: flac.exe (Xiph and Wombat, -2e was worst) and tta.exe
* 0.7 for refalac

takc.exe writes NUL.wav which takes 1.0 (1.2 seconds) more than just test decode - how much of that is for actual file and how much is for null output, I don't know. But it leads to this:
All official decoders can do test decode - verify by decoding. Extra time for them to do -o NUL compared to verify by decoding:
* 0.06 to 0.10 for flac.exe
* 0.5 ± a little, for wvunpack --threads, and 0.85 ± a little for single-threaded wvunpack.  Is this the penalty for checking that the file is properly closed? I think WavPack goes to greater lengths to do that.
* (unreliably measured on ape ... at those speeds it doesn't matter much.  If you don't want to wait, use the official GUI that can spawn a thread per file.)

And more:
* wvunpack --threads=<1 through 8>. One number posted in the table.
* Did .wv files encoded with --threads take more or less time to decode?  No, all within the variations. ± 0.07
* How did Wombat's most recent flac build do? .42 to .49 slower. 
Timing for ffmpeg -i =N.wav -f wav -y NUL: around 0.4 seconds.  This is the only "seconds" here.

Except the latter 0.4 seconds: numbers are differences in the "times faster than", so add twenty percent to get it in seconds.
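
That conversion, written out as a small sketch. It assumes the 1.199 s fastest run reported above as the unit; the function name is just for illustration.

```python
FASTEST_SECONDS = 1.199  # fastest run in the main table, as stated in the post


def ratio_to_seconds(ratio: float, fastest: float = FASTEST_SECONDS) -> float:
    """Convert a 'times faster than' figure, or a difference of two, to seconds."""
    return ratio * fastest


if __name__ == "__main__":
    # Extra 0.5 units for wvunpack --threads writing to NUL instead of verifying.
    print(round(ratio_to_seconds(0.5), 2))    # 0.6
    # Single-threaded wvunpack on the -hhx3 file: 13.68 units.
    print(round(ratio_to_seconds(13.68), 1))  # 16.4
```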


Finally, file sizes. WAV is 773972684, and the following are compression ratios - the content of old jazz/soul makes for smaller files:
45.8%   =N.ofr.--presetmax.ofr
46.3%   =N.ofr.--preset7.ofr
47.0%   =N.ofr.--preset2.ofr
47.4%   =N.tak.-p4m.tak
47.9%   =N.ofr.--preset0.ofr
48.1%   =N.tak.-p2.tak
49.1%   =N.wv--threads.-hhx3.wv
49.1%   =N.wv.-hhx3.wv
49.3%   =N.wv.-hx2.wv
49.4%   =N.wv--threads.-hx2.wv
49.5%   =N.flac.-8l32.flac
49.6%   =N.als
49.6%   =N.als.m4a
49.8%   =N.tta
50.0%   =N.tak.-p0.tak
50.1%   =N.flac.-5.flac
50.1%   =N.wv.-x.wv
50.1%   =N.wv--threads.-x.wv
50.4%   =N.alac.cuetools8.m4a
51.1%   =N.alac.refalac.m4a
51.6%   =N.alac.ffmpeg.m4a
51.9%   =N.wv--threads.-f.wv
51.9%   =N.wv.-f.wv
53.0%   =N.flac.-2e.flac
58.4%   =N.wv.ffmpeg-0.wv
59.9%   =N.shn
60.7%   =N.flac.-0r0b4096--no-md5.flac
70.9%   =N.mlp.mka


Soon to be posted: fast-verification times.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #1
TTA has a fixed number of encoded samples in each packet, except the last packet in the file. There is no way to do any optimization here except brute-force threading.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #2
FFmpeg still does so much better than tta.exe that it adds to the suspicion that the reference implementation isn't very good.
I don't speak code, but the following also indicate that reference tta.exe isn't particularly stellar:
* FFmpeg-tta does things that tta.exe cannot - like detect errors.
* Official foobar2000 component errs out on certain files (I think it is 8-bits, fixed in case's component).
* I have not tested this rewrite, but it claims speedups: https://hydrogenaud.io/index.php/topic,125048.0.html
* tta.exe is picky about WAVE version, and thinks that WAVE sample count is signed integer.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #3
Impressive improvements!
Are FFMPEG lossy encoders also multithreaded? It could be also very interesting for video tools (handbrake…).

Thanks for this table Porcus, it's very interesting :)

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #4
Note, I have tested DEcoding here, not ENcoding. What has happened in ffmpeg 7.0 on the encoding ... quoting from https://ffmpeg.org/#cli_threading

Thanks to a major refactoring of the ffmpeg command-line tool, all the major components of the transcoding pipeline (demuxers, decoders, filters, encodes, muxers) now run in parallel. This should improve throughput and CPU utilization, decrease latency, and open the way to other exciting new features.
Note that you should not expect significant performance improvements in cases where almost all computational time is spent in a single component (typically video encoding).

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #5
Note, I have tested DEcoding here, not ENcoding.

Ah yes, it's mentioned in the title  :-*

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #6
Fast-verification times.

WavPack (from format 5, decoder 5.40), Monkey's (CLI from ... year twentytwenty-something) and OptimFROG can verify a file without carrying out the decoding - especially good on the latter two, that incur some CPU load doing decoding. Of course, no decoding does not verify that the audio is what it is supposed to be, but block-level checksums should protect against bit-flips and general corruption.

Other formats, like FLAC, do have block-level checksums and could do the same, but with no application supporting it.
Whether it would offer much value-added for FLAC, which decodes fast and whose users are so accustomed to having audio MD5 being included that the file vendor who supplies FLAC downloads without MD5 gets the evil eye - up to opinion, but at least here is a take on the differences in speed.

Same single file as above. Take note that the fastest of these, WavPack in high mode (fewer blocks?) ran in
0.239 seconds <--- 18358x realtime!
  .\wvunpack.exe -q -vv =N.wv.-hhx3.wv ran
    1.00 ± 0.02 times faster than .\wvunpack.exe -q -vv =N.wv.-hx2.wv
    1.22 ± 0.02 times faster than .\wvunpack.exe -q -vv =N.wv.-x.wv
    1.23 ± 0.04 times faster than .\wvunpack.exe -q -vv =N.wv.-f.wv
    3.02 ± 0.05 times faster than .\MAC.exe =N.ape.-c5000.ape -v
    3.06 ± 0.05 times faster than .\MAC.exe =N.ape.-c3000.ape -v
    3.13 ± 0.05 times faster than .\MAC.exe =N.ape.-c1000.ape -v
    3.47 ± 0.06 times faster than .\ofr.exe --verify =N.ofr.--presetmax.ofr
    3.54 ± 0.06 times faster than .\ofr.exe --verify =N.ofr.--preset7.ofr
    3.59 ± 0.07 times faster than .\ofr.exe --verify =N.ofr.--preset2.ofr
    3.66 ± 0.06 times faster than .\ofr.exe --verify =N.ofr.--preset0.ofr
   10.20 ± 0.17 times faster than .\flac.exe -ss -t =N.flac.-0r0b4096--no-md5.flac
   12.37 ± 0.23 times faster than .\flac-wombat.exe -ss -t =N.flac.-0r0b4096--no-md5.flac
   14.36 ± 0.24 times faster than .\flac.exe -ss -t =N.flac.-5.flac
   15.09 ± 0.29 times faster than .\flac.exe -ss -t =N.flac.-2e.flac
   16.35 ± 0.28 times faster than .\flac-wombat.exe -ss -t =N.flac.-5.flac
   17.00 ± 0.27 times faster than .\flac-wombat.exe -ss -t =N.flac.-2e.flac
   17.62 ± 0.28 times faster than .\flac.exe -ss -t =N.flac.-8l32.flac
   19.69 ± 0.33 times faster than .\flac-wombat.exe -ss -t =N.flac.-8l32.flac
   20.55 ± 0.36 times faster than .\takc.exe -t =N.tak.-p0.tak
   23.21 ± 0.42 times faster than .\takc.exe -t =N.tak.-p2.tak
   24.26 ± 0.42 times faster than .\takc.exe -t =N.tak.-p4m.tak
   28.61 ± 0.48 times faster than .\wvunpack.exe -q -vv =N.wv.ffmpeg-0.wv
No "fast" verification in the latter, which is a WavPack version 4 file - that is what ffmpeg creates. Included as a "(s)low anchor".
-q for quiet, -ss for silent, I am not sure if it matters since hyperfine does not display a console, but ... habits, habits. "flac-wombat.exe": renamed the exe of the latest build (link in original post).
hyperfine command in the bat file, the pings take a second each and are for pause in between:
hyperfine.exe -i --style full -r 11 -w 1 --prepare "(for /l %%t IN (1,1,8) DO ping 127.0.0.1 )" <and the command list>

Summarizing:
* WavPack (fastest) verifies around 3x as quickly as Monkey's and OptimFROG. WavPack's block-level checksum is evidently fast.
* Still the slowest frog verifies 73 minutes CDDA in less than a second ...
* ... which in turn is 4x to 5x the speed of FLAC, at least if your flac files have MD5 as they reasonably should.
* The TAK-to-FLAC ratios are what you would expect from decoding, because decoding is what they do. Same goes for that old WavPack format.

Also tested:
* On USB3-connected spinning drive: tested the fastest   .\wvunpack.exe -q -vv =N.wv.-hhx3.wv , at like 10 percent time penalty. Also a cursory test on Monkey's confirms that I/O doesn't do that much here.
* Multithreading the fastest wvunpack, that is .\wvunpack.exe --threads  -q -vv =N.wv.-hhx3.wv .  Somewhat surprising, that incurred an additional nine percent-ish penalty on the USB3 spinning drive, but saved nine percent-ish on the SSD.


More discussion on error detection capabilities and robustness at https://hydrogenaud.io/index.php/topic,122094 . Note that the reference FLAC decoder has in the meantime been changed to mute corrupted blocks (so output has the right length) rather than to drop them.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #7
@Porcus
Thanks for your always thorough tests! Interestingly your results differ from mine somewhat (e.g., slower WavPack) and I'm not sure exactly what's going on, but I'll post them here in a table for reference. My technique is not nearly as automated nor exhaustive as yours, but I did run the tests enough times to convince myself that I was getting reasonably accurate results. I tested on FFmpeg 7.0, WavPack 5.7.0, and one of the most recent FLAC builds on a double-album CDDA file (2h18m) encoded to WavPack and FLAC (w/ and w/o MD5) at modes suited for fast decoding.

Your system has 8 threads and mine 12, but I see the same relative speeds on my other Intel 8-thread machine and my 16-core AMD (but I don't test on those because neither are Windows).

One of the limits of WavPack multithreading in its current form is that it can't keep all physical threads continuously busy because it only runs worker threads during the actual client call into libwavpack. So each call splits the work into the requested number of threads and then waits until the last one finishes before returning to the caller. This might be why adding additional threads beyond those physically available continues to improve performance in sort of a linear way.
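
A toy model of that per-call fork-join behaviour may make the idle time easier to see. This is not libwavpack code; the function and the timings are invented for illustration, and the only assumption is the one stated above, that a call returns when its slowest worker finishes.

```python
def fork_join_call(work_seconds: list[float]) -> tuple[float, float]:
    """Model one call that hands one piece of work to each worker thread.

    Returns (wall_seconds, busy_fraction). The call returns only when the
    slowest worker is done, so faster workers sit idle until then.
    """
    wall = max(work_seconds)
    busy = sum(work_seconds) / (len(work_seconds) * wall)
    return wall, busy


if __name__ == "__main__":
    # Four workers with uneven (made-up) amounts of work in one call.
    wall, busy = fork_join_call([1.0, 0.8, 0.6, 0.6])
    print(f"wall {wall:.2f} s, workers busy {busy:.0%} of that time")
```

In this model the unused fraction is time a core could have spent on another worker, which would fit with extra threads still helping, though that remains a guess.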

Also, using just --threads is the equivalent (for now) to --threads=5. There is no determination based on available threads or anything like that, although that could obviously be added at some future date. That value (5) is the point where the trade-off between CPU work and speed starts to significantly deteriorate. In other words, --threads=12 will almost always be faster than the default (unless the CPU starts throttling down), but will use significantly more total CPU time/power due to context switching.

Multithreaded Decoding Test

  • File details: duration is 2:18:54.44, 16-bit, 44.1-kHz, stereo
  • System: Win 10, Intel i7-10710U, 6-core, 12-thread
  • FLAC encoding: default, file size = 811883811 bytes (55.22%)
  • WavPack encoding: -fx6, file size = 814957694 bytes (55.43%)
  • FFmpeg command: ffmpeg [-threads 1] -i <file> -f wav -y NUL
  • wvunpack command: wvunpack [--threads=N] <file> -z0qyo NUL
  • FLAC command: flac -ss -d <file> -fo NUL

Format       Program   Options       Time       Comment
flac         FFmpeg    -             2.10 sec   3968 xRT (3.5 x single-threaded)
WavPack      wvunpack  --threads=12  2.94 sec   2835 xRT (5.4 x single-threaded)
WavPack      wvunpack  --threads=8   3.62 sec   2302 xRT (4.4 x single-threaded)
WavPack      FFmpeg    -             4.14 sec   2013 xRT (5.4 x single-threaded)
WavPack      wvunpack  --threads     4.91 sec   1697 xRT (3.3 x single-threaded)
flac-no-md5  flac      -             6.24 sec   1336 xRT
flac         FFmpeg    -threads 1    7.28 sec   1145 xRT
flac-md5     flac      -             8.68 sec   960 xRT
WavPack      wvunpack  -             16.01 sec  521 xRT
WavPack      FFmpeg    -threads 1    22.53 sec  370 xRT
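
The xRT figures are the file duration divided by the wall time, and the "x single-threaded" figures are the ratio of the two wall times. A minimal sketch using the duration and times listed above (helper names are made up; since the listed times are rounded to two decimals, a recomputed value can be off by one in the last digit):

```python
def duration_seconds(hms: str) -> float:
    """Parse 'H:MM:SS.ss' into seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def x_realtime(duration: float, wall: float) -> float:
    """How many times faster than playback speed the decode ran."""
    return duration / wall


if __name__ == "__main__":
    total = duration_seconds("2:18:54.44")
    rows = [
        ("wvunpack --threads=12", 2.94),
        ("wvunpack --threads=8", 3.62),
        ("wvunpack single-threaded", 16.01),
    ]
    for label, wall in rows:
        print(f"{label}: {x_realtime(total, wall):.0f} xRT")
    # Speedup of --threads=12 over single-threaded wvunpack.
    print(f"{16.01 / 2.94:.1f} x single-threaded")
```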


Final notes:
  • I added -z0 to the wvunpack commands to avoid updating the console window title (helps a little)
  • I did not discover -hide_banner -loglevel error for FFmpeg until after these tests, so that disadvantages it a little
  • It is interesting that FFmpeg manages to implement decent multi-threaded FLAC decoding despite the frame length not being present in the header. How does it do that?

Quick Verify

As for the significantly faster performance of the WavPack quick-verify mode, your guess is probably right that it's because the checksum I use is very fast. It's far simpler than an MD5 or even a CRC, but it's not quite as simple (or as weak cryptographically) as a true checksum (there's an additional shift and add each byte). However, there is absolutely no support of multithreading with the quick verify, so those differences you show are suspect.
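
To illustrate the difference between a true checksum and one with an extra shift and add per byte, here is a sketch. It is not taken from the WavPack source; the update rule (csum = csum + (csum << 1) + byte), the 32-bit width and the start value are all assumptions chosen only to show the idea.

```python
MASK32 = 0xFFFFFFFF


def plain_sum(data: bytes) -> int:
    """A true checksum: just add the bytes, so their order does not matter."""
    total = 0
    for byte in data:
        total = (total + byte) & MASK32
    return total


def shift_add_sum(data: bytes) -> int:
    """Hypothetical variant: one extra shift and add per byte (csum * 3 + byte)."""
    csum = MASK32  # assumed start value
    for byte in data:
        csum = (csum + (csum << 1) + byte) & MASK32
    return csum


if __name__ == "__main__":
    a, b = b"ab", b"ba"  # same bytes, swapped
    print(plain_sum(a) == plain_sum(b))          # True: swap goes unnoticed
    print(shift_add_sum(a) == shift_add_sum(b))  # False: swap is detected
```

The extra shift and add makes the result depend on byte position, which a plain sum cannot do, while still being far cheaper than a CRC or MD5.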

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #8
It is interesting that FFmpeg manages to implement decent multi-threaded FLAC decoding despite the frame length not being present in the header. How does it do that?

ffmpeg has strictly separated decoding and demuxing. So for FLAC it looks for sync codes and does some short integrity checks as part of the demuxing. When decoding FLAC in ffmpeg, you'll see warnings every now and then because of that, when it stumbles upon something it thinks is a frame, but isn't. This has been the case for many years already, because of this strict separation.

Of course, with this mechanism in place, multithreading decoding is rather trivial.
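
The "look for sync codes" step can be sketched as follows. This is not ffmpeg's parser, only an illustration of why false hits occur: a FLAC frame header starts with the 14-bit sync code 11111111111110 followed by a reserved zero bit and the blocking-strategy bit, so the first two bytes are FF F8 or FF F9, and the same two bytes can turn up inside compressed audio by chance. The function name is made up.

```python
def candidate_frame_starts(data: bytes) -> list[int]:
    """Offsets where a FLAC frame sync pattern (FF F8 or FF F9) appears.

    Each hit is only a candidate: a real parser would go on to check the
    header CRC-8 and the frame CRC-16 before trusting it.
    """
    hits = []
    pos = data.find(b"\xff")
    while pos != -1 and pos + 1 < len(data):
        # Mask off the blocking-strategy bit, then compare with 0xF8.
        if (data[pos + 1] & 0xFE) == 0xF8:
            hits.append(pos)
        pos = data.find(b"\xff", pos + 1)
    return hits


if __name__ == "__main__":
    blob = b"\x00\xff\xf8\x12\xff\xf9\xff\x00\xff\xfa"
    print(candidate_frame_starts(blob))  # [1, 4]
```

Once candidate boundaries have been confirmed by the checksums, the frames between them are independent units that can be handed to separate threads.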
Music: sounds arranged such that they construct feelings.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #9
Hm, definitely some confusion on my part, as usual:
* I also found out that not only TAK, but also
wvunpack filename.wv -yo NUL writes to NUL.wav, and that seemingly takes more time than stdout redirected to NUL:
wvunpack filename.wv -yo - > NUL
* Weird about that fast-verification --threads, the numbers looked consistent enough to conclude, and I didn't think it would tax the CPU that much. Seven seconds in between a quarter of a second work?!

(Does Windows keep the executable in memory or something?)


Of course, with this mechanism in place, multithreading decoding is rather trivial.
So ... the obvious question is, any reason why not?
The odd event that "a valid frame header" shows up just by random in the data (the FLAC specification doesn't forbid junk between frames, as long as it is byte-aligned, and in any case parsing must take into account that a stream may be broken ...). Or even worse and more odd, an entire "valid frame" starting inside another?


Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #10
Because of the following:

you'll see warnings every now and then

It makes decoding much more complicated, less predictable and less stable. For ffmpeg it was necessary to fit its model, in which decoding and demuxing are completely separated.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #11
In other words, MP4/Matroska/Ogg/CAF is actually better for ffmpeg than the original FLAC container format?
Among these, the native FLAC container is the only one for which fb2k cannot do real-time bitrate display.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #12
In other words, MP4/Matroska/Ogg/CAF is actually better for ffmpeg than the original FLAC container format?
Yes, the inability to reliably skip ahead 1 frame without having to decode it is sometimes a disadvantage. For multithreading, being able to skip ahead is very valuable. However, relying solely on frame lengths is much less robust, and relying on both adds overhead of course. Maybe FLAC's design was focused a bit too much on reducing overhead.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #13
I tested FLAC in containers. Not CAF, I forgot about that one. With and without multithreading ffmpeg. This time I tried a shorter file - half an hour - because there were so many to run through.
With quite extreme settings, including blocksize 16 - that malice paid off ...
Turns out ffmpeg refused to remux the uncompressed flac streams into any of the three containers I tried.

Container overhead
* flac -5 is a sane setting, and the biggest overhead for that one was 0.44 percent (not percentage points) for OGG container
* Blocksize 16 is just nuts, but for what the file sizes are worth - .wav in the middle. No padding:
323 001 659 ¨3x.flac-8b16.flac
328 733 400 ¨3x.flac-8b16.flac.oga
331 702 604 ¨3x.wav
343 738 725 ¨3x.flac-8b16.flac.mp4
354 113 911 ¨3x.flac-8b16.flac.mka
That is a 9.6 percent penalty for putting it in Matroska. I used ffmpeg, with commands like ffmpeg -i ¨3x.flac-8b16.flac -acodec copy -vn -sn ¨3x.flac-8b16.flac.mka
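
The overhead percentages follow directly from the file sizes listed above. A short sketch of the arithmetic (the dictionary and function names are only for illustration):

```python
# File sizes in bytes for the blocksize-16 file, as listed in the post.
SIZES = {
    "flac": 323_001_659,
    "oga": 328_733_400,
    "mp4": 343_738_725,
    "mka": 354_113_911,
}


def overhead_percent(container_bytes: int, native_bytes: int) -> float:
    """Size penalty of a container relative to the native .flac file."""
    return 100.0 * (container_bytes - native_bytes) / native_bytes


if __name__ == "__main__":
    for ext in ("oga", "mp4", "mka"):
        pct = overhead_percent(SIZES[ext], SIZES["flac"])
        print(f".{ext}: {pct:.1f} percent")
    # .oga: 1.8 percent
    # .mp4: 6.4 percent
    # .mka: 9.6 percent
```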


For sorting I moved the ".oga" etc. to a separate column. ¨3x.flac-5.flac <tab> .oga means the file is an OGG containered ¨3x.flac-5.flac.oga .  (The reason for the "¨" is to make sure the test audio files had a character nothing else had.)
Threads  decoder   settings on encoding                         container  speed x realtime  comment
1        flac.exe  ¨3x.flac-0b65535--no-md5--uncompressed.flac  -          500
1        ffmpeg    ¨3x.flac-0b65535--no-md5--uncompressed.flac  -          8791              That is extreme!
7        ffmpeg    ¨3x.flac-0b65535--no-md5--uncompressed.flac  -          8685
1        flac.exe  ¨3x.flac-0b65535--no-md5.flac                -          527
1        ffmpeg    ¨3x.flac-0b65535--no-md5.flac                -          1474              about same for containers
7        ffmpeg    ¨3x.flac-0b65535--no-md5.flac                -          3544              slower than containers
7        ffmpeg    ¨3x.flac-0b65535--no-md5.flac                .oga       4919
7        ffmpeg    ¨3x.flac-0b65535--no-md5.flac                .mp4       6013              mp4 very fast
7        ffmpeg    ¨3x.flac-0b65535--no-md5.flac                .mka       5932
1        flac.exe  ¨3x.flac-0r0--no-md5.flac                    -          518
1        ffmpeg    ¨3x.flac-0r0--no-md5.flac                    -          1049              about same for containers
7        ffmpeg    ¨3x.flac-0r0--no-md5.flac                    -          1869              containers are only slightly faster.
7        ffmpeg    ¨3x.flac-0r0--no-md5.flac                    .oga       1879
7        ffmpeg    ¨3x.flac-0r0--no-md5.flac                    .mp4       1918
7        ffmpeg    ¨3x.flac-0r0--no-md5.flac                    .mka       1924              Not that much faster
1        flac.exe  ¨3x.flac-5.flac                              -          518
1        ffmpeg    ¨3x.flac-5.flac                              -          966               about same for containers
7        ffmpeg    ¨3x.flac-5.flac                              -          2981
7        ffmpeg    ¨3x.flac-5.flac                              .oga       3600              noticeably faster in all containers
7        ffmpeg    ¨3x.flac-5.flac                              .mp4       3827
7        ffmpeg    ¨3x.flac-5.flac                              .mka       3854
1        flac.exe  ¨3x.flac-8b16.flac                           -          247
1        ffmpeg    ¨3x.flac-8b16.flac                           -          80                about as slow for containers
7        ffmpeg    ¨3x.flac-8b16.flac                           -          31                Even slower! And about as slow for containers
1        ffmpeg    ¨3x.flac-8pr8--lax-l32.flac                  -          669               about the same for containers. Forgot to run flac.exe on this one.
7        ffmpeg    ¨3x.flac-8pr8--lax-l32.flac                  -          2493
7        ffmpeg    ¨3x.flac-8pr8--lax-l32.flac                  .oga       2599
7        ffmpeg    ¨3x.flac-8pr8--lax-l32.flac                  .mp4       2631
7        ffmpeg    ¨3x.flac-8pr8--lax-l32.flac                  .mka       2642
I am not sure how ffmpeg -threads 1 works; should I use "0" to get single-threaded? Because it does decode much quicker than reference flac. I also ran ffmpeg decoding without the -threads option, which uses all 8; that would improve the flac-in-other-containers slightly (but harm WavPack slightly; I leave that for a separate post).

So the table does not list speeds for ffmpeg without -threads, nor for the following:
* the same entire thing run on a USB3-connected spinning drive. Differences were very minor. These figures are on internal SSD.
* reference flac.exe decoding -8pr8 --lax -l32 because human error
* ogg/mp4/mkv decoded with ffmpeg -threads 1, those were pretty much the same as .flac speeds
* same for the -8b16 in containers, those were just as horrible as .flac
Yes blocksize 16 decodes slow, but ffmpeg just does it terribly.

All ffmpeg had a "-hide_banner -loglevel error" but I don't know if that matters when hyperfine doesn't display it.

I also went the other way with a bigger file, 219 minutes. No fancy table formatting here, I just paste hyperfine output. No containers tested that time. Fastest decoding took 1.054 seconds, and then:
Summary
  ffmpeg -i ¨219min.flac-0b65535--no-md5--uncompressed.flac -hide_banner -loglevel error -f wav -y NUL  ran
    1.02 ± 0.02 times faster than ffmpeg -threads 1 -i ¨219min.flac-0b65535--no-md5--uncompressed.flac -hide_banner -loglevel error -f wav -y NUL
    1.04 ± 0.02 times faster than ffmpeg -threads 7 -i ¨219min.flac-0b65535--no-md5--uncompressed.flac -hide_banner -loglevel error -f wav -y NUL
    2.80 ± 0.04 times faster than ffmpeg -threads 7 -i ¨219min.flac-0b65535--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
    2.83 ± 0.16 times faster than ffmpeg -i ¨219min.flac-0b65535--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
    4.07 ± 0.06 times faster than ffmpeg -i ¨219min.flac-7--lax-l32.flac -hide_banner -loglevel error -f wav -y NUL
    4.19 ± 0.08 times faster than ffmpeg -threads 7 -i ¨219min.flac-7--lax-l32.flac -hide_banner -loglevel error -f wav -y NUL
    6.09 ± 0.08 times faster than .\wvunpack --threads=7 ¨219min.wv.-fx0.wv -z0qyo -o - > NUL
    6.26 ± 0.09 times faster than ffmpeg -threads 7 -i ¨219min.flac-0r0--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
    6.33 ± 0.18 times faster than ffmpeg -i ¨219min.flac-0r0--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
    6.58 ± 0.15 times faster than ffmpeg -i ¨219min.wv.-fx0.wv -hide_banner -loglevel error -f wav -y NUL
    6.86 ± 0.12 times faster than ffmpeg -threads 7 -i ¨219min.wv.-fx0.wv -hide_banner -loglevel error -f wav -y NUL
    7.21 ± 0.09 times faster than .\wvunpack --threads=7 ¨219min.wv.-gx1--blocksize=4096.wv -z0qyo -o - > NUL
    7.42 ± 0.10 times faster than .\wvunpack --threads=7 ¨219min.wv.-gx1.wv -z0qyo -o - > NUL
    7.99 ± 0.11 times faster than ffmpeg -threads 1 -i ¨219min.flac-0b65535--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
    8.37 ± 0.25 times faster than ffmpeg -i ¨219min.wv.-gx1.wv -hide_banner -loglevel error -f wav -y NUL
    8.58 ± 0.15 times faster than .\wvunpack --threads=4 ¨219min.wv.-fx0.wv -z0qyo -o - > NUL
    8.71 ± 0.12 times faster than ffmpeg -threads 7 -i ¨219min.wv.-gx1.wv -hide_banner -loglevel error -f wav -y NUL
    8.78 ± 0.16 times faster than ffmpeg -i ¨219min.wv.-gx1--blocksize=4096.wv -hide_banner -loglevel error -f wav -y NUL
    9.34 ± 0.14 times faster than ffmpeg -threads 7 -i ¨219min.wv.-gx1--blocksize=4096.wv -hide_banner -loglevel error -f wav -y NUL
    9.93 ± 0.49 times faster than .\wvunpack --threads=4 ¨219min.wv.-gx1--blocksize=4096.wv -z0qyo -o - > NUL
   10.18 ± 0.13 times faster than .\wvunpack --threads=7 ¨219min.wv.-hx2.wv -z0qyo -o - > NUL
   10.46 ± 0.21 times faster than .\wvunpack --threads=4 ¨219min.wv.-gx1.wv -z0qyo -o - > NUL
   11.34 ± 0.15 times faster than ffmpeg -threads 1 -i ¨219min.flac-0r0--no-md5.flac -hide_banner -loglevel error -f wav -y NUL
   11.50 ± 0.16 times faster than ffmpeg -i ¨219min.wv.-hx2.wv -hide_banner -loglevel error -f wav -y NUL
   12.21 ± 0.16 times faster than ffmpeg -threads 7 -i ¨219min.wv.-hx2.wv -hide_banner -loglevel error -f wav -y NUL
   12.99 ± 0.17 times faster than .\wvunpack --threads=7 ¨219min.wv.-hhx3.wv -z0qyo -o - > NUL
   14.02 ± 0.19 times faster than .\wvunpack --threads=4 ¨219min.wv.-hx2.wv -z0qyo -o - > NUL
   14.68 ± 0.19 times faster than ffmpeg -i ¨219min.wv.-hhx4--blocksize=131072.wv -hide_banner -loglevel error -f wav -y NUL
   15.45 ± 0.21 times faster than ffmpeg -i ¨219min.wv.-hhx3.wv -hide_banner -loglevel error -f wav -y NUL
   15.53 ± 0.21 times faster than ffmpeg -threads 7 -i ¨219min.wv.-hhx4--blocksize=131072.wv -hide_banner -loglevel error -f wav -y NUL
   16.35 ± 0.22 times faster than ffmpeg -threads 7 -i ¨219min.wv.-hhx3.wv -hide_banner -loglevel error -f wav -y NUL
   16.53 ± 0.22 times faster than ffmpeg -threads 1 -i ¨219min.flac-7--lax-l32.flac -hide_banner -loglevel error -f wav -y NUL
   17.87 ± 0.24 times faster than .\wvunpack --threads=4 ¨219min.wv.-hhx3.wv -z0qyo -o - > NUL
   19.32 ± 0.27 times faster than .\wvunpack --threads=7 ¨219min.wv.-hhx4--blocksize=131072.wv -z0qyo -o - > NUL
   20.56 ± 0.27 times faster than .\wvunpack ¨219min.wv.-fx0.wv -z0qyo -o - > NUL
   23.03 ± 0.31 times faster than .\flac ¨219min.flac-0b65535--no-md5.flac -fo NUL
   23.53 ± 0.31 times faster than .\flac ¨219min.flac-0r0--no-md5.flac -fo NUL
   24.39 ± 0.54 times faster than .\flac ¨219min.flac-0b65535--no-md5--uncompressed.flac -fo NUL
   25.68 ± 0.34 times faster than .\flac ¨219min.flac-7--lax-l32.flac -fo NUL
   25.74 ± 0.34 times faster than .\wvunpack ¨219min.wv.-gx1.wv -z0qyo -o - > NUL
   25.98 ± 0.34 times faster than .\wvunpack ¨219min.wv.-gx1--blocksize=4096.wv -z0qyo -o - > NUL
   27.86 ± 0.37 times faster than ffmpeg -threads 1 -i ¨219min.wv.-fx0.wv -hide_banner -loglevel error -f wav -y NUL
   28.06 ± 0.37 times faster than .\wvunpack --threads=4 ¨219min.wv.-hhx4--blocksize=131072.wv -z0qyo -o - > NUL
   33.95 ± 0.45 times faster than .\wvunpack ¨219min.wv.-hx2.wv -z0qyo -o - > NUL
   36.93 ± 0.50 times faster than ffmpeg -threads 1 -i ¨219min.wv.-gx1.wv -hide_banner -loglevel error -f wav -y NUL
   40.31 ± 0.63 times faster than ffmpeg -threads 1 -i ¨219min.wv.-gx1--blocksize=4096.wv -hide_banner -loglevel error -f wav -y NUL
   42.66 ± 0.56 times faster than .\wvunpack ¨219min.wv.-hhx4--blocksize=131072.wv -z0qyo -o - > NUL
   43.94 ± 0.58 times faster than .\wvunpack ¨219min.wv.-hhx3.wv -z0qyo -o - > NUL
   49.46 ± 0.66 times faster than .\flac ¨219min.flac-8b16.flac -fo NUL
   52.08 ± 0.72 times faster than ffmpeg -threads 1 -i ¨219min.wv.-hx2.wv -hide_banner -loglevel error -f wav -y NUL
   65.43 ± 0.86 times faster than ffmpeg -threads 1 -i ¨219min.wv.-hhx4--blocksize=131072.wv -hide_banner -loglevel error -f wav -y NUL
   69.96 ± 1.06 times faster than ffmpeg -threads 1 -i ¨219min.wv.-hhx3.wv -hide_banner -loglevel error -f wav -y NUL
  177.26 ± 2.49 times faster than ffmpeg -threads 1 -i ¨219min.flac-8b16.flac -hide_banner -loglevel error -f wav -y NUL
  427.60 ± 5.75 times faster than ffmpeg -threads 7 -i ¨219min.flac-8b16.flac -hide_banner -loglevel error -f wav -y NUL
  430.33 ± 7.03 times faster than ffmpeg -i ¨219min.flac-8b16.flac -hide_banner -loglevel error -f wav -y NUL
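
For anyone who wants to compare two entries in the list directly rather than against the fastest run: each line says how many times faster the fastest command was than the named one, so dividing two factors gives the ratio of their wall times. A minimal sketch, assuming the list is a relative-speed summary in the "X ± Y times faster than CMD" form shown above; the function names `parse_summary` and `relative_time` are made up for illustration and are not part of any tool.

```python
import re

# One summary line looks like:
# "  177.26 ± 2.49 times faster than ffmpeg -threads 1 -i ..."
LINE_RE = re.compile(
    r"^\s*([\d.]+)\s*±\s*([\d.]+)\s+times faster than\s+(.*\S)\s*$"
)


def parse_summary(text):
    """Return a list of (factor, uncertainty, command) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((float(m.group(1)), float(m.group(2)), m.group(3)))
    return rows


def relative_time(rows, cmd_a, cmd_b):
    """How many times as long cmd_a took compared with cmd_b.

    Both factors are measured against the same fastest run,
    so their quotient is the ratio of the two wall times.
    """
    factors = {cmd: factor for factor, _, cmd in rows}
    return factors[cmd_a] / factors[cmd_b]


# Two lines copied from the list above.
SAMPLE = """\
  177.26 ± 2.49 times faster than ffmpeg -threads 1 -i ¨219min.flac-8b16.flac -hide_banner -loglevel error -f wav -y NUL
  427.60 ± 5.75 times faster than ffmpeg -threads 7 -i ¨219min.flac-8b16.flac -hide_banner -loglevel error -f wav -y NUL
"""

rows = parse_summary(SAMPLE)
single_cmd = rows[0][2]
multi_cmd = rows[1][2]
ratio = relative_time(rows, multi_cmd, single_cmd)
print(f"-threads 7 took {ratio:.2f}x as long as -threads 1")
```

The sketch ignores the ± columns when dividing; propagating the uncertainties would need the raw timings.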

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #14
I already mentioned that the current ffmpeg CLI utility is useless for extremely small packets. The FFmpeg developer who rewrote the ffmpeg.c-related code did not care, and still does not care, about this bug. So for small packets, use the library calls directly instead of the brain-dead ffmpeg.c implementation.

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #15
BLUNDER on me and on ffmpeg.

ffmpeg also errors out on ¨3x.flac-0b65535--no-md5--uncompressed.flac when decoding. Of course I should have checked that when it refused to demux.

It is not about the file using only verbatim subframes - more likely it is about the frames being too big.
The attached 1.3 second flac file - good old Merzbow at it again - has 57330 samples and was created with
-0r0 --no-padding -fb57330 --lax
So it is one frame, and both subframes are FIXED, order 1.
ffmpeg cannot decode it. Recompress it with a smaller block size, and it will - though 57300 is still too large.

Edit: reuploaded without artwork; that is not to blame - and it seems the "r0" is superfluous.
Fiddling around with files, I found out that padding-or-not can even influence the maximum block size ffmpeg accepts. I have a file where 53207 with default padding is OK, 53208 with default padding is not, and 53208 with --no-padding is OK.
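
To find out which files in a collection have block sizes in that range, the block size can be read straight from the STREAMINFO metadata block, which the FLAC format requires to come first after the "fLaC" marker. A minimal sketch, not a reproduction of whatever ffmpeg does internally: `read_streaminfo` and `build_streaminfo` are hypothetical helper names, and the 2-channel / 16-bit / 44.1 kHz parameters in the demo are assumptions (57330 samples in 1.3 seconds is consistent with 44.1 kHz, and "both subframes" implies two channels).

```python
import struct


def read_streaminfo(data):
    """Parse STREAMINFO from the first 42 bytes of a FLAC stream."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = data[4] & 0x7F          # low 7 bits: block type
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    body = data[8:8 + 34]
    min_block, max_block = struct.unpack(">HH", body[:4])
    # 64 bits: 20 sample rate, 3 channels-1, 5 bits-1, 36 total samples
    packed = int.from_bytes(body[10:18], "big")
    return {
        "min_block_size": min_block,
        "max_block_size": max_block,
        "min_frame_size": int.from_bytes(body[4:7], "big"),
        "max_frame_size": int.from_bytes(body[7:10], "big"),
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def build_streaminfo(block_size, sample_rate, channels, bits, total_samples):
    """Build a synthetic 42-byte FLAC header, for demonstration only."""
    packed = ((sample_rate << 44) | ((channels - 1) << 41)
              | ((bits - 1) << 36) | total_samples)
    body = (struct.pack(">HH", block_size, block_size)
            + bytes(6)                    # frame sizes: 0 means unknown
            + packed.to_bytes(8, "big")
            + bytes(16))                  # MD5: all zero means not stored
    # 0x80 = "last metadata block" flag set, block type 0 (STREAMINFO)
    return b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + body


# Assumed parameters matching the description of the attached file.
info = read_streaminfo(build_streaminfo(57330, 44100, 2, 16, 57330))
duration = info["total_samples"] / info["sample_rate"]
single_frame = info["total_samples"] <= info["max_block_size"]
print(info["max_block_size"], duration, single_frame)
```

On a real file, passing the first 42 bytes (`open(path, "rb").read(42)`) is enough. A file with something prepended before the "fLaC" marker, such as an ID3v2 tag, will be rejected by this sketch.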


@ktf, of course there is nothing wrong with the file? The blame is squarely on ffmpeg?
It makes decoding much more complicated, less predictable and less stable.
You might have had a point ...

 

Re: Tested: Lossless decoding speed, multithreaded - and fast verification

Reply #16
The FLAC format is brain-dead for stream-oriented use. It is the same as the historic Shorten format in that respect.