Topic: Lossless codec comparison update

Lossless codec comparison update

I've updated my lossless codec comparison. Not much has changed; I'll leave drawing conclusions to you  :)

Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #1
Nice work as always.
On an Intel this time. So, is it the CPU or is it 1.4.3 that makes -3 decode slower than -0 / -0 decode faster than -3?

You might have made a 1.3.4 vs 1.4.3 plot or two that you were too modest to tout?


Re: Lossless codec comparison update

Reply #3
On an Intel this time. So, is it the CPU or is it 1.4.3 that makes -3 decode slower than -0 / -0 decode faster than -3?
Compare with https://hydrogenaud.io/index.php/topic,122508.msg1024512.html#msg1024512, the i3-10100T. Not much difference in the grand scale of things, just a bit faster encoding and a bit slower decoding.

Quote
You might have made a 1.3.4 vs 1.4.3 plot or two that you were too modest to tout?
No, I haven't.

http://www.audiograaf.nl/losslesstest/revision%206/All%20CDDA%20sources.csv missing  ;)
Fixed.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #4
Compare with https://hydrogenaud.io/index.php/topic,122508.msg1024512.html#msg1024512
Ah, that chart, yes. We know that FLAC has improved, and it is possible to see that by putting them over one another - although the fine grain of it is drowned in the scale.

Although http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%206%20-%20main.html doesn't say so outright except for refalac64 (encoding - I bet decoding is the same ;-) ), I guess from Reply #34 in the other thread that you used 64-bit whenever available - even if that might be slower? For a second I asked myself what end-users would do ("x64 is newer, must be better" or "86>64, must be better then"?), but then it struck me: how many users are actively selecting a CLI themselves anyway?

Also, your figures confirm one thing about Monkey's: for CDDA the encoding and decoding CPU loads are now pretty much matching. It used to decode even slower than it encoded, and apparently it still does for higher sampling rates (where it also isn't very good, but that is a different topic).

Re: Lossless codec comparison update

Reply #5
Ah, that chart, yes. We know that FLAC has improved, and it is possible to see that by putting them over one another - although the fine grain of it is drowned in the scale.
Which of course puts things into perspective: the gains aren't that big, and FLAC is already very fast.

Although http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%206%20-%20main.html doesn't say so outright
I've updated the page to explicitly state that:
Quote
64-bit compiles have been used, except for TAK, Shorten, La and MP4ALS, where no 64-bit executable was available.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #6
Great, thanks ktf! Would it be possible for you to, in these two figures, let the horizontal axes both start at exactly 0.1%? That way FLAC's curve wouldn't be glued to the vertical axis so much, making it easier to read.

Chris
If I don't reply to your reply, it means I agree with you.

Re: Lossless codec comparison update

Reply #7
I have also struggled to read where the axis starts, yes.

And when I do, I get some surprises:
Is it really so that FLAC -0/-1/-2 encode faster than they decode on high resolution and multi-channel?

Re: Lossless codec comparison update

Reply #8
encode faster than they decode
Why ask when I can reproduce and confirm, ...
Actually, that is the case for TAK as well. It is visible in ktf's charts, and on this computer, on high resolution (96 and 192), TAK -p0, -p1, -p2 and -p3 all encode faster than they decode. Moreover, still on high resolution, -p0 encodes/decodes at FLAC -0 speeds, although that is thanks to the absence of MD5 calculations.

But it might be the case for TAK -p0 on CDDA too? @ktf , you got the numbers ... ?
I tested it on this computer, and the differences are very small - but then, this computer also has small speed differences between TAK -p0 and TAK -p1.
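For anyone who wants to reproduce this kind of encode-vs-decode timing at home, a minimal sketch of a wall-clock timing wrapper around the flac CLI is below. It is only an illustration: the file names are placeholders, it assumes flac is on the PATH, and it measures only the wall time of a single run, whereas the comparison reports both CPU and wall time.

Code:
/* Minimal wall-clock timing sketch for one encode and one decode run.
 * Illustration only: assumes `flac` is on the PATH and that in.wav exists;
 * file names and the preset are placeholders. A serious measurement would
 * repeat runs and also record CPU time. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double run_timed(const char *cmd)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (system(cmd) != 0)                        /* run the codec's CLI */
        fprintf(stderr, "command failed: %s\n", cmd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    double enc = run_timed("flac -0 -f -s -o out.flac in.wav");  /* encode at -0 */
    double dec = run_timed("flac -d -f -s -o out.wav out.flac"); /* decode */
    printf("encode: %.3f s, decode: %.3f s\n", enc, dec);
    return 0;
}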

Re: Lossless codec comparison update

Reply #9
encode faster than they decode
Why ask when I can reproduce and confirm, ...
Actually, that is the case for TAK as well. It is visible in ktf's charts, and on this computer, on high resolution (96 and 192), TAK -p0, -p1, -p2 and -p3 all encode faster than they decode. Moreover, still on high resolution, -p0 encodes/decodes at FLAC -0 speeds, although that is thanks to the absence of MD5 calculations.
Isn't really surprising to me. On encoding, all data is available, and that means you can either use SIMD or use the superscalar capabilities better. At least with FLAC, quite a lot of operations are independent. On decoding, you really need to do things sequentially, because you don't yet have all the data.

On a really fast FLAC preset, there is almost no brute-forcing going on, which means the advantage mentioned above is visible. With the slower presets, there is some brute-forcing, meaning some calculations end up unused because a different route turned out to give better compression.

Combine that with the fact that most of FLAC's encoding optimizations have targeted the slowest presets (i.e. the brute-forcing) while decoding optimizations have been much more general (i.e. parsing), and you can see why the fast presets ended up like this.
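As a rough illustration of that independence (just a sketch of a FLAC-style order-2 fixed predictor, not actual libFLAC code): on encoding, every residual is computed from input samples that are already known, so the loop iterations are independent; on decoding, each sample needs the two samples reconstructed just before it, a loop-carried dependency the CPU can't reorder around.

Code:
/* Sketch of a FLAC-style order-2 fixed predictor - not actual libFLAC code.
 * The first two (warm-up) samples are assumed to be stored verbatim. */
#include <stdint.h>

/* Encoding: inputs x[] are all known up front, so iterations are independent
 * and can be done with SIMD or overlapped by an out-of-order CPU. */
void encode_order2(const int32_t *x, int32_t *res, int n)
{
    for (int i = 2; i < n; i++)
        res[i] = x[i] - 2 * x[i - 1] + x[i - 2];
}

/* Decoding: each output sample needs the two just-reconstructed samples,
 * a loop-carried dependency that forces mostly serial execution. */
void decode_order2(const int32_t *res, int32_t *x, int n)
{
    for (int i = 2; i < n; i++)
        x[i] = res[i] + 2 * x[i - 1] - x[i - 2];
}

A compiler can auto-vectorise the first loop but not the second, which is the same asymmetry at toy scale.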

Quote
But it might be the case for TAK -p0 on CDDA too? @ktf , you got the numbers ... ?
I tested it on this computer, and differences are very small - but then, this computer has also small speed differences between TAK -p0 and TAK -p1.
I didn't. I just hacked my script to give me the raw data. Just the main graphs, not the 96kHz and 5.1 graphs.

Columns are: codec, preset, encoding CPU speed, encoding speed (wall time), decoding CPU speed, decoding speed (wall time), compression
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #10
On encoding, all data is available, and that means you can either use SIMD or use the superscalar capabilities better. At least with FLAC, quite a lot of operations are independent. On decoding, you really need to do things sequentially, because you don't yet have all the data.
Is that because, say, a 4096-sample unencoded block has a known length, while the encoded blocks/subframes don't, at least not in FLAC?

On a really fast FLAC preset, there is almost no brute-forcing going on
Ah, and that (at least in part) also explains why multichannel -5 speed is pretty much tied between encoding and decoding, because FLAC is bound to dual mono there?
Anyway, both -0 and -1 do encode faster than they decode. For high resolution, -2 does so too, though narrowly - that is, with the full channel decorrelation strategy. For multi-channel, -3 does, and for -5 the wall-clock and CPU times fall on opposite sides.

TAK -p0 also encodes faster than it decodes for CDDA, and from the plots also for 5.1, and maybe / maybe not for high resolution.


(Of course, for the slower codecs, "decodes slower than it encodes" doesn't impress as much.)

Re: Lossless codec comparison update

Reply #11
Is that because, say, a 4096-sample unencoded block has a known length, while the encoded blocks/subframes don't, at least not in FLAC?
No. It is due to all data being available. When encoding, all sample values are known and have to be encoded; a CPU can already start transforming the next value if processing of one value stalls. When decoding, all sample values are unknown and have to be decoded, and one really needs to finish with a value before starting on the next.

Superscalar, out-of-order processing doesn't work like multithreading on a frame-by-frame basis; instead it essentially mixes instructions. See Superscalar processor and Out-of-order execution. So it only saves nanoseconds instead of milliseconds each time it kicks in, but there are many, many more opportunities for it to do so.

On the very fast presets, the most time-consuming parts of encoding and decoding are MD5 summing and bitreading/bitwriting. MD5 summing is infamous for its very long dependency chains (so a CPU can't do any reordering). Bitwriting (on encoding) provides more opportunities for superscalar performance than bitreading (on decoding) because there are fewer dependencies between instructions.
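To make the dependency-chain point a bit more concrete, here is a toy comparison (not the real MD5 round function, and the names are made up): a running state where every step needs the previous result, next to a checksum split over independent accumulators. The first loop is one long dependency chain, like MD5's chained rounds, so out-of-order hardware gains nothing; the second gives it independent work to overlap.

Code:
/* Toy illustration of dependency chains - not the real MD5 round function.
 * chained(): every iteration needs the previous state, one long serial chain,
 * which is how MD5 chains its rounds.
 * split(): four independent accumulators give the CPU independent instructions
 * to overlap (the tail is ignored for brevity). */
#include <stdint.h>
#include <stddef.h>

uint32_t chained(const uint32_t *buf, size_t n)
{
    uint32_t state = 0;
    for (size_t i = 0; i < n; i++)
        state = (state << 5 | state >> 27) + buf[i];  /* depends on last state */
    return state;
}

uint32_t split(const uint32_t *buf, size_t n)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
    for (size_t i = 0; i + 3 < n; i += 4) {
        a += buf[i];     b += buf[i + 1];
        c += buf[i + 2]; d += buf[i + 3];
    }
    return a + b + c + d;
}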
Music: sounds arranged such that they construct feelings.