Topic: Lossless codec comparison update

Lossless codec comparison update

I've updated my lossless codec comparison. Not much has changed; I'll leave drawing conclusions to you  :)

Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #1
Nice work as always.
On an Intel this time. So, is it the CPU or is it 1.4.3 that makes -3 decode slower than -0 / -0 decode faster than -3?

You might have made a 1.3.4 vs 1.4.3 plot or two that you were too modest to tout?


Re: Lossless codec comparison update

Reply #3
On an Intel this time. So, is it the CPU or is it 1.4.3 that makes -3 decode slower than -0 / -0 decode faster than -3?
Compare with https://hydrogenaud.io/index.php/topic,122508.msg1024512.html#msg1024512, the i3-10100T. Not much difference in the grand scale of things, just a bit faster encoding and a bit slower decoding.

Quote
You might have made a 1.3.4 vs 1.4.3 plot or two that you were too modest to tout?
No, I haven't.

http://www.audiograaf.nl/losslesstest/revision%206/All%20CDDA%20sources.csv missing  ;)
Fixed.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #4
Compare with https://hydrogenaud.io/index.php/topic,122508.msg1024512.html#msg1024512
Ah, that chart, yes. We know that FLAC has improved, and it is possible to see that by putting them over one another - although the fine grain of it is drowned in the scale.

Although http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%206%20-%20main.html doesn't say so outright except for refalac64 (encoding - I bet decoding is the same ;-) ), I guess from Reply #34 in the other thread that you used 64-bit whenever available - even if that might be slower? For a second I asked myself what end-users would do ("x64 is newer, must be better" or "86>64, must be better then"?), but then it struck me: how many users are actively selecting a CLI themselves anyway?

Also, your figures confirm one thing about Monkey's: for CDDA the encoding and decoding CPU loads are now pretty much matching. It used to decode even slower than it encoded, and apparently it still does for higher sampling rates (where it also isn't very good, but that is a different topic).

Re: Lossless codec comparison update

Reply #5
Ah, that chart, yes. We know that FLAC has improved, and it is possible to see that by putting them over one another - although the fine grain of it is drowned in the scale.
Which of course puts things into perspective: the gains aren't that big, and FLAC is already very fast.

Although http://www.audiograaf.nl/losslesstest/Lossless%20audio%20codec%20comparison%20-%20revision%206%20-%20main.html doesn't say so outright
I've updated the page to explicitly state that:
Quote
64-bit compiles have been used, except for TAK, Shorten, La and MP4ALS, where no 64-bit executable was available.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #6
Great, thanks ktf! Would it be possible for you to, in these two figures, let the horizontal axes both start at exactly 0.1%? That way FLAC's curve wouldn't be glued to the vertical axis so much, making it easier to read.

Chris
If I don't reply to your reply, it means I agree with you.

Re: Lossless codec comparison update

Reply #7
I have also struggled to read where the axis starts, yes.

And when I do, I get some surprises:
Is it really so that FLAC -0/-1/-2 encode faster than they decode on high resolution and multi-channel?

Re: Lossless codec comparison update

Reply #8
encode faster than they decode
Why ask when I can reproduce and confirm, ...
Actually, that is the case for TAK as well. It is visible in ktf's charts, and on this computer, on high resolution (96 and 192), TAK -p0, -p1, -p2 and -p3 all encode faster than they decode. Moreover, still on high resolution, -p0 encodes/decodes at FLAC -0 speeds, although that is thanks to the absence of MD5 calculations.

But it might be the case for TAK -p0 on CDDA too? @ktf , you got the numbers ... ?
I tested it on this computer, and the differences are very small - but then, this computer also has small speed differences between TAK -p0 and TAK -p1.
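For anyone who wants to reproduce this kind of encode-vs-decode timing at home, a minimal sketch of a wall-clock timing wrapper around the flac CLI is below. It is only an illustration: the file names are placeholders, it assumes flac is on the PATH, and it measures only the wall time of a single run, whereas the comparison reports both CPU and wall time.

Code:
/* Minimal wall-clock timing sketch for one encode and one decode run.
 * Illustration only: assumes `flac` is on the PATH and that in.wav exists;
 * file names and the preset are placeholders. A serious measurement would
 * repeat runs and also record CPU time. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double run_timed(const char *cmd)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (system(cmd) != 0)                        /* run the codec's CLI */
        fprintf(stderr, "command failed: %s\n", cmd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    double enc = run_timed("flac -0 -f -s -o out.flac in.wav");  /* encode at -0 */
    double dec = run_timed("flac -d -f -s -o out.wav out.flac"); /* decode */
    printf("encode: %.3f s, decode: %.3f s\n", enc, dec);
    return 0;
}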

Re: Lossless codec comparison update

Reply #9
encode faster than they decode
Why ask when I can reproduce and confirm, ...
Actually, that is the case for TAK as well. It is visible in ktf's charts, and on this computer, on high resolution (96 and 192), TAK -p0, -p1, -p2 and -p3 all encode faster than they decode. Moreover, still on high resolution, -p0 encodes/decodes at FLAC -0 speeds, although that is thanks to the absence of MD5 calculations.
Isn't really surprising to me. On encoding, all data is available, and that means you can either use SIMD or use the superscalar capabilities better. At least with FLAC, quite a lot of operations are independent. On decoding, you really need to do things sequentially, because you don't yet have all the data.

On a really fast FLAC preset, there is almost no brute-forcing going on, which means the advantage mentioned above is visible. With the slower presets, there is some brute-forcing, meaning some calculations end up unused because a different route turned out to give better compression.

Combine that with the fact that most of FLAC's encoding optimizations have targeted the slowest presets (i.e. the brute-forcing) while decoding optimizations have been much more general (i.e. parsing), and you can see why the fast presets ended up like this.
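As a rough illustration of that independence (just a sketch of a FLAC-style order-2 fixed predictor, not actual libFLAC code): on encoding, every residual is computed from input samples that are already known, so the loop iterations are independent; on decoding, each sample needs the two samples reconstructed just before it, a loop-carried dependency the CPU can't reorder around.

Code:
/* Sketch of a FLAC-style order-2 fixed predictor - not actual libFLAC code.
 * The first two (warm-up) samples are assumed to be stored verbatim. */
#include <stdint.h>

/* Encoding: inputs x[] are all known up front, so iterations are independent
 * and can be done with SIMD or overlapped by an out-of-order CPU. */
void encode_order2(const int32_t *x, int32_t *res, int n)
{
    for (int i = 2; i < n; i++)
        res[i] = x[i] - 2 * x[i - 1] + x[i - 2];
}

/* Decoding: each output sample needs the two just-reconstructed samples,
 * a loop-carried dependency that forces mostly serial execution. */
void decode_order2(const int32_t *res, int32_t *x, int n)
{
    for (int i = 2; i < n; i++)
        x[i] = res[i] + 2 * x[i - 1] - x[i - 2];
}

A compiler can auto-vectorise the first loop but not the second, which is the same asymmetry at toy scale.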

Quote
But it might be the case for TAK -p0 on CDDA too? @ktf , you got the numbers ... ?
I tested it on this computer, and differences are very small - but then, this computer has also small speed differences between TAK -p0 and TAK -p1.
I didn't. I just hacked my script to give me the raw data. Just the main graphs, not the 96kHz and 5.1 graphs.

Columns are: codec, preset, encoding CPU speed, encoding speed (wall time), decoding CPU speed, decoding speed (wall time), compression
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison update

Reply #10
On encoding, all data is available, and that means you can either use SIMD or use the superscalar capabilities better. At least with FLAC, quite a lot of operations are independent. On decoding, you really need to do things sequentially, because you don't yet have all the data.
Is that because, say, a 4096-sample unencoded block has a known length, while the encoded blocks/subframes don't, at least not in FLAC?

On a really fast FLAC preset, there is almost no brute-forcing going on
Ah, and that (at least in part) also explains why multichannel -5 speed is pretty much tied between encoding and decoding, because FLAC is bound to dual mono there?
Anyway, both -0 and -1 do encode faster than they decode. For high resolution, -2 does so too, though narrowly - that is, with the full channel decorrelation strategy. For multi-channel, -3 does, and for -5 the wall-clock and CPU times fall on opposite sides.

TAK -p0 also encodes faster than it decodes for CDDA, and from the plots also for 5.1, and maybe / maybe not for high resolution.


(Of course, for the slower codecs, "decodes slower than it encodes" doesn't impress as much.)

Re: Lossless codec comparison update

Reply #11
Is that because, say, a 4096-sample unencoded block has a known length, while the encoded blocks/subframes don't, at least not in FLAC?
No. It is due to all data being available. When encoding, all sample values are known and have to be encoded; a CPU can already start transforming the next value if processing of one value stalls. When decoding, all sample values are unknown and have to be decoded, and one really needs to finish with a value before starting on the next.

Superscalar, out-of-order processing doesn't work like multithreading on a frame-by-frame basis; instead it essentially mixes instructions. See Superscalar processor and Out-of-order execution. So it only saves nanoseconds instead of milliseconds each time it kicks in, but there are many, many more opportunities for it to do so.

On the very fast presets, the most time-consuming parts of encoding and decoding are MD5 summing and bitreading/bitwriting. MD5 summing is infamous for its very long dependency chains (so a CPU can't do any reordering). Bitwriting (on encoding) provides more opportunities for superscalar performance than bitreading (on decoding) because there are fewer dependencies between instructions.
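To make the dependency-chain point a bit more concrete, here is a toy comparison (not the real MD5 round function, and the names are made up): a running state where every step needs the previous result, next to a checksum split over independent accumulators. The first loop is one long dependency chain, like MD5's chained rounds, so out-of-order hardware gains nothing; the second gives it independent work to overlap.

Code:
/* Toy illustration of dependency chains - not the real MD5 round function.
 * chained(): every iteration needs the previous state, one long serial chain,
 * which is how MD5 chains its rounds.
 * split(): four independent accumulators give the CPU independent instructions
 * to overlap (the tail is ignored for brevity). */
#include <stdint.h>
#include <stddef.h>

uint32_t chained(const uint32_t *buf, size_t n)
{
    uint32_t state = 0;
    for (size_t i = 0; i < n; i++)
        state = (state << 5 | state >> 27) + buf[i];  /* depends on last state */
    return state;
}

uint32_t split(const uint32_t *buf, size_t n)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
    for (size_t i = 0; i + 3 < n; i += 4) {
        a += buf[i];     b += buf[i + 1];
        c += buf[i + 2]; d += buf[i + 3];
    }
    return a + b + c + d;
}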
Music: sounds arranged such that they construct feelings.