Lossless codec comparison - part 3: CDDA (May '22)



Lossless codec comparison - part 3: CDDA (May '22)

2022-05-26 21:12:46

Following the comparisons of lossless audio codec performance on multichannel and hi-res material, here are the results for CDDA material, obtained with the same approach.

Report can be found here: audiograaf.nl/losslesstest/Lossless audio codec comparison - revision 5 - cdda.html

The main result of the test, the average over all 64 sources, is inserted below:

For more results and an analysis/discussion refer to the report linked above.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #1 – 2022-05-26 23:15:42

Great job as always.

Since I'm too lazy to aggregate the numbers: does TAK -p4m compress only nearly as well as Monkey's Insane?
TAK is an awesome performer, of course. Also offering the fastest encoder, with -p0, but we knew that.
There are 65 codec settings tested, 50 of them other than TAK, and among those 50, most are dominated by some TAK setting on all three parameters simultaneously. Well, that "most" could be due to the high number of ALS settings tried; it seems only -7 beats TAK at compression (there is -7 -p too, which is even slower).

A few other observations, starting at the small file size end:

* LA vs OptimFROG: The frog had a new release between your Revision 4 and this, and now it beats LA soundly - but it is hard to know whether it is due to material?
Anyway, LA still holds the ground on denser material it seems? Epica, In Flames, Iron Maiden, Muse, Status Quo, Prodigy (but not Nightwish. And the Merzbow might be as compressed as reasonably goes.)
But then the FreeSound Ambient source and Dataplex - and the Pokémon. Cannot catch'em all. Or maybe could, had one put enough effort into the development. But LA was ... done and abandoned.
And "Diffuse soundfields" - now that's Frog-Fressen!

* MPEG-4 ALS -7 is like OptimFROG --preset 4 except ten times as slow encoding. And if that isn't enough: -7 isn't even the slowest ALS setting, try -7 -l -p. I don't think -l matters much for -7, but -p can slow down.

* Why Monkey's has to decode slower than it encodes ...? (TTA does that too, and refalac "fast")

* In line with the previously posted results, WavPack -x4h and -x4hh decode faster (if only slightly so) than -h and -hh. Less data to wvunpack?
WavPack's performance doesn't look exceptional at first glance, but the only thing that "beats it at everything" is TAK. (And most of these encoders-settings runs are beaten at everything by some TAK.)

* I get the feeling that TTA's purpose in life is to serve as "the most average everything" in such comparisons ... even though it has the rare feature of decoding slower than it encodes, it still finds itself with a lot of dots both to the left and right and lower and higher.

* FLAC -3 is the fastest decoder, which is a bit strange. Wonder if it is the -l or the -M that makes -4 slower.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #2 – 2022-05-27 02:40:45

Consideration should also be made for which of these are freely available to compile yourself on any platform you should need to use them on. Also which may or may not have optimizations for non-x86 platforms.

I wouldn't mind trying to do a rundown of whichever of these I can get my hands on and force to work on an AArch64 machine. TAK at least *decodes* here, thanks to FFmpeg's implementation of a decoder.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #3 – 2022-05-27 06:12:23

Yes, definitely, but I consider that out-of-scope for this comparison. That is the domain of this Hydrogenaud.io wiki entry, of which I will update the table soon with the results of this test.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #4 – 2022-05-27 10:06:23

There are of course tons of other considerations that could be made.
Including some that are related to speed. Time until a file is actually written? That would depend on implementation and platform. And if you use a NAS with a spinning drive, don't expect to experience anything close to FLAC's decoding speed in practice.

And time until your encode is verified with tag transfer and everything? Sure these considerations are relevant to those who care about performance, but now we are way out into YMMV-land. ktf has stuck to codec CPU load and audio size, not tag chunks and file handling, and that is just fine - and of course it isn't the whole truth and never pretended to be.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #5 – 2022-05-27 17:20:23

To answer the question I was wondering about, the "% of 1 core" refers to an A4-5000 which is a low-end laptop chip from 2013, apparently.

EDIT: And appears to be vaguely around the performance of a Raspberry Pi 4 at 1.5GHz, if it matters

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #6 – 2022-05-28 01:40:00

And, if it matters, the Apple M1 family soundly trounces the Pi 4, and is more likely to be used as a regular desktop or even workstation than a Pi 4. Well, maybe.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #7 – 2022-05-28 07:14:27

First a couple minor editing suggestions:

Quote

Not all codecs appear in all results, for example Shorten and La only support 16 bit per sample sources and WMA does not support samplerates above 96kHz.

Seems to be copy-pasted from hi-res comparison and doesn't apply to CDDA.

Quote

The WAV file is encoded by the chosen codec provided with the required settings. The amount of CPU time required to do this conversion is measured

The encoded file is decoded by the chosen codec. The amount of CPU time required to do this conversion is measured and the resulting filesize is recorded

Shouldn't the resulting filesize be recorded after encoding, rather than decoding?

I also have a question about the CSV file. Fields 1-3 and 8 are obvious. Fields 4-5 seem to represent encoding speed (i.e. reciprocal of cpu-usage) and fields 6-7 seem to represent decoding speed. But what is the significance of each pair? Are they just two separate runs? Does one give average speed and the other the reciprocal of average cpu-usage? Or something else?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #8 – 2022-05-28 08:40:21

Quote from: rompel on 2022-05-28 07:14:27

[...]
Seems to be copy-pasted from hi-res comparison and doesn't apply to CDDA.
[...]
Shouldn't the resulting filesize be recorded after encoding, rather than decoding?

Thanks, fixed.

Quote from: rompel on 2022-05-28 07:14:27

I also have a question about the CSV file. Fields 1-3 and 8 are obvious. Fields 4-5 seem to represent encoding speed (i.e. reciprocal of cpu-usage) and fields 6-7 seem to represent decoding speed. But what is the significance of each pair? Are they just two separate runs? Does one give average speed and the other the reciprocal of average cpu-usage? Or something else?

You are right, I could have made that clearer. The software used for timing returns two values: CPU time and real time (wall clock time). These always differ by a little. I'm not sure how it is possible that the wall clock time is sometimes shorter than the CPU time; I guess it is caused by some inaccuracy of the timing method.

For the graphs I've only used the first of each pair, which is CPU-time, which I deemed more accurate. I left the PC idle except for running the tests, no network connections were active, anti-virus was shut down and only 1 of the 4 cores was used, so the values shouldn't differ much.

edit: I say CPU-time and wall time, but it's actually audio length divided by either CPU-time or wall-time. In previous comparisons I used to express speed as 'times realtime', and this is still how the script works. Only during graphing is this inverted to 'percentage of CPU used'.
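To make the relation between the two units concrete, here is a minimal sketch of the conversion described above. The function names are made up for illustration and are not taken from the actual test script.

```python
def times_realtime(audio_seconds: float, cpu_seconds: float) -> float:
    """Speed as a multiple of realtime: audio length divided by CPU time."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return audio_seconds / cpu_seconds


def percent_of_one_core(speed_x_realtime: float) -> float:
    """Invert 'times realtime' into 'percentage of one core used'."""
    if speed_x_realtime <= 0:
        raise ValueError("speed_x_realtime must be positive")
    return 100.0 / speed_x_realtime


if __name__ == "__main__":
    # Hypothetical example: a 4-minute track that took 1.2 s of CPU time.
    speed = times_realtime(audio_seconds=240.0, cpu_seconds=1.2)
    print(f"{speed:.1f}x realtime = {percent_of_one_core(speed):.2f}% of one core")
```

A codec running at 100x realtime thus shows up as 1% of one core in the graphs, and anything slower than realtime would exceed 100%.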

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #9 – 2022-05-28 10:52:23

Quote from: ktf on 2022-05-28 08:40:21

You are right, I could have made that clearer. The software used for timing returns two values: CPU time and real time (wall clock time). These always differ by a little. I'm not sure how it is possible that the wall clock time is sometimes shorter than the CPU time; I guess it is caused by some inaccuracy of the timing method.

Thanks, that makes sense.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #10 – 2022-05-28 15:05:24

Correction to my eyeballing:

TAK -p4m outcompresses Monkey's Insane by 108 parts per million. That is such a small margin that, had it been the other way around, it might or might not have been tilted by the already-published TAK 2.3.2.

Since I was too curious (and at the risk of posting wrong numbers - @ktf, are they correct?), here are the compression percentages around where we enter the LAnd of FROG:
50.39 OFR --preset 1
49.72 Monkey's Extra High
49.68 TAK -p4
49.66 OFR --preset 2 <-- default!
49.64 TAK -p4e
49.59 Monkey's Insane
49.58 TAK -p4m <-- 108 parts per million below Monkey's Insane
49.39 MPEG-4 ALS -7 <-- making ALS the highest-compressing ffmpeg-decodable codec
48.58 OFR --preset 3 <-- this isn't in line with eyeballing the plot vs ALS -7. Triangles vs circles seem to make for an optical illusion here, or did I get the numbers wrong?

The corpus this time is quite different, obviously. I'd be surprised if a "popularity-weighted" compression ratio is 0.5.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #11 – 2022-05-28 15:06:13

Quote from: Porcus on 2022-05-26 23:15:42

Also offering the fastest encoder, with -p0, but we knew that.

Of note when comparing FLAC and TAK: in this comparison FLAC does MD5 summing (as it does this by default) and TAK does not (as it does by default), if I'm not mistaken. I didn't check, so if default behaviour of TAK recently changed, this is untrue. Calculating an MD5sum at these speeds is about 30% of the work, so it makes quite a difference.

Quote

* Why Monkey's has to decode slower than it encodes ...? (TTA does that too, and refalac "fast")

I am unfamiliar with the inner workings of these codecs, but I would conjecture that it has to do with the bitreader being dependent on the rest of the decoder.

In FLAC for example, the residual can be read without knowing anything about the predictor except its order. So, most FLAC decoders first read an entire subframe before actually calculating any sample values.

If this is not possible, because the way the residual is read depends on something else, the decoder has to switch back and forth between reading bits and calculating sample values. These context switches can be rather expensive, especially when they mess with the CPU's branch prediction.

The encoder does not have this problem, because it already knows all sample values so it can defer writing bits until it is done creating a frame. I read something about Monkey's Audio having an adaptive predictor, so this explanation would make some sense at least.

TTA also has an adaptive predictor, I don't know about ALAC.
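A minimal sketch of the second phase of the two-phase decode described above, assuming the residual has already been read in full. It uses FLAC's order-2 fixed predictor, 2*s[n-1] - s[n-2], purely as a small example. The bit reader itself is not shown (the `residual` list stands in for its output), and this is not code from any actual FLAC decoder.

```python
def fixed_order2_residual(samples):
    """Encoder side: residual of the order-2 fixed predictor 2*s[n-1] - s[n-2]."""
    return [samples[n] - (2 * samples[n - 1] - samples[n - 2])
            for n in range(2, len(samples))]


def restore_from_residual(warmup, residual):
    """Decoder side, phase 2: the whole residual is already parsed,
    so this loop only does arithmetic and never touches the bitstream."""
    if len(warmup) != 2:
        raise ValueError("an order-2 predictor needs exactly 2 warm-up samples")
    samples = list(warmup)
    for r in residual:
        samples.append(2 * samples[-1] - samples[-2] + r)
    return samples


if __name__ == "__main__":
    pcm = [0, 3, 7, 12, 16, 18, 17, 13]       # made-up sample values
    res = fixed_order2_residual(pcm)
    assert restore_from_residual(pcm[:2], res) == pcm
    print(res)
```

With an adaptive predictor the coefficients would change after every sample, but that alone does not force interleaving. The conjecture above is about formats where parsing the next residual value itself depends on previously decoded state.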

Quote

* In line with the previously posted results, WavPack -x4h and -x4hh decode faster (if only slightly so) than -h and -hh. Less data to wvunpack?

Considering WavPack's -x4 modes decompressing faster, @bryant was surprised to see that too. Because of that, I won't dare give a hypothesis.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #12 – 2022-05-28 21:41:06

Quote from: ktf on 2022-05-28 15:06:13

Quote from: Porcus on 2022-05-26 23:15:42
Also offering the fastest encoder, with -p0, but we knew that.
Of note when comparing FLAC and TAK: in this comparison FLAC does MD5 summing (as it does this by default) and TAK does not (as it does by default), if I'm not mistaken. I didn't check, so if default behaviour of TAK recently changed, this is untrue. Calculating an MD5sum at these speeds is about 30% of the work, so it makes quite a difference.

You are right, TAK does not by default.

BTW: I am so grateful for your work! It is invaluable to have a trustworthy external source to check my own findings against (there is a danger of becoming blinded by routine). And it is often inspirational. The same is true for member Porcus. Sorry if I don't reply more often. My poor English skills are a severe impediment. I could have written this text much better in German in 1/10 of the time.

Regarding the speed, I would like to make one remark:

If the CPU of the test system really is an A4-5000: the most time-consuming function in the encoder performs lots of SSE2/SSSE3 multiplications. This particular CPU can start one every 2 clock cycles. But in between I am using the instruction palignr, which usually is very fast but takes 19 clock cycles to execute on this CPU. And since the next multiplication has to wait for the result of palignr, you get only a fraction of the usual speed, about 1/3 to 1/2 I would estimate. This applies to this function only; overall the effect is considerably smaller. The decoder is not affected.

On the other hand: maybe other codecs are similarly affected. Who knows. But the risk of distorted speed results seems to be bigger if a non-mainstream or outdated CPU is used.

I wouldn't have mentioned it, but if we talk about speed variations of 30%, this possibly should be taken into account. The A4-5000 is one of the 2 CPUs I know of where TAK's optimizations don't work well. The other is the Pentium 4. TAK 2.3.3 (which will be released soon) will also drop some optimizations for the good old Core 2 Duo line. More precisely: it will do some things in the way more recent CPUs prefer. I usually take mainstream CPUs of the past 10 years into account.

Interpretation of the results

My quick analysis is based upon looking at the diagrams, not at the data. I am particularly interested in the performance of Mp4Als and OptimFROG.

Mp4Als, because it is so similar to TAK, but can use up to 1024 predictors (if I remember this right) compared to TAK's 160. I found 16 files where Mp4Als is at least 0.5 percent stronger than TAK. In any case there is also a clear advantage of TAK's preset 4 over 3, which use 160 vs. 80 predictors. IMHO this is strong evidence that the advantage of Mp4Als is based upon the higher predictor count.

This means that TAK should easily (in terms of programming effort) achieve similar results by simply increasing the predictor count (I tried up to 512 some time ago). But this would definitely make encoding and decoding considerably slower. And I suppose that this file set is unusually friendly to high predictor orders. Maybe I will try what can be achieved with a moderately higher predictor count (256, 320 or 384).

OptimFROG obviously is the measure of all things (compression-wise). I am long used to seeing it perform 1.5 to 2 percent better than TAK. I see two primary reasons for its better performance:

1. It uses a different method (than TAK, FLAC and Mp4Als) to calculate the predictor coefficients, which is often advantageous.

2. It seems to take advantage of 'holes' in the frequency distribution of sample values, caused by amplification by a constant factor or non-uniform quantization (12 bit DAT as source, e.g.). This can have huge effects. I have seen savings of more than 10 percent!
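To illustrate what such 'holes' look like, here is a small sketch with a toy signal. It only shows how amplification by a constant factor leaves sample values unused; it says nothing about how OptimFROG actually detects or exploits them, and the function name is made up.

```python
def used_value_fraction(samples):
    """Fraction of the integer values between min and max that actually occur."""
    lo, hi = min(samples), max(samples)
    return len(set(samples)) / (hi - lo + 1)


# Toy signal that uses every integer value from -1000 to 1000 once.
original = list(range(-1000, 1001))

# The same signal amplified by a constant factor of 1.5 and rounded back
# to integers: the values are spread over a wider range, leaving gaps.
amplified = [round(s * 1.5) for s in original]

if __name__ == "__main__":
    print(f"original:  {used_value_fraction(original):.3f} of values in range used")
    print(f"amplified: {used_value_fraction(amplified):.3f} of values in range used")
```

In the amplified signal roughly a third of the values in range never occur. A coder that knows this does not need to spend bits distinguishing values that cannot appear.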

Some speculation:

I suppose 5 files are affected by the second Factor, for instance "Alela Diane - The Pirate’s Gospel".

Pattern:

2 or more percent better than any other encoder including LA.
LA seems to be particularly important in this context, because it is usually close to the higher end of OptimFROG's graph. I suspect it is similar to OptimFROG regarding Factor 1, but lacks Factor 2.

Quite speculative. But feels right.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #13 – 2022-05-28 22:12:26

Quote from: TBeck on 2022-05-28 21:41:06

Mp4Als, because it is so similar to TAK, but can use up to 1024 predictors (if I remember this right) compared to TAK's 160. I found 16 files where Mp4Als is at least 0.5 percent stronger than TAK. In any case there is also a clear advantage of TAK's preset 4 over 3, which use 160 vs. 80 predictors. IMHO this is strong evidence that the advantage of Mp4Als is based upon the higher predictor count.

Yes, but TAK -p3 beats ALS' -a -b -o1023, where -o1023 is the predictor order ... or the max predictor order. That is the second-to-highest ALS setting used in the test.
So either the high prediction order is not the explanation - or, in case -o1023 only sets the maximum, it could be that ALS selects lower orders. The "-a" switch allows for adaptive predictor order, but I don't know how that matters.

It seems that the max frame size is 65536. Maybe that, in combination with the predictor order, is what makes the "-7" performance, both for good and bad?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #14 – 2022-05-28 22:53:03

Well, I just looked at the source code (I am a bit tired, errors likely), and it seems as if -7 translates to
-a
-b
-l (Lsb check, wasted bits)
1023 (1024 ?) predictors
Frame size: 20480 samples (per channel)

What might be different is the option -g, which is 0 by default and set to 5 (the possible maximum) by -7.

It is called 'Block Switching Level'. Once I knew what this meant. Not perfectly now, but I seem to remember that this controls how hard the encoder evaluates whether it is advantageous to split the frame into 2 or more subframes with individual parameters, e.g. predictors. With a frame size of 20480 it is definitely necessary to do this to achieve good results, because the signal parameters are likely to change within 464 ms of audio data. I suppose -g0 means "turn it off". A bad choice that can perfectly explain the comparatively bad performance of the second-highest setting.
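The 464 ms figure follows directly from the frame size and the CDDA sample rate, as the short sketch below shows. The second helper rests on an ASSUMPTION that is not confirmed in this thread: that each block switching level permits one further halving of the frame. Both function names are hypothetical.

```python
def frame_duration_ms(frame_samples: int, sample_rate_hz: int = 44100) -> float:
    """Duration of one frame (per channel) in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


def smallest_block(frame_samples: int, switching_level: int) -> int:
    """ASSUMPTION: every block switching level allows one more halving,
    so level 0 means 'no splitting' and level 5 allows 32 sub-blocks."""
    return frame_samples >> switching_level


if __name__ == "__main__":
    print(f"frame: {frame_duration_ms(20480):.1f} ms")
    for level in (0, 5):
        block = smallest_block(20480, level)
        print(f"-g{level}: smallest block {block} samples, "
              f"{frame_duration_ms(block):.1f} ms")
```

Under that assumption, the highest level would let the encoder adapt its parameters on a scale of about 15 ms instead of 464 ms.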

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #15 – 2022-05-29 03:27:05

@ktf I would like to echo the others and shout out a huge thank you for these tests; it is great to have such a complete and up-to-date resource available! In particular I appreciate your taking seriously my suggestion to represent WavPack as essentially two separate codecs (it certainly does make the results easier to understand).

Quote from: ktf on 2022-05-28 15:06:13

Quote
* In line with the previously posted results, WavPack -x4h and -x4hh decode faster (if only slightly so) than -h and -hh. Less data to wvunpack?
Considering WavPack's -x4 modes decompressing faster, @bryant was surprised to see that too. Because of that, I won't dare give a hypothesis.

I have now verified what’s going on with these. Without the extra mode, WavPack makes a fixed number of decorrelation passes over the audio, and this number increases with mode (e.g., "very high" is 16 passes). The -x4 mode uses a recursive search method on each frame of audio to generate an optimum sequence of filters up to the maximum number of passes for that mode. Usually each additional pass will improve the results, but in some cases we reach a plateau where an additional pass provides no improvement. In those cases we terminate the process and the resulting frame will decode with fewer decorrelation passes.

One obvious place this occurs is compressing silence (where any decorrelation is wasteful) and this explains why this effect is more pronounced in the previous multichannel test, where long runs of silence in some tracks are probably common. In the CDDA test the effect is less pronounced, but a quick check showed it greater in classical and solo instrumental tracks, and particularly on the Jeroen van Veen solo piano album. I had something similar handy and tried compressing that with -hh and -hhx4 and sure enough got a 10% difference in decoding speed. To verify that this was from truncated decorrelator passes I examined the frames statistically and found that over 40% used fewer than the maximum 16 passes, and in fact almost 15% used 10 or fewer (which is the fixed count for “high” mode). So that explains that...
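The early termination described above can be pictured with a toy sketch. This is not WavPack's recursive search: it is a plain loop over made-up per-frame sizes that stops at the first pass that brings no improvement, so the frame ends up needing fewer passes at decode time. The function name and all numbers are hypothetical.

```python
def passes_kept(size_after_pass, max_passes):
    """size_after_pass[n] is the (hypothetical) frame size after n passes,
    with index 0 meaning no decorrelation at all. Stop at the first pass
    that does not shrink the frame any further."""
    kept = 0
    for n in range(1, min(max_passes, len(size_after_pass) - 1) + 1):
        if size_after_pass[n] >= size_after_pass[n - 1]:
            break
        kept = n
    return kept


if __name__ == "__main__":
    busy_frame = [1000, 800, 700, 650, 640]   # every pass still helps
    plateau_frame = [1000, 800, 700, 700, 690]  # third pass brings nothing
    silent_frame = [100, 100, 100]            # no pass helps at all
    for name, sizes in (("busy", busy_frame),
                        ("plateau", plateau_frame),
                        ("silent", silent_frame)):
        print(name, passes_kept(sizes, max_passes=16))
```

A frame that stops early decodes with fewer passes, which is where the decoding speed gain would come from.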

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #16 – 2022-05-29 15:10:29

Quote from: TBeck on 2022-05-28 21:41:06

Regarding the speed, I would like to make one remark:

If the CPU of the test system really is an A4-5000: the most time-consuming function in the encoder performs lots of SSE2/SSSE3 multiplications. This particular CPU can start one every 2 clock cycles. But in between I am using the instruction palignr, which usually is very fast but takes 19 clock cycles to execute on this CPU. And since the next multiplication has to wait for the result of palignr, you get only a fraction of the usual speed, about 1/3 to 1/2 I would estimate. This applies to this function only; overall the effect is considerably smaller. The decoder is not affected.

On the other hand: maybe other codecs are similarly affected. Who knows. But the risk of distorted speed results seems to be bigger if a non-mainstream or outdated CPU is used.

I wouldn't say this one wasn't mainstream when it was on sale, but I see your point. The past two revisions were done with an AMD A4-3400; revision 2 was the last one on an Intel CPU. Comparing revision 2 with revision 3 means comparing TAK 2.2.0 with TAK 2.3.0 and FLAC 1.2.1 with FLAC 1.3.0. Preset -p0 should decode 29% faster and encode 44% faster on TAK 2.3.0, according to the changelog. FLAC 1.3.0 had no noteworthy speed improvements over 1.2.1. Comparing TAK -p0 with FLAC -0 as baseline between revision 2 and 3, encoding is 22% faster and decoding is 30% faster. So yes, it seems moving from the old Intel Core2Duo T9600 to an AMD A4-3400 gave TAK a penalty of 20% on encoding compared to FLAC but no penalty on decoding.

Anyway, there is a bit of reasoning behind choosing this old, slow CPU.

First, it makes timing easier. When timing FLAC or TAK decodes one runs into limitations of measuring CPU time in Windows on modern CPUs combined with short tracks. This is less of a problem on Linux, but not all codecs run there of course. Maybe I take a newer CPU next time and downclock it by a lot.

Second, it lessens the advantage of using the latest CPU extensions like AVX2, FMA, BMI. I feel this comparison would be too x86 focussed if these were included, but especially with Apple moving to ARM and lots of playback devices using ARM hardware it seems wrong to include AVX2 and beyond. NEON, which is available in a lot of ARM hardware, is AFAIK very comparable to the full SSE stack, but with the advantage of having many more registers. So, the A4-5000 CPU is one with all the SSE extensions, but its AVX implementation is crippled (read: is being split up into SSE by microcode).

With these limitations in mind, this A4-5000 was the only thing I had lying around somewhere. If you have recommendations for a next revision (if there ever will be any) let me know. I just wanted to show I really put some thought into this.

edit: BTW, with Microsoft also doing stuff on ARM, I wouldn't be surprised if a next revision would be better done on an ARM CPU. But then again, seeing lots of programs today still cling to 32-bit x86...

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #17 – 2022-05-29 15:55:15

You are totally right and I wish I hadn't written this part. Actually I thought about removing it. But this would have been a bit cowardly. I made a mistake.

I really didn't want to influence your great work in a way that would make TAK look better, although it might look like that.

That's not my way. If I am concerned about TAK being too slow, I try to optimize it.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #18 – 2022-05-29 16:19:24

Quote from: TBeck on 2022-05-29 15:55:15

You are totally right and I wish I hadn't written this part. Actually I thought about removing it. But this would have been a bit cowardly. I made a mistake.

I thought it gave a nice insight into the problems associated with writing assembler by hand. In FLAC, assembler is being removed in favor of intrinsics-accelerated or even plain C code. I'm doing a lot of profiling and finding that assembler that was presumably once much faster than C is now as fast or even slower, because CPUs have changed and compilers have gotten much smarter.

One of the most interesting things I came across was the one documented here: https://github.com/xiph/flac/pull/347 I compared 3 variants of C code with intrinsics + assembler on 3 different compilers. The differences between the 3 variants of C code were huge, even when seen on all 3 different compilers, while the fastest C code was either slightly faster or slightly slower than the intrinsics + assembler approach. Combined with the fact that the potential for security problems increases with the amount of code, I decided to remove the intrinsics + assembler approach.

Anyway, seeing that FLAC is made to be compiled on a whole bunch of different compilers (GCC, Clang, Visual Studio, ICC) on different platforms (Windows, Linux, Mac) and different architectures (x86, ARM, PowerPC, MIPS etc) there is an obvious benefit to finding which C code can be best optimized instead of doing that optimization by hand.

Quote from: TBeck on 2022-05-29 15:55:15

I really didn't want to influence your great work in a way that would make TAK look better, although it might look like that.

One could also say my choice to not include AVX2 and beyond would be driven by FLAC not being able to make particularly much use of these extensions. I think that is not the case (FLAC does use AVX2 on quite a bit of functions after all) but I am most certainly biased. I do my best to suppress it for this comparison, but I would be foolish not to acknowledge it. Therefore I take all comments on this matter to heart.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #19 – 2022-05-29 18:44:31

I think it is well in place to discuss the topic of CPUs and architecture - and it is also enlightening to those who don't know coding and compilation. It isn't so that time is codec complexity divided by the CPU Hz rating. Computational power is more than that, utilizing hardware takes effort, different developers prioritize different hardware platforms - and that affects the numbers.

I think it is a healthy lesson to those of us who aren't able to run into more than a basic syntax error (see what I did there?): This is a formidable job with tons of information into it; it is still far from the whole story, but even if you cannot read a log axis there is a big picture to be taken home that - the first axis is on a log scale! - wouldn't change by a slowdown in the tens-of-percents range.
We can get an idea what it takes to squeeze another half a percent out of the hard drive.
We can know what codecs have a low CPU footprint - within the lossless realm, actually I am curious what it takes to encode this corpus as MP3 and decode, for example.

Quote from: ktf on 2022-05-29 15:10:29

If you have recommendations for a next revision (if there ever will be any)

For that I recommend FLAC 1.4 with all the development still not committed, TAK 2.4 with 7.1 support and a sound loud "no WMAL because if you haven't picked up why you should get rid of it, here is why:" ... :-)

And a new CPU because well now you have done the old. Optionally a brief anchoring up just a couple of presets on the old CPU to see how new codec version compares (say, FLAC -0 -5 -8, TAK -p0 -p2 -p4m should be enough for that?)

Question: if rather than downclocking you would just let the timer measure say 4x runs, what then? If there are some strange effects, which are what users would experience in real life, is that wrong?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #20 – 2022-05-29 20:07:10

Not comparing AVX2 is also relevant to ARM compatibility layers, or at least Apple's, since they support recompiling SSE up to 4, but no AVX2, not even original AVX.

Oh yeah, and if you thought you'd be clever and use x87 math opcodes, since those still work on x86_64, Rosetta 2 punishes you horribly: They compile to soft float to retain full 80 bit precision. Windows on ARM recompiles these to scalar math opcodes instead, dunno if they bother to do soft float if the app actually switches to 80 bit precision from the default of 64 bit. Or if they switch to 32 bit precision if the app switches to it.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #21 – 2022-05-30 10:30:05

Quote from: Porcus on 2022-05-29 18:44:31

Question: if rather than downclocking you would just let the timer measure say 4x runs, what then? If there are some strange effects, which are what users would experience in real life, is that wrong?

The problem here is the precision of timing I get from the timer64 program I use. The precision is limited to steps of 15.625 ms (see here), and when I take a 30-second audio sample like the samples in the Rarewares sample collection and decode it at 700x speed on a modern CPU, that's only two ticks. It sometimes even happens to register zero ticks (presumably because of a context switch during execution or something), so the range is about 0 to 3 ticks. It is rather difficult to get to a sufficiently low uncertainty; that needs a lot of runs.
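The tick arithmetic above can be checked with a few lines. The 15.625 ms step, the 30-second sample and the 700x speed are the figures from the post; the one-hour album is only an assumed example length.

```python
TICK_SECONDS = 0.015625  # 15.625 ms timer step quoted above (1/64 s)


def decode_ticks(audio_seconds: float, speed_x_realtime: float,
                 tick: float = TICK_SECONDS) -> float:
    """How many timer ticks a decode of the given length spans."""
    return (audio_seconds / speed_x_realtime) / tick


if __name__ == "__main__":
    for label, seconds in (("30 s sample", 30.0), ("1 h album", 3600.0)):
        ticks = decode_ticks(seconds, speed_x_realtime=700.0)
        # One tick of uncertainty relative to the whole measurement.
        print(f"{label}: {ticks:.1f} ticks, one tick is {100.0 / ticks:.1f}% of the run")
```

With fewer than three ticks per run, a single tick of error is a large share of the measurement, while an album-length file would span a few hundred ticks.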

Anyway, I've tried to find a more accurate replacement for timer64.exe several times, but with no result. So, we're hitting a wall here: FLAC (and TAK) decoding is too fast for Windows to measure on recent CPUs.

Perhaps the better solution is to test files as album-per-file instead of as track-per-file. This has one downside though: currently I test all codecs and settings on one file at a time. This has the advantage that if for some reason some background process decides to do something and mess with the test results, that's only one track of the album affected, and the problem can still be 'averaged out' by the processing of the other tracks of the album. When doing album-per-file, this is no longer possible. Maybe this is not a problem because we're running tests from a ramdisk anyway, so I/O activity shouldn't mess with the test results, and I have been overly cautious.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #22 – 2022-05-30 11:12:57

Quote from: ktf on 2022-05-30 10:30:05

if for some reason some background process decides to do something and mess with the test results, that's only one track of the album affected, and the problem can still be 'averaged out' by the processing of the other tracks of the album. When doing album-per-file, this is no longer possible.

Hm, is there any reason that a 0.01 second delay doesn't aggregate up to 0.01 second on the album total whether it happened in track 5 of 10 or it happened midway in the image? If you average by just adding times, it shouldn't matter? (It seems you just did two plain runs with no other sanity checking than MD5ing?)

How about this, with a fast processor rather than downclocking it - assuming you have enough of a RAM disk, and you probably do:
* Full album. For a very short album (/"album" like the rarewares signals ... is that short?) - just duplicate it, making for double number of tracks. You run it unweighted-by-album, so that doesn't matter.
* By using a fast processor, you can run it more times. E.g. four.
* Delete the two slowest (if there are background processes delaying up to two, they won't show up in the results)
* Sanity check the remaining two (say, < 10 percent time difference?) and average them. (Or maybe even: Pick the lowest, decide that that is codec performance and everything else is "distortion".)
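The procedure in the list above could look roughly like this. It is a sketch only: the function name is made up, and the default values of 2 runs kept and a 10 percent spread limit are simply the numbers floated in the list.

```python
from statistics import mean


def robust_time(run_times, keep=2, max_spread=0.10):
    """Keep the `keep` fastest runs, check that they agree to within
    `max_spread` (relative), and return their average."""
    if len(run_times) < keep:
        raise ValueError("not enough runs")
    fastest = sorted(run_times)[:keep]
    if fastest[0] <= 0:
        raise ValueError("run times must be positive")
    spread = (fastest[-1] - fastest[0]) / fastest[0]
    if spread > max_spread:
        raise ValueError(f"fastest runs differ by {spread:.0%}, rerun the test")
    return mean(fastest)


if __name__ == "__main__":
    # Four hypothetical runs in seconds; two were disturbed by a background process.
    print(robust_time([1.00, 1.02, 1.50, 3.00]))
```

The "pick the lowest" variant from the last bullet would simply be min(run_times), with no averaging at all.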

By the way, to the .csv file, did I understand correctly
* First two numerical columns are encoding "speed", computed by 1/time , two runs?
* Next two: same except decoding?
* Size percentage ... but computed how, you are just taking filesize / .wavfilesize? (Means, file headers are in, and that likely includes WAVE headers stored for the "Wave file compressors" (I don't even think Monkey's can turn that off?) - and possibly padding for FLAC?)

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #23 – 2022-05-30 11:49:53

Quote from: Porcus on 2022-05-30 11:12:57

Hm, is there any reason that a 0.01 second delay doesn't aggregate up to 0.01 second on the album total whether it happened in track 5 of 10 or it happened midway in the image? If you average by just adding times, it shouldn't matter? (It seems you just did two plain runs with no other sanity checking than MD5ing?)

Let's assume some background process starts doing something for 10 minutes. I don't think this will happen, as I've disabled internet connectivity and anti-virus, but I was being paranoid. If during those ten minutes all tracks of one album are processed for FLAC, all of these are 'tainted'. If, however, only one track of an album is processed for FLAC, ALAC, WavPack, Monkey's, then the disturbance is spread out over several codecs and a lot of settings, instead of being concentrated on one codec.

But, as I said, maybe I was just being paranoid.

Quote from: Porcus on 2022-05-30 11:12:57

By the way, to the .csv file, did I understand correctly
* First two numerical columns are encoding "speed", computed by 1/time , two runs?
* Next two: same except decoding?
* Size percentage ... but computed how, you are just taking filesize / .wavfilesize? (Means, file headers are in, and that likely includes WAVE headers stored for the "Wave file compressors" (I don't even think Monkey's can turn that off?) - and possibly padding for FLAC?)

For what the columns mean, see https://hydrogenaud.io/index.php?topic=122508.msg1011260#msg1011260 Size is simply compressed file size divided by WAV file size. So, including any headers, padding, metadata etc. The last column is either lossless or not lossless depending on the result of the MD5 check (once more, being paranoid here).

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #24 – 2022-05-30 23:13:27

Curious little fact: The WavPack "x4" decoding speed artifact was visible, but zooming in on the fine detail, it happens to TAK as well. Nothing you would notice in practice, but:

All the -p#m decode faster than the corresponding -p#e and the corresponding -p#. Totalling up the 0,...,4, "m" decode 0.3 percent faster than "e"
Of the five -p#e, three decode faster than the corresponding -p#. Totalling up to, well, only 0.06 percent faster.
