Topic: Lossless codec comparison - part 3: CDDA (May '22)

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #25
Also, given that an iPod Video (dual-core ARMv4 at 80 MHz) running Rockbox as it existed circa 2010 can decode TTA, ALAC, and some Monkey's Audio files in realtime (as well as FLAC at about 6x realtime), a lot of this is basically a rounding error on anything approaching a modern CPU.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #26
Oh, a lot of this isn't noticeable in practice - or rather, it is dwarfed by other phenomena, like drive write speed and how much time the software spends doing tag transfers afterwards.
But if you use Monkey's Insane (well, Matt Ashland called it "insane", you were warned!), you might have to wait a while for transcodes, and for scanning for ReplayGain or dynamic range or maybe acoustic ID for tagging, or for bit comparisons ... if you do many at once.

 (I have asked before if battery life is much affected anymore ... maybe it isn't? Back in the Rockbox days, you would enjoy more battery life using FLAC than using MP3.)

Filesize isn't as much of a concern anymore either - unless your SSD or memory card is running full. If your lossless files fit on your work laptop (or your phone!), you get an entire extra backup that way.

But the sport is still fun to watch.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #27
So how much of the news is the change in corpus? Of course it moves all sizes downwards, but it also made TAK overtake the ape.

Some of the codecs have improved their compression (like FLAC), some have not - a Monkey's file at a given -cN000 setting is the same no matter what version. WavPack seems pretty much the same (slightly bigger due to the frame checksums that make for fast verification, countering the very small mono-as-stereo part of the corpus) - and looking back at the TAK 2.3.1 and 2.3.2 release notes and tests, the size differences are very small for CDDA; we are talking a hundredth of a percentage point or so.

So - by rough eyeballing, not precise calculations - this is a bit on how the corpus matters:
 * ktf's Revision 5 corpus compresses to slightly below 50 percent size for -p4m and Insane. Revision 4 compressed to around 54 percent. So this corpus is around 8 percent (not points!) "more compressible".
 * Revision 4: TAK -p4m produced files about 1 percent larger than Monkey's Insane. In Revision 5, it narrowly beats every Monkey. So the change in corpus has made for
 -> 9 percent smaller files on TAK -p4m
 -> 8 percent smaller files on Monkey's Insane.
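To make the points-vs-percent distinction concrete, here is the back-of-envelope in a few lines of Python (a sketch; 54 and 50 are the eyeballed figures above, not exact corpus numbers):
Code: [Select]
rev4 = 0.54  # Revision 4: compressed size as a fraction of original (eyeballed)
rev5 = 0.50  # Revision 5, same settings (TAK -p4m / Monkey's Insane territory)

points   = (rev4 - rev5) * 100         # 4.0 percentage points
relative = (rev4 - rev5) / rev4 * 100  # ~7.4 percent - "around 8" by eyeballing
print(f"{points:.1f} points, {relative:.1f} percent")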
The question arises: is any of those figures off compared to the rest? WavPack, ALAC and LA (all at default settings) come in at slightly above 8 as well - again by eyeballing. And those codecs don't happen to have changed much. Also, presuming that Revision 4 has a WavPack -h in there (it quotes four WavPack settings but has tested six!), we can also make sense of what happens between WavPack default and -h:
 -> Like Monkey's, TTA makes the same files no matter what version (including using ffmpeg!); it was around TAK -p0m, slightly better than any FLAC, between WavPack default and -h, which in turn was around Monkey's Fast; now it is overtaken by TAK -p0-whatever and FLAC -678 (FLAC has improved!).

Which does indicate that, well, the Revision 5 corpus is "more TAK friendly". But
 * That does not mean this is a "biased towards TAK" corpus; it could be that it is less biased than the narrower Revision 4 corpus.
 * The "only" thing that makes for suspicion towards this corpus is that it is broad and unweighted and brings file size down to the 50 percent mark, and that looks a bit low - but that is viewed through the biased eyes of a metalhead.
 * Is it by much? ~8 percent smaller files for everything, with ~8 percent-of-those-8-percent better for TAK? Let's stop here and think for a moment: how much faster is running a 100m flat in 9.92 vs 10.00? Quite a lot if you are chasing Olympic medals, pretty much nothing for practical purposes. The reason it is striking is that Monkey's was the go-to high-compression codec of 2005, and "whoa, you beat Monkey's Insane" with a fast asymmetric codec still raises eyebrows.


All this is on CDDA. The success of WavPack -hx4 on hi-res, for example, shows that hi-res is a slightly different animal.


And, concerning this "noticeable in practice": I see from my TAK 2.3.1 testing that it would decode something like 20 percent faster than 2.3.0, even when reading from a spinning HDD. Whether that is "noticeable" ... a matter of opinion.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #28
So how much of the news is the change in corpus?
I guess that's hard to tell in general. I could have chosen to take the same corpus as last time to be able to compare, but that would propagate any advantages certain codecs have on certain material. I instead chose to make a 'fresh' start.

One could argue the inclusion of the single-instrument material and the number of orchestral sources provide an advantage to codecs that do well with tonal content, and the addition of exotic material (like the chiptune and microsound sources) provides an unfair advantage to certain codecs that just happen to perform well there. I think I made a balanced corpus including a wide variety of material one would want to losslessly encode, not just music. If we were to continue comparing codecs on sources similar to the ones likely used in tuning them, we wouldn't learn anything new.

Also, it seems reasonable to assume asymmetric codecs are better suited to deal with material that is different from the usual. See for example Ryoji Ikeda - Dataplex, which I think is very different from what most people would consider music. WavPack -x4, FLAC, ALAC, TAK, Shorten and ALS, all asymmetric codecs, do much better compared to symmetric codecs like WavPack, Monkey's Audio, TTA and LA. OptimFROG, being a bit of both, also does very well here. That's why I think it is important to include material that is off the beaten path. I didn't do that (as much) in the previous revisions.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #29
So how much of the news is the change in corpus?
I guess that's hard to tell in general. I could have chosen to take the same corpus as last time to be able to compare, but that would propagate any advantages certain codecs have on certain material. I instead chose to make a 'fresh' start.

* Hm, since I am too lazy to check: Is the Revision 4 corpus a subset of the Revision 5, or did you also remove signals?

* Also, since I cannot check: the Pokémon album you have both as hi-res and as CDDA. Are they ... what we in music would speak of loosely as "the same mastering" (in that the difference would be very low volume if the 192 were carefully resampled to 44.1)? The results are a bit different between 192 and 44.1.

* What was the recording chain of your diffuse sound fields recording? Is there any suspicion about that phenomenon "2" that TBeck points out in reply #12?

* I looked at the CDDA results the other day with fresh eyes, and there are some "unexpected" ones (well, maybe not after Revision 4): In line with what you point out about where asymmetric codecs shine, it seems that the symmetric ones benefit at "denser" music. (Is that also how LA benefits vs OptimFROG?) But then at the most noisy end, this Merzbow track fools the ape - and so does the Merzbow album you included, although it isn't the least compressible in your corpus.
... it wouldn't be hard to visualize, if one is bothered to do the work: sort the albums by compressibility (say, averaging some codecs at their max settings to order the signals) and present how TAK -p4m vs Monkey's Extra/Insane do against each other, say per decile/quartile. And what frog level it takes to beat them (and LA!).
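If anyone wants to run with that, a minimal pandas sketch of the idea - the file name and column names (compressed size as percent of original, per album) are hypothetical stand-ins for the actual result table:
Code: [Select]
import pandas as pd

# Hypothetical input: one row per album, compressed size in percent of original.
df = pd.read_csv("cdda_results.csv")  # columns: album, tak_p4m, ape_insane, frog_max, ...

# Order the signals by compressibility: average a few codecs at their max settings.
df["avg_max"] = df[["tak_p4m", "ape_insane", "frog_max"]].mean(axis=1)
df["quartile"] = pd.qcut(df["avg_max"], 4,
                         labels=["most compressible", "q2", "q3", "densest"])

# Per quartile: how TAK -p4m and Monkey's Insane fare (negative = TAK smaller).
df["tak_minus_ape"] = df["tak_p4m"] - df["ape_insane"]
print(df.groupby("quartile", observed=True)[["tak_p4m", "ape_insane", "tak_minus_ape"]].mean())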

 

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #30
* Hm, since I am too lazy to check: Is the Revision 4 corpus a subset of the Revision 5, or did you also remove signals?
I started mostly from scratch, only adding sources from revision 4 when I couldn't find better alternatives. Most revision 4 sources did not return in revision 5.

Quote
* Also, since I cannot check: the Pokémon album you have both as hi-res and as CDDA. Are they ... what we in music would speak of loosely as "the same mastering" (in that the difference would be very low volume if the 192 were carefully resampled to 44.1)? The results are a bit different between 192 and 44.1
They were generated from the same programs with the same emulator but with different settings.

Quote
* What was the recording chain of your diffuse sound fields recording? Is there any suspicion about that phenomenon "2" that TBeck points out in reply #12?
Sounds were recorded with a Zoom H4n and normalized in Audacity. So it is highly likely that OptimFROG benefits from the mentioned holes, yes.

Quote
it seems that the symmetric ones benefit at "denser" music.
Could you give an example? I don't see that in the data.

If you take The Ambient Visitor, Bobby McFerrin and Bach, which compress very well, I see a difference between FLAC -8 and OptimFROG max of about 4 %-points, or about 10%. Looking at Jeroen van Veen, that is about 6 %-points, or > 20%. However, Skrillex and Merzbow see about a 3 %-point difference, but because these compress much less, this translates into a relative difference of only about 4%. So, I think I see the opposite trend?
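To spell the points-vs-percent arithmetic out (the absolute sizes here are backed out of the quoted ratios, so take them as approximate):
Code: [Select]
# Compressed size in percent of original; backed out of the ratios quoted above.
cases = {
    "well-compressible (McFerrin, Bach)": (40.0, 36.0),   # FLAC -8, OptimFROG max
    "hard to compress (Skrillex, Merzbow)": (75.0, 72.0),
}
for name, (flac8, frog) in cases.items():
    points = flac8 - frog
    print(f"{name}: {points:.0f} %-points = {points / flac8 * 100:.0f}% relative")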
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #31
Actually, it could be worthwhile to distinguish, within "CDDA", between "CD rips" (music from commercial CDs or 44.1/16 stereo downloads) on the one hand and "other sources converted to 44.1/16" on the other.

As for this:

Quote
it seems that the symmetric ones benefit at "denser" music.
Could you give an example? I don't see that in the data.

If you take The Ambient Visitor, Bobby McFerrin and Bach, which compress very well, I see a difference between FLAC -8 and OptimFROG max of about 4 %-points, or about 10%. Looking at Jeroen van Veen, that is about 6 %-points, or > 20%. However, Skrillex and Merzbow see about a 3 %-point difference, but because these compress much less, this translates into a relative difference of only about 4%. So, I think I see the opposite trend?

Well, if you were to take white noise, everything would evaluate to 100 percent, and you could then say that benefits the asymmetric codecs, because they are generally not as efficient and here they come out equal. So that is a point.

What made me suspicious about this - and might have caught me in major confirmation bias! - is the observation that
* revision 5 has smaller files (better compression)
and
* TAK overtakes the insane ape and FLAC overtakes TTA.
So what I based my impression on was how a couple of pairs that generally measure about the same vary: how high-setting TAK and Monkey's compare to each other. Also TAK vs OptimFROG default, and high-setting FLAC compared to TTA, but I have to admit I had an eye constantly on TAK/ape.

Anyway, from the alphabet down, disregarding those which end up "close to 50" and the oddball signals. "*" seems to confirm my perception, "-" goes against it, "." well, uh.
* Animals as Leaders: TTA and Monkey slightly better than FLAC and TAK
- Alela Diane: counter - against my perceived observation, yes
. Krauss: would be a "*" but is maybe too close to 50 and default frog doesn't shine
- Vivaldi:
* Apocalyptica: TTA and Monkey slightly better than FLAC and TAK
. Berlage: counter for TTA, but TAK beats ape.
. Bert Kaempfert: TTA and Monkey slightly better than FLAC and TAK, but this is close to average
* McFerrin: TAK beats ape and default frog
. Cavallaro: close to 50
* Coldplay. Although default frog does not shine, Monkey's soundly beats TAK and TTA beats FLAC
* Confido Domino Minsk. Well TAK doesn't win by much, but this isn't TTA's fave
*? Daft Punk. Maybe the ape blinded me.
skipping some odds and near-50
* Dvorak: frog and TTA not happy
. Epica: I had this as "*" because of the ape, but ... ah maybe not the others
* Equilibrium: look at ape and TTA
* Fanfare Ciocarlia: ditto but to a lesser extent
*? Fatboy Slim: maybe unfair to judge this based solely upon how well the frog fares
- Flanders recorder.
*? Fors/Bjelland. TTA does not agree, it likes this piece.
. FreeSound ambient: odd signal, I scrolled past
* Verdi: TAK beats default frog, FLAC narrowly beats TTA
- Horacio Vaggione disagrees with me
. and so would Gotovsky do except the frog isn't overly happy
. Stravinsky ditto
*? In Flames. I had this as an example due to the ape, but comparing all three I was maybe ...
* Powerslave. Of course I paid attention to what happens to that album ...
.? Jean Guillou. Would be a - hadn't it been for the frog.

That was page 32 of 64, looks like a place to stop.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #32
I'd like to add some more results (a Hydrogenaud.io exclusive :) ) addressing concerns raised here about the CPU used in the tests.

I've rerun the tests with two different CPUs, for exactly the same codecs (same versions, same compile, same executable etc.) so we can compare. Took me a few weeks.

  • Results for the AMD A4-5000, as published before. Launched in 2013. The architecture of this CPU is actually very mainstream; it has been used in the Xbox One, Xbox One S, Xbox One X, PlayStation 4, PlayStation 4 Slim, PlayStation 4 Pro and in a line of desktop APUs. As far as I know, the architecture can be categorized as rather simple, with AMD having cut complexity (sacrificing IPC) in favor of a smaller die and more cores.
  • Intel Celeron N5105. I bought this extra-efficient CPU this year; it launched in 2021. This architecture is used in low-power Atom, Celeron and Pentium processors. It has been further developed into Gracemont, which is used for the E-cores of Alder Lake and Raptor Lake Intel processors.
  • Intel Core i3-10100T. Launched in 2020; I borrowed this computer from a friend. This CPU is made with the last iteration of the Skylake architecture, which was the workhorse architecture for Intel from 2015 until 2020. This should be most representative of current CPUs.

I'd say the AMD is the simplest architecture (with the lowest instructions per clock cycle), the Celeron is one step above that, and the Core i3 is the most complex (with the highest instructions per clock cycle). The tests on the Celeron were run through wine, those on the Core i3 on Windows 11. WMA Lossless wasn't tested on the Celeron, because that codec is part of the OS and thus not available on wine. Otherwise, testing is pretty much the same as described here.

The Celeron has all SSE instruction set extensions available but no AVX. The AMD has AVX, but I don't think AVX, contrary to AVX2, is used by any codec. The Core i3 has access to all of SSE, AVX, AVX2 and FMA.

At first glance, the results are very similar. No enormous shifts in codec performance. When looking closely, however, I noticed some differences.
- When comparing decoding performance of TAK and FLAC, it seems the AMD and the i3 show the same pattern (FLAC is clearly 10-20% faster than TAK), but on the Celeron, some TAK presets are faster than the slowest FLAC presets. I can't think of a good reason why.
- It seems decoding of TAK, FLAC, ALAC and the fastest presets of MP4ALS is quite a bit faster relative to WavPack, Shorten and TTA on the Intel CPUs than it was on the AMD CPU. Perhaps predictive codecs profit more from improved instruction reordering and superscalar performance than transforming codecs do. This doesn't explain the behavior of Shorten; maybe that has to do with Shorten not using SSE instructions at all, or with it being a 32-bit binary.
- Similarly, encoding of FLAC seems to be faster compared to other codecs on the Core i3. For most codecs this is probably because FLAC uses AVX2 instruction set extensions while the others do not, but that is not the case for Monkey's Audio. Perhaps FLAC makes more use of AVX2 than Monkey's does.

Of course, feel free to compare and remark.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #33
maybe that has to do with Shorten not using SSE instructions at all or being a 32-bit binary.

Hm, I don't know if this is clearly stated, actually: did you use 64-bit executables whenever available? You state that ALAC is done with refalac64; apart from that I don't see the information.
According to TBeck, TAK 64-bit is slower and has no advantage other than when 64-bit is needed.

Anyway and anyhow, this shakes up the truism that the only thing possibly decoding faster than FLAC is another FLAC. Indeed, on the Celeron, TAK -p0m beats FLAC -7 and -8 at size, encoding speed and decoding speed simultaneously, and even if you have to cherry-pick a CPU to get that result and run FLAC with MD5 summing and TAK without (yes?), that does earn TBeck another bragging right ...
... which, true to HA history, would have been posted one day earlier, had not even @TBeck himself failed at keeping up with that  ;D

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #34
maybe that has to do with Shorten not using SSE instructions at all or being a 32-bit binary.

Hm, don't know if this is clearly stated, actually: did you use 64-bit executables whenever available?
I used 64-bit whenever available; TAK didn't have that back then. So 32-bit executables for TAK, Shorten, MP4ALS and LA, and 64-bit for FLAC, WavPack, Monkey's Audio, OptimFROG, TTA and ALAC. So Shorten not improving while the other predictive codecs do probably has to do with SSE, not with being 32-bit or 64-bit, I guess.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #35
I've rerun the tests with two different CPUs, for exactly the same codecs (same versions, same compile, same executable etc.) so we can compare. Took me a few weeks
I'm now running the encodes/decodes on yet another CPU, on Windows 7 and on Linux through wine, to establish whether wine adds any noticeable overhead. The CPU is a Haswell i7, by the way. Results should be very close to the i3 tested previously.

Looking back at the previous results, I wonder which encoders and decoders are memory bound, and which are CPU bound. I am considering running them again on Windows 7 at the lowest CPU clock (800 MHz instead of 2400 MHz) to see what the differences are. I would think that CPU-bound processes get 3x as slow and memory-bound processes do not get any slower.
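For what it's worth, a sketch of how I would read such a downclock run, assuming decode time splits into a part that scales with the clock and a part that doesn't (the timings below are made-up placeholders):
Code: [Select]
CLOCK_RATIO = 2400 / 800  # 3x downclock

# Hypothetical decode times in seconds: (at 2400 MHz, at 800 MHz)
timings = {
    "codec_a": (10.0, 29.5),  # nearly 3x slower -> CPU bound
    "codec_b": (10.0, 13.0),  # barely slower    -> memory/IO bound
}
for codec, (fast, slow) in timings.items():
    slowdown = slow / fast
    # Under this linear model, the CPU-bound share is (slowdown-1)/(ratio-1).
    cpu_share = (slowdown - 1) / (CLOCK_RATIO - 1)
    print(f"{codec}: {slowdown:.2f}x slower, ~{cpu_share:.0%} CPU bound")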

Any thoughts?
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #36
You might be hard pressed to find a memory-bound lossless audio codec on modern x86. Your Haswell likely has 6/8 MB of L3; the latest generations have more (Intel 13100/13600K/13900K have 12/24/32, Zen 2 and up has 32 per CCD, the Ryzen 5800X3D/7800X3D have 96). Intel is at 1.25 MB of L2 per core with 13th gen, AMD is at 1 MB of L2 per core with Zen 4.

The point being, they can all fit an entire frame of normal-sized input fully in L3 cache if not L2; it would take working data an order of magnitude or more bigger than the input to come close to overflowing L3 to the point that a pass over the working data requires a chunk of memory reads. If a pass doesn't require memory reads, we're left with IO as the main memory interaction, which all codecs have to contend with - and realistically, IO is disk bound before it is memory bound. So, which codecs have very large working data?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #37
I might have gotten the concepts wrong, but are we asking how much unencoded PCM can fit in "one MB to some MB" of L2 or L3 memory, and comparing that to codec frame size?
Disregarding that for a 16-bit signal, a mid-side frame might require a 17-bit word (is that in practice a 32-bit word, taking the memory of a full stereo?): a mono channel of CDDA runs at 86-ish KiB/s; I think Monkey's Insane has frames > 25 seconds, so that would exceed 2 MiB. OptimFROG might close in on a MiB.
If mid-side'ing makes a subframe require 32-bit words (or at least half of the subframes?), then ... ? Not to mention, if for a 32-bit signal a codec will mid-side it to 33 bits - would it then use long 64-bit words? Monkey's version 6.xx had a slowdown due to using 64-bit words, fixed in version 7.00.

(Frame sizes ... I don't know them, I inferred from intentionally corrupting a byte, https://hydrogenaud.io/index.php/topic,122094.0.html . With higher resolution, frame sizes might be different.)
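That back-of-envelope in numbers (the > 25 second frame length is the figure inferred above):
Code: [Select]
RATE, BYTES16 = 44100, 2            # CDDA sample rate; 16-bit = 2 bytes/sample
mono = RATE * BYTES16               # 88200 B/s = ~86.1 KiB/s per channel

frame_s = 25                        # inferred Monkey's Insane frame length
print(mono * frame_s / 2**20)       # mono frame: ~2.1 MiB (exceeds 2 MiB)
print(2 * mono * frame_s / 2**20)   # raw stereo: ~4.2 MiB
print(2 * RATE * 4 * frame_s / 2**20)  # if mid-side forces 32-bit words: ~8.4 MiB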

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #38
You might be hard pressed to find a memory-bound lossless audio codec on modern x86.
Yes, indeed. Maybe I should phrase the question more broadly: are there any codecs that perform differently on modern CPUs when downclocked? For these comparisons, I've always run encoding and decoding 'full steam', but for most codecs decoding will happen with the CPU mostly idling. I don't know how bursty up- and downclocking is these days; it seems possible to me that real-time decoding actually happens at the lowest possible clock for the fastest codecs. (Edit: apparently switching CPU speeds takes between 5 and 50 ms, so it seems reasonable to assume decoding of a single frame for a fast codec happens before the CPU switches to a faster speed. A frame is about 0.1 to 0.5 s; processed at 100x speed means 1 to 5 ms.)
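That frame arithmetic, spelled out:
Code: [Select]
SPEED = 100                      # decode speed, x realtime
for frame_s in (0.1, 0.5):       # typical frame duration in seconds
    ms = frame_s / SPEED * 1000
    print(f"{frame_s}s frame decodes in {ms:.0f} ms")
# 1-5 ms per frame vs 5-50 ms to switch clocks: a fast codec is done
# with each frame before the CPU even ramps up.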

As far as I know, CAS latencies are lower at lower clocks, at least for RAM. I don't know whether L2/L3 access latencies are lower too (as in: equal when measured in nanoseconds, hence faster when measured in CPU cycles), and whether things like L1 cache misses become less expensive. I can't seem to find much information on this. So I figured, let's just run the numbers anyway; maybe there's a difference, maybe there's not.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #39
Code: [Select]
System:
  CPU: 12th Gen Intel(R) Core(TM) i3-12100, features: MMX SSE SSE2 SSE3 SSE4.1 SSE4.2
  App: foobar2000 v1.6.16
Settings:
  High priority: no
  Buffer entire file into memory: no
  Warm-up: no
  Passes: 1
  Threads: 1
  Postprocessing: none
163 CDDA flac files (with unknown/mixed encoding settings), 6.04GB, on a Toshiba 4TB HDD.
I have 16GB RAM and Windows is installed on a 250GB SATA SSD, with page file disabled.
Code: [Select]
First Run:
  Decoded length: 18:08:08.773
  Opening time: 0:02.202
  Decoding time: 1:01.562
  Speed (x realtime): 1023.913

Second Run (so cached to RAM):
  Opening time: 0:00.601
  Decoding time: 0:43.844
  Speed (x realtime): 1468.988
Then I changed the CPU clock ratio from the default 41 (turbo at 43) to 20...
Code: [Select]
First Run:
  Opening time: 0:03.340
  Decoding time: 1:49.582
  Speed (x realtime): 578.174

Second Run:
  Opening time: 0:01.286
  Decoding time: 1:34.047
  Speed (x realtime): 684.851
How about a clock ratio of 10?
Code: [Select]
First Run:
  Opening time: 0:04.775
  Decoding time: 3:30.515
  Speed (x realtime): 303.259

Second Run:
  Opening time: 0:02.529
  Decoding time: 3:08.018
  Speed (x realtime): 342.638
I also tried changing the "Adjacent Cache Line Prefetch" and "Hardware Prefetcher" settings in BIOS, but there was no difference in speed.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #40
Monkey's Audio. 5 CD images for each compression level, but not the same 5 images in each level, so in total 20 different images were tested. Same test conditions as the previous post, at default CPU speed.

Fast (784 kbps average)
1st:
Total length: 4:50:59.013
Opening time: 0:00.208
Decoding time: 0:57.130
304.490x realtime

2nd:
Opening time: 0:00.001
Decoding time: 0:49.855
350.187x realtime

Normal (623 kbps average)
1st:
Total length: 4:56:04.680
Opening time: 0:00.165
Decoding time: 1:10.000
253.182x realtime

2nd:
Opening time: 0:00.001
Decoding time: 1:04.026
277.453x realtime

High (683 kbps average)
1st:
Total length: 4:52:44.933
Opening time: 0:00.158
Decoding time: 1:15.826
231.165x realtime

2nd:
Opening time: 0:00.001
Decoding time: 1:08.756
255.462x realtime

Extra High (754 kbps average)
1st:
Total length: 4:53:59.387
Opening time: 0:00.129
Decoding time: 1:36.996
181.615x realtime

2nd:
Opening time: 0:00.001
Decoding time: 1:28.892
198.433x realtime

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #41
4GB single wav file in CDDA format on 4TB HDD:
1st run:
Total length: 6:45:46.960
Opening time: 0:00.000
Decoding time: 0:22.532
1080.538x realtime

2nd run:
Opening time: 0:00.000
Decoding time: 0:00.762
31949.655x realtime

1411200 * 1080.538 / 8 / 1048576 = 181.777 MB/s, pretty close to the sequential read speed reported by benchmarking software like ATTO.
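Or, with the units written out:
Code: [Select]
cdda_bps = 1411200        # CDDA bitrate in bits per second
realtime = 1080.538       # measured decode speed, x realtime
print(cdda_bps * realtime / 8 / 1048576)  # bits/s -> bytes/s -> MiB/s: ~181.8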

Same file in AAC-LC 96kbps (283MB):
1st:
Opening time: 0:00.005
Decoding time: 0:10.380
2344.435x realtime

2nd:
Opening time: 0:00.005
Decoding time: 0:09.152
2658.956x realtime

The same AAC file on SATA SSD:
1st:
Opening time: 0:00.005
Decoding time: 0:10.761
2261.427x realtime

2nd:
Opening time: 0:00.005
Decoding time: 0:09.132
2664.830x realtime

People with NVMe SSDs may check if there are still differences between first and second runs. Remember to reboot before the first run to make sure nothing is cached, and make sure there is no high CPU / disk activity before starting the tests.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #42
@bennetng Could you please explain what you're trying to test? I understand reply #39, even though that is only a single codec (and I'd like to compare codecs on this issue), but what are you trying to prove with #40 and #41?
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #43
Because apart from your posts, there are also replies about storage I/O constraints and how the CPU caches things, and I am also interested in trying these things out.

I think you are doing / have already finished your own tests, so did you get any (un)expected results?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #44
Running the numbers for a single configuration takes a whole two weeks, so I don't have any results yet.

There has been some discussion on GitHub, which hasn't really been conclusive.

But still, what conclusion do you draw from your own numbers? That AAC and Monkey's aren't really disk bound, WAV clearly is, and FLAC is somewhere in the middle?
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #45
I think FLAC is still disk (HDD) bound, at least under my test conditions.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #46
I might have gotten the concepts wrong, but are we asking how much unencoded PCM can fit in "one MB to some MB" of L2 or L3 memory, and comparing that to codec frame size?
Disregarding that for a 16-bit signal, a mid-side frame might require a 17-bit word (is that in practice a 32-bit word, taking the memory of a full stereo?): a mono channel of CDDA runs at 86-ish KiB/s; I think Monkey's Insane has frames > 25 seconds, so that would exceed 2 MiB. OptimFROG might close in on a MiB.
If mid-side'ing makes a subframe require 32-bit words (or at least half of the subframes?), then ... ? Not to mention, if for a 32-bit signal a codec will mid-side it to 33 bits - would it then use long 64-bit words? Monkey's version 6.xx had a slowdown due to using 64-bit words, fixed in version 7.00.

(Frame sizes ... I don't know them, I inferred from intentionally corrupting a byte, https://hydrogenaud.io/index.php/topic,122094.0.html . With higher resolution, frame sizes might be different.)
I mentioned raw frame size because that, plus some working variables, should be the worst-case "hot" data. If the hot data fits in cache, memory bandwidth is not an issue. Even if the hot data doesn't fit in cache, the access pattern and frequency would have to be extreme for memory bandwidth to be in play.

For example, a naive encoder that has 1000 ways to encode may just brute-force 1000 encodes, reading the input separately for every encode. If there's 2 MB of input data and 1 MB of cache, then that's probably 2 GB of memory access (it could be as little as 1 GB, but if the encoder is dumb enough to brute-force, it's unlikely to make smart use of cached data). If the cache is 4 MB, then memory access should be negligible (the initial read plus some re-reads when other processes trigger premature evictions).
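The same estimate as a couple of lines, under those same assumptions:
Code: [Select]
passes, input_mb, cache_mb = 1000, 2, 1
# Input (2 MB) doesn't fit in cache (1 MB), so each pass re-reads it from RAM:
print(passes * input_mb / 1024, "GB")  # ~2 GB of memory traffic
# With a 4 MB cache the input stays resident after the initial read,
# so memory access is negligible.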

Most real implementations probably do at most a few passes over the input (a stat pass and an encode pass, if that), possibly with some passes over the resulting encoding to refine it further. Orders of magnitude less potential memory access.

libflac uses 32 bits for everything except 33-bit mid-side, which uses 64 bits. It's unlikely that many implementations, of FLAC or otherwise, do things much differently; x86 at least is slower doing arithmetic on 8/16-bit widths.

Running the numbers for a single configuration takes a whole two weeks, so I don't have any results yet.

There has been some discussion on GitHub which hasn't been conclusive really.
...
Pretty sure the large-buffer benefit on Linux is real, but it only applies when the drive is fragmented - the same reason libflac applies a large buffer on win32 for encode. It's a myth that non-Windows partitions don't suffer from fragmentation; they just tend to be better at avoiding it. And there are always dummies using NTFS on Linux; without access to Windows it's hard to defrag, so fragmentation issues there should be more prevalent than for the average NTFS user. At the very least, I think win32 decode should use a large buffer for parity with encode: if it's useful on encode, it should be useful on decode.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #47
There has been some discussion on GitHub which hasn't been conclusive really.
Just read your recent posts. Are the results related to Advanced Format? How old is the 1TB HDD being used? Is it an external or internal drive, or an internal drive in a disk enclosure?

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #48
It is a WDBUZG0010BBK-WESN, bought in January 2022. From a quick scan I can't find any info on Advanced Format.
Music: sounds arranged such that they construct feelings.

Re: Lossless codec comparison - part 3: CDDA (May '22)

Reply #49
So a new external drive. I have an old internal 3.5" Samsung HD103UJ, which was released before the introduction of AF.

For external drives there is also another USB3 layer to deal with. Not a problem for total data throughput, but I am wondering about the performance of smaller block transfers.

I also have a newer 3TB internal 3.5" drive with AF and a USB3 disk enclosure, but even if I run some benchmarks, there is still no way to tell whether any performance difference is caused by some combination of disks/enclosure/external drives/USB controllers.