Hydrogenaudio Forum => Validated News => Topic started by: TBeck on 2022-06-30 15:46:37

Title: TAK 2.3.3
Post by: TBeck on 2022-06-30 15:46:37
Final release of TAK 2.3.3 ((T)om's lossless (A)udio (K)ompressor)

This release brings a 64-bit decoder library and better Unicode support for the GUI version.

It consists of:

Title: Re: TAK 2.3.3
Post by: TBeck on 2022-06-30 15:50:44
What's new

New features:



Here are the results for my primary file set.

Test system: Intel i3-8100 (3.6 GHz / 1 Thread), Windows 10.

Code:
Preset  Enco-Speed                Deco-Speed
        2.3.2    2.3.3    Win %   2.3.2    2.3.3    Win %
-p0     868.72   893.94    2.90   806.84   810.57    0.46
-p0e    698.08   711.16    1.87   812.44   817.09    0.57
-p0m    418.70   425.54    1.63   815.07   819.19    0.51
-p1     726.44   748.04    2.97   787.03   788.37    0.17
-p1e    482.99   491.72    1.81   788.73   791.12    0.30
-p1m    314.46   322.68    2.61   790.61   794.60    0.50
-p2     580.58   591.20    1.83   715.05   718.12    0.43
-p2e    347.07   348.35    0.37   714.52   719.10    0.64
-p2m    203.02   206.26    1.60   715.99   718.71    0.38
-p3     301.25   306.83    1.85   697.20   703.97    0.97
-p3e    241.91   244.79    1.19   697.88   702.06    0.60
-p3m    131.89   133.65    1.33   699.12   701.40    0.33
-p4     183.23   186.54    1.81   650.44   657.64    1.11
-p4e    158.75   160.61    1.17   651.01   658.40    1.14
-p4m     82.49    83.73    1.50   651.21   656.84    0.86
Speed as multiple of realtime playback.
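
The "Win %" columns appear to be the relative speed change of the newer version over the older one. That formula is an assumption, not something stated in the post, but it reproduces the table values. A minimal Python sketch (the function name `win_percent` is made up for illustration):

```python
def win_percent(old_speed: float, new_speed: float) -> float:
    """Relative speed change in percent (assumed formula, not confirmed by the author)."""
    return (new_speed / old_speed - 1.0) * 100.0


# Values copied from the -p0 and -p4m rows of the table above.
rows = {
    "-p0 encode": (868.72, 893.94),
    "-p0 decode": (806.84, 810.57),
    "-p4m encode": (82.49, 83.73),
}

for name, (old, new) in rows.items():
    print(f"{name}: {win_percent(old, new):.2f} %")
```

The same formula gives the negative values in the 32-bit versus 64-bit table when the 32-bit speed is used as the baseline.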

And to illustrate the speed disadvantage of the 64-bit version:
Code:
Preset  Enco-Speed                Deco-Speed
        32 bit   64 bit   Win %   32 bit   64 bit   Win %
-p0     893.94   827.49   -7.43   810.57   700.80  -13.54
-p0e    711.16   661.05   -7.05   817.09   709.15  -13.21
-p0m    425.54   398.79   -6.29   819.19   710.48  -13.27
-p1     748.04   698.29   -6.65   788.37   689.93  -12.49
-p1e    491.72   461.79   -6.09   791.12   693.09  -12.39
-p1m    322.68   303.56   -5.93   794.60   695.48  -12.47
-p2     591.20   557.27   -5.74   718.12   634.55  -11.64
-p2e    348.35   336.29   -3.46   719.10   633.90  -11.85
-p2m    206.26   193.64   -6.12   718.71   636.49  -11.44
-p3     306.83   287.58   -6.27   703.97   619.50  -12.00
-p3e    244.79   233.77   -4.50   702.06   620.18  -11.66
-p3m    133.65   123.27   -7.77   701.40   621.06  -11.45
-p4     186.54   172.30   -7.63   657.64   577.94  -12.12
-p4e    160.61   145.73   -9.26   658.40   579.34  -12.01
-p4m     83.73    76.29   -8.89   656.84   578.45  -11.93
Speed as multiple of realtime playback.
Title: Re: TAK 2.3.3
Post by: TBeck on 2022-06-30 15:51:32

The next release should add support for the AVX2 instruction set. I achieved encoding speed improvements of about 14 percent for preset -p4m on my primary system (Intel Skylake based CPU), less for other presets. But the results on my secondary (Haswell based) system were discouraging: a maximum improvement of 8 percent for presets -p4 and -p4e, and up to 23 percent slower encoding for -p2m, -p3m and -p4m!

Those presets make the most use of AVX2 instructions and should also benefit the most. But they seem to trigger the automatic down-clocking mechanism of the CPU: AVX2 base and turbo frequencies are lower than the regular ones. This wouldn't hurt too much if the encoder mostly used AVX2 instructions, but that's not the case. I haven't profiled the code yet, but I would estimate that about 30 percent of the encoding time goes to AVX2 instructions. And this is not one continuous block; instead, blocks of x86/SSE2 and AVX2 instructions alternate.

That's bad, because it will cause many transitions between the different clock rates. During such transitions the speed can be much slower than the lower clock rate would suggest. After the last AVX2 instruction the lower clock rate is maintained for a considerable amount of time, therefore subsequent non-AVX2 instructions will also be executed more slowly.
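
To see why a partial AVX2 speedup can turn into a net loss, here is a rough back-of-the-envelope model in Python. It assumes the worst case described above: the whole encode runs at the reduced AVX2 clock, because the CPU never returns to its regular frequency between AVX2 blocks. The 30 percent share is the estimate given above; the 2x kernel speedup and the clock ratios are made-up illustration values, not measurements, and the extra stalls during clock transitions are ignored.

```python
def relative_speed(avx2_share: float, kernel_speedup: float, clock_ratio: float) -> float:
    """Encoding speed of a hypothetical AVX2 build relative to the SSE2 build.

    avx2_share:     fraction of the original run time spent in code that gets vectorized
    kernel_speedup: how much faster that code runs with AVX2 at equal clock (assumed)
    clock_ratio:    AVX2 clock / regular clock, applied to ALL code (worst case)
    """
    time_at_equal_clock = (1.0 - avx2_share) + avx2_share / kernel_speedup
    return clock_ratio / time_at_equal_clock


# Hypothetical clock ratios; 1.0 means no down-clocking at all.
for clock_ratio in (1.0, 0.9, 0.8, 0.7):
    change = (relative_speed(0.30, 2.0, clock_ratio) - 1.0) * 100.0
    print(f"clock ratio {clock_ratio:.1f}: {change:+.1f} %")
```

With these assumed numbers the break-even point is a clock ratio of 0.85; below that, the AVX2 build is slower overall even though its vectorized parts are twice as fast.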

Well, my Haswell CPU is a 35 W low-power quad core, quite a challenge for an older desktop microarchitecture. The difference between regular and AVX2 clock is most likely considerably bigger than for the common 65 W+ CPUs.

Nevertheless I am really hesitant to release an AVX2 version which will make encoding slower on an unknown number of older systems. And IMHO the possible advantage isn't big enough to justify an elaborate study and implementation of a CPU-dependent code path.

Currently it's not clear what I will do next. Possibly I will try to improve the encoding speed by algorithmic modifications. Ktf's latest "Lossless codec comparison" also made me think about the (re-)introduction of higher predictor counts.

Features for later Versions:

Title: Re: TAK 2.3.3
Post by: NetRanger on 2022-06-30 18:08:16
Thnx for the new release. :)
Title: Re: TAK 2.3.3
Post by: ktf on 2022-06-30 18:42:39
Currently it's not clear what i will do next.

Your website is slightly out-of-date, listing the following items
    Support for Unicode character sets.
    A German-language version.
    A little more speed and compression efficiency...
    Applications for platforms other than Windows.
    Support for more than 6 audio channels.

Of course I don't know where you would like TAK to go from here. Do you still like to make (big) changes/additions to the format, or do you want to keep things backwards-compatible?

I know you've been talking about open-sourcing, but I can imagine this is a big step. If you'd like to see TAK gain more users, you could consider contributing a bare essentials TAK encoder to ffmpeg for example. I would imagine a TAK encoder without all specific tuning and tweaks, just a simple TAK encoder would already beat FLAC with ease. Or instead of open-sourcing the software, you could open up the format by creating a document describing the structure and sharing the ideas you used. Maybe someone else will do it for you (like the ffmpeg guys did with wavpack)

Please don't feel offended or pressured to do anything, I just wanted to contribute a few ideas.
Title: Re: TAK 2.3.3
Post by: Florian on 2022-06-30 20:46:42
A 64-bit decoder library for the SDK.
Thank you! I'll include it with the next release of Mp3tag.
Title: Re: TAK 2.3.3
Post by: sPeziFisH on 2022-06-30 21:29:57
Thank you Thomas for the new release! Your details about efforts and insights are always a pleasure to read too.

Of course it is welcome to also keep CPUs of the last decade in view; I guess most of us run those. According to https://en.wikipedia.org/wiki/Advanced_Vector_Extensions, the CPU generations of the last 5-7 years seem to have AVX2 support, and newer CPUs are targeting the next generation, AVX-512.
My CPU does not have AVX2; the next one, within the next 12 months, will. Nevertheless, I personally start encoding and do not watch the progress bar. I rarely care about a few percent of encoding speed: with the current speed range, which is already at an awesome level, this is not the driving factor, at least to me.
To whom might this be an 'issue' or a real downside at all? What are the use cases and scenarios? The group might get smaller and smaller anyway.
IMHO it makes sense to really go for the newer instruction set with the next version, which I guess will then be a minor version, 2.4. And for the almost impossible case of a bug fix, release another 2.3.x patch version of non-AVX2 TAK.

I also like ktf's thoughts.
Title: Re: TAK 2.3.3
Post by: Porcus on 2022-07-01 09:14:04
You could consider switching from MD5 to something faster. I've been told most SHA checksums are faster simply because they don't have a long dependency chain and can more efficiently use the superscalar properties and out-of-order execution capabilities of modern CPUs. (That would obviously break backward compatibility, but only for checksumming) For FLAC checksumming is quite a significant part of decoding CPU load (and encoding for the fastest presets), so I imagine this is also the case for TAK.

The "Fast integrity check without decoding based upon the checksums only" would resolve most of [note1] that. So much so, I think, that the downside to replacing MD5 outweighs the benefits.

In TAK and WavPack, MD5 is optional and disabled by default - in WavPack it is viewed more as a fingerprint, and even more so after having implemented the non-decoding integrity check.
From that point of view, where MD5 is an optional fingerprint, I think it is a great advantage to have the same [note2] algorithm across FLAC/TAK/WavPack(/OptimFROG if anyone cares):
 * if you want something quicker, then use the default; integrity verification will be faster than decoding anyway
 * if you want a checksum as a fingerprint, you presumably want the one that everyone uses (say, if you transcode: yep, every MD5 appears precisely twice, that is source and target; and if you want to use say foo_bitcompare in the end to be sure, just sort source file list and target file list by MD5).

[note1]: Here is the exception. If one wants to verify not only integrity, but to check an encoding against the original PCM, then one could make a checksum of the PCM, encode, decode and check against the checksum. Then two checksums are calculated and a slow algorithm is a penalty - that would be "unnecessary" if the MD5 is not written to the file, never again to be used.
Now is it worth it to implement a second algorithm for the cases where MD5 is not stored?
(Even if MD5 has to be calculated once to be stored, a more than twice as fast algorithm would save time. But is it worth it?)

[note2]: except, well, codecs differ on how to calculate MD5 on 8-bit signals, those being unsigned. Who has a big collection of compressed 8-bit .wav files?
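
The "MD5 as fingerprint" workflow and the 8-bit caveat from [note2] can be sketched in Python with the standard `hashlib` module. This is only an illustration: the helper names (`pcm_md5`, `match_by_md5`, `unmatched`) and the sample data are hypothetical, and how each codec actually feeds 8-bit samples into MD5 is not taken from any codec's source.

```python
import hashlib
from collections import defaultdict


def pcm_md5(pcm_bytes: bytes) -> str:
    """MD5 of raw PCM sample data, used as a fingerprint."""
    return hashlib.md5(pcm_bytes).hexdigest()


def match_by_md5(sources: dict, targets: dict) -> dict:
    """Group file names by MD5. Both arguments map file name -> MD5 hex digest.

    Returns digest -> (list of source names, list of target names).
    """
    groups = defaultdict(lambda: ([], []))
    for name, digest in sources.items():
        groups[digest][0].append(name)
    for name, digest in targets.items():
        groups[digest][1].append(name)
    return dict(groups)


def unmatched(groups: dict) -> dict:
    """Digests that do not appear exactly once on each side of a transcode."""
    return {d: g for d, g in groups.items() if len(g[0]) != 1 or len(g[1]) != 1}


# Hypothetical example data: the same three 8-bit samples, stored unsigned
# (as in .wav) and converted to signed by subtracting 128.
unsigned_samples = bytes([128, 130, 126])
signed_samples = bytes((s - 128) % 256 for s in unsigned_samples)

# Different byte representations of the same audio give different fingerprints.
print(pcm_md5(unsigned_samples) == pcm_md5(signed_samples))  # False
```

After a transcode, an empty result from `unmatched(match_by_md5(source_md5s, target_md5s))` would mean every MD5 appears exactly once in the source list and once in the target list.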
Title: Re: TAK 2.3.3
Post by: mudlord on 2022-08-13 00:42:16
You could consider looking into the arithmetic coder used in Daala/AV1. That is an arithmetic coder that was specifically designed to evade any existing patents. I don't know whether that is fast enough for your liking

That hasn't stopped Microsoft from patenting the rANS entropy method, regardless.

Which to me leaves us in the exact same situation as if we use ordinary arithmetic coding. Might as well just use that or range coding. :/