
TAK 2.3.3

Final release of TAK 2.3.3 ((T)om's lossless (A)udio (K)ompressor)

This release brings a 64-bit decoder library and better Unicode support for the GUI version.

It consists of:

  • TAK Applications 2.3.3
  • TAK Winamp plugin 2.3.3
  • TAK Decoder library 2.3.3 (x86/x64)
  • TAK SDK 2.3.3

 

Re: TAK 2.3.3

Reply #1
What's new

New features:

  • A 64-bit decoder library for the SDK.
  • I have also created 64-bit versions of the applications. As expected, they are bigger and slower without any advantage. As long as Windows supports 32-bit applications, I see no reason to release them. But I will continuously maintain them so that they are ready when needed.
  • Unicode support in the GUI version is no longer limited to the open file dialogs. The required switch to a newer version of my development environment is responsible for a 3.5 times bigger program file.

Improvements:

  • Tiny encoding speed improvements of no more than 3 percent for Intel CPUs based on the Skylake microarchitecture (6th to 10th Generation Core). I could have squeezed out more, but only at the expense of significantly slower processing on older platforms. As a rule of thumb, I take into account CPU microarchitectures of the last 10 years.

Results

Here are the results for my primary file set.

Test system: Intel i3-8100 (3.6 GHz / 1 Thread), Windows 10.

Preset  Enco-Speed                Deco-Speed
---------------------------------------------------------
        2.3.2    2.3.3    Win %   2.3.2    2.3.3    Win %
---------------------------------------------------------
-p0     868.72   893.94    2.90   806.84   810.57    0.46
-p0e    698.08   711.16    1.87   812.44   817.09    0.57
-p0m    418.70   425.54    1.63   815.07   819.19    0.51
-p1     726.44   748.04    2.97   787.03   788.37    0.17
-p1e    482.99   491.72    1.81   788.73   791.12    0.30
-p1m    314.46   322.68    2.61   790.61   794.60    0.50
-p2     580.58   591.20    1.83   715.05   718.12    0.43
-p2e    347.07   348.35    0.37   714.52   719.10    0.64
-p2m    203.02   206.26    1.60   715.99   718.71    0.38
-p3     301.25   306.83    1.85   697.20   703.97    0.97
-p3e    241.91   244.79    1.19   697.88   702.06    0.60
-p3m    131.89   133.65    1.33   699.12   701.40    0.33
-p4     183.23   186.54    1.81   650.44   657.64    1.11
-p4e    158.75   160.61    1.17   651.01   658.40    1.14
-p4m     82.49    83.73    1.50   651.21   656.84    0.86
---------------------------------------------------------
Speed as multiple of realtime playback.

And to illustrate the speed disadvantage of the 64-bit version:
Preset  Enco-Speed                Deco-Speed
---------------------------------------------------------
        32 bit   64 bit   Win %   32 bit   64 bit   Win %
---------------------------------------------------------
-p0     893.94   827.49   -7.43   810.57   700.80  -13.54
-p0e    711.16   661.05   -7.05   817.09   709.15  -13.21
-p0m    425.54   398.79   -6.29   819.19   710.48  -13.27
-p1     748.04   698.29   -6.65   788.37   689.93  -12.49
-p1e    491.72   461.79   -6.09   791.12   693.09  -12.39
-p1m    322.68   303.56   -5.93   794.60   695.48  -12.47
-p2     591.20   557.27   -5.74   718.12   634.55  -11.64
-p2e    348.35   336.29   -3.46   719.10   633.90  -11.85
-p2m    206.26   193.64   -6.12   718.71   636.49  -11.44
-p3     306.83   287.58   -6.27   703.97   619.50  -12.00
-p3e    244.79   233.77   -4.50   702.06   620.18  -11.66
-p3m    133.65   123.27   -7.77   701.40   621.06  -11.45
-p4     186.54   172.30   -7.63   657.64   577.94  -12.12
-p4e    160.61   145.73   -9.26   658.40   579.34  -12.01
-p4m     83.73    76.29   -8.89   656.84   578.45  -11.93
---------------------------------------------------------
Speed as multiple of realtime playback.
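
The "Win %" columns in the tables above appear to be the plain relative speed change between the two versions being compared. A minimal sketch of that arithmetic, assuming this is how the column was computed (the function name is made up for illustration; the speeds are copied from the tables):

```python
def win_percent(old_speed: float, new_speed: float) -> float:
    """Relative speed change of new_speed over old_speed, in percent."""
    return (new_speed / old_speed - 1.0) * 100.0


# (old, new) speeds as multiples of realtime, taken from the tables above.
samples = {
    "-p0 encode, 2.3.2 vs 2.3.3": (868.72, 893.94),
    "-p4m encode, 2.3.2 vs 2.3.3": (82.49, 83.73),
    "-p0 encode, 32 bit vs 64 bit": (893.94, 827.49),
}

for label, (old, new) in samples.items():
    print(f"{label:30s} {win_percent(old, new):7.2f} %")
```

A negative value means the second column is slower than the first, as in the 64-bit comparison.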

Re: TAK 2.3.3

Reply #2
Future

The next release should add support for the AVX2 instruction set. I achieved encoding speed improvements of about 14 percent for preset -p4m on my primary system (Intel Skylake-based CPU), less for other presets. But the results on my secondary (Haswell-based) system were discouraging: a maximum improvement of 8 percent for presets -p4 and -p4e, and up to 23 percent slower encoding for -p2m, -p3m and -p4m!

Those presets make the most use of AVX2 instructions and should also benefit the most. But they seem to trigger the automatic downclocking mechanism of the CPU: AVX2 base and turbo frequencies are lower than the regular ones. This wouldn't hurt too much if the encoder mostly used AVX2 instructions, but that's not the case. I haven't profiled the code yet, but I would estimate that about 30 percent of the encoding time goes to AVX2 instructions. And this is not one continuous block; instead, blocks of x86/SSE2 and AVX2 instructions alternate.

That's bad, because it will cause many transitions between the different clock rates. During such transitions the speed can be much slower than the lower clock rate would suggest. After the last AVX2 instruction the lower clock rate is maintained for a considerable amount of time, therefore subsequent non-AVX2 instructions will also be executed more slowly.
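
To get a feeling for why roughly 30 percent AVX2 code can end up as a net loss, here is a deliberately crude back-of-the-envelope model. Only the 30 percent share comes from the text above; the per-clock speedup and the clock ratios are made-up illustration values, not measurements. The model assumes the lower clock applies to all code (because the blocks alternate too quickly) and ignores the additional transition penalties, so real behaviour may well be worse.

```python
def relative_speed(avx_fraction: float, avx_speedup: float, clock_ratio: float) -> float:
    """Speed of an AVX2 build relative to the baseline build (1.0 = equal).

    avx_fraction: share of baseline run time spent in code that gets vectorized
    avx_speedup:  assumed per-clock speedup of that code with AVX2
    clock_ratio:  AVX2 clock divided by regular clock, assumed here to apply
                  to ALL code, vectorized or not
    """
    # Baseline work is normalized to 1.0 time unit at the regular clock.
    work = (1.0 - avx_fraction) + avx_fraction / avx_speedup
    # A lower clock stretches the time needed for the same work.
    time_needed = work / clock_ratio
    return 1.0 / time_needed


# Hypothetical values: 30 % vectorizable time, AVX2 code twice as fast per clock.
for ratio in (1.00, 0.90, 0.85, 0.70):
    speed = relative_speed(0.30, 2.0, ratio)
    print(f"clock ratio {ratio:.2f}: {speed:.3f}x of baseline speed")
```

In this toy model the whole gain is gone once the clock ratio drops to the fraction of work that remains after vectorization, which is why a low-power part with a large gap between regular and AVX2 clocks is a plausible worst case.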

Well, my Haswell CPU is a 35 W low-power quad core, quite a challenge for an older desktop microarchitecture. The difference between regular and AVX2 clock is most likely considerably bigger than for the common 65 W+ CPUs.

Nevertheless, I am really hesitant to release an AVX2 version which will make encoding slower on an unknown number of older systems. And IMHO the possible advantage isn't big enough to justify an elaborate study and implementation of a CPU-dependent code path.

Currently it's not clear what I will do next. Possibly I will try to improve the encoding speed through algorithmic modifications. Ktf's latest "Lossless codec comparison" also made me think about the (re-)introduction of higher predictor counts.

Features for later versions:

  • Port to Lazarus / Free Pascal. Nice for Linux support.
  • Fast integrity check without decoding, based upon the checksums only.
  • Transcode mode.
  • Tuning of the encoder for the problem files which have been reported in the past months.
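
The idea behind a checksum-only integrity check can be sketched as follows. This is not TAK's actual container layout, which is not described here; the frame format below (4-byte length, payload, 4-byte CRC32) and both function names are invented purely for illustration. The point is that each frame's stored checksum can be compared against the compressed bytes without running the decoder at all.

```python
import struct
import zlib


def pack_frame(payload: bytes) -> bytes:
    """Hypothetical frame: big-endian length, payload, CRC32 of the payload."""
    return struct.pack(">I", len(payload)) + payload + struct.pack(">I", zlib.crc32(payload))


def check_stream(stream: bytes) -> bool:
    """Verify every frame checksum without decoding any audio."""
    pos = 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            return False  # truncated length field
        (length,) = struct.unpack_from(">I", stream, pos)
        end = pos + 4 + length
        if end + 4 > len(stream):
            return False  # truncated payload or checksum
        (stored,) = struct.unpack_from(">I", stream, end)
        if zlib.crc32(stream[pos + 4:end]) != stored:
            return False  # payload does not match its stored checksum
        pos = end + 4
    return True


stream = pack_frame(b"compressed frame one") + pack_frame(b"compressed frame two")
damaged = bytearray(stream)
damaged[5] ^= 0xFF  # flip one payload byte
print(check_stream(stream), check_stream(bytes(damaged)))
```

Such a check only proves that the file is undamaged since encoding; it cannot tell whether the decoded audio matches the original source.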

Re: TAK 2.3.3

Reply #3
Thnx for the new release. :)

Re: TAK 2.3.3

Reply #4
Quote
Currently it's not clear what i will do next.

Your website is slightly out-of-date, listing the following items
Quote
    Support for Unicode character sets.
    A German-language version.
    A bit more speed and compression efficiency...
    Applications for platforms other than Windows.
    Support for more than 6 audio channels.

  • For FLAC I think there is still a bit to gain by improving the quantization of predictor coefficients, but it seems this is not applicable to TAK, as it doesn't store raw predictor coefficients like FLAC does. I'm sharing the idea just in case it does make sense.
  • You could consider switching from MD5 to something faster. I've been told most SHA checksums are faster simply because they don't have a long dependency chain and can more efficiently use the superscalar properties and out-of-order execution capabilities of modern CPUs. (That would obviously break backward compatibility, but only for checksumming.) For FLAC, checksumming is quite a significant part of the decoding CPU load (and of encoding for the fastest presets), so I imagine this is also the case for TAK.
  • You could consider looking into the arithmetic coder used in Daala/AV1. That is an arithmetic coder that was specifically designed to evade any existing patents. I don't know whether it is fast enough for your liking.
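
Whether a SHA variant really beats MD5 is easy to measure on a given machine. Below is a small, hedged benchmark sketch using Python's standard hashlib; the function name, buffer size and repeat count are arbitrary choices. The outcome depends heavily on the CPU (for example, hardware SHA extensions) and on the crypto library behind hashlib, so no result is claimed here.

```python
import hashlib
import time


def throughput_mb_s(name: str, data: bytes, repeats: int = 5) -> float:
    """Best-of-N hashing throughput in MB/s for the named hashlib algorithm."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


if __name__ == "__main__":
    # 16 MiB of zeros; hash speed does not depend on the content.
    buffer = bytes(16 * 1024 * 1024)
    for algorithm in ("md5", "sha1", "sha256"):
        print(f"{algorithm:8s} {throughput_mb_s(algorithm, buffer):10.1f} MB/s")
```

Comparing the numbers against the codec's own decoding speed on the same machine would show how much of the decoding time a checksum could plausibly account for.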

Of course I don't know where you would like TAK to go from here. Would you still like to make (big) changes/additions to the format, or do you want to keep things backwards-compatible?

I know you've been talking about open-sourcing, but I can imagine this is a big step. If you'd like to see TAK gain more users, you could consider contributing a bare-essentials TAK encoder to ffmpeg, for example. I would imagine that even a simple TAK encoder, without all the specific tuning and tweaks, would already beat FLAC with ease. Or, instead of open-sourcing the software, you could open up the format by creating a document describing the structure and sharing the ideas you used. Maybe someone else will then do it for you (like the ffmpeg guys did with WavPack).

Please don't feel offended or pressured to do anything, I just wanted to contribute a few ideas.
Music: sounds arranged such that they construct feelings.


Re: TAK 2.3.3

Reply #6
Thank you Thomas for the new release! Your details about efforts and insights are always a pleasure to read too.

@Future:
It is of course welcome that you keep CPUs of the last decade in view; I guess most of us run those. According to https://en.wikipedia.org/wiki/Advanced_Vector_Extensions, the CPU generations of the last 5-7 years seem to have AVX2 support, and newer CPUs are targeting the next generation, AVX-512.
My CPU does not have AVX2; my next one, within the next 12 months, will. Nevertheless, I personally start encoding and do not watch the progress bar. I rarely care about a few percent of encoding speed: the current speed range is already at an awesome level, so this is not the driving factor for me.
For whom might this be an 'issue' or a real downside at all (use cases, scenarios, ...)? That group will probably get smaller and smaller anyway.
IMHO it makes sense to really go for the newer instruction set with the next version, which I guess would then be a minor version, 2.4. And in the almost impossible case that a bug fix is needed, release another 2.3.x patch version of non-AVX2 TAK.

Also like ktf's thoughts.

Re: TAK 2.3.3

Reply #7
Quote
You could consider switching from MD5 to something faster. I've been told most SHA checksums are faster simply because they don't have a long dependency chain and can more efficiently use the superscalar properties and out-of-order execution capabilities of modern CPUs. (That would obviously break backward compatibility, but only for checksumming) For FLAC checksumming is quite a significant part of decoding CPU load (and encoding for the fastest presets), so I imagine this is also the case for TAK.

The "Fast integrity check without decoding based upon the checksums only" would resolve most of that [note1]. So much so, I think, that the downside of replacing MD5 outweighs the benefits.

In TAK and WavPack, MD5 is optional and disabled by default - in WavPack it is viewed more as a fingerprint, and even more so after having implemented the non-decoding integrity check.
From that point of view, where MD5 is an optional fingerprint, I think it is a great advantage to have the same [note2] algorithm across FLAC/TAK/WavPack(/OptimFROG if anyone cares):
 * if you want something quicker, then use the default; integrity verification will be faster than decoding anyway
 * if you want a checksum as a fingerprint, you presumably want the one that everyone uses (say, if you transcode: yep, every MD5 appears precisely twice, that is source and target; and if you want to use say foo_bitcompare in the end to be sure, just sort source file list and target file list by MD5).
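
The "every MD5 appears precisely twice" check for a transcode can be sketched like this. It assumes you have already collected the stored MD5 fingerprints of the source and target files as hex strings (how you read them out of each format is not shown); the function names are made up for illustration.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def md5_hex(pcm: bytes) -> str:
    """MD5 fingerprint of raw PCM data as a hex string."""
    return hashlib.md5(pcm).hexdigest()


def mismatched_fingerprints(source: Iterable[str], target: Iterable[str]) -> List[str]:
    """Fingerprints that do not occur equally often in source and target.

    Comparing counts (rather than demanding exactly one per side) is a
    deliberate choice so that genuine duplicate tracks are not flagged.
    """
    src, tgt = Counter(source), Counter(target)
    return sorted(h for h in src.keys() | tgt.keys() if src[h] != tgt[h])


# Stand-in PCM data for three tracks; the last one went missing in the target.
tracks = [b"track one pcm", b"track two pcm", b"track three pcm"]
source_md5s = [md5_hex(t) for t in tracks]
target_md5s = [md5_hex(t) for t in tracks[:2]]
print(mismatched_fingerprints(source_md5s, target_md5s))
```

An empty result means the two lists pair up completely, which only works across codecs because they all use the same fingerprint algorithm.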

[note1]: Here is the exception. If one wants to verify not only integrity, but to check an encoding against the original PCM, then one could make a checksum of the PCM, encode, decode and check against the checksum. Then two checksums are calculated and a slow algorithm is a penalty - that would be "unnecessary" if the MD5 is not written to the file, never again to be used.
Now is it worth it to implement a second algorithm for the cases where MD5 is not stored?
(Even if MD5 has to be calculated once to be stored, a more than twice as fast algorithm would save time. But is it worth it?)
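
The verification described in note1 (checksum the PCM, encode, decode, check again) looks roughly like this. zlib is used here only as a stand-in lossless codec, and the function name and the `algorithm` parameter are illustrative assumptions; the sketch just makes visible that the checksum is computed twice, which is where a slow algorithm costs time.

```python
import hashlib
import zlib
from typing import Callable


def verify_roundtrip(
    pcm: bytes,
    encode: Callable[[bytes], bytes],
    decode: Callable[[bytes], bytes],
    algorithm: str = "md5",
) -> bool:
    """Return True if decode(encode(pcm)) has the same checksum as pcm."""
    before = hashlib.new(algorithm, pcm).digest()  # first checksum pass
    encoded = encode(pcm)
    after = hashlib.new(algorithm, decode(encoded)).digest()  # second pass
    return before == after


pcm = bytes(range(256)) * 64  # stand-in for raw PCM samples
print(verify_roundtrip(pcm, zlib.compress, zlib.decompress))
print(verify_roundtrip(pcm, zlib.compress, zlib.decompress, algorithm="sha1"))
```

Since nothing is stored in the file in this scenario, the algorithm can be swapped freely without any compatibility concerns.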

[note2]: except, well, codecs differ on how to calculate MD5 on 8-bit signals, those being unsigned. Who has a big collection of compressed 8-bit .wav files?
Name? – Владимир Владимирович Путин
Occupation? – Nonono, just a couple of days' vacation!

Re: TAK 2.3.3

Reply #8
Quote
You could consider looking into the arithmetic coder used in Daala/AV1. That is an arithmetic coder that was specifically designed to evade any existing patents. I don't know whether that is fast enough for your liking

That hasn't stopped Microsoft from patenting the rANS entropy method, regardless.

Which to me leaves us in the exact same situation as if we use ordinary arithmetic coding. Might as well just use that or range coding. :/