HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: rdtsh on 2023-12-10 01:23:41

Title: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2023-12-10 01:23:41
https://github.com/stseelig/libttaR
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-12 17:28:18
good news, bad news

good news:
i've been working on a multi-threaded version. the encoder is already done in the 1.1-dev branch

bad news:
the tta wikipedia page was deleted
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-21 05:28:38
1.1 is now nearly finished. There is just some minor stuff for me to get at. I plan to officially release it on the first of next month.

here is a benchmark against ffmpeg:

system:
   Linux 5.10.0-23-amd64 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
CPU:
   AMD Ryzen 7 1700 64-bit 8-Core MT MCP 1550/3750 MHz
ffmpeg:
   ffmpeg version 4.3.6-0+deb11u1
   built with gcc 10 (Debian 10.2.1-6)
   libavutil      56. 51.100 / 56. 51.100
   libavcodec     58. 91.100 / 58. 91.100
   libavformat    58. 45.100 / 58. 45.100
   libavdevice    58. 10.100 / 58. 10.100
   libavfilter     7. 85.100 /  7. 85.100
   libavresample   4.  0.  0 /  4.  0.  0
   libswscale      5.  7.100 /  5.  7.100
   libswresample   3.  7.100 /  3.  7.100
   libpostproc    55.  7.100 / 55.  7.100
ttaR:
   Debian clang version 11.0.1-2
   -march=native -mtune=native
file:
   Nirvana - MTV Unplugged in New York
   571084460   mtv.wav
   337981896   mtv.tta

##############################################################################

 Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.wav -f tta /dev/null -y' (20 runs):

          8,925.80 msec task-clock                #    1.000 CPUs utilized            ( +-  0.06% )
                60      context-switches          #    0.007 K/sec                    ( +-  5.24% )
                16      cpu-migrations            #    0.002 K/sec                    ( +-  2.69% )
            86,798      page-faults               #    0.010 M/sec                    ( +-  0.00% )
    33,302,059,704      cycles                    #    3.731 GHz                      ( +-  0.05% )  (83.33%)
       566,286,522      stalled-cycles-frontend   #    1.70% frontend cycles idle     ( +-  0.54% )  (83.33%)
     3,988,341,464      stalled-cycles-backend    #   11.98% backend cycles idle      ( +-  0.18% )  (83.33%)
    65,718,741,261      instructions              #    1.97  insn per cycle
                                                  #    0.06  stalled cycles per insn  ( +-  0.00% )  (83.33%)
     8,702,525,883      branches                  #  974.986 M/sec                    ( +-  0.01% )  (83.34%)
       384,955,890      branch-misses             #    4.42% of all branches          ( +-  0.03% )  (83.33%)

           8.92647 +- 0.00557 seconds time elapsed  ( +-  0.06% )


 Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.tta -f s16le /dev/null -y' (20 runs):

          6,504.99 msec task-clock                #    1.000 CPUs utilized            ( +-  0.13% )
                55      context-switches          #    0.008 K/sec                    ( +-  2.46% )
                16      cpu-migrations            #    0.002 K/sec                    ( +-  2.49% )
             3,692      page-faults               #    0.567 K/sec                    ( +-  0.04% )
    24,289,757,929      cycles                    #    3.734 GHz                      ( +-  0.13% )  (83.31%)
       327,298,120      stalled-cycles-frontend   #    1.35% frontend cycles idle     ( +-  0.15% )  (83.31%)
     3,454,321,337      stalled-cycles-backend    #   14.22% backend cycles idle      ( +-  0.20% )  (83.33%)
    67,039,367,957      instructions              #    2.76  insn per cycle
                                                  #    0.05  stalled cycles per insn  ( +-  0.00% )  (83.35%)
     8,499,223,928      branches                  # 1306.569 M/sec                    ( +-  0.00% )  (83.36%)
       313,478,808      branch-misses             #    3.69% of all branches          ( +-  0.01% )  (83.34%)

           6.50534 +- 0.00866 seconds time elapsed  ( +-  0.13% )


 Performance counter stats for 'ffmpeg -loglevel quiet -threads 16 -i mtv.tta -f s16le /dev/null -y' (320 runs):

          9,947.58 msec task-clock                #   12.309 CPUs utilized            ( +-  0.09% )
             4,450      context-switches          #    0.447 K/sec                    ( +-  0.37% )
             1,041      cpu-migrations            #    0.105 K/sec                    ( +-  1.04% )
             8,380      page-faults               #    0.842 K/sec                    ( +-  0.00% )
    36,296,805,088      cycles                    #    3.649 GHz                      ( +-  0.02% )  (83.37%)
       562,776,161      stalled-cycles-frontend   #    1.55% frontend cycles idle     ( +-  0.15% )  (83.13%)
     5,056,980,289      stalled-cycles-backend    #   13.93% backend cycles idle      ( +-  0.02% )  (83.21%)
    67,127,958,734      instructions              #    1.85  insn per cycle
                                                  #    0.08  stalled cycles per insn  ( +-  0.00% )  (83.30%)
     8,518,827,278      branches                  #  856.372 M/sec                    ( +-  0.00% )  (83.43%)
       317,537,726      branch-misses             #    3.73% of all branches          ( +-  0.00% )  (83.57%)

           0.80817 +- 0.00137 seconds time elapsed  ( +-  0.17% )

##############################################################################

 Performance counter stats for 'ttaR encode -q -S mtv.wav -o /dev/null' (20 runs):

          5,697.67 msec task-clock                #    1.000 CPUs utilized            ( +-  0.10% )
                13      context-switches          #    0.002 K/sec                    ( +- 14.11% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +- 31.26% )
               252      page-faults               #    0.044 K/sec                    ( +-  0.10% )
    21,277,212,273      cycles                    #    3.734 GHz                      ( +-  0.08% )  (83.32%)
       424,132,363      stalled-cycles-frontend   #    1.99% frontend cycles idle     ( +-  1.19% )  (83.32%)
     8,164,992,255      stalled-cycles-backend    #   38.37% backend cycles idle      ( +-  0.17% )  (83.33%)
    50,036,808,248      instructions              #    2.35  insn per cycle        
                                                  #    0.16  stalled cycles per insn  ( +-  0.00% )  (83.34%)
     3,929,444,278      branches                  #  689.658 M/sec                    ( +-  0.01% )  (83.36%)
       371,064,057      branch-misses             #    9.44% of all branches          ( +-  0.01% )  (83.34%)

           5.69822 +- 0.00548 seconds time elapsed  ( +-  0.10% )


 Performance counter stats for 'ttaR encode -q -t16 mtv.wav -o /dev/null' (320 runs):

          8,177.73 msec task-clock                #   15.379 CPUs utilized            ( +-  0.02% )
             1,300      context-switches          #    0.159 K/sec                    ( +-  2.07% )
               203      cpu-migrations            #    0.025 K/sec                    ( +-  2.01% )
             5,834      page-faults               #    0.713 K/sec                    ( +-  0.04% )
    30,484,625,770      cycles                    #    3.728 GHz                      ( +-  0.02% )  (83.09%)
       439,889,964      stalled-cycles-frontend   #    1.44% frontend cycles idle     ( +-  0.38% )  (83.18%)
     4,980,516,390      stalled-cycles-backend    #   16.34% backend cycles idle      ( +-  0.04% )  (83.31%)
    50,068,984,657      instructions              #    1.64  insn per cycle        
                                                  #    0.10  stalled cycles per insn  ( +-  0.00% )  (83.46%)
     3,937,619,679      branches                  #  481.505 M/sec                    ( +-  0.00% )  (83.60%)
       380,192,849      branch-misses             #    9.66% of all branches          ( +-  0.00% )  (83.36%)

          0.531753 +- 0.000344 seconds time elapsed  ( +-  0.06% )


 Performance counter stats for 'ttaR decode -q -S mtv.tta -f raw -o /dev/null' (20 runs):

          5,807.10 msec task-clock                #    1.000 CPUs utilized            ( +-  0.12% )
                15      context-switches          #    0.003 K/sec                    ( +-  9.58% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 25.36% )
               207      page-faults               #    0.036 K/sec                    ( +-  0.10% )
    21,697,968,813      cycles                    #    3.736 GHz                      ( +-  0.12% )  (83.29%)
       409,444,598      stalled-cycles-frontend   #    1.89% frontend cycles idle     ( +-  0.06% )  (83.33%)
     8,348,408,571      stalled-cycles-backend    #   38.48% backend cycles idle      ( +-  0.30% )  (83.35%)
    53,311,974,255      instructions              #    2.46  insn per cycle        
                                                  #    0.16  stalled cycles per insn  ( +-  0.00% )  (83.35%)
     3,884,628,982      branches                  #  668.945 M/sec                    ( +-  0.00% )  (83.36%)
       370,081,414      branch-misses             #    9.53% of all branches          ( +-  0.01% )  (83.32%)

           5.80763 +- 0.00684 seconds time elapsed  ( +-  0.12% )


 Performance counter stats for 'ttaR decode -q -t16 mtv.tta -f raw -o /dev/null' (320 runs):

          8,396.63 msec task-clock                #   15.660 CPUs utilized            ( +-  0.02% )
             2,705      context-switches          #    0.322 K/sec                    ( +-  0.61% )
                70      cpu-migrations            #    0.008 K/sec                    ( +-  1.97% )
             4,820      page-faults               #    0.574 K/sec                    ( +-  0.00% )
    31,341,341,315      cycles                    #    3.733 GHz                      ( +-  0.02% )  (83.16%)
       404,497,658      stalled-cycles-frontend   #    1.29% frontend cycles idle     ( +-  1.11% )  (83.19%)
     4,983,844,918      stalled-cycles-backend    #   15.90% backend cycles idle      ( +-  0.03% )  (83.27%)
    53,368,851,871      instructions              #    1.70  insn per cycle        
                                                  #    0.09  stalled cycles per insn  ( +-  0.00% )  (83.37%)
     3,896,895,024      branches                  #  464.102 M/sec                    ( +-  0.00% )  (83.55%)
       378,556,224      branch-misses             #    9.71% of all branches          ( +-  0.00% )  (83.45%)

          0.536170 +- 0.000363 seconds time elapsed  ( +-  0.07% )
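
The elapsed times above can be boiled down to ratios. A minimal sketch in C, with the wall-clock figures copied from the perf output; the constant and function names are made up for illustration and are not part of ttaR or ffmpeg:

```c
#include <stdio.h>

/* Elapsed wall-clock seconds, copied from the perf output above. */
static const double FFMPEG_ENC_1T  = 8.92647;
static const double FFMPEG_DEC_1T  = 6.50534;
static const double FFMPEG_DEC_16T = 0.80817;
static const double TTAR_ENC_1T    = 5.69822;
static const double TTAR_ENC_16T   = 0.531753;
static const double TTAR_DEC_1T    = 5.80763;
static const double TTAR_DEC_16T   = 0.536170;

/* How many times faster the candidate finished than the baseline. */
static double speedup(double baseline_s, double candidate_s)
{
    return baseline_s / candidate_s;
}

/* Print ttaR-vs-ffmpeg ratios and each tool's 1-to-16-thread scaling. */
void print_speedups(void)
{
    printf("encode  1 thread,  ttaR vs ffmpeg: %.2fx\n",
           speedup(FFMPEG_ENC_1T, TTAR_ENC_1T));
    printf("decode  1 thread,  ttaR vs ffmpeg: %.2fx\n",
           speedup(FFMPEG_DEC_1T, TTAR_DEC_1T));
    printf("decode 16 threads, ttaR vs ffmpeg: %.2fx\n",
           speedup(FFMPEG_DEC_16T, TTAR_DEC_16T));
    printf("ttaR encode scaling,   1 -> 16 threads: %.2fx\n",
           speedup(TTAR_ENC_1T, TTAR_ENC_16T));
    printf("ttaR decode scaling,   1 -> 16 threads: %.2fx\n",
           speedup(TTAR_DEC_1T, TTAR_DEC_16T));
    printf("ffmpeg decode scaling, 1 -> 16 threads: %.2fx\n",
           speedup(FFMPEG_DEC_1T, FFMPEG_DEC_16T));
}
```

Calling `print_speedups()` from a `main` gives roughly 1.57x for single-threaded encode, 1.12x for single-threaded decode, and 1.51x for 16-thread decode in ttaR's favour. There is no 16-thread ffmpeg encode run above, so that ratio is left out.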

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-05-21 08:44:41
Using an old version of the ffmpeg cli tool for a performance comparison is flawed and extremely biased.
Also, the current ffmpeg cli tool has bad performance with smaller packets due to the clumsy MT work of an ex-developer.
Also, just running a generic build of ffmpeg for the first time takes extra time... Those are just a few points I wanted to emphasize to show how unfair and biased such a comparison is.
Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-05-21 08:49:03
Also, this repo does not have any SIMD x86 assembly, so the only way performance can go up is if the compiler is a modern clang.
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-21 21:30:09
"Using an old version of the ffmpeg cli tool for a performance comparison is flawed and extremely biased."
The tta codec version should be the same between the version I used and the current version, or at least the differences are negligible. You are always welcome to post your own benchmarks.

"Also, the current ffmpeg cli tool has bad performance with smaller packets due to the clumsy MT work of an ex-developer."
ok

"Also, just running a generic build of ffmpeg for the first time takes extra time"
That cannot be more than a millisecond.

"Also, this repo does not have any SIMD x86 assembly, so the only way performance can go up is if the compiler is a modern clang."
If anything, that is a virtue.
FYI, most of the performance increases came from optimizing the Rice coder, which is completely unSIMDable.
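
To illustrate why a Rice coder resists SIMD, here is a minimal generic Golomb-Rice encoder sketch in C. It is not libttaR's actual coder; `struct bitwriter`, `put_bits` and `rice_put` are hypothetical names for illustration. The point is that every code has a variable length, so the position where a code is written depends on all the codes before it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer for illustration; not libttaR's structure. */
struct bitwriter {
    uint8_t *buf;    /* output buffer, must start zero-filled */
    size_t   cap;    /* capacity of buf in bytes */
    size_t   bitpos; /* index of the next bit to write */
};

/* Append the low nbits of value, least significant bit first.
 * Returns 0 on success, -1 if the buffer is full. */
static int put_bits(struct bitwriter *bw, uint32_t value, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++) {
        if (bw->bitpos / 8 >= bw->cap)
            return -1;
        if ((value >> i) & 1u)
            bw->buf[bw->bitpos / 8] |= (uint8_t)(1u << (bw->bitpos % 8));
        bw->bitpos++;
    }
    return 0;
}

/* Rice code with parameter k (k must be below 32):
 * the quotient value >> k as that many 1-bits, a 0 terminator,
 * then the low k bits of value as the remainder. */
static int rice_put(struct bitwriter *bw, uint32_t value, unsigned k)
{
    uint32_t q = value >> k;

    for (uint32_t i = 0; i < q; i++) {
        if (put_bits(bw, 1u, 1) != 0)
            return -1;
    }
    if (put_bits(bw, 0u, 1) != 0)
        return -1;
    /* bw->bitpos now depends on q, which is only known after
     * looking at this value; the next call starts from here. */
    return put_bits(bw, value & ((1u << k) - 1u), k);
}
```

Because `bitpos` is carried from one call to the next, a batch of samples cannot simply be split across vector lanes. An adaptive coder that also updates `k` from earlier values adds a second serial dependency on top of that.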
Title: Re: libttaR (TTA rewrite part 2)
Post by: Porcus on 2024-05-21 23:37:11
smaller packets, wouldn't TTA be fixed at about a second - that's not small?

Anyway, ffmpeg 5 isn't the newest and hottest either ... though, compared to this article:
"the tta wikipedia page was deleted"
Been flagged as looking like an ad for twelve years ... and not too wrongfully either.
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-22 00:05:50
"smaller packets, wouldn't TTA be fixed at about a second - that's not small?"
yes, with CD quality audio, the decoded frame size is 180 KiB.
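
The 180 KiB figure can be checked with a little arithmetic. A sketch in C, assuming the usual TTA frame length of 256 * sample_rate / 245 samples per channel (about 1.04 seconds); the function names are made up for illustration and are not libttaR's API:

```c
#include <stdint.h>

/* TTA frame length in samples per channel, assuming the usual
 * 256 * sample_rate / 245 rule (integer division). */
static uint32_t tta_frame_samples(uint32_t sample_rate)
{
    return (uint32_t)(((uint64_t)256 * sample_rate) / 245);
}

/* Size in bytes of one full frame of decoded PCM. */
static uint64_t tta_frame_bytes(uint32_t sample_rate, unsigned channels,
                                unsigned bytes_per_sample)
{
    return (uint64_t)tta_frame_samples(sample_rate)
         * channels * bytes_per_sample;
}
```

For CD quality audio (44100 Hz, 2 channels, 2 bytes per sample) that is 46080 samples per channel, or 46080 * 2 * 2 = 184320 bytes, which is exactly 180 KiB.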
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-06-01 21:36:15
1.1 officially released

lib:
    api changes
    performance improvements

cli:
    multi-threading
    faster single-threading
    some bug fixes
Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 00:11:21
Still extremely misleading and biased; proper benchmarks are done by decoding very long audio files and looking at the speed of decoding versus realtime. Also, TTA is more irrelevant and niche than TAK, and TAK actually did have some cool new innovative stuff to offer; TTA has nothing new to offer.
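
For reference, the numbers already posted in this thread can be turned into a speed-versus-realtime figure. A sketch in C, assuming mtv.wav is CD quality (44100 Hz, 16-bit, stereo; the thread does not state this) and ignoring the small WAV header; the function names are made up for illustration:

```c
#include <stdint.h>

/* Duration in seconds of raw PCM audio of the given size.
 * Treats every byte as sample data, so a WAV header of a few
 * dozen bytes makes the result very slightly too long. */
static double pcm_duration_seconds(uint64_t data_bytes, uint32_t sample_rate,
                                   unsigned channels,
                                   unsigned bytes_per_sample)
{
    double bytes_per_second =
        (double)sample_rate * channels * bytes_per_sample;
    return (double)data_bytes / bytes_per_second;
}

/* How many seconds of audio are processed per second of wall-clock time. */
static double realtime_factor(double audio_seconds, double elapsed_seconds)
{
    return audio_seconds / elapsed_seconds;
}
```

Under those assumptions the 571084460-byte file holds about 3237 seconds of audio (just under 54 minutes), so the single-threaded ttaR decode at 5.80763 seconds elapsed works out to roughly 557x realtime.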
Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 00:13:44
"smaller packets, wouldn't TTA be fixed at about a second - that's not small?"
"yes, with CD quality audio, the framesize is 180KiB."

Not the decoded frame size, but the encoded ones as stored in the final output. If the audio is mostly silence, it will slow things down, unless TTA encodes silence frames extremely inefficiently, which may be true after all considering its poor design decisions.
Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-06-02 04:04:33
"Not the decoded frame size, but the encoded ones as stored in the final output. If the audio is mostly silence, it will slow things down, unless TTA encodes silence frames extremely inefficiently, which may be true after all considering its poor design decisions."
so ffmpeg gets slower when it has to read less? what a well designed piece of software
and yes, tta is not that efficient with silence compared to most other codecs
Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 09:29:08
Indeed, 60 seconds of silence (44100Hz S16) encoded with ffmpeg: tta is 654K and flac is 17K.
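
A rough sanity check on the TTA number, as a sketch in C. It assumes the test file was stereo (not stated above) and that TTA codes every sample with a plain Rice code and no run-length or silence mode; a Rice code needs at least one bit per sample for its unary terminator. The function name is made up for illustration:

```c
#include <stdint.h>

/* Lower bound in bytes for Rice-coded audio when every sample
 * costs at least one bit (the unary terminator) and there is
 * no run-length or silence mode. Headers are not counted. */
static uint64_t rice_floor_bytes(uint32_t seconds, uint32_t sample_rate,
                                 unsigned channels)
{
    uint64_t samples = (uint64_t)seconds * sample_rate * channels;
    return (samples + 7) / 8; /* one bit per sample, rounded up to bytes */
}
```

For 60 seconds at 44100 Hz stereo this floor is 661500 bytes, about 646 KiB, which is in the same range as the 654K reported. FLAC can go far below that because it is not tied to one bit per sample for constant blocks.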
Title: Re: libttaR (TTA rewrite part 2)
Post by: Porcus on 2024-06-02 12:09:22
I tested the compression of silence: https://hydrogenaud.io/index.php/topic,122413.0.html .  (Post says "5.1" but it was five channels.)
TTA isn't that bad for a codec that doesn't handle wasted bits - compare to Monkey's and ALAC.
I could get smaller files by manually setting block sizes. 29k with ALS, 70k with FLAC, and 220k with WavPack --pair-unassigned-chans (edit: fixed numbers)

Anyway, don't whine over people who make a little code-improvement project. The TTA reference implementation isn't good.
As for the format itself, there is one thing to it, that sometimes is useful in some applications: the encoded bit stream is unique. (Bar the encryption feature, which I haven't seen anything being able to play except command-line ffplay.) Would make it ideal for p2p distribution.
Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 12:39:49
I do not whine over people, I whine over their claims.