Topic: libttaR (TTA rewrite part 2)

Re: libttaR (TTA rewrite part 2)

Reply #1
Good news, bad news

Good news:
I've been working on a multi-threaded version. The encoder is already done in the 1.1-dev branch.

Bad news:
The TTA Wikipedia page was deleted.
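A multi-threaded TTA encoder can rely on the fact that, as I understand the TTA1 format, every frame (256/245 s of audio, which is 46080 samples at 44.1 kHz) is coded independently. Frames can therefore be handed to worker threads, and the results are written back in frame order. The C sketch below shows only that scheduling idea with POSIX threads. `encode_frame` is a hypothetical placeholder that just hashes the samples, the mono `int32_t` input is a simplification, and none of this is libttaR's actual code or API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* TTA1 frame length in samples per channel: 256/245 seconds of audio. */
static size_t tta_frame_len(uint32_t sample_rate)
{
    return (size_t) ((uint64_t) sample_rate * 256u / 245u);
}

/* Hypothetical stand-in for a real frame encoder (here: an FNV-1a hash).
 * It reads only its own frame's samples, which is what makes frames
 * safe to process in parallel. */
static uint32_t encode_frame(const int32_t *pcm, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; ++i) {
        h ^= (uint32_t) pcm[i];
        h *= 16777619u;
    }
    return h;
}

struct job {
    const int32_t *pcm;  /* whole input signal (mono for simplicity) */
    size_t nsamples;
    size_t frame_len;
    size_t nframes;
    uint32_t *out;       /* one result slot per frame */
    size_t first;        /* first frame this worker handles */
    size_t stride;       /* total number of workers */
};

/* Each worker takes frames first, first + stride, first + 2*stride, ... */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (size_t f = j->first; f < j->nframes; f += j->stride) {
        size_t start = f * j->frame_len;
        size_t left = j->nsamples - start;
        size_t len = left < j->frame_len ? left : j->frame_len;
        j->out[f] = encode_frame(j->pcm + start, len);
    }
    return NULL;
}

/* Encodes all frames with `nthreads` workers; out[] is always in frame
 * order, whichever thread finishes first. Returns 0 on success. */
static int encode_parallel(const int32_t *pcm, size_t nsamples,
                           size_t frame_len, uint32_t *out, size_t nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    size_t nframes, started;
    int rc = 0;

    assert(frame_len > 0 && nthreads > 0 && nthreads <= MAX_THREADS);
    nframes = (nsamples + frame_len - 1) / frame_len;
    for (started = 0; started < nthreads; ++started) {
        jobs[started] = (struct job){ pcm, nsamples, frame_len, nframes,
                                      out, started, nthreads };
        if (pthread_create(&tid[started], NULL, worker, &jobs[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t t = 0; t < started; ++t)
        pthread_join(tid[t], NULL);
    return rc;
}
```

Each worker writes only to the output slots of its own frames, so no locking is needed, and the result does not depend on the thread count. A real encoder would likely also want per-thread scratch buffers and a bounded queue so it does not hold the whole file in memory.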

Re: libttaR (TTA rewrite part 2)

Reply #2
1.1 is now nearly finished; there are just a few minor things left for me to take care of. I plan to officially release it on the first of next month.

Here is a benchmark against ffmpeg:

system:
   Linux 5.10.0-23-amd64 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
CPU:
   AMD Ryzen 7 1700 64-bit 8-Core MT MCP 1550/3750 MHz
ffmpeg:
   ffmpeg version 4.3.6-0+deb11u1
   built with gcc 10 (Debian 10.2.1-6)
   libavutil      56. 51.100 / 56. 51.100
   libavcodec     58. 91.100 / 58. 91.100
   libavformat    58. 45.100 / 58. 45.100
   libavdevice    58. 10.100 / 58. 10.100
   libavfilter     7. 85.100 /  7. 85.100
   libavresample   4.  0.  0 /  4.  0.  0
   libswscale      5.  7.100 /  5.  7.100
   libswresample   3.  7.100 /  3.  7.100
   libpostproc    55.  7.100 / 55.  7.100
ttaR:
   Debian clang version 11.0.1-2
   -march=native -mtune=native
file:
   Nirvana - MTV Unplugged in New York
   571084460   mtv.wav
   337981896   mtv.tta
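A little arithmetic turns the file sizes above into a compression ratio and a playing time. The C helpers below assume the WAV is 16-bit stereo PCM at 44.1 kHz behind a canonical 44-byte header. The post states neither, so treat the duration as an estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Compressed size as a fraction of the original (lower is better). */
static double compression_ratio(uint64_t original_bytes, uint64_t compressed_bytes)
{
    assert(original_bytes > 0);
    return (double) compressed_bytes / (double) original_bytes;
}

/* Seconds of PCM audio in a WAV file.
 * Assumption: a canonical 44-byte header with no extra RIFF chunks. */
static double wav_seconds(uint64_t wav_bytes, uint32_t rate,
                          uint32_t channels, uint32_t bytes_per_sample)
{
    const uint64_t header_bytes = 44;
    assert(wav_bytes >= header_bytes && rate > 0 && channels > 0
           && bytes_per_sample > 0);
    return (double) (wav_bytes - header_bytes)
           / ((double) rate * channels * bytes_per_sample);
}
```

`compression_ratio(571084460, 337981896)` is about 0.592, meaning the TTA file is roughly 59% of the size of the WAV. Under the assumptions above, `wav_seconds(571084460, 44100, 2, 2)` comes to about 3237.4 s, just under 54 minutes.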

##############################################################################

 Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.wav -f tta /dev/null -y' (20 runs):

          8,925.80 msec task-clock                #    1.000 CPUs utilized            ( +-  0.06% )
                60      context-switches          #    0.007 K/sec                    ( +-  5.24% )
                16      cpu-migrations            #    0.002 K/sec                    ( +-  2.69% )
            86,798      page-faults               #    0.010 M/sec                    ( +-  0.00% )
    33,302,059,704      cycles                    #    3.731 GHz                      ( +-  0.05% )  (83.33%)
       566,286,522      stalled-cycles-frontend   #    1.70% frontend cycles idle     ( +-  0.54% )  (83.33%)
     3,988,341,464      stalled-cycles-backend    #   11.98% backend cycles idle      ( +-  0.18% )  (83.33%)
    65,718,741,261      instructions              #    1.97  insn per cycle
                                                  #    0.06  stalled cycles per insn  ( +-  0.00% )  (83.33%)
     8,702,525,883      branches                  #  974.986 M/sec                    ( +-  0.01% )  (83.34%)
       384,955,890      branch-misses             #    4.42% of all branches          ( +-  0.03% )  (83.33%)

           8.92647 +- 0.00557 seconds time elapsed  ( +-  0.06% )


 Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.tta -f s16le /dev/null -y' (20 runs):

          6,504.99 msec task-clock                #    1.000 CPUs utilized            ( +-  0.13% )
                55      context-switches          #    0.008 K/sec                    ( +-  2.46% )
                16      cpu-migrations            #    0.002 K/sec                    ( +-  2.49% )
             3,692      page-faults               #    0.567 K/sec                    ( +-  0.04% )
    24,289,757,929      cycles                    #    3.734 GHz                      ( +-  0.13% )  (83.31%)
       327,298,120      stalled-cycles-frontend   #    1.35% frontend cycles idle     ( +-  0.15% )  (83.31%)
     3,454,321,337      stalled-cycles-backend    #   14.22% backend cycles idle      ( +-  0.20% )  (83.33%)
    67,039,367,957      instructions              #    2.76  insn per cycle
                                                  #    0.05  stalled cycles per insn  ( +-  0.00% )  (83.35%)
     8,499,223,928      branches                  # 1306.569 M/sec                    ( +-  0.00% )  (83.36%)
       313,478,808      branch-misses             #    3.69% of all branches          ( +-  0.01% )  (83.34%)

           6.50534 +- 0.00866 seconds time elapsed  ( +-  0.13% )


 Performance counter stats for 'ffmpeg -loglevel quiet -threads 16 -i mtv.tta -f s16le /dev/null -y' (320 runs):

          9,947.58 msec task-clock                #   12.309 CPUs utilized            ( +-  0.09% )
             4,450      context-switches          #    0.447 K/sec                    ( +-  0.37% )
             1,041      cpu-migrations            #    0.105 K/sec                    ( +-  1.04% )
             8,380      page-faults               #    0.842 K/sec                    ( +-  0.00% )
    36,296,805,088      cycles                    #    3.649 GHz                      ( +-  0.02% )  (83.37%)
       562,776,161      stalled-cycles-frontend   #    1.55% frontend cycles idle     ( +-  0.15% )  (83.13%)
     5,056,980,289      stalled-cycles-backend    #   13.93% backend cycles idle      ( +-  0.02% )  (83.21%)
    67,127,958,734      instructions              #    1.85  insn per cycle
                                                  #    0.08  stalled cycles per insn  ( +-  0.00% )  (83.30%)
     8,518,827,278      branches                  #  856.372 M/sec                    ( +-  0.00% )  (83.43%)
       317,537,726      branch-misses             #    3.73% of all branches          ( +-  0.00% )  (83.57%)

           0.80817 +- 0.00137 seconds time elapsed  ( +-  0.17% )

##############################################################################

 Performance counter stats for 'ttaR encode -q -S mtv.wav -o /dev/null' (20 runs):

          5,697.67 msec task-clock                #    1.000 CPUs utilized            ( +-  0.10% )
                13      context-switches          #    0.002 K/sec                    ( +- 14.11% )
                 0      cpu-migrations            #    0.000 K/sec                    ( +- 31.26% )
               252      page-faults               #    0.044 K/sec                    ( +-  0.10% )
    21,277,212,273      cycles                    #    3.734 GHz                      ( +-  0.08% )  (83.32%)
       424,132,363      stalled-cycles-frontend   #    1.99% frontend cycles idle     ( +-  1.19% )  (83.32%)
     8,164,992,255      stalled-cycles-backend    #   38.37% backend cycles idle      ( +-  0.17% )  (83.33%)
    50,036,808,248      instructions              #    2.35  insn per cycle        
                                                  #    0.16  stalled cycles per insn  ( +-  0.00% )  (83.34%)
     3,929,444,278      branches                  #  689.658 M/sec                    ( +-  0.01% )  (83.36%)
       371,064,057      branch-misses             #    9.44% of all branches          ( +-  0.01% )  (83.34%)

           5.69822 +- 0.00548 seconds time elapsed  ( +-  0.10% )


 Performance counter stats for 'ttaR encode -q -t16 mtv.wav -o /dev/null' (320 runs):

          8,177.73 msec task-clock                #   15.379 CPUs utilized            ( +-  0.02% )
             1,300      context-switches          #    0.159 K/sec                    ( +-  2.07% )
               203      cpu-migrations            #    0.025 K/sec                    ( +-  2.01% )
             5,834      page-faults               #    0.713 K/sec                    ( +-  0.04% )
    30,484,625,770      cycles                    #    3.728 GHz                      ( +-  0.02% )  (83.09%)
       439,889,964      stalled-cycles-frontend   #    1.44% frontend cycles idle     ( +-  0.38% )  (83.18%)
     4,980,516,390      stalled-cycles-backend    #   16.34% backend cycles idle      ( +-  0.04% )  (83.31%)
    50,068,984,657      instructions              #    1.64  insn per cycle        
                                                  #    0.10  stalled cycles per insn  ( +-  0.00% )  (83.46%)
     3,937,619,679      branches                  #  481.505 M/sec                    ( +-  0.00% )  (83.60%)
       380,192,849      branch-misses             #    9.66% of all branches          ( +-  0.00% )  (83.36%)

          0.531753 +- 0.000344 seconds time elapsed  ( +-  0.06% )


 Performance counter stats for 'ttaR decode -q -S mtv.tta -f raw -o /dev/null' (20 runs):

          5,807.10 msec task-clock                #    1.000 CPUs utilized            ( +-  0.12% )
                15      context-switches          #    0.003 K/sec                    ( +-  9.58% )
                 1      cpu-migrations            #    0.000 K/sec                    ( +- 25.36% )
               207      page-faults               #    0.036 K/sec                    ( +-  0.10% )
    21,697,968,813      cycles                    #    3.736 GHz                      ( +-  0.12% )  (83.29%)
       409,444,598      stalled-cycles-frontend   #    1.89% frontend cycles idle     ( +-  0.06% )  (83.33%)
     8,348,408,571      stalled-cycles-backend    #   38.48% backend cycles idle      ( +-  0.30% )  (83.35%)
    53,311,974,255      instructions              #    2.46  insn per cycle        
                                                  #    0.16  stalled cycles per insn  ( +-  0.00% )  (83.35%)
     3,884,628,982      branches                  #  668.945 M/sec                    ( +-  0.00% )  (83.36%)
       370,081,414      branch-misses             #    9.53% of all branches          ( +-  0.01% )  (83.32%)

           5.80763 +- 0.00684 seconds time elapsed  ( +-  0.12% )


 Performance counter stats for 'ttaR decode -q -t16 mtv.tta -f raw -o /dev/null' (320 runs):

          8,396.63 msec task-clock                #   15.660 CPUs utilized            ( +-  0.02% )
             2,705      context-switches          #    0.322 K/sec                    ( +-  0.61% )
                70      cpu-migrations            #    0.008 K/sec                    ( +-  1.97% )
             4,820      page-faults               #    0.574 K/sec                    ( +-  0.00% )
    31,341,341,315      cycles                    #    3.733 GHz                      ( +-  0.02% )  (83.16%)
       404,497,658      stalled-cycles-frontend   #    1.29% frontend cycles idle     ( +-  1.11% )  (83.19%)
     4,983,844,918      stalled-cycles-backend    #   15.90% backend cycles idle      ( +-  0.03% )  (83.27%)
    53,368,851,871      instructions              #    1.70  insn per cycle        
                                                  #    0.09  stalled cycles per insn  ( +-  0.00% )  (83.37%)
     3,896,895,024      branches                  #  464.102 M/sec                    ( +-  0.00% )  (83.55%)
       378,556,224      branch-misses             #    9.71% of all branches          ( +-  0.00% )  (83.45%)

          0.536170 +- 0.000363 seconds time elapsed  ( +-  0.07% )
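The wall-clock times above can be condensed into speedups. The small C helpers below compute a plain time ratio and a per-thread parallel efficiency. The thread count of 16 matches the `-t16` and `-threads 16` runs. The Ryzen 7 1700 has 8 cores with SMT (16 hardware threads), so per-thread efficiency understates how well the physical cores are used.

```c
#include <assert.h>

/* How many times faster `fast_s` is than `slow_s` (both wall-clock seconds). */
static double speedup(double slow_s, double fast_s)
{
    assert(slow_s > 0.0 && fast_s > 0.0);
    return slow_s / fast_s;
}

/* Speedup over the single-threaded run, divided by the thread count. */
static double parallel_efficiency(double serial_s, double parallel_s,
                                  unsigned threads)
{
    assert(threads > 0);
    return speedup(serial_s, parallel_s) / threads;
}
```

Here is what the elapsed times above give:
- Single-threaded, ttaR encodes about 1.57x faster than ffmpeg (8.926 s vs 5.698 s) and decodes about 1.12x faster (6.505 s vs 5.808 s).
- With 16 threads, ttaR gains about 10.7x (encode) and 10.8x (decode) over its own single-threaded runs, roughly two thirds per-thread efficiency.
- ffmpeg's 16-thread decode gains about 8.0x over its single-threaded decode.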


Re: libttaR (TTA rewrite part 2)

Reply #3
Using an old version of the ffmpeg CLI tool for a performance comparison is flawed and extremely biased.
Also, the current ffmpeg CLI tool has bad performance with smaller packets due to clumsy multithreading work by an ex-developer.
Also, just running a generic build of ffmpeg for the first time takes extra time... These are just a few points to emphasize how unfair and biased such a comparison is.
Please remove my account from this forum.

Re: libttaR (TTA rewrite part 2)

Reply #4
Also, this repo does not have any x86 SIMD assembly, and the only way its performance can go up is if the compiler is a modern clang.

Re: libttaR (TTA rewrite part 2)

Reply #5
> Using an old version of the ffmpeg CLI tool for a performance comparison is flawed and extremely biased.
The TTA codec should be the same in the version I used and in the current version, or at least the differences should be negligible. You are always welcome to post your own benchmarks.

> Also, the current ffmpeg CLI tool has bad performance with smaller packets due to clumsy multithreading work by an ex-developer.
OK.

> Also, just running a generic build of ffmpeg for the first time takes extra time
That can't be more than a millisecond.

> Also, this repo does not have any x86 SIMD assembly, and the only way its performance can go up is if the compiler is a modern clang.
If anything, that is a virtue.
FYI, most of the performance gains came from optimizing the Rice coder, which is completely un-SIMD-able.
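The minimal adaptive Rice encoder/decoder in C below illustrates why a Rice coder resists SIMD. It is a generic sketch, not TTA's exact scheme, since TTA adapts its parameters differently. The zigzag mapping, the 1/16 moving average, the initial state, and all function names are assumptions for illustration. The point is the dependency chain:
- The parameter `k` for each value comes from a running sum over all earlier values.
- Each code's bit offset depends on the lengths of all earlier codes.

As a result, values cannot be coded independently in parallel lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit buffer over caller-supplied, zero-initialized storage. */
struct bits {
    uint8_t *buf;
    size_t cap;       /* capacity in bits */
    size_t pos;       /* current bit position */
    int overflow;     /* set when a read or write goes past cap */
};

static void put_bit(struct bits *b, unsigned bit)
{
    if (b->pos >= b->cap) { b->overflow = 1; return; }
    if (bit)
        b->buf[b->pos >> 3] |= (uint8_t) (0x80u >> (b->pos & 7));
    b->pos++;
}

static unsigned get_bit(struct bits *b)
{
    if (b->pos >= b->cap) { b->overflow = 1; return 0; }
    unsigned bit = (b->buf[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u;
    b->pos++;
    return bit;
}

/* Map signed residuals to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4. */
static uint32_t zigzag(int32_t v)
{
    return ((uint32_t) v << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

static int32_t unzigzag(uint32_t u)
{
    return (int32_t) (u >> 1) ^ -(int32_t) (u & 1u);
}

/* Rice parameter: about log2 of the running mean (sum holds ~16x the mean). */
static unsigned rice_k(uint32_t sum)
{
    uint32_t mean = sum >> 4;
    unsigned k = 0;
    while (k < 24 && (mean >> k) > 1)
        ++k;
    return k;
}

/* Exponential moving average with weight 1/16. */
static void update_sum(uint32_t *sum, uint32_t u)
{
    *sum += u - (*sum >> 4);
}

/* Each value's k depends on every earlier value via `sum`, and each code's
 * bit offset depends on all earlier code lengths: a strictly serial chain. */
static void rice_encode(struct bits *b, const int32_t *res, size_t n)
{
    uint32_t sum = 16;                        /* assumed initial mean of 1 */
    for (size_t i = 0; i < n && !b->overflow; ++i) {
        uint32_t u = zigzag(res[i]);
        unsigned k = rice_k(sum);
        for (uint32_t q = u >> k; q > 0 && !b->overflow; --q)
            put_bit(b, 1);                    /* quotient in unary ... */
        put_bit(b, 0);                        /* ... terminated by a 0 */
        for (unsigned j = k; j-- > 0;)
            put_bit(b, (u >> j) & 1u);        /* k low bits, MSB first */
        update_sum(&sum, u);
    }
}

static void rice_decode(struct bits *b, int32_t *out, size_t n)
{
    uint32_t sum = 16;                        /* must match the encoder */
    for (size_t i = 0; i < n; ++i) {
        unsigned k = rice_k(sum);             /* known only after value i-1 */
        uint32_t q = 0, u;
        while (get_bit(b))
            ++q;
        u = q << k;
        for (unsigned j = k; j-- > 0;)
            u |= (uint32_t) get_bit(b) << j;
        out[i] = unzigzag(u);
        update_sum(&sum, u);
    }
}
```

Every iteration needs the `sum` produced by the previous one, and the decoder cannot even locate value i until it has parsed value i-1. In a coder like this, the remaining room for speedups lies in making the scalar loop cheaper rather than in vectorizing it.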