Title: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2023-12-10 01:23:41

https://github.com/stseelig/libttaR

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-12 17:28:18

good news, bad news

good news:
i've been working on a multi-threaded version. the encoder is already done in the 1.1-dev branch

bad news:
the tta wikipedia page was deleted

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-21 05:28:38

1.1 is now nearly finished. There is just some minor stuff for me to get at. I plan to officially release it on the first of next month.

here is a benchmark against ffmpeg:

system:
   Linux 5.10.0-23-amd64 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
CPU:
   AMD Ryzen 7 1700 64-bit 8-Core MT MCP 1550/3750 MHz
ffmpeg:
   ffmpeg version 4.3.6-0+deb11u1
   built with gcc 10 (Debian 10.2.1-6)
   libavutil      56. 51.100 / 56. 51.100
   libavcodec     58. 91.100 / 58. 91.100
   libavformat    58. 45.100 / 58. 45.100
   libavdevice    58. 10.100 / 58. 10.100
   libavfilter     7. 85.100 /  7. 85.100
   libavresample   4.  0.  0 /  4.  0.  0
   libswscale      5.  7.100 /  5.  7.100
   libswresample   3.  7.100 /  3.  7.100
   libpostproc    55.  7.100 / 55.  7.100
ttaR:
   Debian clang version 11.0.1-2
   -march=native -mtune=native
file:
   Nirvana - MTV Unplugged in New York
   571084460   mtv.wav
   337981896   mtv.tta

##############################################################################

Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.wav -f tta /dev/null -y' (20 runs):

          8,925.80 msec task-clock                #    1.000 CPUs utilized           ( +-  0.06% )
                60      context-switches          #    0.007 K/sec                   ( +-  5.24% )
                16      cpu-migrations            #    0.002 K/sec                   ( +-  2.69% )
            86,798      page-faults               #    0.010 M/sec                   ( +-  0.00% )
    33,302,059,704      cycles                    #    3.731 GHz                     ( +-  0.05% )  (83.33%)
       566,286,522      stalled-cycles-frontend   #    1.70% frontend cycles idle    ( +-  0.54% )  (83.33%)
     3,988,341,464      stalled-cycles-backend    #   11.98% backend cycles idle     ( +-  0.18% )  (83.33%)
    65,718,741,261      instructions              #     1.97 insn per cycle
                                                  #     0.06 stalled cycles per insn ( +-  0.00% )  (83.33%)
     8,702,525,883      branches                  #  974.986 M/sec                   ( +-  0.01% )  (83.34%)
       384,955,890      branch-misses             #    4.42% of all branches         ( +-  0.03% )  (83.33%)

   8.92647 +- 0.00557 seconds time elapsed ( +- 0.06% )

Performance counter stats for 'ffmpeg -loglevel quiet -threads 1 -i mtv.tta -f s16le /dev/null -y' (20 runs):

          6,504.99 msec task-clock                #    1.000 CPUs utilized           ( +-  0.13% )
                55      context-switches          #    0.008 K/sec                   ( +-  2.46% )
                16      cpu-migrations            #    0.002 K/sec                   ( +-  2.49% )
             3,692      page-faults               #    0.567 K/sec                   ( +-  0.04% )
    24,289,757,929      cycles                    #    3.734 GHz                     ( +-  0.13% )  (83.31%)
       327,298,120      stalled-cycles-frontend   #    1.35% frontend cycles idle    ( +-  0.15% )  (83.31%)
     3,454,321,337      stalled-cycles-backend    #   14.22% backend cycles idle     ( +-  0.20% )  (83.33%)
    67,039,367,957      instructions              #     2.76 insn per cycle
                                                  #     0.05 stalled cycles per insn ( +-  0.00% )  (83.35%)
     8,499,223,928      branches                  # 1306.569 M/sec                   ( +-  0.00% )  (83.36%)
       313,478,808      branch-misses             #    3.69% of all branches         ( +-  0.01% )  (83.34%)

   6.50534 +- 0.00866 seconds time elapsed ( +- 0.13% )

Performance counter stats for 'ffmpeg -loglevel quiet -threads 16 -i mtv.tta -f s16le /dev/null -y' (320 runs):

          9,947.58 msec task-clock                #   12.309 CPUs utilized           ( +-  0.09% )
             4,450      context-switches          #    0.447 K/sec                   ( +-  0.37% )
             1,041      cpu-migrations            #    0.105 K/sec                   ( +-  1.04% )
             8,380      page-faults               #    0.842 K/sec                   ( +-  0.00% )
    36,296,805,088      cycles                    #    3.649 GHz                     ( +-  0.02% )  (83.37%)
       562,776,161      stalled-cycles-frontend   #    1.55% frontend cycles idle    ( +-  0.15% )  (83.13%)
     5,056,980,289      stalled-cycles-backend    #   13.93% backend cycles idle     ( +-  0.02% )  (83.21%)
    67,127,958,734      instructions              #     1.85 insn per cycle
                                                  #     0.08 stalled cycles per insn ( +-  0.00% )  (83.30%)
     8,518,827,278      branches                  #  856.372 M/sec                   ( +-  0.00% )  (83.43%)
       317,537,726      branch-misses             #    3.73% of all branches         ( +-  0.00% )  (83.57%)

   0.80817 +- 0.00137 seconds time elapsed ( +- 0.17% )

##############################################################################

Performance counter stats for 'ttaR encode -q -S mtv.wav -o /dev/null' (20 runs):

          5,697.67 msec task-clock                #    1.000 CPUs utilized           ( +-  0.10% )
                13      context-switches          #    0.002 K/sec                   ( +- 14.11% )
                 0      cpu-migrations            #    0.000 K/sec                   ( +- 31.26% )
               252      page-faults               #    0.044 K/sec                   ( +-  0.10% )
    21,277,212,273      cycles                    #    3.734 GHz                     ( +-  0.08% )  (83.32%)
       424,132,363      stalled-cycles-frontend   #    1.99% frontend cycles idle    ( +-  1.19% )  (83.32%)
     8,164,992,255      stalled-cycles-backend    #   38.37% backend cycles idle     ( +-  0.17% )  (83.33%)
    50,036,808,248      instructions              #     2.35 insn per cycle
                                                  #     0.16 stalled cycles per insn ( +-  0.00% )  (83.34%)
     3,929,444,278      branches                  #  689.658 M/sec                   ( +-  0.01% )  (83.36%)
       371,064,057      branch-misses             #    9.44% of all branches         ( +-  0.01% )  (83.34%)

   5.69822 +- 0.00548 seconds time elapsed ( +- 0.10% )

Performance counter stats for 'ttaR encode -q -t16 mtv.wav -o /dev/null' (320 runs):

          8,177.73 msec task-clock                #   15.379 CPUs utilized           ( +-  0.02% )
             1,300      context-switches          #    0.159 K/sec                   ( +-  2.07% )
               203      cpu-migrations            #    0.025 K/sec                   ( +-  2.01% )
             5,834      page-faults               #    0.713 K/sec                   ( +-  0.04% )
    30,484,625,770      cycles                    #    3.728 GHz                     ( +-  0.02% )  (83.09%)
       439,889,964      stalled-cycles-frontend   #    1.44% frontend cycles idle    ( +-  0.38% )  (83.18%)
     4,980,516,390      stalled-cycles-backend    #   16.34% backend cycles idle     ( +-  0.04% )  (83.31%)
    50,068,984,657      instructions              #     1.64 insn per cycle
                                                  #     0.10 stalled cycles per insn ( +-  0.00% )  (83.46%)
     3,937,619,679      branches                  #  481.505 M/sec                   ( +-  0.00% )  (83.60%)
       380,192,849      branch-misses             #    9.66% of all branches         ( +-  0.00% )  (83.36%)

0.531753 +- 0.000344 seconds time elapsed ( +- 0.06% )

Performance counter stats for 'ttaR decode -q -S mtv.tta -f raw -o /dev/null' (20 runs):

          5,807.10 msec task-clock                #    1.000 CPUs utilized           ( +-  0.12% )
                15      context-switches          #    0.003 K/sec                   ( +-  9.58% )
                 1      cpu-migrations            #    0.000 K/sec                   ( +- 25.36% )
               207      page-faults               #    0.036 K/sec                   ( +-  0.10% )
    21,697,968,813      cycles                    #    3.736 GHz                     ( +-  0.12% )  (83.29%)
       409,444,598      stalled-cycles-frontend   #    1.89% frontend cycles idle    ( +-  0.06% )  (83.33%)
     8,348,408,571      stalled-cycles-backend    #   38.48% backend cycles idle     ( +-  0.30% )  (83.35%)
    53,311,974,255      instructions              #     2.46 insn per cycle
                                                  #     0.16 stalled cycles per insn ( +-  0.00% )  (83.35%)
     3,884,628,982      branches                  #  668.945 M/sec                   ( +-  0.00% )  (83.36%)
       370,081,414      branch-misses             #    9.53% of all branches         ( +-  0.01% )  (83.32%)

   5.80763 +- 0.00684 seconds time elapsed ( +- 0.12% )

Performance counter stats for 'ttaR decode -q -t16 mtv.tta -f raw -o /dev/null' (320 runs):

          8,396.63 msec task-clock                #   15.660 CPUs utilized           ( +-  0.02% )
             2,705      context-switches          #    0.322 K/sec                   ( +-  0.61% )
                70      cpu-migrations            #    0.008 K/sec                   ( +-  1.97% )
             4,820      page-faults               #    0.574 K/sec                   ( +-  0.00% )
    31,341,341,315      cycles                    #    3.733 GHz                     ( +-  0.02% )  (83.16%)
       404,497,658      stalled-cycles-frontend   #    1.29% frontend cycles idle    ( +-  1.11% )  (83.19%)
     4,983,844,918      stalled-cycles-backend    #   15.90% backend cycles idle     ( +-  0.03% )  (83.27%)
    53,368,851,871      instructions              #     1.70 insn per cycle
                                                  #     0.09 stalled cycles per insn ( +-  0.00% )  (83.37%)
     3,896,895,024      branches                  #  464.102 M/sec                   ( +-  0.00% )  (83.55%)
       378,556,224      branch-misses             #    9.71% of all branches         ( +-  0.00% )  (83.45%)

0.536170 +- 0.000363 seconds time elapsed ( +- 0.07% )
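
To make the numbers above easier to compare, here is a minimal sketch that turns the "seconds time elapsed" figures into speedup ratios. The times are copied from the perf output; the dictionary keys, function names, and the reading of the `-S` runs as single-threaded are assumptions made for illustration. There is no 16-thread ffmpeg encode run in the post, so that comparison is left out.

```python
# Wall-clock seconds ("seconds time elapsed") copied from the perf output above.
# Keys are (tool, operation, threads). Treating ttaR's -S runs as 1 thread
# is an assumption; the post does not spell out what -S means.
ELAPSED = {
    ("ffmpeg", "encode", 1): 8.92647,
    ("ffmpeg", "decode", 1): 6.50534,
    ("ffmpeg", "decode", 16): 0.80817,
    ("ttaR", "encode", 1): 5.69822,
    ("ttaR", "encode", 16): 0.531753,
    ("ttaR", "decode", 1): 5.80763,
    ("ttaR", "decode", 16): 0.536170,
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate finished than the baseline."""
    return baseline_s / candidate_s


def summarize(t: dict) -> dict:
    """Compute tool-vs-tool ratios and 1-thread vs 16-thread scaling."""
    return {
        "encode 1T, ttaR vs ffmpeg": speedup(t[("ffmpeg", "encode", 1)], t[("ttaR", "encode", 1)]),
        "decode 1T, ttaR vs ffmpeg": speedup(t[("ffmpeg", "decode", 1)], t[("ttaR", "decode", 1)]),
        "decode 16T, ttaR vs ffmpeg": speedup(t[("ffmpeg", "decode", 16)], t[("ttaR", "decode", 16)]),
        "ttaR encode, 16T vs 1T": speedup(t[("ttaR", "encode", 1)], t[("ttaR", "encode", 16)]),
        "ttaR decode, 16T vs 1T": speedup(t[("ttaR", "decode", 1)], t[("ttaR", "decode", 16)]),
        "ffmpeg decode, 16T vs 1T": speedup(t[("ffmpeg", "decode", 1)], t[("ffmpeg", "decode", 16)]),
    }


if __name__ == "__main__":
    for label, ratio in summarize(ELAPSED).items():
        print(f"{label}: {ratio:.2f}x")
```

The scaling rows compare each tool against itself, so they are not affected by which ffmpeg version was used; the tool-vs-tool rows are.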

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-05-21 08:44:41

Using an old version of the ffmpeg CLI tool for a performance comparison is flawed and extremely biased.
Also, the current ffmpeg CLI tool performs badly with smaller packets due to the clumsy MT work of an ex-developer.
Also, just running a generic build of ffmpeg for the first time takes extra time... Those are just a few points I wanted to emphasize to show how unfair and biased such a comparison is.

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-05-21 08:49:03

Also, this repo does not have any x86 SIMD assembly, so the only way performance can go up is if the compiler is a modern clang.

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-21 21:30:09

Quote from: mycroft on 2024-05-21 08:44:41

Using an old version of the ffmpeg CLI tool for a performance comparison is flawed and extremely biased.

The tta codec version should be the same between the version I used and the current version, or at least the differences are negligible. You are always welcome to post your own benchmarks.

Quote from: mycroft on 2024-05-21 08:44:41

Also, the current ffmpeg CLI tool performs badly with smaller packets due to the clumsy MT work of an ex-developer.

Quote from: mycroft on 2024-05-21 08:44:41

Also, just running a generic build of ffmpeg for the first time takes extra time

That cannot be more than a millisecond.

Quote from: mycroft on 2024-05-21 08:49:03

Also, this repo does not have any x86 SIMD assembly, so the only way performance can go up is if the compiler is a modern clang.

If anything, that is a virtue.
FYI, most of the performance increases came from optimizing the rice coder, which is completely unSIMDable
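
As an illustration of why a Rice coder resists SIMD, here is a minimal generic Golomb-Rice sketch with a fixed parameter `k`. It is not libttaR's code: the function names are made up for this example, and a real TTA coder adapts its parameter as it goes rather than keeping it fixed. The point is only the data dependency: the decoder cannot know where code N+1 starts until it has finished reading code N.

```python
def rice_encode(values, k):
    """Encode non-negative integers as a flat list of bits (fixed Rice parameter k)."""
    bits = []
    for v in values:
        q = v >> k                    # quotient, sent in unary
        r = v & ((1 << k) - 1)        # remainder, sent as k plain bits
        bits.extend([1] * q)          # q ones ...
        bits.append(0)                # ... terminated by a single zero
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))  # MSB first
    return bits


def rice_decode(bits, k, count):
    """Decode `count` values; `pos` carries over from one code to the next."""
    values = []
    pos = 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:         # length of the unary run is data-dependent
            q += 1
            pos += 1
        pos += 1                      # skip the terminating zero
        r = 0
        for _bit in range(k):
            r = (r << 1) | bits[pos]
            pos += 1
        values.append((q << k) | r)
    return values


if __name__ == "__main__":
    samples = [0, 1, 5, 18, 3]
    encoded = rice_encode(samples, 2)
    print(len(encoded), "bits ->", rice_decode(encoded, 2, len(samples)))
```

Because every code has a variable length, the loop over `pos` is inherently serial, which is why optimizing it is a matter of scalar bit tricks rather than vector instructions.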

Title: Re: libttaR (TTA rewrite part 2)
Post by: Porcus on 2024-05-21 23:37:11

smaller packets, wouldn't TTA be fixed at about a second - that's not small?

Anyway, ffmpeg 5 isn't the newest and hottest either ... though, compared to this article:

Quote from: rdtsh on 2024-05-12 17:28:18

the tta wikipedia page was deleted

Been flagged as looking like an ad for twelve years ... and not too wrongfully either.

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-05-22 00:05:50

Quote from: Porcus on 2024-05-21 23:37:11

smaller packets, wouldn't TTA be fixed at about a second - that's not small?

yes, with CD quality audio, the framesize is 180KiB.
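
The 180KiB figure can be reproduced with a short sketch, assuming the commonly documented TTA frame length of 256/245 seconds (about 1.045 s) and CD audio meaning 44100 Hz, 2 channels, 16 bits. The function names are made up for this example.

```python
def tta_frame_samples(sample_rate: int) -> int:
    """Samples per channel in one TTA frame, assuming a 256/245 s frame length."""
    return 256 * sample_rate // 245


def tta_frame_bytes(sample_rate: int, channels: int, bytes_per_sample: int) -> int:
    """Size of one frame's worth of decoded PCM, in bytes."""
    return tta_frame_samples(sample_rate) * channels * bytes_per_sample


if __name__ == "__main__":
    size = tta_frame_bytes(44100, 2, 2)   # CD quality: 44.1 kHz, stereo, 16-bit
    print(size, "bytes =", size / 1024, "KiB")
```

This is the decoded (PCM) size of a frame; the encoded size varies with the content.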

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-06-01 21:36:15

1.1 officially released

lib:
api changes
performance improvements

cli:
multi-threading
faster single-threading
some bug fixes

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 00:11:21

Still extremely misleading and biased. Proper benchmarks are done by decoding very long audio files and looking at the speed of decoding versus realtime. Also, TTA is more irrelevant and niche than TAK, and TAK actually did have some cool, innovative stuff to offer; TTA has nothing new to offer.
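
For anyone who wants the "speed versus realtime" view of the earlier benchmark, here is a rough sketch. It assumes mtv.wav is CD-quality PCM (44100 Hz, stereo, 16-bit) with a 44-byte header; neither is stated in the thread, and the function names are made up for this example. Under those assumptions the file is roughly 54 minutes long.

```python
CD_BYTES_PER_SECOND = 44100 * 2 * 2   # assumed: 44.1 kHz, stereo, 16-bit
WAV_HEADER_BYTES = 44                 # assumed: canonical minimal WAV header


def pcm_duration_seconds(wav_bytes: int) -> float:
    """Playback duration implied by the file size, under the assumptions above."""
    return (wav_bytes - WAV_HEADER_BYTES) / CD_BYTES_PER_SECOND


def realtime_factor(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / elapsed_seconds


if __name__ == "__main__":
    duration = pcm_duration_seconds(571084460)   # size of mtv.wav from the post
    for label, elapsed in [
        ("ffmpeg decode, 1 thread", 6.50534),
        ("ttaR decode -S", 5.80763),
        ("ttaR decode -t16", 0.536170),
    ]:
        print(f"{label}: {realtime_factor(duration, elapsed):.0f}x realtime")
```

This only rescales the elapsed times already posted; it does not address whether a single album is long enough to count as a "very long" file.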

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 00:13:44

Quote from: rdtsh on 2024-05-22 00:05:50

Quote from: Porcus on 2024-05-21 23:37:11

smaller packets, wouldn't TTA be fixed at about a second - that's not small?

yes, with CD quality audio, the framesize is 180KiB.

Not the decoded frame size, but the encoded one as stored in the final output. If the audio is mostly silence, it will slow things down, unless TTA encodes silent frames extremely inefficiently, which may be true after all, considering its poor design decisions.

Title: Re: libttaR (TTA rewrite part 2)
Post by: rdtsh on 2024-06-02 04:04:33

Quote from: mycroft on 2024-06-02 00:13:44

Not the decoded frame size, but the encoded one as stored in the final output. If the audio is mostly silence, it will slow things down, unless TTA encodes silent frames extremely inefficiently, which may be true after all, considering its poor design decisions.

so ffmpeg gets slower when it has to read less? what a well designed piece of software
and yes, tta is not that efficient with silence compared to most other codecs

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 09:29:08

Indeed, 60 seconds of silence (44100Hz S16) encoded with ffmpeg: tta is 654K and flac is 17K.
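
A back-of-the-envelope sketch of why the TTA number lands where it does. It rests on assumptions that are not in the post: that the test signal was stereo, and that the codec behaves like a plain Rice coder with no special handling of silence, so every sample costs at least one bit (the unary terminator). The function name is made up for this example.

```python
def rice_floor_bytes(seconds: int, sample_rate: int, channels: int) -> int:
    """Smallest payload if every sample costs at least one bit (assumed floor)."""
    total_samples = seconds * sample_rate * channels
    return total_samples // 8         # one bit per sample, eight bits per byte


if __name__ == "__main__":
    floor = rice_floor_bytes(60, 44100, 2)   # stereo is an assumption
    print(floor, "bytes, about", round(floor / 1024), "KiB")
```

Under those assumptions the floor is 661,500 bytes, which is in the same ballpark as the reported 654K; headers and per-frame overhead would come on top. With a mono signal the floor would be half that, so the channel count matters for this comparison.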

Title: Re: libttaR (TTA rewrite part 2)
Post by: Porcus on 2024-06-02 12:09:22

I tested the compression of silence: https://hydrogenaud.io/index.php/topic,122413.0.html . (Post says "5.1" but it was five channels.)
TTA isn't that bad for a codec that doesn't handle wasted bits - compare to Monkey's and ALAC.
I could get smaller files by manually setting block sizes. 29k with ALS, 70k with FLAC, and 220k with WavPack --pair-unassigned-chans (edit: fixed numbers)

Anyway, don't whine over people who make a little code-improvement project. The TTA reference implementation isn't good.
As for the format itself, there is one thing to it, that sometimes is useful in some applications: the encoded bit stream is unique. (Bar the encryption feature, which I haven't seen anything being able to play except command-line ffplay.) Would make it ideal for p2p distribution.

Title: Re: libttaR (TTA rewrite part 2)
Post by: mycroft on 2024-06-02 12:39:49

I do not whine over people, I whine over their claims.

HydrogenAudio

Lossless Audio Compression => Lossless / Other Codecs => Topic started by: rdtsh on 2023-12-10 01:23:41