FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #500 – 2024-05-16 20:59:48

-8p down from 6'13" to 3'49", and then a different compile shaves off nearly another minute - not complaining, no.
-8p is now not even 13 times as slow as -5

Reply #501 – 2024-05-17 07:32:39

Thanks. It turns out that some changes I made when working out the 32-bit encoder/decoder affected the 24-bit part more than I thought. I was under the impression the code paths meant for 32-bit audio were only seldom used for 24-bit audio, but it turns out certain kinds of 24-bit audio (especially those with a completely empty upper half of the spectrum) use these code paths a lot, and they are much slower.

So, these changes make the choice between these code paths stricter: that choice was previously made rather roughly (on the safe side, of course), but now the encoder goes through a little more math so that it only chooses the slow code path when absolutely necessary.

The speed-up is highly dependent on source material. Audio with a high sample rate in which the upper frequencies are fully 'utilised' does not see any change at all; most audio I've tested sees quite some improvement at preset 8; and audio where really nothing exists above 20 kHz sees the most improvement at preset 5, but is still slow at preset 8.

Reply #502 – 2024-05-17 11:07:29

Quote from: ktf on 2024-05-17 07:32:39

(especially those with a completely empty upper half of the spectrum)

Oh, and that maybe makes it even more interesting - those signals appear to be the ones where -e matters, i.e. the model choice algorithm is less reliable. If such ones can be "identified" (not surely, but with enough statistical association), then we might be in for some fun ... uh, assuming that developer and testers have infinite amount of spare time of course.

Reply #503 – 2024-05-17 16:31:19

This is a spectrogram of the 24/48 kHz wave from my previous post.

The wave file for this run is 24-bit/96 kHz, 1h52m, 3.63 GiB.

flac git-04532802 (2024-05-02)

     1 thread     8 threads
-5   0m26.6009s   0m7.253s
-5p  1m44.983s    0m20.706s
-8   2m10.424s    0m25.829s
-8p  17m37.321s   3m30.797s

flac git-cfe3afca (2024-05-16)

     1 thread     8 threads
-5   0m20.809s    0m6.847s
-5p  0m35.891s    0m8.379s
-8   1m46.554s    0m21.323s
-8p  11m30.919s   2m20.278s
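
To put the two tables side by side, the timings can be converted to seconds and divided. This is only a minimal sketch: the helper names `to_seconds` and `speedups` are made up for illustration, and the numbers are the single-thread columns copied as posted.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a stamp such as '1m44.983s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Single-thread timings copied from the two tables above.
old = {"-5": "0m26.6009s", "-5p": "1m44.983s", "-8": "2m10.424s", "-8p": "17m37.321s"}
new = {"-5": "0m20.809s", "-5p": "0m35.891s", "-8": "1m46.554s", "-8p": "11m30.919s"}

def speedups(before: dict, after: dict) -> dict:
    """Ratio old/new per preset; values above 1 mean the newer build is faster."""
    return {p: to_seconds(before[p]) / to_seconds(after[p]) for p in before}

if __name__ == "__main__":
    for preset, ratio in speedups(old, new).items():
        print(f"{preset:4s} {ratio:.2f}x")
```

On these figures the single-thread ratio is roughly 2.9x for -5p and roughly 1.5x for -8p, with -5 and -8 changing much less.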

Reply #504 – 2024-05-17 20:37:09

-5p from 105 to 36 seconds ...

I didn't quite get the change, but is it so that the code checks that the residual fits a signed short, and when it does ... much faster?
Has it made changes as to when it can select the 4-bit method?

How does -5e perform? And what are the sizes for -5, -5e and -5p? Never mind whether there is any change in -5e; I am curious about the bang for the buck, and the comparison between -5e and -5p must have tilted just now.

Reply #505 – 2024-05-17 21:20:57

Quote from: Porcus on 2024-05-17 20:37:09

I didn't quite get the change, but is it so that the code checks that the residual fits a signed short, and when it does ... much faster?

When adding the 32-bit PCM part of the encoder, I amended the FLAC spec to require that all residuals fit a 32-bit signed int. This is to keep decoding simple. The encoder must make sure this is done.

It is possible to calculate that for certain predictors, checking each residual sample separately is not necessary. When this is not possible, each residual sample must be checked, which is slower of course. This calculation was improved, so the slow process of checking all residual samples is needed less often.

I didn't know at the time 24-bit encoding would be affected, but it turns out that for signals with very little noise in the upper frequencies (= smooth signal) the predictor can be of a very high quality, which can lead to the residual spiking for parts of the signal where the predictor doesn't fit. This doesn't normally happen, but it needs to be checked anyway.
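
The idea of skipping the per-sample check can be sketched as follows. This is not libFLAC's code, and the actual criterion there may well be tighter; it is a simplified worst-case bound, assuming a predictor of the form `prediction = sum(coef[i] * sample[n-1-i]) >> shift`. All function names and the example coefficients are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def residual_bound(coefs, shift, bits_per_sample):
    """Worst-case |residual| over every possible input of this bit depth."""
    max_sample = 2 ** (bits_per_sample - 1)  # |sample| <= 2^(bps-1)
    max_prediction = (sum(abs(c) for c in coefs) * max_sample) >> shift
    return max_sample + max_prediction + 1   # +1 covers rounding in the shift

def can_skip_check(coefs, shift, bits_per_sample):
    """True if no input can produce a residual outside the int32 range."""
    return residual_bound(coefs, shift, bits_per_sample) <= INT32_MAX

def residuals(samples, coefs, shift):
    """Residuals for all samples that have a full history available."""
    out = []
    for n in range(len(coefs), len(samples)):
        acc = sum(c * samples[n - 1 - i] for i, c in enumerate(coefs))
        out.append(samples[n] - (acc >> shift))
    return out

def fits_int32(samples, coefs, shift, bits_per_sample):
    if can_skip_check(coefs, shift, bits_per_sample):
        return True  # fast path: the bound already guarantees the result
    # Slow path: every residual has to be looked at individually.
    return all(INT32_MIN <= r <= INT32_MAX for r in residuals(samples, coefs, shift))

if __name__ == "__main__":
    print(can_skip_check([2, -1], 0, 24))         # small predictor, 24-bit
    print(can_skip_check([16383] * 12, 5, 24))    # large coefficients, 24-bit
```

With these illustrative numbers, a small second-order predictor on 24-bit input passes the bound, while twelve large coefficients on the same 24-bit input do not, so the slow path is taken even though real residuals would normally stay small.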

Reply #506 – 2024-05-17 22:22:40

Ah. So you have just improved a criterion of the following kind:
"This predictor vector cannot possibly create any too big residual from a history of N-bit samples, so we can save time by bypassing the size checks that we are mathematically sure it would anyway pass"?

Reply #507 – Today at 08:16

Assuming there are no performance differences between binaries prepared by different people (same FLAC git version, same compiler and version), these last optimizations are indeed significant even on my hardware (mobile Intel i5-8250U), though undoubtedly content-dependent.

Audio file 24/44, 1h20m, FLAC compression "-4":
- Flac commit: 28e4f05, GCC14.1 (NetRanger) - ca. 440-450x
- Flac git-cfe3afca, GCC14.1 (Wombat) - ca. 495-505x
- Flac git-cfe3afca, GCC12.2 (Replica9K) - ca. 480-500x, fluctuates greatly during encoding

...but NetRanger's Clang compile is very slightly but consistently the winner:
- Flac commit: 28e4f05, Clang 18.1.4 (NetRanger)- ca. 505-515x
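
For a sense of scale, a speed factor converts to wall-clock time by dividing the track length by the factor. A small sketch, taking the 1h20m duration at face value and using rough midpoints of the quoted ranges (the midpoints are an assumption for illustration, not measured values):

```python
def encode_seconds(duration_s: float, speed_factor: float) -> float:
    """Wall-clock encode time for a track encoded at speed_factor x realtime."""
    return duration_s / speed_factor

DURATION_S = 80 * 60  # 1h20m of audio

# Rough midpoints of the ranges quoted above (illustrative only).
builds = {
    "28e4f05 GCC 14.1": 445,
    "cfe3afca GCC 14.1": 500,
    "cfe3afca GCC 12.2": 490,
    "28e4f05 Clang 18.1.4": 510,
}

if __name__ == "__main__":
    for name, factor in builds.items():
        print(f"{name:22s} {encode_seconds(DURATION_S, factor):5.2f} s")
```

At around 500x the whole file takes under ten seconds, so the gaps between these builds amount to roughly a second or so at "-4".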

The result of Clang for git-cfe3afca could therefore be quite interesting. However, if I understand correctly, under standard conditions with no external influences (software etc.) the difference between GCC13 and Clang18 should ideally be none. GCC14 is a bit below v.13 in performance for me as well.

Reply #508 – Today at 13:36

Compression -4 may be pretty good with a clang compile but high compression and multithreaded stress should be clearly faster with GCC builds.

Reply #509 – Today at 14:20

Quote from: Porcus on 2024-05-17 22:22:40

Ah. So you have just improved a criterion of the following kind:
"This predictor vector cannot possibly create any too big residual from a history of N-bit samples, so we can save time by bypassing the size checks that we are mathematically sure it would anyway pass"?

Yes, that is correct.

Reply #510 – Today at 15:12

Quote from: Wombat on Today at 13:36

Compression -4 may be pretty good with a clang compile but high compression and multithreaded stress should be clearly faster with GCC builds.

OK, I didn't try that, but luckily everyone else here does. I chose compression "-4" based on the charts and my own tests years ago; all the other compression levels ceased to exist for me, I forgot them completely, and that continues to this day.
But I understand that the code here should primarily be tested under the stress conditions of higher compression, where the changes are more pronounced.

Reply #511 – Today at 15:49

Quote from: jaro1 on Today at 15:12

OK, I didn't try that, but luckily everyone else here does. I chose compression "-4" based on the charts and my own tests years ago; all the other compression levels ceased to exist for me, I forgot them completely, and that continues to this day.
But I understand that the code here should primarily be tested under the stress conditions of higher compression, where the changes are more pronounced.

Sorry! I am a bit outdated, it seems. I just tried the latest flac git together with the latest clang, and indeed it is faster than everything else so far, even with high compression! Guess I must change some things here.
Attached is a generic clang -O3 compile without much additional optimization, for testing the latest git.