Topic: FLAC v1.4.x Performance Tests

Re: FLAC v1.4.x Performance Tests

Reply #500
-8p down from 6'13" to 3'49", and then a different compile shaves off nearly another minute. Not complaining, no.
-8p is now not even 13 times as slow as -5  ;)

Re: FLAC v1.4.x Performance Tests

Reply #501
Thanks. It turns out that some changes I made when working out the 32-bit encoder/decoder affected the 24-bit part more than I thought. I was under the impression that the code paths meant for 32-bit audio were only seldom used for 24-bit audio, but it turns out that certain kinds of 24-bit audio (especially those with a completely empty upper half of the spectrum) use these code paths a lot, and they are much slower.

So, these changes make the choice between these code paths stricter: that choice used to be made rather roughly (on the safe side, of course), but now the encoder goes through a little more math so that it only chooses the slow code path when absolutely necessary.
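A minimal sketch of what such a stricter choice could look like, in C. This is an illustration under stated assumptions, not libFLAC's code. All function names are hypothetical. It assumes FLAC-style LPC, where the prediction is the sum of quantized coefficients times previous samples, arithmetically shifted right by `shift`. The rough estimate uses only the coefficient precision and the predictor order. The tighter one uses the sum of the actual coefficient magnitudes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of bits needed to represent the unsigned value v (0 for v == 0). */
static int bits_for(uint64_t v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Hypothetical rough estimate: treat every coefficient as if it could be as
   large as `precision` signed bits allow, so the prediction (before the
   shift) needs at most bps + precision + ceil(log2(order)) signed bits. */
static int rough_prediction_bits(int bps, int precision, int order)
{
    assert(bps >= 1 && bps <= 32 && order >= 1 && order <= 32);
    return bps + precision + bits_for((uint64_t)(order - 1));
}

/* Hypothetical tighter estimate: the coefficients are known, so
   |prediction| <= (sum of |q_i|) * 2^(bps-1), which needs at most
   bps + bits_for(sum of |q_i|) signed bits. */
static int tight_prediction_bits(int bps, const int32_t *qlp, int order)
{
    assert(bps >= 1 && bps <= 32 && order >= 1 && order <= 32);
    uint64_t abs_sum = 0;
    for (int i = 0; i < order; i++)
        abs_sum += (uint64_t)llabs((long long)qlp[i]);
    if (abs_sum == 0)
        abs_sum = 1; /* avoid a zero width; the bound stays conservative */
    return bps + bits_for(abs_sum);
}

/* Signed bits needed for residual = sample - (prediction >> shift):
   the wider of the two operands, plus one bit for the subtraction. */
static int residual_bits(int bps, int prediction_bits, int shift)
{
    int shifted = prediction_bits - shift;
    if (shifted < 1)
        shifted = 1;
    return (bps > shifted ? bps : shifted) + 1;
}

/* The slow, per-sample-checked path is only needed when the bound cannot
   prove that every residual fits in a signed 32-bit integer. */
static int needs_checked_path(int bps, int prediction_bits, int shift)
{
    return residual_bits(bps, prediction_bits, shift) > 32;
}
```

Here is an example with 24-bit samples, shift 8 and the made-up coefficients {400, -300, 200, -100, 50, -25, 10, -5} (precision 15, order 8):

- The rough estimate gives 42 prediction bits and 35 residual bits, so it would pick the checked path.
- The tight estimate gives 35 prediction bits and 28 residual bits, so the fast path is provably safe.

For 32-bit samples this particular bound always exceeds 32 bits.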

The speed-up is highly dependent on source material. Audio with a high sample rate in which the upper frequencies are fully 'utilised' does not see any change at all. Most audio I've tested sees quite some improvement at preset 8. Audio with really no content above 20 kHz sees the most improvement at preset 5, but is still slow at preset 8.
Music: sounds arranged such that they construct feelings.

Re: FLAC v1.4.x Performance Tests

Reply #502
Quote:
(especially those with a completely empty upper half of the spectrum)
Oh, and that maybe makes it even more interesting: those signals appear to be the ones where -e matters, i.e. where the model choice algorithm is less reliable. If such signals can be "identified" (not with certainty, but with enough statistical association), then we might be in for some fun ... uh, assuming that developers and testers have an infinite amount of spare time, of course.

Re: FLAC v1.4.x Performance Tests

Reply #503
This is a spectrogram of the 24-bit/48 kHz WAV file from my previous post.

The WAV file for this run is 24-bit/96 kHz, 1h52m long, 3.63 GiB.

flac git-04532802 (2024-05-02)
Preset  1 thread     8 threads
-5      0m26.6009s   0m7.253s
-5p     1m44.983s    0m20.706s
-8      2m10.424s    0m25.829s
-8p     17m37.321s   3m30.797s

flac git-cfe3afca (2024-05-16)
Preset  1 thread     8 threads
-5      0m20.809s    0m6.847s
-5p     0m35.891s    0m8.379s
-8      1m46.554s    0m21.323s
-8p     11m30.919s   2m20.278s
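The single-thread columns of the two tables can be compared with a few lines of C. This is a minimal sketch. The timings are copied from the tables, and the helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "XmY.YYYs" timing, given as minutes and seconds, to seconds. */
static double to_seconds(int minutes, double seconds)
{
    assert(minutes >= 0 && seconds >= 0.0);
    return minutes * 60.0 + seconds;
}

/* Speed-up of the newer build: > 1 means git-cfe3afca is faster. */
static double speedup(double old_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return old_seconds / new_seconds;
}

/* Prints the single-thread speed-up per preset, git-04532802 -> git-cfe3afca. */
void print_single_thread_speedups(void)
{
    const struct {
        const char *preset;
        double old_s, new_s;
    } runs[] = {
        {"-5",  to_seconds(0, 26.6009), to_seconds(0, 20.809)},
        {"-5p", to_seconds(1, 44.983),  to_seconds(0, 35.891)},
        {"-8",  to_seconds(2, 10.424),  to_seconds(1, 46.554)},
        {"-8p", to_seconds(17, 37.321), to_seconds(11, 30.919)},
    };
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-4s %.2fx\n", runs[i].preset,
               speedup(runs[i].old_s, runs[i].new_s));
}
```

The speed-ups come out at roughly 1.28x for -5, 2.9x for -5p, 1.22x for -8 and 1.53x for -8p. So -5p gains by far the most.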

Re: FLAC v1.4.x Performance Tests

Reply #504
-5p from 105 to 36 seconds ...  ;D
I didn't quite get the change, but is it so that the code checks that the residual fits a signed short, and when it does ... much faster?
Has it made changes as to when it can select the 4-bit method?

How does -5e fare? And what are the sizes for -5, -5e and -5p? Never mind whether there is any change in -5e; I am curious about the bang for the buck, and the comparison between -5e and -5p must have tilted by now.

Re: FLAC v1.4.x Performance Tests

Reply #505
Quote:
I didn't quite get the change, but is it so that the code checks that the residual fits a signed short, and when it does ... much faster?

When adding the 32-bit PCM part of the encoder, I amended the FLAC spec to require that all residuals fit in a 32-bit signed int. This is to keep decoding simple. The encoder must make sure this is the case.

It is possible to calculate that for certain predictors, checking each residual sample separately is not necessary. When this is not possible, each residual sample must be checked, which is slower of course. This calculation was improved, so the slow process of checking all residual samples is needed less often.
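The slow path described here can be sketched as follows: compute each residual in 64-bit arithmetic, and reject the predictor as soon as one value falls outside the signed 32-bit range.

This is a hypothetical C sketch, not libFLAC's actual function. The name, the signature and the warm-up handling are assumptions. It also relies on `>>` of a negative `int64_t` being an arithmetic shift. That is implementation-defined in C, but it is what mainstream compilers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "slow" path: compute every LPC residual with a 64-bit
   accumulator and verify that each one fits in a signed 32-bit int.
   data[0..n-1] are the samples; the first `order` samples are treated as
   warm-up samples and get no residual, so residual[] needs n - order slots.
   Returns false as soon as one residual would not fit, in which case the
   caller would have to try something else (e.g. another predictor). */
bool compute_residual_checked(const int32_t *data, size_t n,
                              const int32_t *qlp, unsigned order,
                              unsigned shift, int32_t *residual)
{
    /* With |q| < 2^15, |x| <= 2^31 and order <= 32, the sum stays far
       below the int64_t limit. */
    assert(order <= 32 && shift < 32);
    for (size_t i = order; i < n; i++) {
        int64_t sum = 0;
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp[j] * data[i - 1 - j];
        int64_t r = (int64_t)data[i] - (sum >> shift);
        if (r < INT32_MIN || r > INT32_MAX)
            return false; /* this residual cannot be stored */
        residual[i - order] = (int32_t)r;
    }
    return true;
}
```

Two example calls, both with `qlp = {1}`, `order = 1` and `shift = 0`:

- On the samples {0, 100, 200, 300}, it yields the residuals {100, 100, 100}.
- On the 32-bit samples {INT32_MAX, INT32_MIN}, it returns false, because their difference does not fit in 32 bits.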

I didn't know at the time that 24-bit encoding would be affected, but it turns out that for signals with very little noise in the upper frequencies (i.e. a smooth signal) the predictor can be of very high quality, which can lead to the residual spiking in parts of the signal where the predictor doesn't fit. This doesn't normally happen, but it needs to be checked anyway.

Re: FLAC v1.4.x Performance Tests

Reply #506
Ah. So you have just improved a criterion of the following kind:
"This predictor vector cannot possibly create any too big residual from a history of N-bit samples, so we can save time by bypassing the size checks that we are mathematically sure it would anyway pass"?

Re: FLAC v1.4.x Performance Tests

Reply #507
Assuming there are no performance differences between binaries built by different people (same FLAC git version, same compiler and version), these latest optimizations are indeed significant even on my setup (mobile Intel i5-8250U), though undoubtedly content-dependent.

Audio file 24/44, 1:20 hour, Flac compression "-4":
- Flac commit: 28e4f05, GCC14.1 (NetRanger) - ca. 440-450x
- Flac git-cfe3afca, GCC14.1 (Wombat) - ca. 495-505x
- Flac git-cfe3afca, GCC12.2 (Replica9K) - ca. 480-500x, fluctuates greatly during encoding

...but NetRanger's Clang compile is slightly but consistently the winner:
- Flac commit: 28e4f05, Clang 18.1.4 (NetRanger) - ca. 505-515x

A Clang result for git-cfe3afca could therefore be quite interesting. However, if I understand correctly, under standard conditions with no external influences (software etc.) there should ideally be no difference between GCC13 and Clang18. GCC14 also performs a bit below GCC13 for me.

Re: FLAC v1.4.x Performance Tests

Reply #508
Compression -4 may be pretty good with a clang compile but high compression and multithreaded stress should be clearly faster with GCC builds.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: FLAC v1.4.x Performance Tests

Reply #509
Quote:
Ah. So you have just improved a criterion of the following kind:
"This predictor vector cannot possibly create any too big residual from a history of N-bit samples, so we can save time by bypassing the size checks that we are mathematically sure it would anyway pass"?
Yes, that is correct.

Re: FLAC v1.4.x Performance Tests

Reply #510
Quote:
Compression -4 may be pretty good with a clang compile but high compression and multithreaded stress should be clearly faster with GCC builds.


OK, I didn't try that, but luckily everyone else here does. I chose compression "-4" based on the charts and my own tests yeeeeeears ago; all the remaining compression levels ceased to exist for me, I forgot them completely, and that continues to this day.
But I understand that the code here should primarily be tested under the stress conditions of higher compression, where the changes are more pronounced.

Re: FLAC v1.4.x Performance Tests

Reply #511
Quote:
OK, I didn't try that, but luckily everyone else here does. I chose compression "-4" based on the charts and my own tests yeeeeeears ago; all the remaining compression levels ceased to exist for me, I forgot them completely, and that continues to this day.
But I understand that the code here should primarily be tested under the stress conditions of higher compression, where the changes are more pronounced.
SORRY! It seems I'm a bit outdated. I just tried the latest flac git together with the latest Clang, and indeed it is faster than everything else so far, even at high compression! Guess I must change some things here.
Attached is a generic Clang -O3 compile without many additional optimizations, for testing the latest git.

Re: FLAC v1.4.x Performance Tests

Reply #512
First of all thanks for the binary file, Wombat!
I know this is a discussion primarily about encoder code optimization and not about the features and impact of different compilers on the result, so I don't want to clutter it up with a side issue here, sorry for that.
So, just in short: somehow things are going crazy on my side, as I'm unable to reproduce the previous results with the same setup anymore; everything is simply slower today. Anyway, the speed-up from "Release 1.4.3" to the latest git seen with GCC14.1 does not seem to happen with Clang. Rather, your binary has the same encoding performance as NetRanger's Clang binary of "Release 1.4.3", at best.
I have to look into what's wrong here and repeat the test later. There doesn't seem to be any background-process issue, and the fb2k converter settings don't affect this either.

Re: FLAC v1.4.x Performance Tests

Reply #513
You're absolutely right. Only once there really is a final version one day might I try different ways of compiling.

Re: FLAC v1.4.x Performance Tests

Reply #514
I tried some AVX2 builds on my 5900X: metaflac's ReplayGain is clearly faster with GCC, 16-bit encoding with the asm option disabled is clearly faster with GCC, and the combined 16-bit/24-bit test is faster with the default AVX2 Clang build.
Everything I was able to produce with LTO and Clang turned out slower.
I guess some experienced users can do better with LTO or PGO, or even a combination of both.