Topic: FLAC v1.4.x Performance Tests


Re: FLAC v1.4.x Performance Tests

Reply #77
Case GCC 12.2.0
Total encoding time: 1:11.218, 30.19x realtime
425513472 bytes

Case Haswell
Total encoding time: 1:16.328, 28.17x realtime
425513511 bytes

As far as I can tell, the only difference between these two builds is GCC version (12.2.0 for the former, 7.3.0 for the latter). @Case is that correct?
That is correct. The builds use identical configuration settings but different compiler version.

Re: FLAC v1.4.x Performance Tests

Reply #78
That is correct. The builds use identical configuration settings but different compiler version.

Not huge, but a pretty decent improvement on GCC's part. Obviously enough to make it jump from last to 1st place in that particular list.
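
To put a number on "pretty decent", the two timings quoted above can be compared directly. This is only a small Python sketch: the helper name parse_time is made up for illustration, and the figures are the ones from the quoted results.

```python
def parse_time(text: str) -> float:
    """Convert a 'm:ss.mmm' timing string into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

gcc12 = parse_time("1:11.218")   # Case GCC 12.2.0
gcc7 = parse_time("1:16.328")    # Case Haswell (GCC 7.3.0)

# Ratio of encoding times: how much faster the GCC 12.2.0 build is.
speedup = gcc7 / gcc12
print(f"GCC 12.2.0 build is {100 * (speedup - 1):.1f}% faster")

# The 'x realtime' figure is audio duration divided by encoding time,
# so time * factor recovers the length of the test material in seconds.
print(f"audio length: {gcc12 * 30.19:.0f} s vs {gcc7 * 28.17:.0f} s")
```

That works out to roughly 7% on this corpus, and both builds report the same amount of audio, so the realtime factors are consistent with the times.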

Re: FLAC v1.4.x Performance Tests

Reply #79
One quick and dirty metaflac performance test. I apply album RG to 20 24bit albums. 18,63GB altogether on my Ryzen 5900x.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here:
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Re: FLAC v1.4.x Performance Tests

Reply #80
Finally found some time to set up GCC. I used the flags Case kindly offered but set CFLAGS to -Ofast instead of -O3.
The binary is even a bit faster over here. Maybe others want to try it, because -Ofast may include optimizations that cause problems. It is flac from current git.

Btw., if this is faster, it would be nice if Case could do an official 1.4.1 flac and metaflac. I am not so sure about the preconfigured scripts I use. If you want recent git versions, Netranger is the man. He has a routine for that.

Re: FLAC v1.4.x Performance Tests

Reply #81
Yes, faster on my i3-12100 too. The first result is 4 CDs combined into a single .wav, the second result is the same 4 CDs split into 85 files. All tests used -8p.

Case GCC 12.2.0
Total encoding time: 2:23.610, 101.98x realtime
1465547067 bytes
Total encoding time: 0:33.735, 434.16x realtime
1465976688 bytes

Wombat GCC 12.2.0 Ofast
Total encoding time: 2:19.765, 104.79x realtime
1465547074 bytes
Total encoding time: 0:32.875, 445.52x realtime
1465977283 bytes

Re: FLAC v1.4.x Performance Tests

Reply #82
Code:
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.

Re: FLAC v1.4.x Performance Tests

Reply #83
Maybe others want to try because -Ofast may include optimizations that make problems.
I googled a bit, and it seems that -Ofast is not an optimization for a particular processor architecture; it optimizes by relaxing some floating-point rules. I would like to see comments from some developers on whether it is safe to use in the case of FLAC.

Re: FLAC v1.4.x Performance Tests

Reply #84
Actual definition is:
Quote
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens. It turns off -fsemantic-interposition.

Re: FLAC v1.4.x Performance Tests

Reply #85
The created files are exactly the same here using -O3 or -Ofast.

Re: FLAC v1.4.x Performance Tests

Reply #86
floating point logic. Would like to see comments from some developers about whether it is safe to use or not in the case of FLAC.

No reason it should be unsafe. You use whatever-you-like to come up with a predictor, good or bad - the difference between a "good" and a "bad" one being the size of the residual. At the end of this post, ktf refers to this Stackexchange question where the point is described: it rounds off, and maybe that gives a slightly suboptimal predictor, which may be a reason why -p makes for a better performance "than it should".

Different compiles leading to slightly different .flac files has been a FAQ item for ages, and lo and behold they still do even if Wombat did not experience such differences right here.  https://hydrogenaud.io/index.php/topic,122949.msg1015699.html#msg1015699 . Unlike Monkey's - where signal and mode (Normal, High etc) give a unique encoded file except header/footer (tags) stuff and thus Monkey's chooses an MD5 on the encode and not the PCM - there are literally millions of potential FLAC files that represent the same audio.
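
A rough way to check "same audio, different file" without decoding: FLAC stores an MD5 of the unencoded audio in the STREAMINFO block, so two encodes can be compared on that field instead of on the file bytes. The sketch below reads it from raw bytes. It assumes the stream starts with the "fLaC" marker directly followed by STREAMINFO (a file with an ID3v2 tag prepended would need that skipped first), and fake_header is a made-up helper that only builds a synthetic header for the demo, not a playable file.

```python
import hashlib

def streaminfo_md5(flac_bytes: bytes) -> bytes:
    """Return the 16-byte MD5 of the unencoded audio stored in STREAMINFO."""
    if flac_bytes[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = flac_bytes[4] & 0x7F               # low 7 bits: metadata block type
    length = int.from_bytes(flac_bytes[5:8], "big")  # 24-bit block length
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    body = flac_bytes[8:8 + 34]
    return body[18:34]                               # last 16 bytes are the MD5

def fake_header(pcm: bytes, max_frame_size: int) -> bytes:
    """Hypothetical demo helper: synthetic header only, NOT a decodable FLAC file."""
    body = bytearray(34)
    body[7:10] = max_frame_size.to_bytes(3, "big")   # max frame size field
    body[18:34] = hashlib.md5(pcm).digest()          # MD5 of the raw PCM
    return b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + bytes(body)

pcm = b"\x00\x01" * 1000
a = fake_header(pcm, 4000)   # stands in for "compile A"
b = fake_header(pcm, 4100)   # stands in for "compile B"
print("files identical:", a == b)
print("audio MD5 identical:", streaminfo_md5(a) == streaminfo_md5(b))
```

On real files one would pass the first 42 bytes of each .flac. A matching MD5 only says the encoder saw the same PCM; a full decode test is still what verifies the file itself.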

Re: FLAC v1.4.x Performance Tests

Reply #87
It is already shown on other tests that different compiles created different files which decoded to the same PCM output, no MD5 error and such, like this:
[edit: corrected wrong link]
https://hydrogenaud.io/index.php/topic,123025.msg1016817.html#msg1016817

But this article indicated that for example, Infinities and NaNs can be treated differently:
https://simonbyrne.github.io/notes/fastmath/

So it is not only a matter of math precision; it also depends on what the higher-level code expects the program to do. I am not talking about the risk of producing non-bit-perfect FLAC files, but about the risk of a significant slowdown or a crash when encoding some inputs, especially crafted ones.


Re: FLAC v1.4.x Performance Tests

Reply #89
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
That's why a -Ofast compile from Case's environment could be interesting.

Re: FLAC v1.4.x Performance Tests

Reply #90
The FMA intrinsics are compiled with "-ffast-math".
[...]
I am not sure why the SSE and AVX ones are not.
Because the SSE and AVX code is written with intrinsics, but the FMA code is plain C targeted at FMA. With SSE and AVX, instructions do not need to be reordered, but with FMA they do, so -fassociative-math is needed, which is part of -ffast-math.
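
For anyone wondering why reordering needs a special flag at all: floating-point addition is not associative, so a compiler that regroups a sum can change the result. A tiny illustration in plain Python floats (IEEE 754 doubles); this is not FLAC code, and the values are picked only to make the effect visible.

```python
a, b, c = 1e16, -1e16, 1.0

left_to_right = (a + b) + c   # (1e16 - 1e16) + 1.0
reordered = a + (b + c)       # -1e16 + 1.0 rounds back to -1e16 first

print(left_to_right, reordered)
print("same result:", left_to_right == reordered)
```

Without -fassociative-math the compiler has to keep the grouping the source code spells out; with it, either grouping is allowed, which is how two builds can end up with slightly different predictors for the same input.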
Music: sounds arranged such that they construct feelings.

Re: FLAC v1.4.x Performance Tests

Reply #91
That makes total sense, thank you.

Re: FLAC v1.4.x Performance Tests

Reply #92
A gcc -Ofast compile of flac and metaflac from reference lib flac 1.4.1.

 

Re: FLAC v1.4.x Performance Tests

Reply #93
subdivide_tukey(N/taper), testing the tapering parameter

Background: As explained in the docs, subdivide_tukey(N) by default tapers a fraction of 0.5 divided by N. The "/" is not a division slash here, only a separator character: subdivide_tukey(N/P) means that a fraction P/N is tapered.
P defaults to 0.5 - as has been the case for the default windowing function up to level -5, and before 1.4 also up to -8.
The question is: why divide the 0.5 (or other P) by N? Presumably ktf & co. have tested it and found that it is an improvement.
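
For reference, this is what the taper parameter does to a single window. The sketch is the textbook Tukey (tapered cosine) window with p as the tapered fraction; it is meant to show the shape only, and libFLAC's own implementation may differ in details. The function name and the block length of 4096 are just choices for the demo.

```python
import math

def tukey(length: int, p: float) -> list[float]:
    """Textbook Tukey window; p is the tapered fraction (0 < p <= 1)."""
    if length == 1:
        return [1.0]
    edge = p * (length - 1) / 2.0          # width of each cosine ramp, in samples
    window = []
    for n in range(length):
        d = min(n, length - 1 - n)         # distance to the nearest end
        if d < edge:
            window.append(0.5 * (1.0 - math.cos(math.pi * d / edge)))
        else:
            window.append(1.0)             # flat middle section
    return window

# Default P = 0.5 on its own, and divided by N = 3 as subdivide_tukey(3) does.
for n_parts in (1, 3):
    p = 0.5 / n_parts
    w = tukey(4096, p)
    flat = sum(1 for x in w if x == 1.0)
    print(f"p={p:.3f}: {flat} of {len(w)} samples left untouched")
```

The smaller the tapered fraction, the more of the block passes through at full weight, which is the knob being tested below.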

Indeed, I found some evidence that it "is not enough": if you bother to tweak, make the tapering parameter even smaller. 
Test for yourselves, it is likely material-dependent. And use scientific notation like 8e-2 rather than the locale-dependent 0.08 or 0,08.

What I did: ran tapering parameters 8e-2, 16e-2, and upwards in steps of 8e-2, with N=3, 4, 5. So, subdivide_tukey(3/8e-2), subdivide_tukey(3/16e-2), ... and then bumping the "3" up to 4 and 5. (Not hitting -8 exactly: I did subdivide_tukey(3/48e-2) and not 3/50e-2, but I already had a standard -8 ... not that it mattered.) Why stop at 5? Because testing indicated that around there, -8p becomes more attractive.
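
If you want to repeat the sweep, the option strings are easy to generate. A small Python sketch: it only builds the strings to pass to flac's -A (apodization) option and does not run flac. The upper end of 48e-2 is an assumption taken from the 3/48e-2 mentioned above, and the function name sweep is made up.

```python
def sweep(parts=(3, 4, 5), first=8, last=48, step=8):
    """Yield (N, P, apodization string) for every combination in the grid."""
    for n in parts:
        for hundredths in range(first, last + 1, step):
            # Scientific notation avoids the locale-dependent 0.08 vs 0,08.
            spec = f"subdivide_tukey({n}/{hundredths}e-2)"
            yield n, hundredths / 100, spec

for n, p, spec in sweep():
    print(f'-A "{spec}"   # tapered fraction of the full block: {p / n:.4f}')
```

Each printed line can be appended to an otherwise identical flac command line, so that only the apodization differs between runs.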

Results: For all of N=3 (as in -8!), N=4 and N=5, the taper parameter that made for the smallest files was 24e-2, i.e. 0.24, rather than the default 0.5. After that, 0.32 was marginally better than 0.16.

This points at a smaller taper parameter than the one third in the "arbitrarily tested" combination in Reply 48.

Also checked (this is preliminary): as in Reply 48, combining with a bigger single tukey.
Hypothesis: a single tukey has always had the default parameter 0.5, chosen after quite a bit of testing back in the day, so there is no good reason why this small tapering should be good for a single tukey run. If so, the reason it works is the subdivisions, and if you want to improve further, try adding one with a bigger taper parameter, like the one -5 uses.
This is to be tested with -p, because (I guess!) there is more demand for something that squeezes out more than -8p without wasting the month than for something between -8 and -8p. (However, since -7 is so good, there might be a case for beefing up -8 further.)

Re: FLAC v1.4.x Performance Tests

Reply #94
Code:
FLAC Binary: flac141-wombat-gcc1220-OFast.exe (794624 bytes)
FLAC Option: -7
 Average time =  26.876 seconds (5 rounds), Encoding speed = 402.29x
 FLAC size = 1.167.014.661 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-gcc12.exe (781312 bytes)
FLAC Option: -7
 Average time =  25.931 seconds (3 rounds), Encoding speed = 416.95x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
Global  Time =    25.546 =  100%    Physical Memory =     14 MB
 Average time =  25.392 seconds (3 rounds), Encoding speed = 425.80x
 FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)
On my old Intel Core i7-8700 CPU @ 3.20GHz still Case's Haswell build (= GCC v7.3.0) followed by Case's GCC v12.2 build.
Could you do the same test on the Xiph build as well? Because it is interesting that your speed ranking is opposite to mine. For me Wombat's build is the fastest while Case's GCC 7.3.0 is the slowest.

Re: FLAC v1.4.x Performance Tests

Reply #95
One quick and dirty metaflac performance test. I apply album RG to 20 24bit albums. 18,63GB altogether on my Ryzen 5900x.
Time in minutes:
2:42 xiph official
2:33 john33 GCC 12.2.0 znver2
2:28 Case GCC 12.2.0 thanks Case btw. :) from here:
2:12 for the -Ofast version.
All files I created on the Ryzen are identical to those from the -O3 compile. The additional -Ofast optimizations work well with the flac code.

Re: FLAC v1.4.x Performance Tests

Reply #96
@bennetng: Here you go:
Code:
FLAC Binary: flac141-case-haswell.exe (860160 bytes)
FLAC Option: -7
- Average time =  25.288 seconds (5 rounds), Encoding speed = 427.56x (a little faster today... ;)
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

FLAC Binary: xiph-141\flac.exe (299520 bytes)
FLAC Option: -7
- Average time =  27.598 seconds (5 rounds), Encoding speed = 391.77x
- FLAC size = 1.167.014.383 bytes (= 61,188% of WAV size, ~863 kbps)

Re: FLAC v1.4.x Performance Tests

Reply #97
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.

Re: FLAC v1.4.x Performance Tests

Reply #98
Thanks. Interesting that my results are more similar to the Ryzen than a not much older i7.
Four generations, four years - but nothing revolutionary in the architecture? I agree that it is kinda unexpected.
Thinking aloud:
* You are both running them single-threaded? They differ: 6 cores 12 threads  vs 4 cores 8 threads.
* Is there any reason that e.g. RAM should matter?

Re: FLAC v1.4.x Performance Tests

Reply #99
My test environment is this:
CPU: Intel Core i7-8700 CPU @ 3.20GHz
RAM: 2 x 16 GB DDR4-2666 (1333 MHz) SK-Hynix
HDD: Samsung SSD 860 EVO 500GB

Both the source WAVs and the created FLACs come from/go to that SSD.
Btw. is there any (= freeware, trusted, non-system-cluttering) RAM disk solution for Windows 10 to recommend?
And yes, I'm running the test single core, file by file, from a console window. Timing is done with Igor Pavlov's timer64.exe.
But I did not try to select a specific CPU (I think there are tools for this), but let the Windows task manager do its job. So when you watch the task manager, there's not a single CPU constantly at 100% while the test is running, but CPUs are swapped.