
Wavpack.exe Slow on AMD Ryzen

I have two machines:

#1 is a Ryzen 3900X with 32GB of 3200MHz RAM running Windows 10 x64
#2 is an i7-4770 with 16GB DDR3 running Windows 7 x64

Running Windows 64-bit wavpack.exe 5.1.0 on machine #1 is 64% SLOWER than machine #2.

Code:
wavpack -hh -x3 -m test.wav

Machine #1 takes 1:23 to complete the task while machine #2 takes 0:50. This is completely counterintuitive to me, since machine #1 bests machine #2 in all benchmarks, even single-threaded integer performance (24% faster). What am I missing? More importantly, is there anything I can do to speed up wavpack on machine #1?

EDIT - Just in case anyone was wondering, it's not a hard drive issue. Machine #1 has an NVMe drive and machine #2 is just spinning platters.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #1
WavPack has a lot of hand-optimized MMX code so it's quite possible it accidentally has a sequence in a critical loop that incurs a penalty on non-Intel CPUs.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #2
Interesting. AMD has all of the instruction sets that the i7 has. I wondered if maybe the binaries were built with an Intel compiler, which is known to kneecap other processors on purpose.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #3
The code is hand-written, so that's not an issue here.

Most time is spent in this loop:
https://github.com/dbry/WavPack/blob/master/src/pack_x64.asm#L599-L633

But nothing jumps out as being an issue on Ryzen.

What's the effective clock speed while the wavpack binary is running? Is it possible the Intel machine is getting a much higher boost?

Re: Wavpack.exe Slow on AMD Ryzen

Reply #4
The issue, if I recall correctly, with Intel's compiler is that it will not generate MMX, SSE, etc. for any processor other than Intel, even if they support those instructions. It's not about how the source is written, but the machine code generated by the Intel compiler.

I'm looking through the sources right now to see if there's something checking for an Intel CPU instead of checking for MMX support.
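To make the vendor-check concern concrete, here is a toy model in Python of the two dispatch strategies being contrasted. `GenuineIntel` and `AuthenticAMD` are the real CPUID vendor strings, but both functions are hypothetical illustrations and do not reproduce any compiler's actual dispatcher. As Reply #5 points out, WavPack's hot loop is hand-written assembly, so this mechanism would not apply to it anyway.

```python
# Toy model only: contrasts vendor-gated and feature-gated dispatch.
# These functions are hypothetical; they are not any compiler's real dispatcher.


def dispatch_by_vendor(vendor: str, has_sse2: bool) -> str:
    """Vendor-gated: the SIMD path is chosen only for Intel CPUs."""
    if vendor == "GenuineIntel" and has_sse2:
        return "simd"
    return "generic"


def dispatch_by_feature(has_sse2: bool) -> str:
    """Feature-gated: only the CPUID feature flag decides."""
    return "simd" if has_sse2 else "generic"


for vendor in ("GenuineIntel", "AuthenticAMD"):
    # Both CPUs report SSE2 support; only the vendor string differs.
    print(vendor, dispatch_by_vendor(vendor, True), dispatch_by_feature(True))
```

Under the vendor-gated scheme the AMD CPU falls back to the generic path even though it supports the same instructions; the feature-gated scheme treats both CPUs alike.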

On the i7 machine I'm getting 3.7 GHz during encoding. When I get to the other machine I'll get you the actual clock running the wavpack process. I don't suspect this is the issue since the benchmarks on machine #1 far surpass the same benchmarks on machine #2. If there were a boost clock issue I wouldn't think that would be the case.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #5
Quote
It's not about how the source is written, but the machine code generated by the Intel compiler.

Let me repeat this once again because it clearly didn't register: the relevant part of the source code is written directly in machine language by hand. There is nothing for the compiler to compile. The code IS MMX code.

The 64-bit binary assumes MMX because that's guaranteed to be available in AMD64 mode. It has no need to detect the CPU vendor to know it can use MMX.


Re: Wavpack.exe Slow on AMD Ryzen

Reply #6
Quote
Let me repeat this once again because it clearly didn't register: the relevant part of the source code is written directly in machine language by hand. There is nothing for the compiler to compile. The code IS MMX code.

You're not repeating yourself; you're clarifying your earlier statement:
Quote
The code is hand-written, so that's not an issue here.

Any ideas on tracking this down other than checking the obvious (i.e., Ryzen dropping into low frequencies while running wavpack)?

Re: Wavpack.exe Slow on AMD Ryzen

Reply #7
Quote
Any ideas on tracking this down other than checking the obvious (i.e., Ryzen dropping into low frequencies while running wavpack)?

A precise enough profiler (e.g. Linux perf) might be able to show if there's a particular hotspot that is unexpected (and that might be compared to the same output on an Intel system).

I wonder if it's worth compiling the WavPack code with the assembly code disabled to see what it does to performance (likely worse, but you never know...)
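Comparing an assembly-enabled build against a pure-C build needs a repeatable timing harness. Below is a minimal Python sketch, assuming two builds named `wavpack_asm.exe` and `wavpack_noasm.exe` (hypothetical names) and assuming WavPack's `-y` switch answers the overwrite prompt on repeated runs. It reports the median wall-clock time of several runs so that a single outlier doesn't skew the comparison.

```python
import shutil
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 3) -> float:
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdin=subprocess.DEVNULL,  # never block on an interactive prompt
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Hypothetical names for an assembly-enabled build and a pure-C build.
    builds = {"asm": "wavpack_asm.exe", "no-asm": "wavpack_noasm.exe"}
    for label, exe in builds.items():
        if shutil.which(exe) is None:
            print(f"{label}: {exe} not found, skipping", file=sys.stderr)
            continue
        # Same switches as the original test, plus -y (assumed) to overwrite test.wv.
        seconds = time_command([exe, "-hh", "-x3", "-m", "-y", "test.wav"])
        print(f"{label}: {seconds:.2f} s (median of 3 runs)")
```

Running the same script on both machines gives directly comparable numbers for each build.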

Re: Wavpack.exe Slow on AMD Ryzen

Reply #8
Quote
I wonder if it's worth compiling the WavPack code with the assembly code disabled to see what it does to performance (likely worse, but you never know...)

I'm happy to test that, but I'm trying to avoid downloading MSVC. I wish it were easy to build on Cygwin.

The weird thing is that Process Explorer shows wavpack running as a single thread using about 11.5% CPU on the i7 system the whole time. It doesn't seem like it's constrained by computational power.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #9
What's weird? Almost all audio encoders are single-threaded, AFAIK.
So yes, wavpack is constrained by single-threaded performance.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #10
Quote
What's weird? Almost all audio encoders are single-threaded, AFAIK.
So yes, wavpack is constrained by single-threaded performance.

It's weird that it's not even using 100% of the single core it's running on.

Anyway, I broke down, downloaded the 2GB of Visual Studio, and built wavpack without the ASM. It STILL runs faster on this i7-4770 than the assembly-enabled version runs on the Ryzen 3900X ... by about 30%.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #11
Off-topic: regarding "Almost all audio encoders are single-threaded, AFAIK": TAK and qaac, for example, are not.

@TS: try a Linux live CD to rule out Windows 10 as a (hypothetical) factor.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #12
I set wavpack's affinity to core #0 on the Ryzen 3900X, and core #0 boosts to 4.5 GHz while running wavpack. It's still 60% slower than the i7-4770 running at 3.7 GHz. Clock-for-clock, the i7 is almost precisely twice as fast.

I tried the non-asm version of wavpack I built and it's even slower than the asm-enabled version.
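The clock-for-clock claim can be checked by comparing total cycles (seconds × GHz). A sketch, assuming the 1:23 and 0:50 timings from the first post still hold with affinity set, and treating the reported 4.5 GHz and 3.7 GHz clocks as constant for the whole run:

```python
def cycles_ratio(t_a: float, ghz_a: float, t_b: float, ghz_b: float) -> float:
    """How many times more clock cycles machine A spends than machine B on the same job."""
    return (t_a * ghz_a) / (t_b * ghz_b)


# Assumed inputs: 83 s (1:23) on the Ryzen at 4.5 GHz, 50 s (0:50) on the i7 at 3.7 GHz.
ratio = cycles_ratio(83.0, 4.5, 50.0, 3.7)
print(f"Ryzen uses {ratio:.2f}x the cycles of the i7")  # 2.02x
```

A ratio this close to 2 is consistent with the observation that clock speed alone does not explain the gap: the Ryzen spends about twice as many cycles on the same encode.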

Re: Wavpack.exe Slow on AMD Ryzen

Reply #13
Quote
What's weird? Almost all audio encoders are single-threaded, AFAIK.
So yes, wavpack is constrained by single-threaded performance.
It's weird that it's not even using 100% of the single core it's running on.

Since you have 8 hardware threads and you're getting 1/8th utilization from a single-threaded program, it sounds like everything is normal.

Quote
Anyway, I broke down, downloaded the 2GB of Visual Studio, and built wavpack without the ASM. It STILL runs faster on this i7-4770 than the assembly-enabled version runs on the Ryzen 3900X ... by about 30%.

VS has a profiler built in: https://docs.microsoft.com/en-us/visualstudio/profiling/beginners-guide-to-performance-profiling?view=vs-2019

You could profile on both of your systems and see what the difference is.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #14
Quote
What's weird? Almost all audio encoders are single-threaded, AFAIK.
So yes, wavpack is constrained by single-threaded performance.
It's weird that it's not even using 100% of the single core it's running on.

It's because you don't understand what the numbers represent.
Your CPU has 8 logical cores.
100% of 1/8 = 12.5%
So yes, it's using close to 100% of a core, but that is only about 12.5% of the CPU.
11.5% of your CPU = 92% of a core

So that does not seem weird at all for a single-threaded process.
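The percentage arithmetic in this reply, as a small Python sketch. It assumes, as Process Explorer does by default, that CPU% is reported as a share of all logical processors:

```python
def share_of_one_cpu(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert a whole-machine CPU% into the fraction of one logical CPU in use."""
    one_cpu_percent = 100.0 / logical_cpus  # a fully busy logical CPU
    return total_cpu_percent / one_cpu_percent


# i7-4770: 4 cores x 2 threads = 8 logical CPUs
print(100.0 / 8)                   # 12.5
print(share_of_one_cpu(11.5, 8))   # 0.92
```

By the same arithmetic, on the 24-thread 3900X a fully busy single thread would show up as only about 4.2% (100 / 24).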



Going back to your Ryzen: some of the performance you are missing could also come from CCX jumping, which results in large cache-latency penalties.
Try using affinity to lock it to just one of your CCXs.
Avoiding CCX jumping has been shown to give up to a 23% FPS boost in games, so I would think it could give some boost for something even more CPU-bound.

The Ryzen CCX design really hurts thread performance if you don't take control of your threads.
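Locking a process to one CCX means building an affinity mask. A Python sketch for the 3900X, assuming four CCXs of three cores each, two SMT threads per core, and Windows numbering the logical CPUs CCX by CCX with SMT siblings adjacent. That numbering is an assumption; check the real topology (e.g. with Sysinternals Coreinfo) before relying on it. The hex mask is the form cmd's `start /affinity` switch takes.

```python
def ccx_affinity_mask(ccx: int, cores_per_ccx: int = 3, threads_per_core: int = 2) -> int:
    """Bitmask selecting every logical CPU on one CCX.

    Assumes logical CPUs are numbered CCX by CCX, with both SMT threads of a
    core adjacent (0-1 = core 0, 2-3 = core 1, ...).
    """
    cpus_per_ccx = cores_per_ccx * threads_per_core
    first = ccx * cpus_per_ccx
    return ((1 << cpus_per_ccx) - 1) << first


for ccx in range(4):  # the 3900X has four 3-core CCXs
    print(f"CCX {ccx}: start /affinity {ccx_affinity_mask(ccx):X} wavpack -hh -x3 -m test.wav")
```

Restricting the process to one CCX's six logical CPUs keeps it on a single L3 slice while still letting the scheduler move it between that CCX's cores.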
Sven Bent - Denmark

Re: Wavpack.exe Slow on AMD Ryzen

Reply #15
Quote
It's because you don't understand what the numbers represent.

I understand what they mean. I think some people are misled because they're thinking 11.5% = 12.5%. That's like Intel floating point math. The i7-4770 is 4 cores, with 4 pretend cores. 100% utilization of a core should be at least 12.5%. 11.5% means it's not using 100% of the core. If there's no I/O bottleneck, what's the deal with the 8% speed penalty?

Quote
Going back to your Ryzen: some of the performance you are missing could also come from CCX jumping, which results in large cache-latency penalties.
Try using affinity to lock it to just one of your CCXs.
Avoiding CCX jumping has been shown to give up to a 23% FPS boost in games, so I would think it could give some boost for something even more CPU-bound.

The Ryzen CCX design really hurts thread performance if you don't take control of your threads.

That's why I was saying I locked the affinity to core #0, to no effect. The fact that it's close enough to exactly 2x clock-for-clock to be within the margin of calculation error got me thinking about the way first-generation Ryzen processors split each 256-bit AVX instruction into two 128-bit operations, which incurred a 50% speed penalty relative to Intel CPUs with native 256-bit AVX, but this is different. This is the first program I've noticed this issue with, so there's something peculiar about that loop that really hurts on Ryzen, even with double the RAM at twice the speed.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #16
Quote
It's because you don't understand what the numbers represent.
I understand what they mean. I think some people are misled because they're thinking 11.5% = 12.5%. That's like Intel floating point math. The i7-4770 is 4 cores, with 4 pretend cores. 100% utilization of a core should be at least 12.5%. 11.5% means it's not using 100% of the core.

That is normal sampling error. If you want the exact stats, you'll need to log them yourself using a profiler.

Also, you don't have 4 real and 4 pretend cores. You have 4 cores, each of which is 2 threads wide.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #17
If I profile the wavpack release build, I don't get function names, but a debug build is about 75% slower, which might mask the speed issue. I'm not sure which is better. I'm not knowledgeable enough to interpret the results, but this is what I see profiling the release build on the i7-4770:



And this is with the debug version:


Re: Wavpack.exe Slow on AMD Ryzen

Reply #18
Thanks for sharing the results.

Unfortunately, results from the debug build are likely not meaningful because debug build should have no optimizations whatsoever, as well as additional debug checks adding overhead.

In order to get function names in the release build, you need to enable debug symbols in the release build, rebuild, and run again.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #19
Quote
In order to get function names in the release build, you need to enable debug symbols in the release build, rebuild, and run again.

I'm not sure how to do that with Visual Studio. I'll try to figure it out, but if someone knows how, it would probably save me a lot of time.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #20
Got it. It ran at full speed. Profiler report looks like this:



If I'm reading things right, the most time is spent on the:

Code:
pmaddwd mm1, mm5

instruction, which appears in a couple of places. But this is the fast i7-4770 machine. Now I need to see what's happening on the 3900X.
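For reference, here is a Python model of what `pmaddwd mm1, mm5` computes, following Intel's description of the instruction. Four pairs of signed 16-bit words are multiplied at full precision, and adjacent products are summed into two signed 32-bit results that replace `mm1`. This is an illustrative model, not WavPack code; the lists are ordered lowest word first.

```python
def _to_int16(x: int) -> int:
    """Wrap an integer to a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def _to_int32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Model of MMX PMADDWD on 64-bit registers.

    a and b each hold four signed 16-bit words. Each pair is multiplied, and
    adjacent products are summed into two signed 32-bit dwords. The only
    overflow case (all four inputs = -32768) wraps to -2**31.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("MMX registers hold exactly four 16-bit words")
    words_a = [_to_int16(v) for v in a]
    words_b = [_to_int16(v) for v in b]
    products = [x * y for x, y in zip(words_a, words_b)]
    return [_to_int32(products[0] + products[1]), _to_int32(products[2] + products[3])]


print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))  # [17, 53]
```

Because each `pmaddwd` does four multiplies and two adds, it is a natural fit for a multiply-accumulate loop like the one the profiler points at.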

Re: Wavpack.exe Slow on AMD Ryzen

Reply #21
It occurs to me that this information may not be helpful. The 3900X could be 10X slower but still spend the same percentage of time in the same loops. Is there a way to quantify the actual number of instructions or the amount of time to execute an instruction using this profiler? Is that what the numbers next to the percentages show? For example, the 18,656 next to the pack_decorr_stereo_pass_cont_common: Is that the number of times that particular loop executed in the 30.685 seconds of profiling (i.e., 608 per second)?

 

Re: Wavpack.exe Slow on AMD Ryzen

Reply #22
Quote
It occurs to me that this information may not be helpful. The 3900X could be 10X slower but still spend the same percentage of time in the same loops. Is there a way to quantify the actual number of instructions or the amount of time to execute an instruction using this profiler?

You would need a lower-level profiler (or at least I don't think VS can do it, but I could be wrong) like Intel VTune or AMD CodeXL. The problem with that is that you'd most likely have to use each vendor's own tool, since you want access to low-level timing information and hardware counters.

However, your profiling above says that essentially the entire runtime is spent in that one bit of hand-written MMX. Unfortunately, since it looks like the linker inlines, you're not able to see finer detail. Have you tried profiling the version with MMX disabled and just letting the compiler generate its own code?

Quote
Is that what the numbers next to the percentages show? For example, the 18,656 next to the pack_decorr_stereo_pass_cont_common: Is that the number of times that particular loop executed in the 30.685 seconds of profiling (i.e., 608 per second)?

That is the number of milliseconds spent in that function and anything it calls.
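Since that column is inclusive milliseconds, it can be converted back into a share of the run. The millisecond figures, not the percentages, are what can be compared across the two machines. A sketch using the figures quoted in Reply #21 (18,656 ms in a 30.685 s profiling session):

```python
def inclusive_share(inclusive_ms: float, total_ms: float) -> float:
    """Fraction of the profiled run spent in a function plus everything it calls."""
    return inclusive_ms / total_ms


# Figures from the i7-4770 profile: 18,656 ms inclusive in a 30.685 s session.
share = inclusive_share(18_656, 30_685)
print(f"{share:.1%}")  # 60.8%
```

If both profiles cover the same encode from start to finish, dividing the 3900X's inclusive milliseconds for pack_decorr_stereo_pass_cont_common by 18,656 gives that function's own slowdown, independent of how the percentages compare.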

Re: Wavpack.exe Slow on AMD Ryzen

Reply #23
Looks like this on the 3900X:



You can see the time is longer by a long shot.

Re: Wavpack.exe Slow on AMD Ryzen

Reply #24
Looks like this after compiling with /Ob0 (inline expansion disabled):



OUCH!!!