Topic: One updated and one new audio processing DSP library

Re: One updated and one new audio processing DSP library

Reply #25
Quote
Convolution version 4, without using any double. Files within the same group are identical (also true for version 1).
Quality-wise, version 4 is obviously cleaner and more consistent across different compiler settings.

Interesting that the canonical version with the "safest" compiler settings produces the worst result (if I'm reading your graphs correctly)!

I suspected that the 4th version might (in theory) produce the best results, because it orders the adds from the smallest to the largest values, but I was never able to measure it. What's more surprising is that (again, if I'm reading correctly) the more advanced SIMD options produce progressively better results than the "correct" ones. I have no idea why that would be.

It seems though that it might be reasonable to switch to version 4 as the default.
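Floating-point addition is not associative, which is why the order of the adds can change the result at all. A minimal C sketch of the general effect (sum_forward and sum_backward are our own illustrative names, not library functions, and this is not the library's filter code): adding four 1.0f values to 16777216.0f (2^24) one at a time loses each of them to round-to-nearest-even, whereas adding them to each other first keeps them. It assumes IEEE-754 single precision with no excess precision and no -ffast-math/-Ofast style reassociation, since those options are exactly what allow a compiler to reorder the adds.

```c
#include <stddef.h>

/* Add n floats from first to last, rounding to float after each add.
   With the array sorted ascending, this adds the small terms first. */
static float sum_forward(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; k++)
        sum += x[k];
    return sum;
}

/* Add the same floats from last to first (large terms first). */
static float sum_backward(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t k = n; k-- > 0; )
        sum += x[k];
    return sum;
}
```

With x = {1, 1, 1, 1, 16777216}, sum_forward gives 16777220 while sum_backward gives 16777216: near 2^24 the float spacing is 2, so each lone 1.0f is rounded away, but the partial sum 4 survives.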

Reply #26
If someone fancies reading assembler, here's a diff view in godbolt.org showing version 1, -O3 vs -Ofast: https://godbolt.org/z/Gsb9eT1zb

You can change the version by changing the #define in line 50 of each editor and the compiler options by editing the textfield of each assembly view.

Reply #27
I just tried this, let's call it version 5:
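Code:
static double apply_filter(float* A, float* B, int num_taps)
{
    /* Outside-in: each pass pairs taps 0 and 1 with their mirror taps
       i and i-1. Assumes num_taps is a positive multiple of 4. */
    int i = num_taps - 1;
    float sum = 0.0f;

    do {
        sum += A[0] * B[0] + A[i] * B[i] + A[1] * B[1] + A[i - 1] * B[i - 1];
        A += 2; B += 2;   /* both ends move inward by 2 taps, */
    } while ((i -= 4) > 0);  /* so the mirror offset shrinks by 4 */

    return sum;
}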
In MSVC, regardless of settings (AVX2 or not, fp:fast or fp:precise), it has similar speed to version 3 and similar quality to version 4, whereas version 3 has obviously lower quality and version 4 is obviously slower. However, with mingw64 and -Ofast (AVX2 or not), version 5 is slower than version 4, although they have similar quality.

So MSVC benefits from having four products in the same statement, while mingw64 does the opposite.
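For comparison with version 5's four products per statement, here is the same outside-in idea written with two products per statement. This is a hypothetical sketch (apply_filter_2 is our own name), not necessarily how the library's version 4 is written. It pairs tap k with its mirror tap num_taps-1-k, so for a kernel whose outer taps are small, the small products are accumulated first.

```c
/* Hypothetical outside-in filter loop with two products per statement
   (apply_filter_2 is an illustrative name, not the library's code).
   Tap k is paired with its mirror tap num_taps-1-k, so the outermost
   taps are accumulated first. Assumes num_taps is a positive even
   number. */
static double apply_filter_2(const float *A, const float *B, int num_taps)
{
    float sum = 0.0f;
    int i = num_taps - 1;   /* offset from the current tap to its mirror */

    do {
        sum += A[0] * B[0] + A[i] * B[i];
        A++; B++;           /* both ends move inward by one tap, */
    } while ((i -= 2) > 0); /* so the mirror offset shrinks by two */

    return sum;
}
```

Both forms visit the same taps in the same outside-in order; they differ only in how many products are grouped per statement, which is the kind of difference the MSVC and mingw64 timings above seem to react to in opposite ways.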

MSVC v3
F:\Programming\art-msvc\x64\Release>art-msvc-v3 -q -y -4 -r76543 1.wav 2.wav
8367
F:\Programming\art-msvc\x64\Release>art-msvc-v3 -q -y -4 -r44100 2.wav art-msvc-v3.wav
4852

MSVC v4
F:\Programming\art-msvc\x64\Release>art-msvc-v4 -q -y -4 -r76543 1.wav 2.wav
10562
F:\Programming\art-msvc\x64\Release>art-msvc-v4 -q -y -4 -r44100 2.wav art-msvc-v4.wav
6080

MSVC v5
F:\Programming\art-msvc\x64\Release>art-msvc-v5 -q -y -4 -r76543 1.wav 2.wav
8418
F:\Programming\art-msvc\x64\Release>art-msvc-v5 -q -y -4 -r44100 2.wav art-msvc-v5.wav
4881

Magenta and white are v3, showing lower quality; the others are v4 and v5, which are similar. Notice that the two v3 results have 0.00001% THD and 0.00003% IMD+N. The figures are low, but still worse than v4 and v5.

Reply #28
Tested on some old Ubuntu (desktop):
  • gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

Compiled with -O3 and ran:
Code:
./art -q -y -4 -r96000 INPUT OUTPUT
five times, using:
Code:
for ((i=1;i<=5;i++)); do echo -n "$i "; { time ... ; } 2>&1 | grep "real" | cut -f2; done
and discarded the max and min. The input is 30 s of 44.1 kHz float audio.
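The timing loop keeps the middle three of five runs. If you would rather reduce each configuration to a single number, a small C helper along these lines (trimmed_mean is our own name, not part of the thread's scripts) sorts the run times, drops the single fastest and slowest run, and averages the rest.

```c
#include <stdlib.h>

/* qsort comparator for doubles, ascending order. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Mean of n run times after discarding the single minimum and maximum.
   Sorts t in place; requires n >= 3. */
static double trimmed_mean(double *t, size_t n)
{
    qsort(t, n, sizeof *t, cmp_double);
    double sum = 0.0;
    for (size_t k = 1; k + 1 < n; k++)
        sum += t[k];
    return sum / (double)(n - 2);
}
```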

Comparing this change:
Code:
-    sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps);
-    sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps);
+    sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
+    sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
There seems to be a significant time improvement with version 4:
Code:
                                            version
                          1             2             3             4
                     2 0m6.418s    2 0m3.355s    2 0m2.201s    2 0m3.774s
cxt->numTaps         4 0m6.458s    3 0m3.357s    3 0m2.200s    3 0m3.772s
                     5 0m6.424s    5 0m3.373s    5 0m2.205s    4 0m3.826s

                     2 0m6.423s    1 0m3.365s    2 0m2.199s    1 0m3.424s
cxt->numTaps/8*8     4 0m6.421s    3 0m3.363s    3 0m2.197s    2 0m3.388s
                     5 0m6.417s    5 0m3.379s    5 0m2.198s    3 0m3.399s

But then I tested on a somewhat newer Ubuntu (laptop):
  • gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
  • Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz

and there is no change (version 4 only):
Code:
     4
1 0m3.666s
2 0m3.683s
4 0m3.696s

1 0m3.683s
3 0m3.668s
5 0m3.665s
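The diff in Reply #28 passes cxt->numTaps/8*8 instead of cxt->numTaps. Integer division truncates, so this rounds the tap count down to a multiple of 8 (for non-negative n, the same as n & ~7). If numTaps is already a multiple of 8, the same taps are used; otherwise the last numTaps % 8 taps are dropped. The helpers below are our own illustration, not library code. Why the rewritten expression sped up version 4 with the older gcc is not established here; one unverified guess is that, after inlining, the compiler can exploit the known multiple-of-8 count.

```c
/* Round a non-negative tap count down to a multiple of 8, which is what
   cxt->numTaps / 8 * 8 computes. Integer division truncates, so
   67 / 8 * 8 == 64. */
static int round_down_to_8(int n)
{
    return n / 8 * 8;
}

/* Number of trailing taps the rounded call skips
   (0 when n is already a multiple of 8). */
static int skipped_taps(int n)
{
    return n - round_down_to_8(n);
}
```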

Reply #29
This library implementation is sub-optimal: the lower-quality modes, with their aliasing artifacts, are not fast enough to compensate for the lower quality, and the high-quality mode is extremely slow.
Please remove my account from this forum.

Reply #30
Quote
If someone fancies reading assembler, here's a diff view in godbolt.org showing version 1, -O3 vs -Ofast: https://godbolt.org/z/Gsb9eT1zb
You can change the version by changing the #define in line 50 of each editor and the compiler options by editing the textfield of each assembly view.

I have played around with this site before (it’s pretty amazing), but I see you’ve figured out a few tricks. Cool how using argc in the code prevents everything from getting optimized away! Unfortunately my assembler is rusty and I never went past MMX, but I can sort of see what’s going on.
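The argc trick works because the inputs depend on a value the compiler cannot know at compile time. A minimal sketch (dot and run_with_opaque_input are our own names): if the arrays were filled with constants, an optimizing build may evaluate the whole dot product at compile time and leave nothing to inspect, whereas deriving them from argc forces the loop to be emitted.

```c
/* Toy dot product of the kind one might inspect on godbolt. */
static float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int k = 0; k < n; k++)
        sum += a[k] * b[k];
    return sum;
}

/* Fill the inputs from a run-time value (such as argc), so the compiler
   cannot fold dot() into a constant and has to emit the loop. */
static float run_with_opaque_input(int argc)
{
    float a[16], b[16];
    for (int k = 0; k < 16; k++) {
        a[k] = (float)(argc + k);
        b[k] = (float)(argc - k);
    }
    return dot(a, b, 16);
}
```

In a godbolt session this would be called from main, e.g. `return (int)run_with_opaque_input(argc);`, so that the result is also used and the call is not removed as dead code.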

I have also run into the sorts of things you describe, where two executables perform similarly on a newer CPU and then completely differently on an older CPU (or Intel vs. AMD). Rather than sink too much time and research into it, I just provide a few examples of the time-critical code and let the user figure out how to make it fast enough for their application.  :D

Reply #31
Quote
I just tried this, let's call it version 5: [...]
In MSVC, regardless of settings (AVX2 or not, fp:fast or fp:precise), it has similar speed to version 3 and similar quality to version 4, whereas version 3 has obviously lower quality and version 4 is obviously slower. However, with mingw64 and -Ofast (AVX2 or not), version 5 is slower than version 4, although they have similar quality.

Thanks for this! I will add your version #5 for MSVC users because it seems like a natural extension.

Also, since there’s a quality issue, I might even eliminate the first three versions that don’t go from the outside in (and so people won’t have to try so many).

Reply #32
Quote
This library implementation is sub-optimal: the lower-quality modes, with their aliasing artifacts, are not fast enough to compensate for the lower quality, and the high-quality mode is extremely slow.

I think I mention this pretty unambiguously in the “caveats” section of the README:

Quote
The resampling engine is a single C file, with another C file for the biquad filters. Don't expect the quality and performance of more advanced libraries, but also don't expect much difficulty integrating it. The simplicity and flexibility of this code might make it appealing for many applications, especially on limited-resource systems.

But thanks for pointing it out again as someone might have missed it; your observations are always welcome!