Topic: One updated and one new audio processing DSP library
Re: One updated and one new audio processing DSP library

Reply #25
Quote
Convolution version 4 without using any double. Files within the same group are identical (also true for version 1).
Quality-wise, version 4 is obviously cleaner and more consistent among different compiler settings.

Interesting that the canonical version with the "safest" compiler settings produces the worst result (if I'm reading your graphs correctly)!

I suspected that the 4th version might (in theory) produce the best results (because it orders the adds from the smallest to the largest values), but I was never able to measure it. What's more surprising is that (again, if I'm reading correctly) the more advanced SIMD options are producing progressively better results compared to the "correct" ones. I have no guess as to why that would be.

It seems though that it might be reasonable to switch to version 4 as the default.
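To illustrate why the order of the adds can matter in single precision, here is a minimal sketch. It is not code from the library: `sum_forward`, `sum_backward` and `fill_large_then_tiny` are hypothetical helpers, and the data (one large value followed by many tiny ones) is contrived to make the effect obvious.

```c
#include <assert.h>
#include <stddef.h>

/* Sum in array order, accumulating in single precision. */
static float sum_forward(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Sum in reverse array order, accumulating in single precision. */
static float sum_backward(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = n; i-- > 0; )
        sum += v[i];
    return sum;
}

/* Contrived data: one large value followed by n - 1 tiny ones.
   Each tiny value is below half an ulp of 1.0f. */
static void fill_large_then_tiny(float *v, size_t n)
{
    assert(n > 0);
    v[0] = 1.0f;
    for (size_t i = 1; i < n; i++)
        v[i] = 1e-8f;
}
```

With this data, `sum_forward` adds the large value first, so every later tiny addend is rounded away, while `sum_backward` lets the tiny values accumulate before they meet the large one. Note that options such as -Ofast allow the compiler to reorder floating-point adds, so the order written in the source is only guaranteed to be the order executed under the stricter settings.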

Re: One updated and one new audio processing DSP library

Reply #26
If someone fancies reading assembler, here's a diff view in godbolt.org showing version 1, -O3 vs -Ofast: https://godbolt.org/z/Gsb9eT1zb

You can change the version by changing the #define in line 50 of each editor and the compiler options by editing the textfield of each assembly view.

Re: One updated and one new audio processing DSP library

Reply #27
I just tried this, let's call it version 5:
Code: [Select]
static double apply_filter(float* A, float* B, int num_taps)
{
    int i = num_taps - 1;
    float sum = 0.0;

    /* Each pass adds two taps from the front and two from the back,
       so num_taps must be a multiple of 4. */
    do {
        sum += A[0] * B[0] + A[i] * B[i] + A[1] * B[1] + A[i - 1] * B[i - 1];
        A += 2; B += 2;
    } while ((i -= 4) > 0);

    return sum;
}
In MSVC, regardless of settings (AVX2 or not, fp:fast or fp:precise), its speed is similar to version 3 and its quality is similar to version 4, while version 3 has obviously lower quality and version 4 is obviously slower. However, with mingw64 and -Ofast (AVX2 or not), version 5 is slower than version 4, although their quality is similar.

So MSVC benefits from having 4 items in the same line, while mingw64 does the opposite.

MSVC v3
F:\Programming\art-msvc\x64\Release>art-msvc-v3 -q -y -4 -r76543 1.wav 2.wav
8367
F:\Programming\art-msvc\x64\Release>art-msvc-v3 -q -y -4 -r44100 2.wav art-msvc-v3.wav
4852

MSVC v4
F:\Programming\art-msvc\x64\Release>art-msvc-v4 -q -y -4 -r76543 1.wav 2.wav
10562
F:\Programming\art-msvc\x64\Release>art-msvc-v4 -q -y -4 -r44100 2.wav art-msvc-v4.wav
6080

MSVC v5
F:\Programming\art-msvc\x64\Release>art-msvc-v5 -q -y -4 -r76543 1.wav 2.wav
8418
F:\Programming\art-msvc\x64\Release>art-msvc-v5 -q -y -4 -r44100 2.wav art-msvc-v5.wav
4881

Magenta and white are v3 showing lower quality, the others are v4 and v5 which are similar. Notice the two v3 results have 0.00001% THD and 0.00003% IMD+N. The figures are low, but still worse than v4 and v5.

Re: One updated and one new audio processing DSP library

Reply #28
Tested on some old Ubuntu (desktop):
  • gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
  • Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

Compiled with -O3. Run:
Code: [Select]
./art -q -y -4 -r96000 INPUT OUTPUT
5 times:
Code: [Select]
for ((i=1;i<=5;i++)); do echo -n "$i "; { time ... ; } 2>&1 | grep "real" | cut -f2; done
and discarding the max and min. The input is 30 s of 44.1 kHz float audio.
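The "run several times, discard the fastest and slowest" step can be sketched in C as below. The post lists the surviving runs individually; averaging them into one figure is an extra step assumed here, and `trimmed_mean` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the timings after dropping one minimum and one maximum.
   Needs at least three samples so that something is left to average. */
static double trimmed_mean(const double *t, size_t n)
{
    assert(n >= 3);
    double sum = t[0], lo = t[0], hi = t[0];
    for (size_t i = 1; i < n; i++) {
        sum += t[i];
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return (sum - lo - hi) / (double)(n - 2);
}
```

Dropping the extremes keeps a single disturbed run (for example one hit by a background task) from skewing the comparison.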

Comparing this change:
Code: [Select]
-    sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps);
-    sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps);
+    sum1 = apply_filter (cxt->filters [i], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
+    sum2 = apply_filter (cxt->filters [i+1], source - cxt->numTaps / 2 + 1, cxt->numTaps/8*8);
There seems to be a significant time improvement with version 4:
Code: [Select]
                                            version
                          1             2             3             4
                     2 0m6.418s    2 0m3.355s    2 0m2.201s    2 0m3.774s
cxt->numTaps         4 0m6.458s    3 0m3.357s    3 0m2.200s    3 0m3.772s
                     5 0m6.424s    5 0m3.373s    5 0m2.205s    4 0m3.826s

                     2 0m6.423s    1 0m3.365s    2 0m2.199s    1 0m3.424s
cxt->numTaps/8*8     4 0m6.421s    3 0m3.363s    3 0m2.197s    2 0m3.388s
                     5 0m6.417s    5 0m3.379s    5 0m2.198s    3 0m3.399s
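For reference, `cxt->numTaps/8*8` in the change above relies on integer division truncating, which rounds the tap count down to a multiple of 8. A minimal sketch, with `round_down_to_multiple` as a hypothetical helper name:

```c
#include <assert.h>

/* Round n down to the nearest multiple of m using truncating
   integer division, as in numTaps / 8 * 8. */
static int round_down_to_multiple(int n, int m)
{
    assert(n >= 0 && m > 0);
    return n / m * m;
}
```

If the tap count is already a multiple of 8 (such as the default 256 taps mentioned later in the thread), the value passed to `apply_filter` is unchanged; presumably the difference is that the compiler can see the count is a multiple of 8. For any other count, the trailing taps would simply be skipped.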

But then I tested on a slightly newer Ubuntu (laptop):
  • gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
  • Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz

and there is no change:
Code: [Select]
     4
1 0m3.666s
2 0m3.683s
4 0m3.696s

1 0m3.683s
3 0m3.668s
5 0m3.665s

Re: One updated and one new audio processing DSP library

Reply #29
This library implementation is sub-optimal: the lower-quality modes, with their aliasing artifacts, are not fast enough to compensate for the lower quality, and the high-quality mode is extremely slow.

Re: One updated and one new audio processing DSP library

Reply #30
Quote
If someone fancies reading assembler, here's a diff view in godbolt.org showing version 1, -O3 vs -Ofast: https://godbolt.org/z/Gsb9eT1zb

You can change the version by changing the #define in line 50 of each editor and the compiler options by editing the textfield of each assembly view.

I have played around with this site before (it’s pretty amazing), but I see you’ve figured out a few tricks. Cool how using argc in the code prevents everything from getting optimized away! Unfortunately my assembler is rusty and I never went past MMX, but I can sort of see what’s going on.

I have also run into the sorts of things you describe where two executables will perform similarly on a newer CPU and then completely differently on an older CPU (or Intel vs. AMD). Rather than sink too much time and research into it, I just provide a few examples of the time-critical code and let the user figure out how to make it fast enough for their application.  :D

Re: One updated and one new audio processing DSP library

Reply #31
Quote
I just tried this, let's call it version 5:
Code: [Select]
static double apply_filter(float* A, float* B, int num_taps)
{
    int i = num_taps - 1;
    float sum = 0.0;

    /* Each pass adds two taps from the front and two from the back,
       so num_taps must be a multiple of 4. */
    do {
        sum += A[0] * B[0] + A[i] * B[i] + A[1] * B[1] + A[i - 1] * B[i - 1];
        A += 2; B += 2;
    } while ((i -= 4) > 0);

    return sum;
}
In MSVC, regardless of settings (AVX2 or not, fp:fast or fp:precise), its speed is similar to version 3 and its quality is similar to version 4, while version 3 has obviously lower quality and version 4 is obviously slower. However, with mingw64 and -Ofast (AVX2 or not), version 5 is slower than version 4, although their quality is similar.

Thanks for this! I will add your version #5 for MSVC users because it seems like a natural extension.

Also, since there’s a quality issue, I might even eliminate the first three versions that don’t go from the outside in (and so people won’t have to try so many).
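The thread does not show version 4 itself, so the following is only a generic sketch of what going "from the outside in" can look like: one index walks up from the first tap and another walks down from the last, so the pairs nearest the edges are accumulated first. The name `dot_outside_in` and the even-length restriction are assumptions made for this illustration.

```c
#include <assert.h>

/* Dot product that pairs the first and last elements, then moves
   inward. For a symmetric windowed filter the edge taps are the
   smallest, so small products tend to be accumulated first. */
static float dot_outside_in(const float *a, const float *b, int n)
{
    assert(n > 0 && n % 2 == 0);
    float sum = 0.0f;
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        sum += a[lo] * b[lo] + a[hi] * b[hi];
        lo++;
        hi--;
    }
    return sum;
}
```

Version 5 quoted above follows the same idea but handles two pairs per statement.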

Re: One updated and one new audio processing DSP library

Reply #32
Quote
This library implementation is sub-optimal: the lower-quality modes, with their aliasing artifacts, are not fast enough to compensate for the lower quality, and the high-quality mode is extremely slow.

I think I mention this pretty unambiguously in the “caveats” section of the README:

Quote
The resampling engine is a single C file, with another C file for the biquad filters. Don't expect the quality and performance of more advanced libraries, but also don't expect much difficulty integrating it. The simplicity and flexibility of this code might make it appealing for many applications, especially on limited-resource systems.

But thanks for pointing it out again as someone might have missed it; your observations are always welcome!

Re: One updated and one new audio processing DSP library

Reply #33
I have created version 0.3 of the resampler including the new convolution loop mentioned above as default (and eliminating the two with poorer quality) and fixing the MSVC issues, plus some other improvements.

The release is available here on GitHub (including a new Windows binary built with MinGW). This is the changelog:

  • new strong ATH noise-shaping for popular sampling rates (32, 44.1, 48, 88.2 and 96 kHz)
  • performance improvement through compiler optimizations and new convolution loop
  • move decimation (float to integer and back) into own module for use outside ART
  • make dither and noise-shaping fully configurable (including disabled)
  • fix MSVC compiles (no variable-length arrays, M_PI defined)
  • only write extensible WAV header if strictly required (ART)


@bennetng @danadam Thanks again for your input!

Re: One updated and one new audio processing DSP library

Reply #34
Thanks for providing a solution for the variable length array limitation. I also observed the M_PI issue but was able to fix it myself and therefore I did not report it.

Re: One updated and one new audio processing DSP library

Reply #35
I came up with an idea to further improve speed and quality, but don't know if it conflicts with the intended use case or not:
I put it in a real time ASRC used for synchronization in a wireless speaker (it includes the functionality to query the current phase used in the feedback loop).
I added a new function in art.c...
Code: [Select]
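static unsigned long gcd(unsigned long sr)
{
    unsigned long rs = resample_rate;
    /* Euclid's algorithm: when the loop ends, one of sr/rs is zero
       and the other is the greatest common divisor. */
    while (sr && rs)
    {
        if (sr > rs)
            sr %= rs;
        else
            rs %= sr;
    }
    /* Despite the name, this returns resample_rate divided by the GCD
       (at least 2), which is later used as num_filters. */
    return fmax(resample_rate / (sr | rs), 2);
}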
...and use it below:
Code: [Select]
static unsigned int process_audio (FILE *infile, FILE *outfile, unsigned long sample_rate,
    unsigned long num_samples, int num_channels, int inbits)
{
...
    //int flags = SUBSAMPLE_INTERPOLATE;
    int flags = 0;
    unsigned long factor = gcd(sample_rate);
    if (factor > num_filters)
        flags = SUBSAMPLE_INTERPOLATE;
    else
        num_filters = factor;
...
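One way to see why the output rate divided by the GCD is the right filter count: for a fixed rate pair, output sample n lands at input position n * src / dst, and its fractional part can only take a limited set of values before the pattern repeats. The brute-force sketch below counts them. It is an illustration only, not part of the proposed patch, and `count_distinct_phases` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the distinct fractional positions (in units of 1/dst_rate)
   that output samples take when resampling src_rate -> dst_rate.
   Returns 0 if the scratch buffer cannot be allocated. */
static unsigned long count_distinct_phases(unsigned long src_rate,
                                           unsigned long dst_rate)
{
    assert(src_rate > 0 && dst_rate > 0);
    unsigned char *seen = calloc(dst_rate, 1);
    unsigned long count = 0, phase = 0;
    if (!seen)
        return 0;
    for (unsigned long n = 0; n < dst_rate; n++) {
        if (!seen[phase]) {
            seen[phase] = 1;
            count++;
        }
        /* phase == (n * src_rate) % dst_rate, kept incrementally */
        phase = (phase + src_rate) % dst_rate;
    }
    free(seen);
    return count;
}
```

For 44.1k -> 48k this counts 160 phases and for 48k -> 44.1k it counts 147, matching the f= values in the benchmarks. With one filter per phase, every output sample falls exactly on a stored filter, so no interpolation between filters is needed.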

Benchmark results with 44.1k -> 48k -> 44.1k conversion
Current version:
Code: [Select]
H:\>old -4qyr48000 1.wav 48old.wav
723ms  f=1024
H:\>old -4qyr44100 48old.wav 48-44old.wav
673ms  f=1024
Current version with manual -f:
Code: [Select]
H:\>old -4qyr48000 -f160 1.wav 48oldf.wav
650ms  f=160
H:\>old -4qyr44100 -f147 48oldf.wav 48-44oldf.wav
608ms  f=147
New version with auto -f and auto interpolation:
Code: [Select]
H:\>new -4qyr48000 1.wav 48new.wav
330ms  f=160
H:\>new -4qyr44100 48new.wav 48-44new.wav
312ms  f=147

I still have another idea to use multi-stage filtering in order to reduce the required taps while maintaining reasonable steepness (I am totally fine with the default 256 taps in 44k <-> 48k conversions) when using high resampling ratios (e.g. >4x), but it is too difficult for me to write, especially in C.

Re: One updated and one new audio processing DSP library

Reply #36
Here is another example of 44.1k -> 37.8k -> 44.1k conversion. 37.8k is used on PlayStation 1.

Current version:
Code: [Select]
H:\>old -4qyr37800 1.wav 38old.wav
505ms  f=1024
H:\>old -4qyr44100 38old.wav 38-44old.wav
587ms  f=1024
New version:
Code: [Select]
H:\>new -4qyr37800 1.wav 38new.wav
245ms  f=6
H:\>new -4qyr44100 38new.wav 38-44new.wav
281ms  f=7
The sharp rise of distortion beyond 18kHz in the converted files is expected due to the Nyquist limit (18.9kHz).

Re: One updated and one new audio processing DSP library

Reply #37
Quote
I came up with an idea to further improve speed and quality, but don't know if it conflicts with the intended use case or not:
I put it in a real time ASRC used for synchronization in a wireless speaker (it includes the functionality to query the current phase used in the feedback loop).

This is very cool! Like you suggest, it does conflict with the intended use of the resampler (ASRC), but your change is only to the ART command-line program which has nothing to do with ASRC, so it’s appropriate.

I haven’t looked at it too deeply, but my first thought is how it would work if the sample rate ratio was not exactly representable in a double, and how often does that happen? Would the angle drift over time until it “skipped a cog”, as it were? I’d have to experiment with that (unless you’ve already thought about it).
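As a rough illustration of the concern (not a description of how ART actually tracks its position, which is not shown in this thread), the sketch below advances a position once as an accumulated double and once as an exact integer whole/phase pair. A step such as 44100/48000 = 147/160 has no exact binary representation, so the double version can pick up rounding error, while the integer version returns to phase 0 exactly every 160 output samples. All names are hypothetical.

```c
#include <assert.h>

/* Position in input samples, kept exactly: whole + phase / dst_rate. */
typedef struct {
    unsigned long whole;
    unsigned long phase;
} position_t;

/* Advance by `steps` output samples using integer arithmetic only. */
static position_t exact_position_after(unsigned long steps,
                                       unsigned long src_rate,
                                       unsigned long dst_rate)
{
    assert(src_rate > 0 && dst_rate > 0);
    position_t p = { 0, 0 };
    for (unsigned long n = 0; n < steps; n++) {
        p.phase += src_rate;
        p.whole += p.phase / dst_rate;
        p.phase %= dst_rate;
    }
    return p;
}

/* Advance by `steps` output samples by repeatedly adding a double step. */
static double double_position_after(unsigned long steps,
                                    unsigned long src_rate,
                                    unsigned long dst_rate)
{
    assert(src_rate > 0 && dst_rate > 0);
    double step = (double)src_rate / (double)dst_rate;
    double pos = 0.0;
    for (unsigned long n = 0; n < steps; n++)
        pos += step;
    return pos;
}
```

Comparing the two after a long run would show how far, if at all, the accumulated double has moved from the exact position for a given rate pair.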

I was also thinking about converting my audio-stretch library into floats and incorporating that into the ART workflow. I’m pretty busy with other stuff these days, but I’ll definitely look into this at some point. Thanks for trying it out and letting me know!

Re: One updated and one new audio processing DSP library

Reply #38
Which software was used to produce the graphs above? I want to test another resampler...

Re: One updated and one new audio processing DSP library

Reply #39
Quote
Which software was used to produce the graphs above? I want to test another resampler...

Looks like RightMark Audio Analyzer (RMAA).

Re: One updated and one new audio processing DSP library

Reply #40
https://audio.rightmark.org
Alexey Lukin was a developer of RMAA, but after he joined iZotope, RMAA has not been updated for many years, and several bugs/limitations remain:
[1] 88.2k and 176.4k tests are broken.
[2] The program does not support WaveFormatExtensible files.
[3] Input/output sample rates must be the same during analysis, so it is not possible to compare for example 48k and 44.1k directly, it has to be 48k -> another sample rate -> 48k.
[4] While 32-bit fixed point formats are supported, internally it uses 32-bit float. So it cannot reveal the true performance of 32-bit fixed point.
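Point [4] follows from the format itself: a 32-bit float has a 24-bit significand, so not every 32-bit integer sample value survives a conversion to float. A small sketch (the helper names are made up for this example):

```c
#include <assert.h>
#include <stdint.h>

/* 2^24: up to this magnitude every integer is exactly representable
   in an IEEE 754 single-precision float. */
#define FLOAT_EXACT_LIMIT 16777216

/* Returns 1 if x is unchanged by conversion to float, else 0.
   The comparison is done in double, which holds any int32_t exactly. */
static int survives_float(int32_t x)
{
    float f = (float)x;
    return (double)f == (double)x;
}

/* Expected behaviour at the boundary, written as assertions. */
static void survives_float_demo(void)
{
    assert(survives_float(FLOAT_EXACT_LIMIT));
    assert(!survives_float(FLOAT_EXACT_LIMIT + 1));
}
```

So a 32-bit fixed-point sample with more than 24 significant bits is rounded as soon as it is stored in a 32-bit float, before any analysis takes place.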

Alexey Lukin was also a test designer of the popular Infinite Wave SRC benchmark website:
https://src.infinitewave.ca/faq.html

Re: One updated and one new audio processing DSP library

Reply #41
Quote
I haven’t looked at it too deeply, but my first thought is how it would work if the sample rate ratio was not exactly representable in a double, and how often does that happen? Would the angle drift over time until it “skipped a cog”, as it were? I’d have to experiment with that (unless you’ve already thought about it).

In this case the code path will fall back to the status quo, with the same speed and resulting in identical files. Here are some benchmarks with a 1h10m16s CD image.
Code: [Select]
H:\>old -4qyr49999 -o32 1.wav old49999.wav
60827ms  f=1024
H:\>new -4qyr49999 -o32 1.wav new49999.wav
60743ms  f=1024
So applications requiring arbitrary/continuous shifting won't work.

It only skips interpolation when the required -f is not higher than specified.
Code: [Select]
H:\>old -4qyr50000 -o32 1.wav old50000.wav
62101ms  f=1024
H:\>new -4qyr50000 -o32 1.wav new50000.wav
31270ms  f=500
Of course I can use a lower -t to make things faster, but just wanted to do a longer test to obtain more data for analysis.
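The fallback behaviour described above can be summarised in a standalone sketch. It mirrors the logic of the earlier patch but is not the patch itself: `filter_plan_t`, `gcd_ul` and `plan_filters` are names invented here, and the 1024 limit in the usage below is taken from the f=1024 shown in the benchmarks.

```c
#include <assert.h>

/* Result of the decision: how many filters to build and whether
   interpolation between adjacent filters is still required. */
typedef struct {
    unsigned long num_filters;
    int interpolate;
} filter_plan_t;

/* Greatest common divisor by Euclid's algorithm. */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* If the exact number of phases fits within max_filters, use exactly
   that many filters and skip interpolation; otherwise keep max_filters
   and interpolate, which is the existing behaviour. */
static filter_plan_t plan_filters(unsigned long src_rate,
                                  unsigned long dst_rate,
                                  unsigned long max_filters)
{
    assert(src_rate > 0 && dst_rate > 0 && max_filters >= 2);
    filter_plan_t plan = { max_filters, 1 };
    unsigned long needed = dst_rate / gcd_ul(src_rate, dst_rate);
    if (needed < 2)
        needed = 2;
    if (needed <= max_filters) {
        plan.num_filters = needed;
        plan.interpolate = 0;
    }
    return plan;
}
```

For 44100 -> 50000 the GCD is 100, so 500 filters suffice and interpolation is skipped; for 44100 -> 49999 the rates share no common factor, 49999 filters would be needed, and the plan stays at 1024 with interpolation.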

Re: One updated and one new audio processing DSP library

Reply #42
Could the link below be useful for sinc-based resamplers?

https://www.dsprelated.com/showcode/236.php

From its Octave output, I see that it has much better performance than vanilla sinc().

Re: One updated and one new audio processing DSP library

Reply #43
Yeah, it sounds like that would allow much shorter filters to achieve the same response. Well, at least a response as good as a longer filter, even if it's not the same response.

Unfortunately the math is beyond me, but I'd be very interested to see the results in a resampler!

 

Re: One updated and one new audio processing DSP library

Reply #44
I think this is just multiplication by a Kaiser-Bessel window (which can be finely adjusted). Note that the sinc in the MATLAB script is not multiplied by any window, such as von Hann or Blackman, hence the huge difference.

Re: One updated and one new audio processing DSP library

Reply #45
Quote
I think this is just multiplication by a Kaiser-Bessel window (which can be finely adjusted). Note that the sinc in the MATLAB script is not multiplied by any window, such as von Hann or Blackman, hence the huge difference.

That really doesn't sound like a fair comparison, does it?  :)

And they do mention the "slow decrease rate of the sinc tails" without any mention of well-established windowing techniques.

Re: One updated and one new audio processing DSP library

Reply #46
I have taken my audio-stretch code and refactored it to work with 32-bit float audio and then integrated that into the ART resampler command-line tool. I added three long form options to invoke it:

  • --pitch=<cents>  modifies pitch as specified with range +/- 2400 (2 octaves)
  • --tempo=<ratio> modifies tempo (speed) with range 0.25x to 4.0x
  • --duration=<time>  set target time, either absolute or relative to source duration

Of course, the simple TDHS that I use can and does have very audible or annoying artifacts, especially when used with large stretch ratios or with highly polyphonic source material. However, it can also sound pretty good with more subtle changes. Note that it only works with stereo or mono files, and may not work at extreme sample rates or combination of settings.

I also fixed a bug where I put the clipping inside the noise-shaping feedback loop which caused some instability with the new strong ATH shaping curves.
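To make the clipping issue concrete, here is a first-order error-feedback quantizer with the clip placed outside the feedback path. This is only an assumed illustration of the general principle: the actual ART shaper uses higher-order ATH curves and its code is not shown in this thread, and every name below is hypothetical.

```c
#include <assert.h>

/* State for a first-order error-feedback noise shaper. */
typedef struct {
    double error;   /* quantization error of the previous sample */
} shaper_t;

/* Round to nearest integer without needing the math library. */
static long round_nearest(double x)
{
    return (long)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Subtract the previous error, quantize, record the new error, and
   only then clip the output. The stored error therefore contains
   rounding error only (at most 0.5) and never the clipping error. */
static long shape_quantize_clip(shaper_t *s, double sample,
                                long min, long max)
{
    assert(min <= max);
    double shaped = sample - s->error;
    long q = round_nearest(shaped);
    s->error = (double)q - shaped;   /* measured before clipping */
    if (q < min)
        q = min;
    if (q > max)
        q = max;
    return q;
}
```

If the error were instead measured after the clip, an over-range sample would feed a very large value back into the next sample, and with an aggressive shaping filter that kind of feedback can keep growing, which is consistent with the instability described above.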

Here's the 0.4 release with a Windows executable on GitHub.