Topic: One updated and one new audio processing DSP library

One updated and one new audio processing DSP library

I have made some significant improvements to the audio-stretch library and have created a sort-of complementary library for audio resampling. Both are available on my GitHub account and both include pre-built Windows executables intended for demo purposes (although they’re perfectly usable as command-line applications):

Audio-stretch library on GitHub

Audio-resampler library on GitHub

The audio-stretch library has been up for years, but someone recently suggested a couple of improvements and I also made some small performance enhancements and edge-case bug fixes. It implements time domain harmonic scaling (TDHS) to stretch (or squeeze) audio data in the time domain (i.e., it changes the duration of audio clips without changing their pitch). This is useful for slowing audio to make it more intelligible or speeding it up to more quickly digest audio books or podcasts.

Previously the stretch ratio was limited to just 0.5X - 2.0X, but by cascading two instances internally I doubled that range to 0.25X - 4.0X. The other new feature is the ability to detect silence gaps in the audio and use a different ratio for those portions. This is useful for further speeding up speech and keeping it intelligible.
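
The post doesn't say how the library divides a ratio between the two cascaded instances, so the following C sketch shows only one possible split. The helper name `split_stretch_ratio` and the clamp-then-remainder strategy are assumptions made for illustration, not the library's actual code: the first stage takes as much of the ratio as fits in 0.5 to 2.0, and the second stage handles what is left.

```c
/* Hypothetical helper (not part of audio-stretch): split an overall stretch
   ratio in [0.25, 4.0] into two per-stage ratios, each within [0.5, 2.0].
   Returns 1 on success, 0 if the overall ratio is out of range (or NaN). */
static int split_stretch_ratio(double ratio, double *stage1, double *stage2)
{
    if (!(ratio >= 0.25 && ratio <= 4.0))
        return 0;

    /* First stage: clamp the requested ratio into the single-stage range. */
    if (ratio > 2.0)
        *stage1 = 2.0;
    else if (ratio < 0.5)
        *stage1 = 0.5;
    else
        *stage1 = ratio;

    /* Second stage: the remainder, so that stage1 * stage2 gives the
       requested ratio (up to floating-point rounding). */
    *stage2 = ratio / *stage1;
    return 1;
}
```

With this split an overall ratio of 3.0 becomes 2.0 followed by 1.5, and any ratio already inside 0.5 to 2.0 leaves the second stage at 1.0.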

The audio resampling code is a new repository, but I actually wrote it over 15 years ago for the project described in this post where it's used to oversample audio 256X. I’ve used it in several places and made improvements over the years since and thought it might be useful to others. So I wrote a command-line demo for it with an option for including a biquad lowpass if desired.

It’s a relatively simple implementation intended for real time or embedded applications while not necessarily offering the best offline quality (there are several great examples of that around). It’s all one C module (except for the biquad) and should be trivial to build and integrate with applications, and it can easily be fine-tuned for the environment with respect to CPU and memory usage (and at the higher quality settings is quite respectable). Most recently I put it in a real time ASRC used for synchronization in a wireless speaker (it includes the functionality to query the current phase used in the feedback loop).

I say that these libraries are complementary because while they are certainly useful independently, together they can provide the tempo/pitch/playback controls that media players often have. And although it’s subjective, I believe that they offer somewhat higher quality than the other free implementations I’ve tried (especially for playing slower).
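
To illustrate how a time stretcher and a resampler can combine into independent tempo and pitch controls, here is a small C sketch of the arithmetic. The struct and function names are invented for this example, and it assumes the stretch ratio is a duration multiplier (2.0 means twice as long) and the resample ratio is output samples per input sample; the libraries' own parameter conventions may differ.

```c
/* Hypothetical plan for a tempo/pitch control built from two stages. */
typedef struct {
    double stretch_ratio;   /* duration multiplier for the time-stretch stage */
    double resample_ratio;  /* output samples per input sample for the resampler */
} PlaybackPlan;

/* tempo: playback speed multiplier (2.0 = finishes in half the time)
   pitch: frequency multiplier (2.0 = one octave up)
   Both must be positive. */
static PlaybackPlan plan_playback(double tempo, double pitch)
{
    PlaybackPlan plan;

    /* Resampling to 1/pitch as many samples, then playing them at the
       original rate, multiplies every frequency by pitch but also
       shortens the clip by the same factor. */
    plan.resample_ratio = 1.0 / pitch;

    /* The stretcher compensates for that shortening and applies the tempo
       change, so the final duration is original / tempo. */
    plan.stretch_ratio = pitch / tempo;

    return plan;
}
```

For example, raising the pitch an octave at normal tempo means stretching to twice the duration and then resampling to half the samples; doubling the tempo at the original pitch needs only a 0.5 stretch.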

Hope someone might find something useful here...   :)

Re: One updated and one new audio processing DSP library

Reply #1
Thank you very much for sharing these libraries.

I have been looking for something similar to Audio-Stretch for a long time, then had it on my todo list since I saw your post.
Today, I finally implemented a foobar2000 DSP based on it: https://www.foobar2000.org/components/view/foo_dsp_audiostretch
Microsoft Windows: We can't script here, this is bat country.

Re: One updated and one new audio processing DSP library

Reply #2
Thanks Peter, glad they were useful.

I tried the component and it sounds the same as when I run it offline (a good sign).


Re: One updated and one new audio processing DSP library

Reply #3
The resampler is now included in the Infinite Wave SRC Comparisons.

These are useful for investigating SRC implementations, although one obviously has to take the computational complexity into account when comparing SRCs for a given application (and this is not shown).

Re: One updated and one new audio processing DSP library

Reply #4
Weird thought:

How do you reckon vectorization would go? Thought of SSE4/NEON?

Re: One updated and one new audio processing DSP library

Reply #5
Weird thought:

How do you reckon vectorization would go? Thought of SSE4/NEON?
Not a weird thought at all. I assume that modern C compilers will vectorize the correlation loop (of which there are several versions to choose from).

I use -Ofast with gcc and choose the fastest version of the loop when I make a build, and I assume I'm getting vectorization but have never really looked any further than that.

Re: One updated and one new audio processing DSP library

Reply #6
Reason I asked is I reckon these are pretty cool, a resampler DSP for FB2K based on this new resampler would be nice to have.

Plus I think DEATH got the tempo DSP working on FB2K mobile (and I tested and it works great on Android), so maybe indeed clang and others are smart enough to properly vectorize those loops.

Re: One updated and one new audio processing DSP library

Reply #7
I use -Ofast with gcc
Be careful with that. GCC will insert code to change the CPU's floating-point behavior, and that might affect the program using your library in unexpected ways.

Re: One updated and one new audio processing DSP library

Reply #8
I use -Ofast with gcc
Be careful with that. GCC will insert code to change the CPU's floating-point behavior, and that might affect the program using your library in unexpected ways.
Ah, thanks, that's good to be aware of! So it doesn't restore the behavior on function exit? That's not very nice.

In any event, I'm not distributing a library. This is just a single C file with header, so it would normally be included and built within a larger application (whose developer is hopefully aware of these issues).
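
On the question of restoring the behavior: GCC's documentation for -ffast-math says that, when the option is used at link time, it may pull in libraries or startup files that change the default FPU control word. On x86 this typically shows up as denormal (subnormal) floats being flushed to zero for the whole process, set once at startup rather than per function. The C sketch below is one way a host application could check whether subnormals still survive arithmetic; the function name is made up for this example.

```c
#include <float.h>

/* Returns 1 if subnormal floats survive arithmetic in the current
   floating-point environment, 0 if they are flushed to zero. */
static int subnormals_supported(void)
{
    /* volatile keeps the compiler from folding the math at compile time */
    volatile float smallest_normal = FLT_MIN;

    /* Half of the smallest normal float is subnormal, unless the CPU has
       been switched into a flush-to-zero mode. */
    volatile float half = smallest_normal * 0.5f;

    return half > 0.0f;
}
```

Calling this early in an application built with and without -Ofast is a quick way to see whether the flag changed the environment on a given platform.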

 

Re: One updated and one new audio processing DSP library

Reply #9
Thanks @bryant , I tried ART for self-education with Visual Studio.
I know a bit of C# and JavaScript but I am very unfamiliar with C. I got error C2057 in art.c. I tried changing the language version settings (C11, C17, Legacy MSVC) but that didn't help. I did an ugly hack and changed...
Code:
    Biquad lowpass [num_channels] [2];
...
    float error [num_channels];
...to this so it only works with stereo.
Code:
    Biquad lowpass [2] [2];
...
    float error [2];
I suppose this should not change the quality of the resampler but I found something unusual. For example, I used RMAA to generate a 44.1k float signal, resample it to e.g. 48k and resample it to 44.1k again. The result of -4 is consistently worse than by using a lower tap count like -4 -t300. Is this an expected behavior?

Re: One updated and one new audio processing DSP library

Reply #10
I got error C2057 in art.c. I tried to change the language version settings (C11, C17, Legacy MSVC) but it is useless.
Visual Studio's C compiler (MSVC) doesn't support variable-length arrays: https://stackoverflow.com/questions/65505955/does-microsoft-c-compiler-allow-variable-sized-arrays

I suppose this should not change the quality of the resampler but I found something unusual. For example, I used RMAA to generate a 44.1k float signal, resample it to e.g. 48k and resample it to 44.1k again. The result of -4 is consistently worse than by using a lower tap count like -4 -t300. Is this an expected behavior?
The noise floor gets worse with more taps, yes. If I had to guess, I'd say that with more calculations there are more errors accumulating due to limited precision. But that's just a guess.

The transition band gets narrower though:

Re: One updated and one new audio processing DSP library

Reply #11
The transition band gets narrower though:
Yes. It would be a problem if the resampling ratio is high (e.g. >4x) as the passband will have a roll off earlier than 20kHz with low tap count.
BTW what did you use (on Linux?) to plot these graphs?


Re: One updated and one new audio processing DSP library

Reply #13
Thanks.

Re: One updated and one new audio processing DSP library

Reply #14
OK found a way to improve the quality. Change sum to double in resampler.c
Code:
#if 1   // Version 1 (canonical)
static double apply_filter (float *A, float *B, int num_taps)
{
    float sum = 0.0;
And yes, as the comment suggested, the unrolled versions are faster.
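
A minimal C sketch of why the accumulator type matters. These are simplified stand-ins, not the functions from resampler.c: both take float inputs and form each product in float, and they differ only in the precision of the running sum.

```c
#include <stddef.h>

/* Running sum kept in 32-bit float: every addition rounds to 24 bits. */
static float mac_float(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Same float products, but accumulated in double: the rounding error of
   each addition is far smaller, so much less noise builds up over a
   long filter. */
static double mac_double(const float *a, const float *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* float product, widened when added */
    return sum;
}
```

Feeding both versions the same long input (for instance 1024 taps) and comparing against a high-precision reference shows the float accumulator drifting further from it, which is consistent with the higher noise floor seen at larger tap counts.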

Re: One updated and one new audio processing DSP library

Reply #15
Haha, well I wrote something offline before reading the excellent response from @danadam , but I'll still post what I wrote.

As for the C language issue, sorry about that! I normally only use variable-length arrays when I’m developing because it’s so easy and intuitive and then “fix” it later so people won’t scream at me. Support for them actually is required in C99 (it only became optional in C11), but Microsoft is well known for ignoring parts of standards they don’t agree with.

You can just replace the “num_channels” with the maximum number of channels you’ll ever use. If you go beyond that accidentally you would probably get a crash (and certainly bad results) but using fewer channels is fine (it just wastes a little stack space).
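
The fixed-maximum workaround described above might look like the following C sketch. `MAX_CHANNELS`, the placeholder `Biquad` struct, and the function name are assumptions for illustration and do not match art.c; the point is the compile-time array bound plus an explicit range check, so exceeding the limit is rejected instead of crashing.

```c
#include <string.h>

#define MAX_CHANNELS 8              /* assumed upper bound; pick your own */

typedef struct { float state[2]; } Biquad;   /* placeholder, not the real struct */

/* Returns 0 on success, -1 if num_channels does not fit the fixed arrays. */
static int setup_channels(int num_channels)
{
    /* Compile-time sizes: accepted by MSVC, unlike [num_channels]. */
    Biquad lowpass[MAX_CHANNELS][2];
    float error[MAX_CHANNELS];

    if (num_channels < 1 || num_channels > MAX_CHANNELS)
        return -1;                  /* refuse rather than overrun the stack arrays */

    memset(lowpass, 0, sizeof lowpass);
    memset(error, 0, sizeof error);

    /* ... per-channel work would use only the first num_channels entries ... */
    return 0;
}
```

The unused tail of each array costs a little stack space, as noted above, but the code builds with compilers that lack variable-length array support.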

As for the increased noise with longer filters, I'm not really too surprised. Everything is done in 32-bit floats (including the filter coefficients) and each operation adds a little random rounding noise, and I think that’s where that noise floor comes from. As you know, the primary purpose of those longer filters is to improve the transition band slope, which continues to improve significantly beyond 300 taps. It looks like that might raise the noise floor, but that’s not something I look at too closely (other than to make sure it doesn’t show anything “funny”).

It’s also interesting that the Infinite Wave tests don’t show the same thing. This is a comparison of preset 4 with preset 3 (which has 256 taps) and the results are identical, if not slightly better at preset 4.

Thanks for pointing this out though. Next time I’m working on that I’ll see if anything pops out. I’m actually thinking of adding more noise-shaping options for decimation (including the ATH filters from @SebastianG described in his excellent Wiki article) and moving decimation into the library also so it's more easily used, so it might not be too long.

Re: One updated and one new audio processing DSP library

Reply #16
OK found a way to improve the quality. Change sum to double in resampler.c
Code:
#if 1   // Version 1 (canonical)
static double apply_filter (float *A, float *B, int num_taps)
{
    float sum = 0.0;
Yes, that makes sense. I use this library on Raspberry Pis and I'm concerned that using doubles for that math might slow things down too much, but I don't remember trying. I just assumed that sticking to 32-bit floats is appropriate for embedded applications, especially since we're talking noise levels far below any reasonable requirements. If it ends up being "free" though I would change that. Thanks!

Re: One updated and one new audio processing DSP library

Reply #17
Thanks. I also noticed the dither code contains a customized PRNG. Is it also optimized for embedded devices?

Re: One updated and one new audio processing DSP library

Reply #18
Thanks. I also noticed the dither code contains a customized PRNG. Is it also optimized for embedded devices?
Performance was a consideration there, but not the primary one. I just wanted something that would behave the same on various platforms without relying on an opaque library call with unknown (or known bad) characteristics like rand(). I instrumented that code when I wrote it making sure it gave me exactly what I expect, and now I know that it will behave the same forever.
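
For readers wondering what a self-contained, platform-independent generator can look like, here is a minimal C sketch. It is not the generator from the library: it uses the well-known xorshift32 recurrence and derives triangular (TPDF) dither as the difference of two uniform values, and all names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical dither generator state; must be seeded with a nonzero value. */
typedef struct { uint32_t state; } DitherRng;

/* xorshift32 step: only shifts and XORs on a fixed-width integer, so the
   sequence is bit-identical on every platform and compiler. */
static uint32_t rng_next(DitherRng *r)
{
    uint32_t x = r->state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->state = x;
    return x;
}

/* Uniform float in [0, 1) built from the top 24 bits (exact in a float). */
static float rng_uniform(DitherRng *r)
{
    return (float)(rng_next(r) >> 8) * (1.0f / 16777216.0f);
}

/* Triangular-PDF dither in (-1, 1): difference of two uniform values. */
static float tpdf_dither(DitherRng *r)
{
    float a = rng_uniform(r);
    float b = rng_uniform(r);
    return a - b;
}
```

Because nothing here depends on `rand()` or any other library state, two generators seeded with the same value produce the same dither sequence everywhere.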

And, BTW, I know that interpreting the decimal digits of pi as hexadecimal is nonsense. That’s kind of an XKCD-style Easter egg…  :)

Re: One updated and one new audio processing DSP library

Reply #19
I tried mingw64 14.2.0 on my i3-12100 Win10 PC. I built with the options listed below and benchmarked the executables; the times are elapsed milliseconds. The input is a 2m 13.5s stereo 32-bit float 44.1k .wav file.

-O3
H:\>art-O3 -q -y -4 -r76543 1.wav 2.wav
19307
H:\>art-O3 -q -y -4 -r44100 2.wav art-O3.wav
11159

-O3 -mavx2
H:\>art-avx2-O3 -q -y -4 -r76543 1.wav 2.wav
18870
H:\>art-avx2-O3 -q -y -4 -r44100 2.wav art-avx2-O3.wav
10919

-O3 -fno-signed-zeros -fno-trapping-math -fassociative-math
H:\>art-switch -q -y -4 -r76543 1.wav 2.wav
5588
H:\>art-switch -q -y -4 -r44100 2.wav art-switch.wav
3290

-Ofast
H:\>art-Ofast -q -y -4 -r76543 1.wav 2.wav
5845
H:\>art-Ofast -q -y -4 -r44100 2.wav art-Ofast.wav
3303

-Ofast -mavx2
H:\>art-avx2-Ofast -q -y -4 -r76543 1.wav 2.wav
2915
H:\>art-avx2-Ofast -q -y -4 -r44100 2.wav art-avx2-Ofast.wav
1837

-O3 -mavx2 -fno-signed-zeros -fno-trapping-math -fassociative-math
H:\>art-avx2-switch -q -y -4 -r76543 1.wav 2.wav
2998
H:\>art-avx2-switch -q -y -4 -r44100 2.wav art-avx2-switch.wav
1802

For this specific program and my environment, the minimum required switches to improve SIMD performance are -fno-signed-zeros -fno-trapping-math -fassociative-math. I suppose at least SSE2 is used when -mavx2 is not specified (Windows 10 requires SSE2).

-fno-signed-zeros -fno-trapping-math -fassociative-math are part of -funsafe-math-optimizations.
-funsafe-math-optimizations is part of -ffast-math.
-ffast-math is part of -Ofast.
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
https://simonbyrne.github.io/notes/fastmath/
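
A plausible reason -fassociative-math matters so much here: under strict IEEE rules the compiler may not reorder the additions in a single-accumulator sum, and vectorizing that sum requires exactly such a reordering. The C sketch below (function names are illustrative, not from resampler.c) contrasts the canonical loop with a version that keeps eight independent partial sums in the source. That makes the reordering explicit, so a compiler is allowed to vectorize it without any fast-math flags. Whether a given compiler actually does so, and whether it ends up faster, has to be measured.

```c
#include <stddef.h>

/* Canonical form: one accumulator, additions in a fixed order. Without
   -fassociative-math the compiler must preserve this order. */
static float dot_strict(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Eight independent partial sums: the regrouping is written by hand, so
   mapping the lanes onto SIMD registers does not change the result. */
static float dot_lanes(const float *a, const float *b, size_t n)
{
    float lane[8] = { 0 };
    size_t i = 0;

    for (; i + 8 <= n; i += 8)
        for (size_t k = 0; k < 8; k++)
            lane[k] += a[i + k] * b[i + k];

    float sum = 0.0f;
    for (size_t k = 0; k < 8; k++)   /* combine the partial sums */
        sum += lane[k];
    for (; i < n; i++)               /* leftover elements when n % 8 != 0 */
        sum += a[i] * b[i];
    return sum;
}
```

The two functions add the same products in a different order, so on real audio data their results can differ in the last bits, which is the same kind of difference the fast-math builds introduce.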

Re: One updated and one new audio processing DSP library

Reply #20
Fun and safe optimizations are the best optimizations ;););)

Re: One updated and one new audio processing DSP library

Reply #21
I tried mingw64 14.2.0 on my i3-12100 Win10 PC. I built with the options listed below and benchmarked the executables; the times are elapsed milliseconds. The input is a 2m 13.5s stereo 32-bit float 44.1k .wav file.

-O3
H:\>art-O3 -q -y -4 -r76543 1.wav 2.wav
19307
H:\>art-O3 -q -y -4 -r44100 2.wav art-O3.wav
11159

...snip...

-O3 -mavx2 -fno-signed-zeros -fno-trapping-math -fassociative-math
H:\>art-avx2-switch -q -y -4 -r76543 1.wav 2.wav
2998
H:\>art-avx2-switch -q -y -4 -r44100 2.wav art-avx2-switch.wav
1802
Thanks for testing these! I was certainly aware of -Ofast and I built the release binary with that because it makes a huge difference, but it looks like another 2x is still available with some compiler flags (at least on some CPUs). I always assumed that hand-written SIMD code would be required to be fully competitive speedwise, but perhaps not!

BTW, is this with your "double" change for precision? And which of the provided convolution implementations is this with? I assume it must matter, right, although if they're smart enough they could abstract all that away.  :D

Re: One updated and one new audio processing DSP library

Reply #22
BTW, is this with your "double" change for precision? And which of the provided convolution implementations is this with?
I specifically tried mingw64 because I hate the MSVC limitations. There is no change in the original source code (i.e. 32-bit with canonical convolution), because the comparison among different versions would be unfair. For example, in the 4x unrolled version:
Code:
sum += (A[0] * B[0]) + (A[1] * B[1]) + (A[2] * B[2]) + (A[3] * B[3]);
Everything is still in float before adding to sum, and if I cast A and B to double within such a loop things will slow down a lot, which would nullify any unrolling effort.

Re: One updated and one new audio processing DSP library

Reply #23
Convolution version 4 without using any double. Files within the same group are identical (also true for version 1).

Group 1
-O3
H:\audio-resampler-master>art-O3 -q -y -4 -r76543 1.wav 2.wav
10431
H:\audio-resampler-master>art-O3 -q -y -4 -r44100 2.wav art-O3.wav
6062
-O3 -mavx2
H:\audio-resampler-master>art-avx2-O3 -q -y -4 -r76543 1.wav 2.wav
9381
H:\audio-resampler-master>art-avx2-O3 -q -y -4 -r44100 2.wav art-avx2-O3.wav
5476

Group 2
-O3 -fno-signed-zeros -fno-trapping-math -fassociative-math
H:\audio-resampler-master>art-switch -q -y -4 -r76543 1.wav 2.wav
6122
H:\audio-resampler-master>art-switch -q -y -4 -r44100 2.wav art-switch.wav
3589
-Ofast
H:\audio-resampler-master>art-Ofast -q -y -4 -r76543 1.wav 2.wav
6176
H:\audio-resampler-master>art-Ofast -q -y -4 -r44100 2.wav art-Ofast.wav
3584

Group 3
-Ofast -mavx2
H:\audio-resampler-master>art-avx2-Ofast -q -y -4 -r76543 1.wav 2.wav
2978
H:\audio-resampler-master>art-avx2-Ofast -q -y -4 -r44100 2.wav art-avx2-Ofast.wav
1790
-O3 -mavx2 -fno-signed-zeros -fno-trapping-math -fassociative-math
H:\audio-resampler-master>art-avx2-switch -q -y -4 -r76543 1.wav 2.wav
3018
H:\audio-resampler-master>art-avx2-switch -q -y -4 -r44100 2.wav art-avx2-switch.wav
1777

Quality wise, version 4 is obviously cleaner and more consistent among different compiler settings.

Re: One updated and one new audio processing DSP library

Reply #24
BTW, is this with your "double" change for precision? And which of the provided convolution implementations is this with?
I specifically tried mingw64 because I hate the MSVC limitations. There is no change in the original source code (i.e. 32-bit with canonical convolution), because the comparison among different versions would be unfair. For example, in the 4x unrolled version:
Code:
sum += (A[0] * B[0]) + (A[1] * B[1]) + (A[2] * B[2]) + (A[3] * B[3]);
Everything is still in float before adding to sum, and if I cast A and B to double within such a loop things will slow down a lot, which would nullify any unrolling effort.
My thinking was that the unrolled versions will let the compiler know that there are (in the case you show) always a multiple of 4 samples available, which should make the SIMD easier. But maybe it just throws in extra code to check the counts and in the case of 1024 terms it makes no difference. I'm not sure what you mean by "unfair". The fastest version would be the best unless it generated bad results (which I think in this case would just be slightly more noise).