Topic: Using deep learning for noise suppression (Read 6072 times)

Using deep learning for noise suppression

Just thought this should be of interest to some people here. I just published a demo that shows how you can do better noise suppression using deep learning.

Re: Using deep learning for noise suppression

Reply #1
This is really interesting. I'm at work just now so can't listen to the samples but I will do as soon as I'm home.

I use iZotope RX 3 denoise almost daily, so that will be my main comparison point. One question that I have now is: why the focus on speed? Very few applications of denoising need to happen in real time. In fact, quality is paramount, as it's usually employed in archive or production applications.

OK, I've listened to some samples now. I'm very impressed by how good the results are. There are far fewer artefacts when using the new algorithm. I'd love to try this on a vinyl record recording.

Re: Using deep learning for noise suppression

Reply #2
Quote
I use iZotope RX 3 denoise almost daily, so that will be my main comparison point. One question that I have now is: why the focus on speed? Very few applications of denoising need to happen in real time. In fact, quality is paramount, as it's usually employed in archive or production applications.
The main application here is real-time voice communication, e.g. being able to hold a videoconference without getting an insane amount of noise. For that it needs to be able to run easily on a mobile phone.

Quote
OK, I've listened to some samples now. I'm very impressed by how good the results are. There are far fewer artefacts when using the new algorithm. I'd love to try this on a vinyl record recording.
I suspect it will sound terrible (but possibly entertaining!) when applied to vinyl because it's been trained exclusively on speech. It's probably not too hard to repurpose it to other applications (like vinyl recordings), but that would at least require training on different data. Also, for "offline" use, it's probably a good idea to add some amount of "look ahead" instead of trying to denoise frames one at a time.
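
To make the "look ahead" point concrete, here is a minimal Python sketch. It is a toy energy gate and not what RNNoise does; the function names, the threshold and the frame energies are all made up for illustration. The only idea it shows is that an offline tool may peek at future frames before deciding what to do with the current one, at the cost of added latency.

```python
def frame_energies(samples, frame_size):
    """Mean energy of each complete frame (trailing partial frame is ignored)."""
    count = len(samples) // frame_size
    return [
        sum(x * x for x in samples[i * frame_size:(i + 1) * frame_size]) / frame_size
        for i in range(count)
    ]


def gate_decisions(energies, threshold, lookahead=0):
    """Decide per frame whether to let audio through.

    lookahead=0 is the real-time case: only the current frame is inspected.
    lookahead=N also inspects the next N frames, so the gate can open
    before a loud frame arrives (hypothetical rule, for illustration only).
    """
    decisions = []
    for i in range(len(energies)):
        window = energies[i:i + lookahead + 1]
        decisions.append(max(window) > threshold)
    return decisions


# Made-up energies: silence, then two loud frames, then silence.
energies = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
causal = gate_decisions(energies, threshold=0.5, lookahead=0)
offline = gate_decisions(energies, threshold=0.5, lookahead=2)
```

With these made-up numbers the causal gate opens only on the loud frames, while the look-ahead gate opens two frames earlier. With 10 ms frames that would mean 20 ms of extra delay, which is fine offline but costs latency in a live call.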

Re: Using deep learning for noise suppression

Reply #3
With 'No suppression' and '0 dB babble noise/0 dB street noise' I can make out the speech clearly, but when I use RNNoise I struggle a bit to make it out. It is as if the starts and ends of the words are cut off.

Edit: Oops I didn't read further
Quote
As strange as it may sound, you should not be expecting an increase in intelligibility.

Re: Using deep learning for noise suppression

Reply #4
Quote
Just thought this should be of interest to some people here. I just published a demo that shows how you can do better noise suppression using deep learning.

Nice demo but...

The processing does not seem to be all that exceptional. It looks like a strong variable noise gate cascaded with a weaker dynamic equalizer.

Re: Using deep learning for noise suppression

Reply #5
Quote
The processing does not seem to be all that exceptional. It looks like a strong variable noise gate cascaded with a weaker dynamic equalizer.
You're describing the "classic DSP" part of the algo. The tricky part is controlling all of it to remove the noise and leave the speech. It's not as easy as it seems -- especially when the noise is actually speech (babble noise).
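
For readers who want to picture the "classic DSP" part, here is a rough Python sketch of a per-band gain stage. The gain rule is a textbook Wiener-style formula swapped in as a stand-in controller; it is an assumption for illustration, not the rule used in the demo, and `band_gains`, `apply_gains` and `floor` are hypothetical names. The hard part described above is producing a good noise estimate (or the gains directly) for every frame, which this sketch simply takes as given.

```python
def band_gains(noisy_energy, noise_energy, floor=0.05):
    """Per-band gains in [floor, 1] from noisy and estimated-noise energies.

    Wiener-style stand-in rule: gain = 1 - noise/noisy, i.e. the estimated
    share of the band energy that is not noise. A small floor avoids
    muting a band completely.
    """
    gains = []
    for noisy, noise in zip(noisy_energy, noise_energy):
        if noisy <= 0.0:
            gains.append(floor)
            continue
        gain = max(0.0, 1.0 - noise / noisy)
        gains.append(max(floor, gain))
    return gains


def apply_gains(band_amplitudes, gains):
    """Scale each band by its gain (the 'dynamic equalizer' step)."""
    return [amp * gain for amp, gain in zip(band_amplitudes, gains)]


# Made-up numbers: band 0 is mostly speech, band 1 is all noise, band 2 is empty.
gains = band_gains([4.0, 1.0, 0.0], [1.0, 1.0, 1.0])
cleaned = apply_gains([2.0, 2.0, 0.0], gains)
```

With babble noise, the noise energy in each band looks a lot like speech energy, so a simple estimator like this one has little to work with. That is the control problem the reply is pointing at.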

Re: Using deep learning for noise suppression

Reply #6
Does the deep learning process involve recognizing the main speaker's voice characteristics (e.g. formants), or detecting the language being used (speech recognition), so that the algorithm can tell the signal and noise apart?

I am thinking about software like Melodyne, which can separate mixed songs to a certain extent, depending on the source material.

Re: Using deep learning for noise suppression

Reply #7
Quote
Does the deep learning process involve recognizing the main speaker's voice characteristics (e.g. formants), or detecting the language being used (speech recognition), so that the algorithm can tell the signal and noise apart?
There are several reasons it cannot do that. First, this is meant to be real-time: each time we get 10 ms of audio we need to decide what to do about it. There's just not enough audio available (we'd need to look ahead at least a few seconds) to run speech recognition. Another reason is that one of the goals is to be small and fast. A good speech recognizer is neither small nor fast. Even if we could work around those problems and the difficulty of supporting *all* languages, there's also a fundamental danger in doing recognition followed by resynthesis. In some cases, the speech is just not intelligible -- due to noise, mumbling, or other factors. When that happens, RNNoise will just output something that's also not intelligible. OTOH, the system you're describing might *choose* a word and then make it intelligible. So the listener would very clearly understand the wrong word and not ask the speaker to repeat.
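
The "each time we get 10 ms of audio" constraint can be sketched as a small streaming loop in Python. This is only an illustration of the frame-at-a-time structure: the 48 kHz sample rate is an assumption (the post only gives the 10 ms figure), and `FrameStreamer`, `push` and `process` are hypothetical names, with `process` standing in for whatever denoiser handles one frame.

```python
class FrameStreamer:
    """Buffers incoming audio and processes it one fixed-size frame at a time."""

    def __init__(self, sample_rate=48000, frame_ms=10, process=None):
        # 10 ms at the assumed 48 kHz rate is 480 samples per frame.
        self.frame_size = sample_rate * frame_ms // 1000
        # Placeholder denoiser: passes the frame through unchanged.
        self.process = process or (lambda frame: frame)
        self._pending = []

    def push(self, chunk):
        """Accept a chunk of samples; return output for every completed frame.

        No future audio is consulted: a frame is processed as soon as it
        is complete, which is what keeps the delay bounded.
        """
        self._pending.extend(chunk)
        output = []
        while len(self._pending) >= self.frame_size:
            frame = self._pending[:self.frame_size]
            del self._pending[:self.frame_size]
            output.extend(self.process(frame))
        return output


streamer = FrameStreamer()
first = streamer.push([0.0] * 300)   # not enough for a frame yet
second = streamer.push([0.0] * 300)  # completes one 480-sample frame
```

A recognizer that needs a few seconds of context would have to hold back hundreds of such frames before emitting anything, which is why it does not fit this loop.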

Re: Using deep learning for noise suppression

Reply #8
Looks like artificial intelligence (AI) is the next big thing. Everyone in IT is talking about it now.
I wonder whether AI can be applied to audio codecs (like Opus) in general (and not just to this particular case of noise suppression) to make smart decisions and increase audio quality.
It would require computing resources, but today 8-16 core systems are already pretty affordable (and 32-core CPUs will be affordable as well in the next 2-3 years).

It looks very promising.  :)

Re: Using deep learning for noise suppression

Reply #9
I have been looking into using AI/neural techniques for noise reduction, etc. I do believe that these methods can be useful, but the traditional methods aren't actually being used completely enough. We'll probably find that for NR, transients and other needs we can get an incremental improvement... However, again, there is a huge area to be mined using traditional math. I have been doing lots of 'paper/book' research looking for various audio improvement techniques, and it seems to me that there STILL is a reasonable area to be mined EVEN JUST IN THE TIME DOMAIN. Also, there are some transform domain techniques that should be helpful, but I have seen little being done in the more esoteric areas (e.g. trying to take advantage of Complex DCT-like techniques for the filtering (NR/expansion) domain instead of Fourier). (Yeah -- I know that there are some analogs to some of the transform methods being used -- but I still think that we can benefit from some methods more efficient than most AI techniques.)

As has been implied, AI techniques can take lots more CPU than traditional techniques, often for just an incremental improvement. I'd certainly like to see more research in that area ALONG with more traditional techniques. I do plan to look into using Complex DCT-like techniques for improved NR. Also, another potentially useful field would be fractional transforms... I wonder if transient recovery might be better done with fractional transforms, where there is a different time resolution/frequency resolution tradeoff, instead of standard transforms.
I am just blabbing -- and I AM NOT DISMISSING AI TECHNIQUES!! We have a long way to go yet with traditional methods (IMO).
Maybe we are at the same point in the technology timeline as when people were worrying a lot about Moore's law in the year 2000. Yes -- there was a concern, but we still had 15 years of room for growth. Likewise, I think that there are some number of years of progress that we should try to make using non-AI methods (while still looking to utilize AI methods).

John Dyson

Re: Using deep learning for noise suppression

Reply #10
Wasn't lossy compression artificial intelligence, way back then with MP3? Chopping off the frequencies  that most people can't hear anyway is not amazingly clever, but working out what sounds are masked by other sounds and omitting the masked sounds has always seemed to me to be sheer brilliance, let alone intelligence. And it doesn't take much computing power.

It may not be a directly relevant example, but I'm saying that AI has been around in audio since long before anyone called it "AI."
The most important audio cables are the ones in the brain


 

Re: Using deep learning for noise suppression

Reply #12
In a very, very approximate way -- mp3 is a way of 'resinging' music. On playback, it reconstructs the levels of 'signal' in each band, and plays it out. It isn't really 100% correct to say this, but mp3 is (in a way, and I'll probably get in trouble for writing this) a very sophisticated vocoder. What actually gets compressed are the coefficients for the levels in each transform band that represent the real analog signal. There are some tricks to improve the compression (like changing windows, etc.), but it is in no way AI. When one uses too much compression, the mp3 playback produces spurious signals because the decoder isn't getting an accurate enough representation of the original signal. Differences between the various transform-type compression schemes are things like the way that the coefficients are compressed, the kind of transform being used, the kind of windowing being used and the data rates.
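
A toy Python illustration of the "too much compression" point, under heavy assumptions: it uses plain uniform quantization of a handful of made-up coefficients, whereas real MP3 encoding involves a specific transform, non-uniform quantization and a psychoacoustic model that are not shown here. All names and numbers are hypothetical. It only shows that a coarser step (fewer bits) leaves the decoder with a less accurate picture of the band levels.

```python
def quantize(coeffs, step):
    """Encoder side: store each coefficient as an integer multiple of step."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Decoder side: rebuild approximate coefficients from the integers."""
    return [i * step for i in indices]


def max_error(coeffs, step):
    """Largest difference between original and decoded coefficients."""
    decoded = dequantize(quantize(coeffs, step), step)
    return max(abs(a - b) for a, b in zip(coeffs, decoded))


# Made-up band coefficients.
coeffs = [0.9, -0.4, 0.1, 0.05]
fine_error = max_error(coeffs, step=0.1)    # many levels, more bits
coarse_error = max_error(coeffs, step=1.0)  # few levels, fewer bits
coarse_indices = quantize(coeffs, step=1.0)
```

With the coarse step, three of the four bands decode to zero and the first is pushed up to 1.0, so the decoder plays back levels that were never in the original. That mismatch is the source of the spurious signals mentioned above.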
The art in almost any of the fancy compression schemes is in the compression -- usually the decoding is fairly well defined. Video compression is even more 'interesting' because of all of the kinds of redundancy that are removed... (Please don't make fun of my comments herein -- I was only trying to explain something in totally non-technical terms without being 100% wrong!!!)

John