Topic: Algorithm used in vocal removers? (Read 4607 times)

Algorithm used in vocal removers?

I know which programs or plugins you can use for vocal removal; what I want to know is how they work their magic.

Actually, what I have in mind is the inverse of vocal removal, that is, taking away everything except the voice in the middle. I may be undertaking an undergraduate final-year project that aims to recreate in machines the human ability of auditory attention: being able to pick up and follow a discussion even against a background full of noise and other conversations that you've decided you don't want to hear.

What I have in mind is to phase-shift and amplify the L/R audio channels for each speaker so that each speaker is effectively speaking from centre stage on its own channel, then apply a 'vocal retainment' algorithm to remove everything except the voice in the 'centre'.
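A rough sketch of the "phase-shift and amplify" step, under loud assumptions: each speaker reaches the right channel as a delayed, scaled copy of what reaches the left (no reverberation, no other sources overlapping), and the delay is a whole number of samples. A pure time delay is a phase shift that grows linearly with frequency, so estimating that delay and gain, undoing them on one channel, and averaging the two channels puts that speaker in the "centre". The function names (`estimate_delay_and_gain`, `align_right`, `mid_channel`) and the test signal are hypothetical; the delay search is a brute-force cross-correlation in plain Python.

```python
import math
import random


def estimate_delay_and_gain(left, right, max_lag):
    """Brute-force cross-correlation: find the lag (in samples) at which
    right[t + lag] best lines up with left[t], then the gain that matches
    the right channel's level to the left channel's over that overlap."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(len(left), len(right) - lag)
        score = sum(left[t] * right[t + lag] for t in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    lo, hi = max(0, -best_lag), min(len(left), len(right) - best_lag)
    e_left = sum(left[t] ** 2 for t in range(lo, hi))
    e_right = sum(right[t + best_lag] ** 2 for t in range(lo, hi))
    gain = math.sqrt(e_left / e_right) if e_right > 0 else 1.0
    return best_lag, gain


def align_right(right, lag, gain):
    """Shift the right channel by `lag` samples and scale it by `gain`
    (zero-padding where the shifted signal runs off the end)."""
    n = len(right)
    return [gain * right[t + lag] if 0 <= t + lag < n else 0.0 for t in range(n)]


def mid_channel(left, right):
    """Average of the two channels: the 'centre' a voice-keeping step would work on."""
    return [(l + r) / 2 for l, r in zip(left, right)]


# Hypothetical test: one speaker reaches the right mic 7 samples later at half level.
rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(400)]
left = speaker
right = [0.0] * 7 + [0.5 * s for s in speaker[:-7]]

lag, gain = estimate_delay_and_gain(left, right, max_lag=20)
centred = mid_channel(left, align_right(right, lag, gain))
print(lag, round(gain, 3))
```

With several speakers you would repeat the estimate per speaker (for example per time segment), producing one centred copy of the mix for each; real recordings will not align this cleanly.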

Reply #1
I would recommend using an FFT and then applying a bandpass over the vocal band. You could also record the background noise beforehand and subtract it from the signal to make the bandpassing work better.
In a few words: Get Background -> FFT -> Original.sub(FFT) -> Vocal.
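One reading of "Get Background -> FFT -> Original.sub(FFT)" is magnitude spectral subtraction: take the spectrum of a frame of the recording and of a frame of background-only noise, subtract the noise magnitudes bin by bin (never going below zero), keep the recording's phase, and transform back. The sketch below does that for a single frame and then zeroes every bin outside an assumed 300 to 3400 Hz "vocal band" (the telephone speech band; the reply gives no numbers). It uses a naive DFT so it runs with only the standard library; a practical version would process overlapping windowed frames with an FFT library such as `numpy.fft`. The function names and test signals are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT; fine for a demo frame, use an FFT library in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part since the input signal was real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_frame, rate, band=(300.0, 3400.0)):
    """Subtract the background's magnitude spectrum from the frame's (keeping
    the frame's phase), then zero every bin outside `band` (a crude bandpass)."""
    mixed, noise = dft(frame), dft(noise_frame)
    n = len(frame)
    cleaned = []
    for k in range(n):
        freq = min(k, n - k) * rate / n  # bins above n/2 mirror negative frequencies
        magnitude = max(abs(mixed[k]) - abs(noise[k]), 0.0)
        if not band[0] <= freq <= band[1]:
            magnitude = 0.0
        cleaned.append(cmath.rect(magnitude, cmath.phase(mixed[k])))
    return idft(cleaned)


# Hypothetical one-frame test: 64 samples at 8 kHz, so each bin is 125 Hz wide.
n, rate = 64, 8000
voice = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]        # 1000 Hz
hiss = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # 2500 Hz, in band
hum = [0.3 * math.sin(2 * math.pi * 1 * t / n) for t in range(n)]    # 125 Hz, out of band
frame = [v + h + m for v, h, m in zip(voice, hiss, hum)]
background = [0.5 * math.cos(2 * math.pi * 20 * t / n) for t in range(n)]  # recorded separately

cleaned = spectral_subtract(frame, background, rate)
print(max(abs(c - v) for c, v in zip(cleaned, voice)))
```

Because only magnitudes are subtracted, the background recording does not need to be phase-aligned with the mix (the test uses a cosine where the mix has a sine). With real, changing noise the subtraction is only approximate and tends to leave "musical noise" artefacts.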
If you can record the noise really well, you could also use an IIR filter algorithm. (Theory: take the original signal = vocal and add noise to it = music. Now do exactly the same thing inverted, trying to interpolate the noise signal with a Taylor interpolation.)
But for a quick and easy effect I would just bandpass the vocal band. Unfortunately, the vocal will be distorted this way; for better and more "exact" results, you should use the IIR approach.
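For the "quick and easy" option, a single second-order IIR bandpass is about the simplest vocal-band filter. This sketch uses the "BPF (constant 0 dB peak gain)" biquad from Robert Bristow-Johnson's Audio EQ Cookbook, centred on the geometric mean of an assumed 300 to 3400 Hz band; the band edges, function names and test tones are assumptions for illustration.

```python
import math


def vocal_bandpass(samples, rate, low=300.0, high=3400.0):
    """Second-order (biquad) IIR bandpass, 'constant 0 dB peak gain' form from
    the RBJ Audio EQ Cookbook, centred on the geometric mean of low and high."""
    f0 = math.sqrt(low * high)
    q = f0 / (high - low)  # approximate Q for the given bandwidth
    w0 = 2 * math.pi * f0 / rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0  # b1 is zero for this filter
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2  # direct form I
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def tone(freq, rate, seconds=1.0):
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(int(rate * seconds))]


def steady_peak(signal):
    """Peak level over the second half, after the filter's start-up transient."""
    return max(abs(s) for s in signal[len(signal) // 2:])


rate = 16000
speech_gain = steady_peak(vocal_bandpass(tone(1000, rate), rate))
hum_gain = steady_peak(vocal_bandpass(tone(50, rate), rate))
print(round(speech_gain, 2), round(hum_gain, 2))
```

A second-order bandpass rolls off gently, so out-of-band sound such as a 50 Hz hum is reduced rather than removed, and the voice's own energy outside the band is lost too, which is part of the distortion the reply warns about.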

Reply #2
"Voice Removal" in Cool Edit or the Winamp DSP removes all sound that is identical in both channels. Inverting one channel (a 180-degree phase shift) and mixing it with the other accomplishes that.

I'm also interested in "Keep Voice Only" because I've tried many times to do it in Cool Edit with no success...
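Reply #2's description of voice removal (invert one channel, mix it with the other) is a one-line operation per sample. This sketch returns the stereo pair (L-R, R-L) that the thread describes in Reply #5; the function name and the toy samples are hypothetical, and the exact processing inside Cool Edit or the Winamp DSP may differ.

```python
def remove_centre(left, right):
    """Invert one channel and mix it with the other: anything identical in
    both channels (a centre-panned vocal) cancels. Returns the stereo pair
    (L-R, R-L); either channel alone is the mono 'voiceless' mix."""
    out_left = [l - r for l, r in zip(left, right)]
    out_right = [r - l for l, r in zip(left, right)]
    return out_left, out_right


# Hypothetical mix: vocal panned dead centre, guitar only in the left channel.
vocal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
guitar = [0.2, 0.2, -0.1, 0.0, 0.3, 0.1]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

no_vocal_left, no_vocal_right = remove_centre(left, right)
print(no_vocal_left)
```

Here `no_vocal_left` is the guitar alone, since the centred vocal cancels. Adding the two output channels together gives silence, which is the result Reply #5 ran into when downmixing to mono.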

Reply #3
Voice removal inverts one channel and mixes it with the other.
To keep only the voice, you should subtract the voiceless version (mono) from the original version converted to mono. Doesn't that work?

Reply #4
So simple??? This might be easier than I thought...

What about what pop3smtp23 said? I suppose that's the kind of stuff you have to do if you only have one channel to work with?

Reply #5
Quote
To keep voice only, you should subtract the voiceless version (mono) from the original version converted to mono. Doesn't it work?


How the heck do you downmix the voiceless version into mono??

The left channel is L-R
The right channel is R-L
mono would be L-R+R-L = nothing
experimentally proven

I guess it's not so simple after all...

Reply #6
In an editing program:
m/s matrix(ch1=(r+l), ch2=(r-l))
remove side channel
m/s matrix(ch1=(r+l)+0, ch2=(r+l)-0)
You get only center-stage sounds.
Is this correct??
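Reply #6's matrix can be written out directly. This sketch scales mid and side by 1/2 so that decoding returns the original channels exactly (Reply #6 omits the scaling, which only changes the level), and it uses hypothetical function names and toy samples. Setting the side channel to zero and decoding puts the mid signal (L+R)/2 in both channels. That keeps a centre-panned vocal at full level, but a source panned hard to one side also survives at half amplitude, so the result is a mono fold-down rather than centre-only audio.

```python
def ms_encode(left, right):
    """Mid = (L+R)/2, side = (R-L)/2: Reply #6's matrix, scaled by 1/2 so that
    decoding gives back the original channels exactly."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(r - l) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse matrix: L = mid - side, R = mid + side."""
    left = [m - s for m, s in zip(mid, side)]
    right = [m + s for m, s in zip(mid, side)]
    return left, right


# Hypothetical mix: vocal in the centre, guitar hard left.
vocal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
guitar = [0.2, 0.2, -0.1, 0.0, 0.3, 0.1]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

mid, side = ms_encode(left, right)
new_left, new_right = ms_decode(mid, [0.0] * len(side))  # "remove side channel"
leftover_guitar = [m - v for m, v in zip(new_left, vocal)]
print(leftover_guitar)
```

In the test mix, `leftover_guitar` comes out as half the guitar signal.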

Reply #7
Quote
Originally posted by Joe Bloggs


How the heck do you downmix the voiceless version into mono??

The left channel is L-R
The right channel is R-L
mono would be L-R+R-L = nothing
experimentally proven

I guess it's not so simple after all...


Well, then it's just phase-inverted mono: invert one channel to get pure mono.
In practice, just keep one of the two channels to work with; they are identical except that the sign is inverted.

Reply #8
Ok, suppose you just take one channel
L-R or R-L

The original downmixed to mono would be L+R
Result
L+R-(L-R) = 2R or L+R-(R-L) = 2L

Don't see how that would do anything...

Reply #9
No, because the mono is L/2+R/2. But it still doesn't work: I realized that in the version without voice, L and R are mixed with opposite signs. If one is cancelled, the other can't be.

Well, then just use a Dolby Surround decoder and keep the center channel!
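The subtraction idea from Reply #3, and the algebra in Replies #8 and #9, can be checked numerically. The sketch builds a hypothetical mix with a centred vocal, a guitar only in the left channel and a bass only in the right, takes the voiceless version as L-R, and subtracts it from the mono mix using both the L+R and the (L+R)/2 conventions. The function name and samples are illustrative only.

```python
def subtraction_checks(left, right):
    """The two 'keep voice' subtractions discussed in Replies #3, #8 and #9."""
    voiceless = [l - r for l, r in zip(left, right)]
    reply8 = [(l + r) - v for l, r, v in zip(left, right, voiceless)]      # (L+R) - (L-R) = 2R
    reply9 = [(l + r) / 2 - v for l, r, v in zip(left, right, voiceless)]  # (L+R)/2 - (L-R) = 3R/2 - L/2
    return reply8, reply9


# Hypothetical mix: vocal centre, guitar hard left, bass hard right.
vocal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
guitar = [0.2, 0.2, -0.1, 0.0, 0.3, 0.1]
bass = [0.4, -0.4, 0.4, -0.4, 0.4, -0.4]
left = [v + g for v, g in zip(vocal, guitar)]
right = [v + b for v, b in zip(vocal, bass)]

reply8, reply9 = subtraction_checks(left, right)
print(reply8)
print(reply9)
```

`reply8` is just the right channel doubled (vocal plus bass), and `reply9` equals the vocal plus 1.5 times the bass minus half the guitar, so neither isolates the voice; as Reply #9 says, whichever side cancels, the other comes back.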