Algorithm to use in a vocal RETAINER??



2002-08-19 14:37:23

OK, after some experimentation it is obvious to me that throwing away everything except the voice in the centre is much more difficult than throwing out the voice in the centre.

It was simple, really (why things don't work, that is)
To remove vocals, you have L-R in the left channel and R-L in the right channel (where -L and -R are used to denote inverted waveforms of the same channel)

If you subtract that from the original recording, you get
left channel L-(L-R) = R
right channel R-(R-L) = L

which is just the original recording with the channels swapped

If you try downmixing the vocal-removed track to mono, you get
L-R+R-L = nothing!
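A quick way to check the two results above is to run the arithmetic on a few samples. This is only a sketch: the sample values are made up, and the variable names are mine, not from any editor or library.

```python
# Sketch: why "original minus vocal-removed" does not isolate the vocal.
# The sample values are arbitrary and only serve as an illustration.
left = [0.5, -0.2, 0.8, 0.1]
right = [0.3, 0.4, -0.6, 0.1]

# Vocal removal as described: L-R in the left channel, R-L in the right.
removed_left = [l - r for l, r in zip(left, right)]
removed_right = [r - l for l, r in zip(left, right)]

# Subtract the vocal-removed track from the original recording.
diff_left = [l - x for l, x in zip(left, removed_left)]      # L-(L-R) = R
diff_right = [r - x for r, x in zip(right, removed_right)]   # R-(R-L) = L

# Downmix the vocal-removed track to mono.
mono = [a + b for a, b in zip(removed_left, removed_right)]  # (L-R)+(R-L) = 0

print(diff_left, diff_right, mono)
```

Up to rounding, diff_left matches the original right channel, diff_right matches the original left channel, and mono is all zeros, so neither trick leaves the centre voice behind.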

It seems that pop3smtp23 saw through the problem beforehand and started talking about frequency domain analysis, just as the professor did...

I guess I'm just being impatient--I would meet the professor tomorrow and I guess he would tell me all about how he intends to go about doing it--but I would like to hear some ideas from people here, if it's not too off-topic...

Of course, if you are trying to pick one voice out of many voices, you can't use the 'record the noise' trick...


Reply #1 – 2002-08-19 14:57:27

Personally, I think picking one voice out of many without knowing its characteristics will be a very hard project, and it won't work without some additional AI that takes the spoken language into account. Example: if you are hearing a foreign language, it's hard to tell which person is talking; even in your mother tongue it gets really hard to tell how many people are talking when a lot of people are in a discussion. Your brain tries to improve the result by paying attention to voice characteristics. As a hint, you could use some voice characteristics, as each person normally speaks in a particular frequency band. So you could build a histogram to get a first start, take the maxima, and then try bandpassing. This should work for "a few people talking". Just my few thoughts while coding some computer graphics...
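The histogram idea could be sketched roughly like this. The post does not say what the histogram is built from, so this assumes it counts the strongest frequency bin of each frame; the two "speakers" are pure tones standing in for real voices, and dominant_bin and band_histogram are hypothetical helper names. The bandpass step itself is left out.

```python
import cmath
import math
from collections import Counter

def dominant_bin(frame):
    """Index of the strongest DFT bin below Nyquist (DC excluded)."""
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin

def band_histogram(signal, frame_len):
    """Count how often each bin is the strongest one, frame by frame."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return Counter(dominant_bin(signal[i:i + frame_len]) for i in starts)

# Assumption: two synthetic "speakers", each a pure tone in its own band.
frame_len = 32
speaker_a = [math.sin(2 * math.pi * 3 * t / frame_len) for t in range(frame_len)]
speaker_b = [math.sin(2 * math.pi * 7 * t / frame_len) for t in range(frame_len)]
signal = speaker_a * 3 + speaker_b * 2   # three frames of A, then two of B

hist = band_histogram(signal, frame_len)
peaks = [b for b, _ in hist.most_common(2)]   # candidate bands to bandpass around
print(hist, peaks)
```

Real speech spreads over many harmonics and the bands of different talkers overlap, so the maxima would only be a starting point for choosing bandpass ranges.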


Reply #2 – 2002-08-19 14:58:58

You are doing (voice removal):
m/s: ch1=l+r, ch2=l-r
remove mid(ch1)
m/s: ch1=0+(l-r), ch2=0-(l-r)
get: ch1=l-r, ch2=r-l

If you go the other way (remove side):
m/s: ch1=l+r+0, ch2=l+r-0
get: ch1=l+r, ch2=l+r

edit: I'm confused. Need to sleep.
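For anyone who wants to try the mid/side steps above on numbers, here is a minimal sketch. The factor of 1/2 in the decode step is my assumption so that encode followed by decode returns the original channels; the post leaves scaling out. The voice and guitar samples are made up.

```python
def to_mid_side(left, right):
    """Encode: mid = l+r, side = l-r (as in the post)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Decode: l = (m+s)/2, r = (m-s)/2 (the 1/2 scaling is an assumption)."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right

# Made-up test signal: a voice panned dead centre, a guitar in the left channel only.
voice = [0.5, -0.25, 0.75]
guitar = [0.25, 0.5, -0.125]
left = [v + g for v, g in zip(voice, guitar)]
right = list(voice)

mid, side = to_mid_side(left, right)
silence = [0.0] * len(mid)

# Remove mid (voice removal): the voice cancels, the guitar is left in antiphase.
no_mid_left, no_mid_right = from_mid_side(silence, side)

# Remove side: both channels become the same mono downmix.
no_side_left, no_side_right = from_mid_side(mid, silence)

print(no_mid_left, no_mid_right)
print(no_side_left, no_side_right)
```

With these numbers, removing the side leaves voice plus half the guitar in both channels, so the off-centre instrument is only attenuated, not removed.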


Reply #3 – 2002-08-19 15:16:15

Hm, would it help any if you are directly facing the person you want to listen to while the other speakers are off to the side? My idea, as I said, is to phase-shift and amplify the L/R channels so that the person you want to listen to SOUNDS like he's facing you even if he isn't, then do some processing to remove the voices to the side.

Wait... just saw daniel's post--can you explain what you are saying?

Well, looks like he's saying that the best you can do for 'side removal' is downmix to mono...

Anyway, I was saying... how about I compare the spectral analysis of the two channels and for every time/frequency point take the lower value of L and R? This ought to attenuate sounds off to the side only! Is there some way for me to try it out on songs now? What editor would allow me to do this?
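The "take the lower value of L and R" idea can at least be tried on a synthetic frame before looking for an editor. This is only a sketch under several assumptions: a single frame with a naive DFT, test tones that sit exactly on DFT bins, and the phase borrowed from the mono sum L+R (the post only talks about magnitudes). keep_centre is a hypothetical name.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def keep_centre(left, right):
    """For one frame, keep the smaller of |L| and |R| in every frequency bin."""
    out = []
    for bin_l, bin_r in zip(dft(left), dft(right)):
        magnitude = min(abs(bin_l), abs(bin_r))
        # Assumption: reuse the phase of the mono sum L+R for the output bin.
        phase = cmath.phase(bin_l + bin_r)
        out.append(cmath.rect(magnitude, phase))
    return idft(out)

# Synthetic frame: one tone present in both channels, another in the left only.
n = 64
centre = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
side = [0.8 * math.sin(2 * math.pi * 9 * t / n) for t in range(n)]
left = [c + s for c, s in zip(centre, side)]
right = list(centre)

result = keep_centre(left, right)
error = max(abs(a - b) for a, b in zip(result, centre))
print(error)
```

In this idealised case the left-only tone disappears and the centre tone survives. On real music a full version would need overlapping windowed frames, and any bin where a side source and the centre voice overlap would still leak.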


Reply #4 – 2002-08-19 15:24:48

If you could decide which person you are listening to, you could do everything. But extracting that person from the audio signal is the real problem; just taking L-R will only work in situations where the voice is the only thing in the centre and everything else is panned L/R.


Reply #5 – 2002-08-19 15:38:41

What I'm saying is that you can keep:
a: left channel
b: right channel
c: mid channel
d: side channel

That's already 4 channels.
If your person is talking a little bit right of the centre, you mix the mid with a little right. The result should ALWAYS be mono (speech is mono panned). If you want to extract one person's voice, the first step is what I described. The second is spectral etc. etc.
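The first step ("mix the mid with a little right") might look something like the sketch below. The toward_right weight and the linear crossfade are my assumptions; the post does not say how much right to mix in or how to scale it.

```python
def steer_to_mono(left, right, toward_right=0.0):
    """Mono output: the mid signal blended with a share of the right channel.

    toward_right is a hypothetical weight: 0.0 gives the plain mid (l+r)/2,
    1.0 gives the right channel alone.
    """
    out = []
    for l, r in zip(left, right):
        mid = (l + r) / 2
        out.append((1 - toward_right) * mid + toward_right * r)
    return out

# Made-up samples, with the speaker assumed to sit slightly right of centre.
left = [0.5, 0.25]
right = [0.25, -0.5]
mono = steer_to_mono(left, right, toward_right=0.25)
print(mono)
```

The output is a single channel, in line with the point that the result should always be mono; the spectral second step would then work on that signal.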
