Hi folks, I'm new on this forum. Nice to be here!
I've a question:
I wish to produce a "center channel" from a simple stereo audio signal.
My idea is to extract all the audio content that is COMMON to the left and right channels.
Doing this, I would obtain three signals: L (without the mono content), C (the mono content) and R (without the mono content).
Do you think it's possible to do this? I think it would be something close to the output of Dolby Pro Logic II, which puts the signal common to Lt and Rt in the center channel.
That would be useful for me because I'm doing an upmix of a stereo soundtrack and I wish to put the voices in the center channel without having them in the front left and right!
In the surround channels I'm putting the [L-R] signal, decorrelating it a bit with a surround reverb. It works nicely, but the problem for me is still the "front image".
Suggestions and tricks are welcome!
Thanx!
I find that if I'm centered between the front speakers, I can't tell whether the center channel is on. Sitting significantly off center is a different story, and why there are center speakers in the first place.
Do you have problems with plain stereo? Maybe one of your speakers is reverse polarity?
Anyhow, as you said, the effect you are looking for is Dolby Pro-logic, so that is the way to go. Or are you looking to implement ProLogic in software?
Doing this, I would obtain three signals: L (without the mono content), C (the mono content) and R (without the mono content).
You can't quite do that.
Let's call the new channels Ln, Cn, and Rn (n=new!)
If you do the obvious...
Ln=L-R
Cn=L+R
Rn=R-L
Then you have two channels playing the same thing but out of phase (left and right), and a combined channel in the middle.
If you do Mono=L+R
Cn=Mono
Ln=L-Mono
then what you really have is
Ln=L-(L+R)=L-L-R=-R
(not very helpful).
The best passive system is simply
Ln=L
Rn=R
Cn=L+R (-3dB)
But the stereo sound stage width is reduced (move the speakers further apart to compensate).
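For anyone who wants to check the algebra above numerically, here is a minimal sketch of the three matrices applied to a single sample pair. The function names are made up for illustration, and the -3 dB is taken as an amplitude gain of 10^(-3/20), which is an assumption about what was meant.

```python
def naive_matrix(l, r):
    """The 'obvious' matrix: Ln = L - R, Cn = L + R, Rn = R - L."""
    return l - r, l + r, r - l


def subtract_mono(l, r):
    """Ln = L - Mono with Mono = L + R; algebraically this is just -R."""
    mono = l + r
    return l - mono


def passive_matrix(l, r, center_gain=10 ** (-3 / 20)):
    """Ln = L, Rn = R, Cn = (L + R) attenuated by 3 dB (assumed amplitude gain)."""
    return l, center_gain * (l + r), r


l, r = 0.5, 0.25
ln, cn, rn = naive_matrix(l, r)
print("naive: Ln and Rn are inverted copies:", ln, rn)
print("L - Mono equals -R:", subtract_mono(l, r), -r)
print("passive:", passive_matrix(l, r))
```

The first two cases show why simple subtraction cannot isolate "left only" content: the result is always some mix of L and R, never L with the common part removed.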
I've read that Michael Gerzon (ambisonics) suggested a better 2>3 conversion, but don't know any more. If you search the archives of the "Sursound" mailing lists, there's a thread on this (Google Sursound).
The Dolby systems work by "steering" the content - reducing the loudness of the quieter channels in favour of the louder channel - so if it detects the content is all in L+R and there's nothing in L-R, it mutes L and R, leaving only C. ProLogic 2 is supposedly more subtle and smart, and so works better for music.
Cheers,
David.
If you're interested in stereo to 5.1 upmixing, take a look at this (http://forum.doom9.org/showthread.php?s=&threadid=60137) (slightly more complicated ambisonic method) or this (http://forum.doom9.org/showthread.php?s=&threadid=57179) (more automated, quite close to what you're trying to do).
From a simple stereo input, if you invert one channel and then combine the channels you will eliminate all audio common to both:
Left + (inverted Right) = "zero" Center
From this, if you go back to the original Left and Right channels, subtract the "zero" Center from each to create the centered audio channels (which are essentially duplicates of each other):
Left - ("zero" Center) = "centered" Left = "true" Center
Right - ("zero" Center) = "centered" Right = "true" Center
Then, the "true" Center can be subtracted from the original Left and Right channels to create the "extreme" Left and Right:
Left - ("true" Center) = "extreme" Left
Right - ("true" Center) = "extreme" Right
Variations on this process form the basis for the standard creation of OOPS (Out-of-phase-stereo) mixes.
- M.
Take a look at VirtualDub's "Center cut" filter. Here's a description from Avery Lee:
Center cut. The classic "vocal cut" filter, except that the output is stereo instead of mono. This is accomplished through FFT phase analysis; the output will have some warbling in it, but stereo separation is preserved.
It not only produces stereo side output, but a mono center output (depending on which output pin of the filter is used).
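To give a feel for what a frequency-domain split can look like, here is a rough per-bin sketch. It is NOT taken from the VirtualDub source; the rule is an assumption chosen for illustration. For each FFT bin the center is taken as alpha * (L + R), with alpha picked so that the two residuals L - C and R - C are perpendicular when viewed as 2-D vectors. The name `split_bin` is hypothetical, and a real filter would still need windowed, overlapping transforms around it.

```python
def split_bin(l, r):
    """Split one frequency bin (complex L and R) into (side_l, center, side_r).

    Assumed rule: center = alpha * (L + R), where alpha solves
    (L - C) . (R - C) = 0, which works out to
    alpha = (1 - |L - R| / |L + R|) / 2.
    """
    total = l + r
    if abs(total) == 0.0:
        return l, 0j, r            # exactly out of phase: no center content
    alpha = 0.5 * (1.0 - abs(l - r) / abs(total))
    alpha = max(alpha, 0.0)        # mostly out-of-phase bins get no center
    center = alpha * total
    return l - center, center, r - center


# Identical bins go entirely to the center; a left-only bin stays in the left.
print(split_bin(1 + 2j, 1 + 2j))
print(split_bin(1 + 0j, 0j))
```

Unlike the time-domain matrices, this makes a separate decision for every frequency, which is why it can pull a centered voice out while leaving panned instruments in the sides.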
It sounds pretty good IMO, here's a clip of the stereo side output: M2M - Smiling Face (23 Sec Clip) (http://members.cox.net/moitah/M2M-Smiling_Face_Clip_(Center_Cut).mp3) (Sorry, I didn't make a clip of the mono center output)
I asked a while back if someone would make this into a Winamp plugin, but got no response...
Well, I ended up doing it myself. See this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=17450&).
Left - ("true" Center) = "extreme" Left
Right - ("true" Center) = "extreme" Right
It doesn't quite work this way, as 2B explained two posts before yours.
Left + (inverted Right) = "zero" Center
From this, if you go back to the original Left and Right channels, subtract the "zero" Center from each to create the centered audio channels (which are essentially duplicates of each other):
Left - ("zero" Center) = "centered" Left = "true" Center
Yes, I really think this doesn't work.
You define "zero" = L-R, so:
"Centered" left = Left - "zero" = L - (L-R) = L - L + R = R
Not very useful.
That's because in "zero" (L-R) both channels are present and the R channel is phase-inverted!
The best passive system is simply
Ln=L
Rn=R
Cn=L+R (-3dB)
But the stereo sound stage width is reduced (move the speakers further apart to compensate).
I've read that Michael Gerzon (ambisonics) suggested a better 2>3 conversion, but don't know any more.
He does something like
Ln= L - K*R
Cn= K * (L+R)
Rn = R - K*L
It works fine with K=0.5
This is better than a full "Side" (L-R) signal because Ln and Rn are not completely phase-inverted copies of each other; the mono content is reduced and the stereo picture is still good.
It's a compromise.
To make Ln and Rn, I just created a "subtraction channel": another stereo channel in the mix carrying the original stereo with L and R swapped and the phase inverted.
Playing with the fader, you can tune K to your own taste.
It sounds good.
But the voices are not fully separated into Cn as I wish...
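As a quick numeric check of the K matrix above, here is a small sketch (the function name is just for illustration). It shows what happens to a purely mono signal and to a hard-left signal with K=0.5.

```python
def matrix_2_to_3(l, r, k=0.5):
    """Ln = L - K*R, Cn = K*(L + R), Rn = R - K*L for one sample pair."""
    return l - k * r, k * (l + r), r - k * l


# Purely mono content (L == R): halved in the sides, full level in the center.
print(matrix_2_to_3(1.0, 1.0))

# Hard-left content (R == 0): full level in Ln, half level in Cn,
# and a half-level inverted copy leaks into Rn.
print(matrix_2_to_3(1.0, 0.0))
```

With K=0.5 the mono content in Ln and Rn only drops to half amplitude (about 6 dB), which matches the observation that the voice is reduced in the sides but not removed.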
Would it make more sense if I explained it by replacing the minus signs with a combination of plus signs and notation of the phase inversion? I was assuming folks who read this could translate that part for themselves, from the previous description. At any rate, here's a second attempt at explaining the same thing:
Left + (inverted Right) = "zero" Center
From this, if you go back to the original Left and Right channels, subtract (or add as an inversion) "zero" Center from each to create the centered audio channels (which are essentially duplicates of each other):
Left + (inverted "zero" Center) = "centered" Left = "true" Center
Right + (inverted "zero" Center) = "centered" Right = "true" Center
Then, the "true" Center can be subtracted (or added as an inversion) from the original Left and Right channels to create the "extreme" Left and Right:
Left + (inverted "true" Center) = "extreme" Left
Right + (inverted "true" Center) = "extreme" Right
Note that the terms "zero," "centered," "true" and "extreme" are in quotation marks (""), to imply that they are used only for convenience of notation, and to clarify channel order.
Now, all of this still may not quite do what RealTime was wanting to do, but I assure you it is a sound practice. Next time, before you write something off with "It doesn't quite work this way," try going through it step by step to be sure you understand what's been written.
[/semi-irritated rant mode off]
- M.
I'm sorry, but what you say doesn't work from an algebraic point of view.
You say:
Left + (inverted Right) = "zero" Center
So the thing that you call "zero" Center is (L-R)
Going on, you say:
Left + (inverted "zero" Center) = "centered" Left = "true" Center
if we replace "zero" Center with L-R we have
Left + (inverted "zero" Center) = Left - (L-R)
but this means:
= Left - L + R = R
This is not an opinion, as Einstein said.
Center channel is (L+R)/2 (if both are equal, it is mono, and center has this value. If both are opposite, there's no central content, so this is zero)
"pure" left channel would be the left minus this center channel. similar with the right one.
A few things:
1) I wish to apologize for the tone of my previous post. I was irritated, and it was uncalled for.
2) I also wish to withdraw the explanation I offered, as it is (as several of you already recognized) flawed.
The lesson? Don't ever try to explain things when your wife has just told you she might be pregnant. Even concepts with which you are intimately familiar, and techniques you have used for years, will seem a little garbled and incoherent.
I hesitate to post anything more until my mind settles down (And I hope everyone here will forgive my obvious and egregious errors!).
- M.
M don't worry, for me it's ok!
I think you've just created a good "Center Channel" between a great Stereo couple (you and your wife), so don't mind about mistakes!
Best wishes!
GIANNI
"pure" left channel would be the left minus this center channel
JAZ, if you subtract the L+R sum from the left channel, the only thing you'll get is the right channel, phase-inverted.
It's simple:
L - (L+R) = L - L - R = -R
Cheers
"pure" left channel would be the left minus this center channel
JAZ, if you subtract the L+R sum from the left channel, the only thing you'll get is the right channel, phase-inverted.
It's simple:
L - (L+R) = L - L - R = -R
Cheers
Yes, but if you do as JAZ suggested you get:
L - (L+R)/2 = (L-R)/2
The only 'real' way to do this would be to do a spectral band transform, filter to the channels and then reconstruct.
Tangent can you explain this process?
Yes, but if you do as JAZ suggested you get:
L - (L+R)/2 = (L-R)/2
Yes ancl, you're right.
JAZ suggests doing this to obtain the "pure" L, and doing the same on the right channel to produce the "pure" R.
So we will have (L' and R' are the supposed "pure" versions)
L'= (L-R)/2
R'= (R-L)/2
but this also means that L' and R' are the same signal with the phase inverted.
This is something like M/S encoding, with M containing the mono version (beware: NOT only the mono content but the mono mix of the two channels) and S containing the "side" signal, (L-R)/2, in the same way that an M/S microphone captures the spatial information.
To recreate the original stereo picture you have to do:
L=M+S
R=M-S
So one way to derive the three channels could be the following:
L=S
C=M
R=-S
If you are sitting EXACTLY in front of the speakers, you hear something close to the original stereo picture "expanded" over the three speakers, but this is not a valid way to derive the channels because nobody has speakers perfectly tuned and centered. Furthermore, if you tilt or rotate your head you'll hear very strange phase effects, because L=-R and vice versa.
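The M/S relations above can be sketched in a few lines, assuming M=(L+R)/2 and S=(L-R)/2 so that L=M+S and R=M-S hold exactly. Function names are illustrative only.

```python
def ms_encode(l, r):
    """M = (L + R) / 2 (mono mix), S = (L - R) / 2 (side)."""
    return (l + r) / 2, (l - r) / 2


def ms_decode(m, s):
    """Recreate the original stereo pair: L = M + S, R = M - S."""
    return m + s, m - s


def ms_to_three(l, r):
    """The three-speaker layout discussed above: L = S, C = M, R = -S."""
    m, s = ms_encode(l, r)
    return s, m, -s


l, r = 0.5, 0.25
print(ms_decode(*ms_encode(l, r)))   # round trip back to (L, R)
print(ms_to_three(l, r))             # outer speakers carry inverted copies
```

The round trip is lossless, but the outer two outputs of `ms_to_three` are always exact inverses of each other, which is the source of the phase effects described above.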
I use something very close, but a little better for the phase:
L'=L-0.5*R
C=(L+R)*0.5
R'=R-0.5*L
This reduces the phase inversion and preserves center separation and the stereo picture.
It's not perfect but a good compromise IMO.
But I'd like to obtain MORE...
What I'm searching for is a way to obtain an (almost) COMPLETE separation of the voices into the center channel...
What's the way?
Realtime: Did you look at what I suggested, or is that not what you want?
MOITAH: YES!!!! I'm sorry, I missed your reply!!
Thanks a lot for sending me that link!
It works very, very well and it sounds very good.
However, it has a little phase distortion and sometimes it produces cracks/noise in the waveform (I suppose with very high-frequency sounds or sounds with fast envelopes).
I don't know if it's a problem with the technique used or an implementation bug.
Anyway, I'd like to know more about the technique Avery Lee used, because I wish to do the same in Nuendo Surround Edition on my own stereo files.
I understand some C/C++, but it's not a simple task for me to work out the algorithm from the source code...
Furthermore, a friend of mine and I would like to produce a free VST plugin to do this; it would be very nice for everyone who uses Cubase/Nuendo or other VST hosts.
Thanks a lot!!
GIANNI
Thanks a lot!!
No problem. I'm glad you asked about it, because I had pretty much given up the idea of having it as a Winamp plugin (the first time I looked at the code in VirtualDub, I thought it would be too hard for me to do... and nobody else seemed interested). I tried more patiently this time and it took me almost 2 whole days, but I'm very happy that I got it done (and Chun-Yu made a foobar2000 plugin (http://www.hydrogenaudio.org/forums/index.php?showtopic=17661&) based on it as well).
... sometimes it produces cracks/noise in the waveform (I suppose with very high-frequency sounds or sounds with fast envelopes)
Try the new version. All the processing is now done with 'double' instead of 'float' variables. I don't know if this fixed it, but when comparing output with the original version I noticed some beeping sounds were gone.
I've tried the 1.1.0 release; I think it's the latest version, isn't it?
Anyway, I also hear large and strange phase distortions, and I'd like to improve the quality of the output.
About that, Avery says:
What really improves the quality of the algorithm is
increasing the FHT size to 16384 points or beyond; for smaller FHTs
such as 4096 points, increasing the frequency of the overlapping
transforms to every 1024 or even 512 samples helps somewhat. You have
to adjust the output amplitude to compensate, of course.
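To illustrate the amplitude remark, here is a small sketch of how the summed gain of overlapping windows changes with the hop size. It assumes a single periodic Hann window applied once per block, which may not be how the actual filter windows its data, so treat the numbers as an example of the principle only.

```python
import math


def hann(n_points):
    """Periodic Hann window of n_points samples (an assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_points) for i in range(n_points)]


def overlap_gain(window_size, hop):
    """Sum of all overlapping windows at each sample position within one hop."""
    w = hann(window_size)
    return [sum(w[i + j] for j in range(0, window_size, hop)) for i in range(hop)]


for hop in (1024, 512):
    gains = overlap_gain(4096, hop)
    # The gain is constant across the hop; the output would be scaled by 1 / gain.
    print(hop, min(gains), max(gains))
```

Under these assumptions, halving the hop doubles the summed gain, so the output scale factor has to be halved to keep the same level.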
I've tried to increase the FHT window size in the source code by changing some constants like this:
ORIGINAL:
enum {
    kWindowSize = 4096,
    kHalfWindow = 2048,
    kQuarterWindow = 1024
};
MODIFIED:
enum {
    kWindowSize = 8192,
    kHalfWindow = 4096,
    kQuarterWindow = 2048
};
But it doesn't work... it plays something and then crashes.
What's wrong?
The VDCenterCut_Run() function needs a certain number of input samples (kQuarterWindow) and outputs the same number of samples. Since Winamp isn't going to give you exactly that number of samples each time (it's 576 in the case of FLAC, for example), the input is buffered until enough samples are collected. Then the whole output is given back to Winamp at once. The DSP plugin SDK states that you cannot output more than twice the number of input samples. This isn't a problem now because the smallest number of input samples I've seen is 576, so you are allowed to output up to 1152 samples (which is just slightly bigger than kQuarterWindow). When you increase the size of kQuarterWindow, the output becomes too large for Winamp's sample buffer to hold, causing an overrun. The solution to this would be to create an output buffer, so that smaller blocks of samples can be passed back to Winamp more often, instead of one big block all at once. I will try to do this soon.
(I hope that made sense...)
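The buffering idea described above can be sketched roughly like this. It is only an illustration of the input/output queue pattern, not the plugin's actual code: `BlockAdapter`, `push`, and `max_out` are hypothetical names, and an identity function stands in for the real center-cut routine.

```python
from collections import deque


class BlockAdapter:
    """Feeds a fixed-block processor from arbitrary-size host buffers."""

    def __init__(self, block_size, process_block):
        self.block_size = block_size
        self.process_block = process_block  # takes and returns block_size samples
        self.pending_in = []
        self.pending_out = deque()

    def push(self, samples, max_out=None):
        """Accept host samples; return at most max_out processed samples."""
        if max_out is None:
            max_out = len(samples)          # never hand back more than we were given
        self.pending_in.extend(samples)
        while len(self.pending_in) >= self.block_size:
            block = self.pending_in[:self.block_size]
            del self.pending_in[:self.block_size]
            self.pending_out.extend(self.process_block(block))
        count = min(max_out, len(self.pending_out))
        return [self.pending_out.popleft() for _ in range(count)]


# Identity stand-in for the real filter, 2048-sample blocks, 576-sample host buffers.
adapter = BlockAdapter(2048, lambda block: block)
stream = list(range(5760))
chunks = [adapter.push(stream[i:i + 576]) for i in range(0, len(stream), 576)]
print([len(c) for c in chunks])
```

Because processed samples wait in the output queue and are released in host-sized pieces, the block size can grow without any single call returning more than the host's buffer can hold; the cost is extra latency.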
I finally got around to doing the output buffering. I tested with a 32768 point FHT and it uses about 9% CPU on my P4 2.4GHz. It really does sound noticeably better. I'll post it sometime tomorrow hopefully.
Looking forward to the updated model...
- M.
Version 1.2.0 (http://www.hydrogenaudio.org/forums/index.php?showtopic=17450&) is out now. I delayed releasing it because I noticed that a larger FHT size causes echo to be added. I e-mailed Avery about this and here's his response (I hope you don't mind me posting this, Avery):
To be honest, I don't know exactly where the echo comes from, although
it does seem to be an indication that the algorithm is both becoming
more stable and less correct with larger window size (accuracy vs.
precision tradeoff). 4x8K is probably about the best that can be
accomplished. The algorithm tends not to work very well at
either the low or high extremes of the spectrum, and windowing the
center channel's spectrum before subtraction might help, but I haven't
tried it. I can tell you that center cut is quite sensitive to the
stereo quality of your sample; an MP3 that sounds OK can turn out very
ratty once the center channel has been removed, if joint-stereo mode
really zapped it.
You can also try increasing or decreasing the amount of overlap,
although that won't help much if the window size alone is giving
excessive echo or warbling.
I don't know enough about DSP to try his suggestions. I ended up using an FHT size of 8192 because I think it's a good balance between warbling and echo, but if you look in the 'test' directory in the zip file, I also compiled versions with 4k (this is how the code originally was), 16k, and 32k FHTs.