Advanced or psychoacoustic downmixing

2003-01-15 22:32:46

Typically, downmixing from stereophonic (2-channel) to monophonic (1-channel) sound is achieved by adding the sample values from left and right channels and dividing by 2.

Usually this works pretty well, partly because most music for playback on hi-fi systems has been professionally recorded using microphone techniques that keep left and right largely in phase. If it weren't, it wouldn't play well on old mono radios (or AM radio or TV). Likewise, studio recordings compiled from individually miked instruments and vocalists, then placed in a stereo soundstage, are usually mastered to ensure mono compatibility.

However, if there are phase differences between channels, particularly frequency-dependent phase differences (such as simple time delays), downmixing can create a comb-filter effect, or some other arbitrary effect where parts of the sound are missing: some frequencies add constructively, others cancel out completely, and many are somewhere in between. This is particularly true of artificial stereo effects, I believe.
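
To make the cancellation concrete, here is a minimal sketch of the plain averaging downmix applied to a tone whose right channel is a delayed copy of the left. The 0.5 ms delay, the 48 kHz sample rate and the function names are illustrative assumptions, not taken from any real encoder.

```python
import math

def downmix_average(left, right):
    """Plain mono downmix: the mean of left and right sample values."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def delayed_stereo_tone(freq_hz, delay_s, rate=48000, n=4800):
    """Left is a sine tone; right is the same tone delayed by delay_s seconds."""
    left = [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]
    right = [math.sin(2 * math.pi * freq_hz * (t / rate - delay_s))
             for t in range(n)]
    return left, right

delay = 0.0005  # 0.5 ms inter-channel delay (assumed, for illustration)
for freq in (500, 1000, 2000):
    left, right = delayed_stereo_tone(freq, delay)
    mono = downmix_average(left, right)
    print(freq, "Hz: mono RMS =", round(rms(mono), 3))
```

With this delay the 1000 Hz tone is shifted by half a period and vanishes from the mono mix, the 2000 Hz tone is shifted by a full period and comes through at full level, and the 500 Hz tone lands in between. That is the comb-filter pattern described above.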

Mostly, simple downmixing works. Sometimes when it doesn't, simply choosing the left or the right channel is good enough. I seem to recall that before the days of NICAM stereo broadcasts, TV viewers watching the video of Kiss by Tom Jones/Art of Noise were provided with the left channel only, which completely removed much of the sound in certain sections of the song, leaving practically only electronic drum sounds. This is one occasion when it didn't work (although it added some interesting dynamic to the music).

Some samples would never be quite right with either method.

What I've never found is a robust "secure mode" for downmixing to mono the way the human ear would expect. All encoders, even multi-channel capable ones like Ogg Vorbis, seem to simply take the average of the sample values.

I've tried Googling, but found nothing. Has anyone heard of such a tool?

I'd assume there might be a few approaches.

A mathematical way would be to derive the frequency spectrum (real part or the power spectrum) of each channel for short frames or blocks, then add or average the amplitudes of the corresponding components after giving the contribution from each channel the same phase. For projects like LAME, this sounds like the sort of thing that could be done while encoding to MP3, just after the MDCT, rather than doing the downmix first. Adjusting the phase relationship may induce clipping on decode.
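
A rough sketch of the frequency-domain idea, under several assumptions: it uses a naive DFT rather than the MDCT, it processes a single even-length frame with no windowing or overlap, and it resolves the phase question by borrowing the phase of the louder channel in each bin. None of this is how LAME or any other encoder actually works; `magnitude_downmix` is a hypothetical name.

```python
import cmath
import math

def dft(frame):
    """Naive O(n^2) discrete Fourier transform of a real frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spectrum):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def magnitude_downmix(left, right):
    """Average bin magnitudes; take each bin's phase from the louder channel."""
    n = len(left)  # assumed even and equal to len(right)
    spec_l, spec_r = dft(left), dft(right)
    half = n // 2
    out = [0j] * n
    for k in range(half + 1):
        a, b = spec_l[k], spec_r[k]
        mag = (abs(a) + abs(b)) / 2
        ref = a if abs(a) >= abs(b) else b
        out[k] = cmath.rect(mag, cmath.phase(ref))
    # Mirror the upper bins so the inverse transform is a real signal.
    for k in range(1, half):
        out[n - k] = out[k].conjugate()
    return idft(out)

# Worst case for plain averaging: right is the inverted left channel.
n = 64
left = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
right = [-s for s in left]
plain = [(l + r) / 2 for l, r in zip(left, right)]
mono = magnitude_downmix(left, right)
print("plain average peak:", max(abs(s) for s in plain))
print("max deviation from left:", max(abs(m - l) for m, l in zip(mono, left)))
```

In this anti-phase test the plain average is silent, while the magnitude-based mix keeps the tone at its original level. A real tool would need overlapping windowed frames to avoid block-edge artifacts, and the output can exceed the input peak level, which is the clipping risk mentioned above.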

Another approach might involve some form of psychoacoustics that I haven't thought of. Any ideas?

Regards,

Dick Darlington

Reply #1 – 2003-01-16 00:16:41

I think it's OK to mangle phases. Take full power spectrum and average both phases and amplitudes. If you get too strong a signal, you know where the gain knob is. ;-)

Reply #2 – 2003-01-16 03:52:33

How do you average a phase difference of 180 degrees?
What about completely killing phase (set all to 0)? Might that work?

Reply #3 – 2003-01-16 07:29:27

well, I suppose you could do a normal mix and an invert-mix and if there's anything in the invert mix you could mix that with the first normal mix

Reply #4 – 2003-01-16 09:54:00

Quote

well, I suppose you could do a normal mix and an invert-mix and if there's anything in the invert mix you could mix that with the first normal mix

That is one of those simple ideas that sounds so good in theory!

But if you do the maths:

N=L+R
I=L-R

(normal and inverted).

At some points, you suggest we could add the normal and inverted mix, N+I, to get everything.

N+I=(L+R)+(L-R)=2L

So, this way, you just get the left channel!

Frequency domain processing sounds promising. I'll have to try that! :-)

Cheers,
David.

Reply #5 – 2003-01-16 10:32:01

Thanks guys,

I think setting phase to zero (e.g. using a cosine to generate each component, which I think the MDCT effectively does) ought to be fine, then averaging would work.

Doctor, the average of phase P and phase P + 180° is P + 90°, but any arbitrary phase ought to sound the same, as Tangent said. (Remember, I'm not talking about averaging the sample levels. If you had a sine wave, I'd shift it left or right along the time axis by up to half a period in either direction.)

_Shorty, if you do a normal mix, M:

M = ½L + ½R

and an invert mix, I is:

I = ½L - ½R

If you find stuff in the invert mix (which you will if there's any stereo content) and add it to the normal mix, you get:

M + I = ½L + ½L + ½R - ½R = L

So what you're saying is simply the same as picking the left channel. Likewise, M - I = R, the same as picking the right channel.
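
The algebra is easy to check numerically. This small sketch uses made-up sample values (chosen as exact binary fractions so the comparison is exact) and hypothetical helper names `mid` and `side` for the normal and invert mixes.

```python
def mid(left, right):
    """Normal mix: M = L/2 + R/2."""
    return [(l + r) / 2 for l, r in zip(left, right)]

def side(left, right):
    """Invert mix: I = L/2 - R/2."""
    return [(l - r) / 2 for l, r in zip(left, right)]

# Arbitrary illustrative samples.
left = [0.25, -0.5, 0.75, 0.125]
right = [0.5, 0.25, -0.25, 0.0]

m = mid(left, right)
i = side(left, right)
recovered_left = [a + b for a, b in zip(m, i)]   # M + I
recovered_right = [a - b for a, b in zip(m, i)]  # M - I

print(recovered_left == left)
print(recovered_right == right)
```

Adding the two mixes returns the left channel sample for sample, and subtracting them returns the right, so no information from the other channel is gained.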

[Edit: I see David got there first. Serves me right for getting diverted in mid message before finally submitting it!]

To get the loudness of both, I think you have to remove the phase differences over short timescales before adding the components, as previously discussed.

I guess nobody knows of any tools that do this?

It's a shame Cool Edit, which does plenty of frequency-domain stuff, some of it over short "frames", doesn't actually let you access the transform in the editor and do things like mixing right there.

I guess if I had the tools and experience to compile the LAME source code, I could experiment with adding my own commandline switch for "secure downmix" to average the left and right channel frequency components just after the MDCT, or I could write a standalone program (I probably could, at a stretch) to take a RAW or WAV file, carry out MDCT or FFT, do the downmix, then perform the inverse transform and write back a new RAW or WAV file.

If I did that, I guess I'd want to include a preamp to scale the components in the frequency domain before the inverse transform. I guess a little dither would also help before truncation to 16 bits.
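
For what it's worth, a minimal sketch of that last stage, assuming float samples nominally in the range -1 to 1. The gain is applied in the time domain here, which is equivalent to scaling the frequency components because the transform is linear. The dither is the common triangular (TPDF) kind; `to_int16` is a hypothetical name, and rounding is used rather than plain truncation.

```python
import random

def to_int16(samples, gain=1.0, rng=None):
    """Apply gain, add TPDF dither, then round and clip to the 16-bit range."""
    rng = rng or random.Random()
    out = []
    for s in samples:
        scaled = s * gain * 32767.0
        # TPDF dither: sum of two uniform values, spanning +/- 1 LSB.
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = int(round(scaled + dither))
        # Hard clip anything the gain pushed out of range.
        out.append(max(-32768, min(32767, value)))
    return out

pcm = to_int16([0.0, 0.5, 2.0, -2.0], gain=1.0, rng=random.Random(0))
print(pcm)
```

The hard clip is only a safety net. If the downmix regularly overshoots, lowering `gain` is the better fix.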

Dick Darlington

[Edit: If anyone tries this method and finds it to work (maybe the binaural samples at http://jimtreats.crosswinds.net/MyTreats/B...aural/index.htm would be good samples to test it with) please let us know. I believe, for example, that David used to have a fair number of signal processing tools at his disposal, from reading his excellent ReplayGain and academic work. If not, I may eventually get round to writing a crude program in C or something, unless I learn enough of one of these Matlab clone languages]

Reply #6 – 2003-01-16 18:34:56

ok, so what I did was record a stereo file from my mic where I was saying "both" over and over for 10 seconds.

Then I recorded a mono file where I was saying "left" over and over for 10 seconds.

And I recorded a mono file where I was saying "right" over and over for 10 seconds.

Then I took the stereo 'both' file and mixed the 'left' file in with the left channel and the 'right' file in with the right channel.

Then I made another copy of the stereo file with all three words.

I downmixed one to mono simply doing a 50% L and 50% R. Can hear all three words.

I downmixed the other by inverting the R channel. Can only hear 'left' and 'right', 'both' is gone from this file.

I mixed the two together and surprise, you can still hear 'right' so it is not simply taking the left channel.

<edit> ah that's why. I missed the 50% during the invert stereo to mono, it was using 100% so the 'right' ended up being louder than the other one and that's why it didn't get cancelled. did it again and it did indeed only have 'left' and 'both' in it.

Reply #7 – 2003-01-16 22:15:18

Quote

Doctor, the average of phase P and phase P + 180° is P + 90°, but any arbitrary phase ought to sound the same, as Tangent said. (Remember, I'm not talking about averaging the sample levels. If you had a sine wave, I'd shift it left or right along the time axis by up to half a period in either direction.)

I thought about it too. I don't like the idea of discarding phase altogether, because downmixing should only lose the stereo picture and preserve everything else. The inverted case is not special: you always have two possible solutions for the average of phases (you always have two roots of a quadratic equation).

Can somebody elaborate: is phase information really irrelevant? Do lossy codecs discard it?

Reply #8 – 2003-01-17 04:42:59

Btw, P+270 is also a valid solution for the average of P and P+180
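
One way to see the ambiguity is to treat each phase as a unit vector and average the vectors (a circular mean). This is only a sketch with hypothetical helper names; it is one possible definition of "average phase", not the only one.

```python
import cmath
import math

def mean_phase_deg(p1_deg, p2_deg):
    """Circular mean of two phases via unit-vector sum; None when undefined."""
    v = (cmath.rect(1.0, math.radians(p1_deg))
         + cmath.rect(1.0, math.radians(p2_deg)))
    if abs(v) < 1e-9:
        return None  # opposite phases: the vectors cancel, no unique mean
    return math.degrees(cmath.phase(v)) % 360.0

def circular_distance_deg(a, b):
    """Shortest angular distance between two phases, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

p = 30.0
print(mean_phase_deg(p, p + 60.0))   # well defined, close to 60
print(mean_phase_deg(p, p + 180.0))  # None: no unique answer

# Both candidates sit 90 degrees from each of the two original phases.
for candidate in (p + 90.0, p + 270.0):
    print(candidate,
          circular_distance_deg(candidate, p),
          circular_distance_deg(candidate, p + 180.0))
```

For phases exactly 180° apart the summed vector has zero length, so this definition gives no answer at all, and P + 90° and P + 270° are equally far from both inputs.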
