Multichannel Replay Gain
Reply #3 – 2010-01-19 01:50:23
I was a bit mindful of this thread when posting on another one. Although I don't have anything conclusive to say about how it ought to be done, nor any experience of using 5.1 audio, my comments give a different perspective. In particular, I understand that Replay Gain algorithms typically deal with stereo or mono files.

To my mind, the Centre and Low-Frequency Effects (LFE) channels could be added in the voltage or power domains with practically the same effect (save for clipping, which Replay Gain functions should be immune to, being floating point), because they occupy separate parts of the spectrum. Signals in non-overlapping bands are uncorrelated, so the average power of their sum is the sum of their powers.

Now, I'd assume a simple method of Replay Gain would simply:

1. take the time-windowed power for each channel (in linear units, not dB), possibly as a power spectrum over the averaging time
2. calculate the sum of powers in all channels (or of power spectral densities if using a power spectrum)
3. apply the inverse loudness contour weighting to determine the perceived loudness power (again in linear units; possibly do this before step 2)
4. convert to a dB scale

The algorithm for loudness over a time chunk could even be coded to start with zero power, then iterate through all channels, adding each channel's power (or p.s.d.) to the stored value for that time chunk.

I haven't included peak detection, which can work per channel. If Centre + LFE clips, that represents a special case of clipping that occurs only when the Centre channel is a full-range driver taking the sum of C + LFE. If we added LFE/√3 to each of C, L and R (which keeps the total LFE power unchanged, since 3 × (1/√3)² = 1), it's a different configuration again.

Anyhow, that system is relatively simple, in that it's extensible to any number of channels, provided it makes sense to sum the powers or power spectral densities.
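The four steps above can be sketched roughly as follows. This is only a minimal illustration, not how any existing Replay Gain scanner does it: the function name `chunk_loudness_db` is made up, full scale is assumed to be 1.0, and the loudness-contour weighting of step 3 is left out (it would be a filter applied to each channel's samples before the power is taken).

```python
import math

def chunk_loudness_db(channels):
    """Summed-power level of one time chunk, in dB relative to full scale.

    channels: list of per-channel sample lists (floats, full scale = 1.0).
    The loudness weighting (step 3) is NOT applied here; this is an
    assumption made to keep the sketch short.
    """
    total_power = 0.0  # start with zero power for this time chunk
    for samples in channels:
        if not samples:
            continue  # an empty channel contributes nothing
        # Step 1: time-windowed (mean-square) power in linear units
        mean_square = sum(s * s for s in samples) / len(samples)
        # Step 2: add this channel's power to the stored value
        total_power += mean_square
    if total_power <= 0.0:
        return float("-inf")  # digital silence
    # Step 4: convert the linear power to dB
    return 10.0 * math.log10(total_power)


if __name__ == "__main__":
    one = [[0.1] * 1000]   # one channel at a constant 0.1 of full scale
    two = one * 2          # the same signal on two channels
    print(round(chunk_loudness_db(one), 2))
    print(round(chunk_loudness_db(two), 2))
```

Because the powers are added, putting the same signal on two channels reads about 3 dB higher than on one, and the loop works unchanged for any number of channels.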
Special cases where that won't work include speakers positioned much further away from the listener, speakers with deliberately different sensitivity, or those with dramatically different frequency response. One couldn't expect a generalized algorithm to cope with any of these (unless some such arrangement becomes a standard of assumed placement, like 5.1 surround).

How does that compare to, for example, Dolby's standard (or is it theirs?) for calibrating movie theatres with pink noise (as mentioned in the Replay Gain proposal)? I think that was to play pink noise at -20 dBFS over one channel only and calibrate to achieve 83 dB SPL on a particular spec of sound meter. (It was, and the meter was a C-weighted, slow-averaging SPL meter.) So playing that signal over any single one of the 5 channels (with the low-frequency component of the noise played over the LFE subwoofer if necessary) should produce 83 dB SPL at the listening position once calibrated.

So, give or take any requirement to ignore low-frequency components not reproduced by the satellite speakers (which should be mastered to LFE anyhow, or moved to LFE automatically by the receiver), the algorithm above seems reasonable, and gain values can be calculated relative to a pink noise calibration signal played on a single channel.

A possibly computationally demanding but probably technically superior solution (which goes beyond the movie standard) might be to model or measure the in-ear response to each channel, essentially as a transfer function or impulse response, from each of the standard speaker positions in a typical home theatre setup. The measurement could use in-ear binaural recording microphones or a dummy head ("Kunstkopf"). This can also be simulated with the Dolby Headphone DSP, which applies the same kind of processing when downmixing 5.1 to binaural (i.e. headphone) output for a virtual surround effect.
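To make the pink-noise comparison concrete, here is a rough sketch of how a gain value might be derived relative to a single-channel reference. The names `summed_power_db` and `gain_vs_reference_db` are hypothetical, the per-channel powers are assumed to be already weighted and averaged, and -20 dBFS is taken to mean a mean-square power of 10^(-20/10) relative to full scale, which is one convention among several.

```python
import math

REFERENCE_DBFS = -20.0  # assumed level of the single-channel pink-noise signal

def summed_power_db(channel_powers):
    """Convert a list of linear per-channel powers to one dB figure."""
    total = sum(channel_powers)
    return 10.0 * math.log10(total) if total > 0.0 else float("-inf")

def gain_vs_reference_db(channel_powers, reference_dbfs=REFERENCE_DBFS):
    """Gain (dB) that would bring the summed power to the reference level.

    The reference is treated as pink noise on ONE channel, so it goes
    through the same summation with a single entry.
    """
    reference_power = 10.0 ** (reference_dbfs / 10.0)
    return summed_power_db([reference_power]) - summed_power_db(channel_powers)


if __name__ == "__main__":
    ref = 10.0 ** (REFERENCE_DBFS / 10.0)
    print(round(gain_vs_reference_db([ref]), 2))      # reference on one channel
    print(round(gain_vs_reference_db([ref] * 5), 2))  # same level on all 5 channels
```

With the same level on all five channels the suggested gain comes out at -10·log10(5), roughly -7 dB, which is just the power-sum assumption showing through.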
Recordings of an impulsive sound in the appropriate position (one website suggests popping a balloon) can be used to generate impulse responses directly, though calibrating their loudness might be problematic. The signals from all speakers, each passed through its respective impulse response to each ear and summed in amplitude, can then be fed to the ReplayGain algorithm and compared to the average power of the pink noise signal computed from each speaker to both ears.

Likewise, the same approach could be applied to a typical stereo setup, working out stereo-to-binaural transfer functions or impulse responses. Foobar2000 has one or two stereo-to-binaural simulator DSPs.

It's possible that DSPs such as downmix and upmix between stereo and 5.1 should be calibrated to preserve perceived loudness, as measured by the chosen method with typical source material (or, if it's deemed essential to lower gain to avoid clipping, they might feed a Replay Gain correction value to the playback chain).
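The binaural summation could look something like the sketch below: each speaker feed is convolved with its speaker-to-ear impulse response and the results are added in amplitude. The function names and the tiny impulse responses are invented purely for illustration; real responses would be measured or modelled and would be thousands of samples long, where an FFT-based convolution would be the sensible choice.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def ear_signal(channels, impulse_responses):
    """Amplitude sum, at one ear, of every channel passed through its
    speaker-to-ear impulse response (one response per channel)."""
    parts = [convolve(c, h) for c, h in zip(channels, impulse_responses)]
    total = [0.0] * max((len(p) for p in parts), default=0)
    for part in parts:
        for n, value in enumerate(part):
            total[n] += value
    return total

def mean_power(samples):
    """Mean-square power in linear units (0.0 for an empty list)."""
    return sum(v * v for v in samples) / len(samples) if samples else 0.0


if __name__ == "__main__":
    # Two speaker feeds and made-up impulse responses to the left ear
    feeds = [[1.0, 0.0], [0.0, 1.0]]
    to_left_ear = [[0.5], [0.5, 0.25]]
    left = ear_signal(feeds, to_left_ear)
    print(left, mean_power(left))
```

The same call with the right-ear responses gives the other ear, and the two ear powers can then go into whichever loudness calculation is used for the pink noise reference.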