Multichannel Replay Gain

Topic: Multichannel Replay Gain (Read 6753 times)



2010-01-16 17:40:37

I recently stumbled upon the issue of Replay Gain analysis of multichannel audio files. Someone asked on the J. River forum whether J. River Media Center is able to analyze multichannel files for Replay Gain.

I wrote the following reply (a quote from http://yabb.jriver.com/interact/index.php?...78198#msg378198 ):

__________________________________________________________

I don't think there's any standard for analyzing multichannel files. This is what the Replay Gain standard says about stereo files:

http://replaygain.hydrogenaudio.org/rms_energy.html

Quote

Stereo files

The only difficulty lies in what to do with stereo files. We could sum them to mono before calculating the RMS energy, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how we perceive them, so it's not a good solution.

The alternative is to calculate two RMS values (one for each channel) and then add them. Unfortunately a linear addition still doesn't give the same effect as our ears. To demonstrate this, consider a mono (single channel) audio track. We replay it over 1 loudspeaker, and remember how loud it sounds. If we now replay it over 2 loudspeakers, how large should the signal to each speaker be such that, overall, the sound is still as loud as before? You'd think the answer would be half as large (since we have two speakers - that's what a linear addition would suggest) but if you try it, you'll find that the answer is about 3/4.

We get the right answer if we add the means of the channel-signals before calculating the square root. In mixing pan-pot terms, we're using "equal power" rather than "equal voltage". If we also assume that any mono (single channel) signal will always be replayed over two loudspeakers, we can treat a mono signal as a pair of identical stereo signals. Hence a mono signal gives (a+a)/2 (i.e. a), while a stereo signal gives (a+b)/2, where a and b are the mean squared values for each channel. After this, we carry out the square root and conversion to dB.
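A minimal sketch of the quoted stereo rule, assuming floating-point samples in the range -1.0 to 1.0. The function names are hypothetical, and the sketch leaves out the equal-loudness filter and the block-by-block statistics of the full Replay Gain algorithm: it only shows the "average the mean squares, then take the root and convert to dB" step, and why summing to mono fails on out-of-phase content. For comparison, the equal-power pan law gives each of two speakers 1/√2 ≈ 0.707 of the mono amplitude, in the same region as the "about 3/4" quoted above.

```python
import math

def mean_square(samples):
    """Mean of the squared sample values (linear power, not dB)."""
    return sum(s * s for s in samples) / len(samples)

def rms_level_db(channels):
    """Average the per-channel mean squares, then convert to dB.

    A mono file is treated as two identical channels, giving (a+a)/2 = a;
    a stereo file gives (a+b)/2. Since 10*log10(ms) == 20*log10(sqrt(ms)),
    the square root is folded into the dB conversion.
    """
    if len(channels) == 1:
        channels = [channels[0], channels[0]]
    avg = sum(mean_square(ch) for ch in channels) / len(channels)
    return 10 * math.log10(avg)

# Equal-power pan law: each of two speakers gets 1/sqrt(2) of the mono amplitude.
EQUAL_POWER_GAIN = 1 / math.sqrt(2)  # about 0.707

left = [1.0, -1.0, 1.0, -1.0]
right = [-1.0, 1.0, -1.0, 1.0]          # fully out of phase with left
mono_sum = [l + r for l, r in zip(left, right)]
print(rms_level_db([left, right]))      # 0.0: both channels are counted
print(mono_sum)                         # [0.0, 0.0, 0.0, 0.0]: the mono sum cancels out
```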

My suggestion would be:

Analyze only the three front channels (L, R, C) and calculate:
(a+b+c)/3, where a, b and c are the mean squared values for each channel. Perhaps if the center channel is silent or significantly quieter than the left and right channels it should be taken out of the equation. "Significantly" could be e.g. a difference of 12 dB or more.

If the file is "quadraphonic" and contains four main channels (4.0 or 4.1), analyze only the two front channels. I don't know, though, whether the file headers can actually indicate the correct channel mapping in this case.

I am suggesting leaving the other channels out because the signal that comes from the side and back channels is normally very uneven and does not usually form the main sound field.

A small minority of audio mixes place the listener more or less in the center, but only a very odd mix would have the main audio source behind the listener. For instance, an exactly "centered" 5.1 mix would subjectively be a bit louder than the Replay Gain value indicates, because to place the source in the center the two surround channels need to be slightly louder than the three front channels, but that would probably not be a practical problem.
__________________________________________________________
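The front-channel suggestion in the reply above could be sketched roughly as follows. This is not an existing implementation: the channel labels, the function names and the choice to compare C against the average of L and R (the reply only says "quieter than the left and right channels") are assumptions. Because only L, R and C are looked up, 5.1 and 7.1 files are handled identically, and a file with no centre channel (4.0/4.1) falls back to L and R.

```python
import math

def mean_square(samples):
    return sum(s * s for s in samples) / len(samples)

def front_channel_power(channels, center_drop_db=12.0):
    """Proposed measure: average the mean squares of the front channels only.

    `channels` maps hypothetical labels ("L", "R", "C", "LFE", "SL", "SR", ...)
    to sample lists. Everything except L, R and C is ignored.
    """
    a = mean_square(channels["L"])
    b = mean_square(channels["R"])
    if "C" not in channels:           # e.g. a quadraphonic 4.0/4.1 file
        return (a + b) / 2
    c = mean_square(channels["C"])
    lr = (a + b) / 2                  # assumption: compare C against the average L/R level
    if c == 0.0 or (lr > 0.0 and 10 * math.log10(lr / c) >= center_drop_db):
        return lr                     # centre silent or at least 12 dB quieter: leave it out
    return (a + b + c) / 3

def power_to_db(power):
    return 10 * math.log10(power)
```

With L = R at full scale and C 6 dB lower, C stays in and the result is (1 + 1 + 0.25)/3 = 0.75; with C 20 dB lower it is dropped and the result is just the L/R average.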

After posting my reply I did some experimenting with foobar2000 and noticed that it can analyze 5.1 FLAC files and apply playback correction to them. I searched the HA forums but couldn't find any related information. How does its analyzer handle the channels?

A straightforward and perhaps not the best possible approach would be to just add the additional channels to the equation: (a+b+c+d+e+f)/6

Apparently the above is not what foobar does because, for instance:

when the following three points are true
1) all channels have identical content (the same mono file copied to all six channels)
2) a = Left, b = Right, c = Center, d = Low Frequency, e = Surround Left, f = Surround Right
3) "1" indicates full volume level and "0" a silent channel (volume reduced to zero before saving the multichannel mix)

a 6-channel file "a1,b1,c1,d0,e0,f0" and a 6-channel file "a0,b0,c0,d1,e1,f1" produce different Replay Gain values.

When I analyzed a pair of such test files foobar found "a1,b1,c1,d0,e0,f0" to be 1.76 dB louder than "a0,b0,c0,d1,e1,f1".
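A quick arithmetic check of the two test files, treating each active channel's mean square as 1.0 and each silent one as 0.0. The naive six-channel average predicts no difference at all. The measured 1.76 dB happens to equal 10·log10(3/2), which is what one would get if the LFE channel were simply left out of the sum. That is only one hypothesis consistent with the number, not a confirmed description of foobar2000's analyzer; the helper names and the `skip` parameter are hypothetical.

```python
import math

CHANNELS = ["L", "R", "C", "LFE", "SL", "SR"]  # a..f in the test description

def naive_six_channel_db(levels):
    """(a+b+c+d+e+f)/6 in dB, where each entry is one channel's mean square."""
    return 10 * math.log10(sum(levels.values()) / len(CHANNELS))

def power_sum_db(levels, skip=()):
    """Sum of channel mean squares in dB, leaving out the labels in `skip`."""
    return 10 * math.log10(sum(v for k, v in levels.items() if k not in skip))

# Identical content on every active channel: mean square 1.0 ("1") or 0.0 ("0").
front_only = {"L": 1.0, "R": 1.0, "C": 1.0, "LFE": 0.0, "SL": 0.0, "SR": 0.0}
rear_and_lfe = {"L": 0.0, "R": 0.0, "C": 0.0, "LFE": 1.0, "SL": 1.0, "SR": 1.0}

naive_diff = naive_six_channel_db(front_only) - naive_six_channel_db(rear_and_lfe)
lfe_skipped_diff = (power_sum_db(front_only, skip=("LFE",))
                    - power_sum_db(rear_and_lfe, skip=("LFE",)))
print(f"naive: {naive_diff:.2f} dB, LFE ignored: {lfe_skipped_diff:.2f} dB")
# naive: 0.00 dB, LFE ignored: 1.76 dB
```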

I think now would be a good time to define multichannel analysis and add it to the Replay Gain standard. I'd say that the dominant multichannel layouts at the file level are 5.1 and, to a lesser extent, 7.1, and that only these two should be included in a multichannel addition. Actual speaker setups can vary greatly, but that should be handled by a channel mixer after the Replay Gain correction has been applied.


Reply #1 – 2010-01-16 20:31:54

BS.1770 (Annex 1) defines a way to combine multichannel audio for loudness estimation: the L/R/C channels are added with a weight of +0 dB and the surround channels with +1.5 dB. That might be a better place to start from.
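A sketch of that channel summation: loudness = -0.691 + 10·log10(Σ G_i·z_i), with G = 1.0 for L, R and C, G = 1.41 (about +1.5 dB) for the two surrounds, and the LFE channel excluded. In BS.1770 each z_i is the mean square after the K-weighting pre-filter, which is not implemented here, so the inputs below are simply made-up mean squares; the labels "Ls"/"Rs" and the function name are assumptions. Because the test files from the opening post carry identical content on every channel, K-weighting would scale each z_i equally, so the weights alone set the difference between them.

```python
import math

# Channel weights G_i from BS.1770 Annex 1 (LFE is not part of the sum).
BS1770_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def bs1770_loudness(mean_squares):
    """-0.691 + 10*log10(sum(G_i * z_i)).

    `mean_squares` maps channel labels to z_i. In the real measurement z_i is
    taken after K-weighting; that filter is NOT applied here.
    """
    total = sum(BS1770_WEIGHTS[ch] * z for ch, z in mean_squares.items()
                if ch in BS1770_WEIGHTS)  # labels such as "LFE" are skipped
    return -0.691 + 10 * math.log10(total)

front_only = {"L": 1.0, "R": 1.0, "C": 1.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
rear_and_lfe = {"L": 0.0, "R": 0.0, "C": 0.0, "LFE": 1.0, "Ls": 1.0, "Rs": 1.0}
print(round(bs1770_loudness(front_only) - bs1770_loudness(rear_and_lfe), 2))  # 0.27
```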


Reply #2 – 2010-01-16 21:40:04

Unfortunately, ITU restricts access to its documents. A Google search turned up a PDF of the ITU-R BS.1770-1 paper. It looks authentic.

I browsed quickly through the document, but it isn't clear to me why the two surround channels should be weighted more heavily than the three main channels.

As I already said, if the apparent direction of the audio source in a 5.1 mix is "centered" then the two surround channels already must have a higher volume level per channel than the three front channels.

For practical purposes I am suggesting taking the surrounds out of the equation because in my experience usually the output of the front channels defines the subjectively experienced volume level and even if the mix is more or less centered the result would not be too far off for replay gain purposes. Only a mix in which the surround channels have a dominant role (i.e. the source is behind the listener) would be really incorrectly measured. When the surround channels (and LFE) are not included it doesn't matter if the file is 5.1, 7.1 or contains even more surround channels.

In case this wasn't clear: I would like to have a standard that developers can follow if they want to create a Replay Gain analyzer that can also handle 5.1 and 7.1 files.

I am trying to avoid a possible future situation in which we have various different implementations that save RG tags to files.

If the system that foobar uses appears to be good, perhaps it should be documented and added to the standard.

Obviously the practical reason to add multichannel analysis would be the ability to use the Replay Gain playback correction with mixed playlists of stereo and multichannel files.

EDIT

I am thinking only about audio files, but perhaps someone would like to add a multichannel Replay Gain analyzer and playback correction system to a video player. It could be useful with music videos or other short video files that are played from a playlist.


Reply #3 – 2010-01-19 01:50:23

I was a bit mindful of this thread when posting on another one.

Although I don't have anything conclusive to say about how it ought to be done, or any experience of using 5.1 audio, my comments give a different perspective.

In particular, I understand that Replay Gain algorithms typically deal with stereo or mono files.

To my mind, the Centre and Low-Frequency Effects (LFE) channels could be added in either the voltage or the power domain with practically the same effect (save for clipping, which Replay Gain functions should be immune to, being floating point), because they occupy separate parts of the spectrum.
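A small numerical check of that claim, with two sine tones standing in for Centre and LFE content (the frequencies, levels and sample rate are arbitrary assumptions). Because the tones share no frequency and both complete whole cycles, the cross term averages to zero, so adding the samples first ("voltage domain") gives the same mean square as adding the powers. For contrast, two copies of the same tone produce twice the power sum.

```python
import math

RATE = 48000
N = RATE  # one second: both tones complete a whole number of cycles

def tone(freq, amp):
    return [amp * math.sin(2 * math.pi * freq * n / RATE) for n in range(N)]

def mean_square(x):
    return sum(s * s for s in x) / len(x)

centre = tone(1000, 0.5)  # stand-in for Centre content
lfe = tone(50, 0.5)       # stand-in for LFE content, far away in frequency

# Voltage domain: add the samples first, then square.
voltage_sum = mean_square([c + l for c, l in zip(centre, lfe)])
# Power domain: square each channel, then add.
power_sum = mean_square(centre) + mean_square(lfe)
# Same band (identical content): the cross term no longer vanishes.
correlated = mean_square([2 * c for c in centre])

print(f"{voltage_sum:.4f} {power_sum:.4f} {correlated:.4f}")  # 0.2500 0.2500 0.5000
```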

Now, I'd assume a simple Replay Gain method would:

1. take the time-windowed power for each channel (in linear units, not dB), possibly a power spectrum over the averaging time
2. calculate the sum of powers in all channels (or power spectral densities if using a power spectrum)
3. apply the inverse loudness contour weighting to determine perceived loudness power (again in linear units; possibly do this before step 2)
4. convert to dB scale

The algorithm for loudness over a time chunk could even be coded to start with zero power, then iterate through all channels adding the power (or p.s.d.) to the stored value for that time chunk.
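The steps above, with the "start at zero and add each channel" loop, might be sketched like this. The loudness-contour weighting is left as a hypothetical `weight` hook applied to each channel before summing (the "possibly do this before step 2" option); with no hook the weighting is flat, so this is not a complete Replay Gain analyzer. The function names are assumptions.

```python
import math

def chunk_loudness_db(channels, start, length, weight=None):
    """Loudness estimate for one time chunk of a file with any number of channels.

    channels: list of per-channel sample lists.
    weight:   optional stand-in for the inverse loudness-contour weighting
              (hypothetical hook, applied per channel before summing). None = flat.
    """
    total_power = 0.0  # start with zero power for this chunk
    for ch in channels:
        chunk = ch[start:start + length]
        if weight is not None:
            chunk = weight(chunk)
        total_power += sum(s * s for s in chunk) / len(chunk)  # add this channel's power
    if total_power == 0.0:
        return float("-inf")  # digital silence
    return 10 * math.log10(total_power)  # convert to dB

def loudness_per_chunk(channels, length):
    """Apply chunk_loudness_db to consecutive, non-overlapping chunks."""
    n = min(len(ch) for ch in channels)
    return [chunk_loudness_db(channels, s, length) for s in range(0, n - length + 1, length)]
```

Since the loop only ever adds, the same code serves mono, stereo, 5.1 or 7.1 input.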

I haven't included peak detection, which can work per-channel. If Centre + LFE clips, that represents a special case of clipping that occurs only when the Centre channel is a full-range driver taking the sum of C+LFE. If we added LFE/√3 to each of C, L and R it's a different configuration again.
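The three peak cases mentioned above could be sketched as follows; the channel labels and function names are hypothetical, and the sample values in the usage example are made up. The point is that the summed C+LFE feed can exceed full scale even when no individual channel does, while spreading LFE/√3 over L, C and R gives different peaks again.

```python
import math

def channel_peaks(channels):
    """Per-channel sample peaks; `channels` maps labels to sample lists."""
    return {name: max(abs(s) for s in samples) for name, samples in channels.items()}

def centre_plus_lfe_peak(channels):
    """Peak of C + LFE: only relevant if a full-range Centre speaker is fed the sum."""
    return max(abs(c + l) for c, l in zip(channels["C"], channels["LFE"]))

def lfe_spread_peaks(channels):
    """Peaks of L, C and R if LFE/sqrt(3) is added to each of them."""
    k = 1 / math.sqrt(3)
    return {name: max(abs(s + k * l) for s, l in zip(channels[name], channels["LFE"]))
            for name in ("L", "C", "R")}

ch = {"L": [0.2, 0.0], "R": [0.0, 0.3], "C": [0.6, -0.2], "LFE": [0.5, 0.1]}
print(centre_plus_lfe_peak(ch) > 1.0)  # True: the C+LFE feed clips, no single channel does
```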

Anyhow, that system is relatively simple, in that it's extensible to any number of channels, providing it makes sense to sum the powers or power spectral densities. Special cases where that won't work include speakers positioned much further away from the listener, speakers with deliberately different sensitivity or those with dramatically different frequency response, none of which one could expect a generalized algorithm to cope with (unless some such becomes a standard of assumed placement like 5.1 surround).

How does that compare to, for example, Dolby's standard (or is it theirs?) for calibrating movie theatres with pink noise (as mentioned in the Replay Gain proposal)? I think that was to play pink noise at -20 dBFS over one channel only and calibrate to achieve 83 dB SPL on a particular type of sound level meter. (It was, and the meter was a C-weighted, slow-averaging SPL meter.) So playing that signal over any single one of the 5 channels (with the low-frequency component of such noise played over the LFE subwoofer if necessary) should produce 83 dB SPL at the listening position when calibrated.

So, give or take any requirement to ignore low-frequency components not reproduced by the satellite speakers (which should be mastered to LFE anyhow or moved to LFE by the receiver automatically), the algorithm above seems reasonable, and gain values can be calculated relative to a pink noise calibration signal played on a single channel.
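A rough sketch of that gain calculation, under loud simplifying assumptions: "-20 dBFS" is taken as RMS relative to a full-scale amplitude of 1.0, and both the pink noise spectrum and the loudness weighting are ignored, so only the effect of summing channel powers shows up. The constant and function names are hypothetical.

```python
import math

REFERENCE_DBFS = -20.0  # one-channel calibration level from the cinema procedure
REFERENCE_POWER = 10 ** (REFERENCE_DBFS / 10)  # mean square of that one channel, about 0.01

def summed_power(channels):
    """Sum of per-channel mean squares (the simple method described above)."""
    return sum(sum(s * s for s in ch) / len(ch) for ch in channels)

def gain_to_reference_db(channels):
    """Gain (dB) that would bring the summed power to the one-channel reference."""
    return 10 * math.log10(REFERENCE_POWER / summed_power(channels))
```

Under these assumptions a -20 dBFS RMS signal on one channel of a 5.0 file needs 0 dB of gain, and the same signal on two channels needs about -3 dB.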

A possibly computationally demanding, but probably technically superior solution (which goes beyond the movie standard) might be to model or measure (using in-ear binaural recording microphones or a dummy-head "Kunstkopf") the in-ear response to each channel (essentially as a transfer function or impulse response) from each of the standard speaker positions used in home theatre in a typical setup. This can also be simulated with Dolby Headphone DSP which does this same effect for 5.1 to binaural (i.e. headphone) downmixing for a virtual surround effect. Recordings of an impulsive sound (balloon popping is suggested by one website) in the appropriate position can be used to generate impulse responses directly (though calibrated loudness might be problematic). The summation of amplitudes from all speakers via their respective impulse response to each ear can be fed to the ReplayGain algorithm (and compared to the average power of the pink noise signal computed from each speaker to both ears).

Likewise, the same approach could be applied to a typical stereo setup, working out stereo-to-binaural transfer functions or impulse responses. Foobar2000 has one or two stereo-to-binaural simulator DSPs.
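A minimal pure-Python sketch of the summation step in the impulse-response approach (for 5.1 or stereo alike), assuming each speaker's path to each ear is already available as an impulse response; the dict layout and function names are hypothetical, and real responses would come from measurement, a dummy head or a model. The two ear signals are then measured like a stereo pair with the mean-of-mean-squares rule. A real analyzer would still need the equal-loudness weighting and would run the pink noise reference through the same paths.

```python
import math

def convolve(x, h):
    """Direct-form linear convolution (full output length)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y

def ear_signals(channels, irs):
    """Sum every speaker's signal through its impulse response to each ear.

    channels: {speaker: samples}
    irs:      {speaker: (ir_to_left_ear, ir_to_right_ear)}
    """
    left, right = [], []
    for name, x in channels.items():
        for ear, h in zip((left, right), irs[name]):
            y = convolve(x, h)
            if len(ear) < len(y):
                ear.extend([0.0] * (len(y) - len(ear)))  # grow the ear signal in place
            for k, v in enumerate(y):
                ear[k] += v
    return left, right

def ear_level_db(left, right):
    """Treat the two ear signals like a stereo pair: average their mean squares."""
    ms = [sum(s * s for s in sig) / len(sig) for sig in (left, right)]
    return 10 * math.log10(sum(ms) / 2)
```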

It's possible that DSPs like downmix and upmix from stereo to 5.1 and vice versa should be calibrated to preserve perceived loudness measured by the chosen method with typical source material (or if it's deemed essential to lower gain to avoid clipping, they might feed a Replay Gain correction value to the playback chain).

