Anyhow, great work in progress
I am excited to see further development and how it could work with RG.
What R128GAIN does is the following (in principle):
- Create an empty gating block capable of holding samples up to 400 ms using a ring buffer.
- For each input sample:
  - If the gating block is full, remove the first sample from it.
  - Add the current sample to the end of the gating block.
  - If the gating block is full:
    - Pick the sample cached in the middle of the gating block.
    - Depending on the (un-gated) loudness measure of the gating block, decide whether to add the picked sample to the overall statistics.
That's my understanding of Tech doc 3341, Annex 1, at least in principle.
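To make the steps above concrete, here is a minimal Python sketch of that sliding gating block. It is not R128GAIN's actual code: the function names are made up, the input is assumed to be a single channel that has already been K-weighted, and the gate is assumed to be the absolute -70 LUFS threshold.

```python
import math

def sliding_gate(samples, rate, gate_lufs=-70.0, block_seconds=0.4):
    """Return (mean_square, count) over the middle samples that pass the gate.

    Assumptions (not from the original post): mono input, already K-weighted,
    and an absolute gate at gate_lufs.
    """
    size = int(round(block_seconds * rate))           # samples per gating block
    gate_msq = 10.0 ** ((gate_lufs + 0.691) / 10.0)   # gate as a mean square
    ring = [0.0] * size                               # ring buffer of squared samples
    pos = 0                                           # next write position
    filled = 0                                        # number of valid entries
    block_sum = 0.0                                   # running sum of squares in the block
    total = 0.0                                       # overall statistics: sum of squares
    count = 0                                         # overall statistics: sample count
    for x in samples:
        sq = x * x
        if filled == size:
            block_sum -= ring[pos]    # block is full: drop the oldest sample
        else:
            filled += 1
        ring[pos] = sq                # add the current sample at the end
        block_sum += sq
        pos = (pos + 1) % size        # once full, pos points at the oldest sample
        if filled == size and block_sum / size > gate_msq:
            # Pick the sample in the middle of the block and count it.
            total += ring[(pos + size // 2) % size]
            count += 1
    return (total / count if count else 0.0), count

def to_lufs(mean_square):
    """BS.1770 style loudness of a mean square value."""
    return -0.691 + 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")
```

Note that with this scheme every sample is counted at most once, but only samples that are at some point in the middle of a full block can be counted at all.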
I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.
The "ungated" total loudness of the resulting set of blocks is your result.
It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%.
Doesn't that imply counting samples more than once?
What do you mean by "loudness of a set of blocks"?
PS: My comments are not meant to obscure the fact that you have done a great job so far!
But in the minimum 50% overlap standard laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly, it means that buffering the per-block loudness would "only" require 18K samples per hour (versus 172 million for near 100% overlap). If the loudness samples are stored in 64 bits, that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc.).
It looks to me like there is a good reason to stay near the 50% minimum overlap.
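For anyone who wants to check the numbers, here is a small Python sketch of the arithmetic, assuming 48 kHz audio and a 400 ms block. The helper names are made up. One caveat: a single 64-bit value per block comes out at about 141 KiB per hour; the 700 KiB figure matches five 64-bit values per block (for example one per channel), which is only my guess at how it was derived.

```python
def blocks_per_hour(rate, step_samples):
    """Number of gating blocks started per hour for a given hop size in samples."""
    return rate * 3600 // step_samples

def kib(values, bytes_per_value=8):
    """Storage in KiB for `values` numbers of `bytes_per_value` bytes each."""
    return values * bytes_per_value / 1024

RATE = 48000                       # assumed sample rate
BLOCK = RATE * 400 // 1000         # 400 ms block = 19200 samples

half_overlap = blocks_per_hour(RATE, BLOCK // 2)   # 50% overlap: one block every 200 ms
per_sample = blocks_per_hour(RATE, 1)              # near 100% overlap: one block per sample

print(half_overlap, per_sample)                    # 18000 172800000
print(kib(half_overlap), kib(half_overlap * 5))    # 140.625 703.125
```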
Many thanks to C.R.Helmrich and you for the great comments!
$ r128gain ../sounds/ebu-loudness-test-setv01/
args
../sounds/ebu-loudness-test-setv01
  analyzing ...
    1kHz Sine -20 LUFS-16bit.wav (1/16): -20.0 LUFS, -3.0 LU (peak: 0.100734: -10.0 dBFS)
    1kHz Sine -26 LUFS-16bit.wav (2/16): -26.0 LUFS, 3.0 LU (peak: 0.050508: -13.0 dBFS)
    1kHz Sine -40 LUFS-16bit.wav (3/16): -40.0 LUFS, 17.0 LU (peak: 0.010260: -19.9 dBFS)
    seq-3341-1-16bit.wav (4/16): -23.0 LUFS, -0.0 LU (peak: 0.071316: -11.5 dBFS)
    seq-3341-2-16bit.wav (5/16): -33.0 LUFS, 10.0 LU (peak: 0.023049: -16.4 dBFS)
    seq-3341-3-16bit.wav (6/16): -23.0 LUFS, -0.0 LU (peak: 0.071468: -11.5 dBFS)
    seq-3341-4-16bit.wav (7/16): -23.0 LUFS, 0.0 LU (peak: 0.070850: -11.5 dBFS)
    seq-3341-5-16bit.wav (8/16): -22.9 LUFS, -0.1 LU (peak: 0.100845: -10.0 dBFS)
    seq-3341-6-5channels-16bit.wav (9/16): -23.0 LUFS, 0.0 LU (peak: 0.063133: -12.0 dBFS)
    seq-3341-6-6channels-WAVEEX-16bit.wav (10/16): -23.7 LUFS, 0.7 LU (peak: 0.063133: -12.0 dBFS)
    seq-3341-7_seq-3342-5-24bit.wav (11/16): -23.0 LUFS, -0.0 LU (peak: 0.358341: -4.5 dBFS)
    seq-3341-8_seq-3342-6-24bit.wav (12/16): -23.0 LUFS, 0.0 LU (peak: 0.718299: -1.4 dBFS)
    seq-3342-1-16bit.wav (13/16): -22.6 LUFS, -0.4 LU (peak: 0.100089: -10.0 dBFS)
    seq-3342-2-16bit.wav (14/16): -16.8 LUFS, -6.2 LU (peak: 0.177974: -7.5 dBFS)
    seq-3342-3-16bit.wav (15/16): -20.0 LUFS, -3.0 LU (peak: 0.100089: -10.0 dBFS)
    seq-3342-4-16bit.wav (16/16): -20.0 LUFS, -3.0 LU (peak: 0.100075: -10.0 dBFS)
    ALBUM: -21.9 LUFS, -1.1 LU (peak: 0.718299: -1.4 dBFS)
r128gain <input>? [-o <directory> [flac]]
Actually, I think to avoid calculating the logarithm and division by T every 200 ms you can simply store the block energies in your list, because the comparison[blockquote]block loudness > -70 LUFS[/blockquote]is, assuming your block energy = left energy + right energy + center energy + 1.41* ..., equivalent to[blockquote]block energy > 0.4 * sample rate * 10^((-70+0.691)/10),[/blockquote]with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by[blockquote]block energy > 0.1584893 * average 70-gated energy[/blockquote]
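Here is a rough Python sketch of the energy-domain gating described above, in case it helps. The function names are made up, and it assumes each block energy is the channel-weighted sum of squared (K-weighted) samples over one 400 ms block. The final conversion of the gated average back to LUFS is my addition.

```python
import math

BLOCK_SECONDS = 0.4          # gating block length
RELATIVE_GATE = 0.1584893    # factor from the post, equal to 10^(-8/10)

def absolute_gate_energy(rate, gate_lufs=-70.0):
    """Constant block-energy threshold equivalent to 'block loudness > gate_lufs'."""
    return BLOCK_SECONDS * rate * 10.0 ** ((gate_lufs + 0.691) / 10.0)

def gated_loudness(block_energies, rate):
    """Two-stage gating on stored block energies, no logarithm per block."""
    threshold = absolute_gate_energy(rate)
    # Stage 1: absolute gate, a plain comparison against a constant.
    gated70 = [e for e in block_energies if e > threshold]
    if not gated70:
        return float("-inf")
    # Stage 2: relative gate against the average 70-gated energy.
    average = sum(gated70) / len(gated70)
    kept = [e for e in gated70 if e > RELATIVE_GATE * average]
    # The largest energy is always >= average, so `kept` is never empty here.
    mean_energy = sum(kept) / len(kept)
    samples_per_block = BLOCK_SECONDS * rate
    # Only one logarithm and one division by the block length, at the very end.
    return -0.691 + 10.0 * math.log10(mean_energy / samples_per_block)
```

The two constants quoted in the post drop out of `absolute_gate_energy(48000)` and `absolute_gate_energy(44100)`.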
The BS.1770 loudness measure is defined as[blockquote]-0.691 + 10*lg(wmsq),[/blockquote]where[blockquote]wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
i running over all channels,
G_i the weighting coefficient for the i-th channel,
j running from 0 to n-1 over all sampling intervals,
x_i_j the i-th channel's voltage at the j-th sample[/blockquote]is the (per channel) weighted mean square of the interval under consideration.
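As a sanity check, the definition above can be written down directly in a few lines of Python. This is only a sketch: the function name is made up, and the samples are assumed to have been run through the K-weighting filter already.

```python
import math

def bs1770_loudness(channels, weights):
    """-0.691 + 10*lg(wmsq) for one interval.

    channels: list of per-channel sample lists, all of the same length n
              (assumed to be K-weighted already)
    weights:  the coefficients G_i, one per channel
              (BS.1770 uses 1.0 for left, right and center, 1.41 for surrounds)
    """
    n = len(channels[0])
    # wmsq = sum over channels i and samples j of G_i * x_i_j * x_i_j / n
    wmsq = sum(g * sum(x * x for x in ch) for g, ch in zip(weights, channels)) / n
    return -0.691 + 10.0 * math.log10(wmsq) if wmsq > 0 else float("-inf")
```

For example, a single channel holding a constant 0.1 has a mean square of 0.01, so the result is about -20.691.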
Just out of curiosity, where does that 0.4 come from?
Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins.
C:\development\replaygain>r128gain.exe ref_pink.wav
args
  analyzing ...
    ref_pink.wav (1/1): -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
    ALBUM: -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
I had been hoping that the written tags were being converted into REPLAYGAIN compatible units (although I wondered). How are the FLACs being tested; with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?
New tags seem very unfortunate (given hardware device support, etc.). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc. ... sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.
Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).