R128GAIN: An EBU R128 compliant loudness scanner

Reply #50
Anyhow, great work in progress 

Thanks 

I am excited to see further development and how it could work with RG.

R128GAIN writes the RG tags (currently only for FLAC). I've re-tagged all my FLACs and listen to them (shuffled) all the time with RG enabled. That's my most important test case.

Reply #51
What R128GAIN does is the following (in principle):
  • Create an empty gating block capable of holding samples up to 400ms using a ring buffer.
  • For each input sample:
    • If the gating block is full remove the first sample from it.
    • Add the current sample to the end of the gating block.
    • If the gating block is full:
      • Pick the sample cached in the middle of the gating block.
      • Depending on the (un-gated) loudness measure of the gating block, decide whether to add the picked sample to the overall statistics.
That's my understanding of Tech doc 3341, Annex 1, at least in principle.


I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.

  • A measurement interval is first divided into blocks of 400ms length (with at least 50% overlap). Basically doable with a ring buffer.
  • From this set of blocks, eliminate all blocks with loudness below -70 LUFS. (I don't understand why you speak of adding samples (and not blocks) to the "overall statistic").
  • Calculate the total loudness for the latter, decimated set of blocks.
  • This loudness minus 8 LU is your final threshold.
  • Decimate the already decimated set of blocks again: Remove all blocks below the final threshold (which can only be known after a complete pass).
  • The "ungated" total loudness of the resulting set of blocks is your result.
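The six steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the per-block loudness values (in LUFS) are taken as already computed, all blocks have equal length, and the function names are made up for this example rather than taken from R128GAIN.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # absolute threshold from step 2
RELATIVE_GATE_LU = -8.0     # offset from step 4


def lufs_to_power(lufs):
    """Invert L = -0.691 + 10*log10(p) to recover the mean square p."""
    return 10.0 ** ((lufs + 0.691) / 10.0)


def power_to_lufs(power):
    return -0.691 + 10.0 * math.log10(power)


def mean_loudness(block_lufs):
    """Loudness of a set of equal-length blocks, averaged in the power domain."""
    powers = [lufs_to_power(l) for l in block_lufs]
    return power_to_lufs(sum(powers) / len(powers))


def gated_loudness(block_lufs):
    # Step 2: drop every block below the absolute gate.
    stage1 = [l for l in block_lufs if l > ABSOLUTE_GATE_LUFS]
    if not stage1:
        return float("-inf")  # nothing but silence
    # Steps 3 and 4: loudness of the survivors minus 8 LU is the final threshold.
    threshold = mean_loudness(stage1) + RELATIVE_GATE_LU
    # Step 5: second decimation. Step 6: loudness of what is left.
    stage2 = [l for l in stage1 if l > threshold]
    return mean_loudness(stage2)
```

The second list can never be empty, because the loudest surviving block is always at least as loud as the average and therefore above the threshold.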

I think it would be simpler to work with block indices (a list, array, or bitmap) than a ring buffer.* Read the input stream two times block by block and skip the calculated indices. Usually, with the buffering left to the OS, you should be reading the second pass from memory automatically.

PS The wording in Annex 1 could be better. Especially the "gated loudness" LKG in (6) and (8) should use different symbols. But mathematically it is not ambiguous. It took me over an hour to crunch the whole thing, though.

* Of course, you can still use something as a ring buffer for I/O. But I would put a block-wise abstraction layer on top of it to make the overall design simpler.

Reply #52
I agree with C.R.H. that this interpretation doesn't seem to follow Tech 3341, Annex 1.

Probably you are both right; I have to think about it.

What I have in mind (probably not correct) is the following:
  • BS.1770 defines a loudness measure to which all samples contribute the same way.
  • R128 adds a gate: some samples have to be removed from the pure BS.1770 measure.
  • In R128GAIN the gate is local to each sample, i.e. for each sample it consists of the samples from -200ms to +200ms relative to the sample under consideration. Depending on the pure (un-gated) BS.1770 loudness measure of this -200ms to +200ms block, the sample under consideration is added to or removed from the (gated) BS.1770 loudness measure of the track/album.
The block relative to each sample is what I call the running gate because it's easy to update its sum of squares at each step.
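To illustrate why the sum of squares is cheap to update per sample, here is a minimal sketch of such a running window. It is not the R128GAIN code: the class name is invented, the signal is assumed to be mono, and channel weighting and the BS.1770 pre-filter are left out.

```python
from collections import deque


class RunningGate:
    """Sliding window that keeps its sum of squares up to date per sample."""

    def __init__(self, window_len):
        self.window = deque()
        self.window_len = window_len  # e.g. 0.4 * sample rate
        self.sum_sq = 0.0

    def push(self, sample):
        # Once the window is full, the oldest sample leaves before the new one enters.
        if len(self.window) == self.window_len:
            oldest = self.window.popleft()
            self.sum_sq -= oldest * oldest
        self.window.append(sample)
        self.sum_sq += sample * sample

    def full(self):
        return len(self.window) == self.window_len

    def mean_square(self):
        return self.sum_sq / len(self.window)
```

Each step costs one subtraction and one addition regardless of the window length. With floating-point input the repeated subtraction can accumulate rounding error over long runs, so a real implementation might recompute the sum from time to time.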

  • The "ungated" total loudness of the resulting set of blocks is your result.

What do you mean by "loudness of a set of blocks"? Doesn't it imply counting samples more than once?

It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%. If this is true, then it would be fully compliant because they require 50% at a minimum.

Reply #53
It seems to me that what I've implemented is the limit of what you get if you let the overlap go to 100%.


No. Because, for true overlap, the answer to this

Doesn't it imply counting samples more than once?


is "Yes!". Within the overlapping area the same sample can be part of both, zero or more eliminated blocks and zero or more non-eliminated blocks. All non-eliminated blocks are part of the final calculation.

What do you mean by "loudness of a set of blocks"?


Conceptually: Concatenate all non-eliminated blocks and calculate the "ungated" loudness for the whole interval. In practice you basically average the pre-calculated loudness values of all non-eliminated blocks, see step (8).
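A tiny numeric check of that equivalence, assuming "average" is read as averaging the per-block mean squares (not the dB figures) and that all blocks have the same length. The sample values are arbitrary and chosen only for illustration.

```python
def mean_square(samples):
    return sum(s * s for s in samples) / len(samples)


# Two equal-length blocks that survived the gating (made-up values).
blocks = [[0.5, -0.5, 0.25, 0.0], [1.0, 0.0, -1.0, 0.5]]

# Conceptual route: concatenate the blocks and measure the whole interval.
concatenated = [s for block in blocks for s in block]
whole_interval = mean_square(concatenated)

# Practical route: average the pre-calculated per-block mean squares.
averaged = sum(mean_square(b) for b in blocks) / len(blocks)

print(whole_interval, averaged)
```

Both routes give the same mean square, so the logarithm only has to be taken once at the end.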

PS My comments are not supposed to obscure the fact that you have done a great job so far!

Reply #54
PS My comments are not supposed to obscure the fact that you have done a great job so far!

Many thanks to C.R.Helmrich and you for the great comments! Meanwhile I've taken another look at the papers and I think the point is clear now.

Probably the next version will offer the two-pass approach (and may keep the current one-pass approach as a very good approximation).

Reply #55
Please pardon the noob here; hopefully I'm keeping up with the discussion even though most of this is far outside my normal domain. I'm sure you'll all set me straight if I'm on the wrong track!

Is a two-pass approach over the input really required? While I'm sure it's a reasonable approach, from a library perspective a single-pass interface seems convenient (like the common ReplayGainAnalysis C code). In googlebot's steps 1 through 6, the loudness per block is calculated implicitly during step #2, and if I'm understanding correctly only that per-block loudness is needed for all of the remaining steps.

Now in a "maximum overlap" approach as suggested by pbelkner, each input sample results in a block, so the block count per second is of course very high (equal to the sample rate). In this case buffering the per-block loudness in a single-pass approach sounds ridiculous compared to a two-pass algorithm.

But in the minimum 50% overlap standard laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly it means that buffering the per-block loudness would "only" require 18K samples per hour (versus 172 million for near 100% overlap). If the loudness samples are stored in 64 bits that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc).
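As a quick sanity check of these figures, here is the arithmetic in Python. The 48 kHz sample rate and the number of values stored per block are assumptions made for illustration; the post does not state them.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_VALUE = 8  # one 64-bit value (assumption: no per-value overhead)


def blocks_per_hour(blocks_per_second):
    return blocks_per_second * SECONDS_PER_HOUR


def buffer_kib(blocks, values_per_block=1):
    return blocks * values_per_block * BYTES_PER_VALUE / 1024


half_overlap = blocks_per_hour(5)    # 400 ms blocks starting every 200 ms
per_sample = blocks_per_hour(48000)  # one block per sample at 48 kHz (assumed rate)

print(half_overlap, per_sample)                       # 18000 172800000
print(buffer_kib(half_overlap))                       # 140.625
print(buffer_kib(half_overlap, values_per_block=5))   # 703.125
```

With a single value per block this comes to about 141 KiB per hour. The figure of a little over 700 KiB is reproduced when five values are kept per block (for instance one per channel), which is only a guess at the reasoning behind it. Either way the order of magnitude supports the conclusion.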

It looks to me like there is a good reason to stay near the 50% minimum overlap.

-Jeff

 

Reply #56
Completely agree! One doesn't really have to pass two times over the whole input. Only the loudness values of non-eliminated blocks need to be saved during the first pass. The second "pass" can then just further decimate those (in the same loop as the final averaging).

I often do not start to look for speed optimization potential before I have a simple to understand and correctly working first sketch. In my experience this leads to better code in the long run. But you are right: 2 passes over the whole input are overkill, probably even for a first sketch...

Reply #57
But in the minimum 50% overlap standard laid out by Tech 3341, Annex 1, the block count per second is fixed at 5, independent of the sample rate. If I'm understanding this correctly it means that buffering the per-block loudness would "only" require 18K samples per hour (versus 172 million for near 100% overlap). If the loudness samples are stored in 64 bits that's only a little over 700 KiB an hour of buffering. While it isn't bounded, it sounds reasonable for in-memory buffering in this application on modern hardware (considering typical PC applications at this point, not embedded devices, etc).

It looks to me like there is a good reason to stay near the 50% minimum overlap.

Thanks a lot for this estimation. For album gain calculation we have to buffer "loudness samples" in this order of magnitude.

Reply #58
Many thanks to C.R.Helmrich and you for the great comments!

Gern geschehen. Thank you for taking the implementation initiative!

I also agree with Jeff and googlebot and suggest doing it exactly as they proposed: compute a new block loudness measure every 9600 samples (at 48 kHz) and store all blocks with loudness > -70 LUFS in a linked list (or an array if you know the track/album length ahead of time... which you do in our scenario, I guess). Then you can apply the relative gate on this list.

Actually, I think to avoid calculating the logarithm and division by T every 200 ms you can simply store the block energies in your list, because the comparison

[blockquote]block loudness > -70 LUFS[/blockquote]
is, assuming your block energy = left energy + right energy + center energy + 1.41* ..., equivalent to

[blockquote]block energy > 0.4 * sample rate * 10^((-70+0.691)/10),[/blockquote]
with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by

[blockquote]block energy > 0.1584893 * average 70-gated energy[/blockquote]
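The constants above can be reproduced with a few lines of Python. This is only a check of the arithmetic; the function and variable names are invented for this example.

```python
def absolute_gate_energy(sample_rate, block_seconds=0.4, gate_lufs=-70.0):
    """Energy threshold equivalent to 'block loudness > gate_lufs'."""
    n = block_seconds * sample_rate  # samples per 400 ms block
    return n * 10.0 ** ((gate_lufs + 0.691) / 10.0)


# Relative gate: -8 LU expressed as a plain factor on the average energy.
RELATIVE_GATE_FACTOR = 10.0 ** (-8.0 / 10.0)

print(absolute_gate_energy(48000))
print(absolute_gate_energy(44100))
print(RELATIVE_GATE_FACTOR)
```

The three printed values should agree with the constants quoted above to the precision given there. Both thresholds are computed once, so the per-block work reduces to comparisons.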
Chris
If I don't reply to your reply, it means I agree with you.

Reply #59
I have it on good authority that the calculation can be done in a single pass. This was a design requirement as R128 was designed to be workable for live broadcast applications. I will make inquiries and try and scare up the technical details. If anyone happens to be in Switzerland in February all will be revealed.

Reply #60
A fully standard-compliant single-pass outline has been on the table since at least post #54 (bottom). For I-scale measurements, some state has to be accumulated, though, because the loudness of a programme's last block can in principle decide whether its first block gets gated or not. Hardware with limited memory will have to impose limits on the maximum integrable time span (which can be huge at moderate cost if you look at Jeff's post). The S- and M-scales, on the other hand, are suited for measurements of infinite length.

I'm looking forward, however, to what you can dig up at the workshop and share here!

Great optimization by C.R.Helmrich, btw; this should save several orders of magnitude of CPU time!

Reply #61
v0.3 released

I've just uploaded the new version and it's available at
[blockquote]http://sourceforge.net/projects/r128gain/files/[/blockquote]
What's new?
  • Implements the algorithm as discussed with C.R.Helmrich, googlebot, and jdoering (cf. "r128c.c", many thanks again, however the latest optimization as proposed by C.R.Helmrich is still missing). The results of the test cases are now in accordance with the specification:

    Code: [Select]
    $ r128gain ../sounds/ebu-loudness-test-setv01/
    args
    ../sounds/ebu-loudness-test-setv01
      analyzing ...
        1kHz Sine -20 LUFS-16bit.wav (1/16): -20.0 LUFS, -3.0 LU (peak: 0.100734: -10.0 dBFS)
        1kHz Sine -26 LUFS-16bit.wav (2/16): -26.0 LUFS, 3.0 LU (peak: 0.050508: -13.0 dBFS)
        1kHz Sine -40 LUFS-16bit.wav (3/16): -40.0 LUFS, 17.0 LU (peak: 0.010260: -19.9 dBFS)
        seq-3341-1-16bit.wav (4/16): -23.0 LUFS, -0.0 LU (peak: 0.071316: -11.5 dBFS)
        seq-3341-2-16bit.wav (5/16): -33.0 LUFS, 10.0 LU (peak: 0.023049: -16.4 dBFS)
        seq-3341-3-16bit.wav (6/16): -23.0 LUFS, -0.0 LU (peak: 0.071468: -11.5 dBFS)
        seq-3341-4-16bit.wav (7/16): -23.0 LUFS, 0.0 LU (peak: 0.070850: -11.5 dBFS)
        seq-3341-5-16bit.wav (8/16): -22.9 LUFS, -0.1 LU (peak: 0.100845: -10.0 dBFS)
        seq-3341-6-5channels-16bit.wav (9/16): -23.0 LUFS, 0.0 LU (peak: 0.063133: -12.0 dBFS)
        seq-3341-6-6channels-WAVEEX-16bit.wav (10/16): -23.7 LUFS, 0.7 LU (peak: 0.063133: -12.0 dBFS)
        seq-3341-7_seq-3342-5-24bit.wav (11/16): -23.0 LUFS, -0.0 LU (peak: 0.358341: -4.5 dBFS)
        seq-3341-8_seq-3342-6-24bit.wav (12/16): -23.0 LUFS, 0.0 LU (peak: 0.718299: -1.4 dBFS)
        seq-3342-1-16bit.wav (13/16): -22.6 LUFS, -0.4 LU (peak: 0.100089: -10.0 dBFS)
        seq-3342-2-16bit.wav (14/16): -16.8 LUFS, -6.2 LU (peak: 0.177974: -7.5 dBFS)
        seq-3342-3-16bit.wav (15/16): -20.0 LUFS, -3.0 LU (peak: 0.100089: -10.0 dBFS)
        seq-3342-4-16bit.wav (16/16): -20.0 LUFS, -3.0 LU (peak: 0.100075: -10.0 dBFS)
        ALBUM: -21.9 LUFS, -1.1 LU (peak: 0.718299: -1.4 dBFS)

  • The command line syntax has slightly changed in order to allow for (hopefully) proper wildcard expansion:

    Code: [Select]
    r128gain <input>? [-o <directory> [flac]]

    The new version accepts one or more input files or directories possibly containing wildcards. The optional output directory has to be separated from the list of inputs by the switch "-o".

Reply #62
Works perfectly, great job! Even for multichannel and high resolution files.

I'm wondering why the EBU-provided test samples don't match their own descriptions in Tech 3341. That should be fixed.

Reply #63
Actually, I think to avoid calculating the logarithm and division by T every 200 ms you can simply store the block energies in your list, because the comparison

[blockquote]block loudness > -70 LUFS[/blockquote]
is, assuming your block energy = left energy + right energy + center energy + 1.41* ..., equivalent to

[blockquote]block energy > 0.4 * sample rate * 10^((-70+0.691)/10),[/blockquote]
with the right-hand term being a constant (0.00225113 for 48 kHz, 0.00206823 for 44.1 kHz). Then you can work analogously for the relative gating: simply sum up all the block energies in your 70-gated list, divide by the number of energies in the list to get the average 70-gated energy, and apply the relative gating threshold by

[blockquote]block energy > 0.1584893 * average 70-gated energy[/blockquote]

Let me, please, summarize how I understand this:
  • The BS.1770 loudness measure is defined as[blockquote]-0.691 + 10*lg(wmsq),[/blockquote]where[blockquote]wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
    i running over all channels,
    G_i the weighting coefficient for the i-th channel,
    j running from 0 to n-1 over all sampling intervals,
    x_i_j the j-1 channel's voltage of the i-1 sample[/blockquote]is the (per channel) weighted mean square of the interval under consideration.
  • Let wmsq_i be the weighted mean square of the i-th block of an EBU R128 overlapping segmentation.
  • Phase 1 of the EBU R128 algorithm chooses block i of the EBU R128 segmentation if[blockquote]-0.691 + 10*lg(wmsq_i) > -70
    <==> 10*lg(wmsq_i) > 0.691-70
    <==> lg(wmsq_i) > (0.691-70)/10
    <==> wmsq_i > 10^((0.691-70)/10)

    The threshold for choosing block i with wmsq_i is 10^((0.691-70)/10).[/blockquote]
  • Let wmsq_p1 be the weighted mean square of all blocks chosen in phase 1.
  • Phase 2 of the EBU R128 algorithm chooses block i of the EBU R128 segmentation if[blockquote]-0.691 + 10*lg(wmsq_i) > -0.691 + 10*lg(wmsq_p1) - 8
    <==> 10*lg(wmsq_i) > 10*lg(wmsq_p1) - 8
    <==> lg(wmsq_i) > lg(wmsq_p1) - 0.8
    <==> lg(wmsq_i) - lg(wmsq_p1) > -0.8
    <==> lg(wmsq_i/wmsq_p1) > -0.8
    <==> wmsq_i/wmsq_p1 > 10^(-0.8)
    <==> wmsq_i > wmsq_p1*10^(-0.8)

    The threshold for choosing block i with wmsq_i is wmsq_p1*10^(-0.8)[/blockquote]
  • Let wmsq_p2 be the weighted mean square of all blocks chosen in phase 2.
  • Then[blockquote]-0.691 + 10*lg(wmsq_p2)[/blockquote]is the EBU R128 loudness measure.
It turns out that it is possible to avoid any logarithm during intermediate calculations. Intermediate results, i.e. weighted mean squares, are obtained simply by add and multiply operations. Only the one-time calculation of the two thresholds for phase 1 and phase 2 needs exponentiation.

Pass 1 of the EBU R128 algorithm only has to cache the weighted mean squares wmsq_i of the EBU R128 segmentation. From that all the rest can easily be derived.
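A minimal sketch of how the rest could be derived from the cached values, following the summary above. It assumes the list already holds the weighted mean squares wmsq_i of equal-length blocks, so that the mean square of a set of blocks is the plain average of their values. The function name is made up and this is not the code from "r128c.c".

```python
import math

# Both thresholds need exponentiation, but only once.
ABS_THRESHOLD = 10.0 ** ((0.691 - 70.0) / 10.0)  # phase 1
REL_FACTOR = 10.0 ** (-0.8)                      # phase 2, scaled by wmsq_p1


def r128_loudness(wmsq_blocks):
    # Phase 1: keep blocks above the absolute threshold.
    phase1 = [w for w in wmsq_blocks if w > ABS_THRESHOLD]
    if not phase1:
        return float("-inf")  # only silence was seen
    wmsq_p1 = sum(phase1) / len(phase1)
    # Phase 2: keep blocks above the relative threshold.
    threshold = wmsq_p1 * REL_FACTOR
    phase2 = [w for w in phase1 if w > threshold]
    wmsq_p2 = sum(phase2) / len(phase2)
    # The single logarithm of the whole calculation.
    return -0.691 + 10.0 * math.log10(wmsq_p2)
```

For album gain the same function could be fed the concatenated per-track lists, which is where the buffering estimate from earlier in the thread comes in.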

Reply #64
The BS.1770 loudness measure is defined as[blockquote]-0.691 + 10*lg(wmsq),[/blockquote]where[blockquote]wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
i running over all channels,
G_i the weighting coefficient for the i-th channel,
j running from 0 to n-1 over all sampling intervals,
x_i_j the j-1 channel's voltage of the i-1 sample[/blockquote]is the (per channel) weighted mean square of the interval under consideration.

should read

Quote
The BS.1770 loudness measure is defined as[blockquote]-0.691 + 10*lg(wmsq),[/blockquote]where[blockquote]wmsq = sum_i_j G_i*x_i_j*x_i_j/n,
i running over all channels,
G_i the weighting coefficient for the i-th channel,
j running from 0 to n-1 over all sampling intervals,
x_i_j the j-th channel's voltage of the i-th sample[/blockquote]is the (per channel) weighted mean square of the interval under consideration.

Reply #65
Exactly, and if you pull out the "/n" in wmsq = sum_i_j G_i*x_i_j*x_i_j/n, which you can do since n is the same in all blocks and channels, you get what I wrote because n = 0.4 * sample rate and

[blockquote]wmsq = block energy / n[/blockquote]

and save many divisions. Of course you still need the division when computing the final R128 loudness measure.
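A rough sketch of collecting such block energies with 50% overlap is shown below. It assumes the channels have already been passed through the BS.1770 pre-filter (not shown), that the weights G_i are supplied by the caller, and that the helper name is hypothetical.

```python
def block_energies(channels, weights, sample_rate):
    """Return (n, energies) for 400 ms blocks that start every 200 ms.

    channels: list of equal-length sample lists, one per channel,
              assumed to be pre-filtered already.
    weights:  one weighting coefficient G_i per channel.
    """
    n = int(round(0.4 * sample_rate))  # samples per block
    hop = n // 2                       # 50% overlap
    total = len(channels[0])
    energies = []
    for start in range(0, total - n + 1, hop):
        energy = 0.0
        for g, ch in zip(weights, channels):
            # Weighted sum of squares, deliberately not divided by n.
            energy += g * sum(x * x for x in ch[start:start + n])
        energies.append(energy)
    return n, energies
```

Dividing an entry by n gives the corresponding wmsq. That division is only needed once, when the final loudness is computed.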

I think you mean "x_i_j the i-th channel's voltage of the j-th sample" though, right?

Chris

If I don't reply to your reply, it means I agree with you.

Reply #66
Just out of curiosity, where does that 0.4 come from?


Reply #68
Duh. My bad!

Reply #69
I have a proposal.

New standard tag fields:
EBU_R128_REFERENCE_LOUDNESS
EBU_R128_TRACK_GAIN
EBU_R128_TRACK_PEAK
EBU_R128_ALBUM_GAIN
EBU_R128_ALBUM_PEAK
or R128GAIN_*, EBUR128_*, ...

Replay Gain tag fields should become optional, only activated by a command line option. So no loss there if people want to test your implementation without having an EBU R128 DSP plugin.

Without independent tag fields the authors of such plugins cannot start supporting EBU R128 gain control in their Replay Gain plugins.

Reply #70
PS: I'd say that using GAIN in your prefix, as I had suggested, is a bad idea (unfortunately I can't edit the post anymore). The GAIN in REPLAYGAIN_* is part of the proper name of that loudness measurement system. Choose wisely: AFAIK you're the first with such an implementation that uses tag field names, or even the first with a PC implementation of EBU R128. The tag fields will probably become the standard (in the PC sound community).

EBUR128_* would be consistent with Replay Gain's omission of the whitespace between the two words, but it is confusing so that people might think the standard's name is Ebur 128 or EBUR 128. Hence I would vote for EBU_R128_*


Reply #72
I had been hoping that the written tags were being converted into REPLAYGAIN compatible units (although I wondered). How are the FLACs being tested: with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?

New tags seem very unfortunate (given hardware device support, etc). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc ... sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.

Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).

-Jeff

Reply #73
Just compared r128gain output versus ReplayGain for ref_pink.wav. ReplayGain defines ref_pink.wav as +6.00 dB. This was originally 0 when compared to 83 dB SPL but shifted up when 6 dB was added to make typical music "loud enough" on non-calibrated systems.

Code: [Select]
C:\development\replaygain>r128gain.exe ref_pink.wav
args
  analyzing ...
    ref_pink.wav (1/1): -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)
    ALBUM: -23.4 LUFS, 0.4 LU (peak: 0.292569: -5.3 dBFS)


Since the whole point is to come up with a scaling ratio, and relative LU are scaled in dB, it looks to me like this algorithm will generate ReplayGain compatible values simply by adding 5.6 to the reported LU to compensate for the different base loudness of ref_pink.wav (-0.4 difference due to fundamental differences in the algorithms and +6 difference due to the ReplayGain reference point shift). However, this whole comparison is based on my assumption that the goal is to use the new algorithm for computing the loudness and adjustment but calibrating to the original reference sound. I don't know if this is a valid comparison, or if for example the new algorithm would specifically not be expected to behave ideally on ref_pink.wav.
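Written out, the tentative conversion looks like this. The 5.6 dB offset is taken purely from the single ref_pink.wav measurement above and is not part of any standard; the function names are invented for illustration.

```python
R128_REFERENCE_LUFS = -23.0  # target level used for the LU figures printed by r128gain
PINK_NOISE_OFFSET_DB = 5.6   # from the ref_pink.wav comparison above, an assumption


def r128_gain_lu(loudness_lufs):
    """Gain in LU that brings a measured loudness to -23 LUFS."""
    return R128_REFERENCE_LUFS - loudness_lufs


def replaygain_like_gain(loudness_lufs):
    """Hypothetical ReplayGain-compatible value, calibrated on ref_pink.wav only."""
    return r128_gain_lu(loudness_lufs) + PINK_NOISE_OFFSET_DB


print(round(r128_gain_lu(-23.4), 1))          # 0.4
print(round(replaygain_like_gain(-23.4), 1))  # 6.0
```

By construction this maps ref_pink.wav to +6.0 dB. Whether the same offset holds for real music is exactly the open question.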

Anyway, for the few real music files I compared the results were similar enough to the ReplayGain calculated values that this seems plausible but different enough that I don't know if this is a valid conversion method or not.

-Jeff

Reply #74
I had been hoping that the written tags were being converted into REPLAYGAIN compatible units (although I wondered). How are the FLACs being tested: with a modified playback program as well? In that case, is the correction algorithm applied at playback the same, just with different units / base?

EBU R128 and ReplayGain are two different approaches to reach the same goal: uniform loudness at replay time.

Common to both approaches is to define an algorithm in order to determine at scan time
  • an absolute loudness, and
  • a relative loudness (gain) in order to adjust the loudness accordingly to a standardized absolute loudness at replay time.
Even if this is common to both approaches, there are huge differences:
  • The two algorithms are completely different. If you compare the relative loudness between different tracks achieved with ReplayGain and EBU R128 you will observe huge differences.
  • Hence it does not make much sense to have one part of your audio collection processed using ReplayGain and the other part using EBU R128. Probably it's best to decide beforehand which approach to use based on personal preferences (tests).
Metadata is not part of EBU R128. What R128GAIN does is write the same tags as METAFLAC. Any playback software that provides ReplayGain and honors the METAFLAC tags should work with FLACs tagged by R128GAIN, e.g. Winamp. The loudness level will then be -23 LUFS as required by EBU R128 (a completely different measure than RG's 83 dB).

Tests were performed using Winamp in conjunction with my own SoX and FFmpeg based input plugin. Native WA should do as well.

New tags seem very unfortunate (given hardware device support, etc). New tags for the peak data wouldn't mean anything more than sample peak (ReplayGain) versus true signal peak (EBU R128), right? Would a playback program care about the distinction? (It would seem unlikely unless a fancy client had some way of estimating the worst-case error in sample peak based on sampling frequency, etc ... sounds far-fetched.) In terms of the gain, I had been assuming that it was just a matter of converting units / reference levels. I guess the paper probably answers that. It sounds interesting; too bad it's $20.

Playback software (e.g. Winamp) makes use of the peak values (e.g. by providing a clipping prevention mode). Whether it is the amplitude peak or the true peak becomes interesting when there is some up-sampling in the playback chain, and there probably is, because every contemporary DAC does it. Hence you should always store true peaks.

Also note that storing REFERENCE_LOUDNESS for ReplayGain is not a standard and probably doesn't make any more sense here than it does for ReplayGain (current non-standard metaflac behavior notwithstanding).

Maybe it could become part of the RG standard:
  • It would help to resolve the 83 dB vs. 89 dB debate.
  • It would help to integrate playback of ReplayGain tagged and EBU R128 tagged tracks depending on the unit dB vs. LU, provided someone figures out the mean relative loudness between the two approaches.