
Loudness estimation

It seems odd to me that, since loudness adds across critical bands and compresses inside a single critical band (Zwicker, Fletcher, Allen, and a cast of hundreds), most loudness estimation applies a filter that ensures loudness above threshold is underestimated at low and high frequencies, and then measures broadband energy.

Discuss??
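The contrast in the opening post can be sketched numerically. This is a toy model, not any published loudness model. It assumes per-band energies have already come out of some critical-band filterbank (not shown), and it stands in for the within-band compression with a single power law of exponent 0.3 (an assumption, roughly the "twice as loud per 10 dB" rule). The function names are hypothetical.

```python
# Toy comparison: compress-then-add across critical bands vs. broadband energy.
# ALPHA and both function names are illustrative assumptions, not a published model.
ALPHA = 0.3  # assumed within-band compressive exponent (~doubling per 10 dB)


def summed_band_loudness(band_energies, alpha=ALPHA):
    """Compress energy inside each critical band, then add across bands."""
    return sum(e ** alpha for e in band_energies)


def broadband_loudness(band_energies, alpha=ALPHA):
    """Add all energy first (a broadband measurement), then compress once."""
    return sum(band_energies) ** alpha


if __name__ == "__main__":
    same_band = [2.0]        # total energy 2.0 concentrated in one band
    two_bands = [1.0, 1.0]   # the same total energy split across two bands
    print(summed_band_loudness(same_band), summed_band_loudness(two_bands))
    print(broadband_loudness(same_band), broadband_loudness(two_bands))
```

Splitting the same total energy across two bands raises the band-wise estimate by a factor of about 1.6, while the broadband estimate does not change at all. That across-band addition is exactly what a broadband energy measurement cannot see.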
-----
J. D. (jj) Johnston

Reply #1
So, what would you do instead?
I'm not sure what you mean by "compresses inside of one critical band".
You are basically saying that the loudness within a critical band should be calculated somehow differently (accounting for the "compression effect"), and that the per-band loudness values should then be summed to get a subjective loudness score, right?

How should loudness be calculated for each critical band?

SebastianG

Reply #2
Loudness estimation is not an easy task, especially if you want measurements that work with very different types of programme material (music, speech, sports, ...).
Some years ago, the ITU began a working group (SG3) to try to define a loudness measurement (mainly for broadcast). Some papers:
skovenborg_loudness
IRT levelling
AES on loudness
Lund_TC

Reply #3
I am not sure I understand the question.
Are you wondering whether we should compute loudness using values weighted according to the ATH (absolute threshold of hearing), or just the plain flat level?

Reply #4
Quote
I am not sure I understand the question.
Are you wondering whether we should compute loudness using values weighted according to the ATH, or just the plain flat level?


I'm wondering if "weighting" or not is even the right question.
-----
J. D. (jj) Johnston

Reply #5
Bump, because this is an interesting topic that never got a good resolution.

From Skovenborg's paper that jlohl linked to, it seems that Woodinville's original question - why equal-loudness filtering is still used for loudness estimation, if it performs so poorly - has already been answered. There's no universal standard that has eclipsed things like A-weighting, since SG3 hasn't made a final recommendation yet.

I feel uneasy, given the perennial talk about the loudness wars, that the commonly used objective measurements of loudness involve algorithms that are known to be flawed. I suspect that once a better estimator is used, the measured differences in loudness between old and new masterings will change considerably and will better match what people actually experience, rather than being somewhat artificial. I also figure that ReplayGain would work a lot better if it used a better loudness estimator than an IIR filter.

Are there any new estimators that are reasonably close to widespread implementation?

EDIT: The srg3list commentary mentions an ITU-R meter that is close to being standardized, but I don't see a description of it anywhere.

Reply #6
Quote
I also figure that ReplayGain would work a lot better if it used a better loudness estimator than an IIR filter.


Same here. I'm surprised no one has done it, since Woodinville is right - the mechanism of loudness addition is well known!

Mind you, that's for one instant. When you consider short-term time integration, long-term perception, and the effect of different content, it becomes quite difficult to say "how loud is this" in a single number, or even that X is as loud as Y when X and Y are two different 3-minute recordings.

When I proposed ReplayGain, I expected someone to unpick most of this and do it "properly" within months.

Five years and counting...

It ought to make a good university project for someone!

Cheers,
David.

Reply #7
Quote
The srg3list commentary mentions an ITU-R meter that is close to being standardized, but I don't see a description of it anywhere.

The ITU has released Recommendation BS.1770; you can get it at ITU R-1770, "Algorithms to measure audio programme loudness and true-peak audio level".
The weighting curve consists of two stages:
- high-frequency shelving filter (Fc = 1500 Hz, +4 dB)
- second-order highpass at Fc = 50 Hz (near contour curve C)
followed by an Leq(W) measurement.
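As a rough illustration of the two weighting stages followed by Leq, here is a pure-Python sketch. The biquad coefficients are the 48 kHz values published in ITU-R BS.1770 (the coefficients, not the nominal corner frequencies, define the filters, and other sample rates need recomputed coefficients). The -0.691 dB offset and the per-channel weights G_i also come from the recommendation, which calls the combined curve "K-weighting". The function names are my own, and the sketch computes one Leq over the whole signal with no time gating.

```python
import math

# Biquad coefficients for fs = 48 kHz as published in ITU-R BS.1770.
# Stage 1: high-frequency shelving pre-filter (about +4 dB at high frequencies).
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
# Stage 2: second-order highpass (the RLB curve).
RLB_B = (1.0, -2.0, 1.0)
RLB_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x, b, a):
    """Direct-form I second-order IIR filter; assumes a[0] == 1."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def k_weighted_loudness(channels, gains=None):
    """Whole-signal Leq after both weighting stages, in LKFS (no gating).

    channels: equal-length lists of 48 kHz samples, full scale = 1.0.
    gains: per-channel weights G_i (1.0 for L/R/C, about 1.41 for surrounds).
    """
    if gains is None:
        gains = [1.0] * len(channels)
    total = 0.0
    for samples, g in zip(channels, gains):
        y = biquad(biquad(samples, SHELF_B, SHELF_A), RLB_B, RLB_A)
        total += g * sum(v * v for v in y) / len(y)  # weighted mean square z_i
    return -0.691 + 10.0 * math.log10(total)


if __name__ == "__main__":
    fs = 48000
    tone = [math.sin(2 * math.pi * 997 * n / fs) for n in range(fs)]
    # A 0 dBFS sine near 1 kHz in one front channel is the recommendation's
    # calibration point (about -3.01 LKFS); this should land close to that.
    print(round(k_weighted_loudness([tone]), 2))
```

The -0.691 dB offset cancels the filter gain of roughly +0.69 dB near 1 kHz, so a full-scale 1 kHz tone reads the same as its plain mean-square level.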

Reply #8
Quote
The srg3list commentary mentions an ITU-R meter that is close to being standardized, but I don't see a description of it anywhere.

ITU released the recommendation BS-1770, you can get it at ITU R-1770, Algorithms to measure audio programme loudness and true-peak audio level
The weighting curve consists of two stages :
- shelving highpass filter (Fc = 1500Hz, +4dB)
- second order highpass at Fc =50Hz (near contour curve C)
followed by Leq(W) measurement

Thanks. I did manage to hunt down that page, but I was hoping that the free drafts would still be available for perusal. (I'd rather not blow $33 on this if I can help it.) But sg3list doesn't seem to have any.

What exactly is Leq(W)?

So, if I understand the basics of 1770 correctly, it still uses straight-up IIR filtering? That's kind of a drag. On the other hand, it looks like they will also standardize loudness calculations for multichannel recordings, which would benefit ReplayGain quite a bit.

Reply #9
You can get three documents annually for free from the ITU, so you don't have to pay $33 if three documents per year is enough for you.

Actually, Leq(RLB) is used in ITU-R BS.1770, and you can read about Leq(RLB) in the already-mentioned Skovenborg paper. Leq(W) is just the general algorithm, without specifying which frequency weighting (W) to use.
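For reference, the generic measure can be written as below. Here x_W(t) is the signal after frequency weighting W, T is the measurement interval, and x_ref is a reference level (for example, digital full scale); choosing W = RLB gives Leq(RLB). The notation is mine, not copied from BS.1770.

```latex
L_{\mathrm{eq}}(W) = 10 \log_{10}\left( \frac{1}{T} \int_{0}^{T} \frac{x_W^{2}(t)}{x_{\mathrm{ref}}^{2}} \, dt \right) \ \text{dB}
```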

Reply #10
I am puzzled by the ITU's claim that Leq(RLB) is the best and Skovenborg and Nielsen's claim that TC HEIMDAL is the best.
In their paper, Skovenborg and Nielsen say that a version of the HEIMDAL model was also evaluated by the SRG-3 group.
I understand that results can differ depending on the data set used, and I am wondering whether that is the only explanation here.

Reply #11
Well, on top of that, the frequency weighting only takes into account head-related frequency shelving (i.e., high frequencies are boosted for side channels compared with front channels) and a phenomenally basic weighting filter. It's nothing but a 700 Hz highpass.

Look at Figure 5 in the recommendation, the subjective results. The outliers differ from each other by close to 5 dB in places. Isn't this too large a range? Obviously they are getting subjective results that correlate well with the objective results (it's kind of hard not to), but it seems to me that they aren't setting their standards particularly high with that much variation in subjective loudness. I would expect 1-2 dB to be a more reasonable tolerance, albeit obviously dependent on how much effort one puts into testing.

Reply #12
Quote
Well, on top of that, the frequency weighting only takes into account head-related frequency shelving (i.e., high frequencies are boosted for side channels compared with front channels) and a phenomenally basic weighting filter. It's nothing but a 700 Hz highpass.


Head-related shelving is not very important here; it is used in ITU-R BS.1770 only for multichannel. The subjective tests were conducted on monophonic signals.

Quote
Look at Figure 5 in the recommendation, the subjective results. The outliers differ from each other by close to 5 dB in places. Isn't this too large a range? Obviously they are getting subjective results that correlate well with the objective results (it's kind of hard not to), but it seems to me that they aren't setting their standards particularly high with that much variation in subjective loudness. I would expect 1-2 dB to be a more reasonable tolerance, albeit obviously dependent on how much effort one puts into testing.


In Figure 5 of Recommendation BS.1770 there is nothing that indicates variation in subjective loudness. As I understand it, only the average for each test sequence is plotted. As far as I can see, the maximum error between the subjective level and the objective measurement is about 4 dB (the maximum deviation of an outlier). This is not much, IMHO, and overall the graph is consistent with the reported correlation coefficient.

What is missing is correlation graphs in the Skovenborg/Nielsen paper. Based on Figure 4 in Skovenborg, it seems that there is less error in their experiments than in those conducted by the ITU, which would suggest a higher correlation. But one must keep in mind that Skovenborg/Nielsen write: "Each segment was normalized in level by means of an RMS-based estimate of its loudness." From this one can conclude that the spread of the obtained subjective gains is much smaller than in ITU-R BS.1770 (where it is about 20 dB). This probably leads to a much lower correlation in Skovenborg/Nielsen, even though the absolute error is smaller than in the ITU tests.
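The point about correlation versus absolute error can be checked with a toy example. The numbers below are made up, not taken from either study: the "wide" set spans 20 dB like the ITU material, the "narrow" set spans 5 dB like level-normalized material, and the narrow set is deliberately given the smaller prediction error.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rms_error(x, y):
    """Root-mean-square difference between two sequences, in dB here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical subjective levels (dB) and objective predictions that miss
# by a fixed amount with alternating sign.
wide_subj = [2.0 * i for i in range(11)]        # 0 ... 20 dB spread
wide_obj = [s + (1.5 if i % 2 else -1.5) for i, s in enumerate(wide_subj)]
narrow_subj = [0.5 * i for i in range(11)]      # 0 ... 5 dB spread
narrow_obj = [s + (0.75 if i % 2 else -0.75) for i, s in enumerate(narrow_subj)]

if __name__ == "__main__":
    for name, s, o in (("wide", wide_subj, wide_obj), ("narrow", narrow_subj, narrow_obj)):
        print(f"{name}: r = {pearson(s, o):.3f}, rms error = {rms_error(s, o):.2f} dB")
```

With these numbers the narrow set has half the RMS error (0.75 dB versus 1.5 dB) yet a lower correlation coefficient (about 0.90 versus 0.97), because correlation measures error relative to the spread of the data.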

It now seems to me that the ITU-recommended Leq(RLB) might really be the best solution.

What do people here think about replacing the method proposed by David Robinson with Leq(RLB)?
Of course, one should also take into account "long term perception, and the effect of different content".
What do you think David?


Reply #14
I guess it all depends on what we're going to use it for. I don't doubt that 1770 used straight-up would be a fine choice for ReplayGain. Moreover, it is somewhat advantageous to tie the gain values to professional standards to make the numbers more comparable to those used in other fields.

Also, the true-peak algorithm should be used with ReplayGain to calculate track/album peaks. This would provide the cleanest way for listeners to avoid 0 dBFS+ overload issues, which seems like a fantastically important thing to me. Unfortunately, it would also be fantastically slow and would negate any performance advantage that 1770 weighting has over David's existing filter. I still think it's well worth implementing, though.
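Here is a rough sketch of why peaks need oversampling. As far as I understand, BS.1770 estimates true peak by oversampling (4x for 48 kHz material) before taking the maximum absolute value. The sketch follows that idea but uses a generic Hann-windowed sinc interpolator of my own choosing, not the filter given in the recommendation, and it skips the edges of the signal for simplicity. Its brute-force loop also hints at why this is slow.

```python
import math


def sample_peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)


def true_peak_estimate(x, factor=4, half_taps=16):
    """Estimate the inter-sample peak by interpolating `factor` points per sample.

    Uses a Hann-windowed sinc spanning +/- half_taps input samples; the first and
    last half_taps samples are skipped so every interpolation has full support.
    """
    peak = 0.0
    for n in range(half_taps, len(x) - half_taps):
        for p in range(factor):
            t = n + p / factor  # fractional position between samples n and n + 1
            acc = 0.0
            for k in range(n - half_taps + 1, n + half_taps + 1):
                d = t - k
                sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
                window = 0.5 + 0.5 * math.cos(math.pi * d / half_taps)
                acc += x[k] * sinc * window
            peak = max(peak, abs(acc))
    return peak


if __name__ == "__main__":
    # A sine at fs/4 with a 45-degree phase offset: every sample is +/-0.707,
    # but the underlying waveform reaches +/-1.0 between the samples.
    x = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(200)]
    print(sample_peak(x), true_peak_estimate(x))
```

A sample-peak meter would report about -3 dBFS for this signal, while the reconstructed waveform actually reaches full scale, which is the kind of hidden 0 dBFS+ overload the post is worried about.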

But moving beyond just ReplayGain, the scope of 1770 is limited. It's not designed to be the be-all and end-all of loudness estimators. It's designed to be "good enough": much better than A-weighting, around as good as anything else out there, and fast (or at least the weighting is fast). And the documentation I've seen on HEIMDAL seems to imply that it won't be ready for prime time for a while. But what I was hoping for was an algorithm that could be used to compare the dynamic range of various musical selections, and to show the effects of compression on dynamic loudness changes, in enough detail to closely match subjective evaluations. I.e., to objectively demonstrate the damage done by the loudness war, without having to resort to unscientific waveform plots or inaccurate loudness estimators. And 1770, even in its hypothetical real-time incarnation, doesn't appear to be able to do that as well as I had hoped.

My standards might be too high on this though.


Reply #16
Right. With jj at the helm of that, I would honestly expect whatever loudness estimation MS is using to be about as good as one can get right now.

I suppose one could identify the algorithm by recording a Vista sound output in loopback and feeding test signals through it?

Reply #17
Notice this:
Quote
Loudness equalization (EQ) uses a simulation of human hearing to obtain an accurate loudness measurement—as opposed to intensity—of an audio source and then provides a dynamic gain adjustment to keep the loudness of the sources more constant. Therefore, loudness EQ might affect both dynamic range and peak loudness.

So this is not equivalent to ReplayGain, though it certainly uses some algorithm to estimate loudness, which is probably similar to those tested by the SG-3 group.

What puzzles me in http://www.tcelectronic.com/media/AES121_S...ing_Samples.pdf and in the related AES papers ("Objective Measures of Loudness", "Evaluation of Objective Loudness Meters") is that even a simple Leq without any weighting produces results almost as good as Leq(RLB). Another thing is that they never state that any kind of window is used. As I understand it, Leq is always computed over the entire sample, which is 10-15 s long. Do people reading this believe that Leq can perform well on samples this long, without windowing and some kind of statistical analysis like the one performed in ReplayGain?

Reply #18
Quote
So this is not equivalent to replay gain.

Note that there are two versions, one that is "two pass" and one that is "one pass". The "one pass" EQ in fact sets a peak loudness for each track, and does no dynamic range compression.
Quote
Though it certainly uses some algorithm to estimate loudness which probably is similar to those tested with SG-3 groups.

It does not use an "A weight and power" algorithm. I don't believe anything further is said at this time.
Quote
And another thing is that they never state that some kind of window is used. As I understand Leq is always used on the entire sample which is 10-15 sec long. Do people reading this believe that Leq can perform well when this long samples are used without window and some kind of statistical analyses as the one performed in replaygain?


Depends on what you want to do. Do you want to equalize peaks? Averages? Obviously, an estimate has to be made over a time period shorter than 200 milliseconds at the very least, and preferably every 30-40 milliseconds, but then what do you want to equalize? Peaks? Averages? What?
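To illustrate that the answer depends on what is being equalized, here is a sketch that cuts a mono signal (assumed to be already frequency-weighted) into short blocks and summarizes the block levels three ways. These are the loudest block (peak), the energy average over all blocks (equal to the whole-signal Leq when the blocks tile the signal), and the value about 95% of the way up the sorted list, which is roughly the statistic ReplayGain applies to its 50 ms RMS blocks. The block length, the omitted weighting and the function names are assumptions for illustration, and silent blocks would need a level floor, which is omitted here.

```python
import math


def block_levels_db(x, fs, block_ms=40.0):
    """Mean-square level in dB (re full scale 1.0) of consecutive non-overlapping blocks."""
    n = int(fs * block_ms / 1000.0)
    levels = []
    for start in range(0, len(x) - n + 1, n):
        ms = sum(v * v for v in x[start:start + n]) / n
        levels.append(10.0 * math.log10(ms))  # fails on an all-zero block (no floor)
    return levels


def summaries(levels):
    """Three candidate single-number summaries of a block-level trace, in dB."""
    peak = max(levels)
    energy_mean = 10.0 * math.log10(sum(10 ** (lev / 10.0) for lev in levels) / len(levels))
    ranked = sorted(levels)
    p95 = ranked[min(len(ranked) - 1, int(0.95 * len(ranked)))]
    return peak, energy_mean, p95


if __name__ == "__main__":
    fs = 48000
    loud = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(fs)]
    quiet = [0.01 * v for v in loud]  # the same tone 40 dB down
    print(summaries(block_levels_db(loud + quiet, fs, block_ms=50.0)))
```

For a signal that is loud for one second and 40 dB quieter for the next, the peak and the 95% value both report about -3 dB, while the energy average reports about -6 dB. Which of these should be equalized is exactly the open question.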
-----
J. D. (jj) Johnston

Reply #19
Quote
Note that there are two versions, one that is "two pass" and one that is "one pass". The "one pass" EQ in fact sets a peak loudness for each track, and does no dynamic range compression.

Are you sure you didn't mix up "one pass" and "two pass"? EQing for peak loudness without DRC sure requires an analysis pass before playback, doesn't it?

Reply #20
Quote

And another thing is that they never state that some kind of window is used. As I understand Leq is always used on the entire sample which is 10-15 sec long. Do people reading this believe that Leq can perform well when this long samples are used without window and some kind of statistical analyses as the one performed in replaygain?

Depends on what you want to do. Do you want to equalize peaks? Averages? Obviously, an estimation has to be made over a time period shorter than 200 milliseconds at the very least, and better every 30-40 milliseconds, but then what do you want to equalize? Peaks? Averages? What?


The ITU SG-3 group and TC were searching for the best algorithm to calculate an overall loudness estimate. For the tests they used sequences that are 10-15 seconds long, and they never say that they estimate using windows shorter than 200 milliseconds. From what I understood, they apply Leq to entire sequences, so they use very long time periods to estimate overall loudness. This seems really strange to me, unless I have misunderstood something.

 

Reply #21

Quote
Note that there are two versions, one that is "two pass" and one that is "one pass". The "one pass" EQ in fact sets a peak loudness for each track, and does no dynamic range compression.

Are you sure you didn't mix up "one pass" and "two pass"? EQing for peak loudness without DRC sure requires an analysis pass before playback, doesn't it?



Ugh. You're right.

Two-pass sets the peak.
One-pass compresses. Yi. You'd think I ought to have noticed, too.
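Here is a toy contrast between the two modes as corrected here. It is a hypothetical illustration only, since the actual Vista loudness-EQ algorithm is not described in this thread. Both functions take a list of short-term loudness estimates in dB. The two-pass version scans the whole track first and applies one static gain that puts the loudest block at the target. The one-pass version can only follow a smoothed running estimate, so its gain changes over time. The function names and the smoothing constant are assumptions.

```python
def two_pass_gain_db(block_loudness_db, target_db):
    """Pass 1 scans the whole track; one static gain puts its loudest block at target."""
    return target_db - max(block_loudness_db)


def one_pass_gains_db(block_loudness_db, target_db, smoothing=0.9):
    """Single pass: gain follows a smoothed running loudness estimate (a slow AGC)."""
    gains = []
    est = None
    for level in block_loudness_db:
        # Exponential smoothing in the dB domain; 'smoothing' is a hypothetical constant.
        est = level if est is None else smoothing * est + (1.0 - smoothing) * level
        gains.append(target_db - est)
    return gains


if __name__ == "__main__":
    track = [-30.0] * 10 + [-10.0] * 10  # quiet passage, then a loud one
    print(two_pass_gain_db(track, -20.0))
    print([round(g, 1) for g in one_pass_gains_db(track, -20.0)])
```

For this track the two-pass version applies -10 dB throughout, preserving the 20 dB contrast, while the one-pass gains start at +10 dB and drift downward as the loud section arrives, which shrinks the contrast: a form of dynamic range compression.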

Quote
SG-3 group from ITU and TC were searching for best algorithm to calculate overall estimated loudness. For test they used sequences that are 10-15 seconds long. And they never say that they estimate using some windows of length shorter than 200 milliseconds. From what I understood they use Leq on entire sequences. So they use very long time periods to estimate overall loudness. This is really strange to me. Or I understood something wrong.


Well, loudness is not constant in most stimuli.

So I think that trumps all the discussion.
-----
J. D. (jj) Johnston