Topic: Measuring Dynamic Range

Measuring Dynamic Range

Having read some of the many discussions about ReplayGain and the Loudness War, I had some thoughts about how to measure, not loudness, but the dynamic range actually used.  Most people are just eyeballing the waveforms in their favourite sound editor, but I'm sure we can do better than that.

My main thoughts take advantage of the fact that there are two kinds of level meter typically used in production and broadcasting - VU meters, which specifically measure "average" level over a short period, and Peak meters, which do what they say on the tin.  The BBC, for example, much prefers a PPM (Peak Programme Meter) to a VU.

A VU meter has slow ballistics, which is to say that it takes about 300ms to get within spitting distance of the sound it's measuring.  Therefore, it tends to miss most transient sounds, but it responds well to syllables of speech.

A PPM has fast, but not instantaneous, attack ballistics - it will not respond to (perceptually inaudible) clicks, but it will respond fully to a drum hit.  Since it uses a "lossy quasi-peak detector", it then decays quite slowly from the peak it has measured.  I believe the attack time for a BBC-spec PPM is 20ms to 90%, but I could be wrong - and different European broadcasters use different, but broadly similar, attack specs.  The decay time is measured in seconds, giving the engineer plenty of time to note the peak level.

As a result, for a steady tone both meters will read the same, but for speech and music the PPM should normally read higher than the VU.  This is reasonably well-known, at least by people who know that different kinds of level meters even exist.

How *much* higher is a function of the local dynamics of the measured sound - and this is the principle I want to use.

I think that the differential between a VU-like meter and a PPM-like meter could be a valid and useful measure of dynamic range, just as ReplayGain is a useful measure of the "progress" of the Loudness War.  It should allow people to quickly determine whether the mastering of a particular recording is likely to be any good, regardless of level.  It would probably be better if the decay rates of the two meters were more similar.

Now before I dust off my Model M and start coding, are there any comments from the locals?

Measuring Dynamic Range

Reply #1
Wouldn't peak minus RMS (both in dB) yield a good approximation of the dynamic range?

Measuring Dynamic Range

Reply #2
Wouldn't peak minus RMS (both in dB) yield a good approximation of the dynamic range?

I assume you're talking about taking the average power over the entire file, and comparing that to the digital peak.

That's not a good idea, because it doesn't result in a stable measurement.  The average power can be reduced by the simple expedient of adding silence at the beginning and end.  The digital peak might be a single-sample click that was never heard, while the rest of the file never goes above -3 dBFS.
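
To put rough numbers on that, here's a quick NumPy sketch of my own (illustrative only, not part of any tool) showing how easily the whole-file figure is gamed:

Code:

    import numpy as np

    def peak_minus_rms_db(x):
        """Whole-file digital peak minus whole-file RMS, in dB."""
        return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

    fs = 44100
    tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1s sine at -6 dBFS

    print(peak_minus_rms_db(tone))                 # ~3.0 dB: just the sine's crest factor
    padded = np.concatenate([tone, np.zeros(3 * fs)])
    print(peak_minus_rms_db(padded))               # ~9.0 dB: 3s of silence "adds" 6 dB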


Measuring Dynamic Range

Reply #4
... my webspace hosting the files for pfpf disappeared in the middle of the night (literally! without a contact phone number or anything!), but I should be able to get pfpf out to anybody who requests it.

pfpf has still been on my mind, but other projects take priority right now. Also look at this article, which I found only after writing pfpf but which makes a very strong case for the usefulness of measuring dynamic range on a highly frequency-dependent basis.

http://www.soundonsound.com/sos/may02/articles/cholakis.asp

Measuring Dynamic Range

Reply #5
Check this:
http://www.hydrogenaudio.org/forums/index....showtopic=60502

Interesting.  I think I'm taking a slightly different tack, by starting from a "live meter" and only later considering how to turn that into a One True Number for the file.

Also, my programs usually end up rather less than 90MB downloads.  :-)

I see that he comes up with three numbers, and expressly discards silence.  Two of the numbers seem to be derived from broadly the same idea as mine - that is, measuring "local dynamics" on two different timescales, but using windowed RMS rather than (simulated) meter ballistics.  The third tries to distinguish overall forte from piano.

I'm a bit skeptical about his statistical methods.  I'm not convinced that taking the difference between the 97.5% and 50% marks is correct for all three measures, even if it is for one of them.  I'm also not convinced about unilaterally throwing away silence.  I think I'd want to start by seeing the whole histogram.

Measuring Dynamic Range

Reply #6
Wouldn't peak minus RMS (both in dB) yield a good approximation of the dynamic range?
In radar and RF comms, the term used for this is the "peak to average power ratio". Oversimplifying a bit, it's an important factor in radar because you want to get as much energy out as possible, with an amp with as little power as possible (so maximizing PAPR is good). There are a variety of ways of measuring it (and several similar concepts). Maybe looking at those would give you an idea of ways this has been measured in the past. If you are at all interested, I can dig up some papers to send you.
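
In code the definition is trivial - this is just my own one-liner for it, not anything from the radar literature:

Code:

    import numpy as np

    def papr_db(x):
        """Peak-to-average power ratio in dB: the RF-comms name for
        'digital peak minus RMS' being discussed in this thread."""
        return 10 * np.log10(np.max(x ** 2) / np.mean(x ** 2))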

Quote
pfpf has still been on my mind, but other projects take priority right now. Also look at this article, which I found only after I wrote pfpf, but makes a very strong case for the usefulness of measuring dynamic range on a highly frequency-dependent basis.

http://www.soundonsound.com/sos/may02/articles/cholakis.asp
That's pretty interesting. It looks like splitting the signal into three (overlapping) bands prior to measurement would add extra information to the measurement.

Measuring Dynamic Range

Reply #7
An interesting article about Cholakis' analysis.

But I think we can tell the difference between clipressed and not, without going to the trouble of separating the frequency bands.  That's why I'm talking about "sparkle" and "local dynamics", rather than trying to differentiate ppp and fff.

Measuring Dynamic Range

Reply #8
I see that he comes up with three numbers, and expressly discards silence.  Two of the numbers seem to be derived from broadly the same idea as mine - that is, measuring "local dynamics" on two different timescales, but using windowed RMS rather than (simulated) meter ballistics.  The third tries to distinguish overall forte from piano.
It's windowed RMS, but derived from BS.1770 rather than from any existing meter.

I'm sure Bob Orban is on your side here, but as much as everybody loves hating 1770, it's the best tested loudness estimator currently available, and that (combined with its ease of implementation) makes it the obvious choice for constructing a time-varying loudness meter. If a study comes out comparing windowed 1770, HEIMDAL, and the CBS meters for transient loudness purposes, then I'll think harder about changing meters.
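
For concreteness, a windowed 1770-style measurement reduces to something like this sketch of mine. Note that it deliberately omits the K-weighting pre-filter and the multichannel summation that BS.1770 actually specifies, keeping only the block mean-square stage:

Code:

    import numpy as np

    def windowed_loudness_db(x, fs, win_s=0.4, hop_s=0.1):
        """Block loudness in the BS.1770 style: mean square over a
        sliding 400ms window with 75% overlap, in dB with the standard
        -0.691 offset.  K-weighting and channel weighting omitted."""
        win, hop = int(win_s * fs), int(hop_s * fs)
        blocks = []
        for start in range(0, len(x) - win + 1, hop):
            ms = np.mean(x[start:start + win] ** 2)
            blocks.append(-0.691 + 10 * np.log10(ms + 1e-12))
        return np.array(blocks)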

Quote
I'm a bit skeptical about his statistical methods.  I'm not convinced that taking the difference between the 97.5% and 50% marks is correct for all three measures, even if it is for one of them.


For most (but certainly not all) music, I found the CDF of the histogram to increase at a more or less constant slope from 50% up to the 90-100% range. It had a lower slope below 50% and a much lower slope (a long tail) at percentiles very close to 100%.

If one assumes that louder dynamics are more important to represent than quieter dynamics - which is certainly true from a masking perspective at short time scales - then it's reasonable to choose the percentile points to ignore the quiet sections and the absolutely loudest sections. That's where the 50%-97.5% numbers came from. (But they're entirely configurable at runtime.)

Those numbers also have a certain simplicity to their meaning, in that they could be interpreted as a quasi-peak-to-mean measurement, in a weird way.

It's also important to avoid the 100th percentile for LP vs CD comparisons, to make them less sensitive to pops and ticks in the LP.
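
In code, the statistic itself is then just a percentile difference over those block-loudness values - a minimal sketch, with the percentiles exposed as the runtime-configurable parameters mentioned above:

Code:

    import numpy as np

    def percentile_dr_db(loudness_db, lo_pct=50.0, hi_pct=97.5):
        """Dynamic-range figure as the difference of two percentiles of
        the loudness histogram, skipping the quiet half and the extreme
        top tail (both percentile points are configurable)."""
        lo, hi = np.percentile(loudness_db, [lo_pct, hi_pct])
        return hi - lo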

Quote
I'm also not convinced about unilaterally throwing away silence.  I think I'd want to start by seeing the whole histogram.


I believe I've argued (and if I haven't, I am now!) that any dynamic range estimate is meaningless without knowledge of the dynamic range of the listening environment. I believe that for some pairs of music, at high listening SNRs one will be perceived to have a higher DR than the other, but at low SNRs, the situation will be entirely reversed.

This is also a way to make LP vs CD comparisons easier - the idea being that you could compare the dynamic range across an entire album of multiple songs, and pfpf would automatically ignore the silence between tracks. If silence weren't discarded, inserting it would be a pathetically easy way to game the estimated dynamic range upwards.

It's also very useful at short time scales as a primitive masking model. One could argue that an electronic sample that goes from silence to 0 dB and back down on the order of milliseconds has virtually no dynamic range, but it would saturate an estimator at short time scales.

Again, this can be defeated by configuring the silence threshold to some extremely low value (like -100 dB).

That's pretty interesting. It looks like splitting the signal into three (overlapping) bands prior to measurement would add extra information to the measurement.
Well, the logical conclusion I'm thinking of is to run pfpf on each frequency band of a 1-5ms-long FFT, with perhaps another FFT running on a decimated signal to get <1000 Hz numbers. You'd then have mean/low/hi/peak values for every frequency. That of course completely abandons the idea of measuring perceived loudness - the core BS.1770 filter becomes useless, although the multichannel algorithm is still required. But we'd be getting quite a bit more information in return.
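
A sketch of what that per-band front end might look like (the 128-sample window is my guess at the "1-5ms" figure - about 2.9ms at 44.1kHz):

Code:

    import numpy as np

    def per_band_magnitudes(x, fs, nfft=128, hop=32):
        """Short-window STFT giving one magnitude track per frequency
        bin; each row could then be fed to the dynamic-range estimator
        separately."""
        w = np.hanning(nfft)
        frames = [np.abs(np.fft.rfft(w * x[i:i + nfft]))
                  for i in range(0, len(x) - nfft + 1, hop)]
        return np.array(frames).T  # shape: (nfft // 2 + 1, n_frames)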

BTW, another poster and I on GearSlutz independently came up with the idea of a Photoshop-like histogram/level adjustment control to apply across an entire track. Essentially, you could in theory take the histogram and adjust the high/mid/low levels, just as in Photoshop. What you'd end up with is a 2-pass dynamic range compressor that is fully reversible.


An interesting article about Cholakis' analysis.

But I think we can tell the difference between clipressed and not, without going to the trouble of separating the frequency bands.  That's why I'm talking about "sparkle" and "local dynamics", rather than trying to differentiate ppp and fff.


EDIT: I think I already agree with you about most of this. Frequency-dependent measurements are largely about measuring different things.

But: To measure local dynamics aka short-term dynamic range, you must also measure the long-term dynamic range. Otherwise the swings in long term loudness will introduce a huge variability in the short term loudness that you'll never be able to get rid of.

Measuring Dynamic Range

Reply #9
I think we're trying to measure different things.

You're mostly after long-term dynamics, which is fine and says quite a lot about the musical performance.  The statistical approach you've taken sounds perfectly fine for that measure, but only on the "slow" timescale.  I think it's completely wrong for the other timescales - I would prefer to see the median or some other percentile of those, not the difference between two percentiles.

According to your results from another thread, most of your samples had about 6dB or less long-term dynamic range.  Only a couple had over 10dB.  That's pretty small.  Adding the three terms together gives a more sensible set of figures, with most having 18dB or less, about half having less than 15dB, and just a handful of Venetian Snares tracks having up to 24dB.  But the collab track from them had a total dynamic range of only 9dB (and by far the highest overall level).

I'm only after short-term dynamics, the presence or absence of which says rather more about the competence of mastering.  More than likely, they will be present in some parts of the piece but not others.

As I said earlier, I'm planning to take the difference (in dB) between a peak meter and an average meter - roughly corresponding to your "fast" and "medium" timescales, though measured differently - and treat that as a measure of short-term dynamics, or "sparkle".

I should point out that silent parts would read as 0dB of sparkle, whereas real sounds would invariably be positive.  So adding silence would bias the overall reading down - precisely the opposite effect to a measure of long-term dynamics.  But this is something I'll have to try and see.

The real reason for this is that ReplayGain gives a fairly good idea of how much headroom is available (or not) for sparkle to exist in, but not how much of it is actually used.  Some of the waveplots shown in those discussions showed headroom that was emphatically not used - the audio had, for unknown reasons, been hard-limited to various levels short of full-scale.

Measuring Dynamic Range

Reply #10
Well, I've pulled my finger out and actually written the tool.  It needs a bit of cleanup before public release, but it works reliably and "quickly enough".

After actually looking at the meter readings from a wide selection of records, I settled on the following measurement form:

There are two meters, an averaging meter and a peak meter.  The relationship between the two is of primary interest.

The averaging meter is a simple emulation of a VU meter, with attack and decay ballistics consisting of an exponential approach to the rectified waveform in linear voltage space, reaching 99% accuracy after 300ms.

The "peak meter" has the attack ballistics of a BBC PPM, thus reaching 80% accuracy after 10ms.  The decay ballistics are the same as for the averaging meter, and *not* the same as the BBC PPM.  Using the slower decay of the "official" PPM would have given far too much emphasis to diminuendo, particularly at the end of a track where there is a variable amount of silence.

Both meters are then calibrated in power decibels, with the reference level set to -14 dBFS.  The difference between them, in this dB scale, is taken at each sample and labelled "sparkle", being a measure of short-term dynamic range near that instant.

The meters' negative stops are at -58dB, which corresponds to about 12 bits of magnitude between there and full scale (at +14dB).  I considered this to be a reasonable condition for "silence".  The "sparkle" meter has the negative stop at 0dB, since it is a difference.  Obviously my virtual meters have a lot more dynamic range than the mechanical prototypes.

After the entire track has been metered, the track segments with meters at their negative stops are removed from the statistics, which effectively removes all of the silence.  The final "figure of merit" is then derived from the median, 10th and 90th percentiles of the "sparkle" data - namely, add the median and 90th, and subtract the tenth.  Thus, both the average and variance of the short-term dynamics are significant.  A compressed but not limited track will have a respectably high median but very low variance.
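
For the curious, the whole pipeline boils down to something like the following Python sketch. This is my own reconstruction of the description above, not the tool itself, and it assumes simple one-pole followers for both meters:

Code:

    import numpy as np

    def follower_coeff(fs, t, frac):
        """Per-sample coefficient for a one-pole follower that reaches
        the fraction `frac` of a step input after `t` seconds."""
        tau = t / -np.log(1.0 - frac)
        return 1.0 - np.exp(-1.0 / (tau * fs))

    def sparkle_figure(x, fs):
        vu_a  = follower_coeff(fs, 0.300, 0.99)  # VU: 99% in 300ms
        ppm_a = follower_coeff(fs, 0.010, 0.80)  # PPM attack: 80% in 10ms
        rect = np.abs(x)
        vu, ppm = np.empty_like(rect), np.empty_like(rect)
        v = p = 0.0
        for i, r in enumerate(rect):  # slow in pure Python, but clear
            v += vu_a * (r - v)                        # VU: same ballistics both ways
            p += (ppm_a if r > p else vu_a) * (r - p)  # PPM attack, VU-style decay
            vu[i], ppm[i] = v, p
        eps = 1e-12
        vu_db  = 20 * np.log10(np.maximum(vu,  eps)) + 14  # calibrate: 0 dB = -14 dBFS
        ppm_db = 20 * np.log10(np.maximum(ppm, eps)) + 14
        sparkle = np.maximum(ppm_db - vu_db, 0.0)          # negative stop at 0 dB
        loud = (vu_db > -58) | (ppm_db > -58)              # discard "silence"
        p10, p50, p90 = np.percentile(sparkle[loud], [10, 50, 90])
        return p50 + p90 - p10                             # median + 90th - 10th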

I'm still debating how to present this figure for maximum impact, but it gives high values for symphonic, choral, chamber and pre-1996 tracks, and low values for Red Hot Chili Peppers' travesties of mastering (of which "Californication" itself is by far *not* the worst).  Circa-2000 albums that I've tested give intermediate figures.  Game-music tracks I've tested give either high or intermediate values, depending on mastering.  Some particular types of classical music give low-to-intermediate values, such as a Bach organ fugue.

The tool does give inconsistent results for very short tracks, usually drastically over-reading.  The meters require time to settle at the beginning and end, and the fairly crude statistics rely on the bulk of the track to hide these errors.  If that bulk is not present, Bad Things happen to the numbers.  Probably anything less than a minute (of sound, not total track length) will be suspect.  Radio edits of 3 minutes and change should be fine.

The statistics are computed using an approximate method involving buckets, rather than sorting a list of samples.  This is a heck of a lot faster and is probably accurate enough.
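
The idea, roughly (the 0.1dB bucket width here is an arbitrary choice of mine - the real numbers in the tool may differ):

Code:

    import numpy as np

    def bucket_percentiles(values, pcts, lo=0.0, hi=72.0, nbuckets=720):
        """Approximate percentiles from a fixed histogram rather than a
        full sort: O(n) time and constant memory, accurate to within
        one bucket width."""
        counts, edges = np.histogram(np.clip(values, lo, hi),
                                     bins=nbuckets, range=(lo, hi))
        cdf = np.cumsum(counts) / counts.sum()
        return [edges[np.searchsorted(cdf, p / 100.0)] for p in pcts]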

The tool (presently) emits a graph image as well as the figure of merit, which helps to visualise how it arrives at the figure.  (Obviously, I used this to help with design.)  In follow-up posts, I'll try to put up some of these graphs, together with actual figures from various tracks, and a commentary to explain what is shown.

Measuring Dynamic Range

Reply #11
Bad initial values at the start and the end could be solved by fast-initialising the normally "slow to react" time scales, or by initialising them based on the middle of the track, or (if the track is short) by repeating it three times, analysing over all three without resetting, but keeping the results from only the middle one.
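
That third trick is easy to express in code - a hypothetical helper, assuming a meter_fn that returns one reading per input sample:

Code:

    import numpy as np

    def metered_middle_pass(meter_fn, x):
        """Meter three back-to-back copies of a short track and keep
        only the readings from the middle copy, so the meters enter
        and leave it fully settled."""
        n = len(x)
        return meter_fn(np.tile(x, 3))[n:2 * n]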

It's all interesting - though you've mentioned one "problem" already - some sounds, even if accurately recorded with no DRC at all, still have a relatively low dynamic range.

Cheers,
David.

Measuring Dynamic Range

Reply #12
I'm not too worried about the few types of music that genuinely have low dynamic range.  It is generally quite obvious when that is the case, simply by listening to it, and the figure-of-merit I get from a Bach fugue is still comparable to a circa-2000 mainstream pop release (although the Bach is obviously mastered at a lower level).  The worst "clipressed" tracks end up noticeably lower than that.

The graphs also include the readings from a third meter which I hadn't mentioned, which is a "true peak" meter with a fast decay.  I've applied strong anti-aliasing to the resulting image, so you get a sort of "corona" above the peak area.  On "clipressed" tracks, you can clearly see the "corona" being squashed against the full-scale limit.  On Bach, you don't see this happening.  This is of course best illustrated with the actual graphs, once I figure out how to put them up.  Unfortunately, I haven't yet worked out how to put a numerical measure on the corona, though I suspect it could be done in a similar way to my existing figure.

The problem with short tracks is perhaps more serious.  At the moment I'm leaning towards reducing the percentile range used for the variance measure for short tracks, perhaps on a linear scale from one minute (at 10%-90%) to zero length (50%-50%).  It is the variance that is most grossly over-estimated for short tracks, with the median rising only slightly, so I think this would give the right kind of correction.
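
That correction is simple enough to state precisely - a sketch of the linear scale just described (the names are mine):

Code:

    def variance_percentiles(sound_seconds):
        """Shrink the 10%-90% variance window linearly toward the
        median for tracks with less than a minute of sound: (10, 90)
        at 60s or more, collapsing to (50, 50) at zero length."""
        f = min(sound_seconds, 60.0) / 60.0
        return 50.0 - 40.0 * f, 50.0 + 40.0 * f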

The examples I have, however, are "cheers & jeers" tracks from a PC game, which are played after a match and are very short - so much so that I could probably post the entire track here as an example.  Actual music tracks would be very unlikely to end up this short.  Even so, eliminating nasty corner-cases is always good.

Now, I need to figure out where to stick these graphs so that the forum can see them...

Measuring Dynamic Range

Reply #13
Excellent work!  So now that we have two dynamic range meters, I guess we have a reason to start discussing objective testing.

I ran into similar boundary issues at the beginning/end with pfpf. I "solved" them by initializing the three timescale gains to whatever their initial values would be, but really, I think that the specific solution is going to depend on the context of the listening environment. Some samples will make sense to be looped, and some definitely won't. I think this matches the complexity of the situation well. For instance, I would go so far as to argue that the exact same track, placed either at the beginning of a CD or in the middle, would have a dynamic range that is literally perceived to be higher at the beginning.

Measuring Dynamic Range

Reply #14
Objective testing will have to wait until it's in a fit state for release!   

I've resurrected some of my old webspace to put my PNG images in.  Now to see if they'll fit into the forum without having to rescale them first...

Measuring Dynamic Range

Reply #15
To start off with, here is an exemplary Romantic-period symphonic track:  Berlioz' Le Carnaval Romain overture.

This version is from a Philips Silver Line Classics release, presumably created to show off CDDA's capabilities.  There is certainly plenty of dynamic range, and the mastering level is quite conservative.



ReplayGain: -1.01 dB (track)
Sparklemeter: 13.54 dB

I'm using the "raw" Sparklemeter figure of merit for this track (because I still haven't decided how to write it down in the long term), which is the sum of the median and variance, in dB.

This would be a good opportunity to point out the features of the graph you're looking at.  Apologies for the width; if it breaks the forum, let me know and I'll substitute a smaller version with a link to the bigger one.

Obviously, it's like a one-sided waveplot - the width of the image varies with the length of the track, so there's always 4 pixels per second.  Added to the left side is a histogram of the various components.  The main trick is that the vertical axis is logarithmic - each red line represents 6 dB, and there's 4 pixels per dB.

The third line from the top is the recommended -14 dBFS reference level for VU meters; the fourth line is conveniently at -20 dBFS.  Counting from the bottom, the first line is 4 dB (for "sparkle") and the next is 10 dB.

The other trick is that instead of showing only the digital peaks, it shows the VU (blue), the "sparkle" reading (green), and the digital peaks *above* the peak meter's reading (red corona).  All of the meter readings are unweighted, so you shouldn't take them as psychoacoustically accurate.

As you can see from the green histogram, the median sparkle is about 9-10 dB, and the variance is quite wide.  Keep an eye on the shape of the green histogram in other tracks - it varies quite a lot.

Less visible is the blue histogram (magenta where it overlaps with the red one), but you can see that it is very wide with only one smallish mode.  This is what you'd expect from a symphonic work.

Look closely at the far bottom right of the graph - there is background noise visible at the end of the track, which covers the least significant 3-4 bits of CDDA even at this lowish mastering level.  Thus, extending the graph and the meter range further down would be pointless.

Measuring Dynamic Range

Reply #16
Next up is a similarly exemplary modern track - Leftfield's Leftism from 1995.



ReplayGain: -1.81 dB (track)
Sparklemeter: 15.40 dB

The mastering level here is clearly around the -20 dBFS mark, and it shows.  There is plenty of room left for a really phat drum and bass line, and they darn well use it.  There's even more in the way of short-term dynamics than in the Berlioz track - but a bit less in terms of long-term dynamics.

Notice that there is a dual mode on the sparkle histogram.  That gives the track a fairly normal median but an unusually wide variance, which is a Good Thing.  Also notice that there is a strong mode on the VU histogram - large portions of the track are at a steady loud level on that timescale.

Measuring Dynamic Range

Reply #17
And now for a thoroughly nasty one:  RHCP's Emit Remmus from 1999.



ReplayGain: -12.27 dB (track)
Sparklemeter: 9.28 dB

The graph says it all, really.  But let me point out the key features for you anyway:

The red "corona" is squashed against the top margin.  That's a sure sign that severe limiting or even clipping is going on.  On normal tracks, the digital peak can be substantially above the PPM reading for short instants, but on this one it never has a chance.

The VU meter is at about -12 dBFS for most of the track, but reaches up to -8 dBFS for a significant time.  Look at the sparkle graph (green) for that section - notice that it is both depressed and flat compared to the -12 dBFS sections (which are already rather flat compared to earlier examples).  This is precisely what I expected to see when I started this thread.

The sparkle histogram shows a very narrow distribution, which is also at least 1dB down at median from normal levels.  This is why I'm using a combination of the median level and the variance.


And now for something a bit more moderate:  The Manic Street Preachers' If You Tolerate This Your Children Will Be Next.  Rather appropriate, I think.



ReplayGain: -8.23 dB (track)
Sparklemeter: 9.83 dB

This is a track that has clearly been run through a compressor, but unlike the RHCP track, hasn't had the levels forced up so far that the associated limiter is obvious.  However, you can still make out a brightening of the corona in the slightly louder latter half of the track, which suggests to me that some extra processing is coming into effect there.

The sparkle histogram shows another very narrow distribution, at a slightly low median level.  The VU histogram also shows a very narrow double-mode distribution.

Measuring Dynamic Range

Reply #18
Finally (for today), here are a couple of Bach tracks.  Both of these are from a Decca "Best Of..." disc, which I know to have been remastered in a fairly basic fashion from old analogue recordings.  The background hiss is quite noticeable, until the engineer smartly fades it out at the end.

First is a small concerto - the slow movement from Brandenburg Concerto #2, in fact.



ReplayGain: +4.87 dB (track) | -1.82 dB (album)
Sparklemeter: 12.39 dB

The long-term dynamic range is nowhere near Berlioz standards - this is a relatively early composer, who didn't have access to the vast orchestras of later periods - but the short-term is just fine when compared to the Manics and RHCPs.  The recording level appears low, but it's a slow movement and thus expected to be on the quiet side.


The other one is Wachet auf, ruft uns die Stimme, which is effectively an organ fugue.  This one shows unusually low dynamic range for a classical piece, and is thus an interesting example.



ReplayGain: +6.85 dB (track)
Sparklemeter: 10.94 dB

The recording level is definitely on the low side for this one, but with the low dynamic range it probably reduces listener fatigue that way.  Even so, it still clearly has better dynamic range than either of the compressed modern tracks and is thus more listenable.


I've uploaded the graphs for all tracks from the discs I used, so if you want to have a look at them, feel free to fiddle with the image URLs a bit.  I have other recordings analysed as well, but not uploaded - these include an eye-opening comparison of a well-known artist at different times.

Edit: oh drat, it looks like I *did* break the forum formatting.  Miniature versions of the images are being put up now.

Measuring Dynamic Range

Reply #19
The measurements seem to be fine, but how about trying to recreate the scenario that mastering studios are using today?

Take a good track with high dynamics, use a good compressor (AFAIK WAVES has a pretty good one) with various settings, and compare the results.

Measuring Dynamic Range

Reply #20
The VU meter is at about -12 dBFS for most of the track, but reaches up to -8 dBFS for a significant time.  Look at the sparkle graph (green) for that section - notice that it is both depressed and flat compared to the -12 dBFS sections (which are already rather flat compared to earlier examples).  This is precisely what I expected to see when I started this thread.
I really like that (the green graph - not the mastering!).

Have you / will you share this software?

Cheers,
David.

Measuring Dynamic Range

Reply #21
Me likey the pretty graphs. It's nice to see the same units (VU dB) that mastering engineers use for referring to loudness; pronouncements about -8 dB being insanely loud now make more sense.

To be honest, I'm really not a fan of the corona plot. I don't see any information that can be gleaned from it that couldn't also be gleaned from a direct waveform plot more accurately. And waveform plots aren't known for their accuracy in evaluating mastering!

The Sparkle reading of 9.28 dB for the RHCP track is very suspect to me. Everybody knows that record is squashed to hell. For it to have only a shade less sparkle than the relatively uncompressed track below it - and only 1 dB lower than a Bach organ piece! - well, it's a little hard to swallow that that might be an accurate measure of the short-term dynamic range. My gut tells me that it's just being overly sensitive to drum transients, which is what one would predict from a PPM meter / VU meter comparison, I suppose.

Then again, drum transients do make a real difference in the dynamics of a piece, and symphonic music with little percussion is going to have less dynamics as a result, in a very real sense. The alternative explanation for those numbers is that, in fact, very little dynamics have been lost in Californication. Certainly that's not exactly the most PC answer, but I found similar results in pfpf, where except for the most squashed stuff like Merzbow, there's a surprising amount of dynamics left in modern music.

This whole issue, of course, is why I chose a pure variance measurement for pfpf.  With a histogram like that, I would probably be getting medium-term DR readings of <1 dB for the RHCP and ~3-5 dB for the classical. Basing the measurements exclusively on the variance would increase sensitivity to dynamics changes and reduce sensitivity to percussion.

Chromatix, would you like access to pfpf and the relevant screenshots? I have access to some webspace now.

Measuring Dynamic Range

Reply #22
The Sparkle reading of 9.28 dB for the RHCP track is very suspect to me. Everybody knows that record is squashed to hell. For it to have only a shade less sparkle than the relatively uncompressed track below it - and only 1 dB lower than a Bach organ piece! - well, it's a little hard to swallow that that might be an accurate measure of the short-term dynamic range.

Hmm... Chromatix did say the results were reported in terms of raw/unweighted values. For my part, I am eager to get a little "quality-time" with this program! The pre-release results already look very promising.

    - M.

 

Measuring Dynamic Range

Reply #23
Yes, I still need to do some playing with how the "single figure of merit" is calculated.  At the moment it produces numbers that sort in about the right order, but without as much "intuitive impact" as I would like.  Perhaps if I increased the weighting of the variance compared to the median, subtracted a "typical" median value, and converted it back to a ratio, it would have more impact.
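
As a strawman, such a rework might look like this - every constant here is a placeholder of mine, not a tuned value:

Code:

    def reworked_figure(p10, p50, p90, var_weight=2.0, typical_median_db=8.0):
        """Weight the variance term more heavily than the median,
        subtract a 'typical' median, then convert the dB total back
        to a plain ratio for more intuitive impact."""
        total_db = (p50 - typical_median_db) + var_weight * (p90 - p10)
        return 10.0 ** (total_db / 20.0)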

The sparkle graph is very sensitive to drum hits - this is what a PPM is intended to watch for.  However, notice that the RHCP and Manics tracks, which *do* have strong percussion in their scores, still produce lower "sparkle" values than Wachet Auf, which emphatically does not.  This is because both the RHCP and Manics have had strong *compression* applied, though only the RHCP suffers from excessive *limiting*.  Notice that the Leftfield track, which has uncompressed drums, has a very strong sparkle reading, noticeably above even the Berlioz.

Wachet Auf is, as I said, very unusual among my collection of classical tracks.  I think this is because it comes entirely from a single mechanical wind instrument (the organ), which does not have a noticeable "decay" phase (in the sequence Attack, Decay, Sustain, Release).  There is a noticeable "chuff" in the attack phase (as the airflow into the pipe attempts to start the resonant oscillation), but this has at most the same energy as the sustained note, and is merely in a different frequency band.  This is very different behaviour from, say, a piano (which is a percussion instrument).  Also, the piece itself is very calm and laid-back, probably 60bpm or less, and is clearly not meant to have "exciting" dynamics in it.

I think I do need to get some measure of limiting (or more precisely, lack thereof) in there.  The Manics track is noticeably more listenable than the RHCP one, though I suspect this is really due to the harmonic distortion introduced by the limiter, rather than the lack of dynamic range itself.  I would also like to get a (possibly separate) measure of long-term dynamics in, just to please the classical folks who don't like it when drums get over-emphasised.

I also think that the corona graph is useful when talking to laymen and inexperienced audio people, especially those who have only ever seen a VU or a digital peak meter, and/or don't know what the difference is.  The corona shows both where the digital peaks are (approximately) and where the peak-meter reading is (the lower edge of the corona).  You can show people that the corona is squashed against the full-scale limit, and on the same graph show them just how much dynamic range is going to waste.  So I'm going to leave that in.  :-)

Measuring Dynamic Range

Reply #24
The measurements seem to be fine, but how about trying to recreate the scenario that mastering studios are using today?

Take a good track with high dynamics and use a good compressor (AFAIK WAVES has a pretty good one) with various settings and compare the results
Good idea - and I've just done that.  Somebody released an "unmastered" track on this forum called "Hollister - Bismarck", so I downloaded that and did my worst on it using GarageBand.  So I now have five modified versions of it.

The first version is simply a downsampling to CDDA standard, using SoX.  GarageBand won't import something properly at 96/24, so I had to use this downsampled version for the fiddling.  The graph and figures from the downsampled version are essentially identical to the original version.

The second (Remastered Badly), third (Remastered Very Badly) and fourth (Stinky Remaster) versions are successively "hotter", as I figured out how to maximise the level without making it clip audibly.  I think there were a total of four compression filters in use by the last version, including one limiter set for soft-clipping and one aggressive multiband compressor.  The Stinky Remaster still isn't *quite* up to RHCP standards, but it's pretty close.

The fifth version (Limburger Remaster) isn't as "hot" as the other versions, as I simply turned up the compressors as hard as I dared, but wasn't able to "safely" bring up the level to match.  It still sounds bloody awful.

The ReplayGain values are as follows:

CDDA: -0.20 dB
Badly: -5.94 dB
Very Badly: -7.56 dB
Stinky: -8.06 dB
Limburger: -5.76 dB

The Sparklemeter readings are as follows:

CDDA: 7.8
Badly: 4.6
Very Badly: 2.8
Stinky: 1.8
Limburger: 1.4

These are not comparable to the numbers posted earlier, since I've changed the calculation and display methods.  For that reason, I'll give the new versions of the numbers here:

Berlioz: 7.0
Leftfield: 11.0
RHCPs: 1.1
Manics: 1.3
Brandenburg: 4.6
Wachet Auf: 2.5

Obviously, I'm not quite cut out to be a mastering engineer for a major record label.  :-)  However, this is simply what I could sort out in one morning using very ordinary tools and no experience.