Topic: Replay Gain specification (Read 35143 times)

Replay Gain specification

Reply #75
Of course I have already objected (and still object!) to the "because that is what David did" argument regarding "preferred" ID3 tagging, so take the link with a grain of salt.
I'm not following this properly, so haven't seen this, but if you're referring to the sections in the original proposed spec describing how to store the ReplayGain values: if there's a different implementation out there that's far more common than the one I originally suggested, then obviously that common implementation is the de facto standard, and that is what is supposed to be captured in this wiki spec (IMO!).

AFAICT for mp3 it's a bit of a mess (not a huge mess, but not ideal!) and the most useful thing to do might be to document the multiple approaches that are out there and what uses/supports each approach. A presumption could be that the most widely supported approach(es) is/are the de facto standard, and anything with little or no support could be deprecated (even if it's what's in the original proposed standard). I doubt there can be one single standard for RG tags in mp3, since there isn't one single standard for tagging mp3.



Reply #76
So if we're going to replace -20 dB SPL with -14 dB SPL in the spec, then shouldn't we also change the default pre-amp from +6 dB to zero? Or am I not understanding some nuance here?

You are correct. I am working my way from top to bottom to bring the specification up to date with current practice. Last night I finished "Reference level" and "Gain calculation" sections. Next is documenting the metadata. When I get to the player recommendations section I expect it will be revised to specify a 0 dB default preamp gain.

BTW: For these references, it is -20 dBFS not -20 dB SPL. I have chosen not to use "dBFS" in the specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).


Reply #77
+replaygain has about twice as many Google results as +"replay gain".

David has said that he now prefers ReplayGain. I can make the change if there are no strong objections.

It would also be nice if RG had a logo.


Reply #78
You may be aware of this already, but:

- XBMC uses ReplayGain
- XBMC is going to be supported in hardware in some fashion shortly per

So that seems to mean that ReplayGain will be supported in hardware to some extent, along with the rest of XBMC.  If you are not doing so already, maybe it makes sense to reach out to the XBMC developers to inform them of your updates, and see if they can implement any enhancements as part of their hardware porting.


Reply #79
This is work in progress. Vorbis comment documentation is not in there yet but it is coming.

ID3 is the main reason why I'm using Vorbis on my wife's CLIP+ rather than MP3. On the SqueezeBox and on the PC, FLAC all the way. Converting from FLAC to Vorbis guarantees that no tags are lost. Documenting existing VorbisComment practices would probably take a minimal amount of time and would give users a working solution while the ID3 mess gets very slowly argued and resolved.

By the way, the CLIP+ is one of the few players supporting Replay Gain on the original firmware, but ID3 support is brittle and picky, as you would expect. It only accepts the MediaMonkey encoding, and has problems with the Foobar2000 encoding. In your spec, you may want to make sure to specify the number of spaces and whether '+' is included, which seems to be an issue :-(.

Another comment. I don't see how you can have working clipping prevention if you don't have both the track peak and the album peak. If you only store track peak, you may be forced to adjust gain between tracks. If you store only album peak, you may adjust down a track unnecessarily in shuffle mode.




Reply #80
Another comment. I don't see how you can have working clipping prevention if you don't have both the track peak and the album peak. If you only store track peak, you may be forced to adjust gain between tracks. If you store only album peak, you may adjust down a track unnecessarily in shuffle mode.

It depends what you mean by "working clipping prevention". You certainly can prevent clipping with just the track peak metric. It will have the behavior you describe. I'm personally not convinced that this is any less obnoxious than lowering the gain below target for the entire album.

In the end, this sort of clipping prevention is a third-rate solution. The first-rate solution is to set your player up with enough dynamic range so that you never need to boost anything. Second-rate is to use a nice digital peak limiter after the boost.

Nevertheless, I do plan to add discussion in the RG specification of how album peak can be used to prevent clipping when album gain mode is used.
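
For readers following along, here is a minimal sketch of the trade-off discussed above. It assumes peaks are stored as linear values where 1.0 is digital full scale; the function name and the example numbers are hypothetical and not taken from the spec.

```python
import math


def clip_safe_gain_db(gain_db, peak):
    """Lower gain_db just enough that peak * linear_gain never exceeds 1.0.

    peak is a linear sample peak where 1.0 is digital full scale.
    """
    if peak <= 0.0:
        return gain_db  # digital silence cannot clip
    headroom_db = -20.0 * math.log10(peak)  # largest gain that still fits
    return min(gain_db, headroom_db)


# Hypothetical two-track album that wants +3 dB of album gain.
album_gain_db = 3.0
track_peaks = [0.5, 0.95]
album_peak = max(track_peaks)

# Limiting per track: the first track keeps +3 dB, the second does not,
# so the gain changes between tracks of the same album.
per_track = [clip_safe_gain_db(album_gain_db, p) for p in track_peaks]

# Limiting by album peak: one gain for the whole album, below target.
per_album = clip_safe_gain_db(album_gain_db, album_peak)

print(per_track, per_album)
```

With only track peaks the relative levels within the album shift; with the album peak they are preserved but every track sits below the target, which is the behavior described in the reply above.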


Reply #81
So that seems to mean that ReplayGain will be supported in hardware to some extent, along with the rest of XBMC.  If you are not doing so already, maybe it makes sense to reach out to the XBMC developers to inform them of your updates, and see if they can implement any enhancements as part of their hardware porting.

In theory there will be no enhancements in the first version of the specification. It's all about documenting current practice in more prescriptive language. This first version is half complete at this point and therefore not yet self-consistent. Anyone looking at it now will get confused, so it is not a good idea to share it beyond the people following it here. Hopefully it will be finished in a few more weeks.

Thanks for all the comments everyone. They have really helped me up the learning curve.


Reply #82
I have chosen not to use "dBFS" in the specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).

I went ahead and added a note to that effect at the first use of "dB relative to a full-scale sinusoid", but that phrase is only used twice, and I think it needs to be used a few more times. I think the spec would be more clear if we just used dBFS everywhere it's needed and had a note on the first use saying that for our purposes, dBFS is dB relative to a full-scale sinusoid. I was about to do that, then backpedaled a bit. I'd rather not be the one responsible


Reply #83
I think that this v2.0 effort of the ReplayGain spec should also include the following:

Tag: Peak_Track
Type: 32bit float normalized

Tag: Peak_Album
Type: 32bit float normalized

Tag: RMS_Track
Type: 32bit float normalized

Tag: RMS_Album
Type: 32bit float normalized

Peak_Track and Peak_Album are hopefully obvious enough.
The reason for 32bit float is that it's more precise than dB (which has conversion and rounding issues to/from float).
A float peak for the track and album can be used directly in the audio processing chain (after conversion from text first obviously, unless the metatag format allows binary float that is).
0.0 is silence and 1.0 is full digital, this is pretty much as industry standard as you can get, and it's likely that 1.0 will remain full digital for the foreseeable future, regardless of whether dB or LU is "popular".
And if the tag is textual the float could even be 64bit, as a player could simply truncate the text. (Some text-to-binary float routines already do this.)

Please note I wrote Peak_ rather than say ReplayGain_Peak_Track etc, as I believe track and album peak is important enough to be available that it should not be "name" limited to just the ReplayGain standard,
in fact peak should be fully independent and just part of the standard audio format meta tag specs instead.

Which brings me to the RMS_Track and RMS_Album tags.
Again it's 32bit float (but again implementations should handle a 64bit by truncating if the tag is text obviously),
and if a track had a "RMS_Track=0.1" then that would equal -20dBFS (which is exactly float value 0.1).

The calculation of the RMS should be (and I believe this is again the industry standard way):
50ms window (sine I believe?), each channel is squared and the means of the channels are added and then the root is calculated. (If I recall correctly, ReplayGain actually does this at one of its stages?)
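
A rough sketch of that calculation, under stated assumptions: the description leaves the window shape open, so this uses a plain rectangular 50 ms window, and it adds the per-channel mean squares before taking the root exactly as worded above. The function name `window_rms` and its parameters are hypothetical.

```python
import math


def window_rms(channels, sample_rate, window_ms=50):
    """Return one RMS value per window.

    channels: list of equal-length lists of float samples in [-1.0, 1.0].
    Assumption: rectangular (unweighted) windows, no overlap.
    """
    window_len = max(1, int(sample_rate * window_ms / 1000))
    total_len = len(channels[0])
    values = []
    for start in range(0, total_len, window_len):
        summed_mean_squares = 0.0
        for channel in channels:
            block = channel[start:start + window_len]
            # Mean of the squared samples for this channel and window.
            summed_mean_squares += sum(s * s for s in block) / len(block)
        # Channel means are added, then the root is taken.
        values.append(math.sqrt(summed_mean_squares))
    return values


# Tiny example: 100 samples at a made-up 1000 Hz rate gives two windows.
mono = [[0.5] * 100]
stereo = [[0.5] * 100, [0.5] * 100]
print(window_rms(mono, 1000), window_rms(stereo, 1000))
```

Note that adding the channel means (rather than averaging them) makes identical stereo content read about 3 dB higher than the same content in mono, so a tool would need to pick one convention and stick to it.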

Again RMS_Track and RMS_Album are named such as they like Peak_ should be part of the audio format meta spec.

The benefit is threefold, as implementations like ReplayGain etc improve or adapt new loudness curves and top/bottom percentage cutoff are adjusted, the RMS remains constant (Z filter, aka flat filter, or neutral if you will)
The artists could theoretically hand tag tracks they publish and get the RMS from say Adobe Audition or similar tools, almost every "studio" audio tool lets you do RMS (provided they also show the float value in addition to the dBFS).
The users can simply tell their audio player that they'd prefer the music loudness adjusted to -20dBFS.

Implementation is thus rather easy in the player.
1. The user specified -20dBFS as preferred loudness (I advise this to be the default value too as it matches with the K20 of Bob Katz "K-System", which in turn matches the SMPTE standard).
2. The player translates -20 dBFS (in this case) to 0.1 float.
3. The player finds the RMS_Track tag, which let's say is 0.2 (around -13.98 dBFS, aka K-14)
4. So the math is 0.1 / 0.2 = 0.5, meaning that the player simply need to do: (SMPL * 0.5)
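
The four steps above can be sketched as follows. This is only an illustration of the arithmetic: the function names are made up, and RMS_Track is the tag proposed in this post, not an existing standard.

```python
import math


def dbfs_to_linear(dbfs):
    """Convert a level in dB to a linear amplitude ratio (-20 -> 0.1)."""
    return 10.0 ** (dbfs / 20.0)


def linear_to_dbfs(value):
    """Convert a linear amplitude ratio to dB (0.2 -> about -13.98)."""
    return 20.0 * math.log10(value)


def rms_scale_factor(target_dbfs, rms_track):
    """Factor to multiply each sample by so the track RMS hits the target."""
    return dbfs_to_linear(target_dbfs) / rms_track


# Steps 1-4: target -20 dBFS, proposed RMS_Track tag value of 0.2.
factor = rms_scale_factor(-20.0, 0.2)
print(factor, linear_to_dbfs(0.2))
```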

This opens up for a lowest common denominator, ensures that even if ReplayGain or R128 is replaced by something new that playback is consistent.
With Peak_Track, Peak_Album, RMS_Track and RMS_Album as a minimum common denominator then the likes of ReplayGain and R128 are optional.
A user can choose to use RMS only, or RMS with ReplayGain preferred or use R128 but fallback to ReplayGain if missing and if that is missing then fallback to RMS.
ReplayGain or R128 may be obsolete one day, but RMS will not as it is the base of almost all these loudness algos out there.

It also shouldn't be so hard for the ReplayGain code to also spit out the RMS tag as well as the ReplayGain tag.

Some of you may say that RMS is not as accurate as ReplayGain etc.
But I do not agree with that, as most music is mixed by someone listening, anything that shouldn't be there is filtered out.
This means that a low frequency or high frequency sound that is present is 99% intentionally left there on purpose.

I have in many cases reacted to ReplayGain adjusting songs wrongly.
Like when a track with a very bassy sound is suddenly tagged (by foobar 2000 RG scanner) to need a +6 dB gain and it already has peaks at 0dBFS (or sometimes above with lossy formats),
while a track with little bass is tagged with -3dB yet I can clearly hear that the bass modest track should be twice as loud while the bass heavy should have been at least twice as soft to match my -20dBFS preferred average.
And trust me, I double checked and re-scanned, and the funny thing is that when I ran the same tracks through my own plain simple RMS (done as described earlier, which should be the standard way to do RMS AFAIK),
the RMS results actually matched what I heard.

And here is the even more amusing thing. Last time I did a full scan of a large part of my music collection, it turned out that overall (summed average of all loudness values),
the plain RMS only deviated +/- 1dB versus ReplayGain. (We're talking thousands of tracks, from metal, to techno to pop, to standup, to film scores and classic, and "chip" computer music.)
But always when I hear that RG is wrong vs what I hear, and I check with my RMS tool I find that the RMS value is always correct.

I always wondered why, but the answer is simple. It is the loudness curve combined with the percentile selection, both "ignore" parts of the sound.
While RMS takes it all. Sure, a straight RMS scan may get biased by noise. Then again...a lot of music is noise these days.
Loudness curves works best with tones, while RMS works "ok" with anything, from tones to static. And it's never that much off from what you hear vs the RMS values.

So I think that adding generic Track/Album tags for Peak and RMS as a minimum requirement, is a must before evolving ReplayGain further and adding loudness versions,
or R128 or ITU whatever into this mess.
And while RMS is not perfect, it has certainly never been wrong that I can recall hearing.

Also, RMS does not have issues with which loudness curve or which percentile range should be used.
Nor does it have the confusion of 89 vs 83dB SPL reference level that ReplayGain has. (I find it amusing that ReplayGain itself fell prey to the louder-is-better trap: people complained their pop tracks were not loud enough, so voila, there's the +6dB change.) Now ReplayGain is "trapped" at 89dB SPL (which is -14dBFS aka K-14) while it should have been K-20.
Maybe v2.0 of ReplayGain may mitigate that, or the R128 proposal etc.
But in the meantime my Peak and RMS suggestions above could be rolled out right now, both are established standards, easy to write and read, and would not conflict with ReplayGain nor any future similar standards.

I'm a programmer and a musician in case anyone is wondering. I've released 3 albums. And as an experiment I made sure the RMS of them was at -20dBFS (aka floating point at around 0.1).
ReplayGain insisted on making some tracks louder while others softer, again it was the loudness curve (or was it the percentile selection or a combo of both?) that was way off base.
As the artist that made those tracks I knew darn well how loud not only the tracks overall were, but also how loud they are intended to be at certain points.
And again my own RMS tool (named K20RMS for obvious reasons) showed me what I heard, that the loudness algo was nuts.
When a track with a very high bass sound that is loud enough originally is supposed to be amplified by +10 dB, then something is clearly not right; it should have been approx +0 dB instead.

The perfect loudness filter will never be possible. Everyone hears things differently, whether due to race, age, emotion, attention, environment or hearing damage, or what is used to emit the sounds.
And every piece of audio sounds different as far as loudness goes, based on environment, race, age, emotion (do I need to go on?)
The only way to get a perfect loudness filter is to play all recordings made to all people that can hear....good luck with that, and the extreme cases will still be upset due to the averaging of such filters.

But RMS on the other hand is unbiased, so if -20dBFS sounds ok for someone then it probably does. While others may like it at -18dBFS, while some may like it at -23dBFS. Let the user set the preferred RMS loudness, and optionally let them choose which method they'd like to take precedence over which others. In my case I'd probably choose RMS only but... *shrug*

Users get confused with the Preamp alone. Add to that the still existing confusion of ReplayGain's 83 and 89db SPL, the upcoming ReplayGain "revision", and the new R128 and so on, and we aren't exactly making the fight against the loudness war any easier are we?

Another thing to remember is that RMS calculation is darn cheap (2-3 lines of code added to a normal DSP loop maybe) compared to loudness filters, can be done faster than realtime, and no patent or license issues at all, and RMS is not tied to any SPL, as RMS is fully unbiased.
RMS would also act as a fallback in cases where the user has enabled clipping prevention but ReplayGain or R128 etc. insists on adding so much gain that the track would clip. The player could in that case look at the RMS and most likely see that the audio is already at the exact preferred level of the user in the first place, for example.

Some audio formats do have Peak support, but track only, so no album peak, and none has so far had either track or album RMS; these four should be a bare minimum in any format these days.
Heck it would take Apple hardly any time at all to do a Peak and RMS scan of the entire iTunes catalog and update the tags, they might even be willing to do so if asked nicely, but good luck trying to get them to do a ReplayGain scan or R128 scan etc...

Sorry if this came out as a rant against ReplayGain, it really isn't. (at least not directly) I'm an avid fan of the efforts of the likes of Bob Katz and his K-System (especially K-20).
And ReplayGain is used on all the tracks in my music collection, and foobar2000 is set to gain+clipping prevention and track mode. It just nags me that every now and again a track comes along that RG has been way off base with.
Sure I could manually tweak the tag. But when a similar RMS tag would not need any manual tweaking I'd rather want to set foobar2000 to use -20dBFS preferred and use RMS tags even if ReplayGain tags are present.

I may just end up writing a proper tool one of these days which makes Peak_Track and Peak_Album and RMS_Track and RMS_Album tags. (+ ReplayGain tags with a REPLAYGAIN_ALGORITHM=K20RMS for compatibility reasons, which would not be needed if Peak and RMS was already supported, that is...)


Reply #84
Rescator you totally confused me. You're talking about additions to the ReplayGain spec but then it really sounds like you gave a lot of examples of what you perceive as failings of the current ReplayGain algorithm.

Why would the current spec, which presumes use of the current algorithm (and therefore also assumes that one sees value in using the ReplayGain solution despite lack of a perfect algorithm), add requirements to support a completely separate set of parallel RMS tags? Do the values have a use for a player that wants to apply ReplayGain as intended? If you don't want to apply ReplayGain (because it doesn't work or whatever) then why would the ReplayGain spec have anything to do with a completely alternative solution?



Reply #85
Because "ReplayGain" is becoming an umbrella standard.
It is moving towards a v2.0

Currently there is (let's call them this for simplicity sake):
RG83 and RG89
Then there will most likely be a RG2.0
plus there already IS a R128 (it ids itself with the tag REPLAYGAIN_ALGORITHM: EBU R128)

So that's RG83, RG89, R128, and probably RG2.0 down the road.
All I'm saying is to add RMS as well.

ReplayGain is both a specification and an algorithm,
I like the specification but not the algo (as I pointed out).
But I also see no point in say the peak and rms having to be in REPLAYGAIN_TRACK_RMS tags when just RMS_Track, RMS_Album and Peak_Track, Peak_Album would do.
Thus the REPLAYGAIN_TRACK_PEAK and REPLAYGAIN_ALBUM_PEAK would just be phased out slowly.
As I said Peak and (IMO) RMS are so basic all formats should have them, I do not see a need to plonk REPLAYGAIN_ in front of them.
But that doesn't mean that the ReplayGain specification can't standardize what I explained about Peak and RMS in the post above.


Reply #86
Excuse me if I missed this point in your long post. It seems to me that your personal findings with respect to using RMS as a predictor of perceived loudness are in conflict with what the literature says. How do you explain this discrepancy?


Reply #87
More importantly, please keep this thread for discussing the wiki ReplayGain spec - which is being authored to reflect current practice.

Ideas for improving ReplayGain are always welcome, but belong in their own threads.



Reply #88
David has said that he now prefers ReplayGain. I can make the change if there are no strong objections.

There were no objections. I've made the edits. We're now officially "ReplayGain".


Reply #89
I've cleaned up the HydrogenAudio wiki and Wikipedia articles to reflect the camel case name: ReplayGain.


Reply #90
Would it be possible to provide links in the spec to pink noise reference samples? I know there is a link somewhere, but even that file targets the old 83dB reference. Also, mono and stereo versions would be nice.

FYI, I've finished implementing ReplayGain in native C# by following the revised specifications. It was pretty easy to understand, even for a layman developer. Good work!


Reply #91
One thing that was a tad confusing is how 2 channels should be analyzed together. The WIKI says:

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise

This suggests to me that the left and right channels should be summed (or doubled for mono), then RMS should be calculated on these values. The "reference" implementation from replaygain.dll however calculates the RMS using all samples divided by the number of channels, which sounds more like it's calculating the average of the two channels for each 50ms window.

Does that make sense? I assume that the reference implementation is correct and that the wording is confusing, or that my understanding is simply muddled. Which is it?


Reply #92
Because it is all defined in terms of the reference, I don't think it matters how you combine the two channels so long as it is balanced and you do it the same way for the reference signal and the test signal.
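
A small sketch of why the choice cancels out, at least for two common variants: adding the per-channel mean squares versus averaging them. The sine waves here are stand-ins chosen only to keep the example deterministic (the real reference is pink noise), and all names and amplitudes are made up for illustration.

```python
import math


def mean_square(samples):
    return sum(s * s for s in samples) / len(samples)


def level_db(left, right, combine):
    """Level in dB of a stereo signal under a given channel-combining rule."""
    return 10.0 * math.log10(combine(mean_square(left), mean_square(right)))


def summed(ms_left, ms_right):
    return ms_left + ms_right


def averaged(ms_left, ms_right):
    return (ms_left + ms_right) / 2.0


def sine(amplitude, n=1000, cycles=10):
    return [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


# Stand-in reference and test signals (not real pink noise or music).
ref_left, ref_right = sine(0.2), sine(0.2)
test_left, test_right = sine(0.5), sine(0.3)

# Level of the test signal relative to the reference under each rule.
offset_summed = (level_db(test_left, test_right, summed)
                 - level_db(ref_left, ref_right, summed))
offset_averaged = (level_db(test_left, test_right, averaged)
                   - level_db(ref_left, ref_right, averaged))

print(offset_summed, offset_averaged)
```

The two rules differ by a constant factor of 2 (about 3 dB) on every signal, so the offset from the reference comes out the same either way. Adding the left and right samples together before squaring is a different case, since the result then also depends on how correlated the two channels are.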

It would be confusing to link to the SMPTE -20 dB reference signal. Does someone want to volunteer to generate a -14 dB version for us?


Reply #93
What happened to all of this? I wanted to add ReplayGain to an application of mine, but I stumbled upon so much stuff and so few actually useful things like libraries, and I would prefer to hand over the analysis and calculation to someone who knows the stuff.
It would be good to have a reference analyser implementation in major programming languages separate from LAME so that it can be used directly by application developers and thus spread faster.

That said, I liked the proposal of Rescator to create standard Peak_Track and Peak_Album tags, since I think that the current keys are a bit long and verbose, and since peak isn't anything inherently dependent on RG itself. If that gets recognised, the other two could also be shortened to RG_Track and RG_Album.
I also like the idea of 0.0 to 1.0 represented by a simple float. Currently I have to parse the tags as strings from values like -1.76 dB, which is quite ugly in my opinion.
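
For what it's worth, the string parsing can be kept fairly small. This is a sketch that assumes values shaped like "-1.76 dB" and deliberately tolerates the variations mentioned earlier in the thread (optional '+', optional or extra spaces); the function names are hypothetical and the accepted grammar is my own guess, not something the spec defines.

```python
import re

# Optional sign, digits with optional fraction, optional "dB" suffix.
_GAIN_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*(?:dB)?\s*$", re.IGNORECASE)


def parse_gain_db(text):
    """Parse a gain tag value such as '-1.76 dB' into a float (in dB)."""
    match = _GAIN_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised gain value: {text!r}")
    return float(match.group(1))


def db_to_linear(gain_db):
    """Convert a gain in dB to the linear factor applied to samples."""
    return 10.0 ** (gain_db / 20.0)


for raw in ("-1.76 dB", "+6.00dB", "  -3 db "):
    gain = parse_gain_db(raw)
    print(repr(raw), gain, db_to_linear(gain))
```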
