Replay Gain specification

I've taken the first step towards fulfilling my threat to produce an up-to-date edition of the Replay Gain specification.

The working draft is published on the Hydrogen Audio Wiki. As it currently stands, this is a copy-paste of David's (2Bdecided) original proposal. The next steps include copy editing to make it read like a standard, digging through the post-publication discussion on the Hydrogen Audio forums (and elsewhere?), and conforming the specification to current practice.

If you would like to make small changes and corrections to the draft, feel free to edit the wiki. If you know of larger changes that need to be made, let's discuss them in this thread first.

Reply #1
Great work Notat.

Cheers,
David.

Reply #2
In almost all cases RG as currently defined gives very good results. Having RG is a real advantage. However, there are a few exceptions:
  • Consider two classic rock albums, Lynyrd Skynyrd's "One for the Road" (original CD issue, not Deluxe/remaster) and Pink Floyd's "The Wall". Both of them have strongly fluctuating track gains:
    • >wavegain -n -a *.wav

      Processing directory '.':

      Analyzing...

          Gain  |  Peak  | Scale | New Peak |Left DC|Right DC| Track
                |        |      |          |Offset | Offset |
      --------------------------------------------------------------
        -2.32 dB |  24401 |  0.77 |    18682 |  -14  |  -10  | 01_workin_for_mca.wav
        -2.51 dB |  27629 |  0.75 |    20695 |  -14  |  -10  | 02_i_ain_t_the_one.wav
        -1.91 dB |  25477 |  0.80 |    20448 |  -14  |  -10  | 03_searching.wav
        +1.40 dB |  22177 |  1.17 |    26056 |  -14  |  -10  | 04_tuesday_s_gone.wav
        +0.62 dB |  19248 |  1.07 |    20673 |  -14  |  -10  | 05_saturday_night_special.wav
        +2.15 dB |  17759 |  1.28 |    22747 |  -14  |  -10  | 06_whiskey_rock_a_roller.wav
        +3.03 dB |  18902 |  1.42 |    26793 |  -14  |  -10  | 07_sweet_home_alabama.wav
        +1.48 dB |  18897 |  1.19 |    22408 |  -14  |  -10  | 08_gimme_three_steps.wav
        +1.55 dB |  19715 |  1.20 |    23567 |  -14  |  -10  | 09_call_me_the_breeze.wav
        +3.80 dB |  14983 |  1.55 |    23205 |  -14  |  -10  | 10_the_needle_and_the_spoon.wav
        +3.07 dB |  16378 |  1.42 |    23321 |  -14  |  -10  | 11_crossroads.wav
        +2.31 dB |  16157 |  1.30 |    21079 |  -14  |  -10  | 12_free_bird.wav

      Recommended Album Gain:  -0.10 dB      Scale: 0.9886


      WaveGain Processing completed normally

      > _
    • >wavegain -n -a *.wav

      Processing directory '.':

      Analyzing...

          Gain  |  Peak  | Scale | New Peak |Left DC|Right DC| Track
                |        |      |          |Offset | Offset |
      --------------------------------------------------------------
        -4.81 dB |  31879 |  0.57 |    18323 |    0  |    0  | 01_in_the_flesh.wav
        +0.57 dB |  28577 |  1.07 |    30515 |    0  |    0  | 02_the_thin_ice.wav
        +2.14 dB |  31226 |  1.28 |    39950 |    0  |    0  | 03_another_brick_in_the_wall_part_1.wav
        +0.09 dB |  31879 |  1.01 |    32211 |    0  |    0  | 04_the_happiest_days_of_our_lives.wav
        +0.59 dB |  30688 |  1.07 |    32845 |    0  |    0  | 05_another_brick_in_the_wall_part_2.wav
        +2.28 dB |  30751 |  1.30 |    39982 |    0  |    0  | 06_mother.wav
        +2.42 dB |  15728 |  1.32 |    20781 |    0  |    0  | 07_goodbye_blue_sky.wav
        +2.34 dB |  18891 |  1.31 |    24732 |    0  |    0  | 08_empty_spaces.wav
        -2.37 dB |  31879 |  0.76 |    24266 |    0  |    0  | 09_young_lust.wav
        -2.83 dB |  31879 |  0.72 |    23015 |    0  |    0  | 10_one_of_my_turns.wav
        +0.96 dB |  30650 |  1.12 |    34232 |    0  |    0  | 11_don_t_leave_me_now.wav
        -0.91 dB |  31879 |  0.90 |    28708 |    0  |    0  | 12_another_brick_in_the_wall_part_3.wav
        +8.70 dB |  9227 |  2.72 |    25122 |    0  |    0  | 13_goodbye_cruel_world.wav
        -0.58 dB |  31153 |  0.94 |    29141 |    0  |    0  | 14_hey_you.wav
        +6.35 dB |  21597 |  2.08 |    44864 |    0  |    0  | 15_is_there_anybody_out_there.wav
        +2.25 dB |  20310 |  1.30 |    26316 |    0  |    0  | 16_nobody_home.wav
        +3.77 dB |  15999 |  1.54 |    24693 |    0  |    0  | 17_vera.wav
        -3.36 dB |  29563 |  0.68 |    20079 |    0  |    0  | 18_bring_the_boys_back_home.wav
        -3.28 dB |  31263 |  0.69 |    21430 |    0  |    0  | 19_comfortably_numb.wav
        +0.61 dB |  25850 |  1.07 |    27731 |    0  |    0  | 20_the_show_must_go_on.wav
        -0.82 dB |  31879 |  0.91 |    29007 |    0  |    0  | 21_in_the_flesh.wav
        -3.45 dB |  31879 |  0.67 |    21429 |    0  |    0  | 22_run_like_hell.wav
        -5.03 dB |  31879 |  0.56 |    17865 |    0  |    0  | 23_waiting_for_the_worms.wav
        +9.42 dB |  6138 |  2.96 |    18156 |    0  |    1  | 24_stop.wav
        -0.38 dB |  28105 |  0.96 |    26902 |    0  |    0  | 25_the_trial.wav
      +18.03 dB |  3771 |  7.97 |    30057 |    0  |    0  | 26_outside_the_wall.wav

      Recommended Album Gain:  -1.89 dB      Scale: 0.8045


      WaveGain Processing completed normally

      > _

    But both albums differ in a very important property:
    • All tracks on Lynyrd Skynyrd's "One for the Road" share the same characteristics; all of them are strong rock songs. It's not obvious why the album was produced this way. To correct this, I would prefer to always listen to it in track gain mode, no matter what mode I've chosen in the RG configuration.
    • The opposite is true for Pink Floyd's "The Wall". The album consists of a mixture of strong rock and (very) soft acoustic songs. If I listen to it with RG set to track gain, some of the soft songs with exceptionally high track gain will sound much too loud, so I would prefer to always listen to this particular album in album gain mode, no matter what mode I've chosen in the RG configuration.

    With the current RG standard I can work around this by deleting the unwanted gain value, i.e. album gain from Lynyrd Skynyrd's "One for the Road" and track gain from Pink Floyd's "The Wall", because the standard requires an RG-compliant player to choose the album gain if the track gain is not present and vice versa.

    The disadvantage of this workaround is that not only do I have to make this decision at scan time, but some information is also lost, which may become important if I ever want to change the decision.

    Maybe a better idea is to extend the RG standard to support another optional tag with the semantics of "preferred mode", stating whether album gain or track gain should be used no matter what mode is chosen in the RG configuration. In turn, the RG configuration should offer a switch for whether the player honors the proposed "preferred mode" tag or not.
  • The RG standard states the following:

    Quote
    A good method to determine the overall perceived loudness is to sort the RMS energy values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 7, where there are many values near the top), the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Gain.

    Possibly it is a good idea to let (expert) users override the 95% value at scan time to better reflect the character of the audio under consideration. The following, including manual post-processing, is not uncommon:

    Quote
    Beatles tracks tend to sound much louder than other stuff in my collection after RG, so I normally bump them downward.

    As 2Bdecided already pointed out, there is no way to resolve this without manual intervention:

    Quote
    As far as I can see, proper solutions to this problem require human intervention*. So if you don't like the automated results, change the RG tags in your own files to give the loudness your ears think is correct.

    The proposal is to keep everything hidden from the casual user (as before) but to offer options for finer control to the expert user.
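
The fallback rule and the proposed "preferred mode" tag from the first point above could be sketched roughly as follows. This is only an illustration: the key name `preferred_mode`, the dictionary layout and the function `select_gain` are hypothetical and not part of any existing ReplayGain tagging scheme. The gain values are taken from the WaveGain listing for "The Wall".

```python
from typing import Optional


def select_gain(tags: dict, configured_mode: str,
                honor_preferred: bool = True) -> Optional[float]:
    """Pick the gain (in dB) a player would apply for one file.

    `tags` holds the values read from the file. The key "preferred_mode"
    is the hypothetical new tag proposed above, not an existing one.
    """
    mode = configured_mode  # "track" or "album", from the player settings
    preferred = tags.get("preferred_mode")
    if honor_preferred and preferred in ("track", "album"):
        mode = preferred

    other = "album" if mode == "track" else "track"
    # Existing fallback rule: if the wanted gain is missing, use the other one.
    for candidate in (mode, other):
        gain = tags.get(candidate + "_gain")
        if gain is not None:
            return gain
    return None  # no ReplayGain information at all


the_wall_track_26 = {"track_gain": 18.03, "album_gain": -1.89,
                     "preferred_mode": "album"}
print(select_gain(the_wall_track_26, "track"))                         # -1.89
print(select_gain(the_wall_track_26, "track", honor_preferred=False))  # 18.03
```

With a tag like this, both gain values stay in the file, so the decision can be reversed later without rescanning.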

Reply #3
That's not a bad idea, but I think it's best for the wiki to be developed so that it accurately reflects ReplayGain v1 (as widely implemented) before building on it.

The problem at the moment is that the original ReplayGain website is out of date, and a de facto standard exists out there which is based on the original but with several important modifications and improvements. That's what needs to be set in stone here, IMO.

Then by all means improve it!

Cheers,
David.

Reply #4
Thanks a lot for posting this specification! I've been curious about how Replay Gain works for quite some time.

Before I start with suggestions for improving RG in general, some comments on the current text version.

  • Peak amplitude data format: I assume we're talking about an unsigned IEEE 32-bit normalized floating point number? You should clarify this. Further, what speaks against also allowing negative values? The sign bit is there in floats anyway, so we can convey one more source of information (maximum or minimum normalized sample value, whatever has a larger magnitude).
  • Bit format: Is there any RG-specific header? What comes first, track and album gain in 32 bits, then the peak value? Is the peak value mandatory? What happens if another gain type is added later? If you don't have an RG sync header, how can you prevent parsing errors (like the MP3 false sync mentioned)?
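
To make the first question concrete, here is a small sketch of storing a signed peak as a 32-bit IEEE 754 float. The little-endian byte order and the helper names are assumptions for illustration only; the draft does not specify them.

```python
import struct


def signed_peak(samples):
    """Return the normalized sample with the largest magnitude, sign kept."""
    return max(samples, key=abs)


def pack_peak(peak: float) -> bytes:
    # Assumption: little-endian IEEE 754 single precision ("<f").
    return struct.pack("<f", peak)


def unpack_peak(data: bytes) -> float:
    return struct.unpack("<f", data)[0]


samples = [0.25, -0.75, 0.5]           # normalized to the range -1.0 .. 1.0
raw = pack_peak(signed_peak(samples))
peak = unpack_peak(raw)
print(len(raw), peak, abs(peak))       # 4 -0.75 0.75
```

A reader that only cares about the magnitude takes `abs(peak)`, so keeping the sign costs nothing in storage.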


Best,

Chris
If I don't reply to your reply, it means I agree with you.

Reply #5
How do I edit this? Who do I ask for permission?

Cheers,
David.


Reply #7
Quote
The problem at the moment is that the original ReplayGain website is out of date, and a de facto standard exists out there which is based on the original but with several important modifications and improvements. That's what needs to be set in stone here, IMO.


This is indeed the plan. The immediate project is to document current practice. Without that in place, we don't have a stable platform from which to make improvements.

It would probably be best to open separate threads to propose and discuss individual improvements.

Reply #8
I'm working on section 1.4 (Calibration with reference level). 83 dB SPL is mentioned frequently. This strikes me as a red herring. Replay Gain does not endeavor to tell anyone how loud, in absolute terms, they should be listening.

The important point taken by Replay Gain from the SMPTE standard is that -20 dBFS pink noise is the reference to be used for average loudness. In other words, Replay Gain specifies a playback system with 20 dB of headroom to accommodate peaks.

Later in section 3.2 (Pre-amp), a 6 dB boost enabled by default is specified. This has the effect of bringing headroom down to 14 dB.

Does it seem reasonable to remove references to 83 dB SPL and speak in terms of headroom? I think 83 dB is causing confusion. I suspect it has led several players to present user calibration parameters in terms of dB SPL.
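
The headroom arithmetic above, written out as a small sketch. The constant and function names are made up for illustration; the numbers come from the post.

```python
REFERENCE_DBFS = -20.0   # average level of the pink-noise reference
DEFAULT_PREAMP_DB = 6.0  # default pre-amp boost from section 3.2


def headroom_db(preamp_db: float = 0.0) -> float:
    """Headroom between the (boosted) reference level and digital full scale."""
    return 0.0 - (REFERENCE_DBFS + preamp_db)


print(headroom_db())                   # 20.0
print(headroom_db(DEFAULT_PREAMP_DB))  # 14.0
```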

Reply #9
No - because you have to assume some listening level to use any psychoacoustics. Talking only about sample values in files with no real-world reference is exactly how you create a dead-end standard which no one can ever improve.


There is a major change to make though: what's stored is the 83 dB referenced result, plus an arbitrary 6 dB. That's a de facto change from the original proposal.
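
As a rough sketch of what "83 dB referenced, plus 6 dB" means for the stored value: the function below assumes the loudness of a file is measured on the same scale on which the pink-noise reference reads 83 dB. The names are illustrative, not taken from any implementation.

```python
PINK_NOISE_REFERENCE_DB = 83.0   # calibration level from the original proposal
DE_FACTO_OFFSET_DB = 6.0         # the extra 6 dB folded into stored values


def stored_gain_db(measured_loudness_db: float) -> float:
    """Gain written to the tag for a file measured at `measured_loudness_db`."""
    target = PINK_NOISE_REFERENCE_DB + DE_FACTO_OFFSET_DB   # 89 dB
    return target - measured_loudness_db


print(stored_gain_db(95.0))   # -6.0
print(stored_gain_db(80.5))   # 8.5
```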

Cheers,
David.

Reply #10
Quote
Possibly it is a good idea to let (expert) users override the 95% value at scan time to better reflect the character of the audio under consideration. The following, including manual post-processing, is not uncommon:

Beatles tracks tend to sound much louder than other stuff in my collection after RG, so I normally bump them downward.

I edited my post to remove the word "much" as it was an unintended exaggeration.  My apologies to everyone.  If it matters any, my Beatles tracks are pre-2009.  I don't know if this is still a noticeable problem with the new remasters.
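
For readers following the 95% discussion, the percentile selection quoted from the specification can be sketched like this, with the percentage exposed as a scan-time parameter as suggested. The exact indexing rule is an assumption, and the block values are made up.

```python
import math


def loudness_statistic(block_rms_db, percentile=95.0):
    """Pick the value found `percentile` percent of the way up the sorted list.

    `block_rms_db` is a list of per-block RMS energies in dB. The exact
    indexing rule is an assumption here; implementations may differ by one.
    """
    if not block_rms_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_rms_db)
    index = min(len(ordered) - 1,
                math.ceil(len(ordered) * percentile / 100.0) - 1)
    return ordered[max(index, 0)]


blocks = [float(v) for v in range(-60, -20)]   # 40 made-up values, -60 .. -21 dB
print(loudness_statistic(blocks))              # -23.0
print(loudness_statistic(blocks, 50.0))        # -41.0
```

On material with a wide spread of block energies, as in this made-up list, moving the percentile changes the result a lot, which is the point made in the quoted passage about speech and classical music.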

Reply #11
Quote
No - because you have to assume some listening level to use any psychoacoustics. Talking only about sample values in files with no real-world reference is exactly how you create a dead-end standard which no one can ever improve.


There is a major change to make though: what's stored is the 83 dB referenced result, plus an arbitrary 6 dB. That's a de facto change from the original proposal.

Cheers,
David.


An 83 dB SPL listening level assumption contradicts what is (not) said in section 1.1.2 (Required equal loudness filter) - "As we don't know the playback level the listener will choose, and don't want to use a different filter for sounds of differing loudness, a representative average of the above curves is chosen as the target filter."

It's not entirely unambiguous what a representative average response curve is, but I gather you did not use an 83 dB loudness contour to build the filter.

A simple option is for me to edit out conflicting non-normative detail in both sections. What's normative is the filter design (which I have yet to include in the specification) and the formula for calculating gain (I've got work to do there as well).

I believe the +6 dB is correctly called out in section 3.2 (Pre-amp). The text on replaygain.hydrogenaudio.com says 6 to 12 dB. I removed the 12 dB option in my early edits because I knew 6 dB was current practice.

Reply #12
Quote
I believe the +6 dB is correctly called out in section 3.2 (Pre-amp).
No, that's nothing to do with what is stored.

Quote
The text on replaygain.hydrogenaudio.com says 6 to 12 dB. I removed the 12 dB option in my early edits because I knew 6 dB was current practice.
You know, I haven't read all this through since I wrote it! There were some nuances of meaning that don't seem important now, and others that seem more important.

I will try to contribute, time allowing. Sadly it can't be top of my list. Well, not "sadly" - new house to get ready, new baby on the way, job, Christmas - all good!

Cheers,
David.

Reply #13
Quote
I don't know if this is still a noticeable problem with the new remasters.

If RG is restricted to remasters it should be clearly stated in the standard.

Reply #14
That response seems unnecessary.  It was never my intention to suggest that RG be restricted to remasters.

The reason for my making that comment was to avoid having people come back saying they don't notice; later to find out that they are checking with the remasters.

Reply #15
I think updating the documentation of RG V1 and structuring a V2 is a great idea.

A suggestion - please clarify for both RG V1 and any V2 whether the use of RG tags (as opposed to applying RG to the mp3 data) prevents "bit perfect" playback. I've seen it posted elsewhere that the use of RG violates bit perfect playback, but I've never seen it clarified what modes of RG use this assumes. Perhaps I know too much as an engineer and too little in this specific space (my engineering expertise lies elsewhere), but I thought that volume or level changes via RG were communicated digitally to an RG-capable playback device without altering the actual digital bits of the music, and that the playback device altered the level using the RG tag info.

Perhaps clarifying what is meant by "bit perfect" will also help. I think "bit perfect" can mean "no bits are lost, added, or inadvertently altered or distorted". Assuming this, even if RG alters the actual bits during playback when using tags, if no bits are lost or added and the only change is intentional, this seems to fit a reasonable definition of "bit perfect with intended modification" or something like that.

Thanks for any clarification on RG and bit perfect playback, stating all assumptions for all judgments.

 

Reply #16
If the replay gain is applied in the digital domain, bit transparency is lost. The original proposal included a short discussion of a digitally-controlled analog implementation. For some reason I had not carried that discussion over to the new revision. I have updated the new revision to include it. This demonstrates that a bit-transparent implementation is possible. I'm not aware of any such implementation, however.
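
A quick sketch of why applying the gain in the digital domain is not bit-transparent: scaling 16-bit samples forces rounding, and the rounded values cannot be mapped back to the originals. The sample values and the function name are made up for illustration.

```python
def apply_gain_16bit(samples, gain_db):
    """Scale 16-bit PCM samples by gain_db, rounding and clamping the result."""
    scale = 10.0 ** (gain_db / 20.0)
    out = []
    for s in samples:
        v = int(round(s * scale))
        out.append(max(-32768, min(32767, v)))
    return out


original = [1, 2, 3, 1000, -1000, 32767]
attenuated = apply_gain_16bit(original, -6.0)
restored = apply_gain_16bit(attenuated, +6.0)
print(attenuated)
print(restored == original)   # False: rounding discarded information
```

The tags themselves never touch the audio data; only the player's scaling step does, which is why a digitally controlled analog gain stage would avoid the issue.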

Reply #17
Quote
I will try to contribute, time allowing. Sadly it can't be top of my list. Well, not "sadly" - new house to get ready, new baby on the way, job, Christmas - all good!

Congratulations on all that!

I've yet to dig carefully through the discussion on RG changes. Hopefully some of these points will iron themselves out as I do.

Reply #18
Quote
If the replay gain is applied in the digital domain, bit transparency is lost. The original proposal included a short discussion of a digitally-controlled analog implementation. For some reason I had not carried that discussion over to the new revision. I have updated the new revision to include it. This demonstrates that a bit-transparent implementation is possible. I'm not aware of any such implementation, however.



Thanks for addressing this. I'm very impressed that V1 considered this back in 2002. I think a digitally-controlled analog implementation of RG, or other bit-transparent RG methods, are and will be increasingly important for those creating high quality home theater and whole home audio systems. My guess is that a bit-transparent RG implementation in a UPnP/DLNA environment will require changes to UPnP or DLNA standards or practices, but I see that as an opportunity to suggest such changes to the bodies that control those specs and practices rather than as a permanent barrier. All things considered, everything evolves, and I'd love to see high quality audio evolve with RG as part of it.

Reply #19
Quote
I'm not aware of any such implementation, however.

I'd like to see a justification of such an implementation based on the results of blind tests with real-world examples.

Reply #20
Quote
I'm not aware of any such implementation, however.

I'd like to see a justification of such an implementation based on the results of blind tests with real-world examples.



I'd like to see what difference this makes as well under various real-world situations. However, I don't know if I think of it as a "justification" - that implies to me that such an implementation has been proven to be tougher or less desirable to do in some undefined way. Has it been proven to be tougher or less desirable to implement? In what way? What is the standard for "justification"? What factors do you consider for justification? Programming cost? Hardware cost? Ease of user setup? Sound quality (double-blind confirmed, of course)? What relative weights apply to each factor to arrive at a justification? I'm not trying to be difficult, but as an Electrical Engineer who designed ADCs and DACs long ago this strikes me as just as easy to implement, inertia of current practices notwithstanding.

Reply #21
Telling your amplifier to adjust the volume by a specific amount?  I would certainly think so.

EDIT: Noting your edit: sound quality.  Back on the topic of difficulty in implementation, perhaps you can explain the mechanism by which this can be accomplished within the current framework of digital transmission.  If it falls outside the current framework, please explain how you would get universal adoption and implementation.

Reply #22
Maybe this is a helpful comment, and along the lines of what Notat is already doing...

The "Replay Gain Specification" wiki should be first and foremost an implementation guide. The "fluffy bits" can go. There will still be the ReplayGain HA wiki, which is already doing a very good job explaining it.

However, one thing is important: RG defines a calculation, a way of storing the result of the calculation, and a way of reading and using those values...
1. The most important thing is that you store two gain values and two peak values, with meanings + references/scales as defined.
2. The second most important thing is that a player does something sensible with these - about the most sensible thing I can think of is pretty much what was suggested a decade ago in the original spec, but I'm sure there are variations.
3. The third most important thing is the calculation of those values. Only third-most, because you could improve it while remaining completely compatible with the intent of ReplayGain and all players.
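
As an illustration of points 1 and 2, here is a minimal sketch of the four stored values and one "sensible" player behaviour: apply the selected gain, but limit it so the known peak does not exceed full scale. The class and function names are hypothetical, and normalizing the WaveGain peak figures by 32768 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReplayGainInfo:
    track_gain_db: float
    track_peak: float   # linear, 1.0 = digital full scale
    album_gain_db: float
    album_peak: float


def playback_scale(info: ReplayGainInfo, mode: str = "track",
                   prevent_clipping: bool = True) -> float:
    """Return the linear factor to multiply the samples by."""
    if mode == "album":
        gain_db, peak = info.album_gain_db, info.album_peak
    else:
        gain_db, peak = info.track_gain_db, info.track_peak
    scale = 10.0 ** (gain_db / 20.0)
    if prevent_clipping and peak > 0.0 and peak * scale > 1.0:
        scale = 1.0 / peak   # largest factor that keeps the peak at full scale
    return scale


# Track 3 of "The Wall" from the WaveGain listing earlier in the thread:
# +2.14 dB track gain, peak 31226; album gain -1.89 dB, highest peak 31879.
info = ReplayGainInfo(2.14, 31226 / 32768, -1.89, 31879 / 32768)
print(playback_scale(info, "track", prevent_clipping=False))  # about 1.28, would clip
print(playback_scale(info, "track"))                          # limited to 32768 / 31226
print(playback_scale(info, "album"))                          # about 0.80, no limiting
```

The calculation that produces the stored values can change without touching this code, which is the sense in which the calculation is only the third most important part.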

Oh, while we're talking about de facto standards, I think ReplayGain is better than Replay Gain. FWIW Google seems to think it's more common.

Cheers,
David.

Reply #23
Quote
Telling your amplifier to adjust the volume by a specific amount?  I would certainly think so.

EDIT: Noting your edit: sound quality.  Back on the topic of difficulty in implementation, perhaps you can explain the mechanism by which this can be accomplished within the current framework of digital transmission.  If it falls outside the current framework, please explain how you would get universal adoption and implementation.



I see you think that inertia of current practices is the obstacle. I anticipated that, although I regard such inertia as more of an excuse than a valid reason.

Hardware perspective: From a "clean slate" view for a new product I hope you can see that this is trivial. Even from the inertia perspective, any playback device that reads tag information will already have the RG tag information if there is an RG tag in the song file, and even if it doesn't, the change is straightforward. Adding logic to a playback device to use the tag information for volume control is truly trivial. I've been designing integrated circuits for many years, so I am confident in this assessment. The incremental cost of any such implementation approaches zero if the change is made when stepper reticles or masks are changed for other reasons. The microcode, logic and transistor redesign, layout design, design verification, and testing verification changes are all trivial if done in conjunction with other scheduled changes. Package or pinout change should be zero. Been there, done all of that. Any such changes can be timed for product refreshes (as opposed to just for this one change), as is done across the industry.

If the design change is made using discrete components (trying to use existing integrated circuits that do not have this designed in), the component cost goes up a little, but not much. I can definitely make the change cost look astronomical if I burden it with fixed and other allocated costs, but the true incremental cost for a competent designer is small. I also have an MBA and I take on bogus business cases at work all the time from those who want to kill a change with non-incremental costs.

From a digital transmission perspective, I hope you were joking. Surely you aren't proposing that Ethernet (TCP/IP), 802.11x, USB, S/PDIF, or other digital transmission standards need to be modified to accommodate this.

I am not an expert on application changes, but I would truly be disappointed if the above can be done fairly easily and still have an application architect or programmer claim it's difficult.  Yes, it is different than today, but that doesn't automatically make it difficult.  Taking a step back, this can't be more difficult than implementing RG from scratch, and that happened years ago with far less capable technology. 

How to get this adopted? I'd go after the IC makers first for the reasons above and time the changes for an already-scheduled refresh. I know this is not done most of the time, but I think it's because of a lack of relationships at the IC level. I'd go after Sigma Designs or TI first. I'm on the defense side of the biz, so I don't know if these market leaders are hungry or complacent in this consumer market, but if complacent, go after their competitors who may be more open to something that will differentiate their product. You want fast adoption, go after the IC designers.

I hope we can at least agree that a rigorous sound comparison with well-executed implementations of the two approaches would be very informative.

Reply #24
I guess I shouldn't have lumped this in with the thread you started about bit-perfect playback then.  Seeing that bit-perfect playback is about passing digital data from your media player unaffected through your soundcard to your external DAC, I don't see RG information being passed downstream to your preamp, integrated amplifier or receiver as a trivial endeavor.

Otherwise what you're suggesting has already been implemented in audio hardware such as the various Squeezebox devices or DAPs enabled with Rockbox, or in a comparable but non-RG manner such as the iPod devices via Sound Check.

Anyway, if you haven't already, please read up on this forum as I don't think any of this is new territory.

If I'm wrong on any of these points, please feel free to correct me provided someone else doesn't beat you to it.