I'm applying track level ReplayGain primarily for when I'm shuffling music. It works fine in the majority of cases, but not when the track is largely spoken word.
The album 'A Star Is Born' consists of music from the film as well as a number of tracks that are excerpts from the movie, included just to link the songs together. These tracks consist of characters talking to each other, and because there's a lot of silence in the track (or because the peaks are relatively low if the silence isn't part of the calculation) ReplayGain is raising the gain to the point where these tracks play much louder than the rest of the songs.
Now I could obviously adjust them manually by ear, but is there a better, more logical way to do this?
In the absence of anything better, I know the target for music is 89 dB, so maybe I can set a separate target for the spoken word and use that to calculate the gain for those tracks?
Track  Duration  Name                        Album gain  Track gain  Album peak  Track peak
23     2:57      I Don't Know What Love Is   -9.64       -9.86       0.998840    0.998840
24     0:18      Vows                        -9.64       +5.87       0.998840    0.401215
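For what it's worth, the "separate target for speech" idea boils down to simple arithmetic: lowering the target by some number of dB lowers the stored gain by the same amount. A minimal sketch, assuming a hypothetical offset of 8 dB (that figure is made up for illustration, it would have to be chosen by ear) and the usual "+x.xx dB" text form for the gain tag; the function names are my own:

```python
def retarget_gain(track_gain_db, speech_offset_db):
    """Gain for a target that sits speech_offset_db below the music target.

    ReplayGain gain is (target - measured loudness), so lowering the
    target by N dB simply lowers the gain by N dB.
    """
    return track_gain_db - speech_offset_db


def format_gain_tag(gain_db):
    """Format a gain value as text, e.g. '-9.86 dB' (sign always shown)."""
    return f"{gain_db:+.2f} dB"


# "Vows" was measured at +5.87 dB against the normal music target.
# The 8.0 dB speech offset is a hypothetical value, not a recommendation.
print(format_gain_tag(retarget_gain(5.87, 8.0)))
```

The result would then be written into the track gain tag of the speech tracks only, leaving the music tracks as measured.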
AFAIK, ReplayGain measures "average" loudness. Guessing that's the problem.
I also noticed that it's often not the best way to equalize loudness across tracks/albums. Stuff with higher dynamic range ends up being too loud, for example. And it doesn't ignore silence, which is a shame.
Maybe some day I'll write a script to measure and tag files using a different metric, according to my taste (but with the same tag format). It will probably be something like:
1. completely skip everything that's quieter than -60 dB (in relation to the file's peak level) or so
2. apply inverse of "equal loudness curve" EQ
3. measure RMS levels in small blocks, take the 80th percentile value
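A rough sketch of steps 1 and 3 in plain Python, just to make the idea concrete. Assumptions: samples are floats in the range -1..1, the gate is applied per block rather than per sample, the block size is arbitrary, and step 2 (the equal-loudness EQ) is left out entirely. All function names are made up for this example.

```python
import math


def sine(freq_hz, seconds, amplitude, rate):
    """Generate a sine tone as a list of float samples in the range -1..1."""
    count = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(count)]


def block_levels_db(samples, block_size):
    """Return the RMS level (dB relative to full scale) of each complete block."""
    levels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        mean_square = sum(s * s for s in block) / block_size
        # Digital silence has no finite dB value, so represent it as -inf.
        levels.append(10 * math.log10(mean_square) if mean_square > 0
                      else float("-inf"))
    return levels


def gated_percentile_level(samples, block_size, gate_db=-60.0, percentile=80):
    """Gate quiet blocks relative to the file peak, then pick a percentile."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return float("-inf")
    threshold_db = 20 * math.log10(peak) + gate_db
    # Step 1: skip every block quieter than gate_db relative to the peak.
    kept = sorted(level for level in block_levels_db(samples, block_size)
                  if level > threshold_db)
    if not kept:
        return float("-inf")
    # Step 3: take the requested percentile of the remaining block levels.
    index = min(len(kept) - 1, len(kept) * percentile // 100)
    return kept[index]


rate = 8000  # low rate so the pure-Python loops stay fast
tone = sine(1000, 1.0, 0.5, rate)
padded = tone + [0.0] * (5 * rate)
print(gated_percentile_level(tone, 400))
print(gated_percentile_level(padded, 400))
```

With the gate in place, appended silence is dropped before the percentile is taken, so the two printed levels should match.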
I haven't even researched this enough, maybe there's something that already does this.
Also it's definitely subjective, there probably isn't an ideal solution for everyone. Especially since people listen on devices with different bandwidth and frequency response.
Before that... another way is to use Album gain instead of Track gain, if you trust the mastering engineer's work (sometimes they make questionable decisions, like making one track much louder than everything else, but usually it makes sense to listen to an album with a constant gain).
Also have you tried this component? https://www.foobar2000.org/components/view/foo_arg
You say "And it doesn't ignore silence, which is a shame", which was my original assumption, so I used Audacity to aggressively truncate the silence (and, to be honest, even the tailing off of words). That left a 6 second track instead of 18 seconds, but the Track Gain only went from +5.87 to +3.54, not as much of a difference as I expected.
To test this to the extreme I created a track that consisted of a one second tone, 10 seconds of silence, then another one second tone.
I then removed the silence from the above track and processed both with ReplayGain as an album.
Track                Album gain  Track gain  Album peak  Track peak
Tones with silence   -18.68      -18.36      0.800354    0.800232
Continuous tone      -18.68      -19.03      0.800354    0.800354
Notice the track gain isn't affected much by the silence at all, so maybe it does ignore silence (or at least it contributes very little).
I tried the alternative ReplayGain and it actually raised the Track Gain to +10.78 from +5.87 so that's of no use, although the homepage of the component does indicate that "You may want to use a -6dB or lower preamp level with classical music if you dislike limiters/compressors", so I guess that's affected similarly.
So it's either my own "speech" standard that's [?]dB below the 89 dB used for music or, as you say, using the album gain instead.
Thanks for your input.
There are different ReplayGain algorithms out there, but the original RG spec divides the track into short blocks, computes the loudness of each, and then sorts all the blocks by loudness. The block at the 95th percentile of loudness is used. So adding a lot of silence will slightly shift which block sits at 95%, but since the quieter 19 out of every 20 blocks are ignored, you'd have to add a LOT of silence to make much difference.
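Here is a toy version of that block-and-percentile idea applied to the tone test above. It is only a sketch: the 50 ms block length, the low sample rate and the helper names are my assumptions, and it skips the equal-loudness filtering and calibration that the real spec performs, so the numbers are not ReplayGain values.

```python
import math

RATE = 8000          # low sample rate keeps the pure-Python demo fast
BLOCK = RATE // 20   # 50 ms blocks (assumed block length)


def tone(seconds, amplitude=0.8, freq_hz=1000):
    """Sine tone as a list of float samples in the range -1..1."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / RATE)
            for n in range(int(seconds * RATE))]


def silence(seconds):
    """Digital silence of the given length."""
    return [0.0] * int(seconds * RATE)


def percentile_block_level(samples, percentile=95):
    """Sort per-block RMS levels (dB) and return the one at the percentile."""
    levels = []
    for start in range(0, len(samples) - BLOCK + 1, BLOCK):
        block = samples[start:start + BLOCK]
        mean_square = sum(s * s for s in block) / BLOCK
        levels.append(10 * math.log10(mean_square) if mean_square > 0
                      else float("-inf"))
    if not levels:
        return float("-inf")
    levels.sort()
    return levels[min(len(levels) - 1, len(levels) * percentile // 100)]


with_gap = tone(1) + silence(10) + tone(1)
no_gap = tone(1) + tone(1)
print(percentile_block_level(with_gap))
print(percentile_block_level(no_gap))
```

In this toy model the selected block is a tone block in both cases, so the silence changes nothing. Only when the silent blocks make up more than 95% of the file (here, more than 38 seconds of silence around 2 seconds of tone) does the selected block become a silent one.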
In this case you could try different ReplayGain algorithms. I'm not sure which you're using, but both the EBU R-128 and RG2 specs are often claimed to be more accurate over a diverse range of material than the original spec. One of them might work better. The EBU R-128 algorithm might be a good one to try, since it's made for broadcast material, which is mostly going to be speech.
This is intended for speech (where it had a very good reputation), not music, but maybe there is a way to apply it to your material.
If I had an album or two to 'fix' I would do it manually by selecting the parts that were unsatisfactory (adjusting the ends of the selection to zero crossing points where possible) and using amplify to move the volume up or down as desired (or do more complex compression and limiting). Of course this is much easier if one is starting with lossless audio. If your material is mp3 it could probably be done with something like mp3DirectCut, snipping parts out and pasting them back in after modification, but I've never tried such operations.
You can also split it into tracks and modify each track's gain value, so there's no need to irreversibly alter the actual audio data.
(if you don't care about the result on players which don't support these tags)
Apologies that I wasn't clearer in my original post. When I said "I'm applying track level ReplayGain primarily for when I'm shuffling music", I meant that I'm only tagging the tracks with the ReplayGain info, and only using Track Gain when I'm shuffling. When I'm listening "critically" (albums in their entirety) I don't apply any automatic gain and just adjust the volume as required.
I'm adding ReplayGain tags to my FLAC files using foobar which I believe uses EBU R-128, and serving the files using MinimServer (or foobar if at the desktop).
I store my music track based, and with this album at least I think there's only one speech track that blends slightly into the next song track so the tag approach should work quite nicely, but it looks like there's always going to be a need to change these outliers manually, as the algorithm itself can only go so far.
Thanks again everyone for your input.