I completely agree with greynol that the dynamic range of 16-bit audio is more than adequate. Even with playback set as low as 83 dB SPL, there's more than enough dynamic range to reproduce a highly dynamic orchestral piece transparently and hiss-free. Indeed, many orchestral CDs sit at around 83 dB SPL perceived loudness, leaving about 20 dB of headroom for peaks (K-20 on Bob Katz's metering scale).

People often say that CDs have about 96 dB of dynamic range, while human ears need 120 dB, from the threshold of hearing to the threshold of pain.

This isn't strictly accurate.

There's about 96 dB of maximum signal-to-noise ratio over the whole 22 kHz bandwidth, but that treats the signal as if it were on an oscilloscope, as though you needed to see the peaks and troughs on screen to hear them. (As an electronics engineer of sorts, that was my instinct too, and I had to learn that it's not relevant to human hearing.) The human ear perceives audio in critical frequency bands, more like a spectrum analyser than a waveform display, so a closer representation is something like a 1024-point FFT power spectrum of the signal, which many audio editors can display. On that kind of display, a full-scale sine wave is typically about 120 dB above the noise floor, assuming straightforward flat dither. That could encompass a range from rooms about 20-30 dB quieter than any real quiet listening room all the way up to noises as loud as a chainsaw heard without ear protection. With noise-shaped dither, the possible range is even more extreme.
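Here's a rough back-of-envelope check on that 120 dB figure as a Python sketch. It assumes a 1024-point FFT with a rectangular window, a full-scale sine centred exactly on one bin, and white (not noise-shaped) dither, either rectangular (RPDF) or triangular (TPDF), whose noise is spread evenly across the 512 positive-frequency bins. Analysers using other windows will read a little differently, and an FFT bin is only a stand-in for the ear's critical bands, not a model of them.

```python
import math

BITS = 16
FFT_SIZE = 1024

# Work in units of one LSB (quantisation step q = 1).
full_scale_amplitude = 2 ** (BITS - 1)       # peak of a full-scale sine, in LSBs
sine_power = full_scale_amplitude ** 2 / 2   # mean-square power of that sine

# Total noise power (LSB^2) for the two flat dither types assumed here:
#   dithered rounding error is uniform over one LSB -> variance 1/12
#   RPDF dither adds another 1/12, TPDF dither adds 2/12
noise_power = {
    "rpdf": 1 / 12 + 1 / 12,
    "tpdf": 1 / 12 + 2 / 12,
}


def db(ratio):
    return 10 * math.log10(ratio)


def snr_broadband(dither):
    """Full-scale sine vs total noise over the whole 0-22.05 kHz band."""
    return db(sine_power / noise_power[dither])


def snr_per_bin(dither):
    """Full-scale sine vs the noise falling in ONE bin of the power spectrum.

    White noise is split evenly over FFT_SIZE / 2 bins, while the
    bin-centred sine stays in a single bin, hence the extra 10*log10(512).
    """
    return snr_broadband(dither) + db(FFT_SIZE / 2)


for d in ("rpdf", "tpdf"):
    print(f"{d}: broadband {snr_broadband(d):.1f} dB, per FFT bin {snr_per_bin(d):.1f} dB")
```

It prints about 95.1 dB broadband and 122.2 dB per bin for RPDF, and about 93.3 dB and 120.4 dB for TPDF. So once dither noise is counted, the broadband figure sits a little below the oft-quoted 96 dB, while the per-bin figure lands right around 120 dB.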

**This old post of mine shows you in pictures, based on 16-bit 44.1 kHz audio, how the dynamic range of CD audio is a good deal more than 102 dB, which is itself well beyond the oft-quoted 96 dB.** If your volume control is set loud enough to discern the quietest -102 dBFS sine wave, the loudest one at 0 dBFS will be painfully, even dangerously, loud!
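The pictures aren't reproduced here, but the core idea can be checked numerically. This Python sketch is my own illustration, not the method from that post. It makes a -102 dBFS sine, whose peak is only about 0.26 LSB, and quantises it to 16 bits with and without TPDF dither. The sample count, bin number and random seed are arbitrary choices. Recovering the sine in the numbers doesn't by itself prove audibility; that's where the critical-band argument comes in.

```python
import math
import random

N = 65536                  # samples analysed (about 1.5 s at 44.1 kHz)
BIN = 1000                 # sine placed exactly on DFT bin 1000 (about 673 Hz at 44.1 kHz)
FULL_SCALE = 2 ** 15       # 16-bit full-scale peak, in LSBs
LEVEL_DBFS = -102.0

amplitude = FULL_SCALE * 10 ** (LEVEL_DBFS / 20)   # about 0.26 LSB peak


def quantise(x, rng=None):
    """Round to the 16-bit integer grid, adding TPDF dither first if rng is given."""
    if rng is not None:
        x += rng.random() - rng.random()   # triangular PDF spanning +/-1 LSB
    return round(x)


def bin_amplitude(samples, k):
    """Peak amplitude of the component at DFT bin k (a single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


signal = [amplitude * math.sin(2 * math.pi * BIN * i / N) for i in range(N)]

undithered = [quantise(x) for x in signal]   # every |x| < 0.5, so all round to 0
rng = random.Random(1)                       # one generator for the whole run
dithered = [quantise(x, rng) for x in signal]

print(f"input peak amplitude:       {amplitude:.3f} LSB")
print(f"recovered without dither:   {bin_amplitude(undithered, BIN):.3f} LSB")
print(f"recovered with TPDF dither: {bin_amplitude(dithered, BIN):.3f} LSB")
```

Without dither the sine vanishes completely. With dither, the recovered amplitude should come out very close to the original 0.26 LSB, buried in broadband noise but still present in the data.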

We've also had mention of percentiles, which might need explaining:

A percentile is calculated in a similar way to the median. In fact, the median is the 50th percentile.

For example, if I have an array of just twenty-one numbers representing loudness measures at twenty-one moments in time:

Original array:

**3 6 8 6 4 7 7 4 3 2 9 8 5 2 0 3 2 3 8 1 3**

To find the mean average, you add them up and divide by 21.

MEAN = 94/21 = 4.476 (rounded to 4 significant figures)

To find the MODE, you look for which number is most commonly seen in the array. In this case it's 3.

To find the MEDIAN, you sort the list into numerical order and look halfway along it. If halfway falls between two values, take the mean of the values immediately to the left and right:

Sorted array:

**0 1 2 2 2 3 3 3 3 3 4 4 5 6 6 7 7 8 8 8 9**

The MEDIAN is the value in the 11th position of the sorted array, which is 4 (there are 10 numbers less than or equal to the median to its left, and 10 numbers greater than or equal to the median to its right).
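If you'd rather let a computer check the arithmetic, Python's standard `statistics` module gives the same mean, mode and median for this array:

```python
import statistics

loudness = [3, 6, 8, 6, 4, 7, 7, 4, 3, 2, 9, 8, 5, 2, 0, 3, 2, 3, 8, 1, 3]

print(sum(loudness), len(loudness))          # 94 21
print(round(statistics.mean(loudness), 3))   # 4.476
print(statistics.mode(loudness))             # 3 (appears five times)
print(sorted(loudness))                      # the sorted array shown above
print(statistics.median(loudness))           # 4 (the 11th of 21 sorted values)
```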

The n'th percentile corresponds to the number n% of the way along the sorted array from left to right. To make this easier to work out, we'll number the array positions starting at zero, i.e. 0 to 20 (rather than 1st to 21st, which is simply one greater).

Clearly the 100th percentile is the rightmost number, i.e. 9.

The 0th percentile is the leftmost number, i.e. 0.

The 50th percentile is in position 20*(50/100) = position 10 = the 11th position = the MEDIAN = 4.

If we look for the 55th percentile, starting from index 0, it's in position 20*(55/100) = 11, i.e. the 12th position. That's also 4.

The 57½th percentile would be at position 20*(57.5/100) = 11.5. That's midway between the 12th and 13th values in the sorted array, which are 4 and 5, so we can say the answer is 4.5.

Try it in a spreadsheet with these numbers in a column (cells A1 to A21), using the function =PERCENTILE(A1:A21, 0.575)

If you try the 56.25th percentile, =PERCENTILE(A1:A21, 0.5625), the position in the array is 11.25, so drawing a straight line between 4 and 5 and going 0.25 of the way along gives 4.25.

The 95th percentile in this case is the number in position 19, i.e. the 20th out of 21 numbers, which is 8.
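The same rule (position = (N - 1) × n/100 in the zero-indexed sorted array, with a straight-line fit between neighbours when the position isn't a whole number) is easy to write out in Python. This is a minimal sketch; `percentile` is just an illustrative helper name. As far as I know it follows the same convention as the spreadsheet PERCENTILE (PERCENTILE.INC) function and NumPy's default `numpy.percentile`, but check your tool's documentation if the exact convention matters.

```python
import math


def percentile(values, pct):
    """n'th percentile (pct from 0 to 100) with linear interpolation.

    Positions are counted from 0, so the n'th percentile sits at
    position (len - 1) * n / 100 in the sorted list.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:                      # whole-number position: no interpolation
        return ordered[lo]
    # straight-line fit between the two neighbouring sorted values
    return ordered[lo] + frac * (ordered[lo + 1] - ordered[lo])


loudness = [3, 6, 8, 6, 4, 7, 7, 4, 3, 2, 9, 8, 5, 2, 0, 3, 2, 3, 8, 1, 3]
for pct in (0, 50, 55, 57.5, 56.25, 95, 100):
    print(pct, percentile(loudness, pct))
# 0 0, 50 4, 55 4, 57.5 4.5, 56.25 4.25, 95 8, 100 9
```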

In Replay Gain, a piece of music might have about 20 loudness measures per second and last 4 minutes (240 seconds), giving an array of about 4800 floating-point loudness values. The array is sorted into ascending order, and the 95th percentile is at index 4799 * (95/100) = 4559.05 in the sorted list. For simplicity we can set Track Gain to the value at index 4559 of the sorted list (strictly, that's the 94.999th percentile). If we're pedantic about our percentile, we don't round off the 0.05 at the end and instead take 0.05*Array[4560] + (1-0.05)*Array[4559], which will almost certainly be very close anyway, so why bother being pedantic? There's nothing magic about the 95th percentile; it was simply found in testing to be a good match for human loudness perception.
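Here's a sketch of the track-level step in Python, using made-up, normally distributed loudness numbers in place of real measurements. Real Replay Gain derives each value from filtered audio and then turns the percentile value into a gain relative to a reference level; neither step is modelled here. It only shows the index arithmetic and how small the "pedantic" correction is.

```python
import math
import random

BLOCKS_PER_SECOND = 20    # about 20 loudness measures per second
TRACK_SECONDS = 240       # a 4-minute track

# Hypothetical loudness values (dB), NOT real measurements.
rng = random.Random(42)
loudness = [rng.gauss(70.0, 6.0) for _ in range(BLOCKS_PER_SECOND * TRACK_SECONDS)]

ordered = sorted(loudness)
pos = (len(ordered) - 1) * 95 / 100   # 4799 * 0.95 = 4559.05
lo = math.floor(pos)                  # 4559
frac = pos - lo                       # about 0.05

simple = ordered[lo]                                        # the 94.999th percentile
pedantic = frac * ordered[lo + 1] + (1 - frac) * ordered[lo]  # true 95th percentile

print(len(ordered), lo, round(frac, 2))              # 4800 4559 0.05
print(f"{100 * lo / (len(ordered) - 1):.3f}th percentile")   # 94.999th percentile
print(f"simple {simple:.3f} dB, pedantic {pedantic:.3f} dB, "
      f"difference {pedantic - simple:.4f} dB")
```

With this many values, neighbouring sorted entries near the 95th percentile are very close together, so the two answers typically differ by a tiny fraction of a dB.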

Album Gain does the same thing, treating the album as if it were a single track, so a 40-minute album would be 2400 seconds long, with 48000 floating-point loudness values. The 95th percentile would be at index 47999 * (95/100) = 45599.05 in the sorted array, so we're likely to take index 45599 from the sorted array as our Album Gain value.

That's why you can't calculate Album Gain from a bunch of Track Gain values. All you can say is that Album Gain must be no greater than the highest Track Gain value but will be near the top end of the Track Gain values, and Album Peak will equal the highest Track Peak.
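To see why, here's a Python sketch with three hypothetical tracks whose made-up loudness values (in dB) are spread uniformly over deliberately different ranges. The album figure is computed by pooling all the values, as if the album were one long track.

```python
import math
import random


def percentile(values, pct):
    """Linear-interpolation percentile, positions counted from 0."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return ordered[lo]
    return ordered[lo] + frac * (ordered[lo + 1] - ordered[lo])


BLOCKS = 20 * 240   # 4 minutes at about 20 loudness values per second

# Three hypothetical tracks (made-up data, not real measurements).
rng = random.Random(7)
tracks = [
    [rng.uniform(60, 70) for _ in range(BLOCKS)],   # quiet track
    [rng.uniform(75, 85) for _ in range(BLOCKS)],   # loud, compressed track
    [rng.uniform(65, 90) for _ in range(BLOCKS)],   # wide-ranging track
]

track_values = [percentile(t, 95) for t in tracks]
album_value = percentile([x for t in tracks for x in t], 95)   # album = one long track

print("track 95th percentiles:", [round(v, 2) for v in track_values])
print("album 95th percentile: ", round(album_value, 2))
print("mean of track values:  ", round(sum(track_values) / len(track_values), 2))
```

With these ranges the track values come out at roughly 69.5, 84.5 and 88.75, while the pooled album value is around 86: inside the span of the track values and near its top, but not something you could get by averaging them (their mean is about 81).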

Some implementations of Replay Gain now use the newer EBU R 128 loudness standard instead of this percentile method. It's different again and copes well with both music and spoken dialogue (in broadcast TV shows, for example), but on music it broadly agrees with Replay Gain and with human ears.