Topic: Replaygain at high sample rate

  • Ernst
Hi,

I noticed that ReplayGain (as applied by metaflac) only works on tracks with a sample rate up to 48 kHz.
I'd like to apply it to all my audio files, and I now have some with 88.2, 96, or 192 kHz sample rates.
Looking into the code I see the following:
Code:
FLAC__bool grabbag__replaygain_is_valid_sample_frequency(unsigned sample_frequency)
{
    static const unsigned valid_sample_rates[] = {
        8000,
        11025,
        12000,
        16000,
        22050,
        24000,
        32000,
        44100,
        48000
    };
    static const unsigned n_valid_sample_rates = sizeof(valid_sample_rates) / sizeof(valid_sample_rates[0]);

    unsigned i;

    for(i = 0; i < n_valid_sample_rates; i++)
        if(sample_frequency == valid_sample_rates[i])
            return true;
    return false;
}


In other words, there is a limit, but it looks as if I could just add other values. However, 96 kHz would go over the unsigned 16-bit integer limit, so changing to 32-bit integers might be necessary. Would this make anything run into trouble? I've yet to try, which I'll hopefully do later today.

Now the guesswork:
Were these rates chosen because they are the commonly used ones?
Is this check necessary at all? Isn't it enough for the sampling rate to be a positive integer?
Is this only in the FLAC implementation, or is it part of the proposed ReplayGain specification?
Is the reason for the lack of higher sampling rates that, at the time, the calculation for them would have been very slow? (In that case it would be better to issue a warning instead.)
Is this because the ReplayGain reference values are based on psychoacoustics, and those are lacking for frequencies we don't hear but which would be present in signals with higher sampling rates?

  • [JAZ]
Reply #1
unsigned means unsigned int, and unsigned int has been 32 bits since the Windows 95 era (even earlier with some compilers). So adding higher sampling rates won't be a problem in that respect.
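
To put numbers on the integer-width question, here is a small sketch in Python (only to keep it short; the metaflac code itself is C). It uses the standard ctypes module to mimic 16-bit and 32-bit unsigned storage. The helper name is made up for illustration and is not part of FLAC.

```python
import ctypes

UINT16_MAX = 2**16 - 1   # 65535, the largest value a 16-bit unsigned integer holds

def fits_in_uint16(rate):
    """Hypothetical helper: True if `rate` is representable in 16 unsigned bits."""
    return 0 <= rate <= UINT16_MAX

for rate in (48000, 88200, 96000, 192000):
    stored16 = ctypes.c_uint16(rate).value   # ctypes silently wraps modulo 2**16
    stored32 = ctypes.c_uint32(rate).value   # 32 bits hold every rate listed here
    print(rate, rate.bit_length(), fits_in_uint16(rate), stored16, stored32)
```

So the concern would be real if unsigned were 16 bits wide (96000 needs 17 bits), but with a 32-bit unsigned all of these rates are stored unchanged.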

Next, the sample rates themselves: those are the typical sample rates of audio.

44.1 kHz = CD, 22.05 kHz = half CD, 11.025 kHz = quarter CD (the CD rate is related to analog video, but that's for another topic)
48 kHz = DAT, 24 kHz = half DAT, 12 kHz = quarter DAT
32 kHz = FM?, 16 kHz = half FM, 8 kHz = analog telephone

I haven't checked FLAC's ReplayGain code, but nothing in ReplayGain itself is specific to particular sampling rates. What is obvious is that ReplayGain is a psychoacoustic algorithm, so an implementation may be written for specific sampling rates only. One such case could be the design of a lowpass filter.

  • tuffy
Reply #2
I think what's missing is a set of equal loudness filter coefficients for higher sampling rates.  If someone could crank out a set of them, it shouldn't be hard to add.

  • DVDdoug
Reply #3
I'll make a couple of guesses & assumptions...  I can't think of any reason why the code couldn't be revised to work with higher sample rates.

Quote
In other words, a limit, but it looks as if I could just add other values. However, 96khz would go over the unsigned 16-bit integer limit, so changing to 32-bit integers might be necessary.
I don't know what you're getting at here...  There's no relationship between the bit-depth and the sample rate.  I haven't studied the code, but most DSP software converts the raw audio data to 32-bit floating point and then converts back at the end.

Quote
Is this because the replaygain reference values are based on psychoacoustics and those are lacking for frequencies we don't hear but which would be present in the higher sampling rate signals?
I assume the psychoacoustic model will treat anything ultrasonic as inaudible. And the loudness-curve filters have to be adjusted for the sample rate in any case: since digital filters simply work on a sequence of numbers, a filter has to "know" the sample rate in order to work properly.
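
A rough illustration of that last point, using a simple one-pole low-pass instead of the actual ReplayGain filters (which are higher order). The relation a = exp(-2*pi*fc/fs) is the usual textbook approximation for this filter type; the function names are made up for the example.

```python
import math

def one_pole_coeff(cutoff_hz, fs):
    """Feedback coefficient a for y[n] = (1 - a) * x[n] + a * y[n-1]."""
    return math.exp(-2.0 * math.pi * cutoff_hz / fs)

def effective_cutoff(a, fs):
    """Cutoff in Hz that coefficient a corresponds to when run at rate fs."""
    return -fs * math.log(a) / (2.0 * math.pi)

a_48k = one_pole_coeff(1000.0, 48000)      # designed for a 1 kHz cutoff at 48 kHz
print(effective_cutoff(a_48k, 48000))      # back to about 1000 Hz
print(effective_cutoff(a_48k, 96000))      # same coefficient at 96 kHz: about 2000 Hz
```

The same numbers give a different frequency response at a different rate, which is why each supported rate needs its own coefficient set.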

  • lvqcl
  • Developer
Reply #4
Quote
I think what's missing is a set of equal loudness filter coefficients for higher sampling rates.  If someone could crank out a set of them, it shouldn't be hard to add.

wvgain accepts input sample rates up to 192000 Hz. From wvgain.c:

Code:
// These are the filters used to calculate perceived loudness. The table data was copied
// from the Foobar2000 source code.

  • Yirkha
  • FB2K Moderator
Reply #5
Some filter coefficients in ReplayGain need to be changed for different sample rates. The original implementations supported only the rates listed above, but some programs have since been extended. For example, Menno kindly provided additional values for fb2k's RG scanner in 2003 (index.php?act=findpost&pid=145716); I don't know about the other implementations. This topic (index.php?showtopic=60188) might also be interesting, as it deals with a similar question.


  • Ernst
Reply #6
Quote
I don't know what you're getting at here...  There's no relationship between the bit-depth and the sample rate.

In some compilers int still means 16-bit. So the number 96000 could not be represented.

Quote
wvgain accepts input samplerates up to 192000.

Maybe I'll copy from there to metaflac to see what works.

Quote
For example, Menno kindly provided additional values for fb2k's RG scanner in 2003 (index.php?act=findpost&pid=145716), I don't know about the other implementations. Further, this topic (index.php?showtopic=60188) might be interesting as it deals with a similar question.

Nice links.

Of course I haven't gotten around to trying anything yet; probably soon.

  • saratoga
Reply #7
Quote
In some compilers int still means 16-bit. So the number 96000 could not be represented.


In C, int size is generally thought of as CPU dependent rather than compiler dependent. So on a 16-bit machine (e.g. a 286) int might be 16 bits. However, metaflac most likely would not compile on such a machine without being ported to a 16-bit architecture. It's rare for ordinary programs to work on machines with int narrower than 32 bits.

  • Ernst
Reply #8
I did get a few higher sampling rates set up in metaflac. I copied the values from mp3gain, adding 64, 88.2, and 96 kHz. Now I just need to convert the MATLAB program for calculating the filters to Octave, Scilab, or plain C/C++ to get other rates as well.
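
Roughly what that change amounts to, sketched in Python rather than the C of metaflac: the validity check becomes a lookup in a larger table. Only the three rates mentioned above are added. The names are made up, and the matching filter coefficients (which each new rate also needs) are left out.

```python
# Rates accepted by the original check (taken from the code in the first post).
ORIGINAL_RATES = (8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000)

# Rates added in this experiment; each one also needs its own filter coefficients.
ADDED_RATES = (64000, 88200, 96000)

VALID_RATES = frozenset(ORIGINAL_RATES + ADDED_RATES)

def is_valid_sample_frequency(sample_frequency):
    """Python stand-in for grabbag__replaygain_is_valid_sample_frequency()."""
    return sample_frequency in VALID_RATES

for rate in (44100, 96000, 192000):
    print(rate, is_valid_sample_frequency(rate))
```

Rates such as 176400 or 192000 would still be rejected until coefficients for them are available.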

Reply #9
Quote
I noticed that replaygain (as applied by metaflac) only works on tracks with a sample rate upto 48 khz.

See http://lists.xiph.org/pipermail/flac-dev/2...ary/003064.html

  • Nessuno
Reply #10

Quote
In C, int size is generally thought of as CPU dependent rather than compiler dependent. So on a 16-bit machine (e.g. a 286) int might be 16 bits.


Pedantically speaking, and if my memory doesn't fail me, the size of int is not fixed by the ANSI C standard (only a minimum range is required) and is thus implementation dependent: a compiler could have a 32-bit int even on a 16-bit architecture and vice versa. So if you have to rely on a fixed-width type and still write portable code, you must test sizeof(int) and accordingly define an alias for long or short to use in your code.

I don't know if and how this could apply to Replaygain code.

  • saratoga
Reply #11
Quote
Pedantically speaking, and if my memory do not fail, int size is not defined by ANSI C standard and thus is implementation dependent: a compiler could have 32 bit int even in a 16 bit architecture and vice versa.

To be pedantic, I didn't say that it was defined by the CPU.  I said it was thought of as defined by the CPU, because it is.  The reason the standard doesn't define the absolute widths of variables is to allow efficient programming on systems with different word sizes.  So compilers are given the freedom to pick the optimal size for a given system.  But the reason is to accommodate different CPUs, and since all current desktop CPUs are 32-bit or wider, you need not worry about a 16-bit compiler being installed on a contemporary computer.

Quote
if you have to rely on a fixed width type and still write portable code, you must test for sizeof(int) and accordingly define an alias for long or short to use in your code.

For modern C, you'd probably use the C99 fixed-width typedefs such as int32_t.
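
For what it's worth, the distinction can be seen from Python through the standard ctypes module: c_int follows whatever the platform's C compiler uses, while c_uint32 behaves like C99's uint32_t. This is only a sketch to show the difference, not anything taken from the FLAC sources.

```python
import ctypes

# Width of the platform's C `int`, in bits (implementation-defined in C).
int_bits = ctypes.sizeof(ctypes.c_int) * 8

# Fixed-width type: always 32 bits, wherever the code runs.
fixed_bits = ctypes.sizeof(ctypes.c_uint32) * 8

print("C int:", int_bits, "bits; uint32:", fixed_bits, "bits")

# A sample rate stored in a fixed 32-bit unsigned type survives unchanged.
assert ctypes.c_uint32(192000).value == 192000
```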

Quote
I don't know if and how this could apply to Replaygain code.

As I said last year, it does not apply at all to the code in question.

  • Nessuno
Reply #12
Quote
To be pedantic, I didn't say that it was defined by the CPU.

I didn't say you said...

  • Wombat
Reply #13
If I remember right, support for higher sampling rates in metaflac was requested a long time ago. Unfortunately I haven't seen much activity in the development of flac/metaflac.
Most likely Mr. Coalson has better things to do lately. The bug tracker at SourceForge doesn't show much activity either. I would even like to have a metaflac with the option of using the R128 standard as an alternative.

  • romor
Reply #14
Quote
See http://lists.xiph.org/pipermail/flac-dev/2...ary/003064.html

From that data it seems obvious that the Butterworth coefficients are linearly correlated, with a slight deviation in the 18900 set, so those can be obtained for an arbitrary rate.
The Yulewalk coefficients, however, are not so obvious just from looking at the data.
Perhaps with some analytical skills?
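
For the Butterworth part this can be done directly, since a second-order Butterworth high-pass can be designed for any rate with the bilinear transform. The sketch below assumes the 150 Hz cutoff from the original ReplayGain proposal; the function name is made up, and this is not code from any of the implementations discussed here. The Yulewalk part is a different matter, as it is a fit to a target response curve rather than a closed-form design.

```python
import math

def butter_highpass_2nd(cutoff_hz, fs):
    """2nd-order Butterworth high-pass via the bilinear transform.

    Returns (b, a) for y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
                              - a1*y[n-1] - a2*y[n-2], with a[0] == 1.
    """
    k = math.tan(math.pi * cutoff_hz / fs)   # pre-warped cutoff
    q = 1.0 / math.sqrt(2.0)                 # Butterworth Q
    norm = 1.0 / (1.0 + k / q + k * k)
    b = (norm, -2.0 * norm, norm)
    a = (1.0, 2.0 * (k * k - 1.0) * norm, (1.0 - k / q + k * k) * norm)
    return b, a

for fs in (44100, 48000, 96000, 192000):
    b, a = butter_highpass_2nd(150.0, fs)    # 150 Hz: assumed ReplayGain cutoff
    print(fs, b, a)
```

At 44100 Hz this gives b0 of roughly 0.985, and the values move smoothly towards (1, -2, 1) as the rate goes up, which would fit the near-linear trend seen in the tables.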

  • 2Bdecided
  • Developer
Reply #15
See...
http://lists.xiph.org/pipermail/flac-dev/2...ary/003067.html

I just sent Ernst the filter coefficients for 28kHz.

Cheers,
David.

  • romor
Reply #16
Quote from David's link:
Quote
The extended gain analysis tables in Foobar2000 (and the derivatives that copied it) are wrong.

Here is how to show that it's wrong. Use <http://www.daniweb.com/software-development/python/code/263775> to create a 1 kHz signal using a 48 kSamples/sec rate (48.wav) and a 192 kSamples/sec rate (192.wav). I modified the script to generate 2s of samples.

Because the underlying signal is the same (1kHz with a fixed amplitude) the perceived loudness should be identical, independent of the sampling rate.

Is it?



I used that messy script, which turns up in multiple places on Google, just to confirm that we are talking about the same thing, but that's not the way to treat Python. NumPy exists for a reason, even if someone's testing procedure is limited to only two audio files. It's just a shame to use bare Python for audio testing.

Assuming:
Code:
import numpy as np
from scipy.io import wavfile
from scipy.signal import chirp

Here are some one-liner functions to generate the data; just follow your imagination, it's fast and easy:
Code:
def np_tone(fs, freq=440, t=2): return np.array(16384*np.cos((2*np.pi*freq/fs)*np.arange(fs*t)), dtype=np.int16)

def np_rand(fs, t=2): return np.array(16384*np.random.random_sample(fs*t), dtype=np.int16)

def np_chirp(fs, t=2): return np.array(16384*chirp(np.linspace(0, t, fs*t), fs/2, .002, t), dtype=np.int16)

Then create, for example, a 1 kHz tone in stereo (dual mono):
Code:
data = np_tone(44100, freq=1000)
wavfile.write('44100.wav', 44100, np.column_stack((data, data)))
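
The premise of the quoted test (same 1 kHz tone, same amplitude, so same loudness) can also be sanity-checked on the input side without writing any files. The sketch below uses only the standard library so that it runs without NumPy; the amplitude of 16384 and the 2 s duration are just illustrative choices, and the function name is made up.

```python
import math

def tone_rms(fs, freq=1000.0, amplitude=16384.0, seconds=2):
    """RMS of a sampled cosine; no WAV file needed for this check."""
    n = int(fs * seconds)
    total = 0.0
    for i in range(n):
        sample = amplitude * math.cos(2.0 * math.pi * freq * i / fs)
        total += sample * sample
    return math.sqrt(total / n)

rms_48 = tone_rms(48000)
rms_192 = tone_rms(192000)
print(rms_48, rms_192)   # both close to amplitude / sqrt(2)
```

Since the raw level is the same at both rates, any difference in the reported gain would have to come from the analysis filters for that rate.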


BTW, for some reason RG does not work for 176400 Hz, either with wvgain or with another tool that uses the same table (http://goo.gl/JwxaT). Perhaps the coefficients are bad for that rate.