Hi,
I tried to implement the Harmonic Product Spectrum (HPS) as described, for instance, in this Introduction to Signal Processing chapter (http://www.scribd.com/doc/50529329/154/Harmonic-product-spectrum).
The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested. But I'm certainly doing something wrong, so I'll describe the process I've followed so far. First, the basics:
- Split the audio samples into windows of size N = 1024
- Apply a Hann window to these samples
- Run an FFT on each window to get N/2 + 1 bins
- Compute the magnitude buffer with hypot(re, im), giving a spectrum of length N/2 + 1
Those first steps are verified and OK, so I won't detail the implementation here.
So now, concerning HPS:
I first create an f0 histogram of length (N/2 + 1) / M, where M - 1 is the number of downsampled copies (here, M = 3). Each window's processing increments the histogram at the index of the fundamental frequency found. Here is the code run for each window:
float max = 0;
int freq_id = 0;
for (int i = 0; i < (N / 2 + 1) / M; i++) {
    // product of the magnitude spectrum (length N/2 + 1) and its
    // M - 1 downsampled copies
    float mul = 1;
    for (int n = 1; n <= M; n++)
        mul *= magnitude[i * n];
    // track the strongest product and its bin
    if (mul > max) {
        max = mul;
        freq_id = i;
    }
}
f0[freq_id]++;
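Wrapped up as a standalone function (with the bin-to-Hz conversion, and skipping the DC bin so bin 0 can't win by default), the idea is — function and parameter names are mine:

```c
/* Harmonic Product Spectrum over one magnitude buffer of length
 * len = N/2 + 1: multiply the spectrum with its copies downsampled by
 * 2..M and return the bin index of the strongest product. */
static int hps_peak_bin(const float *magnitude, int len, int M)
{
    int freq_id = 0;
    float max = 0.0f;

    for (int i = 1; i < len / M; i++) {   /* skip bin 0 (DC) */
        float mul = 1.0f;
        for (int n = 1; n <= M; n++)
            mul *= magnitude[i * n];
        if (mul > max) {
            max = mul;
            freq_id = i;
        }
    }
    return freq_id;
}

/* Convert a bin index to Hz for an fft_size-point FFT. */
static float bin_to_hz(int bin, int fft_size, float sample_rate)
{
    return bin * sample_rate / fft_size;
}
```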
And at the end I pick the highest value in f0 to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS result (a peak at freq_id = 0) is to be expected. So the question is: how is this really supposed to work?
> And at the end I pick the highest value in f0 to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS result (a peak at freq_id = 0) is to be expected.
Sorry, I don't know what you mean by "fundamental frequency of the whole song". I understand how the fundamental relates to a note or chord, but I don't know about a whole song... I would assume that means the lowest frequency in the song?
That might work for a solo instrument, but if you are analyzing a recording of a rock band, the "fundamental frequency" is probably the kick drum. If you want to analyze the musical notes, you might need to filter out (or ignore) the percussion. You might also need to ignore the attack and analyze the sustained part of the note/chord.
>> And at the end I pick the highest value in f0 to get the fundamental frequency of the whole song. But since the highest magnitudes are always in the lower frequencies, the HPS result (a peak at freq_id = 0) is to be expected.
> Sorry, I don't know what you mean by "fundamental frequency of the whole song". I understand how the fundamental relates to a note or chord, but I don't know about a whole song... I would assume that means the lowest frequency in the song?
I am looking for the overall pitch of the song, so the histogram is there to count the fundamental frequency of each window and grab the dominant one.
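For clarity, the "grab the dominant one" step is just an argmax over the histogram (function name is mine):

```c
/* Return the index of the most frequently detected fundamental bin
 * in the f0 histogram of length len. Ties go to the lower bin. */
static int dominant_bin(const unsigned *f0, int len)
{
    int best = 0;
    for (int i = 1; i < len; i++)
        if (f0[i] > f0[best])
            best = i;
    return best;
}
```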
> That might work for a solo instrument, but if you are analyzing a recording of a rock band, the "fundamental frequency" is probably the kick drum. If you want to analyze the musical notes, you might need to filter out (or ignore) the percussion. You might also need to ignore the attack and analyze the sustained part of the note/chord.
I'm looking for a way to extract the pitch of songs of any kind as well as possible; maybe HPS isn't what I need. Trying to filter out specific sounds might require a lot of heuristics I don't really want to deal with at first…
If you have a few samples where HPS applies, I'm interested in them: I could check whether the algorithm is at least implemented correctly and whether my target (a whole song instead of specific musical notes) is simply wrong.
Note that I'm kind of new to all of this, so I'm certainly mixing up a bunch of things (as you have probably already noticed).
> The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested.
Maybe I'm wrong, but my guess would be that you should apply some sort of equal-loudness curve compensation to the spectrum.
Also, the window size probably has to be optimized, maybe even dynamically. Again, I can't tell you exactly how, but the word "autocorrelation" comes to mind.
>> The issue I have is that the peak is always detected at the lower frequencies with the various music samples I tested.
> Maybe I'm wrong, but my guess would be that you should apply some sort of equal-loudness curve compensation to the spectrum.
> Also, the window size probably has to be optimized, maybe even dynamically. Again, I can't tell you exactly how, but the word "autocorrelation" comes to mind.
I can't easily change the window size in the context of my app, unfortunately. However, I started implementing the YIN method, and it seems much more effective, so I'll stick with that. It is autocorrelation-based, so no spectrum comes into play, but the results sound better.