Skip to main content

Notice

Please note that most of the software linked on this forum is likely to be safe to use. If you are unsure, feel free to ask in the relevant topics, or send a private message to an administrator or moderator. To help curb the problems of false positives, or in the event that you do find actual malware, you can contribute through the article linked here.
Topic: Graph FFT analysis (Read 8681 times) previous topic - next topic
0 Members and 1 Guest are viewing this topic.

Graph FFT analysis

Hi,

I'm trying to make this kind of graph from an audio file:



Unfortunately, the only thing I'm able to get is this:



So here is what I do and how I understand it:

- Prepare a spectrum array of N/2+1 floating values (N=512, which is approximately a bit less than 1sec with a 44.1KHz source)
- Decode the audio and get series of PCM amplitudes data
- Split those data in windows of N samples each
- For each samples window, make a Fourier transform on it to get N/2+1 complex values. For each complex, re²+im² is added to spectrum, with i going from 0 to N/2+1
- When every windows has been processed, I have an average spectrum of the whole input file, so with i going from 0 to N/2+1, I make the graph with:
X = i / N
Y = 10 * log10(spectrum / number_of_windows)

So, I'm certainly be doing something completely absurd, and missing a lot of stuff, but maybe you can enlighten me a bit.

Thanks.

Graph FFT analysis

Reply #1
Well, the short answer is that you're not windowing (apodizing) your data, and the spectral leakage is causing the blatantly obvious 1/f sidelobe that's drowning the real data. Note that I mean "windowing" as referring to applying a windowing function, not to splitting up the data into segments.

You should also be using overlapping segments as per Welch's method, but that is a relatively minor quibble that doesn't materially affect how the plot "looks" (but it does effect higher-quality results). If you want to go completely over-the-top, use a power-complementary window function too.

The X-axis also ought to be scaled by Fs (0-22050 instead of 0-0.5), so, X = i*Fs/N.

Your Y-axis is screwed up unless you really are feeding it signals which are +30dbFS. Generally these plots will never rise above 0dbFS, unless you are showing a spectrum of a pure sine wave of amplitude 1, *and* you're not plotting RMS power figures (which I'm pretty sure would be a bug?). You probably need to scale it by N, so: Y = 10 log10 (spectrum / nWindows / N)

Label your axes.

Graph FFT analysis

Reply #2


Is that actually incorrect?  It looks like a reasonable DFT to me.

Also, FWIW 512 samples is a lot less then 1 second at 44100 samples per second.

Edit:  If you're just trying to compute the FFT of some mp3 file, just FFT it all in one big transform and then sum every few samples of the output so that you don't have to worry about windowing.

Graph FFT analysis

Reply #3
Well, the short answer is that you're not windowing (apodizing) your data, and the spectral leakage is causing the blatantly obvious 1/f sidelobe that's drowning the real data. Note that I mean "windowing" as referring to applying a windowing function, not to splitting up the data into segments.


By windowing, do you mean applying some kind of Hamming window on each samples window before calling the FFT? If I do that, the result are not that much changed:

[/quote]

Sorry, fixed in the image above.

Graph FFT analysis

Reply #4
Also, FWIW 512 samples is a lot less then 1 second at 44100 samples per second.
True but given the sample plots he's trying to replicate 512 samples ought to be about enough for that level of frequency resolution.

Quote
Edit:  If you're just trying to compute the FFT of some mp3 file, just FFT it all in one big transform and then sum every few samples of the output so that you don't have to worry about windowing.
... that's not avoiding windowing, that's windowing with a truncated sinc function and sampling the resulting spectrum. Which is a materially crummier method than Welch.

Graph FFT analysis

Reply #5
Is that actually incorrect?  It looks like a reasonable DFT to me.


Maybe it's coherent, but It's not the result I expect from the pattern image I pasted above.

Also, FWIW 512 samples is a lot less then 1 second at 44100 samples per second.


Mmh, true :)


Edit:  If you're just trying to compute the FFT of some mp3 file, just FFT it all in one big transform and then sum every few samples of the output so that you don't have to worry about windowing.


Well, I'd like to make it progressive (memory usage, stream support, etc), and I think understanding this windowing is important.

Graph FFT analysis

Reply #6
Well, the short answer is that you're not windowing (apodizing) your data, and the spectral leakage is causing the blatantly obvious 1/f sidelobe that's drowning the real data. Note that I mean "windowing" as referring to applying a windowing function, not to splitting up the data into segments.


By windowing, do you mean applying some kind of Hamming window on each samples window before calling the FFT? If I do that, the result are not that much changed:

Use a different window function. Hamming is about the third-worst window function you could have used - since its endpoints are nonzero, it effectively behaves much like a rectangular window, except a little better (note how there is a little bit more detail in the Hamming plot, but not much).

A much, much better choice would be Hann, aka Hanning, aka raised cosine. But if you do that, you should be averaging amplitudes instead of powers, since Hann is amplitude-complementary but not power-complementary, and adjust your Y axis calculation accordingly. (Or choose a power-complementary window.)

Graph FFT analysis

Reply #7
that's not avoiding windowing, that's windowing with a truncated sinc function and sampling the resulting spectrum. Which is a materially crummier method than Welch.


True, but its easier and unlikely to give noticeably worse results given the large window size.

Graph FFT analysis

Reply #8
Use a different window function. Hamming is about the third-worst window function you could have used - since its endpoints are nonzero, it effectively behaves much like a rectangular window. A much, much better choice would be Hann, aka Hanning, aka raised cosine. But if you're summing power as opposed to amplitude, you might want to try a straight-up cosine window too since that ought to be power complementary, but depending on your dynamic range needs, you might need to use a more elaborate window.


I changed the function with Hann one, still similar results:



The function I use is:

Code: [Select]
static void hann(double *in, int n)
{
    int i;
    for (i = 0; i < n; i++)
        in[i] *= 0.5 * (1 - cos(2 * M_PI * i / (n - 1)));
}


Which is called on each N samples before calling the FFT.

I still don't understand why it does not look like at all the spectrum I pasted in first post. Also, I'm wondering about the frequencies available above > 16kHz since it's a MP3… So there is definitely something wrong.

Graph FFT analysis

Reply #9
Post your code.


Graph FFT analysis

Reply #11
Might as well post your data while you're at it...


Graph FFT analysis

Reply #13
You're not separating the interleaved channel data before running the FFT. That explains everything.

Graph FFT analysis

Reply #14
You're not separating the interleaved channel data before running the FFT. That explains everything.


What do you mean? Left and right channels? I downmix so there is only one channel when preparing the data:

Code: [Select]
pa.in[pa.filled++] = (pcm[i] + pcm[i + 1]) / 2 / (double)SHRT_MAX; // downmix + int16_t to float


Edit: mmh I messed with the 'i':

Code: [Select]
for (i = 0; i < n; i += 2)


This is better. And here is the result:



It looks better, but it's still not perfect.

Graph FFT analysis

Reply #15
You're not separating the interleaved channel data before running the FFT. That explains everything.


What do you mean? Left and right channels? I downmix so there is only one channel when preparing the data:

Code: [Select]
pa.in[pa.filled++] = (pcm[i] + pcm[i + 1]) / 2 / (double)SHRT_MAX; // downmix + int16_t to float


Oh. I missed that line :F but the summation takes place under int16_t type rules, which causes major overflow issues. It probably ought to always be done in double precision.

Code: [Select]
pa.in[pa.filled++] = ((double) pcm[i] + pcm[i + 1]) / 2 / SHRT_MAX; // downmix + int16_t to float


This is also a very plausible explanation for what you're seeing.

Graph FFT analysis

Reply #16
You're not separating the interleaved channel data before running the FFT. That explains everything.


What do you mean? Left and right channels? I downmix so there is only one channel when preparing the data:

Code: [Select]
pa.in[pa.filled++] = (pcm[i] + pcm[i + 1]) / 2 / (double)SHRT_MAX; // downmix + int16_t to float


Oh. I missed that line :F but the summation takes place under int16_t type rules, which causes major overflow issues. It probably ought to always be done in double precision.

Code: [Select]
pa.in[pa.filled++] = ((double) pcm[i] + pcm[i + 1]) / 2 / SHRT_MAX; // downmix + int16_t to float


This is also a very plausible explanation for what you're seeing.


This fix results above 3-4 decimals after the comma. Thanks. Though, graph is looking the same.

Graph FFT analysis

Reply #17
that's not avoiding windowing, that's windowing with a truncated sinc function and sampling the resulting spectrum. Which is a materially crummier method than Welch.


True, but its easier and unlikely to give noticeably worse results given the large window size.
Doesn't that effectively limit the usable SNR to perhaps no more than 50db in the absolute best case scenario? As opposed to 200+db?

Graph FFT analysis

Reply #18
I finally got it. I messed with the number of channels, you were right. Here is the result:

Lossless song:



Same song with lossy compression:



Thanks a lot for your help. Here is the fixed code: http://pastie.org/1416138

Maybe it's still not perfect, but it is at least much more correct imo.

Graph FFT analysis

Reply #19
Well, the short answer is that you're not windowing (apodizing) your data, and the spectral leakage is causing the blatantly obvious 1/f sidelobe that's drowning the real data. Note that I mean "windowing" as referring to applying a windowing function, not to splitting up the data into segments.


By windowing, do you mean applying some kind of Hamming window on each samples window before calling the FFT? If I do that, the result are not that much changed:

Use a different window function. Hamming is about the third-worst window function you could have used - since its endpoints are nonzero, it effectively behaves much like a rectangular window, except a little better (note how there is a little bit more detail in the Hamming plot, but not much).

A much, much better choice would be Hann, aka Hanning, aka raised cosine. But if you do that, you should be averaging amplitudes instead of powers, since Hann is amplitude-complementary but not power-complementary, and adjust your Y axis calculation accordingly. (Or choose a power-complementary window.)



And, use 1/2 overlap.  When you window with anything you need to overlap. With a Hann, .5 is good.
-----
J. D. (jj) Johnston

Graph FFT analysis

Reply #20
Wouldn't using Hann with 1/2 overlap require him to average amplitudes instead of powers?

Or is it ever correct to average amplitudes?

Graph FFT analysis

Reply #21
Wouldn't using Hann with 1/2 overlap require him to average amplitudes instead of powers?

Or is it ever correct to average amplitudes?


Well, averaging power is more like what the ear does, BUT since Hann is amplitude-complementary there is some argument, I suppose, for summing abs(x(w)), but not signed amplitude, or complex amplitude. That gets you an interesting undersampled spectrum with some unusual characteristics, to say the least.
-----
J. D. (jj) Johnston

Graph FFT analysis

Reply #22

What do you mean? Left and right channels? I downmix so there is only one channel when preparing the data:
Code: [Select]
pa.in[pa.filled++] = (pcm[i] + pcm[i + 1]) / 2 / (double)SHRT_MAX; // downmix + int16_t to float

Oh. I missed that line :F but the summation takes place under int16_t type rules, which causes major overflow issues.

No it doesn't. In C and C++ we have a thing called "integral promotion". If the type int is 32bit this code won't suffer from any under/overflows. But I agree with you that this should be rewritten -- just to remove the additional quantization error involved with the integer division.