FFT Analysis for Dummies

Folks,

I'd like to learn more about FFTs. I'm not a math guy, so I imagine I'll never fully understand all the nuances. But I'd like to try anyway. I understand the general concept, that an FFT shows how much energy is present at different frequencies. What I'd like to know is how to set the various parameters such as FFT Size and Overlap, when to use the different types of window smoothing and why, and so forth. Below is a list of settings in the Rightmark FFT analyzer with my associated questions, and hopefully this is a good place to start.

[blockquote]FFT Size: I understand that the higher the number, the better the frequency resolution. So why is this even adjustable? Why not just use the highest resolution possible automatically?

Zero Padding: This ranges from None through 8x. What does this do?

FFT Overlap: What is this for, and when would you use higher or lower values?

FFT Window: I recognize some of the names, but have no idea when or why one would select these choices.

Kaiser Window beta: I have no idea what this does either.

I noticed that Sound Forge lets you pick the number of slices to show. What are slices and why is more than one needed to analyze a Wave file?[/blockquote]
I realize this is a lot to ask! If anyone knows of a good newbie-level tutorial that explains this in plain English with minimal math, I'd love to see it. Everything I've found through Google starts right in with math that's way over my head.

--Ethan
I believe in Truth, Justice, and the Scientific Method

Reply #1
FFT Size: I understand that the higher the number, the better the frequency resolution. So why is this even adjustable? Why not just use the highest resolution possible automatically?

"better frequency resolution" means "worse time resolution", and vice versa.

FFT Window: I recognize some of the names, but have no idea when or why one would select these choices.

Kaiser Window beta: I have no idea what this does either.

There's a good Wikipedia article about window functions:
Quote
In summary, spectral analysis involves a tradeoff between resolving comparable strength signals with similar frequencies and resolving disparate strength signals with dissimilar frequencies. That tradeoff occurs when the window function is chosen.

Reply #2
"better frequency resolution" means "worse time resolution", and vice versa.


Believe it or not, this still means very little.

Not everyone uses the same words the same way.   

Paul

       

"Reality is merely an illusion, albeit a very persistent one." Albert Einstein

Reply #3
In most music the frequency content changes over time, and very often multiple frequencies are present at any given instant. The more finely the frequencies are resolved for a given stretch of time (displayed, for instance, as graphic or numerical data), the less precisely you can tell when any particular frequency occurs within that stretch.

Reply #4
Ok.

An FFT is a discrete-time, finite length linear algebraic transform. It is orthonormal, which is to say it is a tight frame, or 1 to 1 and onto.

It uses a set of basis vectors that are derived directly from the complex exponential used in the continuous-time, continuous-frequency Fourier Transform.

Because the FFT basis vectors are periodic, the transform is periodic across block boundaries, i.e. it behaves as if the data repeats to infinity. If you do a 2-point FFT on two values, say 1 and -1 (yes, that's ridiculous, but it makes the point), the result is the same as calculating the full integral form on the infinite sequence 1 -1 1 -1 1 -1 1 -1 1 -1 ....
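The 2-point example can be checked numerically. This is only a sketch in plain Python: the naive `dft` helper is an illustration written for this thread (an FFT returns the same numbers, just faster), and the 8-sample block stands in for "the same data repeated".

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; an FFT returns the same values, only faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

two_point = dft([1, -1])        # bin 0 is DC, bin 1 is Nyquist
eight_point = dft([1, -1] * 4)  # the same pattern repeated four times: bin 4 is Nyquist

print([round(abs(c), 6) for c in two_point])
print([round(abs(c), 6) for c in eight_point])
```

In both cases every bin is zero except the Nyquist bin, which is what you would expect if the transform treats the block as one period of an endlessly repeating 1 -1 1 -1 pattern.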

Among other things, this means that for signals that are not periodic at a block length, there will be a discontinuity at the ends.

This is why you do windowing: it removes the discontinuity at the ends of the block by tapering the signal to zero there.
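A small numerical sketch of that discontinuity, in plain Python. The block size of 64, the tone frequencies, the choice of bin 24 as "far from the tone", and the Hann window are all assumptions picked for illustration; the `dft` and `hann` helpers are hypothetical names, not from any analyzer mentioned here.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; an FFT gives the same numbers faster."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann(n):
    """Periodic Hann window: rises from zero at the block start and falls back toward zero at the end."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

N = 64
fits = [math.sin(2 * math.pi * 8.0 * t / N) for t in range(N)]    # 8 whole cycles per block
misfit = [math.sin(2 * math.pi * 8.5 * t / N) for t in range(N)]  # 8.5 cycles: the ends do not join up
windowed = [s * w for s, w in zip(misfit, hann(N))]

far_bin = 24  # a bin well away from the tone
leak_fits = abs(dft(fits)[far_bin])
leak_misfit = abs(dft(misfit)[far_bin])
leak_windowed = abs(dft(windowed)[far_bin])

print(leak_fits, leak_misfit, leak_windowed)
```

The tone that fits the block exactly leaves the far bin essentially empty, the one that does not fit spills energy into it, and windowing pulls that spill down by a large factor.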

A window is nothing but the impulse response of a lowpass filter with specific properties.

I have to go shopping. So more later?

Some points: an FFT is not an approximation, nor is it a model. It is a precise transform with a precise inverse, one that obeys power and amplitude conservation both in the time and short-term frequency domain. In this it is actually marginally more precise than the full Fourier Transform, which suffers some zero-power, small-amplitude issues, but only with signals that cannot exist in the real world, perhaps outside of astrophysics.

Must go, spouse grumbling.
-----
J. D. (jj) Johnston

Reply #5
I'm not a super math dude but I have been reading math texts lately and I'm going to give this a try. If you don't know it, you can't teach it and all. Please be kind.

First off, FFT refers to an optimized implementation of the discrete Fourier transform (DFT). The FFT produces the same results as a DFT.
The discrete Fourier transform is for sampled data (i.e. digital audio).

The basic idea is you transform a block of samples representing signal intensity from time 0 to time N to a block of samples representing frequency intensity from frequency 0 to frequency N. N is the window or block size. If you choose N large (e.g. as long as a whole track) you'll get detailed frequency information for the whole track but you won't have a clue where in the track those components exist. Using numerous shorter blocks, you can pinpoint where in time the various frequencies are occurring (e.g. when which notes are being played) but in using the shorter block you only get a coarse idea about what frequencies are present (e.g. can't distinguish C from C#).
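To put rough numbers on that trade-off, here is a sketch in plain Python. It assumes 44.1 kHz audio and uses the equal-tempered frequencies of middle C and the C# above it; the function names are hypothetical. Bin spacing is only a rule of thumb for what can be told apart, since windowing broadens peaks further.

```python
SAMPLE_RATE = 44100            # Hz, assumed CD-rate audio
C4, C_SHARP4 = 261.63, 277.18  # Hz, middle C and the C# above it (equal temperament)

def bin_spacing_hz(sample_rate, fft_size):
    """Distance in Hz between neighbouring FFT bins."""
    return sample_rate / fft_size

def block_duration_ms(sample_rate, fft_size):
    """How much time one block of fft_size samples spans."""
    return 1000.0 * fft_size / sample_rate

for fft_size in (256, 1024, 4096, 16384):
    spacing = bin_spacing_hz(SAMPLE_RATE, fft_size)
    duration = block_duration_ms(SAMPLE_RATE, fft_size)
    verdict = ("C and C# land in different bins" if spacing < C_SHARP4 - C4
               else "bins too coarse to separate C from C#")
    print(fft_size, round(spacing, 2), "Hz/bin", round(duration, 1), "ms/block", verdict)
```

Each quadrupling of the block size makes the bins four times narrower and the stretch of time that each spectrum describes four times longer.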

To overcome this, frequency analysis applications will often use the longer blocks and instead of placing them one after another, the blocks will be overlapped. This sort of gives you the best of both worlds. It does require more processing to do the transform - if your overlap is 50%, you are computing FFTs for twice as many samples.
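The extra work from overlapping can be counted directly. A minimal sketch, assuming the overlap is given as a fraction of the block size; `block_starts` is a hypothetical helper, not a function from any particular analyzer.

```python
def block_starts(total_samples, fft_size, overlap):
    """Start positions of every full block when neighbouring blocks overlap by the given fraction."""
    hop = int(fft_size * (1.0 - overlap))
    if hop < 1:
        raise ValueError("overlap must be less than 100%")
    return list(range(0, total_samples - fft_size + 1, hop))

no_overlap = block_starts(8192, 1024, 0.0)
half_overlap = block_starts(8192, 1024, 0.5)

print(len(no_overlap), "blocks without overlap")
print(len(half_overlap), "blocks with 50% overlap, starting at", half_overlap[:3])
```

For a long file the block count at 50% overlap approaches twice the non-overlapped count, which is where the doubling of processing comes from.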

When you edit audio on a workstation you create clicks or other artifacts at the edit points. We have this same problem when we edit audio to do FFTs. In editing, we address this by cross fading at the edit points. And that's exactly what we do with FFTs. We apply an envelope to the audio data in each block (fade it up at the beginning, fade it back down at the end) before performing the FFT. Mathematicians call these envelopes "windows". There are many shapes of windows because there are many compromises to be made when you're slicing and dicing like this.

Reply #6
I had a very informative conversation about transforms with SebastianG that seems relevant to post in this thread:

You have PCM data and perform a transform to get a spectral view. Is this an exact process or is there more than one way (transform) to do it?
Quote
Generating spectral views a la Cool Edit is usually done by applying the FFT on overlapping "windows" (portions of the signal). There are a lot of possibilities regarding windowing functions, block size, how blocks overlap etc, though. Also, what you see -- coded via color -- is just one half of the information (amplitude); the phase information is lost during display.


After the transform, can the data be stored in a "frequency" view and then losslessly transformed back into standard PCM data?
Quote
What's the point in going back? What are you aiming for? What exactly do you mean by "view"? There are various ways for these kinds of transformations with different trade-offs. There are those that blow up the amount of data (non-critically sampled) and those that retain the amount of data (critically sampled), which are used in many lossy codecs for example. The latter group is full of different time/frequency resolution trade-offs and other things like phase response that can be considered. A prominent example is the MDCT (used by MP3, AAC, Vorbis, WMA). Mathematically the MDCT (the transform as a whole on many consecutive blocks with special care taken at the start and end) is reversible. In practise, however, computers work with finite-precision arithmetic, which means that rounding errors accumulate. So, it's not exactly lossless. But there is a way to make it work even for lossless codecs, which is what Fraunhofer did (IntMDCT).


Is there any correlation between wordlength (16 bit vs 24 bit) and how the spectral content will be distributed/displayed?
Quote
These kinds of transform are usually defined on real numbers and applied with the help of floating point numbers in practise. No notion of wordlength. If you convert your signal from 24 bits to 16 bits then you're usually adding noise which can affect your "spectral view". In any case (16 or 24) you'd convert the samples to floating point numbers and run the FFT or something like that.

Cheers,
SG

Reply #7
After the transform, can the data be stored in a "frequency" view and then losslessly transformed back into standard PCM data?


Not from the amplitude view alone. In order to route data through an FFT, and then back through the inverse, and get your original data back, you must use both the amplitude and phase information that the FFT produces. Only the amplitude information is required for a frequency analysis.
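A quick way to see this is to run a round trip both ways. The sketch below is plain Python with made-up sample values; `dft` and `idft` are naive illustrative helpers, not code from any analyzer.

```python
import cmath
import math

def dft(x):
    """Naive forward transform; each output value carries both amplitude and phase."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse transform, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

samples = [0.0, 1.0, 0.5, -0.25, -1.0, 0.25, 0.75, -0.5]  # arbitrary test data
spectrum = dft(samples)

restored = idft(spectrum)                     # amplitude and phase both kept
amplitude_only = [abs(c) for c in spectrum]   # phase thrown away, as in a spectrum display
guess = idft(amplitude_only)

roundtrip_error = max(abs(a - b) for a, b in zip(samples, restored))
amplitude_only_error = max(abs(a - b) for a, b in zip(samples, guess))
print(roundtrip_error, amplitude_only_error)
```

With the full complex values the round trip is exact to within floating-point rounding; with amplitudes alone the result is a different waveform that merely shares the same magnitude spectrum.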

Here is one of the best discussions of FFT filtering that I've found yet:

An FFT filtering tool from the world of communications that may have more general usage

 

Reply #8
The basic idea is you transform a block of samples representing signal intensity from time 0 to time N to a block of samples representing frequency intensity from frequency 0 to frequency N. N is the window or block size.


If you have N real-valued samples (i.e. no complex numbers on input, as most PCM supplies) you have N/2 + 1 complex frequency values. Starting at k=0 for DC and ending at k=N/2 for Nyquist, you have two real values (DC and Nyquist) and N/2 - 1 complex values. The complex values are echoed (by conjugation, i.e. changing the sign of the imaginary part) at the negative frequencies (for a real signal, you have both positive and negative frequencies, one is the complex conjugate value of the other).

There are other transforms (DCT, DST, etc) that are real to real, and that provide N frequency components instead of N/2 complex components, effectively, but then phase is encoded in interesting ways inside of twice as many real values.

While it may seem odd that you have N/2 components, they are complex, so you actually have N results, half real, half imaginary.
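The bookkeeping can be verified on a small block. A sketch in plain Python with arbitrary sample values and a naive `dft` helper written for illustration:

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a block of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

samples = [0.0, 1.0, 0.5, -0.25, -1.0, 0.25, 0.75, -0.5]  # N = 8 real samples
N = len(samples)
spectrum = dft(samples)

unique = spectrum[:N // 2 + 1]  # DC up to Nyquist; the rest is a mirror image
mirrored_ok = all(abs(spectrum[N - k] - spectrum[k].conjugate()) < 1e-9
                  for k in range(1, N // 2))

# DC and Nyquist are purely real, the bins in between are complex.
independent_reals = 2 + 2 * (N // 2 - 1)
print(len(unique), mirrored_ok, independent_reals)
```

Eight real samples go in and eight independent real numbers come out: two real bins plus three complex bins, with the upper half of the output carrying no new information.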
-----
J. D. (jj) Johnston

Reply #9
Ethan, perhaps the best single resource to answer this type of question is the Bruel and Kjaer online library:

http://www.bksv.com/Library/Technical%20Re...mp;st=1984-1980

The publications are not only aimed at engineers but also technicians and some at wholly non-technical people. There is rather a lot of information but I would suggest starting by browsing the earlier publications since the more complete explanations were often given when the techniques were relatively new.

Reply #10
While it may seem odd that you have N/2 components, they are complex, so you actually have N results, half real, half imaginary.

Thanks JJ. Dumbing it down a little bit: You start with N samples. The DCT produces N/2 frequency amplitude results and N/2 corresponding phase results. The phase information is typically discarded in spectral analysis applications.

Reply #11
Answering a bit more than your question asks, for anyone else who might be interested in this thread.

A transform is just a different way of representing the same set of data. If we have three samples, 4, 7 and 12, we could use a polynomial transform and use transformed coefficients of 4, 2, 1 and recreate the original samples through the equation a + b * x + c * x^2 by plugging in x for each sample (f(0) = 4, f(1) = 7, f(2) = 12). We haven't stored any less or any more data by using the transform rather than the samples, and we can change between them without loss.
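The polynomial example is small enough to write out in both directions. A sketch in plain Python, hard-wired to exactly three samples taken at x = 0, 1, 2; the function names are made up for illustration.

```python
def to_coefficients(samples):
    """Forward transform: find a, b, c so that a + b*x + c*x^2 passes through the three samples."""
    y0, y1, y2 = samples
    c = (y2 - 2 * y1 + y0) / 2
    b = y1 - y0 - c
    a = y0
    return [a, b, c]

def to_samples(coefficients):
    """Inverse transform: evaluate the polynomial at x = 0, 1, 2."""
    a, b, c = coefficients
    return [a + b * x + c * x ** 2 for x in range(3)]

coefficients = to_coefficients([4, 7, 12])
print(coefficients)
print(to_samples(coefficients))
```

Three numbers in, three numbers out, and going there and back returns the original samples, which is the sense in which a transform is only a change of representation.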

A Fourier transform is the same thing, but the equation to recreate the samples is a + b*sin(x) + c*cos(x) + d*sin(2x) + e*cos(2x) + ...
Taking the Fourier transform is done by solving a giant system of equations to determine the correct coefficients to recreate the original samples out of this equation.
The Fourier transform has the nice property that it transforms the sample data into a representation that models the human ear (and other physical systems) very well.
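Here is a sketch of that sine-and-cosine recipe in plain Python. It gets the coefficients from a naive DFT rather than by literally solving a system of equations (the DFT is a shortcut to the same coefficients); the sample values and variable names are invented for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a block of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

samples = [0.0, 1.0, 0.5, -0.25, -1.0, 0.25, 0.75, -0.5]
N = len(samples)
spectrum = dft(samples)

constant = spectrum[0].real / N                                   # the "a" term
cos_coeff = {k: 2.0 * spectrum[k].real / N for k in range(1, N // 2)}
sin_coeff = {k: -2.0 * spectrum[k].imag / N for k in range(1, N // 2)}
nyquist = spectrum[N // 2].real / N                               # highest cosine term

def rebuild(t):
    """Evaluate a + b*sin(x) + c*cos(x) + d*sin(2x) + e*cos(2x) + ... at sample t."""
    x = 2.0 * math.pi * t / N
    value = constant + nyquist * math.cos((N // 2) * x)
    for k in range(1, N // 2):
        value += sin_coeff[k] * math.sin(k * x) + cos_coeff[k] * math.cos(k * x)
    return value

rebuilt = [rebuild(t) for t in range(N)]
max_error = max(abs(a - b) for a, b in zip(samples, rebuilt))
print(max_error)
```

Eight samples turn into eight coefficients (one constant, three sine, three cosine, one Nyquist cosine), and summing the series gives back the samples to within rounding error.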

When we take the FT (via FFT or DFT) of a large audio file, we are going to get coefficients representing different frequency bands. Unfortunately, this information gives us only the average energy of each frequency band, not the precise occurrence of the frequency within the song. If you had a weird high-pitch noise in some small section of the song, you could see it in the spectrograph, but would have no idea where it occurs. Just as looking at the stream of audio samples gives you "time" data but no frequency information, the frequency coefficients of a Fourier transform give you "frequency" data but no time information.

To work around this issue, we take the FT of sub-sections of the song called windows. A smaller window gives us fewer frequency bands, but if we see the weird high-pitched noise, at least we've narrowed it down to the current window. In the FT lingo, you can trade off "time resolution" and "frequency resolution" by using smaller or larger windows of audio. In Sound Forge, "FFT Size" corresponds to window size. This technique of taking FFTs over successive short windows is called the "Short Time Fourier Transform". Wavelets are used when time and frequency resolution need to be adjusted more precisely, but that's a conversation for another day.

When you see a spectrogram (2D spectrograph over time), it is a series of windowed FTs.  Each vertical "strip" is one window.

Windowing the audio, however, causes an annoying artifact. If you were to play the sub-section of audio out of your speakers, you'd likely get a 'click' at the beginning and the end since they don't occur at zero crossings. It shows up as noise in the spectrograph just like it shows up as noise on the speakers. And because you can't tell "where" the noise occurs within the window, there's no way to isolate it out. You can see this phenomenon in low bitrate video as "blocking" artifacts - same reason. A windowing function is used to avoid these blocking artifacts. Conceptually, it fades in the start of the window and fades out the end of the window.

But as you can imagine, a windowing function is also destroying our data. This is where overlap comes in. If we let the windows overlap each other - e.g. first window is samples 0 through 1023 and the second window is samples 512 through 1535, we can avoid destroying the data. This is because the audio we "faded out" during the current window becomes part of the "fade in" of the next window. The downside of the windowing function is that the data has "smeared" itself across two windows. Different windowing functions change the amount of smear, allowing a trade-off between blocking artifacts and smear.
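The "faded out here, faded in there" idea can be checked by adding up how much weight each sample receives across all windows. This sketch assumes a Hann window and 50% overlap with 1024-sample windows, matching the 0-1023 / 512-1535 example; other window shapes and overlaps will not necessarily sum so neatly.

```python
import math

def hann(n):
    """Periodic Hann window: fades in from zero and fades back out."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

WINDOW, HOP, TOTAL = 1024, 512, 4096  # 50% overlap over a short stretch of audio
weights = hann(WINDOW)

coverage = [0.0] * TOTAL
for start in range(0, TOTAL - WINDOW + 1, HOP):
    for t, w in enumerate(weights):
        coverage[start + t] += w

# Skip the first and last half-window, which only one window reaches.
interior = coverage[HOP:TOTAL - HOP]
print(min(interior), max(interior))
```

Away from the very ends of the file, every sample gets a total weight of 1: whatever one window fades out, its neighbour fades in by the same amount.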

Hope that helps.

Reply #12
Thanks for the advice and links. I thought to Google "FFT for dummies" and found a few more. I'm reading through this one now:

http://www.dspdimension.com/admin/dft-a-pied/

I'll be back with questions. I promise.

--Ethan
I believe in Truth, Justice, and the Scientific Method

Reply #13
Okay, I have a few quick questions:

In Sound Forge I can select a portion of a Wave file for analysis, but I see no equivalent feature in the Rightmark analyzer. Is there a way to do that? If not, where in the file does Rightmark grab the specified number of samples to analyze?

Let's say I have a long wave file and I load it into Sound Forge and select a 5-second section. I understand that the FFT Size is the number of samples to analyze. But if my highlighted section is 5 seconds long, from where does Sound Forge take those samples? Are they contiguous? Does it skip every third sample so the total number (say, 65,536) starts at the beginning of the highlighted area and ends at the end? I realize the total number of samples dictates the lowest frequency that can be read, so that implies contiguous.

I have other questions, but I won't overload y'all for now. Also, assume that the two reasons I want to use FFT are:

1) To measure the frequency response of something, for example after passing white or pink noise through it.

2) To see the spectral content in a music file, as if I were to try to apply EQ in one audio file to match another as Harbal does. I'm not interested in doing such EQ matching! But I might want to know what frequencies are present in a music file and in what amounts.

--Ethan
I believe in Truth, Justice, and the Scientific Method

Reply #14
Thanks JJ. Dumbing it down a little bit: You start with N samples. The DCT produces N/2 frequency amplitude results and N/2 corresponding phase results. The phase information is typically discarded in spectral analysis applications.
I think you used the wrong word here. The DCT produces N real outputs from N real inputs.

Reply #15
While it may seem odd that you have N/2 components, they are complex, so you actually have N results, half real, half imaginary.

Thanks JJ. Dumbing it down a little bit: You start with N samples. The DCT produces N/2 frequency amplitude results and N/2 corresponding phase results. The phase information is typically discarded in spectral analysis applications.



You meant DFT ...

DCT is real to real.
-----
J. D. (jj) Johnston

Reply #16
Note:

I generally point at "Fourier Analysis" by Norman Morrison as a good primer.

Warning: Mathematics will be involved.
-----
J. D. (jj) Johnston

Reply #17
Okay, I have a few quick questions:

In Sound Forge I can select a portion of a Wave file for analysis, but I see no equivalent feature in the Rightmark analyzer. Is there a way to do that? If not, where in the file does Rightmark grab the specified number of samples to analyze?


You'd need a waveform screen to do that. I see none in RMAA. The assumption is that you edited the data before you ran the analysis.


Quote
Let's say I have a long wave file and I load it into Sound Forge and select a 5-second section. I understand that the FFT Size is the number of samples to analyze. But if my highlighted section is 5 seconds long, from where does Sound Forge take those samples?


If the selection is larger than the size of the FFT, then the program usually walks down the wave, repeating the FFT over and over again on each chunk, and averaging the spectral information.

Quote
Are they contiguous?


Within a chunk, yes. The chunks themselves sit end to end unless they overlap. Some programs let you specify the overlap.

Quote
Does it skip every third sample so the total number (say, 65,536) starts at the beginning of the highlighted area and ends at the end?


No. That would give you an FFT of a different wave. The dropped samples would have the effect of changing the frequency of the wave you were analyzing.


Quote
I have other questions, but I won't overload y'all for now. Also, assume that the two reasons I want to use FFT are:

1) To measure the frequency response of something, for example after passing white or pink noise through it.


Right. This only works easily if the test signal is in some sense flat. The good news is that white and pink noise are in some sense flat.  This also works with multitones, if you are willing to look at a FR curve that is a series of peaks that are supposed to be the same height.

Quote
2) To see the spectral content in a music file, as if I were to try to apply EQ in one audio file to match another as Harbal does. I'm not interested in doing such EQ matching! But I might want to know what frequencies are present in a music file and in what amounts.


IME EQ is something you always apply by ear, unless you have some very special technical purpose.



Reply #19
You'd need a waveform screen to do that. I see none in RMAA. The assumption is that you edited the data before you ran the analysis.

Excellent Arnie, thanks for that and all the other answers.

--Ethan
I believe in Truth, Justice, and the Scientific Method


Reply #21
I say we start with the Z-transform and express everything else as special cases of that!

Reply #22
I say we start with the Z-transform and express everything else as special cases of that!


Naah, we should stick to continuous infinite length systems, and functional algebra
-----
J. D. (jj) Johnston

Reply #23
In Sound Forge I can select a portion of a Wave file for analysis, but I see no equivalent feature in the Rightmark analyzer. Is there a way to do that? If not, where in the file does Rightmark grab the specified number of samples to analyze?

RMAA always analyzes the whole WAV file that you load and averages spectrum over all FFT blocks.
For more flexible analysis (including spectrograms) I recommend downloading a demo of iZotope RX. It doesn't require authorization to perform analysis.

FFT Size: I understand that the higher the number, the better the frequency resolution. So why is this even adjustable? Why not just use the highest resolution possible automatically?

Because a smaller FFT size lets you fit more FFT windows into the file, and averaging more spectra reduces the noise variance.


Zero Padding: This ranges from None through 8x. What does this do?

This is frequency interpolation of the resulting spectrum; it lets you better see what is happening "between the FFT bins".
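A sketch of what "between the bins" means, in plain Python. It assumes a 16-sample block holding a tone at 3.5 cycles per block (so the tone sits exactly between bins 3 and 4) and 4x zero padding; the `dft` helper is a naive illustration, not how an analyzer would compute it.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a block of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, PAD = 16, 4
tone = [math.sin(2 * math.pi * 3.5 * t / N) for t in range(N)]

plain = dft(tone)
padded = dft(tone + [0.0] * (N * (PAD - 1)))  # same samples followed by zeros

# Every PAD-th bin of the padded spectrum is one of the original bins.
same_points = all(abs(padded[PAD * k] - plain[k]) < 1e-9 for k in range(N))

# Tallest bin in the lower half of the padded spectrum, in units of the original bins.
peak_padded = max(range(N * PAD // 2), key=lambda k: abs(padded[k]))
print(same_points, peak_padded / PAD)
```

Zero padding adds no new information, since the original bins are unchanged, but the in-between points it fills in put the displayed peak at 3.5 instead of on a neighbouring whole bin.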


FFT Overlap: What is this for, and when would you use higher or lower values?

It sets how densely the FFT windows cover the WAV file, and how "evenly" the file is analyzed. More overlap gives better averaging of the spectrum (more is typically better for this parameter).


FFT Window: I recognize some of the names, but have no idea when or why one would select these choices.

This is a subtle point: windows control the shape of spectral peaks. "Stronger" windows somewhat widen peaks (which can be compensated by increasing FFT size), but suppress the "false" surrounding "skirts" around these peaks. Rectangular is "weakest", Kaiser with high "beta" is "strongest".


Kaiser Window beta: I have no idea what this does either.

It's the strength of the window's ability to suppress the "false" skirts around spectral peaks (at the expense of slight widening of peaks).
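The effect of beta on the skirts can be measured on a test tone. This is a sketch in plain Python under several assumptions: a 64-sample block, a tone at 8.5 cycles per block (a worst case for leakage), bins 20 to 28 taken as "the skirt", and the textbook Kaiser formula with the Bessel function summed as a series. It measures skirt suppression only, not the widening of the peak.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of each bin of a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind, by power series."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser(n, beta):
    """Kaiser window of length n; beta = 0 gives a rectangular window."""
    i0_beta = bessel_i0(beta)
    window = []
    for t in range(n):
        r = 2.0 * t / (n - 1) - 1.0  # runs from -1 at the first sample to +1 at the last
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / i0_beta)
    return window

N = 64
tone = [math.sin(2 * math.pi * 8.5 * t / N) for t in range(N)]

def skirt_level(beta):
    """Largest magnitude in bins 20..28, relative to the tallest bin."""
    mags = dft_magnitudes([s * w for s, w in zip(tone, kaiser(N, beta))])
    return max(mags[20:29]) / max(mags)

skirts = {beta: skirt_level(beta) for beta in (0.0, 4.0, 8.0)}
for beta, level in skirts.items():
    print("beta", beta, "skirt", round(20 * math.log10(level), 1), "dB below peak")
```

Raising beta pushes the skirt further down relative to the peak, which is the "strength" being described.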


What are slices and why is more than one needed to analyze a Wave file?

Averaging spectrum across many slices allows you to reduce noise variance and see white or pink noise more as lines rather than clouds of random data.
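The "lines rather than clouds" effect shows up even in a tiny experiment. This sketch uses plain Python with seeded pseudo-random white noise, 32 non-overlapping slices of 64 samples, and no window; all of those choices, and the helper names, are assumptions made for illustration.

```python
import cmath
import math
import random

def power_spectrum(x):
    """Power in each bin between DC and Nyquist (both excluded), via a naive DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
            for k in range(1, n // 2)]

def relative_spread(values):
    """Standard deviation divided by mean: how ragged the curve looks from bin to bin."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean

random.seed(1)
FFT_SIZE, SLICES = 64, 32
noise = [random.gauss(0.0, 1.0) for _ in range(FFT_SIZE * SLICES)]

spectra = [power_spectrum(noise[i * FFT_SIZE:(i + 1) * FFT_SIZE]) for i in range(SLICES)]
averaged = [sum(column) / SLICES for column in zip(*spectra)]

spread_single = relative_spread(spectra[0])
spread_averaged = relative_spread(averaged)
print(round(spread_single, 3), round(spread_averaged, 3))
```

White noise should have a flat spectrum; a single slice scatters widely around that flat line, while the average over many slices sits much closer to it.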

Reply #24
Thanks very much Alexey! Lots for me to digest.

--Ethan
I believe in Truth, Justice, and the Scientific Method