2Bdecided: does the NR-2 plugin in "mode 2" really use a plain windowed FFT? Does it take longer to process your signal compared with Cool Edit's function (which is known to use a windowed FFT)?
As Gabriel suggested: have you tried another window shape by any chance?
The MDCT is a lapped orthogonal transform used in signal processing. More traditional block transforms (such as the DCT or FFT) suffer from blocking artefacts arising from the independent processing of each block. Lapped transforms, however, have a 50% overlap between successive blocks, which results in much reduced artefacting.
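To make the lapped-transform point concrete, here is a minimal NumPy sketch (my own illustration, not from the thread; the hop size N = 8 and the sine window are assumptions) showing that a windowed MDCT/IMDCT with 50% overlap-add reconstructs the input exactly: the time-domain aliasing inside each block cancels against the neighbouring blocks, which is why lapped transforms avoid the blocking artefacts of independent DCT/FFT blocks.

```python
import numpy as np

def mdct_basis(N):
    # Basis c[k, n] = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5)),
    # k in [0, N), n in [0, 2N): N coefficients per 2N-sample block.
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

N = 8                                                    # hop size; block length is 2N
w = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))   # sine window (Princen-Bradley)
C = mdct_basis(N)

rng = np.random.default_rng(0)
x = rng.standard_normal(4 * N)

xp = np.concatenate([np.zeros(N), x, np.zeros(N)])       # pad half a block at each edge
y = np.zeros_like(xp)
for p in range(0, len(xp) - N, N):                       # 50% overlapped blocks
    block = w * xp[p:p + 2 * N]                          # analysis window
    X = C @ block                                        # MDCT: 2N samples -> N coeffs
    y[p:p + 2 * N] += w * ((2.0 / N) * (C.T @ X))        # windowed IMDCT, overlap-add

err = np.max(np.abs(y[N:N + len(x)] - x))
print(err)   # essentially zero (machine precision): aliasing cancels between blocks
```

Each block alone is *not* reconstructed correctly (its IMDCT contains a time-reversed alias); only the sum of overlapping blocks is exact, because the sine window satisfies w[n]^2 + w[n+N]^2 = 1.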
It's possible to implement a filter in the frequency domain using an FFT: instead of convolving with the impulse response in the time domain, you multiply by the frequency response in the frequency domain. It's relatively basic stuff, and is in most audio textbooks. The advantage is speed: for long impulse responses, this approach can be, say, ten times faster than direct convolution.

Things get more interesting when you change the "filter" over time. I put "filter" in quotes because a similar technique is used in various processes that are not simple filtering - I'm referring to anything where you FFT, change the coefficients, then inverse FFT. This could be an adaptive filter, a lossy audio codec, a noise reduction algorithm, etc.

An audible problem with this approach is that the FFT blocks become audible. You can window and overlap the blocks, and change their size, but this doesn't always make the problem go away. What you hear depends on what you're doing to the FFT coefficients, but usually, without care, you hear artefacts related to the FFT block size: either the temporal resolution of the signal or of the artefacts starts to match the FFT length, or frequency-domain artefacts appear which relate to the FFT frequency bins.

My question is: are there standard (or at least known) techniques for reducing the audibility of the FFT block structure in such a process?

I realise that in lossy audio coding, the more accurately the coefficients are stored, the less of a problem this is. When the coefficients are stored inaccurately, you get (generally) pre-echo or warbling. But the coefficients are always stored inaccurately - it's a lossy codec! So what's done "right" to prevent this problem? Is it just careful choice of coefficient rounding to match psychoacoustics, or are there other techniques?

A more pertinent example is noise reduction, because there the coefficients are intentionally changed.
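The frequency-domain filtering idea in the first paragraph is usually done block-by-block with overlap-add. A minimal sketch (the function name, block size and test signal are my own choices, not from the thread):

```python
import numpy as np

def fft_filter(x, h, block=256):
    # Overlap-add fast convolution: filter x with impulse response h
    # by multiplying in the frequency domain, one block at a time.
    nfft = 1
    while nfft < block + len(h) - 1:     # FFT long enough to avoid circular wrap
        nfft *= 2
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        Y = np.fft.rfft(seg, nfft) * H                   # filtering = multiplication
        y_seg = np.fft.irfft(Y, nfft)[:len(seg) + len(h) - 1]
        y[start:start + len(y_seg)] += y_seg             # overlap-add the block tails
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(32)
err = np.max(np.abs(fft_filter(x, h) - np.convolve(x, h)))
print(err)   # essentially zero: identical to direct convolution
```

With a *fixed* h this is exact, and no block structure is audible at all. The artefacts the post is asking about appear precisely when the coefficients (H, or the spectrum itself) are changed between blocks, so that neighbouring blocks no longer agree in the overlap region.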
In Cool Edit Pro, the NR feature often leaves "tinkly", "bubbly" or "bleepy" results. These can be changed by adjusting the window size, transition width, amount of noise to be removed, etc. - but generally, it's quite easy to get bad results! (You can get good results too - I'm not trying to make a point against CEP.) In the Sonic Foundry NR-2 DirectX plug-in, especially using "mode 2", it's impossible to get these kinds of artefacts. (You can still remove too much noise and make it sound bad, but you only hear that you've removed some signal along with the noise; you don't hear artefacts due to the actual working of the algorithm.) It's still using an adaptive FFT noise reduction algorithm, but it's doing something dramatically different which hides most of the problems. What?

The specific problem which I've never solved, and which is related (but different!), is when you don't even start with a time-domain signal: you generate a signal in the frequency domain (I was using spectrally shaped noise - specified amplitude, random phase) and IFFT it to get the time-domain signal. You have to overlap the results (because you get clicks at block boundaries otherwise - inaudible for white noise, useless for coloured noise!), but during the overlap some of the noise cancels, so the noise envelope is modulated at the FFT block length. When I was trying this six (!) years ago, I didn't even get the noise spectrum I hoped for, but I think that may have been due to time-domain aliasing from a wildly varying spectrum and a short FFT length. However, even fixing this, the noise will average during the windowing, giving amplitude modulation at the rate of the FFT length - how could this be avoided?

I'm sure there are other examples, but I know too little about them. From what I've read and guessed, there may be different techniques for reducing the FFT block audibility depending on exactly what you're doing.
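For reference, here is a sketch of the shaped-noise synthesis described above - target magnitude spectrum, random phase, IFFT, windowed 50% overlap-add. (The function name, Hann window, and the low-pass target are my own illustration choices, not from the post.)

```python
import numpy as np

def spectral_noise(mag, n_blocks, seed=0):
    # Synthesize noise with target magnitude spectrum mag (length nfft//2 + 1)
    # by giving each block random phase, IFFTing, and overlap-adding at 50%.
    rng = np.random.default_rng(seed)
    nfft = 2 * (len(mag) - 1)
    hop = nfft // 2
    w = np.hanning(nfft)
    out = np.zeros(hop * n_blocks + nfft)
    for b in range(n_blocks):
        phase = rng.uniform(0.0, 2.0 * np.pi, len(mag))
        spec = mag * np.exp(1j * phase)
        spec[0] = mag[0]                      # keep DC and Nyquist real
        spec[-1] = mag[-1]
        out[b * hop:b * hop + nfft] += w * np.fft.irfft(spec, nfft)
    return out

mag = np.zeros(129)
mag[:32] = 1.0                                # low-pass target, nfft = 256
y = spectral_noise(mag, n_blocks=200)
S = np.abs(np.fft.rfft(y)) ** 2
cut = len(S) // 4                             # bin of the passband edge
print(S[:cut].sum() / S.sum())                # close to 1: energy stays in the low band
```

This reproduces the envelope problem exactly as described: 50%-overlapped Hann windows sum to a constant *amplitude*, but because the blocks have independent random phases it is their *powers* that add, and w^2 plus its half-shifted copy is not constant - hence amplitude modulation at the block rate.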
I raise the last example as a specific task where I've hit the problem myself (but it was a long time ago!) - however, I'm most interested in the previous two examples, and in the problem in general.

Does anyone have any techniques or suggestions that they can share, or any papers they can point me to?

Cheers,
David.