Topic: RMS, SNR PCM Data

RMS, SNR PCM Data

Hi guys,

I am new here, so I apologise if this is the wrong place to be asking these questions.

First, a little background. I'm currently in my last year of a software engineering undergraduate degree and am working on a project which 'intelligently' (in a basic way) filters audio. I am using 16-bit mono PCM data to keep things simple. I have a speech sample, pure tone samples of varying frequencies and a white noise sample. These are then mixed together in different ways, e.g. speech + tone, speech + white noise, speech + multiple tones, and passed to the application.

So far I have got the application to discern whether the sample contains speech + tones or white noise (I guess wideband noise of any type) and other components. I have done this using the average magnitude of an FFT output, with the following code (C#):

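        // fftOutput holds interleaved complex bins: [re0, im0, re1, im1, ...].
        private void avgMagnitude()
        {
            double sum = 0.0;
            int bins = 0;
            // Start at i = 2 to skip the DC bin (indices 0 and 1).
            for (int i = 2; i + 1 < fftOutput.Length; i += 2)
            {
                double re = fftOutput[i];
                double im = fftOutput[i + 1];
                sum += Math.Sqrt(re * re + im * im); // magnitude of this bin
                bins++;
            }
            avgMag = bins > 0 ? sum / bins : 0.0; // average over bins, not array length
        }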

Does this seem correct if the output of the FFT is a series of complex numbers stored as real, imaginary, real, imaginary and so on? And can that kind of judgement be made using this property? It seems to work so far, but my supervisor has told me that I need more criteria to work with. One example I found was the peak/RMS ratio of the audio, but I don't know how to work that out.

I'm currently using some filter implementations from Math.NET to remove all frequencies above 7 kHz (as I'm looking to keep the speech), and the FFT implementation to find any peak frequencies after this, in case there is a tone inside the human speech frequency range. This seems to work OK.
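Finding a tone after the low-pass step comes down to locating the strongest FFT bin and converting its index to Hz (bin k sits at k · fs / N). A minimal C# sketch, assuming the library returns the full N-point complex spectrum interleaved as [re0, im0, re1, im1, ...] (so the array holds 2N doubles) and that the input was real; `PeakFinder` and `PeakFrequency` are hypothetical names, not part of Math.NET. If the library returns only the N/2 + 1 non-redundant bins instead, the `fftSize` line would need adjusting.

```csharp
using System;

public static class PeakFinder
{
    // Frequency (Hz) of the strongest bin in an interleaved FFT output of
    // length 2 * fftSize. For a real input only bins 1 .. fftSize/2 are
    // searched: bin 0 is DC and the upper half mirrors the lower half.
    public static double PeakFrequency(double[] fftOutput, double sampleRate)
    {
        int fftSize = fftOutput.Length / 2;
        int bestBin = 1;
        double bestPower = -1.0;
        for (int k = 1; k <= fftSize / 2; k++)
        {
            double re = fftOutput[2 * k];
            double im = fftOutput[2 * k + 1];
            double power = re * re + im * im; // no sqrt needed just to compare
            if (power > bestPower)
            {
                bestPower = power;
                bestBin = k;
            }
        }
        // Bin k is centred on k * sampleRate / fftSize Hz.
        return bestBin * sampleRate / fftSize;
    }
}
```

The resolution of the returned frequency is one bin width, so a shorter FFT gives a coarser estimate.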

I do not really have any idea what to do about the white noise, or whether it is even possible to pull speech out of it. At the moment I am using a de-noiser, but it doesn't give great results.

Anyway, finally: could someone explain whether it's possible to calculate RMS or signal-to-noise ratio from PCM data, please?
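Because the test mixes here are made synthetically, the SNR can be computed directly from the separate components before they are mixed: SNR(dB) = 10 · log10(P_signal / P_noise), where P is the mean squared sample value. A minimal C# sketch, assuming 16-bit mono PCM held in `short[]` arrays; `SnrCalc` and its methods are hypothetical names. For a real recording, where the clean signal isn't available, the noise power would have to be estimated instead.

```csharp
using System;

public static class SnrCalc
{
    // Mean power of 16-bit PCM samples, scaled so full scale is about 1.0.
    public static double MeanPower(short[] pcm)
    {
        double sum = 0.0;
        foreach (short s in pcm)
        {
            double x = s / 32768.0;
            sum += x * x;
        }
        return pcm.Length > 0 ? sum / pcm.Length : 0.0;
    }

    // SNR in dB of signal + noise, computed from the separate components
    // (only possible because the mix is generated, not recorded).
    public static double SnrDb(short[] signal, short[] noise)
    {
        return 10.0 * Math.Log10(MeanPower(signal) / MeanPower(noise));
    }
}
```

Equivalently, in terms of RMS values, SNR(dB) = 20 · log10(RMS_signal / RMS_noise).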

Sorry for such a large first post :S

Sean.


RMS, SNR PCM Data

Reply #1
I'm kinda surprised your advisor even let you do this project, if he/she is actually holding you to this high a signal processing standard. You probably should have taken a DSP course before tackling this, and a lot of these answers may be available in undergraduate signal processing and/or DSP texts.

Yes, you appear to be computing magnitudes correctly, although I haven't looked at your FFT library's output format specifically (but IIRC interleaved real/imag is common). But you should almost certainly be computing powers instead of magnitudes, as powers are typically more physically meaningful. Moreover, powers are required to compute RMS figures.
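Switching the average from magnitudes to powers only means dropping the square root on each bin's re² + im². A C# sketch under the same interleaved-output assumption as the original code; `SpectrumStats.AveragePower` is a hypothetical name.

```csharp
using System;

public static class SpectrumStats
{
    // Average power per bin of an interleaved [re, im, re, im, ...] FFT output.
    // Same loop as the magnitude version, but re^2 + im^2 is NOT square-rooted.
    // Bin 0 (DC) is skipped, as in the original code.
    public static double AveragePower(double[] fftOutput)
    {
        double sum = 0.0;
        int bins = 0;
        for (int i = 2; i + 1 < fftOutput.Length; i += 2)
        {
            double re = fftOutput[i];
            double im = fftOutput[i + 1];
            sum += re * re + im * im;
            bins++;
        }
        return bins > 0 ? sum / bins : 0.0;
    }
}
```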

RMS computation is such an elementary process that I daresay I'd be doing your homework if I explained it to you any further. Look it up. Then look up "crest factor" for a better description of what "peak/rms" means.

Don't forget windowing.

Regarding detecting voices in noise, if you can successfully convince your advisor to constrain vocal inputs to be "real speech" (i.e., not percussive), you might be able to get some mileage out of cepstrum analysis. Or you might be able to do a statistical analysis on the frequency bins and assert voice detection if a sufficient number of bins are widely varying from some nominal value. I dunno, I'm not at all an expert on the subject, but there should be about 5 billion refs for this exact problem in your local engineering library. Hit up old ICASSP volumes, in particular.

RMS, SNR PCM Data

Reply #2
Hi Axon,

Thank you for replying to me. You're right: in some ways this shouldn't have been taken on from a software engineering perspective, but being a few weeks away from the deadline, I have no other choice.

Is the difference between power and magnitude just that I don't take the square root of the sum?

Also, I have seen that an RMS value would be worked out by taking the signal values in the time domain, squaring each of them (to remove negative values), adding these as I go along and then taking the mean at the end. Is this correct?

Is windowing essential? At the moment I'm taking 65536 samples for the FFT.

In terms of inputs: I have chosen the samples myself. I have a human speech sample, which is used in all cases. Tones and/or white noise are added to this. These samples are then passed to the program, and the end result is a best-effort approximation of the original human speech sample.

Lastly, after looking at the 'wiki' article on the crest factor, I see it has a table giving values for some normalized waveforms. I'm not sure how to interpret these: does this mean that if I generate a sine wave, regardless of its frequency, it will have an RMS value of 0.707 and a crest factor of 1.414? ref: http://en.wikipedia.org/wiki/Crest_factor

Thanks again for your time and help.



RMS, SNR PCM Data

Reply #3
Quote
Thank you for replying to me. You're right: in some ways this shouldn't have been taken on from a software engineering perspective, but being a few weeks away from the deadline, I have no other choice.
Ahh, the joys of the last-minute capstone project. I suppose that if you don't black out in a caffeine-overdose-and-sleep-deprivation-induced haze the night before the project is due, like I did, you'll be out ahead.

Quote
Is the difference between power and magnitude just that I don't take the square root of the sum?
That's the operational difference between rms and power magnitudes, yes.

Quote
Also, I have seen that an RMS value would be worked out by taking the signal values in the time domain, squaring each of them (to remove negative values), adding these as I go along and then taking the mean at the end. Is this correct?
No. RMS stands for "root mean square". You square the values, then take their arithmetic mean, and then take the square root of that. The point of the squaring is not only to make negative values positive but also to make use of all kinds of other mathematical tools, most importantly Parseval's Theorem, but perhaps more subtly, statistical variance.
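The square, then mean, then root order looks like this in C#. A sketch assuming 16-bit signed PCM in a `short[]`; dividing by 32768 so the result is relative to full scale is a convenience assumption, not a requirement, and the `Rms` class name is hypothetical.

```csharp
using System;

public static class Rms
{
    // Root mean square of 16-bit PCM samples, relative to full scale
    // (a full-scale square wave gives about 1.0).
    public static double Compute(short[] pcm)
    {
        if (pcm.Length == 0) return 0.0;
        double sumSquares = 0.0;
        foreach (short s in pcm)
        {
            double x = s / 32768.0;            // scale to roughly [-1, 1)
            sumSquares += x * x;               // square
        }
        double mean = sumSquares / pcm.Length; // mean
        return Math.Sqrt(mean);                // root
    }

    // The same value in dBFS (0 dBFS = full-scale square wave).
    public static double ComputeDbfs(short[] pcm) => 20.0 * Math.Log10(Compute(pcm));
}
```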

Quote
Is windowing essential? At the moment I'm taking 65536 samples for the FFT.
You have no business whatsoever using an FFT that big, unless your audio files fit exactly into that (which is kinda cheesy). I can't quite see a reason why it should be any larger than 1024-4096 samples.

The answer depends largely on your filtering architecture. If the actual filtering is going to happen in the time domain, then you're only using the FFT for analysis purposes, and given the dynamic ranges we're dealing with here, you can probably get away with a rectangular window. If you're doing the filtering in the frequency domain, then you'll almost certainly need a window, and overlapping windows at that. (Overlapping windows are useful not only for reconstruction but also for implementing Welch's Method for power spectrum estimation.)

Quote
Lastly, after looking at the 'wiki' article on the crest factor, I see it has a table giving values for some normalized waveforms. I'm not sure how to interpret these: does this mean that if I generate a sine wave, regardless of its frequency, it will have an RMS value of 0.707 and a crest factor of 1.414? ref: http://en.wikipedia.org/wiki/Crest_factor
Yes, in the context of continuous-time, periodic signals. In general, the RMS magnitude of a sinusoid is independent of its frequency; Parseval's Theorem wouldn't work if it weren't. Obviously, computing those numbers in a "real" digital environment (with a windowed, sampled signal) won't give you precisely those values, and a slight, trivial frequency dependence will be observed simply due to windowing vagaries.
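A quick numerical check of those table values: the sketch below (hypothetical names) generates unit-amplitude sine waves and computes RMS and crest factor (peak / RMS). With a whole number of cycles in the buffer, the RMS comes out at about 0.7071 for each frequency tried, and the crest factor at about 1.414, slightly off only because the sampled peak can miss the true peak of 1.0.

```csharp
using System;

public static class CrestFactor
{
    // `count` samples of a unit-amplitude sine at `freq` Hz.
    public static double[] Sine(double freq, double sampleRate, int count)
    {
        var x = new double[count];
        for (int n = 0; n < count; n++)
            x[n] = Math.Sin(2.0 * Math.PI * freq * n / sampleRate);
        return x;
    }

    public static double Rms(double[] x)
    {
        double sum = 0.0;
        foreach (double v in x) sum += v * v;
        return Math.Sqrt(sum / x.Length);
    }

    // Crest factor = largest absolute sample value / RMS.
    public static double Compute(double[] x)
    {
        double peak = 0.0;
        foreach (double v in x) peak = Math.Max(peak, Math.Abs(v));
        return peak / Rms(x);
    }
}
```

For example, `CrestFactor.Compute(CrestFactor.Sine(440.0, 44100.0, 44100))` and the same call at 1000 Hz both land within about 0.001 of √2.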

RMS, SNR PCM Data

Reply #4
... I suppose that, instead of dismissing a 64k FFT length out of hand, I should first make sure I know why you're using it, instead of presuming the reasons.

RMS, SNR PCM Data

Reply #5
Yeah, the FFT is purely for analysis. It's just a feature that lets the user look at the data in the frequency domain, and it is used during the 'intelligent' filtering of the data to find peaks and notch them out. Unfortunately, I can't find any information on whether the data is being filtered in the time or frequency domain. I am using a Math.NET library called Neodym and there seems to be very little documentation for it :S I chose the 64k FFT as I figured it would give a high enough resolution, but now that you have suggested otherwise, I'm going to put it down to lack of knowledge/experience and reduce the FFT length; it will make the GUI a little easier too.

I'm currently doing up a test harness for the project (even if it isn't finished yet), so I haven't had much time to write any new code. Thanks for the clarification on the RMS/crest factor info. Does it seem feasible to use the crest factor of an audio sample in conjunction with its power (instead of the magnitude, as I had originally been doing) to discern, albeit very basically, whether an audio sample contains just speech and tones vs. speech and wideband noise? I'm thinking of dropping filtering of wideband noise, as I have had no luck with it at all so far. I have read that a matched filter would do the job, but it requires the original sample without the noise added. I have that, but relying on it wouldn't be realistic and wouldn't work in a real-time situation, which makes it pointless for my project.

Thanks again.

Edit: I should clarify that the reason I am asking about using the crest factor is that my supervisor was happy enough with me using the magnitude, but told me that I needed more criteria to work with, or more accurately, at least one more :S

RMS, SNR PCM Data

Reply #6
It might not be a bad idea to politely ask the department if they can front some money for you to buy a "real" speech processing/noise reduction library, depending on the scope of what you're actually supposed to accomplish yourself on this. Some grad student/prof might already have something if you look (and/or ask) hard enough.

I don't know the state of open source C# signal processing APIs, so the comments I'm going to make are fairly general and technical; you're probably still going to need to do a lot of legwork to implement them. But it shouldn't be *that* hard.

The easiest way to do broadband noise reduction is with spectral subtraction. This book is an extremely authoritative reference on the subject (see chapter 6). Really technical though.
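The core of basic magnitude spectral subtraction is small: per bin, subtract a noise-magnitude estimate, clamp at a floor, and keep the original phase. A C# sketch, assuming interleaved [re, im, ...] FFT blocks and a noise estimate obtained elsewhere (e.g. from noise-only blocks). The over-subtraction factor `alpha` and the spectral floor `floor` are common tuning knobs in the spectral subtraction literature; their names here, and the `SpectralSubtraction` class, are illustrative assumptions. Windowing, overlap and resynthesis are left out.

```csharp
using System;

public static class SpectralSubtraction
{
    // In-place magnitude spectral subtraction on one interleaved FFT block.
    // noiseMag[k]: estimated noise magnitude in bin k.
    // alpha >= 1 over-subtracts; floor keeps a small fraction of the original
    // magnitude, which limits "musical noise" from bins driven to zero.
    public static void Apply(double[] fftOutput, double[] noiseMag, double alpha, double floor)
    {
        int bins = fftOutput.Length / 2;
        for (int k = 0; k < bins; k++)
        {
            double re = fftOutput[2 * k];
            double im = fftOutput[2 * k + 1];
            double mag = Math.Sqrt(re * re + im * im);
            if (mag == 0.0) continue;
            double cleaned = Math.Max(mag - alpha * noiseMag[k], floor * mag);
            double gain = cleaned / mag; // scaling re and im equally keeps the phase
            fftOutput[2 * k] = re * gain;
            fftOutput[2 * k + 1] = im * gain;
        }
    }
}
```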

An idea that comes to mind: instead of doing an RMS or crest factor analysis on the signal as a whole (on a block-by-block basis), do it for each FFT frequency bin. This inherently works in real time and adapts to any background noise level, and there are like 5 billion ways you can tune it if performance is inadequate. It does, however, require that actual content (speech) is present for a relatively small fraction of the time compared to "silent" passages containing only broadband or tonal noise. It's also ridiculously CPU intensive compared to what you'd probably find commercially.
  • Crank down the FFT length to N = 2048 or so, use a Hann ("Hanning") window, and use 50% overlap.
  • For each FFT band, compute the power, then compute a band-specific histogram that's fairly large, say >= 1000 points. The histogram size needs to be large enough not to be skewed by a long stretch of talking. You *might* also want to compute the histogram on a decibel basis instead of a linear basis.
  • Use the histograms to compute cumulative distribution functions (CDFs).
  • To perform broadband noise filtering, use the CDFs to estimate an average noise spectrum magnitude, optionally scaling it with a configurable parameter, then subtract that estimated noise spectrum from the FFT magnitude spectrum.
  • If you do broadband noise filtering as above, you're probably going to get tonal noise filtering for free.
  • Reconstruct the filtered waveform with the techniques outlined in the book above, i.e., inverse FFT, cutting out the middle 50% of the existing block, inverting the window function, and adding that to the output.
  • To detect speech or tones or whatever (to use with your existing filtering scheme): for each band of an FFT block, compute its power and find the percentile at which that power lies on the band's CDF. If X or more bands are at or above the Yth percentile (where Y > 50), you've detected something.
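The per-band statistics in the list above could be sketched roughly as follows. Assumptions to flag: instead of a binned histogram, this keeps a ring buffer of the last `capacity` power values per band and evaluates the empirical CDF by counting (simpler, but O(capacity) per lookup); `BandTracker`, `Quantile` and the other names are hypothetical; percentiles are expressed as fractions (0.9 = 90th); and using a low quantile (the median or below) as the noise estimate relies on speech being present only a minority of the time, as the reply requires.

```csharp
using System;
using System.Linq;

// Tracks recent per-band powers and evaluates each band's empirical CDF.
public sealed class BandTracker
{
    private readonly double[][] history; // history[band][slot], a ring buffer per band
    private readonly int capacity;
    private int count; // slots filled so far (<= capacity)
    private int next;  // next slot to overwrite

    public BandTracker(int bands, int capacity)
    {
        this.capacity = capacity;
        history = new double[bands][];
        for (int b = 0; b < bands; b++) history[b] = new double[capacity];
    }

    // Record one FFT block's per-band powers (linear or dB).
    public void Add(double[] bandPowers)
    {
        for (int b = 0; b < history.Length; b++) history[b][next] = bandPowers[b];
        next = (next + 1) % capacity;
        count = Math.Min(count + 1, capacity);
    }

    // Empirical CDF of band b at `power`: fraction of stored values <= power.
    public double Percentile(int b, double power)
    {
        if (count == 0) return 0.0;
        int atOrBelow = 0;
        for (int i = 0; i < count; i++)
            if (history[b][i] <= power) atOrBelow++;
        return (double)atOrBelow / count;
    }

    // Value of band b at quantile q in [0, 1]; a low q gives a noise-floor estimate.
    public double Quantile(int b, double q)
    {
        if (count == 0) throw new InvalidOperationException("no data yet");
        double[] sorted = history[b].Take(count).OrderBy(v => v).ToArray();
        int idx = (int)Math.Round(q * (count - 1));
        return sorted[idx];
    }

    // Detection rule from the list: at least minBands bands at or above percentile y.
    public bool Detect(double[] bandPowers, double y, int minBands)
    {
        int hits = 0;
        for (int b = 0; b < history.Length; b++)
            if (Percentile(b, bandPowers[b]) >= y) hits++;
        return hits >= minBands;
    }
}
```

One side effect of the counting approach: because the dB mapping is monotonic, feeding dB values instead of linear powers gives the same percentiles, so the linear-vs-decibel choice only matters for a binned histogram.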

RMS, SNR PCM Data

Reply #7
Hi again, and thank you so much for all of this information; I hope I have enough time to whip up an implementation. I have brought my FFT length down to 2048. However, does this mean that I can only represent frequencies up to around 10 kHz? That seems to be what's happening when I output the data. My drawing code may be wrong, but I'm pretty sure it's correct.

With regard to the Hann ("Hanning") window, a quick Google search suggests the algorithm is:

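// Hann window applied in place to one block (here input.Length == 2048).
// Dividing by (n - 1) is the symmetric form; dividing by n is the "periodic" form.
int n = input.Length;
for (int i = 0; i < n; i++) {
    double multiplier = 0.5 * (1 - Math.Cos(2 * Math.PI * i / (n - 1)));
    input[i] = multiplier * input[i];
}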

Do I successively take 2048 samples (the total is always 441000 samples: 10 s at a 44.1 kHz sample rate) and then average all of these, i.e.:
(firstSet[1] + secondSet[1] + thirdSet[1]) / 3 = averageVal[1]; and so on?
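Averaging the same bin across successive blocks is essentially Welch's method, mentioned in Reply #3, provided the blocks are windowed, overlapped, and the values averaged are powers rather than raw samples or magnitudes. A self-contained C# sketch: frames of length n with 50% overlap (hop n/2), a periodic Hann window, power per bin, averaged bin-by-bin over all frames. A naive O(n²) DFT stands in for a real FFT purely to keep the example self-contained; in practice you'd call your library's FFT. `Welch` and its methods are hypothetical names.

```csharp
using System;

public static class Welch
{
    // Averaged power spectrum: frames of length n (even) with 50% overlap,
    // periodic Hann window, power of bins 0..n/2 averaged over all frames.
    public static double[] AveragedPowerSpectrum(double[] x, int n)
    {
        int hop = n / 2;
        int bins = n / 2 + 1;
        var window = new double[n];
        for (int i = 0; i < n; i++)
            window[i] = 0.5 * (1 - Math.Cos(2 * Math.PI * i / n)); // periodic Hann

        var avg = new double[bins];
        int frames = 0;
        for (int start = 0; start + n <= x.Length; start += hop)
        {
            for (int k = 0; k < bins; k++)
            {
                // Naive DFT of bin k for this windowed frame.
                double re = 0.0, im = 0.0;
                for (int i = 0; i < n; i++)
                {
                    double v = x[start + i] * window[i];
                    double angle = -2 * Math.PI * k * i / n;
                    re += v * Math.Cos(angle);
                    im += v * Math.Sin(angle);
                }
                avg[k] += re * re + im * im; // accumulate power, not magnitude
            }
            frames++;
        }
        for (int k = 0; k < bins; k++) avg[k] /= Math.Max(frames, 1);
        return avg;
    }

    // Number of full frames for a signal of `length` samples at 50% overlap.
    public static int FrameCount(int length, int n) =>
        length < n ? 0 : (length - n) / (n / 2) + 1;
}
```

With 441000 samples, n = 2048 and a 1024-sample hop, `FrameCount` gives 429 frames; the final partial frame is simply dropped in this sketch.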

RMS, SNR PCM Data

Reply #8
The maximum frequency is always half the sampling frequency (the Nyquist frequency), regardless of FFT length. The FFT length only changes the frequency resolution.
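The two relationships in that answer, as a tiny C# sketch (hypothetical names): the highest representable frequency is fs / 2, and the spacing between bins is fs / N.

```csharp
using System;

public static class FftResolution
{
    // Highest representable frequency: half the sample rate (Nyquist).
    public static double Nyquist(double sampleRate) => sampleRate / 2.0;

    // Spacing between adjacent FFT bins.
    public static double BinWidth(double sampleRate, int fftLength) => sampleRate / fftLength;

    // Centre frequency of bin k.
    public static double BinFrequency(int k, double sampleRate, int fftLength) =>
        k * sampleRate / fftLength;
}
```

For fs = 44100 Hz and N = 2048 this gives a Nyquist limit of 22050 Hz (bin 1024) and a bin width of about 21.53 Hz, versus about 0.67 Hz for N = 65536. If a 2048-point plot appears to stop near 10 to 11 kHz, one possible (unconfirmed) cause is plotting the first 1024 entries of the interleaved [re, im, ...] array as if they were 1024 bins: that is really only 512 bins, and 512 × 21.53 Hz ≈ 11 kHz.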