Eyecandy / Graphical signal analysis / Psychoacoustic model

I've got something for you to play with. It's a program that creates a nice graphical analysis of a 44.1 kHz stereo WAVE file.

Download the file here:

http://home.san.rr.com/sandiegodiaries/audio/SignalTest.exe

and a required support DLL here:
http://home.san.rr.com/sandiegodiaries/audio/AntFFT.dll

Copy the EXE and the DLL to a directory (e.g. C:\TEMP), change to it in a DOS box:

C:
CD C:\TEMP

and run the program like this:

SignalTest.exe Some44kHzStereoWavefile.wav

I used this tool to create a qualitative assessment of various aspects in audio coding, as well as testing a simple psychoacoustic model. It is some research/experimentation I did before starting the actual work on an audio codec.

What it will display (use keys to toggle):

F1 = Linear magnitude of signal
F2 = Logarithmic magnitude of signal
Shift-F2 = Psychoacoustic model / Threshold of hearing
F3 = Phase
F4 = Real component of FFT (linear)
F5 = Imaginary component of FFT (linear)
F6 = Spectrogram (yellow = centered, red = left channel, green = right channel)
Shift-F6 = Spectrogram multiplied with tonality estimation
F7 = Magnitude Delta (between two FFTs)
F8 = Phase Delta (between two FFTs)
F10 = Temporarily halt the output
Alt-Enter = Toggle fullscreen mode
ESC = Close the GUI
Ctrl-C in the DOS box kills the task.
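
For anyone wondering how the F1 to F5 views relate to each other, here is a minimal Python sketch. It is not the code from SignalTest.exe; the names `naive_dft` and `describe_bin` and the -120 dB floor are made up for illustration. Each view is just a different way of looking at the same complex FFT coefficient.

```python
import cmath
import math

def naive_dft(samples):
    """Slow textbook DFT, good enough for a tiny demo."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def describe_bin(coeff, floor_db=-120.0):
    """Return the per-bin quantities that the F1 to F5 views plot."""
    magnitude = abs(coeff)
    if magnitude > 0.0:
        level_db = max(20.0 * math.log10(magnitude), floor_db)
    else:
        level_db = floor_db  # hypothetical display floor, avoids log(0)
    return {
        "linear_magnitude": magnitude,   # F1
        "log_magnitude_db": level_db,    # F2
        "phase_rad": cmath.phase(coeff), # F3
        "real": coeff.real,              # F4
        "imag": coeff.imag,              # F5
    }

# A cosine with exactly 4 cycles in 32 samples lands in bin 4.
samples = [math.cos(2.0 * math.pi * 4 * t / 32) for t in range(32)]
spectrum = naive_dft(samples)
views = describe_bin(spectrum[4])
print(views)
```

The logarithmic view compresses the range, which is why quiet components that vanish in the linear plot remain visible there.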

The most impressive modes are surely Shift-F2 and F6.

The default is a 1024-sample FFT window, a sinusoidal window function and a 512-sample advance per transform. This combination allows perfect reconstruction of the signal, provided the coefficients are not quantized.

The window function is applied twice: once before the FFT and once after the inverse FFT. In the overlap region one window contributes sin^2 and its neighbour contributes cos^2, so the overlapping windows add up to the constant sin^2 + cos^2 = 1, which is why you can achieve perfect reconstruction.
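
The reconstruction argument can be checked numerically. The following is a rough Python sketch, not the tool's code. It assumes the sinusoidal window is defined as sin(pi * (n + 0.5) / N), which is my guess at the definition, and it leaves out the FFT and inverse FFT because they cancel each other when nothing is quantized.

```python
import math

N = 1024       # window / transform length (the tool's default)
HOP = N // 2   # 512-sample advance (the tool's default)

# Assumed definition of the "sinusoidal" window.
window = [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def overlap_add(signal):
    """Window every frame twice and overlap-add with a 50% hop."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, HOP):
        # Analysis window, applied before the (omitted) FFT.
        frame = [signal[start + n] * window[n] for n in range(N)]
        # Synthesis window, applied after the (omitted) inverse FFT.
        for n in range(N):
            out[start + n] += frame[n] * window[n]
    return out

signal = [math.sin(0.01 * t) + 0.5 * math.sin(0.37 * t) for t in range(8 * N)]
rebuilt = overlap_add(signal)

# The first and last half frame are covered by only one window, so skip them.
max_error = max(abs(rebuilt[t] - signal[t]) for t in range(HOP, len(signal) - HOP))
print("largest reconstruction error:", max_error)
```

The error stays at floating-point rounding level because every interior sample is weighted by sin^2 from one frame and cos^2 from the next.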

Notes regarding the psychoacoustic model:

a) it is slow (completely and utterly not optimized for speed)
b) it is based on work found here: http://www-ccrma.stanford.edu/~bosse/proj/proj.html  which was, however, buggy (my graphical display helped me to identify the problem).
Thanks to Bosse Lincoln for publishing his work.

The graphic visualization uses the Bark frequency scale, which is nonlinear. This is why the lines seem to be squeezed toward the right side of the plot. The Bark scale matches the characteristics of the human ear better than a linear frequency scale, which is the reason for using it in psychoacoustics.
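
To see the squeezing in numbers, here is a small Python sketch. It uses the common Zwicker/Terhardt approximation for the Hz-to-Bark conversion; whether SignalTest.exe uses this exact formula is an assumption on my part, and the function names are made up.

```python
import math

SAMPLE_RATE = 44100
FFT_SIZE = 1024

def hz_to_bark(freq_hz):
    """Zwicker/Terhardt approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def bin_to_bark(k):
    """Bark position of FFT bin k for the default transform size."""
    return hz_to_bark(k * SAMPLE_RATE / FFT_SIZE)

# Distance between two neighbouring bins, low versus high in the spectrum.
low_step = bin_to_bark(2) - bin_to_bark(1)
high_step = bin_to_bark(501) - bin_to_bark(500)
print("Bark per bin near 43 Hz:  ", low_step)
print("Bark per bin near 21.5 kHz:", high_step)
```

Neighbouring bins are always about 43 Hz apart, but on the Bark axis that distance shrinks a lot at high frequencies, so the bins bunch up at the right edge.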

The solid yellow line at the bottom shows the absolute threshold of hearing. The dip is at around 3 kHz and marks the frequency range where the human ear is most sensitive, due to resonance effects in the ear canal.
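
A curve of this shape can be reproduced with Terhardt's well-known approximation of the threshold in quiet. This is only a sketch; I am assuming this formula for illustration, the model in the program may use something different.

```python
import math

def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt's formula)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

# Scan the audible range in 100 Hz steps and find the lowest threshold.
freqs = range(100, 20000, 100)
most_sensitive_hz = min(freqs, key=absolute_threshold_db)
print("most sensitive around", most_sensitive_hz, "Hz")
```

The threshold rises steeply toward both ends of the spectrum, so very low and very high components need far more level before they become audible.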

The moving yellow spots indicate the power levels of the audio signal. Total dynamic range of this display is about 120 dB.

The red and green lines are the calculated (estimated) threshold of hearing in the presence of the signal, i.e. the masking threshold. The signal-to-mask ratio (the distance between the yellow dots and the red/green lines) is what controls quantization in an actual audio codec. Yes, I verified that it doesn't sound too bad. This psychoacoustic model also models some sort of postmasking.
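
As a rough illustration of how a signal-to-mask ratio can steer quantization, here is a Python sketch. It is not taken from my codec; the function names are hypothetical and it relies only on the rule of thumb that each bit of resolution lowers quantization noise by about 6 dB.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB of noise reduction per bit

def signal_to_mask_ratio(signal_db, mask_db):
    """Distance between signal level and masking threshold in dB."""
    return signal_db - mask_db

def bits_needed(smr_db):
    """Smallest bit count that keeps quantization noise below the mask."""
    if smr_db <= 0.0:
        return 0  # the signal is masked entirely, nothing needs to be coded
    return math.ceil(smr_db / DB_PER_BIT)

for signal_db, mask_db in [(40.0, 45.0), (60.0, 48.0), (80.0, 50.0)]:
    smr = signal_to_mask_ratio(signal_db, mask_db)
    print(f"SMR {smr:6.1f} dB -> {bits_needed(smr)} bits")
```

Bands where the signal sits below the mask get no bits at all, which is where most of the savings in a perceptual codec come from.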

I recommend you try the SQAM test samples, which can be downloaded from this page: http://sound.media.mit.edu/mpeg4/audio/sqam/

Options to play with:

-f  Disables the psychoacoustic model (no calculation of the threshold of hearing). MUCH FASTER! Great for watching the spectrogram pass by.

-a <any number>  Advance per transform in samples. Defaults to 512. Lowering this value makes the sliding window move more slowly.

-N <any number>  FFT transform size. Defaults to 1024. Don't mess with this. The graphical output is hardcoded to this transform size. You have been warned.

-W  Switches the window function: rectangular, hamming, hanning, triangular, welch or sinusoidal.
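
If you want to compare the window shapes offline, the following Python sketch uses textbook definitions of the six windows. These are assumptions; AntFFT.dll may define them slightly differently. For each window it checks how the squared window sums with a copy of itself shifted by half the length, which is what matters when the window is applied both before the FFT and after the inverse FFT with a 512-sample advance.

```python
import math

N = 1024
HALF = N // 2
MID = (N - 1) / 2.0

# Textbook window definitions (assumed, not read from AntFFT.dll).
WINDOWS = {
    "rectangular": lambda n: 1.0,
    "hamming": lambda n: 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)),
    "hanning": lambda n: 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (N - 1)),
    "triangular": lambda n: 1.0 - abs((n - MID) / MID),
    "welch": lambda n: 1.0 - ((n - MID) / MID) ** 2,
    "sinusoidal": lambda n: math.sin(math.pi * (n + 0.5) / N),
}

def squared_overlap_range(name):
    """Min and max of w[n]^2 + w[n + N/2]^2 over the overlap region."""
    w = [WINDOWS[name](n) for n in range(N)]
    sums = [w[n] ** 2 + w[n + HALF] ** 2 for n in range(HALF)]
    return min(sums), max(sums)

for name in WINDOWS:
    low, high = squared_overlap_range(name)
    print(f"{name:12s} min={low:.4f} max={high:.4f}")
```

With these definitions only the sinusoidal window sums to exactly 1. The rectangular window sums to a constant 2 (a plain gain), and the other shapes vary across the overlap, so they would not reconstruct the signal perfectly when applied twice.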


Comments are welcome.
Don't consider this a product. It's experimental code.
No warranties. No liabilities.

AntFFT.dll links against the GPL'ed FFTW package.
I didn't write AntFFT.dll, though. I found it on some obscure Russian server. It does the transforms and windowing for me.


Concerning the status of my experimental audio codec:

Yes, it works.
But it sucks.
So no release yet.