Can anyone suggest a utility that takes an audio file as input, and can output to a text/CSV file various data (RMS, peaks, dominant frequencies, etc.) per each N ms of the input, for both time and frequency domains?
Windows is my main target, but *nix as well.
It seems SoX has a few features in this direction, but not flexible enough (e.g., data per N ms, frequency domain resolution).
For the time-domain values, you could use something like this:
sox input.wav -n trim 0 0.012 stats : restart
(change 0.012 s = 12 ms into the desired block length). You are right, however, regarding the frequency domain; there is not much support for that.
It won't be efficient, executing it thousands of times.
Is that a realistic concern given how trivial the processing you are doing is?
If you don't want to write an application yourself, your best bet is probably MATLAB (http://www.mathworks.com/products/matlab/) or a MATLAB clone (http://www.dspguru.com/dsp/links/matlab-clones).
Of course, you'd need to understand the math required for what you want to accomplish. I've never used any of these programs, but I believe FFT is "built-in" (as are simpler things like average and RMS algorithms), so you shouldn't have to develop all of the math from scratch.
As a free alternative to MATLAB I'd suggest trying Python + NumPy/SciPy. I've never used it myself, but there is also a scikit for audio processing (http://scikits.appspot.com/audiolab) around.
It won't be efficient, executing it thousands of times.
$ soxi data.wav
Input File : 'data.wav'
Channels : 2
Sample Rate : 44100
Precision : 16-bit
Duration : 01:12:00.91 = 190551984 samples = 324068 CDDA sectors
File Size : 762M
Bit Rate : 1.41M
Sample Encoding: 16-bit Signed Integer PCM
$ time (sox data.wav -n trim 0 0.055 stats : restart 2>&1 | wc)
1256737 5733870 45242573
real 0m23.383s
user 0m20.849s
sys 0m5.960s
$
So that’s less than half a minute for 78562 blocks. For a block length of 0.009 seconds (480101 blocks), it takes four minutes. The processor is a 1 GHz AMD 4850e, nothing extreme.
If you don't want to write an application yourself, your best bet is probably MATLAB (http://www.mathworks.com/products/matlab/) or a MATLAB clone (http://www.dspguru.com/dsp/links/matlab-clones).
+1. I've never used MATLAB for audio, but it should come with all the necessary functionality. It can read WAV files, write CSV files, and has the most essential libraries, such as FFTW, built in.
Thanks for the suggestions.
So that’s less than half a minute for 78562 blocks. For a block length of 0.009 seconds (480101 blocks), it takes four minutes. The processor is a 1 GHz AMD 4850e, nothing extreme.
Is that a realistic concern given how trivial the processing you are doing is?
Interesting. I wonder whether the execution overhead is that trivial on a webhost, or at least whether it falls within what they'd consider valid use. But then again, there are also the missing features in SoX.
Your best bet is probably MATLAB or a MATLAB Clone
Although I'd rather use an existing tool made for this specific job than semi-program one myself, I'll check this direction too.
As a free alternative to MATLAB I'd suggest trying Python + NumPy/SciPy. I've never used it myself, but there is also a scikit for audio processing (http://scikits.appspot.com/audiolab) around.
Sounds like this might need some customization, but I'll check it out. I wonder how well Python performs, but there are also potential advantages to Python for running on a shared webhost.
In case you aren't aware, and for reference: NumPy/SciPy expose C-level BLAS/LAPACK routines through a Python interface, all built around NumPy's multidimensional array (ndarray) object. Many packages depend on it. A further performance boost for matrix manipulation, linear algebra and some other operations can be had by building or installing them (depending on platform) against the Intel MKL or ATLAS libraries (roughly a 30x boost, though that varies from case to case).
Then run a performance test against the other suggested solutions.
SciPy can read PCM WAV files. The audiolab scikit provides additional formats through the sndfile library, and was written as a SciPy extension (like many other scikits).
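For reference, here is a minimal sketch of what the per-block analysis might look like with NumPy. It is only an illustration under some assumptions: the function names `block_features` and `write_csv` are made up for this post, multi-channel audio is simply averaged down to mono, each block gets a Hann window before the FFT, and "dominant frequency" is taken to mean the strongest non-DC FFT bin.

```python
import csv
import sys

import numpy as np


def block_features(samples, rate, block_ms):
    """Yield (start_s, rms, peak, dominant_hz) for each full block of block_ms."""
    n = int(round(rate * block_ms / 1000.0))  # samples per block
    if n < 2:
        raise ValueError("block too short for this sample rate")
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)  # assumption: mix all channels down to mono
    window = np.hanning(n)  # reduces spectral leakage at the block edges
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)  # bin centre frequencies in Hz
    # A trailing partial block is skipped.
    for start in range(0, len(x) - n + 1, n):
        block = x[start:start + n]
        rms = float(np.sqrt(np.mean(block ** 2)))
        peak = float(np.max(np.abs(block)))
        spectrum = np.abs(np.fft.rfft(block * window))
        spectrum[0] = 0.0  # ignore the DC bin
        dominant = float(freqs[np.argmax(spectrum)])
        yield start / rate, rms, peak, dominant


def write_csv(rows, out=sys.stdout):
    """Write the feature rows as CSV with a header line."""
    writer = csv.writer(out)
    writer.writerow(["start_s", "rms", "peak", "dominant_hz"])
    writer.writerows(rows)
```

With SciPy installed, `scipy.io.wavfile.read(path)` returns the sample rate and the data array, so something like `rate, data = wavfile.read("input.wav")` followed by `write_csv(block_features(data, rate, 12))` should give one CSV row per 12 ms. Integer PCM data is not rescaled here, so RMS and peak come out in raw sample units. Note that the frequency resolution is limited to 1000 / block_ms Hz per bin, so very short blocks give a coarse spectrum.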
It won't be efficient, executing it thousands of times.
The command "restart" does not re-execute sox as a new process. It is just a sox parameter telling it to loop the internal processing chain.
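Since a single sox run then prints one stats table per block, that output can be turned into the CSV file asked for with a small script. The following is only a sketch: the names `parse_stats` and `to_csv` are made up here, the row labels are the ones listed for the stats effect in the SoX manual, and only the first value column of each row is kept (the "Overall" column for a stereo file).

```python
import csv

# Row labels as printed by the SoX "stats" effect (taken from the SoX manual).
LABELS = ["DC offset", "Min level", "Max level", "Pk lev dB", "RMS lev dB",
          "RMS Pk dB", "RMS Tr dB", "Crest factor", "Flat factor", "Pk count"]


def parse_stats(text):
    """Split concatenated 'stats' tables into one dict per block."""
    blocks, current = [], None
    for line in text.splitlines():
        for label in LABELS:
            if line.startswith(label):
                if label == LABELS[0]:  # "DC offset" opens a new table
                    current = {}
                    blocks.append(current)
                if current is not None:
                    fields = line[len(label):].split()
                    if fields:
                        # Values stay strings: sox may print "-" or "-inf".
                        current[label] = fields[0]
                break
    return blocks


def to_csv(blocks, out):
    """Write one CSV row per block; missing values are left empty."""
    writer = csv.DictWriter(out, fieldnames=["block"] + LABELS, restval="")
    writer.writeheader()
    for i, block in enumerate(blocks):
        writer.writerow({"block": i, **block})
```

The stats effect writes to stderr, so the input could be captured with `sox input.wav -n trim 0 0.012 stats : restart 2> stats.txt`, then fed through `parse_stats(open("stats.txt").read())` and `to_csv(..., open("stats.csv", "w", newline=""))`.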
Another alternative might be Praat (http://www.fon.hum.uva.nl/praat/), which has basic scripting (http://www.fon.hum.uva.nl/praat/manual/Scripting.html) capabilities and makes it easy to extract low- to mid-level audio features.
No, I'm not aware of the specifics of SciPy/NumPy. Sounds like the plot thickens, at least with shared webhosting.
I also wasn't aware initially of SoX's restart command.
Praat is intriguing. I wonder why they seem so intent on not showing any screenshots on their site.