Bandpass RMS for audio content matching

Let's say we divide¹ the frequency range into octaves (taking the middle 9 of 11), band-pass each range, and calculate the RMS per range.
Then we apply some kind of normalization (maybe using ReplayGain values) and store these values.

I don't know whether it can be done in one pass, so it may be computationally hungry and unoptimized, but don't you think these normalized band-pass RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently, of course; I'm not sure what would be optimal here.
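
To make the band layout concrete, here is a minimal Python sketch. It assumes 44.1 kHz audio and octaves counted down from the Nyquist frequency, which the post does not specify. The function names are made up for illustration, and subtracting the mean level is only a stand-in for the ReplayGain-based normalization mentioned above.

```python
import math

def octave_edges(nyquist_hz=22050.0, n_bands=11):
    """Return (low, high) edges of n_bands octaves, counted down from Nyquist."""
    edges = []
    high = nyquist_hz
    for _ in range(n_bands):
        low = high / 2.0
        edges.append((low, high))
        high = low
    return edges[::-1]  # lowest band first

def rms_db(samples):
    """RMS level in dB relative to full scale (samples expected in -1..1)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def normalize_profile(band_db):
    """Subtract the mean level so that only the spectral shape remains.

    Assumption: a simple substitute for ReplayGain-style normalization.
    """
    mean = sum(band_db) / len(band_db)
    return [value - mean for value in band_db]

bands = octave_edges()   # 11 octaves, roughly 10.8 Hz up to 22050 Hz
middle = bands[1:-1]     # the "middle 9 from 11"
```

Each band would still have to be band-pass filtered before `rms_db` is applied to it; the sketch only covers the band edges and the bookkeeping.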


Reply #2
Yeah, I know: musical islands and heavy statistics are over my head.
I was just thinking simple, and this didn't seem like a bad idea, so I posted.

Reply #3
Let's say we divide¹ the frequency range into octaves (taking the middle 9 of 11), band-pass each range, and calculate the RMS per range.
Then we apply some kind of normalization (maybe using ReplayGain values) and store these values.

I don't know whether it can be done in one pass, so it may be computationally hungry and unoptimized, but don't you think these normalized band-pass RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently, of course; I'm not sure what would be optimal here.


Something like this has long been done to increase the apparent loudness of recordings. It is called multi-band compression.

Reply #4
Yeah, I know: musical islands and heavy statistics are over my head.
I was just thinking simple, and this didn't seem like a bad idea, so I posted.

It's not a bad idea but if you're gonna do content matching you will have to consider the probabilities of a match at some point...

Reply #5
Maybe the post title isn't good.
I was listening to Gobi by Monolake and thought of continuing in that direction; then my laziness made me think about similarity.
Strictly speaking, "content matching" was not what I had in mind, but it may be interesting too.

Reply #6
you will have to consider the probabilities of a match at some point...

"Matching" can be implemented as a hypothesis test with a custom error level, which is easy and cheap (there are few data points to treat as normally distributed, but IIRC hypothesis testing works even on fewer than 10 samples).
Maybe some predefined curves could even be estimated for music styles, but that may be too ambitious.

I think I'll try this (one of these days).

[edit] Correlation analysis should be even easier and saner than my nonsensical hypothesis testing.
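
A minimal sketch of what such a correlation comparison could look like, using the plain Pearson coefficient between two band-RMS profiles. The two example profiles are hypothetical numbers invented for illustration, not measurements from any track.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-band RMS profiles (dB) for two tracks, lowest band first.
track_a = [-50.0, -44.0, -38.0, -33.0, -30.0, -31.0, -35.0, -40.0, -47.0]
track_b = [-52.0, -47.0, -40.0, -36.0, -31.0, -30.0, -33.0, -41.0, -50.0]
similarity = pearson(track_a, track_b)  # between -1 and 1
```

Because the coefficient ignores both offset and scale, two tracks with the same spectral shape at different loudness still score close to 1.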

Reply #7
There were no negative replies, so I made a rough start to see what I could find (being almost DSP-illiterate).
I chose to start with sox, as the easiest way for me. Here is the commented batch file I used: http://pastebin.com/n74i2BZh

As the comments in the batch file explain, it outputs a .stats file and an optional gnuplot script for visualizing the data, which produces two pictures: filename_RMS.png and filename_Crest.png.
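
For readers who cannot reach the pastebin link, the following Python sketch shows one way a single band could be measured with sox. It is not a copy of the batch file: it assumes the `sinc` effect for band-passing and the `stats` effect for the measurements, and the helper names are hypothetical.

```python
import subprocess

def sox_band_command(path, low_hz, high_hz):
    """Build a sox command: band-pass with `sinc`, then measure with `stats`.

    `-n` is the null output, so nothing is written except the report.
    """
    return ["sox", path, "-n", "sinc", f"{low_hz:g}-{high_hz:g}", "stats"]

def band_stats(path, low_hz, high_hz):
    """Run sox and return the raw `stats` report.

    Assumption: sox prints the report on stderr, as its effects that
    only measure normally do.
    """
    result = subprocess.run(
        sox_band_command(path, low_hz, high_hz),
        capture_output=True, text=True, check=True,
    )
    return result.stderr

command = sox_band_command("track.wav", 689.0625, 1378.125)
```

Running one sox process per band is what makes this approach slow; every band needs a full pass over the file.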



The .stats file looks like this (space-delimited columns):
Code:
Band_# RMS_lev Left Right RMS_pk Left Right Pk_lev Left Right CFLeft CFRight 
01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 -32.26 -32.26 -33.41 14.11 12.32
02_band= -49.25 -49.26 -49.23 -33.79 -33.79 -35.06 -26.31 -26.31 -27.45 14.04 12.27
03_band= -43.35 -43.34 -43.35 -27.98 -27.98 -29.18 -20.56 -20.56 -21.71 13.77 12.08
04_band= -37.63 -37.55 -37.70 -22.66 -22.66 -23.61 -15.29 -15.29 -16.69 12.98 11.24
05_band= -32.36 -32.01 -32.75 -19.63 -19.63 -20.19 -11.56 -11.56 -12.28 10.53 10.56
06_band= -28.40 -27.85 -29.03 -12.92 -12.92 -17.30 -6.85 -6.85 -9.31 11.22 9.68
07_band= -29.98 -29.67 -30.31 -18.55 -18.86 -18.55 -8.27 -8.27 -8.36 11.75 12.52
08_band= -33.83 -33.34 -34.38 -20.47 -21.81 -20.47 -9.66 -9.94 -9.66 14.79 17.21
09_band= -36.62 -36.47 -36.76 -19.50 -19.50 -19.67 -8.88 -10.35 -8.88 20.23 24.77
10_band= -44.56 -45.65 -43.70 -24.12 -26.54 -24.12 -12.68 -14.36 -12.68 36.71 35.55
11_band= -57.21 -57.90 -56.62 -38.47 -39.81 -38.47 -22.43 -25.76 -22.43 40.49 51.22
I tried to make all this as obvious as possible and to work as expected.
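
The .stats layout shown above is easy to read back for further processing. Here is a small Python sketch; the column names in `COLUMNS` are my own renaming (the header repeats "Left" and "Right"), and the sample text reuses two rows from the listing.

```python
COLUMNS = [
    "RMS_lev", "RMS_lev_L", "RMS_lev_R",
    "RMS_pk", "RMS_pk_L", "RMS_pk_R",
    "Pk_lev", "Pk_lev_L", "Pk_lev_R",
    "CF_L", "CF_R",
]

def parse_stats(text):
    """Parse space-delimited .stats text into {band_number: {column: value}}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a label such as "01_band="; skip anything else.
        if not fields or not fields[0].endswith("="):
            continue
        band = int(fields[0].split("_")[0])
        values = [float(field) for field in fields[1:]]
        if len(values) != len(COLUMNS):
            raise ValueError(f"band {band}: expected {len(COLUMNS)} values")
        table[band] = dict(zip(COLUMNS, values))
    return table

sample = """Band_# RMS_lev Left Right RMS_pk Left Right Pk_lev Left Right CFLeft CFRight
01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 -32.26 -32.26 -33.41 14.11 12.32
06_band= -28.40 -27.85 -29.03 -12.92 -12.92 -17.30 -6.85 -6.85 -9.31 11.22 9.68
"""

stats = parse_stats(sample)
loudest = max(stats, key=lambda band: stats[band]["RMS_lev"])
```

The list of `RMS_lev` values, ordered by band number, is the per-track profile that would go into a correlation comparison.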

Here are some cross-reference tables with correlation coefficients for the normalized RMS average over the full 11 octave bands:

"Hysteria" by Def Leppard vs itself:


The table suggests that track 4 (a ballad) and track 13 (a live track) are the worst matches.
The tracks that match most closely are 2, 3 and 11.

So this looks as expected when analysing tracks from the same release.

Some low coefficients between "Hysteria" and Ornette Coleman's "Shape of Jazz to Come":


An interesting match in the last tracks of Gustav Holst's "Planets" and "Shape of Jazz to Come":


"The Planets" self-reference table:


I hope this isn't a case of "seeing what you want to see". I don't have much data right now because the process is slow, but I wanted to post and maybe get some tips. I plan to try different bands, maybe other approaches, and so on. Regarding Arnold's comment about the multi-band compressor: with sox it is possible to split separate bands in one pass with crossover filters, as "mcompand" does, but I don't think it can be used here (or I don't know how).

Reply #8
The saga continues...

I sort of found a one-pass solution by using Bidule and building band-pass filters from Christian Budde's Chebyshev LP/HP VST filters and a couple of Destroy FX's open-source RMS Buddy plugins:
http://i53.tinypic.com/25q4rhd.png

 

Reply #9
Apart from some talk, this seems like a fine approach to me (a DSP noob).

There is a lot of information out there, and in this case I don't think it's hidden, as with some aspects of physics, but real practice (after the necessary background) is needed to grasp the meaning of some of it.
Bearing in mind the recent JJ FFT talks, I don't really think it's possible to present the meaning of the FFT (or anything DSP-related) to someone who doesn't understand basic concepts of mathematical physics (integral transforms, for example). There is no "need" and there is no sense; it's like pumping botox.

But this seems like a nice, easily comprehensible and intuitive approach. Find the right bins (from critical bands, like http://www.independentrecording.net/irn/re...in_display.htm), then get the energy and output the numbers. Correlation magic could lead to very interesting things, I think.
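
The "find bins, get energy, output numbers" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses a naive DFT (far too slow for real audio, where an FFT library would be used), plain frequency ranges instead of a real critical-band table, and a hypothetical function name.

```python
import cmath
import math

def band_rms_from_dft(samples, sample_rate, bands):
    """Return the RMS contributed by each (low_hz, high_hz) band.

    Uses Parseval's theorem: mean square = sum(|X[k]|^2) / n^2 over all bins.
    """
    n = len(samples)
    half = n // 2
    # Naive O(n^2) DFT of the non-negative frequencies only.
    spectrum = []
    for k in range(half + 1):
        acc = 0j
        for i, s in enumerate(samples):
            acc += s * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(acc)
    levels = []
    for low, high in bands:
        energy = 0.0
        for k in range(1, half + 1):
            freq = k * sample_rate / n
            if low <= freq < high:
                # Count the mirrored negative-frequency bin too,
                # except at Nyquist, which has no mirror.
                weight = 1.0 if (n % 2 == 0 and k == half) else 2.0
                energy += weight * abs(spectrum[k]) ** 2
        levels.append(math.sqrt(energy) / n)
    return levels

# Demo: a full-scale 1 kHz sine, 400 samples at 8 kHz (exactly on bin 50).
rate, length = 8000, 400
sine = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(length)]
levels = band_rms_from_dft(sine, rate, [(500, 1000), (1000, 2000), (2000, 4000)])
```

In the demo essentially all of the energy lands in the 1000-2000 Hz band, whose RMS comes out at about 0.707, the RMS of a full-scale sine.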