
Topic: Bandpass RMS for audio content matching (Read 3223 times)

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Let's say we divide¹ the frequency range into octaves (the middle 9 of 11), then bandpass each range and calculate the RMS per range.
Then we apply some kind of normalization (maybe using ReplayGain values) and store these values.

I don't know if it can be done in one pass, so it might be computationally hungry, unoptimized, etc., but don't you think these normalized per-band RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently, of course; I'm not sure what would be optimal here.
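
A minimal sketch of the idea above, in plain Python. The band centres are an assumption (the 11 nominal octave centres, since the post does not fix the ranges), and the filter is a standard RBJ-cookbook band-pass biquad, not anything the post prescribes. All bands are updated while reading the samples once, so under these assumptions a single pass is enough.

```python
import math

# Assumed band centres (Hz): the 11 nominal octave centres. The post leaves
# the ranges open, so these are illustrative only.
OCTAVE_CENTERS_HZ = [16, 31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]


def bandpass_coeffs(center_hz, sample_rate, bandwidth_octaves=1.0):
    """Band-pass biquad (RBJ cookbook form, 0 dB gain at the centre)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bandwidth_octaves * w0 / math.sin(w0)
    )
    a0 = 1.0 + alpha
    # Normalised coefficients b0, b2, a1, a2 (b1 is zero for this filter).
    return alpha / a0, -alpha / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0


def band_rms_db(samples, sample_rate, centers=OCTAVE_CENTERS_HZ):
    """Return one RMS level in dB per band, reading the samples once."""
    coeffs = [bandpass_coeffs(fc, sample_rate) for fc in centers]
    state = [[0.0, 0.0, 0.0, 0.0] for _ in centers]  # x1, x2, y1, y2 per band
    sums = [0.0] * len(centers)
    count = 0
    for x in samples:
        count += 1
        for i, (b0, b2, a1, a2) in enumerate(coeffs):
            s = state[i]
            y = b0 * x + b2 * s[1] - a1 * s[2] - a2 * s[3]
            s[1], s[0] = s[0], x  # shift input history
            s[3], s[2] = s[2], y  # shift output history
            sums[i] += y * y
    if count == 0:
        raise ValueError("no samples given")
    # 10*log10(mean square) equals 20*log10(RMS); floor avoids log of zero.
    return [10.0 * math.log10(max(total / count, 1e-24)) for total in sums]


if __name__ == "__main__":
    sr = 44100
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / sr) for n in range(sr // 4)]
    for fc, level in zip(OCTAVE_CENTERS_HZ, band_rms_db(tone, sr)):
        print(f"{fc:>7} Hz  {level:7.2f} dB")
```

For a 1 kHz test tone the 1 kHz band should come out loudest. Real tracks would need decoding to float samples first, which is outside this sketch.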

  • JapanAudio
  • [*][*]
Bandpass RMS for audio content matching
Reply #1
What type of content matching? Do you mean like Shazam: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf

They do fingerprint analysis with "constellation maps".

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Reply #2
Yeah, I know: musical islands and heavy statistics that are over my head.
I was just thinking of something simple, and this didn't seem like a bad idea, so I posted.

Bandpass RMS for audio content matching
Reply #3
Let's say we divide¹ the frequency range into octaves (the middle 9 of 11), then bandpass each range and calculate the RMS per range.
Then we apply some kind of normalization (maybe using ReplayGain values) and store these values.

I don't know if it can be done in one pass, so it might be computationally hungry, unoptimized, etc., but don't you think these normalized per-band RMS values would be a good pool for matching track similarity on various bases?

------------------
¹ The ranges can be selected differently, of course; I'm not sure what would be optimal here.


Something like this has long been done to increase the apparent loudness of recordings. It is called multi-band compression.

  • JapanAudio
  • [*][*]
Bandpass RMS for audio content matching
Reply #4
Yeah, I know: musical islands and heavy statistics that are over my head.
I was just thinking of something simple, and this didn't seem like a bad idea, so I posted.

It's not a bad idea but if you're gonna do content matching you will have to consider the probabilities of a match at some point...

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Reply #5
Maybe the post title isn't a good one.
I was listening to Gobi by Monolake and thought about continuing in that direction; then my laziness made me think about similarity.
Strictly speaking, "content matching" was not what I had in mind, but it may be interesting too.

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Reply #6
you will have to consider the probabilities of a match at some point...

"Matching" can be implemented as a hypothesis test with a custom error level, which is easy and cheap (the number of data points is too small to treat as normally distributed, but IIRC hypothesis testing works even on fewer than 10 samples).
Maybe some predefined curves could even be estimated for music styles, but that may be too ambitious.

I think I'll try this (one of these days).

[edit] Correlation analysis should be even easier and saner than my nonsensical hypothesis testing.
  • Last Edit: 30 November, 2010, 09:05:38 PM by klonuo
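
A rough sketch of the correlation idea from the edit above: the Pearson coefficient between two per-band RMS profiles. The two profiles below are made-up numbers for illustration only, not measurements, and `pearson` is just a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom


if __name__ == "__main__":
    # Hypothetical band levels in dB for two tracks (illustrative values).
    track_a = [-55.0, -49.0, -43.0, -38.0, -32.0, -28.0, -30.0, -34.0, -37.0]
    track_b = [-60.0, -52.0, -41.0, -35.0, -33.0, -31.0, -29.0, -30.0, -36.0]
    print(round(pearson(track_a, track_b), 3))
    # Same profile, 6 dB louder: the coefficient is still 1.
    print(round(pearson(track_a, [v + 6.0 for v in track_a]), 3))
```

One property worth noting: a constant dB offset between two profiles (one track simply being louder) does not change the coefficient, so for this particular comparison a gain-only normalization step makes no difference.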

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Reply #7
There were no negative replies, so I made a rough start to see what I could find there (being almost DSP illiterate).
I chose to start with sox, as that was the easiest way for me. Here is the commented batch file I used: http://pastebin.com/n74i2BZh

As the comments inside the script explain, it outputs a .stats file and an optional gnuplot script for visualizing the data, which produces two pictures: filename_RMS.png and filename_Crest.png:



The .stats file looks like this (space-delimited columns):
Code:
Band_# RMS_lev Left Right RMS_pk Left Right Pk_lev Left Right CFLeft CFRight 
01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 -32.26 -32.26 -33.41 14.11 12.32
02_band= -49.25 -49.26 -49.23 -33.79 -33.79 -35.06 -26.31 -26.31 -27.45 14.04 12.27
03_band= -43.35 -43.34 -43.35 -27.98 -27.98 -29.18 -20.56 -20.56 -21.71 13.77 12.08
04_band= -37.63 -37.55 -37.70 -22.66 -22.66 -23.61 -15.29 -15.29 -16.69 12.98 11.24
05_band= -32.36 -32.01 -32.75 -19.63 -19.63 -20.19 -11.56 -11.56 -12.28 10.53 10.56
06_band= -28.40 -27.85 -29.03 -12.92 -12.92 -17.30 -6.85 -6.85 -9.31 11.22 9.68
07_band= -29.98 -29.67 -30.31 -18.55 -18.86 -18.55 -8.27 -8.27 -8.36 11.75 12.52
08_band= -33.83 -33.34 -34.38 -20.47 -21.81 -20.47 -9.66 -9.94 -9.66 14.79 17.21
09_band= -36.62 -36.47 -36.76 -19.50 -19.50 -19.67 -8.88 -10.35 -8.88 20.23 24.77
10_band= -44.56 -45.65 -43.70 -24.12 -26.54 -24.12 -12.68 -14.36 -12.68 36.71 35.55
11_band= -57.21 -57.90 -56.62 -38.47 -39.81 -38.47 -22.43 -25.76 -22.43 40.49 51.22
I tried to make all this as obvious as possible and to make it work as expected.
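
A small sketch for reading rows of the .stats layout shown above. The field names are my own labels for the columns (an assumption, not names from the script). Judging by the numbers, the crest-factor columns look like the linear peak/RMS ratio, i.e. 10^((Pk_lev - RMS_lev)/20) per channel, and the sketch checks that reading against the first row.

```python
# Hypothetical labels for the eleven numeric columns, in file order.
FIELDS = [
    "rms_lev", "rms_lev_left", "rms_lev_right",
    "rms_pk", "rms_pk_left", "rms_pk_right",
    "pk_lev", "pk_lev_left", "pk_lev_right",
    "crest_left", "crest_right",
]


def parse_stats_row(line):
    """Split one data row into (band number, {field label: value})."""
    parts = line.split()
    band = int(parts[0].split("_")[0])  # "01_band=" -> 1
    values = [float(v) for v in parts[1:]]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return band, dict(zip(FIELDS, values))


def crest_from_db(peak_db, rms_db):
    """Linear crest factor (peak / RMS) from two levels given in dB."""
    return 10.0 ** ((peak_db - rms_db) / 20.0)


if __name__ == "__main__":
    row = ("01_band= -55.23 -55.25 -55.22 -39.76 -39.76 -41.05 "
           "-32.26 -32.26 -33.41 14.11 12.32")
    band, stats = parse_stats_row(row)
    computed = crest_from_db(stats["pk_lev_left"], stats["rms_lev_left"])
    print(band, round(computed, 2), stats["crest_left"])
```

The header line has to be skipped before calling `parse_stats_row`, since it contains no numeric fields.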

Here are some cross-referenced tables with correlation coefficients for the normalized average RMS across all 11 octave bands:

"Hysteria" by Def Leppard vs itself:


The table suggests that track 4 (a ballad) and track 13 (a live track) are the worst matches.
The tracks that match best are tracks 2, 3 and 11.

So this looks as expected when analysing tracks from the same release.

Some low coefficients between "Hysteria" and Ornette Coleman's "Shape of Jazz to Come":


An interesting match between the last tracks of Gustav Holst's "Planets" and "Shape of Jazz to Come":


"The Planets" self-reference table:


I hope this isn't a case of "see what you want to see". I don't have much data right now because the process is slow, but I wanted to post and maybe get some tips. I plan to try different bands, and maybe other approaches. For example, regarding Arnold's comment about multi-band compression: sox can split the signal into separate bands in one pass with crossover filters, as "mcompand" does, but I don't think (or don't know how) that can be used here.

  • klonuo
  • [*][*][*][*]
Bandpass RMS for audio content matching
Reply #8
The saga continues...

I sort of found a one-pass solution using Bidule, building bandpass filters from Christian Budde's Chebyshev LP/HP VST filters and a couple of Destroy FX's open-source RMS Buddy plugins:
http://i53.tinypic.com/25q4rhd.png

  • romor
  • [*][*][*][*][*]
Bandpass RMS for audio content matching
Reply #9
Apart from some of the talk, to me (a DSP noob) this seems like a fine approach.

There is a lot of information out there, and in this case I don't think it's hidden, as it is with some aspects of physics, but real practice (after the necessary background) is needed to grasp the meaning of some of it.
Bearing in mind the recent JJ FFT talks, I don't really think it's possible to convey the meaning of the FFT (or anything DSP-related) to someone who doesn't understand basic mathematical physics concepts (integral transforms, for example). There is no "need" and it makes no sense; it's like pumping botox.

But this seems like a nice, easily comprehensible and intuitive approach. Find the right bins (from critical bands, like http://www.independentrecording.net/irn/re...in_display.htm), then get the energy and output the numbers. Correlation magic could lead to very interesting things, I think.
  • Last Edit: 08 February, 2012, 09:00:29 AM by romor
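
The "find the right bins, then get the energy" approach could be sketched roughly as below, in plain Python with a small radix-2 FFT. The band edges are an assumption: the first few commonly quoted Bark-scale critical-band edges, used here only for illustration. `band_energies` is a hypothetical helper, and no windowing is applied, so leakage is ignored.

```python
import cmath
import math

# Assumed band edges in Hz (the first commonly quoted Bark-scale edges).
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480]


def fft(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(samples, sample_rate, edges=BAND_EDGES_HZ):
    """Sum squared bin magnitudes into the bands [edges[i], edges[i+1])."""
    n = len(samples)
    spectrum = fft(samples)
    energies = [0.0] * (len(edges) - 1)
    for k in range(n // 2):  # bins below the Nyquist frequency
        freq = k * sample_rate / n
        for i in range(len(energies)):
            if edges[i] <= freq < edges[i + 1]:
                energies[i] += abs(spectrum[k]) ** 2
                break
    return energies


if __name__ == "__main__":
    sr, n = 8192, 1024  # 8 Hz per bin, so 1000 Hz lands exactly on a bin
    tone = [math.sin(2.0 * math.pi * 1000.0 * i / sr) for i in range(n)]
    energies = band_energies(tone, sr)
    loudest = energies.index(max(energies))
    print(BAND_EDGES_HZ[loudest], BAND_EDGES_HZ[loudest + 1])
```

The resulting per-band numbers could then be fed into the same kind of correlation comparison discussed earlier in the thread.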