Topic: Extract background sounds of multiple variants (Read 7999 times)

Extract background sounds of multiple variants

Hello,

I have multiple audio files that contain the same music with speech in different languages on top of it. Sample-wise they're nearly equal. Where the audio content is the same, the waveforms still differ a bit, I guess because of compression.

I wonder if it's possible to extract the nearly equal parts and thus filter out the parts that differ. In theory, the more source files you have, the better this should work.

If there's a tool that does exactly that, or a toolchain that allows you to accomplish this, please let me know. I haven't found one.

Does anyone know a good technique to accomplish the task?

Reply #1
The success of this project depends on how exactly identical the musical portions are. If the same basic recording was reused again and again with different speech each time, you can get some attenuation of the speech or, more properly stated, the music will get louder relative to the speech as you add more of the slightly different recordings together.

This in turn depends on being able to time-synchronize the recordings precisely.
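
One common way to estimate the time offset between two recordings is to look for the peak of their cross-correlation. The following is only a minimal sketch, assuming mono signals held as NumPy arrays at the same sample rate and an offset of a whole number of samples; the function names `estimate_lag` and `align` are made up for illustration.

```python
import numpy as np

def estimate_lag(reference, other):
    """Return how many samples `other` lags behind `reference`.

    A positive value means the shared material appears later in `other`.
    """
    # In "full" mode, output index 0 corresponds to a lag of -(len(reference) - 1).
    corr = np.correlate(other, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

def align(reference, other):
    """Shift `other` so it lines up with `reference`; zero-pad or trim to the same length."""
    lag = estimate_lag(reference, other)
    if lag > 0:
        shifted = other[lag:]                                # drop the extra lead-in
    else:
        shifted = np.concatenate([np.zeros(-lag), other])    # delay `other`
    out = np.zeros(len(reference))
    n = min(len(reference), len(shifted))
    out[:n] = shifted[:n]
    return out
```

`np.correlate` computes the correlation directly, which gets slow for full-length tracks; for real files an FFT-based correlation or correlating only a short excerpt would be more practical.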

If the music and the speech were both recorded separately in different performances, this is probably not going to work for you.

The best you can do is a 3 dB increase in the level of the music relative to the speech every time you double the number of copies you add together: the identical music adds coherently (6 dB per doubling), while the unrelated speech tracks only add in power (3 dB per doubling). If you do the arithmetic, it's going to take a lot of recordings to get much attenuation of the speech.
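
The arithmetic behind the 3 dB figure can be sketched as follows. This assumes perfectly aligned copies, music that is identical in every copy (so amplitudes add) and speech tracks that are mutually uncorrelated and equally loud (so only their powers add). The function names are illustrative.

```python
import math

def relative_gain_db(n_copies):
    """Music-to-speech improvement in dB after summing n aligned copies."""
    music_db = 20 * math.log10(n_copies)    # coherent sum: amplitudes add
    speech_db = 10 * math.log10(n_copies)   # incoherent sum: powers add
    return music_db - speech_db

def copies_needed(target_db):
    """Smallest number of copies that reaches the target improvement."""
    return math.ceil(10 ** (target_db / 10))

for n in (2, 4, 8, 16):
    print(n, "copies:", round(relative_gain_db(n), 1), "dB")
```

Under these assumptions the improvement is 10·log10(N) dB, so pushing the speech 20 dB below where it started would take 100 different language versions.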

Reply #2
Thank you for your ideas. Mere attenuation would not be very desirable, though: all the speech would be merged together and still audible.

I think there has to be a way to filter out the common parts. If you were to process, e.g., one line (band) of the spectrograms of the audio files, how would you determine which intensity of a frequency at a specific point in time is the common one? Of course, this method would need to be implemented as a complete solution to the problem, or at least as a toolchain.

There are techniques for voice removal with inversion and such. Strangely I can't find information about filtering out matching background audio with multiple files as source.

Quote
The success of this project depends on how exactly identical the musical portions are.

When I said some curves differ a bit, I meant at the waveform level, zoomed in so far that you can almost see the individual samples. You can see that it's the same base material. I think the differences come from compressing the matching audio together with the non-matching audio around it.

Reply #3
Quote
Strangely I can't find information about filtering out matching background audio with multiple files as source.


You're asking how to solve a system of N+1 unknowns from N knowns, and you've added noise on top because the recordings are not actually identical. There is no general solution to this problem, but if you have a lot of recordings you can probably find an approximate one. Maybe try computing time-frequency distributions from a large number of recordings, comparing the content of each band, and taking the consensus band? If the backgrounds really are very similar, then the only common part should be the music you want. Be warned, though: the quality is probably going to be quite poor.
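
One possible reading of "taking the consensus band" is a per-bin median across the short-time spectra of all recordings. The sketch below is not a tested tool, just an illustration under strong assumptions: the recordings are mono NumPy arrays, already sample-aligned and of equal length, and at any given time-frequency bin most of them contain only the shared music. The names `stft`, `istft` and `consensus` are local helpers written here, not library functions.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed short-time Fourier transform, one row per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    """Overlap-add inverse of `stft`, normalized by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def consensus(recordings, n_fft=1024, hop=256):
    """Keep, for every time-frequency bin, the median value over all recordings."""
    specs = np.stack([stft(r, n_fft, hop) for r in recordings])
    # Median of real and imaginary parts separately, so a minority of
    # recordings with speech in a bin cannot pull the result away.
    spec = np.median(specs.real, axis=0) + 1j * np.median(specs.imag, axis=0)
    return istft(spec, n_fft, hop)
```

With at least three recordings, a bin is recovered cleanly whenever more than half of the versions agree there. Real files that went through separate compression never agree exactly, so some speech leakage and artifacts are to be expected, as warned above.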

Reply #4
I have created spectrograms from two files and graphically subtracted one from the other to see which parts are different. Does anyone know of tools to equalize out those parts? You would probably need data other than a spectrogram image for that. Are there free tools to do so? A scripting language for audio that's not overly complex would probably also do the job.

Reply #5
You could try to exploit stereo-to-surround upmixers: they need to calculate a signal for the center speaker, and they do it by looking for similar content in the left and right channels. The most basic one is "Dolby Pro Logic II"; it uses only one frequency band, so it only works well if the sounds are separated in time. More advanced ones like "DTS Neo:6" use more frequency bands, so they separate better when the sounds overlap in time. I don't know what the best surround upmixer currently is, but a long time ago it was "DTS Neo:6". There is also a free plugin for foobar2000 called "fsurround" that has almost perfect separation, because it operates on the frequency spectrum of the signal.
I don't think there is any surround upmixer that adds no artifacts, so you will probably need to try a few different ones.

I don't know the best way to extend this process to more than two files. Perhaps you could run a knockout tournament: merge pairs of files until only one is left.
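
The bracket logic of such a knockout tournament could look like the sketch below. It only handles the pairing; `merge_pair` is a hypothetical placeholder for whatever two-input step extracts the common part (for example, running two files through an upmixer as left and right and keeping the center channel).

```python
def knockout(items, merge_pair):
    """Reduce `items` to a single result by merging neighbours round by round."""
    current = list(items)
    if not current:
        raise ValueError("need at least one item")
    while len(current) > 1:
        # Merge items 0+1, 2+3, ... into the next round.
        next_round = [merge_pair(current[i], current[i + 1])
                      for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            next_round.append(current[-1])  # odd one out advances unmerged
        current = next_round
    return current[0]

# Toy stand-in: "files" are sets of sounds, merging keeps what both share.
versions = [{"music", "english"}, {"music", "german"}, {"music", "french"}]
common = knockout(versions, lambda a, b: a & b)
```

Each merge would add its own artifacts, so the result after several rounds may well sound worse than a single good pairing.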

Reply #6
Quote
You could try to exploit stereo-to-surround upmixers

Unfortunately, I could not find a free software solution that isn't old and offline. For me, foobar2000 lacks the possibility to save the output to a file. It's annoying that I'm not able to test this.

Reply #7
I just found out that foobar2000 can output to a file: there should be a "Convert" option in the context menu when you select tracks. There is also an option to use sound-processing plugins. Even though my version of foobar2000 (1.2.2) is newer than the one recommended by fsurround (0.9.x), it seems to work: when I "converted" a stereo .mp3 to .wav using fsurround, the resulting file had 6 channels and the center channel sounded fine in Audacity. I'm using version 0.9.0 of fsurround, which I got from here.

Reply #8
Quote
I have created spectrograms from two files and graphically subtracted one from the other to see which parts are different. Does anyone know of tools to equalize out those parts? [...] A scripting language for audio that's not overly complex would probably also do the job.

I would suggest MATLAB ($$$) or its free counterpart, Octave. Python (using NumPy) is also pretty popular for DSP. I don't think this is going to be possible without a general-purpose programming language and DSP tools, though.

 

Reply #9
Quote
I just found out that foobar2000 can output to a file: there should be a "Convert" option in the context menu when you select tracks. [...]

Yeah, I got it to work, too! Given that only two mono files can be used as input, the results are pretty good!

Quote
I would suggest MATLAB ($$$) or its free counterpart, Octave. Python (using NumPy) is also pretty popular for DSP.

Thanks! Octave makes a pretty good impression. I still have to figure out whether it can help me. I hope it will be the only programming I need in this context.