Topic: Audio Summing Algorithm

Audio Summing Algorithm

hello

I'm a newbie in digital audio programming. Can anyone point me to an article/formula on how to mix two or more audio streams together? It's probably very simple, but I haven't found anything via Google/Bing.

I'm operating on 32-bit floats on a Mac and managing my own audio callback loop (not using Core Audio mixer objects).

Thanks,

Re: Audio Summing Algorithm

Reply #1
You just add the sample values. Nothing special needed. If you exceed 1.0, you can just clip when you convert the final result to integer samples, or divide every sample in the final result so that the highest peak is 1.0.

If you want to guarantee that there won't be any clipping during the mixing, you would need to divide the samples by the number of streams to be mixed.

There are results for this on the net. You might want to search for "mixing audio samples clipping."
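The two options above (hard-clip the sum, or rescale by the final peak) can be sketched in a few lines. This is an illustrative Python sketch, not code from the thread; the function names are mine, and streams are plain lists of float samples in [-1.0, 1.0]:

```python
def mix_clip(streams):
    """Sum sample-by-sample, then hard-clip anything outside [-1.0, 1.0]."""
    mixed = [sum(samples) for samples in zip(*streams)]
    return [max(-1.0, min(1.0, s)) for s in mixed]

def mix_normalize(streams):
    """Sum sample-by-sample, then rescale so the highest peak is 1.0."""
    mixed = [sum(samples) for samples in zip(*streams)]
    peak = max(abs(s) for s in mixed)
    if peak > 1.0:                      # only attenuate, never amplify
        mixed = [s / peak for s in mixed]
    return mixed

left = [0.75, -0.5, 0.25]
right = [0.5, 0.125, 0.875]
print(mix_clip([left, right]))          # -> [1.0, -0.375, 1.0]
```

Clipping distorts only the samples that actually exceed full scale, while peak normalization lowers the whole mix; which trade-off is acceptable depends on the application.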

Re: Audio Summing Algorithm

Reply #2
Obviously, in the worst case (as noted above), one needs to reduce the summed levels by the count of sources. However, there are some possible modifications of that rule if you have control of the result (and just want to avoid blowing out your ears before tuning the levels later on). So here are the mathematical rules:

1) In the general case, you need to divide each level by the count of inputs. When doing this math, you need to be careful about underflow (especially if not properly dithered) and overflow during the actual summation.

2) If you are summing multiple sound sources from the same song that are mostly uncorrelated, then an approximation using sqrt(N) instead of N might still overflow, but can come fairly close. This will NOT work in the worst case, but gives you an idea of a starting point for each individual instance.

3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to correct.

I am NOT trying to muddy the hard rule about dividing by 'N', but rather to show the subtle nature of the statistics involved.

AGAIN, the initial answer (divide by N) is perfectly accurate -- if you can never overflow (including during the math), then you need to divide each input by N before doing the summation. If you can tolerate temporary 'overflow' during the summation (say, because you happen to be using floating point or a larger-range number format), then you can do the summation first and the division afterwards; this is likely to maintain quality better, provided the temporary value avoids true overflow. The same applies in the analog situation, where it is better to keep the signals as strong as you can for as long as you can, while avoiding overload (analog 'clipping') in every part of the circuit.

Isn't it amazing how complicated a simple 'putting together' of multiple signals can be? (It really isn't that complicated, but it is always a good idea to stay aware of what is going on in the circuit or software.) I guess I just might be making it complicated, but I am really trying to help with 'thinking' and to help new thinkers practice their learning skills!
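The difference between the divide-by-N worst case and the sqrt(N) rule of thumb is easy to check numerically. This is a small sketch of my own (the stream count, lengths, and seed are arbitrary choices), comparing fully correlated streams against independent noise streams:

```python
import math
import random

random.seed(0)
N, LEN = 16, 20000

# Fully correlated: N copies of one stream -> the peak grows exactly N-fold,
# so only divide-by-N is safe.
base = [random.uniform(-1.0, 1.0) for _ in range(LEN)]
peak_ratio = max(abs(N * s) for s in base) / max(abs(s) for s in base)

# Uncorrelated: N independent noise streams -> the RMS of the sum grows only
# like sqrt(N), which is what the sqrt(N) rule of thumb is based on.
streams = [[random.uniform(-1.0, 1.0) for _ in range(LEN)] for _ in range(N)]
mixed = [sum(col) for col in zip(*streams)]

def rms(sig):
    return math.sqrt(sum(s * s for s in sig) / len(sig))

rms_ratio = rms(mixed) / rms(streams[0])

print(peak_ratio)   # -> 16.0 (exactly N)
print(rms_ratio)    # -> about 4, i.e. sqrt(16)
```

So the average level of an uncorrelated mix really does grow much more slowly than the worst-case peak, which is the statistical point being made above.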

John

Re: Audio Summing Algorithm

Reply #3
If you care about the quality even in the slightest, then clipping is unacceptable.
If clipping is unacceptable and you need a universal solution, then there's no other choice than to divide everything by the maximum possible sum of peak levels. (And if they are 1.0 for all streams, then it's simply the number of streams.) This is the maximum gain which cannot clip under any circumstances.
If you don't need a universal solution, then it's up to the user to choose the gain.
This is quite simple IMO; you are making it more complicated.
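That rule can be written down directly; here is a tiny illustrative helper (the name is mine, not from the thread):

```python
def safe_gain(peak_levels):
    """Largest gain that can never clip: 1 / (sum of the streams' peak levels).
    If the peaks already sum to 1.0 or less, no attenuation is needed."""
    total = sum(peak_levels)
    return 1.0 if total <= 1.0 else 1.0 / total

print(safe_gain([1.0, 1.0, 1.0]))   # three full-scale streams -> 1/3
print(safe_gain([0.5, 0.25]))       # -> 1.0; their sum can never exceed 0.75
```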

Quote
3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to correct.
Nope, it doesn't; actually the opposite.
It's easily tested by summing two pure sine waves with different frequencies.
This test shows that "multiple bands of frequencies that do not overlap" is not sufficient; it's also easy to show that it's not necessary either.
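That two-sine test is easy to reproduce numerically. In this sketch (the frequencies and sample rate are arbitrary choices of mine), two unit sine waves at non-overlapping frequencies still sum to a peak near 1.9 -- far above the sqrt(2) ≈ 1.41 that a divide-by-sqrt(N) rule would budget for:

```python
import math

SR = 48000                  # samples per second
f1, f2 = 2.0, 3.0           # two distinct, non-overlapping frequencies (Hz)

# One second of each unit sine, summed sample-by-sample.
s = [math.sin(2 * math.pi * f1 * n / SR) + math.sin(2 * math.pi * f2 * n / SR)
     for n in range(SR)]
peak = max(abs(v) for v in s)

print(peak)   # about 1.9 -- sqrt(2) scaling would still clip badly
```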
a fan of AutoEq + Meier Crossfeed

Re: Audio Summing Algorithm

Reply #4
It's probably very simple
You just add the sample values. Nothing special needed. If you exceed 1.0, you can just clip when you convert the final result to integer samples, or divide every sample in the final result so that the highest peak is 1.0.
TL;DR Of course this is the simplest algorithm and if you're a newbie, stick with that.

<IF YOU'RE NOT AUDIO FANATIC, DON'T READ THAT CRAZY NERDY STUFF>
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING! That is intended when doing technical work, for example in a nulling test: if you want to check whether two streams are the same, invert one of them, and if you get silence, they're the same.
However, if you're doing musical work, that effect is unwanted, and this is why music downmixed to mono sometimes sounds a bit different than in stereo -- some frequencies are out of phase, and interference changes their power after downmixing. If you want to do it in a really-hyper-super-good-quality way, dive into advanced programming and do an FFT -- decompose the signal into the frequencies creating it. After the FFT, simply average the resulting coefficients and then do an inverse FFT -- create a signal from the information about its frequencies.
For example, let's look at the picture again. With normal downmixing they cancel each other, yeah? But let's FFT both signals. They both have the same frequency and the same amplitude, so the FFT will show: FREQ1 blah-blah-blah, AMPL1 1.0; FREQ2 blah-blah-blah, AMPL2 1.0. FFT doesn't show info about the phase, it's ignored. Now, because FREQ1 and FREQ2 are perfectly the same, we can average the amplitudes. That's simple: the average of 1.0 and 1.0 is 1.0 ;) We get FREQ blah-blah-blah, AMPL 1.0. Then simply inverse-FFT and voilà, you get an audibly perfect downmix -- no single frequency lost its power.

It's a bit nerdy, isn't it? I told you not to read that if you're newbie :D
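For the curious, the magnitude-averaging idea above can be sketched with a naive DFT in plain Python. Note the post doesn't say where the output phase should come from; in this sketch I arbitrarily reuse the phase of the first stream (my assumption, not the poster's), which is enough to show that two inverted signals no longer cancel:

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for tiny examples)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

def magnitude_mix(a, b):
    """Average spectral magnitudes; phase policy: copy stream `a` (my choice)."""
    A, B = dft(a), dft(b)
    return idft([cmath.rect((abs(Ak) + abs(Bk)) / 2, cmath.phase(Ak))
                 for Ak, Bk in zip(A, B)])

x = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7]   # one rough sine cycle
y = [-v for v in x]                               # the same signal, inverted

plain = [(u + v) / 2 for u, v in zip(x, y)]       # ordinary average: silence
smart = magnitude_mix(x, y)                       # magnitudes survive: ~x again
```

As noted later in the thread, the FFT itself is linear and phase-preserving; discarding phase like this is a deliberate, lossy choice, sensible (if at all) only for highly (anti)correlated inputs.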
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128

Re: Audio Summing Algorithm

Reply #5
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
a fan of AutoEq + Meier Crossfeed

Re: Audio Summing Algorithm

Reply #6
> FFT doesn't show info about the phase, it's ignored
FFT by itself preserves that info, in fact FFT is a completely reversible transform in the mathematical sense. It's you who have decided to discard that information.
a fan of AutoEq + Meier Crossfeed

Re: Audio Summing Algorithm

Reply #7
If you care about the quality even in the slightest, then clipping is unacceptable.
If clipping is unacceptable and you need a universal solution, then there's no other choice than to divide everything by the maximum possible sum of peak levels. (And if they are 1.0 for all streams, then it's simply the number of streams.) This is the maximum gain which cannot clip under any circumstances.
If you don't need a universal solution, then it's up to the user to choose the gain.
This is quite simple IMO; you are making it more complicated.

Quote
3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to correct.
Nope, it doesn't; actually the opposite.
It's easily tested by summing two pure sine waves with different frequencies.
This test shows that "multiple bands of frequencies that do not overlap" is not sufficient; it's also easy to show that it's not necessary either.
Note that I always qualified with an implied statistical disclaimer. Yes -- if you purposely bias the experiment, then the results I suggested would move toward the worst case. Please note that I was assuming a lack of bias.
By giving SPECIFIC examples, you purposely created bias -- and even though you probably didn't realize it, you didn't disagree with me -- you probably only meant to be DISAGREEABLE.

Just take the more typical (average) case, where music is the material in question -- the most likely scenario given the learning nature of the questioner and the context of this forum... My statement about the statistics remains accurate... even though you didn't know that you weren't disagreeing.

John

Re: Audio Summing Algorithm

Reply #8
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/

Re: Audio Summing Algorithm

Reply #9
@antony96 when adding samples: when you're adding two discrete time samples, you just add them like that.

Samples will either constructively or destructively interfere; that is perfectly normal, and it happens in the real world all the time. If you want to ensure a non-clipped signal, you must then scale your envelope. Also keep in mind the type of signal we're talking about here. Technically, you want your values to be in a manageable domain, like signed float, so they can be added beyond an imposed top level, effectively using arbitrary precision. When it comes to signal processing, it's much better (IMO) to think of it purely as a mathematical construct. Whatever quantization you then apply to put it as PCM values into a file is to be decided /after/ you're done with the signal processing as such. This is especially important when you're designing signal-path circuits with active filters and suchlike.

In case you need a better mental image, make sure you're working on a signed float basis; that representation comes closest to a "natural" signal, unencumbered by technical "technicalities".

In case you need a more concrete picture of such signals, think of brightness levels in pictures: each line in a grayscale image is essentially a one-dimensional signal.

If you wanna try things out, Octave comes with a bunch of pre-made functions for discrete-time signal summation, and loads of other functions. So do some playing around with that, and you'll learn a lot simply by trial and error.

A nice beginner's cheat-sheet: https://en.wikibooks.org/wiki/Digital_Signal_Processing/Discrete_Operations

Not sure if it makes sense getting more technical, but if you are interested in LTI systems and signal processing in general, it's a good idea to start simply with the nomenclature: learn what convolution is, what it represents, etc.
Before you go into the Laplace and Fourier transforms, I'd suggest getting to grips with the basics first.

You might also wanna browse a bit around on: https://dsp.stackexchange.com/
Especially when it comes to the more mathematical aspects of signal processing, it's a good place to get help, etc.

You might also wanna check out university coursework material, like this one: http://web.eecs.umich.edu/~fessler/course/451/l/pdf/c1.pdf

Re: Audio Summing Algorithm

Reply #10
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/
Regarding messing with the phase -- I would really have to listen to that device and to how it is used before giving it a yay or nay. The reason for my attitude is my experience with my precious copy of DolbyA-encoded, UNMASTERED ABBA. So I have material as pure as one can get without access to the multi-track master (please don't ask me how I got it -- I TRULY CANNOT REMEMBER -- it has been over a decade, and my memory has been damaged).
I love the ABBA music, but cannot stand the chaos in some of the recordings -- it has taken me over a year to figure out why they sometimes sound so bad: the sound quality was damaged by playing too much with the phase.
There is a lot of discordant material in the 90-degree (and 45-degree -- in between, but really the same kind of thing) directions, making the sound ugly. Blindly adding material in quadrature does things the ears aren't used to and causes an unevenness in the sound. Simple summing of semi-random material is probably safer; messing with the phase in a non-physically-real way can cause problems. (Of course, there are phase shifts all over the real world, but a NON-ANALYTIC type of audio signal is what I am speaking of. An analytic signal is closer to the real world, a kind of natural signal. Some artificial signals violate some of the rules and sometimes sound bad.)

My guess is that some of the games played by the ABBA recording people were meant to pack more energy into the signal for AM radio (making the music more dense), but on high quality equipment it doesn't sound as good as a simple sum.

Of course, if one does bias the input (by increasing the correlation between the sources to be added), then odd things can happen, and then using phase tricks might preserve more of each source -- but the ears are not really used to hearing that kind of game. The natural mix of the real and quadrature parts of a signal becomes too violated, and that gives a disrupting effect to the sound.

So, if the phase management software is used to change the phase when it is important to do so (the sources are not fully independent), then it can be helpful.  But, such a device can likely be used for evil also :-).

John

Re: Audio Summing Algorithm

Reply #11
@antony96 when adding samples: when you're adding two discrete time samples, you just add them like that.

Samples will either constructively or destructively interfere; that is perfectly normal, and it happens in the real world all the time. If you want to ensure a non-clipped signal, you must then scale your envelope. Also keep in mind the type of signal we're talking about here. Technically, you want your values to be in a manageable domain, like signed float, so they can be added beyond an imposed top level, effectively using arbitrary precision. When it comes to signal processing, it's much better (IMO) to think of it purely as a mathematical construct. Whatever quantization you then apply to put it as PCM values into a file is to be decided /after/ you're done with the signal processing as such. This is especially important when you're designing signal-path circuits with active filters and suchlike.

In case you need a better mental image, make sure you're working on a signed float basis; that representation comes closest to a "natural" signal, unencumbered by technical "technicalities".

In case you need a more concrete picture of such signals, think of brightness levels in pictures: each line in a grayscale image is essentially a one-dimensional signal.

If you wanna try things out, Octave comes with a bunch of pre-made functions for discrete-time signal summation, and loads of other functions. So do some playing around with that, and you'll learn a lot simply by trial and error.

A nice beginner's cheat-sheet: https://en.wikibooks.org/wiki/Digital_Signal_Processing/Discrete_Operations

Not sure if it makes sense getting more technical, but if you are interested in LTI systems and signal processing in general, it's a good idea to start simply with the nomenclature: learn what convolution is, what it represents, etc.
Before you go into the Laplace and Fourier transforms, I'd suggest getting to grips with the basics first.

You might also wanna browse a bit around on: https://dsp.stackexchange.com/
Especially when it comes to the more mathematical aspects of signal processing, it's a good place to get help, etc.

You might also wanna check out university coursework material, like this one: http://web.eecs.umich.edu/~fessler/course/451/l/pdf/c1.pdf

Yes -- the very best thing is to be educated in, and to understand, these matters in depth. I was purposely being a little too detailed, yet not detailed enough at the same time. When I saw the question and answer about summing the signals, it started conjuring up all of the various 'summation' issues that I have run into. Probably the most important are the matters of clipping and/or truncation when dealing with fixed-point operations. (I normally work in floating point, which gets rid of 90% of the problems -- but not 100%. It is still good to know the various potential problems.)
Also, I brought in the spectre of phase because of the additional dynamic-range/gain issues that arise when doing the potentially problematic 'divide by N', while also noting that 'divide by sqrt(N)' isn't really an adequate solution either.
I was trying to short-circuit the idea (especially if the questioner ever writes much DSP code in the future) that the one and only answer to summing signals is to 'divide by N'.

For a perfect example of the fact that 'divide by N' isn't always the best answer, think of yourself sitting at an audio mixing board: are all of the gains turned down every time a new source is added to the mix? Answer: NO. What is the reason? When adding the signals, one isn't adding the peak levels of each signal, but rather the signals themselves, which have statistical and/or phase characteristics. Simple amplitude values do NOT have phase; they are just numbers. So in the real world, it isn't always correct to just 'divide by N'. Divide by N is mostly a worst-case suggestion, given that the truncation and clipping issues are also correctly handled.

John

Re: Audio Summing Algorithm

Reply #12
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)

Re: Audio Summing Algorithm

Reply #13
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/
That was a link to software which is proprietary, requires registration, and is apparently infested with DRM ("iLok").
I asked for samples only. This one looks more like some situational marketing.

However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Yeah that's what I said.
a fan of AutoEq + Meier Crossfeed

Re: Audio Summing Algorithm

Reply #14
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Nope; the left and right speakers are separate (best-case scenario: headphones -- near-perfect channel separation). This is why we can still hear audio even if the speakers play signals that are inverses of each other.
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128

Re: Audio Summing Algorithm

Reply #15
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
I'm actually looking for software for FFT-based downmixing -- I can't program it myself because I don't have enough skill... :/
I can't access a computer at the moment, so I'm posting a sample of BAD downmixing done by simply averaging samples.
This is the most obvious and ugliest example of simple downmixing. It doesn't apply only to this track, though; whenever I turn on "Convert stereo to mono" in my music player, I hear that the mono audio is dull. I have the same feeling while listening to my radio -- the bad stereo separation provided by my tiny "gear", combined with not-so-good signal reception, means that I'm practically listening to mono radio. It sounds as dull as the downmix in my music player. (Unfortunately, radio has to perform the downmix that way, i.e. L+R, not by FFT; otherwise it couldn't broadcast stereo audio.)
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128


Re: Audio Summing Algorithm

Reply #17
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Nope; the left and right speakers are separate (best-case scenario: headphones -- near-perfect channel separation). This is why we can still hear audio even if the speakers play signals that are inverses of each other.

Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
a fan of AutoEq + Meier Crossfeed


Re: Audio Summing Algorithm

Reply #19
Here are the samples.
A "good" example is missing.
I know. There are none :( I'm looking for software to do an FFT-based downmix.

Someone on HA mentioned FFT-based downmix earlier than me: https://hydrogenaud.io/index.php/topic,5747.0.html
I'm not alone!
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128

Re: Audio Summing Algorithm

Reply #20
Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
How does it relate to "smart" downmixing?
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128

Re: Audio Summing Algorithm

Reply #21
Here are the samples.
A "good" example is missing.
I know. There are none :( I'm looking for software to do an FFT-based downmix.

Someone on HA mentioned FFT-based downmix earlier than me: https://hydrogenaud.io/index.php/topic,5747.0.html
I'm not alone!
So, you see, it's not really so nice to recommend an approach which you didn't even test yourself.

Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
How does it relate to "smart" downmixing?
It counters the argument about speakers "always" leaving "something" even if the stuff you're playing is out of phase.
Speakers aren't perfect, and if you try to emulate these imperfections, the mix in software also won't cancel out everything.
You aren't gonna fool physics.
a fan of AutoEq + Meier Crossfeed

Re: Audio Summing Algorithm

Reply #22
The FFT does preserve phase. Furthermore, it is linear, so A+B and ifft(fft(A)+fft(B)) give exactly the same result:

Code:
>> A = [1 2 3]

A =

     1     2     3

>> B = [2 4 6]

B =

     2     4     6

>> A+B

ans =

     3     6     9

>> ifft(fft(A)+fft(B))

ans =

     3     6     9

Probably the above posts mentioning the FFT really mean the power spectrum, which does not preserve phase. However, I do not recommend doing this except in the case where you are specifically trying to avoid beating between two signals (for example, different channels of the same recording).

Re: Audio Summing Algorithm

Reply #23
If I get up the energy to play with this stuff, I can probably put together a rough 'FFT'-based summer (that is, not a true FFT mix, but power-based) to see what forcing the phases to match would sound like. I suspect the results would at least be grainy (like some of the ABBA recordings, due to the abuse of the quadrature destroying the analytic nature of the signal), but more than likely the sound would be rendered unintelligible.

Actually, I have done something similar before -- zeroing the phase of a given FFT (magnitudes the same, phase zeroed across the board) -- and the result was pretty much unintelligible: it was possible to detect some of the material, but it didn't seem to be useful. At the time, I was playing with an FFT-based compressor/expander design, and found that the results were generally not much better than an excellently designed linear gain-control scheme with well-managed attack/decay behavior. The FFT approach is certainly helpful for NR, but AFAIK there are better techniques using a complex MDCT, where the blips and bleeps are better managed.

The comment about a true FFT being a linear transform is true -- add an FFT (each) of two signals and the result will be the same as an FFT of the sum of the actual time domain originals.

John

Re: Audio Summing Algorithm

Reply #24
Probably the above posts mentioning the FFT really mean the power spectrum, which does not preserve phase. However, I do not recommend doing this except in the case where you are specifically trying to avoid beating between two signals (for example, different channels of the same recording).
THIS. :) Downmixing by simply adding the signals is useful when mixing uncorrelated signals. What I'm proposing is a smart downmix for two highly (anti)correlated signals, e.g. the channels of the same recording.
sox -e float -b 32 -V4 -D gain -3 rate -v 48000 norm -1
opusenc --bitrate 128