HydrogenAudio

Hydrogenaudio Forum => Scientific Discussion => Topic started by: antony96 on 2018-04-22 15:52:48

Title: Audio Summing Algorithm
Post by: antony96 on 2018-04-22 15:52:48
Hello,

I'm a newbie in digital audio programming. Can anyone point me to an article/formula on how to mix two or more audio streams together? It's probably very simple, but I haven't found anything via Google/Bing.

I'm operating on 32-bit floats on a Mac and managing my own audio callback loop (not using Core Audio mixer objects).

Thanks,
Title: Re: Audio Summing Algorithm
Post by: Nikaki on 2018-04-22 16:51:06
You just add the sample values. Nothing special needed. If you exceed 1.0, you can just clip when you convert the final result to integer samples, or divide every sample in the final result so that the highest peak is 1.0.

If you want to guarantee that there won't be any clipping during the mixing, you would need to divide the samples by the number of streams being mixed.

There are results for this on the net. You might want to search for "mixing audio samples clipping."
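For illustration, a minimal Octave sketch of the sum-then-clip-or-normalize idea (the two sine streams are just made-up placeholder inputs):

Code: [Select]
% Minimal sketch: mix two float streams, then either hard-clip or
% normalize so the highest peak is 1.0.
a = 0.9 * sin(2*pi*440*(0:999)/44100);   % placeholder stream 1
b = 0.8 * sin(2*pi*554*(0:999)/44100);   % placeholder stream 2

mix = a + b;                              % plain sample-by-sample sum

clipped    = max(min(mix, 1.0), -1.0);    % option 1: clip at +/- 1.0
normalized = mix / max(abs(mix));         % option 2: scale so the peak is 1.0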
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 03:17:05
Obviously, in the worst case (as noted above), one needs to reduce the sum of the levels by the count of sources.  However, there are some possible modifications of that rule if you have control of the result (and just want to avoid blowing out your ears before tuning the levels later on).  So here are the mathematical rules:

1) In the general case, you need to divide each level by the count of inputs.  When doing this math, you need to be careful about underflow (especially if not properly dithered) and overflow during the actual summation.

2) If you are summing multiple sound sources within the same song that are mostly uncorrelated, then an approximation using sqrt(N) instead of N might still overflow, but it can come fairly close (see the sketch after this list).  This will NOT work in the worst case, but it gives you an idea of a starting point for each individual instance.

3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to being correct.
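As a rough illustration of the statistics being described (a sketch with made-up noise streams): dividing by N is always clip-safe but quiet, while dividing by sqrt(N) keeps roughly the loudness of a single stream but is not guaranteed against clipping.

Code: [Select]
% Rough sketch: sum N uncorrelated noise streams and compare the scalings.
N = 8;
x = 2*rand(N, 48000) - 1;            % N uncorrelated streams, peaks near +/- 1

s = sum(x, 1);                       % raw sum of all streams

max(abs(s / N))                      % divide by N: guaranteed <= 1.0, but quiet
max(abs(s / sqrt(N)))                % divide by sqrt(N): can still exceed 1.0...
sqrt(mean((s / sqrt(N)).^2))         % ...but its RMS level stays near that of
sqrt(mean(x(1,:).^2))                % a single stream (about 0.577 for this noise)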

I am NOT trying to muddy the hard rule about dividing by 'N', but rather trying to show the subtle nature of the statistics involved.

AGAIN, the initial answer (divide by N) is perfectly accurate -- if you can never overflow (including during the math), then you need to divide each input by N before doing the summation.  If you can tolerate temporarily exceeding full scale during the summation (say, because you happen to be working in floating point or a larger-range number format), then you can do the summation first and then the division (this is likely to maintain quality better, provided the temporary value never actually overflows).  This also applies in the analog situation, where it is better to keep the signals as strong as you can for as long as you can -- while avoiding overload (the analog 'clipping') in every part of the circuit.

Isn't it amazing how complicated a simple 'putting together' of multiple signals can be?  (It really isn't that complicated, but it is always a good idea to keep aware of what is going on in the circuit or software.)  I guess I just might be making it complicated, but I am really trying to help with 'thinking' and to help new thinkers practice their learning skills!

John
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 10:28:17
If you care about the quality even in the slightest, then clipping is unacceptable.
If clipping is unacceptable and you need a universal solution, then there's no other choice than to divide everything by the maximum possible sum of peak levels (and if the peaks are 1.0 for all streams, that's simply the number of streams), because this is the largest gain that cannot clip under any circumstances.
If you don't need a universal solution, then it's up to the user to choose the gain.
This is quite simple IMO; you are making it more complicated.

Quote
3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to correct.
Nope, it doesn't; actually, it's the opposite.
It's easily tested by summing 2 pure sine waves with different frequencies.
This test shows that "multiple bands of frequencies that do not overlap" is not enough; it's also easy to show that it's not necessary either.
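A quick way to run that test in Octave (a sketch with arbitrary example frequencies):

Code: [Select]
% Two full-scale sine waves at different, non-overlapping frequencies.
t  = (0:44099) / 44100;            % one second at 44.1 kHz
s1 = sin(2*pi*300*t);
s2 = sin(2*pi*470*t);

max(abs(s1 + s2))                  % well above sqrt(2) ~= 1.414, close to 2.0
sqrt(2)                            % so dividing by sqrt(N) would still clip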
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 11:43:22
It's probably very simple
You just add the sample values. Nothing special needed. If you exceed 1.0, you can just clip when you convert the final result to integer samples, or divide every sample in the final result so that the highest peak is 1.0.
TL;DR Of course this is the simplest algorithm and if you're a newbie, stick with that.

<IF YOU'RE NOT AN AUDIO FANATIC, DON'T READ THIS CRAZY NERDY STUFF>
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in the attached picture) the waveforms cancel each other and after down-mixing you'll hear literally NOTHING! That is intended when doing technical stuff, for example in a nulling test - if you want to check whether two streams are the same, invert one of them, mix, and if you get silence, they're the same.
However, if you're doing musical stuff, that effect is unwanted, and this is why downmixed-to-mono music sometimes sounds a bit different from the stereo version - some frequencies are out-of-phase and they change their power after downmixing because of interference. If you want to do it in a really-hyper-super-good-quality way, dive into advanced programming and do an FFT - change the signal into the frequencies creating it. After the FFT simply average the resulting coefficients and then do an inverse FFT - create a signal from the information about its frequencies.
For example: let's look at the picture again. With normal downmixing they cancel each other out, yeah? But let's FFT both signals. They both have the same frequency and the same amplitude, so the FFT will show: FREQ1 blah-blah-blah, AMPL1 1.0, FREQ2 blah-blah-blah, AMPL2 1.0. FFT doesn't show info about the phase, it's ignored. Now, because FREQ1 and FREQ2 are perfectly the same, we can average the amplitudes. That's simple: the average of 1.0 and 1.0 is 1.0 ;) We get FREQ blah-blah-blah, AMPL 1.0. Then simply inverse FFT and voilà, you get an audibly perfect downmix - no single frequency lost its power.

It's a bit nerdy, isn't it? I told you not to read that if you're newbie :D
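One possible concrete reading of this idea, as an Octave sketch for a single block (the description above doesn't say where the phase for the inverse FFT should come from, so reusing the phase of the plain L+R sum here is purely an assumption; block splitting and overlap are also omitted):

Code: [Select]
% Hypothetical magnitude-averaging downmix of one block. The phase choice
% is an assumption, not something specified in the post above.
t = (0:1023) / 44100;
L = sin(2*pi*440*t);
R = -L;                                        % worst case: inverted copy

avg_mag = (abs(fft(L)) + abs(fft(R))) / 2;     % average the magnitude spectra
phase   = angle(fft(L + R));                   % reuse the phase of the plain sum;
                                               % when the sum cancels completely this
                                               % degenerates to zero phase
downmix = real(ifft(avg_mag .* exp(1i*phase)));
max(abs(downmix))                              % nonzero, unlike max(abs((L+R)/2)) == 0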
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 12:59:46
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 13:03:47
> FFT doesn't show info about the phase, it's ignored
FFT by itself preserves that info; in fact, the FFT is a completely reversible transform in the mathematical sense. It's you who have decided to discard that information.
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 13:47:22
If you care about the quality even in the slightest, then clipping is unacceptable.
If clipping is unacceptable and you need a universal solution, then there's no other choice than to divide everything by the maximum possible sum of peak levels (and if the peaks are 1.0 for all streams, that's simply the number of streams), because this is the largest gain that cannot clip under any circumstances.
If you don't need a universal solution, then it's up to the user to choose the gain.
This is quite simple IMO; you are making it more complicated.

Quote
3) If you are summing totally independent sources (e.g. multiple bands of frequencies that do not overlap), then the sqrt(N) rule comes closer to correct.
Nope it doesn't; actually the opposite.
It's easily tested by summing 2 pure sine waves with different frequencies.
This test shows that "multiple bands of frequencies that do not overlap" is not enough; it's also easy to show that it's not necessary too.
Note that I always qualified my statements with an implied statistical disclaimer.  Yes -- if you purposely bias the experiment, then the results that I suggested would be more towards the worst case.  Please note that I was assuming a lack of bias.
By giving SPECIFIC examples, you purposefully created bias -- and even though you probably didn't realize it, you didn't actually disagree with me -- you were probably only meaning to be DISAGREEABLE.

Just use the more typical (average) example where music is the material in question -- most likely given the learning nature of the individual and the context of this environment...  My statement about the statistics remains accurate...  That is, even though you didn't realize it, you weren't really disagreeing.

John
Title: Re: Audio Summing Algorithm
Post by: knutinh on 2018-04-23 13:47:48
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/
Title: Re: Audio Summing Algorithm
Post by: polemon on 2018-04-23 13:59:26
@antony96 when adding samples: when you're adding two discrete-time samples, you literally just add them.

Samples will either constructively or destructively interfere; that is perfectly normal and it happens in the real world all the time. You then scale your envelope if you want to keep things from clipping, i.e. if you want to ensure a non-clipped signal. Also keep in mind the type of signal we're talking about here. Technically, you want your values to be in a manageable domain, like signed float, so they can be added beyond an imposed top level -- i.e. effectively working with arbitrary precision, etc. When it comes to signal processing, it's much better (imo) to think of it purely as a mathematical construct. Whatever you then want to quantize to put as PCM values into a file is to be decided /after/ you're done with your signal processing as such. This is especially important when you're designing signal-path circuits with active filters and suchlike.
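A tiny Octave sketch of that "stay in float, quantize only at the end" idea (the 16-bit target and the test signal are just assumptions for the example):

Code: [Select]
% Keep the full float range while processing; clip/quantize only at the end.
x = 1.3 * sin(2*pi*440*(0:999)/44100);   % intermediate values above 1.0 are fine in float
y = max(min(x, 1.0), -1.0);              % clip (or rescale) only when leaving the float domain
pcm = int16(round(y * 32767));           % quantize to 16-bit PCM (no dither in this sketch)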

In case you need a better mental image, make sure you're working on a signed, float basis; that is the closest representation of a "natural" signal, unencumbered by technical "technicalities".

In case you need a more tangible representation of such signals, think of brightness levels in pictures: each line in an image is essentially a one-dimensional signal (assuming it's grayscale), etc.

If you wanna try things out, Octave comes with a bunch of pre-made functions for discrete-time signal summation, and loads of other functions. So do some playing around with that, and you'll learn a lot by simple trial and error.

A nice beginner's cheat-sheet: https://en.wikibooks.org/wiki/Digital_Signal_Processing/Discrete_Operations

Not sure if it makes sense getting more technical, but if you are interested in LTI systems and signal processing in general, it's a good idea to start simply with the nomenclature: learn what convolution is, what it represents, etc.
Whether or not you want to go into the Laplace and Fourier transforms, I'd suggest getting to grips with the basics first.

You might also wanna browse a bit around on: https://dsp.stackexchange.com/
Especially when it comes to the more mathematical aspects of signal processing, it's a good place to get help, etc.

You might also wanna check out university coursework material, like this one: http://web.eecs.umich.edu/~fessler/course/451/l/pdf/c1.pdf
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 14:13:44
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
also including what happens when you mix a signal with silence and then adjust the gain to match the volume — does that produce a result that's (audibly) different from the source?
if you arbitrarily change phase across time, it could be possible that this distortion could be noticed… hard to tell without samples though.

> However, if you're doing musical stuff, that effect is unwanted
this effect is completely fine and it won't matter in practice if you are mixing uncorrelated signals, such as different instruments. it just means if you are duplicating tracks in your DAW for some reason, or recording a single instrument with several mics at the same time, then you need to think a bit more to do stuff correctly.
and it is what will happen in the air/ears anyway if you are listening to several natural sound sources, too.
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/
Regarding messing with the phase -- I would really have to listen to that device and to how it is used before giving it a yay or nay.  The reason for my attitude comes from dealing with my precious copy of DolbyA-encoded, UNMASTERED ABBA.  So, I have material that is as pure as one can get without having access to the multi-track master (please don't ask me how I got that material -- I TRULY CANNOT REMEMBER -- it has been over a decade -- and my memory has been damaged).
I love the ABBA music, but cannot stand the chaos in some of the recordings -- it has taken me over a year to figure out why they sometimes sound so bad -- the sound quality is being damaged by playing too much with the phase.
There is a lot of discordant material in the 90-degree (and 45-degree -- in between, but really kind of the same thing) directions, making the sound ugly.  Blindly adding stuff to the quadrature does things that the ears aren't used to -- and causes an unevenness in the sound.  Doing simple summing on semi-random material is probably safer; messing with the phase in a non-physically-real way can cause problems.  (Of course, there are phase shifts all over the real world, but a NON-ANALYTIC type of audio signal is what I am speaking of...  An analytic signal is closer to the real world, and is kind of a natural signal.  Some artificial signals violate some of the rules, and sometimes sound bad.)

My guess is that some of the games played by the ABBA recording people were meant to pack more energy into the signal for AM radio (making the music more dense), but on high quality equipment it doesn't sound as good as a simple sum.

Of course, if one does bias the input (by increasing the correlation between the sources to be added), then odd things can happen, and then using phase tricks might maintain more of each source -- buuuut -- the ears are not really used to hearing that kind of game.   The natural mix of real and quadrature parts of a signal becomes too violated, and it does give a disrupting effect to the sound.

So, if the phase management software is used to change the phase when it is important to do so (the sources are not fully independent), then it can be helpful.  But, such a device can likely be used for evil also :-).

John
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 14:23:54
@antony96 when adding samples: when you're adding two discrete time samples, you just add them like that.

Samples will either constructively or destructively interfere, that is perfectly normal and it happens in the real world all the time. You must then scale your envelope in case you want keep things from clipping, in case you want to ensure a non-clipped signal. Also keep in mind the type of signal we're talking here. Technically, you want your values to be in a manageable domain, like signed float, so they can be added beyond an imposed top-level. I.e. effectively using arbitrary precision, etc. When it comes to signal processing, it's much better (imo) to think of it purely as a mathematical construct. Whatever you want to then quantize to put it as PCM values into a file, is to be decided /after/ you're done with your signal processing as such. This is especially important when you're designing signal path circuits with active filters and suchlike.

In case you need a better mental image, make sure you're working on a signed, float basis, this actually represents a "natural" signal, unrelated to technical "technicalities" the closest.

In case you need a better representation of such signals, think of brightness levels in pictures, each line in an image is essentially a one-dimensional signal (assuming it's grayscale), etc.

If you wanna try things out, Octave comes with a bunch of pre-made function for discrete-time signal summation, and loads of other functions. So do some playing around with that, and you'll learn a lot by simply trial and error.

A nice beginner's cheat-sheet: https://en.wikibooks.org/wiki/Digital_Signal_Processing/Discrete_Operations

Not sure if it makes sense getting more technical, but if you are interested in LTI systems, and signal processing in general, it's a good idea to start with simply the nomenclature, learn what convolution is, and what it represents, etc.
Whether you want to go into Laplace and Fourier transform, I'd suggest getting to grips with the basics, first.

You might also wanna browse a bit around on: https://dsp.stackexchange.com/
Especially when it comes to the more mathematical aspects of signal processing, it's a good place to get help, etc.

You might also wanna check out university coursework material, like this one: http://web.eecs.umich.edu/~fessler/course/451/l/pdf/c1.pdf

Yes -- the very best thing is to be educated and to understand the matters in depth.  I was purposefully being a little too detailed, yet not detailed enough either.  When I saw the question and answer about summing the signals, it started conjuring up all of the various 'summation' issues that I have run into...  Probably the most important are the matters of clipping and/or truncation when dealing with fixed-point operations.  (I normally work in floating point, which gets rid of 90% of the problems -- but not 100%.  It is still good to know the various potential problems.)
Also, I brought in the spectre of phase because of the additional dynamic-range/gain issues that happen when doing the potentially problematic 'divide by N', while also noting that 'divide by sqrt(N)' isn't really an adequate solution either.
I was trying to short-circuit the idea (especially if the questioner ever writes much DSP code in the future) that the one and only answer to summing the signals is to 'divide by N'.

For a perfect example of the fact that 'divide by N' isn't always the best answer -- think of yourself sitting at an audio control board -- are all of the gains turned down every time a new source is added to the mix?  Answer: NO.  What is the reason?  When adding the signals, one isn't adding the peak levels of each signal, but rather the signals themselves, which have statistical and/or phase characteristics.  Simple amplitude values do NOT have phase; they are just numbers.  So in the real world, it isn't always correct to just 'divide by N'.  Divide by N is mostly a worst-case suggestion, assuming that the truncation and clipping issues are also correctly handled.

John
Title: Re: Audio Summing Algorithm
Post by: Nikaki on 2018-04-23 14:26:37
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 15:32:19
This software seems to be in the area that you are discussing:
https://www.soundradix.com/products/pi/
This was a link to software which is proprietary, requires registration and is apparently infested with DRM ("iLok").
I asked for samples only. This one looks more like some situational marketing.

However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Yeah that's what I said.
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 15:54:08
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Nope, the left and right speakers are separate (best-case scenario: headphones - near-perfect channel separation). This is why we can hear audio even if the speakers play signals that are the inverse of each other.
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 16:08:23
ziemek.z:
care to post a few samples with the sources and results of this mixing approach? (I presume you have some software implementation at hand)
I'm actually looking for software for FFT-based downmixing - I can't program it because I don't have enough skill... :/
I can't access a computer at the moment, so I'm posting a sample of BAD downmixing done by simply averaging the samples.
This is the most obvious and ugliest example of simple downmixing. It doesn't apply only to this track, though; whenever I turn on "Convert stereo to mono" in my music player, I hear that the mono audio is dull. I get the same feeling while listening to my radio - the bad stereo separation provided by my tiny "gear", combined with not-so-good signal reception, means that I'm practically listening to mono radio. It sounds as dull as the downmix in my music player. (Unfortunately, radio has to perform the downmix that way, that is L+R, not by FFT; otherwise it couldn't send stereo audio over the air.)
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 16:09:35
Here are the samples.
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 16:12:28
However, what if signals are out-of-phase?
In the worst-case scenario (two inverted signals, as in attached picture) waveforms cancel each other and after down-mixing you'll hear literally NOTHING!
This is true even if you don't mix but instead use two different sound sources. So actually not trying to avoid this gives you very high, real-life physics accuracy ;-)
Nope, left and right speaker are separate (best case scenario: headphones - near-perfect channel separation). This is why we can hear audio even if the speakers play signals being inverse of each other.

Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 16:16:10
Here are the samples.
A "good" example is missing.
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 16:22:07
Here are the samples.
A "good" example is missing.
I know. There are none :( I'm looking for software to do an FFT-based downmix.

Someone on HA mentioned FFT-based downmix earlier than me: https://hydrogenaud.io/index.php/topic,5747.0.html
I'm not alone!
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 16:30:08
Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
How does it relate to "smart" downmixing?
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 16:36:46
Here are the samples.
A "good" example is missing.
I know. There are none :( I'm looking for software to do FFT- based downmix.

Someone on HA mentioned FFT-based downmix earlier than me: https://hydrogenaud.io/index.php/topic,5747.0.html
I'm not alone!
So, you see, it's not really so nice to recommend an approach which you didn't even test yourself.

Speakers (and rooms) aren't perfect, and the impulse response of a speaker also depends on the position of the listener's ear in space.
If you add different and random EQ and reverb to each channel of each track which you are mixing, you'll surely also hear something even if they were to cancel each other otherwise, unless you are infinitely (un)lucky.
How does it relate to "smart" downmixing?
It counters the argument about speakers "always" leaving "something" even if the stuff you're playing is out of phase.
Speakers aren't perfect, and if you try to emulate these imperfections, the mix in software also won't cancel out everything.
You aren't gonna fool physics.
Title: Re: Audio Summing Algorithm
Post by: saratoga on 2018-04-23 17:47:28
The FFT does preserve phase.  Furthermore, it is linear, so (A+B) and IFFT(FFT(A)+FFT(B)) actually give exactly the same result:

Code: [Select]
>> A = [1 2 3]

A =

     1     2     3

>> B = [2 4 6]

B =

     2     4     6

>> A+B

ans =

     3     6     9

>> ifft(fft(A)+fft(B))

ans =

     3     6     9

Probably the above posts mentioning FFT really mean power spectrum, which would not preserve phase.  However, I do not recommend doing this except in the case where you are specifically trying to avoid beating between two signals (for example, different channels of the same recording). 
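To make the distinction concrete, a small Octave sketch (with assumed test signals): averaging the complex FFTs is linear and cancels inverted signals exactly, while averaging magnitude (power-style) spectra keeps the energy but discards the phase needed to rebuild a waveform.

Code: [Select]
t = (0:1023) / 44100;
a = sin(2*pi*440*t);
b = -a;                                          % perfectly inverted copy

linear_mix = real(ifft((fft(a) + fft(b)) / 2));  % identical to (a+b)/2
max(abs(linear_mix))                             % 0: complete cancellation

avg_mag = (abs(fft(a)) + abs(fft(b))) / 2;       % magnitude-only average
max(avg_mag)                                     % large: the tone's energy survives,
                                                 % but no phase is left to rebuild it with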
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 18:23:09
If I get up the energy to play with this stuff, I can probably put together a rough 'FFT' based summer (that is, not a true FFT sum, but power based) to see what forcing the phases to match would sound like.  I suspect the results would at least be grainy sounding (like some of the ABBA recordings, due to the abuse of the quadrature destroying the analytic nature of the signal), but more than likely it would render the sound unintelligible.

Actually, I have done something similar before -- by zeroing the phase of a given FFT (magnitudes the same, phase zeroed -- all the same), and the result was pretty much unintelligible -- it was possible to detect some of the material, but it didn't seem to be useful.  At the time, I was playing with an FFT-based compressor/expander design, and found that the results were generally not much better than an excellently designed 'linear gain control scheme with well managed attack/decay behavior.'  The FFT approach is certainly helpful for NR, but AFAIK there are better techniques using a complex MDCT, where the blips and bleeps are better managed.
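For anyone who wants to try the zero-phase experiment themselves, a rough Octave sketch (the input file name and block size are assumptions, the input is assumed to be mono, and a proper test would overlap-add the blocks):

Code: [Select]
% Per-block "zero the phase" experiment: keep the magnitudes, discard the phase.
[x, fs] = audioread('input.wav');      % hypothetical mono input file
N = 1024;                              % assumed block size
y = zeros(size(x));
for k = 1:N:length(x)-N+1
    X = fft(x(k:k+N-1));
    y(k:k+N-1) = real(ifft(abs(X)));   % magnitude kept, phase set to zero
end
audiowrite('zero_phase.wav', y / max(abs(y)), fs);   % normalize to avoid clipping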

The comment about a true FFT being a linear transform is true -- add the FFTs of two signals and the result will be the same as the FFT of the sum of the actual time-domain originals.

John
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 18:39:39
Probably the above posts mentioning FFT really mean power spectrum, which would not preserve phase.  However, I do not recommend doing this except in the case where you are specifically trying to avoid beating between two signals (for example, different channels of the same recording). 
THIS. :) Downmixing by simply adding the signals is useful when mixing uncorrelated signals. What I'm proposing is a smart downmix for two highly (anti)correlated signals, e.g. the channels of the same recording.
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 18:41:52
If I get up the energy to play with this stuff, I can probably put together a rough 'FFT' based summer (that is, not an FFT, but power based) to see what forcing the phases to match would sound like.  I suspect the results would at least be grainy sounding (like some of the ABBA recordings due to the abuse of the quadrature/destroying the analytical nature of the signal), but more than likely renders the sound unintelligible.

Actually, I have done something similar before -- by zeroing the phase of a given FFT (magnitudes the same, phase is zeroed -- all the same), and the result was pretty much unintelligble-- it was possible to detect some of the material, but didn't seem to be useful.
Can you send us some samples, please? And why does it happen?
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 18:48:25
The comment about a true FFT being a linear transform is true -- add an FFT (each) of two signals and the result will be the same as an FFT of the sum of the actual time domain originals.

WHAAAAAT?
Let's assume we have a signal and an inverted copy of it. If we FFT both, we get the same freqs and amps, but different phases (which doesn't matter while listening). However, if we sum the anticorrelated signals, we get NOTHING. LITERALLY NOTHING. THE FFT OF NOTHING IS NOTHING.
How can the average of the FFTs of something be equal to the FFT of nothing?
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 18:56:09
The comment about a true FFT being a linear transform is true -- add an FFT (each) of two signals and the result will be the same as an FFT of the sum of the actual time domain originals.

WHAAAAAT?
Let's assume we have a signal and an inverse of it. If we FFT both, we get same freqs and amps, but another phases (which doesn't matter while listening). However if we sum anticorrelated signals, we get NOTHING. LITERALLY NOTHING. FFT OF NOTHING IS NOTHING.
How can be average of FFTs of something be equal to FFT of nothing?
Very easily. If you do it by the definition -- that is, not ignoring the phase information.
Similar to how an average of 2 numbers may be zero.
Title: Re: Audio Summing Algorithm
Post by: saratoga on 2018-04-23 19:02:02
The comment about a true FFT being a linear transform is true -- add an FFT (each) of two signals and the result will be the same as an FFT of the sum of the actual time domain originals.

WHAAAAAT?

https://en.wikipedia.org/wiki/Fourier_transform#Basic_properties

How can be average of FFTs of something be equal to FFT of nothing?

The sum of the FFTs of two waveforms that sum to zero also sums to zero.  That is what linearity means.
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 19:06:21
OK... but how does it happen?
FFT of zero: FREQ 0, AMPL 0, PHASE 0.
Average of FFTs: FREQ freq, AMPL ampl, PHASE (phase1 + phase2) / 2
...is that right?
Title: Re: Audio Summing Algorithm
Post by: saratoga on 2018-04-23 19:17:13
OK... but how does it happen?

If two signals are 180 degrees out of  phase, then each element is the negative of the other.  If you sum (X + -X) you get zero.

FFT of zero: FREQ 0, AMPL 0, PHASE 0.

The FFT only returns one complex number per input sample, and that number represents both the amplitude and the phase.

Average of FFTs: FREQ freq, AMPL ampl, PHASE (phase1 + phase2) / 2

Addition and subtraction of complex values works differently than you are assuming:

https://en.wikipedia.org/wiki/Complex_number#Addition_and_subtraction
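A tiny Octave illustration of that point (with assumed values): two bins with equal magnitude but opposite phase sum to zero, rather than keeping their magnitude.

Code: [Select]
x = exp(1i*0);        % magnitude 1, phase 0
y = exp(1i*pi);       % magnitude 1, phase 180 degrees
x + y                 % ~0 (up to rounding), not a magnitude-1 value
abs(x) + abs(y)       % 2: what you would get by (incorrectly) adding magnitudes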
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-23 19:32:56
So if I FFT'd both channels of the signal, averaged coefficients and IFFT'd them, would it do the same as simply averaging both signals? :o
Title: Re: Audio Summing Algorithm
Post by: magicgoose on 2018-04-23 19:40:55
If you do it correctly and if we ignore loss of precision (fixed size floating point is still fixed size floating point), then yes, it would do the same.
Title: Re: Audio Summing Algorithm
Post by: saratoga on 2018-04-23 19:48:25
So if I FFT'd both channels of the signal, averaged coefficients and IFFT'd them, would it do the same as simply averaging both signals? :o

Yes, that is exactly what I was saying above.
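A quick numerical check of this in Octave (with arbitrary random channels):

Code: [Select]
L = rand(1, 1024) - 0.5;
R = rand(1, 1024) - 0.5;
direct  = (L + R) / 2;                       % simple averaging in the time domain
via_fft = real(ifft((fft(L) + fft(R)) / 2)); % average the complex FFTs, then invert
max(abs(direct - via_fft))                   % ~1e-16, i.e. identical up to rounding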
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-23 21:01:22
If I get up the energy to play with this stuff, I can probably put together a rough 'FFT' based summer (that is, not an FFT, but power based) to see what forcing the phases to match would sound like.  I suspect the results would at least be grainy sounding (like some of the ABBA recordings due to the abuse of the quadrature/destroying the analytical nature of the signal), but more than likely renders the sound unintelligible.

Actually, I have done something similar before -- by zeroing the phase of a given FFT (magnitudes the same, phase is zeroed -- all the same), and the result was pretty much unintelligble-- it was possible to detect some of the material, but didn't seem to be useful.
Can you send us some samples, please? And why does it happen?
I have massive archives from when I played with that stuff (about 2013 or so), and will look for something to demo to you.  Like I wrote earlier, there was little benefit, even for the compressor/expander thing, to working in the Fourier domain (more trouble than it was worth, even though I wanted to work in multiple bands -- it seems like the transform would give you lots of bands and that seems like it would be good -- but it wasn't all that helpful).  I could definitely compress the hell out of the signal, but I didn't like the sound of a 512-band compressor. :-)  It seems like 6-8 bands is already into diminishing returns, and it seems helpful to try to keep the 500-2500 Hz range in one band by itself -- there are various reasons for that.

John
Title: Re: Audio Summing Algorithm
Post by: polemon on 2018-04-24 00:59:32
I believe @ziemek.z's misunderstanding of the FFT (i.e. of the fact that the FFT returns a complex-valued function from a real-valued input) comes from how audio editors often display FFTs. Many - including Audacity - display it as a sequence of buckets, where the frequencies of each bucket are painted in a voiceprint. The tools often quietly ignore the imaginary part of the resulting function and simply paint the magnitude of the complex value, ignoring the argument, which in this case is the phase.
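In Octave terms (a sketch with an arbitrary test tone), the split between what such displays paint and what the FFT actually returns is just:

Code: [Select]
X = fft(sin(2*pi*440*(0:1023)/44100));
magnitude = abs(X);      % what a spectrum display typically paints
phase     = angle(X);    % also returned by the FFT, but rarely displayed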
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-24 01:18:20
I believe @ziemek.z misunderstanding of FFT (i.e. the fact that FFT returns a complex function from a real variable) comes from how audio editors often display FFTs. Many - including Audacity - display it as a sequence of buckets, where frequencies of each bucket are painted in a voiceprint. The tools often quitely ignore the Imaginary part of the resulting function, and simply paint the magnitude of the complex value, and ignore the argument, which in that case is phase.
Okay -- that makes sense.  From what I know (and from my previous practical experience), the phase is all important.  When I wrote my prototype compressor/expander, the math operations were done on complex numbers.  Using magnitudes only was just a waste of time and just produced garbage.

In my recent work on compressors, expanders, Aphex Exciter removers, DolbyA decoders, etc. -- when trying to clean up the sound of some old recordings -- I found that they were sometimes screwing with the phase in bad ways and summing/subtracting phase-shifted versions of the signal to/from itself -- the results might have given an effect of making the middle freqs more intense (in the case of the parameters used in certain devices), but with the super-high-quality equipment of today the ugliness becomes more apparent.  At least they weren't zeroing the phase, but even playing with the quadrature (the stuff 90deg out of phase) should not be done lightly.  Even though I didn't like the results -- they weren't going too crazy with the phase -- it is just that phase can mess things up similarly to a messed-up freq response.  Doing some phase things (fancy for the time when 4ch matrix was common) is part of how matrix quad could work reasonably well.  IMO, if possible, unless done absolutely carefully -- it is not a good idea to play with the phase of any aspect of the signal.  There are right ways/good reasons for doing it -- but if questions need to be asked on forums like this, then it is a good idea to avoid doing it for now :-).

John
Title: Re: Audio Summing Algorithm
Post by: polemon on 2018-04-24 06:06:21
I believe @ziemek.z misunderstanding of FFT (i.e. the fact that FFT returns a complex function from a real variable) comes from how audio editors often display FFTs. Many - including Audacity - display it as a sequence of buckets, where frequencies of each bucket are painted in a voiceprint. The tools often quitely ignore the Imaginary part of the resulting function, and simply paint the magnitude of the complex value, and ignore the argument, which in that case is phase.
Okay -- that makes sense.  From what I know (and from my previous practical experience), the phase is all important.  When I wrote my prototype compressor/expander, the math operations were done on complex numbers.  Using magnitudes only was just a waste of time and just produced garbage.
Depends. If you have a signal that is even (symmetric about the center) in your bucket, the imaginary part of the FFT of that signal is zero. The magnitude is then just the (absolute value of the) real part.
In cases where a voiceprint is displayed (I'm not sure "voiceprint" is the correct term; I've seen it used a bunch of times, though), it seems it's /just/ being used for display purposes; mixing anything while using that display option doesn't mean the actual FFT of the signal is used, and in my experience it almost never is.

If your sample bucket is small enough to be manageable, you can take the samples beyond the center and map them into the negative part by duplicating them symmetrically, and then run the FFT on that. The imaginary part of the transform of that bucket will be zero (or, rather, equal to the zero function). Using this for anything but display purposes isn't really sensible, though.
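A quick Octave check of the symmetry that actually makes the transform purely real (the DFT-even condition, x(n) == x(N-n) with circular indexing; the block here is made up):

Code: [Select]
% An even (DFT-symmetric) block has a purely real FFT.
v = randn(1, 5);
even_block = [v, fliplr(v(2:end-1))];   % [v1 v2 v3 v4 v5 v4 v3 v2]
X = fft(even_block);
max(abs(imag(X)))                       % ~1e-15: the imaginary part vanishes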
Title: Re: Audio Summing Algorithm
Post by: ziemek.z on 2018-04-24 06:19:14
Thank you for clarifying some very important things before I even started. You saved me from doing useless stuff and completely wasting my time! I think I should learn more about the FFT and DSP in general...
Title: Re: Audio Summing Algorithm
Post by: jsdyson on 2018-04-24 06:23:58
I believe @ziemek.z misunderstanding of FFT (i.e. the fact that FFT returns a complex function from a real variable) comes from how audio editors often display FFTs. Many - including Audacity - display it as a sequence of buckets, where frequencies of each bucket are painted in a voiceprint. The tools often quitely ignore the Imaginary part of the resulting function, and simply paint the magnitude of the complex value, and ignore the argument, which in that case is phase.
Okay -- that makes sense.  From what I know (and from my previous practical experience), the phase is all important.  When I wrote my prototype compressor/expander, the math operations were done on complex numbers.  Using magnitudes only was just a waste of time and just produced garbage.
Depends. If you have a signal that is even (symmetric at center) in your bucket, the Imaginary part of your FFT of that signal is zero. The magnitude is then only the Real part.
In cases where a voiceprint is displayed (I'm not sure "voiceprint" is the correct term, I've seen it a bunch of times, though), it seems it's /just/ being used for display purposes, mixing anything even when using that display option, doesn't mean the actual FFT of that signal is used, and it seems indeed it almost never (in my experience) is.

If your sample bucket is small enough such that it is manageable, you can move all samples beyond the center, map them into the negative part by duplicating them symmetrically, and then run the FFT on that. The Imaginary part of the signal in that bucket will be zero (or equal to the zero function rather). Using this for anything but display purposes isn't really sensical, though.
Okay -- sorry that I wasn't clear -- not properly handling the phase produced garbage *AUDIO*.  I wasn't really looking at using the FFT for any numerical purpose at that time.  So with audio, I didn't really have much of an opportunity to massage the signal as much as you suggested.

I will tell you one thing, though -- I think that my next venture into time/frequency-type transforms will more likely be a complex MDCT instead.  I know that you don't get magic for free -- but when people have had the patience to be trailblazers using the complex MDCT, they have sometimes gotten good results.  I know that the normal MDCT (or a variation) is often used in audio compression, but the complex MDCT supports more flexibility for signal-processing purposes.
The FFT domain is 'old hat' nowadays -- I was drawing FFT and even DCT-type butterfly diagrams for undergraduate papers in EE in 1979.  I truly don't remember everything, but over the last year or so I have been reading some of the books that I collected over the years on the subject of DCTs, lapped transforms, etc. (the black DCT book, Malvar's book, and every document that I can find online).  There aren't a lot of papers on the complex MDCT, but it does seem very interesting.
So much of the really cool stuff requires some concentration and a long startup time (a few weeks), but I am having so much fun with my easy stuff (DolbyA, a REALLY GOOD expander/NR project, and a pretty good compressor package).
I truly hope that I get a chance to work on some technology that is new (for me) before I mentally slow down -- I know that I have at least 1-2 more years to go on the mostly time-domain compressor/expander/NR projects, so I hope that my health holds up!!!

John
Title: Re: Audio Summing Algorithm
Post by: kode54 on 2018-05-13 00:39:21
Very interesting discussion spawned by a spambot.

(It was a verbatim repost of this 2010 mailing list post (https://lists.apple.com/archives/coreaudio-api/2010/Mar/msg00094.html) - thanks to the reporter, who may choose to name themselves in a reply.)