Topic: Merge Channels and keeping volume? (Sox or ffmpeg)

Merge Channels and keeping volume? (Sox or ffmpeg)

Hi,

I am trying to understand why merging two mono files into another mono file reduces the volume by about half.
Is there a reason for this?

Reply #1
That is done to prevent overflow. If you divide both signals by 2 then it is not possible to exceed full scale when you add them.
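The overflow risk is easy to see with integer samples. A minimal sketch in plain shell arithmetic (the sample values are made up for illustration):

```shell
# Two 16-bit samples near full scale (valid range -32768..32767).
a=30000
b=25000

# Direct summation exceeds the 16-bit range and would clip.
sum=$(( a + b ))
echo "$sum"        # 55000 -- past the 32767 ceiling

# Halving each signal before summing keeps the result in range.
safe=$(( a / 2 + b / 2 ))
echo "$safe"       # 27500 -- fits comfortably
```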

Reply #2
Quote
I am trying to understand why merging two mono files into another mono file reduces the volume by about half.
Is there a reason for this?
I don't know exactly what SoX or FFmpeg are doing, but that shouldn't happen. The volume should essentially be an average of the two (at least where the sounds happen at the same time). For example, if you mix Thriller and Beat It together (mono or stereo) and you reduce the volume of each by 6dB before mixing, the mixed file should have about the same volume as either song by itself.

You can also normalize after mixing to "maximize" the volume.

As pdq says, mixing is done by summation.  Obviously that could potentially result in clipping, so it's a good idea to divide by two (reduce by 6dB) before mixing (or it can be done after mixing if processing is done in floating-point).   

If you are using an audio editor or DAW, you can individually adjust the volumes of the tracks before mixing (and after mixing), and that's totally under your control.  Sometimes these applications have "automation" which allows you to adjust the volume of the individual tracks from moment-to-moment while mixing.

Reply #3
Hmm, I guess I have to use normalization after they are mixed.

Not sure if that's possible without two passes, which makes things a bit complicated, but I will look at it. :)

Reply #4
Quote
Hmm, I guess I have to use normalization after they are mixed. [...]


Hmm, I tried normalization; for some reason the volume kind of stays the same.

No idea why; I mean, the volume of the mixed file is clearly different from the unmixed ones.

Also, what settings would you suggest for mixing?
The sources are 16-bit, 16 kHz.

Reply #5
Reducing the volume by 3dB on each (half the power) should keep the overall power the same, assuming they both have continuous sounds rather than, say, speech where two people are taking turns to speak, in which case don't reduce the volumes.

However, when you do this, the peaks will almost certainly sum to more than full scale at times, clipping peaks of up to about 1.41 back to 1.00.

SoX, by default on many such processes, will prevent clipping by reducing the volume. What you probably really want, to maintain loudness, is to add them with -3dB gain in floating point and then use a look-ahead limiter to tame the peaks with soft clipping.
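A rough sketch of the first step with SoX's mix mode. The -3dB figure becomes a linear gain factor of about 0.708; `a.wav` and `b.wav` are placeholder file names, so the actual sox call is guarded and only runs when the tool and files exist:

```shell
# -3 dB as a linear gain factor: 10^(-3/20) ~= 0.708
gain=$(awk 'BEGIN { printf "%.3f", 10 ^ (-3 / 20) }')
echo "$gain"

# Mix the two mono inputs by summation (-m), attenuating each input
# by -3 dB first via the per-input -v volume option. a.wav and b.wav
# are placeholders; the call is skipped when they are not present.
if command -v sox >/dev/null && [ -f a.wav ] && [ -f b.wav ]; then
  sox -m -v "$gain" a.wav -v "$gain" b.wav mixed.wav
fi
```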

When I'm adding waveforms, I usually use Audacity and open both in the same project. Each track has a gain slider, which I would set to -3dB for this kind of job; after Mixing and Rendering to a single track I would run a Fast Lookahead Limiter (an optional add-in) to tame any peaks. Alternatively, reduce the mix by 6dB to prevent clipping, export to a 24-bit lossless format (e.g. FLAC), then use foobar2000's Advanced Limiter, after applying a +3dB fixed pre-amp gain under the ReplayGain menu, to do the same when converting to a 16-bit lossless format.
Dynamic – the artist formerly known as DickD

Reply #6
Quote
Hmm, tried normalization, for some reason the volume kinda stays the same.
Normalization uses the peak as its reference, so it depends on how the files "mix" to create a new peak.

...Imagine you again take Thriller and Beat It, but this time you do a "DJ-style crossfade" where the end of one song overlaps the beginning of the other, and you mix without applying any fade-in or fade-out.* Where the two files overlap you'll get "twice the volume" (twice the peak level). If you then normalize or otherwise adjust to avoid clipping, the volume of both songs will be lower where they are not mixed.

Then, there's human perception... Loudness does not correlate well with peak level (so normalizing two files won't necessarily make them sound equally loud). Or, if you mix narration over music, the result may sound quieter (or louder) than either file alone (after normalization).

If the mixed file is not loud enough after normalization, you can use dynamic compression or limiting to boost the average level without boosting or clipping the peaks. 





* I don't remember how those songs "naturally" fade in or fade out, so with these particular songs you might not get twice the peak where they overlap.

Reply #7
The purpose of the attenuation is to prevent clipping. In the worst case, if both channels are identical, the mixdown will be 6dB louder. In the other extreme case, if the channels are identical but one of them is inverted, the mixdown will be silence because they null out. In real life the resulting peak can be anywhere between silence and +6dB.
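The +6dB worst case is just arithmetic: identical signals double in amplitude when summed, and doubling amplitude is 20·log10(2) dB. A quick check with awk:

```shell
# Doubling amplitude corresponds to 20 * log10(2) ~= 6.02 dB.
# (awk's log() is natural log, so divide by log(10) to get log10.)
db=$(awk 'BEGIN { printf "%.2f", 20 * log(2) / log(10) }')
echo "$db"
```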

If you are using a DAW (audio suites like Cubase, Reaper, Sonar, etc.) you can prevent the attenuation by setting the "pan law" to 0dB. To completely avoid any possible clipping, set it to -6dB.

Reply #8
Hmm, this is more complex than I thought.

Because my assumption was that volume, and the perception of it, is the same thing as the peak level.

So if you had two quiet sounds and merged them, it would still follow that principle.

I can see this isn't the case, as I do get clipping if I increase the volume, which I do not want; I am not that desperate.

The files in question are actually Skype recordings, meaning both will have a severely "leveled" volume, because, you know, VoIP etc.

I am a bit confused how floating point is supposed to help, though;
I was under the impression that floating point is only good for rounding errors, which I do understand is good when merging files.
Other than that it can represent extremely low volumes, but I didn't think it had any impact on high volumes; I thought all bit depths were the same there?

And by the way, in SoX when you merge files, how can you do it in floating point for quality? Is it default that it upsamples and then downsamples the output files back, or?

Thanks

Reply #9
The concept of floating point is simple. If we represent digital audio in dB, we describe the loudest legitimate level as 0dBFS and silence as -infinity dB. In floating-point representation, 0dBFS is +/-1.0 and -infinity dB is 0.0. In an audio editor like Audacity, the +/- sign determines whether the waveform is above or below the center line (0.0). In 16-bit fixed point, 2^16 = 65536 levels; in signed representation the range is -32768 to +32767, and 0 means silence.

Now, let's convert a 32-bit float value of +0.5 to 16-bit fixed point: it is 0.5 * 32768 = 16384. How about a float value of -1.5? That would be -1.5 * 32768 = -49152.

A 32-bit float value has about 7 significant digits of accuracy, which means values like 987654.3 and 0.000009876543 can be stored essentially losslessly. If the value is 9876543987 it will be rounded to something like 9876544000. A fixed-point (integer) signed 16-bit value, on the other hand, simply cannot exceed the range -32768 to +32767.

So you can see the problem: a float value of -1.5 cannot be correctly converted to -49152 because that is out of range. It will be limited to -32768, resulting in clipping. To avoid this, just normalize the values to within +/-1.0 before converting to fixed point; that is why floating-point processing can avoid clipping.

In the other case, if the value is 0.000009876543 in 32-bit float, then since 0.000009876543 * 32768 = 0.323634561024, it will be rounded to 0 in 16-bit fixed point. Since anything multiplied by 0 equals 0, the value is practically lost once converted to fixed point. To avoid this, simply normalize such extremely small values to larger ones before converting to fixed point.
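These three conversions can be checked numerically. A sketch with awk; the clamp in the second case mimics what a converter does with an out-of-range value:

```shell
# In-range float sample: 0.5 * 32768 = 16384, converts cleanly.
awk 'BEGIN { print 0.5 * 32768 }'

# Out-of-range float sample: -1.5 * 32768 = -49152, which a 16-bit
# converter must clamp to -32768 -- that is the clipping.
awk 'BEGIN { v = -1.5 * 32768; if (v < -32768) v = -32768; print v }'

# Tiny float sample: 0.000009876543 * 32768 ~= 0.32, which rounds
# to 0 in 16-bit -- the value is effectively lost.
awk 'BEGIN { printf "%.0f\n", 0.000009876543 * 32768 }'
```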

Therefore the advantage of floating-point processing is that it can represent very large and very small values without clipping or losing precision. That means that during floating-point processing you don't need to worry about trading volume against quality; you only need to care when converting back to fixed point.

My explanation of floating point is quite simplified; for the details, please refer to programming textbooks, tutorials or Wikipedia.

PS: "upsample" and "downsample" describe sample-rate conversion. For bit-depth conversion we say pad (increase), truncate/round (decrease), or simply convert.

Reply #10
Quote
The concept of floating point is simple. [...]


Ah, okay, I get it now.
I was simply looking at the "1.0" thing, and didn't realize that you could just store the information as pure data rather than "audio" while processing.

And thanks for the clarification; I knew sampling was the wrong word but didn't know what the right one was.
Padding sounds to me like you add bit depth that doesn't have any real data in it, though, just increasing the size without reason. Rounding/truncating makes sense, though.

Reply #11
Quote
Padding to me sounds like you add bitdepth that hasn't any real data in it though, just increasing the size without reason.


Padding does not make any practical change to audio quality. Reasons for doing it are to prepare headroom for further processing, to mix files with different bit depths (such as 8 and 16), or for compatibility. Some devices only support 16- and 32-bit natively; to play back a 24-bit file they need to pad it to 32-bit on the fly during playback. Note that 32-bit can be fixed or floating point. Fixed-point 32-bit means 2^32 = 4294967296 levels, -2147483648 to +2147483647 signed; while it has higher accuracy, its quality is still volume-dependent.

Reply #12
Quote
Padding does not make any practical change to audio quality. [...]


Ah, that's true; that explains the word "padding".

Well, in most cases it doesn't matter, and one should use floating point, right, because it has much more headroom?

Also, how can I get SoX or FFmpeg to convert to floating point when merging? I can't find any information about that.


Reply #14
Quote
Don't know about ffmpeg, but SoX doesn't fully support floating point.
https://www.hydrogenaud.io/forums/index.php...amp;mode=linear


Oh, that's weird; I expected such software to support it. I guess there are reasons for it.

I can't really find information regarding FFmpeg, but the way you mix with it seems kind of "hackish", not straightforward.
I would assume that it uses float, though, but that's just a guess.

Reply #15
As some people replied in the thread I posted, SoX operates on 32-bit integers internally. It can read float files, but the range is limited to +/-1.0; therefore, if your input files are not in float format they can't be out of range and won't be clipped by SoX at the input stage. And since SoX has the -G option (clip guard), it will automatically reduce the level during processing to avoid clipping. 32-bit integer has more than 192dB of dynamic range, so wasting a few dB to guard against clipping will still outperform the best DACs in the world.
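Both claims are easy to check. The 192dB figure is 20·log10(2^32), and the clip guard is the global -G option; in the sketch below `a.wav`/`b.wav` are placeholder names, so the sox call only runs when the tool and files are actually present:

```shell
# Dynamic range of 32-bit integer audio: 20 * log10(2^32) ~= 192.7 dB
dr=$(awk 'BEGIN { printf "%.1f", 20 * log(2 ^ 32) / log(10) }')
echo "$dr"

# A guarded mix: -G enables SoX's clip guard, which lowers the gain
# automatically instead of clipping. a.wav/b.wav are placeholders.
if command -v sox >/dev/null && [ -f a.wav ] && [ -f b.wav ]; then
  sox -G -m a.wav b.wav mixed.wav
fi
```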

Reply #16
FFmpeg supports unsigned 8-bit, signed 16-bit, signed 32-bit (what SoX uses internally), float and double sample formats.
You can mix 2 mono channels into 1 channel with the pan or amix filter; see the documentation for more info.
You can also merge 2 mono channels into a stereo channel layout with the amerge filter.
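A sketch of both approaches; the input names `a.wav`/`b.wav` are placeholders, so the ffmpeg invocations are guarded and only run when the tool and files exist:

```shell
# Filter graphs for the two approaches.
mix_filter='amix=inputs=2'                 # sum inputs to one mono stream
merge_filter='[0:a][1:a]amerge=inputs=2'   # keep inputs as stereo L/R

# a.wav and b.wav are placeholder names.
if command -v ffmpeg >/dev/null && [ -f a.wav ] && [ -f b.wav ]; then
  ffmpeg -i a.wav -i b.wav -filter_complex "$mix_filter" mono_mix.wav
  ffmpeg -i a.wav -i b.wav -filter_complex "$merge_filter" stereo.wav
fi
echo "$mix_filter"
```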

Reply #17
Quote
FFmpeg supports unsigned 8 bit, signed 16 bit, signed 32 bit (what SoX uses internally), float and double sample format. [...]


I used amix, I think, but I couldn't find info on what it uses internally when merging.

Reply #18
Quote
I used amix i think, but i couldn't find info what it was used internally when merging them.


amix internally uses float. The pan filter (which can use any sample format) has a more complicated syntax, with which you can fully control how the mixing is done.
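A sketch of the explicit-control route with pan: merge the two mono inputs, then scale each channel by 0.5 (the -6dB safety attenuation discussed earlier) and sum them into one mono output. As before, `a.wav`/`b.wav` are placeholder names and the invocation is guarded:

```shell
# Explicit mono downmix: amerge joins the two mono streams, then pan
# computes c0 = 0.5*c0 + 0.5*c1, i.e. each input attenuated by half
# before summation, so the result cannot exceed full scale.
pan_filter='[0:a][1:a]amerge=inputs=2,pan=mono|c0=0.5*c0+0.5*c1'
echo "$pan_filter"

# a.wav and b.wav are placeholder names.
if command -v ffmpeg >/dev/null && [ -f a.wav ] && [ -f b.wav ]; then
  ffmpeg -i a.wav -i b.wav -filter_complex "$pan_filter" mixed.wav
fi
```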