Difference between Downsample to Mono and averaging samples

Difference between Downsample to Mono and averaging samples

2023-12-13 11:49:01

My question is maybe not strictly related to foobar2000 but it may be implementation-based:

I'm investigating the behavior of foo_vis_spectrum_analyzer. Given a track containing stereo samples of a pure 1 kHz sine signal, I get a different spectrum in the following cases:

- Mathematically averaging the left and right channels: the resulting spectrum is suspicious
- Analyzing either the left or the right channel: the resulting spectrum is suspicious
- Using the downsample to mono DSP: the resulting spectrum seems correct and correlates with other analyzers.

Which leads me to assume the code is correct but the data I feed it is not.

Any explanations about what the DSP does differently to get a correct result?

Re: Difference between Downsample to Mono and averaging samples

Reply #1 – 2023-12-13 12:46:17

Perhaps your averaging test had a bug. The built-in downmixer seems to just do (L+R)/2.

Re: Difference between Downsample to Mono and averaging samples

Reply #2 – 2023-12-16 05:47:55

Quote from: Case on 2023-12-13 12:46:17

Perhaps your averaging test had a bug. The built-in downmixer seems to just do (L+R)/2.

I probably agree with you. @pqyt, to get the right data you have to account for the fact that the samples returned by the audio chunk's get_data() function, which are gathered by get_chunk_absolute(), are stored in interleaved form. The get_data() array therefore holds the requested number of samples times the number of channels, and you have to deinterleave it properly before averaging. Since foo_vis_spectrum_analyzer has a channels option, only the channels of the interleaved get_data() array that match the ticked options should be used for the average.

Note that get_chunk_absolute() returns waveform data from before the FFT, and that the time offset of the samples it gathers differs from that of get_spectrum_absolute() (again, before its FFT). With the first, the last sample is ahead of actual playback by an amount proportional to the length of the gathered chunk. With the second, the first sample is behind actual playback by an amount proportional to half the FFT size, and the last sample is ahead of it by the same amount.
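To make the deinterleaving step concrete, here is a minimal sketch. It is written in Python rather than the component's C++, it works on a plain list instead of the SDK's audio_chunk, and the helper names deinterleave and average_channels are hypothetical, so treat it as an illustration of the idea and not as the component's code.

```python
def deinterleave(interleaved, channel_count):
    """Split [L0, R0, L1, R1, ...] into one list per channel."""
    if channel_count <= 0 or len(interleaved) % channel_count != 0:
        raise ValueError("buffer length must be a multiple of channel_count")
    # Channel ch occupies every channel_count-th sample, starting at index ch.
    return [interleaved[ch::channel_count] for ch in range(channel_count)]


def average_channels(interleaved, channel_count, selected):
    """Average the selected channels frame by frame into one mono list."""
    channels = deinterleave(interleaved, channel_count)
    picked = [channels[ch] for ch in selected]
    if not picked:
        raise ValueError("select at least one channel")
    return [sum(frame) / len(picked) for frame in zip(*picked)]


# Three stereo frames: (1, 3), (2, 4), (-1, 1)
stereo = [1.0, 3.0, 2.0, 4.0, -1.0, 1.0]
mono = average_channels(stereo, 2, [0, 1])   # (L+R)/2 for each frame
left_only = average_channels(stereo, 2, [0])
```

With the made-up buffer above, mono is [2.0, 3.0, 0.0] and left_only is [1.0, 2.0, -1.0]. Feeding the raw interleaved buffer to the FFT as if it were mono would instead mix left and right samples into one sequence.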

BTW, the component mentioned above might not properly account for changes in channel configuration or sample rate (e.g. applying mono-to-stereo or other channel-configuration-changing DSPs, or sample-rate-changing DSPs such as resamplers) while it is already active.

Also, unless you need complex input (which is necessary for visualizing I/Q signals, though I don't know how useful it is for audio analysis; maybe a feature could be added to treat stereo pairs as complex-valued FFT input), the FFT input data should be real-valued. The FFT output is still complex-valued, which is required for proper sinc/Lanczos interpolation.
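As a small illustration of the real-versus-complex point, here is a sketch in Python using a naive DFT (not the Nayuki or FFTW code). The sizes and the tone placement are made up for the example, and naive_dft is a hypothetical helper. A real-valued input gives a conjugate-symmetric spectrum with a mirrored peak, while packing a stereo pair as real and imaginary parts gives a spectrum that is not symmetric in general.

```python
import cmath
import math


def naive_dft(samples):
    """Plain O(n^2) DFT, good enough for a small demonstration."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


N = 64    # assumed transform size
BIN = 5   # assumed: the test tone sits exactly on bin 5

left = [math.cos(2 * math.pi * BIN * i / N) for i in range(N)]
right = [math.sin(2 * math.pi * BIN * i / N) for i in range(N)]

# Real-valued input: peaks at BIN and at the mirrored bin N - BIN.
real_spectrum = naive_dft(left)

# Stereo pair packed as one complex signal (left = real part, right = imaginary part):
# cos + j*sin is a single complex exponential, so only bin BIN is excited.
complex_spectrum = naive_dft([complex(l, r) for l, r in zip(left, right)])
```

Here the real-input spectrum has two peaks of magnitude N/2, at bins 5 and 59, while the packed stereo pair has a single peak of magnitude N at bin 5.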

Re: Difference between Downsample to Mono and averaging samples

Reply #3 – 2023-12-17 09:07:57

Quote from: Case on 2023-12-13 12:46:17

Perhaps your averaging test had a bug. The built-in downmixer seems to just do (L+R)/2.

I don't exclude any option but that does not answer the question why a mono left or right channel shows the same, odd spectrum with no averaging being done.

I did some further experimenting and up to 1024 FFT bins the complex FFT shows a nice, clean peak, even for the stereo sample. As the FFT size gets near the size of the audio chunk the output changes. Surely a bug but where?

Edit: Even a mono signal shows a wonky spectrum when the FFT size is larger than 1024. Swapping the Nayuki FFT implementation for FFTW gives the same behavior.

Re: Difference between Downsample to Mono and averaging samples

Reply #4 – 2023-12-17 09:20:30

Quote from: TF3RDL on 2023-12-16 05:47:55

Quote from: Case on 2023-12-13 12:46:17
Perhaps your averaging test had a bug. The built-in downmixer seems to just do (L+R)/2.
I probably agree with you. @pqyt, to get the right data you have to account for the fact that the samples returned by the audio chunk's get_data() function, which are gathered by get_chunk_absolute(), are stored in interleaved form. The get_data() array therefore holds the requested number of samples times the number of channels, and you have to deinterleave it properly before averaging. Since foo_vis_spectrum_analyzer has a channels option, only the channels of the interleaved get_data() array that match the ticked options should be used for the average.

Note that get_chunk_absolute() returns waveform data from before the FFT, and that the time offset of the samples it gathers differs from that of get_spectrum_absolute() (again, before its FFT). With the first, the last sample is ahead of actual playback by an amount proportional to the length of the gathered chunk. With the second, the first sample is behind actual playback by an amount proportional to half the FFT size, and the last sample is ahead of it by the same amount.

BTW, the component mentioned above might not properly account for changes in channel configuration or sample rate (e.g. applying mono-to-stereo or other channel-configuration-changing DSPs, or sample-rate-changing DSPs such as resamplers) while it is already active.

Also, unless you need complex input (which is necessary for visualizing I/Q signals, though I don't know how useful it is for audio analysis; maybe a feature could be added to treat stereo pairs as complex-valued FFT input), the FFT input data should be real-valued. The FFT output is still complex-valued, which is required for proper sinc/Lanczos interpolation.

It's probably me but I do not understand the point of your message: What part is explanation of DSP theory? What is observation of the component behavior? What part is conjecture about a possible bug?

Re: Difference between Downsample to Mono and averaging samples

Reply #5 – 2023-12-18 08:15:37

Quote from: pqyt on 2023-12-17 09:07:57

Quote from: Case on 2023-12-13 12:46:17
Perhaps your averaging test had a bug. The built-in downmixer seems to just do (L+R)/2.
I don't exclude any option but that does not answer the question why a mono left or right channel shows the same, odd spectrum with no averaging being done.

I did some further experimenting and up to 1024 FFT bins the complex FFT shows a nice, clean peak, even for the stereo sample. As the FFT size gets near the size of the audio chunk the output changes. Surely a bug but where?

Edit: Even a mono signal shows a wonky spectrum when the FFT size is larger than 1024. Swapping the Nayuki FFT implementation for FFTW gives the same behavior.

Since get_chunk_absolute(audio_chunk, time, requested_length) can fetch audio data of a specified length at a specified time offset, I don't think a ring buffer is needed just for the FFT, right? Any discontinuity resulting from improper or unnecessary use of a ring buffer would give you a wonky spectrum even on a mono signal.
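To show what such a discontinuity does to a pure tone, here is a minimal sketch in Python with a naive DFT. It does not claim that this is the actual bug in the component. The buffer size, the tone bin, and the splice offset are assumptions picked for the demonstration, and the helper names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..n/2 of a real-valued signal (naive O(n^2) DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def leakage(samples, peak_bin):
    """Fraction of the spectral energy that falls outside peak_bin."""
    power = [m * m for m in magnitude_spectrum(samples)]
    return 1.0 - power[peak_bin] / sum(power)


N = 128   # assumed FFT size
BIN = 8   # assumed: exactly 8 cycles per window, so the clean tone sits on one bin

clean = [math.sin(2 * math.pi * BIN * i / N) for i in range(N)]

# Simulate a buffer whose second half was read from the wrong offset:
# the waveform jumps in phase at the splice point.
spliced = clean[: N // 2] + clean[5 : 5 + N // 2]

clean_leakage = leakage(clean, BIN)
spliced_leakage = leakage(spliced, BIN)
```

For the clean buffer essentially all of the energy stays in the tone's bin, while the spliced buffer spreads a large share of it across other bins, which is the kind of wonky spectrum a phase jump would produce regardless of the FFT library used.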
