Stress testing sample rate converters

Reply #25 – 2009-07-01 15:07:37

Quote from: 2Bdecided on 2009-07-01 12:24:40

It's not even a perfect impulse in the time domain - it might look like one for the specific 100% artificial example where you're looking at an impulse perfectly co-timed with a sampling instant - but delay the signal by half a sample, and then see what you get

David, sorry, but you are making the same mistake as Axon. Let me explain once again. The pulse plots at InfiniteWave show the amount of ringing introduced into the signal at all possible sub-sample shifts, not only at the one shift where the impulse is perfectly aligned with a sampling instant. The depicted filter includes all the polyphase components of a multirate SRC filter. It is in fact a multirate SRC filter operating at 14 MHz. See below.
Indeed, Ableton uses linear interpolation, hence no ringing should be introduced into the output signal, whatever the "phase" is.
Ringing of the SRC should not be confused with ringing of a possible subsequent D/A conversion. That second ringing is determined entirely by the oversampling filter of the D/A converter and has nothing to do with the tested SRC. The graphs at InfiniteWave are concerned only with the ringing of the SRC.
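
To illustrate the point about linear interpolation, here is a minimal sketch (not InfiniteWave's code) that samples two interpolation kernels on a grid much finer than the input rate, so that every sub-sample phase is included. The Hann-windowed sinc, its half-width of 8 samples, and the figure of 147 phases are illustrative assumptions. The triangular kernel of linear interpolation never goes negative at any phase, whereas the band-limited kernel has negative lobes, which is what shows up as ringing in the plots.

```python
import math

def triangle(t):
    """Linear-interpolation kernel; t is measured in input-sample periods."""
    return max(0.0, 1.0 - abs(t))

def hann_sinc(t, half_width=8):
    """Hann-windowed sinc kernel (an illustrative band-limited interpolator)."""
    if abs(t) >= half_width:
        return 0.0
    if t == 0:
        return 1.0
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return window * math.sin(math.pi * t) / (math.pi * t)

def fine_rate_response(kernel, phases, half_width=8):
    """Sample a kernel on a grid `phases` times finer than the input rate."""
    n = half_width * phases
    return [kernel(i / phases) for i in range(-n, n + 1)]

PHASES = 147  # assumed number of sub-sample phases, for illustration only
tri = fine_rate_response(triangle, PHASES)
snc = fine_rate_response(hann_sinc, PHASES)

print("linear interpolation, most negative value:", min(tri))
print("windowed sinc, most negative value:", min(snc))
```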

Quote from: Cavaille on 2009-07-01 11:17:09

Oh, that is interesting. How can I set up the waveform window in RX to show me the impulse response, and what tone do I need for that? I always wondered how people were doing it...

It is all about preparing the waveform data, not about configuring RX. The pulse train test signal (see the InfiniteWave Help and FAQ sections) is used to extract all the polyphase components of the multirate SRC filter operating at 14 MHz. This filter is then written into a 14-MHz WAV file, and that WAV file is opened in RX. We do not even need to change the default RX display settings.
However, for those interested, RX has a selectable interpolation order for showing the "analog" waveform between digital samples (see Preferences/Misc). The default value is 30 (windowed sinc), but it can be set to 0 or 1 for more accurate display of impulse responses of digital filters.
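
As a side note on where a figure like 14 MHz can come from: assuming the conversion under test is 96 kHz to 44.1 kHz (an assumption, not stated in this post), the lowest common rate of the two grids is 14.112 MHz. A small sketch of that arithmetic, with `common_rate` being a hypothetical helper name:

```python
from math import gcd

def common_rate(fs_in, fs_out):
    """Return (common rate, interpolation factor, decimation factor)."""
    g = gcd(fs_in, fs_out)
    up = fs_out // g      # the input is conceptually upsampled by this factor
    down = fs_in // g     # and then decimated by this factor
    return fs_in * up, up, down

# Assumed test conversion: 96 kHz -> 44.1 kHz
rate, up, down = common_rate(96000, 44100)
print(rate, up, down)  # 14112000 147 320
```

Under that assumption the multirate filter has 147 polyphase branches, and writing it out at the common rate shows all of them in one waveform.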

Reply #26 – 2009-07-01 15:29:37

Quote from: Alexey Lukin on 2009-07-01 15:07:37

Quote from: 2Bdecided on 2009-07-01 12:24:40
It's not even a perfect impulse in the time domain - it might look like one for the specific 100% artificial example where you're looking at an impulse perfectly co-timed with a sampling instant - but delay the signal by half a sample, and then see what you get

David, sorry, but you are making the same mistake as Axon. Let me explain once again. The pulse plots at InfiniteWave show the amount of ringing introduced into the signal at all possible sub-sample shifts, not only at the one shift where the impulse is perfectly aligned with a sampling instant. The depicted filter includes all the polyphase components of a multirate SRC filter. It is in fact a multirate SRC filter operating at 14 MHz. See below.
Indeed, Ableton uses linear interpolation, hence no ringing should be introduced into the output signal, whatever the "phase" is.
Ringing of the SRC should not be confused with ringing of a possible subsequent D/A conversion. That second ringing is determined entirely by the oversampling filter of the D/A converter and has nothing to do with the tested SRC. The graphs at InfiniteWave are concerned only with the ringing of the SRC.

I didn't realise the testing was so sophisticated, and I've just enjoyed reading a little about it.

However, that wasn't my point at all - while you have gone to great lengths to accurately show the impulse response of these SRCs, a normal reader might conclude that what is shown is the response of a given SRC when fed with an(y) impulse.

In reality, there isn't just one impulse "response", but a range of responses, depending on the time relationship between the input impulse and the output samples.

For a good SRC the differences are real but (arguably) irrelevant - after a decent DAC, the analogue signal is the same for any given original impulse.

For some bad SRCs, the differences are equally real but represent the destruction of the original signal. The result is that, after a decent DAC, the amplitude of the analogue signal can vary dramatically depending on the time location of the original impulse.

So it's not a case of "great impulse, poor sine wave" - both are mangled.
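
A minimal sketch of this effect, assuming a toy downsampler that uses plain linear interpolation with no anti-alias filter (not any specific product). The sum of the output samples produced by one input impulse is proportional to the low-frequency content of the reconstructed analogue signal, so if that sum changes with the position of the impulse, the converter is time variant in the sense described above. The ratio 3/4 and the function names are illustrative assumptions.

```python
from fractions import Fraction

def triangle(t):
    """Linear-interpolation kernel; t is measured in input-sample periods."""
    return max(Fraction(0), 1 - abs(t))

def output_sums(up, down):
    """Sum of output samples for a unit impulse at input positions 0..down-1.

    Output sample m is taken at input time m*down/up, so its value is the
    kernel evaluated at that time minus the impulse position k.
    """
    sums = []
    for k in range(down):
        lo = ((k - 1) * up) // down
        hi = ((k + 1) * up) // down + 1
        total = sum(triangle(Fraction(m * down, up) - k)
                    for m in range(lo, hi + 1))
        sums.append(total)
    return sums

sums = output_sums(3, 4)          # resample by 3/4
print([str(s) for s in sums])     # ['1', '2/3', '2/3', '2/3']
```

A well-filtered converter would give the same value (3/4 here) for every impulse position; the toy converter gives 1 or 2/3 depending on where the impulse falls.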

Cheers,
David.

Reply #27 – 2009-07-01 15:54:33

You are right: there is a range of possible output impulses, when SRCing a single impulse. And all these possible impulses are contained in the "combined" impulse that InfiniteWave is depicting.
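
A rough sketch of how such a "combined" impulse can be assembled, assuming a toy linear-interpolation SRC treated as a black box (this is not InfiniteWave's actual procedure, and all names are hypothetical). Impulses are fed at several input positions, and each output sample is placed on a fine grid according to its time offset from the impulse. Because the two rate factors are coprime, the offsets from `down` consecutive impulse positions fill every point of the fine grid.

```python
def linear_src(x, up, down):
    """Toy SRC: resample x by the ratio up/down using linear interpolation."""
    n_out = (len(x) - 1) * up // down + 1
    y = []
    for m in range(n_out):
        pos, frac = divmod(m * down, up)  # input index, phase in 1/up steps
        nxt = x[pos + 1] if pos + 1 < len(x) else 0.0
        y.append(x[pos] + (nxt - x[pos]) * frac / up)
    return y

def combined_impulse(src, up, down):
    """Overlay the responses to impulses at `down` consecutive positions."""
    n = down + 4
    fine = {}
    for k in range(2, 2 + down):
        x = [0.0] * n
        x[k] = 1.0
        for m, v in enumerate(src(x, up, down)):
            # offset of output sample m from the impulse, in fine-grid ticks
            fine[m * down - k * up] = v
    return fine

fine = combined_impulse(linear_src, 3, 4)
for d in sorted(fine):
    if fine[d] != 0.0:
        print(d, round(fine[d], 3))
```

For this toy converter the overlay reproduces the triangular kernel sampled at three times the input rate, and every individual response is a subset of those points.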

Reply #28 – 2009-07-01 16:47:34

...which is fine for a time invariant system, but in the case of some "broken" (time variant) SRCs means you'll be hiding a fundamental problem?

(Not sure any of these SRCs are that bad, but it's possible).

(Not sure what I'd plot in this case - maybe some overlays?)

Cheers,
David.

Reply #29 – 2009-07-12 05:29:30

Hi,

Quote

SebastianG: "The post ringing is under 2 milliseconds and the range of group delays within the audible band is probably limited to a couple of microseconds. I would not expect you to be able to hear a difference."

Not wishing to hijack the thread, just a quick question:
Is the difference between different resamplers (with regard to impulse response) similar to the kind of difference between different EQs (such as minimum phase vs linear phase)?

I have already ABXed minimum phase against linear phase EQing multiple times, and I can reliably identify differences.
At bass frequencies I can perceive a phase shift much more readily than a pre-echo; it also affects stereo imaging. I did not really compare high frequency EQing; perhaps I would tend to prefer less pre-echo there.

In this thread: minimum phase vs linear phase eq
I asked what the thresholds of audibility of pre-echo and of phase shift are. (Sadly, no answers.) But perhaps this would also be useful for resampling?
Has any comprehensive research been done on this subject?

And finally, can different material benefit from different resamplers ?
If yes, would it be possible, and would it make sense, to continuously and automatically adapt the resampling filter's impulse response to better suit the specific frequency spectrum being resampled, so as to minimize audible pre-echo and phase shift?

Currently, the options seem to be:
1. Don't care, just resample
2. Guess/follow advice/theorize on which resampler is best
3. ABX different resamplers for one or a few songs, pick best, generalize and use for everything
4. ABX resamplers for every song (if differing material can benefit from different resamplers) which I am not crazy enough to do.

Surely there is a way to automate the process of finding the optimal resampling impulse response according to the specific material, but I doubt it has been done already. Would this make any sense ?

Thanks !

Reply #30 – 2009-07-12 07:41:33

Quote

At bass frequencies I can perceive a phase shift much more readily than a pre-echo; it also affects stereo imaging. I did not really compare high frequency EQing; perhaps I would tend to prefer less pre-echo there.

http://www.hydrogenaudio.org/forums/index....st&p=608302

Quote

The heart of the sample rate conversion process is the low pass filter, which is designed to reject aliases (down conversion) or images (up conversion) — both of which are detrimental to sound quality.

And phase settings of a resampler are essentially phase settings of its low-pass filter. So you should really compare (very) high frequency EQing.

Reply #31 – 2009-07-14 15:33:42

Quote from: Euphonie on 2009-07-12 05:29:30

Is the difference between different resamplers (with regard to impulse response) similar to the kind of difference between different EQs (like minimum phase vs linear phase ?)

Similar. A resampler is supposed to make no changes to sound quality other than the obvious one implied by the downward change in Nyquist frequency.

Quote

I already ABXd myself multiple times comparing minimum phase and linear phase EQing. I can reliably identify differences.

The difference is that EQ should *generally* take place at an audible frequency, while, due to the crazy world we live in, resamplers often make changes above the range we can hear.

Quote

At bass frequencies I can perceive a phase shift much more readily than a pre-echo; it also affects stereo imaging. I did not really compare high frequency EQing; perhaps I would tend to prefer less pre-echo there.

Unlike at high frequencies, at low frequencies the human ear actually has a mechanism for discerning changes in phase. If you don't apply a phase change equally to all relevant channels, there will be FR changes which can easily be audible.

Quote

In this thread: minimum phase vs linear phase eq
I asked what are the thresholds of audibility of pre-echo and of phase shift. (Sadly no answers)

If I remember that thread correctly, it was limited to resampling to/from Nyquists that were above audibility.

Quote

But, perhaps this would also be useful for resampling ?

It's the same basic problem. Both A-D and resampling benefit from brick wall filtering as a general rule.

Quote

Has any comprehensive research been done on this subject ?

I don't know about exactly this problem, but research certainly exists on a number of related problems involving the same basic procedure: digital low-pass filtering.

Quote

And finally, can different material benefit from different resamplers ?

Depends on what you call a benefit. Some people like the artifacts of a grab-bag of distortions; look at vinyl!

If you define the goal of resampling as I did above, then a resampler that is good for very many things, even tough jobs, should also be good for very many more things that it hasn't been tested with.

Quote

If yes, would it be possible, and would it make sense to continuously, automatically adapt the resampling filter's impulse response to better suit the specific frequency spectrum being resampled, as to minimize audible pre-echo and phase shift ?

Only if you see resampling as an effect (EFX).

Quote

Currently, the options seem to be:

1. Don't care, just resample
2. Guess/follow advice/theorize on which resampler is best
3. ABX different resamplers for one or a few songs, pick best, generalize and use for everything
4. ABX resamplers for every song (if differing material can benefit from different resamplers) which I am not crazy enough to do.

Surely there is a way to automate the process of finding the optimal resampling impulse response according to the specific material, but I doubt it has been done already. Would this make any sense ?

The usual situation is that ringing is least audible when the pre/post ratio does the best possible job of hiding it below the relevant temporal masking curve, which says that the ear is more sensitive to pre-ringing than to post-ringing by something like 3-4:1.

IOW, more post-ringing than pre-ringing would seem to be a reasonable goal.

The whole game of downsampling changes meaning when the Nyquist frequency is smack dab in the middle of the audio band, such as happens when we process for the best possible speech quality with minimal data.

Reply #32 – 2009-07-15 00:03:21

Quote from: Euphonie on 2009-07-12 05:29:30

I asked what are the thresholds of audibility of pre-echo and of phase shift. (Sadly no answers) But, perhaps this would also be useful for resampling? Has any comprehensive research been done on this subject?

How about these:

H. Møller and P. Minnaar, "On the audibility of all-pass phase in electroacoustical transfer functions," J. Audio Eng. Soc., vol. 55, pp. 115-134 (2007 March).

J. Blauert and P. Laws, "Group Delay Distortions in Electroacoustic Systems," J. Acoust. Soc. Am., vol. 63, pp. 1478-1483 (1978 May).

Reply #33 – 2009-08-03 09:55:00

Alexey,

I am intrigued by the nature of the iZotope intermediate phase filter.

When downsampling to 44.1 kHz it is linear phase up to 14 kHz or so,
and only then turns MP-ish.

Is there anything you can/want to say about its architecture?

Thanks,

Werner

Reply #34 – 2009-08-03 10:19:14

The design goal was to be able to smoothly transition between linear-phase and minimum-phase filters. It not only allows trading pre-ringing for post-ringing, but also linearizes the passband phase response as much as possible. Our filter is a hybrid: it uses a linear-phase response below a certain frequency and a minimum-phase response above that frequency. The filter is FIR, so the design can be a straightforward window method.

Reply #35 – 2009-08-04 12:04:35

Quote from: Alexey Lukin on 2009-08-03 10:19:14

The design goal was to be able to smoothly transition between linear-phase and minimum-phase filters. It not only allows trading pre-ringing for post-ringing, but also linearizes the passband phase response as much as possible. Our filter is a hybrid: it uses a linear-phase response below a certain frequency and a minimum-phase response above that frequency. The filter is FIR, so the design can be a straightforward window method.

So, does it sound any different than a more naively-designed filter?

For downsampling to 44 kHz sampling?

For downsampling to 10 kHz sampling?

Reply #36 – 2009-08-04 12:09:56

Frankly, I have no idea. But our testers say yes.

Reply #37 – 2009-08-16 12:01:46

Quote from: Alexey Lukin on 2009-08-04 12:09:56

Frankly, I have no idea. But our testers say yes.

BTW Alexey, are you the same Alexey Lukin whose name comes up when I click about in RMAA? If so, got any tips about what to do about the dozens of times I totally crashed the current downloadable version last night? Audio interface was a Card Deluxe with their current latest driver, and the OS was XP SP2. Older Via chipset with 333 Hz RAM.

Reply #38 – 2009-08-16 12:18:11

Quote from: Arnold B. Krueger on 2009-08-16 12:01:46

Older Via chipset with 333 Hz RAM.

May I kindly suggest that this is a rather exotic platform.

Reply #39 – 2009-08-16 20:54:37

Arnie, I developed RMAA from its earliest versions, but the more recent versions have been done without my participation. I'm sure that the developers of the current version (also mentioned in the About box) will appreciate your feedback.

Reply #40 – 2015-01-13 14:20:45

Quote from: udauda on 2009-07-15 00:03:21

Quote from: Euphonie on 2009-07-12 05:29:30
I asked what are the thresholds of audibility of pre-echo and of phase shift. (Sadly no answers) But, perhaps this would also be useful for resampling? Has any comprehensive research been done on this subject?

How about these:

H. Møller and P. Minnaar, "On the audibility of all-pass phase in electroacoustical transfer functions," J. Audio Eng. Soc., vol. 55, pp. 115-134 (2007 March).

J. Blauert and P. Laws, "Group Delay Distortions in Electroacoustic Systems," J. Acoust. Soc. Am., vol. 63, pp. 1478-1483 (1978 May).

Here is the abstract from the JAES paper:

"Audible effects of second-order all-pass sections with center frequencies in the range of
1–12 kHz were studied in headphone listening experiments. All-pass sections give rise to two
effects. 1) A perception of “ringing” or “pitchiness,” which is related to an exponentially
decaying sinusoid in the impulse response of all-pass sections with high Q factors. The
ringing is especially audible for impulsive sounds, whereas it is often masked with everyday
sounds such as speech and music. With an impulse signal the ringing was found to be audible
when the decay time constant for the sinusoid exceeds approximately 0.8 ms (peak group
delay of 1.6 ms), independent of the center frequency within the frequency range studied. 2)
A lateral shift of the auditory image, which occurs when an all-pass section is inserted in the
signal path to only one ear. The shift is related to the low-frequency phase and group delays
of the all-pass section, and it was found to be audible whenever these exceed approximately
35 µs, independent of the signal."

and the conclusions:

"
The aim of this work was to study the effects of all-pass
components in transfer functions, and to determine under
which circumstances the effects are audible. The work was
carried out with the purpose of giving a background for
evaluating the all-pass phase in binaural synthesis and
playback, but the results may also be useful for other
purposes and applications.

All-pass sections give rise to an exponentially decaying
sinusoid, which may be perceived as ringing or “pitchiness.”

The higher Q is, the longer is the decay, and the
more audible is the ringing. It was found that the audibility
of the ringing depends strongly on the signal. For the most
sensitive signal, an impulse, thresholds for Q correspond
to a decay time constant of the ringing around 0.8 ms for
center frequencies in the range of 1–12 kHz. The peak
group delay of an all-pass filter is twice the decay time
constant. Therefore the threshold may also be given in
terms of a peak group delay of 1.6 ms. The audibility is the
same, whether the all-pass section is inserted in both sides
or in only one side, which suggests that the ringing is
detected in the individual ear and not as part of a binaural
processing. The thresholds are slightly lower for inverted
all-pass sections, which seems natural, since backward
masking is usually less effective than forward masking.

The thresholds found for the ringing correspond well with
the few data that exist in the literature for the same conditions.
In addition, the peak group delay at threshold corresponds
well with existing knowledge of the temporal
resolution of our hearing. For frequencies below 1 kHz,
the data are well complemented by literature data, which
suggest that the threshold is given by a constant Q of
around 2 to 3.

The experiments were carried out with an artificial signal,
an impulse, but sound signals that are highly impulsive
do occur in real life, such as from percussion, and the
thresholds may also apply to such critical real-life signals.
However, for most real-life signals the ringing is more
likely to be masked by the signal itself, and it is expected
to be inaudible, even if Q is higher than the thresholds
found. Reverberation in the recording or during loudspeaker
playback may also impair detection.

The thresholds were obtained with isolated all-pass sections,
and they may not apply to transfer functions where
the all-pass phase is accompanied by a minimum-phase
component. As an example, if an all-pass component
stems from zeros in the right half of the complex s plane
that do not have corresponding poles in the left half-plane,
the magnitude of the transfer function has a local dip at the
center frequency. Consequently the amplitude of the ringing
is significantly reduced, and the thresholds do not apply.

If an all-pass section is applied in the signal path to only
one ear, it may give rise to a lateral shift of the auditory
image. The lateralization seems related to the fact that, at
low frequencies, the all-pass section acts as a delay, that is,
its phase and group delays are approximately constant and
have the same value. The lower Q is, the longer is the
delay, and the more audible is the lateralization. The effect
is nearly the same for different broad-band signals. For
center frequencies in the range of 1–12 kHz thresholds for
Q correspond to low-frequency phase and group delays of
approximately 35 µs, independent of the center frequency.

The literature does not offer data for the same conditions,
but the value agrees well with literature data on minimum
audible interaural delay. The result also agrees with our
knowledge about the localization blur for frontal sound
sources in the horizontal plane, for which the interaural time
difference is believed to be the main cue for discrimination.

In the experiments only one all-pass section was introduced
at a time. Real-life transfer functions may include
more than one all-pass section. For the ringing it is uncertain
what the joint effect of several all-pass sections will
be. The total peak group delay seems to play an important
role, possibly together with the group-delay bandwidth,
but more studies are needed. For the lateralization it seems
reasonable to conclude that the joint effect of several all-pass
sections can be evaluated from the difference between
the two sides of the total low-frequency phase or
group delays.
"
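
To put numbers on the relations quoted above, here is a small sketch that assumes the standard textbook second-order all-pass, H(s) = (s² − (ω0/Q)s + ω0²) / (s² + (ω0/Q)s + ω0²). That form is an assumption here, not taken from the paper. For it, the ringing envelope decays with time constant 2Q/ω0, the group delay at the center frequency is 4Q/ω0 (twice the time constant), and the low-frequency group delay is 2/(Qω0). The function names are hypothetical, and the thresholds used are the 0.8 ms and 35 µs figures from the abstract.

```python
import math

def decay_time_constant(f0, q):
    """Envelope time constant of the ringing: tau = 2Q / w0."""
    return 2.0 * q / (2.0 * math.pi * f0)

def group_delay(f, f0, q):
    """Group delay (seconds) of the assumed second-order all-pass at f."""
    w, w0 = 2.0 * math.pi * f, 2.0 * math.pi * f0
    num = 2.0 * (w0 / q) * (w0 ** 2 + w ** 2)
    den = (w0 ** 2 - w ** 2) ** 2 + (w * w0 / q) ** 2
    return num / den

def q_at_ringing_threshold(f0, tau=0.8e-3):
    """Q whose decay time constant equals tau (ringing threshold)."""
    return tau * math.pi * f0

def q_at_lateral_threshold(f0, delay=35e-6):
    """Q whose low-frequency group delay equals `delay`."""
    return 1.0 / (math.pi * f0 * delay)

for f0 in (1000.0, 12000.0):
    q_ring = q_at_ringing_threshold(f0)
    q_lat = q_at_lateral_threshold(f0)
    print(f0, round(q_ring, 2), round(q_lat, 2),
          round(group_delay(f0, f0, q_ring) * 1e3, 3), "ms at f0")
```

At 1 kHz the ringing threshold works out to a Q of about 2.5, which is consistent with the "constant Q of around 2 to 3" that the conclusions mention for lower frequencies.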