
High-Resolution Audio: A perspective

Hi all,
Mr B Stuart triggers my first ever post to this forum, because:
In the latest AES journal (Oct 2015) I see that he has written a piece on Hi-Res Audio where he states something (to me fairly confusing) regarding temporal resolution and the 'precision of neural processing', and concludes that "a distribution system that permits end-to-end resolution of 8 μs implies a Gaussian bandwidth of around 44 kHz".

What is your take on his discussion on temporal resolution in there?

PDF. (hope link sticks around)
Article. (deep journal link)


Reply #1
This was originally tacked on to the end of this discussion:
https://www.hydrogenaud.io/forums/index.php?showtopic=110649

I figured it warranted its own topic.

Reply #2
I see that contrary to received wisdom around here, Mr. Stuart defines "resolution" in terms other than just bit depth. Someone should correct him on this.

Reply #3
I have to admit that going for eye-to-ear analogies just to explain how they don't really work, only to end up talking about minimum delays, the one thing an eye sucks at, makes for a funny introduction to this paper.

everything I've seen about 5 to 8µs audibility rubbed me the wrong way:
it always comes from using signals that aren't music(iterated ripple noise, square waves and such)
it always assumes that if we were able to notice such small delays in a super specific test doing only that, then we will notice them when listening to complex music where both the hair cells and the brain have other stuff to do than wondering if a delay was 8 or 10µs. and here I was thinking that brain efficiency crumbled as soon as the task became complex. just focusing on one instrument makes me miss a lot of the others, but I guess I'm just some dumb guy without a golden ear.
shouldn't I get scared shitless about transducer distortions and phase shifts from crossovers before I start pretending to care about an 8µs delay? I don't even dare to think about speakers in a room now. where can I buy the room that goes with the speakers to get the right delays? maybe Madonna was a good singer? should I also eat more royal jelly for the signals to jump faster between neurons? being paranoid is so amazing!!!!

of course another small detail would be: why don't we all pass a blind test between CD and highres if we actually notice the differences in content such as those delays too small to be on CD?

Reply #4
The 8µs is the human limit (for a sound of a certain frequency, roughly at 1kHz) to distinguish inter-aural time differences, i.e. to determine the direction a sound is coming from by having information from both ears. This is a measure of the brain's and auditory system's processing speed. The 8µs number is in no way related to the frequency of the sound itself, as Bob Stuart himself points out in his own paper quoted above. The other paper he cites for that contains no data suggesting that, either.

He doesn't know what he's talking about. Kudos to him for making this article (and others) Open Access, though.
It's only audiophile if it's inconvenient.

Reply #5
Hi all,
Mr B Stuart triggers my first ever post to this forum, because:
In the latest AES journal (Oct 2015) I see that he has written a piece on Hi-Res Audio where he states something (to me fairly confusing) regarding temporal resolution and the 'precision of neural processing', and concludes that "a distribution system that permits end-to-end resolution of 8 μs implies a Gaussian bandwidth of around 44 kHz".

What is your take on his discussion on temporal resolution in there?

PDF. (hope link sticks around)
Article. (deep journal link)


Typical DAC-o-phile speculations. It's like there are no speakers or listening rooms in audio systems, or they are there, but they don't do what we know they do, which is smear the music both spatially and temporally.

Reply #6
It's like there are no speakers or listening rooms in audio systems, or they are there, but they don't do what we know they do, which is smear the music both spatially and temporally.

We do know what they do, but lumping them together as though they are the same thing obscures the issues. The room contributes delayed reflections which are 'natural' and which, as humans, we seem to ignore. We hear the direct sound before any of the reflections. In contrast, the typical speaker contributes arbitrary, artificial, frequency-dependent phase shifts, and we have no choice but to listen to those both directly, and reflected and delayed - unless we do the sensible thing and correct them.

Which is not to say that I agree with anything that Mr. Stuart is saying - I am happy that the temporal resolution of 44.1/16 is much better than 8 us for anything I will ever listen to, and I am mystified as to why people are taking his sales pitch so seriously.

Reply #7
It's like there are no speakers or listening rooms in audio systems, or they are there, but they don't do what we know they do, which is smear the music both spatially and temporally.

We do know what they do, but lumping them together as though they are the same thing obscures the issues.

The room contributes delayed reflections which are 'natural' and which, as humans, we seem to ignore. We hear the direct sound before any of the reflections.


Room reflections are natural, really?

Rooms themselves are artificial creations of man. They aren't grown by any plant, and they are unlike natural caves which are highly irregular inside.

Humans have been evolving for hundreds of thousands of years, mostly in natural settings. Having spent some time in natural settings (i.e., the woods, lakes, and streams), I know that the acoustics in nature are vastly different from those inside modern homes. The woods and plains are generally free of large planar walls, have no ceilings, and are actually quite non-reflective and absorptive.

Secondly, while the human brain can adapt, we are pretty sure, from trying to do the same with computers, that adapting is computationally expensive. Might engaging our brain's adaptive features to deal with a highly reverberant room distract mental resources from more engagement with the music?


Quote
In contrast, the typical speaker contributes arbitrary, artificial, frequency-dependent phase shifts, and we have no choice but to listen to those both directly, and reflected and delayed - unless we do the sensible thing and correct them.


The phase shifts contributed by a well-designed speaker are not arbitrary, but are related to the relevant laws of physics. Some of them are based on acoustic delays and are similar to what the room adds.

Trying to separate the speaker's undesirable additions from those due to the room misses the point: both the room's contribution and the speaker's contribution increase the auditory chaos that the human brain has to unpack in order to engage with the music.

Quote
Which is not to say that I agree with anything that Mr. Stuart is saying - I am happy that the temporal resolution of 44.1/16 is much better than 8 us for anything I will ever listen to, and I am mystified as to why people are taking his sales pitch so seriously.


Thanks for people who understand these things.

Reply #8
Before reading the article, I was quite sceptical about it. Though, while reading, I did not really discover anything upsetting.

Quote
..we need to move away from the poor proxy of escalating sample-rates and bit-depths.

Makes sense and seems to be in line with what most of you here believe.

Quote
We recently proposed that system errors should only resemble those introduced when sound travels a short distance through air

This reminds me of what is said in the post above on well-designed speakers.

... and concludes that "a distribution system that permits end-to-end resolution of 8 μs implies a Gaussian bandwidth of around 44 kHz".

You have an (analog) audio channel and want to send signals through it. Assuming your signal is made up of (a sum of) Gaussian pulses, what is the maximal width of the Gaussian that still lets you differentiate two Gaussians sent through the channel with a delay of 8µs between them? I'm no expert in analog signal theory, so correct me if I'm wrong, but a typical requirement for detecting the signals as two is an amplitude difference of 1/e. A width of 44kHz corresponds to approx. 2.3µs, so I think the numbers are about right. These 44kHz have nothing to do with the sample rate in the digital domain.

The 8µs is the human limit (for a sound of a certain frequency, roughly at 1kHz) to distinguish inter-aural time differences, i.e. to determine the direction a sound is coming from by having information from both ears. This is a measure of the brain's and auditory system's processing speed. The 8µs number is in no way related to the frequency of the sound itself, as Bob Stuart himself points out in his own paper quoted above.

Then, I would prefer my audio gear to be able to process all channels without introducing relative timing errors above 8µs between the channels.

Which is not to say that I agree with anything that Mr. Stuart is saying - I am happy that the temporal resolution of 44.1/16 is much better than 8 us for anything I will ever listen to, and I am mystified as to why people are taking his sales pitch so seriously.

Not that I am an advocate of super high sample rates, but the temporal resolution of 44.1kHz is about 22µs.

Reply #9
Which is not to say that I agree with anything that Mr. Stuart is saying - I am happy that the temporal resolution of 44.1/16 is much better than 8 us for anything I will ever listen to, and I am mystified as to why people are taking his sales pitch so seriously.

Not that I am an advocate of super high sample rates, but the temporal resolution of 44.1kHz is about 22µs.

It's more like 1ns, see: https://www.hydrogenaud.io/forums/index.php...rt=#entry905590 for example.

Reply #10
Alright, assuming the signal has certain properties, you can use an algorithm that effectively gives you higher resolution. My mistake, I overlooked the 16 in 44.1/16 ...
Though, for songs with dynamic range you do not reach the claimed 1ns during the quieter parts.

Reply #11
Again, it's inter-aural time differences he's talking about. The sampling rate of audio signals is a completely orthogonal topic.
It's only audiophile if it's inconvenient.

Reply #12
Again, it's inter-aural time differences he's talking about. The sampling rate of audio signals is a completely orthogonal topic.

Right, but if you can phase-shift 16/44 audio at a 1ns granularity, then you can phase-shift the left channel by 1ns and have a 1ns inter-aural time difference on a CD, i.e. 4 orders of magnitude better than the ~10us the article is asking for.

zimjo: agree it makes sense that the 1ns figure would be worse on quiet parts. Would be interesting to experiment with.

Reply #13
zimjo: agree it makes sense that the 1ns figure would be worse on quiet parts. Would be interesting to experiment with.

With 44.1 kHz, 16 bit, signal -80 dB down, theoretical "temporal resolution" lowers to about 1us.

"I hear it when I see it."

Reply #14
zimjo: agree it makes sense that the 1ns figure would be worse on quiet parts. Would be interesting to experiment with.

With 44.1 kHz, 16 bit, signal -80 dB down, theoretical "temporal resolution" lowers to about 1us.


In one of Stuart's papers he said that sub-LSB signals recovered by dither also play a role in temporal resolution, so the real temporal resolution is even higher.

Reply #15
zimjo: agree it makes sense that the 1ns figure would be worse on quiet parts. Would be interesting to experiment with.

With 44.1 kHz, 16 bit, signal -80 dB down, theoretical "temporal resolution" lowers to about 1us.

If I understand correctly, you'd have to run a sum over all samples of the signal to recover a temporal accuracy of 1ns (or 1µs at -80dB). This would introduce an undesirable delay during playback. Does anyone know how this is actually implemented in audio gear? When using a lookahead of say 1s, what would a typical temporal resolution (I have a feeling it will depend on the properties of the signal) be?

In one of Stuart's papers he said that sub-LSB signals recovered by dither also play a role in temporal resolution, so the real temporal resolution is even higher.

Do you have a link or reference for that? How can you recover something from dither, when you don't know the noise model used during dithering? My understanding is that dithering helps reach the theoretical temporal resolution by preventing correlations in rounding errors.
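
On the general question of how anything below one LSB can survive quantization, here is a minimal Python sketch of the usual textbook argument. It is not taken from Stuart's paper. It assumes TPDF dither (the sum of two uniform random values) and a constant input of 0.3 LSB, and the helper names are made up for illustration. The point is that nobody needs to know the individual dither values: simply averaging the quantized output recovers the sub-LSB level, whereas without dither the information is gone.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular-PDF dither spanning +/-1 LSB: the sum of two uniform values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def average_quantized(value_lsb, n, dithered, seed=0):
    """Quantize the same constant value n times and return the mean of the results."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(n):
        noise = tpdf_dither(rng) if dithered else 0.0
        total += quantize(value_lsb + noise)
    return total / n

level = 0.3  # constant input, a fraction of one LSB (an arbitrary example value)
plain = average_quantized(level, 20000, dithered=False)
dithered = average_quantized(level, 20000, dithered=True)
print(f"undithered mean: {plain:.3f} LSB")
print(f"dithered mean:   {dithered:.3f} LSB")
```

The undithered mean is exactly 0, while the dithered mean lands close to 0.3 LSB, at the cost of added noise on each individual sample.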

Reply #16
If I understand correctly, you'd have to run a sum over all samples of the signal to recover a temporal accuracy of 1ns


1ns is resolvable by eye with just a few samples.

For example, below, the data column on the right is the same 44100/16 signal as the column on the left, but delayed by 1ns (both columns dithered independently):

Code:
for n in $(seq 1 5); do echo; sox -r 44100 -c2 -n -b16 tmp.wav synth 1s sq pad 200s 200s rate 1000000k delay 0 1s rate 44100 norm -.0005; sox tmp.wav -tdat - trim 199s 3s | awk -e '{ if ($1 != ";") printf "%7s %6s\n", $2*32768, $3*32768 }'; done

   1692   1690
  32767  32766
   1693   1694

   1693   1691
  32766  32767
   1692   1694

   1693   1691
  32766  32765
   1692   1694

   1693   1691
  32766  32766
   1692   1694

   1692   1691
  32766  32766
   1692   1694
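
As a rough cross-check of those numbers, here is a minimal Python sketch. It is not the sox processing above: it assumes an ideal sinc pulse peaking at 32767 and centred exactly on a sample instant, and it ignores dither and the actual resampling filter, so the absolute sample values differ from the listing. The function name `sinc_pulse_samples` is made up for illustration. What it shows is the size of the effect: a 1ns delay moves the two samples next to the peak by a bit more than one 16-bit step, in opposite directions.

```python
import math

FS = 44100.0     # sample rate in Hz
PEAK = 32767.0   # full-scale 16-bit peak, in LSB units
DELAY = 1e-9     # 1 ns delay applied to the second channel

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def sinc_pulse_samples(delay_s, indices=(-1, 0, 1)):
    """Sample an ideal band-limited pulse, delayed by delay_s, at the given sample indices."""
    shift = delay_s * FS  # the delay expressed in sample periods
    return [PEAK * sinc(n - shift) for n in indices]

undelayed = sinc_pulse_samples(0.0)
delayed = sinc_pulse_samples(DELAY)
changes = [d - u for u, d in zip(undelayed, delayed)]

for n, u, d in zip((-1, 0, 1), undelayed, delayed):
    print(f"n={n:+d}  undelayed={u:10.3f}  delayed={d:10.3f}  change={d - u:+.3f} LSB")
```

Under these assumptions the neighbouring samples change by roughly 1.4 LSB each (down before the peak, up after it), which is the same pattern as in the right-hand column of the listing.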

Reply #17
By temporal resolution I mean the ability to distinguish two audio pulses separated by a delay. Of course you observe a delay of 1ns as a change of amplitude in each sample. By looking at your data I cannot tell whether there are two pulses separated by less than 8µs.
You upsampled, shifted the data and downsampled again. Depending on which upsampling algorithm you used, it used more than just one or two samples for interpolation! For the most accurate results it would use all samples; see also proofs of the Nyquist theorem.

I'm interested to learn how typical audio gear performs PCM demodulation. If it simply performs D/A conversion on a 44.1kHz signal, it will not produce output with <22µs resolution. My guess is that (at least high end) equipment performs upsampling before converting to the analog domain. I know that many amps use 192kHz internally, which gives you about 5µs resolution.

Reply #18
I'm interested to learn how typical audio gear performs PCM demodulation. If it simply performs D/A conversion on a 44.1kHz signal, it will not produce output with <22µs resolution. My guess is that (at least high end) equipment performs upsampling before converting to the analog domain. I know that many amps use 192kHz internally, which gives you about 5µs resolution.


All digital audio systems should incorporate a reconstruction filter that produces a smooth, continuous output. This can be done partly-digitally using oversampling or entirely with analogue electronics. I presume that 'sufficient' samples and windowing are used to achieve some set specification in terms of accuracy. The incoming signal is bandwidth-limited, again with a filter.

As viewed on an oscilloscope, a 'stairstep' view of raw D/A conversions will 'run' relative to a repeating sampled impulse unless the repetition rate is locked at some integer ratio of the sample frequency, but this doesn't matter: two impulses separated by 1ns, reproduced separately in stereo, at the outputs of the filters will still occur at the exactly-correct time, will still look beautiful and will stay separated by 1ns, with only a tiny amount of distortion ('noise') causing any deviation from this. That is, unless the amplitudes of the impulses are tiny, in which case the noise makes the real time display unintelligible. However, averaged over time it would still be revealed that the impulses were the correct shape, occurred at the correct times and were separated by 1ns. Dither is an essential part of this equation.
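
To put a number on what a reconstruction filter buys you between samples, here is a minimal Python sketch. It is not how any particular DAC works: it assumes a 10 kHz test tone sampled at 44.1 kHz, a Hann-windowed sinc kernel using 100 samples on each side, and no quantization. All names and parameter values are chosen for illustration. It estimates the signal halfway between two samples, once with the windowed sinc and once with plain linear interpolation, and compares both with the true value.

```python
import math

FS = 44100.0      # sample rate in Hz
FREQ = 10000.0    # test tone in Hz, well below FS / 2
HALF_WIDTH = 100  # samples used on each side of the interpolation point

def tone(t):
    """The 'analogue' signal being sampled."""
    return math.sin(2.0 * math.pi * FREQ * t)

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def hann(x, half_width):
    """Hann window: 1 at x = 0, falling to 0 at |x| >= half_width."""
    if abs(x) >= half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x / half_width))

def windowed_sinc_interp(samples, position):
    """Estimate the signal at a fractional sample position from nearby samples."""
    centre = int(math.floor(position))
    total = 0.0
    for n in range(centre - HALF_WIDTH + 1, centre + HALF_WIDTH + 1):
        x = position - n
        total += samples[n] * sinc(x) * hann(x, HALF_WIDTH)
    return total

def linear_interp(samples, position):
    """Straight line between the two nearest samples."""
    n = int(math.floor(position))
    frac = position - n
    return (1.0 - frac) * samples[n] + frac * samples[n + 1]

samples = [tone(n / FS) for n in range(1000)]
position = 501.5  # halfway between samples 501 and 502, near a peak of the tone
exact = tone(position / FS)
err_sinc = abs(windowed_sinc_interp(samples, position) - exact)
err_linear = abs(linear_interp(samples, position) - exact)
print(f"true value:           {exact:+.6f}")
print(f"windowed-sinc error:  {err_sinc:.2e}")
print(f"linear-interp error:  {err_linear:.2e}")
```

With these settings the linear estimate is off by a large fraction of full scale, while the windowed-sinc estimate is several orders of magnitude closer. A longer kernel or a different window would trade accuracy against latency.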

Reply #19
I think the above comment by Green Marker outlines abilities of the sampling system quite understandably (to me) in this respect.

Regarding the exact definition of the temporal resolution of a sampling system, I recall JJ stating in another thread that it's 1/(2*pi*BW*2^BITS) seconds, I have sometimes seen it given as 1/(2*pi*Fs*2^BITS) seconds, and many discussions refer to it as just 1/(Fs*2^BITS) seconds. However, I can't find an analytically derived definition in my books etc.
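
This doesn't settle which expression is the right one, but evaluating all three side by side shows how far apart they are. A minimal Python sketch, with two assumptions of mine that are not in the formulas as quoted: BW is taken to be Fs/2, and a signal below full scale is assumed to span proportionally fewer quantization steps, so the figure grows by the same factor. The function name `candidate_resolutions` is made up for illustration.

```python
import math

def candidate_resolutions(fs_hz, bits, level_db=0.0):
    """Evaluate the three quoted expressions, in seconds, for a signal at level_db dBFS."""
    steps = 2 ** bits
    bandwidth = fs_hz / 2.0            # assumption: BW is the Nyquist bandwidth
    scale = 10 ** (-level_db / 20.0)   # assumption: a quieter signal spans fewer steps
    return {
        "1/(2*pi*BW*2^bits)": scale / (2 * math.pi * bandwidth * steps),
        "1/(2*pi*Fs*2^bits)": scale / (2 * math.pi * fs_hz * steps),
        "1/(Fs*2^bits)": scale / (fs_hz * steps),
    }

full_scale = candidate_resolutions(44100, 16)
quiet = candidate_resolutions(44100, 16, level_db=-80.0)

for name in full_scale:
    print(f"{name:20s}  0 dBFS: {full_scale[name] * 1e9:6.3f} ns   "
          f"-80 dBFS: {quiet[name] * 1e6:6.3f} us")
```

For 44.1 kHz / 16 bit, all three come out well under 1ns at full scale, and the first one lands at roughly 1us for a signal 80 dB down.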

Does anyone here know the correct definition?

regards,
/AddeP

Reply #20
By temporal resolution I mean the ability to distinguish two audio pulses separated by a delay. Of course you observe a delay of 1ns as a change of amplitude in each sample. By looking at your data I cannot tell whether there are two pulses separated by less than 8µs.

Ah ok, I think that sense of "temporal resolution" is just "bandwidth", and I think you're right, you wouldn't be able to tell if there are two pulses in one audio channel separated by 8us.
The other sense is about being able to position sounds in time (phase-shift) at a very fine, sub-sample granularity.

16/44 audio's bandwidth is 22050Hz, the period of a 22050Hz signal is 1/22050 ~= 45us. So, you can't store a sine wave in 16/44 audio with peaks separated by less than 45us, because that would have a frequency > 22050Hz. I'm not sure how this extends to pulses, but I would guess if you have pulses (in one audio channel) separated by less than 45us, they get merged together by the lowpass filtering at 22kHz.
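
The arithmetic behind the intervals that keep coming up in this thread is easy to check directly. A small Python sketch (the helper name `period_us` is just for illustration):

```python
def period_us(frequency_hz):
    """Period of one cycle (or one sample interval) in microseconds."""
    return 1e6 / frequency_hz

sample_interval_44k = period_us(44100)      # spacing between samples at 44.1 kHz
nyquist_period_44k = period_us(44100 / 2)   # period of the highest storable sine
sample_interval_192k = period_us(192000)    # spacing between samples at 192 kHz

print(f"44.1 kHz sample interval: {sample_interval_44k:.1f} us")
print(f"22.05 kHz sine period:    {nyquist_period_44k:.1f} us")
print(f"192 kHz sample interval:  {sample_interval_192k:.1f} us")
```

That gives about 22.7us, 45.4us and 5.2us. These are spacings of the sampling grid and of the highest representable sine, which is a separate question from how finely an event can be positioned between samples.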

Reply #21
Ah ok, I think that sense of "temporal resolution" is just "bandwidth", and I think you're right, you wouldn't be able to tell if there are two pulses in one audio channel separated by 8us.
The other sense is about being able to position sounds in time (phase-shift) at a very fine, sub-sample granularity.

16/44 audio's bandwidth is 22050Hz, the period of a 22050Hz signal is 1/22050 ~= 45us. So, you can't store a sine wave in 16/44 audio with peaks separated by less than 45us, because that would have a frequency > 22050Hz. I'm not sure how this extends to pulses, but I would guess if you have pulses (in one audio channel) separated by less than 45us, they get merged together by the lowpass filtering at 22kHz.

Right, I realized that something was off when looking at it again. Stuart's remark on 8µs resolution and 44kHz Gaussian bandwidth confused me, because I assumed he meant FWHM, and I messed up some constants in the Fourier transform. So, what I said above is wrong: a FWHM 44kHz pulse corresponds to 23µs, not 2.3µs. Stuart's numbers only make sense when using the 44kHz as the standard deviation of the Gaussian. So, obviously you cannot sample 8µs pulses with 16/44. This also implies that you cannot reproduce 8µs time differences properly with 16/44. Imagine you have two pulses arriving at one ear simultaneously and with a delay of 8µs at the other: you cannot reproduce this with 16/44. Though, I do not consider this really relevant for music, since first of all such pulses contain frequencies above 22kHz, and secondly a time difference of 8µs corresponds to something on the order of mm in space. Really, who needs this kind of precision for their music?
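
For anyone who wants to redo the Gaussian arithmetic, here is a minimal Python sketch. It assumes one particular convention, which may not be the one used in the paper: an amplitude pulse g(t) = exp(-t^2 / (2 sigma_t^2)), whose spectrum is a Gaussian with sigma_f = 1 / (2 pi sigma_t), and pulse widths measured as FWHM. The function names are made up for illustration. It compares the two readings of "44 kHz Gaussian bandwidth" discussed here.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355

def time_sigma_from_freq_sigma(sigma_f_hz):
    """Standard deviation in time of a Gaussian pulse whose spectrum has std dev sigma_f_hz."""
    return 1.0 / (2.0 * math.pi * sigma_f_hz)

def time_fwhm_if_bandwidth_is_sigma(bandwidth_hz):
    """Pulse FWHM in seconds when the quoted bandwidth is read as sigma_f."""
    return FWHM_PER_SIGMA * time_sigma_from_freq_sigma(bandwidth_hz)

def time_fwhm_if_bandwidth_is_fwhm(bandwidth_hz):
    """Pulse FWHM in seconds when the quoted bandwidth is read as the spectral FWHM."""
    sigma_f = bandwidth_hz / FWHM_PER_SIGMA
    return FWHM_PER_SIGMA * time_sigma_from_freq_sigma(sigma_f)

as_sigma = time_fwhm_if_bandwidth_is_sigma(44000.0)
as_fwhm = time_fwhm_if_bandwidth_is_fwhm(44000.0)
print(f"44 kHz read as sigma_f: pulse FWHM = {as_sigma * 1e6:.1f} us")
print(f"44 kHz read as FWHM:    pulse FWHM = {as_fwhm * 1e6:.1f} us")
```

Under this convention the standard-deviation reading gives a pulse about 8.5us wide, close to the 8µs in the paper, while the FWHM reading gives about 20us, the same order as the 23µs figure above. A power-spectrum or intensity convention would change the constants, so treat the exact values as convention-dependent.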

The definition of resolution I gave above is frankly the only one I came across so far, and imho standard. The 1ns referenced above is as you said "the ability to position sounds in time", which, instead of resolution, I would rather refer to as temporal accuracy or temporal fidelity.
Nevertheless, I don't feel my question regarding demodulation has been answered. It is clear that the output is smooth and not step-like. And it is clear that the amplitude of the output signal will be true to the original 44.1 thousand times a second (at least up to the precision achievable with 16 bits). However, when I hear that 16/44 has a temporal fidelity of 1ns, I expect that the signal is true to the original also in between samples, not just at the instants at which a sample of the signal was taken. The Nyquist theorem guarantees that this is possible, but it requires proper interpolation between samples, as I tried to describe above. I guess you can see that the amplitude is not properly reproduced if you were to use, e.g., linear or quadratic interpolation. In fact, I don't believe any passive analog filter can give you the kind of interpolation required.
Again, this is probably not very relevant for people without golden ears. Though, in the context of accuracy of 16/44, it is important to have this in the back of one's mind.

Reply #22
...Again, this is probably not very relevant for people without golden ears. Though, in the context of accuracy of 16/44, it is important to have this in the back of one's mind.

Your claim is this timing is audible? Please don't talk around it in circles and prove your point as TOS#8 suggests.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #23
I did not claim anything. I'm simply trying to understand what people in this discussion meant by resolution. My point is that calling it a resolution of 1ns is at best misleading, since you can't resolve pulses 1ns apart. Wikipedia says
Quote
Temporal resolution, the sampling frequency of a digital audio device.

Maybe you can add to the discussion by clarifying my question regarding D/A conversion?

Reply #24
I did not claim anything...

How to read "...this is probably not very relevant for people without golden ears" ?
Even worse if you mean not relevant to us deaf here at hydrogen.
I may be wrong.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!