... The next day, while the demo was being run for the Nth time, I was at the back of the room talking with someone. Suddenly, I heard a difference as the source switched. I was surprised, having failed to hear a difference the previous day listening in the sweet spot. I listened as it switched again, and heard it switch back – ah ha, it must have just gone analogue / digital / analogue. I kept listening – I couldn’t hear the difference next time it switched.

I went to the middle/back of the room, and listened through the next demo. Without being told, I could pick out 44.1 and 48kHz. The difference was more obvious back from the sweet spot than in the sweet spot itself. More importantly, the difference wasn’t what I (or the other people who failed to hear it) had been listening for. It didn’t make any difference to the frequency response at all, or to the clarity of the high frequencies.

What 44.1kHz and 48kHz did do was to make the sound slightly less realistic, like the difference between a good and bad CD player. If the lower sampling rate had any defined “quality” it was a glassy kind of sound – I’d heard that word associated with CD before and thought it was complete rubbish – but now I actually heard the difference, I understood exactly what people had meant.

The change from 44.1 or 48kHz to analogue to 96kHz slightly increased the depth of the sound stage. I’d been listening to the amazing demo 1 for 2 days, so it was hardly an impressive difference, but it was still there.

If you’re counting, that’s only two blind detections – once when I wasn’t even listening, and again when I went back to the middle of the room to check – I confirmed which had been which with Kevin afterwards – “The next to last one was 40something, wasn’t it?”

You can say many things about this. You could say it was just luck, but I don’t think it was – I wasn’t even listening for the difference because (having listened the previous day) I didn’t think there was one to hear!
You can say that I was hearing sonic deficiencies in the equipment. Well, maybe. That may be what the whole 44.1/96k debate is based on. All I can say is that, if there are sonic deficiencies in this equipment (I think the dCS boxes are around 5k each, and are used in many recording studios) then there isn’t much hope for the rest of us!

What you could say, with some justification, is that the “character” of 44.1 was more obvious outside the sweet spot, so maybe it’s not such a big issue. That’s probably true – except that maybe I was just listening for the wrong thing when I was “in the sweet spot”. Maybe I had to stop listening to the Hi-Fi, and start listening to the music and the performance to hear what was happening.

What is significant is that the 44.1kHz version wasn’t just different from the 96k and analogue version, it was [i]worse[/i]. As the analogue was the master, any difference would be bad news, but for it to be subjectively worse makes matters even, well, worse!

I was upset to think how much recorded music only exists as a 44.1kHz or 48kHz sampled digital master tape. I discussed the subjective imperfections (the improved depth and realism of the 96k version) with Kevin, and he agreed. He was surprised that I’d noticed it that day, given that I couldn’t even hear anything wrong with 32kHz the previous day! I asked him what he heard with 16-bit (we’d been using 24-bit all along) and DSD. He said 16-bit was even worse – it made the whole sound “grungy” – and that DSD sounded nice, but added its own signature. “You can tell when you’re playing DSD through this system – the room heats up,” he said – I looked at the huge amps, and could believe it.

One thing I should note: I didn’t think the analogue master was particularly good quality. It was a gorgeous recording, but it had obvious flaws – e.g. background noise, and some audible edits. Also, I didn’t hear any difference between analogue, 96k and 192k.
I can’t explain why 44.1kHz and 48kHz sounded worse, but they did. No one responsible for the demo had any reason to rig the results, and I played with enough of the equipment to know that everything was above board and fair, even though some of the cables we used might not have met with audiophile approval. ...
If the two soundcards have clock frequencies that differ by only 0.001% (10 ppm), the phase shift between them will reach 1/96000 second after about one second, and will grow by that amount every second thereafter.
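A quick back-of-the-envelope check of that figure (a minimal Python sketch; the 10 ppm offset and the 1/96000 s target are the values from the text):

```python
# How long until two clocks that differ by 10 ppm drift apart by half a
# 48 kHz sample period (1/96000 s)?
ppm_offset = 10e-6            # 0.001% = 10 ppm relative frequency error
drift_per_second = ppm_offset # seconds of skew accumulated per elapsed second
target_skew = 1 / 96000       # half a sample period at 48 kHz, in seconds

seconds_to_target = target_skew / drift_per_second
# about 1.04 s, matching the "after only one second" claim above
```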
But there is another concern that is sometimes raised, beyond mere frequency response. It is a concern about relative timing and phase.
Arguably, if 96kHz is used, any natural or artificial reverberation can be richer, because the instantaneous wave cancellations are subtly recorded and reproduced without the constraint of a coarser time structure. For example, the volume levels of different recorded tracks could be changed when creating a new mix, and this could generate a whole new set of complex phase additions and cancellations – arguably more complex than if 48kHz had been used when recording.
Put another way, if an analogue source is captured simultaneously at 48kHz by two soundcards that are not locked in phase with each other, one card may be triggered by its sampling oscillator to take its sample* as much as 1/96000th of a second after the other.
In such a case, will the played-back sound be perceptibly different in an A/B comparison?
This could be similar to comparing the sound from two microphones placed a distance apart equal to the distance sound travels in 1/96000th of a second. At 25 degrees Celsius, sound travels at about 346 m/s, so in 1/96000 sec it would travel about 3.6 mm, or a bit over a third of a centimetre.
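The arithmetic checks out (a trivial Python sketch using the 346 m/s figure quoted above):

```python
speed_of_sound = 346.0   # m/s in air at about 25 degrees Celsius
dt = 1 / 96000           # the timing skew under discussion, in seconds

distance_mm = speed_of_sound * dt * 1000
# about 3.6 mm, i.e. a bit over a third of a centimetre
```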
A similar small difference due to sampling phase could also apply when downsampling a 192kHz recording to 48kHz. There will be 4 samples at 192kHz for every 1 at 48kHz. What if a 192kHz recording has 2 samples shaved off the start of it? If it is then converted to 48kHz, it will give a slightly different result compared with a version that has not been shaved before conversion to 48kHz. Subtraction of the two conversions will leave a small residue. But will the two conversions sound different to the ear in an A/B comparison?
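A minimal Python sketch of that residue, under simplifying assumptions: ideal (pointwise) sampling of a single bandlimited 1kHz test tone, so that shaving two 192kHz samples before decimating is equivalent to sampling at 48kHz with a 2/192000 s time offset. For a sine the residue is itself just a small sinusoid at the same frequency:

```python
import math

f = 1000.0               # 1 kHz test tone, well below Nyquist
fs = 48000.0             # target sample rate
offset = 2 / 192000.0    # two shaved 192 kHz samples = about 10.4 microseconds
N = 480                  # compare 10 ms worth of 48 kHz samples

a = [math.sin(2 * math.pi * f * (n / fs)) for n in range(N)]
b = [math.sin(2 * math.pi * f * (n / fs + offset)) for n in range(N)]

# Largest sample-by-sample difference between the two 48 kHz versions
residue = max(abs(u - v) for u, v in zip(a, b))

# For a sine of unit amplitude, the residue amplitude is 2*sin(pi*f*offset):
predicted = 2 * math.sin(math.pi * f * offset)
# about 0.065 here, and it shrinks in proportion to frequency and offset
```

The residue is real but tiny, and (for any single tone) it is just a phase-shifted copy of the same signal, which is why it corresponds to the "move your head a few millimetres" comparison made below.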
Even if they do sound different, is this not comparable with the difference we experience when we move our head back by a third of a centimetre (not when listening on headphones) – a practically negligible difference?
Are there any situations where it could make a material difference to the listening experience if the sound is captured at 48kHz and not, say, 96kHz?
2Bdecided, I have looked through the FAQs, but there is not a lot that seems conclusive. Several times in my searches I came across this interesting report of yours from 5 years ago, which seems quite relevant to the current topic:
There are only two slightly credible explanations: human ears don't quite work in the way we think, or the well known and understood imperfections in real equipment combine together to create audible differences.

There is an even more relevant point: no one has ABXed CD vs anything else, except by using seriously faulty equipment or by turning the volume up so high on near-silent passages that "normal" recordings would deafen you.
But I wonder whether there are any recent tests with highly evolved equipment that have concentrated on the 44.1 vs 48 vs 96+ issue with samples designed to highlight differences.
I could imagine that if six violinists each played in front of their own microphone and the sound was mixed in analogue, the result would be quite complex. If each of the six sources were separately converted to digital at just 44.1kHz and mixed digitally with the other violins, each at 44.1kHz, the result seems likely to be different compared with sampling each at, say, 96kHz and mixing – even if the final mixdown is to 44.1kHz.
Of course we could ask 'why bother to use a separate ADC for each microphone?': just mix in an analogue mixer.
If we really are sure that 96kHz is of no benefit now, are recording engineers using it just in case it may make a difference with loudspeakers of the future; or is the use of 96kHz driven by (i) flawed technical assumptions, and/or (ii) a market demand fostered by advertising hype?
96kHz ADCs are less likely to be plagued by the analogue anti-aliasing filter they need to include. You may (relatively) easily design something like SSRC's lowpass in software, but it is virtually impossible with an analogue circuit.
Quote from: Martel on 15 May, 2008, 04:46:28 AM
96kHz ADCs are less likely to be plagued by analog antialiasing filter, which they need to include. You may (relatively) easily design something like the SSRC's lowpass in software but it is virtually impossible using analogue circuit.

That's why almost all modern ADCs use oversampling and digital filtering. I think it's the need for low latency that restricts the complexity of digital filtering in recording equipment.
No, because the output of reconstruction will be the same in both cases, given a bandlimited signal. Kotelnikov's original paper (one of the first in the field) actually discusses this, and it can be proven without difficulty. There is a small theoretical problem with the turn on condition (the beginning of time), but this can be ignored in audio.
To summarise, the systems involved are linear, and it doesn't make any difference whether you have 6 ADCs and a digital mixer, or an analogue mixer followed by 1 ADC.
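The linearity point can be illustrated numerically. Below is a minimal Python sketch assuming ideal pointwise sampling (no quantisation, no filter imperfections – deliberately idealised): two bandlimited sources sampled separately and mixed digitally produce exactly the same sample stream as mixing in the analogue domain first and sampling once.

```python
import math

fs = 48000                    # sample rate
N = 1000                      # number of samples to compare

def x(t):                     # one bandlimited source (a 440 Hz tone)
    return math.sin(2 * math.pi * 440 * t)

def y(t):                     # another source (1234 Hz, arbitrary phase)
    return 0.5 * math.sin(2 * math.pi * 1234 * t + 0.3)

# Situation 1: sample each source with its own ideal ADC, then mix digitally
digital_mix = [x(n / fs) + y(n / fs) for n in range(N)]

# Situation 2: mix in the analogue domain first, then sample with one ideal ADC
def analogue_mix(t):
    return x(t) + y(t)

sampled_mix = [analogue_mix(n / fs) for n in range(N)]

# The two sample streams are identical: ideal sampling is linear
worst = max(abs(a - b) for a, b in zip(digital_mix, sampled_mix))
```

This only shows that ideal sampling commutes with addition; the stronger claim – that reconstruction of the two streams is also identical – follows from the sampling theorem for bandlimited inputs, as noted above.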
I've noticed in several other threads in other forums that when a "What about different phases?" question is raised, it is dealt with by reference to steady waveforms and Nyquist. The argument goes that you can represent waveforms accurately by sampling at twice the maximum frequency of the Fourier series for a particular source. What is not dealt with is the quality of representing interactions between waveforms from independent sources with continuously varying phase relationships. (I imagine I would not be in a position to understand a detailed mathematical explanation anyway!) Perhaps my query does seek to explore the "turn on condition".
Quote from: 2Bdecided on 15 May, 2008, 06:34:28 AM
To summarise, the systems involved are linear, and it doesn't make any difference whether you have 6 ADCs and a digital mixer, or an analogue mixer followed by 1 ADC.

I do not understand the beginning of the explanation, as these formulae appear to anticipate the conclusion reached:
[blockquote]Situation 1: 2 ADCs, digital mixing
final output = f(x) + f(y)

Situation 2: analogue mixing, 1 ADC
final output = f(x+y)[/blockquote]
They seem to be declarations that a sampled output resolves to the same thing as an analogue output, for bandlimited input.
Let me see if I can provide an analog-domain equivalent to what we are discussing (and somebody correct me if I'm wrong).

Let's say that you start with some waveform, and then you add a 22.05 kHz sine wave to it. Now lowpass the result to 22049 Hz.

You will now have one of two things. Either the original waveform had no content above 22049 Hz, in which case you have back the original waveform, no matter how complex it was; or else the original waveform had content above 22049 Hz, in which case you now have intermodulation products between the original waveform and the 22.05 kHz sine wave.
A 44.1 kHz digital waveform PERFECTLY describes ANY signal (or mixture of signals), including phase, from 0 to 22049 Hz, if you do not consider the distortion caused by the finite number of amplitude quantization steps.

Just looking at the waveform, you might get suspicious about accuracy at frequencies near the Nyquist one, since the signal hardly gets 3-4 samples per period. Try zooming in on the waveform in Cool Edit up to sub-sample accuracy. There you will see some interpolated points between actual samples. These are calculated solely by upsampling. No information is lost, and you may recalculate the "missing" samples at any time. This "upsampling" also happens naturally in the DAC upon conversion to the continuous-time domain.
I do get suspicious when I look at a digital mixdown of 19kHz and 20kHz sine waves that were created at 44.1kHz. There are so few sample points, and yet, as you say, Cool Edit manages to create a realistic graphical interpolation (with this relatively simple waveform).
In contrast, when I look at 19kHz and 20kHz sine waves created at 96kHz and mixed digitally in Cool Edit, there are so many more sample points in the mixdown that sophisticated interpolation would not be necessary: you could simply join the dots with a most basic form of integration (a resistor and capacitor). The undulations in overall amplitude at a rate of 1kHz appear relatively smooth at this higher sampling rate. I could readily imagine this undulating signal surviving despite the addition of other high-frequency signals into the digital mix, each needing to be 'interpolated'.
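That intuition – that the sparse-looking 44.1kHz dots might not pin down the 1kHz beating between 19kHz and 20kHz tones – can be tested numerically. Below is a Python sketch, assuming ideal sampling and using a textbook DFT zero-padding upsampler (not Cool Edit's own interpolator); the 10 ms window is chosen so both tones complete a whole number of cycles, which makes the DFT exact. It upsamples a 10 ms, 44.1kHz capture of the 19kHz + 20kHz mix to 96kHz and compares it against a direct 96kHz capture.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive O(N^2) inverse DFT (includes the 1/N scale)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N, M = 441, 960                       # 10 ms at 44.1 kHz and at 96 kHz

def mix(t):                           # the 19 kHz + 20 kHz test mix
    return math.sin(2 * math.pi * 19000 * t) + math.sin(2 * math.pi * 20000 * t)

x = [mix(n / 44100) for n in range(N)]          # the sparse 44.1 kHz capture

# Upsample by spectral zero-padding: keep the N original bins, insert zeros
# for the frequencies between 22.05 kHz and 48 kHz that a 44.1 kHz capture
# cannot contain anyway.
X = dft(x)
Y = [0j] * M
half = (N - 1) // 2                   # highest positive-frequency bin (N odd)
for k in range(half + 1):
    Y[k] = X[k]                       # positive frequencies
for k in range(half + 1, N):
    Y[M - N + k] = X[k]               # negative frequencies

y = [(M / N) * v.real for v in idft(Y)]         # reconstructed 96 kHz samples

ref = [mix(m / 96000) for m in range(M)]        # a direct 96 kHz capture
err = max(abs(a - b) for a, b in zip(y, ref))
```

The reconstruction error comes out at the level of floating-point round-off: under these idealised conditions, the sparse 44.1kHz samples carry exactly the same information about this mix – 1kHz beats and all – as the dense 96kHz version.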