Old audio recordings in high frequencies and bitrates

@dc2bluelight :
You cannot directly compare a film image (even less film video) to audio, because the latter is one-dimensional while the others are two-dimensional.

- With image, you have pixel size (which is what you are talking about), pixel intensity (which is luminosity) and color spectrum (of which we are basically interested in the visible part, as with audio frequency)

Actually, no, I wasn't talking about pixel size. You have a number of pixels relative to the total area of the captured image. You end up with a number of pixels per unit area, which is roughly similar to the number of samples taken per unit time in audio. The specific pixel data is different in nature, but they are still samples, and pixels can also vary in bit depth. The only real difference is that a monochrome image has single pixels, each with only a luminance value, whereas in a color image each pixel is actually 3 pixels, one for each primary color.

- With audio, you only have audio bit depth (amplitude precision, determining the SNR) and audio sample rate (determining the audible range, and limited only on the upper side).

Just like a monochrome image, or a single color channel. No difference at all.

Color spectrum width is fixed in digitized images (to the visible part), much like the audio frequency range is on digitized audio (the audible range).

No, color spectrum is just 3 monochrome samples, each with 2-dimensional information: a position and a value. Mono audio is also 2-dimensional: value and time. Stereo audio is 2 samples and time; 5.1 is six samples and time. It compares directly with image capture when you consider different information channels.

Usually higher audio sample rates are only of use when altering the sampled content itself, not when reproducing it.

No, sample rates never alter the sampled content. A higher sample rate permits a higher maximum sampled frequency. But sample rates of sampled and reconstructed audio must be identical, and are definitely in use in both processes. If you ignore sample rate during reconstruction you do not get proper reconstruction, you get a mess.

Some people also think that sampling at a higher rate increases the temporal resolution, but the phase of the signals is preserved when sampled, which means that inter-sample positions do exist, and come back after the reconstruction filter on the output. (So either they look for a frequency that is outside of the sampled bandwidth, or incorrectly assume that sampled audio is quantised in the time domain.)

Yes, misconceptions abound. Interesting, but tangential.

Pixel intensity and color spectrum quantization (number of colors) could be compared to audio bit depth.

No, pixel intensity is exactly bit depth. Color spectrum quantization is bit depth times color channel count. Just like multichannel audio.

Sampling rate is not comparable to spectrum quantization. It is only comparable to reducing the spectrum (as in missing the blue color). As such, low sample-rate audio like old telephone audio is more similar to monochrome images than to a reduction in the number of different colors (like quantization of the color spectrum to 256 colors).

No, you've confused rate (number of pixels per unit area of the image) with bit depth (dynamic range). The number of colors is set by channels times bit depth, which equals dynamic range.
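To put numbers on the channels-times-bit-depth idea: total bits per pixel is channels × bits per channel, and the count of distinct codes is 2 raised to that total. A minimal Python sketch; the `color_stats` helper is just an illustrative name, and the stereo row is there only to show the same arithmetic for multichannel audio:

```python
def color_stats(channels: int, bits_per_channel: int) -> tuple[int, int]:
    """Return (bits per sample point, number of distinct codes)."""
    bits_per_point = channels * bits_per_channel
    # Each extra bit doubles the number of codes, so the count is a power of two.
    return bits_per_point, 2 ** bits_per_point

for label, ch, bits in [("8-bit grayscale", 1, 8),
                        ("24-bit RGB", 3, 8),
                        ("48-bit RGB", 3, 16),
                        ("16-bit stereo audio", 2, 16)]:
    bpp, count = color_stats(ch, bits)
    print(f"{label}: {bpp} bits per sample point, {count:,} distinct codes")
```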

So we are left with pixel size. And that has no comparable thing versus audio.

Pixel size is only important relative to how many pixels you can get per unit area. Pixel size, which is a sensor parameter, relates to the ability of an individual pixel to capture photons, which relates to the noise level of quantization.
Tiny pixels capture fewer photons, so even though quantization may be at 16 bits, the noise level might be on the order of 8 bits. This is where sensor-specific pre-processing makes it all work by applying several forms of noise reduction. The comparison in audio would be digitizing to 24 bits, but with an ADC that has a 20-bit noise floor (like most do). You get the full 24-bit depth, but 4 LSBs' worth of noise.
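The 24-bit word with a 20-bit noise floor can be expressed with the usual ideal-quantizer rule of thumb, SNR ≈ 6.02·N + 1.76 dB for a full-scale sine, and its inverse, the effective number of bits (ENOB). A small sketch; treating the converter's analog noise as equivalent to an ideal 20-bit quantizer is an assumption made for illustration:

```python
def ideal_snr_db(bits: int) -> float:
    """Ideal quantization SNR (dB) of a full-scale sine with an N-bit converter."""
    return 6.02 * bits + 1.76

def effective_bits(snr_db: float) -> float:
    """Effective number of bits (ENOB) implied by a measured SNR."""
    return (snr_db - 1.76) / 6.02

word_length = 24
measured_snr = ideal_snr_db(20)   # assumed: ADC noise floor equivalent to ~20 bits

print(f"{word_length}-bit word, ideal SNR: {ideal_snr_db(word_length):.2f} dB")
print(f"ADC limited to {measured_snr:.2f} dB -> ENOB {effective_bits(measured_snr):.1f} bits")
print(f"LSBs below the noise floor: {word_length - effective_bits(measured_snr):.1f}")
```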

The key factor in the comparison is the number of pixels per unit area, which relates to the degree of detail that can be captured in an image. It's measured in lines or line pairs per unit length, and is directly comparable to frequency in audio, measured in cycles per unit time.

Audio sampled at 44.1 kHz can be resampled to 192 kHz with a very good resampler, and you will get back a smooth signal in the time domain with no added noise in the audible band, but you will gain nothing in the frequency domain.

This is where many people run into trouble. Resampling does not get you smoother signals! Reconstruction filtering gets you smooth signals. You do not increase resolution with up-sampling, you increase the amount of data, but it's just interpolated. The reconstructed signals are always smooth, regardless of sampling frequency or bit depth. The only thing that changes is noise and the maximum available frequency.
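Here's a minimal NumPy sketch of "you increase the amount of data, but it's just interpolated". It uses FFT zero-padding as an ideal reconstruction filter, which is only valid because the test segment holds a whole number of cycles (an assumption of the sketch; practical resamplers use finite filters). The original samples reappear unchanged at every 4th position, and nothing shows up above the old 4 kHz Nyquist limit:

```python
import numpy as np

def fft_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Band-limited upsampling of a real, periodic signal by zero-padding its spectrum."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    # Bins above the old Nyquist frequency stay zero: no new content is created there.
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    if n % 2 == 0:
        # The old Nyquist bin stands for +fs/2 and -fs/2 at once; halve it now that they are distinct.
        padded[n // 2] *= 0.5
    # Multiply by factor so amplitudes survive the longer inverse FFT.
    return np.fft.irfft(padded, n * factor) * factor

fs = 8000                                  # original sample rate (Hz)
t = np.arange(fs) / fs                     # 1 s: a whole number of cycles for each tone
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

y = fft_upsample(x, 4)                     # 32 kHz version of the same signal

# Every 4th upsampled point is an original sample; the points in between are interpolated.
orig_error = np.max(np.abs(y[::4] - x))

# Spectrum of the upsampled signal above the old 4 kHz Nyquist limit.
mag = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1 / (4 * fs))
above_old_nyquist = mag[freqs > fs / 2].max()
print(orig_error, above_old_nyquist)       # both at floating-point rounding level
```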

With images, you can resample to a higher resolution but you will only get blurring or squaring.

No, you get interpolated bits. Nothing is blurred, nothing is squared, that would be a reduction in resolution. Up-sampling of an image does the same thing that up-sampling in audio does. The highest frequency content is not changed, and the added samples are just interpolated.

In fact, in this specific case it would be more similar to audio bit depth, as you cannot improve the SNR when increasing the audio bit depth of an already sampled signal (or increasing its volume).

No. Bit depth is always dynamic range only. Up-sampling does not change dynamic range, nor does it change the maximum frequency of the original.

In both audio and image processing, there is an advantage to resampling with higher bit depth per channel if you plan to do any processing. For example, most final images are 8 bits per color channel (called 24 bit images), but camera sensors digitize at 16 bits. If you attempt to simply apply gain and offset to an image to stretch its contrast range, you get what's known as "banding", more apparent steps in tonal scale. If, however, you access the raw image sensor data, you can apply that same process with less "banding". If you up-sample the image from 8 bits per channel to 16 bits per channel, the gain process now has "room" to interpolate between sampled levels. Nothing beats the raw sensor data, but upsampling then applying dynamics processing provides an advantage.

The parallel in audio is mixing and also dynamics processing. Those processes are always done in the 32-bit floating point format, for exactly the same reasons: you have more data in which to apply processing, even if it's interpolated.
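One concrete reason intermediate precision matters is that rounding after every processing step accumulates. A small sketch with an arbitrary two-step edit (halve the gain, then restore it), once with 8-bit storage between steps and once with float32 intermediates quantized only at the end:

```python
import numpy as np

levels = np.arange(256, dtype=np.uint8)          # every 8-bit code value once

# Integer pipeline: round and store back to 8 bits after each step.
step1 = np.round(levels * 0.5).astype(np.uint8)
int_result = np.clip(np.round(step1 * 2.0), 0, 255).astype(np.uint8)

# Float pipeline: keep the intermediate in float32, quantize once at the end.
f = levels.astype(np.float32) * 0.5
float_result = np.clip(np.round(f * 2.0), 0, 255).astype(np.uint8)

print(len(np.unique(int_result)), "distinct levels after the 8-bit pipeline")
print(len(np.unique(float_result)), "distinct levels after the float pipeline")
```

The 8-bit path ends up with roughly half the tonal steps (visible banding), while the float path gets all 256 back.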

As a final note to remark that the frequency of audio is not quantized: one can compute an FFT of, let's say, 192000 samples of a 48 kHz signal, and you would still get a 24 kHz bandwidth, with a frequency precision (on the plotted graph of this data) of 0.25 Hz. This, of course, gives you a temporal resolution of 4 seconds.
In order to increase the frequency precision, and keep a small temporal resolution, you need to pad the signal with long silence before and after it. It also needs a proper window function to remove the signal jump at silence boundaries. And of course it will need an amplitude compensation.

I'm not sure what you're trying to say. Quantized audio produces data that represents values at single points separated by a defined time interval. That data, when reconstructed using the same time interval, presents the original signal, with amplitude by virtue of the quantized values, and frequency by virtue of the specified time interval. Resampling produces more samples by virtue of the new time interval, but since the new samples are only interpolated, there can be no additional higher frequency content.
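The FFT arithmetic in the note above can be checked in a few lines of NumPy: bin spacing is the sample rate divided by the FFT length, so 192000 points at 48 kHz gives 0.25 Hz bins spanning 4 seconds. The second half illustrates the padding point with a Hann-windowed segment; dividing by the window sum is the amplitude compensation. Zero-padding only interpolates the plotted spectrum between existing bins, so it helps locate a single peak but does not separate two tones closer than the unpadded bin spacing. The 1003 Hz tone and 4096-sample segment are arbitrary choices:

```python
import numpy as np

fs = 48_000
print(fs / 192_000)                              # 0.25 Hz bin spacing for a 192000-point FFT
print(np.fft.rfftfreq(192_000, 1 / fs)[-1])      # top bin sits at the 24 kHz Nyquist limit

def peak_estimate(x, fs, nfft):
    """Return (peak frequency in Hz, amplitude) from a Hann-windowed FFT of length nfft."""
    w = np.hanning(len(x))
    spec = np.abs(np.fft.rfft(x * w, nfft))      # rfft zero-pads x up to nfft points
    k = int(np.argmax(spec))
    # Amplitude compensation: a windowed sine of amplitude A peaks at about A * sum(w) / 2.
    return k * fs / nfft, 2 * spec[k] / w.sum()

n = 4096                                         # ~85 ms analysis segment
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1003.0 * t)               # falls between the unpadded FFT bins

f_plain, a_plain = peak_estimate(x, fs, n)         # bins ~11.7 Hz apart
f_padded, a_padded = peak_estimate(x, fs, 16 * n)  # bins ~0.73 Hz apart
print(f_plain, a_plain)
print(f_padded, a_padded)
```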

Quote from: [JAZ] on 2020-09-16 19:18:35

Additional note to the final note: this first made me think that a higher sampling rate could actually have more detail, but look:
A 1-second, 48 kHz recording converted to an FFT that outputs 24K bands (all the audio samples) would show 24K bands with content.
A 1-second, 48 kHz recording resampled to 192 kHz and converted to an FFT that outputs 96K bands (all the audio samples) would show 24K bands with content, and the rest without (because they are above the original cutoff frequency).
A 1-second, 192 kHz recording converted to an FFT that outputs 96K bands (all the audio samples): the first 24K bands would still be the bands below 24 kHz, the limit of the 48 kHz sampling rate.
So I get more bandwidth with higher sampling rates, but the frequency precision (the width of each band) remains constant.

I think that's a complicated way of saying what I just said.
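The "more bandwidth, same precision" observation can be tabulated directly: for a segment of fixed duration, FFT bin spacing is 1/duration regardless of sample rate, while the number of bins and the top bin frequency scale with the rate. A minimal sketch; `fft_grid` is a hypothetical helper name:

```python
import numpy as np

def fft_grid(fs: int, duration: float):
    """Return (number of rfft bins, bin spacing in Hz, top bin in Hz) for a real signal."""
    n = int(round(fs * duration))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    return len(freqs), freqs[1] - freqs[0], freqs[-1]

for fs in (48_000, 192_000):
    bins, spacing, top = fft_grid(fs, 1.0)
    print(f"{fs} Hz, 1 s: {bins} bins, {spacing:g} Hz apart, top bin {top:g} Hz")
```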

Reply #11 – 2020-09-16 19:18:35

@ajinfla : Probably you didn't use the best image, since that is not a restoration, but a duplication (right-side image imitates the left-side image. You cannot get the second one from the first one using image processing tools of any kind).
If you only implied that audio is cleaned, and defects are corrected, then that can be done only in a way comparable to what a sculpture restoration could do: Either you keep faithful to the original and limit the amount of changes possible, or change it with new materials to correct the broken parts.

Reply #12 – 2020-09-16 20:07:30

@dc2bluelight :
We both have been talking about analog or digital without being clear enough about when we were implying limits or not. (Just as an example: I talked about the light spectral bandwidth and you talked about the digitized RGB pixels.)

There are still some of your comments that seem incorrect to me, but I'm not sure I have the knowledge or time to check if I'm wrong.

The analogy of pixels of an image to channels of audio could, on second thought, be correct, since in multichannel we try to represent 3D space more faithfully than stereo does. (Only binaural audio can properly represent 3D, not standard stereo played on a pair of speakers.) And usually in multichannel audio, the signals of neighboring channels are similar, much like neighboring pixels of an image.

As for the line pairs that you mention, I have to be clear that I don't know what you mean. I assume it is something that I don't know about the composition of the chemicals on the film, but I still don't get where those "dark and light line pairs" come from.
(Given that interleaved video lines are out of the question, because we are talking about film, not about television).

But now I see where your association of bigger image resolution to higher audio frequency comes from.
If you think of the image pixels as "number of samples taken per unit time" (as you said above), then you are making a big mistake.

Audio samples are taken one per period of time; this gives them the characteristics that make the reconstruction filter work.
You can only consider all the pixels of the image as belonging to a single period, not as a progression of samples until the next frame.
And given that, increasing the number of pixels cannot translate to an increased frequency because all pixels represent the same moment.

Oh, and just a clarification: When I said that when resampling, we can obtain a smooth signal on the temporal domain I was implying that the sampled audio has enough information to be able to reconstruct it, so it does not create stairsteps. So we both agree here.

And when I was talking about image upscaling getting blurred or "blocky", I mean that in digitized images, you cannot "zoom in" like we "zoom in" on a waveform. The information is not there.

Reply #13 – 2020-09-16 22:16:01

@ajinfla : Probably you didn't use the best image, since that is not a restoration, but a duplication

I realize that it might seem a bit absurd, but that may have been intentional.

Reply #14 – 2020-09-16 23:40:41

The analogy of pixels of an image to channels of audio could, on second thought, be correct, since in multichannel we try to represent 3D space more faithfully than stereo does. (Only binaural audio can properly represent 3D, not standard stereo played on a pair of speakers.) And usually in multichannel audio, the signals of neighboring channels are similar, much like neighboring pixels of an image.

On the most basic level, it's quantizing the value of a sample (Y) taken along a second axis (X). The only difference is what the X axis represents: in audio it's time, in a still image it's position.

As for the line pairs that you mention, I have to be clear that I don't know what you mean. I assume it is something that I don't know about the composition of the chemicals on the film, but I still don't get where those "dark and light line pairs" come from.
(Given that interleaved video lines are out of the question, because we are talking about film, not about television).

We need some means of evaluating film resolution, so a target made of progressively more closely spaced lines, or line pairs (1 black, 1 white), is photographed. There are many target designs; this is one example:

The maximum resolution point corresponds to the point where the line pair is no longer clearly distinct, and that point refers to a particular spacing of lines in lines per unit length, usually lines per inch or lines per mm.

But now I see where your association of bigger image resolution to higher audio frequency comes from.
If you think of the image pixels as "number of samples taken per unit time" (as you said above), then you are making a big mistake.

No, that's not correct. In either case, audio or image, it's the number of samples per unit of measurement. In audio, the X axis is time; in an image, the X axis is a physical dimension. Increase the number of samples in either one, and you increase the highest frequency that can be quantized. Sure, the X axis is different, but the principle is identical.

Audio samples are taken one per period of time; this gives them the characteristics that make the reconstruction filter work.
You can only consider all the pixels of the image as belonging to a single period, not as a progression of samples until the next frame.

The reason you can't get this is that you're trying to think of the image as a chain of samples. As I've been saying, its X axis, the second data dimension, is NOT time, it's distance. Audio is a time-variant signal. A still image is not time-variant; the variations occur over distance. Sampling theory still works in either case, and so does reconstruction. In audio, the reconstruction filter connects the sample points, which are theoretically infinitesimal, but spaced in finite time increments. In an image, the reconstruction filter is optical. When it is no longer possible to see individual pixels, the eye provides the reconstruction filter. The principle is the same.

And given that, increasing the number of pixels cannot translate to an increased frequency because all pixels represent the same moment.

You have to get your head out of pixels and resolution being time-related. In a still image, they are not. The entire image is sampled at once. Consider one horizontal row of pixels in a digital image. How many vertical lines, one pixel wide, can they represent completely? Exactly half the number of pixels in that row, because to represent a line you also have to represent the absence of a line, creating a "line pair". So if I took a picture that is 100 pixels wide, the most I can represent is 50 line pairs. If I now make my picture 1000 pixels wide, I can represent 500 line pairs. If the physical dimension of the image doesn't change, only the number of pixels, what have I increased by adding pixels? I've increased the number of lines per unit width. Lines per unit width, cycles per unit time: it's all the same thing, only in different domains. The same information theory and sampling theory apply.
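The 100-pixel and 1000-pixel cases can be written out as a tiny sketch: the finest pattern a row can hold is alternating dark and light pixels, which is half as many line pairs as pixels, the spatial counterpart of fs/2 in audio. The 100 mm picture width is a hypothetical value held constant so that lp/mm can be computed:

```python
import numpy as np

def max_line_pairs(pixels: int) -> int:
    """Most black/white line pairs a row of `pixels` samples can hold (Nyquist limit)."""
    return pixels // 2

def finest_pattern(pixels: int) -> np.ndarray:
    """Alternating dark/light pixels: the highest-frequency pattern the row can represent."""
    return np.tile(np.array([0, 255], dtype=np.uint8), pixels // 2)

width_mm = 100.0   # hypothetical physical width of the picture, unchanged between cases
for pixels in (100, 1000):
    lp = max_line_pairs(pixels)
    print(f"{pixels} px: {lp} line pairs = {lp / width_mm:g} lp/mm over {width_mm:g} mm")
```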

Oh, and just a clarification: When I said that when resampling, we can obtain a smooth signal on the temporal domain I was implying that the sampled audio has enough information to be able to reconstruct it, so it does not create stairsteps. So we both agree here.
I don't believe we do agree. First, in a digital audio system there are no actual stair steps. Samples are infinitesimal points, not steps with a horizontal dimension of one sample period. There are no stair steps anywhere in the system, in quantization or reconstruction. Second, upsampling data quantized at one frequency to some higher frequency does not smooth out anything, because it's already smooth. Infinitesimal points are connected to each other, connect-the-dots style, by the reconstruction filter. To resample, there still has to be a reconstruction filter of sorts to interpolate between the original points, adding more points, but not smoothing out anything, nor increasing resolution, nor increasing dynamic range, because all of that is limited by the original data.

There's an excellent demonstration of this in this video: https://youtu.be/cIQ9IXSUzuM
Quote from: [JAZ] on 2020-09-16 20:07:30
And when I was talking about image upscaling getting blurred or "blocky", I mean that in digitized images, you cannot "zoom in" like we "zoom in" on a waveform. The information is not there.

No, that's wrong again. Zooming in on either type of data has the same result. At some point you are down to a single sample point, and can go no further. Some DAW software will show the actual samples as dots connected with lines, some will show them as stair steps, but neither is really there; you only see it that way because you've zoomed in and bypassed the reconstruction filter. It's the same thing in an image. When you magnify so much that you see individual pixels, you've bypassed the visual reconstruction filter.

Reply #15 – 2020-09-17 20:55:03

We do not agree, indeed.

I should be perfectly able to resample 8 kHz audio to 384 kHz, and the result will be correctly represented in the time domain, with each sample being what would have been in the audio if it had been sampled at 384 kHz (properly lowpassed to 4 kHz first).

With images, we have also resampling and interpolation, but you will either get blurring edges or sharp squares when zooming an image 48X.
Maybe it is just that I don't know good enough image resamplers, so I am comparing low-quality image resampling to high-quality audio resampling, but I don't see any "reconstruction filter" limit on audio.

Thanks for the clarification about line pairs. Now I understand that you're talking about a type of measurement, not about the digitization process itself.

Reply #16 – 2020-09-17 23:01:51

3rd from bottom
Audio vs Video - the same or different? Highlights the differences between audio and video perception.

Reply #17 – 2020-09-18 03:44:26

We do not agree, indeed.

I should be perfectly able to resample 8 kHz audio to 384 kHz, and the result will be correctly represented in the time domain, with each sample being what would have been in the audio if it had been sampled at 384 kHz (properly lowpassed to 4 kHz first).

Well, I do agree with that. But there is no "smoothing" in doing that.
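A sketch of the 8 kHz to 384 kHz case both posts agree on: with an ideal reconstruction filter, the resampled points coincide with directly sampling the same band-limited signal at 384 kHz, and nothing gets "smoothed" because the interpolation only evaluates the signal the samples already describe. FFT zero-padding on a 1-second periodic segment stands in here for a real resampler's finite filter, and the three test tones (all below 4 kHz, as if already lowpassed) are arbitrary:

```python
import numpy as np

def fft_upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Ideal band-limited upsampling of a real, periodic signal via spectrum zero-padding."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum       # nothing is added above the old Nyquist limit
    if n % 2 == 0:
        padded[n // 2] *= 0.5                # split the ambiguous old Nyquist bin
    return np.fft.irfft(padded, n * factor) * factor

def tones(t):
    """Band-limited test signal: every component is below 4 kHz."""
    return (np.sin(2 * np.pi * 250 * t)
            + 0.3 * np.cos(2 * np.pi * 1700 * t + 0.4)
            + 0.1 * np.sin(2 * np.pi * 3900 * t))

low_fs, high_fs = 8_000, 384_000
t_low = np.arange(low_fs) / low_fs           # 1 s at 8 kHz
t_high = np.arange(high_fs) / high_fs        # 1 s at 384 kHz

resampled = fft_upsample(tones(t_low), high_fs // low_fs)
direct = tones(t_high)
print(np.max(np.abs(resampled - direct)))    # only floating-point rounding error remains
```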

With images, we have also resampling and interpolation, but you will either get blurring edges or sharp squares when zooming an image 48X.

To understand why this is true, you have to make note of where the reconstruction filter is in the digital imaging system. Find it, and you'll have your answer.

Maybe it is just that I don't know good enough image resamplers, so I am comparing low-quality image resampling to high-quality audio resampling, but I don't see any "reconstruction filter" limit on audio.

Image resampling is no different from audio resampling. You are adding data points between existing data points. They must be interpolated or they won't make any sense.

And to understand where the reconstruction filter is in resampling audio, ask yourself what's doing the interpolation, and where that new data comes from. In answering that question, you'll have the answer you're looking for.

Quote from: ajinfla on 2020-09-17 23:01:51

Thanks for the clarification about line pairs. Now I understand that you're talking about a type of measurement, not about the digitization process itself.

Ask yourself how you would measure the highest frequency that can be digitized in a digital audio system. What signal would you use?

Now ask the same question about the maximum image detail in a digital or film image. What test signal would you use?

The answer is, they are the same signal, but in different types of data.

One more time... sampling theory works regardless of what the data represents, and regardless of what the X axis is: time, space, or some other quantity. Sampling theory can be applied to time-variant or spatially variant signals; it makes no difference to how sampling and reconstruction work. Once that fact is understood, up-sampling, down-sampling, and altering bit depth can be seen to operate identically across different types of data too.
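To make "regardless of what the X axis is" concrete, here is one upsampling routine that takes an `axis` argument and is applied unchanged to a 1-D "audio" array (axis = time) and, row then column, to a 2-D "image" (axes = position). It is a sketch using the same FFT zero-padding idea, so it assumes periodic, band-limited data; in both cases the original samples survive untouched and only in-between points are added:

```python
import numpy as np

def fft_upsample_axis(a: np.ndarray, factor: int, axis: int) -> np.ndarray:
    """Band-limited upsampling along one axis (periodic-data assumption)."""
    n = a.shape[axis]
    spec = np.fft.rfft(a, axis=axis)
    shape = list(spec.shape)
    shape[axis] = n * factor // 2 + 1
    padded = np.zeros(shape, dtype=complex)
    idx = [slice(None)] * a.ndim
    idx[axis] = slice(0, spec.shape[axis])
    padded[tuple(idx)] = spec                # higher-frequency bins stay zero
    if n % 2 == 0:
        idx[axis] = n // 2
        padded[tuple(idx)] *= 0.5            # split the ambiguous old Nyquist bin
    return np.fft.irfft(padded, n * factor, axis=axis) * factor

# A 1-D "audio" signal: 3 cycles over 64 samples.
t = np.arange(64) / 64
audio = np.sin(2 * np.pi * 3 * t)
audio_up = fft_upsample_axis(audio, 4, axis=0)

# A 2-D "image": a smooth diagonal grating over 32x32 pixels.
y, x = np.mgrid[0:32, 0:32] / 32
image = 128 + 60 * np.cos(2 * np.pi * (2 * x + 3 * y))
image_up = fft_upsample_axis(fft_upsample_axis(image, 4, axis=0), 4, axis=1)

print(audio_up.shape, image_up.shape)
print(np.max(np.abs(audio_up[::4] - audio)), np.max(np.abs(image_up[::4, ::4] - image)))
```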

Reply #18 – 2020-09-18 03:49:20

3rd from bottom

Interesting, but a good part of the PPT is about video, in other words, a sequence of still images meant to convey movement. In this discussion we've only talked about still digital images, which hopefully makes the comparison easier.

Many things happen when you present a rapid sequence of images. Just one example: individual frames look softer and blurrier than the sequence. That has been true with film and video forever. But it's all somewhat outside this discussion, and would seriously throw a wrench into something that is already not being understood well.

Reply #19 – 2020-09-18 05:19:39

Let me try :-)

On the image below there are 2 squares. I don't know how big they look on your screen, so let's say you have them on paper and each has a side of 1 inch. The square on the left has less detail than the one on the right. In other words, the left square has less (lower?) frequency content than the right one. For the left square it is enough to sample it at a rate of 2 dots per inch to perfectly capture and then reproduce it. For the right square you need to sample it at a higher frequency of 24 dots per inch.

So for the left square you get 4 samples (pixels) and a 2 DPI sampling rate associated with them. Now you can upsample it by duplicating the samples a few times and associating a 24 DPI sampling rate with it. If you open those 2 images in a viewer that is DPI-aware, they will look exactly the same.

The above is the equivalent of upsampling audio from, let's say, 8 kHz to 96 kHz. I can see 2 reasons why it may be hard to see why they are equivalent:

- It may be hard to switch thinking from "samples per second" to "samples per inch".
- In contrast to audio, where every audio player is aware of the sampling rate, most image viewers are not. I guess that's mostly because the viewing screen has a fixed DPI density.
  - In audio, 8,000 samples at 8 kHz will play for 1 second, and after resampling to 96 kHz the resulting 96,000 samples will still play for 1 second, because the audio card switches its output frequency.
  - In an image, 96x96 samples at 96 DPI will show as a 1 inch x 1 inch square, but after resampling to 192 DPI, instead of still showing as 1 inch x 1 inch, the resulting 192x192 samples will show as 2 inch x 2 inch, because the screen can't switch its density.
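The arithmetic behind both cases is the same division, samples over rate, whether the rate is per second or per inch. A small sketch; the 2x2 checker pattern is a hypothetical stand-in for the left square, and `np.repeat` plays the role of nearest-neighbour ("point") upsampling from 2 DPI to 24 DPI:

```python
import numpy as np

def duration_seconds(samples: int, rate_hz: float) -> float:
    return samples / rate_hz

def size_inches(pixels: int, dpi: float) -> float:
    return pixels / dpi

# Audio: the player follows the sample rate, so the duration is preserved.
print(duration_seconds(8_000, 8_000), duration_seconds(96_000, 96_000))

# Image: duplicate each sample 12x in both directions (2 DPI -> 24 DPI).
square = np.array([[0, 255], [255, 0]], dtype=np.uint8)   # hypothetical 2x2 content
up = np.repeat(np.repeat(square, 12, axis=0), 12, axis=1)
print(up.shape, size_inches(2, 2), size_inches(up.shape[1], 24))   # 1 inch when DPI is honoured

# A fixed 96 DPI screen ignores the new density: 192 pixels show as 2 inches.
print(size_inches(192, 96))
```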

As a practical example, the attached 2x2.png is a 2x2 image at 2 DPI:

Code:

$ identify -units PixelsPerInch -format 'number of samples: %w x %h\nsampling rate: %x dpi x %y dpi\n' 2x2.png
number of samples: 2 x 2
sampling rate: 2.0099999999999997868 dpi x 2.0099999999999997868 dpi

We can resample it to 200 DPI, either with ImageMagick:

Code:

$ convert -units PixelsPerInch 2x2.png -filter point -resample 200 200x200.png

or with Gimp:

and we get 200x200 image at 200 DPI:

Code:

$ identify -units PixelsPerInch -format 'number of samples: %w x %h\nsampling rate: %x dpi x %y dpi\n' 200x200.png
number of samples: 200 x 200
sampling rate: 200 dpi x 200 dpi

The one DPI-aware program that I know of is LibreOffice Writer. I suspect it will work the same in MS Word too. When we insert both 2x2.png and 200x200.png, they will look the same:

Reply #20 – 2020-09-18 18:50:13

You have selected "Interpolation : None".
If we select interpolation none with audio, we get aliasing, which is a bad thing for our purposes.
If we select any other interpolation for the image, we get these, which are different from what you intended to demonstrate.

In the case of audio, the resampling allows the audio to be decoded at a higher sampling rate, and the audio would still sound the same.
In the case of the image, the resampling would allow the image to be shown on a high-DPI screen while looking the same as on a normal-DPI one.

But for the best audio you need a good resampler (else the audio can sound distorted), and for the best image (in this specific scenario) you need a 0-order resampler (interpolation: none) for it to look exactly as it was, or else it would look blurry.

(note: I always understood DPI as the dots inside a square inch, not the dots of the first line of a square inch, but I see that you are correct here.
Maybe it was because we usually have square inches, implying that X and Y have the same amount of dots and, as a simplification, only the X was mentioned.)

Reply #21 – 2020-09-19 04:39:26

You have selected "Interpolation : None".
If we select interpolation none with audio, we get aliasing, which is a bad thing for our purposes.

No, if you don't interpolate and sample to a higher rate, you don't get aliasing at all, you just get the original data with a whole lot of extra duplicates of it between the original points. No aliasing, though.

Aliasing, in sampling, is an intermodulation of the sampling frequency with the sampled signal. That's not happening in resampling, interpolation or not.

If we select any other interpolation for the image, we get these, which are different from what you intended to demonstrate.

Remember, interpolation is just another "connect-the-dots" method. Your up-sampled image isn't blurry, it's interpolated. No information was lost (as in real blurring). All the original information is still there, with faked information added between samples. In fact, though the original might look sharper when you zoom in, it isn't, because zooming in is not a fair display method. You need the reconstruction filter of your vision to make it work.

In the case of audio, the resampling allows the audio to be decoded at a higher sampling rate, and the audio would still sound the same.
In the case of the image, the resampling would allow the image to be shown on a high-DPI screen while looking the same as on a normal-DPI one.

The only thing wrong with the above is the term "DPI", which doesn't apply to a screen at all. DPI applies only to a printed image. "Dots Per Inch". Screen resolution is stated in total pixel dimensions, or some generic name like "1080p" (1920x1080).

But for the best audio you need a good resampler (else the audio can sound distorted), and for the best image (in this specific scenario) you need a 0-order resampler (interpolation: none) for it to look exactly as it was, or else it would look blurry.

Nope. For the best audio you want NO resampler. Resampling accomplishes nothing unless the resampled data has some specific use, like being mixed with other data.

You still aren't understanding image resampling. There is no blurring going on at all; there is interpolation. Blurring = lost detail. Interpolation = adding additional samples between two originals.

You cannot "zoom in" on a resampled digital image to make a judgement like that. If you take an image that displays 6" wide on a screen, viewed at a distance where you cannot see individual pixels (your visual reconstruction filter is "on"), then up-sample it so there are now 4X the number of horizontal pixels, but still display the image 6" wide, you will see no difference. It is not blurry at all; it's identical.

If you took an audio signal sampled at 10kHz (5kHz audio bandwidth) and up-sampled it 4x to 40kHz, the resulting audio spectrum would still be 5kHz wide. Nothing is improved by up-sampling.

(note: I always understood DPI as the dots inside a square inch, not the dots of the first line of a square inch, but I see that you are correct here.

Maybe it was because we usually have square inches, implying that X and Y have the same amount of dots and, as a simplification, only the X was mentioned.)

DPI= Dots Per Inch. It's a measure of printed resolution only, does not apply to digital until it is printed to physical dimensions. A digital image has no physical dimensions.

DPI can apply to scanning relative to an original size, but once the image is digital, unless you print it, DPI is meaningless.

Reply #22 – 2020-09-19 05:54:17

@dc2bluelight :
We both have been talking about analog or digital without being clear enough about when we were implying limits or not. (Just as an example: I talked about the light spectral bandwidth and you talked about the digitized RGB pixels.)

There are still some of your comments that seem incorrect to me, but I'm not sure I have the knowledge or time to check if I'm wrong.

He is however correct.

But now I see where your association of bigger image resolution to higher audio frequency comes from.
If you think of the image pixels as "number of samples taken per unit time" (as you said above), then you are making a big mistake.

Pixels are spatial samples, so they're units of pixels per distance, not per time. Otherwise, same thing. Nyquist still applies and so on.

Audio samples are taken one per period of time; this gives them the characteristics that make the reconstruction filter work.

This is not true. There is actually nothing in the sampling theorem that is specific to time. It applies equally to all types of sampling over any independent variable.

If we select interpolation none with audio, we get aliasing, which is a bad thing for our purposes.

You actually cannot get aliasing from upsampling, either with audio or image data. If you choose none, you'll get "imaging" in both cases (mirroring of the pass band onto higher frequencies). It is the opposite of aliasing (mirroring of higher frequencies onto the pass band).

By the way, "imaging" on an image gives the appearance of pixels, which are artifacts. Interpolation removes them.
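The imaging-versus-aliasing point can be seen in a short NumPy sketch. Upsampling a 1 kHz tone from 8 kHz to 32 kHz by inserting zeros (one way to upsample with no interpolation) mirrors the pass band around multiples of the old sample rate, producing images at 7, 9 and 15 kHz; nothing folds down into the pass band, so there is no aliasing. Low-pass filtering at the old 4 kHz Nyquist limit (an ideal FFT brick-wall here, as a stand-in for a real interpolation filter) removes the images. Repeating samples instead of inserting zeros also produces images, just attenuated by the hold's sinc-shaped response:

```python
import numpy as np

def tone_peaks(sig, rate, rel_threshold=0.05):
    """Frequencies (Hz) whose spectral magnitude exceeds rel_threshold of the largest peak."""
    mag = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), 1 / rate)
    return freqs[mag > rel_threshold * mag.max()]

fs, factor = 8000, 4
t = np.arange(fs) / fs                    # 1 s, so FFT bins fall on exact 1 Hz steps
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz tone, well below the 4 kHz Nyquist limit

# Upsampling with no interpolation: insert zeros between the original samples.
stuffed = np.zeros(len(x) * factor)
stuffed[::factor] = x
print(tone_peaks(stuffed, fs * factor))   # 1 kHz plus images at 7, 9 and 15 kHz

# Interpolation: low-pass at the original Nyquist, then restore the gain lost to the zeros.
spec = np.fft.rfft(stuffed)
spec[np.fft.rfftfreq(len(stuffed), 1 / (fs * factor)) >= fs / 2] = 0
interpolated = np.fft.irfft(spec, len(stuffed)) * factor
print(tone_peaks(interpolated, fs * factor))   # only the 1 kHz tone remains
```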