Topic: _Decibeled_ human hearing (Read 7445 times)

_Decibeled_ human hearing

Hello everybody,

I've been reading and studying some material about the psychoacoustics of hearing, and, as far as I've understood, the human ear behaves in a logarithmic way, which is the reason the dB scale is used to measure the intensity (or power) of a sound. So here is my doubt: does the previous statement mean that a change of less than 1 dB will be imperceptible to an average listener? (I know this is not a _binary_ effect; what I really mean is that changes approaching 1 dB will be practically (or really) distinguishable, while changes much smaller than 1 dB will be almost or completely indistinguishable.) I'm also supposing we are working at a level above the Absolute Threshold of Hearing (if below, not only will the changes be inaudible, but so will the sound itself).

Am I right or did I get it wrong?

I'm sorry for just posting to get someone to confirm or correct my thoughts, but I really need to be sure about this.

Thank you.


Reply #1
You are correct. I don't recall the exact difference in loudness in dB that is considered the minimum audible difference, but it is indeed on the order of 1 dB.


Reply #2
So here is my doubt: does the previous statement mean that a change of less than 1 dB will be imperceptible to an average listener? (I know this is not a _binary_ effect; what I really mean is that changes approaching 1 dB will be practically (or really) distinguishable, while changes much smaller than 1 dB will be almost or completely indistinguishable.)


Well, I don't see why logarithmic hearing implies that the resolution is 1 dB rather than 0.5 or 2 dB (or anything, really), but it's often said to be about 1 dB or a little less.

FWIW I once tried ABXing replaygain volumes and was able to detect changes of 0.5 dB with great effort.
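For anyone wanting to reproduce this kind of test, a ReplayGain-style level change is just multiplication by 10^(dB/20) in the sample domain. A minimal sketch in Python (NumPy assumed; the test tone is made up for illustration):

```python
import numpy as np

def apply_gain_db(samples, gain_db):
    """Scale samples by gain_db decibels (amplitude scale, hence the /20)."""
    return samples * 10 ** (gain_db / 20)

# One second of a 440 Hz test tone at 44.1 kHz
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = apply_gain_db(x, -0.5)                      # 0.5 dB quieter

print(np.max(np.abs(y)) / np.max(np.abs(x)))    # ~0.944
```

Writing the two arrays out as WAV files and ABXing them is essentially the test described above.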



Reply #3
Hello everybody,

I've been reading and studying some material about the psychoacoustics of hearing, and, as far as I've understood, the human ear behaves in a logarithmic way, which is the reason the dB scale is used to measure the intensity (or power) of a sound. So here is my doubt: does the previous statement mean that a change of less than 1 dB will be imperceptible to an average listener? (I know this is not a _binary_ effect; what I really mean is that changes approaching 1 dB will be practically (or really) distinguishable, while changes much smaller than 1 dB will be almost or completely indistinguishable.) I'm also supposing we are working at a level above the Absolute Threshold of Hearing (if below, not only will the changes be inaudible, but so will the sound itself).

Am I right or did I get it wrong?

I'm sorry for just posting to get someone to confirm or correct my thoughts, but I really need to be sure about this.

Thank you.



The decibel scale is a useful scale for human hearing, but it is not based on human hearing.

1 dB is a very small change in level and is hard to hear in many situations. 3 dB is more obvious but still quite small.
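For concreteness, here are those dB figures translated into linear ratios (Python, standard library only):

```python
import math

def db_to_power_ratio(db):
    return 10 ** (db / 10)       # dB is 10*log10 of a power ratio

def db_to_amplitude_ratio(db):
    return 10 ** (db / 20)       # amplitude is the square root of power

print(db_to_power_ratio(3.0))      # ~2.0:  3 dB is roughly double the power
print(db_to_amplitude_ratio(1.0))  # ~1.12: 1 dB is only ~12% more amplitude
```

A "very small" 1 dB step is thus about a 12% amplitude change, which helps explain why it sits near the edge of audibility.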


Reply #4

A lot depends on the frequency distribution of the dip or peak. A 1 dB prominence over several octaves where the ear is most sensitive can, if memory serves me, be clearly heard under double-blind conditions. On the other hand, a very narrow 5 dB dip may be inaudible. I believe that broadband differences of well less than 1 dB have been heard under double-blind conditions.

As always, correction by the truly knowledgeable would be welcome.
Ed Seedhouse
VA7SDH


Reply #5
Thank you very much.

One more doubt, then. Given an (acoustic) signal, its mask (noise or tone mask) has no correlation with the perceptibility of a variation in the intensity of the signal (say an increase or decrease of x dB) at a given frequency, apart from two cases: if the signal is far below the mask, a change is probably not perceptible, but only because the signal itself is not audible; and if the resulting intensity crosses the mask, the signal at that point will become audible (if it was below the mask and we increase it) or inaudible (if it was above the mask and we decrease it). Is that right? By "correlation" I mean that the mask tells us nothing about the effect that a change at a given frequency will have.

Thanks again.


Reply #6
One more doubt, then. Given an (acoustic) signal, its mask (noise or tone mask) has no correlation with the perceptibility of a variation in the intensity of the signal (say an increase or decrease of x dB) at a given frequency, apart from two cases: if the signal is far below the mask, a change is probably not perceptible, but only because the signal itself is not audible; and if the resulting intensity crosses the mask, the signal at that point will become audible (if it was below the mask and we increase it) or inaudible (if it was above the mask and we decrease it). Is that right? By "correlation" I mean that the mask tells us nothing about the effect that a change at a given frequency will have.


It's only occasionally that a lossy encoder will completely silence a frequency component because it deems it to be either fully masked or completely devoid of influence on human hearing.

This is how I originally, naively imagined lossy coders worked: producing a frequency spectrum with a load of zeroes (easy to store in little space) and a few spikes for the small number of sinusoids present, which might be encoded efficiently thanks to harmonics. Real-world music signals, transform overlapping and sampling resolution, plus a good deal of broadband noise and transients, mean that sampled digital audio rarely looks that simple when frequency-analyzed. My naive way of envisaging it was wrong.
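You can see why the "mostly zeroes" picture fails by frequency-analyzing even a toy signal with a little noise and a transient; hardly any bins come out near zero. A sketch (NumPy assumed; the signal is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100
t = np.arange(4096) / fs

# Toy "musical" block: three harmonics, broadband noise, a crude transient
x = (0.5 * np.sin(2 * np.pi * 220 * t)
     + 0.3 * np.sin(2 * np.pi * 440 * t)
     + 0.2 * np.sin(2 * np.pi * 660 * t))
x += 0.05 * rng.standard_normal(t.size)
x[2000:2010] += 0.8

spectrum = np.abs(np.fft.rfft(x * np.hanning(t.size)))
sparse_fraction = np.mean(spectrum < 1e-3 * spectrum.max())
print(f"fraction of near-zero bins: {sparse_fraction:.2f}")
```

Almost every bin carries some energy from the noise floor, so simply zeroing "inaudible frequencies" isn't where most of the coding gain comes from.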

If we consider an MP3 encoder, for example (somewhat crudely, but to make the point) aiming for just-transparent encoding:
• the audio frame is divided into many sub-bands in the time domain, each of which has fewer samples that represent a frequency band (with soft edges).
• each subband is transformed by a DCT to produce DCT coefficients (effectively each is a frequency component or frequency, though it's not quite so simple).
• Each DCT coefficient starts as a floating point number but eventually has to be expressed to a certain binary precision (e.g. 2-bit, 3-bit, ...8-bit, 16-bit) with some scaling factor. This limited precision creates rounding error which is mathematically equivalent to adding noise to the true value of that coefficient. The scaling factor adjusts the absolute level of this noise. It is called quantization noise and its statistics and spectral characteristics are well understood in relation to the bit-depth applied.
• The precision (bit-depth) chosen to encode the DCT coefficients in each sub-band is such that the frequency-components of the quantization noise are kept just below the lowest masking level within that sub-band (the masking level was determined by the analysis process, using FFT, according to the psychoacoustic model of human hearing). Reducing the bit-depth like this means the audio can be encoded using fewer bits (the aim of lossy coding). The masking implies that the original signal with the added quantization noise and all its spectral variations should be indistinguishable from the lossless original.
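The "rounding error behaves like added noise whose level tracks the bit-depth" point can be illustrated directly: quantize some stand-in coefficients at different precisions and measure the signal-to-noise ratio, which improves by roughly 6 dB per extra bit. A sketch (NumPy assumed; random data stands in for real transform coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.standard_normal(4096)          # stand-in for DCT coefficients

def quantize(x, bits):
    """Uniform quantizer: pick a scale factor so the largest value fits,
    then round each coefficient to the nearest step."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

for bits in (4, 8, 12):
    noise = quantize(coeffs, bits) - coeffs
    snr_db = 10 * np.log10(np.mean(coeffs ** 2) / np.mean(noise ** 2))
    print(f"{bits:2d} bits -> SNR {snr_db:5.1f} dB")
```

An encoder effectively runs this trade-off in reverse: it picks the fewest bits whose noise still lands below the masking level.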

Unlike MP3, some other codec working in its "transparent" region might not use sub-bands but instead choose the precision with which individual frequency components are stored after, say a full-spectrum transform into the frequency domain.

Or, like MP2 and MusePack, it might not use the DCT, but instead quantize time-domain samples within the sub-bands to the required minimum number of bits to keep the added noise masked. Essentially this divides the signal into frequency bands, encoding each band separately with the lowest bit-depth you can get away with so you can save bits. The decoder then reconstitutes the numerous bands into the full spectrum, which will have added noise that varies somewhat over the frequency spectrum but should remain below the masking threshold.

LossyWAV goes a step further in simplification by reducing the bit-depth of the PCM signal directly, having looked at the spectrum and determined the "noise floor" for each block, which can be directly related to the level of quantization noise that can be added without significantly increasing that noise floor. But because it adds essentially white noise (or only lightly shaped noise), lossyWAV cannot efficiently choose to hide more noise at frequency bands where there's a good masking tone, so it won't come close to matching the low bitrates of MP2, MP3 or MPC, for example, when they achieve transparency.
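The PCM bit-depth reduction idea can be sketched in a few lines: round samples to a coarser grid, and the added error is roughly white noise at a level set by the step size. This is only an illustration of the principle, not lossyWAV's actual algorithm:

```python
import numpy as np

fs = 44100
t = np.arange(4096) / fs
pcm = 0.5 * np.sin(2 * np.pi * 1000 * t)     # floats in [-1, 1) for simplicity

def reduce_bits(x, bits):
    """Round samples to a bits-bit uniform grid spanning [-1, 1)."""
    step = 2.0 / (2 ** bits)
    return np.round(x / step) * step

noise = reduce_bits(pcm, 10) - pcm
print(f"added noise RMS: {20 * np.log10(np.std(noise)):.1f} dBFS")
```

The error stays bounded by half a step, so each bit removed raises the (roughly white) noise floor by about 6 dB.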

The added noise could be calculated in the linear domain and compared to the mask in the linear domain, or they could both be expressed in decibels relative to some reference.

Think of Test signal A as a strong sinusoid at, say, 3.0 kHz, with another fairly strong and audible sinusoid at 2.5 kHz, where the masking effect is strong.
For Test signal B, the second sinusoid is instead at 2.0 kHz, where masking is somewhat weaker.
Signals A and B sound different, but now we can modify each of them by adjusting the amplitude of the quieter sinusoid while leaving the 3 kHz one the same.

Now consider varying the amplitude of the lower-frequency sinusoid by 0, 1, 2, 3, 4 or 5 dB at random and ABXing the difference between the original and the changed amplitude of the lower-frequency tone, to determine the threshold of audible change. You can generate tones in various programs to try this out for yourself.

The masking effect would lead one to expect a higher threshold of audible change for the 2.5 kHz tone (ABXed against the original 2.5 kHz tone) than for the 2.0 kHz tone (ABXed against the original 2.0 kHz tone). We could test this to find out.
Dynamic – the artist formerly known as DickD


Reply #7
as far as i know, the original unit for sound was the "bel", named after Alexander Graham Bell, with 1 bel corresponding to a tenfold increase in power (2 bels being 100x, 3 bels 1000x, etc.). of course this proved too coarse a measurement, so they took a tenth of a bel, or a "deci"-bel ("deci" being the metric prefix for a tenth, e.g. decimeter), and that became the standard
My $.02, may not be in the right currency


Reply #8
One more doubt then. Given a signal (acoustic, of course), it's mask (noise or tone mask) has no correlation with the perceptibility of a variation in the intensity (say a increase or decrease of x dB) of the signal at a given frequency, besides that a change when the signal is far below the mask, it is probably not perceptible, but because the signal itself is not audible, and besides that if the resulting intensity crosses the mask, then the signal at that point will become audible (if was below the mask an we increase it) or inaudible (if was above the mask an we decrease it), right? With correlation I mean that the mask tells us nothing about the effect that a change at a given frequency will have.


It's only occasionally that a lossy encoder will completely silence a frequency component because it deems it to be either fully masked or and completely devoid of influence on human hearing.



Uh, no.  Having read the statistics of just a FEW (thousand or more) encodings of audio signals, I am quite comfortable to say that "zero" is a very common result of the quantization in an MDCT in AAC or MP3, or in 1996-era AT&T PAC.

Not supposition, simple measurement.

All you need to do is look at the AAC codebooks to see evidence of this statistic.

As to loudness, it grows as the 1/3.5 power of the energy, assuming that you do not change the frequency content of the signal and that the same parts of the signal remain above the threshold of audibility.

If you change the bandwidth of a signal, however, the rules are very different.

Typical units of loudness (as opposed to intensity, of which dB is one such measure) are sones or phons. dB is not a measure of loudness.
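Taking the 1/3.5 exponent quoted above at face value, a quick calculation shows why "twice as loud" is often said to need roughly a 10 dB increase:

```python
import math

# loudness ~ intensity ** (1 / 3.5), so doubling loudness requires
# an intensity ratio of 2 ** 3.5
ratio = 2 ** 3.5                      # ~11.3x the intensity
db = 10 * math.log10(ratio)
print(f"doubling loudness ~ +{db:.1f} dB")   # ~ +10.5 dB
```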
-----
J. D. (jj) Johnston


Reply #9
Thank you very much, Dynamic, your post has been very helpful.

This is the first time I've heard about this ABX thing, but I've searched for some info about it and it looks very interesting and useful for what I asked. I'll keep digging into it. Thanks.


Reply #10
Thank you, Woodinville.

I may have found what I was looking for. Searching for further information concerning what I've learned from your posts, I found some links and texts referring to the A-weighting and ITU-R BS.468 weighting curves. If I haven't misunderstood them, what those weightings do is precisely to establish a subjective weighting (in frequency) of the distortions in audio signals in accordance with the human way of hearing.

As I read about some drawbacks of A-weighting ("the A-weighting curve underestimates the role low-frequency noise plays in loudness, annoyance, and speech intelligibility"), I think ITU-R 468 may be a better option for weighting the allowed distortions in a somewhat modified audio signal. Reading ITU-R BS.468, it suggests some maximum tolerances in dB which range from 2 to 0 dB, with the minimum around 6 kHz.
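For reference, the A-weighting curve has a simple closed form (the IEC 61672 formulation), which makes the low-frequency de-emphasis that the quote complains about easy to see. ITU-R 468 is specified as a filter response table rather than one formula, so only A-weighting is sketched here:

```python
import math

def a_weight_db(f_hz):
    """A-weighting in dB (IEC 61672 form), normalized to ~0 dB at 1 kHz."""
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00

for f in (100, 1000, 6300):
    print(f"{f} Hz: {a_weight_db(f):+.1f} dB")
```

Low frequencies are attenuated by tens of dB, which is exactly the underestimation of low-frequency noise that ITU-R 468's heavier mid/high-frequency emphasis was designed to address.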


Reply #11
Thank you, Woodinville.

I may have found what I was looking for. Searching for further information concerning what I've learned from your posts, I found some links and texts referring to the A-weighting and ITU-R BS.468 weighting curves. If I haven't misunderstood them, what those weightings do is precisely to establish a subjective weighting (in frequency) of the distortions in audio signals in accordance with the human way of hearing.

As I read about some drawbacks of A-weighting ("the A-weighting curve underestimates the role low-frequency noise plays in loudness, annoyance, and speech intelligibility"), I think ITU-R 468 may be a better option for weighting the allowed distortions in a somewhat modified audio signal. Reading ITU-R BS.468, it suggests some maximum tolerances in dB which range from 2 to 0 dB, with the minimum around 6 kHz.



You're getting there. If you go to www.aes.org/sections/pnw/ppt.htm you'll find a "loudness tutorial" that will lead on from there.
-----
J. D. (jj) Johnston



Reply #13
So here is my doubt: does the previous statement mean that a change of less than 1 dB will be imperceptible to an average listener? (I know this is not a _binary_ effect; what I really mean is that changes approaching 1 dB will be practically (or really) distinguishable, while changes much smaller than 1 dB will be almost or completely indistinguishable.)


Well, I don't see why logarithmic hearing implies that the resolution is 1 dB rather than 0.5 or 2 dB (or anything, really), but it's often said to be about 1 dB or a little less.

FWIW I once tried ABXing replaygain volumes and was able to detect changes of 0.5 dB with great effort.


Sounds about right. 1 dB can be a fairly easy level shift to hear; 0.3 dB is darn near impossible, or simply impossible, regardless.

Our ability to hear loudness differences depends on the level at which we are listening, the amplitude variations, and the power spectral density of the signal.

The ear's sensitivity to changes peaks at a level that varies with the individual, but is near 85 dB.

Sounds whose level and/or  spectral content changes a lot and quickly tend to reduce our ability to hear level differences.

Sounds near the high and low frequency extremes also tend to reduce our ability to hear level differences. So sounds with the midrange sucked out can conceal larger level differences.


Reply #14
Having read the statistics of just a FEW (thousand or more) encodings of audio signals, I am quite comfortable to say that "zero" is a very common result of the quantization in an MDCT in AAC or MP3, or in 1996-era AT&T PAC.
The result on the output isn't like a filter though.

What I mean is, especially in mp2 and mp3, you can often see (and hear!) that one of the 32 frequency bands was switched off for one or more blocks.

Whereas zeroing one or more MDCT coefficients doesn't usually have quite such a clean effect on the output signal, since aliasing, overlap, etc. prevent you from seeing/hearing the "hole".

I realise it's only a matter of degree, but in practice the results look and sound different.


I think what the OP is getting at is that many over-simplistic explanations of psychoacoustic coding simply talk of "removing inaudible frequencies" only - whereas for the most part, psychoacoustic codecs are approximating (quantising, adding noise to) all frequencies.

(Not trying to teach my grandmother to suck eggs here  - I know you know 1000000x better than me, and I'm sure you've seen many more common misconceptions than me!)

Cheers,
David.