Topic: Intelligibility of 8kHz sampling rate (Read 3648 times)

Intelligibility of 8kHz sampling rate

Hello. When I downsample a speech clip to 8kHz, it generally loses its intelligibility because some voiceless sounds can't be heard. But when I talk on my cellphone and the system decides to use an 8kHz sampling rate (it sometimes uses 16kHz), speech stays completely intelligible (all voiceless sounds are reproduced correctly). Does the cellphone or the system apply some DSP to enhance the voiceless sounds? If so, what is this process called, so I can investigate it? If not, why does this happen?
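
For context on why 8kHz sampling is hard on voiceless sounds: a sample rate of 8kHz can only represent content below 4kHz (the Nyquist limit), and sibilants such as 's' carry much of their energy above that. The sketch below is a minimal illustration only. The function names `nyquist_limit` and `alias_frequency` are made up for this example, and the 6 kHz tone is just a stand-in for sibilant energy.

```python
def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone shows up after sampling with no anti-alias filter."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for rate in (8000, 16000):
    print(rate, "Hz sampling keeps content below", nyquist_limit(rate), "Hz")

# A 6 kHz component survives at 16 kHz, but at 8 kHz it folds down to 2 kHz
# unless a low-pass filter removes it first (which a proper resampler does).
print(alias_frequency(6000, 16000))  # 6000
print(alias_frequency(6000, 8000))   # 2000
```

Either way, the original 6 kHz content is gone from the 8kHz file: it is filtered out or it turns into an unrelated lower frequency.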

Re: Intelligibility of 8kHz sampling rate

Reply #1
Can you post some examples of the same speech in 16 kHz and 8 kHz, to illustrate what you mean by lost intelligibility?

Re: Intelligibility of 8kHz sampling rate

Reply #2
I attached the 16kHz (original) and 8kHz versions of a speech sample taken from https://www.speex.org/samples/ (it has a cutoff at 3.5kHz, but this is what telephones do as far as I know). I also attached a speech sample from the same page that was 8kHz to begin with (it has the same cutoff) and is totally intelligible to me.

Re: Intelligibility of 8kHz sampling rate

Reply #3
I'm wondering, is this better?


Re: Intelligibility of 8kHz sampling rate

Reply #5
Oh, well, no harm in trying.

Anyway, I was trying to match the spectra of the two samples. The first difference between "nb_male" and "male" was volume. After roughly matching them:
[attached image]
the obvious next step was to get rid of that "hump" below 400 Hz:
Code: [Select]
sox "nb_male.wav" "nb_male2.wav" gain 8.0 highpass 400
[attached image]
The sample was intelligible to me from the start, but I think this improved it nevertheless. Then I used EQ at 1 kHz and 1.6 kHz to match it further:
Code: [Select]
sox "nb_male.wav" "nb_male3.wav" gain 6.4 highpass 400 equalizer 1000 400h 5 equalizer 1600 800h 5
[attached image]
but the difference was negligible.

The remaining difference I can hear is that "nb_male2" still has much higher background noise than "male". Maybe that's what is affecting you?

And for this specific example, the voice in "male" seems to be that of some kind of professional radio or TV presenter of old, with very clear diction. Maybe that also plays some role in how clear it sounds.
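
The spectrum matching described above was done by eye from plots. A crude numeric version of the same comparison can be sketched with the Goertzel algorithm, which measures signal power at a single frequency. Everything here is an assumption for illustration: the helper names, the probe frequencies, and the synthetic tone mixes standing in for the two clips. Single-frequency probes are also much coarser than a properly averaged spectrum.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power |X[k]|^2 at the DFT bin nearest to freq (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def level_difference_db(clip_a, clip_b, sample_rate, freq):
    """Level of clip_a relative to clip_b at freq, in dB."""
    pa = goertzel_power(clip_a, sample_rate, freq)
    pb = goertzel_power(clip_b, sample_rate, freq)
    return 10.0 * math.log10(pa / pb)


def tone_mix(sample_rate, n, parts):
    """Sum of sines; parts is a list of (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in parts)
            for i in range(n)]


fs, n = 8000, 800
# Hypothetical stand-ins: "reference" is flat, "test" has a boosted 200 Hz
# hump and a weaker 1.6 kHz region.
reference = tone_mix(fs, n, [(200, 1.0), (1000, 1.0), (1600, 1.0)])
test = tone_mix(fs, n, [(200, 2.0), (1000, 1.0), (1600, 0.5)])

for f in (200, 1000, 1600):
    print(f, "Hz:", round(level_difference_db(test, reference, fs, f), 1), "dB")
```

A positive figure suggests a band to cut (as with the high-pass at 400 Hz), a negative one a band to boost.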

Re: Intelligibility of 8kHz sampling rate

Reply #6
I want to clarify: you did make it more like the reference, but this did not make it more intelligible. And nb_male is intelligible to me too; it's just significantly less intelligible than the reference (harder to understand, and impossible to understand with significant background noise).

I feel like the main difference between the samples affecting intelligibility is the 's' sounds. Maybe differences in other consonants such as 'ɕ' (IPA symbol, roughly "sh" in English), including some that don't occur in this sample, affect intelligibility too; I don't know. I think there's some DSP that enhances specific consonants.

Re: Intelligibility of 8kHz sampling rate

Reply #7
Quote: "Does the cellphone or the system apply a DSP to enhance the voiceless sounds? If yes, what is the name of this process so I can investigate it?"
Most likely, blind bandwidth extension, since this works quite well for speech signals (i.e. the bandwidth extension behavior can be trained using typical low-noise voice calls).

Chris
If I don't reply to your reply, it means I agree with you.

Re: Intelligibility of 8kHz sampling rate

Reply #8
It can't be bandwidth extension, since the effect is also present in an 8kHz file. But I wonder what it sounds like; can you tell me how I can listen to an implementation of it, @C.R.Helmrich?

Re: Intelligibility of 8kHz sampling rate

Reply #9
Blind bandwidth extension of 8-kHz audio usually involves upsampling to 16 kHz or so during playback, so that the actual extension process has more frequency range to operate on. I don't know how to "record" the output of that process to a file; IIRC it's only done during playback. You'd probably have to record your headphone output externally.
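
To illustrate why an upsampling step is involved: an 8 kHz stream contains nothing above 4 kHz, so any extension has to synthesize that range at a higher sample rate. One of the simplest textbook approaches is spectral folding, where zeros are inserted between samples so that the 0 to 4 kHz band is mirrored into 4 to 8 kHz. The sketch below shows only that mirroring on a test tone. It is not a claim about what any particular phone does, and the function names are invented for the example.

```python
import math


def zero_stuff(samples):
    """Double the sample rate by inserting a zero after every sample."""
    out = []
    for x in samples:
        out.extend((x, 0.0))
    return out


def bin_magnitude(samples, sample_rate, freq):
    """DFT magnitude at freq, scaled so a full-length unit sine reads about 1.0."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


fs_narrow = 8000
narrowband = [math.sin(2 * math.pi * 1000 * i / fs_narrow) for i in range(800)]
wideband = zero_stuff(narrowband)  # same samples, now treated as 16 kHz audio
fs_wide = 2 * fs_narrow

# The 1 kHz tone is still present and a mirror image appears at
# 8000 - 1000 = 7000 Hz, both at about half the original amplitude.
for f in (1000, 3000, 7000):
    print(f, "Hz:", round(bin_magnitude(wideband, fs_wide, f), 3))
```

A raw mirror image like this sounds metallic on its own, so practical schemes typically shape the regenerated band, for example with an envelope estimated from the narrowband signal.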

Chris

Re: Intelligibility of 8kHz sampling rate

Reply #10
I don't need to record it. Do you know how I can listen to some examples of it? (I'd like to hear it on my own samples too, but that's not required.)