
Differences in voice encoding methods

I'm currently trying to teach myself about voice encoding, but I'm struggling to get my head around a few things (or even to find information on them). I apologise in advance: this post has a few questions spread throughout it.

From what I understand, the two main kinds of encoders are "waveform coders" and "vocoders", where the former prioritises quality and the latter prioritises low bit rate.

Waveform coders can work in the time domain or the frequency domain, but what's the difference between them with regard to the end result? And what are the differences between the main kinds (PCM, ADPCM, APC; sub-band coders, adaptive transform coders)?

Vocoders I know even less about: the only one I've heard of is LPC, and I don't know much about it. In short, what is there to know about vocoders, and which basic ones are worth learning about (along with their pros and cons)?

I know I've asked a lot, but any additional information regarding voice encoding would be greatly appreciated. Thanks in advance.


Differences in voice encoding methods

Reply #2
Quote
Probably a good idea to look up what PCM is before you start asking about audio compression, too. It's fundamental.

I understand how PCM works and that a lot of formats are based on it, but I don't know where it's used nowadays (if at all).

Differences in voice encoding methods

Reply #3
Quote
Probably a good idea to look up what PCM is before you start asking about audio compression, too. It's fundamental.

I understand how PCM works and that a lot of formats are based on it, but I don't know where it's used nowadays (if at all).


PCM is digital audio, so anytime a computer is making sound, that is almost certainly PCM.
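To make that concrete, here's a minimal sketch of what PCM amounts to: measure the signal at regular intervals (sampling) and round each measurement to an integer (quantisation). Python with NumPy; the rate, frequency and bit depth here are just illustrative choices, not anything a real codec mandates.

Code:
# Minimal PCM sketch: sample a 440 Hz sine at 8 kHz, then quantise each
# sample to a 16-bit integer. Real ADCs do this in hardware (with
# dithering, anti-alias filtering, etc.); this is just the principle.
import numpy as np

sample_rate = 8000                                      # samples per second
t = np.arange(int(sample_rate * 0.01)) / sample_rate    # 10 ms of sample instants

analog = np.sin(2 * np.pi * 440.0 * t)                  # "continuous" signal, in [-1, 1]
pcm = np.round(analog * 32767).astype(np.int16)         # quantisation to 16 bits

print(pcm[:10])   # the first few samples, as they'd sit in a WAV file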

Differences in voice encoding methods

Reply #4
All that aside, would you happen to be able to describe the difference between time domain and frequency domain waveform encoders to me? Like, when is one preferable to the other and why?

Differences in voice encoding methods

Reply #5
I don't know much about audio compression, but I understand the difference between the time domain and the frequency domain...

Do you know what the frequency domain is?  Do you know about the Fourier Transform?

You might want to read Inside The MP3 CODEC. Or, if you really want to dig into MP3, LAME is open source, so you can study the source code (but you have to be a programmer to understand it). The LAME website says "LAME is an educational tool to be used for learning about MP3 encoding."

Quote
...when is one preferable to the other and why?
Again, I'm not an expert, but I believe lossless compression has to be in the time domain (because the original audio data is time-domain).

And psychoacoustics and perceptual encoding require analysis of the frequency content. In other words, you can do lossy compression in the time domain, but if you want to do smart lossy compression, where the lost information is least likely to be missed, you've got to know something about the frequency content.
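To illustrate the idea (a crude sketch, not how MP3 actually works; MP3 uses a hybrid filterbank/MDCT and a proper psychoacoustic model), you can transform a frame to the frequency domain and simply spend no bits on components too weak to matter:

Code:
# One-frame sketch of frequency-domain lossy coding: keep the strong
# spectral components, discard the weak ones.
import numpy as np

rate = 8000
t = np.arange(512) / rate
frame = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)

spectrum = np.fft.rfft(frame)            # time domain -> frequency domain
magnitude = np.abs(spectrum)

# Discard everything more than 40 dB below the strongest component.
keep = magnitude > magnitude.max() * 10 ** (-40 / 20)
lossy_spectrum = np.where(keep, spectrum, 0)

reconstructed = np.fft.irfft(lossy_spectrum, n=len(frame))   # back to time domain
print(f"kept {keep.sum()} of {keep.size} frequency bins")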


Quote
...but I don't know where it's used nowadays (if at all).
Analog-to-digital converters and digital-to-analog converters use PCM. Digital audio editing and other digital signal processing are done on PCM. For example, if you open an MP3 (or any compressed file) in Audacity, it gets decompressed before you can edit it or see the waveform.
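As a small demonstration that editing happens on PCM, here's a standard-library-only sketch; "input.wav" is a placeholder for any 16-bit PCM WAV file you have lying around:

Code:
# Halve the volume of a 16-bit PCM WAV, sample by sample. Editors
# ultimately manipulate PCM like this; a compressed file has to be
# decoded to PCM before it can be touched.
import wave, array

with wave.open("input.wav", "rb") as src:               # placeholder filename
    params = src.getparams()
    assert params.sampwidth == 2, "this sketch assumes 16-bit PCM"
    samples = array.array("h", src.readframes(params.nframes))

quieter = array.array("h", (s // 2 for s in samples))   # roughly -6 dB

with wave.open("quieter.wav", "wb") as dst:
    dst.setparams(params)
    dst.writeframes(quieter.tobytes())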

Differences in voice encoding methods

Reply #6
Quote
Again, I'm not an expert, but I believe lossless compression has to be in the time domain (because the original audio data is time-domain).

As long as your conversion between time and frequency domains is lossless, you can have lossless compression in the frequency domain.

I couldn't tell you if it's useful or not; I've never seen any audio codecs that do this, but that could just be because no one has tried.
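For what it's worth, exactly invertible integer-to-integer transforms do exist (the "lifting" construction), which is what would make lossless coding of frequency-domain data possible in principle. A toy sketch using the simplest case, an integer Haar transform; the code and names are mine, not from any codec:

Code:
# Integer Haar transform via "lifting": integers in, integers out, and
# the inverse recovers the input exactly, so there is no rounding loss.
# (Toy case; a real codec would want an integer MDCT-like transform.)

def haar_forward(a, b):
    d = a - b             # difference (high-frequency part)
    s = b + (d >> 1)      # approximate average (low-frequency part)
    return s, d

def haar_inverse(s, d):
    b = s - (d >> 1)      # undo the lifting steps in reverse order
    a = d + b
    return a, b

# Round-trip every pair of 8-bit samples to show perfect reconstruction.
for a in range(-128, 128):
    for b in range(-128, 128):
        assert haar_inverse(*haar_forward(a, b)) == (a, b)
print("exact reconstruction for all 65536 pairs")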

Differences in voice encoding methods

Reply #7
Quote
Analog-to-digital converters and digital-to-analog converters use PCM.

...except for the ones that don't.

https://en.wikipedia.org/wiki/Delta-sigma_modulation

Differences in voice encoding methods

Reply #8
Quote
All that aside, would you happen to be able to describe the difference between time domain and frequency domain waveform encoders to me?


These terms are not widely used, and have different meanings to different people. Most perceptual audio codecs either work in the frequency domain or use some kind of frequency analysis, though. "Time-domain codecs" would perhaps cover very simple formats as well as non-perceptual audio codecs.

Differences in voice encoding methods

Reply #9
Quote
These terms are not widely used, and have different meanings to different people.

Like this one?
"Vocoders"

Differences in voice encoding methods

Reply #10
Alrighty, so I've done hours upon hours of research and now understand everything that's been brought up in this thread, as well as a bunch of coders.

One thing I haven't found an answer to, though, is how the hybrid coders compare on specs. I understand how MPE, RPE and CELP coders work, as well as what their differences are, but I can't find solid documentation comparing them to one another under similar conditions. For example, at the same bit rate or sampling rate, which would have the better MOS (mean opinion score) and other specs?