
Differences in voice encoding methods

I'm currently trying to teach myself about voice encoding, but I'm struggling to get my head around a few things (or even to find information on them). I apologise in advance: this post has a few questions spread throughout it.

From what I understand, the two main kinds of encoders are "waveform coders" and "vocoders", where the former prioritises quality and the latter prioritises low bit rate.

Waveform coders can work in the time domain or the frequency domain, but what's the difference between them with regard to the end result? And what are the differences between the main kinds (PCM, ADPCM, APC; sub-band coders, adaptive transform coders)?

Vocoders I know even less about: the only one I've heard of is LPC, and I don't know much about it. In short, what is there to know about vocoders, and which basic ones are worth learning about (along with their pros and cons)?

I know I've asked a lot, but any additional information regarding voice encoding would be greatly appreciated. Thanks in advance.


Differences in voice encoding methods

Reply #2
Quote
Probably a good idea to look up what PCM is before you start asking about audio compression, too. It's fundamental.

I understand how PCM works and that a lot of formats are based on it, but I don't know where it's used nowadays (if at all).

Differences in voice encoding methods

Reply #3
Quote
Probably a good idea to look up what PCM is before you start asking about audio compression, too. It's fundamental.

I understand how PCM works and that a lot of formats are based on it, but I don't know where it's used nowadays (if at all).


PCM is digital audio, so anytime a computer is making sound, that is almost certainly PCM.
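To make that concrete, here's a minimal sketch of what PCM amounts to: measure the signal at regular intervals (sampling) and round each measurement to an integer (quantisation). Python with NumPy; the rate, frequency and bit depth here are just illustrative choices, not anything a real codec mandates.

Code:
# Minimal PCM sketch: sample a 440 Hz sine at 8 kHz, then quantise each
# sample to a 16-bit integer. Real ADCs do this in hardware (with
# dithering, anti-alias filtering, etc.); this is just the principle.
import numpy as np

sample_rate = 8000                                      # samples per second
t = np.arange(int(sample_rate * 0.01)) / sample_rate    # 10 ms of sample instants

analog = np.sin(2 * np.pi * 440.0 * t)                  # "continuous" signal, in [-1, 1]
pcm = np.round(analog * 32767).astype(np.int16)         # quantisation to 16 bits

print(pcm[:10])   # the first few samples, as they'd sit in a WAV file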

Differences in voice encoding methods

Reply #4
All that aside, would you happen to be able to describe the difference between time domain and frequency domain waveform encoders to me? Like, when is one preferable to the other and why?

Differences in voice encoding methods

Reply #5
I don't know much about audio compression, but I understand the difference between the time domain and the frequency domain...

Do you know what the frequency domain is?  Do you know about the Fourier Transform?

You might want to read Inside The MP3 CODEC. Or, if you really want to dig into MP3, LAME is open source, so you can study the source code (but you have to be a programmer to understand it). The LAME website says "LAME is an educational tool to be used for learning about MP3 encoding."

Quote
...when is one preferable to the other and why?
Again, I'm not an expert, but I believe lossless compression has to be in the time domain (because the original audio data is time-domain).

And psychoacoustics and perceptual encoding require analysis of the frequency content. In other words, you can do lossy compression in the time domain, but if you want to do smart lossy compression, where the lost information is least likely to be missed, you've got to know something about the frequency content.
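To illustrate the idea (a crude sketch, not how MP3 actually works; MP3 uses a hybrid filterbank/MDCT and a proper psychoacoustic model), you can transform a frame to the frequency domain and simply spend no bits on components too weak to matter:

Code:
# One-frame sketch of frequency-domain lossy coding: keep the strong
# spectral components, discard the weak ones.
import numpy as np

rate = 8000
t = np.arange(512) / rate
frame = np.sin(2 * np.pi * 440 * t) + 0.01 * np.sin(2 * np.pi * 3000 * t)

spectrum = np.fft.rfft(frame)            # time domain -> frequency domain
magnitude = np.abs(spectrum)

# Discard everything more than 40 dB below the strongest component.
keep = magnitude > magnitude.max() * 10 ** (-40 / 20)
lossy_spectrum = np.where(keep, spectrum, 0)

reconstructed = np.fft.irfft(lossy_spectrum, n=len(frame))   # back to time domain
print(f"kept {keep.sum()} of {keep.size} frequency bins")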


Quote
...but I don't know where it's used nowadays (if at all).
Analog-to-digital converters and digital-to-analog converters use PCM. Digital audio editing and other digital signal processing are done on PCM. For example, if you open an MP3 (or any compressed file) in Audacity, it gets decompressed before you can edit it or see the waveform.
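As a small demonstration that editing happens on PCM, here's a standard-library-only sketch; "input.wav" is a placeholder for any 16-bit PCM WAV file you have lying around:

Code:
# Halve the volume of a 16-bit PCM WAV, sample by sample. Editors
# ultimately manipulate PCM like this; a compressed file has to be
# decoded to PCM before it can be touched.
import wave, array

with wave.open("input.wav", "rb") as src:               # placeholder filename
    params = src.getparams()
    assert params.sampwidth == 2, "this sketch assumes 16-bit PCM"
    samples = array.array("h", src.readframes(params.nframes))

quieter = array.array("h", (s // 2 for s in samples))   # roughly -6 dB

with wave.open("quieter.wav", "wb") as dst:
    dst.setparams(params)
    dst.writeframes(quieter.tobytes())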

Differences in voice encoding methods

Reply #6
Quote
Again, I'm not an expert, but I believe lossless compression has to be in the time domain (because the original audio data is time-domain).

As long as your conversion between time and frequency domains is lossless, you can have lossless compression in the frequency domain.

I couldn't tell you if it's useful or not; I've never seen any audio codecs that do this, but that could just be because no one has tried.
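For what it's worth, exactly invertible integer-to-integer transforms do exist (the "lifting" construction), which is what would make lossless coding of frequency-domain data possible in principle. A toy sketch using the simplest case, an integer Haar transform; the code and names are mine, not from any codec:

Code:
# Integer Haar transform via "lifting": integers in, integers out, and
# the inverse recovers the input exactly, so there is no rounding loss.
# (Toy case; a real codec would want an integer MDCT-like transform.)

def haar_forward(a, b):
    d = a - b             # difference (high-frequency part)
    s = b + (d >> 1)      # approximate average (low-frequency part)
    return s, d

def haar_inverse(s, d):
    b = s - (d >> 1)      # undo the lifting steps in reverse order
    a = d + b
    return a, b

# Round-trip every pair of 8-bit samples to show perfect reconstruction.
for a in range(-128, 128):
    for b in range(-128, 128):
        assert haar_inverse(*haar_forward(a, b)) == (a, b)
print("exact reconstruction for all 65536 pairs")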

Differences in voice encoding methods

Reply #7
Quote
Analog-to-digital converters and digital-to-analog converters use PCM.

...except for the ones that don't.

https://en.wikipedia.org/wiki/Delta-sigma_modulation

Differences in voice encoding methods

Reply #8
Quote
All that aside, would you happen to be able to describe the difference between time domain and frequency domain waveform encoders to me?


These terms are not widely used, and have different meanings to different people. Most perceptual audio codecs either work in the frequency domain or use some kind of frequency analysis, though. "Time-domain codecs" would perhaps cover very simple formats as well as non-perceptual audio codecs.

Differences in voice encoding methods

Reply #9
Quote
These terms are not widely used, and have different meanings to different people.

Like this one?
"Vocoders"

Differences in voice encoding methods

Reply #10
Alrighty, so I've done hours upon hours of research and now understand everything that's been brought up in this thread, as well as a bunch of coders.

One thing I haven't found an answer to, though, is how the hybrid coders compare on specs. I understand how MPE, RPE and CELP coders work, as well as what their differences are, but I can't find solid documentation comparing them to one another under similar conditions. For example, at the same bit rate or sampling rate, which would have the better MOS (mean opinion score) and other specs?