
speech codecs overview

Hello,

Could someone be so kind as to fill in this speech codecs overview, please?

IMHO, it would be a G R E A T help for all forum readers. I'll start filling it in
with the only one I know, Speex. Jmvalin's and John33's posts
would be especially appreciated.

Speex http://www.speex.org/ open source
CELP
HVXC
PureVoice
VoxWare
Twin VQ
LPC

Thank You!

Reply #1
Quote
Speex http://www.speex.org/ open source
CELP
HVXC
PureVoice
VoxWare
Twin VQ
LPC

CELP (Code Excited Linear Prediction) is not a codec itself, but a widely used speech coding technique. Speex uses CELP, as do many other codecs, which use either CELP or a variant (ACELP being the most common).
I don't know anything about HVXC, PureVoice and VoxWare.
AFAIK, Twin VQ is for audio/music and not speech.
LPC (Linear Prediction Coefficients/Linear Predictive Coding) again is not a codec but a (general) technique. There's a (military) standard called LPC10 (LPC vocoder) though. It runs at 2.4 kbps and the quality is very bad.
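
To make the "technique, not codec" point concrete, here is a minimal sketch of the linear prediction step that LPC- and CELP-style coders build on: estimate coefficients so that each sample is predicted from the previous few. It uses the autocorrelation method with the Levinson-Durbin recursion; the function names are made up for this illustration and are not taken from Speex or any standard.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion (hypothetical helper, illustration only).

    Returns (a, err): a[k-1] weights x[n-k] in the prediction
    x_hat[n] = sum(a[k-1] * x[n-k]), and err is the residual energy.
    """
    r = autocorrelation(signal, order)
    a = []
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            a.append(0.0)  # silent or perfectly predicted frame: pad with zeros
            continue
        # Correlation at lag i+1 not yet explained by the current predictor
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err      # reflection coefficient for this order
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= (1.0 - k * k)
    return a, err


# A decaying exponential is predicted almost perfectly by one coefficient.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc_coefficients(frame, 2)
print(coeffs, residual)
```

A real coder would window each frame, quantize the coefficients and then code the residual in some way; none of that is shown here.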

Reply #2
Quote
VoxWare

Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound"

Quote
Twin VQ


Like jmvalin already posted, TwinVQ is not a speech codec.

Also, you can add to the list:

-GSM (several variants, actually)
-G.729 (lots of people are raving about it as the next best thing)
-DSP Group True Speech

There are others, but these are the ones I can remember off the top of my head.

I'm still planning on a vocodec test to be started by November (that is, if jmvalin is still interested in helping me out with this one).

Reply #3
Quote
Voxware isn't properly a codec. It doesn't encode the speech samples, it simulates them. That's why it's called "VoxWare Meta Sound"

This intrigues me. Can you explain that further?
I remember way back (early 1997) I used to use Voxware for some speech encodings, and it seemed pretty good at the time (not having much to compare it to), but RealAudio v3 came out later that year and eclipsed the quality, so I switched.

Reply #4
Quote
This intrigues me. Can you explain that further?

Finding information about MetaVoice (VoxWare's voice compression technology) these days is a nightmare. VoxWare itself has stopped marketing its compression technology and tools like ToolVox, TeleVox, etc.

I saved this page from a mirror a long time ago; I can't find that mirror's location anymore:
http://www.rjamorim.com/rrw/metavoice/metavoice.html

To summarize things, MetaVoice models the components of the human voice ("resonance, pitch, timbre, timing, and character") and stores this modelling inside the stream, instead of just compressing the speech like most codecs do.

Don't forget to click the "Compression Analogy" link, it's quite funny.

Regards;

Roberto.


Reply #6
@rjamorim
Thanks for the link on Metavoice! I'd always wondered how you could mess with the playback speed so easily with minimal side effects on the quality of the output.

Reply #7
Quote
Could someone be so kind as to fill in this speech codecs overview, please?

IMHO, it would be a G R E A T help for all forum readers. I'll start filling it in
with the only one I know, Speex. Jmvalin's and John33's posts
would be especially appreciated.

In case it is of interest to someone, these are the speech codecs I know about:

ITU-T:
G.711 PCM 64 kbps (u-law, A-law)
G.721 ADPCM
G.722 ADPCM 48-64 kbps wideband codec
G.722.1 wideband codec by Picturetel
G.722.2 ACELP multi-rate wideband codec (aka AMR-WB), targeted at cell phones
G.723 ADPCM (renamed to G.726 I think)
G.723.1 ACELP 5.3 kbps, 6.3 kbps (used mostly in VoIP)
G.726 ADPCM
G.728 LD-CELP 16 kbps (low-delay CELP)
G.729 CS-ACELP 8 kbps (used mostly in VoIP)

GSM:
GSM-FR (full rate) (RPE-LTP) 13.2 kbps "old" GSM codec for which there's a free implementation, used in many free VoIP apps. This is what most people call the "GSM codec"
GSM-HR (half rate) (VSELP?) low bit-rate GSM codec (used in GSM cell phones)
GSM-EFR (ACELP) ~12 kbps. Latest GSM codec with much better quality than GSM-FR (no free implementation though)

Misc:
AMR-NB 4.7-12 kbps collection of narrowband codecs with possibility to switch bit-rate depending on error rate, used for cell phones
IS-54 VSELP 8 kbps codec used in TDMA cell phones
iLBC 13-15 kbps by GIPS. Free license but not open-source.
DoD MELP 2.4 kbps military standard with decent quality at very low bit-rate
LPC10 2.4 kbps military standard with very poor quality

For a comparison of these codecs with Speex in terms of features (no quality tests yet), see this

This is the meaning of the acronyms:
LPC: Linear Prediction Coefficients / Linear Predictive Coding
CELP: Code Excited Linear Prediction
ACELP: Algebraic CELP
CS-ACELP: Conjugate Structure - ACELP
LD-CELP: Low-Delay CELP
RPE-LTP: Regular Pulse Excitation - Long-Term Prediction
AMR-NB/WB: Adaptive Multi-Rate (narrowband, wideband)
VSELP: Vector-Sum Excited Linear Prediction
MELP: Mixed Excitation Linear Prediction
PCM: Pulse Code Modulation
ADPCM: Adaptive Differential Pulse Code Modulation
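
As a small illustration of the first entry in the ITU-T list: G.711's 64 kbps is simply 8,000 samples per second at 8 bits each, and the 8 bits go further than linear PCM because the samples are companded logarithmically before quantization. The sketch below uses the continuous mu-law curve with mu = 255. It is not the bit-exact segmented table from the G.711 recommendation, and the function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter (the u-law variant; A-law uses a different curve)


def mulaw_compress(x):
    """Map a sample in [-1, 1] to [-1, 1] with logarithmic companding."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Clamp, compress, then quantize uniformly to a code in 0..255."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return min(255, int((y + 1.0) / 2.0 * 256))


def decode_8bit(code):
    """Reconstruct from the centre of the quantization cell, then expand."""
    y = (code + 0.5) / 256 * 2.0 - 1.0
    return mulaw_expand(y)


SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bitrate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000
print(bitrate_kbps, "kbps")

# Quiet samples keep much finer resolution than loud ones.
for x in (0.01, 0.5):
    print(x, decode_8bit(encode_8bit(x)))
```

The ADPCM entries in the list lower the rate further by coding the difference from a prediction instead of each sample on its own.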

Reply #8
Some additions and minor corrections...
AMR-NB => 4.75-12.2 kbps, 8 codecs.
AMR-WB => 6.6-23.85 kbps, 9 codecs (mentioned as G.722.2 ACELP)
AMR codecs are used in GSM, so they could have been mentioned under that heading as well. But they are also used in other cellular standards. I think NB is the same type as GSM-EFR, i.e. ACELP. NB cuts off somewhere at 3-4 kHz, and WB somewhere at 7 kHz. The quality difference is huge...

GSM-FR => 13 kbps
GSM-HR => 5.6 kbps (not 100% sure)
GSM-EFR => 12.2 kbps
I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly fewer bits on the radio interface, and the bits saved are used instead to protect the data from errors in a better way. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.
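
The "fewer bits, more protection" trade-off can be put in numbers with a rough sketch. Two figures here are not from this thread and are brought in as assumptions: a 20 ms frame length for both codecs, and a gross rate of 22.8 kbps for a full-rate GSM traffic channel. The helper names are hypothetical.

```python
FRAME_MS = 20                # assumed frame length for both codecs
GROSS_CHANNEL_KBPS = 22.8    # assumed gross rate of a full-rate traffic channel

CODEC_KBPS = {"GSM-FR": 13.0, "GSM-EFR": 12.2}


def bits_per_frame(kbps, frame_ms=FRAME_MS):
    """kbit/s times ms gives bits; round away floating-point noise."""
    return round(kbps * frame_ms)


def protection_kbps(kbps, gross=GROSS_CHANNEL_KBPS):
    """Rate left over for channel coding once the speech bits are sent."""
    return round(gross - kbps, 1)


for name, kbps in CODEC_KBPS.items():
    print(name, bits_per_frame(kbps), "bits/frame,",
          protection_kbps(kbps), "kbps left for channel coding")
```

Under these assumptions EFR frees up 16 bits per frame compared with FR, which is the margin the post describes as going to error protection.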

Reply #9
Thank you all! I didn't expect such a huuuuge response!

I would especially like to thank jmvalin and Skymmer for the comprehensive help!

Reply #10
Quote
I think NB is the same type as GSM-EFR, i.e. ACELP. NB cuts off somewhere at 3-4 kHz, and WB somewhere at 7 kHz. The quality difference is huge...

[...]

I am not sure that the speech codec (EFR) itself is much better than FR, but it uses slightly fewer bits on the radio interface, and the bits saved are used instead to protect the data from errors in a better way. EFR is the most widely used nowadays. You would have to have a really old phone to be using FR. HR is hardly used at all.

AFAIK, GSM-EFR is the same as (or very similar to) the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.

Reply #11
Quote
AFAIK, GSM-EFR is the same as (or very similar to) the highest AMR-NB bit-rate. As for the quality difference between GSM-FR and GSM-EFR, it is *huge*. I haven't done any formal testing, but to give you an idea, I find (my ear only) GSM-FR to be roughly equivalent to Speex @ 8 kbps, while GSM-EFR is almost as good as Speex @ 15 kbps.

I haven't done any listening myself, just picked up what I have heard from colleagues, so you are probably right. Anyway, if the difference between the speech codecs themselves is huge, then the difference due to improved error protection on the radio interface is even bigger. (Getting a bit OT here, sorry.)

GSM-EFR and AMR-NB 12.2 are almost the same; only minor corrections were made in AMR-NB 12.2.

Reply #12
At last it looks like I found someone with knowledge.

I have a Umax DV camera that uses a dvi_adpcm (0x0011) Intel Corporation codec.

I have found a lot of threads about it but can't find the codec anywhere.

Please tell me you know where I can get hold of it.

Reply #13
Quote
Like jmvalin already posted, TwinVQ is not a speech codec.

Twin VQ may not qualify as a speech coder, but at 8 kbps with an 8 kHz sampling rate, speech with background music, such as you often hear on radio broadcasts, can be encoded very well. In fact it would outperform many LPC-based speech codecs.

However, the analysis window length of Twin VQ is about 4096 samples, and the need for window length switching causes unacceptable encoder delays, too long for realtime communications. For non-realtime applications, though, Twin VQ would even outperform AAC at those bitrates.
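
To see why such a window is a problem for conversation, it helps to convert samples to time. The sketch below assumes the 8 kHz sampling rate mentioned above also applies to the 4096-sample window, and that the encoder has to buffer a whole window before it can code it. The real algorithmic delay also depends on overlap and look-ahead, which are not modelled, and the 160-sample comparison frame is only a typical narrowband speech frame size, not a figure from this post.

```python
def window_duration_ms(window_samples, sample_rate_hz):
    """Time span covered by one analysis window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate_hz


SAMPLE_RATE_HZ = 8000          # assumption: same rate as in the 8 kbps example

long_window_ms = window_duration_ms(4096, SAMPLE_RATE_HZ)
speech_frame_ms = window_duration_ms(160, SAMPLE_RATE_HZ)  # typical speech frame

print("4096-sample window:", long_window_ms, "ms")
print("160-sample frame:  ", speech_frame_ms, "ms")
```

Half a second of buffering before anything can be sent is well beyond what feels natural in a two-way call, while a 20 ms frame is not noticeable.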

Reply #14
Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?

Reply #15
Quote
Any idea how much CPU the common speech codecs use for encoding and decoding, particularly the allegedly patent-free ones?

I can only give an idea for Speex. The amount of CPU required depends a lot on the sampling rate and bit-rate used. For 8 kHz/8 kbps, I can encode in real-time on my Pentium-M with about 2-3% CPU. With the fixed-point port, I've been able to do real-time encoding+decoding with about 50% CPU on a 140 MHz StrongARM.
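
For comparing against other hardware, a CPU percentage can be turned into an approximate cycle budget. This is only back-of-the-envelope arithmetic on the StrongARM figure above. The 20 ms frame length is an assumption not stated in this post, and the helper names are hypothetical.

```python
def cycles_per_second_of_audio(clock_hz, cpu_fraction):
    """Cycles spent per second of audio when running exactly in real time."""
    return clock_hz * cpu_fraction


def cycles_per_frame(clock_hz, cpu_fraction, frame_ms=20):
    """Same budget per frame; frame_ms=20 is an assumed frame length."""
    return cycles_per_second_of_audio(clock_hz, cpu_fraction) * frame_ms / 1000


# 140 MHz StrongARM at about 50% load, encode + decode together
print(cycles_per_second_of_audio(140_000_000, 0.5))  # cycles per second of audio
print(cycles_per_frame(140_000_000, 0.5))            # cycles per assumed 20 ms frame
```

Cycle counts do not transfer directly between architectures, so this gives an order of magnitude at best.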