HydrogenAudio

Lossy Audio Compression => Speech Codecs => Topic started by: Lee James on 2023-08-01 18:53:26

Title: Recommend a speech codec - are there any comparisons anywhere?
Post by: Lee James on 2023-08-01 18:53:26
I'm looking for a lossy speech codec to pack a large amount of speech into a game I'm making.

• Ideally a non-proprietary codec
• Low quality, high compression
• My audio will be mono, ideally 11,025Hz
• Bit depth will ideally be around 5 or 6 bits, but could go higher if necessary

Any suggestions?

Also, does anyone know if there are any speech codec comparisons out there?
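
For scale, the uncompressed target in the list above can be worked out directly. This is only a minimal sketch in Python: it assumes the samples are tightly bit-packed with no container overhead, and the function name is purely illustrative.

```python
def raw_pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Parameters from the post: mono, 11,025 Hz, 5 or 6 bits per sample.
for bits in (5, 6):
    bps = raw_pcm_bitrate(11_025, bits)
    # Assumption: samples are bit-packed, so bytes = bits / 8 exactly.
    mb_per_hour = bps * 3600 / 8 / 1_000_000
    print(f"{bits}-bit: {bps} bit/s, about {mb_per_hour:.1f} MB per hour")
```

That puts the raw audio at roughly 55 to 66 kbit/s, which is the baseline any suggested codec would have to beat.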
Title: Re: Recommend a speech codec - are there any comparisons anywhere?
Post by: ktf on 2023-08-01 18:59:19
Well, Opus of course.

There are a few speech codec comparisons linked here: https://opus-codec.org/comparison/ (at the bottom)
Title: Re: Recommend a speech codec - are there any comparisons anywhere?
Post by: Lee James on 2023-08-01 23:47:44
Wow! ktf thank you so much! I had not heard of Opus but it seems AMAZING! I was thinking I'd have to go with mp3 because it's the only free option. This is fantastic! THANK YOU! :-)
Title: Re: Recommend a speech codec - are there any comparisons anywhere?
Post by: Artoria2e5 on 2023-08-10 18:00:01
You can find Opus speech samples at https://jmvalin.ca/opus/opus-1.3/. Opus is near state-of-the-art -- there's better, but then you get bad tooling and/or licensing issues.


So what if you want to go lower? You know, maybe your game has no textures at all and sound is the only blob (I'm being ridiculous)?

On one path you have Google and Microsoft throwing money into neural network magic, trying to get cheaper phone calls. Google's Lyra (https://github.com/google/lyra) is open-source and delivers mind-boggling quality at 3 kbps -- on pure speech samples only. Samples: V1 (https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html), V2 (current version) (https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html). Note that whoever wrote the V2 blog page cleverly decided not to try any music this time.
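
To put the 3 kbps figure in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant bitrate and ignores container overhead; the 100 MB budget is a made-up number for illustration, and the 66,150 bit/s baseline is simply 11,025 Hz x 6 bits of mono PCM as asked for in the first post.

```python
def hours_of_speech(budget_mb: float, bitrate_bps: float) -> float:
    """Hours of audio that fit in budget_mb megabytes at a constant bitrate."""
    seconds = budget_mb * 1_000_000 * 8 / bitrate_bps
    return seconds / 3600


BUDGET_MB = 100        # hypothetical storage budget, not from the thread
LYRA_BPS = 3_000       # the 3 kbps figure quoted above
RAW_BPS = 11_025 * 6   # uncompressed mono PCM, 6 bits per sample

print(f"3 kbps: {hours_of_speech(BUDGET_MB, LYRA_BPS):.1f} h")
print(f"raw PCM: {hours_of_speech(BUDGET_MB, RAW_BPS):.1f} h")
print(f"ratio: {RAW_BPS / LYRA_BPS:.1f}x")
```

Under those assumptions the same budget holds about 22 times as much speech at 3 kbps as it does uncompressed.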

On the other path you have HAMs going... HAM (ba dum tss) on lower bitrates, because radio bandwidth is precious. Codec 2 by David Rowe is a classical speech codec pushed near its limits. You WILL hear artifacts typical of old-school speech codecs and of "radio" sound-effect filters.