What's the best way to encode Spoken Words?

I am going to be encoding a 60-CD collection of spoken text from the Bible.
My goal is to compress it down to 1 CD. In the end I am looking for halfway decent audio quality.
Which format would be the best way to achieve this?

I am thinking that I should use Vorbis -q 0.
Or maybe a low-bitrate LAME preset?

I just figured I would try to get some suggestions before I start this task.

Thank you
-ty1er

Reply #1
I believe Speex is tailored for speech at very low bitrates. I've not tried it, so I can't comment on the quality.

It's on the HA front page at the moment if you want to give it a try.

Reply #2
I recently compressed 13 CDs of spoken word down to 1 CD using something around -q4. But to do the same for 60 at -q0 would, I guess, take about 3 CDs! What you could try is using a low quality setting (-q-1) AND downsampling to 22 kHz or even lower. Obviously the quality will suffer some, but on spoken word it's not too bad.

Cheers, Paul

Reply #3
Quote
Originally posted by ty1er
I am going to be encoding a 60-CD collection of spoken text from the Bible.
My goal is to compress it down to 1 CD.
...
I am thinking that I should use Vorbis -q 0. Or maybe a low-bitrate LAME preset?

In order to fit 60 CDs' worth of speech onto only 1 CD, you're going to have to use a bitrate of roughly 24 kbit/s (CD audio runs at about 1411 kbit/s, and 1411/60 is about 23.5). Vorbis -q 0 is approximately 64 kbit/s, which would need around 3 CDs.
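
The arithmetic above can be sketched in a few lines of Python. This is only a rough check, assuming the target disc holds as many bits as one audio CD (a data CD actually holds somewhat less); the function names are made up for illustration.

```python
# Rough check of the bitrate budget for squeezing N audio CDs onto one disc.
# Assumption: the target disc holds as many bits as one full audio CD.

CD_AUDIO_BPS = 44_100 * 16 * 2  # sample rate * bits per sample * channels


def target_kbps(num_cds: int) -> float:
    """Average bitrate (kbit/s) that fits num_cds audio CDs into one CD's worth of bits."""
    return CD_AUDIO_BPS / num_cds / 1000


def cds_needed(kbps: float, num_cds: int) -> float:
    """How many CDs' worth of space num_cds source CDs take when encoded at kbps."""
    return num_cds * kbps * 1000 / CD_AUDIO_BPS


if __name__ == "__main__":
    print(f"budget for 60 CDs: {target_kbps(60):.1f} kbit/s")
    print(f"60 CDs at 64 kbit/s: {cds_needed(64, 60):.2f} CDs")
```

Under these assumptions the budget comes out a little under 24 kbit/s, and a 64 kbit/s encode lands between 2 and 3 discs.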

Neither LAME nor Vorbis is optimised for low-bitrate speech encoding; both are aimed at mid-bitrate music. That's not to say that you can't use them to encode speech - but there may be alternatives which will get you better sounding speech at the equivalent bitrate. Take a look at this thread, where rc55 is asking about low-bitrate Vorbis settings. He may find one which you can also use.

But supposing that he can't, let's look at the more traditional voice coding formats:

There is a project to create a decent sounding free audio format for low-bitrate voice encoding, called Speex. However, I wouldn't recommend that you use this to make anything permanent yet, as the format is still in development.

There are many other standard voice encoders which would fit into the bitrate constraint which you have (the GSM encoding used by mobile phones, for example), but obviously the lower the bitrate, the less 'natural' the sound, and the more specialised the codec (i.e. they become less able to cope with non-speech, such as background music). You also have to work slightly harder than with MP3 or Vorbis to encode/decode them. If this doesn't put you off, there are several places to look for useful software/source code.

1) HawkVoice.
Quote
This is a free multiplayer voice over network API released under the GNU Library General Public License (LGPL), with support for Linux/Unix and Windows 9x/ME/NT/2000. It is designed to be a portable, free, open source code alternative to Microsoft®'s DirectPlay® Voice in DX8.

It contains the code needed to encode/decode several standard voice codecs:
Quote
64 kbps - G.711 u-law - LGPL
32 kbps - Intel/DVI ADPCM - Free
13.2 kbps - GSM - Free
4.8 kbps - LPC - Free
ALL - CELP - Free
2.4 kbps - LPC10 - Free
1.8 kbps - OpenLPC - Free
1.4 kbps - OpenLPC - Free

Bear in mind what I mentioned earlier about these being specialised voice codecs. The HawkVoice site contains a page which will let you compare the sound quality of them.

2) Open H.323
This is a project to provide open-source implementations of the standard ITU H.323 teleconferencing protocol. The audio standards included in this specification are:
Quote
G.711 - PCM audio codec 56/64 kbps
G.722 - audio codec for 7 kHz at 48/56/64 kbps
G.723 - speech codec for 5.3 and 6.4 kbps
G.728 - speech codec for 16 kbps
G.729 - speech codec for 8/13 kbps

However, I believe that some of these codecs are only supported by the code provided from the above site if you have appropriate telecommunications hardware.

If you're happy to look at non-free solutions, then there is one monster that I have to mention:

3) RealAudio
Sites such as the BBC use Real to encode their audio at a similar bitrate to the one you are using. The downside to using this format is that the format is completely closed, and can only be produced by a quite expensive program (0). The advantage is that it can be played back on a wide variety of systems, via the RealPlayer.
These are the RealAudio voice codec specs:
Quote
5 kbps Voice - 8 kHz
6.5 kbps Voice - 8 kHz
8.5 kbps Voice - 8 kHz
16 kbps Voice - 16 kHz
32 kbps Voice - 22.05 kHz
64 kbps Voice - 44.1 kHz


Note the sample rate information: with only 24 kbit/s to play with, it's unlikely that you're going to get more than a 16 kHz mono output, whatever codec you use.

I hope that was helpful.

Reply #4
I know it wouldn't be the best choice, but I recently encoded a speech-only CD (Gartner Feb 2002 edition) using Musepack --standard. I got an average of 90 kbps. I know Musepack isn't the right format for speech recordings, but to be honest it does a good job.

Reply #5
I'd make sure to convert all the ripped WAV files to mono before encoding. Or, if you use an encoder that has a mono setting, use that.

/ Uosdwis
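
For anyone without an audio editor to hand, the mono downmix suggested above can be sketched with nothing but the Python standard library. This is a minimal illustration, assuming 16-bit stereo PCM WAV input (as produced by a typical CD ripper); the function name and the file names in the usage comment are hypothetical, and it reads the whole file into memory, so it suits single tracks rather than whole-disc images.

```python
import array
import sys
import wave


def downmix_to_mono(src_path: str, dst_path: str) -> None:
    """Average the left and right channels of a 16-bit stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(frames)       # interleaved L, R, L, R, ...
    if sys.byteorder == "big":
        samples.byteswap()          # WAV data is little-endian

    # Average each left/right pair; the result always fits in 16 bits.
    mono = array.array(
        "h", ((l + r) // 2 for l, r in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())


# Usage (hypothetical file names):
# downmix_to_mono("track01.wav", "track01_mono.wav")
```

If the encoder has its own mono switch, that is usually the simpler route; this only shows what the downmix step does.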

Reply #6
Quote
Originally posted by Jon Ingram

Sites such as the BBC use Real to encode their audio at a similar bitrate to the one you are using. The downside to using this format is that the format is completely closed, and can only be produced by a quite expensive program (0).


Helix Producer Basic is free and should fulfill your needs.
http://www.realnetworks.com/products/producer/basic.html

Emanuel

Reply #7
Quote
Originally posted by Jon Ingram
3) RealAudio
Sites such as the BBC use Real to encode their audio at a similar bitrate to the one you are using. The downside to using this format is that the format is completely closed, and can only be produced by a quite expensive program (0). The advantage is that it can be played back on a wide variety of systems, via the RealPlayer.


RealAudio uses ACELP for voice coding. It's the same algorithm used by the free Windows Media Encoder for voice coding. Of course, the format is closed as well.

Regards;

Roberto.

Reply #8
Quote
Originally posted by ty1er
I am going to be encoding a 60-CD collection of spoken text from the Bible.
My goal is to compress it down to 1 CD. In the end I am looking for halfway decent audio quality.
Which format would be the best way to achieve this?

I am thinking that I should use Vorbis -q 0.
Or maybe a low-bitrate LAME preset?

I just figured I would try to get some suggestions before I start this task.

Thank you
-ty1er


Take a 700 MByte CD. That's 80 min * 60 * 150 * 1024 * 8 bits.

Take the play-length of the 60 CDs. That is normally something between
45 and 80 hours.

Calculate the play-length in sec.

Divide the bits of the CD by the play-length in sec.
That gives something between 20 kbps and 36 kbps.

When you have this value, ask again. The answers are different for
20 kbps and 36 kbps. For 20 kbps the only answer is to use a speech
codec (or to use two CDs in a nifty jewel case). For 36 kbps, other
codecs may be possible.
--  Frank Klemm
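
The steps above translate directly into a short Python sketch. The capacity formula is the one given in the post (80 minutes at 150 KiB/s); the function name is only illustrative, and the 45 and 80 hour play-lengths are the range quoted above rather than measured values.

```python
# Bitrate budget for one data CD, following the steps in the post above.

CD_BITS = 80 * 60 * 150 * 1024 * 8  # 80 min * 60 s/min * 150 KiB/s * 8 bits/byte


def budget_kbps(total_hours: float) -> float:
    """Average bitrate (kbit/s) that fits total_hours of audio on one CD."""
    play_length_sec = total_hours * 3600
    return CD_BITS / play_length_sec / 1000


if __name__ == "__main__":
    for hours in (45, 60, 80):
        print(f"{hours} h -> {budget_kbps(hours):.1f} kbit/s")
```

Plugging in the real total play-length of the collection gives the number to bring back to the thread.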

Reply #9
Quote
Originally posted by Saint
I believe Speex is tailored for speech at very low bitrates. I've not tried it, so I can't comment on the quality.

It's on the HA front page at the moment if you want to give it a try.


Quality is comfortable compared with general-purpose codecs.
But Speex is early alpha and the file format is not frozen.
--  Frank Klemm

Reply #10
Quote
Vorbis -q 0 is approximately 64 kbit/s, which would need around 3 CDs.

I guess the speech will be in mono, so the bitrate will be about half that.

Also try --advanced-encode-option lowpass_frequency=N, it typically reduces the bandwidth even more.

Reply #11
Eh - this is probably not the most preferred solution, but for speech-only it will be one of the most functional with existing hardware, etcetera...

Try a Fraunhofer (FhG)-based MP3 encoder. I'm serious; you wouldn't want to use one for music (if you are serious about using MP3 for music, you are already using LAME), but for speech it can do quite nicely. This is the preferred codec for encoding Old-Time Radio shows, which were primarily speech and light music. Of course, they were broadcast over a restricted bandwidth and had lower overall requirements for aural fidelity, but you get the point.

Here's what I would recommend - and you'll have to modify your actual settings based on whichever implementation of the codec you manage to find:

    32 kbps/22 kHz MONO (roughly 51 hours per 700 MB disc)
    24 kbps/22 kHz MONO (roughly 68 hours per 700 MB disc)

In addition, if you need to fit more per disc you could try the following (but keep in mind that not all players are able to handle ultra-low bitrate MP3, which Fraunhofer designates as "MPEG 2.5"):

    16 kbps/11 kHz MONO (roughly 102 hours per 700 MB disc)

For speech, my current preference is 32 kbps/22 kHz MONO. Hope this helps.

    - M.
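
The hours-per-disc figures can be checked with a small Python sketch. It assumes a 700 MB disc means 700 * 1024 * 1024 bytes and ignores file-system and container overhead; the function name is only illustrative.

```python
# Playing time that fits on one disc at a constant bitrate.
# Assumptions: 700 MB = 700 * 1024 * 1024 bytes, no file-system overhead.

DISC_BITS = 700 * 1024 * 1024 * 8


def hours_per_disc(kbps: float) -> float:
    """Hours of audio at the given bitrate (kbit/s) that fit on one disc."""
    return DISC_BITS / (kbps * 1000) / 3600


if __name__ == "__main__":
    for kbps in (32, 24, 16):
        print(f"{kbps} kbps -> {hours_per_disc(kbps):.0f} hours")
```

Under these assumptions it gives roughly 51, 68 and 102 hours for 32, 24 and 16 kbps respectively.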

Reply #12
Quote
Originally posted by RM

I guess the speech will be in mono, so the bitrate will be about half that.

Also try --advanced-encode-option lowpass_frequency=N, it typically reduces the bandwidth even more.


Stereo ~ 1.2...1.3 * Mono

when Intensity Stereo is used heavily, as in Ogg Vorbis at 64 kbps.

Stereo ~ 2 * Mono

is wrong.
--  Frank Klemm
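
To put numbers on the rule of thumb above, here is a tiny Python sketch. The 1.2 to 1.3 ratio is the figure from the post, not something measured here, and the function name is made up for illustration.

```python
# Estimate the mono bitrate from a stereo bitrate and a stereo/mono ratio.
# The ratios come from the post above; they are rules of thumb, not measurements.


def mono_estimate_kbps(stereo_kbps: float, ratio: float) -> float:
    """Mono bitrate implied by a stereo bitrate and a stereo/mono ratio."""
    return stereo_kbps / ratio


if __name__ == "__main__":
    for ratio in (1.2, 1.3, 2.0):
        est = mono_estimate_kbps(64, ratio)
        print(f"ratio {ratio}: 64 kbps stereo ~ {est:.0f} kbps mono")
```

With a ratio of 1.2 to 1.3, going mono at the same quality setting saves far less than the naive halving would suggest.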

Reply #13
Quote
Originally posted by Frank Klemm
Stereo ~ 1.2...1.3 * Mono

when Intensity Stereo is used heavily, as in Ogg Vorbis at 64 kbps.

Stereo ~ 2 * Mono

is wrong.


Agreed. Any music-oriented encoder, like MP3, Vorbis, AAC... will be a bad idea for voice.

I would suggest you use the ACELP.NET encoder bundled with Windows Media Encoder, or the other encoders Jon Ingram mentioned.

Or maybe wait for Speex 1.0, but that will probably take a long time.

Regards;

Roberto.

Reply #14
Quote
Originally posted by M
Eh - this is probably not the most preferred solution, but for speech-only it will be one of the most functional with existing hardware, etcetera...

Try a Fraunhofer (FhG)-based MP3 encoder. I'm serious; you wouldn't want to use one for music (if you are serious about using MP3 for music, you are already using LAME), but for speech it can do quite nicely. This is the preferred codec for encoding Old-Time Radio shows, which were primarily speech and light music. Of course, they were broadcast over a restricted bandwidth and had lower overall requirements for aural fidelity, but you get the point.

Here's what I would recommend - and you'll have to modify your actual settings based on whichever implementation of the codec you manage to find:

    32 kbps/22 kHz MONO (roughly 51 hours per 700 MB disc)
    24 kbps/22 kHz MONO (roughly 68 hours per 700 MB disc)

In addition, if you need to fit more per disc you could try the following (but keep in mind that not all players are able to handle ultra-low bitrate MP3, which Fraunhofer designates as "MPEG 2.5"):

    16 kbps/11 kHz MONO (roughly 102 hours per 700 MB disc)

For speech, my current preference is 32 kbps/22 kHz MONO. Hope this helps.

    - M.


I would never use MP3 below something around 40 kbps.
Only if no one is going to listen to the files.

Squeezing the audio until you have 60 hours of trash should not be
the goal. Better to use 2 or 3 CDs.

1500 hours of MPEG-4 TTS on one CD ...
--  Frank Klemm

Reply #15
Quote
Originally posted by Frank Klemm


Stereo ~ 1.2...1.3 * Mono

when Intensity Stereo is used heavily, as in Ogg Vorbis at 64 kbps.

Stereo ~ 2 * Mono

is wrong.


But they are always talking about encoding a 22 kHz mono stream. If you use Vorbis at -q-1 (nominal rate of 45 kbps *at* 44 kHz stereo) it will end up in the range of 20 kbps at 22 kHz mono, or even less (I haven't tried it). Task achieved, 60 CDs in 1.

Greetings

Reply #16
Quote
Originally posted by ManyFaces
But they are always talking about encoding a 22 kHz mono stream. If you use Vorbis at -q-1 (nominal rate of 45 kbps *at* 44 kHz stereo) it will end up in the range of 20 kbps at 22 kHz mono, or even less (I haven't tried it). Task achieved, 60 CDs in 1.

Greetings


Hehehe. But the quality is much worse than a voice-oriented codec at the same bitrates.

Reply #17
Quote
Originally posted by rjamorim


Hehehe. But the quality is much worse than a voice-oriented codec at the same bitrates.


Maybe, but if he wants a free solution today, that is one. He can tune it, though: let's say mono at 36 kHz or...

...the real solution is always the same: search, try, compare and pick the lesser evil... <grin>

I will bet on the lossless way, but... hey! that's my opinion!...

Reply #18
Quote
Originally posted by verloren
I recently compressed 13 CDs of spoken word down to 1 CD using something around -q4. But to do the same for 60 at -q0 would, I guess, take about 3 CDs! What you could try is using a low quality setting (-q-1) AND downsampling to 22 kHz or even lower. Obviously the quality will suffer some, but on spoken word it's not too bad.

Cheers, Paul


Oops, I just saw verloren made the same statement... but, again, let's say -q-1 AND downmix to mono AND downsample to 36 kHz.

BTW, if it's voice, it would be easy to encode, right? So it's even possible that just downmixing to mono and keeping the sample rate at 44 kHz will do.

Just my 2 (Euro) Cents

Reply #19
Quote
Originally posted by ManyFaces
Oops, I just saw verloren made the same statement... but, again, let's say -q-1 AND downmix to mono AND downsample to 36 kHz.

BTW, if it's voice, it would be easy to encode, right? So it's even possible that just downmixing to mono and keeping the sample rate at 44 kHz will do.

Just my 2 (Euro) Cents


I keep my opinion that a voice codec will perform better.

I just ran a test with a CD I have (a remastered version of the famous "War of the Worlds" broadcast by Orson Welles).

ACELP.NET (I used the Windows Media Encoder one) at 16 kbps/16 kHz/Mono sounded much better than Vorbis at 19 kbps/16 kHz/Mono (using -q-1).

Regards;

Roberto.

Reply #20
Quote
Originally posted by rjamorim


I keep my opinion that a voice codec will perform better.

I just ran a test with a CD I have (a remastered version of the famous "War of the Worlds" broadcast by Orson Welles).

ACELP.NET (I used the Windows Media Encoder one) at 16 kbps/16 kHz/Mono sounded much better than Vorbis at 19 kbps/16 kHz/Mono (using -q-1).

Regards;

Roberto.


OK, nothing to argue... Again: try, compare, and choose what best suits your needs.

Reply #21
Quote
BTW, if it's voice, it would be easy to encode, right? So it's even possible that just downmixing to mono and keeping the sample rate at 44 kHz will do.

Not true at all. Because we listen to voices so often, we're very attuned to artifacts in voice reproduction. It's very hard for a general audio encoder to do low-bitrate voice well, unless it has sections which specifically search for 'voice like' components.

On the other hand, if you know that all you're going to be compressing is voice, then you can make a number of assumptions that a general audio compressor cannot, which is why these voice encoders compress to a much smaller bitrate than Vorbis or MP3. Going down even further, and making even more assumptions, you can find specialised voice encoders which only use 1 kbit/second... they don't sound very good, but the miracle is that they exist at all.

 

Reply #22
Quote
Originally posted by Jon Ingram

Not true at all. Because we listen to voices so often, we're very attuned to artifacts in voice reproduction. It's very hard for a general audio encoder to do low-bitrate voice well, unless it has sections which specifically search for 'voice like' components.

On the other hand, if you know that all you're going to be compressing is voice, then you can make a number of assumptions that a general audio compressor cannot, which is why these voice encoders compress to a much smaller bitrate than Vorbis or MP3. Going down even further, and making even more assumptions, you can find specialised voice encoders which only use 1 kbit/second... they don't sound very good, but the miracle is that they exist at all.


I never doubted that a specialized speech codec will give better results. I only stated that, given the only free speech codec (I only know about Speex) is still alpha, a readily available alternative would be Vorbis, at the cost of quality.

Again, I did lossy backups of audio and images in the past, and paid dearly. If I were to encode 60 CDs, I'd buy twenty or thirty CD-Rs and use FLAC.

 