Topic: Lame Settings For Speech?

Lame Settings For Speech?

Hi.

I need some help with a project converting PowerPoint narrations from .wav to .mp3.  Basically, I'd like to know the best settings to use in LAME for speech.

I'd use something like SPEEX, but I need to ultimately stream the files via Flash as .swf files.  .SWF only supports .mp3.  Also, .SWF files do not support Joint Stereo or VBR.  I also will be targeting 24kbps/22050Hz/Mono.  Many users will be listening to the .swf's streamed via dial-up modems.

So given those parameters, I have two questions.  First, what PCM setting is the best to start with when we do the narration recordings on the PC?  I don't want to capture at too high a quality, because disk space may be at somewhat of a premium, especially since captured streams can run up to an hour in length.  So the smallest I can get away with is preferred.

Second, what are the best LAME switches to use, given that this will only be speech/voice-based audio (and also considering the limitations of .swf I stated above)?  Again, I'm thinking I want to encode to 24kbps/22050Hz/Mono, but I really don't know what to set the other options to.

Thanks so much!

David

Reply #1
If you are going to encode at 22,050 Hz, I suggest you use some FhG encoder. LAME is only optimized for 44,100 Hz encoding.

Reply #2
Quote
I suggest you use some FhG encoder

Yes, but this will be an app that we distribute, and we would like to stay open-source to avoid license fees.

Reply #3
Portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements? Speed requirements?

24kbps mp3 will be MPEG-2 layer III.

Well, there really isn't that much to tweak given the restrictions.. cbr/mono/24kbps.

Basic switch would be something like:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7

I'll try testing a few more settings quickly...

With --lowpass 8 you pretty much need --nspsytune, otherwise it starts to sound too distorted to me.

So, maybe:
lame -a -h -b 24 --nspsytune --resample 22 --lowpass 7 --athtype 2
Another option I found pretty good is:
lame -a -h -b 24 --resample 22 --lowpass 7 -X 1 --athtype 2

The nspsytune line above is a bit more muffled. The one below it (default gpsycho) is maybe a bit clearer but has some higher-frequency swishing. I guess it's a matter of taste..

Using "higher quality" quantization noise shaping (-q0 or -q1) just made it worse, and -q0 in particular is known to be broken in Lame 3.90-3.92. Both the nspsytune and gpsycho lines use the -X1 quantization noise measurement method (the default with nspsytune), which gives better results than gpsycho's default (-X0).

I guess it also depends on the source. I used 44 kHz stereo speech.

[noticed that you couldn't use speex, so edited that away]
Juha Laaksonheimo

Reply #4
Quote
stereo/mono? portable compatibility? MPEG1 layer3 only or is MPEG2 layer 3 or MPEG2.5 layer 3 also ok? Other requirements?

Check out also speex. It's open source speech codec.
http://speex.sourceforge.net/
Check out Windows binaries from rarewares http://www.inf.ufpr.br/~rja00/ (of course down atm).
Winamp alpha speex plugin:
http://www.saunalahti.fi/~cse/Speex/in_speex.zip
http://www.saunalahti.fi/~cse/Speex/in_speex_src.zip

You can get speex binaries temporarily from here:

http://audio.ciara.us/rarewares/speexbundle.zip

Let's hope I stop being a lazy guy and finish setting up the mirror.

Reply #5
John,

Thanks for your detailed response.  And you are right, .swf supports only MPEG-2 layer III.

I wonder how much of a hit the sound will take capturing the original speech at a lower quality, say 22khz, 16bit, mono?  I have to consider the user's available disk space on their local machine for this project.

What are your thoughts about the right capture quality setting?  I'm attempting to balance original PCM file size with quality of source before encoding.
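
For the file-size side of that balance, raw PCM size is just sample rate × bytes per sample × channels × duration. Below is a minimal Python sketch of that arithmetic; the function name and the list of candidate formats are illustrative assumptions, and the figures ignore the small WAV header.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of raw PCM audio data in bytes (WAV header excluded)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

HOUR = 3600  # the longest narration length mentioned in the thread

# Candidate capture formats (illustrative choices, not a recommendation)
candidates = [
    ("44100 Hz, 16-bit, stereo", 44100, 16, 2),
    ("44100 Hz, 16-bit, mono", 44100, 16, 1),
    ("22050 Hz, 16-bit, mono", 22050, 16, 1),
    ("11025 Hz, 16-bit, mono", 11025, 16, 1),
]

for label, rate, bits, ch in candidates:
    size = pcm_bytes(rate, bits, ch, HOUR)
    print(f"{label}: {size / 1_000_000:.1f} MB per hour")
```

By this arithmetic, an hour at 22050 Hz/16-bit/mono is about 159 MB, a quarter of the roughly 635 MB needed for 44100 Hz/16-bit/stereo.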

One other thing to add...there tends to be a lot of background "hiss" in these microphone recordings.  Sorry that this newbie doesn't know the technical name for this.  Any setting in particular that will help with that?

David

Reply #6
Hmm, I don't believe there's much quality loss if you capture at 22 kHz/16-bit/mono and then encode.
Can't say for sure though.. If you put a few short sample .wavs online, I could do some testing.

It could also help to tweak the settings, especially if there is a lot of background noise and you are not gonna do any noise removal with 3rd-party software first. Some settings will definitely sound better than others with lots of background noise...
Juha Laaksonheimo

Reply #7
"Noise removal" algorithms tend to be fairly complex, but if you do some searching you may find some open-source implementations (if something like that isn't too CPU-intensive to use in your app).  Most are essentially glorified dynamic bandpass filters -- they subdivide the spectrum into small frequency ranges, look for the ones which "look like hiss" (I'm not entirely sure how this is recognized; perhaps too constant of a sound) and then filter out that region of the spectrum.  If you're only using one computer/mic combo for the encoding, you can simplify this process by recording some silence and spectrum-analyzing it to find out where the hiss is concentrated and then just filter out that frequency range.  If you need to work on arbitrary computer/mic setups, you'll have to do the more in-depth dynamic analysis though to figure out at runtime where the hiss is located.

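The "record some silence and spectrum-analyze it" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a naive DFT (slow, but dependency-free), the helper names are made up, and the "silence" is a synthetic faint 3 kHz tone standing in for a real recording. Real hiss is usually broadband, so in practice you would look at how the energy is spread across bins rather than at a single peak.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short test clips)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        mags.append(abs(acc))
    return mags

def loudest_band_hz(samples, sample_rate_hz):
    """Centre frequency of the strongest non-DC bin in a 'silence' clip."""
    mags = magnitude_spectrum(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate_hz / len(samples)

# Synthetic stand-in for recorded silence: a faint 3 kHz tone (assumption)
RATE = 8000
N = 256
silence = [0.01 * math.sin(2 * math.pi * 3000 * i / RATE) for i in range(N)]

print(loudest_band_hz(silence, RATE))
```

Once the noisy region is known for a fixed computer/mic combo, a static filter over that range is the simplified approach described above.
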
Reply #8
That's a great idea. You could also try pre-processing the wavs with the more complex routines of Cool Edit or another good noise reduction programme.

There is absolutely nothing to waste at 24kbps.

Reply #9
--alt-preset 24 -m m

Reply #10
Quote
--alt-preset 24 -m m

Hmm, in order to comply with the requirements it would have to be 22khz/24kbps cbr.
This sounds pretty decent:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

It's otherwise exactly the same nspsytune line I mentioned before, but it adds --ns-bass -3. That increases the quality, so the above line is better than my first nspsytune suggestion. Imo the alt-preset 24's default lowpass (4 kHz) sounds pretty muffled, even for speech. The -m m and -a switches do the same thing here (downmix to mono).
Juha Laaksonheimo

Reply #11
My Vorbis files at 22 kHz, mono sound better if I preprocess them with Soundprobe: DC offset, resample [22, mono], Expander, Normalization... try it yourself!

Reply #12
Hi again.

This was really great advice, and as such, we are using the following settings as our "main" settings for our application:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

I'm hoping you can help me a little more.

We record the .WAVs at 16-bit, 22050 Hz, mono.  We never have control over the microphone or the environment, as any user can use the software.

We want to be able to offer an even lower quality/lower bandwidth setting as well, for those users who view the presentations on very low bandwidth connections.  Our options given the limitations of the Flash MP3 format are:



I'm happy to email you a sample .wav file if you like.  (But it is just speech, so if you want to just use your own...either way)   

Thanks again!

David

Reply #13
Ok, 8kbps is too low for Lame..

This is a line which was the best with my speech samples:
--alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z

Yes, some people might wonder why use -Z  (here noiseshaping type 1). Obviously the bitrate is so low, that what is logical at a bit higher bitrates does not apply to extreme low bitrate. Using -Z made especially my female speech sample sound better.
Juha Laaksonheimo

Reply #14
>    --alt-preset cbr 16 -a --resample 11 --lowpass 5 -Z

Awesome!  Thanks so much!

David

Reply #15
Well, gotta say that I like this even better:
-b 16 -a --resample 11 --lowpass 5 --athtype 2

It's considerably less noisy, but a bit more metallic. So that's my best line so far.. 
Juha Laaksonheimo

Reply #16
Ok.  Now I'm gonna change the requirement completely.  Now we want to add a higher quality setting above our main setting.  So to review, our main setting is at:

--alt-preset cbr 24 -a --resample 22 --lowpass 7

And we capture the .wav at 16-bit, 22050 Hz, mono.  So if we looked at 32, 40, and 48kbps (based on the chart above of supported .swf mp3 formats), where do we get the most bang for our bandwidth buck, and what settings work best there?
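
The bandwidth side of that question is easy to put numbers on, since a CBR stream is simply bitrate × duration. A rough Python sketch follows; the function name is an assumption, and the result ignores tags and frame padding. It says nothing about which bitrate sounds best, only what each one costs.

```python
def cbr_bytes(bitrate_kbps, seconds):
    """Approximate size of a CBR stream: bitrate x duration (no tags/padding)."""
    return bitrate_kbps * 1000 * seconds // 8

# Bitrates discussed in this thread, for a one-hour narration
for kbps in (16, 24, 32, 40, 48):
    mb = cbr_bytes(kbps, 3600) / 1_000_000
    print(f"{kbps} kbps: {mb:.1f} MB per hour")
```

At 24 kbps that is about 10.8 MB per hour, and 48 kbps doubles it to about 21.6 MB.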

Thanks as always!

David

Reply #17
Heh, I just noticed that you can increase the 24kbps quality still quite nicely if you add -Z to the 24kbps line.
Until this thread I had never actually tested anything at this low a bitrate, so it's surprising to notice how some features behave totally opposite to what one might expect..

These are my recommendations so far for:

24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z

16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3
Juha Laaksonheimo
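
For an app that shells out to LAME, the two recommended lines could be wrapped roughly as follows. This is a hedged Python sketch: the flag strings are copied verbatim from the post above, but the dictionary, the function name, and the file names are hypothetical, and the flags are those of the LAME 3.9x builds discussed here, so check the help output of whatever build you ship.

```python
import shlex

# Flag strings copied verbatim from the recommendations in this thread
SPEECH_PRESETS = {
    24: "--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z",
    16: "-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3",
}

def lame_command(bitrate_kbps, wav_path, mp3_path, lame_exe="lame"):
    """Build the argv list for one encode; pass it to subprocess.run() to execute."""
    flags = shlex.split(SPEECH_PRESETS[bitrate_kbps])
    return [lame_exe, *flags, wav_path, mp3_path]

# Hypothetical file names, for illustration only
print(lame_command(24, "narration.wav", "narration.mp3"))
```

Building a list rather than one long string avoids shell quoting problems when user-chosen file names contain spaces.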

Reply #18
Should these low bitrate settings be added to the 'List of recommended LAME settings' thread once agreed upon?
There's really nothing below 80 kbps in that list.
BTW, can these tweaks be integrated into the (alt-)presets, to make the statement '--alt-preset (CBR) xx gives best quality at bitrate xx' true for all bitrates, including the low end?

Reply #19
Notice that I've only tested mono speech here.. It could be that music needs a lower lowpass in order to sound even half decent.
And I hope that some other people try testing these too, in order to verify or improve on my findings.
Juha Laaksonheimo

Reply #20
just FYI: LAME before 3.93 has a bug in pre-echo prevention in mono mode.
I recommend you use the latest LAME if you want to use mono.

PS. I think the last problem on the 3.93 is --preset fast standard.
May the source be with you! // Takehiro TOMINAGA

Reply #21
Quote
just FYI: LAME before 3.93 has a bug in pre-echo prevention in mono mode.
I recommend you use the latest LAME if you want to use mono.

Hmm, yeah Lame3.93a is marginally better here with very sharp syllables. Overall, considering the speech quality, the improvement is very minor.

Of course, with music coding and at a bit higher quality, this is a more important issue.
Juha Laaksonheimo

Reply #22
Quote
PS. I think the last problem on the 3.93 is --preset fast standard.

Sorry for the off topic post, but if the rest of the developers are waiting for me to fix the fast presets in 3.93 before releasing, I suggest you just go ahead and release now.  What I'd prefer would happen is that the fast settings are just disabled in 3.93 with a notice that if people want to use them, that they should use 3.92 instead.  Right now, I'm just so busy.. I don't really have time to work on LAME at the moment.

Reply #23
Quote
These are my recommendations so far for:
24kbps speech:
--alt-preset cbr 24 -a --resample 22 --lowpass 7 -Z

16kbps speech:
-b 16 -a --resample 11 --lowpass 5 --athtype 2 -X3


Thank you.  We're getting this into our code today.  Now, could I hit you up for the best speech settings at 32, 40, and 48kbps (all 22050Hz and mono)?  I promise we'll be done then, and both my boss and I will be very grateful!
B)

Reply #24
One thing I do to get speech files as small as possible is to apply a noise gate. Depending on how aggressively you use it, you can get files a lot smaller by setting all the pauses between words to zero.
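
A very simple gate of that kind can be sketched in Python as below. Everything here is an assumption for illustration: the function name, the fixed window size, and the peak-based threshold (real gates usually add attack/release smoothing to avoid clipping the starts of words). Note also that with CBR encoding the file size is fixed by the bitrate, so the benefit there would be spending fewer bits on hiss rather than a smaller file.

```python
def noise_gate(samples, threshold, window=256):
    """Zero every window whose peak absolute amplitude is below threshold."""
    out = list(samples)
    for start in range(0, len(out), window):
        block = out[start:start + window]
        if max(abs(s) for s in block) < threshold:
            out[start:start + len(block)] = [0] * len(block)
    return out

# Tiny demo with 16-bit-style integer samples (values are made up)
speech = [1200, -900, 1500, -1100]
hiss = [12, -9, 7, -11]
gated = noise_gate(speech + hiss + speech, threshold=100, window=4)
print(gated)
```

In the demo, the four low-level "hiss" samples between the two speech bursts become zeros while the speech samples pass through unchanged.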
