Topic: encoding the spoken word - settings? (Read 7838 times)

encoding the spoken word - settings?

I have a fair number of audio books on CD, and also many language samples (conversations on audio CDs) for teaching purposes. Most appear to be in stereo, but with a central audio image. I want to encode these for portable player / car use. Smaller file size would be helpful. Apart from encoding to mono, can anyone suggest space saving parameters for speech only files?

In asking for your help I am assuming that clear speech for portable use may be less demanding of bits than music for portable use, for which I use --alt-preset medium with the modified LAME v3.90.3, as per the 'recommended' page.

I see v3.96.1 has a 'voice' parameter...?

Thanks in anticipation

Reply #1
Quote
Apart from encoding to mono, can anyone suggest space saving parameters for speech only files?
I recently did a project similar to this. It took a little trial and error, but the settings I ended up with work best for me.

I started with 16-bit mono WAV files at 22050Hz and converted them to MP3 using LAME 3.96.1 with the following parameters:

-V3 --vbr-new --lowpass 8

These settings produce an MP3 file with an average bitrate of around 48 kbps, low-pass filtered at 8 kHz, which seems fine for speech.
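For anyone scripting this, here is a minimal Python sketch that assembles the command line above without running it. The function name and the file names `chapter01.wav` / `chapter01.mp3` are made up for illustration; only the LAME switches come from the post.

```python
import shlex


def lame_speech_command(wav_path, mp3_path, quality=3, lowpass_khz=8):
    """Build (but do not run) a LAME command line using the switches from the post."""
    return [
        "lame",
        f"-V{quality}",                 # VBR quality level; the post used -V3
        "--vbr-new",                    # newer VBR routine, as used with 3.96.1
        "--lowpass", str(lowpass_khz),  # low-pass cutoff in kHz
        wav_path,                       # hypothetical input file name
        mp3_path,                       # hypothetical output file name
    ]


cmd = lame_speech_command("chapter01.wav", "chapter01.mp3")
print(shlex.join(cmd))
```

To actually encode, the list could be passed to `subprocess.run(cmd, check=True)`, assuming `lame` is on the PATH.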

A typical 45-minute speech shrinks from ~115 MB (WAV) to ~15 MB (MP3) in about 35 seconds on my computer (P4 2.8 GHz, 1 GB RAM, Windows XP).
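Those sizes can be sanity-checked with a little arithmetic. This is a rough sketch that assumes uncompressed 16-bit mono PCM at 22050 Hz, ignores the small WAV header, and treats the VBR output as a constant 48 kbps average; the function names are my own.

```python
def pcm_wav_bytes(seconds, sample_rate_hz, bits_per_sample=16, channels=1):
    """Approximate size of uncompressed PCM audio (WAV header ignored)."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels


def mp3_bytes(seconds, average_kbps):
    """Approximate MP3 size, treating the average bitrate as constant."""
    return seconds * average_kbps * 1000 // 8


MIB = 1024 * 1024
seconds = 45 * 60                      # a 45-minute recording

wav = pcm_wav_bytes(seconds, 22050)    # 16-bit mono at 22050 Hz
mp3 = mp3_bytes(seconds, 48)           # ~48 kbps average, as reported above

print(f"WAV: {wav / MIB:.1f} MiB")     # about 113.6 MiB
print(f"MP3: {mp3 / MIB:.1f} MiB")     # about 15.4 MiB
```

Both figures land close to the sizes quoted above, so a 48 kbps average is consistent with the reported result.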

Hope this helps...

~esa


Reply #3
As many portable MP3 players do not support sample rates below 32 kHz (mine does not even support 32 kHz), you should encode some snippets with different sample rates and test them.
I got quite good results using 3.96.1 and 3.97 (alpha!) with -V7 --resample 44100.
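A quick way to prepare such test snippets is to generate one command per candidate sample rate. The sketch below only builds the command lines; `snippet.wav`, the output names, and the list of rates are assumptions for illustration. As far as I know, LAME's help text documents `--resample` in kHz, so the rates are written that way here, whereas the post above wrote the value in Hz.

```python
def snippet_commands(wav_path, rates_khz=("22.05", "32", "44.1"), quality=7):
    """Return one LAME command line per candidate output sample rate (not executed)."""
    commands = []
    for rate in rates_khz:
        # Hypothetical output name that records the sample rate under test.
        out_path = f"test_{rate.replace('.', '_')}kHz.mp3"
        commands.append(
            ["lame", f"-V{quality}", "--resample", rate, wav_path, out_path]
        )
    return commands


commands = snippet_commands("snippet.wav")
for cmd in commands:
    print(" ".join(cmd))
```

Copy the resulting files to the player and keep the lowest sample rate it plays correctly.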

Reply #4
I might be wrong, but --resample 44100 has no effect if you rip a normal audio book CD - it IS 44.1 kHz already... I also think that standard -V7 actually should DOWNsample to 32 kHz without switches.

Furthermore, you should consider using -a (mono), as it is unlikely that there is any stereo content worth preserving on an audio book. If you downsample to 22 kHz, you should also be aware of the usefulness of a lowpass at 11 kHz or lower.
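The 11 kHz figure follows from the Nyquist limit: a signal sampled at a given rate can only carry frequencies up to half that rate, so a lowpass above that point gains nothing. A small sketch (the function name is my own):

```python
def max_useful_lowpass_khz(sample_rate_hz):
    """Nyquist limit in kHz: half the sample rate is the highest representable frequency."""
    return sample_rate_hz / 2 / 1000


for rate_hz in (22050, 24000, 32000, 44100):
    limit = max_useful_lowpass_khz(rate_hz)
    print(f"{rate_hz} Hz sample rate -> lowpass at {limit:g} kHz or lower")
```

For speech, a cutoff well below the limit is often acceptable, as with the 8 kHz lowpass suggested earlier in the thread.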

I really support your idea of testing the first files on the relevant hardware before making GBs of potentially unplayable files, and as you say - there are limitations here!

Reply #5
Quote
I might be wrong, but --resample 44100 has no effect if you rip a normal audio book CD - it IS 44.1 kHz already... I also think that standard -V7 actually should DOWNsample to 32 kHz without switches.

Yes, -V7 does downsample to 32 kHz, but my player only wants 44.1 or 48 kHz. That's why I need the resample switch.
Especially when listening with headphones, I don't like mono encodings. Spatial reverberation, which needs stereo, makes the whole thing sound more natural.

Reply #6
Sorry, Sunhillow, I didn't read your post properly and got mixed up between bitrate and sample rate. Given the limitations of your hardware and your preference for stereo, your settings seem OK to me now...
But your hardware isn't actually an MP3 player (as I'm sure you're aware), since it doesn't comply with the MP3 standard...

So, ardea, I think you should test which sample rates and bitrates your hardware supports. Then you will have to decide on quality vs. file size, in other words mono vs. stereo, the lowpass filter cutoff, and the chosen bitrate or quality setting... Good luck!

Edit: PS - the --voice setting is also available in a 3.90.3 compile. I think it gives mono, a 24 kHz sample rate and a 12 kHz lowpass, and results in a bitrate of something like 56 kbps.

Reply #7
Thanks for your helpful suggestions and links. I had tried a search on just about every word for 'verbals' except 'speech'!

There's plenty here to start me off in the right direction. LAME is powerful software, supported by many hours of skilled labour - mere thanks does not seem to do this justice! However, for the aging erudite tyro like me, it can appear rather esoteric. With a large selection to choose from, I guess anticipatory knob twiddling can be fun (there, I've said it), but reading too much about it can cause the innocent to miss the on switch for the oscillating grommet widget (if you see what I mean)

Thanks again!