Some Lame settings for voice; a Lame bug

2005-10-24 08:59:49

I have been fiddling around with Lame a bit, and find that the presets for speech / spoken word work well for high-quality speech (56k) and medium-quality speech (25k). In these examples, your_podcast.wav is the raw wav file of a recording consisting mainly of spoken word.

To make a high-bandwidth (56k) podcast:

lame --preset voice your_podcast.wav your_podcast.mp3

This will make an excellent-sounding mp3; this will even sound good with music in the background.

Alas, the resulting mp3 cannot be listened to in real time over dialup. To work around this, we can encode it at around 25 kbps:

lame --preset sw your_podcast.wav your_podcast.mp3

"sw", I presume, stands for shortwave. This won't be quite as clear as the above 56k mp3, but will be perfectly listenable (even with music in the background), and has the advantage of being downloadable in real-time over dialup (as long as the user is able to dial in at a fast speed).

Lame also has a preset for encoding at around 16k, which has a lot of compression artifacts. I don't recommend this for anything besides spoken voice without any background music:

lame --preset phone your_podcast.wav your_podcast.mp3

Now, I have been fiddling around with Lame, and got a setting which is even more compact for just spoken word, and gives one telephone-quality audio (even audio with background music, to boot). The sound of an mp3 encoded this way reminds me of those old 1950s transistor radios. The incantation is:

lame --abr 12 -a --resample 11 --lowpass 2.5 --highpass .2 -B 16 your_podcast.wav podcast.mp3

The bitrate of something encoded with this is between 12kbps and 13kbps. The audio doesn't sound very good, but is still comprehensible. This is offset by the fact that an hour of audio takes less than six megabytes, and can be downloaded in real-time over a dialup with enough bandwidth to spare to allow the user to surf the web or what not.
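
As a rough check of the size figure, here is a minimal sketch of the arithmetic. It assumes an average of 12.5 kbps (the midpoint of the 12 to 13 kbps range above, not a measured value) and ignores ID3 tags and other overhead:

```shell
# Rough size of one hour of audio at an assumed average bitrate.
# 12500 bits/s is an assumption: the midpoint of the 12-13 kbps range.
bitrate_bps=12500
seconds=3600

# bits -> bytes: divide by 8
size_bytes=$(( bitrate_bps * seconds / 8 ))
echo "bytes per hour: $size_bytes"
```

Under that assumption an hour comes to 5,625,000 bytes, which is consistent with "less than six megabytes".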

In English, the settings mean:

--abr 12: "An average bitrate of 12kbps" (Lame actually makes a file a little over 12kbps in size; --abr is a lowball figure at low bitrates)

-a: Mono audio (this actually isn't needed; Lame knows to downsample down to mono at this bitrate)

--resample 11: Lame doesn't know to downsample to a sampling rate lower than 16khz; this forces Lame to downsample down to 11khz

--lowpass 2.5: We have a low-pass filter with a cutoff of 2.5 khz. In other words, no audio higher than 2.5 khz is let through. This muffles the sound a bit, but mostly eliminates the metallic sound of low bitrate mp3s.

--highpass .2: This is a high pass filter with a cutoff of 200hz. This gets rid of the lower frequencies so there are fewer frequencies to encode; the human brain knows how to reconstruct the lower frequencies.

-B 16: Do not have any frames larger than 16kbps. This reduces the size of the file by about 10%-20%. While it makes sense to have larger frames when encoding music without any audible artifacts, it doesn't make sense when trying to make a spoken voice file as small as possible.

--

Lame has a bug where it cuts off the last second or so of audio; while not annoying when encoding music, it is very annoying when recording audio at low bitrates. The amount of audio cut off depends on the bit rate of the mp3; more audio is cut off at lower bitrates. I work around this bug by adding 1.25 seconds of silence at the end of a file before encoding it; this can (with a bit of trouble) be automated:

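# Convert to headerless raw samples, then discard the original wav
sox file.wav file.raw
rm file.wav
# 44100 * 5 = 220500 zero bytes = 1.25 s of 16-bit stereo silence at 44.1 kHz
dd bs=44100 count=5 if=/dev/zero of=silence.raw
cat silence.raw >> file.raw
rm silence.raw
# Rebuild the wav: 44.1 kHz, 2 channels, 16-bit words (-w), signed (-s)
sox -r 44100 -c 2 -w -s file.raw file.wav
rm file.raw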
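
The dd byte count only gives 1.25 seconds for 44.1 kHz, 16-bit, stereo input. A small sketch of the arithmetic, with variable names made up for illustration, in case the source file has a different format:

```shell
# How much silence does "dd bs=44100 count=5" add?
# Assumes 44.1 kHz, 2 channels, 16-bit (2 bytes per sample) raw audio.
rate=44100
channels=2
bytes_per_sample=2

pad_bytes=$(( 44100 * 5 ))                               # bytes written by dd
bytes_per_sec=$(( rate * channels * bytes_per_sample ))  # 176400
pad_ms=$(( pad_bytes * 1000 / bytes_per_sec ))
echo "padding: $pad_bytes bytes = $pad_ms ms"
```

For a mono or differently sampled file, change rate and channels and adjust the dd count so that pad_ms comes out where you want it.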

- Sam

Reply #1 – 2005-10-24 09:49:28

Quote

Lame has a bug where it cuts off the last second or so of audio

I'd like more details on this please:
*Lame version
*parameters
*input sample file

Reply #2 – 2005-10-24 10:25:51

What we need to know:

* Version of Lame
* Length of the input file (in samples)
* Sample rate of input file (44.1 or 48 kHz)
* Resampled to (okay, this is known: 11.025 kHz)
* decoder which was used (very important!)

An MP3 stream consists of overlapping blocks.
Some decoders discard incomplete blocks, with the result that the
decoded file is shorter than the original file.
Other decoders don't discard incomplete blocks, with the result that the
decoded file is longer than the original file.
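
To put a scale on the block size: a Layer III frame carries 1152 samples per channel at the MPEG-1 sampling rates (32, 44.1 and 48 kHz) and 576 at the lower MPEG-2 and MPEG-2.5 rates. The sketch below only converts that into time per frame; the function name is made up for illustration, and it says nothing about how many frames a particular decoder actually drops.

```shell
# Duration of one Layer III frame in microseconds for a given sample rate.
# 1152 samples per frame at 32 kHz and above, 576 below that.
frame_us() {
    rate=$1
    if [ "$rate" -ge 32000 ]; then
        samples=1152
    else
        samples=576
    fi
    echo $(( samples * 1000000 / rate ))
}

for r in 44100 32000 16000 11025; do
    echo "$r Hz: $(frame_us "$r") us per frame"
done
```

A frame is a few tens of milliseconds at all of these rates, so a cut of a second or more would span many frames.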

Another idea:
Try resampling the file before encoding. Maybe there is a bug in the
resampling engine.

Reply #3 – 2005-10-24 19:31:06

First of all, I didn't think Lame developers were here; if I had known this, I would have made a more thorough bug report. See below for the requested information.

Since developers are here, I would like to say that, for music, I am very pleased with how Lame sounds with "--preset standard". The only times I have heard problems at this setting are when the original source file had the problems (and, yes, I'm one of those snobs who hates the sound of 128k mp3s).

The presets are also well designed for low-bandwidth audio, and work very well for just about any audio I throw at them.

The only time I have had problems with the presets is when trying to make a 16k mp3 where the source file is a combination of voice, sound effects, and music (--preset phone sounds bad, IMHO, unless the source is pure spoken word). Then again, anyone trying to make an mp3 at 16k or smaller is going to have to compromise the audio, and should really know the options well enough to be able to decide how they want the audio to be downgraded.

--

* Version of Lame

LAME version 3.96.1. If it would help the developers any, I am willing to try the latest Alpha or CVS version.

* Length of the input file (in samples)

8,818,824 16-bit stereo samples

* Sample rate of input file (44.1 or 48 kHz)

44.1 (CD audio)

* Resampled to (okay, this is known: 11.025 kHz)

Bug appears when resampled to 32khz (the very end of the last word is cut off), 16khz (about half of the last word is cut off), and 11.025 khz (most of the last word is cut off)

* decoder which was used (very important!)

XMMS 1.2.10

--

The silence workaround does work. If I was developing Lame the way I'd resolve this is to have a zero-pad option that would pad the last frame with silence to remove any incomplete frames. Another option I would add would be one to zero-pad the beginning of the song. Then again, Lame already has a zillion options and zero-padding can be done with a combination of dd, cat, and sox.

- Sam
http://www.samiam.org/

(Edit: Clarification; Lame sounds great for low-bandwidth spoken voice mp3s also)

Reply #4 – 2005-10-24 22:18:38

Quote

If I was developing Lame the way I'd resolve this is to have a zero-pad option that would pad the last frame with silence to remove any incomplete frames.

It is supposed to already be this way. Investigating your report...

Reply #5 – 2005-10-25 01:01:34

Quote

Quote
If I was developing Lame the way I'd resolve this is to have a zero-pad option that would pad the last frame with silence to remove any incomplete frames.

It is supposed to already be this way. Investigating your report...

I just did some cross-player testing. My portable player, a Creative Zen Nano Plus, cuts off the low bandwidth mp3 at the end, but cuts off less than XMMS. (As an aside, the Zen is buggy when seeking a low-bandwidth mp3.) Lame may be technically correct here; I think low-bandwidth mp3s have not been as extensively tested as music-bandwidth mp3s so a lot of players cut things off at the end.

Since the cut off exists with two different players, but at different points, it may be that Lame will have to just have a user-settable zero-pad-end option or some such.

BTW, is there a LAME FAQ which details problems like this, so people know how to work around things like buggy mp3 players cutting off MP3s before the end of the file and what not?

Now that podcasting is catching on, I'm sure low-bandwidth mp3s will be tested a lot more.

If there is any other testing I can do to help, please let me know. So far, I have gotten a far better response here than I'm sure I'd get from Creative if I reported their buggy low-bandwidth mp3 behavior. Most big corporations have a filter which makes it impossible for me to report bugs to the people who can fix bugs.

- Sam

Reply #6 – 2005-10-25 09:14:52

A stupid question: did you decode your MP3 to wav and check whether the end is still cut off?

Reply #7 – 2005-12-06 04:11:49

I'm trying to investigate a similar problem. I work at a public radio station which produces a large number of mp3 files, many of which play in several common players as if the last two or three seconds were cut off. (Both RealPlayer and iTunes are vulnerable to this, apparently.) In XMMS, I see that it also calculates the end time a bit short, but continues playing. Other utilities, such as eyeD3 and pymad, also estimate the length short. The files are all CBR, mostly 56K/44.1.

I've had mixed results in decoding and encoding the files in question (with lame; at this point I don't know what encoder is actually used to produce the original mp3s); sometimes I get rid of the problem and sometimes I do not.

Would anyone have any suggestions about how such a problem could be debugged?
