Lame setting suggestion...
Reply #2 – 2003-03-26 13:41:59
For maximum quality you can use lame --alt-preset standard (or above), just as you would for music at transparent quality. It will encode everything the ear perceives as well as the MP3 format allows.

(Aside from encoding, you obviously need to capture the sound digitally with care to preserve quality from analogue sources: preferably use Dolby NR where it was used on the source, take care to avoid distortion and clipping, and turn off other analogue inputs if they add noise. I've found that the digital noise reduction in EAC's WAV editor is also very good.)

By the same token, MusePack (.MPC) encoding with mppenc --quality 5 --xlevel (or above) will also achieve maximum quality and transparency at a smaller file size, but obviously it's not MP3 and it isn't compatible with hardware players.

However, if the speech is relatively easy to encode, especially if it's almost mono, you will probably find that --alt-preset standard produces very few MP3 frames above 128 kbps, and only uses more for the few tricky frames (e.g. applause). This is because --alt-preset standard uses 128 kbps as a minimum (except for digital silence, which is 32 kbps) to overcome problems with a few tricky music samples. Lowering the minimum bitrate below 128 kbps saves very little file size on stereo music (partly, I guess, because spare space in 128 kbps frames can be used for the bit reservoir), so there was little downside to this precaution.

If file size isn't important, don't worry. If it is, either use MusePack, or try permitting lower bitrates by modifying the --alt-preset command line. You could try using:

lame --alt-preset standard -b 32

The psymodel ought to still provide enough bits to encode everything audible without wasting too many when it could dip below 128 kbps. For music, this isn't as well tested as the unmodified --alt-preset standard and doesn't save many bits at all, so it's not usually worth risking, but for speech it may be OK.
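If you have a batch of speech recordings to encode, the command line above could be assembled from a small script. This is only a rough sketch: the function name build_lame_command and the file names speech.wav and speech.mp3 are made up for illustration, and it assumes lame is on your PATH and accepts the usual "lame [options] infile outfile" form.

```python
import shlex


def build_lame_command(infile, outfile, min_bitrate=None):
    """Return the argument list for a LAME --alt-preset standard encode.

    min_bitrate (kbps) is optional; when given it is passed with -b,
    which lowers the minimum frame bitrate the preset may use.
    """
    args = ["lame", "--alt-preset", "standard"]
    if min_bitrate is not None:
        args += ["-b", str(min_bitrate)]
    args += [infile, outfile]
    return args


# Hypothetical file names, for illustration only.
cmd = build_lame_command("speech.wav", "speech.mp3", min_bitrate=32)
print(shlex.join(cmd))
```

To actually run the encode you could hand the list to subprocess.run(cmd, check=True). Leaving min_bitrate as None gives the unmodified preset, which is still the safer choice for music.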
In a test I just tried on one stereo music file, the average bitrate only went down from 177.6 kbps to 176.7 kbps by enabling -b 32, and most of the very few frames below 96 kbps came at the end of the file, during and after the fadeout. It was a tape recording (the track isn't on the CD version of the album) captured with Dolby B at 44.1 kHz, followed by 24 dB of noise reduction in EAC's WAV editor based on a noise profile taken from the tape lead-in. It sounds really good from a casual listen, but why take the risk to save just 0.9 kbps over the unmodified preset?

Encoding as 44.1 kHz VBR(q=2) j-stereo MPEG-1 Layer III (ca. 7.4x) qval=2
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA
  8675/8678  (100%)|    2:47/    2:47|    2:47/    2:47|   1.3541x|    0:00
 32 [   1] *
 40 [   0]
 48 [  12] %
 56 [  23] %
 64 [  29] %
 80 [  59] %%
 96 [  78] %%
112 [ 204] %%%%*
128 [ 661] %%%%%%%%%%%%**
160 [3280] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%**
192 [3171] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
224 [ 819] %%%%%%%%%%%%%%%%%
256 [ 255] %%%%%%
320 [  86] %%
average: 176.7 kbps   LR: 8390 (96.68%)   MS: 288 (3.319%)
Writing LAME Tag...done

As an aside, for lower quality (e.g. converting to lower sampling rates) to produce speech for the web in tiny downloads, I've found Ogg Vorbis is excellent (even though it's not tuned for my switches) and produces less audible and tiring artifacts and Dalek noises than most other codecs, including voice codecs like Speex that I tried. (For some uses, like VoIP, the low latency of Speex is a key advantage, but not for prerecorded speeches on the web.) See my signature for more details.
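Going back to the LAME histogram above, the numbers can be sanity-checked with a few lines of Python. This is just a sketch: the frame counts are copied from that output, the 1152 samples per frame figure is the standard MPEG-1 Layer III frame size, and the variable names are my own.

```python
# Frame counts per bitrate (kbps), copied from the LAME histogram above.
histogram = {
    32: 1, 40: 0, 48: 12, 56: 23, 64: 29, 80: 59, 96: 78, 112: 204,
    128: 661, 160: 3280, 192: 3171, 224: 819, 256: 255, 320: 86,
}

total_frames = sum(histogram.values())

# Every frame covers the same playing time, so the average bitrate is
# simply the frame-weighted mean of the per-frame bitrates.
average_kbps = sum(rate * n for rate, n in histogram.items()) / total_frames

# Frames that only became possible because -b 32 lowered the 128 kbps floor.
below_128 = sum(n for rate, n in histogram.items() if rate < 128)

# MPEG-1 Layer III frames hold 1152 samples each; the file is 44.1 kHz.
duration_s = total_frames * 1152 / 44100

# Size difference between the two reported averages (177.6 vs 176.7 kbps),
# in kilobytes of 1000 bytes.
saved_kbytes = (177.6 - 176.7) * duration_s / 8

print(f"frames: {total_frames}, average: {average_kbps:.1f} kbps")
print(f"frames below 128 kbps: {below_128} ({100 * below_128 / total_frames:.1f}%)")
print(f"duration: {duration_s:.0f} s, saved: about {saved_kbytes:.1f} kB")
```

The weighted mean reproduces the 176.7 kbps average over 8678 frames that LAME reports. Only 406 frames (under 5%) fall below 128 kbps, and the 0.9 kbps difference comes to roughly 25 kB on a track of under four minutes, which is why the modified preset hardly seems worth the risk for music.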