Tutorial: Add speech intros to your tracks
2009-10-13 18:13:10
Introduction: I have an MP3 player without any sort of visual display to show artist/title information. I figured that adding an automated text-to-speech introduction at the beginning of each track would be a nice workaround. Here's how I went about doing it.

Required software (Windows):
- MS-DOS command prompt (built into Windows)
- ptts.exe (included with Jampal media player) or DSpeech (homepage)
- SoX (homepage)
- Better SAPI voices (several are listed here)
- An audio encoder, if your desired output format (such as WMA) is not supported by SoX

Step 1: The default voices installed on Windows (Sam and maybe Mike and Mary) sound like crap, so you'll want to replace them with something better. I can recommend the UK Emily voice by RealSpeak/ScanSoft/Nuance. Get the 22 kHz version instead of the 16 kHz version if at all possible.

Step 2: Use the 'echo' command to create a text file containing the text you wish spoken. For example:

    echo The Wall - Disc 1, by Pink Floyd. Track 6. Mother. > "text_file.txt"

Be careful that the text doesn't contain reserved characters that the command prompt interprets itself (such as <, >, | and &). Many of these are the same characters you're not allowed to use when (re-)naming a file or folder in Windows. The other punctuation (periods, dashes, commas, etc.) is necessary so that the computer voice doesn't produce a bunch of run-on-sounding sentences.

Step 3: Use ptts to convert the written text in the text file into an audio file with spoken text. For example:

    "ptts.exe" -w "spoken_audio.wav" -v 67 < "text_file.txt"

Use the "-v" parameter to adjust the volume of the spoken text; in this case, 67%. Alternatively, you can use DSpeech. DSpeech has a nice point-and-click GUI, but is slower when executed from the command line. Note that if you use DSpeech, you might have to fiddle with the format settings, as it doesn't output 44 kHz 16-bit stereo by default.

Step 4: Use SoX to mix the spoken audio with the music audio.
For example:

    "sox.exe" -m -v 1 "spoken_audio.wav" -v 1 "music_audio.wav" "mixed_audio.wav"

There are two '-v' parameters, one before each input file, and each adjusts the volume of the file that follows it. Make sure they are both set to 1; otherwise SoX will reduce the volume of each input file to one half before mixing. The result will be a small amount of clipping in the first 2% of the track, but this is more desirable than reducing the volume of the other 98% by half.

Step 5: If SoX can't output to the format you want, use a separate encoder to encode the file produced in Step 4. You could use the aoTuV encoder (available here or here) if you want better Ogg Vorbis output, or the Windows Media Encoder to encode WMA files. The WME SDK includes a command-line batch script called 'wmcmd.vbs' that is described here.

Conclusion: I hope that helped! It has certainly helped me figure out who exactly made the music I'm listening to...

-Mike
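
If you have a whole album to do, typing Steps 2 through 4 by hand for every track gets old fast. Below is a rough Python sketch of how those steps might be scripted. It is not part of the original workflow: the function names (intro_text, ptts_command, sox_mix_command, add_intro) are made up for illustration, it assumes ptts.exe and sox.exe are on your PATH, and it feeds the text straight to ptts on standard input instead of going through 'echo' and a text file. The only ptts and SoX options used are the ones shown in Steps 3 and 4.

```python
import subprocess

# Characters the command prompt treats specially. Removing them is only
# strictly needed when the text goes through "echo", but it also keeps the
# spoken text free of symbols the voice might read aloud.
RESERVED = '<>|&^%"'


def intro_text(album, artist, track_no, title):
    """Build the sentence to be spoken, shaped like the Step 2 example."""
    text = f"{album}, by {artist}. Track {track_no}. {title}."
    return "".join(ch for ch in text if ch not in RESERVED)


def ptts_command(wav_out, volume=67):
    """Step 3: ptts reads text from standard input and writes a WAV file."""
    return ["ptts.exe", "-w", wav_out, "-v", str(volume)]


def sox_mix_command(speech_wav, music_wav, mixed_wav):
    """Step 4: mix both inputs at full volume (-v 1 before each input)."""
    return ["sox.exe", "-m", "-v", "1", speech_wav,
            "-v", "1", music_wav, mixed_wav]


def add_intro(album, artist, track_no, title, music_wav, mixed_wav,
              speech_wav="spoken_audio.wav"):
    """Run Steps 2 to 4 for one track (assumes both tools are on the PATH)."""
    text = intro_text(album, artist, track_no, title)
    # Pipe the text to ptts in place of: ptts.exe ... < "text_file.txt"
    subprocess.run(ptts_command(speech_wav), input=text, text=True, check=True)
    subprocess.run(sox_mix_command(speech_wav, music_wav, mixed_wav), check=True)
```

Calling add_intro("The Wall - Disc 1", "Pink Floyd", 6, "Mother", "music_audio.wav", "mixed_audio.wav") should do the same thing as the commands in Steps 2 through 4, and you could call it in a loop over your track list. The same caveat from Step 3 applies: the spoken audio and the music need to be in matching formats before SoX will mix them.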