Tutorial: Add speech intros to your tracks

2009-10-13 18:13:10

Introduction:

I have an MP3 player without any sort of visual display to show artist/title information. I figured that adding an automated text-to-speech introduction at the beginning of each track would be a nice workaround. Here's how I went about doing it.

Required software (Windows):

MS-DOS command prompt (built into Windows)
ptts.exe (included with the Jampal media player) or DSpeech
SoX
Better SAPI voices (see Step 1)
An audio encoder if your desired output format (such as WMA) is not supported by SoX.

Step 1:

The default voices installed on Windows (Sam and maybe Mike and Mary) sound like crap, so you'll want to replace them with something better. I can recommend the UK Emily voice by RealSpeak/ScanSoft/Nuance. Get the 22 kHz version instead of the 16 kHz version if at all possible.

Step 2:

Use the 'echo' MS-DOS command to create a text file with the text you wish spoken. For example:

Code:

echo The Wall - Disc 1, by Pink Floyd. Track 6. Mother. > "text_file.txt"

Be careful that the text doesn't contain characters the command prompt treats specially, such as <, >, |, & and ^ (and % inside a batch file). These overlap heavily with the characters you're not allowed to use when (re-)naming a file or folder in Windows. Keep the other punctuation (periods, dashes, commas, etc.), though: it is needed so the computer voice doesn't run all the sentences together.
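If you generate these echo lines with a script instead of typing them, you can escape the special characters rather than deleting them. In cmd.exe a caret (^) makes the next character literal, and inside a batch file %% stands for a single %. Here is a minimal Python sketch; the helper names are made up for this example, it assumes delayed expansion is off (otherwise ! would need handling too), and it simply drops double quotes, because cmd stops treating ^ as an escape inside a quoted string.

```python
# Characters cmd.exe treats specially outside double quotes.
# (Assumes delayed expansion is off, so "!" needs no escaping.)
CMD_SPECIAL = "^&|<>()"


def escape_for_echo(text: str, in_batch_file: bool = True) -> str:
    """Escape text so that `echo <text>` prints it literally in cmd.exe."""
    out = []
    for ch in text:
        if ch == '"':
            # A quote would switch cmd into quoted mode, where ^ is no longer
            # an escape, so drop quotes instead (the voice doesn't need them).
            continue
        if ch in CMD_SPECIAL:
            out.append("^" + ch)
        elif ch == "%" and in_batch_file:
            # In a batch file, %% is read as one literal percent sign.
            out.append("%%")
        else:
            out.append(ch)
    return "".join(out)


def echo_command(text: str, text_file: str, in_batch_file: bool = True) -> str:
    """Build a Step 2 style echo line with the spoken text escaped."""
    # The space before ">" stops a trailing digit (e.g. "Track 2") from being
    # read as a handle redirection such as "2>".
    return f'echo {escape_for_echo(text, in_batch_file)} > "{text_file}"'


print(echo_command("Tom & Jerry (Live) - 100%", "text_file.txt"))
# prints: echo Tom ^& Jerry ^(Live^) - 100%% > "text_file.txt"
```

Pass in_batch_file=False if you paste the line straight into the command prompt, where %% is not collapsed to %.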

Step 3:

Use ptts to convert the written text in the text file into an audio file with spoken text. For example:

Code:

"ptts.exe" -w "spoken_audio.wav" -v 67 < "text_file.txt"

Use the "-v" parameter to adjust the volume of the spoken text; in this example it is set to 67%.
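For scripting, the same call can be made from Python with subprocess, feeding the text file on standard input just as the < redirection does. A minimal sketch: it uses only the -w and -v switches shown above, the function names are hypothetical, and the 0 to 100 range check is an assumption based on the volume being a percentage.

```python
import subprocess


def ptts_args(wav_path: str, volume: int = 67, exe: str = "ptts.exe") -> list[str]:
    """Arguments equivalent to: "ptts.exe" -w "spoken_audio.wav" -v 67"""
    # Assumption: -v takes a percentage, so anything outside 0-100 is rejected.
    if not 0 <= volume <= 100:
        raise ValueError("volume must be a percentage from 0 to 100")
    return [exe, "-w", wav_path, "-v", str(volume)]


def speak_to_wav(text_path: str, wav_path: str, volume: int = 67,
                 exe: str = "ptts.exe") -> None:
    """Run ptts with the text file on stdin, like '< "text_file.txt"'."""
    with open(text_path, "rb") as text_in:
        # check=True raises CalledProcessError if ptts exits with an error code.
        subprocess.run(ptts_args(wav_path, volume, exe), stdin=text_in, check=True)
```

Usage: speak_to_wav("text_file.txt", "spoken_audio.wav") does the same job as the command line above.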

Alternatively, you can use DSpeech. DSpeech has a nice point-and-click GUI, but is slower when executed from the command line. Note that if you use DSpeech, you might have to fiddle with its format settings, as it doesn't output 44.1 kHz, 16-bit stereo by default, and the spoken audio should match the music file it will be mixed with in Step 4.
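Instead of changing DSpeech's settings, you could also let SoX convert its output afterwards. A minimal sketch, assuming your music is 44.1 kHz, 16-bit stereo: SoX's -r, -b and -c options, placed before the output file name, set the output's sample rate, bit depth and channel count. The helper names and the dspeech.wav file name are hypothetical.

```python
import subprocess


def sox_match_format_args(in_wav: str, out_wav: str, rate: int = 44100,
                          bits: int = 16, channels: int = 2,
                          exe: str = "sox.exe") -> list[str]:
    """Arguments for: sox in.wav -r 44100 -b 16 -c 2 out.wav"""
    # Format options placed before out_wav apply to the output file, so SoX
    # resamples the speech and converts mono to stereo as needed.
    return [exe, in_wav, "-r", str(rate), "-b", str(bits),
            "-c", str(channels), out_wav]


def match_format(in_wav: str, out_wav: str) -> None:
    subprocess.run(sox_match_format_args(in_wav, out_wav), check=True)
```

Usage: match_format("dspeech.wav", "spoken_audio.wav"), then continue with Step 4.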

Step 4:

Use SoX to mix the spoken audio with the music audio. For example:

Code:

"sox.exe" -m -v 1 "spoken_audio.wav" -v 1 "music_audio.wav" "mixed_audio.wav"

The two '-v' parameters set the volume of each input file. Make sure both are set to 1; if you leave them out, SoX reduces each input to one half of its volume before mixing (1/n of its volume for n inputs). The result may be a small amount of clipping where the voice overlaps the music, roughly the first 2% of a typical track, but this is more desirable than halving the volume of the other 98%.
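To see the trade-off, here is a toy model of the mix in Python using plain lists of 16-bit sample values (not real audio, and not SoX's actual code): each output sample is the sum of the gain-scaled inputs, clamped to the 16-bit range. With the halved default nothing clips but the music after the intro is halved too; with both gains at 1 the music keeps its level and only the overlapping sample clips.

```python
from typing import Optional

INT16_MIN, INT16_MAX = -32768, 32767


def mix(voice: list[int], music: list[int],
        gains: Optional[tuple[float, float]] = None) -> tuple[list[int], int]:
    """Mix two 16-bit sample lists; return (mixed samples, clipped count).

    gains=None models the default described above: each of the two inputs is
    scaled by 1/2. gains=(1, 1) models "-v 1 ... -v 1".
    """
    if gains is None:
        gains = (0.5, 0.5)
    g_voice, g_music = gains
    length = max(len(voice), len(music))
    # The intro is shorter than the song: pad both with silence to equal length.
    voice = voice + [0] * (length - len(voice))
    music = music + [0] * (length - len(music))
    mixed, clipped = [], 0
    for v, m in zip(voice, music):
        s = round(v * g_voice + m * g_music)
        if not INT16_MIN <= s <= INT16_MAX:
            clipped += 1
            s = max(INT16_MIN, min(INT16_MAX, s))
        mixed.append(s)
    return mixed, clipped


voice = [20000, 20000]                   # short spoken intro
music = [20000, 10000, 10000, 10000]     # song underneath
print(mix(voice, music))                 # ([20000, 15000, 5000, 5000], 0)
print(mix(voice, music, gains=(1, 1)))   # ([32767, 30000, 10000, 10000], 1)
```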

Step 5:

If SoX can't output to the format you want, use a separate encoder to encode the file produced in Step 4. You could use the aoTuV encoder if you want better Ogg Vorbis output, or Windows Media Encoder to encode WMA files. The Windows Media Encoder SDK includes a command-line script called 'wmcmd.vbs' for encoding from the command line.
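If you go the aoTuV route, it is usually distributed as a build of the standard oggenc tool, so its usual switches should apply: -q sets the quality level and -o names the output file. A minimal Python sketch; the executable name oggenc2.exe is an assumption (use whatever your download is called), and quality 5 is just an example value.

```python
import subprocess


def oggenc_args(in_wav: str, out_ogg: str, quality: float = 5,
                exe: str = "oggenc2.exe") -> list[str]:
    """Arguments for: oggenc2.exe -q 5 -o out.ogg in.wav"""
    return [exe, "-q", str(quality), "-o", out_ogg, in_wav]


def encode_ogg(in_wav: str, out_ogg: str, quality: float = 5) -> None:
    # check=True raises CalledProcessError if the encoder reports a failure.
    subprocess.run(oggenc_args(in_wav, out_ogg, quality), check=True)
```

Usage: encode_ogg("mixed_audio.wav", "mixed_audio.ogg").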

Conclusion:

I hope that helps! It has certainly helped me figure out who exactly made the music I'm listening to...

-Mike

Reply #1 – 2009-10-14 17:32:57

Any particular reason for using echo to create text files? You could automate the whole text file creation process using the Text Tools component with Foobar (assuming your music is tagged correctly), e.g. following your text pattern:

Code:

%album%[' - Disc '%disc%]', by '%artist%'. Track '%tracknumber%'. '%title%'.'$crlf()

(square brackets make the disc part conditional, i.e. if there's no disc number tag it will be omitted. $crlf is carriage return, line feed. For more details see http://wiki.hydrogenaudio.org/index.php?ti...ng_Introduction, http://wiki.hydrogenaudio.org/index.php?ti...rmat_Reference)

With that you would get the same text, plus a blank line between tracks, which can be used with the -m switch on ptts to make a separate wave file for each track from a single text file.
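Without foobar2000 at hand, the same pattern can be reproduced in Python from a list of tag dictionaries. A sketch that mirrors the format string above: the bracketed disc part is dropped when there is no disc tag, a missing field prints "?" as foobar2000 does, and each entry is followed by a blank line. The tag dictionaries are hypothetical input; reading real tags would need a separate tagging library.

```python
from collections.abc import Iterable

PATTERN = "{album}{disc}, by {artist}. Track {tracknumber}. {title}."


def intro_line(tags: dict[str, str]) -> str:
    """Mimic %album%[' - Disc '%disc%]', by '%artist%'. Track '%tracknumber%'. '%title%'."""
    def field(name: str) -> str:
        return tags.get(name) or "?"  # foobar2000 shows "?" for missing fields

    disc = tags.get("disc")
    return PATTERN.format(
        album=field("album"),
        disc=f" - Disc {disc}" if disc else "",  # the [...] conditional part
        artist=field("artist"),
        tracknumber=field("tracknumber"),
        title=field("title"),
    )


def intro_file_text(tracks: Iterable[dict[str, str]]) -> str:
    """One intro per track, each followed by a blank line (CRLF line endings)."""
    return "".join(intro_line(tags) + "\r\n\r\n" for tags in tracks)
```

When saving the result, open the file with newline="" so Python doesn't turn each "\r\n" into "\r\r\n" on Windows.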

Reply #2 – 2009-10-14 19:00:34

Quote from: CamelD on 2009-10-13 18:13:10

I hope that helps! It has certainly helped me figure out who exactly made the music I'm listening to...

Good tutorial and interesting idea.
Thanks

[edit] ...and I like Karen more than Emily, but that's another story

Reply #3 – 2009-10-14 20:16:47

Thanks! Haven't tried Karen yet.

Reply #4 – 2009-10-17 07:55:58

Quote from: Zilog Jones on 2009-10-14 17:32:57

Any particular reason for using echo to create text files? You could automate the whole text file creation process using the Text Tools component with Foobar (assuming your music is tagged correctly), e.g. following your text pattern:

I don't know what Foobar is!

Thanks for the tip. I'll have to try it out.

Reply #5 – 2009-10-22 06:30:16

I have run into a problem with this (my own!) tutorial. The commands don't work if the filenames contain non-English characters, at least not when the commands are executed from an MS-DOS batch file. They do work if I copy and paste them directly into the command prompt.

The issue is discussed here:

http://blogs.msdn.com/michkap/archive/2006/03/06/544251.aspx

However, a resolution to the problem was never really arrived at as far as I can tell. Does anyone here have any tips?
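One workaround that avoids the code page question entirely (it doesn't fix the batch-file problem itself): copy each track to a temporary file with a plain ASCII name, run ptts/SoX on that, then move the result back to its real name. A minimal Python sketch; `process` is a hypothetical callback standing for whatever commands you run on the temporary files, and `work_dir` lets you pick an ASCII-only folder in case the default temp folder's own path contains non-ASCII characters. Python handles the Unicode names itself, so only the external tools see the ASCII ones.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_with_ascii_names(src: str, dst: str,
                         process: Callable[[Path, Path], None],
                         work_dir: Optional[str] = None) -> Path:
    """Run process(tmp_in, tmp_out) on ASCII-named copies, then move the
    output to dst, which may contain any characters."""
    src_path, dst_path = Path(src), Path(dst)
    with tempfile.TemporaryDirectory(dir=work_dir) as tmp:
        # Keep the extensions so tools like SoX can still detect the format.
        tmp_in = Path(tmp) / ("input" + src_path.suffix)
        tmp_out = Path(tmp) / ("output" + dst_path.suffix)
        shutil.copyfile(src_path, tmp_in)
        process(tmp_in, tmp_out)  # the external tools only see ASCII names
        shutil.move(str(tmp_out), str(dst_path))
    return dst_path
```

For Step 4, `process` could run sox.exe -m -v 1 spoken_audio.wav -v 1 followed by the two temporary paths it receives.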

Mike
