
foo_tts - Yet Another TextToSpeech plugin

I started developing this back in summer, before discovering foo_talktome by aganders3 - so you can see I'm not the fastest coder on the block! It was originally just to help a blind friend, but I've since figured out how to use foo_quicktag coupled with the excellent EventGhost and a cheapo IR remote control, so now I use it myself too.

As with foo_talktome, foo_tts can be set to speak details of each track when it starts playing, but there's also a menu option to read out details of current track at any time. Totally useless if you don't have a remote or a wireless keyboard - but if you do, it's great when you've got a large music library, can't remember everything in it, and don't want to go look at the screen.

You need a SAPI-compliant speech engine. Various "lo-fi" freebies come as standard with Vista or are freely available for XP, but I quite like the Scansoft "Daniel" voice.

Three configurable "title-format" strings define what should be read out; they are cycled through if you request repeated readouts while the same track is still playing. Some TTS voices aren't very clear, so if you don't catch what was said the first time, you may get it the second or third time when it's said a different way.

ALSO - the default format strings use tags called TTS_xxxxx in preference to xxxxx, if they are present. For example, if you're using a British voice and you define TTS_ARTIST = zee zee top on all ZZ Top tracks, the name will be pronounced correctly.
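As a rough illustration of that fallback (a sketch only, not the plugin's actual code; the function name is made up and the case-insensitive matching is an assumption):

```python
def spoken_value(tags, field):
    """Return the TTS_ override for a field if one is set, else the plain tag."""
    # Field names are matched case-insensitively (an assumption of this sketch).
    upper = {name.upper(): value for name, value in tags.items()}
    override = upper.get("TTS_" + field.upper())
    if override:
        return override
    return upper.get(field.upper(), "")


track = {"ARTIST": "ZZ Top", "TTS_ARTIST": "zee zee top", "TITLE": "La Grange"}
print(spoken_value(track, "artist"))  # zee zee top
print(spoken_value(track, "title"))   # La Grange
```

In a title-format string the same idea would presumably be written as something like $if2(%tts_artist%,%artist%); that is a guess at the shape of the default strings, not a quote from them.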

Anyway, here's the link, which I intend to keep up-to-date. Just download into your ...\(foobar2000)\components folder and you're away! I'd be grateful for any feedback.
foo_tts.dll

And here's a short text file setting out how I installed and configured foobar, the plugins I use, and EventGhost. I only really wrote it to remind myself in case I ever need to start from scratch again - but some steps took me a while to figure out, so they may help others.
foobar installation notes
my MS MediaCentre Remote.xml configuration file

Change Log:
1.0.3  Supports confirmation / audio feedback when setting star ratings.
1.0.4  Cycle back to first of multiple configured format-strings (don't just stop at the last one).

Reply #1
Thanks for another TTS component.

I like that you can have different format strings and that you can correct pronunciations with tags; at the moment I have to use my TTS engine's dictionary.

The volume when it's in DJ mode is very loud compared to when forcing it to speak.
After the first song it still says the second, more detailed title format even when it's a new song.

Also, when choosing level to reduce the playback volume it's not saved until clicking on Apply, which doesn't get activated until you change one of the title formats.

Thanks again for making this component, always good with alternatives!

Edit:
It doesn't seem to remember the DJ setting after restarting foobar2000.
I can also recommend NeoSpeech's voices, very clear sound.

Reply #2
Quote
The volume when it's in DJ mode is very loud compared to when forcing it to speak.
After the first song it still says the second, more detailed title format even when it's a new song.
Also, when choosing level to reduce the playback volume it's not saved until clicking on Apply, which doesn't get activated until you change one of the title formats.
It doesn't seem to remember the DJ setting after restarting foobar2000.

Thanks for the feedback. Hopefully I can quickly deal with the points you raised.
I added the DJ mode at the last minute and don't really use it myself, but I should have checked it better, sorry.
Regarding volume: foobar has its own slider governing the music channel, but TTS goes direct to the system audio device. So I need to adjust the speech volume to match foobar's internal setting. The two use different scaling, so I need to work out a conversion function. On my system I always have foobar's volume at max anyway, so it wasn't something that concerned me; the relative volumes are bound to be correct then. Anyway, it's obviously fixable.
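One possible shape for such a conversion, as a sketch only. It assumes the player's volume is a gain in dB (0 = full volume) and that the speech engine takes a 0-100 value that behaves roughly like linear amplitude; neither assumption is confirmed in this thread, and trim_db is a hypothetical fine-tuning offset.

```python
def db_to_speech_volume(gain_db, trim_db=0.0):
    """Map a playback gain in dB (0 = full volume) to a 0-100 speech volume."""
    # trim_db is a hypothetical offset, e.g. -3.0 to make speech a little quieter.
    total_db = min(gain_db + trim_db, 0.0)
    amplitude = 10 ** (total_db / 20)  # standard dB-to-amplitude conversion
    return max(0, min(100, round(amplitude * 100)))


for db in (0, -6, -20, -60):
    print(db, db_to_speech_volume(db))
```

If the engine's 0-100 scale turns out not to be linear in amplitude, only the one conversion line would need to change.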

Reply #3
Quote
After the first song it still says the second, more detailed title format even when it's a new song.

I don't understand that. DJ-style speech was tied to one of the "on demand" config strings - which particular one of the three being specified in 'advanced config'. I've now scrapped that advanced config and added a format string just for DJ-style.

Hopefully the other issues you raised have been dealt with (link at start of thread is updated).
Speech volume should now match music playback volume in all contexts, but there's a fine-tuning config in case it's always a bit too loud.
DJ-mode setting is now remembered between sessions.


Reply #4
Now there's just a little that doesn't quite work as it should, IMO.

When DJ mode is on it won't lower the volume when announcing the next track.

The player seems to freeze until it's done announcing, which is a bit irritating.

Also, bigger text boxes would be good to have.
I rearranged it a bit to show how it could look: http://i.imgur.com/f5wbf.png

Thank you very much for the update.

Reply #5
I don't think it's practical to "unfreeze" foobar's user interface while tts engine is speaking. I'd probably have to learn all about creating worker threads to do it at all - which is probably do-able, but I can't really see the point.

I think it would be very difficult to interrupt a SAPI tts speech act once it's started, so even if you could reduce the foobar volume during speech, this would only affect any foobar audio still playing in the background. Speech wouldn't change, because it's going direct to System audio output channel (which of course you could change using controls on amplifier/speaker, or the Windows "Volume" icon). Besides, foo_tts reduces foobar playback volume while speaking - when it re-instates the original setting afterwards, you'd lose your change anyway.

I went to some trouble arranging for speech to be about the same volume as audio output, which seems to me to be how you'd want things to work. The plugin is pretty pointless without a remote control, which by default would have Volume Up/Down keys connected to the Windows volume control, not foobar's. I'd make the same connection to appropriate keys if I were using a wireless keyboard rather than a dedicated remote.

In short, unfreezing the UI looks like a lot of work for a questionable benefit.


Regarding multi-line input boxes - again I don't see much point. As I understand it, some users create quite complex strings which could be thousands of characters long, so there's no obvious size of box to make it all visible at once. I'm not aware of any practical limit on actual length of input to my existing boxes. Personally I just use "cut&paste" from a temporary NotePad window if I need to see it all. Microsoft only provide short single-line boxes for most "potentially very large" configurable inputs within their packages, and that's ok by me.

The only other point I noticed in your mock-up display is the '?' "Floating Help" icon in top right. The basic Preferences window is actually implemented within SDK, which doesn't appear to support this in any obvious way. But I notice one or two other plugins implement a standard button marked "Help" within the display, connected to an html document supplied with the plugin dll. I will look into this.

Please accept my apologies if this seems a rather negative response. I really do appreciate your feedback.

Reply #6
I don't know how the author of foo_talktome did it, but his component doesn't block the foobar2000 window while it speaks.

The mock-up is real; I changed the dialog with a resource editor.
One of the disadvantages of a single-line text box is that you can't paste more than one line.
It's also easier to get an overview with a bigger area; it's easy to get lost in a small one.

You should have a ?, it's connected to an ID of each component.
This one goes to: http://wiki.hydrogenaudio.org/index.php?ti...DB-1A171B931A06

I use it without a remote, works perfect with just the keyboard and hotkeys.

Reply #7
Ok, well I'll certainly have a good long look at those points. And see if foo_talktome author aganders3 can enlighten me re keeping UI responsive during speech.

btw - your link to "? connected to an ID of each component..." doesn't work

Reply #8
That would be great.

Did you get the following when you clicked the link?
"There is currently no text in this page."

That's because you haven't created/edited the page yet.

Reply #9
"There is currently no text in this page."

That is indeed what I got. But how does that help me connect a help screen to my preferences page?

Reply #10
If you want to use the official way then you should create a page on the wiki.

There should be some way to link to a specific site via that button.
Checked some components and foo_uie_wsh_panel_mod links to http://code.google.com/p/foo-wsh-panel-mod...ditorProperties

I kinda like what Yirkha did with his Dynamic Fields component, http://www.hydrogenaudio.org/forums/index....st&p=731893
The help-button toggles the help while it resizes a text box.

Reply #11
I've just noticed a "feature" of foobar that's incredibly useful (to me, at least) when using a remote control with the excellent free utility EventGhost...

I use foobar as a kind of "in-house radio", but the PC is often used for other things, so it's often left with some other app in the foreground (google, email, whatever). In which case foobar doesn't have the focus, so it doesn't receive the emulated keystrokes from EventGhost.

BUT - EventGhost can be configured to Launch [some specified] Application on any remote key. AND - if foobar is already running when you try to launch it again, all that happens is it becomes the foreground application if it wasn't already.

So now if my remote doesn't work because someone left the pc with some other app in the foreground, I can just press my Launch foobar button and try again!

Reply #12
How do I set foo_tts to use another SAPI-compliant speech engine?

Reply #13
AFAIK (I may be wrong) the TTS plugins use whatever voice you've set as the default voice in: Control Panel> Speech, e.g.:
Quote
"You have selected VW Kate as the computer's default voice."

At least that's the case for foo_talktome. And IIRC from that thread the developer mentioned that he had no control over that, i.e. speech engine selection is via / up to Windows (again I may be wrong).

C.

EDIT: Here's the quote from the foo_talktome thread. Perhaps foo_tts is different, don't know.
Quote
As far as other voices go, you can set the voice in the Windows speech control panel.

Reply #14
I'm afraid there's no support in foo_tts code for switching to a different voice. Depending which particular voice you're using, you may be able to insert various "meta commands" in the string passed to the speech generator (change pitch, volume, talking speed, etc.) but I'm not aware that any of them (incl those from MS itself) support changing the underlying voice. The only method I know is to use the Windows speech control panel as others have already indicated.

Which is annoying because sometimes you just can't make out odd words in one voice, but they might be intelligible in another. If it were possible I'd configure my setup to use different voices as it cycles through different ways of reading out track details.


Reply #16
Thanks very much for that, Andreasvb!  I'll get around to coding it in soon.

The time-consuming bit is the user interface. Most likely I'll just number available voices from 1 up. User can then configure each utterance format string to be spoken in an explicitly-numbered voice, or 0 for system default. It'll get jumbled when voices are installed/removed, but given the tiny number of people who've downloaded the plugin (mostly Japanese, actually!), I'm not too fussed about that.
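A small sketch of that numbering scheme, assuming the list of installed voice names is already available from somewhere (the names below are examples only, and the function is hypothetical):

```python
def resolve_voice(installed_voices, number):
    """Return the voice name for a configured number, or None for the system default."""
    if 1 <= number <= len(installed_voices):
        return installed_voices[number - 1]
    return None  # 0 or an out-of-range number falls back to the system default


voices = ["Daniel", "Kate", "Anna"]  # example names only
print(resolve_voice(voices, 2))
voices.remove("Daniel")              # uninstalling a voice shifts the numbering
print(resolve_voice(voices, 2))
```

The second call shows the "jumbling" mentioned above: the same configured number selects a different voice once one is removed. Storing voice names instead of numbers would avoid that, at the cost of a more involved UI.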

Reply #17
New version supports confirmation / audio feedback when setting star ratings via remote control...

Personally I store my rating values in %STARS%. I definitely want these values to be held in the mp3 files themselves rather than foobar's database, and I get confused about where foobar core and other plugins store or look for RATING. If you don't, read RATING for STARS below.

Also, I distinguish between tracks with no defined STAR value, and those where it's explicitly set to 0 - the latter being tracks I positively DISLIKE, so it's easy to avoid them.

On my system, keyboard shortcuts Ctrl 0-5 (and Ctrl X) are configured to set (and remove) the STARS tag. The same key combinations plus Shift are configured for the corresponding actions within new foo_tts menu commands.

foo_tts assumes you might not want to change a star rating that's already explicitly set, so it asks for the keypress to be repeated for confirmation in that case. Otherwise it just accepts the change.

If a change is accepted, foo_tts sends the unshifted version of the key combination to the main menu processor (to be processed in the normal way) and tells you what it's done.
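The confirm-on-repeat behaviour could be modelled roughly as follows. This is a sketch of the logic as described, not the plugin's code; the class, method names and spoken messages are invented for illustration.

```python
class RatingConfirmer:
    """Decide whether a rating keypress applies at once or needs repeating."""

    def __init__(self):
        self.pending = None  # (track_id, new_stars) waiting for a repeat press

    def request(self, track_id, current_stars, new_stars):
        """current_stars is None when the track has no STARS tag (0 is a real rating)."""
        key = (track_id, new_stars)
        if current_stars is None or self.pending == key:
            self.pending = None
            return True, f"Stars set to {new_stars}"
        self.pending = key
        return False, f"Already rated {current_stars}. Press again to change to {new_stars}"


confirmer = RatingConfirmer()
print(confirmer.request("track-1", None, 4))  # unrated: accepted straight away
print(confirmer.request("track-1", 0, 3))     # explicit 0: asks for confirmation
print(confirmer.request("track-1", 0, 3))     # same key again: accepted
```

Note that an explicit rating of 0 counts as "already set", matching the distinction between unrated and disliked tracks.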

Reply #18
New version 1.0.4 correctly cycles back to the first "Speak TrackInfo NOW" format string if you keep asking for TrackInfo to be spoken after reaching the last configured format string.
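The wrap-around can be sketched like this. It is a guess at the logic rather than the real implementation; the class name and the example format strings are assumptions.

```python
class FormatCycler:
    """Step through the configured format strings, wrapping back to the first."""

    def __init__(self, formats):
        self.formats = [f for f in formats if f]  # ignore unconfigured (empty) slots
        self.track_id = None
        self.index = 0

    def next_format(self, track_id):
        if not self.formats:
            return ""
        if track_id != self.track_id:  # a new track starts again from the first string
            self.track_id = track_id
            self.index = 0
        fmt = self.formats[self.index % len(self.formats)]
        self.index += 1
        return fmt


cycler = FormatCycler(["%title%", "%title% by %artist%", "%artist%, %album%, %title%"])
for _ in range(4):
    print(cycler.next_format("track-1"))
```

The fourth request prints the first string again instead of stopping at the last one.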

Also included a short text file detailing the steps I go through to install/configure foobar, the plugins I use, and EventGhost. I use a "Portable Installation", so if I set it up for someone else I just transfer all files using a memory stick. But my stick failed when I was away from home recently, and it took quite a while to figure out all the little things I'd done over the years, so I'm keeping the details here in case I need them again. Let me know if they're helpful to you.

Reply #19
Norton Internet Security 2011 reported a threat. I wonder if this is safe? And how does it compare with the other one here?

Reply #20
I don't use Norton - just MS Security Essentials these days. But I seriously doubt I have undetected malware on my development machine that adds malicious code to DLLs that I compile. It's probably just a meaningless fluke caused by Norton's pattern-matching heuristics. Because there are hardly any copies of my DLL out there, they wouldn't have bothered to specifically avoid fingering it as a false positive. Of course, if I were a professional malware writer myself, maybe that's exactly what I would say.

As to differences between my plugin and the one from aganders3, they mostly come down to the fact that mine is specifically intended to be useful for people using a remote control (originally, blind people, until I realised it's good for anyone who doesn't want to keep going over to the screen/keyboard). His only supports "speak trackinfo before every track" (or disable it completely until you re-enable).

Mine supports "Speak trackinfo on request", with alternative details on successive requests if you want. Useful if you've got a remote when friends are round - anyone can just press the button to hear what's playing, without interrupting the conversation. Plus I have "Set track ratings with audible feedback, and request for confirmation if relevant". Mine does support "DJ-style" speech as per aganders3's, because it was easy to include. But I've no use for it myself; I'd find it intrusive, to say the least.

When I get around to it, mine will also support "Add to playback queue another track by the current artist" where the selected track could be (a) "[one of] the highest-rated tracks by that artist in my library", or (b) "the least-recently played track by the artist, of those which I haven't assigned a star rating". I'll use (a) if I have friends round and someone says "Ah! I love this band! Have you got any more?". I'll use (b) when I'm looking for undiscovered gems in my library which has frankly gotten so big I don't have time to pay close attention to everything (besides which, sometimes I like a track later in a different context, even though I didn't particularly notice and rate it when I first heard it).
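The two selection rules might look something like this. It is purely a sketch of the planned feature under assumed data: the Track fields and function names are hypothetical, and a real version would read ratings and play times from the library.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    title: str
    artist: str
    stars: Optional[int]  # None means no star rating has been assigned
    last_played: float    # timestamp of last play; 0.0 if never played


def best_rated(library, artist, rng=random):
    """Rule (a): pick one of the highest-rated tracks by the artist."""
    rated = [t for t in library if t.artist == artist and t.stars is not None]
    if not rated:
        return None
    top = max(t.stars for t in rated)
    return rng.choice([t for t in rated if t.stars == top])


def least_recently_played_unrated(library, artist):
    """Rule (b): pick the unrated track by the artist that was played longest ago."""
    unrated = [t for t in library if t.artist == artist and t.stars is None]
    return min(unrated, key=lambda t: t.last_played, default=None)


library = [
    Track("Song A", "Band X", 5, 1000.0),
    Track("Song B", "Band X", 3, 500.0),
    Track("Song C", "Band X", None, 700.0),
    Track("Song D", "Band X", None, 200.0),
]
print(best_rated(library, "Band X").title)
print(least_recently_played_unrated(library, "Band X").title)
```

A real version would probably also skip the track that is currently playing.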

Reply #21
Thanks for the informative reply. That certainly cleared up my doubts. I guess NIS reported the file as infected simply because so few people use it. I'd like to ask one question though: how can I get Asian song titles, such as Chinese or Japanese ones, pronounced properly? At the moment the voice just speaks gibberish whenever I play those songs.

Reply #22
For interrupting the speech and also not blocking the UI while the voice is speaking, you can do it all with the Speak method.
Here are some relevant quotes:
Quote
The Speak method can be called synchronously or asynchronously. When called synchronously, the method does not return until the text has been spoken; when called asynchronously, it returns immediately, and the voice speaks as a background process.

If you want to interrupt another chunk of speech you can send an empty string and specify SVSFPurgeBeforeSpeak in the SpeechVoiceSpeakFlags. So in the end you want SVSFlagsAsync|SVSFPurgeBeforeSpeak.
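To make the flag combination concrete, here is a small sketch. The two constants carry the values of the SAPI automation flags named above; the helper functions are made up for illustration, and FakeVoice is a stand-in that only records calls so the example runs without a speech engine.

```python
SVSF_ASYNC = 1               # value of SVSFlagsAsync
SVSF_PURGE_BEFORE_SPEAK = 2  # value of SVSFPurgeBeforeSpeak


def speak_in_background(voice, text):
    """Queue text and return immediately, so the caller's UI is not blocked."""
    voice.Speak(text, SVSF_ASYNC)


def interrupt(voice):
    """Cut off whatever is being spoken by purging the queue with an empty string."""
    voice.Speak("", SVSF_ASYNC | SVSF_PURGE_BEFORE_SPEAK)


class FakeVoice:
    """Stand-in for a SAPI voice object; it just records what it was asked to say."""

    def __init__(self):
        self.calls = []

    def Speak(self, text, flags=0):
        self.calls.append((text, flags))


voice = FakeVoice()
speak_in_background(voice, "Now playing La Grange by zee zee top")
interrupt(voice)
print(voice.calls)
```

On Windows with pywin32 installed, a real voice object can be obtained with win32com.client.Dispatch("SAPI.SpVoice") and passed to the same helpers; that path is not exercised here. A C++ plugin would presumably use the equivalent native flags, SPF_ASYNC and SPF_PURGEBEFORESPEAK, with ISpVoice::Speak.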

Reply #23
Quote
...how to make ... Chinese or Japanese be pronounced properly.


I don't know if there are any SAPI-compliant Chinese or Japanese voices. Even if there are, I think you would have trouble because there are many dialects, and probably no standardisation on how to pronounce any particular sequence of letters. For instance, I believe to a Chinese person there is no good reason to write either Peking or Beijing - the sound of the name of the city has never changed, but I do not think any speech engine would pronounce both those written forms the same.

The best advice I can give is that you should create mp3 tags called TTS_ARTIST, TTS_TITLE, etc., containing whatever English text makes your installed SAPI voice pronounce the names as nearly as possible the way they should be spoken. For example, I don't know how Awaya is pronounced, but perhaps something like "OW YAH" might be acceptable to you. In Foobar, highlight all tracks by Awaya, right-click and select "Properties". Right-click in the "Properties" window and select "Add new Field...". Name the new field TTS_ARTIST (or tts_title, tts_album, etc.), and for the Single value put whatever text makes your speech engine say the words the way you want them spoken.

I have thought about this problem before. If there is any interest, I would consider creating support in my foo_tts plugin for people sharing customised pronunciations like this. I can imagine a special page here on Hydrogenaudio, to which people could upload text files containing lines something like this...

artist:ZZ Top:zee zee top
album:live songs:laive songs
artist:awaya:ow yah

(Where the colon is a 'field separator' between TagName:ActualSpelling:PhoneticSpelling)
I have those first two defined as mp3 tags (TTS_ARTIST and TTS_ALBUM) on appropriate tracks in my library, because even in English, they are spoken incorrectly.
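Reading such a file could be as simple as the sketch below. This is an assumption about a format that is only being proposed here; the function name is invented, and malformed lines are silently skipped.

```python
def parse_pronunciations(text):
    """Parse 'TagName:ActualSpelling:PhoneticSpelling' lines into tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":", 2)  # split on the first two colons only
        if len(parts) != 3:
            continue                # skip lines that don't have three fields
        tag, actual, phonetic = (p.strip() for p in parts)
        entries.append(("TTS_" + tag.upper(), actual, phonetic))
    return entries


sample = """artist:ZZ Top:zee zee top
album:live songs:laive songs
artist:awaya:ow yah"""
for entry in parse_pronunciations(sample):
    print(entry)
```

One limitation of the colon separator: a name that itself contains a colon would be split in the wrong place, so a real format would need an escaping rule or a different separator.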

My plugin could support a configuration option allowing you to open such a text file. I could speak each "Phonetic Spelling", and if you were happy with it, I could automatically add the relevant mp3 tag to all appropriate tracks in your library. If several people each spent just a little time creating phonetic spellings for "awkward" names, we might soon have a useful set of alternatives right here on Hydrogenaudio. It would be possible to merge lots of small "trusted" contributions into one big file that might sort out a lot of popular names that are mispronounced, all in one operation. Special "language-specific" versions of the configuration file could be maintained for non-English names.

Reply #24
Quote
For interrupting the speech and also not blocking the UI while the voice is speaking...


I think the functions you mention only control whether multiple consecutive requests to speak through SAPI override anything currently being spoken, or wait until the previous utterance has completed. The music itself is being played through a separate "processing thread", and would not be affected.

I could easily enough include code in my own plugin to "pause" the main music-playing thread while foo_tts is speaking, but I see no particular reason to do this. I personally wouldn't want my listening to be interrupted in such a distracting fashion. But if more than one person asks for such a feature here, I might consider adding a configuration setting to have it work this way. Perhaps, for example, you might want this only on the last of the three different speech patterns output if you repeatedly press "Speak TrackInfo NOW" (because by then you're probably having real trouble making out what words are being spoken, so you might well want to stop the music while you listen very carefully).

I suppose it might also make sense to pause music playback with "DJ-Style" speech before each track. But I never use this myself, and many tracks are very quiet in the first few seconds anyway, so I'm not sure it's particularly important.