
Topic: Text to speech component (Read 64774 times)

  • zanson
  • [*][*][*]
Text to speech component
Reply #25
Quote
Vocal remover, lyrics as input and you can have Sam sing your favourite tune...  

SCARY VERY SCARY

  • VTR
  • [*]
Text to speech component
Reply #26
I'm in the process of getting those AT&T natural voices. I'll let y'all know how it goes.

  • anza
  • [*][*][*][*][*]
Text to speech component
Reply #27
A hard song for MS Mary: Led Zeppelin - D'yer Mak'er 

  • paulski
  • [*][*]
Text to speech component
Reply #28
:lol:  Mary likes a challenge.
I'm looking forward to your impression on natural voices VTR.

  • zanson
  • [*][*][*]
Text to speech component
Reply #29
Quote
I'm already working on your last suggestion Canar, but I'm not sure what the advantage of using it as an input plugin is. Can you expand on that?
upNorth - I'm not sure if it's possible to delay playback of the audio until the speech has finished because I currently use the callback from foobar that a song has just started as a trigger to speak. Any suggestions?

Paulski

kode54 has a "pause between tracks" plugin with source code on his page: http://www.cqasys.com/projects/kode54/ or http://www.cqasys.com/projects/kode54/0.7/ for it updated to the 0.7beta sdk.  You should be able to build upon this to insert the song names instead of a pause.  (I haven't looked at any of the code to see how he does the pauses)
  • Last Edit: 26 June, 2003, 03:40:18 PM by zanson

  • paulski
  • [*][*]
Text to speech component
Reply #30
There's a new version of the plugin available with the following features:
DJ mode: the volume of the music is reduced whilst speaking. (BTW thanks for the info zanson, I'll check it out coz there's currently a latency issue with reducing the volume).
Tag field customisation: There are 3 text fields where you can specify which tag fields should be read aloud.
The volume and the rate are currently not enabled (will let you know when).

Paulski

  • Saint
  • [*][*]
Text to speech component
Reply #31
This sounds like a really great plugin, any news on a new version? maybe for 0.7x ?? 
"If you cannot read this, please ask the flight attendant for assistance."
- United Airlines Flight Safety Brochure

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #32
I used some linux speech tools when porting a friends mp3 jukebox over to supporting mpc and the like... I had it so you could print out a list of albums with their numbers.. and then using a normal infra red remote you could just press the digits for the album you wanted

123 enter

and it would queue or play the album... or best of all leave it in random mode and it would just peruse all your tracks... it had some user management too, u logged in via telnet.. you rated songs and then in random mode it only played songs everyone liked (or at least wouldn't play any that SOMEONE logged in, didn't like) it was great for computer labs at uni...

I really would love to replicate something like this using foobar but I digress.. when i did the speech part it outputted the usual, artist, album, title... crucially.. it was on a button on the remote... pressing it paused the song... told you its details.. and then played it again... something like this would be really useful.. I don't know how IR controls of foobar are working as yet but this to me is the ideal... telling you every track as an option is good but when you know it (and u usually know most tracks) it's just a gimmick that could frustrate... as an option anytime in a song it's great... although you really need to NOT be able to see the screen...

is this possible?
Binaural recordings of mine: http://binaural.jimtreats.com

  • Saint
  • [*][*]
Text to speech component
Reply #33
Quote
I used some linux speech tools when porting a friends mp3 jukebox over to supporting mpc and the like... I had it so you could print out a list of albums with their numbers.. and then using a normal infra red remote you could just press the digits for the album you wanted

123 enter

and it would queue or play the album...


That really is a great idea as i use a remote control to control foobar myself (Hauppauge wintv pci controller and a little program called IR Remote). Selecting the album using numbers and having it read out would be a dream come true.

Saint
  • Last Edit: 24 July, 2003, 11:10:55 AM by Saint
"If you cannot read this, please ask the flight attendant for assistance."
- United Airlines Flight Safety Brochure

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #34
this reading aloud is only doing artist... album works too.. but title is never being read out..

Quote
That really is a great idea as i use a remote control to control foobar myself (Hauppauge wintv pci controller and a little program called IR Remote).


I have the PVR-250.. don't much like the remote but it's only one possible remote i guess. i bought a few IR devices from europe.. one broke... i've since bought the actisys IR200L to use with the PVR-250 and SageTV ... it sends IR channels to my satellite box.. changing channels as and when it wants to... it also works reading in IR...

Anyways how much control of foobar have you got.. I've not tried as i assumed it wasn't much... for my Sage controlling i bought a wireless mouse and keyboard... this lets me do everything.. and is essential when i connect the PC to my projector to watch TV/DVDs.. it's great.. but for straight audio nothing beats IR remote control..
Binaural recordings of mine: http://binaural.jimtreats.com

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #35
ok title is working now... great stuff... it'd be nice if you could add your own plaintext separators... words such as

BY, ON, IN

etc are all very useful and makes the fields flow a little better together.. now to try some of the updated voices
Binaural recordings of mine: http://binaural.jimtreats.com

  • paulski
  • [*][*]
Text to speech component
Reply #36
Hi

Sorry, I was out of touch with the forum for a while. jrbamford, your suggestion sounds excellent. I think all that's needed is to provide a hotkey function to read out the current song. You can then associate this hotkey with whatever RC you are using. I will enable this feature and put it in the config page as an option.

Paulski

P.S. There are further possibilities for speech that may enhance foobar and even eliminate the need for a display:
Speaking aloud the list of albums (from the database) and using speech input to add to the playlist and play (general speech control of the app is also an option).

What do you think?

  • foosion
  • [*][*][*][*][*]
  • Moderator
Text to speech component
Reply #37
Quote
There are further possibilities for speech that may enhance foobar and even eliminate the need for a display:
Speaking aloud the list of albums (from the database) and using speech input to add to the playlist and play (general speech control of the app is also an option).

What do you think?

My suggestion would be that you implement an interface to the TTS engine in the form of a foobar service (in case you haven't already done this). Other plugins can use TTS capabilities in this way, and you only need one configuration for the TTS engine (in your plugin).
I don't know, if the STT stuff would work out well for things like adding entries to the playlist. Surprise me.
http://foosion.foobar2000.org/ - my components for foobar2000

  • paulski
  • [*][*]
Text to speech component
Reply #38
I'm not sure I understand the advantage of your suggestion foosion. I do not implement a TTS engine myself but simply make use of it via the Microsoft Speech API on client systems that already have it installed (win2k / XP). I guess your solution would save the developer from installing the MS Speech SDK though.

The hotkey approach would still be nice though, whereby users (non-programmers) of girder-type applications can simply forward a key event to foobar that the plugin can capture. The user simply associates key combos within the config panel of the plugin with the girder key to invoke a TTS call.

I'm unsure myself about whether the speech-based playlist control / composition would work nicely.

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #39
paulski.. yes having it on a button (which can then be mapped to ir controller keys) is what you need... configuring it to pause or not pause (just play simultaneously as it does now) would be useful.. getting a lot more fields added would also be good... shouldn't take much to add 5 or 6 total num of fields... although like i said having a plaintext field where you can type whatever you want into it would be the most flexible.. following a reinstall i dont have your plugin installed yet.. i'm also moving over to the 0.7 beta (does it work with this, does it ONLY work with this??  ) but roughly this would be nice

Field 1: %TITLE%
Field 2: by
Field 3: %ARTIST%
Field 4: off of
Field 5: %ALBUM%
Field 6: in
Field 7: %YEAR%

I guess really you would want a page that let you have as many fields as you want without cluttering it up!?! above fields 2, 4, and 6 are plain text to allow you to sculpt it however you want.. I also noticed that this plugin when i last used it didn't get output to the streaming output created by foobar... not a biggie but it'd be nice to have this as an option if it's possible.. streaming music is obviously a really good example for where this system is nice... ok they get the information but at the moment that information is restricted (oddcast only broadcasts artist - title) .. as such this kind of a mechanism especially if you are able to put it in before a track started say?! would be a great way to inform people of what they are listening to..
Binaural recordings of mine: http://binaural.jimtreats.com

  • foosion
  • [*][*][*][*][*]
  • Moderator
Text to speech component
Reply #40
I know you did not develop a TTS engine yourself. I merely assumed* it would take some hassle to get the TTS engine working nicely with (within?) foobar, and that you already had some code to accomplish this. So the benefits of having a foobar service for a TTS engine would be that the extra code to get the TTS engine working with foobar would only need to be in one plugin. Other plugins would just use the service like this:
Code:
if (text_to_speech::present())
  text_to_speech::speak("Foobar rules!");
provided that present() and speak() are static methods of a hypothetical text_to_speech interface.
Sorry for making assumptions, I did not have a look at the MS SAPI.

*: usually a bad idea, I know.
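To make the service idea above concrete, here is a minimal sketch of what such an interface could look like. Everything in it is an assumption for illustration: text_to_speech, install(), say() and recording_tts are hypothetical names, not part of the foobar2000 SDK or the MS Speech API, and a real component would register itself through the SDK's own service mechanism rather than a static pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface (NOT part of the foobar2000 SDK): one plugin owns
// the speech engine, and other plugins only ever call present() and speak().
class text_to_speech {
public:
    virtual ~text_to_speech() = default;

    // Implemented by the plugin that actually talks to the speech engine.
    virtual void say(const std::string& text) = 0;

    // The TTS plugin would call this once at startup (nullptr to unregister).
    static void install(text_to_speech* impl) { instance() = impl; }

    // True if some plugin has registered a speech backend.
    static bool present() { return instance() != nullptr; }

    // Safe to call even when no backend is installed; it then does nothing.
    static void speak(const std::string& text) {
        if (present()) instance()->say(text);
    }

private:
    static text_to_speech*& instance() {
        static text_to_speech* impl = nullptr;
        return impl;
    }
};

// Stand-in backend that records the text instead of calling a real engine.
class recording_tts : public text_to_speech {
public:
    std::vector<std::string> spoken;
    void say(const std::string& text) override { spoken.push_back(text); }
};
```

With a backend installed, the two-line snippet above works unchanged; without one, speak() is a no-op, so client plugins would not need the speech SDK to compile.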
http://foosion.foobar2000.org/ - my components for foobar2000

  • paulski
  • [*][*]
Text to speech component
Reply #41
With the latest version of the MS Speech SDK, only about 5 - 10 lines of code are needed to get it to say anything (through helper functions). Your point would certainly be valid for the older versions of the SDK though. The benefit of a single plugin may still be valid however since developers wouldn't need to download and install the SDK in order to make use of speech in their code.

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #42
paulski, any ideas why the speech doesn't come out with the broadcasted web stream..?! what methods are u using to put out the sound?? i guess you are just creating the sound to wave out/direct sound etc.. its got no attachment to foobars output stream and so thats why its never broadcast!? do the SDKs allow you to piggy back the speech stream onto an existing output buffer!?
  • Last Edit: 30 July, 2003, 01:47:32 PM by jrbamford
Binaural recordings of mine: http://binaural.jimtreats.com

  • zanson
  • [*][*][*]
Text to speech component
Reply #43
Quote
but roughly this would be nice

Field 1: %TITLE%
Field 2: by
Field 3: %ARTIST%
Field 4: off of
Field 5: %ALBUM%
Field 6: in
Field 7: %YEAR%

I guess really you would want a page that let you have as many fields as you want without cluttering it up!?! above fields 2, 4, and 6 are plain text to allow you to sculpt it however you want.. I also noticed that this plugin when i last used it didn't get output to the streaming output created by foobar... not a biggie but it'd be nice to have this as an option if it's possible.. streaming music is obviously a really good example for where this system is nice... ok they get the information but at the moment that information is restricted (oddcast only broadcasts artist - title) .. as such this kind of a mechanism especially if you are able to put it in before a track started say?! would be a great way to inform people of what they are listening to..

It should be pretty easy to just have a text input that you can type foobar format strings into, and then use the conversion functions to get back the string to be said.

ie, just have a text input where you put
%TITLE% by %ARTIST% off of %ALBUM% in %YEAR%

which you use the foobar sdk to convert to
some song by some artist off of some album in 1962

then pass that string into the text to speech engine.
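As a rough illustration of the conversion step described above, the sketch below expands %FIELD% placeholders from a tag map. It is a simplified stand-in, not the foobar SDK's title formatting (which a real plugin would call instead, and which supports far more than plain field lookups). The name expand_fields and the "?" fallback for missing tags are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Simplified stand-in for title formatting: replaces each %NAME% with the
// matching tag value. Unknown fields become "?"; a lone '%' is kept as-is.
std::string expand_fields(const std::string& format,
                          const std::map<std::string, std::string>& tags) {
    std::string out;
    std::size_t pos = 0;
    while (pos < format.size()) {
        std::size_t open = format.find('%', pos);
        std::size_t close = (open == std::string::npos)
                                ? std::string::npos
                                : format.find('%', open + 1);
        if (close == std::string::npos) {
            // No complete %...% pair left: copy the remainder verbatim.
            out.append(format, pos, std::string::npos);
            break;
        }
        // Copy the literal text before the placeholder.
        out.append(format, pos, open - pos);
        std::string name = format.substr(open + 1, close - open - 1);
        std::map<std::string, std::string>::const_iterator it = tags.find(name);
        if (it != tags.end()) {
            out += it->second;
        } else {
            out += "?";
        }
        pos = close + 1;
    }
    return out;
}
```

Given tags for TITLE, ARTIST, ALBUM and YEAR, the format string from the post yields the spoken sentence shown above, which can then be handed to the speech engine as a single string.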

  • jrbamford
  • [*][*][*][*]
Text to speech component
Reply #44
sounds good.. cleaner than having lots of text boxes appearing too

dammit i would really like a program to create a wave of a txt->speech engine right now... so i could add one onto the end of this playlist before i kill it to a sleeping listener
Binaural recordings of mine: http://binaural.jimtreats.com

  • paulski
  • [*][*]
Text to speech component
Reply #45
Nice idea about the text input format. I like it.  I will definitely make a version that supports it.
The speech output currently goes directly to the soundcard. It is possible to mix the output with the stream but I don't know how much work that would be.
  • Last Edit: 31 July, 2003, 03:17:15 AM by paulski
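For readers wondering what mixing speech into the stream might involve, here is a minimal sketch that combines it with the DJ mode mentioned earlier in the thread. It assumes both signals are already 16-bit PCM at the same sample rate and channel layout, which a real plugin would have to arrange. mix_with_ducking and duck_gain are hypothetical names, and this is not how the plugin or foobar's output chain actually works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mixes speech samples into a music buffer. While speech is present the
// music is attenuated by duck_gain (e.g. 0.5f); afterwards it passes through
// unchanged. Sums are clipped to the 16-bit range to avoid wrap-around.
std::vector<std::int16_t> mix_with_ducking(const std::vector<std::int16_t>& music,
                                           const std::vector<std::int16_t>& speech,
                                           float duck_gain) {
    std::vector<std::int16_t> out(music.size());
    for (std::size_t i = 0; i < music.size(); ++i) {
        int sample;
        if (i < speech.size()) {
            // Speech is playing: lower the music, then add the voice on top.
            sample = static_cast<int>(music[i] * duck_gain) + speech[i];
        } else {
            // Speech has finished: music at full volume.
            sample = music[i];
        }
        sample = std::min(32767, std::max(-32768, sample));
        out[i] = static_cast<std::int16_t>(sample);
    }
    return out;
}
```

Because the mixed buffer is what gets handed on, anything downstream of it (including a broadcast encoder) would hear the announcement, which is the behaviour asked about for streaming.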

Text to speech component
Reply #46
what about this nice plugin?? how is work going?? any results

  • paulski
  • [*][*]
Text to speech component
Reply #47
Not yet. I've been as busy as a very busy bee the past couple of weeks. I'll get started over the next few days though and put the source files on my site so others can extend it further.

Paulski

  • paulski
  • [*][*]
Text to speech component
Reply #48
At last. A new version of the text to speech plugin (go to the link at the start of this thread).
The plugin can now speak aloud a formatted string entered in the config using the same notation provided by title formatting.
There is also the option to manually trigger song announcements using a shortcut key (you have to assign a key yourself in the keyboard shortcuts config by choosing 'Say current playlist item').
There is also a (disabled) option for auto announcing around the end of a song (like the DJs do). I'll enable it at a later date.

Paulski

  • Saint
  • [*][*]
Text to speech component
Reply #49
I take it this is for 0.667 as it complains about needing to be compiled with a new SDK. Sounds like a great plugin, keep up the good work.

Saint
"If you cannot read this, please ask the flight attendant for assistance."
- United Airlines Flight Safety Brochure