Thursday, May 28, 2009

Text to Speech

Update: there's a series on Text-to-Speech eLearning:

As part of the eLearning Tour, Christopher von Koschembahr demonstrated a podcast where he conducted an interview. The interviewer was a female voice asking questions. She would ask a question and then he would respond. What was interesting is that the interviewer was produced using Text to Speech. You can listen to the result at the 12:23 mark of the first video on the eLearning Tour page.

The effect was interesting. For short bursts of text, the computer-generated voice was okay. Thus, it works well with the interview structure. Of course, you quickly know that he's written the questions. Still, it works for me. In general, I think the response during the session was positive.

Based on the comments during the session, a few questions came up:

  • Does Text to Speech work for longer form content?
  • Captivate 4 contains built-in Text to Speech. Are people using that much? For what kinds of content?

I did a couple of tests with a few of the free Text to Speech services that I found to see what something that's a bit longer would sound like. I tried YAKiToMe!, Feed2Podcast, SpokenText, and ReadTheWords. None of them produced something that I could listen to for any length of time. My conclusion -

Text to Speech works for short bursts, not for longer amounts of content.

I'd be curious what anyone is doing out there with Text to Speech beyond the example that Chris discussed.

Anyone using it in their Captivate courses? How? How much text do you use?


V Yonkers said...

I have a colleague who traveled 4 hours one way to class (in addition to working full time). She would "read" her required readings on Adobe, listening through the text to speech feature while driving down to class. She would plug her laptop into the car. While I found the voice a bit disconcerting (she tested out the 3 options and chose the voice she liked best), she said it was not bad once you got used to the mechanical voice. And it was a much better use of her time.

I have had students and my nephew, who use the voice to text feature due to disabilities (including a student with severe MS and one with a broken wrist). Another colleague set one of her blind students up with the text to voice option.

Lynn Ireland said...

I find the text-to-speech capabilities of Captivate 4 to be very useful for the editing process. Generally the proper feel of a lesson can't be appreciated until the audio is on, but if changes need to be made this can then result in up to a day of wasted effort recording, editing, syncing etc (often just to replace a slide or two - have yet to manage replacing a single slide without it being obvious so need to re-record the whole thing).

Instead you can just type your script, set up the text to speech on each slide and give the lesson to the reviewer as something pretty close to the final product but within a much shorter time frame.

I think it's great for that, but certainly wouldn't advocate it as an alternative to human speech in the final product.

Sreya Dutta said...

Tony, we tried text to speech once in my aviation course creation experience. We used Microsoft TTS but the client rejected the machine-like voice. I suppose technology has advanced today and we should have better options.


DrBob said...

Hi Tony. If you remember, I gave you this last year:

Not only text to speech but a computer-driven mannequin stuck in as well. We use these to present some key ideas during UG courses. JZK (the mannequin with the TtoS) is one of our most popular members of staff according to student evals...

Peter Miller said...

In similar vein to Bob, there is xtranormal. My very simple example:

Tony Karrer said...

Thanks for the great comments.

@Virginia - good use case where it makes sense.

@Lynn - I feel a bit stupid not to have thought of that. Great idea! Does everyone else do that as well and I've just missed it?

@Sreya - I wonder if that's really the case. What I've heard is a little better, but not that much.

@DrBob - I had forgotten. Thanks for the reminder. I'm afraid that I could not listen to it for very long. What's interesting is that if you aren't looking at the character - the voice is not great. You give it a break when you see that it's computer generated. Still, I wouldn't want to listen to this for very long.

@Peter - I think the voice quality on the example is better than many of the others I heard, but still better as part of a movie and for shorter bursts. Also, funny to hear "beeta release" - wasn't sure what it meant at first.

That said, Peter, you hooked me. Made me want to play a bit with the movie maker. Fun stuff. Unfortunately, I got a SQL error when I tried to register, so I lost my movie. Rather unsatisfying ending.

Peter Miller said...

I'm UK sourced and hence the pronunciation works for me (nice, in fact, that the option was there although the intonation does sound vaguely alien to me).

I'm not sure you would want to use the xtranormal avatars for long screeds of text but I think it is a plus that you can change camera angles and also insert attention-grabbing phrases you might think twice about using f2f.

I also had problems with retrospective signup though the movie survived the experience.

Sreya Dutta said...

Tony, that was way back in 2001 so I don't think the quality was much the same as I see now. Plus the client was probably more used to human voices so he changed his mind. There was a lot of screen text to read then and it didn't work out for us.


Mark King said...

We did some pretty thorough research about six months ago for a four hour e-learning course but could not find much on offer that met our needs.

A major advantage of text to speech (other than the obvious) is the ability to maintain the same voice when updating content, say a year after initial production.

Though this is a little off the initial topic, it is worth keeping in mind when creating such content.

To cater for this, we made sure that our voice over artists have been with our narration provider for a while and would be likely to be around for a while.

subquark said...

After all these years talking accessibility . . . it's still talk - no text alternative. This is an example of the material existing in textual form and yet, still not being offered as an option for those of us who prefer, or are unable, to use audio.

I realize the person responding to the interview is just talking. But with incredibly inexpensive transcription options, like Amazon's Mechanical Turk, there is little excuse to not provide a text version.

In my opinion, it is laziness and lack of compassion to not do this. I am guilty of serving up elearning without text when using YouTube and iTunes (although the new QT is supposed to handle captioning) but that is a function of those channels. My eLearning always has text on my own site because I have control over the format displayed.

Anyway, just a rant of mine on our own hypocrisy in eLearning about accessibility.

globetrottingkerry said...

We've used some text-to-speech in the past 2-3 years. I've found that using the Neo voices (I think that's the term for them) creates a more realistic voice. And editing the pronunciation also helps if you need an alternate pronunciation or if you're including acronyms/abbreviations.

That being said, our clients still typically prefer human voice to TTS.

Matthew Bibby said...

I have recently used TTS in Captivate 4 to produce a series of short demonstrations.

Prior to using TTS we presented a trial to approximately 200 learners and the feedback about TTS was favorable (however well noted was a preference for an AU accent . . . )

In response to Subquark's comment about accessibility . . . one of the things I love about the TTS feature in Captivate 4 is that the script can be easily (& quickly) converted to CC and the learner can choose between CC & TTS.

One other thing to mention, I deliberately played with TTS and made the use of the 'robot' voices amusing by having two characters conversing and "OVERSTRESSING" key points, mispronouncing words occasionally (while the other character chuckles in the background) etc.

While this is not suited for every course, it was appropriate in this situation and was surprisingly well received.


Bobcat said...

I'd like to see Captivate support other TTS engines. I like Cepstral voices.

Has anyone used their new online TTS service at ? They offer over 50 voices. No other vendor has that much variety and students today want more personality.

weheh said...

I use the YAKiToMe! tts for listening to a wide variety of materials. I've found it acceptable for entire books. I can crank up the words per minute to a faster pace to listen to material while driving. It makes good use of my time. I like the rss feed integration, too.

Pablo said...

If you want to listen to some blogs, you may want to try Hear a Blog. It's not text to speech, it's real people narrating blogs.

Jennifer said...

I use Captivate heavily for etraining purposes. My training courses are typically around software training. I use their text to speech, which helps me to rapidly develop content. Unfortunately there are only 2 voices to choose from and I'm struggling trying to find TTS providers that will work with the tool.

joel said...

Relevant to this discussion, I'd like to mention our text-to-speech elearning tool Speech-Over Professional, which comes with two premium TTS voices from NeoSpeech and Acapela-Group. Speech-Over adds voice-overs to PowerPoint presentations which can then be published to the Web with Camtasia, Captivate, Articulate, ViewletCam and many others. The success of this combination in training departments of US corporations indicates that text-to-speech will replace professional voice talents in training presentations.

Joel Harband
Tuval Software Industries

Anonymous said...

My name's Jonecir and I'm from Brazil. I've been using text-to-speech on my eLearning courses and it works pretty good even for large content. I don't use the text-to-speech capability offered by Captivate 4. Free text-to-speech applications are not that good. The one paid application I'm currently using is TextAloud which you can find at and it's very cheap.

joel said...


You should be aware that personal readers like Text Aloud are not licensed for audio distribution, which is usually required for e-learning where you distribute the sound files to the learners.


Jim Kinneer said...

Anyone aware of scholarly research comparing tts to human voiceover? I am in the process of researching the topic.