Tony Karrer's eLearning Blog on e-Learning Trends eLearning 2.0 Personal Learning Informal Learning eLearning Design Authoring Tools Rapid e-Learning Tools Blended e-Learning e-Learning Tools Learning Management Systems (LMS) e-Learning ROI and Metrics

Tuesday, July 27, 2010

Using Text-to-Speech in an eLearning Course

This is the third post in a series on Text-to-Speech for eLearning written by Dr. Joel Harband and edited by me (which turns out to be a great way to learn). In the first two posts, "Text-to-Speech Overview and NLP Quality" and "Digital Signal Processor and Text-to-Speech", we introduced the text-to-speech voice and discussed issues of quality related to its components: the natural language processor (NLP) and the digital signal processor (DSP). In this post we will begin to address the practical side of the subject: How can e-learning developers use Text-to-Speech (TTS) voices to narrate their courses? What tools are immediately available?

Text-to-Speech (TTS) Tools for eLearning Applications

There are a number of possibilities available today for using TTS for eLearning; they fall into two categories or approaches:

  1. TTS Stand-Alone. A general approach in which developers use any standard authoring tool such as Articulate or Lectora and use stand-alone TTS on-demand services/products to create audio files that are then linked or embedded in the presentation.
  2. TTS Integrated. Products/services that have TTS voices bundled and integrated with an authoring solution, including Adobe Captivate and Tuval Software Industries’ Speech-Over Professional.

In this article, we are going to concentrate only on using TTS Stand-Alone tools to create audio files that are embedded into a course.

TTS Stand-Alone Web Services

TTS stand-alone products can be used by eLearning developers irrespective of the authoring tool they use. Several of the voice vendors offer on-demand TTS voice web services, which accept text and produce sound files. Here are a few of the top web services for TTS:


Web Service




On Demand



These web services have the following advantages:

  • Choose any voice from the vendor's set of voices
  • Set the pitch, speed, and volume of the voice for the entire file
  • Select type of sound file output (wav, mp3, etc)
  • Preview function
  • Pronunciation dictionary
  • Pay as you go


These web services also have a disadvantage:

  • Because they are web services, there's no automatic connection with the desktop file system. Most of the time you are creating audio files locally, so a tool with direct access to the file system makes it easier to keep files up to date. In some cases, this also applies to things like storing scripts and default settings.

This can be a major disadvantage and cause significant extra steps.  Because of this, we are going to concentrate on a particular desktop stand-alone product to illustrate the eLearning production workflow.

Acapela Virtual Speaker – a Desktop Stand-Alone TTS Product

Acapela-Group offers a desktop stand-alone product, Acapela Virtual Speaker, that is better suited to eLearning production than most of the web services solutions listed above. 

As an example, let’s see how to work with Acapela Virtual Speaker. Virtual Speaker works with input text files (the narration scripts) and output sound files organized into directories.  Narration scripts (text files) are stored for easy updates and the system makes it easy to generate the associated sound files based on updates.  The sound files are generally easy to find and access from any authoring tool.

To create a sound file from narration text for an authoring tool using Virtual Speaker, you perform the following:


  1. Define a file naming system to identify the text and sound files for the authoring tool
  2. Set working folders for input text files and output sound files
  3. Enter new narration scripts or open a stored narration script file from the text files working folder
  4. Select the language and voice for this sound file
  5. Select the volume, pitch, speed of the voice
  6. Press the Play button to preview the voice reading the text
  7. Make changes in text and voice settings as required
  8. Name the text file according to the naming system (for new text) and save it in the working folder
  9. Select the output format: wav, mp3, etc
  10. Press the Record button; a sound file is created with the same name as the text file and stored in the working folder
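
To make steps 1, 2, and 10 concrete, here is a minimal Python sketch of one possible file naming system. It is not part of Virtual Speaker. The `course_slideNN` pattern, the function names, and the `scripts` and `audio` folder names are all assumptions chosen for illustration. The only idea taken from the steps above is that each sound file shares its base name with its narration script.

```python
from pathlib import Path


def script_name(course: str, slide: int) -> str:
    """Build a script file name such as 'safety_slide03.txt'.

    The 'course_slideNN' pattern is a hypothetical convention,
    not something the TTS product requires.
    """
    return f"{course}_slide{slide:02d}.txt"


def sound_path_for(script: Path, sound_dir: Path, fmt: str = "mp3") -> Path:
    """Return the sound file path that shares the script's base name."""
    return sound_dir / f"{script.stem}.{fmt}"


if __name__ == "__main__":
    # Hypothetical working folders for input text and output sound files.
    script = Path("scripts") / script_name("safety", 3)
    print(script, "->", sound_path_for(script, Path("audio"), "wav"))
```

With a convention like this, anyone on the team can tell from a sound file's name which slide and which script it belongs to, which helps when importing files into the authoring tool.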

To import the sound files into the authoring tool, use the tool's File Import function to import each file from the working folder as required.

It sounds really easy and it is.  Stand-alone TTS tools are used to create sound files just as you would if you had a human recording audio for the course. These sound files then need to be associated with the content using the authoring tool.  In later posts, we’ll get into more specific comparisons of TTS vs. human narration.  In terms of taking the resulting audio files and using them via an authoring tool, the level of effort is similar.

Of course, whether you use human narration or TTS tools that produce audio files, it takes some work to get the audio files embedded in the authored course, including importing the files and in some cases synchronizing them with a timeline editor. Tools that have embedded TTS, like Adobe Captivate, make this significantly easier. And if you make changes to the script, you will need to create new audio files and import them again. With TTS, this is much easier than having to go through another round of narration, but it still takes work.
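
One way to reduce that rework is to check which narration scripts have changed since their sound files were last generated. The Python sketch below is only an illustration under stated assumptions: scripts are `.txt` files, each sound file shares its script's base name, and the function name `stale_scripts` and the folder layout are hypothetical. It does not call any TTS product. It only lists the scripts whose audio needs to be recorded and imported again.

```python
from pathlib import Path


def stale_scripts(script_dir: Path, sound_dir: Path, fmt: str = "mp3") -> list[Path]:
    """List scripts whose sound file is missing or older than the script.

    Assumes each sound file is named after its script, for example
    'intro.txt' -> 'intro.mp3' (a hypothetical convention).
    """
    stale = []
    for script in sorted(script_dir.glob("*.txt")):
        sound = sound_dir / f"{script.stem}.{fmt}"
        # Compare last-modified times; a missing sound file always counts as stale.
        if not sound.exists() or sound.stat().st_mtime < script.stat().st_mtime:
            stale.append(script)
    return stale
```

Running a check like this before publishing a course gives a short list of files to regenerate, rather than re-recording and re-importing everything.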

Personal TTS Readers Not Licensed for eLearning

Some readers may be wondering why we haven’t mentioned the TTS “personal reader” products, such as Natural Reader, TextAloud, Read the Words, and Spoken Text, as possibilities for eLearning tools. The reason is that sound files produced by personal readers are for personal use only and are not allowed, by license, to be distributed. This restriction means that these products cannot be used for eLearning, where sound files are distributed to learners. We’ll talk more about this important subject in a future post.


Chris said...

Even the best TTS can only do one thing - receive text and spit it back out. There is no substitute for a professional voice talent, who can interpret the meaning and message of your e-learning scripts. A good VO knows how and when to change up the tone or feel of a read when things are getting overly technical or have gone on a while. The most sophisticated TTS (and yes I do listen to Kraftwerk, unsung pioneers of synthetic speech) cannot approach a real voice person for e-learning. Why budget VO out of your projects when the cost of a good VO will more than pay for itself with satisfied clients and learners?

AudioDrug said...

I recommend VocaTalk Personal Podcast. This is a tool that combines TTS and music and generates podcasts to promote an on-the-go reading and learning experience. It also does post-processing of the speech to create a more fun and listenable TTS experience.

vtlau said...

Seems to me this line of products is not mature yet. Most of the time a real person is still cheaper (considering all the related costs).

niclasbergstrom said...

You guys should perhaps have a look at the ReadSpeaker web reading services used by a good number of eLearning companies and publishers, like this one, for example: Gale Users to Hear ReadSpeaker, Loud and Clear.

Anonymous said...

I’ve used this method of learning in the past but hadn’t realized the technicalities involved until now. My experience with TTS was actually a very good one because I found it to be effective in the learning process. I can certainly imagine the difficulties in eliminating ambiguity to create a more natural product. I learned a great deal from your post and enjoyed finding out about the challenges involved. I’m curious to see where this technology takes us in the near future. Great information…thanks.

Question: The cost here seems a little steep…does anyone believe the cost is worth the benefits we will gain from using TTS in the learning/teaching process?


Anonymous said...

Great article Tony,

You have compiled a great TTS list here. However, I have found that no matter what text-to-speech program I use, there are far too many oddities in pronunciation to efficiently and cost effectively use one for narration. Besides, finding a professional narrator is easy, and depending on who you choose, it can be inexpensive as well.

Check out The Narrator Files. They price narration by the page, and have exemplary voice talent.