This is the third post in a series on Text-to-Speech for eLearning written by Dr. Joel Harband and edited by me (which turns out to be a great way to learn). In the first two posts, Text-to-Speech Overview and NLP Quality and Digital Signal Processor and Text-to-Speech, we introduced the text-to-speech voice and discussed quality issues related to its components: the natural language processor (NLP) and the digital signal processor (DSP). In this post we will begin to address the practical side of the subject: How can e-learning developers use Text-to-Speech (TTS) voices to narrate their courses? What tools are immediately available?
Text-to-Speech (TTS) Tools for eLearning Applications
There are a number of options available today for using TTS in eLearning; they fall into two broad approaches:
- TTS Stand-Alone. A general approach in which developers use any standard authoring tool, such as Articulate or Lectora, together with a stand-alone TTS on-demand service or product to create audio files that are then linked to or embedded in the presentation.
- TTS Integrated. Products/services that bundle and integrate TTS voices with an authoring solution, such as Adobe Captivate and Tuval Software Industries’ Speech-Over Professional.
In this article, we are going to concentrate only on using TTS Stand-Alone tools to create audio files that are embedded into a course.
TTS Stand-Alone Web Services
TTS stand-alone products can be used by eLearning developers irrespective of the authoring tool they use. Several of the voice vendors offer on-demand TTS web services that accept text and produce sound files. Here are a few of the top vendors offering TTS web services:
- Loquendo
- NeoSpeech
- Acapela-Group
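To make the idea concrete, here is a minimal sketch of how a developer might script a request to such a service. The endpoint URL, voice name, and parameter names are hypothetical placeholders, not any particular vendor's API; each vendor defines its own interface, authentication, and settings.

```python
import requests

# Hypothetical endpoint and parameters, for illustration only --
# each vendor's actual API and parameter names differ.
TTS_ENDPOINT = "https://tts.example.com/api/synthesize"

payload = {
    "text": "Welcome to Module 1: Introduction to Project Management.",
    "voice": "Heather",   # one of the vendor's voices (hypothetical name)
    "pitch": 1.0,         # applied to the entire file
    "speed": 1.0,
    "volume": 1.0,
    "format": "mp3",      # wav, mp3, etc.
}

# The service returns the synthesized audio, which is saved locally
# so it can be imported into the authoring tool.
response = requests.post(TTS_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()

with open("M01_S01_intro.mp3", "wb") as audio_file:
    audio_file.write(response.content)
```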
These web services offer several advantages:
- Choose any voice from the vendor's set of voices
- Set the pitch, speed, and volume of the voice for the entire file
- Select the output sound file format (WAV, MP3, etc.)
- Preview function
- Pronunciation dictionary
- Pay as you go
Disadvantages
- Because they are web services, there is no automatic connection to the desktop file system. Since you are usually creating audio files that live locally, direct access to the file system is what keeps those files organized and up to date; in some cases, the same applies to storing scripts and default settings.
This can be a major disadvantage and add significant extra steps. For that reason, we are going to concentrate on a particular desktop stand-alone product to illustrate the eLearning production workflow.
Acapela Virtual Speaker – a Desktop Stand-Alone TTS Product
Acapela-Group offers a desktop stand-alone product, Acapela Virtual Speaker, that is better suited to eLearning production than most of the web service solutions listed above.
As an example, let’s see how to work with Acapela Virtual Speaker. Virtual Speaker works with input text files (the narration scripts) and output sound files organized into directories. Because the narration scripts are stored as text files, updating them and regenerating the associated sound files is straightforward, and the resulting sound files are easy to locate and import from any authoring tool.
To create a sound file from narration text for an authoring tool using Virtual Speaker, you perform the following:
- Define a file naming system to identify the text and sound files for the authoring tool (a simple naming-check sketch follows this list)
- Set working folders for input text files and output sound files
- Enter new narration scripts or open a stored narration script file from the text files working folder
- Select the language and voice for this sound file
- Select the volume, pitch, and speed of the voice
- Press the Play button to preview the voice reading the text
- Make changes in text and voice settings as required
- Name the text file according to the naming system (for new text) and save it in the working folder
- Select the output format: WAV, MP3, etc.
- Press the Record button; a sound file is created with the same name as the text file and stored in the working folder
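The file naming system in the first step is worth a moment of thought, since it is what ties each narration script to its sound file and to the slide that uses it. As a minimal sketch (the module/slide naming scheme and folder names here are just an illustration, not part of Virtual Speaker), a small script can report which narration scripts have no sound file yet, or have been edited since their sound file was last recorded:

```python
from pathlib import Path

# Illustrative working folders and naming scheme: M01_S03_objectives.txt
# pairs with M01_S03_objectives.mp3. Adjust to your own convention.
TEXT_DIR = Path("narration_scripts")
AUDIO_DIR = Path("sound_files")

for script in sorted(TEXT_DIR.glob("*.txt")):
    audio = AUDIO_DIR / script.with_suffix(".mp3").name
    if not audio.exists():
        print(f"MISSING : {audio.name} (record from {script.name})")
    elif script.stat().st_mtime > audio.stat().st_mtime:
        print(f"OUTDATED: {audio.name} (script edited after last recording)")
```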
To bring the sound files into the authoring tool, use the tool's File Import function to import them from the working folder as required.
It sounds really easy, and it is. With stand-alone TTS tools, you create sound files just as you would if a human narrator were recording audio for the course; the sound files then need to be associated with the content using the authoring tool. In later posts, we’ll get into more specific comparisons of TTS vs. human narration, but in terms of taking the resulting audio files and using them in an authoring tool, the level of effort is similar.
Of course, whether the audio files come from human narration or from TTS tools, it takes some work to get them embedded in the authored course, including importing the files and, in some cases, synchronizing them with a timeline editor. Tools that have embedded TTS, like Adobe Captivate, make this significantly easier. And if you make changes to the script, you will need to create new audio files and import them again. This is much easier than going through another round of recorded narration, but it still takes work.
Personal TTS Readers Not Licensed for eLearning
Some readers may be wondering why we haven’t mentioned TTS “personal reader” products such as Natural Reader, TextAloud, Read the Words, and Spoken Text as possibilities for eLearning tools. The reason is that sound files produced by personal readers are licensed for personal use only and may not be distributed. This restriction means that these products cannot be used for eLearning, where sound files are distributed to learners. We’ll talk more about this important subject in a future post.