Google Chrome Extensions

chrome.tts

Use the chrome.tts module to play synthesized text-to-speech (TTS). See also the related ttsEngine module, which allows an extension to implement a speech engine.

Overview

You must declare the "tts" permission in your extension's manifest to use this API.
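
As a rough illustration, a minimal manifest that declares the permission might look like the following. The name, version, and manifest_version values are placeholders chosen for this sketch; only the "tts" entry under "permissions" comes from the requirement above.

```json
{
  "name": "My TTS Example",
  "version": "1.0",
  "manifest_version": 3,
  "permissions": ["tts"]
}
```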

Chrome provides native support for speech on Windows (using SAPI 5), Mac OS X, and Chrome OS, using speech synthesis capabilities provided by the operating system. On all platforms, the user can install extensions that register themselves as alternative speech engines.

Generating speech

Call speak() from your extension or packaged app to speak. For example:

chrome.tts.speak('Hello, world.');

To stop speaking immediately, just call stop():

chrome.tts.stop();

You can provide options that control various properties of the speech, such as its rate, pitch, and more. For example:

chrome.tts.speak('Hello, world.', {'rate': 2.0});

It's also a good idea to specify the language so that a synthesizer supporting that language (and regional dialect, if applicable) is chosen.

chrome.tts.speak(
    'Hello, world.', {'lang': 'en-US', 'rate': 2.0});

By default, each call to speak() interrupts any ongoing speech and speaks immediately. To determine if a call would be interrupting anything, you can call isSpeaking(). In addition, you can use the enqueue option to cause this utterance to be added to a queue of utterances that will be spoken when the current utterance has finished.

chrome.tts.speak(
    'Speak this first.');
chrome.tts.speak(
    'Speak this next, when the first sentence is done.', {'enqueue': true});
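
As one way to use isSpeaking(), the sketch below speaks a low-priority message only when nothing else is being spoken. It assumes that isSpeaking() reports its result to a callback as a boolean. The names speakIfIdle, tts, and onDecision are hypothetical; tts stands in for chrome.tts so that the helper can be exercised outside an extension.

```javascript
// Hypothetical helper (not part of the chrome.tts API).
// `tts` is assumed to look like chrome.tts: it has isSpeaking(callback)
// and speak(utterance, options).
function speakIfIdle(tts, text, options, onDecision) {
  tts.isSpeaking(function(speaking) {
    if (!speaking) {
      // Nothing is being spoken, so this call interrupts nothing.
      tts.speak(text, options || {});
    }
    if (onDecision) {
      // Report whether the text was handed to the engine.
      onDecision(!speaking);
    }
  });
}

// In an extension, you might call:
//   speakIfIdle(chrome.tts, 'You have new mail.', {lang: 'en-US'});
```

Speech could still start between the check and the call to speak(), so treat this as a best-effort courtesy rather than a guarantee.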

A complete description of all options can be found in the speak() method documentation below. Not all speech engines will support all options.

To catch errors and make sure you're calling speak() correctly, pass a callback function that takes no arguments. Inside the callback, check chrome.runtime.lastError (which replaces the deprecated chrome.extension.lastError) to see if there were any errors.

chrome.tts.speak(
    utterance,
    options,
    function() {
      if (chrome.runtime.lastError) {
        console.log('Error: ' + chrome.runtime.lastError.message);
      }
    });

The callback is called right away, before the engine has started generating speech. The purpose of the callback is to alert you to syntax errors in your use of the TTS API, not to catch all possible errors that might occur in the process of synthesizing and outputting speech. To catch these errors too, you need to use an event listener, described below.

Listening to events

To get more real-time information about the status of synthesized speech, pass an event listener in the options to speak(), like this:

chrome.tts.speak(
    utterance,
    {
      onEvent: function(event) {
        console.log('Event ' + event.type + ' at position ' + event.charIndex);
        if (event.type == 'error') {
          console.log('Error: ' + event.errorMessage);
        }
      }
    },
    callback);

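Each event includes an event type, the character index of the current speech relative to the utterance, and for error events, an optional error message. The event types are:

  • 'start': speech has started.
  • 'word': a word boundary was reached.
  • 'sentence': a sentence boundary was reached.
  • 'marker': an SSML mark element was reached.
  • 'end': the end of the utterance was reached.
  • 'interrupted': the utterance was stopped or interrupted before reaching the end.
  • 'cancelled': the utterance was removed from the queue before ever being synthesized.
  • 'error': any other error occurred.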

Four of the event types—'end', 'interrupted', 'cancelled', and 'error'—are final. After one of those events is received, this utterance will no longer speak and no new events from this utterance will be received.

Some voices may not support all event types, and some voices may not send any events at all. If you do not want to use a voice unless it sends certain events, pass the events you require in the requiredEventTypes member of the options object, or use getVoices() to choose a voice that meets your requirements. Both are documented below.
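
Because an utterance sends no further events after a final event, one common pattern is to turn the event stream into a single "finished" notification. The following is a minimal sketch of that idea, not a definitive implementation. The names speakAndNotify, tts, and onFinished are hypothetical, and tts stands in for chrome.tts.

```javascript
// Event types after which an utterance sends no further events.
var FINAL_EVENT_TYPES = ['end', 'interrupted', 'cancelled', 'error'];

// Hypothetical helper (not part of the chrome.tts API): speaks `text`
// and calls onFinished at most once, with the first final event.
// `tts` is assumed to look like chrome.tts.
function speakAndNotify(tts, text, options, onFinished) {
  var finished = false;
  var callerOnEvent = options && options.onEvent;
  var merged = Object.assign({}, options, {
    onEvent: function(event) {
      // Keep forwarding every event to the caller's own listener, if any.
      if (callerOnEvent) {
        callerOnEvent(event);
      }
      if (finished || FINAL_EVENT_TYPES.indexOf(event.type) === -1) {
        return;
      }
      finished = true;
      onFinished(event);
    }
  });
  tts.speak(text, merged);
}

// In an extension, you might call:
//   speakAndNotify(chrome.tts, 'Hello, world.', {lang: 'en-US'},
//       function(event) { console.log('Finished with: ' + event.type); });
```

If the selected voice sends no events at all, onFinished is never called, so consider combining this with requiredEventTypes.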

SSML markup

Utterances used in this API may include markup using the Speech Synthesis Markup Language (SSML). If you use SSML, the first argument to speak() should be a complete SSML document with an XML header and a top-level <speak> tag, not a document fragment.

For example:

chrome.tts.speak(
    '<?xml version="1.0"?>' +
    '<speak>' +
    '  The <emphasis>second</emphasis> ' +
    '  word of this sentence was emphasized.' +
    '</speak>');

Not all speech engines will support all SSML tags, and some may not support SSML at all, but all engines are required to ignore any SSML they don't support and to still speak the underlying text.
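
When the text to speak comes from user input or a web page, characters such as & and < would make the SSML document malformed. The sketch below shows one way to wrap plain text in a complete SSML document; escapeXml and toSsmlDocument are hypothetical helper names, not part of the API.

```javascript
// Escapes characters that have special meaning in XML text content.
function escapeXml(text) {
  return text
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps plain text in a complete SSML document
// with an XML header and a top-level <speak> tag.
function toSsmlDocument(text) {
  return '<?xml version="1.0"?>' +
      '<speak>' + escapeXml(text) + '</speak>';
}

// In an extension, you might call:
//   chrome.tts.speak(toSsmlDocument('Tom & Jerry'));
```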

Choosing a voice

By default, Chrome chooses the most appropriate voice for each utterance you want to speak, based on the language and gender. On most Windows, Mac OS X, and Chrome OS systems, speech synthesis provided by the operating system should be able to speak any text in at least one language. Some users may have a variety of voices available, though, from their operating system and from speech engines implemented by other Chrome extensions. In those cases, you can implement custom code to choose the appropriate voice, or to present the user with a list of choices.

To get a list of all voices, call getVoices() and pass it a function that receives an array of TtsVoice objects as its argument:

chrome.tts.getVoices(
    function(voices) {
      for (var i = 0; i < voices.length; i++) {
        console.log('Voice ' + i + ':');
        console.log('  name: ' + voices[i].voiceName);
        console.log('  lang: ' + voices[i].lang);
        console.log('  gender: ' + voices[i].gender);
        console.log('  extension id: ' + voices[i].extensionId);
        console.log('  event types: ' + voices[i].eventTypes);
      }
    });
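
Building on the list returned by getVoices(), the following sketch picks the first voice that matches a language and can send a given set of event types. The function pickVoice and its exact-match language rule are assumptions made for illustration; real selection logic may want looser matching (for example, treating 'en' as compatible with 'en-US').

```javascript
// Hypothetical helper: returns the first voice whose lang equals `lang`
// (or any voice if `lang` is empty) and whose eventTypes include every
// entry of `requiredEventTypes`. Returns null if no voice qualifies.
function pickVoice(voices, lang, requiredEventTypes) {
  for (var i = 0; i < voices.length; i++) {
    var voice = voices[i];
    // eventTypes is optional, so treat a missing list as empty.
    var supported = voice.eventTypes || [];
    var langMatches = !lang || voice.lang === lang;
    var hasEvents = requiredEventTypes.every(function(type) {
      return supported.indexOf(type) !== -1;
    });
    if (langMatches && hasEvents) {
      return voice;
    }
  }
  return null;
}

// In an extension, you might call:
//   chrome.tts.getVoices(function(voices) {
//     var voice = pickVoice(voices, 'en-US', ['word', 'end']);
//     if (voice) {
//       chrome.tts.speak('Hello, world.', {voiceName: voice.voiceName});
//     }
//   });
```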

API Reference: chrome.tts

Types

TtsEvent

( object )
An event from the TTS engine to communicate the status of an utterance.

Properties of TtsEvent

type ( enumerated string ["start", "end", "word", "sentence", "marker", "interrupted", "cancelled", "error"] )
The type can be 'start' as soon as speech has started, 'word' when a word boundary is reached, 'sentence' when a sentence boundary is reached, 'marker' when an SSML mark element is reached, 'end' when the end of the utterance is reached, 'interrupted' when the utterance is stopped or interrupted before reaching the end, 'cancelled' when it's removed from the queue before ever being synthesized, or 'error' when any other error occurs.
charIndex ( optional double )
The index of the current character in the utterance.
errorMessage ( optional string )
The error description, if the event type is 'error'.

TtsVoice

( object )
A description of a voice available for speech synthesis.

Properties of TtsVoice

voiceName ( optional string )
The name of the voice.
lang ( optional string )
The language that this voice supports, in the form language-region. Examples: 'en', 'en-US', 'en-GB', 'zh-CN'.
gender ( optional enumerated string ["male", "female"] )
This voice's gender.
extensionId ( optional string )
The ID of the extension providing this voice.
eventTypes ( optional array of string )
All of the callback event types that this voice is capable of sending.

Methods

speak

chrome.tts.speak(string utterance, object options, function callback)

Speaks text using a text-to-speech engine.

Parameters

utterance ( string )
The text to speak, either plain text or a complete, well-formed SSML document. Speech engines that do not support SSML will strip away the tags and speak the text. The maximum length of the text is 32,768 characters.
options ( optional object )
The speech options.
enqueue ( optional boolean )
If true, enqueues this utterance if TTS is already in progress. If false (the default), interrupts any current speech and flushes the speech queue before speaking this new utterance.
voiceName ( optional string )
The name of the voice to use for synthesis. If empty, uses any available voice.
extensionId ( optional string )
The extension ID of the speech engine to use, if known.
lang ( optional string )
The language to be used for synthesis, in the form language-region. Examples: 'en', 'en-US', 'en-GB', 'zh-CN'.
gender ( optional enumerated string ["male", "female"] )
Gender of voice for synthesized speech.
rate ( optional double )
Speaking rate relative to the default rate for this voice. 1.0 is the default rate, normally around 180 to 220 words per minute. 2.0 is twice as fast, and 0.5 is half as fast. Values below 0.1 or above 10.0 are strictly disallowed, but many voices will constrain the minimum and maximum rates further—for example a particular voice may not actually speak faster than 3 times normal even if you specify a value larger than 3.0.
pitch ( optional double )
Speaking pitch between 0 and 2 inclusive, with 0 being lowest and 2 being highest. 1.0 corresponds to a voice's default pitch.
volume ( optional double )
Speaking volume between 0 and 1 inclusive, with 0 being lowest and 1 being highest, with a default of 1.0.
requiredEventTypes ( optional array of string )
The TTS event types the voice must support.
desiredEventTypes ( optional array of string )
The TTS event types that you are interested in listening to. If missing, all event types may be sent.
onEvent ( optional function )
This function is called with events that occur in the process of speaking the utterance.
Parameters
event ( TtsEvent )
The update event from the text-to-speech engine indicating the status of this utterance.
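
Since out-of-range values for rate, pitch, and volume are not allowed, it can help to clamp values that come from user settings before passing them to speak(). The sketch below uses the ranges listed above; clamp and clampSpeechOptions are hypothetical helper names, not part of the API.

```javascript
// Restricts `value` to the inclusive range [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: returns a copy of `options` with rate, pitch,
// and volume forced into the ranges documented for speak().
// Other properties are copied unchanged.
function clampSpeechOptions(options) {
  var result = Object.assign({}, options);
  if (typeof result.rate === 'number') {
    result.rate = clamp(result.rate, 0.1, 10.0);
  }
  if (typeof result.pitch === 'number') {
    result.pitch = clamp(result.pitch, 0, 2);
  }
  if (typeof result.volume === 'number') {
    result.volume = clamp(result.volume, 0, 1);
  }
  return result;
}

// In an extension, you might call:
//   chrome.tts.speak('Hello, world.', clampSpeechOptions({rate: 50}));
```

A particular voice may still narrow these ranges further, so the clamped value is only an upper bound on what the engine will actually use.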

Callback function

If you specify the callback parameter, it should be a function that takes no arguments and looks like this:

function() {...};
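
Given the 32,768-character limit on an utterance, long plain text has to be split across several calls. One possible approach, sketched below, breaks the text at spaces where it can and enqueues every chunk after the first. The names MAX_UTTERANCE_LENGTH, splitUtterance, and speakLongText are hypothetical, tts stands in for chrome.tts, and the sketch is intended for plain text only, since cutting an SSML document would leave it malformed.

```javascript
// Maximum utterance length stated in the speak() documentation.
var MAX_UTTERANCE_LENGTH = 32768;

// Hypothetical helper: splits `text` into chunks of at most `maxLength`
// characters, breaking at a space when one is available.
function splitUtterance(text, maxLength) {
  var chunks = [];
  var rest = text;
  while (rest.length > maxLength) {
    // Prefer the last space that still fits in this chunk.
    var cut = rest.lastIndexOf(' ', maxLength);
    if (cut <= 0) {
      cut = maxLength;  // No usable space: break mid-word.
    }
    chunks.push(rest.slice(0, cut));
    // Drop the spaces at the break so the next chunk starts on a word.
    rest = rest.slice(cut).replace(/^ +/, '');
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}

// Hypothetical helper: speaks each chunk in order. The first chunk keeps
// the caller's enqueue setting; later chunks are always enqueued.
function speakLongText(tts, text, options) {
  splitUtterance(text, MAX_UTTERANCE_LENGTH).forEach(function(chunk, index) {
    var merged = Object.assign({}, options);
    if (index > 0) {
      merged.enqueue = true;
    }
    tts.speak(chunk, merged);
  });
}

// In an extension, you might call:
//   speakLongText(chrome.tts, veryLongText, {lang: 'en-US'});
```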

stop

chrome.tts.stop()

Stops any current speech.

isSpeaking

chrome.tts.isSpeaking(function callback)

Checks whether the engine is currently speaking and passes the result to the callback as a boolean. On Mac OS X, the result is true whenever the system speech engine is speaking, even if the speech wasn't initiated by Chrome.

getVoices

chrome.tts.getVoices(function callback)

Gets all available voices and passes them to the callback as an array of TtsVoice objects.

Sample Extensions that use chrome.tts

  • Console TTS Engine – A "silent" TTS engine that prints text to a small window rather than synthesizing speech.
  • Speak Selection – Speaks the current selection out loud.
  • Talking Alarm Clock – A clock with two configurable alarms that will play a sound and speak a phrase of your choice.
  • TTS Debug – Tool for developers of Chrome TTS engine extensions to help them test that their engines implement the API correctly.
  • TTS Demo – Demonstrates Chrome's synthesized text-to-speech capabilities.