polly_instructions

These instructions and demo code show how to get started with Amazon Polly and integrate it into your existing applications.

What you'll do

You will first use the Polly console to convert sample sentences and SSML to speech, then call the Polly API from a Ruby script to generate MP3 files.

Permissions

Getting Started

Before proceeding, you'll need an AWS account. For the purposes of this workshop, please choose US East (N. Virginia), us-east-1, as your region.

Polly generates speech from both text and SSML (Speech Synthesis Markup Language). SSML allows you to customize and control things like pronunciation, volume, and speed.
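As a rough illustration of how these controls map onto markup, the sketch below assembles an SSML string in Ruby. The helper names ssml_prosody and ssml_document are hypothetical (they are not part of Polly or the AWS SDK); only the speak and prosody tags and their rate and volume attributes come from SSML itself.

```ruby
require 'cgi'

# Hypothetical helper: wrap text in a <prosody> tag with optional rate/volume.
# The text is XML-escaped so characters such as & or < do not break the markup.
def ssml_prosody(text, rate: nil, volume: nil)
  attrs = { rate: rate, volume: volume }.compact
  return CGI.escapeHTML(text) if attrs.empty?

  attr_string = attrs.map { |name, value| %(#{name}="#{value}") }.join(' ')
  "<prosody #{attr_string}>#{CGI.escapeHTML(text)}</prosody>"
end

# Hypothetical helper: join already-built fragments inside the root <speak> element.
def ssml_document(*fragments)
  "<speak>#{fragments.join(' ')}</speak>"
end

puts ssml_document(
  ssml_prosody('I know', rate: 'x-slow'),
  ssml_prosody('Score!', volume: 'x-loud')
)
```

The resulting string can be pasted into the SSML tab of the Polly console or sent through the API with the text type set to ssml.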

SSML

Example 1

The first SSML snippet uses the say-as tag to read a string of digits as a telephone number and the prosody tag to slow the speaking rate.

<speak>
  Thank you for calling today. Your phone number is <say-as interpret-as="telephone">2122241555</say-as>. 
  <prosody rate="x-slow"> I know </prosody> this is not a number, hence I will not say 2122241555.
</speak>

Example 2

The second SSML snippet adds a pause (break), aliasing with the sub tag (Polly says "World Wide Web Consortium" wherever the text contains W3C), louder volume via prosody, and a whispered effect.

<speak>
  He was caught up in the game. <break time="1s"/> 
  In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting, he shouted <prosody volume="x-loud">Score!</prosody> quite loudly. 
  When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect>, in a whisper.
</speak>

Additional Tags

API

You can integrate text-to-speech generation into your own apps by calling the Polly API.

The API is available from any language that has an AWS SDK; this example uses Ruby. To run it, you need a recent version of Ruby and the AWS SDK for Ruby gem (aws-sdk).

Step 1

Step 2

Step 3

Step 4

#!/usr/bin/env ruby

#
# Install the following gems from your command line:
#     gem install 'aws-sdk'
#
require 'aws-sdk'
require 'yaml' # lowercase: require 'YAML' raises LoadError on case-sensitive file systems

class Synthesizer
  SSML1 = <<-eod
    <speak>
      Thank you for calling today. Your phone number is <say-as interpret-as="telephone">2122241555</say-as>. <prosody rate="x-slow"> I know </prosody> this is not a number, hence I will not say 2122241555.
    </speak>
  eod

  SSML2 = <<-eod
    <speak>
      He was caught up in the game. <break time="1s"/> In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting, he shouted <prosody volume="x-loud">Score!</prosody> quite loudly. When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect>, in a whisper.
    </speak>
  eod

  def initialize(region: 'us-east-1')
    #
    # N.B. 1: Do not hard-code credentials in source code. 
    # N.B. 2: Do not source control a credentials file.
    #
    # aws.yml is expected to define access_key_id and secret_access_key.
    creds       = YAML.safe_load(File.read('aws.yml'))
    credentials = Aws::Credentials.new(creds['access_key_id'], creds['secret_access_key'])

    # Polly client
    @polly = Aws::Polly::Client.new(
      region: region,
      credentials: credentials
    )
  end

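  # Method for plain-text to speech conversion. Named synthesize_text because Ruby
  # does not overload methods: a later definition of synthesize would replace this one.
  def synthesize_text(text:, response_target: 'speech.mp3', output_format: 'mp3', voice_id: 'Joanna')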
    request = {
      response_target: response_target,
      output_format: output_format,
      voice_id: voice_id,
      text_type: 'text',
      text: text
    }

    @polly.synthesize_speech(request)
  end

  # Method for SSML to speech conversion
  def synthesize(ssml:, response_target: 'speech.mp3', output_format: 'mp3', voice_id: 'Joanna')
    request = {
      response_target: response_target,
      output_format: output_format,
      voice_id: voice_id,
      text_type: 'ssml',
      text: ssml
    }

    @polly.synthesize_speech(request)
  end
end

synth = Synthesizer.new
synth.synthesize response_target: 'speech1.mp3', ssml: Synthesizer::SSML1
synth.synthesize response_target: 'speech2.mp3', ssml: Synthesizer::SSML2
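
The two synthesis methods in the script build request hashes that differ only in text_type, so one way to avoid the duplication is to derive that value from the input. The sketch below is an assumption about how that could look, not part of the workshop code: build_polly_request is a hypothetical helper, and treating any input that begins with a speak tag as SSML is a simplifying heuristic. The hash keys mirror those passed to synthesize_speech in the script.

```ruby
# Hypothetical helper: build the options hash that the script passes to
# Aws::Polly::Client#synthesize_speech. No AWS call is made here.
def build_polly_request(input, response_target: 'speech.mp3', output_format: 'mp3', voice_id: 'Joanna')
  # Assumption: input whose first non-blank content is a <speak> tag is SSML;
  # anything else is sent as plain text.
  text_type = input.match?(/\A\s*<speak[\s>]/) ? 'ssml' : 'text'

  {
    response_target: response_target,
    output_format: output_format,
    voice_id: voice_id,
    text_type: text_type,
    text: input
  }
end

p build_polly_request('Hello from Polly.')
p build_polly_request('<speak>Hello <break time="1s"/> again.</speak>', response_target: 'speech1.mp3')
```

With a helper like this, a single method could accept either kind of input and pass the resulting hash to the Polly client.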

Step 5

Step 6

Running the script calls the Polly API and writes the speech generated from the two SSML snippets to speech1.mp3 and speech2.mp3. Play the files with any audio player.
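
Before opening the files, it can help to confirm that both were actually written. This is a minimal sketch, assuming the script was run from the current directory with the file names used above; missing_outputs is a hypothetical helper, not part of the AWS SDK.

```ruby
# Hypothetical helper: return the expected output files that are missing or empty.
def missing_outputs(paths)
  paths.reject { |path| File.file?(path) && File.size(path).positive? }
end

missing = missing_outputs(%w[speech1.mp3 speech2.mp3])
if missing.empty?
  puts 'Both audio files were written.'
else
  puts "Missing or empty: #{missing.join(', ')}"
end
```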

Polly Workshop Instructions