Amazon Polly is a service that turns text into lifelike speech.
The instructions and demo code here show how to get started with Amazon Polly quickly and integrate it into your existing applications.
We will first use the Polly console to convert sample sentences and SSML to speech.
We will then use the Polly API for programmatic generation of speech.
The IAM user that you're logged in as needs Polly permissions, which whoever administers your AWS account must grant. Please see http://docs.aws.amazon.com/polly/latest/dg/api-permissions-reference.html
Before proceeding, you'll need an AWS account. For the purposes of this workshop, please choose US East (N. Virginia), us-east-1, as your region.
Polly generates speech from both text and SSML (Speech Synthesis Markup Language). SSML allows you to customize and control things like pronunciation, volume, and speed.
Our first SSML snippet below uses tags to read a string of digits as a phone number and to vary the speaking rate.
<speak>
Thank you for calling today. Your phone number is <say-as interpret-as="telephone">2122241555</say-as>.
<prosody rate="x-slow"> I know </prosody> this is not a number, hence I will not say 2122241555.
</speak>
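If the phone number comes from your application rather than a fixed string, the snippet above can be assembled in code. The following is a minimal sketch in plain Ruby; the helper names (escape_xml, telephone, slow, phone_greeting) are hypothetical and not part of Polly or the AWS SDK.

```ruby
# Characters that must be escaped before text is placed inside SSML.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: escapes XML special characters in user-supplied text.
def escape_xml(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: wraps digits in <say-as> so they are read as a phone number.
def telephone(number)
  %(<say-as interpret-as="telephone">#{escape_xml(number)}</say-as>)
end

# Hypothetical helper: wraps text in <prosody> with the given speaking rate.
def slow(text, rate: 'x-slow')
  %(<prosody rate="#{rate}">#{escape_xml(text)}</prosody>)
end

# Builds a complete SSML document similar to the first snippet.
def phone_greeting(number)
  '<speak>Thank you for calling today. Your phone number is ' \
    "#{telephone(number)}. #{slow('I know')} this is not a number.</speak>"
end

puts phone_greeting('2122241555')
```

Escaping matters because a stray & or < in the input would otherwise make the SSML invalid.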
Our second SSML snippet uses additional tags for pauses, aliasing (saying "World Wide Web Consortium" wherever it sees W3C), volume, and whispering.
<speak>
He was caught up in the game. <break time="1s"/>
In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting, he shouted <prosody volume="x-loud">Score!</prosody> quite loudly.
When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect>, in a whisper.
</speak>
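The tags in the second snippet can also be generated by small helpers. This is a sketch under stated assumptions: the helper names (sub_alias, volume, whisper, pause) are hypothetical, and the VOLUMES list covers only the named levels for the prosody volume attribute, not decibel values.

```ruby
# Named levels for the <prosody> volume attribute (decibel offsets not handled here).
VOLUMES = %w[default silent x-soft soft medium loud x-loud].freeze

# Hypothetical helper: speaks `spoken` wherever `text` is written.
# String#encode with xml: :attr escapes the value and adds the surrounding quotes.
def sub_alias(text, spoken)
  "<sub alias=#{spoken.encode(xml: :attr)}>#{text.encode(xml: :text)}</sub>"
end

# Hypothetical helper: wraps text in <prosody> at one of the named volume levels.
def volume(text, level)
  raise ArgumentError, "unknown volume: #{level}" unless VOLUMES.include?(level)

  %(<prosody volume="#{level}">#{text.encode(xml: :text)}</prosody>)
end

# Hypothetical helper: applies the whispered effect.
def whisper(text)
  %(<amazon:effect name="whispered">#{text.encode(xml: :text)}</amazon:effect>)
end

# Hypothetical helper: inserts a pause of the given number of seconds.
def pause(seconds)
  %(<break time="#{seconds}s"/>)
end

ssml = "<speak>He was caught up in the game. #{pause(1)} At the " \
       "#{sub_alias('W3C', 'World Wide Web Consortium')} meeting he shouted " \
       "#{volume('Score!', 'x-loud')}, then repeated #{whisper('Score')}.</speak>"
puts ssml
```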
For a full list of all supported SSML tags and what they do, please see Polly documentation here: http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
You can easily integrate text-to-speech generation into your own apps by leveraging the Polly API.
You can use any language that has an AWS SDK; this example uses Ruby. To run it, you will need a modern version of Ruby and the AWS SDK Ruby gem.
Make sure you have Ruby installed:
$ ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin16]
Install the aws-sdk Ruby gem:
$ gem install aws-sdk
Create a file called aws.yml and add your AWS credentials to it.
#
# change this to your specific id and secret
#
access_key_id: ABCDEFGHIJKLMNOPQRST
secret_access_key: ABCDEFGHIJKabcdefghijk/abcdefghABCDEFGHI
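A missing or misspelled key in aws.yml otherwise surfaces only as an authentication error at request time, so it can help to validate the file when it is loaded. The sketch below assumes the two key names shown above; parse_credentials and REQUIRED_KEYS are hypothetical names, not part of the AWS SDK.

```ruby
require 'yaml'

# Keys the credentials file is expected to contain.
REQUIRED_KEYS = %w[access_key_id secret_access_key].freeze

# Hypothetical helper: parses the YAML text of a credentials file and raises
# a descriptive error when a required key is missing or empty.
def parse_credentials(yaml_text)
  creds = YAML.safe_load(yaml_text) || {}
  raise ArgumentError, 'credentials file must contain key/value pairs' unless creds.is_a?(Hash)

  missing = REQUIRED_KEYS.select { |key| creds[key].to_s.strip.empty? }
  raise ArgumentError, "credentials file is missing: #{missing.join(', ')}" unless missing.empty?

  creds
end

# Placeholder values, identical to the sample file above.
sample = <<~YAML_TEXT
  access_key_id: ABCDEFGHIJKLMNOPQRST
  secret_access_key: ABCDEFGHIJKabcdefghijk/abcdefghABCDEFGHI
YAML_TEXT

creds = parse_credentials(sample)
puts creds['access_key_id']
```

In a real script you would call parse_credentials(File.read('aws.yml')). If you would rather not keep a credentials file at all, the AWS SDK can also read the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables when no credentials are passed to the client.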
Create a synthesizer.rb file with the following code:
#!/usr/bin/env ruby
#
# Install the following gems from your command line:
# gem install 'aws-sdk'
#
require 'aws-sdk'
require 'yaml'
class Synthesizer
  SSML1 = <<~eod
    <speak>
    Thank you for calling today. Your phone number is <say-as interpret-as="telephone">2122241555</say-as>. <prosody rate="x-slow"> I know </prosody> this is not a number, hence I will not say 2122241555.
    </speak>
  eod

  SSML2 = <<~eod
    <speak>
    He was caught up in the game. <break time="1s"/> In the middle of the 10/3/2014 <sub alias="World Wide Web Consortium">W3C</sub> meeting, he shouted <prosody volume="x-loud">Score!</prosody> quite loudly. When his boss stared at him, he repeated <amazon:effect name="whispered">"Score"</amazon:effect>, in a whisper.
    </speak>
  eod

  def initialize(region: 'us-east-1')
    #
    # N.B. 1: Do not hard-code credentials in source code.
    # N.B. 2: Do not source control a credentials file.
    #
    creds = YAML.load(File.read('aws.yml'))
    credentials = Aws::Credentials.new(creds['access_key_id'], creds['secret_access_key'])

    # Polly client
    @polly = Aws::Polly::Client.new(
      region: region,
      credentials: credentials
    )
  end

  # Method for text to speech conversion.
  # Named synthesize_text because Ruby has no method overloading: a second
  # `def synthesize` would silently replace this definition.
  def synthesize_text(text:, response_target: 'speech.mp3', output_format: 'mp3', voice_id: 'Joanna')
    request = {
      response_target: response_target,
      output_format: output_format,
      voice_id: voice_id,
      text_type: 'text',
      text: text
    }
    @polly.synthesize_speech(request)
  end

  # Method for SSML to speech conversion
  def synthesize(ssml:, response_target: 'speech.mp3', output_format: 'mp3', voice_id: 'Joanna')
    request = {
      response_target: response_target,
      output_format: output_format,
      voice_id: voice_id,
      text_type: 'ssml',
      text: ssml
    }
    @polly.synthesize_speech(request)
  end
end

synth = Synthesizer.new
synth.synthesize response_target: 'speech1.mp3', ssml: Synthesizer::SSML1
synth.synthesize response_target: 'speech2.mp3', ssml: Synthesizer::SSML2
Execute this file using either of the following:
$ ruby synthesizer.rb
or
$ chmod +x synthesizer.rb
$ ./synthesizer.rb
The script calls the Polly API and writes the speech generated from the two SSML snippets to speech1.mp3 and speech2.mp3. Play the files with your built-in audio player.
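The plain-text and SSML requests in synthesizer.rb differ only in text_type, so the request hash can be built and checked in one place before it is passed to synthesize_speech. The sketch below is one possible arrangement, not the only one; build_request, OUTPUT_FORMATS, and TEXT_TYPES are hypothetical names, while the hash keys are the ones used in the script above. It runs without the AWS SDK because it only builds the hash.

```ruby
# Values accepted for the output_format and text_type request parameters.
OUTPUT_FORMATS = %w[json mp3 ogg_vorbis pcm].freeze
TEXT_TYPES = %w[text ssml].freeze

# Hypothetical helper: builds the parameter hash for synthesize_speech and
# rejects unsupported values before any network call is made.
def build_request(input, text_type: 'text', response_target: 'speech.mp3',
                  output_format: 'mp3', voice_id: 'Joanna')
  raise ArgumentError, "unknown text_type: #{text_type}" unless TEXT_TYPES.include?(text_type)
  raise ArgumentError, "unknown output_format: #{output_format}" unless OUTPUT_FORMATS.include?(output_format)

  {
    response_target: response_target,
    output_format: output_format,
    voice_id: voice_id,
    text_type: text_type,
    text: input
  }
end

request = build_request('<speak>Hello from Polly.</speak>',
                        text_type: 'ssml', response_target: 'hello.mp3')
puts request.inspect

# With a configured client, the hash would be passed on like this:
#   polly.synthesize_speech(request)
```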