
Jonathan Wheat


Impress your boss by giving your Rasa-bot a voice with Twilio for $0

Building and interacting with a chat-bot can be fun, but calling a number and actually talking to it over the phone for the first time is next level. It also gives others the ability to interact with your bot in a new way. And talk about CDD (Conversation Driven Development)! All hell breaks loose when people talk to it. It's a whole different animal.

This tutorial will walk you through setting up your Rasa-bot with Twilio, an online service that provides a host of products, one of which is called Programmable Voice.

Twilio is a paid service, however they offer a free trial (YEA!). They do this in a pretty cool way. They allocate a credit ($15 US) to your account and the trial lasts as long as you have money in your account. You can spend it on anything they offer, and obviously the more you do, the faster it goes.

I've set up voice (the main topic of this tutorial) as well as messaging (texting with my bot). We'll hit on messaging at the end; it's pretty easy once you have the voice interface configured.

The best part is that you can run your bot from your own machine! You don't have to deploy it to a server or jump through any crazy hoops. This is especially good if your bot utters proprietary data, or collects or divulges any PII (https://www.dhs.gov/privacy-training/what-personally-identifiable-information).

Requirements

Here's a quick run down of the various tools we'll need, just in case you're the kind of person that has to know going in.

  • A phone 😆 or at least some way to make a voice call to a phone number
  • Rasa ^2.6.0
  • A bot to hook up (the out of the box rasa init will be fine)
  • Twilio Trial Account
  • NGrok

Getting Started (Twilio)

The first thing you need to do is get yourself a trial Twilio account. It's a pretty simple process, albeit there are a few steps.

  • Sign up
  • Verify your email address
  • Log in

Once you log in, you'll need to verify your phone number.

Next pick the following -

Which Twilio product are you here to use? - Voice
What do you plan to build with Twilio? - IVR & Bots
How do you plan to build with Twilio? - With minimal code
What is your preferred coding language? - Python
Would you like Twilio to host your code? - Yes, host my code

Finally, you'll be delivered to the console. Depending on when you're reading this, the current console may be either the "Legacy Console" or the new "Beta Console".

I want you on the Beta console, which, like I said, could be the default one by now. You'll know you're still on the Legacy console if there's a button in the header that reads "Try the beta Console".

Click that.
Complete the little form.

You should land on your Dashboard, and right there dead center is a blue button that says "Get a Trial Number"

Click that.

When you click it, it'll hand you a phone number. There's a big red button that reads "Choose this Number"

Click that.

You'll get a Congratulations modal with a Done button.

Click that. 😄

You'll land back on your dashboard, and you'll see your trial balance ($15.50) as well as your new trial phone number. If you call it, it's legit, it'll connect to Twilio, although not much happens after the initial message.

Also on your dashboard are 2 very important items we'll need shortly for Rasa: the Account SID and the Auth Token. Keep your screen there, and head over to your bot.

Configure your bot for Twilio Voice

The next thing you need to do is prep your bot for Twilio. I'm presuming you have a bot already set up. If you don't, create a new directory and run rasa init to spin one up.

Rasa has something called connectors. These connectors can be used to hook to different channels such as Slack, Messenger, Telegram, etc. We, as developers, can build custom connectors too, which is pretty great, they're crazy powerful. Fun fact, I actually started this adventure with a custom Twilio connector, a python file sitting in my project...

BUT... As of Rasa 2.6.0, the Twilio connector is built in! That's not to say we can't still use a custom one, but to save us some angst with respect to this tutorial, we'll use the built-in one. So be sure you have at least Rasa 2.6.0 installed.

credentials.yml

Open your credentials.yml, a file you probably haven't ever touched. We're going to add a configuration that will tell that internal Twilio connector we're going to use it.

First we'll configure the basic Twilio connector with the Account SID, Auth Token and phone number from our account.

Add the following to your credentials.yml file. You can add it above or below the default rest: and rasa: config that may already be there.

twilio:
  account_sid: <copy-paste-your-account-sid-here>
  auth_token: <copy-paste-your-auth-token-here>
  twilio_number: <copy-paste-your-phone-number-here>

One thing to note for the twilio_number: enter just the digits, with no - or ( characters, and include the leading country code (in the USA that's a 1). It should look something like this:

  twilio_number: 12223334444
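
If you're copying the number out of the Twilio console with all its punctuation, a tiny helper can clean it up for you. This is just a sketch: the function name is made up, and the "10 digits means a US number missing its 1" rule is my assumption, so adjust it for your own country.

```python
import re

def to_twilio_number(raw, default_country_code="1"):
    """Strip punctuation and make sure the country code is present."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Assumption: exactly 10 digits is a US/Canada number missing its country code.
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits

print(to_twilio_number("(222) 333-4444"))   # 12223334444
print(to_twilio_number("+1 222-333-4444"))  # 12223334444
```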

Next we'll specify some custom parameters specifically for Twilio Voice. These are items that you can change and some will directly impact the "cost" of your account. I'll explain each one below.

Add this to your credentials.yml file as well

twilio_voice:
  initial_prompt: "hello"
  assistant_voice: "Polly.Salli"
  language: "en-GB"
  reprompt_fallback_phrase: "I didn't get that could you repeat?"
  speech_timeout: "1"
  speech_model: "default"
  enhanced: "false"

Some quick explanation here for each piece.

initial_prompt is what will get sent to your Rasa-bot in the background to trigger it. Your bot won't respond unprompted, so this gets things started. If you're using the out-of-the-box default Rasa bot, this will trigger the greet intent. You can also adjust the response specifically for Twilio using the channel-specific variations Rasa allows. I'll touch on that near the end.

assistant_voice is the voice that will be used on the phone when your bot responds. There is a giant list of voices you can choose from, both male and female and from many nationalities.

What's great is you can play around with all of these to see what they sound like in the Text to Speech Console. Choose "Basic" or "Amazon Polly" at the top, and it will load the different voices. It's not very apparent, but click a voice and a modal will display. Enter some text into the box, and then click the little play arrow directly over the box and it will speak whatever you've just entered.

Once you find one you like, you can update the assistant_voice parameter accordingly. Use the Voice Name from the console (e.g. Amy), and if you're using an Amazon Polly voice, add Polly. before the voice name, like this: Polly.Amy. If you're using the Basic voice, you can just add the name alone; there's no need to prefix it with Basic or anything.

One thing to consider is that the Amazon Polly voices cost against your account, but it's literal fractions of a penny per character, and if you're trying to impress your boss, you want a great sounding voice, so this is worth the incremental cost.

Amazon has another voice system called Neural. These voices are more human, more robust and, as you can imagine, cost more. But you have free money, so play around.

language is easy: just look at the voice list (or the Language code in the speech console). It'll be in the parentheses ( ), so enter that. The voice names overlap, and some have the same names but different dialects and accents based on the language / nationality.

reprompt_fallback_phrase - this is functionally similar to Rasa's Default Fallback, however it is for Twilio. If for some reason it can't even attempt to understand you, or translate what you said to text, it will speak back this message instead of sending a garbled stream of characters to your bot. Because it is a different configuration you can make it a different phrase than your utter_default response.

speech_timeout Speech is different than text. When you complete a text message you hit send and the bot knows to process what you've typed. When speaking, the bot doesn't know when you're done, so this timeout is how much silence it will wait for before it attempts to send what you said to your bot. The shorter the time, the quicker response, however if you have someone that talks slow, a 1 second delay could produce a bot that interrupts that person. And if it didn't get the complete phrase it may ask them to repeat it when they never finished it. The flip side to this is that if it is too long, the system feels laggy, even though it's acting properly. I changed mine to 1 second because I talk quicker and when I demo the prototype, I want as little lag-feel as possible.
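
To make that tradeoff concrete, here's a toy simulation. It is not how Twilio works internally; it's only a sketch that cuts an utterance whenever the silence between two words reaches the timeout. The function name and the word timings are made up for illustration.

```python
def split_utterances(words, speech_timeout):
    """Group (text, start, end) words into utterances.

    A silence of at least `speech_timeout` seconds ends the current utterance.
    """
    utterances, current, last_end = [], [], None
    for text, start, end in words:
        if last_end is not None and start - last_end >= speech_timeout:
            utterances.append(" ".join(current))
            current = []
        current.append(text)
        last_end = end
    if current:
        utterances.append(" ".join(current))
    return utterances

# A slow talker who pauses for 1.5 seconds mid-sentence (made-up timings).
slow_talker = [("I", 0.0, 0.2), ("want", 0.3, 0.6), ("to", 2.1, 2.3), ("book", 2.4, 2.8)]

print(split_utterances(slow_talker, speech_timeout=1))  # ['I want', 'to book']
print(split_utterances(slow_talker, speech_timeout=5))  # ['I want to book']
```

With the 1 second timeout the bot would receive "I want" on its own, which is exactly the interrupting behavior described above.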

speech_model This is the model that Twilio will use when performing its STT (Speech to Text) function. Values here are limited to default, numbers_and_commands and phone_call. If you read the docs on these settings, default is best for talking to Rasa, unless of course you're building a number menu tree like many phone systems have in place, in which case you want phone_call. If you are speaking short phrases and keywords, choose numbers_and_commands. These different models are trained to specialize in the kind of content being spoken.

enhanced This allows you to enable (or disable) Twilio's enhanced premium speech transcription (STT) service. This is a very good system and seems to nail the transcriptions better than when it's off. However, this will impact your account balance (costs about 50% more), so for the time being, unless you really have problems with your bot responding to your voice, I'd leave it at false most of the time. But then again, you have free money, just make sure you don't burn through it before you wow your boss 😉.
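
If you want a quick sanity check before starting the server, something like the sketch below can catch typos in those values. It only checks the rules described above; the function name is hypothetical, and the settings are typed in as a plain dict here rather than read from credentials.yml.

```python
ALLOWED_SPEECH_MODELS = {"default", "numbers_and_commands", "phone_call"}

def check_twilio_voice(settings):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    if not settings.get("initial_prompt"):
        problems.append("initial_prompt is empty, so nothing will kick off the conversation")
    if not settings.get("reprompt_fallback_phrase"):
        problems.append("reprompt_fallback_phrase is empty")
    if settings.get("speech_model") not in ALLOWED_SPEECH_MODELS:
        problems.append("speech_model must be one of " + ", ".join(sorted(ALLOWED_SPEECH_MODELS)))
    if str(settings.get("enhanced", "")).lower() not in {"true", "false"}:
        problems.append('enhanced must be "true" or "false"')
    timeout = str(settings.get("speech_timeout", ""))
    # "auto" is a value Twilio also accepts for speech timeouts; it isn't covered in this article.
    if timeout != "auto" and not (timeout.isdigit() and int(timeout) > 0):
        problems.append("speech_timeout should be a whole number of seconds")
    return problems

settings = {
    "initial_prompt": "hello",
    "assistant_voice": "Polly.Salli",
    "language": "en-GB",
    "reprompt_fallback_phrase": "I didn't get that could you repeat?",
    "speech_timeout": "1",
    "speech_model": "default",
    "enhanced": "false",
}

print(check_twilio_voice(settings))  # []
```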

Believe it or not, that's all you have to do to prep your bot for voice, so it's now ready to respond to Twilio's API calls.

Let's fire it up and I'll show you how to get it wired up to Twilio.

Run this command -

rasa run -m models --enable-api --log-file out.log --cors "*" --debug

That may take a bit to start up, but eventually you'll see the wonderful message -

Rasa server is up and running.
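
If you'd rather check from a script than stare at the log, here's a minimal sketch. It assumes the server is on the default port 5005 and relies on Rasa's root endpoint answering with a "Hello from Rasa: <version>" line. The function name and the opener parameter (only there so it can be swapped out when testing) are my own.

```python
import urllib.request

def rasa_is_up(base_url="http://localhost:5005", opener=urllib.request.urlopen):
    """Return True if the Rasa HTTP server answers on its root endpoint."""
    try:
        with opener(base_url + "/", timeout=3) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:  # covers connection refused, timeouts and urllib's URLError
        return False
    return body.startswith("Hello from Rasa")

if __name__ == "__main__":
    print("up" if rasa_is_up() else "not reachable yet")
```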

I hear you saying - "But it's on my laptop! Am I going to call my laptop?"

Yes it is and no you're not, well not directly. We'll open it up to Twilio through a secure tunnel - which is WAY easier than it sounds, and infinitely easier than trying to deploy it on a server.

That said - if you have one running on a server, you can certainly grab the rasa server url and skip the next section.

Getting your bot accessible from the internet

We'll use a utility called Ngrok to set this up. If you don't have it installed already, go to Ngrok's site, download and install it on your system. Once it's installed, open a new console, or Command Prompt if you're on Windows and type

ngrok http 5005

You're looking for something like this to come up -

ngrok by @inconshreveable                                                                               (Ctrl+C to quit)

Session Status                online
Session Expires               1 hour, 57 minutes
Version                       2.3.40
Region                        United States (us)
Web Interface                 http://127.0.0.1:4040
Forwarding                    http://f43f711b19b9.ngrok.io -> http://localhost:5005
Forwarding                    https://f43f711b19b9.ngrok.io -> http://localhost:5005
Connections                   ttl     opn     rt1     rt5     p50     p90
                              0       0       0.00    0.00    0.00    0.00


If you see this, we're golden, it's all working.
What this just did was set up a secure tunnel for port 5005 - where the Rasa server is running - and map it to a secure (yet publicly accessible) url: https://f43f711b19b9.ngrok.io

One thing to note is that the tunnel will only last for two hours. Notice the "Session Expires" time there. That will actively count down before it collapses the tunnel. And you just have to CTRL+C and run it again to generate a new tunnel.

>>>> UPDATE <<<<

You can create a free account on Ngrok's site and then your tunnel won't expire after two hours!

Cool right?

All the information you need is right there in the terminal. You can also see there is a web interface at http://127.0.0.1:4040. If you hit that in a browser, you'll see the URLs listed, and to get a crazy amount of information about the tunnel, click the Status link. We don't need any of that for what we're doing, but it's good to know it's there. So, back to the setup.
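
That local web interface also serves a small JSON API at /api/tunnels, which is handy if you get tired of copying the URL by hand every time the tunnel restarts. Here's a sketch that pulls out the https address; the function name is mine, and it assumes ngrok is running with its web interface on the default port 4040.

```python
import json
import urllib.request

NGROK_API = "http://127.0.0.1:4040/api/tunnels"  # served by the local ngrok process

def https_tunnel_url(payload):
    """Pick the https public URL out of the /api/tunnels response."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    raise LookupError("no https tunnel found")

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(NGROK_API, timeout=3) as response:
            print(https_tunnel_url(json.load(response)))
    except (OSError, ValueError, LookupError) as error:
        print(f"could not read the tunnel URL, is ngrok running? ({error})")
```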

Configure the Twilio Webhook.

One more step and we're ready to call our bot. We need to tell Twilio that our bot is temporarily living at the new Ngrok url.

Go to your Twilio account, click the > next to Phone Numbers
Then click the > next to Manage
And click Active Numbers

Click your number to get to the configuration screen for that specific number.

Scroll down to the Voice & Fax section

In the row that says "When a call comes in"
Make sure it says "Webhook"

We want to change the first part of the URL to our Ngrok url.
The final url should look something like this:

https://f43f711b19b9.ngrok.io/webhooks/twilio_voice/webhook

Make sure you copy in the url from your Ngrok terminal.

And that's it!
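
Before you burn any trial credit on a real call, you can poke the webhook yourself. The sketch below imitates the very start of an incoming call by POSTing two of Twilio's standard form fields, From and CallStatus. I'm assuming here that the built-in connector reads those fields and answers with TwiML (XML); the function name and the caller number are placeholders.

```python
import urllib.parse
import urllib.request

def build_fake_call(webhook_url, caller="+12223334444"):
    """Build a POST request that imitates the start of an incoming call."""
    form = {
        "From": caller,           # placeholder caller number
        "CallStatus": "ringing",  # Twilio's status for a call that is just arriving
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(webhook_url, data=data, method="POST")

if __name__ == "__main__":
    request = build_fake_call("http://localhost:5005/webhooks/twilio_voice/webhook")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.read().decode("utf-8"))
    except OSError as error:
        print(f"webhook not reachable: {error}")
```

Swap the localhost address for your Ngrok url to test the tunnel as well.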

Time to try it out

Grab your phone, call your Twilio number. A brief trial message will play telling you to hit a number to execute your code. This will send your initial_prompt: value through Ngrok to your local instance of Rasa. When it does, you'll see the webhook hit Ngrok in the console, and then you'll see your rasa terminal spin like a top (because we launched it in debug mode).

Rasa will process what came in from Twilio, and send back text and Twilio will translate that into speech, and your bot will talk back to you! (now quickly ask for a raise)

There are times in your bot where you may have a response like this -

  utter_greet:
    - text: Hello, I am TwilBot. \nPlease let me know how I can help you

If this is read to you, Twilio (and any other TTS engine) will read back the line break \n and it will say "backslash n". But we want that line break there for our web chat component. So what to do????

Remember the channel-specific responses we mentioned before? This is where those come in handy. Just change the response to this, and you're golden.

  utter_greet:
    - text: Hello, I am TwilBot. \nPlease let me know how I can help you
    - text: Hello, I am TwilBot, thank you for calling. Please let me know how I can help you
      channel: "twilio_voice"

My only gripe is that the channel formatting looks a bit odd: the channel: key sits at the same indent level as the text: key of the response it belongs to, rather than being nested under it. At first I thought it was a typo in the docs, but sure enough that's how it should look.
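
If it helps to picture what's going on, here's a rough sketch of the selection idea: variations tagged for the current channel win, and the untagged ones are the fallback for everyone else. This is my own simplified model, not Rasa's actual code, and the function name is made up.

```python
def pick_variations(variations, channel):
    """Prefer variations tagged for `channel`; otherwise use the untagged ones."""
    specific = [v for v in variations if v.get("channel") == channel]
    if specific:
        return specific
    return [v for v in variations if "channel" not in v]

utter_greet = [
    {"text": "Hello, I am TwilBot. \\nPlease let me know how I can help you"},
    {
        "text": "Hello, I am TwilBot, thank you for calling. Please let me know how I can help you",
        "channel": "twilio_voice",
    },
]

# A phone call gets the voice-friendly text, everything else gets the default.
print(pick_variations(utter_greet, "twilio_voice")[0]["text"])
print(pick_variations(utter_greet, "rest")[0]["text"])
```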

Save that and any other channel changes you want to make, train your bot and re-run it using the same command we had before

rasa run -m models --enable-api --log-file out.log --cors "*" --debug

Wait for this message...

Rasa server is up and running.

And as long as your Ngrok tunnel hasn't expired, you can call again and it should greet you properly.

Congratulations

Great job, you made it to the end and hopefully your boss is excited and ready to give you that raise.

But wait - there's more!!!

Setting up Messaging

Speaking to your bot is cool, but texting your bot is pretty awesome too. Here's how you configure that.

The number you chose will support both voice and SMS; it just needs to be configured. This was a little confusing at first, because you need to set up what they call a Messaging Service. All this is, is a configured name, it's nothing involved. I thought you'd need some other service to hook in, but really you just need to set a name and go. Let's do that.

Explore Products > Messaging

This will land you on the Programmable Messaging Dashboard. There's a blue button that reads - Start Sending Messages

Click that 😆

A modal will open and want to know the Messaging Service Name, so name it. It's arbitrary - you can call it "My Messaging Service" or you can call it "SendIt", heck you could call it AT&T if you wanted to, not sure why you would though.

For the Messaging Service Use Case on the same modal, choose one that fits our needs - "Engage in a discussion"

The blue button there that says Create - you guessed it - Click it.

In the left nav, click Integration
Then choose - Send a Webhook

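In the Request URL box, enter your Ngrok url (the same base url you used for Voice), but ending in the path for the basic twilio connector, /webhooks/twilio/webhook, instead of the twilio_voice one.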

Click Save. (yea I know I like to switch it up a bit)

A few more steps and we're done.

In the top navbar, click the "My First Twilio Account" link, then navigate back down to the phone number screen

Phone Numbers > Manage > Active numbers > click your number

Scroll down to the Messaging section and select the Messaging Service you set up.

Then in the Webhook url, paste in your Ngrok url followed by the basic twilio connector's path, /webhooks/twilio/webhook.

Click Save.

THAT'S IT!!

Now jump on your phone and text hello to the number you have set up. It should reply with a message that starts with "Sent from your Twilio trial account -". This text will prefix all text messages received from your bot. Obviously it will go away if you upgrade.

Whelp, that's all I got. Hope this got you that raise you were looking for #fingersCrossed

Top comments (1)

Emma Jade Wightman

Absolutely brilliant!