DEV Community


Make your own Alexa Skill without using Lambda

App & Flow ・9 min read


“Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.” (Ian Malcolm, Jurassic Park)

Why should you even bother making your own custom Alexa Skill without using Lambda?

Actually, a few reasons come to mind:

  • You don’t like depending on other services

  • You enjoy tinkering / you consider yourself a DIY-er

  • Your Alexa Skill will be used alongside an existing service that already has its own backend (like a mobile app or a web page), and you’d rather have that backend handle everything.

It’s obvious from this list that doing all of this isn’t for everyone. In fact, most people would be better off using Lambda, as long as they stay within its very generous free tier of 1 million requests per month. Check out Amazon’s own list of reasons to use Lambda to get an idea.

You’re still reading? Good. Just before we dive deeper, here’s a fictional service that we’ll use as a reference point throughout this article:

Your startup has a mobile app that keeps track of bus schedules. Groundbreaking, never-seen-before, edgy, right? Anyway, this mobile app communicates with your backend, which handles communication with the transit company’s API. Users can create their own account and keep track of their favorite bus routes. Your app is already doing well, but so many users are writing to tell you that it would be perfect if only it had an Alexa integration. This is how you embarked on this journey.

You’ll need the following to be able to complete this tutorial:

  • A Node.js backend hosted somewhere such as DigitalOcean or AWS (any backend would do; you can apply the concepts used here to pretty much anything)

  • A website that lets users log in to their account

  • A few use cases for the skill

  • A mobile phone with the Alexa app installed (no need to have an actual Alexa device!)

Use cases

Coming back to our bus schedule startup, some good ideas for use cases could be:

  • Alexa, when is the next 105 passing? -> Should tell me the number of minutes until the next bus passes. For example “The next 105 passes in 10 minutes”.

  • Alexa, are there any interruptions in the metro today? -> The transit company’s API can tell us whether or not there are interruptions at the present time. For example “Yes, the purple line is down until 9:15PM”.

  • Alexa, what is the next bus? -> If the user has set up two buses leaving from their home, this service could tell them which of these buses is passing next. For example “The next bus is the 105 which is passing in 5 minutes”.
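The third use case boils down to picking the earliest upcoming departure among the user’s saved buses. Here’s a minimal sketch, assuming a hypothetical `Departure` shape (bus number plus departure time in epoch milliseconds) that your backend would build from the transit company’s API; none of these names come from a real library.

```typescript
// Hypothetical shape: your backend would build these from the transit company's API.
interface Departure {
  route: number; // bus number, e.g. 105
  departsAt: number; // departure time in epoch milliseconds
}

// Builds the answer to "Alexa, what is the next bus?" from the departures
// of the buses the user has saved.
function describeNextBus(departures: Departure[], now: number): string {
  // Ignore buses that have already left.
  const upcoming = departures.filter((d) => d.departsAt >= now);
  if (upcoming.length === 0) {
    return "I couldn't find any upcoming buses for your saved routes.";
  }
  // Keep whichever departure comes first.
  const next = upcoming.reduce((a, b) => (b.departsAt < a.departsAt ? b : a));
  const minutes = Math.round((next.departsAt - now) / 60000);
  const unit = minutes === 1 ? "minute" : "minutes";
  return `The next bus is the ${next.route} which is passing in ${minutes} ${unit}.`;
}
```

In the real handler you would pass `Date.now()` as `now`; taking it as a parameter keeps the function easy to test.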

Setting up the Alexa Developer Console

  • Create an account on the Amazon Developer portal if you don’t already have one

  • Go to the Alexa Developer Console

  • Create a new skill: give it a name, choose the “Custom” model and the “Start from scratch” template.


The skill’s Build page is where you’ll be doing most of the “Alexa developer” work. Here’s a short glossary of the terms you’ll find there:

  • Intents: an intent represents an action that fulfills a user’s spoken request

  • Utterances: a set of likely spoken phrases mapped to the intents

  • Custom slot types: a representative list of possible values for a slot

So, coming back to our use case “Alexa, when is the next 105 passing?”, this utterance would be handled by an intent that we could call findNextBus. The 105 would be a slot that we can call busNumber, using Amazon’s built-in AMAZON.NUMBER slot type (no custom slot type needed for this one).
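For reference, the intents, slots and utterances you define end up in the skill’s interaction model, which you can also view and edit as JSON in the console’s JSON Editor. Here’s a sketch of the findNextBus part, written as a TypeScript object mirroring that JSON; the invocation name and the sample phrases are made-up examples.

```typescript
// Mirrors the interaction model JSON from the console's JSON Editor.
// "bus schedules" and the sample phrases are illustrative values only.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: "bus schedules",
      intents: [
        {
          name: "findNextBus",
          // busNumber uses Amazon's built-in number slot type.
          slots: [{ name: "busNumber", type: "AMAZON.NUMBER" }],
          // {busNumber} marks where the slot value appears in each utterance.
          samples: [
            "when is the next {busNumber} passing",
            "when does the next {busNumber} come",
          ],
        },
      ],
      types: [],
    },
  },
};
```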

As this tutorial isn’t meant to be a “how to create an Alexa skill” guide but rather about making one work without Lambda, I’ll let you read one of the many articles on the subject (or just figure it out as you go along, it’s really nothing special).

The Endpoint section should be set to HTTPS and point to a route that handles Alexa’s requests (e.g. https://api.mywebsite.com/alexaAction). During development, you can use ngrok to expose your local server over HTTPS; just make sure to set the SSL certificate type to the one that says “[..]is a subdomain of a domain that has a wildcard certificate[..]”.

The Account Linking section is optional if you don’t plan on having users sign in to their account. For our example, we’ll need to set it up. These are the fields you’ll need to fill in:

  • Authorization URI: the URI of your login page, where customers will be redirected from the companion app to enter their credentials.

  • Client Id: a unique public string used to identify the client requesting authentication. You can use your preferred way of generating strings (here are some for inspiration) or just let your cat walk over your keyboard, your call. Just keep it somewhere, as you’ll have to validate this client id when users log in.

That’s about it for the Alexa Developer stuff. Once you have something functional, you can submit your skill for certification.

Setting up your backend

Let’s assume for this example that you’re using a simple MVC-inspired “router → controller → service” kind of pattern on your backend.

Usually, this would mean your /alexaAction route calls a controller, which in turn calls the service; the service does the job and returns the information to the controller, which takes care of sending it back. In our case, though, we first need to make sure that the request is actually coming from Amazon, and the easiest way I’ve found is an auth middleware. But it doesn’t end there: Amazon’s signature is computed over the exact request body, so the middleware needs the raw body before body-parser does its job. This means your Alexa route cannot be mixed in with your current router; it has to be registered separately, before the body parsers. Your app.ts will look like this:

// Registered before the body parsers so alexaAuth still sees the raw request body
app.post('/alexaAction', alexaAuth, alexaActionPost);
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

For the alexaAuth middleware, I drew heavily on the alexa-verifier-middleware library. It wasn’t exactly what I was looking for, so I wrote my own middleware:

import { NextFunction, Response } from 'express';

import { Request } from '../types';
import verifier from 'alexa-verifier';

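const auth = (req: Request | any, res: Response, next: NextFunction) => {
  try {
    // If a body parser already consumed the stream, the raw bytes are gone
    // and the signature can no longer be checked.
    if (req._body) {
      const er = 'The raw request body has already been parsed.';
      return res.status(400).json({ status: 'failure', reason: er });
    }
    // Flag the body as parsed so body-parser middlewares skip this request.
    req._body = true;
    req.rawBody = '';
    // Accumulate the raw body exactly as Amazon sent it.
    req.on('data', (data: Buffer) => {
      req.rawBody += data;
    });

    req.on('end', () => {
      try {
        req.body = JSON.parse(req.rawBody);
      } catch {
        // Fall back to an empty body; the signature check below still runs on the raw text.
        req.body = {};
      }

      // Node lowercases incoming header names.
      const certUrl = req.headers.signaturecertchainurl;
      const signature = req.headers.signature;

      // alexa-verifier checks the certificate URL, the signature over the raw body
      // and the request timestamp.
      verifier(certUrl, signature, req.rawBody, (error: any) => {
        if (error) {
          res.status(400).json({ status: 'failure', reason: error });
        } else {
          next();
        }
      });
    });
  } catch (e) {
    req.user = null;
    return res.status(400).json({ message: 'Unauthorized' });
  }
};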

export default auth;

With this in place, your backend listens on the /alexaAction route, and anything that makes it past alexaAuth has been verified as coming from Amazon.

Up next, you’ll need a way to handle the POST request itself. I’ll explain the bigger picture, but you should implement it in whichever way suits your codebase. I’ll also cover the flow that includes user authentication, so if you don’t intend on doing that, you’ll be able to skip certain parts.

To start, you’ll need to get session, context and request from the request body. You’ll also need applicationId from context, as well as type from request:

const { session, context, request } = req.body;
const { applicationId } = context.System.application;
const { type } = request;
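If you’re writing the handler in TypeScript, it helps to type the few request fields used in this article. This is a partial sketch: Amazon’s request envelope carries many more properties, and the interface and function names below are made up for illustration.

```typescript
// Partial shape of the Alexa request envelope: only the fields used in this article.
interface AlexaSlot {
  name: string;
  value?: string; // absent when the user didn't fill the slot
}

interface AlexaRequestBody {
  version: string;
  session: {
    new: boolean;
    sessionId: string;
    // accessToken only shows up once the user has linked their account
    user: { userId: string; accessToken?: string };
  };
  context: { System: { application: { applicationId: string } } };
  request: {
    type: string; // "LaunchRequest", "IntentRequest", "SessionEndedRequest", ...
    requestId: string;
    timestamp: string;
    intent?: { name: string; slots?: Record<string, AlexaSlot> };
  };
}

// Same destructuring as above, now type-checked.
function readEnvelope(body: AlexaRequestBody) {
  const { session, context, request } = body;
  const { applicationId } = context.System.application;
  const { type } = request;
  return { applicationId, type, accessToken: session.user.accessToken, intent: request.intent };
}
```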

Then follow these steps:

  • validate that the applicationId is the same as your alexaSkillId

  • check the type: a LaunchRequest should return an introductory message asking the user what they’d like to know about your service (for example “How can I help you with the bus schedules today?”), whereas an IntentRequest signals that the user is asking a question that needs an answer (like “When is the next 105 passing?”)

  • if you get an IntentRequest, you’ll find the user’s accessToken at session.user.accessToken. Validate it with your own system: this token is what your frontend (where you handle your login) hands to Amazon once your user logs in (more on that later)

  • remember the list of intents you created, such as findNextBus? You’ll need to provide an answer for each. The intent can be found at request.intent. Personally, I used a simple switch that covers all possible intents. If you have slots, they can be found at request.intent.slots.

A very barebones, watered-down, happy-path, no-error-management version of all this would look something like this:


// applicationId, type, session and request come from the destructuring above;
// config.alexaSkillId and authenticated stand in for your own config and token check.
function handleAlexaQuery() {
  if (applicationId === config.alexaSkillId) {
    if (type === 'IntentRequest') {
      if (session.user.accessToken) {
        // authenticate your accessToken
        if (authenticated) {
          const { name } = request.intent;
          if (name === 'findNextBus') {
            const busNumber = request.intent.slots.busNumber.value;
            if (busNumber) {
              // generate logic that will answer when the next bus is passing
            }
          }
        }
      }
    }
  }
}
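Here’s a runnable take on the same happy path, using the switch mentioned in the steps above and adding the branches the barebones version leaves out. It’s a sketch under assumptions: `BusBackend` and its members are hypothetical stand-ins for `config.alexaSkillId`, your own token validation and your schedule logic, and the handler returns an outcome object instead of touching Express.

```typescript
// The slice of the Alexa request body this handler reads.
interface IncomingAlexaBody {
  session?: { user?: { accessToken?: string } };
  context: { System: { application: { applicationId: string } } };
  request: {
    type: string;
    intent?: { name: string; slots?: Record<string, { value?: string }> };
  };
}

// What the route should do next.
type AlexaOutcome =
  | { kind: "speak"; text: string; shouldEndSession: boolean }
  | { kind: "linkAccount" } // send the LinkAccount card
  | { kind: "ignore" } // e.g. SessionEndedRequest, which gets no spoken reply
  | { kind: "reject" }; // not meant for this skill, or malformed

// Hypothetical hooks standing in for your own config, token check and schedule logic.
interface BusBackend {
  skillId: string; // e.g. config.alexaSkillId
  lookupUser(accessToken: string): string | null; // user id if the token is valid
  answerNextBus(userId: string, busNumber: number): string;
}

function handleAlexaQuery(body: IncomingAlexaBody, backend: BusBackend): AlexaOutcome {
  // 1. Only answer requests meant for this skill.
  if (body.context.System.application.applicationId !== backend.skillId) {
    return { kind: "reject" };
  }
  const { request } = body;
  // 2. LaunchRequest: greet the user and keep the session open for their question.
  if (request.type === "LaunchRequest") {
    return { kind: "speak", text: "How can I help you with the bus schedules today?", shouldEndSession: false };
  }
  if (request.type === "SessionEndedRequest") {
    return { kind: "ignore" };
  }
  const intent = request.intent;
  if (request.type !== "IntentRequest" || !intent) {
    return { kind: "reject" };
  }
  // 3. Intents need a linked account; without a valid token, ask for account linking.
  const token = body.session?.user?.accessToken;
  const userId = token ? backend.lookupUser(token) : null;
  if (!userId) {
    return { kind: "linkAccount" };
  }
  // 4. One case per intent defined in the Alexa Developer Console.
  switch (intent.name) {
    case "findNextBus": {
      // Slot values arrive as strings, so convert before use.
      const busNumber = Number(intent.slots?.busNumber?.value);
      if (isNaN(busNumber)) {
        return { kind: "speak", text: "Which bus would you like to know about?", shouldEndSession: false };
      }
      return { kind: "speak", text: backend.answerNextBus(userId, busNumber), shouldEndSession: true };
    }
    default:
      return { kind: "speak", text: "Sorry, I can't help with that yet.", shouldEndSession: true };
  }
}
```

Keeping the handler free of Express makes it easy to unit test; alexaActionPost then only has to turn the outcome into a JSON reply.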

At the end of the day, you want to take the text you’ve generated and send it back to Amazon. It needs to be in this format:

const response = {
  response: {
    outputSpeech: {
      type: 'SSML',
      ssml: `<speak>${speechText}</speak>`,
    },
    reprompt: {
      outputSpeech: {
        type: 'SSML',
        ssml: '<speak>Could you repeat?</speak>',
      },
    },
    shouldEndSession,
  },
  version: '1.0',
  sessionAttributes: {},
};

In this example, speechText is the text that you want Alexa to say. SSML offers many ways to control intonation and pronunciation, but this is the most basic form. shouldEndSession should be either true or false, depending on your use case: sometimes you want to close the skill once the user’s question is answered, other times you want to keep it open.
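Since every answer goes through this envelope, a small helper keeps it in one place. A sketch, with `buildSpeechResponse` and `escapeForSsml` as names made up for this article: because speechText is embedded in SSML, which is XML, characters like `&` or `<` coming from transit data (a stop called “King & Main”, say) are escaped so Alexa can still parse the response.

```typescript
// Escapes the characters that would break the XML inside <speak>...</speak>.
function escapeForSsml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps speechText in the response format Alexa expects.
function buildSpeechResponse(speechText: string, shouldEndSession: boolean) {
  return {
    version: "1.0",
    sessionAttributes: {},
    response: {
      outputSpeech: { type: "SSML", ssml: `<speak>${escapeForSsml(speechText)}</speak>` },
      // Spoken if the session stays open and the user doesn't reply.
      reprompt: {
        outputSpeech: { type: "SSML", ssml: "<speak>Could you repeat?</speak>" },
      },
      shouldEndSession,
    },
  };
}
```

In alexaActionPost, you’d then reply with something like `res.json(buildSpeechResponse(speechText, true))`.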

If a user hasn’t linked their account yet, or started linking it without completing the process, Amazon requires you to send a card that asks the user to sign in; it shows up in their Alexa app. You have to add

card: {
  type: 'LinkAccount',
},

to your response, within the response attribute.
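Put together with the response format above, a reply for a user who hasn’t linked their account could look like this sketch. The function name and the spoken sentence are made up; only the `LinkAccount` card type comes from Amazon’s format.

```typescript
// Reply for a user without a valid accessToken: a short explanation
// plus the LinkAccount card, which appears in the user's Alexa app.
function buildLinkAccountResponse() {
  return {
    version: "1.0",
    sessionAttributes: {},
    response: {
      outputSpeech: {
        type: "SSML",
        ssml: "<speak>Please link your bus schedule account in the Alexa app to continue.</speak>",
      },
      // The card type is the string "LinkAccount".
      card: { type: "LinkAccount" },
      // Nothing else can happen until the account is linked, so close the session.
      shouldEndSession: true,
    },
  };
}
```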

Setting up the frontend (website)

  • In the Alexa app, the user will add the skill and see a big “Enable to use” button. This button redirects to your website, where the user logs in; if that succeeds, their account is linked.

  • Upon loading, your website has to read three parameters from the search params (or query params, if you prefer): state, client_id and redirect_uri

  • Amazon gives you a few acceptable redirect URLs; your website must check that redirect_uri is one of them and show an error otherwise. You’ll find the list of redirect URLs in the Account Linking section.

  • You also need to check that client_id matches the Client Id you generated earlier

  • Once the user logs in, the last thing left is to build a new URL from the params you isolated earlier (redirect_uri and state), plus the user’s access_token and token_type=Bearer, and navigate to that new URL.
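The steps above can be sketched as two plain functions. Assumptions: `ALLOWED_REDIRECT_URIS` and `CLIENT_ID` are placeholders for the values from your Account Linking section, and the access token comes from your own login system. The parameters go in the URL fragment (after `#`), which matches the OAuth 2.0 implicit grant redirect that this flow follows.

```typescript
// Placeholders: copy the real values from your skill's Account Linking section.
const ALLOWED_REDIRECT_URIS = [
  "https://redirect-one.example/link/VENDOR_ID",
  "https://redirect-two.example/link/VENDOR_ID",
];
const CLIENT_ID = "my-client-id"; // the Client Id you generated for the skill

interface LinkParams {
  state: string;
  clientId: string;
  redirectUri: string;
}

// Reads state, client_id and redirect_uri from the page's query string
// and throws if anything is missing or doesn't match the skill's configuration.
function readLinkParams(search: string): LinkParams {
  const params = new URLSearchParams(search);
  const state = params.get("state");
  const clientId = params.get("client_id");
  const redirectUri = params.get("redirect_uri");
  if (!state || !clientId || !redirectUri) {
    throw new Error("Missing account linking parameters");
  }
  if (clientId !== CLIENT_ID) {
    throw new Error("Unknown client_id");
  }
  if (ALLOWED_REDIRECT_URIS.indexOf(redirectUri) === -1) {
    throw new Error("redirect_uri is not in the allowed list");
  }
  return { state, clientId, redirectUri };
}

// After a successful login: send the user back to Amazon with the token in the fragment.
function buildLinkRedirect(link: LinkParams, accessToken: string): string {
  const fragment = new URLSearchParams({
    state: link.state,
    access_token: accessToken,
    token_type: "Bearer",
  });
  return `${link.redirectUri}#${fragment.toString()}`;
}
```

On page load you would call `readLinkParams(window.location.search)` and show an error page if it throws; after login, set `window.location.href` to the result of `buildLinkRedirect`.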

That’s it.


Recap

Now you’ll have your Alexa Developer Console, your backend and your website working together:

  • Your bus schedule user can link their existing account to Alexa through the Alexa app by selecting “Enable to use”. This will open up your …

  • …website. They will log in to their bus schedule account. When they ask “Alexa, when is the next 105 passing?”, this will talk with your…

  • …backend that will handle the query and answer back to Alexa. Your backend must handle all queries that you’ve defined in your…

  • …Alexa Developer Console.

“Now my boss is asking me to implement Okay Google as well!”

Fear not: most of the info here can be recycled into building your own Google Home action. In theory, on the backend side of things, the part that creates the speech response could be made generic enough to work for both services if the same intents are developed on Alexa and Google. The website part is also almost the same; it’s mostly the Actions on Google part that is different. Many terms are similar too, and you can deduce the others (an Alexa Skill is a Google Action, for example).

One thing to keep in mind with Google is that they have two types of Actions: Conversational Actions and Direct Actions. What you’ll be looking to implement is Conversational Actions, as Direct Actions are for cases when you have a smart device that you’d like to sync with your smart home and all that jazz. It’s a different ballpark altogether.

Google nudges you toward its own hosted tooling, Dialogflow, which you won’t be using, hehehe. Their documentation is also pretty straightforward. Enjoy!
