When you create an Amazon Lex chatbot, you use slots to gather bits of information from the user. To make sure each slot captures the right kind of information, you have to give it a slot type.
Amazon provides a list of pre-made slot types that cover a lot of things such as numbers, names, cities and lots of other categories. This is great and I’ve found that about half of my slots can be set to an Amazon slot type.
If the slot type you need isn’t one of the default Amazon types then you can build your own. This is great as it theoretically lets you create a slot type for anything, matching your user’s inputs perfectly.
There are 2 types of custom slot type you can create: "Expanded values" and "Slot values and synonyms". Expanded values is where you provide some sample values and Amazon uses these as training data to learn to identify whether a value is the correct slot type. Slot values and synonyms allows you to hard code slot values and then provide a list of synonyms that a user might say.
Recently we created a bot that needed to ask for a user’s postcode. Amazon doesn’t have a Postcode slot type so we needed to create a custom one. For this we would have to use Expanded values as we couldn’t just list every UK postcode.
The first attempt at a postcode type involved populating the list of example values with about 8 postcodes. I would have expected Lex to learn from those example postcodes but this resulted in some strange things happening. Sometimes it rejected about 1/3 of genuine postcodes, other times it let almost anything through.
The whole point of slot types is to make sure that if the user types a valid value it will go through and incorrect values will be blocked. This method wasn’t good enough.
When creating the Expanded values slot type it lets you know that it uses the example values as training data. To improve the accuracy of the validator we should try adding more training data. This way it has more values to train and test against.
I downloaded a list of 999 real UK postcodes and wrote a script to build the slot type JSON file. I uploaded this to Lex and tested it out.
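The script itself isn't reproduced here, but a minimal sketch of that kind of builder might look like the following. The JSON layout is an assumption modelled on the Lex (V1) slot type export format, and the function name, slot type name and sample postcodes are hypothetical, so compare the output against an export from your own account before importing it.

```python
import json


def build_slot_type(name, postcodes):
    """Return a dict shaped like a Lex (V1) slot type export (assumed layout)."""
    # Normalise whitespace and case, then drop blanks and duplicates
    # while keeping the original order.
    cleaned = (" ".join(raw.split()).upper() for raw in postcodes)
    unique = list(dict.fromkeys(value for value in cleaned if value))
    return {
        "metadata": {
            "schemaVersion": "1.0",
            "importType": "LEX",
            "importFormat": "JSON",
        },
        "resource": {
            "name": name,
            "version": "1",
            "enumerationValues": [{"value": value} for value in unique],
            # Assumption: ORIGINAL_VALUE is the strategy behind the
            # "Expanded values" option in the console.
            "valueSelectionStrategy": "ORIGINAL_VALUE",
        },
    }


if __name__ == "__main__":
    # Illustrative sample only; the real list had 999 entries.
    sample = ["SW1A 1AA", "m1 1ae", "SW1A  1AA"]
    print(json.dumps(build_slot_type("UKPostcode", sample), indent=2))
```

Writing the returned dict to a file with `json.dump` gives you something to upload through the console's import option.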
The fact that with 999 new postcodes it only took a few seconds to build implies that it isn’t running a machine learning training cycle. It must be using another method of matching the values.
When testing this out, it had some of the same problems as the first version. Whilst it did accept more real postcodes and reject the more insane values, it did still reject some real postcodes and accept some random values.
So having tried to get it working properly using Amazon’s methods, we resorted to working around the system. We knew that there are some great regex patterns for postcode validation so we decided to give this a go.
We needed to design a new slot type that allowed everything through so we could pass the value to our own validation Lambda to perform the regex check. We copied our first slot type and then started adding random values that were between 3 and 8 characters long and included numbers, upper and lower case letters, and spaces.
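If you'd rather generate those filler values than type them by hand, here is a rough sketch. The function names and the seed parameter are made up for illustration, and skipping values that start or end with a space is my own assumption rather than anything Lex requires.

```python
import random
import string

# Characters described above: digits, upper and lower case letters, spaces.
ALPHABET = string.ascii_letters + string.digits + " "


def random_value(rng, min_len=3, max_len=8):
    """Return one random string between min_len and max_len characters long."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_values(count, seed=None):
    """Return `count` distinct random values, sorted for stable output."""
    rng = random.Random(seed)
    values = set()
    while len(values) < count:
        candidate = random_value(rng)
        # Assumption: skip values with leading or trailing spaces, since
        # they are easy to mangle when pasted into a slot type.
        if candidate == candidate.strip():
            values.add(candidate)
    return sorted(values)


if __name__ == "__main__":
    for value in random_values(10, seed=42):
        print(repr(value))
```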
We added an initialisation and validation Lambda that took the value of the postcode slot and ran it through the regex. If it matched then we let Lex carry on, otherwise we re-elicited the postcode slot.
This worked much better than either of the previous methods, filtering out all of the nonsense values, but Lex still sometimes blocked valid postcodes before they even got through to the regex validation.
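As a rough sketch of that kind of Lambda, assuming the Lex (V1) event and response format: the slot name `postcode`, the prompt text and the simplified pattern are all placeholders of mine, not the exact ones from our bot.

```python
import re

# Simplified UK postcode pattern, used here as a stand-in (assumption).
# It checks the general shape only, not whether the postcode exists.
POSTCODE_RE = re.compile(
    r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$", re.IGNORECASE
)

SLOT_NAME = "postcode"  # hypothetical slot name


def is_valid_postcode(value):
    """Return True if value looks like a UK postcode."""
    return bool(value) and POSTCODE_RE.match(value.strip()) is not None


def lambda_handler(event, context):
    intent = event["currentIntent"]
    slots = intent["slots"]
    session = event.get("sessionAttributes") or {}
    value = slots.get(SLOT_NAME)

    # Slot not filled yet, or filled with something valid: let Lex carry on.
    if value is None or is_valid_postcode(value):
        return {
            "sessionAttributes": session,
            "dialogAction": {"type": "Delegate", "slots": slots},
        }

    # Otherwise clear the bad value and ask for the postcode again.
    cleared = dict(slots)
    cleared[SLOT_NAME] = None
    return {
        "sessionAttributes": session,
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent["name"],
            "slots": cleared,
            "slotToElicit": SLOT_NAME,
            "message": {
                "contentType": "PlainText",
                "content": "Sorry, I didn't recognise that postcode. "
                           "What is your postcode?",
            },
        },
    }
```

Clearing the slot before re-eliciting stops the rejected value from hanging around in the intent.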
I moved onto another part of the project whilst one of my colleagues started researching how other people dealt with similar issues. We couldn’t be the only ones hitting this problem.
Luckily we weren’t. Although a lot of the questions were about US zip codes, there were some useful bits of information. The most important one was to try out the Music and Musician slot types. People suggested that they let any value through.
With this nugget of knowledge we created a new test for music and musician to see if the rumours were true.
Music was pretty good, letting through all postcodes with XXX XXX format but blocking all XXXX XXX postcodes. Musician is where we struck gold. This slot type appears to let absolutely anything through. When I say anything I mean anything, whether it was cat, dog or the lyrics to Fresh Prince.
This slot type has allowed our regex to deal with whatever values users give it. You can use a validation Lambda to catch a bad value as soon as they say it, or just run some validation checks at the start of your fulfilment Lambda.
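For the second option, a check at the top of the fulfilment Lambda could look something like this sketch. It again assumes the Lex (V1) event and response format; the slot name, the messages and the simplified pattern are placeholders, and the real fulfilment work is left as a comment.

```python
import re

# Simplified UK postcode shape with whitespace removed (assumption).
COMPACT_POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$")


def normalise_postcode(raw):
    """Return the postcode as 'OUTWARD INWARD', or None if it looks wrong."""
    compact = re.sub(r"\s+", "", raw or "").upper()
    if not COMPACT_POSTCODE_RE.match(compact):
        return None
    # The inward code is always the last three characters.
    return compact[:-3] + " " + compact[-3:]


def close(state, text):
    """Build a Close response that ends the conversation."""
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": state,
            "message": {"contentType": "PlainText", "content": text},
        }
    }


def lambda_handler(event, context):
    slots = event["currentIntent"]["slots"]
    postcode = normalise_postcode(slots.get("postcode"))  # hypothetical slot name
    if postcode is None:
        return close("Failed", "Sorry, that doesn't look like a UK postcode.")
    # Real fulfilment work would go here.
    return close("Fulfilled", "Thanks, using " + postcode + ".")
```

The trade-off is that the user only finds out about a bad postcode at the end of the conversation, so the validation Lambda is the friendlier choice when you can use it.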
- Using a few values to create a slot type wasn’t very good
- Using hundreds was better but not great
- Creating a slot type to capture anything was ok
- Using the Musician slot type lets anything through
Using this knowledge we can now validate slots with regex, or with any other type of validation you want.
If you think you’ll use this in one of your chatbots or simply enjoyed seeing people hacking their way around Amazon then give this a like and follow me for more chatbot tips and tricks.