Although I've been programming professionally for a couple of years now, most of my work has been rather narrowly focused on C# and .NET (and old versions at that), so I decided to start a new project and dive into completely unknown territory.
That's why I'm starting this new series, in which I'll build Flo, a virtual, AI-powered and fully voice-controlled assistant that will be able to hold a conversation as well as do various things through the power of RESTful APIs! I'll also be using Go (Golang), a programming language I know nothing about.
I hope any and all beginners can learn as much as I hopefully will during this series! Feel free to ask any questions or give guidance in the comments.
I'll be covering:
- Planning and thought process
- Structuring of tasks and problems
- Deconstruction of bigger goals/problems into smaller, more manageable pieces
- Learning a new programming language
- Using RESTful APIs and JSON
Things I'll be using (that I know of at the start):
Meet Flo(w)
The main focus of this project is Flo, a fully voiced and voice activated AI assistant that will have two jobs:
- Interact with multiple online services, mainly via RESTful APIs
A RESTful API is an application programming interface (API) that follows the REST architectural style and uses HTTP requests to access and work with data. This basically means we can connect to thousands of services on the web and ask them to do many different things. For example, get a video from YouTube, add an appointment to your Outlook Calendar or read from a Google Sheets document.
- Keep a fluent conversation by using OpenAI's API which allows it to "talk" to ChatGPT.
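To make the RESTful API idea a bit more concrete, here is a minimal sketch in Go of what "send an HTTP request, get JSON back" could look like. Everything specific in it is an assumption on my part: the `Video` shape, the `/videos/abc123` path and the `fetchVideo` and `newFakeService` helpers are made up for illustration and don't belong to any real service. A local test server stands in for the real thing so the example runs without an account or API key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Video is a hypothetical response shape, not any real service's schema.
type Video struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// fetchVideo sends a GET request to url and decodes the JSON body into a Video.
func fetchVideo(client *http.Client, url string) (Video, error) {
	var v Video
	resp, err := client.Get(url)
	if err != nil {
		return v, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return v, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&v)
	return v, err
}

// newFakeService starts a local test server that stands in for a real web
// service and always answers with the same JSON document.
func newFakeService() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// The encoding error is ignored here to keep the sketch short.
		json.NewEncoder(w).Encode(Video{ID: "abc123", Title: "Learning Go"})
	}))
}

func main() {
	srv := newFakeService()
	defer srv.Close()

	v, err := fetchVideo(srv.Client(), srv.URL+"/videos/abc123")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("%s: %s\n", v.ID, v.Title) // prints "abc123: Learning Go"
}
```

Talking to a real service should mostly be a matter of swapping the fake server's URL for the real one and adding whatever authentication that service asks for.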
The two big technologies at work here are Google's DialogFlow and OpenAI's GPT. To make the whole "I talk, you answer" thing work, though, I will also need a couple of STT (Speech to Text) and TTS (Text to Speech) services. I'm thinking of using the Azure ones, but OpenAI also offers these services. Vosk is another option that runs offline, which could improve performance; it might come down to a matter of cost and ease of use!
How does the DialogFlo though?
DialogFlow is an incredibly powerful, yet complex tool which makes it even more fun to learn!
It is capable of recognizing speech patterns and it can even be trained so that it recognizes intentions more easily! You can then link these intentions (for example, setting up an appointment or asking what the weather is) to different parts of your program.
One really cool thing about DialogFlow is that it can also recognize when it doesn't have enough information to proceed.
As an example, let's say that you told it the following sentence:
- "Hey Flo, can you set up a dentist's appointment for me please?"
The STT service would turn your voice into these words and then DialogFlow would go through them and catch the keywords that show your Intention and Context (more on this in the following posts).
So Flo would pick out the words "set up", "appointment" and "dentist", and the request would be routed to a coded function that uses the Google Calendar API to schedule something. But you didn't tell Flo when to set this up. So Flo would reply with an AI voice (made by the TTS service) saying something like "When should I schedule this appointment for?".
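Here is a rough sketch in Go of that "do I have enough information yet?" step. To be clear, this is not DialogFlow's API: the `Intent` type, the `schedule_appointment` name and the `nextStep` function are all hypothetical stand-ins I made up to show the idea of routing an intent to a function and asking a follow-up question when a required parameter is missing.

```go
package main

import "fmt"

// Intent is a hypothetical, simplified stand-in for what an intent-detection
// service hands back: a name plus whatever parameters it managed to extract.
type Intent struct {
	Name   string
	Params map[string]string
}

// action describes what an intent needs and what to do once it has it.
type action struct {
	needs []string
	run   func(p map[string]string) string
}

// prompts holds the follow-up question to ask for each missing parameter.
var prompts = map[string]string{
	"subject": "What is the appointment for?",
	"when":    "When should I schedule this appointment for?",
}

// actions maps intent names to the code that handles them.
var actions = map[string]action{
	"schedule_appointment": {
		needs: []string{"subject", "when"},
		run: func(p map[string]string) string {
			// A real version would call a calendar API here.
			return fmt.Sprintf("Scheduling your %s appointment for %s.", p["subject"], p["when"])
		},
	},
}

// nextStep returns a follow-up question if a required parameter is missing,
// or the result of running the action once everything is present.
func nextStep(in Intent) string {
	a, ok := actions[in.Name]
	if !ok {
		return "Sorry, I don't know how to do that yet."
	}
	for _, name := range a.needs {
		if in.Params[name] == "" {
			return prompts[name]
		}
	}
	return a.run(in.Params)
}

func main() {
	in := Intent{
		Name:   "schedule_appointment",
		Params: map[string]string{"subject": "dentist"},
	}
	fmt.Println(nextStep(in)) // prints "When should I schedule this appointment for?"

	in.Params["when"] = "Friday at 10:00"
	fmt.Println(nextStep(in)) // prints "Scheduling your dentist appointment for Friday at 10:00."
}
```

The text returned by `nextStep` is what would be handed to the TTS service so Flo can say it out loud.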
Now, DialogFlow in itself is a very capable tool, but you do have to code in the responses and so on, and it might feel pretty... robotic 🤖 Also, Flo doesn't need to be just a mindless assistant when we have the power of...
Conversation with the GeePeeTee
ChatGPT is the best-known service offered by OpenAI and, luckily, we can have our app talk to it! This is actually simpler than using DialogFlow: it basically comes down to sending it some prompts, just like you do on the ChatGPT website. I'll mostly use it the same way I use the website version, for learning and asking random stuff about the world, but also to give Flo more natural responses after she completes tasks.
For example, when Flo adds the dentist's appointment, I can run that information through ChatGPT and ask it to repeat it back in a natural, conversational way.
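As a sketch of what that could look like in Go, the snippet below builds (but does not send) a request for OpenAI's chat completions endpoint. The endpoint URL, the `model` and `messages` fields and the `Bearer` authorization header come from OpenAI's API; the rest is my own assumption for illustration: the helper name `newRephraseRequest`, the wording of the prompt, and the model name, which is only a placeholder.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

const chatURL = "https://api.openai.com/v1/chat/completions"

// message is one entry in the conversation sent to the model.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest is the minimal JSON body for a chat completion.
type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newRephraseRequest (a hypothetical helper) builds a POST request asking the
// model to repeat a completed task back in a conversational way.
func newRephraseRequest(apiKey, model, fact string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model: model,
		Messages: []message{
			{Role: "system", Content: "You are Flo, a friendly voice assistant. Confirm completed tasks in one short, natural sentence."},
			{Role: "user", Content: "Task completed: " + fact},
		},
	})
	if err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, chatURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	// "gpt-4o-mini" is only a placeholder model name for this sketch.
	req, err := newRephraseRequest("YOUR_API_KEY", "gpt-4o-mini",
		"dentist appointment added for Friday at 10:00")
	if err != nil {
		fmt.Println("could not build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// Sending it would look like: resp, err := http.DefaultClient.Do(req)
}
```

The reply comes back as JSON too, so reading Flo's new, friendlier sentence would be another round of decoding the response body.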
On the next episode...
We'll be going through some initial, yet important project setup!
- Setting up a basic Go project using VSCode
- Importing the packages we need to make this all work
- Setting up API Keys and using Environment Variables to keep them safe
- Using GitHub for version control
Cheers and bye for now! 😁