DEV Community

Cover image for How to Build an Alexa Clone with Python and OpenAI
Graham Patrick
Graham Patrick

Posted on

How to Build an Alexa Clone with Python and OpenAI

In this blog post, I will show you how to build a simple voice assistant app with Python and OpenAI. The app is called Peppa, and it is inspired by Pepper Potts from Iron Man. Peppa can accept voice commands from the user and respond with voice output. Peppa can also connect to OpenAI and use its powerful natural language processing capabilities to handle complex requests.

What You Will Need
To follow along with this tutorial, you will need the following:

A computer with Python 3 installed. You can download Python 3 from this source.

A microphone and speakers or headphones to interact with the app.
An OpenAI API key. You can request access to OpenAI from this source.

The following Python modules: ChatterBot, gtts, playsound, speech_recognition, and openai. You can install them with pip by running the following command in the terminal:

pip install chatterbot gtts playsound speech_recognition openai
Enter fullscreen mode Exit fullscreen mode

How the App Works
The app works as follows:

The app greets the user with voice output and asks for their name.
The app gets the user’s name with voice input and greets them with voice output and name.
The app asks the user if they need to hear its voice output and gets the user’s preference with voice input.
The app asks the user what they need help with and gets the user’s request with voice input.
The app processes the user’s request and responds with voice output.
The app uses the following modules for different purposes:

ChatterBot: This module is used to create a chatbot object that can handle simple conversational requests. The chatbot uses logic adapters to generate responses based on the user’s input. In this case, we use the BestMatch and TimeLogicAdapter logic adapters to handle general and time-related requests.

gtts: This module is used to convert text to speech using Google Text-to-Speech. This module allows us to generate voice output for the app.

playsound: This module is used to play sound files using the default audio player. This module allows us to play the voice output generated by gtts.

speech_recognition: This module is used to recognize speech from audio using Google Speech Recognition. This module allows us to get voice input from the user.

openai: This module is used to interact with the OpenAI API. This module allows us to use the natural language processing capabilities of OpenAI to handle complex requests.
The Code Explained

Now that we have an overview of how the app works and what modules we need, let’s look at the code in detail. The code is divided into several sections, each with a comment explaining its purpose.

Importing the required modules
The first section of the code imports the modules that we need for the app. We also fix a compatibility issue with the collections module by assigning the Hashable attribute to the collections.abc.Hashable class.

# Importing the required modules
from chatterbot import ChatBot
import gtts
from playsound import playsound
import collections.abc
import speech_recognition as sr
import openai

# Fixing a compatibility issue with collections module
collections.Hashable = collections.abc.Hashable
Enter fullscreen mode Exit fullscreen mode

Creating a speech recognizer object

The next section of the code creates a speech recognizer object that we will use to get voice input from the user. We assign the object to the variable r.

# Creating a speech recognizer object
r = sr.Recognizer()
Enter fullscreen mode Exit fullscreen mode

Creating a chatbot object with logic adapters

The next section of the code creates a chatbot object that we will use to handle simple conversational requests. We assign the object to the variable bot. We also specify the storage adapter, the database URI, and the logic adapters for the chatbot. The storage adapter is used to store the chatbot’s data in a SQLite database. The database URI is the path to the database file. The logic adapters are used to generate responses based on the user’s input. We use the BestMatch and TimeLogicAdapter logic adapters to handle general and time-related requests.

# Creating a chatbot object with logic adapters
bot = ChatBot(
    "Pepper",
    storage_adapter="chatterbot.storage.SQLStorageAdapter",
    database_uri="sqlite:///db.sqlite3",
    logic_adapters=["chatterbot.logic.BestMatch", "chatterbot.logic.TimeLogicAdapter"],
)
Enter fullscreen mode Exit fullscreen mode

The next section of the code sets the OpenAI API key that we will use to connect to the OpenAI API and use its natural language processing capabilities. We assign the API key to the variable api_key and set it as the openai.api_key attribute.

# Setting the OpenAI API key
api_key = ""
openai.api_key = api_key
Enter fullscreen mode Exit fullscreen mode

Printing the app name in ASCII art

The next section of the code prints the app name in ASCII art to make it more attractive and fun. We use a triple-quoted string to store the ASCII art and print it with the print function.

# Printing the app name in ASCII art
print(r"""\
              _          _            _          _          _            
        /\ \       /\ \         /\ \       /\ \       / /\          
       /  \ \     /  \ \       /  \ \     /  \ \     / /  \         
      / /\ \ \   / /\ \ \     / /\ \ \   / /\ \ \   / / /\ \        
     / / /\ \_\ / / /\ \_\   / / /\ \_\ / / /\ \_\ / / /\ \ \       
    / / /_/ / // /_/_ \/_/  / / /_/ / // / /_/ / // / /  \ \ \      
   / / /__\/ // /____/\    / / /__\/ // / /__\/ // / /___/ /\ \     
  / / /_____// /\____\/   / / /_____// / /_____// / /_____/ /\ \    
 / / /      / / /______  / / /      / / /      / /_________/\ \ \   
/ / /      / / /_______\/ / /      / / /      / / /_       __\ \_\  
\/_/       \/__________/\/_/       \/_/       \_\___\     /____/_/  

      """)
Enter fullscreen mode Exit fullscreen mode

Greeting the user with voice output and asking for their name

The next section of the code greets the user with voice output and asks for their name. We use the gtts module to convert text to speech and save it as a sound file. We use the playsound module to play the sound file using the default audio player.

# Greeting the user with voice output and asking for their name
tts = gtts.gTTS("Hello, My name is Peppa! And who might you be? ")
tts.save("greeting.mp3")
playsound("greeting.mp3")
Enter fullscreen mode Exit fullscreen mode

Getting the user’s name with voice input and greeting them with voice output and name

The next section of the code gets the user’s name with voice input and greets them with voice output and name. We use the speech_recognition module to recognize speech from audio using Google Speech Recognition. We use the microphone as the source for input and adjust the noise level based on the surrounding noise. We listen for the user’s input and recognize it with Google. We print the recognized input and use it to personalize the greeting.

# Getting the user's name with voice input and greeting them with voice output and name
with sr.Microphone() as source:
    # Adjusting the noise level
    r.adjust_for_ambient_noise(source, duration=0.2)
    # Listening for the user's input
    audio = r.listen(source)
    # Recognizing the user's input with Google
    name = r.recognize_google(audio)
    print("I just heard ", name)

# Greeting the user with voice and name
greeting = "Hello! " + name
tts = gtts.gTTS(greeting)
tts.save("personal.mp3")
playsound("personal.mp3")
Enter fullscreen mode Exit fullscreen mode

Asking the user if they need voice output and getting the user’s preference with voice input

The next section of the code asks the user if they need voice output and gets the user’s preference with voice input. We use the same technique as before to convert text to speech, play sound files, and recognize speech from audio.

# Asking the user if they need voice output
tts = gtts.gTTS("Do you need to hear my voice? Yes or no?")
tts.save("speech.mp3")
playsound("speech.mp3")

# Getting the user's preference with voice input
with sr.Microphone() as source:
    # Adjusting the noise level
    r.adjust_for_ambient_noise(source, duration=0.2)
    # Listening for the user's input
    audio = r.listen(source)
    # Recognizing the user's input with Google
    speech = r.recognize_google(audio)
    print("I just heard ", speech)
Enter fullscreen mode Exit fullscreen mode

Asking the user what they need help with and getting the user’s request with voice input

The next section of the code asks the user what they need help with and gets the user’s request with voice input. We use the same technique as before to convert text to speech, play sound files, and recognize speech from audio. We also print the user’s request for confirmation.

# Asking the user what they need help with
if speech == "yes":
    tts = gtts.gTTS("and what is it I can help you with " + name + " ?")
    tts.save("text.mp3")
    playsound("text.mp3")
print("What is it I can help you with today " + name + " ?")

# Getting the user's request with voice input
with sr.Microphone() as source:
    # Adjusting the noise level
    r.adjust_for_ambient_noise(source, duration=0.2)
    # Listening for the user's input
    audio = r.listen(source)
    # Recognizing the user's input with Google
    request = r.recognize_google(audio)
    print("I just heard ", request)
Enter fullscreen mode Exit fullscreen mode

Starting a loop to handle the user’s requests

The final section of the code starts a loop to handle the user’s requests and respond with voice output. We use a while loop to keep the app running until the user decides to quit. We use an if-elif-else statement to check the user’s request and generate a response accordingly. We use the following logic to handle different types of requests:

If the user’s request is “quit” or “exit”, we break the loop and end the app.
If the user’s request is “openai”, we use the openai module to connect to the OpenAI API and use its natural language processing capabilities to handle complex requests. We use the openai.Completion.create method to create a completion object that contains the response from the OpenAI API. We use the text attribute of the completion object to get the response as a string. We use the gtts module to convert the response to speech and play it with the playsound module.
If the user’s request is anything else, we use the chatterbot module to handle simple conversational requests. We use the get_response method of the chatbot object to get a response based on the user’s input. We use the gtts module to convert the response to speech and play it with the playsound module.

# Starting a loop to handle the user's requests and respond with voice output
while True:
    # Getting the user's request with voice input
    with sr.Microphone() as source:
        # Adjusting the noise level
        r.adjust_for_ambient_noise(source, duration=0.2)
        # Listening for the user's input
        audio = r.listen(source)
        # Recognizing the user's input with Google
        request = r.recognize_google(audio)
        print("I just heard ", request)

    # Checking the user's request and generating a response accordingly
    if request == "quit" or request == "exit":
        # Breaking the loop and ending the app
        break
    elif request == "openai":
        # Using the OpenAI API to handle complex requests
        completion = openai.Completion.create(
            engine="davinci",
            prompt=request,
            max_tokens=150,
            temperature=0.9,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0.6,
            stop=["\n"]
        )
        # Getting the response from the OpenAI API
        response = completion["choices"][0]["text"]
        # Converting the response to speech and playing it
        tts = gtts.gTTS(response)
        tts.save("response.mp3")
        playsound("response.mp3")
    else:
        # Using the chatbot to handle simple conversational requests
        response = bot.get_response(request)
        # Converting the response to speech and playing it
        tts = gtts.gTTS(str(response))
        tts.save("response.mp3")
        playsound("response.mp3")
Enter fullscreen mode Exit fullscreen mode

The Result
That’s it! We have completed the code for our voice assistant app. We can run the app by running the following command in the terminal:

python app.py
Enter fullscreen mode Exit fullscreen mode

This will start the app and greet us with voice output. We can then interact with the app using voice input and get voice output. We can ask the app anything we want, such as the time, the weather, or a joke. We can also use the keyword “openai” to access the OpenAI API and ask more complex questions, such as the meaning of life,

Top comments (0)