DEV Community

Ashutosh Krishna
Posted on • Originally published at iread.ga on

Building our own J.A.R.V.I.S. using Python - Part I

Introduction

Do you remember J.A.R.V.I.S., Tony Stark's virtual personal assistant? I'm sure you do!

Have you ever wondered about creating your own personal assistant? Yes? Tony Stark can help us with that! Oops, did you forget he is no more? It's sad that he cannot save us anymore.

But hey, your favorite language Python can help you with that. Yes, you heard it right. We can create our own J.A.R.V.I.S. using Python. Let's roll it!

Project Setup

During the development of the project, we'll come across various modules and external libraries. Let's learn and install them. But before we install them, let's create a virtual environment and activate it.

We are going to create the virtual environment using venv. Python 3 ships with the venv module in its standard library, so there is nothing extra to install. To create a virtual environment, you can use the below command:

$ python -m venv env

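The above command will create a virtual environment named env. Now, we need to activate the environment. The command below is for a Bash-style shell on Windows (such as Git Bash); on Linux or macOS the script is at env/bin/activate instead: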

$ . env/Scripts/activate

To verify that the environment has been activated, check that (env) appears in your terminal prompt. Now, we can install the libraries.
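If you would like a second check beyond the prompt, the interpreter itself can tell you whether it is running inside a virtual environment. The snippet below is a small sketch, not part of the project code; the function name in_virtualenv is my own. It relies on the fact that, inside an environment created with venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```

Run it with the same python command you will use for the project; if it reports False, the environment is not active in that terminal.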

  1. pyttsx3: A cross-platform text-to-speech library. The major advantage of using this library for text-to-speech conversion is that it works offline. To install this module, type the below command in the terminal.
$ pip install pyttsx3
  2. SpeechRecognition: It allows us to convert audio into text for further processing. To install this module, type the below command in the terminal.
$ pip install SpeechRecognition
  3. pywhatkit: It is an easy-to-use library that will help us interact with the browser very easily. To install the module, run the following command in the terminal.
$ pip install pywhatkit
  4. wikipedia: It is used to fetch a variety of information from the Wikipedia website. To install this module, type the below command in the terminal.
$ pip install wikipedia
  5. requests: It is an elegant and simple HTTP library for Python that allows you to send HTTP/1.1 requests extremely easily. To install the module, run the following command in the terminal:
$ pip install requests

.env File

We need this file to store private data related to the project, such as API keys and passwords. For now, let's store the name of the user and the bot.

Create a file named .env and add the following content there:

USER=Ashutosh
BOTNAME=JARVIS

To use the contents from .env file, we'll install another module called python-decouple as:

$ pip install python-decouple

Learn more about Environment Variables in Python here.
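To make the idea of a .env file concrete, here is a rough sketch of what reading KEY=VALUE lines amounts to. This is only an illustration using the standard library; it is not how python-decouple is implemented, and the function name parse_env_text is hypothetical.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Ignore anything that is not a KEY=VALUE pair
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# The same two entries we stored in the .env file
sample = "USER=Ashutosh\nBOTNAME=JARVIS\n"
settings = parse_env_text(sample)
print(settings["USER"], settings["BOTNAME"])  # Ashutosh JARVIS
```

One thing to keep in mind: as far as I know, python-decouple looks at real environment variables before the .env file, so a common name like USER may already be defined by your operating system and take precedence over the value in the file.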

Setting up JARVIS

Before we start defining a few important functions, let's create a speech engine first.

import pyttsx3
from decouple import config

USERNAME = config('USER')
BOTNAME = config('BOTNAME')

engine = pyttsx3.init('sapi5')

# Set Rate
engine.setProperty('rate', 190)

# Set Volume
engine.setProperty('volume', 1.0)

# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

Let's analyze the above script. First, we initialize an engine using the pyttsx3 module. sapi5 is the Microsoft Speech API, which provides the voices on Windows, so this driver name only works there; calling pyttsx3.init() with no argument lets the library pick the default driver for your platform. Learn more about it here. Next, we set the rate and volume properties of the speech engine using the setProperty method. We can then get the voices from the engine using the getProperty method. voices will be a list of the voices available on our system. If we print it, we see something like this:

[<pyttsx3.voice.Voice object at 0x000001AB9FB834F0>, <pyttsx3.voice.Voice object at 0x000001AB9FB83490>]

On my system, the first one is a male voice and the other one is a female voice; the order and number of voices depend on what is installed. JARVIS was a male assistant in the movies, but I've chosen the female voice for this tutorial by setting the voice property with the setProperty method.
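Since the voice list differs from machine to machine, indexing voices[1] directly can raise an IndexError when only one voice is installed. The sketch below shows one possible way to guard against that. The helper name choose_voice_id is hypothetical, and the objects in fake_voices are stand-ins with an id attribute, used here so the example runs without a speech engine.

```python
from types import SimpleNamespace


def choose_voice_id(voices, preferred_index=1):
    """Return the id of the preferred voice, falling back to the first one."""
    if not voices:
        raise ValueError("No text-to-speech voices are installed")
    if 0 <= preferred_index < len(voices):
        return voices[preferred_index].id
    # The preferred voice does not exist on this system
    return voices[0].id


# Stand-in objects instead of real pyttsx3 voices (assumption for the demo)
fake_voices = [SimpleNamespace(id="voice-0"), SimpleNamespace(id="voice-1")]
print(choose_voice_id(fake_voices))      # voice-1
print(choose_voice_id(fake_voices[:1]))  # voice-0
```

In the real script, this would be used as engine.setProperty('voice', choose_voice_id(voices)).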

Note: If you get an error related to PyAudio, download PyAudio wheel from here and install it within the virtual environment.

Also, using the config function from decouple, we get the values of USER and BOTNAME from the environment variables defined in the .env file.

1. Speak Function

The speak function is responsible for speaking whatever text is passed to it. Let's see the code:

# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""

    engine.say(text)
    engine.runAndWait()


In the speak() function, the engine queues whatever text is passed to it using the say() method. The runAndWait() method then processes the queued commands and blocks until the queue is empty, so the function returns only after the text has been spoken.

2. Greet Function

This function greets the user whenever the program is run. Depending on the current time, it wishes the user Good Morning, Good Afternoon, or Good Evening.

from datetime import datetime

# Greet the user
def greet_user():
    """Greets the user according to the time"""

    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")


First, we get the current hour; for example, if the current time is 11:15 AM, the hour will be 11. If the hour is at least 6 and less than 12, we wish the user Good Morning. If it is at least 12 and less than 16, we wish Good Afternoon, and if it is at least 16 and less than 19, we wish Good Evening. Outside those ranges, no time-based greeting is spoken. In every case, the assistant then introduces itself. We use the speak function for all of this.
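The hour ranges are easier to check when the decision is separated from the speaking. Here is a small sketch of the same logic as a pure function; the name greeting_for_hour is my own and is not part of the project code.

```python
from typing import Optional


def greeting_for_hour(hour: int) -> Optional[str]:
    """Map an hour (0-23) to the greeting used in greet_user, or None."""
    if 6 <= hour < 12:
        return "Good Morning"
    if 12 <= hour < 16:
        return "Good afternoon"
    if 16 <= hour < 19:
        return "Good Evening"
    # From 19:00 until 05:59 there is no time-based greeting
    return None


for hour in (11, 12, 18, 22):
    print(hour, greeting_for_hour(hour))
```

Each range includes its lower bound and excludes its upper bound, so 12 falls into the afternoon and 16 into the evening.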

3. Take User Input

This function takes commands from the user and recognizes them using the speech_recognition module.

import speech_recognition as sr
from random import choice
from utils import opening_text

# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using Speech Recognition module and converts it into text"""

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Carry on unless the user asked the assistant to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # Night spans midnight, so the two checks are joined with "or"
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query

We have imported the speech_recognition module as sr. The Recognizer class within the speech_recognition module helps us recognize the audio. The same module has a Microphone class that gives us access to the microphone of the device. So with the microphone as the source, we listen to the audio using the listen() method of the Recognizer class. We have also set pause_threshold to 1, i.e., the recognizer waits for one second of silence before it treats the phrase as complete, so short pauses while we speak do not cut us off.

Next, using the recognize_google() method of the Recognizer class, we try to recognize the audio. The recognize_google() method performs speech recognition on the audio passed to it using the Google Speech Recognition API, so it needs an internet connection. We have set the language to en-in, i.e. English (India). It returns the transcript of the audio, which is simply a string. We've stored it in a variable called query.

If the query has exit or stop words in it, it means we're asking the assistant to stop immediately. So, before stopping, we greet the user again as per the current hour. If the hour is between 21 and 6, wish Good Night to the user, else, some other message. We create a utils.py file which has just one list containing a few statements as:

opening_text = [
    "Cool, I'm on it sir.",
    "Okay sir, I'm working on it.",
    "Just a second sir.",
]

If the query doesn't contain either of those two words (exit or stop), we speak something to let the user know that we have heard them. For that, we use the choice function from the random module to randomly select a statement from the opening_text list. If the query does contain one of them, we exit the program after speaking the farewell.

During this entire process, if we encounter an exception, we apologize to the user and set query to the string 'None'. In the end, we return the query.
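The two conditions in this function are easy to get wrong: an expression of the form not a or b parses as (not a) or b, and a night-time range such as 21 to 6 crosses midnight, so no single hour can be both at least 21 and less than 6. The sketch below isolates both checks as small functions that can be tested without a microphone. The names wants_to_exit and farewell_for_hour are hypothetical, and lower-casing the query is my own addition.

```python
def wants_to_exit(query: str) -> bool:
    """Return True when the query asks the assistant to exit or stop."""
    text = query.lower()  # assumption: treat "Exit" and "exit" alike
    return "exit" in text or "stop" in text


def farewell_for_hour(hour: int) -> str:
    """Pick the farewell message for an hour in the range 0-23."""
    # Night runs from 21:00 to 05:59, which wraps around midnight
    if hour >= 21 or hour < 6:
        return "Good night sir, take care!"
    return "Have a good day sir!"


print(wants_to_exit("please stop now"))  # True
print(wants_to_exit("open youtube"))     # False
print(farewell_for_hour(23))             # Good night sir, take care!
print(farewell_for_hour(10))             # Have a good day sir!
```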

Main Method

To run the project, we're using the main method.

if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)

As we know, the first thing we need to do is to greet the user using the greet_user() function. Next, we run a while loop to continuously take input from the user using the take_user_input() function. For now, we're just printing the query.

At this point, the complete code in main.py looks like this:

import pyttsx3
import speech_recognition as sr
from decouple import config
from datetime import datetime
from random import choice
from utils import opening_text

USERNAME = config('USER')
BOTNAME = config('BOTNAME')

engine = pyttsx3.init('sapi5')

# Set Rate
engine.setProperty('rate', 190)

# Set Volume
engine.setProperty('volume', 1.0)

# Set Voice (Female)
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[1].id)

# Text to Speech Conversion
def speak(text):
    """Used to speak whatever text is passed to it"""

    engine.say(text)
    engine.runAndWait()

# Greet the user
def greet_user():
    """Greets the user according to the time"""

    hour = datetime.now().hour
    if (hour >= 6) and (hour < 12):
        speak(f"Good Morning {USERNAME}")
    elif (hour >= 12) and (hour < 16):
        speak(f"Good afternoon {USERNAME}")
    elif (hour >= 16) and (hour < 19):
        speak(f"Good Evening {USERNAME}")
    speak(f"I am {BOTNAME}. How may I assist you?")

# Takes Input from User
def take_user_input():
    """Takes user input, recognizes it using Speech Recognition module and converts it into text"""

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print('Listening....')
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print('Recognizing...')
        query = r.recognize_google(audio, language='en-in')
        # Carry on unless the user asked the assistant to exit or stop
        if 'exit' not in query and 'stop' not in query:
            speak(choice(opening_text))
        else:
            hour = datetime.now().hour
            # Night spans midnight, so the two checks are joined with "or"
            if hour >= 21 or hour < 6:
                speak("Good night sir, take care!")
            else:
                speak('Have a good day sir!')
            exit()
    except Exception:
        speak('Sorry, I could not understand. Could you please say that again?')
        query = 'None'
    return query

if __name__ == '__main__':
    greet_user()
    while True:
        query = take_user_input().lower()
        print(query)

You can run and test the application now.

$ python main.py

Conclusion

In this part, we have completed the setup of our virtual personal assistant. We have not added any functionality to it yet. We'll work on those functionalities in the next part of the blog. Stay Tuned!
