Tonya Sims for Deepgram

Posted on Dec 1, 2022 • Originally published at dpgr.am

Taking Notes with Voice in Python

#python #speechtotext #voice #notes

In this blog post tutorial, we’ll learn how to take notes in Python using our voice. This means we can take an audio file and use AI speech-to-text to transcribe it. One could imagine dozens of scenarios where this could be helpful: from capturing the content of voice memos to providing a tidy written recap of a meeting to folks who couldn't attend.

Getting transcriptions out of these recordings is a pretty straightforward process. This project builds on Deepgram's speech-to-text APIs, which deliver high-quality AI-generated transcripts from both real-time streaming and batch processing pre-recorded audio sources. The project we'll do in this tutorial works with pre-recorded audio files.

Let’s walk through taking notes with voice in Python, step by step.

A Learn-by-Doing Speech AI Project in Python

Here’s a list of what we’ll cover in this project:

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK
Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python
Step 3 - Setup Your Python Project
Step 4 - Install Your Python Libraries and Packages using pip
Step 5 - How to Transcribe the Audio File in Python with Voice
Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python
Final Step - Run the Python Voice Note-Taking Project and Export the Results

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

Deepgram has a Python SDK, hosted on GitHub, that we can tap into. We’ll also need an API key, which we can grab in Console, a game-like hub in Deepgram for trying the different types of transcription in many coding languages, including Python. When you first sign up, you'll get $150 in API credits to try out Deepgram's speech AI capabilities.
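Rather than pasting the key straight into our source file, one option is to read it from an environment variable. Here is a minimal sketch of that idea. The variable name DEEPGRAM_API_KEY and the helper load_api_key are our own choices for illustration, not something the SDK requires.

```python
import os


def load_api_key(env_var="DEEPGRAM_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable exported in the terminal, the value returned by load_api_key() can be passed wherever the tutorial uses the hard-coded key.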

Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

Our project, taking notes with voice in Python, will use the Deepgram speech-to-text transcription API and some of its more advanced capabilities to enhance our voice notes. Here are the features we’ll use along with transcribing audio:

Diarization - Recognizes multiple people speaking and assigns a speaker to each word in the transcript.
Summarization - Summarizes sections of the transcript so that you can quickly scan it.

We’ll see in a few sections how to easily implement these features in our Python project.

Step 3 - Setup Your Python Project

There are a few items we need to set up before we begin coding. I’m using Python 3.10 for our project, but any version equal to or higher than Python 3.7 will work. Create a directory anywhere on your computer; let’s call it voice-notes-with-python.

Then, open that same directory in a code editor like Visual Studio.

Next, create a virtual environment. This ensures our Python libraries get installed in that project and not system wide. Make sure we’re in the correct project directory and run these quick commands from the terminal to create the Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate
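If we're ever unsure whether the virtual environment is really active, a quick check from Python can help. This is a small sketch using only the standard library; the function name in_virtualenv is ours.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```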

Finally, let’s create a Python file inside our directory called take_voice_notes.py.

Step 4 - Install Your Python Libraries and Packages using `pip`

Now we are ready to install Deepgram using pip. Make sure your virtual environment is activated and run the following command:

pip install deepgram-sdk

This allows us to use the Deepgram speech-to-text Python SDK for transcription, and tap into the features we mentioned earlier.

To verify that Deepgram was installed correctly, from the terminal type:

pip freeze

We should see deepgram-sdk in the list, which confirms the latest version from PyPI is installed and ready for use.
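The same check can also be done from inside Python. The sketch below uses importlib.metadata from the standard library, which assumes Python 3.8 or newer; the helper name installed_version is ours.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("deepgram-sdk:", installed_version("deepgram-sdk"))
```

If this prints None, the package isn't visible to the interpreter we're running, which usually means the virtual environment isn't activated.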

Step 5 - How to Transcribe the Audio File in Python with Voice

We’ll use Deepgram’s prerecorded transcription for this taking notes with voice Python project. This type of transcription is used to transcribe an audio file, either stored locally on your drive or hosted online. In this tutorial, we’ll transcribe a local audio file, but with this AI speech recognition provider, it’s very simple to do both. Let’s see how we transcribe an audio file either as a local download or an online file.

Transcribe a Local Audio File with Python

from deepgram import Deepgram
import json

DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'some/file.wav'

def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = deepgram.transcription.sync_prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

main()
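The comment in the code above says to replace the mimetype as appropriate. One way to avoid hard-coding it is to guess it from the file extension. This is only a sketch: guess_audio_mimetype is a hypothetical helper of ours, and the value Python guesses (for example, a .wav file may come back as audio/x-wav) should be checked against what the API accepts.

```python
import mimetypes


def guess_audio_mimetype(path, default="audio/wav"):
    """Guess a MIME type from the file extension, falling back to a default."""
    mimetype, _encoding = mimetypes.guess_type(path)
    # Only trust the guess when it is an audio type.
    if mimetype and mimetype.startswith("audio/"):
        return mimetype
    return default
```

The result could then be dropped into the source dictionary, as in source = {'buffer': audio, 'mimetype': guess_audio_mimetype(PATH_TO_FILE)}.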

Transcribe a Hosted Online Audio File with Python

from deepgram import Deepgram
import json

# The API key we created in step 1
DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'

# Hosted sample file
AUDIO_URL = "{YOUR_URL_TO_HOSTED_ONLINE_AUDIO_GOES_HERE}"

def main():
    # Initializes the Deepgram SDK
    dg_client = Deepgram(DEEPGRAM_API_KEY)
    source = {'url': AUDIO_URL}
    options = { "punctuate": True, "model": "general", "language": "en-US", "tier": "enhanced" }
    response = dg_client.transcription.sync_prerecorded(source, options)
    print(json.dumps(response, indent=4))

main()

Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

Now that we have an idea of what our Python code looks like, let’s see an example with our diarization and summarization features. In the same function as above, we can just pass those features in a Python dictionary as keys and set the values to True, like so:

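    with open(PATH_TO_FILE, 'rb') as audio:
        # Set the mimetype to match your audio file
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        # Turn on diarization and summarization alongside the transcript
        options = {'diarize': True, 'summarize': True}
        response = deepgram.transcription.sync_prerecorded(source, options)
        print(json.dumps(response, indent=4))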

Final Step - Run the Python Voice Note-Taking Project and Export the Results

We’ve reached the final step! In this step, we need to run the Python project so we can see our JSON response with the transcript split into multiple speakers and summaries.

From our terminal type:

python3 take_voice_notes.py > notes.txt

This runs our project and outputs a file called notes.txt, which is now in our directory.
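Redirecting the terminal output works fine, but we could also write the file from Python itself. Here is a minimal sketch of that alternative. The helper save_response is hypothetical and only assumes the response is a JSON-serializable dictionary, which matches how the tutorial prints it with json.dumps.

```python
import json


def save_response(response, path="notes.txt"):
    """Write a JSON-serializable response to disk with readable indentation."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(response, handle, indent=4)
    return path
```

Calling save_response(response) in place of the print call would produce the same notes.txt without the shell redirection.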

Open the file and we see a JSON response that looks like the following, depending on which audio file was transcribed:

"alternatives": [
                    {
                        "transcript": "Hello, and thank you for being in this meeting...",
                        "confidence": 0.9916992,
                        "words": [
                            {
                                "word": "hello",
                                "start": 15.259043,
                                "end": 15.338787,
                                "confidence": 0.95751953,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "Hello,"
                            },
                            {
                                "word": "and",
                                "start": 15.418532,
                                "end": 15.617893,
                                "confidence": 0.99853516,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "and"
                            },
                            {
                                "word": "thank",
                                "start": 15.617893,
                                "end": 15.777383,
                                "confidence": 0.9975586,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "thank"
                            },
                            {
                                "word": "you",
                                "start": 15.777383,
                                "end": 15.9368725,
                                "confidence": 0.9975586,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "you"
                            }
                        ],
                        "summaries": [
                            {
                                "summary": "Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes. How may I help you today? I'm having some serious problem with my phone. Can you describe in detail for me? What kind of issues you're having with your device? Well, it isn't working.",
                                "start_word": 0,
                                "end_word": 649
                            },
                            {
                                "summary": "My phone won't turn on. I don't know what's wrong. My dad said I should get a new phone, but I didn't listen to him. I also never backed up my photos on the cloud like I know I should.",
                                "start_word": 649,
                                "end_word": 1288
                            }
                        ]
                    }
]

We received the transcript, with a speaker assigned to each word, and the summaries of the transcript appear at the end of the response.
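Raw JSON isn't the friendliest format for notes, so a natural next step is to group consecutive words by speaker and list the summaries underneath. The sketch below works on a single entry of the "alternatives" list and relies only on the fields visible in the response above. The function names and the trimmed sample data are ours, made up for illustration.

```python
def speaker_turns(words):
    """Group consecutive words by speaker into (speaker, text) turns."""
    turns = []
    for word in words:
        # Prefer the punctuated form when the response includes it.
        text = word.get("punctuated_word", word["word"])
        speaker = word.get("speaker")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


def format_notes(alternative):
    """Render one transcript alternative as plain-text notes."""
    lines = [
        f"Speaker {speaker}: {text}"
        for speaker, text in speaker_turns(alternative.get("words", []))
    ]
    for number, item in enumerate(alternative.get("summaries", []), start=1):
        lines.append(f"Summary {number}: {item['summary']}")
    return "\n".join(lines)


# Trimmed-down sample shaped like the response above (illustrative only).
sample = {
    "words": [
        {"word": "hello", "speaker": 0, "punctuated_word": "Hello,"},
        {"word": "and", "speaker": 1, "punctuated_word": "and"},
        {"word": "thank", "speaker": 1, "punctuated_word": "thank"},
        {"word": "you", "speaker": 0, "punctuated_word": "you"},
    ],
    "summaries": [
        {"summary": "My phone won't turn on.", "start_word": 0, "end_word": 4}
    ],
}

if __name__ == "__main__":
    print(format_notes(sample))
```

Each line of the output is either a speaker turn or a summary, which is much easier to scan than the word-by-word JSON.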

Conclusion of the Python Voice Note-taking Project with Speech Recognition

We’ve learned how to transcribe audio and take notes with voice in Python using an AI speech-to-text provider.

There are many ways to extend this project with some of Deepgram's other features, such as redaction, which hides sensitive information like credit card or social security numbers, or search, which looks through a transcript for terms and phrases. For a full list of all the features, please visit this page.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.

