DEV Community

Tonya Sims for Deepgram

Posted on • Originally published at dpgr.am

Taking Notes with Voice in Python

In this blog post tutorial, we’ll learn how to take notes in Python using our voice. This means we can take an audio file and use AI speech-to-text to transcribe it. One could imagine dozens of scenarios where this could be helpful: from capturing the content of voice memos to providing a tidy written recap of a meeting to folks who couldn't attend.

Getting transcriptions out of these recordings is a pretty straightforward process. This project builds on Deepgram's speech-to-text APIs, which deliver high-quality AI-generated transcripts from both real-time streaming audio and pre-recorded audio processed in batch. The project we'll do in this tutorial works with pre-recorded audio files.

Let’s walk through taking notes with voice in Python, step by step.

A Learn-by-Doing Speech AI Project in Python

Here’s a list of what we’ll cover in this project:

  • Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK
  • Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python
  • Step 3 - Setup Your Python Project
  • Step 4 - Install Your Python Libraries and Packages using pip
  • Step 5 - How to Transcribe the Audio File in Python with Voice
  • Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python
  • Final Step - Run the Python Voice Note-Taking Project and Export the Results

Step 1 - Getting Started with Deepgram Speech-to-Text Python SDK

Deepgram has a Python SDK, available on GitHub, that we can tap into. We’ll also need an API key, which we can grab in Console, a game-like hub in Deepgram for trying the different types of transcription in many coding languages, including Python. When you first sign up, you'll get $150 in API credits to try out Deepgram's speech AI capabilities.
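One way to keep the API key out of our source code is to read it from an environment variable. Here is a minimal sketch, assuming the key has been exported under the name DEEPGRAM_API_KEY. Both that variable name and the load_api_key helper are our own choices for this example, not something the SDK requires:

```python
import os


def load_api_key(env_var="DEEPGRAM_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned string can then be used wherever the later examples show the YOUR_API_KEY_GOES_HERE placeholder.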

Step 2 - Useful Speech-to-Text Features for Taking Voice Notes in Python

Our project, taking notes with voice in Python, will use the Deepgram speech-to-text transcription API and some of its more advanced capabilities to enhance our voice notes. Here are the features we’ll use along with transcribing audio:

  • Diarization - Recognizes multiple people speaking and assigns a speaker to each word in the transcript.

  • Summarization - Summarizes sections of the transcript so that you can quickly scan it.

We’ll see in a few sections how to easily implement these features in our Python project.

Step 3 - Setup Your Python Project

There are a few items we need to set up before we begin coding. I’m using Python 3.10 for our project, but any version from Python 3.7 onward will work. Create a directory anywhere on your computer; let’s call it voice-notes-with-python.

Then, open that same directory in a code editor like Visual Studio.

Next, create a virtual environment. This ensures our Python libraries get installed in that project and not system-wide. Make sure we’re in the correct project directory and run these quick commands from the terminal to create the Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

Finally, let’s create a Python file inside our directory called take_voice_notes.py.
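If we want the script to confirm the setup for us, a small sanity check could go at the top of take_voice_notes.py. This is only a sketch: the helper names python_is_supported and in_virtualenv are ours, and the 3.7 minimum is the one mentioned in this step.

```python
import sys

MINIMUM_PYTHON = (3, 7)  # the oldest version mentioned in this tutorial


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is at least Python 3.7."""
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def in_virtualenv():
    """Return True when running inside an environment made by `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the system installation.
    return sys.prefix != sys.base_prefix
```

Calling in_virtualenv() after running the activate command should return True; if it returns False, the packages we install next would land outside the project.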

Step 4 - Install Your Python Libraries and Packages using pip

Now we are ready to install Deepgram using pip. Make sure your virtual environment is activated and run the following command:

pip install deepgram-sdk

This allows us to use the Deepgram speech-to-text Python SDK for transcription, and tap into the features we mentioned earlier.

To verify that Deepgram was installed correctly, from the terminal type:

pip freeze

We should see deepgram-sdk in the list, which confirms that the package from PyPI is installed and ready for use.
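The same check can be done from Python itself. The sketch below uses the standard library's importlib.util.find_spec to ask whether a module can be imported; is_installed is a helper name we made up for this example, and deepgram is the import name used by the code later in this tutorial.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None
```

With the virtual environment active, is_installed("deepgram") should return True once the pip install has finished.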

Step 5 - How to Transcribe the Audio File in Python with Voice

We’ll use Deepgram’s prerecorded transcription for this voice note-taking project. This type of transcription works on an audio file, either stored locally on your drive or hosted online. In this tutorial, we’ll transcribe a local audio file, but with this AI speech recognition provider it’s very simple to do both. Let’s see how we transcribe an audio file either as a local download or as an online file.

Transcribe a Local Audio File with Python

from deepgram import Deepgram
import json

# Replace the placeholder string with your own API key
DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'
PATH_TO_FILE = 'some/file.wav'

def main():
    # Initializes the Deepgram SDK
    deepgram = Deepgram(DEEPGRAM_API_KEY)

    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        # ...or replace mimetype as appropriate
        source = {'buffer': audio, 'mimetype': 'audio/wav'}
        response = deepgram.transcription.sync_prerecorded(source, {'punctuate': True})
        print(json.dumps(response, indent=4))

main()
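The comment in the local-file example says to replace the mimetype as appropriate. A small lookup keyed on the file extension is one way to do that. This sketch only covers the two mimetype strings that appear in this tutorial, and both the MIMETYPES table and the mimetype_for helper are hypothetical names of ours:

```python
import os

# Only the two mimetype strings used in this tutorial; extend as needed.
MIMETYPES = {".wav": "audio/wav", ".mp3": "audio/mp3"}


def mimetype_for(path):
    """Pick a mimetype string based on the audio file's extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in MIMETYPES:
        raise ValueError(f"No mimetype known for extension {extension!r}")
    return MIMETYPES[extension]
```

The source dictionary could then be built as {'buffer': audio, 'mimetype': mimetype_for(PATH_TO_FILE)}.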

Transcribe a Hosted Online Audio File with Python

from deepgram import Deepgram
import json

# The API key we created in step 1
DEEPGRAM_API_KEY = 'YOUR_API_KEY_GOES_HERE'

# Hosted sample file
AUDIO_URL = "{YOUR_URL_TO_HOSTED_ONLINE_AUDIO_GOES_HERE}"

def main():
    # Initializes the Deepgram SDK
    dg_client = Deepgram(DEEPGRAM_API_KEY)
    source = {'url': AUDIO_URL}
    options = { "punctuate": True, "model": "general", "language": "en-US", "tier": "enhanced" }
    response = dg_client.transcription.sync_prerecorded(source, options)
    print(json.dumps(response, indent=4))

main()
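Because the hosted example only needs a URL, it can help to validate that URL before sending a request. The following sketch uses urllib.parse from the standard library; url_source is a helper name we introduce here, and the http/https restriction is our own assumption about what a hosted file looks like.

```python
from urllib.parse import urlparse


def url_source(location):
    """Return a source dictionary for a hosted audio file, or raise ValueError."""
    parsed = urlparse(location)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a hosted audio URL: {location!r}")
    return {"url": location}
```

This catches the case where the placeholder text or a local path such as some/file.wav is passed by mistake.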

Step 6 - Using Speech-to-Text Features to Enhance Notetaking with Voice in Python

Now that we have an idea of what our Python code looks like, let’s see an example with the diarization and summarization features. In the same function as above, we pass those features as keys in a Python dictionary and set their values to True, like so:

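    with open(PATH_TO_FILE, 'rb') as audio:
        # This example uses an MP3 file; match the mimetype to your audio
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        # Turn on speaker labels and summaries
        options = {'diarize': True, 'summarize': True}
        response = deepgram.transcription.sync_prerecorded(source, options)
        print(json.dumps(response, indent=4))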

Final Step - Run the Python Voice Note-Taking Project and Export the Results

We’ve reached the final step! In this step, we need to run the Python project so we can see our JSON response with the transcript split into multiple speakers and summaries.

From our terminal type:

python3 take_voice_notes.py > notes.txt

This runs our project and outputs a file called notes.txt, which is now in our directory.
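Redirecting the output in the shell is one option. If we would rather write the file from Python, a sketch like the following does the same job with json.dump. The save_notes helper and its default file name are our own choices for illustration:

```python
import json
import os


def save_notes(response, path="notes.txt"):
    """Write the response as indented JSON and return the absolute file path."""
    with open(path, "w", encoding="utf-8") as notes_file:
        json.dump(response, notes_file, indent=4)
        notes_file.write("\n")
    return os.path.abspath(path)
```

Calling save_notes(response) in place of the print statement would produce the same notes.txt without the shell redirect.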

Open the file and we see a JSON response that looks like the following, depending on which audio file was transcribed:

"alternatives": [
                    {
                        "transcript": "Hello, and thank you for being in this meeting...",
                        "confidence": 0.9916992,
                        "words": [
                            {
                                "word": "hello",
                                "start": 15.259043,
                                "end": 15.338787,
                                "confidence": 0.95751953,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "Hello,"
                            },
                            {
                                "word": "and",
                                "start": 15.418532,
                                "end": 15.617893,
                                "confidence": 0.99853516,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "and"
                            },
                            {
                                "word": "thank",
                                "start": 15.617893,
                                "end": 15.777383,
                                "confidence": 0.9975586,
                                "speaker": 1,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "thank"
                            },
                            {
                                "word": "you",
                                "start": 15.777383,
                                "end": 15.9368725,
                                "confidence": 0.9975586,
                                "speaker": 0,
                                "speaker_confidence": 0.76544046,
                                "punctuated_word": "you"
                            }
                        ],
                        "summaries": [
                            {
                                "summary": "Hello, and thank you for calling premier phone service. Please be aware that this call may be recorded for quality and training purposes. How may I help you today? I'm having some serious problem with my phone. Can you describe in detail for me? What kind of issues you're having with your device? Well, it isn't working.",
                                "start_word": 0,
                                "end_word": 649
                            },
                            {
                                "summary": "My phone won't turn on. I don't know what's wrong. My dad said I should get a new phone, but I didn't listen to him. I also never backed up my photos on the cloud like I know I should.",
                                "start_word": 649,
                                "end_word": 1288
                            }
                        ]
                    }
]


We received the transcript. Each word in it is assigned a speaker, and the summaries of the transcript appear at the end of the response.
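Raw JSON is not very pleasant to read as notes, so here is a rough sketch of how the per-word speaker labels and the summaries could be turned into plain text. It works on one entry of the "alternatives" list shown above and relies only on the keys visible in that sample ("words", "word", "punctuated_word", "speaker", "summaries", "summary"). The function names speaker_turns and format_notes are hypothetical.

```python
def speaker_turns(words):
    """Merge consecutive words from the same speaker into (speaker, text) pairs."""
    turns = []
    for word in words:
        # Prefer the punctuated form when the response includes it
        text = word.get("punctuated_word", word["word"])
        if turns and turns[-1][0] == word["speaker"]:
            turns[-1][1].append(text)
        else:
            turns.append((word["speaker"], [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]


def format_notes(alternative):
    """Build readable notes: one line per speaker turn, then the summaries."""
    lines = [
        f"Speaker {speaker}: {text}"
        for speaker, text in speaker_turns(alternative.get("words", []))
    ]
    for number, item in enumerate(alternative.get("summaries", []), start=1):
        lines.append(f"Summary {number}: {item['summary']}")
    return "\n".join(lines)
```

Where the alternative sits inside the full response is an assumption on our part: in the responses we have seen, it is reachable as response["results"]["channels"][0]["alternatives"][0], but check your own notes.txt to confirm the path before relying on it.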

Conclusion of the Python Voice Note-taking Project with Speech Recognition

We’ve learned how to transcribe audio and take notes with voice in Python using an AI speech-to-text provider.

There are many ways to extend this project by using some of Deepgram's other features, like redaction, which hides sensitive information such as credit card numbers or social security numbers, or the search feature, which searches a transcript for terms and phrases. For a full list of all the features, please visit this page.

If you have any feedback about this post, or anything else around Deepgram, we'd love to hear from you. Please let us know in our GitHub discussions.

Top comments (2)

ZaydanVeo • Edited

As someone who enjoys exploring different ways to streamline workflows, this project definitely caught my attention. It's great to see how Deepgram's speech-to-text features like diarization and summarization can enhance the note-taking process. The ability to transcribe audio files and generate speaker-assigned transcripts is particularly useful.
I'm intrigued to try this out and see how it can simplify my own note-taking tasks. Plus, the option to export the results as a JSON file is convenient for further analysis or integration with other tools.
For more creative ways to enhance your note-taking experience, you might want to check out online notes. They offer a platform for creating and sending anonymous notes, which can be a fun and unique way to express yourself.

OlenVinera

Thanks for sharing this tutorial. It's fascinating how AI speech-to-text technology can be used for transcription and note-taking purposes. This step-by-step guide seems rather user-friendly.