Eden AI

Posted on • Originally published at edenai.co

How to transcribe long audio files?

Using Eden AI for long audio transcription

Audio files are often encountered in various applications, ranging from podcasts and interviews to recordings of lectures or meetings. Nevertheless, dealing with long audio files can be challenging when the objective is to transcribe or process specific segments of the content. This is where Eden AI comes into play.

In this tutorial, we will guide you through the process of splitting long audio files into smaller chunks, generating text transcriptions, and concatenating the resulting text. Let’s get started.

Prerequisites:

Ensure that you have the following requirements in place beforehand:

  1. A valid API key from Eden AI.
  2. Python installed on your system.
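  3. The necessary Python libraries: requests and pydub (pydub.silence is a module within pydub, not a separate package). pydub also needs ffmpeg installed to read MP3 files.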
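
Before moving on, a quick check can confirm that these prerequisites are in place. The following is only a sketch and not part of the original tutorial: the helper name `missing_prerequisites` is hypothetical, and it assumes that pydub will look for an `ffmpeg` executable on the PATH when it decodes MP3 files.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("requests", "pydub"), tools=("ffmpeg",)):
    """Return the names of Python modules and command-line tools that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

An empty list means the libraries and the decoder were found; it does not validate the API key.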

Step 1: Import the Required Libraries

To start with, let’s import the necessary libraries to access the Eden AI API and handle audio processing. Open your Python environment or IDE and import the following libraries:

import json
import requests
from pydub import AudioSegment
from pydub.silence import split_on_silence

Step 2: Set Up the API Key and Audio File URL

Next, we need to set up the API key and specify the URL of the audio file that you want to split. To get your API key, you’ll need to create an account on Eden AI:

Get your API key for FREE

Update the following variables with your API key and audio file URL:

# Replace with your API key and audio file URL
api_key = "YOUR_API_KEY"
audio_file_url = "AUDIO_FILE_URL"
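
If you would rather not paste the key into the script, one option is to read it from an environment variable. This is a minimal sketch under assumptions: the helper `load_api_key` and the variable name `EDENAI_API_KEY` are hypothetical choices, not something Eden AI prescribes.

```python
import os


def load_api_key(env_var="EDENAI_API_KEY"):
    """Read the API key from an environment variable instead of the source file."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Eden AI API key")
    return key
```

With the variable set in your shell, `api_key = load_api_key()` can replace the hard-coded string, which keeps the key out of version control and shared snippets.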

Step 3: Download and Prepare the Audio File

In this step, we will download the long audio file from the specified URL and prepare it for further processing. Add the following code:

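# Download the audio file
response = requests.get(audio_file_url)
response.raise_for_status()  # stop here if the download failed
with open("temp_audio_file.mp3", "wb") as file:
    file.write(response.content)

# Load the downloaded file (pydub needs ffmpeg to decode MP3)
audio = AudioSegment.from_file("temp_audio_file.mp3", format="mp3")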
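
For very long recordings, holding the whole response in memory may be wasteful. The sketch below streams the download to disk instead. The helper name `download_audio`, the 60-second timeout, and the block size are assumptions chosen for illustration; the optional `session` argument only exists so that the function can be exercised with a stand-in object.

```python
def download_audio(url, dest_path, session=None, chunk_size=1 << 16):
    """Stream a remote file to disk and return the number of bytes written."""
    if session is None:
        import requests  # imported lazily so the helper can be tested with a stand-in
        session = requests
    written = 0
    # stream=True avoids holding the whole recording in memory
    with session.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as out:
            for block in response.iter_content(chunk_size=chunk_size):
                out.write(block)
                written += len(block)
    if written == 0:
        raise ValueError(f"No data received from {url}")
    return written
```

Calling `download_audio(audio_file_url, "temp_audio_file.mp3")` would then stand in for the `requests.get` and `file.write` lines of this step.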

Step 4: Split the Audio File into Chunks

Now, let’s split the audio file into smaller chunks based on periods of silence. We’ll use the split_on_silence function from the pydub.silence module. Include the following code:

# Split the audio into chunks wherever there is at least 500 ms of silence below -40 dBFS
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40)

Step 5: Define the Transcription Function

To transcribe each audio chunk, we need to define a function that utilizes the Eden AI API. Add the following code:

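# Function to transcribe an audio chunk
def transcribe_audio_chunk(chunk, index):
    # Export the chunk to a temporary MP3 file
    chunk_path = f"temp_chunk_{index}.mp3"
    chunk.export(chunk_path, format="mp3")

    url = "https://api.edenai.run/v2/audio/speech_to_text_async"
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {
        "providers": "google,amazon",
        "language": "en-US",
    }

    # Assumption: the endpoint accepts the exported chunk as a multipart "file" upload.
    # If you host the chunks online, pass a public "file_url" in the payload instead.
    with open(chunk_path, "rb") as audio_file:
        response = requests.post(url, data=payload, files={"file": audio_file}, headers=headers)
    response.raise_for_status()
    result = json.loads(response.text)

    # Assumption: the response carries the Google transcription under these keys.
    # An asynchronous endpoint may instead return a job ID that has to be polled.
    return result["result"]["google"]["transcription"]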
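
The endpoint name ends in `_async`, which suggests that the transcription may not always be available in the immediate response. If that turns out to be the case for your account, a generic polling loop is one way to wait for the result. This is a sketch only: `poll_until_done`, `fetch_status`, and `is_done` are hypothetical names, and both callables are left to the caller because the response schema is not shown in this tutorial.

```python
import time


def poll_until_done(fetch_status, is_done, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("The transcription job did not finish in time")
        sleep(interval)
```

Here `fetch_status` would wrap whatever request returns the job's current state, and `is_done` would inspect that state; consult the Eden AI documentation for the actual fields.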

Step 6: Transcribe and Concatenate Text

In this final step, we will transcribe each audio chunk and concatenate the resulting text. Add the following code:

# Transcribe each chunk and concatenate the text
transcribed_text = ""
for index, chunk in enumerate(chunks):
    text = transcribe_audio_chunk(chunk, index)
    transcribed_text += " " + text

print(transcribed_text)
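
Some chunks may come back empty or with stray whitespace, for example when a chunk contains only noise. As an optional refinement, the concatenation can be done with a small helper. The name `join_transcripts` is hypothetical, and the sketch assumes each transcript is a string or `None`.

```python
def join_transcripts(pieces):
    """Join per-chunk transcripts with single spaces, skipping empty results."""
    # Collapse runs of whitespace inside each piece, and drop None or empty strings
    cleaned = (" ".join(piece.split()) for piece in pieces if piece)
    return " ".join(text for text in cleaned if text)
```

For instance, `join_transcripts(transcribe_audio_chunk(c, i) for i, c in enumerate(chunks))` produces the same text as the loop in this step, without the leading space.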

Best Practices for Working with Audio Files

1. Audio File Format and Quality

Ensure that your audio file is in a compatible format supported by the Eden AI API. Commonly used formats include MP3, WAV, FLAC, and OGG. Additionally, consider the quality of the audio file. Higher-quality recordings generally yield better transcription results.

2. Pre-processing and Noise Reduction

Before splitting your audio file, consider applying pre-processing techniques to improve transcription accuracy. This includes reducing background noise, normalizing audio levels, and enhancing speech clarity. The pydub library covers the basics, such as level normalization and high-pass or low-pass filtering; heavier noise reduction usually calls for a dedicated tool.

3. Optimal Chunk Size

Choose an appropriate chunk size based on your specific requirements. Smaller chunks allow for more granular processing but may increase API usage and processing time. Larger chunks reduce API calls but may result in longer transcriptions or lower accuracy for sections with significant background noise or overlapping speech. Experiment with different chunk sizes to find the balance that suits your needs.
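
Splitting on silence alone does not control chunk length, so a recording with many short pauses can produce a large number of tiny chunks. One way to steer the chunk size is to merge neighbouring chunks up to a maximum length. The helper `merge_chunks` below is a hypothetical sketch; it relies only on `len()` and `+`, which pydub's AudioSegment supports (`len()` gives the duration in milliseconds and `+` concatenates).

```python
def merge_chunks(chunks, max_len):
    """Greedily combine consecutive chunks so each merged piece stays within max_len.

    Works with any objects that support len() and +.
    A single chunk longer than max_len is kept as-is rather than cut.
    """
    merged = []
    current = None
    for chunk in chunks:
        if current is None:
            current = chunk
        elif len(current) + len(chunk) <= max_len:
            current = current + chunk
        else:
            merged.append(current)
            current = chunk
    if current is not None:
        merged.append(current)
    return merged
```

With audio segments, `chunks = merge_chunks(chunks, max_len=60_000)` would aim for pieces of up to about one minute; that figure is an arbitrary starting point to experiment with, not a recommendation from Eden AI.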

4. Silence Threshold and Minimum Silence Length

The split_on_silence function takes a silence threshold (silence_thresh, in dBFS) and a minimum silence length (min_silence_len, in milliseconds). Adjust these values according to the characteristics of your audio file. A higher (less negative) threshold treats louder passages as silence and so produces more splits, while a lower threshold only splits on very quiet passages. A shorter minimum silence length also leads to more frequent splits. Fine-tune both parameters until the chunks break at natural pauses.
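
To get a feel for what a threshold such as -40 means, it can help to convert decibels relative to full scale (dBFS) into a fraction of the maximum amplitude. The two helper names below are hypothetical; the formulas are the standard amplitude conversions.

```python
import math


def dbfs_to_amplitude_ratio(dbfs):
    """Convert a level in dBFS to a fraction of full-scale amplitude."""
    return 10 ** (dbfs / 20)


def amplitude_ratio_to_dbfs(ratio):
    """Convert a fraction of full-scale amplitude (0 < ratio <= 1) to dBFS."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    for level in (-16, -40, -60):
        print(f"{level} dBFS is {dbfs_to_amplitude_ratio(level):.4%} of full scale")
```

A threshold of -40 dBFS therefore corresponds to about 1% of full-scale amplitude. Some people set the threshold relative to the recording's own loudness, for example `silence_thresh=audio.dBFS - 16`; the offset of 16 is an arbitrary value to tune, not a rule.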

5. Error Handling and Retry Mechanisms

When making API calls to Eden AI, implement error handling and a retry mechanism. Network disruptions or API limits can cause intermittent failures, so check each response for errors and retry failed requests a limited number of times, ideally with a growing delay between attempts.
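
A retry with exponential backoff could look like the sketch below. The helper `call_with_retries` and its defaults (four attempts, delays of 1, 2, and 4 seconds) are assumptions for illustration; the `sleep` parameter is injectable so that the function can be tested without waiting.

```python
import time


def call_with_retries(func, attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))
```

In this tutorial it could wrap the transcription call, for example `call_with_retries(lambda: transcribe_audio_chunk(chunk, index), retry_on=(requests.RequestException,))`, so that only network-level errors are retried.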

Note: As a good practice, make sure to clean up any temporary files generated during the process.
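
One way to make the cleanup automatic is to write the chunk files into a temporary directory that Python removes for you. The helper `process_chunks_in_temp_dir` and its `process_file` callback are hypothetical names; the sketch assumes each chunk has the same `export(path, format=...)` method used in Step 5.

```python
import os
import tempfile


def process_chunks_in_temp_dir(chunks, process_file):
    """Export each chunk into a temporary directory, call process_file(path), then clean up."""
    results = []
    # The directory and everything in it are deleted when the with-block exits,
    # even if process_file raises.
    with tempfile.TemporaryDirectory(prefix="audio_chunks_") as tmp_dir:
        for index, chunk in enumerate(chunks):
            path = os.path.join(tmp_dir, f"chunk_{index}.mp3")
            chunk.export(path, format="mp3")
            results.append(process_file(path))
    return results
```

Here `process_file` would be a function that uploads one file and returns its transcript, so no `temp_chunk_*.mp3` files are left behind in the working directory.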

Complete Code

Handling long audio files can be a complex task, especially when you need to extract specific sections for transcription or further processing. With the knowledge gained from this tutorial, you are now equipped to tackle the challenge of working with long audio files.

Remember to handle your API key securely, and consider optimizing your audio files by choosing a compatible format and applying pre-processing techniques such as noise reduction.

import json
import requests
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Replace with your API key and audio file URL
api_key = "YOUR_API_KEY"
# Example share link; replace it with a direct download URL, because a Google Drive
# "view" link returns a web page rather than the audio file itself.
audio_file_url = "https://drive.google.com/file/d/1i9EmD2AGn7VzQKYJZWe14cPQojY6qSBy/view?usp=share_link"

# Download the audio file
response = requests.get(audio_file_url)
response.raise_for_status()  # stop here if the download failed
with open("temp_audio_file.mp3", "wb") as file:
    file.write(response.content)

# Load the audio file and split it into chunks
audio = AudioSegment.from_file("temp_audio_file.mp3", format="mp3")
chunks = split_on_silence(audio, min_silence_len=500, silence_thresh=-40)

# Function to transcribe an audio chunk
def transcribe_audio_chunk(chunk, index):
    chunk_path = f"temp_chunk_{index}.mp3"
    chunk.export(chunk_path, format="mp3")

    url = "https://api.edenai.run/v2/audio/speech_to_text_async"
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {
        "providers": "google,amazon",
        "language": "en-US",
    }

    # Assumption: the endpoint accepts the exported chunk as a multipart "file" upload.
    # If you host the chunks online, pass a public "file_url" in the payload instead.
    with open(chunk_path, "rb") as audio_file:
        response = requests.post(url, data=payload, files={"file": audio_file}, headers=headers)
    response.raise_for_status()
    result = json.loads(response.text)

    # Assumption: the response carries the Google transcription under these keys.
    # An asynchronous endpoint may instead return a job ID that has to be polled.
    return result["result"]["google"]["transcription"]

# Transcribe each chunk and concatenate the text
transcribed_text = ""
for index, chunk in enumerate(chunks):
    text = transcribe_audio_chunk(chunk, index)
    transcribed_text += " " + text

print(transcribed_text)

Create your Account on Eden AI
