Automatically transcribe your video files uploaded to S3 using Amazon Transcribe

#aws #s3 #lambda #serverless

Overview
Imagine you have a collection of files, and you want to make changes to them automatically as they're uploaded without any manual intervention.

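For example, what if you wanted to resize images, mask personal data in documents, or convert a file to another format? With S3 event notifications and AWS Lambda, you can process these files automatically as soon as they're uploaded to the bucket. (S3 Object Lambda, by contrast, transforms objects as they are retrieved from the bucket, not as they are uploaded.)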

Auto Transcriber
Now, let's dive into the project I've built: a pipeline that transcribes audio and video files uploaded to an S3 bucket, using S3 event notifications, a Lambda function, and Amazon Transcribe.

This is the architecture diagram of the project

Prerequisites

AWS Account - If you don't already have one, you can sign up for a free account.

That's it. Everything else can be managed via the AWS Console.

Here’s how it works:

Uploading the Video: You start by uploading your video (like an interview or a lecture) into an S3 bucket. Think of this as a cloud folder where all your files are stored.
Triggering the Transcription: As soon as the video is uploaded, the bucket sends an event notification that invokes a Lambda function. This function then tells another service, called Amazon Transcribe, to listen to the video’s audio and convert everything spoken into text.
Saving the Transcript: Once the transcription is done, the result is saved in a designated location as a JSON file. This file holds the full text, along with useful information like timestamps for when each word was spoken.
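
To make the three steps above more concrete, here is a minimal Python sketch of what the Lambda function and a small transcript reader might look like. It is not the exact code from this project. The helper names (`build_job_request`, `words_with_timestamps`), the `TRANSCRIPT_BUCKET` environment variable, the fallback bucket name, and the `en-US` language code are all assumptions made for illustration. The sketch also assumes transcripts go to a separate bucket from the uploads, so the saved JSON file does not trigger the function again.

```python
import os
import re
import urllib.parse
import uuid

# Media formats accepted by Amazon Transcribe's MediaFormat parameter.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}

# Assumption: the transcript bucket name comes from a hypothetical
# TRANSCRIPT_BUCKET environment variable set on the Lambda function.
OUTPUT_BUCKET = os.environ.get("TRANSCRIPT_BUCKET", "my-transcripts-bucket")


def build_job_request(record, output_bucket, language_code="en-US"):
    """Turn one S3 event record into StartTranscriptionJob arguments.

    Returns None when the object's extension is not a supported media format.
    """
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events (spaces become '+').
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    stem, _, extension = key.rpartition(".")
    extension = extension.lower()
    if not stem or extension not in SUPPORTED_FORMATS:
        return None

    # Job names may only contain letters, digits, '.', '_' and '-', and
    # must be unique, so sanitize the key and append a random suffix.
    safe_stem = re.sub(r"[^0-9a-zA-Z._-]", "-", stem)[:150]
    job_name = f"{safe_stem}-{uuid.uuid4().hex[:8]}"

    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": extension,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
    }


def lambda_handler(event, context):
    """Step 2: start one transcription job per uploaded media file."""
    import boto3  # bundled with the Lambda Python runtime

    transcribe = boto3.client("transcribe")
    started = []
    for record in event.get("Records", []):
        request = build_job_request(record, OUTPUT_BUCKET)
        if request is None:
            continue  # not a media file we can transcribe
        transcribe.start_transcription_job(**request)
        started.append(request["TranscriptionJobName"])
    return {"started_jobs": started}


def words_with_timestamps(transcript):
    """Step 3: list (word, start, end) tuples from a parsed transcript JSON."""
    words = []
    for item in transcript["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # punctuation items carry no timestamps
        content = item["alternatives"][0]["content"]
        words.append((content, float(item["start_time"]), float(item["end_time"])))
    return words
```

When `OutputBucketName` is set and no output key is given, Amazon Transcribe saves the result as a JSON file named after the job in that bucket. Once you download that file and parse it with `json.loads`, `words_with_timestamps` gives you each spoken word with its start and end time in seconds, and the full text is available under `results.transcripts[0].transcript`.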