Introduction
If you've ever wanted to quickly and accurately turn a Zoom meeting, or any other kind of speech, into text, then Assembly AI is the API you need. Today I will cover how to build a simple backend API that handles mp3 file uploads and converts them into PDF files containing the transcript. As a bonus, I will also show sign-in and sign-up functionality.
What is Assembly AI?
"AssemblyAI is a top rated API for speech recognition, trusted by startups and global enterprises in production" - Assembly AI Website
It is very simple to get started with turning speech to text, and you can do it in just 2 minutes here: https://docs.assemblyai.com/overview/getting-started
You can get your API key here: https://app.assemblyai.com/login/
Note: You are limited to 3 hours of processing time for the month with this API.
Backend Stack
The following technologies will be used to build our backend.
- PostgreSQL
- Node.js
- Express
- Prisma ORM
- Bcrypt
- JWT
- pdfkit
Requirements
You will need PostgreSQL on your system. I use this software: PostgreSQL
Once PostgreSQL is installed, create the database and the user with the following commands:
$ createdb zoom-summarizer
$ createuser -P -s -e zoom_summarizer_user
Next, clone my express-prisma-starter to have the same code: Code Starter
Create a .env file inside the repo, and include this so that Prisma knows the database to connect to.
DATABASE_URL = 'postgresql://zoom_summarizer_user@localhost:5432/zoom-summarizer'
Lastly, install the dependencies and run the migration to set up the tables.
$ npm i
$ npx prisma migrate dev --name init
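The snippets in the rest of this post read several settings from process.env. As a rough sketch, you could check at startup that they are all present. The variable names are the ones used in the code throughout this tutorial; the helper name findMissingEnv is my own and is not part of the starter repo.

```javascript
// Environment variables read by the code in this tutorial.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "SALT_ROUNDS",
  "JWT_SECRET",
  "JWT_EXPIRE_TIME",
  "AWS_S3_BUCKET_NAME",
  "ASSEMBLYAI_API_URL",
  "ASSEMBLYAI_API_KEY",
];

// Hypothetical helper: return the names that are unset or blank.
const findMissingEnv = (env, required = REQUIRED_ENV) =>
  required.filter((name) => !env[name] || String(env[name]).trim() === "");

// Warn early instead of failing later inside a request handler.
const missing = findMissingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

Running this once when the server starts makes a forgotten .env entry show up immediately rather than as a confusing error on the first request.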
Development
If you want to skip to the point where we use the Assembly AI API, click here
Sign Up
We will start off with the sign-up endpoint, where we will collect a name, email and password. Don't worry, we are of course going to hash the password.
Inside your source folder, create a new folder called db, with a file called db.js. In here, we will have all database calls. We are doing this to decouple the data layer from the business logic and routes.
- Add create user CRUD in db.js
const { PrismaClient } = require("@prisma/client");
const prisma = new PrismaClient();

// CREATE
const createUser = async (email, password, name) => {
  const result = await prisma.user.create({
    data: {
      email,
      password,
      name,
    },
  });
  return result;
};

module.exports = {
  createUser,
};
- Add post route for sign up in index.js
const db = require("./db/db");
const bcrypt = require("bcrypt");
const jwtService = require("jsonwebtoken");
const express = require("express");
const app = express();

app.use(express.json());

app.get(`/`, async (req, res) => {
  res.json({ success: true, data: "Hello World!" });
});

app.post("/signup", async (req, res) => {
  const { email, password, name } = req.body;
  if (!email || !password || !name) {
    res.status(400).json({
      success: false,
      error: "Email, password and name are required.",
    });
    return;
  }
  try {
    // hash password
    const salt = await bcrypt.genSalt(Number(process.env.SALT_ROUNDS));
    const passwordHash = await bcrypt.hash(password, salt);
    // create user
    const response = await db.createUser(email, passwordHash, name);
    res.json({ success: true, data: response });
  } catch (e) {
    console.log(e);
    res.status(409).json({
      success: false,
      error: "Email account already registered.",
    });
  }
});
To test, hit http://localhost:3001/signup with a POST request with the body:
{
  "email": "memo@memo.com",
  "password": "123",
  "name": "Guillermo"
}
And that's it for the sign-up endpoint! Pretty straightforward. We use bcrypt to hash the password, with the number of salt rounds read from the SALT_ROUNDS environment variable, so remember to add it to your .env file. This was a quick implementation; if you want to take this to production, you should use a more robust authentication solution.
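One thing to be aware of: the catch block in the sign-up route answers 409 for every error, including a database outage. Here is a minimal sketch of a more precise mapping. It assumes that Prisma reports unique-constraint violations with the error code P2002 and that email is a unique column; the helper name signupErrorResponse is hypothetical.

```javascript
// Hypothetical helper: choose an HTTP response for an error thrown by db.createUser.
// Assumption: Prisma sets `code` to "P2002" on unique-constraint violations.
const signupErrorResponse = (e) => {
  if (e && e.code === "P2002") {
    return {
      status: 409,
      body: { success: false, error: "Email account already registered." },
    };
  }
  // Anything else is treated as an unexpected server-side failure.
  return {
    status: 500,
    body: { success: false, error: "Could not create the account." },
  };
};

// Usage inside the catch block of the /signup route:
//   const { status, body } = signupErrorResponse(e);
//   res.status(status).json(body);
```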
Sign In
Now that we can register users, it's time to log them in. We will be using JWTs (JSON Web Tokens) to keep track of sessions. This is not the most secure approach (a refresh-token scheme, for example, would be stronger), but it will do for this tutorial.
We're going to create another folder inside src, called lib. Here we are going to put any code dealing with jwt, aws and pdfkit.
Create the folder lib and the file jwt.js
- lib/jwt.js
const jwt = require("jsonwebtoken");

const getJWT = async (id, email) => {
  try {
    return jwt.sign(
      {
        email,
        id,
      },
      process.env.JWT_SECRET,
      {
        expiresIn: Number(process.env.JWT_EXPIRE_TIME),
      }
    );
  } catch (e) {
    throw new Error(e.message);
  }
};

const authorize = (req, res, next) => {
  // middleware to check if user is logged in
  try {
    const token = req.headers.authorization.split(" ")[1];
    jwt.verify(token, process.env.JWT_SECRET);
    next();
  } catch (error) {
    res.status(401).json({ success: false, error: "Authentication failed." });
  }
};

module.exports = {
  getJWT,
  authorize,
};
Here, getJWT will give us a token for the frontend to store, and authorize is a middleware we will be using in protected routes to make sure a user is logged in.
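Because authorize reads the token with split(" ")[1], it expects the Authorization header in the form "Bearer <token>". As a small illustration (both helper names are hypothetical and not part of the project), this is how a client could build that header, and how the server side could extract the token without relying on an exception when the header is missing:

```javascript
// Client side: build the headers for a request to a protected route.
const buildAuthHeaders = (token) => ({
  Authorization: `Bearer ${token}`,
});

// Server side (hypothetical helper): return the token, or null when the
// header is missing or not in the "Bearer <token>" form.
const extractBearerToken = (authorizationHeader) => {
  if (typeof authorizationHeader !== "string") return null;
  const [scheme, token] = authorizationHeader.trim().split(/\s+/);
  if (scheme !== "Bearer" || !token) return null;
  return token;
};
```

In the middleware above, a missing header makes split throw, and the catch block turns that into a 401. The explicit check does the same job but makes the intent easier to read.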
Next, replace this line on top of the index.js file:
const jwtService = require("jsonwebtoken");
With this line:
const jwtLib = require("./lib/jwt");
Now we need to get a user by the email they entered, in order to compare passwords.
Add this function to db.js:
db.js
// READ
const getSingleUserByEmail = async (email) => {
  const user = await prisma.user.findFirst({
    where: { email },
  });
  return user;
};

module.exports = {
  createUser,
  getSingleUserByEmail,
};
To finish off this sign-in endpoint, we will create a post route for it inside of index.js
index.js
app.post("/signin", async (req, res) => {
  const { email, password } = req.body;
  if (!email || !password) {
    res
      .status(400)
      .json({ success: false, error: "Email and password are required." });
    return;
  }
  try {
    // Find user record
    const user = await db.getSingleUserByEmail(email);
    if (!user) {
      res.status(401).json({ success: false, error: "Authentication failed." });
      return;
    }
    // securely compare passwords
    const match = await bcrypt.compare(password, user.password);
    if (!match) {
      res.status(401).json({ success: false, error: "Authentication failed." });
      return;
    }
    // get jwt
    const jwtToken = await jwtLib.getJWT(user.id, user.email);
    // send jwt and user id to store in local storage
    res
      .status(200)
      .json({ success: true, data: { jwt: jwtToken, id: user.id } });
  } catch (e) {
    console.log(e);
    res.status(500).json({
      success: false,
      error: `Authentication failed.`,
    });
  }
});
Upload & Audio Processing
Now we finally get to the part where we use the Assembly AI API in order to get a transcript of our mp3 files!
First, we will upload our files to S3 so that the Assembly AI API has a place to pull the audio from.
Inside of src/lib, create a new file called aws.js.
aws.js
const AWS = require("aws-sdk");

// declare s3 with const so it does not leak as an implicit global
const s3 = new AWS.S3({ apiVersion: "2006-03-01" });

const uploadFile = async (file) => {
  const params = {
    Bucket: process.env.AWS_S3_BUCKET_NAME,
    Key: file.name,
    Body: file.data,
  };
  try {
    const stored = await s3.upload(params).promise();
    return stored;
  } catch (e) {
    console.log(e);
    throw new Error(e.message);
  }
};

module.exports = {
  uploadFile,
};
This code will take care of our s3 file uploads.
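Note that uploadFile uses file.name as the S3 key, so two uploads with the same file name would overwrite each other in the bucket. A minimal sketch of one way around that, assuming you are happy to store files under a per-user prefix (the helper name buildS3Key and the key layout are my own choices, not something the project prescribes):

```javascript
const { randomUUID } = require("crypto");
const path = require("path");

// Hypothetical helper: build a unique S3 key for an uploaded file.
const buildS3Key = (userId, fileName) => {
  const originalExt = path.extname(fileName); // e.g. ".MP3"
  const base = path
    .basename(fileName, originalExt)
    .replace(/[^a-zA-Z0-9_-]/g, "_"); // keep the key URL-friendly
  return `users/${userId}/${randomUUID()}-${base}${originalExt.toLowerCase()}`;
};

// Example: pass the generated key instead of file.name when building params.
console.log(buildS3Key(7, "Team Meeting.MP3"));
```

The random UUID keeps keys unique, while the sanitized file name keeps them recognizable when browsing the bucket.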
Next we will create the last library file called pdf.js inside lib. Here we will take care of turning the text from the Assembly AI API into a nice pdf format.
pdf.js
const PDF = require("pdfkit");

const generatePdf = (title, text, terms, res) => {
  const pdf = new PDF({ bufferPages: true });
  let buffers = [];
  pdf.on("data", buffers.push.bind(buffers));
  pdf.on("end", () => {
    let pdfData = Buffer.concat(buffers);
    res
      .writeHead(200, {
        "Content-Length": Buffer.byteLength(pdfData),
        "Content-Type": "application/pdf",
        "Content-disposition": `attachment;filename=${title}.pdf`,
      })
      .end(pdfData);
  });

  pdf.font("Times-Roman").fontSize(20).text(title, {
    align: "center",
    paragraphGap: 20,
  });

  pdf.font("Times-Roman").fontSize(12).text(text, {
    lineGap: 20,
  });

  if (terms) {
    const termsArr = terms.results.sort((a, b) => b.rank - a.rank);
    const cleanedTerms = termsArr.map((term) => term.text);
    pdf.font("Times-Roman").fontSize(16).text("Key Terms", {
      align: "center",
      paragraphGap: 20,
    });
    pdf
      .font("Times-Roman")
      .fontSize(12)
      .list(cleanedTerms, { listType: "numbered" });
  }

  pdf
    .fillColor("gray")
    .fontSize(12)
    .text(
      "Transcript provided by AssemblyAI ",
      pdf.page.width - 200,
      pdf.page.height - 25,
      {
        lineBreak: false,
        align: "center",
      }
    );

  pdf.end();
};

module.exports = {
  generatePdf,
};
The format of the PDF is really up to you; this one is a basic paragraph followed by a list of key terms.
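To make the key-terms step concrete, here is a small standalone sketch of the same sort-and-map logic. The sample object is made up for illustration; it only mimics the fields that generatePdf reads (results, rank and text). Unlike the version in pdf.js, it copies the array before sorting, so the API response is left untouched.

```javascript
// Return the term texts ordered from highest to lowest rank.
const rankTerms = (terms) =>
  [...terms.results] // copy, because Array.prototype.sort sorts in place
    .sort((a, b) => b.rank - a.rank)
    .map((term) => term.text);

// Made-up sample data in the shape generatePdf expects.
const sample = {
  results: [
    { text: "quarterly budget", rank: 0.05 },
    { text: "product launch", rank: 0.09 },
    { text: "hiring plan", rank: 0.07 },
  ],
};

console.log(rankTerms(sample));
// prints [ 'product launch', 'hiring plan', 'quarterly budget' ]
```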
We also need to store the transcriptId that the AssemblyAI API gives us to later get the transcript text, so we will create db functions for it inside db.js
db.js
const createRecording = async (name, s3Key, transcriptId, email) => {
  const result = await prisma.recording.create({
    data: {
      name,
      s3Key,
      transcriptId,
      user: {
        connect: {
          email,
        },
      },
    },
  });
  return result;
};

const getSingleUserById = async (id) => {
  const user = await prisma.user.findFirst({
    where: { id },
  });
  return user;
};

module.exports = {
  createUser,
  createRecording,
  getSingleUserByEmail,
  getSingleUserById,
};
Lastly, we can put this all together to upload an mp3 file, call the Assembly AI API to process that file from S3, and save the transcript Id to later fetch the transcript as a pdf file.
Your index.js file should look like this:
index.js
const db = require("./db/db");
const jwtLib = require("./lib/jwt");
const awsLib = require("./lib/aws");
const pdfLib = require("./lib/pdf");
const fetch = require("node-fetch");
const bcrypt = require("bcrypt");
const express = require("express");
const fileUpload = require("express-fileupload");
const cors = require("cors");
const app = express();

app.use(cors());
app.use(express.json());
app.use(fileUpload());
.
.
.
app.post("/upload", jwtLib.authorize, async (req, res) => {
  const { id } = req.body;
  if (!id) {
    return res
      .status(400)
      .json({ success: false, error: "You must provide the user id." });
  }
  if (!req.files || Object.keys(req.files).length === 0) {
    return res
      .status(400)
      .json({ success: false, error: "No files were uploaded." });
  }
  try {
    const file = req.files.uploadedFile;
    // upload to s3
    const uploadedFile = await awsLib.uploadFile(file);
    // the SDK documents the object key as `Key` (capitalized)
    const { Location, Key: key } = uploadedFile;
    const body = {
      audio_url: Location,
      auto_highlights: true,
    };
    // call aai api
    const response = await fetch(process.env.ASSEMBLYAI_API_URL, {
      method: "POST",
      body: JSON.stringify(body),
      headers: {
        authorization: process.env.ASSEMBLYAI_API_KEY,
        "content-type": "application/json",
      },
    });
    const result = await response.json();
    if (result.error) {
      console.log(result);
      res.status(500).json({
        success: false,
        error: "There was an error uploading your file.",
      });
      return;
    }
    // get user email
    const user = await db.getSingleUserById(Number(id));
    const { email } = user;
    // save transcript id to db
    const recording = await db.createRecording(
      file.name,
      key,
      result.id,
      email
    );
    res.status(200).json({ success: true, data: recording });
  } catch (e) {
    console.log(e);
    res.status(500).json({
      success: false,
      error: "There was an error uploading your file.",
    });
  }
});
Notice that this endpoint uses the authorize middleware, and that the request must also include the user id you receive when you sign in.
All we need now is an endpoint to generate our pdf, which is what we will get to now.
Let's add a db function to get the transcript we saved.
db.js
const getSingleRecording = async (transcriptId) => {
  const recording = await prisma.recording.findFirst({
    where: {
      transcriptId,
    },
  });
  return recording;
};

module.exports = {
  createUser,
  createRecording,
  getSingleUserByEmail,
  getSingleUserById,
  getSingleRecording,
};
And now we can create the endpoint to generate a pdf
app.post("/generate-pdf", jwtLib.authorize, async (req, res) => {
  const { transcriptId } = req.body;
  if (!transcriptId) {
    return res
      .status(400)
      .json({ success: false, error: "You must provide the transcript id." });
  }
  try {
    const url = process.env.ASSEMBLYAI_API_URL + "/" + transcriptId;
    const response = await fetch(url, {
      method: "GET",
      headers: {
        authorization: process.env.ASSEMBLYAI_API_KEY,
        "content-type": "application/json",
      },
    });
    const result = await response.json();
    if (result.error) {
      console.log(result);
      res.status(500).json({
        success: false,
        error: "There was an error retrieving your recording.",
      });
      return;
    }
    const { text, auto_highlights_result } = result;
    const recordingRecord = await db.getSingleRecording(transcriptId);
    const { name } = recordingRecord;
    pdfLib.generatePdf("Transcript", text, auto_highlights_result, res);
  } catch (e) {
    console.log(e);
    res.status(500).json({
      success: false,
      error: "There was an error retrieving your recordings.",
    });
  }
});
Now you just need to provide the endpoint the transcriptId you saved in the database and it will return a pdf file for you!
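One caveat: transcription is not instant, so the transcript may not be finished yet when you call this endpoint. My understanding of the Assembly AI docs is that the transcript object carries a status field that moves through queued and processing before ending at completed or error; treat those values as an assumption and double-check them against the docs. Under that assumption, a hypothetical helper (transcriptState is my own name) could decide how the endpoint should answer:

```javascript
// Hypothetical helper: decide how /generate-pdf should respond, based on
// the transcript object returned by the API.
// Assumption: `status` is one of "queued", "processing", "completed", "error".
const transcriptState = (result) => {
  if (result.status === "completed") {
    return { ready: true, httpStatus: 200 };
  }
  if (result.status === "queued" || result.status === "processing") {
    return {
      ready: false,
      httpStatus: 202,
      error: "Your transcript is still being processed. Try again shortly.",
    };
  }
  // "error" or anything unexpected
  return {
    ready: false,
    httpStatus: 500,
    error: "There was an error retrieving your recording.",
  };
};

// Usage inside the route, before calling pdfLib.generatePdf:
//   const state = transcriptState(result);
//   if (!state.ready) {
//     return res
//       .status(state.httpStatus)
//       .json({ success: false, error: state.error });
//   }
```

Answering 202 while the transcript is still in progress lets the frontend retry later instead of receiving a PDF with no text in it.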
Wrap up
That's it! You have a basic app that allows users to sign up and sign in, upload mp3 conversations and get transcripts back in PDF format. There is plenty of room for growth in this project, and if you would like to try it out for yourself, check the links below.
Source Code: https://github.com/guilleeh/zoom-summarizer
Demo: https://zoom-summarizer.vercel.app/
The source code is a full stack application, so you can see how I put this all together.
Hope you all learned something today!