Jayson DeLancey for Dolby.io

Posted on Apr 8, 2021 • Originally published at dolby.io

Generate a Transcript of Your Meeting

One feature commonly used by developers who build with the Dolby.io Interactivity solution is recording a video conference. This record of what happened during a meeting can be valuable for anybody who missed the scheduled event. Converting that recording to a text transcript can make it easier for your users to find where a particular phrase or topic was discussed.

Symbl Conversation API

You can learn more about Dolby.io real-time audio and video experiences from our product and developer documentation. To generate a transcript, we looked at Symbl.ai, a conversation intelligence platform that provides programmable APIs.

Its REST API gives developers access to an AI-based solution that goes beyond speech-to-text conversion.

There is a range of solutions to increase the impact of a meeting. Some examples include:

Conversation Analytics such as how much each participant speaks, words per minute, or cross-talk
Sentiment Analysis to judge positive, negative, or neutral word selection
Topic Identification for understanding the hierarchy of key points discussed during a meeting

The Symbl.ai solutions can become a powerful tool when combined with the high-quality audio captured by Dolby.io. Let’s review an example of how it all works.

Recording a Conference

The first step is getting recordings of your conferences. Recording is covered in the last step of the Getting Started tutorial in the documentation.

First, you should enable liveRecording when creating the conference.

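// Create the conference with live recording enabled, then join it
VoxeetSDK.conference.create({
    alias: 'Symbl.ai Demo',
    params: {
        liveRecording: true,
    },
})
    .then((conference) => VoxeetSDK.conference.join(conference, {}))
    .catch((err) => console.error(err));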

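Then, start recording in response to a user action, such as clicking a button.

// Start recording the current conference (returns a Promise)
VoxeetSDK.recording.start().catch((err) => console.error(err));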

The dashboard has an option to choose which types of media to generate. If your application will support transcription, consider configuring it to generate both audio and video so that MP3 files are available for processing.

For more details on recording, see the recording guides in the Dolby.io documentation.

Once you’ve recorded a few conferences you can move on to the next step.

Postman Collection

To make it easier to see how these services work together, we created a collection in a Postman Public Workspace. A public workspace lets us collaborate on a project and provides a convenient graphical user interface for making REST requests.

You can review it here:
https://www.postman.com/dolbyio/workspace/dolby-io-community/overview

To use the Transcribe Media with Symbl.ai collection you’ll need to:

Fork the collection into your own workspace
Fork the environment and add the API keys for your Dolby.io and Symbl.ai accounts.

The collection has a folder for each step in what might be a typical series of processing steps. We’ll look at a few of these in more detail to learn how they work.

Dolby.io Meeting Recordings

The first major task is to find an audio recording for a conference session. To do this, you can run queries with the Monitor REST API, which lets you query ongoing or completed conferences, participants, webhook events, and more.

What we’ll need to do is:

Authenticate
Find a Meeting
Identify a Recording

Authentication should be straightforward if you have worked with JSON Web Tokens (JWT) before. The consumer key and secret are sent with basic authentication to obtain an auth token, which is then included as a header in all subsequent requests. To Find a Meeting, you have a few options for querying past meetings; the folder contains several ways to identify a specific conference.
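A minimal sketch of the Authenticate step in Node.js 18+ (which provides a global fetch): the consumer key and secret are Base64-encoded into a Basic Authorization header and exchanged for a token. The token endpoint URL, the grant_type=client_credentials form body, the access_token response field, and sending the token as a Bearer header are assumptions based on the Dolby.io REST API documentation at the time of writing, not details taken from the Postman collection.

```javascript
// Assumption: token endpoint used by the Dolby.io REST APIs at the time of writing.
const DOLBY_TOKEN_URL = "https://session.voxeet.com/v1/oauth2/token";

// Build the Basic auth header from the consumer key and secret.
function basicAuthHeader(consumerKey, consumerSecret) {
  const encoded = Buffer.from(`${consumerKey}:${consumerSecret}`).toString("base64");
  return `Basic ${encoded}`;
}

// Exchange the key/secret for an access token (Node.js 18+ global fetch).
async function getDolbyToken(consumerKey, consumerSecret) {
  const response = await fetch(DOLBY_TOKEN_URL, {
    method: "POST",
    headers: {
      Authorization: basicAuthHeader(consumerKey, consumerSecret),
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ grant_type: "client_credentials" }),
  });
  if (!response.ok) {
    throw new Error(`Dolby.io auth failed: ${response.status}`);
  }
  const json = await response.json();
  // Assumption: the token is returned in an access_token field.
  return json.access_token;
}

// Assumption: subsequent Monitor API requests send the token as a Bearer header.
function bearerHeaders(token) {
  return { Authorization: `Bearer ${token}` };
}
```

Keeping the header construction in small pure functions makes it easy to reuse the same token across the Find a Meeting and download requests.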

When finding a meeting, you might look for it by name, by a specific time period, or among all meetings that have recordings. A few parameters let you tailor the query to your use case. Ultimately, you are looking for the confId, which identifies where the recording can be found.

{
    "recordings": [
        {
            "confId": "9ae8aa2d-...",
            "ts": 1615271325363,
            "duration": 14095,
            "region": "ca",
            "alias": "Symbl.ai Demo",
            "mix": {
                "mp4": 2085830,
                "region": "ca",
                "mp3": 440788
            }
        }
    ]
}
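Picking out the confId from a response shaped like the sample above is plain JSON handling. This sketch builds a time-window query and then selects the newest recording with a given alias that has an MP3 mix. The /v1/monitor/recordings path and its from/to query parameters (assumed to be millisecond timestamps) are assumptions; recordingsQueryUrl and findRecordingByAlias are hypothetical helper names.

```javascript
// Assumption: Monitor API endpoint that lists conference recordings.
const MONITOR_RECORDINGS_URL = "https://api.voxeet.com/v1/monitor/recordings";

// Hypothetical helper: build a query URL for recordings in a time window.
// Assumption: from and to are Unix timestamps in milliseconds.
function recordingsQueryUrl(fromMs, toMs) {
  const url = new URL(MONITOR_RECORDINGS_URL);
  url.searchParams.set("from", String(fromMs));
  url.searchParams.set("to", String(toMs));
  return url.toString();
}

// Hypothetical helper: return the most recent recording with the given alias
// that has an MP3 mix, or undefined if none matches.
function findRecordingByAlias(response, alias) {
  const matches = (response.recordings || []).filter(
    (rec) => rec.alias === alias && rec.mix && rec.mix.mp3 !== undefined
  );
  // ts is the recording timestamp; sort newest first.
  matches.sort((a, b) => b.ts - a.ts);
  return matches[0];
}
```

For the sample response above, findRecordingByAlias(data, "Symbl.ai Demo").confId would return "9ae8aa2d-...".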

Once you’ve identified a confId with a recording, you can download the audio or video file directly. The collection is set up with a Test function that extracts this value and stores it as a dynamic variable in the environment.

For example, the test function looks like this:

// Postman test script: save the first recording's confId for later requests
const data = pm.response.json();
pm.environment.set("dolby-conference-id", data.recordings[0].confId);

We can then construct the download URL:
https://api.voxeet.com/v1/monitor/conferences/{{dolby-conference-id}}/recordings/mp3

Symbl gives us the option to upload a file directly, or to pass a URL reference like this one to begin processing, provided the file is available from a world-readable location. In this case, our recording requires an access token, so it is not world-readable.
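Because the recording sits behind an access token, one option is to download it yourself and upload the bytes to Symbl. A minimal Node.js 18+ sketch, assuming the Monitor token is accepted as a Bearer header on this endpoint and that the endpoint returns the MP3 bytes, possibly after a redirect (fetch follows redirects by default). The helper names recordingDownloadUrl and downloadRecording are hypothetical.

```javascript
// Build the download URL for a conference recording (path from the Postman collection).
function recordingDownloadUrl(confId, format = "mp3") {
  return `https://api.voxeet.com/v1/monitor/conferences/${encodeURIComponent(confId)}/recordings/${format}`;
}

// Download the recording into memory as a Buffer.
// Assumption: the Monitor token is accepted as a Bearer token on this endpoint.
async function downloadRecording(confId, token) {
  const response = await fetch(recordingDownloadUrl(confId), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Download failed: ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Holding the file in memory keeps the sketch short; for long meetings, writing the response to disk would be a reasonable alternative.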

Symbl.ai Speech-to-Text

At this stage, we have an MP3 recording to analyze, which can be downloaded locally or referenced as a URL. The Symbl.ai Async API can handle both scenarios. Using it takes a few steps:

Authenticate
Submit the MP3 Recording
Get Transcript

The specifics of authenticating with Symbl are a little different, but they follow the same pattern of obtaining an access token. You then pass that JWT in an x-api-key header on subsequent REST requests to Symbl.ai.
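The Symbl.ai token request can be sketched the same way. The token endpoint, the JSON body with type "application", appId, and appSecret, and the accessToken response field reflect Symbl's documentation at the time of writing and should be treated as assumptions.

```javascript
// Assumption: Symbl.ai token endpoint at the time of writing.
const SYMBL_TOKEN_URL = "https://api.symbl.ai/oauth2/token:generate";

// Build the JSON body for the token request.
function symblTokenRequestBody(appId, appSecret) {
  return JSON.stringify({ type: "application", appId, appSecret });
}

// Request an access token (Node.js 18+ global fetch).
async function getSymblToken(appId, appSecret) {
  const response = await fetch(SYMBL_TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: symblTokenRequestBody(appId, appSecret),
  });
  if (!response.ok) {
    throw new Error(`Symbl.ai auth failed: ${response.status}`);
  }
  const json = await response.json();
  // Assumption: the token is returned in an accessToken field.
  return json.accessToken;
}

// Subsequent Symbl.ai requests carry the token in the x-api-key header.
function symblHeaders(token, extra = {}) {
  return { "x-api-key": token, ...extra };
}
```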

The Speech-to-Text conversion takes time to process, depending on the length of the recording. This is why the API is asynchronous: you initiate a job and then wait for processing to complete.
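The submit-and-wait pattern can be sketched as a job submission followed by a polling loop. The endpoint paths (/v1/process/audio for an uploaded file, /v1/job/{jobId} for status), the audio/mpeg content type, the conversationId and jobId response fields, and the "completed" and "failed" status values are assumptions based on the Symbl.ai Async API documentation at the time of writing. The polling loop takes the status check as a function, so it can be exercised without network access.

```javascript
// Assumption: base URL of the Symbl.ai REST API.
const SYMBL_API = "https://api.symbl.ai/v1";

// Upload MP3 bytes to start an asynchronous speech-to-text job.
// Assumption: the response contains { conversationId, jobId }.
async function submitAudio(token, mp3Buffer) {
  const response = await fetch(`${SYMBL_API}/process/audio`, {
    method: "POST",
    headers: { "x-api-key": token, "Content-Type": "audio/mpeg" },
    body: mp3Buffer,
  });
  if (!response.ok) {
    throw new Error(`Submit failed: ${response.status}`);
  }
  return response.json();
}

// Fetch the current status string for a job.
async function getJobStatus(token, jobId) {
  const response = await fetch(`${SYMBL_API}/job/${encodeURIComponent(jobId)}`, {
    headers: { "x-api-key": token },
  });
  if (!response.ok) {
    throw new Error(`Status check failed: ${response.status}`);
  }
  const json = await response.json();
  return json.status;
}

// Poll checkStatus until it reports "completed"; fail on "failed" or after maxAttempts.
// Resolves with the number of attempts it took.
async function waitForJob(checkStatus, { intervalMs = 5000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === "completed") return attempt;
    if (status === "failed") throw new Error("Job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job not completed after ${maxAttempts} attempts`);
}
```

Putting it together might look like: const { conversationId, jobId } = await submitAudio(token, mp3); followed by await waitForJob(() => getJobStatus(token, jobId));. A fixed interval keeps the sketch simple; an exponential backoff would reduce requests for long recordings.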

You can check the job status to make sure it is running; when it completes, you can retrieve the transcript using the conversationId. The example below is a small sample, but you can choose to get a list of messages or have the transcript formatted as Markdown to embed quickly in your web application.

{
    "transcript": {
        "payload": "This is a recording in order to test transcripts",
        "contentType": "text/markdown"
    }
}
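Once the job completes, the transcript request can be sketched as below. The POST /v1/conversations/{conversationId}/transcript endpoint and its contentType body field are assumptions based on the Symbl.ai Conversation API documentation at the time of writing; the response is expected to match the sample above, with the text in transcript.payload.

```javascript
// Build the request for a conversation transcript in the given format.
function transcriptRequest(token, conversationId, contentType = "text/markdown") {
  return {
    url: `https://api.symbl.ai/v1/conversations/${encodeURIComponent(conversationId)}/transcript`,
    options: {
      method: "POST",
      headers: { "x-api-key": token, "Content-Type": "application/json" },
      body: JSON.stringify({ contentType }),
    },
  };
}

// Pull the transcript text out of a response shaped like the sample above.
function transcriptText(json) {
  if (!json.transcript || typeof json.transcript.payload !== "string") {
    throw new Error("Response has no transcript payload");
  }
  return json.transcript.payload;
}

// Fetch the transcript text for a completed conversation (Node.js 18+).
async function getTranscript(token, conversationId) {
  const { url, options } = transcriptRequest(token, conversationId);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Transcript request failed: ${response.status}`);
  }
  return transcriptText(await response.json());
}
```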

What's Next

We explored a few things in this post:

How to capture Dolby.io recordings
How to find meetings and download a recording
How to run an asynchronous Symbl.ai speech-to-text process
How to download the transcript

In a future project, we’ll explore how to do this in real-time for use cases such as closed-captioning and impatience. Until then, take a look at the Postman Dolby.io Community Public Workspace.