Ilya Nevolin


Raw stereo audio to mono channel

Lately I've stumbled upon an interesting engineering problem that may be useful to some of you.

I am the creator and owner of two Discord bots that use speech recognition technology to perform certain actions. They are especially useful for hearing-impaired and deaf people, who can still communicate with their friends using Discord.

The first bot is just a framework that transcribes speech to text and posts it in the channel: https://github.com/healzer/DiscordEarsBot
The second bot is a music player that listens to voice commands to play songs, playlists, pause, skip, shuffle, etc.: https://github.com/healzer/DiscordSpeechBot

The problem I recently faced was due to audio conversion. Each user that speaks in Discord is treated as a separate stream, and this audio stream is raw binary data (signed 16-bit, 48 kHz, stereo/two-channel), also known as PCM data. But the free Speech-to-Text service that we use only accepts mono (single-channel) audio data.

Initially I used the sox dependency to convert and manipulate the audio data. But this was a pain in the butt, because many of our users couldn't get sox correctly installed on their machine. Unfortunately, I couldn't find any npm or JavaScript package that does this conversion, so it was time to do it myself.

A quick Google search was needed to understand the raw audio format. In my case we are dealing with signed 16-bit data and two channels. This means that each audio sample consists of 16 bits (= 2 bytes), and that the channels are interleaved: the first 2 bytes are the left sample, the following 2 bytes are the right sample, and this 4-byte pattern repeats for the whole stream.
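To make that layout concrete, here is a minimal sketch that splits a raw stereo buffer into its left and right samples. It assumes little-endian byte order, which the post does not state, and `splitFrames` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: split signed 16-bit stereo PCM into per-channel samples.
// Assumption: samples are little-endian (s16le); the byte order is not stated in the post.
function splitFrames(pcm) {
    const BYTES_PER_SAMPLE = 2                    // 16 bits
    const BYTES_PER_FRAME = BYTES_PER_SAMPLE * 2  // one left + one right sample
    const frames = Math.floor(pcm.length / BYTES_PER_FRAME)
    const left = new Int16Array(frames)
    const right = new Int16Array(frames)
    for (let f = 0; f < frames; f++) {
        const offset = f * BYTES_PER_FRAME
        left[f] = pcm.readInt16LE(offset)                      // first 2 bytes: left
        right[f] = pcm.readInt16LE(offset + BYTES_PER_SAMPLE)  // next 2 bytes: right
    }
    return { left, right }
}

// Two frames: (L=1, R=2) and (L=-1, R=-2)
const demo = Buffer.from([0x01, 0x00, 0x02, 0x00, 0xff, 0xff, 0xfe, 0xff])
const { left, right } = splitFrames(demo)
console.log(Array.from(left), Array.from(right)) // [ 1, -1 ] [ 2, -2 ]
```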

To convert two channels into a single channel we have to decide how we are going to approach this, because left and right may differ. However, a regular microphone input does not distinguish left from right, meaning that the left data should be the same as the right data. When you analyze the audio waveform, you will see that the left waves are the same as the right waves. This simplifies our life: we can drop either left or right to get a mono audio file.

// stereo to mono channel
const fs = require('fs')

function convert_audio(infile, outfile) {
    try {
        // read the raw stereo file as bytes (4 bytes per frame: 2 left + 2 right)
        const data = fs.readFileSync(infile)

        // create a new buffer for the mono audio data (half the size)
        const ndata = Buffer.alloc(Math.floor(data.length / 4) * 2)

        // copy the 2 bytes of each left sample (skip the right part)
        for (let i = 0, j = 0; j < ndata.length; i += 4) {
            ndata[j++] = data[i]
            ndata[j++] = data[i+1]
        }

        // save the mono audio file
        fs.writeFileSync(outfile, ndata)
    } catch (e) {
        console.log(e)
    }
}
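
If the two channels do differ (a true stereo source, for example), dropping one side loses information. A common alternative, which is not the approach used above, is to average each left/right pair. This is a minimal in-memory sketch: `stereoToMonoAverage` is a hypothetical name, and little-endian byte order is assumed.

```javascript
// Hypothetical alternative: downmix by averaging left and right instead of dropping a channel.
// Assumption: signed 16-bit little-endian stereo PCM held in a Node.js Buffer.
function stereoToMonoAverage(stereo) {
    const frames = Math.floor(stereo.length / 4) // 4 bytes per stereo frame
    const mono = Buffer.alloc(frames * 2)        // 2 bytes per mono sample
    for (let f = 0; f < frames; f++) {
        const left = stereo.readInt16LE(f * 4)
        const right = stereo.readInt16LE(f * 4 + 2)
        // The average of two int16 values always fits in int16, so no clamping is needed.
        mono.writeInt16LE(Math.round((left + right) / 2), f * 2)
    }
    return mono
}
```

For identical channels, as with a regular microphone, this produces the same samples as dropping the right channel, so it costs only a little extra arithmetic.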