Leo

Podcast Production with streamtasks

Podcasts are easy to produce. Let's make it even easier!

Usually a podcast has multiple people speaking, each with their own microphone and their own camera. When composing the podcast, it is most often cut based on who is currently speaking.

In this example we will assume that the podcast has two participants, each with a camera and a microphone, all plugged into one computer.

Our goal is to automate the switching between participants and to store the video for each participant as well as the composed video in separate files.

We start off with two audio inputs. We mix these inputs using an audio mixer and then encode each audio stream (the streams for P1, for P2 and for the P1+P2 mix).

two audio inputs mixed and encoded
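
Inside the deployment the audio mixer task takes care of this step, but conceptually mixing is just summing the two sample streams. Here is a minimal sketch of that idea with NumPy (the 16-bit PCM format and the variable names are assumptions for illustration, not the streamtasks API):

```python
import numpy as np

# Stand-ins for one second of 16-bit mono PCM from each microphone.
p1 = np.random.randint(-2000, 2000, 48000, dtype=np.int16)
p2 = np.random.randint(-2000, 2000, 48000, dtype=np.int16)

# Mix by summing the samples and clipping back into the 16-bit range,
# so loud passages don't wrap around.
mix = np.clip(p1.astype(np.int32) + p2.astype(np.int32), -32768, 32767).astype(np.int16)
```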

Next we store our encoded audio streams using an output container (as an MP4).

two mixed audio inputs stored as mp4 files

That is the audio side done. Now let's add some video. As with the audio inputs, we create video inputs and encode them as h264 video.

two video inputs with encoders

To store the video for each participant, we add a video track to the respective output container and connect the encoded video to it. Let's start by adding video tracks to the individual output files only.

storing the video in the individual outputs
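
streamtasks wires the encoder and the output container together for us, but if you are curious what "encode as h264 and write it into an MP4 video track" boils down to, here is a minimal standalone sketch using PyAV with synthetic frames (this is not the streamtasks API; the file name, resolution and frame rate are made up):

```python
import av
import numpy as np

# Open an MP4 output container and add one h264 video track.
container = av.open("p1.mp4", mode="w")
stream = container.add_stream("h264", rate=30)
stream.width = 1280
stream.height = 720
stream.pix_fmt = "yuv420p"

for i in range(90):  # 3 seconds of synthetic video at 30 fps
    img = np.zeros((720, 1280, 3), dtype=np.uint8)
    img[:, : (i * 14) % 1280] = (0, 200, 0)  # a simple moving bar
    frame = av.VideoFrame.from_ndarray(img, format="rgb24")
    for packet in stream.encode(frame):
        container.mux(packet)

# Flush the encoder and finalize the file.
for packet in stream.encode():
    container.mux(packet)
container.close()
```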

Now let's do the exciting part: switching between the video feeds and saving the result to our p12.mp4 file.

We start by adding a video track to our composed output container and a media switch to our deployment. We connect each video feed to one of the inputs of the media switch, and the output of the switch to the video track input of the composed output container.

adding a media switch

Right now the switch does nothing. To actually switch between video feeds, we will use the volume of our microphones to set the control signals of the media switch. The media switch expects a number on each of its control inputs and will switch to the input whose corresponding control input has the highest value. So it will switch to "input 1" if "control 1" has the highest control value.
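
In other words, the switch continuously forwards the input whose control value is currently the largest. The selection rule itself is tiny (plain Python, not the streamtasks API):

```python
def select_input(control_values: list[float]) -> int:
    """Return the (zero-based) index of the input with the highest control value."""
    return max(range(len(control_values)), key=lambda i: control_values[i])

# control 1 = 0.8, control 2 = 0.1 -> the switch forwards input 1
print(select_input([0.8, 0.1]))  # 0
```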

To extract the audio volume from our microphones we use the audio volume meter task. It outputs a number representing the volume of its audio input. Each volume output is then connected to the corresponding control input: "control 1" is for P1, which means we use the audio volume of the P1 audio data as the control signal for the video of P1.
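
A common way to turn a chunk of audio samples into a single volume number is the RMS (root mean square) of the samples. Whether the audio volume meter task uses exactly RMS is an assumption here, but the idea is the same:

```python
import numpy as np

def volume(samples: np.ndarray) -> float:
    """Volume of one chunk of 16-bit PCM samples, normalized to roughly 0..1."""
    normalized = samples.astype(np.float64) / 32768.0
    return float(np.sqrt(np.mean(normalized ** 2)))

quiet = np.random.randint(-500, 500, 1024, dtype=np.int16)
loud = np.random.randint(-20000, 20000, 1024, dtype=np.int16)
print(volume(quiet) < volume(loud))  # True: the louder mic wins the switch
```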

working switching logic

Now our automatic switching works, but it is a little slow. Ideally you want to switch just before someone starts speaking. We can't look into the future, but we can look into the past.

There are multiple ways to do this. One is to delay the video feeds by one second before switching between them. Another is to move our control signals into the future: we simply shift the timestamps of the control messages forward, and the synchronizer in the media switch then synchronizes the streams, effectively delaying the video streams by one second. The second approach is better, since we don't change the timestamps of the video, which we later use in our output container. If we changed the video timestamps, we would need to change them back before writing to the output container, otherwise the video would not be synchronized with the audio.
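
As a sketch of what the timestamp updater does to each control message (the message structure here is an assumption for illustration, not the streamtasks message format):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ControlMessage:
    timestamp_ms: int  # when this volume value was measured
    value: float       # volume from the audio volume meter

def shift_into_future(msg: ControlMessage, offset_ms: int = 1000) -> ControlMessage:
    """Move a control message forward in time. The synchronizer in the media
    switch then holds back the video streams by the same amount, which gives
    us the "look into the past" effect without touching the video timestamps."""
    return replace(msg, timestamp_ms=msg.timestamp_ms + offset_ms)

print(shift_into_future(ControlMessage(timestamp_ms=12_000, value=0.42)))
```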

This is the config of the timestamp updaters, moving the control messages 1000 ms into the future.

timestamp updater config

Our final deployment looks like this.

back in time switching works

Just like that! Within minutes you can automate your podcasting setup.

Try streamtasks!
GitHub: https://github.com/leopf/streamtasks
Documentation/Homepage: https://streamtasks.3-klicks.de
X: https://x.com/leopfff
