Carlos Ocejo

Posted on May 11, 2023

Do you know what it takes to create a WebRTC video conferencing application?

#webrtc #webdev

I didn't know either. I had some experience using WebRTC for one-to-one communication, so doing it with more participants seemed easy to me. I didn't know what I was getting into: there were several concepts I had to understand before I could even start.

My experience of WebRTC vs. video conferencing app

Let's start with the basics: what is WebRTC?

Web Real-Time Communication (WebRTC) is an open-source project supported by Google, Mozilla, and Opera, among others. It is also a set of APIs and protocols, standardized at the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), that enables real-time communication with video, audio, and data natively in the browser, even on mobile devices.

Web Real Time Communication

How does it work?

The keyword here is “native”: you don't need to install any plugin or software for it to work. That is possible thanks to a set of APIs:

getUserMedia (gUM): Obtains access to the audio and video devices safely, asking the user for permission to share them. This API only works in secure contexts, that is, pages served over HTTPS (or from localhost during development).
RTCPeerConnection: Handles communication between peers (users). Media travels over SRTP, the secure variant of RTP, which is the network protocol designed to deliver audio and video over IP networks. RTCPeerConnection therefore handles media transmission between peers and provides secure end-to-end delivery of real-time data. A companion protocol, RTCP, reports on the quality of service that RTP delivers.
RTCDataChannel: Sends arbitrary data between peers, such as text chat or file transfers.
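To make the HTTPS requirement for getUserMedia concrete, here is a rough sketch of the check browsers apply before exposing the camera and microphone. The function name is made up for illustration, and the rule is simplified: the real "secure context" definition covers more cases, and in a page you would normally read the built-in `window.isSecureContext` flag instead.

```typescript
// Hypothetical helper: a simplified approximation of the "secure context" rule.
// `protocol` and `hostname` follow the format of `location.protocol`
// and `location.hostname` (for example "https:" and "example.org").
function looksLikeSecureOrigin(protocol: string, hostname: string): boolean {
  if (protocol === "https:") {
    return true;
  }
  // Loopback hosts are treated as trustworthy even over plain HTTP,
  // which is why getUserMedia works on a local development server.
  const loopbackHosts = ["localhost", "127.0.0.1", "[::1]"];
  return protocol === "http:" && loopbackHosts.indexOf(hostname) !== -1;
}

const okInProduction = looksLikeSecureOrigin("https:", "example.org"); // true
const okOnDevServer = looksLikeSecureOrigin("http:", "localhost"); // true
const blockedOverHttp = looksLikeSecureOrigin("http:", "example.org"); // false
```

When the check fails, `navigator.mediaDevices` is typically unavailable, so it is worth detecting this early and showing the user a clear message.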

Peer-to-peer connection

Signaling

WebRTC cannot create connections on its own. A communication channel is needed to exchange information before a connection is established; this is called a signaling channel or signaling service. It is not part of the WebRTC standard, but it is a fundamental piece.

The information exchanged is the “offer” and the “answer”, both written in SDP (Session Description Protocol), which describes the codecs, source address, and timing information for audio and video.

Peer A initiates the connection by creating a message called an offer and sending it to peer B over the signaling channel. Peer B receives the offer and creates an answer, which it sends back to peer A over the same channel.

The peers must also exchange information about their network connections. Each piece is known as an ICE (Interactive Connectivity Establishment) candidate and describes a way to reach the peer, such as an IP address and port.
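To see what that network information looks like, the sketch below pulls the address, port, and type out of a candidate line. The sample line is made up (it uses documentation IP ranges), and the parser only covers the leading fields of the standard candidate format, so treat it as an illustration and not a complete implementation.

```typescript
interface ParsedCandidate {
  transport: string; // "udp" or "tcp"
  address: string;
  port: number;
  type: string; // typically "host", "srflx", or "relay"
}

// Field layout assumed here:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line: string): ParsedCandidate | null {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 8 || fields[0].indexOf("candidate:") !== 0 || fields[6] !== "typ") {
    return null;
  }
  const port = Number(fields[5]);
  if (!Number.isInteger(port)) {
    return null;
  }
  return {
    transport: fields[2].toLowerCase(),
    address: fields[4],
    port: port,
    type: fields[7],
  };
}

// Made-up sample candidate for illustration only.
const sampleCandidate =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154";
const parsedCandidate = parseCandidate(sampleCandidate);
// parsedCandidate is { transport: "udp", address: "203.0.113.7", port: 46154, type: "srflx" }
```

In practice the browser generates and consumes these lines for you; the application only has to pass them to the other peer through the signaling channel.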

Once each peer knows where the other is, the connection is established directly. From then on, the signaling server is no longer needed for the media, but it can still be used for participant authentication and session control.
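The exchange described in this section can be sketched with an in-memory stand-in for the signaling server. Everything here is an assumption made for illustration: the message shape, the `InMemorySignaling` class, and the placeholder SDP strings. A real application would relay the same messages over something like a WebSocket and feed them to RTCPeerConnection.

```typescript
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type SignalHandler = (msg: SignalMessage) => void;

// Hypothetical stand-in for a signaling server: it only relays messages by peer id.
class InMemorySignaling {
  private handlers: { [peerId: string]: SignalHandler } = {};

  join(peerId: string, handler: SignalHandler): void {
    this.handlers[peerId] = handler;
  }

  send(msg: SignalMessage): void {
    const handler = this.handlers[msg.to];
    if (handler) {
      handler(msg);
    }
  }
}

const signaling = new InMemorySignaling();
const received: string[] = [];

// Peer B replies to every offer with an answer.
signaling.join("B", (msg) => {
  received.push("B got " + msg.kind);
  if (msg.kind === "offer") {
    signaling.send({ kind: "answer", from: "B", to: msg.from, sdp: "<answer SDP>" });
  }
});

// Peer A just records what it receives.
signaling.join("A", (msg) => {
  received.push("A got " + msg.kind);
});

// Peer A starts the call, then shares a network candidate.
signaling.send({ kind: "offer", from: "A", to: "B", sdp: "<offer SDP>" });
signaling.send({ kind: "candidate", from: "A", to: "B", candidate: "<ICE candidate>" });
// received is now ["B got offer", "A got answer", "B got candidate"]
```

Note that the relay never inspects the SDP or the candidates. That is why the signaling transport can be anything you like.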

Peer to peer connection using signaling

Suspiciously simple, right? But something is not right.

Suspiciously simple

WebRTC uses peer-to-peer connections, where voice, video, and data connections are established directly between browsers. Unfortunately, firewalls and NAT make this difficult. STUN and TURN servers exist to get around that.

STUN and TURN servers

A STUN server (Session Traversal Utilities for NAT) lets the peer know its public network information and how it can be reached from the Internet.

If the direct connection between the peers fails or is not possible, a TURN (Traversal Using Relays around NAT) server can relay the traffic to the other peer, at the cost of higher resource consumption, since all the media then flows through the server.
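In the browser, STUN and TURN servers are passed to the connection as a list of ICE servers. The sketch below builds such a configuration. The host names and credentials are placeholders, and the two interfaces are local stand-ins that mirror the shape the browser expects, defined here so the snippet is self-contained.

```typescript
// Local stand-ins mirroring the browser's ICE server configuration shape.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical helper; stun.example.org and turn.example.org are placeholders.
function buildPeerConfig(turnUser: string, turnPassword: string): PeerConfig {
  return {
    iceServers: [
      // STUN: lets the peer discover its public address.
      { urls: "stun:stun.example.org:3478" },
      // TURN: relays the traffic when no direct path works.
      { urls: "turn:turn.example.org:3478", username: turnUser, credential: turnPassword },
    ],
  };
}

const peerConfig = buildPeerConfig("demo-user", "demo-secret");
// In a browser: const pc = new RTCPeerConnection(peerConfig);
```

Listing both kinds of server lets the browser try the direct route first and fall back to the relay only when it has to.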

STUN Server

Peer to peer connection via internet

With this, we have the complete flow of how WebRTC works. In this case the communication is one-to-one, but what happens when we need more participants?

Well, we know that for the connection to be possible each peer must create a connection object. This object holds all the audio, video, and data information, as well as the public network information obtained through the STUN and TURN servers.

When more peers join, this process is repeated: each peer creates an additional connection object for every other connected peer. This is called a mesh topology.

Mesh topology, each peer must maintain a connection to each connected peer

As the number of peers increases, the browser must maintain a connection with each of them, which increases resource consumption and can be problematic for low-end or mobile devices. Mesh is the simplest solution, since you don't have to change anything in the infrastructure and all the work is done in the browser, but if you need to support more users you should adopt another strategy.
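A quick calculation shows how fast a mesh grows. In a call with n peers, each peer keeps n - 1 connections, and the call as a whole has n(n - 1)/2. The helper name below is made up for illustration.

```typescript
// Hypothetical helper: connection counts in a full mesh of `peers` participants.
function meshConnections(peers: number): { perPeer: number; total: number } {
  if (!Number.isInteger(peers) || peers < 1) {
    throw new RangeError("peers must be a positive integer");
  }
  const perPeer = peers - 1; // one connection to every other peer
  const total = (peers * perPeer) / 2; // each connection is shared by two peers
  return { perPeer: perPeer, total: total };
}

const smallCall = meshConnections(4); // { perPeer: 3, total: 6 }
const largeCall = meshConnections(10); // { perPeer: 9, total: 45 }
```

Each of those per-peer connections carries its own copy of the outgoing audio and video, so the upload bandwidth and encoding work on every device grow with the size of the call.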

Multipoint Controller Unit (MCU)

The idea is to have a central media server in charge of receiving, decoding, processing, encoding, and sending the streams back to the peers. Each peer therefore maintains a single connection with the media server instead of one per peer, so resource consumption on the device stays roughly constant as the number of peers grows.

Multipoint Controller Unit (MCU)

With this strategy, the problems of the mesh topology are solved, but a server with enough resources is needed, because processing, encoding, and decoding all the streams consumes a lot of CPU, which directly impacts the budget.

Selective Forwarding Unit (SFU)

An SFU is similar to an MCU in that it also uses a media server. The difference is that instead of doing all the processing, the server forwards the streams to the peers, and they do the necessary processing. The server can even choose whether or not to send a specific stream depending on the peer that will receive it. For example, the video quality can change depending on the peer's connection speed. This way the server doesn't need to be as powerful. It is therefore often the better option, since it avoids the problems of mesh topology and keeps the advantages of an MCU without the high associated costs.
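The per-receiver selection can be sketched as a small function. This assumes each sender publishes the same video at several quality levels (an approach often called simulcast), and the layer names and bitrates are invented for the example. Real media servers use more elaborate bandwidth estimation.

```typescript
interface VideoLayer {
  label: string;
  kbps: number;
}

// Hypothetical quality levels, sorted from lowest to highest bitrate.
const videoLayers: VideoLayer[] = [
  { label: "low", kbps: 150 },
  { label: "medium", kbps: 500 },
  { label: "high", kbps: 1500 },
];

// Returns the highest layer that fits the receiver's bandwidth,
// or null when even the lowest one does not fit (skip the video).
function pickLayer(availableKbps: number, sortedLayers: VideoLayer[]): VideoLayer | null {
  let best: VideoLayer | null = null;
  for (const layer of sortedLayers) {
    if (layer.kbps <= availableKbps) {
      best = layer;
    }
  }
  return best;
}

const fastPeer = pickLayer(2000, videoLayers); // the "high" layer
const averagePeer = pickLayer(600, videoLayers); // the "medium" layer
const slowPeer = pickLayer(100, videoLayers); // null: no video is forwarded
```

The server only picks which existing stream to forward; it never re-encodes anything, which is where the savings compared with an MCU come from.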

Selective Forwarding Unit (SFU)
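To compare the three topologies from the point of view of a single device, the sketch below counts the media streams one participant sends and receives. The numbers are simplified assumptions: one stream per participant, an MCU that returns a single mixed stream, and an SFU that forwards every other participant's stream.

```typescript
type Topology = "mesh" | "mcu" | "sfu";

// Hypothetical helper: streams handled by ONE client in a call with `peers` participants.
function clientStreams(topology: Topology, peers: number): { up: number; down: number } {
  const others = Math.max(peers - 1, 0);
  switch (topology) {
    case "mesh":
      // One copy sent to, and one stream received from, every other peer.
      return { up: others, down: others };
    case "mcu":
      // One stream to the server, one mixed stream back.
      return { up: 1, down: 1 };
    case "sfu":
      // One stream to the server, one forwarded stream per other peer.
      return { up: 1, down: others };
  }
}

const meshLoad = clientStreams("mesh", 6); // { up: 5, down: 5 }
const mcuLoad = clientStreams("mcu", 6); // { up: 1, down: 1 }
const sfuLoad = clientStreams("sfu", 6); // { up: 1, down: 5 }
```

The SFU keeps the upload side as cheap as the MCU does, and upload bandwidth and encoding are usually the scarcest resources on a client.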

We now know that WebRTC works natively in the browser and uses open APIs and protocols to establish secure peer-to-peer connections, coordinated beforehand by a signaling server and helped through NAT and firewalls by STUN and TURN servers. We have also seen the different strategies and topologies that can be used when more participants join.

As you can see, creating a video conferencing platform may seem simple, but it gets interesting when you apply it to the real world. You have to weigh different aspects and concepts, along with the advantages and disadvantages of each topology for your use case, without leaving costs aside.

I hope this post will serve as a starting point to venture into the development of applications with WebRTC.

Top comments (1)

Tim • Jun 1 '23

Great article! I've been supporting the Ant media server github.com/ant-media/Ant-Media-Server for a while and can definitely tell you that it's not easy or simple by any means.
