Web-based communication is changing faster than you might imagine. Today, all you need to start a video conference or share a file is a URL, thanks to WebRTC, which opens the door to countless possibilities for developers and businesses.
Supported by major browsers including Chrome, Safari, Firefox, and Opera, the Web Real-Time Communication (WebRTC) standard reaches beyond the realm of web browsers.
But it's not that simple: WebRTC leverages a variety of protocols and standards, including SDP (Session Description Protocol), SIP (Session Initiation Protocol), NAT traversal, ICE, and UDP/TCP, among others, to provide secure, interoperable (between browsers), real-time, browser-based peer-to-peer communication.
Like any genuinely disruptive technology, WebRTC may initially sound cool and magical. IT folks can be lured by the promise of building the next Zoom, only to discover how hard it is to grasp the many moving components and how they fit together into the overall puzzle called WebRTC.
You can, however, get to grips with it once you understand the basic concepts. To enable WebRTC communication, the following four steps are required:
The getUserMedia API accesses the device's webcam or microphone and gives developers access to the resulting video/audio stream objects.
It also helps select the desired input device when several media capture devices are available. Whether it's taking a user's profile picture, collecting audio samples, or recording audio/video, the getUserMedia API handles the capture.
For instance, opening the default media device works as follows:
- A call to getUserMedia() triggers a permission request that the user must accept before the MediaStream becomes available.
- If permission is denied, the call fails with NotAllowedError (reported as PermissionDeniedError by some older browsers).
- If no matching device is found, the call fails with NotFoundError.
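As a minimal sketch of this flow, the helper below requests the default camera and microphone and turns the error names above into readable messages. The names `CaptureDevices`, `explainMediaError`, and `openDefaultMedia` are illustrative and not part of WebRTC; the device object is passed in so the sketch can also run outside a browser, where you would pass `navigator.mediaDevices`.

```typescript
// Minimal shape of the object that exposes getUserMedia (hypothetical name).
// In a browser, navigator.mediaDevices provides a method with this name.
interface CaptureDevices {
  getUserMedia(constraints: { audio: boolean; video: boolean }): Promise<unknown>;
}

// Translate the error names discussed above into user-facing text.
function explainMediaError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
    case "PermissionDeniedError": // name reported by some older browsers
      return "Camera/microphone permission was denied.";
    case "NotFoundError":
      return "No matching camera or microphone was found.";
    default:
      return "Could not open the media device: " + errorName;
  }
}

// Ask for the default camera and microphone. In a browser this is the call
// that triggers the permission prompt.
async function openDefaultMedia(devices: CaptureDevices): Promise<unknown> {
  try {
    return await devices.getUserMedia({ audio: true, video: true });
  } catch (err) {
    const errorName = (err as { name?: string } | null)?.name ?? "UnknownError";
    throw new Error(explainMediaError(errorName));
  }
}

// Browser usage (not executed here):
//   const stream = await openDefaultMedia(navigator.mediaDevices);
//   videoElement.srcObject = stream as MediaStream;
```

Keeping the error mapping in its own function makes it easy to show a specific message for each failure instead of a generic "something went wrong".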
RTCPeerConnection is the heart and soul of WebRTC, and its most complicated part. It performs almost all of the tasks involved in peer-to-peer communication.
It performs the following functions:
- Setting up and creating a peer-to-peer connection
- Taking care of session management
- Managing all Session Description Protocol (SDP) message exchanges and handling negotiation through ICE candidates (using STUN and TURN if required)
- Encoding and decoding media streams (audio/video) in real time
- Handling network-related issues such as bandwidth estimation and packet loss
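To make the connection setup more concrete, here is a hedged sketch of the calling side: attach the local tracks, create an SDP offer, store it as the local description, and hand it to whatever signaling mechanism the application uses. `CallerPeer`, `TrackSource`, `SdpMessage`, `startCall`, and `sendToPeer` are hypothetical names for this example; a browser's `RTCPeerConnection` exposes `addTrack`, `createOffer`, and `setLocalDescription` methods with the same names.

```typescript
// Hypothetical minimal shapes, so the sketch does not depend on browser typings.
interface SdpMessage {
  type: string;
  sdp?: string;
}

interface TrackSource {
  getTracks(): unknown[];
}

interface CallerPeer {
  addTrack(track: unknown, stream: unknown): unknown;
  createOffer(): Promise<SdpMessage>;
  setLocalDescription(description: SdpMessage): Promise<void>;
}

// Caller side of the negotiation.
async function startCall(
  pc: CallerPeer,
  localStream: TrackSource,
  sendToPeer: (offer: SdpMessage) => void
): Promise<SdpMessage> {
  // 1. Attach every local audio/video track to the connection.
  for (const track of localStream.getTracks()) {
    pc.addTrack(track, localStream);
  }
  // 2. Ask the connection to describe the session as an SDP offer.
  const offer = await pc.createOffer();
  // 3. Record the offer locally before sharing it.
  await pc.setLocalDescription(offer);
  // 4. Deliver the offer through the application's own signaling channel.
  sendToPeer(offer);
  return offer;
}

// Browser usage (not executed here), assuming `stream` came from getUserMedia
// and `signaling.send` is provided by the application:
//   await startCall(new RTCPeerConnection(), stream, (o) => signaling.send(o));
```

The remote side would mirror these steps with an answer; that half is omitted to keep the sketch short.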
Once a peer connection between browsers is established, multimedia streams can be sent to the remote browser. This, however, is not as easy as it sounds, because of three distinct obstacles:
- Both peers are likely to reside in private networks or behind multiple layers of NAT, so neither of them is directly reachable.
- Neither has the basic network information about the other, such as IP address, port, and location, which is vital to establishing communication.
- Finally, both will need to traverse the NAT.
It is important to understand why these scenarios arise in the first place. The simple reason is that the Internet moved beyond the client-server paradigm long ago.
Before browsers can start communicating, three things are needed:
- Identifying the peers
- Exchanging session descriptions to set up media ports and IPs
- Information about the media data, which is conveyed through SDP (Session Description Protocol)
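These three requirements are usually met by a signaling channel that the application supplies itself. The sketch below models the messages such a channel might carry and uses an in-memory relay in place of a real server (which could be a WebSocket service, for example). The message fields and the `InMemorySignaling` class are assumptions made for illustration, not a format that WebRTC defines.

```typescript
// Hypothetical message shapes for an application-defined signaling channel.
type SignalMessage =
  | { kind: "offer"; from: string; to: string; sdp: string }
  | { kind: "answer"; from: string; to: string; sdp: string }
  | { kind: "candidate"; from: string; to: string; candidate: string };

type Inbox = (message: SignalMessage) => void;

// In-memory stand-in for a signaling server.
class InMemorySignaling {
  private inboxes = new Map<string, Inbox>();

  // Requirement 1: identify peers by registering each one under a unique id.
  join(peerId: string, inbox: Inbox): void {
    this.inboxes.set(peerId, inbox);
  }

  // Requirements 2 and 3: relay session descriptions (SDP) and candidates
  // to the addressed peer. Returns false when that peer is unknown.
  send(message: SignalMessage): boolean {
    const inbox = this.inboxes.get(message.to);
    if (!inbox) {
      return false;
    }
    inbox(message);
    return true;
  }
}

// Usage: both peers join, then one relays an offer to the other.
//   const hub = new InMemorySignaling();
//   hub.join("alice", (m) => console.log("alice got", m.kind));
//   hub.join("bob", (m) => console.log("bob got", m.kind));
//   hub.send({ kind: "offer", from: "alice", to: "bob", sdp: offerSdp });
```

Modeling the messages as a discriminated union lets the receiver switch on `kind` and have the compiler check that each case reads only the fields that exist for it.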
Nowadays, most people access the web from behind a firewall or NAT, which masks the original IP address by translating it dynamically. The IP address the public sees can be very different from the user's original address hidden behind the firewall, and some devices block unsolicited traffic toward the user's network. Some enterprises don't allow any traffic into their network without vetting it. As a result, it is not always possible to communicate with a peer browser located in a private network.
That's where STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers come into the picture.
This is how the process goes:
- The browser sends a request for its public IP address to a STUN/TURN server.
- The server responds with the public IP address it observed for that request.
- The browser then builds a set of Interactive Connectivity Establishment (ICE) candidates, each containing an IP address, port, and transport protocol.
- With this information about the public IP and port, it can connect with the peer.
- The peer browser, in turn, does the same thing using the STUN or TURN server.
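As an illustration of this process, the sketch below shows a configuration object listing one STUN and one TURN server, followed by a small parser that pulls the IP address, port, and transport protocol out of an ICE candidate line. The server URLs and credentials are placeholders, and `parseCandidate` is a hypothetical helper written for this example; browsers already expose these fields on their own candidate objects.

```typescript
// Placeholder servers: replace the hosts and credentials with real ones.
const exampleIceConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    { urls: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-pass" },
  ],
};
// Browser usage (not executed here): new RTCPeerConnection(exampleIceConfig)

interface ParsedCandidate {
  transport: string;     // "udp" or "tcp"
  address: string;       // IP address offered to the peer
  port: number;
  candidateType: string; // e.g. "host", "srflx" (learned via STUN), "relay" (via TURN)
}

// Parse a candidate line laid out as:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
// Returns null when the line does not follow that layout.
function parseCandidate(line: string): ParsedCandidate | null {
  const [head = "", , transport = "", , address = "", portText = "", typKeyword = "", candidateType = ""] =
    line.trim().split(/\s+/);
  if (!head.startsWith("candidate:") || typKeyword !== "typ" || candidateType === "") {
    return null;
  }
  const port = Number(portText);
  if (!Number.isInteger(port)) {
    return null;
  }
  return { transport: transport.toLowerCase(), address, port, candidateType };
}

// Usage with a made-up server-reflexive candidate (documentation-range address):
//   parseCandidate("candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 10.0.0.5 rport 46154")
```

A `srflx` candidate carries the public address reported by the STUN server, while a `relay` candidate points at the TURN server that forwards traffic when no direct path works.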
Note that signaling is not part of the WebRTC framework; it was left out for valid reasons. Different applications might need different protocols, and the WebRTC working group did not want to limit developers' choices.
Apart from audio and video, WebRTC handles bidirectional transmission of arbitrary data, including text chat, game data, and files, through the RTCDataChannel API. Every data channel is managed through this API.
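A minimal sketch of a text chat over a data channel might look like the following. `ChannelLike`, `ChannelFactory`, and `openChat` are hypothetical names for this example; in a browser, the factory would be an `RTCPeerConnection`, whose `createDataChannel` method returns a channel with `readyState`, `onmessage`, and `send` members.

```typescript
// Hypothetical minimal shapes, so the sketch does not depend on browser typings.
interface ChannelLike {
  readyState: string; // "connecting", "open", "closing" or "closed"
  onmessage: ((event: { data: unknown }) => void) | null;
  send(data: string): void;
}

interface ChannelFactory {
  createDataChannel(label: string): ChannelLike;
}

// Open a channel labeled "chat" and wire up incoming text messages.
function openChat(pc: ChannelFactory, onText: (text: string) => void) {
  const channel = pc.createDataChannel("chat");

  // Forward incoming text to the caller; binary payloads are ignored here.
  channel.onmessage = (event) => {
    if (typeof event.data === "string") {
      onText(event.data);
    }
  };

  return {
    // Send a line of text. Returns false while the channel is not open yet.
    say(text: string): boolean {
      if (channel.readyState !== "open") {
        return false;
      }
      channel.send(text);
      return true;
    },
  };
}

// Browser usage (not executed here), assuming `pc` is an RTCPeerConnection:
//   const chat = openChat(pc, (text) => console.log("peer says:", text));
//   chat.say("hello");
```

Checking `readyState` before sending avoids the error a channel raises when `send` is called before the connection has finished opening.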
The importance of WebRTC in peer-to-peer communication is obvious, but factors such as a Multipoint Control Unit (MCU), multitenancy, and SIP integration need to be taken into account when building a reliable, robust, and scalable video calling solution. Developers also have an easier option: a CPaaS provider such as EnableX, which offers the features, building blocks, and SDKs needed to build scalable video calling solutions.