Why did ZEGOCLOUD choose to build its own proprietary WebRTC gateway instead of using an open-source project?

#zegocloud #webdev #webrtc #video

ZEGOCLOUD's RTC platform is mainly composed of an accelerated data network called Massive Serial Data Network (MSDN) and a client-side RTC engine. Both components were built in-house by ZEGOCLOUD. The platform supports access from native mobile applications and web applications. It also enables communication between native mobile applications and web applications.

To support web client access, ZEGOCLOUD needs a WebRTC gateway server to bridge its data network and web applications. Several open-source projects, such as Kurento and Janus, can be used to build WebRTC gateway servers. So why did ZEGOCLOUD choose to develop its own proprietary WebRTC gateway server from the ground up? We will answer that question in this article.

To answer the question well, we will also cover several related topics in the sections that follow.

Why did ZEGOCLOUD need a WebRTC gateway?

After first being introduced by Google in 2011, WebRTC 1.0 was officially made a W3C Recommendation on January 26th, 2021. With the evolution of WebRTC and its growing adoption in various industries, there are many solid use cases in the internet industry nowadays. We will look into a few use cases to explain why a WebRTC gateway is needed.

In many situations, such as online fitness and remote interviews, users prefer a browser-based video conferencing service to an application that must be downloaded and installed from an app store, because using a web browser is much quicker and easier. In addition, users are generally reluctant to install an app for a service that they don't use frequently. From a service provider's perspective, allowing users to access services through a web browser helps reduce the cost of user acquisition and increase the user conversion rate. Therefore, enabling service access from a web browser makes a lot of sense to both users and service providers. Driven by requirements from many clients, ZEGOCLOUD recognized that allowing user access from a WebRTC-enabled browser was necessary and decided to add a WebRTC gateway to its RTC system.

Should we choose an open-source WebRTC gateway or build a proprietary one?

Well, there is no simple answer to this question. It depends on many factors and considerations. To make the right choice, we carefully assessed our requirements and the features and constraints of several commonly used open-source WebRTC gateways.

The open-source projects listed below are the most common ones in the market. We evaluated and compared them on a number of factors, including server type, encoding and decoding capabilities, documentation completeness, and the producer's background.

First of all, there are two types of servers: the Selective Forwarding Unit (SFU) and the Multipoint Control Unit (MCU, also called a Multipoint Conferencing Unit). An SFU provides a simple relay-style routing function, while an MCU implements a mixing architecture and provides many extendable functionalities such as stream mixing and transcoding. A typical MCU includes an SFU, so implementing an MCU is more complicated. Secondly, documentation completeness is paramount because it serves as the main guide for developers. Lastly, the producer's competency is also vital because it indicates whether the project will continue to be upgraded and supported. Additionally, you need to consider intellectual property and copyright issues if your project is a commercial one. Let's take a closer look at these candidate WebRTC gateways one by one.

1) Kurento is the most versatile of the open-source projects in this list. For example, it supports transcoding and has add-on features like video filters. But upon testing, we found that it was not quite stable. Its producer also offers a cloud-based solution, and it seems that the project was open-sourced with the aim of promoting that cloud service.
2) Janus was developed by Meetecho, aiming to provide a gateway server. The video call solution launched by Slack is based on this open-source project. But upon testing, we found some problems with the performance of Janus. Slack has done a huge amount of work to optimize it.
3) Jitsi is the most stable project in this list. It is stable because it is relatively simple. It implements only an SFU, which is essentially a relay router, rather than an MCU.
4) Licode provides both SFU and MCU functionalities. Its architecture was designed to support plug-ins. In other words, you can supplement your existing system with additional functionalities that Licode provides while keeping the design of the existing system unchanged.
5) Intel CS for WebRTC is a solution that Intel built with Licode. It is free but not open-sourced. It provides a set of client-side and server-side SDKs. It is the only project in this list that provides protocol conversion between RTMP and WebRTC. It is also a good choice for use in conjunction with other Intel solutions.
6) mediasoup only supports SFU. It can be used as a Node.js module or a Rust crate and integrated into a larger application. At the time we evaluated it, it was not quite stable yet since it hadn't been out for long.
Now, back to our original question: with so many open-source WebRTC gateways available, why did ZEGOCLOUD still decide to build its own proprietary WebRTC gateway? There are two main reasons:
1) The open-source projects available at the time could not satisfy all the needs of commercial use. The projects discussed above are not based on a distributed system architecture. If you want to use them to implement a back-end server with a distributed architecture that can support a large number of users, you have to refactor their architecture, which would cost an amount of R&D resources and time comparable to in-house development.

2) By building a WebRTC gateway from scratch, we can master the related technologies and have the freedom and ability to customize the framework of the gateway and optimize it according to our business requirements. There are some key technologies in WebRTC, such as RTP, RTCP, DTLS, and NetEQ, that are worth in-depth study. In fact, we already had a successful experience with RTMP. Our in-house RTMP solution achieved an ultra-low latency of 300 ms in real-time audio/video interactions. As RTMP is built upon TCP, it is very challenging to reduce the latency to such a low level. But building the solution from the ground up ourselves helped us understand the technology well and enabled us to customize our design and optimize our performance to meet such challenging requirements. This is also one of our core advantages.

The advantages and disadvantages of WebRTC in various communication models

WebRTC is designed to allow communication in a peer-to-peer (P2P) model, where the participating peers can connect directly without a server involved. As opposed to P2P, communication can also be established in a client-server model, where the server handles the media relay and signaling. A typical commercial RTC system uses a client-server model.

In a basic WebRTC P2P connection, the two communication peers connect directly without a media server. If there is a NAT between them, WebRTC uses a STUN server to discover each peer's public address for NAT traversal. If a direct connection still cannot be established, a TURN server is used to relay media data between the peers. The choice between a direct path and a TURN relay is made by ICE (Interactive Connectivity Establishment), which is a framework run by the peers themselves rather than a separate server. STUN and TURN services are often deployed on the same physical server. Since no media server guarantees bandwidth and computational power, the quality of transmission is vulnerable to the congestion that occurs very often on the Internet.
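The preference for direct paths over relayed ones can be illustrated with the candidate priority formula recommended in the ICE specification (RFC 8445). The sketch below is a simplification: real ICE ranks candidate pairs and runs connectivity checks, and the function names here are made up for illustration.

```python
# Recommended type preference values from RFC 8445.
TYPE_PREFERENCE = {
    "host": 126,   # address on a local network interface
    "prflx": 110,  # peer-reflexive, learned during connectivity checks
    "srflx": 100,  # server-reflexive, learned from a STUN server
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """Priority of a single ICE candidate, per the RFC 8445 recommended formula."""
    return (
        (2 ** 24) * TYPE_PREFERENCE[cand_type]
        + (2 ** 8) * local_preference
        + (256 - component_id)
    )


def rank_candidates(cand_types):
    """Hypothetical helper: order candidate types from most to least preferred."""
    return sorted(cand_types, key=candidate_priority, reverse=True)


for cand_type in rank_candidates(["relay", "srflx", "host"]):
    print(cand_type, candidate_priority(cand_type))
```

With these values a TURN relay candidate always ranks last, which matches the description above: the relay is the fallback when no direct path works.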

In a typical case of RTC connections in a client-server model, the two communication peers connect indirectly via a cluster of servers, which take care of many things, including network access scheduling, routing, load balancing, media relay and mixing, and other tasks. There are many servers in the cluster, organized in a sophisticated way to form a data transmission network. In ZEGOCLOUD’s case, MSDN plays the role of this data transmission network. In a route between the two communication peers, the quality of transmission is safeguarded by sufficient bandwidth and computational power of the servers.

The advantage of the P2P model is its low cost of communication. As there is no server involved, the cost of bandwidth and computation is eliminated. Without server relay, traveling time between the two ends will be reduced. However, the disadvantage of the P2P model is its low quality of communication. Without a back-end media server, the Internet’s best-effort delivery approach often results in stutter and delay in P2P real-time communication.

Let’s take a closer look at several communication scenarios, and see the disadvantages of the P2P model with WebRTC as an example.
1) One-on-one. A one-on-one video call is a typical example of this kind of scenario. In a one-on-one call, each user publishes an uplink stream and subscribes to a downlink stream, and the bandwidth required under the P2P model is similar to that required under the client-server model.
2) Many-to-many. A typical example of this scenario is a video conference with multiple parties involved. Let's say there are X users involved. In the P2P model, each user has to publish (X-1) uplink streams and subscribe to (X-1) downlink streams. In the client-server model, each user has to publish one single uplink stream and subscribe to (X-1) downlink streams. We can see that the bandwidth consumption for uplink streaming in the P2P model is much heavier than that in the client-server model. That's why it is very difficult to scale up the number of users in the P2P model.
3) One-to-many. A typical example of this kind of scenario is one-way live streaming. The P2P model is not suitable for such a scenario because it cannot support a large number of concurrent stream subscribers.
4) Communication with other protocols. Live streaming can be used as an example for this scenario too. For live streaming, a content delivery network (CDN) must be used to support a large number of concurrent users. To use a CDN, streams typically need to be delivered using the RTMP protocol, which commonly carries AAC audio and H.264 video. As WebRTC uses RTP/RTCP for data transmission, a WebRTC stream needs to be converted to RTMP, and its media payload needs to be transcoded from Opus and VP8 to AAC and H.264, or the other way around. An MCU is needed here to perform transcoding and transmuxing. Clearly, WebRTC in the P2P model doesn't satisfy this requirement.
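The per-user stream counts from scenarios 1 and 2 can be checked with a short sketch. The function name and the model labels are illustrative assumptions, and the P2P case assumes a full mesh where every user connects to every other user.

```python
def streams_per_user(participants, model):
    """Return (uplink, downlink) stream counts for one user in a call.

    model is "p2p" (full mesh) or "client-server" (one uplink to a server).
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if model == "p2p":
        # One copy of the user's stream is sent to each other peer.
        return others, others
    if model == "client-server":
        # The server fans the single uplink out to the other users.
        return 1, others
    raise ValueError("unknown model: " + model)


for x in (2, 4, 10):
    p2p_up, _ = streams_per_user(x, "p2p")
    cs_up, _ = streams_per_user(x, "client-server")
    print(f"{x} users: P2P uplink={p2p_up}, client-server uplink={cs_up}")
```

For two users both models need one uplink per user, while for ten users the mesh needs nine uplinks per user against a single one in the client-server model.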

Therefore, WebRTC in the P2P model has many constraints in terms of bandwidth, scalability, stability, smoothness, latency, and compatibility. Its advantages of low cost and being open-source do not outweigh the significant disadvantages discussed above.

Commercial RTC solutions versus pure WebRTC (P2P) solutions

It would be fair to say that commercial RTC solutions have a clear competitive edge over pure WebRTC (P2P) solutions. To be more specific, the following are some major aspects where commercial systems outperform pure WebRTC solutions:

1) Cost-efficiency
Building a commercial-grade RTC system is an arduous task that requires a significant amount of R&D time and effort. In general, you will need to assemble a team of veterans who have at least five years of experience in multimedia software development, and it would take about six months to a year to build a functional RTC system and make it generally available. The multimedia team will need at least six developers to work on the essential modules of WebRTC, such as transmission (RTP/RTCP), the voice engine (NetEQ, ANS, AGC, and AEC), and the video engine (jitter buffering, etc.).
A commercial RTC platform allows you to add RTC capabilities to your product with a few lines of code and get your project off the ground in a couple of weeks. In addition, it allows your platform to use RTC services in a pay-for-usage model.
So, by using a commercial RTC system, you can reduce your development costs, accelerate your time to market, and enjoy commercial-grade RTC performance without having to maintain the underlying infrastructure.
2) Scalability
It is very hard to scale up a pure WebRTC (P2P) solution for a few reasons. Firstly, a pure WebRTC (P2P) solution has no server to support system expansion (ICE, STUN, and TURN are only for NAT traversal). Secondly, the P2P communication model consumes an excessive amount of bandwidth at the user end, which makes it very hard to scale up. Lastly, supporting a high volume of concurrency and cross-border communications with global coverage requires a sophisticated data transmission network.
A commercial RTC platform can save you from these constraints thanks to its server-centric architecture and powerful data transmission network. With clusters of servers taking care of heavy computation and transmission tasks such as nearby network access, smart routing, transcoding, transmuxing, and stream mixing, a commercial RTC system has no hard limit on scaling up. Theoretically, it can support as many users as needed by adding more servers horizontally. Meanwhile, to guarantee the user experience in cross-border communications, a commercial RTC platform normally has a lot of servers deployed around the world. For example, ZEGOCLOUD has deployed more than 500 BGP servers around the globe. ZEGOCLOUD has joined all these infrastructure components with its sophisticated transmission and routing algorithms to form a global data network called Massive Serial Data Network (MSDN), which allows you to scale up your user base and accelerate voice and video data transmission to reach an ultra-low latency.

3) User experience
The most important factors in evaluating an RTC platform's user experience are audio/video quality, smoothness, and latency.
A P2P WebRTC-based system cannot support multiple high-definition streams because of its bandwidth constraint at the user end. Without the support of a data transmission network, a pure WebRTC (P2P) solution cannot guarantee a smooth and low-latency communication experience.
With a commercial RTC platform that is based on the client-server model and has a data transmission network, a communication peer just needs to publish one uplink stream and can subscribe to either a single downlink stream or multiple streams. This gives a commercial RTC platform a much higher capability for streaming high-definition videos. Also, a mature commercial RTC platform like ZEGOCLOUD has been in the market for 7 years, and, driven by clients' demands, ZEGOCLOUD has been optimizing its user experience to stay ahead of the curve. After all, if a commercial system cannot deliver great performance, its clients won't pay to use its service.
4) Use case support
WebRTC is an open-source technology with fundamental voice and video call abilities, but it was not designed for any specific business purpose or use case in the first place. There is a big gap between WebRTC and a business platform empowered with RTC ability. A great advantage of a well-established commercial RTC platform is that it has evolved to support hundreds of use cases of its representative clients. It has customized its RTC features to cater to its clients' needs in various scenarios. For example, live streaming use cases require that streams be encapsulated using the RTMP protocol so that they can be broadcast over a CDN. Most commercial RTC platforms support the RTMP protocol, while WebRTC doesn't. Another example is online karaoke chorus, where there is a strong demand that lyrics and accompaniments be displayed in sync with the vocal track. ZEGOCLOUD customized its solution and enriched its RTC platform to support this use case. Over the years, ZEGOCLOUD has created many solutions for a diverse set of use cases in a wide range of vertical markets, and that's something a pure WebRTC (P2P) solution cannot do.

5) System compatibility
RTC technology has evolved for decades, and various protocols have emerged to cater to different needs, such as SIP, H.323, and WebRTC. Moreover, in terms of transmission protocols, there are RTMP and RTP/RTCP. When it comes to encoding and decoding, there are H.264 and VP8 for video, and AAC and Opus for audio. A platform's users may come from various kinds of terminals. For example, a user who makes a video call from an RTMP-based RTC app won't be able to communicate directly with another user on a Google Chrome browser. There must be a gateway server in between to translate the protocols and transcode the media payload. This scenario is far beyond what a typical pure WebRTC (P2P) solution can do. A commercial RTC platform is compatible with terminals of different kinds. For example, ZEGOCLOUD allows access from various kinds of terminals besides its own native SDK, such as WebRTC and SIP. ZEGOCLOUD's MSDN is compatible with a WebRTC terminal through a WebRTC gateway. The WebRTC gateway translates the WebRTC protocol into ZEGOCLOUD's proprietary protocols and also transcodes VP8/Opus into H.264/AAC, offering seamless compatibility for WebRTC-enabled browsers or WebRTC-based applications.
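To make the gateway's job concrete, here is a minimal sketch of how a bridge could decide which conversions are needed between two terminals. All names (MediaFormat, bridge_plan, the two example terminals) are hypothetical and do not describe ZEGOCLOUD's actual gateway or its proprietary protocols. The sketch only compares the transports and codecs named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFormat:
    """Hypothetical description of what a terminal sends and receives."""
    transport: str
    video_codec: str
    audio_codec: str


# Example terminals, using the protocols and codecs named in the article.
WEBRTC_BROWSER = MediaFormat("RTP/RTCP", "VP8", "Opus")
RTMP_APP = MediaFormat("RTMP", "H.264", "AAC")


def bridge_plan(source, target):
    """List the conversions a gateway would need between two terminals."""
    steps = []
    if source.transport != target.transport:
        steps.append(f"transmux {source.transport} -> {target.transport}")
    if source.video_codec != target.video_codec:
        steps.append(f"transcode video {source.video_codec} -> {target.video_codec}")
    if source.audio_codec != target.audio_codec:
        steps.append(f"transcode audio {source.audio_codec} -> {target.audio_codec}")
    return steps


for step in bridge_plan(WEBRTC_BROWSER, RTMP_APP):
    print(step)
```

When both terminals already share a transport and codecs, the plan is empty and the media can be relayed as-is, which is why transcoding cost only appears when heterogeneous terminals meet.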

WebRTC has been a great textbook for developers who work on RTC. However, there is a big gap between what it offers as an open-source standard and actual commercial use. A system for commercial use must meet high demands for stability, scalability, and compatibility, which ultimately translate into revenue, cost, and risk. The results of experiments by many companies have shown that a pure WebRTC solution is not adequate to satisfy the very challenging demands of commercial use. Therefore, ZEGOCLOUD decided to build its own proprietary WebRTC gateway in-house rather than adopt an open-source project.

Visit the ZEGOCLOUD website to learn more about what you can build with real-time audio and video!

You can contact us if you have any questions.