5 Interesting Trends in WebRTC from Kranky Geek 2019

#webrtc #dolby

The annual Kranky Geek RTC Conference was held a few months ago in San Francisco. It was an opportunity to listen to more than a dozen speakers from companies including Apple, AWS, Dolby, Google, Microsoft, Mozilla, and more discuss perspectives on solutions, emerging trends, and the complexity of delivering real-time audio & video on the web.

While the content isn’t aimed at newcomers learning the basics of WebRTC, many of the topics are interesting at any skill level and give useful context on the technology and its applications. Below are some of the trends I observed at the event.

(1) WebRTC Reaching Version 1.0

It’s been a 10-year journey to get WebRTC 1.0 to where it is today. The opening talk characterized it as a niche area, with less than 0.10% of questions tagged for a related technology. That contrasts with the growth in new projects, the emerging use cases, and the full house that packed the room to hear how browsers are handling these capabilities, with team representatives from Firefox, Safari, Chrome, and Edge.

The browser panel covered a variety of concerns and upcoming features, including media handling, mobile versions, progressive web apps, improvements through AI/ML, security, and developer tooling.

(2) Security is Important in RTC

Security is always a concern in any system, but it was given special attention in a few talks at Kranky Geek. Natalie Silvanovich from Google’s Project Zero discussed some of the dangers from recent exploits that allowed remote code execution without the target even having to answer a call. While this is a high-profile example, a look at websites like Zerodium shows the bounties that are paid for discovering exploits in products.

If you are looking to add WebRTC capabilities to your site, you should either consider using a CPaaS that is already handling the security issues properly or make sure you are doing enough risk mitigation such as some of the ideas listed in the conclusions below:

Another concern in recent years has been the misuse of WebRTC’s IP address discovery to fingerprint and track users. As Philipp Hancke of webrtcHacks said in his talk: “WebRTC is not an API designed to let you discover a user’s IP without their consent.” Alongside the protections around access to the microphone and camera, browsers are adding solutions such as mDNS, a multicast DNS approach that generates local network names which resolve to IP addresses without exposing those addresses to JavaScript for other purposes.
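To make the mDNS idea concrete, here is a minimal TypeScript sketch that inspects an ICE candidate line and reports whether its address field is an mDNS name ending in ".local" or a literal IP address. The function names and the sample candidate strings in the usage note are hypothetical and made up for illustration; the field order assumed here is the standard SDP candidate attribute layout (foundation, component, transport, priority, address, port, "typ", type).

```typescript
// Sketch only: classify the address carried in an ICE candidate line.
// Assumed layout (standard SDP candidate attribute):
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...

type AddressKind = "mdns" | "ip" | "unknown";

// Pull the address field out of a candidate line, or return null if the
// line does not look like a candidate attribute.
function candidateAddress(candidateLine: string): string | null {
  const pattern =
    /^(?:a=)?candidate:\S+\s+\d+\s+\S+\s+\d+\s+(\S+)\s+\d+\s+typ\s+\S+/;
  const match = pattern.exec(candidateLine.trim());
  const address = match ? match[1] : undefined;
  return address === undefined ? null : address;
}

// Decide whether the address is an mDNS hostname or a literal IP.
function classifyAddress(candidateLine: string): AddressKind {
  const address = candidateAddress(candidateLine);
  if (address === null) return "unknown";
  // Browsers that obfuscate host candidates put a generated "<name>.local" here.
  if (/\.local$/i.test(address)) return "mdns";
  const isIPv4 = /^\d{1,3}(\.\d{1,3}){3}$/.test(address);
  const isIPv6 = address.indexOf(":") !== -1;
  return isIPv4 || isIPv6 ? "ip" : "unknown";
}
```

For example, a made-up line such as "candidate:1 1 udp 2113937151 1f4712db-ea17-4bcf-a596-105139dfd8bf.local 54596 typ host" would be classified as "mdns", while "candidate:2 1 udp 2122260223 192.168.1.20 61665 typ host" would be classified as "ip". In the first case a script reading the candidate only sees the generated name, not the local address behind it.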

Here are a few of the talks to check out:

If you aren't diligent about security, you may be better off using a Communications Platform as a Service (CPaaS) that is.

(3) Communication Requires Good Audio Quality

The "C" in WebRTC is for communication, and for two-way communication, good audio quality is important.

Microsoft has been extending its browser support for content protection as well as new audio and video capabilities. It’s an area that Google has also been trying to address in new media processes in Chrome – moving audio processing closer to the hardware and coping with mismatched audio devices / configurations through dedicated audio and video processes.

For audio perception, Paul Boustead from Dolby gave a good introduction to the theory behind spatial audio. Given that one of the primary objectives of WebRTC is communication, audio is an important part of exchanging information, and the best experiences are able to cope with overlapping speech without cutting out important affirmations and verbal cues.

You can read more about this talk in the article Improving Intelligibility with Spatial Audio or check out the talk video.
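As a rough illustration of the idea that separating voices in space helps with overlapping speech, the sketch below uses equal-power stereo panning to spread talkers across the left-right field. This is a common textbook technique chosen for illustration, not the method presented in the talk, and the names equalPowerGains and spreadTalkers are hypothetical.

```typescript
// Sketch only: equal-power stereo panning, a generic technique and
// an assumption for illustration, not the approach described in the talk.

interface StereoGain {
  left: number;
  right: number;
}

// pan ranges from -1 (hard left) through 0 (center) to +1 (hard right).
// Values outside that range are clamped.
function equalPowerGains(pan: number): StereoGain {
  const clamped = Math.max(-1, Math.min(1, pan));
  // Map [-1, 1] onto an angle in [0, pi/2] so that left^2 + right^2 = 1.
  const angle = ((clamped + 1) * Math.PI) / 4;
  return { left: Math.cos(angle), right: Math.sin(angle) };
}

// Spread `count` talkers evenly across the stereo field.
function spreadTalkers(count: number): number[] {
  if (count <= 0) return [];
  if (count === 1) return [0];
  const positions: number[] = [];
  for (let i = 0; i < count; i++) {
    positions.push(-1 + (2 * i) / (count - 1));
  }
  return positions;
}
```

With three talkers, spreadTalkers(3) places them at hard left, center, and hard right, and equalPowerGains gives each one a left/right gain pair whose combined power stays constant. Full spatial audio goes well beyond a stereo pan, but even this simple separation shows why two people speaking at once can remain distinguishable.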

(4) Video Can Be Perceived Better Through Network Management

While audio is important for much of communication, for certain applications it is very important to see what is happening, whether that is facial expressions or just the general surroundings. The talks WebRTC – More than Media and Optimizing H.264 Encoding for Self-driving cars discussed tele-operation of robots and automobiles, where visual information is crucial for a participant to make decisions. Between those talks and Handling 4K WebRTC Streams with Embedded Hardware, there was a lot of interest in exploring the trade-offs in the video streaming landscape, balancing latency, packet loss, network congestion, and throughput to give the best perceived experience.

(5) WebRTC is Not Just for Conferencing, Several Emerging Use Cases

Across a variety of applications, one can see how WebRTC audio and video can be used to solve many kinds of problems. In the talk The State of Speech Recognition, Jeff “Susan” Ward made a few insightful observations: speech is a major untapped input/output, recognition can balance latency against accuracy, and the real power lies in getting to meaning rather than just text, because the future is not the word but the meaning and intent behind it.

Using WebRTC for gaming, as covered in the talk Google WebRTC & Stadia Review, is an example of a use case that the original vision for WebRTC had not considered. What was particularly interesting to me was the research cited, “On Latency and Player Actions in Online Games”. It isn’t the latency itself but the perception of latency that matters for enjoyment, and that can vary widely depending on the type of game and interaction (a first-person shooter vs. a trivia game, for example). One thoughtful detail in this talk was the discussion of some of the developer tools at https://stadia.dev that will be important for anybody trying to leverage WebRTC for gameplay.

Want More?

That's my summary of what interested me the most, but there is obviously a lot more detail in the talks themselves. If you want to watch them, check out the Kranky Geek YouTube channel:

https://www.youtube.com/channel/UC9qvM7eiCvDRO5Sm28byZiw