The annual Kranky Geek RTC Conference was held a few months ago in San Francisco. It was an opportunity to listen to more than a dozen speakers from companies including Apple, AWS, Dolby, Google, Microsoft, Mozilla, and more discuss perspectives on solutions, emerging trends, and the complexity of delivering real-time audio & video on the web.
While the content isn’t aimed at newcomers who want to learn the basics of WebRTC, many of the topics are still interesting and engaging at any skill level, and they give useful context on the technology and its applications. Below is my list of some trends I observed while at the event.
It’s been a 10-year journey to get WebRTC 1.0 where it is today. The opening talk characterized it as a niche area, with less than 0.10% of questions tagged for a related technology. That stands in contrast to the growth in new projects, the emerging use cases, and the full house that packed into the room to learn how browsers are handling these capabilities, with team representatives from Firefox, Safari, Chrome, and Edge.
The browser panel covered a variety of concerns and upcoming features, including media handling, mobile versions, progressive web apps, improvements through AI/ML, security, and developer tooling.
Security is always a concern in any system, but it was given special attention in a few talks at Kranky Geek. Natalie Silvanovich from Google’s Project Zero discussed some of the dangers from recent exploits that allowed remote code execution without the target even having to answer a call. While this is a higher-profile example, a look at websites like Zerodium shows the bounties that are paid for discovering exploits in products.
If you are looking to add WebRTC capabilities to your site, you should either consider using a CPaaS that already handles the security issues properly or make sure you are doing enough risk mitigation, such as some of the ideas listed in the conclusions below:
Here are a few of the talks to check out:
If you aren't diligent about security, you may be better off using a
Communications Platform as a Service (CPaaS) that is.
The "C" in WebRTC is for communication, and for two-way communication good audio quality is important.
Microsoft has been extending its browser support for content protection as well as new audio and video capabilities. It’s an area that Google has also been trying to address in new media processes in Chrome – moving audio processing closer to the hardware and coping with mismatched audio devices / configurations through dedicated audio and video processes.
For audio perception, Paul Boustead from Dolby gave a good introduction to the theory behind spatial audio. Given that one of the primary objectives of WebRTC is communication, audio is an important part of exchanging information, and the best experiences are able to cope with overlapping speech without cutting out important affirmations and verbal cues.
While audio is important for much of communications, for certain applications it is just as important to be able to see what is happening, whether that is facial expressions or just the general area. The talks WebRTC – More than Media and Optimizing H.264 Encoding for Self-driving cars discussed tele-operation of robots and automobiles, where visual information is crucial for a participant to make decisions. Between those talks and Handling 4K WebRTC Streams with Embedded Hardware, there was a lot of interest in exploring the trade-offs in the video streaming landscape, balancing latency, packet loss, network congestion, and throughput to give the best perceived experience.
Across a range of applications, one can see how WebRTC audio and video can be used to solve a variety of problems. In the talk The State of Speech Recognition, Jeff “Susan” Ward made a few insightful observations: speech is a major untapped form of input and output, recognition can trade latency against accuracy, and the real power lies in getting to meaning rather than just text, because the future is not the word but the meaning and intent behind it.
Using WebRTC for gaming, as covered in the talk Google WebRTC & Stadia Review, is an example of a use case that the original vision for WebRTC had not considered. What was particularly interesting to me was the research cited, “On Latency and Player Actions in Online Games”. It isn’t the latency but the perception of latency that matters for enjoyment, and that can vary widely depending on the type of game and interaction (a first-person shooter vs. a trivia game, for example). One thoughtful detail in this talk was the discussion of some of the developer tools at https://stadia.dev that will be important for anybody trying to leverage WebRTC for gameplay.
That's my summary of what interested me the most, but there is obviously a lot more detail in the talks themselves. If you want to find them, check out the Kranky Geek YouTube channel: