Tasha for Daily

Posted on Sep 26, 2023 • Edited on Oct 2, 2023 • Originally published at daily.co

Daily AI Week* 2023: Voice, video, and AI for developers

#ai #news #webrtc #openai

By Kwindla Hultman Kramer, Nina Kuruvilla

Daily’s developer platform powers audio and video experiences for millions of people all over the world. Our customers are developers who use our APIs and client SDKs to build audio and video features into apps and websites.

We're now rolling out several AI initiatives that combine voice, video, and AI for developers.

Last week we kicked off with the launch of our AI-powered Clinical Notes API, a first for telehealth developers. We're excited to follow up with two more toolkits, several new components of our global infrastructure, and hot-off-the-presses, AI-focused partnerships.

While we have more releases and announcements coming in the months ahead, we're delighted to start things off with AI Week, a series of content about our latest AI-related and AI-adjacent work.

*We ended up having so much to share that AI Week turned into 2.5 weeks. Like OpenAI and GPT-3.5, we embrace floating point implementation improvements that round to ½.

Below we'll first run down what we're writing and talking about in this series. Then we'll give some brief context about AI at Daily, and how the “surface area” of AI intersects with our long-term work, roadmap, and mission.

Ahead This Week

Wednesday we'll start with our new Python SDK for AI-enabled WebRTC use cases like voice-driven Large Language Model UIs, real-time copilots, and interacting with synthetic characters. In our first use case example of live video and AI, we'll look at how to enable content moderation with ActiveFence, a leader in Trust and Safety.
Thursday we’ll do a technical deep dive into our AI-Powered Clinical Notes API for Telehealth, discussing the components and infrastructure behind it.
On Friday we'll share a post about how our engineering org ran a process to evaluate and manage AI tools for internal use at Daily.
Heading into next week, on Tuesday we'll show off some sample code that connects real-time video and audio to AI models running on the Cerebrium serverless GPU platform.
Next Wednesday we're excited to look at our new Automatic Video Highlights toolkit. We think these tools will change how everyone working with video (creators, influencers, marketers, SaaS platforms) approaches making edits and clips from recordings of live streams. We'll show components of an automated video editing workflow that starts with a poker session — courtesy of our friends at Cloud Poker Night, the platform for business minds who love poker — and ends with AI generating highlight clips ready to post on social media. We'll discuss its enabling technology, vcsrender, a new toolkit for server-side editing.
Next Thursday, we'll do a deep dive into building applications that interface with Large Language Models interactively. We’ll share demo code and several developer resources.

The Possibilities and Promise of AI in 2023

Starting last year, in quick succession, DALL-E, Stable Diffusion, and ChatGPT rewrote our understanding of what large-scale neural networks are capable of. So far this year, the pace of change in AI has continued to accelerate.

Over the past few months, we’ve helped many of our customers build AI-powered features spanning the video stack. These new features have combined WebRTC real-time video and audio, recorded video and audio, and generative AI and machine learning:

Talking in real-time to a Large Language Model – via both text and voice
Real-time translation during a video call – again with both speech-to-speech and speech-to-text output
Analyzing real-time video and audio with the help of specialized machine learning models – for example, for content moderation purposes
Co-pilot tools that provide specialized, on-the-fly help during a video call – for example, a tool that provides live coaching for sales professionals during meetings
Post-processing workflows that deliver summaries, analytics, and structured data output after a video call – for example, the AI-powered Clinical Notes API for Telehealth that we announced last week
AI-powered, automated video editing – highlights and short clips, templated output in multiple formats

A theme running through all of these use cases is that today’s new AI tools allow you to do more with data you already have. Video, voice, and events data carry meaning. Large Language Models can extract some of that meaning, save it for later use and reuse, transform it, generate structured data from it, and provide value and insights.

At Daily, our goal is to help developers build whatever they can imagine with video. This means building tools that support a wide range of use cases. To that end, we provide end-to-end infrastructure for every video experience, from real-time WebRTC, to live streams, to recorded video and audio, and now to AI.

As all of us are expanding our imaginations to include things that are now possible with new AI technologies, we’re expanding Daily’s capabilities to help enable new experiences that combine AI with video and audio.

How AI Fits Into What We Do At Daily

At Daily, we’ve long used AI and machine learning in our core code for features like transcription, video background replacement, and large-scale analytics.

Now we’re extending Daily’s SDKs to support building app-level AI-powered features and workflows. This includes:

Sending real-time WebRTC audio, video, and events data to AI models and services
Connecting synthetic AI participants into real-time WebRTC sessions
Building post-processing pipelines that leverage Daily’s recording, compositing, and transcription building blocks in combination with Large Language Models and other AI tools
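
As a rough illustration of the last item, a post-processing pipeline can be as simple as turning timestamped transcript segments into a prompt and handing that prompt to a language model. The sketch below is not Daily's API: `TranscriptSegment`, `format_timestamp`, `build_summary_prompt`, and `summarize_call` are hypothetical names chosen for this example, and the model is passed in as a plain callable so that any LLM client could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class TranscriptSegment:
    """One utterance from a call transcript (hypothetical shape, not a Daily type)."""
    speaker: str
    start_seconds: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a second offset as MM:SS so the prompt is easy to read."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_summary_prompt(segments: Iterable[TranscriptSegment]) -> str:
    """Turn transcript segments into one prompt, ordered by start time."""
    lines = [
        f"[{format_timestamp(s.start_seconds)}] {s.speaker}: {s.text.strip()}"
        for s in sorted(segments, key=lambda s: s.start_seconds)
    ]
    return "Summarize the following call transcript:\n" + "\n".join(lines)


def summarize_call(
    segments: Iterable[TranscriptSegment],
    model: Callable[[str], str],
) -> str:
    """Build the prompt, then call whatever model function was supplied."""
    return model(build_summary_prompt(segments))
```

Keeping the model behind a callable means the prompt-building step can be checked on its own, with a stand-in function, before any real LLM service is wired in.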

This is an important expansion of our platform's surface area.

To put this into context, we've shipped new features almost every week since we launched in 2016. In the first years of our platform development, those improvements focused on building out Daily's real-time capabilities — that is, giving developers more tools to build more features and more powerful offerings in, basically, video calls.

For example, we were the first platform to offer a full suite of HIPAA-compliant APIs and the first to offer an integrated customer support dashboard with complete application logs and WebRTC metrics. We've also consistently invested in supporting larger real-time experiences and more real-time interactivity. Daily is the only developer platform that supports 100,000 active participants in a single, shared real-time video session.

We'll always view this real-time infrastructure and feature set as the core of everything we do! At the same time, in a video-first world, real-time complements broadcast streaming and recording like peanut butter goes with jelly, chips with salsa, rice with pickle (or however your palate moves you!).

In 2021 we embarked on a very large addition to Daily’s API footprint. Expanding our feature set and our global infrastructure, we defined an ambitious roadmap for cloud compositing, recording, and broadcast live streaming. The results of that roadmap and subsequent development effort include:

Daily’s Video Component System
HD cloud recording with output quality and frame rate guarantees
HIPAA-compliant recording
Multi-streaming
Recording raw RTP track data
Support for recording HEVC input tracks
WebVTT metadata support
HLS output with configurable bitrate ladders and storage destinations

For developers, these capabilities unlock new use cases, and are significantly different from and more advanced than similar capabilities available on other platforms. Fundamentally, this work organically extends the WebRTC real-time features that were our initial focus when we started Daily in 2016.

Now we’re approaching how we enable new AI capabilities the same way that we approached building our industry-leading recording offering. We're adding toolkits, APIs, and SDKs so developers can seamlessly expand what’s possible in their apps.

Do More With Your Data

Even as interest in cutting-edge, generative AI continues to increase, it’s clear there's so much more ahead; we’ve only just started to explore the possibilities of these new tools.

The capabilities we’re announcing this week are part of Daily’s long-term commitment to be the best platform and provide the best tools for combining AI and Large Language Models with WebRTC and video.

Our perspective on these new capabilities is informed by our experience building specialized real-time infrastructure that routes video and audio packets at real-time latencies to millions of people, all over the world. We understand what it takes to ship products, scale usage, and support customers. Just as we’ve helped thousands of teams build video experiences in the first place, we can help you use AI to do more with and get more value from your video and audio data.

We love to write and talk about this stuff. If you’re interested in how generative AI overlaps with interactive video and audio, please check out the rest of our posts this week, join us on the peerConnection WebRTC forum, and find us online or IRL at one of the events we host.