Playing With the New Document Picture-in-Picture API in Video Calls

There is a new Web API in town: the Document Picture-in-Picture API.

The origin trial of this API is rolling out with Chrome M111 (released on 7 March 2023). While it might not make it to all browsers, it can definitely pull off a few useful party tricks in your application.

What is Picture-in-Picture?

Picture-in-Picture (PiP) is a web browser feature that shows content in a floating window that stays on top of other windows. The window can be moved and resized independently of the web page it originated from. This lets users multitask while watching videos: they can continue to browse other pages or use other applications while the video plays in a small window.

Video PiP

In modern web browsers like Chrome, Firefox, and Safari, you can pop out the video playing in an existing HTMLVideoElement.

There is also a browser API for this. To do it from JavaScript, call the requestPictureInPicture() method on an HTMLVideoElement to pop out the video. Chrome, Edge, and Safari support this Web API; Firefox, however, has no programmatic way of triggering PiP.
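As a rough sketch of that call, the helper below toggles PiP for a video element. The name toggleVideoPip and the injectable doc parameter are assumptions made for illustration; the browser members it relies on (pictureInPictureEnabled, pictureInPictureElement, exitPictureInPicture() and requestPictureInPicture()) belong to the existing video PiP API.

```javascript
// Hypothetical helper: toggle Picture-in-Picture for a <video> element.
// `doc` is injectable so the function can be exercised outside a browser.
async function toggleVideoPip(video, doc = globalThis.document) {
  // Bail out where the video PiP API is missing or disabled.
  if (
    !doc ||
    !doc.pictureInPictureEnabled ||
    typeof video.requestPictureInPicture !== 'function'
  ) {
    return 'unsupported';
  }

  // If this video is already popped out, bring it back into the page.
  if (doc.pictureInPictureElement === video) {
    await doc.exitPictureInPicture();
    return 'closed';
  }

  // Otherwise pop it out.
  await video.requestPictureInPicture();
  return 'opened';
}
```

In a page, this would typically be called from a click handler, since browsers generally require a user gesture before opening a PiP window.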

Up until now, Picture-in-Picture was only implemented for video playback, which severely limited what applications could do with it.

A powerful new API: Document Picture-in-Picture API

Last December, the Chrome team put out a proposal to experiment with a new PiP API that would allow arbitrary content inside an always-on-top window.

It works much like opening a new window with window.open(), except that the newly opened window always stays on top of other windows and closes whenever the original page is closed or navigates away.

const pipWindow = await documentPictureInPicture.requestWindow();

The above code pops out a new window. You can now add any arbitrary element to this window:

pipWindow.document.body.innerHTML = "Hello!";

Another interesting difference from window.open() is that you don’t have to use postMessage to communicate between the two windows (the parent and the popped-out one); you can access the new window's JavaScript context directly. So you can simply move a DOM element from the parent window to the new window.

// from parent
const elem = document.getElementById("parent"); // getElementById takes a bare id, not a CSS selector
// to pop-out
pipWindow.document.body.appendChild(elem);
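Because the element is moved rather than copied, it disappears from the parent page and is lost when the PiP window closes unless it is moved back. The sketch below shows one way to handle that, assuming a hypothetical helper named popOutElement and an injectable pipApi parameter; it listens for the pagehide event on the PiP window to restore the element.

```javascript
// Hypothetical helper: move `elem` into a PiP window and restore it on close.
// `pipApi` defaults to the browser's documentPictureInPicture object.
async function popOutElement(elem, pipApi = globalThis.documentPictureInPicture) {
  // Remember where the element lived so it can be put back later.
  const originalParent = elem.parentNode;
  const originalNext = elem.nextSibling;

  const pipWindow = await pipApi.requestWindow();
  pipWindow.document.body.append(elem);

  // "pagehide" fires on the PiP window when it is closed.
  pipWindow.addEventListener('pagehide', () => {
    // Append at the end if the old sibling is no longer in the same parent.
    const ref =
      originalNext && originalNext.parentNode === originalParent
        ? originalNext
        : null;
    originalParent.insertBefore(elem, ref);
  }, { once: true });

  return pipWindow;
}
```

Keep in mind that the PiP window has its own document, so stylesheets from the parent page do not apply to the moved element unless they are added to the new window as well.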

::: Note :::

This API is behind an origin trial.

Origin trials enable developers to build demos and prototypes using new features. They also help Chrome engineers understand how new features are used, and how they may interact with other web technologies.

You can register for this trial at https://developer.chrome.com/origintrials/#/view_trial/1885882343961395201 or enable the flag for this API at chrome://flags/#document-picture-in-picture-api
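Once you have registered, the trial token has to be delivered with the page, normally through an Origin-Trial HTTP header or a meta tag with http-equiv="origin-trial". The sketch below injects that meta tag from script; the function name, the doc parameter, and the placeholder token are illustrative assumptions.

```javascript
// Hypothetical helper: add an origin trial token to the page at runtime.
// `doc` is injectable so the function can be exercised outside a browser.
function addOriginTrialToken(token, doc = globalThis.document) {
  const meta = doc.createElement('meta');
  meta.httpEquiv = 'origin-trial';
  meta.content = token;
  doc.head.append(meta);
  return meta;
}

// Usage in a page (replace the placeholder with the token from your registration):
// addOriginTrialToken('YOUR_TOKEN_HERE');
```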

Applications in browser video calls

A typical video call involves someone sharing their screen. When they do, the video call tab loses focus and moves to the background, while some other window or tab is in the foreground for the user who is sharing.

At this point, the user sharing their screen loses access to the chat. They have to keep switching back to the call's browser tab to read it, then return to the shared content to resume.

With this new API, we can pop out the chat so the user can share their screen without losing track of the conversation.

Let’s see how to implement this.

// Check if the new API is supported
// (Chrome 111+ with the flag enabled or with an origin trial token)
if ('documentPictureInPicture' in window) {
    // Check if the user is screen sharing
    if (screenShareEnabled) {
        // Check if the user is sharing a tab;
        // we don't want to pop out if they are sharing the entire screen
        if (Object.getPrototypeOf(screenVideoTrack)[Symbol.toStringTag] === 'BrowserCaptureMediaStreamTrack') {
            // All checks passed: request the PiP window here
        }
    }
}

Now, if all of these are true, let’s pop out the chat!

// pipWindow is assumed to be declared in an outer scope, e.g. `let pipWindow;`
pipWindow = await window.documentPictureInPicture.requestWindow({
    initialAspectRatio: 1 / 7,
});
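The tab check above compares the class name of the track object, which depends on how the browser implements screen capture tracks. A possible alternative, shown here only as a sketch, is to read the track's displaySurface setting, which is reported as 'browser' for a tab, 'window' for an application window, and 'monitor' for a whole screen. The helper name isTabCapture is an assumption, and displaySurface may not be reported by every browser.

```javascript
// Hypothetical helper: returns true when the shared surface is a browser tab.
function isTabCapture(track) {
  // Guard against missing tracks or objects without getSettings().
  if (!track || typeof track.getSettings !== 'function') {
    return false;
  }
  // displaySurface is 'browser' for a tab, 'window' or 'monitor' otherwise.
  return track.getSettings().displaySurface === 'browser';
}
```

In the flow above, isTabCapture(screenVideoTrack) could stand in for the innermost condition.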

(The section below specifically follows how Dyte meeting chats can be rendered in a separate window, but you can render any chat UI in a similar manner)

We can now render <dyte-chat /> from our Web Components UI Kit in the pipWindow:

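import { defineCustomElements } from 'https://cdn.jsdelivr.net/npm/@dytesdk/ui-kit/loader/index.es2017.js';
defineCustomElements();

// Create the chat element in the main document, where the custom element is defined,
// then append it to the PiP window's document
const chatElem = document.createElement('dyte-chat');
pipWindow.document.body.appendChild(chatElem);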

Then you can pass the meeting context from the main window to pipWindow:

pipWindow.document.querySelector('dyte-chat').meeting = meeting;
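Putting the pieces together, the following sketch opens the window, moves the chat into it, and closes it again when screen sharing ends. The popOutChat function and its parameter object are hypothetical names; chatElem, meeting, and screenVideoTrack are assumed to come from your own application code, and the initialAspectRatio option is the one used in the origin trial snippet above, so other versions of the API may expect different options.

```javascript
// Hypothetical helper tying the steps together.
async function popOutChat({
  chatElem,          // an already-created chat element, e.g. <dyte-chat>
  meeting,           // meeting object from the main window
  screenVideoTrack,  // the MediaStreamTrack being shared
  pipApi = globalThis.documentPictureInPicture,
}) {
  const pipWindow = await pipApi.requestWindow({ initialAspectRatio: 1 / 7 });

  // Hand over the meeting context, then move the chat into the new window.
  chatElem.meeting = meeting;
  pipWindow.document.body.appendChild(chatElem);

  // "ended" fires when the capture is stopped from the browser UI.
  screenVideoTrack.addEventListener('ended', () => pipWindow.close(), {
    once: true,
  });

  return pipWindow;
}
```

If your application stops the share itself by calling stop() on the track, no ended event is fired, so the window should also be closed from that code path.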

Live demo

You can go to demo.dyte.io and start a meeting (no login required). As soon as you screen share a “Tab” on Chrome M111+, the chat should pop out on the right side of your screen.

Summary

This is an exciting new API that further bridges the gap between native applications and browser applications. While it is still under an origin trial and might not exist a few months later, we definitely see a few real-world use cases that would be impossible without this API!

Live video is shaping the way the world interacts and we at Dyte are doing our bit to make it more seamless, engaging, and fun. So, if you are building a product that requires the use of live video/audio, do check us out and get started here on your 10,000 free minutes which renew every month. If you have any questions, you can reach us at support@dyte.io or ask our developer community.