ndesmic

Posted on Oct 22, 2021 • Edited on Aug 13, 2022

Building an extension to record videos part 2

#webextensions #shadowdom #video

Update: If you are newly viewing this, it seems like changes in Chrome might have broken the video recording functionality. If you record a video you'll probably get a black screen with audio only. Canvas recording will still work. Very unfortunate.

Extension Refactor

After writing the extension once I decided to refactor the whole thing. This time the popup directly controls everything via message passing, and the content script responds with status messages to keep the UI up to date. The other goal was to start splitting the script into modules. The unfortunate thing is that extensions, as far as I can tell, do not support ESM (sigh).

I split out 2 functions, the function to record and the function to get the main video, into a script called video-tools.js.

//js/video-tools.js
async function record(video) {
 //...
}

function getMainVideo() {
  //...
}

window.webVcr = {
    record,
    getMainVideo
};

We can approximate imports by exporting everything onto window. Since the content script cannot interact directly with any other script there is no danger of name clashes.

This script also must be injected so we add it to the js list in the manifest:

"content_scripts": [
 {
   "matches": ["*://*/*"],
   "css": ["css/styles.css"],
   "js": [
     "js/video-tools.js",
     "js/content-script.js"
    ]
 }
],

The order matters. The first script is injected first so we need video-tools.js to be first.

The main content script is now a message handler. It checks the command passed and uses a switch statement to do different things.

{
    const { record, getMainVideo } = window.webVcr;

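    // The listener itself must not be async: an async function returns a Promise,
    // not true, so the response channel could close before sendResponse runs.
    chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
        switch(request.command){
            case "toggle-record": {
                recordMainVideo(sendResponse);
                return true; // keep the channel open for the async sendResponse
            }
        }
    });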

    let mainVideoRecorder;
    async function recordMainVideo(sendResponse){
        if (mainVideoRecorder) {
            mainVideoRecorder();
            mainVideoRecorder = null;
            sendResponse({ status: "record-stopped" });
        } else {
            mainVideoRecorder = await record(getMainVideo());
            sendResponse({ status: "record-started" });
        }
        return true;
    }
}

I've renamed some things to read a bit better. You'll also notice that the event listener callback gets a 3rd parameter called sendResponse. This is a weird way in which extensions can do one-off communication: the message comes with a function that can send a reply back. It only works once, so the first responder wins and closes that communication line. I wouldn't be surprised if some day they tear this out too and make a cross-extension BroadcastChannel. Oh, and the listener has to return true if sendResponse is called asynchronously.
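To make the "return true" rule harder to forget, the async work can be kept out of the listener itself. The following is only a sketch: createMessageListener and the "error" status are names I made up for illustration, not something the extension uses. The listener returns true synchronously and calls sendResponse once the handler's promise settles.

```javascript
// Hypothetical helper (not part of the extension): turns a map of async
// command handlers into a listener suitable for chrome.runtime.onMessage.
function createMessageListener(handlers) {
    return (request, sender, sendResponse) => {
        const known = Object.prototype.hasOwnProperty.call(handlers, request.command);
        if (!known) return false; // not our command, nothing to keep open

        Promise.resolve()
            .then(() => handlers[request.command](request, sender))
            .then(
                response => sendResponse(response),
                ex => sendResponse({ status: "error", payload: { message: String(ex) } })
            );

        return true; // returned synchronously, before any handler finishes
    };
}

// Assumed usage in the content script:
// chrome.runtime.onMessage.addListener(createMessageListener({
//     "toggle-record": async request => ({ status: "record-started" })
// }));
```

With this shape each handler just returns the response object, and the rule about the return value lives in one place.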

Lastly we have popup.js:

const captureBtn = document.getElementById("capture-btn");
let recording = false;

captureBtn.addEventListener("click", () => {
    chrome.tabs.query({ active: true, currentWindow: true }, tabs => {
        chrome.tabs.sendMessage(tabs[0].id, { command: "toggle-record" }, response => {
            switch(response.status){
                case "record-started": {
                    captureBtn.textContent = "Stop Recording";
                    break;
                }
                case "record-stopped": {
                    captureBtn.textContent = "Start Recording";
                    break;
                }
            }
        });
    });
});

Now the button label changes based on the response to the toggle command, so if the toggle fails the label won't change. The initial state still won't be correct if a previous recording is in progress, but it's better, and the messaging approach will scale to handle new features.

After that grossness we have something that's quite a bit cleaner. In fact I removed the record button from each video entirely as it didn't fit my use cases very well and was hard to deal with. Instead we'll stick to just getting the main video from the popup window, which makes everything simpler.

Icons

I also made a simple little VCR icon to make it look decent. You can do that by adding this to the manifest:

  "icons": {
    "256": "img/icon.png",
    "32": "img/icon-32.png"
  }

It actually takes different sizes if you want to make special sized versions; otherwise it'll take the largest and scale it down. Sadly SVGs are not supported either. I originally started with a 256 pixel icon that I converted from SVG to PNG.

Though at 32 pixels it was just a dark rectangle:

So I needed to build a 32 pixel version with reduced detail.

To be honest even that's not good either but I didn't want to spend all day designing.

Getting the main video

Before, I was just getting the largest video, assuming that was the main video in the page. To move away from the record button idea I decided that the current window's focus should dictate what we're recording.

async function recordMainVideo(sendResponse){
    if (mainVideoRecorder) {
        mainVideoRecorder();
        mainVideoRecorder = null;
        mainVideo.style.border = "none";
        sendResponse({ status: "record-stopped" });
    } else {
        mainVideo = getMainVideo();
        mainVideoRecorder = await record(mainVideo);
        mainVideo.style.border = "3px dashed magenta";
        sendResponse({ status: "record-started" });
    }
    return true;
}

What's new here is getMainVideo().

function getMainVideo() {
    const videos = Array.from(document.querySelectorAll("video"));

    if (videos.length === 0) return;

    let maxVideo;
    let maxIntersect = 0;
    const windowRect = { left: 0, top: 0, width: window.innerWidth, height: window.innerHeight };
    for (const video of videos) {
        const videoRect = video.getBoundingClientRect();
        const intersectArea = getIntersectionArea(videoRect, windowRect);
        if (!maxVideo || intersectArea > maxIntersect) {
            maxVideo = video;
            maxIntersect = intersectArea;
        }
    }

    return maxVideo;
}

function getIntersectionArea(rectA, rectB) {
    const overlapX = Math.max(0, Math.min(rectA.left + rectA.width, rectB.left + rectB.width) - Math.max(rectA.left, rectB.left));
    const overlapY = Math.max(0, Math.min(rectA.top + rectA.height, rectB.top + rectB.height) - Math.max(rectA.top, rectB.top));
    return overlapX * overlapY;
}

We pick the video that covers the most area in the viewport, so even smaller videos will count so long as they're what the user is mostly focused on. In recordMainVideo we also add a border so that the user knows which thing is being recorded.
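To see why a smaller video can win, here is a worked example using plain objects in place of real DOMRects. The viewport and video sizes are made-up numbers, and getIntersectionArea is repeated so the snippet runs on its own.

```javascript
function getIntersectionArea(rectA, rectB) {
    const overlapX = Math.max(0, Math.min(rectA.left + rectA.width, rectB.left + rectB.width) - Math.max(rectA.left, rectB.left));
    const overlapY = Math.max(0, Math.min(rectA.top + rectA.height, rectB.top + rectB.height) - Math.max(rectA.top, rectB.top));
    return overlapX * overlapY;
}

// Hypothetical page layout, in viewport coordinates.
const viewport = { left: 0, top: 0, width: 1280, height: 720 };
const heroVideo = { left: 0, top: -600, width: 1280, height: 720 };   // large, mostly scrolled off the top
const inlineVideo = { left: 320, top: 200, width: 640, height: 360 }; // small, fully visible

const heroArea = getIntersectionArea(heroVideo, viewport);     // 1280 * 120 = 153600
const inlineArea = getIntersectionArea(inlineVideo, viewport); // 640 * 360 = 230400

console.log(heroArea < inlineArea ? "inline video wins" : "hero video wins");
```

The hero video is four times bigger on the page, but only a 120 pixel strip of it is still on screen, so the inline video is the one that gets recorded.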

Expanding to canvas

In the first post about recording media we were recording a canvas. We can still do that too but it might not be what the user wants in all cases so we can make it a toggle.

<!-- popup.html -->
<body>
    <form>
        <fieldset>
            <label>
                <input id="media-video" type="checkbox" checked />
                <span>video</span>
            </label>
            <label>
                <input id="media-canvas" type="checkbox" />
                <span>canvas</span>
            </label>
        </fieldset>
    </form>
    <button id="capture-btn">Record Screen</button>
    <script src="/js/popup.js" type="module"></script>
</body>

We'll add some checkboxes so the user can select their type (this could also expand to audio and maybe even gifs or something in the future).

We'll add a small style tweak so it doesn't look completely gross:

fieldset {
    border: none;
    margin: 0;
}

Then in popup.js we can check which ones are checked and send them as a payload property on the message. If nothing is checked we can exit. I've set up a <div id="message"> at the top of the popup to handle error messages. The media types themselves are the names of element tags to query.

//popup.js
const captureBtn = document.getElementById("capture-btn");
const message = document.getElementById("message");
const mediaVideo = document.getElementById("media-video");
const mediaCanvas = document.getElementById("media-canvas");

let recording = false;

captureBtn.addEventListener("click", () => {
    message.textContent = "";

    const mediaTypes = [];
    if(mediaVideo.checked){
        mediaTypes.push("video");
    }
    if (mediaCanvas.checked) {
        mediaTypes.push("canvas");
    }

    if(mediaTypes.length === 0) {
        message.textContent = "Must pick at least one media type.";
        return;
    }

    chrome.tabs.query({ active: true, currentWindow: true }, tabs => {
        chrome.tabs.sendMessage(tabs[0].id, { command: "toggle-record", payload: { mediaTypes } }, response => {
            switch(response.status){
                case "record-started": {
                    captureBtn.textContent = "Stop Recording";
                    break;
                }
                case "record-stopped": {
                    captureBtn.textContent = "Start Recording";
                    break;
                }
                case "no-element-found": {
                    message.textContent = "No matching element found.  This could be because the element might be locked in a closed shadow DOM and you might need to use tab recording instead.";
                }
            }
        });
    });
});

The next part I don't think is worth showing, but basically we take the mediaTypes and pass them through to getMainMediaElement, which is a renamed getMainVideo, and instead of querySelectorAll("video") we call querySelectorAll(mediaTypes).
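Passing the array straight to querySelectorAll works because the argument is converted to a string, and an array stringifies to its items joined by commas, which happens to be a valid selector list. A small sketch of that conversion (toSelector is a hypothetical helper for anyone who prefers to make it explicit):

```javascript
const mediaTypes = ["video", "canvas"];

// What querySelectorAll effectively receives when handed the array.
const implicitSelector = String(mediaTypes); // "video,canvas"

// Hypothetical helper that makes the conversion explicit.
function toSelector(types) {
    return types.join(",");
}

console.log(implicitSelector === toSelector(mediaTypes)); // true
```

This is also why the empty-selection check in the popup matters: an empty array would turn into an empty string, which is not a valid selector.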

Dealing with Shadow DOM

Querying elements will work unless they are in a shadow DOM. In that case we have to get a bit more advanced. What we'll do is look for elements that are not built-in elements, that is, custom elements.

There is not, to my knowledge, any property that tells you whether an element is custom, but we can take advantage of the fact that custom element names must contain a hyphen.

function isCustomElement(element){
    return element.tagName.includes("-");
}
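The hyphen check has a few false positives: a handful of SVG and MathML element names contain hyphens and are reserved, so they can never be custom elements. They are harmless here since they have no shadow root, but a stricter variant could skip them. This is just a sketch, and isLikelyCustomElement is a name I made up.

```javascript
// Names containing a hyphen that are reserved by SVG and MathML, so they can
// never be registered as custom elements.
const RESERVED_NAMES = new Set([
    "annotation-xml", "color-profile", "font-face", "font-face-src",
    "font-face-uri", "font-face-format", "font-face-name", "missing-glyph"
]);

// Hypothetical stricter variant of isCustomElement. It only reads tagName,
// so a plain object is enough to try it out.
function isLikelyCustomElement(element) {
    const name = element.tagName.toLowerCase();
    return name.includes("-") && !RESERVED_NAMES.has(name);
}

console.log(isLikelyCustomElement({ tagName: "MY-PLAYER" })); // true
console.log(isLikelyCustomElement({ tagName: "font-face" })); // false
```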

Now once we have a custom element let's look in its shadow DOM (if it's not closed). If it is closed we're screwed and we might need to let the user fall back to screen recording instead.

function getMainMediaElement(mediaTypes) {
    const media = Array.from(document.querySelectorAll(mediaTypes));

    const shadowMedia = Array.from(document.querySelectorAll("*"))
        .filter(e => isCustomElement(e))
        .flatMap(e => e.shadowRoot ? Array.from(e.shadowRoot.querySelectorAll(mediaTypes)) : []);

    const allMedia = [...media, ...shadowMedia];

    if (allMedia.length === 0) return;

    let maxMedia;
    let maxIntersect = 0;
    const windowRect = { left: 0, top: 0, width: window.innerWidth, height: window.innerHeight };
    for (const media of allMedia) {
        const videoRect = media.getBoundingClientRect();
        const intersectArea = getIntersectionArea(videoRect, windowRect);
        if (!maxMedia || intersectArea > maxIntersect) {
            maxMedia = media;
            maxIntersect = intersectArea;
        }
    }

    return maxMedia;
}

We do a secondary scan of every element (very heavy, it might need to be a setting) and filter down to only custom elements. Then we try to access each shadow root and run querySelectorAll on it. If there is no shadow root (which is also the case if it's a closed shadow root) we emit no media elements; otherwise we flat map all the elements matching the media types. As long as the shadow root is open we'll find the element and can capture it.
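One limitation of the scan above is that it only looks one level deep: a video inside a custom element that itself lives inside another custom element's shadow root won't be found. A recursive variation could handle that. This is a sketch under that assumption, not what the extension does, and collectMedia is a hypothetical name.

```javascript
// Hypothetical variation on the scan in getMainMediaElement: also descends
// into open shadow roots nested inside other shadow roots. Closed roots
// report shadowRoot === null and are skipped, same as before.
function collectMedia(root, selector) {
    const found = Array.from(root.querySelectorAll(selector));
    for (const element of root.querySelectorAll("*")) {
        if (element.tagName.includes("-") && element.shadowRoot) {
            found.push(...collectMedia(element.shadowRoot, selector));
        }
    }
    return found;
}

// Assumed usage in the content script:
// const allMedia = collectMedia(document, String(mediaTypes));
```

It is even heavier than the single-level scan since every shadow root gets its own full element walk, so the "might need to be a setting" caveat applies doubly.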

You might have noticed an extra status in popup.js for no element found. We can wire that up now too:

async function recordMainMedia(request, sendResponse){
    if (mainVideoRecorder) {
        mainVideoRecorder();
        mainVideoRecorder = null;
        mainMediaElement.style.border = "none";
        sendResponse({ status: "record-stopped" });
    } else {
        mainMediaElement = getMainMediaElement(request.payload.mediaTypes);
        if(!mainMediaElement){
            sendResponse({ status: "no-element-found" });
        } else {
            mainVideoRecorder = await record(mainMediaElement);
            mainMediaElement.style.border = "3px dashed magenta";
            sendResponse({ status: "record-started" });
        }
    }
    return true;
}

Full-screen recording

We've already done this, but I want to make it an explicit user option. This also improves usability, as the user can better understand why screen recording is being used rather than element recording when we encounter EME. Instead of handling the failure inside record, we'll let getCaptureStream errors bubble up to the UI by simply deleting all the try and catch blocks in record. Then we'll make recordScreen another function:

async function recordScreen(){
    const stream = await navigator.mediaDevices.getDisplayMedia({
        video: {
            cursor: "never"
        },
        audio: true
    });
    const mediaRecorder = new MediaRecorder(stream, {
        mimeType: 'video/webm;codecs=vp9',
        ignoreMutedMedia: true
    });
    const recordedChunks = [];
    mediaRecorder.ondataavailable = e => {
        if (e.data.size > 0) {
            recordedChunks.push(e.data);
        }
    };
    // Build the download once the recorder has flushed its final chunk.
    mediaRecorder.onstop = () => {
        const blob = new Blob(recordedChunks, {
            type: "video/webm"
        });
        const url = URL.createObjectURL(blob);
        const a = document.createElement("a");
        a.href = url;
        a.download = "recording.webm";
        a.click();
        URL.revokeObjectURL(url);
        // Release the capture so the browser's sharing indicator goes away.
        stream.getTracks().forEach(track => track.stop());
    };
    mediaRecorder.start();

    // Calling the returned function stops the recording.
    return () => mediaRecorder.stop();
}
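One thing worth hedging against in recordScreen is the hard-coded mimeType: the MediaRecorder constructor throws if the browser can't record that type. MediaRecorder.isTypeSupported can be checked first. The sketch below is an assumption about how that could be wired in; pickMimeType and the preference list are made up for illustration.

```javascript
// Hypothetical helper: returns the first container/codec string the recorder
// supports, or undefined so the browser can choose its own default.
function pickMimeType(candidates, isSupported = type => MediaRecorder.isTypeSupported(type)) {
    return candidates.find(type => isSupported(type));
}

// Assumed preference order; adjust to taste.
const PREFERRED_TYPES = [
    "video/webm;codecs=vp9",
    "video/webm;codecs=vp8",
    "video/webm"
];

// In recordScreen this could become:
// const mimeType = pickMimeType(PREFERRED_TYPES);
// const mediaRecorder = new MediaRecorder(stream, mimeType ? { mimeType } : {});
```

The second parameter only exists so the function can be tried outside a browser; in the extension the default is all that's needed.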

We'll modify recordMainMedia to handle the error and propagate it to the popup.

async function recordMainMedia(request, sendResponse){
    if (mainVideoRecorder) {
        mainVideoRecorder();
        mainVideoRecorder = null;
        mainMediaElement.style.border = "none";
        sendResponse({ status: "record-stopped" });
    } else {
        mainMediaElement = getMainMediaElement(request.payload.mediaTypes);
        if(!mainMediaElement){
            sendResponse({ status: "no-element-found" });
        } else {
            try {
                mainVideoRecorder = await record(mainMediaElement);
                mainMediaElement.style.border = "3px dashed magenta";
                sendResponse({ status: "record-started" });
            } catch(ex){
                // Error objects don't survive message serialization, so send the text.
                sendResponse({ status: "record-failed", payload: { message: ex.message } });
            }
        }
    }
    return true;
}
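One thing to watch when sending an exception through sendResponse: assuming the message is serialized as JSON (Chrome's docs say messages should be JSON-ifiable), an Error object arrives as an empty object because its message and stack are not enumerable. A sketch of a workaround, where serializeError is a hypothetical helper:

```javascript
// Hypothetical helper: copies the useful parts of an Error into a plain
// object that survives JSON serialization.
function serializeError(ex) {
    return {
        name: ex instanceof Error ? ex.name : "Error",
        message: ex instanceof Error ? ex.message : String(ex)
    };
}

const failure = new Error("capture blocked");
const lost = JSON.stringify({ ex: failure });                 // message and stack are dropped
const kept = JSON.stringify({ ex: serializeError(failure) }); // name and message survive

console.log(lost);
console.log(kept);
```

The popup doesn't read the payload yet, but if it ever shows the underlying reason this keeps the text from silently disappearing.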

At the same time we can create another command to start a screen recording (some functions were renamed to be more precise):

//content-script.js
//some functions were renamed
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
    switch(request.command){
        case "toggle-record": {
            toggleRecordMainMedia(request, sendResponse);
            return true; // keep the channel open for the async sendResponse
        }
        case "toggle-screen-record": {
            toggleRecordScreen(request, sendResponse);
            return true;
        }
    }
});

//...

let screenRecorder;
async function toggleRecordScreen(request, sendResponse){
    if (screenRecorder) {
        screenRecorder();
        screenRecorder = null;
        sendResponse({ status: "record-stopped" });
    } else {
        screenRecorder = await recordScreen();
        sendResponse({ status: "record-started" });
    }
    return true;
}

And finally the case in popup.js:

case "record-failed": {
    message.textContent = "Could not record media. It may be using encrypted media extensions and you might need to use screen recording instead.";
    break;
}

So now the user gets a message telling them to start a screen recording instead.

We can add another button (I won't bother showing it because you've seen this) for screen recording and wire it up:

screenCaptureBtn.addEventListener("click", () => {
    chrome.tabs.query({ active: true, currentWindow: true }, tabs => {
        chrome.tabs.sendMessage(tabs[0].id, { command: "toggle-screen-record" }, response => {
            switch (response.status) {
                case "record-started": {
                    screenCaptureBtn.textContent = "Stop Recording Screen";
                    break;
                }
                case "record-stopped": {
                    screenCaptureBtn.textContent = "Start Recording Screen";
                    break;
                }
            }
        });
    });
});

Conclusion

This was a fairly scattershot post on ways to improve the extension's capabilities. There are still some big rocks left to move though, such as how we deal with frames. Since it's hard to tell what the "main" frame is we might need UI or something to let the user pick, and since content scripts are independent in frames we might need a centralized worker to figure out what a "main" media element is between iframes. The upshot is we might be able to fix the synchronization issues with the popup as well using a similar technique.

This post is probably hard to follow but what I really want to get out is how to think about features and refactoring. Code will move around but hopefully it makes sense why it moved.

You might be able to see it better by diffing v2 with v1. https://github.com/ndesmic/web-vcr/tree/v2