Pushkar Anand

Posted on Mar 13, 2021 • Originally published at abstracted.in

Creating an End-to-End Encrypted alternative to Google Photos

#javascript #webdev #streaming #caching

It all started when my friend approached me for a web app. You see, he was spooked by the fact that all our photos are automatically uploaded to some cloud storage application without any encryption, leaving all our private moments available for corporations to train their ML models. So he set out to create an end-to-end encrypted photo storage application. He already had native iOS and Android apps in place (which he had created using Flutter) when he approached me, so all the APIs and backend systems were ready, and there was even a working native application to play around with. I liked the idea and agreed to help him out. Despite my 6+ years of experience in development, I grossly underestimated the size of the project. 😝

As soon as I started working on it, I realized that this isn't a trivial web app where you can call a few APIs, show nice pictures to the user and call it a day. Since it is an end-to-end encrypted application, I had to decrypt everything after downloading it and only then present it to the user. I could not rely on blob/object storage to resize the images. All of this had to be done on the client side without compromising on speed. To make things worse, videos were also part of the MVP! 😓

Challenges 😪

Most of the problems had already been solved for the app, and only a re-implementation for the web was required. However, since web apps don't have access to the filesystem (without using an experimental API), and encrypting/decrypting photos and videos is a taxing process, I had to use every tool I had to ensure performance.

Very early on I offloaded all encryption and decryption to a web worker. Thus, the main thread was free of the most taxing part of the application. It also reduced the time it took to encrypt/decrypt a file. I also used Comlink to communicate with the web worker. Initially, we were using AES encryption but later switched to libsodium. The code for this was quite simple.

First, we create a worker as follows.

// Worker File
import * as Comlink from 'comlink';

export class Crypto {
    async encrypt(data, key) {
        // Encryption Logic
    }

    async decrypt(data, nonce, key) {
        // Decryption Logic
    }
}

Comlink.expose(Crypto);

Then we simply load and instantiate the worker using Comlink.

// Application File where the worker is used.
import * as Comlink from 'comlink';

const CryptoWorker = Comlink.wrap(
    new Worker('worker.js', { type: 'module' })
);
const worker = await new CryptoWorker();

await worker.decrypt(data, nonce, key);

Then, we decided to cache any thumbnail that we load in the UI in CacheStorage. This way we don't have to re-download and decrypt the image, which improves our second load time. CacheStorage was ideal for this as it is accessible everywhere (main thread, web workers and service workers) and also responds with a Response object. We implemented the entire thing using just the following few lines.

// Open cache named `thumbs`
const cache = await caches.open('thumbs');

// Check if we already have thumbnail for the file in cache
const cacheResp: Response = await cache.match(file.id.toString());

if (cacheResp) {
    // Return cached response
    return URL.createObjectURL(await cacheResp.blob());
}

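// We don't have it in cache. Let's fetch and decrypt it
const resp = await fetch(`https://sever/path/to/file?id=${file.id}`);
// Read the encrypted bytes; a Response object cannot be passed to the worker
const fileData = new Uint8Array(await resp.arrayBuffer());
const decrypted = await worker.decrypt(fileData, file.nonce, file.key);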

// Put it in cache for future use
await cache.put(
    file.id.toString(),
    new Response(new Blob([decrypted]))
);

// Return the object URL
return URL.createObjectURL(new Blob([decrypted]));
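One caveat with handing out object URLs like this: an object URL keeps its Blob reachable until it is revoked or the page is unloaded. A minimal sketch of one way to keep that in check is below. The `ObjectUrlRegistry` class and its method names are hypothetical and not part of the app; the idea is simply to create one URL per file id and revoke it once the thumbnail is no longer displayed.

```javascript
// Hypothetical helper (not from the app): hands out one object URL per
// file id and revokes it when the thumbnail is no longer needed.
class ObjectUrlRegistry {
    constructor() {
        this.urls = new Map(); // file id -> object URL
    }

    // Returns the existing URL for this id, or creates one from the Blob
    acquire(id, blob) {
        if (!this.urls.has(id)) {
            this.urls.set(id, URL.createObjectURL(blob));
        }
        return this.urls.get(id);
    }

    // Revokes the URL so the Blob behind it is no longer kept alive by it
    release(id) {
        const url = this.urls.get(id);
        if (url !== undefined) {
            URL.revokeObjectURL(url);
            this.urls.delete(id);
        }
    }
}
```

In a gallery, `acquire` could be called when a thumbnail scrolls into view and `release` when it is unmounted.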

Also, a good UX for the gallery was a must-have. This is the part users would interact with most often. We wanted it to support all the gestures available in a native application, like swipe, pinch-zoom, and pan. It also had to scale to desktop as well as mobile. For this, we looked at many open-source libraries but found that we didn't like the UX of most of them, and all of them came with some sacrifices. Finally, we settled on PhotoSwipe. It fit most of our use cases. The only part missing was the infinite loading of pages.

Though they have it listed as a feature on their website, we found that already loaded images are not removed from the DOM; instead, new pages are just added. This wouldn't be ideal for us, as a user can have thousands of images and we want them to be able to scroll through them quickly. Thus we used react-window and CSS grid to create our gallery layout and let PhotoSwipe handle the interaction once the user clicks on an image. This kept our app performant.
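To show why windowing keeps the DOM small, here is a rough sketch of the arithmetic behind it. This is not react-window's actual implementation; the `visibleRange` function and its parameters are hypothetical and assume a grid with a fixed number of columns and a fixed row height.

```javascript
// Hypothetical helper illustrating windowing; not react-window's real code.
// Given the scroll position, it returns the index range of the grid items
// that actually need to be rendered.
function visibleRange({ itemCount, columns, rowHeight, scrollTop, viewportHeight, overscanRows = 1 }) {
    const rowCount = Math.ceil(itemCount / columns);
    // First and last rows intersecting the viewport, padded by a few
    // overscan rows so fast scrolling does not reveal empty space
    const firstRow = Math.max(0, Math.floor(scrollTop / rowHeight) - overscanRows);
    const lastRow = Math.min(
        rowCount - 1,
        Math.floor((scrollTop + viewportHeight - 1) / rowHeight) + overscanRows
    );
    return {
        firstIndex: firstRow * columns,
        lastIndex: Math.min(itemCount - 1, (lastRow + 1) * columns - 1),
    };
}
```

For 10,000 photos in 4 columns with 200px rows, an 800px viewport scrolled to 1000px yields indices 16 to 39, so only 24 thumbnails are mounted at that moment instead of all 10,000.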

We used Next.js for the sweet out-of-the-box SSR.

But we weren't done yet 😶

Everything was going great, and we had even rolled out a beta version for some users to test, when we started seeing random tab crashes in the browser. There was definitely a memory leak somewhere in the application.

After analyzing the behavior, I noticed that it happened when my friend tried to open a few video files. Then it hit me: all our encryption and decryption was happening in memory! That was good enough for the small files I had tested with. But for a large file, the tab would crash as it ran into the memory limit.

We quickly checked the file sizes and found that they ranged anywhere from 400MB to 1GB. This was not going to work on the web. In the app we had access to the filesystem, so we could process a file chunk by chunk and append to a file on disk. But on the web we don't have access to the filesystem, so a different approach was required. Putting everything behind an experimental API that works only in Chrome is not the experience we wanted to deliver.

And so we kept looking. By luck, I stumbled on this awesome podcast.

Streams was the answer! Instead of putting everything in memory and then encrypting/decrypting the file, we could do it with Readable Streams. However, axios (the library we were using for making API calls) didn't support this, so we had to fall back to the Fetch API. Not a bad compromise, I would say.

Finally, I refactored my code to something like below:

// Get the file
const resp = await fetch(`https://sever/path/to/file?id=${file.id}`);

// Get reader to be used in readable stream.
const reader = resp.body.getReader();

// Create a readable stream.
const stream = new ReadableStream({
    async start(controller) {
        try {
            // Handle each data chunk as it arrives
            while (true) {
                const { done, value } = await reader.read();

                // All done, rest!
                if (done) break;

                // Decrypt chunk
                const decryptedData = await worker.decryptChunk(value);

                // Add decrypted data to stream
                controller.enqueue(decryptedData);
            }
            controller.close();
        } catch (e) {
            // Surface download or decryption failures to the consumer
            controller.error(e);
        }
    }
});
return URL.createObjectURL(await new Response(stream).blob());

I still had doubts about whether this would work. However, once the video loaded without the tab crashing, I was in seventh heaven.
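One detail worth keeping in mind with chunk-wise decryption: the chunks delivered by `reader.read()` follow the network, so their sizes are arbitrary and will not necessarily line up with the chunks the file was encrypted in. Assuming the file was encrypted in fixed-size chunks (an assumption for this sketch, not something stated above), a small `TransformStream` can regroup the bytes first. The `fixedSizeChunker` name is hypothetical.

```javascript
// Hypothetical helper: regroups arbitrarily sized network chunks into
// fixed-size chunks, so each one can be handed to a chunk-wise decryptor.
function fixedSizeChunker(chunkSize) {
    let buffer = new Uint8Array(0);
    return new TransformStream({
        transform(chunk, controller) {
            // Append the incoming bytes to whatever was left over
            const merged = new Uint8Array(buffer.length + chunk.length);
            merged.set(buffer, 0);
            merged.set(chunk, buffer.length);
            buffer = merged;

            // Emit as many full chunks as are available
            while (buffer.length >= chunkSize) {
                controller.enqueue(buffer.slice(0, chunkSize));
                buffer = buffer.slice(chunkSize);
            }
        },
        flush(controller) {
            // The final chunk may be shorter than chunkSize
            if (buffer.length > 0) {
                controller.enqueue(buffer);
            }
        },
    });
}
```

It could be placed in front of the decryption step with something like `resp.body.pipeThrough(fixedSizeChunker(size))`, where `size` is whatever chunk size the encrypting side used.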

And miles to go before I sleep 🚶‍♂️

I am happy with the progress that we have made with the project. I was aware that these technologies existed and how they could be used, but implementing them was a completely different ball game. Multiple times I had to rewrite or look for better ways to implement the same thing, as a bare-metal implementation was tough to maintain. I learned about Web Workers, Comlink, CacheStorage, and ReadableStream, worked with multiple libraries, and filled in wherever they fell short. All this without sacrificing UX, usability, or performance.

Still, there are a few things that I would like to solve. Right now the entire video needs to be downloaded before it can be played. Ideally, I would like it to be able to stream. For this I experimented with MediaSource. MediaSource requires codecs to be specified explicitly, which I don't have. Hence, I am stuck. Please let me know if you have any ideas on how I could work around this. I would love to hear from you. 😊

Sharing is a feature that I feel is also essential for this application. Only the API integration is left for this. I would also like to add a service worker with Workbox for offline caching and convert the app into a PWA, which can then be installed on mobile and desktop.

The source code for all of this is available here. If you're curious about the product, check out ente.io.😊

Top comments (6)

Rahul • Mar 14 '21

I'm way too surprised by the pricing that's so cheap.

Pushkar Anand • Mar 14 '21

It is also backed up across locations, including an underground fallout shelter.

Vincent Milum Jr • Mar 14 '21

One thing I'm curious about with all of this. If it is end-to-end encrypted, how are the encryption keys stored and transmitted, say, if moving to a new device?

Pushkar Anand • Mar 15 '21 • Edited

Everything is encrypted using a passphrase, which the user sets during sign-up. If the user logs in from another device, they have to enter the same passphrase again. Obviously, this passphrase cannot be changed.

We are working on a doc where we will detail the encryption logic. I will update this blog post once it's ready.

Vincent Milum Jr • Mar 15 '21

Now I'm curious to know why the password can never be changed? There are other persistent encryption systems that allow this.

Vishnu Mohandas • Mar 16 '21 • Edited

Hey Vincent, founder of ente.io here.

There was a slight confusion. The password can indeed be changed. Just that we have not shipped the feature yet.

Circling back to your original question about how the keys are transmitted, we generate a masterKey when you sign up. This masterKey is encrypted with a keyEncryptionKey, derived from your password. This encryptedMasterKey is then stored on the server. When you sign in on a new device, this encryptedMasterKey is retrieved from the server. As the last step, once you re-enter your password the new device will derive the keyEncryptionKey, and compute the original masterKey.

Please let me know if you have any follow up questions!

DEV Community
