If you ever try to stand up a Puppeteer service, you will almost immediately find that it is difficult to secure when it runs inside a Docker container.
I love my serverless, so I was not prepared to take no for an answer. With a lot of sweat, I think I was able to stand up a Puppeteer service with full customer isolation and protection against server-side request forgery, all from within a multi-tenant Docker container.
Customer Isolation
Customers should not be able to view each other's data.
--no-sandbox
Chrome itself is natively very good at sandboxing tabs from one another. Ideally, we would simply exploit that built-in security model, e.g. by putting each customer in their own tab(s). Not so fast, though! Unfortunately, Chrome won't boot with its sandbox enabled inside a typical container, because the way the sandbox is implemented does not work in containers. As a result, nearly every Dockerfile for Puppeteer on the internet launches Chrome with the --no-sandbox flag.
--cap-add=SYS_ADMIN
The few Dockerfiles I could find that avoid --no-sandbox instead grant the container the SYS_ADMIN capability. This is one way to keep the sandbox, but unfortunately most managed Docker environments don't expose this control, so I needed a different approach that works on serverless.
Linux process isolation
Ordinary Linux processes cannot directly access each other's memory, so the OS-level approach to customer isolation is to run a separate browser process for each customer.
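To make "a separate browser for each customer" concrete, here is a minimal sketch of a per-customer browser registry. It assumes the launcher is injected (in a real service it might wrap `puppeteer.launch()`); `PerCustomerBrowsers` and `Launcher` are hypothetical names, not part of Puppeteer.

```typescript
// Hypothetical launcher type: starts one browser process for one customer.
type Launcher<B> = (customerId: string) => Promise<B>;

// Keeps at most one browser per customer id; browsers are never shared across ids.
class PerCustomerBrowsers<B> {
  private readonly launch: Launcher<B>;
  private readonly browsers = new Map<string, Promise<B>>();

  constructor(launch: Launcher<B>) {
    this.launch = launch;
  }

  get(customerId: string): Promise<B> {
    const existing = this.browsers.get(customerId);
    if (existing !== undefined) return existing;
    // Cache the pending launch immediately so concurrent requests for the
    // same customer reuse one process instead of racing to start two.
    const pending = this.launch(customerId);
    this.browsers.set(customerId, pending);
    // If the launch fails, forget it so the next request can retry.
    pending.catch(() => {
      if (this.browsers.get(customerId) === pending) this.browsers.delete(customerId);
    });
    return pending;
  }

  // Drop a customer's entry, e.g. after their browser has been closed.
  release(customerId: string): void {
    this.browsers.delete(customerId);
  }

  get size(): number {
    return this.browsers.size;
  }
}
```

Caching the launch promise rather than the resolved browser is the design choice that prevents duplicate processes when a customer opens several connections at once.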
Resource isolation
You still need to be careful, though, as even separate Chrome processes can still access shared resources such as the filesystem. In particular, user sessions, cookies, and cached website data need to be stored in a different directory for each customer. Relevant Chrome flags include:
- --disk-cache-dir
- --media-cache-dir
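A minimal sketch of how per-customer directories might be wired into Chrome's flags, assuming a hypothetical `/tmp/customers/<id>` layout. It also sets `--user-data-dir`, the Chrome flag for the profile directory where cookies and sessions are kept; `customerChromeArgs` and `BASE_DIR` are illustrative names.

```typescript
// Hypothetical root under which each customer gets its own directories.
const BASE_DIR = "/tmp/customers";

function customerChromeArgs(customerId: string): string[] {
  // The id is interpolated into a path, so reject anything like "../other".
  if (!/^[A-Za-z0-9_-]+$/.test(customerId)) {
    throw new Error(`invalid customer id: ${customerId}`);
  }
  const root = `${BASE_DIR}/${customerId}`;
  return [
    "--no-sandbox", // Chrome's sandbox is unavailable in the container
    `--user-data-dir=${root}/profile`, // profile: cookies, sessions, storage
    `--disk-cache-dir=${root}/disk-cache`, // HTTP disk cache
    `--media-cache-dir=${root}/media-cache`, // media cache
  ];
}
```

The resulting array would be passed as Puppeteer's `args` launch option, e.g. `puppeteer.launch({ args: customerChromeArgs(id) })`.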
Protection against Server-Side Request Forgery
A Puppeteer service essentially allows end users to run code within our infrastructure. The big danger is that the Puppeteer instance becomes a foothold for network intrusion. This is not merely academic, either: many exploits observed in the wild have used Puppeteer or similar headless browsers to launch server-side request forgery (SSRF) attacks. Check out:
- https://techkranti.com/ssrf-aws-metadata-leakage/
- https://twitter.com/IAmMandatory/status/1196939247057457152
- https://www.hackerone.com/blog/spotlight-server-side
In particular, an easy attack is to have Puppeteer load a webpage that probes for the cloud provider's metadata server, which can then be queried to obtain credentials.
So, we must prevent Chrome from accessing certain URLs and local IP addresses.
You can find a list of IP addresses to block on owasp.org.
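The core of such a blocklist is a CIDR match on the destination address. This sketch covers IPv4 only and uses a small, illustrative subset of ranges (loopback, private, carrier-grade NAT, and the link-local block containing the 169.254.169.254 metadata endpoint); it is not the OWASP list itself. It expects an already-resolved dotted-quad address and fails closed on anything it cannot parse. `BLOCKED_CIDRS`, `parseIPv4`, and `isBlockedIPv4` are hypothetical names.

```typescript
// Illustrative subset of ranges to deny, as [network, prefix length].
const BLOCKED_CIDRS: ReadonlyArray<[string, number]> = [
  ["0.0.0.0", 8],
  ["10.0.0.0", 8],
  ["100.64.0.0", 10],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16], // link-local, includes 169.254.169.254
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
];

// Parse a dotted-quad IPv4 address into an unsigned integer, or null if malformed.
function parseIPv4(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // plain arithmetic avoids signed 32-bit shifts
  }
  return value;
}

function isBlockedIPv4(ip: string): boolean {
  const addr = parseIPv4(ip);
  if (addr === null) return true; // fail closed on unparseable input
  return BLOCKED_CIDRS.some(([base, bits]) => {
    const baseAddr = parseIPv4(base)!;
    const blockSize = 2 ** (32 - bits);
    // Two addresses are in the same block iff they share the same block index.
    return Math.floor(addr / blockSize) === Math.floor(baseAddr / blockSize);
  });
}
```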
--proxy-server
Chrome can be configured to use an outbound HTTP proxy server, which we can use to intercept and filter its traffic. For our service, we used TinyProxy because its resource overhead is very low (2 MB).
The TinyProxy configuration then protects against access to sensitive IP addresses and domains.
Our translation of that list into TinyProxy configuration can be found on GitHub.
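As a rough sketch of what such a translation could look like, the helpers below render a TinyProxy config, a filter file of anchored host-name regexes, and the Chrome flags that route traffic through the proxy. The port, file path, and blocked hosts are assumptions, not the published config. The `--proxy-bypass-list=<-loopback>` flag is included because, as I understand Chromium's proxy documentation, Chrome otherwise sends loopback and link-local requests (including 169.254.169.254) directly instead of via the proxy.

```typescript
// Hypothetical settings for a proxy running inside the same container.
const PROXY_PORT = 8888;
const FILTER_PATH = "/etc/tinyproxy/filter";

// Illustrative host prefixes/names to deny; not a complete list.
const BLOCKED_HOSTS = ["127.", "10.", "169.254.", "192.168.", "metadata.google.internal"];

// Turn a host prefix into a start-anchored regex, escaping the dots.
function hostPattern(host: string): string {
  return "^" + host.split(".").join("\\.");
}

function tinyproxyConf(): string {
  return [
    `Port ${PROXY_PORT}`,
    "Listen 127.0.0.1", // only reachable from inside the container
    "Allow 127.0.0.1",
    `Filter "${FILTER_PATH}"`,
    "FilterDefaultDeny No", // No: matching entries are denied, the rest allowed
  ].join("\n") + "\n";
}

function tinyproxyFilter(hosts: readonly string[]): string {
  return hosts.map(hostPattern).join("\n") + "\n";
}

// Chrome flags that send all traffic, including loopback/link-local, via the proxy.
function proxyChromeArgs(): string[] {
  return [
    `--proxy-server=http://127.0.0.1:${PROXY_PORT}`,
    "--proxy-bypass-list=<-loopback>",
  ];
}
```

As far as I know, TinyProxy's domain filter matches the requested host name rather than the resolved address, so on its own it would not catch a public host name that resolves to an internal IP.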
Exposing the Chrome DevTools protocol port
The most exciting part for me is letting users script Chrome remotely from within their own browser environment. This is enabled by exposing the DevTools Protocol's debugging WebSocket.
To let us meter access, we expose a WebSocket endpoint that requires an access_token in the path. We can then verify the access_token, boot Puppeteer, and proxy the public WebSocket endpoint to the internal WebSocket endpoint on demand.
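The routing step can be sketched as a pure function: extract the token from the public path, verify it, and map the remainder onto the internal DevTools endpoint. The `/<access_token>/devtools/...` path layout, `verifyToken`, and the internal origin are assumptions. Booting Chrome (whose endpoint Puppeteer reports via `browser.wsEndpoint()`) and piping frames between the two sockets are left out.

```typescript
// Hypothetical token check, e.g. a lookup against a billing/metering store.
type TokenVerifier = (token: string) => boolean;

// Returns the internal WebSocket URL to proxy to, or null to reject the upgrade.
function routeUpgrade(
  publicPath: string,
  verifyToken: TokenVerifier,
  internalOrigin: string, // e.g. "ws://127.0.0.1:9222", the local Chrome endpoint
): string | null {
  const match = /^\/([A-Za-z0-9_-]+)(\/devtools\/.*)$/.exec(publicPath);
  if (match === null) return null; // malformed path
  const [, token, devtoolsPath] = match;
  if (!verifyToken(token)) return null; // unknown or expired token
  // The token is stripped here, so Chrome never sees it.
  return internalOrigin + devtoolsPath;
}
```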
Try it in your Browser without installation
With that in place, we can now offer Puppeteer access from right within a browser. Check it out: we are hosting it in an @observablehq notebook.
The full source code is available on GitHub, and it is designed to run on Google Cloud Run.