When I was at Amazon, I worked in the Seller Central org building tools to help companies sell their products. The app I primarily worked on was a complex multi-part form broken into numerous tabs with dozens of inputs dynamically populated based on product type, customer characteristics and various choices made along the way. The app was built with React and Redux, and the backend was a custom Java SpringMVC-based framework.
As a company, Amazon has a strong culture of web performance, but it also values shipping code rapidly. These competing interests resulted in friction; it could be deeply frustrating to see a month's worth of work improving page performance wiped out by an unintended negative side effect from a new feature. When I started as the sole frontend engineer on my team, and one of a handful in the org, my primary focus was on frontend architecture and web performance. It was my responsibility to come up with sustainable ways to hit those goals without compromising our ability to ship code. At the time, we were regularly missing our web performance targets. Most of the team members were smart backend devs, but few had much experience with React, or with optimizing frontend performance.
I came in, as many new hires do, wanting to be the hero who stepped in and neatly saved the day. I started by looking for the easy, high-yield performance wins: are we using an optimized lodash build for webpack? Are we bundle splitting? Exactly how many `fetch` polyfills do we have in our bundle? I'd worked on performance in React apps before, and I had my mental checklist ready. The problem, though, was that the low-hanging fruit wasn't yielding enough actual benefit. We shaved off 10kb here, 100kb there. Our bundle size dropped from 1.8mb to 1.5mb, and eventually all the way down to just over 1mb, but we still couldn't hit our performance goals. We relied heavily on real user monitoring (RUM) to understand how users experienced our site, and we eventually found that due to how users interacted with our app, our cache hit rate was fairly high. The reductions to the size of our JS bundle were definitely a good thing, but they weren't giving us anywhere near the improvement in real-user performance that we wanted. There had to be something else that could speed us up.
The breakthrough came, as they sometimes do, after I'd exhausted my checklist and started exploring areas I was unfamiliar with. I was looking for new and different ways to analyze what was and wasn't working in our app, and that's when I stumbled on the Coverage tab in Chrome's DevTools. Finding it is a convoluted process: it's buried two menus deep in the DevTools three-dot menu under "More Tools", or you can reach it by opening the Command Menu (Cmd+Shift+P on macOS, Ctrl+Shift+P elsewhere) and typing `coverage`. Seeing its results for the first time was a revelation, and I got excited enough to tweet about it joyfully.
> "TIL about the 'show coverage' option in Chrome Web Inspector 🤯 Includes both CSS and JS" — Jamund Ferguson, May 2, 2020
The Coverage tab can show you unused JS and CSS on your page. Once you open the Coverage panel, by default you'll see both JS and CSS files, but you can also filter down to just CSS.
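If you want to capture the same numbers outside of DevTools, Puppeteer exposes coverage data programmatically via `page.coverage.startCSSCoverage()` / `stopCSSCoverage()`. Below is a sketch of just the arithmetic: a helper that turns coverage entries (the `{ text, ranges }` shape that API returns) into an unused-bytes percentage. The sample data is invented for illustration.

```javascript
// Sketch: compute the unused-bytes percentage from coverage entries shaped
// like Puppeteer's page.coverage.stopCSSCoverage() output:
//   [{ text: '<stylesheet source>', ranges: [{ start, end }, ...] }, ...]
function unusedPercent(entries) {
  let total = 0;
  let used = 0;
  for (const { text, ranges } of entries) {
    total += text.length; // total bytes served
    for (const { start, end } of ranges) used += end - start; // bytes exercised
  }
  return total === 0 ? 0 : ((total - used) / total) * 100;
}

// Hypothetical stylesheet where only the first 2 of 100 bytes are used.
const sample = [{ text: 'x'.repeat(100), ranges: [{ start: 0, end: 2 }] }];
console.log(unusedPercent(sample)); // 98
```

Running this kind of check in CI is one way to notice a regression like ours before it ships, rather than stumbling on it in DevTools months later.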
What I saw there was that over 98% of our main CSS file went unused. I also realized that the CSS file, on its own, was over 1mb. I'd been grinding away, paring down our JS bundle to the smallest possible size, but the CSS file was right there actually having a larger impact! (The CSS coverage below comes from a different website, but it follows a similar trend.)
While it's pretty common to discuss the downsides of large JS bundles, large CSS bundles are arguably worse! CSS is a render-blocking resource, which means the browser will wait for that CSS file to be downloaded, parsed, and constructed into a CSSOM tree before rendering the contents of the page. Whereas JS files these days are usually added to the end of the `<body>` or included with the `async` attribute, CSS files are rarely given the same deferred treatment. That's why it's imperative that you keep unused CSS out of your main CSS bundle.
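For CSS that genuinely can't be trimmed but isn't needed for first paint, a widely used workaround is the `media="print"` swap trick, which downloads the stylesheet without blocking render. Here's a sketch; `buildDeferredLinkTag` is a hypothetical helper name, not something from the original app.

```javascript
// Sketch of the common "print media" trick for non-blocking stylesheets.
// A <link> with media="print" is fetched at low priority and does not block
// rendering; the onload handler then switches it to apply on screen.
function buildDeferredLinkTag(href) {
  return `<link rel="stylesheet" href="${href}" media="print" onload="this.media='all'">`;
}

console.log(buildDeferredLinkTag('/styles/non-critical.css'));
```

A plain `<link rel="stylesheet">` fallback inside `<noscript>` is usually paired with this, since the swap relies on JS running.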
There has been talk for years about including only "above the fold" or critical-path CSS on initial page load, but despite several tools that try to automate this process, it's not foolproof. When it comes to simply avoiding unneeded CSS, I think many would agree that CSS-in-JS approaches and even CSS Modules do a better job than the ever-too-common approach of having one large Sass or LESS file containing all the styles anyone might ever need for your site.
My team's initial approach to styling was to have a single large Sass file with dozens of dependent stylesheets @imported in. That made it quite difficult to figure out exactly what parts we were or weren't using, and I spent hours scouring our CSS files looking for unused styling. Nothing looked obviously wasteful, and I certainly couldn't find a whole extra mb of unused style. Where else could the CSS be coming from? Was it from a shared header/footer that included extra styles? Maybe a JS-based CSS import somewhere? I had to find out.
```js
import 'semantic-ui/dist/styles.min.css'
import 'semantic-ui/dist/styles.css'
```
I had looked at this code and ignored it literally dozens of times. But given my new challenge to figure out where the extra CSS was coming from, it stood out. Why were we importing this library at all? Did we even need it? And why were we importing it twice (both minified and non-minified)?
The first thing I did was comment out both of them. I ran `npm run build` and saw our CSS bundle drop from 1.25mb down to 30kb! It was ridiculous. This code was killing us. ☠️
Unfortunately, as you might predict, our website looked horrible after removing the CSS. We were relying on something in those CSS bundles. Next, I commented out each of them one at a time. Strangely, we needed to keep the non-minified one in there to avoid breaking the look & feel of the site, but at least I was making progress. We shaved off around 500kb of CSS just by removing one line.
Now began the more difficult part of removing our reliance on that UI library altogether.
Like many teams, we relied on an internal UI library that our app was already importing. I figured we could probably use that internal library to provide most, if not all, of the functionality we were getting from the external library.
An early approach I took was simply copy/pasting the whole built Semantic UI library CSS into a new file and then removing things we didn't need. That got me somewhere, but became increasingly difficult as the styles got more nested and complex. Eventually I removed the CSS imports completely, purposefully breaking the look of the site. That made it easy to identify which classes we were actually using. We took screenshots of the working site and then carefully compared them with the broken version.
It turns out we were primarily using three components:
- The grid system
- The navigation tabs
- Modal dialogs
Once we figured out which pieces of the library we were using, it was easy enough to search through our code base and see which components were relying on them. A lot of them used the grid, for example, but we had a drop-in replacement for those that only required a small class name change. In other cases, we had to either add new CSS or move the HTML around a bit to make it work with our other UI library. It ended up taking a new team member about a month of work to completely detach us from that external library. We carefully reviewed her work, compared before & after screenshots, and where there were minor style differences, ran them by a few team members to make sure the changes were close enough to the original to not block the change.
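That audit step can be mechanized with a simple scan of your source files. Here's a sketch of the idea; the class list is a small, made-up subset standing in for the library's real class names.

```javascript
// Sketch: detect which library class names a piece of source text references.
// The list below is a hypothetical subset used purely for illustration.
const libraryClasses = ['ui grid', 'ui tabular menu', 'ui modal'];

function usedLibraryClasses(source) {
  // Naive substring match; good enough to triage which files to look at.
  return libraryClasses.filter((cls) => source.includes(cls));
}

const jsx = '<div className="ui grid"><div className="ui modal"></div></div>';
console.log(usedLibraryClasses(jsx)); // ['ui grid', 'ui modal']
```

In practice you'd run something like this (or plain grep) over every file in the repo and aggregate the results to get a migration checklist.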
After we shipped the changes we looked at our real user monitoring graphs and saw massive reductions in both our 50th and 90th percentile time to interactive measurements across the app. At the 90th percentile there was around half a second of reduction in TTI. After making so many changes that didn't seem to matter, it was so satisfying to finally have a solid performance win.
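For anyone reproducing this kind of measurement, the percentile math over raw RUM samples is simple. Here's a sketch using the nearest-rank method; the timing numbers are invented.

```javascript
// Sketch: nearest-rank percentile over raw RUM timing samples (in ms).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value such that p% of samples are <= it.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

// Invented TTI samples for illustration.
const ttiMs = [900, 1200, 1500, 2100, 3400, 800, 1100, 2800, 1900, 1600];
console.log(percentile(ttiMs, 50), percentile(ttiMs, 90)); // 1500 2800
```

Real RUM pipelines usually compute these over millions of samples with sketching algorithms, but the definition of p50/p90 is the same.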
Removing that one UI library bundle probably ended up having a larger effect than any other single change I witnessed in my entire time working on web performance at Amazon.
I've found it's very difficult to generalize web performance wins. How likely is it that your app is also double importing a large CSS library? You might as well check, but it's probably not happening. What I hope you take away from my experience here is the underlying factors that enabled us to find & fix this problem.
Don't just optimize to a checklist (Learn the tools!)
The easier part is process-related: you can't just optimize to a checklist. Checklists are valuable in performance work, because many apps can be improved by a straightforward, well-known list of simple improvements. You can and should leverage the work you've done in the past, and that the community around you has done, to improve performance. But when you reach the end of your checklist, you need to develop the skillset to keep digging. Just because other apps you've worked on benefitted from change A or change B doesn't mean your next app will. You have to understand your tools. You have to know the specific characteristics & architecture of your site. And you have to know your customers. Lighthouse probably told me early on in this process that I had too much CSS on the page, but without a clear understanding of how our CSS files were built and better tools for analysis, I couldn't do much with that information. While checklists of common web performance mistakes can absolutely be helpful, teaching teammates how to use the available tools to analyze the performance of their specific app is much more powerful.
Have a strong web performance mandate
The other major takeaway, though, is about culture. To build performant applications, performance itself needs to be a first-class KPI. I think many engineers enjoy optimizing things; it's really fun and difficult work. But as we all know, the results can be very inconsistent. I can't tell you how many times I promised to shave 150ms off our experience, got that improvement when testing locally, but saw nothing, or even a negative impact, when the change actually went live. In many cases that can lead engineering or product managers to be wary of such promises. My org at Amazon had amazing leadership when it came to web performance, and that mandate ensured we had the buy-in we needed to keep going until we had the impact we wanted.
I don't expect this article to provide any magic bullets for those out there trying to optimize their apps, but I do hope it encourages you to keep digging until you find your own.
P.S. I want to give a shout-out to my colleagues Scott Gifford and Michael Kirlin. Scott remains a hugely influential engineer at Amazon in the web performance space and mentored me throughout my time there. Michael not only reviewed this article, but edited it extensively for clarity. Thank you, friends!