Zef Hemel for Mattermost

Benchmark Says No — The Mattermost Platformer #16

This post is part of a series previously posted in the public "The Platformer" channel on Mattermost's Community server. "The Platformer" is published on a weekly cadence, sharing the latest updates from the Mattermost platform teams (web, server, mobile, and QA). It's an insider look into how these teams operate and what they're working on.

There are many reasons why it’s cool to contribute to open source. One that is often overlooked is that simply having a large code base available to the public is valuable in itself. Why? Other projects can use these code bases to showcase the value of their tools. I once heard that JetBrains used the Mattermost server repo as a benchmark for their Go tooling. And now Meta has demonstrated the performance improvement their new Hermes engine brings to React Native, using Mattermost as a benchmark application. Check out that post for animated GIFs (pronounced “jiffs”, by the way) to get a visual feel for the performance boost from this engine. It’s also great to see v1 in action again. Those were the days, huh? Oh my, v2 is so much nicer looking (and faster).

And once more, purely coincidentally, we got ourselves a theme of the week: performance and benchmarking. Performance is a complicated topic and tends to become a bit of a game of whack-a-mole: you solve one performance issue, but a bit later it reappears, or another one appears elsewhere. That’s not great. There must be a better way, no? More on this in a bit; first, let’s get to the weekly pickings of the fruits.

Cherry picks

On the mobile platform end, the team is a bit thinly populated this week (half the team is out), but we keep chugging along on bringing v2 to feature parity with v1, focusing on upgrading the settings screens (almost there) and on our ever-expanding end-to-end test coverage (even with some test performance improvements this week!).

On the web platform end, we are experimenting with “theme weeks” as a budgeting hack to work on important-but-not-urgent stuff. This first theme week (and it will likely be a recurring one) is focused on 🥁 performance! We’re pulling in a whole bunch of tickets, accumulated over time, for smaller things we can optimize for performance. Some of these improvements are already landing.

On the desktop platform end, translations have landed! If you know a language other than English, join our localization channel and help get the desktop app translated into your favorite and least favorite languages! There’s also work in progress to improve window resizing performance, which regressed to being comically slow on some OSes, as you can see in my jiff attached to that PR.

On the server platform side, we’re firing on all four cylinders this week (now that Alejandro has officially joined us 🎉). He started running load tests with Postgres 12, showing a significant performance boost. Isn’t it great when simply bumping an (open source) dependency makes your product way faster, effectively for free (free as in beer)? We’re also debugging cloud production issues around the performance of plugin downloads. On the refactoring side, we continue to make progress on spreading the context love across our code base, and on refactoring the multi-product architecture into larger services according to The Plan™️. We’re also fixing a bug in GraphQL that broke guest logins on Community; until that fix lands, GraphQL is disabled (so if you’re feeling less graphy this week, now you know why).

On the QA platform side, more work was done on our build pipelines for mobile, in close collaboration with the mobile platform team. It seems our set of “unstable” end-to-end tests needs a bit more love, and should be trimmed down from its current 260 tests (out of 1,600+ total): we had a case where a regression wasn’t caught even though we had a test for it (the test was in the “unstable” suite, so we missed it). Further: another week, another cloud release. There’s still opportunity to further oil this release machine (for performance — I had to work that in somehow).

The Performance Regression Challenge

Here’s a conversation I’ve been having multiple times over the last few months: we invest a lot in fixing performance issues in <<INSERT CLIENT HERE>>. How do we prove that these fixes:

  1. Make a material difference (are high impact)
  2. Don’t regress again tomorrow (are lasting)

This is a problem we have already tackled on the server side. We have a load test tool and a test suite with which we can verify the performance of a substantial part of the server side of our product in a reproducible way, as long as y’all keep adding your features to the suite (hint hint!). We run this suite at least every release to make sure there are no performance regressions.

But of course the server side is only a part of the story. We have to figure out how to do this for our clients as well.

The ideal situation is that, as part of our CI (e.g. run in the context of a PR), we can run a performance regression suite that compares the results of the current run with previous runs and gives a 👍 or 👎 based on that (ideally fully automated). In Little Britain parlance, “computer says no” to any commit that makes performance worse.
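To make that concrete, here’s a minimal sketch in Go of what such a gate could look like. To be clear, this is illustrative only: the `baseline.json`/`metrics.json` file names, the metric names, and the 5% tolerance are hypothetical assumptions, not a description of our actual CI.

```go
// perfgate: a minimal sketch of a CI performance gate. It compares a
// hypothetical metrics.json from the current benchmark run against a
// stored baseline.json from a previous run, and exits non-zero
// ("computer says no") if any metric regressed beyond the tolerance.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// metrics maps a metric name (e.g. "channel_switch_ms") to its measured value.
type metrics map[string]float64

func load(path string) (metrics, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m metrics
	return m, json.Unmarshal(data, &m)
}

func main() {
	const tolerance = 0.05 // allow 5% slack for run-to-run noise (assumed)

	baseline, err := load("baseline.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "loading baseline:", err)
		os.Exit(1)
	}
	current, err := load("metrics.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "loading current metrics:", err)
		os.Exit(1)
	}

	failed := false
	for name, base := range baseline {
		cur, ok := current[name]
		if !ok {
			continue // metric not measured in this run
		}
		if cur > base*(1+tolerance) {
			fmt.Printf("👎 %s regressed: %.1f -> %.1f\n", name, base, cur)
			failed = true
		} else {
			fmt.Printf("👍 %s ok: %.1f -> %.1f\n", name, base, cur)
		}
	}
	if failed {
		os.Exit(1) // computer says no
	}
}
```

The comparison itself is the easy part; the hard part is producing numbers stable enough to feed into it.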

This is easier said than done, because we have to be able to control a lot of variables. I think this is worth the research & development, though. Ideally, we find a reusable solution that we can leverage across our clients, and on the server as well. I’ll bring some people together to brainstorm on this topic and see if it could be a potential thing to work on in Q4.

Ultimately we have to be able to put trustworthy numbers on our performance improvement work (to complement our nice-looking jiffs). And if we’re able to do that, we may as well use those numbers to detect regressions.
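To sketch what “trustworthy” could mean in practice (again, purely illustrative: the sample counts, numbers, and thresholds below are made up), one common approach is to take repeated samples and only call something a regression when the median shift exceeds the observed run-to-run noise:

```go
// A sketch of one way to make benchmark numbers more trustworthy:
// take repeated samples, compare medians rather than single runs, and
// only flag a regression when the shift exceeds both a relative
// threshold and the baseline's run-to-run noise. Illustrative only.
package main

import (
	"fmt"
	"math"
	"sort"
)

func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func stddev(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	var mean float64
	for _, v := range samples {
		mean += v
	}
	mean /= float64(len(samples))
	var sum float64
	for _, v := range samples {
		sum += (v - mean) * (v - mean)
	}
	return math.Sqrt(sum / float64(len(samples)-1))
}

// regressed reports whether the new samples look slower than the baseline:
// the median shift must exceed both a relative threshold and one standard
// deviation of the baseline (a crude noise floor).
func regressed(baseline, current []float64, relThreshold float64) bool {
	shift := median(current) - median(baseline)
	return shift > median(baseline)*relThreshold && shift > stddev(baseline)
}

func main() {
	// Hypothetical channel-switch times in milliseconds, five runs each.
	baseline := []float64{102, 98, 101, 99, 100}
	current := []float64{118, 121, 117, 120, 119}
	fmt.Println("regressed:", regressed(baseline, current, 0.05))
}
```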

If anybody has experience or ideas on how to do this in web app or mobile app contexts, please let us know.

And that’s all for this week. Have a great weekend!
