Yoann Moinet

Posted on Nov 15, 2023

🧑‍💻 Platform Teams best practices

#dx #programming #devops #devrel

Bon matin 👋

I'm Yoann Moinet, a Frenchman living in Montpellier.
In early 2019, I joined Datadog and bootstrapped the Frontend Platform team.

But what is a platform team, you ask?

[opinionated vision]
They're here to improve the Developer eXperience and remove any pain points in the day-to-day work of all the engineers.
They can cover a lot of ground: build, tests, deployment, code health, internal tools, and even more...

Let's say, in a company, the engineers are working on a product or a service to satisfy the customers.
The platform team works on the platform to satisfy the engineers: it builds anything it can think of to improve the reliability, productivity, efficiency and (more importantly) happiness of the engineers shipping the product to the customers.
[/opinionated vision]

Working in a large-scale environment at Datadog, we had to come up with some kind of charter so that we would stay focused on what's really important (DX) and not try to fix everything at once, which would end up being a bad experience for everyone involved.

You have to be laser-focused on what you want to achieve and always work incrementally.
There, I gave you the first best practice for free...
jk, they all are for free.

Over the years, I've identified a few best practices (most of them directly translated from our charter) that could be phrased agnostically enough to be applied not only to the frontend, but to any kind of platform team.

1. 🏢 Workflows --not their implementation-- need to be shared company-wide.

You should share similar workflows between teams and technologies, so it's easy for newcomers or someone working on an incident to quickly get up to speed even in a different repository/project.

Markers and primitives need to be identified in each workflow in order to keep them similar across implementations.

These implementations can be different as long as a team is there to own the support and follow the same identified markers/primitives.

In the case where there is already an established tool foundation, you should have, at the very least, hooks/flexibility/documentation to customize the workflow in order to align with the project's tech stack and complexity.

🔎 IRL Example

We have a command to deploy to staging from whichever repo you work on.
This command waits for the feature branch's pipeline to be 100% ✅ before triggering a merge and deploy into our shared staging.

For the frontend, in a dedicated repository, it's safe enough to deploy on staging at an earlier step in the pipeline, so we updated the global tool to only wait for a specific job in the CI before deploying. This way it can trigger much sooner for our frontend engineers.

This alone cut the staging workflow in half for our frontend teams, without impacting the global workflow itself and while still sharing the same primitives as the rest of the company.
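To make the idea more concrete, here is a minimal sketch of such a gate in TypeScript. It is not our actual tool: the types, the job names and the `requiredJobs` option are all hypothetical. It only illustrates how one shared primitive (wait, then deploy) can stay the same while a repository opts into waiting for a single job instead of the whole pipeline.

```typescript
// Hypothetical shapes: these types and job names are illustrative only.
type JobStatus = "pending" | "running" | "success" | "failed";

interface PipelineJob {
  name: string;
  status: JobStatus;
}

interface StagingGate {
  // When set, only these jobs must succeed. Otherwise the whole pipeline must be green.
  requiredJobs?: string[];
}

type GateDecision = "deploy" | "wait" | "abort";

function evaluateGate(jobs: PipelineJob[], gate: StagingGate = {}): GateDecision {
  const required = gate.requiredJobs;
  // Only the jobs the gate cares about are inspected, so a failure elsewhere is ignored.
  const relevant = required
    ? jobs.filter((job) => required.some((name) => name === job.name))
    : jobs;

  if (relevant.some((job) => job.status === "failed")) return "abort";

  // A required job that has not shown up in the pipeline yet counts as "not done".
  const allPresent = required
    ? required.every((name) => jobs.some((job) => job.name === name))
    : jobs.length > 0;
  const allGreen = relevant.every((job) => job.status === "success");
  return allPresent && allGreen ? "deploy" : "wait";
}

const pipeline: PipelineJob[] = [
  { name: "build", status: "success" },
  { name: "unit-tests", status: "running" },
  { name: "e2e-tests", status: "pending" },
];

const globalDecision = evaluateGate(pipeline); // "wait": the full pipeline is not green yet
const frontendDecision = evaluateGate(pipeline, { requiredJobs: ["build"] }); // "deploy"
```

The command and its outcome stay identical for everyone; only the condition it waits on is configurable per repository.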

2. 💡 Workflows should not be created or changed unless the change is tightly related to a known and documented problem.

You may want to test new technologies, or you may have read articles about new workflows that you want to try out.

But unless you have a known issue with the related process, you should not change it.

You need to reach a consensus among impacted teams before starting any work on a new workflow or on an update to an existing one.

Exploration is fine though. Necessary even, to understand what the community has to offer.
See next point.

🔎 IRL Example

We use RFCs to have a transparent and open discussion about new technologies we want to use, or new workflows we want to implement.

Having a document written down helps with the global vision of the change we're about to make. It reveals misplaced or incompatible workflows and edge cases.

We're able to gather feedback from everyone involved and refine the proposal along the way, making it even better and more personalized.

3. 🗺️ Keep exploring what the community has to offer.

Keep a good exploration routine and a thorough tech watch, using Hacker News, Reddit, X (Twitter), or whichever source you prefer. There is no single answer as long as you keep yourself up to date with what's happening in the community you're part of.

🔎 More details

When you want to try something new, do it in a controlled, sandboxed environment and process.
Keep a written list of things you're improving, as well as shortcomings and potential maintenance costs along the way.

You don't have to go all in; the goal is just to understand how it would translate to your area and to the specific needs of your platform.

Later down the line, you'll notice that you'll be able to connect problems and feedback with solutions you've seen during your watch.

4. 💻 The technology chosen for a workflow should be known and understood by the people that use it the most.

For example, workflows implemented for the frontend should use JavaScript, workflows for the backend should use Python, and so on.

This allows the people that are the most impacted by the workflow to fix and tweak it if needed.

🔎 IRL Example

We used to have a monorepo for both our frontend and backend.

The infrastructure was orchestrated around a Rakefile (Ruby) triggering Bash and Python scripts. No one from the frontend teams wanted to dive into that.

We've split the frontend into its own repository and started to port everything to NodeJS, making it more approachable for us and the other frontend engineers.

5. 🦾 A workflow should be tightly related to the infrastructure it's applied to, its needs, and its context.

You should not try to implement a workflow once and expect it to cover every problem in existence across unrelated platforms or infrastructures.

A workflow should be implemented in the context of the infrastructure it's running on/over/in…

If you aim too broad, you'll end up with a cluttered workflow that's loaded with unwanted overhead, slower and more complicated than needed. This will impact multiple engineers, every time they use it.

You implement and test it once; they use it thousands of times every day.

🔎 IRL Example

Deployments used to be handled by a Bash script, written outside of our repository, which triggered a Go script, also in a different repository.

This workflow and its tooling were written to cover every need, for everyone, with many conditions and edge cases.

Like the Rakefile-based infrastructure from the previous example, nobody wanted to touch these Bash or Go scripts. The process was slow, but too difficult to really update without the risk of breaking the deployment of other projects.

We wrote our own deployment script with only what was needed for deploying our frontend. It's in NodeJS, versioned in our main repository, so everyone can tweak it as needed, and be aware of its changes. And it shares the same primitives as the rest of the company, so anyone from outside can still interact with it.

The overall workflow didn't change, meaning that from the engineer's point of view, nothing changed. The process is now 10 times faster and our frontend engineers are able to change what is uploaded or not simply by changing some JS code they are already familiar with.
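As an illustration of what "changing some JS code" can look like, here is a small TypeScript sketch. It is not our deployment script: the asset shape and the rules (skipping source maps, license files and empty files) are assumptions made up for the example. The point is that the upload selection is plain code, living next to the frontend, that any frontend engineer can read and tweak.

```typescript
// Hypothetical asset shape and rules, for illustration only.
interface BuiltAsset {
  path: string;
  sizeInBytes: number;
}

// A rule returns true when the asset is allowed to be uploaded.
type UploadRule = (asset: BuiltAsset) => boolean;

const uploadRules: UploadRule[] = [
  (asset) => !/\.map$/.test(asset.path), // assumed rule: keep source maps out
  (asset) => !/\.LICENSE\.txt$/.test(asset.path), // assumed rule: skip license files
  (asset) => asset.sizeInBytes > 0, // assumed rule: skip empty files
];

// An asset is uploaded only if every rule accepts it.
function selectAssetsToUpload(
  assets: BuiltAsset[],
  rules: UploadRule[] = uploadRules,
): BuiltAsset[] {
  return assets.filter((asset) => rules.every((rule) => rule(asset)));
}

const builtAssets: BuiltAsset[] = [
  { path: "app.js", sizeInBytes: 1200 },
  { path: "app.js.map", sizeInBytes: 5000 },
  { path: "empty.css", sizeInBytes: 0 },
  { path: "vendor.LICENSE.txt", sizeInBytes: 300 },
];

const toUpload = selectAssetsToUpload(builtAssets); // only "app.js" passes all three rules
```

Changing what gets uploaded then means adding or removing one line in `uploadRules`, reviewed like any other frontend change.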

6. 📣 Any new or updated workflow should be transparently communicated at large.

Too much is better than not enough (in this context).

When you finally have a go at a new workflow or at updating an old one, it is very important to communicate at every step of the process to everyone impacted by it.

🔎 More details

You start with a presentation of the whole project before the first line of code is even written. You explain why and how you do it, with an overall approximate timeline (this can also be done through an RFC).

Then, once you've started working on it, you regularly report progress, clarify the timeline and list any required actions at every step of the migration (if necessary). This can be done by mail.

Finally, at completion, you explain the new/updated workflow again, but also reflect on what went well and what could have been done better. This can be done by mail and presentations.

This helps other engineers see your work, reinforcing the idea that you're not just a support team, but that you care about their happiness and act on improving it every day of the week.

7. 🏷️ You own the platform, they own its use cases.

When creating a new tool or trying to find a new solution, it's important to think about ownership from the start.

Whatever solution you'll need to implement will end up being used by tens, hundreds, or, if you're very lucky, THOUSANDS of engineers.
You have to think of it as a platform from the beginning, one into which anyone can plug their very own tweaks.
Those tweaks will ultimately be owned by them, meaning that any bug, fix or modification will be handled by them. Meanwhile, you can focus on the platform and the glue that ties everything together.

Ownership of these additions has to be very clearly defined and, more importantly, enforced, either by documentation or, even better, with a CODEOWNERS file.

In addition to that, contributing to the platform should be clearly documented and automated as much as possible. If you want people to help you, you have to help them do it in the most frictionless way possible.

🔎 IRL Example

We have a CLI platform in our frontend repository that we use for any tooling need. It goes from printing some information from the CI/CD to uploading our assets in production.

Anyone from anywhere, as long as they know how to write TypeScript, can contribute. All they have to do is run a CLI command to bootstrap their very own command. It creates the files, adds them to the CODEOWNERS file and, through comments in the code, shares some good practices and guidelines.

We've grown it to more than a hundred commands, all of them documented, fully owned and easily accessible from our repository.
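Here is a rough sketch, in TypeScript, of what such a bootstrap step could compute. Everything in it is hypothetical (the `cli/commands/` layout, the file contents, the team handle); it is not our actual CLI. It only shows the principle: the scaffold and the ownership entry are generated together, so a command can never exist without an owner.

```typescript
// Hypothetical layout and names: nothing here mirrors a real repository.
interface ScaffoldInput {
  commandName: string; // e.g. "ci-info"
  ownerTeam: string; // e.g. "@my-org/frontend-platform"
}

interface Scaffold {
  files: Record<string, string>; // file path -> file contents
  codeownersEntry: string; // line to append to the CODEOWNERS file
}

function scaffoldCommand({ commandName, ownerTeam }: ScaffoldInput): Scaffold {
  // Keep names predictable: lowercase letters, digits and dashes only.
  if (!/^[a-z][a-z0-9-]*$/.test(commandName)) {
    throw new Error(`Invalid command name: ${commandName}`);
  }
  const dir = `cli/commands/${commandName}`;
  const source = [
    `// Owned by ${ownerTeam}: bugs and changes for this command go to them.`,
    `// Guideline: keep the command small and document its flags in the README.`,
    `export const name = "${commandName}";`,
    `export async function run(args: string[]): Promise<void> {`,
    `  console.log("${commandName} called with", args);`,
    `}`,
    ``,
  ].join("\n");

  return {
    files: {
      [`${dir}/index.ts`]: source,
      [`${dir}/README.md`]: `# ${commandName}\n\nOwner: ${ownerTeam}\n`,
    },
    // CODEOWNERS format: a path pattern followed by its owner(s).
    codeownersEntry: `/${dir}/ ${ownerTeam}`,
  };
}

const scaffold = scaffoldCommand({
  commandName: "ci-info",
  ownerTeam: "@my-org/frontend-platform",
});
// scaffold.codeownersEntry is "/cli/commands/ci-info/ @my-org/frontend-platform"
```

A real command would then write `scaffold.files` to disk and append `scaffold.codeownersEntry` to the CODEOWNERS file. That part is left out to keep the sketch free of side effects.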

8. 📋 Gather feedback periodically, if not continuously.

It is very important to keep yourself very aware of what happens in the community you're helping.
You need to understand how the platform is used every day by your engineers.

There are a few ways to keep yourself in the game.

You can conduct regular embeds, from your platform team into product teams, so you can experience the DX of a given team firsthand.
It means you'll work with them directly, as part of their team and following their processes.
You still have to prepare for it, meet with the team's leads beforehand, explain the motivations and look for self-contained tasks that can be completed easily, as if they were done by a new team member.

Another solution is to survey teams.
There are different ways of doing this. The easy way is to have a form with questions.
More complicated ways could be:

Some questions integrated into common CLIs.
Injected forms/popups in common tools' UI.
Automated Slack questions.

Your imagination is the limit.

A few tips for surveys though:

Questions should be the same across iterations, so you can track progress.
Questions should follow some kind of "framework" both for the phrasing and the meaning (like SPACE).
Remain high level, you'll go deeper later on if you need to.
Keep the survey as short as possible.
Organize the questions by theme.

When it's time to end the survey, analyze the answers and identify patterns.
Don't focus on the "one-off" comments; look at the big picture.
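If your questions are stable and grouped by theme, tracking progress between two iterations is mostly arithmetic. Here is a minimal TypeScript sketch, assuming answers are scored from 1 to 5 and tagged with a theme. The data shape and the theme names are hypothetical, not the ones from our survey.

```typescript
// Hypothetical answer shape: one scored answer to one question.
interface Answer {
  questionId: string;
  theme: string; // e.g. "build", "tests"
  score: number; // assumed scale: 1 (very unhappy) to 5 (very happy)
}

type ThemeAverages = Record<string, number>;

// Average score per theme for a single survey iteration.
function averageByTheme(answers: Answer[]): ThemeAverages {
  const sums: Record<string, { sum: number; count: number } | undefined> = {};
  for (const { theme, score } of answers) {
    const entry = sums[theme] ?? { sum: 0, count: 0 };
    entry.sum += score;
    entry.count += 1;
    sums[theme] = entry;
  }
  const averages: ThemeAverages = {};
  for (const theme of Object.keys(sums)) {
    const entry = sums[theme];
    if (entry) averages[theme] = entry.sum / entry.count;
  }
  return averages;
}

// Difference of averages for the themes present in both iterations.
function compareIterations(previous: Answer[], current: Answer[]): ThemeAverages {
  const before = averageByTheme(previous);
  const after = averageByTheme(current);
  const deltas: ThemeAverages = {};
  for (const theme of Object.keys(after)) {
    if (before[theme] !== undefined) deltas[theme] = after[theme] - before[theme];
  }
  return deltas;
}

const firstSurvey: Answer[] = [
  { questionId: "q1", theme: "build", score: 2 },
  { questionId: "q2", theme: "build", score: 4 },
  { questionId: "q3", theme: "tests", score: 3 },
];
const secondSurvey: Answer[] = [
  { questionId: "q1", theme: "build", score: 4 },
  { questionId: "q2", theme: "build", score: 4 },
  { questionId: "q3", theme: "tests", score: 3 },
  { questionId: "q4", theme: "docs", score: 5 },
];

const progress = compareIterations(firstSurvey, secondSurvey);
// progress is { build: 1, tests: 0 }; "docs" is new, so it has nothing to compare against
```

This is also why keeping the same questions across iterations matters: a theme only gets a trend once it has appeared at least twice.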

🔎 IRL Example

We send a survey to all the frontend engineers twice a year.

It's usually sent out a few weeks before the transition to the next quarter, so we can use the results to define our objectives for the next 6 months (it spreads over two quarters).

We also brought in other platform teams so they can ask their own questions too. That way we keep a single survey, instead of having plenty of them and tiring our engineers with questions all year round.

It really helped our team and drove it in the right direction, especially when I was alone on it, by better defining what was the most problematic and which points were high value for a small cost.

I hope this clears things up for you and will help you better manage the platform work at your company.

Do you have best practices regarding platform teams you'd like to share?

Psssst... 🤫 we're hiring.

ℹ️ This is a re-work of my previous article Developer Experience Magna Carta with more content.
🌄 Photo by Max Duzij on Unsplash
Thank you Erik for the thorough proofreading.
