Gavin Campbell

Posted on Dec 11, 2017 • Edited on Jul 28, 2020 • Originally published at gavincampbell.dev

Throwing code over a different fence

#devops #culture

We're often told that the existence of a DevOps Team is something of an antipattern, or indeed "considered harmful", but it wasn't until I saw this in action that some of the reasons for this advice became really clear in my mind, and I thought I'd note some of them down here.

In the "bad old days", the developers used to write code and "throw it over the fence" to the operations team, who were the custodians of the organisation's infrastructure and responsible for all software deployment and maintenance. This procedure had an associated cost, of course, in paying the operations team to come to work at night or at the weekend in order to "babysit the deployment". In order to reduce this cost, the "routine" deployment and maintenance procedures could even be outsourced to lower-paid workers elsewhere in the world.

This approach also created cultural rifts, of course, in that the developers weren't incentivised to "own" the reliability of the product, and the operations teams had little interest in accelerating the shipping of features, preferring to devote their attention to maintaining the integrity of the infrastructure, particularly when this was the sole deliverable associated with their outsourcing agreement.

Added to this was the increase in cycle time due to the need for "hand-offs" between the various teams involved in getting a feature from MacBook to production, which was exacerbated further when the teams involved in the hand-offs had different goals and priorities, and even different employers.

The DevOps Revolution promised to solve all of these problems by aligning the interests of the developers and the operations teams in shipping features more quickly as well as more reliably.

Naturally, since shipping more features more reliably sounded like an excellent idea, many organisations decided to get a head start on their DevOps Journey by recruiting a dedicated team to establish their DevOps Capability. The DevOps Team arrived armed with a vast arsenal of tools, and set to the task of establishing carefully crafted pipelines, dashboards with a myriad of metrics, and deployments in blue, green, and all the colours of the rainbow.

The problem, of course, is that the interests of the DevOps Team were no better aligned with those of the feature developers than those of the Ops team had been previously. So, whilst the DevOps Team were busy high-fiving, having just migrated their deployment orchestrations to the latest tool of the day, and created a brand new dashboard with every metric imaginable regarding code quality, the feature teams were still very unclear about how to deploy and monitor the features they were creating, and were reduced to throwing code over the fence to the DevOps Team.

The way to break down the fences, of course, is that the feature teams need to own the delivery and monitoring processes themselves. It's already uncontroversial to suggest that development teams should seek to cultivate "T-shaped people", with deeper functional expertise in one area, such as databases, testing, or front end coding, as well as a base level of understanding of all the tasks required to successfully deliver a feature. In the modern world, of course, one of these competencies is the ability to create test and deployment automation, as well as infrastructure and monitoring, or in other words the tasks that are being done by the newly established DevOps Team on the other side of the fence.

Oldest comments (7)

Jason C. McDonald • Dec 11 '17 • Edited

Nice summary of the issue!

At my company (MousePaw Media), I've inadvertently served as the "devops" guy forever, and I've started anticipating many of these issues not far down the road. Thus, we are rolling out the following structure in 2018:

First, all programming department employees will have to pass our company's Repository Master training at some point. (I am currently writing this training material.) Starting next summer, incoming interns will be required to pass Repository Master training as part of the internship program.

This training ensures employees understand the entire pipeline: the VCS (Git), static analysis (linters), dynamic analysis (memory checkers), build tools (CMake, Makefiles), code review, and the CI (Jenkins, Harbormaster).

While I'm creating the material to be primarily informative about our own pipeline, I'm also trying to ensure their understanding is solid enough to port to other companies and projects.

Second, we will have a formal Repository Master role. It is voluntary, and every developer is qualified to act in it, but only one or two are given the "power" at any one time. This is because acting Repository Masters have the ability to manage and bypass any part of the pipeline: they can fix problems, kick emergency bugfixes right into prod, tweak settings, configure repositories, etc. This is only to ensure the pipeline isn't damaged by 'too many cooks in the kitchen.'

Ultimately, the only privileges reserved for the Repository Masters are:

Bypassing the code review system (direct push to master and stable). Otherwise, any staff can land code that was reviewed and approved by any other staff.
Editing the repository's administrative settings.
Modifying the automatic CI build pipeline and triggers. (Any staff can create manual CI builds.)
Administrative control over the CI.
SSH into the build box.

However, because all the programmers have the training to act as Repository Masters, they are not dependent on anyone else to diagnose or fix most problems. They can untangle the VCS, use all the tools at their disposal, and diagnose when something breaks.
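As a rough illustration only (not MousePaw Media's actual tooling), the split between ordinary staff and acting Repository Masters described above could be modelled as a small permission check. The names `Action`, `STAFF_ACTIONS`, `RESERVED_ACTIONS` and `is_allowed` are hypothetical and exist purely for this sketch.

```python
from enum import Enum, auto


class Action(Enum):
    """Pipeline actions named in the list above (identifiers are hypothetical)."""
    LAND_REVIEWED_CODE = auto()
    CREATE_MANUAL_CI_BUILD = auto()
    BYPASS_CODE_REVIEW = auto()
    EDIT_REPOSITORY_SETTINGS = auto()
    MODIFY_CI_PIPELINE = auto()
    ADMINISTER_CI = auto()
    SSH_TO_BUILD_BOX = auto()


# Actions open to any member of staff.
STAFF_ACTIONS = frozenset({Action.LAND_REVIEWED_CODE, Action.CREATE_MANUAL_CI_BUILD})

# Everything else is reserved for acting Repository Masters.
RESERVED_ACTIONS = frozenset(Action) - STAFF_ACTIONS


def is_allowed(action: Action, acting_repository_master: bool) -> bool:
    """Return True if the action is permitted for this person."""
    return action in STAFF_ACTIONS or acting_repository_master


# An ordinary developer can start a manual build but cannot skip review.
print(is_allowed(Action.CREATE_MANUAL_CI_BUILD, acting_repository_master=False))
print(is_allowed(Action.BYPASS_CODE_REVIEW, acting_repository_master=False))
```

Keeping the reserved set as "everything not explicitly open to staff" means any action added later is restricted by default, which matches the intent of limiting how many people can alter the pipeline at once.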

Gavin Campbell • Dec 11 '17

Right, so basically you are me!

A couple of things you might want to think about though; if it's necessary to be able to "kick emergency bugfixes right into prod", this is telling you something about the rest of the pipeline, viz. that it isn't fast enough!

As for "direct push to master", yes it is switched on for me, but I'd like to get to a place where I could turn it off, even for me!

Jason C. McDonald • Dec 11 '17 • Edited

Regarding pushing bugfixes to prod, two things:

1) Our deployment speed is limited by the fact most of our team works only a few hours a week (side effect of being a work-for-shares startup). The pipeline is actually really fast given a full-time team, but we don't have that yet. Someday!

2) There are exceptions. "Oh crap, there's a broken link in the README" could go through code review, but you're wasting everyone's time at that point. Ergo, git push origin master.

edA‑qa mort‑ora‑y • Dec 12 '17

I'm opposed to distinct dev ops teams. Deployment and management have to be an integral part of the programming team. Regardless of what it's called, it's as you say: if it's distinct, it'll just be throwing code over the fence.

The team writing user features should be the same one coming up with the deployment. It's terrible to have somebody coding database features without having a clue about the install and management of that database. It's equally terrible to have somebody managing a DB who doesn't understand, or is unable to contribute to, the code that works with that DB.

As more things move to distributed computing (shared services, virtual hosts, serverless, etc.) it becomes ever more important to stop segmenting these programming roles. Sure, you can have specialties, but the ownership of code must be shared across the board.

There should be no fence.

Kenneth Henderick • Dec 12 '17

I think "DevOps" is neither a team nor a person, but a mentality that needs to be grown. It's not that developers suddenly should know everything about operational tasks, or that operations should suddenly know how to write applications. It's that both teams need to start talking to each other.

And when there is a need to suddenly hire such DevOps people, or to convert developers or operations staff into them, one had better start looking at why the teams aren't working together as expected.

I've worked on teams in the past where operations could even ask development to create software to make their lives easier, or they would set up a meeting to give the development team some basic understanding of how they worked, so they could align better.

Well, let's also state that in some companies, one could just merge both teams into a DevOps team; that's just another option. But generally speaking, just make sure everybody is aligned, and then everybody can do whatever he's good at. Whether that's writing software, setting up server environments, or whatever.

Alex Rudenko • Dec 23 '17

I like the term SRE a lot. In my opinion, it's less ambiguous than DevOps, which few people can clearly define :-) At my company we didn't have any DevOps or Operations teams at first. But we had several teams of developers working on different related services. Every team is responsible for the deployment process as well as monitoring. We did this for quite a while, but now we clearly see the need for a dedicated team caring about "operations". But I see the responsibilities of such teams as follows:

be responsible for shared infrastructure such as, for example, the central logging system, the API gateway etc.
develop guidelines and advise the teams on best practices for deployment and monitoring (the teams still do this themselves and remain responsible) and on how to use the shared infrastructure; also provide tools for developers
have more low-level expertise which can help to troubleshoot problems not directly related to the features/product
have more focus on general security of the infrastructure and help to keep it up-to-date
help the teams to improve the reliability of services over time by adopting best practices

Gavin Campbell • Dec 27 '17

Yes, naming problems aside, I can definitely see the need for a team like the one you describe. Almost all of your bullet points refer to "making it easier for the developers to ship features" which is indisputably a good thing. If the term "support" weren't so tainted, then "Developer Support" might be a good name!