Whenever I dare open LinkedIn, I suddenly come to the realisation that DevOps is one of the most misused terms since the concept of Agile. It's not hard to see why, if you've followed the concept as it grew.
Quick background: "devops" was first used by a guy named Patrick Debois around 2008. The context: the Agile implementations of the time didn't feel good enough to guide the whole development process. Together with Andrew Clay Shafer, Debois pointed at the silo-ization of teams (and even whole companies). Debois then masterminded an event called DevOpsDays the very next year, and things spiralled from there.
A step towards misuse (a gift from the silos): everybody knows the development process. You get requirements, you write some code that works, you hope it gets tested, and the job is done. Making it available to users, keeping it running, gathering information about issues: those are someone else's problem now. They get assigned to an "ops" team, so someone has to cross the silo divide to make DevOps happen - obviously we should call that "devops engineering" and create a "devops" position. And thus "devops" becomes just another word for "ops".
The best definition of DevOps comes from a guy called Ken Mugrage, who spends a good deal of time outlining the actual meaning of DevOps and the fact that it is not a role/job/position, but a practice (or a mindset). To be fair, he also wrote that it's ok to list yourself as a DevOps engineer as long as you don't promote "ops silos" or the silo mentality.
It was just early last year that a bigger project came to my company. As it usually goes, it was also rather urgent (as in, full-featured product a month before Christmas). We pulled together a good team and went for an MVP.
The directive was clear: have a working MVP at any cost. We approached it the usual way, split into frontend (two frontend applications) and backend (six backend services). People worked according to their primary field and moved across applications to keep the expertise and knowledge flowing. We pulled a development quickstart package from our recipe stack (based on Docker Compose, for local development) to get the team up and running ASAP.
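For the sake of illustration, such a quickstart looks roughly like the sketch below. This is a hypothetical Docker Compose file, not the project's actual one: the service names, images, and ports are all made up, but the shape is what a local-dev quickstart for a multi-service stack typically amounts to.

```yaml
# Hypothetical local-dev quickstart - all names and images are illustrative.
version: "3.8"
services:
  api:
    build: ./services/api
    environment:
      - REDIS_URL=redis://cache:6379
    ports:
      - "3000:3000"
    depends_on:
      - cache
  web:
    build: ./apps/web
    ports:
      - "8080:80"
  cache:
    image: redis:6   # the kind of local version choice that later clashed with Azure
```

The convenience is real, but so is the trap: everything in this file reflects local assumptions, and none of it is guaranteed to match the target platform.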
The main delivery restriction known at the time was that we had to use Azure, which we did.
Everything was working fine locally (sure, why don't we just ship your local environment then!) but when it came to deploying it, things went like just about every other project: the developers had it working, so it became someone else's job to make their code run as-is, with all the chosen components.
Once things got into Azure, various issues were exposed:
- the team had decided to use Redis 6 for caching. However, Redis 6 was in beta on Azure and not available as a cluster, so we had to fall back to Redis 4, and the dev team had to modify their stack and realign their libraries.
- we had a requirement for scaling (which you don't usually enable in development), so we had to use Redis cluster to accommodate scalable services. The devs had used single-node Redis, and their NodeJS library was configured accordingly. Changes were needed to allow cluster access.
- the devs also used Redis as a pub/sub service communication system. However, aside from flaky persistence, we also had a requirement for resilience, meaning the application had to be ready to go in two different Azure regions. It's not a big deal if a cache has to be rebuilt, but pub/sub queues had to make the transition flawlessly - and in Azure, Redis can't do that without manual intervention.
- logging: each service implemented logging differently, with no standard structure. One of the most wasteful episodes was spending over a week just to align the logging models (and even afterwards, fixes were needed to give logs proper context). Logging is often needed in the frontend as well; it's a sad day when frontend log instrumentation gets postponed.
- metrics: while a late requirement, metrics are the thing you know a running system is going to need (even when people say it isn't). It's trivial to instrument an application from the start, and much harder to bake in later. Of course, this came back to bite us.
- observability: sooner or later the question comes - why am I throwing compute power at this app while it's still sluggish as hell in any non-dev environment? Tracing and APM exist to answer that. Of course, some instrumentation and configuration is needed, and it's trivial to prepare for it early.
- configuration management: in development, various services used dotEnv files or actual environment variables. That's never an issue in development, whichever one you use, but in a production system it's important to know that in the vast majority of config management libraries, actual environment variables override values taken from files. Then come the questions: where do you store sensitive values? How do you pull them in if you need them at build time? At runtime? Do platform choices impact that somehow? In Azure, for example, it was fairly easy to make secrets available as plain env vars in running containers (same for GitHub builds for build-time values). Anybody relying on pure dotEnv files had to change things.
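The single-node-to-cluster change above doesn't have to be invasive if the connection setup is isolated behind configuration. Here's a minimal sketch of that idea, assuming an ioredis-style client where a cluster connection takes a list of startup nodes; the function and environment variable names are hypothetical:

```javascript
// Hypothetical sketch: build client options for an ioredis-style library,
// so switching from single-node (local dev) to cluster (a scaled cloud
// deployment) is one environment variable instead of a code change.
function redisClientConfig(env) {
  if (env.REDIS_MODE === "cluster") {
    // Cluster clients typically take a list of startup nodes.
    const nodes = env.REDIS_NODES.split(",").map((hostPort) => {
      const [host, port] = hostPort.split(":");
      return { host, port: Number(port) };
    });
    return { mode: "cluster", nodes };
  }
  // Single node: what most local dev setups default to.
  return {
    mode: "single",
    host: env.REDIS_HOST || "127.0.0.1",
    port: Number(env.REDIS_PORT || 6379),
  };
}

// Local dev: no REDIS_MODE set, falls back to a single node.
console.log(redisClientConfig({}).mode); // "single"

// Production: cluster mode with explicit startup nodes.
const cfg = redisClientConfig({
  REDIS_MODE: "cluster",
  REDIS_NODES: "10.0.0.1:6379,10.0.0.2:6379",
});
console.log(cfg.nodes.length); // 2
```

Had something like this been in the quickstart from day one, "realign their libraries" would have been a config change rather than a code change.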
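The week lost aligning logging models is what agreeing on a shared log shape up front is meant to prevent. Below is a hypothetical sketch of such a shape - the field names (`service`, `traceId`, and so on) are my assumptions, not the project's actual model:

```javascript
// Hypothetical shared log structure: one shape for every service (and the
// frontend), so logs can be correlated instead of re-aligned later.
function makeLogger(service) {
  return function log(level, message, context = {}) {
    const entry = {
      timestamp: new Date().toISOString(),
      service,            // which application emitted this
      level,              // "debug" | "info" | "warn" | "error"
      message,
      ...context,         // e.g. traceId, orderId - the "proper context"
    };
    console.log(JSON.stringify(entry));
    return entry;         // returned so the shape is easy to test
  };
}

const log = makeLogger("orders-api");
log("info", "order created", { traceId: "abc-123", orderId: 42 });
```

The value isn't in the twenty lines of code; it's in the agreement they encode, which every team member then follows.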
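The configuration precedence rule mentioned above is worth seeing concretely. This sketch mimics the behaviour most config libraries (dotenv included) follow - a value already present in the real environment wins over the value from the file; the function name is hypothetical:

```javascript
// Hypothetical sketch of the precedence rule most config libraries follow:
// a variable already set in the real environment overrides the .env file.
function resolveConfig(fileValues, processEnv) {
  const merged = { ...fileValues };
  for (const key of Object.keys(fileValues)) {
    if (processEnv[key] !== undefined) {
      merged[key] = processEnv[key]; // the real env var wins
    }
  }
  return merged;
}

// The .env file says one thing, the container's environment says another:
const config = resolveConfig(
  { DB_HOST: "localhost", DB_NAME: "app" },  // from the .env file
  { DB_HOST: "db.prod.example.com" }         // injected by the platform
);
console.log(config.DB_HOST); // "db.prod.example.com" - the env var wins
console.log(config.DB_NAME); // "app" - no override, file value survives
```

A developer who only ever used one source never notices this rule; a deployment that injects secrets as env vars depends on it.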
I keep mentioning the devs, but this isn't about blame. Organisationally, the requirements came through a PO to the dev team and were taken literally, in the context of a Scrum-wannabe company: the dev team was made up of devs, with Ops and QA considered external dependencies.
Even though technically there were QA and ops people attached to the team, they were treated mostly as dependencies: dev does the work, hands it to QA, then it gets handed to ops for deployment. While people do get together in plannings and whatnot, most decisions rest with devs or architects, and everyone else executes.
Consequences? Deadlines were missed, overtime piled up, costs mounted, and eventually external help was needed to get it over the finish line.
Solutions? While the suggested "fix" was more oversight at the beginning, the truth is that changes will be needed along the way regardless. DevOps is a better answer, with communication front and centre, and knowledge-sharing and ownership close behind.
One thing I tend to say about DevOps is this: everyone in the team is a DevOps engineer, or nobody is. To get there, there are some basics to deal with.
Know your tools: too often I see developers who know their development tools but not what it means to actually run their application. Many React/Angular devs know how to execute "npm run" but not what it means to build the application and run it, or which config variables are resolved at build time versus at run time. The attitude that "this is someone else's job" has to go away when teamwork is needed.
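The build-time versus run-time distinction deserves a concrete sketch. Below, both functions and the variable names are hypothetical, but they mirror what frontend toolchains commonly do: a build-time value gets textually inlined into the bundle (changing it means rebuilding), while a run-time value is read when the code executes (the same artifact works in every environment):

```javascript
// Hypothetical sketch of build-time vs run-time configuration.
// A bundler does roughly this at BUILD time: it textually replaces
// references to known variables, baking the value into the artifact.
function inlineBuildTimeVars(source, buildEnv) {
  return source.replace(/__API_URL__/g, JSON.stringify(buildEnv.API_URL));
}

// RUN time: the value is looked up when the code executes, so the same
// artifact can behave differently per environment.
function runtimeConfig(processEnv) {
  return { apiUrl: processEnv.API_URL || "http://localhost:3000" };
}

// Build-time: the URL is now part of the bundle; changing it needs a rebuild.
const bundled = inlineBuildTimeVars(
  'fetch(__API_URL__ + "/health")',
  { API_URL: "https://api.example.com" }
);
console.log(bundled); // fetch("https://api.example.com" + "/health")

// Run-time: same artifact, value injected by the platform at startup.
console.log(runtimeConfig({ API_URL: "https://api.example.com" }).apiUrl);
```

A developer who understands which of their variables fall into which bucket won't be surprised when a deployed frontend ignores an environment variable that was only ever read at build time.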
Communicate: might seem obvious, but I've found it often isn't. When requirements come in, they have to be seen in the grand scheme of things, by every team member. They aren't standalone pieces of work that need coding and then that's that. No amount of "definition of done" or "acceptance criteria" can replace good ol' fashioned alignment. Daily scrums, or whichever ceremonies teams perform daily, may align you on daily tasks but not on the big picture. It's everyone's job to make sure everyone in the team has a chance to weigh in on (or at least be aware of) the implications of a task.
Imagine: imagination is important in development. How do you envision your piece of work becoming useful to a user? What does the journey look like? What does it take to consider it working? How does it get from being code in my IDE to something users interact with? If there's a problem, what information do I need to fix it? These are the things that tend to slip to the back of your mind.
Ken Mugrage's definition is an elegant way to talk about ownership. The team owns both developing and operating the software, even when outside help is needed. Ownership makes everyone think about the consequences of not doing something, because eventually you'd have to handle fixing it (whether doing it yourself or finding someone who can help).
This is DevOps: owning the development and operation of software. It's the missing piece for "True Agility". The job doesn't end when you commit and push your code.