Cover Photo by Yan from Pexels
Yes - this post started as a tweet. One that went semi-viral. It struck a real, naked, buzzing nerve. A nerve that most of us prefer not to touch.
Some of us pretend it's not a real pain, others are just too busy fixing production issues. Still others - and that would probably be the majority of the industry - have yet to discover how distressing it is when they meet it face to face.
But the truth is out there. Mere mortals can't have Continuous Delivery and even less so - Continuous Deployment. CD has become the privilege of that group that the folks at DORA quite non-accidentally call "elite performers". And the gap between the elite and everybody else is only growing wider.
Fooling Ourselves
Most organizations we work with say: "of course we have CI/CD pipelines!"
But when one digs deeper - there's usually some CI - and no CD in sight. Or, as @itaysk noted "it's not even CI, but continuous build..."
When asked what stops them from safely and regularly deploying every change into production environments - everybody seems to have their own reasons. Organizational, cultural, historical, technical, contractual... Some go as far into denial as saying: "Oh, we don't need continuous delivery. In fact most companies out there don't really need it." But the underlying reason is of course the lack of confidence. Nobody wants to be the culprit for a system outage. According to a number of industry surveys the average cost of one hour of downtime is around 75,000 USD. There's a lot at stake!
So instead we choose to move slower, to add controlled handoffs and build home-grown guardrails. To hire more Ops engineers and call them SRE to feel more secure. Rarely discussing the price of establishing and maintaining all of these over time.
But why can't we have CD?
Continuous Delivery is a sociotechnical practice. And as many Twitter commenters correctly noted - the barriers on the way to having it are two-fold. As with anything in DevOps it starts with culture and shared understanding that continuously delivering in small increments makes everything better. Engineers who've experienced true CD can't really fathom any other way of delivering software. As @giltayar puts it "CD ... is a total game changer. It changes how you perceive software development and delivering features... I did CD and EVERYTHING about how I developed changed. It was magical."
The Social Dilemma
But we humans are scared of change. The new mode of delivery challenges our perceptions: of ownership, of reliability, of hierarchy. If your SRE team is responsible for production site uptime - then what's their incentive for enabling the constant flow of change that continuously threatens the very thing they are responsible for? If you have folks whose job it is to control what gets released when - what will they do when this control is made obsolete? The existing organizational barriers make the blame game easier - thus providing us with a false sense of confidence. Because the tools we currently have can't promise true confidence - and this brings us to...
The Technical Dilemma
The socio-cultural obstacles are truly the hardest to remove. But as Archimedes used to say: "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." Technology, while meaningless on its own, can become a great enabler for societal innovation.
Trouble is - the tools for continuous delivery/deployment are still lacking. And this is especially true for the brave new cloud/edge-native world we see rapidly unfolding before our eyes.
But Aren't CI Tools Enough?
This is where some readers might say: "Why are you saying there are no tools for CD? We already have Jenkins/CircleCI/GitHub Actions... Why can't we use those? And then there's Spinnaker, isn't there?"
That, of course, is a grave mistake. Yes - any CI server or even generic workflow automation tool can theoretically orchestrate your deployments - the mechanics of deployment are trivial. But deploying like this is the same thing as the proverbial "throwing changes over the wall" practice that brought on the DevOps revolution.
Because CI tools ignore the semantics of change. The only kind of feedback they provide is deterministic - verifying pre-defined functionality under pre-defined conditions. The production environment, by contrast, has inherent uncertainty that makes it behave in often unpredictable ways. Therefore - in modern complex systems no change is verified until it reaches production. As they say - until the wheels hit the road.
And that is exactly why most orgs out there can't have CD. Because blindly pushing into production is scary, stressful and in the end falls on the shoulders of the undermanned SRE team.
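To make the difference from a deterministic CI check a bit more concrete, here is a minimal sketch of the kind of production feedback gate a CD tool could run while a canary takes a slice of real traffic. Everything in it is an assumption made for illustration: the `WindowStats` shape, the `canary_decision` function and all thresholds are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts observed over one analysis window (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # An empty window has no evidence of errors, so report 0.0
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: WindowStats, canary: WindowStats,
                    min_requests: int = 500, max_ratio: float = 1.5,
                    max_absolute: float = 0.05) -> str:
    """Return 'wait', 'promote' or 'rollback' for one analysis window.

    All thresholds are illustrative assumptions, not recommendations.
    """
    # Not enough production traffic yet to say anything meaningful
    if canary.requests < min_requests:
        return "wait"
    # Hard ceiling: roll back regardless of how the baseline behaves
    if canary.error_rate > max_absolute:
        return "rollback"
    # Tolerate some noise relative to the baseline; the small floor keeps
    # a perfectly clean baseline from making every single error fatal
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "promote" if canary.error_rate <= allowed else "rollback"
```

The point of the sketch is that the verdict depends on what production actually does under real traffic, not on a pre-defined test passing. A real system would look at more signals (latency, saturation, business metrics) and use sturdier statistics than a simple ratio.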
Cloud Native CD is Possible
It's not all bad, of course. Some teams we talk to succeed in establishing true cloud native CD by investing multiple man-months in home-grown solutions. This is costly and most orgs can't afford it, but those who can are very proud of their achievement - until the platform changes under their feet and they need to reinvent the home-grown solution.
Some very interesting OSS projects have emerged in the last couple of years in an attempt to tackle the problem. ArgoCD with Argo Rollouts, Flux and Flagger, Shipper and Keptn are all definitely worth looking at.
Still no one comprehensive, reliable, usable platform exists that can help us deploy to production continuously with confidence and without complex unsustainable in-house hackery.
That's why we at Canarian decided to step up to the challenge.
We're building a platform that will allow you to deploy continuously with confidence, full observability and automated recovery.
In the next post I'll describe the feature set that we see as the minimal viable proposition for such a platform and how we're building it.
Sounds interesting? Send us an email, sign up for our beta version on the site or just follow this blog.
We'll keep you continuously updated ;)
Keep delivering!
Top comments (30)
Sweet Jesus, of course it's a sales pitch.
Not yet :) we still don't have anything to sell. This is just my opinion based on spending a couple of decades in IT. Cloud and then containers were tectonic shifts but now it's time for the next stage.
Hence the clickbait title.
For me, the big "hell no" to CD was when my boss received a phone call from a customer who was screaming and crying because we added 1 button to the UI for a requested feature. This change was the last straw for this particular user, and after rolling back the change, our customer agents collected user feedback and we got a very important backhand across the face from reality...
We used to practice 2 week deployments with a roadmap to get onto a full CI/CD workflow so that, even with feature flags, we could roll out changes quickly. After we almost lost our largest client over a button, we pulled back to a quarterly deployment compromise since many of our customers were adamant that we should do fewer updates, as few as once a year in some cases. What we took away from this event was end users DO NOT WANT CHANGES; when they have their own work to do, they don't have time to learn new features every week, day, or god forbid every hour. This is not a matter of fear, it's a matter of compassion for your end-users' time.
Another thing to consider is regulatory compliance. In some industries (like healthcare in the US) you have to certify your software, and major "feature" modifications trigger a significant and costly recertification process. Adding new features more than a few times a year could drive small businesses out of business with the $20k and up fee per recertification.
IMO this is a product/leadership horror story, not necessarily a technology horror story. The issue is that the right feature wasn't built – or it was built in a way that required new processes from the customer. When that happens, the methodology and timeline of its release isn't the cause of the failure – it would have always been poorly received. There is a missing 'product' role here – a person or team that is in constant conversation with the customer and ultimately responsible for which features make it into the product.
You're right on about compliance. So a CD "culture" or structure isn't a drop-in solution for every business. Some industries simply should not be releasing new UI changes or features all the time.
CD isn't always about features, though. Sometimes it's about performance, security or technical debt. In fact, a trusted CD process is a potential solution to this type of "bad feature" issue, allowing for fast backpedaling.
When talking about features, it is totally true that customers don't want change. They don't even want the "product", what they want is what your product enables them to achieve. They're hiring your product to get their job done. Compassion includes being on that journey with them when scenarios change.
An example: your customer's industry has a new legal regulation that requires them to change how they work (or let's say, a global pandemic occurs and changes everything 😉 ). In this scenario, compassion for the customer means anticipating their needs, and releasing changes as quickly and confidently as possible – as the situation evolves. This responsiveness is what CD enables.
"lost a client over a button" is surely a scary story! but wouldn't a better approach be making sure each client only gets the changes they need? CD isn't necessarily about new features. It's also about fixing bugs, improving performance and continuously paying off our tech debt. It's also about being able to roll things back quickly when the shit hits the fan. Once we're able to do this - we'll have our client's trust and won't fear losing them over a button.
This confirms my belief that CD is driven almost entirely by parochial MIS departments wanting to jump onto the latest devops craze or improve their own processes. I would be interested to know if anybody has seen quantifiable real world benefits experienced by users outside of the IT department.
Definitely the key to continuous delivery. Devs avoid deploys when they are difficult or risky; deploys are risky when:
Instead of taking steps to make mistakes no/low impact, gatekeeping steps are layered atop each other to ensure no mistakes are made, simultaneously ensuring that any mistakes that are missed stay for weeks to months (to years) waiting for a fix to make it through the same gatekeepers.
Beautifully put! Lack of usable technology or engineering expertise is compensated for by broken culture. That's the situation we are out to fix.
This is a good article, but I'm a little confused by this central claim. It seems surprising that someone would claim to have a CI/CD pipeline if they only have CI. It seems like a difficult mistake to make, like if I said that I basically have a car when I actually just have a bicycle. They're very different things.
Could you perhaps elaborate on what you consider to be CD that other organizations don't recognize? Or could you give an example of something that someone thought was CI/CD that wasn't?
My main idea is that we tend to conflate CI with CD. Folks start with CI naively believing that with time - as they build it out - the same pipeline will take them to CD land. But then they hit the wall of uncertainty and stop the pipeline at the "staging" environment. So when you ask them, they say "we have CD, but we're not deploying to production because reasons" - and that's denial of course.
This makes sense, thanks!
This is through the prism of a backend dev right?
Because on the frontend, so many of us use CI/CD now, with tools like Netlify, Vercel, Render, Surge... We have deploy previews for each PR, and it deploys to production on merge.
Afaik some backend colleagues using Heroku also have this kind of workflow with Review Apps.
But despite that setup, humans are still afraid to be responsible for production problems. This just moves the fear to merging a PR, and we generally have a human reviewing the deploy previews.
Yes, definitely - things look brighter in the FE world. There's still that issue with syncing between the previews of FE and BE. And - can you release your previews gradually to a small percentage of your customers with Netlify and the bunch? Asking because I don't know.
For rollout strategies, seen this Netlify product recently talking about phased rollout: netlify.com/products/edge/
I used to have 2 deployments for my startup: one on the dev branch, one on master. We had some users (including ourselves) using the dev branch by default, so we can notice early if something is wrong (as we use our own product).
Technology changes fast. We, as humans, have difficulty changing.
That's the only answer I have for too many questions, including why many companies out there don't do CD, why they don't try to really create their own agile environment adapted to their culture (even if they think they do because they use scrum or whatever process / tooling), why the interview process in tech is often a joke full of whiteboards (I mean, who codes on a computer?) with Google forbidden, of course.
That was a really good read! Thanks for that.
thank you Matthieu! but we as humans are also capable of much more if we create an environment that supports it.
I think when we're talking about this, it's really useful to distinguish /deployment/, which is getting code onto production, from /release/, which is the business decision about what users see and when they see it. Breaking those up does a lot to de-risk CD, because it's not immediately visible to users. Then you have time to test in production and find all the places things can or will go wrong. Or, well, most of them.
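A minimal sketch of that deployment/release split, assuming a simple percentage-based flag: the code is already deployed everywhere, and `is_released` only decides who gets to see it. The function names and the hashing scheme are hypothetical and chosen for illustration, not taken from any particular feature-flag product.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    # The first 8 bytes are plenty for an even spread across 100 buckets
    return int.from_bytes(digest[:8], "big") % 100


def is_released(flag: str, user_id: str, percent: int) -> bool:
    """True if the deployed-but-dark feature should be visible to this user."""
    return bucket(flag, user_id) < percent
```

Because the bucket is derived from a hash rather than a random draw, a given user sees a consistent experience across requests, and raising `percent` only ever adds users to the released group. Setting `percent` back to 0 hides the feature again without a new deployment.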
We all should keep in mind, that "production" often means "isolated environment in the field, anywhere in the world, with small embedded systems which might not even be capable of being deployed automatically"
This means, in the real world, true feedback loops are often impossible to implement. To blame the developers that are only using CI in those cases, because that is the most they are able to do, is not the correct thing to do.
The judgement may apply for those fancy, well-connected developers from the hipster web development bubble where true CD can be established.
But there are an awful lot of embedded systems programmers out there who already struggled with manual deployment processes for years and certainly will through the next decades.
For me and my team "only CI" has been a dramatic improvement in code quality and automation of nasty build processes which take hours to complete and took a lot of time before we had a working CI environment.
So I consider "only CI" a great thing overall which should not be blamed so harshly.
A great point! Edge and embedded deployments are definitely a largely untackled challenge. But even that is changing today. Look for example at what Zededa zededa.com/ are building.
Most of my past bosses be like "What the f**k is this CI/CD you're talking about?" Why would I spend good money on this DevOps BS when I already have code monkeys at my disposal?
My heart goes out to you. Never, ever let your boss treat you like a code monkey! I was there - it sucks!
No worries, actually most of them were OK... just a couple that were horrible and I was out of there very fast, so... Still, most didn't wanna hear about automation or paid expert consultants...
I think the terminology is unfortunately overloaded.
gitlab.com/jessephillips/blog/-/bl...
Industry has shoved everything into CI and then reserved CD for production. Realistically there is CI, something in the middle, then CD.
Organizations have done CI for a long time, but then they want to do "something in the middle". It is a hard sales push, "let's implement CI!"... "umm, didn't we do that last year?" "yes, but we just do this every year"
I think it is important to realize there is ambiguity in where the lines exist and ask for clarification.
yes, ambiguity definitely leads to misinterpretation. but my point is that CD in cloud native world is a totally different concern - not just an extension of our pipeline. CI or "something in the middle" can all be implemented by a basic workflow automation tool. CD requires smarter, domain-specific algorithms and strategies.