Don't deploy on Friday afternoons!
This expression is taken as programmer wisdom but I hate it. I'm going to try and kill it, with words and experience.
The motivation behind it is sound. I don't want to spend my Friday nights debugging a production problem either.
To me the expression smacks of unprofessionalism. Software development as an industry has a poor reputation and phrases like this do not help.
If you went to an ER on a Friday afternoon and were turned away because the doctors didn't trust their own tools and know-how, how would you feel?
If we want people to take our craft seriously we need to own it, and not give the impression that we don't understand the systems we are making well enough to change them at the end of the week.
- You don't understand your system. You're making changes but you're not 100% sure it'll be OK. Ask yourself why this is. Why are you scared of changing your software?
- Poor monitoring. If your users are the first to tell you something is wrong, that feedback arrives while you are at the pub rather than when you deploy.
- Overly-complicated rollback/branching. How is your disaster recovery? How easy is it to push a fix once you solve a bug?
I have worked on teams that have deployed new versions of various services in a distributed system multiple times at 4:30pm and not broken a sweat.
Why? Because deployment is fully automated and is a non-event. Here is the groundbreaking process.
- Write some code
- git commit -am "made the button pop"
- git pull -r && ./build && git push
- Make a cup of tea
- It's now live.
Not so long ago it was considered normal for there to be releases every 6 months, or even just one at the end of a project.
The forward thinkers of that age saw problems with this:
- Poor feedback loops
- Stressed development teams
- Catastrophic failures.
So the industry as a whole worked on lots of tooling, techniques and best practices to allow us to release software far quicker.
It is generally accepted nowadays that releasing often reduces risk, but teams still often settle on weekly or fortnightly releases, usually matching the cadence of their sprints.
- The feedback loops are still not great. When you do release, quite a lot of commits go live at once, and if something is wrong it can be challenging to figure out exactly what broke, especially if you wrote it 2 weeks ago.
- Still overly reliant on manual processes. I have seen teams actually skip a release because a QA was on holiday. This is surely unacceptable in 2018. Manual testing does not scale into the future. People leave, things get forgotten, etc.
- Lets you fall into the trap of writing stories that are dependent on other stories being finished in a "sprint". When they aren't, things can get very complicated.
With CD we recognise that we can go further, deploying new software to live every time the build is green. This has some amazing benefits:
- Extremely fast feedback loops. No longer do you have to think about code you wrote 2 weeks ago when there is a problem in live.
- Forces best practices. In order to be able to deploy to live on green you need excellent monitoring and tests. These are all good things in their own right.
- Reduces stress. "Releasing" is no longer a thing. You can be confident in writing your software again!
- Vastly improves agility. Found a bug? Just fix it! This encourages a leaner way of working compared with lots of upfront planning. There isn't even an option for a convoluted release process; you have to keep it simple.
- Forces you to work on stories that are actually releasable, not dependent on stories x, y and z. It enforces the best practices for user stories that everyone acknowledges but people often ignore.
Often people say of CD:
"Yeah, it's nice, but what if it breaks? We should have a QA check things over."
Here's the thing: no process in the world prevents bugs. You will ship broken code. What's really important is how quickly you can detect and recover from it. Hoping manual testing will catch everything is wishful thinking.
It is much easier to do CD on a new project since you can start small and evolve.
Generally your work should be centered on delivering the most valuable user journeys first, so this is an excellent chance to practice how to ensure that feature works without any humans checking anything.
- Write an end-to-end test. These are expensive to write and run, so they should be reserved for your most important journeys.
- Have monitoring with threshold alerts for when things go wrong
- Set up your pipeline so that when your code is pushed all the automated tests are run and, if they pass, the code goes to production.
- Have some kind of blue/green release mechanism. Run your automated tests on the deployed release candidate and if they don't pass, don't ship it.
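The last two steps can be sketched as a toy model. This is a minimal illustration rather than a real pipeline: the `Pipeline` class, its `run` method and the `unit_tests` / `smoke_tests` callables are all hypothetical names, and "deploying" here just means recording a log line.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Toy deploy-on-green pipeline with a blue/green switch (hypothetical)."""

    live: str = "blue"  # colour currently serving users
    log: List[str] = field(default_factory=list)

    def idle(self) -> str:
        """Return the colour that is not serving users right now."""
        return "green" if self.live == "blue" else "blue"

    def run(
        self,
        version: str,
        unit_tests: Callable[[], bool],
        smoke_tests: Callable[[str], bool],
    ) -> bool:
        # Stage 1: run the automated tests against the pushed code.
        if not unit_tests():
            self.log.append(f"{version}: tests failed, nothing deployed")
            return False

        # Stage 2: deploy the release candidate to the idle colour only.
        candidate = self.idle()
        self.log.append(f"{version}: deployed to {candidate}")

        # Stage 3: test the deployed candidate before any user sees it.
        if not smoke_tests(candidate):
            self.log.append(f"{version}: smoke tests failed, {self.live} stays live")
            return False

        # Stage 4: everything is green, so switch traffic to the candidate.
        self.live = candidate
        self.log.append(f"{version}: {candidate} is now live")
        return True


pipeline = Pipeline()
first = pipeline.run("v2", unit_tests=lambda: True, smoke_tests=lambda env: True)
second = pipeline.run("v3", unit_tests=lambda: True, smoke_tests=lambda env: False)
```

In this run "v2" goes live on green, while "v3" is deployed to blue, fails its smoke tests there and never receives traffic. No human had to make either decision.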
For each subsequent story ask yourself:
- How will we know this is working? (monitoring)
- What tests do I need to have enough confidence this will work without any humans checking? Not every story needs a full end-to-end test on a huge distributed system, but obviously you'll need some tests.
- Is this story as small as it can be? If your user stories are massive they are more likely to go wrong. If the story takes a week then that's back to slow feedback loops.
If you can't answer these questions then you need to rethink the story. Notice these are all just basic agile principles for user stories. Continuous delivery forces you to adhere to the principles that so often get ignored.
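To make "How will we know this is working?" concrete, here is a minimal sketch of a threshold alert on error rate. The `ErrorRateAlert` class, the 5% threshold and the window of 20 requests are assumptions chosen for illustration; in practice this job usually belongs to your monitoring tool rather than to hand-written code.

```python
from collections import deque


class ErrorRateAlert:
    """Alerts when the error rate over the last `window` requests exceeds `threshold`."""

    def __init__(self, threshold: float = 0.05, window: int = 100) -> None:
        self.threshold = threshold
        # Only the most recent `window` outcomes are kept; True means the request failed.
        self.outcomes = deque(maxlen=window)

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


alert = ErrorRateAlert(threshold=0.05, window=20)
for _ in range(18):
    alert.record(failed=False)
for _ in range(2):
    alert.record(failed=True)

if alert.should_alert():
    print(f"error rate {alert.error_rate():.0%} is above threshold")
```

The point is that the story defines its own "this is broken" signal up front, so the team hears about a problem from an alert rather than from a user.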
- You may have some kind of "run book" that is used when shipping the software. See what you could do to automate it.
- Find out which manual processes are happening. Ask why they are needed and what could be done to automate them.
Some companies have many environments in their delivery pipeline. A good first step is to automatically ship all the way up to the environment before live. A better step is to remove as many of them as you can. It's OK to have some kind of "dev" environment to experiment with, but ask yourself why you can't just test these things locally in the first place.
If you're working with a distributed system you might be able to identify a system which is easier to CD than the rest. Start with that because it'll give your team some insights into the new way of working and can help you begin to break the cultural barriers.
Often a product owner or project manager wants to be the one who is in charge of releasing.
There are circumstances where exposing features to users should be controlled by a non-technical member of your team, but this can be managed with feature toggles.
But the copying of code from one computer to another is the responsibility of the developers on the team. After all we are the ones who are responsible for making sure the system works. It is a technical concern, not a business one.
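The split between deploying code and exposing a feature can be sketched with a tiny in-memory toggle. The `FeatureToggles` class, the `new-checkout` toggle name and the button labels are all hypothetical; a real system would more likely read toggles from configuration or a dedicated service.

```python
class FeatureToggles:
    """Minimal in-memory feature toggle store (illustrative only)."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_button_label(toggles: FeatureToggles) -> str:
    # Both code paths are already deployed; the toggle decides which one users see.
    if toggles.is_enabled("new-checkout"):
        return "Buy now"
    return "Add to basket"


toggles = FeatureToggles()
before = checkout_button_label(toggles)  # the new code is live but hidden
toggles.enable("new-checkout")           # the product owner's decision, not a deploy
after = checkout_button_label(toggles)
```

Developers ship the code whenever the build is green, and the product owner flips the toggle when the business is ready.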
CD is actually liberating for QAs
- Rather than spending their time manually testing poorly tested systems, they can now focus on a more holistic view of the system, helping to create an environment for CD in which the whole team can be confident things are working.
- QAs spend more effort helping developers define what needs to be tested and monitored for a story to be written.
- More time for exploratory testing
Lots of companies think they cannot have any defects and will spend a lot of time and effort on complicated, time-consuming (and therefore expensive) processes to try and stop them.
But think about the cost of all this. If you push a change to production that isn't covered by tests, perhaps a CSS change, consider whether it's really catastrophic if there's a small visual fault for some browsers.
Maybe it is, in which case there are techniques to test specifically for this too.
Each release you do with CD will have the following qualities:
- Plenty of tests
- Good monitoring
- Small scope
- Still "fresh" in the developer's mind
So in my experience fixing anything that falls through the cracks is easy. It's much less complicated than trying to look through 2 weeks' worth of git history.
I would recommend in most cases not rolling back (unless it's really bad), but just fixing the issue and releasing the fix. Rollback is sometimes not an option anyway (e.g. database migrations), so the fact that your system is geared to releasing quickly is actually a real strength of CD.
- Fast feedback loops are key. Measure the time from git push to live and keep it low.
- If things are getting slow, re-evaluate your end-to-end tests. If you removed a test or refactored it to be a unit test, would you be any less confident? If not, then refactor away.
- You may need to invest some time in making your components more testable to avoid writing lots of slow end-to-end tests. This is a good thing.
- Feature toggles are a useful tool but can become messy, so keep an eye on the complexity.
- Celebrate it. Moving from one release every 2 weeks to 50 a day feels great.
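One way to keep an eye on the push-to-live time from the first tip is to record a pair of timestamps per deploy and track the median. This is a rough sketch: the `(pushed_at, live_at)` record shape and the sample times are made up for illustration, and in practice the timestamps would come from your CI system.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(deploys):
    """Return the push-to-live duration for each (pushed_at, live_at) pair."""
    return [live_at - pushed_at for pushed_at, live_at in deploys]


def median_lead_time(deploys) -> timedelta:
    """Median push-to-live time, which is less skewed by one slow build than the mean."""
    seconds = [duration.total_seconds() for duration in lead_times(deploys)]
    return timedelta(seconds=median(seconds))


# Made-up sample data: three deploys on the same afternoon.
deploys = [
    (datetime(2018, 6, 1, 15, 0), datetime(2018, 6, 1, 15, 8)),
    (datetime(2018, 6, 1, 15, 30), datetime(2018, 6, 1, 15, 40)),
    (datetime(2018, 6, 1, 16, 30), datetime(2018, 6, 1, 16, 45)),
]

print(median_lead_time(deploys))
```

If that number creeps up over time, it is a prompt to look at the slow end-to-end tests mentioned in the next tip.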
This has been a small introduction to CD. It's a huge topic with plenty of resources to investigate.
Continuous delivery is a big effort both technically and culturally but it pays off massively.
I have worked on distributed systems with over 30 separately deployable components with CD and it has been less stressful than another project I've worked on that had just a handful of systems but a ton of process and ceremony.
Being able to release software when it's written puts a higher emphasis on quality and reduces risk for the team. It also forces you into using agile best practices like testable, small, independently releasable user stories.
Maybe most importantly, it demands that you understand your system and that you're not deploying to production and crossing your fingers. That feels more professional to me.