DEV Community

"Crap, I broke production" - How do we ensure it never happens again?

Tomasz Łakomy on February 06, 2020

Before we start - I'm working on https://cloudash.dev, a brand new way of monitoring serverless apps 🚀. Check it out if you're tired of switching b...

Jan Küster • Feb 7 '20

The biggest tower defense is to push back on management and sales and their weird expectation that you have to deploy a new feature untested and immediately, so they can present it in a totally unrealistically scheduled demo to a potential customer who hasn't even signed yet.

Tomasz Łakomy • Feb 7 '20

In my opinion that problem can be largely mitigated by changing the mindset of the development team - a feature is not done (and therefore cannot be shown to others, demo'ed etc.) unless it's tested.

So many of us consider development and tests to be separate parts, whereas I consider them to be two sides of the same coin.

Jan Küster • Feb 7 '20

The important part here is to make management and sales understand this thinking, so that there won't even be a discussion about such an issue.

miniscruff • Feb 7 '20

I have noticed that most developers cave in to management or PO-type coworkers very easily. Usually, if you stand your ground, they will understand. But standing your ground feels dangerous to developers, as if they will be fired for talking back. When I do it, management is understanding and reasonable. It just takes guts sometimes.

Andrei Dascalu • Feb 7 '20

When discussing this, almost everyone thinks the same. I generally fight the tendency to have too many explicit 'statuses' in the workflow (e.g. JIRA).

For example, an issue is in development as long as there's work to be done on it. It moves along when deployed to a non-dev/test environment.

However, when management comes knocking and discusses it with people, developers tend to 'cave under pressure' and say that we're actually testing it instead of coding. That's understandable, though: if you simply say people are working on it, things go sideways. ("Oh, you've been working on it for 2 weeks, is it THAT difficult? At planning you said it's easy." Sure, but at planning it's also difficult to accurately size the testing part, and even though it gets mentioned, management tends to put testing out of their mind unless they are already test-oriented people.)

Sure, developers should push back BUT in real life it doesn't happen much.

Nijeesh Joshy • Feb 7 '20 • Edited

I did something similar once. I pushed code without removing binding.pry (it's like a debugger for Ruby), and I only found out about it after deploying to staging (good thing we didn't push to production directly).

I learned about Git pre-commit hooks that day, and I haven't made that mistake since. I have been telling juniors about it, too.

I think it's important that we make mistakes, so others don't have to make the same ones.

"The only real mistake is the one from which we learn nothing."

miniscruff • Feb 7 '20

I am not a Ruby dev, but it sounds like that file should be git ignored.

Nijeesh Joshy • Feb 7 '20

It's not a file. It's just a single-line debugger statement used for setting a breakpoint in your code's execution.
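
For illustration, here is a minimal sketch of the kind of pre-commit hook mentioned above. It is not the hook anyone in this thread actually used. It assumes the hook is written in Python and only looks for `binding.pry`; the names `DEBUG_MARKERS`, `find_debug_lines` and `main` are made up for this example.

```python
import subprocess
import sys

# Assumption: only this one marker is checked; extend the tuple as needed.
DEBUG_MARKERS = ("binding.pry",)


def find_debug_lines(diff_text):
    """Return (file, added_line) pairs for added lines containing a debug marker."""
    hits = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # A header such as "+++ b/path/to/file" names the file being changed.
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
        elif line.startswith("+"):
            # Any other line starting with "+" is a line added by the commit.
            added = line[1:]
            if any(marker in added for marker in DEBUG_MARKERS):
                hits.append((current_file, added.strip()))
    return hits


def main():
    """Scan the staged changes; return 1 if a marker was added, otherwise 0."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],  # staged changes, no context lines
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    hits = find_debug_lines(diff)
    for path, text in hits:
        print(f"{path}: leftover debugger statement: {text}", file=sys.stderr)
    return 1 if hits else 0
```

To try it, one could save this as an executable `.git/hooks/pre-commit` file with a Python shebang line at the top and `sys.exit(main())` at the bottom. Git aborts the commit when the hook exits with a non-zero status. Because the check works on added lines rather than on whole files, it catches a single forgotten statement, which a .gitignore entry cannot do.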

Andrei Dascalu • Feb 7 '20

Well, no amount of best practices will guarantee production will never break again. Even not breaking in the exact same way again is quite a goal, though it's generally doable as long as adopted practices are respected.

In the face of disaster, lessons are learned and strategies implemented. But in too many cases those strategies hold up only as long as people remember the size of the potential disaster and use it to keep off the pressure to sidestep safeguards. Otherwise, a few months after adoption, someone will find a justification to silently bypass something.

In the happy case, all hell breaks loose and you have a chance to enhance discipline after hopefully averting disaster at the last minute like James Bond. Things do improve then.

In the sad case, it works, and management praises the dodger for delivering something quickly, which simply introduces an incentive to do it more and get away with it. Eventually several hells break loose at the same time, leaving the team unable to avert them all.

Đào Tuấn • Feb 7 '20

tl;dr: the last line of this post 😂

Ashish Agre • Feb 7 '20

Nice post. Yes, I made such a mistake in the past. When working in a team, one thing that helps is a code review process, so that we know what is being pushed into master.

prozz • Feb 6 '20

what's your fav tower defence variant? btw nice post :)

Matti Bar-Zeev • Feb 23 '20

"For younger readers out there, Flash was the best thing ever and we're still catching up to it"
Yes. So true.