DEV Community

"Crap, I broke production" - How do we ensure it never happens again?

Tomasz Łakomy on February 06, 2020

Before we start - I'm working on https://cloudash.dev, a brand new way of monitoring serverless apps 🚀. Check it out if you're tired of switching b...
 
Jan Küster

The biggest tower defense is to push back against management and sales when they expect you to deploy a new feature untested and immediately, just so they can present it in a totally unrealistically scheduled demo to a potential customer who hasn't even signed yet.

Tomasz Łakomy

In my opinion that problem can be largely mitigated by changing the mindset of the development team - a feature is not done (and therefore cannot be shown to others, demo'ed etc.) unless it's tested.

So many of us consider development and tests to be separate parts, whereas I consider them to be two sides of the same coin.

Jan Küster

The important part here is to make management and sales understand this thinking, so that there is no discussion about such an issue in the first place.

miniscruff

I have noticed that most developers cave in to management or PO-type coworkers very easily. Usually, if you stand your ground, they will understand. But doing so feels dangerous to developers, as if they will be fired for talking back. When I do it, though, they are understanding and reasonable. It just takes guts sometimes.

Andrei Dascalu

When discussing this, almost everyone thinks the same. I generally fight the tendency to have too many explicit 'statuses' in the workflow (e.g. JIRA).

For example, an issue is in development as long as there's work to be done on it. It moves along when deployed to a non-dev/test environment.

However, when management comes knocking and discusses it with people, developers tend to 'cave under pressure' and say that we're actually testing it instead of coding. That's understandable, since if you simply say people are working on it, things go sideways: "Oh, you've been working on it for 2 weeks, is it THAT difficult? At planning you said it was easy." Sure, but at planning it's also difficult to accurately size the testing part, and even though it gets mentioned, management tends to put testing out of their mind unless they are already test-oriented people.

Sure, developers should push back BUT in real life it doesn't happen much.

Nijeesh Joshy • Edited

I did something similar once: I pushed code without removing binding.pry (it's like a debugger for Ruby), and I only found out about it after deploying to staging (good thing we didn't push to production directly).

I learned about Git pre-commit hooks that day, and I haven't made that mistake since. I've been telling juniors about it too.
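
For readers who haven't used one, here is a minimal sketch of what such a pre-commit hook could look like. It is not the hook from this comment (that one isn't shown), just an illustration in Python that assumes the script is saved as the executable file .git/hooks/pre-commit. The names DEBUGGER_PATTERN and find_debugger_lines are made up for this example. It scans only the lines added in the staged diff and reports any that contain binding.pry.

```python
#!/usr/bin/env python3
"""Sketch of a Git pre-commit hook that rejects commits adding `binding.pry`."""
import re
import subprocess
import sys

# Hypothetical pattern; extend it to cover other debugger statements.
DEBUGGER_PATTERN = re.compile(r"\bbinding\.pry\b")


def find_debugger_lines(diff_text):
    """Return (path, added_line) pairs for added lines matching the pattern."""
    hits = []
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # A header such as "+++ b/app/models/user.rb" names the new file.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else target
        elif line.startswith("+") and DEBUGGER_PATTERN.search(line[1:]):
            hits.append((path, line[1:].strip()))
    return hits


def main():
    # Look only at what is staged for this commit, without context lines.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_debugger_lines(diff)
    for path, text in hits:
        print(f"{path}: leftover debugger statement: {text}", file=sys.stderr)
    # Git aborts the commit when the hook exits with a non-zero status.
    return 1 if hits else 0

# When installed as a hook, end the file with: raise SystemExit(main())
```

Checking the staged diff rather than whole files means a binding.pry that already exists elsewhere in the repository won't block unrelated commits. Keep in mind that hooks live in the local .git directory and can be skipped with git commit --no-verify, so this is a safety net rather than a guarantee.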

I think it's important that we make mistakes, so that others don't have to make the same ones.

"The only real mistake is the one from which we learn nothing."

miniscruff

I am not a Ruby dev, but it sounds like that file should be git-ignored.

Nijeesh Joshy

It's not a file; it's just a single-line debugger statement used to set a breakpoint in your code.

Andrei Dascalu

Well, no amount of best practices will guarantee production will never break again. Even not breaking in the exact same way again is quite a goal, though it's generally doable as long as adopted practices are respected.

In the face of disaster, lessons are learned and strategies implemented. But in too many cases those strategies hold up only as long as people remember the size of the potential disaster and use it to keep off the pressure to sidestep safeguards. Otherwise, a few months after adoption, someone will find a justification to silently bypass something.

In the happy case, all hell breaks loose and you have a chance to enhance discipline after hopefully averting disaster at the last minute like James Bond. Things do improve then.

In the sad case, it works, and management praises the dodger for delivering something quickly, which simply introduces an incentive to do it more and get away with it. Eventually several hells break loose at the same time, leaving the team unable to avert them all.

Đào Tuấn

tl;dr: the last line of this post 😂

Ashish Agre

Nice post. Yes, I made such a mistake in the past. When working in a team, one thing that helps is a code review process, so we know what is being pushed into master.

prozz

whats your fav tower defence variant? btw nice post :)

Matti Bar-Zeev

"For younger readers out there, Flash was the best thing ever and we're still catching up to it"
Yes. So true.