DEV Community

Discussion on: An Engineer’s Rite of Passage

Ben Halpern

My worst production outage was accidentally adding code which redeployed the application upon boot. On this very website. 😄

I added some code in a Rails initializer file which pinged the Heroku API to change a config variable on boot. I didn't really think the whole thing through: every time you change a config variable, the app redeploys and restarts. The code was written so that it only ran this way in production, so we had not caught it earlier.

Enter the infinite loop.
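To illustrate the mechanism, here is a minimal simulation of that kind of loop. It is a sketch, not the actual dev.to code: `FakePlatform`, `boot`, `set_config`, and the config keys are all hypothetical stand-ins, and nothing here calls the real Heroku API. The only assumption taken from the story is that changing a config variable restarts the app, which re-runs the initializer. The guarded variant is one possible way to make such a write idempotent, not necessarily what was done in the real fix.

```ruby
# Hypothetical stand-in for a platform where changing a config
# variable restarts the app, which re-runs its boot-time initializer.
class FakePlatform
  attr_reader :restarts, :config

  def initialize(max_restarts: 5)
    @config = {}
    @restarts = 0
    @max_restarts = max_restarts # safety cap so the simulation terminates
  end

  # Boot the app: remember the initializer block and run it once.
  def boot(&initializer)
    @initializer = initializer
    @initializer.call(self)
  end

  # Setting a config var triggers a restart, so the initializer runs again.
  def set_config(key, value)
    @config[key] = value
    return if @restarts >= @max_restarts

    @restarts += 1
    @initializer.call(self)
  end
end

# Unguarded: every boot writes the variable, so every boot causes another boot.
looping = FakePlatform.new
looping.boot { |platform| platform.set_config("BOOTED_AT", "now") }

# Guarded: only write when the value differs, so the second boot does nothing.
guarded = FakePlatform.new
guarded.boot do |platform|
  platform.set_config("FLAG", "on") unless platform.config["FLAG"] == "on"
end

puts looping.restarts # runs until the safety cap is reached
puts guarded.restarts # stops after a single restart
```

In the simulation the unguarded initializer only stops because of the artificial cap; in production there was no such cap.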

Nothing we tried would stop the loop. The app just kept redeploying over and over again. We couldn't push new code, and we couldn't figure anything out.

status.heroku.com showed yellow, indicating something was going on with the system. That was because of me.

Eventually we figured out we could stop the problem by revoking my account's privileges within the app on Heroku. But shortly after that, Heroku suspended our whole organization account. dev.to was no longer being served.

We got some people on the phone and got the account restored and back online soon enough after that.

That was a day of learning.

Molly Struve (she/her)

Great story! Thanks for sharing, @ben! That was some innovative problem solving to revoke your account privileges to fix the issue. I always marvel at how innovative our team gets with solutions when our backs are against the wall. Feels like the pressure really makes us think outside the box to get things done.

Dan Silcox

Wow that's a good shout, changing the permissions - kinda the closest you have to ripping out ethernet cables as 'the hackers get closer' :D