DEV Community

Ben Halpern

Tell me about a time you messed up

So I brought down the site for a little while this morning. Now I'm interested in hearing about when you messed everything up!

Top comments (56)

Jason R Tibbetts

I forked a GitHub repo my first week at work, then had to delete it when I realized I'd forked the wrong one. Well, despite GitHub's repeated warnings, I managed to delete the source repo, not the fork. There were at least 6 open PRs in active development against it, and nobody had a full local clone that we could use to restore it. Thankfully, GitHub support was incredibly helpful and restored it.

The worst part is that some sympathetic coworkers humorously explained that this happens to all GitHub n00bs at some point. The problem was that I'd been using GH for at least 5 years at that point, and I should have known better.

Judith

You are not alone, Jason! I have ten years of experience and LAST WEEK I merged a big commit without realizing another feature had been finished and merged first, and it conflicted with my code. Thank God for the merge tool or it would have been a sh*t show! Instead I walked away with my tail between my legs and a chance to fix my code. Moral of the story: this is why we have versioning tools - we are all human and will make mistakes 👍😀

Aivan Monceller

That's a good thing to know, that GitHub has backups.

Jason R Tibbetts

I wouldn’t rely on that option, though. We’re a big, visible company. Your results may vary.

savan kaneriya

Git should give you a pep talk before you try anything in CLI mode.

JeFFBlanco • Edited

Don't sweat it! Software wouldn't be software without bugs and some outages. It happens 😃

My most heinous incident involved multiple threads, hitting one shared API connection and, in turn, criss-crossing customer data 😳

Todd Stark II

Last night I forgot to stop all my docker containers before running yum update. Now all of my containers are corrupted. Yay!

Max Antonucci

When starting my current job, I was doing my first git rebase. But I didn't fully understand the command and wound up rebasing onto the wrong branch. So my branch had dozens of extra commits that wound up getting pushed to master.

Thankfully the changes were reverted fast, but it also means I didn't see how spectacularly I screwed up the main site.

Pabi Moloi, but Forbes

Almost experienced that while learning to rebase. Fortunately, it got resolved quickly.

Judith

Rebase should come with a consumer warning label 😂😂😂

Micah Riggan • Edited

Once I used a PassThrough stream instead of an EventEmitter. Apparently PassThrough streams retain some state as the data goes through them, and so eventually it caused a memory leak.

This memory leak was in a multi-machine process, which led to two processes thinking they were responsible for updating the database.

That caused mongo queries from the affected nodes to randomly execute after about an hour of being slammed from lack of memory.

Eventually the affected node would restart and start to function normally. Which was worse, because it allowed this problem to go unnoticed, until some users started reporting duplicate or corrupted data.

Stuff of nightmares.

Ian Knighton

I was working on writing a shell script to delete some files that were installed in the Applications directory on a Mac. In my mind, I had run it through the command line so many times that I had just forgotten to fully write out the path.

So I finish up, make it fully executable, and bam! It erases my entire Applications folder.

Thank goodness for Time Machine.
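A cheap guard helps with this class of mistake: refuse to delete anything that isn't inside a directory you explicitly allow, so an empty or half-written path fails loudly instead of wiping a system folder. A minimal sketch of that habit in Python (the `safe_delete` helper and the sandbox directory are made up for illustration, not Ian's actual script):

```python
import shutil
from pathlib import Path

def safe_delete(target, allowed_parent):
    """Delete `target` only if it lives strictly inside `allowed_parent`.

    A half-written or empty path resolves to somewhere outside the
    sandbox, so the guard refuses rather than recursively deleting it.
    """
    p = Path(target).resolve()
    root = Path(allowed_parent).resolve()
    if root not in p.parents:
        return False  # refuse: target is outside the allowed directory
    shutil.rmtree(p)
    return True
```

The same idea works in a plain shell script: resolve the path and compare it against an expected prefix before ever passing it to rm -rf.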

Kevin McKenna • Edited

The one that springs to mind for me was a quick-and-dirty update against a single entry in the database: check the SQL, run the code... 1,000,000+ entries updated.

I had the WHERE clause on its own line and somehow had the rest highlighted when I hit F5 to run it.

The panic and 'Oh nooooooooo!' was horrifying.

Luckily I had my transaction log backups in place and was able to get all but about the most recent 5 minutes worth of data back in quickly, and very thankfully before most of the end users had logged in for the day. Make a habit of checking your backups!
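One habit that catches this before the panic: run the statement inside a transaction, check the affected row count, and only commit if it matches what you expected. A minimal sketch of the idea, using Python's sqlite3 for illustration (the `cautious_update` helper and the products table are invented; the same pattern works in SQL Server with BEGIN TRAN, @@ROWCOUNT, and ROLLBACK):

```python
import sqlite3

def cautious_update(conn, sql, params, expected):
    """Run an UPDATE, but roll back unless it changed exactly `expected` rows."""
    cur = conn.execute(sql, params)  # runs inside an implicit transaction
    if cur.rowcount != expected:
        conn.rollback()  # the WHERE clause didn't do what we thought
    else:
        conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, 9.99), (2, 19.99), (3, 29.99)])
conn.commit()

# Forgot the WHERE clause: 3 rows would change, so nothing is committed.
cautious_update(conn, "UPDATE products SET price = 0", (), expected=1)
```

It doesn't replace backups, but it turns "Oh nooooooooo!" into a rollback and a second look at the highlighted SQL.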

Alan Hylands

Sounds VERY familiar. Lost count of the number of "heart dropping through the floor" moments I've had over the years doing something similar :-D

Sam Myers

Fat-fingered a command and failed to read the diff carefully. Wiped out the authentication system for the Kubernetes cluster my team had spent weeks building.

Fortunately, everything was in Terraform, so we only lost a few hours of work and gained a lot of confidence in the reliability of what we were working on.

The experience made me really sit down and think about the nature of professional integrity. The incident occurred near the end of my work-day and I wanted to just pretend nothing had happened. Maybe my login just so happened to stop working at the exact same time I ran something stupid... It would have been so easy.

But I was the only one who had the logs to figure out how I'd screwed up. I made a full write-up on what I had done, what recovery steps I had attempted, and what our options were. I felt terrible about it and apologized to my coworkers and the client. No one was upset; shit happens.

Judith

Wow! Great work! If I were your boss I’d promote you for your work ethic alone; not to mention your honesty and courage. πŸ‘

Karl N. Redman • Edited

I had just started as a Perl developer for Wolfram Research when Wolfram|Alpha (a sort of curated search engine) was launching. Because I had developed and automated their weather updates, I was included in their launch team, to be broadcast on Twitch. However, once we were nearly ready to launch, I discovered that none of the developers had wifi connectivity in the building we were launching from (it was a development launch, and none of the developers could develop), so I pressed the admin team to get the wifi working.

My insistence caught the notice of Stephen Wolfram and the Director of Development. The Director, Peter O., asked me to oversee any and all potential hacks (DDoS, etc.) that might try to interfere with our launch. I reluctantly took on the position for the launch (it was a 27-hour day in total).

Smack dab in the middle of the launch, Stephen Wolfram was streaming our live network traffic numbers when I realized that we were getting hit with an extreme number of search queries from a (seemingly) foreign range of IP addresses that could start bringing down our in-house cluster. I hit the panic button and shut down all network traffic until I could block those IPs at our router.

As it turns out, the web group had been running tests from an outside cluster to drive up our numbers during the presentation. And no one had told me... So, Stephen Wolfram is standing there, showing the numbers, and there's a huge drop in traffic... derp, that was me.

I turned everything back on in about 30 seconds but it didn't go unnoticed.

Karl N. Redman

Stephen Wolfram video where the crash happens: the launch video is no longer available. The crash acknowledgement happens at 10:15.

derp.

I think, at the time, 3 million people saw this happen.....

Sam Leibowitz

Beautiful. :D

In a former life when I was doing network administration for a university's school of business, I got a phone call from another network admin demanding that I give him the contact information connected to a specific IP address that was "attacking" their system. He figured that a university kid was trying to brute force a password to their FTP service.

Turns out that it was a company being hosted by our business incubator. The angry admin's company had hired them to redo their website, so they were trying to FTP into the web server. Only the password was wrong, and their stupid software kept trying to reconnect.

That guy threatened to sic his lawyers on me. I told my boss, and to his credit, he shrugged and said, "that's fine. Here's the contact info for our lawyers. You can tell him to give it to his lawyers."

Markus Siering

Three weeks into my new job, I deleted the marketing website sidebar, including various signup widgets. I wasn't aware it was used on every page, didn't need it on the one I was editing, and went "Nah, let's throw this out!" There was no undo function for this. Luckily, a colleague noticed quite fast and was able to insert the content again quickly 😅

Kim Davis

I once took a good portion of a large site down for over 24 hours by clearing the whole site's cache. So, it could've been worse ;).

BjΓΆrn Grunde

I was working on a fairly easy and basic feature of an economy system, and the only tricky part was a really old legacy module that had two functions with really similar, bad names, something like cr_EDC_use_ETM20 and cr_EDC_use_ET20. My IDE autocompleted the wrong function, and I managed to send several invoices to over 60k clients. Invoices with an infinite amount to pay. Luckily we had rollback systems, so clients never noticed. Later our old-school legacy programmer refactored the module with names that made sense to the younger generation :P

JReca

I manually updated the store database with a SQL file. I messed up the field order and basically had stock and price reversed. For a few minutes we had a lot of very cheap products where we really had a few expensive ones. At least no one bought anything.

Kevin McKenna

The worst part about that kind of mistake is the sudden 'oh no!' that sweeps over you as you realize the implications of what you've done.

Dian Fay

Last week I set up a load balancer to automatically forward http to https for a particular application. That part went fine. What didn't is that we use Slack's slash commands extensively to interact with the application, and I forgot I'd set those up before I'd even gotten an SSL cert. Slack does not like getting a 301 response. They were all broken for hours for our people in Europe before I woke up and figured out what had happened.

Mark O' Donnell • Edited

In my first job I managed a firewall configuration app for a large Telco. It allowed Fortune 500 companies to make firewall changes on their network. I can't remember the exact problem, but long story short, I changed something which stopped anyone from logging in at all. No one could make any changes for 2 days until we found the issue. 😳

This was particularly bad because it affected bonuses that year, whoops! 😂😂