DEV Community

Ben Halpern

Tell me about a time you messed up

So I brought down the site for a little while this morning. Now I'm interested in hearing about when you messed everything up!

Top comments (56)

Jason R Tibbetts

I forked a GitHub repo my first week at work, then had to delete it when I realized I'd forked the wrong one. Well, despite GitHub's repeated warnings, I managed to delete the source repo, not the fork. There were at least 6 open PRs in active development against it, and nobody had a full local clone that we could use to restore it. Thankfully, GitHub support was incredibly helpful and restored it.

The worst part is that some sympathetic coworkers humorously explained that this happens to all GitHub n00bs at some point. The problem was that I'd been using GH for at least 5 years at that point, and I should have known better.

Judith

You are not alone, Jason! I have ten years of experience and LAST WEEK I merged a big commit without realizing another feature had been finished and merged first, and it conflicted with my code. Thank God for the merge tool or it would have been a sh*t show! Instead I walked away with my tail between my legs and a chance to fix my code. Moral of the story: this is why we have versioning tools - we are all human and will make mistakes 👍😀

Aivan Monceller

That's a good thing to know, that GitHub has backups.

Jason R Tibbetts

I wouldn’t rely on that option, though. We’re a big, visible company. Your results may vary.

savan kaneriya

Git should give you a pep talk before you try anything in CLI mode.

JeFFBlanco • Edited

Don't sweat it! Software wouldn't be software without bugs and some outages. It happens 😃

My most heinous incident involved multiple threads, hitting one shared API connection and, in turn, criss-crossing customer data 😳

Todd Stark II

Last night I forgot to stop all my docker containers before running yum update. Now all of my containers are corrupted. Yay!

Max Antonucci

When starting my current job, I was doing my first git rebase. But I didn't fully understand the command and wound up rebasing onto the wrong branch. So my branch had dozens of extra commits that wound up getting pushed to master.

Thankfully the changes were reverted fast, but it also means I didn't see how spectacularly I screwed up the main site.

Pabi Moloi, but Forbes

Almost experienced that while learning to rebase. Fortunately, it got resolved quickly.

Judith

Rebase should come with a consumer warning label 😂😂😂

Micah Riggan • Edited

Once I used a PassThrough stream instead of an EventEmitter. Apparently PassThrough streams retain some state as the data goes through them, and so eventually it caused a memory leak.

This memory leak was in a multi-machine process, which led to two processes thinking they were responsible for updating the database.

That caused mongo queries from the affected nodes to randomly execute after about an hour of being slammed from lack of memory.

Eventually the affected node would restart and start to function normally. Which was worse, because it allowed this problem to go unnoticed, until some users started reporting duplicate or corrupted data.

Stuff of nightmares.

Ian Knighton

I was working on writing a shell script to delete some files that were installed in the Applications directory on a Mac. In my mind, I had run it through the command line so many times that I had just forgotten to fully write out the path.

So I finish up, make it fully executable, and bam! It erases my entire Applications folder.

Thank goodness for Time Machine.
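A cheap guard helps with this class of mistake: refuse to delete anything that isn't inside a directory you explicitly allow, so an empty or half-written path fails loudly instead of wiping a system folder. A minimal sketch of that habit in Python (the `safe_delete` helper and the sandbox directory are made up for illustration, not Ian's actual script):

```python
import shutil
from pathlib import Path

def safe_delete(target, allowed_parent):
    """Delete `target` only if it lives strictly inside `allowed_parent`.

    A half-written or empty path resolves to somewhere outside the
    sandbox, so the guard refuses rather than recursively deleting it.
    """
    p = Path(target).resolve()
    root = Path(allowed_parent).resolve()
    if root not in p.parents:
        return False  # refuse: target is outside the allowed directory
    shutil.rmtree(p)
    return True
```

The same idea works in a plain shell script: resolve the path and compare it against an expected prefix before ever passing it to rm -rf.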

Kevin McKenna • Edited

The one that springs to mind for me was a quick-and-dirty update against a single entry in the database: check the SQL, run the code... 1,000,000+ entries updated.

I had the WHERE clause on its own line and somehow had the rest highlighted when I hit F5 to run it.

The panic and 'Oh nooooooooo!' was horrifying.

Luckily I had my transaction log backups in place and was able to get all but about the most recent 5 minutes worth of data back in quickly, and very thankfully before most of the end users had logged in for the day. Make a habit of checking your backups!
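One habit that catches this before the panic: run the statement inside a transaction, check the affected row count, and only commit if it matches what you expected. A minimal sketch of the idea, using Python's sqlite3 for illustration (the `cautious_update` helper and the products table are invented; the same pattern works in SQL Server with BEGIN TRAN, @@ROWCOUNT, and ROLLBACK):

```python
import sqlite3

def cautious_update(conn, sql, params, expected):
    """Run an UPDATE, but roll back unless it changed exactly `expected` rows."""
    cur = conn.execute(sql, params)  # runs inside an implicit transaction
    if cur.rowcount != expected:
        conn.rollback()  # the WHERE clause didn't do what we thought
    else:
        conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, 9.99), (2, 19.99), (3, 29.99)])
conn.commit()

# Forgot the WHERE clause: 3 rows would change, so nothing is committed.
cautious_update(conn, "UPDATE products SET price = 0", (), expected=1)
```

It doesn't replace backups, but it turns "Oh nooooooooo!" into a rollback and a second look at the highlighted SQL.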

Alan Hylands

Sounds VERY familiar. Lost count of the number of "heart dropping through the floor" moments I've had over the years doing something similar :-D

Sam Myers

Fat-fingered a command and failed to read the diff carefully. Wiped out the authentication system for the Kubernetes cluster my team had spent weeks building.

Fortunately, everything was in Terraform, so we only lost a few hours of work and gained a lot of confidence in the reliability of what we were working on.

The experience made me really sit down and think about the nature of professional integrity. The incident occurred near the end of my work-day and I wanted to just pretend nothing had happened. Maybe my login just so happened to stop working at the exact same time I ran something stupid... It would have been so easy.

But I was the only one who had the logs to figure out how I'd screwed up. I made a full write-up on what I had done, what recovery steps I had attempted, and what our options were. I felt terrible about it and apologized to my coworkers and the client. No one was upset; shit happens.

Judith

Wow! Great work! If I were your boss I’d promote you for your work ethic alone; not to mention your honesty and courage. πŸ‘

Karl N. Redman • Edited

I had just started as a Perl developer for Wolfram Research when Wolfram|Alpha (a sort of curated search engine) was launching. Because I had developed and automated their weather updates, I was included in their launch team, to be broadcast on Twitch. However, once we were nearly ready to launch, I discovered that none of the developers had wifi connectivity in the building we were launching from (it was a development launch, and none of the developers could develop), so I pressed the admin team to get the wifi working.

My insistence caught the notice of Stephen Wolfram and the Director of Development. The Director, Peter O., asked me to oversee any and all potential hacks (DDoS, etc.) that might try to interfere with our launch. I reluctantly took on the position for the launch (it was a 27-hour day in total).

Smack dab in the middle of the launch, Stephen Wolfram was streaming our live network traffic numbers when I realized that we were getting hit with an extreme number of search queries from a (seemingly) foreign range of IP addresses that could start bringing down our in-house cluster. I hit the panic button and shut down all network traffic until I could block those IPs at our router.

As it turns out, the web group had been running tests from an outside cluster to drive up our numbers during the presentation. And no one had told me... So, Stephen Wolfram is standing there, showing the numbers, and there's a huge drop in traffic... derp, that was me.

I turned everything back on in about 30 seconds but it didn't go unnoticed.

Karl N. Redman

Stephen Wolfram video where the crash happens: the launch video is no longer available. The crash acknowledgement happens at 10:15.

derp.

I think, at the time, 3 million people saw this happen.....

Sam Leibowitz

Beautiful. :D

In a former life when I was doing network administration for a university's school of business, I got a phone call from another network admin demanding that I give him the contact information connected to a specific IP address that was "attacking" their system. He figured that a university kid was trying to brute force a password to their FTP service.

Turns out that it was a company being hosted by our business incubator. The angry admin's company had hired them to redo their website, so they were trying to FTP into the web server. Only the password was wrong, and their stupid software kept trying to reconnect.

That guy threatened to sic his lawyers on me. I told my boss, and to his credit, he shrugged and said, "that's fine. Here's the contact info for our lawyers. You can tell him to give it to his lawyers."

Markus Siering

Three weeks into my new job, I deleted the marketing website sidebar, including various signup widgets. I wasn't aware it was used on every page, didn't need it on the one I was editing, and went "Nah, let's throw this out!" There was no undo function for this. Luckily, a colleague noticed quite fast and was able to insert the content again quickly 😅

Kim Davis

I once took a good portion of a large site down for over 24 hours by clearing the whole site's cache. So, it could've been worse ;).

BjΓΆrn Grunde

I was working on a fairly easy and basic feature of an economy system, and the only tricky part was a really old legacy module that had two functions with really similar, bad names, something like cr_EDC_use_ETM20 and cr_EDC_use_ET20. My IDE autocompleted the wrong function, and I managed to send several invoices to over 60k clients. Invoices with an infinite amount to pay. Luckily we had rollback systems, so clients never noticed. Later our old-school legacy programmer refactored the module with names that made sense to the younger generation :P

JReca

I manually updated the store database with a SQL file. I messed up the field order and basically had stock and price reversed. For a few minutes we had a lot of very cheap products where we really had a few expensive ones. At least no one bought anything.

Kevin McKenna

The worst part about that kind of mistake is the sudden 'oh no!' that sweeps over you as you realize the implications of what you've done.

Dian Fay

Last week I set up a load balancer to automatically forward http to https for a particular application. That part went fine. What didn't is that we use Slack's slash commands extensively to interact with the application, and I forgot I'd set those up before I'd even gotten an SSL cert. Slack does not like getting a 301 response. They were all broken for hours for our people in Europe before I woke up and figured out what had happened.

Mark O' Donnell • Edited

In my first job I managed a firewall configuration app for a large Telco. It allowed Fortune 500 companies to make firewall changes on their network. I can't remember the exact problem, but long story short, I changed something which stopped anyone from logging in at all. No one could make any changes for 2 days until we found the issue. 😳

This was particularly bad because it affected bonuses that year, whoops! 😂😂