DEV Community

Ben Halpern

Tell me about a time you messed up

So I brought down the site for a little while this morning. Now I'm interested in hearing about when you messed everything up!

Oldest comments (56)

Max Antonucci

When starting my current job, I was doing my first git rebase. I didn't understand the command and wound up rebasing off the wrong branch, so my branch had dozens of extra commits that wound up getting pushed to master.

Thankfully the changes were reverted fast, but it also meant I didn't get to see how spectacularly I'd screwed up the main site.
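For anyone who lands in the same spot: the pre-rebase commits aren't gone. A throwaway sketch (temporary repo under /tmp, all names made up) of backing out of a rebase before it's pushed; git parks the old branch tip in ORIG_HEAD and in the reflog:

```shell
# Build a tiny repo where 'feature' gets rebased onto 'main'.
set -eu
rm -rf /tmp/rebase-demo && mkdir /tmp/rebase-demo && cd /tmp/rebase-demo
git init -q -b main .
git config user.email demo@example.com
git config user.name demo
echo base > file && git add file && git commit -qm "base"
git checkout -q -b feature
echo feature >> file && git commit -qam "feature work"
git checkout -q main
echo other > other && git add other && git commit -qm "main moved on"
git checkout -q feature

git rebase -q main              # pretend main was the *wrong* base branch
git reset -q --hard ORIG_HEAD   # undo: jump back to the pre-rebase tip
git log -1 --pretty=%s          # prints "feature work": the original commit is back
```

Once the rebased commits are pushed, though, you're into revert territory, as this story found out.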

Pabi Moloi, but Forbes

Almost experienced that while learning to rebase. Fortunately, it got resolved quickly.

Judith

Rebase should come with a consumer warning label 😂😂😂

Desi

I don’t have an example (yet) but I was thinking how this goof could help others - especially newbies/beginners - realize they are definitely not alone when their first outage happens!

Markus Siering

Three weeks into my new job, I deleted the marketing website's sidebar, including various signup widgets. I wasn't aware it was used on every page, didn't need it on the one I was editing, and went "Nah, let's throw this out!" There was no undo function for this. Luckily, a colleague noticed quite fast and was able to reinsert the content quickly 😅

Todd Stark II

Last night I forgot to stop all my docker containers before running yum update. Now all of my containers are corrupted. Yay!

JeFFBlanco

Don't sweat it! Software wouldn't be software without bugs and some outages. It happens 😃

My most heinous incident involved multiple threads, hitting one shared API connection and, in turn, criss-crossing customer data 😳

Jason R Tibbetts

I forked a GitHub repo my first week at work, then had to delete it when I realized I'd forked the wrong one. Well, despite GitHub's repeated warnings, I managed to delete the source repo, not the fork. There were at least 6 open PRs in active development against it, and nobody had a full local clone that we could use to restore it. Thankfully, GitHub support was incredibly helpful and restored it.

The worst part is that some sympathetic coworkers humorously explained that this happens to all GitHub n00bs at some point. The problem was that I'd been using GH for at least 5 years at that point, and I should have known better.

Aivan Monceller

Good to know that GitHub has backups!

Jason R Tibbetts

I wouldn’t rely on that option, though. We’re a big, visible company. Your results may vary.

Judith

You are not alone, Jason! I have ten years of experience, and LAST WEEK I merged a big commit without realizing another feature had been finished and merged first, and it conflicted with my code. Thank God for the merge tool, or it would have been a sh*t show! Instead I walked away with my tail between my legs and a chance to fix my code. Moral of the story: this is why we have versioning tools. We are all human and will make mistakes 👍😀
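For the record, the reason the merge tool even got its chance: git refuses to complete a merge with conflicting changes instead of silently picking a side. A toy demo (throwaway repo path assumed):

```shell
# Two branches edit the same line of the same file.
set -u
rm -rf /tmp/merge-demo && mkdir /tmp/merge-demo && cd /tmp/merge-demo
git init -q -b main .
git config user.email demo@example.com
git config user.name demo
echo one > file && git add file && git commit -qm "base"
git checkout -q -b feature
echo feature > file && git commit -qam "feature change"
git checkout -q main
echo main > file && git commit -qam "main change"

# The conflicting merge stops with a non-zero exit code.
if ! git merge -q feature >/dev/null 2>&1; then
  echo "conflict detected"   # this is the point where a mergetool steps in
  git merge --abort          # back out cleanly; working tree returns to main
fi
cat file                     # prints "main": nothing was clobbered
```

Whether you then resolve with a mergetool or abort and rethink, the point is that git made you stop and look first.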

savan kaneriya

Git should give you a pep talk before you try anything in CLI mode.

Micah Riggan

Once I used a PassThrough stream instead of an EventEmitter. PassThrough streams retain state as data goes through them (written data is buffered until something reads it), so eventually it caused a memory leak.

This memory leak was in a multi-machine process, which led to two processes thinking they were responsible for updating the database.

That caused mongo queries from the affected nodes to randomly execute after about an hour of being slammed from lack of memory.

Eventually the affected node would restart and start to function normally. Which was worse, because it allowed this problem to go unnoticed, until some users started reporting duplicate or corrupted data.

Stuff of nightmares.

Ian Knighton

I was working on a shell script to delete some files installed in the Applications directory on a Mac. I had run the commands on the command line so many times that I'd forgotten to fully write out the path in the script.

So I finished up, made it executable, and bam! It erased my entire Applications folder.

Thank goodness for Time Machine.
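A hedged sketch of the guard such a script could have used: `set -u` turns an unset variable into a hard error instead of an empty string (so `rm -rf "$DIR/"*` can't silently become `rm -rf /*`), and a path check refuses anything unexpected. All paths here are made up:

```shell
set -eu   # -e: stop on errors; -u: unset variables are fatal, not ""
APP_DIR=/tmp/rm-demo/MyApp.app        # stand-in for the real install path
mkdir -p "$APP_DIR"

# Refuse anything outside the one directory this script is meant to touch.
case $APP_DIR in
  /tmp/rm-demo/*) ;;                                 # expected location: ok
  *) echo "refusing to delete: $APP_DIR" >&2; exit 1 ;;
esac

rm -rf "$APP_DIR"
echo "deleted $APP_DIR"
```

Time Machine is still the real safety net; this just makes the script fail loudly instead of deleting from the wrong root.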

Nucu Labs

Today I forgot to turn off a service that was indexing data into Elasticsearch, and I ran a command on my machine that was also indexing data, which put the system into a corrupt state. I had to turn off the service, re-import the data into Elasticsearch from Mongo, and then re-index everything. Luckily it didn't take that long.

Dian Fay

Last week I set up a load balancer to automatically forward http to https for a particular application. That part went fine. What didn't is that we use Slack's slash commands extensively to interact with the application, and I forgot I'd set those up before I'd even gotten an SSL cert. Slack does not like getting a 301 response. They were all broken for hours for our people in Europe before I woke up and figured out what had happened.

Phil Ashby

Relevant tweet thread for those who do: twitter.com/ScribblingOn/status/11...

Kim Davis

I once took a good portion of a large site down for over 24 hours by clearing the whole site's cache. So, it could've been worse ;).

Steve Boyd

I once took down the stock ordering service of the company I was working for at the time, and I couldn't roll back the changes because backups were manual and I'd taken an old copy from the wrong server. It took half a day for me and two other developers to figure out what had gone wrong and how to fix it. It didn't help that we were required to do deployments at 3 am.

Needless to say, I've learned to triple check the deployment plans before I'm sleep deprived.

kd2718

The site I was working on for a real estate company was having issues when users requested all photos for a house. The server would fetch all the photos and add them to a zip for download. This was a bad idea from the start as there would often be over 100 high quality images.

The worst part was that this was a synchronous task: users would stare at a blank browser until the request finished or timed out. They had me make it async with Celery (Python). Even though the whole process was bad, we settled on this solution as a "quick fix", since Celery was already used in other parts of the site.

I made the changes and deployed them near the end of the day. It worked fine. The next morning I was woken up by an emergency phone call. Most of the site was no longer working.

I had forgotten to disable the download button while a zip job was in the queue. Apparently people were mashing the button expecting the old behavior, and a massive number of jobs backed up in the Celery queue. Anything else on the site using Celery was basically broken.

I had to use git to roll back to the previous version of the site. I felt horrible. I was chewed out by the owners and told that they lost "millions". I guess it wasn't that bad, because they kept using us for a while after that...

JReca

I manually updated the store database with a SQL file. I messed up the field order and basically had stock and price swapped. For a few minutes we had a lot of very cheap products, when we really had a few expensive ones. At least no one bought anything.

Kevin McKenna

The worst part about that kind of mistake is the sudden "oh no!" that sweeps over you as you realize the implications.