So I brought down the site for a little while this morning. Now I'm interested in hearing about when you messed everything up!
Oldest comments (56)
When starting my current job, I was doing my first git rebase. But I didn't understand the command and wound up rebasing off the wrong branch. So my branch had dozens of extra commits that wound up getting pushed to master.
Thankfully the changes were reverted fast, but it also means I didn't see how spectacularly I screwed up the main site.
Almost experienced that while learning to rebase. Fortunately it got resolved quickly.
Rebase should have a consumer warning label!
I don't have an example (yet) but I was thinking how this goof could help others - especially newbies/beginners - realize they are definitely not alone when their first outage happens!
Three weeks into my new job, I deleted the marketing website sidebar including various signup widgets. Was not aware of it being used on every page, did not need it on the one I was editing and went "Nah, let's throw this out!" There was no undo function for this. Luckily, a colleague noticed quite fast and she was able to insert the content again quickly.
Last night I forgot to stop all my Docker containers before running `yum update`. Now all of my containers are corrupted. Yay!

Don't sweat it! Software wouldn't be software without bugs and some outages. It happens!
My most heinous incident involved multiple threads hitting one shared API connection and, in turn, criss-crossing customer data.
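The usual cure is either a lock around the shared connection or one connection per thread. A minimal Python sketch of the per-thread approach (all names are made up for illustration) using `threading.local`, which hands each thread its own copy of an attribute:

```python
import threading

# Hypothetical per-thread "connection" store: each thread lazily creates
# its own connection instead of sharing a single one.
_local = threading.local()

def get_connection():
    # The attribute only exists on the calling thread, so threads can
    # never see (or criss-cross) each other's connection state.
    if not hasattr(_local, "conn"):
        _local.conn = {"owner": threading.current_thread().name}
    return _local.conn

results = {}

def worker():
    conn = get_connection()
    results[threading.current_thread().name] = conn["owner"]

threads = [threading.Thread(target=worker, name=f"t{i}") for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread saw its own connection, not a neighbour's.
print(results)
```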
I forked a GitHub repo my first week at work, then had to delete it when I realized I'd forked the wrong one. Well, despite GitHub's repeated warnings, I managed to delete the source repo, not the fork. There were at least 6 open PRs in active development against it, and nobody had a full local clone that we could use to restore it. Thankfully, GitHub support was incredibly helpful and restored it.
The worst part is that some sympathetic coworkers humorously explained that this happens to all GitHub n00bs at some point. The problem was that I'd been using GH for at least 5 years at that point, and I should have known better.
Good to know that GitHub has backups!
I wouldn't rely on that option, though. We're a big, visible company. Your results may vary.
You are not alone, Jason! I have ten years of experience and LAST WEEK I merged a big commit, not realizing another feature had been finished and merged first that conflicted with my code. Thank God for the merge tool or it would have been a sh*t show! Instead I walked away with my tail between my legs and a chance to fix my code. Moral of the story: it's why we have versioning tools. We are all human and will make mistakes.
Git should give you a pep talk before you try anything in CLI mode.
Once I used a PassThrough stream instead of an EventEmitter. Apparently PassThrough streams retain some state as the data goes through them, and so eventually it caused a memory leak.
This memory leak was in a multi-machine process, which led to two processes thinking they were responsible for updating the database.
That caused mongo queries from the affected nodes to randomly execute after about an hour of being slammed from lack of memory.
Eventually the affected node would restart and start to function normally again. That was actually worse, because it allowed the problem to go unnoticed until some users started reporting duplicate or corrupted data.
Stuff of nightmares.
I was working on writing a shell script to delete some files that were installed in the `Applications` directory on a Mac. In my mind, I had run it through the command line so many times, but I had just forgotten to fully write out the path. So I finish up, make it fully executable, and bam! It erases my entire Applications folder.
Thank goodness for Time Machine.
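A path guard is cheap insurance in any delete script. A rough Python sketch (the `safe_rmtree` helper and the paths are invented for illustration) that refuses to delete anything outside a sandbox directory:

```python
import os
import shutil
import tempfile

def safe_rmtree(target, allowed_root):
    """Hypothetical guard: only delete paths strictly inside allowed_root."""
    target = os.path.realpath(target)
    allowed_root = os.path.realpath(allowed_root)
    # If target is not under allowed_root, their common path differs.
    if os.path.commonpath([target, allowed_root]) != allowed_root:
        raise ValueError(f"refusing to touch {target}: outside {allowed_root}")
    if target == allowed_root:
        raise ValueError("refusing to delete the allowed root itself")
    shutil.rmtree(target)

# Demo: deleting inside the sandbox works, escaping it is blocked.
root = tempfile.mkdtemp()
victim = os.path.join(root, "old-install")
os.mkdir(victim)
safe_rmtree(victim, root)       # fine: inside the sandbox

try:
    safe_rmtree("/", root)      # the half-written-path disaster
    blocked = False
except ValueError:
    blocked = True
```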
Today I forgot to turn off a service that was indexing data in ElasticSearch, and I ran a command on my machine that was also indexing data, causing the system to go into a corrupt state. I had to turn off the service, re-import the data into ElasticSearch from Mongo, and then re-index all the data. Luckily it didn't take that long.
Last week I set up a load balancer to automatically forward HTTP to HTTPS for a particular application. That part went fine. What didn't go fine: we use Slack's slash commands extensively to interact with the application, and I forgot I'd set those up before I'd even gotten an SSL cert. Slack does not like getting a 301 response. They were all broken for hours for our people in Europe before I woke up and figured out what had happened.
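For anyone curious what the webhook sender actually sees: here's a toy Python reproduction (the redirect target and paths are made up), where a redirect-blind client POSTs to an endpoint that now answers with a 301 instead of reaching the application:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the load balancer: redirect every POST to HTTPS.
class RedirectHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        self.send_response(301)
        self.send_header("Location", "https://example.test" + self.path)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client does not follow redirects, much like a webhook sender that
# expects a 200 from the configured URL: it just sees the 301 and gives up.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("POST", "/slash-command", body="text=hello")
status = conn.getresponse().status
print(status)  # 301 - the POST never reaches the application
server.shutdown()
```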
Relevant tweet thread for those who do: twitter.com/ScribblingOn/status/11...
I once took a good portion of a large site down for over 24 hours by clearing the whole site's cache. So, it could've been worse ;).
I once took down a stock ordering service of the company I was working for at the time, and couldn't roll back the changes because backups were manual and I'd taken an old copy from the wrong server. It took half a day for me and 2 other developers to figure out what had gone wrong and how to fix it. It didn't help that we were required to do them at 3 am.
Needless to say, I've learned to triple check the deployment plans before I'm sleep deprived.
The site I was working on for a real estate company was having issues when users requested all photos for a house. The server would fetch all the photos and add them to a zip for download. This was a bad idea from the start as there would often be over 100 high quality images.
The worst part was this was a synchronous task. The users would stare at a blank browser until the request finished or timed out. They had me make this async with celery (python). Even though the whole process was bad, we settled on this solution as a "quick fix". Celery was already used in other parts of the site.
I made the changes and deployed them near the end of the day. It worked fine. The next morning I was woken up by an emergency phone call. Most of the site was no longer working.
I had forgotten to disable the download button while a zip job was in the queue. Apparently people were mashing the button expecting the old behavior, and a massive number of jobs backed up in the celery queue. Anything else using celery was basically broken on the site.
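The missing ingredient was deduplication at enqueue time. A rough Python sketch of the idea (the function names and listing IDs are invented), tracking in-flight jobs so repeated clicks get dropped instead of flooding the queue:

```python
# Hypothetical guard against button-mashing: only enqueue a zip job if an
# identical one isn't already pending.
pending = set()
queue = []  # stand-in for the real task queue

def enqueue_zip_job(listing_id):
    if listing_id in pending:
        return False  # same job already queued; ignore the repeat click
    pending.add(listing_id)
    queue.append(listing_id)
    return True

def finish_job(listing_id):
    pending.discard(listing_id)  # allow a fresh request later

assert enqueue_zip_job("house-42") is True
assert enqueue_zip_job("house-42") is False  # mashed button, dropped
finish_job("house-42")
assert enqueue_zip_job("house-42") is True   # allowed again once finished
```

The same dedupe key can also drive the UI, greying out the download button while its job is pending.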
I had to use git to roll back to the previous version of the site. I felt horrible. I was chewed out by the owners and told that they lost "millions". I guess it wasn't that bad because they kept using us for a while after that...
I manually updated the store database with a SQL file. I messed up the field order and basically had stock and price reversed. For a few minutes we had a lot of very cheap products where we really had a few expensive ones. At least no one bought anything.
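Naming the columns in the SQL statement is the usual safeguard against exactly this. A small Python/sqlite3 sketch (table and values invented) contrasting the fragile and the safe style:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL, stock INTEGER)")

# Fragile: relies on remembering that price comes before stock.
conn.execute("INSERT INTO products VALUES (?, ?, ?)", ("widget", 9.99, 500))

# Safer: name the columns explicitly, so listing the values in a different
# order can't silently swap price and stock in the table.
conn.execute(
    "INSERT INTO products (name, stock, price) VALUES (?, ?, ?)",
    ("gadget", 20, 149.99),
)

row = conn.execute(
    "SELECT price, stock FROM products WHERE name = 'gadget'"
).fetchone()
print(row)  # (149.99, 20)
```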
The worst part about that kind of mistake is the sudden 'oh no!' that sweeps over you as you realize the implications of the mistake