As a developer, your worst fears may include losing all your production data, servers going down, and latent errors, among other worries. However, there are several free tools you can use to prevent these nightmares from coming true. In this blog, we'll go over five nightmare situations for developers along with some free solutions they can use to prevent them from coming to life.
😱 Your Data Center Goes Down
Ideally, when planning out your systems, you want to set them up to handle any failure. Of course, there's always a chance you missed something during this initial phase. And also, mistakes happen.
Take this story from Amazon. While Amazon is a huge enterprise, it can fall victim to its own mistakes. One example is when Amazon took down S3 in its Northern Virginia (US-EAST-1) Region in 2017, along with many of the services that depend on it! According to their summary of the incident, an Amazon Simple Storage Service (S3) team member executed a command that accidentally removed a larger set of servers than intended. Oops.
If Amazon can make mistakes like this, then it's all the more important to have an outage alerting tool at the ready.
In this case, we would like to recommend The Outages Project. This open-source project tracks outages using "git-scraping."
⚰️ Losing Your Production Data
Deleting your production environment seems impossible, right? Well, think again. In this Reddit post, a Junior Developer deleted their company's production environment. The worst part: the backups of production were unusable! What a nightmare.
Today was my first day on the job as a Junior Software Developer and was my first non-internship position after university. Unfortunately i screwed up badly.
I was basically given a document detailing how to setup my local development environment. Which involves run a small script to create my own personal DB instance from some test data. After running the…
The big takeaway from this story: back up your data! Not only should you back it up, but this process should be automated.
We found these open-source projects on GitHub that could help with this if you're using MongoDB. MongoDB-backup and Mongodb-restore automate the process of backing up and restoring MongoDB data. If you're not using MongoDB or need something more robust, check out this article that reviews seven database backup tools.
Whatever you decide to go with, double-check that your backup system works.
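One cheap way to double-check a backup is to record a checksum when the backup is written and compare it again later. Here's a minimal sketch of that idea in Python. The function names (sha256_of, backup_is_intact) are made up for this example and don't come from any of the tools mentioned above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(backup_path: Path, expected_sha256: str) -> bool:
    """Check that a backup file exists, is non-empty, and matches its recorded checksum."""
    if not backup_path.is_file() or backup_path.stat().st_size == 0:
        return False
    return sha256_of(backup_path) == expected_sha256
```

You would store the result of sha256_of right after the backup job finishes, then run backup_is_intact on a schedule and alert when it returns False. Keep in mind that a matching checksum only tells you the file hasn't changed since it was written. It doesn't prove the backup can actually be restored, so a periodic test restore into a scratch database is still the stronger check.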
👻 Invisible Errors
Also referred to as latent errors, these are errors that, no matter how well-tested your code is in staging, still manage to sneak into production and sit there waiting for the right code path to be executed. Once that path runs, the error surfaces.
Here's a great example of a latent error, brought to you by this Stack Overflow post. During takeoff, the pilots of a Boeing 777 flying on autopilot received an error message saying that their plane had stalled. The pilots took back manual control and landed the aircraft safely. Upon investigation, it was discovered that the issue was a latent error within data from a failed accelerometer.
In the case of errors, whether they're active or latent, you need a reliable error monitoring tool. If you're not too familiar with error monitoring, it means using a tool that alerts you when an error occurs within your project. One tool we're fond of is Airbrake Error and Performance Monitoring (obviously since we're Airbrake, lol), which has a free developer tier.
🧠 Memory Leaks
In his article, "Understanding Memory Leaks in Programming," Pierre DeBois describes a memory leak as "a type of resources mismanagement in programming…They occur when programming objects are stored in computer memory, be it a laptop or smartphone, but then the allocated memory is not released as designed with the programming." If left unchecked, memory leaks will consume memory and lead to performance degradation.
How can you detect memory leaks? You probably already have the solution!
Chrome Developer Tools allows you to take heap snapshots and record Allocation instrumentation on timeline sessions, and Chrome reports heap metrics such as jsHeapSizeLimit, usedJSHeapSize, and totalJSHeapSize (exposed through its non-standard performance.memory object). DeBois goes through the entire process step by step in the link above.
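If you log usedJSHeapSize at regular intervals (for example, from Chrome's non-standard performance.memory object), you can run a rough check on the numbers yourself. Here's a small Python sketch of one possible heuristic: the heap naturally goes up and down as garbage collection runs, so it compares the lowest reading in the first half of the samples with the lowest reading in the second half. The function name and the 20% threshold are assumptions for illustration, not something taken from DeBois's article or from Chrome.

```python
from typing import Sequence


def heap_floor_is_rising(samples: Sequence[float], min_growth_ratio: float = 0.2) -> bool:
    """Return True when the lowest heap reading in the later half of the samples
    is at least min_growth_ratio higher than the lowest reading in the earlier half."""
    if len(samples) < 4:
        return False  # too few samples to compare two halves
    half = len(samples) // 2
    early_floor = min(samples[:half])
    late_floor = min(samples[half:])
    if early_floor <= 0:
        return False  # avoid dividing by zero on bad data
    return (late_floor - early_floor) / early_floor >= min_growth_ratio
```

A floor that keeps climbing suggests memory that is never released, which is the pattern worth digging into with a heap snapshot. Treat a True result as a hint to investigate, not as proof of a leak.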
🧟 Slow Response Time
Everyone has experienced this! You're on a website or application, and it takes FOREVER for the page to load. Slow load times are common in programming due to several factors (new commits, new features, etc.). For a full explanation of why this occurs, check out the blog post at Johny's Software Lab.
You can find out if your SaaS app is slowing down with Google Lighthouse. Lighthouse should be a staple in your monitoring because it reports performance, accessibility, and SEO metrics for your app.
Now, if you want a free tool that allows you to see which of your paths have slowed down, again, we have to recommend Airbrake Performance Monitoring because of its ability to pinpoint routes with slower response times.
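To make the idea of "which paths have slowed down" concrete, here's a small Python sketch that groups response times by route and flags the ones whose median is over a threshold. This is only an illustration of the concept, not how Airbrake or Lighthouse work internally. The function name, the (route, duration in milliseconds) input format, and the 500 ms default are all assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def slow_routes(timings: Iterable[Tuple[str, float]], threshold_ms: float = 500.0) -> Dict[str, float]:
    """Group (route, duration_ms) pairs by route and return the routes
    whose median duration exceeds threshold_ms, mapped to that median."""
    by_route: Dict[str, List[float]] = defaultdict(list)
    for route, duration_ms in timings:
        by_route[route].append(duration_ms)
    medians = {route: median(values) for route, values in by_route.items()}
    return {route: value for route, value in medians.items() if value > threshold_ms}
```

For example, given a handful of timings pulled from your request logs:

```python
timings = [
    ("/login", 120.0), ("/login", 140.0),
    ("/reports", 900.0), ("/reports", 700.0), ("/reports", 800.0),
]
print(slow_routes(timings))  # {'/reports': 800.0}
```

The median is used instead of the average so that a single unusually slow request doesn't flag an otherwise healthy route.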
😨 What's Your Nightmare?
And those are just a few of the nightmare issues developers face. We're hoping these free tools help you alleviate some of the pain points associated with servers going down, data loss, latent errors, memory leaks, and slow response times.
Enough from us, though! What about you? What nightmare situations have you encountered as a developer? How did you get through it? Let us know in the comments!
Top comments (3)
After a few years on the job the "I dropped the production database" became a meme, almost everyone had screwed that up at least once and spent a night or few days fixing it.
then you go home have a good cry and learn a new thing.
backup automation is essential, but the worst you can have is a corrupted backup when the day comes that you actually need it.
So not only backup automation but with integrity checks on top with email notification as soon as that happens, so if there is an out-of-the-box tool to run good integrity checks it's nice, otherwise you might end up with an actual setup with a db/ds instance to actively test backups as you take them.
A nice combination of sentry and n8n could do wonders to register and automate the dissemination of failures to the right people.
Another bad thing that can happen is a full-on system block due to resource shortages, even something as trivial as disk space. You can't always run on an automatically regulated cloud; sometimes content comes from components residing on-premise. Always monitor according to your needs and design for self-sustainability, with long-term plans for data retention.
nice article, we all live on the edge.
Great advice about including integrity checks! You worded it way better than I did: "the worst you can have is a corrupted backup." Better to make sure that everything is backed up correctly than to assume it is.
These sound scary.