What should a program do when it encounters errors? In my experience, a lot of programs hardly give this any thought. Usually they do the bare minimum: silently ignore bugs and maybe record that they occurred.
However, this is an important consideration in error handling. A program should behave correctly when errors occur.
In this article, we'll examine:
- possible responses to errors
- examples of error responses in common programs
- how to handle bugs in your program
- how to handle other errors in your program
Let's dive in.
There are different ways that you can respond to errors. You can:
- crash the program
- silently ignore the error
- try to recover in some way
Crashing the program is a good default option. There are multiple benefits to it.
One benefit is that the program won't be doing the wrong thing. There have been multiple situations where software doing the wrong thing has been disastrous.
This doesn't mean that crashing is good. It means that it's probably better than silently ignoring bugs. For example, it's fine for a calculator program to crash. You can just restart it and continue what you were doing. But, if a calculator has errors and produces the wrong result, that can be a big problem.
Another benefit to crashing is that it makes errors obvious. This means that you can debug the problem immediately.
In addition, crashing as soon as possible makes the stack trace more helpful, because it points to the problem code. If, instead, you silently ignore the error, the program may not crash until later (if at all). At that point, the stack trace won't point to the real source of the error, so debugging will be harder.
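To make the stack-trace point concrete, here is a minimal Python sketch. The function names (`parse_quantity_strict`, `parse_quantity_lenient`, `order_total`) are hypothetical, invented for illustration. The strict parser crashes where the bad input arrives; the lenient one silently ignores the error, so the failure only shows up later as an unrelated-looking `TypeError`.

```python
def parse_quantity_strict(raw: str) -> int:
    # Fail fast: int() raises ValueError on this line, so the traceback
    # points straight at the code that received the bad input.
    return int(raw)


def parse_quantity_lenient(raw: str):
    # Silently ignore the error and return a placeholder instead.
    try:
        return int(raw)
    except ValueError:
        return None


def order_total(raw_quantity: str, unit_price: float, parse) -> float:
    quantity = parse(raw_quantity)
    # With the lenient parser, the problem only surfaces here, as a
    # TypeError about None, one step removed from the real cause.
    return quantity * unit_price
```

Calling `order_total("abc", 2.5, parse_quantity_strict)` raises `ValueError` from inside the parser, while the lenient version raises `TypeError` from `order_total`, which is the wrong place to start debugging.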
Another option is to silently ignore errors. Sometimes, this option is good. It depends on the user experience. Later on, we'll see a few examples where ignoring the error is probably the best option.
The final option is to try to recover from the error in some way. The result should be as though the error never happened. The program should be able to continue executing correctly.
Here are some examples of how different programs may respond when encountering errors.
If a program is in early development, it's probably fine to just crash on errors. This will make debugging easier.
Most desktop applications crash when something goes wrong (if they can't recover in the background). This is usually fine. It's very easy for the user to start the program again.
For programs that are "viewers" (such as Windows Photos), no data is lost.
For programs which change data (such as Microsoft Word), very little tends to be lost. These programs tend to have autosave features to minimise lost data.
Consider that you have an ecommerce website. There could be an error in the "add to cart" button where sometimes the product isn't added to the cart.
How should that error be handled?
For starters, you probably want to notify the user. It would be very bad if the user doesn't realise that a product is missing from their cart. They might go through checkout and order everything, wait for the item to arrive and never receive it. (I mean, I've done that without any errors to "add to cart" and I can tell you it's bad...)
To notify the user, you could display a message to them. For example, you could tell them:
- the action failed
- that they should try again later
- that they should refresh the page and try again
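A minimal Python sketch of the notify-the-user approach, assuming the cart backend is passed in as a callable that raises a hypothetical `CartError` on failure. It records the error and returns a message for the UI to display, instead of failing silently.

```python
import logging

logger = logging.getLogger(__name__)


class CartError(Exception):
    """Hypothetical error raised when a product can't be added to the cart."""


def handle_add_to_cart(add_to_cart, product_id: str) -> str:
    """Try to add a product and return the message to show the user."""
    try:
        add_to_cart(product_id)
    except CartError:
        # Record the error for debugging later...
        logger.exception("add to cart failed for product %s", product_id)
        # ...and tell the user, so they don't check out with a missing item.
        return "Sorry, we couldn't add this item to your cart. Please refresh the page and try again."
    return "Added to cart."
```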
Now suppose this website also has a purely decorative animation. If the animation fails to trigger, it's probably not a big deal.
In this case, you essentially want to ignore the error silently. Don't notify the user that a problem occurred. The animation isn't important enough to notify and distract the user from what they're doing.
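Ignoring the error from the user's point of view could look like this Python sketch, where `animate` is a hypothetical callable standing in for whatever triggers the animation. Catching the broad `Exception` is intentional here, because any failure in this code is equally unimportant; the error is still logged so it can be debugged later.

```python
import logging

logger = logging.getLogger(__name__)


def play_added_to_cart_animation(animate) -> None:
    """Run a purely cosmetic animation without ever interrupting the user."""
    try:
        animate()
    except Exception:
        # Deliberately ignore the failure: the animation isn't worth
        # distracting the user over. Still record it for debugging.
        logger.warning("cart animation failed", exc_info=True)
```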
If there is a bug in a single-player video game, it probably doesn't matter much. In most cases, a crash is very bad for the user experience, and buggy gameplay is preferable to crashing. So the best option is probably to silently ignore the bug.
For something life-critical, you would want to recover from errors very carefully and deliberately.
This might mean having redundancy. For example, you might have backup systems so that one can take over if something goes wrong. Or you might have a live-monitoring program, which can restart and reinitialise other programs that have errored or crashed. Or any number of other things. You might also use defensive programming to prevent certain programs from failing in the first place.
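The live-monitoring idea can be sketched in Python as a supervisor that reinitialises and reruns a crashed worker a limited number of times. This is a simplified, in-process assumption; real systems usually run the monitor as a separate process or service, and `supervise` and `make_worker` are hypothetical names.

```python
import logging
import time

logger = logging.getLogger(__name__)


def supervise(make_worker, max_restarts: int = 3, delay_seconds: float = 1.0):
    """Run a worker; if it crashes, build a fresh one and run it again."""
    restarts = 0
    while True:
        worker = make_worker()  # a freshly initialised worker, with clean state
        try:
            return worker()
        except Exception:
            logger.exception("worker crashed after %d restart(s)", restarts)
            if restarts >= max_restarts:
                raise  # give up and escalate to a human or a higher-level monitor
            restarts += 1
            time.sleep(delay_seconds)
```

Capping the number of restarts matters: a worker that fails every time should eventually escalate rather than loop forever.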
A bug is when something unexpected or obviously wrong happens in your program. It arises from faulty coding. It wouldn't be there if the code were correct.
When handling bugs (or any error), you need to consider:
- whether the bug is recoverable or not
- the user experience
- the development time for different responses
Also, regardless of what you do, you should record errors to debug them later.
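In Python, one common way to record errors is the standard `logging` module. A minimal sketch, assuming a hypothetical `run_and_record` wrapper: it logs the error together with its traceback and then re-raises, so the program still crashes by default.

```python
import logging

logger = logging.getLogger("app")


def run_and_record(task):
    """Run task; if it raises, log the full traceback, then let it propagate."""
    try:
        return task()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("unhandled error")
        raise
```

In a real program you would also configure where these records go, for example with `logging.basicConfig(filename=...)` or an error-reporting service.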
Some bugs are impossible to recover from. For example, there's nothing you can do if some important code always fails. The only solution is to fix the code.
However, some bugs may be recoverable.
One example of a possibly recoverable bug is an intermittent bug: one that only occurs under certain circumstances. This includes race conditions and errors that only happen with specific state.
With some effort, you may be able to handle these without restarting the main program. For example, if an operation fails, you could:
- try to run the operation again. If the problem is a race condition, it may work next time.
- try to restart a faulty subprogram in the background. Then retry the operation afterwards.
- try to manually fix the state of the program to something that works
- offload the erroring operation to a server
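The first few options above (retrying, optionally after restarting a subprogram or fixing state) fit a small retry helper. A Python sketch, assuming the operation is a callable and that the hypothetical `before_retry` hook is where you would restart the faulty subprogram or repair state; the delays are illustrative, not tuned values.

```python
import time


def retry(operation, attempts: int = 3, delay_seconds: float = 0.1,
          retry_on=(Exception,), before_retry=None):
    """Call operation, retrying up to `attempts` times on the given errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: treat it as unrecoverable
            if before_retry is not None:
                before_retry()  # e.g. restart a subprogram or reset state
            time.sleep(delay_seconds * attempt)  # wait a bit longer each time
```

In practice, keep `retry_on` narrow: retrying on every `Exception` can hide bugs that should crash instead.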
Another example may be something like running out of memory. Even this can be recoverable sometimes.
However, one issue is that you may not know that your program has a particular bug. After all, if you knew about the bug, then the best solution would be to fix it. So, if you don't know about the bug, you might not have error handling for it.
The exception is if you're doing defensive programming. In this case, you'll have error handling "just in case". You won't actually know whether you have a bug or not. Instead, you'll implement error handling pre-emptively for all kinds of possible bugs.
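A defensive check might look like the following Python sketch. The settings dictionary, the `volume` key and `DEFAULT_VOLUME` are hypothetical. The check guards against a bad value that no known bug produces, and recovers by falling back to a known-good default while recording what happened.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_VOLUME = 50  # hypothetical known-good fallback


def read_volume(settings: dict) -> int:
    """Return a volume in 0..100, recovering from corrupt settings "just in case"."""
    volume = settings.get("volume")
    # bool is a subclass of int in Python, so rule it out explicitly.
    valid = (isinstance(volume, int) and not isinstance(volume, bool)
             and 0 <= volume <= 100)
    if not valid:
        logger.warning("invalid volume %r; using default %d", volume, DEFAULT_VOLUME)
        return DEFAULT_VOLUME
    return volume
```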
So, in summary:
- some bugs aren't recoverable
- some bugs are recoverable but you won't have error handling for them
- some bugs are recoverable and you'll have error handling for them
As shown in the examples above, different programs need to respond to errors differently. Sometimes, it's fine to crash the program. The user can restart it or another process can restart it automatically. At other times, you can silently ignore the error. In other cases, you may need to do everything in your power to recover.
In general, even though some bugs are recoverable, recovery can be extremely difficult. Some of the issues with it are explained in the article on defensive programming.
In comparison, crashing the program is very easy. It also usually fixes errors just as well as (if not better than) manual recovery.
Generally, when you encounter bugs, the rule of thumb is to crash the program. The most important reason is that defensive programming can be very difficult to implement.
At other times, it's okay to ignore bugs, for example when the part of the codebase that errored is insignificant.
Recovering from bugs is rare. It's generally reserved for defensive programming, which is mostly used for software where uptime and correctness are extremely valuable.
Finally, regardless of what you do, remember to record errors to debug them later.
"Other errors" are things which aren't bugs in your program. These can include things like:
- failing to send a network request because the network connection dropped
- failing to read a file from the filesystem because the user manually deleted it a few milliseconds earlier
These "errors" are normal and expected. They are things that any program may encounter. There is nothing that your program can do to completely prevent them.
Once again, you need to consider:
- whether the error is recoverable or not
- the user experience
- the development time for different responses
In general, many of these errors are both recoverable and easy to recover from.
For example, consider that a network request failed to send because the user lost their internet connection. It's relatively easy to show the user a message and ask them to try again once they've made sure they're connected. This also results in a good user experience. In comparison, crashing the program would be a very bad user experience.
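For the network case, a Python sketch using the standard `urllib.request` module might look like this. `fetch_page` and `load_or_explain` are hypothetical names, and the fetch function is a parameter so the example can be exercised without a real network. Connection problems become a friendly message, while an HTTP error response is re-raised, because the server did answer and that isn't a connectivity problem.

```python
import urllib.error
import urllib.request


def fetch_page(url: str, timeout: float = 5.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def load_or_explain(url: str, fetch=fetch_page):
    """Return (page, None) on success, or (None, message for the user) if offline."""
    try:
        return fetch(url), None
    except urllib.error.HTTPError:
        # HTTPError subclasses URLError, but the server did respond,
        # so this isn't a connectivity problem; let it propagate.
        raise
    except (urllib.error.URLError, ConnectionError, TimeoutError):
        return None, "Couldn't connect. Check your internet connection and try again."
```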
As another example, you might try to write to a file that doesn't exist. In this case, an appropriate and easy solution might be to create the file first.
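In Python specifically, opening a file in `"w"` or `"a"` mode already creates it if it's missing; what still raises `FileNotFoundError` is a missing parent directory. A sketch of the same "create what's missing first" recovery, with `write_report` as a hypothetical name:

```python
import os


def write_report(path: str, text: str) -> None:
    """Write text to path, creating missing parent directories if needed."""
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    except FileNotFoundError:
        parent = os.path.dirname(path)
        if not parent:
            raise  # nothing we can create; treat it as unrecoverable
        # Recover by creating the missing directories, then try once more.
        os.makedirs(parent, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
```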
So, overall, the common advice for these errors is to handle and recover from them. Of course, the details depend on your program. Some of these errors may be unrecoverable. In those cases, crashing the program may be a better option.
So that's it for this article. I hope that you found it useful.
As always, if any points were missed, or if you disagree with anything, or have any comments or feedback then please leave a comment below.
For the next steps, I recommend looking at the other articles in the error handling series.
Alright, thanks and see you next time.
- Signs photo - Photo by Alexander Schimmeck on Unsplash
- Jenga photo - Photo by Michał Parzuchowski on Unsplash
- Person photo - Photo by Sebastian Herrmann on Unsplash
- Chess photo - Photo by Nothing Ahead from Pexels