About error levels and logging

#errors #logging #monitoring

Logging is an important part of development as well as monitoring and maintenance. It allows any technical party to answer questions about how the application is behaving and what went wrong when things need investigating.

But logging always comes at a cost. Whether it's written to disk, sent to stdout/stderr or shipped directly to a remote service, logging takes up resources: bandwidth, memory, CPU and so on, not to mention storage space wherever it ends up.

Which raises the question: what should we log? Most logging systems have some clear-cut levels. Usually, these are:

FATAL: the application ran into an issue that prevents it from continuing safely, risking data corruption or affecting other services that depend on it. It must shut down and hand control to external systems (orchestrators, systemd, etc.), which will decide whether or not to restart it.
ERROR: the application ran into unexpected behaviour that it can recover from, but which may have affected operations or may require user intervention. E.g. someone needs to be notified of what happened.
WARN(ING): something unexpected happened during the "happy flow". If an error follows, people should find useful information here that may explain it.
NOTICE: anything outside of the happy flow can go here, allowing for a more detailed picture of how the system runs. (Note: I found that many Go loggers don't have this.)
INFO: anything. I use this to track variables of interest and how values may be affected, returns of important functions and the like.
DEBUG: same as INFO but more verbose and with extra context (any function should have this).

Other levels that are particular to different platforms:

TRACE: I have seen it only in log4j, where it sits below DEBUG and is meant for even finer-grained detail than DEBUG provides.
PANIC: some Go loggers have this; it logs the message and then calls Go's panic function. The difference from FATAL lies in how the process stops. In Go's standard log package, Fatal writes the message and calls os.Exit(1): the process ends immediately, deferred functions do not run and other goroutines get no chance to finish. A panic instead unwinds the stack of the current goroutine, running its deferred functions, and can be intercepted with recover; if nothing recovers it, the whole program crashes, other goroutines included. So FATAL means stop everything, now, while PANIC still leaves room for cleanup or recovery.

What about you? How do you define your error levels? Do you use any others?
