loading...
Cover image for Logging Best Practices

Logging Best Practices

raysaltrelli profile image Ray Saltrelli ・7 min read

When running an application in a local development environment, the de facto tactic to see what it’s doing is, of course, to attach a debugger. You can step through the code line by line and inspect every variable along the way. It’s a no-brainer. But what about higher environments like QA, Staging or Production where you cannot attach a debugger? How do we gain insight into what an application is doing in these environments? One word: Log!

Alt Text

Log Levels

Logs aren’t just for capturing errors. A robust logging strategy makes thorough use of all of the different log levels your logging framework has to offer. As a .NET developer, I've used several different logging frameworks over the course of my career, each of which offers the same six log levels, but the naming differs a bit between them. For consistency's sake, we’re going to use the names of the Microsoft log levels in this article.

Microsoft Serilog lognet Description
Critical Fatal FATAL Logs that describe an unrecoverable application or system crash, or a catastrophic failure that requires immediate attention.
Error Error ERROR Logs that highlight when the current flow of execution is stopped due to a failure. These should indicate a failure in the current activity, not an application-wide failure.
Warning Warning WARN Logs that highlight an abnormal or unexpected event in the application flow, but do not otherwise cause the application execution to stop.
Information Information INFO Logs that track the general flow of the application. These logs should have long-term value.
Debug Debug DEBUG Logs that are used for interactive investigation during development. These logs should primarily contain information useful for debugging and have no long-term value.
Trace Verbose ALL Logs that contain the most detailed messages. These messages may contain sensitive application data. These messages are disabled by default and should never be enabled in a production environment.

Best Practices

It’s very common for the only logging in an application to be Error logs from within try/catch blocks. While this is a good thing to do, if that’s all you’re doing, then you’re going to have very limited visibility into the execution of your application once it leaves your local development environment. Consider making better use of the other log levels according to the best practices described below.

Alt Text

Set the Current Log Level Via External Configuration

Setting the current log level should always be an operational task; never hard coded. External configurations allow operators to toggle verbose logging on demand via environment variables, configuration files, etc.

Default to the Information Log Level

When choosing a global default log level (e.g. in web.config or appsettings.json) for your application, Information is usually a good choice. This means that Critical, Error, Warning and Information log entries will be written while Debug and Trace log entries will be suppressed. You might then want to provide a different default log level per environment (e.g. in web.{env}.config or appsettings.{env}.json); perhaps Trace in Development and Debug in QA.

Log Catastrophic Failures Using the Critical Log Level

As developers, we're are usually pretty good about logging run-of-the-mill errors: catch exception, log an error message, carry on. But catastrophic failures are a different animal.

A situation might arise where an application encounters an unrecoverable error during its start-up routine and throws an exception. You can catch the exception, log it using the Critical log level instead of Error and exit cleanly. In this flow, the log entry should make it out of the application since the application decided to end its own life as oppose to dying in a sudden accident.

Alt Text

Other catastrophic failures aren’t so easily handled. Maybe an exception is thrown during startup that is outside of your reach or Kubernetes decides to kill a container from the outside due to it hitting a memory limit. In the former case, any log entries that might be written are like people trying to escape a burning building; some might make it out before the building collapses on them, others might not. In the latter case, the application itself is unaware of the failure. One millisecond it is healthy and executing and the next millisecond its entire execution environment has been nuked. There isn’t the opportunity to even attempt to write a log entry. Be aware of these potential cases and work with your operations team to diagnose them when they occur.

Alt Text

Use the Warning and Information Log Levels Judiciously

Critical and Error are pretty easy to use appropriately (i.e. when things go off the rails) but Warning and Information are not so clear.

Warning should be reserved for logs that are not errors—they don’t impede execution—but are not entirely normal behavior. Examples might be a cache miss on an expensive object that really should have been in the cache, a piece of code that completed but took longer than expected, a failed login attempt, or an access control violation. The key though is that the warning should be actionable. You shouldn’t clutter the logs with warnings that you don’t intend to do anything about. Only use them if you plan to do something if/when that warning appears consistently in the logs.

Information should be used when the application is executing normally but you want to communicate something to the operator. The most common of these might be application life-cycle events, like Application started or Application stopped. In the absence of a monitoring tool (like New Relic, AppDynamics or DataDog) it might be appropriate to log application events at the Information log level, but if you have these tools, you should consider writing application events there rather than to your logs.

Alt Text

Don’t use either of these log levels so often that your application is writing copious logs when things are going smoothly. A healthy application’s logs should be mostly silent; perhaps only infrequent Information log entries to confirm that it is still there.

Step Through Code Using the Debug Log Level

When your application is misbehaving is when you’ll want the ability to get enhanced visibility into what it’s doing. Your first inclination might be to try to reproduce the bug in a local development environment with a debugger attached, but this isn’t always possible. You might not know enough about the bug to reproduce it locally. It might be unique to the configuration or data of a particular environment. “It works on my machine.” Right? So what do we do in these cases? Well, if you’ve made good use of the Debug log level when you wrote your code, a simple configuration change can get you more detailed logging from the environment where the issue is occurring.

But what does “good use of the Debug log level” look like? To answer that, lets think about how we use a debugger. When you’re debugging an application, the first thing you might do is put a bunch of strategically placed breakpoints in the code to see which ones are hit, in which order, how many times, etc. You’d normally do this ad hoc after a bug has been discovered, but what if you did it proactively and methodically during development? You could log a Debug entry at each place you’d potentially put a breakpoint with your debugger. Later, when your application misbehaves in a higher environment, you can ask operations to switch the log level to Debug, see the log entries, and follow along in the source code to see what your application is doing out in the wild. Think of it like leaving a map for your future self.

Alt Text

These log entries should just be mile markers so you can identify which lines of code your application is executing. Don’t put any data, like the values of variables and parameters, in these logs. That should be reserved for the Trace log level.

Inspect Variables Using the Trace Log Level

Trace is the most detailed log level and should only be turned on as a last resort. When stepping through the code with the Debug log level isn’t enough, use Trace to write the values of variables, parameters and settings to the logs for inspection. This mimics the behavior of inspecting the contents of variables with the debugger attached. It should only be done as a last resort for two reasons: 1) if you’re being thorough, this will drastically increase the amount of log data your application will produce, which has storage, cost and performance implications and 2) there could be sensitive data in the log entries written using this log level (more about this in the next section).

Alt Text

Never Log Personal Identifying Information or Secrets

Never—and I really mean never—write personal identifying information, credentials or other secrets to any log level (including Trace). Seriously, don’t do it!

Personal Identifying Information

Regarding PII, writing this information to logs makes your life that much harder when dealing with GDPR, CCPA and whatever other privacy laws pop up in the future. You don’t want to have to worry about purging PII from your logs because your log retention is greater than the 30-day deadline for complying with GDPR requests.

Alt Text

PII you should avoid logging includes, but is not limited to:

  • First Name
  • Last Name
  • Username
  • Gender
  • Birthday
  • Mailing/Billing Addresses
  • Email Addresses
  • Phone Numbers
  • Social Security Number
  • Credit Card Numbers

Secrets

Regarding secrets, writing this information to logs is a big security faux pas. Logs are often openly available to a broad audience in a technology organization so anything written there is visible to lots of people. This means that if anything shows up in the logs that isn’t meant to be broadcast to everyone, the admin of that system has to change it in order to maintain security.

Alt Text

Secrets you should avoid logging include, but are not limited to:

  • Usernames
  • Passwords
  • API Keys
  • Encryption Keys
  • Connection Strings
  • Magic URLs

Example

public class Goku
{
    private readonly ILogger _logger;

    public Goku(ILogger<Goku> logger)
    {
        _logger = logger;
    }

    public int PowerUp(int desiredPower)
    {
        _logger.LogDebug($"BEGIN {MethodInfo.GetCurrentMethod()}");
        _logger.LogTrace("desiredPower = {desiredPower}", desiredPower);

        try
        {
            var power = 0;
            var warned = false;

            for (; power < desiredPower; power += 100)
            {
                _logger.LogDebug("Charging ...");
                _logger.LogTrace("power = {power}", power);
                _logger.LogTrace("warned = {warned}", warned);

                if (power > 9000 && !warned)
                {
                    _logger.LogWarning("It's over 9000!");
                    warned = true;
                }
            }

            _logger.LogDebug("Power up complete.");
            _logger.LogTrace("power = {power}", power);
        }
        catch(Exception ex)
        {
            _logger.LogError(ex, "Failed to complete power up.");
        }

        _logger.LogDebug($"END {MethodInfo.GetCurrentMethod()}");

        return power;
    }
}

Links

Posted on by:

raysaltrelli profile

Ray Saltrelli

@raysaltrelli

Software Architect at TCGplayer. Things I'm interested in: .NET, Event-Driven Microservices, AWS, Docker, DevOps Culture, Security, Privacy, constantly learning.

Discussion

markdown guide
 

Great article! As knowing which log level to use can be tricky (and not an exact science 😅) at some point my team decided to limit to only 2 levels: debug and error with error logs being sent via Slack/Email to make sure they don't go silent.

 

Most people think of logging as text based, but structured logging to a database is incredibly useful. It allows you to track down specific instances of something failing by searching with parameters you suspect.

You can also save a lot of space by simply storing the hash of log message template, instead of the merged string.