DEV Community

Logging done right, makes your life bright!

Gajanan on March 27, 2017

I have been working in application development on embedded devices for a few years now. All the embedded platforms that we work on do not have a go...

Read full post

Phil Ashby • Apr 2 '17

Having also worked on embedded systems for a good number of years including those that end up in space (GDB definitely not an option!), I now encourage a slightly different approach:

Write debugging / logging user stories, this helps the team keep in mind why they are adding log lines..
Never have a run-time module/level selection mechanism, because that elusive bug will only show up once (en.wikipedia.org/wiki/Heisenbug), instead use a log filter to achieve the same results, while retaining the detail when it's needed.
Maybe think of logging / debugging working like a hardware logic analyser, keep as much recent info in a circular buffer as you can afford storage, take snapshots when 'interesting' events occur (like crashes!), that way you have the detail leading up to the event, but are not overwhelmed by volume, nor exhaust storage when everything is working. We caught two live bugs from space using this technique :)

Shashank Sabniveesu • Nov 8 '17

Good suggestions Phil! I was initially interested to find out what Gajanan meant by

So we used a file watcher to watch the config file for changes and whenever a change was detected, reloaded the logger module config in memory.

But you seem to be advising something different. Can you give an example of how you used

a log filter to achieve the same results
?

Phil Ashby • Nov 8 '17

Hi Shashank, Gajanan makes a good point about being able to adjust logging levels at runtime /while an application is exhibiting a fault/, he says: "once the application is in some problematic state, it would remain in that state".

This is an assumption that is not always true (sadly!), hence my approach of creating a logging system as a detailed (debug level) history recorder albeit with limited capacity. Provided faults are detectable (eg: system crash, exceptions), then the debug level history buffer can be saved at that point, hopefully with enough duration & detail to find the root cause. If you are familiar with the concept of core dump files, this is similar, but more manageable than a dirty great memory blob, and with event history, not just a memory snapshot at the point of failure.

If, in addition to crash traces, we also want to create a long term activity log, we would apply a filter (eg: log levels, log sources or both) to the history buffer, and emit these filtered events to suitable storage (eg: log file).

Such a filter can respond to configuration changes in much the same way that Gajanan describes for setting a 'log config' at source, that is: detecting file changes and reloading itself, thereby changing the activity log content. The important difference is that the debug level history buffer and crash dumps are not affected by changes to an activity filter.

FYI: In one system, we had a separate management card that held the event history in RAM for a number of daughter boards, and filtered this information to disk storage. The management card had a CLI, so activity logging could be managed interactively as well as via configuration files (Cisco IOS has similar features).

Gajanan • Dec 29 '17

I mostly agree with Phill. However unfortunate it is, the following is true! 1. It may be very hard to reproduce an issue with proper log settings. 2. System may not remain in the problematic state forever.

But like I mentioned the size of the log file on some systems are limited. We have a rotating log file of 50MB size. You may not believe me when I say 50MB is not sufficient to capture the logs for 30 minutes when we set log level to debug/trace. We don't have a way of backing up these logs to some server to analyse later. What we have is just 50MB! So we have to apply filters at the generation point!

We have to smartly design our logging system after considering all the limitations of the target system!

Phil Ashby • Dec 30 '17

Sucks to be limited to 50MB and no way to siphon off to longer term storage! How about allocating some of that space to a short term circular log at debug level, enough to capture a few seconds of history or a couple of cycles through the main application loop (probably <1MB?), then as/when bad things happen, copy that buffer to a new file and raise the alarm... you'll likely get 50+ chances to see what's going wrong before running out of space and having to remove the oldest logs.

#benaryorg • Aug 20 '17

As an ops person there is something that I had to deal with recently that kind of fits in here.

A logfile should consist of lines where every single one is exactly one message.
No interleaved messages, no messages longer than one line.
This way I can view it at runtime, tail it easily, comb through them and filter them.

less has some amazing features like a tail -f mode (using F), or filtering (using &), but if I cannot quickly find out why your app suddenly crashed with just less, then I'd call your logformat garbage.

Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.

Patrick Schönfeld • Dec 25 '17

„Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.“

As another kinda Ops-Person I have to disagree.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe. It does not make that much sense, to define boundaries for the usefulness of logging by almost the simplest of all tools we have in our belt. That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

In fact it is the other way round: if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

#benaryorg • Dec 25 '17

if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

But most of these tools don't belong on a server, or aren't quite applicable in a server-only environment.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe

And in a lot of cases these extra information you want to be included does not fit in the environment described.

So I am proposing that we just keep writing minimal log-files with information needed to find and fix problems fast, for example as NCSA Common Log Format files.

And for everything else you log to a different sink.
That could be the tools for analysing logs, for example an ELK stack.
You log all the information to the ELK stack and keep on the system only the minimum log-information.

A prime example would be profiling information.
Tideways is really nice for profiling, but you wouldn't want that information to be interspersed within your access log, right?
So you put it in a different sink (in that case directly over the net).

That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

So in those terms what I want from developers is to always have a microwave pizza on top of the refrigerator so one can make that when in a hurry fast.
If you have time for cooking then you can always spare the time to get out the more hard to reach food below the pizza.

Ghost • Mar 27 '17

I would recommend a different sort of logging level:

Debug - Any info a developer might ever want. This should be as verbose as possible and only be enabled when debugging (duh).
Info - Information you generally want, but don't need for everyday business. Start/Stop of services, certain events happening, etc.
Warning - Warnings are reserved for potential error sources. For instance, when a config file was not found and the default is assumed instead, or when a primary data source is not available and a secondary is used instead.
Error - Errors are for when a process fails irrecoverably, but the application as a whole remains stable. For instance, when a program tries to save a file, but write permission was denied.
Fatal - These logs indicate that tje program encountered an error from which it cannot recover. Include as much information as possible here.

I know this is quite similar to what the original post suggested, but distinguishing between fatal and non-fatal errors is very important.

Pedro Manoel Fabiano Alves Evangelista • Sep 20 '17 • Edited

Your article is really well written and easy to understand! I would say I'm a beginner in logging and this article really helped.

I'm curious if you have a guideline for how to write log messages and what should be logged. Just like writing commit messages, I have a hard time with this. For instance, sometimes the name of the function is clear enough, and writing a message that simply states the function name with different wording seems to be redundant. What's your experience with this?

Gajanan • Sep 20 '17

Glad that the article helped you.

We have set up our logging macros in such a way that we get the function name which generated the log for each line of the log file. So mentioning from which function the log got generated is totally redundant.
When a function I called returns failure, I generally log the function name which returned failure along with the error code and if possible a brief description of the error code.
Eg:

void myStupidFunction()
{
    char *fileName = "myFile.txt";
    File *pFile = fopen (fileName,"w");
    if(NULL == pFile)
    {
        LOG_ERROR("fopen failed for file:%s with error code:%d and error description:%s",fileName,errno,std::strerror(errno));
    }   
}

Phil Ashby • Nov 8 '17

The principle Gajanan demonstrates here is simple: "never hide information from the logs" (credentials being the only exception to this rule!). In this case a function that returns an error.

I would also suggest using a machine parseable textual log format, defined in a logging story for the team, that will make life much easier for operations and your monitoring tools (SIEM, protective monitoring, etc.)

The use of macros (in C anyway) or similar language features to add consistent source code traceability is also a Good Thing (tm), that's what the FILE and LINE macros are for :)

Kritz Rathod • Sep 19 '17 • Edited

Would be helpful to add a table for logging levels.

Info: Parameters, Calculated Values,
Error: Error Messager, Stack Trace,
Verbos: Function Entry-Exit Points, Code Flow,
etc.

Should be shared with the team for uniformity in logger usage.

Gregor Petrin • Mar 28 '17

As for log analyzing competition, Lnav is really good. It has filters, context aware navigation, can mesh multiple log files chronologically, it has zoomable histograms and it can even evaluate sql clauses against your log. It's also a single binary to install with no dependencies and you can have it on your server since it works via ncurses. Open source, too.

lnav.org

Gajanan • Mar 28 '17

Looks interesting. Let me give it a try. Thanks for sharing :)

Paesano Design, Inc. • Mar 30 '17

Great stuff, thanks for the post. So many times logging is squeezed out of the project timeline with looming deadlines... :(

Junaid Atique • Sep 19 '17

Nice article. I am using loggly to handle logs. It gives the functionality to log on different levels plus distributing modules wise. Another thing I use is to mention the function name along with module. Loggly provide nice searching through the logs.
My only suggestion is to log only required parameters. I used to log complete objects which was disastrous.

Gajanan • Sep 20 '17

Thanks. Loggly seems interesting. Not sure how easy it will be to get it working with our system. I will give it a try.
We have name of the function from which the log is generated for every line of log. That is a very helpful piece of information.
The last point is kinda debatable. If you are accepting objects as arguments to your API, it is better to dump all the input arguments in the log (probably with 'Debug' level). In other cases, Yes, you should log only the parameters you need to debug.

dallgoot • Dec 29 '17

"We added the ability to select the log level for each of these modules via a config file."
i'm very curious on this since i only have a vague idea on how to do it codewise :)

Tony • Mar 27 '17

Happen to have a working snippet for how you handled watching the config file and propagating changes live?

Gajanan • Mar 27 '17

We used QFileSystemWatcher to watch the config file and added a thread-safe method to the logger apply the new config.
You can find some example code for QFileSystemWatcher here