DEV Community

Cover image for Logging done right, makes your life bright!

Logging done right, makes your life bright!

Gajanan on March 27, 2017

I have been working in application development on embedded devices for a few years now. All the embedded platforms that we work on do not have a go...
Collapse
 
phlash profile image
Phil Ashby

Having also worked on embedded systems for a good number of years including those that end up in space (GDB definitely not an option!), I now encourage a slightly different approach:

  • Write debugging / logging user stories, this helps the team keep in mind why they are adding log lines..
  • Never have a run-time module/level selection mechanism, because that elusive bug will only show up once (en.wikipedia.org/wiki/Heisenbug), instead use a log filter to achieve the same results, while retaining the detail when it's needed.
  • Maybe think of logging / debugging working like a hardware logic analyser, keep as much recent info in a circular buffer as you can afford storage, take snapshots when 'interesting' events occur (like crashes!), that way you have the detail leading up to the event, but are not overwhelmed by volume, nor exhaust storage when everything is working. We caught two live bugs from space using this technique :)
Collapse
 
fossterer profile image
Shashank Sabniveesu

Good suggestions Phil! I was initially interested to find out what Gajanan meant by

So we used a file watcher to watch the config file for changes and whenever a change was detected, reloaded the logger module config in memory.

But you seem to be advising something different. Can you give an example of how you used

a log filter to achieve the same results
?

Collapse
 
phlash profile image
Phil Ashby

Hi Shashank, Gajanan makes a good point about being able to adjust logging levels at runtime /while an application is exhibiting a fault/, he says: "once the application is in some problematic state, it would remain in that state".

This is an assumption that is not always true (sadly!), hence my approach of creating a logging system as a detailed (debug level) history recorder albeit with limited capacity. Provided faults are detectable (eg: system crash, exceptions), then the debug level history buffer can be saved at that point, hopefully with enough duration & detail to find the root cause. If you are familiar with the concept of core dump files, this is similar, but more manageable than a dirty great memory blob, and with event history, not just a memory snapshot at the point of failure.

If, in addition to crash traces, we also want to create a long term activity log, we would apply a filter (eg: log levels, log sources or both) to the history buffer, and emit these filtered events to suitable storage (eg: log file).

Such a filter can respond to configuration changes in much the same way that Gajanan describes for setting a 'log config' at source, that is: detecting file changes and reloading itself, thereby changing the activity log content. The important difference is that the debug level history buffer and crash dumps are not affected by changes to an activity filter.

FYI: In one system, we had a separate management card that held the event history in RAM for a number of daughter boards, and filtered this information to disk storage. The management card had a CLI, so activity logging could be managed interactively as well as via configuration files (Cisco IOS has similar features).

Thread Thread
 
grhegde09 profile image
Gajanan

I mostly agree with Phill. However unfortunate it is, the following is true! 1. It may be very hard to reproduce an issue with proper log settings. 2. System may not remain in the problematic state forever.

But like I mentioned the size of the log file on some systems are limited. We have a rotating log file of 50MB size. You may not believe me when I say 50MB is not sufficient to capture the logs for 30 minutes when we set log level to debug/trace. We don't have a way of backing up these logs to some server to analyse later. What we have is just 50MB! So we have to apply filters at the generation point!

We have to smartly design our logging system after considering all the limitations of the target system!

Thread Thread
 
phlash profile image
Phil Ashby

Sucks to be limited to 50MB and no way to siphon off to longer term storage! How about allocating some of that space to a short term circular log at debug level, enough to capture a few seconds of history or a couple of cycles through the main application loop (probably <1MB?), then as/when bad things happen, copy that buffer to a new file and raise the alarm... you'll likely get 50+ chances to see what's going wrong before running out of space and having to remove the oldest logs.

Collapse
 
benaryorg profile image
#benaryorg

As an ops person there is something that I had to deal with recently that kind of fits in here.

A logfile should consist of lines where every single one is exactly one message.
No interleaved messages, no messages longer than one line.
This way I can view it at runtime, tail it easily, comb through them and filter them.

less has some amazing features like a tail -f mode (using F), or filtering (using &), but if I cannot quickly find out why your app suddenly crashed with just less, then I'd call your logformat garbage.

Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.

Collapse
 
aptituz profile image
Patrick Schönfeld

„Sorry for that strong wording, but there's situations where every seconds counts, and these are the situations you should be logging for.“

As another kinda Ops-Person I have to disagree.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe. It does not make that much sense, to define boundaries for the usefulness of logging by almost the simplest of all tools we have in our belt. That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

In fact it is the other way round: if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

Collapse
 
benaryorg profile image
#benaryorg

if your tool is too simple for the job, then maybe you should use different tools. Like, lets say, use the tools made for analysing, storing and querying log data.

But most of these tools don't belong on a server, or aren't quite applicable in a server-only environment.

Logging has to fit and be useful for several cases, including the situations and avoidance of the situations you describe

And in a lot of cases these extra information you want to be included does not fit in the environment described.

So I am proposing that we just keep writing minimal log-files with information needed to find and fix problems fast, for example as NCSA Common Log Format files.

And for everything else you log to a different sink.
That could be the tools for analysing logs, for example an ELK stack.
You log all the information to the ELK stack and keep on the system only the minimum log-information.

A prime example would be profiling information.
Tideways is really nice for profiling, but you wouldn't want that information to be interspersed within your access log, right?
So you put it in a different sink (in that case directly over the net).

That‘s like saying that every dish in the kitchen needs to be cookable with the microwave, because there are situations were ... well, you know: cooking needs to happen quick.

So in those terms what I want from developers is to always have a microwave pizza on top of the refrigerator so one can make that when in a hurry fast.
If you have time for cooking then you can always spare the time to get out the more hard to reach food below the pizza.

Collapse
 
ghost profile image
Ghost

I would recommend a different sort of logging level:

Debug - Any info a developer might ever want. This should be as verbose as possible and only be enabled when debugging (duh).
Info - Information you generally want, but don't need for everyday business. Start/Stop of services, certain events happening, etc.
Warning - Warnings are reserved for potential error sources. For instance, when a config file was not found and the default is assumed instead, or when a primary data source is not available and a secondary is used instead.
Error - Errors are for when a process fails irrecoverably, but the application as a whole remains stable. For instance, when a program tries to save a file, but write permission was denied.
Fatal - These logs indicate that tje program encountered an error from which it cannot recover. Include as much information as possible here.

I know this is quite similar to what the original post suggested, but distinguishing between fatal and non-fatal errors is very important.

Collapse
 
pedromanoel profile image

Your article is really well written and easy to understand! I would say I'm a beginner in logging and this article really helped.

I'm curious if you have a guideline for how to write log messages and what should be logged. Just like writing commit messages, I have a hard time with this. For instance, sometimes the name of the function is clear enough, and writing a message that simply states the function name with different wording seems to be redundant. What's your experience with this?

Collapse
 
grhegde09 profile image
Gajanan

Glad that the article helped you.

We have set up our logging macros in such a way that we get the function name which generated the log for each line of the log file. So mentioning from which function the log got generated is totally redundant.
When a function I called returns failure, I generally log the function name which returned failure along with the error code and if possible a brief description of the error code.
Eg:

void myStupidFunction()
{
    char *fileName = "myFile.txt";
    File *pFile = fopen (fileName,"w");
    if(NULL == pFile)
    {
        LOG_ERROR("fopen failed for file:%s with error code:%d and error description:%s",fileName,errno,std::strerror(errno));
    }   
}

Collapse
 
phlash profile image
Phil Ashby

The principle Gajanan demonstrates here is simple: "never hide information from the logs" (credentials being the only exception to this rule!). In this case a function that returns an error.

I would also suggest using a machine parseable textual log format, defined in a logging story for the team, that will make life much easier for operations and your monitoring tools (SIEM, protective monitoring, etc.)

The use of macros (in C anyway) or similar language features to add consistent source code traceability is also a Good Thing (tm), that's what the FILE and LINE macros are for :)

Collapse
 
kritulrathod profile image
Kritz Rathod • Edited

Would be helpful to add a table for logging levels.

Info: Parameters, Calculated Values,
Error: Error Messager, Stack Trace,
Verbos: Function Entry-Exit Points, Code Flow,
etc.

Should be shared with the team for uniformity in logger usage.

Collapse
 
gregopet profile image
Gregor Petrin

As for log analyzing competition, Lnav is really good. It has filters, context aware navigation, can mesh multiple log files chronologically, it has zoomable histograms and it can even evaluate sql clauses against your log. It's also a single binary to install with no dependencies and you can have it on your server since it works via ncurses. Open source, too.

lnav.org

Collapse
 
grhegde09 profile image
Gajanan

Looks interesting. Let me give it a try. Thanks for sharing :)

Collapse
 
paseanodesign profile image
Paesano Design, Inc.

Great stuff, thanks for the post. So many times logging is squeezed out of the project timeline with looming deadlines... :(

Collapse
 
junaidatique profile image
Junaid Atique

Nice article. I am using loggly to handle logs. It gives the functionality to log on different levels plus distributing modules wise. Another thing I use is to mention the function name along with module. Loggly provide nice searching through the logs.
My only suggestion is to log only required parameters. I used to log complete objects which was disastrous.

Collapse
 
grhegde09 profile image
Gajanan

Thanks. Loggly seems interesting. Not sure how easy it will be to get it working with our system. I will give it a try.
We have name of the function from which the log is generated for every line of log. That is a very helpful piece of information.
The last point is kinda debatable. If you are accepting objects as arguments to your API, it is better to dump all the input arguments in the log (probably with 'Debug' level). In other cases, Yes, you should log only the parameters you need to debug.

Collapse
 
dallgoot profile image
dallgoot

"We added the ability to select the log level for each of these modules via a config file."
i'm very curious on this since i only have a vague idea on how to do it codewise :)

Collapse
 
zegio profile image
Tony

Happen to have a working snippet for how you handled watching the config file and propagating changes live?

Collapse
 
grhegde09 profile image
Gajanan

We used QFileSystemWatcher to watch the config file and added a thread-safe method to the logger apply the new config.
You can find some example code for QFileSystemWatcher here

Collapse
 
asynccrazy profile image
Sumant H Natkar

Logging is really important when you are writing enterprise applications, we use azure blob storage for our logging.

Collapse
 
uniquer profile image
uniquer

Nice post. Just a thought if logger can capture each module logs % and throughput, that would help to debug the debug logs :)

Collapse
 
grhegde09 profile image
Gajanan

That's a neat idea :) I will give it a try!

Collapse
 
vaibhavshn profile image
Vaibhav Shinde

Great article!
Thank you. I am loving this place. 🙂