Application Logging and Production Monitoring

In my earlier days, I worked in the corporate world as a developer, tech lead, and architect. Back then I rarely worried about how we should do logging and monitoring; we always had the tools and means to get end-to-end visibility.

Later on, I co-founded a startup, and my partner and I had to pick our tech stack. With me being a lifelong .NET guy and him being a Laravel pro, we went with Node.js 🙂 (for several reasons, but that is another story).

Back to logging: what we needed was the ability to capture the entire lifetime of an incoming request. That means the request body and headers, service-layer calls and their responses, DB calls and so on. We also wanted to use microservices back then (again, another story with lots of pros and cons), so the entire lifetime includes the back-and-forth communication between the microservices as well. We therefore needed a request id; with it we could filter the logs and sort by time. Let me break it down into separate steps:

UI: We use a SPA on our front end. The UI makes HTTPS calls to our API.
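To make the rest of this work, the API has to assign (or adopt) a request-id as soon as a call comes in. Here is a minimal sketch of that step, assuming an Express-style API and an `x-request-id` header; neither the framework nor the header name comes from our actual stack, they are just for illustration:

```js
const express = require('express');
const { randomUUID } = require('crypto'); // built into Node.js 14.17+

const app = express();

// Attach a request-id to every incoming call: reuse one forwarded by an
// upstream caller if present, otherwise mint a fresh one.
app.use((req, res, next) => {
  req.requestId = req.headers['x-request-id'] || randomUUID();
  res.setHeader('x-request-id', req.requestId);
  next();
});
```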

API Layer: Our business services in the APIs are instantiated through factories that inject their dependencies. In theory, you could create a custom logger, enrich it with the request-id, and inject it into the business services so developers can log whenever they need to. But logging did not feel like something we should leave up to individual preference; what we needed was an automated way to flush data. Manual log statements also reduce readability and can introduce bugs (in theory, business logic should not be polluted with extra logging code). So instead of injecting the logger into the services, our factories wrap the service functions with a self-logging capability (using an in-house logging library) that simply adds another JavaScript promise layer to capture the input parameters and resolve the response objects. This way, all input and return values are available to the in-house logging library for enriching (method name, function start/end time, server IP, microservice name, elapsed duration, etc.) and logging. We, as developers, don't have to worry about it and know that the system will capture everything that is needed in a well-formatted fashion.
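To illustrate the idea, here is a simplified sketch, not our in-house library; names like `withLogging` and `createService` are made up, and it assumes business services are plain objects of async functions:

```js
// Wrap a single service function so that it logs its own start/end,
// input parameters and elapsed time, tagged with the request-id.
function withLogging(serviceName, fn, logger, requestId) {
  return async (...args) => {
    const start = Date.now();
    logger.info({ requestId, serviceName, method: fn.name, args, event: 'start' });
    try {
      const result = await fn(...args);
      logger.info({
        requestId, serviceName, method: fn.name,
        elapsedMs: Date.now() - start, event: 'end',
      });
      return result;
    } catch (err) {
      logger.error({
        requestId, serviceName, method: fn.name,
        elapsedMs: Date.now() - start, error: err.message,
      });
      throw err;
    }
  };
}

// The factory wraps every function of a business service instead of
// injecting the logger into it, so business code stays free of logging.
function createService(service, serviceName, logger, requestId) {
  const wrapped = {};
  for (const key of Object.keys(service)) {
    wrapped[key] = typeof service[key] === 'function'
      ? withLogging(serviceName, service[key].bind(service), logger, requestId)
      : service[key];
  }
  return wrapped;
}
```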

Flat file logging vs searchable logging

Microservice Communication: We created another in-house library, a forked version of "Request Promise Native". It helps our developers inject the out-of-band request-id info so the target microservice can read it and use it throughout the lifetime of its underlying services. This means all our microservices can read incoming request-ids and forward them on outgoing microservice calls.
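The actual library is a fork of Request Promise Native, but the forwarding idea can be sketched with Node's built-in fetch (Node 18+); the `x-request-id` header name is again just illustrative:

```js
// Call another microservice and pass the current request-id along,
// so the target service can tag its own logs with the same id.
async function callService(url, requestId, options = {}) {
  const res = await fetch(url, {
    ...options,
    headers: { ...(options.headers || {}), 'x-request-id': requestId },
  });
  if (!res.ok) {
    throw new Error(`Call to ${url} failed with status ${res.status}`);
  }
  return res.json();
}
```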

Logger: A word of caution: please mask your messages and don't log any sensitive data! I've seen logs with PII or credit card info in the past; please don't do it. Your users depend on you, and this is your responsibility! Anyway, there are tons of good logging libraries out there. We decided to use Winston because:
1- Winston is good
2- It has Graylog2 support, which brings us to our next item.
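Before moving on, here is a minimal Winston sketch of the masking point above: a custom format redacts sensitive fields before any transport writes them out. The field list is illustrative, and a Graylog transport (for example the winston-graylog2 package) would be added where the comment indicates:

```js
const winston = require('winston');

// Fields we never want to see in a log message; adapt the list to your own payloads.
const SENSITIVE_FIELDS = ['password', 'cardNumber', 'ssn'];

const mask = winston.format((info) => {
  for (const field of SENSITIVE_FIELDS) {
    if (info[field] !== undefined) info[field] = '***';
  }
  return info;
});

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(mask(), winston.format.timestamp(), winston.format.json()),
  transports: [
    new winston.transports.Console(),
    // A Graylog transport would go here; its exact options depend on the package you pick.
  ],
});

logger.info('user login', { userId: 42, password: 'hunter2' }); // password is logged as '***'
```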

Log Repository: In the last 10 years or so, I don't remember a single case where I had to check the server log files for monitoring or debugging purposes. It is just so impractical to walk through those files, with one log line after another, each coming from a different request. It simply won't help; in fact, at one of the US banks I used to work at, the DevOps folks suggested we could simply stop creating them. Of course, that doesn't mean you should stop logging. Au contraire! It is very important to have a log repository where you can search, filter, export and manage your logs. So we narrowed our options down to the following tools:
-Splunk
-Graylog
We selected Graylog because we had experience administering a Graylog server, it is an open-source tool (meaning much lower costs, since it just needs a mid-sized server), and it does the job.
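Once the logs are in a repository like Graylog, tracing a single request becomes a search query rather than a file crawl. The field names depend on how you structure your messages, but a request-id lookup looks roughly like this (the id and field names below are made up):

```
request_id:"9f3c2a1e-5b7d-4c1e-9a2f-0d8e6b4c3a21" AND source:orders-service
```

Sort by timestamp and you can replay the whole request across services.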

Your logs will give you lots of insights about your application and will potentially help you uncover bugs. My team regularly walks through the logs before each release to see whether we are about to introduce any new, unexpected errors. With a tool like Graylog, you can create alerts for different scenarios (HTTP response codes, app error codes, etc.), so you will know there is a problem even before the customer sees the error message. Your QA team can put request-ids in tickets so developers can trace exactly what happened at the time of the test. If you want to dive deeper: I remember using Splunk logs for fraudulent-behavior mitigation through near-real-time and batch analysis. Whatever we use the logs for, we want them, embrace them, love them :)
