Code[ish]
95. Intelligence Through Logging
Corey Martin, a customer solutions architect at Heroku, interviews Ariel Assaraf, the CEO of Coralogix, a platform that helps companies get a grasp on their log data. All too often, logs are treated as nothing more than a debugging tool. After receiving an alert about high resource usage or an elevated error rate, a developer might check their logs to see what caused the issue. But Ariel argues that by then it's too late to investigate a problem; by visualizing and alerting on log data, you can catch production problems before users encounter them.
Metrics, in other words, are a lagging indicator, while logs are a real-time representation of how your code is actually performing. One way to reconcile the two is to aggregate log data and funnel it into longer-term metric storage, which lets you see trends over time. Ariel describes a scenario where log records appear in groups, such as a user purchasing a product, followed by an API call to Stripe, and concluding with an email notifying the user. A platform like Coralogix can automatically learn that these three logs arrive together within a certain time frame. If, for any reason, one of these steps fails to log, an alert can prompt the team to proactively investigate, rather than waiting for a customer to write in and report an error.
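To make the idea concrete, here is a minimal sketch of how you might track a group of related log events and flag an incomplete flow. The event names, the order-ID field, and the 60-second window are illustrative assumptions, not Coralogix's actual implementation.

```python
import time
from collections import defaultdict

# Hypothetical event names for the purchase flow described in the episode.
EXPECTED_EVENTS = {"purchase_created", "stripe_charge", "confirmation_email"}
WINDOW_SECONDS = 60  # assumed time frame in which the group should complete

# Track which events have arrived for each order.
flows = defaultdict(lambda: {"events": set(), "first_seen": None})

def ingest(log_record):
    """Record one structured log line, e.g. {"order_id": "123", "event": "stripe_charge"}."""
    entry = flows[log_record["order_id"]]
    if entry["first_seen"] is None:
        entry["first_seen"] = time.time()
    entry["events"].add(log_record["event"])

def check_incomplete_flows(alert):
    """Call periodically; alert on any order whose expected events did not all arrive in time."""
    now = time.time()
    for order_id, entry in list(flows.items()):
        missing = EXPECTED_EVENTS - entry["events"]
        if not missing:
            del flows[order_id]  # flow completed, stop tracking it
        elif now - entry["first_seen"] > WINDOW_SECONDS:
            alert(f"Order {order_id} is missing log events: {sorted(missing)}")
            del flows[order_id]
```

The `alert` callback stands in for whatever notification channel the team uses; a real pipeline would run this kind of check continuously over the log stream rather than in-process.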
For an organization to begin using logs as time-series data, Ariel recommends three things. First, a unified log format, which could be something structured like JSON; these can be generated by a middleware service. Next, a shared understanding across teams of which severity to assign to each message. The final step is to set up an alerting policy: not only which types of alerts to create, but also where they go, such as Slack, email, or text message. After that, you can begin to incorporate your logs into your monitoring processes.
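As a rough illustration of the first two steps, here is a sketch of a unified JSON log format using Python's standard logging module. The field names and the "checkout" service name are assumptions for the example, not a schema Coralogix prescribes.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit every log record as a single JSON object with a consistent schema."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "severity": record.levelname,   # teams agree on when to use each level
            "service": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")      # hypothetical service name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("purchase_created")             # INFO: a normal business event
logger.error("stripe_charge failed")        # ERROR: the kind of message an alerting policy might route to on-call
```

With every service emitting the same fields at agreed-upon severities, the alerting policy then decides which severities and patterns fan out to Slack, email, or text message.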
Links from this episode
- Coralogix is an observability platform for logs, metrics, and security