DEV Community

Clavin June

Originally published at clavinjune.dev

Create Log Parser Using Go


Introduction

A log file is a file that records the events that happen in a system. By reading it, developers can understand what happened, who did what to the system, and much more. Some systems, such as Apache, Nginx, and Envoy, have their own standard log formats. But what about custom logs? Developers need to write them as descriptively as possible so that they are easy to read.

Writing logs is perhaps not the issue, but what about reading them? Do developers actually review their logs? Can they easily understand them? Do they know what is happening in the system right now? Perhaps not. That's where monitoring tools like Elastic or Grafana help by parsing and monitoring log files.

Monitoring tools can help developers read system logs and raise alerts when something goes wrong. But they don't understand what happened to the system; they only follow the rules the developers created. What if developers want to add a little intelligence to the monitoring system so that it can understand what happened? Building a deep learning model to analyze the logs would certainly help. But before that, developers need to make sure they can parse the logs.

In this blog post, you will create a simple log parser in Go as a first step toward understanding log files better.

Define the Log Format

Let's say you have a single log line formatted like this:

[2021-08-27T07:39:54.173Z] "GET /healthz HTTP/1.1" 200 - 0 61 225 - "111.114.195.106,10.0.0.11" "okhttp/3.12.1" "0557b0bd-4c1c-4c7a-ab7f-2120d67bee2f" "example.com" "172.16.0.1:8080"

You could extract the data you want from that line, for example:

  1. Timestamp
  2. HTTP Method
  3. Request Path
  4. Response Code
  5. IPs

Then create a log format based on that line. Let's say you want to name the timestamp $time_stamp and mark the unimportant data as $_. Now you have a format string like this:

[$time_stamp] "$http_method $request_path $_" $response_code - $_ $_ $_ - "$ips" "$_" "$_" "$_" "$_"

So you can read your log data like this:

  $time_stamp    => 2021-08-27T07:39:54.173Z
  $http_method   => GET
  $request_path  => /healthz
  $response_code => 200
  $ips           => 111.114.195.106,10.0.0.11

Create the parser

Let's create a main.go file containing the log line and the format. So that the format can later be used as a regex, escape the special symbols in it with \.

func main() {
  logsExample := `[2021-08-27T07:39:54.173Z] "GET /healthz HTTP/1.1" 200 - 0 61 225 - "111.114.195.106,10.0.0.11" "okhttp/3.12.1" "0557b0bd-4c1c-4c7a-ab7f-2120d67bee2f" "example.com" "172.16.0.1:8080"`
  logsFormat := `\[$time_stamp\] \"$http_method $request_path $_\" $response_code - $_ $_ $_ - \"$ips\" \"$_\" \"$_\" \"$_\" \"$_\"`
}

After defining the format, convert logsFormat into a regular expression. Each variable starts with $ and contains only alphanumeric characters and underscores, so you can match the variables with the regex \$([\w_]*) (strictly, \w already includes the underscore). Then turn every variable into a regex named capturing group, whose syntax is (?P<name>re). Since <name> should be your variable name, use (?P<$1>.*) as the replacement: $1 refers to the text captured by the first group, that is, the variable name without the $. In code, it looks like this:

  ...

  regexFormat := regexp.MustCompile(`\$([\w_]*)`).ReplaceAllString(logsFormat, `(?P<$1>.*)`)

  ...

Now your regexFormat looks like this:

\[(?P<time_stamp>.*)\] \"(?P<http_method>.*) (?P<request_path>.*) (?P<_>.*)\" (?P<response_code>.*) - (?P<_>.*) (?P<_>.*) (?P<_>.*) - \"(?P<ips>.*)\" \"(?P<_>.*)\" \"(?P<_>.*)\" \"(?P<_>.*)\" \"(?P<_>.*)\"

Then compile regexFormat and use it to find all the data in the log line.

  ...

  re := regexp.MustCompile(regexFormat)
  matches := re.FindStringSubmatch(logsExample)

  ...

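Now matches should hold all your captured data: matches[0] is the whole matched text, and matches[i] is the text captured by the i-th group, in the same order as re.SubexpNames(). Let's print it.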

  ...

  for i, k := range re.SubexpNames() {
    // skip index 0 (the whole match) and the $_ placeholders
    if i == 0 || k == "_" {
      continue
    }

    fmt.Printf("%-15s => %s\n", k, matches[i])
  }

  ...

The output should look like this:

$ go run main.go 
time_stamp      => 2021-08-27T07:39:54.173Z
http_method     => GET
request_path    => /healthz
response_code   => 200
ips             => 111.114.195.106,10.0.0.11

Once you can parse a single log line, you can parse whole log files the same way. All you need to do is define the format of your log file and then transform each line into a human-readable form, as in the previous step.

Here is the complete code:

package main

import (
  "fmt"
  "regexp"
)

func main() {
  // a line of log
  logsExample := `[2021-08-27T07:39:54.173Z] "GET /healthz HTTP/1.1" 200 - 0 61 225 - "111.114.195.106,10.0.0.11" "okhttp/3.12.1" "0557b0bd-4c1c-4c7a-ab7f-2120d67bee2f" "example.com" "172.16.0.1:8080"`

  // your defined log format
  logsFormat := `\[$time_stamp\] \"$http_method $request_path $_\" $response_code - $_ $_ $_ - \"$ips\" \"$_\" \"$_\" \"$_\" \"$_\"`

  // transform every defined variable into a regex named capturing group
  regexFormat := regexp.MustCompile(`\$([\w_]*)`).ReplaceAllString(logsFormat, `(?P<$1>.*)`)

  // compile the result
  re := regexp.MustCompile(regexFormat)

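  // find all the matched data from the logsExample
  matches := re.FindStringSubmatch(logsExample)
  if matches == nil {
    // FindStringSubmatch returns nil when the line does not match the format
    fmt.Println("the log line does not match the format")
    return
  }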

  for i, k := range re.SubexpNames() {
    // skip index 0 (the whole match) and the $_ placeholders
    if i == 0 || k == "_" {
      continue
    }

    // print the defined variable
    fmt.Printf("%-15s => %s\n", k, matches[i])
  }
}

Thank you for reading!
