Sourav Atta

Writing an effective GROK pattern

Grok is one of the most popular Logstash filters. It parses unstructured log data into structured, queryable fields.

Logstash ships with about 120 built-in patterns. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Many of the same patterns are also collected in a single file at https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns
I personally prefer this second link when constructing a grok pattern.

There will be cases where none of the built-in grok patterns fit. For those, you can write a custom regular expression in the syntax of Oniguruma, the regular expression library that grok is built on, and combine it with grok patterns to create powerful matchers.


Grok Syntax

%{SYNTAX:SEMANTIC}

Oniguruma Syntax

(?<field_name>regex pattern)
  • field_name is the key under which the matched text is stored
  • regex pattern is a placeholder for your own regular expression
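
To make the relationship between the two syntaxes concrete, here is a minimal Python sketch of how a %{SYNTAX:SEMANTIC} token can be expanded into a named capture group. The PATTERNS table and the expand helper are hypothetical and written only for illustration; the two regexes are simplified stand-ins, not the exact built-in definitions. Note that Python's re module writes named groups as (?P<name>...), whereas Oniguruma accepts (?<name>...).

```python
import re

# Simplified stand-ins for two built-in grok patterns (assumptions for
# illustration; the real definitions live in logstash-patterns-core).
PATTERNS = {
    "POSINT": r"[1-9][0-9]*",
    "WORD": r"\w+",
}

# Matches a grok token of the form %{SYNTAX:SEMANTIC}.
GROK_TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def expand(grok: str) -> str:
    """Replace each %{SYNTAX:SEMANTIC} token with a Python named group."""
    return GROK_TOKEN.sub(
        lambda m: f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})", grok
    )


regex = expand("name=%{WORD:name}, value=%{POSINT:value}")
match = re.fullmatch(regex, "name=received, value=2")
fields = match.groupdict()
print(regex)
print(fields)
```

SYNTAX selects which regex to use and SEMANTIC becomes the group name, which is the same role field_name plays in the Oniguruma form.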

How to use?

Let's try to create a pattern to parse unstructured log data.

Sample Log Data

09:33:45,416 (metrics-logger-reporter-1-thread-1) type=GAUGE, name=notifications.received, value=2

Required fields from log data

Field       Value
timestamp   09:33:45,416
logthread   metrics-logger-reporter-1-thread-1
type        GAUGE
name        notifications.received
value       2

Grok Pattern

We will use the Grok Debugger to test our pattern against the log data.

Let's break the log line down and pick a built-in pattern for each field it can match:

Field       Pattern
timestamp   %{TIME}
type        %{DATA}
name        %{DATA}
value       %{POSINT}

The remaining field, logthread, is a combination of alphanumeric characters and hyphens.

So we will use Oniguruma to match it. Following the Oniguruma syntax shown earlier, we need a regex pattern that matches the value of the logthread field.

Constructing Regex Pattern

We now use a regex checker to construct and test the regex pattern for the value of the logthread field.


The non-capturing group (?:[()a-zA-Z\d-]+) wraps a character class that matches one or more characters from the list below:

  • + is a greedy quantifier: it matches the preceding token between one and unlimited times, as many times as possible
  • () matches a literal ( or ) character
  • a-z matches a single character in the range between a and z
  • A-Z matches a single character in the range between A and Z
  • \d matches a digit
  • - matches the literal character -
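
If you do not have a regex checker at hand, the character class can be tried locally. The short Python sketch below assumes only that Python's re module accepts this particular class unchanged, which it does; the second input, worker_1, is a made-up value used to show what the class rejects.

```python
import re

# The same non-capturing group as above, compiled with Python's re module.
THREAD = re.compile(r"(?:[()a-zA-Z\d-]+)")

sample = "metrics-logger-reporter-1-thread-1"

# Every character of the sample is in the class, so the whole string matches.
full = THREAD.fullmatch(sample)

# An underscore is not in the class, so a match from the start of
# "worker_1" stops right before it.
partial = THREAD.match("worker_1")

print(full.group(0))
print(partial.group(0))
```

The first print shows the complete thread name and the second shows only worker, because the match ends at the first character outside the class.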

Oniguruma

The final Oniguruma pattern for the field logthread:

(?<logthread>(?:[()a-zA-Z\d-]+))

Grok Pattern + Oniguruma (Final Pattern)

The final pattern that will match the log data:

%{TIME:timestamp} \((?<logthread>(?:[()a-zA-Z\d-]+))\) type=%{DATA:type}, name=%{DATA:name}, value=%{POSINT:value}


Output of the pattern

{
  "timestamp": [
    [
      "09:33:45,416"
    ]
  ],
  "HOUR": [
    [
      "09"
    ]
  ],
  "MINUTE": [
    [
      "33"
    ]
  ],
  "SECOND": [
    [
      "45,416"
    ]
  ],
  "logthread": [
    [
      "metrics-logger-reporter-1-thread-1"
    ]
  ],
  "type": [
    [
      "GAUGE"
    ]
  ],
  "name": [
    [
      "notifications.received"
    ]
  ],
  "value": [
    [
      "2"
    ]
  ]
}
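
As a cross-check outside Logstash, the five named fields can be reproduced with a hand-written Python approximation of the final pattern. This is only a sketch: TIME, DATA and POSINT are replaced by simplified regexes that are assumptions rather than the exact grok definitions, and the Oniguruma group (?<logthread>...) is rewritten as Python's (?P<logthread>...). Unlike the Grok Debugger, it does not emit the HOUR, MINUTE and SECOND sub-fields.

```python
import re

# Approximation of the final grok + Oniguruma pattern. The timestamp, type,
# name and value parts are simplified stand-ins for TIME, DATA and POSINT.
LINE = re.compile(
    r"(?P<timestamp>\d{1,2}:\d{2}:\d{2}(?:[:.,]\d+)?) "
    r"\((?P<logthread>[()a-zA-Z\d-]+)\) "
    r"type=(?P<type>.*?), "
    r"name=(?P<name>.*?), "
    r"value=(?P<value>[1-9][0-9]*)"
)

log = (
    "09:33:45,416 (metrics-logger-reporter-1-thread-1) "
    "type=GAUGE, name=notifications.received, value=2"
)

match = LINE.match(log)
# Fall back to an empty dict when the line does not match.
fields = match.groupdict() if match else {}

for key, value in fields.items():
    print(f"{key}: {value}")
```

All values come back as strings, just as in the debugger output above.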

Conclusion

Grok patterns and Oniguruma make a strong pair. Together they can turn even complex logs into structured data. Give Grok + Oniguruma a try in Logstash!


Let me know in the comments if you know a better way of doing this, or if you run into any problem with the above example.
