Grok is one of the most popular Logstash filters. It parses unstructured log data into a structured, meaningful format.
Logstash ships with about 120 built-in patterns. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns
Some of the patterns are also collected at https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns
I personally prefer this second link when constructing a grok pattern.
There may be cases where the built-in grok patterns don't fit. Grok sits on top of the Oniguruma regular expression library, so you can combine Oniguruma regex syntax with grok patterns to create more powerful patterns.
Grok Syntax
%{SYNTAX:SEMANTIC}
- SYNTAX is the name of the grok pattern that will match your text
- SEMANTIC is the key (field name) under which the matched text is stored
Oniguruma Syntax
(?<field_name>regex pattern)
- field_name is the key
- regex pattern is where you put your own regular expression
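To make the two syntaxes concrete, here is a minimal Python sketch of how a grok-style pattern could be turned into a plain regex. It is only an illustration, not how Logstash is implemented: the `PATTERNS` table holds simplified stand-ins for three grok patterns (my assumption, not the official definitions), `expand` is a hypothetical helper, and Python's `re` module spells named groups `(?P<name>...)` instead of Oniguruma's `(?<name>...)`.

```python
import re

# Simplified stand-ins for three grok patterns. These are assumptions for
# illustration only, not the definitions shipped in logstash-patterns-core.
PATTERNS = {
    "TIME": r"\d{2}:\d{2}:\d{2}(?:[.,]\d+)?",
    "DATA": r".*?",
    "POSINT": r"[1-9][0-9]*",
}


def expand(grok: str) -> str:
    """Rewrite %{SYNTAX:SEMANTIC} and (?<name>...) as Python named groups."""

    def repl(m: re.Match) -> str:
        syntax, semantic = m.group(1), m.group(2)
        # An unknown SYNTAX name raises KeyError in this sketch.
        return f"(?P<{semantic}>{PATTERNS[syntax]})"

    regex = re.sub(r"%\{(\w+):(\w+)\}", repl, grok)
    # Oniguruma writes (?<name>...); Python's re module needs (?P<name>...).
    # The lookahead keeps lookbehinds such as (?<=...) and (?<!...) untouched.
    return re.sub(r"\(\?<(?![=!])", "(?P<", regex)


print(expand("value=%{POSINT:value}"))   # value=(?P<value>[1-9][0-9]*)
print(expand("(?<field_name>[a-z]+)"))   # (?P<field_name>[a-z]+)
```

Both forms end up as the same thing: a named capture group whose name becomes the key of the extracted field.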
How to use?
Let's try to create a pattern to parse unstructured log data.
Sample Log Data
09:33:45,416 (metrics-logger-reporter-1-thread-1) type=GAUGE, name=notifications.received, value=2
Required fields from log data
Field | Field Value |
---|---|
timestamp | 09:33:45,416 |
logthread | metrics-logger-reporter-1-thread-1 |
type | GAUGE |
name | notifications.received |
value | 2 |
Grok Pattern
We will use Grok Debugger to test our pattern against the log data.
Let's break the log data down and pick a pattern that matches each field:
Field | Pattern |
---|---|
timestamp | %{TIME} |
type | %{DATA} |
name | %{DATA} |
value | %{POSINT} |
The field logthread can be a combination of alphanumeric characters and hyphens, so we need to use Oniguruma to match it. Following the Oniguruma syntax, we need to create a regex pattern that matches the value of the field logthread.
Constructing Regex Pattern
We now use Regex Checker to help us construct and test the regex pattern for the value of the field logthread.
The non-capturing group (?:[()a-zA-Z\d-]+) wraps a character class followed by a quantifier. The character class [()a-zA-Z\d-] matches a single character from the list below, and + repeats that match:
- + is a greedy quantifier, i.e. it matches the previous token between one and unlimited times, as many times as possible
- () matches a single character in the list ()
- a-z matches a single character in the range between a and z
- A-Z matches a single character in the range between A and Z
- \d matches a digit
- - matches the character -
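If you don't have a regex checker at hand, the character class can also be tried out with a few lines of Python. This is just a sketch: `is_thread_name` is a hypothetical helper, the first input is the logthread value from the sample log, and the second input is a made-up string that contains characters outside the class.

```python
import re

# Character class copied from the article; the trailing hyphen is a literal "-".
THREAD = re.compile(r"(?:[()a-zA-Z\d-]+)")


def is_thread_name(text: str) -> bool:
    """Return True when every character of text belongs to the character class."""
    return THREAD.fullmatch(text) is not None


print(is_thread_name("metrics-logger-reporter-1-thread-1"))  # True
print(is_thread_name("worker_pool.1"))                       # False ("_" and "." are not in the class)
```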
Oniguruma
The final Oniguruma pattern for the field logthread:
(?<logthread>(?:[()a-zA-Z\d-]+))
Grok Pattern + Oniguruma (Final Pattern)
The final pattern that will match the log data:
%{TIME:timestamp} \((?<logthread>(?:[()a-zA-Z\d-]+))\) type=%{DATA:type}, name=%{DATA:name}, value=%{POSINT:value}
Output of the pattern
{
"timestamp": [
[
"09:33:45,416"
]
],
"HOUR": [
[
"09"
]
],
"MINUTE": [
[
"33"
]
],
"SECOND": [
[
"45,416"
]
],
"logthread": [
[
"metrics-logger-reporter-1-thread-1"
]
],
"type": [
[
"GAUGE"
]
],
"name": [
[
"notifications.received"
]
],
"value": [
[
"2"
]
]
}
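The same extraction can be reproduced outside Logstash as a sanity check. The Python sketch below is a hand translation of the final pattern, under a few assumptions: the grok patterns TIME, DATA and POSINT are replaced with simplified regexes of my own (not the official definitions), and Oniguruma's `(?<name>...)` becomes Python's `(?P<name>...)`. Unlike the Grok Debugger output, it only returns the five named fields, without HOUR, MINUTE and SECOND.

```python
import re

# Sample log line from the article.
LINE = (
    "09:33:45,416 (metrics-logger-reporter-1-thread-1) "
    "type=GAUGE, name=notifications.received, value=2"
)

PATTERN = re.compile(
    r"(?P<timestamp>\d{2}:\d{2}:\d{2}[.,]\d+) "   # simplified stand-in for %{TIME:timestamp}
    r"\((?P<logthread>(?:[()a-zA-Z\d-]+))\) "     # Oniguruma group, renamed to (?P<...>)
    r"type=(?P<type>.*?), "                       # simplified stand-in for %{DATA:type}
    r"name=(?P<name>.*?), "                       # simplified stand-in for %{DATA:name}
    r"value=(?P<value>[1-9][0-9]*)"               # simplified stand-in for %{POSINT:value}
)

match = PATTERN.match(LINE)
fields = match.groupdict() if match else {}

for key, value in fields.items():
    print(f"{key}: {value}")
```

Every value comes back as a string, just like in the debugger output, so value is "2" rather than the number 2.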
Conclusion
The combination of Grok Pattern and Oniguruma is a perfect pair. The pairing can help transform even complex logs into structured data. Give Grok Pattern + Oniguruma a try in Logstash!
Let me know in the comments if you know a better way of doing this or run into any problems with the above example.