In the previous blog post, we saw how to create a production-ready logging system. In this blog post, we will see how to analyze the logs generated by the system.
🫱🫱🫱🫱Let's build a production-ready logger using Winston🫲🫲🫲🫲
While many advanced tools are available for log analysis, such as the ELK stack and Splunk, we will see how to do it the lame way using JavaScript🥱🥱.
The Log Format
The logs generated by the logger are in the following format:
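2024-07-04 10:42:09 error: ::ffff:72.200.26.133 GET /api/v1/resource 201 0.082 ms 0.177 ms ::ffff:72.200.26.133

(This is the same sample line we will parse later in this post.)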
Understanding The Log Format
The log format includes the following fields:
- timestamp: The time at which the log was generated
- level: The log level (DEBUG, INFO, WARN, ERROR)
- Source IP: The IP address of the client making the request
- kind of request: The kind of request made by the client (GET, POST, PUT, DELETE)
- path: The path of the request made by the client
- status code: The status code of the response sent by the server
- response time: The time taken by the server to respond to the request
- total time: The total time taken by the server to process the request
- Remote IP: The IP address of the remote client making the request
The Log Analyzer
Laying The Groundwork For The Log Analyzer
First, create a regular expression to parse the log format.
const logLevelPattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+):\s+(?<client_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)\s+(?<method>\w+)\s+(?<path>\/\S*)\s+(?<status>\d{3})\s+(?<response_time_1>\d+\.\d{3})\s+ms\s+(?<response_time_2>\d+\.\d{3})\s+ms\s+(?<other_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)/g;
The regular expression seems daunting at first sight, but it's not that hard to understand. It consists of named groups for each field in the log format. The named groups are enclosed in (?<name>pattern).
Using named groups makes it easier to extract the fields from the log line. For example, to extract the level field from the log line, we can use the following code:
const logLine = '2024-07-04 10:42:09 error: ::ffff:72.200.26.133 GET /api/v1/resource 201 0.082 ms 0.177 ms ::ffff:72.200.26.133';
// matchAll is a string method, so we call it on the log line and pass the regex
const matches = logLine.matchAll(logLevelPattern);
for (const m of matches) {
  // We are interested in the groups property of each match because we used named groups in the regex.
  console.log(m.groups);
  console.log(m.groups.level);
}
Each match object will have this kind of output:
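For the sample line above, the named groups come out like this (each value is captured as a string):

{
  date: '2024-07-04',
  time: '10:42:09',
  level: 'error',
  client_ip: '::ffff:72.200.26.133',
  method: 'GET',
  path: '/api/v1/resource',
  status: '201',
  response_time_1: '0.082',
  response_time_2: '0.177',
  other_ip: '::ffff:72.200.26.133'
}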
The groups property of the match object holds the named groups extracted from the log line, and we can access the level field using m.groups.level.
Analyzing The Logs Using The Log Analyzer
Now that we have the regular expression to parse the log format, let's create a log analyzer that reads the logs from a file and analyzes them.
import { EventEmitter } from 'events';
import fs from 'fs';
import path from 'path';

class LogAnalyzer extends EventEmitter {
  // Initialize the analyzer with the log file stream and the log level pattern,
  // and set up the objects that will store the analysis results
  constructor(logFileStream, logLevelPattern) {
    super();
    this.logFileStream = logFileStream;
    this.logLevelPattern = logLevelPattern;
    this.time = {};
    this.paths = {};
    this.ips = {};
    this.responseTime = [];
    this.totalResponseTime = [];
    this.count = 0;
  }

  // Method to start the analysis
  analyze() {
    this.logFileStream.on('ready', () => console.log('================START======================='));
    this.logFileStream.on('data', this.processChunk.bind(this));
    this.logFileStream.on('end', this.finishAnalysis.bind(this));
  }

  // Method to process each chunk of data read from the log file
  processChunk(chunk) {
    console.log('Processing chunk:', this.count);
    this.logFileStream.pause();
    const output = chunk.toString().matchAll(this.logLevelPattern);
    for (const match of output) {
      // Extract the named groups from the match object
      const { groups } = match;
      // Update the objects with the extracted fields
      this.updateObjects(groups);
    }
    this.count++;
    this.logFileStream.resume();
  }

  // Method to update the objects with the extracted fields
  updateObjects(groups) {
    const hourKey = groups.time.split(':')[0] + ':00';
    this.time[hourKey] = (this.time[hourKey] || 0) + 1;
    this.updateObject(this.paths, groups.path);
    this.updateObject(this.ips, groups.client_ip);
    this.responseTime.push(parseFloat(groups.response_time_1));
    this.totalResponseTime.push(parseFloat(groups.response_time_2));
  }

  // Method to update an object with a key
  updateObject(obj, key) {
    obj[key] = (obj[key] || 0) + 1;
  }

  // Method to finish the analysis
  finishAnalysis() {
    console.log('================END=========================');
    console.log("Let's perform some analysis on the log file");
    this.emit('analysisComplete', this.getAnalysisResults());
  }

  // Method to sort an object based on the count of the keys
  sortingObject(obj, max = true) {
    // Sorting is based on the count of the keys to get the most used or least used keys
    if (max) {
      return Object.entries(obj).sort((a, b) => b[1] - a[1]);
    } else {
      return Object.entries(obj).sort((a, b) => a[1] - b[1]);
    }
  }

  // Method to get the analysis results
  getAnalysisResults() {
    return {
      timeDistribution: this.sortingObject(this.time, true).slice(0, 4),
      mostUsedPathDistribution: this.sortingObject(this.paths, true).slice(0, 4),
      leastUsedPathDistribution: this.sortingObject(this.paths, false).slice(0, 4),
      ipDistribution: this.sortingObject(this.ips, true).slice(0, 4),
      avgResponseTime: this.average(this.responseTime).toFixed(5),
      avgTotalResponseTime: this.average(this.totalResponseTime).toFixed(5),
    };
  }

  // Method to calculate the average of an array of numbers
  average(arr) {
    return arr.reduce((a, b) => a + b, 0) / arr.length;
  }
}

// Usage:
const logLevelPattern = /(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<level>\w+):\s+(?<client_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)\s+(?<method>\w+)\s+(?<path>\/\S*)\s+(?<status>\d{3})\s+(?<response_time_1>\d+\.\d{3})\s+ms\s+(?<response_time_2>\d+\.\d{3})\s+ms\s+(?<other_ip>::ffff:\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|::1)/g;

const logFileStream = fs.createReadStream(path.resolve('./logs/combined/generated_logs.log'), {
  encoding: 'utf-8',
  highWaterMark: 10 * 1024, // 10KB
});

const analyzer = new LogAnalyzer(logFileStream, logLevelPattern);
analyzer.on('analysisComplete', (results) => {
  console.log('Analysis complete. Results:', results);
});
analyzer.analyze();
Workflow:
- The logFileStream is created using fs.createReadStream to read the log file in chunks.
- The LogAnalyzer class is instantiated with the logFileStream and the logLevelPattern as arguments.
- The analyze method is called on the LogAnalyzer instance to start the analysis.
- The processChunk method is called whenever a chunk of data is read from the log file. It processes the chunk and extracts the fields from each log line using the logLevelPattern.
- Inside processChunk, updateObjects is called to update the objects with the extracted fields.
- The finishAnalysis method is called when the whole file has been read. It emits an analysisComplete event with the analysis results.
- The sortingObject method sorts the objects based on the count of their keys, so we can find the most used or least used entries.
- The getAnalysisResults method returns the analysis results.
- The average method calculates the average of an array of numbers.
- Finally, the analysisComplete event is listened to, and the analysis results are logged to the console.
The LogAnalyzer class reads the log file in chunks and processes each chunk to extract the fields from the log lines. It updates the objects with the extracted fields and calculates the average response time and total response time. Finally, it emits an analysisComplete event with the analysis results.
The analysis results include the time distribution, most used path distribution, least used path distribution, IP distribution, average response time, and average total response time.
Output:
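The structure of the output comes straight from getAnalysisResults; the values below are only placeholders, your numbers will depend on the log file:

{
  timeDistribution: [ /* top 4 [ 'HH:00', count ] pairs */ ],
  mostUsedPathDistribution: [ /* top 4 [ path, count ] pairs */ ],
  leastUsedPathDistribution: [ /* bottom 4 [ path, count ] pairs */ ],
  ipDistribution: [ /* top 4 [ clientIp, count ] pairs */ ],
  avgResponseTime: '0.07500',      // a string, because of toFixed(5)
  avgTotalResponseTime: '0.20000'  // also a string
}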
This way we can analyze the logs generated by the logger using the log analyzer in an absolutely lame way😂😂
You can find the complete code for the log analyzer here
Generating Random Logs
Let's address the elephant in the room: not everyone has a log file from a real production system to analyze. So, let's generate some random logs using the faker library.
import fs from 'fs';
import { faker } from '@faker-js/faker';

// Sample data
const methods = ["GET", "POST", "PUT", "DELETE"];
const paths = ["/", "/api/v1/resource", "/login", "/register"];
const ips = ['45.94.188.156', '238.249.31.148', '91.113.1.90', '113.232.207.105', '96.129.247.250', '105.171.179.234', '144.42.125.14', '109.111.74.178', '72.200.26.133', '83.65.134.149'];
const statusCodes = [200, 201, 400, 404, 500];
const levels = ["info", "error", "warn", "debug"];

// Function to generate a random timestamp between two dates
const randomTimestamp = (start, end) => {
  const startDate = start.getTime();
  const endDate = end.getTime();
  return new Date(startDate + Math.random() * (endDate - startDate));
};

// Generate log entries
const logEntries = [];
const startDate = new Date(2024, 6, 4, 10, 41, 0); // Months are 0-based in JavaScript
const endDate = new Date(2024, 6, 4, 10, 45, 0);

for (let i = 0; i < 1000; i++) { // Generate 1000 log entries
  // Generate the random pieces of a single log entry
  const timestamp = randomTimestamp(startDate, endDate);
  const dateStr = timestamp.toISOString().replace('T', ' ').substring(0, 19);
  const level = faker.helpers.arrayElement(levels);
  const clientIp = faker.helpers.arrayElement(ips);
  const method = faker.helpers.arrayElement(methods);
  const path = faker.helpers.arrayElement(paths);
  const status = faker.helpers.arrayElement(statusCodes);
  const responseTime1 = (Math.random() * (0.1 - 0.05) + 0.05).toFixed(3);
  const responseTime2 = (Math.random() * (0.3 - 0.1) + 0.1).toFixed(3);
  const otherIp = faker.helpers.arrayElement(ips);
  logEntries.push(`${dateStr} ${level}: ::ffff:${clientIp} ${method} ${path} ${status} ${responseTime1} ms ${responseTime2} ms ::ffff:${otherIp}`);
}

// Write to log file
fs.writeFileSync('./logs/combined/generated_logs.log', logEntries.join('\n'));
console.log("Log file generated successfully.");
This script generates 1000 random log entries and writes them to a log file named generated_logs.log. The log entries are generated with random timestamps, log levels, client IPs, methods, paths, status codes, response times, and other IPs.
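A generated line looks something like this (the values are picked at random from the pools above, so yours will differ):

2024-07-04 10:43:27 info: ::ffff:45.94.188.156 POST /login 200 0.073 ms 0.215 ms ::ffff:91.113.1.90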
You can find the complete code for generating random logs here
Conclusion
In this blog post, we saw how to analyze the logs generated by the logger using the log analyzer. We created a log analyzer that reads the logs from a file, parses the log format, and analyzes the logs. We also generated random logs using the faker library to test the log analyzer.
While the log analyzer is a simple way to analyze logs, it is not suitable for large log files or real-time log analysis; for those, you can use advanced log analysis tools like the ELK stack or Splunk. But for small log files or testing purposes, the log analyzer is a good starting point.
So... Thanks for reading this blog post. I hope you enjoyed it. If you have any questions or feedback, feel free to leave a comment below.
PS:
I typically write these articles in the TIL form, sharing the things I learn during my daily work or afterward. I aim to post once or twice a week with all the things I have learned in the past week.