
List unique paths requested by popularity in NGINX logs (access.log)

Peter Cooper

We had a big pile of NGINX access.log files for our site and wanted to quickly know all of the unique paths that had been requested.

If your access.log files use a reasonably standard format, such as NGINX's default combined format, with lines like this:

127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"

...then you can use this pipeline:

awk -F\" '{print $2}' access.log | awk '{print $2}' | sort | uniq -c | sort -g

The output will look like this:

[lots of stuff here]
    104 /xmlrpc.php
    114 /wp-includes/wlwmanifest.xml
    121 /robots.txt
    161 /feed/
    336 /
   3056 //xmlrpc.php
  53786 /wp-login.php

So what's going on?

awk -F\" '{print $2}' access.log treats every double quote as a field separator and prints the second field, which is the request line between the first pair of quotes (GET / HTTP/1.1 in the sample above).
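To see what the first awk is working with, here's a minimal sketch that runs it on the sample line from above, held in a variable named line (the variable is just for illustration). With the double quote as the separator, $2 is everything between the first and second quotes:

```shell
# The sample log line from above, held in an illustrative variable
line='127.0.154.222 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"'

# With -F\" every double quote separates fields:
#   $1 = IP, user and timestamp (everything before the first quote)
#   $2 = the request line
#   $3 = status code and response size, and so on
printf '%s\n' "$line" | awk -F\" '{print $2}'
# prints: GET / HTTP/1.1
```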

awk '{print $2}' then splits that request line on whitespace, skipping the HTTP method (GET/POST/PUT/etc.) in the first field and printing the path in the second.
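The second awk relies on awk's default field splitting (runs of blanks), so the method lands in $1 and the path in $2. Running it on a few made-up request lines shows one consequence: query strings stay attached, so /feed/ and /feed/?paged=2 are counted as different paths. If you'd rather group them, one option (an addition, not part of the original pipeline) is to strip everything from the first ? onward with awk's sub():

```shell
# Made-up request lines, as the first awk would emit them
requests='GET / HTTP/1.1
POST /wp-login.php HTTP/1.1
GET /feed/?paged=2 HTTP/1.1'

# Default splitting: $1 = method, $2 = path (query string included)
printf '%s\n' "$requests" | awk '{print $2}'
# prints:
# /
# /wp-login.php
# /feed/?paged=2

# Optional variant: drop the query string before printing the path
printf '%s\n' "$requests" | awk '{sub(/[?].*/, "", $2); print $2}'
# prints:
# /
# /wp-login.php
# /feed/
```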

sort puts identical paths next to each other, which matters because uniq only collapses duplicate lines that are adjacent.

uniq -c then reduces each run of identical paths to a single line, and -c prefixes that line with the number of times the path occurred.
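Why the first sort is needed: uniq only merges duplicates that sit on consecutive lines. This sketch feeds a small made-up list of paths to uniq -c with and without sorting first:

```shell
# Made-up paths, with duplicates that are not next to each other
paths='/
/robots.txt
/
/feed/
/'

# Without sort, no two identical lines are adjacent, so uniq -c
# prints all five lines, each with a count of 1 ("/" appears three times)
printf '%s\n' "$paths" | uniq -c

# With sort first, identical paths are grouped and each is counted once:
#   3 /
#   1 /feed/
#   1 /robots.txt
printf '%s\n' "$paths" | sort | uniq -c
```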

sort -g then sorts the lines numerically by that count, so the most requested paths end up at the bottom.

Want the result in descending numeric order? Use sort -gr instead.
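Putting it together, here's a self-contained sketch that writes a few made-up log lines to a temporary file (created with mktemp) and runs the full pipeline with sort -gr, so the busiest path comes first. The result is kept in a variable named top, purely for illustration:

```shell
# A tiny made-up log in the same format as the sample line
log=$(mktemp)
cat > "$log" <<'EOF'
127.0.0.1 - - [19/Oct/2020:06:26:59 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:01 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:02 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:03 +0000] "GET /robots.txt HTTP/1.1" 200 67 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:04 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "-"
127.0.0.1 - - [19/Oct/2020:06:27:05 +0000] "GET / HTTP/1.1" 301 178 "-" "-"
EOF

# The same pipeline, with sort -gr for most-requested-first
top=$(awk -F\" '{print $2}' "$log" | awk '{print $2}' | sort | uniq -c | sort -gr)
printf '%s\n' "$top"
# prints (counts are right-aligned by uniq -c):
#   3 /wp-login.php
#   2 /
#   1 /robots.txt

# Clean up the temporary file
rm -f "$log"
```

On real data you might append | head -n 20 to keep only the top entries. Since awk accepts several file names, the first awk can also read a whole pile of logs at once, e.g. awk -F\" '{print $2}' access.log access.log.1 followed by the rest of the pipeline.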
