DEV Community

Wesley Skeen

Detecting PII leakage in logs

First, I want to mention that I collaborated on this project and article with @mereta.

Before we begin, take a look at the post I published on setting up Grafana locally using Docker. It has simple steps for getting an environment running that you can experiment with.

Once you have this running, open the promtail.yml file. This is the file we are going to change so that Promtail applies our PII detection logic.

Pipeline Stages

We are going to add pipeline_stages to this file.

Simply put, every log line that Promtail scrapes passes through these stages in order. Stages can perform a number of actions, which you can read about in detail in the Grafana docs, but here I will go through the stages needed to:

  1. Detect PII
  2. Validate the result of the detection
  3. Create a label to hold the result

Detect PII

As part of the stages section, we add a regex stage:

- regex:
    expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'

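The expression is built from two parts, following the pattern
(?P<{0}>({1}))

  • {0} - the name of the capture group. Promtail adds each named capture group to the extracted data under this name, so here the matched email ends up in sensitive_email.
  • {1} - the actual regex that is run against the log content.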
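To see what the regex stage does with a log line, here is a rough Python approximation. regex_stage is a hypothetical helper, not part of Promtail. It assumes the stage keeps only the named groups of the first match, and it relies on Python's re module accepting the same (?P<name>...) syntax as Go's RE2 engine.

```python
import re

# Same pattern as the regex stage above. Python's re module accepts the
# (?P<name>...) named-group syntax that Go's RE2 engine also uses.
EMAIL_PATTERN = re.compile(
    r"(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))"
)


def regex_stage(line, pattern):
    """Hypothetical helper approximating a regex stage (not Promtail's code).

    Returns the named groups of the first match as the "extracted data".
    When the pattern does not match, nothing is extracted.
    """
    match = pattern.search(line)
    if match is None:
        return {}
    # groupdict() only contains *named* groups, so the inner unnamed group is ignored.
    return {name: value for name, value in match.groupdict().items() if value is not None}


print(regex_stage("My email is JP@mail.com", EMAIL_PATTERN))  # {'sensitive_email': 'JP@mail.com'}
print(regex_stage("My email is ***", EMAIL_PATTERN))          # {}
```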

Validate the result of the detection

Next comes the template stage:

- template:
    source: sensitive_email
    template: '{{ not (empty .Value) }}' 

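This stage takes the value that the regex stage extracted into sensitive_email (available inside the template as .Value), renders the template against it, and writes the output back into sensitive_email. The result is true when the regex captured something and false when it did not: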

| Log line | Value in sensitive_email | {{ not (empty .Value) }} | New value of sensitive_email |
|---|---|---|---|
| My email is JP@mail.com | JP@mail.com | true | true |
| My email is *** | (empty) | false | false |
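The logic in the table can be approximated in Python as follows. template_not_empty is a hypothetical helper; it assumes that a key the regex stage never set behaves like an empty string, which is what the second row of the table shows.

```python
def template_not_empty(extracted, source):
    """Hypothetical helper approximating a template stage that renders
    '{{ not (empty .Value) }}' for the key `source` (not Promtail's code).

    A missing key is treated as an empty string.
    """
    value = extracted.get(source, "")
    # `empty` is true for "", so `not (empty .Value)` is true exactly
    # when the regex stage captured something.
    rendered = "true" if value != "" else "false"
    # The rendered text overwrites the source key in the extracted data.
    return {**extracted, source: rendered}


print(template_not_empty({"sensitive_email": "JP@mail.com"}, "sensitive_email"))  # {'sensitive_email': 'true'}
print(template_not_empty({}, "sensitive_email"))                                 # {'sensitive_email': 'false'}
```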

Create a label to hold the result

For this, all we have to do is add a labels stage:

- labels:
    sensitive_email:

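This adds a sensitive_email label to the log line and sets its value to what is held in the extracted sensitive_email key (true or false). Because nothing follows sensitive_email:, the label takes both its name and its value from that same key.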
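Putting the three stages together for a single line, here is a rough Python sketch of the data flow. run_pipeline is a hypothetical helper that only mimics the stages; it is not how Promtail is implemented, and the stream labels passed in are illustrative.

```python
import re

EMAIL_PATTERN = re.compile(
    r"(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))"
)


def run_pipeline(line, stream_labels):
    """Hypothetical helper: regex -> template -> labels for one log line.

    Returns the labels the line would be stored with (an illustration of
    the data flow, not Promtail's implementation).
    """
    # 1. regex stage: named groups become extracted data.
    extracted = {}
    match = EMAIL_PATTERN.search(line)
    if match is not None:
        extracted.update(match.groupdict())

    # 2. template stage: replace the captured value with "true"/"false".
    value = extracted.get("sensitive_email", "")
    extracted["sensitive_email"] = "true" if value else "false"

    # 3. labels stage: `sensitive_email:` with no value means the label
    #    name and the extracted key are the same.
    labels = dict(stream_labels)
    labels["sensitive_email"] = extracted["sensitive_email"]
    return labels


print(run_pipeline("my data is JP@mail.com", {"app": "api", "job": "varlogs"}))
# {'app': 'api', 'job': 'varlogs', 'sensitive_email': 'true'}
```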

Example of it working

I added the following log statement to my API:

_logger.LogInformation($"my data is JP@mail.com");

Here is the result in Loki

Image of Loki with example of PII flagging

As you can see, the log line is

Close up image of the log content

and the value of sensitive_email is true

Close up image of label value

New content of promtail.yml

With the pipeline_stages above added, promtail.yml should look like this. I have also added a second check that detects credit card numbers.

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    pipeline_stages:      
      - match:
          pipeline_name: "security"
          selector: '{app="api"}'
          stages:

            - regex:
                expression: '(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})'
            - regex:
                expression: '(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))'

            - template:
                source: sensitive_creditcard
                template: '{{ not (empty .Value) }}'
            - template:
                source: sensitive_email
                template: '{{ not (empty .Value) }}'            

            - labels:
                sensitive_creditcard:
                sensitive_email:

    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*local.log
          app: 'api'
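To check how the two patterns in this file behave on a few sample lines, here is a rough Python sketch that folds each regex stage and its template stage into one check. pii_labels and the sample lines are hypothetical; the two patterns are copied from the file above.

```python
import re

# Both patterns copied from the promtail.yml above.
PATTERNS = {
    "sensitive_creditcard": re.compile(r"(?P<sensitive_creditcard>(?:\d[ -]*?){13,16})"),
    "sensitive_email": re.compile(
        r"(?P<sensitive_email>([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+))"
    ),
}


def pii_labels(line):
    """Hypothetical helper: label values the security pipeline would attach (approximation)."""
    labels = {}
    for name, pattern in PATTERNS.items():
        # regex stage + '{{ not (empty .Value) }}' template stage, combined.
        labels[name] = "true" if pattern.search(line) else "false"
    return labels


for line in [
    "card 4111 1111 1111 1111 accepted",
    "order 12345 shipped to JP@mail.com",
    "request took 12ms",
]:
    print(line, "->", pii_labels(line))
```

One thing the sketch makes easy to see: the credit card pattern matches any run of 13 to 16 digits, so a 13-digit millisecond timestamp such as 1700000000000 would also be flagged as a credit card.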

Using the results of these stages

There are several things you can do with these new log labels. Among others, you could:

  1. Create an alert that fires if PII leaks into your logs.
  2. Create dashboards based on the new labels.
  3. Route these logs to a different Loki tenant, one with the special privileges needed to view logs that contain PII.
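For the first idea, an alert boils down to a LogQL metric query over the new label. Here is a small Python sketch that builds such an expression; pii_alert_expr, the 5m window and the > 0 threshold are hypothetical choices, and the resulting expression would still need to go into a Loki ruler or Grafana alert rule.

```python
def pii_alert_expr(selector_labels, pii_label, window="5m"):
    """Hypothetical helper: build a LogQL expression that is > 0 when
    lines flagged with `pii_label="true"` were seen within `window`."""
    matchers = [f'{key}="{value}"' for key, value in selector_labels.items()]
    matchers.append(f'{pii_label}="true"')
    stream = "{" + ", ".join(matchers) + "}"
    # count_over_time counts log lines per stream; sum merges the streams.
    return f"sum(count_over_time({stream}[{window}])) > 0"


print(pii_alert_expr({"app": "api"}, "sensitive_email"))
# sum(count_over_time({app="api", sensitive_email="true"}[5m])) > 0
```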

Improvements

Merge the results of the regex matches into a single label.

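First, update the two template stages so that each one outputs true when its regex matched and nothing at all otherwise. This matters for the merge below: the string false is non-empty, so the or function in the next step would treat it as a match.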

- template:
    source: sensitive_email
    template: '{{ if not (empty .Value) }}true{{ end }}'

- template:
    source: sensitive_creditcard
    template: '{{ if not (empty .Value) }}true{{ end }}'

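Then add a template stage that creates a new sensitive key holding the merged result, followed by a labels stage for that key. The or function returns its first non-empty argument, so sensitive becomes true if either regex matched and falls back to false otherwise: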

- template:
    source: sensitive
    template: '{{ or .sensitive_email .sensitive_creditcard false }}'

- labels:
    sensitive:
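Why the template change is needed can be shown with a small Python imitation of the or function in Go templates, which returns the first non-empty argument or, if there is none, the last one. go_template_or and render_bool are hypothetical helpers written only for this illustration.

```python
def go_template_or(*args):
    """Hypothetical helper mimicking Go text/template's `or`: return the
    first truthy argument, otherwise the last argument. For strings, only
    "" is falsy, so the string "false" counts as true."""
    for arg in args:
        if arg:
            return arg
    return args[-1]


def render_bool(value):
    # Go templates print the boolean false as "false"; strings print as-is.
    return str(value).lower() if isinstance(value, bool) else value


# Original templates: keys hold "true" or "false". Here the card matched,
# but the non-empty string "false" for the email wins.
print(render_bool(go_template_or("false", "true", False)))  # false

# Improved templates: "true" when matched, "" otherwise.
print(render_bool(go_template_or("", "true", False)))  # true
print(render_bool(go_template_or("", "", False)))      # false
```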
