Luke Liukonen
Logging Frameworks... Moving from SEQ to f/Lo/G... and why I didn't pick ELK

First... to anyone out there with a "FLOG" app: I'm not trying to take the name away or claim it, but I do think Flog is a neat little way of describing this logging stack, similar to ELK... (FluentBit / Loki / Grafana)

the why

With the latest flare-ups of companies doing a "rug pull" on their licensing schemes, locking down functionality, and outright abandoning the people who put them where they are (looking at you, Redis), I felt it was time to start migrating any proprietary solutions over to free and open-source ones. I'm not in any way saying Datalust is doing this with their Seq product. I have used Seq for years as a log aggregator and it has worked very well. I did have a few hiccups a while back when looking at logs on my computer and phone at the same time, due to limitations of the software, but overall it has done what I wanted it to do: be a very lightweight logging system that just works out of the box. Evaluating the options, though, was rough. The industry standard says ELK (or an OpenSearch variation) is the way to go. Having Elasticsearch as your main engine, Logstash as the aggregator, and Kibana as your main UI felt very heavy to me for something that, at least in my mind, does one thing. Also... is it just me, or does setting up these solutions feel like you need a PhD in Logging to get them to work? So I did what any developer would do: google for 5 minutes, give up, and ask ChatGPT what it would do. A lot of its recommendations are, I'm sure, great, but for a full suite running on something like a Raspberry Pi, most of my research pointed to Grafana Loki as the main product to use.

I've used Loki before, and it was OK, but Grafana was kind of overkill. I originally started with Grafana / Prometheus / agents to monitor my systems. That setup is great, but very, very overkill for what I have, which is a basic home network with a handful of services I host. I prefer, and still use, Uptime Kuma as my main monitoring solution, but when using Loki I was overwhelmed, not so much by the data itself as by the sheer amount of it. I was using Promtail as my collector, and the system felt very, very slow, especially with the amount of data I collected. See, I was collecting all of my Docker container logs, and some of my home lab services are very chatty on the console. That said, I figured I'd give it a fresh start and use something other than Promtail. To me, a collector like Promtail that has to either tail the data live or query it every so many minutes felt like a waste. For my applications, I want live data flowing into one and only one location, and a limited amount of it.

I could have used Loki's REST API for my solution. I hate adding complexity to a project, but I do love performance, and FluentBit felt like a good fit here. I can configure my endpoints and how I want to connect to them, which I did. I picked a TCP connection since it sits a level down the network stack from HTTP and skips the request/response framing and headers, meaning each log event is as light as possible. It also gave me a trio of services similar to the ELK platform, each serving the same use case. The third reason I went with FluentBit is that if I ever need to switch over to ELK or OpenSearch, I can: FluentBit is a great, quick middleware platform and is often recommended for people who don't want to run Logstash in their ELK stack.
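To make the TCP idea concrete, here's a rough Python sketch (not my actual service code) of what a TCP log sink boils down to: serialize the event to one compact JSON line and push it down a socket. The field names and host are illustrative assumptions; port 12201 matches the Fluent Bit TCP input described further down.

```python
import json
import socket


def format_event(level, message, app):
    """Render a log event as a single compact-JSON line, similar in spirit
    to Serilog's CompactJsonFormatter (field names here are simplified)."""
    event = {"level": level, "message": message, "App": app}
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def send_event(host, port, payload):
    """Fire the event over a raw TCP connection: no HTTP framing or headers,
    which is what keeps each event so light."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = format_event("Warning", "disk usage above 90%", "backup-service")
    # send_event("localhost", 12201, line)  # needs the Fluent Bit container running
```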

One final thought on this: in terms of resources used, looking them up in my Portainer instance, my "Flog" stack appears to use just a few more resources than my Seq instance did. This is good, since I was worried that by cutting over to a multi-app stack I'd somehow be pushing the hardware limits of my Raspberry Pi.

the code

The cutover. At the moment I'm doing a proof of concept more than anything, but I found that Serilog for .NET was probably the best bet for cutting over. My services already use the Microsoft logging platform, and my original service was calling, in its service.cs, a simple

```csharp
services.AddLogging(loggingBuilder =>
{
    loggingBuilder.AddSeq(Configuration.GetSection("Seq"));
});
```

With Serilog, it took a bit, but I was able to get it fairly close to the same code. I needed to go to NuGet and install the serilog.aspnetcore and sink packages, and I really, really tried using the Serilog configuration package to make the setup easy, but settled for going without the automatic configuration bindings. This left me with

```csharp
services.AddLogging(loggingBuilder =>
{
    var logger = new LoggerConfiguration()
        .Enrich.WithProperty("App", name)
        .WriteTo.TCPSink(loggerConnection, new CompactJsonFormatter(), Serilog.Events.LogEventLevel.Warning)
        .CreateLogger();
    loggingBuilder.AddSerilog(logger, dispose: true);
});
```

Not as clean as the Seq setup, but to be fair, not too bad either. I can slim it down by not writing to the console if I don't want to, and I am adding an extra App property to the JSON structure so I can tell which app is producing the error message.

Now to the server-side stuff.
I am a huge fan of Docker Compose files, and this stack is no exception. The setup is as follows (note: anything with an _ is a folder):

├── docker-compose.yml
├── fluent-bit.conf
├── loki-config.yaml
├── _fluentbit
│ └── _logs
├── _loki
│ ├── _loki
│ └── _wal
└── _grafana

I prefer having my configs at the root of the project, with each service getting its own folder. The folders are needed for storing the live data. I'm not a fan of volumes in the Docker sense, and just bind-mount to folders that exist in the project.
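Bootstrapping that layout is a couple of commands (paths taken from the tree above; depending on image versions you may also need to chown the data folders to the container users):

```shell
# Create the folder layout from the tree above, starting at the stack root.
mkdir -p fluentbit/logs loki/loki loki/wal grafana
# Config files live next to docker-compose.yml at the root.
touch docker-compose.yml fluent-bit.conf loki-config.yaml
```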

The first step in the process is Fluent Bit. The config I have set is:

```conf
[INPUT]
    name http
    port 24224

[INPUT]
    Name            tcp
    Port            12201

[OUTPUT]
    Name        grafana-loki
    Match       *
    Url         http://loki:3100/loki/api/v1/push
    RemoveKeys  source,container_id
    Labels      {job="fluent-bit"}
    LabelKeys   container_name
    BatchWait   1s
    BatchSize   1001024
    LineFormat  json
    LogLevel    info
```

I define two inputs on two different ports (24224 and 12201). Input 1 is an HTTP endpoint that I can make POST calls to; this will come in handy for the basic bash shell scripts I have and logging their output. The second input is the TCP input, which I'll use for any application-based logging I do. The output is configured specifically for Loki, and the log level is set to info. While I could have gone stricter, I think having that limitation in the application fits better than in the logger (at least for my scenario, which is hosted 100% internally and has a lower likelihood of being breached).
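For the script side, the HTTP input just takes a JSON POST, where the URI path becomes the Fluent Bit tag. A small illustrative Python helper (the tag and record fields are made-up values, and it assumes the stack is reachable on localhost):

```python
import json
import urllib.request


def build_log_request(host, port, tag, record):
    """Build a POST against Fluent Bit's http input; the URI path becomes the tag."""
    data = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/{tag}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_log_request("localhost", 24224, "shell.jobs",
                            {"level": "info", "message": "nightly backup finished"})
    # urllib.request.urlopen(req)  # needs the Fluent Bit container running
```

From an actual bash script, the equivalent is a curl one-liner: `curl -s -X POST -H 'Content-Type: application/json' -d '{"message":"hi"}' http://localhost:24224/shell.jobs`.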

From FluentBit, we go to the Loki configuration:

```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9095

ingester:
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 15m
  max_chunk_age: 1h
  chunk_target_size: 1048576
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2020-02-25
      store: boltdb
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb:
    directory: /var/lib/loki/index
  filesystem:
    directory: /var/lib/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: true
  retention_period: 30d

compactor:
  working_directory: /var/lib/loki/boltdb-shipper
  shared_store: filesystem

ruler:
  storage:
    type: local
    local:
      directory: /var/lib/loki/rules
  rule_path: /var/lib/loki/rules-temp
```

I'm sure there is some cleanup I can do here. For one, I'm not using gRPC for any of my logging, so I don't think I need that part (I could be wrong, though). My goal is to keep data for about a month before throwing it away (which means persistence for the Loki instance). I've yet to run this for longer than a few hours, so there may end up being some corrections to these files.

Third and last is the compose file. I could have provisioned Loki as a data source right in the compose setup, or something like that, but it only takes a few seconds to add Loki (http://loki:3100) to the data sources in Grafana.

```yaml
version: '3'

services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3003:3000"
    volumes:
      - ./grafana:/var/lib/grafana
    networks:
      - logging_network

  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki:/var/lib/loki/
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - ./loki/wal:/wal
    networks:
      - logging_network

  fluent-bit:
    image: grafana/fluent-bit-plugin-loki:latest
    #image: fluent/fluent-bit:latest
    container_name: fluent-bit
    environment:
      - LOKI_URL=http://loki:3100/loki/api/v1/push
      - LOG_PATH=/var/log/*.log
    ports:
      - "24224:24224"
      - "12201:12201"
    volumes:
      - ./fluentbit/log:/var/log
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    networks:
      - logging_network

networks:
  logging_network:
    driver: bridge
```

I'm currently using the grafana/fluent-bit-plugin-loki container, which only runs on x64 at the moment. I do plan on trying to cut this over to the official Fluent Bit container, which supports ARM-based CPUs.
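Once Loki is added as a data source, the logs show up in Grafana's Explore view via LogQL. A few starter queries against the labels set in the Fluent Bit output config (the App value here is a made-up example; the property comes from the Serilog Enrich.WithProperty call earlier):

```logql
{job="fluent-bit"}                       # everything Fluent Bit has shipped
{job="fluent-bit"} |= "error"            # only lines containing "error"
{job="fluent-bit"} | json | App="MyApp"  # parse JSON lines, filter on the App property
```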

As with any of my posts, if you see any improvements I can make or ways to clean things up, let me know. I'm still in the middle of this project, but at a point where publishing a quick little how-to / why article felt OK to do.
