DEV Community

Hassan Ibrahim


Ship application logs from AWS ECS or EC2 to OpenSearch - Filebeat

In this post, we explore how to ship your applications' logs to OpenSearch using Filebeat. The applications may run as AWS Elastic Container Service (ECS) tasks, as Kubernetes Pods, or directly on AWS EC2 instances.
We will use Filebeat, a very popular lightweight shipper for logs. Whether you're collecting from security devices, cloud, containers, hosts, or OT, Filebeat helps you keep the simple things simple by offering a lightweight way to forward and centralize logs and files.
The stack we will use is known as ELK (Elasticsearch, Logstash, and Kibana). The same stack works with OpenSearch as well. Another post will discuss using FluentD instead of Logstash, but the Filebeat part remains the same.
The following diagram gives an overview of the solution:

Filebeat with ECS
From the above diagram, we have the following:

  1. The ECS cluster is composed of one or more EC2 instances.
  2. The applications run as containers on the EC2 instances.
  3. The containers' stdout and stderr logs are written to a location on the host, which is shared through volumes.
  4. Each EC2 instance runs one instance of Filebeat as a container.
  5. Filebeat reads the logs and forwards them to Logstash/FluentD.
  6. Logstash/FluentD collects and transforms the logs, then sends them to OpenSearch.

Optionally, you can configure Filebeat to send the data directly to OpenSearch if you don't need to transform the log data.

Configure and Build Filebeat Container Image

Now we will build a container image that sets up and configures a Filebeat instance as a container. This Filebeat instance collects the logs according to the filebeat.yml configuration file.
The Filebeat configuration file:

#=========================== Input for Harvesters ===============================
filebeat.inputs:
- type: log
  fields:
    type: syslogMessages
  scan_frequency: 5s
  close_inactive: 1m
  backoff_factor: 1
  backoff: 1s
  paths:
    - /host/var/log/messages
  tail_files: true
  fields_under_root: true

- type: log
  fields:
    type: ecsAgent
  scan_frequency: 5s
  backoff_factor: 1
  close_inactive: 10s
  backoff: 1s
  paths:
    - /host/var/log/ecs/ecs-agent.log.*
  fields_under_root: true


- type: container
  enabled: true
  fields_under_root: true
  overwrite_keys: true
  fields:
    type: docker
  paths:
    - /host/var/lib/docker/containers/*/*.log
  stream: all


#================================ General =====================================
filebeat.shutdown_timeout: 5s

logging.metrics.enabled: true
logging.metrics.period: 60s
logging.level: info

fields_under_root: true
fields:
  accountId: "'${ACCOUNTID}'"
  instanceId: ${INSTANCEID}
  instanceName: ${INSTANCENAME}
  region: ${REGION}
  az: ${AZ}
  environment: ${ENV}

#================================ Outputs =====================================
output.logstash:
  hosts: ["__OUTPUT.LOGSTASH__"]
  compression_level: 1
  worker: 1
  bulk_max_size: 1024
  ttl: 18000
  ssl.certificate_authorities: ["/etc/filebeat/logstash.pem"]
  max_retries: -1

# For debugging, uncomment the console output below and comment out the logstash output above
# output.console:
#   pretty: true

The configuration file above does the following:

  • Under filebeat.inputs:, we tell Filebeat to collect logs from 3 locations: the first is the host (EC2) system log, the second is the ECS agent log, and the third is the logs of all containers running on the host.
  • Then, we enable the Filebeat metrics and enrich the log entries with additional data such as the AWS account ID, the EC2 instance ID and name, the AWS region, the availability zone, and the environment.
  • Lastly, we tell Filebeat where to forward the data; here it is configured to send to a Logstash endpoint. All Filebeat configuration options are listed in the Filebeat reference documentation.
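The enrichment fields rely on environment variables such as ${ACCOUNTID} and ${REGION} being present when Filebeat starts. As a minimal sketch (the helper names and the script name are hypothetical, not part of Filebeat), the following bash script lists the ${VAR} placeholders in a config file and reports the ones that are unset or empty. It only recognizes the plain ${VAR} form used in the filebeat.yml above.

```shell
#!/bin/bash
# Print the names of all ${VAR} placeholders found in a config file, one per line.
list_placeholders() {
    grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u
}

# Print the placeholders from the file that are unset or empty in the environment.
missing_variables() {
    local name
    for name in $(list_placeholders "$1"); do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            echo "$name"
        fi
    done
}

# Usage (hypothetical script name): ./check-env.sh /etc/filebeat/filebeat.yml
if [ -n "${1:-}" ]; then
    missing=$(missing_variables "$1")
    if [ -n "$missing" ]; then
        echo "Unset variables referenced in $1:" $missing >&2
        exit 1
    fi
    echo "All referenced variables are set."
fi
```

Running a check like this before starting the container can catch a missing variable early, instead of discovering empty fields in the shipped log entries later.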

The next step is to create a container image and push it to AWS ECR. The Dockerfile is:

FROM docker.elastic.co/beats/filebeat:7.16.3

USER root

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
COPY filebeat.yml /etc/filebeat/filebeat.yml
COPY logstash.pem /etc/filebeat/logstash.pem
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD [ "filebeat", "-e", "-c", "/etc/filebeat/filebeat.yml", "-httpprof", "0.0.0.0:9700"]

As you may have noticed, the Dockerfile copies 3 files:

  • docker-entrypoint.sh: the entrypoint script for the Filebeat container; we will see it in the next step.
  • filebeat.yml: the Filebeat configuration file we just created above.
  • logstash.pem: the Logstash public key certificate file.

The docker-entrypoint.sh file:

#!/bin/bash
set -e

# Add filebeat as command if needed
if [ "${1:0:1}" = '-' ]; then
        set -- filebeat "$@"
fi

exec "$@"
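To see what the entrypoint does with its arguments, here is a small sketch that mirrors its argument handling but prints the resulting command line instead of running it with exec. The function name resolve_command is hypothetical and exists only for this illustration.

```shell
#!/bin/bash
# Mirror of the argument handling in docker-entrypoint.sh, without the exec,
# so the resulting command line can be inspected.
resolve_command() {
    # If the first argument starts with '-', treat all arguments as Filebeat flags
    # and put the filebeat command in front of them.
    if [ "${1:0:1}" = '-' ]; then
        set -- filebeat "$@"
    fi
    printf '%s\n' "$*"
}

resolve_command -e -c /etc/filebeat/filebeat.yml   # flags only: filebeat is prepended
resolve_command filebeat -e                        # full command: left unchanged
resolve_command sh                                 # any other command: left unchanged
```

This means the container can be started with only Filebeat flags, with the full command as in the Dockerfile's CMD, or with another command such as a shell for debugging.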

Alright, now you have everything you need to build the container image. Use the docker build command, then push the image to your repository.
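As a rough sketch of that build and push step, assuming the registry is AWS ECR and AWS CLI v2 is installed: the account ID, region, repository, and tag below are placeholders taken from the examples in this post, and the helper function names and script name are hypothetical. Nothing is pushed unless the script is called with the push argument.

```shell
#!/bin/bash
set -eo pipefail

# Placeholder values; override them through the environment.
AWS_ACCOUNT_ID="${AWS_ACCOUNT_ID:-123456789}"
AWS_REGION="${AWS_REGION:-eu-west-1}"
REPOSITORY="${REPOSITORY:-logging/filebeat}"
TAG="${TAG:-latest}"

# Build the ECR registry host name from an account ID and a region.
ecr_registry() {
    echo "$1.dkr.ecr.$2.amazonaws.com"
}

# Build the full image reference: registry/repository:tag.
image_ref() {
    echo "$(ecr_registry "$1" "$2")/$3:$4"
}

build_and_push() {
    local registry image
    registry="$(ecr_registry "$AWS_ACCOUNT_ID" "$AWS_REGION")"
    image="$(image_ref "$AWS_ACCOUNT_ID" "$AWS_REGION" "$REPOSITORY" "$TAG")"

    # Authenticate Docker against ECR.
    aws ecr get-login-password --region "$AWS_REGION" \
        | docker login --username AWS --password-stdin "$registry"

    # Build from the directory that holds the Dockerfile, filebeat.yml,
    # docker-entrypoint.sh and logstash.pem, then push the image.
    docker build --tag "$image" .
    docker push "$image"
}

# Usage (hypothetical script name): ./build.sh push
if [ "${1:-}" = "push" ]; then
    build_and_push
else
    echo "Would build and push: $(image_ref "$AWS_ACCOUNT_ID" "$AWS_REGION" "$REPOSITORY" "$TAG")"
fi
```

For a registry other than ECR, only the login step and the image reference would need to change.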

Now the tricky part: running Filebeat as a container on each EC2 instance. The ECS cluster runs on EC2 instances that are managed by an Auto Scaling group, which is responsible for keeping the desired number of instances running. In the LaunchConfiguration for the Auto Scaling group, we will execute some commands to start a Filebeat container on each EC2 instance.
We will create a shell script that does this job, called filebeat.sh. The file is uploaded to an S3 bucket as part of the build pipeline and then downloaded during EC2 startup by the LaunchConfiguration.

#!/bin/bash
# Fetch the instance identity document once and extract the fields from it
IDENTITY=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document)
ACCOUNTID=$(echo "$IDENTITY" | grep -oP '(?<="accountId" : ")[^"]*(?=")')
REGION=$(echo "$IDENTITY" | grep -oP '(?<="region" : ")[^"]*(?=")')
AZ=$(echo "$IDENTITY" | grep -oP '(?<="availabilityZone" : ")[^"]*(?=")')
INSTANCEID=$(echo "$IDENTITY" | grep -oP '(?<="instanceId" : ")[^"]*(?=")')
# The instance name comes from the Name tag of the instance
INSTANCENAME=$(aws ec2 describe-tags --filters "Name=key,Values=Name" "Name=resource-id,Values=$INSTANCEID" --region "$REGION" --query 'Tags[0].Value' --output text)
ENV=$1

# Log in to the ECR registry that hosts the Filebeat image
AwsAccountId="123456789"
ECR="$AwsAccountId.dkr.ecr.eu-west-1.amazonaws.com"
# "aws ecr get-login" was removed in AWS CLI v2; get-login-password replaces it
aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin "$ECR"

docker run --restart=always \
           --detach \
           --memory=200m \
           --memory-reservation=55m \
           --memory-swap=300m \
           --name='filebeat' \
           --env "ACCOUNTID=$ACCOUNTID" \
           --env "ENV=$ENV" \
           --env "REGION=$REGION" \
           --env "AZ=$AZ" \
           --env "INSTANCEID=$INSTANCEID" \
           --env "INSTANCENAME=$INSTANCENAME" \
           --publish 9700:9700 \
           --volume /var/lib/docker/containers/:/host/var/lib/docker/containers/ \
           --volume /var/log:/host/var/log/ \
           --volume /opt/filebeat/:/host/opt/filebeat/ \
           --log-driver=json-file \
           --log-opt labels=loggingstack.program,loggingstack.program-version \
           --log-opt max-size=50m \
           --log-opt max-file=5 \
           $ECR/logging/filebeat:latest

The shell script performs the following 3 steps:

  1. Extract the additional fields configured in the filebeat.yml file, such as the AWS accountId, from the EC2 metadata endpoint (the instance name is read from the instance's Name tag).
  2. Log in to the ECR repository that holds the Filebeat container image. This step varies with the container image registry.
  3. Use the docker run command to create a Filebeat container. The command passes the environment variables, the image, and some volumes. Note that the container gets access to the other containers' logs through --volume /var/lib/docker/containers/:/host/var/lib/docker/containers/
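If your instances require IMDSv2 (session tokens) for the metadata endpoint, plain curl requests like the ones in the script are rejected. The following is a hedged sketch of the first step under that assumption: it requests a token, fetches the identity document once, and extracts the fields with sed. The function names and the sample document are hypothetical and only there so the parser can be tried on any machine.

```shell
#!/bin/bash
# Base URL of the EC2 instance metadata service.
IMDS="http://169.254.169.254/latest"

# Extract a string field from a JSON document with sed.
# $1 = JSON text, $2 = field name. Prints the first match only.
json_field() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | head -n 1
}

# Fetch the instance identity document using an IMDSv2 session token.
fetch_identity_document() {
    local token
    token=$(curl -s -X PUT "$IMDS/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
    curl -s -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS/dynamic/instance-identity/document"
}

# Only contact the metadata service when called with "fetch".
if [ "${1:-}" = "fetch" ]; then
    DOC=$(fetch_identity_document)
else
    # Hypothetical sample document for trying the parser outside EC2.
    DOC='{
  "accountId" : "123456789",
  "availabilityZone" : "eu-west-1a",
  "instanceId" : "i-0abc",
  "region" : "eu-west-1"
}'
fi

ACCOUNTID=$(json_field "$DOC" accountId)
REGION=$(json_field "$DOC" region)
AZ=$(json_field "$DOC" availabilityZone)
INSTANCEID=$(json_field "$DOC" instanceId)
echo "ACCOUNTID=$ACCOUNTID REGION=$REGION AZ=$AZ INSTANCEID=$INSTANCEID"
```

A JSON tool such as jq would be more robust than sed if it is available on the instance image; sed is used here only to avoid an extra dependency.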

The last step is to make the LaunchConfiguration execute the shell script. We used AWS CloudFormation to create the ECS cluster. If you are creating the cluster from the AWS Console, just copy the content under UserData. The LaunchConfiguration is as follows:

ECSLaunchConfiguration:
        Type: 'AWS::AutoScaling::LaunchConfiguration'
        Properties:
            IamInstanceProfile: !GetAtt ECSInstanceProfile.Arn
            ImageId: !FindInMap [AmazonMachineImages, !Ref 'AWS::Region', Id]
            InstanceType: !Ref ECSInstanceType
            KeyName: !Ref KeyPairName
            SecurityGroups:
                -  !Ref ECSSecurityGroup
            UserData:
                Fn::Base64: !Sub |
                    #!/bin/bash -x
                    aws s3 cp [S3_PATH] filebeat.sh
                    chmod +x filebeat.sh
                    ./filebeat.sh ${EnvType}

Don't forget to replace [S3_PATH] with your own path. ${EnvType} is a parameter of the CloudFormation stack.
The next post configures Logstash to complete the scenario.
Thanks.
