Kunal Shah

Airflow on EC2

Airflow — The Easy Way

“Running Airflow on AWS EC2 & RDS using docker-compose”

Hello Folks,

I am Kunal Shah, an AWS Certified Solutions Architect, helping clients achieve optimal solutions on the cloud. I am a cloud enabler by choice, with 7+ years of experience in the IT industry.

I love to talk about Cloud Technology, DevOps, Digital Transformation, Analytics, Infrastructure, Dev Tools, Operational efficiency, Cost Optimization, Cloud Networking & Security.

You can reach out to me @ www.linkedin.com/in/kunal-shah07

Abstract

For a quick setup of Apache Airflow, we will deploy Airflow using docker-compose and run it on AWS EC2 & RDS instances.

Some readers reached out to me asking for an easier & more development-friendly playground for an Airflow setup on AWS.

Here I am with Airflow — The Easy Way

Table Of Contents

  • Introduction

  • Prerequisites

  • Architecture

  • AWS Infrastructure Provisioning

  • Airflow Provisioning

  • Environment Validation

  • Cleanup

Introduction -

Airflow — Please check my first blog

docker-compose — It is used to run multiple containers as a single service. For example, suppose you had an application which required NGINX and MySQL; you could create one file which would start both containers as a service, without the need to start each one separately.
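
For instance, a minimal toy sketch (run it in an empty scratch directory; the image tags & password below are placeholders) that starts NGINX and MySQL together from a single file:

    $ cat > docker-compose.yaml <<'EOF'
    version: "3"
    services:
      web:
        image: nginx:latest
        ports:
          - "80:80"
      db:
        image: mysql:8
        environment:
          MYSQL_ROOT_PASSWORD: example
    EOF
    $ docker-compose up -d    # one command starts both containers as a unit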

The Airflow docker-compose.yaml contains several service definitions:

  1. airflow-scheduler — The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
  2. airflow-webserver — The webserver, available at http://localhost:8080.
  3. airflow-worker — The worker that executes the tasks given by the scheduler.
  4. airflow-init — The initialization service.
  5. flower — The Flower app for monitoring the environment, available at http://localhost:5555.
  6. redis — The broker that forwards messages from the scheduler to the workers.

Some directories on the host are mounted into the containers, which means their contents are synchronized between the EC2 host and the Airflow services.

  • ./dags — you can put your DAG files here.

  • ./logs — contains logs from task execution and scheduler.

  • ./plugins — you can put your custom plugins here.
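
Once the stack is up (see Airflow Provisioning below), a quick way to confirm the sync is to list one of these folders from inside a container; the official compose file mounts them at /opt/airflow/dags, /opt/airflow/logs & /opt/airflow/plugins:

    # drop a file into ./dags on the host, then check it from the scheduler container
    $ docker-compose exec airflow-scheduler ls /opt/airflow/dags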

Prerequisites -

  • Must have access to an AWS account with the required roles or permissions. The below steps can be run from an AWS EC2 instance (Ubuntu) in the given AWS account with the necessary access permissions.

  • AWS Services — Full Access to RDS, EC2, IAM, S3, VPC

  • Tools Dependencies — AWS CLI (V2), Cron, docker-compose

Architecture -

High Level — Airflow on EC2 & RDS Architecture

AWS Infrastructure Provisioning -

  • Create two S3 buckets for DAGs & Plugins from AWS Console.
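
If you prefer the CLI over the console, a quick sketch (the bucket names below are placeholders; S3 bucket names must be globally unique):

    $ aws s3 mb s3://my-airflow-dags-bucket
    $ aws s3 mb s3://my-airflow-plugins-bucket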

  • An Amazon EC2 instance running the latest Ubuntu AMI.

  • An Amazon RDS PostgreSQL database.
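
A rough CLI equivalent for the database (the instance identifier, class & credentials below are assumptions; pick values matching your CloudFormation parameters):

    $ aws rds create-db-instance \
        --db-instance-identifier airflow-metadata-db \
        --db-instance-class db.t3.micro \
        --engine postgres \
        --master-username airflow \
        --master-user-password <your-password> \
        --allocated-storage 20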

  • Deploy the CloudFormation scripts from the repo.

  • Your AWS EC2 Instance & AWS RDS Instance are ready to use.

  • Install AWS CLI version 2 & configure it: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

    $ aws configure
    AWS Access Key ID [None]: (Your Access Key)
    AWS Secret Access Key [None]: (Your Secret Key)
    Default region name [None]: (Your Region)
    Default output format [None]: json
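
A quick sanity check that the credentials work:

    $ aws sts get-caller-identity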

  • Install Ubuntu Desktop & XRDP for remote RDP.

    # sudo apt-get update && sudo apt-get upgrade
    # sudo apt install tasksel
    # sudo tasksel install ubuntu-desktop
    # sudo reboot (log in to the EC2 instance again, then run the command below)
    # sudo apt-get install xrdp
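
Make sure the xrdp service is enabled & running, and that your EC2 security group allows inbound RDP on port 3389:

    # sudo systemctl enable --now xrdp
    # sudo systemctl status xrdp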

  • Now you can either change the password of the default ubuntu user or create a new user.

  • This will be used for RDP authentication.
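
For example, to set a password on the default ubuntu user:

    # sudo passwd ubuntu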

  • Install the vim editor -> sudo apt install vim

  • Install cron -> sudo apt install cron

  • (Optional) Install the Google Chrome browser. Run the below commands in the given order.

    # wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
    # sudo apt install ./google-chrome-stable_current_amd64.deb

Airflow Provisioning -

  • Copy the docker-compose.yaml file onto the AWS EC2 instance & update the below parameters.

    AIRFLOW__CORE__SQL_ALCHEMY_CONN
    AIRFLOW__CELERY__RESULT_BACKEND
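
Both values point at the RDS PostgreSQL instance. A sketch of how the two entries look in the environment section of docker-compose.yaml (the endpoint, password & database name are placeholders for your RDS details):

    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:<password>@<rds-endpoint>:5432/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:<password>@<rds-endpoint>:5432/airflow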

  • Set the env variables -> echo -e "AIRFLOW_UID=50000\nAIRFLOW_GID=0" > .env

  • Create local folders on the EC2 instance -> mkdir ./dags ./logs ./plugins

  • Install docker-compose ->

    sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/bin/docker-compose
    sudo chmod +x /usr/bin/docker-compose
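
Confirm the binary is executable and on the PATH:

    $ docker-compose --version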

  • Set a crontab to sync the S3 buckets to the EC2 local folders.

    # crontab -e
    # add the below lines inside the editor
    * * * * * /usr/local/bin/aws s3 sync s3://<your-dags-bucket> /root/dags/
    * * * * * /usr/local/bin/aws s3 sync s3://<your-plugins-bucket> /root/plugins/
    # change the bucket names as per your environment
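
You can test the sync command once by hand before relying on cron (the bucket name is a placeholder):

    $ aws s3 sync s3://<your-dags-bucket> /root/dags/ --dryrun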

  • Start the Cron service -> service cron start

  • Deploy Airflow through docker-compose -> docker-compose up -d

  • Please verify the container status using the below commands from the EC2 bash terminal

    # docker ps
    # docker-compose run airflow-worker airflow info

docker ps — output
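
You can also probe the webserver directly from the instance; an HTTP 200 or a redirect to the login page means it is up:

    $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080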

  • To upload custom DAGs to the Airflow Web UI -

  • We need to upload the DAG & plugin files to the respective S3 buckets created earlier, as shown below.
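
For example (the bucket name is a placeholder; the cron job then syncs the file down to the EC2 folder within a minute):

    $ aws s3 cp my_dag.py s3://<your-dags-bucket>/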

Environment Validation -

Airflow Web UI

  • Enter Credentials

    username — airflow
    password — airflow

  • After logging in, check the DAGs & start running them.

Example DAGs

  • As you trigger the DAG, the scheduler queues its tasks and the Celery worker container executes the code included in the DAG.
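
A DAG can also be triggered from the CLI inside a worker container (the DAG id below is one of the bundled examples):

    $ docker-compose run airflow-worker airflow dags trigger example_bash_operator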

DAGs Running Status

  • Check the RDS connections on the AWS console; it will show the current connections from the Airflow containers.

  • Voilaaaa..!! Airflow is ready on AWS EC2 & RDS.

  • Pros - easy, fast, developer-friendly setup

  • Cons - not production-ready; performance issues & slowness

Cleanup -

  • Stop & remove the Airflow containers -> docker-compose down
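
To also remove the volumes and downloaded images, docker-compose supports a fuller teardown:

    $ docker-compose down --volumes --rmi all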

  • Delete the CloudFormation stacks for the AWS EC2 & RDS resources.

  • Delete the S3 buckets created from the console.

THANK YOU & FOLLOW FOR MORE..

I had fun deploying this setup & playing around with AWS EC2, RDS & Airflow.

Hope you guys like it & start playing around.

“Nothing is particularly hard if you break it down into small bits”

Image Source — Google
