DEV Community

Stefen
Stefen

Posted on

ELT Airflow Pipeline Project

About

Project using data engineering concepts.

The project is an ELT (Extract, Load, Transform) data pipeline, orchestrated with Apache Airflow through Docker containers.

Faker is used as a package to generate data to a mysql database. The data is extracted from mysql, transformed with pandas and Sql and then loaded into an Olap postgres database. A notification is then sent by email once the whole process is completed.

Architecture

Prerequisites

Setup mailtrap

One platform toTest, Send, Control your emails:

Setup

Clone the project to your desired location:

$ git clone https://github.com/Stefen-Taime/airflow_etl.git
Enter fullscreen mode Exit fullscreen mode

fill the AIRFLOW_SMTPSMTP_USER, AIRFLOWSMTPSMTP_PASSWORD, AIRFLOWSMTP_SMTP_MAIL_FROM in .envExample file:

AIRFLOW_ADMIN_MAIL=airflow
AIRFLOW_ADMIN_FIRSTNAME=airflow
AIRFLOW_ADMIN_NAME=airflow
AIRFLOW_ADMIN_PASSWORD=airflowpassword
AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflow
AIRFLOW__CORE__FERNET_KEY=81HqDtbqAywKSOumSha3BhWNOdQ26slT6K0YaZeZyPs=
AIRFLOW_CONN_METADATA_DB=postgres+psycopg2://airflow:airflowpassword@postgres:5432/airflow
AIRFLOW_VAR__METADATA_DB_SCHEMA=airflow
AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC=5
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__SMTP__SMTP_HOST=smtp.mailtrap.io
AIRFLOW__SMTP__SMTP_PORT=2525
AIRFLOW__SMTP__SMTP_USER=xxxxxxxxxxx
AIRFLOW__SMTP__SMTP_PASSWORD=xxxxxxx
AIRFLOW__SMTP__SMTP_MAIL_FROM=your_email@gmail.com
AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflowpassword
POSTGRES_DB=airflow
AIRFLOW_UID=1000
AIRFLOW_GID=0
AIRFLOW_UID=1000
AIRFLOW_GID=0
AIRFLOW_UID=1000
AIRFLOW_GID=0
PG_VER=14-alpine
POSTGRES_SRC_PASSWORD=Sup3rS3c3t
PORT=5432
POSTGRES_USER_OLAP=postgres
HOSTNAME=olap
ONTAINER_NAME=postgres
POSTGRES_DB_OLAP=postgres
Enter fullscreen mode Exit fullscreen mode

grant permissions to the bash script:

chmod a+x build_Services.sh
Enter fullscreen mode Exit fullscreen mode

Bash:

$ ./build_Services.sh
Enter fullscreen mode Exit fullscreen mode

Build Docker:

$ docker-compose up --build -d
Enter fullscreen mode Exit fullscreen mode

When everything is done, you can check all the containers running:

$ docker ps
Enter fullscreen mode Exit fullscreen mode




oltp Interface

Now you can access adminer web interface by going to http://localhost:8085 with the default user which is in the docker-compose.yml:

Système     MySQL
Serveur oltp
user root
password myrootpassword

Database testdb
Enter fullscreen mode Exit fullscreen mode




olap Interface

Now you can access new adminer web interface by going to http://localhost:8085 with the default user which is in the docker-compose.yml:

Système     PostgesSQL
Serveur olap
user postgres
password Sup3rS3c3t

Database postgres
Enter fullscreen mode Exit fullscreen mode




Airflow Interface

Now you can access Airflow web interface by going to http://localhost:8080 with the default user which is in the docker-compose.yml. Username/Password: airflow/airflowpassword:

Airflow DAG

Now you can run Airflow etl dag:

Check oltp and olap database

:)

Check your mailtrap.io/inboxes

Shut down or restart Airflow

If you need to make changes or shut down:

$ docker-compose down
Enter fullscreen mode Exit fullscreen mode




References

https://medium.com/@stefentaime_10958/elt-airflow-pipeline-project-dcf834c1be17

Top comments (0)