How to run a development environment on docker-compose
A quick overview of how to run Apache Airflow for development and tests on your local machine using docker-compose.
We will still be using the unofficial puckel/docker-airflow image. There is already an official Docker image, but I haven't tested it yet.
Requirements
- docker
- docker-compose - https://docs.docker.com/compose/install/
Project structure
- docker-compose.yml - configuration file for docker-compose
- dags - will contain all our DAGs
- lib - will contain all our custom code
- plugins - will contain our custom Airflow plugins
- test - will contain our pytests
- requirements.txt - extra Python packages to install in the containers
- .env - file with environment variables that we want to pass into the containers
The environment variables are very handy because they let you customize almost everything in Airflow (https://airflow.apache.org/docs/stable/best-practices.html?highlight=environment#configuration).
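Airflow maps any variable of the form AIRFLOW__{SECTION}__{KEY} onto the corresponding airflow.cfg entry, so a minimal .env could look like this (the values below are just illustrative, not from the original setup):
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
# connections can be injected the same way, as AIRFLOW_CONN_{CONN_ID} URIs
AIRFLOW_CONN_MY_POSTGRES=postgres://airflow:airflow@postgres:5432/airflow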
docker-compose.yml
The basic structure:
version: '2.1'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
  webserver:
    image: puckel/docker-airflow:1.10.9
    restart: always
    mem_limit: 2048m
    depends_on:
      - postgres
    env_file:
      - .env
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./test:/usr/local/airflow/test
      # custom plugins live here
      - ./plugins:/usr/local/airflow/plugins
      # extra Python dependencies, installed by the image's entrypoint on start
      - ./requirements.txt:/requirements.txt
      # share AWS credentials with the container (optional)
      - ~/.aws:/usr/local/airflow/.aws
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
As you can see, we have several things here:
- we pass custom environment variables straight from the dotenv file (best practice is not to commit it to the repository) - a quick way to verify they reach the container is shown below
- we use a Postgres instance, running as another Docker container, as the metadata database
- we share our dags/test/plugins directories with the host, so we can edit the code on our local machine and run all the tests in the container
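To check that the variables from .env actually end up in the container, something like this should do (the grep pattern is just an example; adjust it to whatever you put in .env):
$ docker-compose run webserver env | grep AIRFLOW__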
Dummy DAG
Let's create our first DAG: dags/dummy_dag.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

# the dag_id must match what we assert on in the tests later
with DAG('dummy_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')
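Once the environment from the next section is up, you can double-check from the CLI that the DAG was picked up (airflow list_dags is the Airflow 1.10 command; in 2.x it became airflow dags list):
$ docker-compose exec webserver airflow list_dags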
Running the environment
$ docker-compose up
Starting airflow-on-docker-compose_postgres_1 ... done
Starting airflow-on-docker-compose_webserver_1 ... done
Attaching to airflow-on-docker-compose_postgres_1, airflow-on-docker-compose_webserver_1
[...]
webserver_1 | {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2020-05-05 10:19:08,741] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1 | [2020-05-05 10:19:08,743] {{dagbag.py:403}} INFO - Filling up the DagBag from /usr/local/airflow/dags
Let's open http://localhost:8080 in the browser.
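If you don't want the logs to occupy your terminal, the same stack can be started in the background:
$ docker-compose up -d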
Running the tests in the environment
To run the tests in the environment, we can simply run:
docker-compose run webserver bash
This gives us a bash shell running inside the container:
➜ airflow-on-docker-compose git:(master) ✗ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
airflow@be3e69366e23:~$ ls
airflow.cfg dags plugins test
airflow@be3e69366e23:~$ pytest test
bash: pytest: command not found
Of course, we haven't installed pytest yet. Fixing this is easy: the image's entrypoint pip-installs everything listed in /requirements.txt (which we mounted in docker-compose.yml) every time a container starts, so we just add pytest there:
$ echo "pytest" >> requirements.txt
$ docker-compose run webserver bash
Starting airflow-on-docker-compose_postgres_1 ... done
Collecting pytest
Downloading pytest-5.4.1-py3-none-any.whl (246 kB)
|████████████████████████████████| 246 kB 222 kB/s
Collecting more-itertools>=4.0.0
Downloading more_itertools-8.2.0-py3-none-any.whl (43 kB)
|████████████████████████████████| 43 kB 3.1 MB/s
Collecting wcwidth
Downloading wcwidth-0.1.9-py2.py3-none-any.whl (19 kB)
Requirement already satisfied: importlib-metadata>=0.12; python_version < "3.8" in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (1.5.0)
Collecting packaging
Downloading packaging-20.3-py2.py3-none-any.whl (37 kB)
Collecting pluggy<1.0,>=0.12
Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Collecting py>=1.5.0
Downloading py-1.8.1-py2.py3-none-any.whl (83 kB)
|████████████████████████████████| 83 kB 956 kB/s
Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/site-packages (from pytest->-r /requirements.txt (line 1)) (19.3.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < "3.8"->pytest->-r /requirements.txt (line 1)) (2.2.0)
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from packaging->pytest->-r /requirements.txt (line 1)) (1.14.0)
Collecting pyparsing>=2.0.2
Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
|████████████████████████████████| 67 kB 624 kB/s
Installing collected packages: more-itertools, wcwidth, pyparsing, packaging, pluggy, py, pytest
WARNING: The scripts py.test and pytest are installed in '/usr/local/airflow/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed more-itertools-8.2.0 packaging-20.3 pluggy-0.13.1 py-1.8.1 pyparsing-2.4.7 pytest-5.4.1 wcwidth-0.1.9
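Since the entrypoint reinstalls these packages on every container start, it's worth pinning the versions in requirements.txt so the environment stays reproducible, e.g. (the version comes from the install log above):
pytest==5.4.1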
We can implement our first basic test, taken directly from the Airflow best-practices docs (https://github.com/apache/airflow/blob/master/docs/best-practices.rst), in test/test_dag_loading.py:
from airflow.models import DagBag

def test_dag_loading():
    dagbag = DagBag()
    dag = dagbag.get_dag(dag_id='dummy_dag')
    assert dagbag.import_errors == {}
    assert dag is not None
    assert len(dag.tasks) == 1
And now we can run our tests (using the full path, since, as pip warned above, ~/.local/bin is not on the PATH):
airflow@a6ca8c1b706d:~$ .local/bin/pytest
========================================================================== test session starts ==========================================================================
platform linux -- Python 3.7.6, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /usr/local/airflow
plugins: celery-4.4.0
collected 1 item
test/test_dag_loading.py .
===================================================================== 1 passed in 0.83s =====================================================================
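If you add more DAGs later, you can extend this into a single integrity test that covers all of them. A rough sketch (my addition, not from the original article), using pytest's parametrize:
import pytest
from airflow.models import DagBag

# build the DagBag once at import time so parametrize can see the dag_ids
dagbag = DagBag()

@pytest.mark.parametrize('dag_id', list(dagbag.dags))
def test_dag_integrity(dag_id):
    # every DAG should import cleanly and define at least one task
    dag = dagbag.get_dag(dag_id)
    assert dag is not None
    assert len(dag.tasks) > 0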
All the code can be found here: https://github.com/troszok/airflow-on-docker-compose