Jano Roberto Camacho Vicente

Unleashing the Boundless Power of AWS EC2: Empowering Airflow and Beyond

In this lab we are going to create an EC2 instance running Ubuntu and install Apache Airflow on it to programmatically author, schedule, and monitor workflows, using Postgres as the backend database. Why Postgres? Because, unlike the default SQLite backend, it supports executors that let Airflow run multiple tasks at the same time.

Architecture Diagram


Creating an EC2 Instance

  1. In the AWS Management Console search bar, enter EC2, and click the EC2 result under Services. You will be placed in the Amazon EC2 console.

  2. To start creating a new EC2 instance, in the left-hand pane, click the Instances option. The EC2 Instances list will load.

  3. Click Launch instances. The Launch an instance form will load.

  4. Under the Name and tags section, enter the following:

    • Name: airflow-instance
  5. Under the Application and OS Images (Amazon Machine Image) section, select the Ubuntu option.

  6. Under the Instance type section, in the drop-down list, select the t2.small instance type. We need it because PostgreSQL, which we will use as the Airflow database, requires a minimum of 2 GB of RAM to work properly.

    • Warning: The t2.small instance type is not eligible for the AWS Free Tier; however, it is still cheap to run for a short demo.
  7. For a production environment we would need to create a Key pair (login) to access our instance from a terminal; however, for this demo we will skip it, since we will access our instance through EC2 Instance Connect.
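
    Note: If we later do want a key pair (e.g., for a production setup), one can also be created from the AWS CLI. A minimal sketch, where the key name airflow-key is just an illustrative choice:

    # Create a key pair and save its private key locally
    aws ec2 create-key-pair --key-name airflow-key --query 'KeyMaterial' --output text > airflow-key.pem
    # Restrict permissions so ssh accepts the key file
    chmod 400 airflow-key.pem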

  8. Under the Network settings section, select the following:

    • Create security group: Checked (In this lab we are going to create a new Security Group; however, if you already have one, feel free to use it)
    • Allow SSH traffic from: Checked (We enable this for demo purposes; however, for production environments it is recommended to restrict access to specific source IP ranges)
    • Note: As shown in the image above, we are also going to create a new Security Group called "launch-wizard-1". It is important to keep this in mind because later we are going to modify this security group to open port 8080 for inbound requests.
  9. Finally, under the Summary section located at the top-right, click the Launch instance button.

  10. The Create key pair dialog will be shown to confirm or skip the creation of a key pair. For demo purposes, select the Proceed without key pair option and then click the Proceed without key pair button.

  11. Click the Launch instance button again.

  12. Go back to the list of instances by clicking the View all instances button that appears at the bottom-right.
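
    Note: As a side note, everything we just did in the console can also be scripted with the AWS CLI. A rough sketch, where the AMI ID is a placeholder you would need to look up for your region, and the security group is assumed to exist already in the default VPC:

    # Launch one t2.small Ubuntu instance tagged airflow-instance
    aws ec2 run-instances \
      --image-id <ubuntu_ami_id> \
      --instance-type t2.small \
      --count 1 \
      --security-groups launch-wizard-1 \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=airflow-instance}]'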

  13. The next step is to configure the Security Group "launch-wizard-1" created in the previous phase. To do that, go to the left navigation pane and click Security Groups under the Network & Security section. The Security Groups list will load.

  14. Select the launch-wizard-1 security group, click the Actions button, and then click Edit inbound rules.

  15. In the Edit inbound rules form, click the Add rule button.

  16. For demo purposes, enter the following values and click the Save rules button:

    • Type: Custom TCP
    • Protocol: TCP
    • Port range: 8080
    • Source: Anywhere - 0.0.0.0/0
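
    Note: The same inbound rule can also be added from the AWS CLI if we prefer to script it. A minimal sketch, assuming the default VPC and the group name created above:

    # Open TCP port 8080 to all source addresses (demo only)
    aws ec2 authorize-security-group-ingress \
      --group-name launch-wizard-1 \
      --protocol tcp \
      --port 8080 \
      --cidr 0.0.0.0/0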

Installing Airflow

  1. Go back to the list of instances by clicking the Instances option located in the left-hand navigation pane.

  2. When the Instance state column shows Running, right-click our EC2 instance called airflow-instance and then click Connect. The Connect to instance form will load.

  3. Select the EC2 Instance Connect tab. Under it, in the Connection Type section, select Connect using EC2 Instance Connect.

  4. Under the Public IP address section, copy the public IP address and save it in a secure place (we will use it later to connect to our Airflow instance).

  5. Click Connect. A new browser tab with an SSH session will load.

  6. Inside the new SSH browser tab, type the following to update the OS package index:

    sudo apt update
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  7. Install pip for Python 3. The command below will also install all the dependencies required for building Python modules.

    sudo apt install python3-pip
    

    We may be asked to confirm the installation of additional dependencies. If prompted, type Y and then press Enter to continue.

  8. After pip is installed, we will be asked to select the services we want to restart. Select all of them (we can toggle each one using the space bar on our keyboard) and press Enter.

  9. Install SQLite 3. Airflow uses SQLite as its default backend, so we will rely on this package for the initial installation before switching to Postgres.

    sudo apt install sqlite3
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  10. A good practice when installing Python packages is to work inside a virtual environment. This keeps the packages we install isolated and easier to manage:

    sudo apt install python3.10-venv
    

    We may be asked to confirm the installation of additional dependencies. If prompted, type Y and then press Enter to continue.

  11. Create a new virtual environment called venv:

    python3 -m venv venv
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  12. Activate our virtual environment created in the previous step:

    source venv/bin/activate
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  13. Because we are going to install Airflow with Postgres as its database, we first need the Postgres client development library:

    sudo apt-get install libpq-dev
    

    We may be asked to confirm the installation of additional dependencies. If prompted, type Y and then press Enter to continue.

  14. After libpq-dev is installed, we will be asked to select the services we want to restart. Select all of them (we can toggle each one using the space bar on our keyboard) and press Enter.

  15. Install the Airflow package with the postgres extra and its pinned dependency constraints. The version selected for this demo is 2.5.0. Note that the constraint file must match our Python version, which is 3.10 here (the same version as the python3.10-venv package installed earlier):

    pip install "apache-airflow[postgres]==2.5.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.0/constraints-3.7.txt"
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  16. Initialize the database backend in Airflow:

    airflow db init
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  17. Install Postgres:

    sudo apt-get install postgresql postgresql-contrib
    

    We may be asked to confirm the installation of additional dependencies. If prompted, type Y and then press Enter to continue.

  18. Switch to the postgres system user created by the installation in the previous step:

    sudo -i -u postgres
    


  19. Start the postgres client:

    psql
    


  20. Create the Airflow database, the Airflow user, and grant it the necessary permissions:

    CREATE DATABASE airflow;
    CREATE USER airflow WITH PASSWORD 'airflow';
    GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  21. Exit the psql client and return to the postgres user's shell:

    \q
    


  22. Exit the postgres user's shell and return to our default user:

    exit
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  23. The next step is to change the sql_alchemy_conn and executor values inside the airflow.cfg file, pointing Airflow at our new Postgres database and enabling parallel task execution. Open airflow.cfg and replace the values of the sql_alchemy_conn and executor variables using vim, nano, or any other preferred editor:

    cd ~/airflow/
    vim airflow.cfg
    
    # Locate the "sql_alchemy_conn =" line and replace the current line by this:
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost/airflow
    
    # Locate the "executor =" line and replace the current line by this:
    executor = LocalExecutor
    
    # Save the changes and close the airflow.cfg file
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  24. Initialize the database backend again so Airflow picks up the new configuration:

    airflow db init
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  25. Create a new admin user called airflow:

    airflow users create -u airflow -f airflow -l airflow -r Admin -e airflow@gmail.com
    

    Where:

    • -u β†’ Username
    • -f β†’ First name
    • -l β†’ Last name
    • -r β†’ Role
    • -e β†’ Email

    Note: We will be asked to enter a password. Type airflow and press Enter. At this point, we have our username (airflow) and password (airflow) configured.
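
    Note: If we wanted to script this step, the password could also be passed non-interactively with the -p flag (at the cost of exposing it in the shell history). A sketch:

    airflow users create -u airflow -f airflow -l airflow -r Admin -e airflow@gmail.com -p airflow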

  26. Start the web server in the background:

    airflow webserver &
    

    Note: After that, press Enter to get the prompt back before typing the command detailed in the next step. Don't worry, the webserver service will continue to run.

  27. Start the scheduler in the foreground:

    airflow scheduler
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  28. Do you remember the EC2 instance public IP we saved in step 4 of this section? Yes, we are going to use it right now! Open a new browser tab (in our case we chose Chrome) and navigate to a URL with the following format:

    http://<my_ec2_public_ip>:8080
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  29. Congratulations! Airflow is running 🚀

  30. Insert the Airflow credentials created in step 25 and click the Sign In button:

    • Username: airflow
    • Password: airflow
  31. Feel free to explore and run some of the example DAGs to make sure Airflow is working properly. After that, if we want to stop the Airflow services, go back to the EC2 Instance Connect session and run the following commands:

    # Press "ctrl + c" twice to stop the scheduler and insert the following:
    kill $(ps -ef | grep "airflow scheduler" | awk '{print 
    $2}')
    kill $(ps -ef | grep "airflow webserver" | awk '{print $2}')
    

    πŸ‘¨β€πŸ’»πŸ‘©β€πŸ’»

  32. Finally, don't forget to delete the resources created as part of this lab to avoid unexpected costs in our billing 😉. Basically, we need to delete the following resources:

    • EC2 instance: airflow-instance
    • Security Group: launch-wizard-1

Conclusion

AWS EC2 is an unstoppable force in the world of cloud computing, and its power to transform businesses and streamline operations is truly remarkable. In this article, we explored the incredible capabilities of EC2, specifically its ability to seamlessly integrate and leverage third-party tools like Airflow. By harnessing the boundless power of EC2, organizations can unlock new frontiers of data orchestration, workflow management, and beyond.

🌟 Attention, cloud and data enthusiasts! 🌟 Are you ready to join an incredible community of like-minded individuals who are passionate about cloud and data topics? Look no further! Follow me on my social networks for a thrilling journey into the world of cloud and data. 🚀

🌐 Follow me:
🧑‍💻 Medium: jandro898.23
🧑‍💻 Github: jandroro
🧑‍💻 Youtube: The Cloud Lover
🧑‍💻 LinkedIn: jano-camacho-vicente
