Claudemir Woche

Apache Airflow: How to use git-sync with multiple GitHub repositories

Versioning Apache Airflow DAGs is an important topic. There are many ways to do it, and plenty of tutorials cover how to do it with a single Git repository. In this post I will walk you through how to use git-sync, Git submodules and GitHub Workflows to sync Airflow DAGs from multiple GitHub repositories.

Setup - GitHub

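In this post, I will use 3 GitHub repositories. I will refer to them as:

  • Main Repository: the repository that git-sync will sync; it holds the two DAGs repositories as Git submodules
  • DAGs repository 1: the first repository containing Airflow DAGs
  • DAGs repository 2: the second repository containing Airflow DAGs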

Setup - Main Repository

  • Create the GitHub Main Repository
  • Create the GitHub DAGs repository 1
  • Create the GitHub DAGs repository 2
  • Clone the main repository to your local machine
  • Execute the following commands to add the first submodule to the Main repository:
git submodule add git@github.com:<your-user>/<your-dag-repo-1>.git
git submodule update --init --remote <your-dag-repo-1>
git submodule update --remote <your-dag-repo-1>
git add .
git commit -m 'Adding <dag-repo-1> submodule'
git push
  • Execute the following commands to add the second submodule to the Main repository:
git submodule add git@github.com:<your-user>/<your-dag-repo-2>.git
git submodule update --init --remote <your-dag-repo-2>
git submodule update --remote <your-dag-repo-2>
git add .
git commit -m 'Adding <dag-repo-2> submodule'
git push
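Once both submodules are added, it can help to confirm what Git actually recorded. The sketch below is one way to do that: it reads the `.gitmodules` file and prints the path of each registered submodule. The function name `list_submodule_paths` is only illustrative and is not part of Git.

```shell
# Print the path of every submodule recorded in a .gitmodules file.
# Defaults to the .gitmodules file in the current directory
# (run it from the root of the Main Repository clone).
list_submodule_paths() {
  git config --file "${1:-.gitmodules}" --get-regexp '^submodule\..*\.path$' \
    | awk '{print $2}'
}
```

Calling `list_submodule_paths` in the Main Repository should print one line per DAGs repository. `git submodule status` additionally shows the commit each submodule is pinned to.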

Setup - Personal Access Token

You will need to create a GitHub Personal Access Token to access the Main Repository using GitHub Workflows.

To create a PAT:

  • Go to Profile -> Settings

  • On the left sidebar -> Developer settings

  • Personal access tokens -> Fine-grained tokens

  • Generate new token

  • Fill in the token details

  • Under Repository access, select "Only select repositories" and choose your Main Repository

  • Under Permissions -> Repository permissions, grant "Read and write" access to "Commit statuses" and "Contents". All other permissions can be left at their defaults

  • Generate the token and store it somewhere safe. It will be used shortly.

Setup - DAGs repositories secrets

We will use the previously created PAT as Secrets in the two DAGs repositories.

You must do the following on both repositories:

  • Go to the repository settings

  • On the left sidebar, open Secrets and variables -> Actions

  • Create a Secret containing the PAT
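If you prefer the command line, the same secret can be created with the GitHub CLI (`gh`) instead of the web UI. This is only a sketch: it assumes `gh` is installed and authenticated, and the owner, the secret name `MAIN_REPO_TOKEN`, the variable `MAIN_REPO_PAT` and the function name are hypothetical placeholders, not values from this post.

```shell
# Hypothetical names: adjust the owner and the secret name to your setup.
OWNER="your-user"
SECRET_NAME="MAIN_REPO_TOKEN"

# Store the PAT as an Actions secret in each repository passed as an argument.
# The PAT is read from the MAIN_REPO_PAT environment variable.
# With DRY_RUN=1 the commands are only printed, not executed.
set_pat_secret() {
  for repo in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "gh secret set $SECRET_NAME --repo $OWNER/$repo"
    else
      # gh reads the secret value from standard input when --body is not given.
      printf '%s' "$MAIN_REPO_PAT" | gh secret set "$SECRET_NAME" --repo "$OWNER/$repo"
    fi
  done
}
```

For example, `DRY_RUN=1 set_pat_secret your-dag-repo-1 your-dag-repo-2` prints the two commands so you can review them before running the function for real.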

Setup - DAGs repositories workflows

We will create GitHub Workflows to sync the DAGs repositories with the Main Repository.

On both DAGs repositories:

  • On the root folder of the repo, create the file: .github/workflows/github_ci_sync_main.yaml

  • With the content:

name: Sync with Main Repo

on:
  push:
    branches:
    - main

env:
  REPOSITORY_NAME: ${{ github.event.repository.name }}


jobs:
  repository-sync:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout main repo
      uses: actions/checkout@v3
      with:
        repository: <your-user>/<your-main-repository>
        ref: main
        token: ${{ secrets.<your-token-secret-name> }}
        submodules: true

    - name: Pull & update submodules recursively
      run: |
        git submodule sync $REPOSITORY_NAME
        git submodule update --init --remote $REPOSITORY_NAME
        git submodule update --remote $REPOSITORY_NAME

    - name: Commit to pipeline hub
      run: |
        git config user.email "actions@github.com"
        git config user.name "GitHub Actions"
        git add --all
        git commit -m "Update submodule $REPOSITORY_NAME" || echo "No changes to commit"
        git push

Explaining the steps:

  • Checkout main repo: This step uses the token to check out the Main Repository, including its submodules.

  • Pull & update submodules recursively: Once the job is inside the Main Repository checkout, it runs the git commands that move the submodule for this DAGs repository to the latest commit of its remote branch.

  • Commit to pipeline hub: Commits and pushes the updated submodule reference to the Main Repository.
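What the workflow pushes is not a copy of the DAG files. It is a change to the commit that the Main Repository records for the submodule. A small sketch for inspecting that recorded commit in a local clone follows; the function name `pinned_submodule_commit` is illustrative, and the submodule path is assumed to match the DAGs repository name, as the workflow itself assumes.

```shell
# Print the commit SHA that the current HEAD records for a submodule path.
# Run from the root of a Main Repository clone; the argument is the
# submodule directory (assumed to match the DAGs repository name).
pinned_submodule_commit() {
  git ls-tree HEAD "$1" | awk '{print $3}'
}
```

After a workflow run, pulling the Main Repository and calling `pinned_submodule_commit <your-dag-repo-1>` should print the SHA of the latest commit on the DAGs repository's main branch.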

Setup - ssh-key

To use git-sync, we will need to set up an SSH key with permission to access all 3 repositories.

We won't be able to use GitHub Deploy Keys because they're repository specific.

You will need to use an SSH key linked to a GitHub profile with read and write permissions to the 3 repositories. You can create this SSH key following this tutorial.
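As a rough sketch of the key creation step, the function below generates an Ed25519 key pair with `ssh-keygen`. The file name, the key comment and the function name are hypothetical. The key is created without a passphrase on the assumption that git-sync runs unattended and cannot prompt for one.

```shell
# Hypothetical file name; adjust to your setup.
KEY_FILE="${KEY_FILE:-./airflow_gitsync_key}"

# Generate an Ed25519 key pair without a passphrase.
# Writes the private key to $KEY_FILE and the public key to $KEY_FILE.pub.
generate_gitsync_key() {
  ssh-keygen -q -t ed25519 -C "airflow-git-sync" -N "" -f "$KEY_FILE"
}
```

The contents of the `.pub` file are what you add to the GitHub profile under Settings -> SSH and GPG keys; the private key file is what goes into the Kubernetes Secret later on.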

Setup - git-sync

I will use the git-sync parameters available on the Apache Airflow Official Helm Chart.

These are the relevant values:

dags:
  gitSync:
    enabled: true
    repo: git@github.com:your-user/your-main-repo.git
    branch: main
    rev: HEAD
    depth: 1
    maxFailures: 0
    subPath: ""
    sshKeySecret: airflow-ssh-secret
    knownHosts: |
      github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
      github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
      github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=

  • dags.gitSync.sshKeySecret: the name of the Kubernetes Secret that holds the private SSH key. You can create this Secret with the following command:
kubectl create secret generic airflow-ssh-secret -n <your-airflow-namespace> --from-file=gitSshKey=path/to-your/privatekey

Conclusion

Now you can update your Helm Release with these new values and all the DAGs from the DAGs Repositories will be available in your Airflow Release.
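A minimal sketch of that update, assuming the values above are saved in a file and the official chart repository was added beforehand with `helm repo add apache-airflow https://airflow.apache.org`. The release name, namespace, values file name and function name below are hypothetical defaults.

```shell
# Hypothetical release name, namespace, and values file; adjust to your setup.
RELEASE="${RELEASE:-airflow}"
NAMESPACE="${NAMESPACE:-airflow}"
VALUES_FILE="${VALUES_FILE:-values.yaml}"

# Upgrade (or install) the release with the git-sync values.
# With DRY_RUN=1 the command is only printed, not executed.
upgrade_airflow() {
  set -- helm upgrade --install "$RELEASE" apache-airflow/airflow \
    --namespace "$NAMESPACE" --values "$VALUES_FILE"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 upgrade_airflow` first lets you check the exact command before applying it to the cluster.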
