epassaro

Keep your research reproducible with conda-pack and GitHub Actions

Reproducibility is a major principle underpinning the scientific method, and scientific software is no exception.

Anaconda is a distribution of the Python and R programming languages for scientific computing with more than 25 million users. But how reproducible is science done with Anaconda? And, most importantly:

Do you think you will be able to reproduce the results of your research in the next 10 years?

Currently, the reproducibility of Anaconda environments is not guaranteed. conda list --explicit, which records the exact URL of every installed package, provides only short-term reproducibility.

For example, if you use packages from non-standard channels, the owner could delete them at any moment. Also, the resolved URLs could vary due to changes in package labels or storage.
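For context, the explicit-spec approach mentioned above usually looks something like the sketch below. The function names, the file name spec-file.txt and the environment names are placeholders chosen for illustration. Every line of the spec file is a full package URL, which is why a deleted or relocated package breaks the rebuild.

```shell
# Sketch of the explicit-spec round trip; all names are illustrative placeholders.

# Write the exact package URLs of an existing environment ($1) to a file ($2).
export_spec() {
    conda list --name "$1" --explicit > "$2"
}

# Recreate an environment ($1) from that file ($2). conda fetches every URL
# listed, so this fails if any package has been removed or moved.
create_from_spec() {
    conda create --name "$1" --file "$2"
}

# Example (requires conda):
#   export_spec my-env spec-file.txt
#   create_from_spec my-env-copy spec-file.txt
```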

There is an ongoing debate about how to unify the different available tools to solve this problem. In this workflow, I propose a simple but effective way to keep your environments reproducible using GitHub Actions and conda-pack:

conda-pack is a command line tool for creating archives of conda environments that can be installed on other systems and locations. This is useful for deploying code in a consistent environment —potentially where Python and/or conda isn’t already installed.

Every time you publish a new release of your code (e.g. a paper) on GitHub, the environment is solved, exported to a YAML file, packed into an archive, and both files are uploaded as release assets.

name: pack

on:
  release:
    types: [published]

env:
  BASENAME: ${{ github.event.repository.name }}-${{ github.event.release.tag_name }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Setup Mambaforge
        uses: conda-incubator/setup-miniconda@v2
        with:
          miniforge-variant: Mambaforge
          miniforge-version: latest
          environment-file: environment.yml
          activate-environment: my-env
          use-mamba: true

      - name: Freeze packages
        shell: bash -l {0}
        run: conda env export -n my-env > $BASENAME.yml

      - name: Install conda-pack
        shell: bash -l {0}
        run: mamba install -c conda-forge conda-pack

      - name: Pack environment
        shell: bash -l {0}
        run: conda pack -n my-env -o $BASENAME.tar.gz

      - name: Upload assets
        uses: AButler/upload-release-assets@v2.0
        with:
          files: '${{ env.BASENAME }}.{yml,tar.gz}'
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          release-tag: ${{ github.event.release.tag_name }}

Finally, follow the instructions to deploy an identical environment at any point in the future.
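As a rough sketch, restoring a packed environment follows the usual conda-pack pattern: extract the archive, activate it, and run conda-unpack to fix the path prefixes. The function name restore_env, the archive name and the target directory below are hypothetical placeholders. The archive itself is the .tar.gz asset downloaded from the release page.

```shell
# Sketch: restore a conda-pack archive downloaded from a release.
# restore_env and its arguments are illustrative, not part of the workflow.
restore_env() {
    archive="$1"   # e.g. the <repo>-<tag>.tar.gz asset from the release
    target="$2"    # directory to unpack the environment into

    mkdir -p "$target"
    tar -xzf "$archive" -C "$target"

    # Activate the unpacked environment (equivalent to `source` in bash);
    # this puts "$target/bin" on PATH.
    . "$target/bin/activate"

    # Rewrite the prefixes hard-coded at pack time to the new location.
    conda-unpack
}

# Example:
#   restore_env my-repo-v1.0.tar.gz ./my-env
```

Note that a packed environment is tied to the platform it was built on. Since the workflow runs on ubuntu-latest, the archive is expected to work on 64-bit Linux only.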

Get the code

epassaro / repro-conda-envs

An example repository on how to keep Anaconda environments reproducible in the long term with GitHub Actions
