
Get Jupyter Notebook diffs with GitHub Actions

Cristobal Silva ・3 min read

Hello everyone! I'm Canas, and this is my first post on DEV. Hopefully it will be useful for people working in Data Science and Machine Learning :)


One of the known challenges of working with notebooks is version control: a notebook is stored as a JSON file, so a plain git diff mixes code changes with outputs and metadata, and few tools actually deal with this. This motivated me to try to lessen the burden for engineers who work with these files by using GitHub Actions together with an existing open source tool called nbdime.

My Workflow

The general idea is to have the GitHub Action post a comment on the PR containing the changes to every notebook, relative to the PR's target branch.

We will use the following existing actions to accomplish this:

  • actions/checkout@v2, for fetching the code
  • actions/setup-python@v1, for installing Python
  • peter-evans/create-or-update-comment@v1, for posting a comment on the PR with nbdiff's output

Submission Category:

I guess this falls under Maintainer Must-Haves, since it gives reviewers much better context when notebooks are submitted to a shared repository (e.g., for research or for persisting experiments).

YAML File or Link to Code

You can see a working implementation in this repository.

name: Generate notebook diff

on: ["pull_request"]

jobs:
  check-diff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Fetch target branch
        run: git fetch origin ${{ github.event.pull_request.base.ref }}:${{ github.event.pull_request.base.ref }}

      - name: Setup Python
        uses: actions/setup-python@v1
        with:
          python-version: "3.6"

      - name: Install requirements
        run: pip3 install nbdime

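      - name: Run and store diff
        run: |
          # Diff every changed notebook against the PR's target branch
          nbdiff ${{ github.event.pull_request.base.ref }} --no-color > diff.log
          # Wrap the output in a ```diff fence so GitHub renders it as a diff
          sed -i '1s/^/```diff\n&/' diff.log
          sed -i '$s/$/\n&```/' diff.log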

      - name: Get comment body
        id: get-comment-body
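        run: |
          body=$(cat diff.log)
          # Escape %, \n and \r so the multi-line diff survives ::set-output
          body="${body//'%'/'%25'}"
          body="${body//$'\n'/'%0A'}"
          body="${body//$'\r'/'%0D'}"
          # Quote the value so the shell does not collapse spaces or expand * globs.
          # (GitHub has since deprecated ::set-output in favor of $GITHUB_OUTPUT.)
          echo "::set-output name=body::$body"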

      - name: Create comment
        uses: peter-evans/create-or-update-comment@v1
        with:
          issue-number: ${{ github.event.pull_request.number }}
          body: ${{ steps.get-comment-body.outputs.body }}

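In simple terms, we use nbdiff to write the diff against the target branch to a file called diff.log, and then use sed to prepend the opening ```diff fence and append the closing ``` so that GitHub renders the comment as a diff. In the next step, we read diff.log into the body variable and percent-encode %, newlines and carriage returns: workflow commands such as ::set-output are read one line at a time, so without this encoding only the first line of the diff would be kept, and the runner turns the escapes back into the original characters when it reads the output. Finally, we pass the body output to the create-or-update-comment action, which posts our formatted diff on the PR.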
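To try the two shell steps outside of CI, here is a minimal local sketch in bash. It reuses the workflow's sed and parameter-expansion commands on a made-up sample standing in for real nbdiff output. It assumes GNU sed (the -i and \n forms differ on macOS), and decode_body is a hypothetical helper that only approximates what the runner does when it reads the step output.

```shell
#!/usr/bin/env bash
# Local sketch of the "Run and store diff" and "Get comment body" steps.
# Assumptions: bash and GNU sed (as on ubuntu-latest runners); the sample text
# stands in for real nbdiff output; decode_body is a hypothetical helper.
set -euo pipefail

# Wrap a file in a ```diff fence, using the same sed commands as the workflow.
wrap_in_fence() {
  sed -i '1s/^/```diff\n&/' "$1"
  sed -i '$s/$/\n&```/' "$1"
}

# Percent-encode %, newlines and carriage returns, as in the workflow step.
encode_body() {
  local body="$1"
  body="${body//'%'/'%25'}"
  body="${body//$'\n'/'%0A'}"
  body="${body//$'\r'/'%0D'}"
  printf '%s' "$body"
}

# Undo the encoding; %25 is decoded last so that a literal "%0A" survives.
decode_body() {
  local body="$1"
  body="${body//'%0D'/$'\r'}"
  body="${body//'%0A'/$'\n'}"
  body="${body//'%25'/'%'}"
  printf '%s' "$body"
}

# Usage: fake diff output with a percent sign, indentation and a glob character.
log="$(mktemp)"
trap 'rm -f "$log"' EXIT
printf 'line one\n100%% done\n    indented *glob*\n' > "$log"
wrap_in_fence "$log"
raw="$(cat "$log")"
encoded="$(encode_body "$raw")"
# Quoted, so the indentation and the * reach the output untouched.
echo "::set-output name=body::$encoded"
```

Quoting the echo argument matters here: unquoted, the shell would collapse the indentation of the diff into single spaces and could expand a * into matching file names.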

This repository tries out a GitHub Action that comments on PRs with Jupyter Notebook diffs (via nbdime) when there are any. A sample is available in the only open PR.

*Be sure to check out the branch with the changes if you want to see the actual file!

Additional Resources / Info

Actions used:

  • actions/checkout
  • actions/setup-python
  • peter-evans/create-or-update-comment

Libraries used:

  • nbdime

Possible future work:

  • Test and benchmark on large notebooks
  • Look for a way to deploy the web version of nbdime, nbdiff-web
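On the first point, one limit that large notebooks are likely to hit is the size of a single PR comment. Below is a rough bash sketch that caps the fenced diff before it is encoded. It assumes GitHub's limit of 65,536 characters per comment body (worth re-checking against the current API docs), and truncate_diff is a hypothetical helper, not something provided by nbdime or the actions above.

```shell
#!/usr/bin/env bash
# Sketch: shorten a fenced notebook diff so it fits in one PR comment.
# Assumptions: bash; a comment limit of 65,536 characters (check the current
# limit); truncate_diff is a hypothetical helper, not part of nbdime.
set -euo pipefail

MAX_CHARS=65536

# truncate_diff FILE [LIMIT]: print FILE unchanged if it fits in LIMIT
# characters (as counted by bash in the current locale); otherwise print a
# prefix, re-close the ```diff fence and append a note.
truncate_diff() {
  local file="$1" limit="${2:-$MAX_CHARS}" body note keep
  body="$(cat "$file")"
  note=$'\n```\n\n(diff truncated, run nbdiff locally for the full output)'
  if (( ${#body} <= limit )); then
    printf '%s' "$body"
    return
  fi
  keep=$(( limit - ${#note} ))
  if (( keep < 0 )); then keep=0; fi
  printf '%s%s' "${body:0:keep}" "$note"
}

# Usage: a long fake diff, capped at 100 characters for demonstration.
diff_file="$(mktemp)"
trap 'rm -f "$diff_file"' EXIT
printf '```diff\n%s\n```' "$(printf 'x%.0s' {1..200})" > "$diff_file"
short="$(truncate_diff "$diff_file" 100)"
printf '%s\n' "$short"
```

Cutting the diff at an arbitrary character can split a line in the middle of the fence, which is why the helper re-closes the fence before appending the note.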


Cristobal Silva

@canas

Hi there, I'm a Machine Learning Engineer who spends too much time playing videogames :)
