Posted on Jul 2, 2021

An Overview of Runflow

#datascience #python #workflow #automation

Runflow is a tool to define and run workflows. If you haven't heard of it, well, after reading this article, give it a try, you will feel how Runflow will change your workflow experience.

Getting Started

To start using Runflow, you will need a Python (>=3.7) environment.

$ python3 -mvenv venv
$ source venv/bin/activate
$ pip install -U runflow

To define a workflow, simply by creating a .hcl file. This is a file in HCL2 syntax and follow the Flow spec.

Overall speaking, HCL2 is like JSON but is more human friendly. Let's see an example:

# File: hello_world.hcl
flow "hello_world" {
  task "bash_run" "echo" {
    command = "echo hello world"
  }
}

The ".hcl" file is quite intuitive. It defines a flow named "hello_world". It has one single task named "echo" of type "bash_run". The bash run command is echo hello world.

To run such a workflow, simply by running runflow run:

$ runflow run hello_world.hcl
[2021-07-02 23:24:33,550] "task.bash_run.echo" is started.
hello world
[2021-07-02 23:24:33,561] "task.bash_run.echo" is successful.

Flow Variables

The workflow can be parametrized by providing some flow variables.

# File: hello_vars.hcl
flow "hello_vars" {
  variable "greeter" {
    default = "world"
  }
  task "bash_run" "echo" {
    command = "echo 'hello ${var.greeter}'"
  }
}

Runflow supports setting variables from CLI using option --var and --var-file.

# use default variable
$ runflow run hello_vars.hcl
[2021-07-02 23:28:01,686] "task.bash_run.echo" is started.
hello world
[2021-07-02 23:28:01,696] "task.bash_run.echo" is successful.

# use `--var`
$ runflow run hello_vars.hcl --var greeter=世界
[2021-07-02 23:28:39,351] "task.bash_run.echo" is started.
hello 世界
[2021-07-02 23:28:39,359] "task.bash_run.echo" is successful.

# use `--var-file`
$ cat vars.hcl
greeter = "WORLD"

$ runflow run hello_vars.hcl --var-file=vars.hcl
[2021-07-02 23:29:31,314] "task.bash_run.echo" is started.
hello WORLD
[2021-07-02 23:29:31,322] "task.bash_run.echo" is successful.

Task Types

There are various built-in task types you can use. You can look them up from the Runflow docs site: Runflow.org

Just name a few:

Read/write a file: "file_read", "file_write". The task type supports reading file not only from local file system, but also from a zip archive, a Git repo, a GitHub project, FTP server, etc.
Send HTTP requests: "http_request".
Run SQL commands: "sql_exec", "sql_row". The task type supports running SQL commands on SQLite, MySQL, MSSQL, PostgreSQL, etc.
Run Bash command: "bash_run".
Run Docker container: "docker_run".
Run another flow: "flow_run". This allows modularize your workflows.

Task Dependency

A task can get attributes from another flow using HCL2 interpolation syntax: ${task.TASK_TYPE.TASK_NAME.ATTRIBUTE}.

Runflow can detect the task dependency and guarantees the order of task executions, e.g. if task A uses attributes of task B, task B always gets run first.

For example, the flow below will run "task1" first despite it's defined afterward.

flow "hello-implicit-deps" {
  task "bash_run" "task2" {
    command = "echo 'hello ${task.bash_run.greeter.stdout}'"
  }

  task "bash_run" "task1" {
    command = "xxd -l16 -ps /dev/urandom"
  }
}

Conditional Trigger

Each task can have an optional _depends_on argument so the task is checked first before running. Only when all values in _depends_on list is truthy, the task will run.

For example, the below flow runs task.file_write.echo only when the given version is >= 0.6.0.


flow "conditional_trigger" {

  variable "version" {
    default = "0.6.0"
  }

  task "file_read" "read" {
    filename = "pyproject.toml"
  }

  task "file_write" "echo" {
    filename = "/dev/stdout"
    content = task.file_read.read.content

    _depends_on = [
      var.version >= "0.6.0"
    ]
  }

}

Retry

Runflow supports retry running the task in case of task execution failure. This is done through task argument _retry.

Say, I want to retry running the task for 3 attempts or within 10 seconds, and for each attempt, wait 1 seconds before retry:

task "http_request" "fetch" {
  method = "GET"
  url = var.url
  _retry = {
    stop_after = "3 times | 10 seconds"
    wait = wait_fixed(1)
  }
}

Pretty cool, right!

Conclusion

Runflow allows defining your workflow in a declarative code style. You will focus on how to get each of your task done right, instead of wasting time writing ugly DAG code like in Airflow.

Check more information on Runflow documentation: https://runflow.org and Runflow repo: https://github.com/soasme/runflow.

Happy hacking!

DEV Community

An Overview of Runflow

Getting Started

Flow Variables

Task Types

Task Dependency

Conditional Trigger

Retry

Conclusion

Top comments (0)

Read next

Part 2: Building Your Own AI - Setting Up the Environment for AI/ML Development

Deploying a MongoDB Collection Generator on Kubernetes

Automate Your Scripts with Systemd Services: Benefits and Step-by-Step Guide

From prompts to programs: Language models' unbounded computational power