
Executing Nextflow pipelines with K3d

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.

In this post I’ll explore how to create a "work environment" where we can run Nextflow pipelines using Kubernetes.

The user will be able to create and edit the pipeline, configuration, and assets on their own computer and run the pipelines in the cluster in a fluid way.

The idea is to give the user the most complete environment possible on their computer so that, once the pipeline is tested and validated, it will run in a real cluster.

Problem

When you’re deploying Nextflow pipelines in Kubernetes you need to find a way to share the work directory. The options are basically volumes or Fusion.

Fusion is very easy to use (basically setting enabled = true in nextflow.config) but it adds an external dependency.
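
For reference, a minimal sketch of what enabling it looks like in nextflow.config (Fusion relies on the Wave service, hence the second block; we won’t use this in this post):

// sketch only: enable Fusion (and Wave, which it depends on)
fusion {
    enabled = true
}
wave {
    enabled = true
}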

Volumes are a more "native" solution, but you have to fight with the infrastructure, providers, etc.

Another challenge when working with pipelines in Kubernetes is retrieving the outputs once the pipeline has completed. You’ll probably need to run some kubectl cp commands.
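
Something along these lines (the pod name and workdir path here are hypothetical):

# copy the results out of the cluster; pod name and path are illustrative
kubectl cp default/nextflow-pod-xyz:/workdir/results ./results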

In this post I’ll create a cluster (with only 1 node) from scratch and run some pipelines on it. We’ll see how pods are created and how we can edit the pipeline and/or configuration using our preferred IDE (notepad, vi, VSCode, …).

Requirements

  • a computer (and an internet connection, of course)

We need to have the following command line tools installed:

  • docker

  • k3d

  • kubectl

  • skaffold

  • k9s

Create a cluster

k3d cluster create nextflow --port 9999:80@loadbalancer

We’re creating a new cluster called nextflow (the name can be anything you like). We’ll use port 9999 to access our results later.

kubectl cluster-info
Kubernetes control plane is running at https://0.0.0.0:40145
CoreDNS is running at https://0.0.0.0:40145/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://0.0.0.0:40145/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
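
You can also check that the single node is up and running:

kubectl get nodes   # the k3d node should show as Ready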

Preparing our environment

Create a folder called test.

Nextflow area

Create a subfolder project (this will be our Nextflow working area).

Create a nextflow.config file in this subfolder:

k8s {
   context = 'k3d-nextflow' (1)

   namespace = 'default' (2)

   runAsUser = 0
   serviceAccount = 'nextflow-sa'
   storageClaimName = 'nextflow'
   storageMountPath = '/mnt/workdir'
}

process {
   executor = 'k8s'
   container = "quay.io/nextflow/rnaseq-nf:v1.2.1"
}

| 1 | k3d-nextflow was created by k3d. If you chose another cluster name, you need to change it here |
| 2 | I’ll use the default namespace |
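
If you’re not sure what the context is called, you can list the names (a quick check, assuming a default k3d setup):

kubectl config get-contexts -o name   # look for the k3d-<cluster-name> entry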

K8s area

Create a subfolder k8s and create the following files in it:

pvc.yml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflow
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi

admin.yml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nextflow-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-role
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/status", "pods/log", "pods/exec"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-rolebind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nextflow-role
subjects:
  - kind: ServiceAccount
    name: nextflow-sa

jagedn.yml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jagedn (1)
  labels:
    app: jagedn
spec:
  selector:
    matchLabels:
      app: jagedn
  template:
    metadata:
      labels:
        app: jagedn
    spec:
      serviceAccountName: nextflow-sa
      terminationGracePeriodSeconds: 5
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
      containers:
      - name: nextflow
        image: jagedn
        volumeMounts:
        - mountPath: /mnt/workdir
          name: volume
      - name: nginx-container
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
          - name: volume
            mountPath: /usr/share/nginx/html
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: nextflow

| 1 | jagedn is my nickname; you can use any name you like, but be sure to replace it everywhere it appears |

kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
 - pvc.yml
 - admin.yml
 - jagedn.yml

Basically we’re creating:

  • a service account with its role and role binding

  • a persistent volume claim to share across pods

  • a deployment with a Nextflow container plus an nginx sidecar serving the shared volume
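
Skaffold will apply these manifests for us in a moment, but if you want to sanity-check the kustomization first, from the test folder you can run something like:

kubectl kustomize k8s                    # render the manifests locally
kubectl apply -k k8s --dry-run=client    # validate them without creating anything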

Skaffold

In the "parent" folder (test) create the following files:

Dockerfile

FROM nextflow/nextflow:24.03.0-edge
RUN yum install -y tar (1)

ADD project /home/project (2)

ENTRYPOINT ["tail", "-f", "/dev/null"]

| 1 | Required by skaffold to sync files |
| 2 | Copies our Nextflow project into the image; skaffold will keep these files in sync |

skaffold.yaml

apiVersion: skaffold/v4beta10
kind: Config
metadata:
    name: nextflow

build:
    artifacts:
        - image: jagedn
          context: .
          docker:
            dockerfile: Dockerfile
          sync:
            manual:
                - src: 'project/**'
                  dest: /home

manifests:
    kustomize:
        paths:
            - k8s

deploy:
    kubectl: {}

Watching

Just for the purpose of watching how pods are created and destroyed, we’ll run k9s in a separate terminal console:

k9s

Go

Open the project with VSCode (for example)


Open a terminal tab and execute:

skaffold dev


Leave this terminal running.

In another terminal console execute:

kubectl exec -it jagedn-56b9fb64dc-2xw8f -- /bin/bash

WARNING

You’ll need to use the pod name skaffold has created (yours will be different).

Once inside, cd to /home/project.
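
If you prefer, you can fetch the pod name with a selector instead of copying it by hand (a small convenience, not in the original post; the app=jagedn label comes from the deployment):

POD=$(kubectl get pods -l app=jagedn -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it "$POD" -c nextflow -- /bin/bash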


INFO

Using your VSCode editor, open nextflow.config and change something (a comment, for example). Save the change and, in the pod terminal, run a cat command to verify skaffold has synced the file.
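
For example (run inside the pod):

cat /home/project/nextflow.config   # your edit should show up here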


Run

In the pod terminal, execute:

NXF_ASSETS=/mnt/workdir/assets nextflow run nextflow-io/rnaseq-nf -with-docker -w /mnt/workdir -cache false

If you change to the k9s console, you’ll see how pods are created


After a while, the pipeline completes!


Extra ball

If you inspect the folder /home/project/results you’ll find the outputs of the pipeline, so… how can we inspect them from outside the cluster?

Execute cp -R results/ /mnt/workdir/ in the kubectl terminal, then open a browser at http://localhost:9999/results/multiqc_report.html


INFO

A better approach would be to create another volume just for the results, so the nginx sidecar container can read them directly.
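
A rough sketch of that idea (untested; the names are illustrative):

# pvc-results.yml -- a hypothetical second claim just for published results
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflow-results
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi

Then, in jagedn.yml, add the claim under volumes: and mount it at /mnt/results in the nextflow container and at /usr/share/nginx/html in the nginx container, pointing the pipeline’s output directory at /mnt/results (assuming the pipeline exposes an outdir parameter).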

Clean up

When you’re done with the environment, just stop (Ctrl+C) the skaffold session and it will remove all the resources it created.

To delete the cluster itself, run k3d cluster delete nextflow.

Conclusion

I don’t know if this approach is the best, and maybe it’s a little complicated, but I think it can be a good way to get a very productive Kubernetes environment without needing a full cluster.
