Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages.
In this post I’ll explore how to create a "work environment" where you can run Nextflow pipelines using Kubernetes.
The user will be able to create and edit the pipeline, configuration and assets on their computer and run the pipelines in the cluster in a fluid way.
The idea is to give the user as complete an environment as possible on their own computer, so that once the pipeline is tested and validated it will also run in a real cluster.
Problem
When you deploy Nextflow pipelines in Kubernetes you need to find a way to share the work directory between pods. The options are basically to use volumes or to use Fusion.
Fusion is very easy to use (basically enabled = true in the nextflow.config) but it introduces an external dependency.
Volumes are a more "native" solution, but you need to fight with the infrastructure, providers, etc.
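For reference, enabling Fusion in your nextflow.config looks roughly like this (a minimal sketch; note that Fusion also needs the Wave service enabled):
fusion {
  enabled = true
}
wave {
  enabled = true
}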
Another challenge when working with pipelines in Kubernetes is retrieving the outputs once the pipeline has completed. You’ll probably need to run some kubectl cp commands.
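For example, something like this (the pod name and paths are hypothetical, just to illustrate the idea):
kubectl cp default/my-nextflow-pod:/workdir/results ./results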
In this post I’ll create a cluster (with only one node) from scratch and run some pipelines on it. We’ll see how pods are created and how we can edit the pipeline and/or configuration using our preferred editor (notepad, vi, VSCode, …)
Requirements
- a computer (and an internet connection, of course)
We need to have the following command line tools installed:
kubectl (if you work with Kubernetes you already have it)
k3d https://k3d.io (to create the local cluster)
skaffold https://skaffold.dev/docs/install/
k9s (not required, but very useful)
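You can quickly verify everything is in place (the exact versions don’t matter much here):
kubectl version --client
k3d version
skaffold version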
Create a cluster
k3d cluster create nextflow --port 9999:80@loadbalancer
We’re creating a new cluster called nextflow (it can be whatever you want). We’ll use port 9999 to access our results.
kubectl cluster-info
Kubernetes control plane is running at https://0.0.0.0:40145
CoreDNS is running at https://0.0.0.0:40145/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Metrics-server is running at https://0.0.0.0:40145/api/v1/namespaces/kube-system/services/https:metrics-server:https/proxy
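You can also confirm the single node is up and ready:
kubectl get nodes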
Preparing our environment
Create a folder test
Nextflow area
Create a subfolder project (it will be our Nextflow working area)
Create a nextflow.config
in this subfolder
k8s {
    context = 'k3d-nextflow' (1)
    namespace = 'default' (2)
    runAsUser = 0
    serviceAccount = 'nextflow-sa'
    storageClaimName = 'nextflow'
    storageMountPath = '/mnt/workdir'
}
process {
    executor = 'k8s'
    container = "quay.io/nextflow/rnaseq-nf:v1.2.1"
}
(1) k3d-nextflow is the context created by k3d. If you chose another name for the cluster you need to change it here.
(2) I’ll use the default namespace.
K8s area
Create a subfolder k8s and create the following files in it:
pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextflow
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 2Gi
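local-path is the default storage class shipped with k3d/k3s. If you want to double check which classes are available in your cluster:
kubectl get storageclass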
admin.yml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nextflow-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-role
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/status", "pods/log", "pods/exec"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-rolebind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nextflow-role
subjects:
  - kind: ServiceAccount
    name: nextflow-sa
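Once these resources are deployed (skaffold will apply them for us later), you can sanity check that the service account is allowed to create pods:
kubectl auth can-i create pods --as=system:serviceaccount:default:nextflow-sa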
jagedn.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jagedn (1)
  labels:
    app: jagedn
spec:
  selector:
    matchLabels:
      app: jagedn
  template:
    metadata:
      labels:
        app: jagedn
    spec:
      serviceAccountName: nextflow-sa
      terminationGracePeriodSeconds: 5
      securityContext:
        fsGroup: 0
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
      containers:
        - name: nextflow
          image: jagedn
          volumeMounts:
            - mountPath: /mnt/workdir
              name: volume
        - name: nginx-container
          image: nginx:latest
          ports:
            - containerPort: 80
          volumeMounts:
            - name: volume
              mountPath: /usr/share/nginx/html
      volumes:
        - name: volume
          persistentVolumeClaim:
            claimName: nextflow
(1) jagedn is my nick; you can use whatever you want, but make sure to replace it everywhere.
kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - pvc.yml
  - admin.yml
  - jagedn.yml
Basically we’re creating:
a service account with its role and role binding
a persistent volume claim to share across pods
a deployment with two containers: the Nextflow "workspace" container and an nginx sidecar that serves the shared volume
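If you’re curious about the final manifests Kustomize will produce, you can render them locally before anything is deployed:
kubectl kustomize k8s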
Skaffold
In the "parent" folder (test) create the following files:
Dockerfile
FROM nextflow/nextflow:24.03.0-edge
RUN yum install -y tar (1)
ADD project /home/project (2)
ENTRYPOINT ["tail", "-f", "/dev/null"]
(1) tar is required by skaffold to sync files into the running container.
(2) The project folder is copied into the image; skaffold will keep it in sync.
skaffold.yaml
apiVersion: skaffold/v4beta10
kind: Config
metadata:
  name: nextflow
build:
  artifacts:
    - image: jagedn
      context: .
      docker:
        dockerfile: Dockerfile
      sync:
        manual:
          - src: 'project/**'
            dest: /home
manifests:
  kustomize:
    paths:
      - k8s
deploy:
  kubectl: {}
Watching
Just so we can watch how pods are created and destroyed, we’ll run k9s in a terminal console.
Go
Open the project with VSCode (for example)
Open a terminal tab and execute:
skaffold dev
Leave the terminal running.
In another terminal console execute:
kubectl exec -it jagedn-56b9fb64dc-2xw8f -- /bin/bash
WARNING: you’ll need to use the pod id skaffold has created for you; it will be different from the one shown here.
Then cd to /home/project.
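If you don’t want to copy the pod name from k9s every time, you can grab it with a one-liner like this (just a convenience, relying on the app=jagedn label from the deployment):
kubectl exec -it $(kubectl get pods -l app=jagedn -o jsonpath='{.items[0].metadata.name}') -- /bin/bash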
INFO: using your VSCode editor, open nextflow.config and change something (a comment, for example). Save the change and, in the pod terminal, run a cat command to verify skaffold has synced the file.
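For example (assuming you are already in /home/project inside the pod):
cat nextflow.config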
Run
In the pod terminal execute:
NXF_ASSETS=/mnt/workdir/assets nextflow run nextflow-io/rnaseq-nf -with-docker -w /mnt/workdir -cache false
If you switch to the k9s console, you’ll see how pods are created.
After a while the pipeline completes!
Extra ball
If you inspect the folder /home/project/results you’ll find the outputs of the pipeline, so… how can we inspect them?
Execute cp -R results/ /mnt/workdir/ in the pod terminal and open a browser at http://localhost:9999/results/multiqc_report.html
INFO: a better approach would be to create another volume for the results so the nginx sidecar container can read them directly.
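Another option that avoids the manual cp is to publish the results straight to the shared volume. A minimal sketch, assuming the pipeline exposes an outdir parameter (rnaseq-nf does, via params.outdir):
NXF_ASSETS=/mnt/workdir/assets nextflow run nextflow-io/rnaseq-nf -with-docker -w /mnt/workdir -cache false --outdir /mnt/workdir/results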
Clean up
When you want to finish with your environment, just stop (Ctrl+C) the skaffold session and it will remove all the resources it created.
To delete the cluster you can run k3d cluster delete nextflow
Conclusion
I don’t know if this approach is the best, and maybe it’s a little complicated, but I think it can be a good way to have a very productive Kubernetes environment without the need for a full cluster.