Commit f286f5d

Removed kube-run.sh, added new scripts for running nextflow pipelines, updated README.md
1 parent 2d90712 commit f286f5d

5 files changed (+242 -155 lines)

README.md (+52 -51)
@@ -1,96 +1,99 @@
 # kube-runner
 
-This repository provides a tool for running bioinformatics workflows as jobs on a Kubernetes cluster. This repository also contains some examples for several applications:
+This repository provides scripts for running nextflow pipelines on a Kubernetes cluster. These scripts have been tested for the following pipelines:
 
-- [GEMmaker](https://github.com/SystemsGenetics/GEMmaker)
-- [gene-oracle](https://github.com/ctargon/gene-oracle)
-- [KINC](https://github.com/SystemsGenetics/KINC)
+- [SystemsGenetics/GEMmaker](https://github.com/SystemsGenetics/GEMmaker)
+- [bentsherman/gene-oracle-nf](https://github.com/bentsherman/gene-oracle-nf)
+- [bentsherman/KINC-nf](https://github.com/bentsherman/KINC-nf)
 
 ## Dependencies
 
-You need Docker to build and push Docker images, and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to test GPU-enabled Docker images on a local machine. To interact with a Kubernetes cluster, you need [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
+To get started, all you need is [nextflow](https://nextflow.io/), [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), and access to a Kubernetes cluster (in the form of `~/.kube/config`). If you want to test Docker images on your local machine, you will also need [docker](https://docker.com/) and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) (for GPU-enabled Docker images).
 
-## Usage
+## Configuration
 
-### Creating a Docker image
+There are a few administrative tasks which must be done in order for nextflow to be able to run properly on the Kubernetes cluster. These tasks only need to be done once, but they may require administrative access to the cluster, so you may need your system administrator to handle this part for you.
 
-Build a Docker image:
+- Nextflow needs a service account with the `edit` and `view` cluster roles:
 ```bash
-sudo docker build -t <tag> <build-directory>
+kubectl create rolebinding default-edit --clusterrole=edit --serviceaccount=<namespace>:default
+kubectl create rolebinding default-view --clusterrole=view --serviceaccount=<namespace>:default
 ```
 
-Run a Docker container (locally):
+- Nextflow needs access to shared storage in the form of a [Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) with `ReadWriteMany` access mode. The process for provisioning a PVC depends on what types of storage are available. The `kube-create-pvc.sh` script provides an example of creating a PVC for CephFS storage, but it may not apply to your particular cluster. Consult your system administrator for assistance if necessary.
+
+## Usage
+
+First you must transfer your input data from your local machine to the cluster. You can use the `kube-load.sh` script to do this:
 ```bash
-sudo docker run [--runtime=nvidia] --rm -it <tag> <command>
+./kube-load.sh <pvc-name> <input-dir>
 ```
 
-List the Docker images on your machine:
+Then you can run the pipeline using nextflow's `kuberun` command:
 ```bash
-sudo docker images
+nextflow kuberun <pipeline>
 ```
 
-Push a Docker image to DockerHub:
+__NOTE__: If you create your own `nextflow.config` in your current directory then nextflow will use that config file instead of the default.
+
+Once the pipeline finishes successfully, you can transfer your output data from the cluster using `kube-save.sh`:
 ```bash
-sudo docker push <tag>
+./kube-save.sh <pvc-name> <output-dir>
 ```
 
-NOTE: In order to push an image to DockerHub, the image must be tagged with both a username and a repo name. For example:
+You can also use nextflow to create an interactive terminal on the cluster where you can access your PVC directly:
 ```bash
-sudo docker tag a88adcfb02de systemsgenetics/gemmaker:latest
-sudo docker push systemsgenetics/gemmaker:latest
+nextflow kuberun login
 ```
 
-### Running a Job on a Kubernetes cluster
+Consult the [Nextflow Kubernetes documentation](https://www.nextflow.io/docs/latest/kubernetes.html) for more information.
 
-Once you install `kubectl`, you must save a configuration to `~/.kube/config`. For example, if you are using [Nautilus](https://nautilus.optiputer.net/) you can download the config file from the Nautilus dashboard by selecting "Get config". Note that authentication tokens for the NRP expire so you will need to download a new config file periodically.
+## Appendix
 
-Test your Kubernetes configuration:
+### Working with Docker images
+
+__NOTE__: Generally speaking, Docker requires admin privileges in order to run. On Linux, for example, you may need to run Docker commands with `sudo`.
+
+Build a Docker image:
 ```bash
-kubectl config view
+docker build -t <tag> <build-directory>
 ```
 
-Before you run a job, create a directory with the following:
-- A script named `command.sh` that you want to run on each container
-- Any input data files that are to be copied to each container
-
-The script `kube-run.sh` can automatically run a Docker image by (1) creating a job configuration, (2) creating the job, (3) copying input files to each container in the job, (4) executing the command script on each container, and (5) copying output files from each container. You must provide the following:
-- the job name
-- the image you want to run
-- the number of work items
-- the path to your input directory
-- the path to your output directory
+Run a Docker container:
+```bash
+docker run [--runtime=nvidia] --rm -it <tag> <command>
+```
 
-Run a job:
+List the Docker images on your machine:
 ```bash
-./kube-run.sh <job-name> <image-name> <job-size> <input-dir> <output-dir>
+docker images
 ```
 
-Additionally, you can use the `nodeSelector` property in the job configuration file to select specific nodes by their properties, for example:
-```yaml
-nodeSelector:
-  disktype: ssd
+Push a Docker image to Docker Hub:
+```bash
+docker push <tag>
 ```
 
-Note that labels are arbitrary and will vary for a given Kubernetes cluster. To see how labels are assigned to nodes on your cluster:
+Remove old Docker data:
 ```bash
-kubectl get nodes --show-labels
+docker system prune
 ```
 
-### Additional Commands
+### Interacting with a Kubernetes cluster
 
-Check the status of your jobs:
+Test your Kubernetes configuration:
 ```bash
-kubectl get jobs
+kubectl config view
 ```
 
-Check the status of your pods:
+View the physical nodes on your cluster:
 ```bash
-kubectl get pods -o wide
+kubectl get nodes --show-labels
 ```
 
-Get information on a job:
+Check the status of your pods:
 ```bash
-kubectl describe job <job-name>
+kubectl get pods -o wide
 ```
 
 Get information on a pod:
@@ -103,9 +106,7 @@ Get an interactive shell into a pod:
 kubectl exec -it <pod-name> -- bash
 ```
 
-Delete a job:
+Delete a pod:
 ```bash
-kubectl delete job <job-name>
+kubectl delete pod <pod-name>
 ```
-
-__Always delete jobs/pods that are finished to return their resources to the cluster.__
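
The three Usage steps added to the README (load input with `kube-load.sh`, run with `nextflow kuberun`, save output with `kube-save.sh`) can be chained in one script. The following is a minimal sketch, not part of this commit: `run_workflow`, `run` and the `DRY_RUN` variable are hypothetical names, and it assumes it is invoked from the repository root where both helper scripts live.

```bash
#!/bin/bash
# Hypothetical wrapper (not part of this commit) chaining the README's usage steps.
# Assumes kube-load.sh and kube-save.sh are in the current directory.
set -euo pipefail

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Load input onto the PVC, run the pipeline, then copy the results back.
run_workflow() {
    if [[ $# -ne 4 ]]; then
        echo "usage: run_workflow <pvc-name> <pipeline> <input-dir> <output-dir>" >&2
        return 1
    fi
    local pvc_name="$1" pipeline="$2" input_dir="$3" output_dir="$4"

    run ./kube-load.sh "$pvc_name" "$input_dir"
    run nextflow kuberun "$pipeline"
    run ./kube-save.sh "$pvc_name" "$output_dir"
}
```

For example, `DRY_RUN=1 run_workflow my-pvc SystemsGenetics/GEMmaker ./input ./output` prints the three commands without touching the cluster; without `DRY_RUN`, the commands run in order and, under `set -e`, the chain stops at the first step that fails.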

kube-create-pvc.sh (+61)
@@ -0,0 +1,61 @@
+#!/bin/bash
+# Create a Persistent Volume Claim on a Kubernetes cluster.
+
+# parse command-line arguments
+if [[ $# != 1 ]]; then
+    echo "usage: $0 <pvc-name>"
+    exit -1
+fi
+
+PVC_NAME="$1"
+PVC_FILE="pvc.yaml"
+NAMESPACE="deepgtex-prp"
+STORAGE="1TiB"
+
+# create PV claim
+cat > ${PVC_FILE} <<EOF
+kind: PersistentVolume
+apiVersion: v1
+metadata:
+  name: ${PVC_NAME}-volume
+spec:
+  storageClassName: manual
+  capacity:
+    storage: ${STORAGE}
+  accessModes:
+    - ReadWriteMany
+  flexVolume:
+    driver: ceph.rook.io/rook
+    fsType: ceph
+    options:
+      clusterNamespace: rook
+      fsName: nautilusfs
+      path: /${NAMESPACE}
+      mountUser: ${NAMESPACE}
+      mountSecret: ceph-fs-secret
+---
+kind: PersistentVolumeClaim
+apiVersion: v1
+metadata:
+  name: ${PVC_NAME}
+spec:
+  volumeName: ${PVC_NAME}-volume
+  storageClassName: manual
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: ${STORAGE}
+EOF
+
+kubectl create -f ${PVC_FILE}
+
+# display PV claim
+kubectl get pvc
+
+# delete PV claim
+# kubectl delete -f ${PVC_FILE}
+# rm -f ${PVC_FILE}
+
+# create secret for cephfs shared filesystem
+# kubectl create secret -n <namespace> generic ceph-fs-secret --from-literal=key=<secret-key>
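
`kube-create-pvc.sh` writes its manifest to `pvc.yaml` with the namespace and size hard-coded. A minimal sketch of the claim half, rendered to stdout so it can be piped to `kubectl create -f -` without a temporary file: `render_pvc` is a hypothetical helper, the size is read from a `STORAGE` environment variable with a default, and it assumes the matching PersistentVolume `<pvc-name>-volume` is created separately, as the script above does. The default is written `1Ti` because Kubernetes resource quantities take binary suffixes such as `Ki`, `Mi`, `Gi` and `Ti`, not `TiB`.

```bash
#!/bin/bash
# Sketch: print the PersistentVolumeClaim half of kube-create-pvc.sh to stdout.
# render_pvc is hypothetical; it assumes the PersistentVolume "<pvc-name>-volume"
# already exists (the original script creates it alongside the claim).
set -euo pipefail

# Usage: render_pvc <pvc-name>   (size taken from $STORAGE, default 1Ti)
render_pvc() {
    local pvc_name="$1"
    # Kubernetes quantities use binary suffixes such as Gi and Ti.
    local storage="${STORAGE:-1Ti}"
    cat <<EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ${pvc_name}
spec:
  volumeName: ${pvc_name}-volume
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: ${storage}
EOF
}

# With cluster access, the claim can be created without a temporary file:
#   render_pvc my-pvc | kubectl create -f -
```

Keeping the manifest on stdout also makes it easy to inspect (`render_pvc my-pvc | less`) before anything is sent to the cluster.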

kube-load.sh (+64)
@@ -0,0 +1,64 @@
+#!/bin/bash
+# Load input data to a Persistent Volume on a Kubernetes cluster.
+
+# parse command-line arguments
+if [[ $# != 2 ]]; then
+    echo "usage: $0 <pvc-name> <local-path>"
+    exit -1
+fi
+
+PVC_NAME="$1"
+PVC_PATH="$PWD"
+POD_FILE="pod.yaml"
+POD_NAME="data-loader"
+LOCAL_PATH="$(realpath $2)"
+
+# create pod config file
+cat > $POD_FILE <<EOF
+apiVersion: v1
+kind: Pod
+metadata:
+  name: $POD_NAME
+spec:
+  containers:
+  - name: $POD_NAME
+    image: ubuntu
+    args: ["sleep", "infinity"]
+    volumeMounts:
+    - mountPath: $PVC_PATH
+      name: $PVC_NAME
+  restartPolicy: Never
+  volumes:
+  - name: $PVC_NAME
+    persistentVolumeClaim:
+      claimName: $PVC_NAME
+EOF
+
+echo
+cat $POD_FILE
+echo
+
+# create pod
+echo
+kubectl create -f $POD_FILE
+echo
+
+# wait for pod to initialize
+POD_STATUS=""
+
+while [[ $POD_STATUS != "Running" ]]; do
+    echo "Waiting for pod to initialize...$POD_STATUS"
+    sleep 1
+    POD_STATUS="$(kubectl get pods --no-headers $POD_NAME | awk '{ print $3 }')"
+    POD_STATUS="$(echo $POD_STATUS)"
+done
+
+# copy input data to pod
+echo "Copying data..."
+echo
+kubectl cp "$LOCAL_PATH" "$POD_NAME:$PVC_PATH/$(basename $LOCAL_PATH)"
+echo
+
+# delete pod
+kubectl delete -f $POD_FILE
+rm -f $POD_FILE
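
The wait loop in `kube-load.sh` polls until the pod reports `Running` and never exits if the pod settles into some other state (for example, when its image cannot be pulled). A minimal sketch of a bounded poll, assuming a hypothetical `wait_for_status` helper that takes the wanted status, a maximum number of attempts, and a command that prints the current status; the `POLL_INTERVAL` variable is likewise an illustrative assumption.

```bash
#!/bin/bash
# Sketch: a bounded replacement for the polling loop in kube-load.sh.
# wait_for_status and POLL_INTERVAL are hypothetical, not part of this commit.
set -euo pipefail

# Usage: wait_for_status <wanted-status> <max-tries> <command> [args...]
# Runs <command> until it prints <wanted-status>, sleeping POLL_INTERVAL
# seconds (default 1) between attempts; returns 1 after <max-tries> misses.
wait_for_status() {
    local want="$1" max_tries="$2"
    shift 2
    local status="" i
    for (( i = 1; i <= max_tries; i++ )); do
        # Tolerate a failing status command (e.g. the pod is not visible yet).
        status="$("$@" || true)"
        if [[ "$status" == "$want" ]]; then
            return 0
        fi
        echo "Waiting for pod to initialize... ${status:-unknown}" >&2
        sleep "${POLL_INTERVAL:-1}"
    done
    echo "Gave up after ${max_tries} attempts (last status: ${status:-unknown})" >&2
    return 1
}

# In kube-load.sh this could replace the while loop, for example:
#   pod_status() { kubectl get pods --no-headers "$POD_NAME" | awk '{ print $3 }'; }
#   wait_for_status Running 300 pod_status || exit 1
```

kubectl also offers `kubectl wait --for=condition=Ready pod/<pod-name> --timeout=<duration>`, which covers the same case without a hand-written loop if your kubectl version supports it.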

kube-run.sh (-104)

This file was deleted.
