SystemsGenetics
diff --git a/README.md b/README.md
+52 -51
diff --git a/kube-create-pvc.sh b/kube-create-pvc.sh
+61
diff --git a/kube-load.sh b/kube-load.sh
+64
diff --git a/kube-run.sh b/kube-run.sh
-104
@@ -1,96 +1,99 @@
 # kube-runner
 
-This repository provides a tool for running bioinformatics workflows as jobs on a Kubernetes cluster. This repository also contains some examples for several applications:
+This repository provides scripts for running nextflow pipelines on a Kubernetes cluster. These scripts have been tested with the following pipelines:
 
-- [GEMmaker](https://github.com/SystemsGenetics/GEMmaker)
-- [gene-oracle](https://github.com/ctargon/gene-oracle)
-- [KINC](https://github.com/SystemsGenetics/KINC)
+- [SystemsGenetics/GEMmaker](https://github.com/SystemsGenetics/GEMmaker)
+- [bentsherman/gene-oracle-nf](https://github.com/bentsherman/gene-oracle-nf)
+- [bentsherman/KINC-nf](https://github.com/bentsherman/KINC-nf)
 
 ## Dependencies
 
-You need Docker to build and push Docker images, and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to test GPU-enabled Docker images on a local machine. To interact with a Kubernetes cluster, you need [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/).
+To get started, all you need is [nextflow](https://nextflow.io/), [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), and access to a Kubernetes cluster (in the form of `~/.kube/config`). If you want to test Docker images on your local machine, you will also need [docker](https://docker.com/) and [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) (for GPU-enabled Docker images).
 
-## Usage
+## Configuration
 
-### Creating a Docker image
+A few administrative tasks must be completed before Nextflow can run on the Kubernetes cluster. They only need to be done once, but they may require administrative access to the cluster, so you may need to ask your system administrator to handle them.
 
-Build a Docker image:
+- Nextflow needs a service account with the `edit` and `view` cluster roles:
 ```bash
-sudo docker build -t <tag> <build-directory>
+kubectl create rolebinding default-edit --clusterrole=edit --serviceaccount=<namespace>:default
+kubectl create rolebinding default-view --clusterrole=view --serviceaccount=<namespace>:default
 ```
 
-Run a Docker container (locally):
+- Nextflow needs access to shared storage in the form of a [Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) with `ReadWriteMany` access mode. The process for provisioning a PVC depends on which types of storage are available. The `kube-create-pvc.sh` script provides an example of creating a PVC for CephFS storage, but it may not apply to your particular cluster. Consult your system administrator for assistance if necessary.
+
+## Usage
+
+First you must transfer your input data from your local machine to the cluster. You can use the `kube-load.sh` script to do this:
 ```bash
-sudo docker run [--runtime=nvidia] --rm -it <tag> <command>
+./kube-load.sh <pvc-name> <input-dir>
 ```
 
-List the Docker images on your machine:
+Then you can run the pipeline using nextflow's `kuberun` command:
 ```bash
-sudo docker images
+nextflow kuberun <pipeline>
 ```
 
-Push a Docker image to DockerHub:
+__NOTE__: If you create your own `nextflow.config` in your current directory, nextflow will use that config file instead of the default.
+
+Once the pipeline finishes successfully, you can transfer your output data from the cluster using `kube-save.sh`:
 ```bash
-sudo docker push <tag>
+./kube-save.sh <pvc-name> <output-dir>
 ```
 
-NOTE: In order to push an image to DockerHub, the image must be tagged with both a username and a repo name. For example:
+You can also use nextflow to create an interactive terminal on the cluster where you can access your PVC directly:
 ```bash
-sudo docker tag a88adcfb02de systemsgenetics/gemmaker:latest
-sudo docker push systemsgenetics/gemmaker:latest
+nextflow kuberun login
 ```
 
-### Running a Job on a Kubernetes cluster
+Consult the [Nextflow Kubernetes documentation](https://www.nextflow.io/docs/latest/kubernetes.html) for more information.
 
-Once you install `kubectl`, you must save a configuration to `~/.kube/config`. For example, if you are using [Nautilus](https://nautilus.optiputer.net/) you can download the config file from the Nautilus dashboard by selecting "Get config". Note that authentication tokens for the NRP expire so you will need to download a new config file periodically.
+## Appendix
 
-Test your Kubernetes configuration:
+### Working with Docker images
+
+__NOTE__: Generally speaking, Docker requires admin privileges in order to run. On Linux, for example, you may need to run Docker commands with `sudo`.
+
+Build a Docker image:
 ```bash
-kubectl config view
+docker build -t <tag> <build-directory>
 ```
 
-Before you run a job, create a directory with the following:
-- A script named `command.sh` that you want to run on each container
-- Any input data files that are to be copied to each container
-
-The script `kube-run.sh` can automatically run a Docker image by (1) creating a job configuration, (2) creating the job, (3) copying input files to each container in the job, (4) executing the command script on each container, and (5) copying output files from each container. You must provide the following:
-- the job name
-- the image you want to run
-- the number of work items
-- the path to your input directory
-- the path to your output directory
+Run a Docker container:
+```bash
+docker run [--runtime=nvidia] --rm -it <tag> <command>
+```
 
-Run a job:
+List the Docker images on your machine:
 ```bash
-./kube-run.sh <job-name> <image-name> <job-size> <input-dir> <output-dir>
+docker images
 ```
 
-Additionally, you can use the `nodeSelector` property in the job configuration file to select specific nodes by their properties, for example:
-```yaml
-nodeSelector:
-  disktype: ssd
+Push a Docker image to Docker Hub:
+```bash
+docker push <tag>
 ```
 
-Note that labels are arbitrary and will vary for a given Kubernetes cluster. To see how labels are assigned to nodes on your cluster:
+Remove old Docker data:
 ```bash
-kubectl get nodes --show-labels
+docker system prune
 ```
 
-### Additional Commands
+### Interacting with a Kubernetes cluster
 
-Check the status of your jobs:
+Test your Kubernetes configuration:
 ```bash
-kubectl get jobs
+kubectl config view
 ```
 
-Check the status of your pods:
+View the physical nodes on your cluster:
 ```bash
-kubectl get pods -o wide
+kubectl get nodes --show-labels
 ```
 
-Get information on a job:
+Check the status of your pods:
 ```bash
-kubectl describe job <job-name>
+kubectl get pods -o wide
 ```
 
 Get information on a pod:
@@ -103,9 +106,7 @@ Get an interactive shell into a pod:
 kubectl exec -it <pod-name> -- bash
 ```
 
-Delete a job:
+Delete a pod:
 ```bash
-kubectl delete job <job-name>
+kubectl delete pod <pod-name>
 ```
-
-__Always delete jobs/pods that are finished to return their resources to the cluster.__
 
@@ -0,0 +1,61 @@
+#!/bin/bash
+# Create a Persistent Volume Claim on a Kubernetes cluster.
+
+# parse command-line arguments
+if [[ $# -ne 1 ]]; then
+	echo "usage: $0 <pvc-name>" >&2
+	exit 1
+fi
+
+PVC_NAME="$1"
+PVC_FILE="pvc.yaml"
+NAMESPACE="deepgtex-prp"
+STORAGE="1Ti"
+
+# create persistent volume and PV claim
+cat > "${PVC_FILE}" <<EOF
+kind: PersistentVolume
+apiVersion: v1
+metadata:
+  name: ${PVC_NAME}-volume
+spec:
+  storageClassName: manual
+  capacity:
+    storage: ${STORAGE}
+  accessModes:
+    - ReadWriteMany
+  flexVolume:
+    driver: ceph.rook.io/rook
+    fsType: ceph
+    options:
+      clusterNamespace: rook
+      fsName: nautilusfs
+      path: /${NAMESPACE}
+      mountUser: ${NAMESPACE}
+      mountSecret: ceph-fs-secret
+---
+kind: PersistentVolumeClaim
+apiVersion: v1
+metadata:
+  name: ${PVC_NAME}
+spec:
+  volumeName: ${PVC_NAME}-volume
+  storageClassName: manual
+  accessModes:
+    - ReadWriteMany
+  resources:
+    requests:
+      storage: ${STORAGE}
+EOF
+
+kubectl create -f ${PVC_FILE}
+
+# display PV claim
+kubectl get pvc
+
+# delete PV claim
+# kubectl delete -f ${PVC_FILE}
+# rm -f ${PVC_FILE}
+
+# create secret for cephfs shared filesystem
+# kubectl create secret -n <namespace> generic ceph-fs-secret --from-literal=key=<secret-key>
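Review note: the `STORAGE` value in `kube-create-pvc.sh` ends up in both `capacity.storage` and `resources.requests.storage`, which Kubernetes parses as resource quantities. Binary suffixes are written `Ki`, `Mi`, `Gi`, `Ti`, `Pi`, `Ei`, without a trailing `B`. Below is a minimal sketch of a check the script could run before writing the manifest. The function name `is_k8s_quantity` is hypothetical and not part of this PR, and the pattern is a simplification that does not cover signed values or exponent notation such as `1e3`.

```bash
# Hypothetical helper, not part of this PR: return 0 if the argument looks
# like a Kubernetes resource quantity (simplified; no sign, no exponent form).
is_k8s_quantity() {
	printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|m|k|M|G|T|P|E)?$'
}

# Example: report on a few candidate values for STORAGE.
for value in "1Ti" "500Gi" "1TiB"; do
	if is_k8s_quantity "$value"; then
		echo "$value: ok"
	else
		echo "$value: not a valid quantity"
	fi
done
```

A check like this could sit right after the `STORAGE=` assignment and exit with an error before `kubectl create` is called.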
@@ -0,0 +1,64 @@
+#!/bin/bash
+# Load input data to a Persistent Volume on a Kubernetes cluster.
+
+# parse command-line arguments
+if [[ $# -ne 2 ]]; then
+	echo "usage: $0 <pvc-name> <local-path>" >&2
+	exit 1
+fi
+
+PVC_NAME="$1"
+PVC_PATH="$PWD"
+POD_FILE="pod.yaml"
+POD_NAME="data-loader"
+LOCAL_PATH="$(realpath "$2")"
+
+# create pod config file
+cat > $POD_FILE <<EOF
+apiVersion: v1
+kind: Pod
+metadata:
+  name: $POD_NAME
+spec:
+  containers:
+  - name: $POD_NAME
+    image: ubuntu
+    args: ["sleep", "infinity"]
+    volumeMounts:
+    - mountPath: $PVC_PATH
+      name: $PVC_NAME
+  restartPolicy: Never
+  volumes:
+    - name: $PVC_NAME
+      persistentVolumeClaim:
+        claimName: $PVC_NAME
+EOF
+
+echo
+cat $POD_FILE
+echo
+
+# create pod
+echo
+kubectl create -f $POD_FILE
+echo
+
+# wait for pod to initialize
+POD_STATUS=""
+
+while [[ $POD_STATUS != "Running" ]]; do
+	echo "Waiting for pod to initialize...$POD_STATUS"
+	sleep 1
+	POD_STATUS="$(kubectl get pods --no-headers $POD_NAME | awk '{ print $3 }')"
+	POD_STATUS="$(echo $POD_STATUS)"
+done
+
+# copy input data to pod
+echo "Copying data..."
+echo
+kubectl cp "$LOCAL_PATH" "$POD_NAME:$PVC_PATH/$(basename "$LOCAL_PATH")"
+echo
+
+# delete pod
+kubectl delete -f $POD_FILE
+rm -f $POD_FILE
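Review note: the wait loop in `kube-load.sh` polls until the pod reports `Running` and has no upper bound, so a pod that never reaches that state (for example, because the image cannot be pulled or the PVC cannot be mounted) would keep the script waiting indefinitely. Below is a minimal sketch of a bounded poll. The names `wait_for_status` and `POLL_INTERVAL` are hypothetical and not part of this PR, and the status command is passed in as arguments so the helper can be tried without a cluster.

```bash
# Hypothetical helper, not part of this PR: run a status command repeatedly
# until it prints the wanted status, giving up after a fixed number of attempts.
# usage: wait_for_status <wanted> <max-attempts> <command> [args...]
wait_for_status() {
	local wanted="$1"
	local max_attempts="$2"
	shift 2

	local attempt=0
	local status=""
	while [ "$attempt" -lt "$max_attempts" ]; do
		# "$@" is the status command; its errors are discarded and it is retried
		status="$("$@" 2>/dev/null)"
		if [ "$status" = "$wanted" ]; then
			return 0
		fi
		attempt=$((attempt + 1))
		sleep "${POLL_INTERVAL:-1}"
	done

	return 1
}

# Stand-in status command so the example runs without a cluster.
fake_status() { echo "Running"; }

if wait_for_status "Running" 3 fake_status; then
	echo "pod is running"
else
	echo "timed out waiting for pod" >&2
fi
```

In `kube-load.sh` the call might look like `wait_for_status Running 120 kubectl get pod "$POD_NAME" -o jsonpath='{.status.phase}'`, followed by cleanup and a non-zero exit on failure. `kubectl wait --for=condition=Ready "pod/$POD_NAME" --timeout=120s` is a built-in alternative that avoids the hand-written loop.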