Draft
Changes from 27 commits
38 commits
8f4469e
Initial sketch for the mriqc/fmriprep singularity based workflow
yarikoptic Jul 18, 2019
de0442f
DOC: added a few comments
yarikoptic Jul 18, 2019
a18521d
DOC: note on execution of mriqc
yarikoptic Jul 18, 2019
92388ed
ENH: make possible to quickly switch from reproman to datalad + some …
yarikoptic Aug 6, 2019
1588f76
ENH: text2git for mriqc output
yarikoptic Aug 6, 2019
5b95ded
BF: make get_participants_ids work + set -x for debugging
yarikoptic Aug 15, 2019
79c3a99
another perspective one on kwyk
yarikoptic Aug 21, 2019
57b6b6c
ENH/RF: shellcheck, group common bids-app logic into run_bids_app, ex…
yarikoptic Oct 10, 2019
fedb8b0
moved datalad install containers before working in containers subdir
chaselgrove Dec 12, 2019
dc5dc0b
Merge branch 'master' into doc-usecases
chaselgrove Dec 17, 2019
76450af
RF: compose proper call for fmriprep, inline querying participant labels
yarikoptic Dec 19, 2019
850b36f
RF: reordered commands so settings come first and then all the actions
yarikoptic Dec 19, 2019
7110ec6
BF+RF: improve handling of fs license, more TODO comments (seems to w…
yarikoptic Dec 19, 2019
a81e457
ENH: do not create bids app results dataset if directory exists already
yarikoptic Dec 20, 2019
d5e9028
defaulting RM_RESOURCE and RM_SUB to local but allowing overrides
chaselgrove Jan 3, 2020
924980c
Merge remote-tracking branch 'origin/master' into doc-usecases
yarikoptic Jan 8, 2020
d05430d
Merge branch 'doc-usecases' of https://github.com/yarikoptic/ReproNim…
yarikoptic Jan 8, 2020
5ab5a8b
clean the containers repo after freezing versions
chaselgrove Jan 14, 2020
7ae46ab
Revert "clean the containers repo after freezing versions"
chaselgrove Jan 28, 2020
a4af6ba
Merge branch 'master' into doc-usecases
chaselgrove Apr 27, 2020
5a2f0f3
Fix runscript regexp to work on Mac OS
chaselgrove Apr 29, 2020
b70144e
ENH: always set -x, add env vars to not require patching for FS licen…
yarikoptic May 21, 2020
82df248
Comments on which images must be prepropulated in containers repo and…
yarikoptic May 22, 2020
6009dd0
ENH: Script for reproducible rerun of the demo script
yarikoptic May 22, 2020
295d9d2
kyle1-ps4 setup
yarikoptic May 25, 2020
fc91ec2
Merge remote-tracking branch 'origin/master' into doc-usecases
yarikoptic May 25, 2020
8905cc4
BF: Fix failure on unset BIDS_APP with set -u
chaselgrove May 26, 2020
cffa3a3
BF: Add mac workaround in get_participant_ids
chaselgrove May 27, 2020
85a41dc
ENH: Update parallel install message for mac users
chaselgrove May 27, 2020
baab991
BF: export PS1 within -reproduce.sh
yarikoptic May 27, 2020
75fa815
ENH: Use temporary HOME, cp .gitconfig and .freesurfer-license, confi…
yarikoptic May 27, 2020
7920234
reproman-master setup for -reproduce and min datalad 0.12.7
yarikoptic May 27, 2020
29500a7
master reproman now has [datalad] installation target
yarikoptic May 28, 2020
3bf4e38
Merge remote-tracking branch 'origin/master' (needs datalad 0.13.0rc1…
yarikoptic May 28, 2020
d895cc4
ENH: add containers/licenses into --input, specify data/bids explicit…
yarikoptic May 28, 2020
e28d011
DOC: note that datalad runner group analysis probably does nothing
yarikoptic May 28, 2020
795eed8
ENH: point to subject specific input data
yarikoptic May 28, 2020
52c19fc
Merge remote-tracking branch 'yarik/doc-usecases' into doc-usecases
chaselgrove Jun 1, 2020
60 changes: 60 additions & 0 deletions docs/usecases/bids-fmriprep-workflow-NP-reproduce.sh
@@ -0,0 +1,60 @@
#!/bin/bash

# pipefail: let a failure inside the "( ... ) 2>&1 | tee" blocks abort the run
set -euo pipefail

export PS4="> "

set -x
setup=${1:-pip}
cd "$(mktemp -d ${TMPDIR:-/tmp}/rm-XXXXXXX)"

trap "echo Finished for setup=$setup under PWD=$(pwd)" SIGINT SIGHUP SIGABRT EXIT

py=3
d=venv$py;
(
virtualenv --python=python$py --system-site-packages $d
) 2>&1 | tee venv-setup.log

source "$d/bin/activate" # should be outside of () to take effect

(
case "$setup" in
kyle1)
# Kyle's setup from https://github.com/ReproNim/reproman/issues/511#issuecomment-632776223
pip install git+http://github.com/datalad/datalad@53765be03838ee8b07d4b44a2a27bbbe259fe160
# This one seems to be for older datalad
pip install git+http://github.com/ReproNim/reproman@a9c9842302cad707bbdaf56fa4050fe0136ffe23
# with unbuffered io:
#pip install git+http://github.com/ReproNim/reproman@4f05f3aa96c7ab550aa218d5de705ea3cfe5f600
;;
kyle1-ps4)
# Like above but for reproman have #513 merged for PS4 details
pip install git+http://github.com/datalad/datalad@53765be03838ee8b07d4b44a2a27bbbe259fe160
pip install git+http://github.com/ReproNim/reproman@setup-kyle1-ps4
;;
debug1) # development setup (the actual default is "pip", see setup= above)
# Current master of datalad
pip install git+http://github.com/datalad/[email protected]
# ReproMan PR https://github.com/ReproNim/reproman/pull/506 with support of datalad master
pip install git+http://github.com/kyleam/[email protected]
;;
pip) # should be our target -- install via pip everything and it must be working
pip install datalad reproman;;
*)
echo "Unknown setup $setup" >&2
exit 1
;;
esac

# in any of these cases the default datalad-container should be ok
pip install datalad-container

# Actual script to run from the current state of the PR
# https://github.com/ReproNim/reproman/pull/438
wget https://raw.githubusercontent.com/ReproNim/reproman/b70144e993660c271831e4ea8d2f4bb436bb7eeb/docs/usecases/bids-fmriprep-workflow-NP.sh
) 2>&1 | tee install.log

(
BIDS_APPS=mriqc FS_LICENSE=bogus RM_ORC=datalad-pair bash ./bids-fmriprep-workflow-NP.sh output
) 2>&1 | tee run.log
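
A note on the `( ... ) 2>&1 | tee file.log` blocks above: the exit status of a pipeline is normally that of its last command (`tee`), so an `exit 1` inside the subshell (for example from the unknown-setup branch) is only seen by `set -e` if `pipefail` is enabled. A minimal sketch of the difference, assuming bash is available; the function names are made up for illustration and are not part of this PR:

```shell
#!/bin/sh
# Hypothetical helpers: each runs a failing subshell piped through tee in a
# fresh bash and prints the exit status of the whole pipeline.

status_without_pipefail () {
    bash -c '( echo "installing"; exit 1 ) 2>&1 | tee /dev/null >/dev/null; echo $?'
}

status_with_pipefail () {
    bash -c 'set -o pipefail; ( echo "installing"; exit 1 ) 2>&1 | tee /dev/null >/dev/null; echo $?'
}

status_without_pipefail   # prints 0: the failure is masked by tee
status_with_pipefail      # prints 1: the failure of the subshell propagates
```

With `pipefail` set together with `-e`, a failed install step would stop the script before the run step is attempted.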
288 changes: 288 additions & 0 deletions docs/usecases/bids-fmriprep-workflow-NP.sh
@@ -0,0 +1,288 @@
#!/bin/bash
#emacs: -*- mode: shell-script; c-basic-offset: 4; tab-width: 4; indent-tabs-mode: nil -*-
#ex: set sts=4 ts=4 sw=4 et:
#
# This script is intended to demonstrate a sample workflow on a BIDS
# dataset using mriqc, fmriprep, and a custom analysis pipeline, mimicking the
# steps presented in an fmriprep paper currently under review but using
# DataLad, ReproNim/containers, and ReproMan.
#
# COPYRIGHT: Yaroslav Halchenko 2019
#
# LICENSE: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
# Description
#
# Environment variables
# - RUNNER - datalad or reproman (default: reproman)
# - Options to reproman run invocation
# - RM_ORC - orchestrator to use (default: datalad-pair-run)
# - RM_RESOURCE - resource to use (default: local)
# - RM_SUB - submitter to use (default: local)
# - BIDS_APPS - if set, a comma-separated list of apps to run (currently
# mriqc and fmriprep are supported)
# - FS_LICENSE - filename or content of the license for freesurfer
# - CONTAINERS_REPO - an alternative (could be local) location for containers
# repository.
# Make sure that you have got the images for specific versions we freeze to below:
# datalad get images/bids/bids-mriqc--0.15.0.sing images/bids/bids-fmriprep--1.4.1.sing
#
# - INPUT_DATASET_REPO - an alternative (could be local) location for input
# BIDS dataset
#
# Note that if FS_LICENSE is not empty and does not point to a file, it is
# assumed to contain the license content. If you are interested in running
# only MRIQC, just set it to some bogus value.
# So to run only mriqc if you don't have a freesurfer license, do
# BIDS_APPS=mriqc FS_LICENSE=bogus ...
#
# Sample invocations
# - Pointing to the existing local clones of input repositories for faster
# "get"
# RUNNER=datalad \
# FS_LICENSE=~/.freesurfer-license \
# CONTAINERS_REPO=~/proj/repronim/containers \
# INPUT_DATASET_REPO=$PWD/bids-fmriprep-workflow-NP/ds000003-demo \
# ./bids-fmriprep-workflow-NP.sh bids-fmriprep-workflow-NP/out2
#

set -eu
export PS4='ex:$? > '
set -x

# $STUDY is a variable used in a paper this workflow mimics
STUDY="$1"

# Which runner - reproman or datalad
: "${RUNNER:=reproman}"

# Define common parameters for the reproman run

# ReproMan orchestrator to be used - determines how data/results are
# transferred and how the execution is recorded
# Use reproman run --list orchestrators to get an updated list
: "${RM_ORC:=datalad-pair-run}" # ,plain,datalad-pair,datalad-local-run

# Which batch processing system supported by ReproMan will be used
# Use reproman run --list submitters to get an updated list
# RM_SUB=condor,pbs,local

# Which resource to use
# If not done before, a resource where execution will happen must be
# configured first. The default below is local; override RM_RESOURCE to use
# another one (e.g. smaug).
# TODO: provide pointers to doc ( ;-) )

# On the discovery resource use PBS.
# Necessary modules to be loaded in that session:
# - singularity/2.4.2
# Necessary installations/upgrades to be done (TODO: contact John)
# - datalad (0.11.6, TODO: release first)
# - datalad-container

: "${RM_RESOURCE:=local}"
: "${RM_SUB:=local}"

# TODO: at reproman level allow to specify ORC and SUB for a resource, so there would
# be no need to specify for each invocation. Could be a new (meta) resource such as
# "smaug-condor" which would link smaug physical resource with those parameters
# TODO: point to the issue in ReproMan


unknown_runner () {
echo "ERROR: Unknown runner $RUNNER. Known runners: reproman and datalad" >&2
exit 1
}

# Common invocation of ReproMan
# TODO: just make it configurable per project/env?
reproman_run () {
reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" "$@"
}


# TODO: see where such functionality could be provided within reproman, so could
# be easily reused
get_participant_ids () {
# Would go through provided paths and current directory to find participants.tsv
# and return participant ids, comma-separated
for p in "$@" .; do
f="$p/participants.tsv"
if [ -e "$f" ]; then
sed -n -e '/^sub-/s/sub-\([^\t]*\)\t.*/\1/gp' < "$f" \
| tr '\n' ',' \
| sed -e 's/,$//g'
break
fi
done
}

function run_bids_app() {
app="$1"; shift
do_group="$1"; shift
app_args=( "$@" -w work )

if [ -n "${BIDS_APPS:=}" ] && ! echo "$BIDS_APPS" | grep -q "\<$app\>" ; then
echo "I: skipping $app since BIDS_APPS=$BIDS_APPS"
return
fi
outds=data/$app
container=containers/bids-$app
app_runner_args=( --input 'data/bids' --output "$outds" )

mkdir -p work
grep -qs -e '^work$' .gitignore \
|| { echo "work" >> .gitignore; datalad save -m "Ignore work directory"; }

# set -x
# Create target output dataset
# TODO: per app specific configuration? some might have too heavy xml etc
# files
[ -e "$outds" ] || datalad create -d . -c text2git "$outds"

case "$RUNNER" in
reproman)
# Serial run
# reproman_run --jp container=containers/bids-mriqc "${RUNNER_ARGS[@]}" "${MRIQC_ARGS[@]}"
# Parallel requires two runs -- parallel across participants:
reproman_run --jp "container=$container" "${app_runner_args[@]}" \
--bp "pl=$(get_participant_ids data/bids)" \
'{inputs}' '{outputs}' participant --participant_label '{p[pl]}' "${app_args[@]}"
case "$do_group" in
1|yes)
# serial for the group
reproman_run --jp "container=$container" "${app_runner_args[@]}" \
'{inputs}' '{outputs}' group "${app_args[@]}"
;;
0|no)
;;
*)
echo "Unknown value APP_GROUP=$do_group" >&2
exit 1
;;
esac
;;
datalad)
case "$do_group" in
1|yes) app_args=( group "${app_args[@]}" ) ;;
0|no) ;;
*) echo "Unknown value APP_GROUP=$do_group" >&2; exit 1 ;;
esac
datalad containers-run -n "$container" "${app_runner_args[@]}" \
'{inputs}' '{outputs}' participant "${app_args[@]}"
;;
*) unknown_runner;;
esac
# set +x
}

#
# Check asap for licenses since fmriprep needs one for FreeSurfer
#

if [ -z "${FS_LICENSE:-}" ]; then
if [ -e "${FREESURFER_HOME:-/XXXX}/.license" ]; then
FS_LICENSE="${FREESURFER_HOME}/.license"
else
cat >&2 <<EOF
Error: No FreeSurfer license found!
Either define FREESURFER_HOME environment variable pointing to a directory
with .license file for FreeSurfer or define FS_LICENSE environment variable
which would either point to the license file or contain the license
(with "\\n" for new lines) to be used for FreeSurfer
EOF
exit 1
fi
fi


# Create study dataset
datalad create -c text2git "$STUDY"
cd "$STUDY"

#
# Install containers dataset for guaranteed/unambiguous containers versioning
# and datalad containers-run
#
# TODO: specific version, TODO - reference datalad issue

# Local copy to avoid heavy network traffic while testing locally could be
# referenced in CONTAINERS_REPO env var
datalad install -d . -s "${CONTAINERS_REPO:-///repronim/containers}"

# TODO: shift that into some helper script in the containers
CONTAINERS_FS_LICENSE=containers/licenses/freesurfer
if [ -e "$FS_LICENSE" ]; then
cp "$FS_LICENSE" "$CONTAINERS_FS_LICENSE"
else
echo -n "$FS_LICENSE" >| "$CONTAINERS_FS_LICENSE"
fi
datalad save -d . -m "Added licenses/freesurfer (needed for fmriprep)" containers/licenses/
( cd containers; git annex metadata licenses/freesurfer -s distribution-restrictions=sensitive; )


# possibly downgrade versions to match the ones used in the "paper"
containers/scripts/freeze_versions --save-dataset=^ \
poldracklab-ds003-example=0.0.3 \
bids-mriqc=0.15.0 \
bids-fmriprep=1.4.1

#
# Install dataset to be analyzed (no data - analysis might run in the cloud or on HPC)
#
# In the original paper the dataset name was used as is, and the dataset was
# placed at the top level. Here, to make this demo easier to apply to other
# studies and to check on other datasets, we install the input dataset under a
# generic "data/bids" path. "data/" will also collect all other derivatives etc
mkdir data

# For now we will work with minimized version with only 2 subjects
# datalad install -d . -s ///openneuro/ds000003 data/bids
datalad install -d . -s "${INPUT_DATASET_REPO:-https://github.com/ReproNim/ds000003-demo}" data/bids

#
# Execution.
#
# That is where access to a powerful resource (HPC etc) would be useful.
# Each of those containerized apps might need custom options to be added.
#
#

# datalad save -d . -m "Due to https://github.com/datalad/datalad/issues/3591" data/mriqc


run_bids_app mriqc yes
# note: not using $CONTAINERS_FS_LICENSE just to make things a bit more explicit
run_bids_app fmriprep no --fs-license-file=containers/licenses/freesurfer

# 3. poldracklab-ds003-example -- analysis

# X. Later? visualization etc - used nilearn


exit 0 # done for now


reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" \
--bp 'thing=thing-*' \
--input '{p[thing]}' \
sh -c 'cat {p[thing]} {p[thing]} >doubled-{p[thing]}'

With the latest push to run-subjobs (ac14277) checked out, try

reproman run --follow -r "${RM_RESOURCE}" --sub "${RM_SUB}" --orc "${RM_ORC}" \
  --jp container=containers/bids-mriqc \
  --bp 'pl=02,13' \
  --input data/bids \
  data/bids data/mriqc participant --participant_label '{p[pl]}'

I was able to get that [*] to successfully run via condor on smaug. As you've already experienced, the management of existing datasets is a bit rough, so you may want to use a fresh dataset.

[*] Or more specifically, this script:

script
#!/bin/sh
set -eu

cd $(mktemp -d --tmpdir=. ds-XXXX)
datalad create -c text2git .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids

mkdir licenses/
echo freesurfer.txt > licenses/.gitignore
cat > licenses/README.md <<EOF

Freesurfer
----------

Place your FreeSurfer license into freesurfer.txt file in this directory.
Visit https://surfer.nmr.mgh.harvard.edu/registration.html to obtain one if
you don't have it yet - it is free.

EOF
datalad save -m "DOC: licenses/ directory stub" licenses/

datalad create -d . data/mriqc

reproman run --resource sm --follow \
         --sub condor --orc datalad-pair-run \
         --jp container=containers/bids-mriqc --bp 'pl=02,13' \
         -i data/bids \
         data/bids data/mriqc participant --participant_label '{p[pl]}'
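
Related to the hard-coded `pl=02,13` above and to `get_participant_ids` in bids-fmriprep-workflow-NP.sh: the `\t` escape inside a `sed` expression is a GNU extension, which is presumably why a Mac workaround was needed there. A possible portable variant is sketched below with `awk`. It assumes `participants.tsv` is tab-separated with `sub-<label>` IDs in the first column; the function name `participant_ids_from_tsv` is hypothetical and not part of this PR:

```shell
#!/bin/sh
# Hypothetical helper: print comma-separated participant labels (without the
# "sub-" prefix) found in the first column of a participants.tsv file.
participant_ids_from_tsv () {
    # $1: path to a participants.tsv file
    awk -F '\t' '
        $1 ~ /^sub-/ {
            id = $1
            sub(/^sub-/, "", id)                  # strip the "sub-" prefix
            ids = (ids == "" ? id : ids "," id)   # join labels with commas
        }
        END { print ids }
    ' "$1"
}

# Usage example on a throwaway file
tmp=$(mktemp)
printf 'participant_id\tage\nsub-02\t25\nsub-13\t31\n' > "$tmp"
participant_ids_from_tsv "$tmp"    # prints 02,13
rm -f "$tmp"
```

The output could then be passed as `--bp "pl=$(participant_ids_from_tsv data/bids/participants.tsv)"`.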


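
On the FreeSurfer license handling in bids-fmriprep-workflow-NP.sh: the error message asks for license content with `\n` for new lines, but `echo -n` in bash does not expand such escapes by default, so the file may end up with literal `\n` sequences. A sketch of one way to handle both cases, assuming the content really is passed with literal `\n` separators; `write_license` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: $1 is either a path to a license file or the license
# content itself (with literal \n for new lines); $2 is the destination file.
write_license () {
    if [ -e "$1" ]; then
        cp "$1" "$2"
    else
        # %b expands backslash escapes such as \n found in the argument
        printf '%b\n' "$1" > "$2"
    fi
}

# Usage example with made-up content
dest=$(mktemp)
write_license 'first line\nsecond line' "$dest"
cat "$dest"
rm -f "$dest"
```

Whether expanding the escapes is actually desired depends on how the license is expected to be supplied, so treat this as a suggestion only.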

42 changes: 42 additions & 0 deletions docs/usecases/simple_kwyk.sh
@@ -0,0 +1,42 @@
#!/bin/bash
#emacs: -*- mode: shell-script; c-basic-offset: 4; tab-width: 4; indent-tabs-mode: t -*-
#ex: set sts=4 ts=4 sw=4 noet:
#
#
# COPYRIGHT: Yaroslav Halchenko 2019
#
# LICENSE: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#

set -eu

cd $(mktemp -d --tmpdir=. ds-XXXX)
pwd
datalad create .
datalad install -d . ///repronim/containers
datalad install -d . -s https://github.com/ReproNim/ds000003-demo data/bids

mkdir data/kwyked
datalad containers-run \
--input data/bids/sub-02/anat/sub-02_T1w.nii.gz \
--output data/kwyked/sub-02_T1w \
-n containers/neuronets-kwyk \
'{inputs}' '{outputs}'
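
A portability note on `mktemp -d --tmpdir=. ds-XXXX` used above: `--tmpdir` is a GNU option that BSD/macOS versions of `mktemp` may not accept. A sketch of an equivalent that passes the template as a path, which both implementations understand; `make_scratch_dataset_dir` is a hypothetical name used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: create a uniquely named ds-XXXXXX directory under the
# given parent (default: current directory) and print its path.
make_scratch_dataset_dir () {
    parent="${1:-.}"
    mktemp -d "$parent/ds-XXXXXX"
}

# Usage example
dir=$(make_scratch_dataset_dir .)
echo "created $dir"
rmdir "$dir"
```

In the script this would read `cd "$(make_scratch_dataset_dir .)"`, with the quotes guarding against spaces in the path.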