Commit 344936b

Merge pull request #849 from marrlab/benchmark_slurm_log
Benchmark slurm log folder custom
2 parents ca493e5 + d72a8e4 commit 344936b

4 files changed: +13 -6 lines changed

docs/doc_benchmark.md

+2 -2
@@ -74,10 +74,10 @@ hyperparameter sampling and pytorch.
 The following script will help to find out which job has failed and the error message, so that you could direct to the
 specific log file
 ```cluster
-bash ./sh_list_error.sh ./zoutput/slurm_logs
+bash ./sh_list_error.sh ./zoutput/benchmarks/[output folder of the sepcifed benchmark in the yaml file]/slurm_logs
 ```
 #### Map between slurm job id and sampled hyperparameter index
-suppose the slurm job id is 14144163, one could the corresponding log file in `./zoutput/slurm_logs` folder via
+suppose the slurm job id is 14144163, one could the corresponding log file in `./zoutput/[output folder of the sepcifed benchmark in the yaml file]/slurm_logs` folder via
 `find . | grep -i "14144163"`
 
 the results can be
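
As a hedged illustration of the lookup described in the updated docs, the sketch below builds a throwaway folder tree and searches it for the job id. The benchmark folder name `demo_benchmark`, the rule name `run_experiment`, and the wildcard string `index=3` are invented for the demo. Only the `{rule}/{rule}-{wildcards}-%j.out` naming pattern is taken from the slurm profile changed in this commit.

```shell
#!/usr/bin/env bash
# Hypothetical layout, created in a temp dir so the sketch is self-contained.
demo_root="$(mktemp -d)"
log_dir="${demo_root}/zoutput/benchmarks/demo_benchmark/slurm_logs/run_experiment"
mkdir -p "${log_dir}"
touch "${log_dir}/run_experiment-index=3-14144163.out"

job_id="14144163"
# Search only below the output folder instead of the whole working directory.
matches="$(find "${demo_root}/zoutput" -type f -name "*${job_id}*")"
echo "${matches}"

# Remove the demo tree again.
rm -rf "${demo_root}"
```

Limiting `find` to the benchmark's own output folder keeps log files from other benchmark runs out of the result, which is presumably the point of moving the logs under each benchmark's output directory.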

domainlab/exp_protocol/benchmark.smk

+4
@@ -72,6 +72,8 @@ rule parameter_sampling:
         expand("{path}", path=config_path)
     output:
         dest=expand("{output_dir}/hyperparameters.csv", output_dir=config["output_dir"])
+    # resources:
+        # log_dir="slurm_logs_test"
     params:
         sampling_seed=os.environ["DOMAINLAB_CUDA_HYPERPARAM_SEED"]
     run:
@@ -159,6 +161,8 @@ rule agg_results:
     # put different csv file in a big csv file
     input:
         exp_results=experiment_result_files
+    # resources:
+        # log_dir="slurm_logs_test"
     output:
         out_file=expand("{output_dir}/results.csv", output_dir=config["output_dir"])
     run:

examples/yaml/slurm/config.yaml

+3 -3
@@ -1,6 +1,6 @@
 # This yaml file has been adapted from https://github.com/jdblischak/smk-simple-slurm
 cluster:
-  mkdir -p zoutput/slurm_logs/{rule} &&
+  mkdir -p $logdir/{rule} &&
   sbatch
     --partition=gpu_p
     --qos=gpu_normal
@@ -10,8 +10,8 @@ cluster:
     -c 2
     --mem=160G
     --job-name=smk-{rule}-{wildcards}
-    --output=zoutput/slurm_logs/{rule}/{rule}-{wildcards}-%j.out
-    --error=zoutput/slurm_logs/{rule}/{rule}-{wildcards}-%j.err
+    --output=$logdir/{rule}/{rule}-{wildcards}-%j.out
+    --error=$logdir/{rule}/{rule}-{wildcards}-%j.err
 default-resources:
   - partition=gpu_p
   - qos=gpu_normal
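
One consequence of the profile now reading `$logdir` from the environment: if the variable is unset, `$logdir/{rule}` expands to a path directly under `/`. The sketch below shows one possible guard using the shell's `${var:?message}` expansion. The helper `rule_log_dir` and the rule name `run_experiment` are hypothetical and not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the per-rule log folder path used by the profile.
rule_log_dir() {
  local rule="$1"
  # ${logdir:?...} aborts with an error when logdir is unset or empty,
  # instead of silently expanding to "/<rule>".
  printf '%s/%s\n' "${logdir:?logdir must be exported before calling snakemake}" "${rule}"
}

export logdir="zoutput/benchmarks/demo/slurm_logs"
with_logdir="$(rule_log_dir run_experiment)"
echo "${with_logdir}"

# In a subshell with logdir unset, the expansion fails and returns non-zero.
if (unset logdir; rule_log_dir run_experiment) 2>/dev/null; then
  unset_status="ok"
else
  unset_status="failed"
fi
echo "unset logdir: ${unset_status}"
```

A similar check could sit near the top of the launcher script, so a missing export is reported before any job is submitted.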

run_benchmark_slurm.sh

+4 -1
@@ -32,4 +32,7 @@ echo "Number of GPUs: $NUMBER_GPUS"
 echo "Results will be stored in: $results_dir"
 
 # Helmholtz
-snakemake --profile "examples/yaml/slurm" --config yaml_file="$CONFIGFILE" --keep-going --keep-incomplete --notemp --cores 3 -s "domainlab/exp_protocol/benchmark.smk" --configfile "$CONFIGFILE" --config output_dir="$results_dir" 2>&1 | tee "$logfile"
+export logdir="${results_dir}/slurm_logs/"
+echo "slurm logs going into ${logdir}"
+# snakemake --config logdir="zoutput/benchmark/logs" does not seem to work
+snakemake --profile "examples/yaml/slurm" --config yaml_file="$CONFIGFILE" --keep-going --keep-incomplete --notemp --cores 3 -s "domainlab/exp_protocol/benchmark.smk" --configfile "$CONFIGFILE" --config output_dir="$results_dir" 2>&1 | tee "$logfile"
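
The change above relies on `export` making `logdir` visible to processes started from this script, so that the shell running the profile's `cluster` command can expand `$logdir`. A minimal sketch of that mechanism, with a made-up `results_dir` value and a plain `bash -c` child standing in for the submission shell:

```shell
#!/usr/bin/env bash
# Hypothetical results directory; the real script derives $results_dir elsewhere.
results_dir="zoutput/benchmarks/demo"
export logdir="${results_dir}/slurm_logs/"

# A child shell inherits the exported variable and can expand it,
# the same way the shell executing the cluster command would.
child_view="$(bash -c 'printf "%s" "$logdir"')"
echo "child sees: ${child_view}"

# Without export, the child would see an empty value.
not_exported="local only"
child_other="$(bash -c 'printf "%s" "$not_exported"')"
echo "child sees for non-exported variable: '${child_other}'"
```

Because `logdir` ends in `/` and the profile appends `/{rule}`, the resulting paths contain `//`. On POSIX systems this resolves like a single slash, so it is cosmetic rather than harmful.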
