Merge pull request #849 from marrlab/benchmark_slurm_log
Benchmark slurm log folder custom
smilesun authored Jul 11, 2024
2 parents ca493e5 + d72a8e4 commit 344936b
Showing 4 changed files with 13 additions and 6 deletions.
4 changes: 2 additions & 2 deletions docs/doc_benchmark.md
@@ -74,10 +74,10 @@ hyperparameter sampling and pytorch.
The following script will help to find out which job has failed and the error message, so that you could direct to the
specific log file
```cluster
-bash ./sh_list_error.sh ./zoutput/slurm_logs
+bash ./sh_list_error.sh ./zoutput/benchmarks/[output folder of the sepcifed benchmark in the yaml file]/slurm_logs
```
#### Map between slurm job id and sampled hyperparameter index
-suppose the slurm job id is 14144163, one could the corresponding log file in `./zoutput/slurm_logs` folder via
+suppose the slurm job id is 14144163, one could the corresponding log file in `./zoutput/[output folder of the sepcifed benchmark in the yaml file]/slurm_logs` folder via
`find . | grep -i "14144163"`

the results can be
4 changes: 4 additions & 0 deletions domainlab/exp_protocol/benchmark.smk
@@ -72,6 +72,8 @@ rule parameter_sampling:
expand("{path}", path=config_path)
output:
dest=expand("{output_dir}/hyperparameters.csv", output_dir=config["output_dir"])
+# resources:
+# log_dir="slurm_logs_test"
params:
sampling_seed=os.environ["DOMAINLAB_CUDA_HYPERPARAM_SEED"]
run:
@@ -159,6 +161,8 @@ rule agg_results:
# put different csv file in a big csv file
input:
exp_results=experiment_result_files
+# resources:
+# log_dir="slurm_logs_test"
output:
out_file=expand("{output_dir}/results.csv", output_dir=config["output_dir"])
run:
6 changes: 3 additions & 3 deletions examples/yaml/slurm/config.yaml
@@ -1,6 +1,6 @@
# This yaml file has been adapted from https://github.com/jdblischak/smk-simple-slurm
cluster:
-mkdir -p zoutput/slurm_logs/{rule} &&
+mkdir -p $logdir/{rule} &&
sbatch
--partition=gpu_p
--qos=gpu_normal
@@ -10,8 +10,8 @@ cluster:
-c 2
--mem=160G
--job-name=smk-{rule}-{wildcards}
---output=zoutput/slurm_logs/{rule}/{rule}-{wildcards}-%j.out
---error=zoutput/slurm_logs/{rule}/{rule}-{wildcards}-%j.err
+--output=$logdir/{rule}/{rule}-{wildcards}-%j.out
+--error=$logdir/{rule}/{rule}-{wildcards}-%j.err
default-resources:
- partition=gpu_p
- qos=gpu_normal
5 changes: 4 additions & 1 deletion run_benchmark_slurm.sh
@@ -32,4 +32,7 @@ echo "Number of GPUs: $NUMBER_GPUS"
echo "Results will be stored in: $results_dir"

# Helmholtz
-snakemake --profile "examples/yaml/slurm" --config yaml_file="$CONFIGFILE" --keep-going --keep-incomplete --notemp --cores 3 -s "domainlab/exp_protocol/benchmark.smk" --configfile "$CONFIGFILE" --config output_dir="$results_dir" 2>&1 | tee "$logfile"
+export logdir="${results_dir}/slurm_logs/"
+echo "slurm logs going into ${logdir}"
+# snakemake --config logdir="zoutput/benchmark/logs" does not seem to work
+snakemake --profile "examples/yaml/slurm" --config yaml_file="$CONFIGFILE" --keep-going --keep-incomplete --notemp --cores 3 -s "domainlab/exp_protocol/benchmark.smk" --configfile "$CONFIGFILE" --config output_dir="$results_dir" 2>&1 | tee "$logfile"
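
Note on the mechanism: the change appears to rely on `logdir` being exported in `run_benchmark_slurm.sh`, so that the `$logdir` references in `examples/yaml/slurm/config.yaml` are expanded by the shell that runs the cluster command. The sketch below imitates that flow locally, without Snakemake or Slurm, and then looks up the log files of a job id with the `find . | grep` approach from the documentation. The results folder, the rule name `run_experiment` and the wildcard string `index=3` are made-up placeholders; the job id is the one used in the documentation example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical results folder; in run_benchmark_slurm.sh this is $results_dir.
results_dir="$(mktemp -d)/zoutput/benchmarks/demo_benchmark"

# Same export as in the commit: child processes inherit the variable,
# so a command string containing $logdir can be expanded by their shell.
export logdir="${results_dir}/slurm_logs/"

# Placeholders standing in for Snakemake's {rule} and {wildcards}
# and for Slurm's %j (job id); the names are assumptions for illustration.
rule="run_experiment"
wildcards="index=3"
jobid="14144163"

# Mirrors the `mkdir -p $logdir/{rule}` step of the cluster command.
mkdir -p "${logdir}/${rule}"

# Mock the files that sbatch would write via --output and --error.
touch "${logdir}/${rule}/${rule}-${wildcards}-${jobid}.out"
touch "${logdir}/${rule}/${rule}-${wildcards}-${jobid}.err"

# Locate the log files belonging to the job id.
matches="$(find "${results_dir}" -type f | grep -i "${jobid}" | sort)"
echo "${matches}"
```

Because the log folder now sits under the benchmark's own output folder, logs from different benchmarks no longer share `zoutput/slurm_logs`.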
