Commit
Merge pull request chapel-lang#5316 from ronawho/slurm-aware-paratest
Add a simple slurm-aware paratest wrapper for chapcs

[reviewed by @mppf and @ben-albrecht]

Add a basic wrapper for paratest that does an salloc and runs paratest. It essentially does:

    salloc --nodes=${num_free_nodes} --partition=chapel --share paratest.server

You use it just like you do paratest.server, except that you don't set -nodefile or -nodepara. The wrapper automatically determines how many nodes to use by calculating how many nodes are not reserved exclusively. It also does some things to make paratests play nicer with other paratests in order to avoid timeouts.

This should allow us to run a std paratest in ~20 minutes without interfering with nightly testing and without causing timeouts for ourselves or other developers. It should also make it easy for devs to grab a node exclusively during the day to run performance tests. They'll have to wait for existing paratests to finish, but paratests started after an exclusive reservation will leave nodes open for that job.

With regular paratest I often see timeouts even if just one other person is running paratest. With this, I was able to run 6 concurrent paratests (max allowed by slurm) without getting any timeouts. I tested with both gasnet and std configuration paratests.

Some details about the wrapper:
-------------------------------

To calculate the number of nodes to use, we use `sinfo` to determine how many nodes are online and `squeue` to determine how many nodes are reserved by non-shared jobs. This allows us to run on all nodes not being used exclusively (by nightly testing, or by a developer wanting to do performance testing or something).

The wrapper does a few other things:
- automatically determines a "good" nodepara so that testing runs faster
- throws `--share --nice` to `salloc` to share slurm resources
- sets `CHPL_TEST_LIMIT_RUNNING_EXECUTABLES=yes`, `QT_AFFINITY=no`, and `QT_SPINCOUNT=300` to limit timeouts
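
The node calculation and the resulting `salloc` call described above can be sketched roughly as follows. This is only an illustration, not the wrapper's actual code: the commit message does not show the exact `sinfo`/`squeue` invocations or how their output is parsed, so the sketch starts from already-parsed values. The names `Job`, `free_node_count`, `salloc_command`, and `wrapper_env` are hypothetical; only the `salloc` flags and the environment variables come from the description.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One running job, as it might be parsed from `squeue` (hypothetical shape)."""
    nodes: int    # number of nodes the job holds
    shared: bool  # True if the job allows sharing its nodes


def free_node_count(online_nodes, jobs):
    """Nodes that are online and not held by a non-shared (exclusive) job."""
    exclusive = sum(job.nodes for job in jobs if not job.shared)
    # Never report a negative count if the inputs are momentarily inconsistent.
    return max(online_nodes - exclusive, 0)


def salloc_command(num_free_nodes, partition="chapel"):
    """Build the salloc argument list described in the commit message."""
    return [
        "salloc",
        f"--nodes={num_free_nodes}",
        f"--partition={partition}",
        "--share",
        "--nice",
        "paratest.server",
    ]


def wrapper_env(base_env):
    """Return a copy of base_env with the settings the wrapper uses to limit timeouts."""
    env = dict(base_env)
    env.update({
        "CHPL_TEST_LIMIT_RUNNING_EXECUTABLES": "yes",
        "QT_AFFINITY": "no",
        "QT_SPINCOUNT": "300",
    })
    return env
```

For example, with 10 nodes online, one exclusive 2-node job, and one shared 3-node job, `free_node_count` returns 8, since only the exclusive job removes nodes from the pool. A real wrapper would pass the command list and environment to something like `subprocess.run`, along with any extra paratest arguments.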