1226 paralilisation #1234
Open
LukasFehring wants to merge 7 commits into `development` from `1226-paralilisation`.
+97
−98
Commits (7):

- `1c5d703` Play With paralelisation (LukasFehring)
- `50b4d3f` Update Parallelism Doc (LukasFehring)
- `502542b` Update paralelisation example
- `b42c9c0` Update docs (LukasFehring)
- `9c73411` Update parallelism docs (LukasFehring)
- `a395f32` Update parallelism (LukasFehring)
- `afd9619` Adapt Changelog.md (LukasFehring)
# Parallelism

To facilitate parallel execution, SMAC supports executing multiple workers simultaneously via [Dask](https://www.dask.org/). Using this functionality splits SMAC into a main process and Dask workers which handle the execution. The main job runs the optimization process and coordinates the executor jobs. The executors are queried with the target function and a hyperparameter configuration, execute it, and return the result. The executors remain open between executions.

!!! note

    Keep in mind that additional workers are only used to evaluate trials. The main thread still orchestrates the optimization process, including training the surrogate model.

!!! warning

    Using a high number of workers when the target function evaluation is fast might be counterproductive due to the overhead of communication. Consider using only one worker in this case.

!!! warning

    When using multiple workers, SMAC is no longer reproducible.
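The main/worker split described above can be sketched conceptually with Python's standard library. This is not SMAC's implementation; `target_function` and `propose_config` are hypothetical stand-ins for the user's target function and the surrogate-model-driven proposal step:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def target_function(config):
    # Stand-in for an expensive training run; in SMAC this would be
    # the user's real target function.
    return (config["x"] - 3) ** 2

def propose_config(rng):
    # Stand-in for the configuration proposal done by the main job.
    return {"x": rng.uniform(-10, 10)}

rng = random.Random(0)
n_workers = 5
history = []

# The "main job" proposes configurations and collects results; the pool
# plays the role of the Dask workers, which stay open between trials.
with ThreadPoolExecutor(max_workers=n_workers) as pool:
    futures = [pool.submit(target_function, propose_config(rng)) for _ in range(20)]
    for future in as_completed(futures):
        history.append(future.result())

best = min(history)
print(best)
```

Only the evaluations run on the workers; proposing configurations and tracking the history stay in the main process, mirroring the note above.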
## Parallelizing Locally

To utilize parallelism locally, i.e. to run the workers on the same machine as the main job, specify the ``n_workers`` keyword when creating the scenario:

```python
Scenario(model.configspace, n_workers=5)
```
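With ``n_workers > 1``, results arrive in a timing-dependent order, which is why runs are no longer reproducible (see the warning above). A generic sketch of this effect, unrelated to SMAC's internals:

```python
import time
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate(config_id: int) -> int:
    # Simulate a target function whose runtime varies between runs.
    time.sleep(random.uniform(0.0, 0.02))
    return config_id

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(evaluate, i) for i in range(8)]
    arrival_order = [f.result() for f in as_completed(futures)]

# The same eight evaluations always finish, but the order in which their
# results arrive can differ between runs, changing what a sequential
# optimizer would propose next.
print(arrival_order)
```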
## Parallelizing on SLURM

To utilize this split of main and execution jobs on a [SLURM cluster](https://slurm.schedmd.com/), SMAC supports manually specifying a [Dask](https://www.dask.org/) client. This allows executing the target function on dedicated SLURM jobs, each configured with the same hardware requirements.

!!! note

    While most SLURM clusters behave similarly, the example Dask client might not work for every cluster. For example, some clusters only allow spawning new jobs from the login node.

To configure SMAC properly for each cluster, you need to know which ports allow communication between the main and worker jobs. The Dask client is then created as follows:
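Which ports are usable is cluster-specific; the port numbers in the example below (60001, 60010:60100) are placeholders to adapt. One generic way to check whether a candidate scheduler port is currently free on a node (a plain-socket sketch, not part of SMAC's or Dask's API):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind to `port`; if the bind succeeds, the port is currently free."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False

# Pick the first free port from a candidate range for the scheduler.
candidates = range(60001, 60011)
scheduler_port = next(p for p in candidates if port_is_free(p))
print(scheduler_port)
```

Note that a port free now may be taken by the time the scheduler starts, so configured port ranges should still be reserved with your cluster administrators.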
```python
...
from dask.distributed import Client
from dask_jobqueue import SLURMCluster
from smac import BlackBoxFacade, Scenario

cluster = SLURMCluster(
    queue="partition_name",               # Name of the partition
    cores=4,                              # CPU cores requested per job
    memory="4 GB",                        # RAM requested per job
    walltime="00:10:00",                  # Walltime limit for a runner job
    processes=1,                          # Number of processes per worker
    log_directory="tmp/smac_dask_slurm",  # Logging directory
    nanny=False,                          # False unless you want to use pynisher
    worker_extra_args=[
        "--worker-port",
        "60010:60100",                    # Worker port range
    ],
    scheduler_options={
        "port": 60001,                    # Main job port
    },
)

# Dask creates n_workers jobs on the cluster which stay open.
cluster.scale(jobs=n_workers)

# Wait until all n_workers workers have been created.
client = Client(address=cluster)
client.wait_for_workers(n_workers)

# Now we use SMAC to find the best hyperparameters.
smac = BlackBoxFacade(
    scenario,            # The scenario created above
    model.train,         # The target function
    overwrite=True,      # Overwrite any previous results
    dask_client=client,  # The manually created Dask client
)
incumbent = smac.optimize()
```
The full example of this code is given in the [parallelism example](../examples/1%20Basics/7_parallelization_cluster.md).