Models will always be initialized without dropout layers in self-tuning ruleset #753
Our current API has two dropout-related limitations. In the external tuning ruleset, we read the dropout value from the hparam config and pass it to the model initialization functions. In the self-tuning ruleset, there is no convenient way to specify the dropout value at model initialization.
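Roughly, the difference between the two rulesets amounts to the following (a simplified sketch, not the exact runner code; attribute names follow the ones used in this issue):

```python
# External tuning ruleset: dropout is read from the hparam config
# (the JSON search space) and forwarded to model initialization.
dropout_rate = hyperparameters.dropout_rate          # e.g. 0.1
aux_dropout_rate = hyperparameters.aux_dropout_rate  # e.g. 0.0

# Self-tuning ruleset: no hparam config is passed in, so there is currently
# no handle through which a submission could choose these values.
dropout_rate = None
aux_dropout_rate = None
```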
Some considerations about changing the dropout implementation.

**Current situation**

The dropout probability value is provided as a hyperparameter in the JSON search space. It is then used in

```python
model_params, model_state = workload.init_model_fn(
    model_init_rng, dropout_rate, aux_dropout_rate)
```

After initializing the model, we torch.compile it and initialize the optimizer.

**Current limitations**

These are the two limitations described above: the dropout value can only come from the external-tuning hparam config, and the self-tuning ruleset has no way to set it at all.
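To make the limitation concrete, here is a hedged sketch of what happens after initialization (call sites are approximate, not verbatim from the runner): the dropout configuration is frozen into the module tree before compilation, so changing it later would mean re-initializing and re-compiling.

```python
# At this point the dropout configuration is already baked into workload._model.
compiled_model = torch.compile(workload._model)  # approximate call site

# The optimizer is then created; a submission gets no later opportunity to
# inject a dropout value without rebuilding and re-compiling the model.
optimizer_state = init_optimizer_state(
    workload, model_params, model_state, hyperparameters, opt_init_rng)
```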
**How can we address these problems?**

I can see several possibilities; some require major changes, some are less disruptive.

**(A) Extend the submission module API to provide the initial dropout value** ⭐

A submission should provide a function that returns the dropout value(s) to use when initializing the model (see the sketch after this comment).

**(B) Re-init and re-compile the model**

We could add a step that re-initializes (and re-compiles) the model once the submission's dropout value is known.

**(C) Pass dropout to the model fwd call**

Not trivial; we would need to modify all model implementations.

**Conclusion**

My suggested option is (A), but I am happy to discuss!
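A minimal sketch of option (A), assuming a submission-side hook; the name get_dropout_rates and its signature are purely illustrative and may differ from whatever the fix actually adopts:

```python
from typing import Optional, Tuple


# Hypothetical submission-module hook (name and signature are illustrative).
def get_dropout_rates() -> Tuple[Optional[float], Optional[float]]:
  """Returns (dropout_rate, aux_dropout_rate) to use at model initialization."""
  return 0.1, 0.1


# On the runner side, the hook would be consulted before building the model
# (sketch; `workload` and `model_init_rng` come from the runner's scope).
def build_model(workload, submission_module, model_init_rng):
  dropout_rate, aux_dropout_rate = submission_module.get_dropout_rates()
  return workload.init_model_fn(model_init_rng, dropout_rate, aux_dropout_rate)
```

This would work the same way in both rulesets, since the value no longer has to travel through the hyperparameter config.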
Hi Niccolo, are there any open things here left to discuss regarding the plan? I think we agreed on (A) in the eng meeting?
Hey Priya! Nothing left to discuss, just lagging behind! Will submit a PR by this week! Sorry for the delay.
As discussed offline, I have implemented a fix in #851.
In submission_runner.py, if we are in the self-tuning ruleset, the hyperparameters argument to train_once will always be None.
Then in this code snippet
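(Shown below as an approximate reconstruction; the exact lines in submission_runner.py may differ slightly.)

```python
dropout_rate = None
aux_dropout_rate = None
if hasattr(hyperparameters, 'dropout_rate'):
  dropout_rate = hyperparameters.dropout_rate
if hasattr(hyperparameters, 'aux_dropout_rate'):
  aux_dropout_rate = hyperparameters.aux_dropout_rate

# With hyperparameters=None in the self-tuning ruleset, both rates stay None.
model_params, model_state = workload.init_model_fn(
    model_init_rng, dropout_rate, aux_dropout_rate)
```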
workload.init_model_fn will always get None for dropout_rate and aux_dropout_rate, so Dropout layers won't ever be added to the model.
Although submissions could call workload.init_model_fn again themselves to make use of its side effect of setting workload._model, this is awkward, and it is also challenging for workloads near the memory limit, since it superfluously reconstructs model_params on device.
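For illustration, such a workaround would look roughly like the sketch below; the hard-coded rates and the placement inside init_optimizer_state are assumptions, not something the benchmark prescribes:

```python
def init_optimizer_state(workload, model_params, model_state, hyperparameters, rng):
  # Awkward workaround: call init_model_fn a second time purely for its side
  # effect of rebuilding workload._model with dropout layers enabled. This
  # superfluously reconstructs model_params on device, which can push
  # memory-bound workloads over the limit.
  model_params, model_state = workload.init_model_fn(
      rng, dropout_rate=0.1, aux_dropout_rate=0.1)
  optimizer_state = {}  # ...build the actual optimizer state here...
  return optimizer_state
```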