Add RL examples #463
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #463 +/- ##
==========================================
- Coverage 85.80% 85.75% -0.06%
==========================================
Files 80 80
Lines 7794 7794
==========================================
- Hits 6688 6684 -4
- Misses 1106 1110 +4
@foksly gentle reminder: do you still have time for the PR?
Great job!
return exp_name

class AdamWithClipping(torch.optim.Adam):
note to @mryab : we've recently merged the same clipping functionality here:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/moe/server/layers/optim.py#L48
Would you prefer if we...
- keep everything as is, accept some code duplication?
- extract moe.server.layers.optim to utils.optim and use it here?
- keep wrapper in hivemind.optim and import from there?
- insert your option here :)
I'm 50:50 between the "keep here, accept duplication" and "move OptimizerWrapper and ClippingWrapper to hivemind.optim.wrapper" solutions, so ultimately, it's @foksly's call
The utils option is also acceptable, but I'm slightly against this folder becoming too bloated. That said, it looks like a reasonable place to put such code, so any of these three solutions is fine by me (as long as you don't import the wrapper from hivemind.moe).
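For reference, a rough sketch of what such a clipping wrapper could look like as a subclass of `torch.optim.Adam`; the `max_grad_norm` argument and the exact structure are assumptions for illustration, not necessarily identical to the class in this PR or to the one in `hivemind.moe.server.layers.optim`:

```python
import torch
from torch.nn.utils import clip_grad_norm_


class AdamWithClipping(torch.optim.Adam):
    """Adam that clips the global gradient norm of its parameters before every step."""

    def __init__(self, params, *args, max_grad_norm: float = 1.0, **kwargs):
        super().__init__(params, *args, **kwargs)
        self.max_grad_norm = max_grad_norm

    def step(self, closure=None):
        # Gather all parameters from every param group and clip their combined grad norm in-place.
        parameters = [p for group in self.param_groups for p in group["params"]]
        clip_grad_norm_(parameters, self.max_grad_norm)
        return super().step(closure)
```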
examples/ppo/README.md
Outdated
@@ -0,0 +1,45 @@
# Training PPO with decentralized averaging

This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
nit: i believe PPO is on-policy
I also have the same belief :)
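As a rough illustration of the README excerpt above, here is one way `hivemind.Optimizer` could wrap the stable-baselines3 optimizer; the run id, environment name, batch-size numbers, and the exact integration point (`model.policy.optimizer`) are placeholder assumptions, not values taken from this PR:

```python
import hivemind
from stable_baselines3 import PPO

# Start a DHT node; additional peers would pass initial_peers=<maddrs of an existing node>.
dht = hivemind.DHT(start=True)

model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", n_steps=128, batch_size=256, verbose=1)

# Replace the policy's local Adam with a collaborative optimizer that averages
# updates with every peer training under the same run_id.
model.policy.optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",                # peers with the same run_id train together
    optimizer=model.policy.optimizer,     # wrap the existing torch.optim.Adam
    batch_size_per_step=256,              # samples contributed locally per optimizer step
    target_batch_size=16384,              # global batch size that triggers averaging
    verbose=True,
)

model.learn(total_timesteps=10_000_000)
```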
[just in case] feel free to ping me if you need any help with black / isort
Current plan:
TODO:
Later: