Add RL examples #463
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #463 +/- ##
==========================================
- Coverage 85.80% 85.75% -0.06%
==========================================
Files 80 80
Lines 7794 7794
==========================================
- Hits 6688 6684 -4
- Misses 1106 1110 +4
@foksly gentle reminder: do you still have time for the PR?
Great job!
return exp_name

class AdamWithClipping(torch.optim.Adam):
note to @mryab : we've recently merged the same clipping functionality here:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/moe/server/layers/optim.py#L48
Would you prefer if we...
- keep everything as is, accept some code duplication?
- extract moe.server.layers.optim to utils.optim and use it here?
- keep wrapper in hivemind.optim and import from there?
- insert your option here :)
I'm 50:50 between the "keep here, accept duplication" and "move OptimizerWrapper and ClippingWrapper to hivemind.optim.wrapper" solutions, so ultimately, it's @foksly's call
The utils option is also acceptable, but I'm slightly against this folder becoming too bloated. That said, it looks like a reasonable place to put such code, so any of these three solutions is fine by me (as long as you don't import the wrapper from hivemind.moe).
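For reference, a rough sketch of what such a clipping wrapper could look like as a subclass of `torch.optim.Adam`; the `max_grad_norm` argument and the exact structure are assumptions for illustration, not necessarily identical to the class in this PR or to the one in `hivemind.moe.server.layers.optim`:

```python
import torch
from torch.nn.utils import clip_grad_norm_


class AdamWithClipping(torch.optim.Adam):
    """Adam that clips the global gradient norm of its parameters before every step."""

    def __init__(self, params, *args, max_grad_norm: float = 1.0, **kwargs):
        super().__init__(params, *args, **kwargs)
        self.max_grad_norm = max_grad_norm

    def step(self, closure=None):
        # Gather all parameters from every param group and clip their combined grad norm in-place.
        parameters = [p for group in self.param_groups for p in group["params"]]
        clip_grad_norm_(parameters, self.max_grad_norm)
        return super().step(closure)
```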
examples/ppo/README.md
Outdated
@@ -0,0 +1,45 @@
# Training PPO with decentralized averaging

This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
nit: i believe PPO is on-policy
I also have the same belief :)
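As a rough illustration of the README excerpt above, here is one way `hivemind.Optimizer` could wrap the stable-baselines3 optimizer; the run id, environment name, batch-size numbers, and the exact integration point (`model.policy.optimizer`) are placeholder assumptions, not values taken from this PR:

```python
import hivemind
from stable_baselines3 import PPO

# Start a DHT node; additional peers would pass initial_peers=<maddrs of an existing node>.
dht = hivemind.DHT(start=True)

model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", n_steps=128, batch_size=256, verbose=1)

# Replace the policy's local Adam with a collaborative optimizer that averages
# updates with every peer training under the same run_id.
model.policy.optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",                # peers with the same run_id train together
    optimizer=model.policy.optimizer,     # wrap the existing torch.optim.Adam
    batch_size_per_step=256,              # samples contributed locally per optimizer step
    target_batch_size=16384,              # global batch size that triggers averaging
    verbose=True,
)

model.learn(total_timesteps=10_000_000)
```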
[just in case] feel free to ping me if you need any help with black / isort
Current plan:
TODO:
Later: