Add CleanRL RNN (modified to GRU/vanilla RNN) example #250
Ivan-267 merged 8 commits into edbeeching:main
Conversation
Add README for CleanRL PPO GRU Discrete Actions example
Updated comments to reflect additional changes and hyperparameter adjustments.
Added installation note for tyro.
Fixes some issues when loading on CUDA after training with --no-cuda, and updates hyperparams (to the last ones used while training an env where they worked OK; not necessarily globally optimal).
I've trained this memory env using the script (I might have modified some hyperparams locally vs. the PR script, but I shared the ones used in the env description). It uses only local obs: a modified raycast sensor (distances ordered by physics layer), the applied movement/turn, and normalized episode time. It's my first env designed to use an RNN with Godot RL Agents, after the textual one we merged earlier. memory_env.mp4: https://github.com/Ivan-267/Memory-Find-Clue-Then-Goal-RL-Environment
I ran a test on the Simple Memory Test env from the PR: edbeeching/godot_rl_agents_examples#58. For reference, the old frame stacking result from that PR (note: some hyperparams are different, so it's not the closest comparison possible), followed by the result using the RNN script. [Charts: frame stacking result; RNN script result; GRU (green) and vanilla RNN runs on the same chart.]
Long episode length (200): training with this setting took a while and I had to stop it early, so the result doesn't show whether/where it converges. Still, we can see the average reward progressing.
@edbeeching I've added some test results and this is ready to merge from my side. The reported errors on Linux might have something to do with memory/space, but should not be affected by this PR. |





CleanRL-based RNN example.
With some help from an LLM, I modified the example to use GRU instead of LSTM (it's also possible to use a vanilla RNN instead of GRU, as switching between them is simple). I didn't compare against the original LSTM; the idea was just to have a vanilla RNN/GRU option.
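As a rough sketch of why the switch is simple (this is not the PR's actual code; the helper name and shapes here are illustrative): in PyTorch, nn.GRU and nn.RNN share the same call signature, so only the constructor choice changes, whereas nn.LSTM would additionally require handling a (h, c) hidden-state tuple.

```python
import torch
import torch.nn as nn

def make_rnn(input_size: int, hidden_size: int, use_vanilla_rnn: bool = False) -> nn.Module:
    # GRU by default; vanilla RNN when requested. Both take (input, hidden)
    # and return (output, next_hidden), unlike nn.LSTM's (h, c) tuple.
    rnn_cls = nn.RNN if use_vanilla_rnn else nn.GRU
    return rnn_cls(input_size, hidden_size)

gru = make_rnn(8, 16)
vanilla = make_rnn(8, 16, use_vanilla_rnn=True)
x = torch.zeros(5, 1, 8)    # (seq_len, batch, features)
h0 = torch.zeros(1, 1, 16)  # (num_layers, batch, hidden_size)
out_g, h_g = gru(x, h0)
out_v, h_v = vanilla(x, h0)
```

Because the hidden state has the same shape in both cases, the rest of the PPO rollout/training loop can stay unchanged when toggling between the two.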
I also added a checkpoint saving feature, basic loading/inference, and reporting the min/max reward from the latest 40 episodes along with the average to the terminal (this helps to see whether the agent has discovered the "max reward", e.g. a success condition, and how far the average is from the maximum).
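A minimal sketch of what such reporting and checkpointing can look like (hypothetical names like `report`, `maybe_save`, and `run_dir`; this is not the PR's actual implementation):

```python
from collections import deque

import torch

episode_returns = deque(maxlen=40)  # rolling window of the latest 40 episode returns

def report(ep_return: float) -> str:
    # Report avg alongside min/max, so you can see whether a "max reward"
    # (e.g. a success condition) was found and how far the average is from it.
    episode_returns.append(ep_return)
    avg = sum(episode_returns) / len(episode_returns)
    return f"avg={avg:.2f} min={min(episode_returns):.2f} max={max(episode_returns):.2f}"

def maybe_save(agent: torch.nn.Module, global_step: int, every: int, run_dir: str) -> None:
    # Save a checkpoint every `every` global steps; 0 disables saving.
    if every > 0 and global_step % every == 0:
        path = f"{run_dir}/checkpoint_{global_step}.pt"
        torch.save(agent.state_dict(), path)
        print(f"Saved checkpoint to {path}")
```

Loading for inference is then a matter of `agent.load_state_dict(torch.load(path))` followed by running the policy without gradient updates.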
The hyperparameters are based on testing; they aren't necessarily optimal, but they worked to train at least one unreleased environment (I still need to check whether they are the latest iteration I have locally; if not, I can update them later). I will verify later, but this example should be capable of training the newly added memory test env: edbeeching/godot_rl_agents_examples#58
Note that onnx export/inference is not featured in this example, as supporting RNNs requires some modifications on the Godot plugin side too.
For more information, I'm copying the readme here:
CleanRL PPO GRU Discrete Actions example
This example is a modification of CleanRL PPO Atari LSTM, adjusted to work with GDRL and vector obs, with inference added, default params changed, and other modifications.
You may need to install tyro using pip install tyro. If you get the error ModuleNotFoundError: No module named 'tyro' while running the script, install it.
Observations:
Actions:
CL arguments unique to this example:
RNN settings:
By default, the example uses GRU. It can use a vanilla RNN instead if you pass the CL argument --use_vanilla_rnn
Checkpoint saving:
Example: Save checkpoint every 500_000 steps:
--save_model_frequency_global_steps=500_000
If you don't set this argument, the model will not be saved, only the logs.
The checkpoints will be saved inside the runs folder, in a separate subfolder for each run; you will see the full path displayed in the console when a checkpoint is saved.
Inference:
Example use:
--load_model_path=path_to_saved_file.pt --inference (set the true path to a checkpoint).
Other CL args should be similar to those described in https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_CLEAN_RL.md (but there is no onnx export/inference currently implemented for this example).
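For reference, CL arguments like the ones above are typically declared as dataclass fields and parsed with tyro; a minimal sketch (field names inferred from the documented flags, defaults hypothetical — the actual script may differ):

```python
from dataclasses import dataclass

# The script would parse these with tyro, e.g. `args = tyro.cli(Args)`,
# which turns each field into a flag such as --use_vanilla_rnn.
@dataclass
class Args:
    use_vanilla_rnn: bool = False                # vanilla RNN instead of GRU
    save_model_frequency_global_steps: int = 0   # 0 = don't save checkpoints
    load_model_path: str = ""                    # path to a saved .pt checkpoint
    inference: bool = False                      # run inference instead of training
```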