Add CleanRL RNN (modified to GRU/vanilla RNN) example#250

Merged
Ivan-267 merged 8 commits intoedbeeching:mainfrom
Ivan-267:add_cleanrl_rnn_example
Jan 12, 2026
Conversation

@Ivan-267 Ivan-267 commented Jan 7, 2026

Clean RL based RNN example.

With some help from an LLM, I modified the example to use GRU instead of LSTM (it's also possible to use a vanilla RNN instead of GRU, as switching between them is simple). I didn't compare against the original LSTM; the idea was just to have a vanilla RNN/GRU option.
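As a rough illustration of why the swap is simple (this is a generic sketch, not the PR's actual code): `nn.LSTM`, `nn.GRU`, and `nn.RNN` share the same call signature in PyTorch, except that LSTM's hidden state is an `(h, c)` tuple while GRU and vanilla RNN use a single hidden tensor.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: selecting the recurrent layer by kind. The layer
# sizes here are arbitrary and not taken from the PR script.
def make_rnn(kind: str, input_size: int = 64, hidden_size: int = 128):
    if kind == "lstm":
        return nn.LSTM(input_size, hidden_size)  # hidden state is (h, c)
    if kind == "gru":
        return nn.GRU(input_size, hidden_size)   # hidden state is a single h
    return nn.RNN(input_size, hidden_size)       # vanilla RNN, also single h

rnn = make_rnn("gru")
x = torch.zeros(10, 4, 64)    # (seq_len, batch, features)
h0 = torch.zeros(1, 4, 128)   # (num_layers, batch, hidden); no cell state needed
out, h1 = rnn(x, h0)
print(out.shape)              # torch.Size([10, 4, 128])
```

Switching `"gru"` to `"rnn"` leaves the surrounding training loop unchanged, which is presumably what makes the `--use_vanilla_rnn` toggle cheap to support.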

I also added a checkpoint saving feature, basic loading/inference, and reporting of the min/max reward from the latest 40 episodes along with the average to the terminal (this helps to see whether the agent has discovered the "max reward", e.g. a success condition, and how far the average is from the maximum).
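The rolling min/avg/max report can be sketched with a fixed-size deque (a hypothetical sketch; the PR script's actual variable names and logging may differ):

```python
from collections import deque

# Keep only the most recent 40 episode rewards; older ones fall off.
episode_rewards = deque(maxlen=40)

# Example rewards from finished episodes (made-up values).
for r in [0.2, 1.0, -0.5, 1.0, 0.7]:
    episode_rewards.append(r)

avg = sum(episode_rewards) / len(episode_rewards)
print(f"reward avg={avg:.2f} "
      f"min={min(episode_rewards)} max={max(episode_rewards)}")
```

If the max stays pinned at the success reward while the average climbs toward it, the agent is solving the task more and more consistently.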

The hyperparameters were based on testing; they aren't necessarily optimal, but they did successfully train at least one unreleased environment (I still need to check whether they are the latest iteration I have locally; if not, I can update them later). I will verify later, but this example should be capable of training the newly added memory test env: edbeeching/godot_rl_agents_examples#58

Note that onnx export/inference is not featured in this example, as supporting RNNs requires some modifications on Godot plugin side too.

For more information, I'm copying the readme here:

CleanRL PPO GRU Discrete Actions example

This example is a modification of CleanRL PPO Atari LSTM, adjusted to work with GDRL and vector observations, with inference added, the default params changed, and other modifications.

You may need to install tyro: if you get ModuleNotFoundError: No module named 'tyro' while running the script, install it with pip install tyro.

Observations:

  • Works with vector observations.

Actions:

  • Accepts a single discrete action space.

CL arguments unique to this example:

RNN settings:

By default, the script uses GRU. It can use a vanilla RNN instead if you pass the CL argument --use_vanilla_rnn.
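In CleanRL-style scripts, CL arguments like this are typically declared as fields on a dataclass parsed by tyro; the sketch below shows the general pattern (the field name mirrors the flag, but the PR's actual Args dataclass may differ):

```python
from dataclasses import dataclass

# Hypothetical sketch of a tyro/CleanRL-style Args dataclass. tyro turns a
# default-False bool field into a --use_vanilla_rnn command-line toggle.
@dataclass
class Args:
    use_vanilla_rnn: bool = False
    """if toggled, use a vanilla RNN layer instead of GRU"""

# In the real script this would be: args = tyro.cli(Args)
args = Args(use_vanilla_rnn=True)
print(args.use_vanilla_rnn)  # True
```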

Checkpoint saving:

Example: Save checkpoint every 500_000 steps: --save_model_frequency_global_steps=500_000.
If you don't set this argument, the model will not be saved, only the logs.
The checkpoints are saved inside the runs folder, in a separate folder for each run; the full path is displayed in the console when a checkpoint is saved.

Inference:

Example use: --load_model_path=path_to_saved_file.pt --inference (set the true path to a checkpoint).
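The save/load round trip behind these flags can be sketched as follows (a generic PyTorch sketch, not the PR's actual Agent class or checkpoint format; the tiny Linear model stands in for the agent):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained agent.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "checkpoint.pt")

# Loading for inference. map_location="cpu" lets a checkpoint trained on
# one device load on another (related to the CUDA/--no-cuda loading fix
# mentioned in the commits).
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
restored.eval()  # inference mode: disables dropout/batch-norm updates
```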

Other CL args should be similar to those described in https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_CLEAN_RL.md (but there is no onnx export/inference currently implemented for this example).

  • Add README for CleanRL PPO GRU Discrete Actions example
  • Updated comments to reflect additional changes and hyperparameter adjustments.
  • Added installation note for tyro.
  • Fixes some issues when loading on CUDA after training with --no-cuda, and updates hyperparams (to the last ones used while training an env that worked OK for that env, not necessarily globally optimal).
@Ivan-267 Ivan-267 requested a review from edbeeching January 7, 2026 14:43
Ivan-267 commented Jan 8, 2026

I've trained this memory env using the script (I may have modified some hyperparams locally vs. the PR script, but I shared the ones used in the env description). It uses only local obs: a modified raycast sensor (distances ordered by physics layer), the applied movement/turn, and the normalized episode time.

It's my first env designed to use an RNN with Godot RL Agents after the textual one which we merged.

memory_env.mp4

https://github.com/Ivan-267/Memory-Find-Clue-Then-Goal-RL-Environment

@Ivan-267
I ran a test on the Simple Memory Test env from the PR: edbeeching/godot_rl_agents_examples#58
Note that I didn't run multiple runs for each setting, results might vary between runs.

For reference, the old frame stacking result from the PR (note some hyperparams are different so it's not the closest comparison possible):
[image: frame stacking TensorBoard chart]

Using the RNN script:
Default episode length (5)

Using --no-cuda --num_steps 5 (other parameters should match the PR script)

[image: GRU TensorBoard chart]

Using --no-cuda --num_steps 5 --use_vanilla_rnn (other parameters should match the PR script)

[image: vanilla RNN TensorBoard chart]

GRU (green) and Vanilla RNN runs on the same chart:
[image: GRU and vanilla RNN TensorBoard chart]

Long episode length (200)

Using --no-cuda --num_steps 1000
[image: GRU TensorBoard chart]

Training with this setting took a while and I had to stop it early, so the result doesn't show whether/where it converges. Still, we can see the average reward progressing.

@Ivan-267
@edbeeching I've added some test results and this is ready to merge from my side.

The CI errors reported on Linux might be related to memory/disk space, but should not be caused by this PR.

@Ivan-267 Ivan-267 merged commit ba84413 into edbeeching:main Jan 12, 2026
7 of 13 checks passed