Add CleanRL RNN (modified to GRU/vanilla RNN) example#250

Merged
Ivan-267 merged 8 commits intoedbeeching:mainfrom
Ivan-267:add_cleanrl_rnn_example
Jan 12, 2026
Conversation

@Ivan-267 Ivan-267 commented Jan 7, 2026

Clean RL based RNN example.

With some help from an LLM, I modified the example to use GRU instead of LSTM (it's also possible to use a vanilla RNN instead of GRU, as switching between them is simple). I didn't compare against the original LSTM; the idea was just to have a vanilla RNN/GRU option.
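As a rough illustration of why the swap is simple (this is a generic sketch, not the PR's actual code): `nn.LSTM`, `nn.GRU`, and `nn.RNN` share the same call signature in PyTorch, except that LSTM's hidden state is an `(h, c)` tuple while GRU and vanilla RNN use a single hidden tensor.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: selecting the recurrent layer by kind. The layer
# sizes here are arbitrary and not taken from the PR script.
def make_rnn(kind: str, input_size: int = 64, hidden_size: int = 128):
    if kind == "lstm":
        return nn.LSTM(input_size, hidden_size)  # hidden state is (h, c)
    if kind == "gru":
        return nn.GRU(input_size, hidden_size)   # hidden state is a single h
    return nn.RNN(input_size, hidden_size)       # vanilla RNN, also single h

rnn = make_rnn("gru")
x = torch.zeros(10, 4, 64)    # (seq_len, batch, features)
h0 = torch.zeros(1, 4, 128)   # (num_layers, batch, hidden); no cell state needed
out, h1 = rnn(x, h0)
print(out.shape)              # torch.Size([10, 4, 128])
```

Switching `"gru"` to `"rnn"` leaves the surrounding training loop unchanged, which is presumably what makes the `--use_vanilla_rnn` toggle cheap to support.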

I also added a checkpoint saving feature, basic loading/inference, and reporting of the min/max reward from the latest 40 episodes along with the average to the terminal (this helps to see whether the agent has discovered the "max reward", e.g. a success condition, and how far the average is from the maximum).
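The rolling min/avg/max report can be sketched with a fixed-size deque (a hypothetical sketch; the PR script's actual variable names and logging may differ):

```python
from collections import deque

# Keep only the most recent 40 episode rewards; older ones fall off.
episode_rewards = deque(maxlen=40)

# Example rewards from finished episodes (made-up values).
for r in [0.2, 1.0, -0.5, 1.0, 0.7]:
    episode_rewards.append(r)

avg = sum(episode_rewards) / len(episode_rewards)
print(f"reward avg={avg:.2f} "
      f"min={min(episode_rewards)} max={max(episode_rewards)}")
```

If the max stays pinned at the success reward while the average climbs toward it, the agent is solving the task more and more consistently.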

The hyperparameters were based on testing; they aren't necessarily optimal, but they did successfully train at least one unreleased environment (I still need to check whether they are the latest iteration I have locally; if not, I can update them later). I will verify later, but this example should be capable of training the newly added memory test env: edbeeching/godot_rl_agents_examples#58

Note that onnx export/inference is not featured in this example, as supporting RNNs requires some modifications on Godot plugin side too.

For more information, I'm copying the readme here:

CleanRL PPO GRU Discrete Actions example

This example is a modification of CleanRL PPO Atari LSTM, adjusted to work with GDRL and vector observations, with inference added, the default params changed, and other modifications.

You may need to install tyro: if you get ModuleNotFoundError: No module named 'tyro' while running the script, install it with pip install tyro.

Observations:

  • Works with vector observations.

Actions:

  • Accepts a single discrete action space.

CL arguments unique to this example:

RNN settings:

By default, the script uses GRU. It can use a vanilla RNN instead if you pass the CL argument --use_vanilla_rnn.
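In CleanRL-style scripts, CL arguments like this are typically declared as fields on a dataclass parsed by tyro; the sketch below shows the general pattern (the field name mirrors the flag, but the PR's actual Args dataclass may differ):

```python
from dataclasses import dataclass

# Hypothetical sketch of a tyro/CleanRL-style Args dataclass. tyro turns a
# default-False bool field into a --use_vanilla_rnn command-line toggle.
@dataclass
class Args:
    use_vanilla_rnn: bool = False
    """if toggled, use a vanilla RNN layer instead of GRU"""

# In the real script this would be: args = tyro.cli(Args)
args = Args(use_vanilla_rnn=True)
print(args.use_vanilla_rnn)  # True
```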

Checkpoint saving:

Example: Save checkpoint every 500_000 steps: --save_model_frequency_global_steps=500_000.
If you don't set this argument, the model will not be saved, only the logs.
The checkpoints are saved inside the runs folder, in a separate folder for each run; the full path is displayed in the console when a checkpoint is saved.

Inference:

Example use: --load_model_path=path_to_saved_file.pt --inference (set the true path to a checkpoint).
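The save/load round trip behind these flags can be sketched as follows (a generic PyTorch sketch, not the PR's actual Agent class or checkpoint format; the tiny Linear model stands in for the agent):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained agent.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "checkpoint.pt")

# Loading for inference. map_location="cpu" lets a checkpoint trained on
# one device load on another (related to the CUDA/--no-cuda loading fix
# mentioned in the commits).
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("checkpoint.pt", map_location="cpu"))
restored.eval()  # inference mode: disables dropout/batch-norm updates
```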

Other CL args should be similar to those described in https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_CLEAN_RL.md (but there is no onnx export/inference currently implemented for this example).

  • Add README for CleanRL PPO GRU Discrete Actions example
  • Updated comments to reflect additional changes and hyperparameter adjustments.
  • Added installation note for tyro.
  • Fixes some issues when loading on CUDA after training with --no-cuda, and updates hyperparams (to the last ones used while training an env that worked OK for that env, not necessarily globally optimal).
@Ivan-267 Ivan-267 requested a review from edbeeching January 7, 2026 14:43
Ivan-267 commented Jan 8, 2026

I've trained this memory env using the script (I may have modified some hyperparams locally vs. the PR script, but I shared the ones used in the env description). It uses only local obs: a modified raycast sensor (distances ordered by physics layer), the applied movement/turn, and the normalized episode time.

It's my first env designed to use an RNN with Godot RL Agents after the textual one which we merged.

memory_env.mp4

https://github.com/Ivan-267/Memory-Find-Clue-Then-Goal-RL-Environment

@Ivan-267
I ran a test on the Simple Memory Test env from the PR: edbeeching/godot_rl_agents_examples#58
Note that I didn't run multiple runs for each setting, results might vary between runs.

For reference, the old frame stacking result from the PR (note some hyperparams are different so it's not the closest comparison possible):
[image: frame stacking TensorBoard chart]

Using the RNN script:
Default episode length (5)

Using --no-cuda --num_steps 5 (other parameters should match the PR script)

[image: GRU TensorBoard chart]

Using --no-cuda --num_steps 5 --use_vanilla_rnn (other parameters should match the PR script)

[image: vanilla RNN TensorBoard chart]

GRU (green) and Vanilla RNN runs on the same chart:
[image: GRU and vanilla RNN TensorBoard chart]

Long episode length (200)

Using --no-cuda --num_steps 1000
[image: GRU TensorBoard chart]

Training with this setting took a while and I had to stop it early, so the result doesn't show whether/where it converges. Still, we can see the average reward progressing.

@Ivan-267
@edbeeching I've added some test results and this is ready to merge from my side.

The CI errors reported on Linux might be related to memory/disk space, but should not be caused by this PR.

@Ivan-267 Ivan-267 merged commit ba84413 into edbeeching:main Jan 12, 2026
7 of 13 checks passed