Commit 263e657

Merge branch 'master' into feat/mps-support
2 parents 955382e + dd3d0ac commit 263e657

15 files changed: 110 additions and 32 deletions
.github/workflows/ci.yml

Lines changed: 7 additions & 4 deletions
@@ -31,18 +31,21 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
+          # Use uv for faster downloads
+          pip install uv
           # cpu version of pytorch
-          pip install torch==2.3.1 --index-url https://download.pytorch.org/whl/cpu
+          # See https://github.com/astral-sh/uv/issues/1497
+          uv pip install --system torch==2.3.1+cpu --index https://download.pytorch.org/whl/cpu

           # Install Atari Roms
-          pip install autorom
+          uv pip install --system autorom
           wget https://gist.githubusercontent.com/jjshoots/61b22aefce4456920ba99f2c36906eda/raw/00046ac3403768bfe45857610a3d333b8e35e026/Roms.tar.gz.b64
           base64 Roms.tar.gz.b64 --decode &> Roms.tar.gz
           AutoROM --accept-license --source-file Roms.tar.gz

-          pip install .[extra_no_roms,tests,docs]
+          uv pip install --system .[extra_no_roms,tests,docs]
           # Use headless version
-          pip install opencv-python-headless
+          uv pip install --system opencv-python-headless
       - name: Lint with ruff
         run: |
           make lint

.readthedocs.yml

Lines changed: 2 additions & 2 deletions
@@ -16,6 +16,6 @@ conda:
   environment: docs/conda_env.yml

 build:
-  os: ubuntu-22.04
+  os: ubuntu-24.04
   tools:
-    python: "mambaforge-22.9"
+    python: "mambaforge-23.11"

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ into two categories:
    - Create an issue about your intended feature, and we shall discuss the design and
      implementation. Once we agree that the plan looks good, go ahead and implement it.
 2. You want to implement a feature or bug-fix for an outstanding issue
-   - Look at the outstanding issues here: https://github.com/DLR-RM/stable-baselines3/issues
+   - Look at the outstanding issues here: https://github.com/DLR-RM/stable-baselines3/labels/help%20wanted
    - Pick an issue or feature and comment on the task that you want to work on this feature.
    - If you need more context on a particular issue, please ask, and we shall provide.

README.md

Lines changed: 21 additions & 11 deletions
@@ -1,6 +1,6 @@
 <!-- [![pipeline status](https://gitlab.com/araffin/stable-baselines3/badges/master/pipeline.svg)](https://gitlab.com/araffin/stable-baselines3/-/commits/master) -->
-![CI](https://github.com/DLR-RM/stable-baselines3/workflows/CI/badge.svg)
-[![Documentation Status](https://readthedocs.org/projects/stable-baselines/badge/?version=master)](https://stable-baselines3.readthedocs.io/en/master/?badge=master) [![coverage report](https://gitlab.com/araffin/stable-baselines3/badges/master/coverage.svg)](https://gitlab.com/araffin/stable-baselines3/-/commits/master)
+[![CI](https://github.com/DLR-RM/stable-baselines3/workflows/CI/badge.svg)](https://github.com/DLR-RM/stable-baselines3/actions/workflows/ci.yml)
+[![Documentation Status](https://readthedocs.org/projects/stable-baselines/badge/?version=master)](https://stable-baselines3.readthedocs.io/en/master/?badge=master) [![coverage report](https://gitlab.com/araffin/stable-baselines3/badges/master/coverage.svg)](https://github.com/DLR-RM/stable-baselines3/actions/workflows/ci.yml)
 [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)


@@ -22,6 +22,8 @@ These algorithms will make it easier for the research community and industry to
 **The performance of each algorithm was tested** (see *Results* section in their respective page),
 you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselines3/issues/48) and [#49](https://github.com/DLR-RM/stable-baselines3/issues/49) for more details.

+We also provide detailed logs and reports on the [OpenRL Benchmark](https://wandb.ai/openrlbenchmark/sb3) platform.
+

 | **Features** | **Stable-Baselines3** |
 | --------------------------- | ----------------------|
@@ -41,7 +43,13 @@ you can take a look at the issues [#48](https://github.com/DLR-RM/stable-baselin

 ### Planned features

-Please take a look at the [Roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) and [Milestones](https://github.com/DLR-RM/stable-baselines3/milestones).
+Since most of the features from the [original roadmap](https://github.com/DLR-RM/stable-baselines3/issues/1) have been implemented, there are no major changes planned for SB3, it is now *stable*.
+If you want to contribute, you can search in the issues for the ones where [help is welcomed](https://github.com/DLR-RM/stable-baselines3/labels/help%20wanted) and the other [proposed enhancements](https://github.com/DLR-RM/stable-baselines3/labels/enhancement).
+
+While SB3 development is now focused on bug fixes and maintenance (doc update, user experience, ...), there is more active development going on in the associated repositories:
+- newer algorithms are regularly added to the [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) repository
+- faster variants are developed in the [SBX (SB3 + Jax)](https://github.com/araffin/sbx) repository
+- the training framework for SB3, the RL Zoo, has an active [roadmap](https://github.com/DLR-RM/rl-baselines3-zoo/issues/299)

 ## Migration guide: from Stable-Baselines (SB2) to Stable-Baselines3 (SB3)

@@ -79,7 +87,7 @@ Documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

 We implement experimental features in a separate contrib repository: [SB3-Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)

-This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Recurrent PPO (PPO LSTM), Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).
+This allows SB3 to maintain a stable and compact core, while still providing the latest features, like Recurrent PPO (PPO LSTM), CrossQ, Truncated Quantile Critics (TQC), Quantile Regression DQN (QR-DQN) or PPO with invalid action masking (Maskable PPO).

 Documentation is available online: [https://sb3-contrib.readthedocs.io/](https://sb3-contrib.readthedocs.io/)

@@ -97,17 +105,16 @@ It provides a minimal number of features compared to SB3 but can be much faster
 ### Prerequisites
 Stable Baselines3 requires Python 3.8+.

-#### Windows 10
+#### Windows

 To install stable-baselines on Windows, please look at the [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/install.html#prerequisites).


 ### Install using pip
 Install the Stable Baselines3 package:
+```sh
+pip install 'stable-baselines3[extra]'
 ```
-pip install stable-baselines3[extra]
-```
-**Note:** Some shells such as Zsh require quotation marks around brackets, i.e. `pip install 'stable-baselines3[extra]'` ([More Info](https://stackoverflow.com/a/30539963)).

 This includes an optional dependencies like Tensorboard, OpenCV or `ale-py` to train on atari games. If you do not need those, you can use:
 ```sh
@@ -177,6 +184,7 @@ All the following examples can be executed online using Google Colab notebooks:
 | ------------------- | ------------------ | ------------------ | ------------------ | ------------------- | ------------------ | --------------------------------- |
 | ARS<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
 | A2C | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| CrossQ<sup>[1](#f1)</sup> | :x: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
 | DDPG | :x: | :heavy_check_mark: | :x: | :x: | :x: | :heavy_check_mark: |
 | DQN | :x: | :x: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
 | HER | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: | :x: | :heavy_check_mark: |
@@ -191,7 +199,7 @@ All the following examples can be executed online using Google Colab notebooks:

 <b id="f1">1</b>: Implemented in [SB3 Contrib](https://github.com/Stable-Baselines-Team/stable-baselines3-contrib) GitHub repository.

-Actions `gym.spaces`:
+Actions `gymnasium.spaces`:
 * `Box`: A N-dimensional box that contains every point in the action space.
 * `Discrete`: A list of possible actions, where each timestep only one of the actions can be used.
 * `MultiDiscrete`: A list of possible actions, where each timestep only one action of each discrete set can be used.
@@ -218,9 +226,9 @@ To run a single test:
 python3 -m pytest -v -k 'test_check_env_dict_action'
 ```

-You can also do a static type check using `pytype` and `mypy`:
+You can also do a static type check using `mypy`:
 ```sh
-pip install pytype mypy
+pip install mypy
 make type
 ```

@@ -252,6 +260,8 @@ To cite this repository in publications:
 }
 ```

+Note: If you need to refer to a specific version of SB3, you can also use the [Zenodo DOI](https://doi.org/10.5281/zenodo.8123988).
+
 ## Maintainers

 Stable-Baselines3 is currently maintained by [Ashley Hill](https://github.com/hill-a) (aka @hill-a), [Antonin Raffin](https://araffin.github.io/) (aka [@araffin](https://github.com/araffin)), [Maximilian Ernestus](https://github.com/ernestum) (aka @ernestum), [Adam Gleave](https://github.com/adamgleave) (@AdamGleave), [Anssi Kanervisto](https://github.com/Miffyli) (@Miffyli) and [Quentin Gallouédec](https://gallouedec.com/) (@qgallouedec).

docs/conda_env.yml

Lines changed: 6 additions & 6 deletions
@@ -1,18 +1,18 @@
 name: root
 channels:
   - pytorch
-  - defaults
+  - conda-forge
 dependencies:
   - cpuonly=1.0=0
-  - pip=22.3.1
-  - python=3.8
-  - pytorch=1.13.0=py3.8_cpu_0
+  - pip=24.2
+  - python=3.11
+  - pytorch=2.5.0=py3.11_cpu_0
   - pip:
-    - gymnasium
+    - gymnasium>=0.28.1,<0.30
     - cloudpickle
     - opencv-python-headless
     - pandas
-    - numpy
+    - numpy>=1.20,<2.0
     - matplotlib
     - sphinx>=5,<8
     - sphinx_rtd_theme>=1.3.0

docs/guide/algos.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Name ``Box`` ``Discrete`` ``MultiDiscrete`` ``MultiBinary``
 =================== =========== ============ ================= =============== ================
 ARS [#f1]_          ✔️          ✔️           ❌                 ❌               ✔️
 A2C                 ✔️          ✔️           ✔️                ✔️              ✔️
+CrossQ [#f1]_       ✔️          ❌            ❌                 ❌               ✔️
 DDPG                ✔️          ❌            ❌                 ❌               ✔️
 DQN                 ❌           ✔️           ❌                 ❌               ✔️
 HER                 ✔️          ✔️           ❌                 ❌               ✔️

docs/guide/sb3_contrib.rst

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@ See documentation for the full list of included features.
 - `PPO with recurrent policy (RecurrentPPO aka PPO LSTM) <https://ppo-details.cleanrl.dev//2021/11/05/ppo-implementation-details/>`_
 - `Truncated Quantile Critics (TQC)`_
 - `Trust Region Policy Optimization (TRPO) <https://arxiv.org/abs/1502.05477>`_
+- `Batch Normalization in Deep Reinforcement Learning (CrossQ) <https://openreview.net/forum?id=PczQtTsTIX>`_


 **Gym Wrappers**:

docs/index.rst

Lines changed: 3 additions & 1 deletion
@@ -113,12 +113,14 @@ To cite this project in publications:
       url = {http://jmlr.org/papers/v22/20-1364.html}
     }

+Note: If you need to refer to a specific version of SB3, you can also use the `Zenodo DOI <https://doi.org/10.5281/zenodo.8123988>`_.
+
 Contributing
 ------------

 To any interested in making the rl baselines better, there are still some improvements
 that need to be done.
-You can check issues in the `repo <https://github.com/DLR-RM/stable-baselines3/issues>`_.
+You can check issues in the `repository <https://github.com/DLR-RM/stable-baselines3/labels/help%20wanted>`_.

 If you want to contribute, please read `CONTRIBUTING.md <https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md>`_ first.

docs/misc/changelog.rst

Lines changed: 12 additions & 1 deletion
@@ -3,9 +3,11 @@
 Changelog
 ==========

-Release 2.4.0a9 (WIP)
+Release 2.4.0a10 (WIP)
 --------------------------

+**New algorithm: CrossQ in SB3 Contrib**
+
 .. note::

   DQN (and QR-DQN) models saved with SB3 < 2.4.0 will show a warning about
@@ -43,6 +45,10 @@ Bug Fixes:

 `SB3-Contrib`_
 ^^^^^^^^^^^^^^
+- Added ``CrossQ`` algorithm, from "Batch Normalization in Deep Reinforcement Learning" paper (@danielpalen)
+- Added ``BatchRenorm`` PyTorch layer used in ``CrossQ`` (@danielpalen)
+- Updated QR-DQN optimizer input to only include quantile_net parameters (@corentinlger)
+- Fixed loading QRDQN changes `target_update_interval` (@jak3122)

 `RL Zoo`_
 ^^^^^^^^^
@@ -60,12 +66,17 @@ Others:
 - Fixed various typos (@cschindlbeck)
 - Remove unnecessary SDE noise resampling in PPO update (@brn-dev)
 - Updated PyTorch version on CI to 2.3.1
+- Added a warning to recommend using CPU with on policy algorithms (A2C/PPO) and ``MlpPolicy``
+- Switched to uv to download packages faster on GitHub CI
+- Updated dependencies for read the doc

 Bug Fixes:
 ^^^^^^^^^^

 Documentation:
 ^^^^^^^^^^^^^^
+- Updated PPO doc to recommend using CPU with ``MlpPolicy``
+- Clarified documentation about planned features and citing software

 Release 2.3.2 (2024-04-27)
 --------------------------

docs/modules/ppo.rst

Lines changed: 17 additions & 0 deletions
@@ -88,6 +88,23 @@ Train a PPO agent on ``CartPole-v1`` using 4 environments.
       vec_env.render("human")


+.. note::
+
+  PPO is meant to be run primarily on the CPU, especially when you are not using a CNN. To improve CPU utilization, try turning off the GPU and using ``SubprocVecEnv`` instead of the default ``DummyVecEnv``:
+
+  .. code-block::
+
+    from stable_baselines3 import PPO
+    from stable_baselines3.common.env_util import make_vec_env
+    from stable_baselines3.common.vec_env import SubprocVecEnv
+
+    if __name__=="__main__":
+        env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
+        model = PPO("MlpPolicy", env, device="cpu")
+        model.learn(total_timesteps=25_000)
+
+  For more information, see :ref:`Vectorized Environments <vec_env>`, `Issue #1245 <https://github.com/DLR-RM/stable-baselines3/issues/1245#issuecomment-1435766949>`_ or the `Multiprocessing notebook <https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb>`_.
+
 Results
 -------
