docs/misc/changelog.rst
44 additions & 7 deletions
@@ -3,10 +3,45 @@
 Changelog
 ==========
 
-Release 2.4.0a10 (WIP)
+Release 2.5.0a0 (WIP)
 --------------------------
 
-**New algorithm: CrossQ in SB3 Contrib**
+Breaking Changes:
+^^^^^^^^^^^^^^^^^
+- Increased minimum required version of PyTorch to 2.3.0
+- Removed support for Python 3.8
+
+New Features:
+^^^^^^^^^^^^^
+- Added support for NumPy v2.0: ``VecNormalize`` now casts normalized rewards to float32, and the bit flipping env was updated to avoid overflow issues
+- Added official support for Python 3.12
+
+Bug Fixes:
+^^^^^^^^^^
+
+`SB3-Contrib`_
+^^^^^^^^^^^^^^
+
+`RL Zoo`_
+^^^^^^^^^
+
+`SBX`_ (SB3 + Jax)
+^^^^^^^^^^^^^^^^^^
+
+Deprecations:
+^^^^^^^^^^^^^
+
+Others:
+^^^^^^^
+
+Documentation:
+^^^^^^^^^^^^^^
+
+
+Release 2.4.0 (2024-11-18)
+--------------------------
+
+**New algorithm: CrossQ in SB3 Contrib, Gymnasium v1.0 support**
 
 .. note::
 
@@ -18,18 +53,20 @@ Release 2.4.0a10 (WIP)
 .. warning::
 
   Stable-Baselines3 (SB3) v2.4.0 will be the last one supporting Python 3.8 (end of life in October 2024)
-  and PyTorch < 2.0.
-  We highly recommend that you upgrade to Python >= 3.9 and PyTorch >= 2.0.
+  and PyTorch < 2.3.
+  We highly recommend that you upgrade to Python >= 3.9 and PyTorch >= 2.3 (compatible with NumPy v2).
 
 
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
+- Increased minimum required version of Gymnasium to 0.29.1
 
 New Features:
 ^^^^^^^^^^^^^
 - Added support for ``pre_linear_modules`` and ``post_linear_modules`` in ``create_mlp`` (useful for adding normalization layers, like in DroQ or CrossQ)
 - Enabled np.ndarray logging for TensorBoardOutputFormat as histogram (see GH#1634) (@iwishwasaneagle)
 - Updated env checker to warn users when using multi-dim array to define `MultiDiscrete` spaces
+- Added support for Gymnasium v1.0
 
 Bug Fixes:
 ^^^^^^^^^^
@@ -57,6 +94,7 @@ Bug Fixes:
 `SBX`_ (SB3 + Jax)
 ^^^^^^^^^^^^^^^^^^
 - Added CNN support for DQN
+- Fixed a bug in SAC and related algorithms: the log of the entropy coefficient is now optimized, consistent with SB3
 
 Deprecations:
 ^^^^^^^^^^^^^
@@ -69,14 +107,13 @@ Others:
 - Added a warning to recommend using CPU with on policy algorithms (A2C/PPO) and ``MlpPolicy``
 - Switched to uv to download packages faster on GitHub CI
 - Updated dependencies for Read the Docs
-
-Bug Fixes:
-^^^^^^^^^^
+- Removed unnecessary ``copy_obs_dict`` method for ``SubprocVecEnv``, removed the use of ordered dict and renamed ``flatten_obs`` to ``stack_obs``
 
 Documentation:
 ^^^^^^^^^^^^^^
 - Updated PPO doc to recommend using CPU with ``MlpPolicy``
 - Clarified documentation about planned features and citing software
+- Added a note explaining that SAC optimizes the log of the entropy coefficient
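The 2.5.0a0 entries above raise the floors to Python 3.9 and PyTorch 2.3.0. A minimal sketch of how downstream code might check those floors before importing the library is shown below. The helpers ``parse_version`` and ``meets_minimum`` are hypothetical names introduced only for this illustration; they are not part of SB3 and only handle plain dotted numeric versions. A real project would more likely rely on ``packaging.version.Version``.

```python
import sys

# Floors taken from the changelog entries above.
MIN_PYTHON = (3, 9)
MIN_TORCH = "2.3.0"


def parse_version(version):
    """Turn a plain dotted version such as "2.3.0" into a tuple of ints.

    Hypothetical helper: it does not handle pre-release or local tags
    such as "2.5.0a0" or "2.3.0+cu121".
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    # Comparing tuples of ints avoids the string-comparison trap
    # where "2.10.0" would sort before "2.3.0".
    return parse_version(installed) >= parse_version(minimum)


# The interpreter check needs no parsing: version_info is already a tuple.
python_ok = sys.version_info[:2] >= MIN_PYTHON
```

Comparing integer tuples rather than raw strings matters once a minor version reaches two digits.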
docs/modules/sac.rst
3 additions & 0 deletions
@@ -35,6 +35,9 @@ Notes
     which is equivalent to the inverse of reward scale in the original SAC paper.
     The main reason is that it avoids having too high errors when updating the Q functions.
 
+.. note::
+    When automatically adjusting the temperature (alpha/entropy coefficient), we optimize the logarithm of the entropy coefficient instead of the entropy coefficient itself. This is consistent with the original implementation and has proven to be more stable
+    (see issues `GH#36 <https://github.com/DLR-RM/stable-baselines3/issues/36>`_, `#55 <https://github.com/araffin/sbx/issues/55>`_ and others).
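The note added above can be illustrated with a small pure-Python sketch. It assumes the usual temperature loss, the negative mean of ``log_alpha * (log_prob + target_entropy)``, and plain gradient descent with a fixed step size; the function name, learning rate, and sample values are illustrative assumptions, not code from SB3 or SBX. One commonly cited reason for optimizing the logarithm is that ``exp(log_alpha)`` stays positive whatever the update does, whereas a direct update on alpha can cross zero.

```python
import math


def mean_entropy_gap(log_probs, target_entropy):
    """Mean of (log_prob + target_entropy) over a batch of sampled actions."""
    return sum(lp + target_entropy for lp in log_probs) / len(log_probs)


def step_log_ent_coef(log_ent_coef, log_probs, target_entropy, lr=0.1):
    """One gradient-descent step on loss = -log_alpha * mean(log_prob + target)."""
    # d(loss)/d(log_alpha) = -mean(log_prob + target_entropy)
    grad = -mean_entropy_gap(log_probs, target_entropy)
    return log_ent_coef - lr * grad


# Illustrative values (assumptions): policy entropy is about 0.4,
# which is above the target of -2.0, so alpha should shrink.
log_probs = [-0.5, -0.3, -0.4]
target_entropy = -2.0

log_ent_coef = math.log(1.0)  # start from alpha = 1
direct_ent_coef = 1.0  # same start, but updating alpha itself

for _ in range(100):
    log_ent_coef = step_log_ent_coef(log_ent_coef, log_probs, target_entropy)
    # Direct variant: loss = -alpha * mean(...), so its gradient is identical.
    direct_ent_coef -= 0.1 * -mean_entropy_gap(log_probs, target_entropy)

ent_coef = math.exp(log_ent_coef)  # always strictly positive
```

In this toy run the log-parameterized coefficient decays toward zero but remains positive, while the directly updated one becomes negative after a few steps.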