This repository was archived by the owner on May 3, 2022. It is now read-only.
* Add support for custom objects
* Add python 3.8 to the CI
* Bump version
* PyType fixes
* [ci skip] Fix typo
* Add note about slow-down + fix typos
* Minor edits to the doc
* Bug fix for DQN
* Update test
* Add test for custom objects
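The custom-objects support mentioned in the commits above presumably refers to the ``custom_objects`` argument of ``load`` in SB3. A hedged sketch of how it is typically used (the file name and overridden entries are made-up placeholders)::

    from stable_baselines3 import DQN

    # Entries listed in custom_objects are not deserialized from the saved archive;
    # the provided values are used instead. This helps when a saved schedule or
    # lambda cannot be unpickled with the current Python/SB3 version.
    model = DQN.load(
        "dqn_cartpole.zip",
        custom_objects={"learning_rate": 1e-4, "lr_schedule": lambda _: 1e-4},
    )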
docs/guide/migration.rst (+7)
@@ -33,6 +33,13 @@ You can also take a look at the `rl-zoo3 <https://github.com/DLR-RM/rl-baselines
 to the `rl-zoo <https://github.com/araffin/rl-baselines-zoo>`_ of SB2 to have a concrete example of successful migration.
+
+.. note::
+
+  If you experience massive slow-down switching to PyTorch, you may need to play with the number of threads used,
+  using ``torch.set_num_threads(1)`` or ``OMP_NUM_THREADS=1``, see `issue #122 <https://github.com/DLR-RM/stable-baselines3/issues/122>`_
+  and `issue #90 <https://github.com/DLR-RM/stable-baselines3/issues/90>`_.
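For reference, a minimal sketch of the workaround described in the note above; the algorithm and environment are illustrative placeholders, not part of the diff::

    import torch

    from stable_baselines3 import PPO

    # Restrict PyTorch's intra-op CPU parallelism to a single thread, which often
    # removes the slow-down mentioned above on machines with many cores.
    torch.set_num_threads(1)

    model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
    model.learn(total_timesteps=10_000)

The same effect can be obtained from the shell by exporting ``OMP_NUM_THREADS=1`` before launching the training script.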
docs/guide/rl_tips.rst (+7, -7)
@@ -119,14 +119,14 @@ Discrete Actions
 Discrete Actions - Single Process
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-DQN with extensions (double DQN, prioritized replay, ...) are the recommended algorithms.
-We notably provide QR-DQN in our :ref:`contrib repo <sb3_contrib>`.
-DQN is usually slower to train (regarding wall clock time) but is the most sample efficient (because of its replay buffer).
+``DQN`` with extensions (double DQN, prioritized replay, ...) are the recommended algorithms.
+We notably provide ``QR-DQN`` in our :ref:`contrib repo <sb3_contrib>`.
+``DQN`` is usually slower to train (regarding wall clock time) but is the most sample efficient (because of its replay buffer).

 Discrete Actions - Multiprocessed
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-You should give a try to PPO or A2C.
+You should give a try to ``PPO`` or ``A2C``.


 Continuous Actions
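As a rough illustration of the single-process recommendation above (the environment and training budget are placeholder choices, not taken from the diff)::

    import gym

    from stable_baselines3 import DQN

    # Any discrete-action Gym environment works here; CartPole-v1 is just a small example.
    env = gym.make("CartPole-v1")

    model = DQN("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=50_000)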
@@ -142,7 +142,7 @@ Please use the hyperparameters in the `RL zoo <https://github.com/DLR-RM/rl-base
 Continuous Actions - Multiprocessed
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Take a look at PPO, TRPO or A2C. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
+Take a look at ``PPO`` or ``A2C``. Again, don't forget to take the hyperparameters from the `RL zoo <https://github.com/DLR-RM/rl-baselines3-zoo>`_
 for continuous actions problems (cf *Bullet* envs).

 .. note::
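A small sketch of the multiprocessed setup recommended here (and in the discrete case above), assuming a reasonably recent SB3 version where ``make_vec_env`` lives in ``stable_baselines3.common.env_util``; the environment id and number of workers are illustrative assumptions::

    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import SubprocVecEnv

    if __name__ == "__main__":
        # Run 4 copies of the environment in separate processes.
        env = make_vec_env("Pendulum-v0", n_envs=4, vec_env_cls=SubprocVecEnv)

        model = PPO("MlpPolicy", env, verbose=1)
        model.learn(total_timesteps=100_000)

The ``if __name__ == "__main__"`` guard matters when ``SubprocVecEnv`` uses the spawn start method (e.g. on Windows and macOS).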
@@ -155,12 +155,12 @@ Goal Environment
 -----------------

 If your environment follows the ``GoalEnv`` interface (cf :ref:`HER <her>`), then you should use
-HER + (SAC/TD3/DDPG/DQN/TQC) depending on the action space.
+HER + (SAC/TD3/DDPG/DQN/QR-DQN/TQC) depending on the action space.


 .. note::

-  The number of workers is an important hyperparameters for experiments with HER
+  The ``batch_size`` is an important hyperparameter for experiments with :ref:`HER<her>`
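Finally, a hedged sketch of where ``batch_size`` enters a HER experiment, assuming the ``HerReplayBuffer`` API of more recent SB3 versions (older releases exposed HER as a separate wrapper class) and a placeholder goal-based environment::

    import gym

    from stable_baselines3 import SAC, HerReplayBuffer

    # Any environment following the GoalEnv interface (dict observations with
    # "observation", "achieved_goal" and "desired_goal" keys); FetchReach-v1
    # needs the mujoco-based robotics envs and is only a placeholder here.
    env = gym.make("FetchReach-v1")

    model = SAC(
        "MultiInputPolicy",
        env,
        replay_buffer_class=HerReplayBuffer,
        replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
        batch_size=256,  # the hyperparameter highlighted by the note above
        verbose=1,
    )
    model.learn(total_timesteps=100_000)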