@@ -16,53 +16,53 @@ class VecEnv(ABC):
the same action is applied to all environments and the same observation is returned from all environments.

All extra observations must be provided as a dictionary to "extras" in the step() method. Based on the
- configuration, the extra observations are used for different purposes. The following keys are reserved
- in the "observations" dictionary (if they are present):
+ configuration, the extra observations are used for different purposes. The following keys are used by the
+ environment:

- - "critic": The observation is used as input to the critic network. Useful for asymmetric observation spaces.
- - "rnd_state": The observation is used as input to the RND network. Useful for random network distillation.
- """
+ - "observations" (dict[str, torch.Tensor]):
+ Additional observations that are not used by the actor networks. The keys are the names of the observations
+ and the values are the observations themselves. The following are reserved keys for the observations:

- num_envs : int
- """Number of environments."""
+ - "critic": The observation is used as input to the critic network. Useful for asymmetric observation spaces.
+ - "rnd_state": The observation is used as input to the RND network. Useful for random network distillation.

- num_obs : int
- """Number of observations."""
+ - "time_outs" (torch.Tensor): Timeouts for the environments. These correspond to terminations that happen due to time limits and
+ not due to the environment reaching a terminal state. This is useful for environments that have a fixed
+ episode length.

- num_privileged_obs : int
- """Number of privileged observations."""
+ - "log" (dict[str, float | torch.Tensor]): Additional information for logging and debugging purposes.
+ The key should be a string and start with "/" for namespacing. The value can be a scalar or a tensor.
+ If it is a tensor, the mean of the tensor is used for logging.

- num_actions : int
- """Number of actions."""
+ .. deprecated:: 2.0.0

- max_episode_length : int
- """Maximum episode length."""
+ Use "log" in the extra information dictionary instead of the "episode" key.

- privileged_obs_buf : torch.Tensor
- """Buffer for privileged observations."""
+ """

- obs_buf : torch.Tensor
- """Buffer for observations."""
+ num_envs : int
+ """Number of environments."""

- rew_buf : torch.Tensor
- """Buffer for rewards."""
+ num_actions : int
+ """Number of actions."""
+
+ max_episode_length : int | torch.Tensor
+ """Maximum episode length.

- reset_buf : torch.Tensor
- """Buffer for resets."""
+ The maximum episode length can be a scalar or a tensor. If it is a scalar, it is the same for all environments.
+ If it is a tensor, it is the maximum episode length for each environment. This is useful for dynamic episode
+ lengths.
+ """

episode_length_buf : torch.Tensor
"""Buffer for current episode lengths."""

- extras : dict
- """Extra information (metrics).
-
- Extra information is stored in a dictionary. This includes metrics such as the episode reward, episode length,
- etc. Additional information can be stored in the dictionary such as observations for the critic network, etc.
- """
-
device : torch.device
"""Device to use."""

+ cfg : dict | object
+ """Configuration object."""
+
"""
Operations.
"""
@@ -89,6 +89,9 @@ def reset(self) -> tuple[torch.Tensor, dict]:
def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict]:
"""Apply input action on the environment.

+ The extra information is a dictionary. It includes metrics such as the episode reward, episode length,
+ etc. Additional information can be stored in the dictionary such as observations for the critic network, etc.
+
Args:
actions (torch.Tensor): Input actions to apply. Shape: (num_envs, num_actions)
