Commit 74a70e1

authored Oct 17, 2022
Speed-up to O(1) from O(N) of the computation of each return in REINFORCE (#1083)
Replace list with deque to obtain O(1) time complexity of insertion at the beginning of the list of returns
1 parent ca1bd91 commit 74a70e1
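
The complexity claim in the commit message can be checked with a small standalone sketch. This is not the repository code: the function names, the reward values, and the discount factor are hypothetical, chosen only to show that prepending with `list.insert(0, ...)` and with `deque.appendleft(...)` yields the same returns, while the list version shifts all existing elements on every call.

```python
from collections import deque


def returns_with_list(rewards, gamma):
    """Discounted returns, prepending with list.insert(0, ...)."""
    R = 0.0
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)  # shifts every existing element: O(N) per call
    return returns


def returns_with_deque(rewards, gamma):
    """Same computation, prepending with deque.appendleft."""
    R = 0.0
    returns = deque()
    for r in reversed(rewards):
        R = r + gamma * R
        returns.appendleft(R)  # O(1) per call
    return returns


rewards = [1.0, 1.0, 1.0]  # hypothetical episode rewards
gamma = 0.5                # hypothetical discount factor
assert list(returns_with_deque(rewards, gamma)) == returns_with_list(rewards, gamma)
print(list(returns_with_deque(rewards, gamma)))  # [1.75, 1.5, 1.0]
```

Over an episode of N steps, the list version performs O(N^2) element moves in total, whereas the deque version performs O(N) work.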

1 file changed: +3 -3 lines changed

reinforcement_learning/reinforce.py (+3 -3)
@@ -2,7 +2,7 @@
 import gym
 import numpy as np
 from itertools import count
-
+from collections import deque
 import torch
 import torch.nn as nn
 import torch.nn.functional as F
@@ -62,10 +62,10 @@ def select_action(state):
 def finish_episode():
     R = 0
     policy_loss = []
-    returns = []
+    returns = deque()
     for r in policy.rewards[::-1]:
         R = r + args.gamma * R
-        returns.insert(0, R)
+        returns.appendleft(R)
     returns = torch.tensor(returns)
     returns = (returns - returns.mean()) / (returns.std() + eps)
     for log_prob, R in zip(policy.saved_log_probs, returns):
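
The unchanged context lines standardize the returns before they are paired with the saved log-probabilities. A minimal pure-Python sketch of that step follows, assuming a small constant `eps`. The definition of `eps` lies outside the hunk shown, so the value used here is hypothetical, as is the function name. `statistics.stdev` uses the sample (n - 1) estimator, which matches the default behaviour of `torch.Tensor.std`.

```python
import statistics


def normalize_returns(returns, eps=1e-8):
    """Standardize returns to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns)  # sample std (n - 1); needs at least 2 values
    # eps (hypothetical value) guards against division by zero when all returns are equal
    return [(R - mean) / (std + eps) for R in returns]


normalized = normalize_returns([1.75, 1.5, 1.0])
```

Because `stdev` requires at least two data points, this sketch would raise `statistics.StatisticsError` on a single-step episode.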
