While studying your Mario PPO code, https://github.com/uvipen/Super-mario-bros-PPO-pytorch/blob/master/train.py, I found it hard to understand the following code:
################################################################################
values = torch.cat(values).detach() # torch.Size([4096])
states = torch.cat(states)
gae = 0
R = []
for value, reward, done in list(zip(values, rewards, dones))[::-1]: # len(list(zip(values, rewards, dones))[::-1]) is 512
    gae = gae * opt.gamma * opt.tau
    gae = gae + reward + opt.gamma * next_value.detach() * (1 - done) - value.detach()
    next_value = value
    R.append(gae + value)
##################################################################################
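If I read the loop correctly, it walks backwards over the rollout and accumulates what looks like the GAE recursion, i.e. for each step t:

delta_t = reward_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
gae_t = delta_t + gamma * tau * gae_{t+1}
R_t = gae_t + V(s_t)

so my question is only about which entries of "values" actually take part in this recursion.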
Question: with --num_local_steps=512 and --num_processes=8, after "values = torch.cat(values).detach()" the shape of values is torch.Size([4096]). But the list "list(zip(values, rewards, dones))[::-1]" has length 512, which means only the first 512 items of "values" are used in the for loop; the rest are discarded.
So, in every 512 local_steps, only the values of the first 64 (= 512/8) steps are used to calculate GAE and R. Is this a problem, or do I have a misunderstanding? The sketch below shows the shape mismatch I mean.
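To make the concern concrete, here is a minimal, self-contained sketch (this is not your code; the per-step shape [8] for the value output is just my assumption based on --num_processes=8) showing that zipping the flattened 4096-element tensor with the 512-element lists only ever yields 512 pairs:

################################################################################
import torch

num_local_steps, num_processes = 512, 8

# dummy rollout buffers standing in for what train.py collects
values = [torch.randn(num_processes) for _ in range(num_local_steps)]   # 512 tensors of shape [8]
rewards = [torch.randn(num_processes) for _ in range(num_local_steps)]  # list of length 512
dones = [torch.zeros(num_processes) for _ in range(num_local_steps)]    # list of length 512

values = torch.cat(values).detach()
print(values.shape)                                  # torch.Size([4096])
print(len(list(zip(values, rewards, dones))[::-1]))  # 512: zip stops at the shortest input
##################################################################################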
Looking forward to your answer, thanks!