I am running the following code to evaluate the model I trained:
import torch as th

from rllte.env import make_mario_env
from rllte.agent import PPO

if __name__ == '__main__':
    n_steps: int = 2048 * 16
    device = 'cuda' if th.cuda.is_available() else 'cpu'

    envs = make_mario_env('SuperMarioBros-1-1-v0', device=device, num_envs=1,
                          asynchronous=False, frame_stack=4, gray_scale=True)
    print(device, envs.observation_space, envs.action_space)

    agent = PPO(envs,
                device=device,
                batch_size=512,
                n_epochs=10,
                num_steps=n_steps // 8,
                pretraining=False)

    # load the pre-trained weights and freeze them for evaluation
    agent.freeze(init_model_path="pretrained_1507328.pth")
    agent.eval_env = envs
    agent.eval(3)  # run three evaluation episodes
But when checking Mario's x_pos at the end of each episode, I noticed that the agent behaves deterministically across all three evaluation episodes, returning the same result every time. Is there a way to avoid this?
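To illustrate the kind of variation I am after, here is a rough sketch, continuing from the script above: re-seed the evaluation environment before each run so the rollouts can diverge. Note that the `seed` keyword on `make_mario_env` is an assumption on my part; I have not verified that the rllte env factories expose it, so treat this as a sketch rather than working code.

# Sketch: give each evaluation run its own environment seed so the
# episodes can start from different states and diverge.
# ASSUMPTION: make_mario_env accepts a `seed` keyword -- unverified.
for run_seed in (1, 2, 3):
    eval_envs = make_mario_env('SuperMarioBros-1-1-v0', device=device,
                               num_envs=1, asynchronous=False,
                               frame_stack=4, gray_scale=True,
                               seed=run_seed)
    agent.eval_env = eval_envs
    agent.eval(1)  # one episode per seed

Alternatively, sampling actions from the policy distribution instead of taking its deterministic mode would also introduce variation, if there is a way to enable that during eval().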