Minigrid version #24
Hi. The versions are as follows:
To use unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to add code like this:
import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)
register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)
Thanks for the versions, which help a lot.
Hi, the code for the MiniGrid task is an experimental version that hasn't been uploaded yet. You can change the current code as follows:
def make_minigrid_env(
    env_id: str = "MiniGrid-DoorKey-5x5-v0",
    num_envs: int = 8,
    fully_observable: bool = False,
    fully_numerical: bool = False,
    seed: int = 0,
    frame_stack: int = 4,
    device: str = "cpu",
    asynchronous: bool = False,
) -> Gymnasium2Torch:
    """Create MiniGrid environments.

    Args:
        env_id (str): Name of environment.
        num_envs (int): Number of environments.
        fully_observable (bool): Fully observable gridworld using a compact grid encoding instead of the agent view.
        fully_numerical (bool): Transforms the observation space (that has a textual component) to a fully numerical
            observation space, where the textual instructions are replaced by arrays representing the indices of each
            word in a fixed vocabulary.
        seed (int): Random seed.
        frame_stack (int): Number of stacked frames.
        device (str): Device to convert the data.
        asynchronous (bool): `True` for creating asynchronous environments,
            and `False` for creating synchronous environments.

    Returns:
        The vectorized environments.
    """

    def make_env(env_id: str, seed: int) -> Callable:
        def _thunk():
            env = gym.make(env_id)
            # env = RGBImgPartialObsWrapper(env)
            env = ImageTranspose(env)
            env = ImgObsWrapper(env)
            # env = ResizeObservation(env, 84)
            # env = FrameStack(env, k=frame_stack)
            env.action_space.seed(seed)
            env.observation_space.seed(seed)
            return env

        return _thunk

    envs = [make_env(env_id, seed + i) for i in range(num_envs)]
    if asynchronous:
        envs = AsyncVectorEnv(envs)
    else:
        envs = SyncVectorEnv(envs)
    envs = TransformReward(envs, lambda r: 100.0 * r)
    envs = RecordEpisodeStatistics(envs)
    return Gymnasium2Torch(envs, device=device)
import torch
from torch import nn
from rllte.common.prototype import BaseEncoder


class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        # observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))

Then replace the old encoder, for example in E3B:
# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)
intrinsic_reward = E3B(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device=device,
    n_envs=args.n_envs,
    rwd_norm_type="rms",
    obs_rms=True,
    update_proportion=1.0,
    gamma=args.int_gamma,
    encoder_model=encoder_model,
    weight_init='orthogonal',
    beta=0.25,
    latent_dim=args.hidden_dim
)
I will update the code ASAP; you can give it a try first. Thx!
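The `n_flatten` value in the encoder above comes from a dummy forward pass, and it can be cross-checked by hand. The sketch below assumes MiniGrid's default 7x7 partial view and the three unpadded 2x2 convolutions with stride 1 from `MinigridEncoder`. `conv2d_out` and `flattened_size` are hypothetical helper names used only for illustration; they are not part of rllte.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial axis of a 2D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(height: int, width: int, kernels=(2, 2, 2), out_channels: int = 64) -> int:
    """Flattened feature size after a stack of square, unpadded, stride-1 convolutions."""
    for k in kernels:
        height = conv2d_out(height, k)
        width = conv2d_out(width, k)
    return out_channels * height * width


# Assumed 7x7 agent view: each axis shrinks 7 -> 6 -> 5 -> 4.
n_flatten = flattened_size(7, 7)
print(n_flatten)  # 64 * 4 * 4 = 1024
```

If the observation is not transposed to channel-first before it reaches the encoder, `observation_space.shape[0]` would be 7 instead of 3 and the shapes above would no longer match, so this is a quick way to spot a wrapper-order problem.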
Hi, thanks for your kind replies. I have tried the above code. Unfortunately, I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0. The reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Many thanks. Also looking forward to the updated version.
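For context on the all-zero reward: as far as I understand MiniGrid's default reward, the extrinsic reward is nonzero only on the step where the agent reaches the goal, and it decays with the number of steps taken. The sketch below is an illustration under that assumption, combined with the `100.0 * r` scaling from `make_minigrid_env`. The function names and the `max_steps` value are hypothetical.

```python
def minigrid_goal_reward(step_count: int, max_steps: int) -> float:
    """Reward on reaching the goal, assuming MiniGrid's default 1 - 0.9 * (t / T)."""
    return 1.0 - 0.9 * (step_count / max_steps)


def scaled_episode_return(reached_goal: bool, step_count: int, max_steps: int,
                          scale: float = 100.0) -> float:
    """Episode return after the TransformReward scaling; zero if the goal was never reached."""
    if not reached_goal:
        return 0.0
    return scale * minigrid_goal_reward(step_count, max_steps)


# max_steps=200 is an illustrative value, not the setting of any specific env.
print(scaled_episode_return(False, 200, 200))  # goal never reached
print(scaled_episode_return(True, 100, 200))   # goal reached halfway through the budget
```

Under this assumption, a return that stays at exactly zero means no episode ever reached the goal, which points at exploration (the intrinsic reward) rather than at the reward scaling.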
Could you provide an email? I can share the experiment code with you first.
Thanks a lot.
Sent via email.
Hi, I have tried the RIDE code you emailed on MultiRoom-N10S10-v0. The reward converges to zero, which is inconsistent with the original RIDE paper. I am not sure whether I ran it correctly. Moreover, pseudo_counts here involves the k-nearest neighbors of f(x_t) in the memory. In the original RIDE implementation, the pseudo-count term is the square root of N_ep(s), where N_ep(s) is the number of times state s has been visited during the current episode. I was wondering whether you have ever tried the N_ep(s) approach. Thanks.
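The episodic-count variant mentioned above can be sketched in a few lines. This is a minimal illustration of the 1 / sqrt(N_ep(s)) scaling only, not the code that was emailed and not rllte's implementation. `EpisodicCounter` is a hypothetical helper, and the state keys are assumed to be hashable (for example, the agent position and direction, or the raw observation bytes).

```python
from collections import Counter
from math import sqrt
from typing import Hashable


class EpisodicCounter:
    """Hypothetical helper: counts visits to each state within the current episode."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def reset(self) -> None:
        """Call at the start of every episode so counts do not carry over."""
        self.counts.clear()

    def scale(self, state_key: Hashable) -> float:
        """Register one visit to the state and return 1 / sqrt(N_ep(s))."""
        self.counts[state_key] += 1
        return 1.0 / sqrt(self.counts[state_key])


counter = EpisodicCounter()
# Illustrative keys: (agent_x, agent_y, agent_dir).
trajectory = [(1, 1, 0), (2, 1, 0), (1, 1, 0), (1, 1, 0)]
scales = [counter.scale(s) for s in trajectory]
print(scales)  # revisits of (1, 1, 0) are scaled down progressively
```

In a RIDE-style reward, this factor would multiply the embedding change between consecutive states. Unlike the k-nearest-neighbor estimate, it needs exact state matches, so it is only meaningful when states can be hashed reliably.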
@yuanmingqi Hello, can I get the code too? My email is [email protected]
Sent.
Thank you so much!!
Hello, I am trying to run on MiniGrid environments. Could you please tell me which versions of minigrid, gymnasium, and gym you use? I want to test on environments such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Many thanks for your help; I look forward to your reply.