
Minigrid version #24

Open · Acedorkz opened this issue Oct 23, 2024 · 11 comments

@Acedorkz

Hello, I am trying to run on the MiniGrid environments. Could you please share the versions of minigrid, gymnasium, and gym that you used? I want to test on environments such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Thanks a lot for your help, and I look forward to your reply.

@yuanmingqi
Collaborator

Hi. The versions are as follows:

  • gymnasium 0.28.1
  • gym 0.26.1
  • minigrid 2.3.1
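
For example, you can pin these versions with pip (just an illustration, not an official requirements file):

pip install gymnasium==0.28.1 gym==0.26.1 minigrid==2.3.1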

For unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to register them first, for example:

import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)

register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)
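
Once registered, the new IDs can be created like any built-in MiniGrid task. For example (a quick check, not code from the repo):

import gymnasium as gym

env = gym.make("MiniGrid-MultiRoom-N12-S10-v0")
obs, info = env.reset(seed=0)
print(env.observation_space)  # Dict space with 'image', 'direction', and 'mission' keys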

@Acedorkz
Author

Thanks for the versions, they help a lot.
Sorry, one more question about the encoder_model used for MultiRoom-N10-S10-v0: neither the Espeholt nor the Mnih encoder works. Could you give some guidance on how to run the MiniGrid environments to replicate the results at https://wandb.ai/yuanmingqi/RLeXplore/reportlist?
Thanks again. Have a nice day.

@yuanmingqi
Collaborator

Hi, the code for the MiniGrid tasks is an experimental version that hasn't been uploaded yet. You can modify the current code as follows:

  • Change the environment-creation code:
from typing import Callable

import gymnasium as gym
from gymnasium.vector import AsyncVectorEnv, SyncVectorEnv
from gymnasium.wrappers import RecordEpisodeStatistics, TransformReward
from minigrid.wrappers import ImgObsWrapper, RGBImgPartialObsWrapper

# NOTE: Gymnasium2Torch and ImageTranspose are rllte utilities; the import path
# below is an assumption and may need adjusting to your rllte version.
from rllte.env.utils import Gymnasium2Torch, ImageTranspose


def make_minigrid_env(
    env_id: str = "MiniGrid-DoorKey-5x5-v0",
    num_envs: int = 8,
    fully_observable: bool = False,
    fully_numerical: bool = False,
    seed: int = 0,
    frame_stack: int = 4,
    device: str = "cpu",
    asynchronous: bool = False,
) -> Gymnasium2Torch:
    """Create MiniGrid environments.

    Args:
        env_id (str): Name of environment.
        num_envs (int): Number of environments.
        fully_observable (bool): Fully observable gridworld using a compact grid encoding instead of the agent view.
        fully_numerical (bool): Transforms the observation space (that has a textual component) to a fully numerical
            observation space, where the textual instructions are replaced by arrays representing the indices of each
            word in a fixed vocabulary.
        seed (int): Random seed.
        frame_stack (int): Number of stacked frames.
        device (str): Device to convert the data.
        asynchronous (bool): `True` for creating asynchronous environments,
            and `False` for creating synchronous environments.

    Returns:
        The vectorized environments.
    """

    def make_env(env_id: str, seed: int) -> Callable:
        def _thunk():
            env = gym.make(env_id)

            #env = RGBImgPartialObsWrapper(env)
            env = ImageTranspose(env)
            env = ImgObsWrapper(env)
            #env = ResizeObservation(env, 84)
            #env = FrameStack(env, k=frame_stack)
            
            env.action_space.seed(seed)
            env.observation_space.seed(seed)

            return env

        return _thunk

    envs = [make_env(env_id, seed + i) for i in range(num_envs)]

    if asynchronous:
        envs = AsyncVectorEnv(envs)
    else:
        envs = SyncVectorEnv(envs)

    envs = TransformReward(envs, lambda r: 100.0 * r)
    envs = RecordEpisodeStatistics(envs)

    return Gymnasium2Torch(envs, device=device)
  • Use a new MiniGrid encoder (a quick sanity-check sketch follows this list):
import torch
from torch import nn
from rllte.common.prototype import BaseEncoder

class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        #observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))

Then replace the old encoder where it is built, e.g., in E3B:

# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)
  • Use the following hyperparameters:
intrinsic_reward = E3B(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device=device,
    n_envs=args.n_envs,
    rwd_norm_type="rms",
    obs_rms=True,
    update_proportion=1.0,
    gamma=args.int_gamma,
    encoder_model=encoder_model,
    weight_init="orthogonal",
    beta=0.25,
    latent_dim=args.hidden_dim,
)
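
As a quick sanity check (not part of the original code), you can verify that the new encoder accepts the channels-first MiniGrid observations produced by the wrappers above. The (3, 7, 7) shape is an assumption based on the default 7x7 partial view after ImgObsWrapper and ImageTranspose:

import numpy as np
import torch
from gymnasium.spaces import Box

# Channels-first image space, assumed to match the wrapped MiniGrid observations (CHW).
obs_space = Box(low=0, high=255, shape=(3, 7, 7), dtype=np.uint8)
encoder = MinigridEncoder(observation_space=obs_space, features_dim=512)

# A fake batch of 8 observations; the encoder should map it to (8, 512) features.
batch = torch.as_tensor(np.stack([obs_space.sample() for _ in range(8)])).float()
with torch.no_grad():
    features = encoder(batch)
print(features.shape)  # expected: torch.Size([8, 512])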

I will update the code ASAP; you can give it a try first. Thanks!

@Acedorkz
Author

Hi, thanks for your kind replies. I have tried the code above, but unfortunately I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0: the reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Thanks a lot, and I'm also looking forward to the updated version.

@yuanmingqi
Collaborator

Could you provide an email address? I can share the experiment code with you first.

@Acedorkz
Author

Thanks a lot.
[email protected]

@yuanmingqi
Collaborator

Sent via email.

@Acedorkz
Author

Acedorkz commented Nov 1, 2024

Hi, I have tried the RIDE code on MultiRoom-N10S10-v0 that you emailed. The reward converges to zero, which is inconsistent with the original RIDE paper. I am not sure whether I ran it correctly.

Moreover, the pseudo_counts here use the k-nearest neighbors of f(x_t) in memory. The original RIDE implementation is instead based on the square root of N_ep(s), the number of times the state has been visited during the current episode. I was wondering if you have ever tried the N_ep(s) approach.
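
For context, the episodic-count variant described above (from the original RIDE paper) can be sketched roughly as follows. This is only an illustration, not code from this repo, and state_key is a placeholder for however states are hashed (e.g., the bytes of the grid encoding):

from collections import defaultdict

import numpy as np

# Rough sketch: RIDE scales the impact-driven reward by 1 / sqrt(N_ep(s)), where
# N_ep(s) is the number of visits to state s within the current episode.
episodic_counts = defaultdict(int)  # reset this dict at every episode boundary

def ride_bonus(impact_reward: float, state_key) -> float:
    episodic_counts[state_key] += 1
    return impact_reward / np.sqrt(episodic_counts[state_key])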

Thanks.

@deepneuralnetworks

deepneuralnetworks commented Dec 28, 2024

@yuanmingqi Hello, can I get the code too? My email is [email protected]

@yuanmingqi
Collaborator

> @yuanmingqi Hello, can I get the code too? My email is [email protected]

sent.

@deepneuralnetworks

> > @yuanmingqi Hello, can I get the code too? My email is [email protected]
>
> sent.

Thank you so much!!
