
Minigrid version #24

Open · Acedorkz opened this issue Oct 23, 2024 · 11 comments

@Acedorkz

Hello, I am trying to run on the MiniGrid environments. Could you please share the versions of minigrid, gymnasium, and gym that you used? I want to test on environments such as MiniGrid-ObstructedMaze-Full and MiniGrid-MultiRoom-N12-S10. Thanks a lot for your help, and I look forward to your reply.

@yuanmingqi
Collaborator

Hi. The versions are as follows:

  • gymnasium 0.28.1
  • gym 0.26.1
  • minigrid 2.3.1
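
For example, you can pin these versions with pip (just an illustration, not an official requirements file):

pip install gymnasium==0.28.1 gym==0.26.1 minigrid==2.3.1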

For unregistered envs like MiniGrid-MultiRoom-N12-S10, you need to register them first, for example:

import minigrid
from gymnasium.envs.registration import register

register(
    id="MiniGrid-MultiRoom-N10-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 10, "maxNumRooms": 10, "maxRoomSize": 10},
)

register(
    id="MiniGrid-MultiRoom-N12-S10-v0",
    entry_point="minigrid.envs:MultiRoomEnv",
    kwargs={"minNumRooms": 12, "maxNumRooms": 12, "maxRoomSize": 10},
)
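
Once registered, the new IDs can be created like any built-in MiniGrid task. For example (a quick check, not code from the repo):

import gymnasium as gym

env = gym.make("MiniGrid-MultiRoom-N12-S10-v0")
obs, info = env.reset(seed=0)
print(env.observation_space)  # Dict space with 'image', 'direction', and 'mission' keys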

@Acedorkz
Author

Thanks for the versions, they help a lot.
Sorry, one more question about the encoder_model used for MultiRoom-N10-S10-v0: neither the Espeholt nor the Mnih encoder works. Could you give some guidance on how to run the MiniGrid environments to replicate the results at https://wandb.ai/yuanmingqi/RLeXplore/reportlist?
Thanks again. Have a nice day.

@yuanmingqi
Collaborator

Hi, the code for the MiniGrid tasks is an experimental version that hasn't been uploaded yet. You can modify the current code as follows:

  • Change the environment-creation code:
from typing import Callable

import gymnasium as gym
from gymnasium.vector import AsyncVectorEnv, SyncVectorEnv
from gymnasium.wrappers import RecordEpisodeStatistics, TransformReward
from minigrid.wrappers import ImgObsWrapper, RGBImgPartialObsWrapper

# NOTE: Gymnasium2Torch and ImageTranspose are rllte utilities; the import path
# below is an assumption and may need adjusting to your rllte version.
from rllte.env.utils import Gymnasium2Torch, ImageTranspose


def make_minigrid_env(
    env_id: str = "MiniGrid-DoorKey-5x5-v0",
    num_envs: int = 8,
    fully_observable: bool = False,
    fully_numerical: bool = False,
    seed: int = 0,
    frame_stack: int = 4,
    device: str = "cpu",
    asynchronous: bool = False,
) -> Gymnasium2Torch:
    """Create MiniGrid environments.

    Args:
        env_id (str): Name of environment.
        num_envs (int): Number of environments.
        fully_observable (bool): Fully observable gridworld using a compact grid encoding instead of the agent view.
        fully_numerical (bool): Transforms the observation space (that has a textual component) to a fully numerical
            observation space, where the textual instructions are replaced by arrays representing the indices of each
            word in a fixed vocabulary.
        seed (int): Random seed.
        frame_stack (int): Number of stacked frames.
        device (str): Device to convert the data.
        asynchronous (bool): `True` for creating asynchronous environments,
            and `False` for creating synchronous environments.

    Returns:
        The vectorized environments.
    """

    def make_env(env_id: str, seed: int) -> Callable:
        def _thunk():
            env = gym.make(env_id)

            #env = RGBImgPartialObsWrapper(env)
            env = ImageTranspose(env)
            env = ImgObsWrapper(env)
            #env = ResizeObservation(env, 84)
            #env = FrameStack(env, k=frame_stack)
            
            env.action_space.seed(seed)
            env.observation_space.seed(seed)

            return env

        return _thunk

    envs = [make_env(env_id, seed + i) for i in range(num_envs)]

    if asynchronous:
        envs = AsyncVectorEnv(envs)
    else:
        envs = SyncVectorEnv(envs)

    envs = TransformReward(envs, lambda r: 100.0 * r)
    envs = RecordEpisodeStatistics(envs)

    return Gymnasium2Torch(envs, device=device)
  • Use a new MiniGrid encoder (a quick sanity-check sketch follows this list):
import torch
from torch import nn
from rllte.common.prototype import BaseEncoder

class MinigridEncoder(BaseEncoder):
    def __init__(self, observation_space, features_dim: int = 512) -> None:
        super().__init__(observation_space, features_dim)
        n_input_channels = observation_space.shape[0]
        self.cnn = nn.Sequential(
            nn.Conv2d(n_input_channels, 16, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(16, 32, (2, 2)),
            nn.ReLU(),
            nn.Conv2d(32, 64, (2, 2)),
            nn.ReLU(),
            nn.Flatten(),
        )

        # Compute shape by doing one forward pass
        with torch.no_grad():
            observations = observation_space.sample()
            observations = torch.as_tensor(observations[None]).float()
            n_flatten = self.cnn(observations).float().shape[1]

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        #observations = observations.permute(0, 3, 1, 2).float()
        return self.linear(self.cnn(observations.float()))

Then replace the old encoder where it is built, e.g., in E3B:

# build the encoder and inverse dynamics model
self.encoder = MinigridEncoder(observation_space=observation_space).to(self.device)
  • Use the following hyperparameters:
intrinsic_reward = E3B(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device=device,
    n_envs=args.n_envs,
    rwd_norm_type="rms",
    obs_rms=True,
    update_proportion=1.0,
    gamma=args.int_gamma,
    encoder_model=encoder_model,
    weight_init="orthogonal",
    beta=0.25,
    latent_dim=args.hidden_dim,
)
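
As a quick sanity check (not part of the original code), you can verify that the new encoder accepts the channels-first MiniGrid observations produced by the wrappers above. The (3, 7, 7) shape is an assumption based on the default 7x7 partial view after ImgObsWrapper and ImageTranspose:

import numpy as np
import torch
from gymnasium.spaces import Box

# Channels-first image space, assumed to match the wrapped MiniGrid observations (CHW).
obs_space = Box(low=0, high=255, shape=(3, 7, 7), dtype=np.uint8)
encoder = MinigridEncoder(observation_space=obs_space, features_dim=512)

# A fake batch of 8 observations; the encoder should map it to (8, 512) features.
batch = torch.as_tensor(np.stack([obs_space.sample() for _ in range(8)])).float()
with torch.no_grad():
    features = encoder(batch)
print(features.shape)  # expected: torch.Size([8, 512])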

I will update the code ASAP; you can give it a try first. Thanks!

@Acedorkz
Author

Hi, thanks for your kind replies. I have tried the code above, but unfortunately I still cannot replicate the results on MiniGrid-MultiRoom-N12-S10-v0: the reward is always zero. Perhaps some hyperparameters are not set appropriately. Could you please provide them? Thanks a lot, and I'm also looking forward to the updated version.

@yuanmingqi
Collaborator

Could you provide an email address? I can share the experiment code with you first.

@Acedorkz
Author

Thanks a lot.
[email protected]

@yuanmingqi
Collaborator

Sent via email.

@Acedorkz
Author

Acedorkz commented Nov 1, 2024

Hi, I have tried the RIDE code on MultiRoom-N10S10-v0 that you emailed. The reward converges to zero, which is inconsistent with the original RIDE paper. I am not sure whether I ran it correctly.

Moreover, the pseudo_counts here use the k-nearest neighbors of f(x_t) in memory. The original RIDE implementation is instead based on the square root of N_ep(s), the number of times the state has been visited during the current episode. I was wondering if you have ever tried the N_ep(s) approach.
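
For context, the episodic-count variant described above (from the original RIDE paper) can be sketched roughly as follows. This is only an illustration, not code from this repo, and state_key is a placeholder for however states are hashed (e.g., the bytes of the grid encoding):

from collections import defaultdict

import numpy as np

# Rough sketch: RIDE scales the impact-driven reward by 1 / sqrt(N_ep(s)), where
# N_ep(s) is the number of visits to state s within the current episode.
episodic_counts = defaultdict(int)  # reset this dict at every episode boundary

def ride_bonus(impact_reward: float, state_key) -> float:
    episodic_counts[state_key] += 1
    return impact_reward / np.sqrt(episodic_counts[state_key])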

Thanks.

@deepneuralnetworks

deepneuralnetworks commented Dec 28, 2024

@yuanmingqi Hello, can I get the code too? My email is [email protected]

@yuanmingqi
Collaborator

> @yuanmingqi Hello, can I get the code too? My email is [email protected]

sent.

@deepneuralnetworks

> > @yuanmingqi Hello, can I get the code too? My email is [email protected]
>
> sent.

Thank you so much!!
