
Asynchronous API for ParallelRLEnv #343

@vwxyzjn

Description


Hello, this work looks pretty cool and I'm looking forward to using it in the future.

I was wondering if you would be interested in implementing EnvPool's asynchronous API, which looks like the following:

```python
import envpool
import numpy as np

num_envs = 64
batch_size = 16
env = envpool.make("Pong-v5", env_type="gym", num_envs=num_envs, batch_size=batch_size)
action_num = env.action_space.n

# Kick off resets in all environments without blocking.
env.async_reset()

# Each recv() returns a batch of results from whichever envs finished first;
# info["env_id"] tells us which environments the batch came from.
obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])

obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])

obs, rew, done, info = env.recv()
print(obs.shape, info["env_id"])
action = np.random.randint(action_num, size=batch_size)
env.send(action, info["env_id"])
```

Output:

```
(16, 4, 84, 84) [ 1  0  8  3  5  9 11  6 13 12 16 14  4 18  2 19]
(16, 4, 84, 84) [23 24 17 21 25 26 28 20 32 31 22  7 15 29 27 30]
(16, 4, 84, 84) [34 10 38 41 40 35 33 36 39 37 42 48 51 50 52 44]
```

The general idea is to return a subset of the environments for the agent to sample actions on, while the remaining environments continue executing their previous actions. This approach should scale considerably better, especially when the engine backend communicates over sockets (#219). In CleanRL we have a fast PPO implementation prototype that leverages this async API (see code here).


Farama-Foundation/Gymnasium#98 also contains an example of implementing this type of Async API with existing vectorized environments.
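To make the request concrete, here is a minimal sketch of how send/recv-style batching could be layered on top of existing per-environment step functions. This is not EnvPool's or this project's implementation; `ToyEnv` and `AsyncBatchEnv` are hypothetical names, and real backends would use worker processes or a C++ thread pool rather than one Python thread per step. The key semantics it illustrates: `recv()` returns only the first `batch_size` environments to finish, and `send()` dispatches actions back to exactly those environments by `env_id`.

```python
import queue
import threading
import numpy as np


class ToyEnv:
    """Trivial stand-in environment (hypothetical, for illustration only)."""

    def __init__(self, env_id):
        self.env_id = env_id
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.full(4, self.t, dtype=np.float32)
        return obs, float(action), self.t >= 100, {}


class AsyncBatchEnv:
    """Sketch of an EnvPool-style async API over a thread-safe result queue."""

    def __init__(self, num_envs, batch_size):
        self.envs = [ToyEnv(i) for i in range(num_envs)]
        self.batch_size = batch_size
        self.results = queue.Queue()  # finished (obs, rew, done, info) tuples

    def async_reset(self):
        # Reset every environment; results become available to recv().
        for i, env in enumerate(self.envs):
            self.results.put((env.reset(), 0.0, False, {"env_id": i}))

    def send(self, actions, env_ids):
        # Dispatch one step per addressed environment without blocking.
        for action, i in zip(actions, env_ids):
            threading.Thread(target=self._step, args=(int(i), action)).start()

    def _step(self, i, action):
        obs, rew, done, info = self.envs[i].step(action)
        info["env_id"] = i
        self.results.put((obs, rew, done, info))

    def recv(self):
        # Block until batch_size environments have finished, in completion order.
        batch = [self.results.get() for _ in range(self.batch_size)]
        obs = np.stack([b[0] for b in batch])
        rew = np.array([b[1] for b in batch])
        done = np.array([b[2] for b in batch])
        env_id = np.array([b[3]["env_id"] for b in batch])
        return obs, rew, done, {"env_id": env_id}
```

With `num_envs > batch_size`, the agent is never blocked on the slowest environment: `recv()` hands back whichever batch finished first, and the rest keep running in the background.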
