Documentation | Implemented Algorithms | Installation | Getting Started | License
OmniSafe is a comprehensive and reliable benchmark for safe reinforcement learning, covering a multitude of SafeRL domains and delivering a new suite of testing environments.
The simulation environments in OmniSafe and its suite of reliable algorithm implementations make it easier for the SafeRL research community to replicate and improve on existing work, and also help to facilitate the validation of new ideas and new algorithms.
## Implemented Algorithms

The supported interface algorithms currently include:
**Newly published in 2022 and 2023**

- [AAAI 2023] Augmented Proximal Policy Optimization for Safe Reinforcement Learning (APPO) (code contributed by the original authors of the paper)
- [NeurIPS 2022] Constrained Update Projection Approach to Safe Policy Optimization (CUP) (code contributed by the original authors of the paper)
- [NeurIPS 2022] Effects of Safety State Augmentation on Safe Exploration (Simmer)
- [NeurIPS 2022] Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
- [ICML 2022] Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)
- [ICML 2022] Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)
- [IJCAI 2022] Penalized Proximal Policy Optimization for Safe Reinforcement Learning (code contributed by the original authors of the paper)
- [ICLR 2022] Constrained Policy Optimization via Bayesian World Models (LA-MBDA)
- [AAAI 2022] Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)

**On-Policy Safe**

- The Lagrange version of PPO (PPO-Lag)
- The Lagrange version of TRPO (TRPO-Lag)
- [ICML 2017] Constrained Policy Optimization (CPO)
- [ICLR 2019] Reward Constrained Policy Optimization (RCPO)
- [ICML 2020] Responsive Safety in Reinforcement Learning by PID Lagrangian Methods (PID-Lag)
- [NeurIPS 2020] First Order Constrained Optimization in Policy Space (FOCOPS)
- [AAAI 2020] IPO: Interior-point Policy Optimization under Constraints (IPO)
- [ICLR 2020] Projection-Based Constrained Policy Optimization (PCPO)
- [ICML 2021] CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

**Off-Policy Safe**

- The Lagrange version of TD3 (TD3-Lag)
- The Lagrange version of DDPG (DDPG-Lag)
- The Lagrange version of SAC (SAC-Lag)
- [ICML 2019] Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG)
- [ICML 2019] Lyapunov-based Safe Policy Optimization for Continuous Control (SDDPG-modular)
- [ICML 2022] Constrained Variational Policy Optimization for Safe Reinforcement Learning (CVPO)

**Model-Based Safe**

- [NeurIPS 2021] Safe Reinforcement Learning by Imagining the Near Future (SMBPO)
- [CoRL 2021 (Oral)] Learning Off-Policy with Online Planning (SafeLOOP)
- [AAAI 2022] Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning (CAP)
- [NeurIPS 2022] Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
- [ICLR 2022] Constrained Policy Optimization via Bayesian World Models (LA-MBDA)

**Offline Safe**

- The Lagrange version of BCQ (BCQ-Lag)
- The Constrained version of CRR (C-CRR)
- [AAAI 2022] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning (CPQ)
- [ICLR 2022 (Spotlight)] COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation
- [ICML 2022] Constrained Offline Policy Optimization (COPO)

**Others**

- Safe Exploration in Continuous Action Spaces (Safety Layer)
- [RA-L 2021] Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
- [ICML 2022] Sauté RL: Almost Surely Safe Reinforcement Learning Using State Augmentation (SauteRL)
- [NeurIPS 2022] Effects of Safety State Augmentation on Safe Exploration (Simmer)
## Installation

OmniSafe requires Python 3.8+ and PyTorch 1.10+.
```bash
git clone https://github.com/PKU-MARL/omnisafe
cd omnisafe
conda create -n omnisafe python=3.8
conda activate omnisafe

# Install omnisafe
pip install -e .
```
After installation, you can run an example from the `examples` directory:

```bash
cd examples
python train_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 1
```
**algo**: The algorithm to run. The supported algorithms are listed below.

| Type | Name |
|---|---|
| Base-On-Policy | PolicyGradient, PPO, NaturalPG, TRPO |
| Base-Off-Policy | DDPG, TD3, SAC |
| Naive Lagrange | RCPO, PPOLag, TRPOLag, DDPGLag, TD3Lag, SACLag |
| PID Lagrange | CPPOPid, TRPOPid |
| First Order | FOCOPS, CUP |
| Second Order | SDDPG, CPO, PCPO |
| Saute RL | PPOSaute, PPOLagSaute |
| Simmer RL | PPOSimmerQ, PPOSimmerPid, PPOLagSimmerQ, PPOLagSimmerPid |
| EarlyTerminated | PPOEarlyTerminated, PPOLagEarlyTerminated |
| Model-Based | CAP, MBPPOLag, SafeLOOP |
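Any name from the "Name" column above can be passed as the `algo` argument, either to `train_policy.py` or to the Python API shown in the Getting Started examples below. As a minimal sketch (assuming only the `omnisafe.Agent` interface used elsewhere in this README), switching algorithms is just a matter of swapping the string:

```python
import omnisafe

# 'CPO' can be replaced by any algorithm name from the table above,
# e.g. 'TRPOLag', 'FOCOPS', or 'PPOSaute'.
agent = omnisafe.Agent('CPO', 'SafetyPointGoal1-v0')
agent.learn()
```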
**env-id**: The environment id in Safety Gymnasium. The table below lists the environments that safety-gymnasium supports.
| Category | Task | Agent | Example |
|---|---|---|---|
| Safe Navigation | Goal[012] | Point, Car, Racecar, Ant | SafetyPointGoal1-v0 |
| | Button[012] | | |
| | Push[012] | | |
| | Circle[012] | | |
| Safe Velocity | Velocity | HalfCheetah, Hopper, Swimmer, Walker2d, Ant, Humanoid | SafetyHumanoidVelocity-v4 |
For more information about the environments, please refer to Safety Gymnasium.
**parallel**: The number of parallel training processes to run.
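Each of these tasks also emits a scalar cost signal at every step in addition to the usual reward; keeping the accumulated cost low is the constraint that the algorithms above enforce. Below is a rough sketch of interacting with one of these environments directly, assuming Safety Gymnasium exposes a Gymnasium-style API in which `step` additionally returns a cost:

```python
import safety_gymnasium

# One of the environment ids from the table above.
env = safety_gymnasium.make('SafetyPointGoal1-v0')

obs, info = env.reset(seed=0)
ep_reward, ep_cost = 0.0, 0.0
terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    # Safety Gymnasium is assumed to return a cost alongside the usual signals.
    obs, reward, cost, terminated, truncated, info = env.step(action)
    ep_reward += reward
    ep_cost += cost

print(f'episode return: {ep_reward:.2f}, episode cost: {ep_cost:.2f}')
```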
## Getting Started

Run an agent with its default configuration:

```python
import omnisafe

env = 'SafetyPointGoal1-v0'

agent = omnisafe.Agent('PPOLag', env)
agent.learn()
```
Run an agent with a custom config dict:

```python
import omnisafe

env = 'SafetyPointGoal1-v0'
custom_dict = {'epochs': 1, 'data_dir': './runs'}

agent = omnisafe.Agent('PPOLag', env, custom_cfgs=custom_dict)
agent.learn()
```
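Because both the algorithm name and the config dict are plain Python values, the same pattern extends to quick comparisons. A small sketch, reusing only algorithm names from the table above and the `custom_cfgs` keys shown here:

```python
import omnisafe

env = 'SafetyPointGoal1-v0'
custom_dict = {'epochs': 1, 'data_dir': './runs'}

# Train several algorithms on the same task with identical settings.
for algo in ('PPOLag', 'TRPOLag', 'CPO'):
    agent = omnisafe.Agent(algo, env, custom_cfgs=custom_dict)
    agent.learn()
```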
Or run an agent from the command line:

```bash
cd examples
python train_policy.py --env-id SafetyPointGoal1-v0 --algo PPOLag --parallel 1
```
Evaluate a saved policy:

```python
import os

import omnisafe

# Just fill your experiment's log directory in here.
# Such as: ~/omnisafe/runs/SafetyPointGoal1-v0/CPO/seed-000-2022-12-25_14-45-05
LOG_DIR = ''

evaluator = omnisafe.Evaluator()
for item in os.scandir(os.path.join(LOG_DIR, 'torch_save')):
    if item.is_file() and item.name.split('.')[-1] == 'pt':
        evaluator.load_saved_model(save_dir=LOG_DIR, model_name=item.name)
        evaluator.render(num_episode=10, camera_name='track', width=256, height=256)
```
OmniSafe is currently maintained by Borong Zhang, Jiayi Zhou, JTao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji, under the guidance of Prof. Yaodong Yang. If you have any questions while using OmniSafe, don't hesitate to raise them on the GitHub issue page; we will reply within 2-3 working days.
## License

OmniSafe is released under Apache License 2.0.