PyTorch implementations of the following reinforcement learning (RL) algorithms:
- Deep Q-Learning (DQN)
- Double Deep Q-Learning (Double DQN)
- Deep Deterministic Policy Gradient (DDPG)
- Advantage Actor-Critic (A2C)
- Proximal Policy Optimization (PPO)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
[Playing Atari with Deep Reinforcement Learning]
Simple implementation of the deep Q-learning agent with experience replay and a target network that is periodically updated to match the value network.
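The interaction between the value network and the target network can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the network sizes, the names `q_net`, `target_net`, `dqn_loss`, and `sync_target`, and the random batch standing in for replay-buffer samples are all assumptions made for the example.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny Q-network: 4-dimensional observations, 2 discrete actions.
q_net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
target_net = copy.deepcopy(q_net)  # separate copy used only for bootstrapping


def dqn_loss(batch, gamma=0.99):
    obs, actions, rewards, next_obs, dones = batch
    # Q(s, a) for the actions that were actually taken.
    q_taken = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the target network; terminal transitions do not bootstrap.
        next_q = target_net(next_obs).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return nn.functional.mse_loss(q_taken, target)


def sync_target():
    # Periodic hard update: copy the value network's weights into the target network.
    target_net.load_state_dict(q_net.state_dict())


# A random batch standing in for samples drawn from a replay buffer (assumed layout).
batch = (
    torch.randn(8, 4),          # observations
    torch.randint(0, 2, (8,)),  # actions
    torch.randn(8),             # rewards
    torch.randn(8, 4),          # next observations
    torch.zeros(8),             # done flags (1.0 marks a terminal transition)
)
loss = dqn_loss(batch)
```

In a training loop, `sync_target()` would be called every fixed number of gradient steps; the interval is a hyperparameter that this sketch does not choose.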
[Deep Reinforcement Learning with Double Q-learning]
Same as DQN, except that the online network selects the greedy next action when computing the bootstrap target, while the target network evaluates that action.
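The difference from the DQN target fits in a few lines. The sketch below assumes the next-state Q-values of both networks have already been computed; the function names and argument layout are hypothetical and chosen only for illustration.

```python
import torch


def dqn_target(rewards, dones, next_q_target, gamma=0.99):
    # Plain DQN: the target network both selects and evaluates the next action.
    return rewards + gamma * (1.0 - dones) * next_q_target.max(dim=1).values


def double_dqn_target(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    # The online network picks the greedy next action...
    best_actions = next_q_online.argmax(dim=1, keepdim=True)
    # ...and the target network supplies the value of that action.
    next_values = next_q_target.gather(1, best_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_values


# One transition where the two networks disagree about the best next action.
rewards = torch.tensor([0.0])
dones = torch.tensor([0.0])
next_q_online = torch.tensor([[1.0, 2.0]])  # online network prefers action 1
next_q_target = torch.tensor([[5.0, 3.0]])  # target network prefers action 0

plain = dqn_target(rewards, dones, next_q_target, gamma=1.0)
double = double_dqn_target(rewards, dones, next_q_online, next_q_target, gamma=1.0)
```

Here the plain DQN target uses the target network's maximum (5.0), whereas the double DQN target uses the target network's value for the action chosen by the online network (3.0), which reduces the upward bias of taking a maximum over noisy estimates.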
[Continuous control with deep reinforcement learning]
Implementation of the deep deterministic policy gradient algorithm for continuous action spaces.
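Two ingredients of DDPG as formulated in the paper are the deterministic actor loss and the slowly tracking ("soft") target networks. The following is a minimal sketch of both under stated assumptions: the helper names, the `tau` value, and the convention that the critic takes the concatenated observation and action are illustrative choices, not necessarily what this repository does.

```python
import torch
import torch.nn as nn


def soft_update(target, source, tau=0.005):
    # Polyak averaging: target <- (1 - tau) * target + tau * source.
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)


def actor_loss(actor, critic, obs):
    # Deterministic policy gradient: maximize Q(s, mu(s)) by minimizing its negative.
    actions = actor(obs)
    return -critic(torch.cat([obs, actions], dim=1)).mean()


# Hypothetical shapes: 3-dimensional observations, 1-dimensional actions in [-1, 1].
actor = nn.Sequential(nn.Linear(3, 1), nn.Tanh())
critic = nn.Linear(3 + 1, 1)
target_critic = nn.Linear(3 + 1, 1)

loss = actor_loss(actor, critic, torch.randn(5, 3))
loss.backward()  # gradients flow through the critic into the actor
soft_update(target_critic, critic)
```

Because the action is a differentiable function of the observation, the actor can be trained by backpropagating through the critic, which is what makes the approach applicable to continuous action spaces.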
[Asynchronous Methods for Deep Reinforcement Learning]
Advantage actor-critic with eligibility traces. The value function is trained toward the λ-return, a λ-weighted sum of n-step TD targets.
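The λ-weighted sum of n-step targets does not have to be formed explicitly; it satisfies a backward recursion. The sketch below is one common way to compute it, assuming a single trajectory segment with no episode boundary inside it. The function name and argument layout are hypothetical.

```python
def lambda_returns(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute λ-returns for one trajectory segment.

    rewards[t] and values[t] = V(s_t) cover t = 0..T-1; last_value is V(s_T),
    which should be 0.0 if the segment ends in a terminal state.
    """
    returns = [0.0] * len(rewards)
    next_return = last_value  # G^λ at step T is just the bootstrap value
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # G_t = r_t + γ * [(1 - λ) * V(s_{t+1}) + λ * G_{t+1}]
        returns[t] = rewards[t] + gamma * ((1.0 - lam) * next_value + lam * next_return)
        next_return = returns[t]
        next_value = values[t]
    return returns


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
one_step = lambda_returns(rewards, values, last_value=0.0, gamma=1.0, lam=0.0)
monte_carlo = lambda_returns(rewards, values, last_value=0.0, gamma=1.0, lam=1.0)
```

With λ = 0 the recursion reduces to one-step TD targets, and with λ = 1 it reduces to the discounted sum of rewards plus the final bootstrap value; intermediate values interpolate between the two.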
[Proximal Policy Optimization Algorithms]
Implementation of the clipped-objective variant of PPO. Supports weight sharing between the policy and value functions. The value function is trained toward a λ-weighted sum of n-step TD targets. Advantages are computed with generalized advantage estimation (GAE), truncated at the end of an episode or the end of the batch.
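As a rough illustration of the two pieces named above, the sketch below computes GAE over a batch that may contain episode boundaries and then evaluates the clipped surrogate loss. The function names, argument layout, and default hyperparameters are assumptions for the example rather than this repository's API.

```python
import torch


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """rewards, values, dones: 1-D tensors of length T.

    dones[t] = 1.0 if the episode ended after step t. last_value is V(s_T),
    used to bootstrap when the batch is cut off in the middle of an episode.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    running = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # The accumulated sum is also reset at an episode boundary.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (minimum) surrogate, negated so that it can be minimized.
    return -torch.min(unclipped, clipped).mean()


# Two steps: the episode ends after step 0, and the batch ends mid-episode after step 1.
adv = gae(
    rewards=torch.tensor([1.0, 1.0]),
    values=torch.tensor([0.0, 0.0]),
    dones=torch.tensor([1.0, 0.0]),
    last_value=10.0,
    gamma=1.0,
    lam=1.0,
)
loss = ppo_clip_loss(
    new_log_probs=torch.log(torch.tensor([2.0, 0.5])),
    old_log_probs=torch.zeros(2),
    advantages=torch.ones(2),
)
```

Adding the advantages to the value estimates (`adv + values`) gives λ-weighted return targets, which is one way the value function can be trained on the same rollout.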
[Addressing Function Approximation Error in Actor-Critic Methods]
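Implementation of twin delayed DDPG (TD3) as described in the paper: clipped double Q-learning with two critics, delayed policy updates, and target policy smoothing.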
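The critic target in the paper combines target policy smoothing with the minimum of two target critics. The following is a minimal sketch of that target, assuming the target networks are passed in as callables; the function name, the callable signatures, and the constant stand-in networks are hypothetical, and the defaults follow commonly used values rather than anything stated in this repository.

```python
import torch


def td3_target(rewards, dones, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        next_actions = target_actor(next_obs)
        # Target policy smoothing: add clipped Gaussian noise to the target action.
        noise = (torch.randn_like(next_actions) * noise_std).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: bootstrap from the smaller of the two estimates.
        q1 = target_q1(next_obs, next_actions)
        q2 = target_q2(next_obs, next_actions)
        return rewards + gamma * (1.0 - dones) * torch.min(q1, q2)


# Stand-in target networks that return constants (assumptions for the example only).
def constant_actor(obs):
    return torch.zeros(obs.shape[0], 1)


def constant_q(value):
    return lambda obs, act: torch.full((obs.shape[0],), value)


target = td3_target(
    rewards=torch.tensor([1.0, 1.0]),
    dones=torch.tensor([0.0, 1.0]),
    next_obs=torch.randn(2, 3),
    target_actor=constant_actor,
    target_q1=constant_q(2.0),
    target_q2=constant_q(1.0),
    gamma=0.5,
    noise_std=0.0,
)
```

The third ingredient, delayed policy updates, lives in the training loop rather than in the target: the actor and the target networks are updated only once every few critic updates.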