ProximialPolicyOptimization

Neo-X edited this page Jan 16, 2018 · 1 revision

Intro

Proximal Policy Optimization (PPO) is a new method for stochastic policy optimization. It works similarly to TRPO, but it uses stochastic gradient descent instead of the conjugate gradient method that TRPO relies on (which requires computing the gradient of the policy gradient, i.e. second-order derivative information).
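
This page does not spell out the objective that PPO optimizes, so the following is only a minimal sketch. It assumes the clipped surrogate variant of PPO, with a clipping range of 0.2 as an illustrative default. The function name `ppo_clip_objective` and its parameters are hypothetical and are not taken from this repository. Because the objective needs only first-order gradients, it can be maximized with ordinary stochastic gradient steps.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new   -- log-probabilities of the taken actions under the current policy
    logp_old   -- log-probabilities of the same actions under the data-collecting policy
    advantages -- advantage estimates for those actions
    clip_eps   -- assumed clipping range; 0.2 is only an illustrative default
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum removes any incentive to move the ratio
        # further outside the clipping range.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

When the current and old policies agree, every ratio is 1 and the objective reduces to the mean advantage. The clipping only takes effect once the policy has moved away from the one that collected the data, which is roughly the role the trust-region constraint plays in TRPO.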

Getting it working
