ProximialPolicyOptimization

Intro

Proximal Policy Optimization (PPO) is a method for stochastic policy optimization. It is similar to TRPO, but it uses stochastic gradient descent instead of the conjugate gradient method that TRPO relies on, which requires computing second-order derivatives (a gradient of a gradient).
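To make the first-order idea concrete, here is a minimal sketch of the clipped surrogate objective that PPO is commonly trained with. It uses only plain Python. The function name `clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration and are not taken from this wiki's code.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Because this objective only needs first-order gradients with respect to the policy parameters, it can be handed to an ordinary stochastic gradient optimizer, which is the practical difference from TRPO's constrained update.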
