
DeepDeterministicPolicyGradient

Neo-X edited this page Jan 16, 2018 · 1 revision

Intro

Deep Deterministic Policy Gradient (DDPG) is a recent RL method that learns a policy by passing gradients from the critic directly to the actor.
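As a rough sketch of what "passing gradients from the critic to the actor" means, the deterministic policy gradient is usually written as below. The symbols are assumptions that do not appear on this page: \mu_\theta is the actor with parameters \theta, Q is the critic, s is a state and a is an action.

```latex
\nabla_\theta J(\theta) \approx
  \mathbb{E}_{s}\!\left[
    \nabla_a Q(s, a)\big|_{a = \mu_\theta(s)} \,
    \nabla_\theta \mu_\theta(s)
  \right]
```

The first factor is the critic's gradient with respect to its action input, and the second is the actor's gradient with respect to its own parameters, so the actor is updated by chaining the two.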

Getting it working

  1. Needed to reduce the learning rate on the actor by a factor of 10. It is now 0.00001.
  2. The networks operate independently. I compute the gradient of the critic with respect to its inputs and then backprop those grads through the policy.
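The second point can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the code used here: the linear actor, the hand-written critic, and every function name are hypothetical, and the critic's gradient is written by hand where a real implementation would get it from autodiff. The step size is an illustrative value chosen so the toy problem converges quickly, not a recommended setting.

```python
# Toy sketch of the actor update: take dQ/da at the critic's input,
# then push it back through the policy with the chain rule.
# Everything below is hypothetical and only for illustration.

def actor(weights, state):
    """Deterministic linear policy: a = w . s (scalar action)."""
    return sum(w * s for w, s in zip(weights, state))

def critic(state, action):
    """Fixed toy critic: Q(s, a) = -(a - sum(s))^2, largest when a == sum(s)."""
    return -(action - sum(state)) ** 2

def critic_action_grad(state, action):
    """dQ/da for the toy critic (gradient w.r.t. the critic's action input)."""
    return -2.0 * (action - sum(state))

def actor_update(weights, state, lr):
    """One gradient-ascent step on Q(s, actor(s)) w.r.t. the actor weights only."""
    action = actor(weights, state)
    # Step 1: gradient at the critic's input.
    dq_da = critic_action_grad(state, action)
    # Step 2: backprop through the policy; for a linear actor da/dw_i = s_i.
    grads = [dq_da * s for s in state]
    # The critic is left untouched: only the actor weights change.
    return [w + lr * g for w, g in zip(weights, grads)]

weights = [0.0, 0.0]
state = [1.0, 2.0]
q_before = critic(state, actor(weights, state))
for _ in range(200):
    weights = actor_update(weights, state, lr=0.01)
q_after = critic(state, actor(weights, state))
```

After the loop the actor's action for this state moves toward the action the toy critic scores highest, so `q_after` is larger than `q_before`. The critic itself is never trained in this sketch; in DDPG it would be updated separately with its own learning rate.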