DeepDeterministicPolicyGradient

Intro

Deep Deterministic Policy Gradient (DDPG) is a recent RL method for learning a policy by passing gradients from the critic directly to the actor.
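For reference, the deterministic policy gradient behind this idea is usually written as below. The symbols are notation chosen here, not taken from this wiki: \mu_\theta is the actor, Q_\phi is the critic, and the expectation is over states sampled during training.

```latex
\nabla_\theta J(\theta) \approx \mathbb{E}_{s}\left[ \nabla_a Q_\phi(s, a)\big|_{a=\mu_\theta(s)} \, \nabla_\theta \mu_\theta(s) \right]
```

The first factor is the gradient of the critic with respect to its action input, and the second is the gradient of the actor's output with respect to its parameters.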

Getting it working

I needed to reduce the learning rate on the actor by a factor of 10. It is now 0.00001.
The networks operate independently. I compute the gradient of the critic's output with respect to its inputs and then backpropagate those gradients through the policy.
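The two-step update described above can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the code used in this project: the linear actor, the fixed quadratic stand-in critic, and all function names are hypothetical, and a real critic would be a trained network.

```python
# Toy sketch: hypothetical linear actor and fixed quadratic critic.
ACTOR_LR = 0.00001  # small actor step size (the value 0.00001 appears in the notes above)


def actor_forward(theta, state):
    """Linear policy: action = theta . state."""
    return sum(t * s for t, s in zip(theta, state))


def critic_forward(state, action):
    """Stand-in critic: Q is highest when action == sum(state)."""
    return -(action - sum(state)) ** 2


def critic_action_grad(state, action):
    """dQ/da: gradient of the critic output w.r.t. its action input."""
    return -2.0 * (action - sum(state))


def actor_backward(state, grad_action):
    """Chain rule through the linear policy: dQ/dtheta_i = dQ/da * s_i."""
    return [grad_action * s for s in state]


def actor_update(theta, state, lr=ACTOR_LR):
    """One actor step: the critic supplies dQ/da, the actor backprops it."""
    action = actor_forward(theta, state)
    grad_action = critic_action_grad(state, action)   # step 1: critic side
    grad_theta = actor_backward(state, grad_action)   # step 2: actor side
    # Gradient ascent on Q, since the actor tries to maximize the critic's value.
    return [t + lr * g for t, g in zip(theta, grad_theta)]


if __name__ == "__main__":
    theta = [0.0, 0.0]
    state = [1.0, 2.0]
    for _ in range(1000):
        theta = actor_update(theta, state)
    print(theta, critic_forward(state, actor_forward(theta, state)))
```

The critic and actor only communicate through `grad_action`, which is what lets the two networks be kept separate. In this sketch the critic is held fixed; in practice it is trained alongside the actor with its own learning rate.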
