You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Deep Deterministic Plicy Gradient (DDPG) is a recient RL method for learning a policy by passing gradients from the critic to the actor directly from the critic.
Getting it working
Needed to reduce the learning rate on the actor by a factor of 10. It is not 0.00001
The networks operate independantly. I compute the gradient for the inputs of the critic and then backprop those grads through the policy.