Hello,
I wanted to verify something I found in your code. In the method MADDPGAgentTrainer.update(), the comment next to the following line states that an update is only allowed to occur every 100 steps:
if not t % 100 == 0:  # only update every 100 steps
    return
I could be misreading this, but doesn't this line mean that an update will occur on every step except those where t % 100 == 0?
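
One way to check the reading is to evaluate the condition on its own. The sketch below assumes t is a plain non-negative integer step counter; the helper names should_skip and should_skip_explicit are made up for illustration and are not part of the repository. In Python, `not` binds more loosely than `%` and `==`, so the condition should group as not ((t % 100) == 0), which would make the early return fire on every step that is not a multiple of 100.

```python
def should_skip(t):
    """Mirror the quoted guard: True means update() would return early."""
    return not t % 100 == 0


def should_skip_explicit(t):
    """Same condition with the assumed grouping written out in full."""
    return not ((t % 100) == 0)


# Steps on which the guard does NOT return early, i.e. an update would run.
update_steps = [t for t in range(1, 501) if not should_skip(t)]
print(update_steps)  # [100, 200, 300, 400, 500]

# The implicit and explicit forms agree on every step checked here.
assert all(should_skip(t) == should_skip_explicit(t) for t in range(1000))
```

Under that assumption the comment and the code agree: the update body is only reached when t is a multiple of 100. Writing the guard as `if t % 100 != 0:` would express the same check without relying on operator precedence.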