[Questions] Performance with default DDPG, SAC, TD3, and other confusions #9
Hi @Gaiejj, thanks for your questions and for the great work you have done on OmniSafe. I combine the answers for Q1 and Q3 as follows:

For Q2, what do you mean by using a true random layout for the velocity task? Are the results here obtained with implementations similar to the ones in this repo? Note that my implementations mainly follow the spinningup repo, and they should work well for most non-safe RL tasks given properly selected hyper-parameters. One thing I could imagine causing the disparity is the parameter

Also, I will release a new implementation of CVPO, DDPG-Lag, and SAC-Lag based on Tianshou very soon, in the coming weeks. My new implementations of CVPO and DDPG/SAC are much faster than the current ones; for example, on the CarCircle task CVPO converges within 30 minutes and SAC-Lag within 15 minutes. You may refer to the new implementations for better integration into OmniSafe at that time.
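For reference, here is a minimal sketch of the generic Lagrangian-relaxation update that the "-Lag" variants perform. All names, the learning rate, and the cost limit below are illustrative assumptions, not this repo's actual API:

```python
import torch

# Illustrative sketch of the dual (Lagrange multiplier) update used by
# "-Lag" baselines; names and hyper-parameters here are assumptions.
log_lam = torch.zeros(1, requires_grad=True)       # log-space keeps lambda >= 0
lam_optimizer = torch.optim.Adam([log_lam], lr=3e-4)
cost_limit = 25.0                                  # episodic cost threshold d

def update_multiplier(mean_episode_cost: float) -> float:
    """One step of gradient ascent on the dual variable lambda."""
    lam = log_lam.exp()
    # Minimizing -lambda * (J_c - d) raises lambda whenever J_c > d.
    dual_loss = -(lam * (mean_episode_cost - cost_limit))
    lam_optimizer.zero_grad()
    dual_loss.backward()
    lam_optimizer.step()
    return log_lam.exp().item()

# The policy objective then trades off the reward and cost critics, e.g.
# pi_loss = (-q_reward + log_lam.exp().detach() * q_cost).mean()
```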
Thanks for your prompt reply, which is really insightful. I tried to run your implementation of
Sure. Looking forward to that!
As maintainers of OmniSafe, we are currently working on adding CVPO to our list of supported algorithms. You can find more information about OmniSafe on our homepage at https://github.com/OmniSafeAI/omnisafe. However, during the implementation process we encountered a few challenges that we hope to address with the original author's guidance.
Firstly, we noticed that CVPO sets the random layout to false, as shown in the code at https://github.com/liuzuxin/cvpo-safe-rl/blob/main/envs/safety-gym/safety_gym/envs/engine.py#L118. This implies that the environment lacks randomness: obstacle and goal positions remain unchanged across epochs. We are curious about the reason behind this design choice and how it aligns with the safety objectives of CVPO.
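For context, a hedged sketch of how that flag can be overridden when constructing a Safety Gym `Engine`; the config keys follow the stock Safety Gym defaults, and the robot/task values are placeholders:

```python
from safety_gym.envs.engine import Engine

# Sketch only: re-enabling layout randomization via the Engine config.
# Keys follow stock Safety Gym; robot/task values are placeholders.
config = {
    'robot_base': 'xmls/car.xml',
    'task': 'goal',
    'randomize_layout': True,  # re-sample obstacle/goal placement on each reset
}
env = Engine(config)
obs = env.reset()
```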
Secondly, although the author has implemented Lagrangian versions of SAC, DDPG, and TD3, we ran into problems when testing them in our modified interface with a true random layout. Specifically, when we ran SAC, DDPG, and TD3 on the velocity tasks, we consistently obtained a reward of zero, which differs from the results obtained on other platforms such as Tianshou and Stable-Baselines3. This disparity raises concerns about the efficacy of the Lagrangian versions. We would appreciate insights on how to make them work effectively.
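As a quick sanity check (the environment id below is only a stand-in for the actual velocity task), even a random policy should log a nonzero return; if it does, a constant zero reward from SAC/DDPG/TD3 would point to a wrapper or interface issue rather than the algorithms themselves:

```python
import gym

# Sanity check using the old 4-tuple gym API this codebase targets;
# the env id is a stand-in, not the exact benchmark in question.
env = gym.make('HalfCheetah-v3')
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
print(f'random-policy return: {total_reward:.2f}')
```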
Lastly, we are curious why CVPO changed the maximum episode length from Safety Gym's original 1000 steps to 500 in the experiments. It would be helpful to understand the rationale behind this decision.
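For illustration, both horizons can be expressed through the Engine config, assuming the stock `num_steps` key; the robot/task values are again placeholders:

```python
from safety_gym.envs.engine import Engine

# Sketch: the horizon is itself an Engine config entry (assuming the
# stock 'num_steps' key); 1000 is the Safety Gym default, 500 the
# setting used in the CVPO experiments.
base = {'robot_base': 'xmls/car.xml', 'task': 'goal'}
env_cvpo    = Engine({**base, 'num_steps': 500})   # CVPO experiment horizon
env_default = Engine({**base, 'num_steps': 1000})  # original Safety Gym horizon
```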
Overall, we seek clarification from the original author to avoid any potential misunderstandings and to ensure the successful integration of CVPO into OmniSafe.