Hi, I am currently working on reproducing the results presented in Fig. 12 for the ICM method and have encountered some challenges. According to Fig. 12, the ICM reward appears to converge to roughly 10 (or slightly above) after 1e7 steps. However, when running the notebook provided at https://github.com/RLE-Foundation/RLeXplore/blob/main/1%20rlexplore_with_rllte.ipynb with intrinsic rewards only (instead of the combined intrinsic and extrinsic rewards), I observed a reward of 30 at 5e6 steps.
Specifically, I changed `self.storage.rewards += intrinsic_rewards.to(self.device)` to `self.storage.rewards = intrinsic_rewards.to(self.device)` at https://github.com/RLE-Foundation/rllte/blob/eeefdedb2ceee3ae1abfe88896cae3b8b62b4c05/rllte/common/prototype/on_policy_agent.py#L168, as sketched below.
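For clarity, here is a minimal sketch of that one-line modification; the surrounding context (class and method names) is omitted and should be taken from the linked commit rather than from this snippet:

```python
# Inside the reward-shaping step of on_policy_agent.py (see the linked
# commit for the exact surrounding code).

# Original behavior: intrinsic rewards are ADDED to the extrinsic rewards
# already stored in the rollout buffer:
# self.storage.rewards += intrinsic_rewards.to(self.device)

# My change: OVERWRITE the stored rewards, so the agent is trained on the
# intrinsic signal alone and the extrinsic rewards are discarded:
self.storage.rewards = intrinsic_rewards.to(self.device)
```

In other words, the only difference is replacing `+=` with `=`, which drops the extrinsic rewards from the optimization target while leaving the rest of the training loop untouched.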
This discrepancy has led me to question whether my understanding of Fig. 12 is correct. Could you kindly clarify the methodology or provide guidance on how to replicate the ICM results as depicted in Fig. 12?
Thank you for your time and for making such valuable resources available to the community. I appreciate any insights or suggestions you may offer. Have a nice day.