Hi, first of all, thank you for providing these implementations to the community.
I have a few questions about your NGU implementation. The original work uses two networks: a randomly initialized, fixed network as in RND, and an embedding network, which together are used to compute the exploration rewards. The idea of the embedding network is to represent states in the episodic memory and to use those representations later to compute intrinsic rewards. The embedding network is also trained at each iteration on state-action pairs (s, a), with batches sampled from the replay buffer.
My questions are:
How does this implementation handle the episodic memory and the training of the embedding network? If I understood your implementation correctly, you assume that the buffer (either replay or rollout) is the episodic memory and use it to embed states.
Meanwhile, the embedding network is used to calculate the intrinsic rewards, while a predictor network is the one that is trained and used for the RND rewards. I didn't understand this part very well. Could you elaborate on this point, please?
Hi! The key insight of NGU is to combine episodic state novelty and life-long state novelty (a sketch of how the two are combined follows this list):
Episodic state novelty, for maximizing intra-episode exploration;
Life-long state novelty, for maximizing inter-episode exploration.
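A minimal sketch of the combination described in the NGU paper, with illustrative names (`combine_novelty`, `alpha_rnd`) rather than the actual rllte API: the episodic bonus is multiplied by the RND-based life-long modulator, clipped to the range [1, L].

```python
def combine_novelty(r_episodic: float, alpha_rnd: float, L: float = 5.0) -> float:
    """r_t = r_episodic * min(max(alpha_rnd, 1), L); the NGU paper uses L = 5."""
    return r_episodic * min(max(alpha_rnd, 1.0), L)
```

Clipping the modulator at 1 means the life-long signal can only amplify the episodic bonus, never suppress it below its base value.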
For the RND part, we follow the original design of RND. But for the episodic part, we use a random and fixed encoder to generate representations, which is inspired by:
Seo Y., Chen L., Shin J., et al. "State Entropy Maximization with Random Encoders for Efficient Exploration." International Conference on Machine Learning, PMLR, 2021: 9443-9454.
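For reference, here is a hedged sketch of the RND life-long signal, with illustrative class names rather than the exact rllte modules: a trainable predictor tries to match a fixed, randomly initialized target network, and the per-state prediction error serves as the life-long novelty modulator.

```python
import torch
import torch.nn as nn

class RNDSketch(nn.Module):
    """Illustrative RND module: only the predictor is trained; the target stays fixed."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)  # target network is never updated

    def novelty(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-sample squared prediction error; training the predictor on this same
        # quantity makes frequently visited states less novel over the agent's lifetime.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```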
Since we only need the representations to perform the pseudo-count, and the memory is erased at the start of each episode, the choice of embedding method may not be that critical (see the sketch of the episodic bonus below):
A fixed encoder provides fixed representations and maintains a stable reward space;
It is more efficient and easier to train.
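To make that concrete, here is a simplified sketch, assuming a random fixed encoder and the k-nearest-neighbour pseudo-count from the NGU paper; the names (`RandomEncoder`, `episodic_bonus`) are illustrative and not the actual rllte implementation, and the distance normalization is reduced to its essentials.

```python
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Randomly initialized and never trained; provides fixed state embeddings."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def episodic_bonus(embedding: torch.Tensor, memory: list, k: int = 10, eps: float = 1e-3) -> float:
    """Inverse-kernel pseudo-count over the k nearest neighbours in the episodic memory."""
    if len(memory) == 0:
        return 1.0
    mem = torch.stack(memory)                       # (N, latent_dim)
    d2 = torch.sum((mem - embedding) ** 2, dim=1)   # squared distances to stored embeddings
    knn_d2, _ = torch.topk(d2, k=min(k, len(memory)), largest=False)
    knn_d2 = knn_d2 / (knn_d2.mean() + 1e-8)        # simplified distance normalization
    kernel = eps / (knn_d2 + eps)                   # inverse kernel, as in NGU
    return (1.0 / (kernel.sum().sqrt() + 1e-8)).item()

# Usage sketch: clear the memory at the start of every episode, embed each new
# observation with the fixed encoder, compute the bonus, then append the embedding.
```

Because the encoder is never updated, the same state always maps to the same embedding, so the pseudo-count is not chasing a moving target within or across episodes.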
Anyway, you can follow the original implementation or create a new one, depending on your task.