Hi, first of all, thank you for providing these implementations to the community.
I have a few questions about your NGU implementation. The original work uses two networks: a randomly initialized, fixed network as in RND, and an embedding network, which together are used to compute the exploration rewards. The idea of the embedding network is to represent states in the episodic memory and to use those representations later to compute intrinsic rewards. The embedding network is also trained at each iteration on state-action pairs (s, a), with batches sampled from the replay buffer.
My questions are:
How does this implementation handle the episodic memory and the training of the embedding network? If I understood your implementation correctly, you assume that the buffer (either replay or rollout) is the episodic memory and use it to embed states.
Meanwhile, the embedding network is used to calculate the intrinsic rewards, while a predictor network is the one that is trained and used for the RND rewards. I didn't understand this part very well. Could you elaborate on this point, please?
Hi! The key insight of NGU is to combine episodic state novelty and life-long state novelty (a sketch of how the two are combined follows this list):
Episodic state novelty, for maximizing intra-episode exploration;
Life-long state novelty, for maximizing inter-episode exploration.
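A minimal sketch of the combination described in the NGU paper, with illustrative names (`combine_novelty`, `alpha_rnd`) rather than the actual rllte API: the episodic bonus is multiplied by the RND-based life-long modulator, clipped to the range [1, L].

```python
def combine_novelty(r_episodic: float, alpha_rnd: float, L: float = 5.0) -> float:
    """r_t = r_episodic * min(max(alpha_rnd, 1), L); the NGU paper uses L = 5."""
    return r_episodic * min(max(alpha_rnd, 1.0), L)
```

Clipping the modulator at 1 means the life-long signal can only amplify the episodic bonus, never suppress it below its base value.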
For the RND part, we follow the original design of RND. But for the episodic part, we use a random and fixed encoder to generate representations, which is inspired by:
Seo Y., Chen L., Shin J., et al. "State Entropy Maximization with Random Encoders for Efficient Exploration." International Conference on Machine Learning, PMLR, 2021: 9443-9454.
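For reference, here is a hedged sketch of the RND life-long signal, with illustrative class names rather than the exact rllte modules: a trainable predictor tries to match a fixed, randomly initialized target network, and the per-state prediction error serves as the life-long novelty modulator.

```python
import torch
import torch.nn as nn

class RNDSketch(nn.Module):
    """Illustrative RND module: only the predictor is trained; the target stays fixed."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)  # target network is never updated

    def novelty(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-sample squared prediction error; training the predictor on this same
        # quantity makes frequently visited states less novel over the agent's lifetime.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```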
Since we only need the representations to perform the pseudo-count, and the memory is erased at the start of each episode, the choice of embedding method may not be that critical (see the sketch of the episodic bonus below):
A fixed encoder provides fixed representations and maintains a stable reward space;
It is more efficient and easier to train.
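To make that concrete, here is a simplified sketch, assuming a random fixed encoder and the k-nearest-neighbour pseudo-count from the NGU paper; the names (`RandomEncoder`, `episodic_bonus`) are illustrative and not the actual rllte implementation, and the distance normalization is reduced to its essentials.

```python
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    """Randomly initialized and never trained; provides fixed state embeddings."""
    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def episodic_bonus(embedding: torch.Tensor, memory: list, k: int = 10, eps: float = 1e-3) -> float:
    """Inverse-kernel pseudo-count over the k nearest neighbours in the episodic memory."""
    if len(memory) == 0:
        return 1.0
    mem = torch.stack(memory)                       # (N, latent_dim)
    d2 = torch.sum((mem - embedding) ** 2, dim=1)   # squared distances to stored embeddings
    knn_d2, _ = torch.topk(d2, k=min(k, len(memory)), largest=False)
    knn_d2 = knn_d2 / (knn_d2.mean() + 1e-8)        # simplified distance normalization
    kernel = eps / (knn_d2 + eps)                   # inverse kernel, as in NGU
    return (1.0 / (kernel.sum().sqrt() + 1e-8)).item()

# Usage sketch: clear the memory at the start of every episode, embed each new
# observation with the fixed encoder, compute the bonus, then append the embedding.
```

Because the encoder is never updated, the same state always maps to the same embedding, so the pseudo-count is not chasing a moving target within or across episodes.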
Anyway, you can follow the original implementation or create a new one, depending on your task.