This repository was archived by the owner on Oct 31, 2023. It is now read-only.
Add Efficient Numpy Replay Buffer #17
Open
The default replay buffer requires very high RAM and causes frequent crashes due to PyTorch's data-loader memory-leak issue. I have therefore re-implemented DrQv2's replay buffer entirely in NumPy; it takes only about 20 GB of RAM to store all 1,000,000 transitions. Moreover, with this implementation there is no need to wait for a trajectory to complete before adding a new transition to the memory used for sampling.
FPS of this NumPy implementation appears to be identical (perhaps very slightly higher) on all machines I have tested it on. It could also yield (very minimal) performance gains, since the agent can now sample replay transitions from its latest trajectory.
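For illustration, here is a minimal sketch of a preallocated circular replay buffer in NumPy. The class name, signatures, and stored fields are hypothetical and chosen for clarity; they are not the PR's actual code. It shows the two properties described above: fixed memory footprint (arrays allocated once up front) and immediate availability of new transitions for sampling.

```python
import numpy as np


class NumpyReplayBuffer:
    """Circular replay buffer backed by preallocated NumPy arrays.

    Memory is allocated once at construction, so the footprint is fixed.
    Transitions can be sampled immediately after insertion; there is no
    need to wait for a trajectory to finish.
    """

    def __init__(self, capacity, obs_shape, action_shape, obs_dtype=np.uint8):
        self.capacity = capacity
        # Preallocate all storage up front (uint8 observations keep RAM low
        # for image-based tasks).
        self.obs = np.empty((capacity, *obs_shape), dtype=obs_dtype)
        self.next_obs = np.empty((capacity, *obs_shape), dtype=obs_dtype)
        self.actions = np.empty((capacity, *action_shape), dtype=np.float32)
        self.rewards = np.empty((capacity, 1), dtype=np.float32)
        self.discounts = np.empty((capacity, 1), dtype=np.float32)
        self.idx = 0
        self.full = False

    def __len__(self):
        return self.capacity if self.full else self.idx

    def add(self, obs, action, reward, discount, next_obs):
        """Insert one transition, overwriting the oldest when full."""
        self.obs[self.idx] = obs
        self.actions[self.idx] = action
        self.rewards[self.idx] = reward
        self.discounts[self.idx] = discount
        self.next_obs[self.idx] = next_obs
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size):
        """Uniformly sample a batch of stored transitions."""
        idxs = np.random.randint(0, len(self), size=batch_size)
        return (self.obs[idxs], self.actions[idxs], self.rewards[idxs],
                self.discounts[idxs], self.next_obs[idxs])
```

Because writes go straight into the shared arrays, the most recent transition is sampleable on the very next `sample` call, which is where the potential (minimal) performance gain mentioned above would come from.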
I have kept the original dataloader-based replay buffer as the default. The new NumPy buffer can be enabled by running `train.py` with the `replay_buffer=numpy` option.
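Assuming the repository's usual Hydra-style command-line overrides, the invocation would look like this (the exact override syntax is an assumption based on the option name given above):

```shell
# Train with the NumPy replay buffer instead of the default dataloader one
python train.py replay_buffer=numpy
```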