Conversation

@Edilmo Edilmo commented Oct 1, 2020

Why are these changes needed?

Currently, for recurrent/recursive models, the state is only available for policy evaluation during training; it is not included in the SampleBatch, hence is not accessible at the execution-plan level, which in turn means it is not present in the replay buffer. As a result, APEX-like algorithms cannot currently use memory models in RLlib.

This PR takes a first step toward supporting memory for these kinds of algorithms.
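To illustrate the gap, here is a minimal sketch of a replay buffer that keeps the recurrent state alongside each stored transition, so the state is still available when batches are replayed. Plain dicts stand in for RLlib's SampleBatch; the `"state_in_0"` key follows RLlib's naming convention for the first RNN state tensor, but the `StatefulReplayBuffer` class itself is hypothetical, not part of this PR or of RLlib.

```python
import random
import numpy as np

class StatefulReplayBuffer:
    """Hypothetical FIFO replay buffer that stores whole batches,
    including any recurrent state they carry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []

    def add(self, batch):
        # `batch` carries obs/actions/rewards plus the RNN state that
        # produced them; storing the dict whole preserves that state.
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # evict oldest
        self.storage.append(batch)

    def sample(self, n):
        return random.sample(self.storage, n)


buffer = StatefulReplayBuffer(capacity=100)
for t in range(5):
    buffer.add({
        "obs": np.zeros(4),
        "actions": 0,
        "rewards": 1.0,
        "state_in_0": np.zeros(8),  # recurrent state at this step
    })

# The replayed batches still contain the recurrent state, so a
# memory-based policy could be trained from them.
replayed = buffer.sample(2)
assert all("state_in_0" in b for b in replayed)
```

Without the `"state_in_0"` entry in the stored batches, a replayed sample would have no way to reconstruct the RNN state that existed when the transition was collected, which is exactly the limitation described above.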

Related issue number

None

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/latest/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failure rates at https://ray-travis-tracker.herokuapp.com/.
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested (please justify below)

@Edilmo Edilmo force-pushed the edpalenc/replay-state branch from 962ad27 to 5d19589 on October 6, 2020.