Explicit distribution of transition and reward #534

Hi @pkel ,

I think the best way to handle this is to implement gen alongside the explicit transition and reward functions, each taking (s, a) arguments. You can implement them side by side; you just have to make sure that transition and reward stay consistent with gen, in particular that reward returns the expectation of the reward that gen samples (this consistency burden is why implementing both is usually not recommended). Then gen will be used in simulations, and transition and reward will be used in solvers that need explicit representations.
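To spell out what "consistent" means here: writing T(s' | s, a) for the distribution returned by transition, R(s, a) for the value returned by reward, and r for the reward sampled by gen (this notation is assumed for the note, not taken from the library docs), the condition is roughly:

```math
R(s, a) = \mathbb{E}\left[ r \mid s, a \right] = \sum_{s'} T(s' \mid s, a)\, \mathbb{E}\left[ r \mid s, a, s' \right]
```

The next states sampled by gen should likewise follow T(s' | s, a). If the sampled reward depends on the next state, the sum over s' is what turns it into a function of (s, a) only.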

To keep the two consistent, you could pre-compute the transition probabilities and expected rewards for every state-action pair using your current transition_and_reward function.
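A minimal sketch of that pre-computation in plain Julia, with no POMDPs.jl calls. The two-state model, the (probability, next_state, reward) return format of `transition_and_reward`, and the helpers `precompute` and `sample_outcome` are all assumptions made for illustration; your actual function will likely look different.

```julia
using Random

# Hypothetical stand-in for the existing function: every possible outcome of
# taking action `a` in state `s`, as (probability, next_state, reward) tuples.
function transition_and_reward(s::Int, a::Int)
    if a == 1                       # "stay": deterministic, zero reward
        return [(1.0, s, 0.0)]
    else                            # "move": reaches the other state w.p. 0.75
        other = 3 - s               # states are 1 and 2
        return [(0.75, other, 1.0), (0.25, s, -1.0)]
    end
end

# Tabulate next-state probabilities and expected rewards for every (s, a) pair.
function precompute(states, actions)
    T = Dict{Tuple{Int,Int},Dict{Int,Float64}}()
    R = Dict{Tuple{Int,Int},Float64}()
    for s in states, a in actions
        probs = Dict{Int,Float64}()
        expected_r = 0.0
        for (p, sp, r) in transition_and_reward(s, a)
            probs[sp] = get(probs, sp, 0.0) + p   # merge outcomes sharing sp
            expected_r += p * r
        end
        T[(s, a)] = probs
        R[(s, a)] = expected_r
    end
    return T, R
end

# Sample one (sp, r) outcome from the same outcome list, so the sampler and
# the tabulated expectations cannot drift apart.
function sample_outcome(s, a, rng::AbstractRNG)
    outcomes = transition_and_reward(s, a)
    u = rand(rng)
    acc = 0.0
    for (p, sp, r) in outcomes
        acc += p
        u <= acc && return (sp = sp, r = r)
    end
    _, sp, r = outcomes[end]        # guard against floating-point round-off
    return (sp = sp, r = r)
end

T, R = precompute(1:2, 1:2)
```

The tables `T` and `R` could then back your transition and reward methods (wrapping each `probs` entry in whatever distribution type your solver expects), while something like `sample_outcome` could back gen. Because all three read from the same outcome list, the expectation returned by reward matches what gen samples by construction.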

Answer selected by pkel