Explicit distribution of transition and reward #534
I'm trying to implement the Selfish Mining problem defined here: https://arxiv.org/abs/1507.06183. The problem has explicit (`SparseCat`) state transitions and defines the reward for each transition. I have a function of the following form:

```julia
transition_and_reward = function (s, a)
    return SparseCat([(sp_0, rew_0), ..., (sp_n, rew_n)], [prob_0, ..., prob_n])
end
```

How would you implement this in POMDPs.jl? After reading the docs, it seems I have to choose from the following:
Each has its own shortcomings. Do you have a better solution?

Discussion #509 seems to be related, but I do not understand the proposed solution. What is "a compositional approach"? Does it solve my problem?

Bonus questions:
Replies: 1 comment 1 reply
Hi @pkel,

I think the best way to handle this is to implement both the `gen` function and `transition` with `(s, a)` arguments. You can implement both simultaneously; you just have to make sure that `reward` returns an expectation consistent with `gen` (which is why implementing both separately is usually not recommended). Then `gen` will be used in simulations, and `transition` and `reward` will be used in solvers that need explicit representations.
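A minimal sketch of that pattern, assuming a hypothetical `SelfishMiningMDP <: MDP{Int,Int}` and a placeholder joint distribution standing in for the real `transition_and_reward` (the numbers below are illustrative, not the actual selfish-mining dynamics):

```julia
using POMDPs
using POMDPTools: SparseCat, weighted_iterator
using Random: AbstractRNG

# Hypothetical problem type; a real model would carry its parameters here.
struct SelfishMiningMDP <: MDP{Int,Int} end

# Placeholder joint distribution over (next state, reward) pairs, standing in
# for the transition_and_reward function from the question.
transition_and_reward(m::SelfishMiningMDP, s, a) =
    SparseCat([(s + 1, 1.0), (0, 0.0)], [0.7, 0.3])

# gen: sample one (sp, r) pair from the joint; used by simulators.
function POMDPs.gen(m::SelfishMiningMDP, s, a, rng::AbstractRNG)
    sp, r = rand(rng, transition_and_reward(m, s, a))
    return (sp=sp, r=r)
end

# transition: marginalize the joint over rewards; used by explicit solvers.
function POMDPs.transition(m::SelfishMiningMDP, s, a)
    tp = Dict{Int,Float64}()
    for ((sp, _), p) in weighted_iterator(transition_and_reward(m, s, a))
        tp[sp] = get(tp, sp, 0.0) + p
    end
    return SparseCat(collect(keys(tp)), collect(values(tp)))
end

# reward: the expectation of r under the joint, so it agrees with gen.
POMDPs.reward(m::SelfishMiningMDP, s, a) =
    sum(p * r for ((_, r), p) in weighted_iterator(transition_and_reward(m, s, a)))
```

Note that if a solver also calls the four-argument `reward(m, s, a, sp)`, it would have to return the conditional expectation of the reward given `sp` to stay consistent with `transition`.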
To ensure reliability, you could pre-compute transition probabilities and reward expectations for all states using your current `transition_and_reward` function.
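For a small discrete state and action space, that pre-computation could look something like the sketch below; `joint` stands for your `transition_and_reward`, and `Int` states and actions are assumed:

```julia
using POMDPTools: SparseCat, weighted_iterator

# Tabulate the marginal transition distribution and the expected reward once
# for every (s, a) pair; joint(s, a) is assumed to return a SparseCat over
# (sp, r) tuples, as in the question.
function precompute(joint, state_space, action_space)
    T = Dict{Tuple{Int,Int},SparseCat{Vector{Int},Vector{Float64}}}()
    R = Dict{Tuple{Int,Int},Float64}()
    for s in state_space, a in action_space
        tp = Dict{Int,Float64}()
        er = 0.0
        for ((sp, r), p) in weighted_iterator(joint(s, a))
            tp[sp] = get(tp, sp, 0.0) + p   # marginal P(sp | s, a)
            er += p * r                     # E[r | s, a]
        end
        T[(s, a)] = SparseCat(collect(keys(tp)), collect(values(tp)))
        R[(s, a)] = er
    end
    return T, R
end
```

With the tables in hand, `transition` and `reward` become dictionary lookups, while `gen` can keep sampling from the original joint distribution, so the explicit and generative representations cannot drift apart.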