Explicit distribution of transition and reward #534
I'm trying to implement the Selfish Mining problem defined here: https://arxiv.org/abs/1507.06183. The problem has explicit (`SparseCat`) state transitions and defines the reward for each transition. I have a function of the following form:

```julia
transition_and_reward = function (s, a)
    return SparseCat([(sp_0, rew_0), ..., (sp_n, rew_n)], [prob_0, ..., prob_n])
end
```

How would you implement this in POMDPs.jl? After reading the docs, it seems I have to choose from the following:
Each has its own shortcomings. Do you have a better solution?

Discussion #509 seems to be related, but I do not understand the proposed solution. What is "a compositional approach"? Does it solve my problem?

Bonus questions:
Replies: 1 comment 1 reply
Hi @pkel,

I think the best way to handle this is to implement both the `gen` function and `transition` with `(s, a)` arguments. You can implement both simultaneously; you just have to make sure that `reward` returns an expectation consistent with `gen` (which is why implementing both separately is usually not recommended). Then `gen` will be used in simulations, and `transition` and `reward` will be used in solvers that need explicit representations.
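A minimal sketch of that pattern, assuming a hypothetical `SelfishMiningMDP <: MDP{Int,Int}` and a placeholder joint distribution standing in for the real `transition_and_reward` (the numbers below are illustrative, not the actual selfish-mining dynamics):

```julia
using POMDPs
using POMDPTools: SparseCat, weighted_iterator
using Random: AbstractRNG

# Hypothetical problem type; a real model would carry its parameters here.
struct SelfishMiningMDP <: MDP{Int,Int} end

# Placeholder joint distribution over (next state, reward) pairs, standing in
# for the transition_and_reward function from the question.
transition_and_reward(m::SelfishMiningMDP, s, a) =
    SparseCat([(s + 1, 1.0), (0, 0.0)], [0.7, 0.3])

# gen: sample one (sp, r) pair from the joint; used by simulators.
function POMDPs.gen(m::SelfishMiningMDP, s, a, rng::AbstractRNG)
    sp, r = rand(rng, transition_and_reward(m, s, a))
    return (sp=sp, r=r)
end

# transition: marginalize the joint over rewards; used by explicit solvers.
function POMDPs.transition(m::SelfishMiningMDP, s, a)
    tp = Dict{Int,Float64}()
    for ((sp, _), p) in weighted_iterator(transition_and_reward(m, s, a))
        tp[sp] = get(tp, sp, 0.0) + p
    end
    return SparseCat(collect(keys(tp)), collect(values(tp)))
end

# reward: the expectation of r under the joint, so it agrees with gen.
POMDPs.reward(m::SelfishMiningMDP, s, a) =
    sum(p * r for ((_, r), p) in weighted_iterator(transition_and_reward(m, s, a)))
```

Note that if a solver also calls the four-argument `reward(m, s, a, sp)`, it would have to return the conditional expectation of the reward given `sp` to stay consistent with `transition`.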
To ensure reliability, you could pre-compute transition probabilities and reward expectations for all states using your current `transition_and_reward` function.
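For a small discrete state and action space, that pre-computation could look something like the sketch below; `joint` stands for your `transition_and_reward`, and `Int` states and actions are assumed:

```julia
using POMDPTools: SparseCat, weighted_iterator

# Tabulate the marginal transition distribution and the expected reward once
# for every (s, a) pair; joint(s, a) is assumed to return a SparseCat over
# (sp, r) tuples, as in the question.
function precompute(joint, state_space, action_space)
    T = Dict{Tuple{Int,Int},SparseCat{Vector{Int},Vector{Float64}}}()
    R = Dict{Tuple{Int,Int},Float64}()
    for s in state_space, a in action_space
        tp = Dict{Int,Float64}()
        er = 0.0
        for ((sp, r), p) in weighted_iterator(joint(s, a))
            tp[sp] = get(tp, sp, 0.0) + p   # marginal P(sp | s, a)
            er += p * r                     # E[r | s, a]
        end
        T[(s, a)] = SparseCat(collect(keys(tp)), collect(values(tp)))
        R[(s, a)] = er
    end
    return T, R
end
```

With the tables in hand, `transition` and `reward` become dictionary lookups, while `gen` can keep sampling from the original joint distribution, so the explicit and generative representations cannot drift apart.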