QuickPOMDP python interface #513

JuliBaCSE · 2023-07-30T14:06:01Z

JuliBaCSE
Jul 30, 2023

Hi,
i am currently stuck at implementing a problem with continuous space. I want to use my python implementation of a Extended Kaiman Filter to compute the next state and the prediction and the update. For now I just want to get simple example running using the POMCPOW algorithm which doesn't work for me. I would be really glad if someone could help me out or give we guidance how to define the problem given:

EKF predict equation
EKF update equation
observations from the system (e.g. true dynamics but only parts of states observable)

import numpy as np
from julia.api import Julia
jl = Julia(compiled_modules=False)

from quickpomdps import QuickPOMDP

from julia import Pkg
Pkg.add(["POMDPs", "POMDPTools", "Distributions", "QMDP","POMCPOW",'CommonRLSpaces'])

from julia.CommonRLSpaces import product,ArraySpace

from julia.Main import Float64
from julia.POMDPs import solve, pdf,action
from julia.QMDP import QMDPSolver
from julia.POMCPOW import POMCPOWSolver
from julia.POMDPTools import stepthrough, alphavectors, Uniform, Deterministic
from julia.Distributions import Normal,AbstractMvNormal,MvNormal

goal = [3,3,0]
def reward(s, a, sp):
    if s[0] == goal[0] and s[1] == goal[1]:
        return 5.0
    else:
        return -1.0

def transition(s, a, dt=0.1):
    print(s)
    s[0] = s[0] + a[0] * np.sin(s[2]) * dt
    s[1] = s[1] + a[0] * np.sin(s[2]) * dt
    return MvNormal(s, [0.1,0.1,0.1])


def observation(s, a, sp):
    # sp is next state
    return MvNormal(sp, [0.001,0.001,0.001])

def reward(s, a, sp):
    if s[0] == goal[0] and s[1] == goal[1]:
        return 5.0
    else:
        return -1.0

m = QuickPOMDP(
    states = product([-5,5],[-5,5],[-3,3]),
    actions = product([-5,5],[-5,5]),
    observations=product([-5,5],[-5,5],[-3,3]),
    discount = 0.9,
    isterminal = lambda s: (s[0] == goal[0] and s[1] == goal[1]),
    transition = transition,
    observation = observation,
    reward = reward,
    initialstate = Deterministic([0.0,0.0,0.0])
)
solver = POMCPOWSolver()
#solver = QMDPSolver()
policy = solve(solver, m)
action(policy,Deterministic([0.0,0.0,0.0]))

the error message is

RuntimeError: <PyCall.jlwrap (in a Julia function called from Python)
JULIA: MethodError: no method matching POMCPOW.POWNodeBelief(::QuickPOMDPs.QuickPOMDP{UUID("05db1c7f-dfee-4917-8172-ca787314eb46"), Tuple{Int64, Int64, Int64}, Tuple{Int64, Int64}, Tuple{Int64, Int64, Int64}, NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :observation, :actionindex, :transition, :reward, :initialstate), Tuple{Dict{Tuple{Int64, Int64, Int64}, Int64}, PyCall.var"#fn#26"{PyCall.var"#fn#25#27"{PyObject}}, Dict{Tuple{Int64, Int64, Int64}, Int64}, CommonRLSpaces.TupleProduct{Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}}, CommonRLSpaces.TupleProduct{Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}}, Float64, CommonRLSpaces.TupleProduct{Tuple{Vector{Int64}, Vector{Int64}}}, Main.PyQuickPOMDPs.var"#53#observation_pyfunc_closure#10"{PyObject}, Dict{Tuple{Int64, Int64}, Int64}, PyCall.var"#fn#26"{PyCall.var"#fn#25#27"{PyObject}}, Main.PyQuickPOMDPs.var"#85#reward_pyfunc_closure#16"{PyObject}, POMDPTools.POMDPDistributions.Deterministic{Vector{Float64}}}}}, ::Vector{Float64}, ::Tuple{Int64, Int64}, ::Vector{Float64}, ::Vector{Float64}, ::Float64)
Closest candidates are:
  POMCPOW.POWNodeBelief(::POMDPs.POMDP{S, A, O}, !Matched::S, ::A, !Matched::S, !Matched::O, ::Any) where {S, A, O}
   @ POMCPOW ~/.julia/packages/POMCPOW/IdXFf/src/beliefs.jl:15

zsunberg · 2023-07-31T23:03:54Z

zsunberg
Jul 31, 2023
Maintainer

Hi @JuliBaCSE ,

Thanks for asking. It looks like you have most things correct. Thanks for taking the time to dig into the documentation and figure out things for yourself.

It looks like you have slightly misunderstood the construction of the spaces. Using product([-1, 1], [-2,2]) will construct a set that is mathematically written {-1, 1} x {-2, 2}, whereas the set you seem to want to construct is the cartesian product of intervals, [-1, 1] x [-2, 2]. I think the easiest way to do what you want is states = Box([-5, -5, -3], [5, 5, 3]), etc.

There will probably be a few more errors to work through because the combination of python and continuous spaces is not as well tested, but this is definitely the kind of thing we want to support, and I am optimistic that we can get it to work!

0 replies

JuliBaCSE · 2023-08-01T03:51:12Z

JuliBaCSE
Aug 1, 2023
Author

Hi @zsunberg,
thanks a lot. I originally tried Box([-5,5],[-5,5],[-3,3]), which didn't worked so I thought to use the other options.

After your proposed change, I still encounter the same issue for some reason. Do I have to define the observations or transition differently? Or, do I have to define an updater. I can't find anything in the documentation.

Thanks for your help! Maybe we get a minimal example running to add for the python QuickPOMDPs :)

6 replies

zsunberg Aug 1, 2023
Maintainer

Side note: if you use IntervalSets, product(-5.0..5.0, -5.0..5.0, -3.0..3.0) would do what you want, but I think the .. syntax would have trouble in python.

JuliBaCSE Aug 1, 2023
Author

Thanks a lot. I will also try to get a minimal example working.

zsunberg Aug 2, 2023
Maintainer

(I'm working on adding this example as a test for POMCPOW now and fixing it, but I may not finish today)

zsunberg Aug 3, 2023
Maintainer

OK, POMCPOW was fixed for this in this PR: JuliaPOMDP/POMCPOW.jl#37. In POMCPOW v0.3.8, which should be available momentarily, it should work.

Note: The exact version of the code that I checked it with is here: https://gist.github.com/zsunberg/c298d1f79099a624193c3dfe16b93762

You need to make sure not to modify s within transition or you might get errors. (or worse, the solver will silently not work correctly). states should ideally be immutable.

JuliBaCSE Aug 3, 2023
Author

Thanks a lot! I just tested it out and it works fine! By modify you mean overwrite?

JuliBaCSE · 2023-08-03T16:32:23Z

JuliBaCSE
Aug 3, 2023
Author

I now created an example code that works with your update. Only thing is the results are somehow suboptimal. I guess this somehow corresponds to the definition

import numpy as np
from julia.api import Julia

# needed in case of errors
#jl = Julia(compiled_modules=False)

from quickpomdps import QuickPOMDP

from julia import Pkg
Pkg.add(["POMDPs", "POMDPTools", "Distributions", "QMDP","POMCPOW",'CommonRLSpaces'])

from julia.CommonRLSpaces import product,ArraySpace,Box

from julia.Base import clamp,randn
from julia.Main import Float64
from julia.POMDPs import solve, pdf,action,simulate
from julia.QMDP import QMDPSolver
from julia.POMCPOW import POMCPOWSolver, MaxUCB
from julia.POMDPTools import stepthrough, alphavectors, Uniform, Deterministic, HistoryRecorder,RandomPolicy
from julia.Distributions import Normal,AbstractMvNormal,MvNormal,mean,var,cov

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

goal = np.array([3,3,0])
R_pos = np.diag([-100,-100,1])
R_u = np.diag([-1,-1])
def transition(s, a, dt=0.1):
    s0 = s[0] + a[0] * np.cos(s[2]) * dt # check here if needed
    s1 = s[1] + a[0] * np.sin(s[2]) * dt
    s2 = s[2] + a[1] * dt
    return MvNormal([s0,s1,s2],[0.01,0.01,0.01])


def observation(s, a, sp):
    # sp is next state
    return MvNormal(sp, [0.0001,0.0001,0.0001])

def terminal(s,goal):
    return np.isclose(s,goal,atol=0.2).all()

def reward(s, a, sp):
    if terminal(s,goal):
        return 5.0
    else:
        state = np.array(s)
        action = np.array(a)
        return (state-goal).T@R_pos@(state-goal) + action.T@R_u@action



pomdp = QuickPOMDP(
    states = Box([-8,-8,-np.pi],[8,8,np.pi]),
    actions = Box([-0.6,-np.pi/4],[0.6,np.pi/4]),
    observations= Box([-8,-8,-np.pi],[8,8,np.pi]),
    discount = 0.9,
    isterminal = lambda s: terminal(s,goal),
    transition = transition,
    observation = observation,
    reward = reward,
    initialstate = Deterministic(np.array([0,0,0]))
)
solver = POMCPOWSolver(max_time = 1, tree_queries = 15)
#solver = QMDPSolver()
policy = solve(solver, pomdp)

hr = HistoryRecorder(max_steps = 1000)
hist = simulate(hr,pomdp, policy)
rhist = simulate(hr,pomdp,RandomPolicy(pomdp))

# show histories
it = 0
for step in hist:
    print(f'___step:{it}____')
    print(step.s)
    print(step.a)
    print(step.r)
    print('__________')
    it +=1
# show random history
it = 0
for step in rhist:
    print(f'___step:{it}____')
    print(step.s)
    print(step.a)
    print(step.r)
    print('__________')
    it +=1

1 reply

zsunberg Aug 4, 2023
Maintainer

There are a few reasons why POMCPOW might not be performing that well yet.

I don't think the reward function is working like you intended. In POMDPs.jl, if a state is terminal (https://juliapomdp.github.io/POMDPs.jl/latest/def_pomdp/#Terminal-states), a solver can assume that it will not take any more actions or collect any reward. So there is no way the solver can collect the 5.0 reward at the terminal state. I think if terminal(sp, goal): return 5.0 will do what you intended though, because it will collect the reward on the way into the terminal state.
Usually MCTS-based solvers like POMCPOW need some help to find good policies. By default it is using a random rollout policy, which is unlikely to find the reward. I would recommend using estimate_value = FORollout(FunctionPolicy(policy)) where policy is a function that maps the state to a decent action. Also, you will probably have to use a lot more tree_queries :) I hope this is not too slow - it has not gotten much testing with python.

I would recommend trying out some linear policies (e.g. FunctionPolicy(b -> -mean(b))) to test and see if the problem is working the way you intend for it to.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

QuickPOMDP python interface #513

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 3 comments 7 replies

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

QuickPOMDP python interface #513

JuliBaCSE Jul 30, 2023

Replies: 3 comments · 7 replies

zsunberg Jul 31, 2023 Maintainer

JuliBaCSE Aug 1, 2023 Author

zsunberg Aug 1, 2023 Maintainer

JuliBaCSE Aug 1, 2023 Author

zsunberg Aug 2, 2023 Maintainer

zsunberg Aug 3, 2023 Maintainer

JuliBaCSE Aug 3, 2023 Author

JuliBaCSE Aug 3, 2023 Author

zsunberg Aug 4, 2023 Maintainer

JuliBaCSE
Jul 30, 2023

Replies: 3 comments 7 replies

zsunberg
Jul 31, 2023
Maintainer

JuliBaCSE
Aug 1, 2023
Author

zsunberg Aug 1, 2023
Maintainer

JuliBaCSE Aug 1, 2023
Author

zsunberg Aug 2, 2023
Maintainer

zsunberg Aug 3, 2023
Maintainer

JuliBaCSE Aug 3, 2023
Author

JuliBaCSE
Aug 3, 2023
Author

zsunberg Aug 4, 2023
Maintainer