QuickPOMDP python interface #513
-
Hi @JuliBaCSE, thanks for asking. It looks like you have most things correct, and thanks for taking the time to dig into the documentation and figure things out for yourself. It looks like you have slightly misunderstood the construction of the spaces. Using `Box` from `CommonRLSpaces` for the state, action, and observation spaces should fix that. There will probably be a few more errors to work through because the combination of Python and continuous spaces is not as well tested, but this is definitely the kind of thing we want to support, and I am optimistic that we can get it to work!
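A minimal sketch of that construction, assuming the fix was to build the spaces with `Box` from `CommonRLSpaces` (as in the working example later in this thread):

```python
import numpy as np
from julia.CommonRLSpaces import Box

# a 3-dimensional continuous box: lower bounds first, then upper bounds
states = Box([-8, -8, -np.pi], [8, 8, np.pi])
```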
-
Hi @zsunberg, after your proposed change I still encounter the same issue for some reason. Do I have to define the observations or transition differently? Or do I have to define an updater? I can't find anything in the documentation. Thanks for your help! Maybe we can get a minimal example running to add for the Python QuickPOMDPs :)
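In case an explicit belief updater turns out to be needed, a particle filter is the usual choice for continuous problems. A minimal sketch, assuming `ParticleFilters.jl` is used and assuming a `pomdp` and `policy` like the ones in the example below (the package choice and particle count are my additions, not from this thread):

```python
from julia import Pkg
Pkg.add("ParticleFilters")
from julia.ParticleFilters import BootstrapFilter
from julia.POMDPs import simulate
from julia.POMDPTools import HistoryRecorder

up = BootstrapFilter(pomdp, 1000)       # hypothetical: 1000 particles
hr = HistoryRecorder(max_steps=1000)
hist = simulate(hr, pomdp, policy, up)  # pass the updater explicitly
```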
-
I now created an example code that works with your update. The only thing is that the results are somewhat suboptimal. I guess this somehow corresponds to the definition:

```python
import numpy as np

from julia.api import Julia
# needed in case of errors
#jl = Julia(compiled_modules=False)

from quickpomdps import QuickPOMDP
from julia import Pkg
Pkg.add(["POMDPs", "POMDPTools", "Distributions", "QMDP", "POMCPOW", "CommonRLSpaces"])

from julia.CommonRLSpaces import Box
from julia.POMDPs import solve, simulate
from julia.QMDP import QMDPSolver
from julia.POMCPOW import POMCPOWSolver, MaxUCB
from julia.POMDPTools import Deterministic, HistoryRecorder, RandomPolicy
from julia.Distributions import MvNormal
goal = np.array([3.0, 3.0, 0.0])  # target pose (x, y, heading)
R_pos = np.diag([-100, -100, 1])  # quadratic state-cost weights
R_u = np.diag([-1, -1])           # quadratic action-cost weights
def transition(s, a, dt=0.1):
    # unicycle dynamics: a[0] is the forward speed, a[1] the turn rate
    s0 = s[0] + a[0] * np.cos(s[2]) * dt  # check here if needed
    s1 = s[1] + a[0] * np.sin(s[2]) * dt
    s2 = s[2] + a[1] * dt
    # diagonal Gaussian process noise (standard deviations)
    return MvNormal([s0, s1, s2], [0.01, 0.01, 0.01])

def observation(s, a, sp):
    # sp is the next state; observe it with small diagonal Gaussian noise
    return MvNormal(sp, [0.0001, 0.0001, 0.0001])
def terminal(s, goal):
    # terminal when every state component is within 0.2 of the goal
    return np.isclose(s, goal, atol=0.2).all()

def reward(s, a, sp):
    if terminal(s, goal):
        return 5.0  # bonus for reaching the goal
    else:
        state = np.array(s)
        action = np.array(a)
        # quadratic penalty on distance to the goal plus control effort
        return (state - goal).T @ R_pos @ (state - goal) + action.T @ R_u @ action
pomdp = QuickPOMDP(
    states       = Box([-8, -8, -np.pi], [8, 8, np.pi]),
    actions      = Box([-0.6, -np.pi/4], [0.6, np.pi/4]),
    observations = Box([-8, -8, -np.pi], [8, 8, np.pi]),
    discount     = 0.9,
    isterminal   = lambda s: terminal(s, goal),
    transition   = transition,
    observation  = observation,
    reward       = reward,
    initialstate = Deterministic(np.array([0.0, 0.0, 0.0]))  # floats to match the Float64 states
)
solver = POMCPOWSolver(max_time=1, tree_queries=15)  # note: a very small search budget
#solver = QMDPSolver()
policy = solve(solver, pomdp)

hr = HistoryRecorder(max_steps=1000)
hist = simulate(hr, pomdp, policy)
rhist = simulate(hr, pomdp, RandomPolicy(pomdp))
# print the states, actions, and rewards of a simulated history
def print_history(history):
    for it, step in enumerate(history):
        print(f'___step:{it}____')
        print(step.s)
        print(step.a)
        print(step.r)
        print('__________')

# show the POMCPOW history and a random-policy history for comparison
print_history(hist)
print_history(rhist)
```
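The suboptimal results may simply come from the tiny search budget. A possible tuning sketch (the specific numbers here are guesses on my part, not tested values):

```python
# more tree queries and an explicit UCB exploration constant;
# MaxUCB is already imported from julia.POMCPOW above
solver = POMCPOWSolver(criterion=MaxUCB(20.0), tree_queries=1000, max_time=1)
policy = solve(solver, pomdp)
```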
-
Hi,
I am currently stuck at implementing a problem with a continuous space. I want to use my Python implementation of an Extended Kalman Filter to compute the next state, the prediction, and the update. For now I just want to get a simple example running using the POMCPOW algorithm, which doesn't work for me. I would be really glad if someone could help me out or give me guidance on how to define the given problem:
The error message is: