Async Python SDK for the Odyseus API.
The current API has two main modes:
- `single_image_infer(...)` for one image plus a prompt, returning projected 3D targets and reconstruction events.
- `connect_webrtc(...)` / `connect_video_session(...)` for live video over WebRTC, with SLAM events streamed from the runtime.
The default point cloud model is depth-anything.
```bash
python -m venv venv
source ./venv/bin/activate
pip install -e .
```

```python
import asyncio

import odyseus as od


async def main():
    client = od.Odyseus(api_key="sk_your_api_key_here")

    # Callback invoked with each event dict emitted during the request.
    async def on_event(event: dict) -> None:
        print(event.get("type"), event.get("session_id"))

    # Read the image as raw bytes; the file is closed when the block exits.
    with open("frame.jpg", "rb") as f:
        image_bytes = f.read()

    result = await client.single_image_infer(
        image_bytes,
        "find the coffee mug",
        on_event=on_event,
    )
    print(result["session_id"])
    print(result["objective_payload"])


asyncio.run(main())
```

`single_image_infer(...)` uses the existing runtime websocket flow and emits the same event types the backend already sends:
- `single_infer_ready`
- `single_infer_accepted`
- `single_infer_mesh_ready`
- `single_infer_objective`
- `slam_status`
- `slam_binary`
- `error`
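Since `on_event` is called with events of the types listed above, a single callback usually dispatches on the `type` field. The sketch below is a hypothetical handler, not part of the SDK: `EventCollector` and its `handle` method are illustrative names, and it assumes each event is a dict with a `type` key (as the quick-start callback does). Fake events stand in for a real session so the snippet runs on its own; with the SDK you would pass `on_event=collector.handle` to `single_image_infer(...)`.

```python
import asyncio
from collections import defaultdict
from typing import Any


class EventCollector:
    """Hypothetical on_event handler that groups events by their "type" field."""

    def __init__(self) -> None:
        self.by_type: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
        self.errors: list[dict[str, Any]] = []

    async def handle(self, event: dict[str, Any]) -> None:
        event_type = event.get("type", "unknown")
        self.by_type[event_type].append(event)
        if event_type == "error":
            # Keep errors separately so the caller can inspect them after the request returns.
            self.errors.append(event)
        elif event_type == "single_infer_objective":
            print("objective received for session", event.get("session_id"))


async def demo() -> EventCollector:
    collector = EventCollector()
    # Fake events standing in for what the runtime would send during a real request.
    fake_events = [
        {"type": "single_infer_ready", "session_id": "s1"},
        {"type": "single_infer_accepted", "session_id": "s1"},
        {"type": "slam_status", "session_id": "s1"},
        {"type": "slam_status", "session_id": "s1"},
        {"type": "single_infer_objective", "session_id": "s1"},
        {"type": "error", "session_id": "s1"},
    ]
    for event in fake_events:
        await collector.handle(event)
    return collector


collector = asyncio.run(demo())
print({event_type: len(events) for event_type, events in collector.by_type.items()})
```

Passing a bound `async def` method (rather than an object with an async `__call__`) keeps the handler a plain coroutine function, which is the safest shape for any callback API.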
The returned aggregate dict includes:
- `session_id`
- `session`
- `mesh_ready_payload`
- `objective_payload`
- `latest_slam_status`
- `latest_slam_binary`
- `events`
Backend field names are preserved exactly, including:
- `objective_points`
- `objective_world_point`
- `objective_image_point`
- `robot_pose`
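The field names above are stable, but this README does not say where they sit inside `objective_payload`. As a hedged sketch, the hypothetical helpers below (`find_field`, `extract_objective`) search a payload of nested dicts and lists for each name and return `None` for any that are absent, so they work whether the fields are top-level or nested. The example payload is made up for illustration and does not reflect real backend output.

```python
from typing import Any

# Backend field names from the list above; their nesting inside objective_payload is assumed unknown.
OBJECTIVE_FIELDS = ("objective_points", "objective_world_point", "objective_image_point", "robot_pose")


def find_field(payload: Any, key: str) -> Any:
    """Depth-first search for `key` in nested dicts/lists; return its first value or None."""
    if isinstance(payload, dict):
        if key in payload:
            return payload[key]
        children = list(payload.values())
    elif isinstance(payload, list):
        children = payload
    else:
        return None
    for child in children:
        found = find_field(child, key)
        if found is not None:
            return found
    return None


def extract_objective(payload: Any) -> dict[str, Any]:
    """Map each known field name to its value (None when absent)."""
    return {key: find_field(payload, key) for key in OBJECTIVE_FIELDS}


# Made-up payload used only to exercise the lookup; real values come from the backend.
example_payload = {
    "objective_world_point": [1.2, 0.4, 0.8],
    "objective_image_point": [320, 240],
    "data": {"robot_pose": [0.0, 0.0, 0.0]},
}
fields = extract_objective(example_payload)
print(fields["objective_world_point"], fields["objective_points"])
```

With the SDK, the input would be `result["objective_payload"]` from the quick-start example.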
```python
import asyncio

from aiortc import RTCPeerConnection
from aiortc.contrib.media import MediaPlayer

import odyseus as od


async def main():
    client = od.Odyseus(api_key="sk_your_api_key_here")
    # Capture 640x480 video at 15 fps from a V4L2 camera (Linux).
    player = MediaPlayer("/dev/video0", format="v4l2", options={"video_size": "640x480", "framerate": "15"})
    pc = RTCPeerConnection()
    # Wrap the camera track with the SDK's LatestFrameTrack before adding it to the connection.
    track = od.webrtc.LatestFrameTrack(player.video)
    pc.addTrack(track)
    try:
        session = await client.connect_video_session(pc)
        print(session)
        # Print SLAM events for this session as they arrive.
        async for event in client.iter_slam_events(session["session_id"]):
            print(event.get("type"), event.get("status"))
    finally:
        await pc.close()


asyncio.run(main())
```

`connect_video_session(...)` keeps video transport WebRTC-driven. Session inspection uses the existing runtime SLAM websocket and session APIs.
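`iter_slam_events(...)` is consumed with `async for`, and this README does not say when the stream ends. If you only need events until some condition holds, one option is to bound the loop with `asyncio.wait_for`. The sketch below assumes the stream is an async iterator of dicts; `collect_until` and `fake_slam_events` are hypothetical names, and the status values are invented for the demo. With the SDK you would pass `client.iter_slam_events(session_id)` as `events`.

```python
import asyncio
from typing import AsyncIterator, Callable


async def collect_until(
    events: AsyncIterator[dict],
    stop: Callable[[dict], bool],
    timeout: float,
) -> list[dict]:
    """Collect events until stop(event) is true, the stream ends, or `timeout` seconds pass."""
    collected: list[dict] = []

    async def consume() -> None:
        async for event in events:
            collected.append(event)
            if stop(event):
                return

    try:
        await asyncio.wait_for(consume(), timeout)
    except asyncio.TimeoutError:
        pass  # Timed out: return whatever arrived so far.
    finally:
        # Close the stream if it supports it, so any underlying connection can be released.
        aclose = getattr(events, "aclose", None)
        if aclose is not None:
            await aclose()
    return collected


async def fake_slam_events() -> AsyncIterator[dict]:
    # Stand-in for client.iter_slam_events(...); the status values are invented.
    for status in ["initializing", "initializing", "tracking", "tracking"]:
        await asyncio.sleep(0.01)
        yield {"type": "slam_status", "status": status}


events = asyncio.run(
    collect_until(fake_slam_events(), lambda e: e.get("status") == "tracking", timeout=1.0)
)
print([e["status"] for e in events])
```

Closing the iterator explicitly matters because breaking out of an `async for` does not finalize an async generator right away.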
Available helpers:
- `await client.connect_webrtc(pc, unreal_fix=False)`
- `await client.connect_video_session(pc, unreal_fix=False)`
- `await client.get_live_session_state(session_id=None)`
- `await client.get_session_frame(session_id)`
- `client.iter_slam_events(session_id=None)`
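Apart from `iter_slam_events(...)`, these helpers are awaited once per call, so watching session state over time means calling them repeatedly. Below is a minimal fixed-interval polling sketch with a bounded number of attempts. `poll` is a hypothetical helper and the `ready` key in the fake state is made up; what `get_live_session_state(...)` actually returns is not documented here, so the readiness check is yours to define. With the SDK, `fetch` could be `lambda: client.get_live_session_state(session_id)`.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def poll(
    fetch: Callable[[], Awaitable[Any]],
    ready: Callable[[Any], bool],
    interval: float,
    attempts: int,
) -> Any:
    """Await fetch() up to `attempts` times, `interval` seconds apart, until ready(result) is true."""
    for attempt in range(attempts):
        result = await fetch()
        if ready(result):
            return result
        if attempt < attempts - 1:
            await asyncio.sleep(interval)  # No sleep after the final attempt.
    raise TimeoutError(f"state not ready after {attempts} attempts")


calls = {"n": 0}


async def fake_state() -> dict:
    # Stand-in for client.get_live_session_state(...); the "ready" key is invented.
    calls["n"] += 1
    return {"ready": calls["n"] >= 3}


state = asyncio.run(poll(fake_state, lambda s: s["ready"], interval=0.01, attempts=5))
print(state, calls["n"])
```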
`infer(...)` is still present for legacy compatibility, but the new SDK examples use `single_image_infer(...)` for image mode.
Single image:

```bash
python ./examples/raspberry_pi/single_image.py \
  --api-key <YOUR_API_KEY> \
  --prompt "find the chair"
```

Live video:

```bash
python ./examples/raspberry_pi/live_video.py \
  --api-key <YOUR_API_KEY>
```

Install the extra requirements if you want a standalone example environment:
```bash
pip install -r ./examples/unreal_sim/requirements.txt
pip install -e .
```

Single image from the Unreal video stream:
```bash
python ./examples/unreal_sim/single_image.py \
  --api-key <YOUR_API_KEY> \
  --prompt "find the table"
```

Live video relay from Unreal:
```bash
python ./examples/unreal_sim/live_video.py \
  --api-key <YOUR_API_KEY>
```

Single image from a path:
```bash
python ./examples/static/single_image.py \
  --api-key <YOUR_API_KEY> \
  --path ./frame.jpg \
  --prompt "find the doorway"
```

Live video from a path over WebRTC:
```bash
python ./examples/static/live_video.py \
  --api-key <YOUR_API_KEY> \
  --path ./demo.mp4
```

- Live video transport remains WebRTC end to end.
- Single-image mode defaults to `depth-anything` unless you pass `--point-cloud-model`.
- The frontend live prompt / infer flow is separate from this SDK refresh and is not changed here.