This document covers the primary web‑first flows. WhatsApp is documented separately in whatsapp.md.
The web UI can be gated behind TOTP with optional WebAuthn passkeys. Passkeys are enrolled via the /passkey enrol slash command (after signing in with TOTP). The login page attempts passkeys automatically when supported and falls back to the TOTP form if none are available.
- Initial TOTP setup:
/totp→ single card with QR + manual code + confirmation input → secret is committed only after successful confirmation - Secondary-device setup:
/totpwith an existing secret re-displays the same secret in the same single-card flow so another authenticator can be validated without rotating the secret - Reset:
/totp reset <current-code>→ verify current code → single card with new QR/manual code + confirmation input → new secret is committed only after successful confirmation, then existing web sessions are invalidated - Passkey enrolment:
/passkey enrol→ one-time link → WebAuthn registration - Login: passkey first (conditional mediation or first input focus), then TOTP fallback
- Multiple passkeys are supported per user; manage with
/passkey listand/passkey delete - Every TOTP card submission returns explicit feedback describing validation success/failure and any changes performed
The web UI supports steering mid‑response by queuing follow‑ups while streaming.
sequenceDiagram
participant User
participant Web as Web UI
participant API as WebChannel (HTTP/SSE)
participant Router
participant DB as SQLite
participant Pool as AgentPool
participant Pi as Pi SDK
User->>Web: Prompt
Web->>API: POST /agent
API->>DB: Store message
API->>Router: Route message
Router->>Pool: runAgent()
Pool->>Pi: prompt()
Pi-->>Pool: streaming updates
Pool-->>API: events
API-->>Web: SSE updates
Pool->>DB: Store assistant reply
When the agent is streaming a response, the user can submit additional messages. These are automatically queued as follow‑ups rather than interrupting the active turn:
sequenceDiagram
participant User
participant Web as Web UI
participant API as WebChannel
participant Queue as Deferred Queue
participant Placeholder as Placeholder Store
participant Pool as AgentPool
User->>Web: Submit while agent is streaming
Web->>API: POST /agent
API->>API: isStreaming(chatJid) = true
alt Deferred follow‑up (queue full path)
API->>Queue: Append to queued_followups_json
API-->>Web: { queued: "followup" }
Web->>Web: Add to queue stack UI
end
Note over Pool: Active turn completes
API->>Queue: Peek next deferred item
Queue-->>API: { rowId, content, threadId }
API->>Placeholder: Move to placeholder store
API->>Pool: processChat(content)
Pool->>Pool: Run agent turn
Note over Web: Queue stack item disappears
There are two queue storage layers:
- Deferred Queue — persisted in
queued_followups_jsoncolumn ofchat_cursors. Uses negative synthetic row IDs. Survives restarts. - Placeholder Store — in‑memory FIFO (
FollowupPlaceholderStore). Uses positive real row IDs (from the stored message). Lost on restart but recovered via the deferred queue.
getQueuedFollowupItems() merges both stores, deduplicating by rowId.
The user can interact with queued items in the compose area:
- Cancel (×) — calls
removeAgentQueueItem(rowId), which splices the item from the server‑side queue array and broadcastsagent_followup_removed. - Steer (→) — calls
steerAgentQueueItem(rowId), which injects the item's content as steering into the active agent session. The agent sees it as mid‑turn user input. - Move up / move down — calls
POST /agent/queue-reorderwithfrom_index/to_index, which reorders the deferred queue inqueued_followups_jsonand persists the new order on the backend instead of leaving the stack at the mercy of timestamp sorting.
Queue reorder now preserves deliberate user ordering all the way through execution:
- the backend control-plane surface exposes
handleAgentQueueReorder() - the queued-followup lifecycle service mutates the persisted deferred queue order in place
- the old timestamp-based re-sort that used to quietly undo user moves has been removed
The client tracks a dismissedQueueRowIdsRef (Set) to prevent refreshQueueState from re‑adding items that were just cancelled or steered. The set is cleared when the agent turn completes or when the server queue empties.
Each chat JID has a cursor tracked in chat_cursors:
IDLE → (beginChatRun) → IN‑FLIGHT → (endChatRun) → IDLE
→ (endChatRunWithError) → FAILED
FAILED → (clearFailedRun / next success) → IDLE
All state transitions are single SQL statements — crash‑safe under SQLite WAL mode:
beginChatRun(chatJid, messageId)— saves current cursor asinflight_prev_ts, recordsinflight_message_idandinflight_started_atendChatRun(chatJid, newCursorTs)— advances cursor, clears inflight + failed markersendChatRunWithError(chatJid, ...)— records failure metadata, clears inflight markerrollbackInflightRun(chatJid)— restores cursor frominflight_prev_ts, clears inflight marker
If piclaw restarts mid-turn, startup recovery now happens in two phases:
- inflight web runs are rolled back/cleared immediately on startup
- once workers and channels are online, piclaw writes a self-addressed
resume_pendingIPC task and resumes any remaining pending chats through the normal IPC path
Recovery logic (recoverInflightRuns):
- Finds all
chat_cursorsrows with a non‑nullinflight_message_id - If a terminal agent reply already exists past the inflight message → clears the inflight marker (the turn actually completed before the crash)
- If the inflight marker is older than 30 minutes (
MAX_INFLIGHT_AGE_MS) → clears without rollback (prevents infinite retry of permanently failing messages) - Otherwise → rolls back the cursor, deletes non‑terminal bot messages after the inflight point, and enqueues a retry through the normal processing path
That keeps restart recovery on the same code path as an explicit reload instead of depending on a lucky post-reboot user action.
The web UI can render adaptive_card content blocks inline in timeline posts and route card actions back through the normal web channel.
Action.OpenUrlis handled client-side with URL validation and an explicit secondary action button pattern.Action.Submitposts toPOST /agent/card-action.- Submissions are persisted as
adaptive_card_submissioncontent blocks on the follow-up message. - The timeline renders those submission blocks as compact receipt-style UI instead of relying on raw fallback text.
- Card lifecycle is tracked on the original card block:
- default submit behavior:
active → completed - explicit terminal variants can resolve to
cancelledorfailed - cards with
submit_behavior: "keep_active"remain interactive after submit
- default submit behavior:
- Completed/cancelled/failed cards are re-rendered from the original card payload with the last submitted values hydrated back into the inputs.
- Finished cards then lock those inputs read-only, hide action buttons, and show a concise theme-consistent status banner rather than echoing the full submission in banner text.
- Validation cards can be posted through the internal
send_adaptive_cardtool, including cases that exercise bad URL handling, submit errors, keep-active cards, and terminal-state transitions.
Piclaw now has a side-prompt primitive for work that should reuse the chat's current model and thinking level without touching the main session tree.
- Backend primitive:
AgentPool.runSidePrompt(chatJid, prompt, options) - Web endpoints:
POST /agent/side-promptfor a one-shot JSON resultPOST /agent/side-prompt/streamfor SSE-styleside_prompt_*events with live thinking/text deltas
- Uses the current chat model + thinking level
- Does not append to the main agent session tree
- Intended as the substrate for future
/btw/ side-conversation UI work
The web UI now has a first thin consumer for this substrate:
/btw <question>is handled locally in the web compose box- it opens a lightweight side-conversation panel
- the panel streams thinking/text deltas from
POST /agent/side-prompt/stream - each BTW run is reseeded from the current main session tree context before prompting, so it starts from the active Pi conversation state rather than a cold empty context
- the side run uses a separate side session so it can stay isolated from the main visible conversation while still inheriting current context and model/thinking state
Inject into chatsends the final BTW answer back through the normal message path, so it respects the same queue/follow-up rules as any other user submission
This is still an early web-native BTW layer rather than the full final system, but the separation of concerns is now in place: core provides the side-prompt/side-session substrate, while BTW remains a thin UI consumer on top.
The compose footer exposes current context-window usage through the ContextPie indicator, backed by GET /agent/context.
The web client now refreshes this state on:
- initial connect
- SSE reconnect
- window focus
pageshowvisibilitychangewhen the document becomes visible again
That keeps the context compaction affordance in sync when returning to the tab or reopening the webapp, rather than waiting for the slower backstop poller.
The SSE reconnect handler also refreshes queue state (refreshQueueState()) so queued follow-ups submitted before a connection gap are restored in the compose stack immediately, rather than waiting for the next 60-second poll cycle.
The persistence model is intentionally split:
- backend truth decides whether the compaction-related affordance is currently warranted
- browser-local state is reserved for lightweight local UI memory such as dismissal/seen state
Hot web paths now emit correlated backend timing and browser-visible request traces.
Backend side:
- request-router and selected web surfaces append
Server-Timingheaders using theappendServerTiming()helpers - the main router also emits an
x-request-idheader so a browser-visible request can be correlated with backend logs or server-side timing spans - instrumentation is intentionally lightweight: bounded timing metrics attached to normal HTTP responses rather than a full tracing stack
Browser side:
runtime/web/src/api.tsrecordsx-request-id,Server-Timing, duration, method, URL, and success/failure intowindow.__PICLAW_PERF__runtime/web/src/ui/app-perf-tracing.tskeeps bounded in-memory trace/request buffers for live inspection in the browser- failed requests that never receive a response are still recorded as failed-before-response entries so the client-side picture does not quietly omit them
This gives piclaw a cheap request-correlation surface now, while leaving room for a fuller observability/export story later.
The web UI now keeps a bounded cache of recent timeline snapshots and uses nearby-thread prewarming to make thread switches feel less abrupt.
app-timeline-cache.tsstores recent chat timeline payloads with an age limit and a small LRU-style cache cap- the active chat is excluded from prewarm selection, and recent nearby chats are deduped before background fetches begin
- background prewarm is best-effort: failed fetches are logged at debug level and dropped rather than surfacing as user-facing errors
- the cache is used together with the newer refresh-coalescing/warm-session work so thread switches can often render from something fresher than a cold network round-trip
This is not backend truth — it is browser-local performance state intended purely to reduce visible latency around timeline revisits and nearby-thread navigation.
Scheduled tasks run on the same AgentSession as normal user messages but are isolated using the session tree. Before executing a task, the scheduler saves the current tree position (leaf ID) and the active model. The task's prompt and response are appended to the session as usual, then the scheduler navigates back to the saved leaf. This leaves the task's output in a side branch of the session tree — it persists in history but does not pollute the user's conversation context.
If the task specifies a different model (e.g. a cheaper one for periodic summaries), the model is switched before execution and restored afterwards. Because the tree navigation also rewinds the conversation state, the model restore happens on the original branch where it belongs.
sequenceDiagram
participant Skill
participant IPC
participant Scheduler
participant DB as SQLite
participant Pool as AgentPool
participant Tree as Session Tree
participant Pi as Pi SDK
participant Web as Web UI
Skill->>IPC: write schedule_task JSON
Scheduler->>DB: persist task
Scheduler->>Tree: save leaf position + model
Note over Tree: leafId = "abc123"
opt Task specifies model
Scheduler->>Pool: switch model
end
Scheduler->>Pool: runAgent()
Pool->>Pi: prompt()
Pi-->>Pool: result
Note over Tree: leaf advanced to "xyz789"
Scheduler->>Tree: navigate back to "abc123"
Note over Tree: task output in side branch
opt Model was switched
Scheduler->>Pool: restore original model
end
Scheduler->>DB: log run
Scheduler-->>Web: post result
Without isolation, a scheduled task's prompt and response would appear in the agent's conversation context. The next user message would see the task's output, leading to confused responses. The session tree approach solves this cleanly:
- No context pollution: The user's conversation continues from where it left off.
- Full history: The task's output is preserved in a side branch and can be inspected via
/tree. - Model safety: The model is restored to its pre-task state on the correct branch.
- No session forking: Unlike
fork()which creates a new session file,navigateTree()stays in the same file and simply moves the branch pointer.
A chat can optionally switch its core file/shell tools to a remote host over SSH.
- Control surface: agent-only
sshtool - Scope: one chat JID at a time
- Persistence: SQLite
ssh_configs - Affected tools:
read,write,edit,bash
The important runtime property is that SSH mode is live mutable. If a warm session already exists, ssh set and ssh clear can flip the backend immediately for the next tool/model step without rebuilding the whole session.
sequenceDiagram
participant Agent
participant ssh as ssh tool
participant Pool as AgentPool
participant DB as SQLite
participant Core as ssh-core wrappers
participant Host as Local or Remote host
Agent->>ssh: action=set target+keychain
ssh->>Pool: setSshConfig(chatJid, config)
Pool->>DB: upsert ssh_configs
Pool->>Core: applyLiveSshConfig(chatJid)
Core-->>Host: next read/write/edit/bash uses SSH
Agent->>ssh: action=clear
ssh->>Pool: clearSshConfig(chatJid)
Pool->>DB: delete ssh_configs row
Pool->>Core: clearLiveSshConfig(chatJid)
Core-->>Host: next core tool call runs locally
Transport semantics match the packaged SSH extension model:
- multiplexed connection reuse
ControlMaster=autoControlPersist=600- persistent remote shell state
- configured remote cwd/home mapping
Several runtime choices are intentionally optimized for low-context turns and progressive discovery:
- small fixed tool baseline, plus automatic default activation for safe read-only tools and message/scheduling/attachment helpers in the current catalog, with explicit same-turn tool activation for everything else
- session-scoped infra profiles so SSH / Proxmox / Portainer state can be reused without restating connection details every turn
- compact
capabilitiesoutput and shortrecommendresults before full workflow details - opt-in examples in
workflow_helpinstead of returning bulky example payloads by default - warm session reuse so model state and scoped tool state stay available without repeated setup
For infrastructure work, the preferred discovery path is:
discover → capabilities or recommend → workflow_help → workflow
For internal tools, the parallel staged path is:
list_tools(query) → compact summaries → on-demand parameters/details → activate_tools / use
Only fall back to raw request when the curated workflow surface is not the right fit.
- Messages for a chat JID share a warm
AgentSession. - Auto‑compaction runs when the context window is tight.
- Idle sessions are evicted after a short TTL.
- When the agent produces multiple turns in a single response (e.g. tool calls followed by a final answer), each turn's text and attachments are stored as separate messages. The first becomes the thread root; subsequent turns carry a
thread_idpointing back to the root. The UI renders these as indented threaded replies.
See architecture.md for component layout and tools-and-skills.md for tool/skill details.