This document belongs to the L1 kernel layer and describes how the Worker implements and complies with the L0 protocol.
- Subscribe to `cmd.agent.{worker_target}.wakeup` as the execution bell; the targets are determined by `[worker].worker_targets` in `config.toml`.
- Pull tasks and receipts from `state.agent_inbox`.
- Hydrate `state`/`resource`/cards, then execute the ReAct loop.
- Write cards and `state.*`, and publish events and streaming output.
- The inbox record is the semantic source of truth; the wakeup header is not.
- Single instance, multi-coroutine: A fixed number of coroutines are started per process/instance, and tasks are processed concurrently from an internal queue.
- Strict statelessness: Do not store any mutable business state related to agent/turn/step on `self` (for example, current agent, context, temporary results, cache, etc.).
- State ownership: Business state must exist only in WorkItem, function local variables, or persistent storage.
- Allowed `self` contents: Immutable config, connection pools/clients, queues, loggers, and other infrastructure objects.
- Lock boundary: Locks may only be used for shared statistics, throttling, or resource-pool management that has no isolation requirement; a lock must not be used as a substitute for isolation in order to share business state.
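
The concurrency rules above can be illustrated with a minimal sketch. It assumes an `asyncio`-based worker; the `WorkItem` fields, the `Worker` class shape, and the `demo` helper are hypothetical and are not taken from the actual implementation.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Carries all per-task business state (hypothetical shape)."""
    agent_id: str
    payload: str
    results: list = field(default_factory=list)


class Worker:
    def __init__(self, concurrency: int) -> None:
        # Only immutable config and infrastructure objects live on self.
        self.concurrency = concurrency
        self.queue = asyncio.Queue()

    async def _handle(self, item: WorkItem) -> None:
        # Business state stays in locals and on the WorkItem, never on self.
        step_output = f"{item.agent_id}:{item.payload.upper()}"
        await asyncio.sleep(0)  # stand-in for I/O such as an LLM call or DB write
        item.results.append(step_output)

    async def _loop(self) -> None:
        while True:
            item = await self.queue.get()
            try:
                await self._handle(item)
            finally:
                self.queue.task_done()

    async def run(self, items: list) -> None:
        # A fixed number of coroutines drain one internal queue.
        tasks = [asyncio.create_task(self._loop()) for _ in range(self.concurrency)]
        for item in items:
            self.queue.put_nowait(item)
        await self.queue.join()
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def demo() -> list:
    # The Worker is constructed inside the running event loop.
    worker = Worker(concurrency=3)
    items = [WorkItem("a1", "x"), WorkItem("a2", "y")]
    await worker.run(items)
    return items
```

Running `asyncio.run(demo())` processes both items concurrently. Because nothing task-specific is stored on `self`, two coroutines handling different agents cannot observe each other's intermediate results.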
- `worker_target` is the unique routing field and comes from Profile/Roster.
- Worker instances declare which targets to consume via `[worker].worker_targets` in `config.toml` (multiple targets allowed).
- Common values: `worker_generic` (generic pool), `ui_worker` (UI entrypoint), `sandbox` (controlled execution environment).
- Purpose: provide a deliverable, non-contention route for “stateful inbox-consuming services/orchestrators” (to avoid contention among multiple process instances for the same `agent_id` inbox).
- Protocol constraint: `worker_target` must be a single segment token of a NATS subject; `.`, `*`, `>`, and whitespace are forbidden. Recommended character set is `[a-z0-9_-]`, all lowercase.
- Naming format: `svc1_<service>_<scope>`
  - `service`: service/binary name, such as `better_demo` / `ground_control`.
  - `scope`: unique scope; recommended to include `env+cluster` or `project_id` (to avoid cross-environment/cross-project collisions), such as `dev_proj_cgnd_demo_01`.
- Operational rule (must follow): Any `svc1_*` target in the same environment may only be subscribed to and consumed by one process instance (the corresponding `cmd.agent.{target}.wakeup` can only have one “real processor”).
- Multi-replica/HA: Do not let multiple replicas share the same `svc1_*`. Instead, use per-replica independent `agent_id + worker_target` (upstream selects caller by instance), or introduce a leader election/DB lock mechanism for single-primary semantics before sharing.
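
The token constraint and naming format above can be checked mechanically. The following is a minimal sketch; the helper names (`is_valid_worker_target`, `service_target`, `wakeup_subject`) are hypothetical and only encode the rules stated in this section.

```python
import re

# Recommended character set from the text: lowercase [a-z0-9_-].
_RECOMMENDED_RE = re.compile(r"[a-z0-9_-]+")
# Forbidden inside a single NATS subject token: '.', '*', '>', whitespace.
_FORBIDDEN_RE = re.compile(r"[.*>\s]")


def is_valid_worker_target(target: str, strict: bool = True) -> bool:
    """Check the protocol constraint; strict=True also enforces the recommended charset."""
    if not target or _FORBIDDEN_RE.search(target):
        return False
    if strict:
        return _RECOMMENDED_RE.fullmatch(target) is not None
    return True


def service_target(service: str, scope: str) -> str:
    """Build a svc1_<service>_<scope> target and validate it."""
    target = f"svc1_{service}_{scope}"
    if not is_valid_worker_target(target):
        raise ValueError(f"invalid worker_target: {target!r}")
    return target


def wakeup_subject(target: str) -> str:
    """Derive the wakeup subject for one worker_target."""
    if not is_valid_worker_target(target, strict=False):
        raise ValueError(f"invalid worker_target: {target!r}")
    return f"cmd.agent.{target}.wakeup"
```

For example, `service_target("ground_control", "dev_proj_cgnd_demo_01")` yields `svc1_ground_control_dev_proj_cgnd_demo_01`, while a target such as `a.b` is rejected because it would span two subject tokens.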
- Read: `state.*`, `resource.*`, cards/boxes (through card-box-cg).
- Write: `state.agent_state_head`/`agent_steps` (only for its own AgentTurn), cards/boxes; when necessary, write Report to `state.agent_inbox` via ExecutionService.
- Receive `cmd.agent.*.wakeup` → claim inbox first (`FOR UPDATE SKIP LOCKED`), then state update with CAS gate.
- `state_agent` initializes `agent_turn_id`/`turn_epoch` at enqueue/claim in L0; `agent_turn` is not generated by Worker.
- When entering processing, Worker sets `agent_turn` to `running` (state transition with CAS protection), then hydrates in order: `state.agent_state_head` → `resource.*` → `profile_box_id`/`context_box_id`/`output_box_id`.
- Greedy Batch: fetch all executable inbox for the same Worker-correlated agents from `claim_pending_inbox`, and process them in chronological order.
- If tool calls exist: write `tool.call` card and publish `cmd.tool.*`/`cmd.sys.pmo.internal.*`, record `tool_call_id` within the step, and set `activity=executing_tool` (LLM step metadata).
- Each tool wait writes `turn_waiting_tools` to state and sets `resume_deadline`; `suspend_timeout` is governed by worker-specific config and tool-side timeout constraints. The current implementation does not persist `expecting_correlation_id` as the matching condition (common path is `None`); resume judgment is unified on `turn_waiting_tools + state.agent_inbox(correlation=tool_call_id)`.
- When `status=suspended`, wake/resume no longer depends on single-correlation filtering; watchdog and resume paths reconcile `turn_waiting_tools` rows in `waiting`/`received` state and claim the matching due inbox rows (`pending`/`deferred`) before continuing.
- After tool return, or for normal LLM turns with no tool calls, continue to the next step under `next_step(is_continuation=True)` within the same `agent_turn_id`/`turn_epoch` (intra-turn continuation; `resume` may cause `turn_epoch` reordering).
- Terminal state: write `task.deliverable`, publish `evt.agent.*.task`, clear `active_agent_turn_id`, and return `status` to `idle`.
Constraint: Worker must not generate `agent_turn_id`/`turn_epoch` by itself; when the inbox record is invalid or missing, Worker should log a warning and stop the related side effects.
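
The resume reconciliation described in the flow above (matching waiting tool rows against inbox rows by `tool_call_id`) might be sketched as a pure function. This is an illustration under assumptions: the `WaitingTool` and `InboxRow` shapes, the `reconcile` function, and the timeout rule are hypothetical and simplified, not the actual table schema or watchdog logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitingTool:
    """Hypothetical row of turn_waiting_tools."""
    tool_call_id: str
    status: str             # "waiting" or "received"
    resume_deadline: float  # epoch seconds


@dataclass(frozen=True)
class InboxRow:
    """Hypothetical row of state.agent_inbox."""
    inbox_id: int
    correlation_id: str
    status: str             # "pending", "deferred", or another state
    created_at: float


def reconcile(waiting, inbox, now):
    """Return (inbox rows to claim, tool_call_ids considered timed out)."""
    open_ids = {
        w.tool_call_id for w in waiting if w.status in ("waiting", "received")
    }
    # Claim every pending/deferred inbox row whose correlation matches an open wait,
    # oldest first, instead of filtering on a single expected correlation id.
    claimable = sorted(
        (
            r for r in inbox
            if r.correlation_id in open_ids and r.status in ("pending", "deferred")
        ),
        key=lambda r: r.created_at,
    )
    answered = {r.correlation_id for r in claimable}
    # Assumption: a wait times out when its deadline has passed and no result arrived.
    timed_out = sorted(
        w.tool_call_id
        for w in waiting
        if w.status == "waiting"
        and w.tool_call_id not in answered
        and w.resume_deadline <= now
    )
    return claimable, timed_out
```

In this sketch a caller would claim the returned inbox rows and inject a timeout report for each timed-out `tool_call_id` before continuing the same turn.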
Supplemental note (context incremental updates during batch processing):
- Hydration is done once on wakeup to obtain an initial context snapshot.
- The Worker ReAct loop uses step-level continuation: a single turn may trigger multiple model calls in one processing path, and multiple Enqueue/report operations can sequence progress within the same `agent_turn`.
- When processing each step, new input and output cards are incrementally appended to the context, rather than rebuilding it fully each time.
- `output_box_id` cards written via sync writes can be replayed through subsequent wakeup hydration to recover after crash/restart.
- Read `state.agent_state_head` and perform gate checks.
- Read `resource.project_agents`/`resource.tools`/`resource.profiles`.
- Read `profile_box_id`/`context_box_id`/`output_box_id`.
Supplemental note (visibility boundary):
- Worker only reads cards/boxes referenced by the above `box_id`s and does not automatically scan or read other boxes in the project.
- Therefore, cross-agent context sharing must happen explicitly: upstream/PMO must perform context packing (or in future by tool), assembling the required cards into a new `context_box_id`.
- Reference: `03_kernel_l1/pmo_orchestration.md`
- `output_box_id` is rolling memory: each turn merges cards from `output_box_id` into context, so outputs are read back.
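
The visibility boundary can be made concrete with a small sketch. It assumes boxes are available as a mapping from box id to a list of card dicts with a `card_id` key; that shape, the `hydrate_context` name, the profile → context → output ordering, and the de-duplication are all assumptions for illustration.

```python
def hydrate_context(boxes, profile_box_id, context_box_id, output_box_id):
    """Build the initial context from the three referenced boxes only."""
    context = []
    seen = set()
    for box_id in (profile_box_id, context_box_id, output_box_id):
        # Boxes that are not referenced are never looked at.
        for card in boxes.get(box_id, []):
            if card["card_id"] not in seen:
                seen.add(card["card_id"])
                context.append(card)
    return context


project_boxes = {
    "box_profile": [{"card_id": "c1", "type": "profile"}],
    "box_context": [{"card_id": "c2", "type": "task"}],
    "box_output": [{"card_id": "c3", "type": "task.deliverable"}],
    "box_other_agent": [{"card_id": "c9", "type": "note"}],
}
context = hydrate_context(project_boxes, "box_profile", "box_context", "box_output")
```

Here `context` contains `c1`, `c2`, and `c3` but not `c9`: the other agent's box is invisible unless upstream packs its cards into a new `context_box_id`. Cards previously written to the output box are read back on the next hydration, which is the rolling-memory effect.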
- Entry file: `services/agent_worker/tool_spec.py`
  - `build_tool_specs`: converts rows in `resource.tools` to LLM tool specs (OpenAI-style); actual implementation is in `infra/llm/tool_specs.py`.
  - `filter_allowed_tools`: filters by profile `allowed_tools` allowlist and logs mismatch warnings.
- Processing details:
  - `options.args.defaults`: inject defaults when LLM omits parameters.
  - `options.args.fixed` / `options.envs`: force inject and hide from LLM parameters.
- After building the tool list, Worker appends the built-in `submit_result` (see `services/agent_worker/builtin_tools.py`).
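
The handling of `options.args.defaults` and `options.args.fixed` can be sketched as follows. This is not the code in `infra/llm/tool_specs.py`; the function names (`filter_allowed`, `to_llm_spec`, `resolve_args`) and the row shape (`name`, `description`, `parameters`, `options`) are assumptions, and `options.envs` is omitted.

```python
import copy


def filter_allowed(tool_rows, allowed_tools):
    """Keep only the tools named in the profile allowlist."""
    allowed = set(allowed_tools)
    return [row for row in tool_rows if row["name"] in allowed]


def to_llm_spec(row):
    """Build an OpenAI-style function spec, hiding fixed args from the LLM."""
    params = copy.deepcopy(row["parameters"])  # never mutate the resource row
    fixed = row.get("options", {}).get("args", {}).get("fixed", {})
    for name in fixed:
        params.get("properties", {}).pop(name, None)
    if "required" in params:
        params["required"] = [n for n in params["required"] if n not in fixed]
    return {
        "type": "function",
        "function": {
            "name": row["name"],
            "description": row.get("description", ""),
            "parameters": params,
        },
    }


def resolve_args(row, llm_args):
    """Merge call arguments with precedence: defaults < LLM-provided < fixed."""
    args_opts = row.get("options", {}).get("args", {})
    return {
        **args_opts.get("defaults", {}),
        **llm_args,
        **args_opts.get("fixed", {}),
    }


search_row = {
    "name": "search",
    "description": "Search documents",
    "parameters": {
        "type": "object",
        "properties": {
            "q": {"type": "string"},
            "limit": {"type": "integer"},
            "api_key": {"type": "string"},
        },
        "required": ["q", "api_key"],
    },
    "options": {"args": {"defaults": {"limit": 10}, "fixed": {"api_key": "secret"}}},
}
```

With this row, the spec shown to the model exposes only `q` and `limit`, while `resolve_args(search_row, {"q": "x"})` still produces a call that carries `limit=10` and the fixed `api_key`. Applying fixed values last means the model cannot override them even if it invents the parameter.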
- `evt.agent.*.step`: phase event (`started`/`planning`/`executing`/`completed`).
- `evt.agent.*.task`: terminal event, must include `deliverable_card_id`.
- `evt.agent.*.chunk`: streaming output (Core NATS).
- The `usage`/`response_cost` values returned by the LLM are written to `state.agent_steps.metadata` (`llm_usage`/`llm_response_cost`), which supports SQL aggregation.
- The tool-call ID list returned by the LLM is written to `state.agent_steps.tool_call_ids` (step-level index, replacing legacy `state.agent_turns.tool_call_ids`).
- Any `UPDATE state` must include `turn_epoch + active_agent_turn_id`; zero affected rows means stop side effects.
- Idempotency and convergence of `tool_result`/tool callback are jointly constrained by `turn_waiting_tools` + `state.agent_inbox(correlation_id=tool_call_id)`; apply is single-winner on the waiting row, and later duplicates converge as idempotent consume rather than duplicate append.
- If `status` or `turn` does not match, or the box has expired, resume is rerouted to drop/retry and emits observability signals instead of blind duplicate submission.
- Worker `watchdog` rescans wait tables and claims due rows from `state.agent_inbox`: on timeout and replay conditions, it injects a `tool.result` timeout report and drives wakeup back into the same turn flow.
- `error`/`parse` failures use a failure fallback path: for example, when `task.result_fields` or tool schemas are invalid or parse errors occur, it logs warnings and continues on an available degraded path; this is not treated as a hard protocol rejection unless the model request cannot be built.
- Worker crashes are reclaimed by PMO; `watchdog` and `resume` handle timeout and replay, and should not modify the state of other agents.
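
The CAS gate and the single-winner apply can both be expressed as guarded `UPDATE` statements whose affected-row count decides whether to proceed. The sketch below uses an in-memory SQLite database purely for illustration; the table and column layouts, and the function names `make_demo_db`, `cas_update_status`, and `apply_tool_result`, are hypothetical and do not reflect the production schema or database.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory stand-in for the state tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agent_state_head (
            agent_id TEXT PRIMARY KEY,
            status TEXT,
            active_agent_turn_id TEXT,
            turn_epoch INTEGER
        );
        CREATE TABLE turn_waiting_tools (
            tool_call_id TEXT PRIMARY KEY,
            status TEXT
        );
        INSERT INTO agent_state_head VALUES ('a1', 'running', 't1', 1);
        INSERT INTO turn_waiting_tools VALUES ('call-1', 'waiting');
        """
    )
    return conn


def cas_update_status(conn, agent_id, expected_turn_id, expected_epoch, new_status):
    """Guarded update: succeeds only if turn id and epoch still match."""
    cur = conn.execute(
        "UPDATE agent_state_head SET status = ? "
        "WHERE agent_id = ? AND active_agent_turn_id = ? AND turn_epoch = ?",
        (new_status, agent_id, expected_turn_id, expected_epoch),
    )
    conn.commit()
    # Zero affected rows means the gate failed: the caller must stop side effects.
    return cur.rowcount == 1


def apply_tool_result(conn, tool_call_id):
    """Single-winner apply: only the first caller flips waiting -> received."""
    cur = conn.execute(
        "UPDATE turn_waiting_tools SET status = 'received' "
        "WHERE tool_call_id = ? AND status = 'waiting'",
        (tool_call_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

A stale writer that still holds `turn_epoch=1` after the epoch has moved on gets `False` from `cas_update_status` and should stop. Likewise, a duplicate tool callback gets `False` from `apply_tool_result` and can be consumed as a no-op instead of appending a second result.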
- The built-in workflow tool `submit_result` does not go through UTP.
- Semantics: write `tool.result` + `task.deliverable` and end the Turn.
- Constraint: only one invocation of this tool is allowed within the same turn.
- If `context_box_id` contains `task.result_fields`, do minimal field validation; if parsing fails, log a warning and continue with field-missing degraded output (current implementation favors availability first).
- `task.result_fields` structural requirement: content should be parsable as `FieldsSchemaContent`; if it does not conform, log a warning and apply backward-compatible degradation instead of hard rejection (unless a key field needed by the main chain is unavailable).
- Profile field `must_end_with` (default empty): when a turn has no tool_calls:
  - If `must_end_with` is empty/missing: automatically write this turn’s assistant content as `task.deliverable` and end the Turn (empty content uses placeholder text).
  - If `must_end_with` is non-empty: write `sys.must_end_with_required` prompt and re-enqueue to continue (requiring one of the tools listed there in the next call).
Note: The mandatory check for `must_end_with` currently applies only when there are no tool_calls and the Agent attempts natural completion. If a tool called in the turn explicitly returns `after_execution=terminate`, the current Turn is directly terminated, allowing `must_end_with` to be bypassed.
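
The end-of-step decision described above, including the `after_execution=terminate` bypass, can be summarized as a small pure function. The function name `decide_turn_end`, the action labels, and the placeholder text are hypothetical; only the branching follows the rules in this section.

```python
def decide_turn_end(tool_calls, must_end_with, content, tool_terminated=False):
    """Return (action, detail) for the end of a step; action names are illustrative."""
    if tool_terminated:
        # A tool returned after_execution=terminate: the Turn ends directly,
        # which bypasses the must_end_with check.
        return ("terminate", None)
    if tool_calls:
        # Tool calls exist, so the must_end_with check does not apply yet.
        return ("dispatch_tools", None)
    if not must_end_with:
        # Natural completion: assistant content becomes the deliverable.
        # The placeholder string is an assumption, not the real text.
        return ("write_deliverable", content or "(empty response)")
    # Natural completion attempted but a closing tool is required:
    # prompt again and re-enqueue, listing the acceptable tools.
    return ("reprompt", list(must_end_with))
```

For instance, `decide_turn_end([], ["submit_result"], "done")` returns `("reprompt", ["submit_result"])`, whereas the same call with an empty `must_end_with` writes `"done"` as the deliverable.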