Summary
Codex executions can contain substantial command and assistant activity in raw events, while Centaur execution summaries report zero commands, zero tool results, zero assistant text, and zero token usage.
This one is lower confidence than the other two issues. It may be incomplete Codex telemetry support or expected behavior, but given the other Codex parsing/delivery issues, it looks like Centaur is under-projecting some Codex event shapes.
Observed
Example execution: exe_69d2b0c925414cd1
Raw event shape counts:
raw_event_count: 2748
item.agentMessage.delta: 1937
item.commandExecution.outputDelta: 468
item.started commandExecution: 82
item.completed commandExecution: 82
item.completed agentMessage: 44
item.started agentMessage: 44
item.completed reasoning: 42
item.started reasoning: 42
But the execution summary reported:
observation_event_count: 0
assistant_text_events: 0
assistant_text_chars: 0
command_events: 0
tool_result_events: 0
tool_error_events: 0
file_change_events: 0
total_tokens: 0
Other Codex executions showed the same pattern:
exe_f1c44cb9a8f64852: raw_event_count 1411, observation_event_count 0
exe_cc37103e753f4122: raw_event_count 800, observation_event_count 0
Expected
If Centaur stores Codex raw events, the execution summary should project useful observation counters from those events where possible.
For example:
item.completed with item.type = commandExecution should count as command execution activity.
item.commandExecution.outputDelta should be usable for command output diagnostics.
item.agentMessage.delta or completed agentMessage should count as assistant text.
File changes and usage/token events should be projected if present.
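The projection described above could look roughly like the following. This is only a sketch under assumptions: it treats each raw event as a dict with a "type" key, an optional nested "item" dict, and a "delta" string on delta events. Those field names, the function name project_counters, and the command_output_chunks counter are hypothetical and not taken from Centaur's code. It counts completed agentMessage items as assistant text events and sums delta lengths for characters, which is one possible way to avoid double counting.

```python
from collections import Counter


def project_counters(raw_events):
    """Project summary counters from Codex-style raw events.

    Assumed (hypothetical) event shape:
      {"type": "item.completed", "item": {"type": "commandExecution"}}
      {"type": "item.agentMessage.delta", "delta": "some text"}
    """
    counters = Counter(
        command_events=0,
        command_output_chunks=0,
        assistant_text_events=0,
        assistant_text_chars=0,
    )
    for event in raw_events:
        kind = event.get("type")
        item_type = (event.get("item") or {}).get("type")

        if kind == "item.completed" and item_type == "commandExecution":
            # One completed command item counts as one command event.
            counters["command_events"] += 1
        elif kind == "item.commandExecution.outputDelta":
            # Output deltas are tallied separately for diagnostics.
            counters["command_output_chunks"] += 1
        elif kind == "item.completed" and item_type == "agentMessage":
            # Count whole messages, not individual deltas, as text events.
            counters["assistant_text_events"] += 1
        elif kind == "item.agentMessage.delta":
            # Character volume comes from the streamed deltas.
            counters["assistant_text_chars"] += len(event.get("delta", ""))
    return dict(counters)
```

Events of any other shape are ignored, so the counters stay at zero rather than failing when an unknown shape appears.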
Alternatively, if this is intentionally not supported for Codex, the behavior should be documented clearly so downstream workflows know to query raw Codex events directly.
Why It Matters
Trace workflows that look for failed or inefficient tool use cannot rely on Centaur summaries. They have to scrape raw Codex events or sandbox logs, even though the database already contains enough raw signal to identify command and tool behavior.
Possible Root Cause
The Codex normalizer appears to pass through many item.* events, while the observability projection expects canonical event types such as command_execution, assistant, tool, file_change, and usage.
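If that is the cause, one possible fix is a small mapping step in the normalizer that translates Codex item shapes into the canonical types before projection. The sketch below is illustrative only: the event field names ("type", "item"), the mapping table, and the function name canonical_type are assumptions, not the actual contents of normalize.py.

```python
# Hypothetical mapping from Codex item types to the canonical event types
# the observability projection is said to expect.
CANONICAL_BY_ITEM_TYPE = {
    "commandExecution": "command_execution",
    "agentMessage": "assistant",
}


def canonical_type(event):
    """Return the canonical type for a raw Codex event, or None if unmapped."""
    kind = event.get("type", "")

    # Lifecycle events carry the item type in a nested "item" dict (assumed).
    if kind in ("item.started", "item.completed"):
        item_type = (event.get("item") or {}).get("type")
        return CANONICAL_BY_ITEM_TYPE.get(item_type)

    # Delta events encode the item type in the name,
    # e.g. "item.agentMessage.delta" or "item.commandExecution.outputDelta".
    parts = kind.split(".")
    if len(parts) == 3 and parts[0] == "item":
        return CANONICAL_BY_ITEM_TYPE.get(parts[1])

    return None
```

Returning None for unmapped shapes (for example reasoning items) makes it easy to log which raw event types currently pass through without being projected.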
Relevant code paths to inspect:
services/api/api/sandbox/normalize.py
services/api/api/observability.py
services/api/api/runtime_control.py