feat: Docs/governance event sink spi proposal #2240
🤖 AI Agent: code-reviewer: TL;DR: 0 blockers, 1 warning. Proposal is solid but needs minor clarification.
Action items: None, as there are no blockers. Warnings:
🤖 AI Agent: security-scanner: No security issues found.
🤖 AI Agent: test-generator: Test coverage analysis is not applicable to this pull request as it only introduces documentation changes. No code files were modified.
🤖 AI Agent: docs-sync-checker: Documentation is in sync.
🤖 AI Agent: breaking-change-detector: No breaking changes detected.
PR Review Summary
Verdict:
🟡 Contributor Check: MEDIUM
Automated check by AGT Contributor Check.
Dependency Review: ✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found. Scanned files: none.
This is a great initiative and will surely help our customers a lot. I have a few questions:
- If GovernanceEventSink, StdoutEventSink, and OtlpEventSink all live on the same host where the agent policy engine runs, do we need GovernanceEventSink in the middle, fanning out to the different event sinks?
- If yes, can you explain in more detail what the fan-out to StdoutEventSink and OtlpEventSink looks like?
- For someone new to the OpenTelemetry Collector: how does it pick up events from OtlpEventSink and send them to Datadog or Splunk? I am assuming there is a queueing mechanism involved here.
- It would be good to write up a detailed end-to-end flow, including where each component lives and which other internal components are involved. Since the data flows across different endpoints, there can be unforeseen crashes. How would the components handle events as they recover from a crash?
Hi @amolr, great feedback, thank you. Answers to those points below:
- It's not a separate process. GovernanceEventSink is the SPI (the Protocol/interface) that StdoutEventSink and OtlpEventSink both implement, plus a tiny in-process dispatcher that the kernel emits to. We need the dispatcher because policy can require multiple sinks to be active at once (e.g. siem + audit), and we want one emit() call from the kernel to fan out without the kernel knowing which sinks are wired in. Same process, no network hop, just function calls.
- In-process and parallel:

      # assumes `import asyncio` and a module-level list `sinks`
      async def emit(event):
          await asyncio.gather(*(s.emit(event) for s in sinks), return_exceptions=True)

  Each sink has its own bounded queue, retry policy, and health state. A slow or failing sink doesn't block the others. With both Stdout and Otlp registered, every event writes a JSON line to stdout and pushes an OTLP record concurrently.
- Yes, queueing lives in the Collector, not in AGT. Quick primer:
  The Collector is a separate, vendor-neutral process (sidecar, daemonset, or container; the customer's choice) with three pipeline stages:
  - Receivers: otlp listens on gRPC 4317 / HTTP 4318. OtlpEventSink pushes here.
  - Processors: batch and memory_limiter. Disk-backed durability comes from the exporters' persistent sending queue, backed by the file_storage extension, rather than from a processor.
  - Exporters: vendor-specific (datadog, splunk_hec, azuremonitor, awscloudwatchlogs, etc.). Each handles vendor auth and rate limiting.

  Typical config:

      receivers: { otlp: { protocols: { grpc: {}, http: {} } } }
      processors: { batch: {}, memory_limiter: { limit_mib: 512 } }
      exporters:
        datadog:
          api:
            key: ${DD_KEY}
        splunk_hec:
          token: ${SPLUNK_HEC}
      service:
        pipelines:
          logs: { receivers: [otlp], processors: [batch], exporters: [datadog, splunk_hec] }

  AGT pushes one OTLP stream; the Collector handles vendor fan-out, batching, retry, and queueing.
- Very good point. I will add this as a section to the proposal.
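To make the in-process fan-out from the second answer concrete, here is a minimal sketch, not the proposal's actual implementation. The `emit()` method name and the `asyncio.gather(..., return_exceptions=True)` call come from the reply above; the `Dispatcher` class, the `UnreachableSink` stand-in, the `stream` parameter, and the event fields are hypothetical names chosen for illustration. Per-sink bounded queues, retries, and health state are left out.

```python
import asyncio
import io
import json
import sys
from typing import Optional, Protocol, Sequence, TextIO


class GovernanceEventSink(Protocol):
    """The SPI: anything with an async emit(event) can be registered."""

    async def emit(self, event: dict) -> None: ...


class StdoutEventSink:
    """Writes one JSON line per event (stream is injectable; hypothetical)."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        self._stream = stream if stream is not None else sys.stdout

    async def emit(self, event: dict) -> None:
        self._stream.write(json.dumps(event, sort_keys=True) + "\n")


class UnreachableSink:
    """Hypothetical stand-in for a sink whose backend is down."""

    async def emit(self, event: dict) -> None:
        raise ConnectionError("collector unreachable")


class Dispatcher:
    """Hypothetical in-process dispatcher: one emit() call fans out to all sinks."""

    def __init__(self, sinks: Sequence[GovernanceEventSink]) -> None:
        self._sinks = list(sinks)

    async def emit(self, event: dict) -> list:
        # return_exceptions=True keeps one failing sink from cancelling the rest.
        results = await asyncio.gather(
            *(sink.emit(event) for sink in self._sinks),
            return_exceptions=True,
        )
        # Hand failures back to the caller instead of silently dropping them.
        return [r for r in results if isinstance(r, BaseException)]


def demo() -> tuple:
    """Emit one event to a healthy sink and a failing sink."""
    buffer = io.StringIO()
    dispatcher = Dispatcher([StdoutEventSink(buffer), UnreachableSink()])
    event = {"category": "policy.decision", "allowed": False}
    failures = asyncio.run(dispatcher.emit(event))
    return failures, buffer.getvalue()
```

Calling `demo()` shows the property the reply describes: the healthy sink still receives the event, and the failing sink's exception is collected rather than propagated to the kernel's single `emit()` call.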
High-level design for the pluggable governance event sink interface (issue #1999). Covers the core concept, mermaid flow, event categories, envelope, policy integration, where it lives, and per-language interface sketches (Python, .NET, Rust, TypeScript, Go). Signed-off-by: Ricky Gummadi <[email protected]>
…-sink-spi-proposal Signed-off-by: Ricky Gummadi <[email protected]> # Conflicts: # .cspell-repo-terms.txt
Adds a concise design doc for the pluggable governance event sink interface tracked in #1999. Generalises the existing SpanSink pattern from agent-hypervisor into a first-class GovernanceEventSink SPI in agent-os, with reference OTLP and stdout sinks and policy-enforced sink presence (fail-closed).
Includes:
- High-level mermaid flow
- Event categories, envelope schema, policy integration
- Per-language interface sketches (Python, .NET, Rust, TypeScript, Go)
- Concrete decisions on delivery semantics, fan-out, signing keys, audit log unification, schema versioning, and backpressure
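As a rough illustration of what "policy-enforced sink presence (fail-closed)" could mean in practice, the sketch below refuses to proceed when a sink the policy requires is not registered. This is an assumption about the behaviour, not the design doc's API: `SinkPolicy`, `MissingRequiredSinkError`, and `check_required_sinks` are hypothetical names, and the sink labels `siem` and `audit` are borrowed from the example in the review thread.

```python
from dataclasses import dataclass


class MissingRequiredSinkError(RuntimeError):
    """Raised when policy requires a sink that is not registered (hypothetical)."""


@dataclass(frozen=True)
class SinkPolicy:
    """Hypothetical policy fragment: names of sinks that must be active."""

    required_sinks: frozenset


def check_required_sinks(policy: SinkPolicy, registered: set) -> None:
    """Fail closed: raise instead of continuing without a required sink."""
    missing = policy.required_sinks - registered
    if missing:
        raise MissingRequiredSinkError(
            f"required sinks not registered: {sorted(missing)}"
        )


# Usage: the first check passes silently, the second raises.
policy = SinkPolicy(required_sinks=frozenset({"siem", "audit"}))
check_required_sinks(policy, {"siem", "audit", "stdout"})
```

The point of raising rather than logging a warning is that a missing audit or SIEM sink blocks startup (or the governed action) instead of letting events go unrecorded.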