feat: Docs/governance event sink spi proposal#2240

Open
Ricky-G wants to merge 8 commits into main from docs/governance-event-sink-spi-proposal

Conversation

@Ricky-G
Contributor

@Ricky-G Ricky-G commented May 13, 2026

Adds a concise design doc for the pluggable governance event sink interface tracked in #1999. Generalises the existing SpanSink pattern from agent-hypervisor into a first-class GovernanceEventSink SPI in agent-os, with reference OTLP and stdout sinks and policy-enforced sink presence (fail-closed).

Includes:

- High-level mermaid flow
- Event categories, envelope schema, policy integration
- Per-language interface sketches (Python, .NET, Rust, TypeScript, Go)
- Concrete decisions on delivery semantics, fanout, signing keys, audit log unification, schema versioning, and backpressure
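
To make "policy-enforced sink presence (fail-closed)" concrete, here is a minimal sketch of what such a check could look like. Everything in it is an assumption for illustration: `SinkPolicy`, `MissingSinkError`, `enforce_sink_presence`, and the category names `siem` and `audit` are hypothetical and are not taken from the design doc.

```python
from dataclasses import dataclass, field


class MissingSinkError(RuntimeError):
    """Raised when policy requires a sink category that is not registered."""


@dataclass
class SinkPolicy:
    # Hypothetical policy shape: category names that must each have a sink.
    required_categories: frozenset[str] = field(default_factory=frozenset)


def enforce_sink_presence(policy: SinkPolicy, registered: set[str]) -> None:
    """Fail closed: refuse to proceed unless every required category is covered."""
    missing = sorted(policy.required_categories - registered)
    if missing:
        raise MissingSinkError(f"required sinks not registered: {', '.join(missing)}")


policy = SinkPolicy(required_categories=frozenset({"siem", "audit"}))

# All required categories are present, so this returns without raising.
enforce_sink_presence(policy, {"siem", "audit", "stdout"})

# The "audit" category is missing, so the check raises instead of continuing.
try:
    enforce_sink_presence(policy, {"siem"})
except MissingSinkError as exc:
    error_message = str(exc)
```

The point of the sketch is the direction of the default: a missing required sink stops the caller rather than letting events go unrecorded.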

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

🤖 AI Agent: code-reviewer — View details

TL;DR: 0 blockers, 1 warning. Proposal is solid but needs minor clarification.

| # | Sev | Issue | Where |
|---|-----|-------|-------|
| 1 | Warn | Clarify signing key rotation and revocation | "Signing key management" section |

Action items: None, as there are no blockers.

Warnings:

| # | Issue | Fine as follow-up PRs |
|---|-------|-----------------------|
| 1 | Clarify signing key rotation and revocation | Yes |

@github-actions github-actions Bot added the size/L Large PR (< 500 lines) label May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

🤖 AI Agent: security-scanner — View details

No security issues found.

@github-actions

github-actions Bot commented May 13, 2026

🤖 AI Agent: test-generator — View details

Test coverage analysis is not applicable to this pull request as it only introduces documentation changes. No code files were modified.

@github-actions

github-actions Bot commented May 13, 2026

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

Documentation is in sync.

@Ricky-G Ricky-G changed the title Docs/governance event sink spi proposal feat: Docs/governance event sink spi proposal May 13, 2026
@github-actions

github-actions Bot commented May 13, 2026

🤖 AI Agent: breaking-change-detector — View details

No breaking changes detected.

@github-actions

github-actions Bot commented May 13, 2026

PR Review Summary

| Check | Status | Details |
|-------|--------|---------|
| 🔍 Code Review | ⚠️ Warning | See details |
| 🛡️ Security Scan | ✅ Passed | No issues found |
| 🔄 Breaking Changes | ✅ Passed | No issues found |
| 📝 Docs Sync | ✅ Passed | No issues found |
| 🧪 Test Coverage | ✅ Completed | Analysis complete |

Verdict: ⚠️ Ready for human review

@github-actions

🟡 Contributor Check: MEDIUM

| Check | Result |
|-------|--------|
| Profile | MEDIUM |
| Credential | NONE |
| Overall | MEDIUM |

Automated check by AGT Contributor Check.

@github-actions github-actions Bot added the needs-review:MEDIUM Contributor check flagged MEDIUM risk label May 13, 2026
@Ricky-G Ricky-G force-pushed the docs/governance-event-sink-spi-proposal branch from 1a63c27 to 22296c2 Compare May 13, 2026 04:48
@github-actions

github-actions Bot commented May 13, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Contributor

This is a great initiative and will surely help our customers a ton. I have a couple of questions:

  • If GovernanceEventSink, StdOutEventSink, and OtlpEventSink all live on the same host where the agent policy engine runs, do we need GovernanceEventSink in the middle, fanning out to the different event sinks?

  • If yes, can you explain in more detail what the fan-out to StdOutEventSink and OtlpEventSink looks like?

  • For someone new to the OpenTelemetry Collector: how does it pick up events from the OtlpEventSink and send them to Datadog or Splunk? I am assuming there is a queueing mechanism involved here.

  • It would be good to write a detailed end-to-end flow, including where each component lives and which other internal components are involved. Since the data flows across different endpoints, there can be unforeseen crashes. How would the components handle events as they recover from a crash?

Contributor Author

Hi @amolr, great feedback, thank you. To answer those points:

  1. It's not a separate process, it's the SPI (the Protocol/interface) that StdoutEventSink and OtlpEventSink both implement, plus a tiny in-process dispatcher the kernel emits to. We need the dispatcher because policy can require multiple sinks active at once (e.g. siem + audit), and we want one emit() call from the kernel to fan out without the kernel knowing which sinks are wired in. Same-process, no network hop, just function calls.

  2. In-process and parallel:
    async def emit(event):
        await asyncio.gather(*(s.emit(event) for s in sinks), return_exceptions=True)
    Each sink has its own bounded queue, retry policy, and health state. A slow or failing sink doesn't block the others. With both Stdout and Otlp registered, every event writes a JSON line to stdout and pushes an OTLP record concurrently.

  3. Yes, queueing lives in the Collector, not in AGT. Quick primer:

The Collector is a separate vendor-neutral process (sidecar, daemonset, container — customer's choice) with three pipeline stages:

Receivers - otlp listens on gRPC 4317 / HTTP 4318. OtlpEventSink pushes here.
Processors - batch and memory_limiter. Disk-backed durability comes from the exporter sending queue when it is backed by the file_storage extension, rather than from a processor.
Exporters - vendor-specific (datadog, splunk_hec, azuremonitor, awscloudwatchlogs, etc.). Each handles vendor auth and rate limiting.

Typical config:

    receivers: { otlp: { protocols: { grpc: {}, http: {} } } }
    processors:
      memory_limiter: { check_interval: 1s, limit_mib: 512 }
      batch: {}
    exporters:
      datadog:
        api: { key: "${DD_KEY}" }
      splunk_hec: { token: "${SPLUNK_HEC}" }
    service:
      pipelines:
        logs: { receivers: [otlp], processors: [memory_limiter, batch], exporters: [datadog, splunk_hec] }

AGT pushes one OTLP stream; the Collector handles vendor fan-out, batching, retry, queue.
  4. Very good point. I will add this as a section to the proposal.
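
For readers who want to see the in-process fan-out from points 1 and 2 end to end, here is a small runnable sketch. It is an illustration under assumptions, not the proposal's code: the `GovernanceEventSink` signature shown, the `ListSink` stand-in, the `Dispatcher` class, its `failures` counter, and the event fields are all hypothetical. It only demonstrates that one `emit()` call reaches every sink and that a failing sink does not stop the others. Per-sink queues and retries are left out.

```python
import asyncio
import json
from typing import Any, Protocol


class GovernanceEventSink(Protocol):
    """Hypothetical shape of the SPI; the proposal's real signature may differ."""

    name: str

    async def emit(self, event: dict[str, Any]) -> None: ...


class ListSink:
    """Stand-in sink that records each event as a JSON line in memory."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.lines: list[str] = []

    async def emit(self, event: dict[str, Any]) -> None:
        if self.fail:
            raise ConnectionError(f"{self.name} is unreachable")
        self.lines.append(json.dumps(event, sort_keys=True))


class Dispatcher:
    """Fans one emit() call out to every registered sink, in-process."""

    def __init__(self, sinks: list[GovernanceEventSink]) -> None:
        self.sinks = sinks
        self.failures: dict[str, int] = {s.name: 0 for s in sinks}

    async def emit(self, event: dict[str, Any]) -> None:
        # return_exceptions=True keeps one sink's error from cancelling the rest.
        results = await asyncio.gather(
            *(s.emit(event) for s in self.sinks), return_exceptions=True
        )
        for sink, result in zip(self.sinks, results):
            if isinstance(result, Exception):
                self.failures[sink.name] += 1


async def demo() -> tuple[ListSink, Dispatcher]:
    ok = ListSink("stdout")
    broken = ListSink("otlp", fail=True)  # simulates an unreachable endpoint
    dispatcher = Dispatcher([ok, broken])
    # Event fields are made up for the example.
    await dispatcher.emit({"category": "policy.decision", "outcome": "deny"})
    return ok, dispatcher


ok_sink, dispatcher = asyncio.run(demo())
```

After the run, the healthy sink holds the event and the dispatcher has counted one failure for the broken sink, which is the isolation property described in point 2.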

Ricky-G added 4 commits May 14, 2026 15:00
High-level design for the pluggable governance event sink interface (issue #1999). Covers the core concept, mermaid flow, event categories, envelope, policy integration, where it lives, and per-language interface sketches (Python, .NET, Rust, TypeScript, Go).

Signed-off-by: Ricky Gummadi <[email protected]>
@Ricky-G Ricky-G force-pushed the docs/governance-event-sink-spi-proposal branch from 75b7e20 to 20e2037 Compare May 14, 2026 03:00
@Ricky-G Ricky-G force-pushed the docs/governance-event-sink-spi-proposal branch from a6df5db to 72e6e17 Compare May 14, 2026 08:21
…-sink-spi-proposal

Signed-off-by: Ricky Gummadi <[email protected]>

# Conflicts:
#	.cspell-repo-terms.txt
Labels

documentation Improvements or additions to documentation needs-review:MEDIUM Contributor check flagged MEDIUM risk size/L Large PR (< 500 lines)

2 participants