Checkpoints
Internal Workload Authentication (X.509 mTLS & JWT)
1. Auth design proposal
Publish the design document covering X.509 mTLS (with SPIRE and file-based certificate modes), Keycloak authentication, Keycloak RBAC authorization, and cloud provider federation.
2. SPIRE infrastructure deployment
Add SPIRE Server (StatefulSet), Agent (DaemonSet), and Controller Manager (Deployment) manifests to agentcube-system. Create ServiceAccounts, RBAC, ConfigMaps for server/agent configuration, and ClusterSPIFFEID CRDs for declarative workload registration. SPIFFE IDs follow the Istio convention (spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>).
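Every later checkpoint authorizes peers by SPIFFE ID, so it helps to build and parse the Istio-style layout in one place. Below is a minimal Go sketch. These names are illustrative assumptions, not names from the proposal:

- the helpers WorkloadSPIFFEID and ParseWorkloadSPIFFEID;
- the trust domain cluster.local;
- a Router service account named agentcube-router.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// WorkloadSPIFFEID builds an ID in the Istio-style layout
// spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>.
func WorkloadSPIFFEID(trustDomain, namespace, serviceAccount string) string {
	return fmt.Sprintf("spiffe://%s/ns/%s/sa/%s", trustDomain, namespace, serviceAccount)
}

// ParseWorkloadSPIFFEID splits an Istio-style SPIFFE ID back into its parts
// and rejects IDs that use any other path layout.
func ParseWorkloadSPIFFEID(id string) (trustDomain, namespace, serviceAccount string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", "", err
	}
	// SPIFFE IDs carry no port, user info, query or fragment.
	if u.Scheme != "spiffe" || u.Host == "" || u.Port() != "" || u.User != nil || u.RawQuery != "" || u.Fragment != "" {
		return "", "", "", fmt.Errorf("not a valid SPIFFE ID: %q", id)
	}
	parts := strings.Split(strings.TrimPrefix(u.Path, "/"), "/")
	if len(parts) != 4 || parts[0] != "ns" || parts[2] != "sa" || parts[1] == "" || parts[3] == "" {
		return "", "", "", fmt.Errorf("%q does not match /ns/<namespace>/sa/<service-account>", id)
	}
	return u.Hostname(), parts[1], parts[3], nil
}

func main() {
	// Trust domain and service-account name are illustrative assumptions.
	id := WorkloadSPIFFEID("cluster.local", "agentcube-system", "agentcube-router")
	td, ns, sa, err := ParseWorkloadSPIFFEID(id)
	fmt.Println(id, td, ns, sa, err)
}
```

Rejecting any other path shape means a misconfigured ClusterSPIFFEID template produces an error. Otherwise it would silently yield IDs that the peer checks in later checkpoints can never match.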
3. Certificate source abstraction and dynamic reloading
Add --mtls-cert-file, --mtls-key-file, and --mtls-ca-file flags to Router, WorkloadManager, and PicoD. Implement a unified TLS configuration loader using Go's standard crypto/tls package. Configure GetCertificate and GetClientCertificate callbacks for dynamic zero-downtime certificate reloading. For SPIRE mode, the spiffe-helper sidecar writes SVIDs to disk as PEM files, consumed by the same code path. Unit tests for the configuration loader.
4. mTLS: Router → WorkloadManager
Wrap the Router's HTTP client (in session_manager.go) with mTLS using the certificate source abstraction. Add an mTLS server config to WorkloadManager that accepts only Router connections. Gate behind the --enable-mtls flag. Unit tests for both sides.
5. mTLS: Router → PicoD
Add mTLS client config in the Router for proxying to PicoD sandboxes. Add mTLS server config to PicoD accepting only Router. For SPIRE mode, mount the spiffe-helper sidecar and SPIRE Agent socket into sandbox pods via WorkloadManager's pod builder. For file-based mode, mount cert files from K8s Secrets. Unit tests included.
6. Hybrid Router → PicoD Authentication Modes
Refactor the existing PicoD-Plain-Authentication code into a new, latency-optimized JWT mode. Add a --picod-auth-mode flag accepting mtls (default) or jwt. The jwt mode bypasses TLS handshake overhead by using an application-layer Authorization: Bearer header, specifically designed for Code Interpreters and agentic RL scenarios targeting ~100ms bootstrap latency. Unit tests for mode-based toggling.
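Validating --picod-auth-mode at flag-parse time means an unsupported value fails at startup, not on the first proxied request. Below is a sketch using the standard flag package. The type and helper names are hypothetical, and the per-mode transport setup is reduced to a print statement.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// picodAuthMode implements flag.Value so invalid values fail at startup.
type picodAuthMode string

const (
	authModeMTLS picodAuthMode = "mtls"
	authModeJWT  picodAuthMode = "jwt"
)

func (m *picodAuthMode) String() string { return string(*m) }

func (m *picodAuthMode) Set(v string) error {
	switch picodAuthMode(v) {
	case authModeMTLS, authModeJWT:
		*m = picodAuthMode(v)
		return nil
	default:
		return fmt.Errorf("must be %q or %q, got %q", authModeMTLS, authModeJWT, v)
	}
}

// parseAuthMode parses only the Router flag relevant to this checkpoint.
func parseAuthMode(args []string) (picodAuthMode, error) {
	fs := flag.NewFlagSet("router", flag.ContinueOnError)
	mode := authModeMTLS // mtls is the default per the checkpoint
	fs.Var(&mode, "picod-auth-mode", "Router to PicoD authentication: mtls or jwt")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return mode, nil
}

func main() {
	mode, err := parseAuthMode(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	switch mode {
	case authModeMTLS:
		fmt.Println("Router -> PicoD: mutual TLS")
	case authModeJWT:
		fmt.Println("Router -> PicoD: Authorization: Bearer header, no TLS handshake")
	}
}
```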
External Authentication & Authorization (Keycloak)
7. Keycloak deployment and realm setup
Add Keycloak Deployment and Service manifests to agentcube-system. Configure the agentcube realm with the agentcube-sdk, agentcube-router, and workloadmanager clients. Ensure AgentCube components support dynamic realm configuration via a CLI flag (--keycloak-realm) to natively support multi-tenant isolation. Create the sandbox:invoke, sandbox:manage, and admin realm roles with inheritance.
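One way to make --keycloak-realm drive everything is to derive the issuer, JWKS and token endpoints from it. A tenant's realm then changes in one place.

The sketch assumes Keycloak's current path layout (/realms/<realm>/protocol/openid-connect/...) without the legacy /auth prefix. The --keycloak-url flag and the helper names are hypothetical. The role inheritance itself belongs in the realm configuration (for example as composite roles), not in code.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"strings"
)

// realmEndpoints holds the OIDC endpoints AgentCube needs for one realm.
type realmEndpoints struct {
	Issuer string // expected "iss" claim
	JWKS   string // signing keys for local token verification
	Token  string // client_credentials and token-exchange grants
}

// endpointsFor derives a realm's endpoints from the Keycloak base URL.
func endpointsFor(baseURL, realm string) (realmEndpoints, error) {
	if realm == "" {
		return realmEndpoints{}, fmt.Errorf("--keycloak-realm must not be empty")
	}
	if _, err := url.ParseRequestURI(baseURL); err != nil {
		return realmEndpoints{}, err
	}
	issuer := strings.TrimRight(baseURL, "/") + "/realms/" + url.PathEscape(realm)
	oidc := issuer + "/protocol/openid-connect"
	return realmEndpoints{Issuer: issuer, JWKS: oidc + "/certs", Token: oidc + "/token"}, nil
}

func main() {
	// --keycloak-url is a hypothetical companion flag; --keycloak-realm is
	// the flag named in the checkpoint.
	base := flag.String("keycloak-url", "http://keycloak.agentcube-system.svc:8080", "Keycloak base URL")
	realm := flag.String("keycloak-realm", "agentcube", "Keycloak realm for this deployment or tenant")
	flag.Parse()
	ep, err := endpointsFor(*base, *realm)
	if err != nil {
		fmt.Println("invalid Keycloak settings:", err)
		return
	}
	fmt.Println(ep.Issuer, ep.JWKS, ep.Token)
}
```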
8. Router JWT verification and Role-based Authorization
Focus solely on edge validation. Add Keycloak JWT validation middleware to Router: periodic JWKS key fetching, local caching, and token signature verification. Extract the realm_access.roles claim and add authorization middleware mapping Router endpoints to required Keycloak roles (e.g., sandbox:invoke). Return 403 Forbidden for insufficient permissions. Gate behind the --enable-external-auth flag. Unit tests for validation and role enforcement.
9. User Identity Propagation (Token Exchange)
Focus on internal identity propagation. Add Token Exchange logic (grant_type=urn:ietf:params:oauth:grant-type:token-exchange) to swap the validated incoming SDK token for a downstream token scoped to WorkloadManager and PicoD audiences. Forward the exchanged JWT in the Authorization: Bearer header over the internal channels. Update WorkloadManager and PicoD to validate the exchanged JWT using local JWKS fetchers. Unit tests for token exchange and validation.
10. Resource-Level Access Control (RLAC)
Focus on tenant isolation. Update WorkloadManager to read the sub claim from the exchanged JWT (representing the end user) and apply it as an agentcube.io/owner label when provisioning sandbox CRDs and Pods. Update Router to query the target sandbox's labels before proxying traffic, ensuring the caller's sub matches the sandbox's owner label. Return 403 Forbidden for unauthorized sandbox access. Unit tests for resource tenant isolation.
11. Python SDK authentication support
Add ServiceAccountAuth and TokenAuth classes to the Python SDK. ServiceAccountAuth handles automatic token acquisition via the client_credentials grant and re-authentication on expiry. TokenAuth accepts a pre-obtained token as-is. Attach an Authorization: Bearer header to all requests when auth is configured. Unit tests included.
Stretch Goal
12. Cloud provider identity federation
Configure Keycloak identity brokering for AWS IAM (OIDC), Google Cloud Identity (OAuth2), and Azure AD (SAML/OIDC). No AgentCube code changes: Keycloak admin configuration and documentation only.
Testing and Documentation
13. E2E tests
End-to-end tests covering: mTLS between internal components, Keycloak token issuance and Router validation, role enforcement, RLAC ownership isolation, Token Exchange, and the Hybrid mtls/jwt PicoD auth modes.
14. User guide and developer documentation
Document setup instructions for SPIRE and file-based certificate modes, Keycloak configuration, multi-tenant realm configuration, SDK usage, role management, RLAC validation, and cloud provider federation. Add to docs/.
Establishing AgentCube's Authentication and Authorization Capabilities
volcano-sh/agentcube: Establish authentication and authorization (2026 Term 1)
AgentCube currently lacks a unified authentication and authorization model. This issue tracks the checkpoints for implementing one.
Design proposal: auth-proposal.md