Skyflo.ai is an AI co-pilot for Cloud & DevOps that unifies Kubernetes operations and CI/CD systems (starting with Jenkins) behind a natural-language interface. The Engine runs a LangGraph workflow with a human-in-the-loop safety gate for any changes, executes tools via the MCP server, and streams sanitized, real-time SSE updates to the UI.
Skyflo.ai follows a clean architecture with the following components:
- Engine (`/engine`): Core backend service that turns natural language into safe Cloud Native operations using a LangGraph workflow.
  - LangGraph-based execution with nodes: `entry→model→gate→final`
  - LLM integration via LiteLLM with controlled auto‑continue and iteration limits
  - Approval policy for WRITE operations with human-in-the-loop safety
  - Manages authentication, conversations, titles, persistence, and rate limiting
  - Communicates with the MCP server for tool discovery and execution
  - Real-time Server-Sent Events (SSE) streaming for tokens, tool progress, and workflow events
  - Uses Redis pub/sub internally for streaming and stop signals; optional Postgres checkpointer for resilience
- MCP Server (`/mcp`): MCP server that exposes standardized cloud-native tools for the Engine to execute.
  - Built with FastMCP; single entrypoint registers tools and metadata
  - Tool categories: `kubectl`, `argo` (Rollouts), `helm`, and `jenkins` (CI)
  - Safety checks, validation, and clear parameter docs (Pydantic)
  - Uses FastMCP Streamable HTTP transport for Engine communication; automatic tool discovery and registration
  - Executes commands securely against cluster resources
- Command Center (`/ui`): UI command center delivering real-time insights and control for cloud native operations.
  - Next.js 14 application with a responsive Markdown-based chat UI
  - Streams tokens and tool events from the Engine via SSE
  - Server-side API routes (BFF) proxy to the Engine, forwarding cookies/auth headers
  - Displays real-time workflow progress, tool results, and terminal-like outputs
  - Manages conversations, profile/password, and team administration (roles, invitations)
- Kubernetes Controller (`/kubernetes-controller`): Kubernetes controller orchestrating secure, scalable deployments in cloud native environments.
  - Implements custom resource definitions (CRDs) for the Kubernetes operator
  - Manages deployment and configuration of all Skyflo.ai components within Kubernetes clusters
  - Provides unified deployment through a single `SkyfloAI` custom resource
  - Handles dynamic configuration updates and scaling of components
  - Implements secure RBAC management for cluster interactions
  - Supports namespace isolation and fine-grained access control
  - Manages standard Kubernetes resources (Deployments, Services) for UI and API components
  - Monitors and reports component health through status conditions
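
The controller's health reporting through status conditions can be illustrated with a minimal Python sketch. It follows the general Kubernetes convention that a condition's transition timestamp changes only when its status changes. The condition type `EngineReady`, the reasons, and the helper names `set_condition` and `is_healthy` are hypothetical and not taken from the Skyflo.ai controller, which may be structured quite differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Condition:
    type: str                  # e.g. "EngineReady" (hypothetical name)
    status: str                # "True", "False", or "Unknown"
    reason: str
    message: str
    last_transition_time: str  # ISO 8601 timestamp


def set_condition(conditions, type_, status, reason, message, now=None):
    """Insert or update a condition; bump the timestamp only on a status change."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    for cond in conditions:
        if cond.type == type_:
            if cond.status != status:
                cond.status = status
                cond.last_transition_time = stamp
            # Reason and message are refreshed on every update.
            cond.reason = reason
            cond.message = message
            return cond
    cond = Condition(type_, status, reason, message, stamp)
    conditions.append(cond)
    return cond


def is_healthy(conditions, required):
    """True only if every required condition type is present with status "True"."""
    by_type = {c.type: c.status for c in conditions}
    return all(by_type.get(t) == "True" for t in required)
```

A reconciler built this way would call `set_condition` once per component on each pass and derive an overall readiness flag from `is_healthy`.
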
Skyflo.ai employs a graph-based workflow powered by LangGraph. The workflow is organized into the following phases:
- Model Phase (Planning):
  - Analyzes the user's natural language to determine intent
  - Performs lightweight discovery when needed to ground the plan
  - Produces structured tool calls for the next phase
- Tool Gate (Execution):
  - Executes MCP tools (`kubectl`, `argo`, `helm`) with validated parameters
  - Requires explicit approval for any WRITE/mutating operation
  - Resolves dynamic parameters from previous steps and supports recursive operations
  - Streams progress/results back to the model and UI
- Verification Phase:
  - Evaluates outcomes against the original intent and summarizes results
  - Decides whether to auto‑continue, request approval, or stop
  - If issues are detected, routes context back to the model phase for refinement
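
The plan, gate, and continue-or-stop loop described in these phases can be sketched in plain Python. This is not the LangGraph API and not the Engine's actual code. The `ToolCall` type, the `run_workflow` function, the set of verbs treated as WRITE operations, and the default iteration limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed classification: which verbs count as WRITE is illustrative only.
WRITE_VERBS = {"apply", "delete", "scale", "patch", "install", "upgrade", "rollback"}


@dataclass
class ToolCall:
    tool: str                          # e.g. "kubectl"
    verb: str                          # e.g. "get" or "delete"
    args: dict = field(default_factory=dict)

    @property
    def is_write(self) -> bool:
        return self.verb in WRITE_VERBS


def run_workflow(plan, execute, approve, max_iterations=5):
    """Alternate between planning and gated execution.

    plan(history) returns a list of ToolCall objects (empty when done),
    execute(call) returns a result, and approve(call) returns a bool.
    Returns a (status, history) pair.
    """
    history = []
    for _ in range(max_iterations):
        calls = plan(history)              # model phase
        if not calls:
            return "final", history        # nothing left to do
        for call in calls:                 # tool gate
            if call.is_write and not approve(call):
                history.append((call, "denied"))
                return "stopped", history
            history.append((call, execute(call)))
    return "iteration_limit", history      # auto-continue budget exhausted
```

Read-only calls run without interruption, while a denied WRITE call ends the run before anything mutating is executed. The iteration cap plays the role of the auto‑continue limit.
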
Skyflo.ai provides the following features:

- Natural Language Kubernetes Management: Perform operations via natural language
- Tool Execution via MCP: Standardized tools for `kubectl`, `argo` (Rollouts), and `helm`
- Human-in-the-Loop Safety: Explicit approvals required for WRITE operations
- SSE Streaming: Live tokens, tool progress, and results
- Token Usage & Metrics: Real-time tracking of token consumption, TTFT, and TTR latency
- Resource Discovery: Automatic discovery to ground actions
- Multi-stage Operations: Complex workflows broken into manageable steps
- Context-aware Responses: Maintains conversation history
- Conversation Persistence: Saved timelines with title generation
- Team Administration: Roles, invitations, and member management
- Rate Limiting: Protects services via Redis-backed limiter
- Optional Checkpointer: Postgres-backed workflow resilience
- Progressive Delivery: Argo Rollouts support
- Package Management: Helm install/upgrade/rollback
- External Integrations: Jenkins CI with secure credentials and integration-aware tools
- Terminal-style Output: Live command output visualization
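
As an illustration of the rate-limiting feature, here is a minimal fixed-window limiter in Python. It keeps its counters in memory as a stand-in for a Redis counter with a TTL. The class name `FixedWindowLimiter`, the fixed-window strategy, and the limits used are assumptions; the source does not state which algorithm or library Skyflo.ai uses.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for a per-key Redis counter that expires after a window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock              # injectable so tests can control time
        self._counters = {}             # key -> (window_start, count)

    def allow(self, key):
        """Return True if the request fits in the current window for this key."""
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0       # window expired, like a TTL lapsing
        if count >= self.limit:
            self._counters[key] = (start, count)
            return False
        self._counters[key] = (start, count + 1)
        return True
```

For example, `FixedWindowLimiter(limit=2, window_seconds=10)` admits two calls per key in each ten-second window and rejects further calls until the window rolls over.
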
Skyflo.ai is built with the following technologies:

- Backend: Python 3.11+, FastAPI, LangGraph, LiteLLM, Tortoise ORM, Aerich, Redis, PostgreSQL
- Frontend: React, Next.js 14, TypeScript, Tailwind CSS
- AI/ML: LangGraph, LLM integration via LiteLLM
- Infrastructure: Kubernetes, Argo Rollouts, Helm
- Communication: Server-Sent Events (SSE), Redis pub/sub
- Security: JWT authentication via fastapi-users, role-based access (admin)
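
Of the communication mechanisms listed above, SSE has a wire format simple enough to sketch directly. The following Python encodes and parses frames according to the standard `event:` / `data:` layout. The event names `token` and `tool.progress`, the JSON payload shapes, and the function names are hypothetical; the Engine's real event schema is not described here.

```python
import json


def encode_sse(event, payload):
    """Serialize one event as an SSE frame with a JSON data field."""
    data = json.dumps(payload)          # json.dumps emits no raw newlines by default
    return f"event: {event}\ndata: {data}\n\n"


def parse_sse(stream_text):
    """Parse complete frames from text and yield (event, payload) pairs."""
    for frame in stream_text.split("\n\n"):
        event, data_lines = "message", []   # "message" is the SSE default type
        for line in frame.split("\n"):
            if not line or line.startswith(":"):
                continue                    # blank line or comment / keep-alive
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]           # one leading space is stripped
            if name == "event":
                event = value
            elif name == "data":
                data_lines.append(value)
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

The parser assumes it receives only complete frames. A real client reading from a socket would also need to buffer partial frames between reads.
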