[Sandbox] Kepimetheus #315

Open
2 tasks done
randomk opened this issue Dec 3, 2024 · 2 comments
Labels
New New Application

Comments

@randomk

randomk commented Dec 3, 2024

Application contact emails

Kepimetheus is an open-source natural language interface for Prometheus monitoring that lets users write PromQL queries in any language. It integrates with Grafana and currently uses Amazon Bedrock for query translation, with plans to support multiple AI providers. The project aims to simplify Kubernetes monitoring by removing the barrier of PromQL syntax complexity.
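
The translation step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names `PROMPT_TEMPLATE`, `build_prompt`, `translate`, and `fake_provider` are hypothetical, and the AI provider is modelled as a plain callable so that Amazon Bedrock or any other backend could be plugged in.

```python
from typing import Callable

# Hypothetical instruction text; the project's real prompt may differ.
PROMPT_TEMPLATE = (
    "Translate the following question into a single PromQL query. "
    "Reply with the query only.\n"
    "Question: {question}"
)


def build_prompt(question: str) -> str:
    """Wrap a natural-language question in the instruction template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def translate(question: str, provider: Callable[[str], str]) -> str:
    """Send the prompt to a provider and return the cleaned-up PromQL string."""
    reply = provider(build_prompt(question))
    # Models often wrap answers in backticks or whitespace; strip both.
    return reply.strip().strip("`").strip()


def fake_provider(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no network access here)."""
    return "`sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)`\n"


if __name__ == "__main__":
    print(translate("CPU usage per pod over the last 5 minutes", fake_provider))
```

Keeping the provider behind a simple callable is one way the planned multi-provider support could be accommodated without changing the calling code.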

Project Summary

AI-powered cloud-native operations platform that enhances Kubernetes workload management through natural language interfaces and intelligent automation.

Project Description

Kepimetheus aligns with CNCF's mission by making cloud native observability more accessible. It reduces the learning curve for Prometheus, which is already a graduated CNCF project, and integrates with other CNCF ecosystem tools such as Kubernetes and Grafana.

Org repo URL (provide if all repos under the org are in scope of the application)

https://github.com/kepimetheus

Project repo URL in scope of application

https://github.com/kepimetheus/kepimetheus

Additional repos in scope of the application

https://github.com/kepimetheus/kepimetheus.github.io
https://github.com/kepimetheus/kepimetheus
https://github.com/kepimetheus/roadmap
https://github.com/kepimetheus/kepimetheus-nlppromqlplugin-app
https://github.com/kepimetheus/helm-charts
https://github.com/kepimetheus/aws-reinvent-presentation

Website URL

https://kepimetheus.github.io/

Roadmap

https://github.com/kepimetheus/roadmap/tree/main

Roadmap context

Our development roadmap is structured in an innovative "game-like" progression system, divided into distinct Worlds that represent major platform evolution phases:

World 0 - Current Release: "Ground Zero"

  • Establishing cloud-native foundations with AWS Bedrock
  • Focus on developer experience and immediate value delivery
  • Integration with familiar tools (Grafana, Helm) to reduce adoption friction

World 1 - Multi-LLM Evolution

  • Expanding beyond single AI model dependency
  • Introducing intelligent model selection based on workload
  • Supporting local deployment options for enhanced flexibility
  • Boss Fight Challenge: Automatic model selection system that optimizes for both cost and performance
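
As a rough illustration of what cost- and performance-aware model selection could look like, the sketch below picks the cheapest model that meets a quality threshold. Everything here is an assumption made for illustration: the `ModelOption` type, the `select_model` function, and the cost and quality figures are hypothetical and do not come from the project or from any provider's pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_query: float  # illustrative units, not real pricing
    quality: float         # illustrative score in [0, 1]


def select_model(options: List[ModelOption], min_quality: float) -> ModelOption:
    """Pick the cheapest model whose quality score meets the threshold."""
    eligible = [m for m in options if m.quality >= min_quality]
    if not eligible:
        raise ValueError("no model meets the requested quality threshold")
    return min(eligible, key=lambda m: m.cost_per_query)


if __name__ == "__main__":
    # Made-up candidates; real scores would have to be measured per workload.
    candidates = [
        ModelOption("small", cost_per_query=0.1, quality=0.6),
        ModelOption("medium", cost_per_query=0.4, quality=0.8),
        ModelOption("large", cost_per_query=1.0, quality=0.95),
    ]
    print(select_model(candidates, min_quality=0.75).name)
```

A real system would need measured quality scores per workload type, which is presumably where most of the difficulty in this milestone lies.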

World 2 - Operational Intelligence

  • Moving from passive monitoring to active insights
  • Real-time analysis of deployment patterns
  • Advanced Kubernetes event correlation
  • Boss Fight Challenge: Implementing predictive scaling that actually works

World 3 - Autonomous Operations

  • Transition from insights to automated actions
  • Pattern-based problem resolution
  • Historical data-driven decision making
  • Boss Fight Challenge: Achieving reliable self-healing capabilities

World 4 - MLOps Mastery

  • Full GitOps integration for AI workflows
  • End-to-end pipeline analytics
  • Custom model training integration
  • Boss Fight Challenge: Complete MLOps pipeline with measurable ROI

Each World builds upon previous achievements, with "Boss Fight" challenges representing significant technical milestones that unlock new platform capabilities. The progression system is designed to deliver immediate value while constantly evolving toward more sophisticated AI-driven operations.

Contributing Guide

https://github.com/kepimetheus/kepimetheus.github.io/blob/main/CONTRIBUTING.md

Code of Conduct (CoC)

https://github.com/kepimetheus/kepimetheus.github.io/blob/main/CODE_OF_CONDUCT.md

Adopters

https://rtspartners.com.br/

Contributing or Sponsoring Org

No response

Maintainers file

https://github.com/kepimetheus/kepimetheus.github.io/blob/main/MAINTAINERS.md

IP Policy

  • If the project is accepted, I agree the project will follow the CNCF IP Policy

Trademark and accounts

  • If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF

Why CNCF?

Our project's mission to enhance cloud-native development through AI-driven operations aligns perfectly with CNCF's commitment to fostering innovation in cloud native computing. Here's why we believe CNCF is the ideal home for our project:

Ecosystem Alignment:

  • Our integration with key CNCF projects (Kubernetes, Prometheus, Helm) demonstrates our commitment to the cloud-native ecosystem
  • We share CNCF's vision of making cloud native computing ubiquitous
  • Our project bridges the gap between traditional cloud-native tools and emerging AI technologies

Community Benefits:

  • CNCF's neutral governance model ensures our project remains vendor-neutral and community-driven
  • Access to CNCF's extensive technical expertise will accelerate our development and adoption
  • CNCF's global reach will help us build a diverse, inclusive contributor base
  • Integration with CNCF events and marketing will increase project visibility

Technical Alignment:

  • Our architecture follows cloud-native principles: containerized, scalable, and observable
  • Focus on automation and AI aligns with CNCF's forward-looking approach to cloud computing
  • Commitment to open standards and interoperability matches CNCF's technical values

Innovation Contribution:

  • Pioneering the integration of LLMs in cloud-native operations
  • Advancing the state of automated operations and self-healing systems
  • Contributing to the evolution of MLOps in cloud-native environments

Long-term Value:

  • CNCF incubation will help establish best practices for AI in cloud-native tools
  • Collaboration opportunities with other CNCF projects will drive innovation
  • CNCF's support will ensure sustainable, long-term project growth

We believe our project can significantly contribute to CNCF's mission while benefiting from its extensive ecosystem, making this partnership mutually beneficial for both the project and the broader cloud-native community.

Benefit to the Landscape

Our project enhances the CNCF landscape by introducing intelligent automation at the intersection of cloud-native operations and AI, addressing several key gaps:

Unique Differentiators:

  • Native integration of Large Language Models (LLMs) into cloud-native operations
  • Progressive deployment approach that grows with organization's needs
  • Automatic workload-based AI model selection for cost and performance optimization
  • Natural language interfaces for cloud-native tooling

Enhancement to Existing Projects:

  1. Kubernetes Operations
     • Automated root cause analysis using AI
     • Predictive scaling based on historical patterns
     • Self-healing recommendations from operational data
  2. Observability Stack
     • Natural language querying for Grafana
     • AI-powered anomaly detection
     • Automated correlation of events across systems
  3. GitOps & CI/CD
     • AI-enhanced pipeline analytics
     • Automated runbook generation
     • Intelligent deployment analysis

Addressing Current Challenges:

  • Reduces operational complexity through AI automation
  • Lowers barrier to entry for cloud-native tooling
  • Bridges skill gaps through natural language interfaces
  • Enables predictive operations instead of reactive responses

Future Innovation:

  • Establishes patterns for AI integration in cloud-native tools
  • Creates framework for LLM-powered operational automation
  • Advances MLOps practices in cloud-native environments

Our project fills a crucial gap in the landscape by making cloud-native operations more intelligent, accessible, and automated while building upon and enhancing existing CNCF projects rather than replacing them.

Cloud Native 'Fit'

Our project embodies core cloud native principles and naturally fits into multiple critical areas of the CNCF landscape:

Orchestration & Management

  • Deep integration with Kubernetes for automated operations
  • AI-driven workload management and scaling
  • Self-healing capabilities aligned with cloud native resilience patterns
  • Built-in support for multi-cluster environments

Observability & Analysis

  • Native Grafana integration for visualization
  • Real-time analytics and insights
  • AI-powered anomaly detection
  • Natural language querying of metrics
  • Prometheus integration for metrics collection
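
Once a question has been translated into PromQL, the query still has to reach Prometheus. The sketch below builds a request URL for Prometheus' instant-query HTTP endpoint (`/api/v1/query`). The function name `instant_query_url` and the base URL are assumptions for illustration; how Kepimetheus or its Grafana plugin actually issues queries is not stated in this application.

```python
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, time: Optional[float] = None) -> str:
    """Build a URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    params = {"query": promql}
    if time is not None:
        # Optional evaluation timestamp (Unix seconds).
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    # The base URL is an assumption; point it at your own Prometheus server.
    print(instant_query_url("http://localhost:9090", 'up{job="kubelet"}'))
```

URL-encoding the query matters because PromQL routinely contains braces, quotes, and equals signs.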

Platform Engineering

  • GitOps-native deployment workflows
  • Infrastructure as Code principles
  • Automated configuration management
  • Helm-based deployment automation

App Definition & Development

  • Cloud native CI/CD integration
  • Support for microservices architecture
  • Container-first development approach
  • API-driven architecture

Machine Learning

  • Built-in AWS Bedrock integration
  • Multi-LLM support (Claude, Llama 2)
  • MLOps pipeline automation
  • Custom model training capabilities

Cloud Native Principles Embodied:

  1. Containerization
     • All components are containerized
     • Kubernetes-native deployment
     • Container-optimized architecture
  2. Service Orchestration
     • Dynamic service discovery
     • Automated scaling
     • Load balancing support
  3. Microservices
     • Loosely coupled architecture
     • API-first design
     • Independent scalability
  4. Immutable Infrastructure
     • GitOps workflows
     • Infrastructure as Code
     • Declarative configurations
  5. Observability
     • Built-in monitoring
     • Distributed tracing
     • Logging aggregation
  6. Automation
     • Automated deployment
     • Self-healing capabilities
     • AI-driven operations
  7. Scalability
     • Horizontal scaling
     • Cloud provider agnostic
     • Resource optimization

The project's architecture and features align perfectly with CNCF's Technical Oversight Committee's definition of cloud native computing, making it a natural fit within the landscape while bringing innovative AI capabilities to enhance existing patterns and practices.

Cloud Native 'Integration'

Our project has strong integrations with several CNCF projects, with Helm being our primary deployment method. Here's our detailed integration landscape:

Core Dependencies:

Helm (Primary Deployment Method)

  • Complete Helm chart repository structure
  • Production-ready default configurations
  • Built-in Prometheus integration
  • Auto-configuration capabilities
  • Single command deployment: helm install my-release ./chart
  • Comprehensive value customization
  • Automated sidecar injection
  • Rolling update support
  • Built-in health checks
  • Resource management presets
  • Multi-environment configurations
  • Horizontal Pod Autoscaling setup

Additional CNCF Integrations:

Kubernetes

  • Native resource management
  • Custom Resource Definitions (CRDs) for AI operations
  • Service account configuration
  • RBAC policies
  • Pod security policies
  • Network policies
  • Resource quotas

Prometheus

  • Metric collection integration
  • Custom metrics for AI operations
  • Alert rule templates
  • Recording rules
  • Service discovery

Grafana

  • Custom dashboard templates
  • Natural language query panels
  • AI insights visualization
  • Metric correlation views
  • Alert visualization

Future Integration Plans:

  • OpenTelemetry for enhanced observability
  • Linkerd for service mesh capabilities
  • Flux for GitOps workflows
  • KEDA for event-driven autoscaling
  • Keptn for automated operations

Our Helm integration specifically demonstrates our commitment to cloud native best practices by providing:

  1. Zero-configuration deployments
  2. Production-ready defaults
  3. Comprehensive documentation
  4. Multi-environment support
  5. Security best practices
  6. Resource optimization

The project is designed to enhance and extend these CNCF technologies rather than replace them, creating a seamless integration that brings AI capabilities to existing cloud native workflows.

Cloud Native Overlap

Our project has some functional overlaps with existing CNCF projects, which we acknowledge and explain how we complement rather than compete:

Monitoring & Observability Overlap:

Prometheus & Grafana

  • Overlap: Our Grafana plugin provides visualization capabilities
  • Differentiation: We enhance rather than replace by adding:
     • Natural language query interface
     • AI-powered metric correlation
     • Automated insight generation
     • One-click visualization features

Root Cause Analysis:

Keptn

  • Overlap: Automated issue detection and resolution
  • Differentiation: Our approach uses:
     • LLM-based pattern recognition
     • Historical data correlation
     • Context-aware recommendations
     • Predictive analysis capabilities

MLOps & Pipeline:

Kubeflow

  • Overlap: Machine learning operations
  • Differentiation: We focus on:
     • LLM-specific optimizations
     • Cloud-native AI integration
     • Automated model selection
     • Pay-per-query optimization

Automation & Operations:

Argo

  • Overlap: Workflow automation and GitOps
  • Differentiation: We add:
     • AI-driven deployment analysis
     • Intelligent rollback decisions
     • Automated runbook generation
     • Predictive scaling recommendations

In each case, we've designed our project to integrate with and enhance these existing tools rather than replace them, focusing on adding value through AI-powered capabilities while maintaining compatibility with established workflows.

Similar projects

N/A

Landscape

No, we are not currently listed on the CNCF Landscape. We are in the process of applying for inclusion.

Business Product or Service to Project separation

N/A

Project Domain Technical Review

Yes, we have completed the Day 0 portion of the General Technical Review questionnaire. You can find our completed Day 0 documentation at: https://github.com/kepimetheus/kepimetheus.github.io/blob/main/day-zero.md

We are currently in the process of scheduling presentations with:

  • TAG App-Delivery: For our Helm integration and deployment automation capabilities
  • TAG Runtime: For our Kubernetes and container runtime integrations

The presentations will focus on demonstrating how our AI-powered automation enhances cloud-native operations while maintaining compatibility with existing CNCF projects and patterns.

CNCF Contacts

No response

Additional information

Our project demonstrates innovative integration between cloud-native tooling and AI capabilities while maintaining strong alignment with CNCF principles:

  1. Production Readiness
     • Complete Helm chart with production defaults
     • Comprehensive monitoring integration
     • Built-in high availability support
     • Documented security practices
  2. Community Focus
     • Active Discord community
     • Regular contributor meetings
     • Open decision-making process
     • Documented contribution guidelines
  3. Innovation
     • Natural language interface for cloud-native tools
     • AI-driven operational automation
     • Predictive scaling capabilities
     • Progressive maturity model ("Worlds")
  4. Ecosystem Enhancement
     • Native integration with key CNCF projects
     • Focus on extending rather than replacing
     • Clear project boundaries and scope
     • Well-defined integration points
  5. Early Adoption
     • Being used by several companies in production
     • Active feedback loop with early adopters
     • Growing community of contributors
     • Clear path to production use

We believe our project brings unique value to the CNCF landscape by making cloud-native operations more accessible through AI while maintaining the high standards expected of cloud-native tools.

@randomk randomk added the New New Application label Dec 3, 2024
@krook krook changed the title from "[Sandbox] Kepimetheus ?assignees=" to "[Sandbox] Kepimetheus" on Dec 8, 2024
@dims
Member

dims commented Dec 19, 2024

@randomk is this prompt the heart of the project?

https://github.com/kepimetheus/kepimetheus/blob/main/app.py#L59-L67

@randomk
Author

randomk commented Dec 20, 2024

Thank you for your observation, @dims.
While that prompt is a project component, Kepimetheus represents a fundamental shift in cloud-native systems interaction. We're not just simplifying Prometheus queries – we're revolutionizing how teams approach observability and operations in Kubernetes environments.
By uniting Large Language Models with Grafana and Prometheus, Kepimetheus addresses a critical accessibility gap in cloud-native tooling. Our platform transforms complex metrics queries into natural language conversations, democratizing operational insights for users across all technical levels.
The code you've highlighted is just one piece of our broader vision. Our roadmap encompasses multi-LLM orchestration, autonomous operations, and advanced MLOps capabilities. Each feature builds toward an AI-enhanced future where cloud-native operations become more intuitive and robust.
Kepimetheus serves both immediate operational needs and long-term innovation goals. It empowers teams to leverage AI for enhanced decision-making while maintaining scalability and resilience. We're not just solving today's monitoring challenges but building the foundation for tomorrow's intelligent cloud-native operations.
I'd be excited to discuss how Kepimetheus is already solving real-world challenges for our users. This project represents more than technology – it's a step toward brilliant cloud-native operations.

Projects
Status: 📋 New
Development

No branches or pull requests

2 participants