AI-powered knowledge graph builder and RAG system for the OpenClaw and NemoClaw ecosystems
ClawGraph automatically crawls open-source repositories, extracts entities and relationships into a knowledge graph, and serves grounded answers via RAG, with every query passing through a 5-layer prompt injection defense. The interface is a Telegram bot that uses an OpenClaw skill to query the knowledge graph.
| Conceptual Component | Implementation |
|---|---|
| Orchestration Pipeline | Directed Acyclic Graph-based crawl→extract→embed→graph→curate pipeline with retry and scheduling |
| Custom MCP Server | Python MCP server exposing 9 GitHub API tools via the Model Context Protocol |
| AI Pipeline | Gemini Flash Lite (extraction/classification) + Gemini 2.5 Flash (reasoning/curation) |
| Knowledge Graph & RAG | Neo4j / NetworkX graph + embedding search → graph-grounded answer generation with Chain of Thought reasoning |
| Prompt Injection Prevention | 5-layer defense: sanitizer → classifier → canary tokens → output guardrails → audit |
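
The pipeline row above describes a DAG of five stages with retry. A minimal sketch of that idea follows, using only the standard library. The stage names come from this README; the dependency map, the `run_pipeline` function, and the `max_attempts` parameter are assumptions made for illustration and may not match the project's actual scheduler.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Stage names follow the README. The dependency map is an assumption:
# a simple linear chain, expressed as node -> set of predecessors.
STAGE_DEPENDENCIES: dict[str, set[str]] = {
    "crawl": set(),
    "extract": {"crawl"},
    "embed": {"extract"},
    "graph_update": {"embed"},
    "curate": {"graph_update"},
}


def run_pipeline(
    stages: dict[str, Callable[[], None]],
    max_attempts: int = 3,
) -> list[str]:
    """Run every stage after its dependencies, retrying failures."""
    completed: list[str] = []
    for name in TopologicalSorter(STAGE_DEPENDENCIES).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                stages[name]()
                break  # stage succeeded, move on to the next one
            except Exception:
                if attempt == max_attempts:
                    raise  # out of retries, abort the whole run
        completed.append(name)
    return completed
```

A production scheduler would likely add backoff between attempts and persist per-stage state; both are omitted here to keep the sketch short.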

    GitHub Repos ──▶ [GitHub MCP Server] ──▶ [Orchestration Pipeline] ──▶ [Knowledge Graph]
                          (Python)             crawl→extract→embed→       (Neo4j / NetworkX)
                                               graph_update→curate                │
                                                                                  ▼
    Telegram Bot ◀── [OpenClaw Skill] ◀── [RAG Engine] ◀── [Graph + Vector Retrieval]
                                    (Gemini 2.5 Flash + CoT)
                                                │
                                   [5-Layer Security Defense]
                                     L1: Input Sanitizer
                                     L2: Injection Classifier (Flash Lite)
                                     L3: Canary Tokens
                                     L4: Output Guardrails
                                     L5: Audit Logger
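
The "Graph + Vector Retrieval" step in the diagram combines embedding search with graph lookups. The sketch below shows one way that combination can work: rank entities by cosine similarity to the query vector, then collect the outgoing edges of the best matches as context for the generator. The function names, the in-memory dictionaries, and the fact format are all hypothetical; the real retriever may work differently, for example against Neo4j.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    embeddings: dict[str, list[float]],
    edges: dict[str, list[tuple[str, str]]],
    top_k: int = 2,
) -> list[str]:
    """Return graph facts for the top_k entities most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda entity: cosine(query_vec, embeddings[entity]),
        reverse=True,
    )
    facts: list[str] = []
    for entity in ranked[:top_k]:
        # One-hop expansion: every outgoing (relation, target) pair.
        for relation, target in edges.get(entity, []):
            facts.append(f"{entity} -[{relation}]-> {target}")
    return facts
```

The returned facts would then be placed in the generation prompt, so that the answer is grounded in graph content rather than in the model's own recall.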
- Python 3.12+
- Gemini API key (free tier)
- GitHub PAT (no scope needed for public repos)
- Docker (optional, for containerized deployment)

Install:

    git clone https://github.com/YOUR_USERNAME/ClawGraph.git
    cd ClawGraph
    pip install -e ".[dev]"
    cp .env.example .env
    # Edit .env with your API keys

Run the tests:

    python -m pytest tests/ -v

Start the server:

    python -m ClawGraph.main
    # API available at http://localhost:8000

Trigger a pipeline run and query the graph:

    # Via API
    curl -X POST http://localhost:8000/api/pipeline/run
    # Query the knowledge graph
    curl -X POST http://localhost:8000/api/query \
      -H "Content-Type: application/json" \
      -d '{"question": "What is the Gateway in OpenClaw?"}'

Run with Docker:

    docker compose build
    docker compose up -d
    # Health check: curl http://localhost:8000/api/health

Project layout:

    ClawGraph/
    ├── github_mcp_server/   # Custom Python MCP Server (9 GitHub tools)
    ├── pipeline/            # Orchestration Pipeline (5 stages + scheduler)
    │   └── stages/          # crawl, extract, embed, graph_update, curate
    ├── graph/               # Knowledge Graph (Neo4j + NetworkX backends)
    ├── RAG/                 # RAG Engine (retriever, generator, embeddings)
    ├── security/            # 5-Layer Prompt Injection Defense
    ├── config.py            # 12-factor app configuration
    ├── models.py            # Pydantic data models
    └── main.py              # FastAPI application
    openclaw_skill/          # OpenClaw integration (SKILL.md + tool)
    tests/                   # Tests including red-team security suite

All user queries pass through a 5-layer defense pipeline:
- L1 — Input Sanitizer: Strips injection delimiters, Unicode confusables, control characters
- L2 — Injection Classifier: Gemini Flash Lite classifies input as benign/suspicious/malicious
- L3 — Canary Tokens: Hidden UUID tokens in system prompts detect prompt leaks
- L4 — Output Guardrails: Blocks system prompt fragments, credentials, and code execution patterns
- L5 — Audit Logger: JSON-lines log of all security events for monitoring
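
The layers listed above can be illustrated with a small standard-library sketch covering L1, L3, L4, and L5 (L2 needs a model call and is left out). Every name here is hypothetical, the delimiter pattern is a deliberately tiny example, and NFKC normalization only folds compatibility forms such as fullwidth letters, so it is a partial stand-in for real confusable handling.

```python
import json
import re
import unicodedata
import uuid
from datetime import datetime, timezone

# Hypothetical delimiter patterns; a real deny-list would be broader.
_DELIMITERS = re.compile(r"<\|.*?\|>|\[/?INST\]")


def sanitize(text: str) -> str:
    """L1: fold compatibility characters, drop control chars, strip delimiters."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    return _DELIMITERS.sub("", text).strip()


def new_canary() -> str:
    """L3: a random marker to hide in the system prompt."""
    return f"CANARY-{uuid.uuid4()}"


def check_output(answer: str, canary: str) -> tuple[bool, str]:
    """L4: refuse to return an answer that leaks the canary token."""
    if canary in answer:
        return False, "canary_leak"
    return True, "ok"


def audit_line(event: str, detail: str) -> str:
    """L5: one JSON-lines record per security event."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "detail": detail,
    }
    return json.dumps(record)
```

Because the canary is random per prompt, its appearance in an answer is strong evidence that the system prompt was echoed back, which is why the guardrail can block on a plain substring match.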
Custom Python MCP server with 9 tools:
| Tool | Description |
|---|---|
| `get_repo_info` | Repository metadata (stars, forks, language, topics) |
| `list_repo_files` | File tree with types and sizes |
| `get_file_content` | Raw file content (base64-decoded) |
| `search_code` | Code search across repositories |
| `list_issues` | Issues with labels and comments |
| `list_pull_requests` | PRs with merge status |
| `list_forks` | Forks sorted by stars/activity |
| `get_commit_history` | Recent commits with messages |
| `get_contributors` | Contributors with commit counts |

Run standalone: `python -m ClawGraph.github_mcp_server.server`
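
The `get_file_content` tool is described as returning base64-decoded content. The GitHub contents API returns file bodies in a `content` field together with `"encoding": "base64"`, so the decoding step might look like the sketch below. The function name `decode_file_content` is hypothetical, and the server's real implementation may handle large files or other encodings differently.

```python
import base64


def decode_file_content(payload: dict) -> str:
    """Decode the `content` field of a GitHub contents API response."""
    encoding = payload.get("encoding")
    if encoding != "base64":
        raise ValueError(f"unexpected encoding: {encoding!r}")
    # GitHub wraps the base64 text across lines; b64decode skips the
    # newline characters in its default (non-validating) mode.
    raw = base64.b64decode(payload["content"])
    # Replace undecodable bytes instead of failing on binary files.
    return raw.decode("utf-8", errors="replace")
```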
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/query` | RAG query with prompt injection defense |
| GET | `/api/graph/stats` | Knowledge graph statistics |
| POST | `/api/pipeline/run` | Trigger manual pipeline run |
| GET | `/api/security/audit` | Recent security events |
| GET | `/api/health` | Service health check |
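
For callers that prefer Python over curl, the `/api/query` endpoint can be reached with the standard library alone. This is a sketch: the request body matches the curl example in this README, but the helper names are hypothetical and the shape of the JSON response is not documented here, so `ask` simply returns whatever the server sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address given in this README


def build_query_request(
    question: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request for /api/query without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `ask("What is the Gateway in OpenClaw?")` issues the same request as the curl command shown earlier.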
Install as an OpenClaw skill to use via Telegram:

    /kg query What is the Gateway in OpenClaw?
    /kg status
    /kg crawl
    /kg security-report
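
To make the command format concrete, here is a sketch of how a `/kg` message could be split into a subcommand and its argument. The four subcommands are the ones listed above; the parser itself, including its name and error messages, is hypothetical and is not taken from the skill's actual tool code.

```python
KNOWN_SUBCOMMANDS = {"query", "status", "crawl", "security-report"}


def parse_kg_command(message: str) -> tuple[str, str]:
    """Split '/kg <subcommand> [text]' into (subcommand, text)."""
    parts = message.strip().split(maxsplit=2)
    if len(parts) < 2 or parts[0] != "/kg":
        raise ValueError("expected '/kg <subcommand> [text]'")
    subcommand = parts[1]
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argument = parts[2] if len(parts) == 3 else ""
    if subcommand == "query" and not argument:
        raise ValueError("'/kg query' needs a question")
    return subcommand, argument
```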
All configuration via environment variables (12-factor app):
| Variable | Required | Default | Description |
|---|---|---|---|
| `GEMINI_API_KEY` | Yes | - | Google AI Studio API key |
| `GITHUB_TOKEN` | Yes | - | GitHub PAT |
| `GRAPH_BACKEND` | No | `memory` | `neo4j` or `memory` |
| `NEO4J_URI` | If `neo4j` | - | Neo4j Aura connection URI |
| `PIPELINE_SCHEDULE` | No | `0 3 * * *` | Cron schedule for auto-crawl |
| `PIPELINE_TARGETS` | No | `openclaw/openclaw,NVIDIA/NemoClaw` | Repos to crawl |
MIT