# GitHub Repo Analyzer

A web-based AI tool that summarizes and analyzes any GitHub repository — public or private — and answers natural language questions about its contents using Retrieval-Augmented Generation (RAG).
## ✨ Features

- **Repo Summarization**: Concise overview of the repository's purpose, structure, and key components.
- **Intelligent Q&A**: Ask questions like:
  - "What is the main architecture of this project?"
  - "Which files handle authentication?"
  - "What were the major changes in the last month?"
- **Multi-Source Context**: Combines README, documentation, code, commit history, and issues for better context.
- **Private Repo Support**: Secure GitHub OAuth login for private repositories.
- **Interactive Web UI**: Built in React with a clean, responsive design.
- **Insights Dashboard** (optional): Shows contributors, tech stack, and complexity metrics.
- **PDF Export** (optional): Download an AI-generated project report.
## 🏗 Architecture

```
┌──────────────────────┐
│       Frontend       │
│    (React / Vite)    │
└──────────┬───────────┘
           │ REST / WebSocket
┌──────────┴────────────────────────────────────┐
│                   Backend                     │
│  Golang Service                               │
│   - Repo fetching via GitHub API              │
│   - OAuth for private repo access             │
│   - API gateway to Python RAG service         │
│                                               │
│  Python Service                               │
│   - Preprocessing & chunking                  │
│   - Embedding generation                      │
│   - Vector DB storage                         │
│   - RAG pipeline with LLM                     │
└──────────────────────┬────────────────────────┘
                       │
              ┌────────┴─────────┐
              │    Vector DB     │
              │ (Pinecone, etc.) │
              └──────────────────┘
```
## 🛠 Tech Stack

**Frontend**
- React, Tailwind CSS, Axios

**Backend**
- Golang: GitHub API integration, auth, routing
- Python: RAG pipeline, embeddings, LLM calls (LangChain or custom)

**AI / NLP**
- Embeddings: OpenAI `text-embedding-3-large` / Hugging Face models
- LLM: GPT-4, Claude, or LLaMA
- Vector DB: Pinecone / Weaviate / Chroma

**Other**
- GitHub REST API v3
- OAuth for authentication
- Docker for deployment
## ⚙️ How It Works

1. **User Input**
   - Paste a GitHub repo URL.
   - Authenticate via GitHub for private repos.
2. **Data Extraction (Golang Service)**
   - Fetch README, docs, code, commits, and issues.
   - Send extracted content to the Python service.
3. **Preprocessing (Python Service)**
   - Separate text from code.
   - Chunk into ~500–1000 token segments.
   - Create embeddings.
4. **Storage**
   - Store embeddings + metadata in the vector DB.
5. **RAG Pipeline**
   - User query → embed → search vector DB → retrieve relevant chunks → LLM generates the answer.
6. **Frontend Display**
   - Repo summary panel.
   - Q&A section with highlighted code references.
   - Optional PDF report download.
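The chunking step above can be sketched with a simple word-count approximation. A real pipeline would use an actual tokenizer (e.g. `tiktoken`) rather than whitespace splitting, and the window/overlap values below are illustrative, not the project's settings:

```python
def chunk_text(text, max_tokens=800, overlap=100):
    # Approximate tokens with whitespace-separated words -- a rough
    # stand-in for a real tokenizer. Overlapping windows keep context
    # that straddles a chunk boundary retrievable from either side.
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
    return chunks
```

Each returned chunk would then be embedded and stored alongside its repo metadata.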
## 🔧 Installation

### Prerequisites

- Node.js >= 18
- Go >= 1.21
- Python >= 3.10
- GitHub Personal Access Token (for testing private repos)
- OpenAI API Key (or a Hugging Face model setup)
- Pinecone API Key
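These credentials are typically supplied via environment variables, for example in a `.env` file. The variable names below are illustrative — check each service's configuration for the exact names it reads:

```
GITHUB_TOKEN=<your-github-pat>
OPENAI_API_KEY=<your-openai-key>
PINECONE_API_KEY=<your-pinecone-key>
```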
### Clone & Install

```bash
git clone https://github.com/<your-username>/github-repo-analyzer.git
cd github-repo-analyzer

# Frontend
cd frontend && npm install

# Golang backend
cd ../backend-go && go mod tidy

# Python RAG service
cd ../backend-python && pip install -r requirements.txt
```

### Run the Services

Start the Golang backend:

```bash
cd backend-go
go run main.go
```

Start the Python RAG service:

```bash
cd backend-python
python main.py
```

Start the React frontend:

```bash
cd frontend
npm run dev
```
## 📡 Example API Endpoints

### Golang Service

```
GET /api/repo/summary?url=<repo_url>

POST /api/repo/query
{
  "url": "https://github.com/user/project",
  "question": "What is the architecture of this repo?"
}
```

### Python RAG Service

```
POST /rag/query
{
  "query": "List all API endpoints in the repo",
  "repo_id": "12345"
}
```
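A minimal client-side sketch of calling the query endpoint. The helper only builds and serializes the request body shown above; the `localhost:8080` address in the comment is an assumption about where the Golang service listens:

```python
import json

def build_query_payload(repo_url, question):
    # Shape matches the POST /api/repo/query body documented above.
    return {"url": repo_url, "question": question}

payload = build_query_payload(
    "https://github.com/user/project",
    "What is the architecture of this repo?",
)
body = json.dumps(payload)
# Send with any HTTP client, e.g.:
# requests.post("http://localhost:8080/api/repo/query", data=body,
#               headers={"Content-Type": "application/json"})
```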
## 🧠 RAG Pipeline (Python) – Pseudocode

```python
def rag_query(query, repo_id):
    # Embed the query
    q_emb = embed_model.embed(query)

    # Search vector DB
    relevant_chunks = vector_db.search(q_emb, top_k=5, filter={"repo_id": repo_id})

    # Build context
    context = "\n".join(chunk["content"] for chunk in relevant_chunks)

    # LLM answer
    prompt = f"Answer based on repo context:\n{context}\n\nQuestion: {query}"
    answer = llm.generate(prompt)
    return answer
```
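The `vector_db.search` step in the pseudocode can be approximated by a brute-force cosine-similarity scan — a toy in-memory stand-in for Pinecone/Weaviate/Chroma, useful for local testing. The chunk schema (`embedding`, `content`, `repo_id` keys) is an assumption for illustration:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(q_emb, chunks, top_k=5, repo_id=None):
    # Each chunk: {"embedding": [...], "content": str, "repo_id": str}
    pool = [c for c in chunks if repo_id is None or c["repo_id"] == repo_id]
    ranked = sorted(pool, key=lambda c: cosine(q_emb, c["embedding"]), reverse=True)
    return ranked[:top_k]
```

A real vector DB replaces the linear scan with an approximate nearest-neighbor index, but the interface — query embedding in, top-k chunks out, filtered by `repo_id` — is the same.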
## 🖼 Screenshots (Coming Soon)

- `screenshots/home.png` – Main dashboard
- `screenshots/query.png` – Q&A page
- `screenshots/summary.png` – Repo summary
## 🔮 Future Enhancements

- Repo comparison feature.
- Security vulnerability scanning.
- Automated documentation quality scoring.
- GitHub webhook integration for real-time updates.
## 📜 License

MIT License – feel free to fork and modify.

## 🤝 Contributing

Pull requests are welcome! For major changes, open an issue first to discuss your ideas.

## 📬 Contact

- Author: Yug Dalwadi
- Email: [email protected]
- GitHub: YugDalwadi
- LinkedIn: Profile