📦 GitHub Repository Summarizer & Analyzer (RAG-powered)

A web-based AI tool that summarizes and analyzes any GitHub repository, public or private, and answers natural-language questions about its contents using Retrieval-Augmented Generation (RAG).

🚀 Features

Repo Summarization: Concise overview of the repository's purpose, structure, and key components.
Intelligent Q&A: Ask questions like:
- "What is the main architecture of this project?"
- "Which files handle authentication?"
- "What were the major changes in the last month?"
Multi-Source Context: Combines README, documentation, code, commit history, and issues for better context.
Private Repo Support: Secure GitHub OAuth login for private repositories.
Interactive Web UI: Built in React with a clean, responsive design.
Insights Dashboard (optional): Shows contributors, tech stack, and complexity metrics.
PDF Export (optional): Download an AI-generated project report.

🏗 Architecture

            ┌──────────────────────┐
            │       Frontend       │
            │    (React / Vite)    │
            └──────────┬───────────┘
                       │ REST / WebSocket
┌──────────────────────┴────────────────────────┐
│                  Backend                      │
│   Golang Service                              │
│     - Repo fetching via GitHub API            │
│     - OAuth for private repo access           │
│     - API gateway to Python RAG service       │
│                                               │
│   Python Service                              │
│     - Preprocessing & chunking                │
│     - Embedding generation                    │
│     - Vector DB storage                       │
│     - RAG pipeline with LLM                   │
└──────────────────────┬────────────────────────┘
                       │
              ┌────────┴─────────┐
              │   Vector DB      │
              │ (Pinecone, etc.) │
              └──────────────────┘

🛠 Tech Stack

Frontend

React, Tailwind CSS, Axios

Backend

Golang: GitHub API integration, auth, routing

Python: RAG pipeline, embeddings, LLM calls (LangChain or custom)

AI / NLP

Embeddings: OpenAI text-embedding-3-large / HuggingFace models

LLM: GPT-4, Claude, or LLaMA

Vector DB: Pinecone / Weaviate / Chroma

Other

GitHub REST API v3

OAuth for authentication

Docker for deployment

⚙️ How It Works

User Input

    Paste a GitHub repo URL.

    Authenticate via GitHub for private repos.
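As a rough illustration of the first step, the pasted URL has to be reduced to an owner and a repository name before anything can be fetched. The sketch below is one way to do that in Python; `parse_repo_url` is a hypothetical helper, not a function from this project.

```python
from urllib.parse import urlparse


def parse_repo_url(url: str) -> tuple:
    """Return (owner, repo) for a URL such as https://github.com/user/project."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or parsed.hostname not in (
        "github.com",
        "www.github.com",
    ):
        raise ValueError(f"not a GitHub repository URL: {url!r}")

    # Keep only non-empty path segments: /user/project/tree/main -> [user, project, ...]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /<owner>/<repo> in path: {url!r}")

    owner, repo = parts[0], parts[1]
    # Clone URLs end in ".git"; strip it to get the bare repository name.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo
```

Deeper links (for example a URL pointing at a branch or file) resolve to the same owner and repository, since only the first two path segments are used.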

Data Extraction (Golang Service)

    Fetch README, docs, code, commits, and issues.

    Send extracted content to Python service.
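The extraction step maps onto a handful of GitHub REST endpoints. The service that does this is written in Go; the Python sketch below only illustrates which requests are involved and is not the project's implementation. `build_github_requests` and the `User-Agent` value are hypothetical, and the requests are built but not sent.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_github_requests(owner: str, repo: str, token=None) -> dict:
    """Build (but do not send) requests for README, root contents, commits, and issues."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "repo-analyzer-sketch",  # hypothetical client name
    }
    if token:
        # A token is only needed for private repos or higher rate limits.
        headers["Authorization"] = f"Bearer {token}"

    base = f"{API_ROOT}/repos/{owner}/{repo}"
    urls = {
        "readme": f"{base}/readme",
        "contents": f"{base}/contents",  # root directory listing
        "commits": f"{base}/commits?per_page=100",
        # The issues endpoint also returns pull requests.
        "issues": f"{base}/issues?state=all&per_page=100",
    }
    return {name: urllib.request.Request(url, headers=headers) for name, url in urls.items()}
```

Each request can be sent with `urllib.request.urlopen(req)`; larger repositories need pagination, which this sketch leaves out.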

Preprocessing (Python Service)

    Separate text from code.

    Chunk into ~500–1000 token segments.

    Create embeddings.
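A minimal sketch of the first two preprocessing steps, under loud assumptions: files are classified by extension only (the `CODE_EXTENSIONS` set is illustrative), and whitespace-separated words stand in for model tokens, so real token counts will differ. Chunks are cut on line boundaries so code keeps its formatting. `kind_of` and `chunk_lines` are hypothetical names.

```python
from pathlib import PurePosixPath

# Illustrative only; a real service would likely cover many more languages.
CODE_EXTENSIONS = {".go", ".py", ".js", ".jsx", ".ts", ".tsx"}


def kind_of(path: str) -> str:
    """Classify a repository path as 'code' or 'text' by file extension."""
    return "code" if PurePosixPath(path).suffix.lower() in CODE_EXTENSIONS else "text"


def chunk_lines(text: str, max_tokens: int = 800) -> list:
    """Split text into chunks of at most ~max_tokens words, cutting only between lines."""
    chunks, current, count = [], [], 0
    for line in text.splitlines(keepends=True):
        n = len(line.split())  # whitespace words approximate tokens
        # Flush the current chunk before it would exceed the budget.
        if current and count + n > max_tokens:
            chunks.append("".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("".join(current))
    return chunks
```

Because lines are never split, a single line longer than `max_tokens` becomes its own oversized chunk, and joining the chunks reproduces the input exactly.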

Storage

    Store embeddings + metadata in vector DB.
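To make "embeddings + metadata" concrete, here is one possible record layout. The `id` / `values` / `metadata` shape mirrors what Pinecone accepts for upserts; other vector stores name these fields differently. The metadata keys (`repo_id`, `path`, `chunk_index`, `content`) and the `make_record` helper are assumptions for illustration.

```python
import hashlib


def make_record(repo_id: str, path: str, chunk_index: int, content: str, embedding) -> dict:
    """Build one vector-store record for a chunk, with a deterministic ID."""
    # Hashing repo, path, and position gives the same ID on re-indexing,
    # so a repeated upsert overwrites the old record instead of duplicating it.
    raw_id = f"{repo_id}:{path}:{chunk_index}"
    return {
        "id": hashlib.sha1(raw_id.encode("utf-8")).hexdigest(),
        "values": list(embedding),
        "metadata": {
            "repo_id": repo_id,  # lets queries filter by repository
            "path": path,
            "chunk_index": chunk_index,
            "content": content,  # returned with search hits to build the prompt
        },
    }
```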

RAG Pipeline

    User query → Embed → Search vector DB → Retrieve relevant chunks → LLM generates answer.

Frontend Display

    Repo summary panel.

    Q&A section with highlighted code references.

    Optional PDF report download.

🔧 Installation

Prerequisites

Node.js >= 18

Go >= 1.21

Python >= 3.10

GitHub Personal Access Token (for testing private repos)

OpenAI API Key (or HuggingFace model setup)

Pinecone API Key
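A quick way to confirm the credentials are in place before starting the services is to check the environment. The variable names below are assumptions for illustration; the project may read its credentials under different names or from a config file.

```python
import os

# Hypothetical variable names, not confirmed by this project.
REQUIRED_VARS = ("GITHUB_TOKEN", "OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_settings(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required credentials are set.")
```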

Clone & Install

1. Clone the repo

    git clone https://github.com//github-repo-analyzer.git
    cd github-repo-analyzer

2. Install frontend dependencies

    cd frontend
    npm install

3. Install Golang backend dependencies

    cd ../backend-go
    go mod tidy

4. Install Python backend dependencies

    cd ../backend-python
    pip install -r requirements.txt

▶️ Running the Project

Run each service in its own terminal, starting from the repository root.

Start Golang API

    cd backend-go
    go run main.go

Start Python RAG Service

    cd backend-python
    python main.py

Start React Frontend

    cd frontend
    npm run dev

📡 Example API Endpoints

Golang Service

    GET /api/repo/summary?url=<repo_url>

    POST /api/repo/query
    {
      "url": "https://github.com/user/project",
      "question": "What is the architecture of this repo?"
    }

Python RAG Service

    POST /rag/query
    {
      "query": "List all API endpoints in the repo",
      "repo_id": "12345"
    }
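For reference, the endpoints of both services could be called from Python roughly as follows. The host and port values (`localhost:8080` for the Go gateway, `localhost:8000` for the Python service) are assumptions, as are the helper names; the paths and JSON fields come from the examples above. The functions only build the requests so they can be inspected before sending.

```python
import json
import urllib.request
from urllib.parse import urlencode

GATEWAY = "http://localhost:8080"  # assumed address of the Golang service
RAG = "http://localhost:8000"  # assumed address of the Python RAG service


def _json_post(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summary_request(repo_url: str) -> urllib.request.Request:
    # urlencode percent-escapes the repo URL so it is safe as a query parameter.
    return urllib.request.Request(f"{GATEWAY}/api/repo/summary?{urlencode({'url': repo_url})}")


def repo_query_request(repo_url: str, question: str) -> urllib.request.Request:
    return _json_post(f"{GATEWAY}/api/repo/query", {"url": repo_url, "question": question})


def rag_query_request(query: str, repo_id: str) -> urllib.request.Request:
    return _json_post(f"{RAG}/rag/query", {"query": query, "repo_id": repo_id})
```

With the services running, `urllib.request.urlopen(summary_request("https://github.com/user/project"))` would send the first call; the response format is not specified here.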

🧠 RAG Pipeline (Python) – Pseudocode

    def rag_query(query, repo_id):
        # Embed the query
        q_emb = embed_model.embed(query)

        # Search vector DB
        relevant_chunks = vector_db.search(q_emb, top_k=5, filter={"repo_id": repo_id})

        # Build context
        context = "\n".join([chunk["content"] for chunk in relevant_chunks])

        # LLM Answer
        prompt = f"Answer based on repo context:\n{context}\n\nQuestion: {query}"
        answer = llm.generate(prompt)

        return answer
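The pseudocode above depends on an embedding model, a vector DB, and an LLM. To try the control flow without any of them, the sketch below swaps in toy stand-ins: word counts instead of learned embeddings, an in-memory list instead of Pinecone, and a caller-supplied `generate` function instead of a real LLM. All names here (`embed`, `InMemoryVectorDB`, the `generate` parameter) are hypothetical and only mimic the shape of the pipeline.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorDB:
    """Stand-in for a vector DB: brute-force search over a Python list."""

    def __init__(self):
        self._rows = []

    def add(self, repo_id: str, content: str) -> None:
        self._rows.append({"repo_id": repo_id, "content": content, "vector": embed(content)})

    def search(self, q_emb, top_k=5, filter=None):
        # Apply the metadata filter first, then rank by similarity to the query.
        wanted = filter or {}
        rows = [r for r in self._rows if all(r.get(k) == v for k, v in wanted.items())]
        rows.sort(key=lambda r: cosine(q_emb, r["vector"]), reverse=True)
        return rows[:top_k]


def rag_query(query, repo_id, vector_db, generate, top_k=5):
    """Embed the query, retrieve matching chunks for one repo, and prompt the generator."""
    relevant = vector_db.search(embed(query), top_k=top_k, filter={"repo_id": repo_id})
    context = "\n".join(chunk["content"] for chunk in relevant)
    prompt = f"Answer based on repo context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)


db = InMemoryVectorDB()
db.add("12345", "auth.go handles OAuth login and token refresh")
db.add("12345", "main.py starts the RAG service")
db.add("99999", "unrelated repo: OAuth login helper")

# An identity "LLM" returns the prompt, which makes the retrieved context visible.
print(rag_query("Which file handles OAuth login?", "12345", db, generate=lambda p: p, top_k=1))
```

Passing the generator in as a parameter keeps retrieval testable on its own; a real deployment would replace `embed` and `InMemoryVectorDB` with the embedding model and vector DB listed in the tech stack.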

🖼 Screenshots (Coming Soon)

screenshots/home.png – Main dashboard

screenshots/query.png – Q&A page

screenshots/summary.png – Repo summary

🔮 Future Enhancements

Repo comparison feature.

Security vulnerability scanning.

Automated documentation quality scoring.

GitHub webhook integration for real-time updates.

📜 License

MIT License – feel free to fork and modify.

🤝 Contributing

Pull requests are welcome! For major changes, open an issue first to discuss your ideas.

📬 Contact

Author: Yug Dalwadi

Email: [email protected]

GitHub: YugDalwadi

LinkedIn: Profile
