xyphoes0727/GitAnalyze

📦 GitHub Repository Summarizer & Analyzer (RAG-powered)

A web-based AI tool that summarizes and analyzes any GitHub repository — public or private — and answers natural language questions about its contents using Retrieval-Augmented Generation (RAG).


🚀 Features

  • Repo Summarization: Concise overview of the repository's purpose, structure, and key components.
  • Intelligent Q&A: Ask questions like:
    • "What is the main architecture of this project?"
    • "Which files handle authentication?"
    • "What were the major changes in the last month?"
  • Multi-Source Context: Combines README, documentation, code, commit history, and issues for better context.
  • Private Repo Support: Secure GitHub OAuth login for private repositories.
  • Interactive Web UI: Built in React with a clean, responsive design.
  • Insights Dashboard (optional): Shows contributors, tech stack, and complexity metrics.
  • PDF Export (optional): Download an AI-generated project report.

🏗 Architecture

            ┌──────────────────────┐
            │       Frontend        │
            │    (React / Vite)     │
            └──────────┬───────────┘
                       │ REST / WebSocket
┌──────────────────────┴────────────────────────┐
│                  Backend                       │
│   Golang Service                               │
│     - Repo fetching via GitHub API             │
│     - OAuth for private repo access            │
│     - API gateway to Python RAG service        │
│                                                │
│   Python Service                               │
│     - Preprocessing & chunking                 │
│     - Embedding generation                     │
│     - Vector DB storage                        │
│     - RAG pipeline with LLM                    │
└──────────────────────┬────────────────────────┘
                       │
              ┌────────┴────────┐
              │   Vector DB      │
              │ (Pinecone, etc.) │
              └──────────────────┘

🛠 Tech Stack

Frontend

React, Tailwind CSS, Axios

Backend

Golang: GitHub API integration, auth, routing

Python: RAG pipeline, embeddings, LLM calls (LangChain or custom)

AI / NLP

Embeddings: OpenAI text-embedding-3-large / HuggingFace models

LLM: GPT-4, Claude, or LLaMA

Vector DB: Pinecone / Weaviate / Chroma

Other

GitHub REST API v3

OAuth for authentication

Docker for deployment

⚙️ How It Works

User Input

    Paste a GitHub repo URL.

    Authenticate via GitHub for private repos.

Data Extraction (Golang Service)

    Fetch README, docs, code, commits, and issues.

    Send extracted content to Python service.

Preprocessing (Python Service)

    Separate text from code.

    Chunk into ~500–1000 token segments.

    Create embeddings.

Storage

    Store embeddings + metadata in vector DB.

RAG Pipeline

    User query → Embed → Search vector DB → Retrieve relevant chunks → LLM generates answer.

Frontend Display

    Repo summary panel.

    Q&A section with highlighted code references.

    Optional PDF report download.
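The chunking step in the preprocessing stage can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `chunk_tokens`, the 800-token default, and the 100-token overlap are assumptions, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
from typing import List


def chunk_tokens(text: str, max_tokens: int = 800, overlap: int = 100) -> List[str]:
    """Split text into overlapping chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real pipeline would likely count tokens with the model's tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    words = text.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this window has reached the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks
```

Overlapping windows are one common way to avoid cutting a function or paragraph off from the context that precedes it; whether the project uses overlap at all is not stated above.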

🔧 Installation

Prerequisites

Node.js >= 18

Go >= 1.21

Python >= 3.10

GitHub Personal Access Token (for testing private repos)

OpenAI API Key (or HuggingFace model setup)

Pinecone API Key

Clone & Install

1. Clone the repo

    git clone https://github.com//github-repo-analyzer.git
    cd github-repo-analyzer

2. Install frontend dependencies

    cd frontend
    npm install

3. Install Golang backend dependencies

    cd ../backend-go
    go mod tidy

4. Install Python backend dependencies

    cd ../backend-python
    pip install -r requirements.txt

▶️ Running the Project

Start Golang API

    cd backend-go
    go run main.go

Start Python RAG Service

    cd backend-python
    python main.py

Start React Frontend

    cd frontend
    npm run dev

📡 Example API Endpoints

Golang Service

    GET /api/repo/summary?url=<repo_url>

    POST /api/repo/query
    {
      "url": "https://github.com/user/project",
      "question": "What is the architecture of this repo?"
    }

Python RAG Service

    POST /rag/query
    {
      "query": "List all API endpoints in the repo",
      "repo_id": "12345"
    }
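As a rough sketch of how a client might call the two Golang endpoints, the snippet below builds the requests with Python's standard library. The base URL `http://localhost:8080` and the helper names are hypothetical; the README does not state which host or port the Go service listens on.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; the README does not state the Go service's port.
GO_API_BASE = "http://localhost:8080"


def build_summary_request(repo_url: str) -> Request:
    """Build GET /api/repo/summary?url=<repo_url> with the URL percent-encoded."""
    query = urlencode({"url": repo_url})
    return Request(f"{GO_API_BASE}/api/repo/summary?{query}", method="GET")


def build_query_request(repo_url: str, question: str) -> Request:
    """Build POST /api/repo/query with a JSON body."""
    body = json.dumps({"url": repo_url, "question": question}).encode("utf-8")
    return Request(
        f"{GO_API_BASE}/api/repo/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the service is running, either request can be sent with `urllib.request.urlopen(req)`. Encoding the repository URL matters because it contains `:` and `/`, which would otherwise be ambiguous inside a query string.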

🧠 RAG Pipeline (Python) – Pseudocode

    def rag_query(query, repo_id):
        # Embed the query
        q_emb = embed_model.embed(query)

        # Search vector DB
        relevant_chunks = vector_db.search(q_emb, top_k=5, filter={"repo_id": repo_id})

        # Build context
        context = "\n".join([chunk["content"] for chunk in relevant_chunks])

        # LLM Answer
        prompt = f"Answer based on repo context:\n{context}\n\nQuestion: {query}"
        answer = llm.generate(prompt)

        return answer
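To make the retrieval half of the pseudocode above concrete, here is a small self-contained sketch that runs without any external service. Everything in it is a stand-in: `embed` is a bag-of-words counter rather than a real embedding model, `InMemoryVectorStore` is a hypothetical substitute for Pinecone, Weaviate, or Chroma, and the LLM call is left out, so the function stops at building the prompt.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words counts (not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB such as Pinecone or Chroma."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def add(self, repo_id: str, content: str) -> None:
        # Store the chunk text, its metadata, and its vector together.
        self.rows.append({"repo_id": repo_id, "content": content, "vector": embed(content)})

    def search(self, query_vec: Counter, top_k: int, repo_id: str) -> List[Dict]:
        # Filter by repo first, then rank by similarity to the query.
        candidates = [r for r in self.rows if r["repo_id"] == repo_id]
        candidates.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
        return candidates[:top_k]


def build_prompt(store: InMemoryVectorStore, query: str, repo_id: str, top_k: int = 5) -> str:
    """Retrieve the top chunks for one repo and assemble the LLM prompt."""
    chunks = store.search(embed(query), top_k=top_k, repo_id=repo_id)
    context = "\n".join(c["content"] for c in chunks)
    return f"Answer based on repo context:\n{context}\n\nQuestion: {query}"
```

The `repo_id` filter is applied before ranking so that chunks from other repositories can never appear in the context, mirroring the `filter={"repo_id": repo_id}` argument in the pseudocode.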

🖼 Screenshots (Coming Soon)

screenshots/home.png – Main dashboard

screenshots/query.png – Q&A page

screenshots/summary.png – Repo summary

🔮 Future Enhancements

Repo comparison feature.

Security vulnerability scanning.

Automated documentation quality scoring.

GitHub webhook integration for real-time updates.

📜 License

MIT License – feel free to fork and modify.

🤝 Contributing

Pull requests are welcome! For major changes, open an issue first to discuss your ideas.

📬 Contact

Author: Yug Dalwadi

Email: [email protected]

GitHub: YugDalwadi

LinkedIn: Profile
