An integrated research assistant built with retrieval, structured data pipelines, and OpenAlex ingestion.
research-llm/
├── rag/ # Retrieval-Augmented Generation system
├── database/ # Data ingestion, processing, and storage
└── openalex/ # OpenAlex data pipeline
RAG: the core retrieval and question-answering engine.
- Semantic search
- Graph-based retrieval
- Streamlit interface
- Session memory
📂 Location: /rag
📖 Details: RAG README
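The two retrieval modes listed above can be illustrated with a minimal, standard-library-only sketch: rank documents by cosine similarity to a query embedding (semantic search), then add one hop of neighbors from a link graph (graph-based retrieval). This is not the module's actual code, which per the tech stack uses ChromaDB and Neo4j; the function names, the toy 2-D vectors, and the citation edges are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query_vec, docs, k=2):
    """Return the ids of the k documents whose embeddings are most similar to query_vec.

    docs maps a document id to its precomputed embedding vector.
    """
    ranked = sorted(docs, key=lambda doc_id: cosine(query_vec, docs[doc_id]), reverse=True)
    return ranked[:k]


def graph_expand(seed_ids, edges):
    """Add one hop of graph neighbors (e.g. citation links) to the seed results, keeping order."""
    results = list(seed_ids)
    for doc_id in seed_ids:
        for neighbor in edges.get(doc_id, []):
            if neighbor not in results:
                results.append(neighbor)
    return results


# Hypothetical toy "embeddings" and citation edges, for illustration only.
docs = {"p1": [1.0, 0.0], "p2": [0.7, 0.7], "p3": [0.0, 1.0]}
edges = {"p1": ["p4"], "p2": ["p1"]}

hits = semantic_search([1.0, 0.1], docs, k=2)
print(hits)                       # ['p1', 'p2']
print(graph_expand(hits, edges))  # ['p1', 'p2', 'p4']
```

In a real deployment the vector store would handle the similarity ranking and the graph database would answer the neighbor query; the sketch only shows how the two steps combine.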
Database: handles ingestion, preprocessing, and enrichment of research data.
- PDF ingestion
- Data cleaning and normalization
- Model training utilities
- Pipeline orchestration
📂 Location: /database
📖 Details: Database README
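As an illustration of the cleaning and normalization step, here is a minimal sketch, assuming the input is raw text extracted from PDFs and records carry a `title` field. The specific rules (NFKC folding, rejoining line-break hyphenation, whitespace collapsing, title-based de-duplication) and the function names are hypothetical choices, not necessarily what the module does.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Clean text extracted from a PDF."""
    text = unicodedata.normalize("NFKC", raw)     # fold compatibility forms, e.g. the "ﬁ" ligature -> "fi"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across line breaks
    text = re.sub(r"\s+", " ", text)              # collapse runs of spaces, tabs, and newlines
    return text.strip()


def dedupe_by_title(records):
    """Keep the first record for each title, comparing normalized, case-folded titles."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_text(record["title"]).casefold()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


print(normalize_text("Semantic  re-\ntrieval with ﬁne\ttuning\n"))
# Semantic retrieval with fine tuning
```

Note that the hyphenation rule also merges genuine compounds that happen to break at a line end (e.g. "cross-\nlingual"), so a production pipeline might check candidates against a vocabulary first.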
OpenAlex: fetches and processes academic metadata from the OpenAlex API.
- Paper metadata retrieval
- Download pipelines
- Chunking and normalization
- Graph + vector ingestion
📂 Location: /openalex
📖 Details: OpenAlex README
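A minimal sketch of the metadata and chunking steps follows. It builds a search URL for the public OpenAlex Works endpoint (`search` and `per-page` parameters) without sending the request, rebuilds an abstract from OpenAlex's `abstract_inverted_index` field (a word-to-positions map), and splits text into overlapping word windows. The helper names and the word-based chunk size and overlap are assumptions; the module's actual chunking strategy may differ.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def works_search_url(query: str, per_page: int = 25) -> str:
    """Build a search URL for the OpenAlex Works endpoint (the request is not sent here)."""
    return f"{OPENALEX_WORKS_URL}?{urlencode({'search': query, 'per-page': per_page})}"


def abstract_from_inverted_index(inverted_index):
    """Rebuild a plain-text abstract from an OpenAlex abstract_inverted_index (word -> positions)."""
    if not inverted_index:
        return ""
    position_to_word = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            position_to_word[pos] = word
    return " ".join(position_to_word[pos] for pos in sorted(position_to_word))


def chunk_words(text: str, size: int = 200, overlap: int = 50):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the last word
            break
    return chunks


print(works_search_url("graph retrieval"))
# https://api.openalex.org/works?search=graph+retrieval&per-page=25
abstract = abstract_from_inverted_index({"Graph": [0], "retrieval": [1], "helps": [2, 4], "QA": [3]})
print(abstract)                                 # Graph retrieval helps QA helps
print(chunk_words(abstract, size=3, overlap=1))  # ['Graph retrieval helps', 'helps QA helps']
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of storing some words twice.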
Data flows through the modules as a pipeline:
- OpenAlex → fetch research papers
- Database → clean, structure, and enrich them
- RAG → retrieve relevant content and answer queries
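The three-stage flow above can be pictured as plain function composition. In this toy sketch every function is a hypothetical stand-in: `fetch_papers` returns hard-coded records instead of calling OpenAlex, `clean_records` applies a trivial cleaning rule, and `answer` ranks by word overlap rather than the real semantic and graph retrieval.

```python
def fetch_papers(query):
    """Stand-in for the OpenAlex stage: returns hard-coded raw records (query is ignored)."""
    return [
        {"title": "Graph RAG", "abstract": "Graph retrieval for question answering."},
        {"title": "Untitled preprint", "abstract": None},
        {"title": "Dense search", "abstract": "Vector search over papers."},
    ]


def clean_records(records):
    """Stand-in for the Database stage: drop records without an abstract, lowercase and tidy the rest."""
    return [
        {**r, "abstract": " ".join(r["abstract"].lower().split())}
        for r in records
        if r.get("abstract")
    ]


def answer(records, question, k=1):
    """Stand-in for the RAG stage: return titles of the k records sharing the most words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(records, key=lambda r: len(q_words & set(r["abstract"].split())), reverse=True)
    return [r["title"] for r in ranked[:k]]


records = clean_records(fetch_papers("retrieval"))
print(answer(records, "graph retrieval"))  # ['Graph RAG']
```

Keeping each stage behind its own function boundary mirrors the repository layout: any stage can be swapped or run on its own as long as it accepts and returns the same record shape.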
# Clone repo
git clone https://github.com/AryanApte1408/research-llm.git
cd research-llm
Navigate to a module:
cd rag
# or
cd database
# or
cd openalex
Follow the instructions in each module’s README.
Tech stack:
- Python
- ChromaDB
- Neo4j
- Streamlit
- OpenAlex API
Each module is independently runnable but designed to work together as a pipeline.