Skip to content

SyracuseUniversity/research-llm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Research LLM System

An integrated research assistant built with retrieval, structured data pipelines, and OpenAlex ingestion.


Architecture Overview

research-llm/
├── rag/        # Retrieval-Augmented Generation system
├── database/   # Data ingestion, processing, and storage
├── openalex/   # OpenAlex data pipeline

Modules

RAG System

Core retrieval and question answering engine.

  • Semantic search
  • Graph-based retrieval
  • Streamlit interface
  • Session memory

📂 Location: /rag 📖 Details: RAG README


Database Pipeline

Handles ingestion, preprocessing, and enrichment of research data.

  • PDF ingestion
  • Data cleaning and normalization
  • Model training utilities
  • Pipeline orchestration

📂 Location: /database 📖 Details: Database README


OpenAlex Integration

Fetches and processes academic metadata from OpenAlex.

  • Paper metadata retrieval
  • Download pipelines
  • Chunking and normalization
  • Graph + vector ingestion

📂 Location: /openalex 📖 Details: OpenAlex README


Workflow

  1. OpenAlex → fetch research papers
  2. Database → clean, structure, enrich
  3. RAG → retrieve + answer queries

Quick Start

# Clone repo
git clone https://github.com/AryanApte1408/research-llm.git
cd research-llm

Navigate to a module:

cd rag
# or
cd database
# or
cd openalex

Follow instructions in each module’s README.


Tech Stack

  • Python
  • ChromaDB
  • Neo4j
  • Streamlit
  • OpenAlex API

Notes

Each module is independently runnable but designed to work together as a pipeline.


About

LLM chatbot for querying a database of papers by Syracuse researchers

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages