SomneelSaha2042/MedGen

🏥 MedGen

AI-Powered Synthetic Medical Data Generation & Privacy Evaluation Platform

Generate privacy-preserving synthetic medical datasets using Large Language Models with built-in utility and privacy risk assessment.


🎯 Overview

MedGen addresses a critical challenge in healthcare AI: the scarcity of accessible medical data due to privacy regulations (HIPAA, GDPR). It began as an idea my groupmates and I came up with for our CS3264 project at NUS; I have since continued working on it, extending its functionality as LLMs grew in analytical power. By leveraging state-of-the-art Large Language Models with Retrieval-Augmented Generation (RAG), MedGen generates high-quality synthetic medical datasets that:

  • ✅ Preserve statistical properties of original data
  • ✅ Maintain utility for machine learning tasks
  • ✅ Minimize privacy risks (singling out, linkability, inference attacks)
  • ✅ Enable safe data sharing for research and development

✨ Features

Dataset Management System

  • Unified Dataset Hub: Manage all datasets from a central location
  • Sample Datasets: Pre-loaded medical datasets (Pima Diabetes, Diabetes Prediction, Andrew's Diabetes)
  • Save & Organize: Save generated datasets with custom names and descriptions
  • One-Click Activation: Instantly switch between datasets for analysis
  • Preview & Delete: Preview any dataset or remove saved ones

🔬 Synthetic Data Generation

  • Dual Generation Modes:
    • ⚡ Fast Mode: Single API call batch generation (~5-10 seconds for 10-50 rows)
    • 🧠 Deep Mode: Feature-by-feature RAG-enhanced generation (slower but more context-aware)
  • LLM-Powered Generation: Uses GPT-4o-mini with customizable parameters
  • Auto-Batching: Automatic batching for large requests (>25 rows)
  • Real-time Progress: Live progress updates during generation
  • CSV Auto-Detection: Automatic delimiter detection (comma, semicolon, tab, pipe)
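
The auto-batching item above can be pictured as splitting one large request into several smaller calls. The sketch below is only an illustration: `plan_batches` is a hypothetical helper, and using 25 as the per-call cap is an assumption based on the ">25 rows" threshold, not a value read from generate_data.py.

```python
def plan_batches(num_rows: int, batch_size: int = 25) -> list[int]:
    """Split a request for num_rows rows into per-call batch sizes.

    Hypothetical helper: batch_size=25 mirrors the ">25 rows" threshold
    mentioned in the feature list, not the actual MedGen implementation.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(num_rows, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)  # last, smaller batch
    return sizes


# A 60-row request becomes three calls: 25 + 25 + 10 rows.
batches = plan_batches(60)
```

Under these assumptions, a request at or below the cap yields a single batch, so small requests still need only one API call.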

📊 Data Analysis & Visualization

  • Interactive Data Explorer: Upload, view, and analyze CSV datasets
  • Statistical Analysis: Automatic computation of distributions, correlations, and summary statistics
  • Rich Visualizations: Charts and graphs powered by Recharts

📥 Export & Download

  • Download Synthetic Data: Export only the generated rows
  • Download Combined Data: Export original + synthetic merged datasets
  • Save for Later: Persist generated datasets for future use

🧪 Utility Evaluation

  • Multi-Model Comparison: Evaluate with KNN, MLP, Naive Bayes, Random Forest, SGD, and SVM
  • Automated Pipeline: Split → Train → Generate → Compare workflow
  • Performance Metrics: Accuracy, precision, recall, F1-score, confusion matrices

🔒 Privacy Risk Assessment

  • Anonymeter Integration: Industry-standard privacy risk metrics
  • Singling Out Risk: Probability of uniquely identifying individuals
  • Linkability Risk: Risk of linking records across datasets
  • Inference Risk: Risk of inferring sensitive attributes

🖥️ Modern Web Interface

  • Material-UI v7 Design: Clean, responsive interface with cyberpunk dark theme
  • Sidebar Navigation: Quick access to all features
  • Real-time Updates: Live generation progress and status
  • Natural Language Queries: Ask questions about your data in plain English

🏗️ Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                              Frontend (React 19)                           │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐          │
│  │   Home   │ │ Datasets │ │ Explorer │ │ Analysis │ │ Generate │ ...      │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘          │
└────────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                          Backend (Flask API)                               │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────────────────┐    │
│  │    Dataset     │  │    Generate    │  │   Evaluation Pipeline      │    │
│  │   Management   │  │    Service     │  │   (ML Models + Privacy)    │    │
│  └────────────────┘  └────────────────┘  └────────────────────────────┘    │
│          │                   │                        │                    │
│  ┌───────▼───────────────────▼────────────────────────▼─────────────────┐  │
│  │                     Data Storage Layer                               │  │
│  │  ./data/saved_datasets/  │  ./data/generated/  │  ./data/chroma_db/  │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────────┘
                                     │
                      ┌──────────────┼──────────────┐
                      ▼              ▼              ▼
              ┌──────────┐   ┌──────────┐   ┌──────────────┐
              │ ChromaDB │   │  OpenAI  │   │  Anonymeter  │
              │ (Vector) │   │   API    │   │   (Privacy)  │
              └──────────┘   └──────────┘   └──────────────┘

🛠️ Tech Stack

Backend

| Technology | Purpose |
| --- | --- |
| Python 3.11+ | Core language |
| Flask 3.1 | REST API server |
| LlamaIndex | RAG framework |
| ChromaDB | Vector database for embeddings |
| OpenAI GPT-4o-mini | Synthetic data generation |
| scikit-learn | ML model evaluation |
| Anonymeter | Privacy risk assessment |
| Pandas/NumPy | Data processing |

Frontend

| Technology | Purpose |
| --- | --- |
| React 19 | UI framework |
| Material-UI v7 | Component library |
| Recharts | Data visualization |
| Framer Motion | Animations |
| Axios | HTTP client |
| React Router v7 | Navigation |

🚀 Getting Started

Prerequisites

  • Python 3.11 or 3.12
  • Node.js 18+ and npm
  • OpenAI API key

Installation

  1. Clone the repository

    git clone https://github.com/SomneelSaha2042/MedGen
    cd MedGen
  2. Set up Python environment

    # Using uv (recommended)
    pip install uv
    uv sync
    
    # Or using pip
    pip install -r requirements.txt
  3. Install frontend dependencies

    cd frontend
    npm install
    cd ..
  4. Configure environment variables

    cp .env.example .env
    # Edit .env and add your OpenAI API key
  5. Run the application

    # Start backend (terminal 1)
    uv run python backend.py
    
    # Start frontend (terminal 2)
    cd frontend && npm start
  6. Access the application

Using Makefile

make install    # Install all dependencies
make dev        # Run both backend and frontend
make backend    # Run backend only
make frontend   # Run frontend only
make clean      # Clean generated files

Docker (Alternative)

docker-compose up --build

📖 Usage

1. Manage Datasets

Navigate to the Datasets page to:

  • View all available sample datasets
  • Activate a dataset with one click
  • Save generated data for later use
  • Preview any dataset before activating

2. Upload Custom Dataset

Go to Data Explorer and upload your own CSV file. The platform automatically detects the delimiter (comma, semicolon, tab, or pipe).
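
For a rough idea of how delimiter detection can work, here is a minimal sketch using Python's standard-library `csv.Sniffer`. It is a stand-in for illustration only: `detect_delimiter` is a hypothetical name, and MedGen's own detection logic may work differently.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample, falling back to a comma.

    Hypothetical helper built on csv.Sniffer; not taken from the MedGen source.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (e.g. a single-column file).
        return ","


semicolon_sample = "age;bmi;outcome\n50;33.6;1\n31;26.6;0\n"
delimiter = detect_delimiter(semicolon_sample)
```

The detected character can then be passed to a CSV reader, for example as the `sep` argument of `pandas.read_csv`.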

3. Generate Synthetic Data

Go to Data Generation and configure:

  • Generation Mode: Fast (batch) or Deep (feature-by-feature)
  • Number of samples: How many synthetic rows to generate
  • Temperature (0.1-2.0): Controls randomness
  • Top-P (0.1-1.0): Nucleus sampling threshold
  • Frequency Penalty: Reduces repetitive patterns
  • Max Tokens: Maximum tokens per API call
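
When calling the API from a script instead of the UI, it can help to validate these settings first. The sketch below assumes a hypothetical helper, `build_generation_payload`. The field names follow the curl example in the API Reference, the ranges follow the list above, and `"deep"` as a mode value is an assumption (only `"fast"` appears in this README). The frequency penalty is left out because its valid range is not stated here.

```python
def build_generation_payload(
    num_samples: int,
    temperature: float = 0.7,
    top_p: float = 0.9,
    max_tokens: int = 4096,
    mode: str = "fast",
) -> dict:
    """Validate generation settings and return a JSON-ready dict (illustrative)."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2.0")
    if not 0.1 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.1 and 1.0")
    if mode not in ("fast", "deep"):  # "deep" is an assumed value
        raise ValueError("mode must be 'fast' or 'deep'")
    return {
        "numSamples": num_samples,
        "temperature": temperature,
        "topP": top_p,
        "maxTokens": max_tokens,
        "generationMode": mode,
    }


payload = build_generation_payload(num_samples=50)
```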

After generation:

  • Download as CSV (synthetic only or combined)
  • Use for Analysis to switch to the generated data
  • Save for Later to store in your dataset library

4. Analyze Results

Use the Analysis page to:

  • View statistical distributions
  • Generate charts and visualizations
  • Compare original vs synthetic data

5. Natural Language Queries

Use the Query Interface to ask questions about your data in plain English, powered by RAG.


📡 API Reference

Dataset Management

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /datasets | List all datasets (sample + saved) |
| POST | /datasets/<id>/activate | Activate a dataset for analysis |
| POST | /datasets/save | Save generated data as new dataset |
| DELETE | /datasets/<id> | Delete a saved dataset |
| GET | /datasets/<id>/preview | Preview dataset (first 100 rows) |

Data Generation

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /generate_data | Start synthetic data generation |
| GET | /generation_status | Check generation progress |
| GET | /get_generated_data | Retrieve generated data |
| GET | /download_data?type=<type> | Download as CSV (synthetic/combined/original) |
| POST | /use_generated_data | Switch to generated data for analysis |

File Operations

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /upload | Upload CSV dataset |
| GET | /check_csv_status | Check if CSV is loaded |
| POST | /delete_current_csv | Remove current CSV |
| GET | /sample_datasets | List sample datasets |
| POST | /use_sample_dataset | Use a sample dataset |

Analysis

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /stats_query | Get statistical analysis |
| POST | /stream_analysis | Stream analysis results |
| POST | /query_csv | Execute pandas query |

System

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check endpoint |
| GET | /data_availability | Check available data |

Example: Generate Data (Fast Mode)

curl -X POST http://localhost:5000/generate_data \
  -H "Content-Type: application/json" \
  -d '{
    "numSamples": 50,
    "temperature": 0.7,
    "topP": 0.9,
    "repetitionPenalty": 1.1,
    "maxTokens": 4096,
    "generationMode": "fast"
  }'
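
The same request can be prepared from Python using only the standard library. This is a sketch, not an official client: `make_json_post` is a hypothetical helper, and the base URL is taken from the curl example above. The request is built but not sent, since the response format is not documented in this README.

```python
import json
from urllib import request


def make_json_post(base_url: str, path: str, payload: dict) -> request.Request:
    """Build (but do not send) a JSON POST request. Illustrative helper."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = make_json_post(
    "http://localhost:5000",
    "/generate_data",
    {
        "numSamples": 50,
        "temperature": 0.7,
        "topP": 0.9,
        "repetitionPenalty": 1.1,
        "maxTokens": 4096,
        "generationMode": "fast",
    },
)

# To actually send it (requires the backend to be running):
# with request.urlopen(req) as resp:
#     raw = resp.read()
```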

Example: Save Generated Dataset

curl -X POST http://localhost:5000/datasets/save \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Study Data",
    "description": "100 synthetic diabetes records",
    "type": "combined"
  }'

🔬 Evaluation Pipeline

The evaluation pipeline (basic_eval_pipeline.py) performs:

  1. Data Splitting: 80% training / 20% test
  2. Original Training: Train 6 ML models on original training data
  3. Synthetic Generation: Generate synthetic data matching training set size
  4. Synthetic Training: Train same models on synthetic data
  5. Evaluation: Compare both on the held-out test set
  6. Visualization: Generate comparison plots and metrics
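
The steps above follow a "train on synthetic, test on real" pattern. The toy sketch below shows that pattern in pure Python so it runs without dependencies. It is not the code in basic_eval_pipeline.py: the 1-nearest-neighbour classifier stands in for the six scikit-learn models, the data is random toy data, and the "synthetic" rows are simply jittered copies of the training rows rather than LLM output.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and split it 80% train / 20% test."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, features):
    """Return the label of the closest training row (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of test rows whose label the 1-NN classifier predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


# Toy "original" dataset: two classes drawn around different centres.
rng = random.Random(42)
original = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
original += [((rng.gauss(5, 1), rng.gauss(5, 1)), 1) for _ in range(50)]

train, test = split_80_20(original)

# Stand-in for step 3: one jittered "synthetic" row per training row.
synthetic = [
    ((x + rng.gauss(0, 0.3), y + rng.gauss(0, 0.3)), label)
    for (x, y), label in train
]

# Steps 2, 4 and 5: same model, two training sets, one held-out test set.
acc_original = accuracy(train, test)
acc_synthetic = accuracy(synthetic, test)
print(f"original-trained: {acc_original:.2f}, synthetic-trained: {acc_synthetic:.2f}")
```

If the two accuracies are close, the synthetic data has kept most of the signal the model needs. A large gap suggests utility was lost during generation.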

Supported Models

  • K-Nearest Neighbors (KNN)
  • Multi-Layer Perceptron (MLP)
  • Naive Bayes
  • Random Forest
  • Stochastic Gradient Descent (SGD)
  • Support Vector Machine (SVM)

Run Evaluation

uv run python basic_eval_pipeline.py

Multi-Dataset Evaluation

uv run python multi_dataset_pipeline.py

🔒 Privacy Assessment

MedGen uses Anonymeter for privacy risk evaluation:

Singling Out Risk

Measures the probability that a synthetic record can uniquely identify an individual from the original dataset.
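
To build intuition for singling out, here is a deliberately naive uniqueness check. It is not Anonymeter's estimator, which is more involved. `naive_singling_out_rate` is a hypothetical function that counts how many synthetic rows match exactly one original row on a chosen set of columns, and the records shown are made-up toy values.

```python
from collections import Counter


def naive_singling_out_rate(original, synthetic, columns):
    """Fraction of synthetic rows matching exactly one original row on `columns`.

    Illustrative only: a simple uniqueness count, not Anonymeter's risk metric.
    """
    def key(row):
        return tuple(row[c] for c in columns)

    if not synthetic:
        return 0.0
    counts = Counter(key(row) for row in original)
    hits = sum(1 for row in synthetic if counts.get(key(row), 0) == 1)
    return hits / len(synthetic)


# Toy records (invented values, not from any MedGen dataset).
original = [
    {"age": 50, "bmi": 33.6},
    {"age": 31, "bmi": 26.6},
    {"age": 31, "bmi": 26.6},
]
synthetic = [
    {"age": 50, "bmi": 33.6},  # matches exactly one original row
    {"age": 31, "bmi": 26.6},  # matches two original rows, so not unique
    {"age": 45, "bmi": 30.1},  # matches none
    {"age": 22, "bmi": 28.0},  # matches none
]

rate = naive_singling_out_rate(original, synthetic, ["age", "bmi"])
```

Here one of the four synthetic rows pins down a single original record, so the rate is 0.25.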

Linkability Risk

Assesses whether records in the synthetic dataset can be linked to records in external datasets.

Inference Risk

Evaluates the risk of inferring sensitive attributes about individuals using the synthetic data.

Run Privacy Evaluation

uv run python anonymeter_privacy_eval.py

📁 Project Structure

MedGen/
├── backend.py                 # Flask API server (main entry point)
├── generate_data.py           # LLM synthetic data generation (fast + deep modes)
├── rag.py                     # RAG system with ChromaDB
├── basic_eval_pipeline.py     # ML evaluation pipeline
├── multi_dataset_pipeline.py  # Multi-dataset evaluation
├── anonymeter_privacy_eval.py # Privacy risk assessment
├── preprocess.py              # Data preprocessing utilities
├── dquery.py                  # Feature analysis with LLM
│
├── frontend/                  # React frontend application
│   ├── src/
│   │   ├── components/        # React components
│   │   │   ├── Home.js        # Landing page
│   │   │   ├── DatasetManager.js  # Dataset management UI
│   │   │   ├── DataExplorer.js    # Data upload and preview
│   │   │   ├── DataGeneration.js  # Generation interface
│   │   │   ├── Analysis.js        # Data analysis & charts
│   │   │   ├── Database.js        # Database info
│   │   │   ├── Sidebar.js         # Navigation sidebar
│   │   │   └── ...
│   │   ├── services/
│   │   │   └── api.js         # API client with all endpoints
│   │   └── App.js             # Main app with routing
│   └── package.json
│
├── data/                      # Runtime data storage
│   ├── saved_datasets/        # User-saved datasets
│   ├── generated/             # Generated synthetic data
│   ├── chroma_db/             # ChromaDB vector store
│   └── features/              # Feature documents for RAG
│
├── evals/                     # Evaluation module
│   ├── models/                # ML model implementations
│   │   ├── knn.py
│   │   ├── mlp.py
│   │   ├── naivebayes.py
│   │   ├── randomforest.py
│   │   ├── sgd.py
│   │   └── svm.py
│   ├── dataset/               # Evaluation datasets
│   └── pristine_datasets/     # Original unmodified datasets
│
├── datasets/                  # Sample datasets
├── results/                   # Generated results and plots
├── multi_dataset_results/     # Multi-dataset evaluation results
│
├── .vscode/                   # VS Code configuration
│   ├── launch.json            # Debug configurations
│   └── settings.json          # Editor settings
│
├── pyproject.toml             # Python project configuration (uv)
├── requirements.txt           # Python dependencies
├── Makefile                   # Build automation
├── docker-compose.yml         # Docker configuration
├── Dockerfile                 # Backend container
└── .env.example               # Environment template

🔄 Data Flow

┌─────────────────────────────────────────────────────────────────────┐
│                         User Workflow                               │
└─────────────────────────────────────────────────────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Select Dataset │     │  Upload Custom  │     │   Use Sample    │
│   (Datasets)    │     │   (Explorer)    │     │   Dataset       │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 ▼
                      ┌─────────────────────┐
                      │   Active Dataset    │
                      │  (RAG Index Built)  │
                      └──────────┬──────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
  │    Analyze      │   │    Generate     │   │     Query       │
  │   (Analysis)    │   │  (Generation)   │   │   (Database)    │
  └─────────────────┘   └────────┬────────┘   └─────────────────┘
                                 │
                                 ▼
                      ┌─────────────────────┐
                      │  Generated Data     │
                      │ (Synthetic Rows)    │
                      └──────────┬──────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           ▼                     ▼                     ▼
  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
  │    Download     │   │  Use for        │   │     Save        │
  │    as CSV       │   │  Analysis       │   │   for Later     │
  └─────────────────┘   └─────────────────┘   └────────┬────────┘
                                                       │
                                                       ▼
                                            ┌─────────────────────┐
                                            │   Saved Datasets    │
                                            │ (Datasets Library)  │
                                            └─────────────────────┘

🀝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Built as part of CS3264 coursework at the National University of Singapore
  • Uses Anonymeter for privacy evaluation
  • Powered by OpenAI GPT-4o-mini
  • UI components from Material-UI
  • RAG framework by LlamaIndex

Made with ❤️ for privacy-preserving healthcare AI
