Promptify

Promptify is an end-to-end automatic prompt generation and optimization system designed to select the best reasoning style (e.g., Chain-of-Thought vs Roleplay) for a given task description. It integrates machine learning, NLP, and prompt engineering into a seamless workflow — from classification to text generation.

Overview

Promptify intelligently decides how an LLM should think before generating text.
When given a user prompt, the system:

  1. Classifies the prompt as either:
    • Chain-of-Thought (CoT) – suited for analytical or reasoning-based tasks.
    • Roleplay – suited for conversational, creative, or persona-driven tasks.
  2. Generates text using a Flan-T5 model equipped with LoRA adapters, one fine-tuned for each reasoning style.

This allows the model to dynamically switch between different "thinking modes," improving output quality and adaptability across various prompt types.
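The routing step can be pictured with a minimal sketch, assuming the exported DistilBERT classifier lives in backend/classifier/ and was saved with the label names cot and roleplay (the actual id2label mapping comes from how train_classifier.py exports the model, so treat these names as illustrative):

```python
# Sketch: route a prompt to an adapter with the exported DistilBERT classifier.
# Assumes backend/classifier/ holds a Hugging Face text-classification model
# whose labels are "cot" and "roleplay" (illustrative; depends on training).
from transformers import pipeline

classifier = pipeline("text-classification", model="backend/classifier")

def pick_adapter(prompt: str) -> str:
    result = classifier(prompt)[0]   # e.g. {"label": "cot", "score": 0.97}
    return result["label"]           # "cot" or "roleplay"

print(pick_adapter("Explain step by step why the sky appears blue."))
```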

Architecture

  • Frontend: Built with React + Vite for a clean, responsive UI. Users can enter prompts, view classifications, and visualize generated outputs. The frontend communicates with the backend through a development proxy.
  • Backend: A FastAPI server exposing REST endpoints for classification and text generation. It loads:
    - A DistilBERT classifier that determines the appropriate adapter.
    - A Flan-T5 model with two LoRA adapters (cot and roleplay) for task-specific generation.
  • Training Scripts: Python utilities for:
    - Fine-tuning the LoRA adapters on Chain-of-Thought and Roleplay datasets.
    - Training the DistilBERT classifier to label incoming prompts by type.
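
The sketch below shows one way the backend can keep a single Flan-T5 base model in memory and switch between the two LoRA adapters with PEFT. The base checkpoint name (google/flan-t5-base), adapter paths, and generation settings are assumptions for illustration; backend/main.py and the test scripts define the actual values:

```python
# Sketch: one Flan-T5 base model, two LoRA adapters, switched per request.
# Checkpoint name, paths, and generation settings are illustrative only.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Attach the CoT adapter first, then load the roleplay adapter into the same model.
model = PeftModel.from_pretrained(base, "backend/adapters/cot", adapter_name="cot")
model.load_adapter("backend/adapters/roleplay", adapter_name="roleplay")
model.to(device).eval()

def generate(prompt: str, adapter: str) -> str:
    model.set_adapter(adapter)  # "cot" or "roleplay"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        # Illustrative deterministic beam search; the API uses the exact
        # settings from the test scripts.
        ids = model.generate(**inputs, max_new_tokens=256, num_beams=4, do_sample=False)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```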

Workflow

  1. User Input: A prompt is submitted via the frontend.
  2. Classification: The backend classifier predicts the best adapter (cot or roleplay).
  3. Generation: The selected adapter is applied to the base Flan-T5 model.
  4. Response: The generated output is returned and displayed in the UI.
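
The sketch below shows how these steps could map onto the FastAPI endpoints. The helper names classify_prompt and run_generation are stand-ins for the classifier and adapter-switching logic sketched above, not the actual names used in backend/main.py:

```python
# Sketch: wiring the workflow into FastAPI endpoints (helper names are illustrative).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PromptRequest(BaseModel):
    prompt: str
    adapter_override: str | None = None   # optional "cot" or "roleplay"

def classify_prompt(prompt: str) -> str:
    return "cot"                           # placeholder for the DistilBERT classifier

def run_generation(prompt: str, adapter: str) -> str:
    return f"[{adapter}] ..."              # placeholder for Flan-T5 + LoRA generation

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/classify")
def classify(req: PromptRequest):
    return {"label": classify_prompt(req.prompt)}

@app.post("/generate")
def generate(req: PromptRequest):
    adapter = req.adapter_override or classify_prompt(req.prompt)
    return {"adapter": adapter, "output": run_generation(req.prompt, adapter)}
```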

Tech Stack

  • Backend: FastAPI, Transformers, PEFT, PyTorch
  • Frontend: React, Vite
  • Models: Flan-T5 (with LoRA adapters), DistilBERT classifier
  • Training: Custom in-house datasets (CSV), LoRA fine-tuning utilities

Repo structure

backend/
	main.py                # FastAPI server
	requirements.txt       # API-focused deps
	adapters/
		cot/                 # CoT LoRA adapter folder (tokenizer + adapter files)
		roleplay/            # Roleplay LoRA adapter folder
	classifier/            # Exported HF text-classification model (tokenizer + safetensors)
frontend/
	...                    # React + Vite app
scripts/
	train_cot.py           # Train CoT adapter
	train_roleplay.py      # Train Roleplay adapter
	train_classifier.py    # Train classifier (DistilBERT)
	test_*.py              # Quick inference checks
data/
	...                    # CSVs used by the training scripts
requirements.txt         # Unified deps (API + training)

Prerequisites

  • Python 3.10+
  • Node.js 18+
  • If using GPU: install the correct PyTorch CUDA build for your system.

Install dependencies

Install all Python deps from the repository root (covers API + training):

pip install -r requirements.txt

Install frontend deps:

cd frontend
npm install

Run the backend API

From the backend/ folder:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Health check:

curl http://localhost:8000/health

API endpoints

  • GET /health → returns { "status": "ok" }
  • POST /classify → body: { "prompt": string } → returns { "label": "cot" | "roleplay" }
  • POST /generate → body: { "prompt": string, "adapter_override"?: "cot"|"roleplay" } → returns { "adapter": "cot"|"roleplay", "output": string }

Generation uses the exact, deterministic beam settings from the test scripts for reproducibility.
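
As a quick usage sketch, the endpoints can be exercised from Python with the requests package (the example prompt and adapter choice are arbitrary):

```python
# Sketch: calling the running API from Python (pip install requests).
import requests

BASE = "http://localhost:8000"
prompt = "Solve 17 * 24 step by step."

label = requests.post(f"{BASE}/classify", json={"prompt": prompt}).json()
print(label)  # e.g. {"label": "cot"}

out = requests.post(
    f"{BASE}/generate",
    json={"prompt": prompt, "adapter_override": "cot"},
).json()
print(out["adapter"], out["output"])
```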

Run the frontend (dev)

The Vite dev server proxies /api/* to the backend on port 8000.

cd frontend
npm run dev

Then open the printed localhost URL (typically http://localhost:5173). Requests to /api will be forwarded to http://localhost:8000.

Troubleshooting

  • CPU vs GPU: The backend auto-selects GPU if available via torch.cuda.is_available(). On CPU-only systems, this is fine but slower.
  • PyTorch install: If you need CUDA-enabled PyTorch, follow the official install matrix and replace the torch wheel accordingly.
  • Tokenizer files: Ensure adapter directories include tokenizer.json / tokenizer_config.json and any special tokens to match training.
  • SentencePiece: Required for T5 tokenization; included in requirements.

License

This repository is provided as-is for demonstration and research. Review model licenses for any pretrained weights you use.
