A web-based interface for collecting expert feedback and gold-standard answers, used to fine-tune and evaluate medical question-answering (Q&A) and retrieval-augmented generation (RAG) systems.
Demo: https://qa-eval-dashboard.onrender.com/
- Review and provide feedback on medical Q&A pairs to support evaluation and fine-tuning
- Score model answers on accuracy, completeness, clarity, and clinical relevance
- Provide gold-standard answers
- Manage datasets to organize Q&A collections, with upload and download in CSV and JSON formats
- Admin interface for user and dataset management
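
To make the scoring and dataset features above more concrete, here is a minimal sketch of how one reviewed Q&A record could be represented, scored, and exported to JSON and CSV. The field names, the 1-5 scale, and the helper functions are assumptions made for illustration; they are not taken from the dashboard's actual schema or code, so check a downloaded dataset for the real column names.

```python
import csv
import io
import json
from statistics import mean

# Hypothetical rubric: the four criteria named in the feature list.
CRITERIA = ("accuracy", "completeness", "clarity", "clinical_relevance")

# Hypothetical record layout; the dashboard's real column names may differ.
FIELDS = ("question", "model_answer", "gold_answer") + CRITERIA


def overall_score(record):
    """Average the per-criterion scores (assumed here to be on a 1-5 scale)."""
    return mean(float(record[c]) for c in CRITERIA)


def records_to_csv(records):
    """Serialize a list of record dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_records(text):
    """Parse CSV text back into record dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


# Illustrative record, written for this example only.
records = [
    {
        "question": "What is the first-line treatment for anaphylaxis?",
        "model_answer": "Give intramuscular epinephrine.",
        "gold_answer": "Intramuscular epinephrine, given promptly.",
        "accuracy": 5,
        "completeness": 4,
        "clarity": 5,
        "clinical_relevance": 4,
    }
]

as_json = json.dumps(records, indent=2)   # JSON export
as_csv = records_to_csv(records)          # CSV export
round_tripped = csv_to_records(as_csv)    # CSV import
```

CSV stores every value as text, so scores read back from a CSV file need converting to numbers before aggregation; `overall_score` does this with `float`. JSON keeps the numeric types as they are.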
Install dependencies:
pip install -r requirements.txt
Run the application:
python app.py
The application will be available at http://localhost:5000