EKA-EVAL Demo
Evaluation Framework for Low-Resource Multilingual Large Language Models
EKA-EVAL is a unified framework for evaluating Large Language Models (LLMs) on low-resource languages across multilingual benchmarks.
Most existing evaluation frameworks focus heavily on English and other high-resource languages, and they typically require complex CLI workflows and configuration files.
EKA-EVAL addresses both problems with a Zero-Code Web Interface that lets researchers run multilingual evaluations directly from a browser.
| Capability | Description |
|---|---|
| 🌍 Multilingual Benchmarks | 55+ benchmarks including 23 multilingual datasets |
| 🖥 Zero-Code UI | Run evaluations without editing configs or code |
| 📊 Visual Analytics | Interactive charts and model comparisons |
| 🤖 AI Diagnostics | Automatic analysis of model failures |
| ⚡ Modular Framework | Easily extend with new models and datasets |
The eka-eval-demo repository provides the complete UI-based evaluation platform, combining a React frontend with a FastAPI backend.
The web interface allows users to perform full evaluations without writing code.
Select Model → Choose Benchmarks → Configure Parameters → Run Evaluation → Analyze Results
Users can:
- select multilingual benchmarks
- configure prompts and inference parameters
- monitor evaluation progress
- visualize benchmark performance
Users can build evaluation suites by selecting benchmarks from multiple categories.
Supported domains include:
| Category | Examples |
|---|---|
| Reasoning | ARC, MMLU |
| Code Generation | HumanEval |
| Commonsense | HellaSwag |
| Multilingual QA | XQuAD, XorQA |
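
Selecting benchmarks by category can be sketched in a few lines of Python. This is only an illustration of the idea, not the platform's actual code: the `CATALOG` mapping and the `build_suite` function are hypothetical names, and the catalog simply mirrors the example table above.

```python
from typing import Dict, List

# Hypothetical catalog that mirrors the category table above (illustrative only).
CATALOG: Dict[str, List[str]] = {
    "Reasoning": ["ARC", "MMLU"],
    "Code Generation": ["HumanEval"],
    "Commonsense": ["HellaSwag"],
    "Multilingual QA": ["XQuAD", "XorQA"],
}


def build_suite(categories: List[str]) -> List[str]:
    """Collect the benchmarks of the chosen categories, keeping order and dropping duplicates."""
    suite: List[str] = []
    for category in categories:
        if category not in CATALOG:
            raise KeyError(f"Unknown category: {category}")
        for name in CATALOG[category]:
            if name not in suite:
                suite.append(name)
    return suite


if __name__ == "__main__":
    print(build_suite(["Reasoning", "Multilingual QA"]))
```

Rejecting unknown categories early keeps a typo from silently producing an empty suite.
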
Adjust evaluation settings directly in the UI.
| Parameter | Purpose |
|---|---|
| Temperature | Controls randomness |
| Batch Size | Optimizes GPU throughput |
| Top-p / decoding | Controls generation diversity |
| GPU Manager | Select compute resources |
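
As a rough sketch, the parameters in the table could be grouped into one validated settings object before a run starts. The class name `EvalParams`, its field names, and the default values are assumptions made for illustration; they are not taken from the EKA-EVAL codebase.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalParams:
    """Hypothetical container for the UI parameters; defaults are placeholders."""

    temperature: float = 0.0        # 0.0 means greedy, deterministic decoding
    top_p: float = 1.0              # nucleus sampling threshold
    batch_size: int = 8             # prompts processed per forward pass
    gpu_ids: Tuple[int, ...] = (0,)  # devices chosen in the GPU manager

    def __post_init__(self) -> None:
        # Reject values that would make generation ill-defined.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if not self.gpu_ids:
            raise ValueError("at least one GPU must be selected")


if __name__ == "__main__":
    print(EvalParams(temperature=0.7, top_p=0.9, batch_size=16))
```

Validating once at construction time means a bad value is reported before any model is loaded.
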
Modify prompts without editing JSON configuration files.
Users can edit:
- system prompts
- few-shot examples
- prompt templates
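
To show how the three editable pieces fit together, here is a minimal sketch that assembles a system prompt, few-shot examples, and a template into one prompt string. The function `build_prompt` and the `{question}` / `{answer}` placeholder names are hypothetical; the platform's real template format may differ.

```python
from typing import List, Tuple


def build_prompt(
    system: str,
    shots: List[Tuple[str, str]],
    template: str,
    question: str,
) -> str:
    """Join the system prompt, rendered few-shot examples, and the final question."""
    parts: List[str] = []
    if system.strip():
        parts.append(system.strip())
    # Each few-shot example is rendered with the same template as the final question.
    for shot_question, shot_answer in shots:
        parts.append(template.format(question=shot_question, answer=shot_answer))
    # The final question leaves the answer slot empty for the model to complete.
    parts.append(template.format(question=question, answer="").rstrip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    template = "Q: {question}\nA: {answer}"
    print(build_prompt("Answer briefly.", [("2+2?", "4")], template, "3+3?"))
```
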
Real-time monitoring of evaluation progress.
Displays:
| Feature | Description |
|---|---|
| Live Logs | Streamed model inference logs |
| Benchmark Status | Task progress tracking |
| GPU Usage | Real-time resource monitoring |
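
Benchmark status tracking boils down to counting finished examples per task. The sketch below is one simple way to model that; `TaskProgress` and its fields are hypothetical names and do not describe how the backend actually reports progress.

```python
from dataclasses import dataclass


@dataclass
class TaskProgress:
    """Hypothetical per-benchmark progress counter."""

    name: str
    total: int
    done: int = 0

    def advance(self, n: int = 1) -> None:
        # Clamp so the counter never exceeds the number of examples.
        self.done = min(self.total, self.done + n)

    @property
    def percent(self) -> float:
        # An empty task is treated as already complete.
        return 100.0 * self.done / self.total if self.total else 100.0


if __name__ == "__main__":
    progress = TaskProgress("ARC", total=4)
    progress.advance()
    print(f"{progress.name}: {progress.percent:.0f}%")
```
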
Automatically analyzes model failures after evaluation.
Provides insights such as:
- hallucination patterns
- reasoning weaknesses
- multilingual performance gaps
Compare models across benchmarks and languages.
| Visualization | Purpose |
|---|---|
| Radar Charts | Compare model strengths |
| Bar Charts | Benchmark score breakdown |
| Leaderboards | Model ranking |
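
A leaderboard needs a single number per model. One common choice, assumed here purely for illustration, is the unweighted mean of per-benchmark scores; the source does not say how EKA-EVAL ranks models. The function name `leaderboard` and the scores in the usage example are placeholders, not real results.

```python
from typing import Dict, List, Tuple


def leaderboard(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank models by mean score across benchmarks, best first (ties broken by name)."""
    rows: List[Tuple[str, float]] = []
    for model, per_benchmark in scores.items():
        if not per_benchmark:
            continue  # skip models that have no results yet
        mean = sum(per_benchmark.values()) / len(per_benchmark)
        rows.append((model, mean))
    return sorted(rows, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    demo = {
        "model-a": {"ARC": 0.5, "MMLU": 0.75},
        "model-b": {"ARC": 0.25, "MMLU": 0.25},
    }
    for rank, (model, mean) in enumerate(leaderboard(demo), start=1):
        print(rank, model, mean)
```
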
EKA-EVAL includes one of the largest multilingual evaluation suites for LLMs.
| Benchmark | Description |
|---|---|
| IndicMMLU-Pro | Indic multi-task reasoning |
| MMLU-IN | Multilingual reasoning |
| MILU | Indic language understanding |
| TriviaQA-IN | Multilingual QA |
| ARC-Challenge-Indic | Science reasoning across Indic languages |
| Benchmark | Languages |
|---|---|
| Belebele | 122 languages |
| XQuAD-Indic | Hindi, Greek |
| XorQA-Indic | Bengali, Telugu |
| BoolQ-Indic | Indic languages |
| Indic-QA | Multilingual QA |
| Benchmark | Task |
|---|---|
| IndicNER | Named entity recognition |
| IndicSentiment | Sentiment classification |
| IndicGLUE | Multilingual NLU |
| XNLI | Cross-lingual inference |
| Benchmark | Task |
|---|---|
| Flores-IN | Translation |
| IndicWikiBio | Biographical generation |
| IndicParaphrase | Paraphrase generation |
Supported languages include:
Hindi • Bengali • Kannada • Malayalam • Odia • Telugu • Swahili • Yoruba
EKA-EVAL follows a four-layer modular architecture.
| Layer | Responsibility |
|---|---|
| Evaluation Engine | task scheduling, batching, distributed inference |
| Benchmark Registry | dataset loading and benchmark configuration |
| Model Interface | local models + API models |
| Results Processor | metrics computation and visualization |
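
To make the Benchmark Registry layer more concrete, the following sketch shows one way a registry can map benchmark names to loader functions so that new datasets plug in without touching the engine. Everything here (`BenchmarkRegistry`, `register`, `load`, and the `toy-qa` dataset) is hypothetical and only illustrates the pattern.

```python
from typing import Callable, Dict, List

BenchmarkLoader = Callable[[], List[dict]]


class BenchmarkRegistry:
    """Hypothetical registry mapping benchmark names to loader functions."""

    def __init__(self) -> None:
        self._loaders: Dict[str, BenchmarkLoader] = {}

    def register(self, name: str) -> Callable[[BenchmarkLoader], BenchmarkLoader]:
        """Return a decorator that stores the loader under the given name."""

        def decorator(loader: BenchmarkLoader) -> BenchmarkLoader:
            if name in self._loaders:
                raise ValueError(f"Benchmark already registered: {name}")
            self._loaders[name] = loader
            return loader

        return decorator

    def load(self, name: str) -> List[dict]:
        """Call the loader registered for this benchmark."""
        return self._loaders[name]()

    def names(self) -> List[str]:
        return sorted(self._loaders)


registry = BenchmarkRegistry()


@registry.register("toy-qa")
def load_toy_qa() -> List[dict]:
    # A tiny in-memory dataset standing in for a real benchmark.
    return [{"question": "2 + 2 = ?", "answer": "4"}]


if __name__ == "__main__":
    print(registry.names(), registry.load("toy-qa"))
```

The decorator form keeps each benchmark's registration next to its loading code, which is one way to support the "easily extend" goal stated earlier.
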
Clone the repository

    git clone https://github.com/lingo-iitgn/eka-eval-demo.git
    cd eka-eval-demo

Create environment

    conda create -n eka-env python=3.10 pip -y
    conda activate eka-env

Install dependencies

    pip install -r requirements.txt

The demo platform consists of two services.
| Service | Technology |
|---|---|
| Backend | FastAPI |
| Frontend | React |

    uvicorn main:app --reload

Backend runs at http://127.0.0.1:8000

    cd frontend
    npm install
    npm run dev

Frontend runs at http://localhost:5173

If you use EKA-EVAL in your research, please cite:
@misc{sinha2025ekaevalcomprehensiveevaluation,
title={Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages},
author={Samridhi Raj Sinha and Rajvee Sheth and Abhishek Upperwal and Mayank Singh},
year={2025},
eprint={2507.01853},
archivePrefix={arXiv},
primaryClass={cs.CL}
}







