
lingo-iitgn/eka-eval-demo



EKA-EVAL Demo


Evaluation Framework for Low-Resource Multilingual Large Language Models


Overview

EKA-EVAL is a unified framework for evaluating Large Language Models (LLMs) on low-resource languages in multilingual settings.

Most existing evaluation frameworks focus on English and other high-resource languages, and they typically require complex CLI workflows and configuration files.

EKA-EVAL targets low-resource languages directly and replaces the CLI workflow with a Zero-Code Web Interface, so researchers can run multilingual evaluations from a browser.

What the framework provides

| Capability | Description |
|---|---|
| 🌍 Multilingual Benchmarks | 55+ benchmarks, including 23 multilingual datasets |
| 🖥 Zero-Code UI | Run evaluations without editing configs or code |
| 📊 Visual Analytics | Interactive charts and model comparisons |
| 🤖 AI Diagnostics | Automatic analysis of model failures |
| ⚡ Modular Framework | Easily extend with new models and datasets |

The eka-eval-demo repository provides the complete UI-based evaluation platform, combining a React frontend with a FastAPI backend.


Zero-Code Evaluation Interface

The web interface allows users to perform full evaluations without writing code.

Workflow

Select Model → Choose Benchmarks → Configure Parameters → Run Evaluation → Analyze Results

Users can:

- select multilingual benchmarks
- configure prompts and inference parameters
- monitor evaluation progress
- visualize benchmark performance
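As a rough illustration of what the workflow collects before "Run Evaluation" starts, here is a minimal Python sketch of a request object with basic validation. The class `EvaluationRequest`, its fields, and its validation rules are hypothetical; they are not EKA-EVAL's actual schema or API.

```python
from dataclasses import dataclass, field


# Hypothetical request object for the UI workflow; the field names and
# limits below are assumptions, not EKA-EVAL's real schema.
@dataclass
class EvaluationRequest:
    model: str                                            # "Select Model"
    benchmarks: list[str] = field(default_factory=list)  # "Choose Benchmarks"
    temperature: float = 0.0                              # "Configure Parameters"
    top_p: float = 1.0
    batch_size: int = 8

    def validate(self) -> None:
        """Raise ValueError for settings that should block "Run Evaluation"."""
        if not self.model:
            raise ValueError("no model selected")
        if not self.benchmarks:
            raise ValueError("select at least one benchmark")
        if self.temperature < 0:
            raise ValueError("temperature must be >= 0")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")


request = EvaluationRequest(model="my-org/my-model", benchmarks=["ARC", "XQuAD"])
request.validate()  # passes; an empty benchmark list would raise ValueError
```

The model name `my-org/my-model` is a placeholder.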


UI Features


Benchmark Selection Dashboard

Users can build evaluation suites by selecting benchmarks from multiple categories.

Supported domains include:

| Category | Examples |
|---|---|
| Reasoning | ARC, MMLU |
| Code Generation | HumanEval |
| Commonsense | HellaSwag |
| Multilingual QA | XQuAD, XorQA |

Advanced Configuration Panel

Adjust evaluation settings directly in the UI.

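| Parameter | Purpose |
|---|---|
| Temperature | Controls the randomness of sampling |
| Batch Size | Number of examples processed together; larger batches improve GPU throughput at the cost of memory |
| Top-p / decoding | Controls generation diversity (nucleus sampling and related decoding settings) |
| GPU Manager | Selects the compute resources (GPUs) used for a run |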

Prompt Customization Interface

Modify prompts without editing JSON configuration files.

Users can edit:

- system prompts
- few-shot examples
- prompt templates
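The three editable pieces usually combine into one few-shot prompt. This is a minimal sketch assuming a simple `Question: ... / Answer:` layout built with Python's `string.Template`; the template text and the `build_prompt` helper are hypothetical, and EKA-EVAL's prompt configuration may use different fields.

```python
from string import Template

# Hypothetical template; EKA-EVAL's actual prompt templates may differ.
DEFAULT_TEMPLATE = Template("Question: $question\nAnswer:")


def build_prompt(system_prompt: str, few_shot: list[tuple[str, str]],
                 question: str, template: Template = DEFAULT_TEMPLATE) -> str:
    """Join a system prompt, solved demonstrations, and the target question."""
    parts = [system_prompt.strip()]
    for demo_question, demo_answer in few_shot:
        # Each demonstration reuses the template and appends its gold answer.
        parts.append(template.substitute(question=demo_question) + " " + demo_answer)
    # The target question ends at the open "Answer:" slot for the model to fill.
    parts.append(template.substitute(question=question))
    return "\n\n".join(parts)


prompt = build_prompt(
    "Answer in one word.",
    [("Capital of France?", "Paris")],
    "Capital of India?",
)
```

Keeping the template separate from the examples means a template edit in the UI applies to every demonstration and to the target question at once.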


Live Evaluation Dashboard

Real-time monitoring of evaluation progress.

Displays:

| Feature | Description |
|---|---|
| Live Logs | Streamed model inference logs |
| Benchmark Status | Task progress tracking |
| GPU Usage | Real-time resource monitoring |
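Benchmark status tracking of this kind can be reduced to a counter per benchmark that emits a log line on every update. The `BenchmarkProgress` class below is a hypothetical sketch of such a record, not the dashboard's actual implementation; a real backend would additionally stream these lines to the browser.

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkProgress:
    """Hypothetical per-benchmark status record for a live dashboard."""
    name: str
    total: int
    done: int = 0
    log: list[str] = field(default_factory=list)

    @property
    def percent(self) -> float:
        # An empty benchmark counts as complete.
        return 100.0 * self.done / self.total if self.total else 100.0

    def update(self, finished: int) -> str:
        """Record `finished` newly completed examples and return a log line."""
        self.done = min(self.total, self.done + finished)  # never exceed total
        line = f"[{self.name}] {self.done}/{self.total} ({self.percent:.0f}%)"
        self.log.append(line)
        return line


progress = BenchmarkProgress("ARC", total=200)
progress.update(50)  # returns "[ARC] 50/200 (25%)"
```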

AI Diagnosis Dashboard

Automatically analyzes model failures after evaluation.

Provides insights such as:

- hallucination patterns
- reasoning weaknesses
- multilingual performance gaps
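One simple way to quantify a multilingual performance gap is to compare each language's score on a benchmark with the best-scoring language. The sketch below assumes per-language scores on a 0 to 100 scale, keyed by language code, and an arbitrary 10-point threshold; the functions and the threshold are hypothetical and are not EKA-EVAL's diagnosis logic.

```python
def language_gaps(scores: dict[str, float]) -> dict[str, float]:
    """Points by which each language trails the best-scoring language."""
    best = max(scores.values())
    return {lang: round(best - score, 2) for lang, score in scores.items()}


def flag_weak_languages(scores: dict[str, float], threshold: float = 10.0) -> list[str]:
    """Languages trailing the best one by more than `threshold`, largest gap first."""
    gaps = language_gaps(scores)
    weak = [lang for lang, gap in gaps.items() if gap > threshold]
    return sorted(weak, key=lambda lang: gaps[lang], reverse=True)
```

A relative gap like this separates "the model is weak on this task" from "the model is weak in this language", which a single averaged score hides.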


Interactive Leaderboard

Compare models across benchmarks and languages.

Visualization Examples

| Visualization | Purpose |
|---|---|
| Radar Charts | Compare model strengths |
| Bar Charts | Benchmark score breakdown |
| Leaderboards | Model ranking |
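A ranking needs a rule for models that were not run on every benchmark. The sketch below assumes a nested `{model: {benchmark: score}}` dictionary and ranks models by their mean over the benchmarks that all of them share; this is one reasonable convention, not necessarily the one EKA-EVAL's leaderboard uses.

```python
from statistics import mean


def leaderboard(results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank models by their mean score over benchmarks that every model ran."""
    if not results:
        return []
    # Restrict to shared benchmarks so a model is not rewarded for skipping hard ones.
    shared = set.intersection(*(set(scores) for scores in results.values()))
    if not shared:
        raise ValueError("no benchmark was run by every model")
    board = [
        (model, round(mean(scores[b] for b in sorted(shared)), 2))
        for model, scores in results.items()
    ]
    # Highest mean first; ties keep insertion order because sorted() is stable.
    return sorted(board, key=lambda row: row[1], reverse=True)
```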

Low-Resource Multilingual Benchmark Suite

EKA-EVAL includes one of the largest multilingual evaluation suites for LLMs, with particular depth in Indic languages.

Knowledge & Reasoning

| Benchmark | Description |
|---|---|
| IndicMMLU-Pro | Indic multi-task reasoning |
| MMLU-IN | Multilingual reasoning |
| MILU | Indic language understanding |
| TriviaQA-IN | Multilingual QA |
| ARC-Challenge-Indic | Science reasoning across Indic languages |

Reading & Question Answering

| Benchmark | Coverage |
|---|---|
| Belebele | 122 languages |
| XQuAD-Indic | Hindi, Greek |
| XorQA-Indic | Bengali, Telugu |
| BoolQ-Indic | Indic languages |
| Indic-QA | Multilingual QA |

Natural Language Understanding

| Benchmark | Task |
|---|---|
| IndicNER | Named entity recognition |
| IndicSentiment | Sentiment classification |
| IndicGLUE | Multilingual NLU |
| XNLI | Cross-lingual inference |

Multilingual Generation

| Benchmark | Task |
|---|---|
| Flores-IN | Translation |
| IndicWikiBio | Biographical generation |
| IndicParaphrase | Paraphrase generation |

Supported languages include:

Hindi • Bengali • Kannada • Malayalam • Odia • Telugu • Swahili • Yoruba


System Architecture

EKA-EVAL follows a four-layer modular architecture.

| Layer | Responsibility |
|---|---|
| Evaluation Engine | Task scheduling, batching, distributed inference |
| Benchmark Registry | Dataset loading and benchmark configuration |
| Model Interface | Local models and API-based models |
| Results Processor | Metric computation and visualization |
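To make the division of responsibilities concrete, here is a minimal Python sketch of the four layers wired together. All class and function names are hypothetical, exact-match accuracy stands in for whatever metrics each benchmark actually uses, and batches run sequentially rather than through distributed inference.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Protocol


@dataclass
class Example:
    prompt: str
    answer: str


class ModelInterface(Protocol):
    """Model Interface: local and API models expose the same batch method."""

    def generate(self, prompts: list[str]) -> list[str]: ...


class BenchmarkRegistry:
    """Benchmark Registry: maps benchmark names to dataset loaders."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], list[Example]]] = {}

    def register(self, name: str, loader: Callable[[], list[Example]]) -> None:
        self._loaders[name] = loader

    def load(self, name: str) -> list[Example]:
        return self._loaders[name]()


def batched(items: list[Example], size: int) -> Iterator[list[Example]]:
    """Yield consecutive slices of at most `size` examples."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_benchmark(model: ModelInterface, examples: list[Example],
                  batch_size: int) -> list[bool]:
    """Evaluation Engine: send prompts in batches, record exact-match correctness."""
    correct: list[bool] = []
    for batch in batched(examples, batch_size):
        outputs = model.generate([ex.prompt for ex in batch])
        correct.extend(out.strip() == ex.answer for out, ex in zip(outputs, batch))
    return correct


def accuracy(correct: list[bool]) -> float:
    """Results Processor: reduce per-example results to a single metric."""
    return sum(correct) / len(correct) if correct else 0.0
```

Because the engine only depends on `generate`, a local model and an API model can be swapped without touching the registry or the results code, which is the point of separating the layers.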

Installation

Clone the repository

```bash
git clone https://github.com/lingo-iitgn/eka-eval-demo.git
cd eka-eval-demo
```

Create environment

```bash
conda create -n eka-env python=3.10 pip -y
conda activate eka-env
```

Install dependencies

```bash
pip install -r requirements.txt
```

Running the Application

The demo platform consists of two services.

| Service | Technology |
|---|---|
| Backend | FastAPI |
| Frontend | React |

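Start Backend

```bash
uvicorn main:app --reload
```

The backend runs at http://127.0.0.1:8000.

Start Frontend

In a second terminal, since the backend keeps running in the first:

```bash
cd frontend
npm install
npm run dev
```

The frontend runs at http://localhost:5173.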

Citation

If you use EKA-EVAL in your research, please cite:

```bibtex
@misc{sinha2025ekaevalcomprehensiveevaluation,
      title={Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages}, 
      author={Samridhi Raj Sinha and Rajvee Sheth and Abhishek Upperwal and Mayank Singh},
      year={2025},
      eprint={2507.01853},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

