EKA-EVAL Demo

Evaluation Framework for Low-Resource Multilingual Large Language Models

Overview

EKA-EVAL is a unified framework for evaluating Large Language Models (LLMs) across low-resource languages.

Most existing evaluation frameworks focus heavily on English and other high-resource languages, and they typically require complex CLI workflows and configuration files.

EKA-EVAL addresses both problems with a Zero-Code Web Interface that lets researchers run multilingual evaluations directly from a browser.

What the framework provides

Capability	Description
🌍 Multilingual Benchmarks	55+ benchmarks including 23 multilingual datasets
🖥 Zero-Code UI	Run evaluations without editing configs or code
📊 Visual Analytics	Interactive charts and model comparisons
🤖 AI Diagnostics	Automatic analysis of model failures
⚡ Modular Framework	Easily extend with new models and datasets

The eka-eval-demo repository provides the complete UI-based evaluation platform, combining a React frontend with a FastAPI backend.

Zero-Code Evaluation Interface

The web interface allows users to perform full evaluations without writing code.

Workflow

Select Model → Choose Benchmarks → Configure Parameters → Run Evaluation → Analyze Results

Users can:

• select multilingual benchmarks
• configure prompts and inference parameters
• monitor evaluation progress
• visualize benchmark performance

UI Features

Benchmark Selection Dashboard

Users can build evaluation suites by selecting benchmarks from multiple categories.

Supported domains include:

Category	Examples
Reasoning	ARC, MMLU
Code Generation	HumanEval
Commonsense	HellaSwag
Multilingual QA	XQuAD, XorQA

Advanced Configuration Panel

Fine-tune evaluation settings directly in the UI.

Parameter	Purpose
Temperature	Controls randomness
Batch Size	Optimizes GPU throughput
Top-p / decoding	Controls generation diversity
GPU Manager	Select compute resources
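
To make the Temperature and Top-p rows concrete, here is a minimal sketch of how these two parameters usually shape token sampling. It is a generic illustration in plain Python, not EKA-EVAL's inference code; the function names are assumptions chosen for this example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and nucleus (top-p) filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With a low temperature and a small top-p, sampling collapses toward the single most likely token, which is why such settings are common when reproducible benchmark scores matter more than output diversity.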

Prompt Customization Interface

Modify prompts without editing JSON configuration files.

Users can edit:

• system prompts
• few-shot examples
• prompt templates
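
The three editable pieces above typically combine into one final prompt. The sketch below shows one plausible way to assemble them; `PromptConfig`, its field names, and `build_prompt` are hypothetical and do not reflect EKA-EVAL's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class PromptConfig:
    # Hypothetical structure for illustration only.
    system_prompt: str
    template: str  # expects {question} and {answer} placeholders
    few_shot: list = field(default_factory=list)  # (question, answer) pairs


def build_prompt(config, question):
    """Join the system prompt, the few-shot examples, and the target question."""
    parts = [config.system_prompt]
    for shot_question, shot_answer in config.few_shot:
        parts.append(config.template.format(question=shot_question, answer=shot_answer))
    # Leave the final answer slot empty so the model completes it.
    parts.append(config.template.format(question=question, answer="").rstrip())
    return "\n\n".join(parts)


config = PromptConfig(
    system_prompt="Answer briefly.",
    template="Q: {question}\nA: {answer}",
    few_shot=[("2+2?", "4")],
)
prompt = build_prompt(config, "3+3?")
```

Keeping the template separate from the examples means a change to the template applies uniformly to every few-shot example and to the final question.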

Live Evaluation Dashboard

Real-time monitoring of evaluation progress.

Displays:

Feature	Description
Live Logs	Streamed model inference logs
Benchmark Status	Task progress tracking
GPU Usage	Real-time resource monitoring

AI Diagnosis Dashboard

Automatically analyzes model failures after evaluation.

Provides insights such as:

• hallucination patterns
• reasoning weaknesses
• multilingual performance gaps
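
How the dashboard derives these insights is not described here. As one simple illustration of a multilingual performance gap, the sketch below compares each language's score with a reference language. The function name, the language codes, and the scores are made up for the example.

```python
def language_gaps(scores, reference="en"):
    """Return each language's score minus the reference score, largest deficit first."""
    if reference not in scores:
        raise KeyError(f"no score for reference language {reference!r}")
    base = scores[reference]
    gaps = {
        lang: round(score - base, 4)
        for lang, score in scores.items()
        if lang != reference
    }
    # Sort ascending so the most negative gap (worst language) comes first.
    return dict(sorted(gaps.items(), key=lambda item: item[1]))


# Illustrative accuracy values, not real benchmark results.
gaps = language_gaps({"en": 0.80, "hi": 0.65, "te": 0.55})
```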

Interactive Leaderboard

Compare models across benchmarks and languages.

Visualization Examples

Visualization	Purpose
Radar Charts	Compare model strengths
Bar Charts	Benchmark score breakdown
Leaderboards	Model ranking
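
A leaderboard ranking can be as simple as ordering models by their mean benchmark score. The sketch below assumes that aggregation rule; EKA-EVAL may rank differently, and the model names and scores are placeholders.

```python
def rank_models(results):
    """Rank models by mean score across benchmarks, best first.

    results maps model name -> {benchmark name: score}.
    """
    rows = []
    for model, by_benchmark in results.items():
        if not by_benchmark:
            continue  # skip models with no completed benchmarks
        mean = sum(by_benchmark.values()) / len(by_benchmark)
        rows.append((model, mean))
    # Sort by descending mean, then by name so ties are deterministic.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Placeholder scores for illustration only.
ranking = rank_models({
    "model-a": {"arc": 0.5, "mmlu": 0.7},
    "model-b": {"arc": 0.8, "mmlu": 0.6},
    "model-c": {},
})
```

A plain mean treats every benchmark equally; a per-language or per-category breakdown, as in the radar and bar charts, avoids hiding a weakness behind a strong average.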

Low-Resource Multilingual Benchmark Suite

EKA-EVAL bundles 55+ benchmarks, 23 of them multilingual, which makes it one of the larger multilingual evaluation suites for LLMs.

Knowledge & Reasoning

Benchmark	Description
IndicMMLU-Pro	Indic multi-task reasoning
MMLU-IN	Multilingual reasoning
MILU	Indic language understanding
TriviaQA-IN	Multilingual QA
ARC-Challenge-Indic	Science reasoning across Indic languages

Reading & Question Answering

Benchmark	Languages
Belebele	122 languages
XQuAD-Indic	Hindi, Greek
XorQA-Indic	Bengali, Telugu
BoolQ-Indic	Indic languages
Indic-QA	Multilingual QA

Natural Language Understanding

Benchmark	Task
IndicNER	Named entity recognition
IndicSentiment	Sentiment classification
IndicGLUE	Multilingual NLU
XNLI	Cross-lingual inference

Multilingual Generation

Benchmark	Task
Flores-IN	Translation
IndicWikiBio	Biographical generation
IndicParaphrase	Paraphrase generation

Supported languages include:

Hindi • Bengali • Kannada • Malayalam • Odia • Telugu • Swahili • Yoruba

System Architecture

EKA-EVAL follows a four-layer modular architecture.

Layer	Responsibility
Evaluation Engine	task scheduling, batching, distributed inference
Benchmark Registry	dataset loading and benchmark configuration
Model Interface	local models + API models
Results Processor	metrics computation and visualization
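
The sketch below shows how four such layers could fit together in a few lines of Python. Every class and function name is hypothetical and chosen for illustration; none is taken from the EKA-EVAL codebase, and `EchoModel` is a toy stand-in for a real model.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


class BenchmarkRegistry:
    """Benchmark Registry layer: maps benchmark names to dataset loaders."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], List[Example]]] = {}

    def register(self, name: str, loader: Callable[[], List[Example]]) -> None:
        self._loaders[name] = loader

    def load(self, name: str) -> List[Example]:
        return self._loaders[name]()


class EchoModel:
    """Model Interface layer: a toy model that answers with the prompt's last word."""

    def generate(self, prompts: List[str]) -> List[str]:
        return [prompt.split()[-1] for prompt in prompts]


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Results Processor layer: fraction of predictions equal to the reference."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def evaluate(model, registry: BenchmarkRegistry, benchmark: str, batch_size: int = 2) -> float:
    """Evaluation Engine layer: load a benchmark, run it in batches, score it."""
    examples = registry.load(benchmark)
    predictions: List[str] = []
    for start in range(0, len(examples), batch_size):
        batch = examples[start:start + batch_size]
        predictions.extend(model.generate([prompt for prompt, _ in batch]))
    return exact_match(predictions, [answer for _, answer in examples])


registry = BenchmarkRegistry()
registry.register("toy", lambda: [("say yes", "yes"), ("say no", "no"), ("say maybe", "perhaps")])
score = evaluate(EchoModel(), registry, "toy")
```

Because the engine only depends on a `generate` method and a registry lookup, a new model or dataset can be added without touching the other layers.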

Installation

Clone the repository

git clone https://github.com/lingo-iitgn/eka-eval-demo.git
cd eka-eval-demo

Create environment

conda create -n eka-env python=3.10 pip -y
conda activate eka-env

Install dependencies

pip install -r requirements.txt

Running the Application

The demo platform consists of two services.

Service	Technology
Backend	FastAPI
Frontend	React

Start Backend

uvicorn main:app --reload

Backend runs at

http://127.0.0.1:8000

Start Frontend

cd frontend
npm install
npm run dev

Frontend runs at

http://localhost:5173

Citation

If you use EKA-EVAL in your research, please cite:

@misc{sinha2025ekaevalcomprehensiveevaluation,
      title={Eka-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages}, 
      author={Samridhi Raj Sinha and Rajvee Sheth and Abhishek Upperwal and Mayank Singh},
      year={2025},
      eprint={2507.01853},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
