LLM Evaluator Service

A modular service for static, LLM-based evaluation of LLM outputs.

Requirements

  • Python 3.12 or higher
  • Dependencies are managed through pyproject.toml

Setup

  1. Install UV (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Create and activate a virtual environment using UV:
uv venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows
  3. Install dependencies using UV:
uv pip install -e .

Available Scripts

Chatbot Evaluation Scripts

Located in scripts/chatbot/:

  • simulate_convo.py - Creates simulated conversations for testing and evaluation purposes (a sketch of the output shape follows this list)

    • Usage: uv run scripts/chatbot/simulate_convo.py
  • pre_merge_check.py - Runs validation checks before merging code changes

    • Usage: uv run scripts/chatbot/pre_merge_check.py
  • nightly_report.py - Generates daily evaluation reports

    • Usage: uv run scripts/chatbot/nightly_report.py
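
As a rough illustration of the output simulate_convo.py might produce, the sketch below assembles a conversation as alternating user/assistant turns and writes it to mock_data/. The turn schema and file name are illustrative assumptions, not the script's actual format.

import json
from pathlib import Path

# Hypothetical turn schema; the real script may structure conversations differently.
conversation = [
    {"role": "user", "content": "Are my tickets transferable?"},
    {"role": "assistant", "content": "Yes, tickets can be transferred from the orders page."},
    {"role": "user", "content": "How long does a transfer take?"},
    {"role": "assistant", "content": "Transfers are usually delivered within a few minutes."},
]

out_dir = Path("mock_data")
out_dir.mkdir(exist_ok=True)
out_path = out_dir / "simulated_convo.json"  # assumed file name
out_path.write_text(json.dumps(conversation, indent=2))
print(f"Wrote {len(conversation)} turns to {out_path}")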

Main Evaluation Script

convo_eval.py

  • Core evaluation script for analyzing conversations (see the sketch below)
  • Usage: uv run convo_eval.py
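
The script's internals aren't documented here, but scoring a conversation turn with DeepEval generally follows the pattern sketched below, assuming AnswerRelevancyMetric as the metric and DeepEval's default OpenAI judge (which needs OPENAI_API_KEY set); convo_eval.py's actual metrics and inputs may differ.

# Minimal sketch of scoring one conversation turn with DeepEval.
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

test_case = LLMTestCase(
    input="Are my tickets transferable?",  # user turn
    actual_output="Yes, you can transfer tickets from the orders page.",  # bot reply
)

metric = AnswerRelevancyMetric(threshold=0.7)  # threshold chosen arbitrarily here
metric.measure(test_case)
print(metric.score, metric.reason)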

FAQ Generator Scripts

Located in scripts/faq_generator/:

  • faq_eval.py - Evaluates FAQ content using DeepEval metrics (see the sketch after this list)
    • Usage: uv run scripts/faq_generator/faq_eval.py --input "Your prompt" --content "Generated FAQ content" --context "Reference material"
    • Required arguments:
      • --input: The input prompt text used to generate the FAQ
      • --content: The generated FAQ content to evaluate
      • --context: The reference material or ground truth to check against
    • Output: Generates a CSV file in deepeval_results/faq_eval/ with evaluation metrics including:
      • Hallucination score
      • Evaluation reasoning
      • Cost metrics
    • Requirements:
      • DEEPEVAL_API_KEY environment variable must be set
      • Python 3.12 or higher
      • DeepEval package installed
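
The three flags map naturally onto a DeepEval test case. A minimal sketch of the kind of hallucination check described above, using DeepEval's HallucinationMetric (the script's exact metric configuration and CSV-writing logic may differ):

from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

# --input, --content, and --context map onto the test case fields.
test_case = LLMTestCase(
    input="Generate an FAQ about ticket refunds.",
    actual_output="Q: Can I get a refund? A: Refunds are issued within 7 days.",
    context=["Refunds are issued to the original payment method within 7 days."],
)

metric = HallucinationMetric(threshold=0.5)  # assumed threshold
metric.measure(test_case)

print("Hallucination score:", metric.score)
print("Reasoning:", metric.reason)
print("Cost:", metric.evaluation_cost)  # populated when the judge model reports token usage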

Project Structure

  • src/ - Source code directory
  • scripts/ - Utility scripts for various tasks
    • chatbot/ - Chatbot evaluation and testing scripts
    • faq_generator/ - FAQ generation scripts
  • mock_data/ - Sample data for testing
  • deepeval_results/ - Output directory for evaluation results

Dependencies

Main dependencies include:

  • deepeval (>=2.7.6)
  • deepteam (>=0.0.9)
  • Google API Client Libraries
  • python-dotenv

Environment Variables

The project uses environment variables for configuration. Create a .env file in the root directory with necessary credentials and settings.
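
A minimal example, assuming only the DEEPEVAL_API_KEY variable documented above (your deployment may need additional credentials, e.g. for the judge model):

# .env
DEEPEVAL_API_KEY=your-deepeval-key

The scripts can then load it with python-dotenv:

from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the project root into the process environment
api_key = os.getenv("DEEPEVAL_API_KEY")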

Contributing

  1. Follow the existing code structure and style
  2. Update documentation as needed
