feat(llmobs): add summary evaluator functionality for datasets & experiments (#14629)
- Changes the `ExperimentResult` class to hold both the summary evaluator results and the per-row experiment results.
- Adds the ability to define and run summary evaluators.

Example: given this script
```
#!/usr/bin/env python
# coding: utf-8
import os
import math
import random

from dotenv import load_dotenv

# Load environment variables from the .env file.
load_dotenv(override=True)

from ddtrace.llmobs import LLMObs

LLMObs.enable(
    api_key=os.getenv("DD_API_KEY"),
    app_key=os.getenv("DD_APPLICATION_KEY"),
    project_name="Onboarding",
    ml_app="Onboarding-ML-App",
)

dataset = LLMObs.create_dataset_from_csv(
    csv_path="./data/taskmaster.csv",
    dataset_name="taskmaster-mini-314",
    input_data_columns=["prompt", "topics"],
    expected_output_columns=["labels"],
)
dataset.as_dataframe()


def return_hello(input_data, config):
    return "hello"


def return_2(input_data, output_data, expected_output):
    return 2


def sum_of_rows_times_2_and_hellos(
    inputs, outputs, expected_outputs, evaluators_results
):
    return sum(evaluators_results["return_2"]) + len(outputs)


experiment = LLMObs.experiment(
    name="taskmaster-experiment",
    dataset=dataset,
    task=return_hello,
    evaluators=[return_2],
    summary_evaluators=[sum_of_rows_times_2_and_hellos],
)
results = experiment.run(jobs=50, raise_errors=True)
print(experiment.url)
```
results in
https://dddev.datadoghq.com/llm/experiments/889df25f-95b8-4149-9ba0-756c1afcc5f6
<img width="327" height="298" alt="image"
src="https://github.com/user-attachments/assets/759393c2-109a-43f6-82fd-21bc1ffbe0fc"
/>
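The summary evaluator returns 3 * the number of records: each row contributes 2 through `return_2` and 1 through `len(outputs)`. This matches the value shown above, so the result is correct.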
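To check the 3 * number-of-records arithmetic without Datadog credentials, the flow can be approximated in plain Python. This is only a sketch: `run_locally` and the record shape are hypothetical stand-ins, not the `ddtrace` implementation, and it assumes `evaluators_results` is a dict mapping each evaluator's function name to its list of per-row results, as the script's `evaluators_results["return_2"]` lookup suggests.

```python
def return_hello(input_data, config):
    return "hello"


def return_2(input_data, output_data, expected_output):
    return 2


def sum_of_rows_times_2_and_hellos(
    inputs, outputs, expected_outputs, evaluators_results
):
    return sum(evaluators_results["return_2"]) + len(outputs)


def run_locally(records, task, evaluators, summary_evaluators):
    """Hypothetical stand-in for experiment.run(); not the ddtrace code."""
    inputs = [r["input_data"] for r in records]
    expected_outputs = [r["expected_output"] for r in records]

    # Run the task once per row (config is not used in this sketch).
    outputs = [task(i, None) for i in inputs]

    # Per-row evaluator results, keyed by evaluator function name (assumed shape).
    evaluators_results = {
        ev.__name__: [
            ev(i, o, e) for i, o, e in zip(inputs, outputs, expected_outputs)
        ]
        for ev in evaluators
    }

    # Each summary evaluator runs once over the whole experiment.
    summary = {
        s.__name__: s(inputs, outputs, expected_outputs, evaluators_results)
        for s in summary_evaluators
    }
    return {"rows": evaluators_results, "summary": summary}


# Hypothetical four-row dataset.
records = [
    {"input_data": {"prompt": f"p{i}"}, "expected_output": "label"}
    for i in range(4)
]
result = run_locally(
    records, return_hello, [return_2], [sum_of_rows_times_2_and_hellos]
)
print(result["summary"])  # {'sum_of_rows_times_2_and_hellos': 12}
```

With four rows the summary value is 2 * 4 + 4 = 12, i.e. 3 * the number of records.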
## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)