
feat(llmobs): support joining custom evaluations via tags #11535

Open
wants to merge 25 commits into base: main
Conversation

@lievan (Contributor) commented Nov 25, 2024

This PR implements the LLMObs.submit_evaluation_for method, which gives users two options for joining custom evaluations:

  • by tag via the span_with_tag argument, which accepts a tuple containing a tag key/value pair
  • by span via the span argument, which accepts a dictionary containing span_id and trace_id keys
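The two join shapes above can be sketched as plain Python. This is an illustrative validator only, not the library's implementation; the helper name `build_join_with` and the returned dict layout are assumptions for this sketch.

```python
def build_join_with(span_with_tag=None, span=None):
    """Validate one of the two join options and return a join-style dict.

    Hypothetical sketch: mirrors the argument shapes described in the PR
    (a (key, value) tuple, or a dict with span_id and trace_id keys), and
    the new method's stricter behavior of raising on wrong values/types.
    """
    if (span_with_tag is None) == (span is None):
        raise ValueError("provide exactly one of span_with_tag or span")
    if span_with_tag is not None:
        if not (isinstance(span_with_tag, tuple) and len(span_with_tag) == 2):
            raise TypeError("span_with_tag must be a (tag_key, tag_value) tuple")
        tag_key, tag_value = span_with_tag
        return {"tag": {"key": tag_key, "value": tag_value}}
    if not isinstance(span, dict) or {"span_id", "trace_id"} - span.keys():
        raise TypeError("span must be a dict with span_id and trace_id keys")
    return {"span": {"span_id": span["span_id"], "trace_id": span["trace_id"]}}

# Join by tag: every span carrying this tag key/value gets the evaluation.
print(build_join_with(span_with_tag=("session_id", "abc-123")))
# Join by span: target one specific span.
print(build_join_with(span={"span_id": "123", "trace_id": "456"}))
```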

There are also a couple of behavior differences between submit_evaluation_for and submit_evaluation. In the new method, we

  • raise whenever a required argument has the wrong value or type
  • remove the metadata argument
  • move the warning log for a missing API key to the eval metric writer's periodic method

Other changes:

Eval metric writer

Updates the eval metric writer to write to the v2 eval metric endpoint. The main difference with this endpoint is that it accepts a join_with field that holds the joining information, instead of top-level trace and span ID fields.
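For illustration, a hypothetical before/after of the event payload. Only the join_with field is confirmed by this PR; the other field names (label, metric_type, categorical_value) are assumptions for the sketch.

```python
# Hypothetical v1-style event: trace/span IDs sit at the top level.
v1_event = {
    "label": "sentiment",
    "metric_type": "categorical",
    "categorical_value": "positive",
    "trace_id": "456",
    "span_id": "123",
}

# Hypothetical v2-style event: joining information moves under join_with,
# which can also carry a tag-based join instead of an explicit span.
v2_event = {
    "label": "sentiment",
    "metric_type": "categorical",
    "categorical_value": "positive",
    "join_with": {"span": {"span_id": "123", "trace_id": "456"}},
}

print("join_with" in v2_event)
```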

Deprecate submit_evaluation

Deprecates submit_evaluation, with the removal version set to 3.0.0.

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@github-actions (bot) commented Nov 25, 2024

CODEOWNERS have been resolved as:

releasenotes/notes/submit-evaluation-for-01096d803d969e3e.yaml          @DataDog/apm-python
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_categorical_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_metric_bad_api_key.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_multiple_events.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_eval_metric_writer.test_send_timed_events.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_evaluator_runner.send_score_metric.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.emits_traces_and_evaluations_on_exit.yaml  @DataDog/ml-observability
tests/llmobs/llmobs_cassettes/tests.llmobs.test_llmobs_ragas_faithfulness_evaluator.test_ragas_faithfulness_emits_traces.yaml  @DataDog/ml-observability
tests/llmobs/test_llmobs_eval_metric_writer.py                          @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability

@lievan lievan changed the title feat(llmobs): Support joining custom evaluations via tags feat(llmobs): support joining custom evaluations via tags Nov 25, 2024
@pr-commenter (bot) commented Nov 25, 2024

Benchmarks

Benchmark execution time: 2025-01-08 23:33:22

Comparing candidate commit 86d204f in PR branch evan.li/submit-evaluation-for with baseline commit 75bed24 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 380 metrics, 2 unstable metrics.

@lievan lievan marked this pull request as ready for review November 25, 2024 20:24
@lievan lievan requested review from a team as code owners November 25, 2024 20:24
@lievan lievan changed the title feat(llmobs): support joining custom evaluations via tags feat(llmobs): support joining custom evaluations via tags Nov 26, 2024
@Yun-Kim (Contributor) left a comment

Did a first quick pass, will do another pass on Monday!

@christophe-papazian christophe-papazian removed their request for review December 3, 2024 13:42
@Yun-Kim (Contributor) left a comment

Very minor comments/suggestions, but approving since none are blocking!

@datadog-dd-trace-py-rkomorn (bot) commented Jan 7, 2025

Datadog Report

Branch report: evan.li/submit-evaluation-for
Commit report: 86d204f
Test service: dd-trace-py

✅ 0 Failed, 87 Passed, 1468 Skipped, 3m 46.07s Total duration (35m 21.75s time saved)
