Conversation

@rogthefrog (Contributor) commented Dec 10, 2025

This generates the list of annotations for a given job, for consumption by modelrunner.

I still need to double-check that the job ID is what we expect.

Questions:

  • is this the right place to pull the annotations?
  • is the source_id sufficient to reconcile these with their matching prompts or responses?
  • is there anything else that would be useful to output?

Related to https://github.com/mlcommons/sugar/pull/303

@github-actions bot commented Dec 10, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@superdosh (Contributor) left a comment


I remember we had a conversation about how these annotations would be filtered to only those specifically marked as "exportable" in some way. Is that still correct?

@wpietri wpietri self-requested a review December 10, 2025 17:09
@rogthefrog (Contributor, Author)

> I remember we had a conversation about how these annotations would be filtered to only those specifically marked as "exportable" in some way. Is that still correct?

That's right, good reminder. IIRC only the annotations from a run using the prompts in the demo set would be available?

@superdosh (Contributor)

> That's right, good reminder. IIRC only the annotations from a run using the prompts in the demo set would be available?

@wpietri had the idea that we would add a column to our prompt datasets that would indicate exportability, and then we'd just check against that?

@wpietri (Contributor) commented Dec 10, 2025

> @wpietri had the idea that we would add a column to our prompt datasets that would indicate exportability, and then we'd just check against that?

Yes, I think that's the best solution. I have little reason to think that we will have a demo prompt set for security or other future benchmarks, but we'll want to be able to offer annotations on a stable fraction of the non-official prompts. So I think annotation exportability should just be made explicit for every prompt and be part of the prompt's metadata.
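The exportability check discussed above could be sketched roughly as follows. This is a hypothetical illustration, not the actual modelbench schema: the `exportable` column, the `prompt_uid` field, and the annotation shape are all assumptions.

```python
import csv
import io

# Hypothetical prompt dataset with an "exportable" column, as suggested above.
# Column names are illustrative assumptions, not the real schema.
PROMPTS_CSV = """prompt_uid,text,exportable
p1,Example prompt one,true
p2,Example prompt two,false
p3,Example prompt three,true
"""


def exportable_prompt_uids(csv_text: str) -> set[str]:
    """Return the UIDs of prompts explicitly marked exportable.

    Anything missing the flag, or with any value other than "true",
    defaults to NOT exportable.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["prompt_uid"]
        for row in reader
        if row.get("exportable", "").strip().lower() == "true"
    }


# Filter a list of annotations against the exportable set.
annotations = [
    {"prompt_uid": "p1", "annotation": {"is_safe": True}},
    {"prompt_uid": "p2", "annotation": {"is_safe": False}},
]
allowed = exportable_prompt_uids(PROMPTS_CSV)
export = [a for a in annotations if a["prompt_uid"] in allowed]
```

Defaulting to "not exportable" when the flag is absent matches the conservative direction of the discussion: a prompt must opt in to have its annotations exported.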

@wpietri (Contributor) left a comment


This looks like a plausible start. I'm glad to see it wasn't too hard. My thoughts:

  • We need to be very sure we never give anybody annotations outside the set of prompts for which that is allowed. Doing so is an error on the order of a $50k cost. So modelrunner should never put anything in the file that isn't explicitly allowed.
  • I'm not sure we should be exposing our source_id fields generally, as they may contain information about vendors and such. Kurt would know.
  • We should include the prompt and response along with the annotation, so it's easy for them to understand what about their model they need to improve. Probably also the hazard as long as we're at it.
  • It looks like compile_annotations could go on the BenchmarkRun object nicely, so it becomes part of the API.
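Taken together, the bullets above could look something like this sketch: `compile_annotations` lives on a BenchmarkRun-like object, exports only explicitly allowed prompts, and includes the prompt, response, and hazard with each annotation. All class, field, and method names here are illustrative assumptions, not the real modelbench API.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedItem:
    # Illustrative fields, not the actual modelbench data model.
    prompt_uid: str
    prompt: str
    response: str
    hazard: str
    annotation: dict


@dataclass
class BenchmarkRun:
    items: list[AnnotatedItem] = field(default_factory=list)

    def compile_annotations(self, allowed_uids: set[str]) -> list[dict]:
        """Export annotations only for prompts on an explicit allow-list.

        Anything not explicitly allowed is dropped, so nothing leaks into
        the export file by default. Prompt, response, and hazard travel
        with each annotation so consumers can see context.
        """
        return [
            {
                "prompt_uid": item.prompt_uid,
                "prompt": item.prompt,
                "response": item.response,
                "hazard": item.hazard,
                "annotation": item.annotation,
            }
            for item in self.items
            if item.prompt_uid in allowed_uids
        ]


# Usage: only the allow-listed item appears in the export.
run = BenchmarkRun(items=[
    AnnotatedItem("p1", "prompt one", "response one", "hazard_a", {"is_safe": True}),
    AnnotatedItem("p2", "prompt two", "response two", "hazard_b", {"is_safe": False}),
])
exported = run.compile_annotations({"p1"})
```

The allow-list-only design makes the expensive failure mode (exporting an annotation that should not leave the system) impossible by construction rather than something a filter has to catch.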

@rogthefrog rogthefrog force-pushed the 226-return-annotations branch from 2be3f3a to 8cca1cd on December 11, 2025 00:50
@rogthefrog rogthefrog marked this pull request as ready for review December 11, 2025 00:51
@rogthefrog rogthefrog requested a review from a team as a code owner December 11, 2025 00:51
@wpietri (Contributor) left a comment


Looks good, and I like the choice of only doing demo for now. Should have some tests eventually, though, at least for the extractor.

…d an always-passing test caused by missing () in a function call
@rogthefrog (Contributor, Author)

> Looks good, and I like the choice of only doing demo for now. Should have some tests eventually, though, at least for the extractor.

@wpietri do you mind taking a look at the tests I added?

@rogthefrog rogthefrog merged commit 3e8782d into main Dec 12, 2025
2 checks passed
@rogthefrog rogthefrog deleted the 226-return-annotations branch December 12, 2025 22:21
@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2025