first pass at outputting annotations #1413
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
superdosh left a comment
I remember we had a conversation about how these annotations would be filtered to only those specifically marked as "exportable" in some way. Is that still correct?
That's right, good reminder. IIRC only the annotations from a run using the prompts in the demo set would be available?
@wpietri had the idea that we would add a column to our prompt datasets that would indicate exportability, and then we'd just check against that?
Yes, I think that's the best solution. I have little reason to think that we will have a demo prompt set for security or other future benchmarks, but we'll want to be able to offer annotations on a stable fraction of the non-official prompts. So I think annotation exportability should just be made explicit for every prompt and be part of the prompt's metadata.
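A minimal sketch of that per-prompt check, assuming a hypothetical `exportable` column in a CSV prompt dataset (the column and field names here are illustrative, not the actual schema):

```python
import csv


def load_exportable_prompt_ids(dataset_path: str) -> set[str]:
    """Collect the IDs of prompts explicitly marked as exportable."""
    exportable: set[str] = set()
    with open(dataset_path, newline="") as f:
        for row in csv.DictReader(f):
            # Default-deny: a prompt is exportable only when the flag
            # is explicitly set to "true" in its metadata.
            if row.get("exportable", "").strip().lower() == "true":
                exportable.add(row["prompt_uid"])
    return exportable
```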
wpietri left a comment
This looks like a plausible start. I'm glad to see it wasn't too hard. My thoughts:
- We need to be very sure we never give anybody annotations outside the set of prompts for which that is allowed. Doing so would be a mistake on the order of $50k, so modelrunner should never put anything in the file that isn't explicitly allowed.
- I'm not sure we should be exposing our `source_id` fields generally, as they may contain information about vendors and such. Kurt would know.
- We should include the prompt and response along with the annotation, so it's easy for them to understand what about their model they need to improve. Probably also the hazard, as long as we're at it.
- It looks like `compile_annotations` could go on the `BenchmarkRun` object nicely, so it becomes part of the API (a sketch follows this list).
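A hedged sketch of that shape, with hypothetical `BenchmarkRun` internals and field names invented for illustration; the key property is that the allow-list check happens in one place, and anything not explicitly allowed never reaches the output:

```python
from dataclasses import dataclass


@dataclass
class AnnotationRecord:
    # Prompt, response, and hazard travel with the annotation so
    # vendors can see what about their model needs improving.
    prompt_uid: str
    hazard: str
    prompt_text: str
    response_text: str
    annotation: dict


class BenchmarkRun:
    def __init__(self, finished_items: list):
        # Each item is assumed to carry the fields used below;
        # the real run object's structure will differ.
        self._finished_items = finished_items

    def compile_annotations(self, exportable_ids: set[str]) -> list[AnnotationRecord]:
        """Return annotation records only for explicitly allowed prompts."""
        records = []
        for item in self._finished_items:
            if item.prompt_uid not in exportable_ids:
                continue  # default-deny: never export anything not allowed
            records.append(
                AnnotationRecord(
                    prompt_uid=item.prompt_uid,
                    hazard=item.hazard,
                    prompt_text=item.prompt_text,
                    response_text=item.response_text,
                    annotation=item.annotation,
                )
            )
        return records
```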
Force-pushed from 2be3f3a to 8cca1cd
Looks good, and I like the choice of only doing demo for now. Should have some tests eventually, though, at least for the extractor.
…d an always-passing test caused by missing () in a function call
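For context on that fix: in Python, referencing a method without calling it yields the bound-method object, which is always truthy, so such an assertion can never fail. A generic illustration (not the actual test):

```python
class Report:
    def is_valid(self) -> bool:
        return False  # clearly invalid


def test_always_passes():
    report = Report()
    # Bug: missing () -- this asserts on the bound method itself,
    # which is truthy, so the test passes no matter what.
    assert report.is_valid


def test_checks_the_real_value():
    report = Report()
    # Fix: call the method so the actual return value is checked.
    assert not report.is_valid()
```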
@wpietri do you mind taking a look at the tests I added?
This generates the list of annotations for a given job, for consumption by modelrunner.
I still need to double-check that the job ID is what we expect.
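Building on the sketches above, a hedged example of how the export might be wired together, writing one JSON object per line for modelrunner to consume (the JSONL format and function names are assumptions, not the actual contract):

```python
import json
from dataclasses import asdict


def write_annotations(run: "BenchmarkRun", dataset_path: str, out_path: str) -> None:
    exportable_ids = load_exportable_prompt_ids(dataset_path)
    with open(out_path, "w") as f:
        for record in run.compile_annotations(exportable_ids):
            f.write(json.dumps(asdict(record)) + "\n")
```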
Questions:
Related to https://github.com/mlcommons/sugar/pull/303