rfc: @ls.pytest.mark.parametrize interface #1199

Closed
wants to merge 2 commits into from


baskaryan
Contributor

Almost certainly not handling lazy evaluation correctly, but what do we think of the interface?

  • automatically logs pass/fail feedback based on whether the test passes or fails
  • can also return any other feedback you want
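```python
import langsmith as ls

# Proposed interface: "Sample Dataset 3" is the dataset to run over; the lambda is
# the target system, whose result is passed to the test as `outputs`.
@ls.pytest.mark.parametrize("Sample Dataset 3", (lambda x: x))
def test_parametrize(inputs, outputs, reference_outputs) -> list:
    assert inputs == outputs
    return [{"key": "foo", "value": "bar"}]
```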

Some example experiments: https://dev.smith.langchain.com/public/e7782ea0-3de5-4352-8cd4-7b2cdbb03e4c/d
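The two behaviors listed above (automatic pass/fail feedback plus optional returned feedback) can be approximated in plain Python to make the intended semantics concrete. This is a hypothetical, self-contained sketch, not the RFC's implementation: `parametrize_over_examples` and `Example` are illustrative names standing in for the LangSmith dataset and feedback plumbing, and the examples run sequentially and in memory rather than in parallel.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Example:
    # Hypothetical stand-in for one row of a dataset.
    inputs: Any
    reference_outputs: Any


def parametrize_over_examples(examples: list[Example], target: Callable[[Any], Any]):
    """Hypothetical decorator sketching the RFC's behavior (sequential, in-memory)."""

    def decorator(test_fn):
        def run_all() -> list[dict[str, Any]]:
            rows = []
            for ex in examples:
                # The system under test runs outside the test function.
                outputs = target(ex.inputs)
                feedback: list[dict[str, Any]] = []
                try:
                    extra = test_fn(ex.inputs, outputs, ex.reference_outputs) or []
                except Exception as exc:
                    # A raised error becomes a failing "pass" score instead of aborting the run.
                    feedback.append({"key": "pass", "score": 0, "comment": repr(exc)})
                else:
                    feedback.append({"key": "pass", "score": 1})
                    # Any feedback the test returns is logged alongside pass/fail.
                    feedback.extend(extra)
                rows.append({"inputs": ex.inputs, "outputs": outputs, "feedback": feedback})
            return rows

        return run_all

    return decorator


examples = [Example(inputs=1, reference_outputs=1), Example(inputs=2, reference_outputs=3)]


@parametrize_over_examples(examples, lambda x: x)
def check_against_reference(inputs, outputs, reference_outputs) -> list:
    assert outputs == reference_outputs
    return [{"key": "foo", "value": "bar"}]


results = check_against_reference()
```

Here the second example fails its assertion, so it is logged with a `pass` score of 0 and its `foo` feedback is never produced; the first example gets both.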

baskaryan requested a review from hinthornw on November 8, 2024, 22:23
hinthornw (Collaborator) commented Nov 8, 2024

Things I like about this:

  1. Can connect to a dataset
  2. Outputs are fairly localized/transparent
  3. The trace seems sensible (it includes outputs by default)
  4. Parallelized!
  5. I think you could reuse the scoring helper function if you wanted

Things I don't looove about this relative to @unit:

  1. It seems harder to check multi-step things
  2. The actual system is run "outside" the test function
  3. It currently seems to be one experiment per unit test? Maybe that is the right equivalence, though I'm not sure
  4. Pytest doesn't like it when a test function returns a value
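On point 4: pytest 7.2 and later emit a `PytestReturnNotNoneWarning` when a test function returns anything other than `None`, so returning feedback as in the RFC example triggers it. One hypothetical workaround, not proposed in this thread, is to hand the test a callback for extra feedback so it can return `None`. `run_example` and `record_feedback` are illustrative names, and the runner is plain Python rather than a pytest plugin.

```python
from typing import Any, Callable

Feedback = dict[str, Any]


def run_example(test_fn: Callable[..., None], inputs: Any, outputs: Any,
                reference_outputs: Any) -> list[Feedback]:
    """Hypothetical runner: extra feedback arrives via a callback, not a return value."""
    collected: list[Feedback] = []
    try:
        test_fn(inputs, outputs, reference_outputs, record_feedback=collected.append)
    except AssertionError as exc:
        # A failed assertion becomes a failing "pass" score.
        return [{"key": "pass", "score": 0, "comment": str(exc)}, *collected]
    return [{"key": "pass", "score": 1}, *collected]


def check_identity(inputs, outputs, reference_outputs, record_feedback) -> None:
    # Returns None, so pytest has no return value to warn about.
    record_feedback({"key": "foo", "value": "bar"})
    assert inputs == outputs, "outputs differ from inputs"


print(run_example(check_identity, 1, 1, None))
print(run_example(check_identity, 1, 2, None))
```

Recording feedback before the assertion means it is still captured when the assertion fails.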

```python
pass_result = [r for r in eval_results if r.key == "pass"][0]
if not pass_result.score:
    error = pass_result.comment
    pytest.fail(
```
Contributor


How would you set failure conditions? I assume people don't want the test to actually fail whenever any evaluation fails?

Contributor


which might mean making this customizable through the interface

Contributor Author


This only fails if the test itself raises an error (we need to add a manual `pytest.fail` for that because we catch and log all errors in the wrapper at L48). So it is customizable by default.
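The failure logic described in this reply can be sketched as follows: the wrapper turns any error raised by the test into a `pass` result with score 0, and only that result decides whether the test fails, so other feedback (for example a low correctness score) never fails it. This is a hypothetical, stdlib-only approximation: `EvalResult`, `run_test_and_log`, and `fail_if_not_passed` are illustrative names, and `AssertionError` stands in for the `pytest.fail` call in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    # Hypothetical stand-in for one logged feedback record.
    key: str
    score: Optional[float] = None
    comment: Optional[str] = None


def run_test_and_log(test_fn, *args) -> list[EvalResult]:
    """Run the test; any error is caught and logged instead of propagating."""
    try:
        test_fn(*args)
    except Exception as exc:
        return [EvalResult(key="pass", score=0.0, comment=repr(exc))]
    return [EvalResult(key="pass", score=1.0)]


def fail_if_not_passed(eval_results: list[EvalResult]) -> None:
    """Fail only on the test's own error; other feedback keys are ignored."""
    pass_result = next(r for r in eval_results if r.key == "pass")
    if not pass_result.score:
        # The wrapper swallowed the error, so it must be re-surfaced explicitly.
        # Stand-in for the pytest.fail call in the diff.
        raise AssertionError(pass_result.comment)


def passing_test(x):
    assert x == 1


def failing_test(x):
    assert x == 1, "expected 1"


# A low score on another key does not fail the test.
fail_if_not_passed(run_test_and_log(passing_test, 1) + [EvalResult(key="correctness", score=0.2)])

try:
    fail_if_not_passed(run_test_and_log(failing_test, 2))
except AssertionError as exc:
    print(f"test failed: {exc}")
```

Under this design, stricter conditions (such as failing on a low score for some other key) would have to be written as assertions inside the test itself.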

baskaryan closed this on Jan 5, 2025
Labels
None yet
Projects
None yet

3 participants