tutorial for testing framework #631

Merged: 18 commits merged into main from isaac/testtutorial on Jan 22, 2025
Conversation

isahers1 (Contributor)

No description provided.

vercel bot commented Jan 18, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| langsmith-docs | ✅ Ready | Visit Preview | 💬 1 unresolved, ✅ 3 resolved | Jan 22, 2025 1:23am |

@jacoblee93 (Contributor) left a comment

I think this is missing a bit of interactivity. Right now it's very much just "here's how you add some test cases."

Some ideas:

  • What do I do with results?
  • How do I improve my app? (Could just show how tweaking/improving the system prompt or some tool description makes a metric increase.)
  • How should I think about coming up with new evaluation cases/metrics?
  • When/how often should I run tests?
    • Would be nice to have a section where you mock out an LLM call in your evaluator or app if you want to, e.g., run on every local change (see the sketch after this comment)
  • Should also encourage people to click on experiment link and explore there
    • Share with others on team?

I think it would be nice to make the app imperfect at first, then add evals to show it's imperfect, then improve the app, then show the improvement using evals.

The velocity and feedback loop are the real value over other tutorials that use `evaluate`.
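To make the mocking suggestion above concrete, here is a minimal sketch (not the tutorial's actual code) of patching out the model call so a test can run on every local change without hitting an API. The `my_agent` module and `invoke_agent` entry point are hypothetical stand-ins for whatever the tutorial's agent looks like.

```python
# Minimal sketch: stub the LLM call inside a hypothetical `my_agent` module
# so the test runs offline and deterministically on every local change.
from unittest.mock import patch

from my_agent import invoke_agent  # hypothetical agent entry point


class FakeMessage:
    # Assumes the agent only reads a `.content` attribute from the model response.
    content = "The current price of AAPL is $123.45."


def test_agent_runs_without_network():
    # Replace the model call with a canned response: no API keys, no network.
    with patch("my_agent.llm.invoke", return_value=FakeMessage()):
        answer = invoke_agent("What is the current price of AAPL?")
    assert "AAPL" in answer
```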

## Write tests

Now that we have defined our agent, let's write a few tests to ensure basic functionality.
In this tutorial we are going to test whether the agent's tool-calling abilities are working.
Contributor:

I'd structure this as:

In this tutorial we are going to test:

  1. The agent can ignore irrelevant questions.
  2. The agent's tool-calling abilities work.
  3. The agent can answer complex questions that involve using all of the tools.
  4. The agent's answer is grounded in the search results, using an LLM-as-a-judge evaluator.

I would also make sure that the order here matches the tests below.
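For reference, a rough sketch of what test 2 (tool calling) could look like with LangSmith's pytest integration; the `my_agent` module, the `search` tool name, and the shape of the agent's output are assumptions, not the tutorial's actual code.

```python
# Sketch only: assumes LangSmith's pytest integration (`@pytest.mark.langsmith`)
# and a hypothetical agent that returns the tool calls it decided to make.
import pytest
from langsmith import testing as t

from my_agent import invoke_agent  # hypothetical


@pytest.mark.langsmith  # records this test as a run in a LangSmith experiment
def test_agent_calls_search_tool():
    question = "What was Nvidia's most recently reported revenue?"
    result = invoke_agent(question)

    # Log inputs/outputs so they show up alongside the pass/fail in LangSmith.
    t.log_inputs({"question": question})
    t.log_outputs({"tool_calls": result["tool_calls"]})

    # Test 2: the agent should reach for the (hypothetical) `search` tool.
    assert any(call["name"] == "search" for call in result["tool_calls"])
```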

### Test 4: LLM as a judge

For LLM-as-a-judge, we are going to ensure that the agent's answer is grounded in the search results.
In order to trace the LLM-as-a-judge call separately from our agent, we will use the LangSmith-provided `trace_feedback` context manager.
Contributor:

Can we link out to SDK docs for `trace_feedback` and `wrapEvaluator`? I couldn't find them.
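Until those links land, here is a rough sketch of how `trace_feedback` might be wired into test 4 so the judge's own LLM call is traced separately from the agent run; the judge prompt, the `groundedness` feedback key, and the shape of the agent's output are illustrative assumptions.

```python
# Sketch only: run an LLM-as-a-judge grader inside `trace_feedback` so its trace
# is attached as feedback on the test run instead of being mixed into the agent's trace.
import pytest
from langsmith import testing as t
from langsmith.wrappers import wrap_openai
from openai import OpenAI

from my_agent import invoke_agent  # hypothetical

judge_client = wrap_openai(OpenAI())  # wrapped so the judge call itself is traced


@pytest.mark.langsmith
def test_answer_is_grounded_in_search_results():
    result = invoke_agent("What was Nvidia's most recently reported revenue?")

    with t.trace_feedback():
        verdict = judge_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    "Is the answer fully grounded in the search results? Reply YES or NO.\n\n"
                    f"Search results: {result['search_results']}\n\n"
                    f"Answer: {result['answer']}"
                ),
            }],
        )
        grounded = "YES" in verdict.choices[0].message.content.upper()
        # Attach the judge's verdict as feedback on this test's run.
        t.log_feedback(key="groundedness", score=int(grounded))

    assert grounded
```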

@tanushree-sharma (Contributor)

To Jacob's point on "What do I do with results?"

Can we show a screenshot of the results of these tests in LangSmith?

@jacoblee93 (Contributor)

I think it's OK; when using things like `wrapEvaluator` it should link to relevant API refs/how-tos, though.

It's also still just "paste these test cases into your file," which is fine, but I think it would benefit from something like showing how you can use test results to prompt-engineer and improve your app.

@baskaryan merged commit 580c65c into main on Jan 22, 2025
5 of 6 checks passed
@baskaryan deleted the isaac/testtutorial branch on January 22, 2025 at 02:16