feat!: redesign of the AI testing framework by constantinius · Pull Request #30 · getsentry/testing-ai-sdk-integrations

constantinius · 2026-02-03T11:59:25Z

Key features

Test cases are now split from their implementation for the various framework integrations under test.

Test cases are rendered from templates for a given framework, which significantly improves test coverage across the many variations in how a framework can be used.
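As a rough illustration of rendering one test case against several framework templates, here is a minimal sketch. The `TestCase` shape, the `{{placeholder}}` syntax, and the `runOpenAI`/`runAnthropic` names inside the templates are all assumptions made for this example; the PR does not describe the actual template engine.

```typescript
// Hypothetical test-case shape; the real framework's format may differ.
interface TestCase {
  name: string;
  prompt: string;
  streaming: boolean;
}

// Replace every {{key}} placeholder with the matching test-case value.
// Unknown placeholders are left untouched.
function renderTemplate(template: string, testCase: TestCase): string {
  const values: Record<string, string> = {
    name: testCase.name,
    prompt: JSON.stringify(testCase.prompt),
    streaming: String(testCase.streaming),
  };
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (match: string, key: string) => values[key] ?? match,
  );
}

// Two made-up framework templates sharing the same test case.
const openaiTemplate =
  "// {{name}}\nrunOpenAI({ input: {{prompt}}, stream: {{streaming}} });";
const anthropicTemplate =
  "// {{name}}\nrunAnthropic({ content: {{prompt}}, stream: {{streaming}} });";

const basicCase: TestCase = { name: "basic", prompt: "Say hi", streaming: false };

const renderedOpenAI = renderTemplate(openaiTemplate, basicCase);
const renderedAnthropic = renderTemplate(anthropicTemplate, basicCase);
```

Because the test case carries no framework-specific code, adding a variation (for example a streaming variant) means adding one test case rather than editing every integration.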

The testing framework now runs everything in TypeScript/JavaScript; only Python test execution uses the Python interpreter. Spans are no longer intercepted in the transport; instead, a span collector receives them.
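The collector idea can be pictured with a small in-memory sketch. The `CollectedSpan` fields, the `SpanCollector` class, and the example op string are hypothetical; the PR does not show how the real collector receives or stores spans.

```typescript
// Hypothetical span shape; real spans carry more fields.
interface CollectedSpan {
  traceId: string;
  op: string;
  description: string;
}

// Minimal in-memory collector: test processes hand spans to `receive`,
// and checks later query them by trace.
class SpanCollector {
  private readonly spans: CollectedSpan[] = [];

  receive(span: CollectedSpan): void {
    this.spans.push(span);
  }

  byTrace(traceId: string): CollectedSpan[] {
    return this.spans.filter((span) => span.traceId === traceId);
  }
}

const collector = new SpanCollector();
// The op value below is illustrative only.
collector.receive({ traceId: "trace-a", op: "gen_ai.chat", description: "chat call" });
collector.receive({ traceId: "trace-b", op: "gen_ai.chat", description: "other test" });

const spansForA = collector.byTrace("trace-a");
```

Keeping collection separate from the transport means checks operate on whatever the SDK actually sent, grouped per test run, without patching SDK internals.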

Tests are now laid out as <type>/<platform>/<framework>/<test-case>/<check>, where:

type: the kind of thing under test, currently LLMs (non-agentic) and agents; MCP and embeddings are planned for later
platform: either JS or Python; more platforms will be added later
framework: the framework integration to test, e.g. openai, anthropic, or langgraph
test-case: the test case to run, a complete scenario with a given purpose
check: a specific check to run, with one or more assertions
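To make the five-segment layout concrete, the following sketch splits such an identifier into its parts. The `TestId` type, the `parseTestId` helper, and the example segment values `llm`, `py`, and `basic` are assumptions for illustration; only the segment order and the check name `checkInputMessagesSchema` come from this PR.

```typescript
// Hypothetical parsed form of <type>/<platform>/<framework>/<test-case>/<check>.
interface TestId {
  type: string;
  platform: string;
  framework: string;
  testCase: string;
  check: string;
}

// Split an identifier into its five segments, rejecting malformed input.
function parseTestId(path: string): TestId {
  const parts = path.split("/");
  if (parts.length !== 5 || parts.some((part) => part.length === 0)) {
    throw new Error(`expected 5 non-empty segments, got: ${path}`);
  }
  const [type, platform, framework, testCase, check] = parts as [
    string,
    string,
    string,
    string,
    string,
  ];
  return { type, platform, framework, testCase, check };
}

// Example segment values are made up, apart from the check name.
const parsedId = parseTestId("llm/py/openai/basic/checkInputMessagesSchema");
```

A structured identifier like this lends itself to filtering, for example selecting every check for one framework or every framework for one test case.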

- Add causeAPIError flag to TestDefinition interface
- Update base templates (Python/JS) with error handling wrapper
- Implement respx-based API mocking for OpenAI Python template
- Add respx dependency to OpenAI Python config
- Create Basic Error LLM Test case that validates error capture
- Simplify OpenAI Python template (remove verbose debug output)
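The flag and wrapper from this commit message could look roughly like the sketch below. Only the names `TestDefinition` and `causeAPIError` appear in the PR; the other fields, the `RunResult` type, and `runWithErrorHandling` are hypothetical. The sketch also simulates the failure by throwing directly and runs synchronously for brevity, whereas the commit describes respx-based HTTP mocking in the Python template.

```typescript
// Only `causeAPIError` is named in the PR; `name` is a hypothetical field.
interface TestDefinition {
  name: string;
  causeAPIError?: boolean;
}

// Hypothetical result type for this sketch.
interface RunResult {
  ok: boolean;
  error?: string;
}

// Run the framework call and record a failure instead of crashing the runner.
// When `causeAPIError` is set, a simulated error stands in for the mocked API.
function runWithErrorHandling(test: TestDefinition, call: () => string): RunResult {
  try {
    if (test.causeAPIError) {
      throw new Error("simulated API error");
    }
    call();
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const failingRun = runWithErrorHandling(
  { name: "basic-error", causeAPIError: true },
  () => "unused",
);
const passingRun = runWithErrorHandling({ name: "basic" }, () => "response text");
```

An error test case can then assert that the failure was captured (here `failingRun.error`) rather than treating the thrown error as a broken test run.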

…Python templates
- Add respx/httpx imports and inject_api_error block to all 3 templates
- Add respx dependency to config.json for each framework
- Simplify templates (remove verbose debug output, use kwargs pattern)

obostjancic

L F G 🚀

constantinius added 30 commits January 27, 2026 16:49

feat!: initial state of the redesign

0e50137

feat: add model override system, enhanced reporting, skip configuration, and live status display

a8f5a01

feat: re-added several JS and Py frameworks

5f511d1

fix: test case setups for simple tests

a08c905

fix: making google-genai an agent framework and fixing litellm

a2f08af

feat: adding more options to filter and improve logging

9375f9b

feat: implement streaming/blocking(non-streaming) tests

470259b

feat: add streaming support for various LLM frameworks

2c89496

fix: templates for various LLM frameworks around multi-turn tests

2613d99

feat: add command to only render templates

405d264

fix: rendering tests first before conducting the tests

488fa7f

fix: making the JS framework tests work

ec9ef48

fix: fixing JS templates

df839fb

feat: add concurrency

19310a2

fix: using built-in argparser

062e076

fix: update default SDK version to test

e6b6785

fix: better reporting

43f308e

fix: update CLAUDE.md to be representative

723b1d0

fix: update Makefile and package.json

df56eb2

feat: add vision test and templates for LLMs

12ac8f2

feat: adding vision test for agent frameworks

60ec4fe

feat: add tests for very long messages to be truncated

1060b86

feat: re-arranged agent tests and added new ones

1bf778c

fix: framework discovery

eabeea4

fix: enhance framework discovery by filtering out incomplete configurations

b372ea4

fix: adjust action for changes

4812e16

fix: adjust readme for new layout

e3d6e53

fix: updating Readme

4b52e62

constantinius added 7 commits February 2, 2026 14:45

feat: making checks re-usable across test cases

b4de862

feat: better test case structuring

7e48cfb

fix: update Readme

66cea7f

fix: renamed checks and improved layout

2ffc8b2

feat: adding tool related checks checkToolCalls, checkAvailableTools, checkResponseToolCalls and checks for input message schema checkInputMessagesSchema

5fba24f

feat: re-add the report generation logic

a82d307

feat: add check for binary redaction

49e8227

and fix the one for trimming

constantinius requested a review from obostjancic February 3, 2026 12:04

constantinius added 4 commits February 3, 2026 13:52

fix: action to use new ctrf file layout

29106ff

chore: update to latest framework version

776fcf5

feat: add mastra as a JS/agent test

ffab3d5

docs: cleanup and updated documentation files

5b0bc06

obostjancic approved these changes Feb 6, 2026

constantinius added 2 commits February 6, 2026 14:44

feat: several goodies for report visualization: showing spans, grouping results

fa11632

chore: documenting the new CLI features

4839967

constantinius merged commit b322b81 into main Feb 6, 2026
5 of 8 checks passed

constantinius merged 43 commits into main from constantinius/breaking/redesign
