feat(mock-server): add ResponsesRequest model with full dispatch plumbing by FrankD412 · Pull Request #1000 · ai-dynamo/aiperf

FrankD412 · 2026-05-27T00:26:33Z

Summary

Follow-up to #962. Introduces ResponsesRequest as a first-class member of RequestT so the mock-server request recorder can capture Responses-specific fields (max_output_tokens, reasoning_effort, stream) instead of the synthetic ChatCompletionRequest the /v1/responses handler builds for the latency simulator.

New ResponsesRequest model with a prompt_text property that flattens the Responses input shape (str | list[str|dict] | list[content-block]) into a single string. Flattener logic moved verbatim from app._extract_responses_prompt into models._flatten_responses_input so recorder, tokenizer, and handler share one source of truth.
Dispatch wired through models.RequestT, tokens._extract_request_content, tokens._extract_osl_fingerprint, utils._create_request_id (resp-{uuid} prefix), request_recorder._encode_request_prompt_ids, and the app.responses handler signature (req: dict -> req: ResponsesRequest).
JSONL schema decision: Responses' max_output_tokens is canonicalized into the existing max_completion_tokens column rather than introducing a new field. Both name the same semantic (the OSL cap); preserving the JSONL schema is more useful for downstream tools than preserving the API name-space, and the endpoint column on each row already disambiguates.
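The flattening behaviour described above could look roughly like the sketch below. This is not the PR's `models._flatten_responses_input` (its body is not shown on this page); the function name, the `content` and `text` dict keys, and the newline joiner are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def flatten_responses_input(value: str | list[Any]) -> str:
    """Collapse a Responses-style `input` payload into one prompt string.

    Hypothetical stand-in for models._flatten_responses_input: the key names
    ("content", "text") and the newline separator are assumptions.
    """
    if isinstance(value, str):
        # Shape 1: a bare string is already the prompt.
        return value

    parts: list[str] = []
    for item in value:
        if isinstance(item, str):
            # Shape 2: list of plain strings.
            parts.append(item)
        elif isinstance(item, dict):
            content = item.get("content", item.get("text", ""))
            if isinstance(content, str):
                # Shape 3: dict carrying its text directly.
                parts.append(content)
            elif isinstance(content, list):
                # Shape 4: dict whose content is a nested list of content blocks.
                parts.append(flatten_responses_input(content))
    # Skip empty fragments so they do not produce blank lines.
    return "\n".join(p for p in parts if p)
```

Keeping the helper a pure function of the payload is what lets the recorder, tokenizer, and handler all call it and get the same string.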

A subsequent commit will wire make_ctx to accept a record-time override so handlers can pass the real payload to the recorder while still driving simulation off the synthetic chat.

Test Plan

uv run pytest tests/unit/ -n auto (12881 passed; one unrelated MLflow flake passes in isolation)
Unit coverage for prompt_text flattening across all four input shapes
Extras (tools, instructions) pass through via BaseModel extra="allow"
_extract_request_content / _extract_osl_fingerprint dispatch for Responses
_create_request_id prefix (resp-)
_encode_request_prompt_ids for both string and content-block input
tokenize_request handles Responses on the generation path

Reported by reviewer dynamo-ops.

Summary by CodeRabbit

New Features
- Extended request handling support to a new request type with automatic input normalization across multiple input formats and enhanced token processing capabilities
Tests
- Comprehensive test coverage added across multiple modules to validate the new request type functionality, including request identification generation, token extraction and processing, content normalization from diverse input shapes, field preservation, and proper handling of default values and edge cases

…bing

Introduce `ResponsesRequest` as a first-class member of `RequestT` so the recorder can capture Responses-specific fields (`max_output_tokens`, `reasoning_effort`, `stream`) instead of the synthetic `ChatCompletionRequest` the `/v1/responses` handler currently builds for the latency simulator. Subsequent commit will wire `make_ctx` to accept a record-time override so handlers can pass the real payload to the recorder while still driving simulation off the synthetic chat.

The model exposes a `prompt_text` property that flattens the Responses `input` shape (str | list[str|dict] | list[content-block]) into a single string. The flattener logic moved verbatim from `app._extract_responses_prompt` into `models._flatten_responses_input` so the recorder and tokenizer dispatch sites share one source of truth; the handler call site now uses `req.prompt_text`.

JSONL schema decision: Responses' `max_output_tokens` is canonicalized into the existing `max_completion_tokens` column rather than introducing a new field. Both name the same semantic (the OSL cap); preserving the JSONL schema is more useful for downstream tools than preserving the API name-space, and the `endpoint` column on each row already disambiguates.

Dispatch wired in:
- `models.RequestT` union
- `tokens._extract_request_content` (text + cap)
- `tokens._extract_osl_fingerprint` (canonicalized fields)
- `utils._create_request_id` (`resp-{uuid}` prefix)
- `request_recorder._encode_request_prompt_ids` (tokenize via prompt_text)
- `app.responses` handler signature (`req: dict` -> `req: ResponsesRequest`)

Tests cover:
- prompt_text flattening across all four input shapes
- extras (`tools`, `instructions`) pass through via BaseModel extra="allow"
- _extract_request_content and _extract_osl_fingerprint dispatch
- _create_request_id prefix
- _encode_request_prompt_ids for string and content-block input
- tokenize_request handles Responses on the generation path

Reported by reviewer dynamo-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Frank Di Natale <[email protected]>

copy-pr-bot · 2026-05-27T00:26:36Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-27T00:26:42Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@3d656d2622e35b6a9eaaa9635be090366bf6fbba

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@3d656d2622e35b6a9eaaa9635be090366bf6fbba

Last updated for commit: 3d656d2 • Browse code

coderabbitai · 2026-05-27T00:31:48Z

Walkthrough

This PR adds ResponsesRequest Pydantic model support for OpenAI's /v1/responses API. The model flattens heterogeneous input shapes into a single prompt_text property, is registered in the RequestT union, and the request processing pipeline (endpoint handler, tokenization, token metrics, request ID generation) now dispatches on this new type with comprehensive test coverage.
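As an illustration of the type-based dispatch the walkthrough mentions, request ID generation could be sketched as follows. Only the `resp-` prefix comes from the PR description; the dataclass stand-ins, the `chatcmpl` and `req` prefixes, and the use of `uuid.uuid4()` are assumptions, and the real `utils._create_request_id` may be structured differently.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass
class ChatCompletionRequest:
    """Stand-in for the mock server's Pydantic model (illustration only)."""

    model: str


@dataclass
class ResponsesRequest:
    """Stand-in for the mock server's Pydantic model (illustration only)."""

    model: str


def create_request_id(req: object) -> str:
    """Pick an ID prefix from the request type, then append a random UUID."""
    if isinstance(req, ResponsesRequest):
        prefix = "resp"  # prefix named in the PR description
    elif isinstance(req, ChatCompletionRequest):
        prefix = "chatcmpl"  # assumed prefix, for illustration only
    else:
        prefix = "req"  # assumed fallback, for illustration only
    return f"{prefix}-{uuid.uuid4()}"
```

A test in the spirit of the PR's `_create_request_id` coverage would then only need to assert on the prefix, since the UUID part is random.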

Changes

ResponsesRequest Support

Layer / File(s)	Summary
ResponsesRequest model definition with input flattening `tests/aiperf_mock_server/models.py`	`ResponsesRequest` Pydantic model captures response-api fields and provides a `prompt_text` property that normalizes string/list/content-block inputs via `_flatten_responses_input` helper. Model is registered in the `RequestT` union.
Request type registration and imports `tests/aiperf_mock_server/app.py`, `tests/aiperf_mock_server/request_recorder.py`, `tests/aiperf_mock_server/tokens.py`, `tests/aiperf_mock_server/utils.py`	`ResponsesRequest` is imported into each request-processing module so dispatch logic can recognize and handle this new request type.
Responses endpoint handler implementation `tests/aiperf_mock_server/app.py`	Endpoint signature changes from `req: dict[str, Any]` to `req: ResponsesRequest`. Removes `_extract_responses_prompt` helper and builds chat completion messages directly from `req.model` and `req.prompt_text`.
Request processing pipeline support `tests/aiperf_mock_server/request_recorder.py`, `tests/aiperf_mock_server/tokens.py`, `tests/aiperf_mock_server/utils.py`	Request recorder tokenizes `prompt_text` via tokenizer call mode; tokens module extracts prompt and output-token cap, and maps `max_output_tokens` to `max_completion_tokens` in OSL fingerprint; utils generates `resp-`-prefixed request IDs.
ResponsesRequest model and behavior tests `tests/unit/server/test_models.py`	`TestResponsesRequest` verifies prompt-text flattening across multiple input shapes, unmodeled field preservation via Pydantic extras, and safe field defaults.
Request processing dispatch tests `tests/unit/aiperf_mock_server/test_request_recorder.py`, `tests/unit/server/test_tokens.py`	`TestResponsesRequestRecorderDispatch` tests request ID prefixing and tokenization with flattened content blocks; `TestResponsesRequestDispatch` validates content extraction from nested inputs, fingerprint field canonicalization, and token count constraints.
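The `max_output_tokens` to `max_completion_tokens` mapping in the table above can be pictured with the sketch below. It is a simplified guess at the idea, not the PR's `tokens._extract_osl_fingerprint`: the function name, the plain-dict payload, and the two-column row are assumptions.

```python
import json
from typing import Any


def osl_fingerprint(endpoint: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Build a JSONL-ready row with one canonical column for the OSL cap.

    Hypothetical helper: the real fingerprint likely carries more fields.
    """
    if endpoint == "/v1/responses":
        # The Responses API names the cap max_output_tokens.
        cap = payload.get("max_output_tokens")
    else:
        cap = payload.get("max_completion_tokens")
    return {
        "endpoint": endpoint,  # tells downstream tools which API the row came from
        "max_completion_tokens": cap,  # canonical OSL-cap column
    }


rows = [
    osl_fingerprint("/v1/responses", {"max_output_tokens": 128}),
    osl_fingerprint("/v1/chat/completions", {"max_completion_tokens": 128}),
]
jsonl = "\n".join(json.dumps(row) for row in rows)
```

Both rows end up with the same set of columns, so a downstream reader can aggregate the OSL cap without branching on the API, and can still filter by `endpoint` when the distinction matters.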

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops through responses with glee,
Input shapes flattened with care—
Prompt text joins threads like a spree,
Tokenization, fingerprints fair!
New request type, tested with flair. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 54.55% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and concisely describes the main change: introducing a new ResponsesRequest model with complete integration throughout the mock-server codebase.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/aiperf_mock_server/models.py (1)
242-242: 💤 Low value

Consider more specific type hint for input field.

The input field is typed as str | list[Any], but based on _flatten_responses_input logic (lines 265-283), the actual expected shapes are more specific: strings, lists of strings, or lists of dicts with content fields. Consider narrowing to str | list[str | dict[str, Any]] for better type safety.
♻️ Proposed type refinement
-    input: str | list[Any] = ""
+    input: str | list[str | dict[str, Any]] = ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/aiperf_mock_server/models.py` at line 242, The current model field
named input is too broad (str | list[Any]); update its type to a more specific
union to match _flatten_responses_input expectations: use str | list[str |
dict[str, Any]] (ensure Any is imported from typing or typing_extensions
depending on project) so the field accepts plain strings, lists of strings, or
lists of dicts with content keys; update the type annotation for the input field
and run type checks to verify compatibility with the _flatten_responses_input
function and any serializers/deserializers that consume this model.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/aiperf_mock_server/models.py`:
- Around line 231-250: ResponsesRequest's Pydantic fields lack Field(...)
descriptions; update each field in the ResponsesRequest class (model, input,
max_output_tokens, stream, reasoning_effort, min_tokens, ignore_eos) to use
Field(..., description="...") with concise descriptions matching their purpose,
e.g. Field(default="", description="prompt input as string or list...") for
input and appropriate defaults for others, and ensure Field is imported from
pydantic if not already.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca885e47-196d-495a-a0ae-934516df7bb5

📥 Commits

Reviewing files that changed from the base of the PR and between 03167d0 and f26c562.

📒 Files selected for processing (8)

tests/aiperf_mock_server/app.py
tests/aiperf_mock_server/models.py
tests/aiperf_mock_server/request_recorder.py
tests/aiperf_mock_server/tokens.py
tests/aiperf_mock_server/utils.py
tests/unit/aiperf_mock_server/test_request_recorder.py
tests/unit/server/test_models.py
tests/unit/server/test_tokens.py

coderabbitai · 2026-05-27T00:31:52Z

+class ResponsesRequest(BaseModel):
+    """Request model for OpenAI's /v1/responses endpoint.
+
+    The Responses API takes its prompt under `input` (which may be a string,
+    a list of strings, or a list of content-block dicts) and caps generation
+    via `max_output_tokens` rather than the chat API's `max_completion_tokens`.
+    Modeled here so the recorder can capture the real payload instead of the
+    synthetic ChatCompletionRequest the latency simulator drives off of.
+    """
+
+    model: str
+    input: str | list[Any] = ""
+    max_output_tokens: int | None = None
+    stream: bool = False
+    reasoning_effort: Literal["low", "medium", "high"] | None = None
+
+    # Mirrors BaseCompletionRequest so recorder/simulator share field semantics
+    # when the client supplies them via extras.
+    min_tokens: int | None = None
+    ignore_eos: bool = False


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add Field descriptions to all Pydantic fields.

All fields in ResponsesRequest lack Field(description="...") annotations. As per coding guidelines, every Pydantic field must include a description.

📝 Proposed fix to add Field descriptions

+from pydantic import Field
+
 class ResponsesRequest(BaseModel):
     """Request model for OpenAI's /v1/responses endpoint.

     The Responses API takes its prompt under `input` (which may be a string,
     a list of strings, or a list of content-block dicts) and caps generation
     via `max_output_tokens` rather than the chat API's `max_completion_tokens`.
     Modeled here so the recorder can capture the real payload instead of the
     synthetic ChatCompletionRequest the latency simulator drives off of.
     """

-    model: str
-    input: str | list[Any] = ""
-    max_output_tokens: int | None = None
-    stream: bool = False
-    reasoning_effort: Literal["low", "medium", "high"] | None = None
+    model: str = Field(description="Model identifier for the Responses API endpoint")
+    input: str | list[Any] = Field(
+        default="",
+        description="Prompt input: string, list of strings, or list of content-block dicts",
+    )
+    max_output_tokens: int | None = Field(
+        default=None,
+        description="Maximum number of tokens to generate in the completion",
+    )
+    stream: bool = Field(
+        default=False,
+        description="Whether to stream the response as server-sent events",
+    )
+    reasoning_effort: Literal["low", "medium", "high"] | None = Field(
+        default=None,
+        description="Reasoning effort level for extended thinking models",
+    )

     # Mirrors BaseCompletionRequest so recorder/simulator share field semantics
     # when the client supplies them via extras.
-    min_tokens: int | None = None
-    ignore_eos: bool = False
+    min_tokens: int | None = Field(
+        default=None,
+        description="Minimum number of tokens to generate before allowing EOS",
+    )
+    ignore_eos: bool = Field(
+        default=False,
+        description="Whether to ignore end-of-sequence tokens during generation",
+    )

As per coding guidelines: "Add Field(description="...") on EVERY Pydantic field".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/aiperf_mock_server/models.py` around lines 231 - 250, ResponsesRequest's Pydantic fields lack Field(...) descriptions; update each field in the ResponsesRequest class (model, input, max_output_tokens, stream, reasoning_effort, min_tokens, ignore_eos) to use Field(..., description="...") with concise descriptions matching their purpose, e.g. Field(default="", description="prompt input as string or list...") for input and appropriate defaults for others, and ensure Field is imported from pydantic if not already.

dynamo-ops · 2026-05-27T00:32:19Z

    mock_req = ChatCompletionRequest(
        model=model,
-        messages=[{"role": "user", "content": _extract_responses_prompt(req)}],
+        messages=[{"role": "user", "content": req.prompt_text}],


The Responses handler still builds the request context from a synthetic ChatCompletionRequest, so real /v1/responses calls never exercise the new ResponsesRequest recorder/tokenizer/request-id dispatch and drop max_output_tokens, min_tokens, ignore_eos, and reasoning_effort. Fix: build the context from the parsed ResponsesRequest instead.

🤖 AI Fix

In tests/aiperf_mock_server/app.py, update responses() to call make_ctx(req, endpoint, request.state.start_time) and remove the mock_req ChatCompletionRequest construction so ResponsesRequest drives tokenization, request IDs, and request recording.
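To make the reviewer's point concrete, the sketch below lists which fields set on a parsed Responses request cannot survive conversion into a synthetic chat request that only carries `model` and a single user message. The dataclasses are simplified stand-ins for the PR's Pydantic models, and `dropped_fields` is a hypothetical helper that does not exist in the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ResponsesRequest:
    """Simplified stand-in for the PR's Pydantic model (illustration only)."""

    model: str
    input: str = ""
    max_output_tokens: int | None = None
    min_tokens: int | None = None
    ignore_eos: bool = False
    reasoning_effort: str | None = None


# Fields the synthetic chat request can represent (model + one user message).
CARRIED = {"model", "input"}


def dropped_fields(req: ResponsesRequest) -> dict[str, Any]:
    """Return non-default fields that the synthetic chat request cannot carry."""
    return {
        f.name: getattr(req, f.name)
        for f in fields(req)
        if f.name not in CARRIED and getattr(req, f.name) != f.default
    }


req = ResponsesRequest(model="m", input="hi", max_output_tokens=64, reasoning_effort="high")
lost = dropped_fields(req)
```

Here `lost` holds `max_output_tokens` and `reasoning_effort`, which is the information a recorder would miss if it only ever saw the synthetic request.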

codecov · 2026-05-27T00:35:34Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

FrankD412 · 2026-05-27T19:35:27Z

Request distribution (1000 requests)
──────────────────────────────────────────────
  Definitions
    ISL/OSL: input/requested output sequence length in tokens; OSL is the request cap, not generated output.
    Vocab used: unique token IDs observed / tokenizer vocab size.
    top-10 cover: share of prompt tokens from the 10 most common token IDs.
    entropy: token-id diversity; higher means broader prompt vocabulary use.
    top decoded tokens: most frequent token IDs decoded for sanity checks; tokens are not words.
    vocab shape: log-scaled 80-bucket view across token-id space.
    vocab shape stats: mean/percentiles of prompt-token counts per bucket, including empty buckets.

  /v1/chat/completions  n=1000
    ISL            mean  1121.1   min    37   max  2527   p50  1138   p99  2201
    Requested OSL  mean   128.0   min   128   max   128   p50   128   p99   128

    ISL histogram (25 bins, n=1000, 769 unique)
        37-  137   10 ██░░░░░░░░░░░░░░░░░░
       137-  236   15 ███░░░░░░░░░░░░░░░░░
       236-  336   23 █████░░░░░░░░░░░░░░░
       336-  435   39 ████████░░░░░░░░░░░░
       435-  535   40 ████████░░░░░░░░░░░░
       535-  635   43 █████████░░░░░░░░░░░
       635-  734   51 ██████████░░░░░░░░░░
       734-  834   61 ████████████░░░░░░░░
       834-  933   71 ██████████████░░░░░░
       933- 1033   68 ██████████████░░░░░░
      1033- 1133   73 ███████████████░░░░░
      1133- 1232  100 ████████████████████
      1232- 1332   86 █████████████████░░░
      1332- 1431   68 ██████████████░░░░░░
      1431- 1531   57 ███████████░░░░░░░░░
      1531- 1631   48 ██████████░░░░░░░░░░
      1631- 1730   39 ████████░░░░░░░░░░░░
      1730- 1830   30 ██████░░░░░░░░░░░░░░
      1830- 1929   32 ██████░░░░░░░░░░░░░░
      1929- 2029   17 ███░░░░░░░░░░░░░░░░░
      2029- 2129    9 ██░░░░░░░░░░░░░░░░░░
      2129- 2228   13 ███░░░░░░░░░░░░░░░░░
      2228- 2328    3 █░░░░░░░░░░░░░░░░░░░
      2328- 2427    3 █░░░░░░░░░░░░░░░░░░░
      2427- 2527    1 ░░░░░░░░░░░░░░░░░░░░

    Requested OSL histogram (1 bins, n=1000, 1 unique)
      128- 128  1000 ████████████████████


    Vocab  used 18223/128000 (14.2%)  top-10 cover 13%  entropy 10.7/17.0 bits
      top decoded tokens: " the" 24285, " I" 20910, " and" 18940, " to" 16151, " of" 16115

    vocab shape  (80 buckets over id 0..127999, log-y)

      bucket tokens mean 13900.7   p50  2380   p90 15189   p95 34966   p99 212801

    ██▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▅▅▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▁▃▁▃▂▂▂▂▂▁▂▂▃▃▂▂▁
    0                   32K                 64K                 96K             128K

This confirms that the change hasn't messed up the ISL/OSL distribution.
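For readers who want to sanity-check the vocab line in the report, the definitions given there (vocab used, top-10 cover, entropy) can be reproduced with a few lines of standard-library Python. This is a sketch under the assumption that entropy means Shannon entropy in bits over the empirical token-id distribution; the report's own implementation is not shown, and `vocab_stats` is a hypothetical name. The 17.0 in "10.7/17.0 bits" is at least consistent with log2(128000) ≈ 16.97, the maximum entropy for a 128000-token vocabulary.

```python
import math
from collections import Counter


def vocab_stats(token_ids: list[int], vocab_size: int, top_k: int = 10) -> dict[str, float]:
    """Summarize how a set of prompt token IDs uses the tokenizer vocabulary."""
    counts = Counter(token_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("token_ids must not be empty")

    # Shannon entropy (bits) of the empirical token-id distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Share of all prompt tokens contributed by the top_k most common IDs.
    top_cover = sum(c for _, c in counts.most_common(top_k)) / total

    return {
        "vocab_used": len(counts) / vocab_size,  # unique IDs seen / vocab size
        "top_k_cover": top_cover,
        "entropy_bits": entropy,
        "max_entropy_bits": math.log2(vocab_size),  # uniform use of every ID
    }
```

For example, `vocab_stats([1, 1, 2, 2], vocab_size=4)` uses half the vocabulary with exactly 1 bit of entropy out of a possible 2.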

github-actions Bot added the feat label May 27, 2026

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

dynamo-ops reviewed May 27, 2026

View reviewed changes

Merge branch 'main' into fdinatale/mock-server-request-recorder

3d656d2

FrankD412 wants to merge 2 commits into main from fdinatale/mock-server-request-recorder

2 participants
