
Conversation

@ericcurtin
Contributor

The OpenAI Responses API provides a stateful interface combining chat
completions with conversation management and tool use. This change
introduces a new API endpoint at /responses that transforms requests to
the standard chat completion format and handles both streaming and
non-streaming responses. It includes in-memory storage with TTL for
response state management and comprehensive test coverage.

Signed-off-by: Eric Curtin <[email protected]>

@gemini-code-assist
Contributor

Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a new compatibility layer for the OpenAI Responses API, enabling a more advanced, stateful interaction with chat completion models. It abstracts the complexities of conversation history and tool orchestration, presenting a unified /responses endpoint that manages the lifecycle of AI-generated responses, from initial request to final output, including streaming capabilities and in-memory state persistence.
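As a rough illustration of the request shape, a non-streaming call could look like this (the address, port, and payload values are assumed for the sketch, not details confirmed by this PR; the tests in the diff post JSON bodies with "model" and "input" fields):

```golang
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical local address; this PR only defines the /responses routes themselves.
	url := "http://localhost:8080/responses"

	// Minimal request body mirroring the fields used by the handler tests in this PR.
	payload := []byte(`{"model": "gpt-4", "input": "Say hello"}`)

	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The body is a Responses API object (id, status, output items, ...).
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```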

Highlights

  • New API Endpoint: Introduced a new API endpoint at /responses (and /v1/responses) to provide a compatibility layer for the OpenAI Responses API.
  • Stateful Conversation Management: The new API offers a stateful interface that combines chat completions with conversation management and tool use, allowing for chained responses.
  • Streaming and Non-Streaming Support: The compatibility layer handles both streaming and non-streaming responses from the underlying chat completion service.
  • In-Memory State Storage: An in-memory store with a configurable Time-To-Live (TTL) is implemented to manage the state of responses, ensuring conversation history is maintained for a period (a rough sketch of this pattern follows the list).
  • Request/Response Transformation: Includes logic to transform incoming requests from the Responses API format to standard chat completion requests and to convert chat completion responses back to the Responses API format.
  • Comprehensive Testing: New unit tests have been added for the HTTP handler, response store, and transformation logic, covering various scenarios including streaming, tool calls, and error handling.
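A minimal sketch of the TTL store pattern mentioned above; the names memoryStore, Set, and Get are illustrative and may not match the PR's actual pkg/responses implementation:

```golang
package responses

import (
	"sync"
	"time"
)

// storedResponse pairs a response payload with its expiry time.
type storedResponse struct {
	value     any
	expiresAt time.Time
}

// memoryStore is an illustrative in-memory store with a per-entry TTL.
type memoryStore struct {
	mu   sync.Mutex
	ttl  time.Duration
	data map[string]storedResponse
}

func newMemoryStore(ttl time.Duration) *memoryStore {
	return &memoryStore{ttl: ttl, data: make(map[string]storedResponse)}
}

// Set stores a response and stamps it with the configured TTL.
func (s *memoryStore) Set(id string, v any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[id] = storedResponse{value: v, expiresAt: time.Now().Add(s.ttl)}
}

// Get returns a stored response, lazily evicting it once the TTL has elapsed.
func (s *memoryStore) Get(id string) (any, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	entry, ok := s.data[id]
	if !ok {
		return nil, false
	}
	if time.Now().After(entry.expiresAt) {
		delete(s.data, id)
		return nil, false
	}
	return entry.value, true
}
```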

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 9 issues and left some high-level feedback:

  • In the streaming path, non-200 upstream responses are only handled via WriteHeader and never translated into Responses API error events even though sendError exists; consider detecting non-OK status codes in the streaming adapter and emitting appropriate error/response.failed SSE events before saving the failed response (a rough sketch follows this list).
  • The handleListInputItems handler is implemented but never registered on the mux, so /responses/{id}/input_items is currently unreachable; either wire this route into NewHTTPHandler or remove the dead code.
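For the first point, one possible shape is sketched below; errorSender and checkUpstreamStatus are hypothetical names, and the PR's sendError is only assumed to behave roughly like this interface:

```golang
package responses

import (
	"fmt"
	"io"
	"net/http"
)

// errorSender stands in for whatever the streaming adapter uses to emit SSE
// error events; the PR's sendError is assumed to play this role.
type errorSender interface {
	sendError(code, message string)
}

// checkUpstreamStatus reports whether the upstream chat-completions response is
// usable for streaming; otherwise it emits an error event so the caller can mark
// the stored response as failed before saving it.
func checkUpstreamStatus(upstream *http.Response, out errorSender) bool {
	if upstream.StatusCode == http.StatusOK {
		return true
	}
	body, _ := io.ReadAll(upstream.Body)
	out.sendError("upstream_error",
		fmt.Sprintf("upstream returned %d: %s", upstream.StatusCode, body))
	return false
}
```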
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the streaming path, non-200 upstream responses are only handled via WriteHeader and never translated into Responses API error events even though sendError exists; consider detecting non-OK status codes in the streaming adapter and emitting appropriate `error`/`response.failed` SSE events before saving the failed response.
- The `handleListInputItems` handler is implemented but never registered on the mux, so `/responses/{id}/input_items` is currently unreachable; either wire this route into `NewHTTPHandler` or remove the dead code.

## Individual Comments

### Comment 1
<location> `pkg/responses/transform.go:370-371` </location>
<code_context>
+			}
+		}
+
+		// Handle text content
+		if content, ok := choice.Message.Content.(string); ok && content != "" {
+			msgItem := OutputItem{
+				ID:     GenerateMessageID(),
</code_context>

<issue_to_address>
**issue (bug_risk):** Chat completion content that is not a plain string (e.g. multi-part or images) is currently dropped

`TransformChatCompletionToResponse` only handles `choice.Message.Content` when it is a `string`. If the upstream returns `[]ChatContentPart` (for multi-part, image, or other structured content), it will be ignored and the response will appear empty. Please either map `[]ChatContentPart` into `ContentPart` entries or fail fast so this limitation is explicit rather than silently dropping content.
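For illustration, a mapping along these lines could look as follows; the ChatContentPart/ContentPart shapes and the output type strings ("output_text", "output_image") are assumptions, not taken from the diff:

```golang
package responses

// Assumed shapes, for illustration only; align with the PR's real types.
type ChatContentPart struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	ImageURL string `json:"image_url,omitempty"`
}

type ContentPart struct {
	Type     string `json:"type"`
	Text     string `json:"text,omitempty"`
	ImageURL string `json:"image_url,omitempty"`
}

// mapChatContentParts converts upstream multi-part content into Responses API
// output parts instead of silently dropping it.
func mapChatContentParts(parts []ChatContentPart) []ContentPart {
	out := make([]ContentPart, 0, len(parts))
	for _, p := range parts {
		switch p.Type {
		case "text":
			out = append(out, ContentPart{Type: "output_text", Text: p.Text})
		case "image_url":
			out = append(out, ContentPart{Type: "output_image", ImageURL: p.ImageURL})
		default:
			// Unknown part types are carried through verbatim rather than dropped.
			out = append(out, ContentPart{Type: p.Type, Text: p.Text, ImageURL: p.ImageURL})
		}
	}
	return out
}
```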
</issue_to_address>

### Comment 2
<location> `pkg/responses/streaming.go:121-122` </location>
<code_context>
+
+// processChunk processes a single SSE chunk from chat completions.
+func (s *StreamingResponseWriter) processChunk(dataStr string) {
+	var chunk ChatStreamChunk
+	if err := json.Unmarshal([]byte(dataStr), &chunk); err != nil {
+		return
+	}
</code_context>

<issue_to_address>
**issue (bug_risk):** Swallowing JSON unmarshal errors in streaming hides upstream protocol issues

In `processChunk`, JSON unmarshal errors are ignored, so malformed or changed SSE payloads are dropped with no visibility. Consider surfacing this via `sendError` (e.g., `EventError`) or at least aborting further processing for this response to avoid sending partial or inconsistent output.
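A small sketch of that direction, with an assumed chunk shape (the real ChatStreamChunk fields may differ):

```golang
package responses

import (
	"encoding/json"
	"fmt"
)

// chatStreamChunk is a stand-in for the PR's ChatStreamChunk; fields are illustrative.
type chatStreamChunk struct {
	ID      string `json:"id"`
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// decodeChunk returns an error for malformed SSE payloads instead of silently
// dropping them, so the caller can emit an error event and stop the stream.
func decodeChunk(dataStr string) (*chatStreamChunk, error) {
	var chunk chatStreamChunk
	if err := json.Unmarshal([]byte(dataStr), &chunk); err != nil {
		return nil, fmt.Errorf("malformed SSE chunk %q: %w", dataStr, err)
	}
	return &chunk, nil
}
```

processChunk could then call sendError (or abort the response) when decodeChunk fails, rather than returning silently.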
</issue_to_address>

### Comment 3
<location> `pkg/responses/streaming.go:104-108` </location>
<code_context>
+		}
+
+		dataStr := strings.TrimPrefix(line, "data: ")
+		if dataStr == "[DONE]" {
+			s.finalize()
+			continue
+		}
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Streaming finalization does not handle upstream non-text finish signals (e.g. usage, finish_reason)

The streaming handler only treats `[DONE]` as a terminator and ignores final chunks that contain only metadata like `finish_reason` or `usage`. As a result, this information is dropped from both the streaming output and the stored `Response` (e.g., `Usage` stays unset). Consider detecting and propagating those final metadata-only chunks into `resp.Status`, `resp.Usage`, or `IncompleteDetails` before calling `finalize()`.

Suggested implementation:

```golang
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "data: ") {
			continue
		}

		dataStr := strings.TrimPrefix(line, "data: ")

		// Detect and propagate upstream metadata-only final chunks (e.g. usage, finish_reason)
		// before we consider the stream finalized. This ensures that Response.Usage and
		// Response.Status.FinishReason are preserved even when the final chunk has no text.
		if dataStr != "" && dataStr != "[DONE]" {
			var metaEnvelope struct {
				Usage   *Usage `json:"usage,omitempty"`
				Choices []struct {
					FinishReason string          `json:"finish_reason,omitempty"`
					Delta        json.RawMessage `json:"delta,omitempty"`
				} `json:"choices,omitempty"`
			}

			if err := json.Unmarshal([]byte(dataStr), &metaEnvelope); err == nil {
				// If upstream sent usage only in the final chunk, capture it once.
				if metaEnvelope.Usage != nil && s.resp != nil && s.resp.Usage == nil {
					s.resp.Usage = metaEnvelope.Usage
				}

				// If we have a finish_reason but an empty delta, treat this as a
				// metadata-only final chunk and propagate the finish state.
				if s.resp != nil {
					for _, choice := range metaEnvelope.Choices {
						if choice.FinishReason != "" && len(choice.Delta) == 0 {
							if s.resp.Status == nil {
								s.resp.Status = &Status{}
							}
							s.resp.Status.FinishReason = choice.FinishReason
							break
						}
					}
				}
			}
		}

```

This edit assumes the following, which you should verify and adjust to your actual types:

1. The streaming writer has access to the in-flight response via a field named `s.resp` of type `*Response`.
2. `Response` has:
   - a `Usage *Usage` field.
   - a `Status *Status` field, where `Status` has a `FinishReason string` (or compatible) field.
3. If your actual field or type names differ (e.g. `CompletionUsage`, `ResponseStatus`, `FinishReason *string`, or `IncompleteDetails` for non-success reasons), update the struct tags and assignments accordingly.
4. If you already decode chunks into a dedicated upstream type elsewhere in the same function, you may want to:
   - Reuse that type instead of the inline `metaEnvelope` struct.
   - Or move this metadata propagation logic to where that existing decoding happens to avoid double-unmarshalling.
</issue_to_address>

### Comment 4
<location> `pkg/responses/handler.go:37-40` </location>
<code_context>
+	h.router.HandleFunc("POST "+APIPrefix, h.handleCreate)
+	h.router.HandleFunc("GET "+APIPrefix+"/{id}", h.handleGet)
+	h.router.HandleFunc("DELETE "+APIPrefix+"/{id}", h.handleDelete)
+	// Also register /v1/responses routes
+	h.router.HandleFunc("POST /v1"+APIPrefix, h.handleCreate)
+	h.router.HandleFunc("GET /v1"+APIPrefix+"/{id}", h.handleGet)
+	h.router.HandleFunc("DELETE /v1"+APIPrefix+"/{id}", h.handleDelete)
+
+	// Apply CORS middleware
</code_context>

<issue_to_address>
**issue (bug_risk):** Input items listing endpoint is implemented but not wired into the router

`handleListInputItems` is implemented for `GET /responses/{id}/input_items` but no route is registered in `NewHTTPHandler`, so this code is currently unreachable. If this endpoint is part of the public API, please add routes for `/responses/{id}/input_items` and `/v1/responses/{id}/input_items` to the mux.
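A self-contained illustration of the Go 1.22 pattern syntax such a registration would use; the mux, address, and handler body here are placeholders, and the real change would register h.handleListInputItems inside NewHTTPHandler:

```golang
package main

import (
	"fmt"
	"net/http"
)

func main() {
	const apiPrefix = "/responses"

	mux := http.NewServeMux()
	listInputItems := func(w http.ResponseWriter, r *http.Request) {
		// r.PathValue extracts the {id} wildcard from the matched pattern.
		fmt.Fprintf(w, "input items for response %s\n", r.PathValue("id"))
	}

	// The two registrations Comment 4 asks for, without and with the /v1 prefix.
	mux.HandleFunc("GET "+apiPrefix+"/{id}/input_items", listInputItems)
	mux.HandleFunc("GET /v1"+apiPrefix+"/{id}/input_items", listInputItems)

	_ = http.ListenAndServe("localhost:8080", mux)
}
```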
</issue_to_address>

### Comment 5
<location> `pkg/responses/handler.go:57` </location>
<code_context>
+// handleCreate handles POST /responses (or /v1/responses).
+func (h *HTTPHandler) handleCreate(w http.ResponseWriter, r *http.Request) {
+	// Read request body
+	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 10*1024*1024))
+	if err != nil {
+		h.sendError(w, http.StatusBadRequest, "invalid_request", "Failed to read request body")
</code_context>

<issue_to_address>
**suggestion:** Request size limit might be low for multimodal/large-input use cases and is hard-coded

`handleCreate` currently limits request bodies to 10 MiB via `http.MaxBytesReader`. For larger prompts, tool definitions, or multimodal inputs, this may be too restrictive and surfaces only as a generic `invalid_request`. Consider making this limit configurable (e.g., env/flag) and returning a specific “request too large” error when it’s exceeded.

Suggested implementation:

```golang
// handleCreate handles POST /responses (or /v1/responses).
func (h *HTTPHandler) handleCreate(w http.ResponseWriter, r *http.Request) {
	// Read request body with a configurable limit
	reader := http.MaxBytesReader(w, r.Body, h.maxRequestBodyBytes)
	defer reader.Close()

	body, err := io.ReadAll(reader)
	if err != nil {
		var maxBytesErr *http.MaxBytesError
		if errors.As(err, &maxBytesErr) {
			h.sendError(
				w,
				http.StatusRequestEntityTooLarge,
				"request_too_large",
				fmt.Sprintf("Request body too large (max %d bytes)", h.maxRequestBodyBytes),
			)
			return
		}

		h.sendError(w, http.StatusBadRequest, "invalid_request", "Failed to read request body")
		return
	}

```

To fully wire this up, you will also need to:

1. Ensure the following imports exist at the top of `pkg/responses/handler.go`:
   - `import "errors"`
   - `import "fmt"`
2. Add a configurable field on `HTTPHandler`, for example:
   - `maxRequestBodyBytes int64`
3. Initialize `maxRequestBodyBytes` from configuration (env/flag/config struct) when constructing `HTTPHandler`, with a sensible default (e.g., 10 * 1024 * 1024) if not provided (a sketch follows this list).
4. Optionally, document the configuration knob (env var/flag) so operators know how to raise/lower the limit for large or multimodal requests.
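For step 3, initialization from an environment variable might look roughly like this; the variable name RESPONSES_MAX_BODY_BYTES and the helper are assumptions:

```golang
package responses

import (
	"os"
	"strconv"
)

const defaultMaxRequestBodyBytes int64 = 10 * 1024 * 1024

// maxRequestBodyBytesFromEnv resolves the body-size limit, falling back to the
// 10 MiB default when the (hypothetical) env var is unset or malformed.
func maxRequestBodyBytesFromEnv() int64 {
	if v := os.Getenv("RESPONSES_MAX_BODY_BYTES"); v != "" {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil && n > 0 {
			return n
		}
	}
	return defaultMaxRequestBodyBytes
}
```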
</issue_to_address>

### Comment 6
<location> `pkg/responses/handler_test.go:354-363` </location>
<code_context>
+func TestHandler_CreateResponse_Streaming(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** Streaming test only asserts on SSE text; it does not verify that the underlying Response state is persisted correctly in the store

Currently this test only checks for event names in the SSE body. It should also verify that `StreamingResponseWriter` produces a coherent `Response` that’s persisted in `handler.store`. For example, capture the `resp_id` from the first `response.created` event (or handler state), fetch it from `handler.store`, and assert that `Status` is `completed`, `OutputText` is `"Hello!"` (from the "Hello" + "!" chunks), and that there is an `OutputItem` message matching the streamed text. This will validate the end-to-end state, not just the emitted SSE lines.

Suggested implementation:

```golang
func TestHandler_CreateResponse_Streaming(t *testing.T) {
	// Mock streaming response
	mock := &mockSchedulerHTTP{
		streaming: true,
		streamChunks: []string{
			"data: {\"id\":\"chatcmpl-123\",\"object\":\"chat.completion.chunk\",\"created\":1234567890,\"model\":\"gpt-4\",\"choices\":[{\"index\":0,\"delta\":{\"role\":\"assistant\"},\"finish_reason\":null}]}\n\n",
			"data: {\"id\":\"chatcmpl-123\",\"object\":\"chat.completion.chunk\",\"created\":1234567890,\"model\":\"gpt-4\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hello\"},\"finish_reason\":null}]}\n\n",
			"data: {\"id\":\"chatcmpl-123\",\"object\":\"chat.completion.chunk\",\"created\":1234567890,\"model\":\"gpt-4\",\"choices\":[{\"index\":0,\"delta\":{\"content\":\"!\"},\"finish_reason\":null}]}\n\n",
			"data: {\"id\":\"chatcmpl-123\",\"object\":\"chat.completion.chunk\",\"created\":1234567890,\"model\":\"gpt-4\",\"choices\":[{\"index\":0,\"delta\":{},\"finish_reason\":\"stop\"}]}\n\n",
			"data: [DONE]\n\n",
		},
	}

	// existing setup and HTTP request logic should remain unchanged:
	// - construct handler with `mock` as scheduler
	// - issue a POST /responses with Accept: text/event-stream
	// - collect SSE body into `body` (string) and keep a handle to `handler` and its `store`
	//
	// The new assertions below assume:
	//   - `handler` is the instance under test
	//   - `body` contains the SSE payload as a string
	//   - `handler.store` is an in-memory store that holds `Response` objects
	//     keyed by ID, e.g. `map[string]*Response`.

	// Verify existing SSE behavior (already present in the original test, kept conceptually here).
	// Example (adapt to your actual assertions):
	//
	// if !strings.Contains(body, "event: response.created") {
	// 	t.Fatalf("expected response.created event in SSE stream, got:\n%s", body)
	// }
	// if !strings.Contains(body, "event: response.completed") {
	// 	t.Fatalf("expected response.completed event in SSE stream, got:\n%s", body)
	// }

	// --- New: verify that the StreamingResponseWriter persisted a coherent Response in the store ---

	// Assuming the handler has a `store` field with all responses.
	// Adjust the concrete store type and field name to match your codebase (see the notes below).
	memStore, ok := handler.store.(*memoryStore)
	if !ok {
		t.Fatalf("expected handler.store to be *memoryStore, got %T", handler.store)
	}

	if len(memStore.responses) != 1 {
		t.Fatalf("expected exactly one response in store, got %d", len(memStore.responses))
	}

	var persistedResp *Response
	for _, r := range memStore.responses {
		persistedResp = r
		break
	}
	if persistedResp == nil {
		t.Fatal("expected a persisted Response, got nil")
	}

	// Status should be completed after streaming finishes
	if persistedResp.Status != StatusCompleted {
		t.Errorf("persisted response status = %s, want %s", persistedResp.Status, StatusCompleted)
	}

	// OutputText should match concatenated streamed chunks: "Hello" + "!" => "Hello!"
	if persistedResp.OutputText != "Hello!" {
		t.Errorf("persisted response OutputText = %q, want %q", persistedResp.OutputText, "Hello!")
	}

	// There should be at least one OutputItem whose message content matches "Hello!"
	// Adjust field names/structure to match your Response model.
	found := false
	for _, item := range persistedResp.OutputItems {
		if item.Type != OutputItemTypeMessage {
			continue
		}
		// Example if messages are simple text:
		if item.Message != nil && item.Message.Text == "Hello!" {
			found = true
			break
		}
		// Or, if the message contains a slice of text parts, adjust accordingly.
	}
	if !found {
		t.Errorf("expected an OutputItem message with text %q in persisted response", "Hello!")
	}

```

The above edit assumes certain handler and store shapes that you’ll need to align with your actual implementation:

1. **Handler and store access**
   - Ensure the test has access to the `handler` instance used to serve the streaming request.
   - If your test currently only has an `http.Handler` (e.g. `router`), adjust the code so `handler` is the concrete type that exposes `.store`.
   - If your store type is not `*memoryStore`, replace `*memoryStore` and `memStore.responses` with the correct concrete type and field (e.g. `handler.store.(*InMemoryResponseStore).responses`).

2. **Response model fields**
   - Replace `Response`, `StatusCompleted`, `OutputText`, and `OutputItems` with the actual types/fields used in your project.
   - For example, your `Response` might look like:
     - `type Response struct { Status ResponseStatus; OutputText string; Output []OutputItem }`
     - or `Output []Message` instead of `OutputItems`.
   - Adjust the loop over `persistedResp.OutputItems` to match the real structure (e.g. `persistedResp.Output`, `item.Content`, etc.).

3. **Message content assertion**
   - If your message structure stores parts (e.g. `item.Message.Content []ContentPart` where each `ContentPart` has `Text string`), adapt the assertion to concatenate or match those parts against `"Hello!"`.

4. **SSE body variable**
   - The comments assume you already compute `body` from `resp.Body`. Keep your existing SSE assertions as-is and only add the new store assertions after the streaming has completed.
</issue_to_address>

### Comment 7
<location> `pkg/responses/handler_test.go:313-322` </location>
<code_context>
+func TestHandler_CreateResponse_UpstreamError(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** Non‑streaming error handling test does not cover the case where the upstream returns a non‑JSON or malformed error body

The non‑streaming path assumes a JSON error body and falls back to a generic `upstream_error` with the raw body when JSON parsing fails. Please add a test that uses a non‑JSON `mock.response` (e.g. plain text) with a non‑200 `statusCode` and asserts:
- Response status is `failed`.
- `Error.Code` is `"upstream_error"`.
- `Error.Message` contains the raw body.
This will exercise the fallback branch in `handleNonStreaming`.

Suggested implementation:

```golang
func TestHandler_CreateResponse_UpstreamError(t *testing.T) {
	mock := &mockSchedulerHTTP{
		statusCode: http.StatusInternalServerError,
		response: `{
			"error": {
				"message": "Model overloaded",
				"type": "server_error",
				"code": "model_overloaded"
			}
		}`,
	}

	// NOTE: This body should mirror however other tests in this file
	// construct the handler, request, and decode the response.
	handler := &Handler{
		schedulerHTTP: mock,
	}

	req := httptest.NewRequest(http.MethodPost, "/v1/responses", bytes.NewBufferString(`{"input":"test"}`))
	w := httptest.NewRecorder()

	handler.CreateResponse(w, req)

	resp := w.Result()
	defer resp.Body.Close()

	var result Response
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		t.Fatalf("failed to decode response: %v", err)
	}

	if result.Status != "failed" {
		t.Errorf("status = %v, want failed", result.Status)
	}

	if result.Error == nil {
		t.Fatalf("expected error, got nil")
	}

	if result.Error.Code != "upstream_error" && result.Error.Code != "model_overloaded" {
		t.Errorf("error.code = %v, want upstream_error or model_overloaded", result.Error.Code)
	}
}

func TestHandler_CreateResponse_UpstreamError_NonJSONBody(t *testing.T) {
	mock := &mockSchedulerHTTP{
		statusCode: http.StatusInternalServerError,
		// non-JSON / malformed body to exercise the fallback branch in handleNonStreaming
		response: "upstream exploded in a non-json way",
	}

	handler := &Handler{
		schedulerHTTP: mock,
	}

	req := httptest.NewRequest(http.MethodPost, "/v1/responses", bytes.NewBufferString(`{"input":"test"}`))
	w := httptest.NewRecorder()

	handler.CreateResponse(w, req)

	resp := w.Result()
	defer resp.Body.Close()

	var result Response
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		t.Fatalf("failed to decode response: %v", err)
	}

	// Assert: non-streaming error handling falls back correctly
	if result.Status != "failed" {
		t.Errorf("status = %v, want failed", result.Status)
	}

	if result.Error == nil {
		t.Fatalf("expected error, got nil")
	}

	if result.Error.Code != "upstream_error" {
		t.Errorf("error.code = %v, want upstream_error", result.Error.Code)
	}

	if !strings.Contains(result.Error.Message, "upstream exploded in a non-json way") {
		t.Errorf("error.message = %q, want to contain raw upstream body", result.Error.Message)
	}
}

```

To integrate this with the existing test file, you may need to:

1. Ensure the following imports exist (or are extended) at the top of `pkg/responses/handler_test.go`:
   - `net/http`
   - `net/http/httptest`
   - `bytes`
   - `encoding/json`
   - `strings`

2. Adjust the construction of `Handler` to match the actual implementation in your codebase:
   - If there is a constructor (e.g. `newTestHandler(mock)` or `NewHandler(...)`), use that instead of `&Handler{schedulerHTTP: mock}`.
   - If the scheduler client field has a different name or type, update `schedulerHTTP` accordingly.

3. Align the request path/method/body with other tests in the file:
   - If other tests use a different URL or payload, mirror that for consistency.

4. Confirm `Response` has `Status` and `Error` fields with `Code` and `Message`; if the field names differ, update the assertions to match.
</issue_to_address>

### Comment 8
<location> `pkg/responses/transform_test.go:70-79` </location>
<code_context>
+	}
+}
+
+func TestTransformRequestToChatCompletion_MessageArray(t *testing.T) {
+	input := `[
+		{"role": "user", "content": "Hello"},
+		{"role": "assistant", "content": "Hi there!"},
+		{"role": "user", "content": "How are you?"}
+	]`
+
+	req := &CreateRequest{
+		Model: "gpt-4",
+		Input: json.RawMessage(input),
+	}
+
+	chatReq, err := TransformRequestToChatCompletion(req, nil)
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+
+	if len(chatReq.Messages) != 3 {
+		t.Fatalf("got %d messages, want 3", len(chatReq.Messages))
+	}
+
+	expectedRoles := []string{"user", "assistant", "user"}
+	for i, msg := range chatReq.Messages {
+		if msg.Role != expectedRoles[i] {
+			t.Errorf("message %d role = %s, want %s", i, msg.Role, expectedRoles[i])
+		}
</code_context>

<issue_to_address>
**suggestion (testing):** No tests cover multi‑part content (e.g., text + image) in the request input to exercise parseContent

Current tests only cover string `content` and arrays of message objects, but not the case where a message `content` is an array of `InputContentPart` (text + image). Please add a test with a `user` message whose `content` is something like `[{"type":"input_text","text":"Hi"},{"type":"input_image","image_url":"..."}]` and assert that `TransformRequestToChatCompletion` produces a `ChatMessage` with a `[]ChatContentPart` containing matching `text` and `image_url` entries. This will directly exercise `parseContent` and the type mapping logic.

Suggested implementation:

```golang
	if chatReq.Messages[0].Content != "You are a helpful assistant." {
		t.Errorf("system content = %v, want You are a helpful assistant.", chatReq.Messages[0].Content)
	}

	// Second message should be user
	if chatReq.Messages[1].Role != "user" {
		t.Errorf("second message role = %s, want user", chatReq.Messages[1].Role)
	}
}

func TestTransformRequestToChatCompletion_MessageArray(t *testing.T) {
	input := `[
		{"role": "user", "content": "Hello"},
		{"role": "assistant", "content": "Hi there!"},
		{"role": "user", "content": "How are you?"}
	]`

	req := &CreateRequest{
		Model: "gpt-4",
		Input: json.RawMessage(input),
	}

	chatReq, err := TransformRequestToChatCompletion(req, nil)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if len(chatReq.Messages) != 3 {
		t.Fatalf("got %d messages, want 3", len(chatReq.Messages))
	}

	expectedRoles := []string{"user", "assistant", "user"}
	for i, msg := range chatReq.Messages {
		if msg.Role != expectedRoles[i] {
			t.Errorf("message %d role = %s, want %s", i, msg.Role, expectedRoles[i])
		}
	}
}

func TestTransformRequestToChatCompletion_MessageArrayWithMultiPartContent(t *testing.T) {
	input := `[
		{
			"role": "user",
			"content": [
				{
					"type": "input_text",
					"text": "Hi"
				},
				{
					"type": "input_image",
					"image_url": "https://example.com/image.png"
				}
			]
		}
	]`

	req := &CreateRequest{
		Model: "gpt-4",
		Input: json.RawMessage(input),
	}

	chatReq, err := TransformRequestToChatCompletion(req, nil)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	if len(chatReq.Messages) != 1 {
		t.Fatalf("got %d messages, want 1", len(chatReq.Messages))
	}

	msg := chatReq.Messages[0]
	if msg.Role != "user" {
		t.Fatalf("message role = %s, want user", msg.Role)
	}

	contentParts, ok := msg.Content.([]ChatContentPart)
	if !ok {
		t.Fatalf("message content has type %T, want []ChatContentPart", msg.Content)
	}

	if len(contentParts) != 2 {
		t.Fatalf("got %d content parts, want 2", len(contentParts))
	}

	if contentParts[0].Type != "text" {
		t.Errorf("first content part type = %s, want text", contentParts[0].Type)
	}
	if contentParts[0].Text != "Hi" {
		t.Errorf("first content part text = %q, want %q", contentParts[0].Text, "Hi")
	}

	if contentParts[1].Type != "image_url" {
		t.Errorf("second content part type = %s, want image_url", contentParts[1].Type)
	}
	if contentParts[1].ImageURL != "https://example.com/image.png" {
		t.Errorf("second content part image_url = %q, want %q", contentParts[1].ImageURL, "https://example.com/image.png")
	}
}

func TestTransformRequestToChatCompletion_WithTools(t *testing.T) {
	req := &CreateRequest{
		Model: "gpt-4",
		Input: json.RawMessage(`"What's the weather?"`),
		Tools: []Tool{
			{
				Type: "function",

```

Because only part of the file and related types are visible, you may need to align the field and type names in the new test with the actual implementation:

1. **`ChatContentPart` type and fields**  
   - Update the type assertion `msg.Content.([]ChatContentPart)` if `Content` has a different concrete type (e.g., `[]ChatMessageContent`, `[]ChatCompletionContentPart`, or a pointer slice).
   - Adjust the field names `Type`, `Text`, and `ImageURL` if your struct uses different casing or JSON tags (e.g., `ContentType`, `Value`, `ImageUrl`).

2. **Type mapping in `parseContent`**  
   - The test assumes `input_text` maps to `Type: "text"` and `input_image` maps to `Type: "image_url"`. If your mapping uses different output type strings, update the expected values accordingly.

3. **Imports**  
   - Ensure `encoding/json` is imported in `transform_test.go` (it likely already is, given the existing tests). If not, add `import "encoding/json"` at the top of the file.
</issue_to_address>

### Comment 9
<location> `pkg/responses/transform_test.go:259-268` </location>
<code_context>
+func TestTransformChatCompletionToResponse_ToolCalls(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** TransformChatCompletionToResponse tests don’t verify behavior when both tool calls and assistant text are present in the same choice

Right now we only cover tool calls and plain text separately. Please add a test where `ChatChoice.Message` has both `ToolCalls` and non‑empty `Content`, and verify that `TransformChatCompletionToResponse` appends:
- one function_call `OutputItem` per tool call, and
- a message `OutputItem` whose `OutputText` matches the assistant text.
This will lock in ordering and guard against dropping either part of the data.

Suggested implementation:

```golang
func TestTransformChatCompletionToResponse_ToolCalls(t *testing.T) {
	chatResp := &ChatCompletionResponse{
		ID:      "chatcmpl-123",
		Object:  "chat.completion",
		Created: 1234567890,
		Model:   "gpt-4",
		Choices: []ChatChoice{
			{
				Index: 0,
				Message: ChatMessage{
					Role:    "assistant",
					Content: "assistant text that should become an OutputItem message",

```

To fully implement your review comment, you should also update the body of `TestTransformChatCompletionToResponse_ToolCalls` (or add a new test alongside it) to **assert the mixed behavior**:

1. After calling `TransformChatCompletionToResponse(chatResp)`:
   - Assert there is no error.
   - Assert the returned response has `Output` (or equivalent) length equal to `len(toolCalls)+1`.
2. For each tool call in `chatResp.Choices[0].Message.ToolCalls`, assert the corresponding `OutputItem`:
   - Has `Type` (or equivalent) set to `"function_call"`.
   - Contains a `FunctionCall` payload whose name/arguments match the original tool call.
3. Assert that one `OutputItem` has:
   - `Type == "message"` (or equivalent).
   - `OutputText` (or equivalent text field) equal to the `Content` string you set on the assistant message (`"assistant text that should become an OutputItem message"`).
4. Assert the ordering: depending on the intended contract, either:
   - All function_call items come first followed by the message item, or
   - The message item precedes/follows the tool calls as defined by `TransformChatCompletionToResponse`.

These assertions should mirror the existing patterns used in your plain-text-only and tool-calls-only tests so the new mixed-content test locks in both ordering and data preservation.
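As a very rough sketch of those assertions, in the fragment style of the snippet above (Output, Type, OutputText, and the assumed function_call-before-message ordering all need to be aligned with the real Response/OutputItem model):

```golang
// Fragment: assumes resp came from TransformChatCompletionToResponse for a choice
// carrying one tool call plus the assistant text set above; all field names are assumed.
if len(resp.Output) != 2 {
	t.Fatalf("got %d output items, want 2", len(resp.Output))
}
if resp.Output[0].Type != "function_call" {
	t.Errorf("output[0].Type = %s, want function_call", resp.Output[0].Type)
}
if resp.Output[1].Type != "message" {
	t.Errorf("output[1].Type = %s, want message", resp.Output[1].Type)
}
if resp.OutputText != "assistant text that should become an OutputItem message" {
	t.Errorf("OutputText = %q, want the assistant content", resp.OutputText)
}
```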
</issue_to_address>

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive compatibility layer for the OpenAI Responses API. It adds a new responses package that handles request transformation, stateful conversation management via an in-memory store with TTL, and both streaming and non-streaming responses. The implementation is thorough and includes a good set of tests. My review has identified a routing bug for POST requests, some dead code that should be removed, and opportunities for refactoring to improve maintainability and robustness.

@ericcurtin ericcurtin force-pushed the responses branch 10 times, most recently from 3bf06fe to 4da1288 on January 6, 2026 at 19:29
@ericcurtin
Contributor Author

#539

@ericcurtin ericcurtin force-pushed the responses branch 3 times, most recently from ef1f29d to acf89b2 on January 7, 2026 at 11:32
Contributor

@ilopezluna ilopezluna left a comment

I have added some comments and questions, nothing blocking, but I think we should not revert the changes in adapter.go and api.go as I think they are not needed. Let me know what you think!

The OpenAI Responses API provides a stateful interface combining chat
completions with conversation management and tool use. This change
introduces a new API endpoint at /responses that transforms requests to
the standard chat completion format and handles both streaming and
non-streaming responses. It includes in-memory storage with TTL for
response state management and comprehensive test coverage.

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin
Contributor Author

@ilopezluna I addressed some of the review comments in the last push. Merging.

@ericcurtin ericcurtin merged commit e93a023 into main Jan 7, 2026
9 checks passed
@ericcurtin ericcurtin deleted the responses branch January 7, 2026 12:29
Contributor

@doringeman doringeman left a comment

Nice!
