feat: unify reasoning_content + thinking_blocks across providers (v0.4.9) by vitalii-dynamiq · Pull Request #14 · dynamiq-ai/arcllm

vitalii-dynamiq · 2026-05-07T08:58:11Z

Summary

Reasoning-capable models (DeepSeek-R1, GLM-4.5+, Anthropic Claude with extended thinking, Gemini 2.5 with `includeThoughts`, Groq DeepSeek/Qwen-thinking, Cerebras Qwen-thinking, Together / Fireworks DeepSeek-R1, OpenAI o-series via `/chat/completions`) all expose chain-of-thought, but each family uses a different field name. Previously arcllm dropped this entirely on the floor — callers could see the final answer but not the thinking.

This wires up a unified surface:

`Message.reasoning_content: str` — flat-string CoT, populated by every reasoning provider
`Message.thinking_blocks: list[ThinkingBlock]` — Anthropic's structured form (`thinking` | `redacted_thinking`, with signatures preserved for tool-use round-trips)
`ChunkDelta.reasoning_content / .thinking / .signature` — streaming deltas

Provider mapping

OpenAIAdapter (and DeepSeek, GLM, Groq, Cerebras, Together, Fireworks, Nebius, OVHcloud, Moonshot, OpenRouter, Perplexity — every OpenAI-compat subclass): reads `message.reasoning_content` or `message.reasoning`; same for `delta.reasoning_content / .reasoning` in stream events.
AnthropicAdapter: extracts `content[].type=="thinking"` and `"redacted_thinking"` blocks; populates both `thinking_blocks` (with signature) and a concatenated `reasoning_content`. Streaming handles `thinking_delta` / `signature_delta` with one block per signature.
GeminiAdapter: routes `parts[].thought=true` text into `reasoning_content` (non-thought parts stay in `content`). Same split for streaming.

`stream_chunk_builder` accumulates reasoning across chunks and rebuilds Anthropic's per-block grouping (`signature_delta` closes a block).

Live verification (through arcllm.completion)

Provider	content	reasoning_len	thinking_blocks
Z.AI GLM-4.5-air	`"5"`	730	—
DeepSeek-R1	`"5"`	67	—
Claude Sonnet 4.5	`"5"`	101	1 (signature ✅)
Gemini 2.5 Flash	`"5"`	406	—

Streaming verified for all four — Anthropic's `thinking_delta` + `signature_delta` correctly group into a single `ThinkingBlock` with the signature attached.

Test plan

18 new unit tests covering wire-format parsing per provider + `stream_chunk_builder` accumulation
arcllm full unit suite: 792 passed (was 782)
ruff / mypy --strict / pyright clean on changed files
dynamiq unit tests: 1149 passed (no regressions from the new fields)
dynamiq integration tests: 1066 passed
Live smoke against four reasoning families through real APIs

Coverage gap (not in this PR)

OpenAI's Responses API (`/v1/responses`, not `/v1/chat/completions`) returns reasoning as `output[].type=="reasoning"` items. arcllm only uses chat/completions today, so this didn't surface. If a Responses adapter is added, `Message.reasoning_items` (litellm's name) would be the natural extension.

Note

Medium Risk
Adds new response/stream fields and modifies core stream aggregation and multiple provider parsers, which could affect downstream consumers expecting the previous response shape or streaming semantics. Provider-specific handling (especially Anthropic streaming block grouping/signatures) increases edge-case risk but is well-covered by new tests.

Overview
Adds first-class support for reasoning/extended-thinking outputs by introducing Message.reasoning_content (flat string) and Anthropic-specific Message.thinking_blocks/ThinkingBlock, plus streaming deltas (ChunkDelta.reasoning_content, thinking, signature).

Updates OpenAI-compatible parsing to accept both reasoning_content and OpenAI’s reasoning alias; updates Gemini parsing to separate parts[].thought into reasoning_content; and updates Anthropic parsing/streaming to preserve thinking/redacted_thinking blocks (including signatures) and emit thinking/signature stream deltas.

Enhances stream_chunk_builder to accumulate reasoning across chunks and to rebuild per-choice Anthropic thinking blocks (with a fallback that populates reasoning_content from blocks), bumps version to 0.4.9, and adds extensive unit tests covering parsing, streaming accumulation, and serialization.

^{Reviewed by Cursor Bugbot for commit b5552ff. Bugbot is set up for automated code reviews on this repo. Configure here.}

Three drop-in gaps prevented dynamiq's test fixtures from passing against arcllm even though direct API calls worked. Exception positional args: litellm's exception classes take (message, llm_provider, model, ...) positionally. arcllm previously made these keyword-only. Tests construct errors as `RateLimitError(msg, "bedrock", "amazon.titan")` which raised "takes 2 positional arguments but 4 were given". - ArcLLMError: provider/model/status_code now positional after message; llm_provider stays keyword-only as the litellm-name alias - RateLimitError: accepts (message, provider, model) positionally - ProviderAPIError: detects litellm shape (status_code, message, ...) by type — first int positional becomes status_code - BadRequestError (renamed from InvalidRequestError to match the canonical litellm/OpenAI name; InvalidRequestError stays as alias): accepts (message, model, provider) per litellm AND (message, provider, model) per arcllm. Disambiguates by checking SUPPORTED_PROVIDERS — common provider names always resolve correctly. Streaming chunk serialisation: Choice.model_dump() omitted .delta. dynamiq's streaming callback reads chunk["choices"][0]["delta"]["content"] from the serialized dict, so it saw KeyError on every streamed event. token_counter overhead: Counts now follow OpenAI's per-message formula (3 + per-key + 1 for name + 3 priming) so totals match litellm's. Previous sum-of-fields undercount made dynamiq's history-summarisation logic preserve more context than the model could actually accept. ModelResponse defaults: - choices defaults to [Choice()] so fixtures that do ModelResponse()["choices"][0]["message"]["content"] = ... work - stream: bool = False added so ModelResponse(stream=True) is accepted - Choice.delta added so streaming fixtures can set delta on the same Choice class litellm uses for both modes Result: dynamiq main suite goes from 281 integration failures → 0 (1066 integration + 1149 unit, all passing). arcllm's own test suite unchanged (8 pre-existing Ollama integration failures only).

…4.9) Reasoning-capable models (DeepSeek-R1, GLM-4.5+, Anthropic Claude with extended thinking, Gemini 2.5 with includeThoughts, Groq DeepSeek/Qwen, Cerebras Qwen-thinking, Together / Fireworks DeepSeek-R1, OpenAI o-series via chat/completions) all expose chain-of-thought, but each family uses a different field name. Previously arcllm dropped this entirely on the floor — callers could see the final answer but not the thinking. This wires up a unified surface: - Message.reasoning_content: str — flat-string CoT, populated by every reasoning provider - Message.thinking_blocks: list[ThinkingBlock] — Anthropic's structured form (thinking | redacted_thinking, with signatures preserved for tool-use round-trips) - ChunkDelta.reasoning_content / .thinking / .signature — streaming deltas Provider mapping: - OpenAIAdapter (and DeepSeek, GLM, Groq, Cerebras, Together, Fireworks, Nebius, OVHcloud, Moonshot, OpenRouter, Perplexity — all subclasses): reads message.reasoning_content or message.reasoning from the response; same for delta.reasoning_content / .reasoning in stream events. - AnthropicAdapter: extracts content[].type=="thinking" and "redacted_thinking" blocks; populates both thinking_blocks (with signature) and a concatenated reasoning_content. Streaming handles thinking_delta / signature_delta with one block per signature. - GeminiAdapter: routes parts[].thought=true text into reasoning_content (non-thought parts stay in content). Same split for streaming. stream_chunk_builder accumulates reasoning across chunks and rebuilds Anthropic's per-block grouping (signature_delta closes a block). Verified live end-to-end through arcllm.completion: Z.AI GLM-4.5-air content="5" reasoning_len=730 DeepSeek-R1 content="5" reasoning_len=67 Claude Sonnet 4.5 content="5" reasoning_len=101 thinking_blocks=1 (sig) Gemini 2.5 Flash content="5" reasoning_len=406 Streaming verified for all four — Anthropic's thinking_delta + signature_delta correctly group into a single ThinkingBlock with the signature attached. 18 new unit tests cover wire-format parsing for every provider plus stream_chunk_builder. arcllm own suite: 792 passed (was 782). dynamiq integration suite unaffected: 1149 unit + 1066 integration, all passing.

cursor · 2026-05-07T09:08:43Z

+                kwargs.setdefault("provider", arg2)
+                kwargs.setdefault("model", arg3)
+        elif arg2 is not None:
+            kwargs.setdefault("provider", arg2)


Single positional arg always treated as provider incorrectly

Medium Severity

When only arg2 is provided (without arg3), the elif arg2 is not None branch unconditionally treats it as provider. However, the docstring and litellm's documented signature BadRequestError(message, model, llm_provider) indicate the second positional is the model. If litellm callers pass only two positional args (message + model), the model name would be incorrectly stored as provider. The SUPPORTED_PROVIDERS heuristic is only applied when both arg2 and arg3 are present, leaving this single-arg case mishandled.

Additional Locations (1)

arcllm/exceptions.py#L270-L276

^{Reviewed by Cursor Bugbot for commit 4dab9eb. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit b5552ff. Configure here.}

cursor · 2026-05-10T14:56:53Z

+                            finish_reason=None,
+                        )
+                    ],
+                )


Streaming silently drops redacted_thinking blocks

Medium Severity

The Anthropic streaming handler in parse_stream_event handles content_block_start for type=="thinking" but silently drops type=="redacted_thinking" blocks (returns None). The non-streaming _build_model_response correctly preserves redacted_thinking blocks. Anthropic's streaming protocol does emit content_block_start with type: "redacted_thinking", and these blocks must be preserved unchanged for multi-turn conversation history. Additionally, stream_chunk_builder hardcodes type="thinking" for all assembled blocks, making it impossible to represent redacted_thinking even if the adapter were to emit them.

Additional Locations (1)

arcllm/core.py#L679-L688

^{Reviewed by Cursor Bugbot for commit b5552ff. Configure here.}

cursor · 2026-05-10T14:56:53Z

                tool_calls=tool_calls,
                function_call=delta_data.get("function_call"),
+                reasoning_content=delta_data.get("reasoning_content")
+                or delta_data.get("reasoning"),


Falsy or conflates empty string with absent field

Low Severity

Using or to fall back from reasoning_content to reasoning means an explicit empty string "" in reasoning_content is treated as absent, falling through to the reasoning field. If a provider legitimately sends both fields (e.g., reasoning_content: "" alongside reasoning: null), the result is None rather than "". While semantically an empty string contributes nothing, it prevents callers from distinguishing "field present but empty" from "field absent" via is not None checks on the resulting Message.

Additional Locations (1)

arcllm/providers/openai_adapter.py#L239-L242

^{Reviewed by Cursor Bugbot for commit b5552ff. Configure here.}

vitalii-dynamiq added 2 commits May 7, 2026 12:02

cursor Bot reviewed May 7, 2026

View reviewed changes

Merge branch 'main' into feat/reasoning-content-unified

b5552ff

vitalii-dynamiq merged commit a899e5a into main May 10, 2026
15 checks passed

cursor Bot reviewed May 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: unify reasoning_content + thinking_blocks across providers (v0.4.9)#14

feat: unify reasoning_content + thinking_blocks across providers (v0.4.9)#14
vitalii-dynamiq merged 3 commits into
mainfrom
feat/reasoning-content-unified

vitalii-dynamiq commented May 7, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot May 7, 2026

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 10, 2026

Uh oh!

cursor Bot May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vitalii-dynamiq commented May 7, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Provider mapping

Live verification (through arcllm.completion)

Test plan

Coverage gap (not in this PR)

Uh oh!

cursor Bot May 7, 2026

Choose a reason for hiding this comment

Single positional arg always treated as provider incorrectly

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 10, 2026

Choose a reason for hiding this comment

Streaming silently drops redacted_thinking blocks

Uh oh!

cursor Bot May 10, 2026

Choose a reason for hiding this comment

Falsy or conflates empty string with absent field

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vitalii-dynamiq commented May 7, 2026 •

edited by cursor Bot

Loading

Falsy `or` conflates empty string with absent field