feat: unify reasoning_content + thinking_blocks across providers (v0.4.9)#14
Conversation
Three drop-in gaps prevented dynamiq's test fixtures from passing against arcllm even though direct API calls worked. Exception positional args: litellm's exception classes take (message, llm_provider, model, ...) positionally. arcllm previously made these keyword-only. Tests construct errors as `RateLimitError(msg, "bedrock", "amazon.titan")` which raised "takes 2 positional arguments but 4 were given". - ArcLLMError: provider/model/status_code now positional after message; llm_provider stays keyword-only as the litellm-name alias - RateLimitError: accepts (message, provider, model) positionally - ProviderAPIError: detects litellm shape (status_code, message, ...) by type — first int positional becomes status_code - BadRequestError (renamed from InvalidRequestError to match the canonical litellm/OpenAI name; InvalidRequestError stays as alias): accepts (message, model, provider) per litellm AND (message, provider, model) per arcllm. Disambiguates by checking SUPPORTED_PROVIDERS — common provider names always resolve correctly. Streaming chunk serialisation: Choice.model_dump() omitted .delta. dynamiq's streaming callback reads chunk["choices"][0]["delta"]["content"] from the serialized dict, so it saw KeyError on every streamed event. token_counter overhead: Counts now follow OpenAI's per-message formula (3 + per-key + 1 for name + 3 priming) so totals match litellm's. Previous sum-of-fields undercount made dynamiq's history-summarisation logic preserve more context than the model could actually accept. ModelResponse defaults: - choices defaults to [Choice()] so fixtures that do ModelResponse()["choices"][0]["message"]["content"] = ... work - stream: bool = False added so ModelResponse(stream=True) is accepted - Choice.delta added so streaming fixtures can set delta on the same Choice class litellm uses for both modes Result: dynamiq main suite goes from 281 integration failures → 0 (1066 integration + 1149 unit, all passing). arcllm's own test suite unchanged (8 pre-existing Ollama integration failures only).
…4.9) Reasoning-capable models (DeepSeek-R1, GLM-4.5+, Anthropic Claude with extended thinking, Gemini 2.5 with includeThoughts, Groq DeepSeek/Qwen, Cerebras Qwen-thinking, Together / Fireworks DeepSeek-R1, OpenAI o-series via chat/completions) all expose chain-of-thought, but each family uses a different field name. Previously arcllm dropped this entirely on the floor — callers could see the final answer but not the thinking. This wires up a unified surface: - Message.reasoning_content: str — flat-string CoT, populated by every reasoning provider - Message.thinking_blocks: list[ThinkingBlock] — Anthropic's structured form (thinking | redacted_thinking, with signatures preserved for tool-use round-trips) - ChunkDelta.reasoning_content / .thinking / .signature — streaming deltas Provider mapping: - OpenAIAdapter (and DeepSeek, GLM, Groq, Cerebras, Together, Fireworks, Nebius, OVHcloud, Moonshot, OpenRouter, Perplexity — all subclasses): reads message.reasoning_content or message.reasoning from the response; same for delta.reasoning_content / .reasoning in stream events. - AnthropicAdapter: extracts content[].type=="thinking" and "redacted_thinking" blocks; populates both thinking_blocks (with signature) and a concatenated reasoning_content. Streaming handles thinking_delta / signature_delta with one block per signature. - GeminiAdapter: routes parts[].thought=true text into reasoning_content (non-thought parts stay in content). Same split for streaming. stream_chunk_builder accumulates reasoning across chunks and rebuilds Anthropic's per-block grouping (signature_delta closes a block). Verified live end-to-end through arcllm.completion: Z.AI GLM-4.5-air content="5" reasoning_len=730 DeepSeek-R1 content="5" reasoning_len=67 Claude Sonnet 4.5 content="5" reasoning_len=101 thinking_blocks=1 (sig) Gemini 2.5 Flash content="5" reasoning_len=406 Streaming verified for all four — Anthropic's thinking_delta + signature_delta correctly group into a single ThinkingBlock with the signature attached. 18 new unit tests cover wire-format parsing for every provider plus stream_chunk_builder. arcllm own suite: 792 passed (was 782). dynamiq integration suite unaffected: 1149 unit + 1066 integration, all passing.
| kwargs.setdefault("provider", arg2) | ||
| kwargs.setdefault("model", arg3) | ||
| elif arg2 is not None: | ||
| kwargs.setdefault("provider", arg2) |
There was a problem hiding this comment.
Single positional arg always treated as provider incorrectly
Medium Severity
When only arg2 is provided (without arg3), the elif arg2 is not None branch unconditionally treats it as provider. However, the docstring and litellm's documented signature BadRequestError(message, model, llm_provider) indicate the second positional is the model. If litellm callers pass only two positional args (message + model), the model name would be incorrectly stored as provider. The SUPPORTED_PROVIDERS heuristic is only applied when both arg2 and arg3 are present, leaving this single-arg case mishandled.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 4dab9eb. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit b5552ff. Configure here.
| finish_reason=None, | ||
| ) | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Streaming silently drops redacted_thinking blocks
Medium Severity
The Anthropic streaming handler in parse_stream_event handles content_block_start for type=="thinking" but silently drops type=="redacted_thinking" blocks (returns None). The non-streaming _build_model_response correctly preserves redacted_thinking blocks. Anthropic's streaming protocol does emit content_block_start with type: "redacted_thinking", and these blocks must be preserved unchanged for multi-turn conversation history. Additionally, stream_chunk_builder hardcodes type="thinking" for all assembled blocks, making it impossible to represent redacted_thinking even if the adapter were to emit them.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit b5552ff. Configure here.
| tool_calls=tool_calls, | ||
| function_call=delta_data.get("function_call"), | ||
| reasoning_content=delta_data.get("reasoning_content") | ||
| or delta_data.get("reasoning"), |
There was a problem hiding this comment.
Falsy or conflates empty string with absent field
Low Severity
Using or to fall back from reasoning_content to reasoning means an explicit empty string "" in reasoning_content is treated as absent, falling through to the reasoning field. If a provider legitimately sends both fields (e.g., reasoning_content: "" alongside reasoning: null), the result is None rather than "". While semantically an empty string contributes nothing, it prevents callers from distinguishing "field present but empty" from "field absent" via is not None checks on the resulting Message.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit b5552ff. Configure here.


Summary
Reasoning-capable models (DeepSeek-R1, GLM-4.5+, Anthropic Claude with extended thinking, Gemini 2.5 with `includeThoughts`, Groq DeepSeek/Qwen-thinking, Cerebras Qwen-thinking, Together / Fireworks DeepSeek-R1, OpenAI o-series via `/chat/completions`) all expose chain-of-thought, but each family uses a different field name. Previously arcllm dropped this entirely on the floor — callers could see the final answer but not the thinking.
This wires up a unified surface:
Provider mapping
`stream_chunk_builder` accumulates reasoning across chunks and rebuilds Anthropic's per-block grouping (`signature_delta` closes a block).
Live verification (through arcllm.completion)
Streaming verified for all four — Anthropic's `thinking_delta` + `signature_delta` correctly group into a single `ThinkingBlock` with the signature attached.
Test plan
Coverage gap (not in this PR)
OpenAI's Responses API (`/v1/responses`, not `/v1/chat/completions`) returns reasoning as `output[].type=="reasoning"` items. arcllm only uses chat/completions today, so this didn't surface. If a Responses adapter is added, `Message.reasoning_items` (litellm's name) would be the natural extension.
Note
Medium Risk
Adds new response/stream fields and modifies core stream aggregation and multiple provider parsers, which could affect downstream consumers expecting the previous response shape or streaming semantics. Provider-specific handling (especially Anthropic streaming block grouping/signatures) increases edge-case risk but is well-covered by new tests.
Overview
Adds first-class support for reasoning/extended-thinking outputs by introducing
Message.reasoning_content(flat string) and Anthropic-specificMessage.thinking_blocks/ThinkingBlock, plus streaming deltas (ChunkDelta.reasoning_content,thinking,signature).Updates OpenAI-compatible parsing to accept both
reasoning_contentand OpenAI’sreasoningalias; updates Gemini parsing to separateparts[].thoughtintoreasoning_content; and updates Anthropic parsing/streaming to preservethinking/redacted_thinkingblocks (including signatures) and emit thinking/signature stream deltas.Enhances
stream_chunk_builderto accumulate reasoning across chunks and to rebuild per-choice Anthropic thinking blocks (with a fallback that populatesreasoning_contentfrom blocks), bumps version to0.4.9, and adds extensive unit tests covering parsing, streaming accumulation, and serialization.Reviewed by Cursor Bugbot for commit b5552ff. Bugbot is set up for automated code reviews on this repo. Configure here.