Feat/dynamiq parity validation by vitalii-dynamiq · Pull Request #16 · dynamiq-ai/arcllm

vitalii-dynamiq · 2026-05-10T15:46:29Z

Note

Medium Risk
Medium risk because it expands the public import surface and subtly changes request-kwarg handling via a module-level drop_params default that can affect all calls when toggled. Changes are mostly additive/compat stubs, but mis-aliasing could break downstream imports or hide unsupported-parameter errors.

Overview
Improves litellm drop-in compatibility by exporting additional symbols from arcllm (typed message/tool-call factories, Choices, streaming aliases, ContextWindowExceededError, supports_response_schema), adding import-path shims (arcllm.types.utils, arcllm.utils), and introducing passive stubs for CustomLogger, Router, and acreate_file.

Adds global drop_params behavior: completion()/acompletion() now honor a module-level arcllm.drop_params toggle (when no per-call override is provided) to silently drop unsupported kwargs.

Expands validation + docs with a new @pytest.mark.live agentic-parity integration matrix wired into the nightly workflow, plus new tests/test_litellm_compat.py to pin the compat surface and feasibility reports for Google ADK and langchain-litellm.

^{Reviewed by Cursor Bugbot for commit 8708ef5. Bugbot is set up for automated code reviews on this repo. Configure here.}

…for smoke tests only

New ``tests/integration/test_agentic_parity.py`` enumerates every chat or reason model in the OpenAI / Anthropic / Gemini / Groq / xAI / Mistral / Cohere / Together AI / Fireworks AI capability tables and parametrizes four agentic-surface tests over each row: - ``test_streaming`` — non-empty streamed content chunks - ``test_tool_calling`` — tool_calls or content for a weather prompt - ``test_structured_output`` — valid JSON when response_format=json_object - ``test_reasoning_content_emitted`` — reasoning_content or thinking_blocks Each test honours the per-model capability flags and skips when the provider's API key is missing — running without keys produces 692 clean skips, which is what unit CI sees. Wired into ``.github/workflows/integration.yml`` as a new matrix entry gated on ``OPENAI_API_KEY`` (always present in CI). The job receives all provider secrets via the existing env-block, so per-row tests resolve their own keys at runtime. Registers the ``live`` pytest marker.

Three targeted README updates that reflect work already on dynamiq/main but not yet documented: - New "Migrating from litellm" section: import-path mapping table plus a one-liner swap example. Frames arcllm as a general-purpose drop-in, not dynamiq-specific. Validated counts (1148 unit + 986 integration tests pass for the dynamiq agentic framework against arcllm). - Reasoning surface: short paragraph on the unified ``Message.reasoning_content`` + ``Message.thinking_blocks`` fields (lifted from v0.4.9's cross-provider unification work). Lists every provider that populates ``reasoning_content``. - Provider count: 28 -> 30 to match ``len(arcllm.providers.base.SUPPORTED_PROVIDERS)``. No content removed; no example code changed; no API surface implied that isn't already shipped.

Additive-only shim surface that lets code which hard-codes `import litellm` resolve every symbol it touches through arcllm after a one-line import swap (or via a `sys.modules`-aliased `litellm` shim package). Validated against the unit suites of Google ADK (253/256, only fixture-side `finish_reason` normalization failures) and langchain-litellm (110/110 — full parity). Surface added: * `Choices` (alias of `Choice`), `ChatCompletionDeltaToolCall` (= `dict`), `ChatCompletionAssistantMessage` / `…UserMessage` / `…SystemMessage` / `…ToolMessage` / `…AssistantToolCall` / `…MessageToolCall` (= `_AttrDict`, a `dict` subclass with attribute access so callers that mix `obj["x"]` and `obj.x` styles both work), `Function`, `FileObject`, `OpenAIMessageContent` (= `list`), `StreamingChoices` (= `ChunkChoice`), `ModelResponseStream` (= `StreamChunk`). * `arcllm.types.utils` and `arcllm.utils` submodule paths via `sys.modules` aliases (zero duplication). * `arcllm.integrations.custom_logger.CustomLogger` no-op subclassable stub. * `arcllm.acreate_file` and `arcllm.Router` `NotImplementedError` stubs for surfaces arcllm doesn't yet implement — keeps the import path unblocked and surfaces a clear local failure on actual invocation. * `arcllm.add_function_to_prompt: bool = False` passive module attr. * `arcllm.success_callback` / `_async_success_callback` / `failure_callback` / `callbacks` passive lists. * `arcllm.drop_params: bool = False` module-level toggle — when `True` and no per-call `drop_params=` is passed, `completion()` / `acompletion()` honor it as the default (one `sys.modules` lookup per call, ~sub-microsecond cost). * `ContextWindowExceededError(BadRequestError)` exception class. * `supports_response_schema` (alias of `supports_structured_output`). `tests/test_litellm_compat.py` adds 25 import-surface tests pinning every alias, stub, module-level attr, and submodule path. Tests deliberately describe the litellm contract being shimmed (not specific consumers) so the docs survive consumer churn. Total impact: 813 → 817 arcllm unit tests pass; no regressions in the existing suite.

End-to-end validation reports for two highest-leverage agentic frameworks that hard-code `import litellm`: * Google ADK (google/adk-python @ main 2026-05-12, v1.33.0 + litellm 1.83.7 baseline): 253/256 LiteLlm unit tests pass via the arcllm-as-litellm shim. The 3 failures are a single category — fixture-side `ModelResponse` construction without explicit `finish_reason` where litellm normalizes the field to "stop" and arcllm preserves `None`. Doesn't affect real provider responses. * langchain-litellm (langchain-ai/langchain-litellm @ main 2026-05-13, v0.6.5 + litellm 1.83.14 baseline): 110/110 unit tests pass — full parity with the real-litellm baseline. Across both validations: zero genuine arcllm bugs surfaced. The drop-in claim holds.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 2220ca5. Configure here.}

`ruff check arcllm tests` flagged: * `arcllm/__init__.py` — `Any` used in shim signatures without import (`acreate_file(..., **_: Any) -> Any` and `Router.__init__(*args: Any, **kwargs: Any)`). Added `from typing import Any`. * `arcllm/types.py` — `import sys as _sys` was placed beside the `sys.modules` registration at the bottom of the file (E402: module-level import not at top). Moved to the top-of-file import block. * `arcllm/__init__.py`, `arcllm/core.py`, `tests/test_litellm_compat.py` — ruff `--fix` reorganized 5 import blocks. All 817 unit tests still pass.

CI's `Lint` job runs both `ruff check` and `ruff format --check`. The prior fix satisfied `ruff check` but four files still triggered reformat-on-CI: `arcllm/core.py`, `arcllm/types.py`, `tests/integration/test_agentic_parity.py`, `tests/test_litellm_compat.py`. Pure whitespace / quote-style normalization — no logic changes. 817 unit tests still pass.

CI's `Lint` job also runs `mypy arcllm --strict`. Two type-arg gaps: * `arcllm/types.py:717` — ``class _AttrDict(dict):`` → ``dict[str, Any]``. * `arcllm/__init__.py:281-284` — ``success_callback: list = []`` (and three siblings) → ``list[Any]``. These passive litellm-compat attributes intentionally accept arbitrary callback objects. Verified locally: ``mypy --strict``, ``ruff check``, ``ruff format --check`` all clean. 817 unit tests still pass.

vitalii-dynamiq added 5 commits May 10, 2026 19:10

chore(gitignore): exclude /catalyst/ — external sibling service used …

076aaff

…for smoke tests only

cursor Bot reviewed May 13, 2026

View reviewed changes

Comment thread docs/langchain-litellm-dropin-report.md

vitalii-dynamiq added 3 commits May 13, 2026 18:33

vitalii-dynamiq merged commit 7dd82af into main May 14, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/dynamiq parity validation#16

Feat/dynamiq parity validation#16
vitalii-dynamiq merged 8 commits into
mainfrom
feat/dynamiq-parity-validation

vitalii-dynamiq commented May 10, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vitalii-dynamiq commented May 10, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vitalii-dynamiq commented May 10, 2026 •

edited by cursor Bot

Loading