Feat/dynamiq parity validation #16

Merged
vitalii-dynamiq merged 8 commits into main from feat/dynamiq-parity-validation
May 14, 2026

@vitalii-dynamiq vitalii-dynamiq commented May 10, 2026

Note

Medium Risk
Medium risk because it expands the public import surface and subtly changes request-kwarg handling via a module-level drop_params default that can affect all calls when toggled. Changes are mostly additive/compat stubs, but mis-aliasing could break downstream imports or hide unsupported-parameter errors.

Overview
Improves litellm drop-in compatibility by exporting additional symbols from arcllm (typed message/tool-call factories, Choices, streaming aliases, ContextWindowExceededError, supports_response_schema), adding import-path shims (arcllm.types.utils, arcllm.utils), and introducing passive stubs for CustomLogger, Router, and acreate_file.

Adds global drop_params behavior: completion()/acompletion() now honor a module-level arcllm.drop_params toggle (when no per-call override is provided) to silently drop unsupported kwargs.
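
The default-resolution rule described above can be sketched roughly as follows. This is an illustration under assumptions, not arcllm's actual code: `resolve_drop_params`, `filter_kwargs`, the `SUPPORTED` set, and the stand-in module `fake_arcllm` are all hypothetical. Only the `drop_params` attribute name and the rule that a per-call value overrides the module-level default come from the description.

```python
import sys
import types
from typing import Any, Optional

# Stand-in for the real package so the sketch runs without arcllm installed.
fake_arcllm = types.ModuleType("fake_arcllm")
fake_arcllm.drop_params = False
sys.modules["fake_arcllm"] = fake_arcllm

# Hypothetical set of kwargs that a provider accepts.
SUPPORTED = {"model", "messages", "temperature"}


def resolve_drop_params(per_call: Optional[bool]) -> bool:
    """A per-call value wins; otherwise fall back to the module-level toggle."""
    if per_call is not None:
        return per_call
    module = sys.modules.get("fake_arcllm")  # one dictionary lookup per call
    return bool(getattr(module, "drop_params", False))


def filter_kwargs(
    kwargs: dict[str, Any], drop_params: Optional[bool] = None
) -> dict[str, Any]:
    """Drop unsupported kwargs when allowed, otherwise raise."""
    unsupported = set(kwargs) - SUPPORTED
    if not unsupported:
        return dict(kwargs)
    if resolve_drop_params(drop_params):
        return {k: v for k, v in kwargs.items() if k in SUPPORTED}
    raise ValueError(f"unsupported parameters: {sorted(unsupported)}")
```

The risk flagged in the note is visible here: once the module-level flag is `True`, every call that omits `drop_params=` silently loses its unsupported kwargs instead of raising.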

Expands validation + docs with a new @pytest.mark.live agentic-parity integration matrix wired into the nightly workflow, a new tests/test_litellm_compat.py that pins the compat surface, and feasibility reports for Google ADK and langchain-litellm.

Reviewed by Cursor Bugbot for commit 8708ef5.

New ``tests/integration/test_agentic_parity.py`` enumerates every chat or
reasoning model in the OpenAI / Anthropic / Gemini / Groq / xAI / Mistral /
Cohere / Together AI / Fireworks AI capability tables and parametrizes
four agentic-surface tests over each row:

- ``test_streaming``               — non-empty streamed content chunks
- ``test_tool_calling``            — tool_calls or content for a weather prompt
- ``test_structured_output``       — valid JSON when response_format=json_object
- ``test_reasoning_content_emitted`` — reasoning_content or thinking_blocks

Each test honours the per-model capability flags and skips when the
provider's API key is missing — running without keys produces 692 clean
skips, which is what unit CI sees.
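
The skip logic described above might look roughly like the sketch below. The `ModelRow` shape, the `CAPABILITIES` rows, the model names, and `skip_reason` are hypothetical stand-ins for arcllm's real capability tables; the pytest wiring is shown only as a comment so the block runs without pytest installed.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ModelRow(NamedTuple):
    provider: str
    model: str
    supports_tools: bool
    env_var: str


# Hypothetical capability table; the real tables live inside arcllm.
CAPABILITIES = [
    ModelRow("openai", "example-chat-model", True, "OPENAI_API_KEY"),
    ModelRow("groq", "example-fast-model", False, "GROQ_API_KEY"),
]


def skip_reason(
    row: ModelRow,
    needs_tools: bool,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return why a row should be skipped, or None if it should run."""
    env = os.environ if env is None else env
    if not env.get(row.env_var):
        return f"{row.env_var} not set"
    if needs_tools and not row.supports_tools:
        return f"{row.model} does not advertise tool calling"
    return None


# In a pytest module this could be wired up along these lines:
#
#   @pytest.mark.live
#   @pytest.mark.parametrize("row", CAPABILITIES)
#   def test_tool_calling(row):
#       reason = skip_reason(row, needs_tools=True)
#       if reason:
#           pytest.skip(reason)
#       ...  # call the provider and assert on the response
```

Resolving the key inside each test, rather than at collection time, is what lets a run without any keys finish as clean skips instead of errors.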

Wired into ``.github/workflows/integration.yml`` as a new matrix entry
gated on ``OPENAI_API_KEY`` (always present in CI). The job receives all
provider secrets via the existing env-block, so per-row tests resolve
their own keys at runtime.

Registers the ``live`` pytest marker.

Three targeted README updates that reflect work already on dynamiq/main
but not yet documented:

- New "Migrating from litellm" section: import-path mapping table plus
  a one-liner swap example. Frames arcllm as a general-purpose drop-in,
  not dynamiq-specific. Includes the validated counts: 1148 unit + 986
  integration tests pass for the dynamiq agentic framework against arcllm.
- Reasoning surface: short paragraph on the unified
  ``Message.reasoning_content`` + ``Message.thinking_blocks`` fields
  (lifted from v0.4.9's cross-provider unification work). Lists every
  provider that populates ``reasoning_content``.
- Provider count: 28 -> 30 to match
  ``len(arcllm.providers.base.SUPPORTED_PROVIDERS)``.

No content removed; no example code changed; no API surface implied that
isn't already shipped.

Additive-only shim surface that lets code which hard-codes `import
litellm` resolve every symbol it touches through arcllm after a one-line
import swap (or via a `sys.modules`-aliased `litellm` shim package).
Validated against the unit suites of Google ADK (253/256, only
fixture-side `finish_reason` normalization failures) and langchain-litellm
(110/110 — full parity).
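
The `sys.modules` aliasing mentioned above relies on standard Python import behavior: an `import` statement consults `sys.modules` before searching the filesystem. A minimal sketch, using a stand-in module (`backend_llm`) and a demo alias name (`litellm_shim_demo`) that are both hypothetical; in the real setup the registered object would be arcllm and the key would be `litellm`:

```python
import importlib
import sys
import types

# Stand-in for the replacement library so the sketch runs anywhere.
backend = types.ModuleType("backend_llm")


def completion(**kwargs):
    """Toy completion function that just echoes its arguments."""
    return {"echo": kwargs}


backend.completion = completion

# Register the module object under a second name. Any later import of that
# name returns this exact object without touching the filesystem.
sys.modules["litellm_shim_demo"] = backend

shim = importlib.import_module("litellm_shim_demo")
from litellm_shim_demo import completion as shim_completion

result = shim_completion(model="m")
```

Because both names point at one module object, there is no second copy of any symbol to drift out of sync.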

Surface added:

* `Choices` (alias of `Choice`), `ChatCompletionDeltaToolCall` (= `dict`),
  `ChatCompletionAssistantMessage` / `…UserMessage` / `…SystemMessage` /
  `…ToolMessage` / `…AssistantToolCall` / `…MessageToolCall` (= `_AttrDict`,
  a `dict` subclass with attribute access so callers that mix
  `obj["x"]` and `obj.x` styles both work), `Function`, `FileObject`,
  `OpenAIMessageContent` (= `list`), `StreamingChoices` (= `ChunkChoice`),
  `ModelResponseStream` (= `StreamChunk`).
* `arcllm.types.utils` and `arcllm.utils` submodule paths via
  `sys.modules` aliases (zero duplication).
* `arcllm.integrations.custom_logger.CustomLogger` no-op subclassable stub.
* `arcllm.acreate_file` and `arcllm.Router` `NotImplementedError` stubs
  for surfaces arcllm doesn't yet implement — keeps the import path
  unblocked and surfaces a clear local failure on actual invocation.
* `arcllm.add_function_to_prompt: bool = False` passive module attr.
* `arcllm.success_callback` / `_async_success_callback` /
  `failure_callback` / `callbacks` passive lists.
* `arcllm.drop_params: bool = False` module-level toggle: when `True`
  and no per-call `drop_params=` is passed, `completion()` /
  `acompletion()` use it as the default (one `sys.modules` lookup
  per call, at sub-microsecond cost).
* `ContextWindowExceededError(BadRequestError)` exception class.
* `supports_response_schema` (alias of `supports_structured_output`).
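
Two of the patterns in the list above, the attribute-access `dict` subclass and the import-safe `NotImplementedError` stub, can be sketched as follows. The class bodies are assumptions written for illustration; the PR only states that `_AttrDict` is a `dict` subclass with attribute access and that `Router` fails on actual invocation.

```python
from typing import Any


class AttrDict(dict[str, Any]):
    """dict subclass whose keys can also be read and written as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods
        # such as .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value


class Router:
    """Importable placeholder that fails loudly only when it is used."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError("Router is not implemented in this sketch")


msg = AttrDict(role="user", content="hi")
msg.name = "alice"  # stored as a regular dict key
```

One caveat of this design: a key that collides with a `dict` method name (for example `items`) is reachable only through `obj["items"]`, since normal attribute lookup finds the method first.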

`tests/test_litellm_compat.py` adds 25 import-surface tests pinning
every alias, stub, module-level attr, and submodule path. Tests
deliberately describe the litellm contract being shimmed (not specific
consumers) so the docs survive consumer churn.

Total impact: 813 → 817 arcllm unit tests pass; no regressions in the
existing suite.

End-to-end validation reports for the two highest-leverage agentic
frameworks that hard-code `import litellm`:

* Google ADK (google/adk-python @ main 2026-05-12, v1.33.0 + litellm
  1.83.7 baseline): 253/256 LiteLlm unit tests pass via the
  arcllm-as-litellm shim. The 3 failures are a single category —
  fixture-side `ModelResponse` construction without explicit
  `finish_reason` where litellm normalizes the field to "stop" and
  arcllm preserves `None`. Doesn't affect real provider responses.

* langchain-litellm (langchain-ai/langchain-litellm @ main 2026-05-13,
  v0.6.5 + litellm 1.83.14 baseline): 110/110 unit tests pass — full
  parity with the real-litellm baseline.
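
The `finish_reason` difference behind the three Google ADK failures can be illustrated with two toy response classes. Both are hypothetical and neither is the actual `ModelResponse` of litellm or arcllm; they only model the two behaviors the report describes for a response constructed without an explicit `finish_reason`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreservingResponse:
    """Keeps whatever the caller passed, including None."""

    finish_reason: Optional[str] = None


@dataclass
class NormalizingResponse:
    """Fills in "stop" when no finish_reason was supplied."""

    finish_reason: Optional[str] = None

    def __post_init__(self) -> None:
        if self.finish_reason is None:
            self.finish_reason = "stop"


preserved = PreservingResponse()
normalized = NormalizingResponse()
explicit = NormalizingResponse(finish_reason="length")
```

A fixture that builds a response with no `finish_reason` and then asserts on `"stop"` passes against the normalizing class and fails against the preserving one, which matches the single failure category reported.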

Across both validations: zero genuine arcllm bugs surfaced. The drop-in
claim holds.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 2220ca5.

Comment thread docs/langchain-litellm-dropin-report.md

`ruff check arcllm tests` flagged:

* `arcllm/__init__.py` — `Any` used in shim signatures without import
  (`acreate_file(..., **_: Any) -> Any` and `Router.__init__(*args: Any,
  **kwargs: Any)`). Added `from typing import Any`.
* `arcllm/types.py` — `import sys as _sys` was placed beside the
  `sys.modules` registration at the bottom of the file (E402: module-level
  import not at top). Moved to the top-of-file import block.
* `arcllm/__init__.py`, `arcllm/core.py`, `tests/test_litellm_compat.py` —
  ruff `--fix` reorganized 5 import blocks.

All 817 unit tests still pass.

CI's `Lint` job runs both `ruff check` and `ruff format --check`. The
prior fix satisfied `ruff check` but four files still triggered
reformat-on-CI: `arcllm/core.py`, `arcllm/types.py`,
`tests/integration/test_agentic_parity.py`, `tests/test_litellm_compat.py`.

Pure whitespace / quote-style normalization — no logic changes. 817
unit tests still pass.

CI's `Lint` job also runs `mypy arcllm --strict`. Two type-arg gaps:

* `arcllm/types.py:717` — ``class _AttrDict(dict):`` → ``dict[str, Any]``.
* `arcllm/__init__.py:281-284` — ``success_callback: list = []`` (and
  three siblings) → ``list[Any]``. These passive litellm-compat
  attributes intentionally accept arbitrary callback objects.

Verified locally: ``mypy --strict``, ``ruff check``, ``ruff format --check``
all clean. 817 unit tests still pass.
@vitalii-dynamiq vitalii-dynamiq merged commit 7dd82af into main May 14, 2026
15 checks passed