Feat/dynamiq parity validation#16
Merged
Merged
Conversation
…for smoke tests only
New ``tests/integration/test_agentic_parity.py`` enumerates every chat or reason model in the OpenAI / Anthropic / Gemini / Groq / xAI / Mistral / Cohere / Together AI / Fireworks AI capability tables and parametrizes four agentic-surface tests over each row: - ``test_streaming`` — non-empty streamed content chunks - ``test_tool_calling`` — tool_calls or content for a weather prompt - ``test_structured_output`` — valid JSON when response_format=json_object - ``test_reasoning_content_emitted`` — reasoning_content or thinking_blocks Each test honours the per-model capability flags and skips when the provider's API key is missing — running without keys produces 692 clean skips, which is what unit CI sees. Wired into ``.github/workflows/integration.yml`` as a new matrix entry gated on ``OPENAI_API_KEY`` (always present in CI). The job receives all provider secrets via the existing env-block, so per-row tests resolve their own keys at runtime. Registers the ``live`` pytest marker.
Three targeted README updates that reflect work already on dynamiq/main but not yet documented: - New "Migrating from litellm" section: import-path mapping table plus a one-liner swap example. Frames arcllm as a general-purpose drop-in, not dynamiq-specific. Validated counts (1148 unit + 986 integration tests pass for the dynamiq agentic framework against arcllm). - Reasoning surface: short paragraph on the unified ``Message.reasoning_content`` + ``Message.thinking_blocks`` fields (lifted from v0.4.9's cross-provider unification work). Lists every provider that populates ``reasoning_content``. - Provider count: 28 -> 30 to match ``len(arcllm.providers.base.SUPPORTED_PROVIDERS)``. No content removed; no example code changed; no API surface implied that isn't already shipped.
Additive-only shim surface that lets code which hard-codes `import litellm` resolve every symbol it touches through arcllm after a one-line import swap (or via a `sys.modules`-aliased `litellm` shim package). Validated against the unit suites of Google ADK (253/256, only fixture-side `finish_reason` normalization failures) and langchain-litellm (110/110 — full parity). Surface added: * `Choices` (alias of `Choice`), `ChatCompletionDeltaToolCall` (= `dict`), `ChatCompletionAssistantMessage` / `…UserMessage` / `…SystemMessage` / `…ToolMessage` / `…AssistantToolCall` / `…MessageToolCall` (= `_AttrDict`, a `dict` subclass with attribute access so callers that mix `obj["x"]` and `obj.x` styles both work), `Function`, `FileObject`, `OpenAIMessageContent` (= `list`), `StreamingChoices` (= `ChunkChoice`), `ModelResponseStream` (= `StreamChunk`). * `arcllm.types.utils` and `arcllm.utils` submodule paths via `sys.modules` aliases (zero duplication). * `arcllm.integrations.custom_logger.CustomLogger` no-op subclassable stub. * `arcllm.acreate_file` and `arcllm.Router` `NotImplementedError` stubs for surfaces arcllm doesn't yet implement — keeps the import path unblocked and surfaces a clear local failure on actual invocation. * `arcllm.add_function_to_prompt: bool = False` passive module attr. * `arcllm.success_callback` / `_async_success_callback` / `failure_callback` / `callbacks` passive lists. * `arcllm.drop_params: bool = False` module-level toggle — when `True` and no per-call `drop_params=` is passed, `completion()` / `acompletion()` honor it as the default (one `sys.modules` lookup per call, ~sub-microsecond cost). * `ContextWindowExceededError(BadRequestError)` exception class. * `supports_response_schema` (alias of `supports_structured_output`). `tests/test_litellm_compat.py` adds 25 import-surface tests pinning every alias, stub, module-level attr, and submodule path. Tests deliberately describe the litellm contract being shimmed (not specific consumers) so the docs survive consumer churn. Total impact: 813 → 817 arcllm unit tests pass; no regressions in the existing suite.
End-to-end validation reports for two highest-leverage agentic frameworks that hard-code `import litellm`: * Google ADK (google/adk-python @ main 2026-05-12, v1.33.0 + litellm 1.83.7 baseline): 253/256 LiteLlm unit tests pass via the arcllm-as-litellm shim. The 3 failures are a single category — fixture-side `ModelResponse` construction without explicit `finish_reason` where litellm normalizes the field to "stop" and arcllm preserves `None`. Doesn't affect real provider responses. * langchain-litellm (langchain-ai/langchain-litellm @ main 2026-05-13, v0.6.5 + litellm 1.83.14 baseline): 110/110 unit tests pass — full parity with the real-litellm baseline. Across both validations: zero genuine arcllm bugs surfaced. The drop-in claim holds.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 2220ca5. Configure here.
`ruff check arcllm tests` flagged: * `arcllm/__init__.py` — `Any` used in shim signatures without import (`acreate_file(..., **_: Any) -> Any` and `Router.__init__(*args: Any, **kwargs: Any)`). Added `from typing import Any`. * `arcllm/types.py` — `import sys as _sys` was placed beside the `sys.modules` registration at the bottom of the file (E402: module-level import not at top). Moved to the top-of-file import block. * `arcllm/__init__.py`, `arcllm/core.py`, `tests/test_litellm_compat.py` — ruff `--fix` reorganized 5 import blocks. All 817 unit tests still pass.
CI's `Lint` job runs both `ruff check` and `ruff format --check`. The prior fix satisfied `ruff check` but four files still triggered reformat-on-CI: `arcllm/core.py`, `arcllm/types.py`, `tests/integration/test_agentic_parity.py`, `tests/test_litellm_compat.py`. Pure whitespace / quote-style normalization — no logic changes. 817 unit tests still pass.
CI's `Lint` job also runs `mypy arcllm --strict`. Two type-arg gaps: * `arcllm/types.py:717` — ``class _AttrDict(dict):`` → ``dict[str, Any]``. * `arcllm/__init__.py:281-284` — ``success_callback: list = []`` (and three siblings) → ``list[Any]``. These passive litellm-compat attributes intentionally accept arbitrary callback objects. Verified locally: ``mypy --strict``, ``ruff check``, ``ruff format --check`` all clean. 817 unit tests still pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Note
Medium Risk
Medium risk because it expands the public import surface and subtly changes request-kwarg handling via a module-level
drop_paramsdefault that can affect all calls when toggled. Changes are mostly additive/compat stubs, but mis-aliasing could break downstream imports or hide unsupported-parameter errors.Overview
Improves
litellmdrop-in compatibility by exporting additional symbols fromarcllm(typed message/tool-call factories,Choices, streaming aliases,ContextWindowExceededError,supports_response_schema), adding import-path shims (arcllm.types.utils,arcllm.utils), and introducing passive stubs forCustomLogger,Router, andacreate_file.Adds global
drop_paramsbehavior:completion()/acompletion()now honor a module-levelarcllm.drop_paramstoggle (when no per-call override is provided) to silently drop unsupported kwargs.Expands validation + docs with a new
@pytest.mark.liveagentic-parity integration matrix wired into the nightly workflow, plus newtests/test_litellm_compat.pyto pin the compat surface and feasibility reports for Google ADK andlangchain-litellm.Reviewed by Cursor Bugbot for commit 8708ef5. Bugbot is set up for automated code reviews on this repo. Configure here.