diff --git a/CHANGELOG.md b/CHANGELOG.md index ab8ee2a..646ebfa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -39,6 +39,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed (Internal) +- **MCP server catch-up — `drt_doctor` tool + `drt_run_sync(compute_diff=True)` CLI parity**: the MCP server picked up two pieces that had been shipped on the CLI for months but weren't reachable from Claude / Cursor. **`drt_doctor`** mirrors the `drt doctor` CLI (v0.7.0+) and returns a structured `{"passed": bool, "checks": [{"category", "name", "ok", "message"}, ...]}` report — runtime / project / extras / env-var rows, so "why won't this drt project run?" is one tool call instead of a console screenshot. **`drt_run_sync`** now accepts `compute_diff: bool` and `diff_limit: int` arguments matching the `drt run --dry-run --diff` CLI (#413, v0.7.1+) — when set, the response carries a structured `diff` field via the existing `cli/output.diff_to_dict` serialiser, so an LLM can preview added / updated / deleted / unchanged records for queryable destinations before committing. `compute_diff=True` without `dry_run=True` returns a structured error rather than silently running against a live destination (matches the CLI contract). Module docstring updated to list the now-complete tool surface (9 tools), and `drt_get_history` — which had been registered in code but missing from the docstring since #445 — is now listed there too. README.md / README.ja.md MCP-tool tables updated. 5 new tests in `tests/unit/test_mcp.py` cover the new tool registration, the `compute_diff` requires-`dry_run` contract, the doctor happy path against a well-formed fixture project, and the doctor `passed=False` path when the project file is missing. + - **Shared object-storage serialiser extracted from S3 destination** (prep for [#169](https://github.com/drt-hub/drt/issues/169) GCS + [#170](https://github.com/drt-hub/drt/issues/170) Azure Blob): the format-handling logic (csv / json / jsonl / parquet + gzip wrapping + `.` key naming) is now `drt/destinations/_blob_serializer.py` — pure, config-agnostic, no client dependency. The S3 destination keeps its existing behaviour (locked by the 25-test suite, all green; one `unittest.mock.patch.object(S3Destination, "_serialise")` test re-pointed at the new `serialise_records` import path) and shrinks from 244 LOC to 167 LOC. GCS and Azure Blob destinations will land as thin `_client` + `put_object`-equivalent shims on top of this module instead of duplicating the format/key logic three times. 19 new tests in `tests/unit/test_blob_serializer.py` lock the module's own contract: CSV header + data rows, JSON-as-array, JSONL-per-line, `default=str` fallback for datetime/Decimal/UUID, gzip wraps text formats while parquet ignores the outer gzip flag, parquet missing-extras raises with the `drt-core[parquet]` install hint preserved, key-naming default + template overrides (with and without an explicit extension), `.gz` suffix appended for gzip-text but not gzip-parquet. No behaviour change, no public API change. - **OpenTelemetry observability scaffolding — Phase 2** ([#531](https://github.com/drt-hub/drt/issues/531), PR [#615](https://github.com/drt-hub/drt/pull/615) by [@Muawiya-contact](https://github.com/Muawiya-contact)): new `drt/observability/` package owning the lazy initialisation of the global OTLP-backed tracer and meter providers, plus a guaranteed no-op fallback so drt has **zero runtime overhead** when OTel is not configured or `[otel]` extras are not installed. Public functions `get_tracer()` and `get_meter()` return a real OTLP-backed instance when `observability.otel.endpoint` is set (or the standard `OTEL_EXPORTER_OTLP_ENDPOINT` env var is present as a fallback) and the `[otel]` extras are installed; in every other case (extras missing, no `observability` block, missing endpoint, exporter init failure) they return a `_FallbackNoOpTracer` / `_FallbackNoOpMeter` that supports the full API surface (`start_as_current_span` / `set_attribute` / `record_exception` / `set_status` / `create_counter` / `create_up_down_counter` / `create_histogram` / `create_gauge`) so Phase 3 instrumentation can sprinkle `with tracer.start_as_current_span(...)` everywhere without risk. Provider init is **lazy and idempotent** (double-checked locking via `_STATE.initialized` + `_LOCK`, zero side-effects at import time). `service.name` resource attribute defaults to `"drt"` and is overridable via `observability.otel.service_name`. OTLP headers from `observability.otel.headers` are passed through to the exporter with `${VAR}` env expansion happening at provider init (not at config load time — Phase 1 deliberately keeps headers raw on disk for safety). Endpoint normalisation maps `http://` scheme to `insecure=True` per the gRPC OTLP convention. The `_warn_once` helper guarantees init failures log a single warning, never spam. New `tests/unit/test_otel.py` ships 12 tests covering the no-op-when-otel-is-missing path, env-var fallback, full config with header expansion, exporter init failure → no-op + warning, idempotent reuse, and the defensive paths (`_parse_otlp_headers_env` `ValueError` raises × 2, `_expand_headers` non-dict / non-string raises, `_FallbackNoOpInstrument.add` / `.record`, `_FallbackNoOpMeter.create_*`). Continuation of [@Muawiya-contact](https://github.com/Muawiya-contact)'s OTel chain since [#429](https://github.com/drt-hub/drt/issues/429) / [#527](https://github.com/drt-hub/drt/pull/527) (Phase 1 config schema); Phase 3 (engine instrumentation: `drt.sync.run` / `drt.sync.extract` / `drt.sync.load` spans wrapping `engine/sync.py` execution) is the natural next step and will be filed as a separate sub-issue. diff --git a/README.ja.md b/README.ja.md index 5cf000c..93af303 100644 --- a/README.ja.md +++ b/README.ja.md @@ -185,11 +185,14 @@ drt mcp run | ツール | 機能 | |------|-------------| | `drt_list_syncs` | 同期定義の一覧を表示 | -| `drt_run_sync` | 同期を実行(`dry_run`対応) | +| `drt_run_sync` | 同期を実行(`dry_run` + `compute_diff` で `--diff` 同等) | +| `drt_run_test` | 同期後の検証テストを実行(`drt test` 相当) | | `drt_get_status` | 前回の実行結果を取得 | +| `drt_get_history` | 直近の同期実行履歴を取得 | | `drt_validate` | 同期YAML設定を検証 | | `drt_get_schema` | 設定ファイルのJSONスキーマを返す | | `drt_list_connectors` | 利用可能なソースとデスティネーションを一覧 | +| `drt_doctor` | 環境診断(`drt doctor` 相当) | --- diff --git a/README.md b/README.md index f8f5815..6c85989 100644 --- a/README.md +++ b/README.md @@ -208,14 +208,17 @@ drt mcp run **Available MCP tools:** -| Tool | What it does | -| --------------------- | --------------------------------------- | -| `drt_list_syncs` | List all sync definitions | -| `drt_run_sync` | Run a sync (supports `dry_run`) | -| `drt_get_status` | Get last run result(s) | -| `drt_validate` | Validate sync YAML configs | -| `drt_get_schema` | Return JSON Schema for config files | -| `drt_list_connectors` | List available sources and destinations | +| Tool | What it does | +| --------------------- | ------------------------------------------------------------------------------------- | +| `drt_list_syncs` | List all sync definitions | +| `drt_run_sync` | Run a sync (supports `dry_run` + `compute_diff` for `--diff` parity) | +| `drt_run_test` | Run post-sync validation tests (mirrors `drt test`) | +| `drt_get_status` | Get last run result(s) | +| `drt_get_history` | Get recent sync run history | +| `drt_validate` | Validate sync YAML configs | +| `drt_get_schema` | Return JSON Schema for config files | +| `drt_list_connectors` | List available sources and destinations | +| `drt_doctor` | Environment diagnostics (mirrors `drt doctor`) | --- diff --git a/drt/mcp/server.py b/drt/mcp/server.py index 0b88e9f..ead820e 100644 --- a/drt/mcp/server.py +++ b/drt/mcp/server.py @@ -6,12 +6,14 @@ Tools: drt_list_syncs — list all sync definitions - drt_run_sync — run a specific sync (dry_run supported) + drt_run_sync — run a specific sync (dry_run + compute_diff supported) drt_run_test — run post-sync validation tests for a sync drt_get_status — get last sync result for a sync + drt_get_history — get recent sync run history (v0.7+) drt_validate — validate all sync YAML configs (per-file errors) drt_get_schema — return JSON Schema for drt_project.yml / sync.yml drt_list_connectors — list available source and destination connectors + drt_doctor — environment diagnostics (mirrors `drt doctor` CLI) """ from __future__ import annotations @@ -68,15 +70,28 @@ def drt_list_syncs() -> list[dict[str, str]]: # ----------------------------------------------------------------------- @mcp.tool() - def drt_run_sync(sync_name: str, dry_run: bool = False) -> dict[str, Any]: + def drt_run_sync( + sync_name: str, + dry_run: bool = False, + compute_diff: bool = False, + diff_limit: int = 20, + ) -> dict[str, Any]: """Run a specific drt sync. Args: sync_name: Name of the sync to run (from drt_list_syncs). dry_run: If True, extracts data but does not write to destination. + compute_diff: When True (requires ``dry_run=True``), compute a + record-level diff (added / updated / deleted / unchanged) + against the destination. Queryable destinations get a true + diff; non-queryable destinations get a sample preview. + Mirrors ``drt run --dry-run --diff`` (v0.7.1+). + diff_limit: Cap on records per diff category (default 20). Returns: - Result summary with success count, failed count, and any errors. + Result summary with success count, failed count, errors, and + (when ``compute_diff=True``) a ``diff`` field with the + structured preview. """ from drt.cli.main import _get_destination, _get_source from drt.config.credentials import load_profile @@ -84,6 +99,12 @@ def drt_run_sync(sync_name: str, dry_run: bool = False) -> dict[str, Any]: from drt.engine.sync import run_sync from drt.state.manager import StateManager + if compute_diff and not dry_run: + return { + "error": "compute_diff requires dry_run=True (matches the " + "`drt run --dry-run --diff` CLI contract)." + } + project = load_project(_project_dir) profile = load_profile(project.profile) syncs = load_syncs(_project_dir) @@ -105,15 +126,23 @@ def drt_run_sync(sync_name: str, dry_run: bool = False) -> dict[str, Any]: _project_dir, dry_run, state_mgr, + compute_diff=compute_diff, + diff_limit=diff_limit, ) - return { + response: dict[str, Any] = { "sync_name": sync_name, "dry_run": dry_run, "success": result.success, "failed": result.failed, "errors": result.errors[:10], # cap at 10 to avoid huge payloads } + diff_value = getattr(result, "diff", None) + if compute_diff and diff_value is not None: + from drt.cli.output import diff_to_dict + + response["diff"] = diff_to_dict(diff_value) + return response # ----------------------------------------------------------------------- # drt_run_test @@ -361,6 +390,81 @@ def drt_list_connectors() -> dict[str, list[dict[str, str]]]: ], } + # ----------------------------------------------------------------------- + # drt_doctor + # ----------------------------------------------------------------------- + + @mcp.tool() + def drt_doctor() -> dict[str, Any]: + """Run environment diagnostics — the MCP equivalent of ``drt doctor``. + + Mirrors the CLI ``drt doctor`` (v0.7.0+) but returns a structured + report instead of a console table. Useful for "why won't this drt + project run?" before reading any code — catches missing env vars, + malformed profile, uninstalled extras, etc. + + Returns: + ``{"passed": bool, "checks": [{"category", "name", "ok", + "message"}, ...]}`` where ``passed`` is False if any required + check failed (project file / profile / Python version). + """ + from drt import __version__ as drt_version + from drt.cli.doctor import ( + _check_env_vars, + _check_extras, + _check_profile, + _check_project_file, + _check_python, + _check_syncs, + ) + + checks: list[dict[str, Any]] = [] + required_ok = True + + py_ok, py_msg = _check_python() + checks.append( + {"category": "runtime", "name": "Python version", "ok": py_ok, "message": py_msg} + ) + required_ok = required_ok and py_ok + + checks.append( + { + "category": "runtime", + "name": "drt version", + "ok": True, + "message": drt_version, + } + ) + + proj_ok, proj_msg, project_data = _check_project_file() + checks.append( + {"category": "project", "name": "Project file", "ok": proj_ok, "message": proj_msg} + ) + required_ok = required_ok and proj_ok + + if project_data: + prof_ok, prof_msg = _check_profile(project_data) + checks.append( + {"category": "project", "name": "Profile", "ok": prof_ok, "message": prof_msg} + ) + required_ok = required_ok and prof_ok + + _, syncs_ok, syncs_msg = _check_syncs(project_data) + checks.append( + {"category": "project", "name": "Syncs", "ok": syncs_ok, "message": syncs_msg} + ) + + for label, ok, msg in _check_extras(): + # Extras are optional — they affect ``ok`` of the row but not + # overall ``passed``. A user can run a duckdb-only project with + # no other extras installed and that's fine. + checks.append({"category": "extras", "name": label, "ok": ok, "message": msg}) + + for var, ok, msg in _check_env_vars(project_data): + checks.append({"category": "env", "name": var, "ok": ok, "message": msg}) + + return {"passed": required_ok, "checks": checks} + return mcp diff --git a/tests/unit/test_mcp.py b/tests/unit/test_mcp.py index 822a855..35fda47 100644 --- a/tests/unit/test_mcp.py +++ b/tests/unit/test_mcp.py @@ -224,3 +224,195 @@ async def test_get_schema_project(server: FastMCP) -> None: schema = await call(server, "drt_get_schema", schema_type="project") assert isinstance(schema, dict) assert "$defs" in schema or "properties" in schema + + +# --------------------------------------------------------------------------- +# drt_run_sync — compute_diff parameter (#413 parity) +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_run_sync_returns_error_for_unknown_sync( + project_dir: Path, monkeypatch: Any +) -> None: + """Unknown ``sync_name`` returns a structured error (no engine call). + + Bypasses ``load_profile`` (which would otherwise try to read the + real ``~/.drt/profiles.yml`` on the developer's machine) — the + sync-name match happens after profile loading in the flow. + """ + monkeypatch.setattr("drt.config.credentials.load_profile", lambda _name: object()) + srv = create_server(project_dir) + result = await call(srv, "drt_run_sync", sync_name="nonexistent") + assert "error" in result + assert "nonexistent" in result["error"] + + +@pytest.mark.asyncio +async def test_run_sync_compute_diff_requires_dry_run(server: FastMCP) -> None: + """``compute_diff=True`` without ``dry_run=True`` is a contract + violation — matches the CLI ``drt run --diff`` requiring + ``--dry-run``. Returns a structured error rather than executing + the sync against a live destination. + """ + result = await call( + server, "drt_run_sync", sync_name="notify", compute_diff=True, dry_run=False + ) + assert "error" in result + assert "dry_run" in result["error"] + + +@pytest.mark.asyncio +async def test_run_sync_compute_diff_threads_diff_into_response( + project_dir: Path, monkeypatch: Any +) -> None: + """``compute_diff=True`` + ``dry_run=True`` → response carries a + ``diff`` field built from ``diff_to_dict``. This is the success + path that exercises the load_project / run_sync / response-with-diff + branch — which the error-path tests can't reach. + + Patches the engine + source/destination factory functions at + their source modules so the inside-function imports resolve to + the test doubles, avoiding a real warehouse / HTTP destination. + """ + from drt.engine.sync import SyncResult + + fake_diff = object() # diff_to_dict tolerates None / unknown shapes + + def fake_run_sync(*_args: Any, **_kwargs: Any) -> SyncResult: + result = SyncResult() + result.success = 1 + result.failed = 0 + result.diff = fake_diff # type: ignore[attr-defined] + return result + + def fake_diff_to_dict(_diff: object) -> dict[str, Any]: + return {"added": [{"id": 1}], "updated": [], "deleted": [], "unchanged": []} + + # Patch the engine + factory layers at their source modules so the + # inside-function imports inside `drt_run_sync` pick up the doubles. + monkeypatch.setattr("drt.engine.sync.run_sync", fake_run_sync) + monkeypatch.setattr("drt.cli.main._get_source", lambda _profile: object()) + monkeypatch.setattr("drt.cli.main._get_destination", lambda _sync: object()) + monkeypatch.setattr("drt.config.credentials.load_profile", lambda _name: object()) + monkeypatch.setattr("drt.cli.output.diff_to_dict", fake_diff_to_dict) + + srv = create_server(project_dir) + result = await call( + srv, "drt_run_sync", sync_name="notify", dry_run=True, compute_diff=True + ) + + assert "diff" in result + assert result["diff"] == { + "added": [{"id": 1}], + "updated": [], + "deleted": [], + "unchanged": [], + } + assert result["dry_run"] is True + assert result["success"] == 1 + + +@pytest.mark.asyncio +async def test_run_sync_dry_run_without_compute_diff_omits_diff_field( + project_dir: Path, monkeypatch: Any +) -> None: + """``compute_diff=False`` → response has no ``diff`` field even + when ``dry_run=True``. Exercises the response-building path + without the diff serialisation branch.""" + from drt.engine.sync import SyncResult + + def fake_run_sync(*_args: Any, **_kwargs: Any) -> SyncResult: + result = SyncResult() + result.success = 1 + return result + + monkeypatch.setattr("drt.engine.sync.run_sync", fake_run_sync) + monkeypatch.setattr("drt.cli.main._get_source", lambda _profile: object()) + monkeypatch.setattr("drt.cli.main._get_destination", lambda _sync: object()) + monkeypatch.setattr("drt.config.credentials.load_profile", lambda _name: object()) + + srv = create_server(project_dir) + result = await call(srv, "drt_run_sync", sync_name="notify", dry_run=True) + + assert "diff" not in result + assert result["dry_run"] is True + assert result["success"] == 1 + + +# --------------------------------------------------------------------------- +# drt_doctor — environment diagnostics (mirrors `drt doctor` CLI) +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_doctor_returns_structured_report( + project_dir: Path, monkeypatch: Any +) -> None: + """``drt_doctor`` returns ``{passed, checks}`` with at minimum + Python version + drt version + project file rows. + """ + # `_check_*` helpers in drt/cli/doctor.py read from CWD; cd into + # the temp project so project_file and profile checks see real + # files instead of whatever the test runner's CWD is. + monkeypatch.chdir(project_dir) + srv = create_server(project_dir) + + result = await call(srv, "drt_doctor") + assert "passed" in result + assert "checks" in result + assert isinstance(result["checks"], list) + # At minimum: Python version, drt version, project file + names = {c["name"] for c in result["checks"]} + assert "Python version" in names + assert "drt version" in names + assert "Project file" in names + # Each check has the documented shape + for check in result["checks"]: + assert set(check.keys()) >= {"category", "name", "ok", "message"} + + +@pytest.mark.asyncio +async def test_doctor_passes_on_well_formed_project( + project_dir: Path, monkeypatch: Any +) -> None: + """On a well-formed project (project file + profile file + syncs/), + ``passed`` is True. The fixture creates exactly this shape, so any + regression that breaks the happy path surfaces here.""" + monkeypatch.chdir(project_dir) + + # Profile fixture: ~/.drt/profiles.yml gets read by _check_profile. + # The fixture project references profile "default"; provide a + # minimal profiles.yml under a fake HOME to keep the test + # self-contained and avoid touching the developer's real + # ~/.drt/profiles.yml. + fake_home = project_dir / "fake_home" + (fake_home / ".drt").mkdir(parents=True) + (fake_home / ".drt" / "profiles.yml").write_text("default: { type: duckdb }\n") + monkeypatch.setenv("HOME", str(fake_home)) + + srv = create_server(project_dir) + result = await call(srv, "drt_doctor") + assert result["passed"] is True + + +@pytest.mark.asyncio +async def test_doctor_fails_without_project_file(tmp_path: Path, monkeypatch: Any) -> None: + """Outside a drt project, ``passed`` is False — the project-file + check is required for a green report.""" + srv = create_server(tmp_path) + monkeypatch.chdir(tmp_path) # empty dir, no drt_project.yml + result = await call(srv, "drt_doctor") + assert result["passed"] is False + project_file_row = next(c for c in result["checks"] if c["name"] == "Project file") + assert project_file_row["ok"] is False + + +@pytest.mark.asyncio +async def test_server_lists_drt_doctor_tool() -> None: + """The newly added `drt_doctor` is registered alongside the + existing tools.""" + srv = create_server() + tools = await srv._local_provider._list_tools() + tool_names = {t.name for t in tools} + assert "drt_doctor" in tool_names