diff --git a/README.md b/README.md index e0d03f4..120b79c 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,7 @@ The toolkit bundles the following capabilities as a single **mc-agent-toolkit** | Feature | Description | Details | |---|---|---| +| **Asset Health** | Checks the health of a data table — surfaces last activity, alerts, monitoring coverage, importance, and upstream dependency health. | [README](skills/asset-health/README.md) | | **Monitor Creation** | Guides AI agents through creating monitors correctly — validates tables, fields, and parameters before generating monitors-as-code YAML. | [README](skills/monitor-creation/README.md) | | **Prevent** | Surfaces lineage, alerts, and blast radius before code changes. Generates monitors-as-code and targeted validation queries to prevent data incidents. | [README](skills/prevent/README.md) | | **Generate Validation Notebook** | Generates SQL validation notebooks for dbt model changes, with targeted queries comparing baseline and development data. 
| [README](skills/generate-validation-notebook/README.md) | diff --git a/plugins/claude-code/.claude-plugin/plugin.json b/plugins/claude-code/.claude-plugin/plugin.json index 77ff67a..25c3730 100644 --- a/plugins/claude-code/.claude-plugin/plugin.json +++ b/plugins/claude-code/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "mc-agent-toolkit", - "version": "1.0.1", + "version": "1.1.0", "description": "Monte Carlo Agent Toolkit — data observability skills and enforcement hooks for AI coding agents.", "author": { "name": "Monte Carlo", diff --git a/plugins/claude-code/skills/asset-health b/plugins/claude-code/skills/asset-health new file mode 120000 index 0000000..11fb45c --- /dev/null +++ b/plugins/claude-code/skills/asset-health @@ -0,0 +1 @@ +../../../skills/asset-health \ No newline at end of file diff --git a/plugins/codex/.codex-plugin/plugin.json b/plugins/codex/.codex-plugin/plugin.json index b50165b..76df567 100644 --- a/plugins/codex/.codex-plugin/plugin.json +++ b/plugins/codex/.codex-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "mc-agent-toolkit", - "version": "1.0.1", + "version": "1.1.0", "description": "Monte Carlo Agent Toolkit — data observability skills and enforcement hooks for AI coding agents.", "author": { "name": "Monte Carlo", diff --git a/plugins/codex/skills/asset-health b/plugins/codex/skills/asset-health new file mode 120000 index 0000000..11fb45c --- /dev/null +++ b/plugins/codex/skills/asset-health @@ -0,0 +1 @@ +../../../skills/asset-health \ No newline at end of file diff --git a/plugins/copilot/plugin.json b/plugins/copilot/plugin.json index 5d21e7d..cf141ab 100644 --- a/plugins/copilot/plugin.json +++ b/plugins/copilot/plugin.json @@ -1,6 +1,6 @@ { "name": "mc-agent-toolkit", - "version": "1.0.1", + "version": "1.1.0", "description": "Monte Carlo Agent Toolkit — data observability for AI coding agents.", "author": { "name": "Monte Carlo", diff --git a/plugins/copilot/skills/asset-health b/plugins/copilot/skills/asset-health new 
file mode 120000 index 0000000..11fb45c --- /dev/null +++ b/plugins/copilot/skills/asset-health @@ -0,0 +1 @@ +../../../skills/asset-health \ No newline at end of file diff --git a/plugins/cursor/.cursor-plugin/plugin.json b/plugins/cursor/.cursor-plugin/plugin.json index 9dea848..91cbdfe 100644 --- a/plugins/cursor/.cursor-plugin/plugin.json +++ b/plugins/cursor/.cursor-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "mc-agent-toolkit", - "version": "1.0.1", + "version": "1.1.0", "description": "Monte Carlo Agent Toolkit — data observability skills and enforcement hooks for AI coding agents.", "author": { "name": "Monte Carlo", diff --git a/plugins/cursor/skills/asset-health b/plugins/cursor/skills/asset-health new file mode 120000 index 0000000..11fb45c --- /dev/null +++ b/plugins/cursor/skills/asset-health @@ -0,0 +1 @@ +../../../skills/asset-health \ No newline at end of file diff --git a/plugins/opencode/skills/asset-health b/plugins/opencode/skills/asset-health new file mode 120000 index 0000000..11fb45c --- /dev/null +++ b/plugins/opencode/skills/asset-health @@ -0,0 +1 @@ +../../../skills/asset-health \ No newline at end of file diff --git a/skills/README.md b/skills/README.md index 1076290..b7d7570 100644 --- a/skills/README.md +++ b/skills/README.md @@ -6,6 +6,7 @@ Skills are platform-agnostic instruction sets that tell an AI coding agent what | Skill | Description | |---|---| +| **[Asset Health](asset-health/)** | Checks the health of a data table — surfaces last activity, alerts, monitoring coverage, importance, and upstream dependency health from Monte Carlo. | | **[Monitor Creation](monitor-creation/)** | Guides AI agents through creating monitors correctly — validates tables, fields, and parameters before generating monitors-as-code YAML. | | **[Prevent](prevent/)** | Surfaces Monte Carlo context (lineage, alerts, blast radius) before code changes, generates monitors-as-code, and produces targeted validation queries. 
| | **[Generate Validation Notebook](generate-validation-notebook/)** | Generates SQL validation notebooks for dbt model changes, with targeted queries comparing baseline and development data. | diff --git a/skills/asset-health/README.md b/skills/asset-health/README.md new file mode 100644 index 0000000..14fc7f9 --- /dev/null +++ b/skills/asset-health/README.md @@ -0,0 +1,48 @@ +# Monte Carlo Asset Health Skill + +Check the health of a data table using Monte Carlo — surfaces last activity, active alerts, monitoring coverage, importance, tags, and upstream dependency health in a single structured report. + +## Editor & Stack Compatibility + +The skill works with any AI editor that supports MCP and the Agent Skills format — including Claude Code, Cursor, and VS Code. + +All warehouses supported by Monte Carlo work with this skill. + +## Prerequisites + +- Claude Code, Cursor, VS Code or any editor with MCP support +- Monte Carlo account with Viewer role or above + +## Setup + +### Via the mc-agent-toolkit plugin (recommended) + +Install the plugin for your editor — it bundles the skill, MCP server, and permissions automatically. See the [main README](../../README.md#installing-the-plugin-recommended) for editor-specific instructions. + +### Standalone + +1. Configure the Monte Carlo MCP server: + ``` + claude mcp add --transport http monte-carlo-mcp https://integrations.getmontecarlo.com/mcp + ``` + +2. Install the skill: + ```bash + npx skills add monte-carlo-data/mc-agent-toolkit --skill asset-health + ``` + + Or copy directly: + ```bash + cp -r skills/asset-health ~/.claude/skills/asset-health + ``` + +## Usage + +Ask about the health or status of any table: + +- "How is table orders_status doing?" +- "Check health of dim_customers" +- "What's the status of raw_events?" +- "Check on volume_change table" + +The skill will produce a structured health report with metrics, active alerts, monitor status, and upstream dependency health. 
diff --git a/skills/asset-health/SKILL.md b/skills/asset-health/SKILL.md new file mode 100644 index 0000000..984e55b --- /dev/null +++ b/skills/asset-health/SKILL.md @@ -0,0 +1,172 @@ +--- +name: monte-carlo-asset-health +description: | + Check the health of a data table/asset using Monte Carlo. Use when the user + asks "how is table X", "check health of X", "is X healthy", "status of X", + "check on X table", or any question about the health, status, or reliability + of a data asset. This is NOT the explore-table skill — use this skill for + health checks, not profiling. +version: 1.0.0 +--- + +# Monte Carlo Asset Health Skill + +This skill checks the health of a data asset using Monte Carlo's observability +platform. It produces a structured health report covering freshness, alerts, +monitoring coverage, importance, and upstream dependency health. + +## REQUIRED: Read reference files before executing + +**You MUST read both reference files using the Read tool before making any MCP +tool calls.** These files are the source of truth for tool calls, parameters, +and response interpretation. This file only defines when to activate and how to +format the output. + +1. `references/workflows.md` (relative to this file) — exact tool calls, phases, and execution order +2. `references/parameters.md` (relative to this file) — parameter conventions and field details + +**Do NOT make any MCP tool calls until you have read both files.** + +## When to activate this skill + +Activate when the user: + +- Asks about health: "how is table X doing?", "check health of X", "is X healthy?" 
+- Asks about status: "what's the status of X?", "status of orders table" +- Asks to check on a table: "check on X table", "check on X" +- Asks about reliability, freshness, or quality of a specific asset +- References a table in context of incident triage or change planning + +## When NOT to activate this skill + +- **Profiling or exploring table data** (row counts, column stats, distributions) → use `explore-table` +- **Creating or suggesting monitors** → use `monitor-creation` +- **Active incident triage** (investigating root cause of a firing alert) → use prevent skill Workflow 3 + +## Health report format + +**CRITICAL: Only report data returned by the tools defined in `references/workflows.md`. +Do NOT call additional tools, do NOT infer or fabricate metrics. Each row below +specifies exactly which tool provides its value.** + +**All sections (Active Alerts, Monitors, Upstream Issues, Recommendations) must +always appear with their heading.** Never omit a section — if there is no data, +show the empty-state text defined below. + +**Never use emoji shortcodes** (like `:warning:` or `:arrow_up:`). Use Unicode +emoji characters directly (like ⚠️) or plain text. Shortcodes render as raw text +in the terminal. + +**Always display URLs as bare URLs**, never as markdown links (e.g., `[text](url)`). + +**`{MC_WEBAPP_URL}` appears throughout this template.** Every occurrence must be +replaced with the actual value returned by calling `get_mc_webapp_url()`. Never +hardcode or guess this URL — it varies by environment. 
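The `{MC_WEBAPP_URL}` substitution rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `fill_webapp_url` is hypothetical, and the base URL is assumed to have already been obtained from `get_mc_webapp_url()`.

```python
def fill_webapp_url(template: str, mc_webapp_url: str) -> str:
    """Replace every {MC_WEBAPP_URL} placeholder with the resolved base URL.

    Hypothetical helper: `mc_webapp_url` is assumed to be the value returned
    by get_mc_webapp_url(); it is never guessed or hardcoded here.
    """
    if not mc_webapp_url:
        raise ValueError("resolve the base URL via get_mc_webapp_url() first")
    # Strip a trailing slash so "{MC_WEBAPP_URL}/alerts/..." has no double slash.
    return template.replace("{MC_WEBAPP_URL}", mc_webapp_url.rstrip("/"))
```

Failing loudly on an empty value mirrors the rule that the URL must come from the tool rather than from a default.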
+ +Present results in this structure: + +``` +## Health Check: + +**Tags:** `tag1:value1`, `tag2:value2` (or "None" if no tags) +**Link:** {MC_WEBAPP_URL}/assets/{mcon} +**Warehouse:** snowflake-prod (Snowflake) +**Status: 🟢 Healthy / 🟡 Degraded / 🔴 Unhealthy** | **Importance:** 0.85 (key asset ⭐️) +**Avg Reads/Day:** ~538 | **Avg Writes/Day:** ~12 + +| Metric | Value | Signal | +|---------------|--------------------------------|--------| +| Last Activity | Apr 6, 2025 | 🟢 Recent | +| Alerts | 2 active | 🔴 Has alerts | +| Monitoring | 3 active monitors | 🟢 Monitored | +| Upstream | 1/3 sources unhealthy | 🔴 Issues | + +### Active Alerts + +| Date | Type | Priority | Status | Link | +|-------|----------------|----------|------------------|---------------------------------------------------------| +| Apr 8 | Metric anomaly | P3 | Not acknowledged | {MC_WEBAPP_URL}/alerts/{alert_uuid} | +| Apr 7 | Freshness | P2 | Acknowledged | {MC_WEBAPP_URL}/alerts/{alert_uuid} | + +If there are more than 5 active alerts, display only 5. Do NOT put the overflow +message inside the table as a row. Instead, put it as plain text on the line +immediately after the table: + +There are N more alerts not shown for brevity + +If there are zero active alerts, show: +No active alerts in the last 7 days. + +### Monitors + +| Type | Name | Incidents (7d) | Status | +|-------------|-----------------------------------------|----------------|---------------------| +| TABLE | Orders freshness and schema | 3 | Running hourly | +| METRIC | Revenue row count | 0 | Never executed | +| BULK_METRIC | Warehouse volume check | 21 | ⚠️ 1 table has errors | + +If there are zero monitors, show: +No monitors configured for this table. + +### Upstream Issues +- raw_orders — FRESHNESS alert: not updated in 8h +- raw_payments — healthy +- dim_customers — healthy + +> Want me to check further upstream for **raw_orders**? + +If there are no upstream dependencies, show: +No upstream dependencies found. 
+ +### Recommendations +- Investigate upstream raw_orders freshness — likely root cause of this table's staleness +- Acknowledge or investigate the 2 active alerts + +If there are no recommendations, show: +No recommendations — table looks healthy. + +``` + +### Metric definitions — exact data sources + +Each metric row MUST use only the specified data source. Do not add, infer, or +embellish values beyond what the tool returns. + +| Metric | Data source | What to show | Signal | +|--------|------------|-------------|--------| +| **Last Activity** | `getTable` → `last_activity` | Date of last activity (e.g., "Apr 6, 2025") | 🟢 Recent (within 7 days) / 🟡 Stale (older than 7 days) | +| **Alerts** | `getAlerts` → count | "N active" or "No active alerts" | 🔴 Has alerts / 🟢 No alerts | +| **Monitoring** | `getMonitors` → count where `is_paused` is false | "N active monitors" or "0 active monitors (M paused)". Include relevant details from monitor fields (incident counts, error counts, types). | 🟢 Monitored (≥1 active) / 🔴 Unmonitored (0 active) | +| **Upstream** | `getAssetLineage` (upstream) + Phase 3 checks | "N/M sources unhealthy" or "All N sources healthy" | 🔴 Issues (any unhealthy) / 🟢 Healthy (all healthy) | + +**Importance** is shown next to the Status line (not in the metrics table). Source: +`getTable` → `importance_score` + `is_important`. Show "X.XX (key asset ⭐️)" if +key asset or importance > 0.8, otherwise just "X.XX". + +**Avg Reads/Day** and **Avg Writes/Day** are shown below the Status line. Source: +`getTable` → `table_stats.avg_reads_per_active_day` and `table_stats.avg_writes_per_active_day`. + +**Do NOT include downstream data.** This skill only queries upstream lineage. 
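The Importance and Last Activity rules above can be sketched as two small helpers. This is an illustrative Python sketch, not part of the skill: the function names are hypothetical, and treating exactly 7 days as still "Recent" is an assumption, since the table does not state how the boundary is handled.

```python
from datetime import datetime, timedelta


def importance_label(importance_score: float, is_important: bool) -> str:
    """Format the Importance value shown next to the Status line."""
    label = f"{importance_score:.2f}"
    # Key asset flag or a score above 0.8 earns the key-asset marker.
    if is_important or importance_score > 0.8:
        label += " (key asset ⭐️)"
    return label


def last_activity_signal(last_activity: datetime, now: datetime) -> str:
    """Return the Signal cell for the Last Activity row.

    Assumption: an age of exactly 7 days still counts as Recent.
    """
    if now - last_activity <= timedelta(days=7):
        return "🟢 Recent"
    return "🟡 Stale"
```

Both inputs map directly to the `getTable` fields named above (`importance_score`, `is_important`, `last_activity`); parsing the timestamp into a `datetime` is left to the caller.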
+ +### Status determination + +- **🔴 Unhealthy:** Any non-resolved alerts on the asset (from `getAlerts` with statuses `[null, "ACKNOWLEDGED", "WORK_IN_PROGRESS"]` — see `parameters.md`) +- **🟡 Degraded:** No active alerts, but 0 active monitors on a high-importance + asset (importance > 0.8 or key asset) +- **🟢 Healthy:** No active alerts and has at least 1 active monitor + +### Tags + +Display tags from the `search` tool's `properties` field. Show as inline badges: +`key:value`. If no tags exist, show "None". Always include the Tags line. + +### Warehouse + +Display the warehouse name and type from the `search` result. Always include this line. + +### Recommendations + +Only include recommendations derivable from collected data: +- Upstream health issues that may be root causes +- Active alerts that need acknowledgment or investigation +- Do NOT recommend specific monitor types — that is outside this skill's scope diff --git a/skills/asset-health/references/parameters.md b/skills/asset-health/references/parameters.md new file mode 100644 index 0000000..1cc9322 --- /dev/null +++ b/skills/asset-health/references/parameters.md @@ -0,0 +1,117 @@ +# MCP Parameter Notes + +Parameter details for the MCP tools used by the asset-health skill. Only covers +the tools relevant to this skill's workflows. + +--- + +## `getAlerts` — use snake_case parameters + +``` +created_after +created_before +order_by +table_mcons +statuses +``` + +Always provide `created_after` and `created_before`. Max window is 60 days. +Use `getCurrentTime()` to get the current ISO timestamp when needed. + +Filter to non-resolved alerts for health checks: +``` +statuses: ["ACKNOWLEDGED", "WORK_IN_PROGRESS", null] +``` +**Important:** `null` represents unacknowledged alerts. Do NOT pass +`"NOT_ACKNOWLEDGED"` as a string — it is not a valid API value. Use `null`. 
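As a rough illustration of the `getAlerts` conventions above, the following Python sketch builds the argument set for a health check. The helper name `build_alert_args` is hypothetical, and the exact timestamp format the tool accepts is an assumption (ISO 8601 is used because `getCurrentTime()` is described as returning it).

```python
from datetime import datetime, timedelta, timezone


def build_alert_args(table_mcon: str, now: datetime, window_days: int = 7) -> dict:
    """Build snake_case getAlerts arguments for non-resolved alerts.

    Hypothetical helper; `now` is assumed to come from getCurrentTime()
    or the system clock.
    """
    if not 0 < window_days <= 60:
        raise ValueError("getAlerts accepts a window of at most 60 days")
    return {
        "created_after": (now - timedelta(days=window_days)).isoformat(),
        "created_before": now.isoformat(),
        "table_mcons": [table_mcon],
        # None serializes to JSON null, which stands for unacknowledged alerts.
        # The string "NOT_ACKNOWLEDGED" is deliberately not used.
        "statuses": ["ACKNOWLEDGED", "WORK_IN_PROGRESS", None],
    }
```

Keeping the window check in the helper makes the 60-day limit fail early instead of surfacing as a tool error.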
+ +Response field mapping for the alert table: +- **Date** → `createdTime` +- **Type** → `alert_types` (array, e.g., "Volume", "Metric anomaly", "Freshness") +- **Priority** → `priority` (e.g., "P1", "P2", "P3") +- **Status** → `status` (e.g., "Not acknowledged", "Acknowledged", "Work in progress") +- **Link** → construct as `/alerts/` where `MC_WEBAPP_URL` + comes from `get_mc_webapp_url()` (called in Phase 1). Display as bare URL. + +--- + +## `search` — finding the right table identifier + +MC uses MCONs (Monte Carlo Object Names) as table identifiers. Always use +`search` first to resolve a table name to its MCON before calling `getTable`, +`getAssetLineage`, or `getAlerts`. + +``` +search(query="orders_status") → returns mcon, full_table_id, warehouse, properties +``` + +The `properties` field contains tags (key-value pairs) associated with the asset. + +--- + +## `getTable` — table metadata and stats + +Pass the MCON as: `mcon=""` (single string, not an array). + +Key response fields used by this skill: +- `last_activity` — timestamp of last activity (for Last Activity metric) +- `importance_score` — float 0-1 (for Importance in header) +- `is_important` — boolean, true if key asset (for ⭐️ indicator) +- `table_stats.avg_reads_per_active_day` — average reads per active day +- `table_stats.avg_writes_per_active_day` — average writes per active day + +--- + +## `getMonitors` — checking if monitors are paused + +When filtering by table, pass MCONs via the `mcons` parameter (not `table_mcons`). +Check the `is_paused` field (boolean) on each monitor. Only count monitors where +`is_paused` is false as active coverage. 
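The active-coverage rule above can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; it assumes each monitor is a dict carrying the `is_paused` boolean, and it treats a missing `is_paused` as not paused, which is an assumption rather than documented behavior.

```python
def monitoring_summary(monitors: list[dict]) -> tuple[str, str]:
    """Return (value, signal) for the Monitoring row of the metrics table."""
    # Only non-paused monitors count as active coverage.
    # Assumption: a monitor without an is_paused field is treated as active.
    active = sum(1 for m in monitors if not m.get("is_paused", False))
    paused = len(monitors) - active
    if active >= 1:
        return f"{active} active monitors", "🟢 Monitored"
    return f"0 active monitors ({paused} paused)", "🔴 Unmonitored"
```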
+ +Response field mapping for the monitors table: +- **Type** → `monitor_type` (e.g., "TABLE", "METRIC", "BULK_METRIC") +- **Name** → `name` or `description` +- **Incidents (7d)** → `seven_days_incident_count` +- **Status** → derive from: `is_paused`, `next_execution_time`, `prev_execution_time`, + `seven_days_error_count`, `seven_days_timeout_count` + - If `is_paused` is true → "Paused" + - If `prev_execution_time` is null → "Never executed" + - If `seven_days_error_count` > 0 → "⚠️ N errors" + - Otherwise → "Running" (include schedule info from `next_execution_time` if available) + +--- + +## `get_mc_webapp_url` — get Monte Carlo base URL + +Takes no arguments. Returns the regionalized base URL of the Monte Carlo web app +(e.g., `https://getmontecarlo.com` — the actual value depends on the customer's +environment). Call once in Phase 1 and store the result. Use it to construct all +Monte Carlo links — never hardcode the base URL: +- Assets/tables: `{result}/assets/{mcon}` +- Alerts: `{result}/alerts/{alert_uuid}` + +--- + +## `getCurrentTime` — get current timestamp + +Takes no arguments. Returns an ISO 8601 timestamp. Use this to compute +`created_after` and `created_before` for `getAlerts`. + +--- + +## `getAssetLineage` — direction and edge interpretation + +Pass `direction` as `"UPSTREAM"` or `"DOWNSTREAM"` (uppercase). +Pass `mcons` as an array even for a single asset: `mcons=[""]`. + +Returns paginated edges (default 100 per page) where `source` and `target` are +MCONs representing data flow direction: `source` feeds data into `target`. + +If `has_more` is true in the response, follow pagination using `next_offset` to +get remaining edges. For upstream health checks, all parents must be discovered +before Phase 3 can run — do not skip pages. 
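The pagination rule above amounts to a simple loop. The Python sketch below is illustrative only: `fetch_page` is a hypothetical callable standing in for a `getAssetLineage` call at a given offset, and both the `edges` response key and the starting offset of 0 are assumptions. Only `has_more` and `next_offset` come from the notes above.

```python
from typing import Callable


def collect_edges(fetch_page: Callable[[int], dict]) -> list[dict]:
    """Follow has_more / next_offset until every lineage edge is collected."""
    edges: list[dict] = []
    offset = 0  # assumption: the first page starts at offset 0
    while True:
        page = fetch_page(offset)
        edges.extend(page.get("edges", []))  # "edges" key is an assumption
        if not page.get("has_more"):
            return edges
        offset = page["next_offset"]
```

Collecting every page before moving on ensures no upstream parent is skipped when Phase 3 runs.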
+ +For an **UPSTREAM** query on asset X: +- Edges have `source = `, `target = X` (or intermediate nodes) +- Extract unique MCONs from the `source` field to get the upstream parents +- Exclude the queried asset's own MCON from the parent list diff --git a/skills/asset-health/references/workflows.md b/skills/asset-health/references/workflows.md new file mode 100644 index 0000000..11c7ce8 --- /dev/null +++ b/skills/asset-health/references/workflows.md @@ -0,0 +1,131 @@ +# Workflow Details + +Detailed step-by-step instructions for the Monte Carlo Asset Health skill. +Referenced from the main SKILL.md — consult when executing the workflow. + +--- + +## Asset Health Check + +When the user asks about the health or status of a data asset, run this sequence. + +### Phase 1 — Resolve the asset + +Run both calls in parallel: + +``` +search(query="") +→ Returns MCON, full_table_id, and properties (tags) + +get_mc_webapp_url() +→ Returns the base Monte Carlo webapp URL (MC_WEBAPP_URL) +``` + +Save the webapp URL for constructing alert links later. + +Save the MCON for subsequent calls. Save properties for the Tags line in the +report. If multiple results are returned, present them in a table with these +exact columns and ask which one they want to check. Do not pick one automatically +or make assumptions. + +``` +| # | Table (full_table_id) | Warehouse | Importance | Key Asset | +|---|----------------------|-----------|------------|-----------| +| 1 | db:schema.table | my-wh | 0.99 | Yes | +``` + +Every row must include the Warehouse column. 
+ +### Phase 2 — Gather health metrics (ALL in parallel) + +Run all 4 calls in a single turn: + +``` +getTable(mcon="") +→ last activity, importance score, is_important (key asset flag), avg reads/writes per active day + +getAlerts(created_after="<7 days ago>", created_before="", table_mcons=[""], statuses=["ACKNOWLEDGED", "WORK_IN_PROGRESS", null]) +→ active alerts on this asset + +getMonitors(mcons=[""]) +→ monitor configs — check the `is_paused` field for paused vs active + +getAssetLineage(mcons=[""], direction="UPSTREAM") +→ 1-hop upstream parent assets +``` + +If you need the current timestamp for `getAlerts`, call `getCurrentTime()` in +Phase 1 or before the Phase 2 calls, or compute it from system time if available. + +### Phase 3 — Check upstream health (ALL parents in parallel) + +Check at most **10** upstream parents. If there are more than 10, check the first +10 and note: "N more upstream parents not checked — ask to see more." + +For each upstream parent, run both calls in parallel: + +``` +getTable(mcon="") +→ last activity, importance + +getAlerts(created_after="<7 days ago>", created_before="", table_mcons=[""], statuses=["ACKNOWLEDGED", "WORK_IN_PROGRESS", null]) +→ active alerts on this parent +``` + +All parents are checked in parallel with each other. Each parent's `getTable` and +`getAlerts` calls also run in parallel (no dependency between them). + +### Phase 4 — Synthesize the health report + +Assemble findings into the report format defined in SKILL.md: + +1. **Tags** — from `search` properties. Show "None" if there are no tags. +2. **Status** — determine from alerts and monitoring: + - 🔴 if any alerts returned (the statuses filter already limits to non-resolved) + - 🟡 if no alerts but 0 active monitors on a high-importance asset + - 🟢 otherwise +4. **Metrics table** — last activity, alerts, monitoring, upstream +5. **Active Alerts** — list each with date, type, priority, status, and link +6.
**Upstream Issues** — list each parent with health status + - If any parent is unhealthy, ask: "Want me to check further upstream for **\**?" +7. **Recommendations** — only facts derivable from data: + - Upstream issues that may explain this asset's problems + - Alerts needing attention + +### Monitoring assessment + +When evaluating monitors from `getMonitors`: + +- Count only **active** (non-paused) monitors +- A paused monitor does NOT count as active coverage +- Report: "N active monitors" or "0 active monitors (M paused)" +- Signal: ≥1 active = 🟢, 0 active = 🔴 + +--- + +## Upstream Drill-Down + +When the user requests deeper upstream investigation for a specific parent: + +### Phase 1 — Get upstream of the specified parent + +``` +getAssetLineage(mcons=[""], direction="UPSTREAM") +→ 1-hop upstream of the parent (grandparents of the original asset) +``` + +### Phase 2 — Check grandparent health (ALL in parallel) + +For each grandparent: + +``` +getTable(mcon="") +getAlerts(created_after="<7 days ago>", created_before="", table_mcons=[""], statuses=["ACKNOWLEDGED", "WORK_IN_PROGRESS", null]) +``` + +### Phase 3 — Report + +Present findings for this hop. If any grandparent has issues, again ask: +"Want me to check further upstream for **\**?" + +Each drill-down is exactly 1 hop. Never auto-cascade. Always wait for the user's request.