feat: add diagnose_residuals_tool for agentic residual diagnostics by DEEP-600 · Pull Request #402 · sktime/sktime-mcp

DEEP-600 · 2026-04-30T10:05:28Z

Implements #400.

What and why

Reading through the existing tool surface and #386, I noticed the
agentic loop has no feedback mechanism after a poor evaluation score.
The agent gets MAPE = 25% and has nothing to work with beyond trying
another model.

Human forecasters look at residuals at this point — ACF for missed
seasonality, normality checks, bias direction. Agents can't look at
plots, so this tool runs those same checks and returns structured text
they can reason over.

What's in this PR

src/sktime_mcp/tools/diagnose.py — the new tool
__init__.py and server.py updated to expose it
tests/test_diagnose.py — three test cases

How it works

Takes an estimator_handle and dataset, reloads the data the same
way evaluate.py does, pulls the fitted instance from
_handle_manager, and runs three tests on the residuals:

Ljung-Box (statsmodels): flags residual autocorrelation, such as missed seasonality
Shapiro-Wilk (scipy): flags non-normal residuals
Mean bias: flags systematic under- or over-forecasting

No new dependencies — statsmodels and scipy are already in
pyproject.toml.

Example output

{
  "success": true,
  "diagnostics": {
    "bias": {"mean_error": -4.2, "status": "consistently over-forecasting"},
    "autocorrelation": {"ljung_box_passed": false, "significant_lags": [12, 24]},
    "normality": {"shapiro_passed": false, "p_value": 0.01}
  },
  "llm_hint": "Residuals show significant autocorrelation at lags [12, 24]. This may indicate missed annual seasonality. Consider switching to SARIMA or adding a Deseasonalizer pipeline."
}
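
One way the `llm_hint` string could be derived from the structured diagnostics is sketched below. The helper `build_hint`, its `seasonal_period` parameter, and the wording of each sentence are hypothetical and only illustrate the mapping from test results to advice; the PR may assemble the hint differently.

```python
def build_hint(diagnostics, seasonal_period=12):
    """Turn a diagnostics dict into a short hint (hypothetical helper)."""
    parts = []

    auto = diagnostics["autocorrelation"]
    if not auto["ljung_box_passed"]:
        lags = auto["significant_lags"]
        parts.append(
            f"Residuals show significant autocorrelation at lags {lags}."
        )
        # Lags that are all multiples of the seasonal period point to
        # seasonality the model did not capture.
        if lags and all(lag % seasonal_period == 0 for lag in lags):
            parts.append(
                f"This may indicate missed seasonality with period "
                f"{seasonal_period}."
            )

    bias = diagnostics["bias"]
    if bias["mean_error"] != 0:
        parts.append(
            f"Mean error is {bias['mean_error']} ({bias['status']})."
        )

    if not diagnostics["normality"]["shapiro_passed"]:
        parts.append("Residuals do not look normally distributed.")

    return " ".join(parts) if parts else "No issues detected in the residuals."
```

Called on the `diagnostics` object from the example output, this returns a hint that names lags `[12, 24]` and period 12.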

A couple of things I'd like feedback on

I used predict_residuals(y) as primary with a manual fallback for
complex pipelines — is that the right call or overkill?
Kept heteroskedasticity out of scope for now, happy to add in a
follow-up if that's useful.
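
For context on the first question, the primary-plus-fallback pattern can be sketched roughly as follows. This is an illustration rather than the code in the diff: `residuals_with_fallback` is a hypothetical name, the set of exceptions that triggers the fallback is an assumption, and the fallback only works for forecasters that support in-sample prediction.

```python
import pandas as pd


def residuals_with_fallback(forecaster, y: pd.Series) -> pd.Series:
    """Return residuals (actual - forecast) for a fitted forecaster.

    Hypothetical helper: tries predict_residuals first, then falls back
    to subtracting in-sample predictions by hand.
    """
    try:
        # Primary path: let the forecaster compute its own residuals.
        residuals = forecaster.predict_residuals(y)
    except (NotImplementedError, AttributeError):
        # Fallback (assumed trigger): predict at the index of y and
        # subtract manually. The pandas index is passed as the horizon.
        y_pred = forecaster.predict(fh=y.index)
        residuals = y - y_pred
    # In-sample predictions can be NaN at the start of the series.
    return residuals.dropna()
```

A single code path that always subtracts predictions by hand would be simpler, at the cost of bypassing whatever residual handling a forecaster defines for itself.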

feat: add diagnose_residuals_tool for agentic residual diagnostics

c052a57

DEEP-600 wants to merge 1 commit into sktime:main from DEEP-600:feat/diagnose-residuals
