Feature/data management #466

Open
Shashankss1205 wants to merge 3 commits into main from feature/data_management


Shashankss1205 commented Jun 1, 2026

Collaborator

PR: Add Data Management Tools

Reference Issues/PRs

Fixes #465

Independence: This PR is rebased on latest main and does not depend on #463 or any other open PR. It can be merged on its own. (Previously bundled #463 commits; those have been removed so this PR contains only data-management changes.)

What does this implement/fix? Explain your changes.

Adds 4 new MCP tools for the Data Management group:

| # | New Tool | Purpose | sktime Coverage |
|---|----------|---------|-----------------|
| 4 | inspect_data | Rich metadata inspection of loaded data handles | mtype(), check_is_mtype(), get_cutoff(), scitype detection |
| 5 | split_data | Temporal train/test splitting with test_size (fraction) or fh (horizon count) | temporal_train_test_split() |
| 6 | transform_data | action="format" (auto-fix freq/dupes/NaN) or action="convert" (mtype conversion) | convert_to(), frequency inference, fill NaN, dedup |
| 7 | save_data | Persist data handles to CSV, Parquet, or JSON files | File export |
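
For reviewers who want a concrete picture of what action="format" is meant to fix, here is a minimal, library-independent sketch of the three steps the table names (dedup, frequency inference, NaN filling). It is not the PR's implementation, which operates on pandas/sktime objects. The helper name `format_series`, the integer time steps, the keep-last rule for duplicates, and the forward fill are all assumptions made for illustration.

```python
import math
from collections import Counter


def format_series(points):
    """Return (step, regular_points) from possibly messy (t, value) pairs.

    Hypothetical helper: t is an integer time step, value may be NaN.
    """
    if not points:
        raise ValueError("need at least one observation")

    # 1. Dedup: sort by time and keep the last value seen per timestamp.
    latest = {}
    for t, v in sorted(points, key=lambda p: p[0]):
        latest[t] = v
    times = sorted(latest)

    # 2. Frequency inference: take the most common gap between neighbours.
    gaps = Counter(b - a for a, b in zip(times, times[1:]))
    step = gaps.most_common(1)[0][0] if gaps else 1

    # 3. Reindex on the regular grid and forward-fill missing steps and NaNs.
    #    Timestamps that do not fall on the grid are dropped in this sketch.
    out, last = [], math.nan
    for t in range(times[0], times[-1] + 1, step):
        v = latest.get(t, math.nan)
        if not math.isnan(v):
            last = v
        out.append((t, last))
    return step, out
```

Calling `format_series([(0, 1.0), (1, 2.0), (1, 2.5), (2, math.nan), (4, 5.0)])` infers a step of 1, resolves the duplicate at t=1 to 2.5, and fills t=2 and t=3 from the previous value.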

Files created:

  • src/sktime_mcp/tools/inspect_data.py
  • src/sktime_mcp/tools/split_data.py
  • src/sktime_mcp/tools/transform_data.py
  • src/sktime_mcp/tools/save_data.py
  • tests/test_data_management.py

Files modified:

  • src/sktime_mcp/server.py — tool schemas + dispatcher
  • src/sktime_mcp/tools/__init__.py — exports
  • README.md — tool documentation

The existing format_time_series tool is preserved for backward compatibility; transform_data(action="format") delegates to the same executor logic.
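
The delegation described above can be pictured as two thin entry points over one shared call. This is only a sketch: `_run_format`, `_run_convert`, the handler function names, the `to_mtype` parameter, and the shape of the returned dicts are hypothetical; only the tool names and the two `action` values come from this PR.

```python
def _run_format(data_handle):
    """Stand-in for the shared executor logic (hypothetical)."""
    return {"success": True, "data_handle": data_handle, "action": "format"}


def _run_convert(data_handle, to_mtype):
    """Stand-in for the mtype conversion path (hypothetical)."""
    if not to_mtype:
        return {"success": False, "error": "to_mtype is required for convert"}
    return {"success": True, "data_handle": data_handle, "to_mtype": to_mtype}


def transform_data_tool(data_handle, action="format", to_mtype=None):
    """Unified tool: route on `action`."""
    if action == "format":
        return _run_format(data_handle)
    if action == "convert":
        return _run_convert(data_handle, to_mtype)
    return {"success": False, "error": f"unknown action: {action!r}"}


def format_time_series_tool(data_handle):
    """Legacy entry point kept as a thin alias so existing callers keep working."""
    return _run_format(data_handle)
```

With this layout both entry points return identical results for the same handle, so a test can assert equality between them instead of duplicating expectations.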

Does your contribution introduce a new dependency? If yes, which one?

No. Uses existing dependencies (pandas, sktime).

What should a reviewer concentrate their feedback on?

  • inspect_data — mtype/scitype detection fallback logic
  • split_data — temporal splitting and handle registration
  • transform_data — convert_to() edge cases
  • save_data — format dispatch and path handling
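
As a reference point for the save_data review item, the following sketch shows one way format dispatch and path handling could be structured. It uses only the standard library, so Parquet is deliberately left out (it would need pandas with a Parquet engine). `save_records`, its `fmt` parameter, and the returned dict are hypothetical and not taken from the PR.

```python
import csv
import json
from pathlib import Path


def save_records(records, path, fmt=None):
    """Write a list of dict rows to `path` as CSV or JSON.

    Hypothetical helper. `fmt` defaults to the file extension.
    """
    path = Path(path).expanduser()
    fmt = (fmt or path.suffix.lstrip(".")).lower()
    writers = {"csv": _write_csv, "json": _write_json}
    if fmt not in writers:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {sorted(writers)}")
    # Create missing parent directories rather than failing on a fresh path.
    path.parent.mkdir(parents=True, exist_ok=True)
    writers[fmt](records, path)
    return {"success": True, "path": str(path), "format": fmt, "n_rows": len(records)}


def _write_csv(records, path):
    # Column order follows the keys of the first row.
    fieldnames = list(records[0]) if records else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def _write_json(records, path):
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
```

Keeping the writers in a dict means an unsupported extension fails in one place with a message listing the accepted formats, which is the kind of edge case worth checking in the real tool.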

Any other comments

Rebased on latest main (3 commits, +1282 / −31). 197 tests pass under make check.

PR checklist

For all contributions
  • I've added unit tests and made sure they pass locally (make check).
  • I've added the tool to the online documentation in docs/source/.

Shashankss1205 self-assigned this Jun 1, 2026
Shashankss1205 force-pushed the feature/data_management branch 2 times, most recently from 9a11d65 to 5d5afe9, June 4, 2026 19:58
Shashankss1205 force-pushed the feature/data_management branch 3 times, most recently from 9ff7cec to 87b74ff, June 12, 2026 09:51
Shashankss1205 and others added 3 commits June 12, 2026 18:39
…save_data

- inspect_data: rich metadata (mtype, scitype, shape, freq, cutoff, missing values, head, summary_stats)
- split_data: temporal train/test split with test_size (fraction) or fh (horizon count)
- transform_data: unified action='format' (auto-fix freq/dupes/NaN) or action='convert' (mtype conversion)
- save_data: persist data handles to CSV/Parquet/JSON files
- Added 15 unit tests covering all tools and edge cases
- Wired all 4 tools into server.py (Tool schemas + call_tool dispatcher)
- Updated tools/__init__.py exports

Add accurate LLM-facing descriptions for inspect_data, split_data, transform_data, and save_data. Remove format_time_series MCP tool now subsumed by transform_data(action='format'). Fix fh list splitting to use max(fh) steps, rename return test_size to n_test, and add fh validation.

Co-authored-by: Cursor <cursoragent@cursor.com>
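
The fh behaviour described in the second commit message (a horizon list splits off max(fh) steps, the result reports n_test, and fh is validated) can be sketched in plain Python as below. This is an illustration under assumptions, not the code in split_data.py: `split_series`, the exact validation rules, and the rounding used for `test_size` are hypothetical, and the real tool builds on sktime's temporal_train_test_split().

```python
def split_series(values, test_size=None, fh=None):
    """Split `values` so the most recent observations form the test set.

    Hypothetical helper: pass exactly one of `test_size` (fraction in (0, 1))
    or `fh` (a positive int, or a list of positive ints).
    """
    n = len(values)
    if (test_size is None) == (fh is None):
        raise ValueError("pass exactly one of test_size or fh")
    if fh is not None:
        steps = [fh] if isinstance(fh, int) else list(fh)
        if not steps or any(not isinstance(s, int) or s < 1 for s in steps):
            raise ValueError("fh must contain positive integers")
        # A horizon list such as [1, 2, 3] needs max(fh) held-out steps.
        n_test = max(steps)
    else:
        if not 0 < test_size < 1:
            raise ValueError("test_size must be a fraction in (0, 1)")
        # Rounding rule is an assumption; always hold out at least one point.
        n_test = max(1, round(n * test_size))
    if n_test >= n:
        raise ValueError("split would leave no training data")
    return {"train": values[:-n_test], "test": values[-n_test:], "n_test": n_test}
```

For a 10-point series, `fh=[1, 2, 3]` holds out the last three observations and `test_size=0.2` holds out the last two; an empty or non-positive fh is rejected before any splitting happens.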

Development

Successfully merging this pull request may close these issues.

Phase 5: Data Management Tools

2 participants