148 changes: 148 additions & 0 deletions steps/25_split_package/step.md
@@ -0,0 +1,148 @@
# Modularisation of sktime into Independent Component Packages

Contributors: @yarnabrina

## Introduction

As the sktime codebase continues to expand, the current monolithic structure is
presenting increasing challenges in terms of maintenance, dependency management, and
release coordination. This proposal outlines a plan to modularise sktime into
Contributor

Would more packages not increase the challenges in release coordination? It is easier to coordinate (or simply to carry out) a release for a single package than for five.

independent, interoperable component packages, each with its own dependencies and
release process. The intention is to improve maintainability and scalability, while
allowing for more flexible development and release cycles.

## Contents

1. Problem statement
2. Description of proposed solution
3. Motivation
4. Discussion and comparison of alternative solutions
5. Detailed description of design and implementation
   - Package structure
   - Migration strategy
   - Interoperability and testing
   - Release strategy
   - Documentation and user support
6. Potential concerns and areas to monitor

## Problem statement

The current monolithic structure of sktime presents several challenges:

- Rapid growth in codebase size and complexity.
- Increasing difficulty in managing and resolving dependency conflicts, especially with
soft dependencies.
- Release process bottlenecked by a single release manager and the need to synchronise
all modules.
- Inconsistent dependency bound management and insufficient testing of all documented
bounds.
- Limited and non-exhaustive interoperability testing across modules, particularly for
estimators with soft dependencies.

## Description of proposed solution

This proposal suggests splitting sktime into a set of independent, interoperable
packages, each corresponding to a major module (e.g., forecasting, classification,
transformation). A lightweight `sktime-core` package will provide shared base classes
and utilities, ensuring interoperability and a unified interface. Pipeline and
composition logic will initially reside in `sktime-core`, with the option to split into
a dedicated `sktime-pipeline` package if complexity warrants. Each component package
will manage its own dependencies, extras, and release cadence, following semantic
versioning. A clear migration path and enhanced interoperability testing will be
established.
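
As a rough illustration of how a shared core could keep separately packaged components interoperable, the sketch below defines a tiny common interface and a pipeline that depends only on that interface. All class names (`BaseComponent`, `Doubler`, `MeanForecaster`, `Pipeline`) are hypothetical and are not sktime APIs. Under the proposed layout, the base class and pipeline would plausibly live in `sktime-core`, while the two concrete classes would ship in different component packages.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class BaseComponent(ABC):
    """Hypothetical shared interface; would live in the core package."""

    @abstractmethod
    def fit(self, y: Sequence[float]) -> "BaseComponent":
        ...


class Doubler(BaseComponent):
    """Hypothetical transformer; would ship in a transformations package."""

    def fit(self, y):
        return self

    def transform(self, y):
        return [2 * v for v in y]


class MeanForecaster(BaseComponent):
    """Hypothetical forecaster; would ship in a forecasting package."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.mean_] * horizon


class Pipeline(BaseComponent):
    """Composition logic that relies only on the shared interface."""

    def __init__(self, transformer, forecaster):
        self.transformer = transformer
        self.forecaster = forecaster

    def fit(self, y):
        # Fit the transformer, transform the series, then fit the forecaster.
        transformed = self.transformer.fit(y).transform(y)
        self.forecaster.fit(transformed)
        return self

    def predict(self, horizon):
        return self.forecaster.predict(horizon)


pipe = Pipeline(Doubler(), MeanForecaster()).fit([1.0, 2.0, 3.0])
forecast = pipe.predict(2)
```

The pipeline never imports the concrete classes, so, in this sketch at least, components from different packages compose as long as they agree on the core interface.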

## Motivation

- **Maintainability:** Smaller, focused packages are easier to maintain, test, and
document.
- **Scalability:** Modularisation allows the project to scale with new contributions and
features.
- **Dependency management:** Isolating dependencies per package reduces conflicts and
installation issues.
- **Release flexibility:** Independent release cycles enable faster bug fixes and
feature delivery.
- **User experience:** Users can install only what they need, reducing bloat and
complexity.

## Discussion and comparison of alternative solutions

- **Status quo:** Retaining the monolithic structure avoids the overhead of managing
multiple packages, but would continue to exacerbate maintenance and dependency issues
as the project grows.
- **Partial modularisation:** Splitting only some modules would not fully address
dependency conflicts, release bottlenecks, or coordination challenges.
- **Meta-package (`sktime-all`):** While useful for transition, maintaining a
meta-package long-term increases maintenance overhead and can reintroduce dependency
conflicts.

Contributor

There is another alternative: improving the sub-packaging using scikit-base. Estimators already have their own dependency sets.

That would leave a single PyPI package, but might mitigate a significant part of the issues.
See the registry module, `python_dependencies` tag, `tests:vm` tag, and deps or craft.

The proposed full modularisation, with a shared core and clear migration strategy,
offers a balance of maintainability, flexibility, scalability, and user experience.

## Detailed description of design and implementation

### Package structure

- **sktime-core:** Contains all base class definitions, shared utilities, and
(initially) pipeline/composing logic.
- **Component packages:**
  - `sktime-forecasting`
  - `sktime-classification`
  - `sktime-transformations`
  - ...and others as needed.
- **Optional:** If pipeline logic grows in complexity, introduce `sktime-pipeline` as a
separate package.
- **Meta-package:** `sktime` (which essentially is `sktime-all`) would be retained only
Contributor

deprecating sktime itself is a very bad idea. I am against it.

Member Author

This sounds like a veto, and this is precisely the reason I never made a formal proposal in a GitHub issue or discussion or otherwise. I gave this split proposal explicitly in Discord (to you individually and in the latest thread: https://discord.com/channels/1075852648688930887/1386186193066131517) and in my opinion everyone (probably excluding yourself) was clear that the single central package will no longer be needed. I don't think there is any point in me responding to any other comments any further.

Contributor

Anirban, I don't know if I have the full context of your discussions with Franz, but this change is indeed something really important that impacts a lot of users and the current state of the package.

I would not bet on having the first version proposal approved by everyone. It will indeed require convincing and providing evidence that this change is beneficial. And it is also ok to have a first proposal that is not the best one and incorporating others' suggestions to make it better.

I'm still reading and thinking about the overall content, but I thought that this comment was important.

Member Author

Hi Felipe, I was not expecting immediate approval, and I know that as of now it has little to no specific detail, as Franz requested below. I 100% agree that's important, and I'd have reviewed it the same way. I have no objection to Franz's other review comments, and would be very happy to discuss those.

But discussing that and/or spending any effort to make this proposal mature needs a collaborative discussion, not specifically Franz or someone else making a comment saying deprecation and eventual removal of single do-all-together-in-single-repo-or-package is not at all acceptable under any circumstances, especially after this specific suggestion is the core idea of the proposal and of course mentioned in Discord thread (and if I am not too mistaken in last 1/2 annual roadmaps - but I am not certain). Otherwise it's a futile effort from my side (or if anyone else contribute to the proposal in future).

Contributor

@yarnabrina, I am strictly against removal of sktime as a package, which people can easily install with `pip install sktime`. As outlined below, this would impact our users negatively.

Compare scikit-learn: there is no `scikit-learn-classification`, `scikit-learn-regression`, or `scikit-learn-pipeline` that people install separately.

But that does not mean I am vetoing other ideas in this proposal, such as better modularity and even more packages to contain the growing set of estimators.

Contributor

@fkiraly fkiraly Jul 13, 2025

> But discussing that and/or spending any effort to make this proposal mature needs a collaborative discussion

@yarnabrina, please do not suggest that I am not engaging in collaborative discussion. I am stating my opinions and am open to be convinced by arguments!

I have outlined why removing sktime as a package would be quite a bad idea, and nothing positive seems to offset it, since increased modularity or outsourcing of estimators to expansion packages can be had without removing sktime as a package.

Finally, my change requests below are constructive, I am asking you explicitly to be more detailed in your suggestions. Where would individual parts of the current package go? How would we manage releases?

Contributor

@fkiraly fkiraly Jul 13, 2025

> especially after this specific suggestion is the core idea of the proposal

This is the first time I understand that your suggestion implies - and apparently has at its core, according to you - the removal of sktime as a package.

I was working under the assumption that you wanted to move sets of estimators into their own expansion packages, or move individual modules out, without removing the single-install UX and architectural cohesion.

Contributor

> This sounds like a veto

I am, at least, against doing something as massive as this, when benefits and drawbacks have not yet been spelled out and discussed, or implementation details - please do that if you are in favour of this change.

The onus of convincing core developers to do this rests on you, it is unfair to blame me or others for being skeptical. Constructive collaboration requires doing the homework to present your intention and arguments in a complete fashion.

for transition, to be deprecated and removed post-migration.

### Migration strategy

- Announce the modularisation plan and timeline to the community.
- Provide migration guides and automated scripts where possible.
- Deprecate monolithic imports with clear warnings and documentation.
- Maintain `sktime-all` as a transitional meta-package, to be deprecated after a defined
period.
- Ensure all new features and fixes are developed in the new packages post-split.
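
One possible way to deprecate old import paths with clear warnings is a thin forwarding module built on Python's module-level `__getattr__` (PEP 562). The sketch below is only an assumption about how such a shim might look. The module names `old_monolith_demo` and `new_forecasting_demo`, the helper `make_deprecated_alias`, and the placeholder `NaiveForecaster` class are hypothetical stand-ins created in memory so the example runs on its own.

```python
import importlib
import sys
import types
import warnings


def make_deprecated_alias(old_name: str, new_name: str) -> types.ModuleType:
    """Build a stand-in module that forwards attribute access to `new_name`."""
    shim = types.ModuleType(old_name)

    def _forward(attr: str):
        # Do not forward or warn for dunder lookups made by the import system.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"'{old_name}.{attr}' has moved to '{new_name}.{attr}'; "
            "please update your imports.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_name), attr)

    # PEP 562: a module-level __getattr__ is called for missing attributes.
    shim.__getattr__ = _forward
    return shim


# Hypothetical stand-ins for a new component package and the old import path.
new_pkg = types.ModuleType("new_forecasting_demo")
new_pkg.NaiveForecaster = type("NaiveForecaster", (), {})
sys.modules["new_forecasting_demo"] = new_pkg
sys.modules["old_monolith_demo"] = make_deprecated_alias(
    "old_monolith_demo", "new_forecasting_demo"
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from old_monolith_demo import NaiveForecaster  # emits a DeprecationWarning
```

Existing user code keeps working during the transition, while each use of the old path points to the new location.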

### Interoperability and testing

- Develop a dedicated integration test suite to verify interoperability across component packages and pipelines.
- Include real estimators with soft dependencies in CI, not just dummy ones.
- Use a CI matrix to test combinations of packages, Python versions, and operating
systems.
- Regularly test all documented dependency bounds, both upper and lower, using automated
tools.
- Consider a “smoke test” meta-package for CI-only integration testing.
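
To get a feel for the size of such a CI matrix, the following sketch enumerates every combination of package, Python version, and operating system. The package names come from this proposal. The Python versions, runner labels, and the `build_matrix` helper are illustrative assumptions, not the project's actual CI configuration.

```python
from itertools import product

# Hypothetical axes; the real lists would come from the project's CI config.
PACKAGES = ["sktime-core", "sktime-forecasting", "sktime-classification"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]
OPERATING_SYSTEMS = ["ubuntu-latest", "windows-latest", "macos-latest"]


def build_matrix(packages, pythons, systems):
    """Return one CI job description per (package, Python, OS) combination."""
    return [
        {"package": pkg, "python": py, "os": os_name}
        for pkg, py, os_name in product(packages, pythons, systems)
    ]


matrix = build_matrix(PACKAGES, PYTHON_VERSIONS, OPERATING_SYSTEMS)
print(len(matrix))  # number of jobs in the full cross product
```

The job count grows multiplicatively with each axis, so a real setup would likely prune the full cross product, for example by running all operating systems only on one Python version.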

### Release strategy

- Each package follows semantic versioning:
  - **Major:** Breaking changes.
  - **Minor:** New features.
  - **Patch:** Bug fixes and small enhancements.
- Independent release cadence per package, allowing for more agile and targeted
releases.
- Use the modularisation as an opportunity for a major (`1.0.0`) release, signalling
stability and the new structure.
- Release management can be distributed among maintainers familiar with specific
packages.
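
The versioning rule above can be sketched as a small helper that computes the next version from the kind of change. The function name `bump_version` is hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "minor":
        # New feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fix or small enhancement.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


next_release = bump_version("1.4.2", "minor")
```

With independent release cadences, each component package would track its own version this way, so a breaking change in one package would not force a major bump in the others.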

### Documentation and user support

- Update documentation to reflect the new package structure and installation
instructions.
- Provide migration guides, FAQs, and clear guidance on selecting and installing
component packages.
- Clearly document the stability and support status of each package.

## Potential concerns and areas to monitor

- **Interoperability:** Must ensure pipelines and workflows remain seamless across
packages; integration testing is critical.
- **User confusion:** Clear documentation and migration support are essential to prevent
confusion during and after the transition.
- **Maintenance overhead:** More packages mean more CI, releases, and documentation to
manage; governance and maintainership must scale accordingly.
- **Fragmentation:** Risk of ecosystem fragmentation if not managed carefully; maintain
a strong shared core and community engagement.
- **Meta-package deprecation:** Plan and communicate the deprecation of `sktime` to
avoid long-term maintenance burden.