[STEP] split sktime into per component packages
#45
| As the sktime codebase continues to expand, the current monolithic structure is
| presenting increasing challenges in terms of maintenance, dependency management, and
| release coordination. This proposal outlines a plan to modularise sktime into
Would more packages not increase the challenges in release coordination? After all, it is easier to coordinate (or simply to carry out) a release for a single package than for five.
| - **Meta-package (`sktime-all`):** While useful for transition, maintaining a
| meta-package long-term increases maintenance overhead and can reintroduce dependency
| conflicts.
There is another alternative: improving the sub-packaging using scikit-base. Estimators already have their own dependency sets.
That would leave a single pypi package, but might mitigate a significant part of the issues.
See the registry module, python_dependencies tag, tests:vm tag, and deps or craft.
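To make this alternative concrete, here is a minimal sketch of per-estimator dependency declarations inside a single package. It is not the scikit-base or sktime implementation: the class `SketchEstimator`, the attribute `required_packages`, the helper `available_estimators`, and the package name `surely_not_installed_pkg_xyz` are all hypothetical and chosen only for illustration.

```python
from importlib.util import find_spec


class SketchEstimator:
    """Hypothetical base class; NOT the scikit-base or sktime API."""

    # Each estimator declares the soft dependencies it needs, as import names.
    required_packages: tuple = ()

    @classmethod
    def missing_packages(cls):
        """Return the declared packages that cannot be found in this environment."""
        return [name for name in cls.required_packages if find_spec(name) is None]

    @classmethod
    def is_available(cls):
        """True if every declared dependency can be imported."""
        return not cls.missing_packages()


class CoreForecaster(SketchEstimator):
    # "json" is a stdlib module standing in for a core dependency.
    required_packages = ("json",)


class ExoticForecaster(SketchEstimator):
    # Made-up package name standing in for a rarely installed soft dependency.
    required_packages = ("surely_not_installed_pkg_xyz",)


def available_estimators(candidates):
    """Filter estimator classes down to those usable in this environment."""
    return [est.__name__ for est in candidates if est.is_available()]


print(available_estimators([CoreForecaster, ExoticForecaster]))
```

In a design like this, the distribution stays a single PyPI package, while tests and registry lookups can skip estimators whose declared dependencies are absent.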
| - ...and others as needed.
| - **Optional:** If pipeline logic grows in complexity, introduce `sktime-pipeline` as a
| separate package.
| - **Meta-package:** `sktime` (which essentially is `sktime-all`) would be retained only
deprecating sktime itself is a very bad idea. I am against it.
This sounds like a veto, and this is precisely the reason I never made a formal proposal in a GitHub issue, discussion, or otherwise. I gave this split proposal explicitly in Discord (to you individually and in the latest thread: https://discord.com/channels/1075852648688930887/1386186193066131517), and in my opinion everyone (probably excluding yourself) was clear that the single central package will no longer be needed. I don't think there is any point in me responding to any other comments any further.
Anirban, I don't know if I have the full context of your discussions with Franz, but this change is indeed something really important that impacts a lot of users and the current state of the package.
I would not bet on having the first version of the proposal approved by everyone. It will indeed require convincing and providing evidence that this change is beneficial. It is also ok to have a first proposal that is not the best one and to incorporate others' suggestions to make it better.
I'm still reading and thinking about the overall content, but I thought that this comment was important.
Hi Felipe, I was not expecting immediate approval, and I know that as of now the proposal has few specific details, as Franz requested below. I 100% agree that's important, and I'd have made the same review. I have no objection to Franz's other review comments, and would be very happy to discuss those.
But discussing that, or spending any effort to make this proposal mature, needs a collaborative discussion, not Franz or someone else making a comment saying that deprecation and eventual removal of the single do-all-together-in-single-repo-or-package is not at all acceptable under any circumstances. This holds especially since this specific suggestion is the core idea of the proposal and was of course mentioned in the Discord thread (and, if I am not too mistaken, in the last 1/2 annual roadmaps, but I am not certain). Otherwise it's a futile effort from my side (or from anyone else who contributes to the proposal in future).
@yarnabrina, I am strictly against removal of sktime as a package, which people can easily install via pip install sktime. As outlined below, this would impact our users negatively.
Compare scikit-learn: there is no scikit-learn-classification, scikit-learn-regression, and scikit-learn-pipeline which people install separately.
But that does not mean I am vetoing other ideas in this proposal, such as better modularity and even more packages to contain the growing set of estimators.
But discussing that and/or spending any effort to make this proposal mature needs a collaborative discussion
@yarnabrina, please do not suggest that I am not engaging in collaborative discussion. I am stating my opinions and am open to be convinced by arguments!
I have outlined why removing sktime as a package would be quite a bad idea, and nothing positive seems to offset it, since increased modularity or outsourcing of estimators to expansion packages can be had without removing sktime as a package.
Finally, my change requests below are constructive, I am asking you explicitly to be more detailed in your suggestions. Where would individual parts of the current package go? How would we manage releases?
especially after this specific suggestion is the core idea of the proposal
This is the first time I understand that your suggestion implies, and apparently has at its core, according to you, the removal of sktime as a package.
I was working under the assumption that you wanted to move sets of estimators into their own expansion packages, or move individual modules out, without removing the single-install UX and architectural cohesion.
This sounds like a veto
I am, at least, against doing something as massive as this, when benefits and drawbacks have not yet been spelled out and discussed, or implementation details - please do that if you are in favour of this change.
The onus of convincing core developers to do this rests on you; it is unfair to blame me or others for being skeptical. Constructive collaboration requires doing the homework to present your intention and arguments in a complete fashion.
Thanks for the effort. Some remarks:
- there is no clear plan here what goes where. I think this needs to be much more concrete than this so we can even consider implementation.
- you also need to comment about the user journey. What would a user import now? What would the common vignettes look like? What would a user have to install, and how would they learn of this information?
- I think it is an extremely bad idea to deprecate sktime itself. Very strongly against.
- you need to comment about the maintenance and release workflow. What would it look like?
- overall, the explanation of the end state lacks concreteness.
- you should comment explicitly on how the repository or repositories would be managed.
Finally, who would carry this out? Would you be a part of this effort if some form of it reaches consensus?
I have also asked a popular chatbot whether they would recommend splitting a package like
For a mature, widely-used, and heavily-integrated package like scikit-learn, I would not recommend breaking it up into multiple separately-distributed packages at this time, at least not publicly or for general use.
Here's a reasoned breakdown that weighs technical, community, and ecosystem-wide considerations:
🔍 scikit-learn Context
Reasons to Keep it as One Package
scikit-learn’s current single-install UX (pip install scikit-learn) is extremely simple and beginner-friendly. Most users rely on multiple submodules (e.g., sklearn.linear_model, sklearn.preprocessing) and expect them all to be co-installed and compatible.
Algorithms, pipelines, metrics, and preprocessing tools often work together. Splitting into multiple packages could introduce version mismatches or break that cohesion.
Scikit-learn integrates tightly with numpy, pandas, joblib, matplotlib, and others.
Maintaining consistent documentation, versioning, testing, and compatibility across multiple modules is non-trivial and would increase CI, maintenance, and community overhead.
Recommended Alternatives to Full Split
- Internal Refactoring with Clean Submodules: Improve code modularity internally without affecting the external API. Example: keeping sklearn.cluster or sklearn.ensemble in separate directories/repos internally, while still building and distributing a single wheel.
- Pluggable Extension System: Support optional extensions or plugin APIs (e.g., third-party models, transformers). Could enable innovation outside the core without bloating the package.
- Namespace Package Design (in future): If scikit-learn 2.0 ever happens, consider using namespace packages (sklearn.linear, sklearn.ensemble, etc.) only if strict backward compatibility is not required.
- Meta-packages (Not Recommended Now): While some ecosystems use meta-packages (e.g., scikit-learn-core, scikit-learn-models), this would likely cause more confusion than benefit for this audience.
Risks of Splitting for scikit-learn
Final Recommendation
Stick with the single-package structure for scikit-learn; it is well-suited for its audience and scope, and the costs of splitting outweigh the benefits. However, modularize the internals, encourage plugin ecosystems, and future-proof the architecture to allow partial optional loading or third-party innovation without bloating core.
I broadly agree with the above. I think if we split, we should keep machinery and core estimators in We could move less commonly used estimators to separate packages and treat them as extensions. Though that behaviour is already the status quo if users have no soft dependencies installed.
This is vs
I do not know which chatbot you asked, but I think https://chatgpt.com/share/68739758-8c9c-800c-87a9-ee036edcad96
Nothing stops
Our core dependency set is similarly lean: Out of these,
I think this is wrong. It is, in fact, the precursor of all unified APIs for AI in python! The architecture is also quite similar, except that
I disagree about the architectural perception and vision.
Naming is one of my concerns, but also layer architecture.
If you read my statement, I am against removal of For the rest, I am willing to listen to your proposal, once it spells out what would go where. You also said the following:
I do not see how this could ever be the case; it seems like a very significant operation. As said, I am happy to listen to arguments, so please spell out where things would go and how things would move.
Anirban, I think this discussion is really useful, even if we decide not to go with it, since we would at least have the motivations behind splitting or not splitting clearly defined for future users/devs. Another good consequence of this proposal is having new ideas for the future and thinking about the long term.
Thinking about some of the concrete advantages and disadvantages of having separate packages:
Disadvantages:
I wonder if we should make clear what problem we are trying to solve:
I believe that any split that creates packages that are coupled but separated is bad for both. I believe we should apply the same principles of software development here: aim for high cohesion and low coupling.
Question: what parts of sktime are currently together, but are not used together, or could be easily separated without problems?
About sktime-core/sktime, to avoid breaking user code, one potential solution would be requiring an optional dependency such as
@felipeangelimvieira, to add to your thoughts: I think the idea of coupling/cohesion and modularity is a good one. I also want to add: we should not exclusively frame an initiative to increase modularity in the context of splitting the package. In fact, it is almost always true that improved modularity can be achieved within a single package just as well as in multiple packages. That increasing modularity and improving architectural structure requires splitting the package is a misleading implication. Hence I would suggest to first consider structure and architecture, and only second how to distribute the given structure across modules and packages. From this perspective I suspect that "splitting package" will feel less necessary, since we uncouple actual pain points (like testing times, coupling, etc) from the package management question.
And I think this precisely is the crux of the problem. In the current state I do not think the modules are uncoupled; they are interdependent, and necessarily interdependent for the composition cases that are For instance, users will build pipelines from transformers, detectors, and forecasters. If we now separate into This is why I think that @yarnabrina should spell out:
No description provided.