FEAT-#3451: Support __partitioned__ protocol #3452
Signed-off-by: Igoshev, Yaroslav <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #3452 +/- ##
===========================================
- Coverage 83.23% 48.79% -34.45%
===========================================
Files 147 144 -3
Lines 15246 15529 +283
===========================================
- Hits 12690 7577 -5113
- Misses 2556 7952 +5396
Is this a high priority? What consumers of this type of protocol already exist?
yeah, let's not consider this for 0.11 |
@devin-petersohn You're raising a chicken-and-egg issue. Generally, adding support as a consumer is more involved but at the same time adds more value. Adding this at the producer side is less involved. Like here it is very isolated - it has no effect on anything else. Current Implementation status is
More to come, but we need producers! I'd appreciate seeing modin as a producer. To add more value for modin, we can add support for consuming this API. A simple "from_partitioned" would provide the basic functionality. "Automatic" detection when accepting data at construction time would be nice to have and could be added later. |
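For illustration only, here is a minimal sketch of what a simple "from_partitioned" consumer could look like. The dictionary layout (`shape`, `partitions`, `start`, `data`, `get`) and the `ToyProducer` class are assumptions made up for this example; they are not taken from this PR or from Modin's API, and plain Python lists stand in for remote partitions.

```python
# Hypothetical sketch: key names and ToyProducer are illustrative assumptions,
# not Modin's API and not the settled protocol.


class ToyProducer:
    """Splits rows into row-wise chunks and describes them via __partitioned__."""

    def __init__(self, rows, chunk_size):
        self._chunks = [
            rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)
        ]
        self._n_rows = len(rows)

    @property
    def __partitioned__(self):
        partitions = {}
        start = 0
        for pos, chunk in enumerate(self._chunks):
            # Each entry records where the chunk begins and how long it is.
            partitions[(pos,)] = {
                "start": (start,),
                "shape": (len(chunk),),
                "data": chunk,
            }
            start += len(chunk)
        return {
            "shape": (self._n_rows,),
            "partitions": partitions,
            # Identity getter: the data is already local in this toy example.
            "get": lambda handle: handle,
        }


def from_partitioned(obj):
    """Reassemble a flat list from any object exposing __partitioned__."""
    meta = obj.__partitioned__
    get = meta["get"]
    result = []
    # Visit partitions in order of their start offset.
    for part in sorted(meta["partitions"].values(), key=lambda p: p["start"]):
        result.extend(get(part["data"]))
    if len(result) != meta["shape"][0]:
        raise ValueError("partitions do not cover the declared shape")
    return result


restored = from_partitioned(ToyProducer(list(range(10)), chunk_size=4))
```

The consumer never inspects the producer's type; it only relies on the metadata dictionary, which is the property that "automatic" detection at construction time would build on.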
It would also help to match partitions of the left object (Modin DF/Series) and partitions of the right object (an instance supporting |
It's not a big deal to add something, but I don't want code bloat for things that won't be useful. I think it's a good idea to have a protocol like this, but who would use it? What library has said that they actually need an interface like this to be able to work with Modin? We at least need to know if it would be useful to a meaningful consumer before we add it. Typically these protocols have significant amounts of input from multiple interested parties (producers and consumers) to answer these questions before they are even designed. I just want to make sure we aren't adding something that won't be used. I don't deny the usefulness of this protocol, but whether or not it will be used is still not answered to me. |
Yes, we are open to feedback and suggestions for different designs and features. As mentioned above, your feedback/suggestions is/are highly appreciated. Talking about something concrete is usually easier than keeping discussions in the abstract.
That's of course a valid request. One issue in the process of ramping this up is that we are targeting something that is currently less of a concern but we want to make sure we can avoid running into a situation where implementations of distributed features become messy.
The idea is to allow packages to consume various distributed containers. It is not meant to be an enabler for modin specifically. Of course packages like xgboost_ray can support modin/RayDataSet/MLDataSet/HeAT/... by implementing dedicated code for each. Avoiding such specialization for each (in particular upcoming) structure is the major motivation for this. |
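To make the "no specialization per container" point concrete, here is a rough sketch under stated assumptions. `EagerContainer`, `HandleContainer`, `total`, and the `partitions`/`data`/`get` key names are all hypothetical and chosen only for this example; integer handles stand in for remote object references.

```python
# Hypothetical sketch: both container classes and the key names are assumptions
# for illustration, not any library's real API.


class EagerContainer:
    """Keeps chunks in memory and hands them out directly."""

    def __init__(self, chunks):
        self._chunks = chunks

    @property
    def __partitioned__(self):
        return {
            "partitions": {
                (i,): {"data": chunk} for i, chunk in enumerate(self._chunks)
            },
            "get": lambda handle: handle,  # data is already materialized
        }


class HandleContainer:
    """Stores chunks behind integer handles, mimicking remote references."""

    def __init__(self, chunks):
        self._store = dict(enumerate(chunks))

    @property
    def __partitioned__(self):
        return {
            "partitions": {(h,): {"data": h} for h in self._store},
            "get": self._store.__getitem__,  # resolve a handle to its chunk
        }


def total(container):
    """Sum all values of any container exposing __partitioned__."""
    meta = container.__partitioned__
    get = meta["get"]
    return sum(sum(get(part["data"])) for part in meta["partitions"].values())


eager_total = total(EagerContainer([[1, 2], [3]]))
handle_total = total(HandleContainer([[1, 2], [3]]))
```

The same `total` function handles both containers without any type checks, which is the kind of dedicated per-structure code the protocol is meant to avoid.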
A new discussion was initiated with the data-API consortium: data-apis/consortium-feedback#7 |
@YarShev should we mark this as draft then? |
@vnlitvinov , yes, we can mark the PR as a draft for now. I'll update it once the protocol settles. |
Closing this PR, as there are no plans to push it forward. |
What do these changes do?
flake8 modin
black --check modin
git commit -s
__partitioned__ protocol #3451
docs/developer/architecture.rst is up-to-date