feat: add ray repartition pipeline #985
Code Review
This pull request introduces the repartition_mapper operator, which allows for repartitioning Ray Datasets into a target number of blocks. The implementation includes the operator logic, configuration updates, documentation, and unit tests. Review feedback consistently points out that since the operator inherits from the Pipeline class and performs dataset-level transformations, it should be renamed to RepartitionPipeline and reclassified from a mapper to a pipeline across the codebase and documentation to maintain architectural consistency.
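The pure-Python sketch below illustrates what "repartitioning into a target number of blocks" means, and why the reviewers treat it as a dataset-level pipeline op rather than a mapper. It is not Ray's implementation and not this PR's code. It assumes a dataset is modeled as a list of blocks, each block a list of row dicts. The class names `HypotheticalMapper` and `HypotheticalRepartitionPipeline` and the `run` method are hypothetical illustrations, not Data-Juicer APIs. In Ray Data itself, the corresponding call is `Dataset.repartition(num_blocks)`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical data model: a dataset is a list of blocks, each a list of rows.
Row = Dict[str, Any]
Block = List[Row]


def repartition(blocks: List[Block], num_blocks: int) -> List[Block]:
    """Redistribute all rows into `num_blocks` contiguous, near-equal blocks.

    Row order is preserved (no shuffle); block sizes differ by at most one.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be >= 1")
    rows = [row for block in blocks for row in block]
    base, extra = divmod(len(rows), num_blocks)
    out: List[Block] = []
    start = 0
    for i in range(num_blocks):
        # The first `extra` blocks each take one additional row.
        size = base + (1 if i < extra else 0)
        out.append(rows[start:start + size])
        start += size
    return out


class HypotheticalMapper:
    """Per-sample op: transforms each row but keeps the block layout unchanged."""

    def __init__(self, fn: Callable[[Row], Row]):
        self.fn = fn

    def run(self, blocks: List[Block]) -> List[Block]:
        return [[self.fn(row) for row in block] for block in blocks]


class HypotheticalRepartitionPipeline:
    """Dataset-level op: consumes the whole dataset and changes its block layout."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks

    def run(self, blocks: List[Block]) -> List[Block]:
        return repartition(blocks, self.num_blocks)


if __name__ == "__main__":
    blocks = [[{"text": "a"}, {"text": "b"}, {"text": "c"}], [{"text": "d"}]]
    out = HypotheticalRepartitionPipeline(num_blocks=3).run(blocks)
    print([len(b) for b in out])  # [2, 1, 1]
```

A mapper only ever sees one row at a time, so it cannot rebalance rows across blocks. The repartition step needs the full dataset, which is why it fits the pipeline abstraction the review describes.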
Branch updated from cbfbbc5 to 4410e80.
This is a very useful new op, thanks! The implementation looks good to me. One minor suggestion: since it is Ray-only, maybe rename it to ray_repartition_pipeline for consistency with other Ray-only ops. If you prefer, we can also make this adjustment on our side.
Thanks for the suggestion! Renamed it to …
Summary
Validation