Improve Node Grouping in Kedro Deployment #4319
Comments
This is the most important problem for me. It's also tightly coupled with dependency management - the minute we make it easier to isolate different parts of the pipeline to run in different containers, you get into dependency isolation questions.
Users today also tend towards tags because namespaces are a pain to use.
Summary of Tech Design Sessions on Node Grouping (4, 6 Dec 2024)
Video
Discussed:
Decisions:
Next Steps:
Additional Information:
Feel free to add your thoughts or suggestions here. If there’s anything to update or clarify in the summary, please let me know.
from @marrrcin :
I value the desire to keep momentum and to devote more time to understanding what namespaces do and how they work, so that at least we can discuss them more intelligently. But going forward I think there's an opportunity to explore innovative new solutions, or just make namespaces an implementation detail so that they continue to exist but become invisible to the user and get swept under a more usable API layer. I know I sound like a broken record but I'll say it again: more documentation will not fix bad Developer Experience. In short: I agree to continue exploring them (on the grounds of keeping the momentum on this topic and not having to do another knowledge sharing session in ~12 months), but I think we should timebox this effort and set a deadline for when we think we're ready to go back to the drawing board and continue iterating as a team.
This comment was marked as off-topic.
I think namespaces are critical for any way we eventually unify and simplify deployment to orchestrators. I'm arguing dependency isolation is basically the same problem, but I would like to see if others agree.
I agree with the need for dependency isolation for large projects 👍🏻 As for grouping by namespace: although it's fine to have anything to group on, namespaces were (at least for me) used to group larger chunks of the pipeline, e.g.:
etc. (like in the dynamic pipelines https://getindata.com/blog/kedro-dynamic-pipelines/ ). So grouping by namespace will actually require thinking about the target deployment as soon as you start writing the pipeline, which means it will also affect how the data catalog and parameters are written (because of namespace prefixes), i.e. more cognitive load. Plus, projects might end up having a lot of "synthetic" namespaces, created just for the sake of preparing the pipeline for an orchestrator.
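To make the point about namespace prefixes concrete, here is a minimal sketch in plain Python. It does not use Kedro itself: `Node`, `with_namespace`, and the namespace name `deploy_group_a` are hypothetical stand-ins, and the prefixing rule (a `namespace.` prefix on node and dataset names, with the prefix placed after the `params:` marker for parameters) is an assumption about the convention being discussed, not a copy of Kedro's implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node (not Kedro's Node class)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def with_namespace(node: Node, namespace: str) -> Node:
    """Return a copy of ``node`` with its name and dataset names prefixed."""

    def prefix(dataset: str) -> str:
        # Parameters keep their "params:" marker; the prefix goes after it.
        if dataset.startswith("params:"):
            return f"params:{namespace}.{dataset[len('params:'):]}"
        return f"{namespace}.{dataset}"

    return Node(
        name=f"{namespace}.{node.name}",
        inputs=tuple(prefix(d) for d in node.inputs),
        outputs=tuple(prefix(d) for d in node.outputs),
    )


train = Node("train_model", ("features", "params:learning_rate"), ("model",))
namespaced = with_namespace(train, "deploy_group_a")
print(namespaced)
```

Under these assumptions, a namespace added only for deployment renames `features`, `model`, and `learning_rate`, so the matching catalog and parameter entries would have to be renamed too.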
Copy-pasting some comments about dependency isolation to #4147 and collapsing them here.
OK, but I'm going to just reiterate my points specifically related to node grouping and deployment, so that they don't get skipped/misconstrued as being only relevant for dependency management:
Next steps:
Thanks, @astrojuanlu, for the summary, it sounds great! We've also drafted a problem statement, which we hope will help us focus more on the goal. What do you think? "How might we support users in optimizing their code for deployment by effectively grouping their nodes and providing clear, actionable guidance through documentation?"
Overview
Part of #4317. Users have expressed the need to merge multiple Kedro nodes into a single task on deployment platforms for better clarity and efficiency. Current plugins offer limited support for this, often requiring manual grouping, which complicates deployment and reduces performance.
User Insights and Challenges
Problem Statement
How can we design a flexible and efficient node grouping mechanism - using tags, namespaces, pipelines, or other methods - to maximise usefulness for users and streamline the deployment process?
Proposed Solution
Tech Design - 4th & 6th December
It was decided to use namespaces for node grouping purposes and to implement helper functions within Kedro to simplify the deployment of grouped nodes with namespaces. Full summary of TD
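As a rough illustration of what grouping by namespace could look like for a deployment plugin, here is a minimal sketch in plain Python. It is not the helper API decided in the tech design sessions: `Node`, `group_key`, `group_nodes`, and `group_dependencies` are hypothetical names, and the rule "group by top-level namespace, leave un-namespaced nodes as their own task" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pipeline node (not Kedro's Node class)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    namespace: Optional[str] = None


def group_key(node: Node) -> str:
    """Top-level namespace, or the node's own name if it has no namespace."""
    return node.namespace.split(".")[0] if node.namespace else node.name


def group_nodes(nodes: List[Node]) -> Dict[str, List[str]]:
    """Map each deployment task (group) to the node names it would run."""
    groups: Dict[str, List[str]] = {}
    for node in nodes:
        groups.setdefault(group_key(node), []).append(node.name)
    return groups


def group_dependencies(nodes: List[Node]) -> Dict[str, Set[str]]:
    """Map each group to the set of upstream groups it depends on."""
    producer = {out: group_key(n) for n in nodes for out in n.outputs}
    deps: Dict[str, Set[str]] = {}
    for node in nodes:
        key = group_key(node)
        deps.setdefault(key, set())
        for dataset in node.inputs:
            upstream = producer.get(dataset)
            # Ignore free inputs and edges that stay inside one group.
            if upstream is not None and upstream != key:
                deps[key].add(upstream)
    return deps


nodes = [
    Node("ingest", ("raw",), ("clean",)),
    Node("features.scale", ("clean",), ("features.scaled",), "features"),
    Node("features.encode", ("features.scaled",), ("features.encoded",), "features"),
    Node("training.fit", ("features.encoded",), ("training.model",), "training"),
]
groups = group_nodes(nodes)
deps = group_dependencies(nodes)
print(groups)
print(deps)
```

Here four nodes collapse into three orchestrator tasks. One caveat for any real helper: grouping can introduce cycles between groups even when the underlying node graph is acyclic, so it would likely need to validate the grouped graph.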
Next steps: