-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revising the signac-flow API #665
Comments
We discussed the proposed changes in the developer meeting today (2022-08-24), and agreed on 1, 2, and 3. We disagreed with 4, preferring to remove The disagreements from the proposed API (primarily 4) in part stem from the burden of having to be able to create a full mapping of directives of a group at instantiation which for dynamically generated operations is unduly burdensome. This makes 5 impossible to cleanly do then as specifying both groups associated with an operation and the group's operation directives is clunky. Below is an example of the modified proposed API changes. @group(directives={"walltime": 2.0})
@FlowProject.operation(
with_job=True,
aggregator=agg_func,
directives={"walltime": 1.0}
)
def op(job):
pass |
Also what are people's thoughts on keeping pre/postconditions as decorators? The two options as I see it are @FlowProject.pre.true("run_op")
@FlowProject.operation(directives={...}, with_job=True):
pass or @FlowProject.operation(directives={...}, with_job=True, pre=[FlowProject.pre.true("run_op")]):
pass Elsewhere, we have moved to a single decorator approach which simplifies the logic on our side. I would say the first is more "pretty" in an ascetic sense, but the second follows a declaration (making a function an operation) is definition (defines the parameters of the operation) which is simpler to code and localize. Overall, given our push on |
@csadorf It feels like we are slowly coming full circle back to the original signac-flow API, no? 😄 This is looking eerily similar to
just with different arguments. Having said that, I agree that option 2 is better. It gives a more declarative and unified API. I think it would also fit better with some of the longer-term refactorings around the interaction of projects and groups. |
I am hesitant about the proposed changes to make That said, I understand there is some good motivation from the changes to |
I agree with @bdice here. I don't clearly see the user-benefit to moving all arguments and options into a single decorator. The current API clearly delineates the description of the workflow logic from the description of the execution environment. Operations(-group) decorators declare a function to be part of the worklow and allow for the specification of the execution environment. The direct association of operation-decorators and directives therefor makes sense. The pre- and post-conditions on the other hand describe the worklow logic. This separation allows users to very easily describe and understand their workflow and then declare multiple execution environments for those operations. Further, and for the reason I just explained, I would assume that in the vast majority of cases the conditions would be similar even for operation-functions that are part of multiple operation-groups. Forcing users to declare conditions as part of the operation-decorators would lead to substantial duplication. Think about it this way, most user workflows would probably work perfectly fine if the Last, but not least, the argument "there should only be one way to do one thing" overlooks that the need to use special conditions for special execution environments or sub-workflows is not exactly the same thing as describing the general workflow logic and should therefor be considered a special or advanced case. And even if it was equivalent, it would appear as an overzealous application of this general guideline to me. I think we would lose one of the most appealing elements of the current API by making this change and I do not support this unless someone can provide a very compelling argument. |
Thanks for the thoughts @bdice and @csadorf. The most compelling two arguments I can think of for the new syntax would be
To address @csadorf comments specifically
I am not so certain that workflow logic and execution are meaningful to separate. They are certainly different, but I am not sure that warrants their separation, and I fail to see how this makes defining multiple execution environments difficult (unless using some class hierarchy which separates logic from execution).
The ways groups are designed I feel this is a non-issue. No duplication would occur if using a single project class with multiple execution groups.
I agree that multiple execution environments is relatively advanced. However, I don't think this makes the general case any more difficult than the current API. Now if by this you are referring to conditions changing with execution environments then yes the new API would result in users having to make lists of conditions based on environment rather than simply composing decorators based on environments. Even here though, I would argue a new Python user would likely find creating condition lists more intuitive then using a decorator as a standard function. I think the nature of the break (essentially every modern script will be broken) is a very strong counter point to the arguments I give above though. As I said in the original comments though, I am not trying to see my dog win necessarily, but I wanted to start this discussion whatever the outcome may be. |
I would like to stress that the separation of workflow logic and execution environment is one of the main innovation drivers in this space. Basically any modern workflow orchestrator, whether that is dask, or PREFECT, or Metaflow will aim to separate the description of the workflow logic and its execution via worker units. Ideally, a researcher should not need to worry about the execution environment at all (local, cloud, Kubernetes cluster, etc.). Signac is actually very good at this. You only have to worry about the execution environment when you start running on HPC clusters and then you need to provide only the information that these HPC systems simply cannot infer. |
I guess what I mean is what is meaningfully different in the separation of execution details from @FlowProject.operation(pre=..., post=...)
def op(job):
pass
@FlowProject.post.true("bar")
@FlowProject.pre.true("foo")
@FlowProject.operation()
def op(job):
pass We do allow the simultaneous specification of both the execution environment and logic, but we do not mix them internally so to speak. The question is this a useful distinction for the user to be exposed to. Through groups we allow for other execution environments outside the default, but explicitly do not allow for different logic. |
I see a lot of good points being made here. I don't have a strong opinion regarding the aesthetics of the API that @bdice mentioned, but I do agree with @csadorf that separating execution environments from workflow definition is an area where flow stands out and we don't want to lose that. I am not necessarily convinced that moving pre/post into the operation declaration breaks that separation in a meaningful way (mostly because I think the way everything is tied into the FlowProject already blurs these lines to a significant extent, but that is a topic for a separate discussion), but I am also not convinced that the benefit of the change is large enough to overcome the heavy reservations expressed in this thread. @bdice's comment on simplifying internals is my bigger focus. As I have stated repeatedly to various degrees, overemphasizing API stability hinders the long-term sustainability of flow at present. I think others agree, which has led to a relatively fast deprecation cycle in recent releases. I anticipate further large changes to flow internals as we refactor, and I expect that those will lead to API breaks around operation declaration anyway. Given the current state of this discussion, my inclination is to table changes to pre/post until we see how other refactorings fall out and take it from there. |
Then answer is a very clear yes! This is a very important distinction for the user. Again, a user should be enabled to declare workflow logic and execution environment independently. Having the ability to compartmentalize these two problems changes how you are able to think and approach the problem and is not purely aesthetic. We have always claimed that one of the main benefits of using signac is that of reproducibility. Being able to read and understand a signac workflow without getting distracted by the execution conditions is a key driver for that.
If there are areas in the flow API where that separation is not or only poorly upheld, then that would be an area for improvement, not an argument against maintaining said separation.
This is not a question of API stability IMO. I am not against this change because it is breaking, I am against this change, because I think it regresses our API and reduces the user experience. If you want to make any meaningful change here that also reduces the implementation complexity, I could get on-board with a full separation of workflow logic and execution directives where each of those two things are declared as two separate decorators ( |
Given the strength of oppositions, I say we keep everything as is. I will get to crafting a release then. |
@csadorf sorry if I wasn't clear enough, I don't think I properly stated my position. I am agreeing with pretty much everything that you're saying, I just wanted to make sure that everyone was on the same page regarding
I wanted to clarify that the highest priorities for me with regards to the broader point of this issue (revising the signac-flow API) are:
My main point was that I do not think we should be putting much effort into API changes that are primarily aimed at improving the user experience. We should prioritize API stability because I think it's good for the project to remain in a state where multiple maintainers are still familiar with the current APIs and usage patterns, but not at the expense of not being able to simplify the code in ways that improve long-term maintainability. Specifically, I would support changes like removing support for FlowProject inheritance for workflow composition since FlowGroups can do the same thing, or moving more of the submission/execution logic out of FlowProject into FlowGroup to better separate workflow logic from input data. I think those types of changes could substantially simplify our internals while only affecting a tiny fraction of users. |
@b-butler Please do not feel discouraged to argue your position just because there is some opposition. I intentionally waited with my initial response because I wanted to make sure that I actually have arguments against this change rather than just having a reactionary response, because I am personally used to (and thus emotionally attached to) the current API. @vyasr Thank you for clarifying. Fully in support of any sensible changes that simplify implementation with only very minor impact on the quality of user experience. That said, I believe that a trade-off between API quality and implementation clarity usually hints at an inconsistency in the translation of the user intent to our data and execution models. It means the API either oversimplifies the problem (i.e. lacks information) or does not faithfully represent the problem. This means that in the majority of cases, API quality and implementation clarity should be correlated with each other. |
@csadorf I appreciate the concern 😃. I haven't taken it personally, we are just debating opinions. My last comment was more me saying, given a sizable group of opposition you, @cbkerr , and @bdice that it is probably sensible to keep as is for now. It isn't as if your arguments carry no merit. I want to move on the next couple of releases, and given this is a bit more of a divided issue than some of the previous changes, I punting the decision it into the future is the best course of action. |
I agree with that statement when starting from a blank slate. That does not represent the current state of flow, though. The specific examples I mentioned above are a few of the cases where I think our internal execution model blurs lines between concepts or duplicates functionality. Part of the issue in my view is that the API is too tightly coupled to the CLI, and I think we would be better served by introducing an additional layer between the internals and the Python APIs that implement CLI functionality. I do agree that improving our internal implementation should, in the long term, also improve our API. |
It seems like you are ultimately agreeing with me. 😁 |
Feature description
Coming from https://github.com/glotzerlab/signac/wiki/GSoC-2022-Projects#2-revising-the-signac-flow-api, we can start splitting this up into chunks and move towards kind-of a decorator-less API.
Proposed solution
I think we do these in the sequence posted below (ordered in increasing complexity)
with_job
andcmd
through the operation decorator.make_group
. This involves dropping support of with_directives for groups in the process.I might have missed a few, please feel free to add this in the description itself.
The text was updated successfully, but these errors were encountered: