Update crossby attribute #128

Open
wants to merge 8 commits into
base: bugfix/merge_stages


kels271828
Member

WHAT
Update crossby attribute to work the same way as groupby.

WHY

  • Declaring parameters to parallelize over via the crossby attribute (e.g., crossby=["param"]) is more explicit than just having param as a list of parameters in the stage config.
  • Users won't have to create their own custom stage config with _crossable_params.

HOW

  • Added stage.submodel_ids generator to fix a mypy error
  • Removed StageConfig._crossable_params private attribute
  • Changed groupby, crossby, subset_ids, and param_ids to tuples, changed default from None to an empty tuple
  • Added field_validator for Pipeline.groupby, Stage.groupby, and Stage.crossby to enforce unique items
  • Added crossby field to Stage and removed private attribute _crossby
  • Added model_post_init methods to RoverStage and SPxModStage to require groupby attribute but prevent crossby
  • Updated subsets and parameters utility modules for tuples
  • Updated relevant tests
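A minimal sketch of the uniqueness validator and the model_post_init check described above, assuming pydantic v2. The `Stage` and `GroupedOnlyStage` classes here are hypothetical stand-ins, not the actual classes in this PR, and the error messages are illustrative only.

```python
from typing import Any

from pydantic import BaseModel, field_validator


class Stage(BaseModel):
    """Hypothetical stand-in for the real Stage model."""

    name: str
    groupby: tuple[str, ...] = ()  # default is an empty tuple, not None
    crossby: tuple[str, ...] = ()

    @field_validator("groupby", "crossby", mode="after")
    @classmethod
    def unique_items(cls, value: tuple[str, ...]) -> tuple[str, ...]:
        # Tuples keep order but allow duplicates, so check explicitly.
        if len(set(value)) != len(value):
            raise ValueError("items must be unique")
        return value


class GroupedOnlyStage(Stage):
    """Hypothetical stage that requires groupby and disallows crossby."""

    def model_post_init(self, context: Any, /) -> None:
        if not self.groupby:
            raise ValueError(f"{type(self).__name__} requires groupby")
        if self.crossby:
            raise ValueError(f"{type(self).__name__} does not allow crossby")


# Lists passed by the user are coerced to tuples during validation.
stage = Stage(name="fit", groupby=["age", "sex"], crossby=["param"])
```

Catching `ValueError` covers both kinds of failure here, since pydantic's `ValidationError` is a subclass of `ValueError`.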

QUESTIONS

  • I changed groupby, crossby, subset_ids, and param_ids to tuples instead of sets, so there may be merge conflicts when this is merged with the UniqueList PR.
  • Relevant to both groupby and crossby: Do we want a way to allow users to prevent the use of either attribute?
    • If the pipeline has groupby, it gets passed to all stages. But we could have a stage (e.g., preprocessing) that should not have groupby. Right now, the workaround is to NOT set groupby on the pipeline, and then include any column that all parallel stages should share in each parallel stage's definition. What if a user could set groupby=False on a stage instead? Note: Unless we do something like this, any stage can have subsets, so all functions (except collect) will need subset_id as an argument.
    • Since I removed _crossable_params, a user can parallelize over any parameter in a stage's config. Do we want to restrict which parameters can be parallelized over, or prevent a stage from using crossby altogether? For RoverStage and SPxModStage, I added a check in model_post_init that disallows crossby, but there may be a better way. Unless a stage does something like this, any stage can have parameter sets, so all functions (except collect) will need param_id as an argument.
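
To make the second question concrete, here is a rough sketch of one way a stage could restrict crossby to an allow-list while still expanding the crossed parameters into parameter sets. Everything in it is an assumption for illustration: `create_param_sets`, the `crossable` argument, and the config keys are hypothetical and are not part of this PR.

```python
from itertools import product
from typing import Any, Optional


def create_param_sets(
    config: dict[str, Any],
    crossby: tuple[str, ...],
    crossable: Optional[tuple[str, ...]] = None,
) -> list[dict[str, Any]]:
    """Expand the crossby parameters into one dict per parameter set.

    `crossable` is a hypothetical allow-list; None means any parameter
    in the config may be crossed.
    """
    for name in crossby:
        if crossable is not None and name not in crossable:
            raise ValueError(f"parameter {name!r} cannot be used in crossby")
        if name not in config:
            raise KeyError(f"parameter {name!r} not found in config")
        if not isinstance(config[name], (list, tuple)):
            raise TypeError(f"parameter {name!r} must be a list or tuple")
    if not crossby:
        return []  # no crossby means no parameter sets
    combos = product(*(config[name] for name in crossby))
    return [
        {"param_id": param_id, **dict(zip(crossby, combo))}
        for param_id, combo in enumerate(combos)
    ]


# Hypothetical stage config: two crossed parameters and one fixed one.
config = {"alpha": [0.1, 1.0], "solver": ["a", "b"], "max_iter": 100}
param_sets = create_param_sets(config, crossby=("alpha", "solver"))
```

With an allow-list like this, a stage that should never be crossed could pass `crossable=()`, which would reject any non-empty crossby without a separate model_post_init check.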
