SWELL or CYLC too slow in starting tasks #497

Open
rtodling opened this issue Feb 6, 2025 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@rtodling
Contributor

rtodling commented Feb 6, 2025

I don't know if anyone has noticed this, but I believe either swell or cylc is slow in starting up tasks. For example, if I were to write a c-shell program that looked at the directory where the obs files are and copied them over to a given location, I guarantee you it would be considerably faster. There seems to be a delay before tasks start, and it applies to every task. I don't know if this is cylc evaluating what the flow entails and then figuring out what it can do in parallel, or something else.

All I know is that copying the obs and bkg files into a work directory and constructing the yaml for a Var run should be a much faster process than it is now. Perhaps there is a way to time the delay between the launching of the suite and the time when things actually start happening.

@ashiklom ashiklom added the enhancement New feature or request label Feb 7, 2025
@ashiklom
Collaborator

ashiklom commented Feb 7, 2025

Thanks Ricardo. Agreed that this kind of thing should be basically instantaneous. We had a discussion of this at the Swell development meeting.

First thing we should do is confirm that cylc itself is the culprit. Best way to do this is check the task timings (which we already capture) to make sure that the tasks themselves execute quickly.

https://github.com/GEOS-ESM/swell/blob/develop/src/swell/tasks/base/task_base.py#L323-L329
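As a rough illustration of that check, here is a minimal sketch that compares a task's own measured execution time against the wall time the scheduler saw for it. The function name `startup_overhead` and the idea of reading submission and completion timestamps from the scheduler are assumptions for illustration; this is not Swell's timing code from the link above.

```python
from datetime import datetime


def startup_overhead(submitted: datetime, finished: datetime,
                     execute_seconds: float) -> float:
    """Seconds of wall time not accounted for by the task's own execution.

    submitted / finished: when the scheduler submitted the task and when it
    reported completion (hypothetical inputs; where they come from is up to
    the caller).
    execute_seconds: the duration the task measured for its own work.
    """
    wall_seconds = (finished - submitted).total_seconds()
    return wall_seconds - execute_seconds
```

If the result is large relative to `execute_seconds`, the time is being spent around the task rather than inside it, which points away from the task code itself.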

Second, we should confirm that cylc doesn't introduce significant overhead into super simple tasks. I suggest setting up some trivial cylc workflows with a bunch of echo "hello world"-style tasks, just to make sure that these trivial workflows run as instantaneously as they should. As a bonus, run these same experiments with the module load commands Swell's cylc binary expects; that will give us some clues about the overhead of loading those modules every time.
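Before involving cylc at all, a baseline for the module-loading part can be measured directly. The sketch below times a trivial command in a fresh bash process with and without a setup command in front of it. The helper names `median_runtime` and `setup_cost` are hypothetical, and the setup command is left to the caller because the exact module load lines are not given here.

```python
import statistics
import subprocess
import time


def median_runtime(shell_command: str, repeats: int = 5) -> float:
    """Median wall-clock seconds to run shell_command in a fresh bash process."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            ["bash", "-c", shell_command],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def setup_cost(setup_command: str,
               task_command: str = 'echo "hello world"',
               repeats: int = 5) -> float:
    """Extra seconds added by running setup_command before task_command."""
    bare = median_runtime(task_command, repeats)
    with_setup = median_runtime(f"{setup_command} && {task_command}", repeats)
    return with_setup - bare
```

Calling `setup_cost(...)` with the real module load commands would give a per-task estimate to compare against the delay seen in the trivial cylc workflows. Note that `module` is usually a shell function, so the setup command may first need to source the site's module initialization script.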

Assuming neither of those is the issue, that leaves us with some more complicated issues. One likely culprit we discussed is that the cylc binary loads the full set of required modules every time, which introduces nontrivial overhead. If that's the bottleneck, we need some way to load the modules once, capture the resulting environment, and then propagate that environment (which is basically free) via something like subprocess.run(..., env=<captured_env>) in the places where we actually execute cylc tasks.
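A minimal sketch of the load-once, reuse-everywhere idea, assuming the setup can be expressed as a bash command string. The names `capture_env` and `run_with_env` are hypothetical and not part of Swell or cylc; only `subprocess.run(..., env=...)` comes from the discussion above.

```python
import json
import shlex
import subprocess
import sys


def capture_env(setup_command: str) -> dict:
    """Run setup_command in bash once and return the environment it leaves behind."""
    # After the setup finishes, a child Python process dumps its inherited
    # environment as JSON; the setup's own output is discarded.
    dump = "import json, os; print(json.dumps(dict(os.environ)))"
    script = (
        f"{{ {setup_command}\n}} >/dev/null 2>&1 && "
        f"{shlex.quote(sys.executable)} -c {shlex.quote(dump)}"
    )
    result = subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


def run_with_env(command: list, env: dict) -> subprocess.CompletedProcess:
    """Run command with a previously captured environment; no setup is repeated."""
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
```

The captured dictionary could be cached (for example, written to a JSON file once per experiment) so that each task launch pays only the cost of reading it. Only exported variables survive this way; shell functions and aliases defined by the setup do not.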


3 participants