Limit max concurrent template runs #3627
Conversation
@stefannica I've changed this now to handle all run template requests sequentially with a configurable number of worker threads. These threads are, however, additional threads on top of the ones that are created/used by FastAPI. Do you think this is a good idea, or should I try to get the worker threads from the same pool while still somehow limiting the concurrency?
run_template_executor = ThreadPoolExecutor(
    max_workers=server_config().max_concurrent_template_runs
)
Global objects like these are problematic because they get created the moment you import the module and allocate resources even though the actual functionality isn't used.
I recommend the following alternative approach, similar to what is used in zenml/zen_server/utils.py:
- declare a nullable global variable for this
- have helper functions for initialization/cleanup. Call these from the FastAPI startup/shutdown hooks. Cleanup is particularly important, because you might want to wait for the threads to finish running or kill/join the running threads altogether before exiting.
- have a getter function for actual use
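The pattern suggested above can be sketched as follows. This is a minimal illustration, not ZenML's actual implementation: the function and variable names (`initialize_run_template_executor`, `_run_template_executor`, etc.) and the `max_workers` default are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Nullable module-level global: no resources are allocated at import time.
_run_template_executor: Optional[ThreadPoolExecutor] = None


def initialize_run_template_executor(max_workers: int = 4) -> None:
    """Create the executor. Call this from the FastAPI startup hook."""
    global _run_template_executor
    if _run_template_executor is None:
        _run_template_executor = ThreadPoolExecutor(max_workers=max_workers)


def get_run_template_executor() -> ThreadPoolExecutor:
    """Getter for actual use; fails loudly if initialization was skipped."""
    if _run_template_executor is None:
        raise RuntimeError("Run template executor not initialized")
    return _run_template_executor


def deinitialize_run_template_executor(wait: bool = True) -> None:
    """Tear down the executor. Call this from the FastAPI shutdown hook.

    With wait=True this blocks until queued template runs finish, which
    is the cleanup behavior the review comment asks for.
    """
    global _run_template_executor
    if _run_template_executor is not None:
        _run_template_executor.shutdown(wait=wait)
        _run_template_executor = None
```

The key property is that importing the module allocates nothing; threads only exist between the startup and shutdown hooks.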
src/zenml/config/server_config.py
Outdated
if data.get("max_concurrent_template_runs", None) is None:
    # Block a maximum of 1/4 of the thread pool size for concurrent
    # template runs
    thread_pool_size = data.get(
        "thread_pool_size", DEFAULT_ZENML_SERVER_THREAD_POOL_SIZE
    )
    data["max_concurrent_template_runs"] = int(thread_pool_size) // 4
If this is a "before" validator, does that mean that these values might be strings?
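The question above is the reason for the `int(...)` coercion in the snippet: in a Pydantic "before" validator, values have not yet been through field validation, so settings read from environment variables can still be raw strings. A minimal sketch, assuming Pydantic v2 and an illustrative stand-in for the real default constant:

```python
from typing import Any, Dict

from pydantic import BaseModel, model_validator

# Illustrative stand-in for DEFAULT_ZENML_SERVER_THREAD_POOL_SIZE
DEFAULT_THREAD_POOL_SIZE = 40


class ServerConfig(BaseModel):
    thread_pool_size: int = DEFAULT_THREAD_POOL_SIZE
    max_concurrent_template_runs: int = 0

    @model_validator(mode="before")
    @classmethod
    def _derive_defaults(cls, data: Dict[str, Any]) -> Dict[str, Any]:
        # In a "before" validator the input is still raw: values sourced
        # from environment variables arrive as strings, hence int(...)
        # before the integer division.
        if data.get("max_concurrent_template_runs", None) is None:
            thread_pool_size = data.get(
                "thread_pool_size", DEFAULT_THREAD_POOL_SIZE
            )
            data["max_concurrent_template_runs"] = int(thread_pool_size) // 4
        return data
```

Pydantic's own field validation later coerces `thread_pool_size` itself, but the "before" hook runs first and must handle strings on its own.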
else:
    run_template_executor.submit(_task_with_analytics_and_error_handling)
The number of threads is limited, but not the queue size. This means that you can still run into a situation where the server is flooded with requests and eventually croak due to OOM.
I recommend you also cap the queue size to match the thread pool size. This is a good way to create back-pressure and actually make the client wait (and/or eventually time out) if it's creating too many parallel run template requests instead of killing the server.
Yes, this means that FastAPI request threads will eventually be blocked waiting for the run template threads to complete. I think this is perfectly acceptable, as it gives server admins a knob that they can properly calibrate without compromising the server's reliability.
One more thing you can do is to make the submit requests cancellable: if the client times out, FastAPI also has the option to notify you that the request timed out (see https://fastapiexpert.com/blog/category/fastapi/#understanding-client-disconnection-in-fastapi). If you don't handle client disconnects for jobs like these, they will just keep on running. I.e. you can run await request.is_disconnected() to check if the request was cancelled before you execute anything in the worker thread.
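The back-pressure idea described above can be sketched with a semaphore wrapper, since `ThreadPoolExecutor`'s internal work queue is unbounded. This is a hypothetical illustration of the technique, not code from the PR; the class name `BoundedExecutor` and its parameters are assumptions.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class BoundedExecutor:
    """ThreadPoolExecutor wrapper that blocks submitters once
    max_workers + max_queued tasks are in flight, creating back-pressure
    instead of letting an unbounded work queue grow until OOM."""

    def __init__(self, max_workers: int, max_queued: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.BoundedSemaphore(max_workers + max_queued)

    def submit(
        self, fn: Callable[..., Any], *args: Any, **kwargs: Any
    ) -> Future:
        # Blocks the calling (FastAPI request) thread when saturated.
        self._slots.acquire()
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._slots.release()
            raise
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self, wait: bool = True) -> None:
        self._executor.shutdown(wait=wait)
```

A blocked submitter is exactly the "make the client wait (and/or eventually time out)" behavior the comment recommends; the cap is the knob a server admin can calibrate.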
src/zenml/config/server_config.py
Outdated
-    max_concurrent_template_runs: The maximum number of concurrent template
-        runs that can be executed on the server. If not specified, the
-        default value of 1/4 of the thread pool size will be used.
+    max_concurrent_template_runs: The maximum number of concurrent template
+        runs that can be executed on the server.
The last part is no longer accurate, is it?
def shutdown(self, **kwargs: Any) -> None:
    """Shutdown the executor.

    Args:
        **kwargs: Keyword arguments to pass to the shutdown method of the
            executor.
    """
    self._executor.shutdown(**kwargs)
This isn't called anywhere. Are you missing the cleanup/deinitialization part?
Added the call in the FastAPI shutdown event handler.
    return _run_template_executor


def initialize_run_template_executor() -> None:
What about a de-initialize/cleanup function?
Added this in the FastAPI shutdown event handler.
Describe changes
This PR limits the number of concurrent template runs.
Pre-requisites
Please ensure you have done the following:
- My branch is based on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing your branch to develop.
Types of changes