feat: added option to select default inferencing runtime #2893
Conversation
Copilot wasn't able to review any files in this pull request.
Files not reviewed (1)
- packages/backend/package.json: Language not supported
Force-pushed from 105c01f to 26409bf.
We cannot have a single default across inference servers that do not handle the same kind of files; for example, you cannot default to llamacpp instead of whispercpp, because a model handled by whispercpp cannot be handled by llamacpp.
The default needs to be specific to GGUF models, or to safetensors. The choice should be between llamacpp, openvino, and vllm.
This means this PR should be delayed until after the following PRs.
In the meantime we cannot have a default, as the inference provider depends on the model, not the other way around.
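To make the constraint concrete, here is a minimal sketch of a per-format mapping (the type and constant names, and the exact format-to-runtime assignments, are illustrative assumptions, not the extension's actual data model):

```ts
// Illustrative sketch only: names and mappings are assumptions.
type ModelFormat = 'gguf' | 'safetensors';
type InferenceRuntime = 'llama-cpp' | 'whisper-cpp' | 'openvino' | 'vllm';

// A default runtime only makes sense per model format: a runtime can only be
// offered as a default for formats it can actually load.
const RUNTIMES_BY_FORMAT: Record<ModelFormat, InferenceRuntime[]> = {
  gguf: ['llama-cpp'],
  safetensors: ['openvino', 'vllm'],
};

function candidateDefaults(format: ModelFormat): InferenceRuntime[] {
  return RUNTIMES_BY_FORMAT[format] ?? [];
}
```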
Force-pushed from 964fb7f to df86e95.
Force-pushed from 1b11247 to df86e95.
Force-pushed from df86e95 to b790b86.
I did not configure anything, but when I go to the Settings page it tells me that llama-cpp is the preferred runtime (I would expect it to be none), and when I go to the Recipes page I see only one recipe (object detection).
So I think we are missing the no-value case.
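A minimal sketch of handling the no-value case on the backend side, assuming the preference is read through the podman-desktop configuration API (the section and key names here are assumptions):

```ts
import { configuration } from '@podman-desktop/api';

// 'ai-lab' and 'inferenceRuntime' are assumed names for illustration.
// When the user has not configured anything, treat the preference as 'none'
// so that neither the Settings page nor the Recipes page filters anything.
const preferredRuntime =
  configuration.getConfiguration('ai-lab').get<string>('inferenceRuntime') ?? 'none';

const shouldFilterByRuntime = preferredRuntime !== 'none';
```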
LGTM.
However, I thought you said you would add a select in ModelSelect so that we can select models even if they are not for the preferred runtime?
In ModelSelect there is a mechanism that filters the models based on the selected recipe, i.e. the runtime for that recipe.
I think ModelSelect is used in other contexts?
Yes, it is used in 3 or 4 places I think. Should I remove the filter from ModelSelect then?
Should I now also add the openvino filter, since the openvino PR is now closed?
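For reference, a rough sketch of the filtering being discussed (the helper and field names are hypothetical): filter the list by the recipe's backend, but fall back to the full list so that models for other runtimes remain selectable:

```ts
interface ModelInfo {
  id: string;
  name: string;
  backend?: string; // runtime serving this model, e.g. 'llama-cpp' or 'openvino'
}

// Hypothetical helper: keep the recipe-based filter, but never return an empty
// list, so the user can still pick a model targeting another runtime.
function selectableModels(models: ModelInfo[], recipeBackend?: string): ModelInfo[] {
  if (!recipeBackend) return models;
  const filtered = models.filter(model => model.backend === recipeBackend);
  return filtered.length > 0 ? filtered : models;
}
```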
Force-pushed from 8a848a6 to aa62f29.
LGTM
@axel7083 can you PTAL?
I think the behavior is fine, and we have a couple of PRs waiting to be merged after this one, which will make the experience more logical. What we are proposing to the user is to use different inferencing runtimes with AI Lab. With this ticket, the entire scope of AI Lab will default to a specific runtime. In other situations (and probably the most common case), users will most likely change the inferencing runtime at the time they start one: for example, when starting a recipe, serving a model, or starting a playground. We have the corresponding tickets for that: #2612, #2613. So the current behavior is good from my point of view.
Dismissing my change request as @slemeur agrees to go ahead with this change.
My personal opinion is that this feature will not be used, because it is a configuration deep in the settings, and having it increases the complexity of AI Lab for QE and maintenance.
What does this PR do?
Adds a setting to Preferences to choose the default inference runtime.
Implements the mockup in the issue.
Screenshot / video of UI
Screencast_20250428_150354.webm
What issues does this PR fix or reference?
Closes #2611
How to test this PR?
Select a different inference runtime -> models should be filtered accordingly.
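A rough sketch of how that expectation could be checked in a unit test, assuming vitest and the hypothetical selectableModels helper from the sketch above:

```ts
import { expect, test } from 'vitest';
import { selectableModels } from './modelSelect'; // hypothetical module path

test('models are filtered by the selected inference runtime', () => {
  const models = [
    { id: 'a', name: 'granite', backend: 'llama-cpp' },
    { id: 'b', name: 'resnet', backend: 'openvino' },
  ];
  // Only the openvino model remains when openvino is the selected runtime.
  expect(selectableModels(models, 'openvino').map(m => m.id)).toEqual(['b']);
  // No runtime selected: nothing is filtered out.
  expect(selectableModels(models, undefined)).toHaveLength(2);
});
```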