
feat: added option to select default inferencing runtime #2893


Merged: 7 commits merged into containers:main on May 17, 2025

Conversation

Contributor

@gastoner gastoner commented Apr 16, 2025

What does this PR do?

Adds a setting to the preferences to choose the default inference runtime
Implements the mockup in the issue

Screenshot / video of UI

Screencast_20250428_150354.webm

What issues does this PR fix or reference?

Closes #2611

How to test this PR?

Select a different inference runtime -> the model list should be filtered accordingly
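
For orientation, here is a minimal sketch of the behavior this test describes, with assumed type and field names (the actual AI Lab code may differ): each model is tagged with the runtime that can serve it, and the list shown to the user is narrowed to the runtime selected in the preferences.

```ts
// Hypothetical model shape; the real AI Lab types are richer than this.
interface ModelInfo {
  id: string;
  name: string;
  backend: string; // e.g. 'llamacpp', 'whispercpp', 'openvino'
}

// Keep only the models served by the runtime chosen in the preferences.
function filterModelsByRuntime(models: ModelInfo[], runtime: string): ModelInfo[] {
  return models.filter(model => model.backend === runtime);
}
```

Switching the runtime in the settings and re-running this kind of filter is what the manual test above exercises.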

@gastoner gastoner changed the title feat: added option to select default interferencing runtime feat: added option to select default inferencing runtime Apr 16, 2025
@gastoner gastoner requested review from slemeur and Copilot April 16, 2025 15:20

@Copilot Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.

Files not reviewed (1)
  • packages/backend/package.json: Language not supported

@gastoner gastoner force-pushed the default_interferencing_runtime branch 5 times, most recently from 105c01f to 26409bf Compare April 28, 2025 12:59
@gastoner gastoner marked this pull request as ready for review April 28, 2025 13:06
@gastoner gastoner requested review from benoitf, jeffmaury and a team as code owners April 28, 2025 13:06
@gastoner gastoner requested review from feloy and axel7083 April 28, 2025 13:06
Contributor

@axel7083 axel7083 left a comment


We cannot have a single default across inference servers that do not deal with the same kind of files; for example, you cannot default to llamacpp instead of whispercpp, since a model handled by whispercpp cannot be handled by llamacpp.

This needs to be specific to GGUF models or to safetensors. The choice should be llamacpp, openvino, or vllm.

Meaning this PR should be delayed until after the following PRs.

In the meantime we cannot have a default, as the inference provider depends on the model, not the other way around.
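
To illustrate the objection (with assumed names, not the project's actual types): the model's packaging determines which runtimes can serve it, so a single global default cannot be valid for every model.

```ts
type InferenceRuntime = 'llamacpp' | 'whispercpp' | 'openvino' | 'vllm';

// Hypothetical mapping from model packaging to compatible runtimes.
// A global default of 'llamacpp' is meaningless for a whisper.cpp model,
// which is exactly the incompatibility described above.
const runtimesByFormat: Record<string, InferenceRuntime[]> = {
  'gguf-llm': ['llamacpp'],        // LLM weights packaged as GGUF
  'gguf-whisper': ['whispercpp'],  // whisper models, not usable by llamacpp
  safetensors: ['openvino', 'vllm'],
};
```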

@gastoner gastoner marked this pull request as draft April 30, 2025 07:15
@gastoner gastoner force-pushed the default_interferencing_runtime branch from 964fb7f to df86e95 Compare May 6, 2025 07:17
@gastoner gastoner requested a review from jeffmaury May 12, 2025 08:25
@gastoner gastoner force-pushed the default_interferencing_runtime branch from 1b11247 to df86e95 Compare May 12, 2025 09:42
@gastoner gastoner force-pushed the default_interferencing_runtime branch from df86e95 to b790b86 Compare May 12, 2025 13:46
@gastoner gastoner marked this pull request as ready for review May 12, 2025 13:51
@gastoner gastoner requested a review from axel7083 May 13, 2025 05:58
Collaborator

@jeffmaury jeffmaury left a comment


I did not configure anything, but when I go to the settings page it tells me that llama-cpp is the preferred runtime (I would expect none), and if I go to the recipes page I see only one recipe (object detection).
So I think we are missing the "no value" case.
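
A minimal sketch of the missing branch being described, with hypothetical names: when no default runtime has been configured, nothing should be filtered and no runtime should be reported as preferred.

```ts
// "No value" case: an unset preference must not silently behave like llama-cpp.
function applyDefaultRuntimeFilter<T extends { backend: string }>(
  items: T[],
  defaultRuntime: string | undefined,
): T[] {
  if (!defaultRuntime) return items; // nothing configured: show everything
  return items.filter(item => item.backend === defaultRuntime);
}
```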

@gastoner gastoner requested a review from jeffmaury May 14, 2025 08:36
Collaborator

@jeffmaury jeffmaury left a comment


LGTM.

However, I thought you said you would add a select in ModelSelect so that we can select models even if they are not for the preferred runtime?

@gastoner
Contributor Author

LGTM.

However, I thought you said you would add a select in ModelSelect so that we can select models even if they are not for the preferred runtime?

In ModelSelect there is a mechanism for filtering the models based on the selected recipe, i.e. the runtime for that recipe.

@jeffmaury
Collaborator

LGTM.
However, I thought you said you would add a select in ModelSelect so that we can select models even if they are not for the preferred runtime?

In ModelSelect there is a mechanism for filtering the models based on the selected recipe, i.e. the runtime for that recipe.

I think ModelSelect is used in other contexts?

@gastoner
Contributor Author

LGTM.
However, I thought you said you would add a select in ModelSelect so that we can select models even if they are not for the preferred runtime?

In ModelSelect there is a mechanism for filtering the models based on the selected recipe, i.e. the runtime for that recipe.

I think ModelSelect is used in other contexts?

Yes, it is used in 3 or 4 places I think. Should I remove the filter from ModelSelect then?

@gastoner
Contributor Author

Should I now also add the openvino filter, since the openvino PR is now closed?

@gastoner gastoner force-pushed the default_interferencing_runtime branch from 8a848a6 to aa62f29 Compare May 15, 2025 14:30
Collaborator

@jeffmaury jeffmaury left a comment


LGTM

@gastoner
Contributor Author

@axel7083 can you PTAL?

@slemeur
Contributor

slemeur commented May 16, 2025

I think the behavior is fine. And we have a couple of PRs that are waiting to be merged after this one, which will make the experience more logical.

What we are proposing to the user is the ability to use different inferencing runtimes with AI Lab. With this ticket, the entire scope of AI Lab will default to a specific runtime.
For example, a user who defaults to Whisper will get the list of recipes filtered to those compatible with Whisper, but will still be able to remove the filter and see the entire list.

In other situations (and probably the most common case), users are most likely to change the inferencing runtime at the moment they start one: for example, when starting a recipe, serving a model, or starting a playground. We have the corresponding tickets for that: #2612 and #2613.

So the current behavior is good from my point of view.

@axel7083 axel7083 dismissed their stale review May 16, 2025 13:59

Dismissing my requested changes as @slemeur agrees to go ahead with this change.


My personal opinion is that this feature will not be used, because it is a configuration buried deep in the settings, and having it increases the complexity of AI Lab for QE and maintenance.

@gastoner gastoner merged commit f13dddc into containers:main May 17, 2025
7 checks passed