A user requested the following functionality. They found it counter-intuitive that KubeAI did not automatically do this...
If a user specifies:
```yaml
kind: Model
spec:
  features: ["TextEmbedding"]
```

vLLM should be configured with:

```
--task embed
```
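The requested behavior boils down to a lookup from KubeAI feature names to vLLM `--task` values. A sketch, where the function name and the exact mapping are assumptions for illustration (only `TextEmbedding` → `embed` is stated in the request):

```go
package main

import "fmt"

// vllmTaskFor maps a KubeAI spec.features value to a vLLM --task
// argument. The mapping below is a hypothetical example, not KubeAI's
// actual code.
func vllmTaskFor(feature string) (string, bool) {
	tasks := map[string]string{
		"TextGeneration": "generate",
		"TextEmbedding":  "embed",
	}
	task, ok := tasks[feature]
	return task, ok
}

func main() {
	if task, ok := vllmTaskFor("TextEmbedding"); ok {
		fmt.Printf("--task %s\n", task) // prints: --task embed
	}
}
```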
Docs from vLLM (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html):
> `--task`
>
> Possible choices: `auto`, `generate`, `embedding`, `embed`, `classify`, `score`, `reward`
>
> The task to use the model for. Each vLLM instance only supports one task, even if the same model can be used for multiple tasks. When the model only supports one task, "auto" can be used to select it; otherwise, you must specify explicitly which task to use.
>
> Default: `"auto"`
Based on these docs, we should also add validation logic to reject Models that set `spec.features: ["TextCompletion", "TextGeneration"]` when `spec.engine: VLLM`.
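A minimal sketch of that validation rule, using simplified stand-in types rather than KubeAI's actual API structs (the type and function names here are illustrative assumptions):

```go
package main

import (
	"errors"
	"fmt"
)

// ModelSpec is a simplified stand-in for KubeAI's Model spec fields
// relevant to this check.
type ModelSpec struct {
	Engine   string
	Features []string
}

// validateFeatures rejects Models that declare more than one feature
// when the engine is vLLM, since each vLLM instance supports one task.
func validateFeatures(spec ModelSpec) error {
	if spec.Engine == "VLLM" && len(spec.Features) > 1 {
		return errors.New("engine VLLM supports only one feature per Model; define a separate Model per feature")
	}
	return nil
}

func main() {
	err := validateFeatures(ModelSpec{
		Engine:   "VLLM",
		Features: []string{"TextCompletion", "TextGeneration"},
	})
	fmt.Println(err) // prints the rejection error
}
```

In a real admission webhook or reconcile loop this error would surface as a status condition or a rejected create/update, but the core check is this simple.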
I agree with you. In my personal opinion there are two viable approaches:
1. **Reject Models with multiple tasks.** For engines that do not support multiple tasks on a single instance, reject Models that specify more than one. Users would then define a separate Model for each task, each passing the correct task flag or environment variable.
2. **Support multiple `spec.features` by matching them to engine capabilities.** Let a Model declare multiple `spec.features` and configure the selected engine to serve all of them. For vLLM, this would mean spawning a separate Pod for each declared feature, each with the correct task flag or environment variable set.
Background: Many models can handle multiple tasks, and users may want to use a single model for several of them. In such cases, KubeAI could be enhanced to support more dynamic task allocation, possibly by:

- **Dynamic task allocation:** allow users to specify multiple tasks and dynamically allocate Pods based on the requested task, enabling flexible and efficient use of multi-task models.
- **Enhanced validation logic:** ensure that if `spec.features` includes multiple tasks, KubeAI can intelligently handle the allocation, or reject the Model if the configuration is invalid.
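The second approach (one vLLM Pod per declared feature) can be sketched as a fan-out from a Model's feature list to per-Pod vLLM arguments. The mapping, function names, and Pod-naming scheme below are illustrative assumptions, not KubeAI's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical feature-to-task mapping (see the vLLM --task choices).
var taskForFeature = map[string]string{
	"TextGeneration": "generate",
	"TextEmbedding":  "embed",
}

// podArgsPerFeature returns, for each supported feature of a Model,
// a Pod name and the vLLM args that Pod would run with.
func podArgsPerFeature(modelName string, features []string) map[string][]string {
	pods := map[string][]string{}
	for _, f := range features {
		task, ok := taskForFeature[f]
		if !ok {
			continue // unsupported feature; real validation would reject it
		}
		podName := fmt.Sprintf("%s-%s", modelName, strings.ToLower(task))
		pods[podName] = []string{"--task", task}
	}
	return pods
}

func main() {
	pods := podArgsPerFeature("llama", []string{"TextGeneration", "TextEmbedding"})
	for name, args := range pods {
		fmt.Println(name, args)
	}
}
```

The router would then dispatch each request to the Pod whose task matches the endpoint being called (e.g. embeddings requests to the `embed` Pod).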