
Specifying TextEmbedding feature should automatically configure vLLM to support embeddings #400

Open
nstogner opened this issue Feb 12, 2025 · 1 comment

Comments

@nstogner
Contributor

nstogner commented Feb 12, 2025

A user requested the following functionality. They found it counter-intuitive that KubeAI did not do this automatically...

If a user specifies:

kind: Model
spec:
  features: ["TextEmbedding"]

vLLM should be configured with:

--task embed
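
For illustration, a complete manifest under this proposal could look like the following (a sketch; the apiVersion, model URL, and resourceProfile values are assumptions for the example, not taken from this issue):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-embed
spec:
  features: ["TextEmbedding"]
  engine: VLLM
  url: hf://intfloat/e5-mistral-7b-instruct
  resourceProfile: nvidia-gpu-l4:1
  # No --task flag in spec.args: KubeAI would append --task embed
  # automatically based on the TextEmbedding feature above.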

Docs from vLLM:

--task
Possible choices: auto, generate, embedding, embed, classify, score, reward

The task to use the model for. Each vLLM instance only supports one task, even if the same model can be used for multiple tasks. When the model only supports one task, "auto" can be used to select it; otherwise, you must specify explicitly which task to use.

Default: “auto”

https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
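
For reference, the equivalent direct invocation outside of KubeAI (with an example model name) would be:

vllm serve intfloat/e5-mistral-7b-instruct --task embed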

Based on these docs, we should also add validation logic to reject Models that combine spec.features: ["TextEmbedding", "TextGeneration"] when spec.engine: VLLM, since each vLLM instance supports only one task.
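
A minimal sketch of what that feature-to-task mapping and validation could look like on the controller side, assuming illustrative stand-in types rather than KubeAI's actual internals:

package model

import "fmt"

// Illustrative stand-ins for the real CRD types.
type ModelSpec struct {
	Engine   string
	Features []string
	Args     []string
}

// vllmTaskForFeature maps a KubeAI feature to a vLLM --task value.
var vllmTaskForFeature = map[string]string{
	"TextGeneration": "generate",
	"TextEmbedding":  "embed",
}

// vllmArgs validates the feature set and returns the args to pass to
// vLLM. Each vLLM instance supports exactly one task, so more than one
// feature is rejected when spec.engine is VLLM.
func vllmArgs(spec ModelSpec) ([]string, error) {
	if spec.Engine != "VLLM" {
		return nil, nil
	}
	if len(spec.Features) != 1 {
		return nil, fmt.Errorf("vLLM supports exactly one task per instance, got features %v", spec.Features)
	}
	task, ok := vllmTaskForFeature[spec.Features[0]]
	if !ok {
		return nil, fmt.Errorf("feature %q is not supported by the vLLM engine", spec.Features[0])
	}
	return append(spec.Args, "--task", task), nil
}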

@MRColorR
Contributor

I agree with you. In my opinion, there are two viable approaches:

  • Reject Models with Multiple Tasks: For engines that do not support multiple tasks on a single instance, reject Models that specify multiple tasks. This would require users to define a separate Model for each task, each passing the correct task flag or environment variable (see the manifest sketch after this list).

  • Continue to Support Multiple spec.features While Matching Them with Engine Capabilities: Configure KubeAI to accept multiple spec.features and set up the selected engine to handle all defined features. For vLLM, this would mean spawning a separate pod for each model feature, with the correct task flag or environment variable set.
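
For the first approach, a model that supports multiple tasks would simply be declared twice, once per feature (a sketch; the names, apiVersion, and URL are illustrative assumptions):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-generate
spec:
  engine: VLLM
  features: ["TextGeneration"]   # served with --task generate
  url: hf://intfloat/e5-mistral-7b-instruct
---
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-embed
spec:
  engine: VLLM
  features: ["TextEmbedding"]    # served with --task embed
  url: hf://intfloat/e5-mistral-7b-instruct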

Background: Many models can handle multiple tasks, and users may want to leverage a single model for several of them. In such cases, KubeAI could be enhanced to support more dynamic task allocation, possibly by:

  • Dynamic Task Allocation: Allow users to specify multiple tasks and dynamically allocate pods based on the required task, enabling flexible and efficient use of multi-task models.

  • Enhanced Validation Logic: Ensure that if spec.features includes multiple tasks, KubeAI can intelligently handle the allocation or reject the model if the configuration is invalid.
