
Specifying TextEmbedding feature should automatically configure vLLM to support embeddings #400

Open
nstogner opened this issue Feb 12, 2025 · 1 comment

Comments

@nstogner
Contributor

nstogner commented Feb 12, 2025

A user requested the following functionality. They found it counter-intuitive that KubeAI did not do this automatically...

If a user specifies:

kind: Model
spec:
  features: ["TextEmbedding"]

vLLM should be configured with:

--task embed
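
For illustration, a complete manifest under this proposal could look like the following (a sketch; the apiVersion, model URL, and resourceProfile values are assumptions for the example, not taken from this issue):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-embed
spec:
  features: ["TextEmbedding"]
  engine: VLLM
  url: hf://intfloat/e5-mistral-7b-instruct
  resourceProfile: nvidia-gpu-l4:1
  # No --task flag in spec.args: KubeAI would append --task embed
  # automatically based on the TextEmbedding feature above.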

Docs from vLLM:

--task
Possible choices: auto, generate, embedding, embed, classify, score, reward

The task to use the model for. Each vLLM instance only supports one task, even if the same model can be used for multiple tasks. When the model only supports one task, "auto" can be used to select it; otherwise, you must specify explicitly which task to use.

Default: “auto”

https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
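
For reference, the equivalent direct invocation outside of KubeAI (with an example model name) would be:

vllm serve intfloat/e5-mistral-7b-instruct --task embed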

Based on these docs, we should also add validation logic to reject Models that combine spec.features: ["TextEmbedding", "TextGeneration"] when spec.engine: VLLM, since each vLLM instance supports only one task.
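
A minimal sketch of what that feature-to-task mapping and validation could look like on the controller side, assuming illustrative stand-in types rather than KubeAI's actual internals:

package model

import "fmt"

// Illustrative stand-ins for the real CRD types.
type ModelSpec struct {
	Engine   string
	Features []string
	Args     []string
}

// vllmTaskForFeature maps a KubeAI feature to a vLLM --task value.
var vllmTaskForFeature = map[string]string{
	"TextGeneration": "generate",
	"TextEmbedding":  "embed",
}

// vllmArgs validates the feature set and returns the args to pass to
// vLLM. Each vLLM instance supports exactly one task, so more than one
// feature is rejected when spec.engine is VLLM.
func vllmArgs(spec ModelSpec) ([]string, error) {
	if spec.Engine != "VLLM" {
		return nil, nil
	}
	if len(spec.Features) != 1 {
		return nil, fmt.Errorf("vLLM supports exactly one task per instance, got features %v", spec.Features)
	}
	task, ok := vllmTaskForFeature[spec.Features[0]]
	if !ok {
		return nil, fmt.Errorf("feature %q is not supported by the vLLM engine", spec.Features[0])
	}
	return append(spec.Args, "--task", task), nil
}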

@MRColorR
Contributor

I agree with you. In my opinion, there are two viable approaches:

  • Reject Models with Multiple Tasks: For engines that do not support multiple tasks on a single instance, reject Models that specify multiple tasks. This would require users to define a separate Model for each task, each passing the correct task flag or environment variable (see the manifest sketch after this list).

  • Continue to Support Multiple spec.features While Matching Them with Engine Capabilities: Configure KubeAI to accept multiple spec.features and set up the selected engine to handle all defined features. For vLLM, this would mean spawning a separate pod for each model feature, with the correct task flag or environment variable set.
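
For the first approach, a model that supports multiple tasks would simply be declared twice, once per feature (a sketch; the names, apiVersion, and URL are illustrative assumptions):

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-generate
spec:
  engine: VLLM
  features: ["TextGeneration"]   # served with --task generate
  url: hf://intfloat/e5-mistral-7b-instruct
---
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: e5-mistral-embed
spec:
  engine: VLLM
  features: ["TextEmbedding"]    # served with --task embed
  url: hf://intfloat/e5-mistral-7b-instruct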

Background: Many models can handle multiple tasks, and users may want to leverage a single model for several of them. In such cases, KubeAI could be enhanced to support more dynamic task allocation, possibly by:

  • Dynamic Task Allocation: Allow users to specify multiple tasks and dynamically allocate pods based on the required task, enabling flexible and efficient use of multi-task models.

  • Enhanced Validation Logic: Ensure that if spec.features includes multiple tasks, KubeAI can intelligently handle the allocation or reject the model if the configuration is invalid.
