
Support custom chat templates #410

Open
nstogner opened this issue Feb 15, 2025 · 3 comments

Comments

@nstogner
Contributor

Use cases:

  1. Some models do not provide chat completion templates, which means that /chat won't function.
    a. vLLM log: `WARNING 07-18 22:59:10 serving_chat.py:347] No chat template provided. Chat API will not work`
    b. See facebook opt-125m model no longer works with chat completion #404
  2. Some users might want to customize chat templates.

Considerations:

  1. vLLM allows specifying a chat template in Jinja format, see docs.
  2. Ollama supports specifying a Modelfile, which includes a template.
    a. See docs.
    b. See related discussion on Discord about setting max context via a Modelfile. Opinion (@nstogner): it might be best to generate a Modelfile (template and other options) from a KubeAI Model spec instead of allowing users to specify a Modelfile directly; this lets KubeAI abstract away some of the serving-engine-specific formats.
    c. NOTE: the Ollama Modelfile template section uses a Go template format (see docs).
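For reference, a Modelfile with a custom template looks roughly like the sketch below (adapted from the ChatML-style examples in Ollama's documentation; the base model name and the `.System`/`.Prompt` variables shown are illustrative, not part of any KubeAI spec):

```
FROM llama3
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```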
@nstogner nstogner changed the title Support custom chat completion templates Support custom chat templates Feb 15, 2025
@samos123
Contributor

Also see #243

@nstogner
Contributor Author

Thanks for linking. So #243 introduces two approaches: inline in the Model spec, and via reference... I think we should start with inline:

kind: Model
spec:
  chatTemplate: |
    {% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}
    {% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}
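For illustration, this template can be rendered with plain `jinja2` to see the prompt it produces (a sketch; the message list and the `add_generation_prompt` flag below are example inputs, not part of the proposed spec):

```python
# Render the proposed ChatML-style chat template with jinja2 (sketch).
from jinja2 import Template

# Raw strings keep the literal \n sequences exactly as they appear in the
# YAML above; Jinja decodes them to newlines inside its string literals.
CHAT_TEMPLATE = (
    r"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}"
    r"{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}"
    r"{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

messages = [{"role": "user", "content": "Hello"}]
prompt = Template(CHAT_TEMPLATE).render(messages=messages, add_generation_prompt=True)
print(repr(prompt))
# '<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n'
```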

If we get an additional feature request, we can expand to allow specifying via reference using a url format similar to how .spec.url and .spec.adapters[].url work today. For this, I would prefer to use a dedicated chatTemplateURL field to avoid overloading the chatTemplate field which needs to support multi-line strings.

kind: Model
spec:
  chatTemplateURL: cm://name-of-configmap
  # And other schemes:
  # chatTemplateURL: s3://bucket/my-template.jinja
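A minimal sketch of how a controller might dispatch on the URL scheme (the helper name and the set of supported schemes are assumptions for illustration, not an existing KubeAI API):

```python
# Hypothetical helper: split a chatTemplateURL into its scheme and reference.
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"cm", "s3"}  # cm:// -> ConfigMap, s3:// -> object storage

def parse_chat_template_url(url: str) -> tuple[str, str]:
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported chatTemplateURL scheme: {parsed.scheme!r}")
    # netloc is the ConfigMap name (cm://) or bucket (s3://); path is the object key.
    return parsed.scheme, parsed.netloc + parsed.path

print(parse_chat_template_url("cm://name-of-configmap"))         # ('cm', 'name-of-configmap')
print(parse_chat_template_url("s3://bucket/my-template.jinja"))  # ('s3', 'bucket/my-template.jinja')
```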

@samos123
Contributor

I think I prefer an approach where users can provide arbitrary files, like this. That makes it future-proof for any engine:

kind: Model
spec:
  files:
    /mnt/chat-template: multi-line-string of chat-template
  args:
    - --chat-template-file=/mnt/chat-template
