
Support custom chat templates #410

Open
nstogner opened this issue Feb 15, 2025 · 3 comments

Comments

@nstogner
Contributor

Use cases:

  1. Some models do not provide chat completion templates, which means that /chat won't function.
    a. vLLM log: `WARNING 07-18 22:59:10 serving_chat.py:347] No chat template provided. Chat API will not work`
    b. See facebook opt-125m model no longer works with chat completion #404
  2. Some users might want to customize chat templates.

Considerations:

  1. vLLM allows specifying a chat template in Jinja format, see docs.
  2. Ollama supports specifying a Modelfile, which includes a template.
    a. See docs.
    b. See related discussion on Discord about setting max context via a Modelfile. Opinion (@nstogner): it might be best to generate a Modelfile (template and other options) from a KubeAI Model spec instead of allowing users to specify a Modelfile directly; this lets KubeAI abstract away some of the serving-engine-specific formats.
    c. NOTE: the Ollama Modelfile template section uses a Go template format (see docs).
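For reference, a Modelfile with a custom template looks roughly like the sketch below (adapted from the ChatML-style examples in Ollama's documentation; the base model name and the `.System`/`.Prompt` variables shown are illustrative, not part of any KubeAI spec):

```
FROM llama3
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```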
@nstogner nstogner changed the title Support custom chat completion templates Support custom chat templates Feb 15, 2025
@samos123
Contributor

Also see #243

@nstogner
Contributor Author

Thanks for linking. So #243 introduces two approaches: inline in the Model spec, and via reference... I think we should start with inline:

kind: Model
spec:
  chatTemplate: |
    {% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}
    {% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}
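For illustration, this template can be rendered with plain `jinja2` to see the prompt it produces (a sketch; the message list and the `add_generation_prompt` flag below are example inputs, not part of the proposed spec):

```python
# Render the proposed ChatML-style chat template with jinja2 (sketch).
from jinja2 import Template

# Raw strings keep the literal \n sequences exactly as they appear in the
# YAML above; Jinja decodes them to newlines inside its string literals.
CHAT_TEMPLATE = (
    r"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}"
    r"{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}"
    r"{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

messages = [{"role": "user", "content": "Hello"}]
prompt = Template(CHAT_TEMPLATE).render(messages=messages, add_generation_prompt=True)
print(repr(prompt))
# '<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n'
```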

If we get an additional feature request, we can expand to allow specifying via reference using a url format similar to how .spec.url and .spec.adapters[].url work today. For this, I would prefer to use a dedicated chatTemplateURL field to avoid overloading the chatTemplate field which needs to support multi-line strings.

kind: Model
spec:
  chatTemplateURL: cm://name-of-configmap
  # And other schemes:
  # chatTemplateURL: s3://bucket/my-template.jinja
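A minimal sketch of how a controller might dispatch on the URL scheme (the helper name and the set of supported schemes are assumptions for illustration, not an existing KubeAI API):

```python
# Hypothetical helper: split a chatTemplateURL into its scheme and reference.
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"cm", "s3"}  # cm:// -> ConfigMap, s3:// -> object storage

def parse_chat_template_url(url: str) -> tuple[str, str]:
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported chatTemplateURL scheme: {parsed.scheme!r}")
    # netloc is the ConfigMap name (cm://) or bucket (s3://); path is the object key.
    return parsed.scheme, parsed.netloc + parsed.path

print(parse_chat_template_url("cm://name-of-configmap"))         # ('cm', 'name-of-configmap')
print(parse_chat_template_url("s3://bucket/my-template.jinja"))  # ('s3', 'bucket/my-template.jinja')
```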

@samos123
Contributor

I think I prefer an approach where users can provide arbitrary files, like this. That makes it future-proof for any engine:

kind: Model
spec:
  files:
    /mnt/chat-template: multi-line-string of chat-template
  args:
    - --chat-template-file=/mnt/chat-template
