Support custom chat templates #410
Also see #243
Thanks for linking. So #243 introduces two approaches: inline in the Model spec, and via reference. I think we should start with inline:

```yaml
kind: Model
spec:
  chatTemplate: |
    {% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}
    {% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}
```

If we get an additional feature request, we can expand to allow specifying via reference, using a URL format similar to how the Model `url` field works:

```yaml
kind: Model
spec:
  chatTemplateURL: cm://name-of-configmap
  # And other schemes:
  # chatTemplateURL: s3://bucket/my-template.jinja
```
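For illustration, the ConfigMap behind a `cm://name-of-configmap` reference might look like the sketch below. The data key name (`template`) is an assumption; the actual convention would need to be defined.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: name-of-configmap
data:
  # Assumed key name; KubeAI would have to define the convention.
  template: |
    {% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] }}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n' }}{% endif %}{% endfor %}
```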
I think I prefer an approach of being able to provide arbitrary files. This makes it future-proof for any engine; see the sketch below.
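A minimal sketch of what such an arbitrary-files field could look like. The `files`, `path`, and `content` names are assumptions for illustration, not a settled API:

```yaml
kind: Model
spec:
  # Hypothetical field: mount arbitrary files into the server container.
  files:
    - path: /config/chat-template.jinja
      content: |
        {% for message in messages %}{{ message['content'] }}{% endfor %}
  # The engine can then be pointed at the mounted file, e.g. via
  # engine-specific args such as vLLM's --chat-template.
```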
Uses:
a. vLLM log: `WARNING 07-18 22:59:10 serving_chat.py:347] No chat template provided. Chat API will not work` (a sketch of passing a template to vLLM follows this list).
b. See #404: facebook opt-125m model no longer works with chat completion.
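On (a): vLLM's OpenAI-compatible server accepts a template file via its `--chat-template` flag, so one way KubeAI could wire a `chatTemplate` through is by mounting it and passing the path. How KubeAI would actually do this is not settled here; the mount path below is an assumption:

```yaml
# Hypothetical container args; --chat-template is a real vLLM flag,
# but the mount path is illustrative.
args:
  - --model=facebook/opt-125m
  - --chat-template=/config/chat-template.jinja
```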
Considerations:
a. See docs.
b. See the related discussion on Discord about setting max context via a Modelfile.
c. Opinion (@nstogner): it might be best to create a Modelfile (template, and other options) from a KubeAI Model spec, instead of allowing users to specify a Modelfile directly - this lets KubeAI abstract away some of the serving-engine-specific formats.
d. NOTE: the Ollama Modelfile template section uses a Go template format (see docs); a sketch follows below.
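For context on (d), a minimal Ollama Modelfile sketch using the Go-template syntax from Ollama's docs; the base model and the ChatML-style tags are only illustrative:

```
FROM llama3
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```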