Summary
The VS Code extension currently exposes the same context window to VS Code Copilot for all Lemonade models, based on the global LEMONADE_CTX_SIZE environment variable or a fixed fallback.
This works as a basic workaround, but it is not ideal for a multi-model / hybrid-model setup where different Lemonade models can have very different usable context sizes. The extension should discover the context window per model and report it accurately to VS Code.
Motivation
Lemonade can expose multiple models through the VS Code Copilot Language Model Provider integration.
This should be easy to use for a good first-time UX.
However, the context size configured inside Lemonade is often different from the default of 128000, and it can differ from model to model.
As a result, a chat starts without a problem but errors out at some point because the reported context size does not match the model's actual one.
Proposed behavior
The context size should be resolved per model and reported to VS Code through the Language Model Chat Provider metadata, for example in the following order of precedence:
- Explicit VS Code user setting override, for example:
  {
    "lemonade.modelContextSizes": {
      "Qwen3-Coder-Next-GGUF": 184320,
      "Gemma-4-26B-A4B-it-GGUF": 262144
    }
  }
- Model-specific metadata from Lemonade, for example the context size
- Global LEMONADE_CTX_SIZE as a backward-compatible fallback
- The existing hardcoded default as the final fallback
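The fallback chain above could be sketched roughly as follows. This is only an illustration of the proposed order, not the extension's actual code: `ContextSizeSources`, `resolveContextSize`, `isValidSize` and `DEFAULT_CTX_SIZE` are hypothetical names, and how Lemonade would report per-model metadata is an assumption.

```typescript
// Hypothetical inputs; none of these names are taken from the extension's code.
interface ContextSizeSources {
  // Value of the proposed "lemonade.modelContextSizes" user setting.
  userOverrides?: Record<string, number>;
  // Per-model context size reported by Lemonade, if it exposes one (assumption).
  serverReported?: Record<string, number>;
  // Raw value of the LEMONADE_CTX_SIZE environment variable.
  envCtxSize?: string;
}

// The default mentioned in this issue.
const DEFAULT_CTX_SIZE = 128000;

// Accept only positive integers; anything else falls through to the next source.
function isValidSize(n: number | undefined): n is number {
  return typeof n === "number" && Number.isInteger(n) && n > 0;
}

function resolveContextSize(modelId: string, sources: ContextSizeSources): number {
  // 1. Explicit user setting override.
  const override = sources.userOverrides?.[modelId];
  if (isValidSize(override)) return override;

  // 2. Model-specific metadata from Lemonade.
  const reported = sources.serverReported?.[modelId];
  if (isValidSize(reported)) return reported;

  // 3. Global LEMONADE_CTX_SIZE as backward-compatible fallback.
  const fromEnv = Number.parseInt(sources.envCtxSize ?? "", 10);
  if (isValidSize(fromEnv)) return fromEnv;

  // 4. Hardcoded default as final fallback.
  return DEFAULT_CTX_SIZE;
}
```

Passing the sources in as plain data keeps the function independent of the VS Code API, so the precedence can be unit-tested without launching the extension host.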
Additionally, further information such as maxOutputTokens and maxInputTokens should be shared with VS Code as well.
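One possible way to derive both values from a resolved context size is sketched below. The split policy is purely an assumption for illustration (a fixed output budget, capped at half of the window); `TokenLimits`, `splitContextWindow` and the `outputBudget` default of 4096 are hypothetical and not taken from Lemonade or the extension.

```typescript
// Hypothetical shape; the field names follow the ones mentioned above.
interface TokenLimits {
  maxInputTokens: number;
  maxOutputTokens: number;
}

// Assumption: output gets a fixed budget, but never more than half the window,
// and the remainder of the context window is reported as input capacity.
function splitContextWindow(contextSize: number, outputBudget = 4096): TokenLimits {
  const maxOutputTokens = Math.max(1, Math.min(outputBudget, Math.floor(contextSize / 2)));
  return { maxInputTokens: contextSize - maxOutputTokens, maxOutputTokens };
}
```

If Lemonade can report a real per-model output limit, that value would be preferable to a fixed budget like this.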
I'm happy to expand on these ideas further or help with implementing them.