refactor: llguidance #288
base: main
Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
jakelorocco left a comment
Looks like there are some mypy errors as well.
I don't know why there is a mypy error; mypy passes locally for me.
Force-pushed from 3ebb8bd to 8dce1f2.
We've seen this happen recently, and it came down to mypy versions; that could be the cause. I re-ran the checks on the latest branch. If those still fail and mypy still passes for you locally, try checking your mypy version.
With an updated lockfile, mypy passes. The CI test run was terminated, but it passes locally. If you approve, I will merge this.
Force-pushed from 938d919 to 62923fe.
@guicho271828, can you remind me whether we ever got the local vllm backend running on a Mac (or if you remember a discussion about that)?
I tried installing this version of the mellea package on my Mac and got dependency-resolution errors.
Details
(mellea) ➜ mellea git:(pr/guicho271828/288) uv pip install -e '.[all]' --all-extras --group dev -r pyproject.toml
Using Python 3.12.0 environment at: /opt/homebrew/Caskroom/miniforge/base/envs/mellea
× No solution found when resolving dependencies:
╰─▶ Because only the following versions of nvidia-cudnn-frontend are available:
nvidia-cudnn-frontend<=1.13.0
nvidia-cudnn-frontend==1.14.0
nvidia-cudnn-frontend==1.14.1
nvidia-cudnn-frontend==1.15.0
nvidia-cudnn-frontend==1.16.0
nvidia-cudnn-frontend==1.17.0
and nvidia-cudnn-frontend>=1.13.0 has no wheels with a matching platform tag (e.g., `macosx_15_0_arm64`), we can conclude that
nvidia-cudnn-frontend>=1.13.0 cannot be used.
And because flashinfer-python==0.5.3 depends on nvidia-cudnn-frontend>=1.13.0 and vllm==0.13.0 depends on flashinfer-python==0.5.3, we
can conclude that vllm==0.13.0 cannot be used.
And because only vllm<=0.13.0 is available and mellea depends on vllm>=0.13.0, we can conclude that your requirements are
unsatisfiable.
It looks like the newest version I can install on a Mac is 0.11.0.
The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?
To run vllm locally, vllm assumes
So yes.
29588fb disables installing vllm on darwin.
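For anyone who hits this on macOS, a minimal sketch of the kind of runtime guard the backend could pair with that change (the message and placement here are illustrative assumptions, not what 29588fb actually does):

import sys

try:
    import vllm
except ImportError as e:
    if sys.platform == "darwin":
        # The vllm extra is skipped on macOS, so give a clearer hint than a bare ImportError.
        raise ImportError(
            "The local vllm backend is not supported on macOS; "
            "use mellea[vllm] on Linux with a CUDA-capable GPU instead."
        ) from e
    raise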
vllm.sampling_params.StructuredOutputsParams(
    json=format.model_json_schema()
)
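For context, a rough sketch of how that constraint typically plugs into a vllm generation call. The structured_outputs keyword on SamplingParams and the model id are my assumptions (based on recent vllm releases and the Qwen3 0.6B model mentioned below); the exact spelling may differ across vllm versions:

import pydantic
import vllm


class Answer(pydantic.BaseModel):
    name: str
    value: int


llm = vllm.LLM(model="Qwen/Qwen3-0.6B")  # assumed model id; the test below uses a qwen3 0.6b model
params = vllm.SamplingParams(
    max_tokens=512,
    # Assumption: recent vllm exposes structured outputs via this keyword.
    structured_outputs=vllm.sampling_params.StructuredOutputsParams(
        json=Answer.model_json_schema()
    ),
)
outputs = llm.generate(["what is 1+1?"], params)
print(outputs[0].outputs[0].text)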
FWIW, when I was trying this locally (uv run pytest test/backends/test_vllm.py on Fedora 43, Python 3.12.8, CUDA 13.0), the test_generate_from_raw_with_format test failed. Here's a snippet of the output:
session = <mellea.stdlib.session.MelleaSession object at 0x7ff08f129a90>

    @pytest.mark.qualitative
    async def test_generate_from_raw_with_format(session):
        prompts = ["what is 1+1?", "what is 2+2?", "what is 3+3?", "what is 4+4?"]

        class Answer(pydantic.BaseModel):
            name: str
            value: int

        results = await session.backend.generate_from_raw(
            actions=[CBlock(value=prompt) for prompt in prompts],
            ctx=session.ctx,
            format=Answer,
        )

        assert len(results) == len(prompts)

        random_result = results[0]
        try:
            answer = Answer.model_validate_json(random_result.value)
        except pydantic.ValidationError as e:
>           assert False, (
                f"formatting directive failed for {random_result.value}: {e.json()}"
            )
E           AssertionError: formatting directive failed for {
E
E
E             "name": "binary",
E             "value": 1
E            : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 6 column 1","input":"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n ","ctx":{"error":"EOF while parsing an object at line 6 column 1"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
E           assert False

test/backends/test_vllm.py:133: AssertionError
Seems like what's happening is that the model is following the grammar but not producing valid JSON (which may just be a fact of life with a tiny model; I see the vllm test is using Qwen3 0.6B).
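To make that failure mode concrete, here is a tiny standalone reproduction (pydantic only, no vllm; the truncated string mirrors the output in the traceback above):

import pydantic


class Answer(pydantic.BaseModel):
    name: str
    value: int


# Grammar-conforming so far, but cut off before the closing brace.
truncated = '{\n\n\n  "name": "binary",\n  "value": 1\n  '

try:
    Answer.model_validate_json(truncated)
except pydantic.ValidationError as e:
    print(e.errors()[0]["type"])  # -> json_invalid (EOF while parsing an object)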
A quick hack I tried locally that got the test to pass was to prepend an instruction about producing proper JSON content to each prompt:
diff --git a/mellea/backends/vllm.py b/mellea/backends/vllm.py
index a59ced7..6e3ee5a 100644
--- a/mellea/backends/vllm.py
+++ b/mellea/backends/vllm.py
@@ -447,8 +447,21 @@ class LocalVLLMBackend(FormatterBackend):
         model_options = self._simplify_and_merge(model_options)
 
+        # When structured output is requested, ensure there's a reasonable max_tokens limit
+        # to prevent excessive whitespace generation and ensure output completion.
+        if format is not None and ModelOption.MAX_NEW_TOKENS not in model_options:
+            model_options[ModelOption.MAX_NEW_TOKENS] = 512
+
         prompts = [self.formatter.print(action) for action in actions]
 
+        # When structured output is requested, prepend format instructions to help the model
+        # understand what JSON content to generate. Without this, models may produce valid JSON
+        # structure (due to constrained decoding) but with meaningless content like whitespace.
+        if format is not None:
+            schema_str = json.dumps(format.model_json_schema(), indent=2)
+            format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
+            prompts = [format_prefix + p for p in prompts]
+
         sampling_params = vllm.SamplingParams(
             **self._make_backend_specific_and_remove(
                 model_options, vllm.SamplingParams
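For illustration, roughly what that prefix ends up looking like for the Answer model from the test (standalone sketch; the prompt wording is copied from the diff above):

import json

import pydantic


class Answer(pydantic.BaseModel):
    name: str
    value: int


schema_str = json.dumps(Answer.model_json_schema(), indent=2)
format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
print(format_prefix + "what is 1+1?")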
We could make this error message better, but I think what's happening is that the model runs out of tokens before the JSON completes (at least I see this error with HF sometimes):
"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n " <-- missing closing bracket
Yeah, that seems to be it. I just re-ran the test and I see:
E Invalid JSON: EOF while parsing an object at line 130 column 0 [type=json_invalid, input_value='{ \n\n\n\n\n\n\n\n\n\n\...n\n\n\n\n\n\n\n\n\n\n\n', input_type=str]
and also
E : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 130 column 0","input":"{ \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","ctx":{"error":"EOF while parsing an object at line 130 column 0"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
Though it's interesting: on my run it looks like the string is an opening brace and then a LOT of newlines...
if os.environ.get("VLLM_USE_V1", -1) != "0":
    pytest.skip("skipping vllm tests; tests require `export VLLM_USE_V1=0`")
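As an aside, a module-level skip would express the same guard once for the whole file (a sketch under the same env-var convention as the snippet above, with a string default so the comparison stays str-to-str):

import os

import pytest

pytestmark = pytest.mark.skipif(
    os.environ.get("VLLM_USE_V1", "-1") != "0",
    reason="skipping vllm tests; tests require `export VLLM_USE_V1=0`",
)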
My laptop GPU wasn't big enough to run this test 😢
I'm going to try running on a remote environment where I can get a bigger GPU...