refactor: llguidance #288
base: main
Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
jakelorocco left a comment
Looks like there are some mypy errors as well.
I don't know why there is a mypy error; mypy passes locally for me.
Force-pushed from 3ebb8bd to 8dce1f2.
We've seen this happen recently, and it came down to mypy versions; that could be the cause. I re-ran the checks on the latest branch. If those still fail and mypy still passes for you locally, try checking your mypy version.
With an updated lockfile, mypy passes. The CI test run was terminated, but it passes locally. If you approve, I will merge this.
Force-pushed from 938d919 to 62923fe.
@guicho271828, can you remind me whether we ever got the local vllm backend running on a Mac (or if you remember a discussion about that)?
I tried installing this version of the mellea package on my Mac and got dependency-resolution errors.
Details
(mellea) ➜ mellea git:(pr/guicho271828/288) uv pip install -e '.[all]' --all-extras --group dev -r pyproject.toml
Using Python 3.12.0 environment at: /opt/homebrew/Caskroom/miniforge/base/envs/mellea
× No solution found when resolving dependencies:
╰─▶ Because only the following versions of nvidia-cudnn-frontend are available:
nvidia-cudnn-frontend<=1.13.0
nvidia-cudnn-frontend==1.14.0
nvidia-cudnn-frontend==1.14.1
nvidia-cudnn-frontend==1.15.0
nvidia-cudnn-frontend==1.16.0
nvidia-cudnn-frontend==1.17.0
and nvidia-cudnn-frontend>=1.13.0 has no wheels with a matching platform tag (e.g., `macosx_15_0_arm64`), we can conclude that
nvidia-cudnn-frontend>=1.13.0 cannot be used.
And because flashinfer-python==0.5.3 depends on nvidia-cudnn-frontend>=1.13.0 and vllm==0.13.0 depends on flashinfer-python==0.5.3, we
can conclude that vllm==0.13.0 cannot be used.
And because only vllm<=0.13.0 is available and mellea depends on vllm>=0.13.0, we can conclude that your requirements are
unsatisfiable.
It looks like the newest version I can install on a Mac is 0.11.0.
The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?
To run vllm locally, vllm assumes
So yes.
29588fb disables installing vllm on darwin.
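For anyone who hits this on macOS, a minimal sketch of the kind of runtime guard the backend could pair with that change (the message and placement here are illustrative assumptions, not what 29588fb actually does):

import sys

try:
    import vllm
except ImportError as e:
    if sys.platform == "darwin":
        # The vllm extra is skipped on macOS, so give a clearer hint than a bare ImportError.
        raise ImportError(
            "The local vllm backend is not supported on macOS; "
            "use mellea[vllm] on Linux with a CUDA-capable GPU instead."
        ) from e
    raise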
vllm.sampling_params.StructuredOutputsParams(
    json=format.model_json_schema()
)
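For context, a rough sketch of how that constraint typically plugs into a vllm generation call. The structured_outputs keyword on SamplingParams and the model id are my assumptions (based on recent vllm releases and the Qwen3 0.6B model mentioned below); the exact spelling may differ across vllm versions:

import pydantic
import vllm


class Answer(pydantic.BaseModel):
    name: str
    value: int


llm = vllm.LLM(model="Qwen/Qwen3-0.6B")  # assumed model id; the test below uses a qwen3 0.6b model
params = vllm.SamplingParams(
    max_tokens=512,
    # Assumption: recent vllm exposes structured outputs via this keyword.
    structured_outputs=vllm.sampling_params.StructuredOutputsParams(
        json=Answer.model_json_schema()
    ),
)
outputs = llm.generate(["what is 1+1?"], params)
print(outputs[0].outputs[0].text)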
FWIW, when I was trying this locally (uv run pytest test/backends/test_vllm.py on Fedora 43, Python 3.12.8, CUDA 13.0), the test_generate_from_raw_with_format test failed. Here's a snippet of the output:
session = <mellea.stdlib.session.MelleaSession object at 0x7ff08f129a90>

    @pytest.mark.qualitative
    async def test_generate_from_raw_with_format(session):
        prompts = ["what is 1+1?", "what is 2+2?", "what is 3+3?", "what is 4+4?"]

        class Answer(pydantic.BaseModel):
            name: str
            value: int

        results = await session.backend.generate_from_raw(
            actions=[CBlock(value=prompt) for prompt in prompts],
            ctx=session.ctx,
            format=Answer,
        )

        assert len(results) == len(prompts)

        random_result = results[0]
        try:
            answer = Answer.model_validate_json(random_result.value)
        except pydantic.ValidationError as e:
>           assert False, (
                f"formatting directive failed for {random_result.value}: {e.json()}"
            )
E           AssertionError: formatting directive failed for {
E
E
E             "name": "binary",
E             "value": 1
E            : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 6 column 1","input":"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n ","ctx":{"error":"EOF while parsing an object at line 6 column 1"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
E           assert False

test/backends/test_vllm.py:133: AssertionError
Seems like what's happening is that the model is following the grammar but not producing valid JSON (which may just be a fact of life with a tiny model; I see the vllm test is using Qwen3 0.6B).
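To make that failure mode concrete, here is a tiny standalone reproduction (pydantic only, no vllm; the truncated string mirrors the output in the traceback above):

import pydantic


class Answer(pydantic.BaseModel):
    name: str
    value: int


# Grammar-conforming so far, but cut off before the closing brace.
truncated = '{\n\n\n  "name": "binary",\n  "value": 1\n  '

try:
    Answer.model_validate_json(truncated)
except pydantic.ValidationError as e:
    print(e.errors()[0]["type"])  # -> json_invalid (EOF while parsing an object)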
A quick hack I tried locally that got the test to pass was to prepend an instruction about producing proper JSON content to each prompt:
diff --git a/mellea/backends/vllm.py b/mellea/backends/vllm.py
index a59ced7..6e3ee5a 100644
--- a/mellea/backends/vllm.py
+++ b/mellea/backends/vllm.py
@@ -447,8 +447,21 @@ class LocalVLLMBackend(FormatterBackend):
         model_options = self._simplify_and_merge(model_options)
 
+        # When structured output is requested, ensure there's a reasonable max_tokens limit
+        # to prevent excessive whitespace generation and ensure output completion.
+        if format is not None and ModelOption.MAX_NEW_TOKENS not in model_options:
+            model_options[ModelOption.MAX_NEW_TOKENS] = 512
+
         prompts = [self.formatter.print(action) for action in actions]
 
+        # When structured output is requested, prepend format instructions to help the model
+        # understand what JSON content to generate. Without this, models may produce valid JSON
+        # structure (due to constrained decoding) but with meaningless content like whitespace.
+        if format is not None:
+            schema_str = json.dumps(format.model_json_schema(), indent=2)
+            format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
+            prompts = [format_prefix + p for p in prompts]
+
         sampling_params = vllm.SamplingParams(
             **self._make_backend_specific_and_remove(
                 model_options, vllm.SamplingParams
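For illustration, roughly what that prefix ends up looking like for the Answer model from the test (standalone sketch; the prompt wording is copied from the diff above):

import json

import pydantic


class Answer(pydantic.BaseModel):
    name: str
    value: int


schema_str = json.dumps(Answer.model_json_schema(), indent=2)
format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
print(format_prefix + "what is 1+1?")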
We could make this error message better, but I think what's happening is that the model runs out of tokens before the JSON completes (at least I see this error with HF sometimes):
"{\n\n\n \"name\": \"binary\",\n \"value\": 1\n " <-- missing closing bracket
Yeah, that seems to be it. I just re-ran the test and I see:
E Invalid JSON: EOF while parsing an object at line 130 column 0 [type=json_invalid, input_value='{ \n\n\n\n\n\n\n\n\n\n\...n\n\n\n\n\n\n\n\n\n\n\n', input_type=str]
and also
E : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 130 column 0","input":"{ \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","ctx":{"error":"EOF while parsing an object at line 130 column 0"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
Though it's interesting: on my run it looks like the string is an opening brace and then a LOT of newlines...
if os.environ.get("VLLM_USE_V1", -1) != "0":
    pytest.skip("skipping vllm tests; tests require `export VLLM_USE_V1=0`")
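As an aside, a module-level skip would express the same guard once for the whole file (a sketch under the same env-var convention as the snippet above, with a string default so the comparison stays str-to-str):

import os

import pytest

pytestmark = pytest.mark.skipif(
    os.environ.get("VLLM_USE_V1", "-1") != "0",
    reason="skipping vllm tests; tests require `export VLLM_USE_V1=0`",
)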
My laptop GPU wasn't big enough to run this test 😢
I'm going to try running on a remote environment where I can get a bigger GPU...