Conversation

@guicho271828
Contributor

No description provided.

@mergify

mergify bot commented Jan 6, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
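For illustration, a minimal sketch of how that pattern applies to this PR's old and new titles, assuming Python's re semantics approximate Mergify's ~= operator:

# Hedged sketch: check PR titles against the conventional-commit pattern above.
# Assumption: Python's re.match approximates Mergify's `~=` regex operator.
import re

PATTERN = r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"

print(bool(re.match(PATTERN, "refactor: llguidance")))  # True  -> rule succeeds
print(bool(re.match(PATTERN, "Llguidance")))            # False -> the original title would fail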

@guicho271828 guicho271828 changed the title from "Llguidance" to "refactor: llguidance" on Jan 6, 2026
Contributor

@jakelorocco jakelorocco left a comment


Looks like there are some mypy errors as well.

@guicho271828
Copy link
Contributor Author

Looks like there are some mypy errors as well.

I don't know why there is a mypy error. Mypy passes locally for me.

@guicho271828 guicho271828 force-pushed the llguidance branch 2 times, most recently from 3ebb8bd to 8dce1f2 on January 7, 2026 at 19:29
@nrfulton
Member

nrfulton commented Jan 7, 2026

Looks like there are some mypy errors as well.

I don't know why there is a mypy error. Mypy passes locally for me.

We've seen this happen recently, and it came down to mypy versions. That could be the cause here. I re-ran the checks on the latest branch. If those still fail while mypy passes for you locally, try checking your mypy version.
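(For reference, a minimal sketch for comparing versions, assuming mypy is resolved from the project's lockfile in CI:)

# Hedged sketch: print the locally installed mypy version so it can be compared
# with the version CI resolves from the lockfile.
import subprocess

print(subprocess.run(["mypy", "--version"], capture_output=True, text=True).stdout.strip())
# Compare the output against the mypy entry in uv.lock before debugging further.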

@guicho271828
Contributor Author

With the updated lockfile, mypy passes. The test run in CI was terminated, but it passes locally. If you approve, I will merge this.

Contributor

@jakelorocco jakelorocco left a comment


@guicho271828, can you remind me if we ever got the local vllm backend running on mac (or if you remember a discussion about that)?

I tried installing this version of the mellea package on my mac and was getting versioning errors.

Details

(mellea) ➜  mellea git:(pr/guicho271828/288) uv pip install -e '.[all]' --all-extras --group dev -r pyproject.toml
Using Python 3.12.0 environment at: /opt/homebrew/Caskroom/miniforge/base/envs/mellea
  × No solution found when resolving dependencies:
  ╰─▶ Because only the following versions of nvidia-cudnn-frontend are available:
          nvidia-cudnn-frontend<=1.13.0
          nvidia-cudnn-frontend==1.14.0
          nvidia-cudnn-frontend==1.14.1
          nvidia-cudnn-frontend==1.15.0
          nvidia-cudnn-frontend==1.16.0
          nvidia-cudnn-frontend==1.17.0
      and nvidia-cudnn-frontend>=1.13.0 has no wheels with a matching platform tag (e.g., `macosx_15_0_arm64`), we can conclude that
      nvidia-cudnn-frontend>=1.13.0 cannot be used.
      And because flashinfer-python==0.5.3 depends on nvidia-cudnn-frontend>=1.13.0 and vllm==0.13.0 depends on flashinfer-python==0.5.3, we
      can conclude that vllm==0.13.0 cannot be used.
      And because only vllm<=0.13.0 is available and mellea depends on vllm>=0.13.0, we can conclude that your requirements are
      unsatisfiable.

It looks like the newest vllm version I can install on a Mac is 0.11.0.

The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?

@guicho271828
Contributor Author

@guicho271828, can you remind me if we ever got the local vllm backend running on mac (or if you remember a discussion about that)?

I tried installing this version of the mellea package on my mac and was getting versioning errors.
Details

The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?

To run vllm locally, vllm assumes one of the following platforms:

NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, and TPU

So yes, mellea[vllm] does not work on Mac.

@guicho271828
Contributor Author

@guicho271828, can you remind me if we ever got the local vllm backend running on mac (or if you remember a discussion about that)?

I tried installing this version of the mellea package on my mac and was getting versioning errors.
Details


It looks like the newest vllm version I can install on a Mac is 0.11.0.

The vllm engine can't seem to get enough resources to run locally on my Mac anyway, so maybe we can just add a note somewhere that "mellea[vllm]" doesn't work on Macs?

29588fb disables installing vllm on darwin.
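For context, a minimal sketch of the PEP 508 environment-marker mechanism such a change presumably relies on; this only illustrates the behavior using the packaging library and is not the actual diff in 29588fb:

# Hedged sketch of a platform-conditional dependency marker such as:
#   vllm>=0.13.0; sys_platform != "darwin"
# Illustration only; not the actual change in 29588fb.
from packaging.markers import Marker

marker = Marker('sys_platform != "darwin"')
print(marker.evaluate({"sys_platform": "darwin"}))  # False -> vllm is skipped on macOS
print(marker.evaluate({"sys_platform": "linux"}))   # True  -> vllm is installed elsewhere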

Comment on lines +461 to +463
vllm.sampling_params.StructuredOutputsParams(
    json=format.model_json_schema()
)
Contributor


FWIW, when I was trying this locally (uv run pytest test/backends/test_vllm.py on Fedora 43, Python 3.12.8, CUDA 13.0), the test_generate_from_raw_with_format test failed. Here's a snippet of the output:

session = <mellea.stdlib.session.MelleaSession object at 0x7ff08f129a90>

    @pytest.mark.qualitative
    async def test_generate_from_raw_with_format(session):
        prompts = ["what is 1+1?", "what is 2+2?", "what is 3+3?", "what is 4+4?"]

        class Answer(pydantic.BaseModel):
            name: str
            value: int

        results = await session.backend.generate_from_raw(
            actions=[CBlock(value=prompt) for prompt in prompts],
            ctx=session.ctx,
            format=Answer,
        )

        assert len(results) == len(prompts)

        random_result = results[0]
        try:
            answer = Answer.model_validate_json(random_result.value)
        except pydantic.ValidationError as e:
>           assert False, (
                f"formatting directive failed for {random_result.value}: {e.json()}"
            )
E           AssertionError: formatting directive failed for {
E
E
E                 "name": "binary",
E                 "value": 1
E              : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 6 column 1","input":"{\n\n\n    \"name\": \"binary\",\n    \"value\": 1\n ","ctx":{"error":"EOF while parsing an object at line 6 column 1"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]
E           assert False

test/backends/test_vllm.py:133: AssertionError

Seems like what's happening is that the model is following the grammar but not producing valid JSON (which may just be a fact of life with a tiny model; I see the vllm test is using Qwen3 0.6B).

Contributor


A quick hack I tried locally that got the test to pass was to prepend an additional prompt about producing proper JSON content:

diff --git a/mellea/backends/vllm.py b/mellea/backends/vllm.py
index a59ced7..6e3ee5a 100644
--- a/mellea/backends/vllm.py
+++ b/mellea/backends/vllm.py
@@ -447,8 +447,21 @@ class LocalVLLMBackend(FormatterBackend):

         model_options = self._simplify_and_merge(model_options)

+        # When structured output is requested, ensure there's a reasonable max_tokens limit
+        # to prevent excessive whitespace generation and ensure output completion.
+        if format is not None and ModelOption.MAX_NEW_TOKENS not in model_options:
+            model_options[ModelOption.MAX_NEW_TOKENS] = 512
+
         prompts = [self.formatter.print(action) for action in actions]

+        # When structured output is requested, prepend format instructions to help the model
+        # understand what JSON content to generate. Without this, models may produce valid JSON
+        # structure (due to constrained decoding) but with meaningless content like whitespace.
+        if format is not None:
+            schema_str = json.dumps(format.model_json_schema(), indent=2)
+            format_prefix = f"Output a JSON object matching this schema:\n{schema_str}\n\nQuery: "
+            prompts = [format_prefix + p for p in prompts]
+
         sampling_params = vllm.SamplingParams(
             **self._make_backend_specific_and_remove(
                 model_options, vllm.SamplingParams

Contributor


We could make this error message better, but I think what's happening is that the model runs out of tokens before the JSON completes (at least I see this error with HF sometimes):

"{\n\n\n    \"name\": \"binary\",\n    \"value\": 1\n " <-- missing closing bracket

Contributor

@psschwei psschwei Jan 22, 2026


yeah, that seems to be it. just re-ran the test and I see:

E             Invalid JSON: EOF while parsing an object at line 130 column 0 [type=json_invalid, input_value='{  \n\n\n\n\n\n\n\n\n\n\...n\n\n\n\n\n\n\n\n\n\n\n', input_type=str]

and also

E             : [{"type":"json_invalid","loc":[],"msg":"Invalid JSON: EOF while parsing an object at line 130 column 0","input":"{  \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n","ctx":{"error":"EOF while parsing an object at line 130 column 0"},"url":"https://errors.pydantic.dev/2.12/v/json_invalid"}]

though it's interesting, on my run it looks like the string is an opening bracket and then a LOT of newlines...

Comment on lines -22 to -23
if os.environ.get("VLLM_USE_V1", -1) != "0":
    pytest.skip("skipping vllm tests; tests require `export VLLM_USE_V1=0`")
Contributor


My laptop GPU wasn't big enough to run this test 😢
I'm going to try running on a remote environment where I can get a bigger GPU...
