fix: llama.cpp backend updates make sure artifacts are present #2518

Merged
fl0rianr merged 3 commits into main from fl0rianr/harden_llama_backend_auto_updater
Jul 2, 2026

@fl0rianr fl0rianr commented Jul 1, 2026

Summary

Harden the llama.cpp release update workflow so backend pins are updated only when the expected release assets are available.

Relates to #2492

What changed

  • Adds an asset availability check before updating llama.cpp backend pins.
  • Keeps existing pins when a backend release is incomplete.
  • Preserves the existing PR validation path: pull requests build and validate against the current checked-in pins.
  • Scheduled/manual runs can still create update PRs, but only after verified assets and successful validation.
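
The gating behaviour listed above can be sketched roughly as follows. This is a minimal illustration rather than the workflow's actual code: the function name `select_pin_updates`, the backend names, the tags, and the asset file names are all hypothetical, and the real script may structure the check differently.

```python
from typing import Dict, Iterable, Set


def select_pin_updates(
    current_pins: Dict[str, str],
    candidate_tag: str,
    required_assets: Dict[str, Iterable[str]],
    published_assets: Set[str],
) -> Dict[str, str]:
    """Move a backend pin to candidate_tag only if every asset it requires
    is present in the release; otherwise keep the existing pin."""
    new_pins = dict(current_pins)  # never mutate the checked-in pins
    for backend, required in required_assets.items():
        missing = set(required) - published_assets
        if not missing:
            new_pins[backend] = candidate_tag
    return new_pins


# Hypothetical data: the release is complete for one backend only.
pins = {"vulkan": "b100", "cuda": "b100"}
required = {
    "vulkan": ["vulkan-x64.zip"],
    "cuda": ["cuda-x64.zip", "cuda-arm64.zip"],
}
published = {"vulkan-x64.zip", "cuda-x64.zip"}  # cuda-arm64.zip is absent

updated = select_pin_updates(pins, "b101", required, published)
print(updated)  # {'vulkan': 'b101', 'cuda': 'b100'}
```

In this sketch an incomplete backend simply keeps its old pin, so one partial release does not block updates for the backends that are complete.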

@github-actions bot added labels on Jul 1, 2026: engine::llamacpp (llama.cpp backend (LlamaCppServer); GPU/CPU LLM inference (Vulkan, ROCm, Metal)); area::ci (CI / GitHub Actions / self-hosted runner infrastructure); enhancement (New feature or request)
@fl0rianr fl0rianr requested a review from kenvandine July 2, 2026 00:41
@kenvandine

Overall, this nicely matches the PRs we have in llama.cpp and stable-diffusion.cpp, ensuring families are complete before including them in a release. However, there is a minor bug:

CUDA_SMS includes sm_121, which llama.cpp never builds. Our release.yml only builds 7 variants (sm_75 through sm_120), with no sm_121. That's a stable-diffusion.cpp thing (GB10/Blackwell arm64 support), not llama.cpp's. Since CUDA_SMS here has 8 entries including sm_121, the cuda requirement list will always be missing 3 assets (windows-cuda-sm_121-x64.7z, ubuntu-cuda-sm_121-x64.tar.xz, -arm64.tar.xz) that can never exist in lemonade-sdk/llama.cpp releases. That means cuda_available evaluates to false forever, and the CUDA backend pin can never auto-update through this workflow, even when all 21 real CUDA assets are complete. Looks like the list was copied from stable-diffusion.cpp's CUDA_SMS (which correctly has sm_121) without accounting for the difference between the two repos' build matrices.
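
A small sketch of why one extra entry blocks the whole family. Everything here is illustrative: `cuda_required_assets` is a hypothetical helper, the SM list is a two-entry stand-in for the real seven, any tag prefix on the file names is left out, and the arm64 file name is an assumed expansion of the `-arm64.tar.xz` shorthand above.

```python
from typing import List, Sequence


def cuda_required_assets(sms: Sequence[str]) -> List[str]:
    """Expand each SM into the three per-platform asset names (assumed pattern)."""
    names = []
    for sm in sms:
        names.append(f"windows-cuda-{sm}-x64.7z")
        names.append(f"ubuntu-cuda-{sm}-x64.tar.xz")
        names.append(f"ubuntu-cuda-{sm}-arm64.tar.xz")  # assumed full name
    return names


# Stand-in for what the producer really builds (the real matrix has 7 entries).
built_sms = ("sm_75", "sm_120")
# The consumer-side list with the stray entry.
required_sms = built_sms + ("sm_121",)

# Even a fully complete release only contains assets for the built SMs.
published = set(cuda_required_assets(built_sms))

missing = [n for n in cuda_required_assets(required_sms) if n not in published]
cuda_available = not missing

print(cuda_available)  # False
print(missing)
```

With `sm_121` in the required list, `missing` always holds the three `sm_121` names, so the all-or-nothing check can never pass regardless of how complete the release is.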

fl0rianr commented Jul 2, 2026

Good catch, thanks! Minor adaptation implemented.

@kenvandine

Re-reviewed after the sm_121 fix — confirmed `CUDA_SMS` now has exactly 7 entries matching lemonade-sdk/llama.cpp's actual build matrix, and the `rocm-stable`/`cuda` all-or-nothing family groupings correctly mirror the producer-side "drop the whole family if any piece is missing" logic from lemonade-sdk/llama.cpp#16 and lemonade-sdk/stable-diffusion.cpp#14.

Found one more instance of the same bug class, on the ROCm-nightly side:

```python
ROCM_NIGHTLY_ARCHES = (
    "gfx1150",
    "gfx1151",
    "gfx1152",
    "gfx103X",
    "gfx110X",
    "gfx120X",
)
```

I checked lemonade-sdk/llamacpp-rocm's build workflow and `gfx1152` doesn't appear anywhere in it: not in the default `GFX_TARGETS` matrix (`gfx1151,gfx1150,gfx120X,gfx110X,gfx103X,gfx90a,gfx908`), not in `rocm_asset_families`, nowhere. That default list does build `gfx90a` and `gfx908`, neither of which is in `ROCM_NIGHTLY_ARCHES`.

Same failure mode as the `sm_121` bug: `llama-{tag}-windows-rocm-gfx1152-x64.zip` and the ubuntu equivalent can never exist in a llamacpp-rocm release, so `rocm_nightly` will permanently report incomplete and that pin can never auto-update, even when the other 5 required targets, which llamacpp-rocm actually ships, are all complete.

My guess is `gfx1152` got pulled in from stable-diffusion.cpp's `GPU_TARGETS` list (a compile-time ISA list for TheRock), which does include it — but that's a different thing from llamacpp-rocm's release asset names. I don't have enough context on GPU support plans to know whether the right fix is dropping `gfx1152` from the required list, or whether llamacpp-rocm should start shipping a `gfx1152` asset — that's a call for whoever owns that repo's roadmap.
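
For reference, the mismatch can be reproduced with a plain set comparison. The two lists are copied from above; `compare_arch_lists` and `PRODUCER_GFX_TARGETS` are hypothetical names used only for this sketch.

```python
from typing import List, Sequence, Tuple

# Consumer-side required list, as quoted above.
ROCM_NIGHTLY_ARCHES = (
    "gfx1150",
    "gfx1151",
    "gfx1152",
    "gfx103X",
    "gfx110X",
    "gfx120X",
)

# Producer-side default GFX_TARGETS, as quoted above.
PRODUCER_GFX_TARGETS = "gfx1151,gfx1150,gfx120X,gfx110X,gfx103X,gfx90a,gfx908"


def compare_arch_lists(
    required: Sequence[str], producer_csv: str
) -> Tuple[List[str], List[str]]:
    """Return (required but never built, built but not required)."""
    produced = {t.strip() for t in producer_csv.split(",") if t.strip()}
    required_set = set(required)
    return sorted(required_set - produced), sorted(produced - required_set)


never_built, not_required = compare_arch_lists(
    ROCM_NIGHTLY_ARCHES, PRODUCER_GFX_TARGETS
)
print(never_built)   # ['gfx1152']
print(not_required)  # ['gfx908', 'gfx90a']
```

Any non-empty first list means the family can never be reported complete, which is the same condition that blocked the CUDA pin.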

fl0rianr commented Jul 2, 2026

Good catch again — aligned ROCM_NIGHTLY_ARCHES with the actual lemonade-sdk/llamacpp-rocm producer matrix: removed gfx1152 and added gfx90a / gfx908.

@kenvandine

Re-reviewed after c560cc3 — confirmed `ROCM_NIGHTLY_ARCHES` now reads:

```python
ROCM_NIGHTLY_ARCHES = (
    "gfx1151",
    "gfx1150",
    "gfx120X",
    "gfx110X",
    "gfx103X",
    "gfx90a",
    "gfx908",
)
```

which is an exact match for lemonade-sdk/llamacpp-rocm's default `GFX_TARGETS` (`gfx1151,gfx1150,gfx120X,gfx110X,gfx103X,gfx90a,gfx908`). No leftover `gfx1152`, and `sm_121` is still correctly absent from `CUDA_SMS`. That was the only hunk that changed since my last pass — everything else (the `rocm-stable`/`cuda` all-or-nothing family groupings, the reused-asset skip logic, the PR body table) is unchanged and still looks correct.

With both asset lists now matching their producers' real output, this looks good to me.

@kenvandine kenvandine left a comment

Thanks, this looks good now.

@fl0rianr fl0rianr added this pull request to the merge queue Jul 2, 2026
Merged via the queue into main with commit e9ae1b2 Jul 2, 2026
83 checks passed
@fl0rianr fl0rianr deleted the fl0rianr/harden_llama_backend_auto_updater branch July 2, 2026 18:47