refactor(backends): self-describing WrappedServer backends (#2287) by jeremyfowers · Pull Request #2320 · lemonade-sdk/lemonade

jeremyfowers · 2026-06-19T16:48:45Z

Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor + a server class + a stateless behavior object, and every scattered if (recipe == "...") site is rewritten to read a registry built from those descriptors. Backend-specific logic no longer leaks into the router, model manager, system-info, CLI, or docs.

Layout — a backend is a folder

include/lemon/backends/<stem>/<stem>.h          # descriptor (header-only inline const, CLI-safe)
include/lemon/backends/<stem>/<stem>_server.h   # WrappedServer subclass + create()/spec()/ops() decls
server/backends/<stem>/<stem>_server.cpp        # server class impl, BackendOps subclass, create/spec/ops
                                                # (+ any backend-private helpers, e.g. llamacpp_gguf.cpp)

Adding a backend = one LEMON_BACKENDS line in CMakeLists.txt + that folder + a backend_versions.json pin + server_models.json entries. No router, CLI, doc, or support-matrix edits — those are all derived. CMake globs each backend folder (CONFIGURE_DEPENDS), so backend-private helper files need no build edit.
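As a rough illustration of the descriptor half of that folder, here is a minimal sketch of what a header-only `inline const` descriptor could look like. The `example` backend, the field types, the field subset, and the `SlotPolicy::Standard` enumerator are assumptions made for illustration; only the field names, `CoexistByType`, `Unmetered`, and the `lemon::backends::<stem>` namespace come from this PR.

```cpp
// Hypothetical include/lemon/backends/example/example.h (illustrative only).
#include <cassert>
#include <string_view>

namespace lemon::backends {

// Only CoexistByType and Unmetered are named in the PR; Standard is assumed.
enum class SlotPolicy { Standard, CoexistByType, Unmetered };

// Assumed subset of the real descriptor: plain data, no server dependencies.
struct BackendDescriptor {
    std::string_view recipe;
    std::string_view display_name;
    std::string_view binary;
    std::string_view config_section;  // "" means "same as recipe"
    SlotPolicy slot_policy;
    bool selectable_backend;
    bool uses_ctx_size;
    bool dynamic_models;
};

namespace example {

// Header-only inline const: one definition shared by every translation unit,
// so both the CLI and the server can include it without linking server code.
inline const BackendDescriptor descriptor{
    "example",          // recipe
    "Example Backend",  // display_name
    "example-server",   // binary
    "",                 // config_section (falls back to recipe)
    SlotPolicy::Standard,
    false,              // selectable_backend
    true,               // uses_ctx_size
    false,              // dynamic_models
};

}  // namespace example
}  // namespace lemon::backends
```

Because the descriptor carries no behavior, code such as a CLI flag parser can read `lemon::backends::example::descriptor.uses_ctx_size` without pulling in any WrappedServer subclass.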

What changed

Descriptor (backend_descriptor.h) — plain data describing what a backend is: recipe, display name, binary, config section, default device, SlotPolicy, selectable_backend, uses_ctx_size, dynamic_models, declarative options[], OS/GPU support[], default labels, required checkpoints, plus editorial/policy fields (modality, experimental, web_priority, rocm_channels, version_policy, exposes_prometheus_metrics, rocm_requires_cwsr_fix, self_manages_downloads).
Two-tier registry, generated from LEMON_BACKENDS at CMake configure time — a CLI-safe data registry (descriptors only; links into both lemonade and lemond) and a server-only factory registry (binds each descriptor to its class's create(), spec(), ops()). This split lets the CLI read recipe options/flags from descriptors without linking server classes.
BackendOps — stateless per-backend behavior (backend_ops.h): the model-management logic that happens without a running subprocess. The base class is the shared Hugging Face behavior; each backend overrides only its policy points, so shared download/cache logic is inherited, not copied. Methods: populate_metadata, resolve_checkpoint_path, find_imported_checkpoint, validate_registration_checkpoint, select_checkpoint_files, discover_models, is_downloaded, validate_checkpoint_file, download_model, invalidates_cache_after_download, resolve_version, check_install, classify_unavailable. This is what let model_manager.cpp and system_info.cpp shed their per-recipe switchboards (resolve_model_path went from a ~290-line if/else to one ops_for(recipe)->… call).
Descriptor/ops-driven sites — router creation, NPU/slot eviction & cloud LRU exemption (SlotPolicy, no recipe literals left in router.cpp), device type, recipe options / CLI flags / defaults, config-section identity, ROCm channels (recipe_has_rocm_channels), the support matrix (RECIPE_DEFS deleted from system_info.cpp), recipe→label inference, FLM dynamic discovery, the FLM install-state machine, cloud availability + discovery, and the install-state UI hints.
Registration helpers — make_server<T> / make_spec<T> / single_ops<T> keep the per-backend create()/spec()/ops() one-liners DRY (irregular backends — cloud, ryzenai, vllm — keep bespoke bodies).
/system-info recipes entries enriched with display_name / selectable_backend / uses_ctx_size / options / support. The desktop app reads recipe display names from /system-info instead of hardcoded TypeScript.
Docs generation — docs/tools/gen_backend_docs.py boots lemond, reads /system-info + server_models.json, and rewrites marker-delimited regions of six docs (README.md support matrix, guide/cli.md, guide/configuration/README.md, guide/configuration/multi-model.md, custom-models.md, dev/backends-reference.md) plus assets/models.js. A CI job (backend-docs-drift) fails on drift. Authoring guide added at dev/adding-a-backend.md (both wired into the mkdocs nav).
ModelInfo::extras — generic map<string, json> populated from unknown server_models.json keys, so a new backend adds per-model fields without editing shared structs.
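The "base class holds shared behavior, backends override only policy points" idea from the BackendOps item can be sketched as follows. This is a simplified illustration, not the PR's code: the real methods take model and context arguments, the `bool files_complete` parameter stands in for the shared Hugging Face completeness check, and falling back to the base ops for an unknown recipe is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Shared default behavior; a backend overrides only what differs.
struct BackendOps {
    virtual ~BackendOps() = default;
    // Simplified stand-in for the shared HF completeness check.
    virtual bool is_downloaded(bool files_complete) const { return files_complete; }
    virtual bool invalidates_cache_after_download() const { return false; }
};

// Cloud models have nothing to download, so they always count as present.
struct CloudOps : BackendOps {
    bool is_downloaded(bool) const override { return true; }
};

// FLM asks for a cache reset after a download (replacing a recipe == "flm" check).
struct FlmOps : BackendOps {
    bool invalidates_cache_after_download() const override { return true; }
};

// Stateless singletons looked up by recipe; callers never branch on the name.
inline const BackendOps* ops_for(const std::string& recipe) {
    static const BackendOps base{};
    static const CloudOps cloud{};
    static const FlmOps flm{};
    static const std::map<std::string, const BackendOps*> registry{
        {"cloud", &cloud}, {"flm", &flm}, {"llamacpp", &base}};
    const auto it = registry.find(recipe);
    return it != registry.end() ? it->second : &base;  // fallback is an assumption
}
```

A call site then reduces to something like `ops_for(recipe)->is_downloaded(complete)`, which is the shape the PR describes for `resolve_model_path` and the other former switchboards.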

Verification

Local: lemond + lemonade CLI + web-app build clean; C++ unit tests ctest 5/5 (incl. GgufCapabilities, AutoTune, LatestVersionFallback, InstallAtomicity); server_endpoints 71/71; /system-info carries the enriched fields; docs --check clean; a registry smoke confirms all backends register and route. Cross-platform + clean-environment validation via CI.

One pre-existing local failure unrelated to this change (reproduced on main): server_cli2 test_020_list — a built-in collection name with a space ("Lite Collection") breaks the test's whitespace-based table parser.

Notes for reviewers

recipeOptionsConfig.ts (the TypeScript-typed per-recipe option forms) is intentionally left to maintainers per AGENTS.md; the schema is now exposed via /system-info for a future dynamic migration.
Backend install still goes through each backend's BackendSpec (install params are class-side behavior); the descriptor supplies the binary name.
Deliberately left as documented exceptions (not oversights): cloud recipe checks (the dynamic-models exception), collection.omni (the orchestrator exception, not a WrappedServer), inspect_repo repo→recipe detection (its collection branch is that same exception), and defaults.json generation (its variant *_args/*_bin keys aren't in the descriptor options, so generating it would need a config-schema expansion that risks the config contract).

🤖 Generated with Claude Code

Make each inference backend describe itself with a plain-data descriptor plus a server class, and rewrite the scattered `if (recipe == "...")` sites to read a registry built from those descriptors. Adding a backend becomes one LEMON_BACKENDS line plus a descriptor + factory file, with no router, CLI, docs, or support-matrix edits.

- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe data registry and a server-only factory registry, generated from the LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe options/CLI flags, config-section identity, support matrix, recipe labels, cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from /system-info; a CI step fails on drift. Authoring guide in docs/dev/adding-a-backend.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfowers · 2026-06-19T17:45:39Z

CI status

All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), validating the descriptor aggregate-init, CMake LEMON_BACKENDS codegen, and the CLI-safe/server-only registry split compile everywhere. Functional jobs exercising this change pass: CLI/Endpoints (ubuntu + macOS), Test .exe (whisper, moonshine, stable-diffusion, text-to-speech), backend-docs-drift, plus locally endpoints (69), pinning (6), app-regression (37).

The single red — Test CLI/Endpoints (windows-latest) → test_026_anthropic_messages_tool_calling — is a pre-existing flaky timeout, not from this PR. It's a 500 s ReadTimeout on a tool-calling inference request that the Windows runner intermittently can't finish in time:

main run 27765794877: same job fails on the same test with the identical read timeout=500 signature.
main run 27795912134: same job passes.

This PR touches backend construction, not inference, anthropic_api.cpp, or the tool-calling loop, so it can't change that test's latency. Re-running the job.

Restructure the self-describing backends to the layout the issue #2287 plan specified (one folder per backend) instead of the flat file layout I used before. This also folds the earlier _descriptor/_factory split into the spec's cleaner shape: the descriptor is a header-only `inline const` and create() lives with the server class.

Each backend now lives in its own folder, in namespace lemon::backends::<stem>:

include/lemon/backends/<stem>/<stem>.h          inline const descriptor (CLI-safe data)
include/lemon/backends/<stem>/<stem>_server.h   WrappedServer subclass + create() decl
server/backends/<stem>/<stem>_server.cpp        implementation + create() def

Shared registry/util files stay at the top of backends/. The CMake foreach over LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry headers from the folder paths.

Removes the per-backend *_descriptor.{h,cpp} and *_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make the existing curated docs generate from the backend descriptors instead of just shipping a separate reference file, closing appendix rows 14 and 22.

- Expand the descriptor with the editorial fields the curated docs need: `modality`, `experimental`, `web_display_name`, and a per-support-row `device_summary` (RecipeBackendDef). These keep the descriptor the single source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
  * README.md "Supported Configurations" HTML matrix (grouped by modality, merged rows, rowspans, experimental tag), wrapped in GENERATED markers;
  * docs/guide/configuration/multi-model.md NPU-exclusivity list.

The backend-docs-drift CI job's --check now covers all three docs. The generated README matrix is also more complete than the hand-written one (it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes and prose outside the markers are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render them from the descriptor options. This also fixes pre-existing drift: the section documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no longer registers, and omitted the moonshine and vllm recipes. Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add inline-marker support to the generator and wrap the `--recipe` "Common values" list in custom-models.md so it renders from the descriptor recipe set (plus collection.omni). Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Close the last two cleanly-derivable doc touchpoints (appendix rows 16 and 21).

- configuration/README.md "Example config.json": generated from a fresh lemond's GET /internal/config (the real canonical config). This also fixes pre-existing drift: the hand-written block had `config_version: 1` (now 2), `prefer_system: false` (now true), a stray `device` key, and an invalid trailing comma. `port` is normalized to the documented default 13305.
- docs/assets/models.js RECIPE_PRIORITY + RECIPE_DISPLAY_NAMES: generated from descriptors. A new `web_priority` editorial field preserves the curated website ordering (so the order is descriptor-sourced, not a silent reorder); legacy `oga-*` recipes are dropped as agreed. Adds the correct `vllm` display name.

The generator now drives 7 docs and supports both `<!-- -->` (Markdown) and `/* */` (JS) GENERATED markers. backend-docs-drift --check covers all of them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ive spec; drop device map)

Two agreed plan touchpoints were left incomplete; this finishes them.

Row 4: try_get_spec_for_recipe was still a hand-written 8-branch if-ladder in backend_utils.cpp, which also forced it to #include all 8 server headers. Each backend now exposes a uniform `spec()` accessor (alongside create()); the generated factory registry binds it, and `backends::spec_for(recipe)` / try_get_spec_for_recipe iterate the registry. backend_utils.cpp now includes ZERO server headers. Also reroute the two leaking `Server::SPEC` references (model_manager find_flm_binary) through the registry.

Row 5: get_device_type_from_recipe still carried the full recipe->device map, redundant with BackendDescriptor::default_device. Reduced to a DEVICE_NONE fallback for non-descriptor recipes (collections/unknown); the descriptor is the single source via ModelManager::device_type_for_recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Introduce a stateless per-backend behavior interface for model management that happens WITHOUT a running subprocess (checkpoint-path resolution, download, dynamic discovery, per-model metadata, version detection, availability). It is the home for the recipe switchboards currently scattered through model_manager and system_info.

- BackendOps base class (lemon/backends/backend_ops.h): shared default behavior; backends override only the policy points they need (inherit shared logic, don't copy it). Methods are added incrementally as switchboards migrate; each has a default so adding one never forces edits to backends that don't override it.
- Each backend folder exposes a uniform ops() singleton (alongside create()/spec()), bound into BackendRegistration; backends::ops_for(recipe) returns it.
- Purely additive: every backend uses the default base ops for now, so there is no behavior change yet. Migrations follow in subsequent commits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…readers into folders

Replace the populate_model_metadata recipe switchboard with ops_for(recipe)->populate_metadata(). The backend-specific readers move into their folders:

- GGUF metadata reader (read_gguf_metadata + byte parsers) -> backends/llamacpp/llamacpp_gguf.{h,cpp}; LlamaCppOps::populate_metadata reads arch + capability labels there.
- FLM model-file helpers (config.json ctx window, model-dir discovery) -> backends/fastflowlm/fastflowlm_models.{h,cpp}; FlmOps::populate_metadata uses it.

model_manager no longer knows how either backend stores or introspects models.

CMake now globs each backend folder's *.cpp (CONFIGURE_DEPENDS) so backend-private helper files need no CMake edit; the backend LIST stays explicit.

Verified: GGUF context windows still populate (131072/128000/32768 for sample models) and test_gguf_capabilities passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…llamacpp||sd-cpp)&&rocm) Add a `rocm_channels` descriptor field (llamacpp {"stable","nightly"}, sd-cpp {"stable"}) and a recipe_has_rocm_channels() registry helper. Replace the hardcoded `(recipe=="llamacpp"||recipe=="sd-cpp") && rocm` predicate — copied across backend_utils.cpp (3×), backend_manager.cpp (2×), and system_info.cpp — with the descriptor check. rocm_channel_for_recipe() now clamps a requested channel to one the backend publishes (so sd-cpp's missing "nightly" -> "stable" falls out of the data instead of a per-recipe special case). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
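The channel clamp described in this commit might look roughly like the sketch below. The function name `clamp_rocm_channel` and its signature are hypothetical, and falling back to the first published channel is an assumption; the commit only states that sd-cpp's missing "nightly" resolves to "stable".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns `requested` when the backend publishes it; otherwise falls back to
// the first published channel (assumed policy). Backends without ROCm
// channels publish nothing, so the result is empty.
inline std::string clamp_rocm_channel(const std::vector<std::string>& published,
                                      const std::string& requested) {
    if (published.empty()) {
        return "";
    }
    const bool is_published =
        std::find(published.begin(), published.end(), requested) != published.end();
    return is_published ? requested : published.front();
}
```

With the descriptor data from the commit, a "nightly" request against sd-cpp's `{"stable"}` list yields "stable", while llamacpp's `{"stable","nightly"}` list returns "nightly" unchanged, so no per-recipe branch is needed.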

…rst leak)

Replace the ~290-line recipe switchboard in ModelManager::resolve_model_path with ops_for(recipe)->resolve_checkpoint_path(). The model manager now only does the generic prefix (collections, local_path/local_upload, HF cache-dir computation) and hands off to the backend.

- New BackendOps::resolve_checkpoint_path; base = the shared HF behavior (active-snapshot variant/aux resolution, main-repo fallback, directory fallback). Backends override only their artifact layout:
  * llamacpp -> GGUF resolver (sharding/folder/quant-token), moved into backends/llamacpp/llamacpp_gguf (resolve_gguf_path).
  * ryzenai -> genai_config.json directory; kokoro -> index.json; whispercpp -> first .bin; cloud -> ""; flm -> checkpoint passthrough.
- New shared backends/hf_cache_util (exists/dir_options/active_snapshot_path/repo_id_to_cache_dir_name) so ops reuse the same HF-cache mechanics.

model_manager.cpp -362 lines; resolve_model_path 365 -> 34. Verified all recipes still resolve as downloaded (llamacpp variants, whisper .bin, kokoro index, sd-cpp, ryzenai, flm) via /models.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…FLM cluster → folder

Dynamic discovery, download status, and downloading now flow through BackendOps instead of recipe switchboards in model_manager:

- discover_models: build_cache loops descriptors with dynamic_models=true and merges ops->discover_models(). FLM (`flm list`) and cloud (per-provider) both implement it, so the two bespoke discovery blocks collapse to one generic loop.
- is_downloaded: base = shared HF completeness (ModelManager::checkpoints_complete); CloudOps → true; FlmOps → installed-set membership. Replaces the flm_set/cloud/else branches in build_cache and add_model_to_cache.
- validate_checkpoint_file: LlamaCppOps does the GGUF-magic check (was an inline llamacpp branch in are_required_checkpoints_complete).
- download_model: base = shared HF engine (download_from_huggingface_engine); FlmOps → flm pull; CloudOps → no-op. download_registered_model just dispatches. invalidates_cache_after_download() replaces the recipe=="flm" cache-reset.

The whole FLM cluster (find_flm_binary, flm_installed_checkpoints, flm_discover_models, flm_download) moves into backends/fastflowlm/fastflowlm_models. model_manager keeps only the generic HF engine.

Verified: server_endpoints 69 pass; download status correct for every recipe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s hook

get_recipe_version now reads version.txt generically and lets the backend ops override, instead of branching on recipe. The per-backend version commands move into their folders:

- system llama-server version (`llama-server --version` + regex) -> backends/llamacpp; LlamaCppOps::resolve_version returns it for the "system" backend.
- flm version (`flm version --json`) -> backends/fastflowlm (flm_version()); FlmOps::resolve_version returns it when no version.txt is present.

Removes SystemInfo::get_system_llamacpp_version / get_flm_version and the llamacpp-system / flm branches from system_info.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

config_section duplicated the recipe string in 8 descriptors; it defaults to the recipe via effective_config_section(), so set those to "". Only sd-cpp ("sdcpp") and ryzenai-llm ("ryzenai") keep an explicit section because theirs genuinely differ from the recipe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
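The fallback this commit relies on is small enough to sketch directly. The struct below is a cut-down, hypothetical descriptor, and the free-function form of `effective_config_section` is an assumption (the commit names the helper but not its signature).

```cpp
#include <cassert>
#include <string_view>

// Cut-down descriptor holding only the two fields involved in the fallback.
struct Descriptor {
    std::string_view recipe;
    std::string_view config_section;  // "" means "use the recipe name"
};

// An empty config_section resolves to the recipe; an explicit one wins.
constexpr std::string_view effective_config_section(const Descriptor& d) {
    return d.config_section.empty() ? d.recipe : d.config_section;
}
```

Under this rule `{"llamacpp", ""}` resolves to "llamacpp", while `{"sd-cpp", "sdcpp"}` keeps its explicit "sdcpp" section, matching the two exceptions the commit lists.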

…_metrics descriptor flag prometheus_metrics.cpp hardcoded `recipe == "llamacpp"` to decide whether to scrape a backend subprocess's /metrics. Replace with a descriptor flag (exposes_prometheus_metrics; llamacpp = true) so a new backend that exposes Prometheus metrics opts in via its descriptor, not by editing the metrics code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

These backend-specific per-model fields no longer sit on the shared ModelInfo struct: llamacpp reads info.extra<bool>("hf_load", false) and moonshine reads info.extra<int>("moonshine_arch", -1). Removed the typed fields, their explicit parse sites, and their kKnownKeys entries; added parse_extras() to the two ModelInfo-building paths that lacked it (add_model_to_cache, get_model_info_unfiltered) so extras populate everywhere a model is built from JSON.

Verified: llamacpp models still resolve/download (hf_load path intact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
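A typed accessor with a fallback, in the spirit of `info.extra<bool>("hf_load", false)`, can be sketched as below. Note the substitution: the PR stores extras as `map<string, json>`, whereas this self-contained sketch uses `std::variant` so it needs no JSON library. Treating a type mismatch the same as a missing key is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Stand-in for a JSON value; the PR uses a json type here instead.
using ExtraValue = std::variant<bool, int, std::string>;

struct ModelInfo {
    // Backend-specific fields live here rather than as typed struct members.
    std::map<std::string, ExtraValue> extras;

    // Returns the stored value when the key exists and holds a T;
    // otherwise returns the caller-supplied fallback.
    template <typename T>
    T extra(const std::string& key, T fallback) const {
        const auto it = extras.find(key);
        if (it == extras.end()) {
            return fallback;
        }
        if (const T* value = std::get_if<T>(&it->second)) {
            return *value;
        }
        return fallback;
    }
};
```

The benefit is that a new backend can read its own per-model keys through `extra<T>()` without anyone adding a member to the shared struct.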

Replace the hardcoded (sd-cpp||llamacpp||vllm)&&rocm recipe-list in is_recipe_installed and build_recipes_info with a rocm_requires_cwsr_fix descriptor flag (set on those three backends). The kernel CWSR detection (needs_gfx1151_cwsr_fix) stays in system_info as generic hardware detection; only "which backends' rocm build needs it" is now descriptor data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ps hook

is_recipe_installed now finds the managed binary generically and asks the backend's ops whether it's actually installed, instead of hardcoding the llamacpp-system HIP check and the flm PATH fallback:

- check_install(backend, binary_found) ops hook; base = installed iff binary found. LlamaCppOps adds the ggml HIP-plugin requirement for the "system" build on AMD GPUs; FlmOps treats a PATH-installed flm as present.
- is_ggml_hip_plugin_available moves into backends/llamacpp; find_flm_executable and run_flm_validate move into backends/fastflowlm. Removed from path_utils (+ their orphaned decls/comments).

system_info no longer carries llamacpp/flm-specific availability knowledge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… vs AtLeast) The update-required check special-cased recipe=="flm" to allow an installed version newer than the pin. Replace with a version_policy descriptor field (Exact default; flm = AtLeast for its system-managed package). system_info no longer names flm in the version-comparison logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
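One way to picture the Exact versus AtLeast distinction is the sketch below. The `update_required` and `parse_version` helpers are hypothetical, and the sketch assumes plain dotted numeric versions such as "0.9.35"; how the real code parses versions is not shown in this PR.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class VersionPolicy { Exact, AtLeast };

// Splits "1.2.3" into {1, 2, 3}. Assumes every component is numeric.
inline std::vector<int> parse_version(const std::string& text) {
    std::vector<int> parts;
    std::istringstream stream(text);
    std::string piece;
    while (std::getline(stream, piece, '.')) {
        parts.push_back(piece.empty() ? 0 : std::stoi(piece));
    }
    return parts;
}

// Exact: any difference from the pin requires an update.
// AtLeast: only an installed version older than the pin does.
inline bool update_required(VersionPolicy policy, const std::string& installed,
                            const std::string& pinned) {
    const std::vector<int> have = parse_version(installed);
    const std::vector<int> want = parse_version(pinned);
    if (policy == VersionPolicy::AtLeast) {
        return have < want;  // lexicographic comparison, component by component
    }
    return have != want;
}
```

A system-managed package that has moved ahead of the pin is therefore accepted under AtLeast but flagged under Exact, which is the behavior the commit moves out of the `recipe=="flm"` special case.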

The `flm remove` subprocess orchestration moves out of ModelManager::delete_model into backends/fastflowlm (flm_remove). model_manager keeps only the generic HF-cache deletion path; the flm branch is now a thin call into the backend. Verified: server_endpoints 69 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cipe blocks

RuntimeConfig::recipe_options() had a hardcoded nested→flat translation block per recipe (llamacpp/whispercpp/moonshine/sdcpp/vllm). Replace with a single loop over the descriptors: each option's config.json key is derived from its name role (*_backend → "backend", *_args → variant "<backend>_args"/"args", *_device → "device", else the option name verbatim for sd-cpp's steps/cfg_scale/width/height). Adding a backend no longer requires editing this function.

Verified: server_endpoints 69 pass (config/params translation unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
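The name-role rule in this commit can be approximated with a suffix match. The helper names below are hypothetical, and the sketch deliberately leaves out the per-variant `<backend>_args` form, handling only the plain "args" key.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// True when `text` ends with `suffix` (std::string_view::ends_with is C++20).
inline bool has_suffix(std::string_view text, std::string_view suffix) {
    return text.size() >= suffix.size() &&
           text.substr(text.size() - suffix.size()) == suffix;
}

// Maps a flat option name to its key inside the recipe's config section.
inline std::string config_key_for_option(std::string_view option_name) {
    if (has_suffix(option_name, "_backend")) return "backend";
    if (has_suffix(option_name, "_args")) return "args";
    if (has_suffix(option_name, "_device")) return "device";
    return std::string(option_name);  // e.g. steps, cfg_scale, width, height
}
```

Since the mapping depends only on the option name, a loop over every descriptor's options can build the translation table, and a new backend's options are picked up without touching the loop.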

… across descriptor↔server.h) The backend binary name (and recipe) were duplicated between the descriptor (<stem>.h) and the BackendSpec literal (<stem>_server.h) — the cross-file redundancy. Remove the static SPEC member; each backend's spec() now builds the BackendSpec lazily from descriptor.binary (+ descriptor.recipe, or the explicit "ryzenai-server" install id where it differs) plus the class's get_install_params and split flag. In-class binary lookups go through spec(); server.cpp's sd upscale uses try_get_spec_for_recipe. Net: the binary name now lives in exactly one place (the descriptor). Lazy function-local statics also avoid any static-init-order coupling between the descriptor and the spec. Verified: builds green; system-info install detection unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The recipe was repeated on every support row (6x in llamacpp.h). Introduce a recipe-free BackendSupport struct; the owning descriptor's recipe is filled in by recipe_defs() when flattening to RecipeBackendDef. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
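The flattening step can be sketched as follows. The `backend` and `os` fields are illustrative guesses at what a support row holds; only the type names `BackendSupport` and `RecipeBackendDef`, the helper name `recipe_defs()`, and the idea of filling in the owning descriptor's recipe come from the commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Recipe-free support row, as declared inside a descriptor (fields assumed).
struct BackendSupport {
    std::string backend;
    std::string os;
};

// Flattened row that carries the owning recipe.
struct RecipeBackendDef {
    std::string recipe;
    std::string backend;
    std::string os;
};

struct Descriptor {
    std::string recipe;
    std::vector<BackendSupport> support;
};

// Stamps each support row with its owner's recipe while flattening.
inline std::vector<RecipeBackendDef> recipe_defs(const std::vector<Descriptor>& descriptors) {
    std::vector<RecipeBackendDef> rows;
    for (const Descriptor& descriptor : descriptors) {
        for (const BackendSupport& row : descriptor.support) {
            rows.push_back({descriptor.recipe, row.backend, row.os});
        }
    }
    return rows;
}
```

The recipe string then appears once per descriptor instead of once per row, so a renamed recipe cannot leave a stale support entry behind.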

The preceding generic block already handles backend_versions[recipe] for any recipe, so the recipe=="llamacpp" branch was unreachable duplicate code. Removing it also drops a hardcoded backend name from shared code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

find_flm_server_by_type -> find_coexisting_server_by_type matches on SlotPolicy::CoexistByType; count_pinned_servers_by_type skips SlotPolicy::Unmetered instead of recipe=="cloud". router.cpp now holds zero backend-name string literals; both behaviors are unchanged (flm is the only CoexistByType backend, cloud the only Unmetered one). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
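A simplified picture of policy-based matching is sketched below. The `LoadedServer` struct, its fields, and the `SlotPolicy::Standard` enumerator are assumptions; the two function names and the `CoexistByType` / `Unmetered` values are taken from the commit, but the real router signatures are not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Standard is an assumed name for the default policy.
enum class SlotPolicy { Standard, CoexistByType, Unmetered };

// Hypothetical view of a loaded server as the router might see it.
struct LoadedServer {
    std::string model_type;
    SlotPolicy policy;
    bool pinned;
};

// Unmetered servers never count against pinned slots.
inline std::size_t count_pinned_servers_by_type(const std::vector<LoadedServer>& servers,
                                                const std::string& type) {
    std::size_t count = 0;
    for (const LoadedServer& server : servers) {
        if (server.policy == SlotPolicy::Unmetered) continue;
        if (server.pinned && server.model_type == type) ++count;
    }
    return count;
}

// Finds a server that is allowed to coexist with others of the same type.
inline const LoadedServer* find_coexisting_server_by_type(
    const std::vector<LoadedServer>& servers, const std::string& type) {
    for (const LoadedServer& server : servers) {
        if (server.policy == SlotPolicy::CoexistByType && server.model_type == type) {
            return &server;
        }
    }
    return nullptr;
}
```

Neither function mentions a backend name, so a future backend opts into either behavior by setting its descriptor's slot policy.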

…ipe==flm Add BackendDescriptor::self_manages_downloads (true only for flm) and ModelManager::backend_self_manages_downloads(). The two load-time auto-download guards in server.cpp/ollama_api.cpp now consult it instead of hardcoding recipe != "flm". flm is the only backend with the flag set, so behavior is identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

resolve_and_register_local_model() had a recipe if/else scanning the imported directory for each backend's primary artifact (.gguf / .bin / genai_config.json dir). Replace with BackendOps::find_imported_checkpoint(dir): default "" registers the directory (sd-cpp/kokoro/moonshine); llamacpp reuses resolve_gguf_path, whisper finds the .bin, ryzenai finds genai_config.json's dir (and its resolve_checkpoint_path now reuses the same scan). server.cpp holds no per-recipe import logic. Verified via local_import smoke tests for llamacpp (ignores mmproj), whisper, and a default backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Reconcile the self-describing-backends refactor with main's divergence:

- backend_manager.cpp: keep both includes (main's backend_version_policy.h for resolve_latest_pin + our backend_descriptor_registry.h); the two version concerns are orthogonal.
- model_manager.cpp resolve_model_path: keep the ops-based one-liner (backends::ops_for(recipe)->resolve_checkpoint_path) over main's inline recipe switchboard.
- Port main's #2300 GGUF resolver improvements into llamacpp resolve_gguf_path: factor cases 0-5 into a resolve_gguf_variant lambda, resolve against the active refs/main snapshot first, then broaden to all snapshots when the active one lacks the variant. Restores test_034.
- Regenerate backend docs/models.js for main's new server_models entries.

Verified: C++ build clean; ctest 4/4 (incl. GgufCapabilities, LatestVersionFallback, InstallAtomicity); server_endpoints 70/70 (incl. main's #2300 test_034); server_cli2 only the pre-existing test_020 collection-name parsing failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

On Windows the merged include chain pulls in the windows.h max() macro into this TU, turning std::numeric_limits<T>::max() into a syntax error (C2589). Wrap the calls as (std::numeric_limits<T>::max)() so the macro cannot expand. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
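The parenthesized-call trick can be shown without windows.h by defining a stand-in macro. In the sketch below the `max` macro imitates the one windows.h provides when NOMINMAX is not defined, and `largest_value` is a hypothetical helper used only for the demonstration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for the function-like max() macro from windows.h.
// Defined after the standard includes so the headers themselves are unaffected.
#define max(a, b) (((a) > (b)) ? (a) : (b))

template <typename T>
constexpr T largest_value() {
    // Writing std::numeric_limits<T>::max() here would invoke the macro with
    // the wrong number of arguments. Wrapping the name in parentheses means
    // `max` is not directly followed by '(', so the preprocessor leaves it alone.
    return (std::numeric_limits<T>::max)();
}

#undef max
```

Defining NOMINMAX before including windows.h is the other common remedy, but the parenthesized form has the advantage of working regardless of include order.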

flm models come from flm's model_list.json at runtime (0 entries in server_models.json), but the descriptor had dynamic_models=false, so build_cache skipped flm's ops->discover_models() and flm models (e.g. llama3.2-1b-FLM) never registered -> 404. The build_cache comment already documents flm as a dynamic-discovery backend alongside cloud; align the descriptor with that intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

model_manager's download path hardcoded recipe == "moonshine" to fetch a variant directory of files. Add BackendOps::select_checkpoint_files (default nullopt = the GGUF/direct-file defaults) and override it in MoonshineOps. The download path no longer names a backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

system_info hardcoded a recipe == "flm" block to classify FLM's supported-but-unavailable state (.deb/driver manual setup) and emit troubleshoot links. Add BackendOps::classify_unavailable (default nullopt = the generic installable/no-fetch path) and implement it in FlmOps. system_info no longer names a backend in its install-state machine. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…==llamacpp bench hardcoded recipe == "llamacpp" to send the llamacpp_backend override. Use the CLI-safe descriptor registry: any recipe with selectable_backend gets its <config_section>_backend override (llamacpp and vllm today). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… ops model_manager hardcoded actual_recipe == "llamacpp" to require a :variant on GGUF checkpoints at registration. Add BackendOps::validate_registration_checkpoint (default accept) and implement the GGUF rule in LlamaCppOps. Verified: a GGUF checkpoint without :variant is still rejected; other recipes are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DRY pass across the backend folders:

- Add backends::make_server<T>(ctx) for the standard (log_level, model_manager, backend_manager) construction; the 6 plain create() bodies now call it instead of repeating the three context fields. cloud/ryzenai keep bespoke create().
- Each *_server.h closed and re-opened namespace lemon::backends just to nest the per-backend namespace; nest it inline instead (8 headers). ryzenai is left as-is (its legacy RyzenAIServer lives in namespace lemon, not lemon::backends).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

main advanced ~15 commits with its own GGUF-reader consolidation (lemon/gguf_reader.h + gguf_capabilities.h, ModelInfo::gguf) and a cloud discovery security gate. Reconcile:

- model_manager.cpp (6 conflicts): keep the ops-based forms (populate_metadata, validate_checkpoint_file, discover_models, resolve_model_path, download file-selection) over main's inline recipe switchboards.
- Consolidate GGUF reading on main's shared lemon::gguf_reader: drop the now-redundant reader from backends/llamacpp/llamacpp_gguf.{h,cpp} (~240 lines), keeping only the unique resolve_gguf_path; LlamaCppOps::populate_metadata now fills ModelInfo::gguf via the shared reader.
- Port main's cloud-discovery allow_insecure_http gate into CloudOps::discover_models.
- Regenerate docs for main's new server_models entries.

Verified: build clean; ctest 5/5 (incl. GgufCapabilities, AutoTune); endpoints 71/71; cli only the pre-existing test_020; docs drift clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jeremyfowers · 2026-06-25T20:44:47Z

-| `--ctx-size SIZE` | Context size for the model | `4096` |
+| `--ctx-size SIZE` | Context size for the model | auto |


Nice, it's already catching some stale docs.

cc @bitgamma

The per-backend spec()/ops() are the name-based adapter the CMake codegen binds (<stem>::spec/ops), so the functions must exist — but their bodies were repetitive. Add make_spec<T>(descriptor[, split]) (backend_utils.h, where BackendSpec is complete) and single_ops<T>() (backend_registry.h, next to make_server) so the 7 standard spec() and 7 custom ops() collapse to one line each. ryzenai (install key != recipe) and cloud (no spec) keep bespoke spec(); sd-cpp/vllm keep default_backend_ops(). Pure refactor — registry binding, 71/71 endpoints, and all-backends-registered smoke unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The two backend dev docs added by this work (dev/adding-a-backend.md and the generated dev/backends-reference.md) were not wired into the Development nav. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…onfig/defaults

Per-recipe config defaults are now declared in each backend descriptor (takes_args / arg_variants / bin_variants / config_extra -> config_defaults()) instead of hand-maintained blocks in defaults.json. The committed resources/defaults.json stays fully populated (so it remains the discoverable reference for factory defaults) but is now generated:

- New GET /internal/config/defaults emits the canonical default config (ConfigFile::base_defaults(): global keys + descriptor-derived per-recipe sections, host/deployment-independent). Documented alongside /internal/config.
- gen_backend_docs.py -> gen_backend_boilerplate.py, which mirrors that endpoint verbatim into resources/defaults.json (whole-file) in addition to the doc regions. The existing CI --check now also fails if defaults.json drifts.

config_file keeps reading defaults.json at runtime; base_defaults() re-seeds the descriptor blocks so the descriptor stays authoritative even if the file lags.

Verified: a fresh config.json reproduces every prior default; endpoints 71/71; generator --check clean; black clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The single-installable-unit path keyed off recipe != "llamacpp"; switch it to repo_kind != "gguf", the same server-provided classification the function already uses for the collection branch. Behavior-equivalent (collections are handled earlier, so by here repo_kind is gguf or onnx-ryzenai), and it drops the last backend-name literal from hf_pull. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Geramy

Please have tools go through the whole thing and clean up the comments so they are functional and describe only what isn't already intrinsically visible in the code and is required to know to use the function (or whatever it is). Don't have commentary comments. Other than the sheer number of comments, which I'm sure is about 25% of this commit haha xD, it looks good.

Geramy · 2026-06-26T22:20:16Z

 const RAW_BASE = 'https://raw.githubusercontent.com/lemonade-sdk/lemonade';

+/* BEGIN GENERATED: models-js-recipes */
 const RECIPE_PRIORITY = [


I don't see moonshine in here.

Geramy · 2026-06-26T22:20:35Z

  'kokoro'
 ];

 const RECIPE_DISPLAY_NAMES = {


moonshine seems to be missing here too

Geramy · 2026-06-26T22:22:25Z

+)
+```
+
+The `foreach` in `CMakeLists.txt` compiles `<stem>/<stem>_server.cpp` and


Geramy · 2026-06-26T22:30:40Z

+# forces regeneration on the next build (a file(GLOB) would silently miss a
+# newly added backend). The descriptor is a header-only inline const, so it links
+# into both the lemonade CLI and lemond; only lemond links the server sources.
+set(LEMON_BACKENDS


these should be as close to the top of the file as possible honestly.

Geramy · 2026-06-26T22:31:13Z

+    "cloud|cloud"
+)
+
+set(LEMON_DESCRIPTOR_INCLUDES "")


these are fine here.

Geramy · 2026-06-26T22:35:26Z

+    void download_model(const ModelInfo&, bool, DownloadProgressCallback,
+                        const BackendOpsContext&) const override {}
+
+    // Discover models from each installed cloud provider with a resolvable


this is not needed commentary, again self explanatory.

Geramy · 2026-06-26T22:36:49Z


-// Determine device type from recipe
-// Default device from recipe — individual backends override based on their config
+// Fallback device type for recipes with no registered backend descriptor


maybe shrink the commentary to the purpose, no explanation.

Geramy · 2026-06-26T22:37:44Z

+    // so should be skipped by the router's load-time auto-download path.
+    bool backend_self_manages_downloads(const std::string& recipe) const;
+
+    // Shared Hugging Face completeness check: true if all required checkpoints


I'm fine with headers containing commentary on the purpose of functions, as long as it's short and simple and isn't something we already see intrinsically from the code.

Geramy · 2026-06-26T22:38:28Z

        std::string log_name() const { return recipe + " Server"; };
    };

+    // Build a backend's install/download spec from its descriptor's recipe/binary


this comment is talking about backends it doesn't even know really exist, probably try to trim this to functional comments only please.

Geramy · 2026-06-26T22:39:14Z

+
+using BackendCreateFn = std::unique_ptr<WrappedServer> (*)(const BackendContext&);
+
+// Convenience for the common create(): construct a server class from the


again, functional comments only not commentary.
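For context on the hunk under review, here is a minimal sketch of a function-pointer factory type paired with a generic `create()` helper. The `BackendCreateFn` alias is taken from the diff; the body of `make_server`, the `model_dir` field on `BackendContext`, and the `DemoServer` class are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for the PR's types.
struct BackendContext {
    std::string model_dir;  // made-up field
};

struct WrappedServer {
    virtual ~WrappedServer() = default;
    virtual std::string log_name() const = 0;
};

// Alias as shown in the diff: a plain function pointer, so a registry can
// store it as data without any std::function overhead.
using BackendCreateFn = std::unique_ptr<WrappedServer> (*)(const BackendContext&);

// Assumed shape of the convenience helper: construct Server from the context
// and return it through the base-class pointer.
template <typename Server>
std::unique_ptr<WrappedServer> make_server(const BackendContext& ctx) {
    return std::make_unique<Server>(ctx);
}

// A made-up server class used only to exercise the helper.
struct DemoServer : WrappedServer {
    explicit DemoServer(const BackendContext& ctx) : model_dir(ctx.model_dir) {}
    std::string log_name() const override { return "demo Server"; }
    std::string model_dir;
};
```

Each instantiation of `make_server<T>` is an ordinary function, so its address converts directly to `BackendCreateFn`.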

github-actions Bot added the enhancement New feature or request label Jun 19, 2026

jeremyfowers and others added 27 commits June 19, 2026 16:25

jeremyfowers and others added 8 commits June 22, 2026 19:18

jeremyfowers commented Jun 25, 2026


jeremyfowers self-assigned this Jun 25, 2026

jeremyfowers and others added 4 commits June 25, 2026 16:51

Geramy self-requested a review June 26, 2026 22:19

Geramy requested changes Jun 26, 2026


		\| `--ctx-size SIZE` \| Context size for the model \| `4096` \|
		\| `--ctx-size SIZE` \| Context size for the model \| auto \|

