refactor(backends): self-describing WrappedServer backends (#2287) #2320
jeremyfowers wants to merge 40 commits into
Make each inference backend describe itself with a plain-data descriptor plus a server class, and rewrite the scattered `if (recipe == "...")` sites to read a registry built from those descriptors. Adding a backend becomes one LEMON_BACKENDS line plus a descriptor + factory file, with no router, CLI, docs, or support-matrix edits.
- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe data registry and a server-only factory registry, generated from the LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe options/CLI flags, config-section identity, support matrix, recipe labels, cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from /system-info; a CI step fails on drift. Authoring guide in docs/dev/adding-a-backend.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
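To make the descriptor-plus-registry idea concrete, here is a minimal sketch of a lookup that replaces an `if (recipe == "...")` ladder. It is not the PR's code: the trimmed field set, the `Exclusive` enumerator, the display names, and the `descriptor_registry()` / `find_descriptor()` helpers are all assumptions made for illustration (the real registry is generated by CMake from LEMON_BACKENDS).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, trimmed-down descriptor. The real BackendDescriptor carries
// more fields (options, support matrix, labels, ...).
enum class SlotPolicy { Exclusive, CoexistByType, Unmetered };  // Exclusive is an assumed name

struct BackendDescriptor {
    std::string recipe;        // recipe key, e.g. "llamacpp"
    std::string display_name;  // illustrative label, not taken from the PR
    std::string binary;        // backend executable name
    SlotPolicy slot_policy;
};

// Stand-in for the generated data registry: one entry per backend.
inline const std::vector<BackendDescriptor>& descriptor_registry() {
    static const std::vector<BackendDescriptor> registry = {
        {"llamacpp", "Llama.cpp", "llama-server", SlotPolicy::Exclusive},
        {"flm", "FastFlowLM", "flm", SlotPolicy::CoexistByType},
        {"cloud", "Cloud", "", SlotPolicy::Unmetered},
    };
    return registry;
}

// Shared code asks the registry instead of comparing recipe strings itself.
inline const BackendDescriptor* find_descriptor(std::string_view recipe) {
    for (const auto& d : descriptor_registry()) {
        if (d.recipe == recipe) return &d;
    }
    return nullptr;  // unknown recipe (or a collection with no descriptor)
}
```

A caller would then branch on data, for example `find_descriptor(recipe)->slot_policy`, so adding a backend only adds a registry entry.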
CI status
All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), validating the descriptor aggregate-init, CMake
The single red, Test CLI/Endpoints (windows-latest) →
This PR touches backend construction, not inference,
Restructure the self-describing backends to the layout the issue #2287 plan specified (one folder per backend) instead of the flat file layout I used before. This also folds the earlier _descriptor/_factory split into the spec's cleaner shape: the descriptor is a header-only `inline const` and create() lives with the server class.
Each backend now lives in its own folder, in namespace lemon::backends::<stem>:
  include/lemon/backends/<stem>/<stem>.h          inline const descriptor (CLI-safe data)
  include/lemon/backends/<stem>/<stem>_server.h   WrappedServer subclass + create() decl
  server/backends/<stem>/<stem>_server.cpp        implementation + create() def
Shared registry/util files stay at the top of backends/. The CMake foreach over LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry headers from the folder paths. Removes the per-backend *_descriptor.{h,cpp} and *_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the existing curated docs generate from the backend descriptors instead of
just shipping a separate reference file — closing appendix rows 14 and 22.
- Expand the descriptor with the editorial fields the curated docs need:
`modality`, `experimental`, `web_display_name`, and a per-support-row
`device_summary` (RecipeBackendDef). These keep the descriptor the single
source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
* README.md "Supported Configurations" HTML matrix (grouped by modality,
merged rows, rowspans, experimental tag) — wrapped in GENERATED markers;
* docs/guide/configuration/multi-model.md NPU-exclusivity list.
The backend-docs-drift CI job's --check now covers all three docs.
The generated README matrix is also more complete than the hand-written one
(it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes
and prose outside the markers are preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render them from the descriptor options. This also fixes pre-existing drift: the section documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no longer registers, and omitted the moonshine and vllm recipes. Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add inline-marker support to the generator and wrap the `--recipe` "Common values" list in custom-models.md so it renders from the descriptor recipe set (plus collection.omni). Now covered by the backend-docs-drift CI check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the last two cleanly-derivable doc touchpoints (appendix rows 16 and 21).
- configuration/README.md "Example config.json": generated from a fresh lemond's GET /internal/config (the real canonical config). This also fixes pre-existing drift: the hand-written block had `config_version: 1` (now 2), `prefer_system: false` (now true), a stray `device` key, and an invalid trailing comma. `port` is normalized to the documented default 13305.
- docs/assets/models.js RECIPE_PRIORITY + RECIPE_DISPLAY_NAMES: generated from descriptors. A new `web_priority` editorial field preserves the curated website ordering (so the order is descriptor-sourced, not a silent reorder); legacy `oga-*` recipes are dropped as agreed. Adds the correct `vllm` display name.
The generator now drives 7 docs and supports both `<!-- -->` (Markdown) and `/* */` (JS) GENERATED markers. backend-docs-drift --check covers all of them.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive spec; drop device map)
Two agreed plan touchpoints were left incomplete; this finishes them.
Row 4: try_get_spec_for_recipe was still a hand-written 8-branch if-ladder in backend_utils.cpp, which also forced it to #include all 8 server headers. Each backend now exposes a uniform `spec()` accessor (alongside create()); the generated factory registry binds it, and `backends::spec_for(recipe)` / try_get_spec_for_recipe iterate the registry. backend_utils.cpp now includes ZERO server headers. Also reroute the two leaking `Server::SPEC` references (model_manager find_flm_binary) through the registry.
Row 5: get_device_type_from_recipe still carried the full recipe->device map, redundant with BackendDescriptor::default_device. Reduced to a DEVICE_NONE fallback for non-descriptor recipes (collections/unknown); the descriptor is the single source via ModelManager::device_type_for_recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Introduce a stateless per-backend behavior interface for model management that happens WITHOUT a running subprocess (checkpoint-path resolution, download, dynamic discovery, per-model metadata, version detection, availability). It is the home for the recipe switchboards currently scattered through model_manager and system_info.
- BackendOps base class (lemon/backends/backend_ops.h): shared default behavior; backends override only the policy points they need (inherit shared logic, don't copy it). Methods are added incrementally as switchboards migrate; each has a default so adding one never forces edits to backends that don't override it.
- Each backend folder exposes a uniform ops() singleton (alongside create()/spec()), bound into BackendRegistration; backends::ops_for(recipe) returns it.
- Purely additive: every backend uses the default base ops for now, so there is no behavior change yet. Migrations follow in subsequent commits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
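The "base class with defaults, override only the policy points" shape can be sketched as follows. This is an illustration under assumptions, not the PR's `backend_ops.h`: the method signatures are simplified (the real `is_downloaded` takes model information rather than a bool), and the map-based `ops_for` here stands in for the generated registration binding.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stateless behavior interface: every hook has a default, so a backend that
// does not care about a hook never needs to mention it.
struct BackendOps {
    virtual ~BackendOps() = default;
    // Simplified signature (assumption): the shared rule is "complete
    // checkpoints on disk means downloaded".
    virtual bool is_downloaded(bool checkpoints_complete) const { return checkpoints_complete; }
    virtual bool invalidates_cache_after_download() const { return false; }
};

// Cloud models have nothing to download, so they always count as present.
struct CloudOps : BackendOps {
    bool is_downloaded(bool) const override { return true; }
};

// flm manages its own model store, so the cache is rebuilt after a pull.
struct FlmOps : BackendOps {
    bool invalidates_cache_after_download() const override { return true; }
};

// Hypothetical stand-in for backends::ops_for(recipe): unknown recipes and
// backends without overrides share the default ops singleton.
inline const BackendOps* ops_for(const std::string& recipe) {
    static const BackendOps base{};
    static const CloudOps cloud{};
    static const FlmOps flm{};
    static const std::map<std::string, const BackendOps*> table = {
        {"cloud", &cloud},
        {"flm", &flm},
    };
    const auto it = table.find(recipe);
    return it != table.end() ? it->second : &base;
}
```

Adding a new hook to `BackendOps` with a default body leaves `CloudOps` and `FlmOps` untouched, which is the property the commit message calls "purely additive".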
…readers into folders
Replace the populate_model_metadata recipe switchboard with
ops_for(recipe)->populate_metadata(). The backend-specific readers move into
their folders:
- GGUF metadata reader (read_gguf_metadata + byte parsers) -> backends/llamacpp/
llamacpp_gguf.{h,cpp}; LlamaCppOps::populate_metadata reads arch + capability
labels there.
- FLM model-file helpers (config.json ctx window, model-dir discovery) ->
backends/fastflowlm/fastflowlm_models.{h,cpp}; FlmOps::populate_metadata uses it.
model_manager no longer knows how either backend stores or introspects models.
CMake now globs each backend folder's *.cpp (CONFIGURE_DEPENDS) so backend-private
helper files need no CMake edit; the backend LIST stays explicit.
Verified: GGUF context windows still populate (131072/128000/32768 for sample
models) and test_gguf_capabilities passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…llamacpp||sd-cpp)&&rocm)
Add a `rocm_channels` descriptor field (llamacpp {"stable","nightly"}, sd-cpp
{"stable"}) and a recipe_has_rocm_channels() registry helper. Replace the
hardcoded `(recipe=="llamacpp"||recipe=="sd-cpp") && rocm` predicate — copied
across backend_utils.cpp (3×), backend_manager.cpp (2×), and system_info.cpp —
with the descriptor check. rocm_channel_for_recipe() now clamps a requested
channel to one the backend publishes (so sd-cpp's missing "nightly" -> "stable"
falls out of the data instead of a per-recipe special case).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
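A minimal sketch of the channel clamp described above. The descriptor is reduced to the two fields needed here, and the fallback rule (use the first published channel when the requested one is not published) is an assumption that happens to reproduce the sd-cpp "nightly" -> "stable" case; the PR's actual `rocm_channel_for_recipe()` may choose differently.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Reduced descriptor (assumption): only the fields this sketch needs.
struct BackendDescriptor {
    std::string recipe;
    std::vector<std::string> rocm_channels;  // empty = backend publishes no ROCm channels
};

inline bool recipe_has_rocm_channels(const BackendDescriptor& d) {
    return !d.rocm_channels.empty();
}

// Clamp a requested channel to one the backend actually publishes.
inline std::string rocm_channel_for(const BackendDescriptor& d, const std::string& requested) {
    if (d.rocm_channels.empty()) return "";
    const auto it = std::find(d.rocm_channels.begin(), d.rocm_channels.end(), requested);
    // Assumed fallback: the first published channel.
    return it != d.rocm_channels.end() ? requested : d.rocm_channels.front();
}

// Example data matching the commit message (C++17 inline variables).
inline const BackendDescriptor kLlamaCpp{"llamacpp", {"stable", "nightly"}};
inline const BackendDescriptor kSdCpp{"sd-cpp", {"stable"}};
inline const BackendDescriptor kNoRocm{"example-no-rocm", {}};  // hypothetical backend
```

With this data, requesting "nightly" for sd-cpp yields "stable" with no per-recipe branch in shared code.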
…rst leak)
Replace the ~290-line recipe switchboard in ModelManager::resolve_model_path
with ops_for(recipe)->resolve_checkpoint_path(). The model manager now only does
the generic prefix (collections, local_path/local_upload, HF cache-dir
computation) and hands off to the backend.
- New BackendOps::resolve_checkpoint_path; base = the shared HF behavior
(active-snapshot variant/aux resolution, main-repo fallback, directory
fallback). Backends override only their artifact layout:
* llamacpp -> GGUF resolver (sharding/folder/quant-token), moved into
backends/llamacpp/llamacpp_gguf (resolve_gguf_path).
* ryzenai -> genai_config.json directory; kokoro -> index.json;
whispercpp -> first .bin; cloud -> ""; flm -> checkpoint passthrough.
- New shared backends/hf_cache_util (exists/dir_options/active_snapshot_path/
repo_id_to_cache_dir_name) so ops reuse the same HF-cache mechanics.
model_manager.cpp -362 lines; resolve_model_path 365 -> 34. Verified all recipes
still resolve as downloaded (llamacpp variants, whisper .bin, kokoro index,
sd-cpp, ryzenai, flm) via /models.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…FLM cluster → folder
Dynamic discovery, download status, and downloading now flow through BackendOps instead of recipe switchboards in model_manager:
- discover_models: build_cache loops descriptors with dynamic_models=true and merges ops->discover_models(). FLM (`flm list`) and cloud (per-provider) both implement it, so the two bespoke discovery blocks collapse to one generic loop.
- is_downloaded: base = shared HF completeness (ModelManager::checkpoints_complete); CloudOps → true; FlmOps → installed-set membership. Replaces the flm_set/cloud/else branches in build_cache and add_model_to_cache.
- validate_checkpoint_file: LlamaCppOps does the GGUF-magic check (was an inline llamacpp branch in are_required_checkpoints_complete).
- download_model: base = shared HF engine (download_from_huggingface_engine); FlmOps → flm pull; CloudOps → no-op. download_registered_model just dispatches. invalidates_cache_after_download() replaces the recipe=="flm" cache-reset.
The whole FLM cluster (find_flm_binary, flm_installed_checkpoints, flm_discover_models, flm_download) moves into backends/fastflowlm/fastflowlm_models. model_manager keeps only the generic HF engine.
Verified: server_endpoints 69 pass; download status correct for every recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s hook
get_recipe_version now reads version.txt generically and lets the backend ops override, instead of branching on recipe. The per-backend version commands move into their folders:
- system llama-server version (`llama-server --version` + regex) -> backends/llamacpp; LlamaCppOps::resolve_version returns it for the "system" backend.
- flm version (`flm version --json`) -> backends/fastflowlm (flm_version()); FlmOps::resolve_version returns it when no version.txt is present.
Removes SystemInfo::get_system_llamacpp_version / get_flm_version and the llamacpp-system / flm branches from system_info.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
config_section duplicated the recipe string in 8 descriptors; it defaults to the
recipe via effective_config_section(), so set those to "". Only sd-cpp ("sdcpp")
and ryzenai-llm ("ryzenai") keep an explicit section because theirs genuinely
differ from the recipe.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
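The defaulting rule is small enough to sketch directly. The struct below is a reduced, hypothetical descriptor; `backend_override_key()` is an invented helper name illustrating the `<config_section>_backend` key that a later commit in this PR derives for recipes with `selectable_backend`.

```cpp
#include <cassert>
#include <string>

// Reduced descriptor (assumption): only the fields this sketch needs.
struct BackendDescriptor {
    std::string recipe;
    std::string config_section;  // "" means "same as the recipe"
    bool selectable_backend;
};

// An empty config_section falls back to the recipe string.
inline const std::string& effective_config_section(const BackendDescriptor& d) {
    return d.config_section.empty() ? d.recipe : d.config_section;
}

// Hypothetical helper: the "<config_section>_backend" override key, or ""
// for recipes that have no selectable backend.
inline std::string backend_override_key(const BackendDescriptor& d) {
    return d.selectable_backend ? effective_config_section(d) + "_backend" : std::string{};
}

// Example data taken from the commit messages (C++17 inline variables).
inline const BackendDescriptor kLlamaCpp{"llamacpp", "", true};
inline const BackendDescriptor kSdCpp{"sd-cpp", "sdcpp", false};
```

Only descriptors whose section genuinely differs from the recipe (such as sd-cpp above) need to spell it out.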
…_metrics descriptor flag prometheus_metrics.cpp hardcoded `recipe == "llamacpp"` to decide whether to scrape a backend subprocess's /metrics. Replace with a descriptor flag (exposes_prometheus_metrics; llamacpp = true) so a new backend that exposes Prometheus metrics opts in via its descriptor, not by editing the metrics code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These backend-specific per-model fields no longer sit on the shared ModelInfo
struct: llamacpp reads info.extra<bool>("hf_load", false) and moonshine reads
info.extra<int>("moonshine_arch", -1). Removed the typed fields, their explicit
parse sites, and their kKnownKeys entries; added parse_extras() to the two
ModelInfo-building paths that lacked it (add_model_to_cache, get_model_info_
unfiltered) so extras populate everywhere a model is built from JSON.
Verified: llamacpp models still resolve/download (hf_load path intact).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
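A sketch of a typed `extra<T>(key, fallback)` accessor. The PR stores extras as JSON values; to stay self-contained this example substitutes a `std::variant` as a stand-in for JSON, so the storage type, `ExtraValue`, and `example_model()` are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Stand-in for a JSON value (assumption): the real extras map holds JSON.
using ExtraValue = std::variant<bool, int, std::string>;

struct ModelInfo {
    std::string recipe;
    std::map<std::string, ExtraValue> extras;  // unknown per-model keys land here

    // Returns the stored value when the key exists and has type T,
    // otherwise the caller-supplied fallback.
    template <typename T>
    T extra(const std::string& key, T fallback) const {
        const auto it = extras.find(key);
        if (it == extras.end()) return fallback;
        if (const T* value = std::get_if<T>(&it->second)) return *value;
        return fallback;  // key present but stored with a different type
    }
};

// Hypothetical model entry carrying one backend-specific field.
inline ModelInfo example_model() {
    ModelInfo info;
    info.recipe = "llamacpp";
    info.extras["hf_load"] = true;
    return info;
}
```

A backend reads its own field with `info.extra<bool>("hf_load", false)`, and a backend that introduces a new per-model key does not have to touch the shared struct.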
Replace the hardcoded (sd-cpp||llamacpp||vllm)&&rocm recipe-list in is_recipe_installed and build_recipes_info with a rocm_requires_cwsr_fix descriptor flag (set on those three backends). The kernel CWSR detection (needs_gfx1151_cwsr_fix) stays in system_info as generic hardware detection; only "which backends' rocm build needs it" is now descriptor data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ps hook
is_recipe_installed now finds the managed binary generically and asks the backend's ops whether it's actually installed, instead of hardcoding the llamacpp-system HIP check and the flm PATH fallback:
- check_install(backend, binary_found) ops hook; base = installed iff binary found. LlamaCppOps adds the ggml HIP-plugin requirement for the "system" build on AMD GPUs; FlmOps treats a PATH-installed flm as present.
- is_ggml_hip_plugin_available moves into backends/llamacpp; find_flm_executable and run_flm_validate move into backends/fastflowlm. Removed from path_utils (+ their orphaned decls/comments).
system_info no longer carries llamacpp/flm-specific availability knowledge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… vs AtLeast) The update-required check special-cased recipe=="flm" to allow an installed version newer than the pin. Replace with a version_policy descriptor field (Exact default; flm = AtLeast for its system-managed package). system_info no longer names flm in the version-comparison logic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
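The Exact vs AtLeast distinction can be sketched as below. The parsing is an assumption: it handles only dotted numeric versions with an optional leading `v`, and `update_required()` is an illustrative name rather than the function in system_info.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class VersionPolicy { Exact, AtLeast };

// Assumed format: dotted integers with an optional leading 'v' ("v1.2.3").
// std::stoi throws on non-numeric components; real code would need to guard that.
inline std::vector<int> parse_version(std::string v) {
    if (!v.empty() && (v[0] == 'v' || v[0] == 'V')) v.erase(0, 1);
    std::vector<int> parts;
    std::istringstream in(v);
    std::string piece;
    while (std::getline(in, piece, '.')) parts.push_back(std::stoi(piece));
    return parts;
}

// Exact: any difference from the pin requires an update.
// AtLeast: only an installed version older than the pin does.
inline bool update_required(VersionPolicy policy, const std::string& installed,
                            const std::string& pinned) {
    const std::vector<int> have = parse_version(installed);
    const std::vector<int> want = parse_version(pinned);
    if (policy == VersionPolicy::Exact) return have != want;
    return have < want;  // std::vector compares lexicographically
}
```

Under AtLeast, a system-managed package that is newer than the pin is accepted, which is the behavior the commit describes for flm.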
The `flm remove` subprocess orchestration moves out of ModelManager::delete_model into backends/fastflowlm (flm_remove). model_manager keeps only the generic HF-cache deletion path; the flm branch is now a thin call into the backend. Verified: server_endpoints 69 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cipe blocks RuntimeConfig::recipe_options() had a hardcoded nested→flat translation block per recipe (llamacpp/whispercpp/moonshine/sdcpp/vllm). Replace with a single loop over the descriptors: each option's config.json key is derived from its name role (*_backend → "backend", *_args → variant "<backend>_args"/"args", *_device → "device", else the option name verbatim for sd-cpp's steps/cfg_scale/ width/height). Adding a backend no longer requires editing this function. Verified: server_endpoints 69 pass (config/params translation unchanged). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
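One way to read the name-role rule is sketched here. The function name, the `selected_backend` parameter, and the interpretation of the `*_args` variant (prefix the key with the selected backend when one is given) are assumptions; the commit message only states the mapping, not its implementation.

```cpp
#include <cassert>
#include <string>

// C++17-friendly suffix test (std::string::ends_with is C++20).
inline bool has_suffix(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical derivation of the nested config.json key from a flat option name.
inline std::string config_key_for_option(const std::string& option_name,
                                         const std::string& selected_backend = "") {
    if (has_suffix(option_name, "_backend")) return "backend";
    if (has_suffix(option_name, "_args")) {
        // Assumed reading of the "<backend>_args"/"args" variant rule.
        return selected_backend.empty() ? "args" : selected_backend + "_args";
    }
    if (has_suffix(option_name, "_device")) return "device";
    return option_name;  // e.g. sd-cpp's steps/cfg_scale/width/height
}
```

Because the key comes from the option's name, a new backend's options translate without a new block in `recipe_options()`.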
… across descriptor↔server.h) The backend binary name (and recipe) were duplicated between the descriptor (<stem>.h) and the BackendSpec literal (<stem>_server.h) — the cross-file redundancy. Remove the static SPEC member; each backend's spec() now builds the BackendSpec lazily from descriptor.binary (+ descriptor.recipe, or the explicit "ryzenai-server" install id where it differs) plus the class's get_install_params and split flag. In-class binary lookups go through spec(); server.cpp's sd upscale uses try_get_spec_for_recipe. Net: the binary name now lives in exactly one place (the descriptor). Lazy function-local statics also avoid any static-init-order coupling between the descriptor and the spec. Verified: builds green; system-info install detection unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The recipe was repeated on every support row (6x in llamacpp.h). Introduce a recipe-free BackendSupport struct; the owning descriptor's recipe is filled in by recipe_defs() when flattening to RecipeBackendDef. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
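A minimal sketch of the flattening step, assuming illustrative field names (`backend`, `os`) and made-up support rows; only the shape (recipe stored once on the descriptor, filled in while flattening) is taken from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Recipe-free support row as stored on the descriptor (fields are illustrative).
struct BackendSupport {
    std::string backend;
    std::string os;
};

// Flattened row consumed by shared code; carries the owning recipe.
struct RecipeBackendDef {
    std::string recipe;
    std::string backend;
    std::string os;
};

struct BackendDescriptor {
    std::string recipe;
    std::vector<BackendSupport> support;
};

// The recipe is written once per descriptor and copied into each row here.
inline std::vector<RecipeBackendDef> recipe_defs(const std::vector<BackendDescriptor>& descriptors) {
    std::vector<RecipeBackendDef> flat;
    for (const auto& d : descriptors) {
        for (const auto& row : d.support) {
            flat.push_back({d.recipe, row.backend, row.os});
        }
    }
    return flat;
}

// Hypothetical data, not the real support matrix.
inline std::vector<BackendDescriptor> example_descriptors() {
    return {
        {"llamacpp", {{"rocm", "linux"}, {"metal", "macos"}}},
        {"kokoro", {{"cpu", "linux"}}},
    };
}
```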
The preceding generic block already handles backend_versions[recipe] for any recipe, so the recipe=="llamacpp" branch was unreachable duplicate code. Removing it also drops a hardcoded backend name from shared code. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
find_flm_server_by_type -> find_coexisting_server_by_type matches on SlotPolicy::CoexistByType; count_pinned_servers_by_type skips SlotPolicy::Unmetered instead of recipe=="cloud". router.cpp now holds zero backend-name string literals; both behaviors are unchanged (flm is the only CoexistByType backend, cloud the only Unmetered one). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
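The policy-based router checks might look roughly like this. `LoadedServer`, its fields, the `Exclusive` enumerator, and the example data are assumptions; only the two function names and the `CoexistByType` / `Unmetered` policies come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class SlotPolicy { Exclusive, CoexistByType, Unmetered };  // Exclusive is an assumed name

// Hypothetical view of a loaded server as the router sees it.
struct LoadedServer {
    std::string recipe;
    std::string model_type;
    SlotPolicy policy;
    bool pinned;
};

// Unmetered backends never count against a type's slot budget.
inline std::size_t count_pinned_servers_by_type(const std::vector<LoadedServer>& servers,
                                                const std::string& type) {
    std::size_t count = 0;
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::Unmetered) continue;
        if (s.pinned && s.model_type == type) ++count;
    }
    return count;
}

// Matches on policy rather than on a backend name.
inline const LoadedServer* find_coexisting_server_by_type(const std::vector<LoadedServer>& servers,
                                                          const std::string& type) {
    for (const auto& s : servers) {
        if (s.policy == SlotPolicy::CoexistByType && s.model_type == type) return &s;
    }
    return nullptr;
}

// Example state (illustrative): one pinned local LLM, one flm, one pinned cloud model.
inline const std::vector<LoadedServer>& example_servers() {
    static const std::vector<LoadedServer> servers = {
        {"llamacpp", "llm", SlotPolicy::Exclusive, true},
        {"flm", "llm", SlotPolicy::CoexistByType, false},
        {"cloud", "llm", SlotPolicy::Unmetered, true},
    };
    return servers;
}
```

Neither function contains a recipe string, so a future backend with the same policy gets the same treatment automatically.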
…ipe==flm Add BackendDescriptor::self_manages_downloads (true only for flm) and ModelManager::backend_self_manages_downloads(). The two load-time auto-download guards in server.cpp/ollama_api.cpp now consult it instead of hardcoding recipe != "flm". flm is the only backend with the flag set, so behavior is identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
resolve_and_register_local_model() had a recipe if/else scanning the imported directory for each backend's primary artifact (.gguf / .bin / genai_config.json dir). Replace with BackendOps::find_imported_checkpoint(dir): default "" registers the directory (sd-cpp/kokoro/moonshine); llamacpp reuses resolve_gguf_path, whisper finds the .bin, ryzenai finds genai_config.json's dir (and its resolve_checkpoint_path now reuses the same scan). server.cpp holds no per-recipe import logic. Verified via local_import smoke tests for llamacpp (ignores mmproj), whisper, and a default backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reconcile the self-describing-backends refactor with main's divergence:
- backend_manager.cpp: keep both includes (main's backend_version_policy.h for resolve_latest_pin + our backend_descriptor_registry.h); the two version concerns are orthogonal.
- model_manager.cpp resolve_model_path: keep the ops-based one-liner (backends::ops_for(recipe)->resolve_checkpoint_path) over main's inline recipe switchboard.
- Port main's #2300 GGUF resolver improvements into llamacpp resolve_gguf_path: factor cases 0-5 into a resolve_gguf_variant lambda, resolve against the active refs/main snapshot first, then broaden to all snapshots when the active one lacks the variant. Restores test_034.
- Regenerate backend docs/models.js for main's new server_models entries.
Verified: C++ build clean; ctest 4/4 (incl. GgufCapabilities, LatestVersionFallback, InstallAtomicity); server_endpoints 70/70 (incl. main's #2300 test_034); server_cli2 only the pre-existing test_020 collection-name parsing failure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On Windows the merged include chain pulls in the windows.h max() macro into this TU, turning std::numeric_limits<T>::max() into a syntax error (C2589). Wrap the calls as (std::numeric_limits<T>::max)() so the macro cannot expand. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
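The parenthesized-call fix can be reproduced without windows.h by defining a similar function-like macro locally. The `max` macro below is a simulation of the Windows one and `largest<T>()` is a hypothetical helper; the macro is undefined again at the end so it does not leak into later code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simulate the function-like max() macro that windows.h defines unless
// NOMINMAX is set. It is defined after the standard headers on purpose.
#define max(a, b) (((a) > (b)) ? (a) : (b))

// Writing std::numeric_limits<T>::max() here would be treated as a macro
// invocation with the wrong argument count and fail to compile. Wrapping the
// name in parentheses means the token after `max` is `)`, not `(`, so the
// preprocessor leaves it alone and the member function is called.
template <typename T>
constexpr T largest() {
    return (std::numeric_limits<T>::max)();
}

// The macro itself still works for ordinary two-argument use.
inline int larger_of(int a, int b) { return max(a, b); }

#undef max  // keep the simulated macro from affecting anything that follows
```

Defining NOMINMAX before including windows.h is the other common remedy, but the parenthesized form also protects translation units that cannot control the include order.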
flm models come from flm's model_list.json at runtime (0 entries in server_models.json), but the descriptor had dynamic_models=false, so build_cache skipped flm's ops->discover_models() and flm models (e.g. llama3.2-1b-FLM) never registered -> 404. The build_cache comment already documents flm as a dynamic-discovery backend alongside cloud; align the descriptor with that intent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
model_manager's download path hardcoded recipe == "moonshine" to fetch a variant directory of files. Add BackendOps::select_checkpoint_files (default nullopt = the GGUF/direct-file defaults) and override it in MoonshineOps. The download path no longer names a backend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
system_info hardcoded a recipe == "flm" block to classify FLM's supported-but-unavailable state (.deb/driver manual setup) and emit troubleshoot links. Add BackendOps::classify_unavailable (default nullopt = the generic installable/no-fetch path) and implement it in FlmOps. system_info no longer names a backend in its install-state machine. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…==llamacpp bench hardcoded recipe == "llamacpp" to send the llamacpp_backend override. Use the CLI-safe descriptor registry: any recipe with selectable_backend gets its <config_section>_backend override (llamacpp and vllm today). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… ops model_manager hardcoded actual_recipe == "llamacpp" to require a :variant on GGUF checkpoints at registration. Add BackendOps::validate_registration_checkpoint (default accept) and implement the GGUF rule in LlamaCppOps. Verified: a GGUF checkpoint without :variant is still rejected; other recipes are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DRY pass across the backend folders:
- Add backends::make_server<T>(ctx) for the standard (log_level, model_manager, backend_manager) construction; the 6 plain create() bodies now call it instead of repeating the three context fields. cloud/ryzenai keep bespoke create().
- Each *_server.h closed and re-opened namespace lemon::backends just to nest the per-backend namespace; nest it inline instead (8 headers). ryzenai is left as-is (its legacy RyzenAIServer lives in namespace lemon, not lemon::backends).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
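A sketch of what a `make_server<T>` helper of this kind could look like. Everything except the `make_server` and `BackendCreateFn` names is a placeholder: the empty `ModelManager` / `BackendManager` types, the `BackendContext` layout, and `ExampleServer` are assumptions standing in for the real classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Placeholder types (assumptions) standing in for the real managers.
struct ModelManager {};
struct BackendManager {};

struct BackendContext {
    std::string log_level;
    ModelManager* model_manager;
    BackendManager* backend_manager;
};

class WrappedServer {
public:
    virtual ~WrappedServer() = default;
    virtual std::string name() const = 0;
};

using BackendCreateFn = std::unique_ptr<WrappedServer> (*)(const BackendContext&);

// Standard construction: forward the three context fields to T's constructor.
template <typename T>
std::unique_ptr<WrappedServer> make_server(const BackendContext& ctx) {
    return std::make_unique<T>(ctx.log_level, ctx.model_manager, ctx.backend_manager);
}

// Hypothetical backend with the standard constructor shape.
class ExampleServer : public WrappedServer {
public:
    ExampleServer(std::string log_level, ModelManager*, BackendManager*)
        : log_level_(std::move(log_level)) {}
    std::string name() const override { return "example@" + log_level_; }

private:
    std::string log_level_;
};

// The per-backend create() collapses to one line and still matches BackendCreateFn.
inline std::unique_ptr<WrappedServer> create(const BackendContext& ctx) {
    return make_server<ExampleServer>(ctx);
}

inline BackendContext example_context() { return BackendContext{"info", nullptr, nullptr}; }
```

Backends whose constructors need something other than these three fields would keep a hand-written `create()`, as the commit notes for cloud and ryzenai.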
main advanced ~15 commits with its own GGUF-reader consolidation
(lemon/gguf_reader.h + gguf_capabilities.h, ModelInfo::gguf) and a cloud
discovery security gate. Reconcile:
- model_manager.cpp (6 conflicts): keep the ops-based forms (populate_metadata,
validate_checkpoint_file, discover_models, resolve_model_path, download
file-selection) over main's inline recipe switchboards.
- Consolidate GGUF reading on main's shared lemon::gguf_reader: drop the now
-redundant reader from backends/llamacpp/llamacpp_gguf.{h,cpp} (~240 lines),
keeping only the unique resolve_gguf_path; LlamaCppOps::populate_metadata now
fills ModelInfo::gguf via the shared reader.
- Port main's cloud-discovery allow_insecure_http gate into CloudOps::discover_models.
- Regenerate docs for main's new server_models entries.
Verified: build clean; ctest 5/5 (incl. GgufCapabilities, AutoTune); endpoints
71/71; cli only the pre-existing test_020; docs drift clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| `--ctx-size SIZE` | Context size for the model | `4096` | |
| `--ctx-size SIZE` | Context size for the model | auto | |
nice, it's already catching some stale docs
cc @bitgamma
The per-backend spec()/ops() are the name-based adapter the CMake codegen binds (<stem>::spec/ops), so the functions must exist — but their bodies were repetitive. Add make_spec<T>(descriptor[, split]) (backend_utils.h, where BackendSpec is complete) and single_ops<T>() (backend_registry.h, next to make_server) so the 7 standard spec() and 7 custom ops() collapse to one line each. ryzenai (install key != recipe) and cloud (no spec) keep bespoke spec(); sd-cpp/vllm keep default_backend_ops(). Pure refactor — registry binding, 71/71 endpoints, and all-backends-registered smoke unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The two backend dev docs added by this work (dev/adding-a-backend.md and the generated dev/backends-reference.md) were not wired into the Development nav. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onfig/defaults
Per-recipe config defaults are now declared in each backend descriptor (takes_args / arg_variants / bin_variants / config_extra -> config_defaults()) instead of hand-maintained blocks in defaults.json. The committed resources/defaults.json stays fully populated (so it remains the discoverable reference for factory defaults) but is now generated:
- New GET /internal/config/defaults emits the canonical default config (ConfigFile::base_defaults(): global keys + descriptor-derived per-recipe sections, host/deployment-independent). Documented alongside /internal/config.
- gen_backend_docs.py -> gen_backend_boilerplate.py, which mirrors that endpoint verbatim into resources/defaults.json (whole-file) in addition to the doc regions. The existing CI --check now also fails if defaults.json drifts.
config_file keeps reading defaults.json at runtime; base_defaults() re-seeds the descriptor blocks so the descriptor stays authoritative even if the file lags.
Verified: a fresh config.json reproduces every prior default; endpoints 71/71; generator --check clean; black clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The single-installable-unit path keyed off recipe != "llamacpp"; switch it to repo_kind != "gguf", the same server-provided classification the function already uses for the collection branch. Behavior-equivalent (collections are handled earlier, so by here repo_kind is gguf or onnx-ryzenai), and it drops the last backend-name literal from hf_pull. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Geramy
left a comment
There was a problem hiding this comment.
Please have tools go through the whole thing and clean up the comments so they are functional and describe only what isn't already intrinsically visible and what you need to know to use the function (or whatever it is in code). Don't have commentary comments. Other than the sheer amount of comments, which I'm sure is about 25% of this commit haha xD, it looks good.
const RAW_BASE = 'https://raw.githubusercontent.com/lemonade-sdk/lemonade';

/* BEGIN GENERATED: models-js-recipes */
const RECIPE_PRIORITY = [
'kokoro'
];

const RECIPE_DISPLAY_NAMES = {
moonshine seems to be missing here too
| ) | ||
| ``` | ||
|
|
||
| The `foreach` in `CMakeLists.txt` compiles `<stem>/<stem>_server.cpp` and |
| # forces regeneration on the next build (a file(GLOB) would silently miss a | ||
| # newly added backend). The descriptor is a header-only inline const, so it links | ||
| # into both the lemonade CLI and lemond; only lemond links the server sources. | ||
| set(LEMON_BACKENDS |
these should be as close to the top of the file as possible honestly.
"cloud|cloud"
)

set(LEMON_DESCRIPTOR_INCLUDES "")
void download_model(const ModelInfo&, bool, DownloadProgressCallback,
const BackendOpsContext&) const override {}

// Discover models from each installed cloud provider with a resolvable
this is not needed commentary, again self explanatory.
// Determine device type from recipe
// Default device from recipe — individual backends override based on their config
// Fallback device type for recipes with no registered backend descriptor
maybe shrink the commentary to the purpose, no explanation.
// so should be skipped by the router's load-time auto-download path.
bool backend_self_manages_downloads(const std::string& recipe) const;

// Shared Hugging Face completeness check: true if all required checkpoints
im fine with headers containing commentary of the purpose of functions as long as its short and simple and we don't already see intrinsically from the code.
std::string log_name() const { return recipe + " Server"; };
};

// Build a backend's install/download spec from its descriptor's recipe/binary
this comment is talking about backends it doesn't even know really exist, probably try to trim this to functional comments only please.
using BackendCreateFn = std::unique_ptr<WrappedServer> (*)(const BackendContext&);

// Convenience for the common create(): construct a server class from the
again, functional comments only not commentary.
Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor + a server class + a stateless behavior object, and every scattered `if (recipe == "...")` site is rewritten to read a registry built from those descriptors. Backend-specific logic no longer leaks into the router, model manager, system-info, CLI, or docs.

Layout: a backend is a folder

Adding a backend = one `LEMON_BACKENDS` line in `CMakeLists.txt` + that folder + a `backend_versions.json` pin + `server_models.json` entries. No router, CLI, doc, or support-matrix edits; those are all derived. CMake globs each backend folder (`CONFIGURE_DEPENDS`), so backend-private helper files need no build edit.

What changed

- Descriptor (`backend_descriptor.h`): plain data describing what a backend is: recipe, display name, binary, config section, default device, `SlotPolicy`, `selectable_backend`, `uses_ctx_size`, `dynamic_models`, declarative `options[]`, OS/GPU `support[]`, default labels, required checkpoints, plus editorial/policy fields (`modality`, `experimental`, `web_priority`, `rocm_channels`, `version_policy`, `exposes_prometheus_metrics`, `rocm_requires_cwsr_fix`, `self_manages_downloads`).
- Two-tier registry, generated from `LEMON_BACKENDS` at CMake configure time: a CLI-safe data registry (descriptors only; links into both `lemonade` and `lemond`) and a server-only factory registry (binds each descriptor to its class's `create()`, `spec()`, `ops()`). This split lets the CLI read recipe options/flags from descriptors without linking server classes.
- `BackendOps`, stateless per-backend behavior (`backend_ops.h`): the model-management logic that happens without a running subprocess. The base class is the shared Hugging Face behavior; each backend overrides only its policy points, so shared download/cache logic is inherited, not copied. Methods: `populate_metadata`, `resolve_checkpoint_path`, `find_imported_checkpoint`, `validate_registration_checkpoint`, `select_checkpoint_files`, `discover_models`, `is_downloaded`, `validate_checkpoint_file`, `download_model`, `invalidates_cache_after_download`, `resolve_version`, `check_install`, `classify_unavailable`. This is what let `model_manager.cpp` and `system_info.cpp` shed their per-recipe switchboards (`resolve_model_path` went from a ~290-line `if/else` to one `ops_for(recipe)->…` call).
- Descriptor/ops-driven sites: router creation, NPU/slot eviction & cloud LRU exemption (`SlotPolicy`, no recipe literals left in `router.cpp`), device type, recipe options / CLI flags / defaults, config-section identity, ROCm channels (`recipe_has_rocm_channels`), the support matrix (`RECIPE_DEFS` deleted from `system_info.cpp`), recipe→label inference, FLM dynamic discovery, the FLM install-state machine, cloud availability + discovery, and the install-state UI hints.
- Registration helpers: `make_server<T>` / `make_spec<T>` / `single_ops<T>` keep the per-backend `create()` / `spec()` / `ops()` one-liners DRY (irregular backends such as cloud, ryzenai, and vllm keep bespoke bodies).
- `/system-info` `recipes` entries enriched with `display_name` / `selectable_backend` / `uses_ctx_size` / `options` / `support`. The desktop app reads recipe display names from `/system-info` instead of hardcoded TypeScript.
- Docs generation: `docs/tools/gen_backend_docs.py` boots `lemond`, reads `/system-info` + `server_models.json`, and rewrites marker-delimited regions of six docs (`README.md` support matrix, `guide/cli.md`, `guide/configuration/README.md`, `guide/configuration/multi-model.md`, `custom-models.md`, `dev/backends-reference.md`) plus `assets/models.js`. A CI job (`backend-docs-drift`) fails on drift. Authoring guide added at `dev/adding-a-backend.md` (both wired into the mkdocs nav).
- `ModelInfo::extras`: generic `map<string, json>` populated from unknown `server_models.json` keys, so a new backend adds per-model fields without editing shared structs.

Verification

- Local: `lemond` + `lemonade` CLI + web-app build clean; C++ unit tests ctest 5/5 (incl. GgufCapabilities, AutoTune, LatestVersionFallback, InstallAtomicity); server_endpoints 71/71; `/system-info` carries the enriched fields; docs `--check` clean; a registry smoke confirms all backends register and route. Cross-platform + clean-environment validation via CI.
- One pre-existing local failure unrelated to this change (reproduced on `main`): `server_cli2` `test_020_list`, where a built-in collection name with a space ("Lite Collection") breaks the test's whitespace-based table parser.

Notes for reviewers

- `recipeOptionsConfig.ts` (the TypeScript-typed per-recipe option forms) is intentionally left to maintainers per `AGENTS.md`; the schema is now exposed via `/system-info` for a future dynamic migration.
- `BackendSpec` (install params are class-side behavior); the descriptor supplies the binary name.
- `cloud` recipe checks (the dynamic-models exception), `collection.omni` (the orchestrator exception, not a `WrappedServer`), `inspect_repo` repo→recipe detection (its collection branch is that same exception), and `defaults.json` generation (its variant `*_args` / `*_bin` keys aren't in the descriptor `options`, so generating it would need a config-schema expansion that risks the config contract).

🤖 Generated with Claude Code