Draft
Commits (40 total; this view shows changes from 36 commits)
118f54a
refactor(backends): self-describing WrappedServer backends (#2287)
jeremyfowers Jun 19, 2026
2ef9379
refactor(backends): move each backend into its own folder (per spec)
jeremyfowers Jun 19, 2026
33b437b
docs(backends): mechanize the README support matrix from descriptors
jeremyfowers Jun 20, 2026
84616c4
docs(cli): mechanize the per-recipe load-options tables from descriptors
jeremyfowers Jun 20, 2026
2d1fd36
docs(custom-models): mechanize the --recipe value list from descriptors
jeremyfowers Jun 20, 2026
9b8383c
docs: mechanize config.json example and models.js recipe metadata
jeremyfowers Jun 22, 2026
566ea83
refactor(backends): finish agreed touchpoints rows 4 & 5 (registry-dr…
jeremyfowers Jun 22, 2026
cfb6e3d
refactor(backends): add BackendOps infrastructure (Tier-2 foundation)
jeremyfowers Jun 22, 2026
5a1d534
refactor(backends): migrate per-model metadata to ops; move GGUF/FLM …
jeremyfowers Jun 22, 2026
7933852
refactor(backends): descriptor-drive ROCm channels (kill duplicated (…
jeremyfowers Jun 22, 2026
2cc963e
refactor(backends): migrate resolve_model_path switchboard to ops (wo…
jeremyfowers Jun 22, 2026
2feae84
refactor(backends): migrate download/discovery/is_downloaded to ops; …
jeremyfowers Jun 22, 2026
4354260
refactor(backends): migrate version detection to a resolve_version op…
jeremyfowers Jun 22, 2026
2a9b38e
polish(backends): drop redundant config_section (defaults to recipe)
jeremyfowers Jun 22, 2026
623334c
refactor(backends): gate Prometheus scraping on an exposes_prometheus…
jeremyfowers Jun 22, 2026
ae8ca93
refactor(backends): move hf_load and moonshine_arch to ModelInfo::extras
jeremyfowers Jun 22, 2026
add34ed
refactor(backends): descriptor-drive the gfx1151 CWSR availability check
jeremyfowers Jun 22, 2026
94ebbab
refactor(backends): migrate install availability to a check_install o…
jeremyfowers Jun 22, 2026
1ced08c
refactor(backends): descriptor-drive version comparison policy (Exact…
jeremyfowers Jun 22, 2026
e89b47c
refactor(backends): move FLM model deletion into the fastflowlm folder
jeremyfowers Jun 22, 2026
55fa6f1
refactor(config): drive recipe_options() from descriptors, not per-re…
jeremyfowers Jun 22, 2026
c3aff59
polish(backends): build BackendSpec from the descriptor (dedup binary…
jeremyfowers Jun 22, 2026
de6d3b1
polish(backends): drop redundant recipe from descriptor support rows
jeremyfowers Jun 22, 2026
070fcbc
polish(backends): remove dead llamacpp-special branch in version lookup
jeremyfowers Jun 22, 2026
554ab6c
polish(router): replace flm/cloud recipe-string checks with slot policy
jeremyfowers Jun 22, 2026
b4547cd
polish(backends): descriptor flag for self-managed downloads, not rec…
jeremyfowers Jun 22, 2026
71c1bb1
polish(backends): move local-import checkpoint scan into BackendOps
jeremyfowers Jun 22, 2026
a54cacd
Merge origin/main into feat/self-describing-backends
jeremyfowers Jun 22, 2026
7d82220
fix(llamacpp): parenthesize numeric_limits::max() for MSVC
jeremyfowers Jun 22, 2026
6492260
fix(flm): mark backend dynamic_models so its models register
jeremyfowers Jun 22, 2026
f7ec14c
polish(backends): move moonshine download file-selection into ops
jeremyfowers Jun 25, 2026
6cc9552
polish(backends): move FLM unavailable-state machine into flm ops
jeremyfowers Jun 25, 2026
d0368da
polish(cli): drive bench backend override from descriptor, not recipe…
jeremyfowers Jun 25, 2026
e14fc2a
polish(backends): move GGUF :variant registration check into llamacpp…
jeremyfowers Jun 25, 2026
5daebd5
polish(backends): make_server<T> helper + collapse redundant namespaces
jeremyfowers Jun 25, 2026
a288c48
Merge origin/main into feat/self-describing-backends
jeremyfowers Jun 25, 2026
43ec4f2
polish(backends): make_spec<T>/single_ops<T> helpers shrink spec()/ops()
jeremyfowers Jun 25, 2026
8f6f36e
docs(nav): add Adding a Backend + Backends Reference to mkdocs nav
jeremyfowers Jun 25, 2026
2ac10fd
feat(config): generate defaults.json from descriptors via /internal/c…
jeremyfowers Jun 26, 2026
283297a
polish(cli): branch hf pull on repo_kind, not recipe==llamacpp
jeremyfowers Jun 26, 2026
17 changes: 17 additions & 0 deletions .github/workflows/docs_and_style.yml
@@ -24,6 +24,23 @@ jobs:
       - name: Run app regression tests
         run: node test/app/run-app-regression-tests.cjs

+  backend-docs-drift:
+    # The backend reference doc (docs/dev/backends-reference.md) is generated from
+    # the self-describing backend descriptors. Build lemond, regenerate, and fail
+    # if the committed doc is stale — the same guarantee a lint provides.
+    runs-on: ubuntu-latest
+    concurrency:
+      group: ${{ github.workflow }}-backend-docs-${{ github.ref }}
+      cancel-in-progress: true
+    steps:
+      - uses: actions/checkout@v5
+      - name: Configure and install build dependencies
+        run: ./setup.sh
+      - name: Build lemond
+        run: cmake --build --preset default --target lemond
+      - name: Check backend reference docs are up to date
+        run: python3 docs/tools/gen_backend_docs.py --check
+
   markdown-link-check:
     runs-on: ubuntu-latest
     concurrency:
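The `--check` step in this workflow fails the job when the committed doc no longer matches a fresh regeneration. A minimal sketch of that pattern follows, assuming a marker-delimited region like the `BEGIN GENERATED` / `END GENERATED` comments that appear in the README hunk of this PR. The function names are hypothetical and are not taken from `gen_backend_docs.py`, whose actual implementation is not shown in this diff.

```python
import re


def replace_generated(text: str, name: str, body: str) -> str:
    """Swap the content between the BEGIN/END GENERATED markers for `name`.

    `body` is expected to end with a newline so the END marker stays on its
    own line. (Hypothetical helper, for illustration only.)
    """
    pattern = re.compile(
        r"(<!-- BEGIN GENERATED: " + re.escape(name) + r" -->\n)"
        r".*?"
        r"(<!-- END GENERATED: " + re.escape(name) + r" -->)",
        re.DOTALL,
    )
    # A function replacement avoids backslash-escape handling in `body`.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + body + m.group(2), text
    )
    if count != 1:
        raise ValueError(f"expected exactly one '{name}' region, found {count}")
    return new_text


def check(committed: str, name: str, body: str) -> int:
    """Return 0 when the committed text is up to date, 1 when it is stale."""
    regenerated = replace_generated(committed, name, body)
    return 0 if regenerated == committed else 1
```

A CI wrapper would pass the result of `check` to `sys.exit`, so a stale doc fails the job in the same way a lint error would.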
86 changes: 77 additions & 9 deletions CMakeLists.txt
@@ -607,15 +607,6 @@ set(SOURCES_CORE
     src/cpp/server/utils/wmi_helper.cpp
     src/cpp/server/utils/network_beacon.cpp
     src/cpp/server/utils/tcp_jsonl_client.cpp
-    src/cpp/server/backends/cloud_server.cpp
-    src/cpp/server/backends/llamacpp_server.cpp
-    src/cpp/server/backends/fastflowlm_server.cpp
-    src/cpp/server/backends/ryzenaiserver.cpp
-    src/cpp/server/backends/whisper_server.cpp
-    src/cpp/server/backends/moonshine_server.cpp
-    src/cpp/server/backends/kokoro_server.cpp
-    src/cpp/server/backends/sd_server.cpp
-    src/cpp/server/backends/vllm_server.cpp
     src/cpp/server/backends/backend_utils.cpp
     src/cpp/server/backend_manager.cpp
     src/cpp/server/ollama_api.cpp
@@ -647,6 +638,83 @@ elseif(UNIX)
     list(APPEND SOURCES_CORE src/cpp/server/utils/platform/process_unix.cpp)
 endif()

+# ============================================================
+# Self-describing backends registry
+# ============================================================
+# The authoritative backend list. Each entry is "<recipe>|<stem>":
+#   recipe - the recipe string used in server_models.json (may contain dashes)
+#   stem   - identifier-safe name and folder. Each backend lives in its own
+#            folder, shipping (in namespace lemon::backends::<stem>):
+#     include/lemon/backends/<stem>/<stem>.h         inline const descriptor (CLI-safe data)
+#     include/lemon/backends/<stem>/<stem>_server.h  WrappedServer subclass + create() decl
+#     server/backends/<stem>/<stem>_server.cpp       implementation + create() def
+#
+# Adding a backend is one line here plus that folder. The foreach below compiles
+# the server source and regenerates the registry headers, which bind each
+# descriptor to its create(). Because this list is a tracked input, editing it
+# forces regeneration on the next build (a file(GLOB) would silently miss a
+# newly added backend). The descriptor is a header-only inline const, so it links
+# into both the lemonade CLI and lemond; only lemond links the server sources.
+set(LEMON_BACKENDS

Review comment (Member): these should be as close to the top of the file as possible honestly.

+    # "<recipe>|<stem>"
+    "llamacpp|llamacpp"
+    "whispercpp|whispercpp"
+    "moonshine|moonshine"
+    "kokoro|kokoro"
+    "sd-cpp|sdcpp"
+    "flm|fastflowlm"
+    "ryzenai-llm|ryzenai"
+    "vllm|vllm"
+    "cloud|cloud"
+)
+
+set(LEMON_DESCRIPTOR_INCLUDES "")

Review comment (Member): these are fine here.

+set(LEMON_DESCRIPTOR_ENTRIES "")
+set(LEMON_FACTORY_INCLUDES "")
+set(LEMON_FACTORY_ENTRIES "")
+# The data registry (descriptors, header-only) links into both binaries; the
+# factory registry + per-backend server sources are server-only.
+# Absolute paths so the CLI subdirectory can reuse LEMON_BACKEND_DESCRIPTOR_SOURCES.
+set(LEMON_BACKEND_DESCRIPTOR_SOURCES
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/backend_descriptor_registry.cpp)
+set(LEMON_BACKEND_FACTORY_SOURCES
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/backend_registry.cpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/backend_ops.cpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/hf_cache_util.cpp)
+foreach(_backend_entry ${LEMON_BACKENDS})
+    string(REPLACE "|" ";" _backend_parts "${_backend_entry}")
+    list(GET _backend_parts 1 _backend_stem)
+    # The descriptor is header-only (no source). Compile every .cpp in the
+    # backend's folder (server class + any backend-private helpers like GGUF
+    # parsing) — CONFIGURE_DEPENDS re-globs when a file is added/removed so a new
+    # helper in a folder needs no CMake edit. (The backend LIST is still explicit
+    # above so a whole new backend is never silently missed.)
+    file(GLOB _backend_srcs CONFIGURE_DEPENDS
+        ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/${_backend_stem}/*.cpp)
+    list(APPEND LEMON_BACKEND_FACTORY_SOURCES ${_backend_srcs})
+    string(APPEND LEMON_DESCRIPTOR_INCLUDES
+        "#include \"lemon/backends/${_backend_stem}/${_backend_stem}.h\"\n")
+    string(APPEND LEMON_DESCRIPTOR_ENTRIES
+        " &lemon::backends::${_backend_stem}::descriptor,\n")
+    string(APPEND LEMON_FACTORY_INCLUDES
+        "#include \"lemon/backends/${_backend_stem}/${_backend_stem}_server.h\"\n")
+    string(APPEND LEMON_FACTORY_ENTRIES
+        " { &lemon::backends::${_backend_stem}::descriptor, &lemon::backends::${_backend_stem}::create, lemon::backends::${_backend_stem}::spec(), lemon::backends::${_backend_stem}::ops() },\n")
+endforeach()
+
+configure_file(
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/backend_descriptors_generated.h.in
+    ${CMAKE_CURRENT_BINARY_DIR}/include/backend_descriptors_generated.h
+    @ONLY)
+configure_file(
+    ${CMAKE_CURRENT_SOURCE_DIR}/src/cpp/server/backends/backend_factories_generated.h.in
+    ${CMAKE_CURRENT_BINARY_DIR}/include/backend_factories_generated.h
+    @ONLY)
+
+# lemond gets both descriptor data and factories; the CLI gets only the data
+# (see src/cpp/cli/CMakeLists.txt, which reuses LEMON_BACKEND_DESCRIPTOR_SOURCES).
+list(APPEND SOURCES_CORE ${LEMON_BACKEND_DESCRIPTOR_SOURCES} ${LEMON_BACKEND_FACTORY_SOURCES})
+
 # ============================================================
 # Server core OBJECT library (shared by lemond and Lemonade.exe)
 # ============================================================
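The `foreach` loop in this hunk splits each `<recipe>|<stem>` entry and accumulates include lines and registry entries that `configure_file` later substitutes into the `*_generated.h.in` templates. The same string-building is sketched below in Python, purely as an illustration of the data flow; the build itself does this in CMake. `build_descriptor_registry` is a hypothetical name, and the identifier check is an added assumption that the CMake code does not perform.

```python
def build_descriptor_registry(entries):
    """Mirror the CMake foreach: one #include line and one entry per backend.

    `entries` are "<recipe>|<stem>" strings. Returns the two accumulated
    strings (includes, entries) that a template would receive.
    """
    includes = []
    registry_entries = []
    for entry in entries:
        # The recipe may contain dashes; the stem names the folder and namespace.
        recipe, stem = entry.split("|")
        # Assumption for this sketch: reject stems that cannot be C++ identifiers.
        if not stem.isidentifier():
            raise ValueError(f"stem for recipe '{recipe}' is not identifier-safe: {stem}")
        includes.append(f'#include "lemon/backends/{stem}/{stem}.h"\n')
        registry_entries.append(f"    &lemon::backends::{stem}::descriptor,\n")
    return "".join(includes), "".join(registry_entries)


if __name__ == "__main__":
    inc, ent = build_descriptor_registry(["sd-cpp|sdcpp", "flm|fastflowlm"])
    print(inc, end="")
    print(ent, end="")
```

Keeping the list explicit, as the CMake comment argues, means a new backend shows up as a one-line change in a tracked file rather than depending on a glob being re-evaluated.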
82 changes: 52 additions & 30 deletions README.md
@@ -123,6 +123,7 @@ Use `lemonade pull` or the built-in **Model Manager** to download models. You ca

 Lemonade supports multiple inference engines for LLM, speech, TTS, and image generation, and each has its own backend and hardware requirements.

+<!-- BEGIN GENERATED: backends-matrix -->
 <table>
 <thead>
 <tr>
@@ -137,107 +138,128 @@ Lemonade supports multiple inference engines for LLM, speech, TTS, and image gen
 <tr>
 <td rowspan="9"><strong>Text generation</strong></td>
 <td rowspan="6"><code>llamacpp</code></td>
-<td><code>vulkan</code></td>
-<td><code>x86_64</code> CPU, AMD iGPU, AMD dGPU; ARM64 CPU/GPU (Linux)</td>
-<td>Windows, Linux</td>
+<td><code>system</code></td>
+<td><code>x86_64</code>/ARM64 CPU, GPU</td>
+<td>Linux</td>
 </tr>
 <tr>
-<td><code>rocm</code></td>
-<td>Supported AMD ROCm iGPU/dGPU families*</td>
-<td>Windows, Linux</td>
+<td><code>metal</code></td>
+<td>Apple Silicon GPU</td>
+<td>macOS</td>
 </tr>
 <tr>
 <td><code>cuda</code></td>
 <td>NVIDIA GPUs (Turing or newer)**</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>cpu</code></td>
-<td><code>x86_64</code> CPU; ARM64 CPU (Linux)</td>
+<td><code>vulkan</code></td>
+<td><code>x86_64</code> CPU, AMD iGPU, AMD dGPU; ARM64 CPU/GPU (Linux)</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>metal</code></td>
-<td>Apple Silicon GPU</td>
-<td>macOS</td>
+<td><code>rocm</code></td>
+<td>Supported AMD ROCm iGPU/dGPU families*</td>
+<td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>system</code></td>
-<td><code>x86_64</code>/ARM64 CPU, GPU</td>
-<td>Linux</td>
+<td><code>cpu</code></td>
+<td><code>x86_64</code> CPU; ARM64 CPU (Linux)</td>
+<td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>flm</code></td>
+<td rowspan="1"><code>flm</code></td>
 <td><code>npu</code></td>
 <td>XDNA2 NPU</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>ryzenai-llm</code></td>
+<td rowspan="1"><code>ryzenai-llm</code></td>
 <td><code>npu</code></td>
 <td>XDNA2 NPU</td>
 <td>Windows</td>
 </tr>
 <tr>
-<td><code>vllm</code> (experimental)</td>
+<td rowspan="1"><code>vllm</code> (experimental)</td>
 <td><code>rocm</code></td>
 <td>Strix Halo iGPU (gfx1151)</td>
 <td>Linux</td>
 </tr>
 <tr>
-<td rowspan="4"><strong>Speech-to-text</strong></td>
-<td rowspan="3"><code>whispercpp</code></td>
+<td rowspan="6"><strong>Speech-to-text</strong></td>
+<td rowspan="5"><code>whispercpp</code></td>
 <td><code>npu</code></td>
 <td>XDNA2 NPU</td>
 <td>Windows</td>
 </tr>
 <tr>
+<td><code>rocm</code></td>
+<td>Supported AMD ROCm iGPU/dGPU families*</td>
+<td>Windows, Linux</td>
+</tr>
+<tr>
 <td><code>vulkan</code></td>
 <td><code>x86_64</code> CPU</td>
-<td>Linux</td>
+<td>Windows, Linux</td>
 </tr>
 <tr>
 <td><code>cpu</code></td>
 <td><code>x86_64</code> CPU</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
-<td><code>moonshine</code></td>
+<td><code>metal</code></td>
+<td>Apple Silicon GPU</td>
+<td>macOS</td>
+</tr>
+<tr>
+<td rowspan="1"><code>moonshine</code></td>
 <td><code>cpu</code></td>
 <td><code>x86_64</code>/<code>arm64</code> CPU</td>
 <td>Windows, Linux, macOS</td>
 </tr>
 <tr>
-<td><strong>Text-to-speech</strong></td>
-<td><code>kokoro</code></td>
+<td rowspan="2"><strong>Text-to-speech</strong></td>
+<td rowspan="2"><code>kokoro</code></td>
 <td><code>cpu</code></td>
 <td><code>x86_64</code> CPU</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
-<td rowspan="4"><strong>Image generation</strong></td>
-<td rowspan="4"><code>sd-cpp</code></td>
-<td><code>rocm</code></td>
-<td>Supported AMD ROCm iGPU/dGPU families*</td>
-<td>Windows, Linux</td>
+<td><code>metal</code></td>
+<td>Apple Silicon GPU</td>
+<td>macOS</td>
 </tr>
 <tr>
-<td><code>vulkan</code></td>
-<td>Vulkan-capable GPUs</td>
+<td rowspan="5"><strong>Image generation</strong></td>
+<td rowspan="5"><code>sd-cpp</code></td>
+<td><code>rocm</code></td>
+<td>Supported AMD ROCm iGPU/dGPU families*</td>
 <td>Windows, Linux</td>
 </tr>
 <tr>
 <td><code>cuda</code></td>
 <td>NVIDIA GPUs (Turing or newer)**</td>
 <td>Linux</td>
 </tr>
 <tr>
+<td><code>vulkan</code></td>
+<td>Vulkan-capable GPUs</td>
+<td>Windows, Linux</td>
+</tr>
+<tr>
 <td><code>cpu</code></td>
 <td><code>x86_64</code> CPU</td>
 <td>Windows, Linux</td>
 </tr>
+<tr>
+<td><code>metal</code></td>
+<td>Apple Silicon GPU</td>
+<td>macOS</td>
+</tr>
 </tbody>
 </table>
+<!-- END GENERATED: backends-matrix -->

 To check exactly which recipes/backends are supported on your own machine, run:

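The regenerated README matrix carries an explicit `rowspan` on every modality and recipe cell, including `rowspan="1"`, which is consistent with spans being computed from flat rows rather than written by hand. A minimal sketch of that computation follows, assuming rows arrive as `(modality, recipe, backend)` tuples already in display order. The function name and tuple shape are hypothetical and do not reflect the generator's actual data model.

```python
from itertools import groupby


def rowspans(rows):
    """Compute rowspan values for the first two columns of a grouped table.

    Returns (modality_spans, recipe_spans), each an ordered list of
    (label, span) pairs. Rows must already be sorted so that equal
    modalities and recipes are adjacent.
    """
    modality_spans = [
        (modality, len(list(group)))
        for modality, group in groupby(rows, key=lambda r: r[0])
    ]
    recipe_spans = [
        (recipe, len(list(group)))
        for (_, recipe), group in groupby(rows, key=lambda r: (r[0], r[1]))
    ]
    return modality_spans, recipe_spans


# Rows taken from the speech sections of the new matrix in this diff.
rows = [
    ("Speech-to-text", "whispercpp", "npu"),
    ("Speech-to-text", "whispercpp", "rocm"),
    ("Speech-to-text", "whispercpp", "vulkan"),
    ("Speech-to-text", "whispercpp", "cpu"),
    ("Speech-to-text", "whispercpp", "metal"),
    ("Speech-to-text", "moonshine", "cpu"),
    ("Text-to-speech", "kokoro", "cpu"),
    ("Text-to-speech", "kokoro", "metal"),
]

if __name__ == "__main__":
    print(rowspans(rows))
```

For these rows the spans come out as 6 and 2 for the modalities and 5, 1 and 2 for the recipes, matching the `rowspan` attributes in the new table.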
12 changes: 6 additions & 6 deletions docs/assets/models.js
@@ -2,25 +2,25 @@ const GITHUB_REPO = 'lemonade-sdk/lemonade';
 const TAGS_URL = `https://api.github.com/repos/${GITHUB_REPO}/tags?per_page=100`;
 const RAW_BASE = 'https://raw.githubusercontent.com/lemonade-sdk/lemonade';

+/* BEGIN GENERATED: models-js-recipes */
 const RECIPE_PRIORITY = [

Review comment (Member): I don't see moonshine in here.

   'llamacpp',
   'ryzenai-llm',
   'flm',
   'whispercpp',
   'sd-cpp',
-  'oga-hybrid',
-  'oga-npu',
-  'oga-cpu',
   'kokoro'
 ];

 const RECIPE_DISPLAY_NAMES = {

Review comment (Member): moonshine seems to be missing here too

   llamacpp: 'llama.cpp GPU',
-  'ryzenai-llm': 'Ryzen AI SW NPU',
-  flm: 'FastFlowLM NPU',
   whispercpp: 'whisper.cpp',
-  'sd-cpp': 'stable-diffusion.cpp'
+  'sd-cpp': 'stable-diffusion.cpp',
+  flm: 'FastFlowLM NPU',
+  'ryzenai-llm': 'Ryzen AI SW NPU',
+  vllm: 'vLLM ROCm (experimental)'
 };
+/* END GENERATED: models-js-recipes */

 const state = {
   tag: null,
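Both review comments on `models.js` concern recipes that are absent from the generated lists. A minimal sketch of a coverage check follows, assuming the recipes in `LEMON_BACKENDS` are the reference set and that the three `oga-*` entries are the lines this change removes from `RECIPE_PRIORITY`. The helper name `missing` is hypothetical. Whether every recipe (for example `cloud`) is meant to appear in `models.js` is not stated in the diff, so the result is a prompt for review rather than a verdict.

```python
# Recipes as listed in LEMON_BACKENDS in the CMakeLists.txt hunk of this PR.
BACKEND_RECIPES = [
    "llamacpp", "whispercpp", "moonshine", "kokoro", "sd-cpp",
    "flm", "ryzenai-llm", "vllm", "cloud",
]

# Post-change values read from the models.js hunk (assumption: the oga-*
# entries are the removed lines).
RECIPE_PRIORITY = ["llamacpp", "ryzenai-llm", "flm", "whispercpp", "sd-cpp", "kokoro"]
RECIPE_DISPLAY_NAMES = {
    "llamacpp": "llama.cpp GPU",
    "whispercpp": "whisper.cpp",
    "sd-cpp": "stable-diffusion.cpp",
    "flm": "FastFlowLM NPU",
    "ryzenai-llm": "Ryzen AI SW NPU",
    "vllm": "vLLM ROCm (experimental)",
}


def missing(reference, present):
    """Return reference recipes not found in `present`, in reference order."""
    present = set(present)
    return [recipe for recipe in reference if recipe not in present]


if __name__ == "__main__":
    print("not in RECIPE_PRIORITY:", missing(BACKEND_RECIPES, RECIPE_PRIORITY))
    print("not in RECIPE_DISPLAY_NAMES:", missing(BACKEND_RECIPES, RECIPE_DISPLAY_NAMES))
```

Under those assumptions the check reports `moonshine` in both lists, in line with the reviewer's observation, and also flags `vllm` and `cloud` for the priority list and `kokoro` and `cloud` for the display names.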