feat(server): persist model pin status and auto-load pinned models on boot by abn · Pull Request #2310 · lemonade-sdk/lemonade

abn · 2026-06-18T23:49:13Z

This pull request adds support for persisting and automatically loading "pinned" models in the server. Now, when a model is pinned, its status is saved to disk and, upon server restart, any pinned models are auto-loaded in the background. This ensures that important models are always available after a restart. Previously, this state was lost and server restarts caused configuration amnesia.

Persistence and auto-loading of pinned models:

When a model's pinned status is set (either during load or via the /internal/pin endpoint), the status is now saved to disk in recipe_options.json, ensuring persistence across server restarts. [1] [2]
On server startup, a new asynchronous task (Server::load_pinned_models_async) scans for models marked as pinned and loads them in the background, keeping the thread alive to prevent orphaned backend processes. [1] [2] [3]

Testing enhancements:

Added an integration test (test_pinned_model_persists_and_autoloads_on_restart) that verifies a pinned model is persisted and automatically loaded after a server restart, including setup/teardown and log output for debugging. [1] [2]

kenvandine · 2026-06-22T16:37:40Z

@claude please review this

github-actions · 2026-06-22T16:44:08Z

Claude finished @kenvandine's task in 3m 3s —— View job

Review: persist & auto-load pinned models

I traced the full flow: pin → save_model_options → recipe_options.json (keyed by canonical ID, under get_cache_dir()), then on boot build_recipe_options() layers the saved pinned flag back into each model's recipe_options, and load_pinned_models_async() reads it via get_supported_models(). The feature is sound and the persistence round-trips correctly. Nice work, and the integration test is genuinely end-to-end (real restart, shared models dir, isolated config dir). A few things worth addressing:

Correctness

1. Pinned state is persisted before the load succeeds — router.cpp:394-402
The persist block runs before the model is actually loaded (and before the early-return / NPU-eviction logic). If the subsequent load throws, the model is now recorded as pinned: true on disk and will be retried on every future boot, failing each time. Consider persisting only after a successful load, or rolling back on failure.

2. Multiple conflicting pinned models on boot — server.cpp:5314-5331
Auto-load iterates get_supported_models() (a std::map, i.e. alphabetical order) and loads each pinned model sequentially via load_model. Two concerns:

If a user has pinned two exclusive-NPU recipes (ryzenai-llm / whispercpp), or more pinned models than max_loaded_models, the later loads will evict the earlier ones (NPU exclusivity / limit enforcement) — so the final state silently depends on map ordering, not user intent. Worth at least a LOG(WARNING) when a pinned model can't coexist, so it isn't a silent no-op.
VRAM exhaustion from loading many pinned models at once isn't bounded here.

3. Redundant double-write during auto-load
Boot calls load_model(name, info, info.recipe_options, true, false, /*pinned=*/true). Because pinned.has_value(), the new block at router.cpp:394 re-writes the identical pinned: true back to disk for every pinned model on every boot. Harmless but unnecessary I/O — the value already lives in info.recipe_options.

Thread-safety

4. recipe_options_ has no mutex — model_manager.h:310, model_manager.cpp:1907-1914
save_model_options() mutates the recipe_options_ JSON member and rewrites the file with no lock. The two router call sites are serialized under load_mutex_, but the new background boot thread now runs concurrently with incoming HTTP requests during the startup window, and other save_model_options callers (e.g. /internal/set at server.cpp:3579) don't hold load_mutex_. This is a pre-existing latent race, but this PR widens the window by introducing a concurrent writer at boot. Guarding recipe_options_ (and the file write) with a dedicated mutex would close it.

The "keep thread alive forever" hack — `server.cpp:5340-5344`

The PR_SET_PDEATHSIG reasoning is correct (confirmed in process_unix.cpp:146,505 — children get SIGTERM when their spawning thread dies, the well-known Linux gotcha), so the thread genuinely must outlive the loads. Two improvements:

The thread is detach()ed and busy-polls shutdown_requested_ every 500ms for the entire process lifetime. Prefer storing it as a joinable member and join()ing it on shutdown (or wait on a condition_variable instead of polling) for a clean teardown.
The fixed sleep_for(500ms) to "wait for listeners" is racy — load doesn't actually depend on the listeners being bound (only on router_/model_manager_), so the comment is slightly misleading; it really just defers log interleaving.

Test (`test/server_pinning.py`)

Several bare except: clauses (lines ~457, ~474, and the debug-dump blocks) will likely trip Pylint (bare-except / W0702) which CI enforces — use except Exception:.
Hardcoded port = 13307 risks collisions if the suite ever runs tests in parallel; consider an OS-assigned/free port.
Good catch isolating config.json to tmp_dir while sharing models_dir — since recipe_options.json lives in the cache dir (tmp_dir), the real test server's state isn't polluted. ✅

Minor / DRY

The persist try/catch block is duplicated verbatim in set_model_pinned (router.cpp:1489-1495) and load_model (router.cpp:394-402). Factoring into a small helper (e.g. persist_pinned_state(model_name, pinned)) would reduce drift.

Fix the persist-before-load ordering →

Overall the change is well-scoped and the mechanism works; items 1, 2, and 4 are the ones I'd want addressed before merge.

… boot

abn · 2026-06-22T17:34:49Z

Review feedback has been addressed.

kenvandine · 2026-06-22T19:20:02Z

@claude please review this again

github-actions · 2026-06-22T19:24:04Z

Claude finished @kenvandine's task in 1m 18s —— View job

Re-review: persist & auto-load pinned models

I re-traced the diff against origin/main. The substantive items from the first review (1, 2, 4) plus the DRY and test nits are all addressed. Details below.

#	First-review item	Status
1	Persist before load succeeds	✅ Fixed
2	Conflicting pinned models on boot (silent eviction)	✅ Mostly — warning added
3	Redundant double-write on boot	✅ Fixed
4	`recipe_options_` has no mutex	✅ Fixed
—	Thread teardown / busy-poll / racy sleep	✅ Fixed
—	DRY: duplicated persist block	✅ Fixed
—	Test bare-`except:` / hardcoded port	✅ Fixed

Verified fixes

1 — Persist only after success. persist_pinned_state() is now invoked only on the success paths: the already-loaded early return (router.cpp:420), the initial successful load (router.cpp:553), and the post-nuclear retry success (router.cpp:604). A throwing load no longer writes pinned: true, so the repeated-boot-retry failure mode is gone. 👍

3 — Redundant write. persist_pinned_state() reads the current pinned option and early-returns when it already matches, so the boot-time double-write (and any no-op /internal/pin) is suppressed.

4 — Mutex. New recipe_options_mutex_ (model_manager.h:317) guards the three concurrent access sites: save_model_options (:1912), build_cache (:2244), and add_model_to_cache (:2339). The only unguarded recipe_options_ access is the initial load/migrate at :1113–1147, which runs during construction before any thread is spawned — safe. recipe_options_mutex_ is a leaf lock (no nested load_mutex_ acquisition), so there's no lock-ordering risk with the router's load_mutex_.

Thread teardown. The autoload thread is now a joinable member, joined in stop() (server.cpp:1442), and waits on shutdown_cv_ instead of busy-polling. set_shutdown_requested/stop set the flag under shutdown_mutex_ and notify_all, and the thread checks the predicate under the lock — no lost-wakeup. The startup defer is now a 100ms interruptible wait_for. Clean.

DRY. Both call sites (load_model, set_model_pinned) route through the single persist_pinned_state helper.

2 — Silent eviction. evict_server now emits LOG(WARNING) when the evicted server is_pinned() (router.cpp:242), so an NPU-exclusivity / capacity conflict is no longer a silent no-op.

Remaining minor notes (non-blocking)

Shutdown latency during a slow load. If stop() fires while the autoload thread is mid-load_model, the join() blocks until that one model finishes loading (the shutdown check sits between models, not within a load). Acceptable, but worth a comment so a future reader knows teardown can stall on a long-running backend spawn.
Many-pinned / VRAM bound still unbounded (item 2 tail). The warning surfaces a conflict, but nothing caps total pinned loads against max_loaded_models or VRAM — boot will still load each pinned model in map order and let the LRU/limit logic sort it out. Fine for now; just confirming it's a known, deliberate tradeoff rather than overlooked.
Test health shape. m.get("loaded", True) treats a missing loaded key as loaded — intentional, but slightly lenient. Minor.

Overall: the previously-flagged correctness and thread-safety concerns are resolved and the mechanism round-trips cleanly. LGTM from my side. ✅
• feat/persist-pinned-models

abn · 2026-06-23T07:44:51Z

@kenvandine let me know if you want me to address any of the non blocking notes from above.

github-actions Bot added the enhancement New feature or request label Jun 18, 2026

abn force-pushed the feat/persist-pinned-models branch 3 times, most recently from e7e8a22 to 3b2822b Compare June 22, 2026 16:25

feat(server): persist model pin status and auto-load pinned models on…

17a9b19

… boot

abn force-pushed the feat/persist-pinned-models branch from 3b2822b to 17a9b19 Compare June 22, 2026 17:33

Merge branch 'main' into feat/persist-pinned-models

a368d90

kenvandine added this to the Lemonade v10.9 milestone Jun 23, 2026

kenvandine added 2 commits June 23, 2026 16:25

Merge branch 'main' into feat/persist-pinned-models

02dc2af

Merge branch 'main' into feat/persist-pinned-models

e04143a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(server): persist model pin status and auto-load pinned models on boot#2310

feat(server): persist model pin status and auto-load pinned models on boot#2310
abn wants to merge 4 commits into
lemonade-sdk:mainfrom
abn:feat/persist-pinned-models

abn commented Jun 18, 2026 •

edited

Loading

Uh oh!

kenvandine commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

abn commented Jun 22, 2026

Uh oh!

kenvandine commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026 •

edited

Loading

Uh oh!

abn commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

abn commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kenvandine commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: persist & auto-load pinned models

Correctness

Thread-safety

The "keep thread alive forever" hack — server.cpp:5340-5344

Test (test/server_pinning.py)

Minor / DRY

Uh oh!

abn commented Jun 22, 2026

Uh oh!

kenvandine commented Jun 22, 2026

Uh oh!

github-actions Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: persist & auto-load pinned models

Verified fixes

Remaining minor notes (non-blocking)

Uh oh!

abn commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

abn commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 22, 2026 •

edited

Loading

The "keep thread alive forever" hack — `server.cpp:5340-5344`

Test (`test/server_pinning.py`)

github-actions Bot commented Jun 22, 2026 •

edited

Loading