Skip to content

feat(server): persist model pin status and auto-load pinned models on boot#2310

Open
abn wants to merge 4 commits into
lemonade-sdk:mainfrom
abn:feat/persist-pinned-models
Open

feat(server): persist model pin status and auto-load pinned models on boot#2310
abn wants to merge 4 commits into
lemonade-sdk:mainfrom
abn:feat/persist-pinned-models

Conversation

@abn

@abn abn commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

This pull request adds support for persisting and automatically loading "pinned" models in the server. Now, when a model is pinned, its status is saved to disk and, upon server restart, any pinned models are auto-loaded in the background. This ensures that important models are always available after a restart. Previously, this state was lost and server restarts caused configuration amnesia.

Persistence and auto-loading of pinned models:

  • When a model's pinned status is set (either during load or via the /internal/pin endpoint), the status is now saved to disk in recipe_options.json, ensuring persistence across server restarts. [1] [2]
  • On server startup, a new asynchronous task (Server::load_pinned_models_async) scans for models marked as pinned and loads them in the background, keeping the thread alive to prevent orphaned backend processes. [1] [2] [3]

Testing enhancements:

  • Added an integration test (test_pinned_model_persists_and_autoloads_on_restart) that verifies a pinned model is persisted and automatically loaded after a server restart, including setup/teardown and log output for debugging. [1] [2]

@github-actions github-actions Bot added the enhancement New feature or request label Jun 18, 2026
@abn abn force-pushed the feat/persist-pinned-models branch 3 times, most recently from e7e8a22 to 3b2822b Compare June 22, 2026 16:25
@kenvandine

Copy link
Copy Markdown
Member

@claude please review this

@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @kenvandine's task in 3m 3s —— View job


Review: persist & auto-load pinned models

  • Read PR diff
  • Review router.cpp persistence logic
  • Review server.cpp auto-load thread
  • Check thread-safety & shutdown interplay
  • Review the test
  • Post review

I traced the full flow: pin → save_model_optionsrecipe_options.json (keyed by canonical ID, under get_cache_dir()), then on boot build_recipe_options() layers the saved pinned flag back into each model's recipe_options, and load_pinned_models_async() reads it via get_supported_models(). The feature is sound and the persistence round-trips correctly. Nice work, and the integration test is genuinely end-to-end (real restart, shared models dir, isolated config dir). A few things worth addressing:


Correctness

1. Pinned state is persisted before the load succeedsrouter.cpp:394-402
The persist block runs before the model is actually loaded (and before the early-return / NPU-eviction logic). If the subsequent load throws, the model is now recorded as pinned: true on disk and will be retried on every future boot, failing each time. Consider persisting only after a successful load, or rolling back on failure.

2. Multiple conflicting pinned models on bootserver.cpp:5314-5331
Auto-load iterates get_supported_models() (a std::map, i.e. alphabetical order) and loads each pinned model sequentially via load_model. Two concerns:

  • If a user has pinned two exclusive-NPU recipes (ryzenai-llm / whispercpp), or more pinned models than max_loaded_models, the later loads will evict the earlier ones (NPU exclusivity / limit enforcement) — so the final state silently depends on map ordering, not user intent. Worth at least a LOG(WARNING) when a pinned model can't coexist, so it isn't a silent no-op.
  • VRAM exhaustion from loading many pinned models at once isn't bounded here.

3. Redundant double-write during auto-load
Boot calls load_model(name, info, info.recipe_options, true, false, /*pinned=*/true). Because pinned.has_value(), the new block at router.cpp:394 re-writes the identical pinned: true back to disk for every pinned model on every boot. Harmless but unnecessary I/O — the value already lives in info.recipe_options.

Thread-safety

4. recipe_options_ has no mutexmodel_manager.h:310, model_manager.cpp:1907-1914
save_model_options() mutates the recipe_options_ JSON member and rewrites the file with no lock. The two router call sites are serialized under load_mutex_, but the new background boot thread now runs concurrently with incoming HTTP requests during the startup window, and other save_model_options callers (e.g. /internal/set at server.cpp:3579) don't hold load_mutex_. This is a pre-existing latent race, but this PR widens the window by introducing a concurrent writer at boot. Guarding recipe_options_ (and the file write) with a dedicated mutex would close it.

The "keep thread alive forever" hack — server.cpp:5340-5344

The PR_SET_PDEATHSIG reasoning is correct (confirmed in process_unix.cpp:146,505 — children get SIGTERM when their spawning thread dies, the well-known Linux gotcha), so the thread genuinely must outlive the loads. Two improvements:

  • The thread is detach()ed and busy-polls shutdown_requested_ every 500ms for the entire process lifetime. Prefer storing it as a joinable member and join()ing it on shutdown (or wait on a condition_variable instead of polling) for a clean teardown.
  • The fixed sleep_for(500ms) to "wait for listeners" is racy — load doesn't actually depend on the listeners being bound (only on router_/model_manager_), so the comment is slightly misleading; it really just defers log interleaving.

Test (test/server_pinning.py)

  • Several bare except: clauses (lines ~457, ~474, and the debug-dump blocks) will likely trip Pylint (bare-except / W0702) which CI enforces — use except Exception:.
  • Hardcoded port = 13307 risks collisions if the suite ever runs tests in parallel; consider an OS-assigned/free port.
  • Good catch isolating config.json to tmp_dir while sharing models_dir — since recipe_options.json lives in the cache dir (tmp_dir), the real test server's state isn't polluted. ✅

Minor / DRY

The persist try/catch block is duplicated verbatim in set_model_pinned (router.cpp:1489-1495) and load_model (router.cpp:394-402). Factoring into a small helper (e.g. persist_pinned_state(model_name, pinned)) would reduce drift.

Fix the persist-before-load ordering →

Overall the change is well-scoped and the mechanism works; items 1, 2, and 4 are the ones I'd want addressed before merge.

@abn abn force-pushed the feat/persist-pinned-models branch from 3b2822b to 17a9b19 Compare June 22, 2026 17:33
@abn

abn commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

Review feedback has been addressed.

@kenvandine

Copy link
Copy Markdown
Member

@claude please review this again

@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @kenvandine's task in 1m 18s —— View job


Re-review: persist & auto-load pinned models

I re-traced the diff against origin/main. The substantive items from the first review (1, 2, 4) plus the DRY and test nits are all addressed. Details below.

# First-review item Status
1 Persist before load succeeds ✅ Fixed
2 Conflicting pinned models on boot (silent eviction) ✅ Mostly — warning added
3 Redundant double-write on boot ✅ Fixed
4 recipe_options_ has no mutex ✅ Fixed
Thread teardown / busy-poll / racy sleep ✅ Fixed
DRY: duplicated persist block ✅ Fixed
Test bare-except: / hardcoded port ✅ Fixed

Verified fixes

1 — Persist only after success. persist_pinned_state() is now invoked only on the success paths: the already-loaded early return (router.cpp:420), the initial successful load (router.cpp:553), and the post-nuclear retry success (router.cpp:604). A throwing load no longer writes pinned: true, so the repeated-boot-retry failure mode is gone. 👍

3 — Redundant write. persist_pinned_state() reads the current pinned option and early-returns when it already matches, so the boot-time double-write (and any no-op /internal/pin) is suppressed.

4 — Mutex. New recipe_options_mutex_ (model_manager.h:317) guards the three concurrent access sites: save_model_options (:1912), build_cache (:2244), and add_model_to_cache (:2339). The only unguarded recipe_options_ access is the initial load/migrate at :1113–1147, which runs during construction before any thread is spawned — safe. recipe_options_mutex_ is a leaf lock (no nested load_mutex_ acquisition), so there's no lock-ordering risk with the router's load_mutex_.

Thread teardown. The autoload thread is now a joinable member, joined in stop() (server.cpp:1442), and waits on shutdown_cv_ instead of busy-polling. set_shutdown_requested/stop set the flag under shutdown_mutex_ and notify_all, and the thread checks the predicate under the lock — no lost-wakeup. The startup defer is now a 100ms interruptible wait_for. Clean.

DRY. Both call sites (load_model, set_model_pinned) route through the single persist_pinned_state helper.

2 — Silent eviction. evict_server now emits LOG(WARNING) when the evicted server is_pinned() (router.cpp:242), so an NPU-exclusivity / capacity conflict is no longer a silent no-op.


Remaining minor notes (non-blocking)

  • Shutdown latency during a slow load. If stop() fires while the autoload thread is mid-load_model, the join() blocks until that one model finishes loading (the shutdown check sits between models, not within a load). Acceptable, but worth a comment so a future reader knows teardown can stall on a long-running backend spawn.
  • Many-pinned / VRAM bound still unbounded (item 2 tail). The warning surfaces a conflict, but nothing caps total pinned loads against max_loaded_models or VRAM — boot will still load each pinned model in map order and let the LRU/limit logic sort it out. Fine for now; just confirming it's a known, deliberate tradeoff rather than overlooked.
  • Test health shape. m.get("loaded", True) treats a missing loaded key as loaded — intentional, but slightly lenient. Minor.

Overall: the previously-flagged correctness and thread-safety concerns are resolved and the mechanism round-trips cleanly. LGTM from my side. ✅
feat/persist-pinned-models

@abn

abn commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

@kenvandine let me know if you want me to address any of the non blocking notes from above.

@kenvandine kenvandine added this to the Lemonade v10.9 milestone Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants