feat: align with FlowMesh; add PermissionChecker + ResourceRegistrar by kaiitunnz · Pull Request #1 · mlsys-io/lumid.flowmesh-plugin

kaiitunnz · 2026-05-20T10:40:36Z

Purpose

Two coupled changes that together make the plugin's authorization story explicit and aligned with current FlowMesh.

- Docs + identity cleanup. Strips the stale "FlowMesh V2" framing, removes the dead V1 scope-vocabulary filter from LumidIdentityProvider (FlowMesh has no require_scope path; the filter was misleading), documents the FLOWMESH_API_KEY operational caveat that this plugin imposes when it's the sole IdentityProvider, and swaps the README's pip install loading snippet for the two canonical flowmesh stack deployment patterns.
- PermissionChecker + ResourceRegistrar. Implements the two hooks the plugin has been advertising. ResourceRegistrar mirrors FlowMesh's resource lifecycle into a SQLite ownership table; PermissionChecker reads it to gate access. Defines the concrete scope vocabulary lum.id PATs mint against: admin scopes bypass, kind-level scopes gate creation, ownership gates concrete-id access. The ACL DB lives under FlowMesh's FLOWMESH_PLUGIN_DATA_DIR mount so it survives restarts.

Depends on the matching FlowMesh PR adding FLOWMESH_PLUGIN_DATA_DIR to the stack compose template.

Changes

- Identity + docs cleanup: drop the V1-vocabulary scope filter from LumidIdentityProvider, strip "V2" framing from the README and module docstring, document the FLOWMESH_API_KEY operational caveat, swap the loading snippet for the canonical flowmesh stack patterns, resolve hook deps from PyPI.
- ACL storage (new acl.py): SQLite-backed OwnershipStore keyed on (kind, id). Single-table schema; upsert semantics; startup TTL prune for stale rows.
- ResourceRegistrar (new registrar.py): mirror FlowMesh's resource lifecycle into the ACL.
- PermissionChecker (new permissions.py): admin-scope bypass; kind-level scope checks; concrete-id ownership lookup; SYSTEM gets a read-scope bypass.
- install() becomes @asynccontextmanager: opens the engine against LUMID_ACL_DB_PATH, bootstraps schema, prunes stale rows, yields bindings, disposes on shutdown.

README gains a "Scope vocabulary" section enumerating the five scopes the PermissionChecker enforces, and rows for LUMID_ACL_DB_PATH / LUMID_ACL_TTL_DAYS.

Test Plan

uv sync --all-extras
uv run pytest
uv run ruff check src tests
uv run mypy src

End-to-end against a running FlowMesh stack: not run — requires the matching FlowMesh PR (adding FLOWMESH_PLUGIN_DATA_DIR) to land first so the default DB path is mountable. Will retest live once both PRs merge.

Test Result

63 passed in 0.47s     # pytest
All checks passed!     # ruff
Success: no issues found in 10 source files   # mypy --strict

Follow-ups

- Upstream change to FlowMesh's hook surface so the plugin can do V1-style "diff against the authoritative resource registry" cleanup at boot. Either (a) replay register() at boot for every persisted resource, or (b) expose a "list live resource IDs by kind" hook. Until then, the TTL prune is the best we can do.
- Live e2e test once the FlowMesh FLOWMESH_PLUGIN_DATA_DIR PR merges.

The V1 lum.id host enforced a fixed scope vocabulary (`workers:register`, `results:read`, etc.) via route guards. FlowMesh has no scope-based gating — authorization runs through `PermissionChecker` hooks instead, and nothing in the server reads `PrincipalContext.scopes`. The `flowmesh:`-prefix mapping plus `ALLOWED_SCOPES` filter was therefore dead code that also misled the README into promising a behavior FlowMesh no longer has. Drop both. lum.id scopes now flow onto `PrincipalContext.scopes` verbatim, where any plugin-supplied `PermissionChecker` can read them.

The README still framed this as a "FlowMesh V2" plugin, a label that stems from internal miscommunication and that FlowMesh's own docs never use. The "Loading" section also told operators to `pip install` into an unspecified Python env, which doesn't match the canonical `flowmesh stack` deployment patterns.

Three substantive updates beyond the V2 cleanup:

- Add a `FLOWMESH_API_KEY` env-var row. Once this plugin is the sole `IdentityProvider`, that key must itself be a token we can resolve (lum.id JWT or `lm_pat_*`). Workers send it as their bearer on every server call, and the server resolves it at boot to obtain the system principal that drives `ResourceRegistrar` calls. An unresolvable key falls back to a synthetic admin and breaks worker auth.
- Replace the single `pip install` snippet with the two canonical patterns from `FlowMesh/docs/PLUGINS.md`: bind-mount via `FLOWMESH_PLUGIN_DIR`, then an overlay Dockerfile that bakes the wheel into a derived server image.
- Document the email-cache TTL (24 h) and capacity (10 k) on the `IdentityProvider` row, mirroring the introspect cache's annotation.

Both packages are now published; drop the `[tool.uv.sources]` git pins so the existing `>=0.1.0` constraints resolve from PyPI like every other dep. Lockfile regenerated at lumid-hooks==0.1.0 and flowmesh-hook==0.1.0.

PermissionChecker and ResourceRegistrar need a persistent (kind, id) -> principal_id table to track who owns which resource. This adds the storage layer in isolation; the hooks that read and write through it follow.

`OwnershipStore` wraps an async SQLAlchemy sessionmaker with set/get/delete/list_ids_for_principal/prune_older_than. `set` is an upsert so re-registering a resource updates the owner. `prune_older_than` is the startup cleanup for stale rows; FlowMesh does not replay register() at boot, so this TTL is the best we can do without an upstream API for listing live resource IDs.

`open_store` is the async ctx-manager `install()` will use: it opens the engine, bootstraps the schema, yields the store, and disposes on exit.
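The upsert and prune semantics described above can be sketched with the stdlib `sqlite3` module. This is a minimal illustration, not the plugin's code: the commit itself uses an async SQLAlchemy sessionmaker, and the table name, column names, and the `OwnershipSketch` class are assumptions made for the example.

```python
from __future__ import annotations

import sqlite3
import time

# Hypothetical single-table schema keyed on (kind, id).
SCHEMA = """
CREATE TABLE IF NOT EXISTS ownership (
    kind         TEXT NOT NULL,
    id           TEXT NOT NULL,
    principal_id TEXT NOT NULL,
    updated_at   REAL NOT NULL,
    PRIMARY KEY (kind, id)
)
"""


class OwnershipSketch:
    """Illustrative stand-in for OwnershipStore (synchronous, sqlite3-based)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def set(self, kind: str, id_: str, principal_id: str, now: float | None = None) -> None:
        # Upsert: re-registering a resource replaces the owner and timestamp.
        ts = time.time() if now is None else now
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO ownership (kind, id, principal_id, updated_at) "
                "VALUES (?, ?, ?, ?) "
                "ON CONFLICT (kind, id) DO UPDATE SET "
                "principal_id = excluded.principal_id, "
                "updated_at = excluded.updated_at",
                (kind, id_, principal_id, ts),
            )

    def get(self, kind: str, id_: str) -> str | None:
        row = self.conn.execute(
            "SELECT principal_id FROM ownership WHERE kind = ? AND id = ?",
            (kind, id_),
        ).fetchone()
        return row[0] if row else None

    def prune_older_than(self, cutoff: float) -> int:
        # Startup cleanup: drop rows last written before the cutoff timestamp.
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM ownership WHERE updated_at < ?", (cutoff,)
            )
        return cur.rowcount
```

Keying the primary key on `(kind, id)` is what makes the `ON CONFLICT` clause act as "one owner per resource"; the later multi-principal commit changes exactly that key.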

Listen to FlowMesh's resource lifecycle events and mirror them into the ACL ownership table. `register` upserts (kind, id) -> principal_id; `deregister` removes the row. Kind-level refs (id is None) are no-ops with a logged warning — they shouldn't reach a registrar but we don't want to crash if the server ever fires one.
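A rough sketch of that register/deregister behaviour, including the kind-level no-op, might look like the following. The `Ref` dataclass and `RegistrarSketch` class are hypothetical stand-ins (the real hook receives FlowMesh's own reference type and writes through the store), and an in-memory dict replaces the ownership table.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger("registrar_sketch")


@dataclass(frozen=True)
class Ref:
    """Hypothetical resource reference; id=None marks a kind-level ref."""

    kind: str
    id: str | None = None


class RegistrarSketch:
    """Mirrors lifecycle events into an in-memory (kind, id) -> owner map."""

    def __init__(self) -> None:
        self.owners: dict[tuple[str, str], str] = {}

    def register(self, ref: Ref, principal_id: str) -> None:
        if ref.id is None:
            # Kind-level refs carry no concrete resource: warn and do nothing.
            logger.warning("ignoring kind-level register for kind=%s", ref.kind)
            return
        self.owners[(ref.kind, ref.id)] = principal_id  # upsert

    def deregister(self, ref: Ref) -> None:
        if ref.id is None:
            logger.warning("ignoring kind-level deregister for kind=%s", ref.kind)
            return
        self.owners.pop((ref.kind, ref.id), None)  # missing rows are fine
```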

Concrete scope vocabulary (defined by this plugin, minted on lum.id PATs):

- *, flowmesh:*, flowmesh:admin -> admin bypass everything
- flowmesh:workflows:write -> create workflows (kind-level WRITE)
- flowmesh:nodes:write -> register nodes
- flowmesh:workers:write -> register workers
- flowmesh:system:read -> read SYSTEM (cluster metrics)

For concrete resource ids, ownership is the gate: the principal who created the resource (via ResourceRegistrar.register) is allowed; others are denied. SYSTEM is the exception: `flowmesh:system:read` grants read on any SYSTEM resource regardless of ownership.

TASK and RESULT have no kind-level scope because tasks are created via workflow submission and result ownership is inferred from the owning task; both reduce to concrete-id ownership checks.

`accessible_ids` returns the principal's owned ids for list endpoints, or `None` (no filter) for admins.
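The decision order described in this commit can be sketched as a pure function. Everything beyond the scope strings is an assumption for illustration: the lowercase kind labels, the `"read"`/`"write"` action labels, the function names, and the choice to deny any case the commit message does not describe.

```python
from __future__ import annotations

ADMIN_SCOPES = frozenset({"*", "flowmesh:*", "flowmesh:admin"})
# Kind-level WRITE scopes; the kind labels are hypothetical.
KIND_WRITE_SCOPES = {
    "workflow": "flowmesh:workflows:write",
    "node": "flowmesh:nodes:write",
    "worker": "flowmesh:workers:write",
}
SYSTEM_READ_SCOPE = "flowmesh:system:read"

Owners = dict[tuple[str, str], str]  # (kind, id) -> owning principal_id


def is_allowed(scopes, principal_id, kind, resource_id, action, owners: Owners) -> bool:
    held = set(scopes)
    if held & ADMIN_SCOPES:
        return True  # admin scopes bypass every other check
    if kind == "system" and action == "read" and SYSTEM_READ_SCOPE in held:
        return True  # SYSTEM read ignores ownership
    if resource_id is None:
        # Kind-level request: creation is gated by the kind's write scope.
        # Other kind-level actions are denied here (an assumption).
        return action == "write" and KIND_WRITE_SCOPES.get(kind) in held
    # Concrete id: only the registered owner passes.
    return owners.get((kind, resource_id)) == principal_id


def accessible_ids(scopes, principal_id, kind, owners: Owners) -> set[str] | None:
    if set(scopes) & ADMIN_SCOPES:
        return None  # None means "no filter"
    return {
        rid for (k, rid), owner in owners.items()
        if k == kind and owner == principal_id
    }
```

Because TASK and RESULT have no entry in `KIND_WRITE_SCOPES`, a kind-level write on them is denied for non-admins in this sketch, and concrete-id access falls through to the ownership lookup.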

`install()` becomes an `@asynccontextmanager`: opens the ACL SQLite engine, bootstraps the schema, prunes rows older than LUMID_ACL_TTL_DAYS (default 90; 0 disables), yields a BaseBindings carrying the existing identity / supplier / usage / submission hooks plus the new permission_checker and resource_registrar, then disposes the engine on FastAPI shutdown. The default DB path is `/app/plugin-data/lumid_acl.sqlite` — the writable mount FlowMesh exposes via FLOWMESH_PLUGIN_DATA_DIR. Operators override via LUMID_ACL_DB_PATH; tests point at a tmp_path.
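The open, bootstrap, prune, yield, dispose sequence can be sketched as below. This is an assumption-laden illustration: `install_sketch` is a hypothetical name, a plain dict stands in for `BaseBindings`, stdlib `sqlite3` stands in for the engine, and the table layout is invented for the example. Only the env-var names, the defaults, and the ordering come from the commit message.

```python
from __future__ import annotations

import asyncio
import os
import sqlite3
import time
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator

DEFAULT_DB_PATH = "/app/plugin-data/lumid_acl.sqlite"


@asynccontextmanager
async def install_sketch() -> AsyncIterator[dict[str, Any]]:
    path = os.environ.get("LUMID_ACL_DB_PATH", DEFAULT_DB_PATH)
    ttl_days = int(os.environ.get("LUMID_ACL_TTL_DAYS", "90"))
    conn = sqlite3.connect(path)
    try:
        # Bootstrap the (hypothetical) schema.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ownership ("
            "kind TEXT NOT NULL, id TEXT NOT NULL, "
            "principal_id TEXT NOT NULL, updated_at REAL NOT NULL, "
            "PRIMARY KEY (kind, id))"
        )
        # Prune stale rows; a TTL of 0 disables the prune.
        if ttl_days > 0:
            cutoff = time.time() - ttl_days * 86400
            conn.execute("DELETE FROM ownership WHERE updated_at < ?", (cutoff,))
        conn.commit()
        # The real install() yields a bindings object carrying the hooks;
        # this dict only exposes the connection for the example.
        yield {"conn": conn}
    finally:
        conn.close()  # runs when the host leaves the context on shutdown
```

Putting the `yield` inside `try`/`finally` is what ties disposal to the host's shutdown, even if the application body raises.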

Adds rows for the two new hooks in the "What it provides" table, a "Scope vocabulary" section enumerating the five scopes lum.id PATs mint against, and the LUMID_ACL_DB_PATH / LUMID_ACL_TTL_DAYS env vars. Also notes that install() is now an async ctx-manager and that the default ACL SQLite path lives under FlowMesh's FLOWMESH_PLUGIN_DATA_DIR mount.

A non-admin principal needs a corresponding `:read` scope to call a kind-level READ endpoint (`flowmesh:workflows:read`, `flowmesh:tasks:read`, `flowmesh:results:read`, `flowmesh:nodes:read`, `flowmesh:workers:read`). `accessible_ids` still filters the returned set to the principal's owned ids, and concrete-id access stays owner-only — only admin sees resources they don't own. The existing `flowmesh:system:read` is now a regular entry in the same policy table, with the same kind-level semantics.
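One way to picture the unified policy table is a mapping from (kind, action) to the required scope. The scope strings below are the ones named in this PR; the lowercase kind and action labels, the `KIND_POLICY` name, and the helper function are hypothetical.

```python
from __future__ import annotations

ADMIN_SCOPES = frozenset({"*", "flowmesh:*", "flowmesh:admin"})

# (kind, action) -> scope a non-admin must hold for a kind-level request.
KIND_POLICY: dict[tuple[str, str], str] = {
    ("workflow", "write"): "flowmesh:workflows:write",
    ("node", "write"): "flowmesh:nodes:write",
    ("worker", "write"): "flowmesh:workers:write",
    ("workflow", "read"): "flowmesh:workflows:read",
    ("task", "read"): "flowmesh:tasks:read",
    ("result", "read"): "flowmesh:results:read",
    ("node", "read"): "flowmesh:nodes:read",
    ("worker", "read"): "flowmesh:workers:read",
    ("system", "read"): "flowmesh:system:read",
}


def kind_level_allowed(scopes, kind: str, action: str) -> bool:
    held = set(scopes)
    if held & ADMIN_SCOPES:
        return True
    required = KIND_POLICY.get((kind, action))
    # Pairs with no policy entry are denied for non-admins (an assumption).
    return required is not None and required in held
```

Passing this check only admits the caller to the list endpoint; the returned set is still narrowed by `accessible_ids`, as the commit message notes.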

The ACL is now keyed by (kind, id, principal_id), so multiple principals can hold a grant on the same resource. The store gains `grant`, `revoke`, `has_grant`, and `delete_resource` (the deregister path wipes every grant on the resource). The PermissionChecker concrete-id branch becomes a grant-membership check; `accessible_ids` returns the principal's granted ids, including resources shared with them. A composite `(principal_id, kind)` index replaces the standalone `principal_id` index so `list_ids_for_principal` is fully covered. `revoke()` is implemented but unwired — there is no grant/revoke API yet; FlowMesh's `register()` is still the only writer today.
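A sketch of the multi-principal layout, again using stdlib `sqlite3` for brevity. The table, column, and index names are assumptions; the operations mirror the method names listed above but are written as free functions for the example.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS grants (
    kind         TEXT NOT NULL,
    id           TEXT NOT NULL,
    principal_id TEXT NOT NULL,
    granted_at   REAL NOT NULL,
    PRIMARY KEY (kind, id, principal_id)
);
-- Composite index matching the WHERE clause of list_ids_for_principal.
CREATE INDEX IF NOT EXISTS ix_grants_principal_kind
    ON grants (principal_id, kind);
"""


def open_grants(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def grant(conn, kind, id_, principal_id, now: float) -> None:
    with conn:
        conn.execute(
            "INSERT INTO grants VALUES (?, ?, ?, ?) "
            "ON CONFLICT (kind, id, principal_id) "
            "DO UPDATE SET granted_at = excluded.granted_at",
            (kind, id_, principal_id, now),
        )


def revoke(conn, kind, id_, principal_id) -> None:
    with conn:
        conn.execute(
            "DELETE FROM grants WHERE kind = ? AND id = ? AND principal_id = ?",
            (kind, id_, principal_id),
        )


def has_grant(conn, kind, id_, principal_id) -> bool:
    row = conn.execute(
        "SELECT 1 FROM grants WHERE kind = ? AND id = ? AND principal_id = ?",
        (kind, id_, principal_id),
    ).fetchone()
    return row is not None


def delete_resource(conn, kind, id_) -> None:
    # Deregister path: wipe every principal's grant on the resource.
    with conn:
        conn.execute("DELETE FROM grants WHERE kind = ? AND id = ?", (kind, id_))


def list_ids_for_principal(conn, principal_id, kind) -> list[str]:
    rows = conn.execute(
        "SELECT id FROM grants WHERE principal_id = ? AND kind = ? ORDER BY id",
        (principal_id, kind),
    ).fetchall()
    return [r[0] for r in rows]
```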

FLOWMESH_API_KEY is FlowMesh's own concern, not this plugin's, so it shouldn't appear in the plugin's env-var table or the Loading example. The Loading section is rewritten to match what actually works: `flowmesh stack up` auto-imports anything under `${FLOWMESH_PLUGIN_DIR}` named in `FLOWMESH_PLUGINS`, so the bind-mount path is just "drop the source tree in" — no thin loader. The overlay image path is unchanged. Also drops a redundant email-cache annotation on the IdentityProvider row and tightens the LUMID_ACL_TTL_DAYS note.

A long-running worker (or workflow) used to lose its grant on the next FlowMesh restart past LUMID_ACL_TTL_DAYS, because the wall-clock prune dropped the row even though the resource was still live. The host-driven reconcile sweep replaces that with a stronger guarantee: FlowMesh batches every live ResourceRef into a single `refresh` call, then `purge_stale` drops whatever the sweep didn't touch.

- `GrantStore.touch_resources(refs)` does a single bulk UPDATE keyed by `(kind, id)`, refreshing every principal's grant on the listed resources (multi-principal-safe).
- `GrantStore.delete_unrefreshed(session_start)` clears rows whose `granted_at` predates the sweep.
- `LumidResourceRegistrar` takes `session_start` (captured in `install()` after schema bootstrap). `refresh` translates the batch into a `touch_resources` call; `purge_stale` calls `delete_unrefreshed`.
- `LUMID_ACL_TTL_DAYS` and `prune_older_than` are gone.

Requires lumid-hooks 0.2.0 for the new Protocol methods. The `tool.uv.sources` entry pointing at `../lumid.hooks` is temporary; drop it once 0.2.0 is on PyPI.

Review findings on the reconcile work:

- `GrantStore.touch_resources(refs)` -> `touch_resources(pairs)`: the parameter takes `(kind, id)` tuples, not `ResourceRef` instances; the old name implied otherwise.
- `LumidResourceRegistrar.refresh` switches to `Collection[ResourceRef]` to match the tightened lumid-hooks 0.2.0 Protocol signature, and logs a debug line when it drops kind-level refs (parity with the warnings on `register`/`deregister`).
- Test helper `_backdate(principal_id: str | None)` split into `_backdate_one` and `_backdate_all`; the implicit branching on a None-overloaded arg was a smell.
- Add coverage for two reconcile shapes the existing tests didn't hit: sweep against an empty store is a no-op, and a second sweep within the same boot doesn't drop grants the first sweep just refreshed.

Marks the temporary lumid-hooks override with a TODO so it surfaces in grep when 0.2.0 ships to PyPI.

Match lumid-hooks 0.2.0's single-method Protocol: one atomic `reconcile(resources, logger)` replaces the two-call sweep so a mid-sweep failure can't half-wipe the ACL.

- `GrantStore.reconcile(pairs, session_start)` runs the UPDATE (touch refreshed grants) and the DELETE (drop anything older than `session_start`) in a single transaction. On error the transaction rolls back, leaving the store unchanged. Replaces `touch_resources` and `delete_unrefreshed`.
- `session_start` stays on the registrar so it's captured at plugin load time, not when the host invokes `reconcile`. Grants written by other startup paths (e.g. supervisor registration) between load and the sweep have `granted_at > session_start` and survive.
- `LumidResourceRegistrar.reconcile(resources, logger)` flattens refs to `(kind, id)` pairs, logs kind-level drops, and reports touched/deleted counts at INFO.
- Tests cover: live grants survive (long-running resources), stale grants drop, empty batch wipes pre-session rows, grants written after `session_start` survive (the host-race protection), and a mid-transaction failure rolls back.
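The touch-then-delete transaction can be sketched as follows. This is an illustration under assumptions, not the plugin's `GrantStore.reconcile`: the schema, the free-function form, the explicit `now` parameter, and the per-pair UPDATE loop (instead of one bulk statement) are choices made for the example.

```python
from __future__ import annotations

import sqlite3


def open_grants_db() -> sqlite3.Connection:
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN/COMMIT/ROLLBACK below fully control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE grants ("
        "kind TEXT NOT NULL, id TEXT NOT NULL, principal_id TEXT NOT NULL, "
        "granted_at REAL NOT NULL, PRIMARY KEY (kind, id, principal_id))"
    )
    return conn


def reconcile(conn, pairs, session_start: float, now: float) -> tuple[int, int]:
    """Refresh grants on live resources, then drop everything pre-session.

    `now` must be >= session_start so that touched rows survive the DELETE.
    Returns (touched_rows, deleted_rows).
    """
    conn.execute("BEGIN")
    try:
        touched = 0
        for kind, id_ in pairs:
            # Keyed by (kind, id): refreshes every principal's grant.
            cur = conn.execute(
                "UPDATE grants SET granted_at = ? WHERE kind = ? AND id = ?",
                (now, kind, id_),
            )
            touched += cur.rowcount
        cur = conn.execute(
            "DELETE FROM grants WHERE granted_at < ?", (session_start,)
        )
        deleted = cur.rowcount
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the store exactly as it was
        raise
    return touched, deleted
```

Rows written after `session_start` but absent from the batch survive because the DELETE compares against `session_start`, not against the sweep time.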

lumid-hooks 0.2.0 is released on PyPI, so the editable path override from `[tool.uv.sources]` is no longer needed. uv now resolves the pin from the registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

The grant store's only persistence need is a single-table SQLite file, which the stdlib `sqlite3` module covers directly. Dropping the SQLAlchemy and aiosqlite deps means the plugin's runtime deps (`httpx`, `pydantic`, `fastapi`, `lumid-hooks`, `flowmesh-hook`) are all already present in the FlowMesh server image, so the bind-mount deployment path no longer needs an overlay Dockerfile.

`GrantStore` keeps its public API. One `sqlite3.Connection` is opened in WAL + autocommit and shared across all ops; an `asyncio.Lock` serialises access and queries run in `asyncio.to_thread`. `reconcile` uses explicit `BEGIN`/`COMMIT`/`ROLLBACK` for the same atomic-on-failure contract.

The README's Loading section collapses to the single bind-mount path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
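The shared-connection pattern (one `sqlite3.Connection`, an `asyncio.Lock`, and `asyncio.to_thread`) can be sketched like this. `AsyncStoreSketch` and its schema are hypothetical; `check_same_thread=False` is an assumption the pattern needs, since worker threads other than the creating thread run the queries. WAL setup is omitted here.

```python
from __future__ import annotations

import asyncio
import sqlite3


class AsyncStoreSketch:
    """Serialises all access to one sqlite3 connection behind an asyncio.Lock."""

    def __init__(self, path: str) -> None:
        # Autocommit mode; the connection is used from to_thread workers,
        # so the same-thread check must be disabled.
        self._conn = sqlite3.connect(
            path, isolation_level=None, check_same_thread=False
        )
        self._lock = asyncio.Lock()
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS grants ("
            "kind TEXT NOT NULL, id TEXT NOT NULL, principal_id TEXT NOT NULL, "
            "PRIMARY KEY (kind, id, principal_id))"
        )

    def _exec(self, sql: str, params: tuple) -> list[tuple]:
        return self._conn.execute(sql, params).fetchall()

    async def _run(self, sql: str, params: tuple = ()) -> list[tuple]:
        # Only one coroutine at a time may hand the connection to a thread.
        async with self._lock:
            return await asyncio.to_thread(self._exec, sql, params)

    async def grant(self, kind: str, id_: str, principal_id: str) -> None:
        await self._run(
            "INSERT OR REPLACE INTO grants VALUES (?, ?, ?)",
            (kind, id_, principal_id),
        )

    async def ids_for(self, principal_id: str, kind: str) -> list[str]:
        rows = await self._run(
            "SELECT id FROM grants WHERE principal_id = ? AND kind = ? ORDER BY id",
            (principal_id, kind),
        )
        return [r[0] for r in rows]

    def close(self) -> None:
        self._conn.close()
```

The lock keeps the event loop responsive (blocking I/O happens off-loop) while guaranteeing the connection is never used by two threads at once.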

`GrantStore` already serialises every operation through an `asyncio.Lock`, so SQLite's WAL concurrency (non-blocking readers vs. one writer) is mooted before the engine sees it. Defaulting to rollback-journal mode keeps a single file at rest, with no `-wal`/`-shm` sidecars to back up or trip up external readers.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Trim docstrings + README to direct declarative statements. Cut justification of absent design choices ("no SQLAlchemy / aiosqlite dependency", "No overlay image needed", and "No locks needed" in the WAL paragraph in `acl.py`), narrative deliberation ("With this plugin as the sole IdentityProvider, every authenticated principal came through our resolve path…"), and contrastive rebuttals ("so a partial sweep can't wipe live grants", "(admin aside)", "they shouldn't reach a registrar in practice, but…"). Keep the load-bearing invariants (single-atomic-transaction reconcile, asyncio.Lock + to_thread for the SQLite connection, kind-level scope fallback policy), stated once each.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

kaiitunnz and others added 19 commits May 20, 2026 15:21

chore(deps): resolve lumid-hooks and flowmesh-hook from PyPI

6cf1cc9


chore(deps): mark lumid-hooks override with TODO

a6f0dcf


chore(deps): pin lumid-hooks to PyPI 0.2.0

a3c52d8


tjluyao mentioned this pull request May 24, 2026

chore: pin lumid-hooks to v0.1.0 tag until registrar.reconcile lands #2

Open

kaiitunnz wants to merge 19 commits into main from kaiitunnz/feat/permissions-and-registrar
