feat: align with FlowMesh; add PermissionChecker + ResourceRegistrar #1
Draft
kaiitunnz wants to merge 19 commits into
The V1 lum.id host enforced a fixed scope vocabulary (`workers:register`, `results:read`, etc.) via route guards. FlowMesh has no scope-based gating — authorization runs through `PermissionChecker` hooks instead, and nothing in the server reads `PrincipalContext.scopes`. The `flowmesh:`-prefix mapping plus `ALLOWED_SCOPES` filter was therefore dead code that also misled the README into promising a behavior FlowMesh no longer has. Drop both. lum.id scopes now flow onto `PrincipalContext.scopes` verbatim, where any plugin-supplied `PermissionChecker` can read them.
The README still framed this as a "FlowMesh V2" plugin, a label that stems from internal miscommunication and that FlowMesh's own docs never use. The "Loading" section also told operators to `pip install` into an unspecified Python env, which doesn't match the canonical `flowmesh stack` deployment patterns.

Three substantive updates beyond the V2 cleanup:

- Add a `FLOWMESH_API_KEY` env-var row. Once this plugin is the sole `IdentityProvider`, that key must itself be a token we can resolve (lum.id JWT or `lm_pat_*`). Workers send it as their bearer on every server call, and the server resolves it at boot to obtain the system principal that drives `ResourceRegistrar` calls. An unresolvable key falls back to a synthetic admin and breaks worker auth.
- Replace the single `pip install` snippet with the two canonical patterns from `FlowMesh/docs/PLUGINS.md`: bind-mount via `FLOWMESH_PLUGIN_DIR`, then an overlay Dockerfile that bakes the wheel into a derived server image.
- Document the email-cache TTL (24 h) and capacity (10 k) on the `IdentityProvider` row, mirroring the introspect cache's annotation.
Both packages are now published; drop the `[tool.uv.sources]` git pins so the existing `>=0.1.0` constraints resolve from PyPI like every other dep. Lockfile regenerated at lumid-hooks==0.1.0 and flowmesh-hook==0.1.0.
PermissionChecker and ResourceRegistrar need a persistent (kind, id) -> principal_id table to track who owns which resource. This adds the storage layer in isolation; the hooks that read and write through it follow.

`OwnershipStore` wraps an async SQLAlchemy sessionmaker with set/get/delete/list_ids_for_principal/prune_older_than. `set` is an upsert, so re-registering a resource updates the owner. `prune_older_than` is the startup cleanup for stale rows; FlowMesh does not replay register() at boot, so this TTL is the best we can do without an upstream API for listing live resource IDs.

`open_store` is the async context manager that `install()` will use: it opens the engine, bootstraps the schema, yields the store, and disposes the engine on exit.
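For reviewers who want to see the upsert and prune semantics in isolation, here is a minimal sketch. It deliberately swaps the async SQLAlchemy sessionmaker for the stdlib `sqlite3` module so it runs without extra dependencies; the table name, the `updated_at` column, and the `SketchOwnershipStore` class are assumptions for illustration, not the plugin's actual schema or API.

```python
import sqlite3
import time

# Hypothetical schema: the commit only states that (kind, id) maps to principal_id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ownership (
    kind         TEXT NOT NULL,
    id           TEXT NOT NULL,
    principal_id TEXT NOT NULL,
    updated_at   REAL NOT NULL,
    PRIMARY KEY (kind, id)
)
"""


class SketchOwnershipStore:
    """Synchronous stand-in for the async store described above."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(SCHEMA)

    def set(self, kind, id_, principal_id, now=None):
        # Upsert: a second register for the same (kind, id) replaces the owner.
        # ON CONFLICT ... DO UPDATE needs SQLite 3.24 or newer.
        ts = time.time() if now is None else now
        self._conn.execute(
            "INSERT INTO ownership (kind, id, principal_id, updated_at) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(kind, id) DO UPDATE SET "
            "principal_id = excluded.principal_id, "
            "updated_at = excluded.updated_at",
            (kind, id_, principal_id, ts),
        )

    def get(self, kind, id_):
        row = self._conn.execute(
            "SELECT principal_id FROM ownership WHERE kind = ? AND id = ?",
            (kind, id_),
        ).fetchone()
        return row[0] if row else None

    def prune_older_than(self, cutoff):
        # Startup cleanup: drop rows last written before the cutoff timestamp.
        cur = self._conn.execute(
            "DELETE FROM ownership WHERE updated_at < ?", (cutoff,)
        )
        return cur.rowcount
```

The explicit `now` parameter exists only so the behaviour can be exercised with fixed timestamps.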
Listen to FlowMesh's resource lifecycle events and mirror them into the ACL ownership table. `register` upserts (kind, id) -> principal_id; `deregister` removes the row. Kind-level refs (id is None) are no-ops with a logged warning — they shouldn't reach a registrar but we don't want to crash if the server ever fires one.
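A rough sketch of the mirroring logic, assuming a plain dict in place of the real store. `Ref` is a hypothetical stand-in for FlowMesh's resource reference type, and the method signatures are illustrative rather than the actual hook Protocol.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("sketch.registrar")


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a FlowMesh resource reference."""

    kind: str
    id: Optional[str]  # None marks a kind-level ref


class SketchRegistrar:
    def __init__(self):
        # (kind, id) -> principal_id; the real plugin persists this in SQLite.
        self.owners = {}

    def register(self, ref, principal_id):
        if ref.id is None:
            # Kind-level refs carry no concrete resource: warn and do nothing.
            logger.warning("ignoring kind-level register for %s", ref.kind)
            return
        self.owners[(ref.kind, ref.id)] = principal_id  # upsert

    def deregister(self, ref):
        if ref.id is None:
            logger.warning("ignoring kind-level deregister for %s", ref.kind)
            return
        self.owners.pop((ref.kind, ref.id), None)
```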
Concrete scope vocabulary (defined by this plugin, minted on lum.id PATs):

    *, flowmesh:*, flowmesh:admin  -> admin, bypasses everything
    flowmesh:workflows:write       -> create workflows (kind-level WRITE)
    flowmesh:nodes:write           -> register nodes
    flowmesh:workers:write         -> register workers
    flowmesh:system:read           -> read SYSTEM (cluster metrics)

For concrete resource ids, ownership is the gate: the principal who created the resource (via ResourceRegistrar.register) is allowed; others are denied. SYSTEM is the exception: `flowmesh:system:read` grants read on any SYSTEM resource regardless of ownership.

TASK and RESULT have no kind-level scope because tasks are created via workflow submission and result ownership is inferred from the owning task, so both reduce to concrete-id ownership checks.

`accessible_ids` returns the principal's owned ids for list endpoints, or `None` (no filter) for admins.
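The decision order described here can be sketched as a single pure function. The lowercase kind and action strings, the `is_allowed` name, and its parameters are assumptions for illustration; only the scope strings come from this commit.

```python
ADMIN_SCOPES = frozenset({"*", "flowmesh:*", "flowmesh:admin"})

# Kind-level scopes from the commit message; the (kind, action) keys are hypothetical.
KIND_LEVEL_SCOPES = {
    ("workflow", "write"): "flowmesh:workflows:write",
    ("node", "write"): "flowmesh:nodes:write",
    ("worker", "write"): "flowmesh:workers:write",
    ("system", "read"): "flowmesh:system:read",
}


def is_allowed(scopes, principal_id, kind, action, resource_id=None, owner_id=None):
    """Return True if the principal may perform `action` on the resource."""
    # 1. Admin scopes bypass every other check.
    if ADMIN_SCOPES & set(scopes):
        return True
    # 2. Kind-level request (no concrete id): a matching scope is required.
    #    TASK and RESULT have no entry, so they are denied at kind level.
    if resource_id is None:
        required = KIND_LEVEL_SCOPES.get((kind, action))
        return required is not None and required in scopes
    # 3. SYSTEM exception: the read scope covers any SYSTEM resource.
    if kind == "system" and action == "read" and "flowmesh:system:read" in scopes:
        return True
    # 4. Concrete id: only the recorded owner is allowed.
    return owner_id is not None and owner_id == principal_id
```

For example, `is_allowed({"flowmesh:workflows:write"}, "bob", "workflow", "read", "wf-1", owner_id="alice")` is denied, because a write scope does not override ownership on a concrete id.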
`install()` becomes an `@asynccontextmanager`: opens the ACL SQLite engine, bootstraps the schema, prunes rows older than LUMID_ACL_TTL_DAYS (default 90; 0 disables), yields a BaseBindings carrying the existing identity / supplier / usage / submission hooks plus the new permission_checker and resource_registrar, then disposes the engine on FastAPI shutdown. The default DB path is `/app/plugin-data/lumid_acl.sqlite` — the writable mount FlowMesh exposes via FLOWMESH_PLUGIN_DATA_DIR. Operators override via LUMID_ACL_DB_PATH; tests point at a tmp_path.
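The lifecycle is roughly the following, sketched with stdlib `sqlite3` in place of the SQLAlchemy engine and a plain dict in place of `BaseBindings`. `install_sketch`, the `ownership` table layout, and the dict keys are assumptions; the env-var names and defaults are the ones stated above.

```python
import asyncio
import os
import sqlite3
import time
from contextlib import asynccontextmanager

DEFAULT_DB_PATH = "/app/plugin-data/lumid_acl.sqlite"


@asynccontextmanager
async def install_sketch(db_path=None, ttl_days=None):
    path = db_path or os.environ.get("LUMID_ACL_DB_PATH", DEFAULT_DB_PATH)
    if ttl_days is None:
        ttl_days = int(os.environ.get("LUMID_ACL_TTL_DAYS", "90"))
    conn = sqlite3.connect(path)  # open
    try:
        conn.execute(  # bootstrap the (hypothetical) schema
            "CREATE TABLE IF NOT EXISTS ownership ("
            "kind TEXT NOT NULL, id TEXT NOT NULL, "
            "principal_id TEXT NOT NULL, updated_at REAL NOT NULL, "
            "PRIMARY KEY (kind, id))"
        )
        if ttl_days > 0:  # 0 disables the prune
            cutoff = time.time() - ttl_days * 86400
            conn.execute("DELETE FROM ownership WHERE updated_at < ?", (cutoff,))
        conn.commit()
        # Placeholder bindings; the real plugin yields hook objects here.
        yield {"permission_checker": None, "resource_registrar": None, "conn": conn}
    finally:
        conn.close()  # runs when the host shuts down


async def demo():
    # Uses an in-memory database so the sketch does not touch the default path.
    async with install_sketch(":memory:", ttl_days=0) as bindings:
        rows = bindings["conn"].execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return [name for (name,) in rows]
```

Running `asyncio.run(demo())` exercises the open, bootstrap, yield, and dispose steps end to end.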
Adds rows for the two new hooks in the "What it provides" table, a "Scope vocabulary" section enumerating the five scopes lum.id PATs mint against, and the LUMID_ACL_DB_PATH / LUMID_ACL_TTL_DAYS env vars. Also notes that install() is now an async ctx-manager and that the default ACL SQLite path lives under FlowMesh's FLOWMESH_PLUGIN_DATA_DIR mount.
A non-admin principal needs a corresponding `:read` scope to call a kind-level READ endpoint (`flowmesh:workflows:read`, `flowmesh:tasks:read`, `flowmesh:results:read`, `flowmesh:nodes:read`, `flowmesh:workers:read`). `accessible_ids` still filters the returned set to the principal's owned ids, and concrete-id access stays owner-only — only admin sees resources they don't own. The existing `flowmesh:system:read` is now a regular entry in the same policy table, with the same kind-level semantics.
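A small sketch of how the kind-level read gate and the list filter might combine, assuming an in-memory owner map. `can_list`, the kind strings, and the `owners` dict shape are hypothetical; the scope names are the ones listed above.

```python
ADMIN_SCOPES = frozenset({"*", "flowmesh:*", "flowmesh:admin"})

READ_SCOPE_BY_KIND = {
    "workflow": "flowmesh:workflows:read",
    "task": "flowmesh:tasks:read",
    "result": "flowmesh:results:read",
    "node": "flowmesh:nodes:read",
    "worker": "flowmesh:workers:read",
    "system": "flowmesh:system:read",
}


def can_list(scopes, kind):
    """Kind-level READ: admins pass, everyone else needs the matching scope."""
    if ADMIN_SCOPES & set(scopes):
        return True
    required = READ_SCOPE_BY_KIND.get(kind)
    return required is not None and required in scopes


def accessible_ids(scopes, principal_id, kind, owners):
    """Return None (no filter) for admins, else the ids this principal owns.

    `owners` maps (kind, id) -> principal_id.
    """
    if ADMIN_SCOPES & set(scopes):
        return None
    return {
        rid
        for (k, rid), owner in owners.items()
        if k == kind and owner == principal_id
    }
```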
The ACL is now keyed by (kind, id, principal_id), so multiple principals can hold a grant on the same resource. The store gains `grant`, `revoke`, `has_grant`, and `delete_resource` (the deregister path wipes every grant on the resource). The PermissionChecker concrete-id branch becomes a grant-membership check; `accessible_ids` returns the principal's granted ids, including resources shared with them. A composite `(principal_id, kind)` index replaces the standalone `principal_id` index so `list_ids_for_principal` is fully covered. `revoke()` is implemented but unwired — there is no grant/revoke API yet; FlowMesh's `register()` is still the only writer today.
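To make the new key shape concrete, here is a sketch using stdlib `sqlite3`. The table name, the index name, and the free functions are assumptions; only the `(kind, id, principal_id)` key, the `(principal_id, kind)` index, and the operation names come from this commit.

```python
import sqlite3

# Hypothetical DDL mirroring the key and index described above.
GRANT_SCHEMA = """
CREATE TABLE IF NOT EXISTS grants (
    kind         TEXT NOT NULL,
    id           TEXT NOT NULL,
    principal_id TEXT NOT NULL,
    granted_at   REAL NOT NULL,
    PRIMARY KEY (kind, id, principal_id)
);
CREATE INDEX IF NOT EXISTS ix_grants_principal_kind
    ON grants (principal_id, kind);
"""


def open_grants(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(GRANT_SCHEMA)
    return conn


def grant(conn, kind, id_, principal_id, now):
    # Re-granting refreshes the timestamp instead of failing on the key.
    conn.execute(
        "INSERT INTO grants VALUES (?, ?, ?, ?) "
        "ON CONFLICT(kind, id, principal_id) "
        "DO UPDATE SET granted_at = excluded.granted_at",
        (kind, id_, principal_id, now),
    )


def revoke(conn, kind, id_, principal_id):
    conn.execute(
        "DELETE FROM grants WHERE kind = ? AND id = ? AND principal_id = ?",
        (kind, id_, principal_id),
    )


def has_grant(conn, kind, id_, principal_id):
    row = conn.execute(
        "SELECT 1 FROM grants WHERE kind = ? AND id = ? AND principal_id = ?",
        (kind, id_, principal_id),
    ).fetchone()
    return row is not None


def delete_resource(conn, kind, id_):
    # Deregister path: wipe every principal's grant on the resource.
    cur = conn.execute("DELETE FROM grants WHERE kind = ? AND id = ?", (kind, id_))
    return cur.rowcount


def list_ids_for_principal(conn, principal_id, kind):
    rows = conn.execute(
        "SELECT id FROM grants WHERE principal_id = ? AND kind = ? ORDER BY id",
        (principal_id, kind),
    ).fetchall()
    return [rid for (rid,) in rows]
```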
FLOWMESH_API_KEY is FlowMesh's own concern, not this plugin's, so it
shouldn't appear in the plugin's env-var table or the Loading example.
The Loading section is rewritten to match what actually works:
`flowmesh stack up` auto-imports anything under `${FLOWMESH_PLUGIN_DIR}`
named in `FLOWMESH_PLUGINS`, so the bind-mount path is just "drop the
source tree in" — no thin loader. The overlay image path is unchanged.
Also drops a redundant email-cache annotation on the IdentityProvider
row and tightens the LUMID_ACL_TTL_DAYS note.
A long-running worker (or workflow) used to lose its grant on the next FlowMesh restart past LUMID_ACL_TTL_DAYS: the wall-clock prune dropped the row even though the resource was still live. The host-driven reconcile sweep replaces that with a stronger guarantee: FlowMesh batches every live ResourceRef into a single `refresh` call, then `purge_stale` drops whatever the sweep didn't touch.

- `GrantStore.touch_resources(refs)` does a single bulk UPDATE keyed by `(kind, id)`, refreshing every principal's grant on the listed resources (multi-principal-safe).
- `GrantStore.delete_unrefreshed(session_start)` clears rows whose `granted_at` predates the sweep.
- `LumidResourceRegistrar` takes `session_start` (captured in `install()` after schema bootstrap). `refresh` translates the batch into a `touch_resources` call; `purge_stale` calls `delete_unrefreshed`.
- `LUMID_ACL_TTL_DAYS` and `prune_older_than` are gone.

Requires lumid-hooks 0.2.0 for the new Protocol methods. The `tool.uv.sources` entry pointing at `../lumid.hooks` is temporary; drop it once 0.2.0 is on PyPI.
Review findings on the reconcile work:

- `GrantStore.touch_resources(refs)` -> `touch_resources(pairs)`: the parameter takes `(kind, id)` tuples, not `ResourceRef` instances; the old name implied otherwise.
- `LumidResourceRegistrar.refresh` switches to `Collection[ResourceRef]` to match the tightened lumid-hooks 0.2.0 Protocol signature, and logs a debug line when it drops kind-level refs (parity with the warnings on `register`/`deregister`).
- Test helper `_backdate(principal_id: str | None)` split into `_backdate_one` and `_backdate_all`; the implicit branching on a None-overloaded arg was a smell.
- Add coverage for two reconcile shapes the existing tests didn't hit: a sweep against an empty store is a no-op, and a second sweep within the same boot doesn't drop grants the first sweep just refreshed.
Surfaces the temporary override in grep when 0.2.0 ships to PyPI.
Match lumid-hooks 0.2.0's single-method Protocol: one atomic `reconcile(resources, logger)` replaces the two-call sweep so a mid-sweep failure can't half-wipe the ACL.

- `GrantStore.reconcile(pairs, session_start)` runs the UPDATE (touch refreshed grants) and the DELETE (drop anything older than `session_start`) in a single transaction. On error the transaction rolls back, leaving the store unchanged. Replaces `touch_resources` and `delete_unrefreshed`.
- `session_start` stays on the registrar so it's captured at plugin load time, not when the host invokes `reconcile`. Grants written by other startup paths (e.g. supervisor registration) between load and the sweep have `granted_at > session_start` and survive.
- `LumidResourceRegistrar.reconcile(resources, logger)` flattens refs to `(kind, id)` pairs, logs kind-level drops, and reports touched/deleted counts at INFO.
- Tests cover: live grants survive (long-running resources), stale grants drop, an empty batch wipes pre-session rows, grants written after `session_start` survive (the host-race protection), and a mid-transaction failure rolls back.
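The touch-then-delete transaction can be sketched with stdlib `sqlite3` as below. This is an illustration under assumptions, not the `GrantStore` code: the `grants` table layout is hypothetical, the explicit `now` argument is added for determinism, and the UPDATE is issued per pair in a loop rather than as one bulk statement.

```python
import sqlite3


def open_grants(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT/ROLLBACK below are the only transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grants ("
        "kind TEXT NOT NULL, id TEXT NOT NULL, principal_id TEXT NOT NULL, "
        "granted_at REAL NOT NULL, PRIMARY KEY (kind, id, principal_id))"
    )
    return conn


def reconcile(conn, pairs, session_start, now):
    """Refresh grants on live resources, then drop rows older than session_start.

    Returns (touched, deleted). Any exception rolls the whole sweep back.
    """
    conn.execute("BEGIN")
    try:
        touched = 0
        for kind, id_ in pairs:
            # Keyed by (kind, id) only, so every principal's grant is refreshed.
            cur = conn.execute(
                "UPDATE grants SET granted_at = ? WHERE kind = ? AND id = ?",
                (now, kind, id_),
            )
            touched += cur.rowcount
        deleted = conn.execute(
            "DELETE FROM grants WHERE granted_at < ?", (session_start,)
        ).rowcount
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return touched, deleted
```

Rows written after `session_start` but absent from `pairs` keep a `granted_at` above the cutoff, which is why they survive the DELETE.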
lumid-hooks 0.2.0 is released on PyPI, so the editable path override from `[tool.uv.sources]` is no longer needed. uv now resolves the pin from the registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The grant store's only persistence need is a single-table SQLite file, which the stdlib `sqlite3` module covers directly. Dropping the SQLAlchemy and aiosqlite deps means the plugin's runtime deps (`httpx`, `pydantic`, `fastapi`, `lumid-hooks`, `flowmesh-hook`) are all already present in the FlowMesh server image, so the bind-mount deployment path no longer needs an overlay Dockerfile.

`GrantStore` keeps its public API. One `sqlite3.Connection` is opened in WAL + autocommit and shared across all ops; an `asyncio.Lock` serialises access and queries run in `asyncio.to_thread`. `reconcile` uses explicit `BEGIN`/`COMMIT`/`ROLLBACK` for the same atomic-on-failure contract.

The README's Loading section collapses to the single bind-mount path.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
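The shared-connection pattern looks roughly like this. `SketchAsyncDb` and its methods are hypothetical names; the sketch also assumes `check_same_thread=False` is needed because `asyncio.to_thread` may run each call on a different worker thread, and it leaves the journal mode at SQLite's default.

```python
import asyncio
import sqlite3


class SketchAsyncDb:
    """One shared sqlite3 connection, serialised by an asyncio.Lock."""

    def __init__(self, path):
        # Autocommit connection shared by every operation.
        self._conn = sqlite3.connect(
            path, isolation_level=None, check_same_thread=False
        )
        self._lock = asyncio.Lock()

    def _run(self, sql, params):
        return self._conn.execute(sql, params).fetchall()

    async def execute(self, sql, params=()):
        # The lock guarantees one statement at a time; to_thread keeps the
        # blocking sqlite3 call off the event loop.
        async with self._lock:
            return await asyncio.to_thread(self._run, sql, params)

    async def close(self):
        async with self._lock:
            await asyncio.to_thread(self._conn.close)


async def demo():
    db = SketchAsyncDb(":memory:")
    await db.execute("CREATE TABLE grants (kind TEXT, id TEXT, principal_id TEXT)")
    # Concurrent callers queue on the lock instead of racing on the connection.
    await asyncio.gather(
        *(
            db.execute("INSERT INTO grants VALUES (?, ?, ?)", ("worker", f"w{i}", "alice"))
            for i in range(5)
        )
    )
    rows = await db.execute("SELECT COUNT(*) FROM grants")
    await db.close()
    return rows[0][0]
```

`asyncio.to_thread` requires Python 3.9 or newer; on older interpreters `loop.run_in_executor` would be the equivalent.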
`GrantStore` already serialises every operation through an `asyncio.Lock`, so SQLite's WAL concurrency (non-blocking readers vs. one writer) is moot: calls are serialised before they reach the engine. Defaulting to rollback-journal mode keeps a single file at rest, with no `-wal`/`-shm` sidecars to back up or trip up external readers.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Trim docstrings + README to direct declarative statements. Cut
justification of absent design choices ("no SQLAlchemy / aiosqlite
dependency", "No overlay image needed", "No locks needed" — for the
WAL paragraph in `acl.py`), narrative deliberation ("With this plugin
as the sole IdentityProvider, every authenticated principal came
through our resolve path…"), and contrastive rebuttals ("so a partial
sweep can't wipe live grants", "(admin aside)", "they shouldn't reach
a registrar in practice, but…"). Keep the load-bearing invariants —
single-atomic-transaction reconcile, asyncio.Lock + to_thread for the
SQLite connection, kind-level scope fallback policy — stated once
each.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Purpose
Two coupled changes that together make the plugin's authorization story explicit and aligned with current FlowMesh.

Docs + identity cleanup. Strips the stale "FlowMesh V2" framing, removes the dead V1 scope-vocabulary filter from `LumidIdentityProvider` (FlowMesh has no `require_scope` path; the filter was misleading), documents the `FLOWMESH_API_KEY` operational caveat that this plugin imposes when it's the sole `IdentityProvider`, and swaps the README's `pip install` loading snippet for the two canonical `flowmesh stack` deployment patterns.

PermissionChecker + ResourceRegistrar. Implements the two hooks the plugin has been advertising. `ResourceRegistrar` mirrors FlowMesh's resource lifecycle into a SQLite ownership table; `PermissionChecker` reads it to gate access. Defines the concrete scope vocabulary lum.id PATs mint against: admin scopes bypass, kind-level scopes gate creation, ownership gates concrete-id access. The ACL DB lives under FlowMesh's `FLOWMESH_PLUGIN_DATA_DIR` mount so it survives restarts.

Depends on the matching FlowMesh PR adding `FLOWMESH_PLUGIN_DATA_DIR` to the stack compose template.

Changes

- Docs + identity cleanup: remove the dead scope filter from `LumidIdentityProvider`, strip "V2" framing from the README and module docstring, document the `FLOWMESH_API_KEY` operational caveat, swap the loading snippet for the canonical `flowmesh stack` patterns, resolve hook deps from PyPI.
- `OwnershipStore` (new `acl.py`): SQLite-backed, keyed on `(kind, id)`. Single-table schema; upsert semantics; startup TTL prune for stale rows.
- `ResourceRegistrar` (new `registrar.py`): mirror FlowMesh's resource lifecycle into the ACL.
- `PermissionChecker` (new `permissions.py`): admin-scope bypass; kind-level scope checks; concrete-id ownership lookup; SYSTEM gets a read-scope bypass.
- `install()` becomes `@asynccontextmanager`: opens the engine against `LUMID_ACL_DB_PATH`, bootstraps schema, prunes stale rows, yields bindings, disposes on shutdown.
- README gains a "Scope vocabulary" section enumerating the five scopes the `PermissionChecker` enforces, and rows for `LUMID_ACL_DB_PATH` / `LUMID_ACL_TTL_DAYS`.

Test Plan

- End-to-end against a running FlowMesh stack: not run. Requires the matching FlowMesh PR (adding `FLOWMESH_PLUGIN_DATA_DIR`) to land first so the default DB path is mountable. Will retest live once both PRs merge.

Test Result

Follow-ups

- Upstream options for stale-row cleanup: FlowMesh could (a) replay `register()` at boot for every persisted resource, or (b) expose a "list live resource IDs by kind" hook. Until then, the TTL prune is the best we can do.
- Retest end-to-end once the `FLOWMESH_PLUGIN_DATA_DIR` PR merges.