fix(config): validate disposition_* range on bank-config write so one malformed bank can't 500 the whole bank list (#2348)#2349
Conversation
…rize-io#2348) PATCH /v1/{tenant}/banks/{id}/config validated field names only, never scalar type/range, so an out-of-contract disposition_skepticism/literalism/ empathy (float, 0-1 scale, or int outside 1-5) was json.dumps-ed into JSONB and later injected into a strict DispositionTraits(int, ge=1, le=5) -- a single malformed bank 500s GET banks for the whole tenant. Add a write-side _validate_disposition_updates raising ValueError (route maps ValueError->400), mirroring _validate_recall_budget_updates, plus a unit test. None is allowed as the clear-override sentinel (overlay falls back to the legacy column, so null can't poison the list). Closes vectorize-io#2348.
|
I checked the narrow path locally. The new disposition validator tests and the adjacent recall-budget validator tests pass for me: cd hindsight-api-slim
uv run pytest tests/test_disposition_config.py tests/test_recall_budget_config.py::TestValidateRecallBudgetUpdatesResult: One tiny style note from local ruff: uv run ruff check --diff tests/test_disposition_config.pyIt only wants to remove the extra blank line after the import block in The write-side boundary here looks right to me: |
nicoloboschi
left a comment
There was a problem hiding this comment.
Verified the diagnosis and fix against the code:
BankConfigUpdate.updatesis a rawdict[str, Any]— the one unguarded numeric entry point (deprecated bank/template paths already carry Pydantic ge=1/le=5).DispositionTraitsis strict int 1-5, built per-bank during list serialization → one poisoned bank 500s the whole tenant list, as described.Noneis safe: the read overlay falls back to the legacy disposition column, so it never reachesDispositionTraits.- Placement (after recall-budget validation, before persist) and
ValueError → 400mapping are correct; centralizing inupdate_bank_configcovers all callers. - Logic is sound on all boundary cases (bool/float/out-of-range/string rejected, string doesn't TypeError due to short-circuit). Mirrors the accepted recall-budget pattern.
Follow-up worth tracking: already-poisoned legacy rows still need a read-side clamp or one-time cleanup to self-heal. Approving as the write-side fix closes the issue as stated.
Problem
Closes #2348.
PATCH /v1/{tenant}/banks/{id}/configvalidates field names only — it never checks scalar type/range. So an out-of-contractdisposition_skepticism/disposition_literalism/disposition_empathy(a float, a 0-1 scale, or an int outside 1-5) is accepted andjson.dumps-ed verbatim intobanks.configJSONB. The read overlay then injects it into a strictDispositionTraits(skepticism/literalism/empathy: int, ge=1, le=5). Because that model is built while serializingBankListResponse, a single malformed bank 500sGET /v1/{tenant}/banksfor the entire tenant (and the per-bank profile) — a self-hoster can't list or manage any bank until manual JSONB surgery. This is the reported v0.8.3 case (a 0-1 scale used by mistake).update_bank_configalready validates names, credentials, permissions, entity_labels, retain_strategies and recall_budget, but has no disposition value check — the per-bank API is the only unguarded numeric entry point (the env path coerces viaint(os.getenv(...))).Fix
Add a write-side
_validate_disposition_updatesthat raisesValueError(the route already mapsValueError → 400), called fromupdate_bank_configright after_validate_recall_budget_updates. It mirrors the existing, accepted_validate_recall_budget_updatesshape exactly: rejects floats (incl. in-range3.0), bools, strings, and ints outside 1-5; accepts ints 1-5. Plus a unit test mirroringtests/test_recall_budget_config.py::TestValidateRecallBudgetUpdates.On
None: the validator allowsNoneas the "clear this per-bank override" sentinel (the field isint | None). This is safe and does not reintroduce the 500: the read overlay_overlay_bank_config_disposition_missionresolves each trait ascfg_value if cfg_value is not None else <legacy banks.disposition column>, so a storednullfalls back to the bank's legacy disposition and never reachesDispositionTraits— it can't poison the list.Scope
This closes the issue as stated — the write path is the entry point that lets bad values in. Already-poisoned legacy rows (written before this guard) would still need either a one-time cleanup or read-side hardening; I kept a read-side clamp/round out of this PR since "reject-at-write + migrate" vs "clamp-on-read" is a design call. Happy to add a clamp in
_overlay_bank_config_disposition_missionas a follow-up if you'd prefer the list to self-heal poisoned rows.