fix(config): validate disposition_* range on bank-config write so one malformed bank can't 500 the whole bank list (#2348) #2349

Merged
nicoloboschi merged 1 commit into vectorize-io:main from r266-tech:fix/disposition-config-range-validation-2348
Jun 23, 2026
@r266-tech

Contributor

Problem

Closes #2348.

PATCH /v1/{tenant}/banks/{id}/config validates field names only — it never checks scalar type/range. So an out-of-contract disposition_skepticism / disposition_literalism / disposition_empathy (a float, a 0-1 scale, or an int outside 1-5) is accepted and json.dumps-ed verbatim into banks.config JSONB. The read overlay then injects it into a strict DispositionTraits(skepticism/literalism/empathy: int, ge=1, le=5). Because that model is built while serializing BankListResponse, a single malformed bank 500s GET /v1/{tenant}/banks for the entire tenant (and the per-bank profile) — a self-hoster can't list or manage any bank until manual JSONB surgery. This is the reported v0.8.3 case (a 0-1 scale used by mistake).
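The failure mode above can be reproduced in miniature with the standard library alone. This is only a sketch: `StrictTraits` and `serialize_bank_list` are hypothetical stand-ins for the Pydantic `DispositionTraits` model and the `BankListResponse` serialization, not the project's code.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StrictTraits:
    """Hypothetical stand-in for DispositionTraits (int, ge=1, le=5)."""

    skepticism: int

    def __post_init__(self):
        v = self.skepticism
        if isinstance(v, bool) or not isinstance(v, int) or not 1 <= v <= 5:
            raise ValueError(f"skepticism must be an int in 1..5, got {v!r}")


def serialize_bank_list(rows):
    # Every bank is built eagerly, so one bad row aborts the whole response.
    return [
        StrictTraits(json.loads(cfg)["disposition_skepticism"]) for cfg in rows
    ]


# json.dumps stores the out-of-contract float (0.7) verbatim, as JSONB would.
rows = [json.dumps({"disposition_skepticism": v}) for v in (3, 0.7, 5)]
```

Calling `serialize_bank_list(rows)` raises because of the middle row, even though the other two banks are valid; dropping that row makes the list serialize again.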

update_bank_config already validates names, credentials, permissions, entity_labels, retain_strategies and recall_budget, but has no disposition value check — the per-bank API is the only unguarded numeric entry point (the env path coerces via int(os.getenv(...))).

Fix

Add a write-side _validate_disposition_updates that raises ValueError (the route already maps ValueError → 400), called from update_bank_config right after _validate_recall_budget_updates. It mirrors the existing, accepted _validate_recall_budget_updates shape exactly: rejects floats (incl. in-range 3.0), bools, strings, and ints outside 1-5; accepts ints 1-5. Plus a unit test mirroring tests/test_recall_budget_config.py::TestValidateRecallBudgetUpdates.
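For readers without the diff open, a minimal sketch of a validator with the behaviour described above. The function name, the field tuple, and the error wording are assumptions for illustration; the merged `_validate_disposition_updates` may differ in signature and details.

```python
from typing import Any

# Field names taken from the PR description; the tuple itself is an assumption.
DISPOSITION_FIELDS = (
    "disposition_skepticism",
    "disposition_literalism",
    "disposition_empathy",
)


def validate_disposition_updates_sketch(updates: dict[str, Any]) -> None:
    """Raise ValueError for any disposition_* value that is not None or an int in 1..5."""
    for field in DISPOSITION_FIELDS:
        if field not in updates:
            continue
        value = updates[field]
        if value is None:
            # None is the "clear this per-bank override" sentinel.
            continue
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(
                f"{field} must be an integer between 1 and 5, got {value!r}"
            )
```

With this shape, `{"disposition_empathy": 3.0}` is rejected even though it is numerically in range, which matches the "rejects floats (incl. in-range 3.0)" rule.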

On None: the validator allows None as the "clear this per-bank override" sentinel (the field is int | None). This is safe and does not reintroduce the 500: the read overlay _overlay_bank_config_disposition_mission resolves each trait as cfg_value if cfg_value is not None else <legacy banks.disposition column>, so a stored null falls back to the bank's legacy disposition and never reaches DispositionTraits — it can't poison the list.
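The fallback rule quoted above can be sketched as follows. `resolve_disposition` and the dict shape of the legacy column are hypothetical; only the `cfg_value if cfg_value is not None else <legacy>` expression comes from the description.

```python
TRAITS = ("skepticism", "literalism", "empathy")


def resolve_disposition(config: dict, legacy: dict) -> dict:
    """Resolve each trait from the per-bank config, falling back to the legacy column."""
    resolved = {}
    for trait in TRAITS:
        cfg_value = config.get(f"disposition_{trait}")
        # A stored null (None) never propagates: the legacy value is used instead.
        resolved[trait] = cfg_value if cfg_value is not None else legacy[trait]
    return resolved
```

For example, a config of `{"disposition_skepticism": None, "disposition_empathy": 5}` over a legacy disposition of all 3s resolves to skepticism 3, literalism 3, empathy 5.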

Scope

This closes the issue as stated — the write path is the entry point that lets bad values in. Already-poisoned legacy rows (written before this guard) would still need either a one-time cleanup or read-side hardening; I kept a read-side clamp/round out of this PR since "reject-at-write + migrate" vs "clamp-on-read" is a design call. Happy to add a clamp in _overlay_bank_config_disposition_mission as a follow-up if you'd prefer the list to self-heal poisoned rows.
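To make the "clamp-on-read" alternative concrete, here is one possible shape for such a follow-up. It is not part of this PR; `clamp_trait` and its fallback parameter are hypothetical, and rounding is just one of several defensible policies.

```python
import math


def clamp_trait(value, fallback: int) -> int:
    """Coerce a stored trait into 1..5, or return `fallback` if it is not numeric."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return fallback
    if isinstance(value, float) and not math.isfinite(value):
        # NaN and infinities cannot be rounded meaningfully.
        return fallback
    return min(5, max(1, round(value)))
```

Note that clamping is lossy for the reported case: every value on a 0-1 scale collapses to 1, which is part of why reject-at-write plus a migration may be preferable.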

…rize-io#2348)

PATCH /v1/{tenant}/banks/{id}/config validated field names only, never
scalar type/range, so an out-of-contract disposition_skepticism/literalism/
empathy (float, 0-1 scale, or int outside 1-5) was json.dumps-ed into JSONB
and later injected into a strict DispositionTraits(int, ge=1, le=5) -- a
single malformed bank 500s GET banks for the whole tenant. Add a write-side
_validate_disposition_updates raising ValueError (route maps ValueError->400),
mirroring _validate_recall_budget_updates, plus a unit test. None is allowed
as the clear-override sentinel (overlay falls back to the legacy column, so
null can't poison the list).

Closes vectorize-io#2348.
@koriyoshi2041

Contributor

I checked the narrow path locally. The new disposition validator tests and the adjacent recall-budget validator tests pass for me:

cd hindsight-api-slim
uv run pytest tests/test_disposition_config.py tests/test_recall_budget_config.py::TestValidateRecallBudgetUpdates

Result: 14 passed.

One tiny style note from local ruff:

uv run ruff check --diff tests/test_disposition_config.py

It only wants to remove the extra blank line after the import block in tests/test_disposition_config.py. I also tried the existing list overlay test, but my local run hit the same environment issue I saw yesterday: SentenceTransformer goes through a SOCKS proxy and this venv lacks socksio, so I would not treat that as a PR signal.

The write-side boundary here looks right to me: None is kept as a clear-override sentinel, while floats/bools/strings/out-of-range ints get rejected before they can poison BankListResponse.

@nicoloboschi left a comment

Collaborator

Verified the diagnosis and fix against the code:

  • BankConfigUpdate.updates is a raw dict[str, Any] — the one unguarded numeric entry point (deprecated bank/template paths already carry Pydantic ge=1/le=5).
  • DispositionTraits is strict int 1-5, built per-bank during list serialization → one poisoned bank 500s the whole tenant list, as described.
  • None is safe: the read overlay falls back to the legacy disposition column, so it never reaches DispositionTraits.
  • Placement (after recall-budget validation, before persist) and ValueError → 400 mapping are correct; centralizing in update_bank_config covers all callers.
  • Logic is sound on all boundary cases (bool/float/out-of-range/string rejected, string doesn't TypeError due to short-circuit). Mirrors the accepted recall-budget pattern.
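The short-circuit point in the last bullet is easy to check in isolation. The two predicates below are hypothetical illustrations of the ordering concern, not the merged code.

```python
def is_bad(value) -> bool:
    # Type checks come first, so the range comparison only runs for real ints.
    return isinstance(value, bool) or not isinstance(value, int) or not (1 <= value <= 5)


def is_bad_wrong_order(value) -> bool:
    # Range comparison first: comparing an int with a str raises TypeError.
    return not (1 <= value <= 5) or not isinstance(value, int)
```

`is_bad("3")` returns `True` without ever evaluating `1 <= "3"`, whereas `is_bad_wrong_order("3")` raises `TypeError`. The explicit `bool` check matters because `isinstance(True, int)` is `True` in Python.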

Follow-up worth tracking: already-poisoned legacy rows still need a read-side clamp or one-time cleanup to self-heal. Approving as the write-side fix closes the issue as stated.

@nicoloboschi merged commit 387c09e into vectorize-io:main on Jun 23, 2026
84 checks passed

Development

Successfully merging this pull request may close these issues.

PATCH bank config accepts out-of-range/wrong-type disposition_* values, then 500s the entire list endpoint

3 participants