
feat: accept seed_uri via host-resources SDK (phase 2c) #8

Merged: mgoldsborough merged 10 commits into `main` from `feat/host-resources-sdk-adoption` on May 23, 2026.


mgoldsborough commented May 22, 2026

Phase 2c of the host-resources roadmap. Phase 1 (nimblebrain#262) advertised the capability; Phase 2a (nimblebrain#263) wired the platform-side handlers; Phase 2b (nimblebrain#268, now `nimblebrain-bundle-sdk` v0.1.0 on PyPI) shipped the Python SDK. This PR is the first real adoption, and it closes the original production bug where synapse-research couldn't anchor research on a workspace file the agent pointed it at.

## Behaviour change

`start_research` grows two new optional parameters, mutually exclusive — together they cover every host/agent combination:

| Parameter | What it does | When to use |
|---|---|---|
| `seed_uri` | A `files://` URI the server reads via the host-resources extension. Saves the agent's context budget on the file bytes. | Host advertises `ai.nimblebrain/host-resources` (Level-A path). |
| `seed_data` | Raw text the agent passes inline. Works on every host. | Host doesn't advertise the extension (Level-C fallback). Also a clean choice when the agent already has the content composed in memory. |

Agents decide between the two by checking the host's advertised capabilities. When `seed_uri` is passed to a host that doesn't support host-resources (or doesn't support the `files://` scheme), the tool returns a structured error naming both the missing capability and the specific retry shape (`seed_data=`) — actionable for the agent without trial-and-error.

Passing both `seed_uri` and `seed_data` together is rejected as ambiguous rather than silently picking one.
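The two rules above (mutex first, then a capability check whose error names the exact retry shape) can be sketched in plain Python. This is an illustrative sketch, not the actual `start_research` code: the helper name `validate_seed_args`, the boolean `host_has_extension` flag, and the message wording are assumptions; the parameter names, the extension id, and the use of `ValueError` come from this PR.

```python
from typing import Optional

HOST_RESOURCES_EXT = "ai.nimblebrain/host-resources"


def validate_seed_args(
    seed_uri: Optional[str],
    seed_data: Optional[str],
    host_has_extension: bool,
) -> Optional[str]:
    """Return "uri", "data", or None (no seed). Hypothetical helper."""
    if seed_uri is not None and seed_data is not None:
        # Ambiguous: refuse rather than silently picking one path.
        raise ValueError("Pass either seed_uri or seed_data, not both.")
    if seed_uri is not None:
        if not host_has_extension:
            # Name the missing capability AND the specific retry shape.
            raise ValueError(
                f"Host does not advertise {HOST_RESOURCES_EXT}; "
                "retry with seed_data=<file contents> instead of seed_uri."
            )
        return "uri"
    if seed_data is not None:
        return "data"
    return None  # Backward-compat: omitting both seeds is a no-op.
```

Checking the mutex before the capability keeps the error specific: an agent that passed both gets told about the ambiguity, not about a missing extension it may not need.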

Other behaviours:

  • Seeds longer than 400,000 characters (`_SEED_MAX_CHARS`) are truncated with a visible marker carrying both the cap and the original length, so the model and the human reader see the cut.
  • Binary resources (no `text` field) and legitimately-empty text resources surface as distinct errors with different recovery hints.
  • Seed length records to the `research_run` entity as `seed_content_chars` (pre-truncation), so the Synapse UI / debug surfaces can show that a run was anchored on a workspace file.
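A minimal sketch of the truncation rule, assuming the cap is counted in Python characters (the QA commit names `_SEED_MAX_CHARS = 400_000`). The function name `truncate_seed` and the marker wording are hypothetical; the point is that the marker embeds both numbers and that the length used for `seed_content_chars` is taken before the cut.

```python
from typing import Tuple

_SEED_MAX_CHARS = 400_000


def truncate_seed(text: str, cap: int = _SEED_MAX_CHARS) -> Tuple[str, int]:
    """Return (possibly truncated text, original length in characters)."""
    original_len = len(text)  # recorded pre-truncation as seed_content_chars
    if original_len <= cap:
        return text, original_len
    # Visible marker with both numbers, thousands-separated (hypothetical wording).
    marker = f"\n\n[seed truncated at {cap:,} of {original_len:,} characters]"
    return text[:cap] + marker, original_len
```

Measuring the original length first is what keeps `seed_content_chars` meaningful for oversized seeds; measuring after the slice would always report the cap.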

## SDK source

Pins `nimblebrain-bundle-sdk>=0.1.0` and resolves it from PyPI (no `[tool.uv.sources]` override anymore). Earlier revisions of this PR sourced the SDK first from a local path and then from a pinned git ref while #268 was unmerged; that override is gone, and resolution is clean PyPI as of `f7ca866`.

No merge-ordering dependency remains — the SDK is canonical on PyPI, so this PR is independently mergeable.

## Tests

`uv run pytest tests/` → 54 passing (43 existing + 11 in `tests/test_seed_uri.py`).

Notable coverage in the new file:

  • Happy path: `seed_uri` content reaches the gpt-researcher prompt under a clear header
  • Happy path: `seed_data` inline reaches the prompt by the same route
  • `seed_uri` + `seed_data` together → mutex error
  • Capability-missing error names the missing capability and points at `seed_data`
  • Scheme-not-supported error names the scheme and points at `seed_data`
  • Empty text resource is distinct from binary resource (different recovery hints)
  • 400 KB truncation: head sentinel survives the cut, tail sentinel is dropped, marker carries both numbers
  • Backward-compat: omitting both seed parameters is a no-op

Tests mock at the `host()` factory boundary because the Python `mcp.client.ClientSession` rejects custom-method server→client requests (its `ServerRequest` union is closed). Wire-shape validation lives in the SDK's own unit tests + the platform's TS-side handler tests, both of which already exist.
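As a rough illustration of mocking at a factory boundary: the code under test obtains its host handle from a factory, and the test substitutes a fake so no server→client request is ever sent. Everything below (`FakeHost`, `resolve_seed`, the `available` attribute, and the dict-with-`text` result shape) is a hypothetical stand-in; the real suite patches the SDK's `host()` factory, whose exact surface this PR only partly names.

```python
from typing import Callable, List, Optional


class FakeHost:
    """Hypothetical stand-in for what host(ctx) returns; nothing touches the wire."""

    def __init__(self, available: bool, text: Optional[str]) -> None:
        self.available = available
        self.text = text  # None models a binary resource
        self.reads: List[str] = []  # URIs read, so tests can assert on calls

    def read(self, uri: str) -> dict:
        self.reads.append(uri)
        return {"uri": uri, "text": self.text}  # result shape is assumed


def resolve_seed(uri: str, host_factory: Callable[[], FakeHost]) -> Optional[str]:
    """Hypothetical code under test; production code would call host(ctx)."""
    h = host_factory()
    if not h.available:
        raise ValueError("host-resources unavailable; retry with seed_data")
    return h.read(uri)["text"]


# Fakes modelled on the PR's seeded_host / unavailable_host fixtures.
seeded_host = FakeHost(available=True, text="broker,region\n")
unavailable_host = FakeHost(available=False, text=None)
```

Because the fake sits where the factory would be, the test exercises the tool's own branching without needing `ClientSession` to accept the custom request method.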

## Drive-by fix

Commit `b77f483` is a separate, narrowly-scoped fix for the `briefing.priority` manifest value (`"normal"` was never valid per the host schema enum `["high","medium","low"]`; blocked local install on current platform main). Kept as its own commit for review isolation — happy to split into a separate PR if reviewer prefers.

## Test plan

  • `uv run pytest tests/` green (54 passing).
  • `uv run ruff check src/ tests/` clean.
  • `uv run ruff format --check src/ tests/` clean.
  • Verified end-to-end locally: uploaded a 2,558-byte CSV to a workspace, called `start_research` with `seed_uri`, and observed the seed content reach the gpt-researcher prompt and `seed_content_chars: 2554` on the entity. The 4-byte delta comes from UTF-8 encoding: an em-dash is one character but three bytes, so the delta is consistent with two em-dashes in the file.
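The byte/character gap is easy to reproduce, assuming `seed_content_chars` is computed with `len()` on the decoded string: Python's `len()` on a `str` counts code points, while the length of the UTF-8 encoding counts bytes. The sample string is made up; it simply contains two em-dashes (U+2014).

```python
# Two em-dashes: 1 character each, but 3 bytes each in UTF-8.
sample = "Acme\u2014North,Beta\u2014South"
n_chars = len(sample)
n_bytes = len(sample.encode("utf-8"))
print(n_chars, n_bytes, n_bytes - n_chars)  # 21 25 4
```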

Add an optional `seed_uri` parameter to `start_research`. When set, the
server resolves the URI via the `ai.nimblebrain/host-resources`
extension (using `nimblebrain-bundle-sdk`'s `host(ctx).read()`) and
prepends the file content to the research prompt. Closes the original
production bug — synapse-research can now anchor research on a
workspace file (brokers.csv, transcript, doc) the agent points it at,
instead of either receiving content inlined in the query (large + ugly)
or doing web research from scratch (the bug we hit).

## Behaviour

- `seed_uri="files://fl_abc"`: server reads the file, prepends content
  under a `## Workspace seed` header in the prompt, records
  `seed_content_chars` on the `research_run` entity.
- Seeds above 400,000 characters (`_SEED_MAX_CHARS`) are truncated with a
  visible marker so the model and the human reader both see the cut happened.
- Binary resources (no `text` field) are refused with a clear error
  telling the agent to extract text upstream.
- Hosts that don't advertise the host-resources extension surface a
  `HostCapabilityMissing` error naming the capability — the agent
  knows to retry with content inline via its own file-reading tool
  (the Level-C fallback pattern from the host-resources design).
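The prepend step can be pictured as simple string assembly. This sketch assumes the prompt is a plain string; `build_prompt` and the blank-line layout are hypothetical, and only the `## Workspace seed` header comes from this commit.

```python
from typing import Optional

SEED_HEADER = "## Workspace seed"


def build_prompt(query: str, seed: Optional[str]) -> str:
    """Prepend seed content under a clear header (hypothetical helper)."""
    if seed is None:
        return query  # backward-compat: no seed leaves the prompt unchanged
    return f"{SEED_HEADER}\n\n{seed.rstrip()}\n\n{query}"
```

Putting the seed first, under its own header, lets the model treat the file as grounding material rather than as part of the question.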

## SDK source

Pins `nimblebrain-bundle-sdk>=0.1.0`. Until the SDK lands on PyPI, the
`[tool.uv.sources]` block sources from the local path at
`../../products/nimblebrain/code/packages/bundle-sdk-py/` (relative to
this pyproject.toml — requires the `hq` meta-repo cloned alongside
synapse-research, the standard NimbleBrain dev layout). Drop that
block once the package is on PyPI.

**Ordering:** this PR depends on `NimbleBrainInc/nimblebrain#268`
(the SDK package) merging first. Otherwise the local path doesn't
exist and `uv sync` fails.

## Tests

48 passing (43 existing + 5 new). The new file covers:
- happy path: seed content reaches the worker's gpt-researcher prompt
- entity records `seed_content_chars` correctly
- Level-C: `HostCapabilityMissing` surfaces with the capability name
- binary refusal: clear error directing the agent to extract upstream
- backward-compat: omitting `seed_uri` is a no-op
- SDK importability smoke

Tests mock at the `host()` factory boundary because the Python
`mcp.client.ClientSession` rejects custom-method server→client
requests (its `ServerRequest` union is closed). Wire-shape validation
lives in the SDK's own unit tests and the platform's TS-side handler
tests, both of which already exist.

synapse-research CI clones only its own repo — the local-path
`tool.uv.sources` (`../../products/nimblebrain/code/packages/bundle-sdk-py`)
resolves on developer machines with the `hq` meta-repo cloned
alongside, but breaks in CI where the parent path doesn't exist.

Switch to a git source pinned to a specific commit on the SDK PR
branch (`nimblebrain@89cd28f`). Reproducible builds, works in CI
without a meta-repo clone, and bumps explicitly when the SDK ships
a new pre-1.0 revision.

Drop this block once `nimblebrain-bundle-sdk` is on PyPI; the
version pin in `[project.dependencies]` resolves through PyPI from
then on. Until then, contributors editing the SDK locally should
`uv pip install -e <path>` against their checkout.

The previous version added `seed_uri` (host-resources-extension read)
but left a real gap: hosts without the extension had nowhere to put
seed content. The error message even told the agent to "pass content
inline via `seed_data`" — except `seed_data` didn't exist as a
parameter. Fixing that.

## Changes

`start_research` now takes both `seed_uri` and `seed_data`,
mutually exclusive:

- `seed_uri`: host reads via `ai.nimblebrain/host-resources`.
  Preferred when available — the agent doesn't pay context budget
  on the file bytes.
- `seed_data`: raw text passed inline by the agent. Universal
  fallback that works on every host. Required for hosts without
  the extension.

When `seed_uri` is set on a host that doesn't advertise the
extension, the tool now returns a `ValueError` whose message names
both the missing capability AND the specific retry shape
(`seed_data=<file contents>`). The previous error pointed at a
non-existent parameter — a Level-C signal that wasn't actionable.

Passing both `seed_uri` and `seed_data` is rejected as ambiguous
rather than picking one; the agent probably confused the two paths.

## Tests

8 passing in tests/test_seed_uri.py (+3 new):
- `seed_data` inline reaches the worker prompt
- `seed_uri` + `seed_data` together → mutex error
- Capability-missing error tells the agent to retry with `seed_data`
  specifically (not just "pass content inline")

All 51 tests (43 existing + 8 in test_seed_uri.py) pass.

The platform's host-manifest schema constrains `briefing.priority`
to `["high", "medium", "low"]` (host-manifest.schema.json:153). The
existing `"normal"` value was never valid against this enum — it
worked previously because the platform didn't enforce the schema
strictly. Recent installs fail with:

  Bundle "@nimblebraininc/synapse-research" has an invalid
  _meta["ai.nimblebrain/host"] block:
  ai.nimblebrain/host/briefing/priority: must be equal to one of the
  allowed values. Refusing to install.

Switching to `"medium"` — it's the middle bucket, matching the prior
intent (default-ish priority for this app's briefing facets).
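The enum constraint amounts to a membership check. The sketch below uses plain Python rather than a JSON Schema validator; the allowed values come from the schema excerpt quoted above, while `check_briefing_priority` and the dict shape of the host block are hypothetical.

```python
ALLOWED_PRIORITIES = ("high", "medium", "low")  # host-manifest.schema.json enum


def check_briefing_priority(host_block: dict) -> None:
    """Raise if briefing.priority is outside the enum (hypothetical helper)."""
    priority = host_block.get("briefing", {}).get("priority")
    if priority is not None and priority not in ALLOWED_PRIORITIES:
        raise ValueError(
            f"briefing.priority must be one of {ALLOWED_PRIORITIES}, got {priority!r}"
        )
```

Under this check `"medium"` passes and the old `"normal"` value is rejected, matching the install error quoted above.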

Unrelated to the Phase 2c work in this PR; folded in because it
blocks local testing of any synapse-research install against current
platform main.

PR NimbleBrainInc/nimblebrain#268 merged as 852cbdd. The previous git
ref pointed at the SDK PR branch (89cd28f) which was deleted on
merge — the ref still resolves via github.com's commit-keyed fetch,
but tracking a deleted branch's tip is confusing. Bump to the
squash-merge commit on main.

Still git-sourced, not PyPI; the actual `bundle-sdk-py/v0.1.0` PyPI
publish requires the one-time Trusted Publisher setup. Once that
lands, drop the `[tool.uv.sources]` block entirely and let the
version pin in `[project.dependencies]` resolve through PyPI.

nimblebrain-bundle-sdk v0.1.0 is now on PyPI:
https://pypi.org/project/nimblebrain-bundle-sdk/0.1.0/

Drop the `[tool.uv.sources]` git-source override. The `>=0.1.0` pin in
`[project.dependencies]` now resolves through PyPI, which means fresh
clones (devs, CI runners) install the SDK without needing the `hq`
meta-repo cloned alongside or a specific git commit fetched.

uv.lock updated to reflect the registry source. 51 tests still green.

…d 13 on #8)

Three substantive fixes from QA review:

## Truncation test (Critical #4)

`_SEED_MAX_CHARS = 400_000` was implemented without a test exercising
the slicing path or the marker shape. Added
`test_seed_truncation_emits_marker` that constructs a seed >cap with
sentinel head/tail strings, asserts the head survives the cut, the
tail is dropped, and the marker carries both the cap and the actual
length in its formatted-number form. Future refactors that touch the
cap or the marker text now fail loudly.

## Empty-text resource distinct from binary (Suggestion #1)

`if not text:` matched both `None` (binary) and `""` (empty text).
A legitimately empty workspace file was reported as binary with a
misleading "extract text upstream" recovery hint. Tightened to
`if text is None:` for the binary branch and added a dedicated
empty-text branch whose error tells the agent the file is actually
empty (verify contents, or omit `seed_uri`). New test pins this
distinction.
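The bug is a Python truthiness trap: `not None` and `not ""` are both `True`, so a single `if not text:` branch cannot separate binary from empty. A sketch of the tightened check follows; `classify_text` and its return labels are hypothetical, while the `text is None` test mirrors the fix described above.

```python
from typing import Optional


def classify_text(text: Optional[str]) -> str:
    """Hypothetical helper: distinguish binary, empty, and usable text."""
    if text is None:
        # No `text` field at all: binary resource, extract text upstream.
        return "binary"
    if text == "":
        # Legitimately empty file: verify contents, or omit seed_uri.
        return "empty"
    return "ok"


# The old check could not tell the two failure cases apart.
print([not t for t in (None, "")])  # [True, True]
```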

## Scheme probe before read (Suggestion #3)

`_resolve_seed_uri` now calls `h.supports_scheme("files")` after the
availability check. The platform would otherwise return
`-32602 Invalid params` for an unsupported scheme, which the agent
sees as a generic wire error. Routing it through the same Level-C
retry hint (pass `seed_data` inline) gives the agent an actionable
recovery path. Not load-bearing today (the platform always supports
`files://`), but cheap insurance and exercises the SDK's
`supports_scheme()` surface.
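The probe order can be sketched with a structural `Protocol`. Assumptions: `is_available()` and a `read()` that returns the text directly are hypothetical simplifications (this PR names only an "availability check", `supports_scheme()`, and `host(ctx).read()`); the fake host exists only to show that `read()` is never reached when the scheme probe fails.

```python
from typing import Optional, Protocol


class HostLike(Protocol):
    """Hypothetical host handle; names other than supports_scheme are assumed."""

    def is_available(self) -> bool: ...

    def supports_scheme(self, scheme: str) -> bool: ...

    def read(self, uri: str) -> Optional[str]: ...


RETRY_HINT = "retry with seed_data=<file contents>"


def resolve_seed_uri(h: HostLike, uri: str) -> Optional[str]:
    """Probe availability, then scheme, then read; failures carry the retry hint."""
    if not h.is_available():
        raise ValueError(f"host-resources extension not advertised; {RETRY_HINT}")
    if not h.supports_scheme("files"):
        # Without this probe the platform would reply -32602 Invalid params.
        raise ValueError(f"host does not support the files:// scheme; {RETRY_HINT}")
    return h.read(uri)


class FilesUnsupportedHost:
    """Fake host that is available but lacks files:// support."""

    def is_available(self) -> bool:
        return True

    def supports_scheme(self, scheme: str) -> bool:
        return False

    def read(self, uri: str) -> Optional[str]:
        raise AssertionError("read() must not run when the scheme probe fails")
```

Probing before reading turns a generic wire error into the same actionable Level-C message the availability check already produces.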

## Tests

54 passing (43 existing + 11 in test_seed_uri.py, +3 new):
- test_seed_truncation_emits_marker
- test_seed_uri_empty_text_distinct_from_binary
- test_seed_uri_scheme_not_supported

All existing fixtures (`seeded_host`, `unavailable_host`,
`binary_host`) grew a `supports_scheme()` method to match the new
SDK probe. Lint + format clean on src/ and tests/.

QA reviewer's worktree at `.claude/worktrees/feat-host-resources-sdk-adoption`
got staged as an embedded repo in the previous commit because `.claude/`
wasn't in `.gitignore`. Untrack the pointer + add the directory to
gitignore so future QA worktrees don't reintroduce it.

Commit 9a99def corrupted .gitignore by appending `.claude/` via
`echo >>` to a file whose last line had no trailing newline. The
concatenation produced:

  .tasks/.claude/

which broke BOTH intended behaviours:
  - `.tasks/foo` was no longer ignored (`/implement` scratch leaked)
  - `.claude/worktrees/foo` was never ignored (QA worktree submodule
    pointers can still be re-staged — the bug 9a99def was meant to fix)

The only path that was newly ignored — `.tasks/.claude/` — doesn't
exist in this repo.

Split into two distinct lines with proper newline terminators.
Verified with `git check-ignore -v`:

  .gitignore:28:.tasks/   .tasks/foo
  .gitignore:31:.claude/  .claude/worktrees/foo
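
The failure mode reproduces in a few lines of Python: `echo` appends its argument plus a newline but never checks whether the file already ends in one. The helper names and the sample `.gitignore` contents are hypothetical illustrations.

```python
def append_like_echo(existing: str, line: str) -> str:
    """Mimic `echo line >> file`: append line + newline, no separator check."""
    return existing + line + "\n"


def append_line_safely(existing: str, line: str) -> str:
    """Terminate an unterminated last line before appending."""
    if existing and not existing.endswith("\n"):
        existing += "\n"
    return existing + line + "\n"


before = "node_modules/\n.tasks/"  # made-up file; last line lacks a newline
print(append_like_echo(before, ".claude/").splitlines()[-1])  # .tasks/.claude/
print(append_line_safely(before, ".claude/").splitlines()[-2:])  # ['.tasks/', '.claude/']
```
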
mgoldsborough added the qa-reviewed label (QA review completed with no critical issues) May 23, 2026
mgoldsborough merged commit 17b204c into main May 23, 2026
2 checks passed
mgoldsborough deleted the feat/host-resources-sdk-adoption branch May 23, 2026 18:04
mgoldsborough added a commit that referenced this pull request May 23, 2026
Minor bump for the seed_uri + seed_data feature add merged in #8
(Phase 2c host-resources SDK adoption). Cuts the first synapse-research
release that can anchor research on a workspace file.

See the GitHub release notes on v0.3.0 for user-visible changes; this
commit only touches manifest.json / pyproject.toml / __init__.py via
`make bump`.

Co-authored-by: Mathew Goldsborough <1759329+mgoldsborough@users.noreply.github.com>