Skip to content

chore(sync): pull upstream NousResearch/hermes-agent@main#5

Open
github-actions[bot] wants to merge 776 commits into
mainfrom
upstream-sync/20260430-1403-9a14540
Open

chore(sync): pull upstream NousResearch/hermes-agent@main#5
github-actions[bot] wants to merge 776 commits into
mainfrom
upstream-sync/20260430-1403-9a14540

Conversation

@github-actions
Copy link
Copy Markdown

Automated sync from NousResearch/hermes-agent@main.

  • Upstream HEAD: 9a145406031aab0054868d27e64314e165a15806
  • Fork was 776 commits behind, 11 commits ahead.

Review the diff and let CI pass before merging. If the fork has zero local commits ahead, this is effectively a fast-forward and can be merged once green.

Generated by .github/workflows/sync-upstream.yml.

OutThisLife and others added 30 commits April 28, 2026 19:42
Two targeted fixes on the critical path from `hermes --tui` launch to
`gateway.ready`:

1. **Defer `@hermes/ink` import in memoryMonitor.ts.** The static top-level
   import dragged the full ~414KB Ink bundle (React + renderer + all
   components/hooks) onto the critical path *before* `gw.start()` could
   spawn the Python gateway — serialising ~155ms of Node work in front of
   it on every launch. `evictInkCaches` only runs inside the 10-second
   tick under heap pressure, so it moves to a lazy dynamic import. First
   tick hits the ESM cache because the app entry has long since imported
   `@hermes/ink`.

2. **Gate `tools.mcp_tool` import on config in tui_gateway/entry.py.**
   Importing the module transitively pulls the MCP SDK + pydantic + httpx
   + jsonschema + starlette formparsers (~200ms). The overwhelming
   majority of users have no `mcp_servers` configured, so this runs for
   nothing. A cheap `load_config()` check (~25ms) skips the 200ms import
   when no servers are declared, with a conservative fallback to the old
   behaviour if the config probe itself fails.

## Measurements (macOS Terminal.app, Apple Silicon, n=12)

| Metric                     | Before (p50) | After (p50) | Δ        |
|----------------------------|--------------|-------------|----------|
| Python gateway boot alone  | 252–365ms    | 105–151ms   | −180ms   |
| `hermes --tui` banner paint | 686ms        | 665ms       | −21ms    |
| `hermes --tui` → ready      | **1843ms**   | **1655ms**  | **−188ms (−10.2%)** |
| `hermes --tui` → ready p90  | 1932ms       | 1778ms      | −154ms   |
| stdev (ready)              | 126ms        | 83ms        | also more consistent |

## Tests

- `scripts/run_tests.sh tests/tui_gateway/ tests/tools/test_mcp_tool.py`:
  195 passed.  (The one pre-existing failure in
  `test_session_resume_returns_hydrated_messages` reproduces on main —
  unrelated, it's a mock-DB kwarg mismatch.)
- `ui-tui` vitest: 430 tests, all pass.
- `npm run type-check` in ui-tui: clean.

## Notes

- Node-side first paint ("banner") didn't move meaningfully because that
  latency is dominated by Ink's render pipeline + React mount, not by
  which imports load first.
- The win shows up entirely in the time from banner to `gateway.ready`
  — exactly where we expected it, since both fixes shorten the Python
  gateway's boot path or let it overlap more with Node startup.
- No user-visible behaviour change. Memory monitoring still fires every
  10s; MCP still works when `mcp_servers` is configured.
NousResearch#17098)

Two amplifying optimizations to per-turn overhead in the gateway:

1. get_tool_definitions() memoization (model_tools.py)
   Keyed on (frozenset(enabled), frozenset(disabled),
   registry._generation, config.yaml mtime+size). Only active when
   quiet_mode=True (which is every hot-path caller — gateway,
   AIAgent.__init__); quiet_mode=False keeps the existing print side
   effects. Cached path returns a shallow-copy list sharing read-only
   schema dicts.

   Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway
   constructs fresh AIAgent per message, so this saves ~7 ms/turn before
   any LLM work.

2. check_fn() TTL cache (tools/registry.py)
   check_fn callables like check_terminal_requirements probe external
   state (Docker daemon, Modal SDK, playwright binary). For a long-lived
   process, hitting them on every get_definitions() pass was pure waste
   — external state changes on human timescales. 30 s TTL so env-var
   flips (hermes tools enable X) propagate within a turn or two without
   explicit invalidation.

   Measured: first call 7.5ms → 1.6ms (check_fn probes now dominate);
   subsequent calls ~0.01ms via the upstream memoization.

Invalidation surface:
- registry._generation bumps on register/deregister/register_toolset_alias,
  invalidating the memoized definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
  affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
  hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
  so env-var monkeypatches don't see stale results.

Also fixes a regression from PR NousResearch#17046 that I missed:
- tools/web_tools.py — Firecrawl was removed from module scope by the
  lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'.
  Applied the same _FirecrawlProxy pattern used in auxiliary_client/
  run_agent for OpenAI (module-level proxy that looks like the class
  but imports the SDK on first call/isinstance; patch() replaces the
  attribute as usual).

Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in
  the full suite due to check_fn TTL cross-test pollution; fixed by
  the autouse fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp
  dynamic discovery, 5 mcp structured content — all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py which has
  pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
  agent.log session window.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Validate configured providers against both Hermes runtime provider ids and
catalog-normalized provider ids. This keeps providers like ai-gateway from
being rejected after catalog resolution maps them to models.dev ids.

Keep credential checks and vendor-slug warnings anchored to the runtime id
so doctor reports actionable provider names in follow-up diagnostics.
…ch#17202)

Replace the removed built-in boot-md hook (NousResearch#17093) with a how-to that
shows users how to wire up the same behavior themselves via the hooks
system. Uses _resolve_gateway_model() + _resolve_runtime_agent_kwargs()
so the example works against custom endpoints and OAuth providers,
not just the aggregator defaults that the old built-in silently assumed.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
…17203)

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Address two Copilot review comments on PR NousResearch#17175.

- `wrapForFrac` doc said "additive operators or whitespace" but the
  implementation also matches `*` and `/`. The wider behaviour is the
  one we want (nested products and fractions need parens to disambiguate
  inline `/`), so the doc is updated to match instead of tightening the
  regex.

- `fenceOpenAt` was flagged as "overly conservative" vs. `markdown.tsx`,
  which falls back to paragraph rendering for unclosed `$$` openers.
  Mirroring that fallback in the streaming chunker would prematurely
  commit a paragraph rendering of the unclosed opener to the monotonic
  stable prefix, where it would be frozen and become wrong the moment
  the closer streams in. The asymmetry is deliberate; document why so
  it isn't "fixed" again later.

Made-with: Cursor
…ousResearch#17206)

detect_dangerous_command() and detect_hardline_command() were calling
re.search(pattern, text, re.IGNORECASE | re.DOTALL) inline — Python's
re._cache (512 patterns) amortizes compile cost on the warm path, but:

  1. The first terminal() call per process pays the full compile fan-out
     for all 59 patterns (12 HARDLINE + 47 DANGEROUS). Measured at
     ~2.6 ms per detect_dangerous_command() call after re.purge().
  2. The re._cache is LRU — unrelated regex work elsewhere in the agent
     (response parsing, text normalization, etc.) can evict our patterns
     and silently re-compile them on the next terminal() call.

Precompiling at module load eliminates both costs:

  detect_dangerous_command:
    cold  2.613 ms  →  0.298 ms   (-88%)
    warm  0.042 ms  →  0.004 ms   (-90%)
  detect_hardline_command:
    cold  ~0.6 ms   →  0.006 ms
    warm  0.011 ms  →  0.002 ms

Savings are per terminal() call. Agents with heavy terminal use see
compound savings; the bigger value is the stability guarantee (no
re._cache eviction can silently re-introduce the 2.6 ms cold cost
mid-session).

Implementation:
- HARDLINE_PATTERNS_COMPILED and DANGEROUS_PATTERNS_COMPILED built at
  module load from the existing (pattern, description) tuples, using
  shared _RE_FLAGS = re.IGNORECASE | re.DOTALL.
- detect_* functions now iterate the compiled list and call pattern_re.search(text).
- Original HARDLINE_PATTERNS and DANGEROUS_PATTERNS lists kept as-is
  (other code in the file uses them for key derivation /
  _PATTERN_KEY_ALIASES).

Verified:
- 160/161 tests/tools/test_approval*.py pass (1 pre-existing heartbeat
  test flake on main).
- 349/349 tests/tools/ 'approval or terminal or dangerous' pass.
- Live hermes chat smoke: 3 benign terminal commands + 1 rm -rf /tmp/
  (clarify prompt fired — approval path still works) + 1 sudo (sudo
  password prompt fired — DANGEROUS pattern match still works). 23
  log lines in the smoke window, zero errors.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Made-with: Cursor

# Conflicts:
#	ui-tui/src/components/markdown.tsx
Prevent unterminated bracketed paste input from swallowing future keystrokes, and avoid rendering an empty Thinking panel before reasoning arrives.
Keep the latest prompt sticky while the viewport is in live assistant output beyond history, and clear stale sticky state at the real bottom using fresh scroll height.
Keep the /steer acknowledgement plain text so it reads like the rest of the TUI status copy.
Run the TUI lint autofix and formatter on the PR branch after the sticky prompt and paste recovery changes.
Match the buffered-stdin rearm cadence to IN_PASTE state so large pastes do not spin the normal escape timeout while waiting for readable data to drain.
…watchdog

fix(tui): stabilize sticky prompts and paste recovery
TUI session readiness was still laggy after the gateway-ready fixes. Profiling
session.create -> session.info showed the slow phase is background AIAgent
construction (~1.1s). A cProfile run of tui_gateway.server::_make_agent showed
model_tools/tool discovery importing tools.code_execution_tool, whose
module-level EXECUTE_CODE_SCHEMA calls _get_execution_mode(), which imported
cli.CLI_CONFIG.

That pulled the classic interactive CLI stack (prompt_toolkit/Rich and REPL
setup) into every agent startup path, including hermes --tui where it is not
used. Replace that with hermes_cli.config.read_raw_config(), which is cached and
reads only the raw code_execution section. Existing defaults still apply when
the key is absent.

Measurements on macOS Terminal.app:
- import run_agent: ~466ms -> ~347ms
- model_tools import: ~418ms -> ~272ms
- _make_agent: ~1452ms -> ~1239ms
- session.create -> session.info: ~1167ms -> ~999ms
- full hermes --tui ready p50: ~1655ms -> ~1537ms

Tests:
- scripts/run_tests.sh tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
…ft (NousResearch#17207)

Three modules independently implemented the same "preserve head+tail of
a secret, mask the middle" logic with slightly different behaviors that
had started to drift:

  hermes_cli/config.py redact_key  — 12-char floor, 4+4, DIM '(not set)'
  hermes_cli/status.py redact_key  — 12-char floor, 4+4, plain '(not set)'  ← drift
  hermes_cli/dump.py _redact       — 12-char floor, 4+4, empty string

The visible bug: 'hermes status' displayed the '(not set)' placeholder
in plain text while 'hermes config' showed it in dim text. Same concept,
inconsistent UI.

Introduces mask_secret() in agent/redact.py as the canonical helper,
with head/tail/floor/placeholder/empty kwargs. The three call sites
become one-line wrappers that differ only in the 'empty' handling:

  config.redact_key  → mask_secret(k, empty=color('(not set)', Colors.DIM))
  status.redact_key  → mask_secret(k, empty=color('(not set)', Colors.DIM))
  dump._redact       → mask_secret(v)  # empty → ''

agent.redact._mask_token (log redactor, different policy: 18-char floor,
6+4 visible, '***' on empty) also ports to mask_secret but retains its
own empty-case handling to preserve the historical '***' return.

Net: the three display-time redactors now agree on formatting, the
canonical helper lives in one place, and future tweaks (e.g. adding
bullet-point masking, changing the head/tail widths) happen once.

Verified:
- 3/3 tests/hermes_cli/test_web_server.py::TestRedactKey pass
- 89/89 agent/tests/test_redact.py + tests/tools/test_browser_secret_exfil.py
  + tests/hermes_cli/test_redact_config_bridge.py pass
- Live 'hermes status', 'hermes config', 'hermes dump' all render the
  same way they did before (verified against actual env with real
  keys: OpenRouter, Firecrawl, Browserbase, FAL, Tinker all show
  'prefix...suffix'; Kimi shows '***' at <12 chars; unset shows
  '(not set)' uniformly).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Adds a 'pretext' skill under skills/creative/ for building cool browser
demos with @chenglou/pretext — the 15KB DOM-free text-layout library by
Cheng Lou.

The skill documents pretext as a creative primitive (not plumbing): text
flowing around obstacles, text-as-geometry games, proportional ASCII
surfaces, shatter/particle typography, editorial multi-column, kinetic
type, and multiline shrink-wrap. Each pattern pairs with copy-pasteable
snippets in references/patterns.md.

Two single-file HTML templates, both verified in a browser:

  templates/hello-orb-flow.html
    Minimal starter: long paragraph flows around a mouse-tracked orb
    using layoutNextLineRange + a per-row corridor-width function.

  templates/donut-orbit.html
    Full 3D Sloane torus with orbit controls (drag to rotate, scroll to
    zoom, idle auto-rotate). Each 'luminance pixel' is a real grapheme
    sampled in reading order from a prose corpus via pretext's
    prepareWithSegments + layoutWithLines + Intl.Segmenter. Amber-on-
    black CRT aesthetic, z-buffer keyed by screen cell, 60fps.

Related skills: p5js, claude-design, excalidraw, architecture-diagram.
…riants (NousResearch#17213)

The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.

This rewrite changes the stance:

- **Active-update bias.** The reviewer now treats inaction as a missed
  learning opportunity. 'Nothing to save.' remains an explicit escape
  but is no longer framed as the most-common outcome.

- **User-preference corrections are first-class skill signals.** Style,
  tone, format, legibility, verbosity complaints — and the actual
  phrasings users use ('stop doing X', 'this is too verbose', 'I hate
  when you Y', 'remember this') — now warrant patching the skill that
  governs the task, not just writing to memory.

- **Loaded-skill-first preference order.** When a skill was loaded via
  /skill-name or skill_view during the session, the reviewer patches
  THAT one first. It was in play; it's the right place.

- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
  create.** Support files are explicitly enumerated as three kinds:
    * references/<topic>.md — session-specific detail OR condensed
      knowledge banks (quoted research, API docs excerpts, domain notes)
    * templates/<name>.<ext> — starter files to copy and modify
    * scripts/<name>.<ext>  — statically re-runnable actions

- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
  numbers, error strings, codenames, library-alone names, or session
  artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
  only fits today's task, fall back to one of the patch/support-file
  options.

- **Memory scope clarified.** 'who the user is and what the current
  situation and state of your operations are' — MEMORY.md is
  situational/state, USER.md is identity/preferences.

- **Curator handoff.** Reviewer flags overlap; the background curator
  handles consolidation at scale. Single-session reviewer doesn't
  attempt umbrella-rebalancing.

Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Match classic CLI perceived startup behavior: show the TUI shell and composer
before constructing the full AIAgent. session.create now returns a lightweight
placeholder session with lazy=true and no longer starts _make_agent eagerly.
The first method that needs the agent triggers _start_agent_build() via _sess();
prompt.submit is routed through the RPC worker pool so that the initial wait for
agent construction does not block the stdio dispatcher.

The intro panel renders skeleton rows for tools/skills while the real
session.info payload is absent, then hydrates to the real tools/skills panel once
AIAgent initialization completes. Also skip the startup /voice status probe and
avoid the input.detect_drop RPC for ordinary plain-text prompts to keep early
startup/first-submit paths cheap.

Measurements on macOS Terminal.app:
- Previous full ready p50 after earlier PR commits: ~1537ms
- Lazy skeleton panel p50: ~794ms
- Original baseline full ready p50: ~1843ms

So the visible startup surface is now ~743ms faster than the prior PR state and
~1.05s faster than the original baseline. First prompt still pays the same agent
construction cost if it races the background/skeleton state, matching classic
CLI's deferred behavior.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
The lazy startup panel could remain stuck on the placeholder when no first
prompt was submitted because agent construction only started from _sess(). Keep
session.create cheap, but schedule _start_agent_build shortly after returning
the placeholder so tools/skills hydrate automatically.

Also replace the ugly placeholder bar rows with compact unicode-animations
braille loaders for the tools and skills sections.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
Copilot correctly flagged two concurrency windows:

- memoryMonitor could re-enter while awaiting the lazy @hermes/ink import or
  heap dump, producing duplicate imports/dumps under sustained pressure.
- _start_agent_build used a check-then-set guard without synchronization, so
  concurrent agent-backed RPCs could start duplicate agent builders.

Fix both with single-flight guards: cache the dynamic import promise and track
per-level dump in-flight state in memoryMonitor, and protect the TUI agent build
flag with a per-session lock.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
A cleanup review found that adding prompt.submit to _LONG_HANDLERS made the RPC
pool own the full first-turn wait even though the handler itself already spawns
a turn thread. Keep prompt.submit inline and make it return immediately:

- look up the session without waiting
- kick the lazy agent build
- spawn a short waiter thread that blocks on agent_ready, then starts the
  existing turn dispatcher

This keeps stdin dispatch responsive, avoids occupying a bounded pool worker for
a normal chat turn, and preserves the lazy-start hydration behavior.

Tests:
- python -m py_compile tui_gateway/server.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Clean up the remaining review nits:

- let the deferred @hermes/ink import retry after a transient failure instead
  of memoizing a rejected promise forever
- keep memory-monitor in-flight state inside a finally so future exceptions
  cannot suppress that memory level indefinitely
- use read_raw_config for the TUI MCP cold-start probe instead of full
  load_config()
- keep input.detect_drop for explicit relative path prefixes (./ and ../)
  while preserving the no-RPC fast path for ordinary plain prompts

Tests:
- python -m py_compile tui_gateway/server.py tui_gateway/entry.py
- cd ui-tui && npm run type-check && npm run build
- scripts/run_tests.sh tests/tui_gateway/test_protocol.py::test_sess_found tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
- cd ui-tui && npm test -- --run src/__tests__/useSessionLifecycle.test.ts src/__tests__/useConfigSync.test.ts
Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.
Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.
Return CLI-style browser connect status messages from the gateway and render them in the TUI so local Chrome launch attempts are visible instead of ending in a silent delayed failure.
Emit browser.progress JSON-RPC notifications during the connect work and render them in the TUI as system transcript lines, so users see the same step-by-step status the base CLI prints instead of nothing for ~1m followed by a final result.
Split browser.manage into a small dispatcher with named connect/disconnect
helpers, fold _http_ok / _probe_urls / _normalize_cdp_url out of the nested
probe loop, collapse the failure-message scaffolding, and DRY the chrome
candidate path tables. Behaviour and event shape unchanged.
Fixes from Copilot's two passes on PR NousResearch#17238:

* Validate parsed URL once: reject missing host, invalid port, and
  unsupported scheme up front so malformed inputs (e.g. http://:9222
  or http://localhost:abc) don't fall through to a generic 5031.
* Tighten _is_default_local_cdp to require a discovery-style path so
  ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare
  http://127.0.0.1:9222 (which would lose the path and break the
  connect).
* Move browser.manage into _LONG_HANDLERS so the up-to-10s
  launch-and-retry loop runs on the RPC pool instead of blocking the
  main dispatcher.
* try_launch_chrome_debug uses Windows-appropriate detach kwargs
  (creationflags=DETACHED_PROCESS|CREATE_NEW_PROCESS_GROUP) instead
  of POSIX-only start_new_session=True.
* manual_chrome_debug_command uses subprocess.list2cmdline on
  Windows so the printed instruction is cmd.exe-compatible.
* Mirror host/port validation in cli.py /browser connect so the
  classic CLI never persists an invalid BROWSER_CDP_URL.
teknium1 and others added 21 commits April 30, 2026 03:32
Follow-up to the try/except guards added in the previous commit.
Four sibling call sites all read HERMES_AGENT_TIMEOUT /
HERMES_AGENT_TIMEOUT_WARNING / HERMES_AGENT_NOTIFY_INTERVAL via the
same read-env-or-fallback pattern, so factor it into _float_env(name,
default) alongside the existing _auto_continue_freshness_window()
helper.
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.

Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.

Also implements batching + rate-limit handling in the `send_message`
tool.

New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
…ck, Mattermost, Email

Ports PR NousResearch#17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.

Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
  GIFs peeled off and routed through send_animation (albums don't support
  animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
  over); URL images downloaded into BytesIO so they render inline; forum
  channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
  respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
  chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
  SMTP size governs); remote URLs remain linked in body (parity with
  existing send_image)

All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.

Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.

Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.

Co-authored-by: Maxence Groine <maxence@groine.fr>
…NousResearch#17426)

The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:

    Unknown name "thinking_config": Cannot find field

`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.

The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.

Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.

Fixes NousResearch#17426

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…thorized injection (NousResearch#17775)

The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().

This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.

Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.

Closes NousResearch#17775
…isplay paths

When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:
  - context_compressor.context_length = config value (e.g. 65536) ✓
  - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)
   When model.context_length is explicitly set and no explicit
   ollama_num_ctx override exists, cap the auto-detected GGUF value
   to the user's context_length. This is the core fix — it prevents
   Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites
   Several paths called get_model_context_length() without the config
   override, falling through to the 256K default fallback:
   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)
   _normalize_root_model_keys() now migrates root-level context_length
   into the model section, matching existing behavior for provider and
   base_url. Users who wrote `context_length: 65536` at the YAML root
   instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)
   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
   as two comments stated.

Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
… injection (NousResearch#17335)

Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:

  - DeepSeek:        'Tool names must be unique.'
  - Xiaomi MiMo:     'tools contains duplicate names: lcm_expand'
  - Moonshot/Kimi:   'function name lcm_grep is duplicated'

TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.

Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
  in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
  but the FIRST uncached call stored and returned the SAME object.
  run_agent.py mutates self.tools (memory + LCM context-engine schemas)
  in-place, so the very first agent init in a Gateway process
  poisoned the cache, and every subsequent init appended LCM schemas
  again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
  lcm_expand) had no dedup, unlike the memory-tools injection right
  above it which already skips already-present names.

Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
  computed list but return list(result) to the caller. Same pattern as
  the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
  schemas whose names are already present, mirroring the memory-tools
  block. This also defends against plugin paths that may register the
  same schemas via ctx.register_tool().

Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without
  it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent
  appending lcm_grep / lcm_expand to the returned list and asserts the
  next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the
  long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI
  was fine.

5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
When hermes model picker switches to a custom_providers entry, the slug
assignment can write the literal string 'custom' to model.provider if a
prior failed switch already left that value in config.yaml.

Two fixes:
1. model_switch.py: filter out bare 'custom' in slug assignment, always
   resolve to canonical custom:<name> form
2. providers.py: resolve_custom_provider() self-heals bare 'custom' by
   falling back to the first valid custom_providers entry

Closes NousResearch#17478
…7876)

_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.

Fix:
- _set_nested now detects list nodes and navigates by numeric index,
  and preserves both dicts AND lists at intermediate positions (scalars
  are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
  _set_nested instead -- single source of truth for the rules.

Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact NousResearch#17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path

35/35 existing + new tests pass.

E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.

Closes NousResearch#17876
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
NousResearch#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.

Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for NousResearch#17767), and the extract target.
…esearch#7100)

The NousResearch#1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping.  That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.

Split the failure classification in ``GatewayRunner._run_agent``:

* Context-overflow (``compression_exhausted`` flag, explicit
  context-length phrases, or generic 400 with a long history) → keep
  the existing skip, preserving the NousResearch#1630/NousResearch#9893 fix.
* Anything else that failed → persist just the user message so the
  conversation survives a retry.

Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.

Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing NousResearch#1630 suite still passes.
… fire import (NousResearch#17927)

Three fixes bundled for curator reliability on existing installs and
broken/partial installs:

1. run_agent.py: defer `import fire` into the __main__ block. `fire` is
   only used by `fire.Fire(main)` when running run_agent.py directly as
   a CLI — it is NOT needed for library usage. Importing it at module
   top made `from run_agent import AIAgent` from a daemon thread (e.g.
   the curator's forked review agent) crash with ModuleNotFoundError
   on broken/partial installs where `fire` isn't present.

2. hermes_cli/config.py: add version 22 → 23 migration that writes the
   `curator` + `auxiliary.curator` sections to config.yaml with their
   defaults, only filling keys the user hasn't overridden. Existing
   configs from before PR NousResearch#16049 / the April 2026 `auxiliary.curator`
   unification had neither section on disk, so users couldn't see or
   edit the settings in their config.yaml (runtime deep-merge papered
   over it at read time, but the file never reflected reality).

3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates
   `~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on
   every CLI launch. Managed-mode (NixOS) variant mkdir's it
   defensively after the activation-script existence checks, since the
   activation script may not know about this subpath.

4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as
   belt-and-suspenders for entry paths that bypass both
   ensure_hermes_home() and the v23 migration (gateway-only installs,
   bare library use).

E2E validated in isolated HERMES_HOME: fresh install gets full defaults
seeded; partial-override config keeps user's `enabled: false` and
custom `interval_hours` while filling the missing keys; re-running the
migration is a no-op.
Archived skills (moved to ~/.hermes/skills/.archive/ by the curator)
were still surfaced in the <available_skills> system prompt under a
fake '.archive' category, causing the agent to load and try to use
deprecated skills. The os.walk in iter_skill_index_files() only
excluded .git/.github/.hub.

Add '.archive' to EXCLUDED_SKILL_DIRS, and to the two other places
that hardcode the same exclusion tuple (gateway/run.py and
agent/skill_commands.py).
Widen NousResearch#17639 to the fourth sibling site (tools/skills_tool.py _EXCLUDED_SKILL_DIRS)
and register leoneparise in scripts/release.py AUTHOR_MAP so CI release script
resolves the contributor.
…drain hand-off

Belt-and-suspenders on top of @briandevans' NousResearch#17758 fix.  The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly.  Pin each invariant so a
future refactor can't silently regress them:

1. Normal single-message path still releases _active_sessions[sk] and
   _session_tasks[sk] through end-of-finally.  The NousResearch#17758 follow-up
   moved _release_session_guard under
     if current_task is self._session_tasks.get(session_key)
   For the 99%-common case current_task IS the stored task, so the
   guard must still fire.  Test would fail if the conditional were
   ever tightened in a way that dropped the normal path.

2. Drain-task cancellation releases the session.  If the drain task
   spawned by the in-band hand-off is cancelled mid-handler (e.g.
   /stop fired while draining a follow-up), its own finally must
   fire _release_session_guard.  Without this a cancel would leave
   the session permanently pinned busy.

3. Late-arrival drain still spawns when no in-band drain preceded
   it.  Pre-existing path, but the NousResearch#17758 follow-up added a
   re-queue branch that only fires when ownership was already
   handed off.  When no handoff happened the else branch must still
   spawn a fresh drain task — otherwise a message arriving during
   stop_typing gets silently dropped.

All three tests pass against current main.  Zero production code
changes.
…ousResearch#17782)

bump_use() existed and was tested but had zero production call sites —
use_count stayed 0 for all skills, breaking Curator's stale-detection
logic which relies on last_used_at.

Wire bump_use() into:
1. build_skill_invocation_message() — when a user invokes /skill-name
2. build_preloaded_skills_prompt() — when a skill is preloaded at session start

Both are the canonical 'a skill is actively being used' moments, distinct
from 'browsing' (bump_view in skill_view tool call).

Closes NousResearch#17782
Widen NousResearch#17818 to cover the dominant 'agent actively used this skill' path:
when the model calls the skill_view tool, bump use_count alongside view_count.
The slash-command and --skill preload paths (covered by the cherry-picked
commit) only catch user-initiated invocation; most skill activation happens
via the agent calling skill_view to consume an indexed skill.

Curator's stale-timer keys off last_used_at (agent/curator.py:233), so
without this wire-up agent-created skills would transition to stale
simultaneously regardless of actual use.
* fix(nix): replace magic-nix-cache with Cachix

magic-nix-cache caused recurring CI failures (TwirpErrorResponse
ResourceExhausted) by hitting GitHub Actions Cache's 10 GB limit and
200 req/min rate limit. This was flagged as 'unfixable infra flake' in
NousResearch#17836 but is actually a fixable architecture choice.

Switch to Cachix (dedicated binary cache, no GHA quota dependency):
- Replace DeterminateSystems/magic-nix-cache-action with cachix/cachix-action
- Add cachix-auth-token input to nix-setup composite action
- Pass CACHIX_AUTH_TOKEN secret through all three nix workflows
- continue-on-error: true so cache failures never block CI

Cache 'hermes-agent' is public at hermes-agent.cachix.org.
Devs can pull locally with: cachix use hermes-agent

* fix: correct cachix-action commit SHA pin

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

8 similar comments
Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Copy link
Copy Markdown
Owner

Auto-merge bot: skipping — has conflicts. Manual rebase needed.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.