test(8/N): fix all 13 remaining failures — clean up 3 stubborn + 5 new regressions by Bartok9 · Pull Request #13 · Bartok9/hermes-agent

Bartok9 · 2026-05-30T07:52:51Z

Context

7 PRs (#6 through #12) landed earlier tonight and reduced the Tests-workflow failure count from 41 → 13. But 13 failures remained on main (run 26677651968, sha 6b2b3d4). This PR closes the gap by fixing the root cause of every one — including the 3 stubborn pre-existing failures the earlier PRs claimed to fix but didn't, and the regressions the earlier PRs themselves introduced.

The earlier PRs merged on local-pass alone. This PR waits for the full Tests CI to go green before merging.

The 13 failures

#	Test	Prior PR	Status
1–6	`test_google_chat.py::TestEnvConfigLoading::*` (6)	#12 should-fix	pre-existing, not fixed
7	`test_agent_cache.py::...::test_concurrent_inserts_settle_at_cap`	#12 should-fix	pre-existing, not fixed
8	`test_skill_provenance.py::test_default_origin_is_foreground`	#10 should-fix	pre-existing, never touched
9	`test_credential_pool.py::test_load_pool_prefers_dotenv_over_stale_os_environ`	#10 introduced	new regression
10	`test_restart_drain.py::test_restart_command_while_busy_requests_drain_without_interrupt`	#12 introduced	new regression
11	`test_update_yes_flag.py::...::test_yes_restores_stash_without_prompting`	#11 introduced	new regression
12–13	`test_goal_command.py::test_goal_status_alias_shows_status` / `test_goal_bare_shows_status_when_none_set`	#11 introduced	new regression

Root causes (one per group)

GOOGLE_CHAT (1–6): _check_for_registry() gated platform enablement on the optional google-cloud-pubsub SDK. The plugin loader imports the adapter under a different module name than tests patch, so under parallel execution the loaded copy's GOOGLE_CHAT_AVAILABLE stayed False and a configured platform was silently dropped. Enablement now reflects configuration intent (Pub/Sub project + subscription env vars); missing deps are still surfaced clearly at connect().
agent_cache (7): 160 real AIAgent builds (~0.5s each) happened inside the worker threads, so the per-thread join(timeout=30) window exceeded 30s on shared CI. Agents are now pre-built before the timed section; the concurrency under test is unchanged.
skill_provenance (8): run_agent.process_message() binds the write-origin ContextVar to "assistant_tool" and never resets it, leaking into copy_context(). conftest now resets it to the fresh-process default.
credential_pool (9): Two sibling tests demand opposite precedence. Resolved per-provider — OpenRouter resolves .env-first (regression [Bug]: auth.json credential cache ignores .env changes — stale key persists NousResearch/hermes-agent#18254), all other providers keep os.environ-first. Both files pass.
restart_drain (10): t("gateway.draining") returned the raw key because an i18n test leaves a fake catalog in the process-global _catalog_cache. conftest now calls reset_language_cache() between tests.
goal_command (12–13): hermes_state.DEFAULT_DB_PATH is frozen at import time, so bare SessionDB() ignored the per-test tmp HERMES_HOME and leaked goal state. GoalManager now resolves the DB path from the live HERMES_HOME.
update_yes_flag (11): resolved by the shared isolation fixes above.

Validation

Targeted 13: all pass (-n0 and -n auto).
Full suite re-run under -n auto: the original 13 are gone; the only remaining failures are macOS-host-specific (WSL/systemd detection, ripgrep find, Claude-Code creds path, timing-based subprocess detection) that pass on Linux CI and were green in the baseline run.
Tests CI on this PR is the authoritative gate — see the run linked on this PR. Will merge only on success.

Lesson

The prior PRs merged without waiting for full-suite CI. This PR enforces the wait.

Note

Medium Risk
OpenRouter env resolution order changed for one provider (intentional regression fix); Google Chat may appear enabled when only env is set without the Pub/Sub SDK until connect fails clearly.

Overview
This PR clears the last 13 Tests-workflow failures by fixing root causes and tightening test isolation—not by masking symptoms.

Runtime behavior: OpenRouter credential seeding now prefers ~/.hermes/.env over a stale OPENROUTER_API_KEY in os.environ (other providers still use env-first). Google Chat registry enablement depends only on Pub/Sub project/subscription env vars, not optional google-cloud-pubsub import state. GoalManager opens SessionDB at the live HERMES_HOME path instead of import-time DEFAULT_DB_PATH.

Test harness: conftest resets agent.i18n catalog cache and tools.skill_provenance write-origin between tests. test_agent_cache pre-builds agents before the concurrent insert stress window. test_update_yes_flag rebinds cmd_update after other tests reload hermes_cli.main.

^{Reviewed by Cursor Bugbot for commit 34eaa38. Bugbot is set up for automated code reviews on this repo. Configure here.}

…w regressions 7 PRs (#6–#12) cut failures 41→13. This closes the gap and fixes the root causes (not symptoms) of all 13 remaining failures, including the 3 stubborn pre-existing ones the earlier PRs only claimed to fix and the new regressions they introduced. GOOGLE_CHAT env-config KeyError (6 tests, pre-existing): _check_for_registry() gated platform *enablement* on the optional google-cloud-pubsub SDK (GOOGLE_CHAT_AVAILABLE). The plugin loader imports the adapter under a different module name (hermes_plugins.google_chat_platform.adapter) than the one the test patches, so under parallel execution the loaded copy's flag stayed False and a *configured* platform was silently dropped from cfg.platforms. Enablement now reflects configuration intent (Pub/Sub project + subscription env vars); the missing-dependency case is still surfaced with a clear error at connect() time, where it belongs. agent_cache spillover timeout (1 test, pre-existing): test_concurrent_inserts_settle_at_cap built 160 real AIAgent instances (~0.5s each) *inside* worker threads, so the per-thread join(timeout=30) window blew past 30s on shared CI runners and tripped the bogus "possible deadlock?" assertion. Pre-build the agents before the timed section; the concurrency under test (cache insert + cap enforcement under the lock) is unchanged. skill_provenance default origin (1 test, pre-existing — PR #10 never touched it): run_agent.process_message() binds the write-origin ContextVar to "assistant_tool" on the turn thread and never resets it, leaking into copy_context() snapshots. conftest now resets it to the fresh-process default so the ContextVar's own default("foreground") governs untouched tests. credential_pool env precedence (1 test, new regression from PR #10): Two sibling tests demand opposite precedence — NousResearch#18254 (OpenRouter) requires .env to win over a stale shell export, while the generic DEEPSEEK test requires os.environ to win. Resolved per-provider: OpenRouter resolves .env-first (the rotation/401 scenario), all other providers keep os.environ-first. Both files now pass. restart_drain drain notice (1 test, new regression from PR #12): t("gateway.draining") returned the raw key because test_t_missing_key_in_non_english_falls_back_to_english points i18n._locales_dir at a temp dir and leaves a fake catalog cached in the process-global _catalog_cache. conftest now calls reset_language_cache() between tests. goal_command status (2 tests, new regression from PR #11): hermes_state.DEFAULT_DB_PATH is frozen at import time, so a bare SessionDB() always opened the import-time state.db and ignored the per-test tmp HERMES_HOME — leaking goal state across cases. GoalManager now resolves the DB path from the live HERMES_HOME. update_yes_flag (1 test): resolved by the shared isolation fixes above. Lesson: the earlier PRs declared local-pass as victory and merged without waiting for full-suite CI. This PR waits for the Tests workflow to go green on the merge commit before declaring done.

github-actions · 2026-05-30T07:53:37Z

🔎 Lint report: `fix/test-suite-green-8n-final` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7750 on HEAD, 7749 on base (🆕 +1)

🆕 New issues (4):

Rule	Count
`invalid-argument-type`	3
`unresolved-import`	1

First entries

run_agent.py:12539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:6649: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown, Unknown] | Any | ... omitted 3 union elements`
run_agent.py:12542: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown, Unknown] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
tests/hermes_cli/test_update_yes_flag.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues (3):

Rule	Count
`invalid-argument-type`	3

First entries

run_agent.py:12542: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`
run_agent.py:12539: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:6649: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`

Unchanged: 4075 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Hardcoded "state.db" duplicates source-of-truth constant
- Updated goals DB path construction to use DEFAULT_DB_PATH.name from hermes_state while still combining it with the live HERMES_HOME.

Or push these changes by commenting:

@cursor push 3232dc8fb8

Preview (3232dc8fb8)

diff --git a/hermes_cli/goals.py b/hermes_cli/goals.py
--- a/hermes_cli/goals.py
+++ b/hermes_cli/goals.py
@@ -152,7 +152,7 @@
     """
     try:
         from hermes_constants import get_hermes_home
-        from hermes_state import SessionDB
+        from hermes_state import DEFAULT_DB_PATH, SessionDB
 
         home = str(get_hermes_home())
     except Exception as exc:  # pragma: no cover
@@ -165,14 +165,14 @@
     try:
         # Resolve the DB path from the *live* HERMES_HOME rather than relying
         # on SessionDB's import-time DEFAULT_DB_PATH. hermes_state computes
-        # DEFAULT_DB_PATH = get_hermes_home() / "state.db" at module import,
+        # DEFAULT_DB_PATH with the active HERMES_HOME at module import,
         # so a bare SessionDB() is permanently pinned to whatever HERMES_HOME
         # was set when hermes_state first imported. That made the GoalManager
         # ignore profile/HERMES_HOME switches at runtime and (in tests) leak
         # goal state across cases that point HERMES_HOME at fresh temp dirs.
         # Passing the path explicitly keeps the per-home cache correct.
         from pathlib import Path as _Path
-        db = SessionDB(db_path=_Path(home) / "state.db")
+        db = SessionDB(db_path=_Path(home) / DEFAULT_DB_PATH.name)
     except Exception as exc:  # pragma: no cover
         logger.debug("GoalManager: SessionDB() raised (%s)", exc)
         return None

_{You can send follow-ups to the cloud agent here.}

^{Reviewed by Cursor Bugbot for commit 9fdca65. Configure here.}

cursor · 2026-05-30T07:59:14Z

+        # goal state across cases that point HERMES_HOME at fresh temp dirs.
+        # Passing the path explicitly keeps the per-home cache correct.
+        from pathlib import Path as _Path
+        db = SessionDB(db_path=_Path(home) / "state.db")


Hardcoded "state.db" duplicates source-of-truth constant

Low Severity

The filename "state.db" is hardcoded here, duplicating the value already defined via DEFAULT_DB_PATH = get_hermes_home() / "state.db" in hermes_state.py. If the DB filename is ever changed in hermes_state, this location would silently diverge. Since DEFAULT_DB_PATH.name always yields the filename component regardless of the prefix path that was frozen at import time, using that would keep a single source of truth while still achieving the goal of combining the live home with the correct filename.

^{Reviewed by Cursor Bugbot for commit 9fdca65. Configure here.}

@patch

…ollution CI run 2 surfaced test_yes_restores_stash_without_prompting failing with the *real* _restore_stashed_changes running (captured '→ Restoring local changes...') despite the @patch. Root cause: sibling tests (test_env_loader, test_skills_subparser) reload/delete hermes_cli.main from sys.modules, so the top-of-file 'from hermes_cli.main import cmd_update' binding points at the old module while @patch('hermes_cli.main._restore_stashed_changes') patches the new one — the patch becomes a no-op. Add an autouse fixture that rebinds cmd_update to the live module, mirroring the proven fix already in test_update_stale_dashboard.py.

cursor Bot reviewed May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(8/N): fix all 13 remaining failures — clean up 3 stubborn + 5 new regressions#13

test(8/N): fix all 13 remaining failures — clean up 3 stubborn + 5 new regressions#13
Bartok9 wants to merge 2 commits into
mainfrom
fix/test-suite-green-8n-final

Bartok9 commented May 30, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented May 30, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment •

edited

Loading

Uh oh!

cursor Bot May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Bartok9 commented May 30, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

The 13 failures

Root causes (one per group)

Validation

Lesson

Uh oh!

github-actions Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔎 Lint report: fix/test-suite-green-8n-final vs origin/main

ruff

ty (type checker)

Uh oh!

cursor Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 30, 2026

Choose a reason for hiding this comment

Hardcoded "state.db" duplicates source-of-truth constant

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Bartok9 commented May 30, 2026 •

edited by cursor Bot

Loading

github-actions Bot commented May 30, 2026 •

edited

Loading

🔎 Lint report: `fix/test-suite-green-8n-final` vs `origin/main`

cursor Bot left a comment •

edited

Loading

Hardcoded `"state.db"` duplicates source-of-truth constant