Authority: This file is the single source of truth for agent-facing guidance. Tool-specific files (
.claude/CLAUDE.md,.github/copilot-instructions.md) contain only overrides and guardrails — they defer to this file for all shared rules.
contextweaver is a Python library for dynamic context management for tool-using AI agents. It provides two integrated engines:
- Context Engine — phase-specific budgeted context compilation with a context firewall
- Routing Engine — bounded-choice navigation over large tool catalogs via DAG + beam search
Non-goals: contextweaver is not an LLM inference layer and not a tool execution runtime. It prepares context and routes tools but never calls models or executes tools.
| Path | Responsibility |
|---|---|
types.py |
Core dataclasses and enums: SelectableItem, ContextItem, Phase, ItemKind, Sensitivity |
envelope.py |
Result types: ResultEnvelope, BuildStats, DroppedItem, ContextPack, ChoiceCard, HydrationResult, RoutingDecision |
diagnostics.py |
Versioned, payload-safe gateway events and sinks (DiagnosticEvent, DiagnosticSink, JSONL/in-memory sinks) plus deterministic aggregate reports (issues #370/#378). |
inspection.py |
Pure JSON/Markdown report construction for offline context, routing, and artifact inspection without raw payload content (issue #398). |
config.py |
Configuration: ContextBudget, ContextPolicy (incl. overflow_action budget-overflow policy, issue #510), and re-exported ScoringConfig |
_scoring_config.py |
ScoringConfig — candidate-scorer weights, incl. kind_priority + per-Phase phase_overrides (issue #487). Extracted from config.py to keep it ≤300 lines; re-exported there so from contextweaver.config import ScoringConfig is unchanged. |
profiles.py |
Routing and profile config: Mode, RoutingConfig, ProfileConfig, named presets |
protocols.py |
Protocol interfaces: TokenEstimator, EventHook, Summarizer, Extractor, RedactionHook, SensitivityClassifier (ingestion-time labelling, issue #542), MemorySource, Labeler, Retriever, Reranker, ClusteringEngine, RoutingScoreProvider (store protocols re-exported from store/protocols.py). Bundled estimators: HeuristicEstimator (default, script-aware, dependency-free — counts CJK/Kana/Hangul/emoji ≈1 token/char, issue #525), CharDivFourEstimator (raw len // 4 primitive), TiktokenEstimator (exact, falls back to HeuristicEstimator offline). Each carries a stable name for BuildStats.token_estimator. |
store/protocols.py |
Store-layer protocols: EventLog, ArtifactStore, EpisodicStore, FactStore |
store/async_protocols.py |
Async counterparts AsyncEventLog / AsyncArtifactStore / AsyncEpisodicStore / AsyncFactStore (issue #495) — same surface, async def. Consumed only by the async context/ path; backend-agnostic. |
store/async_bridge.py |
to_async(sync_store) — wraps a thread-safe sync backend as the matching async protocol via asyncio.to_thread. Thread-affine backends (SqliteEventLog, check_same_thread=True) are not valid targets (issue #495). |
store/_async_to_sync.py |
Inverse bridges + to_sync(async_store, loop) + is_async_store() (issue #495). Drives async stores on a private _LoopThread so the existing sync pipeline can consume them; ContextManager offloads build to a worker thread when async-backed. Not public API. |
exceptions.py |
Custom exception hierarchy (all errors inherit ContextWeaverError). Each class carries a stable, frozen code (e.g. CW_CONFIG) plus an optional hint; str(exc) renders [code] message (hint: …). Codes are documented in docs/errors.md and golden-listed in tests/test_exceptions.py (issues #635, #637). |
_utils.py |
Text similarity primitives: tokenize(), jaccard(), TfIdfScorer |
secrets.py |
Pure, deterministic secret detection/scrubbing primitives: scrub_secrets(), scrub_secrets_in_list(), contains_secret(), SecretPattern (issue #428). Shared by the firewall secret-scrub, the SecretRedactor hook, the sensitivity classifier, and ChoiceCard scrubbing. No I/O; never weakens a surface (only removes characters). |
_version.py |
Single-source version derived from importlib.metadata; fallback "0.0.0+local" |
_demos.py |
Demo logic for the CLI demo subcommand (exempt from print() rule) |
serde.py |
Serialisation helpers for to_dict / from_dict |
tokens.py |
Built-in token counter (count(), get_token_counter(), heuristic_counter(), TokenCounter alias) plus the provider-estimator registry (register_estimator(), registered_estimators(), estimator_name(), issue #493). The single source of truth for token counts — firewall, sensitivity-redaction placeholders, card budgeting (routing/cards.count_tokens), and FirewallStats/BuildStats numbers all route through it (issues #405/#493/#530); no stray len // 4 literals elsewhere. Owns the tiktoken dependency; offline it falls back to the script-aware HeuristicEstimator. |
store/ |
In-memory data stores: EventLog, ArtifactStore, EpisodicStore, FactStore, StoreBundle |
store/_sqlite_base.py |
Shared SQLite connection + migration scaffolding (WAL, foreign_keys=ON, _contextweaver_schema_version table). Reused by every SQLite-backed store (issue #174). |
store/sqlite_event_log.py |
SqliteEventLog — first persistent EventLog backend; single-process, sync, append-only, schema-versioned (issue #223). |
store/sqlite_episodic.py |
SqliteEpisodicStore — persistent EpisodicStore on _sqlite_base (issue #496). Append-only, ordered by ordinal; search delegates to a transient InMemoryEpisodicStore for byte-identical ranking. Own version table (VERSION_TABLE) so it can share a DB file with the event log / facts. |
store/sqlite_facts.py |
SqliteFactStore — persistent FactStore on _sqlite_base (issue #496). put upserts on fact_id; get_by_key/all sorted by fact_id. Own version table; shareable DB file. |
store/redis_artifacts.py |
RedisArtifactStore — Redis ArtifactStore for multi-process gateways (issue #426). Namespaced keys, optional per-artifact TTL, list_refs via SCAN. Lazy redis import ([redis] extra). |
store/redis_event_log.py |
RedisEventLog — Redis EventLog (issue #426). Items in a hash keyed by id + parallel order list; append-only ordering across processes. Lazy redis import ([redis] extra). |
store/s3_artifacts.py |
S3ArtifactStore — S3-compatible ArtifactStore (issue #426; AWS/MinIO/R2/GCS). {prefix}/{handle}.data + .json objects. Lazy boto3 import ([s3] extra). |
store/json_file_artifacts.py |
JsonFileArtifactStore — filesystem ArtifactStore backend; {enc(handle)}.data + {enc(handle)}.json per artifact, re-instantiable against an existing directory (issue #42). Hardened (issue #497): atomic writes (temp file + os.replace), an in-memory handle→ref index built once on init so list_refs never rescans the directory, and optional max_bytes / max_artifacts quotas raising ArtifactStoreQuotaError. Persists content_hash and percent-encodes handles into filenames so the firewall's artifact:result:… handles are Windows-safe (issue #466). |
store/_json_file_io.py |
Private filesystem helpers for json_file_artifacts.py (keeps it ≤300 lines): on-disk suffix constants, validate_handle (path-traversal defense), encode_handle (percent-encoding), and the atomic_write primitive (issues #466/#497). Not public API. |
store/testing.py |
Store-protocol conformance kit (issue #520): framework-agnostic check_event_log_conformance / check_artifact_store_conformance / check_episodic_store_conformance / check_fact_store_conformance, each taking a factory for an empty backend and asserting the round-trip / ordering / not-found contract. No test-framework import; ships in the core wheel. tests/test_store_conformance.py runs every bundled backend through it. |
summarize/ |
SummarizationRule, RuleEngine, extract_facts() |
summarize/structured.py |
Lossless JSON field projection for the firewall: parse_path / project + StructuredFirewall(keep=[...]). Deterministic, no LLM — keeps an allow-list of JSON paths inline and offloads the rest (issue #406). |
context/ |
Full context pipeline, sensitivity enforcement, view registry, ContextManager |
context/firewall_api.py |
Single-call firewall facade: compact_tool_result / firewalled_tool_result → CompactResult. Composes structured/text strategies, schema-preserving pass-through (reserved _cw sidecar — a caller payload already using _cw raises ConfigError unless overwrite_sidecar=True, #467), the built-in token counter, and fail-closed deterministic mode (issues #399, #402, #403, #404, #405, #406, #467). |
context/manager.py |
ContextManager — thin orchestrator (__init__, properties, drilldown, mixin composition). Public method stubs live in flat partial-class mixins; pipeline logic lives in the delegate modules below (issue #101). |
context/_manager_base.py |
_ManagerState — private-attribute + _build contract the manager mixins inherit and the delegate pipeline modules type their manager parameter against (ContextManager inherits it via the mixins). Not public API (issue #101). |
context/_manager_ingest.py / _manager_build.py / _manager_routing.py |
_IngestMixin / _BuildMixin / _RoutingMixin — partial-class mixins holding ContextManager's ingestion, build, and route/call-prompt method surface as thin delegations; keep manager.py ≤300 lines (issue #101). Not public API. |
context/ingest.py |
Tool-result ingestion helpers (extracted from manager.py to honor the <=300 line guideline). Includes ingest_envelope — the canonical Frame-shaped seam (weaver-spec I-05) that ingests an already-firewalled ResultEnvelope without re-deriving firewalling; raw-output ingest_tool_result / ingest_mcp_result are non-canonical for spec compliance (issue #352). |
context/memory_types.py |
MemoryEntry dataclass + PHASE_SCOPE_PREFERENCES constants for phase-aware memory ingestion (issue #293). |
context/memory_fixture.py |
JsonFixtureMemorySource — deterministic stdlib fixture adapter implementing the MemorySource Protocol from protocols.py (issue #293). |
context/memory_source.py |
memory_entries_to_context_items / select_memory_for_phase helpers that materialise memory entries into budgeted memory_fact candidates (issue #293). |
context/handoff_types.py |
HandoffEntry + SessionHandoffPack dataclasses and canonical handoff category constants (issue #294). |
context/handoff.py |
build_session_handoff_pack / render_handoff_pack — deterministic, budget-aware, sensitivity- and firewall-respecting session continuity snapshot (issue #294). |
context/consolidation_types.py |
ConsolidationPolicy / EpisodeCluster / PromotedFact / ConsolidationReport (+ CONSOLIDATION_REPORT_VERSION) — pure-data config and result types for the memory consolidation engine (issue #498). |
context/consolidation.py |
Memory consolidation engine (issue #498): cluster_episodes (deterministic episodic clustering/dedupe, #679), promote_clusters (fact promotion with provenance + max-sensitivity inheritance, #680; optional fail-closed call_fn merge, #682), decay_episodes / decay_facts (report-only decay over append-only stores, #681), and the consolidate(...) orchestrator → ConsolidationReport. Deterministic; apply=True upserts content-addressed facts (idempotent). Standalone functions (not a ContextManager method) mirroring handoff.py. |
context/_consolidation_helpers.py |
Private deterministic helpers for consolidation.py (clustering canonical text, max-sensitivity, session counting, ISO-timestamp parsing, content-addressed fact IDs, decay predicate) — keeps consolidation.py ≤300 lines. Not public API. |
context/_consolidation_merge.py |
Private optional model-assisted canonicalizer for consolidation (issue #682): refine_canonical_text runs a user-supplied call_fn under fail-closed guardrails (no LLM SDK dep; rejects blank/ungrounded completions that introduce tokens absent from the source cluster, falling back to the deterministic text). Not public API. |
context/explanation.py |
ContextBuildExplanation + CandidateExplanation opt-in debug surface returned by ContextManager.build(..., explain=True) (issue #291); carries resolved_weights (the per-phase scoring weights applied, issue #487). Sister to routing/explanation.py on the routing side. |
context/build_policy.py |
Pure build-pipeline policy helpers (not public API): override_phase_budget / adjust_budget_for_header (budget math), enforce_overflow_policy (ContextPolicy.overflow_action, issue #510), and render_pack_prompt (caller-owned renderer hook, issue #410). Extracted from build.py to keep it within its size ceiling. |
context/classify.py |
Opt-in deterministic ingestion-time sensitivity classification (issue #542): HeuristicSensitivityClassifier (implements the SensitivityClassifier protocol) + detect_sensitivity(). Runs at the start of the pipeline's sensitivity stage and over fact/episode header content; may only raise a label, never lower it. Reuses secrets.contains_secret plus PII markers. |
context/secret_redaction.py |
Opt-in SecretRedactor RedactionHook (issue #428): substring-scrubs secret shapes from an item's text via secrets.scrub_secrets. Registered under the name "secret" for ContextPolicy.redaction_hooks; complements (does not replace) MaskRedactionHook. |
routing/ |
Catalog, ChoiceGraph, TreeBuilder, Router (beam search), card renderer |
routing/filters.py |
Pre-scoring helpers: filter_items(), augment_query(), suggest_clarifying_question() (issues #14, #22, #112, #116) |
routing/manifest.py |
GraphManifest + compute_catalog_hash() for graph metadata and cache invalidation (issue #48, #15) |
routing/normalizer.py |
CatalogNormalizer + NormalizationReport for catalog metadata hygiene (issue #44) |
routing/catalog.py (validation) |
validate_references / Catalog.validate_references → CatalogValidationReport of dangling depends_on/requires refs; loaders take on_invalid ("warn"/"raise"/"ignore") and raise CatalogValidationError in raise mode (issue #519) |
routing/registry.py |
EngineRegistry and bundled TfIdfRetriever / NoOpReranker / JaccardClusteringEngine defaults (issue #47) |
routing/index_cache.py |
Persistent, reusable fitted-index cache (issues #543/#624/#685): RoutingIndexCache (in-process LRU + optional deterministic-JSON on-disk layer) and CachedRetriever (a Retriever wrapper that loads/stores the fitted index keyed by a corpus fingerprint, transparently — warm loads score byte-identically to a cold fit). Pass via Router(retriever=CachedRetriever(TfIdfRetriever(), cache)). Codec + fingerprint live in routing/_index_codec.py. |
routing/_index_codec.py |
Private helper for index_cache.py: index_fingerprint() (deterministic ordered-corpus SHA-256) + the IndexCodec contract and bundled TFIDF_CODEC. Not public API; names re-exported from index_cache. |
routing/trace.py |
RouteTrace + TraceStep structured routing audit (issue #51) |
routing/explanation.py |
RouteResult.explanation() Markdown / dict rendering (issue #226) |
routing/pipeline.py |
RoutingPipeline composer — explicit retrieve → rerank → navigate → pack stages (issue #56) |
routing/navigator.py |
BeamSearchNavigator (lifted from router.py) + rank_collected — the score-sort/active-filter helper is re-exported from routing/__init__.py for custom Navigator implementations (issues #56, #288) |
routing/packer.py |
DefaultCardPacker wrapping make_choice_cards for the pipeline pack stage (issue #56) |
routing/history.py |
RouteHistory dataclass + adjust_scores (history-aware re-routing, issue #27) |
routing/feedback.py |
Optional feedback-aware routing scores (issue #318): ExecutionFeedback (contextweaver-native, not a weaver-spec type), DeterministicScoreProvider (default no-op), FeedbackAwareScoreProvider, aggregate_feedback. Plugs into Router(score_provider=...); default None keeps routing deterministic. |
routing/selection.py |
Structured route→select contract (issues #515/#479), both pure/deterministic: selection_schema emits the routed candidate IDs as a provider-native constrained-selection schema (json_schema/openai/anthropic) so a model can only pick a routed tool_id ("constrain before"); validate_selection → SelectionValidation validates/repairs (strip → case-fold → unique-prefix; ambiguous matches rejected, never guessed) a returned ID against the candidates ("validate after"). Surfaced on RouteResult.selection_schema() / RouteResult.validate_selection(); to_routing_decision resolves + records the outcome. Shortlist composition (pin_ids always-include + per-namespace namespace_quota, issue #509) lives in routing/filters.compose_shortlist and is exposed via Router.route(...). |
extras/embeddings.py |
SentenceTransformerBackend + HybridEmbeddingRetriever + HashingEmbeddingBackend (re-exported) behind the [embeddings] extra (issue #8) |
extras/embeddings_hashing.py |
HashingEmbeddingBackend — stdlib-only deterministic EmbeddingBackend using blake2b hashing trick; no extras required (issue #266) |
_schema_gen.py |
Dataclass → JSON Schema (Draft 2020-12) generator + make schemas-check engine (issue #225) |
routing/tool_id.py |
Canonical tool_id grammar (parse_tool_id / format_tool_id / compute_hash8) per docs/gateway_spec.md §1 |
routing/primitive_id.py |
Unified cross-primitive identity + collision policy for tools/resources/prompts (parse_primitive_id / format_primitive_id / canonical_resource_id / canonical_prompt_id / resolve_collisions) per docs/gateway_spec.md §9. Tools keep the bare tool_id; resources/prompts get disjoint kind:: ids (issue #671). |
routing/path.py |
tool_browse path-navigation grammar (parse_path / resolve_path) per docs/gateway_spec.md §3 |
routing/hydration.py |
Public schema-hydration helpers — SchemaSource (from raw dict / JSON file / MCP tools-list), hydrate_with_schema, lazy_schema_resolver. Reference architectures use these to resolve a tool's full input schema from a sidecar source rather than hand-rolling a _FULL_SCHEMAS dict. Inline args_schema on the catalog item wins; sidecar only fills empties. Issue #261. |
adapters/ |
MCP, FastMCP, A2A, weaver-spec, CrewAI, Pydantic AI, smolagents, Agno, LangChain, OpenAI Agents SDK, Google ADK, Microsoft Agent Framework, OpenAPI, Agent Skills protocol adapters + MCP proxy / gateway runtime + provider-message ingestion helpers for OpenAI / Anthropic / Gemini chat histories. Framework tool-catalog adapters share adapters/_framework_common.py (issue #454). (issues #13, #28, #29, #34, #193, #194, #219, #222, #272, #274, #275, #430, #454, #501, #502, #545, #546, #547) |
adapters/chainweaver.py |
ChainWeaver flow-export → SelectableItem(kind="flow") import (chainweaver_flow_to_selectable, chainweaver_flows_to_catalog, load_chainweaver_export, issue #334). Pure data; no ChainWeaver dependency. Preserves name/description/input+output schemas; stamps metadata["runtime"]="chainweaver" + flow id/version. |
adapters/crewai.py |
CrewAI BaseTool (or equivalent plain-dict shape) ↔ SelectableItem (crewai_tool_to_selectable, crewai_tools_to_catalog, infer_crewai_namespace, load_crewai_catalog, issue #193) |
adapters/pydantic_ai.py |
Pydantic AI Tool ↔ SelectableItem and ModelMessage ↔ ContextItem lossless round-trip (pydantic_ai_tool_to_selectable, pydantic_ai_tools_to_catalog, load_pydantic_ai_catalog, from_/to_pydantic_ai_messages, issue #272) — heavy decode/encode helpers live in adapters/_pydantic_ai_messages.py |
adapters/smolagents.py |
Hugging Face smolagents Tool ↔ SelectableItem and MultiStepAgent.memory.steps → ContextItems (smolagents_tool_to_selectable, smolagents_tools_to_catalog, load_smolagents_catalog, from_smolagents_agent, issue #274) |
adapters/agno.py |
Agno (formerly Phidata) Function / Toolkit ↔ SelectableItem and AgentSession → ContextItems (agno_tool_to_selectable, agno_tools_to_catalog, load_agno_catalog, from_agno_session, issue #275) |
adapters/_framework_common.py |
Shared, framework-agnostic conversion scaffolding for the framework tool-catalog adapters (issue #454): infer_namespace, strip_namespace_prefix, coerce_schema_dict, collect_tags, require_name_description. Pure/stateless, imports no framework lib. Private — not exported. New adapters reuse these instead of re-implementing namespace/schema/tag mechanics. |
adapters/langchain.py |
LangChain BaseTool (or equivalent plain-dict shape) ↔ SelectableItem (langchain_tool_to_selectable, langchain_tools_to_catalog, infer_langchain_namespace, load_langchain_catalog, issue #502). [langchain] extra for live loading; plain-dict path needs no extra. |
adapters/openai_agents.py |
OpenAI Agents SDK FunctionTool ↔ SelectableItem and run items → ContextItems (openai_agents_tool_to_selectable, openai_agents_tools_to_catalog, load_openai_agents_catalog, from_openai_agents_run, issue #501). Run-item ingestion lives in adapters/_openai_agents_run.py. [openai-agents] extra for live loading. |
adapters/google_adk.py |
Google ADK tools ↔ SelectableItem and Session.events → ContextItems (google_adk_tool_to_selectable, google_adk_tools_to_catalog, load_google_adk_catalog, from_google_adk_session, issue #547). Session ingestion lives in adapters/_google_adk_session.py. [google-adk] extra for live loading. |
adapters/agent_framework.py |
Microsoft Agent Framework (AutoGen / Semantic Kernel lineage) tools ↔ SelectableItem and thread ChatMessages → ContextItems (agent_framework_tool_to_selectable, agent_framework_tools_to_catalog, load_agent_framework_catalog, from_agent_framework_thread, issue #430). Thread ingestion lives in adapters/_agent_framework_thread.py. [agent-framework] extra for live loading. |
adapters/openapi.py |
OpenAPI 3.0/3.1 operations → SelectableItem catalog (openapi_operation_to_selectable, openapi_spec_to_catalog, load_openapi_catalog, infer_openapi_namespace, issue #546). Routes over REST APIs; never calls them. Local $ref resolution + parameters/requestBody → args_schema composition + method→safety tags live in adapters/_openapi_schema.py. No extra — PyYAML/jsonschema are core. |
adapters/agent_skills.py |
Agent Skills (SKILL.md) directories → kind="skill" SelectableItems with lazy body hydration (skill_to_selectable, load_skills_catalog, parse_skill_frontmatter, SkillBodySource, issue #545). Frontmatter routes; SkillBodySource resolves the body/resources on selection (mirrors routing/hydration.SchemaSource). No extra — PyYAML is core. |
adapters/proxy_runtime.py |
ProxyRuntime shared core + ExposureMode enum + UpstreamCall Protocol (issue #29) |
adapters/gateway_diagnostics.py / gateway_catalog_diagnostics.py |
Sanitized ProxyRuntime instrumentation plus exact gateway/proxy static-schema exposure calculations: catalog, browse/hydrate/execute/view events, savings, artifact-view usage, and latency (issues #370/#378). |
adapters/mcp_gateway.py |
Two-tool gateway dispatch (tool_browse + tool_execute + tool_view, issues #28 / #34) |
adapters/mcp_proxy.py |
Transparent proxy dispatch (stripped tools/list + tool_hydrate + tool_execute, issue #13) |
adapters/mcp_upstream.py |
Concrete UpstreamCall adapters (StubUpstream, McpClientUpstream, MultiplexUpstream) |
adapters/mcp_gateway_server.py |
Bind mcp_gateway onto mcp.server.Server over stdio (issue #28); optional primitive_runtime= also advertises/dispatches the four resource/prompt meta-tools (issues #669/#670) |
adapters/mcp_primitives.py |
MCP resource/prompt → SelectableItem(kind="resource"/"prompt") converters + resources/read / prompts/get result→envelope wrappers (issues #669/#670). Emits ids via routing/primitive_id. |
adapters/gateway_primitives.py |
PrimitiveGatewayRuntime + PrimitiveUpstream Protocol — bounded-choice routing + firewall for resources/prompts, sharing the tool runtime's ContextManager (issues #669/#670/#555). |
adapters/_primitive_index.py |
Private single-kind catalog+graph+router+browse helper for gateway_primitives (keeps it ≤300 lines). Not public API. |
adapters/mcp_gateway_primitives.py |
The four resource/prompt gateway meta-tools (resource_browse / resource_read / prompt_browse / prompt_get) + dispatch, mirroring mcp_gateway (issues #669/#670). |
adapters/mcp_primitive_upstream.py |
Concrete PrimitiveUpstream adapters mirroring mcp_upstream: StubPrimitiveUpstream (in-process), McpClientPrimitiveUpstream (wraps an MCP ClientSession), MultiplexPrimitiveUpstream (multi-server fan-out). Transport errors raise (the runtime classifies them) per the Protocol contract (issues #669/#670). |
adapters/mcp_proxy_server.py |
Bind mcp_proxy onto mcp.server.Server over stdio (issue #13) |
adapters/sidecar_contract.py |
HTTP sidecar wire contract (issue #674): RouteRequest/RouteResponse/CompactRequest/CompactResponse/SidecarError dataclasses + SIDECAR_API_VERSION. Pure, dependency-free; the published JSON Schemas live under schemas/sidecar/v1/. |
adapters/sidecar.py |
HTTP sidecar runtime (issue #675/#676): SidecarConfig + SidecarApp.dispatch — transport-free (method, path, headers, body) → (status, json) over the sync Router (/v1/route) and compact_tool_result facade (/v1/compact). Optional bearer-token auth, per-client rate limiting (reuses gateway_controls.RateLimiter), body-size cap, and typed SidecarError responses; never raises across the HTTP boundary. |
adapters/_sidecar_http.py |
Stdlib http.server.ThreadingHTTPServer binding for SidecarApp (issue #675). No third-party dependency. Public re-exports: serve_api (blocking serve) + make_sidecar_server (build-only, for tests). Not public API itself. |
adapters/_sidecar_validation.py |
Stateless parsing + field-validation helpers shared by sidecar.py and sidecar_contract.py (request-body JSON decode, bearer-token extraction, typed contract-field coercions). Pure, dependency-free; raises ConfigError on malformed input. Not public API. |
adapters/gateway_error.py |
Structured GatewayError (codes + §3.4 wire shape) + retryable hint. Upstream-error taxonomy: classify_upstream_exception maps timeouts/connection/auth/permission/rate failures to UPSTREAM_TIMEOUT/UPSTREAM_UNAVAILABLE/AUTH_FAILED/PERMISSION_DENIED/RATE_LIMITED (fallback UPSTREAM_ERROR); redact_upstream_detail strips control chars + caps length on model-visible detail (issue #485). |
adapters/gateway_validation.py |
Untrusted-schema hardening for the gateway ingest path (issues #464/#484): SchemaLimits/SchemaFinding/SkippedTool/CatalogRefreshReport, check_schema_health (meta-validation + iterative size/depth/property bounds), build_validator (cached per tool_id). Pure, deterministic; iterative traversal avoids stack exhaustion on hostile schemas. |
adapters/gateway_args.py |
Opt-in deterministic tool-call argument repair (issue #488): normalize_args (stringified-object parse + schema-demanded str→int/number/boolean/null coercion) + Repair. Gated behind ProxyRuntime(tolerant_args=True); never renames keys, drops keys, or fuzzy-matches. |
adapters/gateway_policy.py |
Pure-data config + result types for the dispatch-path controls (issues #529/#482/#483): RetryPolicy (bounded backoff), RateLimit/RateLimitPolicy (per-session quotas), DryRunReport. All defaults inert; to_dict/from_dict for mcp serve --config. |
adapters/gateway_controls.py |
Runtime mechanisms behind gateway_policy (issues #529/#482/#512): call_with_retry (retry loop, injectable sleep), RateLimiter (sliding-window + cumulative counters, injectable clock), ToolResultCache (TTL+LRU read-only response cache). All opt-in; wired into ProxyRuntime.execute/browse/view. |
adapters/openai_messages.py |
OpenAI Chat Completions messages ↔ ContextItem round-trip (from_/to_openai_messages, issue #219) |
adapters/anthropic_messages.py |
Anthropic Messages API messages ↔ ContextItem round-trip (from_/to_anthropic_messages, issue #222) |
adapters/gemini_contents.py |
Google Gemini contents[] ↔ ContextItem round-trip (from_/to_gemini_contents, issue #222) |
extras/otel.py |
OpenTelemetry GenAI integration (OTelEventHook — invoke_agent / execute_tool spans + GenAI SemConv attributes, gated behind the [otel] extra, issue #224). |
extras/llm_summarizer.py |
Optional LlmSummarizer / LlmExtractor — LLM-backed Summarizer / Extractor plugins for the firewall. Take a user-supplied call_fn (no LLM SDK dep, no extra) and degrade to the rule-based path on any failure (issue #26). |
extras/memory/ |
External-memory backend adapters that implement EpisodicStore / FactStore against an existing long-lived memory deployment without widening the Protocols (issue #195). |
extras/memory/mem0.py |
Mem0EpisodicStore + Mem0FactStore — wrap a mem0.Memory instance scoped by user_id; writes go through Memory.add(infer=False) and items are stamped with cw_episode_id / cw_fact_id metadata for canonical-ID resolution. Gated behind the [mem0] extra (issue #195). |
extras/memory/zep.py |
ZepEpisodicStore + ZepFactStore — wrap a zep_cloud.Zep client scoped by user_id; persist items as JSON graph episodes (graph.add(type="json")) stamped with cw_* IDs, resolving back via graph.episode.get_by_user_id. Episodic search is client-side (Zep graph search is edge/node-shaped). Gated behind the [zep] extra (issue #195). |
extras/memory/_zep_common.py |
Internal helpers backing zep.py (keeps it ≤300 lines): shared cw_* constants, the ZepBackendError exception, the JSON/scan helpers (_episode_records / _episode_uuid / _episode_payload), the defensive payload-coercion helpers (_coerce_str_tags / _coerce_metadata), and the _ZepStoreBase scope/scan/write base. Carries the same [zep]-extra import guard (issue #195). |
extras/memory/langmem.py |
LangMemEpisodicStore + LangMemFactStore — wrap any LangGraph BaseStore scoped by a namespace tuple; canonical ID is the store key, value is the dataclass to_dict() payload (direct, lossless KV). search delegates to BaseStore.search. Gated behind the [langmem] extra (issue #195). |
eval/ |
Evaluation harness (issue #12): EvalCase / EvalDataset (gold datasets), evaluate_routing → RoutingEvalReport (top-k recall, MRR, confidence gap, beam steps), evaluate_context → ContextEvalReport (budget utilisation + token savings vs naive concat). Pure-stdlib, deterministic; backs the eval CLI subcommand. |
eval/consolidation.py |
Consolidation quality evaluation harness (issue #683): evaluate_consolidation → ConsolidationEvalReport (precision / coverage against an optional gold set + dedup ratio). Pure-stdlib, offline, deterministic. |
eval/metrics.py |
Canonical rank-based routing metrics — recall_at_k (classic fractional recall@k), precision_at_k, reciprocal_rank (issue #354). Single source of truth imported by both eval/routing.py and benchmarks/benchmark.py so the harness and the benchmark script can no longer define the same names with different semantics. |
__main__.py |
CLI: 11 subcommands (demo, build, route, print-tree, init, ingest, replay, stats, inspect, budget-check, eval) plus the mcp and catalog Typer sub-apps. inspect renders payload-safe context/routing/artifact JSON or Markdown (issue #398); catalog lint surfaces NormalizationReport + reference findings with --json and CI exit codes (issue #538). |
_mcp_cli.py |
Backs the mcp Typer sub-app. Hosts mcp serve, mcp inspect, mcp stats, and mcp generate-configs; accepts native contextweaver, raw MCP tools/list, and {tools:[...]} catalog shapes. mcp serve --diagnostics FILE appends sanitized JSONL and --quiet suppresses lifecycle stderr; both are config-file keys. mcp serve --state-dir DIR (config key state_dir) persists gateway state — events.sqlite3 + artifacts/ — so artifact handles and event history survive a restart (issue #511); omit it for the in-memory default. mcp generate-configs emits deterministic multi-client recipe artifacts from one canonical mcp serve --config input (issue #659). |
data/ |
Packaged data files shipped inside the wheel via [tool.setuptools.package-data]. Exposes gateway_catalog_path() (resolves mcp_gateway_catalog.yaml to a concrete Path for both editable installs and zipped wheels — falls back to a persistent cache under tempfile.gettempdir()/contextweaver/ for zipimport). Issue #264. |
examples/recipes/ |
MCP-client integration recipes: installed-CLI configs for Claude Desktop, Claude Code, GitHub Copilot, and Cursor plus gateway_config.yaml; serve_gateway.py remains a legacy/custom-runtime launcher (issues #278, #279, #346, #371, #429, #437). |
Context Engine — 8 stages:
generate_candidates→ 2.dependency_closure→ 3.sensitivity_filter→apply_firewall→ 5.score_candidates→ 6.deduplicate_candidates→select_and_pack→ 8.render_context
Routing Engine — 4 stages:
Catalog→ 2.TreeBuilder→ 3.Router(beam search) → 4.ChoiceCards
The Router itself composes a four-stage RoutingPipeline internally
(retrieve → rerank → navigate → pack, issue #56). Each stage is
swappable via the EngineRegistry or by passing a custom
RoutingPipeline to Router(pipeline=...). History-aware re-routing
(Router.route(history=...), issue #27) and the optional embedding
retriever (Router(embedding_backend=...), issue #8) plug into this
same pipeline contract.
For full pipeline descriptions and design rationale, see docs/agent-context/architecture.md.
| Type | Purpose |
|---|---|
SelectableItem |
Unified tool/agent/skill/flow/internal item (kind="flow" = external multi-step capability, e.g. a ChainWeaver flow). Deprecated alias: ToolCard (use SelectableItem in code; removal in 1.0 — see docs/upgrading.md). |
ContextItem |
Event log entry with parent_id for dependency closure |
ResultEnvelope |
Processed tool output: summary + facts + artifacts + views |
ContextPack |
Rendered prompt + stats from a context build |
BuildStats |
What was kept, dropped, and why. total_candidates is pre-sensitivity; dropped_items attributes every exclusion; completed builds satisfy included + dropped = total. Carries firewall_events + firewall_summary() (issues #402/#414/#459) and token_estimator — the estimator-path identifier that produced the build's token numbers (issue #493). |
FirewallStats |
Per-firewall diagnostics: triggered, strategy, original/summary chars+tokens, artifact_ref, summarized_by_llm (issues #402 / #404) |
CompactResult |
Output of the single-call compact_tool_result facade: firewalled, payload, summary, facts, artifact_ref, stats (issue #399) |
StructuredFirewall |
Non-summarising firewall strategy — keep an allow-list of JSON paths inline, offload the rest (issue #406) |
ChoiceCard |
LLM-friendly compact card (never includes full schemas) |
RoutingDecision |
Routing output shaped for weaver-spec interop (id, choice_cards, timestamp, selection). choice_cards is a flat list of CW 1:1 cards; for schema-valid spec JSON, go through adapters.weaver_contracts.to_weaver_routing_decision(). Build with RouteResult.to_routing_decision(...). |
ChoiceGraph |
Bounded DAG for routing, serializable, validated on load |
GraphManifest |
Build-time metadata attached to every routing graph (hash, seed, engine versions, timestamp) |
RouteTrace |
Always-populated structured audit of a routing call; per-step expansions opt-in via debug=True |
EngineRegistry |
Pluggable registry for Retriever, Reranker, ClusteringEngine slots |
Mode |
Determinism mode (strict / seeded / adaptive placeholder) on ProfileConfig |
MaskRedactionHook |
Built-in redaction hook for sensitivity enforcement |
HydrationResult |
Result of hydrating a tool call with context |
ConsolidationReport |
Deterministic result of a consolidate() run: episode clusters, promoted facts (with provenance + inherited sensitivity), and report-only decayed episode/fact IDs (issue #498) |
ViewRegistry |
Maps content-type patterns to view generators for progressive disclosure |
ProxyRuntime |
Shared core for MCP proxy (#13) and gateway (#28) modes — owns upstream catalog, per-session ContextManager, browse / execute / view dispatch; persisted text results are returned as envelope artifact refs for tool_view. Hardens the untrusted-input boundary (issues #464/#484/#485/#488): on_invalid (skip/raise) + schema_limits + last_refresh_report at ingest, cached per-tool_id validators, classified+redacted upstream errors, and opt-in tolerant_args. Opt-in dispatch-path controls (issues #529/#482/#512/#483): retry_policy, rate_limiter, result_cache, and tool_execute(dry_run=True) — all inert by default; catalog refresh rebuilds all derived state atomically (#507). |
ExposureMode |
TRANSPARENT (#13) vs GATEWAY (#28) for ProxyRuntime |
UpstreamCall |
Transport-agnostic Protocol over upstream MCP fan-out (used by ProxyRuntime) |
PrimitiveGatewayRuntime |
Resource/prompt counterpart to ProxyRuntime: bounded-choice browse + firewalled read/get over MCP resources and prompts, sharing the tool runtime's ContextManager (issues #669/#670) |
PrimitiveUpstream |
Transport-agnostic Protocol for upstream resources/list / resources/read / prompts/list / prompts/get (sibling of UpstreamCall) |
GatewayError |
Structured error payload (§3.4) returned from every gateway/proxy meta-tool. Carries a retryable hint and a classified upstream-error taxonomy (issue #485) plus the SCHEMA_INVALID ingest code (issue #484) |
ToolIdParts |
Destructured canonical tool_id (namespace / name / version / hash8) |
Vocabulary notes:
SelectableItemis the canonical name.ToolCardis a deprecated alias (documentation-only deprecation, scheduled for removal in 1.0 — see docs/upgrading.md) — useSelectableItemin code and docs.- "Context" is overloaded — can mean
ContextItem,ContextPack, the pipeline, or the LLM context window. Disambiguate when unclear. See docs/concepts.md. - "Firewall" here means context firewall (prevents large outputs from consuming the token budget), not a security firewall.
make fmt # ruff format src/ tests/ examples/ scripts/
make lint # ruff check src/ tests/ examples/ scripts/
make type # mypy src/ examples/ scripts/ (examples + scripts gated too, #539)
make test # python -m pytest --cov=contextweaver --cov-report=term-missing -q
make example # run all example scripts (includes architectures via the umbrella target)
make architectures # run reference architecture scripts under examples/architectures/
make demo # python -m contextweaver demo
make ci # fmt + lint + type + test + drift-check + module-size-check + doc-snippets-check + readme-version-check + security-policy-check + example + demo
make docs # mkdocs build --clean (docs site)
make docs-serve # mkdocs serve (live preview)
make benchmark # run benchmark harness (non-gating; writes benchmarks/results/latest.json)
make benchmark-matrix # benchmark + per-backend × per-size matrix (#208) and per-namespace breakdown (#209)
make gateway-scorecard-check # verify gateway scorecard matches its committed JSON (gating CI; #391)
make record-demos-check # verify committed demo casts match current output (gating CI; #390)
make smoke-eval # non-gating CI smoke-evaluation over fixed fixtures (#331/#392); deterministic, credential-free
make scorecard # render benchmarks/scorecard.md from benchmarks/results/latest.json
make scorecard-check # verify scorecard.md is up to date (exits non-zero on drift)
make schemas # regenerate schemas/ + docs/schemas/v0/ (issue #225)
make schemas-check # verify published schemas match dataclasses (gating, in `make ci`)
make drift # regenerate every committed generated artifact (issue #522)
make drift-check # one gate over all generated-artifact drift checks (in `make ci`, #522)
make api # regenerate api/public_api.txt (public-API manifest, #518)
make api-check # verify the public-API manifest matches the surface (in drift-check, #518)
make module-size-check # enforce the ≤300-line convention; frozen baseline (gating, #456)
make module-size-update # re-snapshot scripts/module_size_baseline.json (deliberate use only)
make doc-snippets-check # execute README + curated docs Python snippets (gating, #526)
make sweep-scoring # weight sweep for ScoringConfig (#214); writes benchmarks/sweep_scoring.md
make context-rot # render the context-rot demo: benchmarks/results/context_rot.json + docs/assets/context_rot.svg (#349)
make context-rot-check # verify context_rot.svg matches its committed JSON (gating in CI; exits non-zero on drift)
make readme-version-check # verify README version references match pyproject.toml (gating in CI; #347)
make security-policy-check # verify SECURITY.md supported series + links match pyproject.toml (gating in CI; #691)
make llms # regenerate llms.txt and llms-full.txt from canonical docs
make llms-check # verify llms.txt and llms-full.txt are up to date (gating in CI; #389)
make weaver-conformance # round-trip + JSON-Schema validate the weaver-spec adapter (CI gating, fetches schemas)Run pre-commit install once after cloning to activate git hooks
(ruff format + check + file hygiene on every commit).
For command-selection rules and sequencing, see docs/agent-context/workflows.md.
These are auto-reject in review. No exceptions.
- No
print()in library code. Use hooks or logging.__main__.pyand_demos.py(CLI) are exempt. - No business logic in
__init__.py. Only re-exports allowed.
These are strongly recommended. Engineering judgment applies — deviate with good reason.
- Text similarity in
_utils.pyonly —tokenize(),jaccard(),TfIdfScorerare the single source of truth. Do not duplicate. from __future__ import annotationsin every source file.- All exceptions from
contextweaver.exceptions— use the custom hierarchy, not bareValueError/RuntimeError. A new exception class needs a unique stablecode, aGOLDEN_CODESentry intests/test_exceptions.py, and a section indocs/errors.md. to_dict()/from_dict()on all dataclasses — complementsserde.py; they are not redundant. See invariants.- Deterministic by default — tie-break by ID, sorted keys. No randomness in core pipelines.
- No wildcard imports — never use
from contextweaver import *. - Event log is append-only — mutate only via
InMemoryEventLog.append().
- Python ≥ 3.10 — use
X | Yunion syntax,matchstatements where appropriate. - Type hints on all public functions and methods.
- Google-style docstrings on all public classes and functions.
- 100-character line length (enforced by ruff).
- ≤ 300 lines per module — exempt:
types.py,envelope.py,__main__.py,_mcp_cli.py(experimental Typer sub-app; size is dominated by Typer parameter declarations + docstrings and is expected to shrink oncemcp servegraduates from[experimental]to stable), and_demos.py(CLI demo-output module — print-heavy walkthrough scripts backing thedemosubcommand, same rationale as__main__.py). Enforced mechanically bymake module-size-check(issue #456): new non-exempt modules must stay ≤300 lines, and pre-existing oversized modules are grandfathered at their current size inscripts/module_size_baseline.jsonand frozen — they may shrink but may not grow past their recorded ceiling. Decomposing a grandfathered module lowers its ceiling viamake module-size-update. - Core runtime dependencies. The core install pulls
tiktoken,PyYAML,rank-bm25, plusmcpandjsonschema(added when the proxy / gateway runtimes landed — both are load-bearing fordocs/gateway_spec.md§4.4 schema validation and the MCP transport binding). Adding another core dependency requires explicit justification: broad ecosystem use, small wheel, and a default the library would otherwise have to approximate. Heavy or runtime-specific packages (CLI, OpenTelemetry, fuzzy retrieval, ANN, NetworkX, FastMCP, LangChain) live under[project.optional-dependencies]and are loaded via guarded imports. - Dependency-constraint policy (issue #356). Specifiers are lower-bound-only (
>=), set to the lowest version actually known to work — no==pins, no speculative upper caps. The only caps kept carry an inline rationale (pre-1.0weaver_contracts<1; docs-extra major pins). Two CI jobs enforce this: a gating floor-deps job (uv pip install --resolution lowest-direct, Python 3.10, inci.yml) proves the>=floors, and a non-gating weeklydeps-latest-weekly.yml(latest + pre-releases) is the no-upper-cap safety net. Raising a floor is a real change — verify with the floor-deps job. Python support is 3.10–3.13, every cell gating in the CI matrix (3.14 is pending — the heavy dev/adapter stack still caps atRequires-Python <3.14).
- Tests in
tests/test_<module>.py— one file per module. pytest.mark.asynciofor async tests (asyncio_mode = "auto"is set globally).- Do not mock internal modules — use real in-memory implementations.
store/ — Protocols are backend-agnostic (must not import backend-specific libraries). Concrete implementations may import backend libs. Must implement the protocol from protocols.py. Data is append-only / immutable-after-write.
adapters/ — Pure stateless converters. External format parsing must not leak into core. May import optional external libraries at the adapter boundary only.
context/ — Async-first. All new code should be async with _sync wrappers.
routing/ — Sync-only. Pure computation (DAG traversal, beam search). Do not make async.
Sensitivity (context/sensitivity.py) — Security-grade code. Extra review scrutiny required. Never weaken defaults. Treat changes like security-sensitive code. Redaction is effective end-to-end: MaskRedactionHook drops the item's artifact_ref and stamps metadata["redacted"]=True so the rendered prompt never advertises a handle that drilldown could dereference back to the original (issue #451). Enforcement is also applied on the prompt header (facts + episode summaries are routed through the floor and the memory_fact phase policy, issue #450) and an opt-in SensitivityClassifier may raise labels before enforcement (issue #542).
- Protocol-based store design — the protocol layer exists for backend extensibility. Do not collapse protocols into concrete classes.
dependency_closurepipeline stage — if a selected item hasparent_id, the parent must be included. Removing it produces incoherent context (tool results without their tool calls).serde.py+ per-classto_dict/from_dict— complementary, not redundant.serde.pyprovides shared primitives; per-class methods handle class-specific serialization. Do not consolidate.ContextManagermixin composition — its public method surface lives in flat partial-class mixins (_IngestMixin/_BuildMixin/_RoutingMixin) sharing the_ManagerStatebase. Do not "simplify" this into delegating composition (manager.ingest.x()): that would change the public method surface, which issue #101 forbids ("ContextManager still exposes all current methods"). Mixins were the deliberate trade-off to keepmanager.py≤300 lines without breaking the public API.
See docs/agent-context/invariants.md for the full invariants list and rationale.
make lint— check for style and import errors.make type— check for type errors.make test— run the test suite.- Check
BuildStatsfields to understand what the context engine dropped and why. - Use
ContextManager.artifact_store.list_refs()to inspect intercepted tool outputs. - Enable
logging.DEBUGoncontextweaver.contextto trace pipeline stages (candidate counts, scores, drops, budget usage). - Enable
logging.DEBUGoncontextweaver.routingto trace beam search expansions and scoring.
- Identify the relevant module, modify it, add tests in
tests/test_<module>.py. - Run
make cito verify (all declared targets must pass). - Update
CHANGELOG.mdand add docstrings to new public APIs. - Update agent-facing docs and examples if the pipeline or public API changed.
- If the feature can move recall@k / drops / dedup / token counts: follow
.github/prompts/add-eval.prompt.mdto extend the gold set or scenarios and runmake benchmark-matrix && make scorecard. CI will post a sticky benchmark-delta comment on the PR.
For the full workflow and definition of done, see docs/agent-context/workflows.md.
See docs/agent-context/lessons-learned.md for durable recurring mistakes and how to avoid them.
| File | Role |
|---|---|
AGENTS.md (this file) |
Primary shared source of truth for all agents |
docs/agent-context/architecture.md |
Non-obvious architectural guidance and tradeoffs |
docs/agent-context/workflows.md |
Authoritative commands, sequencing, definition of done |
docs/agent-context/invariants.md |
Hard constraints and forbidden shortcuts |
docs/agent-context/lessons-learned.md |
Failure-capture workflow and durable lessons |
docs/agent-context/review-checklist.md |
Self-check and review gates |
docs/architecture.md |
Canonical architecture reference (full pipeline detail, diagrams) |
docs/concepts.md |
Core concept glossary (types, subsystems, phases) |
CONTRIBUTING.md |
Human contributor guide |
When architecture details conflict, docs/architecture.md is the canonical reference.
- Update
AGENTS.mdwhen shared rules, conventions, or the module map change. - Update
docs/agent-context/files when their specific topic area changes. - Any PR that changes the pipeline, public API, or project conventions must include doc updates.
- If two docs disagree,
AGENTS.mdis authoritative for agent guidance;docs/architecture.mdis authoritative for architecture detail. - See docs/agent-context/workflows.md for documentation governance rules.