From fb3d24f37cf35b14f87b6955c01ec511ea4f6625 Mon Sep 17 00:00:00 2001 From: Jim Park Date: Wed, 8 Apr 2026 23:54:51 -0700 Subject: [PATCH] feat(openspec): backport existing functionality into baseline specs Add 15 OpenSpec specification files under openspec/specs/ that document the existing behavior of all memex-core modules as baseline requirements. These specs serve as the authoritative reference for current functionality and will be the foundation for future change proposals. Specs organized by module: - embeddings: EmbeddingProvider, Local/OpenAI providers, cosineSimilarity - skill-index: SkillIndex, parseFrontmatter, parseMemoryFile, search - cache: Cache v2 schema, atomic writes, mtime-gating - config: DEFAULT_CORE_CONFIG, resolveCoreConfig - session: SessionTracker, InMemorySessionTracker - telemetry: recordMatch, recordObservation, formatTelemetryReport - traces: TraceAccumulator, writeTrace - sync: initSyncRepo, syncPull, syncCommitAndPush, conflict resolution - sync-migration: version markers, migrateProjectIdsToLowercase, runSyncMigrations - project-mapping: resolveProjectId, normalizeGitUrl, findMatchingProjectMemoryDirs - project-registry: loadRegistry, saveRegistry, registerProject - file-lock: acquireLock, withFileLock - git-helpers: git wrapper, isGitRepo, hasRemote, hasCommits, getDefaultBranch - path-encoder: encodeProjectPath - types: all TypeScript interfaces and type definitions --- openspec/specs/cache/spec.md | 52 ++++++++ openspec/specs/config/spec.md | 34 +++++ openspec/specs/embeddings/spec.md | 57 ++++++++ openspec/specs/file-lock/spec.md | 44 +++++++ openspec/specs/git-helpers/spec.md | 79 +++++++++++ openspec/specs/path-encoder/spec.md | 25 ++++ openspec/specs/project-mapping/spec.md | 115 ++++++++++++++++ openspec/specs/project-registry/spec.md | 37 ++++++ openspec/specs/session/spec.md | 42 ++++++ openspec/specs/skill-index/spec.md | 150 +++++++++++++++++++++ openspec/specs/sync-migration/spec.md | 123 +++++++++++++++++ 
openspec/specs/sync/spec.md | 143 ++++++++++++++++++++ openspec/specs/telemetry/spec.md | 94 +++++++++++++ openspec/specs/traces/spec.md | 79 +++++++++++ openspec/specs/types/spec.md | 168 ++++++++++++++++++++++++ 15 files changed, 1242 insertions(+) create mode 100644 openspec/specs/cache/spec.md create mode 100644 openspec/specs/config/spec.md create mode 100644 openspec/specs/embeddings/spec.md create mode 100644 openspec/specs/file-lock/spec.md create mode 100644 openspec/specs/git-helpers/spec.md create mode 100644 openspec/specs/path-encoder/spec.md create mode 100644 openspec/specs/project-mapping/spec.md create mode 100644 openspec/specs/project-registry/spec.md create mode 100644 openspec/specs/session/spec.md create mode 100644 openspec/specs/skill-index/spec.md create mode 100644 openspec/specs/sync-migration/spec.md create mode 100644 openspec/specs/sync/spec.md create mode 100644 openspec/specs/telemetry/spec.md create mode 100644 openspec/specs/traces/spec.md create mode 100644 openspec/specs/types/spec.md diff --git a/openspec/specs/cache/spec.md b/openspec/specs/cache/spec.md new file mode 100644 index 0000000..96f718b --- /dev/null +++ b/openspec/specs/cache/spec.md @@ -0,0 +1,52 @@ +## Requirements + +### Requirement: Cache files use schema version 2 and are keyed by embedding model + +The cache schema SHALL be `{ version: 2, embeddingModel, skills }`. `loadCache(cachePath, embeddingModel)` SHALL return an empty valid cache object with `version: 2`, the requested `embeddingModel`, and an empty `skills` map when the cache file is missing, unreadable, malformed, has a different schema version, or was created for a different embedding model. 
+ +#### Scenario: Missing or corrupt cache yields an empty cache + +- **WHEN** `loadCache(cachePath, embeddingModel)` cannot read or parse the cache file +- **THEN** it returns `{ version: 2, embeddingModel, skills: {} }` + +#### Scenario: Model mismatch invalidates the cache + +- **WHEN** the on-disk cache was written for a different `embeddingModel` +- **THEN** `loadCache` returns an empty cache for the requested model instead of reusing stored skills + +### Requirement: saveCache writes atomically through a temporary file and rename + +`saveCache(cachePath, data)` SHALL create the parent directory recursively, write the serialized cache JSON to a temporary file whose name is `<cachePath>.<rand>.tmp`, where `<rand>` comes from `randomBytes(4).toString("hex")`, and then atomically replace the target path via `rename(tmpPath, cachePath)`. + +#### Scenario: Cache writes use a temp-file swap + +- **WHEN** `saveCache(cachePath, data)` persists cache data +- **THEN** it writes to a randomly suffixed `.tmp` file first and renames that file to `cachePath` + +### Requirement: Cached skills preserve mtime-based reuse metadata + +`CachedSkill` entries SHALL store an `mtime` alongside the embedded data so callers can reuse embeddings only when the current file mtime still matches the cached value. `getCachedSkill`, `setCachedSkill`, and `removeCachedSkill` SHALL read, write, and delete cache entries by location key within `cache.skills`.
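As an illustration, the temp-file-and-rename save described above can be sketched as follows. This is a non-authoritative sketch: the `SkillCache` shape is reduced to the three top-level fields the spec names, and the helper is not the shipped implementation.

```typescript
import { mkdir, rename, writeFile } from "node:fs/promises";
import { randomBytes } from "node:crypto";
import { dirname } from "node:path";

// Reduced cache shape: only the fields this spec names.
interface SkillCache {
  version: 2;
  embeddingModel: string;
  skills: Record<string, unknown>;
}

async function saveCache(cachePath: string, data: SkillCache): Promise<void> {
  // Ensure the parent directory exists before writing.
  await mkdir(dirname(cachePath), { recursive: true });
  const suffix = randomBytes(4).toString("hex");
  const tmpPath = `${cachePath}.${suffix}.tmp`;
  await writeFile(tmpPath, JSON.stringify(data));
  // rename is atomic on the same filesystem, so readers never observe
  // a half-written cache file.
  await rename(tmpPath, cachePath);
}
```

The random suffix keeps concurrent writers from clobbering each other's temp files; the final `rename` is what makes the swap atomic.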
+ +#### Scenario: Cached entry can gate reuse by file mtime + +- **WHEN** a caller retrieves a cached skill whose stored `mtime` matches the current file `mtime` +- **THEN** the caller has the metadata needed to reuse the cached embeddings without re-embedding the file + +#### Scenario: Cached entries are keyed by location + +- **WHEN** `setCachedSkill(cache, location, skill)` and `getCachedSkill(cache, location)` are used with the same location +- **THEN** the stored `CachedSkill` is returned from `cache.skills[location]` + +### Requirement: Cache conversion strips and restores location around persistence + +`toCachedSkill(skill, mtime)` SHALL persist the `IndexedSkill` fields except `location`, because the cache key stores that path separately. `fromCachedSkill(location, cached)` SHALL reconstruct an `IndexedSkill` by restoring the supplied `location` while preserving the cached name, description, queries, embeddings, type, `oneLiner`, and `boost` fields. + +#### Scenario: Location is omitted from stored CachedSkill values + +- **WHEN** `toCachedSkill(skill, mtime)` converts an `IndexedSkill` +- **THEN** the returned `CachedSkill` includes the supplied `mtime` and skill metadata, but not `location` + +#### Scenario: Restoring a cached skill reinstates the location key + +- **WHEN** `fromCachedSkill(location, cached)` converts a stored cache entry back to an `IndexedSkill` +- **THEN** the returned skill's `location` is the supplied key and its remaining fields come from `cached` diff --git a/openspec/specs/config/spec.md b/openspec/specs/config/spec.md new file mode 100644 index 0000000..6f82fb8 --- /dev/null +++ b/openspec/specs/config/spec.md @@ -0,0 +1,34 @@ +## Requirements + +### Requirement: The core config exposes fixed baseline defaults + +`DEFAULT_CORE_CONFIG` SHALL provide these defaults: `enabled: true`, `embeddingModel: "Xenova/all-MiniLM-L6-v2"`, `embeddingBackend: "local"`, `cacheTimeMs: 300000`, `topK: 3`, `threshold: 0.35`, `scoringMode: 
"relative"`, `maxDropoff: 0.1`, `maxInjectedChars: 8000`, `types: ["skill", "memory", "workflow", "session-learning", "rule"]`, `skillDirs: []`, and `memoryDirs: []`. + +#### Scenario: Default config is used as the baseline + +- **WHEN** the package exports `DEFAULT_CORE_CONFIG` +- **THEN** it contains the documented default values for all core config fields + +### Requirement: resolveCoreConfig merges partial config with runtime type checks + +`resolveCoreConfig(partial?)` SHALL return a shallow copy of `DEFAULT_CORE_CONFIG` when `partial` is omitted. When `partial` is provided, it SHALL merge field-by-field using runtime checks: `enabled` only accepts booleans, `embeddingModel` only accepts strings, `embeddingBackend` only accepts the literal `"openai"` and otherwise falls back to the default `"local"`, numeric fields (`cacheTimeMs`, `topK`, `threshold`, `maxDropoff`, `maxInjectedChars`) only accept numbers, `scoringMode` only accepts the literal `"absolute"` and otherwise falls back to the default `"relative"`, `types` accepts any array value as-is, and `skillDirs` / `memoryDirs` accept arrays whose elements are coerced with `String(...)`. 
+ +#### Scenario: Omitted partial returns a cloned default config + +- **WHEN** `resolveCoreConfig()` is called without arguments +- **THEN** it returns a new object with the same field values as `DEFAULT_CORE_CONFIG` + +#### Scenario: Valid overrides replace defaults + +- **WHEN** `resolveCoreConfig(partial)` is called with correctly typed override values +- **THEN** those fields replace the defaults in the returned config + +#### Scenario: Invalid override types fall back to defaults + +- **WHEN** `resolveCoreConfig(partial)` receives values of the wrong runtime type for a field +- **THEN** that field in the returned config falls back to `DEFAULT_CORE_CONFIG` + +#### Scenario: Directory arrays are string-coerced element by element + +- **WHEN** `resolveCoreConfig(partial)` receives `skillDirs` or `memoryDirs` as arrays +- **THEN** the returned arrays contain `String(...)` of each supplied element diff --git a/openspec/specs/embeddings/spec.md b/openspec/specs/embeddings/spec.md new file mode 100644 index 0000000..acd85d3 --- /dev/null +++ b/openspec/specs/embeddings/spec.md @@ -0,0 +1,57 @@ +## Requirements + +### Requirement: Embedding providers implement a shared batch embedding contract + +`EmbeddingProvider` SHALL expose a single `embed(texts: string[]): Promise<number[][]>` method. Both built-in providers SHALL accept an array of input strings and return an array of embedding vectors in matching order. When `texts` is empty, both providers SHALL return an empty array without performing backend work.
+ +#### Scenario: Empty input short-circuits + +- **WHEN** `embed([])` is called on either `LocalEmbeddingProvider` or `OpenAIEmbeddingProvider` +- **THEN** the provider returns `[]` + +### Requirement: LocalEmbeddingProvider lazily initializes a local ONNX feature-extraction pipeline + +`LocalEmbeddingProvider` SHALL default its model name to `"Xenova/all-MiniLM-L6-v2"`, optionally accept a `cacheDir`, and lazily initialize its extractor on the first non-empty `embed()` call by memoizing a single `extractorPromise`. Initialization SHALL resolve `@huggingface/transformers` through this fallback chain: direct `import("@huggingface/transformers")`, then `createRequire(...).resolve("@huggingface/transformers")` followed by dynamic import of the resolved path, then dynamic import of an absolute `../node_modules/@huggingface/transformers/src/transformers.js` path relative to the module directory. If all three resolution paths fail, initialization SHALL throw an install guidance error. When a `cacheDir` is provided, it SHALL be assigned to `transformers.env.cacheDir`. The created pipeline SHALL use task `"feature-extraction"`, the configured model name, and `dtype: "q8"`. + +#### Scenario: First embed call initializes the extractor once + +- **WHEN** `embed(texts)` is called for the first time on a `LocalEmbeddingProvider` +- **THEN** the provider initializes and memoizes a single feature-extraction pipeline before generating embeddings + +#### Scenario: Optional cache directory is propagated + +- **WHEN** a `LocalEmbeddingProvider` is constructed with a `cacheDir` +- **THEN** extractor initialization sets `transformers.env.cacheDir` to that directory before creating the pipeline + +### Requirement: OpenAIEmbeddingProvider batches requests to the embeddings API + +`OpenAIEmbeddingProvider` SHALL be constructed with a model name and API key. 
For non-empty inputs, it SHALL call `https://api.openai.com/v1/embeddings` with bearer-token authorization, sending inputs in batches of 2048 strings and placing each returned embedding into the original result order using the response item's `index`. If any HTTP response is non-OK, the provider SHALL read the response body text and throw `Error("OpenAI embeddings API error <status>: <body>")`. + +#### Scenario: Large input is split into 2048-item batches + +- **WHEN** `embed(texts)` is called with more than 2048 input strings +- **THEN** the provider submits multiple sequential requests, each containing at most 2048 strings + +#### Scenario: Non-200 API response raises a descriptive error + +- **WHEN** the OpenAI embeddings endpoint responds with a non-OK status +- **THEN** the provider throws an error containing the HTTP status code and response body text + +### Requirement: cosineSimilarity optimizes for pre-normalized vectors and falls back safely + +`cosineSimilarity(a, b)` SHALL return `0` when the vectors have different lengths. It SHALL compute the dot product for all equal-length vectors, then use a fast path that returns the dot product directly when both squared norms are within `1e-6` of `1.0`. Otherwise it SHALL compute the full cosine similarity formula `dot / (|a| * |b|)`. If the denominator is zero, it SHALL return `0`.
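The similarity routine above is small enough to sketch in full. This is an illustrative reading of the requirement, not the shipped code:

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  let dot = 0;
  let normSqA = 0;
  let normSqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normSqA += a[i] * a[i];
    normSqB += b[i] * b[i];
  }
  // Fast path: embedding models usually emit pre-normalized vectors,
  // so the dot product already equals the cosine similarity.
  if (Math.abs(normSqA - 1.0) < 1e-6 && Math.abs(normSqB - 1.0) < 1e-6) {
    return dot;
  }
  const denom = Math.sqrt(normSqA) * Math.sqrt(normSqB);
  return denom === 0 ? 0 : dot / denom;
}
```

The fast path saves two square roots per comparison, which adds up when scoring every indexed skill against every query embedding.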
+ +#### Scenario: Mismatched vector lengths return zero + +- **WHEN** `cosineSimilarity(a, b)` is called with vectors of different lengths +- **THEN** the result is `0` + +#### Scenario: Normalized vectors use the fast path + +- **WHEN** `cosineSimilarity(a, b)` is called with vectors whose squared norms are both within `1e-6` of `1.0` +- **THEN** the result is the raw dot product + +#### Scenario: Non-normalized vectors use the full cosine formula + +- **WHEN** `cosineSimilarity(a, b)` is called with equal-length vectors that are not both pre-normalized +- **THEN** the result is `dot / (sqrt(normSqA) * sqrt(normSqB))`, or `0` if that denominator is zero diff --git a/openspec/specs/file-lock/spec.md b/openspec/specs/file-lock/spec.md new file mode 100644 index 0000000..f2abfab --- /dev/null +++ b/openspec/specs/file-lock/spec.md @@ -0,0 +1,44 @@ +## Requirements + +### Requirement: Advisory file locking via mkdir + +`acquireLock(filePath)` SHALL create an advisory lock directory at `${filePath}.lock` using `mkdir`, which is atomic on all platforms. It returns an unlock function that removes the lock directory. If the lock already exists, it retries with exponential backoff until a timeout is reached. 
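A hedged sketch of the mkdir-based lock follows. The requirement mentions backoff while the contention scenario specifies a fixed 50 ms poll; this sketch uses the fixed interval, and the constants mirror the spec (5-second timeout, 30-second staleness window).

```typescript
import { mkdir, rm, stat } from "node:fs/promises";

// Sketch only: the real module's internals may differ.
async function acquireLock(filePath: string): Promise<() => Promise<void>> {
  const lockDir = `${filePath}.lock`;
  const deadline = Date.now() + 5000;
  while (Date.now() < deadline) {
    try {
      await mkdir(lockDir); // atomic: fails if the directory already exists
      return () => rm(lockDir, { recursive: true, force: true });
    } catch {
      try {
        const s = await stat(lockDir);
        if (Date.now() - s.mtimeMs > 30_000) {
          // Stale lock: the holder likely crashed. Force-remove and retry.
          await rm(lockDir, { recursive: true, force: true });
          continue;
        }
      } catch {
        // Lock released between mkdir and stat: retry immediately.
        continue;
      }
      await new Promise((r) => setTimeout(r, 50));
    }
  }
  // Best-effort fallback: proceed without the lock.
  return async () => {};
}
```

Because `mkdir` either creates the directory or fails atomically, no separate check-then-create race exists, which is why a directory rather than a file is used as the lock token.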
+ +#### Scenario: Lock acquired on first attempt + +- **WHEN** no lock directory exists for a given file path +- **THEN** `mkdir` succeeds, the lock directory is created, and the returned unlock function removes it + +#### Scenario: Lock contention with retry + +- **WHEN** another process holds the lock directory +- **THEN** the acquiring process retries every 50ms until the 5-second timeout + +#### Scenario: Stale lock detection and recovery + +- **WHEN** a lock directory exists whose `mtimeMs` is more than 30 seconds old +- **THEN** the lock is considered stale, force-removed, and acquisition retries immediately + +#### Scenario: Lock released between mkdir and stat + +- **WHEN** the lock directory is released by another process between the failing `mkdir` and the `stat` check +- **THEN** `stat` throws, the catch block detects the lock was released, and acquisition retries immediately + +#### Scenario: Timeout with best-effort fallback + +- **WHEN** the 5-second deadline is reached without acquiring the lock +- **THEN** a no-op unlock function is returned and execution proceeds without the lock (best-effort) + +### Requirement: withFileLock executes callback under lock + +`withFileLock(filePath, fn)` SHALL acquire the lock, execute the callback `fn`, and release the lock in a `finally` block, even if the callback throws. + +#### Scenario: Successful locked operation + +- **WHEN** `withFileLock("/data/cache.json", async () => { ...
})` is called +- **THEN** the lock directory `/data/cache.json.lock` exists during callback execution and is removed afterward + +#### Scenario: Callback error releases lock + +- **WHEN** the callback throws an error +- **THEN** the lock is released (unlock function called in `finally`) and the error propagates to the caller \ No newline at end of file diff --git a/openspec/specs/git-helpers/spec.md b/openspec/specs/git-helpers/spec.md new file mode 100644 index 0000000..eaedc80 --- /dev/null +++ b/openspec/specs/git-helpers/spec.md @@ -0,0 +1,79 @@ +## Requirements + +### Requirement: Git subprocess wrapper with timeout + +`git(args, cwd)` SHALL execute `git` with the given arguments in the specified working directory with a 30-second timeout. It returns `{ stdout, stderr }` on success and throws on non-zero exit codes. + +#### Scenario: Successful git command + +- **WHEN** `git(["rev-parse", "--git-dir"], "/path/to/repo")` is called in a valid git repo +- **THEN** the function returns `{ stdout: string, stderr: string }` with the command output + +#### Scenario: Non-zero exit code + +- **WHEN** `git(["checkout", "nonexistent-branch"], "/path/to/repo")` is called and git exits with code 128 +- **THEN** the function throws with the error details + +### Requirement: isGitRepo detects git repositories + +`isGitRepo(dir)` SHALL return `true` if `git rev-parse --git-dir` succeeds in the given directory, `false` otherwise. + +#### Scenario: Valid git repository + +- **WHEN** `isGitRepo` is called on a directory containing a `.git` folder +- **THEN** the function returns `true` + +#### Scenario: Non-git directory + +- **WHEN** `isGitRepo` is called on a directory without a `.git` folder +- **THEN** the function returns `false` + +### Requirement: hasRemote checks for configured remotes + +`hasRemote(dir)` SHALL return `true` if `git remote` produces non-empty output, `false` otherwise. 
+ +#### Scenario: Remote configured + +- **WHEN** `hasRemote` is called on a repo with `origin` configured +- **THEN** the function returns `true` + +#### Scenario: No remotes + +- **WHEN** `hasRemote` is called on a freshly `git init`-ed repo with no remotes +- **THEN** the function returns `false` + +### Requirement: hasCommits checks for any commits + +`hasCommits(dir)` SHALL return `true` if `git rev-parse HEAD` succeeds (at least one commit exists), `false` otherwise. + +#### Scenario: Repo with commits + +- **WHEN** `hasCommits` is called on a repo that has at least one commit +- **THEN** the function returns `true` + +#### Scenario: Empty repo with no commits + +- **WHEN** `hasCommits` is called on a freshly `git init`-ed repo with zero commits +- **THEN** the function returns `false` + +### Requirement: getDefaultBranch resolves the default branch name + +`getDefaultBranch(dir)` SHALL determine the default branch name through a three-step cascade: +1. Try `git symbolic-ref refs/remotes/origin/HEAD` and extract the branch name. +2. If that fails, try `git ls-remote --symref origin HEAD` and parse the branch from the ref. +3. If both fail, return `"main"` as the fallback. 
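The three-step cascade above can be sketched with the git runner injected, which keeps the parsing logic testable without a real repository. The `GitRunner` indirection is an assumption for illustration; the real module shells out through its `git` wrapper.

```typescript
// Resolves to stdout on success, rejects on a non-zero exit code.
type GitRunner = (args: string[]) => Promise<string>;

async function getDefaultBranch(run: GitRunner): Promise<string> {
  try {
    // Step 1: e.g. "refs/remotes/origin/main"
    const out = await run(["symbolic-ref", "refs/remotes/origin/HEAD"]);
    const m = out.trim().match(/refs\/remotes\/origin\/(.+)$/);
    if (m) return m[1];
  } catch {
    // fall through to step 2
  }
  try {
    // Step 2: e.g. "ref: refs/heads/develop\tHEAD"
    const out = await run(["ls-remote", "--symref", "origin", "HEAD"]);
    const m = out.match(/ref:\s+refs\/heads\/(\S+)\s+HEAD/);
    if (m) return m[1];
  } catch {
    // fall through to step 3
  }
  // Step 3: hard-coded fallback.
  return "main";
}
```

Step 1 is a purely local lookup; step 2 contacts the remote, so it also covers clones where `origin/HEAD` was never recorded locally.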
+ +#### Scenario: Symbolic ref resolves + +- **WHEN** `git symbolic-ref refs/remotes/origin/HEAD` outputs `refs/remotes/origin/main` +- **THEN** `getDefaultBranch` returns `"main"` + +#### Scenario: ls-remote fallback + +- **WHEN** `symbolic-ref` fails but `ls-remote --symref origin HEAD` outputs `ref: refs/heads/develop` +- **THEN** `getDefaultBranch` returns `"develop"` + +#### Scenario: Default fallback + +- **WHEN** both `symbolic-ref` and `ls-remote` fail +- **THEN** `getDefaultBranch` returns `"main"` \ No newline at end of file diff --git a/openspec/specs/path-encoder/spec.md b/openspec/specs/path-encoder/spec.md new file mode 100644 index 0000000..a1614e2 --- /dev/null +++ b/openspec/specs/path-encoder/spec.md @@ -0,0 +1,25 @@ +## Requirements + +### Requirement: encodeProjectPath transforms absolute paths to safe directory names + +`encodeProjectPath(cwd)` SHALL transform an absolute filesystem path into a directory-name-safe string by replacing `/`, `.`, and `_` characters with `-` (hyphen). Consecutive hyphens are preserved because they encode original dots and separators. 
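The substitution rule above amounts to a single regex replace; a minimal sketch:

```typescript
// Replace every "/", ".", and "_" with "-"; adjacent replacements
// naturally produce the consecutive hyphens the spec calls out.
function encodeProjectPath(cwd: string): string {
  return cwd.replace(/[/._]/g, "-");
}
```

Note the encoding is lossy (`/a.b` and `/a/b` collide), which is acceptable here because the result only needs to be a stable, filesystem-safe directory name.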
+ +#### Scenario: Typical Unix path + +- **WHEN** `encodeProjectPath("/home/user/.myproject")` is called +- **THEN** the result is `"-home-user--myproject"` + +#### Scenario: Path with underscores + +- **WHEN** `encodeProjectPath("/Users/jim/work/my_project")` is called +- **THEN** the result is `"-Users-jim-work-my-project"` + +#### Scenario: Root path + +- **WHEN** `encodeProjectPath("/")` is called +- **THEN** the result is `"-"` + +#### Scenario: Path used in _local fallback + +- **WHEN** `resolveProjectId` falls through to the encoded path fallback for `/home/me/work` +- **THEN** the resulting project ID contains `"_local/-home-me-work"` as the encoded segment \ No newline at end of file diff --git a/openspec/specs/project-mapping/spec.md b/openspec/specs/project-mapping/spec.md new file mode 100644 index 0000000..62dfa40 --- /dev/null +++ b/openspec/specs/project-mapping/spec.md @@ -0,0 +1,115 @@ +## Requirements + +### Requirement: Git remote URLs normalize into canonical project IDs + +`normalizeGitUrl(url, caseSensitive = false)` SHALL trim whitespace, strip a trailing `.git` suffix, normalize SSH remotes of the form `git@host:owner/repo` into `host/owner/repo`, normalize parseable URL remotes into `host/path`, and lowercase the result by default. When `caseSensitive` is true, it SHALL preserve the input case instead. If URL parsing fails for a non-SSH string, it SHALL return the raw trimmed string lowercased by default or unchanged in case-sensitive mode. 
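An illustrative sketch of the normalization rules above. One detail worth noting: `new URL(...)` lowercases the host on its own, so the default-lowercasing step matters mainly for the owner/repo path segment.

```typescript
function normalizeGitUrl(url: string, caseSensitive = false): string {
  // Trim whitespace and strip a trailing ".git" suffix.
  let s = url.trim().replace(/\.git$/, "");
  const ssh = s.match(/^git@([^:]+):(.+)$/);
  if (ssh) {
    // SSH remote: git@host:owner/repo -> host/owner/repo
    s = `${ssh[1]}/${ssh[2]}`;
  } else {
    try {
      // Parseable URL remote: -> host/path
      const u = new URL(s);
      s = `${u.host}${u.pathname}`.replace(/\/+$/, "");
    } catch {
      // Not SSH and not a URL: keep the raw trimmed string.
    }
  }
  return caseSensitive ? s : s.toLowerCase();
}
```

Lowercasing by default means `git@GitHub.com:Org/Repo.git` and `https://github.com/org/repo` map to the same project ID, which is the point of the canonical form.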
+ +#### Scenario: SSH remote is normalized + +- **WHEN** `normalizeGitUrl("git@GitHub.com:Jim80Net/Repo.git")` is called +- **THEN** it returns `"github.com/jim80net/repo"` + +#### Scenario: HTTPS remote is normalized + +- **WHEN** `normalizeGitUrl("https://GitHub.com/Jim80Net/Repo.git")` is called +- **THEN** it returns `"github.com/jim80net/repo"` + +#### Scenario: Case-sensitive normalization preserves case + +- **WHEN** `normalizeGitUrl("git@GitHub.com:Jim80Net/Repo.git", true)` is called +- **THEN** it returns `"GitHub.com/Jim80Net/Repo"` + +#### Scenario: Unparseable remote falls through to the raw string + +- **WHEN** `normalizeGitUrl` receives a non-SSH remote string that cannot be parsed as a URL +- **THEN** it returns that trimmed string lowercased by default, or with original case when `caseSensitive` is true + +### Requirement: Local path encoding produces filesystem-safe fallback project IDs + +`encodeProjectPath(cwd)` SHALL encode local directory paths by replacing every `/`, `.`, and `_` character with `-` while preserving consecutive hyphens that result from those substitutions. + +#### Scenario: Local path is encoded for fallback storage + +- **WHEN** `encodeProjectPath("/home/user/.my_project")` is called +- **THEN** it returns `"-home-user--my-project"` + +### Requirement: Git remote lookup returns origin or null + +`getGitRemoteUrl(cwd)` SHALL run `git remote get-url origin` in the target directory, trim the result, return the URL when a non-empty string is produced, and return `null` on any subprocess failure or empty output. 
+ +#### Scenario: Directory is not a git repo or has no origin + +- **WHEN** `git remote get-url origin` fails or returns an empty string for `cwd` +- **THEN** `getGitRemoteUrl` returns `null` + +### Requirement: Project ID resolution uses manual mappings, git remotes, and local fallbacks in order + +`resolveProjectId(cwd, syncConfig)` SHALL resolve a canonical project ID through a three-step cascade: first `syncConfig.projectMappings[cwd]`, then `getGitRemoteUrl(cwd)` normalized through `normalizeGitUrl`, and finally the encoded local fallback `_local/<encodedPath>`. The `caseSensitive` flag SHALL be applied symmetrically across all three paths: manual mappings are lowercased by default, git remotes are normalized with the same policy, and encoded `_local/<encodedPath>` fallbacks are lowercased by default or preserved when `caseSensitive === true`. + +#### Scenario: Manual mapping wins over git metadata + +- **WHEN** `syncConfig.projectMappings` contains an entry for `cwd` +- **THEN** `resolveProjectId` returns that mapped value normalized according to `caseSensitive` without consulting git + +#### Scenario: Manual mapping lowercases by default + +- **WHEN** `resolveProjectId("/home/me/work", { ...syncConfig, projectMappings: { "/home/me/work": "MyOrg/MyProject" } })` is called with `caseSensitive` unset or false +- **THEN** it returns `"myorg/myproject"` + +#### Scenario: Manual mapping preserves case in case-sensitive mode + +- **WHEN** `resolveProjectId("/home/me/work", { ...syncConfig, caseSensitive: true, projectMappings: { "/home/me/work": "MyOrg/MyProject" } })` is called +- **THEN** it returns `"MyOrg/MyProject"` + +#### Scenario: Git remote is used when no manual mapping exists + +- **WHEN** no manual mapping exists for `cwd` and `getGitRemoteUrl(cwd)` returns a remote URL +- **THEN** `resolveProjectId` returns the normalized git remote ID + +#### Scenario: Local fallback is used when no mapping or git remote exists + +- **WHEN** `cwd` has no manual mapping and
`getGitRemoteUrl(cwd)` returns `null` +- **THEN** `resolveProjectId` returns `_local/<encodedPath>` normalized according to `caseSensitive` + +#### Scenario: Local fallback lowercases by default + +- **WHEN** `resolveProjectId("/does-not-exist-memex-test/SomeDir", { ...syncConfig, projectMappings: {} })` is called and no git remote can be resolved +- **THEN** it returns `"_local/-does-not-exist-memex-test-somedir"` + +#### Scenario: Local fallback preserves case in case-sensitive mode + +- **WHEN** `resolveProjectId("/does-not-exist-memex-test/SomeDir", { ...syncConfig, caseSensitive: true, projectMappings: {} })` is called and no git remote can be resolved +- **THEN** it returns `"_local/-does-not-exist-memex-test-SomeDir"` + +### Requirement: Matching memory directory discovery returns canonical, local, and rollout-window legacy paths + +`findMatchingProjectMemoryDirs(cwd, syncRepoPath, syncConfig)` SHALL return all matching project memory directories for the current working directory. It SHALL include the canonical memory directory at `projects/<projectId>/memory` when it exists, include the `projects/_local/<encodedPath>/memory` fallback when it exists, and when `caseSensitive` is not true it SHALL also probe for legacy mixed-case project directories whose lowercased relative path equals the canonical ID. Legacy probing SHALL walk only along directory prefixes that can lowercase-match the target ID, and multiple matches SHALL be returned during the rollout window.
+ +#### Scenario: Canonical memory directory exists + +- **WHEN** `projects/<projectId>/memory` exists in the sync repo +- **THEN** `findMatchingProjectMemoryDirs` includes that directory in its result set + +#### Scenario: _local fallback exists during non-git usage + +- **WHEN** `projects/_local/<encodedPath>/memory` exists in the sync repo +- **THEN** `findMatchingProjectMemoryDirs` includes that fallback directory in its result set + +#### Scenario: Legacy mixed-case project path is still present + +- **WHEN** `caseSensitive` is not true and a legacy mixed-case project directory lowercases to the canonical project ID +- **THEN** `findMatchingProjectMemoryDirs` includes the legacy `memory/` directory alongside any canonical or `_local` matches + +#### Scenario: Case-sensitive mode skips legacy mixed-case probing + +- **WHEN** `syncConfig.caseSensitive === true` +- **THEN** `findMatchingProjectMemoryDirs` checks only the canonical and `_local` locations and does not perform the case-insensitive legacy walk + +### Requirement: The sync project memory destination always uses the canonical project ID + +`getSyncProjectMemoryDir(cwd, syncRepoPath, syncConfig)` SHALL resolve the canonical project ID with `resolveProjectId` and return `projects/<projectId>/memory` rooted at `syncRepoPath`. + +#### Scenario: Canonical sync memory path is requested + +- **WHEN** `getSyncProjectMemoryDir(cwd, syncRepoPath, syncConfig)` is called +- **THEN** it returns the canonical project memory directory under `projects/<projectId>/memory` diff --git a/openspec/specs/project-registry/spec.md b/openspec/specs/project-registry/spec.md new file mode 100644 index 0000000..ae30e71 --- /dev/null +++ b/openspec/specs/project-registry/spec.md @@ -0,0 +1,37 @@ +## Requirements + +### Requirement: loadRegistry returns version 1 registry data and falls back to empty data on invalid input + +`loadRegistry(registryPath)` SHALL read JSON registry data from disk and expect the version 1 schema `{ version: 1, projects: {} }`.
If the file is missing, unreadable, malformed, or has a version other than `1`, the function SHALL return an empty version 1 registry object. + +#### Scenario: Corrupt or mismatched registry data is ignored + +- **WHEN** the registry file is invalid JSON or declares a version other than `1` +- **THEN** `loadRegistry` returns `{ version: 1, projects: {} }` + +### Requirement: saveRegistry writes atomically, creates parent directories, and pretty-prints JSON + +`saveRegistry(registryPath, data)` SHALL create the destination parent directory if needed, write the registry as pretty-printed JSON to a uniquely suffixed temporary file, and rename that temp file into place to complete the save atomically. + +#### Scenario: Registry save creates missing parent directories + +- **WHEN** the parent directory for `registryPath` does not exist +- **THEN** `saveRegistry` creates it before writing and renaming the temporary file + +### Requirement: registerProject mutates the registry in place with an ISO lastSeen timestamp + +`registerProject(registry, cwd)` SHALL mutate the provided registry object in place and store the project under `registry.projects[cwd]` with a `lastSeen` timestamp generated in ISO 8601 string form. + +#### Scenario: Registering a project updates lastSeen for that cwd + +- **WHEN** `registerProject` is called for a project path, including one already present in the registry +- **THEN** the registry stores that cwd with a fresh ISO-formatted `lastSeen` value + +### Requirement: getKnownProjects returns project paths ordered by recency + +`getKnownProjects(registry)` SHALL return the known project paths sorted by descending `lastSeen`, so the most recently seen project appears first. 
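A sketch of the registry mutation and recency ordering described above. Since `toISOString()` output (fixed width, UTC) sorts lexicographically in chronological order, a plain string comparison suffices; the sketch is illustrative, not the shipped module.

```typescript
interface ProjectRegistry {
  version: 1;
  projects: Record<string, { lastSeen: string }>;
}

// Mutates the registry in place with a fresh ISO 8601 timestamp.
function registerProject(registry: ProjectRegistry, cwd: string): void {
  registry.projects[cwd] = { lastSeen: new Date().toISOString() };
}

// Most recently seen project first.
function getKnownProjects(registry: ProjectRegistry): string[] {
  return Object.entries(registry.projects)
    .sort(([, a], [, b]) => b.lastSeen.localeCompare(a.lastSeen))
    .map(([cwd]) => cwd);
}
```

Re-registering an existing path simply overwrites its `lastSeen`, which is what moves it to the front of the known-projects list.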
+ +#### Scenario: Projects are returned most-recent-first + +- **WHEN** the registry contains multiple projects with different `lastSeen` values +- **THEN** `getKnownProjects` returns their paths ordered from newest timestamp to oldest timestamp diff --git a/openspec/specs/session/spec.md b/openspec/specs/session/spec.md new file mode 100644 index 0000000..558b562 --- /dev/null +++ b/openspec/specs/session/spec.md @@ -0,0 +1,42 @@ +## Requirements + +### Requirement: SessionTracker defines the rule-disclosure tracking contract + +`SessionTracker` SHALL expose four methods: `hasRuleBeenShown(sessionId, location)`, `markRuleShown(sessionId, location)`, `clearSession(sessionId)`, and `cleanup(maxAgeMs?)`. The interface SHALL support callers that need to know whether a rule has already been fully shown during the current session. + +#### Scenario: Session tracker exposes graduated-disclosure queries and updates + +- **WHEN** a caller uses `SessionTracker` +- **THEN** it can check whether a rule was already shown, mark it as shown, clear a session, and request cleanup of stale sessions + +### Requirement: InMemorySessionTracker stores disclosure state in process memory only + +`InMemorySessionTracker` SHALL store session state in an in-memory `Map` keyed by session ID, where each entry holds the `Set` of shown rule locations and a `lastAccess` timestamp. `markRuleShown` SHALL create a session entry on first use, add the rule location to that session's `Set`, and refresh `lastAccess`. `hasRuleBeenShown` SHALL report whether the location exists in the session's `Set` and also refresh `lastAccess` when the session exists. `clearSession` SHALL remove the session entry entirely. Because storage is process-local memory, all state SHALL reset on process restart.
+ +#### Scenario: Marking and checking a rule updates in-memory session state + +- **WHEN** `markRuleShown(sessionId, location)` is called and then `hasRuleBeenShown(sessionId, location)` is checked in the same process +- **THEN** the tracker reports `true` for that session-location pair + +#### Scenario: Process restart clears disclosure history + +- **WHEN** the process restarts and a new `InMemorySessionTracker` is created +- **THEN** previously shown rules are no longer present because the tracker does not persist state + +### Requirement: InMemorySessionTracker supports cleanup of stale sessions + +`cleanup(maxAgeMs = 3600000)` SHALL delete any session whose `lastAccess` timestamp is older than `Date.now() - maxAgeMs`. The default retention window SHALL be one hour. + +#### Scenario: Default cleanup removes sessions idle for more than one hour + +- **WHEN** `cleanup()` runs +- **THEN** any session whose `lastAccess` is more than 3600000 ms old is removed from the internal map + +### Requirement: Session tracking enables graduated disclosure for repeated rule matches + +The session tracker SHALL support a graduated disclosure pattern in which the first match for a rule in a session can be treated as unseen, and later matches in that same session can be treated as already shown. After a caller marks a rule as shown, subsequent `hasRuleBeenShown(sessionId, location)` calls for the same session and location SHALL return `true`, enabling consumers to switch from full rule content to a one-line reminder. 
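The tracker contract above can be sketched as follows. The internal field name `rules` is an assumption: the spec only requires a per-session `Set` of locations plus a `lastAccess` timestamp.

```typescript
class InMemorySessionTracker {
  private sessions = new Map<string, { rules: Set<string>; lastAccess: number }>();

  hasRuleBeenShown(sessionId: string, location: string): boolean {
    const s = this.sessions.get(sessionId);
    if (!s) return false;
    s.lastAccess = Date.now(); // reads also count as activity
    return s.rules.has(location);
  }

  markRuleShown(sessionId: string, location: string): void {
    let s = this.sessions.get(sessionId);
    if (!s) {
      s = { rules: new Set(), lastAccess: 0 };
      this.sessions.set(sessionId, s);
    }
    s.rules.add(location);
    s.lastAccess = Date.now();
  }

  clearSession(sessionId: string): void {
    this.sessions.delete(sessionId);
  }

  // Default retention window: one hour.
  cleanup(maxAgeMs = 3_600_000): void {
    const cutoff = Date.now() - maxAgeMs;
    for (const [id, s] of this.sessions) {
      if (s.lastAccess < cutoff) this.sessions.delete(id);
    }
  }
}
```

A consumer implements graduated disclosure by checking `hasRuleBeenShown` before rendering: unseen rules get full content and a `markRuleShown` call, seen ones get the one-liner.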
+ +#### Scenario: First match can be full content and later matches can be one-liners + +- **WHEN** a caller checks a rule before it has been marked shown, then calls `markRuleShown(sessionId, location)`, and checks again for the same session and location +- **THEN** the first check reports unseen and the later check reports seen, enabling full-content-first and one-liner-later behavior diff --git a/openspec/specs/skill-index/spec.md b/openspec/specs/skill-index/spec.md new file mode 100644 index 0000000..5fe5271 --- /dev/null +++ b/openspec/specs/skill-index/spec.md @@ -0,0 +1,150 @@ +## Requirements + +### Requirement: ScanDirs names the three index source groups + +The skill index baseline SHALL use a `ScanDirs` descriptor with `skillDirs`, `memoryDirs`, and `ruleDirs`, each represented as arrays of directory paths supplied by the consumer. + +#### Scenario: Build receives source directories by category + +- **WHEN** a caller invokes `SkillIndex.build(scanDirs)` +- **THEN** the index reads skill directories from `scanDirs.skillDirs`, memory directories from `scanDirs.memoryDirs`, and rule directories from `scanDirs.ruleDirs` + +### Requirement: parseFrontmatter extracts supported YAML-like metadata and preserves raw bodies when absent + +`parseFrontmatter(content)` SHALL parse content wrapped in `---` frontmatter delimiters and return `{ meta, body }`, extracting `name`, `description`, `type`, `queries`, `keywords`, `paths`, `hooks`, `one-liner`, and `boost`. The returned `body` SHALL contain the content after the closing delimiter. If frontmatter delimiters are absent, the function SHALL return `{ meta: {}, body: content }` without modification. 
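A non-normative sketch of the delimiter handling described above; key parsing is simplified to scalar values, so list handling and the full key set are omitted:

```typescript
// Sketch of frontmatter delimiter handling only. Real parsing also
// supports block-style lists, inline lists, and typed values.
function parseFrontmatter(content: string): { meta: Record<string, string>; body: string } {
  const match = content.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: content }; // no frontmatter: raw body preserved
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const kv = line.match(/^([\w-]+):\s*(.*)$/);
    if (kv) meta[kv[1]] = kv[2].replace(/^["']|["']$/g, ""); // strip surrounding quotes
  }
  return { meta, body: match[2] };
}
```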
+ +#### Scenario: Frontmatter is present + +- **WHEN** markdown begins with a `---` block containing supported keys and ends that block with a second `---` +- **THEN** the parsed metadata is returned in `meta` and the remaining markdown is returned in `body` + +#### Scenario: Frontmatter is absent + +- **WHEN** content does not match the frontmatter delimiter pattern +- **THEN** `parseFrontmatter` returns an empty `meta` object and the full raw content as `body` + +### Requirement: parseFrontmatter supports both block-style and inline list values + +For `queries`, `paths`, `hooks`, and `keywords`, `parseFrontmatter` SHALL accept either block-style YAML lists with indented `-` items or inline scalar values. Block-style items and inline values SHALL be normalized into arrays of trimmed strings with surrounding single or double quotes removed. + +#### Scenario: Block-style list values are collected + +- **WHEN** a supported list key has an empty value followed by indented `- item` lines +- **THEN** each item is appended to the corresponding metadata array in order + +#### Scenario: Inline list-like values are treated as single entries + +- **WHEN** a supported list key is written on one line with a scalar value +- **THEN** the value is stored as a one-element array for that key + +### Requirement: parseMemoryFile supports frontmatter-based and section-based memory formats + +`parseMemoryFile(content, filePath)` SHALL support two baseline formats: a frontmatter-based single-entry format using `name`, `description`, and `queries`, and a section-based format that splits on `##` headings and reads `Triggers:` or `Trigger:` lines as queries. The function SHALL return an array of parsed sections with `name`, `description`, `queries`, and trimmed `body`. 
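A non-normative sketch of the section-based path only; the frontmatter-based single-entry path and `description` defaulting are omitted, and lowercasing triggers is an assumption about "normalized":

```typescript
// Sketch of splitting a memory file on "##" headings and lifting
// Trigger(s): lines out of each section body.
interface MemorySection { name: string; queries: string[]; body: string }

function parseSections(content: string): MemorySection[] {
  const sections: MemorySection[] = [];
  const parts = content.split(/^## +/m).slice(1); // drop preamble before first heading
  for (const part of parts) {
    const lines = part.split("\n");
    const name = lines[0].trim();
    const queries: string[] = [];
    const bodyLines: string[] = [];
    for (const line of lines.slice(1)) {
      const trig = line.match(/^Triggers?:\s*(.+)$/);
      if (trig) {
        // comma-separated triggers become normalized query strings
        for (const q of trig[1].split(",")) queries.push(q.trim().toLowerCase());
      } else {
        bodyLines.push(line); // trigger lines are removed from the body
      }
    }
    sections.push({ name, queries, body: bodyLines.join("\n").trim() });
  }
  return sections;
}
```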
+ +#### Scenario: Frontmatter-based memory file yields one parsed entry + +- **WHEN** a memory file contains frontmatter with `name` or `description` +- **THEN** `parseMemoryFile` returns at most one parsed section using the declared metadata, defaulting the name to the markdown filename and the description to the first body line when omitted + +#### Scenario: Section-based memory file yields one entry per heading + +- **WHEN** a memory file contains one or more `## Section Name` headings with optional `Triggers:` lines inside each section +- **THEN** `parseMemoryFile` returns one parsed section per heading, removes trigger lines from the body, and splits comma-separated triggers into normalized query strings + +### Requirement: Skill parsing requires name and description and falls back to the description for queries + +When building skill entries from `SKILL.md`, the parser SHALL ignore files missing either `meta.name` or `meta.description`. For valid skill files, the entry type SHALL default to `"skill"` when unspecified, and embedded queries SHALL use `meta.queries` when present or fall back to a single query containing the description. + +#### Scenario: Skill file with no explicit queries still indexes + +- **WHEN** a `SKILL.md` file has valid `name` and `description` metadata but no `queries` +- **THEN** the index embeds the description as that skill's only query + +### Requirement: Rule parsing derives defaults and incorporates keywords into search queries + +When building rule entries from markdown files, the parser SHALL default the rule name to the filename without `.md`, default the description to the first line of the parsed body when frontmatter omits it, default `oneLiner` to the description when absent, and add `keywords` to the embedded query list alongside explicit queries. If neither queries nor keywords are present, the description SHALL be used as the sole query.
+ +#### Scenario: Rule file without metadata still produces a searchable rule + +- **WHEN** a rule markdown file has no frontmatter name, description, queries, or one-liner +- **THEN** the index uses the filename as the rule name, the first body line as the description and one-liner, and the description as the only embedded query + +### Requirement: SkillIndex.build incrementally scans, hydrates, and maintains the index across skills, memories, and rules + +`SkillIndex.build(scanDirs)` SHALL load the persistent cache on the first build, hydrate in-memory skills from that cache on a cold start, scan skill directories for `SKILL.md` files in immediate subdirectories, scan memory directories for `.md` files excluding `MEMORY.md`, and scan rule directories for `.md` files. The build SHALL detect new, changed, and deleted files using mtimes and SHALL skip rebuilding unchanged indexes when the previous build is still current. + +#### Scenario: First build hydrates from persistent cache + +- **WHEN** `build()` runs for the first time and the cache already contains indexed skills for the configured embedding model +- **THEN** the in-memory index is hydrated from cached entries before scanning for changes + +#### Scenario: Unchanged sources avoid rebuild work + +- **WHEN** all scanned locations have the same mtimes as the last successful build and no locations were added or deleted +- **THEN** `build()` refreshes `buildTime` and returns without re-parsing or re-embedding entries + +#### Scenario: Deleted sources are removed from memory and cache + +- **WHEN** a previously indexed skill, rule, or memory file is no longer present in scanned locations +- **THEN** the corresponding indexed entries and cached records are removed, including memory section keys derived from that file + +### Requirement: Memory files expand into section-keyed entries and new embeddings are generated in batch + +During `build()`, each parsed memory section SHALL be indexed as a separate memory entry 
keyed as `"path#SectionName"`. For any new or changed skill, rule, or memory section, the index SHALL flatten all queries into a single batch embed call, rebuild the affected `IndexedSkill` records from that batch, and persist the cache after updating embedded entries. + +#### Scenario: Memory file yields multiple indexed entries + +- **WHEN** a memory file parses into multiple sections +- **THEN** each section is indexed independently under a unique `path#SectionName` location key + +#### Scenario: Newly parsed queries are embedded in one batch + +- **WHEN** the build includes one or more new or changed entries +- **THEN** the provider receives one flattened batch of all queries for embedding and the cache is saved after the updated entries are written
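The flatten-then-slice batching above can be sketched as follows; `embedBatch` and the entry shape are stand-ins, not the real provider API:

```typescript
// Flatten every pending entry's queries into one batch embed call,
// then slice the result vectors back into per-entry lists by offset.
interface PendingEntry { location: string; queries: string[] }

async function embedEntries(
  entries: PendingEntry[],
  embedBatch: (texts: string[]) => Promise<number[][]>
): Promise<Map<string, number[][]>> {
  const flat = entries.flatMap((e) => e.queries); // one batch for all entries
  const vectors = await embedBatch(flat);
  const byLocation = new Map<string, number[][]>();
  let offset = 0;
  for (const e of entries) {
    byLocation.set(e.location, vectors.slice(offset, offset + e.queries.length));
    offset += e.queries.length;
  }
  return byLocation;
}
```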
+ +#### Scenario: Duplicate skill names collapse to the highest-scoring entry + +- **WHEN** multiple indexed entries share the same skill name across different scan locations +- **THEN** search returns only the highest-scoring result for that name + +#### Scenario: Relative scoring uses the best result as the floor anchor + +- **WHEN** `scoringMode` is `"relative"` and the top result meets the threshold +- **THEN** search returns up to `topK` results whose scores are no more than `maxDropoff` below the best result + +#### Scenario: Absolute scoring enforces per-result thresholding + +- **WHEN** `scoringMode` is `"absolute"` +- **THEN** only results whose boosted score is at least the threshold are returned, up to `topK` + +### Requirement: readSkillContent returns bodies without frontmatter and resolves memory section references + +`SkillIndex.readSkillContent(location)` SHALL read the body content for an indexed location. For ordinary skill or rule files, it SHALL strip frontmatter and return the trimmed body. For memory section references of the form `"path#SectionName"`, it SHALL reparse the memory file and return the trimmed body for the matching section, or an empty string when the section is not found. + +#### Scenario: Reading a normal skill strips frontmatter + +- **WHEN** `readSkillContent()` is called with a markdown file location +- **THEN** it returns the parsed body content without the frontmatter block + +#### Scenario: Reading a memory section resolves by section name + +- **WHEN** `readSkillContent()` is called with a `path#SectionName` memory location +- **THEN** it returns the matching section body from `parseMemoryFile` + +### Requirement: needsRebuild is driven by first-build state and cache TTL + +`SkillIndex.needsRebuild()` SHALL return `true` before the first successful build and thereafter only when the elapsed time since the last build is at least `cacheTimeMs`. 
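The `relative`/`absolute` scoring rules above can be sketched as a post-scoring filter; scores here are assumed to already include similarity plus boost:

```typescript
// Threshold logic for the two scoring modes, applied to per-entry
// scores (cosine similarity + boost) computed earlier in search.
interface Scored { name: string; score: number }

function applyScoringMode(
  results: Scored[], topK: number, threshold: number,
  mode: "relative" | "absolute", maxDropoff: number
): Scored[] {
  const sorted = [...results].sort((a, b) => b.score - a.score);
  if (mode === "absolute") {
    // each result must individually clear the threshold
    return sorted.filter((r) => r.score >= threshold).slice(0, topK);
  }
  // relative: the best score must first clear the threshold floor ...
  if (sorted.length === 0 || sorted[0].score < threshold) return [];
  // ... then only results within maxDropoff of the best may remain
  const floor = sorted[0].score - maxDropoff;
  return sorted.filter((r) => r.score >= floor).slice(0, topK);
}
```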
+ +#### Scenario: New index requires an initial build + +- **WHEN** no successful build has run yet +- **THEN** `needsRebuild()` returns `true` + +#### Scenario: Expired build requires refresh + +- **WHEN** the last build time is older than or equal to the configured cache TTL +- **THEN** `needsRebuild()` returns `true` diff --git a/openspec/specs/sync-migration/spec.md b/openspec/specs/sync-migration/spec.md new file mode 100644 index 0000000..4b5e077 --- /dev/null +++ b/openspec/specs/sync-migration/spec.md @@ -0,0 +1,123 @@ +## Requirements + +### Requirement: The sync repo schema version is stored in a marker file with a legacy default + +The sync migration subsystem SHALL store its schema version at `<syncRepoDir>/.memex-sync/version.json`. `readSyncRepoVersion(syncRepoDir)` SHALL return `1` when the marker file is missing, unreadable, malformed JSON, or missing a positive integer `version` key. `writeSyncRepoVersion(syncRepoDir, version)` SHALL create the `.memex-sync/` directory if needed and write a JSON object containing that version. + +#### Scenario: Marker file is absent or malformed + +- **WHEN** `readSyncRepoVersion` runs and `.memex-sync/version.json` is missing, invalid JSON, or lacks a numeric positive `version` +- **THEN** it returns `1` + +#### Scenario: Marker file round-trips a written version + +- **WHEN** `writeSyncRepoVersion(syncRepoDir, 2)` runs and `readSyncRepoVersion(syncRepoDir)` is called afterward +- **THEN** the read returns `2` + +### Requirement: Markdown body merging is lossless but deduplicates identical content + +`mergeMarkdownBodies(a, b)` SHALL trim both markdown bodies, return one copy when the trimmed bodies are identical, and otherwise concatenate the trimmed bodies with a blank line separator.
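The merge rule above is small enough to sketch directly (non-normative):

```typescript
// Lossless merge of two markdown bodies with deduplication of
// identical content, per the requirement above.
function mergeMarkdownBodies(a: string, b: string): string {
  const ta = a.trim();
  const tb = b.trim();
  if (ta === tb) return ta; // identical after trimming: keep one copy
  return ta + "\n\n" + tb;  // otherwise concatenate with a blank line
}
```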
+ +#### Scenario: Markdown bodies differ + +- **WHEN** `mergeMarkdownBodies` receives two different markdown bodies +- **THEN** it returns `a.trim() + "\n\n" + b.trim()` + +#### Scenario: Markdown bodies are identical after trimming + +- **WHEN** `mergeMarkdownBodies` receives bodies that trim to the same value +- **THEN** it returns that value exactly once + +### Requirement: Mid-operation git states are detected before migration work starts + +`isMidRebaseOrMerge(syncRepoDir)` SHALL return true when any of `.git/rebase-merge`, `.git/rebase-apply`, or `.git/MERGE_HEAD` exists in the sync repo, and false otherwise. + +#### Scenario: Repo is mid-rebase or mid-merge + +- **WHEN** any one of the rebase or merge sentinel paths exists under `.git/` +- **THEN** `isMidRebaseOrMerge` returns true + +### Requirement: Project discovery identifies directories that directly own memory/ + +`findProjectIds(projectsDir)` SHALL walk the `projects/` tree recursively, collect every relative path whose directory contains a direct `memory/` child, and stop descending once such a project directory is found. + +#### Scenario: Nested project IDs are discovered at memory parents only + +- **WHEN** `projects/` contains nested directories where only certain directories have a direct `memory/` child +- **THEN** `findProjectIds` returns only those relative parent paths and does not recurse into their `memory/` contents + +### Requirement: Case-only git renames use a two-step move sequence + +`gitRenameCaseOnly(syncRepoDir, srcRelative, dstRelative)` SHALL create the destination parent directory when needed and perform a two-step rename of `srcRelative` to `srcRelative + ".memex-rename-tmp"` and then to `dstRelative` so case-only renames work on case-insensitive filesystems. 
+ +#### Scenario: Case-only rename is required + +- **WHEN** a mixed-case project path must be renamed only by letter casing +- **THEN** `gitRenameCaseOnly` moves it through a temporary path before the final destination + +### Requirement: Empty legacy ancestors are pruned after project relocation + +`removeEmptyLegacyAncestors(syncRepoDir, srcRelative)` SHALL walk upward from the legacy path's parent directory, remove empty directories, and stop when it reaches `projects/` or finds a non-empty directory. + +#### Scenario: Empty mixed-case parent directories remain after migration + +- **WHEN** a project rename or merge leaves empty legacy ancestor directories under `projects/` +- **THEN** `removeEmptyLegacyAncestors` removes those empty directories but does not remove the `projects/` root + +### Requirement: Project ID migration lowercases mixed-case project directories and merges collisions safely + +`migrateProjectIdsToLowercase(syncRepoDir)` SHALL return `{ renamed, merged }` and operate on directories under `projects/` that are the immediate parents of `memory/` subdirectories. It SHALL ignore repos with no `projects/` directory, filter project IDs containing uppercase characters, sort them deepest-first, and rename each path to its lowercase form. For case-only renames it SHALL use `gitRenameCaseOnly`. When a distinct lowercase directory already exists, it SHALL merge the two trees file-by-file: move files that exist only in the legacy tree, merge colliding markdown files with `mergeMarkdownBodies`, keep the newer mtime for colliding non-markdown files, remove legacy files with git, and then delete the legacy tree. After either path it SHALL prune empty legacy ancestors. 
+ +#### Scenario: No projects directory exists + +- **WHEN** `migrateProjectIdsToLowercase` runs in a sync repo with no `projects/` directory +- **THEN** it returns `{ renamed: [], merged: [] }` + +#### Scenario: Mixed-case project is renamed without a lowercase collision + +- **WHEN** `projects/GitHub.com/Jim80Net/Repo/memory/notes.md` exists and no distinct lowercase destination exists +- **THEN** the migration renames it to `projects/github.com/jim80net/repo/memory/notes.md`, records the legacy ID in `renamed`, and removes empty mixed-case ancestors + +#### Scenario: Distinct lowercase project already exists on a case-sensitive filesystem + +- **WHEN** both `projects/Foo/.../memory/` and `projects/foo/.../memory/` exist as distinct directories +- **THEN** the migration merges them into the lowercase destination, records the legacy ID in `merged`, and removes the legacy tree + +#### Scenario: Colliding markdown files are losslessly merged + +- **WHEN** the legacy and canonical trees both contain the same `.md` file with different content +- **THEN** the destination markdown file contains both trimmed bodies joined by a blank line + +#### Scenario: Colliding non-markdown files keep the newer version + +- **WHEN** the legacy and canonical trees both contain the same non-markdown file +- **THEN** the migrated destination keeps whichever file has the newer `mtime` + +### Requirement: Migration orchestration is gated, versioned, and idempotent + +`runSyncMigrations(config, syncRepoDir)` SHALL be the orchestrator for sync repo migrations. It SHALL skip immediately with a case-sensitive status string when `config.caseSensitive === true`, skip with a mid-rebase/merge status string when `isMidRebaseOrMerge(syncRepoDir)` is true, initialize only the v2 marker and return `"marker initialized (fresh repo)"` when the repo has no commits yet, and skip with `"migration skipped (already v2)"` when the version marker is already at least `2`. 
Otherwise it SHALL run `migrateProjectIdsToLowercase`, write schema version `2`, stage all changes with `git add -A`, and create a single commit with message `"memex: migrate project IDs to lowercase (schema v1 → v2)"` only when staged changes remain. + +#### Scenario: Case-sensitive mode opts out of migration + +- **WHEN** `runSyncMigrations` runs with `config.caseSensitive === true` +- **THEN** it returns a case-sensitive skip status and does not change the repo + +#### Scenario: Fresh repo initializes the marker only + +- **WHEN** `runSyncMigrations` runs in a repo with no commits yet +- **THEN** it writes schema version `2`, creates no commit, and returns `"marker initialized (fresh repo)"` + +#### Scenario: Repo is already at schema v2 + +- **WHEN** `readSyncRepoVersion(syncRepoDir)` returns `2` or higher +- **THEN** `runSyncMigrations` returns `"migration skipped (already v2)"` + +#### Scenario: Migration produces no staged changes after version write and scan + +- **WHEN** `runSyncMigrations` stages all changes and `git status --porcelain` is empty +- **THEN** it returns `"migration: no changes (renamed X, merged Y)"` and creates no commit + +#### Scenario: Migration changes are committed once + +- **WHEN** migration plus marker writing leave staged changes in the sync repo +- **THEN** `runSyncMigrations` creates exactly one commit with message `"memex: migrate project IDs to lowercase (schema v1 → v2)"` and returns a summary string reporting renamed and merged counts diff --git a/openspec/specs/sync/spec.md b/openspec/specs/sync/spec.md new file mode 100644 index 0000000..c0afb38 --- /dev/null +++ b/openspec/specs/sync/spec.md @@ -0,0 +1,143 @@ +## Requirements + +### Requirement: Markdown conflict auto-resolution preserves both sides without conflict markers + +`autoResolveMarkdownConflict(content)` SHALL replace inline git conflict markers in markdown with merged body content. 
For each `<<<<<<<` / `=======` / `>>>>>>>` block, it SHALL trim both sides, keep only one copy when the trimmed bodies are identical, and otherwise concatenate the trimmed bodies with a blank line between them. + +#### Scenario: Conflicting markdown bodies differ + +- **WHEN** `autoResolveMarkdownConflict` receives a markdown file containing one git conflict block whose two sides differ after trimming +- **THEN** the returned content replaces the conflict markers with `ours + "\n\n" + theirs` + +#### Scenario: Conflicting markdown bodies are identical after trimming + +- **WHEN** `autoResolveMarkdownConflict` receives a conflict block whose two sides trim to the same body +- **THEN** the returned content keeps that body exactly once and removes the conflict markers + +### Requirement: Sync repo initialization ensures a local repo and configured origin + +`initSyncRepo(config, syncRepoDir)` SHALL return immediately when sync is disabled or `config.repo` is empty. Otherwise it SHALL create `syncRepoDir`, reuse an existing git repo when present, and ensure the `origin` remote points at `config.repo`. If the directory is not yet a git repo, it SHALL try `git clone <config.repo>` first and fall back to `git init` plus `git remote add origin <config.repo>` when cloning fails.
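The conflict-block rewrite specified at the top of this spec can be sketched as a single regex replacement over standard git conflict markers (non-normative; the real implementation may walk the file differently):

```typescript
// Replace each <<<<<<< / ======= / >>>>>>> block with the merged body:
// one copy when both sides trim identically, otherwise both sides
// joined by a blank line.
function autoResolveMarkdownConflict(content: string): string {
  return content.replace(
    /^<{7}[^\n]*\n([\s\S]*?)^={7}\n([\s\S]*?)^>{7}[^\n]*\n?/gm,
    (_m, ours: string, theirs: string) => {
      const a = ours.trim();
      const b = theirs.trim();
      const merged = a === b ? a : a + "\n\n" + b;
      return merged + "\n";
    }
  );
}
```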
+ +#### Scenario: Existing repo has a different origin URL + +- **WHEN** `initSyncRepo` runs in a directory that is already a git repo and `origin` points somewhere other than `config.repo` +- **THEN** it updates `origin` to `config.repo` and leaves the repo in place + +#### Scenario: Existing repo has no origin remote + +- **WHEN** `initSyncRepo` runs in a directory that is already a git repo but `remote get-url origin` fails +- **THEN** it adds `origin` pointing at `config.repo` + +#### Scenario: Clone falls back to init plus remote add + +- **WHEN** `initSyncRepo` runs in a non-repo directory and `git clone` fails +- **THEN** it initializes a new git repo in `syncRepoDir` and adds `origin` with `config.repo` + +### Requirement: syncPull gates sync work before pulling and runs migration only on safe states + +`syncPull(config, syncRepoDir)` SHALL return `"sync disabled"` without touching the repo when sync is disabled or `config.repo` is empty. Otherwise it SHALL initialize the repo, short-circuit local-only and fresh-repo cases before fetching, fetch from `origin`, determine the default branch, call `pullWithConflictResolution(syncRepoDir, "origin/<defaultBranch>")`, and run `runSyncMigrations(config, syncRepoDir)` only after a successful pull or when the repo has no remote configured.
+ +#### Scenario: No remote configured after initialization + +- **WHEN** `syncPull` runs after `initSyncRepo` and `hasRemote(syncRepoDir)` returns false +- **THEN** it runs `runSyncMigrations` once and returns `"no remote configured"` + +#### Scenario: Repo has a remote but no commits yet + +- **WHEN** `syncPull` runs and `hasRemote(syncRepoDir)` returns true while `hasCommits(syncRepoDir)` returns false +- **THEN** it returns `"no commits yet"` and defers migration until commits exist + +#### Scenario: Fetch fails before pull + +- **WHEN** `git fetch origin` fails during `syncPull` +- **THEN** the function returns `"fetch failed (remote unreachable?)"` and does not run pull conflict resolution or post-pull migration + +#### Scenario: Pull succeeds and migration runs afterward + +- **WHEN** fetch succeeds and `pullWithConflictResolution` returns a non-failure status string +- **THEN** `syncPull` runs `runSyncMigrations` after the pull and returns the pull status string + +### Requirement: Pull conflict resolution prefers rebase, then merge, with automatic file resolution + +`pullWithConflictResolution(syncRepoDir, remoteBranch)` SHALL attempt `git rebase <remoteBranch>` first. If rebase fails, it SHALL inspect unresolved files, auto-resolve markdown conflicts by rewriting the file with `autoResolveMarkdownConflict`, auto-resolve non-markdown conflicts by checking out `--theirs`, and stage every resolved file. If a resolved rebase can continue, it SHALL return `"pulled with auto-resolved conflicts: ..."`; otherwise it SHALL abort the rebase and fall back to `git merge --no-edit`. A clean merge SHALL return `"pulled (merge)"`. A merge with auto-resolved conflicts SHALL commit `"Auto-resolve merge conflicts"` and return `"pulled with merge + auto-resolved: ..."`. If neither rebase nor merge can be resolved automatically, it SHALL abort the merge when possible and return `"pull failed: unresolvable conflicts"`.
+ +#### Scenario: Rebase succeeds without conflicts + +- **WHEN** `git rebase <remoteBranch>` succeeds on the first attempt +- **THEN** `pullWithConflictResolution` returns `"pulled successfully"` + +#### Scenario: Rebase conflicts are auto-resolved and continued + +- **WHEN** rebase fails, `resolveConflicts` stages one or more files, and `git rebase --continue` succeeds +- **THEN** the function returns `"pulled with auto-resolved conflicts: ..."` + +#### Scenario: Rebase aborts and merge succeeds cleanly + +- **WHEN** rebase cannot continue after conflict handling but `git merge --no-edit` succeeds +- **THEN** the function returns `"pulled (merge)"` + +#### Scenario: Merge conflicts are auto-resolved + +- **WHEN** both rebase and clean merge fail, but merge conflict handling resolves one or more files +- **THEN** the function creates a merge commit with message `"Auto-resolve merge conflicts"` and returns `"pulled with merge + auto-resolved: ..."` + +#### Scenario: Conflicts remain unresolvable + +- **WHEN** neither the rebase path nor the merge path produces any auto-resolved files that can complete the operation +- **THEN** the function returns `"pull failed: unresolvable conflicts"` + +### Requirement: syncCommitAndPush copies tracked content into the sync repo, commits once, and pushes + +`syncCommitAndPush(config, syncRepoDir, sourceDirs, cwd)` SHALL return `"sync disabled"` when sync is disabled or `config.repo` is empty. Otherwise it SHALL initialize the repo, copy rules from `sourceDirs.rules` into `<syncRepoDir>/rules`, copy skill definitions from `sourceDirs.skills` into `<syncRepoDir>/skills`, resolve the canonical project memory destination with `getSyncProjectMemoryDir(cwd, syncRepoDir, config)`, and copy markdown project memories into that destination. If nothing is copied, it SHALL return `"no changes to sync"`.
When copied content exists, it SHALL stage all changes, return `"no changes after staging"` if staging reveals no git diff, commit once with a message of the form `sync from <host> at <timestamp>`, and then push to the default branch by trying `git push` first and `git push -u origin <branch>` as a fallback. + +#### Scenario: No files need copying + +- **WHEN** the rules, skills, and project memory sync helpers all report zero copied files +- **THEN** `syncCommitAndPush` returns `"no changes to sync"` + +#### Scenario: Commit succeeds but no remote is configured + +- **WHEN** content is copied and committed but `hasRemote(syncRepoDir)` returns false before pushing +- **THEN** the function returns `"committed (no remote)"` + +#### Scenario: Direct push fails but upstream push succeeds + +- **WHEN** the commit succeeds, `git push` fails, and `git push -u origin <branch>` succeeds +- **THEN** the function returns `"synced <N> file(s)"` + +#### Scenario: Push fails after a successful commit + +- **WHEN** the commit succeeds but both push attempts fail +- **THEN** the function returns `"committed locally, push failed: ..."` + +### Requirement: syncDirectory copies matching files only when the source is newer + +`syncDirectory(srcDir, destDir, pattern)` SHALL return `0` when `srcDir` cannot be read. Otherwise it SHALL filter directory entries by the suffix implied by `pattern`, skip non-files and unreadable entries, compare source and destination mtimes, create `destDir` when a copy is needed, and write the source content into the destination only when the destination is missing or older than the source.
+ +#### Scenario: Destination file is up to date + +- **WHEN** a source file matches the requested pattern but the destination file exists with `mtime >= src.mtime` +- **THEN** `syncDirectory` leaves the destination untouched and does not count the file as copied + +#### Scenario: Destination file is missing or stale + +- **WHEN** a source file matches the requested pattern and the destination file is missing or older +- **THEN** `syncDirectory` creates the destination directory if needed, writes the file, and increments the copied count + +### Requirement: syncSkillsDirectory copies SKILL.md from named skill folders when newer + +`syncSkillsDirectory(srcDir, destDir)` SHALL return `0` when `srcDir` cannot be read. Otherwise it SHALL treat each immediate child entry as a skill folder, inspect `<srcDir>/<skill>/SKILL.md`, skip missing or unreadable skill files, compare mtimes against `<destDir>/<skill>/SKILL.md`, create destination skill folders when copying, and write only newer or missing `SKILL.md` files. + +#### Scenario: Skill definition is copied into a matching subdirectory + +- **WHEN** `<srcDir>/<skill>/SKILL.md` exists and the destination file is missing or older +- **THEN** `syncSkillsDirectory` creates `<destDir>/<skill>/` if needed, copies `SKILL.md`, and increments the copied count + +### Requirement: getSyncScanDirs exposes the sync repo scan roots + +`getSyncScanDirs(syncRepoPath)` SHALL return `{ rulesDir, skillsDir }` where `rulesDir` is `<syncRepoPath>/rules` and `skillsDir` is `<syncRepoPath>/skills`.
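The mtime-gated copy rule for `syncDirectory` can be sketched as follows (non-normative; error handling is simplified and the suffix derivation from `pattern` is an assumption):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copy files matching the pattern suffix from srcDir into destDir
// only when the destination is missing or older than the source.
function syncDirectory(srcDir: string, destDir: string, pattern: string): number {
  let entries: string[];
  try {
    entries = fs.readdirSync(srcDir);
  } catch {
    return 0; // unreadable source: nothing copied
  }
  const suffix = pattern.replace(/^\*/, ""); // e.g. "*.md" -> ".md"
  let copied = 0;
  for (const name of entries) {
    if (!name.endsWith(suffix)) continue;
    const src = path.join(srcDir, name);
    let srcStat: fs.Stats;
    try {
      srcStat = fs.statSync(src);
    } catch {
      continue; // unreadable entry: skip
    }
    if (!srcStat.isFile()) continue; // skip directories and others
    const dest = path.join(destDir, name);
    let needsCopy = true;
    try {
      needsCopy = fs.statSync(dest).mtimeMs < srcStat.mtimeMs; // copy only if stale
    } catch {
      // destination missing: copy
    }
    if (needsCopy) {
      fs.mkdirSync(destDir, { recursive: true }); // create destDir on demand
      fs.writeFileSync(dest, fs.readFileSync(src));
      copied++;
    }
  }
  return copied;
}
```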
+ +#### Scenario: Scan roots are derived from the sync repo path + +- **WHEN** `getSyncScanDirs(syncRepoPath)` is called +- **THEN** it returns the `rules/` and `skills/` directories rooted at that sync repo path diff --git a/openspec/specs/telemetry/spec.md b/openspec/specs/telemetry/spec.md new file mode 100644 index 0000000..af91add --- /dev/null +++ b/openspec/specs/telemetry/spec.md @@ -0,0 +1,94 @@ +## Requirements + +### Requirement: loadTelemetry returns version 1 telemetry data and falls back to empty data on invalid input + +`loadTelemetry(telemetryPath)` SHALL read JSON telemetry from disk and expect the version 1 schema `{ version: 1, entries: {} }`. If the file is missing, unreadable, malformed, or has a version other than `1`, the function SHALL return an empty version 1 telemetry object. + +#### Scenario: Valid version 1 telemetry loads successfully + +- **WHEN** the telemetry file contains valid JSON with `version: 1` +- **THEN** `loadTelemetry` returns the parsed data + +#### Scenario: Corrupt or mismatched telemetry is ignored + +- **WHEN** the telemetry file is unreadable, invalid JSON, or declares a version other than `1` +- **THEN** `loadTelemetry` returns `{ version: 1, entries: {} }` + +### Requirement: saveTelemetry writes atomically and creates parent directories + +`saveTelemetry(telemetryPath, data)` SHALL create the destination parent directory if needed, write the JSON payload to a uniquely suffixed temporary file, and then replace the target path via rename so the final write is atomic at the file level. 
+ +#### Scenario: Saving telemetry creates missing directories + +- **WHEN** the parent directory for `telemetryPath` does not exist +- **THEN** `saveTelemetry` creates it before writing the temp file and renaming it into place + +### Requirement: recordMatch mutates telemetry entries in place and tracks sessions and query hits + +`recordMatch(telemetry, location, sessionId, queryIndex?)` SHALL mutate the supplied telemetry object in place. On a first match for a location, it SHALL create an entry with `matchCount = 1`, `firstMatched = now`, `lastMatched = now`, and `sessionIds = [sessionId]`. On later matches it SHALL increment `matchCount`, refresh `lastMatched`, retain unique session IDs in a sliding window capped at 50 values, and increment `queryHits[queryIndex]` when a query index is provided. + +#### Scenario: First match creates a telemetry entry + +- **WHEN** `recordMatch` is called for a location that does not yet exist in `telemetry.entries` +- **THEN** a new entry is created with count `1`, matching first and last timestamps, and the current session ID recorded once + +#### Scenario: Repeated matches update counts and maintain the session window + +- **WHEN** `recordMatch` is called repeatedly for an existing location across many sessions +- **THEN** the match count increases, `lastMatched` updates, duplicate session IDs are not re-added, and only the most recent 50 unique session IDs are retained + +#### Scenario: Query hits accumulate by query index + +- **WHEN** `recordMatch` is called with a `queryIndex` +- **THEN** the entry records or increments that index under `queryHits` + +### Requirement: recordObservation appends observations with a capped history and ignores unknown entries + +`recordObservation(telemetry, location, observation)` SHALL append the observation to the matched entry's `observations` array, creating that array when needed, and SHALL retain only the most recent 100 observations. 
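A non-normative sketch of the entry mutation above; whether the very first match also records a query hit is an assumption here, and the entry shape mirrors the spec text rather than the real type:

```typescript
// Mutate telemetry entries in place: create on first match, then
// bump counts, keep a sliding window of 50 unique session IDs, and
// accumulate per-query hits when an index is provided.
interface TelemetryEntry {
  matchCount: number;
  firstMatched: number;
  lastMatched: number;
  sessionIds: string[];
  queryHits?: Record<number, number>;
}

function recordMatch(
  entries: Record<string, TelemetryEntry>,
  location: string, sessionId: string, queryIndex?: number
): void {
  const now = Date.now();
  let e = entries[location];
  if (!e) {
    e = { matchCount: 1, firstMatched: now, lastMatched: now, sessionIds: [sessionId] };
    entries[location] = e;
  } else {
    e.matchCount++;
    e.lastMatched = now;
    if (!e.sessionIds.includes(sessionId)) {
      e.sessionIds.push(sessionId);
      if (e.sessionIds.length > 50) e.sessionIds.shift(); // sliding window of 50
    }
  }
  if (queryIndex !== undefined) {
    e.queryHits = e.queryHits ?? {};
    e.queryHits[queryIndex] = (e.queryHits[queryIndex] ?? 0) + 1;
  }
}
```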
If the location has no telemetry entry, the function SHALL do nothing. + +#### Scenario: Observation history is capped + +- **WHEN** more than 100 observations are recorded for one entry +- **THEN** only the latest 100 observations remain on that entry + +#### Scenario: Missing entries are ignored + +- **WHEN** `recordObservation` is called for a location absent from `telemetry.entries` +- **THEN** the telemetry data remains unchanged + +### Requirement: clearObservations removes stored observations and no-ops for missing entries + +`clearObservations(telemetry, location)` SHALL delete the `observations` field from the specified telemetry entry. If the entry does not exist, the function SHALL leave telemetry unchanged. + +#### Scenario: Observations are cleared from an existing entry + +- **WHEN** `clearObservations` is called for an entry with stored observations +- **THEN** the entry no longer has an `observations` field + +#### Scenario: Missing entry is ignored during clear + +- **WHEN** `clearObservations` is called for an unknown location +- **THEN** no telemetry entries are created or modified + +### Requirement: formatTelemetryReport renders a markdown table or an empty-data message + +`formatTelemetryReport(telemetry)` SHALL return `"No telemetry data."` when there are no entries. Otherwise it SHALL return a markdown table with the columns `Entry`, `Matches`, `Sessions`, `Last Match`, `Obs`, and `Query Hits`, rendering each telemetry entry on its own row. 
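A non-normative rendering sketch follows. The per-cell formatting — counting sessions via `sessionIds.length` and summarizing `queryHits` as `q{index}:{count}` — is an assumption; the spec only fixes the column set and the empty-data message:

```typescript
interface EntryTelemetry {
  matchCount: number;
  firstMatched: string;
  lastMatched: string;
  sessionIds: string[];
  queryHits?: Record<number, number>;
  observations?: unknown[];
}

interface TelemetryData {
  version: 1;
  entries: Record<string, EntryTelemetry>;
}

function formatTelemetryReport(telemetry: TelemetryData): string {
  const locations = Object.keys(telemetry.entries);
  if (locations.length === 0) return "No telemetry data.";
  const lines = [
    "| Entry | Matches | Sessions | Last Match | Obs | Query Hits |",
    "| --- | --- | --- | --- | --- | --- |",
  ];
  for (const location of locations) {
    const e = telemetry.entries[location];
    // "q0:3" means query index 0 matched 3 times (illustrative format).
    const hits = Object.entries(e.queryHits ?? {})
      .map(([index, count]) => `q${index}:${count}`)
      .join(" ");
    lines.push(
      `| ${location} | ${e.matchCount} | ${e.sessionIds.length} | ${e.lastMatched} | ${(e.observations ?? []).length} | ${hits} |`
    );
  }
  return lines.join("\n");
}
```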
+ +#### Scenario: Empty telemetry renders a fixed message + +- **WHEN** `telemetry.entries` is empty +- **THEN** the function returns `"No telemetry data."` + +#### Scenario: Non-empty telemetry renders the report table + +- **WHEN** one or more telemetry entries exist +- **THEN** the returned markdown begins with the expected header row and includes each entry's match count, session count, last match timestamp, observation count, and formatted query-hit summary + +### Requirement: getEntryTelemetry returns entry data by location + +`getEntryTelemetry(telemetry, location)` SHALL return the telemetry entry stored at that location, or `undefined` when no entry exists. + +#### Scenario: Entry lookup misses + +- **WHEN** the requested location is not present in `telemetry.entries` +- **THEN** `getEntryTelemetry` returns `undefined` diff --git a/openspec/specs/traces/spec.md b/openspec/specs/traces/spec.md new file mode 100644 index 0000000..59ad747 --- /dev/null +++ b/openspec/specs/traces/spec.md @@ -0,0 +1,79 @@ +## Requirements + +### Requirement: writeTrace creates the traces directory and writes a sanitized date-prefixed JSON filename + +`writeTrace(tracesDir, trace)` SHALL create `tracesDir` recursively before writing. It SHALL write the trace as pretty-printed JSON to a filename shaped as `{date}-{sessionSlug}.json`, where `date` is the current ISO calendar date, `sessionSlug` is derived from `trace.sessionKey` by replacing every non-alphanumeric character with `-`, and the slug is truncated to at most 60 characters. 
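The filename rule reduces to a small pure function. `traceFilename` is a hypothetical helper name used here to isolate the naming logic as a non-normative sketch:

```typescript
// Builds "{date}-{sessionSlug}.json": ISO calendar date, then the session
// key with every non-alphanumeric character replaced by "-", capped at 60.
function traceFilename(sessionKey: string, now: Date = new Date()): string {
  const date = now.toISOString().slice(0, 10); // e.g. "2026-04-08"
  const slug = sessionKey.replace(/[^a-zA-Z0-9]/g, "-").slice(0, 60);
  return `${date}-${slug}.json`;
}
```

So a session key like `agent:main/run#7` yields a filename of the shape `2026-04-08-agent-main-run-7.json`, and arbitrarily long keys still produce bounded filenames.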
+ +#### Scenario: Trace files are written under a sanitized session slug + +- **WHEN** `writeTrace` is called for a session key containing punctuation or path-like characters +- **THEN** the output filename uses only alphanumeric characters and dashes in the slug portion and is prefixed by the current date + +### Requirement: TraceAccumulator stores per-session trace state + +`TraceAccumulator` SHALL maintain per-session state in memory containing `startTime`, a `skillsInjected` set, a `toolsCalled` set, `agentId`, and `messageCount`. + +#### Scenario: Session state is initialized on first injection + +- **WHEN** a session key is first seen through `recordInjection` +- **THEN** the accumulator creates an in-memory session record with the required fields and initializes both collections as empty sets before adding new skills + +### Requirement: recordInjection adds injected skills and creates session state when missing + +`recordInjection(sessionKey, agentId, skillNames)` SHALL create a session record when one does not exist and SHALL add each provided skill name to the session's `skillsInjected` set so repeated names are deduplicated. + +#### Scenario: Repeated injection does not duplicate a skill name + +- **WHEN** the same skill name is recorded multiple times for one session +- **THEN** the session retains that skill only once in `skillsInjected` + +### Requirement: recordToolCall records tool usage in a set + +`recordToolCall(sessionKey, toolName)` SHALL add the tool name to the session's `toolsCalled` set when the session exists. + +#### Scenario: Tool call is ignored for unknown session + +- **WHEN** `recordToolCall` is called before any session entry exists for that key +- **THEN** no session record is created and no tool call is stored + +### Requirement: recordMessageCount updates the tracked message count + +`recordMessageCount(sessionKey, count)` SHALL update the session's stored `messageCount` when the session exists. 
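The three recorder rules above can be sketched together. This is non-normative: the session map is exposed as a readonly field purely for illustration, and initializing `messageCount` to `0` is an assumption not stated by the spec:

```typescript
interface TraceSession {
  startTime: number;
  skillsInjected: Set<string>;
  toolsCalled: Set<string>;
  agentId: string;
  messageCount: number;
}

class TraceAccumulator {
  // Exposed for the sketch; the real accumulator may keep this private.
  readonly sessions = new Map<string, TraceSession>();

  // recordInjection is the only recorder that creates session state.
  recordInjection(sessionKey: string, agentId: string, skillNames: string[]): void {
    let session = this.sessions.get(sessionKey);
    if (!session) {
      session = {
        startTime: Date.now(),
        skillsInjected: new Set(),
        toolsCalled: new Set(),
        agentId,
        messageCount: 0,
      };
      this.sessions.set(sessionKey, session);
    }
    for (const name of skillNames) session.skillsInjected.add(name); // Set dedupes
  }

  // Tool calls and message counts only apply to already-known sessions.
  recordToolCall(sessionKey: string, toolName: string): void {
    this.sessions.get(sessionKey)?.toolsCalled.add(toolName);
  }

  recordMessageCount(sessionKey: string, count: number): void {
    const session = this.sessions.get(sessionKey);
    if (session) session.messageCount = count;
  }
}
```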
+ +#### Scenario: Message count replaces the prior count + +- **WHEN** `recordMessageCount` is called multiple times for the same session +- **THEN** the most recent count becomes the stored `messageCount` + +### Requirement: finalize returns an execution trace, removes session state, and only persists meaningful traces + +`finalize(sessionKey, outcome, errorSummary?)` SHALL return `null` when the session key is unknown. For known sessions it SHALL build an `ExecutionTrace` from the accumulated state, including timestamp, duration, injected skills, called tools, message count, outcome, and optional error summary, then delete the session from the accumulator. The trace SHALL be written to disk only when `messageCount > 2` or at least one tool was called. + +#### Scenario: Unknown session returns null + +- **WHEN** `finalize` is called for a session key that is not being tracked +- **THEN** it returns `null` + +#### Scenario: Finalized session is removed from the accumulator + +- **WHEN** `finalize` succeeds for a tracked session +- **THEN** that session's in-memory state is deleted before the call returns + +#### Scenario: Short idle sessions are not written to disk + +- **WHEN** a finalized session has `messageCount <= 2` and no recorded tools +- **THEN** `finalize` returns the trace object but does not call `writeTrace` + +#### Scenario: Active sessions are persisted + +- **WHEN** a finalized session has more than two messages or at least one recorded tool call +- **THEN** `finalize` writes the trace JSON to disk and returns the trace object + +### Requirement: cleanup removes expired session state based on age + +`cleanup(maxAgeMs = 3600000)` SHALL delete any tracked session whose `startTime` is older than the current time minus `maxAgeMs`. 
+ +#### Scenario: Default cleanup removes sessions older than one hour + +- **WHEN** `cleanup()` runs with no explicit argument +- **THEN** any tracked session older than one hour is removed from the accumulator diff --git a/openspec/specs/types/spec.md b/openspec/specs/types/spec.md new file mode 100644 index 0000000..68b64c4 --- /dev/null +++ b/openspec/specs/types/spec.md @@ -0,0 +1,168 @@ +## Requirements + +### Requirement: Logger type for structured logging + +The `Logger` type SHALL provide four methods: `info(msg)`, `warn(msg)`, `error(msg)`, and an optional `debug?(msg)`. Consumers supply their own implementation; the core never constructs a Logger. + +#### Scenario: Logger with debug method + +- **WHEN** a consumer provides a Logger with `info`, `warn`, `error`, and `debug` +- **THEN** TypeScript accepts the value and all four methods are callable + +#### Scenario: Logger without debug method + +- **WHEN** a consumer provides a Logger with only `info`, `warn`, and `error` +- **THEN** TypeScript accepts the value (debug is optional) and the three required methods are callable + +### Requirement: SkillType discriminates entry kinds + +`SkillType` SHALL be a union of string literals: `"skill"`, `"memory"`, `"tool-guidance"`, `"workflow"`, `"session-learning"`, `"stop-rule"`, and `"rule"`. Each indexed entry carries exactly one SkillType, which determines how consumers present and filter it. + +#### Scenario: Valid SkillType values + +- **WHEN** a value is assigned one of the seven literal strings +- **THEN** TypeScript accepts the assignment without widening to `string` + +### Requirement: IndexedSkill captures an embedded entry + +`IndexedSkill` SHALL contain `name`, `description`, `location` (file path, possibly with `#SectionName`), `type` (SkillType), `embeddings` (array of number arrays), `queries` (search trigger strings), and optional `oneLiner` and `boost`. 
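Read as TypeScript declarations, the SkillType and IndexedSkill requirements imply shapes along these lines — a sketch inferred from the field lists, not the module's verbatim source:

```typescript
type SkillType =
  | "skill"
  | "memory"
  | "tool-guidance"
  | "workflow"
  | "session-learning"
  | "stop-rule"
  | "rule";

interface IndexedSkill {
  name: string;
  description: string;
  location: string;       // file path, possibly suffixed with "#SectionName"
  type: SkillType;
  embeddings: number[][]; // array of embedding vectors
  queries: string[];      // search trigger strings
  oneLiner?: string;
  boost?: number;
}
```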
+ +#### Scenario: IndexedSkill with memory section reference + +- **WHEN** a memory file section named "Project Conventions" at path `/skills/team.md` is indexed +- **THEN** `location` is `"/skills/team.md#Project Conventions"` and `type` is `"memory"` + +### Requirement: SkillSearchResult pairs a skill with match metadata + +`SkillSearchResult` SHALL contain `skill` (IndexedSkill), `score` (number, cosine similarity plus boost), and `bestQueryIndex` (the index into `skill.queries` that had the highest similarity to the search query). + +#### Scenario: Best query index identifies the matching trigger + +- **WHEN** a skill has queries `["deploy steps", "release process"]` and the second query matches best +- **THEN** `bestQueryIndex` is `1` + +### Requirement: ParsedFrontmatter is a partial record with list keys + +`ParsedFrontmatter` SHALL allow `name`, `description`, `queries`, `type`, `paths`, `hooks`, `keywords`, `oneLiner`, and `boost` as known keys, plus an index signature `[key: string]: unknown` for extension. `queries`, `paths`, `hooks`, and `keywords` are string arrays; `boost` is a number. + +#### Scenario: Unknown frontmatter key preserved + +- **WHEN** a skill file contains `custom_field: value` in its frontmatter +- **THEN** `ParsedFrontmatter` carries `custom_field` via the index signature + +### Requirement: CacheData uses version 2 schema + +`CacheData` SHALL have `version: 2` (literal type), `embeddingModel: string`, and `skills: Record<string, CachedSkill>` keyed by file location. + +#### Scenario: Version mismatch returns empty cache + +- **WHEN** `loadCache` reads a file with `version: 1` +- **THEN** the function returns an empty `CacheData` with `version: 2` + +### Requirement: CachedSkill stores serialized entry data + +`CachedSkill` SHALL contain `name`, `description`, `queries`, `embeddings` (number arrays), `mtime` (number), `type` (SkillType), and optional `oneLiner` and `boost`.
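A non-normative sketch of the round-trip between the two shapes follows, restating both interfaces so the block is self-contained. The converter bodies are illustrative; the key fact they encode is that `location` lives in the cache's Record key rather than on the cached value, and `mtime` exists only on the cached side:

```typescript
type SkillType =
  | "skill" | "memory" | "tool-guidance" | "workflow"
  | "session-learning" | "stop-rule" | "rule";

interface IndexedSkill {
  name: string;
  description: string;
  location: string;
  type: SkillType;
  embeddings: number[][];
  queries: string[];
  oneLiner?: string;
  boost?: number;
}

interface CachedSkill {
  name: string;
  description: string;
  queries: string[];
  embeddings: number[][];
  mtime: number;
  type: SkillType;
  oneLiner?: string;
  boost?: number;
}

// Strip location on the way into the cache, re-attach it on the way out.
function toCachedSkill(skill: IndexedSkill, mtime: number): CachedSkill {
  const { location, ...rest } = skill;
  return { ...rest, mtime };
}

function fromCachedSkill(location: string, cached: CachedSkill): IndexedSkill {
  const { mtime, ...rest } = cached;
  return { ...rest, location };
}
```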
+ +#### Scenario: Round-trip conversion preserves all fields + +- **WHEN** an `IndexedSkill` with location "/a.md" is converted via `toCachedSkill` then `fromCachedSkill` +- **THEN** all fields match except `location`, which is passed separately as the Record key + +### Requirement: SessionState tracks per-session rule exposure + +`SessionState` SHALL contain `sessionId: string` and `shownRules: Record` mapping rule location to the timestamp when full content was last injected. + +#### Scenario: Rule shown timestamp recorded + +- **WHEN** a rule at location `/rules/deploy.md` is injected in session "abc" +- **THEN** `shownRules["/rules/deploy.md"]` is set to the injection timestamp + +### Requirement: HookInput carries platform hook payload + +`HookInput` SHALL contain `hook_event_name: string` and optional `session_id`, `transcript_path`, `cwd`, `prompt`, `tool_name`, and `tool_input` fields. `HookOutput` SHALL contain an optional `additionalContext` string. + +#### Scenario: HookInput with tool context + +- **WHEN** a PreToolUse hook fires with `tool_name: "Bash"` and `tool_input: { command: "rm -rf /" }` +- **THEN** `HookInput.tool_name` is `"Bash"` and `HookInput.tool_input` is `{ command: "rm -rf /" }` + +### Requirement: Observation tracks match assessment outcomes + +`Observation` SHALL contain `sessionId`, `prompt`, `score: number`, `queryIndex: number`, `outcome` (`"used" | "ignored" | "corrected" | "missed"`), `diagnosis: string`, and `timestamp: string` (ISO). 
+ +#### Scenario: Observation with "ignored" outcome + +- **WHEN** a skill was matched but the agent did not follow it +- **THEN** `outcome` is `"ignored"` and `diagnosis` describes the context + +### Requirement: EntryTelemetry aggregates per-entry match statistics + +`EntryTelemetry` SHALL contain `matchCount`, `lastMatched`, `firstMatched` (ISO timestamps), `sessionIds` (string array capped at 50), optional `queryHits` (Record of query index to hit count), and optional `observations` (array capped at 100). + +#### Scenario: Session ID cap + +- **WHEN** `recordMatch` is called for the 51st unique session ID +- **THEN** `sessionIds` is trimmed to the 50 most recent entries + +### Requirement: TelemetryData uses version 1 schema + +`TelemetryData` SHALL have `version: 1` (literal type) and `entries: Record<string, EntryTelemetry>` keyed by skill location. + +#### Scenario: Missing telemetry file returns empty data + +- **WHEN** `loadTelemetry` reads a non-existent file +- **THEN** the returned `TelemetryData` has `version: 1` and `entries: {}` + +### Requirement: ExecutionTrace records session lifecycle data + +`ExecutionTrace` SHALL contain `sessionKey`, `agentId`, `timestamp` (ISO), `skillsInjected` (string array), `toolsCalled` (string array), `messageCount`, `durationMs`, `outcome` (`"completed" | "error" | "timeout" | "unknown"`), and optional `errorSummary`. + +#### Scenario: Trace with error outcome + +- **WHEN** an agent session ends with an error +- **THEN** `outcome` is `"error"` and `errorSummary` contains a short description + +### Requirement: ScoringMode controls threshold behavior + +`ScoringMode` SHALL be `"relative" | "absolute"`. In absolute mode each result must individually exceed the threshold. In relative mode, results within `maxDropoff` of the best are included if the best clears the threshold.
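The two modes reduce to a filter over descending scores. This non-normative sketch assumes the input is sorted best-first; the function name and the inclusive/exclusive boundary comparisons are assumptions, not the core's API:

```typescript
type ScoringMode = "relative" | "absolute";

// Assumes `scores` is sorted descending (best match first).
function filterByScore(
  scores: number[],
  mode: ScoringMode,
  threshold: number,
  maxDropoff: number
): number[] {
  if (mode === "absolute") {
    // Each result stands on its own against the threshold.
    return scores.filter((s) => s >= threshold);
  }
  // Relative mode: nothing passes unless the best match clears the
  // threshold; then everything within maxDropoff of the best is kept.
  if (scores.length === 0 || scores[0] < threshold) return [];
  const best = scores[0];
  return scores.filter((s) => best - s <= maxDropoff);
}
```

With `threshold: 0.35` and `maxDropoff: 0.1`, a best score of 0.78 pulls in a 0.72 neighbor (drop of 0.06) but excludes a 0.5 outlier.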
+ +#### Scenario: Relative mode returns cluster + +- **WHEN** `scoringMode` is `"relative"` with `threshold: 0.35` and `maxDropoff: 0.1` +- **THEN** a result scoring 0.72 is included if the best match scores 0.78 + +### Requirement: MemexPaths defines all filesystem paths + +`MemexPaths` SHALL contain `cacheDir`, `modelsDir`, `sessionsDir`, `syncRepoDir`, `projectsDir`, `globalSkillsDir`, `globalRulesDir`, `telemetryPath`, `registryPath`, and `tracesDir` — all strings. Consumers construct this object and pass individual paths; the core never assumes path locations. + +#### Scenario: All path fields required + +- **WHEN** a consumer constructs a `MemexPaths` object +- **THEN** all ten path fields must be provided (no optional path fields) + +### Requirement: MemexCoreConfig has sensible defaults + +`MemexCoreConfig` SHALL require `enabled`, `embeddingModel`, `embeddingBackend` (`"openai" | "local"`), `cacheTimeMs`, `topK`, `threshold`, `scoringMode`, `maxDropoff`, `maxInjectedChars`, `types` (SkillType array), `skillDirs`, and `memoryDirs`. All have defaults provided by `DEFAULT_CORE_CONFIG`. + +#### Scenario: Partial config merged with defaults + +- **WHEN** `resolveCoreConfig({ topK: 5 })` is called +- **THEN** the result has `topK: 5` and all other fields from `DEFAULT_CORE_CONFIG` + +### Requirement: SyncConfig controls cross-device synchronization + +`SyncConfig` SHALL require `enabled`, `repo`, `autoPull`, `autoCommitPush`, and `projectMappings` (Record of string to string). It SHALL include an optional `caseSensitive?: boolean` field — when unset or `false`, project IDs are lowercased across all resolution paths. 
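The default case rule amounts to one normalization step applied before any lookup. `normalizeProjectId` is a hypothetical helper used here to illustrate it, and the `SyncConfig` shape is a sketch derived from the field list above:

```typescript
interface SyncConfig {
  enabled: boolean;
  repo: string;
  autoPull: boolean;
  autoCommitPush: boolean;
  projectMappings: Record<string, string>;
  caseSensitive?: boolean;
}

// When caseSensitive is unset or false, project IDs are lowercased
// before consulting mappings, remotes, or encoded-path fallbacks.
function normalizeProjectId(id: string, config: SyncConfig): string {
  return config.caseSensitive ? id : id.toLowerCase();
}
```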
+ +#### Scenario: Default case handling + +- **WHEN** `SyncConfig` is constructed without `caseSensitive` +- **THEN** project IDs are lowercased across manual mappings, git remotes, and encoded path fallbacks + +### Requirement: ProjectRegistry tracks known directories + +`ProjectRegistry` SHALL have `version: 1` (literal type) and `projects: Record` mapping cwd paths to metadata. + +#### Scenario: Empty registry on missing file + +- **WHEN** `loadRegistry` reads a non-existent file +- **THEN** the returned `ProjectRegistry` has `version: 1` and `projects: {}` \ No newline at end of file