52 changes: 52 additions & 0 deletions openspec/specs/cache/spec.md
@@ -0,0 +1,52 @@
## Requirements

### Requirement: Cache files use schema version 2 and are keyed by embedding model

The cache schema SHALL be `{ version: 2, embeddingModel, skills }`. `loadCache(cachePath, embeddingModel)` SHALL return an empty valid cache object with `version: 2`, the requested `embeddingModel`, and an empty `skills` map when the cache file is missing, unreadable, malformed, has a different schema version, or was created for a different embedding model.

#### Scenario: Missing or corrupt cache yields an empty cache

- **WHEN** `loadCache(cachePath, embeddingModel)` cannot read or parse the cache file
- **THEN** it returns `{ version: 2, embeddingModel, skills: {} }`

#### Scenario: Model mismatch invalidates the cache

- **WHEN** the on-disk cache was written for a different `embeddingModel`
- **THEN** `loadCache` returns an empty cache for the requested model instead of reusing stored skills
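
The fallback rules above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `SkillCache` and `CachedSkill` shapes and the `emptyCache` helper are assumptions made for the example, and every failure mode collapses into the same empty-cache result.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical shapes: the spec names the fields but not the full types.
interface CachedSkill { mtime: number; [field: string]: unknown }
interface SkillCache {
  version: 2;
  embeddingModel: string;
  skills: Record<string, CachedSkill>;
}

// Hypothetical helper that builds the empty-but-valid cache object.
function emptyCache(embeddingModel: string): SkillCache {
  return { version: 2, embeddingModel, skills: {} };
}

async function loadCache(cachePath: string, embeddingModel: string): Promise<SkillCache> {
  try {
    const parsed: unknown = JSON.parse(await readFile(cachePath, "utf8"));
    if (typeof parsed !== "object" || parsed === null) return emptyCache(embeddingModel);
    const candidate = parsed as Partial<SkillCache>;
    const usable =
      candidate.version === 2 &&
      candidate.embeddingModel === embeddingModel &&
      typeof candidate.skills === "object" &&
      candidate.skills !== null;
    // Wrong schema version or a different model invalidates the stored skills.
    return usable ? (candidate as SkillCache) : emptyCache(embeddingModel);
  } catch {
    // Missing, unreadable, or malformed file: start from an empty cache.
    return emptyCache(embeddingModel);
  }
}
```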

### Requirement: saveCache writes atomically through a temporary file and rename

`saveCache(cachePath, data)` SHALL create the parent directory recursively, write the serialized cache JSON to a temporary file whose name is `<cachePath>.<randomHex>.tmp`, where `<randomHex>` comes from `randomBytes(4).toString("hex")`, and then atomically replace the target path via `rename(tmpPath, cachePath)`.

#### Scenario: Cache writes use a temp-file swap

- **WHEN** `saveCache(cachePath, data)` persists cache data
- **THEN** it writes to a randomly suffixed `.tmp` file first and renames that file to `cachePath`
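
A minimal sketch of the temp-file swap, assuming the cache is serialized with plain `JSON.stringify` (the spec does not say how the JSON is formatted) and that `data` is any JSON-serializable value:

```typescript
import { randomBytes } from "node:crypto";
import { mkdir, rename, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

async function saveCache(cachePath: string, data: unknown): Promise<void> {
  // Create the parent directory if it does not exist yet.
  await mkdir(dirname(cachePath), { recursive: true });
  // Temp name: <cachePath>.<8 hex chars>.tmp
  const tmpPath = `${cachePath}.${randomBytes(4).toString("hex")}.tmp`;
  await writeFile(tmpPath, JSON.stringify(data), "utf8");
  // The rename swaps the finished file into place in a single step.
  await rename(tmpPath, cachePath);
}
```

Because the temporary file lives next to the target, the rename stays on one filesystem, so a reader of `cachePath` sees either the old file or the new one rather than a partial write.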

### Requirement: Cached skills preserve mtime-based reuse metadata

`CachedSkill` entries SHALL store an `mtime` alongside the embedded data so callers can reuse embeddings only when the current file mtime still matches the cached value. `getCachedSkill`, `setCachedSkill`, and `removeCachedSkill` SHALL read, write, and delete cache entries by location key within `cache.skills`.

#### Scenario: Cached entry can gate reuse by file mtime

- **WHEN** a caller retrieves a cached skill whose stored `mtime` matches the current file `mtime`
- **THEN** the caller has the metadata needed to reuse the cached embeddings without re-embedding the file

#### Scenario: Cached entries are keyed by location

- **WHEN** `setCachedSkill(cache, location, skill)` and `getCachedSkill(cache, location)` are used with the same location
- **THEN** the stored `CachedSkill` is returned from `cache.skills[location]`
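
The accessors and the mtime gate can be illustrated like this. The trimmed-down `CachedSkill` and `SkillCache` types and the caller-side `reusableEntry` helper are hypothetical; the spec only requires that the stored `mtime` is available for the comparison.

```typescript
// Simplified, hypothetical shapes for the example.
interface CachedSkill { mtime: number; name: string }
interface SkillCache { skills: Record<string, CachedSkill> }

function getCachedSkill(cache: SkillCache, location: string): CachedSkill | undefined {
  return cache.skills[location];
}

function setCachedSkill(cache: SkillCache, location: string, skill: CachedSkill): void {
  cache.skills[location] = skill;
}

function removeCachedSkill(cache: SkillCache, location: string): void {
  delete cache.skills[location];
}

// Hypothetical caller-side gate: reuse the entry only if the file is unchanged.
function reusableEntry(
  cache: SkillCache,
  location: string,
  currentMtime: number,
): CachedSkill | undefined {
  const hit = getCachedSkill(cache, location);
  return hit !== undefined && hit.mtime === currentMtime ? hit : undefined;
}
```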

### Requirement: Cache conversion strips and restores location around persistence

`toCachedSkill(skill, mtime)` SHALL persist the `IndexedSkill` fields except `location`, because the cache key stores that path separately. `fromCachedSkill(location, cached)` SHALL reconstruct an `IndexedSkill` by restoring the supplied `location` while preserving the cached name, description, queries, embeddings, type, `oneLiner`, and `boost` fields.

#### Scenario: Location is omitted from stored CachedSkill values

- **WHEN** `toCachedSkill(skill, mtime)` converts an `IndexedSkill`
- **THEN** the returned `CachedSkill` includes the supplied `mtime` and skill metadata, but not `location`

#### Scenario: Restoring a cached skill reinstates the location key

- **WHEN** `fromCachedSkill(location, cached)` converts a stored cache entry back to an `IndexedSkill`
- **THEN** the returned skill's `location` is the supplied key and its remaining fields come from `cached`
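
One way to express the strip-and-restore round trip is with object rest destructuring. The field names come from the requirement; the concrete field types, and the choice to drop `mtime` when restoring, are assumptions of this sketch.

```typescript
// Field names follow the spec; the concrete types are assumptions.
interface IndexedSkill {
  location: string;
  name: string;
  description: string;
  queries: string[];
  embeddings: number[][];
  type: string;
  oneLiner?: string;
  boost?: number;
}
type CachedSkill = Omit<IndexedSkill, "location"> & { mtime: number };

function toCachedSkill(skill: IndexedSkill, mtime: number): CachedSkill {
  // Drop `location`: the cache key already stores that path.
  const { location: _location, ...rest } = skill;
  return { ...rest, mtime };
}

function fromCachedSkill(location: string, cached: CachedSkill): IndexedSkill {
  // Assumption: `mtime` is cache bookkeeping and is not part of IndexedSkill.
  const { mtime: _mtime, ...rest } = cached;
  return { location, ...rest };
}
```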
34 changes: 34 additions & 0 deletions openspec/specs/config/spec.md
@@ -0,0 +1,34 @@
## Requirements

### Requirement: The core config exposes fixed baseline defaults

`DEFAULT_CORE_CONFIG` SHALL provide these defaults: `enabled: true`, `embeddingModel: "Xenova/all-MiniLM-L6-v2"`, `embeddingBackend: "local"`, `cacheTimeMs: 300000`, `topK: 3`, `threshold: 0.35`, `scoringMode: "relative"`, `maxDropoff: 0.1`, `maxInjectedChars: 8000`, `types: ["skill", "memory", "workflow", "session-learning", "rule"]`, `skillDirs: []`, and `memoryDirs: []`.

#### Scenario: Default config is used as the baseline

- **WHEN** the package exports `DEFAULT_CORE_CONFIG`
- **THEN** it contains the documented default values for all core config fields

### Requirement: resolveCoreConfig merges partial config with runtime type checks

`resolveCoreConfig(partial?)` SHALL return a shallow copy of `DEFAULT_CORE_CONFIG` when `partial` is omitted. When `partial` is provided, it SHALL merge field-by-field using runtime checks:
- `enabled` only accepts booleans.
- `embeddingModel` only accepts strings.
- `embeddingBackend` only accepts the literal `"openai"` and otherwise falls back to the default `"local"`.
- Numeric fields (`cacheTimeMs`, `topK`, `threshold`, `maxDropoff`, `maxInjectedChars`) only accept numbers.
- `scoringMode` only accepts the literal `"absolute"` and otherwise falls back to the default `"relative"`.
- `types` accepts any array value as-is.
- `skillDirs` / `memoryDirs` accept arrays whose elements are coerced with `String(...)`.

#### Scenario: Omitted partial returns a cloned default config

- **WHEN** `resolveCoreConfig()` is called without arguments
- **THEN** it returns a new object with the same field values as `DEFAULT_CORE_CONFIG`

#### Scenario: Valid overrides replace defaults

- **WHEN** `resolveCoreConfig(partial)` is called with correctly typed override values
- **THEN** those fields replace the defaults in the returned config

#### Scenario: Invalid override types fall back to defaults

- **WHEN** `resolveCoreConfig(partial)` receives values of the wrong runtime type for a field
- **THEN** that field in the returned config falls back to `DEFAULT_CORE_CONFIG`

#### Scenario: Directory arrays are string-coerced element by element

- **WHEN** `resolveCoreConfig(partial)` receives `skillDirs` or `memoryDirs` as arrays
- **THEN** the returned arrays contain `String(...)` of each supplied element
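
The merge rules can be sketched as below. The `CoreConfig` interface and the `Record<string, unknown>` parameter type are assumptions for the example; the default values and the per-field checks are taken from the two requirements above.

```typescript
// Hypothetical interface; field names and defaults follow the spec.
interface CoreConfig {
  enabled: boolean;
  embeddingModel: string;
  embeddingBackend: "local" | "openai";
  cacheTimeMs: number;
  topK: number;
  threshold: number;
  scoringMode: "relative" | "absolute";
  maxDropoff: number;
  maxInjectedChars: number;
  types: string[];
  skillDirs: string[];
  memoryDirs: string[];
}

const DEFAULT_CORE_CONFIG: CoreConfig = {
  enabled: true,
  embeddingModel: "Xenova/all-MiniLM-L6-v2",
  embeddingBackend: "local",
  cacheTimeMs: 300000,
  topK: 3,
  threshold: 0.35,
  scoringMode: "relative",
  maxDropoff: 0.1,
  maxInjectedChars: 8000,
  types: ["skill", "memory", "workflow", "session-learning", "rule"],
  skillDirs: [],
  memoryDirs: [],
};

function resolveCoreConfig(partial?: Record<string, unknown>): CoreConfig {
  const d = DEFAULT_CORE_CONFIG;
  if (!partial) return { ...d }; // shallow copy of the defaults
  const num = (v: unknown, fallback: number): number => (typeof v === "number" ? v : fallback);
  const strings = (v: unknown, fallback: string[]): string[] =>
    Array.isArray(v) ? v.map(String) : fallback;
  return {
    enabled: typeof partial.enabled === "boolean" ? partial.enabled : d.enabled,
    embeddingModel:
      typeof partial.embeddingModel === "string" ? partial.embeddingModel : d.embeddingModel,
    embeddingBackend: partial.embeddingBackend === "openai" ? "openai" : "local",
    cacheTimeMs: num(partial.cacheTimeMs, d.cacheTimeMs),
    topK: num(partial.topK, d.topK),
    threshold: num(partial.threshold, d.threshold),
    scoringMode: partial.scoringMode === "absolute" ? "absolute" : "relative",
    maxDropoff: num(partial.maxDropoff, d.maxDropoff),
    maxInjectedChars: num(partial.maxInjectedChars, d.maxInjectedChars),
    // Any array is accepted as-is; elements are not validated.
    types: Array.isArray(partial.types) ? (partial.types as string[]) : d.types,
    skillDirs: strings(partial.skillDirs, d.skillDirs),
    memoryDirs: strings(partial.memoryDirs, d.memoryDirs),
  };
}
```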
57 changes: 57 additions & 0 deletions openspec/specs/embeddings/spec.md
@@ -0,0 +1,57 @@
## Requirements

### Requirement: Embedding providers implement a shared batch embedding contract

`EmbeddingProvider` SHALL expose a single `embed(texts: string[]): Promise<number[][]>` method. Both built-in providers SHALL accept an array of input strings and return an array of embedding vectors in matching order. When `texts` is empty, both providers SHALL return an empty array without performing backend work.

#### Scenario: Empty input short-circuits

- **WHEN** `embed([])` is called on either `LocalEmbeddingProvider` or `OpenAIEmbeddingProvider`
- **THEN** the provider returns `[]`

### Requirement: LocalEmbeddingProvider lazily initializes a local ONNX feature-extraction pipeline

`LocalEmbeddingProvider` SHALL default its model name to `"Xenova/all-MiniLM-L6-v2"`, optionally accept a `cacheDir`, and lazily initialize its extractor on the first non-empty `embed()` call by memoizing a single `extractorPromise`. Initialization SHALL resolve `@huggingface/transformers` through this fallback chain:
1. Direct `import("@huggingface/transformers")`.
2. `createRequire(...).resolve("@huggingface/transformers")` followed by dynamic import of the resolved path.
3. Dynamic import of an absolute `../node_modules/@huggingface/transformers/src/transformers.js` path relative to the module directory.

If all three resolution paths fail, initialization SHALL throw an install guidance error. When a `cacheDir` is provided, it SHALL be assigned to `transformers.env.cacheDir`. The created pipeline SHALL use task `"feature-extraction"`, the configured model name, and `dtype: "q8"`.

#### Scenario: First embed call initializes the extractor once

- **WHEN** `embed(texts)` is called with non-empty input for the first time on a `LocalEmbeddingProvider`
- **THEN** the provider initializes and memoizes a single feature-extraction pipeline before generating embeddings

#### Scenario: Optional cache directory is propagated

- **WHEN** a `LocalEmbeddingProvider` is constructed with a `cacheDir`
- **THEN** extractor initialization sets `transformers.env.cacheDir` to that directory before creating the pipeline
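
The memoization part of this requirement can be shown in isolation. In the sketch below, `LazyEmbedder`, `Extractor`, and the injected `createExtractor` loader are hypothetical stand-ins: the loader represents whatever resolves the transformers module and builds the pipeline, which is deliberately not reproduced here.

```typescript
// Hypothetical stand-in for the real feature-extraction pipeline.
type Extractor = (texts: string[]) => Promise<number[][]>;

class LazyEmbedder {
  private extractorPromise: Promise<Extractor> | undefined;
  private readonly createExtractor: () => Promise<Extractor>;

  constructor(createExtractor: () => Promise<Extractor>) {
    this.createExtractor = createExtractor;
  }

  async embed(texts: string[]): Promise<number[][]> {
    // Empty input never triggers initialization.
    if (texts.length === 0) return [];
    // Store the promise (not the resolved value) so concurrent first calls
    // share one initialization instead of each starting their own.
    if (this.extractorPromise === undefined) {
      this.extractorPromise = this.createExtractor();
    }
    const extractor = await this.extractorPromise;
    return extractor(texts);
  }
}
```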

### Requirement: OpenAIEmbeddingProvider batches requests to the embeddings API

`OpenAIEmbeddingProvider` SHALL be constructed with a model name and API key. For non-empty inputs, it SHALL call `https://api.openai.com/v1/embeddings` with bearer-token authorization, sending inputs in batches of 2048 strings and placing each returned embedding into the original result order using the response item's `index`. If any HTTP response is non-OK, the provider SHALL read the response body text and throw `Error("OpenAI embeddings API error <status>: <body>")`.

#### Scenario: Large input is split into 2048-item batches

- **WHEN** `embed(texts)` is called with more than 2048 input strings
- **THEN** the provider submits multiple sequential requests, each containing at most 2048 strings

#### Scenario: Non-OK API response raises a descriptive error

- **WHEN** the OpenAI embeddings endpoint responds with a non-OK status
- **THEN** the provider throws an error containing the HTTP status code and response body text
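
A sketch of the batching and index-based reassembly, written as a free function so it can be exercised without network access. The function name `embedWithOpenAI`, the `FetchLike` type, and the injected `fetchImpl` parameter are assumptions for the example; the endpoint, batch size, and error message format come from the requirement, and the request body uses the API's `model` and `input` fields.

```typescript
// Minimal structural type so a fake can be injected; the global `fetch`
// is expected to satisfy it for this usage.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string>; json(): Promise<unknown> }>;

const BATCH_SIZE = 2048;

async function embedWithOpenAI(
  texts: string[],
  model: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<number[][]> {
  if (texts.length === 0) return [];
  const results: number[][] = new Array(texts.length);
  // Batches are sent one after another, never in parallel.
  for (let start = 0; start < texts.length; start += BATCH_SIZE) {
    const batch = texts.slice(start, start + BATCH_SIZE);
    const res = await fetchImpl("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify({ model, input: batch }),
    });
    if (!res.ok) {
      throw new Error(`OpenAI embeddings API error ${res.status}: ${await res.text()}`);
    }
    const payload = (await res.json()) as { data: { index: number; embedding: number[] }[] };
    // `index` is relative to the batch, so offset it by the batch start.
    for (const item of payload.data) {
      results[start + item.index] = item.embedding;
    }
  }
  return results;
}
```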

### Requirement: cosineSimilarity optimizes for pre-normalized vectors and falls back safely

`cosineSimilarity(a, b)` SHALL return `0` when the vectors have different lengths. It SHALL compute the dot product for all equal-length vectors, then use a fast path that returns the dot product directly when both squared norms are within `1e-6` of `1.0`. Otherwise it SHALL compute the full cosine similarity formula `dot / (|a| * |b|)`. If the denominator is zero, it SHALL return `0`.

#### Scenario: Mismatched vector lengths return zero

- **WHEN** `cosineSimilarity(a, b)` is called with vectors of different lengths
- **THEN** the result is `0`

#### Scenario: Normalized vectors use the fast path

- **WHEN** `cosineSimilarity(a, b)` is called with vectors whose squared norms are both within `1e-6` of `1.0`
- **THEN** the result is the raw dot product

#### Scenario: Non-normalized vectors use the full cosine formula

- **WHEN** `cosineSimilarity(a, b)` is called with equal-length vectors that are not both pre-normalized
- **THEN** the result is `dot / (sqrt(normSqA) * sqrt(normSqB))`, or `0` if that denominator is zero
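
The three branches above fit in a single pass over the vectors. This sketch assumes plain `number[]` inputs and reads "within `1e-6`" as a strict `<` comparison; both are choices made for the example.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) return 0;
  let dot = 0;
  let normSqA = 0;
  let normSqB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normSqA += a[i] * a[i];
    normSqB += b[i] * b[i];
  }
  const EPS = 1e-6;
  // Fast path: both vectors are already unit length, so the dot product
  // is the cosine similarity and the square roots can be skipped.
  if (Math.abs(normSqA - 1) < EPS && Math.abs(normSqB - 1) < EPS) return dot;
  const denom = Math.sqrt(normSqA) * Math.sqrt(normSqB);
  return denom === 0 ? 0 : dot / denom;
}
```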
44 changes: 44 additions & 0 deletions openspec/specs/file-lock/spec.md
@@ -0,0 +1,44 @@
## Requirements

### Requirement: Advisory file locking via mkdir

`acquireLock(filePath)` SHALL create an advisory lock directory at `${filePath}.lock` using `mkdir`, which is atomic on all platforms. It returns an unlock function that removes the lock directory. If the lock already exists, it retries with exponential backoff until a timeout is reached.

@cubic-dev-ai cubic-dev-ai Bot Apr 9, 2026


P2: The requirement says "exponential backoff" but the implementation uses a fixed 50ms retry interval. This mismatch could mislead future implementers into adding exponential backoff where none currently exists. Change to "fixed-interval retry" to match the actual code.



#### Scenario: Lock acquired on first attempt

- **WHEN** no lock directory exists for a given file path
- **THEN** `mkdir` succeeds, the lock directory is created, and the returned unlock function removes it

#### Scenario: Lock contention with retry

- **WHEN** another process holds the lock directory
- **THEN** the acquiring process retries every 50ms until the 5-second timeout

#### Scenario: Stale lock detection and recovery

- **WHEN** a lock directory exists whose `mtimeMs` is more than 30 seconds old
- **THEN** the lock is considered stale, force-removed, and acquisition retries immediately

#### Scenario: Lock released between mkdir and stat

- **WHEN** the lock directory is released by another process between the failing `mkdir` and the `stat` check
- **THEN** `stat` throws, the catch block detects the lock was released, and acquisition retries immediately

#### Scenario: Timeout with best-effort fallback

- **WHEN** the 5-second deadline is reached without acquiring the lock
- **THEN** a no-op unlock function is returned and execution proceeds without the lock (best-effort)
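
The scenarios above can be combined into one retry loop. This is a sketch under the timings the scenarios state (50 ms retry, 5-second timeout, 30-second staleness); the constant names, the `Unlock` type, and the `sleep` helper are assumptions, and the unlock function's exact removal call is not specified by the spec.

```typescript
import { mkdir, rm, stat } from "node:fs/promises";

const RETRY_MS = 50;
const TIMEOUT_MS = 5_000;
const STALE_MS = 30_000;

type Unlock = () => Promise<void>;
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function acquireLock(filePath: string): Promise<Unlock> {
  const lockDir = `${filePath}.lock`;
  const deadline = Date.now() + TIMEOUT_MS;
  while (Date.now() < deadline) {
    try {
      // Non-recursive mkdir fails if the directory already exists.
      await mkdir(lockDir);
      return () => rm(lockDir, { recursive: true, force: true });
    } catch {
      try {
        const { mtimeMs } = await stat(lockDir);
        if (Date.now() - mtimeMs > STALE_MS) {
          // Stale lock: force-remove it and retry immediately.
          await rm(lockDir, { recursive: true, force: true });
          continue;
        }
      } catch {
        // The lock vanished between mkdir and stat: retry immediately.
        continue;
      }
      await sleep(RETRY_MS);
    }
  }
  // Best-effort fallback: proceed without the lock after the timeout.
  return async () => {};
}
```

Note that this sketch treats every `mkdir` failure as contention. If `mkdir` fails for another reason, such as a missing parent directory, the loop retries without sleeping until the timeout and then falls back to the no-op unlock.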

### Requirement: withFileLock executes callback under lock

`withFileLock(filePath, fn)` SHALL acquire the lock, execute the callback `fn`, and release the lock in a `finally` block — even if the callback throws.

#### Scenario: Successful locked operation

- **WHEN** `withFileLock("/data/cache.json", async () => { ... })` is called
- **THEN** the lock directory `/data/cache.json.lock` exists during callback execution and is removed afterward

#### Scenario: Callback error releases lock

- **WHEN** the callback throws an error
- **THEN** the lock is released (unlock function called in `finally`) and the error propagates to the caller
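
The `finally`-based release can be sketched independently of how the lock is taken. Here the lock function is injected through a hypothetical `makeWithFileLock` factory so the example stays self-contained; the spec's `withFileLock(filePath, fn)` takes no such parameter.

```typescript
type Unlock = () => Promise<void>;
type Acquire = (filePath: string) => Promise<Unlock>;

// Hypothetical factory: binds an acquire function and returns withFileLock.
function makeWithFileLock(acquireLock: Acquire) {
  return async function withFileLock<T>(filePath: string, fn: () => Promise<T>): Promise<T> {
    const unlock = await acquireLock(filePath);
    try {
      return await fn();
    } finally {
      // Runs on success and on error, so the lock is always released.
      await unlock();
    }
  };
}
```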
79 changes: 79 additions & 0 deletions openspec/specs/git-helpers/spec.md
@@ -0,0 +1,79 @@
## Requirements

### Requirement: Git subprocess wrapper with timeout

`git(args, cwd)` SHALL execute `git` with the given arguments in the specified working directory with a 30-second timeout. It returns `{ stdout, stderr }` on success and throws on non-zero exit codes.

#### Scenario: Successful git command

- **WHEN** `git(["rev-parse", "--git-dir"], "/path/to/repo")` is called in a valid git repo
- **THEN** the function returns `{ stdout: string, stderr: string }` with the command output

#### Scenario: Non-zero exit code

- **WHEN** `git(["checkout", "nonexistent-branch"], "/path/to/repo")` is called and git exits with code 128
- **THEN** the function throws with the error details

### Requirement: isGitRepo detects git repositories

`isGitRepo(dir)` SHALL return `true` if `git rev-parse --git-dir` succeeds in the given directory, `false` otherwise.

#### Scenario: Valid git repository

- **WHEN** `isGitRepo` is called on a directory containing a `.git` folder
- **THEN** the function returns `true`

#### Scenario: Non-git directory

- **WHEN** `isGitRepo` is called on a directory without a `.git` folder
- **THEN** the function returns `false`

### Requirement: hasRemote checks for configured remotes

`hasRemote(dir)` SHALL return `true` if `git remote` produces non-empty output, `false` otherwise.

#### Scenario: Remote configured

- **WHEN** `hasRemote` is called on a repo with `origin` configured
- **THEN** the function returns `true`

#### Scenario: No remotes

- **WHEN** `hasRemote` is called on a freshly `git init`-ed repo with no remotes
- **THEN** the function returns `false`

### Requirement: hasCommits checks for any commits

`hasCommits(dir)` SHALL return `true` if `git rev-parse HEAD` succeeds (at least one commit exists), `false` otherwise.

#### Scenario: Repo with commits

- **WHEN** `hasCommits` is called on a repo that has at least one commit
- **THEN** the function returns `true`

#### Scenario: Empty repo with no commits

- **WHEN** `hasCommits` is called on a freshly `git init`-ed repo with zero commits
- **THEN** the function returns `false`
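
The wrapper and the three predicates above might look like the following, assuming Node's `execFile` is the subprocess mechanism (the spec does not name one). Treating any thrown error as `false` in the predicates is also an assumption of this sketch.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

async function git(args: string[], cwd: string): Promise<{ stdout: string; stderr: string }> {
  // Rejects on a non-zero exit code, a spawn failure, or the 30-second timeout.
  const { stdout, stderr } = await execFileAsync("git", args, { cwd, timeout: 30_000 });
  return { stdout, stderr };
}

async function isGitRepo(dir: string): Promise<boolean> {
  try {
    await git(["rev-parse", "--git-dir"], dir);
    return true;
  } catch {
    return false;
  }
}

async function hasRemote(dir: string): Promise<boolean> {
  try {
    const { stdout } = await git(["remote"], dir);
    return stdout.trim().length > 0;
  } catch {
    return false;
  }
}

async function hasCommits(dir: string): Promise<boolean> {
  try {
    await git(["rev-parse", "HEAD"], dir);
    return true;
  } catch {
    return false;
  }
}
```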

### Requirement: getDefaultBranch resolves the default branch name

`getDefaultBranch(dir)` SHALL determine the default branch name through a three-step cascade:
1. Try `git symbolic-ref refs/remotes/origin/HEAD` and extract the branch name.
2. If that fails, try `git ls-remote --symref origin HEAD` and parse the branch from the ref.
3. If both fail, return `"main"` as the fallback.

#### Scenario: Symbolic ref resolves

- **WHEN** `git symbolic-ref refs/remotes/origin/HEAD` outputs `refs/remotes/origin/main`
- **THEN** `getDefaultBranch` returns `"main"`

#### Scenario: ls-remote fallback

- **WHEN** `symbolic-ref` fails but `ls-remote --symref origin HEAD` outputs `ref: refs/heads/develop`
- **THEN** `getDefaultBranch` returns `"develop"`

#### Scenario: Default fallback

- **WHEN** both `symbolic-ref` and `ls-remote` fail
- **THEN** `getDefaultBranch` returns `"main"`
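
The cascade can be sketched with the git runner passed in as a parameter, which keeps the parsing testable without a repository. The extra `git` parameter and the `GitRunner` type are assumptions for the example; the spec's signature is `getDefaultBranch(dir)`.

```typescript
type GitRunner = (args: string[], cwd: string) => Promise<{ stdout: string; stderr: string }>;

async function getDefaultBranch(dir: string, git: GitRunner): Promise<string> {
  try {
    // Step 1: e.g. "refs/remotes/origin/main" -> "main"
    const { stdout } = await git(["symbolic-ref", "refs/remotes/origin/HEAD"], dir);
    const branch = stdout.trim().replace(/^refs\/remotes\/origin\//, "");
    if (branch) return branch;
  } catch {
    // fall through to ls-remote
  }
  try {
    // Step 2: look for a line like "ref: refs/heads/develop<TAB>HEAD"
    const { stdout } = await git(["ls-remote", "--symref", "origin", "HEAD"], dir);
    const branch = /^ref:\s+refs\/heads\/(\S+)/m.exec(stdout)?.[1];
    if (branch) return branch;
  } catch {
    // fall through to the default
  }
  // Step 3: fixed fallback
  return "main";
}
```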
25 changes: 25 additions & 0 deletions openspec/specs/path-encoder/spec.md
@@ -0,0 +1,25 @@
## Requirements

### Requirement: encodeProjectPath transforms absolute paths to safe directory names

`encodeProjectPath(cwd)` SHALL transform an absolute filesystem path into a directory-name-safe string by replacing `/`, `.`, and `_` characters with `-` (hyphen). Consecutive hyphens are preserved because they encode original dots and separators.

#### Scenario: Typical Unix path

- **WHEN** `encodeProjectPath("/home/user/.myproject")` is called
- **THEN** the result is `"-home-user--myproject"`

#### Scenario: Path with underscores

- **WHEN** `encodeProjectPath("/Users/jim/work/my_project")` is called
- **THEN** the result is `"-Users-jim-work-my-project"`

#### Scenario: Root path

- **WHEN** `encodeProjectPath("/")` is called
- **THEN** the result is `"-"`

#### Scenario: Path used in _local fallback

- **WHEN** `resolveProjectId` falls through to the encoded path fallback for `/home/me/work`
- **THEN** the resulting project ID contains `"_local/-home-me-work"` as the encoded segment
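
The replacement rule is small enough to express as a single regular expression. The sketch below assumes the three listed characters are the only ones rewritten.

```typescript
function encodeProjectPath(cwd: string): string {
  // Each "/", "." and "_" becomes "-"; runs of hyphens are left as they are.
  return cwd.replace(/[/._]/g, "-");
}
```

Because three different characters map to the same hyphen, the encoding is not reversible: `/a/b_c` and `/a/b.c` both encode to `-a-b-c`.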