Skip to content

feat: true incremental vector index with concurrent README fetching#230

Merged
AmintaCCCP merged 16 commits into
mainfrom
feature/incremental-vector-index
Jun 27, 2026
Merged

feat: true incremental vector index with concurrent README fetching#230
AmintaCCCP merged 16 commits into
mainfrom
feature/incremental-vector-index

Conversation

@AmintaCCCP

@AmintaCCCP AmintaCCCP commented Jun 27, 2026

Copy link
Copy Markdown
Owner

改动概览

1. 增量索引真正增量 — vector_indexed_at 字段

  • 全数据流添加 vector_indexed_at:Repository 类型 → SQLite schema → backend CRUD → sync import → repositoryMerge
  • 增量索引只索引未索引或内容已更新的仓库(比较 vector_indexed_at vs last_edited/analyzed_at
  • 重建索引前清除所有 vector_indexed_at,完成后只为成功索引的 repo 设置

2. 未索引数量 badge

  • 增量索引按钮旁显示待索引仓库数量,为 0 时 disabled

3. README 获取并发化

  • 串行 for 循环 → 并发 5 的 Promise.allSettled 批次

4. 同步按钮不再触发自动索引

  • 移除同步后的自动索引代码块(因新仓库没有 analyzed_at,实际是空操作)

审计修复

  • onRepoIndexedgetState() 获取最新数据,避免 stale closure 覆盖并发编辑
  • vectorCount 增量时累加,重建时替换全部

Summary by CodeRabbit

  • New Features

    • Added full end-to-end vector_indexed_at support: schema migration, repository API responses, bulk upsert, sync import/export, and local merge preservation.
    • Updated vector indexing controls with full rebuild and incremental indexing (including accurate “unindexed” counting and status/progress updates).
  • Bug Fixes

    • Improved embedding reliability by retrying with progressively truncated per-item text on length/token-limit errors.
    • Added vector_indexed_at validation for repository updates.
    • Cleanup query now requests different metadata while keeping deletion behavior unchanged.
  • Chores / Tests

    • Persisted/restored vector search status across restarts and added/updated unit tests for fallback and truncation behavior.

- Add vector_indexed_at field across full data flow (type, DB schema, CRUD, sync, merge)
- Make incremental index truly incremental: only index repos without vector_indexed_at
- Add freshness check: re-index when content updated after last index
- Rebuild clears all vector_indexed_at before full re-index
- Concurrent README fetching (5 parallel, Promise.allSettled)
- Show unindexed count badge on incremental index button
- Remove auto-indexing from sync button (was effectively a no-op)
- Fix stale closure in onRepoIndexed callback (use getState())
- Fix vectorCount to accumulate for incremental operations

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds vector_indexed_at storage and API/import propagation, updates vector indexing with incremental callbacks and result IDs, and changes the settings UI, store, and star-sync flow to use the new indexing state.

Changes

Vector indexing timestamp and incremental flow

Layer / File(s) Summary
Repository metadata field
src/types/index.ts, server/src/db/schema.ts, src/utils/repositoryMerge.ts
vector_indexed_at is added to the repository type, database schema, and local metadata merge fields.
Repository API persistence
server/src/routes/repositories.ts, server/src/routes/sync.ts
vector_indexed_at is returned from repository reads and persisted through bulk upsert, partial update, and repository import paths.
Incremental indexing service
src/services/vectorSearchService.ts, src/services/vectorSearchService.test.ts
indexAllRepos adds incremental filtering, concurrent README fetching, per-repo callbacks, indexedRepoIds, and fallback embedding behavior covered by tests.
Indexing controls and state
src/store/useAppStore.ts, src/components/SearchBar.tsx, src/components/SearchBar.test.tsx, src/components/settings/VectorSearchSettings.tsx
vectorSearchStatus is persisted in the store, repository metadata batch updates are added, star-sync stops auto-indexing, and the vector search settings screen computes unindexed counts and drives rebuild or incremental indexing.
Cleanup query metadata
cloudflare-worker/src/index.ts, cloudflare-worker/worker.js
The cleanup query changes its returnMetadata setting to 'none' in both worker implementations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

A rabbit hopped through bits and bloom,
And stamped each repo with time and room.
Vector stars now line up neat,
With counts and syncs in tidy beat.
I twitch my nose, then bink and grin,
For indexed hops to win, win, win. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main changes: true incremental vector indexing and concurrent README fetching.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/incremental-vector-index

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces incremental vector indexing for repositories, adding a vector_indexed_at timestamp to track indexing status across the database, routes, and frontend. It removes background indexing from the search bar, adds concurrency to README fetching, and displays an unindexed repository count in the settings UI. The reviewer identified a bug in the SQL update statement that prevents clearing the vector_indexed_at field, and highlighted a performance issue where updating repositories in a loop triggers excessive state updates and IndexedDB writes, recommending a single batch update using setRepositories instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread server/src/routes/repositories.ts Outdated
last_edited = CASE WHEN excluded.last_edited IS NOT NULL AND excluded.last_edited != '' THEN excluded.last_edited ELSE repositories.last_edited END,
subscribed_to_releases = excluded.subscribed_to_releases
subscribed_to_releases = excluded.subscribed_to_releases,
vector_indexed_at = CASE WHEN excluded.vector_indexed_at IS NOT NULL AND excluded.vector_indexed_at != '' THEN excluded.vector_indexed_at ELSE repositories.vector_indexed_at END

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The CASE WHEN clause prevents vector_indexed_at from ever being cleared (set to NULL) during a rebuild or reset, since any NULL or empty string sent by the frontend will just fall back to the existing database value. Since the frontend already preserves local metadata (including vector_indexed_at) during sync merges, it is safe and correct to directly assign excluded.vector_indexed_at.

Suggested change
vector_indexed_at = CASE WHEN excluded.vector_indexed_at IS NOT NULL AND excluded.vector_indexed_at != '' THEN excluded.vector_indexed_at ELSE repositories.vector_indexed_at END
vector_indexed_at = excluded.vector_indexed_at

Comment on lines 58 to 62
setVectorIndexingState,
repositories,
githubToken,
updateRepository,
} = useAppStore();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Destructure setRepositories from useAppStore so that we can perform efficient batch updates of the repository list instead of calling updateRepository in a loop.

Suggested change
setVectorIndexingState,
repositories,
githubToken,
updateRepository,
} = useAppStore();
setVectorIndexingState,
repositories,
githubToken,
updateRepository,
setRepositories,
} = useAppStore();

Comment on lines +262 to +267
// 2. 清除所有 vector_indexed_at(包括之前失败/不可索引的 repo 的残留值)
for (const repo of repositories) {
if (repo.vector_indexed_at) {
updateRepository({ ...repo, vector_indexed_at: undefined });
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Calling updateRepository in a loop triggers a state update and IndexedDB write for each repository. This is extremely inefficient and can cause significant UI lag or crash the app when there are many repositories. Instead, perform a single batch update using setRepositories.

      // 2. 清除所有 vector_indexed_at(包括之前失败/不可索引的 repo 的残留值)
      const clearedRepos = repositories.map(repo =>
        repo.vector_indexed_at ? { ...repo, vector_indexed_at: undefined } : repo
      );
      setRepositories(clearedRepos);

Comment on lines +284 to +290
// 4. 为成功索引的 repo 设置 vector_indexed_at
const indexedSet = new Set(result.indexedRepoIds);
for (const repo of useAppStore.getState().repositories) {
if (indexedSet.has(repo.id)) {
updateRepository({ ...repo, vector_indexed_at: now });
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Calling updateRepository in a loop triggers a state update and IndexedDB write for each successfully indexed repository. This is extremely inefficient and can cause significant UI lag or crash the app when there are many repositories. Instead, perform a single batch update using setRepositories.

      // 4. 为成功索引的 repo 设置 vector_indexed_at
      const indexedSet = new Set(result.indexedRepoIds);
      const updatedRepos = useAppStore.getState().repositories.map(repo =>
        indexedSet.has(repo.id) ? { ...repo, vector_indexed_at: now } : repo
      );
      setRepositories(updatedRepos);

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/components/settings/VectorSearchSettings.tsx (1)

236-240: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Reuse one GitHub client for README batches.

Creating a new GitHubApiService per README request discards the service’s per-instance rate-limit state during the new concurrent fetch flow. Instantiate it once inside createClients and close over it.

♻️ Proposed refactor
-    const readmeFetcher = githubToken
-      ? (owner: string, repo: string, signal?: AbortSignal) => {
-          const api = new GitHubApiService(githubToken);
-          return api.getRepositoryReadme(owner, repo, signal);
-        }
-      : undefined;
+    const githubApi = githubToken ? new GitHubApiService(githubToken) : null;
+    const readmeFetcher = githubApi
+      ? (owner: string, repo: string, signal?: AbortSignal) =>
+          githubApi.getRepositoryReadme(owner, repo, signal)
+      : undefined;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/settings/VectorSearchSettings.tsx` around lines 236 - 240, The
README batch fetcher currently creates a new GitHubApiService inside
readmeFetcher for every request, which loses per-instance rate-limit state
during concurrent reads. Move the GitHubApiService instantiation into
createClients and have readmeFetcher close over that single shared client, while
keeping the existing getRepositoryReadme usage and githubToken gating intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 254-257: The full rebuild flow in VectorSearchSettings should
prune vectors only for repositories that were successfully rebuilt, not every
current repository ID. Move the cleanup logic out of the early pre-rebuild step
and base it on indexedRepoIds/result.indexed from the rebuild outcome, then call
clients.vectorService.cleanup with only those successful IDs. Update the related
success reporting around result.indexed so cleanup and indexing stay in sync.
- Around line 341-346: The vector count update in VectorSearchSettings should
not add result.indexed for repositories that were already indexed and only
re-upserted. Adjust the setVectorSearchStatus logic to count only newly indexed
repos by checking which items lack a previous vector_indexed_at (or equivalent
state) before incrementing vectorCount, using the existing result,
setVectorIndexingState, and useAppStore.getState() flow as the anchor points.

---

Nitpick comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 236-240: The README batch fetcher currently creates a new
GitHubApiService inside readmeFetcher for every request, which loses
per-instance rate-limit state during concurrent reads. Move the GitHubApiService
instantiation into createClients and have readmeFetcher close over that single
shared client, while keeping the existing getRepositoryReadme usage and
githubToken gating intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b62bc187-8527-4598-899d-04dca1c34d36

📥 Commits

Reviewing files that changed from the base of the PR and between 22c23e4 and 0ca16ac.

📒 Files selected for processing (8)
  • server/src/db/schema.ts
  • server/src/routes/repositories.ts
  • server/src/routes/sync.ts
  • src/components/SearchBar.tsx
  • src/components/settings/VectorSearchSettings.tsx
  • src/services/vectorSearchService.ts
  • src/types/index.ts
  • src/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
  • src/components/SearchBar.tsx

Comment thread src/components/settings/VectorSearchSettings.tsx Outdated
Comment thread src/components/settings/VectorSearchSettings.tsx Outdated
AmintaCCCP and others added 2 commits June 27, 2026 15:49
- Fix A: ON CONFLICT use direct assignment for vector_indexed_at (allow NULL on rebuild)
- Fix B: Add setRepositories to store destructuring
- Fix C/D: Replace per-repo updateRepository loops with batch setRepositories
- Fix E: Move cleanup after rebuild, keep only successfully indexed vectors
- Fix F: vectorCount only counts newly indexed repos (not re-indexed ones)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/routes/repositories.ts (1)

203-204: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Validate vector_indexed_at before persisting it.

Both bulk upsert and PATCH currently accept any JSON value. A number/object can either fail binding or break incremental comparisons that expect an ISO string/null.

🐛 Proposed fix
       if (typeof repo.stargazers_count !== 'number' || repo.stargazers_count < 0) {
         res.status(400).json({ error: 'Each repository must have a valid non-negative stargazers_count', code: 'INVALID_STARGAZERS_COUNT' });
         return;
       }
+      if (
+        'vector_indexed_at' in repo &&
+        repo.vector_indexed_at != null &&
+        typeof repo.vector_indexed_at !== 'string'
+      ) {
+        res.status(400).json({ error: 'vector_indexed_at must be an ISO string or null', code: 'INVALID_VECTOR_INDEXED_AT' });
+        return;
+      }
     }
     const updates = req.body as Record<string, unknown>;
+    if (
+      'vector_indexed_at' in updates &&
+      updates.vector_indexed_at != null &&
+      typeof updates.vector_indexed_at !== 'string'
+    ) {
+      res.status(400).json({ error: 'vector_indexed_at must be an ISO string or null', code: 'INVALID_VECTOR_INDEXED_AT' });
+      return;
+    }
 
     const allowedFields: Record<string, (v: unknown) => unknown> = {
-      vector_indexed_at: (v) => v,
+      vector_indexed_at: (v) => v ?? null,

Also applies to: 238-238

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/repositories.ts` around lines 203 - 204, Validate
repo.vector_indexed_at in the repositories upsert/PATCH flow before saving it,
and only persist a null or a valid ISO date string. Update the logic around the
bulk upsert and PATCH handling in repositories route code (the repository
persistence mapping) so invalid JSON values are rejected or normalized instead
of passing through to comparisons or DB binding.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 274-279: The rebuild flow in VectorSearchSettings should not
continue as successful if vectorService.cleanup fails. In the rebuild handler
around the cleanup try/catch, propagate the cleanup error or otherwise mark the
operation as failed so the success UI/state update and the “rebuilt vectors”
report are not emitted after a failed cleanup. Use the existing rebuild logic,
result.indexedRepoIds, and cleanup call to ensure success is only stamped after
both rebuild and cleanup complete.

---

Outside diff comments:
In `@server/src/routes/repositories.ts`:
- Around line 203-204: Validate repo.vector_indexed_at in the repositories
upsert/PATCH flow before saving it, and only persist a null or a valid ISO date
string. Update the logic around the bulk upsert and PATCH handling in
repositories route code (the repository persistence mapping) so invalid JSON
values are rejected or normalized instead of passing through to comparisons or
DB binding.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e6afa5a-8453-4c94-a5bd-c44117cd47b3

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca16ac and fddc07e.

📒 Files selected for processing (2)
  • server/src/routes/repositories.ts
  • src/components/settings/VectorSearchSettings.tsx

Comment thread src/components/settings/VectorSearchSettings.tsx
AmintaCCCP and others added 2 commits June 27, 2026 15:58
Re-throw cleanup error so rebuild is not reported as successful when
stale vectors cannot be pruned. Includes error message in result.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
SiliconFlow (and similar providers) reject an entire batch with error
20015 "parameter is invalid" when any single text exceeds the model's
token limit. Previously this failed all 100 repos in the batch with no
retry, leaving them permanently unindexed.

Changes:
- Add embedWithFallback: batch first (fast path), fall back to per-item
  embedding only on length-class errors (400/20015/"too long"). Non-length
  errors (auth/5xx) still fail the whole batch.
- Add truncation retry ladder for oversized items: halve text length
  (6000 → 3000 → 1500 → ... → 256 floor) before giving up on an item.
- Rewrite embedding loop to upsert only successful items, count failures
  per-item instead of per-batch.
- Lower default batchSize 100 → 32 to reduce blast radius.

Answering the chunk question: each batch = N repos (now 32), one embed
call + one upsert per batch. This never affected retrieval precision
(each repo is an independent vector keyed by repo id); it only affected
success rate. Smaller batches + per-item isolation fix the failures.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/vectorSearchService.ts (1)

493-494: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Compare against the latest content timestamp.

When last_edited exists, analyzed_at is ignored. Since the embedding text includes analysis-derived fields, a repo re-analyzed after indexing can be skipped incorrectly.

Proposed fix
-      const contentTime = r.last_edited || r.analyzed_at || '';
+      const contentTime = [r.last_edited, r.analyzed_at]
+        .filter((time): time is string => Boolean(time))
+        .reduce((latest, time) => (time > latest ? time : latest), '');
       return contentTime > r.vector_indexed_at;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/vectorSearchService.ts` around lines 493 - 494, In
vectorSearchService, the timestamp comparison in the logic that uses last_edited
and analyzed_at is choosing last_edited first and ignoring analyzed_at when both
exist. Update the comparison to use the latest content timestamp instead of a
fixed priority, so the check against vector_indexed_at reflects whichever of
last_edited or analyzed_at is newer. Locate and adjust the return logic in the
function handling repo reindexing/index freshness.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/vectorSearchService.test.ts`:
- Line 69: The mock callbacks in vectorSearchService.test.ts are declaring
unused parameters, which triggers ESLint. Update the callback passed to
makeClient and the other affected callback so they only accept the parameters
they actually use, removing the unused _purpose and _signal placeholders while
keeping the same behavior in the test helpers.
- Line 94: The test fixture in vectorSearchService.test.ts is too generic
because the thrown message “400 parameter is invalid” hard-codes a broad
detector; update the fixture to use the length-specific 20015 payload or
explicit token/length wording in the relevant test case, and add a separate
negative assertion that generic 400 errors do not trigger the same handling in
the vectorSearchService tests.

In `@src/services/vectorSearchService.ts`:
- Around line 371-378: The length-error matcher in vectorSearchService’s
token/length helper is too broad because it treats generic 400s and “parameter
is invalid” as token-length issues. Narrow the checks in that helper so only
explicit length/token messages are classified as retryable length failures, and
leave unrelated 400/configuration errors to propagate from indexAllRepos instead
of being converted into per-item nulls.
- Around line 452-455: The per-item fallback in vectorSearchService should not
swallow non-length errors after a batch length failure; in the single-item retry
path around the looksLikeLengthError(err) check, rethrow any non-length error
instead of breaking and continuing so 5xx/network/auth failures still fail the
batch. Update the retry loop logic in the relevant vectorSearchService method to
only continue shortening candidates for length-related errors and to propagate
other errors immediately.

---

Outside diff comments:
In `@src/services/vectorSearchService.ts`:
- Around line 493-494: In vectorSearchService, the timestamp comparison in the
logic that uses last_edited and analyzed_at is choosing last_edited first and
ignoring analyzed_at when both exist. Update the comparison to use the latest
content timestamp instead of a fixed priority, so the check against
vector_indexed_at reflects whichever of last_edited or analyzed_at is newer.
Locate and adjust the return logic in the function handling repo
reindexing/index freshness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 608a9171-ef54-4030-bb03-f989bc039fed

📥 Commits

Reviewing files that changed from the base of the PR and between 996d23a and 4fe81d5.

📒 Files selected for processing (2)
  • src/services/vectorSearchService.test.ts
  • src/services/vectorSearchService.ts

Comment thread src/services/vectorSearchService.test.ts Outdated
Comment thread src/services/vectorSearchService.test.ts Outdated
Comment thread src/services/vectorSearchService.ts
Comment thread src/services/vectorSearchService.ts Outdated
AmintaCCCP and others added 2 commits June 27, 2026 16:58
- Fix 1: narrow looksLikeLengthError to explicit length/token keywords
  (20015, "input length", "maximum token", "token limit", "too long").
  Drop generic 400 and "parameter is invalid" so config errors propagate
  instead of being misclassified as retryable length failures.
- Fix 2: in per-item retry, rethrow non-length errors (5xx/network/auth)
  instead of swallowing them with break, so the whole batch fails fast.
- Fix 3: freshness check uses the newer of last_edited/analyzed_at
  (sorted pop) instead of fixed last_edited-first priority.
- Fix 4: remove unused _purpose/_signal params from test mock callbacks.
- Fix 5: use explicit 20015 payload in length fixtures; add negative
  assertion that generic 400 errors do NOT trigger per-item fallback.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Problem: incremental index reported "784 failed" with error
"Cannot read properties of undefined (reading vectorCount)".

Root cause: useAppStore initial state had no vectorSearchStatus field,
so it was undefined unless the user had run "Test Worker Connection" first.
handleIncrementalIndex read .vectorCount without optional chaining, threw,
and the catch block then overwrote the successful index result with
errors: repositories.length — marking actually-succeeded repos as failed.

Fixes:
- Initialize vectorSearchStatus in store ({ connected, vectorCount, dimensions }).
- Use optional chaining (?.vectorCount ?? 0) for the prevCount read.
- Wrap setVectorSearchStatus in its own try/catch so a status-update
  failure cannot roll back the already-succeeded indexing result.
- Rebuild catch now also surfaces the error message.

UI: unify button/badge styling — incremental index + abort buttons now use
rounded-lg (matching Rebuild), badge padding px-1.5 → px-2 (matches project
badge convention in SearchBar).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/routes/repositories.ts (1)

226-239: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate vector_indexed_at before writing it.

Line 238 currently persists whatever value the client sends. If this column gets a number/boolean/object instead of null | ISO string, the later comparisons in src/components/settings/VectorSearchSettings.tsx and src/services/vectorSearchService.ts will misclassify repos as indexed or stale. Normalize empty values to null and reject non-string timestamps with a 400 here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/repositories.ts` around lines 226 - 239, The repository
update path currently writes vector_indexed_at without validating its shape, so
adjust the allowedFields handling in repositories.ts to normalize empty values
to null and reject any non-string timestamp before persistence. Update the
update logic around allowedFields/vector_indexed_at to return a 400 for invalid
inputs, and ensure only null or ISO string values can be stored so
VectorSearchSettings and vectorSearchService keep their indexed/stale checks
accurate.
🧹 Nitpick comments (1)
src/components/settings/VectorSearchSettings.tsx (1)

239-243: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Reuse one GitHub API client for README fetches.

GitHubApiService keeps rate-limit state on the instance; creating a fresh client for every README call loses that state between batches and increases the chance of empty README fallbacks when limits are low.

♻️ Proposed refactor
-    const readmeFetcher = githubToken
-      ? (owner: string, repo: string, signal?: AbortSignal) => {
-          const api = new GitHubApiService(githubToken);
-          return api.getRepositoryReadme(owner, repo, signal);
-        }
+    const githubApi = githubToken ? new GitHubApiService(githubToken) : null;
+    const readmeFetcher = githubApi
+      ? (owner: string, repo: string, signal?: AbortSignal) =>
+          githubApi.getRepositoryReadme(owner, repo, signal)
       : undefined;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/settings/VectorSearchSettings.tsx` around lines 239 - 243,
Reuse a single GitHubApiService instance for README fetching in
VectorSearchSettings instead of constructing a new client inside readmeFetcher
each time. Create the GitHubApiService once when githubToken is available, then
have readmeFetcher call getRepositoryReadme on that shared instance so
rate-limit state is preserved across README batches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Line 61: The metadata stamping flow in VectorSearchSettings is replacing the
active filtered searchResults by calling setRepositories, which resets the
result set to all repositories. Update the vector_indexed_at handling to perform
a batch repository metadata update that maps changes by repo id across both
repositories and the current searchResults state without swapping out the
filtered list. Make the fix in the related vector metadata update paths in
VectorSearchSettings so the search view stays intact after stamping.

In `@src/services/vectorSearchService.ts`:
- Around line 433-436: Retry candidates are not shrinking for moderately long
inputs because the ladder in vectorSearchService uses a fixed retryMaxChars cap,
so truncateForRetry can return the original text multiple times. Update the
candidate construction to base each retry step on original.length or
progressively halve the previous candidate so each retry is strictly shorter,
and keep the logic centered around truncateForRetry and the candidates array in
vectorSearchService.
- Around line 573-581: The batch upsert path in vectorSearchService is marking
every repo in vectorizeVectors as indexed immediately after
vectorService.upsert, but that call can partially succeed, so indexedRepoIds,
onRepoIndexed, and the indexed count may drift. Update the flow to avoid
treating batch success as per-item success: either have vectorService.upsert
return per-vector success results and only push repo IDs for confirmed items, or
re-check index state before adding each repo in this batch-processing block.

In `@src/store/useAppStore.ts`:
- Line 1053: Persist or restore vectorSearchStatus in useAppStore so vectorCount
survives restarts; the current store initialization includes vectorSearchStatus
but the persisted slice/migration path does not, causing incremental counting to
start from zero. Update the persistence/migration logic around useAppStore to
load or rehydrate vectorSearchStatus (especially vectorCount) before any
incremental indexing runs, and ensure any missing persisted status is
initialized during migration.

---

Outside diff comments:
In `@server/src/routes/repositories.ts`:
- Around line 226-239: The repository update path currently writes
vector_indexed_at without validating its shape, so adjust the allowedFields
handling in repositories.ts to normalize empty values to null and reject any
non-string timestamp before persistence. Update the update logic around
allowedFields/vector_indexed_at to return a 400 for invalid inputs, and ensure
only null or ISO string values can be stored so VectorSearchSettings and
vectorSearchService keep their indexed/stale checks accurate.

---

Nitpick comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 239-243: Reuse a single GitHubApiService instance for README
fetching in VectorSearchSettings instead of constructing a new client inside
readmeFetcher each time. Create the GitHubApiService once when githubToken is
available, then have readmeFetcher call getRepositoryReadme on that shared
instance so rate-limit state is preserved across README batches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e7458e31-20a8-454f-af19-d5a4679d2959

📥 Commits

Reviewing files that changed from the base of the PR and between 22c23e4 and 70abc4a.

📒 Files selected for processing (10)
  • server/src/db/schema.ts
  • server/src/routes/repositories.ts
  • server/src/routes/sync.ts
  • src/components/SearchBar.tsx
  • src/components/settings/VectorSearchSettings.tsx
  • src/services/vectorSearchService.test.ts
  • src/services/vectorSearchService.ts
  • src/store/useAppStore.ts
  • src/types/index.ts
  • src/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
  • src/components/SearchBar.tsx

Comment thread src/components/settings/VectorSearchSettings.tsx Outdated
Comment thread src/services/vectorSearchService.ts Outdated
Comment thread src/services/vectorSearchService.ts
Comment thread src/store/useAppStore.ts
AmintaCCCP and others added 5 commits June 27, 2026 17:37
The SearchBar component uses useDialog() (requires DialogProvider) and
useAppStore.getState() (for vectorSearchConfig/embeddingConfigs in effects).
The test mocked useAppStore as a hook but not useDialog or getState, causing
"useDialog must be used within a DialogProvider" errors and swallowed
state-access errors during effect runs.

- Mock useDialog (matches ForkTimeline.test.tsx convention).
- Add vectorSearchConfig/vectorSearchStatus/embeddingConfigs to base state.
- Expose getState() on the mocked store.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Validate vector_indexed_at shape in PATCH endpoint: reject non-string
  (number/boolean/object) with 400, normalize null/empty to null.
- Add updateRepositoriesMetadata store action for batch metadata updates
  that preserves the filtered searchResults (setRepositories was resetting
  searchResults to all repos, breaking the active search view).
- Use updateRepositoriesMetadata in rebuild/incremental handlers.
- Fix retry candidate ladder: dedupe candidates so each retry step is
  strictly shorter (truncateForRetry could return the original when
  text was already under retryMaxChars).
- Use upsert result count for indexedRepoIds/indexed instead of assuming
  full batch success; count shortfall as errors.
- Persist vectorSearchStatus (vectorCount) across restarts; migrate old
  persisted state to initialize the field.
- Reuse a single GitHubApiService instance for README fetches to preserve
  rate-limit state across batches.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both handleRebuildIndex and handleIncrementalIndex captured `repositories`
from the useCallback closure, which could be stale when called after a
sync or AI analysis that updated the store without re-rendering the
settings panel. This caused:

- Badge showing 1 (fresh state) but handler indexing 0 repos (stale)
- Silent "instant complete" with no progress or results
- Newly analyzed repos not being found by AI vector search

Fix: read useAppStore.getState().repositories inside each handler at
call time. Remove `repositories` from both useCallback dependency arrays.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s to brand-indigo

- Cleanup failure (Vectorize returnMetadata:false bug) no longer discards
  successful indexing result — warns instead of throwing
- Incremental index badge: bg-purple-500 → bg-brand-indigo
- Progress bar: bg-purple-500 → bg-brand-indigo

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/components/settings/VectorSearchSettings.tsx (1)

281-286: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Treat cleanup failure as rebuild failure.

If cleanup() fails here, stale vectors remain in the index while Lines 288-299 still stamp a successful rebuild and replace vectorCount with result.indexed.

🔧 Minimal fix
       try {
         await clients.vectorService.cleanup(result.indexedRepoIds.map(String), controller.signal);
       } catch (cleanupErr) {
-        console.warn('Vector cleanup failed after rebuild (non-fatal):', cleanupErr);
+        throw cleanupErr;
       }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/settings/VectorSearchSettings.tsx` around lines 281 - 286,
Treat the post-rebuild cleanup as part of the rebuild flow instead of a
non-fatal step in VectorSearchSettings. In the rebuild handler around
cleanup(result.indexedRepoIds.map(String), controller.signal), remove the silent
catch/console.warn behavior and make cleanup failures abort the rebuild path so
the success state is not stamped when stale vectors may remain. Ensure the
success update logic that follows only runs after both the rebuild and cleanup
succeed, so the rebuild status, vectorCount, and related state in this component
reflect a fully consistent index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/components/SearchBar.test.tsx`:
- Around line 77-80: `SearchBar.test.tsx` is stubbing `useAppStore.getState()`
to always return a fresh `baseStoreState()`, which ignores per-test overrides
set via `mockUseAppStore.mockReturnValue(createStoreState(...))`. Update the
`mockUseAppStore`/`getState` mock so it returns the same mocked store instance
or current mocked state used by `useAppStore`, keeping direct `getState()` calls
in `SearchBar` aligned with each test’s overridden store state.

In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 305-307: The fallback indexing result is using the total
repository count, which can overreport failures for repos that were never
eligible to be indexed. Update the error fallback in VectorSearchSettings’s
indexing flow (including the indexAllRepos path and the similar fallback around
the other referenced block) so it reports only repos actually attempted or
leaves the failed count unset/accurate rather than using
useAppStore.getState().repositories.length. Keep the state update logic in
setVectorIndexingState consistent with the real attempted set.

---

Duplicate comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 281-286: Treat the post-rebuild cleanup as part of the rebuild
flow instead of a non-fatal step in VectorSearchSettings. In the rebuild handler
around cleanup(result.indexedRepoIds.map(String), controller.signal), remove the
silent catch/console.warn behavior and make cleanup failures abort the rebuild
path so the success state is not stamped when stale vectors may remain. Ensure
the success update logic that follows only runs after both the rebuild and
cleanup succeed, so the rebuild status, vectorCount, and related state in this
component reflect a fully consistent index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 97d9a86a-fb3a-4239-adb5-4166d1c046a6

📥 Commits

Reviewing files that changed from the base of the PR and between 70abc4a and db0c549.

📒 Files selected for processing (5)
  • server/src/routes/repositories.ts
  • src/components/SearchBar.test.tsx
  • src/components/settings/VectorSearchSettings.tsx
  • src/services/vectorSearchService.ts
  • src/store/useAppStore.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/routes/repositories.ts
  • src/services/vectorSearchService.ts

Comment thread src/components/SearchBar.test.tsx Outdated
Comment thread src/components/settings/VectorSearchSettings.tsx Outdated
- Worker: returnMetadata: false → 'none' (Vectorize rejects boolean)
- Rebuild handler: restore cleanup as fatal (now safe with Worker fix)
- Error fallback: use indexable count instead of total repositories.length
- Test: getState() now returns current mock state instead of fresh base

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/components/settings/VectorSearchSettings.tsx (1)

372-378: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Use the incremental predicate for fallback error counts.

In incremental mode, indexAllRepos(..., incremental: true) should only attempt repos missing/stale vector_indexed_at, but this fallback reports every analyzed/non-failed repo as an error, including already up-to-date repos. Reuse the same predicate as unindexedCount here.

🐛 Proposed fix
-        const currentRepos = useAppStore.getState().repositories;
-        const indexableCount = currentRepos.filter((r) => r.analyzed_at && !r.analysis_failed).length;
-        const skippedCount = currentRepos.length - indexableCount;
+        const currentRepos = useAppStore.getState().repositories;
+        const attemptedCount = currentRepos.filter((r) => {
+          if (!r.analyzed_at || r.analysis_failed) return false;
+          if (!r.vector_indexed_at) return true;
+          const contentTime = [r.last_edited, r.analyzed_at]
+            .filter((t): t is string => !!t)
+            .sort()
+            .pop() || '';
+          return contentTime > r.vector_indexed_at;
+        }).length;
+        const skippedCount = currentRepos.length - attemptedCount;
         setVectorIndexingState({
           isIndexing: false,
           phase: null,
-          result: { indexed: 0, skipped: skippedCount, errors: indexableCount, error: msg },
+          result: { indexed: 0, skipped: skippedCount, errors: attemptedCount, error: msg },
         });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/settings/VectorSearchSettings.tsx` around lines 372 - 378, The
fallback error counting in VectorSearchSettings’s indexing state is using a
broad analyzed/non-failed repository filter, which overcounts errors in
incremental mode. Update the logic in VectorSearchSettings so the `errors` value
reuses the same predicate as `unindexedCount` from `indexAllRepos(...,
incremental: true)`, ensuring only repos missing or stale `vector_indexed_at`
are counted. Keep the existing state update shape in `setVectorIndexingState`,
but align the fallback count with the incremental selection predicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 372-378: The fallback error counting in VectorSearchSettings’s
indexing state is using a broad analyzed/non-failed repository filter, which
overcounts errors in incremental mode. Update the logic in VectorSearchSettings
so the `errors` value reuses the same predicate as `unindexedCount` from
`indexAllRepos(..., incremental: true)`, ensuring only repos missing or stale
`vector_indexed_at` are counted. Keep the existing state update shape in
`setVectorIndexingState`, but align the fallback count with the incremental
selection predicate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 904f9d47-560a-4783-b6a8-18e25b1cf3a7

📥 Commits

Reviewing files that changed from the base of the PR and between db0c549 and 399f78e.

📒 Files selected for processing (4)
  • cloudflare-worker/src/index.ts
  • cloudflare-worker/worker.js
  • src/components/SearchBar.test.tsx
  • src/components/settings/VectorSearchSettings.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/components/SearchBar.test.tsx

Incremental handler's catch block now reuses the same predicate as
unindexedCount (analyzed + not failed + vector_indexed_at stale)
instead of broad indexable filter.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@AmintaCCCP

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/services/vectorSearchService.ts (1)

385-394: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

The fallback ladder still either jumps straight to 256 chars or does not shrink at all.

truncateForRetry(10000, 6000) currently collapses to 256, while any text already under retryMaxChars produces duplicate candidates and gets no shorter retry. That drops most README context for rescued items and still leaves many token-heavy 1.5k–6k inputs unrecoverable.

Suggested fix
 export function truncateForRetry(text: string, maxChars: number): string {
-  // 已经够短,无需截断
-  if (text.length <= maxChars) return text;
-  // 逐级折半 maxChars 直到:要么 text 能放下,要么触达 256 下限
-  let limit = maxChars;
-  while (limit > 256 && text.length > limit) {
-    limit = Math.floor(limit / 2);
-  }
-  return text.slice(0, Math.max(256, limit));
+  return text.slice(0, Math.max(256, Math.min(text.length, maxChars)));
 }
@@
-    const half = Math.floor(retryMaxChars / 2);
-    const candidates = [
-      original,
-      truncateForRetry(original, retryMaxChars),
-      truncateForRetry(original, half),
-    ].filter((c, idx, arr) => idx === 0 || c.length < (arr[idx - 1]?.length ?? Infinity));
+    const firstLimit = Math.max(256, Math.min(retryMaxChars, Math.floor(original.length / 2)));
+    const secondLimit = Math.max(256, Math.floor(firstLimit / 2));
+    const candidates = Array.from(new Set([
+      original,
+      truncateForRetry(original, firstLimit),
+      truncateForRetry(original, secondLimit),
+    ]));

Also applies to: 432-439

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/vectorSearchService.ts` around lines 385 - 394, The fallback
sizing in truncateForRetry is too aggressive and skips useful intermediate
lengths, so retry candidates either stay unchanged or collapse to 256 too early.
Update truncateForRetry in vectorSearchService so it progressively shrinks text
in smaller steps toward retryMaxChars instead of halving straight to the floor,
and make sure the retry ladder around this helper produces distinct shorter
candidates for already-short inputs and mid-sized 1.5k–6k inputs. Also adjust
the related retry candidate generation logic in the nearby retry path so it uses
truncateForRetry consistently and preserves more context before falling back to
the minimum.
🧹 Nitpick comments (1)
src/services/vectorSearchService.test.ts (1)

57-67: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Pin the expected retry sizes in these tests.

The current assertions still pass when the retry ladder jumps straight from 10k chars to 256, so they will not catch the truncation bug in truncateForRetry / embedWithFallback. Assert the actual intermediate sizes, or at least strictly decreasing candidate lengths, here.

Also applies to: 108-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/vectorSearchService.test.ts` around lines 57 - 67, The
retry-size assertions are too loose and can still pass even if
`truncateForRetry`/`embedWithFallback` skips the expected fallback steps. Update
the tests in `vectorSearchService.test.ts` to pin the actual retry ladder by
asserting the exact intermediate truncation lengths (or a strictly decreasing
sequence of candidate sizes) for the relevant `truncateForRetry` cases and the
`embedWithFallback` flow, using the existing `truncateForRetry` and
`embedWithFallback` symbols to target the behavior precisely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/routes/repositories.ts`:
- Around line 203-204: `vector_indexed_at` is only being checked as a string, so
invalid values can still be written and later break comparisons in
`vectorSearchService` and `VectorSearchSettings`. Add a single shared
normalizer/parser for this field and use it in all write paths in
`repositories.ts`, including the PUT/PATCH update flow and the sync/bulk upsert
import path, so only real ISO timestamps (or null) are persisted.

In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 268-279: The indexing flow in VectorSearchSettings currently only
updates vector_indexed_at after indexAllRepos() completes, so aborted
long-running runs can leave local repo stamps stale; pass an onRepoIndexed
callback into indexAllRepos() in both the full and incremental paths and use it
to patch the store as each repo is confirmed indexed. Update the code around
indexAllRepos, setVectorIndexingState, and the metadata write logic so confirmed
repos are persisted even if the run throws before returning indexedRepoIds.

---

Duplicate comments:
In `@src/services/vectorSearchService.ts`:
- Around line 385-394: The fallback sizing in truncateForRetry is too aggressive
and skips useful intermediate lengths, so retry candidates either stay unchanged
or collapse to 256 too early. Update truncateForRetry in vectorSearchService so
it progressively shrinks text in smaller steps toward retryMaxChars instead of
halving straight to the floor, and make sure the retry ladder around this helper
produces distinct shorter candidates for already-short inputs and mid-sized
1.5k–6k inputs. Also adjust the related retry candidate generation logic in the
nearby retry path so it uses truncateForRetry consistently and preserves more
context before falling back to the minimum.

---

Nitpick comments:
In `@src/services/vectorSearchService.test.ts`:
- Around line 57-67: The retry-size assertions are too loose and can still pass
even if `truncateForRetry`/`embedWithFallback` skips the expected fallback
steps. Update the tests in `vectorSearchService.test.ts` to pin the actual retry
ladder by asserting the exact intermediate truncation lengths (or a strictly
decreasing sequence of candidate sizes) for the relevant `truncateForRetry`
cases and the `embedWithFallback` flow, using the existing `truncateForRetry`
and `embedWithFallback` symbols to target the behavior precisely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 277b573b-7cfe-4c19-b629-8e8987e96871

📥 Commits

Reviewing files that changed from the base of the PR and between 22c23e4 and 2ab26f0.

📒 Files selected for processing (13)
  • cloudflare-worker/src/index.ts
  • cloudflare-worker/worker.js
  • server/src/db/schema.ts
  • server/src/routes/repositories.ts
  • server/src/routes/sync.ts
  • src/components/SearchBar.test.tsx
  • src/components/SearchBar.tsx
  • src/components/settings/VectorSearchSettings.tsx
  • src/services/vectorSearchService.test.ts
  • src/services/vectorSearchService.ts
  • src/store/useAppStore.ts
  • src/types/index.ts
  • src/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
  • src/components/SearchBar.tsx

Comment thread server/src/routes/repositories.ts
Comment thread src/components/settings/VectorSearchSettings.tsx
AmintaCCCP and others added 2 commits June 27, 2026 20:20
- truncateForRetry: simplify to single-step truncation with 256 floor
- embedWithFallback: use progressive candidate ladder (half original,
  half of first) instead of aggressive halving to 256
- vector_indexed_at: add ISO 8601 validation in PATCH endpoint
- Stamp repos during indexing via onRepoIndexed callback (batched at 32)
  so abort doesn't lose confirmed progress
- Pin expected retry sizes in tests

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AmintaCCCP AmintaCCCP merged commit 6aa3ded into main Jun 27, 2026
5 checks passed
@AmintaCCCP AmintaCCCP deleted the feature/incremental-vector-index branch June 27, 2026 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant