feat: true incremental vector index with concurrent README fetching#230
Conversation
- Add vector_indexed_at field across full data flow (type, DB schema, CRUD, sync, merge) - Make incremental index truly incremental: only index repos without vector_indexed_at - Add freshness check: re-index when content updated after last index - Rebuild clears all vector_indexed_at before full re-index - Concurrent README fetching (5 parallel, Promise.allSettled) - Show unindexed count badge on incremental index button - Remove auto-indexing from sync button (was effectively a no-op) - Fix stale closure in onRepoIndexed callback (use getState()) - Fix vectorCount to accumulate for incremental operations Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesVector indexing timestamp and incremental flow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces incremental vector indexing for repositories, adding a vector_indexed_at timestamp to track indexing status across the database, routes, and frontend. It removes background indexing from the search bar, adds concurrency to README fetching, and displays an unindexed repository count in the settings UI. The reviewer identified a bug in the SQL update statement that prevents clearing the vector_indexed_at field, and highlighted a performance issue where updating repositories in a loop triggers excessive state updates and IndexedDB writes, recommending a single batch update using setRepositories instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| last_edited = CASE WHEN excluded.last_edited IS NOT NULL AND excluded.last_edited != '' THEN excluded.last_edited ELSE repositories.last_edited END, | ||
| subscribed_to_releases = excluded.subscribed_to_releases | ||
| subscribed_to_releases = excluded.subscribed_to_releases, | ||
| vector_indexed_at = CASE WHEN excluded.vector_indexed_at IS NOT NULL AND excluded.vector_indexed_at != '' THEN excluded.vector_indexed_at ELSE repositories.vector_indexed_at END |
There was a problem hiding this comment.
The CASE WHEN clause prevents vector_indexed_at from ever being cleared (set to NULL) during a rebuild or reset, since any NULL or empty string sent by the frontend will just fall back to the existing database value. Since the frontend already preserves local metadata (including vector_indexed_at) during sync merges, it is safe and correct to directly assign excluded.vector_indexed_at.
| vector_indexed_at = CASE WHEN excluded.vector_indexed_at IS NOT NULL AND excluded.vector_indexed_at != '' THEN excluded.vector_indexed_at ELSE repositories.vector_indexed_at END | |
| vector_indexed_at = excluded.vector_indexed_at |
| setVectorIndexingState, | ||
| repositories, | ||
| githubToken, | ||
| updateRepository, | ||
| } = useAppStore(); |
There was a problem hiding this comment.
Destructure setRepositories from useAppStore so that we can perform efficient batch updates of the repository list instead of calling updateRepository in a loop.
| setVectorIndexingState, | |
| repositories, | |
| githubToken, | |
| updateRepository, | |
| } = useAppStore(); | |
| setVectorIndexingState, | |
| repositories, | |
| githubToken, | |
| updateRepository, | |
| setRepositories, | |
| } = useAppStore(); |
| // 2. 清除所有 vector_indexed_at(包括之前失败/不可索引的 repo 的残留值) | ||
| for (const repo of repositories) { | ||
| if (repo.vector_indexed_at) { | ||
| updateRepository({ ...repo, vector_indexed_at: undefined }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Calling updateRepository in a loop triggers a state update and IndexedDB write for each repository. This is extremely inefficient and can cause significant UI lag or crash the app when there are many repositories. Instead, perform a single batch update using setRepositories.
// 2. 清除所有 vector_indexed_at(包括之前失败/不可索引的 repo 的残留值)
const clearedRepos = repositories.map(repo =>
repo.vector_indexed_at ? { ...repo, vector_indexed_at: undefined } : repo
);
setRepositories(clearedRepos);
| // 4. 为成功索引的 repo 设置 vector_indexed_at | ||
| const indexedSet = new Set(result.indexedRepoIds); | ||
| for (const repo of useAppStore.getState().repositories) { | ||
| if (indexedSet.has(repo.id)) { | ||
| updateRepository({ ...repo, vector_indexed_at: now }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Calling updateRepository in a loop triggers a state update and IndexedDB write for each successfully indexed repository. This is extremely inefficient and can cause significant UI lag or crash the app when there are many repositories. Instead, perform a single batch update using setRepositories.
// 4. 为成功索引的 repo 设置 vector_indexed_at
const indexedSet = new Set(result.indexedRepoIds);
const updatedRepos = useAppStore.getState().repositories.map(repo =>
indexedSet.has(repo.id) ? { ...repo, vector_indexed_at: now } : repo
);
setRepositories(updatedRepos);
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/components/settings/VectorSearchSettings.tsx (1)
236-240: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winReuse one GitHub client for README batches.
Creating a new
GitHubApiServiceper README request discards the service’s per-instance rate-limit state during the new concurrent fetch flow. Instantiate it once insidecreateClientsand close over it.♻️ Proposed refactor
- const readmeFetcher = githubToken - ? (owner: string, repo: string, signal?: AbortSignal) => { - const api = new GitHubApiService(githubToken); - return api.getRepositoryReadme(owner, repo, signal); - } - : undefined; + const githubApi = githubToken ? new GitHubApiService(githubToken) : null; + const readmeFetcher = githubApi + ? (owner: string, repo: string, signal?: AbortSignal) => + githubApi.getRepositoryReadme(owner, repo, signal) + : undefined;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/settings/VectorSearchSettings.tsx` around lines 236 - 240, The README batch fetcher currently creates a new GitHubApiService inside readmeFetcher for every request, which loses per-instance rate-limit state during concurrent reads. Move the GitHubApiService instantiation into createClients and have readmeFetcher close over that single shared client, while keeping the existing getRepositoryReadme usage and githubToken gating intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 254-257: The full rebuild flow in VectorSearchSettings should
prune vectors only for repositories that were successfully rebuilt, not every
current repository ID. Move the cleanup logic out of the early pre-rebuild step
and base it on indexedRepoIds/result.indexed from the rebuild outcome, then call
clients.vectorService.cleanup with only those successful IDs. Update the related
success reporting around result.indexed so cleanup and indexing stay in sync.
- Around line 341-346: The vector count update in VectorSearchSettings should
not add result.indexed for repositories that were already indexed and only
re-upserted. Adjust the setVectorSearchStatus logic to count only newly indexed
repos by checking which items lack a previous vector_indexed_at (or equivalent
state) before incrementing vectorCount, using the existing result,
setVectorIndexingState, and useAppStore.getState() flow as the anchor points.
---
Nitpick comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 236-240: The README batch fetcher currently creates a new
GitHubApiService inside readmeFetcher for every request, which loses
per-instance rate-limit state during concurrent reads. Move the GitHubApiService
instantiation into createClients and have readmeFetcher close over that single
shared client, while keeping the existing getRepositoryReadme usage and
githubToken gating intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b62bc187-8527-4598-899d-04dca1c34d36
📒 Files selected for processing (8)
server/src/db/schema.tsserver/src/routes/repositories.tsserver/src/routes/sync.tssrc/components/SearchBar.tsxsrc/components/settings/VectorSearchSettings.tsxsrc/services/vectorSearchService.tssrc/types/index.tssrc/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
- src/components/SearchBar.tsx
- Fix A: ON CONFLICT use direct assignment for vector_indexed_at (allow NULL on rebuild) - Fix B: Add setRepositories to store destructuring - Fix C/D: Replace per-repo updateRepository loops with batch setRepositories - Fix E: Move cleanup after rebuild, keep only successfully indexed vectors - Fix F: vectorCount only counts newly indexed repos (not re-indexed ones) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/routes/repositories.ts (1)
203-204: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winValidate
vector_indexed_atbefore persisting it.Both bulk upsert and PATCH currently accept any JSON value. A number/object can either fail binding or break incremental comparisons that expect an ISO string/null.
🐛 Proposed fix
if (typeof repo.stargazers_count !== 'number' || repo.stargazers_count < 0) { res.status(400).json({ error: 'Each repository must have a valid non-negative stargazers_count', code: 'INVALID_STARGAZERS_COUNT' }); return; } + if ( + 'vector_indexed_at' in repo && + repo.vector_indexed_at != null && + typeof repo.vector_indexed_at !== 'string' + ) { + res.status(400).json({ error: 'vector_indexed_at must be an ISO string or null', code: 'INVALID_VECTOR_INDEXED_AT' }); + return; + } }const updates = req.body as Record<string, unknown>; + if ( + 'vector_indexed_at' in updates && + updates.vector_indexed_at != null && + typeof updates.vector_indexed_at !== 'string' + ) { + res.status(400).json({ error: 'vector_indexed_at must be an ISO string or null', code: 'INVALID_VECTOR_INDEXED_AT' }); + return; + } const allowedFields: Record<string, (v: unknown) => unknown> = {- vector_indexed_at: (v) => v, + vector_indexed_at: (v) => v ?? null,Also applies to: 238-238
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/repositories.ts` around lines 203 - 204, Validate repo.vector_indexed_at in the repositories upsert/PATCH flow before saving it, and only persist a null or a valid ISO date string. Update the logic around the bulk upsert and PATCH handling in repositories route code (the repository persistence mapping) so invalid JSON values are rejected or normalized instead of passing through to comparisons or DB binding.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 274-279: The rebuild flow in VectorSearchSettings should not
continue as successful if vectorService.cleanup fails. In the rebuild handler
around the cleanup try/catch, propagate the cleanup error or otherwise mark the
operation as failed so the success UI/state update and the “rebuilt vectors”
report are not emitted after a failed cleanup. Use the existing rebuild logic,
result.indexedRepoIds, and cleanup call to ensure success is only stamped after
both rebuild and cleanup complete.
---
Outside diff comments:
In `@server/src/routes/repositories.ts`:
- Around line 203-204: Validate repo.vector_indexed_at in the repositories
upsert/PATCH flow before saving it, and only persist a null or a valid ISO date
string. Update the logic around the bulk upsert and PATCH handling in
repositories route code (the repository persistence mapping) so invalid JSON
values are rejected or normalized instead of passing through to comparisons or
DB binding.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e6afa5a-8453-4c94-a5bd-c44117cd47b3
📒 Files selected for processing (2)
server/src/routes/repositories.tssrc/components/settings/VectorSearchSettings.tsx
Re-throw cleanup error so rebuild is not reported as successful when stale vectors cannot be pruned. Includes error message in result. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
SiliconFlow (and similar providers) reject an entire batch with error 20015 "parameter is invalid" when any single text exceeds the model's token limit. Previously this failed all 100 repos in the batch with no retry, leaving them permanently unindexed. Changes: - Add embedWithFallback: batch first (fast path), fall back to per-item embedding only on length-class errors (400/20015/"too long"). Non-length errors (auth/5xx) still fail the whole batch. - Add truncation retry ladder for oversized items: halve text length (6000 → 3000 → 1500 → ... → 256 floor) before giving up on an item. - Rewrite embedding loop to upsert only successful items, count failures per-item instead of per-batch. - Lower default batchSize 100 → 32 to reduce blast radius. Answering the chunk question: each batch = N repos (now 32), one embed call + one upsert per batch. This never affected retrieval precision (each repo is an independent vector keyed by repo id); it only affected success rate. Smaller batches + per-item isolation fix the failures. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/vectorSearchService.ts (1)
493-494: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winCompare against the latest content timestamp.
When
last_editedexists,analyzed_atis ignored. Since the embedding text includes analysis-derived fields, a repo re-analyzed after indexing can be skipped incorrectly.Proposed fix
- const contentTime = r.last_edited || r.analyzed_at || ''; + const contentTime = [r.last_edited, r.analyzed_at] + .filter((time): time is string => Boolean(time)) + .reduce((latest, time) => (time > latest ? time : latest), ''); return contentTime > r.vector_indexed_at;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/vectorSearchService.ts` around lines 493 - 494, In vectorSearchService, the timestamp comparison in the logic that uses last_edited and analyzed_at is choosing last_edited first and ignoring analyzed_at when both exist. Update the comparison to use the latest content timestamp instead of a fixed priority, so the check against vector_indexed_at reflects whichever of last_edited or analyzed_at is newer. Locate and adjust the return logic in the function handling repo reindexing/index freshness.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/services/vectorSearchService.test.ts`:
- Line 69: The mock callbacks in vectorSearchService.test.ts are declaring
unused parameters, which triggers ESLint. Update the callback passed to
makeClient and the other affected callback so they only accept the parameters
they actually use, removing the unused _purpose and _signal placeholders while
keeping the same behavior in the test helpers.
- Line 94: The test fixture in vectorSearchService.test.ts is too generic
because the thrown message “400 parameter is invalid” hard-codes a broad
detector; update the fixture to use the length-specific 20015 payload or
explicit token/length wording in the relevant test case, and add a separate
negative assertion that generic 400 errors do not trigger the same handling in
the vectorSearchService tests.
In `@src/services/vectorSearchService.ts`:
- Around line 371-378: The length-error matcher in vectorSearchService’s
token/length helper is too broad because it treats generic 400s and “parameter
is invalid” as token-length issues. Narrow the checks in that helper so only
explicit length/token messages are classified as retryable length failures, and
leave unrelated 400/configuration errors to propagate from indexAllRepos instead
of being converted into per-item nulls.
- Around line 452-455: The per-item fallback in vectorSearchService should not
swallow non-length errors after a batch length failure; in the single-item retry
path around the looksLikeLengthError(err) check, rethrow any non-length error
instead of breaking and continuing so 5xx/network/auth failures still fail the
batch. Update the retry loop logic in the relevant vectorSearchService method to
only continue shortening candidates for length-related errors and to propagate
other errors immediately.
---
Outside diff comments:
In `@src/services/vectorSearchService.ts`:
- Around line 493-494: In vectorSearchService, the timestamp comparison in the
logic that uses last_edited and analyzed_at is choosing last_edited first and
ignoring analyzed_at when both exist. Update the comparison to use the latest
content timestamp instead of a fixed priority, so the check against
vector_indexed_at reflects whichever of last_edited or analyzed_at is newer.
Locate and adjust the return logic in the function handling repo
reindexing/index freshness.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 608a9171-ef54-4030-bb03-f989bc039fed
📒 Files selected for processing (2)
src/services/vectorSearchService.test.tssrc/services/vectorSearchService.ts
- Fix 1: narrow looksLikeLengthError to explicit length/token keywords (20015, "input length", "maximum token", "token limit", "too long"). Drop generic 400 and "parameter is invalid" so config errors propagate instead of being misclassified as retryable length failures. - Fix 2: in per-item retry, rethrow non-length errors (5xx/network/auth) instead of swallowing them with break, so the whole batch fails fast. - Fix 3: freshness check uses the newer of last_edited/analyzed_at (sorted pop) instead of fixed last_edited-first priority. - Fix 4: remove unused _purpose/_signal params from test mock callbacks. - Fix 5: use explicit 20015 payload in length fixtures; add negative assertion that generic 400 errors do NOT trigger per-item fallback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Problem: incremental index reported "784 failed" with error
"Cannot read properties of undefined (reading vectorCount)".
Root cause: useAppStore initial state had no vectorSearchStatus field,
so it was undefined unless the user had run "Test Worker Connection" first.
handleIncrementalIndex read .vectorCount without optional chaining, threw,
and the catch block then overwrote the successful index result with
errors: repositories.length — marking actually-succeeded repos as failed.
Fixes:
- Initialize vectorSearchStatus in store ({ connected, vectorCount, dimensions }).
- Use optional chaining (?.vectorCount ?? 0) for the prevCount read.
- Wrap setVectorSearchStatus in its own try/catch so a status-update
failure cannot roll back the already-succeeded indexing result.
- Rebuild catch now also surfaces the error message.
UI: unify button/badge styling — incremental index + abort buttons now use
rounded-lg (matching Rebuild), badge padding px-1.5 → px-2 (matches project
badge convention in SearchBar).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/routes/repositories.ts (1)
226-239: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winValidate
vector_indexed_atbefore writing it.Line 238 currently persists whatever value the client sends. If this column gets a number/boolean/object instead of
null | ISO string, the later comparisons insrc/components/settings/VectorSearchSettings.tsxandsrc/services/vectorSearchService.tswill misclassify repos as indexed or stale. Normalize empty values tonulland reject non-string timestamps with a 400 here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/repositories.ts` around lines 226 - 239, The repository update path currently writes vector_indexed_at without validating its shape, so adjust the allowedFields handling in repositories.ts to normalize empty values to null and reject any non-string timestamp before persistence. Update the update logic around allowedFields/vector_indexed_at to return a 400 for invalid inputs, and ensure only null or ISO string values can be stored so VectorSearchSettings and vectorSearchService keep their indexed/stale checks accurate.
🧹 Nitpick comments (1)
src/components/settings/VectorSearchSettings.tsx (1)
239-243: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winReuse one GitHub API client for README fetches.
GitHubApiServicekeeps rate-limit state on the instance; creating a fresh client for every README call loses that state between batches and increases the chance of empty README fallbacks when limits are low.♻️ Proposed refactor
- const readmeFetcher = githubToken - ? (owner: string, repo: string, signal?: AbortSignal) => { - const api = new GitHubApiService(githubToken); - return api.getRepositoryReadme(owner, repo, signal); - } + const githubApi = githubToken ? new GitHubApiService(githubToken) : null; + const readmeFetcher = githubApi + ? (owner: string, repo: string, signal?: AbortSignal) => + githubApi.getRepositoryReadme(owner, repo, signal) : undefined;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/settings/VectorSearchSettings.tsx` around lines 239 - 243, Reuse a single GitHubApiService instance for README fetching in VectorSearchSettings instead of constructing a new client inside readmeFetcher each time. Create the GitHubApiService once when githubToken is available, then have readmeFetcher call getRepositoryReadme on that shared instance so rate-limit state is preserved across README batches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Line 61: The metadata stamping flow in VectorSearchSettings is replacing the
active filtered searchResults by calling setRepositories, which resets the
result set to all repositories. Update the vector_indexed_at handling to perform
a batch repository metadata update that maps changes by repo id across both
repositories and the current searchResults state without swapping out the
filtered list. Make the fix in the related vector metadata update paths in
VectorSearchSettings so the search view stays intact after stamping.
In `@src/services/vectorSearchService.ts`:
- Around line 433-436: Retry candidates are not shrinking for moderately long
inputs because the ladder in vectorSearchService uses a fixed retryMaxChars cap,
so truncateForRetry can return the original text multiple times. Update the
candidate construction to base each retry step on original.length or
progressively halve the previous candidate so each retry is strictly shorter,
and keep the logic centered around truncateForRetry and the candidates array in
vectorSearchService.
- Around line 573-581: The batch upsert path in vectorSearchService is marking
every repo in vectorizeVectors as indexed immediately after
vectorService.upsert, but that call can partially succeed, so indexedRepoIds,
onRepoIndexed, and the indexed count may drift. Update the flow to avoid
treating batch success as per-item success: either have vectorService.upsert
return per-vector success results and only push repo IDs for confirmed items, or
re-check index state before adding each repo in this batch-processing block.
In `@src/store/useAppStore.ts`:
- Line 1053: Persist or restore vectorSearchStatus in useAppStore so vectorCount
survives restarts; the current store initialization includes vectorSearchStatus
but the persisted slice/migration path does not, causing incremental counting to
start from zero. Update the persistence/migration logic around useAppStore to
load or rehydrate vectorSearchStatus (especially vectorCount) before any
incremental indexing runs, and ensure any missing persisted status is
initialized during migration.
---
Outside diff comments:
In `@server/src/routes/repositories.ts`:
- Around line 226-239: The repository update path currently writes
vector_indexed_at without validating its shape, so adjust the allowedFields
handling in repositories.ts to normalize empty values to null and reject any
non-string timestamp before persistence. Update the update logic around
allowedFields/vector_indexed_at to return a 400 for invalid inputs, and ensure
only null or ISO string values can be stored so VectorSearchSettings and
vectorSearchService keep their indexed/stale checks accurate.
---
Nitpick comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 239-243: Reuse a single GitHubApiService instance for README
fetching in VectorSearchSettings instead of constructing a new client inside
readmeFetcher each time. Create the GitHubApiService once when githubToken is
available, then have readmeFetcher call getRepositoryReadme on that shared
instance so rate-limit state is preserved across README batches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e7458e31-20a8-454f-af19-d5a4679d2959
📒 Files selected for processing (10)
server/src/db/schema.tsserver/src/routes/repositories.tsserver/src/routes/sync.tssrc/components/SearchBar.tsxsrc/components/settings/VectorSearchSettings.tsxsrc/services/vectorSearchService.test.tssrc/services/vectorSearchService.tssrc/store/useAppStore.tssrc/types/index.tssrc/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
- src/components/SearchBar.tsx
The SearchBar component uses useDialog() (requires DialogProvider) and useAppStore.getState() (for vectorSearchConfig/embeddingConfigs in effects). The test mocked useAppStore as a hook but not useDialog or getState, causing "useDialog must be used within a DialogProvider" errors and swallowed state-access errors during effect runs. - Mock useDialog (matches ForkTimeline.test.tsx convention). - Add vectorSearchConfig/vectorSearchStatus/embeddingConfigs to base state. - Expose getState() on the mocked store. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Validate vector_indexed_at shape in PATCH endpoint: reject non-string (number/boolean/object) with 400, normalize null/empty to null. - Add updateRepositoriesMetadata store action for batch metadata updates that preserves the filtered searchResults (setRepositories was resetting searchResults to all repos, breaking the active search view). - Use updateRepositoriesMetadata in rebuild/incremental handlers. - Fix retry candidate ladder: dedupe candidates so each retry step is strictly shorter (truncateForRetry could return the original when text was already under retryMaxChars). - Use upsert result count for indexedRepoIds/indexed instead of assuming full batch success; count shortfall as errors. - Persist vectorSearchStatus (vectorCount) across restarts; migrate old persisted state to initialize the field. - Reuse a single GitHubApiService instance for README fetches to preserve rate-limit state across batches. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both handleRebuildIndex and handleIncrementalIndex captured `repositories` from the useCallback closure, which could be stale when called after a sync or AI analysis that updated the store without re-rendering the settings panel. This caused: - Badge showing 1 (fresh state) but handler indexing 0 repos (stale) - Silent "instant complete" with no progress or results - Newly analyzed repos not being found by AI vector search Fix: read useAppStore.getState().repositories inside each handler at call time. Remove `repositories` from both useCallback dependency arrays. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s to brand-indigo - Cleanup failure (Vectorize returnMetadata:false bug) no longer discards successful indexing result — warns instead of throwing - Incremental index badge: bg-purple-500 → bg-brand-indigo - Progress bar: bg-purple-500 → bg-brand-indigo Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/components/settings/VectorSearchSettings.tsx (1)
281-286: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winTreat cleanup failure as rebuild failure.
If
cleanup()fails here, stale vectors remain in the index while Lines 288-299 still stamp a successful rebuild and replacevectorCountwithresult.indexed.🔧 Minimal fix
try { await clients.vectorService.cleanup(result.indexedRepoIds.map(String), controller.signal); } catch (cleanupErr) { - console.warn('Vector cleanup failed after rebuild (non-fatal):', cleanupErr); + throw cleanupErr; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/settings/VectorSearchSettings.tsx` around lines 281 - 286, Treat the post-rebuild cleanup as part of the rebuild flow instead of a non-fatal step in VectorSearchSettings. In the rebuild handler around cleanup(result.indexedRepoIds.map(String), controller.signal), remove the silent catch/console.warn behavior and make cleanup failures abort the rebuild path so the success state is not stamped when stale vectors may remain. Ensure the success update logic that follows only runs after both the rebuild and cleanup succeed, so the rebuild status, vectorCount, and related state in this component reflect a fully consistent index.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/SearchBar.test.tsx`:
- Around line 77-80: `SearchBar.test.tsx` is stubbing `useAppStore.getState()`
to always return a fresh `baseStoreState()`, which ignores per-test overrides
set via `mockUseAppStore.mockReturnValue(createStoreState(...))`. Update the
`mockUseAppStore`/`getState` mock so it returns the same mocked store instance
or current mocked state used by `useAppStore`, keeping direct `getState()` calls
in `SearchBar` aligned with each test’s overridden store state.
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 305-307: The fallback indexing result is using the total
repository count, which can overreport failures for repos that were never
eligible to be indexed. Update the error fallback in VectorSearchSettings’s
indexing flow (including the indexAllRepos path and the similar fallback around
the other referenced block) so it reports only repos actually attempted or
leaves the failed count unset/accurate rather than using
useAppStore.getState().repositories.length. Keep the state update logic in
setVectorIndexingState consistent with the real attempted set.
---
Duplicate comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 281-286: Treat the post-rebuild cleanup as part of the rebuild
flow instead of a non-fatal step in VectorSearchSettings. In the rebuild handler
around cleanup(result.indexedRepoIds.map(String), controller.signal), remove the
silent catch/console.warn behavior and make cleanup failures abort the rebuild
path so the success state is not stamped when stale vectors may remain. Ensure
the success update logic that follows only runs after both the rebuild and
cleanup succeed, so the rebuild status, vectorCount, and related state in this
component reflect a fully consistent index.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97d9a86a-fb3a-4239-adb5-4166d1c046a6
📒 Files selected for processing (5)
server/src/routes/repositories.tssrc/components/SearchBar.test.tsxsrc/components/settings/VectorSearchSettings.tsxsrc/services/vectorSearchService.tssrc/store/useAppStore.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- server/src/routes/repositories.ts
- src/services/vectorSearchService.ts
- Worker: returnMetadata: false → 'none' (Vectorize rejects boolean) - Rebuild handler: restore cleanup as fatal (now safe with Worker fix) - Error fallback: use indexable count instead of total repositories.length - Test: getState() now returns current mock state instead of fresh base Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/components/settings/VectorSearchSettings.tsx (1)
372-378: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winUse the incremental predicate for fallback error counts.
In incremental mode,
indexAllRepos(..., incremental: true)should only attempt repos missing/stalevector_indexed_at, but this fallback reports every analyzed/non-failed repo as an error, including already up-to-date repos. Reuse the same predicate asunindexedCounthere.🐛 Proposed fix
- const currentRepos = useAppStore.getState().repositories; - const indexableCount = currentRepos.filter((r) => r.analyzed_at && !r.analysis_failed).length; - const skippedCount = currentRepos.length - indexableCount; + const currentRepos = useAppStore.getState().repositories; + const attemptedCount = currentRepos.filter((r) => { + if (!r.analyzed_at || r.analysis_failed) return false; + if (!r.vector_indexed_at) return true; + const contentTime = [r.last_edited, r.analyzed_at] + .filter((t): t is string => !!t) + .sort() + .pop() || ''; + return contentTime > r.vector_indexed_at; + }).length; + const skippedCount = currentRepos.length - attemptedCount; setVectorIndexingState({ isIndexing: false, phase: null, - result: { indexed: 0, skipped: skippedCount, errors: indexableCount, error: msg }, + result: { indexed: 0, skipped: skippedCount, errors: attemptedCount, error: msg }, });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/settings/VectorSearchSettings.tsx` around lines 372 - 378, The fallback error counting in VectorSearchSettings’s indexing state is using a broad analyzed/non-failed repository filter, which overcounts errors in incremental mode. Update the logic in VectorSearchSettings so the `errors` value reuses the same predicate as `unindexedCount` from `indexAllRepos(..., incremental: true)`, ensuring only repos missing or stale `vector_indexed_at` are counted. Keep the existing state update shape in `setVectorIndexingState`, but align the fallback count with the incremental selection predicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 372-378: The fallback error counting in VectorSearchSettings’s
indexing state is using a broad analyzed/non-failed repository filter, which
overcounts errors in incremental mode. Update the logic in VectorSearchSettings
so the `errors` value reuses the same predicate as `unindexedCount` from
`indexAllRepos(..., incremental: true)`, ensuring only repos missing or stale
`vector_indexed_at` are counted. Keep the existing state update shape in
`setVectorIndexingState`, but align the fallback count with the incremental
selection predicate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 904f9d47-560a-4783-b6a8-18e25b1cf3a7
📒 Files selected for processing (4)
cloudflare-worker/src/index.tscloudflare-worker/worker.jssrc/components/SearchBar.test.tsxsrc/components/settings/VectorSearchSettings.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- src/components/SearchBar.test.tsx
Incremental handler's catch block now reuses the same predicate as unindexedCount (analyzed + not failed + vector_indexed_at stale) instead of broad indexable filter. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/services/vectorSearchService.ts (1)
385-394: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winThe fallback ladder still either jumps straight to 256 chars or does not shrink at all.
truncateForRetry(10000, 6000)currently collapses to 256, while any text already underretryMaxCharsproduces duplicate candidates and gets no shorter retry. That drops most README context for rescued items and still leaves many token-heavy 1.5k–6k inputs unrecoverable.Suggested fix
export function truncateForRetry(text: string, maxChars: number): string { - // 已经够短,无需截断 - if (text.length <= maxChars) return text; - // 逐级折半 maxChars 直到:要么 text 能放下,要么触达 256 下限 - let limit = maxChars; - while (limit > 256 && text.length > limit) { - limit = Math.floor(limit / 2); - } - return text.slice(0, Math.max(256, limit)); + return text.slice(0, Math.max(256, Math.min(text.length, maxChars))); } @@ - const half = Math.floor(retryMaxChars / 2); - const candidates = [ - original, - truncateForRetry(original, retryMaxChars), - truncateForRetry(original, half), - ].filter((c, idx, arr) => idx === 0 || c.length < (arr[idx - 1]?.length ?? Infinity)); + const firstLimit = Math.max(256, Math.min(retryMaxChars, Math.floor(original.length / 2))); + const secondLimit = Math.max(256, Math.floor(firstLimit / 2)); + const candidates = Array.from(new Set([ + original, + truncateForRetry(original, firstLimit), + truncateForRetry(original, secondLimit), + ]));Also applies to: 432-439
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/vectorSearchService.ts` around lines 385 - 394, The fallback sizing in truncateForRetry is too aggressive and skips useful intermediate lengths, so retry candidates either stay unchanged or collapse to 256 too early. Update truncateForRetry in vectorSearchService so it progressively shrinks text in smaller steps toward retryMaxChars instead of halving straight to the floor, and make sure the retry ladder around this helper produces distinct shorter candidates for already-short inputs and mid-sized 1.5k–6k inputs. Also adjust the related retry candidate generation logic in the nearby retry path so it uses truncateForRetry consistently and preserves more context before falling back to the minimum.
🧹 Nitpick comments (1)
src/services/vectorSearchService.test.ts (1)
57-67: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winPin the expected retry sizes in these tests.
The current assertions still pass when the retry ladder jumps straight from 10k chars to 256, so they will not catch the truncation bug in
truncateForRetry/embedWithFallback. Assert the actual intermediate sizes, or at least strictly decreasing candidate lengths, here.Also applies to: 108-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/vectorSearchService.test.ts` around lines 57 - 67, The retry-size assertions are too loose and can still pass even if `truncateForRetry`/`embedWithFallback` skips the expected fallback steps. Update the tests in `vectorSearchService.test.ts` to pin the actual retry ladder by asserting the exact intermediate truncation lengths (or a strictly decreasing sequence of candidate sizes) for the relevant `truncateForRetry` cases and the `embedWithFallback` flow, using the existing `truncateForRetry` and `embedWithFallback` symbols to target the behavior precisely.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/routes/repositories.ts`:
- Around line 203-204: `vector_indexed_at` is only being checked as a string, so
invalid values can still be written and later break comparisons in
`vectorSearchService` and `VectorSearchSettings`. Add a single shared
normalizer/parser for this field and use it in all write paths in
`repositories.ts`, including the PUT/PATCH update flow and the sync/bulk upsert
import path, so only real ISO timestamps (or null) are persisted.
In `@src/components/settings/VectorSearchSettings.tsx`:
- Around line 268-279: The indexing flow in VectorSearchSettings currently only
updates vector_indexed_at after indexAllRepos() completes, so aborted
long-running runs can leave local repo stamps stale; pass an onRepoIndexed
callback into indexAllRepos() in both the full and incremental paths and use it
to patch the store as each repo is confirmed indexed. Update the code around
indexAllRepos, setVectorIndexingState, and the metadata write logic so confirmed
repos are persisted even if the run throws before returning indexedRepoIds.
---
Duplicate comments:
In `@src/services/vectorSearchService.ts`:
- Around line 385-394: The fallback sizing in truncateForRetry is too aggressive
and skips useful intermediate lengths, so retry candidates either stay unchanged
or collapse to 256 too early. Update truncateForRetry in vectorSearchService so
it progressively shrinks text in smaller steps toward retryMaxChars instead of
halving straight to the floor, and make sure the retry ladder around this helper
produces distinct shorter candidates for already-short inputs and mid-sized
1.5k–6k inputs. Also adjust the related retry candidate generation logic in the
nearby retry path so it uses truncateForRetry consistently and preserves more
context before falling back to the minimum.
---
Nitpick comments:
In `@src/services/vectorSearchService.test.ts`:
- Around line 57-67: The retry-size assertions are too loose and can still pass
even if `truncateForRetry`/`embedWithFallback` skips the expected fallback
steps. Update the tests in `vectorSearchService.test.ts` to pin the actual retry
ladder by asserting the exact intermediate truncation lengths (or a strictly
decreasing sequence of candidate sizes) for the relevant `truncateForRetry`
cases and the `embedWithFallback` flow, using the existing `truncateForRetry`
and `embedWithFallback` symbols to target the behavior precisely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 277b573b-7cfe-4c19-b629-8e8987e96871
📒 Files selected for processing (13)
cloudflare-worker/src/index.tscloudflare-worker/worker.jsserver/src/db/schema.tsserver/src/routes/repositories.tsserver/src/routes/sync.tssrc/components/SearchBar.test.tsxsrc/components/SearchBar.tsxsrc/components/settings/VectorSearchSettings.tsxsrc/services/vectorSearchService.test.tssrc/services/vectorSearchService.tssrc/store/useAppStore.tssrc/types/index.tssrc/utils/repositoryMerge.ts
💤 Files with no reviewable changes (1)
- src/components/SearchBar.tsx
- truncateForRetry: simplify to single-step truncation with 256 floor - embedWithFallback: use progressive candidate ladder (half original, half of first) instead of aggressive halving to 256 - vector_indexed_at: add ISO 8601 validation in PATCH endpoint - Stamp repos during indexing via onRepoIndexed callback (batched at 32) so abort doesn't lose confirmed progress - Pin expected retry sizes in tests Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
改动概览
1. 增量索引真正增量 —
vector_indexed_at字段vector_indexed_at:Repository 类型 → SQLite schema → backend CRUD → sync import → repositoryMergevector_indexed_atvslast_edited/analyzed_at)vector_indexed_at,完成后只为成功索引的 repo 设置2. 未索引数量 badge
3. README 获取并发化
for循环 → 并发 5 的Promise.allSettled批次4. 同步按钮不再触发自动索引
analyzed_at,实际是空操作)审计修复
onRepoIndexed用getState()获取最新数据,避免 stale closure 覆盖并发编辑vectorCount增量时累加,重建时替换全部Summary by CodeRabbit
New Features
vector_indexed_atsupport: schema migration, repository API responses, bulk upsert, sync import/export, and local merge preservation.Bug Fixes
vector_indexed_atvalidation for repository updates.Chores / Tests