Problem
Recall@10 drops to 0.07 on monorepos (benchmarked on fiber: 396 files, 4,382 chunks). The retriever's top-10 results get diluted across the entire search space instead of surfacing the specific file the query is about.
Proposed fix: scoped retrieval
When a query mentions or implies a subdirectory, scope the search to that subdirectory first. Fall back to full-repo search if scoped results are insufficient.
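A minimal sketch of the scope-first, fallback-second flow, assuming a search(query, scope, k) callable that wraps the existing vector index and returns hits sorted by descending score. Hit, scoped_search, min_score and the merge policy are hypothetical: the issue only says to expand to the full repo, and this sketch chooses to keep the confident scoped hits ahead of the full-repo ones. The scope argument can come from query-time detection or directly from an explicit scope= tool parameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Hit:
    chunk_id: str
    file_path: str
    score: float  # assumption: higher = more similar


# Hypothetical signature: search(query, scope, k) -> best k hits,
# restricted to file paths under `scope` when scope is not None.
SearchFn = Callable[[str, Optional[str], int], List[Hit]]


def scoped_search(query: str, search: SearchFn, scope: Optional[str],
                  top_k: int = 10, min_score: float = 0.5) -> List[Hit]:
    """Search inside `scope` first; widen to the full repo if too few confident hits."""
    if scope is None:
        # No scope: identical to the current full-repo search.
        return search(query, None, top_k)
    confident = [h for h in search(query, scope, top_k) if h.score >= min_score]
    if len(confident) >= top_k:
        return confident[:top_k]
    # Fallback: keep the confident scoped hits first, then fill the remaining
    # slots from a full-repo search, skipping chunks that are already included.
    seen = {h.chunk_id for h in confident}
    merged = list(confident)
    for hit in search(query, None, top_k + len(confident)):
        if hit.chunk_id not in seen:
            merged.append(hit)
            seen.add(hit.chunk_id)
    return merged[:top_k]


# Toy stand-in for the vector index: fixed scores, the query text is ignored.
INDEX = [
    Hit("c1", "middleware/logger/logger.go", 0.91),
    Hit("c2", "middleware/logger/config.go", 0.74),
    Hit("c3", "app.go", 0.80),
    Hit("c4", "router.go", 0.42),
]


def toy_search(query: str, scope: Optional[str], k: int) -> List[Hit]:
    pool = [h for h in INDEX if scope is None or h.file_path.startswith(scope)]
    return sorted(pool, key=lambda h: h.score, reverse=True)[:k]
```

With top_k=2 the two middleware/ chunks are enough and no fallback happens; with top_k=3 the scoped search runs out of chunks, so app.go is pulled in from the full-repo search after them.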
Approach
Query-time scope detection: extract subdirectory hints from the query (e.g., "how does middleware/logger work?" → scope to middleware/)
File-path prefix filter on vector search: filter chunks by file_path LIKE 'middleware/%' before ranking
Fallback: if scoped search returns fewer than top_k results above the confidence threshold, expand to the full repo
Optional explicit scope: allow context_search("logger", scope="middleware/") as an MCP tool parameter
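One way to implement scope detection and the prefix filter, assuming chunk metadata lives in a SQLite table chunks(id, file_path) next to the vector index (the issue does not say where file_path is stored). known_dirs, detect_scope, like_prefix and scoped_candidates are hypothetical names. Detection only accepts directories that actually exist in the index and returns the outermost match, so the example query maps to middleware/ rather than middleware/logger/, as in the list above. The escaping matters: unescaped, the _ in a directory such as my_pkg/ is a LIKE wildcard and would also match myXpkg/.

```python
import re
import sqlite3
from typing import Iterable, List, Optional, Set, Tuple

# Runs of characters that can appear in a relative path, e.g. "middleware/logger".
PATH_TOKEN = re.compile(r"[A-Za-z0-9_.\-/]+")


def known_dirs(file_paths: Iterable[str]) -> Set[str]:
    """Every directory prefix of every indexed file, e.g. 'middleware/' and 'middleware/logger/'."""
    dirs: Set[str] = set()
    for path in file_paths:
        parts = path.split("/")[:-1]  # drop the file name
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs


def detect_scope(query: str, dirs: Set[str]) -> Optional[str]:
    """Return the outermost indexed directory named in the query, or None."""
    for token in PATH_TOKEN.findall(query):
        parts = [p for p in token.split("/") if p]
        for i in range(1, len(parts) + 1):
            candidate = "/".join(parts[:i]) + "/"
            if candidate in dirs:
                return candidate
    return None


def like_prefix(prefix: str) -> str:
    """LIKE pattern matching `prefix` literally ('_' and '%' are wildcards otherwise)."""
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped + "%"


def scoped_candidates(conn: sqlite3.Connection, scope: str) -> List[Tuple[int, str]]:
    """Chunks under `scope`; only these would be handed to the ranker.

    Note: SQLite's LIKE ignores ASCII case by default, so 'Middleware/' matches too.
    """
    return conn.execute(
        "SELECT id, file_path FROM chunks WHERE file_path LIKE ? ESCAPE '\\' ORDER BY id",
        (like_prefix(scope),),
    ).fetchall()
```

Vector ranking would then run only over the returned candidate ids, and the fallback rule above decides when to drop the filter and search the whole table.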
Expected impact
Monorepo Recall@10 should rise from 0.07 to 0.5+, since most queries target a specific package or directory
No impact on single-repo performance: scope detection returns nothing, so the query falls through to full-repo search
Benchmark plan
Re-run fiber benchmark with scoped retrieval. Target: Recall@10 > 0.50.
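For comparing runs, a sketch of the metric, assuming Recall@10 here means file-level recall: for each query, the fraction of its ground-truth files that appear among the file paths of the top 10 retrieved chunks, averaged over all queries. The issue does not define the metric, so this should be checked against the existing benchmark harness before comparing numbers; recall_at_k and mean_recall_at_k are hypothetical helpers.

```python
from typing import Iterable, List, Set, Tuple


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Share of a query's relevant files that appear among its top-k retrieved file paths."""
    if not relevant:
        raise ValueError("query has no ground-truth files")
    # `retrieved` holds one path per chunk; several chunks of one file count once.
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs: Iterable[Tuple[List[str], Set[str]]], k: int = 10) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per benchmark query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("no queries")
    return sum(scores) / len(scores)
```

Because retrieved holds one path per chunk, a file that contributes several chunks uses up several of the 10 slots, which is one way off-target files can crowd out the file a query is about.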
Related
benchmarks/results/fiber.md (Recall@10 = 0.07)
benchmarks/results/chi.md (Recall@10 = 0.67, less affected)