Avoid making your AI run blind grep loops on every reply.
Use code-indexer to index once, then retrieve semantically via MCP.
code-indexer is a Docker-only MCP server for multi-repo code retrieval.
Pipeline:
- Tree-sitter chunking + textual fallback
- Dense embeddings
- Sparse retrieval
- Reranking
- Incremental indexing (mtime + size + content hash)
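The incremental check above (mtime + size + content hash) can be sketched as follows. This is an illustrative sketch, not the project's actual implementation; `file_signature`, `needs_reindex`, and the in-memory `state` dict are hypothetical names.

```python
import hashlib
from pathlib import Path

def file_signature(path: Path) -> tuple:
    """Hypothetical sketch: mtime + size + content hash, as in the pipeline above."""
    stat = path.stat()
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return (stat.st_mtime_ns, stat.st_size, digest)

def needs_reindex(path: Path, state: dict) -> bool:
    """Reindex only when the stored signature differs from the current one."""
    sig = file_signature(path)
    if state.get(str(path)) == sig:
        return False          # unchanged: skip chunking/embedding/upsert
    state[str(path)] = sig    # record the new signature
    return True
```

The content hash is what makes the check robust: a file rewritten with identical bytes is still skipped even if its mtime changed.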
Default models:
- Dense embedder: jinaai/jina-embeddings-v2-base-code
- Sparse encoder: naver/splade-cocondenser-ensembledistil
- Reranker: BAAI/bge-reranker-v2-m3
Find other options:
- Code embedding models: https://huggingface.co/models?search=code%20embedding
- Sparse retrieval models: https://huggingface.co/models?search=splade
- Rerankers: https://huggingface.co/models?search=reranker
- Docker + Docker Compose
- Optional GPU runtime:
- NVIDIA: NVIDIA Container Toolkit
- AMD ROCm: ROCm-compatible Linux host
- Intel iGPU: `/dev/dri` access (runtime support depends on backend libraries)
- Create local config files.

  Windows (PowerShell):

  ```powershell
  Copy-Item .env.example .env
  Copy-Item config/codebases.example.yaml config/codebases.yaml
  ```

  Linux (bash):

  ```bash
  cp .env.example .env
  cp config/codebases.example.yaml config/codebases.yaml
  ```

- Start (CPU/default).

  Windows (PowerShell):

  ```powershell
  docker compose up --build -d
  ```

  Linux (bash):

  ```bash
  docker compose up --build -d
  ```

- MCP endpoint: `http://localhost:${MCP_PORT}/mcp` (default: `http://localhost:8000/mcp`)
`.env` is the single source of truth for:

- model IDs (`EMBEDDING_MODEL`, `SPARSE_MODEL`, `RERANKER_MODEL`)
- runtime tuning (`CHUNK_SIZE`, `USE_RERANKER`, etc.)

`config/codebases.yaml` is only for:

- codebase paths (`codebases`)
- file scope (`default_include_extensions`, `default_exclude_dirs`)
NVIDIA:

Windows (PowerShell):

```powershell
docker compose --compatibility -f docker-compose.yml -f docker-compose.nvidia.yml up --build -d
```

Linux (bash):

```bash
docker compose --compatibility -f docker-compose.yml -f docker-compose.nvidia.yml up --build -d
```

AMD ROCm (uses `mcp-server/Dockerfile.rocm`):

Windows (PowerShell):

```powershell
docker compose -f docker-compose.yml -f docker-compose.amd.yml up --build -d
```

Linux (bash):

```bash
docker compose -f docker-compose.yml -f docker-compose.amd.yml up --build -d
```

Intel iGPU:

Windows (PowerShell):

```powershell
docker compose -f docker-compose.yml -f docker-compose.intel.yml up --build -d
```

Linux (bash):

```bash
docker compose -f docker-compose.yml -f docker-compose.intel.yml up --build -d
```

Notes:

- Intel/AMD acceleration depends on backend libraries in your image/runtime.
- If acceleration is unavailable, set devices to CPU:

```env
EMBEDDING_DEVICE=cpu
SPARSE_DEVICE=cpu
RERANK_DEVICE=cpu
```
Always pass container-visible paths.
- Use `/host/...` for paths outside `HOST_CODEBASES_ROOT`
- Use `/workspaces/...` for paths inside `HOST_CODEBASES_ROOT`
- Do not pass raw host paths like `C:\repo\api` or `/home/user/repo`

Examples:

- `C:\Users\you\repos\api` -> `/host/Users/you/repos/api`
- `/home/you/repos/api` -> `/host/home/you/repos/api`
- `HOST_CODEBASES_ROOT=C:/repos` + `C:/repos/api` -> `/workspaces/api`
```json
{
  "mcpServers": {
    "code-indexer": {
      "transport": "streamable-http",
      "url": "http://localhost:<MCP_PORT>/mcp"
    }
  }
}
```

| Tool | Purpose |
|---|---|
| `list_codebases` | List codebases and indexed chunk counts |
| `register_codebase` | Register a codebase path at runtime |
| `reindex_codebase` | Synchronous reindex (supports `full_reindex`) |
| `delete_codebase_index` | Delete vectors for one codebase index |
| `delete_all_indexes` | Delete all vectors from the collection |
| `start_index_job` | Start async indexing and return `job_id` |
| `cancel_index_job` | Request safe cancellation of one indexing job |
| `stop_codebase_indexing` | Request safe cancellation of active jobs for one codebase |
| `get_index_job_status` | Poll async status/result |
| `list_index_jobs` | List recent jobs |
| `search_code` | Hybrid semantic search + rerank |
| `read_code_file` | Read exact file lines |
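Under the hood, the streamable-http transport carries JSON-RPC 2.0 requests; a tool invocation uses the MCP `tools/call` method. The sketch below only builds the request body for a `search_code` call; session negotiation, headers, and the response stream are omitted, so use a real MCP client in practice. `tools_call_payload` is a hypothetical helper name.

```python
import json

def tools_call_payload(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Sketch of the JSON-RPC 2.0 body an MCP client POSTs for a tool call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

body = tools_call_payload("search_code", {
    "query": "where retry policy is implemented",  # query in English for best retrieval
    "codebase_ids": ["payments"],
    "search_mode": "code_only",
    "limit": 5,
})
```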
For best retrieval quality, always send search_code.query in English.
Suggested system instruction:
```text
When calling MCP tool search_code, always write the query in English.
```
Example 1:
User:
Index my repo and find where retry policy is implemented.
AI tool calls:
```text
register_codebase(codebase_id="payments", path="/host/Users/you/repos/payments", index_now=true)
search_code(query="where retry policy is implemented", codebase_ids=["payments"], search_mode="code_only", limit=5)
```
Example 2 (large repo, timeout-safe):
User:
Reindex the monorepo now.
AI tool calls:
```text
start_index_job(codebase_id="monorepo")
get_index_job_status(job_id="...", include_result=true)
```
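The async pattern in Example 2 amounts to a poll loop on the client side. The sketch below stubs the status call; the `"status"` field names (`pending`, `running`, `completed`) are assumptions for illustration, not the server's documented schema.

```python
import time

def poll_until_done(get_status, job_id: str,
                    interval_s: float = 2.0, timeout_s: float = 3600.0) -> dict:
    """Poll a get_index_job_status-style callable until the job leaves a running state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status.get("status") not in ("pending", "running"):
            return status          # finished, failed, or cancelled
        time.sleep(interval_s)     # keep the interval modest to avoid log spam
    raise TimeoutError(f"indexing job {job_id} did not finish in {timeout_s}s")
```

A short interval (a few seconds) is plenty; the point of the async tools is that each individual MCP call returns quickly, not that you poll aggressively.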
Example 3 (Portuguese user, English retrieval query):
User:
Onde o fallback textual é usado no chunking?
AI tool call:
```text
search_code(query="where textual fallback is used in chunking", codebase_ids=["rag-test"], search_mode="code_only")
```
Measured on 2026-02-28 on this same repository (code-indexer), using MCP tools and default models, on an NVIDIA GeForce RTX 4060 8GB.
Dataset/profile:
- Files indexed: 14
- Chunks after full rebuild: 218
```env
USE_TREE_SITTER=true
USE_SPARSE=true
USE_RERANKER=true
```
| Scenario | Runs | Min (s) | Avg (s) | P50 (s) | Max (s) |
|---|---|---|---|---|---|
| Full rebuild (`full_reindex=true`) | 1 | 7.386 | 7.386 | 7.386 | 7.386 |
| Incremental no-change | 5 | 0.068 | 0.074 | 0.069 | 0.096 |
| Async incremental end-to-end | 1 | 0.140 | 0.140 | 0.140 | 0.140 |
| Async time-to-first-response (`start_index_job`) | 1 | 0.008 | 0.008 | 0.008 | 0.008 |
| Scenario | Runs | Min (s) | Avg (s) | P50 (s) | Max (s) |
|---|---|---|---|---|---|
| `search_code` code_only | 8 | 0.487 | 0.501 | 0.496 | 0.518 |
| `search_code` mixed | 8 | 0.857 | 0.879 | 0.873 | 0.917 |
| `search_code` docs_only | 8 | 0.388 | 0.391 | 0.390 | 0.397 |
- Incremental no-change vs full rebuild: about 99.8x faster.
- Async indexing first response: about 8 ms.
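As a sanity check, the headline speedup follows directly from the averages in the indexing table above:

```python
# Averages taken from the indexing benchmark table above.
full_rebuild_s = 7.386   # full rebuild (full_reindex=true), avg
incremental_s = 0.074    # incremental no-change, avg

speedup = full_rebuild_s / incremental_s
print(f"about {speedup:.1f}x faster")  # -> about 99.8x faster
```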
If indexing takes hours, tune for throughput first:
- Use the async indexing tools (avoids MCP client timeouts):
  - `start_index_job(codebase_id="my-repo")`
  - Poll with `get_index_job_status(job_id="...")`
- Restrict scope in `config/codebases.yaml`:
  - keep only required extensions in `include_extensions`
  - aggressively exclude heavy folders in `exclude_dirs`
- Increase indexing throughput in `.env`:
  - `INDEX_CHUNK_BUFFER_SIZE=512` (or `1024`)
  - `QDRANT_UPSERT_BATCH_SIZE=256` (or `512`)
  - `QDRANT_WRITE_WAIT=false`
  - `INDEX_FILE_WORKERS=4` (or up to CPU cores)
  - `INDEX_MAX_PENDING_FUTURES=16` (or `workers * 4`)
  - `EMBEDDING_BATCH_SIZE=32` (or `64` if VRAM allows)
- If indexing speed is a higher priority than recall, disable sparse indexing: `USE_SPARSE=false`
- Enable progress logs with `INDEX_PROGRESS_EVERY_FILES=250`, then run `docker compose logs -f mcp-server`
- For minified/single-line files, keep dedupe based on character ranges: `DEDUPE_CHAR_OVERLAP_THRESHOLD=0.65`
Recent `reindex_codebase` / `start_index_job` results now include:

- `files_scanned`
- `flush_batches`
- `index_chunk_buffer_size`
- `qdrant_upsert_batch_size`
- `index_file_workers`
- `duration_seconds`
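For illustration, a client could fold those fields into a single log line. The dict below is a hypothetical example shaped after the field names above, not real server output, and `summarize_index_result` is an invented helper.

```python
def summarize_index_result(result: dict) -> str:
    """Format the result fields listed above into one log line (hypothetical values)."""
    return (f"scanned {result['files_scanned']} files in "
            f"{result['duration_seconds']:.1f}s "
            f"({result['flush_batches']} flush batches, "
            f"{result['index_file_workers']} workers)")

example = {
    "files_scanned": 14,
    "flush_batches": 1,
    "index_chunk_buffer_size": 512,
    "qdrant_upsert_batch_size": 256,
    "index_file_workers": 4,
    "duration_seconds": 7.386,
}
line = summarize_index_result(example)
```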
- Confirm the client transport is `streamable-http`
- Set `MCP_STATELESS_HTTP=true`
- Restart the stack

Windows (PowerShell):

```powershell
docker compose up -d --build
```

Linux (bash):

```bash
docker compose up -d --build
```

Cause:
- MCP server process is alive, but endpoint is not ready yet (commonly blocked by long startup indexing).
Fix:
- Keep `MCP_TRANSPORT=streamable-http`
- Set `AUTO_INDEX_ON_STARTUP=false` for immediate startup, or keep it enabled with async startup on the latest version
- Start indexing via the async tools after the server is up: `start_index_job(...)`, then `get_index_job_status(...)`
- Check the mount roots (`HOST_FILESYSTEM_ROOT`, `HOST_CODEBASES_ROOT`)
- Send a `/host/...` or `/workspaces/...` path, not a host-native path
- Qdrant vectors: `qdrant_data`
- Models/cache/state: `model_cache`
- Incremental state: `${INDEX_STATE_DIR}` (default `/models/index_state`)