Srclight runs as a single MCP server process. It indexes repos on local filesystems and serves them to any MCP client (Claude Code, Cursor, etc.).
```
MCP Client ──stdio/sse──→  srclight serve --workspace myworkspace
                                           │
                           ┌───────────────┼───────────────┐
                           ▼               ▼               ▼
                       project-a       project-b       project-c
                       .srclight/      .srclight/      .srclight/
                         index.db        index.db        index.db
                         embeddings.npy  embeddings.npy  embeddings.npy
```
Each repo has its own .srclight/ directory with:
- `index.db` — SQLite FTS5 index (write path, per-symbol CRUD)
- `embeddings.npy` — float32 matrix snapshot for GPU/CPU search
- `embeddings_norms.npy` — pre-computed row norms
- `embeddings_meta.json` — symbol mapping + cache version
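To make the sidecar layout concrete, here is a minimal sketch (not srclight's actual code) of how a snapshot matrix plus pre-computed row norms can answer a cosine-similarity query. The toy `emb` matrix and the `top_k_cosine` helper are hypothetical stand-ins for the real `.npy` files:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, embeddings: np.ndarray,
                 norms: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar rows by cosine similarity.

    Pre-computed row norms avoid re-normalizing the matrix on every query.
    """
    scores = embeddings @ query / (norms * np.linalg.norm(query) + 1e-9)
    return np.argsort(scores)[::-1][:k]

# Toy data standing in for embeddings.npy / embeddings_norms.npy
emb = np.asarray([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype=np.float32)
norms = np.linalg.norm(emb, axis=1)
print(top_k_cosine(np.asarray([1.0, 0.1], dtype=np.float32), emb, norms, k=2))  # → [0 2]
```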
```shell
# From PyPI
pip install srclight

# From source
git clone https://github.com/srclight/srclight.git
cd srclight
pip install -e .
```

Srclight supports two transport modes:
- stdio — one server process per client session (simple, no setup)
- SSE — one persistent server, many clients (recommended for workspaces)
Option A (stdio): each Claude Code session spawns its own srclight process:
```shell
# Add for current project
claude mcp add srclight -- srclight serve --workspace myworkspace

# Add globally (available in all projects)
claude mcp add --scope user srclight -- srclight serve --workspace myworkspace
```

Option B (SSE): run srclight as a persistent background service. This is faster (no cold start per session), supports multiple concurrent clients, and survives restarts.
Create the service file (~/.config/systemd/user/srclight.service):
```ini
[Unit]
Description=Srclight MCP Server (workspace: myworkspace)
After=network.target

[Service]
Type=simple
ExecStart=/path/to/srclight-venv/bin/srclight serve --workspace myworkspace
Restart=on-failure
RestartSec=3
Environment=PATH=/path/to/srclight-venv/bin:/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
```

Enable and start:
```shell
systemctl --user daemon-reload
systemctl --user enable srclight
systemctl --user start srclight

# Verify it's running
systemctl --user status srclight
curl -s http://127.0.0.1:8742/sse   # should stream SSE events
```

Connect Claude Code to the SSE server:

```shell
claude mcp add --transport sse srclight http://127.0.0.1:8742/sse
```

WSL + Windows Claude Code: if Claude Code runs on Windows but srclight runs in WSL, the same `localhost:8742` URL works; WSL2 forwards localhost ports to Windows automatically:
```shell
# Run this in Windows Claude Code (cmd/PowerShell terminal)
claude mcp add --transport sse srclight http://127.0.0.1:8742/sse
```

Recommended: SSE (streamable HTTP). Run srclight as a long-lived server (Option B above, or `srclight serve -p 8742` in a terminal), then point Cursor at it. Config lives in the project's `.cursor/mcp.json` or the user-level `~/.cursor/mcp.json`. Example: `cursor-mcp-example.json`.
- UI: Settings → Tools & MCP → Add new MCP server → Type `streamableHttp`, URL `http://127.0.0.1:8742/sse`. Name it e.g. `srclight`.
- Ensure srclight is running first (e.g. `srclight serve --workspace myworkspace` or your systemd service).
- Restart Cursor completely after adding the server.
Alternatively, you can use stdio (Type `command`, Command: `srclight`, Args: `serve --workspace myworkspace`); SSE is preferred because the server is already warm, so tools don't pay a cold-start penalty.
- If tools feel stuck: Cursor applies a short timeout to MCP tool calls (~60–120s). Srclight uses a 20s timeout for embedding API calls, so search returns quickly or falls back to keyword-only. Prefer SSE so the first request doesn't pay the cold-start cost.
WSL + Windows Cursor: If Cursor runs on Windows and srclight runs in WSL, http://127.0.0.1:8742/sse works the same way — WSL2 forwards localhost to Windows.
```json
{
  "mcpServers": {
    "srclight": {
      "command": "srclight",
      "args": ["serve", "--workspace", "myworkspace"]
    }
  }
}
```

OpenClaw connects to srclight via its built-in mcporter MCP tool server.
Prerequisite: Srclight must be running as an SSE server (Option B above).
```shell
# 1. Add srclight to mcporter config
mcporter config add srclight http://127.0.0.1:8742/sse \
  --transport sse --scope home \
  --description "Srclight deep code indexing"

# 2. Verify the connection
mcporter call srclight.list_projects

# 3. Restart the OpenClaw gateway to pick up the new server
systemctl --user restart openclaw-gateway
```

The OpenClaw agent uses srclight tools via the mcporter skill and exec:
```shell
mcporter call srclight.list_projects
mcporter call srclight.search_symbols query="MyClass"
mcporter call srclight.get_callers symbol_name="lookup" project="my-repo"
mcporter call srclight.hybrid_search query="authentication logic"
```
All 29 srclight tools are available as srclight.<tool_name> through mcporter.
Start a new session and ask:
What projects are in the srclight workspace?
The agent should call list_projects() and show your repos.
Once the MCP server is active, just ask naturally:
| Question | What happens |
|---|---|
| "Compare dictionary lookup in project-a vs project-b" | `hybrid_search("dictionary lookup", project="project-a")` + same for project-b |
| "Show me the TTS architecture" | `semantic_search("text to speech provider")` across all projects |
| "Map the project-a codebase" | `codebase_map(project="project-a")` |
| "Who calls lookup in project-c?" | `get_callers("lookup", project="project-c")` |
| "What changed recently across all repos?" | `recent_changes()` |
The project parameter filters to one repo. Omit it to search all.
- You commit in any repo with hooks installed
- The `post-commit` hook fires (background, non-blocking)
- `srclight index .` runs with `flock` (prevents concurrent re-indexes)
- Changed files are re-parsed (tree-sitter) and the FTS5 indexes are updated
- Output is logged to `.srclight/reindex.log`
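The `flock` step above can be sketched in Python. `run_exclusive` is a hypothetical helper (not srclight's hook code) showing the non-blocking exclusive-lock pattern that lets a second concurrent re-index exit quietly; it relies on POSIX `fcntl`:

```python
import fcntl

def run_exclusive(lock_path: str, job) -> bool:
    """Run job() only if an exclusive lock can be taken; return True if it ran."""
    with open(lock_path, "w") as lock:
        try:
            # LOCK_NB: fail immediately instead of queueing behind the holder
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another re-index is already running; skip quietly
        job()  # e.g. the actual "srclight index ." invocation
        return True
```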
Note: The hook does NOT re-embed. FTS5 search (search_symbols, keyword part of hybrid_search) is always fresh. Semantic search for new/changed symbols requires a manual embed pass (see below).
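For illustration, the always-fresh keyword path boils down to a plain SQLite FTS5 table. This toy example uses a hypothetical two-column schema, not srclight's real one, to show a MATCH query over indexed symbols:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per symbol; re-indexing updates rows in place (the "write path")
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, body)")
conn.execute("INSERT INTO symbols VALUES ('lookup', 'def lookup(key): ...')")
conn.execute("INSERT INTO symbols VALUES ('insert', 'def insert(key, val): ...')")
rows = conn.execute(
    "SELECT name FROM symbols WHERE symbols MATCH 'lookup'").fetchall()
print(rows)  # keyword hits come straight from SQLite, no embeddings involved
```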
- `git checkout other-branch` triggers the `post-checkout` hook
- Only fires on branch checkouts (not file checkouts), and only when HEAD changes
- Same background `srclight index .` as post-commit
- FTS5 indexes are updated for all files that differ between branches
After major refactors, new branches with many new files, or initial setup:
```shell
# Re-embed a single project
cd /path/to/repo
srclight index --embed qwen3-embedding

# Re-embed all projects in workspace
srclight workspace index -w myworkspace --embed qwen3-embedding

# Re-embed just one project via workspace command
srclight workspace index -w myworkspace -p project-name --embed qwen3-embedding
```

Embedding is incremental: only symbols whose `body_hash` changed get re-embedded. The `.npy` sidecar is rebuilt automatically after embedding.
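The incremental selection can be sketched as follows. `symbols_needing_embed` and the SHA-256 hash are illustrative assumptions, not srclight's actual implementation; the point is that only symbols whose body changed are re-sent to the embedding provider:

```python
import hashlib

def symbols_needing_embed(symbols: dict, cached_hashes: dict) -> list:
    """Return names of symbols whose body hash differs from the cached value.

    symbols: {name: body_text}; cached_hashes is updated in place.
    """
    stale = []
    for name, body in symbols.items():
        h = hashlib.sha256(body.encode()).hexdigest()
        if cached_hashes.get(name) != h:
            stale.append(name)      # changed or new: needs a fresh embedding
            cached_hashes[name] = h
    return stale
```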
Srclight indexes non-code files alongside source code. Documents are extracted into searchable symbols (sections, pages, tables) with the same FTS5 indexes and embedding support as code symbols.
| Format | Extension(s) | Install extra | What's extracted |
|---|---|---|---|
| PDF | `.pdf` | `srclight[pdf]` | Pages, tables, heading-based sections |
| Word | `.docx` | `srclight[docs]` | Heading-based sections |
| Excel | `.xlsx` | `srclight[docs]` | Sheets with column metadata |
| HTML | `.html`, `.htm` | `srclight[docs]` | Heading-based sections |
| Images | `.png`, `.jpg`, `.svg`, etc. | `srclight[docs]` | Dimensions, EXIF, optional OCR text |
| CSV/TSV | `.csv`, `.tsv` | (built-in) | Column names, row counts |
| Email | `.eml` | (built-in) | Subject, from/to, body |
| Text/RST | `.txt`, `.rst`, `.log` | (built-in) | Sections (RST headings) or chunks |
| Markdown | `.md` | (built-in) | Heading-based sections |
Install all document extras at once:
```shell
pip install 'srclight[docs,pdf]'
```

Scanned or image-only PDF pages (where `extract_text()` returns nothing) can be OCR'd automatically using PaddleOCR. This is per-page: native-text pages are never touched, and mixed PDFs (some pages scanned, some native) work correctly.
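The per-page decision can be sketched like this. `classify_pages` is a hypothetical helper operating on (extracted_text, embedded_image_count) pairs; it shows why mixed PDFs work correctly, since each page is routed independently:

```python
def classify_pages(pages: list) -> list:
    """Route each page: keep native text, OCR likely-scanned pages, skip blanks.

    pages: list of (extracted_text, embedded_image_count) tuples.
    """
    plan = []
    for text, images in pages:
        if text.strip():
            plan.append("native")   # extract_text() found real text
        elif images > 0:
            plan.append("ocr")      # empty text but has images: likely scanned
        else:
            plan.append("skip")     # genuinely empty page
    return plan
```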
PaddleOCR renders PDF pages to images via pdf2image, which requires the Poppler PDF rendering library:
```shell
# Ubuntu / Debian / WSL
sudo apt install poppler-utils

# macOS
brew install poppler

# Windows (native)
# Download from https://github.com/oschwartz10612/poppler-windows/releases
# Add the bin/ directory to your PATH
```

Then install the Python dependencies:

```shell
pip install 'srclight[pdf,paddleocr]'
```

This installs `paddleocr>=2.8` and `pdf2image>=1.16`. On first use, PaddleOCR downloads its models (~1 GB).
During indexing, for each PDF page where `extract_text()` returns empty and the page has embedded images:

- PaddleOCR is lazily initialized (once per indexer run, only if a scanned page is encountered)
- The page is rasterized at 200 DPI via `pdf2image` + Poppler
- PaddleOCR extracts text from the rendered image
- The OCR'd text becomes a searchable "page" symbol, same as native text
If paddleocr or pdf2image are not installed, scanned pages are silently skipped (no error). Native-text PDFs work fine without these dependencies.
If an agent needs to enable scanned-PDF indexing, run these commands:
```shell
# 1. Check if poppler is installed
which pdftoppm   # Linux/macOS: should return a path
# If missing: sudo apt install poppler-utils (or brew install poppler)

# 2. Install Python deps into srclight's environment
pip install 'srclight[pdf,paddleocr]'

# 3. Verify the install
python -c "import paddleocr; print('paddleocr OK')"
python -c "import pdf2image; print('pdf2image OK')"

# 4. Re-index the project to pick up scanned PDFs
srclight index
# Or with embeddings:
srclight index --embed qwen3-embedding
```

PaddleOCR defaults to CPU. For GPU acceleration:
```shell
# Check if an NVIDIA GPU is available
nvidia-smi   # shows GPU model, driver, CUDA version

# Install PaddlePaddle with GPU support (CUDA 11.8 or 12.x)
pip install paddlepaddle-gpu   # replaces the CPU-only paddlepaddle

# Verify GPU is available to PaddlePaddle
python -c "import paddle; print('GPU available:', paddle.device.is_compiled_with_cuda())"
```

Note: Srclight's PaddleOCR wrapper currently initializes with `device="cpu"`. To use the GPU, you would need to modify the `_init_paddle()` call in `pdf_extractor.py` to pass `device="gpu"`. This is a future enhancement.
For standalone image files (PNG, JPG, TIFF, etc.), srclight can extract text using Tesseract OCR:
```shell
# System prerequisite
sudo apt install tesseract-ocr   # Ubuntu/Debian/WSL
brew install tesseract           # macOS

# Python dependency
pip install 'srclight[docs,ocr]'
```

This is independent of PaddleOCR: pytesseract handles image files, while PaddleOCR handles scanned PDF pages.
```shell
# 1. Add to workspace
srclight workspace add /path/to/new-repo -w myworkspace
srclight workspace add /path/to/new-repo -w myworkspace -n custom-name   # optional custom name

# 2. Index with embeddings
srclight workspace index -w myworkspace -p new-repo --embed qwen3-embedding

# 3. Install git hooks
cd /path/to/new-repo
srclight hook install
# Or install across the entire workspace (safe: skips already-installed):
srclight hook install --workspace myworkspace

# 4. Verify
srclight workspace status -w myworkspace
```

The new repo is immediately searchable. The MCP server picks up new projects on the next tool call (no restart needed: the workspace config is re-read).
Note: Both srclight index and srclight hook install automatically add .srclight/ to the repo's .gitignore. The index databases and embedding files can be large (hundreds of MB) and should never be committed.
Srclight discovers files using git ls-files, which does not recurse into submodules. Git treats submodules as opaque "gitlink" entries, so their contents are invisible to the indexer. This also applies to vendored code that lives in a separate git repo nested inside the parent.
Recommendation: If you want a submodule indexed, clone it separately and add it as its own project.
```shell
# Clone the submodule repo standalone
git clone git@github.com:your-org/some-lib.git /path/to/some-lib

# Add and index it
srclight workspace add /path/to/some-lib -w myworkspace
srclight workspace index -w myworkspace -p some-lib --embed qwen3-embedding
srclight hook install --workspace myworkspace
```

This gives you full symbol search, relationship graphs, and semantic search across the submodule, and keeps it independently searchable alongside the parent project.
What about vendored copies? If a dependency is committed directly into your repo (e.g. third_party/some-lib/ without a .gitmodules entry), then git ls-files does return those files and srclight indexes them as part of the parent project. No extra steps needed. If you later convert a vendored directory to a proper git submodule, its files will disappear from the parent's index on the next reindex — at which point you'd add it as a standalone project.
```shell
# Remove from workspace config
srclight workspace remove project-name -w myworkspace

# Optionally remove hooks
cd /path/to/repo
srclight hook uninstall

# The .srclight/ directory in the repo is left on disk (safe to delete manually)
```

```shell
# Workspace overview (all projects)
srclight workspace status -w myworkspace

# List all workspaces
srclight workspace list

# Hook status for current repo
srclight hook status

# Hook status for all repos in workspace
srclight hook status --workspace myworkspace
```

| Operation | Time | Notes |
|---|---|---|
| Semantic search (workspace, 27K vectors) | ~105ms warm | GPU-resident .npy cache |
| Semantic search (single repo, 15K vectors) | ~12ms warm | |
| Cold start (first query after server start) | ~300ms | Loads .npy to GPU VRAM |
| FTS5 search | <10ms | SQLite, always fast |
| Incremental re-index (post-commit) | 1-5s | Background, non-blocking |
| Full re-embed (27K symbols) | ~15 min | Ollama qwen3-embedding, one-time |
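The warm/cold split in the table comes from lazy, version-checked caching of the vector snapshot. A minimal sketch (hypothetical, much simpler than the real `VectorCache`): pay the load once, then serve from memory until the DB's `embedding_cache_version` moves:

```python
class VectorCache:
    """Load vectors lazily; reload only when the DB's cache version changes."""

    def __init__(self, loader, version_fn):
        self._loader = loader          # loads the .npy snapshot (the slow part)
        self._version_fn = version_fn  # reads embedding_cache_version from the DB
        self._vectors = None
        self._version = None

    def get(self):
        current = self._version_fn()
        if self._vectors is None or current != self._version:
            self._vectors = self._loader()  # cold start, or version bumped
            self._version = current
        return self._vectors               # warm path: no I/O at all
```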
```shell
# Check if the srclight binary works
srclight workspace status -w myworkspace

# Restart by removing and re-adding
claude mcp remove srclight
claude mcp add --scope user srclight -- srclight serve --workspace myworkspace
```

Cursor applies a short timeout to MCP tool calls. Srclight avoids long blocks by:
- Using a 20s timeout for embedding API requests (Ollama, OpenAI, etc.). If the embed service is slow or unreachable, the tool returns within 20s: `hybrid_search` falls back to keyword-only; `semantic_search` returns an error with a hint.
- Preferring SSE (streamable HTTP) with a long-running server, so the first request doesn't pay the cold-start cost (workspace load, vector cache load). In the Cursor MCP config, use Type `streamableHttp` and URL `http://127.0.0.1:8742/sse`, with srclight started separately (e.g. systemd or a terminal).
If you need a longer embed timeout (e.g. for slow Ollama on first load), set:

```shell
export SRCLIGHT_EMBED_REQUEST_TIMEOUT=45
```

Then start Cursor (or start srclight with that env var in its process).
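The fallback behavior can be sketched as a bounded wait on the semantic path. `hybrid_search` here is a hypothetical simplification of the real tool: keyword results are always computed, and the semantic results are merged in only if the embedding call returns within the timeout:

```python
import concurrent.futures

def hybrid_search(query, keyword_fn, semantic_fn, embed_timeout=20.0):
    """Always run the keyword search; bound the semantic path by a timeout."""
    keyword_hits = keyword_fn(query)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(semantic_fn, query)
        try:
            semantic_hits = future.result(timeout=embed_timeout)
        except concurrent.futures.TimeoutError:
            semantic_hits = []  # embed service slow or unreachable: keyword-only
    return keyword_hits + semantic_hits
```

(A real implementation would also cancel the stalled call rather than letting the executor drain it on shutdown.)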
```shell
# Check embedding status via CLI
cd /path/to/repo
srclight index --embed qwen3-embedding

# Or ask the agent: "What's the embedding status?"
# → calls the embedding_status() tool

# Check whether the embedding provider (e.g. Ollama) is reachable
# → calls the embedding_health() tool
```

```shell
# Check hook status
cd /path/to/repo
srclight hook status

# Re-install if needed
srclight hook install

# Check the hook log
cat .srclight/reindex.log
```

If a repo changes location on disk, update the workspace:
```shell
srclight workspace remove old-name -w myworkspace
srclight workspace add /new/path/to/repo -w myworkspace
srclight workspace index -w myworkspace -p new-name --embed qwen3-embedding
```

Claude Code supports custom agents defined in `.claude/agents/*.md`. These agents run as subprocesses with their own tool access, controlled by the `tools:` frontmatter field.
Custom agents defined in .claude/agents/ cannot access MCP tools. This is a known bug (#25200) as of Claude Code v2.1.52 (Feb 2026).
The tool injection code has two paths: built-in agents receive MCP tools, custom agents do not. None of these workarounds help:

- Adding MCP tool names to `tools:` frontmatter
- Adding `ToolSearch` to `tools:` frontmatter
- Adding `mcpServers:` to frontmatter
- Omitting `tools:` entirely (should inherit all; it doesn't)
| Agent Type | Tools | MCP Access |
|---|---|---|
| `general-purpose` | `*` (all) | Yes |
| `Explore` | All except Task/Edit/Write | Yes |
| `Plan` | All except Task/Edit/Write | Yes |
| Custom agents (`.claude/agents/`) | Core tools only | No (bug #13605) |
Until the bug is fixed, the only way to give a subagent access to srclight is to use `general-purpose` as the `subagent_type`. Its tool set is `*`, which includes ToolSearch and all MCP tools:
```python
Task(
    subagent_type="general-purpose",
    prompt="You are a UI design reviewer. Use srclight MCP tools for code analysis..."
)
```
The agent must call ToolSearch("srclight") before using any mcp__srclight__* tool. Include this instruction in the prompt.
Tradeoff: general-purpose agents also have write access (Edit, Write), which is more permissive than a read-only reviewer needs. The agent's system prompt can instruct it not to modify files.
Since custom agents can't access MCP (#13605), invoke via general-purpose and include your review instructions in the prompt:
```python
Task(
    subagent_type="general-purpose",
    prompt="""You are a senior UI/UX designer reviewing a Flutter app.

## srclight Code Index (MCP)
Use ToolSearch to load srclight tools before calling them. Key tools:

| Tool | Use |
|------|-----|
| mcp__srclight__symbols_in_file(path, project) | Widget/class outline |
| mcp__srclight__get_callers(symbol, project) | Consistency checks |
| mcp__srclight__search_symbols(query, project) | Find exact names |

Workflow: Use symbols_in_file to get outlines, then Read sections.
Use get_callers to verify token usage consistency. Use Grep for pattern
violations (raw Color literals, bare EdgeInsets) that srclight can't catch.

DO NOT modify any files. This is a read-only review.""",
)
```
## Architecture Notes
- **One server, one workspace**: The MCP server runs in workspace mode serving all repos. Each project's `.srclight/index.db` is ATTACHed to a `:memory:` database at query time via SQLite's ATTACH mechanism.
- **ATTACH limit**: SQLite allows max 10 ATTACHed databases. >10 projects are handled by batch detach/reattach in `_iter_batches()`.
- **GPU cache**: Each project gets its own `VectorCache` loaded to GPU VRAM (cupy) or CPU RAM (numpy). Caches are loaded lazily on first semantic search and invalidated when `embedding_cache_version` in the DB changes.
- **No network**: Everything runs locally. Ollama is on `localhost:11434`. No cloud APIs unless you opt into Voyage Code 3.
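A sketch of the batch detach/reattach idea (hypothetical; the real `_iter_batches()` differs): a single `:memory:` connection attaches up to ten project databases at a time, yields them for querying, then detaches before attaching the next batch.

```python
import sqlite3

MAX_ATTACH = 10  # SQLite's default limit on attached databases

def iter_batches(db_paths, batch_size=MAX_ATTACH):
    """Yield (connection, schema_names) with up to batch_size DBs attached."""
    conn = sqlite3.connect(":memory:")
    for start in range(0, len(db_paths), batch_size):
        batch = db_paths[start:start + batch_size]
        for i, path in enumerate(batch):
            # The filename is a bound parameter; the schema name cannot be
            conn.execute(f"ATTACH DATABASE ? AS proj{i}", (path,))
        yield conn, [f"proj{i}" for i in range(len(batch))]
        for i in range(len(batch)):
            conn.execute(f"DETACH DATABASE proj{i}")  # free slots for next batch
    conn.close()
```

With 12 projects this yields one batch of 10 and one of 2, staying under the ATTACH limit.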