QMDR — hybrid markdown search (BM25 + vector + LLM reranking). Fork of tobi/qmd, cloud APIs replace local GGUF models.
Bun ≤1.3.6 is required — 1.3.7+ segfaults with sqlite-vec (unfixed as of 1.3.9).
Path A — Bun (recommended):
curl -fsSL https://bun.sh/install | bash -s "bun-v1.3.6"
# China: prepend BUN_INSTALL_MIRROR=https://npmmirror.com/mirrors/bun
bun install -g github:uf-hy/qmdr

Path B — Binary (no Bun needed):
Available: macOS arm64/x64, Linux x64/arm64, Windows x64. Windows ARM → use x64 binary (runs via emulation).
OS=$(uname -s | tr A-Z a-z); ARCH=$(uname -m); [ "$ARCH" = "x86_64" ] && ARCH="x64"
DIR=/usr/local/bin; [ ! -w "$DIR" ] && DIR="$HOME/.local/bin" && mkdir -p "$DIR"
curl -L "https://github.com/uf-hy/qmdr/releases/latest/download/qmd-${OS}-${ARCH}" -o "$DIR/qmd" && chmod +x "$DIR/qmd"

Windows (PowerShell):
Invoke-WebRequest "https://github.com/uf-hy/qmdr/releases/latest/download/qmd-windows-x64.exe" -OutFile "$env:LOCALAPPDATA\qmd.exe"
# Add to PATH if needed: [Environment]::SetEnvironmentVariable("Path", "$env:Path;$env:LOCALAPPDATA", "User")

Binary: sqlite-vec must be loadable (QMD_ALLOW_SQLITE_EXTENSIONS=1). macOS: brew install sqlite. Linux: apt install libsqlite3-dev. Windows: sqlite3.dll usually bundled. Or set QMD_SQLITE_VEC_PATH.
Verify: qmd --help. If not found, ensure install dir is in PATH.
Config file: ~/.config/qmd/.env (auto-loaded on startup). Collect all keys, write once.
mkdir -p ~/.config/qmd

SiliconFlow gives ¥14 free credit on signup. Daily cost is ~¥0.06 for typical use (indexing + querying), so the free credit lasts 6+ months.
SiliconFlow also offers permanently free models (embedding, chat, reranking) — with these, the ¥14 credit is only used for premium models. See "Free tier" config below.
Region: China → default endpoint works. International → add QMD_SILICONFLOW_BASE_URL=https://api.siliconflow.com/v1.
Default config (best quality, uses ¥14 credit):
# Required: SiliconFlow. Register at cloud.siliconflow.cn (.com intl). ¥14 free credit on signup.
QMD_SILICONFLOW_API_KEY=sk-xxx
# Recommended: Gemini reranking (free). Get key at ai.google.dev
QMD_GEMINI_API_KEY=xxx
QMD_RERANK_PROVIDER=gemini
# Alternative: SiliconFlow reranking (no extra key, uses ¥14 credit)
# QMD_RERANK_PROVIDER=siliconflow
# QMD_RERANK_MODE=llm

Free tier config (¥0 cost, SiliconFlow free models only, slightly lower quality):
QMD_SILICONFLOW_API_KEY=sk-xxx
QMD_SILICONFLOW_EMBED_MODEL=BAAI/bge-m3
QMD_SILICONFLOW_QUERY_EXPANSION_MODEL=Qwen/Qwen2.5-7B-Instruct
QMD_RERANK_PROVIDER=siliconflow
QMD_RERANK_MODE=llm
QMD_LLM_RERANK_MODEL=Qwen/Qwen2.5-7B-Instruct

Free tier uses bge-m3 (1024d) instead of Qwen3-Embedding-8B (4096d), and Qwen2.5-7B for query expansion/reranking instead of GLM-4.5-Air. Chinese retrieval quality is slightly lower but still good. Switching models requires qmd embed -f to rebuild vectors.
# Optional: custom OpenAI-compatible endpoint
# QMD_OPENAI_API_KEY=xxx
# QMD_OPENAI_BASE_URL=https://custom-endpoint.com/v1
# QMD_EMBED_PROVIDER=openai

Alibaba Bailian (百炼) reranking (native qwen3-rerank, fastest):
QMD_DASHSCOPE_API_KEY=sk-xxx
QMD_RERANK_PROVIDER=dashscope
QMD_RERANK_MODE=rerank
# QMD_DASHSCOPE_RERANK_MODEL=qwen3-rerank # default

Bailian's qwen3-rerank uses a dedicated rerank API (/compatible-api/v1/reranks), not the OpenAI-compatible chat endpoint. QMDR handles this automatically when QMD_RERANK_PROVIDER=dashscope. Combine with SiliconFlow for embedding + query expansion.
Step 1: Calculate your daily tokens
Memory - how to calculate:
- Count characters in your memory/*.md files
- tokens = characters ÷ 1.74 (Chinese text approximate ratio)
- Or use tiktoken for exact count
Sessions - how to calculate:
- Check your session files in ~/.openclaw/agents/*/sessions/
- Estimate similarly, or reference: ~360K tokens/day
Step 2: Calculate years with ¥14 credit
Formula:
- Years = (14 ÷ price_per_M) × 1,000,000 ÷ daily_tokens
- Free models (bge-large-zh-v1.5, bge-m3): unlimited
Example - if your memory = 5K tokens/day:
- Qwen3-Embedding-0.6B (¥0.07/M): (14 ÷ 0.07) × 1M ÷ 5,000 = 40,000 days ≈ 110 years
- Qwen3-Embedding-4B (¥0.14/M): (14 ÷ 0.14) × 1M ÷ 5,000 = 20,000 days ≈ 55 years
- Free models: unlimited
Example - if memory + sessions = 360K tokens/day:
- Qwen3-Embedding-0.6B: (14 ÷ 0.07) × 1M ÷ 360,000 = 555 days ≈ 1.5 years
- Qwen3-Embedding-4B: (14 ÷ 0.14) × 1M ÷ 360,000 = 277 days ≈ 0.8 years
- Free models: unlimited
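The formulas above are easy to script. A minimal sketch (`creditDays` and `estimateTokens` are hypothetical helper names; the prices and the 1.74 chars-per-token ratio come from this guide and may change):

```typescript
// Estimate how many days the ¥14 signup credit lasts for a given embedding model.
// pricePerM: ¥ per million tokens; dailyTokens: tokens embedded per day.
function creditDays(credit: number, pricePerM: number, dailyTokens: number): number {
  return (credit / pricePerM) * 1_000_000 / dailyTokens;
}

// Rough token count for Chinese text (characters ÷ 1.74, per the ratio above).
function estimateTokens(chars: number): number {
  return Math.round(chars / 1.74);
}

// Memory-only, 5K tokens/day on Qwen3-Embedding-0.6B (¥0.07/M): ≈ 40,000 days ≈ 110 years
console.log(creditDays(14, 0.07, 5_000));
// Memory + sessions, 360K tokens/day on Qwen3-Embedding-4B (¥0.14/M): ≈ 278 days ≈ 0.8 years
console.log(creditDays(14, 0.14, 360_000));
```

Free models stay at ¥0 regardless of volume, so the formula only matters for the paid tiers.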
QMDR supports two reranking strategies:
- QMD_RERANK_MODE=rerank — dedicated reranker model (e.g. bge-reranker, qwen3-rerank). Fast (~300ms), returns relevance scores directly. Best quality.
- QMD_RERANK_MODE=llm (default) — uses a chat model to extract and rank relevant content. Slower (~1-3s) but works with any OpenAI-compatible API. Also extracts key passages.
For LLM rerank, use a lightweight/cheap model — it only needs to read ~20 short chunks and pick the relevant ones. Recommended models:
- SiliconFlow → zai-org/GLM-4.5-Air (default, ¥1/M in)
- SiliconFlow free → Qwen/Qwen2.5-7B-Instruct (free, unlimited)
After configuring, run qmd doctor --bench to verify rerank latency. Target: <500ms for dedicated reranker, <3s for LLM rerank.
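Conceptually, LLM-mode reranking is just a chat call: send the query plus the candidate chunks, ask the model which are relevant. A sketch of the general technique (the prompt, helper names, and parsing are illustrative, not QMDR's actual implementation):

```typescript
// Sketch of LLM reranking: build one prompt over all candidate chunks,
// then parse the model's ranked id list out of its reply.
interface Chunk { id: number; text: string; }

function buildRerankPrompt(query: string, chunks: Chunk[]): string {
  const list = chunks.map(c => `[${c.id}] ${c.text}`).join("\n");
  return `Query: ${query}\nRank the passages below by relevance to the query.\n` +
         `Reply with passage ids, most relevant first, comma-separated.\n${list}`;
}

// Parse a reply like "3, 1, 2" back into an ordered list of chunk ids.
function parseRanking(reply: string): number[] {
  return reply.split(",").map(s => parseInt(s.trim(), 10)).filter(n => !isNaN(n));
}

// The request itself is a plain OpenAI-compatible chat completion, e.g.:
// await fetch(`${baseUrl}/chat/completions`, { method: "POST",
//   headers: { Authorization: `Bearer ${key}`, "Content-Type": "application/json" },
//   body: JSON.stringify({ model: "Qwen/Qwen2.5-7B-Instruct",
//     messages: [{ role: "user", content: buildRerankPrompt(query, chunks) }] }) });
```

This is why any OpenAI-compatible endpoint works for llm mode, and why a small instruct model is enough: the task is reading ~20 short chunks, not generating content.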
Provider auto-routing: siliconflow → gemini → dashscope → openai (first with configured key).
QMDR requires cloud APIs — there are no local/fallback models. Unconfigured providers will show ❌ in qmd doctor.
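The routing rule is simple enough to sketch (a hypothetical helper under the stated order, not QMDR's internal code):

```typescript
// Pick the first provider whose API key is set, in QMDR's routing order.
const ROUTING_ORDER = ["siliconflow", "gemini", "dashscope", "openai"] as const;

function pickProvider(env: Record<string, string | undefined>): string | undefined {
  const keyVar: Record<string, string> = {
    siliconflow: "QMD_SILICONFLOW_API_KEY",
    gemini: "QMD_GEMINI_API_KEY",
    dashscope: "QMD_DASHSCOPE_API_KEY",
    openai: "QMD_OPENAI_API_KEY",
  };
  return ROUTING_ORDER.find(p => !!env[keyVar[p]]);
}
```

Setting QMD_RERANK_PROVIDER (etc.) explicitly bypasses this order.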
qmd doctor # check providers
qmd doctor --bench # optional: query quality test

Profile query performance — find bottlenecks in the search pipeline:
qmd query "your question" --profile

Shows timing breakdown: query expansion → BM25 retrieval → vector search → reranking → total. Reranking typically takes 60-80% of total time. Use this to decide if you need a faster reranker.
Verbose output — see the full query process (expanded queries, chunk selection, reranker scores):
qmd query "your question" --verbose

Default output is minimal (results only, ~400 tokens). --verbose shows everything (~3700 tokens). --profile auto-enables verbose.
Native memory backend, no MCP.
Index workspace (OpenClaw workspace: directory containing openclaw.json, typically ~/.openclaw/workspace):
qmd collection add <WORKSPACE_PATH> --name memory --mask "*.md" # idempotent
qmd context add qmd://memory "Personal memory files"
qmd embed
qmd query "test" -c memory # empty result = no .md files in path

Merge into openclaw.json (typically ~/.openclaw/openclaw.json — read existing first, merge memory block only):
{
"memory": {
"backend": "qmd",
"qmd": {
"command": "qmd",
"includeDefaultMemory": true,
"sessions": { "enabled": true },
"update": { "interval": "5m", "debounceMs": 15000 },
"limits": { "maxResults": 6, "timeoutMs": 60000 }
}
}
}

command: "qmd" (global) or absolute path to src/qmd.ts (source mode — needs #!/usr/bin/env bun shebang + chmod +x).
Never use "bun /path/to/qmd.ts" — bun misparses subcommands.
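"Read existing first, merge memory block only" can be scripted in a few lines (a sketch; `mergeMemoryBlock` is a hypothetical helper, adjust the path to your setup):

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Merge a memory block into an existing openclaw.json without clobbering other keys.
function mergeMemoryBlock(path: string, memory: object): void {
  const config = JSON.parse(readFileSync(path, "utf8"));
  config.memory = { ...config.memory, ...memory }; // keep unrelated memory keys, override ours
  writeFileSync(path, JSON.stringify(config, null, 2) + "\n");
}

// mergeMemoryBlock("/home/you/.openclaw/openclaw.json",
//   { backend: "qmd", qmd: { command: "qmd" } });
```

Overwriting the whole file instead of merging is the common mistake here: openclaw.json usually contains other top-level blocks that must survive.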
Sessions indexing — "sessions": { "enabled": true } exports your AI conversation history as markdown files and indexes them. This means QMDR can search through past conversations, decisions, and context — not just your static markdown files. Highly recommended for personal knowledge retrieval.
Cost tip: Sessions generate ~300-400K tokens/day. With free models (bge-large-zh-v1.5, bge-m3), cost is ¥0. Or use ¥14 credit: ~0.4-1.5 years depending on model tier.
Env keys: ~/.config/qmd/.env auto-loaded (recommended). Or add to launchd/systemd. Process env > .env.
Restart: openclaw gateway restart. Verify memory_search returns "provider": "qmd".
If QMDR fails or times out, OpenClaw automatically falls back to built-in SQLite search.
Install skill:
# Claude Code
mkdir -p ~/.claude/skills/qmdr
curl -sL https://raw.githubusercontent.com/uf-hy/qmdr/main/skills/qmdr/SKILL.md -o ~/.claude/skills/qmdr/SKILL.md
# OpenCode
mkdir -p ~/.config/opencode/skills/qmdr
curl -sL https://raw.githubusercontent.com/uf-hy/qmdr/main/skills/qmdr/SKILL.md -o ~/.config/opencode/skills/qmdr/SKILL.md

Usage: qmd query "natural language" (hybrid) | qmd search "keywords" (BM25) | qmd get "#docid" (full doc).
Always use natural language for query, not keyword concatenation.
Index your project before first use:
qmd collection add . --name project --mask "*.md"
qmd embed

Essential (used during setup):
| Variable | Default | Note |
|---|---|---|
| QMD_SILICONFLOW_API_KEY | — | Required |
| QMD_GEMINI_API_KEY | — | Recommended (reranking) |
| QMD_OPENAI_API_KEY | — | Optional (custom endpoint) |
| QMD_EMBED_PROVIDER | auto | siliconflow / openai |
| QMD_QUERY_EXPANSION_PROVIDER | auto | siliconflow / gemini / openai |
| QMD_RERANK_PROVIDER | auto | siliconflow / gemini / openai / dashscope |
| QMD_RERANK_MODE | llm | llm / rerank (dedicated API) |
| QMD_SILICONFLOW_BASE_URL | https://api.siliconflow.cn/v1 | International: .com |
| QMD_GEMINI_BASE_URL | Google default | Custom endpoint / proxy (China users) |
| QMD_DASHSCOPE_API_KEY | — | Alibaba Bailian (rerank only) |
Model/tuning overrides (change only if needed):
| Variable | Default |
|---|---|
| QMD_SILICONFLOW_EMBED_MODEL | Qwen/Qwen3-Embedding-8B |
| QMD_SILICONFLOW_QUERY_EXPANSION_MODEL | zai-org/GLM-4.5-Air |
| QMD_GEMINI_MODEL | gemini-2.5-flash (thinkingBudget=0) |
| QMD_LLM_RERANK_MODEL | zai-org/GLM-4.5-Air |
| QMD_SILICONFLOW_RERANK_MODEL | BAAI/bge-reranker-v2-m3 |
| QMD_DASHSCOPE_RERANK_MODEL | qwen3-rerank |
| QMD_CHUNK_SIZE_TOKENS | 200 |
| QMD_CHUNK_OVERLAP_TOKENS | 40 |
| QMD_SQLITE_VEC_PATH | auto |
Changing embedding model → qmd embed -f to rebuild index.
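QMD_CHUNK_SIZE_TOKENS and QMD_CHUNK_OVERLAP_TOKENS describe a standard sliding-window split. A sketch of the general technique under those defaults (not QMDR's exact chunker):

```typescript
// Sliding-window chunking: `size`-token windows that overlap by `overlap` tokens,
// so each new chunk starts (size - overlap) tokens after the previous one.
function chunkTokens(tokens: string[], size = 200, overlap = 40): string[][] {
  const step = size - overlap; // 160 with the defaults
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // last window already reached the end
  }
  return chunks;
}
```

Larger chunks mean fewer embedding calls but coarser retrieval; the overlap keeps sentences that straddle a boundary findable from both sides.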
| Problem | Fix |
|---|---|
| qmd: command not found | Ensure install dir in PATH |
| sqlite-vec load error | macOS: brew install sqlite; or set QMD_SQLITE_VEC_PATH |
| /$bunfs/root/ error | Use bun install -g (source), not compiled binary |
| Segfault on Linux | Bun ≥1.3.7 — downgrade to 1.3.6 |
| Dimension mismatch | qmd embed -f |
| Slow queries (>5s) | qmd doctor — use non-thinking models |
| Script not found "query" | Use direct path with shebang, not "bun /path/to/qmd.ts" |