fix(docker): bundle qmd binary + prebuilt codex index for search_codex runtime #70
…x runtime

Post-PR-#69 verification (2026-05-02T23:38Z): Railway redeployed v1.4.0 successfully; `serverInfo.version: "1.4.0"` confirmed via S0 against the gateway. tools/list now shows all 9 tools, including search_codex (intent layer from PR #65). But calling search_codex returns:

    { "error": "qmd_unavailable", "message": "qmd binary not found in PATH", "hint": "@tobilu/qmd is a peerDependency. install + run scripts/build-micodex-index.sh" }

This is the same KRANZ act 1 reproducibility issue caught locally during session 09a forge prep, now at the Docker layer. PR #69 fixed the build (alpine→slim); this PR fixes the runtime (qmd binary + prebuilt index).

Builder stage:
- npm install -g @tobilu/qmd@^2.1.0: postinstall reuses the already-installed git/make/python3/g++ for the node-llama-cpp clone+build
- bash scripts/build-micodex-index.sh: registers the codex-grails + codex-core-lore collections, adds context, and populates ~/.cache/qmd/index.sqlite

Runtime stage:
- COPY /usr/local/lib/node_modules/@tobilu (the package + .node binary)
- COPY /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd → /usr/local/bin/qmd (explicit file copy; not relying on Docker's symlink-resolution behavior)
- chmod +x the bin
- COPY /root/.cache/qmd (the prebuilt index)

The runtime stage stays compiler-free (F5 PRAISE preserved): it only adds qmd's compiled artifacts + index, no build tools.

Verifies post-merge:
- curl gateway tools/call search_codex {intent: "void motif"} → @g876 Black Hole at score ≥0.7 (currently returns the qmd_unavailable error)
- WITNESS S3 (substrate-truth seed cases): all 5 cases pass against prod
- 9/9 tools functional (today: 8/9; lookup_* work, search_codex broken)

Image-size cost: +~50MB for the global qmd install + ~90MB for qmd index.sqlite. Acceptable trade for a working intent layer in prod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
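The "void motif" post-merge check above is a single MCP `tools/call` request. A minimal TypeScript sketch of the request body and the pass criterion follows. The JSON-RPC envelope is the standard MCP `tools/call` shape; the `SearchHit` result shape and the helper names are hypothetical, since the PR does not show search_codex's output format. Depending on the transport, the gateway may also require an `initialize` handshake and specific `Accept` headers, which this sketch does not cover.

```typescript
// Sketch of the "void motif" post-merge check as an MCP tools/call request.
// The JSON-RPC envelope follows the MCP tools/call shape; SearchHit and the
// helper names are HYPOTHETICAL (search_codex's output format is not shown).

interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Assumed, illustrative shape of one search result.
interface SearchHit {
  id: string;
  score: number;
}

// Build the request body that would be POSTed to the gateway.
function buildSearchCodexCall(intent: string, id = 1): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_codex", arguments: { intent } },
  };
}

// Pass criterion from the PR: the expected entry appears with score >= 0.7.
function meetsExpectation(hits: SearchHit[], expectedId: string, minScore = 0.7): boolean {
  return hits.some((hit) => hit.id === expectedId && hit.score >= minScore);
}

console.log(JSON.stringify(buildSearchCodexCall("void motif")));
```

Keeping the 0.7 threshold as a default parameter means the same helper can back both a one-off curl check and a scripted regression run.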
zkSoju left a comment
Summary
Analytical review of #70. Enrichment pass was unavailable; findings are unenriched.
Findings
{
"schema_version": 1,
"findings": [
{
"id": "qmd-bin-symlink-vs-copy",
"title": "Copying qmd bin script as a file may break if it's a symlink to dist/cli.js",
"severity": "MEDIUM",
"category": "correctness",
"file": "Dockerfile",
"description": "npm typically installs global bin entries as symlinks in /usr/local/bin pointing to ../lib/node_modules/<pkg>/bin/<file>. The Dockerfile does `COPY --from=builder /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd` which copies the actual script content. This works only if bin/qmd is a self-contained Node script with a proper shebang. If qmd's package.json `bin` points to a different path (e.g., dist/cli.js), or if the script uses relative requires that resolve based on its location in node_modules, copying it to /usr/local/bin will break module resolution because Node's require lookup starts from the script's own directory.",
"suggestion": "Recreate the symlink instead: `RUN ln -s /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd`. Verify the actual bin entry path from the package's package.json — npm 9+ installs to /usr/local/bin as symlinks, so this might already exist if you also COPY /usr/local/bin/qmd from builder. Either way, prefer the symlink approach to preserve the script's __dirname-relative resolution.",
"confidence": 0.7
},
{
"id": "version-pinning",
"title": "Caret range on qmd defeats reproducible image builds",
"severity": "LOW",
"category": "reproducibility",
"file": "Dockerfile",
"description": "`npm install -g @tobilu/qmd@^2.1.0` allows any 2.x.y >= 2.1.0 to be picked up at image build time. Two builds of the same Dockerfile from the same source can produce different qmd versions, complicating debugging of search_codex runtime failures.",
"suggestion": "Pin to an exact version (e.g., `@tobilu/qmd@2.1.0`) and bump intentionally. Consider also pinning in pnpm/npm lockfile coverage if qmd is referenced elsewhere.",
"confidence": 0.9
},
{
"id": "index-rebuild-cache-miss",
"title": "Codex index rebuild on every image build inflates build time and is non-cached",
"severity": "LOW",
"category": "performance",
"file": "Dockerfile",
"description": "`RUN bash scripts/build-micodex-index.sh` runs after `COPY . .` so any change to the working tree invalidates the layer and re-builds the index. If indexing involves embedding generation via node-llama-cpp this can be slow. No issue if intentional, but worth flagging.",
"suggestion": "If the index inputs are a known subset of the repo (e.g., codex/ directory), copy only those before the index build to maximize layer cache hits. Otherwise, accept the cost and document expected build time.",
"confidence": 0.6
},
{
"id": "root-cache-path-runtime-user",
"title": "Index copied to /root/.cache/qmd assumes runtime runs as root",
"severity": "MEDIUM",
"category": "correctness",
"file": "Dockerfile",
"description": "The index is copied to `/root/.cache/qmd` in the runtime image. If the container is ever run as a non-root user (e.g., via `USER` directive added later, Kubernetes runAsNonRoot, or `docker run -u`), qmd will look in $HOME/.cache/qmd of that user, not /root, and search_codex will silently return qmd_unavailable again — the exact failure mode this PR fixes.",
"suggestion": "Either (a) document that the runtime must be root, (b) set `ENV XDG_CACHE_HOME=/opt/qmd-cache` and copy the index there with world-readable perms so it works regardless of UID, or (c) honor a qmd-specific env var if one exists. Option (b) is most robust.",
"confidence": 0.75
},
{
"id": "no-runtime-smoke-test",
"title": "No build-time verification that qmd is functional in runtime stage",
"severity": "LOW",
"category": "testing",
"file": "Dockerfile",
"description": "The PR description and comments note that this fixes a silent-runtime-failure pattern (qmd_unavailable returned only at first searchCodex call). The Dockerfile does not add a smoke test (e.g., `RUN qmd --version` or `RUN qmd query ...`) in the runtime stage to fail the build if the binary or index is broken. Future regressions to bin path, index location, or libstdc++ deps will again fail silently at runtime.",
"suggestion": "Add `RUN qmd --version && qmd query --help` (or a minimal real query against the bundled index) in the runtime stage so image build fails loudly on regressions.",
"confidence": 0.85
},
{
"id": "scope-name-redundant-copy",
"title": "Redundant COPY of @tobilu scope and qmd subpath",
"severity": "LOW",
"category": "clarity",
"file": "Dockerfile",
"description": "The first COPY copies the entire `/usr/local/lib/node_modules/@tobilu` directory which already contains `@tobilu/qmd/bin/qmd`. The second COPY then extracts the same file to /usr/local/bin/qmd. This is fine but slightly redundant; the second COPY could be replaced with a symlink (also addressing the symlink concern in qmd-bin-symlink-vs-copy).",
"suggestion": "Replace the second COPY + chmod with `RUN ln -s /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd`.",
"confidence": 0.8
},
{
"id": "praise-failure-mode-doc",
"title": "Comments precisely document the silent-failure mode being fixed",
"severity": "PRAISE",
"category": "documentation",
"file": "Dockerfile",
"description": "The inline comments reference the exact source location (src/lookups/search.ts:120), the failure signature (\"qmd_unavailable\"), the runtime mechanism (spawnSync), and the dlopen dependency on libstdc++/libgomp. This makes the rationale auditable and gives the next person debugging build/runtime issues a clear map.",
"suggestion": "Keep this style for future Dockerfile changes."
}
]
}
Callouts
Enrichment unavailable for this review.
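The version-pinning finding rests on npm's caret semantics: for a major version of 1 or higher, `^2.1.0` accepts anything from 2.1.0 up to, but not including, 3.0.0. The sketch below models only that rule (no prerelease tags, no 0.x special case). The helper names and the candidate version numbers are hypothetical and say nothing about which qmd releases actually exist.

```typescript
// Minimal model of npm's caret range for versions with major >= 1:
// ^X.Y.Z matches any version >= X.Y.Z and < (X+1).0.0.
// Prerelease tags and 0.x special cases are deliberately not modeled.

type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`not a plain X.Y.Z version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfiesCaret(version: string, base: string): boolean {
  const v = parse(version);
  const lo = parse(base);
  if (lo[0] === 0) throw new Error("0.x caret semantics not modeled");
  const hi: Version = [lo[0] + 1, 0, 0];
  return compare(v, lo) >= 0 && compare(v, hi) < 0;
}

// An exact pin ("2.1.0") matches only itself.
function satisfiesExact(version: string, pin: string): boolean {
  return compare(parse(version), parse(pin)) === 0;
}

// Hypothetical candidates, to show how wide the caret range is.
for (const candidate of ["2.1.0", "2.1.5", "2.4.0", "3.0.0"]) {
  console.log(candidate, satisfiesCaret(candidate, "2.1.0"), satisfiesExact(candidate, "2.1.0"));
}
```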
…moke, pin

Addresses 4 bridgebuilder findings inline:
- F1 MEDIUM (qmd-bin-symlink-vs-copy · conf 0.7) ✅ Switched COPY+chmod to RUN ln -s. Preserves __dirname-relative require resolution if qmd's bin script does any. Also addresses the F6 redundancy (the bin file was already in the @tobilu COPY tree above).
- F4 MEDIUM (root-cache-path-runtime-user · conf 0.75) ✅ Set ENV XDG_CACHE_HOME=/opt/qmd-cache in both stages. UID-portable: survives a USER directive, K8s runAsNonRoot, and docker run -u <uid>. Index built/copied to /opt/qmd-cache instead of /root/.cache/qmd. Added chmod -R a+r so non-root users can read it.
- F2 LOW (version-pinning · conf 0.9) ✅ npm install -g @tobilu/qmd@^2.1.0 → @2.1.0 (exact pin). Reproducible builds; bump intentionally.
- F5 LOW (no-runtime-smoke-test · conf 0.85) ✅ Added RUN qmd --version && qmd status to the runtime stage. Catches symlink failures, missing libs, and XDG_CACHE_HOME mismatches at build time instead of at the first search_codex call.
- F3 LOW (index-rebuild-cache-miss · conf 0.6) ⏸ deferred. Real but non-trivial fix (split COPY into multiple stages for grails/ + core-lore/ + scripts/ before the main COPY . .). Worthwhile follow-up if build time becomes the bottleneck.
- F6 LOW (scope-name-redundant-copy · conf 0.8) ✅ resolved by the F1 fix
- F7 PRAISE (failure-mode docs): comments extended with a bridgebuilder ref

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
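`qmd --version && qmd status` proves the binary runs, but `qmd status` alone can succeed with an empty registry (the gap the #71 follow-up closed with grep assertions on both collection names). A TypeScript sketch of that stricter rule follows. Plain substring matching, like `grep -q`, is an assumption about the `qmd status` output format, and the function names are hypothetical.

```typescript
// Sketch of a grep-style smoke assertion on `qmd status` output. Substring
// matching (like `grep -q`) is an assumption about the output format; the
// function names are hypothetical.
const EXPECTED_COLLECTIONS = ["codex-grails", "codex-core-lore"] as const;

function missingCollections(
  statusOutput: string,
  expected: readonly string[] = EXPECTED_COLLECTIONS,
): string[] {
  return expected.filter((name) => !statusOutput.includes(name));
}

// Throwing is the point: a broken registry should fail the image build
// instead of surfacing at the first search_codex call.
function assertCollectionsRegistered(statusOutput: string): void {
  const missing = missingCollections(statusOutput);
  if (missing.length > 0) {
    throw new Error(`qmd registry missing collections: ${missing.join(", ")}`);
  }
}
```

In the image itself this is a shell `RUN` step; the TypeScript form only makes the pass/fail rule explicit.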
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30957495a9
Dockerfile

    # returns "qmd_unavailable" at first call — the silent-runtime-failure pattern
    # from PR #69 reaching one layer deeper.
    COPY --from=builder /usr/local/lib/node_modules/@tobilu /usr/local/lib/node_modules/@tobilu
    COPY --from=builder /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd
Copy the installed qmd symlink, not the package wrapper
@tobilu/qmd 2.1.0 ships a bin/qmd shell wrapper that derives its package root from dirname($0) (after symlink resolution) and then executes $DIR/dist/cli/qmd.js; copying .../qmd/bin/qmd directly to /usr/local/bin/qmd makes $0 resolve to /usr/local/bin/qmd, so it looks for /usr/local/dist/cli/qmd.js (missing) and exits at runtime. In this image, searchCodex shells out to qmd, so this line can keep search_codex broken even though files were copied. Copy /usr/local/bin/qmd from builder (preserving npm’s symlink target) or adjust the wrapper target path.
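The path arithmetic in this review can be checked in a few lines. This is a model of the behavior the review describes (package root is two `dirname` steps up from the resolved script path, entry point at `dist/cli/qmd.js`), not qmd's actual wrapper; `entryFor` is a hypothetical name.

```typescript
import { posix as path } from "node:path";

// Model of the wrapper behavior described in the review (NOT qmd's actual
// script): resolve the script's own path, step up from bin/ to the package
// root, then run <root>/dist/cli/qmd.js. entryFor is a hypothetical name.
function entryFor(resolvedScriptPath: string): string {
  const binDir = path.dirname(resolvedScriptPath); // .../qmd/bin
  const packageRoot = path.dirname(binDir); // .../qmd
  return path.join(packageRoot, "dist/cli/qmd.js");
}

// npm-style symlink: /usr/local/bin/qmd resolves into the package directory.
console.log(entryFor("/usr/local/lib/node_modules/@tobilu/qmd/bin/qmd"));
// prints /usr/local/lib/node_modules/@tobilu/qmd/dist/cli/qmd.js

// Plain file copy: nothing to resolve, so the "package root" is /usr/local.
console.log(entryFor("/usr/local/bin/qmd"));
// prints /usr/local/dist/cli/qmd.js
```

With the `RUN ln -s` fix from the follow-up commit, the resolved path lands inside `@tobilu/qmd/bin/`, matching the first case.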
Bridgebuilder #70 findings addressed
Pushed
Ready to merge.
…ailed post-#70 (#71)

* fix(docker): copy qmd registry (~/.config/qmd) — search_codex still failed post-#70

Verified post-merge of PR #70 (Railway redeploy 2026-05-02T~23:55Z): search_codex returned a NEW error: "qmd collection \"codex-grails\" not found." The build and symlink worked, and the qmd binary loads, but qmd's collection registry is missing at runtime.

Root cause: qmd splits state across THREE locations (verified via local install inspection + qmd source at dist/store.js:345, dist/cli/qmd.js:225, dist/llm.js:64, dist/collections.js:69):

    /root/.config/qmd/index.yml     collection registry     (XDG-config respected)
    /root/.cache/qmd/index.sqlite   content + embeddings    (XDG-cache respected)
    /root/.cache/qmd/models/        embedding model         (HARDCODED homedir/.cache)

#70 set XDG_CACHE_HOME=/opt/qmd-cache for non-root portability per bridgebuilder F4, but XDG_CACHE_HOME only redirects ~/.cache, not ~/.config. So:
- Builder wrote the registry to /root/.config/qmd/index.yml (always defaults here)
- Builder wrote sqlite/models to /opt/qmd-cache/qmd/ (XDG redirect)
- Runtime COPYed only /opt/qmd-cache → the registry was MISSING at runtime
- qmd at runtime: "I have a sqlite file but don't know what collections exist"
- Result: "qmd collection codex-grails not found"

Pragmatic fix: don't rebase qmd's paths. Drop the XDG_CACHE_HOME ENV statements, let qmd write its defaults in the builder, and COPY both /root/.config/qmd and /root/.cache/qmd verbatim to the runtime. The non-root portability concern from F4 was theoretical (Railway runs as root). If a non-root deploy is needed later, address it then with a proper qmd config-path override (which qmd 2.x doesn't expose for ~/.config).

Tightened smoke test: `qmd status` (no head pipe) loads the registry; if the collections list is empty here, the COPY missed something. The build fails loudly at this step instead of search_codex erroring at the first call.

Image-size cost grows by ~/.cache/qmd/models (~50-150MB depending on the embedding model qmd downloaded). That is the trade for a working search_codex without a runtime model download. V2 candidate: lazy model load + persistent volume mount.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): bridgebuilder #71 F2 — assert both collections register at smoke

Tightens the runtime smoke from `qmd status` (which passes even with an empty registry, the exact silent success that #70 shipped) to explicit grep assertions on both expected collection names. Fails loudly if EITHER:
- the /root/.config/qmd registry COPY missed
- collection registration failed in the builder
- qmd 2.x changes its registry format such that the names don't appear in `qmd status` output

F1 (MEDIUM root-path portability) intentionally deferred: the PR's premise is that non-root deploy was theoretical, and the simpler default-paths approach is safer than half-XDG (what #70 shipped, which broke things). Revisit if/when a non-root deploy is actually needed; it would require qmd 2.x to expose a config-dir override.

F3/F4 (LOW perms/size) deferred as low-leverage; address if the image bloats or the runtime user changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
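The root cause can be replayed as a small path model. Only the registry and index rows of the table are modeled, with env-var behavior taken from that table; the models directory is left out because the commit message is not consistent about whether it follows XDG_CACHE_HOME. `qmdPaths` and `missedByCopy` are hypothetical helpers, not qmd code.

```typescript
// Path model of two qmd state locations from the #71 commit message table.
// HYPOTHETICAL helpers, not qmd source; env-var behavior is taken from that table.

interface Env {
  HOME: string;
  XDG_CONFIG_HOME?: string;
  XDG_CACHE_HOME?: string;
}

interface QmdPaths {
  registry: string; // collection registry: <config home>/qmd/index.yml
  index: string; // content + embeddings: <cache home>/qmd/index.sqlite
}

function qmdPaths(env: Env): QmdPaths {
  const configHome = env.XDG_CONFIG_HOME ?? `${env.HOME}/.config`;
  const cacheHome = env.XDG_CACHE_HOME ?? `${env.HOME}/.cache`;
  return {
    registry: `${configHome}/qmd/index.yml`,
    index: `${cacheHome}/qmd/index.sqlite`,
  };
}

// Return every state file that is not under one of the directories the
// runtime stage COPYs from the builder.
function missedByCopy(paths: QmdPaths, copiedDirs: string[]): string[] {
  const under = (file: string, dir: string) =>
    file.startsWith(dir.endsWith("/") ? dir : `${dir}/`);
  return [paths.registry, paths.index].filter(
    (file) => !copiedDirs.some((dir) => under(file, dir)),
  );
}

// #70: XDG_CACHE_HOME redirected, only /opt/qmd-cache copied -> registry lost.
console.log(missedByCopy(qmdPaths({ HOME: "/root", XDG_CACHE_HOME: "/opt/qmd-cache" }), ["/opt/qmd-cache"]));

// #71: default paths, both qmd directories copied -> nothing lost.
console.log(missedByCopy(qmdPaths({ HOME: "/root" }), ["/root/.config/qmd", "/root/.cache/qmd"]));
```

Swapping `HOME: "/root"` for a non-root user's home moves both paths, which is the portability question F4 raised and #71 deferred.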
Summary
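Post-#69 verification surfaced a deeper layer of the same KRANZ-act-1 reproducibility issue: the codex MCP at v1.4.0 now registers `search_codex` (per PR #65), but calling it returns:

    { "error": "qmd_unavailable", "message": "qmd binary not found in PATH", "hint": "@tobilu/qmd is a peerDependency. install + run scripts/build-micodex-index.sh" }

PR #69 fixed the build (alpine→slim · runtime libs). This PR fixes the runtime (qmd binary + prebuilt qmd index in the image).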
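The `qmd_unavailable` response is what a missing executable looks like once `spawnSync` is involved: Node does not throw, it sets `result.error` with code `ENOENT`, so nothing fails until the first call. The sketch below shows one way to map that to the error quoted in this PR. The result type, the `qmd_failed` code, and `runQmd` itself are hypothetical; this is not the code in src/lookups/search.ts.

```typescript
import { spawnSync } from "node:child_process";

// HYPOTHETICAL sketch, not src/lookups/search.ts: map a missing qmd binary to
// the qmd_unavailable error quoted in this PR. "qmd_failed" is an invented code.
type QmdResult =
  | { ok: true; stdout: string }
  | { ok: false; error: string; message: string; hint?: string };

function runQmd(args: string[], bin = "qmd"): QmdResult {
  const res = spawnSync(bin, args, { encoding: "utf8" });

  // spawnSync does not throw for a missing executable; it reports it on
  // res.error with code ENOENT, so the problem surfaces only when this runs.
  const code = (res.error as (Error & { code?: string }) | undefined)?.code;
  if (code === "ENOENT") {
    return {
      ok: false,
      error: "qmd_unavailable",
      message: "qmd binary not found in PATH",
      hint: "@tobilu/qmd is a peerDependency. install + run scripts/build-micodex-index.sh",
    };
  }
  if (res.error) {
    return { ok: false, error: "qmd_failed", message: res.error.message };
  }
  if (res.status !== 0) {
    return {
      ok: false,
      error: "qmd_failed",
      message: res.stderr.trim() || `qmd exited with status ${res.status}`,
    };
  }
  return { ok: true, stdout: res.stdout };
}
```

Because the failure is only observable when the spawn happens, a build-time step that invokes qmd is what moves this error from the first request to the image build.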
What changes
Builder stage (after `pnpm install`):
- `npm install -g @tobilu/qmd@^2.1.0`: postinstall reuses the build toolchain (git/make/python3/g++) already added in fix(docker): node:20-alpine → node:20-slim for node-llama-cpp postinstall #69
- `bash scripts/build-micodex-index.sh`: populates `~/.cache/qmd/index.sqlite` with the codex-grails + codex-core-lore collections + context

Runtime stage (after libs install):
- `COPY` qmd's global node_modules (package JS + `.node` addon)
- `chmod +x` the bin
- `COPY` the prebuilt `~/.cache/qmd` index

The runtime stage stays compiler-free (#69 F5 PRAISE preserved): it only adds qmd's compiled artifacts + index, no build tools.
Verifies post-merge
- `curl -X POST https://mcp.0xhoneyjar.xyz/codex/mcp ... tools/call name:"search_codex" arguments:{intent:"void motif"}` → returns `@g876` Black Hole at score ≥0.7 (today: returns `qmd_unavailable`)
- WITNESS S3 (`grimoires/loa/qa/qa-cycle-micodex-09a-2026-05-02.md`, PR feat(eval): MICODEX intent-layer eval corpus — session 09a · 58/58 #68): all 5 substrate-truth seed cases pass against prod
- 9/9 tools functional (today: 8/9; `lookup_*` work, `search_codex` broken)

Image-size cost
- `+~50MB` for the global qmd package install
- `+~90MB` for qmd `index.sqlite` (43 grails + core-lore content + embeddings)

Acceptable trade for a working intent layer in prod. If image size becomes a concern, V2 candidate: lazy-build the index on container start (saves image bytes, costs ~30-90s cold start).
Related
🤖 Generated with Claude Code