fix(docker): bundle qmd binary + prebuilt codex index for search_codex runtime #70

Merged
zkSoju merged 2 commits into main from fix/dockerfile-bundle-qmd-binary-and-index
May 2, 2026

@zkSoju zkSoju commented May 2, 2026

Summary

Post-#69 verification surfaced a deeper layer of the same KRANZ-act-1 reproducibility issue: the codex MCP at v1.4.0 now registers search_codex (per PR #65), but calling it returns:

{
  "error": "qmd_unavailable",
  "message": "qmd binary not found in PATH",
  "hint": "@tobilu/qmd is a peerDependency. install + run scripts/build-micodex-index.sh"
}

PR #69 fixed the build (alpine→slim · runtime libs). This PR fixes the runtime (qmd binary + prebuilt qmd index in the image).

What changes

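Builder stage (after pnpm install):

  • npm install -g @tobilu/qmd@^2.1.0 (postinstall reuses the already-installed git/make/python3/g++ for the node-llama-cpp clone+build)
  • bash scripts/build-micodex-index.sh (registers the codex-grails and codex-core-lore collections, adds context, populates ~/.cache/qmd/index.sqlite)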

Runtime stage (after libs install):

  • COPY qmd's global node_modules (package JS + .node addon)
  • Explicit file-copy of qmd bin script (not relying on Docker symlink-resolution behavior)
  • chmod +x the bin
  • COPY the prebuilt ~/.cache/qmd index

Runtime stage stays compiler-free (#69 F5 PRAISE preserved) — only adds qmd's compiled artifacts + index, no build tools.

Verifies post-merge

  • Railway auto-builds successfully (no postinstall regression)
  • curl -X POST https://mcp.0xhoneyjar.xyz/codex/mcp ... tools/call name:"search_codex" arguments:{intent:"void motif"} → returns @g876 Black Hole at score ≥0.7 (today: returns qmd_unavailable)
  • WITNESS S3 in grimoires/loa/qa/qa-cycle-micodex-09a-2026-05-02.md (PR #68, "feat(eval): MICODEX intent-layer eval corpus — session 09a · 58/58"): all 5 substrate-truth seed cases pass against prod
  • 9/9 tools functional in production (today: 8/9 — lookup_* work, search_codex broken)
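
A rough sketch of the search_codex check above, for anyone reproducing it by hand. MCP tools/call requests are JSON-RPC 2.0, so the body can be built as below. The helper names (`build_payload`, `is_unavailable`) are made up for illustration, and the curl headers in the comment are an assumption about what the gateway expects rather than something confirmed in this PR.

```shell
#!/bin/sh
set -eu

# build_payload <intent>: JSON-RPC 2.0 request body for an MCP tools/call.
# Assumes the intent contains no characters that need JSON escaping.
build_payload() {
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_codex","arguments":{"intent":"%s"}}}' "$1"
}

# is_unavailable <response-body>: succeeds if the body carries the
# qmd_unavailable error string quoted in the Summary.
is_unavailable() {
  printf '%s' "$1" | grep -q '"qmd_unavailable"'
}

payload=$(build_payload "void motif")
echo "$payload"

# Posting it would look roughly like this (headers are assumptions, and the
# gateway may require an initialize handshake first):
#   curl -sS -X POST https://mcp.0xhoneyjar.xyz/codex/mcp \
#     -H 'Content-Type: application/json' \
#     -H 'Accept: application/json, text/event-stream' \
#     -d "$payload"
```

Feeding the response body to `is_unavailable` gives a quick yes/no on whether the pre-fix failure is still present.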

Image-size cost

  • +~50MB for global qmd package install
  • +~90MB for qmd index.sqlite (43 grails + core-lore content + embeddings)

Acceptable trade for a working intent layer in prod. If image size becomes a concern, V2 candidate: lazy-build the index on container start (saves image bytes, costs ~30-90s cold start).
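
For the V2 lazy-build idea, a minimal sketch of what a container entrypoint guard might look like, assuming the index path and build script named in this PR. `ensure_index` is a hypothetical helper, the builder here is a stub so the snippet runs anywhere, and nothing like this is part of the current Dockerfile.

```shell
#!/bin/sh
set -eu

# ensure_index <index-path> <build-command...>
# Runs the build command only when the index file is absent.
ensure_index() {
  index_path=$1
  shift
  if [ -f "$index_path" ]; then
    echo "index present, skipping build"
  else
    echo "index missing, building"
    "$@"
  fi
}

# Demo with a stub builder that just creates an empty file.
demo_dir=$(mktemp -d)
demo_index="$demo_dir/index.sqlite"
stub_build() { : > "$demo_index"; }

first=$(ensure_index "$demo_index" stub_build)   # cold start: builds
second=$(ensure_index "$demo_index" stub_build)  # warm start: skips

echo "first start:  $first"
echo "second start: $second"
rm -rf "$demo_dir"
```

In a real entrypoint the call would presumably be `ensure_index "$HOME/.cache/qmd/index.sqlite" bash scripts/build-micodex-index.sh` followed by an `exec` of the server. The build tools the index script needs would then have to exist at runtime, which conflicts with the compiler-free runtime stage, so this is a trade-off and not a free win.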

Related

🤖 Generated with Claude Code

…x runtime

Post-PR-#69 verification (2026-05-02T23:38Z): Railway redeployed v1.4.0
successfully — `serverInfo.version: "1.4.0"` confirmed via S0 against
gateway. tools/list now shows all 9 tools including search_codex (intent
layer from PR #65). But calling search_codex returns:

  {
    "error": "qmd_unavailable",
    "message": "qmd binary not found in PATH",
    "hint": "@tobilu/qmd is a peerDependency. install + run scripts/build-micodex-index.sh"
  }

Same KRANZ act 1 reproducibility issue caught locally during session 09a
forge prep — now at the Docker layer. PR #69 fixed the build (alpine→slim);
this PR fixes the runtime (qmd binary + prebuilt index).

Builder stage:
- npm install -g @tobilu/qmd@^2.1.0 — postinstall reuses already-installed
  git/make/python3/g++ for node-llama-cpp clone+build
- bash scripts/build-micodex-index.sh — registers codex-grails + codex-core-lore
  collections + adds context, populates ~/.cache/qmd/index.sqlite

Runtime stage:
- COPY /usr/local/lib/node_modules/@tobilu (the package + .node binary)
- COPY /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd → /usr/local/bin/qmd
  (explicit file copy; not relying on Docker's symlink-resolution behavior)
- chmod +x the bin
- COPY /root/.cache/qmd (the prebuilt index)

The runtime stage stays compiler-free (F5 PRAISE preserved) — only adds
qmd's compiled artifacts + index, no build tools.

Verifies post-merge:
- curl gateway tools/call search_codex {intent: "void motif"} → @g876 Black
  Hole at score ≥0.7 (currently returns qmd_unavailable error)
- WITNESS S3 (substrate-truth seed cases) all 5 cases pass against prod
- 9/9 tools functional (today: 8/9 — lookup_* work, search_codex broken)

Image-size cost: +~50MB for global qmd install + ~90MB for qmd index.sqlite.
Acceptable trade for working intent layer in prod.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): May 2, 2026 11:51pm

@zkSoju zkSoju left a comment

Summary

Analytical review of #70. Enrichment pass was unavailable; findings are unenriched.

Findings

{
  "schema_version": 1,
  "findings": [
    {
      "id": "qmd-bin-symlink-vs-copy",
      "title": "Copying qmd bin script as a file may break if it's a symlink to dist/cli.js",
      "severity": "MEDIUM",
      "category": "correctness",
      "file": "Dockerfile",
      "description": "npm typically installs global bin entries as symlinks in /usr/local/bin pointing to ../lib/node_modules/<pkg>/bin/<file>. The Dockerfile does `COPY --from=builder /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd` which copies the actual script content. This works only if bin/qmd is a self-contained Node script with a proper shebang. If qmd's package.json `bin` points to a different path (e.g., dist/cli.js), or if the script uses relative requires that resolve based on its location in node_modules, copying it to /usr/local/bin will break module resolution because Node's require lookup starts from the script's own directory.",
      "suggestion": "Recreate the symlink instead: `RUN ln -s /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd`. Verify the actual bin entry path from the package's package.json — npm 9+ installs to /usr/local/bin as symlinks, so this might already exist if you also COPY /usr/local/bin/qmd from builder. Either way, prefer the symlink approach to preserve the script's __dirname-relative resolution.",
      "confidence": 0.7
    },
    {
      "id": "version-pinning",
      "title": "Caret range on qmd defeats reproducible image builds",
      "severity": "LOW",
      "category": "reproducibility",
      "file": "Dockerfile",
      "description": "`npm install -g @tobilu/qmd@^2.1.0` allows any 2.x.y >= 2.1.0 to be picked up at image build time. Two builds of the same Dockerfile from the same source can produce different qmd versions, complicating debugging of search_codex runtime failures.",
      "suggestion": "Pin to an exact version (e.g., `@tobilu/qmd@2.1.0`) and bump intentionally. Consider also pinning in pnpm/npm lockfile coverage if qmd is referenced elsewhere.",
      "confidence": 0.9
    },
    {
      "id": "index-rebuild-cache-miss",
      "title": "Codex index rebuild on every image build inflates build time and is non-cached",
      "severity": "LOW",
      "category": "performance",
      "file": "Dockerfile",
      "description": "`RUN bash scripts/build-micodex-index.sh` runs after `COPY . .` so any change to the working tree invalidates the layer and re-builds the index. If indexing involves embedding generation via node-llama-cpp this can be slow. No issue if intentional, but worth flagging.",
      "suggestion": "If the index inputs are a known subset of the repo (e.g., codex/ directory), copy only those before the index build to maximize layer cache hits. Otherwise, accept the cost and document expected build time.",
      "confidence": 0.6
    },
    {
      "id": "root-cache-path-runtime-user",
      "title": "Index copied to /root/.cache/qmd assumes runtime runs as root",
      "severity": "MEDIUM",
      "category": "correctness",
      "file": "Dockerfile",
      "description": "The index is copied to `/root/.cache/qmd` in the runtime image. If the container is ever run as a non-root user (e.g., via `USER` directive added later, Kubernetes runAsNonRoot, or `docker run -u`), qmd will look in $HOME/.cache/qmd of that user, not /root, and search_codex will silently return qmd_unavailable again — the exact failure mode this PR fixes.",
      "suggestion": "Either (a) document that the runtime must be root, (b) set `ENV XDG_CACHE_HOME=/opt/qmd-cache` and copy the index there with world-readable perms so it works regardless of UID, or (c) honor a qmd-specific env var if one exists. Option (b) is most robust.",
      "confidence": 0.75
    },
    {
      "id": "no-runtime-smoke-test",
      "title": "No build-time verification that qmd is functional in runtime stage",
      "severity": "LOW",
      "category": "testing",
      "file": "Dockerfile",
      "description": "The PR description and comments note that this fixes a silent-runtime-failure pattern (qmd_unavailable returned only at first searchCodex call). The Dockerfile does not add a smoke test (e.g., `RUN qmd --version` or `RUN qmd query ...`) in the runtime stage to fail the build if the binary or index is broken. Future regressions to bin path, index location, or libstdc++ deps will again fail silently at runtime.",
      "suggestion": "Add `RUN qmd --version && qmd query --help` (or a minimal real query against the bundled index) in the runtime stage so image build fails loudly on regressions.",
      "confidence": 0.85
    },
    {
      "id": "scope-name-redundant-copy",
      "title": "Redundant COPY of @tobilu scope and qmd subpath",
      "severity": "LOW",
      "category": "clarity",
      "file": "Dockerfile",
      "description": "The first COPY copies the entire `/usr/local/lib/node_modules/@tobilu` directory which already contains `@tobilu/qmd/bin/qmd`. The second COPY then extracts the same file to /usr/local/bin/qmd. This is fine but slightly redundant; the second COPY could be replaced with a symlink (also addressing the symlink concern in qmd-bin-symlink-vs-copy).",
      "suggestion": "Replace the second COPY + chmod with `RUN ln -s /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd`.",
      "confidence": 0.8
    },
    {
      "id": "praise-failure-mode-doc",
      "title": "Comments precisely document the silent-failure mode being fixed",
      "severity": "PRAISE",
      "category": "documentation",
      "file": "Dockerfile",
      "description": "The inline comments reference the exact source location (src/lookups/search.ts:120), the failure signature (\"qmd_unavailable\"), the runtime mechanism (spawnSync), and the dlopen dependency on libstdc++/libgomp. This makes the rationale auditable and gives the next person debugging build/runtime issues a clear map.",
      "suggestion": "Keep this style for future Dockerfile changes."
    }
  ]
}

Callouts

Enrichment unavailable for this review.

…moke, pin

Addresses 4 bridgebuilder findings inline:

- F1 MEDIUM (qmd-bin-symlink-vs-copy · conf 0.7) ✅
  Switched COPY+chmod to RUN ln -s. Preserves __dirname-relative require
  resolution if qmd's bin script does any. Also addresses F6 redundancy
  (the bin file was already in @tobilu COPY tree above).

- F4 MEDIUM (root-cache-path-runtime-user · conf 0.75) ✅
  Set ENV XDG_CACHE_HOME=/opt/qmd-cache in both stages. UID-portable —
  survives USER directive, K8s runAsNonRoot, docker run -u <uid>. Index
  built/copied to /opt/qmd-cache instead of /root/.cache/qmd. Added
  chmod -R a+r so non-root users can read.

- F2 LOW (version-pinning · conf 0.9) ✅
  npm install -g @tobilu/qmd@^2.1.0 → @2.1.0 (exact pin). Reproducible
  builds; bump intentionally.

- F5 LOW (no-runtime-smoke-test · conf 0.85) ✅
  Added RUN qmd --version && qmd status to runtime stage. Catches
  symlink failures, missing libs, XDG_CACHE_HOME mismatches at build
  time instead of first search_codex call.

- F3 LOW (index-rebuild-cache-miss · conf 0.6) ⏸ deferred
  Real but non-trivial fix (split COPY into multiple stages for grails/
  + core-lore/ + scripts/ before main COPY . .). Worthwhile follow-up
  if build time becomes the bottleneck.

- F6 LOW (scope-name-redundant-copy · conf 0.8) ✅ resolved by F1 fix
- F7 PRAISE (failure-mode docs) — comments extended with bridgebuilder ref

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30957495a9

Comment thread Dockerfile Outdated
# returns "qmd_unavailable" at first call — the silent-runtime-failure pattern
# from PR #69 reaching one layer deeper.
COPY --from=builder /usr/local/lib/node_modules/@tobilu /usr/local/lib/node_modules/@tobilu
COPY --from=builder /usr/local/lib/node_modules/@tobilu/qmd/bin/qmd /usr/local/bin/qmd

P1: Copy the installed qmd symlink, not the package wrapper

@tobilu/qmd 2.1.0 ships a bin/qmd shell wrapper that derives its package root from dirname($0) (after symlink resolution) and then executes $DIR/dist/cli/qmd.js; copying .../qmd/bin/qmd directly to /usr/local/bin/qmd makes $0 resolve to /usr/local/bin/qmd, so it looks for /usr/local/dist/cli/qmd.js (missing) and exits at runtime. In this image, searchCodex shells out to qmd, so this line can keep search_codex broken even though files were copied. Copy /usr/local/bin/qmd from builder (preserving npm’s symlink target) or adjust the wrapper target path.
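
The failure mode described here can be reproduced without qmd at all. The sketch below builds a throwaway package whose wrapper locates its root from `dirname "$0"`, then runs it once as a plain copy and once through a symlink. The file names (`tool`, `cli.sh`) are stand-ins; the real qmd wrapper is only paraphrased from the review comment, not copied.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/pkg/bin" "$work/pkg/dist" "$work/copied" "$work/linked"

# The entry point the wrapper is supposed to execute.
printf '#!/bin/sh\necho "cli ok"\n' > "$work/pkg/dist/cli.sh"

# Wrapper: resolve a symlinked $0, then look for ../dist/cli.sh beside it.
# (Handles one level of absolute symlink, which is all this demo needs.)
cat > "$work/pkg/bin/tool" <<'EOF'
#!/bin/sh
self=$0
if [ -L "$self" ]; then self=$(readlink "$self"); fi
root=$(cd "$(dirname "$self")/.." && pwd)
exec sh "$root/dist/cli.sh"
EOF
chmod +x "$work/pkg/bin/tool"

# Case 1: plain file copy, analogous to COPY .../qmd/bin/qmd /usr/local/bin/qmd.
cp "$work/pkg/bin/tool" "$work/copied/tool"
if "$work/copied/tool" >/dev/null 2>&1; then
  copied_result="ok"
else
  copied_result="broken"
fi

# Case 2: symlink back into the package, analogous to npm's global bin link.
ln -s "$work/pkg/bin/tool" "$work/linked/tool"
linked_result=$("$work/linked/tool")

echo "copied: $copied_result"
echo "linked: $linked_result"
```

The copied wrapper computes its root as the parent of `copied/`, finds no `dist/cli.sh` there, and exits non-zero, while the symlinked one resolves back into the package tree and runs.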


zkSoju commented May 2, 2026

Bridgebuilder #70 findings addressed

Pushed 3db54d5fa addressing 4 of 7 findings inline:

  • F1 MEDIUM (qmd-bin-symlink-vs-copy) ✅ — switched COPY+chmod to RUN ln -s to preserve __dirname-relative require resolution. Also resolves F6 (redundancy).
  • F4 MEDIUM (root-cache-path-runtime-user) ✅ — set ENV XDG_CACHE_HOME=/opt/qmd-cache in both stages + chmod -R a+r for non-root readability. UID-portable across USER directive, K8s runAsNonRoot, docker run -u.
  • F2 LOW (version-pinning) ✅ — @^2.1.0 → @2.1.0 exact pin. Reproducible builds.
  • F5 LOW (no-runtime-smoke-test) ✅ — added RUN qmd --version && qmd status | head -5 in runtime stage. Catches symlink failures, missing libs, XDG mismatches at build time.
  • F3 LOW (index-rebuild-cache-miss) ⏸ deferred — real but non-trivial (needs splitting COPY . . to maximize layer cache). Worthwhile follow-up if build time becomes painful.
  • F6 LOW (scope-name-redundant-copy) ✅ resolved by F1 fix.
  • F7 PRAISE (failure-mode docs) — comments extended with bridgebuilder ref.

Ready to merge.

@zkSoju zkSoju merged commit 193e097 into main May 2, 2026
3 of 4 checks passed
@zkSoju zkSoju deleted the fix/dockerfile-bundle-qmd-binary-and-index branch May 2, 2026 23:51
zkSoju added a commit that referenced this pull request May 3, 2026
…at smoke

Tightens runtime smoke from `qmd status` (passes even with empty registry —
the exact silent-success that #70 shipped) to explicit grep assertions on
both expected collection names. Fails loudly if EITHER:
- /root/.config/qmd registry COPY missed
- collection registration failed in builder
- qmd 2.x changes registry format such that names don't appear in `qmd status` output
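
A minimal sketch of the kind of assertion described above, written as a function so it can be exercised without qmd installed. The sample status text is invented (the real `qmd status` output format is not shown in this thread), and `assert_collections` is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

# assert_collections <status-text>
# Succeeds only if both expected collection names appear in the text.
assert_collections() {
  for name in codex-grails codex-core-lore; do
    if ! printf '%s\n' "$1" | grep -qF -- "$name"; then
      echo "missing collection: $name" >&2
      return 1
    fi
  done
}

# Invented sample outputs, for demonstration only.
good_status='collections:
  codex-grails
  codex-core-lore'
empty_status='collections: (none)'

if assert_collections "$good_status"; then good=pass; else good=fail; fi
if assert_collections "$empty_status" 2>/dev/null; then empty=pass; else empty=fail; fi

echo "populated registry: $good"
echo "empty registry:     $empty"
```

In the Dockerfile the same idea would presumably be a `RUN` step along the lines of `assert_collections "$(qmd status)"`, or two chained `qmd status | grep -q <name>` calls, so that an empty registry fails the image build.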

F1 (MEDIUM root-path portability) intentionally deferred — the PR's premise
is that non-root deploy was theoretical and the simpler default-paths approach
is safer than half-XDG (which is what #70 shipped and broke things). Revisit
if/when non-root deploy is actually needed; would require qmd 2.x to expose
config-dir override.

F3/F4 (LOW perms/size) deferred — low-leverage; address if image bloats or
runtime user changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zkSoju added a commit that referenced this pull request May 3, 2026
…ailed post-#70 (#71)

* fix(docker): copy qmd registry (~/.config/qmd) — search_codex still failed post-#70

Verified post-merge of PR #70 (Railway redeploy 2026-05-02T~23:55Z): search_codex
returned a NEW error — "qmd collection \"codex-grails\" not found." Build/symlink
worked; the qmd binary loads. But qmd's collection registry is missing in runtime.

Root cause: qmd splits state across THREE locations (verified via local install
inspection + qmd source at dist/store.js:345, dist/cli/qmd.js:225, dist/llm.js:64,
dist/collections.js:69):

  /root/.config/qmd/index.yml      collection registry (XDG-config respected)
  /root/.cache/qmd/index.sqlite    content + embeddings (XDG-cache respected)
  /root/.cache/qmd/models/         embedding model (HARDCODED homedir/.cache)

#70 set XDG_CACHE_HOME=/opt/qmd-cache for non-root portability per bridgebuilder
F4, but XDG_CACHE_HOME only redirects ~/.cache, not ~/.config. So:
  - Builder wrote registry to /root/.config/qmd/index.yml (always defaults here)
  - Builder wrote sqlite/models to /opt/qmd-cache/qmd/ (XDG redirect)
  - Runtime COPYed only /opt/qmd-cache → registry was MISSING in runtime
  - qmd at runtime: "I have a sqlite file but don't know what collections exist"
  - Result: "qmd collection codex-grails not found"
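
The split can be illustrated with the standard XDG Base Directory fallback rules: each variable is honored if set, otherwise the default under the home directory applies. This is a sketch of those rules with the values from #70 plugged in, not qmd's actual resolution code; `xdg_dir` is a hypothetical helper.

```shell
#!/bin/sh
set -eu

# xdg_dir <override> <home> <default-subdir>
# Returns the override if non-empty, else <home>/<default-subdir>.
xdg_dir() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s/%s\n' "$2" "$3"
  fi
}

home=/root
xdg_cache_home=/opt/qmd-cache   # what #70 set in both stages
xdg_config_home=                # never set, so the default applies

registry="$(xdg_dir "$xdg_config_home" "$home" .config)/qmd/index.yml"
index="$(xdg_dir "$xdg_cache_home" "$home" .cache)/qmd/index.sqlite"

echo "registry: $registry"
echo "index:    $index"
```

With only the cache variable set, the registry stays under /root/.config while the index moves to /opt/qmd-cache, which is why a runtime stage that copies only /opt/qmd-cache ends up without the registry.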

Pragmatic fix: don't rebase qmd's paths. Drop XDG_CACHE_HOME ENV statements;
let qmd write defaults in builder; COPY both /root/.config/qmd + /root/.cache/qmd
verbatim to runtime. The non-root portability concern from F4 was theoretical
(Railway runs as root). If non-root deploy is needed later, address then with a
proper qmd config-path override (which qmd 2.x doesn't expose for ~/.config).

Tightened smoke test: `qmd status` (no head pipe) loads the registry — if
collections list is empty here, the COPY missed something. Build fails loudly
at this step instead of search_codex erroring at first call.

Image-size cost grows by ~/.cache/qmd/models (~50-150MB depending on embedding
model qmd downloaded). Trade for working search_codex without runtime model
download. V2 candidate: lazy model load + persistent volume mount.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): bridgebuilder #71 F2 — assert both collections register at smoke

Tightens runtime smoke from `qmd status` (passes even with empty registry —
the exact silent-success that #70 shipped) to explicit grep assertions on
both expected collection names. Fails loudly if EITHER:
- /root/.config/qmd registry COPY missed
- collection registration failed in builder
- qmd 2.x changes registry format such that names don't appear in `qmd status` output

F1 (MEDIUM root-path portability) intentionally deferred — the PR's premise
is that non-root deploy was theoretical and the simpler default-paths approach
is safer than half-XDG (which is what #70 shipped and broke things). Revisit
if/when non-root deploy is actually needed; would require qmd 2.x to expose
config-dir override.

F3/F4 (LOW perms/size) deferred — low-leverage; address if image bloats or
runtime user changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>