feat(metadata): support server-side metadata search evals by jack-arturo · Pull Request #10 · verygoodplugins/automem-evals

jack-arturo · 2026-06-10T18:13:34Z

Summary

Add server-metadata-search eval support for comparing two AutoMem code checkouts without transforming restored data.
Add baseline/candidate AutoMem dir and Python env controls, vector identity checks, and a metadata sidecar run label.
Add filtered AUTOMEM_RUNTIME_ENV_FILE loading so both Docker stacks share embedding runtime config without leaking unrelated env into Compose.
Update tests and README workflow for server-side metadata search evals.
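The filtered env-file loading described above can be sketched roughly as follows. This is a minimal Python illustration, not the logic in `scripts/real_data_metadata_eval.sh`: the allowlisted key names, `load_allowlisted_env`, and `redact_values` are all hypothetical, since the PR page does not show the real allowlist or helpers.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical allowlist: the real key names are not shown in this PR page.
ALLOWED_KEYS: FrozenSet[str] = frozenset(
    {"EMBEDDING_PROVIDER", "EMBEDDING_MODEL", "VECTOR_SIZE"}
)


def load_allowlisted_env(
    lines: Iterable[str], allowed: FrozenSet[str] = ALLOWED_KEYS
) -> Dict[str, str]:
    """Parse KEY=VALUE lines and keep only keys that are on the allowlist."""
    env: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in allowed:
            continue  # malformed line or a key we must not pass through
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key] = value
    return env


def redact_values(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy that is safe to log: keys stay, values are masked."""
    return {key: "<redacted>" for key in env}
```

Keeping the filter as an allowlist rather than a denylist means a new, unrelated variable in the env file is dropped by default instead of reaching either Compose stack.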

Breaking Changes

None.

Validation

python3 -m unittest discover -s runners -p 'test_*.py' -> 186 tests OK, 1 skipped.
python3 -m unittest scripts/test_real_data_metadata_eval_shell.py -> 6 tests OK.
Real-data eval: data/sweep_runs/20260610T174320Z-metadata-server-metadata-search.
Metadata hit@5: 0.140 -> 0.770; MRR: 0.126 -> 0.672.
Vector identity passed: 9,921 baseline and 9,921 candidate vectors were byte-identical.
Preserve/mixed suite had zero non-OK statuses.
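For readers unfamiliar with the two retrieval metrics quoted above, here is a minimal sketch of how hit@k and MRR are conventionally computed. It assumes each query yields the 1-based rank of its expected memory, or `None` when it was not retrieved; the eval harness may apply a different cutoff or tie-breaking, and the function names are illustrative only.

```python
from typing import Optional, Sequence


def hit_at_k(ranks: Sequence[Optional[int]], k: int = 5) -> float:
    """Fraction of queries whose expected item appears in the top k results."""
    if not ranks:
        return 0.0
    hits = sum(1 for rank in ranks if rank is not None and rank <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries; a miss contributes 0."""
    if not ranks:
        return 0.0
    total = sum(1.0 / rank for rank in ranks if rank is not None)
    return total / len(ranks)
```

Under these definitions, four queries with ranks `[1, 2, None, 10]` give hit@5 of 0.5 and MRR of 0.4.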

Related AutoMem PR: verygoodplugins/automem#177

Copilot

Pull request overview

Adds a new “server-metadata-search” real-data metadata eval mode that compares baseline vs candidate AutoMem server code directly (without transforming restored data), while improving run metadata capture and reproducibility for worktree-based runs.

Changes:

Extend scripts/real_data_metadata_eval.sh to support separate baseline/candidate AutoMem checkouts + Python envs, add runtime env-file allowlist loading, and introduce a server-side vector identity preflight path.
Add/extend unit + shell-level tests covering runtime env-file redaction, restore-plan rendering for separate checkouts, and report run-label propagation.
Update the eval report generator to accept and emit a run_label in both markdown output and metrics JSON; update README workflow notes for worktrees.
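The run-label propagation in the last item could look something like the sketch below. Only the `--run-label` option name comes from the PR; `build_parser`, `render_report`, the report heading, and the `run_label` JSON key placement are assumptions about how `runners/run_metadata_ab_eval.py` might be organized.

```python
import argparse
import json
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Metadata A/B eval report (sketch)")
    # --run-label is named in the PR; the default and help text are assumptions.
    parser.add_argument("--run-label", default=None, help="optional label for this run")
    return parser


def render_report(metrics: dict, run_label: Optional[str]) -> Tuple[str, str]:
    """Return (markdown, metrics_json), including the label in both when set."""
    payload = dict(metrics)
    lines = ["# Metadata A/B eval"]
    if run_label:
        payload["run_label"] = run_label
        lines.append(f"Run label: {run_label}")
    for name in sorted(metrics):
        lines.append(f"- {name}: {metrics[name]}")
    return "\n".join(lines) + "\n", json.dumps(payload, indent=2, sort_keys=True)
```

Writing the label into both outputs from one function keeps the human-readable report and the machine-readable metrics from drifting apart.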

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| scripts/test_real_data_metadata_eval_shell.py | Adds integration-style shell tests for runtime env allowlist loading/redaction and server-variant restore-plan behavior. |
| scripts/real_data_metadata_eval.sh | Adds server-side eval variant support, baseline/candidate checkout + Python controls, allowlisted runtime env-file loading, and run labeling. |
| runners/test_run_metadata_ab_eval.py | Updates report-writing test coverage to validate run-label rendering. |
| runners/test_check_vector_identity.py | New unit tests for vector identity comparison + artifact writing. |
| runners/run_metadata_ab_eval.py | Adds `--run-label` CLI option and includes run label in report + metrics output. |
| runners/check_vector_identity.py | New runner that compares baseline/candidate Qdrant vectors and writes preflight/summary artifacts. |
| README.md | Updates worktree guidance to cover server-code variants and separate baseline/candidate AutoMem directories. |
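The vector identity comparison listed for `runners/check_vector_identity.py` can be illustrated with a small sketch. It assumes the vectors have already been fetched from each Qdrant instance into plain dictionaries keyed by point id; the fetching step, `vector_bytes`, `compare_vectors`, and the summary field names are hypothetical and not taken from the runner.

```python
from array import array
from typing import Dict, List, Mapping, Sequence


def vector_bytes(vector: Sequence[float]) -> bytes:
    """Pack a vector as float32 so two copies can be compared byte for byte."""
    # Assumption: vectors are stored as float32; packing narrows Python floats.
    return array("f", vector).tobytes()


def compare_vectors(
    baseline: Mapping[str, Sequence[float]],
    candidate: Mapping[str, Sequence[float]],
) -> Dict[str, object]:
    """Summarize id-level and byte-level differences between two vector sets."""
    base_ids, cand_ids = set(baseline), set(candidate)
    missing: List[str] = sorted(base_ids - cand_ids)
    extra: List[str] = sorted(cand_ids - base_ids)
    mismatched: List[str] = sorted(
        point_id
        for point_id in base_ids & cand_ids
        if vector_bytes(baseline[point_id]) != vector_bytes(candidate[point_id])
    )
    return {
        "baseline_count": len(baseline),
        "candidate_count": len(candidate),
        "missing_in_candidate": missing,
        "extra_in_candidate": extra,
        "mismatched": mismatched,
        "identical": not (missing or extra or mismatched),
    }
```

Running a check like this before the eval is what allows a metric change to be attributed to server code rather than to differences in the stored embeddings.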


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

## Summary

- Add a bounded evidence-based metadata sidecar candidate channel to `/recall`, controlled by `RECALL_METADATA_SEARCH_ENABLED`.
- Search whitelisted metadata values without replacing content vectors, generating production tags, mutating graph content, or changing the HTTP API.
- Use field words only for scoring/disambiguation; this is not keyword-intent routing.
- Document metadata storage, recall response shape, update/enrichment behavior, consolidation notes, and search behavior for issue #110.

## Breaking Changes

None.

## Validation

- `.venv/bin/python -m pytest -q` -> 465 passed, 12 skipped.
- Real-data eval: `data/sweep_runs/20260610T174320Z-metadata-server-metadata-search`.
- Metadata hit@5: 0.140 -> 0.770; MRR: 0.126 -> 0.672.
- Qdrant vectors stayed byte-identical: 9,921 baseline and 9,921 candidate vectors.
- Preserve/mixed suite had zero non-OK statuses.

Eval harness PR: verygoodplugins/automem-evals#10

Closes #110

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

feat(metadata): support server-side metadata search evals

f392435

Copilot AI review requested due to automatic review settings June 10, 2026 18:13

Copilot started reviewing on behalf of jack-arturo June 10, 2026 18:13

jack-arturo mentioned this pull request Jun 10, 2026

feat(recall): add metadata sidecar search verygoodplugins/automem#177

Merged

Copilot AI reviewed Jun 10, 2026


Comment thread scripts/real_data_metadata_eval.sh Outdated

Comment thread README.md Outdated

jack-arturo and others added 2 commits June 10, 2026 20:47

Potential fix for pull request finding

5d7f4b5

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

20a109c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

jack-arturo merged commit c56fda2 into main Jun 10, 2026

jack-arturo deleted the feat/metadata-signal-diagnosis branch June 10, 2026 18:55

jack-arturo merged 3 commits into main from feat/metadata-signal-diagnosis
