Skip to content

docs(adr): propose unified search index mapping#2662

Open
dschmidt wants to merge 2 commits into
opencloud-eu:mainfrom
dschmidt:docs/adr-reflection-based-search-mapping
Open

docs(adr): propose unified search index mapping#2662
dschmidt wants to merge 2 commits into
opencloud-eu:mainfrom
dschmidt:docs/adr-reflection-based-search-mapping

Conversation

@dschmidt
Copy link
Copy Markdown
Contributor

@dschmidt dschmidt commented Apr 23, 2026

📄 Rendered ADR: https://github.com/dschmidt/opencloud/blob/docs/adr-reflection-based-search-mapping/docs/adr/0005-unified-search-index-mapping.md


Proposes ADR 0005 — a single central Go-struct + overrides map as the source of truth for the search index layout across bleve and OpenSearch, so both backends stop drifting on implicit per-backend defaults and adding new index features (geopoint, case-preserving aggregation siblings, ...) becomes a one-line change in one place.

Also records the end-to-end case-handling principle for facet data (indexed as case-preserving keywords so aggregation buckets return correct display values) and the cross-backend sibling-field pattern that makes features like geopoint trivially uniform across bleve and OpenSearch.

#2659 is a proof-of-concept implementation this ADR emerged from; scope and APIs there will be revisited once the decision here lands.

Deciders TBD — please add reviewers who should sign off on the technical direction.

Propose a single central Go-struct + overrides map as the source of
truth for the search index layout across bleve and OpenSearch — the
same definition drives the per-backend index mapping, the write-time
adapter, the hit-decoding path, and the KQL compiler's case-folding
rules, so the two backends cannot drift silently again.

Also records the end-to-end case-handling principle for facet data
(indexed as case-preserving keywords so aggregation buckets return
correct display values), the sibling-field pattern for geopoint on
Location, and the rationale for replacing the two backends' implicit
defaults with an explicit, backend-agnostic contract.

PR opencloud-eu#2659 is a proof-of-concept implementation the proposal emerged
from; scope and APIs there will be revisited once this ADR lands.
@dschmidt dschmidt force-pushed the docs/adr-reflection-based-search-mapping branch from e94393b to 8954125 Compare April 27, 2026 07:44
@sonarqubecloud
Copy link
Copy Markdown

@dschmidt
Copy link
Copy Markdown
Contributor Author

Updated it to make it more focused :)

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 12, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

Copy link
Copy Markdown

@codacy-production codacy-production Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The ADR for unified search index mapping is technically sound and aligns with Codacy standards. However, the document is currently incomplete as the 'Deciders' list remains empty; formal identification of stakeholders is required before this can be approved. While the proposed architecture eliminates behavioral drift, it explicitly accepts a performance trade-off for Bleve write-paths and creates a temporary feature gap for WebDAV REPORTs until graph-search is implemented. These should be acknowledged by the leadership team. Additionally, all identified verification scenarios for the implementation phase are currently missing and must be addressed in subsequent PRs.

About this PR

  • The 'Deciders' list in the ADR is currently empty. Technical leadership stakeholders must be formally identified and added to ensure the proposal has the necessary oversight.
  • The reliance on a future graph-search implementation to resolve facet exposure in WebDAV REPORTs leaves a temporary feature gap that may impact consumers relying on those reports.
  • The proposal relies on a JSON round-trip for bleve write-paths. This is an acceptable performance trade-off for now but will require monitoring during high-volume ingestion to ensure it does not become a bottleneck.

Test suggestions

  • Verify that a single struct definition generates matching semantic mappings for both bleve and OpenSearch backends.
  • Ensure facet sub-fields (e.g., audio.artist) correctly preserve casing in the index to support accurate aggregation bucket labels.
  • Verify the KQL compiler correctly folds query values to lowercase for fields like 'Name' but preserves case for 'audio.artist' based on the shared mapping.
  • Validate that the search service fails to start if the overrides map contains typos or references non-existent struct fields.
  • Verify that existing indexes remain operational under their old mappings while new indexes adopt the unified schema.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Verify that a single struct definition generates matching semantic mappings for both bleve and OpenSearch backends.
2. Ensure facet sub-fields (e.g., audio.artist) correctly preserve casing in the index to support accurate aggregation bucket labels.
3. Verify the KQL compiler correctly folds query values to lowercase for fields like 'Name' but preserves case for 'audio.artist' based on the shared mapping.
4. Validate that the search service fails to start if the overrides map contains typos or references non-existent struct fields.
5. Verify that existing indexes remain operational under their old mappings while new indexes adopt the unified schema.

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

- OpenSearch facet sub-strings (`audio.*`, `photo.*`, `image.*`)
change from the dynamic `text + keyword` multi-field to
`keyword`-only, implementing the principle above. As noted in
the context section, the tokenized leg is not actually reachable
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Suggestion: The parenthetical list of facet sub-strings is missing location.*. Including it ensures consistency with the facet definitions provided in the Context and Decision Outcome sections where location is explicitly mentioned.

Comment thread docs/adr/0005-unified-search-index-mapping.md
The bullet enumerating which OpenSearch facet sub-fields change
from text+keyword to keyword-only listed audio, photo and image but
omitted location, which is in the same dynamic-template-handled
facet block per the context section. Add it for consistency.
@dschmidt
Copy link
Copy Markdown
Contributor Author

dschmidt commented May 12, 2026

Test suggestions: out of scope for this PR. This is an ADR, so there is no code to test. The five suggestions are all behaviors of the future implementation that this ADR enables (struct-to-mapping equivalence across backends, case-preservation in facet sub-fields, KQL compiler folding behavior, startup validation of the overrides map, mixed-mapping coexistence). They belong on the implementation PR(s) that will follow this decision, not on the decision record itself imho.

WebDAV REPORT feature gap: already discussed in Follow-ups out of scope for this ADR. The WebDAV search endpoint currently renders none of the facet fields back to the client, so this is a pre-existing missing feature, not a regression introduced by the proposal. Resolution path is also called out: let the graph-search endpoint take over once graph search lands.

Bleve JSON round-trip performance trade-off: already discussed in Known trade-off. On hot paths it is measurable but not significant, and if it ever matters a direct reflection walker can replace the json round-trip without changing any call site.

Empty Deciders list: yes, we know. we haven't decided yet :)

Inline comment on line 222 (location.* missing): fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant