docs(adr): propose unified search index mapping#2662
Conversation
Propose a single central Go-struct + overrides map as the source of truth for the search index layout across bleve and OpenSearch — the same definition drives the per-backend index mapping, the write-time adapter, the hit-decoding path, and the KQL compiler's case-folding rules, so the two backends cannot drift silently again. Also records the end-to-end case-handling principle for facet data (indexed as case-preserving keywords so aggregation buckets return correct display values), the sibling-field pattern for geopoint on Location, and the rationale for replacing the two backends' implicit defaults with an explicit, backend-agnostic contract. PR opencloud-eu#2659 is a proof-of-concept implementation the proposal emerged from; scope and APIs there will be revisited once this ADR lands.
e94393b to
8954125
Compare
|
|
Updated it to make it more focused :) |
Up to standards ✅🟢 Issues
|
There was a problem hiding this comment.
Pull Request Overview
The ADR for unified search index mapping is technically sound and aligns with Codacy standards. However, the document is currently incomplete as the 'Deciders' list remains empty; formal identification of stakeholders is required before this can be approved. While the proposed architecture eliminates behavioral drift, it explicitly accepts a performance trade-off for Bleve write-paths and creates a temporary feature gap for WebDAV REPORTs until graph-search is implemented. These should be acknowledged by the leadership team. Additionally, all identified verification scenarios for the implementation phase are currently missing and must be addressed in subsequent PRs.
About this PR
- The 'Deciders' list in the ADR is currently empty. Technical leadership stakeholders must be formally identified and added to ensure the proposal has the necessary oversight.
- The reliance on a future graph-search implementation to resolve facet exposure in WebDAV REPORTs leaves a temporary feature gap that may impact consumers relying on those reports.
- The proposal relies on a JSON round-trip for bleve write-paths. This is an acceptable performance trade-off for now but will require monitoring during high-volume ingestion to ensure it does not become a bottleneck.
Test suggestions
- Verify that a single struct definition generates matching semantic mappings for both bleve and OpenSearch backends.
- Ensure facet sub-fields (e.g., audio.artist) correctly preserve casing in the index to support accurate aggregation bucket labels.
- Verify the KQL compiler correctly folds query values to lowercase for fields like 'Name' but preserves case for 'audio.artist' based on the shared mapping.
- Validate that the search service fails to start if the overrides map contains typos or references non-existent struct fields.
- Verify that existing indexes remain operational under their old mappings while new indexes adopt the unified schema.
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Verify that a single struct definition generates matching semantic mappings for both bleve and OpenSearch backends.
2. Ensure facet sub-fields (e.g., audio.artist) correctly preserve casing in the index to support accurate aggregation bucket labels.
3. Verify the KQL compiler correctly folds query values to lowercase for fields like 'Name' but preserves case for 'audio.artist' based on the shared mapping.
4. Validate that the search service fails to start if the overrides map contains typos or references non-existent struct fields.
5. Verify that existing indexes remain operational under their old mappings while new indexes adopt the unified schema.
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
| - OpenSearch facet sub-strings (`audio.*`, `photo.*`, `image.*`) | ||
| change from the dynamic `text + keyword` multi-field to | ||
| `keyword`-only, implementing the principle above. As noted in | ||
| the context section, the tokenized leg is not actually reachable |
There was a problem hiding this comment.
⚪ LOW RISK
Suggestion: The parenthetical list of facet sub-strings is missing location.*. Including it ensures consistency with the facet definitions provided in the Context and Decision Outcome sections where location is explicitly mentioned.
The bullet enumerating which OpenSearch facet sub-fields change from text+keyword to keyword-only listed audio, photo and image but omitted location, which is in the same dynamic-template-handled facet block per the context section. Add it for consistency.
|
Test suggestions: out of scope for this PR. This is an ADR, so there is no code to test. The five suggestions are all behaviors of the future implementation that this ADR enables (struct-to-mapping equivalence across backends, case-preservation in facet sub-fields, KQL compiler folding behavior, startup validation of the overrides map, mixed-mapping coexistence). They belong on the implementation PR(s) that will follow this decision, not on the decision record itself imho. WebDAV REPORT feature gap: already discussed in Follow-ups out of scope for this ADR. The WebDAV search endpoint currently renders none of the facet fields back to the client, so this is a pre-existing missing feature, not a regression introduced by the proposal. Resolution path is also called out: let the graph-search endpoint take over once graph search lands. Bleve JSON round-trip performance trade-off: already discussed in Known trade-off. On hot paths it is measurable but not significant, and if it ever matters a direct reflection walker can replace the json round-trip without changing any call site. Empty Deciders list: yes, we know. we haven't decided yet :) Inline comment on line 222 ( |



📄 Rendered ADR: https://github.com/dschmidt/opencloud/blob/docs/adr-reflection-based-search-mapping/docs/adr/0005-unified-search-index-mapping.md
Proposes ADR 0005 — a single central Go-struct + overrides map as the source of truth for the search index layout across bleve and OpenSearch, so both backends stop drifting on implicit per-backend defaults and adding new index features (geopoint, case-preserving aggregation siblings, ...) becomes a one-line change in one place.
Also records the end-to-end case-handling principle for facet data (indexed as case-preserving keywords so aggregation buckets return correct display values) and the cross-backend sibling-field pattern that makes features like geopoint trivially uniform across bleve and OpenSearch.
#2659 is a proof-of-concept implementation this ADR emerged from; scope and APIs there will be revisited once the decision here lands.
Deciders TBD — please add reviewers who should sign off on the technical direction.