15 changes: 15 additions & 0 deletions documents/reader_pipeline/reader_pipeline_engineering_spec.md
Expand Up @@ -95,6 +95,17 @@ the conversation package contains not only the body text but also, outside the body, sidecar …
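- `content_text` serves only as fallback plain text
- table / math / code must be rendered faithfully according to `semantic_ast_v2`
- the reader must no longer infer citations or artifacts from the tail of the body text
- reader search must follow the same package-aware contract instead of falling back to a single-source scan of `content_text`
- search occurrences always cover:
  - `body`
  - `source`
  - `attachment`
  - `artifact`
  - `annotation`
- occurrence order inside Reader is fixed:
  - first by message order
  - within a message, by `body -> source -> attachment -> artifact -> annotation`
- a single CJK character query may enter full-text reader search; a single non-CJK character query stays title/snippet-only and does not enter Reader full-text hit navigation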
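
The fixed occurrence order above can be expressed as a comparator. The following is a minimal sketch, not the project's implementation: `OrderedOccurrence`, its `messageIndex` field, and the `charOffset` tie-break inside a single surface are assumptions introduced for illustration.

```typescript
// Surfaces in the fixed within-message order stated in the spec.
const SURFACE_ORDER = ["body", "source", "attachment", "artifact", "annotation"] as const;
type SearchSurface = (typeof SURFACE_ORDER)[number];

// Hypothetical minimal occurrence shape used only by this sketch.
interface OrderedOccurrence {
  messageIndex: number; // assumed: position of the owning message in the conversation
  surface: SearchSurface;
  charOffset: number;
}

function compareOccurrences(a: OrderedOccurrence, b: OrderedOccurrence): number {
  // 1. Message order comes first.
  if (a.messageIndex !== b.messageIndex) return a.messageIndex - b.messageIndex;
  // 2. Within one message: body -> source -> attachment -> artifact -> annotation.
  const surfaceDelta = SURFACE_ORDER.indexOf(a.surface) - SURFACE_ORDER.indexOf(b.surface);
  if (surfaceDelta !== 0) return surfaceDelta;
  // 3. Assumption: hits on the same surface are ordered by character offset.
  return a.charOffset - b.charOffset;
}

function sortOccurrences(items: OrderedOccurrence[]): OrderedOccurrence[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort(compareOccurrences);
}
```

Keeping the surface order in one constant means the Reader and any QA fixture can share the same source of truth for the ordering.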

### 4.2.2 Sources, Attachment, And Artifact Sections

Expand All @@ -113,6 +124,10 @@ the conversation package contains not only the body text but also, outside the body, sidecar …
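- excerpts may be displayed, but an excerpt must come only from `markdownSnapshot / plainText / normalizedHtmlSnapshot`
- a full Artifact preview replica is not required in this round
- no dynamic content is live-replayed directly
- when Reader search hits a sidecar:
  - auto-expand the corresponding `Sources / Attachments / Artifacts` section
  - scroll to and focus the concrete item
  - the current hit uses the same active highlight semantics as body text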

### 4.3 Export

50 changes: 41 additions & 9 deletions documents/ui_refactor/threads_search_engineering_spec.md
Expand Up @@ -2,7 +2,7 @@

Version: topic-based canonical spec
Phase: threads-search-and-reader-navigation
-Status: Decision Complete (Docs only; no code implementation in this pass)
+Status: Active implementation baseline
Audience: Frontend engineers, QA, release owners

---
Expand All @@ -18,11 +18,12 @@ Locked decisions:

1. `SearchSession` is promoted to `VestiSidepanel` top-level state instead of remaining local to `TimelinePage`.
2. Search result ordering remains `updated_at` descending.
-3. Offscreen search returns only lightweight conversation-level match summaries.
-4. Reader builds occurrence-level navigation locally after messages are loaded.
+3. Offscreen search returns lightweight conversation-level match summaries with source taxonomy.
+4. Reader builds occurrence-level navigation locally after messages are loaded, including sidecar targets.
5. Reader must expose an explicit `reader_building_index` state.
6. List restore uses `anchorConversationId` as the primary restore target rather than raw `scrollTop`.
7. Highlight scope is limited to text-capable nodes; it does not expand into `code_block`, `math`, or table-cell-level rich parsing in this phase.
8. Query readiness is shared: empty query disables search, a single CJK character enters full-text, and a single non-CJK character remains title/snippet-only.

---

Expand All @@ -31,8 +32,8 @@ Locked decisions:
Current behavior remains split across two unrelated mechanisms:

1. Title and snippet matching happen locally in the Threads list.
-2. Message-body search goes through `searchConversationIdsByText(query)` and returns only `number[]` conversation ids.
-3. Reader does not receive search context and cannot navigate or highlight body hits.
+2. Legacy paths still assume `content_text` is the only full-text source, while package-aware helpers already understand `citations[] / attachments[] / artifacts[]`.
+3. Reader-side search historically understood body text only and could not navigate sidecar hits.
4. Opening Reader unmounts `TimelinePage`, so local search state would be lost unless it is lifted.

This means the current implementation can only answer "did this conversation match somewhere in messages" but cannot drive:
Expand Down Expand Up @@ -67,6 +68,19 @@ Boundary rule:

Goal: replace `searchConversationIdsByText` with a lightweight summary interface that can power list excerpts and Reader entry.

#### Query readiness contract

```ts
export type SearchReadiness = "empty" | "title_snippet_only" | "fulltext";
```

Rules:
- empty query => `empty`
- single CJK character => `fulltext`
- single Latin/digit/symbol character => `title_snippet_only`
- query length `>= 2` => `fulltext`
- title/snippet highlight may still run for `title_snippet_only`; offscreen full-text scan must not

#### Repository / storage / messaging signature

```ts
Expand All @@ -79,6 +93,8 @@ export interface ConversationMatchSummary {
conversationId: number;
firstMatchedMessageId: number;
bestExcerpt: string;
firstMatchedSurface: "body" | "source" | "attachment" | "artifact" | "annotation";
matchedSurfaces: Array<"body" | "source" | "attachment" | "artifact" | "annotation">;
}

export async function searchConversationMatchesByText(
Expand Down Expand Up @@ -111,9 +127,15 @@ ConversationMatchSummary[]

Semantics:
- `firstMatchedMessageId` is the earliest matched message within that conversation by `created_at`; if timestamps tie, the lower message id wins.
-- `bestExcerpt` must come from the same message identified by `firstMatchedMessageId`.
+- `bestExcerpt` and `firstMatchedSurface` must come from the same message identified by `firstMatchedMessageId`.
- `conversationIds` is an optional candidate-set constraint from the currently filtered list and exists to reduce offscreen scan cost.
- `matchedInMessages` is not a repository field; it is derived in the list layer from the presence of a summary plus local title/snippet matching.
- `matchedSurfaces` is a conversation-level union used for explanation and QA, not for ranking.
- message search projection must cover:
  - `body`
  - `source`
  - `attachment`
  - `artifact`
  - `annotation`
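
To make the semantics above concrete, here is a minimal sketch of how per-message hits for one conversation might be folded into a `ConversationMatchSummary`. `MessageHit` and `summarizeHits` are hypothetical names, and the tie-break between several surfaces of the same message (body first, annotation last) is an assumption that this section does not state.

```typescript
type SearchSurface = "body" | "source" | "attachment" | "artifact" | "annotation";

interface ConversationMatchSummary {
  conversationId: number;
  firstMatchedMessageId: number;
  bestExcerpt: string;
  firstMatchedSurface: SearchSurface;
  matchedSurfaces: SearchSurface[];
}

// Hypothetical per-surface hit record produced by the offscreen scan.
interface MessageHit {
  conversationId: number;
  messageId: number;
  createdAt: number; // stands in for created_at
  surface: SearchSurface;
  excerpt: string;
}

const SURFACE_ORDER: SearchSurface[] = ["body", "source", "attachment", "artifact", "annotation"];

function isEarlier(a: MessageHit, b: MessageHit): boolean {
  if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt; // earliest created_at wins
  if (a.messageId !== b.messageId) return a.messageId < b.messageId; // tie: lower message id wins
  // Assumption: within one message, prefer surfaces in body -> annotation order.
  return SURFACE_ORDER.indexOf(a.surface) < SURFACE_ORDER.indexOf(b.surface);
}

// All hits are assumed to belong to the same conversation.
function summarizeHits(hits: MessageHit[]): ConversationMatchSummary | null {
  const [head, ...rest] = hits;
  if (head === undefined) return null;
  let first = head;
  for (const hit of rest) {
    if (isEarlier(hit, first)) first = hit;
  }
  return {
    conversationId: first.conversationId,
    firstMatchedMessageId: first.messageId,
    bestExcerpt: first.excerpt, // excerpt and surface come from the same hit
    firstMatchedSurface: first.surface,
    // Conversation-level union, reported in a stable order; not used for ranking.
    matchedSurfaces: SURFACE_ORDER.filter((surface) => hits.some((hit) => hit.surface === surface)),
  };
}
```

Taking `bestExcerpt` and `firstMatchedSurface` from the same hit record is one simple way to satisfy the "same message" rule by construction.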

Explicit exclusions for this phase:
- no `totalOccurrenceCount`
Expand All @@ -131,8 +153,14 @@ Goal: make `ConversationCard` visually distinguish title hits, snippet hits, and

Key changes:
- introduce `splitWithHighlight(text, query): HighlightedSegment[]` as a reusable pure function
-- use `bestExcerpt` to replace the displayed snippet when the hit exists only in message body
-- keep the existing `Matched in messages` badge as the explanatory hint
+- use `bestExcerpt` to replace the displayed snippet when the hit exists only in the search projection
+- surface-specific hint copy is fixed:
+  - `Matched in messages`
+  - `Matched in sources`
+  - `Matched in attachments`
+  - `Matched in artifacts`
+  - `Matched in notes`
+- do not show the hint when title or snippet already matched locally
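
The fixed hint copy and the suppression rule can be captured in a lookup plus a guard. The sketch below is illustrative: `resolveCardHint` and its input shape are hypothetical, and both the pairing of each surface with a hint string (inferred from list order) and the choice of `firstMatchedSurface` as the deciding surface are assumptions.

```typescript
type SearchSurface = "body" | "source" | "attachment" | "artifact" | "annotation";

// Assumed pairing of surface to hint copy, inferred from the order of the two lists in the spec.
const SURFACE_HINT: Record<SearchSurface, string> = {
  body: "Matched in messages",
  source: "Matched in sources",
  attachment: "Matched in attachments",
  artifact: "Matched in artifacts",
  annotation: "Matched in notes",
};

// Hypothetical input describing what matched for one conversation card.
interface CardMatchInfo {
  titleMatched: boolean;
  snippetMatched: boolean;
  firstMatchedSurface?: SearchSurface; // absent when there is no offscreen summary
}

function resolveCardHint(info: CardMatchInfo): string | null {
  // A local title or snippet hit already explains the result, so no hint is shown.
  if (info.titleMatched || info.snippetMatched) return null;
  if (info.firstMatchedSurface === undefined) return null;
  return SURFACE_HINT[info.firstMatchedSurface];
}
```

Typing the table as `Record<SearchSurface, string>` makes the compiler flag a missing hint if a new surface is ever added to the union.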

Completion criteria:
- title-hit, snippet-hit, and message-only-hit cards are visually distinguishable
Expand Down Expand Up @@ -170,6 +198,7 @@ Key changes:
- `AstMessageRenderer` uses the shared highlight splitter on text-capable nodes
- `MessageBubble` uses the same utility in plain-text fallback mode
- Reader navigation scrolls and focuses the active rendered occurrence target
- sidecar hits must auto-expand the owning `Sources / Attachments / Artifacts` disclosure and focus the concrete item
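
The shared highlight splitter used by both renderers could look roughly like the sketch below. The fields of `HighlightedSegment` and the case-insensitive matching are assumptions, since this excerpt names the function and type but does not show their definitions.

```typescript
// Assumed segment shape; the spec only names the type.
interface HighlightedSegment {
  text: string;
  highlighted: boolean;
}

function splitWithHighlight(text: string, query: string): HighlightedSegment[] {
  if (text.length === 0) return [];
  if (query.length === 0) return [{ text, highlighted: false }];

  // Assumption: matching is case-insensitive. Lowercasing can change string length
  // for a few characters, which would misalign offsets, so fall back to exact matching then.
  const canFold = text.toLowerCase().length === text.length && query.toLowerCase().length === query.length;
  const haystack = canFold ? text.toLowerCase() : text;
  const needle = canFold ? query.toLowerCase() : query;

  const segments: HighlightedSegment[] = [];
  let cursor = 0;
  let index = haystack.indexOf(needle, cursor);
  while (index !== -1) {
    // Unmatched text before this hit.
    if (index > cursor) segments.push({ text: text.slice(cursor, index), highlighted: false });
    // The hit itself, sliced from the original text so its casing is preserved.
    segments.push({ text: text.slice(index, index + needle.length), highlighted: true });
    cursor = index + needle.length;
    index = haystack.indexOf(needle, cursor);
  }
  // Trailing unmatched text.
  if (cursor < text.length) segments.push({ text: text.slice(cursor), highlighted: false });
  return segments;
}
```

Concatenating the `text` of all returned segments always reproduces the input, so a renderer can map highlighted segments to `<mark>` elements without losing characters.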

Allowed highlight targets:
- paragraph text
Expand All @@ -178,6 +207,9 @@ Allowed highlight targets:
- blockquote text
- `strong`
- `em`
- source label / host
- attachment `indexAlt / label / mime`
- artifact `title / descriptor / excerpt`
- `code_inline`

Out of scope for this phase:
33 changes: 29 additions & 4 deletions documents/ui_refactor/threads_search_state_machine_contract.md
Expand Up @@ -28,7 +28,8 @@ stateDiagram-v2
[*] --> list_idle

list_idle --> list_results: QUERY_CHANGED / local title-snippet hit
-list_idle --> list_searching_body: QUERY_CHANGED(query.length >= 2)
+list_idle --> list_searching_body: QUERY_CHANGED(fulltext query)
+list_idle --> list_results: QUERY_CHANGED(title/snippet-only query)
list_idle --> list_empty: QUERY_CHANGED / no local hit and no body hit

list_searching_body --> list_results: BODY_SEARCH_RESOLVED(results > 0)
Expand All @@ -38,7 +39,8 @@ stateDiagram-v2
list_results --> list_results: QUERY_CHANGED / FILTER_CHANGED / RESULTS_RESHAPED
list_results --> reader_loading_messages: OPEN_READER / freeze search session

-list_empty --> list_searching_body: QUERY_CHANGED(query.length >= 2)
+list_empty --> list_searching_body: QUERY_CHANGED(fulltext query)
+list_empty --> list_results: QUERY_CHANGED(title/snippet-only query)
list_empty --> list_idle: QUERY_CLEARED

reader_loading_messages --> reader_building_index: MESSAGES_LOADED
Expand Down Expand Up @@ -67,17 +69,32 @@ export interface ConversationMatchSummary {
conversationId: number;
firstMatchedMessageId: number;
bestExcerpt: string;
firstMatchedSurface: "body" | "source" | "attachment" | "artifact" | "annotation";
matchedSurfaces: Array<"body" | "source" | "attachment" | "artifact" | "annotation">;
}
```

Rules:
- repository, `storageService`, and messaging/offscreen handler all keep the same query and response shape.
- message type for this contract is `SEARCH_CONVERSATION_MATCHES_BY_TEXT`.
- `firstMatchedMessageId` is the earliest matched message by `created_at`; ties fall back to lower message id.
-- `bestExcerpt` must be cut from the same message identified by `firstMatchedMessageId`.
+- `bestExcerpt` and `firstMatchedSurface` must be cut from the same message identified by `firstMatchedMessageId`.
- `conversationIds` limits the scan to the current list candidate set when provided.
- offscreen never returns occurrence-level detail.

### 3.1.1 Query readiness contract

```ts
export type SearchReadiness = "empty" | "title_snippet_only" | "fulltext";
```

Rules:
- empty query => `empty`
- single CJK character => `fulltext`
- single non-CJK character => `title_snippet_only`
- query length `>= 2` => `fulltext`
- title/snippet-only queries must not dispatch offscreen full-text search

### 3.2 Search session contract

```ts
Expand Down Expand Up @@ -105,6 +122,7 @@ Rules:
export interface ReaderOccurrence {
occurrenceKey: string;
messageId: number;
surface: "body" | "source" | "attachment" | "artifact" | "annotation";
nodeKey: string;
charOffset: number;
length: number;
Expand All @@ -120,9 +138,14 @@ export interface ReaderSearchModel {

Rules:
- `ReaderOccurrence` is derived locally from loaded messages.
- `surface` distinguishes body hits from sidecar hits and must survive prev/next navigation.
- `nodeKey` must be stable between index building and render targeting.
- `nodeKey` should use an AST/renderer path string, not a random uuid.
-- recommended form: `msg-42:p[1]:text[0]`
+- recommended forms:
+  - `msg-42:p[1]:text[0]`
+  - `msg-42:source[0]:label`
+  - `msg-42:attachment[0]:indexAlt`
+  - `msg-42:artifact[0]:excerpt`
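
As an illustration of how stable `nodeKey` values and local occurrence building fit together, consider the sketch below. It is not the project's index builder: `buildSidecarNodeKey`, `findOccurrencesInNode`, the `occurrenceKey` format, and the exact-match search are assumptions, and the `ReaderOccurrence` shape lists only the fields visible in this excerpt.

```typescript
type SearchSurface = "body" | "source" | "attachment" | "artifact" | "annotation";

// Only the fields visible in this excerpt; the full interface may carry more.
interface ReaderOccurrence {
  occurrenceKey: string;
  messageId: number;
  surface: SearchSurface;
  nodeKey: string;
  charOffset: number;
  length: number;
}

// Builds keys in the recommended sidecar form, e.g. msg-42:source[0]:label.
function buildSidecarNodeKey(
  messageId: number,
  kind: "source" | "attachment" | "artifact",
  itemIndex: number,
  field: string,
): string {
  return `msg-${messageId}:${kind}[${itemIndex}]:${field}`;
}

// Scans one text-capable node and emits one occurrence per non-overlapping hit.
function findOccurrencesInNode(
  messageId: number,
  surface: SearchSurface,
  nodeKey: string,
  text: string,
  query: string,
): ReaderOccurrence[] {
  const occurrences: ReaderOccurrence[] = [];
  if (query.length === 0) return occurrences;
  // Simplification: exact, case-sensitive matching.
  let offset = text.indexOf(query);
  while (offset !== -1) {
    occurrences.push({
      // Assumed key format: deterministic, derived from nodeKey plus offset (no random uuid).
      occurrenceKey: `${nodeKey}@${offset}`,
      messageId,
      surface,
      nodeKey,
      charOffset: offset,
      length: query.length,
    });
    offset = text.indexOf(query, offset + query.length);
  }
  return occurrences;
}
```

Because both the key and the offsets are derived from the rendered structure, rebuilding the index for the same messages and query yields identical occurrence keys, which is what render targeting relies on.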

### 3.4 Threads state union

Expand Down Expand Up @@ -208,6 +231,7 @@ Rules:
- `BACK_TO_LIST` restores the same session as mutable list state.
- `MESSAGES_LOADED` does not imply navigation readiness.
- only `INDEX_BUILT` may enter `reader_ready`.
- `QUERY_CHANGED` for a `title_snippet_only` query must never enter `list_searching_body`.
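
The last rule can be enforced structurally in the list-side transition function. The sketch below covers only `QUERY_CHANGED` and is a simplification: the event payload, the function name `nextListState`, and the choice to send a `title_snippet_only` query with no local hit to `list_empty` are assumptions layered on top of the diagram.

```typescript
type SearchReadiness = "empty" | "title_snippet_only" | "fulltext";
type ListState = "list_idle" | "list_searching_body" | "list_results" | "list_empty";

// Hypothetical payload; the real QUERY_CHANGED event may carry different fields.
interface QueryChangedEvent {
  readiness: SearchReadiness;
  hasLocalHit: boolean; // whether local title/snippet matching found anything
}

function nextListState(event: QueryChangedEvent): ListState {
  switch (event.readiness) {
    case "empty":
      // A cleared query returns the list to idle.
      return "list_idle";
    case "fulltext":
      // Only full-text queries may dispatch the offscreen body and sidecar scan.
      return "list_searching_body";
    case "title_snippet_only":
      // Never list_searching_body: the outcome is decided from local matching alone.
      return event.hasLocalHit ? "list_results" : "list_empty";
  }
}
```

Switching on readiness rather than on raw query length keeps the single-CJK-character exception in one place, the readiness function, instead of repeating it in every transition.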

---

Expand All @@ -232,6 +256,7 @@ Required behavior:
- building `occurrences[]`
- injecting `<mark>` segments
- locating the active occurrence for `scrollIntoView` and focus
- sidecar targets must be mappable back to a disclosure section plus concrete item key so Reader can auto-expand on match
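
One possible way to map a sidecar `nodeKey` back to its disclosure section and item is to parse the recommended key form. This is a sketch under assumptions: `resolveSidecarTarget`, the `SidecarTarget` shape, and the `itemKey` format are hypothetical, and only the three disclosure sections named in this spec are handled.

```typescript
type DisclosureSection = "Sources" | "Attachments" | "Artifacts";

// Hypothetical description of where a sidecar hit lives in the Reader UI.
interface SidecarTarget {
  messageId: number;
  section: DisclosureSection;
  itemIndex: number;
  field: string;
  itemKey: string; // assumed form: msg-42:source[0]
}

const SECTION_BY_KIND: Record<string, DisclosureSection> = {
  source: "Sources",
  attachment: "Attachments",
  artifact: "Artifacts",
};

// Matches the recommended sidecar forms such as msg-42:attachment[0]:indexAlt.
const SIDECAR_NODE_KEY = /^msg-(\d+):(source|attachment|artifact)\[(\d+)\]:([A-Za-z]+)$/;

function resolveSidecarTarget(nodeKey: string): SidecarTarget | null {
  const match = SIDECAR_NODE_KEY.exec(nodeKey);
  if (match === null) return null; // body keys like msg-42:p[1]:text[0] are not sidecar targets
  const kind = match[2] ?? "";
  const section = SECTION_BY_KIND[kind];
  if (section === undefined) return null;
  const messageId = Number(match[1]);
  const itemIndex = Number(match[3]);
  return {
    messageId,
    section,
    itemIndex,
    field: match[4] ?? "",
    itemKey: `msg-${messageId}:${kind}[${itemIndex}]`,
  };
}
```

With a resolver like this, Reader can expand `section` first and then scroll to and focus the element registered under `itemKey`, while body hits return `null` and follow the normal highlight path.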

This bridge is the reason `nodeKey` is a first-class field in the state contract rather than an implementation footnote.

Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,10 @@ For each relevant case, confirm:
- annotation export anchor text is not blank when the anchor turn only has sidecars
- prompt/compression transcript includes sidecar summaries when body text is absent
- search/retrieval can still surface attachment-only messages via summary text
- single CJK character query enters full-text search across body + sidecars
- single non-CJK character query remains title/snippet-only and does not trigger body-sidecar full-text scan
- Threads search can distinguish `Matched in messages / sources / attachments / artifacts / notes`
- Reader search auto-expands the owning `Sources / Attachments / Artifacts` section when the active hit lands in a sidecar item

6. Metadata
- title still follows app-shell title truth, not the largest body heading
Expand All @@ -95,3 +99,5 @@ Before sign-off, answer all of these:
- Did any attachment imply raw replay support?
- Did any dynamic artifact render as a live embedded surface?
- Did any attachment-only message disappear from preview, export, or prompt flow?
- Did any single CJK character query fail to reach full-text body + sidecar search?
- Did any Reader sidecar hit fail to auto-expand and focus the correct item?