✨(summarize): new summarize tool integration #78
Conversation
Walkthrough
The PR refactors document summarization from a direct agent hook to a tool-based architecture. It removes the public `hand_off_to_summarization_agent` entry point.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant Agent as Conversation Agent
participant Tool as Summarize Tool
participant Chunker as Text Chunker
participant SumAgent as Summarization Agent
participant Storage as Document Storage
Note over User,Storage: Old Flow (Removed)
User->>Agent: invoke hand_off_to_summarization_agent()
Agent->>Storage: read_document_content()
Storage-->>Agent: document content
Agent->>SumAgent: invoke with full content
SumAgent-->>Agent: summary
Agent-->>User: ToolReturn(summary)
Note over User,Storage: New Flow
User->>Agent: invoke summarize tool
Agent->>Tool: document_summarize(ctx)
Tool->>Storage: read attachments
Storage-->>Tool: document contents
Tool->>Chunker: chunk texts with semchunk
Chunker-->>Tool: text chunks
loop Concurrent Chunks (semaphore-limited)
Tool->>SumAgent: summarize_chunk(idx, chunk)
SumAgent-->>Tool: chunk summary
end
Tool->>SumAgent: merge chunk summaries
SumAgent-->>Tool: final merged summary
Tool->>Agent: ToolReturn(summary, sources)
Agent-->>User: streamed response with summary
```
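In code, the new flow amounts to registering the summarization workflow as an ordinary pydantic-ai tool; the exact registration appears in a diff later in this review. A minimal standalone sketch, where the model string and import paths are placeholders:

```python
import functools

from pydantic_ai import Agent, RunContext

from chat.tools.document_summarize import document_summarize  # module path from this PR

conversation_agent = Agent("openai:gpt-4o")  # placeholder model name

# functools.wraps keeps document_summarize's original signature and docstring,
# so the model sees the real tool schema instead of a generic wrapper.
@conversation_agent.tool(name="summarize", retries=2)
@functools.wraps(document_summarize)
async def summarize(ctx: RunContext, *args, **kwargs):
    return await document_summarize(ctx, *args, **kwargs)
```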
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~40 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/chat/tools/document_search_rag.py (1)
29-33: Guard missing collection_id to prevent backend errors. If `conversation.collection_id` is empty, constructing the store may fail. Early-return with empty results. Apply this change:
```diff
-    document_store = document_store_backend(ctx.deps.conversation.collection_id)
+    if not getattr(ctx.deps.conversation, "collection_id", None):
+        return ToolReturn(return_value=[], content="", metadata={"sources": []})
+    document_store = document_store_backend(ctx.deps.conversation.collection_id)
```
src/backend/chat/clients/pydantic_ai.py (1)
580-596: Treat non-URL sources from summarize safely; avoid emitting invalid URL sources.
`summarize.py` returns document names as "sources". Here they're emitted as URL sources, which can render broken links. Only emit URL sources when strings look like http(s) URLs. Apply this diff:
```diff
-    if final_output.metadata and (
-        sources := final_output.metadata.get("sources")
-    ):
-        for source_url in sources:
-            url_source = LanguageModelV1Source(
-                sourceType="url",
-                id=str(uuid.uuid4()),
-                url=source_url,
-                providerMetadata={},
-            )
-            _new_source_ui = SourceUIPart(
-                type="source", source=url_source
-            )
-            _ui_sources.append(_new_source_ui)
-            yield events_v4.SourcePart(
-                **_new_source_ui.source.model_dump()
-            )
+    if final_output.metadata and (
+        sources := final_output.metadata.get("sources")
+    ):
+        for src in sources:
+            if isinstance(src, str) and src.startswith(("http://", "https://")):
+                url_source = LanguageModelV1Source(
+                    sourceType="url",
+                    id=str(uuid.uuid4()),
+                    url=src,
+                    providerMetadata={},
+                )
+                _new_source_ui = SourceUIPart(
+                    type="source", source=url_source
+                )
+                _ui_sources.append(_new_source_ui)
+                yield events_v4.SourcePart(
+                    **_new_source_ui.source.model_dump()
+                )
```
🧹 Nitpick comments (6)
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1)
70-71: Label update looks good; consider i18n consistency. Works as intended. For consistency, consider localizing the document parsing messages above too (currently hardcoded) and prefer stable keys (e.g., 'summarize.inProgress') over literal strings.
src/backend/chat/clients/pydantic_ai.py (1)
81-83: `max_result_retries` is introduced but not used. Either wire it into the tool/result validation retry path or remove it to avoid drift.
src/backend/chat/agents/summarize.py (4)
82-85: Reduce content leakage in logs for chunk processing. Even truncated prompts/responses can leak sensitive data. Prefer debug level and avoid content bodies.
Apply this diff:
- logger.info("[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100]+'...') + logger.debug("[summarize] CHUNK %s/%s prompt emitted", idx, total_chunks) - resp = await summarization_agent.run(sum_prompt, usage=ctx.usage) - logger.info("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "") + resp = await summarization_agent.run(sum_prompt, usage=ctx.usage) + logger.debug("[summarize] CHUNK %s/%s response received", idx, total_chunks)Based on learnings
67-68: Remove duplicate initialization of `chunk_summaries`. It's set twice; keep only the initialization near its usage.
Apply this diff:
```diff
-    chunk_summaries: list[str] = []
+    # will be filled by batch processing below
@@
-    chunk_summaries = []
+    chunk_summaries: list[str] = []
```
Also applies to: 88-89
3-6: Remove unused imports. `dataclasses` and `add_document_rag_search_tool` aren't used here. Apply this diff:
```diff
-import dataclasses
 import logging
 import asyncio
@@
-from ..tools.document_search_rag import add_document_rag_search_tool
```
Also applies to: 14-15
100-102: Hard-coded French default instructions. Consider deriving the default language from conversation/user settings to avoid unexpected language output; one illustrative shape is sketched below.
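A purely hypothetical sketch of that fix; the `language` attribute on the conversation is an assumption, not an existing field:

```python
# Hypothetical: pick default instructions per conversation instead of hard-coding French.
DEFAULT_INSTRUCTIONS = {
    "fr": "Génère un résumé clair et concis.",
    "en": "Generate a clear and concise summary.",
}

def default_instructions(conversation) -> str:
    language = getattr(conversation, "language", None)  # assumed attribute
    return DEFAULT_INSTRUCTIONS.get(language, DEFAULT_INSTRUCTIONS["fr"])
```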
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/backend/chat/agents/summarize.py (3 hunks)
- src/backend/chat/clients/pydantic_ai.py (2 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/backend/chat/agents/summarize.py (2)
src/backend/chat/tools/document_search_rag.py (2)
- document_search_rag (14-44)
- add_document_rag_search_tool (10-52)
src/backend/chat/models.py (1)
- ChatConversationContextKind (77-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test-e2e-chromium
- GitHub Check: test-back
- GitHub Check: build-and-push-backend
🔇 Additional comments (4)
src/backend/chat/tools/document_search_rag.py (1)
50-52: Prompt guidance is clear and aligned with the summarize flow.
src/backend/chat/clients/pydantic_ai.py (2)
398-406: Explicit summarize-tool instruction is well-scoped.
409-414: Attached-documents system hint is a helpful guardrail.src/backend/chat/agents/summarize.py (1)
31-34: Signature update (optional instructions) fits the workflow.
src/backend/chat/agents/summarize.py
Outdated
```diff
 # Collect documents content
 documents_qs = ctx.deps.conversation.contexts.filter(
     kind=ChatConversationContextKind.DOCUMENT.value,
 )
-documents_prompt = "\n\n".join(
-    [
-        (
-            "<document>\n"
-            f"<name>\n{doc.name}\n</name>\n"
-            f"<content>\n{doc.content}\n</content>\n"
-            "</document>"
-        )
-        async for doc in documents
-    ]
-)
-formatted_prompt = prompt.format(
-    user_prompt=ctx.prompt,
-    documents_prompt=documents_prompt,
+documents_list = [doc async for doc in documents_qs]
+# Instructions: rely on tool argument only; model should extract them upstream
```
Handle “no documents” explicitly to avoid empty-context summaries.
If no docs are found, return a clear tool result instead of prompting the model with empty context.
Apply this diff:
```diff
-    documents_qs = ctx.deps.conversation.contexts.filter(
+    documents_qs = ctx.deps.conversation.contexts.filter(
         kind=ChatConversationContextKind.DOCUMENT.value,
     )
-    documents_list = [doc async for doc in documents_qs]
+    documents_list = [doc async for doc in documents_qs]
+    if not documents_list:
+        return ToolReturn(return_value="", content="No documents available to summarize.", metadata={"sources": []})
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Collect documents content
documents_qs = ctx.deps.conversation.contexts.filter(
    kind=ChatConversationContextKind.DOCUMENT.value,
)
documents_list = [doc async for doc in documents_qs]
if not documents_list:
    return ToolReturn(
        return_value="",
        content="No documents available to summarize.",
        metadata={"sources": []},
    )
# Instructions: rely on tool argument only; model should extract them upstream
```
🤖 Prompt for AI Agents
In src/backend/chat/agents/summarize.py around lines 46 to 52, after building
documents_list from the conversation contexts, add an explicit check for an
empty list and immediately return a clear tool result instead of continuing to
invoke the model with empty context; specifically, if documents_list is empty
return the appropriate ToolResult/response used by this agent with a concise
message like "No documents found in conversation contexts" (and the standard
non-error success/handled flag your tooling expects) so the agent caller
receives a deterministic tool response rather than an empty-context summary.
src/backend/chat/agents/summarize.py
Outdated
```diff
 return ToolReturn(
-    return_value=response.output,
-    content=response.output,
-    metadata={"sources": {doc.name async for doc in documents}},
+    return_value=final_summary,
+    content=final_summary,
+    metadata={"sources": {doc.name for doc in documents_list}},
 )
```
🛠️ Refactor suggestion | 🟠 Major
Use list for metadata["sources"] to keep metadata JSON-serializable.
Sets can break serialization; also aligns with downstream handling.
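The failure mode is easy to demonstrate with the standard json module (hypothetical payload):

```python
import json

payload_set = {"sources": {"doc1.txt", "doc2.txt"}}   # set value, as currently returned
payload_list = {"sources": ["doc1.txt", "doc2.txt"]}  # list value, as suggested

try:
    json.dumps(payload_set)
except TypeError as exc:
    print(exc)  # Object of type set is not JSON serializable

print(json.dumps(payload_list))  # {"sources": ["doc1.txt", "doc2.txt"]}
```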
Apply this diff:
- metadata={"sources": {doc.name for doc in documents_list}},
+ metadata={"sources": [doc.name for doc in documents_list]},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
return ToolReturn(
    return_value=final_summary,
    content=final_summary,
    metadata={"sources": [doc.name for doc in documents_list]},
)
```
🤖 Prompt for AI Agents
In src/backend/chat/agents/summarize.py around lines 119 to 123, the metadata currently uses a set comprehension, {"sources": {doc.name for doc in documents_list}}, which can break JSON serialization. Change it to a list by converting the comprehension (e.g., {"sources": [doc.name for doc in documents_list]}) or by wrapping the set in list(...), so that metadata["sources"] is JSON-serializable and compatible with downstream handlers.
```python
# Defensive: ctx.deps or ctx.deps.conversation may be unavailable in some flows (start of conversation)
if not getattr(ctx, "deps", None) or not getattr(ctx.deps, "conversation", None):
    return ToolReturn(return_value=[], content="", metadata={"sources": set()})
```
Avoid non-JSON-serializable types in metadata; use list instead of set.
Sets in metadata can break serialization later. Return a list for sources.
Apply this diff:
```diff
-    return ToolReturn(return_value=[], content="", metadata={"sources": set()})
+    return ToolReturn(return_value=[], content="", metadata={"sources": []})
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Defensive: ctx.deps or ctx.deps.conversation may be unavailable in some flows (start of conversation)
if not getattr(ctx, "deps", None) or not getattr(ctx.deps, "conversation", None):
    return ToolReturn(return_value=[], content="", metadata={"sources": []})
```
🤖 Prompt for AI Agents
In src/backend/chat/tools/document_search_rag.py around lines 25 to 27, the
ToolReturn metadata currently uses a set() for "sources" which is not
JSON-serializable and can break downstream serialization; change the metadata to
use a list (e.g., [] or list()) instead of a set so the return value is
JSON-safe and preserves the empty collection semantics.
Force-pushed c057a7e to ec9af9a (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/backend/chat/agents/summarize.py (3)
63-70: Handle empty document list explicitly. If no documents are found, the code will attempt to summarize an empty string, wasting API calls. Add an early return with a clear message when `documents` is empty. Apply this diff:
```diff
 documents = [await read_document_content(doc) for doc in text_attachment]
+if not documents:
+    return ToolReturn(
+        return_value="",
+        content="No documents available to summarize.",
+        metadata={"sources": []},
+    )
```
132-135: Avoid logging full prompts and summaries; PII/compliance risk. The merged prompt and final summary may contain sensitive document content. Log metadata only (e.g., character count) instead of full bodies.
Apply this diff:
- logger.info("[summarize] MERGE prompt=> %s", merged_prompt) + logger.debug("[summarize] MERGE prompt emitted (len=%d)", len(merged_prompt)) merged_resp = await summarization_agent.run(merged_prompt, usage=ctx.usage) final_summary = (merged_resp.output or "").strip() - logger.info("[summarize] MERGE response<= %s", final_summary) + logger.debug("[summarize] MERGE response received (len=%d)", len(final_summary))
137-140: Use a list for `metadata["sources"]` to keep metadata JSON-serializable. Sets can break serialization; a list also aligns with downstream handling.
Apply this diff:
```diff
 return ToolReturn(
     return_value=final_summary,
-    metadata={"sources": {doc[0] for doc in documents}},
+    metadata={"sources": [doc[0] for doc in documents]},
 )
```
🧹 Nitpick comments (3)
src/backend/chat/agents/summarize.py (3)
15-15: Remove unused import. The import `add_document_rag_search_tool` is not used in this file. It may be leftover from refactoring. Apply this diff:
```diff
-from ..tools.document_search_rag import add_document_rag_search_tool
```
79-82: Consider clarifying the edge-case comment. The `size <= 0` check correctly prevents division by zero, but a comment explaining this edge case would improve readability.
```diff
 def chunk_text(text: str, size: int = 10000) -> list[str]:
+    # Return full text if chunk size is invalid
     if size <= 0:
         return [text]
     return [text[i : i + size] for i in range(0, len(text), size)]
```
94-103: Add error handling for chunk summarization failures. If `summarization_agent.run()` fails for a chunk, the entire summarization will fail. Consider wrapping this in try-except to handle individual chunk failures gracefully.
```diff
 async def summarize_chunk(idx, chunk, total_chunks, summarization_agent, ctx):
     sum_prompt = (
         "Tu es un agent spécialisé en synthèses de textes. "
         "Génère un résumé clair et concis du passage suivant (partie {idx}/{total}) :\n"
         "'''\n{context}\n'''\n\n"
     ).format(context=chunk, idx=idx, total=total_chunks)
     logger.info("[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100]+'...')
-    resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
+    try:
+        resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
+    except Exception as e:
+        logger.error("[summarize] CHUNK %s/%s failed: %s", idx, total_chunks, str(e))
+        return f"[Chunk {idx} summarization failed]"
     logger.info("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
     return resp.output or ""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/backend/chat/agents/summarize.py (5 hunks)
- src/backend/chat/clients/pydantic_ai.py (1 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/chat/tools/document_search_rag.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/backend/chat/agents/summarize.py (1)
src/backend/chat/tools/document_search_rag.py (2)
- document_search_rag (14-44)
- add_document_rag_search_tool (10-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test-e2e-chromium
- GitHub Check: test-back
- GitHub Check: build-and-push-backend
- GitHub Check: build-and-push-frontend
🔇 Additional comments (4)
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1)
71-71: LGTM! Clean conditional label for summarization status. The ternary expression correctly displays "Summarizing..." when the tool is 'summarize' and "Search..." otherwise, properly wrapped with i18n support.
src/backend/chat/clients/pydantic_ai.py (1)
493-500: LGTM! Clear system note preventing redundant upload prompts. The `attached_documents_note` function provides clear context to the model about available documents, avoiding confusion and unnecessary re-upload requests.
src/backend/chat/agents/summarize.py (2)
40-52: LGTM! Well-documented function signature with clear examples. The optional `instructions` parameter is properly defined as keyword-only, and the docstring provides helpful usage examples.
105-116: LGTM! Well-structured parallel chunk processing. The batched `asyncio.gather` approach with a batch size of 5 provides good parallelism while avoiding API rate limits. The indexing logic is correct.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/backend/chat/tools/document_search_rag.py (1)
36-40: Use JSON-serializable metadata for sources. `metadata["sources"]` is built as a set, which is not JSON-serializable. Downstream serialization of the tool result will crash as soon as a caller tries to encode this response. Switch to a list (or tuple) so the payload stays JSON-safe.
```diff
-    return ToolReturn(
-        return_value=rag_results.data,
-        content="",
-        metadata={"sources": {result.url for result in rag_results.data}},
-    )
+    return ToolReturn(
+        return_value=rag_results.data,
+        content="",
+        metadata={"sources": [result.url for result in rag_results.data]},
+    )
```
src/backend/chat/agents/summarize.py (1)
84-103: Short-circuit when no documents are available. If the conversation has no text attachments, we still build an empty context and call the LLM. That produces hallucinated "summaries" with zero source material. Bail out early with a deterministic tool response (e.g., "No documents available to summarize.") so callers can handle the situation gracefully.
```diff
 documents = [await read_document_content(doc) for doc in text_attachment]
+if not documents:
+    return ToolReturn(
+        return_value="",
+        content="No documents available to summarize.",
+        metadata={"sources": []},
+    )
```
♻️ Duplicate comments (1)
src/backend/chat/agents/summarize.py (1)
158-161: Fix metadata serialization (sources must be a list). We're still returning `metadata["sources"]` as a set, which breaks JSON serialization of the tool result. Convert it to a list before returning. (This was already flagged earlier.)
```diff
-    return ToolReturn(
-        return_value=final_summary,
-        metadata={"sources": {doc[0] for doc in documents}},
-    )
+    return ToolReturn(
+        return_value=final_summary,
+        metadata={"sources": [doc[0] for doc in documents]},
+    )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/backend/chat/agents/summarize.py (3 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/backend/conversations/settings.py (1 hunks)
- src/backend/pyproject.toml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test-e2e-chromium
- GitHub Check: test-back
- GitHub Check: lint-back
- GitHub Check: build-and-push-frontend
- GitHub Check: build-and-push-backend
Force-pushed a257939 to 49a0c20 (Compare)
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/backend/chat/agents/summarize.py (3)
85-91: Handle empty documents list to prevent processing with no content. If no text attachments are found, the function proceeds to chunk empty data. This matches a previous unresolved review comment about checking for no documents.
Apply this diff to handle the empty case:
```diff
 documents = [await read_document_content(doc) for doc in text_attachment]
+
+if not documents:
+    return ToolReturn(
+        return_value="No documents available to summarize.",
+        metadata={"sources": []},
+    )
```
48-50: Remove full prompt/response text from logs to prevent PII leakage. Despite being marked as addressed in a previous review, lines 48-50, 54, 150, and 156 still log full document-derived content at DEBUG level. This violates compliance requirements for PII handling.
Apply this diff:
```diff
-    logger.debug(
-        "[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100] + "..."
-    )
+    logger.debug(
+        "[summarize] CHUNK %s/%s prompt emitted (len=%d)", idx, total_chunks, len(sum_prompt)
+    )
     resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
-    logger.debug("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
+    logger.debug(
+        "[summarize] CHUNK %s/%s response received (len=%d)",
+        idx,
+        total_chunks,
+        len(resp.output or ""),
+    )
```
And:
- logger.debug("[summarize] MERGE prompt=> %s", merged_prompt) + logger.debug("[summarize] MERGE prompt emitted (len=%d)", len(merged_prompt)) merged_resp = await summarization_agent.run(merged_prompt, usage=ctx.usage) final_summary = (merged_resp.output or "").strip() - logger.debug("[summarize] MERGE response<= %s", final_summary) + logger.debug("[summarize] MERGE response received (len=%d)", len(final_summary))Also applies to: 54-54, 150-150, 156-156
158-161: Convert `metadata["sources"]` from set to list for JSON serialization. This was flagged in a previous review but remains unresolved. Sets cannot be JSON-serialized, which will break downstream handling.
Apply this diff:
```diff
 return ToolReturn(
     return_value=final_summary,
-    metadata={"sources": {doc[0] for doc in documents}},
+    metadata={"sources": [doc[0] for doc in documents]},
 )
```
🧹 Nitpick comments (2)
src/backend/chat/agents/summarize.py (2)
94-102: Consider using an actual tokenizer for more accurate chunk sizing. The current implementation uses `len(text.split())` as a token counter, which is a rough approximation. For production use, consider using the actual tokenizer from the LLM to ensure chunks don't exceed model context limits. Example with tiktoken (if using OpenAI models):
```python
import tiktoken

encoding = tiktoken.encoding_for_model(settings.AI_MODEL)
chunker = semchunk.chunkerify(
    tokenizer_or_token_counter=lambda text: len(encoding.encode(text)),
    chunk_size=chunk_size,
)
```
111-128: Add error handling for chunk summarization failures. The current implementation uses `asyncio.gather()` without error handling. If any chunk summarization fails, the entire operation fails. Consider handling partial failures gracefully. Apply this diff:
```diff
-    chunk_summaries = await asyncio.gather(*summarization_tasks)
+    chunk_summaries = await asyncio.gather(*summarization_tasks, return_exceptions=True)
+    # Filter out exceptions and log them
+    valid_summaries = []
+    for idx, summary in enumerate(chunk_summaries, start=1):
+        if isinstance(summary, Exception):
+            logger.warning("[summarize] Chunk %s failed: %s", idx, str(summary))
+        else:
+            valid_summaries.append(summary)
+    chunk_summaries = valid_summaries
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- src/backend/chat/agents/summarize.py (3 hunks)
- src/backend/chat/clients/pydantic_ai.py (1 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (5 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/backend/conversations/settings.py (1 hunks)
- src/backend/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/backend/pyproject.toml
- src/backend/chat/tools/document_search_rag.py
- src/backend/conversations/settings.py
- src/backend/chat/clients/pydantic_ai.py
🔇 Additional comments (7)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (2)
884-910: LGTM! Test assertions correctly reflect the new document handling prompts. The updates properly distinguish between `document_search_rag` for retrieving passages and the `summarize` tool for summaries, and the new internal context prompt aligns with the PR's objective to inform the model about attached documents.
1012-1042: LGTM! Pydantic message assertions match the updated system prompts. The expected message structure correctly includes both the document_search_rag guidance and the internal context prompt.
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (3)
377-407: LGTM! Test assertions updated consistently with the new prompt chain. The document_search_rag guidance and internal context prompt are correctly added to the expected message flow.
638-638: Token count update aligns with the new system prompts. The increase to 317 prompt tokens correctly reflects the additional tokens from the new `document_search_rag` guidance and internal context system prompts added in this PR.
720-750: LGTM! Pydantic message expectations correctly updated. Both the document_search_rag and internal context system prompts are properly reflected in the expected message structure.
src/backend/chat/agents/summarize.py (2)
129-136: LGTM! Context building correctly aggregates chunk summaries. The use of `strict=True` in `zip()` ensures documents and summaries are properly aligned.
138-148: LGTM! Merge prompt provides clear instructions for final synthesis. The prompt structure appropriately guides the model to produce a coherent, well-formatted summary while following user instructions.
Force-pushed 49a0c20 to 7ee6bfd (Compare)
Force-pushed 7ee6bfd to 7cf423f (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/backend/chat/clients/pydantic_ai.py (1)
482-511: Summarize tool never registered for PDF-only conversations. Right now the summarize tool (and its supporting system prompts) are defined only when `has_not_pdf_docs` is true. If the user uploads only PDFs, which is our most common case, the branch falls into the `else:` path and the summarize tool is never registered, so the model cannot call `document_summarize` at all. That breaks the new end-to-end summarization workflow for PDF uploads.
add_document_rag_search_toolgated byhas_not_pdf_docs.- elif has_not_pdf_docs: - add_document_rag_search_tool(self.conversation_agent) - - @self.conversation_agent.system_prompt - def summarization_system_prompt() -> str: - ... - - @self.conversation_agent.system_prompt - def attached_documents_note() -> str: - ... - - @self.conversation_agent.tool(name="summarize", retries=2) - @functools.wraps(document_summarize) - async def summarize(ctx: RunContext, *args, **kwargs) -> ToolReturn: - return await document_summarize(ctx, *args, **kwargs) - else: + elif has_not_pdf_docs: + add_document_rag_search_tool(self.conversation_agent) + else: conversation_documents = [ ... ] ... + if conversation_has_documents: + @self.conversation_agent.system_prompt + def summarization_system_prompt() -> str: + ... + + @self.conversation_agent.system_prompt + def attached_documents_note() -> str: + ... + + @self.conversation_agent.tool(name="summarize", retries=2) + @functools.wraps(document_summarize) + async def summarize(ctx: RunContext, *args, **kwargs) -> ToolReturn: + return await document_summarize(ctx, *args, **kwargs)
♻️ Duplicate comments (1)
src/backend/chat/tools/document_search_rag.py (1)
36-40: Still returning a non-serializable set in metadata. We previously flagged this: `metadata={"sources": {…}}` produces a set, which is not JSON-serializable and will blow up when the ToolReturn is encoded. Please switch to a list comprehension (or `list(...)`) so downstream serialization keeps working.
```diff
-    return ToolReturn(
-        return_value=rag_results.data,
-        content="",
-        metadata={"sources": {result.url for result in rag_results.data}},
-    )
+    return ToolReturn(
+        return_value=rag_results.data,
+        content="",
+        metadata={"sources": [result.url for result in rag_results.data]},
+    )
```
🧹 Nitpick comments (2)
src/backend/chat/tools/document_summarize.py (2)
29-50: Consider logging when the chunk summary is empty. Line 50 returns an empty string when `resp.output` is None, which silently continues processing. This might mask issues where the model fails to generate a summary. Consider logging a warning when this occurs. Apply this diff:
logger.debug("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "") + if not resp.output: + logger.warning("Chunk %s/%s produced empty summary", idx, total_chunks) return resp.output or ""
82-94: Consider memory usage when reading multiple large documents. Line 94 creates read tasks for all documents concurrently using a list comprehension with `await`. For conversations with many large documents, this could consume significant memory. Consider reading documents one at a time or implementing a limit on concurrent reads.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- CHANGELOG.md (1 hunks)
- src/backend/chat/agents/summarize.py (0 hunks)
- src/backend/chat/clients/pydantic_ai.py (4 hunks)
- src/backend/chat/tests/conftest.py (2 hunks)
- src/backend/chat/tests/tools/test_document_summarize.py (1 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (6 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/backend/chat/tools/document_summarize.py (1 hunks)
- src/backend/conversations/settings.py (1 hunks)
- src/backend/pyproject.toml (1 hunks)
- src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
💤 Files with no reviewable changes (1)
- src/backend/chat/agents/summarize.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (5)
src/backend/chat/clients/pydantic_ai.py (1)
src/backend/chat/tools/document_summarize.py (1)
document_summarize(54-189)
src/backend/chat/tools/document_summarize.py (4)
src/backend/chat/clients/pydantic_ai.py (1)
- summarize (508-510)
src/backend/chat/agents/summarize.py (1)
- SummarizationAgent (14-23)
src/backend/chat/tools/exceptions.py (1)
- ModelCannotRetry (17-23)
src/backend/chat/tools/utils.py (1)
- last_model_retry_soft_fail (14-50)
src/backend/chat/tests/tools/test_document_summarize.py (2)
src/backend/chat/agents/summarize.py (1)
- SummarizationAgent (14-23)
src/backend/chat/tools/document_summarize.py (2)
- document_summarize (54-189)
- summarize_chunk (29-50)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
src/backend/chat/tests/conftest.py (1)
SummarizationAgentMock(63-72)
src/backend/chat/tests/conftest.py (2)
src/backend/chat/agents/summarize.py (1)
- SummarizationAgent (14-23)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
- SummarizationAgentMock (146-152)
🪛 GitHub Actions: Docker Hub Workflow
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx
[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.
🪛 GitHub Actions: Frontend Workflow
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx
[warning] 2-2: Code style issues found in the above file. Run Prettier with --write to fix.
[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.
🔇 Additional comments (8)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (4)
154-154: LGTM! The mock patch location correctly targets the new module where `SummarizationAgent` is instantiated. This aligns with the new document summarization workflow introduced in document_summarize.py.
377-384: LGTM - Tool usage semantics clarified. The new system prompt correctly enforces the separation between document retrieval (`document_search_rag`) and summarization (the `summarize` tool). This prevents the model from misusing the retrieval tool for summarization tasks.
399-407: LGTM - Good UX improvement. The internal context prompt prevents the model from requesting document re-uploads when documents are already attached. The `[Internal context]` prefix is a clear convention for distinguishing system state from user instructions.
638-638: LGTM - Token usage reflects additional prompts. The increased token count (317 vs. the previous ~201) correctly reflects the addition of new system prompts for document_search_rag usage and internal context.
src/backend/chat/tools/document_summarize.py (2)
114-138: LGTM - Good concurrency control. The semaphore-based rate limiting prevents overwhelming the LLM backend with concurrent requests. The error handling appropriately distinguishes between retryable and non-retryable failures.
139-189: LGTM - Solid merge implementation with good error handling. The merge step properly combines chunk summaries with clear instructions and validates the result. The exception handling correctly separates retryable errors from terminal failures.
src/backend/chat/tests/tools/test_document_summarize.py (2)
56-86: LGTM - Good test coverage for chunk summarization. These tests properly validate both the success path and error handling for individual chunk summarization. The use of FunctionModel for mocking is appropriate and the assertions are specific.
106-472: Excellent comprehensive test coverage. These tests thoroughly cover all major code paths, including:
- Success paths with single and multiple documents
- Custom instructions handling
- Various error scenarios (no attachments, IO errors, chunk failures, merge failures)
- Edge cases (empty results, large documents)
The mock setup is realistic and assertions validate both outputs and internal behavior.
```python
@sync_to_async
def read_document_content(doc):
    """Read document content asynchronously."""
    with default_storage.open(doc.key) as f:
        return doc.file_name, f.read().decode("utf-8")
```
Handle non-UTF-8 document encoding errors.
The function assumes all documents are UTF-8 encoded. If a document uses a different encoding, decode("utf-8") will raise a UnicodeDecodeError, which will propagate as an unexpected exception. Consider adding error handling or detecting the encoding.
Apply this diff to add basic error handling:
```diff
 @sync_to_async
 def read_document_content(doc):
     """Read document content asynchronously."""
-    with default_storage.open(doc.key) as f:
-        return doc.file_name, f.read().decode("utf-8")
+    try:
+        with default_storage.open(doc.key) as f:
+            return doc.file_name, f.read().decode("utf-8")
+    except UnicodeDecodeError as exc:
+        logger.warning("Document %s is not UTF-8 encoded", doc.file_name)
+        raise ModelCannotRetry(
+            f"Document {doc.file_name} uses an unsupported encoding. "
+            "Only UTF-8 encoded text documents are supported."
+        ) from exc
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/backend/chat/tools/document_summarize.py around lines 22 to 26, the
function unconditionally decodes file bytes with decode("utf-8") which will
raise UnicodeDecodeError for non-UTF-8 files; update it to catch
UnicodeDecodeError and handle fallback decoding: try utf-8 first, on
UnicodeDecodeError either (a) retry decoding with a permissive fallback like
errors="replace" or a fallback encoding such as "latin-1", or (b) detect
encoding with chardet/charset-normalizer and decode accordingly; ensure you log
or warn about the fallback and still return (doc.file_name, decoded_text) so
callers receive a string rather than an exception.
```python
SUMMARIZATION_CHUNK_SIZE = values.PositiveIntegerValue(
    default=20_000,  # Approx 20k words per chunk
    environ_name="SUMMARIZATION_CHUNK_SIZE",
    environ_prefix=None,
)
SUMMARIZATION_OVERLAP_SIZE = values.FloatValue(
    default=0.05,  # 5% overlap
    environ_name="SUMMARIZATION_OVERLAP_SIZE",
    environ_prefix=None,
)
SUMMARIZATION_CONCURRENT_REQUESTS = values.PositiveIntegerValue(
    default=5,
    environ_name="SUMMARIZATION_CONCURRENT_REQUESTS",
    environ_prefix=None,
)
```
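For illustration only, here is roughly how these settings could drive semchunk. The overlap keyword on the chunker call is an assumption inferred from SUMMARIZATION_OVERLAP_SIZE; the real call site lives in document_summarize.py:

```python
import semchunk

document_text = "word " * 50_000  # stand-in for a large attachment

chunker = semchunk.chunkerify(
    lambda text: len(text.split()),  # word-count proxy for tokens, as in the tool
    chunk_size=20_000,               # SUMMARIZATION_CHUNK_SIZE
)
chunks = chunker(document_text, overlap=0.05)  # SUMMARIZATION_OVERLAP_SIZE (assumed kwarg)
print(len(chunks))
```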
🧩 Analysis chain
Verify chunk size doesn't exceed model context windows.
The default chunk size of 20,000 words is quite large and may exceed the context window of some LLM models used for summarization. Ensure that the models configured via LLM_SUMMARIZATION_MODEL_HRID can handle prompts of this size (approximately 26,000+ tokens with instructions included).
Consider generating a shell script to check the configured summarization model's context window:
🏁 Script executed:
```bash
#!/bin/bash
# Check the configured summarization model's context window limits
echo "Configured summarization model HRID:"
rg -n "LLM_SUMMARIZATION_MODEL_HRID" --type=py -C 2
```
Length of output: 1539
🏁 Script executed:
```bash
#!/bin/bash
# Search for LLM model definitions and configurations
echo "=== Looking for model configurations ==="
rg -n "LLModel|model_name|context_window" --type=py -C 2 | head -100
echo ""
echo "=== Checking for token counting utilities ==="
rg -n "token|TOKEN" --type=py src/backend/conversations/settings.py
echo ""
echo "=== Looking for summarization model configuration ==="
rg -n "LLM_CONFIGURATIONS|default-summarization-model" --type=py -C 3
```
Length of output: 23799
🏁 Script executed:
```bash
#!/bin/bash
# Check for any context window validation or safeguards
echo "=== Checking for context window validation ==="
rg -n "context|CONTEXT_WINDOW|max_tokens" --type=py src/backend/chat/tools/ -A 2 -B 2
echo ""
echo "=== Checking summarization logic ==="
rg -n "chunk" --type=py src/backend/chat/tools/document_summarize.py -C 3
echo ""
echo "=== Checking for token calculation or limit checking ==="
rg -n "token|TOKEN" --type=py src/backend/chat/tools/ -C 2
```
Length of output: 9397
Verify chunk size does not exceed the summarization model's context window.
The default chunk size of 20,000 words is problematic for Mistral 7B Instruct v0.1, which has an 8K sequence length. Mistral-7B-Instruct-v0.2 offers a 32k context window compared to 8k in v0.1. With the word-counting method at line 99 (len(text.split())), 20,000 words converts to approximately 26,000 tokens (~1.3 tokens per word)—exceeding v0.1's capacity by 3.25×.
- Confirm the deployed summarization model version (production uses v0.1 or v0.2?)
- Add runtime validation to ensure chunk size respects the configured model's context limits
- Consider reducing the default chunk size or adding a safeguard that adjusts it based on the model's capabilities
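The arithmetic behind that estimate, as a quick sanity check (the 1.3 tokens-per-word factor is the review's own approximation):

```python
words_per_chunk = 20_000
tokens_per_word = 1.3        # assumed conversion factor from the review
context_window_v01 = 8_000   # approximate Mistral-7B-Instruct-v0.1 sequence length

estimated_tokens = int(words_per_chunk * tokens_per_word)  # 26,000 tokens
print(estimated_tokens / context_window_v01)               # 3.25, i.e. 3.25x over capacity
```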
🤖 Prompt for AI Agents
In src/backend/conversations/settings.py around lines 789 to 803, the default
SUMMARIZATION_CHUNK_SIZE (20,000 words) can exceed the deployed summarization
model's context window (Mistral 7B Instruct v0.1 = 8k, v0.2 = 32k). Update the
settings to: (1) lower the default chunk size to a safe value for v0.1 (suggest
~6,000 words) or choose a conservative default; (2) add runtime validation that
reads the configured summarization model/version (from env or model config),
maps it to a context-window in tokens (e.g., 8k for v0.1, 32k for v0.2),
converts the chunk-size words -> estimated tokens using the existing estimator
(len(text.split()) * 1.3 or a fixed 1.3 factor), accounts for overlap and a
safety buffer, and either cap the chunk size or raise/log a clear error if the
configured size would exceed the model's context window; and (3) include a clear
log message indicating when the chunk size is adjusted and which model/version
made the decision.
Improve the existing tool to manage bigger documents.
This reduces code complexity while allowing better "cuts" and providing overlap for free. Also, do not wait for each sub-batch to complete; use global concurrency instead.
This also fixes the dynamic tool registration to use the original function signature and docstring.
Allows the LLM to retry the summarization in some cases.
Fix the loader when it was not in the proper place.
Force-pushed 649761b to b33a3e4 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (3)
src/backend/conversations/settings.py (1)
789-803: Verify chunk size doesn't exceed model context windows. The default `SUMMARIZATION_CHUNK_SIZE` of 20,000 words may exceed the context window of some LLM models. As previously noted, with the word-counting method at line 99 of document_summarize.py (`len(text.split())`), 20,000 words converts to approximately 26,000 tokens (~1.3 tokens per word), which would exceed Mistral 7B Instruct v0.1's 8K context window by 3.25×.
- Lowering the default chunk size to a safer value (e.g., ~6,000 words for v0.1 compatibility)
- Adding runtime validation to check the configured model's context limits
- Implementing dynamic chunk size adjustment based on the model's capabilities
Based on past review comments.
src/backend/chat/tools/document_summarize.py (1)
22-26: Handle non-UTF-8 document encoding errors. The `read_document_content` function assumes all documents are UTF-8 encoded. If a document uses a different encoding, `decode("utf-8")` will raise a `UnicodeDecodeError`, which will propagate as an unexpected exception and be caught by the generic exception handler at lines 183-189, resulting in a less helpful error message. Apply this diff to add explicit error handling:
```diff
 @sync_to_async
 def read_document_content(doc):
     """Read document content asynchronously."""
-    with default_storage.open(doc.key) as f:
-        return doc.file_name, f.read().decode("utf-8")
+    try:
+        with default_storage.open(doc.key) as f:
+            return doc.file_name, f.read().decode("utf-8")
+    except UnicodeDecodeError as exc:
+        logger.warning("Document %s is not UTF-8 encoded", doc.file_name)
+        raise ModelCannotRetry(
+            f"Document {doc.file_name} uses an unsupported encoding. "
+            "Only UTF-8 encoded text documents are supported."
+        ) from exc
```
Based on past review comments.
src/backend/chat/tests/tools/test_document_summarize.py (1)
277-299: Same return type inconsistency as test_document_summarize_no_text_attachments. This test has the same issue with expecting a string return value when the source code shows `document_summarize` returns `ToolReturn` and raises `ModelCannotRetry`. See the previous comment for details.
🧹 Nitpick comments (3)
src/backend/chat/tests/tools/test_document_summarize.py (3)
41-48: Consider documenting the `mock_summarization_agent` fixture dependency. Multiple test functions use a `mock_summarization_agent` fixture that is not defined in this file. While it's likely defined in conftest.py, consider adding a comment or docstring to help readers understand where this critical fixture comes from.
184-188: Consider more explicit error handling in the mock function. The `mock_open_side_effect` function implicitly returns content for doc2.txt when the key doesn't match doc1.txt. While this works for the current test, consider being more explicit to prevent confusion and make the test more maintainable. Apply this diff for clarity:
```diff
 def mock_open_side_effect(key):
     """Mock file opening based on key."""
     if key == "doc1.txt":
         return io.BytesIO(file_content1.encode("utf-8"))
+    elif key == "doc2.txt":
-    return io.BytesIO(file_content2.encode("utf-8"))
+        return io.BytesIO(file_content2.encode("utf-8"))
+    else:
+        raise FileNotFoundError(f"Unexpected key: {key}")
```
259-274: Decorator exists and correctly converts exceptions to strings; no functional issue, but the return type annotation is inaccurate. The `@last_model_retry_soft_fail` decorator is properly defined in src/backend/chat/tools/utils.py (lines 14-49) and correctly applied to `document_summarize` (line 53). When `ModelCannotRetry` is raised, the decorator catches it and returns `str(exc.message)` (line 39), which is exactly what the tests expect.
The tests are correct. However, there is a type annotation mismatch: `document_summarize` declares the return type `ToolReturn` (lines 55-56), but the decorator wrapper can return either `ToolReturn` (normal execution) or `str` (on exception). The function's return type should be `ToolReturn | str` to accurately reflect its runtime behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- CHANGELOG.md (1 hunks)
- src/backend/chat/agents/summarize.py (0 hunks)
- src/backend/chat/clients/pydantic_ai.py (4 hunks)
- src/backend/chat/tests/conftest.py (2 hunks)
- src/backend/chat/tests/tools/test_document_summarize.py (1 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (6 hunks)
- src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
- src/backend/chat/tools/document_search_rag.py (2 hunks)
- src/backend/chat/tools/document_summarize.py (1 hunks)
- src/backend/conversations/settings.py (1 hunks)
- src/backend/pyproject.toml (1 hunks)
- src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1 hunks)
💤 Files with no reviewable changes (1)
- src/backend/chat/agents/summarize.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/backend/chat/tools/document_search_rag.py
- src/backend/chat/tests/conftest.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/backend/chat/tests/tools/test_document_summarize.py (3)
src/backend/chat/clients/pydantic_ai.py (1)
- summarize (508-510)
src/backend/chat/agents/summarize.py (1)
- SummarizationAgent (14-23)
src/backend/chat/tools/document_summarize.py (2)
- document_summarize (54-189)
- summarize_chunk (29-50)
src/backend/chat/tools/document_summarize.py (4)
src/backend/chat/clients/pydantic_ai.py (1)
- summarize (508-510)
src/backend/chat/agents/summarize.py (1)
- SummarizationAgent (14-23)
src/backend/chat/tools/exceptions.py (1)
- ModelCannotRetry (17-23)
src/backend/chat/tools/utils.py (1)
- last_model_retry_soft_fail (14-50)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
src/backend/chat/tests/conftest.py (1)
SummarizationAgentMock(63-72)
src/backend/chat/clients/pydantic_ai.py (1)
src/backend/chat/tools/document_summarize.py (1)
document_summarize(54-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test-e2e-chromium
- GitHub Check: test-back
- GitHub Check: lint-back
- GitHub Check: build-and-push-backend
- GitHub Check: build-and-push-frontend
🔇 Additional comments (14)
src/backend/pyproject.toml (1)
64-64: LGTM! The `semchunk==3.2.5` dependency is correctly added to support the document chunking functionality in the new summarization tool. Version pinning is appropriate for production stability.
src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1)
704-720: LGTM! The dynamic loader text logic correctly distinguishes between summarization and search operations. The implementation properly filters out the `document_parsing` tool and shows "Summarizing..." when the `summarize` tool is active, providing clear user feedback.
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (2)
884-910: LGTM! The updated system prompts correctly reflect the new document summarization architecture:
- Clear guidance restricting `document_search_rag` to passage retrieval only
- Proper internal context notification about attached documents
- Well-aligned with the new tool-based summarization flow
1012-1042: Test assertions properly updated. The pydantic_messages assertions correctly validate the new system prompts, ensuring the internal context about attached documents is properly included in the conversation flow.
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (3)
154-154: Correct mock path update. The patch target correctly points to the new location where `SummarizationAgent` is imported (chat.tools.document_summarize) rather than the old public entry point, aligning with the architectural refactoring.
377-407: System prompt updates align with the new architecture. The test expectations correctly reflect the new document handling flow:
- `document_search_rag` restricted to passage retrieval
- Clear separation between search and summarization
- Internal context prompt about attached documents
638-638: Token usage update is expected. The increased token usage (317/19 vs previous values) is expected due to the additional system prompts for document handling and internal context. This reflects the actual behavior of the new summarization flow.
src/backend/chat/clients/pydantic_ai.py (2)
10-10: LGTM! The new imports (`functools`, `RunContext`, and `document_summarize`) are necessary and correctly used for the refactored summarization architecture.
Also applies to: 29-29, 76-76
497-510: Well-structured tool integration. The summarization tool integration is cleanly implemented:
- Clear internal context system prompt informing the model about attached documents
- Proper delegation to `document_summarize` with context passthrough
- Appropriate retry configuration (retries=2) for resilience
- Good use of `functools.wraps` to preserve function metadata
src/backend/chat/tools/document_summarize.py (4)
114-121: Concurrency control is well-implemented. The semaphore-based concurrency limiting (`settings.SUMMARIZATION_CONCURRENT_REQUESTS`) properly prevents overwhelming the LLM backend with concurrent requests. The nested async function with a semaphore context manager is a clean pattern; a sketch of the pattern follows below.
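The pattern being praised is roughly the following sketch; the real summarize_chunk takes more arguments, and the limit comes from settings:

```python
import asyncio

async def summarize_all(chunks, summarize_chunk, limit=5):
    """Fan out chunk summaries with at most `limit` LLM calls in flight (sketch)."""
    semaphore = asyncio.Semaphore(limit)  # settings.SUMMARIZATION_CONCURRENT_REQUESTS

    async def bounded(idx, chunk):
        async with semaphore:  # acquired per call, released when the call finishes
            return await summarize_chunk(idx, chunk, len(chunks))

    return await asyncio.gather(
        *(bounded(i, c) for i, c in enumerate(chunks, start=1))
    )
```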
148-178: Robust final merge logic. The merge summarization step is well-structured:
- Clear prompt construction with constraints
- Proper error handling with `ModelRetry`
- Empty result validation
- Good logging for debugging
- Correct `ToolReturn` with sources metadata
180-189: Comprehensive exception handling. The exception handling correctly distinguishes between retryable errors (`ModelRetry`), non-retryable errors (`ModelCannotRetry`), and unexpected exceptions, with appropriate logging and user-facing messages.
97-105: Model context window information is not available in the codebase configuration. The proposed fix references `model_config.max_context_tokens`, but the `LLModel` class does not store this attribute. `LLMSettings` only includes `max_tokens` (the output limit), not context window limits. There is no infrastructure in the codebase to track or retrieve model context window limits for different models.
While validating chunk size against model context limits is a reasonable concern, the suggested implementation cannot be applied as written without first extending the LLModel configuration system to track context window information.
Likely an incorrect or invalid review comment.
src/backend/chat/tests/tools/test_document_summarize.py (1)
106-472: Well-structured test suite with comprehensive coverage. The test suite effectively covers:
- Single and multiple document scenarios
- Custom instructions handling
- Error conditions (missing attachments, read errors, LLM failures)
- Edge cases (empty results, large documents with chunking)
- Both chunk-level and merge-level error handling
The tests properly use mocks, fixtures, and assertions to validate behavior. The call-counting pattern (e.g., lines 133-148, 452-466) is an excellent approach for verifying the internal flow without tight coupling.
- 🐛(front) fix target blank links in chat #103
- 🚑️(posthog) pass str instead of UUID for user PK #134
- ⚡️(web-search) keep running when tool call fails #137
- ✨(summarize): new summarize tool integration #78
Changelog entry is in the wrong section.
The entry "✨(summarize): new summarize tool integration #78" describes a new feature (indicated by the ✨ emoji), but it's placed under the "Fixed" section. According to Keep a Changelog format, new features should be listed under "Added" (for entirely new features) or "Changed" (for changes in existing functionality).
Apply this diff to move the entry to the appropriate section:
```diff
+### Added
+
+- ✨(summarize): new summarize tool integration #78
+
 ### Fixed
 - 🐛(front) fix target blank links in chat #103
 - 🚑️(posthog) pass str instead of UUID for user PK #134
 - ⚡️(web-search) keep running when tool call fails #137
-- ✨(summarize): new summarize tool integration #78
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In CHANGELOG.md around line 16, the entry "✨(summarize): new summarize tool
integration #78" is placed under the "Fixed" section but is a new feature; move
that line out of the Fixed section and insert it under the "Added" (or "Changed"
if it modifies existing functionality) section instead, preserving the entry
text and ordering entries chronologically within the target section and updating
surrounding spacing/headers as needed.
```python
@pytest.mark.asyncio
async def test_summarize_chunk_handles_empty_response(mocked_context):
    """Test that summarize_chunk handles empty responses from the agent."""
    summarization_agent = SummarizationAgent()

    def mocked_empty_summary(_messages, _info=None):
        """Mocked summary that returns empty content."""
        return ModelResponse(parts=[TextPart(content="")])

    with summarization_agent.override(model=FunctionModel(mocked_empty_summary)):
        chunk = "This is a test chunk."

        # Empty responses cause ModelRetry since pydantic-ai considers them invalid
        with pytest.raises(ModelRetry):
            await summarize_chunk(1, chunk, 1, summarization_agent, mocked_context)
```
🧩 Analysis chain
Verify that empty responses actually raise ModelRetry.
The test expects ModelRetry to be raised when the agent returns an empty string, but examining the source code of summarize_chunk, it simply returns resp.output or "" without explicit validation. Unless Pydantic AI's framework automatically validates non-empty strings for output_type=str agents, this test might not be exercising the intended behavior.
Run the following script to check if SummarizationAgent has validation that would raise ModelRetry for empty strings:
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's validation logic for empty responses in the agent or base class
# Look for validation or empty-string checks in agent classes
rg -n -A5 -B5 --type=py 'output_type.*=.*str|def __init__|class.*Agent' src/backend/chat/agents/
```
Length of output: 3409
Now I need to examine the actual summarize_chunk function to see if it validates empty responses:
🏁 Script executed:
```bash
#!/bin/bash
# Find and examine the summarize_chunk function implementation
rg -n -A20 'def summarize_chunk' --type=py src/backend/chat/
```
Length of output: 4006
Need to verify if SummarizationAgent or BaseAgent has an @output_validator that might validate empty strings:
🏁 Script executed:
```bash
#!/bin/bash
# Check if SummarizationAgent or BaseAgent have @output_validator defined
rg -n -A5 -B2 'output_validator' --type=py src/backend/chat/agents/
```
Length of output: 0
Test expectation is incorrect — empty responses will not raise ModelRetry.
The test mocks an empty response and expects ModelRetry to be raised, but the code will not trigger it:
- `summarize_chunk` only raises `ModelRetry` when an `Exception` is caught (lines 43-47)
- An empty string response is not an exception, so it bypasses the error handler
- `SummarizationAgent` has no `@output_validator` decorator to validate empty outputs
- Pydantic AI's default `str` output type does not automatically validate against empty strings
The function will simply return the empty string. Update the test to either:
- Remove this test (empty responses are not actually invalid per the current implementation), or
- Add an `@output_validator` to `SummarizationAgent` to validate non-empty output and raise `ModelRetry` for empty strings (see the sketch below)
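If the second option is chosen, here is a minimal sketch of such a validator, assuming pydantic-ai's output_validator decorator on the underlying agent (the model string is a placeholder):

```python
from pydantic_ai import Agent, ModelRetry

summarization_agent = Agent("openai:gpt-4o", output_type=str)  # placeholder model

@summarization_agent.output_validator
def reject_empty_summary(output: str) -> str:
    """Force a retry when the model returns an empty summary."""
    if not output.strip():
        raise ModelRetry("The summary is empty; produce a non-empty summary.")
    return output
```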
🤖 Prompt for AI Agents
In src/backend/chat/tests/tools/test_document_summarize.py around lines 89 to
103, the test incorrectly expects a ModelRetry for an empty model response;
either update the test to assert that summarize_chunk returns an empty string
(or the appropriate empty ModelResponse behavior) when the agent returns
TextPart(content=""), or modify the SummarizationAgent implementation to include
an @output_validator that rejects empty string outputs and raises ModelRetry so
the test remains valid; choose one approach, then update the test or agent
accordingly and adjust assertions to match the implemented behavior.
Fixed
- 🦺(front) Fix send prohibited file types
- 🐛(front) fix target blank links in chat #103
- 🚑️(posthog) pass str instead of UUID for user PK #134
- ⚡️(web-search) keep running when tool call fails #137
- ✨(summarize): new summarize tool integration #78

Removed
- 🔥(posthog) remove posthog middleware for async mode fix #146
Purpose
This PR improves the document summarization experience:
Proposal
- `toolName === 'summarize'`
Note
The prompts used to generate the summary are currently in French; they could all be translated into English, but this could potentially affect the results.
Summary by CodeRabbit
Release Notes
New Features
Documentation