
Conversation


@camilleAND camilleAND commented Oct 15, 2025

Purpose

This PR improves the document summarization experience:

  • New workflow for the summarize tool (optional instructions, chunking, merge).
  • System prompts that force use of the appropriate tools without adding noise to the UI.
  • Detailed logs to follow each step (chunk → merge).

Proposal

  • summarize tool (instructions argument, ~10000 chunk size, final merge)
  • Forced use of the summarize tool via the system prompt (when documents are present)
  • Frontend: “Summarizing…” shown when toolName === 'summarize'
  • Option to enable validation retries (max_result_retries via deps)
  • Addition

Note

The prompts used for summary generation are currently in French; they could all be translated into English, but doing so may affect the results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a new "Summarize" tool for summarizing attached documents with support for multiple documents and custom instructions.
    • Dynamic loading indicator now displays "Summarizing..." status when using the summarize feature.
  • Documentation

    • Improved system prompts to clarify when to use document search versus the summarize tool.


coderabbitai bot commented Oct 15, 2025

Walkthrough

The PR refactors document summarization from a direct agent hook to a tool-based architecture. It removes the public hand_off_to_summarization_agent entry point, implements a new document_summarize tool with chunked text processing and concurrent request handling, adds configuration options for chunk size and concurrency, introduces the semchunk dependency, updates agent client integration and system prompts, and modifies frontend UI to show "Summarizing..." status.

Changes

Cohort / File(s) | Summary
Summarization agent refactor (removal)
src/backend/chat/agents/summarize.py
Removed public async function hand_off_to_summarization_agent() and helper read_document_content(), along with related imports; retained SummarizationAgent class.
New summarization tool implementation
src/backend/chat/tools/document_summarize.py
Added new document summarization tool implementing chunked processing, concurrent requests with semaphore limiting, multi-chunk merging, and comprehensive error handling via ModelRetry/ModelCannotRetry. Includes read_document_content(), summarize_chunk(), and document_summarize() async entrypoint returning ToolReturn.
Agent client integration
src/backend/chat/clients/pydantic_ai.py
Replaced direct hand_off_to_summarization_agent hook with new tool declaration summarize(ctx, *args, **kwargs) delegating to document_summarize; added attached_documents_note() system prompt. Updated imports to include functools, RunContext, and document_summarize.
Document search guidance
src/backend/chat/tools/document_search_rag.py
Updated query parameter description and system prompt to clarify document_search_rag is for retrieving passages only, not summarization.
Configuration additions
src/backend/conversations/settings.py
Added three new public configuration attributes: SUMMARIZATION_CHUNK_SIZE (default 20,000), SUMMARIZATION_OVERLAP_SIZE (default 0.05), SUMMARIZATION_CONCURRENT_REQUESTS (default 5).
Dependency
src/backend/pyproject.toml
Added semchunk==3.2.5 dependency for text chunking.
Test fixture
src/backend/chat/tests/conftest.py
Added new public fixture mock_summarization_agent_fixture() providing patched SummarizationAgent override with custom model injection for testing.
Comprehensive test suite
src/backend/chat/tests/tools/test_document_summarize.py
Added new test module with 10+ test cases covering success paths, error handling (ModelRetry, IO errors, empty results), multi-chunk processing, concurrent limits, and edge cases.
Integration test updates
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py, test_conversation_with_document_url.py
Updated test expectations to reflect new document_summarize tool flow, revised system prompts for attached documents context, adjusted token usage assertions, updated tool-call sequences.
Frontend streaming UI
src/frontend/apps/conversations/src/features/chat/components/Chat.tsx
Modified loading label to dynamically display "Summarizing..." when current assistant message contains a summarize tool invocation; otherwise shows "Search...".
Changelog
CHANGELOG.md
Added entry documenting new summarize tool integration.
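The SUMMARIZATION_CHUNK_SIZE and SUMMARIZATION_OVERLAP_SIZE settings above can be pictured with a character-based sketch. This is a rough stand-in only: the PR actually delegates chunking to semchunk, and the constant and function names here are illustrative, not the project's.

```python
CHUNK_SIZE = 20_000  # mirrors the SUMMARIZATION_CHUNK_SIZE default
OVERLAP = 0.05       # mirrors SUMMARIZATION_OVERLAP_SIZE: fraction of a chunk re-read

def chunk_with_overlap(text: str, size: int = CHUNK_SIZE, overlap: float = OVERLAP) -> list[str]:
    """Split text into windows of `size` chars, where each window
    re-reads `overlap * size` chars from the end of the previous one."""
    if size <= 0:
        return [text]
    step = max(1, int(size * (1 - overlap)))
    return [text[i:i + size] for i in range(0, len(text), step)]
```

A small overlap keeps sentences that straddle a chunk boundary visible to both chunk summaries, at the cost of slightly more tokens per run.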

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Agent as Conversation Agent
    participant Tool as Summarize Tool
    participant Chunker as Text Chunker
    participant SumAgent as Summarization Agent
    participant Storage as Document Storage

    Note over User,Storage: Old Flow (Removed)
    User->>Agent: invoke hand_off_to_summarization_agent()
    Agent->>Storage: read_document_content()
    Storage-->>Agent: document content
    Agent->>SumAgent: invoke with full content
    SumAgent-->>Agent: summary
    Agent-->>User: ToolReturn(summary)

    Note over User,Storage: New Flow
    User->>Agent: invoke summarize tool
    Agent->>Tool: document_summarize(ctx)
    Tool->>Storage: read attachments
    Storage-->>Tool: document contents
    Tool->>Chunker: chunk texts with semchunk
    Chunker-->>Tool: text chunks
    loop Concurrent Chunks (semaphore-limited)
        Tool->>SumAgent: summarize_chunk(idx, chunk)
        SumAgent-->>Tool: chunk summary
    end
    Tool->>SumAgent: merge chunk summaries
    SumAgent-->>Tool: final merged summary
    Tool->>Agent: ToolReturn(summary, sources)
    Agent-->>User: streamed response with summary
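The semaphore-limited loop in the diagram above can be sketched as a minimal asyncio pattern. The function names are assumptions for illustration, not the PR's actual API:

```python
import asyncio

async def summarize_all(chunks, summarize_chunk, max_concurrent=5):
    # Cap in-flight LLM calls; mirrors SUMMARIZATION_CONCURRENT_REQUESTS.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(idx, chunk):
        async with sem:
            return await summarize_chunk(idx, chunk, len(chunks))

    # gather() preserves submission order, so summaries line up with chunks.
    return await asyncio.gather(
        *(bounded(i, c) for i, c in enumerate(chunks, start=1))
    )
```

All tasks are created up front; the semaphore only bounds how many awaits run concurrently, which is what protects the upstream API from rate-limit errors.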

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~40 minutes

  • Dense logic in document_summarize.py: Concurrent processing with semaphore limits, multi-chunk text chunking using semchunk, error handling with ModelRetry/ModelCannotRetry, and document content reading require careful verification of logic flow and edge cases.
  • Tool integration in pydantic_ai.py: New tool registration with functools.wraps, system prompt addition, and delegation pattern should be validated for consistency with agent framework expectations.
  • Comprehensive test coverage: New test_document_summarize.py includes 10+ test scenarios; ensure mocking strategy, context setup, and assertions align with implementation behavior.
  • Breaking change: Removal of hand_off_to_summarization_agent() may affect external consumers; verify no public API contracts are violated.
  • Test fixture migration: Verify mock_summarization_agent_fixture patching in conftest.py correctly handles both old and new import paths (chat.agents.summarize and chat.tools.document_summarize).

Possibly related PRs

Suggested labels

enhancement, frontend, refactoring

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly describes the main change (introduction of a new summarize tool integrated into the system), which aligns with the refactoring across agents, clients, tools, configuration, tests, and frontend.
  • Docstring Coverage: ✅ Passed. Coverage is 91.30%, above the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/backend/chat/tools/document_search_rag.py (1)

29-33: Guard missing collection_id to prevent backend errors.

If conversation.collection_id is empty, constructing the store may fail. Early-return with empty results.

Apply this change:

-        document_store = document_store_backend(ctx.deps.conversation.collection_id)
+        if not getattr(ctx.deps.conversation, "collection_id", None):
+            return ToolReturn(return_value=[], content="", metadata={"sources": []})
+        document_store = document_store_backend(ctx.deps.conversation.collection_id)
src/backend/chat/clients/pydantic_ai.py (1)

580-596: Treat non-URL sources from summarize safely; avoid emitting invalid URL sources.

summarize.py returns document names as “sources”. Here they’re emitted as URL sources, which can render broken links. Only emit URL sources when strings look like http(s) URLs.

Apply this diff:

-                                if final_output.metadata and (
-                                    sources := final_output.metadata.get("sources")
-                                ):
-                                    for source_url in sources:
-                                        url_source = LanguageModelV1Source(
-                                            sourceType="url",
-                                            id=str(uuid.uuid4()),
-                                            url=source_url,
-                                            providerMetadata={},
-                                        )
-                                        _new_source_ui = SourceUIPart(
-                                            type="source", source=url_source
-                                        )
-                                        _ui_sources.append(_new_source_ui)
-                                        yield events_v4.SourcePart(
-                                            **_new_source_ui.source.model_dump()
-                                        )
+                                if final_output.metadata and (
+                                    sources := final_output.metadata.get("sources")
+                                ):
+                                    for src in sources:
+                                        if isinstance(src, str) and src.startswith(("http://", "https://")):
+                                            url_source = LanguageModelV1Source(
+                                                sourceType="url",
+                                                id=str(uuid.uuid4()),
+                                                url=src,
+                                                providerMetadata={},
+                                            )
+                                            _new_source_ui = SourceUIPart(
+                                                type="source", source=url_source
+                                            )
+                                            _ui_sources.append(_new_source_ui)
+                                            yield events_v4.SourcePart(
+                                                **_new_source_ui.source.model_dump()
+                                            )
🧹 Nitpick comments (6)
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1)

70-71: Label update looks good; consider i18n consistency.

Works as intended. For consistency, consider localizing the document parsing messages above too (currently hardcoded) and prefer stable keys (e.g., 'summarize.inProgress') over literal strings.

src/backend/chat/clients/pydantic_ai.py (1)

81-83: max_result_retries is introduced but not used.

Either wire it into the tool/result validation retry path or remove to avoid drift.

src/backend/chat/agents/summarize.py (4)

82-85: Reduce content leakage in logs for chunk processing.

Even truncated prompts/responses can leak sensitive data. Prefer debug level and avoid content bodies.

Apply this diff:

-        logger.info("[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100]+'...')
+        logger.debug("[summarize] CHUNK %s/%s prompt emitted", idx, total_chunks)
-        resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
-        logger.info("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
+        resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
+        logger.debug("[summarize] CHUNK %s/%s response received", idx, total_chunks)

Based on learnings


67-68: Remove duplicate initialization of chunk_summaries.

It’s set twice; keep only once near usage.

Apply this diff:

-    chunk_summaries: list[str] = []
+    # will be filled by batch processing below
@@
-    chunk_summaries = []
+    chunk_summaries: list[str] = []

Also applies to: 88-89


3-6: Remove unused imports.

dataclasses and add_document_rag_search_tool aren’t used here.

Apply this diff:

-import dataclasses
 import logging
 import asyncio
@@
-from ..tools.document_search_rag import add_document_rag_search_tool

Also applies to: 14-15


100-102: Hard-coded French default instructions.

Consider deriving default language from conversation/user settings to avoid unexpected language output.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d242328 and c057a7e.

📒 Files selected for processing (4)
  • src/backend/chat/agents/summarize.py (3 hunks)
  • src/backend/chat/clients/pydantic_ai.py (2 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/backend/chat/agents/summarize.py (2)
src/backend/chat/tools/document_search_rag.py (2)
  • document_search_rag (14-44)
  • add_document_rag_search_tool (10-52)
src/backend/chat/models.py (1)
  • ChatConversationContextKind (77-81)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test-e2e-chromium
  • GitHub Check: test-back
  • GitHub Check: build-and-push-backend
🔇 Additional comments (4)
src/backend/chat/tools/document_search_rag.py (1)

50-52: Prompt guidance is clear and aligned with summarize flow.

src/backend/chat/clients/pydantic_ai.py (2)

398-406: Explicit summarize-tool instruction is well-scoped.


409-414: Attached-documents system hint is a helpful guardrail.

src/backend/chat/agents/summarize.py (1)

31-34: Signature update (optional instructions) fits the workflow.

Comment on lines 46 to 72

     # Collect documents content
     documents_qs = ctx.deps.conversation.contexts.filter(
         kind=ChatConversationContextKind.DOCUMENT.value,
     )
-    documents_prompt = "\n\n".join(
-        [
-            (
-                "<document>\n"
-                f"<name>\n{doc.name}\n</name>\n"
-                f"<content>\n{doc.content}\n</content>\n"
-                "</document>"
-            )
-            async for doc in documents
-        ]
-    )
-
-    formatted_prompt = prompt.format(
-        user_prompt=ctx.prompt,
-        documents_prompt=documents_prompt,
+    documents_list = [doc async for doc in documents_qs]
+
+    # Instructions: rely on tool argument only; model should extract them upstream

⚠️ Potential issue | 🟠 Major

Handle “no documents” explicitly to avoid empty-context summaries.

If no docs are found, return a clear tool result instead of prompting the model with empty context.

Apply this diff:

-    documents_qs = ctx.deps.conversation.contexts.filter(
+    documents_qs = ctx.deps.conversation.contexts.filter(
         kind=ChatConversationContextKind.DOCUMENT.value,
     )
-    documents_list = [doc async for doc in documents_qs]
+    documents_list = [doc async for doc in documents_qs]
+    if not documents_list:
+        return ToolReturn(return_value="", content="No documents available to summarize.", metadata={"sources": []})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before (the commented diff hunk, removed lines marked with -):

     # Collect documents content
     documents_qs = ctx.deps.conversation.contexts.filter(
         kind=ChatConversationContextKind.DOCUMENT.value,
     )
-    documents_prompt = "\n\n".join(
-        [
-            (
-                "<document>\n"
-                f"<name>\n{doc.name}\n</name>\n"
-                f"<content>\n{doc.content}\n</content>\n"
-                "</document>"
-            )
-            async for doc in documents
-        ]
-    )
-
-    formatted_prompt = prompt.format(
-        user_prompt=ctx.prompt,
-        documents_prompt=documents_prompt,
+    documents_list = [doc async for doc in documents_qs]
+
+    # Instructions: rely on tool argument only; model should extract them upstream

After:

     # Collect documents content
     documents_qs = ctx.deps.conversation.contexts.filter(
         kind=ChatConversationContextKind.DOCUMENT.value,
     )
     documents_list = [doc async for doc in documents_qs]
     if not documents_list:
         return ToolReturn(
             return_value="",
             content="No documents available to summarize.",
             metadata={"sources": []},
         )

     # Instructions: rely on tool argument only; model should extract them upstream
🤖 Prompt for AI Agents
In src/backend/chat/agents/summarize.py around lines 46 to 52, after building
documents_list from the conversation contexts, add an explicit check for an
empty list and immediately return a clear tool result instead of continuing to
invoke the model with empty context; specifically, if documents_list is empty
return the appropriate ToolResult/response used by this agent with a concise
message like "No documents found in conversation contexts" (and the standard
non-error success/handled flag your tooling expects) so the agent caller
receives a deterministic tool response rather than an empty-context summary.

Comment on lines 119 to 140

     return ToolReturn(
-        return_value=response.output,
-        content=response.output,
-        metadata={"sources": {doc.name async for doc in documents}},
+        return_value=final_summary,
+        content=final_summary,
+        metadata={"sources": {doc.name for doc in documents_list}},
     )

🛠️ Refactor suggestion | 🟠 Major

Use list for metadata["sources"] to keep metadata JSON-serializable.

Sets can break serialization; also aligns with downstream handling.

Apply this diff:

-        metadata={"sources": {doc.name for doc in documents_list}},
+        metadata={"sources": [doc.name for doc in documents_list]},

Suggested change

Before (the commented diff hunk, removed lines marked with -):

     return ToolReturn(
-        return_value=response.output,
-        content=response.output,
-        metadata={"sources": {doc.name async for doc in documents}},
+        return_value=final_summary,
+        content=final_summary,
+        metadata={"sources": {doc.name for doc in documents_list}},
     )

After:

     return ToolReturn(
         return_value=final_summary,
         content=final_summary,
         metadata={"sources": [doc.name for doc in documents_list]},
     )
🤖 Prompt for AI Agents
In src/backend/chat/agents/summarize.py around lines 119 to 123, the metadata
currently uses a set comprehension {"sources": {doc.name for doc in
documents_list}} which can break JSON serialization; change it to a list instead
by converting the comprehension to a list (e.g., ["sources": [doc.name for doc
in documents_list]] or wrap the set in list(...)) so metadata["sources"] is
JSON-serializable and compatible with downstream handlers.
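The serialization concern behind these comments can be verified in isolation with the stdlib, independently of the PR's code:

```python
import json

payload_list = {"sources": ["report.pdf", "notes.txt"]}
payload_set = {"sources": {"report.pdf", "notes.txt"}}

# Lists map to JSON arrays, so this succeeds.
json.dumps(payload_list)

try:
    # json has no encoding for sets, so metadata built from a set
    # comprehension fails as soon as the tool result is serialized.
    json.dumps(payload_set)
except TypeError as exc:
    print(exc)
```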

Comment on lines 25 to 27
    # Defensive: ctx.deps or ctx.deps.conversation may be unavailable in some flows (start of conversation)
    if not getattr(ctx, "deps", None) or not getattr(ctx.deps, "conversation", None):
        return ToolReturn(return_value=[], content="", metadata={"sources": set()})

⚠️ Potential issue | 🟠 Major

Avoid non-JSON-serializable types in metadata; use list instead of set.

Sets in metadata can break serialization later. Return a list for sources.

Apply this diff:

-            return ToolReturn(return_value=[], content="", metadata={"sources": set()})
+            return ToolReturn(return_value=[], content="", metadata={"sources": []})

Suggested change

Before:

    # Defensive: ctx.deps or ctx.deps.conversation may be unavailable in some flows (start of conversation)
    if not getattr(ctx, "deps", None) or not getattr(ctx.deps, "conversation", None):
        return ToolReturn(return_value=[], content="", metadata={"sources": set()})

After:

    # Defensive: ctx.deps or ctx.deps.conversation may be unavailable in some flows (start of conversation)
    if not getattr(ctx, "deps", None) or not getattr(ctx.deps, "conversation", None):
        return ToolReturn(return_value=[], content="", metadata={"sources": []})
🤖 Prompt for AI Agents
In src/backend/chat/tools/document_search_rag.py around lines 25 to 27, the
ToolReturn metadata currently uses a set() for "sources" which is not
JSON-serializable and can break downstream serialization; change the metadata to
use a list (e.g., [] or list()) instead of a set so the return value is
JSON-safe and preserves the empty collection semantics.

@qbey qbey added noChangeLog This does not require a changelog line backend labels Oct 15, 2025
@eliott07

#87

@qbey qbey force-pushed the camand/feat_tool_summarize branch from c057a7e to ec9af9a on November 6, 2025 at 10:36

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
src/backend/chat/agents/summarize.py (3)

63-70: Handle empty document list explicitly.

If no documents are found, the code will attempt to summarize an empty string, wasting API calls. Add an early return with a clear message when documents is empty.

Apply this diff:

     documents = [await read_document_content(doc) for doc in text_attachment]
+    if not documents:
+        return ToolReturn(
+            return_value="",
+            content="No documents available to summarize.",
+            metadata={"sources": []},
+        )

132-135: Avoid logging full prompts and summaries; PII/compliance risk.

The merged prompt and final summary may contain sensitive document content. Log metadata only (e.g., character count) instead of full bodies.

Apply this diff:

-    logger.info("[summarize] MERGE prompt=> %s", merged_prompt)
+    logger.debug("[summarize] MERGE prompt emitted (len=%d)", len(merged_prompt))
     merged_resp = await summarization_agent.run(merged_prompt, usage=ctx.usage)
     final_summary = (merged_resp.output or "").strip()
-    logger.info("[summarize] MERGE response<= %s", final_summary)
+    logger.debug("[summarize] MERGE response received (len=%d)", len(final_summary))

137-140: Use list for metadata["sources"] to ensure JSON serialization.

Sets cannot be reliably JSON-serialized. Use a list instead to align with downstream handling.

Apply this diff:

     return ToolReturn(
         return_value=final_summary,
-        metadata={"sources": {doc[0] for doc in documents}},
+        metadata={"sources": [doc[0] for doc in documents]},
     )
🧹 Nitpick comments (3)
src/backend/chat/agents/summarize.py (3)

15-15: Remove unused import.

The import add_document_rag_search_tool is not used in this file. It may be leftover from refactoring.

Apply this diff:

-from ..tools.document_search_rag import add_document_rag_search_tool

79-82: Consider clarifying edge case comment.

The size <= 0 check correctly prevents division by zero, but a comment explaining this edge case would improve readability.

 def chunk_text(text: str, size: int = 10000) -> list[str]:
+    # Return full text if chunk size is invalid
     if size <= 0:
         return [text]
     return [text[i : i + size] for i in range(0, len(text), size)]

94-103: Add error handling for chunk summarization failures.

If summarization_agent.run() fails for a chunk, the entire summarization will fail. Consider wrapping this in try-except to handle individual chunk failures gracefully.

 async def summarize_chunk(idx, chunk, total_chunks, summarization_agent, ctx):
     sum_prompt = (
         "Tu es un agent spécialisé en synthèses de textes. "
         "Génère un résumé clair et concis du passage suivant (partie {idx}/{total}) :\n"
         "'''\n{context}\n'''\n\n"
     ).format(context=chunk, idx=idx, total=total_chunks)
     logger.info("[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100]+'...')
+    try:
-    resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
+        resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
+    except Exception as e:
+        logger.error("[summarize] CHUNK %s/%s failed: %s", idx, total_chunks, str(e))
+        return f"[Chunk {idx} summarization failed]"
     logger.info("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
     return resp.output or ""
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c057a7e and ec9af9a.

📒 Files selected for processing (4)
  • src/backend/chat/agents/summarize.py (5 hunks)
  • src/backend/chat/clients/pydantic_ai.py (1 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/backend/chat/tools/document_search_rag.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/backend/chat/agents/summarize.py (1)
src/backend/chat/tools/document_search_rag.py (2)
  • document_search_rag (14-44)
  • add_document_rag_search_tool (10-52)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test-e2e-chromium
  • GitHub Check: test-back
  • GitHub Check: build-and-push-backend
  • GitHub Check: build-and-push-frontend
🔇 Additional comments (4)
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1)

71-71: LGTM! Clean conditional label for summarization status.

The ternary expression correctly displays "Summarizing..." when the tool is 'summarize' and "Search..." otherwise, properly wrapped with i18n support.

src/backend/chat/clients/pydantic_ai.py (1)

493-500: LGTM! Clear system note preventing redundant upload prompts.

The attached_documents_note function provides clear context to the model about available documents, avoiding confusion and unnecessary re-upload requests.

src/backend/chat/agents/summarize.py (2)

40-52: LGTM! Well-documented function signature with clear examples.

The optional instructions parameter is properly defined as keyword-only, and the docstring provides helpful usage examples.


105-116: LGTM! Well-structured parallel chunk processing.

The batched asyncio.gather approach with batch size of 5 provides good parallelism while avoiding API rate limits. The indexing logic is correct.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/backend/chat/tools/document_search_rag.py (1)

36-40: Use JSON-serializable metadata for sources.

metadata["sources"] is built as a set, which is not JSON-serializable. Downstream serialization of the tool result will crash as soon as a caller tries to encode this response. Switch to a list (or tuple) so the payload stays JSON-safe.

-        return ToolReturn(
-            return_value=rag_results.data,
-            content="",
-            metadata={"sources": {result.url for result in rag_results.data}},
-        )
+        return ToolReturn(
+            return_value=rag_results.data,
+            content="",
+            metadata={"sources": [result.url for result in rag_results.data]},
+        )
src/backend/chat/agents/summarize.py (1)

84-103: Short-circuit when no documents are available.

If the conversation has no text attachments, we still build an empty context and call the LLM. That produces hallucinated “summaries” with zero source material. Bail out early with a deterministic tool response (e.g., “No documents available to summarize.”) so callers can handle the situation gracefully.

     documents = [await read_document_content(doc) for doc in text_attachment]
 
+    if not documents:
+        return ToolReturn(
+            return_value="",
+            content="No documents available to summarize.",
+            metadata={"sources": []},
+        )
+
♻️ Duplicate comments (1)
src/backend/chat/agents/summarize.py (1)

158-161: Fix metadata serialization (sources must be a list).

We’re still returning metadata["sources"] as a set, which breaks JSON serialization of the tool result. Convert it to a list before returning. (This was already flagged earlier.)

-    return ToolReturn(
-        return_value=final_summary,
-        metadata={"sources": {doc[0] for doc in documents}},
-    )
+    return ToolReturn(
+        return_value=final_summary,
+        metadata={"sources": [doc[0] for doc in documents]},
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec9af9a and a257939.

📒 Files selected for processing (4)
  • src/backend/chat/agents/summarize.py (3 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/backend/conversations/settings.py (1 hunks)
  • src/backend/pyproject.toml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test-e2e-chromium
  • GitHub Check: test-back
  • GitHub Check: lint-back
  • GitHub Check: build-and-push-frontend
  • GitHub Check: build-and-push-backend

@qbey qbey force-pushed the camand/feat_tool_summarize branch from a257939 to 49a0c20 on November 6, 2025 at 14:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
src/backend/chat/agents/summarize.py (3)

85-91: Handle empty documents list to prevent processing with no content.

If no text attachments are found, the function proceeds to chunk empty data. This matches a previous unresolved review comment about checking for no documents.

Apply this diff to handle the empty case:

 documents = [await read_document_content(doc) for doc in text_attachment]
+
+if not documents:
+    return ToolReturn(
+        return_value="No documents available to summarize.",
+        metadata={"sources": []},
+    )

48-50: Remove full prompt/response text from logs to prevent PII leakage.

Despite being marked as addressed in a previous review, lines 48-50, 54, 150, and 156 still log full document-derived content at DEBUG level. This violates compliance requirements for PII handling.

Apply this diff:

-    logger.debug(
-        "[summarize] CHUNK %s/%s prompt=> %s", idx, total_chunks, sum_prompt[0:100] + "..."
-    )
+    logger.debug(
+        "[summarize] CHUNK %s/%s prompt emitted (len=%d)", idx, total_chunks, len(sum_prompt)
+    )
 
     resp = await summarization_agent.run(sum_prompt, usage=ctx.usage)
 
-    logger.debug("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
+    logger.debug(
+        "[summarize] CHUNK %s/%s response received (len=%d)",
+        idx,
+        total_chunks,
+        len(resp.output or ""),
+    )

And:

-    logger.debug("[summarize] MERGE prompt=> %s", merged_prompt)
+    logger.debug("[summarize] MERGE prompt emitted (len=%d)", len(merged_prompt))
 
     merged_resp = await summarization_agent.run(merged_prompt, usage=ctx.usage)
 
     final_summary = (merged_resp.output or "").strip()
 
-    logger.debug("[summarize] MERGE response<= %s", final_summary)
+    logger.debug("[summarize] MERGE response received (len=%d)", len(final_summary))

Also applies to: 54-54, 150-150, 156-156


158-161: Convert metadata["sources"] from set to list for JSON serialization.

This was flagged in a previous review but remains unresolved. Sets cannot be JSON-serialized, which will break downstream handling.

Apply this diff:

     return ToolReturn(
         return_value=final_summary,
-        metadata={"sources": {doc[0] for doc in documents}},
+        metadata={"sources": [doc[0] for doc in documents]},
     )
🧹 Nitpick comments (2)
src/backend/chat/agents/summarize.py (2)

94-102: Consider using an actual tokenizer for more accurate chunk sizing.

The current implementation uses len(text.split()) as a token counter, which is a rough approximation. For production use, consider using the actual tokenizer from the LLM to ensure chunks don't exceed model context limits.

Example with tiktoken (if using OpenAI models):

import tiktoken

encoding = tiktoken.encoding_for_model(settings.AI_MODEL)
chunker = semchunk.chunkerify(
    tokenizer_or_token_counter=lambda text: len(encoding.encode(text)),
    chunk_size=chunk_size,
)

111-128: Add error handling for chunk summarization failures.

The current implementation uses asyncio.gather() without error handling. If any chunk summarization fails, the entire operation fails. Consider handling partial failures gracefully.

Apply this diff:

-        chunk_summaries = await asyncio.gather(*summarization_tasks)
+        chunk_summaries = await asyncio.gather(*summarization_tasks, return_exceptions=True)
+        # Filter out exceptions and log them
+        valid_summaries = []
+        for idx, summary in enumerate(chunk_summaries, start=1):
+            if isinstance(summary, Exception):
+                logger.warning("[summarize] Chunk %s failed: %s", idx, str(summary))
+            else:
+                valid_summaries.append(summary)
+        chunk_summaries = valid_summaries
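The partial-failure pattern suggested above can be checked in isolation; a minimal sketch (the `work` coroutine is a hypothetical stand-in for a chunk summarization call):

```python
import asyncio


async def work(i: int) -> int:
    """Stand-in for summarizing one chunk; chunk 2 deliberately fails."""
    if i == 2:
        raise ValueError("chunk 2 failed")
    return i * 10


async def main() -> list[int]:
    # return_exceptions=True turns raised exceptions into returned values,
    # so one failing chunk no longer aborts the whole gather.
    results = await asyncio.gather(
        *(work(i) for i in range(4)), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, BaseException)]


print(asyncio.run(main()))  # → [0, 10, 30]
```

Without `return_exceptions=True`, the first failing task would propagate and cancel the remaining results, which is exactly the all-or-nothing behavior the comment flags.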
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between a257939 and 49a0c20.

📒 Files selected for processing (7)
  • src/backend/chat/agents/summarize.py (3 hunks)
  • src/backend/chat/clients/pydantic_ai.py (1 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (5 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/backend/conversations/settings.py (1 hunks)
  • src/backend/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/backend/pyproject.toml
  • src/backend/chat/tools/document_search_rag.py
  • src/backend/conversations/settings.py
  • src/backend/chat/clients/pydantic_ai.py
🔇 Additional comments (7)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (2)

884-910: LGTM! Test assertions correctly reflect the new document handling prompts.

The updates properly distinguish between document_search_rag for retrieving passages and the summarize tool for summaries, and the new internal context prompt aligns with the PR's objective to inform the model about attached documents.


1012-1042: LGTM! Pydantic message assertions match the updated system prompts.

The expected message structure correctly includes both the document_search_rag guidance and the internal context prompt.

src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (3)

377-407: LGTM! Test assertions updated consistently with the new prompt chain.

The document_search_rag guidance and internal context prompt are correctly added to the expected message flow.


638-638: Token count update aligns with new system prompts.

The increase to 317 prompt tokens correctly reflects the additional tokens from the new document_search_rag guidance and internal context system prompts added in this PR.


720-750: LGTM! Pydantic message expectations correctly updated.

Both the document_search_rag and internal context system prompts are properly reflected in the expected message structure.

src/backend/chat/agents/summarize.py (2)

129-136: LGTM! Context building correctly aggregates chunk summaries.

The use of strict=True in zip() ensures documents and summaries are properly aligned.


138-148: LGTM! Merge prompt provides clear instructions for final synthesis.

The prompt structure appropriately guides the model to produce a coherent, well-formatted summary while following user instructions.

@qbey qbey force-pushed the camand/feat_tool_summarize branch from 49a0c20 to 7ee6bfd Compare November 6, 2025 21:57
@qbey qbey changed the title feat(tool/summarize): new summarize tool integration ✨(summarize): new summarize tool integration Nov 6, 2025
@qbey qbey force-pushed the camand/feat_tool_summarize branch from 7ee6bfd to 7cf423f Compare November 6, 2025 21:59
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/backend/chat/clients/pydantic_ai.py (1)

482-511: Summarize tool never registered for PDF-only conversations

Right now the summarize tool (and its supporting system prompts) are defined only when has_not_pdf_docs is true. If the user uploads only PDFs—which is our most common case—the branch falls into the else: path and the summarize tool is never registered, so the model cannot call document_summarize at all. That breaks the new end-to-end summarization workflow for PDF uploads.

Please register the summarize system prompts and tool whenever the conversation actually has documents, regardless of type, and keep add_document_rag_search_tool gated by has_not_pdf_docs.

-        elif has_not_pdf_docs:
-            add_document_rag_search_tool(self.conversation_agent)
-
-            @self.conversation_agent.system_prompt
-            def summarization_system_prompt() -> str:
-                ...
-
-            @self.conversation_agent.system_prompt
-            def attached_documents_note() -> str:
-                ...
-
-            @self.conversation_agent.tool(name="summarize", retries=2)
-            @functools.wraps(document_summarize)
-            async def summarize(ctx: RunContext, *args, **kwargs) -> ToolReturn:
-                return await document_summarize(ctx, *args, **kwargs)
-        else:
+        elif has_not_pdf_docs:
+            add_document_rag_search_tool(self.conversation_agent)
+        else:
             conversation_documents = [
                 ...
             ]
             ...
 
+        if conversation_has_documents:
+            @self.conversation_agent.system_prompt
+            def summarization_system_prompt() -> str:
+                ...
+
+            @self.conversation_agent.system_prompt
+            def attached_documents_note() -> str:
+                ...
+
+            @self.conversation_agent.tool(name="summarize", retries=2)
+            @functools.wraps(document_summarize)
+            async def summarize(ctx: RunContext, *args, **kwargs) -> ToolReturn:
+                return await document_summarize(ctx, *args, **kwargs)
♻️ Duplicate comments (1)
src/backend/chat/tools/document_search_rag.py (1)

36-40: Still returning a non-serializable set in metadata

We previously flagged this: metadata={"sources": {…}} produces a set, which is not JSON-serializable and will blow up when the ToolReturn is encoded. Please switch to a list comprehension (or list(...)) so downstream serialization keeps working.

-        return ToolReturn(
-            return_value=rag_results.data,
-            content="",
-            metadata={"sources": {result.url for result in rag_results.data}},
-        )
+        return ToolReturn(
+            return_value=rag_results.data,
+            content="",
+            metadata={"sources": [result.url for result in rag_results.data]},
+        )
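The failure mode is easy to reproduce standalone; a minimal sketch (file names are hypothetical, sorting is only for deterministic output):

```python
import json

sources = {"doc1.txt", "doc2.txt"}  # a set, as the tool currently returns

try:
    json.dumps({"sources": sources})
except TypeError as exc:
    # sets are not JSON-serializable
    print(exc)

# Converting to a list fixes serialization:
payload = json.dumps({"sources": sorted(sources)})
print(payload)  # → {"sources": ["doc1.txt", "doc2.txt"]}
```

Any JSON encoding of the `ToolReturn` metadata will hit this `TypeError`, which is why the set comprehension must become a list comprehension (or `list(...)`).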
🧹 Nitpick comments (2)
src/backend/chat/tools/document_summarize.py (2)

29-50: Consider logging when chunk summary is empty.

Line 50 returns an empty string when resp.output is None, which silently continues processing. This might mask issues where the model fails to generate a summary. Consider logging a warning when this occurs.

Apply this diff:

     logger.debug("[summarize] CHUNK %s/%s response<= %s", idx, total_chunks, resp.output or "")
+    if not resp.output:
+        logger.warning("Chunk %s/%s produced empty summary", idx, total_chunks)
     return resp.output or ""

82-94: Consider memory usage when reading multiple large documents.

Line 94 creates read tasks for all documents concurrently using a list comprehension with await. For conversations with many large documents, this could consume significant memory. Consider reading documents one at a time or implementing a limit on concurrent reads.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 49a0c20 and 7cf423f.

📒 Files selected for processing (12)
  • CHANGELOG.md (1 hunks)
  • src/backend/chat/agents/summarize.py (0 hunks)
  • src/backend/chat/clients/pydantic_ai.py (4 hunks)
  • src/backend/chat/tests/conftest.py (2 hunks)
  • src/backend/chat/tests/tools/test_document_summarize.py (1 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (6 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/backend/chat/tools/document_summarize.py (1 hunks)
  • src/backend/conversations/settings.py (1 hunks)
  • src/backend/pyproject.toml (1 hunks)
  • src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx (1 hunks)
💤 Files with no reviewable changes (1)
  • src/backend/chat/agents/summarize.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/backend/pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (5)
src/backend/chat/clients/pydantic_ai.py (1)
src/backend/chat/tools/document_summarize.py (1)
  • document_summarize (54-189)
src/backend/chat/tools/document_summarize.py (4)
src/backend/chat/clients/pydantic_ai.py (1)
  • summarize (508-510)
src/backend/chat/agents/summarize.py (1)
  • SummarizationAgent (14-23)
src/backend/chat/tools/exceptions.py (1)
  • ModelCannotRetry (17-23)
src/backend/chat/tools/utils.py (1)
  • last_model_retry_soft_fail (14-50)
src/backend/chat/tests/tools/test_document_summarize.py (2)
src/backend/chat/agents/summarize.py (1)
  • SummarizationAgent (14-23)
src/backend/chat/tools/document_summarize.py (2)
  • document_summarize (54-189)
  • summarize_chunk (29-50)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
src/backend/chat/tests/conftest.py (1)
  • SummarizationAgentMock (63-72)
src/backend/chat/tests/conftest.py (2)
src/backend/chat/agents/summarize.py (1)
  • SummarizationAgent (14-23)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
  • SummarizationAgentMock (146-152)
🪛 GitHub Actions: Docker Hub Workflow
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx

[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

🪛 GitHub Actions: Frontend Workflow
src/frontend/apps/conversations/src/features/chat/components/ToolInvocationItem.tsx

[warning] 2-2: Code style issues found in the above file. Run Prettier with --write to fix.


[error] 1-1: Prettier formatting check failed. Run 'prettier --write' to fix code style issues in this file.

🔇 Additional comments (8)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (4)

154-154: LGTM!

The mock patch location correctly targets the new module where SummarizationAgent is instantiated. This aligns with the new document summarization workflow introduced in document_summarize.py.


377-384: LGTM - Tool usage semantics clarified.

The new system prompt correctly enforces the separation between document retrieval (document_search_rag) and summarization (summarize tool). This prevents the model from misusing the retrieval tool for summarization tasks.


399-407: LGTM - Good UX improvement.

The internal context prompt prevents the model from requesting document re-uploads when documents are already attached. The [Internal context] prefix is a clear convention for distinguishing system state from user instructions.


638-638: LGTM - Token usage reflects additional prompts.

The increased token count (317 vs. previous ~201) correctly reflects the addition of new system prompts for document_search_rag usage and internal context.

src/backend/chat/tools/document_summarize.py (2)

114-138: LGTM - Good concurrency control.

The semaphore-based rate limiting prevents overwhelming the LLM backend with concurrent requests. The error handling appropriately distinguishes between retryable and non-retryable failures.


139-189: LGTM - Solid merge implementation with good error handling.

The merge step properly combines chunk summaries with clear instructions and validates the result. The exception handling correctly separates retryable errors from terminal failures.

src/backend/chat/tests/tools/test_document_summarize.py (2)

56-86: LGTM - Good test coverage for chunk summarization.

These tests properly validate both the success path and error handling for individual chunk summarization. The use of FunctionModel for mocking is appropriate and the assertions are specific.


106-472: Excellent comprehensive test coverage.

These tests thoroughly cover all major code paths including:

  • Success paths with single and multiple documents
  • Custom instructions handling
  • Various error scenarios (no attachments, IO errors, chunk failures, merge failures)
  • Edge cases (empty results, large documents)

The mock setup is realistic and assertions validate both outputs and internal behavior.

Comment on lines +22 to +26
@sync_to_async
def read_document_content(doc):
"""Read document content asynchronously."""
with default_storage.open(doc.key) as f:
return doc.file_name, f.read().decode("utf-8")

⚠️ Potential issue | 🟡 Minor

Handle non-UTF-8 document encoding errors.

The function assumes all documents are UTF-8 encoded. If a document uses a different encoding, decode("utf-8") will raise a UnicodeDecodeError, which will propagate as an unexpected exception. Consider adding error handling or detecting the encoding.

Apply this diff to add basic error handling:

 @sync_to_async
 def read_document_content(doc):
     """Read document content asynchronously."""
-    with default_storage.open(doc.key) as f:
-        return doc.file_name, f.read().decode("utf-8")
+    try:
+        with default_storage.open(doc.key) as f:
+            return doc.file_name, f.read().decode("utf-8")
+    except UnicodeDecodeError as exc:
+        logger.warning("Document %s is not UTF-8 encoded", doc.file_name)
+        raise ModelCannotRetry(
+            f"Document {doc.file_name} uses an unsupported encoding. "
+            "Only UTF-8 encoded text documents are supported."
+        ) from exc

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/backend/chat/tools/document_summarize.py around lines 22 to 26, the
function unconditionally decodes file bytes with decode("utf-8") which will
raise UnicodeDecodeError for non-UTF-8 files; update it to catch
UnicodeDecodeError and handle fallback decoding: try utf-8 first, on
UnicodeDecodeError either (a) retry decoding with a permissive fallback like
errors="replace" or a fallback encoding such as "latin-1", or (b) detect
encoding with chardet/charset-normalizer and decode accordingly; ensure you log
or warn about the fallback and still return (doc.file_name, decoded_text) so
callers receive a string rather than an exception.
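The stdlib-only fallback described above can be sketched as follows (a best-effort helper, not the project's actual implementation; Latin-1 is chosen because it maps every byte to a character and therefore never raises):

```python
def decode_text(data: bytes) -> str:
    """Decode document bytes, falling back when the file is not UTF-8.

    Tries UTF-8 first; on UnicodeDecodeError falls back to Latin-1,
    which always succeeds but is only best-effort for other encodings.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


print(decode_text("héllo".encode("utf-8")))    # UTF-8 path → héllo
print(decode_text("héllo".encode("latin-1")))  # fallback path → héllo
```

For more accurate detection of arbitrary encodings, a library such as charset-normalizer would be the next step, at the cost of a new dependency.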

Comment on lines +789 to +803
SUMMARIZATION_CHUNK_SIZE = values.PositiveIntegerValue(
default=20_000, # Approx 20k words per chunk
environ_name="SUMMARIZATION_CHUNK_SIZE",
environ_prefix=None,
)
SUMMARIZATION_OVERLAP_SIZE = values.FloatValue(
default=0.05, # 5% overlap
environ_name="SUMMARIZATION_OVERLAP_SIZE",
environ_prefix=None,
)
SUMMARIZATION_CONCURRENT_REQUESTS = values.PositiveIntegerValue(
default=5,
environ_name="SUMMARIZATION_CONCURRENT_REQUESTS",
environ_prefix=None,
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify chunk size doesn't exceed model context windows.

The default chunk size of 20,000 words is quite large and may exceed the context window of some LLM models used for summarization. Ensure that the models configured via LLM_SUMMARIZATION_MODEL_HRID can handle prompts of this size (approximately 26,000+ tokens with instructions included).

Consider generating a shell script to check the configured summarization model's context window:


🏁 Script executed:

#!/bin/bash
# Check the configured summarization model's context window limits
echo "Configured summarization model HRID:"
rg -n "LLM_SUMMARIZATION_MODEL_HRID" --type=py -C 2

Length of output: 1539



🏁 Script executed:

#!/bin/bash
# Search for LLM model definitions and configurations
echo "=== Looking for model configurations ==="
rg -n "LLModel|model_name|context_window" --type=py -C 2 | head -100

echo ""
echo "=== Checking for token counting utilities ==="
rg -n "token|TOKEN" --type=py src/backend/conversations/settings.py

echo ""
echo "=== Looking for summarization model configuration ==="
rg -n "LLM_CONFIGURATIONS|default-summarization-model" --type=py -C 3

Length of output: 23799


🏁 Script executed:

#!/bin/bash
# Check for any context window validation or safeguards
echo "=== Checking for context window validation ==="
rg -n "context|CONTEXT_WINDOW|max_tokens" --type=py src/backend/chat/tools/ -A 2 -B 2

echo ""
echo "=== Checking summarization logic ==="
rg -n "chunk" --type=py src/backend/chat/tools/document_summarize.py -C 3

echo ""
echo "=== Checking for token calculation or limit checking ==="
rg -n "token|TOKEN" --type=py src/backend/chat/tools/ -C 2

Length of output: 9397


Verify chunk size does not exceed the summarization model's context window.

The default chunk size of 20,000 words is problematic for Mistral 7B Instruct v0.1, which has an 8K sequence length. Mistral-7B-Instruct-v0.2 offers a 32k context window compared to 8k in v0.1. With the word-counting method at line 99 (len(text.split())), 20,000 words converts to approximately 26,000 tokens (~1.3 tokens per word)—exceeding v0.1's capacity by 3.25×.

  • Confirm the deployed summarization model version (production uses v0.1 or v0.2?)
  • Add runtime validation to ensure chunk size respects the configured model's context limits
  • Consider reducing the default chunk size or adding a safeguard that adjusts it based on the model's capabilities
🤖 Prompt for AI Agents
In src/backend/conversations/settings.py around lines 789 to 803, the default
SUMMARIZATION_CHUNK_SIZE (20,000 words) can exceed the deployed summarization
model's context window (Mistral 7B Instruct v0.1 = 8k, v0.2 = 32k). Update the
settings to: (1) lower the default chunk size to a safe value for v0.1 (suggest
~6,000 words) or choose a conservative default; (2) add runtime validation that
reads the configured summarization model/version (from env or model config),
maps it to a context-window in tokens (e.g., 8k for v0.1, 32k for v0.2),
converts the chunk-size words -> estimated tokens using the existing estimator
(len(text.split()) * 1.3 or a fixed 1.3 factor), accounts for overlap and a
safety buffer, and either cap the chunk size or raise/log a clear error if the
configured size would exceed the model's context window; and (3) include a clear
log message indicating when the chunk size is adjusted and which model/version
made the decision.
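The suggested safeguard can be sketched with the review's own ~1.3 tokens-per-word estimate (the constants and function name are hypothetical, not the project's settings API):

```python
TOKENS_PER_WORD = 1.3   # rough estimate used in the review
SAFETY_BUFFER = 1_000   # tokens reserved for instructions and output


def effective_chunk_size(configured_words: int, context_window_tokens: int) -> int:
    """Cap the configured chunk size (in words) to fit the model's context."""
    max_words = int((context_window_tokens - SAFETY_BUFFER) / TOKENS_PER_WORD)
    return min(configured_words, max_words)


# 20,000 words against an 8k-token window (Mistral 7B v0.1) must be capped:
print(effective_chunk_size(20_000, 8_000))   # → 5384
# Against a 32k window (v0.2) the configured value fits:
print(effective_chunk_size(20_000, 32_000))  # → 20000
```

A check like this at settings-load time, plus a log line stating which model drove the cap, would cover the three points in the prompt above.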

camilleAND and others added 5 commits November 6, 2025 23:17
Improve the existing tool to manage bigger documents.
This reduces code complexity while allowing better "cuts" and
providing overlap for free.
Also, do not wait for each sub-batch to complete; use a global
concurrency limit instead.
This also fixes the tool's dynamic registration to use the original
function signature and docstring.
Allows the LLM to retry the summarization in some cases.
When the loader was not in the proper place.
@qbey qbey force-pushed the camand/feat_tool_summarize branch from 649761b to b33a3e4 Compare November 6, 2025 22:17
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/backend/conversations/settings.py (1)

789-803: Verify chunk size doesn't exceed model context windows.

The default SUMMARIZATION_CHUNK_SIZE of 20,000 words may exceed the context window of some LLM models. As previously noted, with the word-counting method at line 99 of document_summarize.py (len(text.split())), 20,000 words converts to approximately 26,000 tokens (~1.3 tokens per word), which would exceed Mistral 7B Instruct v0.1's 8K context window by 3.25×.

Consider:

  1. Lowering the default chunk size to a safer value (e.g., ~6,000 words for v0.1 compatibility)
  2. Adding runtime validation to check the configured model's context limits
  3. Implementing dynamic chunk size adjustment based on the model's capabilities

Based on past review comments.

src/backend/chat/tools/document_summarize.py (1)

22-26: Handle non-UTF-8 document encoding errors.

The read_document_content function assumes all documents are UTF-8 encoded. If a document uses a different encoding, decode("utf-8") will raise a UnicodeDecodeError, which will propagate as an unexpected exception and be caught by the generic exception handler at lines 183-189, resulting in a less helpful error message.

Apply this diff to add explicit error handling:

 @sync_to_async
 def read_document_content(doc):
     """Read document content asynchronously."""
-    with default_storage.open(doc.key) as f:
-        return doc.file_name, f.read().decode("utf-8")
+    try:
+        with default_storage.open(doc.key) as f:
+            return doc.file_name, f.read().decode("utf-8")
+    except UnicodeDecodeError as exc:
+        logger.warning("Document %s is not UTF-8 encoded", doc.file_name)
+        raise ModelCannotRetry(
+            f"Document {doc.file_name} uses an unsupported encoding. "
+            "Only UTF-8 encoded text documents are supported."
+        ) from exc

Based on past review comments.

src/backend/chat/tests/tools/test_document_summarize.py (1)

277-299: Same return type inconsistency as test_document_summarize_no_text_attachments.

This test has the same issue with expecting a string return value when the source code shows document_summarize returns ToolReturn and raises ModelCannotRetry. See the previous comment for details.

🧹 Nitpick comments (3)
src/backend/chat/tests/tools/test_document_summarize.py (3)

41-48: Consider documenting the mock_summarization_agent fixture dependency.

Multiple test functions use a mock_summarization_agent fixture that is not defined in this file. While it's likely defined in conftest.py, consider adding a comment or docstring to help readers understand where this critical fixture comes from.


184-188: Consider more explicit error handling in mock function.

The mock_open_side_effect function implicitly returns content for doc2.txt when the key doesn't match doc1.txt. While this works for the current test, consider being more explicit to prevent confusion and make the test more maintainable.

Apply this diff for clarity:

 def mock_open_side_effect(key):
     """Mock file opening based on key."""
     if key == "doc1.txt":
         return io.BytesIO(file_content1.encode("utf-8"))
-    return io.BytesIO(file_content2.encode("utf-8"))
+    elif key == "doc2.txt":
+        return io.BytesIO(file_content2.encode("utf-8"))
+    else:
+        raise FileNotFoundError(f"Unexpected key: {key}")

259-274: Decorator exists and correctly converts exceptions to strings—no functional issue, but return type annotation is inaccurate.

The @last_model_retry_soft_fail decorator is properly defined in src/backend/chat/tools/utils.py (lines 14–49) and correctly applied to document_summarize (line 53). When ModelCannotRetry is raised, the decorator catches it and returns str(exc.message) (line 39), which is exactly what the tests expect.

The tests are correct. However, there is a type annotation mismatch: document_summarize declares return type ToolReturn (line 55–56), but the decorator wrapper can return either ToolReturn (normal execution) or str (on exception). The function's return type should be ToolReturn | str to accurately reflect its runtime behavior.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 7cf423f and b33a3e4.

📒 Files selected for processing (12)
  • CHANGELOG.md (1 hunks)
  • src/backend/chat/agents/summarize.py (0 hunks)
  • src/backend/chat/clients/pydantic_ai.py (4 hunks)
  • src/backend/chat/tests/conftest.py (2 hunks)
  • src/backend/chat/tests/tools/test_document_summarize.py (1 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (6 hunks)
  • src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (4 hunks)
  • src/backend/chat/tools/document_search_rag.py (2 hunks)
  • src/backend/chat/tools/document_summarize.py (1 hunks)
  • src/backend/conversations/settings.py (1 hunks)
  • src/backend/pyproject.toml (1 hunks)
  • src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1 hunks)
💤 Files with no reviewable changes (1)
  • src/backend/chat/agents/summarize.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/backend/chat/tools/document_search_rag.py
  • src/backend/chat/tests/conftest.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/backend/chat/tests/tools/test_document_summarize.py (3)
src/backend/chat/clients/pydantic_ai.py (1)
  • summarize (508-510)
src/backend/chat/agents/summarize.py (1)
  • SummarizationAgent (14-23)
src/backend/chat/tools/document_summarize.py (2)
  • document_summarize (54-189)
  • summarize_chunk (29-50)
src/backend/chat/tools/document_summarize.py (4)
src/backend/chat/clients/pydantic_ai.py (1)
  • summarize (508-510)
src/backend/chat/agents/summarize.py (1)
  • SummarizationAgent (14-23)
src/backend/chat/tools/exceptions.py (1)
  • ModelCannotRetry (17-23)
src/backend/chat/tools/utils.py (1)
  • last_model_retry_soft_fail (14-50)
src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (1)
src/backend/chat/tests/conftest.py (1)
  • SummarizationAgentMock (63-72)
src/backend/chat/clients/pydantic_ai.py (1)
src/backend/chat/tools/document_summarize.py (1)
  • document_summarize (54-189)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test-e2e-chromium
  • GitHub Check: test-back
  • GitHub Check: lint-back
  • GitHub Check: build-and-push-backend
  • GitHub Check: build-and-push-frontend
🔇 Additional comments (14)
src/backend/pyproject.toml (1)

64-64: LGTM!

The semchunk==3.2.5 dependency is correctly added to support the document chunking functionality in the new summarization tool. Version pinning is appropriate for production stability.

src/frontend/apps/conversations/src/features/chat/components/Chat.tsx (1)

704-720: LGTM!

The dynamic loader text logic correctly distinguishes between summarization and search operations. The implementation properly filters out the document_parsing tool and shows "Summarizing..." when the summarize tool is active, providing clear user feedback.

src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_url.py (2)

884-910: LGTM!

The updated system prompts correctly reflect the new document summarization architecture:

  • Clear guidance restricting document_search_rag to passage retrieval only
  • Proper internal context notification about attached documents
  • Well-aligned with the new tool-based summarization flow

1012-1042: Test assertions properly updated.

The pydantic_messages assertions correctly validate the new system prompts, ensuring the internal context about attached documents is properly included in the conversation flow.

src/backend/chat/tests/views/chat/conversations/test_conversation_with_document_upload.py (3)

154-154: Correct mock path update.

The patch target correctly points to the new location where SummarizationAgent is imported (chat.tools.document_summarize) rather than the old public entry point, aligning with the architectural refactoring.


377-407: System prompt updates align with new architecture.

The test expectations correctly reflect the new document handling flow:

  • document_search_rag restricted to passage retrieval
  • Clear separation between search and summarization
  • Internal context prompt about attached documents

638-638: Token usage update is expected.

The increased token usage (317/19 vs previous values) is expected due to the additional system prompts for document handling and internal context. This reflects the actual behavior of the new summarization flow.

src/backend/chat/clients/pydantic_ai.py (2)

10-10: LGTM!

The new imports (functools, RunContext, and document_summarize) are necessary and correctly used for the refactored summarization architecture.

Also applies to: 29-29, 76-76


497-510: Well-structured tool integration.

The summarization tool integration is cleanly implemented:

  • Clear internal context system prompt informing the model about attached documents
  • Proper delegation to document_summarize with context passthrough
  • Appropriate retry configuration (retries=2) for resilience
  • Good use of functools.wraps to preserve function metadata
src/backend/chat/tools/document_summarize.py (4)

114-121: Concurrency control is well-implemented.

The semaphore-based concurrency limiting (settings.SUMMARIZATION_CONCURRENT_REQUESTS) properly prevents overwhelming the LLM backend with parallel requests. The nested async function that uses the semaphore as an async context manager is a clean pattern.
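The pattern under review can be reduced to a standalone asyncio sketch. Names here are illustrative: the real code reads the limit from settings.SUMMARIZATION_CONCURRENT_REQUESTS, and summarize_one stands in for the chunk-level agent call.

```python
import asyncio


async def summarize_chunks(chunks, max_concurrent=4):
    # Cap how many chunk summaries hit the LLM backend at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def summarize_one(index, text):
        async with semaphore:
            # Placeholder for the real agent call on one chunk.
            await asyncio.sleep(0)
            return f"summary {index}: {text}"

    # All tasks are created up front; the semaphore throttles execution.
    return await asyncio.gather(
        *(summarize_one(i, c) for i, c in enumerate(chunks))
    )


summaries = asyncio.run(summarize_chunks(["alpha", "beta", "gamma"], max_concurrent=2))
```

asyncio.gather preserves input order, so chunk summaries come back in document order regardless of which tasks the semaphore released first.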


148-178: Robust final merge logic.

The merge summarization step is well-structured:

  • Clear prompt construction with constraints
  • Proper error handling with ModelRetry
  • Empty result validation
  • Good logging for debugging
  • Correct ToolReturn with sources metadata

180-189: Comprehensive exception handling.

The exception handling correctly distinguishes between retryable errors (ModelRetry), non-retryable errors (ModelCannotRetry), and unexpected exceptions, with appropriate logging and user-facing messages.
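The three-way classification described above can be sketched standalone. The exception classes here are stand-ins (pydantic_ai.ModelRetry and the tool's own non-retryable error) so the sketch runs without the framework; classify_failure is a hypothetical name.

```python
class ModelRetry(Exception):
    """Stand-in for pydantic_ai.ModelRetry: ask the framework to re-prompt."""


class ModelCannotRetry(Exception):
    """Stand-in for the tool's terminal, non-retryable error type."""


def classify_failure(fn):
    """Run fn() and classify the outcome the way the tool's handler does."""
    try:
        return ("ok", fn())
    except ModelRetry:
        # Retryable: propagate so the agent framework can try again.
        raise
    except ModelCannotRetry as exc:
        # Terminal: return a user-facing message instead of retrying.
        return ("fatal", str(exc))
    except Exception as exc:  # noqa: BLE001 - log-and-degrade, don't crash the tool call
        return ("error", f"Summarization failed: {exc}")
```

The key design choice is that only ModelRetry escapes the handler; everything else degrades into a message the model can relay to the user.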


97-105: Model context window information is not available in the codebase configuration.

The proposed fix references model_config.max_context_tokens, but the LLModel class does not store this attribute. The LLMSettings only includes max_tokens (output limit), not context window limits. There is no infrastructure in the codebase to track or retrieve model context window limits for different models.

While validating chunk size against model context limits is a reasonable concern, the suggested implementation cannot be applied as written without first extending the LLModel configuration system to track context window information.

Likely an incorrect or invalid review comment.

src/backend/chat/tests/tools/test_document_summarize.py (1)

106-472: Well-structured test suite with comprehensive coverage.

The test suite effectively covers:

  • Single and multiple document scenarios
  • Custom instructions handling
  • Error conditions (missing attachments, read errors, LLM failures)
  • Edge cases (empty results, large documents with chunking)
  • Both chunk-level and merge-level error handling

The tests properly use mocks, fixtures, and assertions to validate behavior. The call-counting pattern (e.g., lines 133-148, 452-466) is an excellent approach for verifying the internal flow without tight coupling.
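The call-counting pattern praised here can be shown as a small standalone stub (names are illustrative, not the test suite's actual helpers): a closure records how many times the mocked model was invoked, and the last canned response repeats, mirroring how chunk-level responses differ from the final merge response.

```python
def make_counting_model(responses):
    """Return a stub model function plus a mutable call counter.

    Each call returns the next canned response; once exhausted, the
    last response repeats.
    """
    state = {"calls": 0}

    def model(_messages, _info=None):
        index = min(state["calls"], len(responses) - 1)
        state["calls"] += 1
        return responses[index]

    return model, state


model, state = make_counting_model(["chunk summary", "merged summary"])
outputs = [model([]) for _ in range(3)]
```

Asserting on state["calls"] verifies the chunk-then-merge flow ran the expected number of model calls without coupling the test to internal implementation details.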

- 🐛(front) fix target blank links in chat #103
- 🚑️(posthog) pass str instead of UUID for user PK #134
- ⚡️(web-search) keep running when tool call fails #137
- ✨(summarize): new summarize tool integration #78

⚠️ Potential issue | 🟡 Minor

Changelog entry is in the wrong section.

The entry "✨(summarize): new summarize tool integration #78" describes a new feature (indicated by the ✨ emoji), but it's placed under the "Fixed" section. According to Keep a Changelog format, new features should be listed under "Added" (for entirely new features) or "Changed" (for changes in existing functionality).

Apply this diff to move the entry to the appropriate section:

+### Added
+
+- ✨(summarize): new summarize tool integration #78
+
 ### Fixed
 
 - 🐛(front) fix target blank links in chat #103
 - 🚑️(posthog) pass str instead of UUID for user PK #134
 - ⚡️(web-search) keep running when tool call fails #137
-- ✨(summarize): new summarize tool integration #78

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In CHANGELOG.md around line 16, the entry "✨(summarize): new summarize tool
integration #78" is placed under the "Fixed" section but is a new feature; move
that line out of the Fixed section and insert it under the "Added" (or "Changed"
if it modifies existing functionality) section instead, preserving the entry
text and ordering entries chronologically within the target section and updating
surrounding spacing/headers as needed.

Comment on lines +89 to +103
@pytest.mark.asyncio
async def test_summarize_chunk_handles_empty_response(mocked_context):
"""Test that summarize_chunk handles empty responses from the agent."""
summarization_agent = SummarizationAgent()

def mocked_empty_summary(_messages, _info=None):
"""Mocked summary that returns empty content."""
return ModelResponse(parts=[TextPart(content="")])

with summarization_agent.override(model=FunctionModel(mocked_empty_summary)):
chunk = "This is a test chunk."

# Empty responses cause ModelRetry since pydantic-ai considers them invalid
with pytest.raises(ModelRetry):
await summarize_chunk(1, chunk, 1, summarization_agent, mocked_context)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify that empty responses actually raise ModelRetry.

The test expects ModelRetry to be raised when the agent returns an empty string, but examining the source code of summarize_chunk, it simply returns resp.output or "" without explicit validation. Unless Pydantic AI's framework automatically validates non-empty strings for output_type=str agents, this test might not be exercising the intended behavior.

Run the following script to check if SummarizationAgent has validation that would raise ModelRetry for empty strings:


🏁 Script executed:

#!/bin/bash
# Check if there's validation logic for empty responses in the agent or base class

# Look for validation or empty-string checks in agent classes
rg -n -A5 -B5 --type=py 'output_type.*=.*str|def __init__|class.*Agent' src/backend/chat/agents/

Length of output: 3409


Now I need to examine the actual summarize_chunk function to see if it validates empty responses:



🏁 Script executed:

#!/bin/bash
# Find and examine the summarize_chunk function implementation
rg -n -A20 'def summarize_chunk' --type=py src/backend/chat/

Length of output: 4006


Need to verify if SummarizationAgent or BaseAgent has an @output_validator that might validate empty strings:



🏁 Script executed:

#!/bin/bash
# Check if SummarizationAgent or BaseAgent have @output_validator defined
rg -n -A5 -B2 'output_validator' --type=py src/backend/chat/agents/

Length of output: 0


Test expectation is incorrect — empty responses will not raise ModelRetry.

The test mocks an empty response and expects ModelRetry to be raised, but the code will not trigger it:

  • summarize_chunk only raises ModelRetry when an Exception is caught (lines 43-47)
  • An empty string response is not an exception, so it bypasses the error handler
  • SummarizationAgent has no @output_validator decorator to validate empty outputs
  • Pydantic AI's default str output type does not automatically validate against empty strings

The function will simply return the empty string. Update the test to either:

  1. Remove this test (empty responses are not actually invalid per the current implementation), or
  2. Add an @output_validator to SummarizationAgent to validate non-empty output and raise ModelRetry for empty strings
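Option 2 can be sketched standalone. The validator body below is the essential logic; ModelRetry here is a stand-in class so the sketch runs without the framework, and in the real agent this function would be registered via pydantic-ai's output_validator decorator (whose exact signature may also accept a RunContext).

```python
class ModelRetry(Exception):
    """Stand-in for pydantic_ai.ModelRetry so this sketch runs standalone."""


def validate_non_empty(output: str) -> str:
    """Reject blank summaries so the framework re-prompts the model.

    Returning the output unchanged accepts it; raising ModelRetry sends
    the error message back to the model for another attempt.
    """
    if not output.strip():
        raise ModelRetry("The summary was empty; produce a non-empty summary.")
    return output
```

With such a validator in place, the test's expectation that an empty TextPart triggers ModelRetry would hold.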
🤖 Prompt for AI Agents
In src/backend/chat/tests/tools/test_document_summarize.py around lines 89 to
103, the test incorrectly expects a ModelRetry for an empty model response;
either update the test to assert that summarize_chunk returns an empty string
(or the appropriate empty ModelResponse behavior) when the agent returns
TextPart(content=""), or modify the SummarizationAgent implementation to include
an @output_validator that rejects empty string outputs and raises ModelRetry so
the test remains valid; choose one approach, then update the test or agent
accordingly and adjust assertions to match the implemented behavior.

@qbey qbey removed the noChangeLog This does not require a changelog line label Nov 6, 2025
@qbey qbey merged commit b33a3e4 into main Nov 6, 2025
21 checks passed
@qbey qbey deleted the camand/feat_tool_summarize branch November 6, 2025 22:27
qbey added a commit that referenced this pull request Nov 10, 2025
Fixed

- 🦺(front) Fix send prohibited file types
- 🐛(front) fix target blank links in chat #103
- 🚑️(posthog) pass str instead of UUID for user PK #134
- ⚡️(web-search) keep running when tool call fails #137
- ✨(summarize): new summarize tool integration #78

Removed

- 🔥(posthog) remove posthog middleware for async mode fix #146