feat(backend): implement task-level context aggregation for RAG #786

kissghosts · 2025-12-30T08:50:55Z

This change implements task-level context aggregation, enabling global knowledge base visibility across all subtasks in a task. This is particularly useful for group chat scenarios where new members can access task-level knowledge bases without individual permissions.

Key changes:

1. Task-level context storage:
   - Add contexts.subtask_contexts field to Task JSON structure
   - Store context IDs with type (knowledge_base/attachment) for filtering
2. Incremental sync mechanism:
   - Create task_contexts.py service module for context aggregation
   - Sync contexts to Task when linking to subtasks
   - Implement get_kb_contexts_from_task for efficient retrieval
3. Priority-based context resolution:
   - Current subtask contexts > Task-level historical contexts
   - Fallback to Task contexts when subtask has no knowledge bases
4. Enhanced KnowledgeBaseTool:
   - Add dynamic description with available knowledge bases list
   - Include KB metadata (ID + Name) in tool description and prompt
5. WebSocket integration:
   - Pass task_id to link_contexts_to_subtask for sync

Benefits:

- Task-level knowledge base sharing in group chats
- No duplicate knowledge base references
- Efficient JSON-based filtering (no extra DB queries)
- Backward compatible with existing subtask contexts
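
The priority rule listed under "Priority-based context resolution" can be sketched as follows. This is a minimal illustration rather than the PR's code: `resolve_kb_ids` and its parameters are hypothetical names, and it assumes knowledge bases are identified by integer IDs.

```python
from typing import List


def resolve_kb_ids(subtask_kb_ids: List[int], task_kb_ids: List[int]) -> List[int]:
    """Pick the knowledge bases to search for the current message.

    Hypothetical helper: contexts attached to the current subtask win;
    task-level (historical) contexts are used only as a fallback.
    """
    if subtask_kb_ids:
        # Deduplicate while preserving the order the IDs were given in.
        return list(dict.fromkeys(subtask_kb_ids))
    return list(dict.fromkeys(task_kb_ids))
```

With this shape, a message that carries its own knowledge bases never silently picks up older ones from the task, while a message without any still sees what the task has accumulated.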

Summary by CodeRabbit

New Features
- Unified context system consolidating attachments and knowledge bases for improved management.
- Context badges display attachments and knowledge bases inline within chat messages.
- Enhanced knowledge base retrieval with better integration into chat workflows.
Bug Fixes & Performance
- Reduced tool-calling request limit for improved system stability.
Database
- Added data migration to organize context storage for better scalability.

Refactored subtask_attachments table to subtask_contexts table for unified context management. This enables storing multiple context types (attachments, knowledge bases, etc.) with a flexible schema.

Key changes:
- New SubtaskContext model replacing SubtaskAttachment
- New ContextService for unified context operations
- Updated attachments API to use ContextService internally
- Added contexts field to WebSocket events and frontend types
- New ContextBadgeList component for displaying all context types
- Database migration with data migration from old table

- Add execute_rag_retrieval_for_contexts() to execute RAG retrieval when creating knowledge_base contexts and store results in SubtaskContext.extracted_text with sources in type_data
- Add context_service methods:
  - update_knowledge_base_retrieval_result(): Store RAG results
  - mark_knowledge_base_context_failed(): Handle retrieval failures
  - build_knowledge_base_text_prefix(): Format KB content for messages
  - get_knowledge_base_contexts_by_subtask(): Get KB contexts
  - get_knowledge_base_meta_for_task(): Collect unique KBs from task
- Modify _build_history_message() to load both attachment and knowledge_base contexts, with attachments having priority for token allocation (MAX_EXTRACTED_TEXT_LENGTH shared limit)
- Add get_knowledge_base_meta_prompt() to generate KB meta info prompt for system prompt injection
- Update prepare_knowledge_base_tools() to accept task_id and inject historical KB meta info into system prompt

This enables:
1. First message executes RAG retrieval and persists results
2. Follow-up messages load RAG results from history
3. Agent receives KB meta info to use KnowledgeBaseTool for additional retrieval if needed

coderabbitai · 2025-12-30T08:51:00Z

Walkthrough

The PR unifies handling of subtask attachments and knowledge bases by introducing a new SubtaskContext model and ContextService that replace the attachment-only SubtaskAttachment approach. It includes database migration, updated ORM models, refactored services across chat preprocessing/triggering/storage, new unified schemas, and frontend components that display and manage contexts alongside messages.

Changes

Cohort / File(s)	Summary
Database Schema & Migration `backend/alembic/versions/o5p6q7r8s9t0_add_subtask_contexts_table.py`	Creates subtask_contexts table with comprehensive fields (id, subtask_id, user_id, context_type, name, status, binary_data, extracted_text, type_data, etc.), migrates existing attachment data, and drops subtask_attachments; downgrade reverses the process.
Core Models `backend/app/models/subtask_context.py`, `backend/app/models/subtask.py`, `backend/app/models/knowledge.py`, `backend/app/models/__init__.py`	New SubtaskContext ORM model with ContextType/ContextStatus enums, custom binary/text type adapters, and helper properties for attachment/KB metadata; Subtask relationship switches from attachments to contexts (read-only); knowledge.py documentation updated.
Schemas - Context & Subtask `backend/app/schemas/subtask_context.py`, `backend/app/schemas/subtask.py`, `backend/app/schemas/task.py`	New comprehensive schemas for contexts (SubtaskContextResponse, SubtaskContextBrief), backward-compatible attachment responses (AttachmentResponse, AttachmentDetailResponse) with factory constructors; SubtaskInDB adds contexts field; minimal doc updates to task.py.
API Endpoints & Events `backend/app/api/endpoints/adapter/attachments.py`, `backend/app/api/ws/events.py`	Attachment endpoints now use unified ContextService instead of AttachmentService, validate context_type=ATTACHMENT, and build responses via from_context factory method; ChatMessagePayload adds contexts field and marks attachments as legacy.
Core Context Service `backend/app/services/context/__init__.py`, `backend/app/services/context/context_service.py`	New ContextService package/class providing upload, retrieval, linking, deletion, and metadata operations for both attachment and knowledge-base contexts; handles storage backends, truncation, image/text extraction, and ownership checks.
Chat Preprocessing - Contexts `backend/app/services/chat/preprocessing/__init__.py`, `backend/app/services/chat/preprocessing/contexts.py`, `backend/app/services/chat/preprocessing/attachments.py`	New contexts.py module replaces attachments.py with unified context processing (process_contexts, link_contexts_to_subtask, prepare_contexts_for_chat); handles image/text assembly, KB tool preparation, and historical metadata; old attachments.py deleted.
Chat Triggering & Preprocessing `backend/app/services/chat/trigger/core.py`, `backend/app/services/chat/operations/retry.py`, `backend/app/services/chat/storage/task_contexts.py`	trigger_ai_response replaces knowledge_base_ids parameter with user_subtask_id and uses prepare_contexts_for_chat for unified context processing; retry.py switches to contexts eager loading; new task_contexts.py provides task-level context synchronization and KB metadata retrieval.
Chat Shell Tools `backend/app/chat_shell/history/loader.py`, `backend/app/chat_shell/tools/builtin/knowledge_base.py`, `backend/app/chat_shell/tools/knowledge_factory.py`	History loader uses context_service and splits contexts by type; KnowledgeBaseTool adds dynamic description, user_subtask_id tracking, and _persist_rag_results async helper for storing RAG results; knowledge_factory supports historical KB metadata and RAG persistence.
Storage & Export Services `backend/app/services/attachment/mysql_storage.py`, `backend/app/services/export/docx_generator.py`, `backend/app/services/rag/document_service.py`, `backend/app/services/shared_task.py`, `backend/app/services/knowledge_service.py`, `backend/app/services/subtask.py`	All services migrated from SubtaskAttachment to SubtaskContext; storage keys updated; metadata extraction now uses type_data JSON; docx export adapts to context fields; shared_task copies contexts instead of attachments; knowledge_service calls context_service.delete_context.
Configuration `backend/app/core/config.py`	CHAT_TOOL_MAX_REQUESTS lowered from 20 to 10.
Frontend Chat Components `frontend/src/features/tasks/components/chat/ChatArea.tsx`, `frontend/src/features/tasks/components/chat/useChatAreaState.ts`, `frontend/src/features/tasks/components/chat/useChatStreamHandlers.tsx`	ChatArea passes resetContexts to stream handlers; useChatAreaState exposes resetContexts callback to clear selectedContexts; useChatStreamHandlers accepts resetContexts, invokes it post-send, and constructs pendingContexts for message payloads.
Frontend Message Components `frontend/src/features/tasks/components/message/ContextBadgeList.tsx`, `frontend/src/features/tasks/components/message/MessageBubble.tsx`, `frontend/src/features/tasks/components/message/MessagesArea.tsx`	New ContextBadgeList component renders attachment and knowledge-base context badges; MessageBubble adds contexts field and uses ContextBadgeList instead of AttachmentPreview; MessagesArea propagates contexts and sources through message data.
Frontend Context & Hooks `frontend/src/features/tasks/contexts/chatStreamContext.tsx`, `frontend/src/features/tasks/hooks/useUnifiedMessages.ts`	UnifiedMessage adds contexts field (deprecates attachments); ChatStreamContextType.sendMessage options accept pendingContexts; all chat event handlers (chat:start, chat:done, etc.) extract and propagate contexts; useUnifiedMessages adds contexts and sources to DisplayMessage.
Frontend Types `frontend/src/types/api.ts`, `frontend/src/types/socket.ts`	New ContextType and ContextStatus types; SubtaskContextBrief interface with attachment/KB-specific fields; TaskDetailSubtask adds contexts field; ChatMessagePayload adds contexts array; socket types define ChatMessageContext.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Frontend)
    participant Chat as Chat API<br/>(Endpoint)
    participant Context as ContextService
    participant Store as Storage<br/>(MySQL)
    participant DB as Database

    User->>Chat: upload_attachment(file)
    Chat->>Context: upload_attachment(db, user_id, filename, binary_data, subtask_id)
    
    Context->>Context: validate file extension/size
    Context->>Store: save(context_id, binary_data)
    Store->>DB: update context storage metadata
    Store-->>Context: storage_key, storage_backend
    
    Context->>DB: create SubtaskContext (status=UPLOADING)
    Context->>Context: parse document (text/image extraction)
    Context->>DB: update extracted_text, image_base64, status=PARSING
    
    Context->>DB: update status=READY (or FAILED on error)
    Context-->>Chat: (SubtaskContext, TruncationInfo)
    Chat-->>User: AttachmentResponse (via from_context)

sequenceDiagram
    participant User as User (Frontend)
    participant WS as WebSocket<br/>(chat_namespace)
    participant Preprocess as Chat Preprocessing
    participant Context as ContextService
    participant Trigger as Chat Trigger
    participant AI as AI Service

    User->>WS: chat:send (message, attachment_ids, context_ids)
    
    WS->>Preprocess: prepare_contexts_for_chat(user_subtask_id, message)
    Preprocess->>Context: get_by_subtask(subtask_id)
    Context->>Context: filter by context_type
    Context-->>Preprocess: [SubtaskContext, ...]
    
    Preprocess->>Preprocess: separate attachment vs KB contexts
    Preprocess->>Preprocess: build vision/text blocks for attachments
    Preprocess->>Preprocess: prepare KB tool instances (with user_subtask_id)
    Preprocess-->>WS: (final_message, enhanced_system_prompt, extra_tools)
    
    WS->>Trigger: trigger_ai_response(..., user_subtask_id)
    Trigger->>Trigger: create ChatAgent with extra_tools
    Trigger->>AI: stream chat (with KB tool, vision blocks)
    
    AI->>AI: invoke KB tool for RAG retrieval
    AI->>Context: (implicit via KB tool) persist RAG results
    Context->>DB: update KB context (extracted_text, sources)
    
    AI-->>Trigger: response_stream
    Trigger-->>WS: chat:message, chat:done
    WS-->>User: render message + contexts badges

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The changes introduce a significant architectural shift from attachment-centric to unified context handling across multiple layers (database, ORM, services, endpoints, frontend). The review requires understanding the new SubtaskContext model, comprehensive ContextService, context linking/processing pipelines, and widespread integration points across chat preprocessing, triggering, storage, and UI components. Heterogeneous changes across the codebase (not simple repetitive patterns) and dense logic in several files (context_service.py, contexts.py, chat trigger logic) increase complexity.

Possibly related PRs

refactor(context): unify subtask context management #759: Implements the same unified context system (SubtaskContext model, ContextService, schema/endpoint migrations, frontend type updates) as a parallel or duplicate effort.
Wegent/improve attachment error handling #703: Shares the attachment upload/parsing return type change (adding TruncationInfo and truncation handling to response shapes) that this PR also refactors.
Refactor chat input badge display for improved visual hierarchy #690: Modifies the same chat input UI components (ContextBadgeList, badge rendering, context selection) to unify attachment/knowledge-base badge display.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title directly relates to the main changes in the changeset, focusing on implementing task-level context aggregation for RAG, which is the central feature being added across backend services.
Docstring Coverage	✅ Passed	Docstring coverage is 96.30% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

backend/app/api/endpoints/adapter/attachments.py (1)

304-355: Critical: executor_download_attachment references undefined attachment_service.

This endpoint still uses attachment_service (lines 321, 331-333, 339-340, 347-348, 351), but the import was removed during the migration to context_service. This will cause a NameError at runtime when this endpoint is called.

The endpoint needs to be migrated to use context_service like the other endpoints, or attachment_service needs to be imported.

🔎 Proposed fix: Migrate to context_service

 @router.get("/{attachment_id}/executor-download")
 async def executor_download_attachment(
     attachment_id: int,
     db: Session = Depends(get_db),
     current_user: User = Depends(security.get_current_user),
 ):
     """
     Download attachment for executor.
     ...
     """
     # Get attachment and verify ownership
-    attachment = attachment_service.get_attachment(
+    context = context_service.get_context_optional(
         db=db,
-        attachment_id=attachment_id,
+        context_id=attachment_id,
         user_id=current_user.id,
     )
 
-    if attachment is None:
+    if context is None:
+        raise HTTPException(status_code=404, detail="Attachment not found")
+
+    # Verify it's an attachment type
+    if context.context_type != ContextType.ATTACHMENT.value:
         raise HTTPException(status_code=404, detail="Attachment not found")
 
     # Get binary data from the appropriate storage backend
-    binary_data = attachment_service.get_attachment_binary_data(
+    binary_data = context_service.get_attachment_binary_data(
         db=db,
-        attachment=attachment,
+        context=context,
     )
 
     if binary_data is None:
         logger.error(
             f"Failed to retrieve binary data for attachment {attachment_id}, "
-            f"storage_backend={attachment.storage_backend}, "
-            f"storage_key={attachment.storage_key}"
+            f"storage_backend={context.storage_backend}, "
+            f"storage_key={context.storage_key}"
         )
         raise HTTPException(
             status_code=500, detail="Failed to retrieve attachment data"
         )
 
     # Encode filename for Content-Disposition header
-    encoded_filename = quote(attachment.original_filename)
+    encoded_filename = quote(context.original_filename)
 
     return Response(
         content=binary_data,
-        media_type=attachment.mime_type,
+        media_type=context.mime_type,
         headers={
             "Content-Disposition": f"attachment; filename*=UTF-8''{encoded_filename}"
         },
     )

🧹 Nitpick comments (20)

backend/alembic/versions/o5p6q7r8s9t0_add_subtask_contexts_table.py (1)

82-120: Refactor SQL construction to use parameters instead of f-strings.

While the current code is safe (the f-string interpolates a code-controlled variable, not user input), using f-strings with sa.text() is a code smell that static analyzers rightfully flag. Consider refactoring to bind parameters or use SQLAlchemy's expression language for better maintainability.
🔎 Alternative approach using conditional execution
-        # Build the updated_at expression based on column existence
-        updated_at_expr = (
-            "COALESCE(updated_at, created_at)" if has_updated_at else "created_at"
-        )
-
-        # Migrate data from subtask_attachments to subtask_contexts
-        connection.execute(
-            sa.text(
-                f"""
+        # Migrate data with conditional updated_at handling
+        if has_updated_at:
+            migration_sql = """
             INSERT INTO subtask_contexts (
                 subtask_id, user_id, context_type, name, status, error_message,
                 binary_data, image_base64, extracted_text, text_length,
                 type_data, created_at, updated_at
             )
             SELECT
                 subtask_id,
                 user_id,
                 'attachment' AS context_type,
                 original_filename AS name,
                 status,
                 error_message,
                 binary_data,
                 image_base64,
                 extracted_text,
                 COALESCE(text_length, 0) AS text_length,
                 JSON_OBJECT(
                     'original_filename', original_filename,
                     'file_extension', file_extension,
                     'file_size', file_size,
                     'mime_type', mime_type,
                     'storage_key', COALESCE(storage_key, ''),
                     'storage_backend', COALESCE(storage_backend, 'mysql')
                 ) AS type_data,
                 created_at,
-                {updated_at_expr} AS updated_at
+                COALESCE(updated_at, created_at) AS updated_at
             FROM subtask_attachments
-        """
+        """
+        else:
+            migration_sql = """
+            INSERT INTO subtask_contexts (
+                subtask_id, user_id, context_type, name, status, error_message,
+                binary_data, image_base64, extracted_text, text_length,
+                type_data, created_at, updated_at
+            )
+            SELECT
+                subtask_id,
+                user_id,
+                'attachment' AS context_type,
+                original_filename AS name,
+                status,
+                error_message,
+                binary_data,
+                image_base64,
+                extracted_text,
+                COALESCE(text_length, 0) AS text_length,
+                JSON_OBJECT(
+                    'original_filename', original_filename,
+                    'file_extension', file_extension,
+                    'file_size', file_size,
+                    'mime_type', mime_type,
+                    'storage_key', COALESCE(storage_key, ''),
+                    'storage_backend', COALESCE(storage_backend, 'mysql')
+                ) AS type_data,
+                created_at,
+                created_at AS updated_at
+            FROM subtask_attachments
+        """
+        
+        connection.execute(sa.text(migration_sql))
-            )
-        )
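
The same idea in runnable form, as a sketch only: it uses the standard-library `sqlite3` module and a trimmed column set as a stand-in for the MySQL/Alembic migration, and `migrate` is a hypothetical name. The point it shows is that both statements are fixed constants, so nothing is interpolated into the SQL text at runtime.

```python
import sqlite3

# Two fixed statements; the choice between them is made in Python.
COPY_WITH_UPDATED_AT = """
    INSERT INTO subtask_contexts (name, created_at, updated_at)
    SELECT original_filename, created_at, COALESCE(updated_at, created_at)
    FROM subtask_attachments
"""
COPY_WITHOUT_UPDATED_AT = """
    INSERT INTO subtask_contexts (name, created_at, updated_at)
    SELECT original_filename, created_at, created_at
    FROM subtask_attachments
"""


def migrate(conn: sqlite3.Connection) -> None:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(subtask_attachments)")}
    sql = COPY_WITH_UPDATED_AT if "updated_at" in columns else COPY_WITHOUT_UPDATED_AT
    conn.execute(sql)


# Demo: the old table has no updated_at column, so created_at is reused.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subtask_attachments (original_filename TEXT, created_at TEXT)")
conn.execute("CREATE TABLE subtask_contexts (name TEXT, created_at TEXT, updated_at TEXT)")
conn.execute("INSERT INTO subtask_attachments VALUES ('a.pdf', '2025-01-01')")
migrate(conn)
rows = conn.execute("SELECT name, created_at, updated_at FROM subtask_contexts").fetchall()
```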
frontend/src/features/tasks/components/message/MessagesArea.tsx (1)

86-102: Streaming bubble now passes RAG sources correctly

Forwarding message.sources into msgForBubble keeps the streaming path aligned with the non‑streaming one for RAG citations; this looks correct. If you also want context badges to render while streaming, consider additionally passing contexts: message.contexts here for consistency with convertToMessage.

backend/app/services/rag/document_service.py (1)

42-104: Attachment binary fetch refactor to SubtaskContext looks correct, minor naming nit

The switch to querying SubtaskContext with ContextType.ATTACHMENT and ContextStatus.READY plus delegating binary retrieval to context_service.get_attachment_binary_data is sound, and the error handling/logging cover the important failure modes. The only minor nit is that the parameter and surrounding docstrings still talk about attachment_id while it’s now effectively a context ID for an attachment‑type context; consider renaming to avoid confusion in future refactors.

backend/app/services/adapters/task_kinds.py (1)

1075-1145: Subtask contexts assembly matches unified context brief shape

Building contexts_list with base fields plus attachment- and knowledge‑base–specific fields (extension/size/mime_type vs document_count) aligns with the SubtaskContextBrief schema and keeps attachments as an empty legacy field for backward compatibility. You might consider reusing the existing SubtaskContextBrief.from_model helper instead of duplicating the mapping logic here to avoid drift if the brief schema evolves.

backend/app/services/attachment/mysql_storage.py (1)

48-66: MySQL storage now correctly targets SubtaskContext, but key format docs are inconsistent

Using SubtaskContext and updating binary_data, type_data.storage_backend, and type_data.storage_key is consistent with the new context model, and _extract_attachment_id correctly derives the numeric ID from the final underscore‑separated segment. However, the docstrings for save/get/delete/exists say the key format is attachments/{context_id}, while _extract_attachment_id (and its docstring) expect attachments/{uuid}_{timestamp}_{user_id}_{context_id}. It would be good to unify these comments to reflect the actual, supported key format and avoid confusion for future maintainers.

Also applies to: 97-103, 124-133, 159-165, 202-231
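
For reference, a sketch of the parsing behaviour described above, assuming the longer key layout `attachments/{uuid}_{timestamp}_{user_id}_{context_id}`. `extract_context_id` is a hypothetical stand-in for `_extract_attachment_id`, not its actual body.

```python
def extract_context_id(storage_key: str) -> int:
    """Return the numeric ID from the last underscore-separated segment."""
    # Drop the "attachments/" prefix, then take the text after the last "_".
    tail = storage_key.rsplit("/", 1)[-1]
    last_segment = tail.rsplit("_", 1)[-1]
    if not last_segment.isdecimal():
        raise ValueError(f"No numeric context ID in storage key: {storage_key!r}")
    return int(last_segment)
```

A key in the short `attachments/{context_id}` form also parses under this logic, because a tail without underscores is its own last segment. That may be why the two docstring variants have coexisted unnoticed.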

backend/app/chat_shell/history/loader.py (2)

86-103: Unified context loading into history is logically sound, with small refactor opportunity

Loading all SubtaskContext rows for a user subtask and then prioritizing attachment vision/text prefixes before fitting knowledge‑base prefixes into MAX_EXTRACTED_TEXT_LENGTH is a reasonable strategy and keeps behavior explicit. One minor clean‑up: _build_history_message takes a context_service parameter but immediately re‑imports context_service inside the function, so the argument is effectively unused; you can either drop the parameter and rely on the import, or remove the inner import and use the injected service to simplify the call site.

Also applies to: 121-217
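
A minimal sketch of the shared-budget strategy, under stated assumptions: `fit_prefixes` is a hypothetical helper, the budget is counted in characters, the constant's value here is illustrative only, and the piece that crosses the limit is truncated (the real loader may instead drop it whole).

```python
from typing import List

MAX_EXTRACTED_TEXT_LENGTH = 20  # tiny illustrative budget, in characters


def fit_prefixes(
    attachment_texts: List[str],
    kb_texts: List[str],
    budget: int = MAX_EXTRACTED_TEXT_LENGTH,
) -> List[str]:
    """Keep attachment text first, then add KB text while budget remains."""
    kept: List[str] = []
    remaining = budget
    for text in attachment_texts + kb_texts:  # attachments come first
        if remaining <= 0:
            break
        piece = text[:remaining]  # truncate the piece that crosses the limit
        kept.append(piece)
        remaining -= len(piece)
    return kept
```

For example, `fit_prefixes(["a" * 15], ["b" * 10])` keeps the full attachment text and only the first five characters of the knowledge-base text.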

244-285: get_knowledge_base_meta_prompt helper is clear; consider trimming formatting

The meta‑prompt builder correctly reuses context_service.get_knowledge_base_meta_for_task and formats a concise KB list. If this string is inserted directly into system prompts, you might want to strip() or avoid the leading/trailing blank lines introduced by the triple‑quoted literal to keep prompt formatting tight, but that’s cosmetic.
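
One way to keep the literal readable while avoiding the stray blank lines, as a sketch: `build_kb_meta_prompt` and the prompt wording are hypothetical, and only the `textwrap.dedent(...).strip()` pattern is the point.

```python
import textwrap
from typing import Dict, List


def build_kb_meta_prompt(kbs: List[Dict[str, object]]) -> str:
    """Format a KB list for a system prompt without stray blank lines."""
    if not kbs:
        return ""
    lines = "\n".join(f"- {kb['name']} (ID: {kb['id']})" for kb in kbs)
    template = """
        Knowledge bases used earlier in this conversation:
        {lines}
    """
    # dedent removes the common indentation; strip removes the leading and
    # trailing newlines that the triple-quoted literal introduces.
    return textwrap.dedent(template).strip().format(lines=lines)
```

Formatting after `dedent` keeps the multi-line `lines` value from affecting the indentation calculation.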

frontend/src/features/tasks/components/chat/useChatStreamHandlers.tsx (1)

115-136: Context payload and pending context wiring look correct

Mapping selectedContexts into WebSocket contexts (with knowledge_id, name, and optional document_count for knowledge_base) and building pendingContexts in a SubtaskContextBrief‑compatible shape for immediate display are both aligned with the new unified context model. Invoking resetContexts?.() alongside resetAttachment() on send is also a good integration point. As a small polish, you could reuse the exported SubtaskContextBrief type for pendingContexts instead of an inline literal to keep the shapes in lockstep.

Also applies to: 432-453, 455-495, 495-521

backend/app/services/chat/storage/task_contexts.py (2)

23-62: Task-level context sync logic is fine; drop unused db parameter

Incrementally merging new_context_entries into task.json["contexts"]["subtask_contexts"] with deduplication on id and marking the JSON as modified is straightforward and should work well for task‑level aggregation. The db: Session argument to sync_task_contexts is currently unused though; if you don’t expect to need it here, consider removing it to satisfy Ruff’s ARG001 and clarify that the caller is responsible for committing.
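
A sketch of the merge step without the unused session argument. `merge_task_contexts` is a hypothetical name, and the entry shape (`id` plus `type`) is assumed from the PR description, not copied from the code.

```python
from typing import Any, Dict, List


def merge_task_contexts(
    task_json: Dict[str, Any], new_entries: List[Dict[str, Any]]
) -> bool:
    """Merge entries into task_json["contexts"]["subtask_contexts"], deduplicated by id.

    Returns True when task_json changed, so the caller knows whether to
    mark the JSON column as modified and commit.
    """
    existing = task_json.setdefault("contexts", {}).setdefault("subtask_contexts", [])
    seen = {entry["id"] for entry in existing}
    changed = False
    for entry in new_entries:
        if entry["id"] not in seen:
            existing.append(entry)
            seen.add(entry["id"])
            changed = True
    return changed
```

Returning a flag keeps the commit decision with the caller. With a plain SQLAlchemy JSON column, in-place mutation is not tracked automatically, so the caller would typically still call `flag_modified` from `sqlalchemy.orm.attributes` before committing.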

65-101: KB context retrieval helpers are efficient and well-scoped

Filtering KB context IDs directly from TaskResource.json["contexts"]["subtask_contexts"] and then loading the corresponding SubtaskContext rows with context_type == KNOWLEDGE_BASE avoids extra joins and keeps behavior clearly bounded. The truthy TaskResource.is_active filter is acceptable, though you may prefer the more explicit TaskResource.is_active == True style for readability.

Also applies to: 104-130
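
The JSON-side filtering amounts to something like the following sketch. `kb_context_ids` is a hypothetical name and the `id`/`type` keys are assumed from the PR description.

```python
from typing import Any, Dict, List


def kb_context_ids(task_json: Dict[str, Any]) -> List[int]:
    """Collect IDs of knowledge_base entries from the task-level JSON."""
    # Tolerate tasks created before the contexts field existed.
    entries = (task_json.get("contexts") or {}).get("subtask_contexts") or []
    return [entry["id"] for entry in entries if entry.get("type") == "knowledge_base"]
```

Only the resulting IDs then need to be loaded from the subtask_contexts table, which is what keeps the lookup to a single query.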

frontend/src/features/tasks/components/message/ContextBadgeList.tsx (1)

62-96: Attachment context → Attachment mapping is reasonable

The status mapping from context statuses to Attachment statuses and the conversion into the minimal Attachment shape expected by AttachmentPreview (id, filename, size, mime type, extension, status, created_at) looks correct and should render existing UI consistently. If AttachmentPreview ever surfaces created_at, you may want to feed a real timestamp from the context (once available on SubtaskContextBrief) instead of an empty string.
backend/app/chat_shell/tools/knowledge_factory.py (1)

44-50: Consider extracting duplicated KB meta prompt logic.

The pattern of conditionally appending kb_meta_prompt to enhanced_system_prompt appears twice (lines 47-49 and 96-99). While minor, this could be simplified to reduce duplication.
🔎 Optional: Extract helper for prompt concatenation
+def _append_kb_meta_if_available(base_prompt: str, db: Any, task_id: Optional[int]) -> str:
+    """Append historical KB meta prompt if task_id is provided and meta exists."""
+    if task_id:
+        kb_meta_prompt = _build_historical_kb_meta_prompt(db, task_id)
+        if kb_meta_prompt:
+            return f"{base_prompt}{kb_meta_prompt}"
+    return base_prompt
backend/app/schemas/subtask.py (1)

87-103: Consider using ContextType enum for type safety.

The context_type field is declared as str (line 91), but the comparison in from_model (line 125) uses string literals "attachment" and "knowledge_base". Using the ContextType enum from app.models.subtask_context would provide type safety and prevent typos.
🔎 Proposed improvement
+from app.models.subtask_context import ContextType
+
 class SubtaskContextBrief(BaseModel):
     """Brief context info for message list display"""
 
     id: int
-    context_type: str
+    context_type: ContextType
     name: str
     status: str
     # ...
Then update from_model:
-        if context.context_type == "attachment":
+        if context.context_type == ContextType.ATTACHMENT.value:
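
To illustrate why the enum helps, here is a small sketch that assumes `ContextType` is a `str`-based `Enum` (its definition is not shown in this review, so the class below is a stand-in).

```python
from enum import Enum


class ContextType(str, Enum):
    """Stand-in definition; the real enum lives in app.models.subtask_context."""

    ATTACHMENT = "attachment"
    KNOWLEDGE_BASE = "knowledge_base"


def is_attachment(context_type: str) -> bool:
    # With the str mixin, a member compares equal to its plain string value.
    return context_type == ContextType.ATTACHMENT
```

Under that assumption, existing string comparisons keep working, while constructing the enum from an unknown value such as `ContextType("attachement")` raises `ValueError` instead of silently failing to match.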
backend/app/services/export/docx_generator.py (1)

326-332: Consider using original_filename property for caption consistency.

Line 329 uses attachment.name for the caption, but the SubtaskContext model has an original_filename property that provides the intended filename (falling back to name if not in type_data). Using the property would be more semantically consistent with other file-related code.
🔎 Minor suggestion
-            run = caption.add_run(attachment.name)
+            run = caption.add_run(attachment.original_filename)
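
The fallback behaviour described for `original_filename` would look roughly like this. `ContextStub` is a hypothetical stand-in for `SubtaskContext` with only the two fields involved; the real property may differ in detail.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ContextStub:
    """Stand-in for SubtaskContext with only the fields used here."""

    name: str
    type_data: Dict[str, Any] = field(default_factory=dict)

    @property
    def original_filename(self) -> str:
        # Prefer the filename stored in type_data; fall back to name.
        return (self.type_data or {}).get("original_filename") or self.name
```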
backend/app/models/subtask_context.py (1)

29-43: Enum duplication with backend/app/schemas/subtask_context.py.

These enums (ContextType, ContextStatus) are defined identically in both the model and schema files. While this provides layer separation, it introduces risk of drift if one is updated without the other.

Consider importing from a shared location or consolidating to prevent inconsistencies.

frontend/src/features/tasks/contexts/chatStreamContext.tsx (1)

1-1808: File exceeds recommended 1000-line limit.

Per coding guidelines, file size SHOULD NOT exceed 1000 lines. This file is 1808 lines. Consider splitting into sub-modules:

- Message state management
- WebSocket event handlers
- Skill handling
- Stream control utilities

This would improve maintainability and testability.

backend/app/services/context/context_service.py (2)

36-39: Consider adding more context to NotFoundException.

The exception class is minimal. Consider adding context like context_id as an attribute for better error handling upstream.
🔎 Proposed enhancement
 class NotFoundException(Exception):
     """Exception raised when a context is not found."""
-
-    pass
+    
+    def __init__(self, message: str, context_id: int | None = None):
+        super().__init__(message)
+        self.context_id = context_id
154-157: Use logger.exception for exception logging.

Per static analysis, when logging in exception handlers, logger.exception automatically includes the stack trace, which aids debugging.
🔎 Proposed fix
         except StorageError as e:
-            logger.error(f"Failed to save context {context.id} to storage: {e}")
+            logger.exception(f"Failed to save context {context.id} to storage")
             db.rollback()
             raise
         except DocumentParseError as e:
-            logger.error(f"Document parsing failed for context {context.id}: {e}")
+            logger.exception(f"Document parsing failed for context {context.id}")
             context.status = ContextStatus.FAILED.value
             context.error_message = str(e)
             db.commit()
             raise
Also applies to: 185-190
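
A self-contained demonstration of the difference, using only the standard library. The logger name and the `parse` function are illustrative, not taken from the PR.

```python
import io
import logging

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("context_demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def parse(context_id: int) -> None:
    try:
        raise ValueError("unsupported format")
    except ValueError:
        # Logs at ERROR level and appends the active traceback.
        logger.exception("Document parsing failed for context %s", context_id)


parse(7)
output = stream.getvalue()
```

Here `output` contains the message followed by the full traceback ending in `ValueError: unsupported format`, so the exception text no longer needs to be interpolated into the message by hand.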
backend/app/services/chat/preprocessing/contexts.py (2)

29-34: Unused user_id parameter in process_contexts.

The user_id parameter is passed but never used. Either remove it or add a comment explaining it's reserved for future access control checks.
🔎 Proposed fix (if removing)
 async def process_contexts(
     db: Session,
     context_ids: List[int],
-    user_id: int,
     message: str,
 ) -> str | dict[str, Any]:
Or if keeping for future use:
 async def process_contexts(
     db: Session,
     context_ids: List[int],
-    user_id: int,
+    user_id: int,  # Reserved for future access control checks
     message: str,
 ) -> str | dict[str, Any]:
79-81: Consider using logger.exception for full stack traces.

Per static analysis, replacing logger.error with logger.exception in exception handlers automatically captures the stack trace, which aids debugging.
🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Error processing context {context_id}: {e}")
+            logger.exception(f"Error processing context {context_id}")
             continue
Also applies to: 435-437

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d07e66 and 609816c.

📒 Files selected for processing (40)

backend/alembic/versions/o5p6q7r8s9t0_add_subtask_contexts_table.py
backend/app/api/endpoints/adapter/attachments.py
backend/app/api/ws/chat_namespace.py
backend/app/api/ws/events.py
backend/app/chat_shell/history/loader.py
backend/app/chat_shell/tools/builtin/knowledge_base.py
backend/app/chat_shell/tools/knowledge_factory.py
backend/app/core/config.py
backend/app/models/__init__.py
backend/app/models/knowledge.py
backend/app/models/subtask.py
backend/app/models/subtask_context.py
backend/app/schemas/subtask.py
backend/app/schemas/subtask_context.py
backend/app/schemas/task.py
backend/app/services/adapters/task_kinds.py
backend/app/services/attachment/mysql_storage.py
backend/app/services/chat/operations/retry.py
backend/app/services/chat/preprocessing/__init__.py
backend/app/services/chat/preprocessing/attachments.py
backend/app/services/chat/preprocessing/contexts.py
backend/app/services/chat/storage/task_contexts.py
backend/app/services/chat/trigger/core.py
backend/app/services/context/__init__.py
backend/app/services/context/context_service.py
backend/app/services/export/docx_generator.py
backend/app/services/knowledge_service.py
backend/app/services/rag/document_service.py
backend/app/services/shared_task.py
backend/app/services/subtask.py
frontend/src/features/tasks/components/chat/ChatArea.tsx
frontend/src/features/tasks/components/chat/useChatAreaState.ts
frontend/src/features/tasks/components/chat/useChatStreamHandlers.tsx
frontend/src/features/tasks/components/message/ContextBadgeList.tsx
frontend/src/features/tasks/components/message/MessageBubble.tsx
frontend/src/features/tasks/components/message/MessagesArea.tsx
frontend/src/features/tasks/contexts/chatStreamContext.tsx
frontend/src/features/tasks/hooks/useUnifiedMessages.ts
frontend/src/types/api.ts
frontend/src/types/socket.ts

💤 Files with no reviewable changes (1)

backend/app/services/chat/preprocessing/attachments.py

🧰 Additional context used

📓 Path-based instructions (10)

**/*.{py,ts,tsx,js,jsx}