@kissghosts (Collaborator) commented Jan 5, 2026

Remove deprecated RAG code that is no longer used after the chat RAG refactoring:

  • Delete process_rag_if_needed and extract_knowledge_base_ids from processor.py
  • Delete entire rag_integration.py file (retrieve_and_assemble_rag_prompt)
  • Update module exports to remove deleted functions

RAG retrieval is now handled dynamically by KnowledgeBaseTool in preprocessing/contexts.py, making these legacy functions obsolete.

Summary by CodeRabbit

  • Refactor
    • Reorganized RAG (Retrieval Augmented Generation) functionality to be handled dynamically by the KnowledgeBaseTool component, improving system modularity and separation of concerns.

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

The pull request consolidates RAG processing by removing direct retrieval functions from the chat shell tools and restructuring the RAG service to delegate actual retrieval to KnowledgeBaseTool. Functions handling knowledge base retrieval and ID extraction are deleted or replaced with metadata-only extraction.
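
To make the "retrieval delegated to a tool" idea concrete, here is a minimal sketch of on-demand retrieval. It is not the project's KnowledgeBaseTool: the class name `KnowledgeBaseToolSketch`, its `run` method, the in-memory data, and the word-overlap scoring are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBaseToolSketch:
    """Hypothetical stand-in for KnowledgeBaseTool; not the project's real class."""

    # Maps a knowledge base ID to its stored text chunks (illustrative data only).
    knowledge_bases: dict[int, list[str]] = field(default_factory=dict)

    def run(
        self, query: str, knowledge_base_ids: list[int], top_k: int = 2
    ) -> list[str]:
        """Return up to top_k chunks that share the most words with the query."""
        query_words = set(query.lower().split())
        scored: list[tuple[int, str]] = []
        for kb_id in knowledge_base_ids:
            for chunk in self.knowledge_bases.get(kb_id, []):
                overlap = len(query_words & set(chunk.lower().split()))
                if overlap > 0:
                    scored.append((overlap, chunk))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda item: item[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


tool = KnowledgeBaseToolSketch(
    knowledge_bases={1: ["RAG retrieval runs inside the tool", "Unrelated release notes"]}
)
# Retrieval happens only when the tool is invoked, not during preprocessing.
retrieved = tool.run("how does rag retrieval work", knowledge_base_ids=[1])
```

The point of the sketch is the call pattern: nothing is retrieved until the tool is invoked with a query, which is why a pre-assembled RAG prompt is no longer needed upstream.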

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Chat Shell Tools Cleanup: `backend/app/chat_shell/tools/__init__.py`, `backend/app/chat_shell/tools/rag_integration.py` | Removed `retrieve_and_assemble_rag_prompt` from public exports in `__init__.py` and deleted the entire `rag_integration.py` module, eliminating direct RAG retrieval logic that included KB validation, node retrieval, deduplication, context assembly, token limiting, and prompt generation. |
| RAG Service Restructuring: `backend/app/services/chat/rag/__init__.py`, `backend/app/services/chat/rag/processor.py` | Removed `process_rag_if_needed` and `extract_knowledge_base_ids` functions; updated `process_context_and_rag` to return only metadata for storage when tool-based RAG is enabled, delegating actual retrieval to KnowledgeBaseTool. Updated module docstring to reflect dynamic RAG handling. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Chat knowledge integration #649: Adds a similarly named retrieve_and_assemble_rag_prompt implementation in services/chat/rag_integration.py, directly related to the removal of the same function from the chat shell tools in this PR.

Poem

🐰 Behold the RAG refactor dance,
Where tools release their RAG advance,
To KnowledgeBaseTool's eager care—
Metadata floats; retrieval's there!
Less scattered, more intent, hooray! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing legacy RAG code after refactoring to use KnowledgeBaseTool instead. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/services/chat/rag/processor.py (1)

29-47: Update the docstring to match the actual implementation.

The docstring mentions an enable_deep_thinking parameter that doesn't exist in the function signature and describes behavior ("Performs full RAG retrieval and prompt assembly") that is no longer implemented. The current implementation only extracts context metadata when contexts and should_trigger_ai are both truthy, and always returns None for the rag_prompt.

🔎 Proposed fix for the docstring
     """
-    Process context metadata and RAG based on chat version.
+    Extract context metadata for tool-based RAG.
 
-    This function handles RAG processing differently based on enable_deep_thinking:
-    - enable_deep_thinking=True: Only extracts context metadata for tool-based RAG
-    - enable_deep_thinking=False: Performs full RAG retrieval and prompt assembly
-
-    For tool-enabled mode, KnowledgeBaseTool will handle retrieval dynamically.
+    When contexts are provided and AI should be triggered, this function extracts
+    context metadata for storage. KnowledgeBaseTool handles actual RAG retrieval
+    dynamically during tool execution.
 
     Args:
         message: Original user message
         contexts: List of context objects
-        should_trigger_ai: Whether AI should be triggered
+        should_trigger_ai: Whether AI should be triggered (tool-enabled mode)
         user_id: User ID
         db: Database session
 
     Returns:
-        Tuple of (context_metadata dict, rag_prompt string or None)
+        Tuple of (context_metadata dict or None, None). The rag_prompt is always None
+        as RAG retrieval is delegated to KnowledgeBaseTool.
     """
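
For reference, the behavior the proposed docstring describes can be sketched as follows. This is an illustration only, not the code in processor.py: the function name `process_context_and_rag_sketch`, the parameter types, and the `context_ids` metadata shape are assumptions.

```python
from typing import Any, Optional


def process_context_and_rag_sketch(
    message: str,
    contexts: Optional[list[dict[str, Any]]],
    should_trigger_ai: bool,
    user_id: int,
    db: Any,
) -> tuple[Optional[dict[str, Any]], None]:
    """Return (context_metadata or None, None); the rag_prompt slot is always None."""
    # Metadata is only extracted when contexts exist and AI should be triggered.
    if not contexts or not should_trigger_ai:
        return None, None
    # Hypothetical metadata shape: only the context IDs are kept for storage.
    context_metadata = {"context_ids": [ctx["id"] for ctx in contexts]}
    # Retrieval itself is left to KnowledgeBaseTool, so no prompt is built here.
    return context_metadata, None
```

Callers that still unpack a two-element tuple keep working with this shape; the second element simply never carries a prompt.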
🧹 Nitpick comments (1)
backend/app/services/chat/rag/processor.py (1)

22-28: Remove unused parameters user_id and db from the function signature and call site.

The function body never references these parameters. Update both the function definition in backend/app/services/chat/rag/processor.py and the call site in backend/app/api/ws/chat_namespace.py:448 to remove them.
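
One way to double-check a claim like this before editing the signature is a small `ast` scan. The helper `find_unused_parameters` and the example source below are hypothetical; the example function body is made up and is not the real contents of processor.py.

```python
import ast


def find_unused_parameters(source: str, function_name: str) -> list[str]:
    """Return the parameters of function_name that its body never references."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == function_name:
            args = node.args
            params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
            # Collect every bare name that appears anywhere in the function body.
            used = {
                inner.id
                for stmt in node.body
                for inner in ast.walk(stmt)
                if isinstance(inner, ast.Name)
            }
            return [p for p in params if p not in used]
    raise ValueError(f"function {function_name!r} not found")


# Made-up source that mirrors the parameter list named in the review.
EXAMPLE_SOURCE = '''
def process_context_and_rag(message, contexts, should_trigger_ai, user_id, db):
    if contexts and should_trigger_ai:
        return {"message": message, "count": len(contexts)}, None
    return None, None
'''

unused = find_unused_parameters(EXAMPLE_SOURCE, "process_context_and_rag")
```

The scan only looks at names inside the function body, so it will not notice callers that pass the arguments by keyword; the call site still needs to be updated by hand.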

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bb52e6 and 3444a82.

📒 Files selected for processing (4)
  • backend/app/chat_shell/tools/__init__.py
  • backend/app/chat_shell/tools/rag_integration.py
  • backend/app/services/chat/rag/__init__.py
  • backend/app/services/chat/rag/processor.py
💤 Files with no reviewable changes (2)
  • backend/app/chat_shell/tools/__init__.py
  • backend/app/chat_shell/tools/rag_integration.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)

Files:

  • backend/app/services/chat/rag/__init__.py
  • backend/app/services/chat/rag/processor.py
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants

Files:

  • backend/app/services/chat/rag/__init__.py
  • backend/app/services/chat/rag/processor.py
backend/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

backend/**/*.py: Python backend module imports MUST use uv run prefix when executing commands
Task and Workspace resources MUST use TaskResource model from app.models.task, not the Kind model
Ghost, Model, Shell, Bot, Team, and Skill CRDs MUST use Kind model from app.models.kind

Files:

  • backend/app/services/chat/rag/__init__.py
  • backend/app/services/chat/rag/processor.py
🧬 Code graph analysis (1)
backend/app/services/chat/rag/__init__.py (1)
backend/app/services/chat/rag/processor.py (1)
  • process_context_and_rag (22-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: E2E Tests (Shard 1/3)
  • GitHub Check: E2E Tests (Shard 2/3)
  • GitHub Check: E2E Tests (Shard 3/3)
  • GitHub Check: Test Frontend
  • GitHub Check: Test wegent CLI Integration
🔇 Additional comments (1)
backend/app/services/chat/rag/__init__.py (1)

1-18: LGTM! Public API correctly narrowed.

The module now correctly exposes only process_context_and_rag, and the docstring accurately reflects that RAG retrieval is delegated to KnowledgeBaseTool while this module handles metadata extraction.

Comment on lines +9 to +11
Note: RAG retrieval is now handled dynamically by KnowledgeBaseTool.
This module only extracts context metadata for storage.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for any remaining references to deleted RAG functions

# Search for process_rag_if_needed references
echo "=== Checking for process_rag_if_needed references ==="
rg -n 'process_rag_if_needed' --type py

echo -e "\n=== Checking for extract_knowledge_base_ids references ==="
rg -n 'extract_knowledge_base_ids' --type py

echo -e "\n=== Checking for retrieve_and_assemble_rag_prompt references ==="
rg -n 'retrieve_and_assemble_rag_prompt' --type py

Repository: wecode-ai/Wegent

Length of output: 503


🏁 Script executed:

#!/bin/bash
# Check the function definition in contexts.py
echo "=== extract_knowledge_base_ids definition ==="
sed -n '185,210p' backend/app/services/chat/preprocessing/contexts.py

# Check if extract_knowledge_base_ids is used anywhere (imported/called)
echo -e "\n=== All usages of extract_knowledge_base_ids (excluding definition) ==="
rg 'extract_knowledge_base_ids' --type py -B 2 -A 2

Repository: wecode-ai/Wegent

Length of output: 1943


🏁 Script executed:

#!/bin/bash
# Search for actual function calls to extract_knowledge_base_ids (excluding imports and definition)
echo "=== Checking for calls to extract_knowledge_base_ids ==="
rg 'extract_knowledge_base_ids\(' --type py

# Also check if process_rag_if_needed and retrieve_and_assemble_rag_prompt exist as definitions
echo -e "\n=== Checking for process_rag_if_needed definition ==="
rg 'def process_rag_if_needed' --type py

echo -e "\n=== Checking for retrieve_and_assemble_rag_prompt definition ==="
rg 'def retrieve_and_assemble_rag_prompt' --type py

Repository: wecode-ai/Wegent

Length of output: 321


Remove unused extract_knowledge_base_ids function.

The functions process_rag_if_needed and retrieve_and_assemble_rag_prompt have been successfully deleted with no remaining references. However, extract_knowledge_base_ids still exists in backend/app/services/chat/preprocessing/contexts.py and is exported from the module's __init__.py, but it is not called anywhere in the codebase. Remove this unused function to complete the cleanup.
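
The same kind of check can be sketched in Python for cases where `rg` is not available. The helper `find_references` and the temporary files below are assumptions for illustration; unlike a plain search for `name(`, it reports definitions, calls, and other mentions (such as imports) separately.

```python
import re
import tempfile
from pathlib import Path


def find_references(root: Path, name: str) -> dict[str, list[tuple[str, int]]]:
    """Group occurrences of name in *.py files into definitions, calls, and other."""
    definition = re.compile(rf"^\s*(?:async\s+)?def\s+{re.escape(name)}\b")
    call = re.compile(rf"\b{re.escape(name)}\(")
    mention = re.compile(rf"\b{re.escape(name)}\b")
    results: dict[str, list[tuple[str, int]]] = {
        "definitions": [],
        "calls": [],
        "other": [],
    }
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, 1):
            # Check the definition first, because a def line also contains "name(".
            if definition.search(line):
                results["definitions"].append((str(path), lineno))
            elif call.search(line):
                results["calls"].append((str(path), lineno))
            elif mention.search(line):
                results["other"].append((str(path), lineno))
    return results


# Tiny made-up tree: one definition and one re-export, but no call sites.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "contexts.py").write_text(
        "def extract_knowledge_base_ids(contexts):\n    return []\n", encoding="utf-8"
    )
    (root / "__init__.py").write_text(
        "from .contexts import extract_knowledge_base_ids\n", encoding="utf-8"
    )
    report = find_references(root, "extract_knowledge_base_ids")
    summary = {kind: len(hits) for kind, hits in report.items()}
```

A result with a definition and an import but zero calls matches the situation described above: the function is still exported but nothing invokes it.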

🤖 Prompt for AI Agents
In backend/app/services/chat/preprocessing/contexts.py, remove the now-unused
extract_knowledge_base_ids function: delete its entire definition from that
file, remove its export from the module's __init__.py (so it is no longer
re-exported), and ensure there are no remaining imports or references elsewhere
(run a quick grep/IDE search and update or remove any that appear). After
removal, run linters/tests to confirm no unresolved references remain.
