refactor(rag): remove unused legacy RAG code after refactoring #830
base: main
Remove deprecated RAG code that is no longer used after the chat RAG refactoring:
- Delete `process_rag_if_needed` and `extract_knowledge_base_ids` from processor.py
- Delete entire `rag_integration.py` file (`retrieve_and_assemble_rag_prompt`)
- Update module exports to remove deleted functions

RAG retrieval is now handled dynamically by KnowledgeBaseTool in preprocessing/contexts.py, making these legacy functions obsolete.
📝 Walkthrough
The pull request consolidates RAG processing by removing direct retrieval functions from the chat shell tools and restructuring the RAG service to delegate actual retrieval to KnowledgeBaseTool. Functions handling knowledge base retrieval and ID extraction are deleted or replaced with metadata-only extraction.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/services/chat/rag/processor.py (1)
29-47: Update the docstring to match the actual implementation.
The docstring mentions an `enable_deep_thinking` parameter that doesn't exist in the function signature and describes behavior ("Performs full RAG retrieval and prompt assembly") that is no longer implemented. The current implementation only extracts context metadata when `contexts` and `should_trigger_ai` are both truthy, and always returns `None` for the rag_prompt.
🔎 Proposed fix for the docstring
     """
-    Process context metadata and RAG based on chat version.
+    Extract context metadata for tool-based RAG.

-    This function handles RAG processing differently based on enable_deep_thinking:
-    - enable_deep_thinking=True: Only extracts context metadata for tool-based RAG
-    - enable_deep_thinking=False: Performs full RAG retrieval and prompt assembly
-
-    For tool-enabled mode, KnowledgeBaseTool will handle retrieval dynamically.
+    When contexts are provided and AI should be triggered, this function extracts
+    context metadata for storage. KnowledgeBaseTool handles actual RAG retrieval
+    dynamically during tool execution.

     Args:
         message: Original user message
         contexts: List of context objects
-        should_trigger_ai: Whether AI should be triggered
+        should_trigger_ai: Whether AI should be triggered (tool-enabled mode)
         user_id: User ID
         db: Database session

     Returns:
-        Tuple of (context_metadata dict, rag_prompt string or None)
+        Tuple of (context_metadata dict or None, None). The rag_prompt is always None
+        as RAG retrieval is delegated to KnowledgeBaseTool.
     """
🧹 Nitpick comments (1)
backend/app/services/chat/rag/processor.py (1)
22-28: Remove unused parameters `user_id` and `db` from the function signature and call site.
The function body never references these parameters. Update both the function definition in `backend/app/services/chat/rag/processor.py` and the call site in `backend/app/api/ws/chat_namespace.py:448` to remove them.
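One way to double-check a claim like this before touching the signature is to parse the function and list the parameters that are never read in its body. The sketch below is illustrative only: `unused_parameters` is a hypothetical helper, and `SAMPLE` is a stand-in with the same parameter names as the docstring above, not the real body of `process_context_and_rag`.

```python
import ast


def unused_parameters(source: str, function_name: str) -> list[str]:
    """Return the parameters of `function_name` that its body never reads."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == function_name:
            args = node.args
            all_args = args.posonlyargs + args.args + args.kwonlyargs
            params = [a.arg for a in all_args]
            # Collect every bare name referenced anywhere in the body.
            used = {
                n.id
                for stmt in node.body
                for n in ast.walk(stmt)
                if isinstance(n, ast.Name)
            }
            return [p for p in params if p not in used]
    raise ValueError(f"function {function_name!r} not found")


# Stand-in source (assumption): mirrors the parameter list, not the real code.
SAMPLE = '''
def process_context_and_rag(message, contexts, should_trigger_ai, user_id, db):
    if contexts and should_trigger_ai:
        return {"count": len(contexts), "message": message}, None
    return None, None
'''

print(unused_parameters(SAMPLE, "process_context_and_rag"))
```

Names that appear only inside a docstring do not count as uses here, because the check looks at `ast.Name` nodes rather than raw text. It does not follow `*args`/`**kwargs` or dynamic access such as `locals()`, so treat the result as a hint rather than proof.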
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/app/chat_shell/tools/__init__.py
- backend/app/chat_shell/tools/rag_integration.py
- backend/app/services/chat/rag/__init__.py
- backend/app/services/chat/rag/processor.py
💤 Files with no reviewable changes (2)
- backend/app/chat_shell/tools/__init__.py
- backend/app/chat_shell/tools/rag_integration.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
**/*.{py,ts,tsx,js,jsx}: All code comments MUST be written in English
File size MUST NOT exceed 1000 lines - split into multiple sub-modules if exceeded
Function length SHOULD NOT exceed 50 lines (preferred)
Files:
- backend/app/services/chat/rag/__init__.py
- backend/app/services/chat/rag/processor.py
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Python code MUST follow PEP 8, use Black formatter with line length 88, and isort for imports
Python code MUST include type hints
Python functions and classes MUST have descriptive names and docstrings for public functions/classes
Python MUST extract magic numbers to named constants
Files:
- backend/app/services/chat/rag/__init__.py
- backend/app/services/chat/rag/processor.py
backend/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
backend/**/*.py: Python backend module imports MUST use `uv run` prefix when executing commands
Task and Workspace resources MUST use TaskResource model from `app.models.task`, not the Kind model
Ghost, Model, Shell, Bot, Team, and Skill CRDs MUST use Kind model from `app.models.kind`
Files:
- backend/app/services/chat/rag/__init__.py
- backend/app/services/chat/rag/processor.py
🧬 Code graph analysis (1)
backend/app/services/chat/rag/__init__.py (1)
backend/app/services/chat/rag/processor.py (1)
process_context_and_rag(22-65)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: E2E Tests (Shard 1/3)
- GitHub Check: E2E Tests (Shard 2/3)
- GitHub Check: E2E Tests (Shard 3/3)
- GitHub Check: Test Frontend
- GitHub Check: Test wegent CLI Integration
🔇 Additional comments (1)
backend/app/services/chat/rag/__init__.py (1)
1-18: LGTM! Public API correctly narrowed.
The module now correctly exposes only `process_context_and_rag`, and the docstring accurately reflects that RAG retrieval is delegated to KnowledgeBaseTool while this module handles metadata extraction.
Note: RAG retrieval is now handled dynamically by KnowledgeBaseTool.
This module only extracts context metadata for storage.
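To make the metadata-only behavior concrete, here is a minimal sketch of what the review describes: metadata is extracted only when `contexts` and `should_trigger_ai` are both truthy, and the second element of the returned tuple is always `None`. This is not the code from `processor.py`. The helper `extract_context_metadata`, the shape of the metadata dict, the parameter types, and the synchronous signature are all assumptions made for illustration.

```python
from typing import Any, Optional


def extract_context_metadata(contexts: list[Any]) -> dict[str, Any]:
    """Hypothetical helper (not from the PR): summarise contexts for storage."""
    return {"context_count": len(contexts)}


def process_context_and_rag(
    message: str,
    contexts: Optional[list[Any]],
    should_trigger_ai: bool,
    user_id: int,  # unused in this sketch, as the review notes for the real code
    db: Any,  # unused in this sketch, as the review notes for the real code
) -> tuple[Optional[dict[str, Any]], None]:
    """Extract context metadata; never assembles a RAG prompt."""
    context_metadata: Optional[dict[str, Any]] = None
    if contexts and should_trigger_ai:
        context_metadata = extract_context_metadata(contexts)
    # rag_prompt is always None: retrieval is delegated to KnowledgeBaseTool.
    return context_metadata, None


metadata, rag_prompt = process_context_and_rag("hi", [{"id": 1}], True, 1, None)
print(metadata, rag_prompt)
```

Keeping the two-element return shape means existing callers that unpack `(context_metadata, rag_prompt)` would continue to work, even though the second value no longer carries information.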
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for any remaining references to deleted RAG functions
# Search for process_rag_if_needed references
echo "=== Checking for process_rag_if_needed references ==="
rg -n 'process_rag_if_needed' --type py
echo -e "\n=== Checking for extract_knowledge_base_ids references ==="
rg -n 'extract_knowledge_base_ids' --type py
echo -e "\n=== Checking for retrieve_and_assemble_rag_prompt references ==="
rg -n 'retrieve_and_assemble_rag_prompt' --type py
Repository: wecode-ai/Wegent
Length of output: 503
🏁 Script executed:
#!/bin/bash
# Check the function definition in contexts.py
echo "=== extract_knowledge_base_ids definition ==="
sed -n '185,210p' backend/app/services/chat/preprocessing/contexts.py
# Check if extract_knowledge_base_ids is used anywhere (imported/called)
echo -e "\n=== All usages of extract_knowledge_base_ids (excluding definition) ==="
rg 'extract_knowledge_base_ids' --type py -B 2 -A 2
Repository: wecode-ai/Wegent
Length of output: 1943
🏁 Script executed:
#!/bin/bash
# Search for actual function calls to extract_knowledge_base_ids (excluding imports and definition)
echo "=== Checking for calls to extract_knowledge_base_ids ==="
rg 'extract_knowledge_base_ids\(' --type py
# Also check if process_rag_if_needed and retrieve_and_assemble_rag_prompt exist as definitions
echo -e "\n=== Checking for process_rag_if_needed definition ==="
rg 'def process_rag_if_needed' --type py
echo -e "\n=== Checking for retrieve_and_assemble_rag_prompt definition ==="
rg 'def retrieve_and_assemble_rag_prompt' --type py
Repository: wecode-ai/Wegent
Length of output: 321
Remove unused `extract_knowledge_base_ids` function.
The functions `process_rag_if_needed` and `retrieve_and_assemble_rag_prompt` have been successfully deleted with no remaining references. However, `extract_knowledge_base_ids` still exists in `backend/app/services/chat/preprocessing/contexts.py` and is exported from the module's `__init__.py`, but it is not called anywhere in the codebase. Remove this unused function to complete the cleanup.
🤖 Prompt for AI Agents
In backend/app/services/chat/rag/processor.py around lines 9 to 11, remove the
now-unused extract_knowledge_base_ids function: delete its entire definition
from this file, remove any export of it from the module's __init__.py (so it is
no longer re-exported), and ensure there are no remaining imports or references
elsewhere (run a quick grep/IDE search and update/remove any that appear). After
removal, run linters/tests to confirm no unresolved references remain.
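The "quick grep" step can also be done from Python when `rg` is not available. The sketch below mirrors the `extract_knowledge_base_ids\(` search from the analysis chain: it reports lines that call a function and ignores both its `def` line and plain imports. `find_call_sites` is a hypothetical helper, and the files written to the temporary directory are made-up stand-ins, not the repository's contents.

```python
import re
import tempfile
from pathlib import Path


def find_call_sites(root: Path, function_name: str) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each call to `function_name`."""
    name = re.escape(function_name)
    call = re.compile(rf"\b{name}\s*\(")
    definition = re.compile(rf"^\s*(?:async\s+)?def\s+{name}\b")
    hits: list[tuple[str, int]] = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            # A definition line also contains "name(", so exclude it explicitly.
            if call.search(line) and not definition.match(line):
                hits.append((str(path.relative_to(root)), number))
    return hits


# Demo on stand-in files (assumption: simplified, not the real modules).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "contexts.py").write_text(
        "def extract_knowledge_base_ids(contexts):\n    return []\n"
    )
    (root / "__init__.py").write_text(
        "from .contexts import extract_knowledge_base_ids\n"
    )
    before = find_call_sites(root, "extract_knowledge_base_ids")
    (root / "user.py").write_text("ids = extract_knowledge_base_ids([])\n")
    after = find_call_sites(root, "extract_knowledge_base_ids")

print(before, after)
```

This is a line-based text search, so it can be fooled by calls split across lines, by aliases (`from x import f as g`), or by matches inside strings and comments. An empty result is a good sign that the function is dead, but running the linters and tests afterwards, as the prompt suggests, remains the real check.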