
Embedding Service Implementation #17

Open
Ricardo1232 wants to merge 2 commits into CodeandoGuadalajara:main from Jorge5452:feat/embedding-service

Conversation

@Ricardo1232
Collaborator

Related Issue

Resolves #6

Implementation of EmbeddingService with the Qwen/Qwen3-Embedding-0.6B model and its integration into RAGService. Replaces mock embeddings with real inference optimized for legal documents.

Key Changes

service/embedding_service.py

  • Thread-Safe Singleton: Ensures single instance with safe initialization (double-checked locking).
  • Lazy Initialization: On-demand model loading.
  • Inference: Uses torch.inference_mode() and production-optimized configuration.
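The singleton behavior described above can be sketched roughly as follows. This is an illustrative outline only; `_load_model` and `get_model` are hypothetical helpers, not the PR's actual code:

```python
import threading


class EmbeddingService:
    """Sketch: double-checked-locking singleton with lazy model loading."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # First check without the lock keeps the common path cheap...
        if cls._instance is None:
            with cls._lock:
                # ...second check under the lock prevents double construction.
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._model = None
                    instance._init_lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def _load_model(self):
        # Stand-in for the expensive SentenceTransformer load.
        return object()

    def get_model(self):
        # Lazy initialization: the model is loaded on first use only.
        if self._model is None:
            with self._init_lock:
                if self._model is None:
                    self._model = self._load_model()
        return self._model
```

Every caller then shares one instance, and the model is loaded at most once even under concurrent access.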

rag_service.py

  • Service integration into main flow.
  • Removal of redundant helpers.
  • Centralization of fallback/mock logic in EmbeddingService.
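The delegation described above might look roughly like this minimal sketch (class internals, the `force_mock_mode` handling, and `_real_embedding` are illustrative assumptions, not the PR's actual code):

```python
class EmbeddingService:
    """Owns both real inference and the fallback/mock logic."""

    def __init__(self, force_mock_mode: bool = False):
        self.force_mock_mode = force_mock_mode

    def embed_query(self, text: str) -> list[float]:
        if self.force_mock_mode:
            # Centralized mock/fallback path lives here, not in RAGService.
            return [0.0, 0.0, 0.0, 0.0]
        return self._real_embedding(text)

    def _real_embedding(self, text: str) -> list[float]:
        # Stand-in for actual model inference.
        return [float(len(text))] * 4


class RAGService:
    """Delegates embedding to EmbeddingService instead of duplicating helpers."""

    def __init__(self, embedder: EmbeddingService):
        self._embedder = embedder

    def embed_query(self, text: str) -> list[float]:
        return self._embedder.embed_query(text)
```

Keeping the real-vs-mock decision inside EmbeddingService means RAGService needs no embedding helpers of its own.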


Copilot AI left a comment


Pull request overview

This PR implements a real embedding service using the Qwen/Qwen3-Embedding-0.6B model to replace mock embeddings in the RAG pipeline. It introduces a thread-safe singleton service with lazy initialization and integrates it into the existing RAG workflow.

Key changes:

  • New EmbeddingService class with thread-safe singleton pattern and lazy model loading
  • Integration of the embedding service into RAGService, replacing the previous mock embed_query method
  • Addition of force_mock_mode configuration flag for testing without model loading

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 14 comments.

File summary:

  • service/embedding_service.py: New service implementing text embedding with the Qwen model, thread-safe singleton pattern, and mock fallback
  • service/__init__.py: Module initialization exporting the EmbeddingService class
  • rag_service.py: Integration of the embedding service, removal of the mock embed_query method, delegation to EmbeddingService
  • config.py: Addition of the force_mock_mode flag for testing configuration


Comment thread service/embedding_service.py Outdated
Comment on lines +26 to +38
        # Simplified singleton pattern - GIL provides basic atomicity
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # Only initialize once
        if not hasattr(self, '_initialized'):
            self._initialized = False
            self._model = None
            self._init_lock = threading.Lock()  # Lock for initialization only

Copilot AI Jan 6, 2026


Race condition in singleton initialization. The __init__ method sets self._initialized = False when the attribute doesn't exist, but this can race with the initialize() method: if two threads call get_embedding_service() concurrently and both enter __init__ before either enters initialize(), one thread can reset state the other is still setting up. Consider setting _initialized in __new__ (under the lock) or using a more robust initialization pattern to ensure thread safety during the initial setup phase.

Suggested change
-        # Simplified singleton pattern - GIL provides basic atomicity
-        if cls._instance is None:
-            with cls._lock:
-                if cls._instance is None:
-                    cls._instance = super().__new__(cls)
-        return cls._instance
-
-    def __init__(self):
-        # Only initialize once
-        if not hasattr(self, '_initialized'):
-            self._initialized = False
-            self._model = None
-            self._init_lock = threading.Lock()  # Lock for initialization only
+        # Thread-safe singleton pattern using double-checked locking
+        if cls._instance is None:
+            with cls._lock:
+                if cls._instance is None:
+                    instance = super().__new__(cls)
+                    # Initialize instance attributes once, under lock, to avoid races
+                    instance._initialized = False
+                    instance._model = None
+                    instance._init_lock = threading.Lock()  # Lock for initialization only
+                    cls._instance = instance
+        return cls._instance
+
+    def __init__(self):
+        # Initialization of singleton state is handled in __new__ to ensure thread safety.
+        # __init__ is intentionally left as a no-op to avoid racing re-initialization.
+        pass

Comment thread service/embedding_service.py Outdated
            logger.info(f"Embedding service ready ({settings.device}, max_seq: {settings.model_max_seq_length})")

        except Exception as e:
            logger.error(f"Failed to initialize embedding service: {e}")

Copilot AI Jan 6, 2026


Missing stack trace in error logging. The error log at line 90 logs the exception message but doesn't include the full stack trace, which makes debugging difficult. Consider using logger.error(f"Failed to initialize embedding service: {e}", exc_info=True) to include the full traceback, consistent with how errors are logged in rag_service.py at line 233.
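As a quick illustration of the reviewer's point, the standalone snippet below (demo code, not the project's) shows that `exc_info=True` attaches the full traceback to the log record, while a plain `logger.error` call records only the message:

```python
import io
import logging

# Route log records into a string buffer so the difference is visible.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
logger = logging.getLogger("embedding_service_demo")
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

try:
    raise RuntimeError("model load failed")
except Exception as e:
    # Without exc_info: only the message reaches the log.
    logger.error(f"Failed to initialize embedding service: {e}")
    # With exc_info=True: the full traceback is appended to the record.
    logger.error(f"Failed to initialize embedding service: {e}", exc_info=True)

output = log_stream.getvalue()
```

Only the second record contains the "Traceback (most recent call last)" section, which is what makes initialization failures debuggable.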

Comment thread service/embedding_service.py Outdated

def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
    if not embedding_service._initialized:

Copilot AI Jan 6, 2026


Direct access to the private attribute _initialized breaks encapsulation. Consider adding a public method like is_initialized() or checking if the service is ready through a public interface instead of accessing _initialized directly.
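One way to follow this suggestion, sketched here with simplified, hypothetical internals rather than the PR's actual class, is a read-only `is_initialized` property:

```python
import threading


class EmbeddingService:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._initialized = False
                    cls._instance = instance
        return cls._instance

    @property
    def is_initialized(self) -> bool:
        """Public, read-only view of initialization state."""
        return self._initialized

    def initialize(self) -> None:
        # Stand-in for real model loading.
        self._initialized = True


def get_embedding_service() -> EmbeddingService:
    service = EmbeddingService()
    if not service.is_initialized:  # public API instead of service._initialized
        service.initialize()
    return service
```

Callers depend only on the public property, so the internal flag can be renamed or replaced without breaking them.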



def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""

Copilot AI Jan 6, 2026


The module defines the helper function get_embedding_service, but service/__init__.py exports only EmbeddingService (the class) in __all__, not get_embedding_service. This is inconsistent: the function exists but isn't part of the public API. Consider either adding get_embedding_service to __all__ in service/__init__.py, or making it clear that the function is for internal use only.

Suggested change
-    """Retrieve the initialized EmbeddingService singleton."""
+    """Retrieve the initialized EmbeddingService singleton.
+
+    Note:
+        This is an internal convenience helper and is not part of the
+        public ``service`` package API (it is not exported via
+        ``service.__all__``). Prefer accessing :class:`EmbeddingService`
+        through the package's documented public interfaces.
+    """

Comment thread service/embedding_service.py Outdated
Comment on lines +180 to +185
# Global singleton instance
embedding_service = EmbeddingService()


def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""

Copilot AI Jan 6, 2026


The global embedding_service instance at line 181 is created at module import time, which can trigger unexpected side effects on import and is inconsistent with the lazy-initialization pattern described in the PR description. Consider deferring instantiation until the service is first requested, or documenting that importing the module creates the instance.

Suggested change
-# Global singleton instance
-embedding_service = EmbeddingService()
-
-def get_embedding_service() -> EmbeddingService:
-    """Retrieve the initialized EmbeddingService singleton."""
+# Global singleton instance (lazily instantiated)
+embedding_service = None
+
+def get_embedding_service() -> EmbeddingService:
+    """Retrieve the initialized EmbeddingService singleton."""
+    global embedding_service
+    if embedding_service is None:
+        embedding_service = EmbeddingService()

Comment thread service/embedding_service.py Outdated
    _lock = threading.Lock()

    def __new__(cls):
        # Simplified singleton pattern - GIL provides basic atomicity

Copilot AI Jan 6, 2026


The comment states "Simplified singleton pattern - GIL provides basic atomicity", but the implementation actually uses proper double-checked locking, which is more robust than relying on the GIL alone. The comment is misleading; update it to reflect the actual implementation.

Suggested change
-        # Simplified singleton pattern - GIL provides basic atomicity
+        # Thread-safe singleton using double-checked locking with a class-level lock

            # Optimize model for inference
            self._model.to(settings.device)
            self._model.eval()
            torch.set_grad_enabled(False)

Copilot AI Jan 6, 2026


Setting torch.set_grad_enabled(False) globally affects the entire process, not just this model. This is a side effect that can interfere with other parts of the application or other models that may need gradients. Use the context manager torch.no_grad() or torch.inference_mode() during actual inference calls instead of setting this globally during initialization. The global setting could break training or fine-tuning workflows if they're added later.

Suggested change
-            torch.set_grad_enabled(False)

Comment on lines +118 to +120
        formatted_text = f"query: {text}"
        if settings.task_description:
            formatted_text = f"{settings.task_description}: {text}"

Copilot AI Jan 6, 2026


The task formatting logic could be clearer. When settings.task_description is set, it replaces the "query:" prefix entirely (line 120); when it is empty or None, the code falls back to "query:" (line 118), so the resulting prefix varies implicitly. The current implementation works, but consider making the intent explicit: use settings.task_description as the prefix when present, otherwise "query".

Suggested change
-        formatted_text = f"query: {text}"
-        if settings.task_description:
-            formatted_text = f"{settings.task_description}: {text}"
+        prefix = settings.task_description if settings.task_description else "query"
+        formatted_text = f"{prefix}: {text}"

Comment thread service/embedding_service.py Outdated
        Returns:
            List[float]: Randomized vector with fixed seed.
        """
        import random

Copilot AI Jan 6, 2026


Importing random inside a function is inefficient and non-idiomatic. The import statement should be moved to the top of the file with the other imports. Local imports are typically only used to avoid circular dependencies or for lazy loading of expensive modules, neither of which applies here.

            self._model[0].max_position_embeddings = settings.model_max_seq_length

            # Optimize model for inference
            self._model.to(settings.device)

Copilot AI Jan 6, 2026


Redundant device placement. The model is already placed on the device via device_map parameter in the SentenceTransformer initialization (line 70), and then explicitly moved again with .to(settings.device) at line 82. This redundant operation is unnecessary and could cause issues or confusion. Remove the redundant .to() call since device placement is already handled by the device_map parameter.

Suggested change
-            self._model.to(settings.device)



Development

Successfully merging this pull request may close these issues.

feat(rag): Create EmbeddingService and integrate it into RAGService

2 participants