Embedding Service Implementation #17
Ricardo1232 wants to merge 2 commits into CodeandoGuadalajara:main
Conversation
Pull request overview
This PR implements a real embedding service using the Qwen/Qwen3-Embedding-0.6B model to replace mock embeddings in the RAG pipeline. It introduces a thread-safe singleton service with lazy initialization and integrates it into the existing RAG workflow.
Key changes:
- New `EmbeddingService` class with thread-safe singleton pattern and lazy model loading
- Integration of the embedding service into `RAGService`, replacing the previous mock `embed_query` method
- Addition of a `force_mock_mode` configuration flag for testing without model loading
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| service/embedding_service.py | New service implementing text embedding with Qwen model, thread-safe singleton pattern, and mock fallback |
| service/__init__.py | Module initialization exporting the EmbeddingService class |
| rag_service.py | Integration of embedding service, removal of mock embed_query method, delegation to EmbeddingService |
| config.py | Addition of force_mock_mode flag for testing configuration |
```python
# Simplified singleton pattern - GIL provides basic atomicity
if cls._instance is None:
    with cls._lock:
        if cls._instance is None:
            cls._instance = super().__new__(cls)
return cls._instance

def __init__(self):
    # Only initialize once
    if not hasattr(self, '_initialized'):
        self._initialized = False
        self._model = None
        self._init_lock = threading.Lock()  # Lock for initialization only
```
Race condition in singleton initialization. The `__init__` method sets `self._initialized = False` if the attribute does not exist, but this can race with the `initialize()` method: if two threads call `get_embedding_service()` concurrently and both enter `__init__` before either enters `initialize()`, one thread can reset state the other is using. Consider setting `_initialized` in `__new__`, or using a more robust initialization pattern, to ensure thread safety during the initial setup phase.
Suggested change:

```python
# Simplified singleton pattern - GIL provides basic atomicity
if cls._instance is None:
    with cls._lock:
        if cls._instance is None:
            cls._instance = super().__new__(cls)
return cls._instance

def __init__(self):
    # Only initialize once
    if not hasattr(self, '_initialized'):
        self._initialized = False
        self._model = None
        self._init_lock = threading.Lock()  # Lock for initialization only
```

becomes:

```python
# Thread-safe singleton pattern using double-checked locking
if cls._instance is None:
    with cls._lock:
        if cls._instance is None:
            instance = super().__new__(cls)
            # Initialize instance attributes once, under lock, to avoid races
            instance._initialized = False
            instance._model = None
            instance._init_lock = threading.Lock()  # Lock for initialization only
            cls._instance = instance
return cls._instance

def __init__(self):
    # Initialization of singleton state is handled in __new__ to ensure thread safety.
    # __init__ is intentionally left as a no-op to avoid racing re-initialization.
    pass
```
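This pattern is easy to sanity-check with a small concurrency exercise. The following is a standalone sketch, not the PR's actual file: the class body mirrors the suggestion above, and the thread race at the bottom is illustrative.

```python
import threading

class EmbeddingService:
    """Sketch of the suggested pattern: all singleton state is assigned
    under the lock in __new__, and __init__ is a deliberate no-op."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Thread-safe singleton using double-checked locking
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance._initialized = False
                    instance._model = None
                    instance._init_lock = threading.Lock()
                    cls._instance = instance
        return cls._instance

    def __init__(self):
        # State lives in __new__; nothing to (re)initialize here.
        pass

# Race many constructors: every thread must observe the same instance.
results = []
threads = [threading.Thread(target=lambda: results.append(EmbeddingService()))
           for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the attributes are assigned before `cls._instance` is published, no thread can ever see a half-built singleton.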
```python
    logger.info(f"Embedding service ready ({settings.device}, max_seq: {settings.model_max_seq_length})")

except Exception as e:
    logger.error(f"Failed to initialize embedding service: {e}")
```
Missing stack trace in error logging. The error log at line 90 records the exception message but not the traceback, which makes debugging difficult. Consider using `logger.error(f"Failed to initialize embedding service: {e}", exc_info=True)` to include the full traceback, consistent with how errors are logged in rag_service.py at line 233.
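As a rough sketch of the suggested change (the `load_model` callable and logger name here are stand-ins for illustration, not the PR's actual code):

```python
import logging

logger = logging.getLogger("embedding_service")

def initialize(load_model):
    """Sketch: log the full traceback on failure via exc_info=True."""
    try:
        model = load_model()
        logger.info("Embedding service ready")
        return model
    except Exception as e:
        # exc_info=True attaches the active exception's traceback to the log record
        logger.error(f"Failed to initialize embedding service: {e}", exc_info=True)
        return None
```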
```python
def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
    if not embedding_service._initialized:
```
Direct access to the private attribute `_initialized` breaks encapsulation. Consider adding a public method such as `is_initialized()`, or checking whether the service is ready through a public interface, instead of reaching into `_initialized` directly.
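One possible shape for such a public interface, sketched here with an assumed property name (`is_initialized`) and an assumed error on premature access; none of this is taken from the PR itself:

```python
import threading

class EmbeddingService:
    """Sketch: expose readiness through a read-only public property."""
    def __init__(self):
        self._initialized = False
        self._model = None
        self._init_lock = threading.Lock()

    @property
    def is_initialized(self) -> bool:
        # Public, read-only view of the private flag
        return self._initialized

embedding_service = EmbeddingService()

def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
    if not embedding_service.is_initialized:
        # Assumed behavior for the sketch; the real service may instead
        # trigger initialization here.
        raise RuntimeError("EmbeddingService has not been initialized")
    return embedding_service
```

A property keeps the call sites as terse as attribute access while leaving the class free to change how readiness is tracked internally.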
```python
def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
```
The module exports the helper function `get_embedding_service`, but `service/__init__.py` lists only `EmbeddingService` (the class) in `__all__`, not `get_embedding_service`. This creates an inconsistency: the function exists but is not part of the public API. Consider either adding `get_embedding_service` to `__all__` in `service/__init__.py`, or making it clear that the function is for internal use only.
Suggested change:

```python
"""Retrieve the initialized EmbeddingService singleton."""
```

becomes:

```python
"""Retrieve the initialized EmbeddingService singleton.

Note:
    This is an internal convenience helper and is not part of the
    public ``service`` package API (it is not exported via
    ``service.__all__``). Prefer accessing :class:`EmbeddingService`
    through the package's documented public interfaces.
"""
```
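If the helper is instead meant to be public, the fix lives on the package side. The fragment below assumes the file layout described in the review and is a sketch, not the PR's actual `__init__.py`:

```python
# service/__init__.py (sketch): export the helper alongside the class
from .embedding_service import EmbeddingService, get_embedding_service

__all__ = ["EmbeddingService", "get_embedding_service"]
```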
```python
# Global singleton instance
embedding_service = EmbeddingService()

def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
```
The global `embedding_service` instance at line 181 is created at module import time, which can cause unexpected side effects. If singleton creation is meant to be deferred until it is actually needed, use lazy instantiation, or document that importing the module creates the instance. The current behavior is inconsistent with the lazy initialization pattern described in the PR description.
Suggested change:

```python
# Global singleton instance
embedding_service = EmbeddingService()

def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
```

becomes:

```python
# Global singleton instance (lazily instantiated)
embedding_service = None

def get_embedding_service() -> EmbeddingService:
    """Retrieve the initialized EmbeddingService singleton."""
    global embedding_service
    if embedding_service is None:
        embedding_service = EmbeddingService()
```
```python
_lock = threading.Lock()

def __new__(cls):
    # Simplified singleton pattern - GIL provides basic atomicity
```
The comment "Simplified singleton pattern - GIL provides basic atomicity" is misleading: the implementation actually uses proper double-checked locking, which is more robust than relying on the GIL alone. Update the comment to describe what the code actually does.
Suggested change:

```python
# Simplified singleton pattern - GIL provides basic atomicity
```

becomes:

```python
# Thread-safe singleton using double-checked locking with a class-level lock
```
```python
# Optimize model for inference
self._model.to(settings.device)
self._model.eval()
torch.set_grad_enabled(False)
```
Setting `torch.set_grad_enabled(False)` globally affects the entire process, not just this model. That side effect can interfere with other parts of the application or other models that need gradients. Use the context manager `torch.no_grad()` or `torch.inference_mode()` around the actual inference calls instead of disabling gradients globally during initialization; the global setting could break training or fine-tuning workflows if they are added later.
Suggested change: remove the line

```python
torch.set_grad_enabled(False)
```
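A scoped alternative could look like the sketch below, which assumes a SentenceTransformer-like `model.encode` API; the `embed` wrapper is a stand-in, not the PR's method:

```python
import torch

def embed(model, formatted_text: str):
    """Sketch: disable gradient tracking only for the duration of the call,
    leaving the process-wide grad mode untouched afterwards."""
    with torch.inference_mode():
        return model.encode(formatted_text)
```

Unlike `torch.set_grad_enabled(False)`, the context manager restores the previous grad mode on exit, so any later training or fine-tuning code in the same process is unaffected.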
```python
formatted_text = f"query: {text}"
if settings.task_description:
    formatted_text = f"{settings.task_description}: {text}"
```
The task-formatting logic is harder to follow than it needs to be. When `settings.task_description` is set, it replaces the `"query:"` prefix entirely (line 120); when it is empty or `None`, the code falls back to `"query:"` (line 118). Consider making the logic explicit: compute the prefix first, then format once. The current implementation works, but the intended behavior should be clearer.
Suggested change:

```python
formatted_text = f"query: {text}"
if settings.task_description:
    formatted_text = f"{settings.task_description}: {text}"
```

becomes:

```python
prefix = settings.task_description if settings.task_description else "query"
formatted_text = f"{prefix}: {text}"
```
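The suggested logic, pulled into a standalone helper for illustration (the function name and the empty-string default are assumptions, not the PR's code):

```python
def format_query(text: str, task_description: str = "") -> str:
    """Sketch of the suggested prefix logic: one format call, explicit fallback."""
    prefix = task_description if task_description else "query"
    return f"{prefix}: {text}"
```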
```python
    Returns:
        List[float]: Randomized vector with fixed seed.
    """
    import random
```
Importing `random` inside a function is inefficient and non-idiomatic. Move the import statement to the top of the file with the other imports. Local imports are typically reserved for avoiding circular dependencies or lazily loading expensive modules, and neither applies to `random`.
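With the import moved to module level, a deterministic mock embedding might look like the sketch below. The dimension, the hash-based seeding scheme, and the function name are all assumptions for illustration, not the PR's implementation:

```python
import hashlib
import random  # module-level, alongside the other imports
from typing import List

EMBEDDING_DIM = 1024  # assumed dimension for this sketch

def mock_embedding(text: str, dim: int = EMBEDDING_DIM) -> List[float]:
    """Randomized vector with a fixed per-text seed: the same input text
    always yields the same vector, without touching the global RNG state."""
    seed = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]
```

Using a private `random.Random` instance instead of seeding the module-level RNG also avoids perturbing any other code that relies on `random`.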
```python
self._model[0].max_position_embeddings = settings.model_max_seq_length

# Optimize model for inference
self._model.to(settings.device)
```
Redundant device placement. The model is already placed on the device via the `device_map` parameter in the `SentenceTransformer` initialization (line 70), then moved again with `.to(settings.device)` at line 82. The second operation is unnecessary and potentially confusing. Remove the redundant `.to()` call, since device placement is already handled by `device_map`.
Suggested change: remove the line

```python
self._model.to(settings.device)
```
Related Issue
Resolves #6
Implementation of `EmbeddingService` with the Qwen/Qwen3-Embedding-0.6B model and integration into `RAGService`. Replaces mocks with real inference optimized for legal documents.

Key Changes
- `service/embedding_service.py`: new embedding service using `torch.inference_mode()` and a production-optimized configuration.
- `rag_service.py`: delegates embedding to `EmbeddingService`.