feat: Integrate OpenSearch as Vector Store #46

Description

@ericksonlopes

Integrate OpenSearch as a new vector database option for the WhatYouSaid ecosystem. OpenSearch is a distributed, open-source search and analytics suite that supports k-NN (k-Nearest Neighbors) search, which makes it a good candidate for enterprise-grade RAG (Retrieval-Augmented Generation) applications. This integration gives users more flexibility to choose a vector store that meets their scalability and high-availability requirements.

Tasks

  • Domain: Add OPENSEARCH = "opensearch" to the VectorStoreType enum in src/domain/entities/enums/vector_store_type_enum.py.
  • Config: Update VectorConfig in src/config/settings.py to include OpenSearch configuration fields (host, port, user, password, use_ssl, verify_certs).
  • Infrastructure/Repositories: Create a new OpenSearch repository implementation in src/infrastructure/repositories/vector/opensearch/.
    • Implement opensearch_client.py for connection management.
    • Implement chunk_repository.py following the IVectorRepository interface.
  • Dependencies: Add opensearch-py to the project dependencies in pyproject.toml.
  • API Dependencies: Update get_vector_repository in src/presentation/api/dependencies.py to instantiate the OpenSearch repository when selected.
  • Frontend: Update UI settings (e.g., SettingsModal.tsx) to allow selecting OpenSearch and configuring its parameters.
  • Documentation: Update .env.example and README.md with OpenSearch setup instructions.
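The Domain, Config, and connection tasks above could be sketched roughly as follows. This is only an illustration under stated assumptions: the existing enum members, the OpenSearchSettings class, its default values, and the opensearch_client_kwargs helper are hypothetical names, not code from the repository. The keyword arguments (hosts, http_auth, use_ssl, verify_certs) are the ones accepted by the opensearch-py OpenSearch client constructor.

```python
from dataclasses import dataclass
from enum import Enum


class VectorStoreType(str, Enum):
    # Existing members are assumed from the issue's mention of Weaviate and Chroma.
    WEAVIATE = "weaviate"
    CHROMA = "chroma"
    OPENSEARCH = "opensearch"  # new member proposed in this issue


@dataclass(frozen=True)
class OpenSearchSettings:
    """Hypothetical grouping of the fields listed in the Config task.

    The defaults are placeholders for local development, not project values.
    """

    host: str = "localhost"
    port: int = 9200
    user: str = "admin"
    password: str = ""
    use_ssl: bool = False
    verify_certs: bool = False


def opensearch_client_kwargs(cfg: OpenSearchSettings) -> dict:
    """Translate settings into keyword arguments for opensearchpy.OpenSearch."""
    return {
        "hosts": [{"host": cfg.host, "port": cfg.port}],
        "http_auth": (cfg.user, cfg.password),
        "use_ssl": cfg.use_ssl,
        "verify_certs": cfg.verify_certs,
    }
```

With opensearch-py installed, opensearch_client.py could then build the client with OpenSearch(**opensearch_client_kwargs(settings)). Keeping the translation in a plain function makes it testable without a running cluster.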

Additional Context

OpenSearch k-NN search requires specific index settings (e.g., "index.knn": "true"). The implementation should create the index with a knn_vector field whose dimension matches the output size of the chosen embedding model. Reference the existing Weaviate and Chroma implementations so that metadata and content are stored and retrieved consistently.
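As a rough sketch of the index requirements described above, the request bodies for index creation and a k-NN query might be built like this. The function names and the field names (embedding, content, metadata) are assumptions chosen for illustration; the actual field layout should follow the existing Weaviate and Chroma repositories.

```python
from typing import Sequence


def build_knn_index_body(dimension: int, vector_field: str = "embedding") -> dict:
    """Return an index body that enables k-NN and maps a knn_vector field.

    `dimension` must equal the embedding model's output size.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                vector_field: {"type": "knn_vector", "dimension": dimension},
                # Assumed fields for chunk text and metadata.
                "content": {"type": "text"},
                "metadata": {"type": "object"},
            }
        },
    }


def build_knn_query(
    vector: Sequence[float], k: int = 5, vector_field: str = "embedding"
) -> dict:
    """Return a search body that asks for the k nearest neighbours of `vector`."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": list(vector), "k": k}}},
    }
```

With an opensearch-py client, these bodies would be passed to client.indices.create(index=..., body=...) and client.search(index=..., body=...). The index name is left open here because the issue does not specify one.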

Metadata

Labels

  • backend: Backend services, API development, and server-side logic related issues
  • vector database: Vector database operations, embeddings, and semantic search implementation issues
