@fzowl fzowl commented Mar 6, 2025

Proposal to introduce multimodal embeddings, with a reference implementation using the VoyageAI multimodal embeddings API.

@fzowl fzowl mentioned this pull request Mar 6, 2025
@justin-cechmanek justin-cechmanek self-assigned this Jun 26, 2025
bsbodden commented Oct 1, 2025

Hi @fzowl,

Thank you for this exploratory work on multimodal embeddings support! The concept of adding multimodal vectorization capabilities to RedisVL is valuable and aligns well with the evolving landscape
of AI applications. We appreciate the effort that went into designing the BaseMultimodalVectorizer class and implementing the VoyageAI multimodal integration.

Why we're closing this PR

Since this PR is marked as DRAFT and assigned to a team member, it appears to be exploratory/proof-of-concept work. However, it has become significantly out of sync with the main branch and would
require substantial rework to be production-ready:

  1. Merge Conflicts with Recent Architectural Changes

The PR cannot be cleanly rebased against main due to conflicts in:

  • redisvl/utils/vectorize/base.py - Recent caching refactor
  • pyproject.toml - Build system and dependency changes

Since this PR was opened, main has undergone significant refactoring including:

  • Introduction of the EmbeddingsCache system
  • New protected _embed/_embed_many methods
  • Switch from Poetry-specific to standard pyproject.toml format
  • Addition of skip_cache parameters and batch caching operations

The multimodal implementation would need to be updated to align with these architectural changes.
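To make the required alignment concrete, here is a toy sketch (hypothetical names, not RedisVL code) of the post-refactor pattern described above: a public `embed()` that owns caching and honors `skip_cache`, delegating provider-specific work to a protected `_embed()` that subclasses override:

```python
from typing import Dict, List


class CachingVectorizer:
    """Toy illustration of the pattern: embed() handles caching,
    _embed() is the provider-specific override point."""

    def __init__(self) -> None:
        # Plain dict as a stand-in for the EmbeddingsCache infrastructure
        self._cache: Dict[str, List[float]] = {}

    def embed(self, text: str, skip_cache: bool = False) -> List[float]:
        if not skip_cache and text in self._cache:
            return self._cache[text]
        vector = self._embed(text)
        if not skip_cache:
            self._cache[text] = vector
        return vector

    def _embed(self, text: str) -> List[float]:
        raise NotImplementedError("subclasses implement provider calls here")


class FakeVectorizer(CachingVectorizer):
    def _embed(self, text: str) -> List[float]:
        # Deterministic stand-in for a real embedding API call
        return [float(len(text)), float(sum(map(ord, text)) % 97)]
```

A multimodal vectorizer built on this shape would only override the protected method, inheriting caching for free instead of reimplementing it.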

  2. Missing Test Coverage

No tests were added for:

  • VoyageAIMultimodalVectorizer functionality
  • BaseMultimodalVectorizer base class
  • Image/URL/mixed content embedding
  • Integration with VoyageAI's multimodal API

For a feature of this scope, comprehensive test coverage is essential before merging.
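As a starting point, tests could follow the shape below (a sketch using a hypothetical fake vectorizer; the real suite would also exercise the actual VoyageAI integration):

```python
from typing import List, Union


class FakeMultimodalVectorizer:
    """Hypothetical stand-in; the real class would call VoyageAI's API."""

    dims = 4

    def embed(self, content: List[Union[str, bytes]]) -> List[float]:
        if not isinstance(content, list) or not content:
            raise TypeError("content must be a non-empty list")
        return [0.0] * self.dims  # provider call would go here


def test_embed_returns_expected_dims():
    vec = FakeMultimodalVectorizer()
    # Mixed content: a caption plus raw image bytes
    assert len(vec.embed(["a caption", b"\x89PNG..."])) == vec.dims


def test_embed_rejects_non_list_input():
    vec = FakeMultimodalVectorizer()
    try:
        vec.embed("not a list")
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for non-list input")
```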

  3. Code Duplication and Architectural Concerns

The BaseMultimodalVectorizer class duplicates significant logic from BaseVectorizer:

  • Validation methods (check_dtype, check_dims)
  • Helper utilities (batchify, _process_embedding)
  • Async fallback patterns

This creates maintenance burden and divergence risk. Consider:

  • Having BaseMultimodalVectorizer inherit from or compose with BaseVectorizer
  • Extracting shared logic into mixins or utility functions
  • Leveraging the new caching infrastructure rather than reimplementing it
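The inheritance option might look like this (a simplified sketch with stand-in helpers, not the actual RedisVL classes): the multimodal class reuses the shared validation and batching helpers and overrides only the embedding call.

```python
from typing import List


class BaseVectorizerSketch:
    """Simplified stand-in for BaseVectorizer's shared helpers."""

    def batchify(self, items: List, batch_size: int) -> List[List]:
        return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

    def check_dims(self, vector: List[float], expected: int) -> None:
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dims, got {len(vector)}")


class MultimodalVectorizerSketch(BaseVectorizerSketch):
    """Inherits shared validation/batching instead of duplicating it;
    only the provider-specific embedding logic differs."""

    dims = 3

    def embed(self, content: List) -> List[float]:
        vector = [0.0] * self.dims  # provider call would go here
        self.check_dims(vector, self.dims)
        return vector
```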

  4. API Design Considerations

The current API has some inconsistencies that would benefit from design review:

Signature Incompatibility:

Text vectorizer:

```python
def embed(self, text: str, ...) -> Union[List[float], bytes]
```

Multimodal vectorizer:

```python
def embed(self, content: List[Union[str, HttpUrl, Image]], ...) -> Union[List[float], bytes]
```

The multimodal embed() takes a list while text vectorizers take a single item. This breaks interoperability and the Liskov Substitution Principle.
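One way to restore substitutability (a hypothetical sketch, not a design from this PR) is to keep `embed()` single-item, mirroring the text vectorizers, and reserve batching for `embed_many()`:

```python
from typing import List, Union

# Hypothetical content type; the real code uses str | HttpUrl | Image
ContentItem = Union[str, bytes]


class MultimodalVectorizerAPISketch:
    """LSP-friendly shape: embed() takes ONE item and returns one
    vector, exactly like the text vectorizers."""

    def embed(self, item: ContentItem) -> List[float]:
        return [float(len(item))]  # stand-in for a real provider call

    def embed_many(self, items: List[ContentItem]) -> List[List[float]]:
        return [self.embed(i) for i in items]
```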

Confusing embed_many Signature:

```python
def embed_many(
    self,
    contents: List[List[Union[str, HttpUrl, Image]]],  # list of lists of mixed types
    ...
)
```

This nested list structure could be simplified for better usability.
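For example (a hypothetical alternative, not part of this PR), an explicit document container would let callers pass a flat list while still grouping the parts of each multimodal document:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultimodalDocument:
    """Hypothetical container: groups the parts of one document explicitly,
    so embed_many can take a flat List[MultimodalDocument] instead of
    List[List[Union[str, HttpUrl, Image]]]."""

    texts: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)


def embed_many(docs: List[MultimodalDocument]) -> List[List[float]]:
    # One vector per document; a real implementation would call the provider
    return [[float(len(d.texts) + len(d.image_urls)), 0.0] for d in docs]
```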

  5. Implementation Details

Some smaller issues that would need addressing:

  • Typo in directory name: `multimidal/` should be `multimodal/`
  • Missing exports: the new vectorizer is not exposed in `__init__.py`
  • Inaccurate error message: line 285 says "Must pass in a list of str values" but the method also accepts images and URLs
  • Enum not updated: the `Vectorizers` enum doesn't include a multimodal type

What would be needed for future multimodal support

If we revisit multimodal embeddings in the future, here's what would make it production-ready:

Must-Have:

  1. ✅ Rebase against latest main and resolve all conflicts
  2. ✅ Comprehensive test suite including unit and integration tests
  3. ✅ Integrate with caching system - leverage the new EmbeddingsCache infrastructure
  4. ✅ Fix implementation details (typos, imports, error messages)
  5. ✅ Documentation with clear usage examples

Should-Have:

  1. 📐 Architectural alignment - reduce code duplication, possibly through inheritance or composition
  2. 🎨 API design review - ensure signatures are intuitive and consistent with existing patterns
  3. 📚 Update documentation - explain multimodal support in main docs

Despite these issues, this PR demonstrates a clear vision for where RedisVL should go with multimodal support. These contributions are valuable even if the code itself can't be merged as-is. Thank you again; we look forward to seeing multimodal embeddings support land in a future iteration.

@bsbodden bsbodden closed this Oct 1, 2025