@fzowl fzowl commented Mar 6, 2025

Proposal to introduce multimodal embeddings, with a reference implementation using the VoyageAI multimodal embeddings API.

@fzowl fzowl mentioned this pull request Mar 6, 2025
@justin-cechmanek justin-cechmanek self-assigned this Jun 26, 2025
bsbodden commented Oct 1, 2025

Hi @fzowl,

Thank you for this exploratory work on multimodal embeddings support! The concept of adding multimodal vectorization capabilities to RedisVL is valuable and aligns well with the evolving landscape
of AI applications. We appreciate the effort that went into designing the BaseMultimodalVectorizer class and implementing the VoyageAI multimodal integration.

Why we're closing this PR

Since this PR is marked as DRAFT and assigned to a team member, it appears to be exploratory/proof-of-concept work. However, it has become significantly out of sync with the main branch and would
require substantial rework to be production-ready:

  1. Merge Conflicts with Recent Architectural Changes

The PR cannot be cleanly rebased against main due to conflicts in:

  • redisvl/utils/vectorize/base.py - Recent caching refactor
  • pyproject.toml - Build system and dependency changes

Since this PR was opened, main has undergone significant refactoring including:

  • Introduction of the EmbeddingsCache system
  • New protected _embed/_embed_many methods
  • Switch from Poetry-specific to standard pyproject.toml format
  • Addition of skip_cache parameters and batch caching operations

The multimodal implementation would need to be updated to align with these architectural changes.
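To make the required alignment concrete, here is a toy sketch (hypothetical names, not RedisVL code) of the post-refactor pattern described above: a public `embed()` that owns caching and honors `skip_cache`, delegating provider-specific work to a protected `_embed()` that subclasses override:

```python
from typing import Dict, List


class CachingVectorizer:
    """Toy illustration of the pattern: embed() handles caching,
    _embed() is the provider-specific override point."""

    def __init__(self) -> None:
        # Plain dict as a stand-in for the EmbeddingsCache infrastructure
        self._cache: Dict[str, List[float]] = {}

    def embed(self, text: str, skip_cache: bool = False) -> List[float]:
        if not skip_cache and text in self._cache:
            return self._cache[text]
        vector = self._embed(text)
        if not skip_cache:
            self._cache[text] = vector
        return vector

    def _embed(self, text: str) -> List[float]:
        raise NotImplementedError("subclasses implement provider calls here")


class FakeVectorizer(CachingVectorizer):
    def _embed(self, text: str) -> List[float]:
        # Deterministic stand-in for a real embedding API call
        return [float(len(text)), float(sum(map(ord, text)) % 97)]
```

A multimodal vectorizer built on this shape would only override the protected method, inheriting caching for free instead of reimplementing it.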

  2. Missing Test Coverage

No tests were added for:

  • VoyageAIMultimodalVectorizer functionality
  • BaseMultimodalVectorizer base class
  • Image/URL/mixed content embedding
  • Integration with VoyageAI's multimodal API

For a feature of this scope, comprehensive test coverage is essential before merging.
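As a starting point, tests could follow the shape below (a sketch using a hypothetical fake vectorizer; the real suite would also exercise the actual VoyageAI integration):

```python
from typing import List, Union


class FakeMultimodalVectorizer:
    """Hypothetical stand-in; the real class would call VoyageAI's API."""

    dims = 4

    def embed(self, content: List[Union[str, bytes]]) -> List[float]:
        if not isinstance(content, list) or not content:
            raise TypeError("content must be a non-empty list")
        return [0.0] * self.dims  # provider call would go here


def test_embed_returns_expected_dims():
    vec = FakeMultimodalVectorizer()
    # Mixed content: a caption plus raw image bytes
    assert len(vec.embed(["a caption", b"\x89PNG..."])) == vec.dims


def test_embed_rejects_non_list_input():
    vec = FakeMultimodalVectorizer()
    try:
        vec.embed("not a list")
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for non-list input")
```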

  3. Code Duplication and Architectural Concerns

The BaseMultimodalVectorizer class duplicates significant logic from BaseVectorizer:

  • Validation methods (check_dtype, check_dims)
  • Helper utilities (batchify, _process_embedding)
  • Async fallback patterns

This creates maintenance burden and divergence risk. Consider:

  • Having BaseMultimodalVectorizer inherit from or compose with BaseVectorizer
  • Extracting shared logic into mixins or utility functions
  • Leveraging the new caching infrastructure rather than reimplementing it
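The inheritance option might look like this (a simplified sketch with stand-in helpers, not the actual RedisVL classes): the multimodal class reuses the shared validation and batching helpers and overrides only the embedding call.

```python
from typing import List


class BaseVectorizerSketch:
    """Simplified stand-in for BaseVectorizer's shared helpers."""

    def batchify(self, items: List, batch_size: int) -> List[List]:
        return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]

    def check_dims(self, vector: List[float], expected: int) -> None:
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dims, got {len(vector)}")


class MultimodalVectorizerSketch(BaseVectorizerSketch):
    """Inherits shared validation/batching instead of duplicating it;
    only the provider-specific embedding logic differs."""

    dims = 3

    def embed(self, content: List) -> List[float]:
        vector = [0.0] * self.dims  # provider call would go here
        self.check_dims(vector, self.dims)
        return vector
```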

  4. API Design Considerations

The current API has some inconsistencies that would benefit from design review:

Signature Incompatibility:

Text vectorizer:

```python
def embed(self, text: str, ...) -> Union[List[float], bytes]
```

Multimodal vectorizer:

```python
def embed(self, content: List[Union[str, HttpUrl, Image]], ...) -> Union[List[float], bytes]
```

The multimodal embed() takes a list while text vectorizers take a single item. This breaks interoperability and the Liskov Substitution Principle.
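One way to restore substitutability (a hypothetical sketch, not a design from this PR) is to keep `embed()` single-item, mirroring the text vectorizers, and reserve batching for `embed_many()`:

```python
from typing import List, Union

# Hypothetical content type; the real code uses str | HttpUrl | Image
ContentItem = Union[str, bytes]


class MultimodalVectorizerAPISketch:
    """LSP-friendly shape: embed() takes ONE item and returns one
    vector, exactly like the text vectorizers."""

    def embed(self, item: ContentItem) -> List[float]:
        return [float(len(item))]  # stand-in for a real provider call

    def embed_many(self, items: List[ContentItem]) -> List[List[float]]:
        return [self.embed(i) for i in items]
```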

Confusing embed_many Signature:

```python
def embed_many(
    self,
    contents: List[List[Union[str, HttpUrl, Image]]],  # list of lists of mixed types
    ...
)
```

This nested list structure could be simplified for better usability.
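For example (a hypothetical alternative, not part of this PR), an explicit document container would let callers pass a flat list while still grouping the parts of each multimodal document:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MultimodalDocument:
    """Hypothetical container: groups the parts of one document explicitly,
    so embed_many can take a flat List[MultimodalDocument] instead of
    List[List[Union[str, HttpUrl, Image]]]."""

    texts: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)


def embed_many(docs: List[MultimodalDocument]) -> List[List[float]]:
    # One vector per document; a real implementation would call the provider
    return [[float(len(d.texts) + len(d.image_urls)), 0.0] for d in docs]
```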

  5. Implementation Details

Some smaller issues that would need addressing:

  • Typo in directory name: `multimidal/` should be `multimodal/`
  • Missing exports: the new vectorizer is not exposed in `__init__.py`
  • Inaccurate error message: line 285 says "Must pass in a list of str values" but the method also accepts images and URLs
  • Enum not updated: the `Vectorizers` enum doesn't include a multimodal type

What would be needed for future multimodal support

If we revisit multimodal embeddings in the future, here's what would make it production-ready:

Must-Have:

  1. ✅ Rebase against latest main and resolve all conflicts
  2. ✅ Comprehensive test suite including unit and integration tests
  3. ✅ Integrate with caching system - leverage the new EmbeddingsCache infrastructure
  4. ✅ Fix implementation details (typos, imports, error messages)
  5. ✅ Documentation with clear usage examples

Should-Have:

  1. 📐 Architectural alignment - reduce code duplication, possibly through inheritance or composition
  2. 🎨 API design review - ensure signatures are intuitive and consistent with existing patterns
  3. 📚 Update documentation - explain multimodal support in main docs

Despite these issues, this PR demonstrates a clear vision for where RedisVL should go with multimodal support. These contributions are valuable even if the code itself can't be merged as-is. Thank you again; we look forward to seeing multimodal embeddings support land in a future iteration.

@bsbodden bsbodden closed this Oct 1, 2025