Proposal: multimodal embeddings #294
Conversation
Commits:
- Reformatting
- Adding VoyageAI to enum
Hi @fzowl,

Thank you for this exploratory work on multimodal embeddings support! The concept of adding multimodal vectorization capabilities to RedisVL is valuable and aligns well with the evolving landscape of embedding models.

### Why we're closing this PR

Since this PR is marked as DRAFT and assigned to a team member, it appears to be exploratory/proof-of-concept work. However, it has become significantly out of sync with the main branch and would require substantial rework before it could be merged.
**Merge conflicts.** The PR can no longer be cleanly rebased against main.
**Drift from main.** Since this PR was opened, main has undergone significant refactoring.
The multimodal implementation would need to be updated to align with these architectural changes.
**Missing tests.** No tests were added for the new multimodal vectorizer functionality.
For a feature of this scope, comprehensive test coverage is essential before merging.
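For illustration, here is a minimal sketch of the kind of baseline coverage we'd look for; the class name, import path, and constructor arguments are hypothetical placeholders, not the PR's actual API:

```python
# Sketch of baseline test coverage. The vectorizer class, its import path,
# and its constructor arguments are hypothetical placeholders.
import pytest

from redisvl.utils.vectorize import VoyageAIMultimodalVectorizer  # hypothetical


@pytest.fixture
def vectorizer():
    return VoyageAIMultimodalVectorizer(model="voyage-multimodal-3")


def test_embed_returns_float_vector(vectorizer):
    # The PR's multimodal embed() takes a list of content items.
    vec = vectorizer.embed(["a caption describing an image"])
    assert isinstance(vec, list)
    assert len(vec) == vectorizer.dims
    assert all(isinstance(x, float) for x in vec)


def test_embed_as_buffer_returns_bytes(vectorizer):
    buf = vectorizer.embed(["a caption"], as_buffer=True)
    assert isinstance(buf, bytes)
```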
**Code duplication.** The `BaseMultimodalVectorizer` class duplicates significant logic from `BaseVectorizer`. This creates maintenance burden and divergence risk. Consider extracting the shared behavior into a common base class or mixin that both vectorizer types inherit; one possible shape is sketched below.
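As a sketch only (class names are illustrative, not a committed design), the shared `embed`/`embed_many`/buffer-conversion plumbing could live once in an abstract base, with subclasses supplying only the provider-specific call:

```python
# Illustrative deduplication sketch; class names do not reflect RedisVL's
# actual hierarchy.
import array
from abc import ABC, abstractmethod
from typing import Any, List, Union


class SharedVectorizerBase(ABC):
    """Logic common to text and multimodal vectorizers lives here once."""

    def __init__(self, model: str):
        self.model = model

    @abstractmethod
    def _embed_one(self, item: Any) -> List[float]:
        """Subclasses implement the provider-specific embedding call."""

    def embed(self, item: Any, as_buffer: bool = False) -> Union[List[float], bytes]:
        vector = self._embed_one(item)
        if as_buffer:
            # Pack floats into the byte format Redis expects for vector fields.
            return array.array("f", vector).tobytes()
        return vector

    def embed_many(
        self, items: List[Any], as_buffer: bool = False
    ) -> List[Union[List[float], bytes]]:
        return [self.embed(item, as_buffer=as_buffer) for item in items]


class TextVectorizer(SharedVectorizerBase):
    def _embed_one(self, item: str) -> List[float]:
        raise NotImplementedError("call the text embedding provider here")


class MultimodalVectorizer(SharedVectorizerBase):
    def _embed_one(self, item: Any) -> List[float]:
        raise NotImplementedError("call the multimodal embedding provider here")
```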
**API design.** The current API has some inconsistencies that would benefit from design review.

*Signature incompatibility.* Text vectorizers declare:

```python
def embed(self, text: str, ...) -> Union[List[float], bytes]
```

while the multimodal vectorizer declares:

```python
def embed(self, content: List[Union[str, HttpUrl, Image]], ...) -> Union[List[float], bytes]
```

The multimodal `embed()` takes a list while text vectorizers take a single item. This breaks interoperability and the Liskov Substitution Principle.

*Confusing `embed_many` signature.* The nested list structure could be simplified for better usability; a signature-compatible alternative is sketched below.
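One way to restore substitutability, sketched here with placeholder types: let the multimodal `embed()` take a single content item, exactly like the text API, and keep batching in `embed_many()`:

```python
# Signature-compatibility sketch; HttpUrl and Image are placeholders for
# whatever content types the final design adopts.
from typing import List, Union

Content = Union[str, "HttpUrl", "Image"]  # one item, not a list of items


class MultimodalVectorizerSketch:
    def embed(self, content: Content, as_buffer: bool = False) -> Union[List[float], bytes]:
        """Embed a single piece of content -- the same shape as the text API."""
        raise NotImplementedError

    def embed_many(
        self, contents: List[Content], as_buffer: bool = False
    ) -> List[Union[List[float], bytes]]:
        """Batching lives here, so callers never need nested lists."""
        return [self.embed(c, as_buffer=as_buffer) for c in contents]
```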
**Minor issues.** Some smaller issues would also need addressing.
### What would be needed for future multimodal support

If we revisit multimodal embeddings in the future, here's what would make it production-ready.

Must-Have:
Should-Have:
Despite these issues, this PR demonstrates a clear vision for where RedisVL should go with multimodal support. These contributions are valuable even if the code itself can't be merged as-is. Thank you again, and we look forward to seeing multimodal embeddings support land in a future iteration.
Proposal to introduce multimodal embeddings, with a reference implementation using the VoyageAI multimodal embeddings API.
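For context, here is a minimal sketch of calling the VoyageAI multimodal embeddings API directly with its Python client; the model name and call shape follow VoyageAI's published client as we understand it, so verify against the current docs before relying on this:

```python
# Direct VoyageAI multimodal call, as a sketch; verify model names and
# client usage against VoyageAI's current documentation.
import voyageai
from PIL import Image

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

image = Image.open("example.jpg")  # any local image file

# Each inner list is one multimodal input mixing text and images -- the
# nested-list structure the review comment above refers to.
result = vo.multimodal_embed(
    inputs=[["a photo of a sunset over the ocean", image]],
    model="voyage-multimodal-3",
)

embedding = result.embeddings[0]
print(len(embedding))  # vector dimensionality
```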