feat(embedding): surface non-symmetric embedding config for VikingDB provider #1110
mvanhorn wants to merge 1 commit into volcengine:main
…provider

VikingDB embedders accepted is_query but ignored it. Now VikingDBDenseEmbedder and VikingDBHybridEmbedder accept query_param/document_param and pass input_type to the API when non-symmetric mode is configured.

- Add query_param/document_param to VikingDB Dense and Hybrid constructors
- Add _resolve_input_type() to select query vs document param
- Pass input_type in _call_api data items when set
- Wire factory entries to pass config params through
- Sparse embedder unchanged (sparse models are symmetric)

Closes volcengine#655

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
21f8fa9 to 91283b1
|
I think it might be more valuable to configure queries and documents as two separate models. |
|
The current implementation follows the Are you thinking of a config where query and document each point to distinct VikingDB model names? Something like:

[embedding.dense]
provider = "vikingdb"
query_model = "bge-m3-query"
document_model = "bge-m3-passage"

Happy to restructure if that's the direction you prefer. |
|
need to fix |
|
Could you clarify what needs fixing? Happy to update the implementation if there are specific changes you'd like to see. |
sorry, I mean the conflicts. Thanks |
Problem Statement
Non-symmetric embedding uses different representations for queries vs documents, improving retrieval quality for models that support it. The OpenAI, Gemini, Jina, and Minimax embedders already support query_param/document_param in ov.conf. The VikingDB embedder accepts is_query but ignores it -- all calls use symmetric mode regardless of config.

Closes #655.
Changes
- VikingDBDenseEmbedder: accept query_param/document_param, pass input_type to API data items
- VikingDBHybridEmbedder: same treatment
- VikingDBClientMixin._call_api(): accept optional input_type, add to request data items when set
- Factory entries: pass query_param/document_param from config to VikingDB embedder constructors

Config Example
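As a rough illustration of what such a config does at call time, the sketch below assumes query_param is set to "query" and document_param to "passage", the values implied by the request behavior described in this PR. The class name SketchEmbedder is hypothetical, and the method body is an assumption about how _resolve_input_type() could work, not the code in this PR.

```python
from typing import Optional


class SketchEmbedder:
    """Hypothetical stand-in for a VikingDB embedder; not the PR's actual class."""

    def __init__(
        self,
        query_param: Optional[str] = None,
        document_param: Optional[str] = None,
    ) -> None:
        self.query_param = query_param
        self.document_param = document_param

    def _resolve_input_type(self, is_query: bool) -> Optional[str]:
        # Symmetric mode: nothing configured, so no input_type is sent.
        if self.query_param is None and self.document_param is None:
            return None
        # Non-symmetric mode: pick the param that matches the call direction.
        return self.query_param if is_query else self.document_param


# Assumed config values; an unconfigured embedder stays symmetric.
symmetric = SketchEmbedder()
asymmetric = SketchEmbedder(query_param="query", document_param="passage")
```

With these assumed values, asymmetric._resolve_input_type(True) selects the query param and asymmetric._resolve_input_type(False) selects the document param, while the unconfigured embedder returns None in both cases.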
When configured, retrieval calls embed(text, is_query=True), which passes input_type=query in the API request. Indexing calls embed(text, is_query=False), which passes input_type=passage.

When not configured, behavior is unchanged (symmetric mode, no input_type in request).

Testing
4 unit tests:
- _resolve_input_type returns None in symmetric mode
- _resolve_input_type returns correct param for query vs document

Implementation Notes
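One way to picture the "add input_type to request data items when set" behavior is the sketch below. The helper name build_data_items and the item field names "data_type" and "text" are assumptions for illustration; only the input_type key and the "only when set" rule come from this PR.

```python
from typing import Any, Dict, List, Optional


def build_data_items(
    texts: List[str], input_type: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one request item per text, input_type only when set."""
    items: List[Dict[str, Any]] = []
    for text in texts:
        # Field names here are assumed, not taken from the VikingDB API docs.
        item: Dict[str, Any] = {"data_type": "text", "text": text}
        if input_type is not None:
            # Non-symmetric mode: tag the item as query or document input.
            item["input_type"] = input_type
        items.append(item)
    return items
```

Leaving the key out entirely in symmetric mode, rather than sending a null value, keeps the request identical to what was sent before the change.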
- openai_embedders.py:213-216
- _resolve_input_type() is a shared helper on Dense and Hybrid embedders
- HierarchicalRetriever already passes is_query=True for queries (line 132), so this works end-to-end once configured

Feature Area
Retrieval/Search
This contribution was developed with AI assistance (Claude Code).