Labels: enhancement (New feature or request)
Description
🚀 Describe the new functionality needed
Anthropic published Contextual Retrieval and Contextual Preprocessing and I think adding this behavior to the stack would be beneficial (we have several customers that have asked for this to be included in the stack explicitly).
In short, Anthropic recommends using an LLM to summarize a chunk within the broader context of a document before embedding it.
An example:

```
original_chunk = "The company's revenue grew by 3% over the previous quarter."
contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."
```

The `contextualized_chunk` is generated through inference using this prompt:
```
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
```

This extension would require an enhancement to our existing `OpenAIVectorStoreMixin.openai_attach_file_to_vector_store()` behavior, and we could make it a configurable option.
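A minimal sketch of what this preprocessing step could look like. This is a hypothetical helper, not the existing mixin API: the function name `contextualize_chunk` and the `generate` callable (standing in for whatever inference call the stack exposes) are assumptions for illustration.

```python
from typing import Callable

# Prompt template from Anthropic's Contextual Retrieval recipe.
CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk}
</chunk>
Please give a short succinct context to situate this chunk within the \
overall document for the purposes of improving search retrieval of the \
chunk. Answer only with the succinct context and nothing else."""


def contextualize_chunk(
    document: str,
    chunk: str,
    generate: Callable[[str], str],
) -> str:
    """Prepend an LLM-generated situating context to a chunk before embedding.

    `generate` is any callable that sends a prompt to an LLM and returns
    the model's text response.
    """
    prompt = CONTEXT_PROMPT.format(document=document, chunk=chunk)
    context = generate(prompt).strip()
    # Embed (and index) the context together with the original chunk text.
    return f"{context} {chunk}"
```

If this were made configurable on `openai_attach_file_to_vector_store()`, the hook point would be after chunking and before embedding, so each chunk is contextualized against the whole file it came from.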
💡 Why is this needed? What if we don't build it?
Improved context preprocessing and retrieval quality. Without it, the customers who have explicitly asked for contextual retrieval would have to implement this preprocessing themselves outside the stack.
Other thoughts
No response