
Implement Contextual Retrieval and Contextual Preprocessing #4003

@franciscojavierarceo

Description


🚀 Describe the new functionality needed

Anthropic published guidance on Contextual Retrieval and Contextual Preprocessing, and I think adding this behavior to the stack would be beneficial (several customers have explicitly asked for it to be included in the stack).

In short, Anthropic recommends using an LLM to summarize a chunk within the broader context of a document before embedding it.

An example:

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."

The contextualized_chunk is generated through inference and this prompt:

<document> 
{{WHOLE_DOCUMENT}} 
</document> 
Here is the chunk we want to situate within the whole document 
<chunk> 
{{CHUNK_CONTENT}} 
</chunk> 
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else. 
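The flow above can be sketched as a small helper that fills the prompt template, calls a model, and prepends the returned context to the chunk. This is a minimal illustration, not the proposed implementation; `generate` stands in for whatever inference API the stack exposes, and the stub below exists only to make the example runnable.

```python
# Contextual preprocessing sketch: situate each chunk in its document
# before embedding. `generate` is any callable that sends a prompt to an
# LLM and returns the completion text (hypothetical stand-in for the
# stack's inference API).
CONTEXT_PROMPT = """<document>
{whole_document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk_content}
</chunk>
Please give a short succinct context to situate this chunk within the \
overall document for the purposes of improving search retrieval of the \
chunk. Answer only with the succinct context and nothing else."""


def contextualize_chunk(whole_document: str, chunk: str, generate) -> str:
    """Return the chunk prefixed with LLM-generated situating context."""
    prompt = CONTEXT_PROMPT.format(
        whole_document=whole_document, chunk_content=chunk
    )
    context = generate(prompt).strip()
    return f"{context} {chunk}"


# Demo with a stubbed model; a real call would go to an inference provider.
def fake_generate(prompt: str) -> str:
    return "This chunk is from an SEC filing on ACME corp's Q2 2023 performance."


doc = "ACME corp, Form 10-Q, Q2 2023 ..."
chunk = "The company's revenue grew by 3% over the previous quarter."
print(contextualize_chunk(doc, chunk, fake_generate))
```

The contextualized string, not the original chunk, is what would then be passed to the embedding model.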

This extension would require an enhancement to our existing OpenAIVectorStoreMixin.openai_attach_file_to_vector_store() behavior, and we could expose it as a configurable, opt-in option.
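One possible shape for the opt-in knob is sketched below. The field names, the config class, and its placement relative to OpenAIVectorStoreMixin are all assumptions to be settled during implementation; the point is only that contextual preprocessing should default to off and slot in as a step before embedding.

```python
# Hypothetical configuration for opt-in contextual preprocessing.
# Names (ChunkingConfig, contextual_preprocessing, context_model) are
# illustrative, not existing stack APIs.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChunkingConfig:
    contextual_preprocessing: bool = False  # off by default, opt-in
    context_model: Optional[str] = None     # model used to situate chunks


def preprocess_chunks(
    chunks: list[str],
    document: str,
    cfg: ChunkingConfig,
    generate: Callable[[str, str], str],
) -> list[str]:
    """Optionally prepend situating context to each chunk before embedding."""
    if not cfg.contextual_preprocessing:
        return list(chunks)
    return [f"{generate(document, c).strip()} {c}" for c in chunks]


# Demo with a stubbed generator: disabled config passes chunks through
# untouched; enabled config prepends the generated context.
stub = lambda doc, c: f"Context for: {c}"
print(preprocess_chunks(["a", "b"], "doc", ChunkingConfig(), stub))
print(preprocess_chunks(["a"], "doc", ChunkingConfig(contextual_preprocessing=True), stub))
```

Defaulting to off keeps existing attach-file behavior unchanged and avoids the extra inference cost unless the user opts in.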

💡 Why is this needed? What if we don't build it?

Contextual preprocessing grounds each chunk in its document-level context before embedding, which improves retrieval accuracy for chunks that are ambiguous in isolation (like the revenue example above, which never names the company or quarter). If we don't build it, customers who want this behavior have to implement the preprocessing themselves outside the stack.

Other thoughts

No response
