
Implement Contextual Retrieval and Contextual Preprocessing #4003

@franciscojavierarceo

Description


🚀 Describe the new functionality needed

Anthropic published guidance on Contextual Retrieval and Contextual Preprocessing, and I think adding this behavior to the stack would be beneficial (several customers have explicitly asked for it to be included in the stack).

In short, Anthropic recommends using an LLM to summarize a chunk within the broader context of a document before embedding it.

An example:

original_chunk = "The company's revenue grew by 3% over the previous quarter."

contextualized_chunk = "This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter."

The contextualized_chunk is generated through inference and this prompt:

<document> 
{{WHOLE_DOCUMENT}} 
</document> 
Here is the chunk we want to situate within the whole document 
<chunk> 
{{CHUNK_CONTENT}} 
</chunk> 
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else. 
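The flow above can be sketched as a small helper that fills the prompt template, calls a model, and prepends the returned context to the chunk. This is a minimal illustration, not the proposed implementation; `generate` stands in for whatever inference API the stack exposes, and the stub below exists only to make the example runnable.

```python
# Contextual preprocessing sketch: situate each chunk in its document
# before embedding. `generate` is any callable that sends a prompt to an
# LLM and returns the completion text (hypothetical stand-in for the
# stack's inference API).
CONTEXT_PROMPT = """<document>
{whole_document}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{chunk_content}
</chunk>
Please give a short succinct context to situate this chunk within the \
overall document for the purposes of improving search retrieval of the \
chunk. Answer only with the succinct context and nothing else."""


def contextualize_chunk(whole_document: str, chunk: str, generate) -> str:
    """Return the chunk prefixed with LLM-generated situating context."""
    prompt = CONTEXT_PROMPT.format(
        whole_document=whole_document, chunk_content=chunk
    )
    context = generate(prompt).strip()
    return f"{context} {chunk}"


# Demo with a stubbed model; a real call would go to an inference provider.
def fake_generate(prompt: str) -> str:
    return "This chunk is from an SEC filing on ACME corp's Q2 2023 performance."


doc = "ACME corp, Form 10-Q, Q2 2023 ..."
chunk = "The company's revenue grew by 3% over the previous quarter."
print(contextualize_chunk(doc, chunk, fake_generate))
```

The contextualized string, not the original chunk, is what would then be passed to the embedding model.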

This extension would require an enhancement to our existing OpenAIVectorStoreMixin.openai_attach_file_to_vector_store() behavior, and we could expose it as a configurable, opt-in option.
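One possible shape for the opt-in knob is sketched below. The field names, the config class, and its placement relative to OpenAIVectorStoreMixin are all assumptions to be settled during implementation; the point is only that contextual preprocessing should default to off and slot in as a step before embedding.

```python
# Hypothetical configuration for opt-in contextual preprocessing.
# Names (ChunkingConfig, contextual_preprocessing, context_model) are
# illustrative, not existing stack APIs.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChunkingConfig:
    contextual_preprocessing: bool = False  # off by default, opt-in
    context_model: Optional[str] = None     # model used to situate chunks


def preprocess_chunks(
    chunks: list[str],
    document: str,
    cfg: ChunkingConfig,
    generate: Callable[[str, str], str],
) -> list[str]:
    """Optionally prepend situating context to each chunk before embedding."""
    if not cfg.contextual_preprocessing:
        return list(chunks)
    return [f"{generate(document, c).strip()} {c}" for c in chunks]


# Demo with a stubbed generator: disabled config passes chunks through
# untouched; enabled config prepends the generated context.
stub = lambda doc, c: f"Context for: {c}"
print(preprocess_chunks(["a", "b"], "doc", ChunkingConfig(), stub))
print(preprocess_chunks(["a"], "doc", ChunkingConfig(contextual_preprocessing=True), stub))
```

Defaulting to off keeps existing attach-file behavior unchanged and avoids the extra inference cost unless the user opts in.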

💡 Why is this needed? What if we don't build it?

Contextual preprocessing grounds each chunk in its document-level context before embedding, which improves retrieval accuracy for chunks that are ambiguous in isolation (like the revenue example above, which never names the company or quarter). If we don't build it, customers who want this behavior have to implement the preprocessing themselves outside the stack.

Other thoughts

No response
