How to use AzureOpenAITextEmbeddingGenerationService.UpsertBatchAsync #9846
-
My confusion is this: when I break a document down into many small fragments, calling UpsertAsync for each fragment results in an excessive number of database connections. How can I solve this problem?
Replies: 1 comment 1 reply
-
@ordinaryAndConfident when considering a chunking strategy, it's important to strike a balance on size. If you create chunks that are too small, they may be less likely to be found when doing vector searches. If the chunks are too big, they can increase token usage when passed to the LLM for context. The right size will also depend on the type of data that you want to generate embeddings for, so experimenting with different sizes is important.
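On the batching question itself: rather than calling UpsertAsync once per fragment, the legacy `IMemoryStore` abstraction exposes `UpsertBatchAsync`, which accepts a collection of records and stores them in one round trip. The sketch below is a non-authoritative example; the helper name `UpsertChunksAsync` and the `chunk-{i}` id scheme are my own, and you should adapt it to the concrete connector (e.g. Azure AI Search, Qdrant) you are using.

```csharp
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;

public static class BatchIngest
{
    // Hypothetical helper: embed all chunks in one call, then upsert them
    // as a single batch instead of opening a connection per fragment.
    public static async Task UpsertChunksAsync(
        IMemoryStore store,
        ITextEmbeddingGenerationService embeddingService,
        string collection,
        IReadOnlyList<string> chunks)
    {
        // One embedding request for the whole list of chunks.
        var embeddings = await embeddingService.GenerateEmbeddingsAsync(chunks.ToList());

        // Build one MemoryRecord per chunk; the "chunk-{i}" id scheme is
        // illustrative only — use stable ids derived from your documents.
        var records = chunks.Select((text, i) =>
            MemoryRecord.LocalRecord(
                id: $"chunk-{i}",
                text: text,
                description: null,
                embedding: embeddings[i]));

        // Single batched upsert; each yielded id confirms one stored record.
        await foreach (var id in store.UpsertBatchAsync(collection, records))
        {
        }
    }
}
```

Note that these memory abstractions are marked experimental in recent Semantic Kernel releases; if you are on the newer vector store abstractions, the equivalent batch operation lives on the record collection type instead.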