[Bug]: AzureCosmosDBNoSqlVectorSearch query error "One of the input values is invalid." #17384

Open
nqtung opened this issue Dec 28, 2024 · 3 comments
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@nqtung
Contributor

nqtung commented Dec 28, 2024

Bug Description

I get the error below when querying an index stored in Azure Cosmos DB NoSQL, following the example at https://docs.llamaindex.ai/en/stable/examples/vector_stores/AzureCosmosDBNoSqlDemo/

The error message:

CosmosHttpResponseError: (BadRequest) One of the input values is invalid.
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0
Code: BadRequest
Message: One of the input values is invalid.
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0

Package dependencies:

  • llama_index-0.10.68.dist-info
  • llama_index_vector_stores_azurecosmosnosql-1.3.0.dist-info
  • azure_cosmos-4.9.0.dist-info

Version

0.10.68

Steps to Reproduce

Relevant Logs/Tracebacks

{
	"name": "CosmosHttpResponseError",
	"message": "(BadRequest) One of the input values is invalid.\r
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0
Code: BadRequest
Message: One of the input values is invalid.\r
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0",
	"stack": "---------------------------------------------------------------------------
CosmosHttpResponseError                   Traceback (most recent call last)
Cell In[4], line 1
----> 1 response = query_engine.query(\"What is a buffer?\")
      2 print(textwrap.fill(str(response), 100))

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\instrumentation\\dispatcher.py:260, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    252 self.span_enter(
    253     id_=id_,
    254     bound_args=bound_args,
   (...)
    257     tags=tags,
    258 )
    259 try:
--> 260     result = func(*args, **kwargs)
    261 except BaseException as e:
    262     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\base\\base_query_engine.py:52, in BaseQueryEngine.query(self, str_or_query_bundle)
     50     if isinstance(str_or_query_bundle, str):
     51         str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 52     query_result = self._query(str_or_query_bundle)
     53 dispatcher.event(
     54     QueryEndEvent(query=str_or_query_bundle, response=query_result)
     55 )
     56 return query_result

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\instrumentation\\dispatcher.py:260, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    252 self.span_enter(
    253     id_=id_,
    254     bound_args=bound_args,
   (...)
    257     tags=tags,
    258 )
    259 try:
--> 260     result = func(*args, **kwargs)
    261 except BaseException as e:
    262     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\query_engine\\retriever_query_engine.py:189, in RetrieverQueryEngine._query(self, query_bundle)
    185 \"\"\"Answer a query.\"\"\"
    186 with self.callback_manager.event(
    187     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_bundle.query_str}
    188 ) as query_event:
--> 189     nodes = self.retrieve(query_bundle)
    190     response = self._response_synthesizer.synthesize(
    191         query=query_bundle,
    192         nodes=nodes,
    193     )
    194     query_event.on_end(payload={EventPayload.RESPONSE: response})

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\query_engine\\retriever_query_engine.py:144, in RetrieverQueryEngine.retrieve(self, query_bundle)
    143 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
--> 144     nodes = self._retriever.retrieve(query_bundle)
    145     return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\instrumentation\\dispatcher.py:260, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    252 self.span_enter(
    253     id_=id_,
    254     bound_args=bound_args,
   (...)
    257     tags=tags,
    258 )
    259 try:
--> 260     result = func(*args, **kwargs)
    261 except BaseException as e:
    262     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\base\\base_retriever.py:243, in BaseRetriever.retrieve(self, str_or_query_bundle)
    238 with self.callback_manager.as_trace(\"query\"):
    239     with self.callback_manager.event(
    240         CBEventType.RETRIEVE,
    241         payload={EventPayload.QUERY_STR: query_bundle.query_str},
    242     ) as retrieve_event:
--> 243         nodes = self._retrieve(query_bundle)
    244         nodes = self._handle_recursive_retrieval(query_bundle, nodes)
    245         retrieve_event.on_end(
    246             payload={EventPayload.NODES: nodes},
    247         )

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\instrumentation\\dispatcher.py:260, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    252 self.span_enter(
    253     id_=id_,
    254     bound_args=bound_args,
   (...)
    257     tags=tags,
    258 )
    259 try:
--> 260     result = func(*args, **kwargs)
    261 except BaseException as e:
    262     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\indices\\vector_store\\retrievers\\retriever.py:101, in VectorIndexRetriever._retrieve(self, query_bundle)
     95     if query_bundle.embedding is None and len(query_bundle.embedding_strs) > 0:
     96         query_bundle.embedding = (
     97             self._embed_model.get_agg_embedding_from_queries(
     98                 query_bundle.embedding_strs
     99             )
    100         )
--> 101 return self._get_nodes_with_embeddings(query_bundle)

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\core\\indices\\vector_store\\retrievers\\retriever.py:177, in VectorIndexRetriever._get_nodes_with_embeddings(self, query_bundle_with_embeddings)
    173 def _get_nodes_with_embeddings(
    174     self, query_bundle_with_embeddings: QueryBundle
    175 ) -> List[NodeWithScore]:
    176     query = self._build_vector_store_query(query_bundle_with_embeddings)
--> 177     query_result = self._vector_store.query(query, **self._kwargs)
    178     return self._build_node_list_from_query_result(query_result)

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\vector_stores\\azurecosmosnosql\\base.py:393, in AzureCosmosDBNoSqlVectorSearch.query(self, query, **kwargs)
    384 def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    385     \"\"\"Query index for top k most similar nodes.
    386 
    387     Args:
   (...)
    391         A VectorStoreQueryResult containing the results of the query.
    392     \"\"\"
--> 393     return self._query(query, **kwargs)

File ____\\llm2024\\.venv\\Lib\\site-packages\\llama_index\\vector_stores\\azurecosmosnosql\\base.py:363, in AzureCosmosDBNoSqlVectorSearch._query(self, query, **kwargs)
    360 print(f\"_query - query: {query}\")
    361 print(f\"_query - parameters: {parameters}\")
--> 363 for item in self._container.query_items(
    364     query=query,
    365     parameters=parameters,
    366     enable_cross_partition_query=True,
    367 ):
    368     node = metadata_dict_to_node(item[self._metadata_key])
    369     node.set_content(item[self._text_key])

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\core\\paging.py:123, in ItemPaged.__next__(self)
    121 if self._page_iterator is None:
    122     self._page_iterator = itertools.chain.from_iterable(self.by_page())
--> 123 return next(self._page_iterator)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\core\\paging.py:75, in PageIterator.__next__(self)
     73     raise StopIteration(\"End of paging\")
     74 try:
---> 75     self._response = self._get_next(self.continuation_token)
     76 except AzureError as error:
     77     if not error.continuation_token:

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_query_iterable.py:99, in QueryIterable._fetch_next(self, *args)
     89 def _fetch_next(self, *args):  # pylint: disable=unused-argument
     90     \"\"\"Return a block of results with respecting retry policy.
     91 
     92     This method only exists for backward compatibility reasons. (Because
   (...)
     97     :rtype: list
     98     \"\"\"
---> 99     block = self._ex_context.fetch_next_block()
    100     if not block:
    101         raise StopIteration

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\execution_dispatcher.py:131, in _ProxyQueryExecutionContext.fetch_next_block(self)
    129         self._execution_context = self._create_pipelined_execution_context(query_execution_info)
    130     else:
--> 131         raise e
    133 return self._execution_context.fetch_next_block()

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\execution_dispatcher.py:123, in _ProxyQueryExecutionContext.fetch_next_block(self)
    119 # TODO: NEED to change this - make every query retrieve a query plan
    120 # also, we can't have this logic being returned to so often - there should be no need for this
    121 # need to split up query plan logic and actual query iterating logic
    122 try:
--> 123     return self._execution_context.fetch_next_block()
    124 except CosmosHttpResponseError as e:
    125     if _is_partitioned_execution_info(e) or _is_hybrid_search_query(self._query, e):

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:78, in _QueryExecutionContextBase.fetch_next_block(self)
     69 def fetch_next_block(self):
     70     \"\"\"Returns a block of results with respecting retry policy.
     71 
     72     This method only exists for backward compatibility reasons. (Because
   (...)
     76     :rtype: list
     77     \"\"\"
---> 78     self._ensure()
     79     res = list(self._buffer)
     80     self._buffer.clear()

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:63, in _QueryExecutionContextBase._ensure(self)
     60     return
     62 if not self._buffer:
---> 63     results = self._fetch_next_block()
     64     self._buffer.extend(results)
     66 if not self._buffer:

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:163, in _DefaultQueryExecutionContext._fetch_next_block(self)
    161 def _fetch_next_block(self):  # pylint: disable=inconsistent-return-statements
    162     while super(_DefaultQueryExecutionContext, self)._has_more_pages() and not self._buffer:
--> 163         return self._fetch_items_helper_with_retries(self._fetch_function)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:135, in _QueryExecutionContextBase._fetch_items_helper_with_retries(self, fetch_function)
    132 def callback():
    133     return self._fetch_items_helper_no_retries(fetch_function)
--> 135 return _retry_utility.Execute(self._client, self._client._global_endpoint_manager, callback)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_retry_utility.py:99, in Execute(client, global_endpoint_manager, function, *args, **kwargs)
     97     result = ExecuteFunction(function, global_endpoint_manager, *args, **kwargs)
     98 else:
---> 99     result = ExecuteFunction(function, *args, **kwargs)
    100 if not client.last_response_headers:
    101     client.last_response_headers = {}

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_retry_utility.py:199, in ExecuteFunction(function, *args, **kwargs)
    192 def ExecuteFunction(function, *args, **kwargs):
    193     \"\"\"Stub method so that it can be used for mocking purposes as well.
    194     :param Callable function: the function to execute.
    195     :param list args: the explicit arguments for the function.
    196     :returns: the result of executing the function with the passed in arguments
    197     :rtype: tuple(dict, dict)
    198     \"\"\"
--> 199     return function(*args, **kwargs)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:133, in _QueryExecutionContextBase._fetch_items_helper_with_retries.<locals>.callback()
    132 def callback():
--> 133     return self._fetch_items_helper_no_retries(fetch_function)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_execution_context\\base_execution_context.py:122, in _QueryExecutionContextBase._fetch_items_helper_no_retries(self, fetch_function)
    119 new_options[\"continuation\"] = self._continuation
    121 response_headers = {}
--> 122 (fetched_items, response_headers) = fetch_function(new_options)
    124 continuation_key = http_constants.HttpHeaders.Continuation
    125 self._continuation = response_headers.get(continuation_key)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_cosmos_client_connection.py:1090, in CosmosClientConnection.QueryItems.<locals>.fetch_fn(options)
   1089 def fetch_fn(options: Mapping[str, Any]) -> Tuple[List[Dict[str, Any]], CaseInsensitiveDict]:
-> 1090     return self.__QueryFeed(
   1091             path,
   1092             \"docs\",
   1093             collection_id,
   1094             lambda r: r[\"Documents\"],
   1095             lambda _, b: b,
   1096             query,
   1097             options,
   1098             response_hook=response_hook,
   1099             **kwargs)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_cosmos_client_connection.py:3124, in CosmosClientConnection.__QueryFeed(self, path, resource_type, resource_id, result_fn, create_fn, query, options, partition_key_range_id, response_hook, is_query_plan, **kwargs)
   3121     if results:
   3122         return __GetBodiesFromQueryResult(results), last_response_headers
-> 3124 result, last_response_headers = self.__Post(path, request_params, query, req_headers, **kwargs)
   3125 if last_response_headers.get(http_constants.HttpHeaders.IndexUtilization) is not None:
   3126     INDEX_METRICS_HEADER = http_constants.HttpHeaders.IndexUtilization

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_cosmos_client_connection.py:2837, in CosmosClientConnection.__Post(self, path, request_params, body, req_headers, **kwargs)
   2827 \"\"\"Azure Cosmos 'POST' http request.
   2828 
   2829 :param str path: the url to be used for the request.
   (...)
   2834 :rtype: tuple of (dict, dict)
   2835 \"\"\"
   2836 request = self.pipeline_client.post(url=path, headers=req_headers)
-> 2837 return synchronized_request.SynchronizedRequest(
   2838     client=self,
   2839     request_params=request_params,
   2840     global_endpoint_manager=self._global_endpoint_manager,
   2841     connection_policy=self.connection_policy,
   2842     pipeline_client=self.pipeline_client,
   2843     request=request,
   2844     request_data=body,
   2845     **kwargs
   2846 )

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_synchronized_request.py:214, in SynchronizedRequest(client, request_params, global_endpoint_manager, connection_policy, pipeline_client, request, request_data, **kwargs)
    211     request.headers[http_constants.HttpHeaders.ContentLength] = 0
    213 # Pass _Request function with its parameters to retry_utility's Execute method that wraps the call with retries
--> 214 return _retry_utility.Execute(
    215     client,
    216     global_endpoint_manager,
    217     _Request,
    218     request_params,
    219     connection_policy,
    220     pipeline_client,
    221     request,
    222     **kwargs
    223 )

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_retry_utility.py:97, in Execute(client, global_endpoint_manager, function, *args, **kwargs)
     95 try:
     96     if args:
---> 97         result = ExecuteFunction(function, global_endpoint_manager, *args, **kwargs)
     98     else:
     99         result = ExecuteFunction(function, *args, **kwargs)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_retry_utility.py:199, in ExecuteFunction(function, *args, **kwargs)
    192 def ExecuteFunction(function, *args, **kwargs):
    193     \"\"\"Stub method so that it can be used for mocking purposes as well.
    194     :param Callable function: the function to execute.
    195     :param list args: the explicit arguments for the function.
    196     :returns: the result of executing the function with the passed in arguments
    197     :rtype: tuple(dict, dict)
    198     \"\"\"
--> 199     return function(*args, **kwargs)

File ____\\llm2024\\.venv\\Lib\\site-packages\\azure\\cosmos\\_synchronized_request.py:154, in _Request(global_endpoint_manager, request_params, connection_policy, pipeline_client, request, **kwargs)
    152     raise exceptions.CosmosAccessConditionFailedError(message=data, response=response)
    153 if response.status_code >= 400:
--> 154     raise exceptions.CosmosHttpResponseError(message=data, response=response)
    156 result = None
    157 if data:

CosmosHttpResponseError: (BadRequest) One of the input values is invalid.\r
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0
Code: BadRequest
Message: One of the input values is invalid.\r
ActivityId: 3a2a7521-b72a-4723-bb4a-2fe8d49233c6, Windows/10.0.20348 cosmos-netstandard-sdk/3.18.0"
}
nqtung added the bug and triage labels on Dec 28, 2024

dosubot bot commented Dec 28, 2024

Hey @nqtung! 👋 I'm here to assist you with any bugs, questions, or contributions you have. Let's squash those bugs together!

The error message "CosmosHttpResponseError: (BadRequest) One of the input values is invalid" can be caused by several issues when querying Azure Cosmos DB using the LlamaIndex framework. Here are some common causes and solutions:

  1. Invalid Partition Key: Ensure that the partition key specified in your Cosmos DB setup matches the one used in your query. If the partition key is incorrect or not set, it can lead to this error.

  2. Invalid Item ID: Check that the item ID used in your query is correct and exists in the database. The ID must be correctly formatted and match the expected schema.

  3. Incorrect Container or Database Properties: Verify that properties such as offer_throughput, session_token, etag, and match_condition are correctly set and applicable to your operation.

  4. Invalid JSON Structure: Ensure that the JSON structure of the item being queried matches the expected schema. This includes having all required fields like "id" or "messages" correctly named and formatted.

  5. Exceeding Property Limits: The total size of all properties in an item should not exceed the maximum allowed size for Cosmos DB, which is 2,097,152 bytes.

  6. Minimum Vector Count: If you are using index types like quantizedFlat or diskANN, ensure that you have at least 1,000 vectors indexed, as querying with fewer vectors may lead to this error [1][2].

Make sure to validate and sanitize your data before performing operations, and ensure that all configurations match your Cosmos DB setup.


@logan-markewich
Collaborator

Pretty sure this is fixed, but you'll need to update to the latest version of llama-index and the vector store integration. You are on a fairly old version.
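To confirm which versions are actually installed before and after upgrading, here is a minimal sketch that uses only the standard library. The distribution names are taken from the dist-info entries listed in the report; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names corresponding to the dist-info entries in the report.
PACKAGES = (
    "llama-index",
    "llama-index-vector-stores-azurecosmosnosql",
    "azure-cosmos",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the same virtual environment as the notebook shows whether an upgrade actually took effect there.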

@nqtung
Contributor Author

nqtung commented Dec 28, 2024

I am using the latest version of the integration, llama_index_vector_stores_azurecosmosnosql-1.3.0, and the issue is still there.

The root cause is that the integration builds an incorrect Cosmos DB NoSQL query. I have already fixed this and tested the fix in my project. You can find the changes at https://github.com/run-llama/llama_index/pull/17385/files
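For illustration only, here is a sketch of how a Cosmos DB NoSQL vector query might be assembled so that values travel as query parameters while the property name is validated and written into the query text. This is not the code from the linked pull request. The helper name `build_vector_query`, the default field name `embedding`, and the alias `SimilarityScore` are assumptions; `VectorDistance` is the Cosmos DB system function for vector similarity.

```python
import re
from typing import Any, Dict, List, Tuple

# Accept only plain identifiers, since the name is interpolated into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_vector_query(
    embedding: List[float], top_k: int, embedding_key: str = "embedding"
) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (query, parameters) for a top-k vector similarity search.

    Hypothetical helper: `embedding_key` is assumed to be the name of the
    document property that holds the stored vector.
    """
    if not _IDENTIFIER.match(embedding_key):
        raise ValueError(f"unsafe property name: {embedding_key!r}")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # The property name goes into the text; the vector and k go in as parameters.
    distance = f"VectorDistance(c.{embedding_key}, @embedding)"
    query = (
        f"SELECT TOP @k c.id, {distance} AS SimilarityScore "
        f"FROM c ORDER BY {distance}"
    )
    parameters = [
        {"name": "@k", "value": top_k},
        {"name": "@embedding", "value": list(embedding)},
    ]
    return query, parameters
```

The returned pair has the shape expected by the `query_items(query=..., parameters=..., enable_cross_partition_query=True)` call that appears in the traceback. Printing both before sending them, as the traceback's `_query` already does, makes a malformed query easier to spot.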
