[BUG] After uploading multiple files in "File Collection", the answers obtained by using "Search All" are completely unrelated to the uploaded documents #557

Open
yoyo20010808 opened this issue Dec 7, 2024 · 0 comments
Labels
bug Something isn't working

yoyo20010808 commented Dec 7, 2024

Description

After I upload multiple files to "File Collection", the upload completes without any error. However, when I ask a question through the "Search All" interface, an error may appear in the backend log (see "Logs" for details). The model's response is then completely unrelated to the documents, and the "Information panel" on the right shows "No evidence found".

When I instead use the "Search In File(s)" interface, select only the target file, and ask the same question, the program works normally and the answer is correct. Deleting all files and uploading them again may make the problem disappear.

Reproduction steps

1. On the web page, go to Files → File Collection, choose the file(s), then click "upload and index" (either repeat the upload several times or upload multiple files at once).
2. On the web page, go to Chat → File Collection → Search All, type the question into the chat input, and click "send".
3. At this point the backend app service may report an error, and the response is unrelated to the uploaded documents.

Logs

searching in doc_ids ['585ebf1c-f3b8-4623-b247-e5784466900a', 'a5cd7c88-b26d-47b8-918d-b69092a9642d', '17c10ccb-2119-4b25-8069-252ac93b014d']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters'])
Number of requested results 100 is greater than number of elements in index 76, updating n_results = 76
Exception in thread Thread-3 (query_vectorstore):
Traceback (most recent call last):
  File "/mypath/kotaemon/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/mypath/kotaemon/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mypath/kotaemon/libs/kotaemon/kotaemon/indices/vectorindex.py", line 199, in query_vectorstore
    _, vs_scores, vs_ids = self.vector_store.query(
  File "/mypath/kotaemon/libs/kotaemon/kotaemon/storages/vectorstores/base.py", line 169, in query
    output = self._client.query(
  File "/mypath/kotaemon/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 371, in query
    return self._query(
  File "/mypath/kotaemon/lib/python3.10/site-packages/llama_index/vector_stores/chroma/base.py", line 381, in _query
    results = self._collection.query(
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 221, in query
    query_results = self._client._query(
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/mypath/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/mypath/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/mypath/kotaemon/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/mypath/kotaemon/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/mypath/kotaemon/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/api/segment.py", line 103, in wrapper
    return self._rate_limit_enforcer.rate_limit(func)(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/rate_limit/simple_rate_limit/__init__.py", line 23, in wrapper
    return func(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/api/segment.py", line 718, in _query
    return self._executor.knn(
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/execution/executor/local.py", line 139, in knn
    knns = self._vector_segment(plan.scan.collection).query_vectors(query)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/segment/impl/vector/local_persistent_hnsw.py", line 450, in query_vectors
    hnsw_results = super().query_vectors(hnsw_query)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 146, in wrapper
    return f(*args, **kwargs)
  File "/mypath/kotaemon/lib/python3.10/site-packages/chromadb/segment/impl/vector/local_hnsw.py", line 162, in query_vectors
    result_labels, distances = self._index.knn_query(
RuntimeError: Cannot return the results in a contigious 2D array. Probably ef or M is too small
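
The traceback ends in the HNSW index refusing to return the requested number of neighbours (the request was already reduced from 100 to 76 two lines earlier in the log). As a rough illustration of one possible workaround, and not kotaemon's or Chroma's actual code, the sketch below retries a query with a smaller `k` whenever the index raises `RuntimeError`. The names `query_with_shrinking_k` and `fake_query`, and the limit of 30 reachable neighbours, are hypothetical and exist only for this example.

```python
from typing import Callable, List, Tuple

# A query function takes k and returns (chunk_id, score) pairs.
QueryFn = Callable[[int], List[Tuple[str, float]]]


def query_with_shrinking_k(query_fn: QueryFn, k: int, min_k: int = 1) -> List[Tuple[str, float]]:
    """Call query_fn(k), halving k each time the index raises RuntimeError."""
    while True:
        try:
            return query_fn(k)
        except RuntimeError:
            if k <= min_k:
                raise  # nothing smaller left to try; surface the original error
            k = max(min_k, k // 2)


def fake_query(k: int) -> List[Tuple[str, float]]:
    """Stand-in for the vector store; fails like the log when k is too large."""
    reachable = 30  # hypothetical number of neighbours the index can return
    if k > reachable:
        raise RuntimeError(
            "Cannot return the results in a contigious 2D array. "
            "Probably ef or M is too small"
        )
    return [(f"chunk-{i}", 1.0 - i / 100) for i in range(k)]


results = query_with_shrinking_k(fake_query, k=76)
print(len(results))  # 76 and 38 fail, 19 succeeds
```

This only masks the symptom by returning fewer candidates. The error message itself points at the index's `ef` or `M` settings, so adjusting those may be the more direct fix, though that is not confirmed by anything in this report.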

Browsers

No response

OS

No response

Additional information

No response

yoyo20010808 added the bug label Dec 7, 2024