collection.iterator created with return_properties fails when using properties with object[] dataType #1406

Open
anakin87 opened this issue Nov 11, 2024 · 1 comment


Description

This issue was found in the Haystack integration. Corresponding Haystack issue: deepset-ai/haystack-core-integrations#1172

weaviate-client==4.9.0

When a property with the object[] dataType is present, iterating over a collection.iterator created with return_properties raises an error.

A similar iterator created without return_properties works correctly.

Reproducible code example

import weaviate

# I'm running a Docker instance of Weaviate (1.24.5; same behavior with 1.27.2)

client = weaviate.WeaviateClient(
   connection_params=(weaviate.connect.base.ConnectionParams.from_url(url="http://localhost:8080", grpc_port=50051))
)

client.connect()

DOCUMENT_COLLECTION_PROPERTIES = [
   {"name": "_original_id", "dataType": ["text"]},
   {"name": "content", "dataType": ["text"]},
   {"name": "dataframe", "dataType": ["text"]},
   {"name": "blob_data", "dataType": ["blob"]},
   {"name": "blob_mime_type", "dataType": ["text"]},
   {"name": "score", "dataType": ["number"]},
   # the following properties can be present or not. Weaviate shows the same behavior:
   # documents are correctly written but not correctly returned
   # {
   #         'name': 'mylistofobjects', 'dataType': ['object[]'],
   #         'nestedProperties': [
   #             {'dataType': ['text'], 'name': 'doc_id'},
   #             {'dataType': ['number[]'], 'name': 'range'}],
   # }
]


collection_settings = {
   "class": "Default",
   "invertedIndexConfig": {"indexNullState": True},
   "properties": DOCUMENT_COLLECTION_PROPERTIES,
}


collection = client.collections.create_from_dict(collection_settings)

properties = {
   'content': 'This is a test document',
   'dataframe': None,
   'score': None,
   'mylistofobjects': [{'doc_id': '1', 'range': [1, 2]}],
   '_original_id': '3972bbfa2c09af05a7118ed4233124582a138dd83e3de1db3ff742f810df4c41',
}

collection.data.insert(
   properties=properties,
   vector=[0.1] * 300,
)

# this works and returns all properties except blob properties
# (in this case no blob values are present, but they are not returned even if present)
it = collection.iterator(include_vector=True)
for i in it:
   print(i)

# this fails
it = collection.iterator(include_vector=True, return_properties=["content", "mylistofobjects"])
for i in it:
   print(i)

Error:

  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/collections/grpc/query.py", line 798, in __call
    res = await self._connection.grpc_stub.Search(
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/grpc/aio/_call.py", line 327, in __await__
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "creating primitive value for mylistofobjects: proto: invalid type: []interface {}"
        debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-11-11T17:27:11.360212037+01:00", grpc_status:2, grpc_message:"creating primitive value for mylistofobjects: proto: invalid type: []interface {}"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/newtryweaviate.py", line 55, in <module>
    for i in it:
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/collections/iterator.py", line 59, in __next__
    res = self.__query.fetch_objects(
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/syncify.py", line 23, in sync_method
    return _EventLoopSingleton.get_instance().run_until_complete(
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/event_loop.py", line 40, in run_until_complete
    return fut.result()
  File "/home/anakin87/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/home/anakin87/.pyenv/versions/3.10.13/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/collections/queries/fetch_objects/query.py", line 65, in fetch_objects
    res = await self._query.get(
  File "/home/anakin87/apps/haystack-core-integrations/integrations/weaviate/.hatch/weaviate-haystack/lib/python3.10/site-packages/weaviate/collections/grpc/query.py", line 805, in __call
    raise WeaviateQueryError(str(e), "GRPC search")  # pyright: ignore
weaviate.exceptions.WeaviateQueryError: Query call with protocol GRPC search failed with message <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "creating primitive value for mylistofobjects: proto: invalid type: []interface {}"
        debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-11-11T17:27:11.360212037+01:00", grpc_status:2, grpc_message:"creating primitive value for mylistofobjects: proto: invalid type: []interface {}"}"
>.
anakin87 changed the title from "collection.iterator fails with return_properties when using properties of with object[] dataType" to "collection.iterator fails with return_properties when using properties with object[] dataType" on Nov 11, 2024
anakin87 changed the title from "collection.iterator fails with return_properties when using properties with object[] dataType" to "collection.iterator created with return_properties fails when using properties with object[] dataType" on Nov 11, 2024

tsmith023 (Contributor) commented Nov 14, 2024

Hi @anakin87, thanks for raising this one! The exact syntax that you're attempting here isn't currently supported, but I think it should be. It seems to be an oversight that it was missed.

In short, if you specify the object or object[] property that you want returned then you have to also specify the exact nested properties that you want back. In your case, this would look like:

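import weaviate.classes as wvc

it = collection.iterator(
    include_vector=True,
    return_properties=[
        "content",
        # nested properties of an object / object[] property must be listed explicitly
        wvc.query.QueryNested(name="mylistofobjects", properties=["doc_id", "range"]),
    ],
)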

However, the use-case you show with the Python client appears natural so it is certainly a surprise that it fails with such a cryptic error! I will look into how we can support this in the gRPC API on the server. If we can do it easily, we'll be able to release a patch version to all currently supported minors including the fix 😁
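
Until that is supported, one way to avoid writing the nested names by hand is to derive them from the property definitions. The following is only a sketch under some assumptions: `build_return_properties` and `_plain_nested` are hypothetical helpers (not part of weaviate-client), and the nested property definitions are assumed to be available as dicts in the same shape as DOCUMENT_COLLECTION_PROPERTIES in the report above. By default it emits plain dicts so that it runs without a Weaviate instance.

```python
from typing import Any, Callable, Dict, List, Sequence


def _plain_nested(name: str, properties: List[Any]) -> Dict[str, Any]:
    # Hypothetical stand-in used when no client class is supplied.
    return {"name": name, "properties": properties}


def build_return_properties(
    schema_properties: Sequence[Dict[str, Any]],
    wanted: Sequence[str],
    make_nested: Callable[[str, List[Any]], Any] = _plain_nested,
) -> List[Any]:
    """Expand every requested object / object[] property into an explicit
    nested selection, recursing into deeper levels of nesting.

    Hypothetical helper: raises KeyError if a requested name is not in the schema.
    """
    by_name = {prop["name"]: prop for prop in schema_properties}
    result: List[Any] = []
    for name in wanted:
        nested = by_name[name].get("nestedProperties")
        if nested:
            # Request all nested properties defined for this object property.
            child_names = [child["name"] for child in nested]
            children = build_return_properties(nested, child_names, make_nested)
            result.append(make_nested(name, children))
        else:
            # Primitive properties are requested by name, as before.
            result.append(name)
    return result


schema = [
    {"name": "content", "dataType": ["text"]},
    {
        "name": "mylistofobjects",
        "dataType": ["object[]"],
        "nestedProperties": [
            {"dataType": ["text"], "name": "doc_id"},
            {"dataType": ["number[]"], "name": "range"},
        ],
    },
]

selection = build_return_properties(schema, ["content", "mylistofobjects"])
print(selection)
```

With the client installed, passing `make_nested=lambda name, props: wvc.query.QueryNested(name=name, properties=props)` should produce a list equivalent to the `return_properties` shown above, assuming `import weaviate.classes as wvc`. Note that if the object[] property was created by auto-schema rather than declared up front, its definition would first have to be read back from the server.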
