-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: Pydantic validation error with list-type metadata in vector search (#3797) #4173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Pydantic validation error with list-type metadata in vector search (#3797) #4173
Conversation
mattf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdoern we have inconsistencies in the vector_stores api, what does the compliance tool say about it?
@r-bit-rry thanks for finding this. instead of massaging the types after receiving them, will you fix the api?
as you've pointed out, the correct type information for attributes is -
VectorStoreFileAttributes:
anyOf:
- type: object
description: |
Set of 16 key-value pairs that can be attached to an object. This can be
useful for storing additional information about the object in a structured
format, and querying for objects via API or the dashboard. Keys are strings
with a maximum length of 64 characters. Values are strings with a maximum
length of 512 characters, booleans, or numbers.
maxProperties: 16
propertyNames:
type: string
maxLength: 64
additionalProperties:
anyOf:
- type: string
maxLength: 512
- type: number
- type: boolean
x-oaiTypeLabel: map
- type: 'null'
we're accepting the wrong type in multiple places -
yeah I can, but I did not want to break API backward compatibility, if we are comfortable with that I will. |
|
@mattf @r-bit-rry this PR is adding a check diffing the openai openapi spec against ours, you can see the output here https://github.com/llamastack/llama-stack/actions/runs/19433942649/job/55599692027?pr=3529 of the most recent run has a bunch of warnings/errs for |
ok, reasonable enough. this nicely declares the correct type and is backward compatible. you could close that issue w/ suggestion to do |
|
Hm, stainless builds are failing -- not completely clear if that is due to Cloudflare issues from today, or something from this change. @dgellow could you help? |
|
@ashwinb can we just re-trigger them, I suspect it has to do with the cloudflare outage. |
|
I don't think it is related to cloudflare or github issues from the past days, I can see the first occurrence was ~2 days ago. |
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> I believe that should avoid CI issues seen in #4173. Error we see in Stainless logs: ``` (cannot lock ref 'refs/heads/preview/base/fix/issue-3797-metadata-validation': 'refs/heads/preview/base/fix' exists; cannot create 'refs/heads/preview/base/fix/issue-3797-metadata-validation') ``` The issue is that if a branch `fix` exists, `fix/<whatever>` cannot be created (that's how git refs work unfortunately...). The fix in this PR is to ensure PRs from forks are using the author as a prefix. In addition we will do changes to the Stainless API to return better error messages here, it should have been a 4xx with a meaningful error, not a 500. And we will likely need to delete the `fix` branch. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->
✱ Stainless preview buildsThis PR will update the
|
|
There's an error on the Kotlin SDK generation which is almost certainly a Stainless thing now (cc @dgellow). Things look good on this PR, landing! |
|
@ashwinb Do you mean the failing tests? I don't see a codegen error for any language edit: clarified over discord, there was in fact no error, but the PR comment showed "fatal error" while the builds were queuing (so, more like a UI glitch we will need to iron out) |
Fix for Issue #3797
Problem
Vector store search failed with Pydantic ValidationError when chunk metadata contained list-type values.
Error:
Root Cause:
Chunk.metadataacceptsdict[str, Any](any type allowed)VectorStoreSearchResponse.attributesrequiresdict[str, str | float | bool](primitives only)Solution
Added utility function to filter metadata to primitive types before creating search response.
Impact
Fixed:
tags: ["transformers", "gpu"])Preserved:
VectorStoreContent.metadataAffected:
All vector store providers using
OpenAIVectorStoreMixin: FAISS, Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vecTesting
tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes