fix: Pydantic validation error with list-type metadata in vector search (#3797) #4173

r-bit-rry · 2025-11-17T09:09:49Z

Fix for Issue #3797

Problem

Vector store search failed with Pydantic ValidationError when chunk metadata contained list-type values.

Error:

    ValidationError: 3 validation errors for VectorStoreSearchResponse
    attributes.tags.str: Input should be a valid string
    attributes.tags.float: Input should be a valid number
    attributes.tags.bool: Input should be a valid boolean

Root Cause:

* `Chunk.metadata` accepts `dict[str, Any]` (any type allowed)
* `VectorStoreSearchResponse.attributes` requires `dict[str, str | float | bool]` (primitives only)
* Direct assignment at line 641 caused validation failure for non-primitive types

Solution

Added utility function to filter metadata to primitive types before creating search response.
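
A minimal sketch of what such a filtering step could look like, for readers who have not opened the diff. The function name is inferred from the test name listed under Testing; the body, the `","` separator, and the choice to drop unsupported values are assumptions for illustration, not the code merged in this PR.

```python
from typing import Any, Union

# The value types that the attributes schema accepts.
Primitive = Union[str, float, bool]


def sanitize_metadata_for_attributes(metadata: dict[str, Any]) -> dict[str, Primitive]:
    """Reduce arbitrary chunk metadata to primitive attribute values (illustrative sketch)."""
    sanitized: dict[str, Primitive] = {}
    for key, value in metadata.items():
        if isinstance(value, bool):
            # Check bool first: bool is a subclass of int in Python.
            sanitized[key] = value
        elif isinstance(value, (int, float)):
            sanitized[key] = float(value)
        elif isinstance(value, str):
            sanitized[key] = value
        elif isinstance(value, list):
            # Assumption: lists are joined with "," so they stay searchable as text.
            sanitized[key] = ",".join(str(item) for item in value)
        # Assumption: any other type (dict, None, ...) is dropped.
    return sanitized


attributes = sanitize_metadata_for_attributes(
    {"tags": ["transformers", "gpu"], "score": 3, "nested": {"a": 1}}
)
print(attributes)
```

The untouched metadata dict is still what would be stored alongside the content; only the copy handed to the search response is reduced.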

Impact

Fixed:

* Vector search works with list metadata (e.g., tags: ["transformers", "gpu"])
* Lists become searchable as comma-separated strings
* No ValidationError on search responses

Preserved:

* Full metadata still available in VectorStoreContent.metadata
* No API schema changes
* Backward compatible with existing primitive metadata

Affected:
All vector store providers using OpenAIVectorStoreMixin: FAISS, Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vec

Testing

tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes

mattf

@cdoern we have inconsistencies in the vector_stores api, what does the compliance tool say about it?

@r-bit-rry thanks for finding this. instead of massaging the types after receiving them, will you fix the api?

as you've pointed out, the correct type information for attributes is -

    VectorStoreFileAttributes:
      anyOf:
        - type: object
          description: |
            Set of 16 key-value pairs that can be attached to an object. This can be
            useful for storing additional information about the object in a structured
            format, and querying for objects via API or the dashboard. Keys are strings
            with a maximum length of 64 characters. Values are strings with a maximum
            length of 512 characters, booleans, or numbers.
          maxProperties: 16
          propertyNames:
            type: string
            maxLength: 64
          additionalProperties:
            anyOf:
              - type: string
                maxLength: 512
              - type: number
              - type: boolean
          x-oaiTypeLabel: map
        - type: 'null'

we're accepting the wrong type in multiple places -

* [POST /vector_stores/{id}/files/{fid}](https://github.com/llamastack/llama-stack/blob/main/src/llama_stack_api/vector_io.py#L753)

* [POST /vector_stores/{id}/files](https://github.com/llamastack/llama-stack/blob/main/src/llama_stack_api/vector_io.py#L665)

r-bit-rry · 2025-11-17T14:14:36Z

yeah I can, but I did not want to break API backward compatibility, if we are comfortable with that I will.

cdoern · 2025-11-17T15:59:04Z

@mattf @r-bit-rry this PR is adding a check that diffs the OpenAI OpenAPI spec against ours. You can see the output of the most recent run here: https://github.com/llamastack/llama-stack/actions/runs/19433942649/job/55599692027?pr=3529. The step "Run OpenAPI Breaking Change Diff Against OpenAI API" has a bunch of warnings/errors for vector_stores.

r-bit-rry · 2025-11-17T18:13:41Z

@mattf I introduced changes to the schema and additional validations.
I decided to keep the massaging/sanitization method because it still solves issue #3797, which was about passing a list of strings.
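
For reference, the limits in the `VectorStoreFileAttributes` schema quoted earlier in this thread can be checked with a few lines of plain Python. This is a rough sketch only: `validate_attributes` and the constant names are hypothetical, and the validations added in this PR may well be expressed differently (for example, as Pydantic field constraints).

```python
from typing import Any, Optional

# Limits taken from the VectorStoreFileAttributes schema quoted in this thread.
MAX_PROPERTIES = 16
MAX_KEY_LENGTH = 64
MAX_STRING_LENGTH = 512


def validate_attributes(attributes: Optional[dict[str, Any]]) -> None:
    """Raise ValueError if attributes violate the quoted schema (hypothetical helper)."""
    if attributes is None:
        return  # the schema allows null
    if len(attributes) > MAX_PROPERTIES:
        raise ValueError(f"at most {MAX_PROPERTIES} attributes are allowed")
    for key, value in attributes.items():
        if not isinstance(key, str) or len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"key {key!r} must be a string of at most {MAX_KEY_LENGTH} characters")
        if isinstance(value, (bool, int, float)):
            continue  # booleans and numbers are accepted as-is
        if isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH:
                raise ValueError(f"value for {key!r} exceeds {MAX_STRING_LENGTH} characters")
            continue
        raise ValueError(f"value for {key!r} must be a string, number, or boolean")


validate_attributes({"tags": "tag0,tag1", "score": 0.5, "public": True})  # passes silently
```

A list value such as `{"tags": ["tag0", "tag1"]}` is rejected by this check, which is the case the sanitization step is kept around to handle.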

mattf · 2025-11-18T13:35:48Z

ok, reasonable enough. this nicely declares the correct type and is backward compatible.

you could close that issue w/ suggestion to do {"tags": "tag0,tag1"} and user can split on the output end.
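
A small sketch of that round trip on the user's side. `encode_tags` and `decode_tags` are hypothetical helper names, not part of llama-stack, and the comma check is an added assumption: a tag that itself contains a comma would not survive the split.

```python
def encode_tags(tags: list[str]) -> str:
    """Join tags into one attribute string before attaching it (hypothetical helper)."""
    for tag in tags:
        if "," in tag:
            # A comma inside a tag would make the split on the output end ambiguous.
            raise ValueError(f"tag {tag!r} contains the separator ','")
    return ",".join(tags)


def decode_tags(value: str) -> list[str]:
    """Split the attribute string from a search result back into tags."""
    return value.split(",") if value else []


attributes = {"tags": encode_tags(["tag0", "tag1"])}
print(attributes)                       # {'tags': 'tag0,tag1'}
print(decode_tags(attributes["tags"]))  # ['tag0', 'tag1']
```

Note that the joined string still counts against the 512-character limit on string values in the attributes schema.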

ashwinb · 2025-11-18T20:25:47Z

Hm, stainless builds are failing -- not completely clear if that is due to Cloudflare issues from today, or something from this change. @dgellow could you help?

r-bit-rry · 2025-11-19T09:13:32Z

@ashwinb can we just re-trigger them, I suspect it has to do with the cloudflare outage.
I don't have permission to re-trigger them (without generating some empty commit)

dgellow · 2025-11-19T13:31:04Z

I don't think it is related to cloudflare or github issues from the past days, I can see the first occurrence was ~2 days ago.
I will need to review our logs and will share more details later today.

# What does this PR do?

I believe that should avoid CI issues seen in #4173. Error we see in Stainless logs:

```
(cannot lock ref 'refs/heads/preview/base/fix/issue-3797-metadata-validation': 'refs/heads/preview/base/fix' exists; cannot create 'refs/heads/preview/base/fix/issue-3797-metadata-validation')
```

The issue is that if a branch `fix` exists, `fix/<whatever>` cannot be created (that's how git refs work unfortunately...). The fix in this PR is to ensure PRs from forks are using the author as a prefix.

In addition we will do changes to the Stainless API to return better error messages here, it should have been a 4xx with a meaningful error, not a 500. And we will likely need to delete the `fix` branch.
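
The ref clash described above can be modelled in a few lines: git stores ref names like filesystem paths, so one ref cannot be both a leaf and a directory for another ref. The sketch below is illustrative only. `conflicts_with_existing` and `preview_branch` are hypothetical names, and where the author prefix goes in the branch name is an assumption, not the naming scheme used by the CI change.

```python
from typing import Optional


def conflicts_with_existing(existing_refs: set[str], new_ref: str) -> bool:
    """True if new_ref would need an existing ref to act as a directory, or vice versa."""
    for ref in existing_refs:
        if new_ref.startswith(ref + "/") or ref.startswith(new_ref + "/"):
            return True
    return False


def preview_branch(head_ref: str, fork_author: Optional[str] = None) -> str:
    """Build a preview ref name, optionally prefixed with the fork author (assumed layout)."""
    base = "refs/heads/preview/base"
    if fork_author:
        return f"{base}/{fork_author}/{head_ref}"
    return f"{base}/{head_ref}"


existing = {"refs/heads/preview/base/fix"}
head = "fix/issue-3797-metadata-validation"

print(conflicts_with_existing(existing, preview_branch(head)))              # True
print(conflicts_with_existing(existing, preview_branch(head, "r-bit-rry")))  # False
```

Under this assumed layout the author segment keeps fork branches out of the namespace where the stray `fix` branch lives, so the clash no longer occurs even before that branch is deleted.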

github-actions · 2025-11-19T18:10:07Z

✱ Stainless preview builds

This PR will update the llama-stack-client SDKs with the following commit message.

fix: Pydantic validation error with list-type metadata in vector search (#3797)

⚠️

llama-stack-client-node studio · code

There was a regression in your SDK.
generate ⚠️ → build ✅ → lint ✅ → test ✅
npm install https://pkg.stainless.com/s/llama-stack-client-node/6d128b0a9059602f60aed13f643a1adc65f93f1c/dist.tar.gz

⚠️

llama-stack-client-kotlin studio · code

There was a regression in your SDK.
generate ⚠️ → lint ✅ → test ❗

⚠️

llama-stack-client-python studio · code

There was a regression in your SDK.
generate ⚠️ → build ⏳ → lint ⏳ → test ⏳

⚠️

llama-stack-client-go studio · code

There was a regression in your SDK.
generate ⚠️ → lint ❗ → test ❗
go get github.com/stainless-sdks/llama-stack-client-go@d0dd1c22471bbb6f9912c401921f1a708410574f

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Last updated: 2025-11-19 18:27:06 UTC

ashwinb · 2025-11-19T18:16:26Z

There's an error on the Kotlin SDK generation which is almost certainly a Stainless thing now (cc @dgellow). Things look good on this PR, landing!

dgellow · 2025-11-19T18:34:39Z

@ashwinb Do you mean the failing tests? I don't see a codegen error for any language

edit: clarified over discord, there was in fact no error, but the PR comment showed "fatal error" while the builds were queuing (so, more like a UI glitch we will need to iron out)

fix(3797): sanitize metadata for attributes to avoid silent failure

f4a54b9

r-bit-rry requested review from ashwinb, bbrowning, ehhuang, franciscojavierarceo, hardikjshah, leseb, mattf, raghotham, reluctantfuturist, slekkala1, terrytangyuan and yanxi0830 as code owners November 17, 2025 09:09

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 17, 2025

r-bit-rry changed the title ~~Fix: Pydantic validation error with list-type metadata in vector search (#3797)~~ fix: Pydantic validation error with list-type metadata in vector search (#3797) Nov 17, 2025

minor fix to type declarations

3c672f4

mattf requested changes Nov 17, 2025

View reviewed changes

changes according the the comments

1abb78b

mattf approved these changes Nov 18, 2025

View reviewed changes

r-bit-rry and others added 2 commits November 18, 2025 17:16

Merge branch 'main' into fix/issue-3797-metadata-validation

190083d

Merge branch 'main' into fix/issue-3797-metadata-validation

6e2f3bf

Merge branch 'main' into fix/issue-3797-metadata-validation

aeb3f80

dgellow mentioned this pull request Nov 19, 2025

fix(ci): prefix stainless branches with fork author #4187

Merged

Merge branch 'main' into fix/issue-3797-metadata-validation

0358770

ashwinb merged commit f18870a into llamastack:main Nov 19, 2025
26 of 27 checks passed
