Commit f18870a

r-bit-rry, ashwinb, and franciscojavierarceo authored
fix: Pydantic validation error with list-type metadata in vector search (#3797) (#4173)
# Fix for Issue #3797

## Problem

Vector store search failed with Pydantic ValidationError when chunk metadata contained list-type values.

**Error:**
```
ValidationError: 3 validation errors for VectorStoreSearchResponse
attributes.tags.str: Input should be a valid string
attributes.tags.float: Input should be a valid number
attributes.tags.bool: Input should be a valid boolean
```

**Root Cause:**
- `Chunk.metadata` accepts `dict[str, Any]` (any type allowed)
- `VectorStoreSearchResponse.attributes` requires `dict[str, str | float | bool]` (primitives only)
- Direct assignment at line 641 caused validation failure for non-primitive types

## Solution

Added utility function to filter metadata to primitive types before creating search response.

## Impact

**Fixed:**
- Vector search works with list metadata (e.g., `tags: ["transformers", "gpu"]`)
- Lists become searchable as comma-separated strings
- No ValidationError on search responses

**Preserved:**
- Full metadata still available in `VectorStoreContent.metadata`
- No API schema changes
- Backward compatible with existing primitive metadata

**Affected:** All vector store providers using `OpenAIVectorStoreMixin`: FAISS, Chroma, Qdrant, Milvus, Weaviate, PGVector, SQLite-vec

## Testing

tests/unit/providers/vector_io/test_vector_utils.py::test_sanitize_metadata_for_attributes

---------

Co-authored-by: Ashwin Bharambe <[email protected]>
Co-authored-by: Francisco Arceo <[email protected]>
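
The list handling described under "Impact" can be tried in isolation. The sketch below is a minimal illustration, not the project's code: `join_list` is a hypothetical helper name, and it assumes the same `", "` separator and 512-character cap that the sanitizer in this commit applies to list values.

```python
def join_list(values: list[object], limit: int = 512) -> str:
    """Join list items into one comma-separated string, capped at `limit` chars.

    Hypothetical helper: mirrors the list branch of the sanitizer in this
    commit (stringify each item, join with ", ", truncate to 512 characters).
    """
    joined = ", ".join(str(item) for item in values)
    return joined[:limit]


# String items are joined as-is.
tags = join_list(["transformers", "gpu"])

# Non-string items are stringified with str() before joining.
mixed = join_list([1, 2.5, True])

# Long results are cut at the limit, so trailing items can be lost.
truncated = join_list(["x" * 300, "y" * 300])
```

One consequence worth keeping in mind: the conversion is one-way. An item that itself contains `", "` produces the same string as two separate items, so the original list cannot always be recovered from the attribute value.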
1 parent 1e4e02e commit f18870a

7 files changed: +207 -8 lines changed

client-sdks/stainless/openapi.yml

Lines changed: 13 additions & 1 deletion
@@ -9862,9 +9862,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'
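
To make the new schema constraints concrete, here is a small standard-library sketch that checks a plain dict against them: at most 16 properties, keys of at most 64 characters, and values that are strings of at most 512 characters, numbers, or booleans. `check_attributes` and its message format are hypothetical and are not part of this commit; the commit enforces these limits through Pydantic instead.

```python
from typing import Any

# Limits taken from the schema in the diff above.
MAX_PROPERTIES = 16
MAX_KEY_LENGTH = 64
MAX_STRING_LENGTH = 512


def check_attributes(attributes: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the dict conforms.

    Hypothetical checker for illustration only.
    """
    problems: list[str] = []
    if len(attributes) > MAX_PROPERTIES:
        problems.append(f"too many properties: {len(attributes)} > {MAX_PROPERTIES}")
    for key, value in attributes.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key longer than {MAX_KEY_LENGTH} characters")
        if isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH:
                problems.append(f"{key!r}: string longer than {MAX_STRING_LENGTH} characters")
        elif not isinstance(value, (bool, int, float)):
            # Lists, dicts, None, and other types fall outside string | number | boolean.
            problems.append(f"{key!r}: unsupported type {type(value).__name__}")
    return problems


valid = check_attributes({"model": "granite", "score": 0.95, "active": True})
with_list = check_attributes({"tags": ["a", "b"]})
```

A checker like this only reports problems. The sanitizer added in `vector_io.py` takes the other approach and silently converts, truncates, or drops offending entries.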

docs/static/deprecated-llama-stack-spec.yaml

Lines changed: 13 additions & 1 deletion
@@ -6705,9 +6705,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'

docs/static/experimental-llama-stack-spec.yaml

Lines changed: 13 additions & 1 deletion
@@ -6061,9 +6061,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'

docs/static/llama-stack-spec.yaml

Lines changed: 13 additions & 1 deletion
@@ -8883,9 +8883,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'

docs/static/stainless-llama-stack-spec.yaml

Lines changed: 13 additions & 1 deletion
@@ -9862,9 +9862,21 @@ components:
           title: Object
           default: vector_store.file
         attributes:
-          additionalProperties: true
+          additionalProperties:
+            anyOf:
+            - type: string
+              maxLength: 512
+            - type: number
+            - type: boolean
+            title: string | number | boolean
+          propertyNames:
+            type: string
+            maxLength: 64
           type: object
+          maxProperties: 16
           title: Attributes
+          description: Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format, and querying for objects via API or the dashboard. Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters, booleans, or numbers.
+          x-oaiTypeLabel: map
         chunking_strategy:
           oneOf:
           - $ref: '#/components/schemas/VectorStoreChunkingStrategyAuto'

src/llama_stack_api/vector_io.py

Lines changed: 67 additions & 2 deletions
@@ -11,7 +11,7 @@
 from typing import Annotated, Any, Literal, Protocol, runtime_checkable

 from fastapi import Body, Query
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, field_validator

 from llama_stack_api.common.tracing import telemetry_traceable
 from llama_stack_api.inference import InterleavedContent
@@ -372,6 +372,65 @@ class VectorStoreFileLastError(BaseModel):
 register_schema(VectorStoreFileStatus, name="VectorStoreFileStatus")


+# VectorStoreFileAttributes type with OpenAPI constraints
+VectorStoreFileAttributes = Annotated[
+    dict[str, Annotated[str, Field(max_length=512)] | float | bool],
+    Field(
+        max_length=16,
+        json_schema_extra={
+            "propertyNames": {"type": "string", "maxLength": 64},
+            "x-oaiTypeLabel": "map",
+        },
+        description=(
+            "Set of 16 key-value pairs that can be attached to an object. This can be "
+            "useful for storing additional information about the object in a structured "
+            "format, and querying for objects via API or the dashboard. Keys are strings "
+            "with a maximum length of 64 characters. Values are strings with a maximum "
+            "length of 512 characters, booleans, or numbers."
+        ),
+    ),
+]
+
+
+def _sanitize_vector_store_attributes(metadata: dict[str, Any] | None) -> dict[str, str | float | bool]:
+    """
+    Sanitize metadata to VectorStoreFileAttributes spec (max 16 properties, primitives only).
+
+    Converts dict[str, Any] to dict[str, str | float | bool]:
+    - Preserves: str (truncated to 512 chars), bool, int/float (as float)
+    - Converts: list -> comma-separated string
+    - Filters: dict, None, other types
+    - Enforces: max 16 properties, max 64 char keys, max 512 char string values
+    """
+    if not metadata:
+        return {}
+
+    sanitized: dict[str, str | float | bool] = {}
+    for key, value in metadata.items():
+        # Enforce max 16 properties
+        if len(sanitized) >= 16:
+            break
+
+        # Enforce max 64 char keys
+        if len(key) > 64:
+            continue
+
+        # Convert to supported primitive types
+        if isinstance(value, bool):
+            sanitized[key] = value
+        elif isinstance(value, int | float):
+            sanitized[key] = float(value)
+        elif isinstance(value, str):
+            # Enforce max 512 char string values
+            sanitized[key] = value[:512] if len(value) > 512 else value
+        elif isinstance(value, list):
+            # Convert lists to comma-separated strings (max 512 chars)
+            list_str = ", ".join(str(item) for item in value)
+            sanitized[key] = list_str[:512] if len(list_str) > 512 else list_str
+
+    return sanitized
+
+
 @json_schema_type
 class VectorStoreFileObject(BaseModel):
     """OpenAI Vector Store File object.
@@ -389,14 +448,20 @@ class VectorStoreFileObject(BaseModel):

     id: str
     object: str = "vector_store.file"
-    attributes: dict[str, Any] = Field(default_factory=dict)
+    attributes: VectorStoreFileAttributes = Field(default_factory=dict)
     chunking_strategy: VectorStoreChunkingStrategy
     created_at: int
     last_error: VectorStoreFileLastError | None = None
     status: VectorStoreFileStatus
     usage_bytes: int = 0
     vector_store_id: str

+    @field_validator("attributes", mode="before")
+    @classmethod
+    def _validate_attributes(cls, v: dict[str, Any] | None) -> dict[str, str | float | bool]:
+        """Sanitize attributes to match VectorStoreFileAttributes OpenAPI spec."""
+        return _sanitize_vector_store_attributes(v)
+

 @json_schema_type
 class VectorStoreListFilesResponse(BaseModel):
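
One detail of the sanitizer above is easy to miss: the `bool` branch comes before the `int | float` branch. In Python, `bool` is a subclass of `int`, so testing for numbers first would turn `True` into `1.0`. The sketch below shows the difference using two hypothetical helpers, `classify` and `classify_wrong_order`, which are not part of the commit.

```python
def classify(value: object) -> str:
    """Same branch order as the sanitizer: bool first, then int/float."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return "other"


def classify_wrong_order(value: object) -> str:
    """Numbers first: booleans are caught here because bool subclasses int."""
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, bool):  # never reached for True/False
        return "bool"
    return "other"


right = classify(True)
wrong = classify_wrong_order(True)
```

With the order used in the commit, `"active": True` stays a boolean, which is what the `assert file_obj.attributes["active"] is True` check in the accompanying test relies on.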

tests/unit/providers/vector_io/test_vector_utils.py

Lines changed: 75 additions & 1 deletion
@@ -5,7 +5,7 @@
 # the root directory of this source tree.

 from llama_stack.providers.utils.vector_io.vector_utils import generate_chunk_id
-from llama_stack_api import Chunk, ChunkMetadata
+from llama_stack_api import Chunk, ChunkMetadata, VectorStoreFileObject

 # This test is a unit test for the chunk_utils.py helpers. This should only contain
 # tests which are specific to this file. More general (API-level) tests should be placed in
@@ -78,3 +78,77 @@ def test_chunk_serialization():
     serialized_chunk = chunk.model_dump()
     assert serialized_chunk["chunk_id"] == "test-chunk-id"
     assert "chunk_id" in serialized_chunk
+
+
+def test_vector_store_file_object_attributes_validation():
+    """Test VectorStoreFileObject validates and sanitizes attributes at input boundary."""
+    # Test with metadata containing lists, nested dicts, and primitives
+    from llama_stack_api.vector_io import VectorStoreChunkingStrategyAuto
+
+    file_obj = VectorStoreFileObject(
+        id="file-123",
+        attributes={
+            "tags": ["transformers", "h100-compatible", "region:us"],  # List -> string
+            "model_name": "granite-3.3-8b",  # String preserved
+            "score": 0.95,  # Float preserved
+            "active": True,  # Bool preserved
+            "count": 42,  # Int -> float
+            "nested": {"key": "value"},  # Dict filtered out
+        },
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+
+    # Lists converted to comma-separated strings
+    assert file_obj.attributes["tags"] == "transformers, h100-compatible, region:us"
+    # Primitives preserved
+    assert file_obj.attributes["model_name"] == "granite-3.3-8b"
+    assert file_obj.attributes["score"] == 0.95
+    assert file_obj.attributes["active"] is True
+    assert file_obj.attributes["count"] == 42.0  # int -> float
+    # Complex types filtered out
+    assert "nested" not in file_obj.attributes
+
+
+def test_vector_store_file_object_attributes_constraints():
+    """Test VectorStoreFileObject enforces OpenAPI constraints on attributes."""
+    from llama_stack_api.vector_io import VectorStoreChunkingStrategyAuto
+
+    # Test max 16 properties
+    many_attrs = {f"key{i}": f"value{i}" for i in range(20)}
+    file_obj = VectorStoreFileObject(
+        id="file-123",
+        attributes=many_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert len(file_obj.attributes) == 16  # Max 16 properties
+
+    # Test max 64 char keys are filtered
+    long_key_attrs = {"a" * 65: "value", "valid_key": "value"}
+    file_obj = VectorStoreFileObject(
+        id="file-124",
+        attributes=long_key_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert "a" * 65 not in file_obj.attributes
+    assert "valid_key" in file_obj.attributes
+
+    # Test max 512 char string values are truncated
+    long_value_attrs = {"key": "x" * 600}
+    file_obj = VectorStoreFileObject(
+        id="file-125",
+        attributes=long_value_attrs,
+        chunking_strategy=VectorStoreChunkingStrategyAuto(),
+        created_at=1234567890,
+        status="completed",
+        vector_store_id="vs-123",
+    )
+    assert len(file_obj.attributes["key"]) == 512
