✨ Add PydanticJSONB TypeDecorator for Automatic Pydantic Model Serialization in SQLModel
#1324
base: main
Conversation
I like this approach. There seem to be two things at the moment that would still need to be added. What about the following?

```python
from typing import Any, Type, TypeVar, get_args

from pydantic import BaseModel
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB  # for Postgres JSONB

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(self, model_class: Type[BaseModelType] | Type[list[BaseModelType]], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> list[dict] | dict | None:  # noqa: ANN401, ARG002, ANN001
        # Called when writing to the DB: convert model instance(s) to JSON-able data
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") for m in value]
        return value

    def process_result_value(self, value: Any, dialect) -> list[BaseModel] | BaseModel | None:  # noqa: ANN401, ARG002, ANN001
        # Called when loading from the DB: convert dict(s) to Pydantic model instance(s)
        if isinstance(value, dict):
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            return [get_args(self.model_class)[0].model_validate(v) for v in value]
        return None
```
+1 to @DaanRademaker's comment. This PydanticJSONB implementation needs to support
@Seluj78 I hesitate because when process_result_value returns a dict, additional logic is needed to distinguish between a basic dictionary and a structured collection of Pydantic models. This could introduce ambiguity or unexpected behavior in model validation and serialization.
Actually, maybe the key signifier can come from the
I see what you mean, I was trying to avoid an extra step. The main problem you'd need to tackle anyway, no matter if you choose to support this or not, is mutability and assignment detection. It was a nightmare to try and get working, and I failed on my end.
No, it actually makes sense, sorry, I had to think about it for a minute. Take a look at my most recent commit: I manage it by checking the type hint, just like with the list case. If you want, throw this at the bottom and run it:

```python
from typing import Dict, List

from pydantic import BaseModel
from sqlalchemy import Column
from sqlmodel import Field, SQLModel
from sqlmodel.sql.sqltypes import PydanticJSONB


# Define some Pydantic models
class Address(BaseModel):
    street: str
    city: str


class User(BaseModel):
    name: str
    age: int


# 1. Single model example
class PersonTable(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    # Use Field with sa_column, and wrap Address with PydanticJSONB
    address: Address = Field(sa_column=Column(PydanticJSONB(Address)))


# 2. List-of-models example
class TeamTable(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    members: List[User] = Field(sa_column=Column(PydanticJSONB(List[User])))


# 3. Dictionary-of-models example
class CompanyTable(SQLModel, table=True):
    id: int | None = Field(default=None, primary_key=True)
    employees: Dict[str, User] = Field(
        sa_column=Column(PydanticJSONB(Dict[str, User]))
    )


if __name__ == "__main__":
    # Test instances
    person = PersonTable(address=Address(street="123 Main St", city="Boston"))
    print("Person:")
    print(person)

    team = TeamTable(members=[User(name="Alice", age=30), User(name="Bob", age=25)])
    print("\nTeam:")
    print(team)

    company = CompanyTable(
        employees={"alice": User(name="Alice", age=30), "bob": User(name="Bob", age=25)}
    )
    print("\nCompany:")
    print(company)
```
Nice improvements! I think there is one more interesting use case. Say you use the insert statement from the Postgres dialect: process_bind_param will then be called with a dictionary that might not be jsonable. It would probably be useful to convert any dictionary to a jsonable one using Pydantic's to_jsonable_python function.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Dict, List, Type, TypeVar, get_args
from uuid import UUID

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import (
    JSONB,  # for Postgres JSONB
    insert,
)
from sqlmodel import Column, Field, SQLModel

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


# Add to_jsonable_python
class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") if isinstance(m, BaseModel) else to_jsonable_python(m) for m in value]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json") if isinstance(v, BaseModel) else to_jsonable_python(v)
                for k, v in value.items()
            }
        return to_jsonable_python(value)

    def process_result_value(
        self, value: Any, dialect
    ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore
        return value


class SomeNestedModel(BaseModel):
    some_decimal: Decimal


class SomeModel(SQLModel, table=True):
    some_uuid: UUID = Field(primary_key=True)
    some_date: date = Field(primary_key=True)
    some_nested: SomeNestedModel = Field(sa_column=Column(PydanticJSONB(SomeNestedModel)))


item_1 = SomeModel(
    some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
    some_date=date(2022, 1, 1),
    some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
)

# model_dump(mode="json") cannot be called here, because then fields like some_date and
# some_uuid are no longer the correct type; hence to_jsonable_python in process_bind_param
not_yet_jsonable_dict = item_1.model_dump()
stmt = insert(SomeModel).values(not_yet_jsonable_dict)

# The statement cannot be executed with a jsonable dict because it loses datetime, UUID types etc.
jsonable_dict = item_1.model_dump(mode="json")
stmt_2 = insert(SomeModel).values(jsonable_dict)
```
@DaanRademaker Also, I tried your example (make sure you update the sa_column to use Column instead of Field):

```python
from datetime import date
from decimal import Decimal
from uuid import UUID

from pydantic import BaseModel
from sqlmodel import Column, Field, SQLModel
from sqlmodel.sql.sqltypes import PydanticJSONB  # the type added in this PR


class SomeNestedModel(BaseModel):
    some_decimal: Decimal


class SomeModel(SQLModel, table=True):
    some_uuid: UUID = Field(primary_key=True)
    some_date: date = Field(primary_key=True)
    some_nested: SomeNestedModel = Field(
        sa_column=Column(PydanticJSONB(SomeNestedModel))
    )


item_1 = SomeModel(
    some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
    some_date=date(2022, 1, 1),
    some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
)
print(item_1)
# some_uuid=UUID('123e4567-e89b-12d3-a456-426614174000') some_date=datetime.date(2022, 1, 1) some_nested=SomeNestedModel(some_decimal=Decimal('1.23'))
```
Hi all, thanks for the contribution and it's great to see this level of engagement! Just as a maintenance note, I'll put this in draft while the CI is failing. Feel free to mark it as "Ready for review" when it's green!
🫡
I took some time to create a fully reproducible example to show the error I am getting. The issue seems to be the difference between using a sync adapter (psycopg2) and an async adapter (asyncpg): asyncpg cannot deal with an already-jsonable Python dict before hitting the process_bind_param function. It seems asyncpg does stricter type handling during parameter substitution and does not convert to the correct type the way psycopg2 does. By adding to_jsonable_python in the process_bind_param function we are able to pass a not-yet-jsonable dict, which resolves this issue with asyncpg.

```python
async def test_reproducible_example():
    from datetime import date
    from decimal import Decimal
    from typing import Any, Dict, List, Type, TypeVar, get_args
    from uuid import UUID

    from pydantic import BaseModel
    from sqlalchemy import types
    from sqlalchemy.dialects.postgresql import (
        JSONB,  # for Postgres JSONB
        insert,
    )
    from sqlalchemy.ext.asyncio import create_async_engine
    from sqlmodel import Column, Field, Session, SQLModel, create_engine
    from sqlmodel.ext.asyncio.session import AsyncSession

    BaseModelType = TypeVar("BaseModelType", bound=BaseModel)

    # fill in your database connection details here
    postgres_url = "postgresql+psycopg2://user:password@localhost:5432/dbname"
    postgres_url_async = "postgresql+asyncpg://user:password@localhost:5432/dbname"
    async_engine = create_async_engine(postgres_url_async)
    sync_engine = create_engine(postgres_url)

    class PydanticJSONB(types.TypeDecorator):  # type: ignore
        """Custom type to automatically handle Pydantic model serialization."""

        impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
        cache_ok = True  # allow SQLAlchemy to cache results

        def __init__(
            self,
            model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
            *args,
            **kwargs,
        ):
            super().__init__(*args, **kwargs)
            self.model_class = model_class  # Pydantic model class to use

        def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002
            if value is None:
                return None
            if isinstance(value, BaseModel):
                return value.model_dump(mode="json")
            if isinstance(value, list):
                return [m.model_dump(mode="json") if isinstance(m, BaseModel) else m for m in value]
            if isinstance(value, dict):
                return {k: v.model_dump(mode="json") if isinstance(v, BaseModel) else v for k, v in value.items()}
            return value

        def process_result_value(
            self, value: Any, dialect
        ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
            if value is None:
                return None
            if isinstance(value, dict):
                # If model_class is a Dict type hint, handle key-value pairs
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                    model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                    return {k: model_class.model_validate(v) for k, v in value.items()}
                # Regular case: the whole dict represents a single model
                return self.model_class.model_validate(value)  # type: ignore
            if isinstance(value, list):
                # If model_class is a List type hint
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                    model_class = get_args(self.model_class)[0]
                    return [model_class.model_validate(v) for v in value]
                # Fallback case (though this shouldn't happen given our __init__ types)
                return [self.model_class.model_validate(v) for v in value]  # type: ignore
            return value

    class SomeNestedModel(BaseModel):
        some_decimal: Decimal

    class SomeModel(SQLModel, table=True):
        some_uuid: UUID = Field(primary_key=True)
        some_date: date = Field(primary_key=True)
        some_nested: SomeNestedModel = Field(sa_column=Column(PydanticJSONB(SomeNestedModel)))

    SQLModel.metadata.create_all(sync_engine)

    item_1 = SomeModel(
        some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
        some_date=date(2022, 1, 1),
        some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
    )
    items_mode_json = [item_1.model_dump(mode="json")]
    items_non_json_mode = [item_1.model_dump()]
    stmt_json_mode = insert(SomeModel).values(items_mode_json)
    stmt_non_json_mode = insert(SomeModel).values(items_non_json_mode)

    # The psycopg2 engine causes no problems!
    with Session(sync_engine) as session:
        session.execute(stmt_json_mode)
        session.commit()

    # This fails with a StatementError: the PydanticJSONB process_bind_param is called
    # with a dict that does not have JSON-serializable values (Decimal is not serializable)
    try:
        with Session(sync_engine) as session:
            session.execute(stmt_non_json_mode)
            session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a DBAPIError about toordinal of the date type: the asyncpg engine
        # cannot deal with date values that were already converted to strings
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_json_mode)
            await session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a StatementError: the PydanticJSONB process_bind_param is called
        # with a dict that does not have JSON-serializable values (Decimal is not serializable)
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_non_json_mode)
            await session.commit()
    except Exception as e:
        print(e)
```
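To illustrate what to_jsonable_python buys here, a small DB-free sketch (not part of the PR itself): it converts values like Decimal, date, and UUID into JSON-safe primitives, so a plain `model_dump()` payload can still be encoded, which is exactly the case the insert statement hits.

```python
import json
from datetime import date
from decimal import Decimal
from uuid import UUID

from pydantic_core import to_jsonable_python

# The kind of dict model_dump() (without mode="json") produces
payload = {
    "some_decimal": Decimal("1.23"),
    "some_date": date(2022, 1, 1),
    "some_uuid": UUID("123e4567-e89b-12d3-a456-426614174000"),
}

# json.dumps(payload) would raise TypeError on Decimal/date/UUID;
# to_jsonable_python converts everything to JSON-safe primitives first
jsonable = to_jsonable_python(payload)
print(json.dumps(jsonable))
```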
… for non-BaseModel types in lists and dictionaries
@DaanRademaker Thanks for sending that example! It helped me understand the blocker you were presenting. I've made the updates to include to_jsonable_python. The only issue now is that the linter is yelling at me about the function's return type.
Bump? @svlandeg
Any advice on how this will be used in a where clause?
I'm not sure if this is right, but it works for me:

```python
from functools import partial
from typing import Any, Dict, List, Optional, Type, TypeVar, Union, get_args, get_origin

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.sql.operators import ColumnOperators
from sqlalchemy.sql.type_api import _ComparatorFactory

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)

# Type alias for JSON-serializable values
JSONValue = Union[Dict[str, Any], List[Any], str, int, float, bool, None]


class PydanticJSONBComparator(JSONB.Comparator):
    def __init__(self, *args, **kwargs):
        self.model_class = kwargs.pop("model_class", None)
        super().__init__(*args, **kwargs)

    def __getattr__(self, name: Any) -> ColumnOperators:
        # Turn attribute access on the column into a JSONB path lookup,
        # but only for fields that actually exist on the Pydantic model
        if name in self.model_class.model_fields:
            return self.__getitem__(name)
        raise AttributeError(
            f"'{self.__class__.__name__}' object has no attribute '{name}'"
            f" or '{name}' is not a valid field in {self.model_class.__name__}"
        )


class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization with comparators."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Union[
            Type[BaseModelType],
            Type[List[BaseModelType]],
            Type[Dict[str, BaseModelType]],
        ],
        *args: Any,
        **kwargs: Any,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    @property
    def comparator_factory(self) -> _ComparatorFactory[Any]:  # type: ignore # mypy properties bug
        """Override the comparator factory to use our custom comparator."""
        return partial(PydanticJSONBComparator, model_class=self.model_class)

    def process_bind_param(self, value: Any, dialect: Any) -> JSONValue:  # noqa: ANN401, ARG002
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [
                m.model_dump(mode="json") if isinstance(m, BaseModel) else to_jsonable_python(m)
                for m in value
            ]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json") if isinstance(v, BaseModel) else to_jsonable_python(v)
                for k, v in value.items()
            }
        # We know to_jsonable_python returns a JSON-serializable value, but mypy sees it as Any
        return to_jsonable_python(value)  # type: ignore[no-any-return]

    def process_result_value(
        self, value: Any, dialect: Any
    ) -> Optional[Union[BaseModelType, List[BaseModelType], Dict[str, BaseModelType]]]:  # noqa: ANN401, ARG002
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            if get_origin(self.model_class) is dict:
                model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            if get_origin(self.model_class) is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore
        raise TypeError(
            f"Unsupported type for PydanticJSONB from database: {type(value)}. Expected a dictionary or list."
        )
```
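For context on what that comparator is sugar for, here is a sketch in plain SQLAlchemy Core (not the PR's API): without the comparator, a JSONB path filter has to be spelled out with `[...]` and `.astext`; the comparator above lets `Model.field.attr` generate the same kind of expression. The `person` table here is hypothetical.

```python
from sqlalchemy import Column, MetaData, Table, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB

metadata = MetaData()
# Hypothetical table with a JSONB column, for illustration only
person = Table("person", metadata, Column("address", JSONB))

# The explicit spelling that the comparator would produce for `person.address.city`
stmt = select(person).where(person.c.address["city"].astext == "Boston")
sql = str(stmt.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))
print(sql)  # ... WHERE (person.address ->> 'city') = 'Boston'
```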
Oh nice this is pretty cool! |
@svlandeg why is this marked as a draft again? |
Because the tests are red again? |
Oh strange. I didn't change anything and it was all green. Tests were updated? |
Description
This PR introduces a PydanticJSONB type decorator to SQLModel, enabling seamless serialization and deserialization of Pydantic models in JSONB columns. This removes the need for manual conversion, allowing SQLModel instances to work directly with Pydantic objects.
Why?
Storing Pydantic models in JSONB columns has been a recurring challenge. This PR solves that by automating conversion between Pydantic models and JSON fields.
How?
Benefits
✅ Eliminates manual conversion – No need to wrap dict(**org.config) manually.
✅ Ensures structured storage – Enforces Pydantic validation automatically.
✅ Improves dev experience – Seamless interaction with JSONB fields in SQLModel.
Example Usage of PydanticJSONB in SQLModel
With this PR, you can now store and retrieve Pydantic models in JSONB fields effortlessly.
Define a Pydantic Model
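The original snippet under this heading was lost in extraction; below is a minimal sketch consistent with the surrounding description. The `OrgConfig` name comes from the PR text; its fields are illustrative assumptions.

```python
from pydantic import BaseModel


class OrgConfig(BaseModel):
    # Example fields; the real PR snippet's fields were not preserved
    feature_flags: dict[str, bool] = {}
    timezone: str = "UTC"


config = OrgConfig(feature_flags={"beta": True}, timezone="Europe/Amsterdam")
print(config.model_dump(mode="json"))
```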
Create & Store Data
Retrieve & Use Data
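The store/retrieve code blocks under these headings were also lost; as a DB-free sketch of the round-trip, the decorator's two hooks can be exercised directly. The inlined `PydanticJSONB` below is a cut-down stand-in for the PR's class (single-model case only), and `OrgConfig` is a hypothetical model from the PR text.

```python
from pydantic import BaseModel
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB


class OrgConfig(BaseModel):
    timezone: str = "UTC"


# Minimal stand-in for the PR's PydanticJSONB (single-model case only)
class PydanticJSONB(types.TypeDecorator):
    impl = JSONB
    cache_ok = True

    def __init__(self, model_class, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model_class = model_class

    def process_bind_param(self, value, dialect):
        # Model -> plain dict on the way into the DB
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        return value

    def process_result_value(self, value, dialect):
        # Plain dict -> model instance on the way out of the DB
        if isinstance(value, dict):
            return self.model_class.model_validate(value)
        return value


decorator = PydanticJSONB(OrgConfig)
stored = decorator.process_bind_param(OrgConfig(timezone="Europe/Amsterdam"), None)
loaded = decorator.process_result_value(stored, None)
print(stored)  # the dict that would land in the JSONB column
print(loaded)  # already an OrgConfig instance, no manual OrgConfig(**...) needed
```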
Result:
✅ No need for OrgConfig(**org.config) – it's already an OrgConfig instance!
✅ Automatic conversion between JSONB and Pydantic models.
This simplifies handling structured configurations in SQLModel, making JSONB storage seamless and ergonomic. 🚀
Related Issues & Discussions
SQLModel Issue #63 – Nested Pydantic models in JSON fields.
Stack Overflow: Writing Pydantic objects into SQLAlchemy JSON columns
GitHub Gist Example – Community implementation of Pydantic models in JSON columns.
This PR makes SQLModel more ergonomic for JSONB storage while maintaining compatibility with SQLAlchemy and Pydantic. 🚀
TODO
Per contribution rules: https://sqlmodel.tiangolo.com/help/#create-a-pull-request
This PR still needs:
Before I do that, I would love to hear your thoughts @tiangolo!