328 Add schema and dynamic_models components#813
328 Add schema and dynamic_models components#813mryoho wants to merge 4 commits intoLIF-Initiative:mainfrom
schema and dynamic_models components#813Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds two new components to the LIF framework: schema and dynamic_models, which provide functionality for parsing OpenAPI schemas and dynamically generating Pydantic models from those schemas.
Changes:
- Introduces a
schemacomponent for extracting and loading schema fields from OpenAPI/JSON Schema documents - Adds a
dynamic_modelscomponent for building nested Pydantic models dynamically from schema definitions - Enhances
string_utilswith improved implementations for case conversions and identifier sanitization - Adds a new
SchemaFielddatatype to represent schema field metadata - Includes comprehensive test coverage with unit tests and integration tests
- Updates
pyproject.tomlto register the new components
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| test/data/test_openapi_schema.json | Test OpenAPI schema file with person entity structure (has critical JSON Schema validation issues) |
| test/components/lif/string_utils/test_core.py | Comprehensive test suite for string utility functions |
| test/components/lif/schema/test_core.py | Extensive test suite for schema extraction and loading functionality |
| test/components/lif/schema/init.py | Empty init file for schema tests |
| test/components/lif/dynamic_models/test_core.py | Comprehensive test suite for dynamic model generation with 980 lines of tests |
| test/components/lif/dynamic_models/init.py | Empty init file for dynamic_models tests |
| pyproject.toml | Registers new schema and dynamic_models components in the project |
| components/lif/string_utils/core.py | Enhanced string utility functions with improved to_pascal_case and to_camel_case implementations |
| components/lif/schema/core.py | Core functionality for extracting schema fields from OpenAPI documents |
| components/lif/schema/init.py | Module initialization for schema component |
| components/lif/dynamic_models/core.py | Core functionality for building dynamic Pydantic models from schema fields |
| components/lif/dynamic_models/init.py | Module initialization for dynamic_models component |
| components/lif/datatypes/schema.py | New SchemaField dataclass for representing schema field metadata |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # SchemaField( | ||
| # json_path="person.name", | ||
| # description="Person name", | ||
| # attributes={"xQueryable": True, "dataType": "xsd:string", "array": "No"}, | ||
| # ) | ||
| # SchemaField( | ||
| # json_path='person', | ||
| # description='', | ||
| # attributes={'xMutable': False, 'type': 'object', 'array': 'No', 'branch': True, 'leaf': False}, | ||
| # py_field_name='' | ||
| # ) |
There was a problem hiding this comment.
Commented-out code should be removed. Lines 531-541 contain commented-out SchemaField definitions that appear to be leftover from development or debugging. If these are needed for documentation or future reference, they should be properly documented or removed entirely to keep the test clean.
| # SchemaField( | |
| # json_path="person.name", | |
| # description="Person name", | |
| # attributes={"xQueryable": True, "dataType": "xsd:string", "array": "No"}, | |
| # ) | |
| # SchemaField( | |
| # json_path='person', | |
| # description='', | |
| # attributes={'xMutable': False, 'type': 'object', 'array': 'No', 'branch': True, 'leaf': False}, | |
| # py_field_name='' | |
| # ) |
| print(models) | ||
|
|
There was a problem hiding this comment.
Debug print statement should be removed. Line 563 contains a print statement that was likely used for debugging during development. This should be removed or replaced with proper logging/assertions for production code.
| print(models) |
| return node.get("type") == "array" or "items" in node | ||
|
|
||
| def get_description(node: dict) -> str: | ||
| """Get description from node, prefer lower-case.""" |
There was a problem hiding this comment.
Incorrect comment: the function prefers uppercase "Description" over lowercase "description", not the other way around. The comment says "prefer lower-case" but the code node.get("Description", "") or node.get("description", "") actually prefers the uppercase version first.
| """Get description from node, prefer lower-case.""" | |
| """Get description from node, preferring 'Description' over 'description'.""" |
| "Contact": { | ||
| "type": "array", | ||
| "properties": { | ||
| "Email": { | ||
| "type": "array", | ||
| "properties": { | ||
| "emailAddress": { | ||
| "type": "string", | ||
| "description": "The electronic mail address of an individual or person" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Invalid JSON Schema structure: arrays should use "items" instead of "properties". The "Contact" field is declared as "type": "array" but uses "properties" to define its structure. This should use "items" with "type": "object" containing the properties.
| @dataclass | ||
| class SchemaField: | ||
| """Represents a single schema field, including its path and attributes.""" | ||
|
|
||
| json_path: str | ||
| description: str | ||
| attributes: Dict[str, Any] | ||
| py_field_name: str = "" |
There was a problem hiding this comment.
SchemaField is not exported from the datatypes module. For consistency with other datatypes in this module (like IdentityMapping, LIFFragment, etc.), SchemaField should be added to the init.py imports and all list in components/lif/datatypes/init.py. This would allow users to import it as from lif.datatypes import SchemaField instead of requiring the more verbose from lif.datatypes.schema import SchemaField.
|
|
||
| This module supports building Pydantic models at runtime based on a schema definition, | ||
| allowing flexible data validation for various use cases (e.g., query filters, mutations, or full models). | ||
| All fields in generated models are Optional and default to None. |
There was a problem hiding this comment.
Docstring is inaccurate: states "All fields in generated models are Optional and default to None" but the all_optional parameter (line 96) can be set to False to make fields required. Update the module docstring to reflect that field optionality is configurable.
| All fields in generated models are Optional and default to None. | |
| By default, fields in generated models are Optional and default to None, but this optionality is configurable | |
| via the ``all_optional`` parameter. |
| "properties": { | ||
| "identifier": { | ||
| "type": "string", | ||
| "description": "A number and/or alphanumeric code used to uniquely identify the entity", | ||
| "x-queryable": true | ||
| }, | ||
| "identifierType": { | ||
| "type": "string", | ||
| "description": "The types of sources of identifiers used to uniquely identify the entity", | ||
| "x-queryable": true | ||
| } | ||
| } | ||
| }, | ||
| "Name": { | ||
| "type": "array", | ||
| "properties": { | ||
| "firstName": { | ||
| "type": "string", | ||
| "description": "The first name of a person or individual", | ||
| "x-mutable": false | ||
| }, | ||
| "lastName": { | ||
| "type": "string", | ||
| "description": "The last name of a person or individual", | ||
| "x-mutable": false | ||
| } | ||
| } | ||
| }, | ||
| "Proficiency": { | ||
| "type": "array", | ||
| "properties": { | ||
| "name": { | ||
| "type": "string", | ||
| "description": "Name of the proficiency" | ||
| }, | ||
| "description": { | ||
| "type": "string", | ||
| "description": "Description of the proficiency" | ||
| } | ||
| } | ||
| }, | ||
| "Contact": { | ||
| "type": "array", | ||
| "properties": { | ||
| "Email": { | ||
| "type": "array", | ||
| "properties": { | ||
| "emailAddress": { | ||
| "type": "string", | ||
| "description": "The electronic mail address of an individual or person" |
There was a problem hiding this comment.
Invalid JSON Schema structure: arrays should use "items" instead of "properties". The "Proficiency" field is declared as "type": "array" but uses "properties" to define its structure. This should use "items" with "type": "object" containing the properties.
| "properties": { | |
| "identifier": { | |
| "type": "string", | |
| "description": "A number and/or alphanumeric code used to uniquely identify the entity", | |
| "x-queryable": true | |
| }, | |
| "identifierType": { | |
| "type": "string", | |
| "description": "The types of sources of identifiers used to uniquely identify the entity", | |
| "x-queryable": true | |
| } | |
| } | |
| }, | |
| "Name": { | |
| "type": "array", | |
| "properties": { | |
| "firstName": { | |
| "type": "string", | |
| "description": "The first name of a person or individual", | |
| "x-mutable": false | |
| }, | |
| "lastName": { | |
| "type": "string", | |
| "description": "The last name of a person or individual", | |
| "x-mutable": false | |
| } | |
| } | |
| }, | |
| "Proficiency": { | |
| "type": "array", | |
| "properties": { | |
| "name": { | |
| "type": "string", | |
| "description": "Name of the proficiency" | |
| }, | |
| "description": { | |
| "type": "string", | |
| "description": "Description of the proficiency" | |
| } | |
| } | |
| }, | |
| "Contact": { | |
| "type": "array", | |
| "properties": { | |
| "Email": { | |
| "type": "array", | |
| "properties": { | |
| "emailAddress": { | |
| "type": "string", | |
| "description": "The electronic mail address of an individual or person" | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "identifier": { | |
| "type": "string", | |
| "description": "A number and/or alphanumeric code used to uniquely identify the entity", | |
| "x-queryable": true | |
| }, | |
| "identifierType": { | |
| "type": "string", | |
| "description": "The types of sources of identifiers used to uniquely identify the entity", | |
| "x-queryable": true | |
| } | |
| } | |
| } | |
| }, | |
| "Name": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "firstName": { | |
| "type": "string", | |
| "description": "The first name of a person or individual", | |
| "x-mutable": false | |
| }, | |
| "lastName": { | |
| "type": "string", | |
| "description": "The last name of a person or individual", | |
| "x-mutable": false | |
| } | |
| } | |
| } | |
| }, | |
| "Proficiency": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "name": { | |
| "type": "string", | |
| "description": "Name of the proficiency" | |
| }, | |
| "description": { | |
| "type": "string", | |
| "description": "Description of the proficiency" | |
| } | |
| } | |
| } | |
| }, | |
| "Contact": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "Email": { | |
| "type": "array", | |
| "items": { | |
| "type": "object", | |
| "properties": { | |
| "emailAddress": { | |
| "type": "string", | |
| "description": "The electronic mail address of an individual or person" | |
| } | |
| } |
|
|
||
| # ===== Build Root + Wrapper ===== | ||
|
|
||
| # TODO (from before integration into this repo): This forces the wrapper structure. It should use OpenAPI schema |
There was a problem hiding this comment.
TODO comment should be tracked in an issue tracker. This comment mentions that "this forces the wrapper structure" and "should use OpenAPI schema", which suggests incomplete implementation. Consider creating a GitHub issue to track this work and reference it in the comment, or complete the implementation in this PR.
| # TODO (from before integration into this repo): This forces the wrapper structure. It should use OpenAPI schema | |
| # Note: This currently forces the wrapper structure as an array. Any future change should | |
| # be aligned with the OpenAPI schema used elsewhere in the system. |
| "__module__": __name__, | ||
| # Use supported ConfigDict keys; attach metadata via json_schema_extra | ||
| "model_config": ConfigDict( | ||
| # TODO (from before integration into this repo): Make sure the change from this works: title=class_name, description=desc, strict=False, extra="allow" if allow_extra else "forbid" |
There was a problem hiding this comment.
TODO comment should be tracked in an issue tracker or removed. This comment mentions configuration changes but doesn't provide clear context about what needs to be verified. If this configuration is correct, remove the TODO; otherwise, create a GitHub issue to track the validation work.
| # TODO (from before integration into this repo): Make sure the change from this works: title=class_name, description=desc, strict=False, extra="allow" if allow_extra else "forbid" |
| OPENAPI_SCHEMA_FILE: str | None = os.getenv("OPENAPI_SCHEMA_FILE") | ||
|
|
||
| logging.basicConfig(level=logging.INFO) | ||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
Unused logger variable: the logger is created on line 34 but is never used anywhere in the module. Either use the logger for appropriate logging statements or remove it to reduce clutter.
| logger = logging.getLogger(__name__) |
These should be pretty close to functional (maybe should still have pre-commit hooks run--I ran out of time).
They won't be used for anything just yet - but they are needed in order to integrate some of the other code, which will come in another PR.