Skip to content

328 Add schema and dynamic_models components#813

Open
mryoho wants to merge 4 commits intoLIF-Initiative:mainfrom
mryoho:328-schema-and-dynamic-models-from-a-and-m
Open

328 Add schema and dynamic_models components#813
mryoho wants to merge 4 commits intoLIF-Initiative:mainfrom
mryoho:328-schema-and-dynamic-models-from-a-and-m

Conversation

@mryoho
Copy link
Contributor

@mryoho mryoho commented Jan 27, 2026

These should be pretty close to functional (maybe should still have pre-commit hooks run--I ran out of time).

They won't be used for anything just yet - but they are needed in order to integrate some of the other code, which will come in another PR.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds two new components to the LIF framework: schema and dynamic_models, which provide functionality for parsing OpenAPI schemas and dynamically generating Pydantic models from those schemas.

Changes:

  • Introduces a schema component for extracting and loading schema fields from OpenAPI/JSON Schema documents
  • Adds a dynamic_models component for building nested Pydantic models dynamically from schema definitions
  • Enhances string_utils with improved implementations for case conversions and identifier sanitization
  • Adds a new SchemaField datatype to represent schema field metadata
  • Includes comprehensive test coverage with unit tests and integration tests
  • Updates pyproject.toml to register the new components

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
test/data/test_openapi_schema.json Test OpenAPI schema file with person entity structure (has critical JSON Schema validation issues)
test/components/lif/string_utils/test_core.py Comprehensive test suite for string utility functions
test/components/lif/schema/test_core.py Extensive test suite for schema extraction and loading functionality
test/components/lif/schema/init.py Empty init file for schema tests
test/components/lif/dynamic_models/test_core.py Comprehensive test suite for dynamic model generation with 980 lines of tests
test/components/lif/dynamic_models/init.py Empty init file for dynamic_models tests
pyproject.toml Registers new schema and dynamic_models components in the project
components/lif/string_utils/core.py Enhanced string utility functions with improved to_pascal_case and to_camel_case implementations
components/lif/schema/core.py Core functionality for extracting schema fields from OpenAPI documents
components/lif/schema/init.py Module initialization for schema component
components/lif/dynamic_models/core.py Core functionality for building dynamic Pydantic models from schema fields
components/lif/dynamic_models/init.py Module initialization for dynamic_models component
components/lif/datatypes/schema.py New SchemaField dataclass for representing schema field metadata

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +531 to +541
# SchemaField(
# json_path="person.name",
# description="Person name",
# attributes={"xQueryable": True, "dataType": "xsd:string", "array": "No"},
# )
# SchemaField(
# json_path='person',
# description='',
# attributes={'xMutable': False, 'type': 'object', 'array': 'No', 'branch': True, 'leaf': False},
# py_field_name=''
# )
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out code should be removed. Lines 531-541 contain commented-out SchemaField definitions that appear to be leftover from development or debugging. If these are needed for documentation or future reference, they should be properly documented or removed entirely to keep the test clean.

Suggested change
# SchemaField(
# json_path="person.name",
# description="Person name",
# attributes={"xQueryable": True, "dataType": "xsd:string", "array": "No"},
# )
# SchemaField(
# json_path='person',
# description='',
# attributes={'xMutable': False, 'type': 'object', 'array': 'No', 'branch': True, 'leaf': False},
# py_field_name=''
# )

Copilot uses AI. Check for mistakes.
Comment on lines +563 to +564
print(models)

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement should be removed. Line 563 contains a print statement that was likely used for debugging during development. This should be removed or replaced with proper logging/assertions for production code.

Suggested change
print(models)

Copilot uses AI. Check for mistakes.
return node.get("type") == "array" or "items" in node

def get_description(node: dict) -> str:
"""Get description from node, prefer lower-case."""
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect comment: the function prefers uppercase "Description" over lowercase "description", not the other way around. The comment says "prefer lower-case" but the code node.get("Description", "") or node.get("description", "") actually prefers the uppercase version first.

Suggested change
"""Get description from node, prefer lower-case."""
"""Get description from node, preferring 'Description' over 'description'."""

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +66
"Contact": {
"type": "array",
"properties": {
"Email": {
"type": "array",
"properties": {
"emailAddress": {
"type": "string",
"description": "The electronic mail address of an individual or person"
}
}
}
}
}
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid JSON Schema structure: arrays should use "items" instead of "properties". The "Contact" field is declared as "type": "array" but uses "properties" to define its structure. This should use "items" with "type": "object" containing the properties.

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +11
@dataclass
class SchemaField:
"""Represents a single schema field, including its path and attributes."""

json_path: str
description: str
attributes: Dict[str, Any]
py_field_name: str = ""
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchemaField is not exported from the datatypes module. For consistency with other datatypes in this module (like IdentityMapping, LIFFragment, etc.), SchemaField should be added to the init.py imports and all list in components/lif/datatypes/init.py. This would allow users to import it as from lif.datatypes import SchemaField instead of requiring the more verbose from lif.datatypes.schema import SchemaField.

Copilot uses AI. Check for mistakes.

This module supports building Pydantic models at runtime based on a schema definition,
allowing flexible data validation for various use cases (e.g., query filters, mutations, or full models).
All fields in generated models are Optional and default to None.
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring is inaccurate: states "All fields in generated models are Optional and default to None" but the all_optional parameter (line 96) can be set to False to make fields required. Update the module docstring to reflect that field optionality is configurable.

Suggested change
All fields in generated models are Optional and default to None.
By default, fields in generated models are Optional and default to None, but this optionality is configurable
via the ``all_optional`` parameter.

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +61
"properties": {
"identifier": {
"type": "string",
"description": "A number and/or alphanumeric code used to uniquely identify the entity",
"x-queryable": true
},
"identifierType": {
"type": "string",
"description": "The types of sources of identifiers used to uniquely identify the entity",
"x-queryable": true
}
}
},
"Name": {
"type": "array",
"properties": {
"firstName": {
"type": "string",
"description": "The first name of a person or individual",
"x-mutable": false
},
"lastName": {
"type": "string",
"description": "The last name of a person or individual",
"x-mutable": false
}
}
},
"Proficiency": {
"type": "array",
"properties": {
"name": {
"type": "string",
"description": "Name of the proficiency"
},
"description": {
"type": "string",
"description": "Description of the proficiency"
}
}
},
"Contact": {
"type": "array",
"properties": {
"Email": {
"type": "array",
"properties": {
"emailAddress": {
"type": "string",
"description": "The electronic mail address of an individual or person"
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid JSON Schema structure: arrays should use "items" instead of "properties". The "Proficiency" field is declared as "type": "array" but uses "properties" to define its structure. This should use "items" with "type": "object" containing the properties.

Suggested change
"properties": {
"identifier": {
"type": "string",
"description": "A number and/or alphanumeric code used to uniquely identify the entity",
"x-queryable": true
},
"identifierType": {
"type": "string",
"description": "The types of sources of identifiers used to uniquely identify the entity",
"x-queryable": true
}
}
},
"Name": {
"type": "array",
"properties": {
"firstName": {
"type": "string",
"description": "The first name of a person or individual",
"x-mutable": false
},
"lastName": {
"type": "string",
"description": "The last name of a person or individual",
"x-mutable": false
}
}
},
"Proficiency": {
"type": "array",
"properties": {
"name": {
"type": "string",
"description": "Name of the proficiency"
},
"description": {
"type": "string",
"description": "Description of the proficiency"
}
}
},
"Contact": {
"type": "array",
"properties": {
"Email": {
"type": "array",
"properties": {
"emailAddress": {
"type": "string",
"description": "The electronic mail address of an individual or person"
"items": {
"type": "object",
"properties": {
"identifier": {
"type": "string",
"description": "A number and/or alphanumeric code used to uniquely identify the entity",
"x-queryable": true
},
"identifierType": {
"type": "string",
"description": "The types of sources of identifiers used to uniquely identify the entity",
"x-queryable": true
}
}
}
},
"Name": {
"type": "array",
"items": {
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The first name of a person or individual",
"x-mutable": false
},
"lastName": {
"type": "string",
"description": "The last name of a person or individual",
"x-mutable": false
}
}
}
},
"Proficiency": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Name of the proficiency"
},
"description": {
"type": "string",
"description": "Description of the proficiency"
}
}
}
},
"Contact": {
"type": "array",
"items": {
"type": "object",
"properties": {
"Email": {
"type": "array",
"items": {
"type": "object",
"properties": {
"emailAddress": {
"type": "string",
"description": "The electronic mail address of an individual or person"
}
}

Copilot uses AI. Check for mistakes.

# ===== Build Root + Wrapper =====

# TODO (from before integration into this repo): This forces the wrapper structure. It should use OpenAPI schema
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO comment should be tracked in an issue tracker. This comment mentions that "this forces the wrapper structure" and "should use OpenAPI schema", which suggests incomplete implementation. Consider creating a GitHub issue to track this work and reference it in the comment, or complete the implementation in this PR.

Suggested change
# TODO (from before integration into this repo): This forces the wrapper structure. It should use OpenAPI schema
# Note: This currently forces the wrapper structure as an array. Any future change should
# be aligned with the OpenAPI schema used elsewhere in the system.

Copilot uses AI. Check for mistakes.
"__module__": __name__,
# Use supported ConfigDict keys; attach metadata via json_schema_extra
"model_config": ConfigDict(
# TODO (from before integration into this repo): Make sure the change from this works: title=class_name, description=desc, strict=False, extra="allow" if allow_extra else "forbid"
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO comment should be tracked in an issue tracker or removed. This comment mentions configuration changes but doesn't provide clear context about what needs to be verified. If this configuration is correct, remove the TODO; otherwise, create a GitHub issue to track the validation work.

Suggested change
# TODO (from before integration into this repo): Make sure the change from this works: title=class_name, description=desc, strict=False, extra="allow" if allow_extra else "forbid"

Copilot uses AI. Check for mistakes.
OPENAPI_SCHEMA_FILE: str | None = os.getenv("OPENAPI_SCHEMA_FILE")

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused logger variable: the logger is created on line 34 but is never used anywhere in the module. Either use the logger for appropriate logging statements or remove it to reduce clutter.

Suggested change
logger = logging.getLogger(__name__)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant