New @strict decorator for dataclass validation#2895
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
@Wauplin A I've run into a constraint: some validated fields are class-dependent multiple-choice values, or have an admissible range. We can write a validator for each of these cases but, ideally, a single composable validator would be shared across models. An example: (validator) def choice_str(value: str, choices: Iterable[str]):
"""Ensures that `value` is one of the choices in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}")(validated field, what I would like to be able to do) position_embedding_type: str = validated_field(
choice_str, choices=["absolute", "relative_key", "relative_key_query"], default="absolute"
)An alternative would be to define the model-specific validator as albert_position_embedding_type_validator = functools.partial(choice_str, choices=...)But I was wondering whether it would be possible to pipe validator arguments from |
|
I would prefer to not to pipe arguments into the validator on demand. It brings a level of complexity more to def one_of(choices: List):
def _inner(value: Any):
"""Check `value` is one of the options in `choices`."""
if value not in choices:
raise ValueError(f"Value must be one of {choices}, got {value}.")
return _inner
@strict_dataclass
class AlbertConfig:
position_embedding_type: str = validated_field(one_of(["absolute", "relative_key", "relative_key_query"]), default="absolute")
...I've renamed |
|
@gante I pushed b1720fa to support @strict_dataclass
class AlbertConfig:
position_embedding_type: Literal["absolute", "relative_key", "relative_key_query"] = "absolute"
...(much better for readability + IDE autocompletion) |
|
I'm going to redo the transformers-side PR with the latest suggestions 👍 |
Co-authored-by: Célina <hanouticelina@gmail.com>
@strict_dataclass decorator for dataclass validation@strict decorator for dataclass validation
@strict decorator for dataclass validation@strict decorator for dataclass validation
|
@hanouticelina @gante @ArthurZucker PR is in its final state and is ready for review :) Documentation's here if needed: https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_2895/en/package_reference/dataclasses. Implementation currently used in huggingface/transformers#36534 as a test. |
Co-authored-by: célina <hanouticelina@gmail.com>
Co-authored-by: célina <hanouticelina@gmail.com>
…face/huggingface_hub into first-draft-for-strict-dataclass
|
Thanks for the thorough review @hanouticelina! I've addressed all of your comments and updated the tests/docs accordingly. |
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
|
merging! |
* New @strict_dataclass decorator * expose main methods * typog * Support Literal[...] type * Update src/huggingface_hub/utils/_strict_dataclass.py Co-authored-by: Célina <hanouticelina@gmail.com> * nit * accept kwargs * Accept kwargs, move to huggingface.dataclasses, fix autocompletion, add tests, add docs * docs * @as_validated_field decorator * code quality * class validators * inherit class validators from not strict classes * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: célina <hanouticelina@gmail.com> * remove duplicated definition of _setattr * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: célina <hanouticelina@gmail.com> * optional is an alias for union[, None] * dumb tests * Raise if already defined by user * docs * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update docs/source/en/package_reference/dataclasses.md Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * doc --------- Co-authored-by: Célina <hanouticelina@gmail.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Documentation https://moon-ci-docs.huggingface.co/docs/huggingface_hub/pr_2895/en/package_reference/dataclasses
Follow-up after huggingface/transformers#36329 and slack discussions (private).
The idea is to add a layer of validation on top of Python's built-in dataclasses.
Example
How to test
Test it with:
What it does ?
@strictto be used on top of@dataclass. When decorated, class values are validated.Dict[str, List[Optional[DummyClass]]is correctly validated)validated_field(built on top offield)validate_***are considered as class validators and executed at initialization after__post_init__.obj.validate().Why not chose pydantic ? (or
attrs? ormarshmallow_dataclass?)pydanticas a base requirement to 🤗transformerstransformers#36329 related to adding pydantic as a new dependency. Would be an heavy addition + require careful logic to support both v1 and v2.@strict_dataclassis not meant for heavy load where performances is critical. Common use case will be to validate a model configuration (only done once and very neglectable compared to running a model). This allows us to keep code minimal.Plan:
transformers@gante)We won't push for it / release it until we are sure at least the
transformersuse case is covered.Notes:
This
@strictmight be useful inhuggingface_hubitself in the future but that's not its primary goal for now.