diff --git a/README.md b/README.md index f570801..24b699f 100644 --- a/README.md +++ b/README.md @@ -6,43 +6,67 @@ [![CI](https://github.com/jmarshrossney/dirconf/actions/workflows/ci.yml/badge.svg)](https://github.com/jmarshrossney/dirconf/actions/workflows/ci.yml) [![Docs](https://github.com/jmarshrossney/dirconf/actions/workflows/docs.yml/badge.svg)](https://jmarshrossney.github.io/dirconf) -`dirconf` is a Python tool for declaratively specifying what a valid configuration directory looks like. +`dirconf` is a Python tool for declaratively specifying configuration directory structures, and constructing Python `dict` representations of their contents. + +For full user documentation and examples please visit **[https://jmarshrossney.github.io/dirconf/](https://jmarshrossney.github.io/dirconf/)**. + +## Motivations I wrote this because I sometimes have to work with quite old scientific models that require various configuration files and data inputs in various formats to be present in various locations. I was (and remain) concerned about how easy it can be to misconfigure certain models without realising, and how common workflows compromise reproducibility. `dirconf` helps by -1. Allowing the user to describe the structure of a directory representing a valid configuration, and validate real directories against this description. - -2. Facilitating the generation of new configurations and metadata programmatically, in Python, as opposed to copying and editing files by hand or writing shell scripts. +1. Allowing the user to describe the structure of a directory representing a valid configuration using Python [dataclasses](https://docs.python.org/3/library/dataclasses.html), and validate real directories against this description. -3. Providing a consistent mechanism through which complex, distributed configurations in legacy formats can be validated using excellent tools such as [JSON Schema](https://json-schema.org/) and [Pydantic](https://docs.pydantic.dev/). +2. 
Providing a scaffold for defining consistent read/write mechanisms through which complex, distributed configurations in legacy formats can be mapped to Python `dict`s. -Configurations are specified using Python [dataclasses](https://docs.python.org/3/library/dataclasses.html); `dirconf` has no dependencies beyond the standard library. +The ability to represent configurations as `dict`s is very useful indeed. +With no extra effort, we can: -For full user documentation and examples please visit **[https://jmarshrossney.github.io/dirconf/](https://jmarshrossney.github.io/dirconf/)**. +- Validate configurations using excellent tools such as [JSON Schema](https://json-schema.org/) and [Pydantic](https://docs.pydantic.dev/). +- Generate new configurations and metadata programmatically, as opposed to copying and editing files by hand or writing shell scripts. ## Installation +`dirconf` is a Python package and thus can be installed using `pip`, or tools such as `uv` and `poetry` that wrap around `pip`. + ```sh -pip install dirconf +uv add dirconf ``` -or with `uv`: +or ```sh -uv add dirconf +pip install dirconf ``` -or the equivalent command for other package managers (poetry etc). +Currently Python versions equal to or above 3.12 are supported. + +## Overview of usage + +There are two essential steps for adapting `dirconf` to a specific use-case. + +1. **Define handlers** satisfying the `Handler` protocol for each of the paths (files and directories) present in your configuration. +2. **Define the structure of a valid configuration** in terms of its paths and their respective handlers, by subclassing the `DirConfig` class. This is most easily done using the `make_dirconfig` function. + +The custom `DirConfig` subclass can then be used to + +1. **Read** a configuration from the filesystem into a Python `dict`. +2. **Write** a configuration `dict` to the filesystem. 
+ +These steps are most easily understood through examples: see [the docs](https://jmarshrossney.github.io/dirconf/101.html). +All examples are based on self-contained [marimo](https://marimo.io/) notebooks, which can be found in the [examples](examples/) directory. +## Philosophy -## Development +`dirconf` contains ~700 lines of code (including docstrings) and has no dependencies beyond the Standard Library. -Contributions are welcome! +This is by design. +I have no intention of developing `dirconf` into a more sophisticated tool than it already is. +The aim is that it works seamlessly alongside other tools and packages for parsing and validation, without ever getting in the way or creating conflicts. -Please open a Pull Request against the `main` branch. +With that out of the way, please feel free to raise an [issue](https://github.com/jmarshrossney/dirconf/issues) or make a [pull request](https://github.com/jmarshrossney/dirconf/pulls) to suggest a change or feature. See [`CONTRIBUTING.md`](CONTRIBUTING.md) for full details. diff --git a/docs/index.md b/docs/index.md index 6666241..0fb33df 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,17 +1,21 @@ # Home -`dirconf` is a simple tool for the meta-configuration of collections of configuration files, leaning heavily on Python [dataclasses](https://docs.python.org/3/library/dataclasses.html). +`dirconf` is a Python tool for declaratively specifying configuration directory structures, and constructing Python `dict` representations of their contents. -I wrote this because I sometimes work with quite old scientific models requiring various configuration files and data inputs in various formats to be present in various locations. I was (and remain) concerned about how easy it can be to misconfigure certain models without realising, and how common workflows compromise reproducibility. 
+I wrote this because I sometimes have to work with quite old scientific models that require various configuration files and data inputs in various formats to be present in various locations. +I was (and remain) concerned about how easy it can be to misconfigure certain models without realising, and how common workflows compromise reproducibility. `dirconf` helps by -1. Allowing the user to describe the structure of a directory representing a valid configuration, and validate real directories against this description. +1. Allowing the user to describe the structure of a directory representing a valid configuration using Python [dataclasses](https://docs.python.org/3/library/dataclasses.html), and validate real directories against this description. -2. Facilitating the generation of new configurations and metadata programmatically, in Python, as opposed to copying and editing files by hand or writing shell scripts. +2. Providing a scaffold for defining consistent read/write mechanisms through which complex, distributed configurations in legacy formats can be mapped to Python `dict`s. -3. Providing a consistent mechanism through which complex, distributed configurations in legacy formats can be validated using excellent tools such as [JSON Schema](https://json-schema.org/) and [Pydantic](https://docs.pydantic.dev/). +The ability to represent configurations as `dict`s is very useful indeed. +With no extra effort, we can: +- Validate configurations using excellent tools such as [JSON Schema](https://json-schema.org/) and [Pydantic](https://docs.pydantic.dev/). +- Generate new configurations and metadata programmatically, as opposed to copying and editing files by hand or writing shell scripts. ## Installation @@ -31,7 +35,6 @@ I wrote this because I sometimes work with quite old scientific models requiring ``` Currently Python versions equal to or above 3.12 are supported. -It has no dependencies other than the Standard Library. 
## Overview of usage @@ -50,7 +53,7 @@ These steps are most easily understood through examples. To start with, take a look at the [Usage](101.md) section. More realistic examples can be found under the 'examples' heading. -All of the examples (including 'Usage') are based on self-contained [marimo](https://marimo.io/) notebooks, which can be browsed and downloaded [here](https://github.com/jmarshrossney/dirconf/tree/main/examples/) +All examples are based on self-contained [marimo](https://marimo.io/) notebooks, which can be found [here](https://github.com/jmarshrossney/dirconf/tree/main/examples/). ## Philosophy diff --git a/examples/101/notebook.py b/examples/101/notebook.py index 52fb043..636808d 100644 --- a/examples/101/notebook.py +++ b/examples/101/notebook.py @@ -2,6 +2,7 @@ # requires-python = ">=3.12" # dependencies = [ # "marimo", +# "pydantic", # "pyyaml", # ] # /// @@ -779,9 +780,91 @@ def _(mo): @app.cell(hide_code=True) def _(mo): mo.md(r""" - ### Strategies for validation + ## Config Validation - *To do.* + A primary motivation for reading file-based configurations into Python dicts is to enable validation using Python tooling. + + Here we demonstrate how to validate the configuration dict returned by `read` using [Pydantic](https://docs.pydantic.dev/). 
+ """) + return + + +@app.cell +def _(): + from pydantic import BaseModel + + class ParamsModel(BaseModel): + a: float + b: float + c: float + + class ConfigModel(BaseModel): + id: str + params: ParamsModel + init_state: list[float] + switch: bool + + return (ConfigModel,) + + +@app.cell(hide_code=True) +def _(mo): + mo.md(r""" + We can now validate the 'basic' configuration from earlier: + """) + return + + +@app.cell +def _(config_dict): + config_dict + return + + +@app.cell +def _(ConfigModel, config_dict): + validated_config = ConfigModel(**config_dict["config"]) + validated_config + return + + +@app.cell(hide_code=True) +def _(mo): + mo.md(r""" + If the configuration contains invalid data, Pydantic will raise a clear validation error: + """) + return + + +@app.cell +def _(ConfigModel): + try: + ConfigModel( + id=123, + params={"a": "not a float", "b": 2.0, "c": 3.0}, + init_state=[0, 0, 0], + switch=True, + ) + except Exception as e: + print(type(e).__name__) + print(e) + return + + +@app.cell(hide_code=True) +def _(mo): + mo.md(r""" + !!! tip + You can integrate validation directly into your workflow by wrapping the `read` method: + + ```python + def read_validated(config_instance, path): + config_dict = config_instance.read(path) + config_dict["config"] = ConfigModel(**config_dict["config"]).model_dump() + return config_dict + ``` + + This ensures that every time you load a configuration, it is automatically validated against your Pydantic model. """) return diff --git a/pyproject.toml b/pyproject.toml index a0160cc..f9ba4a4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "uv_build" [project] name = "dirconf" -version = "0.5.0" +version = "0.5.1" description = "Build declarative schemas for multi-file configuration directories using Python dataclasses, with dict-based read and write." 
authors = [ { name = "Joe Marsh Rossney", email = "17361029+jmarshrossney@users.noreply.github.com" } diff --git a/uv.lock b/uv.lock index 5c6e63e..ab11c29 100644 --- a/uv.lock +++ b/uv.lock @@ -219,7 +219,7 @@ wheels = [ [[package]] name = "dirconf" -version = "0.5.0" +version = "0.5.1" source = { editable = "." } [package.dev-dependencies]