diff --git a/.github/workflows/docs-preview.yaml b/.github/workflows/docs-preview.yaml
new file mode 100644
index 00000000..fc789905
--- /dev/null
+++ b/.github/workflows/docs-preview.yaml
@@ -0,0 +1,40 @@
+# .github/workflows/docs-preview.yaml
+name: Docs (PR preview)
+
+on:
+  pull_request:
+    branches: [ main ]  # runs for ANY PR targeting main (e.g., your 20-create-docs → main)
+
+permissions:
+  contents: write
+  pages: write
+  id-token: write
+  pull-requests: write
+
+jobs:
+  preview:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      # Build the docs (use uv or pip; shown with uv here)
+      - name: Install uv
+        run: |
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+      - name: Install project + doc tooling
+        run: |
+          uv venv .venv
+          . .venv/bin/activate
+          uv pip install -e .
+          uv pip install -U jupyter-book "sphinx>=7" sphinx-autodoc-typehints
+      - name: Build Jupyter Book
+        run: |
+          . .venv/bin/activate
+          uv run jupyter-book build docs/
+
+      # Publishes to /pr-<PR number>/ by default and comments the link on the PR
+      - name: Deploy PR Preview
+        uses: rossjrw/pr-preview-action@v1
+        with:
+          source-dir: docs/_build/html
diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
new file mode 100644
index 00000000..d2c07e8c
--- /dev/null
+++ b/.github/workflows/docs.yaml
@@ -0,0 +1,38 @@
+name: Docs
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+
+jobs:
+  build-docs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      # Install uv (or use your preferred installer)
+      - name: Install uv
+        run: |
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+
+      # Create / use env and install your package + doc deps
+      - name: Install project (editable) and doc tooling
+        run: |
+          uv venv .venv
+          . .venv/bin/activate
+          uv pip install -e .
+          uv pip install jupyter-book "sphinx>=7" sphinx-autodoc-typehints
+
+      # Build the book with the SAME interpreter/env
+      - name: Build docs
+        run: |
+          . .venv/bin/activate
+          uv run jupyter-book build docs/
+
+      # (Optional) publish artifacts or deploy HTML to GitHub Pages
+      - name: Upload site
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: docs/_build/html
diff --git a/.gitignore b/.gitignore
index c668d847..71cac5ea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,3 +11,4 @@ outputs
 wandb
 notebooks/data
 notebooks/*.nc
+docs/_build/
diff --git a/README.md b/README.md
index 0ddf39a9..822a1773 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,10 @@
 
 A pipeline for predicting sea ice.
 
+## Documentation
+
+📚 **[View the full documentation](https://alan-turing-institute.github.io/ice-station-zebra/pr-112/)** - Complete API reference, guides, and examples.
+
 ## Setting up your environment
 
 ### Tools
diff --git a/docs/_config.yml b/docs/_config.yml
new file mode 100644
index 00000000..ac67a063
--- /dev/null
+++ b/docs/_config.yml
@@ -0,0 +1,13 @@
+# Book settings
+title: "Ice Station Zebra API Documentation"
+author: The Alan Turing Institute
+
+sphinx:
+  extra_extensions:
+    - sphinx.ext.autodoc
+    - sphinx.ext.autosummary
+  config:
+    autosummary_generate: true
+    autodoc_typehints: "description"  # optional
+    autodoc_member_order: "bysource"  # optional
+    autoclass_content: "init"  # pull class doc from __init__ docstring
diff --git a/docs/_toc.yml b/docs/_toc.yml
new file mode 100644
index 00000000..7c041df2
--- /dev/null
+++ b/docs/_toc.yml
@@ -0,0 +1,20 @@
+# Table of contents
+format: jb-book
+root: intro
+chapters:
+- file: quickstart
+- file: adding-new-models
+- file: cli
+- file: api/index
+  sections:
+  - file: api/data_loaders
+  - file: api/models
+  - file: api/models-common
+  - file: api/models-encoders
+  - file: api/models-decoders
+  - file: api/models-processors
+  - file: api/models-diffusion
+  - file: api/training
+  - file: api/evaluation
+  - file: api/types
+  - file: api/visualisations
diff --git a/docs/adding-new-models.md b/docs/adding-new-models.md
new file mode 100644
index 00000000..1ebd8cd0
--- /dev/null
+++ b/docs/adding-new-models.md
@@ -0,0 +1,110 @@
+# Adding new models
+
+## Background
+
+An `ice-station-zebra` model needs to be able to run over multiple different datasets with different dimensions.
+These are structured in `NTCHW` format, where:
+- `N` is the batch size,
+- `T` is the number of history steps (for inputs) or forecast steps (for outputs),
+- `C` is the number of channels or variables,
+- `H` is a height dimension,
+- `W` is a width dimension.
+
+`N` and `T` will be the same for all inputs, but `C`, `H` and `W` might vary.
+
+Taking as an example a batch size of 2 (`N=2`), 3 history steps and 4 forecast steps, we will have `k` inputs of shape `(2, 3, C_k, H_k, W_k)` and one output of shape `(2, 4, C_out, H_out, W_out)`.
+
+## Standalone models
+
+A standalone model will need to accept a `dict[str, TensorNTCHW]` which maps dataset names to an `NTCHW` Tensor of values.
+The model might want to use one or more of these for training, and will need to produce an output with shape `(N, T, C_out, H_out, W_out)`.
+
+As can be seen in the example below, a separate instance of the model is likely to be needed for each output to be predicted.
+
+![Standalone Pipeline](pipeline-standalone.png)
+
+Pros:
+- all input variables are available without transformation
+
+Cons:
+- hard to add new inputs
+- hard to add new outputs
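+
+The sketch below makes this interface concrete. It is a minimal, hypothetical example (the module name, channel sizes and the pooling choice are illustrative rather than part of the framework's API): it consumes the `dict[str, TensorNTCHW]`, uses a single named input and maps it to the required output shape.
+
+```python
+import torch
+from torch import Tensor, nn
+
+
+class NaiveStandaloneModel(nn.Module):
+    """Hypothetical standalone model: predicts one output from one named input."""
+
+    def __init__(self, input_name: str, c_in: int, c_out: int, n_forecast_steps: int) -> None:
+        super().__init__()
+        self.input_name = input_name
+        self.n_forecast_steps = n_forecast_steps
+        self.c_out = c_out
+        # Map input channels to (forecast steps x output channels) at each pixel
+        self.head = nn.Conv2d(c_in, n_forecast_steps * c_out, kernel_size=1)
+
+    def forward(self, inputs: dict[str, Tensor]) -> Tensor:
+        x = inputs[self.input_name]  # (N, T, C_k, H_k, W_k)
+        x = x.mean(dim=1)  # collapse history steps -> (N, C_k, H_k, W_k)
+        y = self.head(x)  # (N, T_out * C_out, H_k, W_k)
+        n, _, h, w = y.shape
+        # Assumes the output grid matches this input's grid, i.e. (H_out, W_out) = (H_k, W_k)
+        return y.view(n, self.n_forecast_steps, self.c_out, h, w)
+```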
+
+## Processor models
+
+A processor model is part of a larger encode-process-decode pipeline.
+Start by defining a latent space as `(C_latent, H_latent, W_latent)` - in the example below, this has been set to `(10, 64, 64)`.
+The encode-process-decode model automatically creates one encoder for each input and one decoder for each output.
+The dataset-specific encoder takes the input data and converts it to shape `(N, C_latent, H_latent, W_latent)`, compressing the time and channel dimensions.
+The `k` encoded datasets can then be combined in latent space to give a single dataset of shape `(N, k * C_latent, H_latent, W_latent)`.
+
+This is then passed to the processor, which must accept input of shape `(N, k * C_latent, H_latent, W_latent)` and produce output of the same shape.
+
+This output is then passed to one or more output-specific decoders, which take input of shape `(N, k * C_latent, H_latent, W_latent)` and produce output of shape `(N, T, C_out, H_out, W_out)`, regenerating the time dimension.
+
+![Encode-Process-Decode Pipeline](pipeline-encode-process-decode.png)
+
+Pros:
+- easy to add new inputs
+- easy to add new outputs
+
+Cons:
+- input variables have been transformed into latent space
+- time-step information has been compressed into the latent space
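+
+As a shape-level illustration, a processor can be any module that preserves `(N, k * C_latent, H_latent, W_latent)`. The sketch below is a minimal, hypothetical processor (a real implementation would follow the framework's `BaseProcessor` interface; see the API reference), shown only to make the contract concrete:
+
+```python
+import torch
+from torch import Tensor, nn
+
+
+class TinyConvProcessor(nn.Module):
+    """Hypothetical processor: transforms latent maps without changing their shape."""
+
+    def __init__(self, k: int, c_latent: int) -> None:
+        super().__init__()
+        channels = k * c_latent
+        # 3x3 convolutions with padding=1 preserve (H_latent, W_latent);
+        # input and output channel counts are identical
+        self.net = nn.Sequential(
+            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
+            nn.ReLU(),
+            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
+        )
+
+    def forward(self, x: Tensor) -> Tensor:
+        # x: (N, k * C_latent, H_latent, W_latent) -> same shape
+        return self.net(x)
+
+
+x = torch.randn(2, 3 * 10, 64, 64)  # N=2, k=3 inputs, C_latent=10, H=W=64
+assert TinyConvProcessor(k=3, c_latent=10)(x).shape == x.shape
+```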
diff --git a/docs/api/data_loaders.rst b/docs/api/data_loaders.rst
new file mode 100644
index 00000000..f44b8b29
--- /dev/null
+++ b/docs/api/data_loaders.rst
@@ -0,0 +1,88 @@
+Data Loaders
+============
+
+The data loaders module provides classes for loading and managing datasets in the Ice Station Zebra framework.
+
+Classes
+-------
+
+CombinedDataset
+~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.data_loaders.combined_dataset.CombinedDataset
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ZebraDataModule
+~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.data_loaders.zebra_data_module.ZebraDataModule
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ZebraDataset
+~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.data_loaders.zebra_dataset.ZebraDataset
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+Usage Examples
+--------------
+
+Loading a Dataset
+~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+   from ice_station_zebra.data_loaders.zebra_dataset import ZebraDataset
+
+   # Load a dataset
+   dataset = ZebraDataset("path/to/dataset.zarr")
+
+   # Access data
+   data = dataset[0]  # Get first sample
+
+
+Combining Multiple Datasets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+   from ice_station_zebra.data_loaders.combined_dataset import CombinedDataset
+
+   # Create a combined dataset from multiple ZebraDatasets
+   combined = CombinedDataset(
+       datasets=[dataset1, dataset2, dataset3],
+       target="target_dataset_name",
+       n_forecast_steps=4,
+       n_history_steps=3
+   )
+
+   # Access combined data
+   sample = combined[0]  # Returns dict with input and target data
+
+Using ZebraDataModule
+~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: python
+
+   from ice_station_zebra.data_loaders.zebra_data_module import ZebraDataModule
+   from omegaconf import DictConfig

+   # Initialize with a Hydra configuration (config is a DictConfig)
+   data_module = ZebraDataModule(config)
+
+   # Get data loaders
+   train_loader = data_module.train_dataloader()
+   val_loader = data_module.val_dataloader()
+   test_loader = data_module.test_dataloader()
diff --git a/docs/api/evaluation.rst b/docs/api/evaluation.rst
new file mode 100644
index 00000000..320ee8c3
--- /dev/null
+++ b/docs/api/evaluation.rst
@@ -0,0 +1,17 @@
+Evaluation
+==========
+
+The evaluation module provides evaluation metrics and utilities for model assessment in the Ice Station Zebra framework.
+
+Classes
+-------
+
+ZebraEvaluator
+~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.evaluation.evaluator.ZebraEvaluator
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/index.md b/docs/api/index.md
new file mode 100644
index 00000000..ab13462b
--- /dev/null
+++ b/docs/api/index.md
@@ -0,0 +1,46 @@
+# API Reference
+
+This section contains detailed documentation for all classes and functions in the Ice Station Zebra framework.
+
+## Modules
+
+### Data Loaders
+Classes for loading and managing datasets:
+- **CombinedDataset** - Combines multiple ZebraDatasets for training
+- **ZebraDataModule** - Lightning DataModule for dataset management
+- **ZebraDataset** - Base dataset class for individual datasets
+
+### Data Processors
+Classes for preprocessing and transforming data:
+- **ZebraDataProcessor** - Main data processing pipeline
+- **ZebraDataProcessorFactory** - Factory for creating processors
+
+### Models
+Neural network models and architectures:
+- **[Main Models](models.md)** - Core model classes (ZebraModel, EncodeProcessDecode, Persistence)
+- **[Common Components](models-common.md)** - Building blocks and utilities
+- **[Encoders](models-encoders.md)** - Input encoding components
+- **[Decoders](models-decoders.md)** - Output decoding components
+- **[Processors](models-processors.md)** - Latent space processing components
+- **[Diffusion Models](models-diffusion.md)** - Diffusion-based forecasting algorithms
+
+### Training
+Training utilities and trainers:
+- **ZebraTrainer** - Main training class
+
+### Evaluation
+Evaluation metrics and utilities:
+- **ZebraEvaluator** - Model evaluation class
+
+### Types
+Type definitions and data structures:
+- **ArrayTCHW** - Time-Channel-Height-Width array type
+- **DataSpace** - Data space definition
+- **DataloaderArgs** - DataLoader arguments
+
+### Visualisations
+Plotting and visualization utilities:
+- **PlottingCore** - Core plotting functionality
+- **PlottingMaps** - Map-based visualizations
+- **Layout** - Plot layout utilities
+- **Convert** - Data conversion utilities
diff --git a/docs/api/models-common.rst b/docs/api/models-common.rst
new file mode 100644
index 00000000..39a60da4
--- /dev/null
+++ b/docs/api/models-common.rst
@@ -0,0 +1,107 @@
+Common Components
+=================
+
+The common components module provides building blocks and utilities used across different model architectures.
+
+Classes
+-------
+
+CommonConvBlock
+~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.conv_block_common.CommonConvBlock
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ConvNormAct
+~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.conv_norm_act.ConvNormAct
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ConvBlockDownsample
+~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.conv_block_downsample.ConvBlockDownsample
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ConvBlockUpsample
+~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.conv_block_upsample.ConvBlockUpsample
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ConvBlockUpsampleNaive
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.conv_block_upsample_naive.ConvBlockUpsampleNaive
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+PatchEmbedding
+~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.patchembed.PatchEmbedding
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ResizingAveragePool2d
+~~~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.resizing_average_pool_2d.ResizingAveragePool2d
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+ResizingInterpolation
+~~~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.resizing_interpolation.ResizingInterpolation
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+TimeEmbed
+~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.time_embed.TimeEmbed
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+TransformerEncoderBlock
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.common.transformerblock.TransformerEncoderBlock
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/models-decoders.rst b/docs/api/models-decoders.rst
new file mode 100644
index 00000000..66f0a4c0
--- /dev/null
+++ b/docs/api/models-decoders.rst
@@ -0,0 +1,27 @@
+Decoders
+========
+
+The decoders module provides classes for decoding latent space representations back to output data.
+
+Classes
+-------
+
+BaseDecoder
+~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.decoders.base_decoder.BaseDecoder
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+CNNDecoder
+~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.decoders.cnn_decoder.CNNDecoder
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/models-diffusion.rst b/docs/api/models-diffusion.rst
new file mode 100644
index 00000000..d8405f90
--- /dev/null
+++ b/docs/api/models-diffusion.rst
@@ -0,0 +1,28 @@
+Diffusion Models
+================
+
+.. note::
+   Diffusion models provide the underlying denoising algorithms. To use them within the encode–process–decode pipeline, wrap them via :class:`DDPMProcessor` (see :doc:`Processors <models-processors>`).
+
+Classes
+-------
+
+GaussianDiffusion
+~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.diffusion.gaussian_diffusion.GaussianDiffusion
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+UNetDiffusion
+~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.diffusion.unet_diffusion.UNetDiffusion
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/models-encoders.rst b/docs/api/models-encoders.rst
new file mode 100644
index 00000000..918c2e28
--- /dev/null
+++ b/docs/api/models-encoders.rst
@@ -0,0 +1,27 @@
+Encoders
+========
+
+The encoders module provides classes for encoding input data into latent space representations.
+
+Classes
+-------
+
+BaseEncoder
+~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.encoders.base_encoder.BaseEncoder
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+CNNEncoder
+~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.encoders.cnn_encoder.CNNEncoder
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/models-processors.rst b/docs/api/models-processors.rst
new file mode 100644
index 00000000..c92a5945
--- /dev/null
+++ b/docs/api/models-processors.rst
@@ -0,0 +1,61 @@
+Processors
+==========
+
+The processors module provides classes for processing data in latent space within the encode-process-decode pipeline.
+
+Classes
+-------
+
+BaseProcessor
+~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.processors.base_processor.BaseProcessor
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+UNetProcessor
+~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.processors.unet.UNetProcessor
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+NullProcessor
+~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.processors.null.NullProcessor
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+VitProcessor
+~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.processors.vit.VitProcessor
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+DDPMProcessor
+~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.processors.ddpm.DDPMProcessor
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+   .. note::
+      This processor wraps the diffusion models from the :doc:`diffusion models <models-diffusion>` section.
+      See :class:`GaussianDiffusion` and :class:`UNetDiffusion` for the underlying denoising algorithms.
diff --git a/docs/api/models.rst b/docs/api/models.rst
new file mode 100644
index 00000000..5b550f2a
--- /dev/null
+++ b/docs/api/models.rst
@@ -0,0 +1,37 @@
+Models
+======
+
+The models module provides neural network architectures and model classes for the Ice Station Zebra framework.
+
+Main Model Classes
+------------------
+
+ZebraModel
+~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.zebra_model.ZebraModel
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+EncodeProcessDecode
+~~~~~~~~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.encode_process_decode.EncodeProcessDecode
+      :members:
+      :undoc-members:
+      :show-inheritance:
+
+Persistence
+~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.models.persistence.Persistence
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/training.rst b/docs/api/training.rst
new file mode 100644
index 00000000..e318b947
--- /dev/null
+++ b/docs/api/training.rst
@@ -0,0 +1,17 @@
+Training
+========
+
+The training module provides utilities and trainers for model training in the Ice Station Zebra framework.
+
+Classes
+-------
+
+ZebraTrainer
+~~~~~~~~~~~~
+
+.. container:: toggle
+
+   .. autoclass:: ice_station_zebra.training.trainer.ZebraTrainer
+      :members:
+      :undoc-members:
+      :show-inheritance:
diff --git a/docs/api/types.rst b/docs/api/types.rst
new file mode 100644
index 00000000..875c76b5
--- /dev/null
+++ b/docs/api/types.rst
@@ -0,0 +1,26 @@
+Types
+=====
+
+The types module provides type definitions and data structures for the Ice Station Zebra framework.
+
+Type Definitions
+----------------
+
+.. automodule:: ice_station_zebra.types.typedefs
+   :members:
+   :undoc-members:
+   :show-inheritance:
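+
+Usage Example
+-------------
+
+Tensors in the framework follow the ``NTCHW`` convention described in the *Adding new models* guide. The snippet below is a shape-level sketch (the dataset name and sizes are illustrative, not framework API):
+
+.. code-block:: python
+
+   import torch
+
+   # One NTCHW tensor per named dataset: batch of 2, 3 history steps,
+   # 5 channels on a 32x32 grid
+   batch = {"era5": torch.randn(2, 3, 5, 32, 32)}
+   n, t, c, h, w = batch["era5"].shape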
diff --git a/docs/api/visualisations.rst b/docs/api/visualisations.rst
new file mode 100644
index 00000000..798c5621
--- /dev/null
+++ b/docs/api/visualisations.rst
@@ -0,0 +1,15 @@
+Visualisations
+==============
+
+The visualisations module provides plotting and visualization utilities for the Ice Station Zebra framework.
+
+Functions
+---------
+
+Plotting Core
+~~~~~~~~~~~~~
+
+.. automodule:: ice_station_zebra.visualisations.plotting_core
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/cli.md b/docs/cli.md
new file mode 100644
index 00000000..e0366c6f
--- /dev/null
+++ b/docs/cli.md
@@ -0,0 +1,133 @@
+# CLI Commands
+
+The Ice Station Zebra CLI provides commands for dataset management, model training, and evaluation.
+
+## Main Entry Point
+
+The main CLI entry point is accessed through:
+
+```bash
+uv run zebra --help
+```
+
+This provides access to all available commands organized into subcommands.
+
+## Available Commands
+
+### Dataset Management
+
+#### Create Dataset
+```bash
+uv run zebra datasets create [CONFIG_NAME] [OVERRIDES...]
+```
+
+Creates a new dataset from configuration files.
+
+**Parameters:**
+- `CONFIG_NAME`: Name of the configuration file (optional, defaults to "zebra")
+- `OVERRIDES`: Space-separated Hydra config overrides
+
+**Example:**
+```bash
+uv run zebra datasets create era5-0d5-south-2019-12-24h-v1
+```
+
+#### Inspect Dataset
+```bash
+uv run zebra datasets inspect [CONFIG_NAME] [OVERRIDES...]
+```
+
+Inspects an existing dataset to show its structure and contents.
+
+**Parameters:**
+- `CONFIG_NAME`: Name of the configuration file (optional, defaults to "zebra")
+- `OVERRIDES`: Space-separated Hydra config overrides
+
+**Example:**
+```bash
+uv run zebra datasets inspect era5-0d5-south-2019-12-24h-v1
+```
+
+### Model Training
+
+#### Train Model
+```bash
+uv run zebra train [CONFIG_NAME] [OVERRIDES...]
+```
+
+Trains a model using the specified configuration.
+
+**Parameters:**
+- `CONFIG_NAME`: Name of the configuration file (optional, defaults to "zebra")
+- `OVERRIDES`: Space-separated Hydra config overrides
+
+**Example:**
+```bash
+uv run zebra train encode_ddpm_decode
+```
+
+### Model Evaluation
+
+#### Evaluate Model
+```bash
+uv run zebra evaluate [CONFIG_NAME] [OVERRIDES...]
+```
+
+Evaluates a trained model on test data.
+
+**Parameters:**
+- `CONFIG_NAME`: Name of the configuration file (optional, defaults to "zebra")
+- `OVERRIDES`: Space-separated Hydra config overrides
+
+**Example:**
+```bash
+uv run zebra evaluate default
+```
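+
+To evaluate a specific training run, pass a saved checkpoint (the flag is documented in the [Quickstart](quickstart.md); the checkpoint path below is illustrative):
+
+```bash
+uv run zebra evaluate --checkpoint /path/to/checkpoint.ckpt
+```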
+
+## Configuration Overrides
+
+All commands support Hydra configuration overrides using the syntax:
+
+```bash
+uv run zebra [COMMAND] [CONFIG_NAME] key=value key.subkey=value
+```
+
+**Examples:**
+```bash
+# Override model parameters
+uv run zebra train encode_ddpm_decode model.n_forecast_steps=7
+
+# Override data paths
+uv run zebra datasets create era5-0d5-south-2019-12-24h-v1 data.path=/path/to/data
+
+# Multiple overrides
+uv run zebra train encode_ddpm_decode model.n_forecast_steps=7 trainer.max_epochs=100
+```
+
+## Getting Help
+
+For detailed help on any command:
+
+```bash
+uv run zebra --help
+uv run zebra datasets --help
+uv run zebra train --help
+uv run zebra evaluate --help
+```
+
+## Configuration Files
+
+Configuration files are located in `ice_station_zebra/config/` and define:
+
+- **Dataset configurations**: Data sources, preprocessing, and storage
+- **Model configurations**: Architecture, training parameters, and optimization
+- **Training configurations**: Trainer settings, callbacks, and logging
+- **Evaluation configurations**: Metrics, visualization, and output formats
+
+See the [Quickstart](quickstart.md) for more details on creating and customising configuration files.
diff --git a/docs/intro.md b/docs/intro.md
new file mode 100644
index 00000000..bf2498f1
--- /dev/null
+++ b/docs/intro.md
@@ -0,0 +1,14 @@
+# Welcome to Ice Station Zebra!
+
+Welcome to the Ice Station Zebra API documentation.
+
+## Overview
+
+Ice Station Zebra is a machine learning framework for sea ice forecasting and analysis.
+
+## Documentation Sections
+
+- **[Quickstart](quickstart.md)** - Setting up your environment and running zebra commands
+- **[Adding New Models](adding-new-models.md)** - Guide to creating standalone and processor models
+- **[CLI Commands](cli.md)** - Command-line interface for dataset management, training, and evaluation
+- **[API Reference](api/index.md)** - Detailed information about all classes and functions
diff --git a/docs/logo.png b/docs/logo.png
new file mode 100644
index 00000000..06d56f40
Binary files /dev/null and b/docs/logo.png differ
diff --git a/docs/assets/pipeline-encode-process-decode.png b/docs/pipeline-encode-process-decode.png
similarity index 100%
rename from docs/assets/pipeline-encode-process-decode.png
rename to docs/pipeline-encode-process-decode.png
diff --git a/docs/assets/pipeline-standalone.png b/docs/pipeline-standalone.png
similarity index 100%
rename from docs/assets/pipeline-standalone.png
rename to docs/pipeline-standalone.png
diff --git a/docs/quickstart.md b/docs/quickstart.md
new file mode 100644
index 00000000..f3256882
--- /dev/null
+++ b/docs/quickstart.md
@@ -0,0 +1,90 @@
+# Quickstart
+
+## Setting up your environment
+
+### Tools
+
+You will need to install the following tools if you want to develop this project:
+
+- [`uv`](https://docs.astral.sh/uv/getting-started/installation/)
+
+### Creating your own configuration file
+
+Create a file in `config` that is called `<name>.local.yaml`.
+You will want this to inherit from `base.yaml` and then apply your own changes on top.
+For example, the following config will override the `base_path` option in `base.yaml`:
+
+```yaml
+defaults:
+  - base
+
+base_path: /local/path/to/my/data
+```
+
+You can then run this with, e.g.:
+
+```bash
+uv run zebra datasets create --config-name <name>.local.yaml
+```
+You can also use this config to override other options in the `base.yaml` file, as shown below:
+
+```yaml
+defaults:
+  - base
+  - override /model: encode_unet_decode # Use this format if you want to use a different config
+
+# Override specific model parameters
+model:
+  processor:
+    start_out_channels: 37 # Use this format to override specific model parameters in the named configs
+
+base_path: /local/path/to/my/data
+```
+
+Alternatively, you can apply overrides to specific options at the command line like this:

+```bash
+uv run zebra datasets create ++base_path=/local/path/to/my/data
+```
+
+Note that `persistence.yaml` overrides the specific options in `base.yaml` needed to run the `Persistence` model.
+
+### Running on Baskerville
+
+As `uv` cannot easily be installed on Baskerville, you should install the `zebra` package directly into a virtual environment that you have set up.
+
+```bash
+source /path/to/venv/bin/activate
+pip install -e .
+```
+
+This means that later commands like `uv run X ...` should simply be `X ...` instead.
+
+## Running Zebra commands
+
+### Create
+
+You will need a [CDS account](https://cds.climate.copernicus.eu/how-to-api) to download data with `anemoi`.
+
+Run `uv run zebra datasets create` to download all datasets locally.
+
+### Inspect
+
+Run `uv run zebra datasets inspect` to inspect all datasets available locally.
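+
+For example, to create and then inspect a single named dataset (the dataset name below is taken from the CLI reference; yours may differ):
+
+```bash
+uv run zebra datasets create era5-0d5-south-2019-12-24h-v1
+uv run zebra datasets inspect era5-0d5-south-2019-12-24h-v1
+```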
+
+### Train
+
+Run `uv run zebra train` to train using the datasets specified in the config.
+
+:information_source: This will save checkpoints to `${BASE_DIR}/training/wandb/run-${DATE}-${RANDOM_STRING}/checkpoints/${CHECKPOINT_NAME}.ckpt`.
+
+### Evaluate
+
+Run `uv run zebra evaluate --checkpoint PATH_TO_A_CHECKPOINT` to evaluate using a checkpoint from a training run.
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 00000000..7e821e45
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,3 @@
+jupyter-book
+matplotlib
+numpy