40 changes: 40 additions & 0 deletions .github/workflows/docs-preview.yaml
@@ -0,0 +1,40 @@
# .github/workflows/docs-preview.yaml
name: Docs (PR preview)

on:
pull_request:
branches: [ main ] # runs for ANY PR targeting main (e.g., your 20-create-docs → main)

permissions:
contents: write
pages: write
id-token: write
pull-requests: write

jobs:
preview:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

# build your docs (use uv or pip; shown with uv here)
- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: Install project + doc tooling
run: |
uv venv .venv
. .venv/bin/activate
uv pip install -e .
uv pip install -U jupyter-book "sphinx>=7" sphinx-autodoc-typehints
- name: Build Jupyter Book
run: |
. .venv/bin/activate
uv run jupyter-book build docs/

# Publishes to /pr-<PR number>/ by default and comments the link on the PR
- name: Deploy PR Preview
uses: rossjrw/pr-preview-action@v1
with:
source-dir: docs/_build/html
38 changes: 38 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,38 @@
name: Docs

on:
push:
branches: [ main ]
pull_request:

jobs:
build-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

# Install uv (or use your preferred installer)
- name: Install uv
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
echo "$HOME/.local/bin" >> $GITHUB_PATH

# Create / use env and install your package + doc deps
- name: Install project (editable) and doc tooling
run: |
uv venv .venv
. .venv/bin/activate
uv pip install -e .
uv pip install jupyter-book "sphinx>=7" sphinx-autodoc-typehints

# Build the book with the SAME interpreter/env
- name: Build docs
run: |
. .venv/bin/activate
uv run jupyter-book build docs/

# (Optional) publish artifacts or deploy HTML to GitHub Pages
- name: Upload site
uses: actions/upload-pages-artifact@v3
with:
path: docs/_build/html
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ outputs
wandb
notebooks/data
notebooks/*.nc
docs/_build/
4 changes: 4 additions & 0 deletions README.md
@@ -2,6 +2,10 @@

A pipeline for predicting sea ice.

## Documentation

📚 **[View the full documentation](https://alan-turing-institute.github.io/ice-station-zebra/pr-112/)** - Complete API reference, guides, and examples.

## Setting up your environment

### Tools
25 changes: 25 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,25 @@
# Book settings
title: "Ice Station Zebra API Documentation"
author: The Alan Turing Institute

sphinx:
extra_extensions:
- sphinx.ext.autodoc
- sphinx.ext.autosummary
config:
autosummary_generate: true
autodoc_typehints: "description" # optional
autodoc_member_order: "bysource" # optional
autoclass_content: "init" # pull class doc from __init__ docstring

# The table of contents is defined in _toc.yml
20 changes: 20 additions & 0 deletions docs/_toc.yml
@@ -0,0 +1,20 @@
# Table of contents
format: jb-book
root: intro
chapters:
- file: quickstart
- file: adding-new-models
- file: cli
- file: api/index
sections:
- file: api/data_loaders
- file: api/models
- file: api/models-common
- file: api/models-encoders
- file: api/models-decoders
- file: api/models-processors
- file: api/models-diffusion
- file: api/training
- file: api/evaluation
- file: api/types
- file: api/visualisations
53 changes: 53 additions & 0 deletions docs/adding-new-models.md
@@ -0,0 +1,53 @@
# Adding new models

## Background

An `ice-station-zebra` model needs to be able to run over multiple datasets, each with its own dimensions.
These are structured in `NTCHW` format, where:
- `N` is the batch size,
- `T` is the number of history steps for inputs (or forecast steps for outputs),
- `C` is the number of channels or variables,
- `H` is the height dimension,
- `W` is the width dimension.

`N` and `T` will be the same for all inputs, but `C`, `H` and `W` might vary.

Taking as an example a batch size of 2 (`N=2`), 3 history steps and 4 forecast steps, we will have `k` inputs of shape `(2, 3, C_k, H_k, W_k)` and one output of shape `(2, 4, C_out, H_out, W_out)`.
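
A minimal sketch of those shapes using PyTorch (the dataset names and the channel/spatial sizes below are purely illustrative):

```python
import torch

N, T_HIST, T_FORECAST = 2, 3, 4

# k = 2 hypothetical inputs, each with its own channel and spatial dimensions
inputs = {
    "dataset_a": torch.randn(N, T_HIST, 10, 128, 128),  # (N, T, C_1, H_1, W_1)
    "dataset_b": torch.randn(N, T_HIST, 1, 432, 432),    # (N, T, C_2, H_2, W_2)
}

# A single target with its own channel and spatial dimensions
target = torch.randn(N, T_FORECAST, 1, 432, 432)          # (N, T, C_out, H_out, W_out)
```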

## Standalone models

A standalone model will need to accept a `dict[str, TensorNTCHW]` that maps dataset names to an `NTCHW` tensor of values.
The model might want to use one or more of these for training, and will need to produce an output of shape `(N, T, C_out, H_out, W_out)`.
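
As a hedged illustration of this interface (the class and argument names below are invented for the sketch and are not part of the framework), a standalone model's `forward` might look like:

```python
import torch
from torch import Tensor, nn


class StandaloneSketch(nn.Module):
    """Illustrative only: accepts a dict of NTCHW tensors and returns one NTCHW tensor."""

    def __init__(self, n_forecast_steps: int, c_out: int, h_out: int, w_out: int) -> None:
        super().__init__()
        self.n_forecast_steps = n_forecast_steps
        self.c_out, self.h_out, self.w_out = c_out, h_out, w_out

    def forward(self, inputs: dict[str, Tensor]) -> Tensor:
        # Each value has shape (N, T_history, C_k, H_k, W_k); a real model would combine them
        n = next(iter(inputs.values())).shape[0]
        # Placeholder output with the required shape (N, T_forecast, C_out, H_out, W_out)
        return torch.zeros(n, self.n_forecast_steps, self.c_out, self.h_out, self.w_out)
```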

As can be seen in the example below, a separate instance of the model is likely to be needed for each output to be predicted.

![Standalone Pipeline](pipeline-standalone.png)

Pros:
- all input variables are available without transformation

Cons:
- hard to add new inputs
- hard to add new outputs

## Processor models

A processor model forms the processing stage of a larger encode-process-decode model.
Start by defining a latent space as `(C_latent, H_latent, W_latent)`; in the example below, this has been set to `(10, 64, 64)`.
The encode-process-decode model automatically creates one encoder for each input and one decoder for each output.
Each dataset-specific encoder takes its input data and converts it to shape `(N, C_latent, H_latent, W_latent)`, compressing the time and channel dimensions.
The `k` encoded datasets can then be combined in latent space to give a single dataset of shape `(N, k * C_latent, H_latent, W_latent)`.

This is then passed to the processor, which must accept input of shape `(N, k * C_latent, H_latent, W_latent)` and produce output of the same shape.

This output is then passed to one or more output-specific decoders which take input of shape `(N, k * C_latent, H_latent, W_latent)` and produce output of shape `(N, T, C_out, H_out, W_out)`, regenerating the time dimension.
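
A minimal sketch of this shape bookkeeping (assuming the `(10, 64, 64)` latent space above, `k = 3` inputs and an illustrative output size; the placeholders stand in for learned encoders, processor and decoders):

```python
import torch

N, K = 2, 3                                # batch size, number of input datasets
C_LATENT, H_LATENT, W_LATENT = 10, 64, 64  # the latent space chosen above
T_FORECAST, C_OUT, H_OUT, W_OUT = 4, 1, 432, 432

# Each dataset-specific encoder: (N, T_history, C_k, H_k, W_k) -> (N, C_latent, H_latent, W_latent)
encoded = [torch.randn(N, C_LATENT, H_LATENT, W_LATENT) for _ in range(K)]

# Combine the encoded datasets along the channel axis
latent = torch.cat(encoded, dim=1)  # (N, k * C_latent, H_latent, W_latent)

# The processor must map this shape onto itself
processed = latent  # placeholder for a learned processor
assert processed.shape == (N, K * C_LATENT, H_LATENT, W_LATENT)

# Each output-specific decoder regenerates the time dimension
decoded = torch.randn(N, T_FORECAST, C_OUT, H_OUT, W_OUT)  # placeholder for a learned decoder
```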

![Encode-Process-Decode Pipeline](pipeline-encode-process-decode.png)

Pros:
- easy to add new inputs
- easy to add new outputs

Cons:
- input variables have been transformed into latent space
- time-step information has been compressed into the latent space
88 changes: 88 additions & 0 deletions docs/api/data_loaders.rst
@@ -0,0 +1,88 @@
Data Loaders
============

The data loaders module provides classes for loading and managing datasets in the Ice Station Zebra framework.

Classes
-------

CombinedDataset
~~~~~~~~~~~~~~~

.. container:: toggle

.. autoclass:: ice_station_zebra.data_loaders.combined_dataset.CombinedDataset
:members:
:undoc-members:
:show-inheritance:

ZebraDataModule
~~~~~~~~~~~~~~~

.. container:: toggle

.. autoclass:: ice_station_zebra.data_loaders.zebra_data_module.ZebraDataModule
:members:
:undoc-members:
:show-inheritance:

ZebraDataset
~~~~~~~~~~~~

.. container:: toggle

.. autoclass:: ice_station_zebra.data_loaders.zebra_dataset.ZebraDataset
:members:
:undoc-members:
:show-inheritance:

Usage Examples
--------------

Loading a Dataset
~~~~~~~~~~~~~~~~~

.. code-block:: python

from ice_station_zebra.data_loaders.zebra_dataset import ZebraDataset

# Load a dataset
dataset = ZebraDataset("path/to/dataset.zarr")

# Access data
data = dataset[0] # Get first sample


Combining Multiple Datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

from ice_station_zebra.data_loaders.combined_dataset import CombinedDataset

# Create a combined dataset from multiple ZebraDatasets
combined = CombinedDataset(
datasets=[dataset1, dataset2, dataset3],
target="target_dataset_name",
n_forecast_steps=4,
n_history_steps=3
)

# Access combined data
sample = combined[0] # Returns dict with input and target data

Using ZebraDataModule
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

from ice_station_zebra.data_loaders.zebra_data_module import ZebraDataModule
from omegaconf import DictConfig

# Initialize with an OmegaConf DictConfig loaded elsewhere (not constructed in this snippet)
data_module = ZebraDataModule(config)

# Get data loaders
train_loader = data_module.train_dataloader()
val_loader = data_module.val_dataloader()
test_loader = data_module.test_dataloader()
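
Inspecting a Batch
~~~~~~~~~~~~~~~~~~

The exact keys and tensor shapes in each batch are determined by ``CombinedDataset`` and the
run configuration; the sketch below only assumes that a batch maps names to ``NTCHW``-style tensors.

.. code-block:: python

   # Grab one batch from the training loader and print the shape of each entry
   batch = next(iter(train_loader))
   for name, tensor in batch.items():
       print(name, tuple(tensor.shape))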
17 changes: 17 additions & 0 deletions docs/api/evaluation.rst
@@ -0,0 +1,17 @@
Evaluation
==========

The evaluation module provides evaluation metrics and utilities for model assessment in the Ice Station Zebra framework.

Classes
-------

ZebraEvaluator
~~~~~~~~~~~~~~

.. container:: toggle

.. autoclass:: ice_station_zebra.evaluation.evaluator.ZebraEvaluator
:members:
:undoc-members:
:show-inheritance:
46 changes: 46 additions & 0 deletions docs/api/index.md
@@ -0,0 +1,46 @@
# API Reference

This section contains detailed documentation for all classes and functions in the Ice Station Zebra framework.

## Modules

### Data Loaders
Classes for loading and managing datasets:
- **CombinedDataset** - Combines multiple ZebraDatasets for training
- **ZebraDataModule** - Lightning DataModule for dataset management
- **ZebraDataset** - Base dataset class for individual datasets

### Data Processors
Classes for preprocessing and transforming data:
- **ZebraDataProcessor** - Main data processing pipeline
- **ZebraDataProcessorFactory** - Factory for creating processors

### Models
Neural network models and architectures:
- **[Main Models](models.md)** - Core model classes (ZebraModel, EncodeProcessDecode, Persistence)
- **[Common Components](models-common.md)** - Building blocks and utilities
- **[Encoders](models-encoders.md)** - Input encoding components
- **[Decoders](models-decoders.md)** - Output decoding components
- **[Processors](models-processors.md)** - Latent space processing components
- **[Diffusion Models](models-diffusion.md)** - Diffusion-based forecasting algorithms

### Training
Training utilities and trainers:
- **ZebraTrainer** - Main training class

### Evaluation
Evaluation metrics and utilities:
- **ZebraEvaluator** - Model evaluation class

### Types
Type definitions and data structures:
- **ArrayTCHW** - Time-Channel-Height-Width array type
- **DataSpace** - Data space definition
- **DataloaderArgs** - DataLoader arguments

### Visualisations
Plotting and visualization utilities:
- **PlottingCore** - Core plotting functionality
- **PlottingMaps** - Map-based visualizations
- **Layout** - Plot layout utilities
- **Convert** - Data conversion utilities