This version of the code is under active development. For a stable version of the code associated with the manuscript "Inductive bias influences the spatial scale of biological features learned from images", please use the version on Zenodo. The datasets used in the paper can also be found on Zenodo.
Spatiotemporal encoder is a machine learning tool for rapidly developing and testing neural network autoencoder architectures on timelapse imaging data. It was developed as part of the research described in the manuscript "Inductive bias influences the spatial scale of biological features learned from images".
Package and dependency management for this project is done with Poetry. To install dependencies, navigate to the project folder in the command line and run:
$ poetry install
If you do not have Poetry installed, refer to the Poetry documentation.
Once dependencies are installed, place imaging data under the paths expected by your dataset YAML (see below). Training reads configuration from src/conf/.
src/conf/config.yaml sets top-level flags:
- study_name: Base name of the study file in src/conf/studies/ (without .yaml), e.g. architecture-gastruloid loads src/conf/studies/architecture-gastruloid.yaml.
- data_quantity_experiment: If true, runs the data-quantity sweep; if false, runs standard training from the study file.
- debug: Enables debug behavior in the training runner.
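As a rough illustration of how study_name maps to a file, the sketch below builds the study path from the top-level flags. The helper study_path and the in-memory config dict are assumptions made for this example; the project's own loader may resolve the file differently.

```python
from pathlib import Path

# Hypothetical helper for illustration; not part of the project.
STUDIES_DIR = Path("src/conf/studies")


def study_path(study_name: str, studies_dir: Path = STUDIES_DIR) -> Path:
    """Return the study YAML path for a study_name given without the .yaml suffix."""
    if study_name.endswith(".yaml"):
        raise ValueError("study_name should be given without the .yaml suffix")
    return studies_dir / f"{study_name}.yaml"


# Stand-in for the parsed contents of src/conf/config.yaml.
config = {
    "study_name": "architecture-gastruloid",
    "data_quantity_experiment": False,
    "debug": False,
}

print(study_path(config["study_name"]).as_posix())
```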
Files in src/conf/studies/ define experiments. Each study YAML has an experiments mapping: keys are experiment IDs (e.g. ae, vit), values list datasets, model assets, and run options.
Required shape:
experiments:
  <experiment_id>:
    datasets:
      - <dataset_config_name>
    model:
      architecture: <model_yaml_stem>   # src/conf/models/<stem>.yaml
      num_timepoints: <int>
      params: <hyperparams_yaml_stem>   # src/conf/hyperparams/<stem>.yaml
    general_configs:
      pretrain: <true|false>
      verbose: <true|false>

The names under datasets, architecture, and params must match the corresponding YAML stems in src/conf/datasets/, src/conf/models/, and src/conf/hyperparams/. Packaged examples include architecture-gastruloid.yaml and architecture-ARCADE.yaml (extra architectures may appear commented out; uncomment them or add entries to enable them).
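Because every name in a study file must match a YAML stem on disk, a quick pre-flight check can catch typos before training starts. The following is a minimal sketch, not the project's own validation: the function names are hypothetical, the experiment is passed as an already-parsed dict, and params is accepted either under model or directly under the experiment, since the exact nesting is an assumption here. The dataset stem my_dataset is a placeholder.

```python
from __future__ import annotations

from pathlib import Path


def referenced_config_paths(experiment: dict, conf_root: Path) -> list[Path]:
    """List the YAML files that one experiment entry refers to by stem."""
    model = experiment.get("model", {})
    # Assumption: params may sit under model or beside it; accept either.
    params = model.get("params", experiment.get("params"))
    paths = [conf_root / "datasets" / f"{name}.yaml"
             for name in experiment.get("datasets", [])]
    if "architecture" in model:
        paths.append(conf_root / "models" / f"{model['architecture']}.yaml")
    if params is not None:
        paths.append(conf_root / "hyperparams" / f"{params}.yaml")
    return paths


def missing_config_paths(experiments: dict, conf_root: Path) -> dict[str, list[Path]]:
    """Map each experiment ID to the referenced files that do not exist."""
    report = {}
    for exp_id, experiment in experiments.items():
        missing = [p for p in referenced_config_paths(experiment, conf_root)
                   if not p.is_file()]
        if missing:
            report[exp_id] = missing
    return report


# Hypothetical study content; "my_dataset" is a placeholder stem.
experiments = {
    "ae": {
        "datasets": ["my_dataset"],
        "model": {"architecture": "ae_small", "num_timepoints": 1,
                  "params": "grid_optimizers"},
    }
}
for exp_id, paths in missing_config_paths(experiments, Path("src/conf")).items():
    print(exp_id, [p.as_posix() for p in paths])
```

An empty report means every referenced stem resolved to a file under the given configuration root.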
Files in src/conf/datasets/ describe loaders and data layout. Required fields (see DatasetConfig in src/simulation_encoder/dataclass/config_schemas.py):
- loader: Which loader handles parsing (e.g. gastruloid, ARCADE).
- image_dir, label_dir: Paths to image stacks and label files.
- image_size: Edge length of square inputs (pixels).
- channels: Channel names passed to the model.
- batch_size, val_split, test_split: Batch size and validation/test fractions (val_split + test_split must be < 1).
- keys: Which sample keys to include (dataset-specific).
Optional:
- augmentations: Augmentation list (e.g. rotate: 90); may be empty or commented out.
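The field list above can be pictured as a small dataclass with a check on the split fractions. This is an illustrative stand-in only: the real schema is DatasetConfig in config_schemas.py, and the class name DatasetConfigSketch, the field types, the positivity checks, and all example values below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetConfigSketch:
    """Illustrative stand-in; the real schema is DatasetConfig in config_schemas.py."""
    loader: str
    image_dir: str
    label_dir: str
    image_size: int
    channels: list
    batch_size: int
    val_split: float
    test_split: float
    keys: list
    augmentations: list = field(default_factory=list)  # optional

    def __post_init__(self):
        # Assumed sanity checks; only the "< 1" rule is stated in the README.
        if self.image_size <= 0 or self.batch_size <= 0:
            raise ValueError("image_size and batch_size must be positive")
        if self.val_split < 0 or self.test_split < 0:
            raise ValueError("val_split and test_split must be non-negative")
        if self.val_split + self.test_split >= 1:
            raise ValueError("val_split + test_split must be < 1")

    @property
    def train_fraction(self) -> float:
        """Fraction of samples left for training after the two splits."""
        return 1.0 - self.val_split - self.test_split


# Hypothetical values, for illustration only.
cfg = DatasetConfigSketch(
    loader="gastruloid",
    image_dir="data/images",
    label_dir="data/labels",
    image_size=64,
    channels=["channel_0"],
    batch_size=32,
    val_split=0.1,
    test_split=0.1,
    keys=["sample_0"],
)
print(cfg.train_fraction)
```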
Files in src/conf/models/ define the network. The bundled manuscript-style configs use:
- type: e.g. AE for ae_small, cae_small, neuralop_small, and vit_small.
- architecture: Layer lists under encoder, decoder_image, and decoder_timepoint. Each layer has a type (PyTorch module name) and kwargs. Runtime placeholders include num_channels, latent_dim, image_size, and tokens such as decoder_spatial_flat.
Supported building blocks include standard conv/linear stacks, FNO (in neuralop_small.yaml), and VisionTransformer (vit_small.yaml). Other layouts (e.g. flat_cnn.yaml) may differ for one-off experiments.
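To make the idea of runtime placeholders concrete, the sketch below substitutes placeholder names in a layer list with values known only at run time. It assumes placeholders appear as plain string values among a layer's kwargs; resolve_placeholders and the example layer list are hypothetical and do not reproduce the project's builder. After substitution, one plausible way to instantiate each layer would be to look up its type in torch.nn and pass the remaining keys as keyword arguments.

```python
def resolve_placeholders(layers, runtime_values):
    """Return a copy of the layer list with string placeholders replaced.

    A kwarg is treated as a placeholder when its value is a string that
    matches a key in runtime_values (an assumption made for this sketch).
    """
    resolved = []
    for layer in layers:
        new_layer = {}
        for key, value in layer.items():
            if key != "type" and isinstance(value, str) and value in runtime_values:
                new_layer[key] = runtime_values[value]
            else:
                new_layer[key] = value
        resolved.append(new_layer)
    return resolved


# Hypothetical encoder layer list, as it might look after parsing the YAML.
encoder_spec = [
    {"type": "Conv2d", "in_channels": "num_channels", "out_channels": 8, "kernel_size": 3},
    {"type": "Flatten"},
    {"type": "LazyLinear", "out_features": "latent_dim"},
]
runtime_values = {"num_channels": 2, "latent_dim": 16, "image_size": 64}

for layer in resolve_placeholders(encoder_spec, runtime_values):
    print(layer)
```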
Example skeleton:
type: AE
architecture:
  encoder:
    - type: <layer>
      # layer kwargs...
  decoder_image:
    - type: <layer>
  decoder_timepoint:
    - type: <layer>

For the manuscript "Inductive bias influences the spatial scale of biological features learned from images", the encoder YAMLs referenced in the paper are under src/conf/models/:
- MLP: ae_small.yaml
- CNN: cae_small.yaml
- FNO: neuralop_small.yaml
- ViT: vit_small.yaml
Files in src/conf/hyperparams/ describe training search spaces. grid_optimizers.yaml matches the current schema:
- num_epochs: Training epochs.
- continuous: Typically image_loss_weight with:
  - range: A single float or a two-element [low, high] interval (see HyperparameterRangeConfig in config_schemas.py).
  - search: e.g. linear.
  - num_samples: Number of samples along the continuous axis when a range is used.
- discrete: e.g. latent_dim with values: a list of latent sizes; optimizer with values: a list of optimizer dicts. Fields such as lr, betas, momentum, and nesterov may be lists, which are expanded in a grid (see the Adam and SGD blocks in grid_optimizers.yaml).
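The grid expansion described above can be sketched with itertools.product. This is one reading of the schema, not the project's implementation: it assumes a linear search means evenly spaced samples that include both endpoints, that any list-valued optimizer field is a grid axis, and that the optimizer is identified by a name key. All function names and example values are hypothetical.

```python
from itertools import product


def linear_samples(value_range, num_samples):
    """Evenly spaced values over [low, high]; a single float yields one value."""
    if isinstance(value_range, (int, float)):
        return [float(value_range)]
    low, high = value_range
    if num_samples < 2:
        return [float(low)]
    step = (high - low) / (num_samples - 1)
    return [low + i * step for i in range(num_samples)]


def expand_optimizer(optimizer):
    """Expand list-valued fields of one optimizer dict into a grid of dicts."""
    axes = {k: v if isinstance(v, list) else [v] for k, v in optimizer.items()}
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def expand_grid(image_loss_weights, latent_dims, optimizers):
    """Cross the continuous samples, latent sizes, and expanded optimizers."""
    expanded = [opt for spec in optimizers for opt in expand_optimizer(spec)]
    return [
        {"image_loss_weight": w, "latent_dim": d, "optimizer": opt}
        for w, d, opt in product(image_loss_weights, latent_dims, expanded)
    ]


# Hypothetical search space. In this sketch a betas pair must be wrapped in
# an outer list, otherwise its two numbers would be read as two grid options.
adam = {"name": "Adam", "lr": [1e-3, 1e-4], "betas": [[0.9, 0.999]]}
sgd = {"name": "SGD", "lr": [0.1, 0.01], "momentum": [0.0, 0.9], "nesterov": False}

weights = linear_samples([0.0, 1.0], 3)
grid = expand_grid(weights, [8, 16], [adam, sgd])
print(len(grid))
```

The number of runs grows multiplicatively with each list-valued field, so it is worth printing the grid size before launching a long sweep.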
Once configs are updated, start the Poetry virtual environment:
$ poetry shell
Finally, run the experiments manually by executing main.py:
$ python src/simulation_encoder/main.py
Results and logs are recorded, and the best-performing model in each experiment has its weights saved to a .pth file in the corresponding results folder.
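After a run, the saved weights can be located with a short script. The sketch below only searches for .pth files; the directory name results is an assumption, since the location of the results folder is not stated here, and find_weight_files is a hypothetical helper. Loading a file would typically go through torch.load and the model's load_state_dict, provided the matching architecture is constructed first.

```python
from pathlib import Path


def find_weight_files(results_dir):
    """Return all .pth files under results_dir, sorted for stable output."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.pth"))


# "results" is an assumed folder name; adjust it to your setup.
for path in find_weight_files("results"):
    print(path.as_posix())
```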