PointCountFM

A conditional flow matching model that generates the number of points per layer in a particle shower, given the incident particle type and kinematics. It can be used as one component of a larger generative model for particle showers.

Requirements

  • pytorch: for training and inference of ML models
  • numpy: only as input/output data format
  • matplotlib: for visualization of data
  • h5py: for reading training data and saving generated data
  • pyyaml: for reading configuration files
  • showerdata: for handling calorimeter shower data

Setup

Clone repository

To clone the repository, run:

git clone git@github.com:FLC-QU-hep/PointCountFM.git
cd PointCountFM

Install dependencies

Choose one of the following options to install the required dependencies. Only the uv option has been tested.

With uv (option 1):

uv sync --all-groups
source .venv/bin/activate

With pip + venv (option 2):

python3.13 -m venv --prompt PointCountFM .venv
source .venv/bin/activate
pip install -e .
pip install --group dev

To run the code with a different Python version, you may need to adapt the pyproject.toml file accordingly.

With conda (option 3):

conda env create -f environment.yaml
conda activate PointCountFM
pip install -e .

Packages available from conda-forge are installed via conda; the rest are installed via pip.

Data

You can use your own data or download the AllShowers dataset from Zenodo (https://zenodo.org/records/18020348). To download the layer-level shower data (1.3 GB), run:

mkdir data
curl -o data/layer_level.h5 "https://zenodo.org/records/18020348/files/layer_level.h5?download=1"

If you want to use your own data, store it in HDF5 format with the following keys:

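| key        | shape  | dtype   | description                                          |
|------------|--------|---------|------------------------------------------------------|
| directions | (n, 3) | float32 | incident particle directions                         |
| energies   | (n, 1) | float32 | incident particle energies (in any consistent unit)  |
| labels     | (n)    | int32   | incident particle labels                             |
| num_points | (n, m) | int32   | number of points per layer                           |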

Here n is the dataset size and m is the number of layers in the calorimeter. The value of m must match dim_input in the configuration file.
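
Before training on your own data, it can help to check that the arrays follow the layout in the table above. The following is a minimal sketch, not part of the repository: the helper check_layout and the dummy arrays are hypothetical and only illustrate the expected shapes and dtypes.

```python
import numpy as np

# Expected layout, taken from the table above: key -> (per-sample shape, dtype).
# "m" is a placeholder for the number of calorimeter layers.
EXPECTED = {
    "directions": ((3,), "float32"),
    "energies": ((1,), "float32"),
    "labels": ((), "int32"),
    "num_points": (("m",), "int32"),
}


def check_layout(arrays, num_layers):
    """Return a list of layout problems; an empty list means everything matches.

    Hypothetical helper: `arrays` is any mapping from key to an object with
    `.shape` and `.dtype` (for example a dict of NumPy arrays).
    """
    missing = [key for key in EXPECTED if key not in arrays]
    if missing:
        return [f"missing keys: {missing}"]
    n = arrays["labels"].shape[0]  # dataset size
    problems = []
    for key, (tail, dtype) in EXPECTED.items():
        want_shape = (n,) + tuple(num_layers if d == "m" else d for d in tail)
        got_shape = tuple(arrays[key].shape)
        if got_shape != want_shape:
            problems.append(f"{key}: shape {got_shape}, expected {want_shape}")
        if np.dtype(arrays[key].dtype) != np.dtype(dtype):
            problems.append(f"{key}: dtype {arrays[key].dtype}, expected {dtype}")
    return problems


# Dummy data with n = 8 showers and m = 5 layers (values are arbitrary).
rng = np.random.default_rng(0)
n, m = 8, 5
dummy = {
    "directions": rng.normal(size=(n, 3)).astype(np.float32),
    "energies": rng.uniform(1.0, 100.0, size=(n, 1)).astype(np.float32),
    "labels": rng.integers(0, 2, size=n).astype(np.int32),
    "num_points": rng.integers(0, 50, size=(n, m)).astype(np.int32),
}
print(check_layout(dummy, num_layers=m))  # prints []
```

Because an open h5py.File also supports key lookup and its datasets expose shape and dtype, the same helper should work on a file opened with h5py.File(path, "r").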

Usage

The main entry point is the pointcountfm/trainer.py script. It can be run with the following command:

python pointcountfm/trainer.py [options] config/config.yaml

The configuration file config/config.yaml specifies all hyper-parameters, preprocessing steps, and the training data. The script will train a model and save it to results/%Y%m%d_%H%M%S_name/ where name is the name specified in the configuration file and %Y%m%d_%H%M%S is the current date and time. It also generates 50,000 samples and saves them to results/%Y%m%d_%H%M%S_name/new_samples.h5.
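
The output directory pattern above uses standard strftime codes. As a rough illustration of how such a path is composed (the function run_directory is hypothetical and not the trainer's actual code):

```python
from datetime import datetime
from pathlib import Path


def run_directory(name, now=None, root="results"):
    """Build results/%Y%m%d_%H%M%S_name for a given run name (hypothetical helper)."""
    if now is None:
        now = datetime.now()
    return Path(root) / f"{now:%Y%m%d_%H%M%S}_{name}"


# A run named "demo" started on 2 January 2025 at 03:04:05:
print(run_directory("demo", datetime(2025, 1, 2, 3, 4, 5)).as_posix())
# prints results/20250102_030405_demo
```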

options

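| Option         | Short | Description                                                                                     |
|----------------|-------|-------------------------------------------------------------------------------------------------|
| --help         | -h    | Show the help message                                                                           |
| --device       | -d    | The device to run the model on (e.g. cpu, mps, or cuda); selected automatically if not specified |
| --time         | -t    | Run a timing test on the model                                                                  |
| --fast-dev-run |       | Run a fast development run for testing                                                          |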
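
To show how these options combine on the command line, here is a small argparse sketch that mirrors the table. It is an assumption about the interface, written only from the table above; the actual parser in pointcountfm/trainer.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (-h is added by argparse)."""
    parser = argparse.ArgumentParser(prog="trainer.py")
    parser.add_argument("config", help="path to the YAML configuration file")
    parser.add_argument("-d", "--device", default=None,
                        help="cpu, mps, or cuda; None means select automatically")
    parser.add_argument("-t", "--time", action="store_true",
                        help="run a timing test on the model")
    parser.add_argument("--fast-dev-run", action="store_true",
                        help="run a fast development run for testing")
    return parser


# Equivalent to: python pointcountfm/trainer.py -d cpu --fast-dev-run config/config.yaml
args = build_parser().parse_args(["-d", "cpu", "--fast-dev-run", "config/config.yaml"])
print(args.device, args.time, args.fast_dev_run, args.config)
# prints cpu False True config/config.yaml
```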

configuration

The configuration file is a YAML file with the following top-level keys:

  • model: specifies the model architecture and hyper-parameters
  • data: specifies the training data and preprocessing steps
  • training: specifies the training hyper-parameters
  • name: a descriptive name for the run

model

| Key           | Type   | Description                                                                                  |
|---------------|--------|----------------------------------------------------------------------------------------------|
| name          | string | The model class (FullyConnected or ConcatSquash)                                             |
| dim_input     | int    | The dimension of the input data                                                              |
| dim_condition | int    | The dimension of the condition (number of particle labels + 3 (directions) + 1 (energies))   |
| dim_time      | int    | The dimension of the time embedding                                                          |
| hidden_dims   | list   | A list of hidden dimensions for the model                                                    |
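
The formula for dim_condition can be made concrete with a short worked example. The sketch below assumes that labels are one-hot encoded and concatenated in the order labels, direction, energy; both the encoding and the ordering are assumptions for illustration, and the helper names are hypothetical.

```python
def condition_dim(num_labels):
    """dim_condition = number of particle labels + 3 (directions) + 1 (energies)."""
    return num_labels + 3 + 1


def build_condition(label, direction, energy, num_labels):
    """Assemble one condition vector (assumed layout: one-hot label, direction, energy)."""
    one_hot = [0.0] * num_labels
    one_hot[label] = 1.0
    return one_hot + [float(x) for x in direction] + [float(energy)]


# Two particle types, so dim_condition = 2 + 3 + 1 = 6.
print(condition_dim(2))  # prints 6
print(build_condition(label=1, direction=(0.0, 0.0, 1.0), energy=50.0, num_labels=2))
# prints [0.0, 1.0, 0.0, 0.0, 1.0, 50.0]
```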

data

| Key                  | Type   | Description                                                                    |
|----------------------|--------|--------------------------------------------------------------------------------|
| data_file            | string | The path to the training data                                                  |
| batch_size           | int    | The batch size for training                                                    |
| batch_size_val       | int    | The batch size for validation                                                  |
| transform_num_points | list   | A list of the preprocessing steps for the number of points per layer (optional) |
| transform_inc        | list   | A list of the preprocessing steps for the incident energy (optional)           |

training

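| Key       | Type | Description                                                                  |
|-----------|------|------------------------------------------------------------------------------|
| epochs    | int  | The number of epochs to train the model                                      |
| optimizer | dict | The optimizer (name key) and its hyper-parameters                            |
| scheduler | dict | The learning rate scheduler (name key) and its hyper-parameters (optional)   |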

If you use OneCycleLR or CosineAnnealing as a scheduler, the maximum number of iterations is calculated automatically.
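
As a rough guide to what that number is likely to be, the sketch below computes the total iteration count under two assumptions that are not confirmed by this README: the scheduler is stepped once per training batch, and the last partial batch of each epoch is kept.

```python
import math


def total_scheduler_steps(epochs, num_train_samples, batch_size):
    """Total optimizer steps, assuming one scheduler step per batch (hypothetical helper)."""
    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    return epochs * steps_per_epoch


# 1000 training samples with batch size 64 give 16 batches per epoch.
print(total_scheduler_steps(epochs=100, num_train_samples=1000, batch_size=64))
# prints 1600
```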

For an example configuration file, see config/config.yaml.
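
The key layout described in the tables above can also be written as a nested Python dictionary, which is what a YAML loader such as yaml.safe_load returns. The sketch below checks that the non-optional keys are present. All values, and the lr key inside optimizer, are placeholders chosen for illustration; they are not the contents of config/config.yaml.

```python
# Non-optional keys per section, as listed in the tables above.
REQUIRED_KEYS = {
    "model": {"name", "dim_input", "dim_condition", "dim_time", "hidden_dims"},
    "data": {"data_file", "batch_size", "batch_size_val"},
    "training": {"epochs", "optimizer"},
}


def missing_keys(config):
    """Return the required keys absent from a loaded configuration (hypothetical helper)."""
    problems = []
    if "name" not in config:
        problems.append("name")
    for section, keys in REQUIRED_KEYS.items():
        if section not in config:
            problems.append(section)
            continue
        problems.extend(f"{section}.{key}" for key in sorted(keys - set(config[section])))
    return problems


# Placeholder configuration; every value here is an illustrative assumption.
example_config = {
    "name": "example_run",
    "model": {
        "name": "FullyConnected",
        "dim_input": 30,       # must equal the number of layers m in the data
        "dim_condition": 6,    # 2 particle labels + 3 + 1
        "dim_time": 8,
        "hidden_dims": [128, 128],
    },
    "data": {
        "data_file": "data/layer_level.h5",
        "batch_size": 256,
        "batch_size_val": 1024,
    },
    "training": {
        "epochs": 10,
        "optimizer": {"name": "Adam", "lr": 1e-3},  # "lr" is an assumed key
    },
}
print(missing_keys(example_config))  # prints []
```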

pre-commit

This repository uses pre-commit to run checks on the code before each commit. To install the git hooks, run:

pre-commit install

This sets up the git hooks so that the checks run automatically on every commit. If you want to run the checks manually, you can run:

pre-commit run --all-files

This will run all checks on all files.

Testing

To run the unit tests, run:

python -m unittest discover -s test -p "*_test.py" -v

If you have any questions or comments about this repository, please contact [email protected].
