GitHub - FloMau/gato-hep


We present gato-hep: the Gradient-based cATegorization Optimizer for High Energy Physics analyses. gato-hep learns category boundaries in N-dimensional discriminant spaces that maximize the signal significance of binned likelihood fits. It replaces the signal significance with a differentiable approximation and optimizes the boundaries by gradient descent in TensorFlow.
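For intuition, here is the hard (non-differentiable) quantity being approximated, written as a minimal plain-Python sketch. It assumes that each category is an independent counting bin scored with the Asimov discovery significance, and that the per-category values are combined in quadrature. The Quickstart's helper is called get_differentiable_significance and refers to Asimov significances, but the exact formula gato-hep uses is not spelled out here, and the function names below are illustrative, not part of the gato-hep API.

```python
import math


def asimov_significance(s: float, b: float) -> float:
    """Asimov discovery significance for one category with expected
    signal yield s and background yield b; returns 0 if either is <= 0."""
    if s <= 0.0 or b <= 0.0:
        return 0.0
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 2.0 * ((s + b) * math.log1p(s / b) - s)))


def combined_significance(signal_yields, background_yields) -> float:
    """Combine per-category significances in quadrature, treating the
    categories as statistically independent bins."""
    return math.sqrt(sum(asimov_significance(s, b) ** 2
                         for s, b in zip(signal_yields, background_yields)))


# Same total yields, merged into one category vs. split into two
merged = combined_significance([10.0], [101.0])
split = combined_significance([5.0, 5.0], [100.0, 1.0])  # split > merged
```

Splitting the signal between a low-background and a high-background category beats merging everything into one bin; finding the boundaries that do this best is what a categorization optimizer automates.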

📘 Documentation: https://gato-hep.readthedocs.io/
📦 PyPI: https://pypi.org/project/gato-hep/
🧪 Examples: see the examples/ directory in this repository

Key Features

Optimize categorizations in multi-dimensional spaces using Gaussian Mixture Models (GMM) or 1D sigmoid-based models
Set the range of the discriminant dimensions as needed for your analysis
Penalize low-yield or high-uncertainty categories to keep optimizations analysis-friendly
Built-in annealing schedules for the temperature / steepness (which sets how closely the differentiable approximation follows the hard category boundaries) and for the learning rate, to stabilize training
Ready-to-run toy workflows that mirror real HEP analysis patterns
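One way to make category assignment differentiable with a Gaussian mixture is sketched below: every component stands for one category, and a point's soft membership is a temperature-scaled softmax over the components' log-likelihoods. Lowering the temperature sharpens the memberships toward the hard argmax assignment, which is the role the temperature annealing above plays. This is an illustrative sketch assuming isotropic Gaussians; gato-hep's actual parameterization (for example its mean_norm option) may differ, and all names here are hypothetical.

```python
import math


def gaussian_log_density(x, mean, sigma):
    """Log-density of an isotropic Gaussian in len(x) dimensions."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2.0 * math.pi)


def soft_assignments(x, means, sigmas, log_weights, temperature):
    """Temperature-scaled softmax over component log-likelihoods;
    each mixture component plays the role of one category."""
    logits = [(lw + gaussian_log_density(x, m, s)) / temperature
              for m, s, lw in zip(means, sigmas, log_weights)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(x, means, sigmas, log_weights):
    """Non-differentiable category index: the most probable component."""
    scores = [lw + gaussian_log_density(x, m, s)
              for m, s, lw in zip(means, sigmas, log_weights)]
    return max(range(len(scores)), key=scores.__getitem__)


means = [(0.2, 0.2), (0.8, 0.8)]
sigmas = [0.5, 0.5]
log_weights = [0.0, 0.0]  # equal mixture weights
x = (0.3, 0.3)
warm = soft_assignments(x, means, sigmas, log_weights, temperature=1.0)
cold = soft_assignments(x, means, sigmas, log_weights, temperature=0.1)
# both favor category 0, but cold[0] is much closer to 1 than warm[0]
```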

Installation

Latest release (PyPI)

pip install gato-hep

The base install targets CPU execution and pulls the tested TensorFlow stack automatically. Optional extras:

pip install "gato-hep[gpu]"   # CUDA-enabled TensorFlow wheels

For the GPU extra you still need NVIDIA drivers and CUDA libraries that match the selected TensorFlow build.

From source

git clone https://github.com/FloMau/gato-hep.git
cd gato-hep
python -m venv .venv  # or use micromamba/conda
source .venv/bin/activate
pip install -e ".[dev]"

Requirements: Python ≥ 3.10. See pyproject.toml for the authoritative dependency pins.

Quickstart

The snippet below mirrors the three-class softmax demo. It generates the 3D toy sample, optimizes a two-dimensional Gaussian mixture model categorization on the first two softmax scores, saves and restores the trained model, and retrieves the hard category assignments from the learnt categories.

import numpy as np
import tensorflow as tf
from pathlib import Path

from gatohep.data_generation import generate_toy_data_3class_3D
from gatohep.models import gato_gmm_model


def convert_data_to_tensors(data):
    tensors = {}
    for proc, df in data.items():
        scores = np.stack(df["NN_output"].values)[:, :2]  # keep the first two dims
        weights = df["weight"].values
        tensors[proc] = {
            "NN_output": tf.convert_to_tensor(scores, tf.float32),
            "weight": tf.convert_to_tensor(weights, tf.float32),
        }
    return tensors

# setup class for the 2D discriminant optimization
class SoftmaxGMM(gato_gmm_model):
    def __init__(self, n_cats, temperature=0.3):
        super().__init__(
            n_cats=n_cats,
            dim=2,
            temperature=temperature,
            mean_norm="softmax",
        )
    def call(self, data_dict):
        # Differentiate through the Asimov significances provided by the helper
        significances = self.get_differentiable_significance(
            data_dict,
            signal_labels=["signal1", "signal2"],
        )
        z1 = significances["signal1"]
        z2 = significances["signal2"]
        return -tf.sqrt(z1 * z2)  # geometric-mean loss

# load your data as dictionary containing pandas DataFrames, or use the integrated toy data generation:
data = generate_toy_data_3class_3D()
tensors = convert_data_to_tensors(data)

# example: use 10 bins
model = SoftmaxGMM(n_cats=10, temperature=0.3)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.05)

# actual training
for epoch in range(100):
    with tf.GradientTape() as tape:
        loss = model.call(tensors)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# Save the trained model for later use in the analysis to some path
checkpoint_path = Path("softmax_demo_ckpt")
model.save(checkpoint_path)

# Restore the model
restored = SoftmaxGMM(n_cats=10, temperature=0.3)
restored.restore(checkpoint_path)

# Obtain the hard (non-differentiable) bin assignments
assignments = restored.get_bin_indices(tensors)
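The Quickstart's call() returns the negative geometric mean of the two significances. Compared with an arithmetic mean, a geometric mean rewards balanced improvements: it collapses toward zero if either signal is neglected. The plain-Python sketch below illustrates this and adds a low-yield penalty of the kind mentioned under Key Features. gato-hep offers built-in penalties whose exact form is not given here, so low_yield_penalty, threshold and strength are hypothetical names for illustration only.

```python
import math


def geometric_mean_loss(z1: float, z2: float) -> float:
    """Negative geometric mean of two significances (as in the Quickstart)."""
    return -math.sqrt(z1 * z2)


def arithmetic_mean_loss(z1: float, z2: float) -> float:
    """Alternative objective for comparison: negative arithmetic mean."""
    return -0.5 * (z1 + z2)


def low_yield_penalty(background_yields, threshold=10.0, strength=1.0):
    """Hypothetical penalty that grows quadratically as a category's
    background yield falls below `threshold`; zero above it."""
    return strength * sum(max(0.0, threshold - b) ** 2 / threshold**2
                          for b in background_yields)


# Same arithmetic mean, very different geometric means:
balanced = geometric_mean_loss(2.0, 2.0)  # -2.0
lopsided = geometric_mean_loss(3.5, 0.5)  # about -1.32

# A penalized objective would simply add the penalty to the loss:
total = geometric_mean_loss(2.0, 2.0) + low_yield_penalty([50.0, 5.0])
```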

See examples/three_class_softmax_example/run_example.py for the full training loop with schedulers, plotting helpers, and GIF generation.
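The full example uses schedulers for the temperature and the learning rate. A common choice, shown here as an assumption rather than gato-hep's built-in schedule, is exponential (geometric) decay from a start value to an end value over the training run. How the annealed temperature is passed to the model depends on gato-hep's API, so that step is left out; exponential_anneal is a hypothetical helper.

```python
def exponential_anneal(start: float, end: float, epoch: int, n_epochs: int) -> float:
    """Geometric interpolation from `start` (epoch 0) to `end` (last epoch).

    Both values must be positive; the epoch fraction is clipped to [0, 1]."""
    if n_epochs <= 1:
        return end
    frac = min(max(epoch / (n_epochs - 1), 0.0), 1.0)
    return start * (end / start) ** frac


n_epochs = 100
# Temperature: start soft (smooth boundaries), end close to hard assignments
temperatures = [exponential_anneal(1.0, 0.05, e, n_epochs) for e in range(n_epochs)]
# Learning rate: shrink step sizes as the boundaries settle
learning_rates = [exponential_anneal(0.05, 0.005, e, n_epochs) for e in range(n_epochs)]
```

Geometric decay changes the value by a constant factor per epoch, so the late, low-temperature phase gets as many epochs per order of magnitude as the early phase.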

Examples & Tutorials

examples/1D_example/run_sigmoid_example.py – sigmoid-based boundaries for a single discriminant.
examples/1D_example/run_gmm_example.py – GMM-based categorization for the same data.
examples/three_class_softmax_example/run_example.py – optimize categories directly on a 3-class softmax output (shown in 2D projections).
examples/bumphunt_example/run_example.py – $H\to\gamma\gamma$-style bump hunt, with inference on the mass and the background included over a wider mass range for increased statistical power.
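The sigmoid-based 1D approach can be pictured as follows: sorted cut values on a single discriminant define the categories, and each hard step "score above cut" is replaced by a sigmoid whose steepness sets how closely the soft binning approximates the hard one. This is a minimal sketch under that assumption, not gato-hep's implementation, and the function names are hypothetical.

```python
import math


def stable_sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_bin_weights(x: float, boundaries, steepness: float):
    """Soft membership of score x in the len(boundaries) + 1 bins defined by
    sorted cut values; weights are non-negative and sum to one."""
    # steps[k] is a smooth stand-in for the indicator x > boundaries[k]
    steps = [stable_sigmoid(steepness * (x - c)) for c in boundaries]
    edges = [1.0] + steps + [0.0]
    return [edges[i] - edges[i + 1] for i in range(len(edges) - 1)]


def hard_bin(x: float, boundaries) -> int:
    """Hard bin index: the number of cut values that x exceeds."""
    return sum(1 for c in boundaries if x > c)


cuts = [0.3, 0.7]
loose = soft_bin_weights(0.5, cuts, steepness=5.0)    # spread over all three bins
sharp = soft_bin_weights(0.5, cuts, steepness=200.0)  # nearly one-hot on bin 1
```

Because the weights telescope (each bin is the difference of two neighboring steps), they always sum to one, so signal and background yields are conserved across categories while the steepness is annealed.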

Every script populates an examples/.../Plots*/ folder with plots and checkpoints.

Contributing

Fork the repository and create a feature branch: git checkout -b feature/xyz.
Implement your changes under src/gatohep/ and add or adjust tests in tests/ as needed.
Format and lint your code (flake8), then run pytest.
Open a pull request summarizing the physics motivation and technical changes.
