Merged
Changes from 4 commits
4 changes: 2 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
<h1 align="center" style="margin-top: 0px;">⚡ANSR:<br>Flash Amortized Neural Symbolic Regression</h1>
<h1 align="center" style="margin-top: 0px;">⚡Flash-ANSR:<br>Fast Amortized Neural Symbolic Regression</h1>

<div align="center">

@@ -106,7 +106,7 @@ Coming soon
title = {Flash Amortized Neural Symbolic Regression},
year = {2024},
publisher = {GitHub},
version = {0.4.4},
version = {0.4.5},
url = {https://github.com/psaegert/flash-ansr}
}
```
219 changes: 24 additions & 195 deletions docs/evaluation.md
@@ -41,217 +41,46 @@
4. Install the required sympytorch fork: `pip install git+https://github.com/pakamienny/sympytorch.git`.
5. Download the pretrained checkpoint to `e2e/model1.pt` (mirror of https://dl.fbaipublicfiles.com/symbolicregression/model1.pt). Keep the filename as-is; the scaling config points there.

## Express
## Configs at a glance

Use, copy or modify a config in `./configs`:
- Evaluation configs live under `configs/evaluation/` (families: `scaling/`, `noise_sweep/`, `support_sweep/`).
- Each file is a single run definition: `data_source`, `model_adapter`, and `runner` blocks.
- Multi-experiment configs run **all** experiments when `--experiment` is omitted; pass a name to isolate one.
- Outputs default to `results/evaluation/...` as specified in the config; override with `-o/--output-file`.
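
A minimal sketch of that run-file shape (the same structure as the fuller example in section 4.1 below; the values are illustrative):

```yaml
run:
  data_source:      # how to create evaluation samples
    ...
  model_adapter:    # which model/baseline to call
    ...
  runner:           # bookkeeping + persistence
    limit: 5000
    save_every: 250
    output: "{{ROOT}}/results/evaluation/v23.0-20M/fastsrb.pkl"
    resume: true
```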

```
./configs
├── my_config
│   ├── dataset_train.yaml # Link to skeleton pool and padding for training
│   ├── dataset_val.yaml # Link to skeleton pool and padding for validation
│   ├── tokenizer.yaml # Tokenizer settings
│   ├── model.yaml # Model settings and link to simplipy engine
│   ├── skeleton_pool_train.yaml # Sampling and holdout settings for training
│   ├── skeleton_pool_val.yaml # Sampling and holdout settings for validation
│   └── train.yaml # Data and schedule for training
```

Use the helper scripts to import data, build validation sets, and kick off training:

```sh
./scripts/import_test_sets.sh # optional, required only once per checkout
./scripts/generate_validation_set.sh my_config # prepares validation skeletons
./scripts/train.sh my_config # trains using configs/my_config
```

For more information see below.

## Manual

### 0. Prerequisites

Test data structured as follows:

```sh
./data/ansr-data/test_set
├── fastsrb
│   └── expressions.yaml
```

The test data can be cloned from the Hugging Face data repository:

```sh
git clone https://huggingface.co/psaegert/ansr-data data/ansr-data
```

### 1. Import test data

External datasets must be imported into the supported format:
## Step-by-step run guide

```sh
flash_ansr import-data -i "{{ROOT}}/data/ansr-data/test_set/fastsrb/expressions.yaml" -p "fastsrb" -e "dev_7-3" -b "{{ROOT}}/configs/test_set/skeleton_pool.yaml" -o "{{ROOT}}/data/ansr-data/test_set/fastsrb/skeleton_pool" -v
```

with

- `-i` the input file
- `-p` the name of the parser implemented in `./src/flash_ansr/compat/convert_data.py`
- `-e` the SimpliPy engine version to use for simplification
- `-b` the config of a base skeleton pool to add the data to
- `-o` the output directory for the resulting skeleton pool
- `-v` verbose output

This creates and saves a skeleton pool containing the imported skeletons in the specified directory:
### 0. Benchmark data

```sh
./data/ansr-data/test_set/<test_set>
└── skeleton_pool
├── skeleton_pool.yaml
└── skeletons.pkl
```

### 2. Generate validation data

Validation data is generated by randomly sampling according to the settings in the skeleton pool config:
Fetch the FastSRB benchmark once (if you do not already have `data/ansr-data/test_set/fastsrb/expressions.yaml`):

```sh
flash_ansr generate-skeleton-pool -c {{ROOT}}/configs/${CONFIG}/skeleton_pool_val.yaml -o {{ROOT}}/data/ansr-data/${CONFIG}/skeleton_pool_val -s 5000 -v
mkdir -p "{{ROOT}}/data/ansr-data/test_set/fastsrb"
wget -O "{{ROOT}}/data/ansr-data/test_set/fastsrb/expressions.yaml" \
"https://raw.githubusercontent.com/viktmar/FastSRB/refs/heads/main/src/expressions.yaml"
```

with

- `-c` the skeleton pool config
- `-o` the output directory to save the skeleton pool
- `-s` the number of unique skeletons to sample
- `-v` verbose output
This writes `skeleton_pool.yaml` and `skeletons.pkl` under the specified output directory.

### 3. Train the model

```sh
flash_ansr train -c {{ROOT}}/configs/${CONFIG}/train.yaml -o {{ROOT}}/models/ansr-models/${CONFIG} -v -ci 100000 -vi 10000
```

with

- `-c` the training config
- `-o` the output directory to save the model and checkpoints
- `-v` verbose output
- `-ci` the interval to save checkpoints
- `-vi` the interval for validation

### 4. Evaluate the model

⚡ANSR, PySR, NeSymReS, E2E, skeleton-pool, brute-force, and the FastSRB benchmark run through a shared evaluation engine.
Each run is configured in a single YAML that wires a **data source**, a **model adapter**, and runtime **runner** settings.
The common CLI entry point is:
### 1. Run evaluation

```sh
flash_ansr evaluate-run -c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml --experiment flash_ansr_fastsrb_choices_00032 -v
```

Use `-n/--limit`, `--save-every`, `-o/--output-file`, `--experiment <name>`, or `--no-resume` to temporarily override the config without editing the file. When a config defines multiple experiments (see `configs/evaluation/scaling/`), omitting `--experiment` now runs **all** of them sequentially; pass an explicit name if you only want a single sweep entry.
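
For instance, a quick smoke test that caps the run at 100 samples and writes to a scratch file (the output path is illustrative):

```sh
flash_ansr evaluate-run \
    -c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml \
    --experiment flash_ansr_fastsrb_choices_00032 \
    -n 100 --save-every 25 \
    -o /tmp/fastsrb_smoke.pkl --no-resume -v
```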

#### 4.1 Config-driven workflow

Every run config (see `configs/evaluation/*.yaml`) follows the same structure:

```yaml
run:
data_source: # how to create evaluation samples
...
model_adapter: # which model/baseline to call
...
runner: # bookkeeping + persistence
limit: 5000
save_every: 250
output: "{{ROOT}}/results/evaluation/v23.0-20M/fastsrb.pkl"
resume: true
```

- **`data_source`** selects where problems come from. `type: skeleton_dataset` streams from a `FlashANSRDataset`, while `type: fastsrb` reads the FastSRB YAML benchmark. Common knobs include `n_support`, `noise_level`, and target sizes. Provide `datasets_per_expression` to iterate each skeleton or FastSRB equation deterministically with a fixed number of generated datasets (handy for reproducible evaluation sweeps).
- **`model_adapter`** declares the solver. Supported values today are `flash_ansr`, `pysr`, `nesymres`, `skeleton_pool`, `brute_force`, and `e2e`, each with their own required fields (model paths, timeout/beam/samples knobs, etc.).
- **`runner`** controls persistence: `limit` caps the number of processed samples, `save_every` checkpoints incremental progress to `output`, and `resume` decides whether to load previous results from that file.
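
As a concrete illustration, a hypothetical `data_source` block for the FastSRB benchmark; the key names follow the description above, but the `path` key and the exact nesting are assumptions, so check a shipped config under `configs/evaluation/` before copying:

```yaml
data_source:
  type: fastsrb                   # read the FastSRB YAML benchmark
  path: "{{ROOT}}/data/ansr-data/test_set/fastsrb/expressions.yaml"  # assumed key name
  n_support: 512                  # support points per dataset
  noise_level: 0.0                # additive noise knob
  datasets_per_expression: 1      # deterministic datasets per equation
```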

When `resume` is enabled, the engine reloads the existing pickle, skips the corresponding number of deterministic samples, and keeps writing to the same file. If a dataset cannot be generated within `max_trials`, the runner appends a placeholder entry (`placeholder=True`, `placeholder_reason=...`) so that the results length still reflects every attempted expression/dataset pair. Downstream analysis can filter these placeholders out, but their presence keeps pause/resume logic trivial and avoids juggling extra state files. Skeleton dataset evaluations remain sequential: `datasets_per_expression` (default `1`) controls how many deterministic datasets are emitted per skeleton, and the previous random sampling mode has been removed.

Running `flash_ansr evaluate-run ...` loads the config, resumes any previously saved pickle, instantiates the requested data/model pair, and streams results back into the same output file.
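
For downstream analysis, a minimal sketch of filtering those placeholders, assuming the pickle holds a list of per-sample dicts (the exact structure may differ):

```python
import pickle

# Load a results file written by the runner (path illustrative)
with open("results/evaluation/v23.0-20M/fastsrb.pkl", "rb") as f:
    results = pickle.load(f)

# Placeholder entries mark expression/dataset pairs whose data
# generation failed within max_trials; drop them before metrics
valid = [r for r in results if not r.get("placeholder", False)]
print(f"kept {len(valid)} of {len(results)} attempted samples")
```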

#### 4.2 Example run configs

Ready-to-use configs live under `configs/evaluation/scaling/` (with matching `noise_sweep/` and `support_sweep/` variants). All shipped experiments target FastSRB; the `*_v23_val.yaml` siblings swap in the v23 validation skeleton pool.

##### 4.2.1 FlashANSR

`configs/evaluation/scaling/v23.0-20M_fastsrb.yaml` (plus the 3M and 120M variants) sweep SoftmaxSampling `choices`. Example:
or

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml \
--experiment flash_ansr_fastsrb_choices_00032 -v
flash_ansr evaluate-run -c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml -v
```
to run all experiments in the config.

##### 4.2.2 PySR

`configs/evaluation/scaling/pysr_fastsrb.yaml` mirrors the same sweep over `niterations`. Run a single point with:

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/pysr_fastsrb.yaml \
--experiment pysr_fastsrb_iter_00032 -v
```

For long sweeps, `python scripts/evaluate_PySR.py -c <config> --experiment <name> -v` restarts jobs if PySR stalls.

##### 4.2.3 NeSymReS

`configs/evaluation/scaling/nesymres_fastsrb.yaml` varies `beam_width` for the 100M checkpoint tracked under `models/nesymres/`. Example:

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/nesymres_fastsrb.yaml \
--experiment nesymres_fastsrb_beam_width_00008 -v
```

##### 4.2.4 Skeleton pool baseline

`configs/evaluation/scaling/skeleton_pool_fastsrb.yaml` samples skeletons directly from `data/ansr-data/test_set/fastsrb/skeleton_pool_max8`. Example:

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/skeleton_pool_fastsrb.yaml \
--experiment skeleton_pool_fastsrb_samples_00032 -v
```

##### 4.2.5 Brute force baseline

`configs/evaluation/scaling/brute_force_fastsrb.yaml` exhaustively enumerates skeletons up to `max_expressions`. Example:

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/brute_force_fastsrb.yaml \
--experiment brute_force_fastsrb_max_expressions_00064 -v
```

##### 4.2.6 E2E baseline

`configs/evaluation/scaling/e2e_fastsrb.yaml` sweeps `model_adapter.candidates_per_bag` (the beam size). Example:

```sh
flash_ansr evaluate-run \
-c configs/evaluation/scaling/e2e_fastsrb.yaml \
--experiment e2e_fastsrb_candidates_00016 -v
```

##### 4.2.7 Compute-scaling sweeps

All scaling configs are multi-experiment. Omit `--experiment` to run the full sweep; the primary knobs are:
- Adjust `-c` to any file under `configs/evaluation/` and optionally set `--experiment`.
- Override on the fly: `-n/--limit`, `--save-every`, `-o/--output-file`, `--no-resume`.
- The runner loads existing partial pickles, skips processed items, and appends new results. If sample generation fails within `max_trials`, a placeholder entry is written to preserve counts.

- **FlashANSR**: `generation_overrides.kwargs.choices`
- **PySR**: `niterations`
- **NeSymReS**: `beam_width`
- **SkeletonPool**: `samples`
- **BruteForce**: `max_expressions`
- **E2E**: `candidates_per_bag`
### 2. Example configs

Outputs are namespaced under `results/evaluation/scaling/<model>/<dataset>/...` so sweeps can run back-to-back.
- FlashANSR v23.0-20M scaling: `configs/evaluation/scaling/v23.0-20M_fastsrb.yaml`
- PySR scaling: `configs/evaluation/scaling/pysr_fastsrb.yaml`
- NeSymReS scaling: `configs/evaluation/scaling/nesymres_fastsrb.yaml`
- E2E baseline: `configs/evaluation/scaling/e2e_fastsrb.yaml`
49 changes: 32 additions & 17 deletions docs/getting_started.md
@@ -15,29 +15,44 @@ See [all available models on Hugging Face](https://huggingface.co/models?search=

## Minimal inference example
```python
import numpy as np
from flash_ansr import FlashANSR, SoftmaxSamplingConfig, get_path

# Define some data
X = np.random.randn(256, 2)
y = X[:, 0] + X[:, 1]

# Load the model (assuming v23.0-120M is installed)
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Import flash_ansr
from flash_ansr import (
FlashANSR,
SoftmaxSamplingConfig,
install_model,
get_path,
)

# Select a model from Hugging Face
# https://huggingface.co/models?search=flash-ansr-v23.0
MODEL = "psaegert/flash-ansr-v23.0-120M"

# Download the latest snapshot of the model
# By default, the model is downloaded to the directory `./models/` in the package root
install_model(MODEL)

# Load the model
model = FlashANSR.load(
directory=get_path('models', 'psaegert/flash-ansr-v23.0-120M'),
generation_config=SoftmaxSamplingConfig(choices=256),
) # .to(device) for GPU. Highly recommended.
directory=get_path('models', MODEL),
generation_config=SoftmaxSamplingConfig(choices=32), # or BeamSearchConfig / MCTSGenerationConfig
n_restarts=8,
).to(device)

# Find an expression that fits the data by sampling from the model
# Define data
X = ...
y = ...

# Fit the model to the data
model.fit(X, y, verbose=True)

print("Expression:", model.get_expression())
# Show the best expression
print(model.get_expression())

# Predict with the best expression
y_pred = model.predict(X)
print("Predictions:", y_pred[:5])

# All results are stored in model.results as a pandas DataFrame
model.results
```

Find more details in the [API Reference](api.md).
44 changes: 36 additions & 8 deletions docs/index.md
@@ -15,16 +15,44 @@ pip install flash-ansr
flash_ansr install psaegert/flash-ansr-v23.0-120M
```
```python
import numpy as np
from flash_ansr import FlashANSR, SoftmaxSamplingConfig, get_path
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X = np.random.randn(256, 2)
model = FlashANSR.load(
directory=get_path('models', 'psaegert/flash-ansr-v23.0-120M'),
generation_config=SoftmaxSamplingConfig(choices=512),
# Import flash_ansr
from flash_ansr import (
FlashANSR,
SoftmaxSamplingConfig,
install_model,
get_path,
)
expr = model.fit(X, X[:, 0] + X[:, 1])
print(expr)

# Select a model from Hugging Face
# https://huggingface.co/models?search=flash-ansr-v23.0
MODEL = "psaegert/flash-ansr-v23.0-120M"

# Download the latest snapshot of the model
# By default, the model is downloaded to the directory `./models/` in the package root
install_model(MODEL)

# Load the model
model = FlashANSR.load(
directory=get_path('models', MODEL),
generation_config=SoftmaxSamplingConfig(choices=32), # or BeamSearchConfig / MCTSGenerationConfig
n_restarts=8,
).to(device)

# Define data
X = ...
y = ...

# Fit the model to the data
model.fit(X, y, verbose=True)

# Show the best expression
print(model.get_expression())

# Predict with the best expression
y_pred = model.predict(X)
```

## Serving these docs locally
5 changes: 5 additions & 0 deletions docs/training.md
@@ -19,6 +19,11 @@
```
Produces checkpoints under `models/ansr-models/test/` with `model.yaml`, `tokenizer.yaml`, and `state_dict.pt`.

## Helper scripts
- `./scripts/import_test_sets.sh`: import benchmark skeletons once so training excludes evaluation holdouts.
- `./scripts/generate_validation_set.sh <config>`: create held-out skeleton pools matching your bundle.
- `./scripts/train.sh <config>`: convenience wrapper to launch training with the bundle.
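
A typical sequence, assuming a config bundle at `configs/my_config`:

```sh
./scripts/import_test_sets.sh                   # once per checkout
./scripts/generate_validation_set.sh my_config  # held-out validation skeletons
./scripts/train.sh my_config                    # launch training
```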

## Full training workflow
1. **Import test sets**: Adjust and run `./scripts/import_test_sets.sh` to import the benchmark test sets. The data-generating processes during training exclude these skeletons to ensure fair evaluation.
2. **Configure skeleton pools and datasets**: Adjust the `skeleton_pool_*.yaml` and `dataset_*.yaml` files inside your chosen config bundle to set operator priors, expression depths, and data sampling strategies.