Implement Typer + Hydra Configuration Architecture #147

JemmaLDaniel · 2025-11-26T18:14:22Z

Summary

This PR implements the Typer + Hydra hybrid architecture proposed in #146, refactoring Winnow's configuration management from flat CLI signatures to a flexible, hierarchical system that enables scalable configuration of complex nested components and automatic object instantiation.

Implementation Details

1. Typer + Hydra Hybrid Architecture

Typer now acts as a thin command dispatcher, passing all configuration to Hydra:

```python
import typer

def train(ctx: typer.Context) -> None:
    """Passes control directly to the Hydra training pipeline."""
    overrides = ctx.args if ctx.args else None
    # train_entry_point is defined elsewhere in Winnow (import path not shown here).
    train_entry_point(overrides)
```

Pipeline logic has moved into the `train_entry_point()` and `predict_entry_point()` functions, which handle Hydra initialization, configuration composition and pipeline execution.

2. Structured Configuration with Composition

Created a modular configuration structure in `config/`:

- `train.yaml` / `predict.yaml`: main pipeline configurations
- `calibrator.yaml`: model architecture and features
- `residues.yaml`: amino acid masses and modifications (shared via composition)
- `data_loader/`: pluggable dataset format loaders (InstaNovo, MZTab, PointNovo, Winnow)
- `fdr_method/`: pluggable FDR methods (nonparametric, database-grounded)

Configuration files use Hydra's defaults mechanism to compose shared components.
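As a rough mental model of what composition achieves, the stdlib-only sketch below merges each config named in a `defaults` list into the result and then layers the primary file on top. It is a simplification, not Hydra's composition algorithm (which also handles group selection, `_self_` ordering and package placement); the `deep_merge` and `compose` helpers and the config contents are hypothetical.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `base` with `override` merged in recursively; `override` wins."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def compose(primary: dict[str, Any], groups: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge each config named in primary["defaults"], then the primary config on top."""
    result: dict[str, Any] = {}
    for name in primary.get("defaults", []):
        result = deep_merge(result, {name: groups[name]})
    body = {key: value for key, value in primary.items() if key != "defaults"}
    return deep_merge(result, body)


# Hypothetical stand-ins for residues.yaml and train.yaml (values are illustrative only).
groups = {"residues": {"masses": {"G": 57.021464, "A": 71.037114}}}
train = {"defaults": ["residues"], "calibrator": {"seed": 42}}

cfg = compose(train, groups)
# cfg now holds both the shared "residues" block and the "calibrator" block.
```

The point of the pattern is that `residues.yaml` is written once and reused by both `train.yaml` and `predict.yaml`, rather than being duplicated in each.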

3. Hydra-Based Object Instantiation

Used Hydra's `_target_` field for automatic instantiation:

- Data loaders are instantiated from configuration without manual if/elif logic
- FDR methods are selected and configured via YAML
- Users can inject custom implementations by creating YAML configs whose `_target_` points to their own classes
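The `_target_` convention can be illustrated with a stripped-down resolver built on `importlib`. Hydra's own `hydra.utils.instantiate` does this with many more features (recursive instantiation, `_partial_`, and so on); the `instantiate_target` helper below is hypothetical, and `datetime.timedelta` merely stands in for a data loader or FDR method class.

```python
import importlib
from typing import Any


def instantiate_target(config: dict[str, Any]) -> Any:
    """Import the dotted path in `_target_` and call it with the remaining keys."""
    params = dict(config)  # copy so the caller's config keeps its `_target_`
    module_path, _, attr_name = params.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**params)


# A YAML file such as a custom data_loader config would deserialise to a dict like this.
loader_cfg = {"_target_": "datetime.timedelta", "days": 1, "hours": 2}
obj = instantiate_target(loader_cfg)
print(obj)  # 1 day, 2:00:00
```

Because the class is looked up by import path, swapping implementations only requires a different `_target_` string in YAML, with no new branches in the pipeline code.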

4. Configuration Inspection Commands

Added a `winnow config` command group:

- `winnow config train`: display the resolved training configuration
- `winnow config predict`: display the resolved prediction configuration

Implemented a custom `ConfigFormatter` class that colour-codes output by YAML nesting depth for improved terminal readability.
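Depth-based colouring could work roughly as follows. This is a hypothetical stand-in for `ConfigFormatter`, not its actual code: the `format_config` name, the ANSI palette and the example keys are all assumptions.

```python
from typing import Any

# ANSI colour codes cycled by nesting depth (assumed palette).
DEPTH_COLOURS = ["\033[36m", "\033[33m", "\033[35m", "\033[32m"]
RESET = "\033[0m"


def format_config(config: dict[str, Any], depth: int = 0) -> list[str]:
    """Render a nested dict as indented YAML-like lines, colouring keys by depth."""
    colour = DEPTH_COLOURS[depth % len(DEPTH_COLOURS)]
    lines: list[str] = []
    for key, value in config.items():
        prefix = "  " * depth + f"{colour}{key}{RESET}:"
        if isinstance(value, dict):
            lines.append(prefix)
            lines.extend(format_config(value, depth + 1))  # children get the next colour
        else:
            lines.append(f"{prefix} {value}")
    return lines


lines = format_config({"calibrator": {"seed": 42, "features": {"margin": True}}})
print("\n".join(lines))
```

Cycling the palette with `depth % len(DEPTH_COLOURS)` keeps very deep configs readable without needing an unbounded list of colours.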

5. Lazy Imports for CLI Performance

Implemented a lazy import pattern to defer heavy dependencies (PyTorch, InstaNovo, etc.) until command execution: imports needed only for type hints sit behind `TYPE_CHECKING`, and runtime imports happen inside the command functions. This makes `--help` and the `config` commands respond almost instantly, whilst pipeline commands still have access to all required dependencies.

Added a module-level docstring in `main.py` explaining the rationale.
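A minimal sketch of the pattern, using the standard-library `decimal` module as a stand-in for a heavy dependency such as PyTorch (an assumption made so the example runs anywhere). `from __future__ import annotations` stops the `Decimal` return annotation from being evaluated at definition time, so the `TYPE_CHECKING` block satisfies type checkers while the real import happens only when `total` is called.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime.
    from decimal import Decimal


def total(values: list[str]) -> Decimal:
    """Sum decimal strings; the dependency is imported only when this runs."""
    from decimal import Decimal  # function-local (lazy) import

    return sum((Decimal(v) for v in values), Decimal("0"))


print(total(["0.1", "0.2"]))  # 0.3
```

Importing the module that defines `total` costs nothing extra; the import price is paid on the first call, which is the trade-off that keeps `--help` fast.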

6. Documentation Updates

Minor improvements to CLI help text and documentation to reflect the new Hydra-based configuration system with examples of dot-notation overrides.

Migration Notes

Existing users will need to:

- Use the configuration files in `config/` instead of passing all parameters via CLI flags
- Override parameters using dot notation, e.g. `winnow train calibrator.seed=42`
- Run `winnow config <pipeline>` to inspect the resolved configuration
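To make dot-notation overrides concrete, here is a hypothetical `apply_override` helper that walks the dotted path and replaces a leaf value. It is only a simplified model: Hydra's own override grammar also parses booleans, lists and `null`, and supports prefixes such as `+` and `~` for adding and deleting keys. Like Hydra, the sketch refuses to set a key that does not already exist.

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> None:
    """Set an existing nested key from a `dotted.key=value` string, in place."""
    dotted_key, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node[part]  # KeyError if an intermediate group is missing
    if leaf not in node:
        raise KeyError(f"cannot override missing key {dotted_key!r}")
    # Rough value parsing: try int, then float, otherwise keep the string.
    for cast in (int, float):
        try:
            node[leaf] = cast(raw_value)
            break
        except ValueError:
            continue
    else:
        node[leaf] = raw_value


cfg: dict[str, Any] = {"calibrator": {"seed": 0}}
apply_override(cfg, "calibrator.seed=42")
# cfg is now {"calibrator": {"seed": 42}}
```

Rejecting unknown keys catches typos such as `calibrater.seed=42` instead of silently adding a new, unused setting.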

JemmaLDaniel · 2025-11-26T18:16:39Z

Commits 20ee8b3 and 2529582 also address #143 and #140


BioGeek

When I run

```shell
make sample-data
make train-sample
make predict-sample
```

It generates CSV files in results/predictions with only headers.

preds_and_fdr_metrics.csv:

,calibrated_confidence,prediction,psm_fdr,psm_q_value,sequence,psm_pep,spectrum_id

metadata.csv:

,spectrum_id,prediction_untokenised,confidence,sequence_untokenised,token_log_probabilities_beam_0,token_log_probabilities_beam_1,token_log_probabilities_beam_2,precursor_mz,precursor_charge,precursor_mass,retention_time,mz_array,intensity_array,valid_peptide,valid_prediction,num_matches,correct,Mass Error,is_missing_prosit_features,prosit_mz,prosit_intensity,ion_matches,ion_match_intensity,is_missing_irt_error,iRT,predicted iRT,iRT error,is_missing_chimeric_features,runner_up_prosit_mz,runner_up_prosit_intensity,chimeric_ion_matches,chimeric_ion_match_intensity,margin,median_margin,entropy,z-score


JemmaLDaniel · 2026-01-21T16:44:47Z

make predict-sample resulted in no saved predictions because we filter to an FDR of <=0.05 by default, and the sample data did not pass this threshold. I've changed the make command to prevent filtering, which will return all the input sample data rows.
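Why the output ended up header-only can be sketched with a hypothetical `filter_by_fdr` helper. The `psm_fdr` column name comes from the CSV header quoted above, but the function, the convention that `None` disables filtering, and the sample values are assumptions rather than Winnow's actual code.

```python
from typing import Any, Optional


def filter_by_fdr(
    rows: list[dict[str, Any]], fdr_threshold: Optional[float] = 0.05
) -> list[dict[str, Any]]:
    """Keep rows whose FDR is at or below the threshold; None disables filtering."""
    if fdr_threshold is None:
        return list(rows)
    return [row for row in rows if row["psm_fdr"] <= fdr_threshold]


# Illustrative rows whose FDR is above the default 0.05 cut-off.
sample = [{"spectrum_id": 0, "psm_fdr": 0.30}, {"spectrum_id": 1, "psm_fdr": 0.12}]
print(len(filter_by_fdr(sample)))        # 0 rows, so only the CSV header is written
print(len(filter_by_fdr(sample, None)))  # 2 rows, every input row is kept
```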

BioGeek

Thanks for persevering and getting this over the finish line!

JemmaLDaniel added 7 commits November 26, 2025 11:41

chore: add hydra to project dependencies

a2ae8d3

feat: use hydra to configure winnow runs

d51a264

test: update tests to use extra init arguments

8d3a02a

feat: add winnow config command to view resolved configuration

5e730c7

docs: document hydra config usage with winnow cli

20ee8b3

docs: make docs titles sentence case and fix bullet list formatting

2529582

perf: optimise CLI startup time with lazy imports

d7e713c

JemmaLDaniel requested a review from BioGeek November 26, 2025 18:18

JemmaLDaniel self-assigned this Nov 26, 2025

JemmaLDaniel added the enhancement (New feature or request) and documentation (Improvements or additions to documentation) labels Nov 26, 2025

JemmaLDaniel added 2 commits November 26, 2025 18:31

chore: merge branch 'main' into feat-hydra-config

07bfc18

chore: update gitignore to ignore extra supported files and images

980a793

BioGeek requested changes Dec 1, 2025

winnow/scripts/main.py (outdated)

BioGeek requested changes Dec 1, 2025

docs/cli.md (outdated)

JemmaLDaniel added 6 commits December 4, 2025 10:09

Merge branch 'main' into feat-hydra-config

b18bd54

fix: convert predictions_path to a Path before file loading

bb25d28

docs: add instructions on conversion from mgf to parquet file

a883fcd

docs: remove references to old Typer CLI arguments

e9126d9

feat: create toy data for CLI quickstart

864095e

chore: pre-commit edits to generate_sample_data

docs: add documentation for quickstarting with the toy data

d614fcf

JemmaLDaniel force-pushed the feat-hydra-config branch 4 times, most recently from f0bafe0 to b459d60 on December 4, 2025 18:59

JemmaLDaniel added 4 commits December 8, 2025 12:20

chore: update example notebook with new object instantiation arguments and using config defaults

ad8b1f5

ci: migrate coverage badge to Gist-based dynamic system

00a006b

chore: track new config position

571b3b3

JemmaLDaniel force-pushed the feat-hydra-config branch from b1f5a96 to 571b3b3 on December 8, 2025 12:25

chore: merge branch 'main' into feat-hydra-config

cb9edfe

BioGeek requested changes Jan 16, 2026

chore: update README.md

36c50a2

Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>

chore: update README.md

Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>

chore: remove trailing whitespace

JemmaLDaniel force-pushed the feat-hydra-config branch from 79610e0 to 533836a on January 16, 2026 13:07

JemmaLDaniel added 3 commits January 16, 2026 15:11

fix: remove np.float64 artifacts from example CSV

29e182e

chore: remove unused workspace config

0e12735

chore: update requirements

4ea04a5

JemmaLDaniel force-pushed the feat-hydra-config branch from 533836a to 4ea04a5 on January 16, 2026 13:12

JemmaLDaniel added 2 commits January 16, 2026 16:29

fix: added instanovo version compatibility layer

c12ae19

chore: bump instanovo package version

46c66e1

JemmaLDaniel requested a review from BioGeek January 16, 2026 14:40

BioGeek requested changes Jan 20, 2026

pyproject.toml (outdated)

docs/cli.md

docs/api/calibration.md (outdated)

docs/api/calibration.md (outdated)

BioGeek requested changes Jan 20, 2026

docs/cli.md

JemmaLDaniel and others added 8 commits January 20, 2026 16:43

fix: update pyproject.toml to include compatibility layer

bf5c90d

Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>

fix: do not filter sample data outputs on FDR 0.05

35d1a36

docs: update old CLI references

c0ad9b2

fix: reference MZTabDatasetLoader residue_remapping correctly

e7986f9

chore: update pyproject.toml with winnow sub-directories

5794913

docs: recommend use of Make commands in quickstart

5418e6b

chore: use uv to run quickstart commands

431123b

chore: remove unused global variable

82b9666

JemmaLDaniel requested a review from BioGeek January 21, 2026 16:52

BioGeek approved these changes Jan 22, 2026

JemmaLDaniel merged commit 767a464 into main Jan 23, 2026
4 checks passed
