Implement Typer + Hydra Configuration Architecture #147
chore: pre-commit edits to generate_sample_data
f0bafe0 to b459d60
…nstalled as a package
chore: fix pre-commit on main script
chore: remove testing Make commands
fix: correct the path for config_path_utils
fix: correct the path for config_path_utils
chore: pre-commit formatting fixes for test_config_paths
…s and using config defaults
b1f5a96 to 571b3b3
Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>
chore: update README.md
Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>
chore: remove trailing whitespace
79610e0 to 533836a
533836a to 4ea04a5
BioGeek
left a comment
When I run:
make sample-data
make train-sample
make predict-sample
the CSV files generated in results/predictions contain only headers.
preds_and_fdr_metrics.csv:
,calibrated_confidence,prediction,psm_fdr,psm_q_value,sequence,psm_pep,spectrum_id
metadata.csv:
,spectrum_id,prediction_untokenised,confidence,sequence_untokenised,token_log_probabilities_beam_0,token_log_probabilities_beam_1,token_log_probabilities_beam_2,precursor_mz,precursor_charge,precursor_mass,retention_time,mz_array,intensity_array,valid_peptide,valid_prediction,num_matches,correct,Mass Error,is_missing_prosit_features,prosit_mz,prosit_intensity,ion_matches,ion_match_intensity,is_missing_irt_error,iRT,predicted iRT,iRT error,is_missing_chimeric_features,runner_up_prosit_mz,runner_up_prosit_intensity,chimeric_ion_matches,chimeric_ion_match_intensity,margin,median_margin,entropy,z-score
Co-authored-by: Jeroen Van Goey <j.vangoey@instadeep.com>
BioGeek
left a comment
Thanks for persevering and getting this over the finish line!
Summary
This PR implements the Typer + Hydra hybrid architecture proposed in #146, refactoring Winnow's configuration management from flat CLI signatures to a flexible, hierarchical system that enables scalable configuration of complex nested components and automatic object instantiation.
Implementation Details
1. Typer + Hydra Hybrid Architecture
Typer now acts as a thin command dispatcher, passing all configuration to Hydra.
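A minimal sketch of this dispatch pattern, not the PR's actual code. It assumes Hydra's Compose API (`initialize_config_dir`, `compose`), a `config/` directory containing `train.yaml` and `predict.yaml`, and two hypothetical helpers, `compose_config` and `build_app`. Each Typer command accepts arbitrary extra arguments and forwards them to Hydra as overrides:

```python
from pathlib import Path

# Assumed location of the YAML files; Hydra's initialize_config_dir needs an absolute path.
CONFIG_DIR = Path("config").resolve()


def compose_config(config_name: str, overrides: list[str]):
    """Compose a Hydra config and apply dot-notation overrides (hypothetical helper)."""
    # Imported inside the function so Hydra is only loaded when a command runs.
    from hydra import compose, initialize_config_dir

    with initialize_config_dir(config_dir=str(CONFIG_DIR), version_base=None):
        return compose(config_name=config_name, overrides=overrides)


def build_app():
    """Build a Typer app whose commands hand every extra argument to Hydra."""
    import typer
    from omegaconf import OmegaConf

    app = typer.Typer()
    # Let arguments such as `calibrator.seed=42` pass through into ctx.args.
    passthrough = {"allow_extra_args": True, "ignore_unknown_options": True}

    @app.command(context_settings=passthrough)
    def train(ctx: typer.Context) -> None:
        cfg = compose_config("train", list(ctx.args))
        typer.echo(OmegaConf.to_yaml(cfg))  # placeholder for the training pipeline

    @app.command(context_settings=passthrough)
    def predict(ctx: typer.Context) -> None:
        cfg = compose_config("predict", list(ctx.args))
        typer.echo(OmegaConf.to_yaml(cfg))  # placeholder for the prediction pipeline

    return app
```

Calling `build_app()()` from an entry point would then let `train calibrator.seed=42` reach Hydra untouched, because Typer itself defines no options for the command.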
2. Structured Configuration with Composition
Created modular configuration structure in `config/`:
- `train.yaml` / `predict.yaml` - Main pipeline configurations
- `calibrator.yaml` - Model architecture and features
- `residues.yaml` - Amino acid masses and modifications (shared via composition)
- `data_loader/` - Pluggable dataset format loaders (InstaNovo, MZTab, PointNovo, Winnow)
- `fdr_method/` - Pluggable FDR methods (nonparametric, database-grounded)
Configuration files use Hydra's `defaults` mechanism to compose shared components.
3. Hydra-Based Object Instantiation
Used Hydra's `_target_` field for automatic instantiation:
- Removes `if/elif` logic
- Components set `_target_` pointing to their classes
4. Configuration Inspection Commands
Added `winnow config` command group:
- `winnow config train` - Display resolved training configuration
- `winnow config predict` - Display resolved prediction configuration
Implemented custom `ConfigFormatter` class with hierarchical colour-coding based on YAML nesting depth for improved terminal readability.
5. Lazy Imports for CLI Performance
Implemented lazy import pattern using `TYPE_CHECKING` to defer heavy dependencies (PyTorch, InstaNovo, etc.) until command execution. This makes `--help` and `config` commands respond instantly whilst pipeline commands still have access to all required dependencies.
Added module-level docstring in `main.py` explaining the rationale.
6. Documentation Updates
Minor improvements to CLI help text and documentation to reflect the new Hydra-based configuration system with examples of dot-notation overrides.
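The `_target_` convention from item 3 can be illustrated without Hydra. The sketch below is a stdlib-only stand-in for the idea, not Hydra's implementation (the real entry point is `hydra.utils.instantiate`). The config keys and the stdlib classes used as targets are placeholders, not Winnow components:

```python
import importlib
from typing import Any


def instantiate(node: Any) -> Any:
    """Recursively build objects from mappings that carry a `_target_` key."""
    if isinstance(node, list):
        return [instantiate(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (numbers, strings) pass through unchanged

    # Instantiate nested nodes first so they can be passed as arguments.
    kwargs = {key: instantiate(value) for key, value in node.items() if key != "_target_"}
    if "_target_" not in node:
        return kwargs  # an ordinary mapping with no class to construct

    # Split "package.module.ClassName" into a module path and an attribute name.
    module_path, _, attr_name = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**kwargs)


# Placeholder config: stdlib classes stand in for real components.
config = {
    "tolerance": {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 20},
    "timeout": {"_target_": "datetime.timedelta", "minutes": 5},
}
components = instantiate(config)
```

Because the class path lives in the config, swapping one component for another is a config change and needs no branching in the calling code.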
Migration Notes
Existing users will need to:
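- Use the configuration files in `config/` instead of passing all parameters via CLI flags
- Override individual values with dot notation, e.g. `winnow train calibrator.seed=42`
- Run `winnow config <pipeline>` to inspect resolved configurations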
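A dot-notation override such as `calibrator.seed=42` addresses a nested key by its path. The following is a simplified, stdlib-only sketch of those semantics, not Hydra's override parser, which has a richer grammar (for example `+key=value` to add a new key). The `base` config and the helper names are hypothetical:

```python
import copy
from typing import Any


def parse_value(text: str) -> Any:
    """Interpret an override value as int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with each `a.b.c=value` override applied."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for override in overrides:
        dotted_key, sep, raw_value = override.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {override!r}")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            node = node[name]  # KeyError if the path does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return result


base = {"calibrator": {"seed": 0}}  # hypothetical defaults
resolved = apply_overrides(base, ["calibrator.seed=42"])
```

In this sketch unknown keys are rejected and never created, so a mistyped path fails loudly instead of being ignored.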