@ArthurDeclercq ArthurDeclercq commented Jan 9, 2026

This pull request introduces parallel processing for spectrum parsing and feature extraction, adds new MS2 feature calculation capabilities, and improves Python interoperability for the MS2Spectrum struct. The changes are primarily focused on performance improvements and new functionality for feature extraction from MS2 spectra.

Parallelization and Performance Improvements:

  • Added the rayon crate and enabled the parallelism feature for mzdata to allow parallel processing of spectra in both parse_mzdata.rs and parse_timsrust.rs, resulting in faster reading and parsing of large spectrum files.

New Feature Extraction Functionality:

  • Introduced the new module ms2_features.rs with the batch_ms2_features_from_spectra function, which computes a set of features (including intensity-based and sequence-based metrics, as well as an optional hyperscore) for batches of MS2 spectra in parallel.
  • Introduced the new module ms2pip_features.rs with the batch_ms2pip_features_numpy function, which computes a set of features for the ms2pip feature generator in MS2rescore.
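
For readers unfamiliar with the hyperscore mentioned above, here is a minimal sketch of one common log-scale formulation, ln(n_b! · n_y! · ΣI_b · ΣI_y), computed from matched b- and y-ion intensities. This description does not state which exact variant ms2_features.rs implements, so the function name, its arguments, and the handling of empty ion series are assumptions made for illustration only.

```python
import math


def hyperscore(b_intensities, y_intensities):
    """Log-scale hyperscore sketch: ln(n_b! * n_y! * sum(I_b) * sum(I_y)).

    Assumption (not taken from the PR): an ion series with no matched
    peaks contributes nothing, i.e. its factorial and intensity terms
    are both treated as 1.
    """
    score = 0.0
    for matched in (b_intensities, y_intensities):
        total = sum(matched)
        if matched and total > 0.0:
            # lgamma(n + 1) equals ln(n!) and avoids overflow for large n.
            score += math.lgamma(len(matched) + 1) + math.log(total)
    return score


# Two matched b-ions and one matched y-ion, all with unit intensity.
example = hyperscore([1.0, 1.0], [1.0])
```

Working in log space keeps the factorial terms numerically stable for peptides with many matched fragments.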

Python Interoperability and API Improvements:

  • Enhanced the MS2Spectrum struct with a Python constructor (__new__) and pickling support (__reduce__), making it easier to create and serialize/deserialize spectrum objects from Python.
  • Updated the signatures of get_precursor_info and get_ms2_spectra to accept the Python interpreter context and allow thread release for improved performance with Python bindings.
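
To illustrate what a constructor plus __reduce__ provides, the sketch below uses a pure-Python stand-in class. SpectrumSketch and its fields (identifier, mz, intensity) are hypothetical and are not the real MS2Spectrum signature; the point is only that __reduce__ returns a callable and its arguments, which pickle uses to rebuild the object.

```python
import pickle


class SpectrumSketch:
    """Hypothetical stand-in for a spectrum object; not the real MS2Spectrum."""

    def __init__(self, identifier, mz, intensity):
        self.identifier = identifier
        self.mz = list(mz)
        self.intensity = list(intensity)

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls
        # callable(*args) when the object is loaded again.
        return (self.__class__, (self.identifier, self.mz, self.intensity))

    def __eq__(self, other):
        return (
            isinstance(other, SpectrumSketch)
            and self.identifier == other.identifier
            and self.mz == other.mz
            and self.intensity == other.intensity
        )


original = SpectrumSketch("scan=1", [100.0, 200.5], [10.0, 25.0])
# Serialize to bytes and rebuild a separate, equal object.
restored = pickle.loads(pickle.dumps(original))
```

Pickling support of this kind is what allows objects to be passed to worker processes, for example through multiprocessing, which serializes arguments and results with pickle.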

Dependency and Version Updates:

  • Bumped the crate version to 0.5.0 and added new dependencies (rayon, rustyms, ordered-float, numpy) to support new features and parallelization.

API Additions:

  • Registered new feature extraction functions with the Python module, making them available to Python users.

These changes collectively improve the performance, usability, and feature set of the ms2rescore-rs library, especially for batch processing and integration with Python workflows.
