83 changes: 63 additions & 20 deletions docs/data_conversion.md
@@ -1,27 +1,66 @@
# Data handling (supported formats)

mzmine supports both **open** (e.g., .mzML, .mzXML, .imzML, .netCDF) and **proprietary**
formats from Bruker Daltonics (.d and .tdf/tsf). Raw data files from
vendors must be converted into an open format prior to the import. **This conversion can be applied automatically
during the import, if the user has MSConvert installed.**
If you want to convert the files yourself, see the sections below.

The **recommendations** for the data handling are the conversion of the raw data to centroided .mzML
data files,
**except** for timsTOF data (native .tdf and .tsf inside the Bruker .d folder), and the conversion of MS
imaging data to .imzML, except for the timsTOF fleX MS imaging data.
mzmine supports both **open** (e.g., .mzML, .mzXML, .imzML, .netCDF) and many **proprietary**
vendor data formats. For many vendor formats, it is recommended to use the original data files and
to apply conversion only where needed. Data files can be imported by dragging and dropping them into
the mzmine graphical user interface or by using the _import MS data_ module.

**Supported raw data formats include:**
- Bruker Daltonics (.d and .tdf/tsf)
- Thermo Fisher (.raw)
- Waters (.raw folders)
- Agilent (.d)
- Sciex (.wiff/.wiff2)
- Shimadzu (.lcd)
- MOBILion (.mbi)
- mzML or mzXML/mzData/netCDF (prefer mzML if possible due to better metadata coverage)
- imzML (MS imaging)
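
Because several vendors share a file extension, the extension alone does not always identify the vendor. The following is a minimal Python sketch, not part of mzmine, that maps the extensions listed above to candidate vendors. The names `VENDORS_BY_EXTENSION` and `candidate_vendors` are hypothetical and chosen for illustration only.

```python
from pathlib import Path

# Extension -> vendors that use it, taken from the list of supported formats.
# ".d" and ".raw" are each used by two vendors, so a lookup yields candidates.
VENDORS_BY_EXTENSION = {
    ".d": {"Bruker", "Agilent"},
    ".raw": {"Thermo", "Waters"},
    ".wiff": {"Sciex"},
    ".wiff2": {"Sciex"},
    ".lcd": {"Shimadzu"},
    ".mbi": {"MOBILion"},
}


def candidate_vendors(path: str) -> set[str]:
    """Return the vendors whose raw format matches the file or folder extension."""
    suffix = Path(path).suffix.lower()
    # Copy the set so callers cannot modify the lookup table by accident.
    return set(VENDORS_BY_EXTENSION.get(suffix, set()))
```

An empty result means the extension is not one of the vendor formats above, for example an open format such as mzML.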

!!! warning

Some vendor data formats are only supported on specific operating systems due to the limited
support by their respective data access libraries. All data formats are supported on Windows
and many on Linux (see full [compatibility list here](system_requirements.md#compatibility)).
Many data formats are unsupported on macOS, requiring data conversion to open formats, usually
on a Windows or Linux computer.

## External dependencies

Many data formats are supported natively by mzmine without external dependencies.
Other formats may require third-party software to be downloaded and installed.

**MSConvert:** This tool is provided by ProteoWizard and is the default data converter for MS data.
While our team is expanding the native data support for all major vendor formats, we recommend
installing MSConvert for some formats. This grants direct access to these files: mzmine uses
MSConvert with some internal optimizations to load data files in the background. Formats that
currently require MSConvert for direct support include:
- Agilent (.d)
- Sciex (.wiff/.wiff2)
- Shimadzu (.lcd)
- MOBILion (.mbi)
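
As a rough illustration of the list above, a script that prepares data for import could check whether a vendor's files depend on MSConvert. This is a hedged sketch: `MSCONVERT_VENDORS` and `requires_msconvert` are hypothetical names, and the set simply mirrors the vendors named in this section, so it needs updating as native support expands.

```python
# Vendors that, according to the list above, currently need MSConvert
# for direct import. This is a snapshot, not an mzmine API.
MSCONVERT_VENDORS = {"agilent", "sciex", "shimadzu", "mobilion"}


def requires_msconvert(vendor: str) -> bool:
    """Return True if direct import of this vendor's data relies on MSConvert."""
    return vendor.strip().lower() in MSCONVERT_VENDORS
```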

## Data conversion to open formats (.mzML / .imzML)

When converting data, prefer the latest standard formats mzML (or imzML for MS imaging data).
Other older open formats like mzData or mzXML may cover less metadata.
It is **recommended** to convert raw data to centroided .mzML files.
**Exceptions:** timsTOF native data as .tdf and .tsf inside the Bruker .d folder are best imported
in their original format. This is also true for timsTOF fleX MS imaging data.

**This conversion can be applied automatically during the import if the user has MSConvert installed.**
If you want to convert the files yourself, see the sections below.


### MSConvert (ProteoWizard) to mzML

!!! info

mzmine can use MSConvert automatically. Make sure to setup the MSConvert installation path in the mzmine preferences. (only supported on Windows)
mzmine can use MSConvert automatically. Make sure to set up the MSConvert installation path in
the mzmine preferences (only supported on Windows).

![MSConvert_settings](MSConvert_settings.png)

MSConvert supports the conversion of AB SCIEX, Agilent, Bruker, Shimadzu, Thermo Scientific,
MSConvert supports the conversion of AB SCIEX, Agilent, Bruker, Shimadzu, Thermo Scientific, MOBILion,
and [Waters](data_conversion.md#waters) raw data. More information about the formats can be found in
the [ProteoWizard Documentation for Users](https://proteowizard.sourceforge.io/doc_users.html).
Furthermore, profile data can be centroided to reduce the file size and memory consumption,
@@ -62,17 +101,19 @@ the [ProteoWizard documentation](https://proteowizard.sourceforge.io/tools/mscon
### ThermoRawFileParser

It is used to convert ThermoFisher .raw files into .mgf, .mzML, .parquet. This converter is
important if an
internal calibrant was used (e.g., EASY-IC). This mass is excluded in the FreeStyle view, whereas
MSConvert
remains all signals in the mzML, including the calibrant. If those masses together with some flagged signals
by Thermo, should be
removed use this converter with the option --excludeExceptionData.
important if an internal calibrant was used (e.g., EASY-IC). This mass is excluded in the FreeStyle
view, whereas MSConvert retains all signals in the mzML, including the calibrant. If those masses,
together with some signals flagged by Thermo, should be removed, use this converter with the option
**--excludeExceptionData**.
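
For batch conversion, the call can be assembled from a script. The sketch below only builds the argument list and does not run anything. It assumes the `-i` (input file), `-o` (output directory), and `-f` (output format, with `1` taken to mean mzML) options of ThermoRawFileParser; please verify these against the tool's `--help` output for your version. The function name and the executable path are hypothetical.

```python
def build_parser_command(executable, raw_file, output_dir, exclude_exception_data=True):
    """Assemble a ThermoRawFileParser argument list (flags are assumptions, see above)."""
    cmd = [
        executable,          # path to the parser; platform dependent
        f"-i={raw_file}",    # assumed: input .raw file
        f"-o={output_dir}",  # assumed: output directory
        "-f=1",              # assumed: format code 1 selects mzML
    ]
    if exclude_exception_data:
        # Option named in this document: drops calibrant and flagged signals.
        cmd.append("--excludeExceptionData")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed.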

!!! Note

mzmine can use the ThermoRawFileParser automatically to import your data without conversion. In the preferences (CTRL+P)
set the "Thermo data import" to "Thermo raw file parser" instead of MSConvert. The raw file parser is supported on Mac, Linux, and Windows.
**mzmine 4.8** and higher support Thermo raw data directly; there is no need to install external
dependencies.

**Earlier mzmine versions** can use the ThermoRawFileParser automatically to import your data
without conversion. In the preferences (CTRL+P) set the "Thermo data import" to "Thermo raw
file parser" instead of MSConvert. The raw file parser is supported on Mac, Linux, and Windows.

Example for command line interface with the exclusion of exception data:

@@ -105,6 +146,8 @@ how to use it can be found [here](https://github.com/elnurgar/mzxml-precursor-co

### Waters

Direct Waters data support is currently in beta phase.

Waters recently released a tool called **Waters data connect**, which allows conversion of DDA, DIA,
and HD-DDA data to mzML. Lock mass correction is applied during the conversion. We also recommend
enabling centroiding (2D peak picking).
45 changes: 35 additions & 10 deletions docs/system_requirements.md
@@ -4,7 +4,7 @@

Installation of mzmine is described on the [getting started](getting_started.md#install-update) page.
mzmine is available as an installable package or a portable version. The portable version does not
require administator rights to be run, making it useful for university students without elevated
require administrator rights to be run, making it useful for users without elevated
permissions.

## Hardware requirements
@@ -38,16 +38,12 @@ permissions.

<!-- markdown-link-check-disable -->

- Up-to-date operating system, e.g., Windows 10 or newer, recent Linux or MacOS (academic only) versions
- mzmine does not require a dedicated Java installation, even though it is a Java software. All
requirements are shipped with mzmine
- Up-to-date operating system, e.g., Windows 10 or newer, recent Linux or macOS (academic only) versions.
- mzmine does not require a dedicated Java installation, as it is self-contained Java software with its own Java Virtual Machine. All
requirements are shipped with mzmine.
- Microsoft Visual Studio C++ Redist for Bruker raw data import [download page](https://learn.microsoft.com/de-de/cpp/windows/latest-supported-vc-redist?view=msvc-170)
- MSConvert (on Windows) for native Sciex, Waters, Shimadzu, MOBILion, Thermo data
support [download page](https://proteowizard.sourceforge.io/download.html)
- Thermo alternative: ThermoRawFileParser for native Thermo support on Windows, Mac, and
Linux [download page](https://github.com/pluskal-lab/ThermoRawFileParserMacLinux/releases)
- ThermoRawFileParser does not need to be installed but only downloaded and imported via the
mzmine preferences
- MSConvert (on Windows) for native Agilent, Sciex, Waters, Shimadzu, and MOBILion data support [download page](https://proteowizard.sourceforge.io/download.html)

<!-- markdown-link-check-enable -->

## Internet connection
@@ -65,3 +61,32 @@ for spectral networking using MS2Deepscore and DReaMS, an internet connection is
- https://zenodo.org/ spectral libraries
- https://external.gnps2.org/gnpslibrary spectral libraries
<!-- markdown-link-check-enable -->

## Operating system compatibility {#compatibility}

### Windows

Currently, all modules are compatible with Microsoft Windows 10 and higher.

Some libraries for the raw data support for vendor-specific formats are only available for Windows.
Read more about data support and [data conversion](data_conversion.md).

### Linux

Some libraries for the raw data support for vendor-specific formats are only available for Windows.

The Linux version **supports** raw data formats from:
- **Thermo**, **Bruker**, **Waters**

Data from other vendors may need to be **converted** to the open .mzML format before import, including:
- **Agilent**, **Sciex**, **Shimadzu**, **MOBILion**

### macOS

Some libraries for the raw data support for vendor-specific formats are only available for Windows and Linux.

The macOS version **supports** raw data formats from:
- Thermo

Data from other vendors may need to be **converted** to the open .mzML format before import, including:
- Agilent, Sciex, Shimadzu, MOBILion, Bruker, Waters
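
The compatibility lists above can be summarized as a small lookup. This is a minimal Python sketch for planning purposes only: `NATIVE_SUPPORT` and `needs_conversion` are hypothetical names, the table reflects the lists in this section at the time of writing, and Windows support for some vendors additionally depends on MSConvert being installed.

```python
# Vendors whose raw data can be imported directly, per operating system,
# transcribed from the compatibility lists above.
NATIVE_SUPPORT = {
    "windows": {"thermo", "bruker", "waters", "agilent", "sciex", "shimadzu", "mobilion"},
    "linux": {"thermo", "bruker", "waters"},
    "macos": {"thermo"},
}


def needs_conversion(vendor: str, operating_system: str) -> bool:
    """Return True if the vendor's raw data should be converted to mzML first."""
    os_key = operating_system.strip().lower()
    if os_key not in NATIVE_SUPPORT:
        raise ValueError(f"unknown operating system: {operating_system!r}")
    return vendor.strip().lower() not in NATIVE_SUPPORT[os_key]
```

For example, Bruker data would be flagged for conversion on macOS but not on Linux or Windows.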