Commit

update
yufree committed Mar 25, 2024
1 parent 4f037ec commit 4e158a4
Showing 11 changed files with 3,770 additions and 1,300 deletions.
2 changes: 1 addition & 1 deletion 01-introduction.Rmd
@@ -139,7 +139,7 @@ General challenge for metabolomics studies could be found here [@schymanski2017;

- Quantitative Metabolomics related issues could be found here[@kapoore2016b; @jorge2016a; @lv2022; @vitale2022].

- For quality control issues, check here[@dudzik2018; @siskos2017; @sumner2007; @place2021]. You might also try postcolumn infusion as a quality control tool[@gonzalez2022].
- For quality control issues, check here[@dudzik2018; @siskos2017; @sumner2007; @place2021;@broeckling2023;@gonzalez-dominguez2024]. You might also try postcolumn infusion as a quality control tool[@gonzalez2022].

## Trends in Metabolomics

8 changes: 6 additions & 2 deletions 02-doe.Rmd
@@ -47,16 +47,20 @@ One experiment can contain lots of factors with different levels and only one se

## Pooled QC

Pooled QC samples are unique and very important for metabolomics studies. Every 10 or 20 samples, a pooled sample made from all samples and a blank sample should be injected as quality control samples. Pooled QC samples capture the changes during the instrumental analysis, and blank samples can tell where the variance comes from. Meanwhile, the start of the sequence should condition the column with pooled QC samples. The injection sequence should be randomized. These papers[@phapale2020; @dudzik2018; @dunn2012; @broadhurst2018] should be read for details.
Pooled QC samples are unique and very important for metabolomics studies. Every 10 or 20 samples, a pooled sample made from all samples and a blank sample should be injected as quality control samples. Pooled QC samples capture the changes during the instrumental analysis, and blank samples can tell where the variance comes from. Meanwhile, the start of the sequence should condition the column with pooled QC samples. The injection sequence should be randomized. These papers[@phapale2020; @dudzik2018; @dunn2012; @broadhurst2018;@broeckling2023;@gonzalez-dominguez2024] should be read for details.
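The run-order rules above can be sketched in a few lines. This is a minimal illustration, not a validated protocol: the function name `build_injection_sequence`, the labels `pooled_QC` and `blank`, the three leading pooled QC injections, and the fixed random seed are all assumptions made for the example.

```python
import random


def build_injection_sequence(sample_ids, qc_every=10, n_conditioning=3, seed=1):
    """Return a run order with randomized samples and periodic QC injections.

    Assumed layout (hypothetical, for illustration only):
    - `n_conditioning` pooled QC injections at the start of the sequence,
    - study samples in random order,
    - a pooled QC plus a blank after every `qc_every` samples and at the end.
    """
    rng = random.Random(seed)  # fixed seed so the run order is reproducible
    run_order = list(sample_ids)
    rng.shuffle(run_order)

    sequence = ["pooled_QC"] * n_conditioning
    for position, sample in enumerate(run_order, start=1):
        sequence.append(sample)
        # Insert the QC pair at regular intervals and after the last sample.
        if position % qc_every == 0 or position == len(run_order):
            sequence.extend(["pooled_QC", "blank"])
    return sequence


# Example with 25 made-up sample IDs.
example_sequence = build_injection_sequence([f"S{i:02d}" for i in range(25)])
```

Keeping the seed with the study records makes it possible to regenerate the same run order later when checking for injection-order effects.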

If there are other co-factors, a linear model or randomization can be applied to eliminate their influence. You need to record the values of those co-factors for further data analysis. Common co-factors in metabolomics studies are age, gender, location, etc.

If you need data correction, some background or calibration samples are required. However, control samples could also be used for data correction in certain DoE.

Other important factors are instrumental. High-resolution mass spectrometry is always preferred. As shown in Lukas's study [@najdekr2016]:

> the most effective mass resolving powers for profiling analyses of metabolite rich biofluids on the Orbitrap Elite were around 60000--120000 fwhm to retrieve the highest amount of information. The region between 400--800 m/z was influenced the most by resolution.
> the most effective mass resolving powers for profiling analyses of metabolite rich biofluids on the Orbitrap Elite were around 60000-120000 fwhm to retrieve the highest amount of information. The region between 400-800 m/z was influenced the most by resolution.
However, the elimination of peaks with high within-group RSD% is omitted by most studies. Based on a pre-experiment, you can obtain the RSD% distribution and set a cut-off so that only stable peaks are used for further data analysis. To my knowledge, 30% is suitable considering the batch effects.
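A minimal sketch of the RSD% filter described above. The peak IDs and intensities are made up, the function names are hypothetical, and requiring the cut-off to hold within every group is an assumption; adjust the rule to your own design.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent (sample SD divided by mean)."""
    average = mean(values)
    if average == 0:
        return float("inf")  # avoid division by zero for all-zero peaks
    return 100.0 * stdev(values) / average


def filter_stable_peaks(peak_table, groups, cutoff=30.0):
    """Keep peaks whose RSD% is at or below `cutoff` within every group.

    peak_table: dict mapping peak ID -> list of intensities (one per sample)
    groups:     list of group labels, in the same sample order
    Each group needs at least two samples, otherwise the SD is undefined.
    """
    kept = {}
    for peak_id, intensities in peak_table.items():
        by_group = {}
        for label, value in zip(groups, intensities):
            by_group.setdefault(label, []).append(value)
        if all(rsd_percent(v) <= cutoff for v in by_group.values()):
            kept[peak_id] = intensities
    return kept


# Made-up example: two peaks measured in three cases and three controls.
groups = ["case", "case", "case", "control", "control", "control"]
peaks = {
    "M101T50": [100, 105, 95, 200, 210, 190],   # stable in both groups
    "M202T80": [100, 300, 50, 200, 210, 190],   # unstable in the case group
}
stable = filter_stable_peaks(peaks, groups, cutoff=30.0)
```

Plotting the distribution of `rsd_percent` values from a pre-experiment before fixing `cutoff` follows the suggestion in the text; the 30% default only mirrors the value mentioned there.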

Adding certified reference materials or standard reference materials will help to evaluate the quality of large-scale data collection or of important metabolites[@wise2022; @wright2022].

For long-term quality control, ScreenDB provides a data analysis strategy for HRMS data founded on structured query language database archiving[@mardal2023].

AVIR is a computational solution that automatically recognizes metabolic features with computational variation in a metabolomics data set[@zhang2024a].
4 changes: 3 additions & 1 deletion 04-instrumental.Rmd
@@ -14,10 +14,12 @@ For GC, higher temperature could release compounds with higher boiling point. Fo

Meta-analysis of chromatographic methods in EBI MetaboLights and NIH Metabolomics Workbench could be a guide for labs without experience in metabolomics chromatographic methods[@harrieder2022].

This work introduces Sequential Quantification using Isotope Dilution (SQUID), a method that combines serial sample injections into a continuous isocratic mobile phase to enable rapid, accurate analysis of target molecules. It was demonstrated by detecting microbial polyamines in human urine samples with an LLOQ of 106 nM and analysis times as short as 57 s, and SQUID is proposed as a high-throughput LC–MS tool for quantifying target biomarkers in large cohorts[@groves2023].

## Mass resolution

For metabolomics, high resolution mass spectrum should be used to make identification of compounds easier. The Mass Resolving Power is very important for annotation and high resolution mass spectrum should be calibrated in real time. The region between 400--800 m/z was influenced the most by resolution[@najdekr2016]. Orbitrap Fusion's performance was evaluated here[@barbiersainthilaire2018], as well as the comparison with Fourier transform ion cyclotron resonance (FT-ICR)[@ghaste2016; @huang2021]. Mass Difference Maps could recalibrate HRMS data [@smirnov2019].

## Matrix effects

Matrix effects could decrease the sensitivity of untargeted analysis. Such matrix effects could be checked by low resolution mass spectrometry[@yu2017] and found for high resolution mass spectrometry[@calbiani2006]. Ion suppression should also be considered as a critical issue comparing heterogeneous metabolic profiles[@ghosson2021].
Matrix effects could decrease the sensitivity of untargeted analysis. Such matrix effects could be checked by low resolution mass spectrometry[@yu2017] and found for high resolution mass spectrometry[@calbiani2006]. Ion suppression should also be considered as a critical issue comparing heterogeneous metabolic profiles[@ghosson2021]. This work discussed the matrix effects after trimethylsilyl derivatization[@tarakhovskaya2023]. The study[@dagan2023] investigated how matrix complexity affects nontargeted detection in LC-MS/MS analysis. Detection limits for trace compounds were significantly influenced by matrix complexity: higher concentrations were required for detection within the "top 1000" list than within the first 10,000 peaks, suggesting a negative power law relationship between peak location and concentration. The research also demonstrated a correlation between the power law coefficient and the dilution factor and showed the distribution of matrix peaks across various matrices, providing insight into the capabilities and limitations of LC-MS for analyzing nontargets in complex matrices.
18 changes: 13 additions & 5 deletions 05-workflow.Rmd
@@ -45,7 +45,7 @@ Here is a list for related open source [projects](http://strimmerlab.org/notes/m

[xcms](https://bioconductor.org/packages/release/bioc/html/xcms.html) is different from xcms online, although they might share the same code. I use it almost every day to run local metabolomics data analysis. Recently, they have been moving to xcms 3 with a major update to the object classes. Their data format will be integrated into the MSnbase package, and the parameters will be easy to set up for each step. Normally, I use msconvert-IPO-xcms-xMSannotator-metaboanalyst as the workflow to process offline data. Parallel processing can accelerate the process. However, if you are not familiar with R, you had better choose one of the software packages below. For xcms, 1000 files need around 5 hours to generate the peak list on a regular workstation.

[IPO](https://github.com/rietho/IPO) is a tool for automated optimization of XCMS parameters[@libiseller2015], and [Warpgroup](https://github.com/nathaniel-mahieu/warpgroup) is used for chromatogram subregion detection, consensus integration bound determination and accurate missing value integration[@mahieu2016]. Another option is AutoTuner, which is much faster than IPO[@mclean2020]. Recently, MetaboAnalystR 3.0 can also optimize the parameters for xcms, but you need to perform the subsequent analysis within this software[@pang2020]. For IPO, ten files need \~12 hours to generate the optimized results on a regular workstation. [Paramounter](https://github.com/HuanLab/Paramounter) is a direct measurement of universal parameters to process metabolomics data in a “White Box”[@guo2022]. Another study used machine learning to compare different optimization methods, and they were all better than the default settings of xcms[@lassen2021]. It could be extended to include ion mobility[@dodds2022].
[IPO](https://github.com/rietho/IPO) is a tool for automated optimization of XCMS parameters[@libiseller2015], and [Warpgroup](https://github.com/nathaniel-mahieu/warpgroup) is used for chromatogram subregion detection, consensus integration bound determination and accurate missing value integration[@mahieu2016]. A case study comparing different xcms parameters with IPO can be found for GC-MS[@dossantos2023]. Another option is AutoTuner, which is much faster than IPO[@mclean2020]. Recently, MetaboAnalystR 3.0 can also optimize the parameters for xcms, but you need to perform the subsequent analysis within this software[@pang2020]. For IPO, ten files need \~12 hours to generate the optimized results on a regular workstation. [Paramounter](https://github.com/HuanLab/Paramounter) is a direct measurement of universal parameters to process metabolomics data in a “White Box”[@guo2022]. Another study used machine learning to compare different optimization methods, and they were all better than the default settings of xcms[@lassen2021]. It could be extended to include ion mobility[@dodds2022].

Check those papers for the XCMS based workflow[@forsberg2018; @huan2017; @mahieu2016a; @montenegro-burke2017; @domingo-almenara2020; @stancliffe2022]. For metlin related annotation, check those papers[@guijas2018; @tautenhahn2012; @xue2020; @domingo-almenara2018a].

@@ -61,19 +61,19 @@ mzMatch is a modular, open source and platform independent data processing pipel

[PRIMe](http://prime.psc.riken.jp/Metabolomics_Software/) is from RIKEN and UC Davis. They update their database frequently[@tsugawa2016]. It supports mzML and major MS vendor formats. They defined their own file format, ABF, and an ecosystem for omics studies. The software is updated almost every day. You could use MS-DIAL for untargeted analysis and MRMPROBS for targeted analysis. For annotation, they developed MS-FINDER and statistical tools with Excel. This platform could replace expensive commercial software and is well prepared for MS/MS data analysis and lipidomics. The tools are open source, work on Windows and could also run within mathmamtics. However, they don't cover pathway analysis. Another feature is that they always show the most recent spectral records from public repositories. You could always get the updated MSP spectra files for your own data analysis.

For PRIMe based workflow, check those papers[@lai2018; @matsuo2017; @treutler2016; @tsugawa2015; @tsugawa2016; @kind2018]. There are also extensions for their workflow[@uchino2022].
For PRIMe based workflow, check those papers[@lai2018; @matsuo2017; @treutler2016; @tsugawa2015; @tsugawa2016; @kind2018]. There are also extensions for their workflow[@uchino2022] and workflow for environmental science[@bonnefille2023].

### GNPS

[GNPS](http://gnps.ucsd.edu) is an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. It is a straightforward annotation method for MS/MS data. Feature-based molecular networking (FBMN) within GNPS could be coupled with xcms, openMS, MS-DIAL, MZmine2, and other popular software. GNPS also has a dashboard for online mass spectrometry data analysis[@petras2021a].

Check those papers for GNPS and related projects[@aron2020; @nothias2020; @scheubert2017; @silva2018; @wang2016b].
Check those papers for GNPS and related projects[@aron2020; @nothias2020; @scheubert2017; @silva2018; @wang2016b;@bittremieux2023].

### OpenMS & SIRIUS

[OpenMS](https://www.openms.de/) is another good platform for mass spectrum data analysis developed with C++. You could use it as a plugin of [KNIME](https://www.knime.org/). I suggest that anyone who wants to be a data scientist get familiar with platforms like KNIME, because they supply various APIs for different programming languages, are easy to use and show every step to others. Also, TOPPView in OpenMS could be the best software to visualize MS data. You could always use the metabolomics workflow to train beginners on the details of data processing. pyOpenMS and OpenSWATH are also used in this platform. If you want to move into industry, this platform fits you best because you might get a clear idea about solutions and workflows.

Check those papers for OpenMS based workflow[@bertsch2011; @pfeuffer2017; @rost2014; @rost2016; @rurik2020; @alka2020].
Check those papers for OpenMS based workflow[@bertsch2011; @pfeuffer2017; @rost2014; @rost2016; @rurik2020; @alka2020;@pfeuffer2024].

OpenMS could be coupled to SIRIUS 4 for annotation. [Sirius](https://bio.informatik.uni-jena.de/software/sirius/) is a new java-based software framework for discovering a landscape of de-novo identification of metabolites using single and tandem mass spectrometry. SIRIUS 4 project integrates a collection of our tools, including [CSI:FingerID](https://www.csi-fingerid.uni-jena.de/), [ZODIAC](https://bio.informatik.uni-jena.de/software/zodiac/) and [CANOPUS](https://bio.informatik.uni-jena.de/software/canopus/). Check those papers for SIRIUS based workflow[@duhrkop2019; @duhrkop2020a; @alka2020; @ludwig2020].

@@ -137,9 +137,17 @@ You could check those papers for Emory workflow[@uppal2013; @uppal2017; @yu2009b

- [MetEx](http://www.metaboex.cn/MetEx) is a targeted extraction strategy for improving the coverage and accuracy of metabolite annotation[@zheng2022a].

- Asari: Trackable and scalable LC-MS metabolomics data processing software in Python[@li2023a].

- NOMspectra: An Open-Source Python Package for Processing High Resolution Mass Spectrometry Data on Natural Organic Matter[@volikov2023].

- MARS: A Multipurpose Software for Untargeted LC−MS-Based Metabolomics and Exposomics with GUI in C++[@goracci2024].

- MeRgeION: a Multifunctional R Pipeline for Small Molecule LC-MS/MS Data Processing, Searching, and Organizing[@liu2023a].

### Workflow Comparison

Here are some comparisons of different workflows, and you could make your selection based on these works[@myers2017; @weber2017; @li2018a].
Here are some comparisons of different workflows, and you could make your selection based on these works[@myers2017; @weber2017; @li2018a;@liao2023].

[xcmsrocker](https://github.com/yufree/xcmsrocker) is a docker image for metabolomics to compare R based software with template[@yu2022b].

12 changes: 10 additions & 2 deletions 06-rawdata.Rmd
@@ -52,11 +52,15 @@ An Open-source feature detection algorithm for non-target LC–MS analytics coul

[mzRAPP](https://github.com/YasinEl/mzRAPP) enables the generation of benchmark peak lists by using an internal set of known molecules in the analyzed data set to compare workflows[@elabiead2022].

G-Aligner is a graph-based feature alignment method for untargeted LC–MS-based metabolomics[@wang2023b], which will consider the importance of feature matching.

qBinning is a novel algorithm for constructing extracted ion chromatograms (EICs) based on statistical principles and without the need to set user parameters[@reuschenbach2023].

Machine learning can also be used for feature extraction. A deep learning framework for LC-MS feature detection on 2D pseudo color images could improve the peak picking process[@zhao2021]. Another deep learning-assisted peak curation tool (NeatMS) can also be used for large-scale LC-MS metabolomics[@gloaguen2022]. A feature selection pipeline based on a neural network and a genetic algorithm could be applied for metabolomics data analysis[@lisitsyna2022].

## MS/MS

Various data acquisition workflows could be checked here[@fenaille2017].
Various data acquisition workflows could be checked here[@fenaille2017]. Before using MS/MS annotation, it's better to know that DDA and DIA can miss precursors found in MS1[@guo2020a;@stincone2023].

### MRM

@@ -98,11 +102,15 @@ Other related papers could be found here to cover SWATH and other topic in DIA[@

- DIAMetAlyzer is a pipeline for assay library generation and targeted analysis with statistical validation[@alka2022].

- MetaboMSDIA: A tool for implementing data-independent acquisition in metabolomic-based mass spectrometry analysis[@ledesma-escobar2023].

- CRISP: a cross-run ion selection and peak-picking tool that exploits the run-to-run consistency of DIA and examines the DIA data from the whole set of runs simultaneously to filter out interfering signals, instead of looking at a single run at a time[@yan2023].

## Retention Time Correction

For a single file, we can get peaks. However, the peaks should be aligned across samples as features, so retention time correction should be performed. The basic idea behind retention time correction is to use high-quality grouped peaks to build a new retention time. You might choose the `obiwarp` (for dramatic shifts) or loess regression (fast) method to get the corrected retention time for all of the samples. Remember that the original retention times might be changed and you might need to cross-correct the data. After the correction, you could group the peaks again for a better cross-sample peak list. However, if you directly use `obiwarp`, you don't have to group peaks before correction.
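The basic idea can be illustrated with a toy example. This sketch is not the xcms implementation: it swaps loess regression for plain linear interpolation between anchor peaks to stay dependency-free, and the function names, sample names and retention times are all hypothetical.

```python
from bisect import bisect_left
from statistics import median


def reference_rts(anchor_rts_by_sample):
    """Median retention time of each anchor peak group across samples."""
    per_peak = zip(*anchor_rts_by_sample.values())
    return [median(rts) for rts in per_peak]


def fit_shift_curve(sample_anchor_rts, reference):
    """Sorted (observed RT, deviation from reference) pairs for one sample."""
    return sorted((rt, rt - ref) for rt, ref in zip(sample_anchor_rts, reference))


def correct_rt(rt, shift_curve):
    """Subtract the interpolated deviation from an observed retention time.

    Linear interpolation stands in for loess here; outside the anchor
    range the nearest anchor's deviation is reused.
    """
    rts = [point[0] for point in shift_curve]
    deviations = [point[1] for point in shift_curve]
    if rt <= rts[0]:
        deviation = deviations[0]
    elif rt >= rts[-1]:
        deviation = deviations[-1]
    else:
        i = bisect_left(rts, rt)
        x0, x1 = rts[i - 1], rts[i]
        y0, y1 = deviations[i - 1], deviations[i]
        deviation = y0 + (y1 - y0) * (rt - x0) / (x1 - x0)
    return rt - deviation


# Made-up anchor peaks (seconds): three well-behaved peak groups in three samples.
anchors = {
    "s1": [60.0, 120.0, 180.0],
    "s2": [62.0, 123.0, 184.0],
    "s3": [59.0, 119.0, 179.0],
}
reference = reference_rts(anchors)
curve_s2 = fit_shift_curve(anchors["s2"], reference)
corrected = correct_rt(92.5, curve_s2)  # a non-anchor peak in sample s2
```

After such a correction the peaks would be grouped again, as described above. For real data, the retention time adjustment methods provided by your processing software should be preferred over this toy interpolation.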

This paper shows a MATLAB-based shift correction method[@fu2017]. Retention time correction is a parametric time warping process, and this paper is a good start[@wehrens2015]. Meanwhile, you could use MS2 for retention time correction[@li2017b].
This paper shows a MATLAB-based shift correction method[@fu2017]. Retention time correction is a parametric time warping process, and this paper is a good start[@wehrens2015]. Meanwhile, you could use MS2 for retention time correction[@li2017b]. This work presents a Python-based RI system and peak shift correction model that significantly enhances alignment accuracy[@hao2023].

## Filling missing values
