Tutorials & Demos
=============================================================================
This section covers a breadth of walkthroughs using different datasets and analytical techniques.
Structure of Data Sets
----------------------
SmartPeak is optimised to process data set folders with the following structure:
#. featureFilterComponentGroups.csv
This file contains component group names and their properties such as ``retention_time`` and ``intensity``.
The header including a sample entry is shown below:
.. table:: featureFilterComponentGroups.csv Headers
:widths: auto
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
component_group_name n_heavy_l n_heavy_u n_light_l n_light_u n_detecting_l n_detecting_u n_quantifying_l n_quantifying_u n_identifying_l n_identifying_u n_transitions_l n_transitions_u ion_ratio_pair_name_1 ion_ratio_pair_name_2 ion_ratio_l ion_ratio_u retention_time_l retention_time_u intensity_l intensity_u overall_quality_l overall_quality_u
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
Serotonin 0 10 0 10 0 10 0 10 0 10 0 10 1.37 2.37 -1.00E+12 1.00E+15 -1.00E+12 1.00E+12
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
|
#. featureFilterComponents.csv
This file contains component names and their properties such as ``retention_time`` and ``intensity``.
The header including a sample entry is shown below:
.. table:: featureFilterComponents.csv Headers
:widths: auto
==================== =========== =========== ================= ================= ================ ================
component_name intensity_l intensity_u overall_quality_l overall_quality_u retention_time_l retention_time_u
==================== =========== =========== ================= ================= ================ ================
Serotonin 100 1.00E+15 -500 1.00E+12 0.87 2.87
\- \- \- \- \- \- \-
==================== =========== =========== ================= ================= ================ ================
|
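As a rough illustration of how the ``_l``/``_u`` column pairs can be read, the sketch below parses the sample row above and checks whether a feature falls within the lower and upper bounds. The helper names ``load_bounds`` and ``within_bounds``, the inclusive comparison and the feature values are assumptions made for illustration; the actual filtering is performed by SmartPeak itself.

```python
import csv
import io

# Header and sample row copied from the featureFilterComponents.csv table above.
CSV_TEXT = """component_name,intensity_l,intensity_u,overall_quality_l,overall_quality_u,retention_time_l,retention_time_u
Serotonin,100,1.00E+15,-500,1.00E+12,0.87,2.87
"""

def load_bounds(text):
    """Return {component_name: {property: (lower, upper)}} from CSV text."""
    bounds = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("component_name")
        props = {}
        for column, value in row.items():
            if column.endswith("_l"):
                prop = column[:-2]
                props[prop] = (float(value), float(row[prop + "_u"]))
        bounds[name] = props
    return bounds

def within_bounds(feature, limits):
    """Check every bounded property of a feature (inclusive bounds assumed)."""
    return all(lo <= feature[prop] <= hi for prop, (lo, hi) in limits.items())

bounds = load_bounds(CSV_TEXT)
# Hypothetical feature values, for illustration only.
feature = {"intensity": 5.0e4, "overall_quality": 0.9, "retention_time": 1.5}
print(within_bounds(feature, bounds["Serotonin"]))
```

The same pattern would apply to the component group files, which use the same ``_l``/``_u`` naming for their bounds.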
#. featureQCComponentGroups.csv
This file contains QC component group names and their properties such as ``retention_time`` and ``intensity``.
The header including a sample entry is shown below:
.. table:: featureQCComponentGroups.csv Headers
:widths: auto
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
component_group_name n_heavy_l n_heavy_u n_light_l n_light_u n_detecting_l n_detecting_u n_quantifying_l n_quantifying_u n_identifying_l n_identifying_u n_transitions_l n_transitions_u ion_ratio_pair_name_1 ion_ratio_pair_name_2 ion_ratio_l ion_ratio_u retention_time_l retention_time_u intensity_l intensity_u overall_quality_l overall_quality_u
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
Serotonin 0 10 0 10 1 10 1 10 0 10 1 10 1.37 2.37 0 1.00E+15 -500 1.00E+12
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
==================== ========= ========= ========= ========= ============= ============= =============== =============== =============== =============== =============== =============== ===================== ===================== =========== =========== ================ ================ =========== =========== ================= =================
|
#. featureQCComponents.csv
This file contains feature QC component names and their properties such as ``metaValue_peak_apex_int_l`` and ``metaValue_logSN_l``.
The header including a sample entry is shown below:
.. table:: featureQCComponents.csv Headers
:widths: auto
============== ========================= ========================= ================= ================= ======================= ======================= ======================= ======================= ========================== ========================== ============================ ============================ =================================== =================================== ================================== ================================== ===================================== ===================================== ================= ================= =========== =========== ================= =================
component_name metaValue_peak_apex_int_l metaValue_peak_apex_int_u metaValue_logSN_l metaValue_logSN_u metaValue_total_width_l metaValue_total_width_u metaValue_width_at_50_l metaValue_width_at_50_u metaValue_tailing_factor_l metaValue_tailing_factor_u metaValue_asymmetry_factor_l metaValue_asymmetry_factor_u metaValue_baseline_delta_2_height_l metaValue_baseline_delta_2_height_u metaValue_points_across_baseline_l metaValue_points_across_baseline_u metaValue_points_across_half_height_l metaValue_points_across_half_height_u metaValue_logSN_l metaValue_logSN_u intensity_l intensity_u overall_quality_l overall_quality_u
============== ========================= ========================= ================= ================= ======================= ======================= ======================= ======================= ========================== ========================== ============================ ============================ =================================== =================================== ================================== ================================== ===================================== ===================================== ================= ================= =========== =========== ================= =================
Serotonin 0 5.00E+06 1 1.00E+06 0.1 1 0.001 0.25 0 2 0.8 2.5 -0.25 0.25 20 500 5 500 1 1.00E+06 0 1.00E+15 0 1.00E+12
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
============== ========================= ========================= ================= ================= ======================= ======================= ======================= ======================= ========================== ========================== ============================ ============================ =================================== =================================== ================================== ================================== ===================================== ===================================== ================= ================= =========== =========== ================= =================
|
#. features
This folder contains feature information in either the
`featureXML <https://raw.githubusercontent.com/OpenMS/OpenMS/develop/share/OpenMS/SCHEMAS/FeatureXML_1_9.xsd>`_ format or the
`Chromeleon CDS7 TXT <https://www.thermofisher.com/order/catalog/product/CHROMELEON7>`_ format.
Features contain the processed results of a single sample after applying a workflow and running the command ``STORE_FEATURES``. They
can be stored at either the sample (i.e., injection) level or the sample group (i.e., merged injections) level.
|
#. mzML
This folder contains mass spectrometry data files in `mzML <https://www.psidev.info/mzML>`_, the most widely used open format.
mzML describes the raw spectrometer values; a single mzML file usually encapsulates all the information extracted from a single MS run.
Converting raw files to the ``mzML`` format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`ProteoWizard <https://proteowizard.sourceforge.io/download.html>`_ is a toolkit to view and convert mass spectrometry
data. The toolkit includes SeeMS for visualizing MS data and MSConvert for converting to and from the ``mzML`` file format.
Once it is installed and launched, MSConvert shows the following window:
.. image:: ../images/msconvert.png
To convert mass spectrometry files, please follow the steps below:
#. Select the file you wish to convert.
#. Add the selected file to the list of files to process.
#. Select the output directory where the converted files will be saved.
#. Select the output format; providing a file extension is not mandatory.
#. Check/uncheck compression and packaging options if desired.
#. Select how many files you wish to convert at once, then hit the "Start" button.
|
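ProteoWizard also ships a command-line ``msconvert`` executable, which can be convenient when many files need converting. The sketch below only assembles such a command. It assumes ``msconvert`` is available on the ``PATH``; the helper name ``build_msconvert_command`` and the file and directory names are placeholders.

```python
import subprocess

def build_msconvert_command(raw_file, out_dir, zlib=True):
    """Assemble an msconvert call that writes mzML into out_dir."""
    command = ["msconvert", raw_file, "--mzML", "-o", out_dir]
    if zlib:
        # Optional zlib compression of the binary data arrays.
        command.append("--zlib")
    return command

command = build_msconvert_command("sample.raw", "mzML")
print(" ".join(command))
# To actually run the conversion (requires ProteoWizard to be installed):
# subprocess.run(command, check=True)
```

Keeping the command as a list and passing it to ``subprocess.run`` avoids shell quoting issues with file names that contain spaces.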
#. parameters.csv
This file contains a list of parameters for the workflow steps as well as various other application settings.
The header including a sample entry is shown below:
.. table:: parameters.csv Headers
:widths: auto
========== =================== ===== ====== ======= ============ ======================================== ===== =========
function name type value default restrictions description used_ comment_
========== =================== ===== ====== ======= ============ ======================================== ===== =========
MRMMapping precursor_tolerance float 0.0009 0.1 0.1 Precursor tolerance when mapping (in Th) TRUE
\- \- \- \- \- \- \- \- \-
========== =================== ===== ====== ======= ============ ======================================== ===== =========
|
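To show how the ``type`` and ``value`` columns relate, the following sketch reads the sample row (transcribed from the table above) and converts the value to a Python type. The converter mapping, the handling of the ``used_`` column and the function name ``load_parameters`` are assumptions for illustration, not a description of SmartPeak's own parser.

```python
import csv
import io

# Header and sample row transcribed from the parameters.csv table above.
CSV_TEXT = """function,name,type,value,default,restrictions,description,used_,comment_
MRMMapping,precursor_tolerance,float,0.0009,0.1,0.1,Precursor tolerance when mapping (in Th),TRUE,
"""

# Assumed mapping from the "type" column to Python converters.
CASTERS = {
    "float": float,
    "int": int,
    "bool": lambda text: text.strip().lower() == "true",
    "string": str,
}

def load_parameters(text):
    """Group typed parameter values by function name, skipping unused rows."""
    parameters = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: rows explicitly marked FALSE in "used_" are ignored.
        if row["used_"].strip().upper() == "FALSE":
            continue
        cast = CASTERS.get(row["type"], str)
        parameters.setdefault(row["function"], {})[row["name"]] = cast(row["value"])
    return parameters

params = load_parameters(CSV_TEXT)
print(params["MRMMapping"]["precursor_tolerance"])
```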
#. quantitationMethods.csv
This file contains information about various quantitation methods and their values,
and is required for absolute quantitation.
The header including a sample entry is shown below:
.. table:: quantitationMethods.csv Headers
:widths: auto
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ==================== ======== ==================== ================================ ==================================== =================================== =================================== ====================================== ====================================== ====================================== ======================================
IS_name component_name feature_name concentration_units llod ulod lloq uloq correlation_coefficient actual_concentration n_points transformation_model transformation_model_param_slope transformation_model_param_intercept transformation_model_param_x_weight transformation_model_param_y_weight transformation_model_param_x_datum_min transformation_model_param_x_datum_max transformation_model_param_y_datum_min transformation_model_param_y_datum_max
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ==================== ======== ==================== ================================ ==================================== =================================== =================================== ====================================== ====================================== ====================================== ======================================
s7p.s7p_1.Heavy s7p.s7p_1.Light peak_apex_int uM linear 1.88269 1.93E-04 1.00E-15 1.00E+15 1.00E-15 1.00E+15
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ==================== ======== ==================== ================================ ==================================== =================================== =================================== ====================================== ====================================== ====================================== ======================================
|
#. sequence.csv
This file contains information about all the injections in the data set and their values.
Fields such as ``sample_name`` and ``original_filename`` refer to the file names stored in the ``mzML`` folder.
Please note that fields in the sequence that are not required for a given workflow are left empty.
The header including a sample entry is shown below:
.. table:: sequence.csv Headers
:widths: auto
============================== ================= ===================== =========== ============================== ========== =========== ============ ========== ========== =============== ========== ================ ============= =============== ================ ========================= ============= ============= ==============
sample_name sample_group_name sequence_segment_name sample_type original_filename batch_name rack_number plate_number pos_number inj_number dilution_factor inj_volume inj_volume_units operator_name acq_method_name proc_method_name acquisition_date_and_time scan_polarity scan_mass_low scan_mass_high
============================== ================= ===================== =========== ============================== ========== =========== ============ ========== ========== =============== ========== ================ ============= =============== ================ ========================= ============= ============= ==============
170808_Jonathan_yeast_Sacc2_1x group1 sequence1 Unknown 170808_Jonathan_yeast_Sacc2_1x BatchName 2 3 uL MethodName
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
============================== ================= ===================== =========== ============================== ========== =========== ============ ========== ========== =============== ========== ================ ============= =============== ================ ========================= ============= ============= ==============
|
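Because the sequence refers to files in the ``mzML`` folder, a quick consistency check before running a workflow can save time. The sketch below derives the expected file path for each injection and lists the ones that are missing. It assumes that the file name is ``original_filename`` plus an ``.mzML`` extension and that ``sample_name`` is the fallback when ``original_filename`` is empty; the folder name ``my_dataset`` and both helper functions are hypothetical.

```python
import csv
import io
from pathlib import Path

# Reduced sequence.csv with only the columns needed here.
CSV_TEXT = """sample_name,sample_group_name,sample_type,original_filename
170808_Jonathan_yeast_Sacc2_1x,group1,Unknown,170808_Jonathan_yeast_Sacc2_1x
"""

def expected_mzml_files(sequence_text, data_dir):
    """Map each sample_name to the mzML path its injection is expected to use."""
    paths = {}
    for row in csv.DictReader(io.StringIO(sequence_text)):
        # Fall back to sample_name when original_filename is left empty.
        stem = row["original_filename"] or row["sample_name"]
        paths[row["sample_name"]] = Path(data_dir) / "mzML" / (stem + ".mzML")
    return paths

def missing_files(paths):
    """Return the sample names whose mzML file is not on disk."""
    return sorted(name for name, path in paths.items() if not path.exists())

paths = expected_mzml_files(CSV_TEXT, "my_dataset")
print(missing_files(paths))
```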
#. standardsConcentrations.csv
This file contains the concentration values for the provided samples/components.
It is required for automated calibration curve fitting.
The header including a sample entry is shown below:
.. table:: standardsConcentrations.csv Headers
:widths: auto
================= =================== =================== ==================== ======================= =================== ===============
sample_name component_name IS_component_name actual_concentration IS_actual_concentration concentration_units dilution_factor
================= =================== =================== ==================== ======================= =================== ===============
150516_CM1_Level1 23dpg.23dpg_1.Light 23dpg.23dpg_1.Heavy 0 1 uM 1
\- \- \- \- \- \- \-
================= =================== =================== ==================== ======================= =================== ===============
|
#. traML.csv
This file contains a summary of the proteins or metabolites with transition names and other related information.
The header including a sample entry is shown below:
.. table:: traML.csv Headers
:widths: auto
=========== =============== =================== =================== ============= ========== =========== ======= ========= ======= ===== ========== ================ ======================== ================ ===== =============== ========= =============== ============== ============ ==================== ====================== ====================== ====================
ProteinName FullPeptideName transition_group_id transition_name RetentionTime Annotation PrecursorMz MS1 Res ProductMz MS2 Res Dwell Fragmentor Collision Energy Cell Accelerator Voltage LibraryIntensity decoy PeptideSequence LabelType PrecursorCharge FragmentCharge FragmentType FragmentSeriesNumber quantifying_transition identifying_transition detecting_transition
=========== =============== =================== =================== ============= ========== =========== ======= ========= ======= ===== ========== ================ ======================== ================ ===== =============== ========= =============== ============== ============ ==================== ====================== ====================== ====================
arg-L arg-L arg-L.arg-L_1.Heavy 45.85610358 179 Unit 136 Unit 1 0 Heavy 1 1 1 TRUE FALSE TRUE
\- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \- \-
=========== =============== =================== =================== ============= ========== =========== ======= ========= ======= ===== ========== ================ ======================== ================ ===== =============== ========= =============== ============== ============ ==================== ====================== ====================== ====================
|
#. workflow.csv
The workflow steps, which are parsed by SmartPeak to process the data,
are listed in this file under the column ``command_name``.
A full list of the commands can be found in :ref:`Workflow Commands`.
|
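A minimal sketch of reading such a file is shown below. It assumes that ``workflow.csv`` has a single ``command_name`` column with one command per row; the shortened command list and the function name ``read_workflow`` are for illustration only.

```python
import csv
import io

# A minimal workflow.csv with the single column described above.
CSV_TEXT = """command_name
LOAD_RAW_DATA
MAP_CHROMATOGRAMS
PICK_MRM_FEATURES
STORE_FEATURES
"""

def read_workflow(text):
    """Return the workflow commands in the order they are listed."""
    return [row["command_name"].strip()
            for row in csv.DictReader(io.StringIO(text))
            if row["command_name"].strip()]

commands = read_workflow(CSV_TEXT)
for position, command in enumerate(commands, start=1):
    print(position, command)
```

The order of the rows matters, since the commands are executed as a sequence of processing steps.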
Targeted quantitation with HPLC data
------------------------------------
This tutorial walks you through the workflow for analyzing targeted HPLC data
starting from input file generation, to processing the data in SmartPeak,
to reviewing the data in SmartPeak, to reporting the results for later use.
Objectives
~~~~~~~~~~
#. Obtaining the SOP for the workflow.
#. Choosing a data set for demonstrating the workflow.
#. Creating optimized SmartPeak input templates for running the workflow.
The Workflows include
~~~~~~~~~~~~~~~~~~~~~
#. Calculating the calibration curves using Standards
#. Processing Unknowns
Steps
~~~~~
The tutorial includes the following steps:
#. Setting up the input files
The data sets used can be found in
`HPLC_UV_Standards <https://github.com/AutoFlowResearch/SmartPeak/tree/develop/src/examples/data/HPLC_UV_Standards>`_ and
`HPLC_UV_Unknowns <https://github.com/AutoFlowResearch/SmartPeak/tree/develop/src/examples/data/HPLC_UV_Unknowns>`_
for the HPLC UV Standards and HPLC UV Unknowns, respectively.
#. Defining the workflow in SmartPeak
For HPLC UV Standards analysis, the following steps are saved
into the ``workflow.csv`` file. Alternatively, steps can be replaced,
added or deleted directly from SmartPeakGUI.
A detailed explanation of each command step
can be found in :ref:`Workflow Commands`.
* LOAD_RAW_DATA
* MAP_CHROMATOGRAMS
* EXTRACT_CHROMATOGRAM_WINDOWS
* ZERO_CHROMATOGRAM_BASELINE
* PICK_MRM_FEATURES
* CHECK_FEATURES
* SELECT_FEATURES
* CALCULATE_CALIBRATION
* STORE_QUANTITATION_METHODS
* QUANTIFY_FEATURES
* STORE_FEATURES
The calibration curve can be inspected after all workflow steps have been run. To do so,
click on view and then "Calibrators". From the transition tab, select Antranilicacid and Indole
as ``transition_group`` to plot their calibration curves within the given concentration range, as
shown below:
.. image:: ../images/hplc_uv_standards_calibration_curve.png
To inspect the features for the selected transition groups, select "Features (line)" from the view menu,
then open the features tab (which can also be opened from the view menu) and select "asymetry_factors" and "logSN"
in the plot column. The line plot illustrates the value for each transition group and feature, as shown below:
.. image:: ../images/hplc_uv_standards_features_line.png
The features can also be plotted as a heatmap. Under "view", select "Features (heatmap)", then select the "left_width"
feature to display the transition groups as a heatmap and compare the values from the same injection, as shown below:
.. image:: ../images/hplc_uv_standards_features_heatmap.png
The workflow step ``STORE_QUANTITATION_METHODS`` writes the calibration model for each transition. An excerpt can be seen below:
.. table:: Generated sequence1_quantitationMethods.csv
:widths: auto
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ======== ==================== =================================== ====================================== ====================================== =================================== ====================================== ====================================== =============================================== ================================ ====================================
IS_name component_name feature_name concentration_units llod ulod lloq uloq correlation_coefficient n_points transformation_model transformation_model_param_y_weight transformation_model_param_y_datum_min transformation_model_param_y_datum_max transformation_model_param_x_weight transformation_model_param_x_datum_min transformation_model_param_x_datum_max transformation_model_param_symmetric_regression transformation_model_param_slope transformation_model_param_intercept
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ======== ==================== =================================== ====================================== ====================================== =================================== ====================================== ====================================== =============================================== ================================ ====================================
_ Antranilicacid intensity ug/mL 0.0 0.0 0.5 2500 0.998679668124795 7 linear ln(y) -1.0e15 1.0e15 ln(x) -1.0e15 1.0e15 FALSE 1.353587567241049 0.369814545757549
_ Indole intensity ug/mL 0.0 0.0 0.5 50.0 0.998763546720702 6 linear ln(y) -1.0e15 1.0e15 ln(x) -1.0e15 1.0e15 FALSE 0.995574540930201 3.242340261658038
=============== =================== ============= =================== ==== ==== ==== ==== ======================= ======== ==================== =================================== ====================================== ====================================== =================================== ====================================== ====================================== =============================================== ================================ ====================================
This file is used to apply the predefined calibration model to each transition by running the ``QUANTIFY_FEATURES`` workflow step.
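To make the role of these columns more concrete, the sketch below applies the Indole row from the excerpt to a hypothetical intensity. It assumes that the ``ln(x)`` and ``ln(y)`` weights mean the line is fitted in log-log space, i.e. ln(y) = slope · ln(x) + intercept, with x the concentration and y the feature intensity, and it ignores internal standards and dilution factors. The exact convention used by SmartPeak may differ, so treat this as an illustration rather than a reference implementation; all function names are hypothetical.

```python
import math

# Slope, intercept and quantitation limits copied from the Indole row above.
INDOLE = {"slope": 0.995574540930201, "intercept": 3.242340261658038,
          "lloq": 0.5, "uloq": 50.0}

def predict_intensity(concentration, model):
    """Forward model (assumed): ln(y) = slope * ln(x) + intercept."""
    return math.exp(model["slope"] * math.log(concentration) + model["intercept"])

def estimate_concentration(intensity, model):
    """Invert the assumed log-log model to get a concentration in ug/mL."""
    return math.exp((math.log(intensity) - model["intercept"]) / model["slope"])

def in_quantitation_range(concentration, model):
    """True when the concentration lies between lloq and uloq (inclusive)."""
    return model["lloq"] <= concentration <= model["uloq"]

intensity = predict_intensity(10.0, INDOLE)  # hypothetical 10 ug/mL standard
concentration = estimate_concentration(intensity, INDOLE)
print(round(concentration, 6), in_quantitation_range(concentration, INDOLE))
```

Checking the estimate against ``lloq`` and ``uloq`` is a simple way to flag values that fall outside the range covered by the calibration.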
The workflow steps for HPLC UV Unknowns are:
* LOAD_RAW_DATA
* MAP_CHROMATOGRAMS
* EXTRACT_CHROMATOGRAM_WINDOWS
* ZERO_CHROMATOGRAM_BASELINE
* PICK_MRM_FEATURES
* QUANTIFY_FEATURES
* CHECK_FEATURES
* SELECT_FEATURES
* STORE_FEATURES
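Comparing the two step lists shows which commands are specific to the Standards workflow. A minimal sketch, with both lists copied from this tutorial and the variable names chosen for illustration:

```python
# Workflow steps for HPLC UV Standards, as listed in this tutorial.
STANDARDS = [
    "LOAD_RAW_DATA", "MAP_CHROMATOGRAMS", "EXTRACT_CHROMATOGRAM_WINDOWS",
    "ZERO_CHROMATOGRAM_BASELINE", "PICK_MRM_FEATURES", "CHECK_FEATURES",
    "SELECT_FEATURES", "CALCULATE_CALIBRATION", "STORE_QUANTITATION_METHODS",
    "QUANTIFY_FEATURES", "STORE_FEATURES",
]

# Workflow steps for HPLC UV Unknowns, as listed in this tutorial.
UNKNOWNS = [
    "LOAD_RAW_DATA", "MAP_CHROMATOGRAMS", "EXTRACT_CHROMATOGRAM_WINDOWS",
    "ZERO_CHROMATOGRAM_BASELINE", "PICK_MRM_FEATURES", "QUANTIFY_FEATURES",
    "CHECK_FEATURES", "SELECT_FEATURES", "STORE_FEATURES",
]

# Steps that only appear in the Standards workflow, in their original order.
calibration_only = [step for step in STANDARDS if step not in UNKNOWNS]
# Steps that only appear in the Unknowns workflow.
unknowns_only = [step for step in UNKNOWNS if step not in STANDARDS]

print(calibration_only)
print(unknowns_only)
```

Only the two calibration-related commands differ; the Unknowns workflow contains no command that is absent from the Standards workflow.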
To inspect the features for the selected transition groups, select "Features (line)" from the view menu,
then open the features tab (which can also be opened from the view menu) and select "asymetry_factors" and "logSN"
in the plot column. The line plot illustrates the value for each transition group and feature, as shown below:
.. image:: ../images/hplc_uv_unknowns_features_line.png
The features can also be plotted as a heatmap. Under "view", select "Features (heatmap)", then select the "asymetry_factors"
feature to display the transition groups as a heatmap and compare the values from the same injection, as shown below:
.. image:: ../images/hplc_uv_unknowns_features_heatmap.png
To plot the intensities over time for given injections and transitions, select "chromatogram" from the "view" menu,
then select the injections and transitions to plot from their respective tabs on the left. The following shows the chromatogram
for two injections using the Antranilicacid and 5-HTP2 transitions and their intensity differences over time.
.. image:: ../images/hplc_uv_unknowns_chromatogram.png
#. Running the workflow in SmartPeak
To run the analysis, please follow the steps for
:ref:`Using SmartPeak GUI` or :ref:`Using SmartPeak CLI`
to execute the workflow steps and review the results including plotting.
#. Reporting the results
To export the results, select "Report" from the "Actions" menu, which opens the
"Create Report" window:
.. image:: ../images/hplc_uv_standards_exports.png
Based on the data you wish to export, select the desired "Sample types" from the left pane
and the "Metadata" from the right pane, then click one of the buttons below to create
the report with the selected items in CSV format. More details on exporting the results can be found
in :ref:`Export report`.
Targeted quantitation with LC-MS/MS 5500 QTRAP RapidRIP
-------------------------------------------------------
Targeted flux analysis with LC-MS/MS 5500 QTRAP
-----------------------------------------------
Targeted flux analysis with GC-MS full-scan Agilent
---------------------------------------------------
Targeted flux analysis with GC-MS SIM Agilent
---------------------------------------------
Non-targeted FIA-MS analysis with Thermo Orbitrap
-------------------------------------------------
This tutorial walks you through the workflow for analyzing non-targeted FIA-MS data
starting from input file generation, to processing the data in SmartPeak,
to reviewing the data in SmartPeak, to reporting the results for later use.
Objectives
~~~~~~~~~~
#. Obtaining the SOP for the workflow.
#. Choosing a data set for demonstrating the workflow.
#. Creating optimized SmartPeak input templates for running the workflow.
The Workflows include
~~~~~~~~~~~~~~~~~~~~~
#. Calculating the calibration curves using Standards
#. Processing Unknowns
Steps
~~~~~
The tutorial includes the following steps:
#. Setting up the input files
The data set used can be found here
`FIAMS FullScan Unknowns <https://github.com/AutoFlowResearch/SmartPeak/tree/develop/src/examples/data/FIAMS_FullScan_Unknowns>`_.
#. Defining the workflow in SmartPeak
For FIAMS FullScan Unknowns analysis, the following steps are saved
into the ``workflow.csv`` file. Alternatively, steps can be replaced,
added or deleted directly from SmartPeakGUI within the "workflow" tab in the right pane.
A detailed explanation of each command step
can be found in :ref:`Workflow Commands`.
* LOAD_RAW_DATA
* EXTRACT_SPECTRA_WINDOWS
* MERGE_SPECTRA
* PICK_MS1_FEATURES
* SEARCH_ACCURATE_MASS
* STORE_ANNOTATIONS
* STORE_FEATURES
* ESTIMATE_FEATURE_BACKGROUND_INTERFERENCES
* STORE_FEATURE_BACKGROUND_ESTIMATIONS
* FILTER_FEATURES_BACKGROUND_INTERFERENCES
* MERGE_FEATURES
* MERGE_INJECTIONS
* STORE_FEATURES_SAMPLE_GROUP
The spectra for the two injection samples can be inspected after all workflow steps have been run. To do so,
click on view and then "Spectra". From the Injections tab, check "Plot/Unplot All" to select all injection samples and
plot the mass-to-charge ratio against their respective intensities, as shown below:
.. image:: ../images/fiams_fullscan_unknowns_spectra.png
#. Reporting the results
To export the results, select "Report" from the "Actions" menu, which opens the
"Create Report" window:
.. image:: ../images/fiams_fullscan_unknowns_exports.png
Based on the data you wish to export, select the desired "Sample types" from the left pane
and the "Metadata" from the right pane, then click one of the buttons below to create
the report with the selected items in CSV format. More details on exporting the results can be found
in :ref:`Export report`.
Non-targeted LC-MS/MS DDA analysis with Thermo Orbitrap
-------------------------------------------------------