Commit

add new
yufree committed Aug 26, 2016
0 parents commit af5ea1d
Showing 64 changed files with 5,812 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -0,0 +1,2 @@
^.*\.Rproj$
^\.Rproj\.user$
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
19 changes: 19 additions & 0 deletions 01-projectsetup.Rmd
@@ -0,0 +1,19 @@
# Project Setup

I suggest building each data analysis project in RStudio (click File - New Project - New Directory - Empty Project) and then giving the project a name. If you are familiar with the tools involved, I also recommend the following tips.

- Use [git](https://git-scm.com/)/[github](https://github.com/) to version-control your code and sync your project online.

- Do NOT name the project after yourself, because other people might collaborate with you and others might check your data when you publish your papers. Each project should cover one paper or one chapter of your thesis.

- Keep a **workflow** document (txt or doc) in your project to record all of the steps and code you run for this project. Treat this document as the digital version of your lab notebook.

- Use a **data** folder in your project folder for the raw data and the results of your data analysis.

- Use a **figure** folder in your project folder for figures.

- Use a **manuscript** folder in your project folder for the manuscript (you can write the paper in RStudio using the R Markdown journal templates from [rticles](https://github.com/rstudio/rticles)).

- The easiest way to start a new study is to copy the contents of this folder into your new project folder. Do not copy **metademo.Rproj**, because your project already has its own .Rproj file.

- Double-click **[yourprojectname].Rproj** to open your project.
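
The folder layout above can also be created from R. This is a minimal sketch: the project name `myproject` and the use of `tempdir()` are placeholder assumptions so that the chunk is safe to run, and in practice `proj` would point to your own RStudio project folder.

```{r setup-folders}
# Hypothetical project root; tempdir() keeps this demo away from real files
proj <- file.path(tempdir(), "myproject")
# Folders suggested above: raw data/results, figures, manuscript
for (d in c("data", "figure", "manuscript")) {
  dir.create(file.path(proj, d), recursive = TRUE, showWarnings = FALSE)
}
# Plain-text workflow document acting as a digital lab notebook
workflow <- file.path(proj, "workflow.txt")
if (!file.exists(workflow)) {
  writeLines(c(paste("Project created:", Sys.Date()), "Steps:"), workflow)
}
list.files(proj)
```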
7 changes: 7 additions & 0 deletions 02-doe.Rmd
@@ -0,0 +1,7 @@
# Experimental design (DoE)

Before you perform any metabolomic study, a clean and meaningful experimental design is the best start. You need at least two groups: a treated group and a control group.

If there are other co-factors, use randomization or a linear model to remove their influence. Record the values of those co-factors for later data analysis. Common co-factors in metabolomic studies include age, gender, and location.
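
As a sketch of the linear-model option, the chunk below simulates one metabolite with a treatment effect plus age and gender co-factors (all values are made up), then fits `lm()` with the co-factors as covariates, so the `grouptreated` coefficient estimates the treatment effect adjusted for them. The last lines illustrate the randomization option by shuffling the injection order.

```{r doe-cofactors}
set.seed(1)
n <- 60
# Simulated study: treatment group plus recorded co-factors
meta <- data.frame(
  group  = factor(rep(c("control", "treated"), each = n / 2),
                  levels = c("control", "treated")),
  age    = round(runif(n, 20, 70)),
  gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Simulated intensity: true treatment effect = 2, plus age, gender and noise
meta$intensity <- 10 + 2 * (meta$group == "treated") + 0.05 * meta$age +
  1 * (meta$gender == "M") + rnorm(n, sd = 0.5)
# Linear model with the co-factors as covariates
fit <- lm(intensity ~ group + age + gender, data = meta)
coef(summary(fit))["grouptreated", ]
# Randomize the injection order so that run order is not confounded with group
meta$run_order <- sample(n)
head(meta)
```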

If you need data correction, you will need background or calibration samples. However, in certain designs the control samples can also be used for data correction.
19 changes: 19 additions & 0 deletions 03-rawdata.Rmd
@@ -0,0 +1,19 @@
# Raw data pretreatment

Data analysis begins with collecting your raw data from an instrument such as LC-MS or GC-MS.

However, raw data are hard to analyze directly. GC/LC-MS data are usually represented as a matrix whose columns correspond to retention times and whose rows correspond to masses (m/z). They contain so much noise that they cannot be processed efficiently as they are.

```{r singledata, fig.show='hold', fig.cap='Demo of GC/LC-MS data',echo=FALSE}
knitr::include_graphics('images/singledata.png')
```

The basic idea of raw data pretreatment is to convert the mass-by-retention-time matrix into a vector of selected MS peaks at specific retention times.

With many samples, raw data pretreatment produces another data matrix, whose columns correspond to ions at specific retention times and whose rows correspond to samples.

```{r multidata, fig.show='hold', fig.cap='Demo of many GC/LC-MS data',echo=FALSE}
knitr::include_graphics('images/multidata.png')
```
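
The toy sketch below illustrates both conversions on simulated data. It is not the peak detection used by **xcms**; it simply takes the apex intensity of each m/z trace, treats it as missing if it falls below an assumed noise threshold, and stacks the samples into a sample-by-feature matrix. Because each simulated m/z has a single peak at a fixed retention time, each m/z here stands for one ion at one retention time; the m/z values, retention times, and threshold are all made up.

```{r toy-peaktable}
set.seed(42)
mz <- c(200.1, 250.2, 300.3)   # hypothetical m/z values (matrix rows)
rt <- seq(40, 70, by = 0.5)    # retention times in min (matrix columns)
# Simulate one sample: a Gaussian peak per m/z at a fixed RT, plus noise
simulate_sample <- function(heights, peak_rt = c(45, 55, 65)) {
  m <- t(sapply(seq_along(mz), function(i)
    heights[i] * exp(-(rt - peak_rt[i])^2 / 2)))
  m + matrix(abs(rnorm(length(m), sd = 5)), nrow = length(mz))
}
# Sample s1 has no real peak at m/z 300.3
samples <- list(s1 = simulate_sample(c(100, 50, 0)),
                s2 = simulate_sample(c(120, 40, 80)),
                s3 = simulate_sample(c(90, 60, 70)))
# Naive peak picking: apex intensity per m/z, NA if below the noise threshold
pick_peaks <- function(m, threshold = 30) {
  height <- apply(m, 1, max)
  height[height < threshold] <- NA
  setNames(height, paste0("mz", mz))
}
# Rows = samples, columns = features
peaktable <- do.call(rbind, lapply(samples, pick_peaks))
round(peaktable, 1)
```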

However, before extracting the peaks, some corrections should be performed, such as for mass shifts and retention time shifts. The basic idea behind retention time correction is to use high-quality grouped peaks to build a new retention time scale. You can choose obiwarp (based on similarity; recommended, see this [paper](http://pubs.acs.org/doi/abs/10.1021/ac0605344)) or a loess regression method to obtain corrected retention times for all of the samples. Remember that the original retention times will be changed, and you might need to cross-correct them for mass data correction.
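
A minimal sketch of the loess idea on simulated retention times: well-behaved peak groups give the deviation of one sample's retention times from a reference, a loess curve models that deviation as a smooth function of retention time, and the fitted deviation is subtracted from the sample's retention times. The drift shape and `span = 0.5` are assumptions, and this is not the **xcms** implementation of retention time correction.

```{r toy-loess-rtcor}
set.seed(7)
# Reference RTs (min) of high-quality peak groups found in all samples
ref_rt <- sort(runif(40, 40, 70))
# Simulated smooth, nonlinear drift of one sample plus small jitter
drift <- function(t) 0.3 * sin(t / 5) + 0.01 * (t - 40)
obs_rt <- ref_rt + drift(ref_rt) + rnorm(40, sd = 0.01)
# Model the deviation from the reference as a smooth function of observed RT
fit <- loess(dev ~ obs, data = data.frame(obs = obs_rt, dev = obs_rt - ref_rt),
             span = 0.5)
# Subtract the fitted deviation to get corrected RTs
corrected_rt <- obs_rt - predict(fit, newdata = data.frame(obs = obs_rt))
c(before = mean(abs(obs_rt - ref_rt)), after = mean(abs(corrected_rt - ref_rt)))
```

In practice the curve would be fitted on the well-behaved peak groups only and then applied to every retention time of that sample.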
7 changes: 7 additions & 0 deletions 04-peakselection.Rmd
@@ -0,0 +1,7 @@
# Peak selection

Once we have corrected peaks across samples, the next step is to find the differences between two groups. For comparisons among more than two groups, you can perform ANOVA or a Kruskal-Wallis test instead. The basic idea behind the statistical analysis is to find meaningful differences between groups and extract the corresponding ions or peak groups.
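
A sketch of per-peak testing with more than two groups, using base R on a simulated intensity matrix (rows are samples, columns are peaks, three groups of ten; all numbers are made up). Only `peak1` carries a real difference, so it should stand out in both tests. `oneway.test()` with `var.equal = TRUE` is the classical one-way ANOVA.

```{r toy-multigroup}
set.seed(123)
grp <- factor(rep(c("A", "B", "C"), each = 10))
# 30 samples x 50 peaks of simulated log-scale intensities
x <- matrix(rnorm(30 * 50, mean = 10), nrow = 30,
            dimnames = list(NULL, paste0("peak", 1:50)))
# Give peak1 a real difference in group C
x[grp == "C", "peak1"] <- x[grp == "C", "peak1"] + 5
# One test per peak (column): Kruskal-Wallis and classical one-way ANOVA
p_kw  <- apply(x, 2, function(y) kruskal.test(y ~ grp)$p.value)
p_aov <- apply(x, 2, function(y) oneway.test(y ~ grp, var.equal = TRUE)$p.value)
head(sort(p_kw), 3)
```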

How do we find the differences? In most metabolomics software, this is done by running a t-test and reporting the p-value and fold change. If you only compare two groups on one peak, that is fine. However, if you compare two groups on thousands of peaks, any statistics textbook will tell you to watch the multiple comparisons and control the false discovery rate (FDR). For one comparison at a significance level of 0.05, there is a 5% chance of a false positive result. For two independent comparisons, the chance of at least one false positive is $1-0.95^2 = 0.0975$. For 10 comparisons, it is $1-0.95^{10} = 0.4012631$. For 100 comparisons, it is $1-0.95^{100} = 0.9940795$. You would almost certainly report some false positives.
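
The numbers above can be reproduced in R. Assuming the tests are independent and all null hypotheses are true, the probability of at least one false positive among $m$ tests at $\alpha = 0.05$ is $1-(1-\alpha)^m$. The simulation part checks the $m = 100$ case with t-tests on pure noise; the sample sizes and number of repetitions are arbitrary choices.

```{r fwer-arithmetic}
alpha <- 0.05
m <- c(1, 2, 10, 100)
# P(at least one false positive) among m independent tests, all nulls true
fwer <- 1 - (1 - alpha)^m
setNames(round(fwer, 7), paste0("m = ", m))
# Simulation check for m = 100: t-tests on pure noise, repeated 100 times
set.seed(2016)
any_fp <- replicate(100, {
  p <- replicate(100, t.test(rnorm(5), rnorm(5))$p.value)
  any(p < alpha)
})
mean(any_fp)  # proportion of runs with at least one p < 0.05
```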

FDR control is routinely discussed in omics studies. I suggest using q-values to control the FDR. If you select all peaks with q-values below 0.05, you should expect at most about 5% of the selected peaks to be false positives.
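
The **qvalue** package (used in the demo chapter) implements Storey's q-values. As a dependency-free sketch of the same idea, the chunk below swaps in the Benjamini-Hochberg procedure from base R's `p.adjust()`, which also controls the FDR but is a different and usually more conservative estimator. The p-values are simulated: 900 null peaks and 100 truly different ones.

```{r bh-fdr}
set.seed(99)
# Simulated p-values: 900 null peaks (uniform) and 100 true differences
p <- c(runif(900), rbeta(100, 0.5, 50))
truth <- rep(c(FALSE, TRUE), c(900, 100))
# Benjamini-Hochberg adjusted p-values (base R)
p_bh <- p.adjust(p, method = "BH")
selected <- p_bh < 0.05
c(raw_hits = sum(p < 0.05), bh_hits = sum(selected),
  false_in_bh = sum(selected & !truth))
# Observed false discovery proportion among the selected peaks
sum(selected & !truth) / max(sum(selected), 1)
```

Selecting by raw p-values picks up roughly 45 null peaks by chance, while the adjusted values keep the proportion of false selections small.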
9 changes: 9 additions & 0 deletions 05-omics.Rmd
@@ -0,0 +1,9 @@
# Omics analysis

Once you have the filtered ions, the next step is to annotate them. These annotations are essential for the downstream omics studies.

With the annotations in hand, omics analysis can be performed: upload the data obtained from **xcms** to other tools or databases.

An up-to-date list of metabolomics databases is available [here](http://metabolomicssociety.org/resources/metabolomics-databases).

Currently, it is hard to link different omics databases, such as those for genes, proteins and metabolites, to get a complete picture of a given biological process. However, you might look up a few selected metabolites across those databases and find something interesting.
6 changes: 6 additions & 0 deletions 06-dataanalysis.Rmd
@@ -0,0 +1,6 @@
# Common analysis methods for metabolomics

## PCA
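
As a sketch of how PCA is commonly run on a peak table, base R's `prcomp()` is applied below to a simulated sample-by-peak matrix with two groups (all values are made up; 30 of the 100 peaks are five times higher in the treated group). Log-transforming and then scaling each peak is a common preprocessing choice, not the only one.

```{r toy-pca}
set.seed(10)
grp <- factor(rep(c("control", "treated"), each = 8))
# 16 samples x 100 peaks of positive, log-normally distributed intensities
x <- matrix(rlnorm(16 * 100, meanlog = 8, sdlog = 0.2), nrow = 16)
# Make 30 peaks 5-fold higher in the treated group
x[grp == "treated", 1:30] <- x[grp == "treated", 1:30] * 5
# Log-transform, then center and scale each peak before PCA
pca <- prcomp(log(x), center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:3]
# Scores plot: samples coloured by group
plot(pca$x[, 1], pca$x[, 2], col = as.integer(grp), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(grp), col = 1:2, pch = 19)
```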

## PLS-DA

160 changes: 160 additions & 0 deletions 07-demo.Rmd
@@ -0,0 +1,160 @@
# Demo

**xcms** does not support the raw file formats of every mass spectrometry manufacturer. You need to convert your raw data into an open [data format](https://en.wikipedia.org/wiki/Mass_spectrometry_data_format) such as mzData, mzXML or CDF. A tool for this is **MSConvert** from [**ProteoWizard**](http://proteowizard.sourceforge.net/).

Here is a demo:

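```{r demo1,message=FALSE,warning=FALSE}
# install the packages for the data analysis and demo (run once)
# older Bioconductor releases:
# source("https://bioconductor.org/biocLite.R")
# biocLite(c("multtest","faahKO","xcms","qvalue","CAMERA"))
# current Bioconductor releases:
# install.packages("BiocManager")
# BiocManager::install(c("multtest","faahKO","xcms","qvalue","CAMERA"))
# load the functions and dataset for the demo
library(multtest)
library(xcms)
library(faahKO)
# get the path of the demo data in the faahKO package
cdfpath <- system.file("cdf", package = "faahKO")
# show the names of the demo data files
list.files(cdfpath, recursive = TRUE)
```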

Here is a demo for *xcmsSet*:

```{r demo3,warning=FALSE}
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
xset <- xcmsSet(cdffiles)
xset
```

## Find the peaks

The first step in processing MS data is to find the peaks against the background noise. In **xcms**, all of this is handled by the *xcmsSet* function.

For any function in **xcms** or **R**, you can view its documentation by typing `?` before the function name. Alternatively, type the name of the function in the RStudio console and press F1 for help.

```{r demo2,eval=F}
?xcmsSet
```

In the documentation of *xcmsSet*, you can see that we can set the sample classes, profmethod, profparam, polarity, etc. In the online version of XCMS, these settings are shown in dedicated windows; in a local analysis environment, you set these parameters yourself. However, I think the default settings are sufficient for most analyses, because the related information should have been recorded in your raw data and **xcms** can find it. All you need to do is point *xcmsSet* to your data files.

If your data contain several groups, such as a control group and a treated group, just put each group in a separate subfolder of the data folder, and *xcmsSet* will read them as separate groups.
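
The subfolder convention can be illustrated without **xcms**: the sketch below derives a class label for each file from the name of its parent folder, mirroring the `WT`/`KO` groups of the demo data. The file paths are hypothetical, and this only illustrates the idea; *xcmsSet* does its own parsing of the folder structure.

```{r toy-sclass}
# Hypothetical file paths laid out as data/<group>/<sample>.CDF
files <- c("data/KO/ko15.CDF", "data/KO/ko16.CDF",
           "data/WT/wt15.CDF", "data/WT/wt16.CDF")
# The parent folder name becomes the sample class
sclass <- factor(basename(dirname(files)))
data.frame(file = basename(files), class = sclass)
table(sclass)
```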

The output is an object of class *xcmsSet*. Type its name to see a summary. In this case, *xcmsSet* found 4721 peaks in the 12 samples, with a time range of 41.8-69.1 min and a mass range of 200.1-599.3338 m/z.

Another useful function is `group`, which matches peaks of the same analyte across samples and adds this information to the `xcmsSet` object.

```{r demo4}
xset <- group(xset)
xset
```

You can now see 403 peak groups in the demo data, which means 403 analytes were found among the 4721 peaks.

## Data correction

Data correction may be needed for many reasons, such as an unstable instrument or column contamination. In **xcms**, the most important correction is retention time correction.

Remember that the original retention times will be changed, so save the result in a new object:

```{r demo5}
xset2 <- retcor(xset, method = "obiwarp")
xset2
# group the peaks again after the retention time correction
xset2 <- group(xset2)
xset2
```

You will see one more peak group after the correction. After the retention time correction, we also need to fill in the missing peaks in the peak groups, which is done by *fillPeaks*:

```{r demo6}
xset3 <- fillPeaks(xset2)
xset3
```

You will see that more peaks are found.

## Statistical analysis

Now that we have peaks across samples, the next step is to find the differences between the two groups. *diffreport* gives the p-values of a t-test for this pairwise comparison:

```{r demo7}
reporttab <- diffreport(xset3, "WT", "KO", "example")
reporttab[1:3,]
```

Now you have the ions that vary strongly between groups; these are the ions to focus on. In an ideal case, this is the endpoint of your study, and the remaining work is to report your findings.
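
To see what such a report contains, the sketch below computes a fold change and a Welch t-test p-value for each feature of a simulated intensity matrix using only base R. It illustrates the quantities only and is not meant to reproduce *diffreport*'s exact calculations; the group names reuse `WT` and `KO` from the demo, and the data are made up.

```{r toy-diffreport}
set.seed(5)
cls <- factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO"))
# 12 samples x 200 features of positive, log-normally distributed intensities
intens <- matrix(rlnorm(12 * 200, meanlog = 10, sdlog = 0.2), nrow = 12)
# Make features 1-5 four times higher in KO
intens[cls == "KO", 1:5] <- intens[cls == "KO", 1:5] * 4
# Per-feature fold change (mean KO / mean WT) and Welch t-test p-value
fc <- colMeans(intens[cls == "KO", ]) / colMeans(intens[cls == "WT", ])
pval <- apply(intens, 2, function(y) t.test(y[cls == "KO"], y[cls == "WT"])$p.value)
res <- data.frame(feature = seq_len(ncol(intens)), fold = fc, pvalue = pval)
head(res[order(res$pvalue), ], 5)
```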

However, we still need q-values to control the FDR. To get them, pass the p-values to the `qvalue` function from the **qvalue** package.

```{r demo8}
library(qvalue)
# extract the p-values to calculate q-values
qobj <- qvalue(p = reporttab$pvalue)
# add the q-values to reporttab
reporttab$qvalue <- qobj$qvalues
reporttab[1:3,]
```

For more information about q-values, see [this overview](https://en.wikipedia.org/wiki/False_discovery_rate#q-value).

After the FDR control, the following steps depend on your study.

## Annotation

I suggest the **CAMERA** package for this task. It needs an object of class *xcmsSet* as input, for example *xset3* (remember to run *fillPeaks* first to complete the peak groups).

```{r demo9}
library(CAMERA)
# Create an xsAnnotate object
xsa <- xsAnnotate(xset3)
# Group after RT value of the xcms grouped peak
xsaF <- groupFWHM(xsa, perfwhm=0.6)
# Verify grouping
xsaC <- groupCorr(xsaF)
# Annotate isotopes, could be done before groupCorr
xsaFI <- findIsotopes(xsaC)
# Annotate adducts
xsaFA <- findAdducts(xsaFI, polarity="positive")
# See the results
getPeaklist(xsaFA)[1:3,]
# Get final peaktable and store on harddrive
# write.csv(getPeaklist(xsaFA),file="data/result_CAMERA.csv")
```

The steps after creating the *xsAnnotate* object are largely optional, and you may not need the isotope or adduct annotations. You can also use *annotateDiffreport* to get results in the same format as *diffreport* in **xcms**.
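
Adduct annotation relies on fixed mass differences between ions of the same compound that co-elute. As an illustration only (not CAMERA's algorithm), the sketch below uses the commonly tabulated shifts for singly charged [M+H]+ (1.007276) and [M+Na]+ (22.989218) ions to compute the expected m/z values for a hypothetical neutral mass, and then checks whether two hypothetical co-eluting peaks differ by the Na/H shift within an assumed 5 ppm tolerance.

```{r toy-adducts}
# Commonly used mass shifts for singly charged positive-mode adducts
shift <- c(H = 1.007276, Na = 22.989218)
M <- 300.12                         # hypothetical neutral monoisotopic mass
mz_adducts <- M + shift             # expected [M+H]+ and [M+Na]+ m/z
mz_adducts
# Two hypothetical co-eluting peaks taken from a peak table
peaks <- data.frame(mz = c(301.1273, 323.1092), rt = c(12.31, 12.32))
# Does their m/z difference match [M+Na]+ - [M+H]+ within 5 ppm?
obs_diff <- diff(peaks$mz)
exp_diff <- unname(shift["Na"] - shift["H"])
ppm_err  <- abs(obs_diff - exp_diff) / peaks$mz[2] * 1e6
ppm_err < 5
```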

```{r demo10}
# make a diffreport with the CAMERA annotation and keep features with fold change above 3
dreport <- annotateDiffreport(xset3, fc_th = 3)
# extract the p-values to calculate q-values
qobj <- qvalue(p = dreport$pvalue)
# add the q-values to dreport
dreport$qvalue <- qobj$qvalues
# see the results
dreport[1:3,]
# save to disk
# write.csv(dreport, file = 'data/diffreport.csv')
```

## Omics analysis

With the annotations in hand, omics analysis can be performed. In **xcms**, the default database is **METLIN**, and you can get links to candidate compounds directly when you generate the difference report.

```{r demo11}
# make a diffreport with the CAMERA annotation, keep features with fold change above 3, and add METLIN links
dreport <- annotateDiffreport(xset3, fc_th = 3, metlin = TRUE)
# extract the p-values to calculate q-values
qobj <- qvalue(p = dreport$pvalue)
# add the q-values to dreport
dreport$qvalue <- qobj$qvalues
# see the results
dreport[1:3,]
# save to disk
# write.csv(dreport, file = 'data/diffreport.csv')
```

This is the offline metabolomics data processing workflow. The details will differ for each study, and F1 is always your best friend.

Enjoy yourself in data mining!
5 changes: 5 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,5 @@
Package: placeholder
Title: Does not matter.
Version: 0.0.1
Imports: bookdown
Remotes: rstudio/bookdown
117 changes: 117 additions & 0 deletions LICENSE
@@ -0,0 +1,117 @@
CC0 1.0 Universal

Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer
exclusive Copyright and Related Rights (defined below) upon the creator and
subsequent owner(s) (each and all, an "owner") of an original work of
authorship and/or a database (each, a "Work").

Certain owners wish to permanently relinquish those rights to a Work for the
purpose of contributing to a commons of creative, cultural and scientific
works ("Commons") that the public can reliably and without fear of later
claims of infringement build upon, modify, incorporate in other works, reuse
and redistribute as freely as possible in any form whatsoever and for any
purposes, including without limitation commercial purposes. These owners may
contribute to the Commons to promote the ideal of a free culture and the
further production of creative, cultural and scientific works, or to gain
reputation or greater distribution for their Work in part through the use and
efforts of others.

For these and/or other purposes and motivations, and without any expectation
of additional consideration or compensation, the person associating CC0 with a
Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
and publicly distribute the Work under its terms, with knowledge of his or her
Copyright and Related Rights in the Work and the meaning and intended legal
effect of CC0 on those rights.

1. Copyright and Related Rights. A Work made available under CC0 may be
protected by copyright and related or neighboring rights ("Copyright and
Related Rights"). Copyright and Related Rights include, but are not limited
to, the following:

i. the right to reproduce, adapt, distribute, perform, display, communicate,
and translate a Work;

ii. moral rights retained by the original author(s) and/or performer(s);

iii. publicity and privacy rights pertaining to a person's image or likeness
depicted in a Work;

iv. rights protecting against unfair competition in regards to a Work,
subject to the limitations in paragraph 4(a), below;

v. rights protecting the extraction, dissemination, use and reuse of data in
a Work;

vi. database rights (such as those arising under Directive 96/9/EC of the
European Parliament and of the Council of 11 March 1996 on the legal
protection of databases, and under any national implementation thereof,
including any amended or successor version of such directive); and

vii. other similar, equivalent or corresponding rights throughout the world
based on applicable law or treaty, and any national implementations thereof.

2. Waiver. To the greatest extent permitted by, but not in contravention of,
applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
and Related Rights and associated claims and causes of action, whether now
known or unknown (including existing as well as future claims and causes of
action), in the Work (i) in all territories worldwide, (ii) for the maximum
duration provided by applicable law or treaty (including future time
extensions), (iii) in any current or future medium and for any number of
copies, and (iv) for any purpose whatsoever, including without limitation
commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
the Waiver for the benefit of each member of the public at large and to the
detriment of Affirmer's heirs and successors, fully intending that such Waiver
shall not be subject to revocation, rescission, cancellation, termination, or
any other legal or equitable action to disrupt the quiet enjoyment of the Work
by the public as contemplated by Affirmer's express Statement of Purpose.

3. Public License Fallback. Should any part of the Waiver for any reason be
judged legally invalid or ineffective under applicable law, then the Waiver
shall be preserved to the maximum extent permitted taking into account
Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
is so judged Affirmer hereby grants to each affected person a royalty-free,
non transferable, non sublicensable, non exclusive, irrevocable and
unconditional license to exercise Affirmer's Copyright and Related Rights in
the Work (i) in all territories worldwide, (ii) for the maximum duration
provided by applicable law or treaty (including future time extensions), (iii)
in any current or future medium and for any number of copies, and (iv) for any
purpose whatsoever, including without limitation commercial, advertising or
promotional purposes (the "License"). The License shall be deemed effective as
of the date CC0 was applied by Affirmer to the Work. Should any part of the
License for any reason be judged legally invalid or ineffective under
applicable law, such partial invalidity or ineffectiveness shall not
invalidate the remainder of the License, and in such case Affirmer hereby
affirms that he or she will not (i) exercise any of his or her remaining
Copyright and Related Rights in the Work or (ii) assert any associated claims
and causes of action with respect to the Work, in either case contrary to
Affirmer's express Statement of Purpose.

4. Limitations and Disclaimers.

a. No trademark or patent rights held by Affirmer are waived, abandoned,
surrendered, licensed or otherwise affected by this document.

b. Affirmer offers the Work as-is and makes no representations or warranties
of any kind concerning the Work, express, implied, statutory or otherwise,
including without limitation warranties of title, merchantability, fitness
for a particular purpose, non infringement, or the absence of latent or
other defects, accuracy, or the present or absence of errors, whether or not
discoverable, all to the greatest extent permissible under applicable law.

c. Affirmer disclaims responsibility for clearing rights of other persons
that may apply to the Work or any use thereof, including without limitation
any person's Copyright and Related Rights in the Work. Further, Affirmer
disclaims responsibility for obtaining any necessary consents, permissions
or other rights required for any use of the Work.

d. Affirmer understands and acknowledges that Creative Commons is not a
party to this document and has no duty or obligation with respect to this
CC0 or use of the Work.

For more information, please see
<http://creativecommons.org/publicdomain/zero/1.0/>

Empty file added _book/applications.html
Empty file.