add new

yufree · Aug 26, 2016 · af5ea1d · af5ea1d
commit af5ea1d

Showing 64 changed files with 5,812 additions and 0 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -0,0 +1,2 @@
+^.*\.Rproj$
+^\.Rproj\.user$
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
diff --git a/01-projectsetup.Rmd b/01-projectsetup.Rmd
@@ -0,0 +1,19 @@
+# Project Setup
+
+I suggest building your data analysis projects in RStudio (click File - New Project - New Directory - Empty Project), then giving the project a name. I also recommend the following tips if you are familiar with the tools involved.
+
+- Use [git](https://git-scm.com/)/[GitHub](https://github.com/) for version control of your code and to sync your project online.
+
+- Do NOT use your own name as the project name, because other people might collaborate with you and someone might check your data when you publish your papers. Each project should cover the work for one paper or one chapter of your thesis.
+
+- Use a **workflow** document (txt or doc) in your project to record all of the steps and code you performed for this project. Treat this document as the digital version of your experiment notebook.
+
+- Use a **data** folder in your project folder for the raw data and the results you get from data analysis.
+
+- Use a **figure** folder in your project folder for the figures.
+
+- Use a **manuscript** folder in your project folder for the manuscript (you could write the paper in RStudio with the help of the [Rmarkdown](https://github.com/rstudio/rticles) templates).
+
+- The best way to begin a study is to copy the contents of this folder into your new project folder. Remember not to copy **metademo.Rproj** into your folder, because you already have one.
+
+- Just double-click **[yourprojectname].Rproj** to start your project.
diff --git a/02-doe.Rmd b/02-doe.Rmd
@@ -0,0 +1,7 @@
+# Experimental design (DoE)
+
+Before you perform any metabolomic study, a clean and meaningful experimental design is always the best start. You need at least two groups: a treated group and a control group.
+
+If there are other co-factors, a linear model or randomization should be applied to eliminate their influence. You need to record the values of those co-factors for further data analysis. Common co-factors in metabolomic studies are age, gender, location, etc.
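+
+As a minimal sketch of the linear-model idea, the chunk below fits one peak intensity against the group while adjusting for a recorded co-factor. Everything in it is an assumption made for illustration: the data are simulated, and the names `group`, `age`, `intensity` and `fit` are hypothetical.
+
+```{r doe-cofactor-sketch}
+set.seed(1)
+# hypothetical design: 10 control and 10 treated samples
+group <- factor(rep(c("control", "treated"), each = 10))
+# hypothetical co-factor, balanced across the two groups
+age <- rep(seq(20, 65, by = 5), times = 2)
+# simulated intensity of one peak: group effect + age effect + noise
+intensity <- 100 + 20 * (group == "treated") + 0.5 * age + rnorm(20, sd = 2)
+# linear model: the group coefficient is adjusted for age
+fit <- lm(intensity ~ group + age)
+coef(summary(fit))
+```
+
+The `grouptreated` row of the coefficient table is the estimated difference between groups after the age effect has been accounted for.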
+
+If you need data correction, some background or calibration samples are required. However, in certain designs the control samples can also be used for data correction.
diff --git a/03-rawdata.Rmd b/03-rawdata.Rmd
@@ -0,0 +1,19 @@
+# Raw data pretreatment
+
+Collecting your raw data from an instrument such as LC-MS or GC-MS is the beginning of data analysis.
+
+However, raw data are hard to analyze directly. GC/LC-MS data are usually shown as a matrix with columns standing for retention time and rows standing for mass. There is so much noise that such data cannot be processed efficiently.
+
+```{r singledata, fig.show='hold', fig.cap='Demo of GC/LC-MS data',echo=FALSE}
+knitr::include_graphics('images/singledata.png')
+```
+
+Converting the mass-retention time matrix into a vector of selected MS peaks at certain retention times is the basic idea of raw data pretreatment.
+
+With many groups of samples, after raw data pretreatment you will get another data matrix, with columns standing for ions at certain retention times and rows standing for samples.
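+
+To make the matrix-to-peaks idea concrete, here is a toy sketch in base R. It is not how **xcms** detects peaks: it simply keeps the cells above an assumed noise threshold. The matrix `ms`, the threshold of 50 and the two injected peaks are all hypothetical.
+
+```{r peak-vector-sketch}
+set.seed(42)
+# toy intensity matrix: rows are m/z bins, columns are retention times
+mz <- c(100, 150, 200, 250)
+rt <- 1:6
+ms <- matrix(runif(length(mz) * length(rt), 0, 10), nrow = length(mz),
+             dimnames = list(as.character(mz), as.character(rt)))
+# add two hypothetical peaks well above the noise
+ms["150", "3"] <- 500
+ms["250", "5"] <- 300
+# keep only the cells above the assumed noise threshold
+idx <- which(ms > 50, arr.ind = TRUE)
+peaks <- data.frame(mz = mz[idx[, "row"]], rt = rt[idx[, "col"]],
+                    intensity = ms[idx])
+peaks
+```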
+
+```{r multidata, fig.show='hold', fig.cap='Demo of many GC/LC-MS data',echo=FALSE}
+knitr::include_graphics('images/multidata.png')
+```
+
+However, before you get the peaks, some corrections should be performed, such as for mass shift and retention time shift. The basic idea behind retention time correction is to use the high-quality grouped peaks to build a new retention time. You might choose obiwarp (based on similarity, recommended, see this [paper](http://pubs.acs.org/doi/abs/10.1021/ac0605344)) or the loess regression method to get the corrected retention times for all of the samples. Remember that the original retention times might be changed, and you might also need to cross-correct for the mass data correction.
diff --git a/04-peakselection.Rmd b/04-peakselection.Rmd
@@ -0,0 +1,7 @@
+# Peak selection
+
+After we get corrected peaks across samples, the next step is finding the differences between two groups. For comparisons among more than two groups, you could perform ANOVA or the Kruskal-Wallis test instead. The basic idea behind the statistical analysis is to find the meaningful differences between groups and extract the corresponding ions or peak groups.
+
+So how do we find the differences? In most metabolomics software, this task is completed by a t-test, which reports p-values and fold changes. If you only compare two groups on one peak, that's OK. However, if you compare two groups on thousands of peaks, any statistics textbook would tell you to pay attention to the false discovery rate (FDR). For one comparison at a significance level of 0.05, there is a 5% chance of a false positive result. For two independent comparisons, the chance of at least one false positive would be $1-0.95^2 = 0.0975$. For 10 comparisons, it would be $1-0.95^{10} = 0.4012631$. For 100 comparisons, it would be $1-0.95^{100} = 0.9940795$. You would almost certainly make mistakes in your results.
+
+In omics studies, FDR control is always mentioned. I suggest using q-values to control the FDR. If we select the peaks with q-values less than 0.05, we should expect fewer than 5% of the selected peaks to be false positives.
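+
+The arithmetic above can be reproduced in a few lines of base R. This is a minimal sketch that assumes the comparisons are independent; `alpha`, `n` and `fwer` are names chosen here for illustration.
+
+```{r fwer-sketch}
+# significance level used for each single comparison
+alpha <- 0.05
+# number of comparisons
+n <- c(1, 2, 10, 100)
+# chance of at least one false positive among n independent comparisons
+fwer <- data.frame(comparisons = n, chance = 1 - (1 - alpha)^n)
+fwer
+```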
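+
+As a rough illustration of why FDR control matters, the sketch below compares a naive p-value cutoff with an adjusted one on simulated p-values. Two assumptions to note: the p-values are hypothetical (900 peaks with no real difference, 100 with one), and the Benjamini-Hochberg adjustment from base R's `p.adjust` is used as a stand-in. BH-adjusted p-values are closely related to, but not the same as, the q-values from the **qvalue** package.
+
+```{r fdr-sketch}
+set.seed(123)
+# hypothetical p-values: 900 null peaks (uniform) and 100 changed peaks (small)
+p <- c(runif(900), rbeta(100, 1, 200))
+# naive selection: every peak with p < 0.05
+sum(p < 0.05)
+# Benjamini-Hochberg adjustment, then the same cutoff
+padj <- p.adjust(p, method = "BH")
+sum(padj < 0.05)
+```
+
+The adjusted selection is never larger than the naive one, because an adjusted p-value is never smaller than the raw p-value.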
diff --git a/05-omics.Rmd b/05-omics.Rmd
@@ -0,0 +1,9 @@
+# Omics analysis
+
+When you get the filtered ions, the next step is annotating them. Such annotations are helpful for omics studies.
+
+Once we have the annotations, omics analysis can be performed. Upload the data obtained from **xcms** to other tools or databases.
+
+You will find an updated database list [here](http://metabolomicssociety.org/resources/metabolomics-databases).
+
+Right now, it is hard to connect different omics databases, such as those for genes, proteins and metabolites, to get the whole scope of a certain biological process. However, you might select a few metabolites across those databases and find something interesting.
diff --git a/06-dataanalysis.Rmd b/06-dataanalysis.Rmd
@@ -0,0 +1,6 @@
+# Common analysis methods for metabolomics
+
+## PCA
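+
+A minimal sketch of PCA on a peak table, using `prcomp` from base R. The peak table here is simulated and purely hypothetical (12 samples, 50 peaks, with the first 10 peaks shifted in one group); with real data you would use your own samples-by-peaks matrix instead of `x`.
+
+```{r pca-sketch}
+set.seed(2016)
+# hypothetical peak table: 12 samples (rows) x 50 peaks (columns)
+x <- matrix(rnorm(12 * 50), nrow = 12)
+grp <- rep(c("WT", "KO"), each = 6)
+# shift the first 10 peaks in the KO samples to mimic a group difference
+x[grp == "KO", 1:10] <- x[grp == "KO", 1:10] + 3
+# centre and scale each peak, then compute the principal components
+pca <- prcomp(x, center = TRUE, scale. = TRUE)
+# proportion of variance explained by the first two components
+summary(pca)$importance[, 1:2]
+# score plot coloured by group
+plot(pca$x[, 1:2], col = ifelse(grp == "KO", "red", "blue"), pch = 19)
+```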
+
+## PLS-DA
+
diff --git a/07-demo.Rmd b/07-demo.Rmd
@@ -0,0 +1,160 @@
+# Demo
+
+**xcms** does not support the raw files of every mass spectrometry manufacturer. You need to convert your raw data into an open [data format](https://en.wikipedia.org/wiki/Mass_spectrometry_data_format) such as mzData, mzXML or CDF. The tool for this is **MSConvert** from [**ProteoWizard**](http://proteowizard.sourceforge.net/).
+
+Here is a demo:
+
+```{r demo1,message=F,warning=FALSE}
+# install the packages for data analysis (newer Bioconductor releases use BiocManager::install() instead of biocLite)
+# source("https://bioconductor.org/biocLite.R")
+# biocLite(c("multtest","faahKO","xcms","qvalue","CAMERA"))
+# load the functions and dataset for demo
+
+library(multtest)
+library(xcms)
+library(faahKO)
+# get the demo data in faahKO packages
+cdfpath <- system.file("cdf",package = "faahKO")
+# show the name of demo data
+list.files(cdfpath, recursive = TRUE)
+```
+
+Here is a demo for *xcmsSet*:
+
+```{r demo3,warning=F}
+cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
+xset <- xcmsSet(cdffiles)
+xset
+```
+
+## Find the peaks
+
+The first step in processing MS data is to find the peaks against the noise. In **xcms**, all of the related work is handled by the *xcmsSet* function.
+
+For any function in **xcms** or **R**, you can get its documentation by typing `?` before the function name. Another geeky way is to type the name of the function in the RStudio console and press F1 for help.
+
+```{r demo2,eval=F}
+?xcmsSet
+```
+
+In the documentation of *xcmsSet*, we can set the sample classes, profmethod, profparam, polarity, etc. In the online version, such configurations are shown in certain windows. In a local analysis environment, you set such parameters yourself. However, I think the default configuration will satisfy most analyses, because the related information should have been recorded in your raw data and **xcms** can find it. All you need to do is give *xcmsSet* the data directory.
+
+If your data have several groups, such as control and treated groups, just put them in separate subfolders of the data folder and *xcmsSet* will read them as separate groups.
+
+The output is an object of class *xcmsSet*. You can see a summary by typing its name. In this case, *xcmsSet* found 4721 peaks with time range 41.8-69.1 min and mass range 200.1-599.3338 m/z in the 12 samples.
+
+Another function which might be useful is `group`. This function adds information to `xcmsSet` objects about which peaks across samples belong to the same analyte.
+
+```{r demo4}
+xset <- group(xset)
+xset
+```
+
+Now you see there are 403 groups in the demo data, which means 403 analytes are found across the 4721 peaks.
+
+## Data correction
+
+The need for data correction can come from many sources, such as an unstable instrument or contamination of the column. In **xcms**, the most important correction is retention time correction.
+
+Remember that the original retention times will be changed, so save the corrected result in a new object:
+
+```{r demo5}
+xset2 <- retcor(xset, method = "obiwarp")
+xset2
+# you need to group the peaks again for this corrected data
+xset2 <- group(xset2)
+xset2
+```
+
+You see one more peak group after the correction. After the retention time correction, we also need to correct the peak groups by filling in the missing peaks. The function for this is called *fillPeaks*:
+
+```{r demo6}
+xset3 <- fillPeaks(xset2)
+xset3
+```
+
+You see that more peaks are found.
+
+## Statistical analysis
+
+Now that we have peaks across samples, the next step is finding the differences between two groups. You will find the p-values of t-tests for the pairwise comparison:
+
+```{r demo7}
+reporttab <- diffreport(xset3, "WT", "KO", "example")
+reporttab[1:3,]
+```
+
+Now you have the ions that vary a lot between groups. Such ions are the ones we should look at closely. In an ideal case, this is the endpoint of your study and the remaining work is writing a report of your findings.
+
+However, we need q-values to control the FDR. To get the q-values, you need to pass the p-values to the `qvalue` function from the **qvalue** package.
+
+```{r demo8}
+library(qvalue)
+# extract the p-values to calculate q-values
+# (stored as qobj so the result does not shadow the qvalue() function)
+qobj <- qvalue(p = reporttab$pvalue)
+# add the q-values to reporttab
+reporttab$qvalue <- qobj$qvalues
+reporttab[1:3,]
+```
+
+For further information about q-value, check [here](https://en.wikipedia.org/wiki/False_discovery_rate#q-value).
+
+After the FDR control, the following steps depend on your study.
+
+## Annotation
+
+I suggest the **CAMERA** package to handle this task. You need to prepare an object of class *xcmsSet*, for example *xset3* (remember to use *fillPeaks* first to get the ion groups).
+
+```{r demo9}
+library(CAMERA)
+# Create an xsAnnotate object
+xsa <- xsAnnotate(xset3)
+# Group after RT value of the xcms grouped peak
+xsaF <- groupFWHM(xsa, perfwhm=0.6)
+# Verify grouping
+xsaC <- groupCorr(xsaF)
+# Annotate isotopes, could be done before groupCorr
+xsaFI <- findIsotopes(xsaC)
+# Annotate adducts
+xsaFA <- findAdducts(xsaFI, polarity="positive")
+# See the results
+getPeaklist(xsaFA)[1:3,]
+# Get final peaktable and store on harddrive
+# write.csv(getPeaklist(xsaFA),file="data/result_CAMERA.csv")
+```
+
+The steps after *xsAnnotate* can be run separately, and you may not need the isotopes or adducts. You could also use *annotateDiffreport* to show the results in the same way as *diffreport* in **xcms**.
+
+```{r demo10}
+# make a diffreport with CAMERA results and keep fold changes higher than 3
+dreport <- annotateDiffreport(xset3, fc_th = 3)
+# extract the p-values to calculate q-values
+qobj <- qvalue(p = dreport$pvalue)
+# add the q-values to dreport
+dreport$qvalue <- qobj$qvalues
+# See the results
+dreport[1:3,]
+# save on harddrive
+# write.csv(dreport,file='data/diffreport.csv')
+```
+
+## Omics analysis
+
+Once we have the annotations, omics analysis can be performed. In **xcms**, the default database is **metlin**. You can directly get links to certain compounds when you generate the difference report.
+
+```{r demo11}
+# make a diffreport with CAMERA results, keep fold changes higher than 3, and add the metlin links
+dreport <- annotateDiffreport(xset3, fc_th = 3, metlin = TRUE)
+# extract the p-values to calculate q-values
+qobj <- qvalue(p = dreport$pvalue)
+# add the q-values to dreport
+dreport$qvalue <- qobj$qvalues
+# See the results
+dreport[1:3,]
+# save on harddrive
+# write.csv(dreport,file='data/diffreport.csv')
+```
+
+This is the offline metabolomics data processing workflow. For each study the details will differ, and F1 is always your best friend.
+
+Enjoy yourself in data mining!
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -0,0 +1,5 @@
+Package: placeholder
+Title: Does not matter.
+Version: 0.0.1
+Imports: bookdown
+Remotes: rstudio/bookdown
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,117 @@
+CC0 1.0 Universal
+
+Statement of Purpose
+
+The laws of most jurisdictions throughout the world automatically confer
+exclusive Copyright and Related Rights (defined below) upon the creator and
+subsequent owner(s) (each and all, an "owner") of an original work of
+authorship and/or a database (each, a "Work").
+
+Certain owners wish to permanently relinquish those rights to a Work for the
+purpose of contributing to a commons of creative, cultural and scientific
+works ("Commons") that the public can reliably and without fear of later
+claims of infringement build upon, modify, incorporate in other works, reuse
+and redistribute as freely as possible in any form whatsoever and for any
+purposes, including without limitation commercial purposes. These owners may
+contribute to the Commons to promote the ideal of a free culture and the
+further production of creative, cultural and scientific works, or to gain
+reputation or greater distribution for their Work in part through the use and
+efforts of others.
+
+For these and/or other purposes and motivations, and without any expectation
+of additional consideration or compensation, the person associating CC0 with a
+Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
+and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
+and publicly distribute the Work under its terms, with knowledge of his or her
+Copyright and Related Rights in the Work and the meaning and intended legal
+effect of CC0 on those rights.
+
+1. Copyright and Related Rights. A Work made available under CC0 may be
+protected by copyright and related or neighboring rights ("Copyright and
+Related Rights"). Copyright and Related Rights include, but are not limited
+to, the following:
+
+  i. the right to reproduce, adapt, distribute, perform, display, communicate,
+  and translate a Work;
+
+  ii. moral rights retained by the original author(s) and/or performer(s);
+
+  iii. publicity and privacy rights pertaining to a person's image or likeness
+  depicted in a Work;
+
+  iv. rights protecting against unfair competition in regards to a Work,
+  subject to the limitations in paragraph 4(a), below;
+
+  v. rights protecting the extraction, dissemination, use and reuse of data in
+  a Work;
+
+  vi. database rights (such as those arising under Directive 96/9/EC of the
+  European Parliament and of the Council of 11 March 1996 on the legal
+  protection of databases, and under any national implementation thereof,
+  including any amended or successor version of such directive); and
+
+  vii. other similar, equivalent or corresponding rights throughout the world
+  based on applicable law or treaty, and any national implementations thereof.
+
+2. Waiver. To the greatest extent permitted by, but not in contravention of,
+applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
+unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
+and Related Rights and associated claims and causes of action, whether now
+known or unknown (including existing as well as future claims and causes of
+action), in the Work (i) in all territories worldwide, (ii) for the maximum
+duration provided by applicable law or treaty (including future time
+extensions), (iii) in any current or future medium and for any number of
+copies, and (iv) for any purpose whatsoever, including without limitation
+commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
+the Waiver for the benefit of each member of the public at large and to the
+detriment of Affirmer's heirs and successors, fully intending that such Waiver
+shall not be subject to revocation, rescission, cancellation, termination, or
+any other legal or equitable action to disrupt the quiet enjoyment of the Work
+by the public as contemplated by Affirmer's express Statement of Purpose.
+
+3. Public License Fallback. Should any part of the Waiver for any reason be
+judged legally invalid or ineffective under applicable law, then the Waiver
+shall be preserved to the maximum extent permitted taking into account
+Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
+is so judged Affirmer hereby grants to each affected person a royalty-free,
+non transferable, non sublicensable, non exclusive, irrevocable and
+unconditional license to exercise Affirmer's Copyright and Related Rights in
+the Work (i) in all territories worldwide, (ii) for the maximum duration
+provided by applicable law or treaty (including future time extensions), (iii)
+in any current or future medium and for any number of copies, and (iv) for any
+purpose whatsoever, including without limitation commercial, advertising or
+promotional purposes (the "License"). The License shall be deemed effective as
+of the date CC0 was applied by Affirmer to the Work. Should any part of the
+License for any reason be judged legally invalid or ineffective under
+applicable law, such partial invalidity or ineffectiveness shall not
+invalidate the remainder of the License, and in such case Affirmer hereby
+affirms that he or she will not (i) exercise any of his or her remaining
+Copyright and Related Rights in the Work or (ii) assert any associated claims
+and causes of action with respect to the Work, in either case contrary to
+Affirmer's express Statement of Purpose.
+
+4. Limitations and Disclaimers.
+
+  a. No trademark or patent rights held by Affirmer are waived, abandoned,
+  surrendered, licensed or otherwise affected by this document.
+
+  b. Affirmer offers the Work as-is and makes no representations or warranties
+  of any kind concerning the Work, express, implied, statutory or otherwise,
+  including without limitation warranties of title, merchantability, fitness
+  for a particular purpose, non infringement, or the absence of latent or
+  other defects, accuracy, or the present or absence of errors, whether or not
+  discoverable, all to the greatest extent permissible under applicable law.
+
+  c. Affirmer disclaims responsibility for clearing rights of other persons
+  that may apply to the Work or any use thereof, including without limitation
+  any person's Copyright and Related Rights in the Work. Further, Affirmer
+  disclaims responsibility for obtaining any necessary consents, permissions
+  or other rights required for any use of the Work.
+
+  d. Affirmer understands and acknowledges that Creative Commons is not a
+  party to this document and has no duty or obligation with respect to this
+  CC0 or use of the Work.
+
+For more information, please see
+<http://creativecommons.org/publicdomain/zero/1.0/>
+
diff --git a/_book/applications.html b/_book/applications.html
diff --git a/_book/bookdown-demo_files/figure-html/unnamed-chunk-1-1.png b/_book/bookdown-demo_files/figure-html/unnamed-chunk-1-1.png