This repository will host the OpenPlant-funded R/Bioconductor for proteomics material. The funding was awarded to Jan Sklenar (TSL, Norwich), Laurent Gatto (UCam), Marielle Vigouroux (JIC) and Govind Chandra (JIC) to introduce and implement the use of various R for proteomics tools in The Sainsbury Laboratory and, more generally, on the Norwich campus.
The project will run 6 sessions/days over 6 months with the aim to:
- Identify needs and opportunities
- Train individuals
- Carry out dedicated development and integration with existing tools
- Introductory talk: R and Bioconductor for proteomics
During this session, we discussed the concrete needs and opportunities that would be tackled as part of this project, and set a schedule for the sessions until June.
Download and install R and RStudio. If you already have R installed, make sure you have R 3.4.3. To check the version:

```
> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status         Patched
major          3
minor          4.3
year           2017
month          12
day            12
svn rev        73903
language       R
version.string R version 3.4.3 Patched (2017-12-12 r73903)
nickname       Kite-Eating Tree
```
Once the software is installed, open RStudio and install the Bioconductor packages:
```r
source("http://www.bioconductor.org/biocLite.R")
biocLite(c("MSnbase", "msmsTests", "rpx", "pRoloc", "pRolocdata", "msdata"))
```
To test the installation, load MSnbase:

```r
library("MSnbase")
```
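The installation can also be checked in one go. The following is a minimal sketch using base R only; treating 3.4.3 as a minimum version (rather than an exact requirement) is an assumption, and the variable names are illustrative.

```r
# Assumption: R 3.4.3 is treated as the minimum version for this material
required_r <- "3.4.3"
r_ok <- getRversion() >= required_r

# The packages installed with biocLite() above
pkgs <- c("MSnbase", "msmsTests", "rpx", "pRoloc", "pRolocdata", "msdata")

# requireNamespace() returns FALSE (instead of failing) for a missing package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

message("R version OK: ", r_ok)
if (any(!installed)) {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
} else {
  message("All packages are installed.")
}
```

Any package reported as missing can be installed again with `biocLite`.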
- Introduction to R and RStudio and Bioconductor
- R documentation and vignettes
- Variables, vectors and dataframes
- Manipulating data
- Plotting
- Saving and loading data (binary and csv)
- Installing packages
The content of that session is available in the `rfp1` R project directory.
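As a quick reminder of the topics listed for this session, here is a minimal sketch in base R. It is not taken from the `rfp1` material: the protein names and counts are made up for illustration.

```r
# Hypothetical toy data: spectral counts for four proteins in two conditions
proteins <- c("P1", "P2", "P3", "P4")   # character vector
ctrl <- c(10, 0, 25, 7)                 # numeric vectors
treat <- c(12, 5, 3, 7)

# Vectors of equal length can be combined into a data frame
counts <- data.frame(protein = proteins, ctrl = ctrl, treat = treat,
                     stringsAsFactors = FALSE)

# Manipulating data: add a derived column, then order the rows by it
# (1 is added to each count to avoid dividing by zero)
counts$log2fc <- log2((counts$treat + 1) / (counts$ctrl + 1))
counts <- counts[order(counts$log2fc), ]

# Plotting: one bar per protein
barplot(counts$log2fc, names.arg = counts$protein,
        ylab = "log2 fold-change (treat / ctrl)")
```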
Summary from last time:
- Using R and RStudio.
- Data structures: vectors, dataframes and MSnSets.
- Subsetting using `[` and `$`.
- Data input: `read.csv` and `readMSnSet2`.
- Data output: `save` and `load`.
We didn't have time to cover plotting and package installation.
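The subsetting and input/output points from the summary can be sketched in base R as follows. The table is made up, and `write.csv` is used here only to create a file to read back; reading an `MSnSet` with `readMSnSet2` requires MSnbase and is not shown.

```r
# Hypothetical quantitation table (names and values are made up)
quant <- data.frame(protein = c("P1", "P2", "P3"),
                    intensity = c(1500, 230, 87),
                    stringsAsFactors = FALSE)

# Subsetting: `$` extracts a column, `[` selects rows and/or columns
high <- quant[quant$intensity > 100, ]   # rows with intensity above 100
first_two <- quant[1:2, "protein"]       # rows 1 and 2 of the protein column

# Data output and input as csv (written to a temporary file)
csv_file <- tempfile(fileext = ".csv")
write.csv(quant, csv_file, row.names = FALSE)
quant_csv <- read.csv(csv_file, stringsAsFactors = FALSE)

# Data output and input in R's binary format: load() restores the object
# under the name it was saved with
rda_file <- tempfile(fileext = ".rda")
save(quant, file = rda_file)
rm(quant)
load(rda_file)
```

A csv file can be opened in other software, while the binary format preserves the R object exactly as it was.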
Material for session 3 will focus on consolidating our understanding and usage of dedicated R/Bioconductor packages for MS and proteomics.
- For identification, use mzid files, which can be opened with `mzR::openIDfile` or `MSnbase:::readMzIdData`. To visualise data over time, annotate the results (with filename and date) and combine them.
- Raw data, with `readMSData(, mode = "onDisk")`.
- Differential expression of count data with `msmsTests`.
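The "annotate and combine" step for identification results can be sketched in base R. This is only an illustration under stated assumptions: the two small tables stand in for the data frames that `MSnbase:::readMzIdData` would return, their column names, file names and dates are made up, and `annotate_ids` is a hypothetical helper.

```r
# Hypothetical helper: tag an identification table with its source file and date
annotate_ids <- function(ids, filename, date) {
  ids$filename <- filename
  ids$date <- as.Date(date)
  ids
}

# Toy stand-ins for identification tables, one per mzid file
# (column names and values are made up for this example)
run1 <- data.frame(sequence = c("PEPTIDEK", "SAMPLER"),
                   score = c(45.1, 30.2),
                   stringsAsFactors = FALSE)
run2 <- data.frame(sequence = c("PEPTIDEK", "ANOTHERK", "SAMPLER"),
                   score = c(50.3, 22.8, 28.9),
                   stringsAsFactors = FALSE)

# Annotate each table, then stack them into a single data frame
all_ids <- rbind(annotate_ids(run1, "run1.mzid", "2018-01-15"),
                 annotate_ids(run2, "run2.mzid", "2018-02-12"))

# Summaries over time: identifications per file and median score per date
ids_per_file <- table(all_ids$filename)
score_by_date <- aggregate(score ~ date, data = all_ids, FUN = median)

plot(score_by_date$date, score_by_date$score, type = "b",
     xlab = "Acquisition date", ylab = "Median score")
```

Keeping the filename and date as columns means the combined table can be summarised or plotted by either without going back to the individual files.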
## Title: R for Proteomics
This is the final presentation of an OpenPlant project that aimed to facilitate proteomics data analysis using the existing infrastructure in the R environment. The objective, to organise training and explore R for Proteomics, an open source project available as Bioconductor packages, was met through several workshops with the developer. The code generated was applied to current JIC/TSL projects. We set up a basic proteomics data pipeline that is an independent alternative to the existing, mostly commercial, software. The results of this teamwork, our experience of working with the R packages, and future perspectives will be presented and discussed by team members.
The presentation is aimed at those who carry out proteomics experiments and seek a deeper understanding of the data analysis involved, and at those wishing to learn to use the open source proteomics tools available in Bioconductor.
### The team:
- **Laurent Gatto**, RfP developer, Professor of Bioinformatics, Université catholique de Louvain, Belgium
- Jan Sklenar, proteomics and mass spectrometry specialist, The Sainsbury Laboratory, Norwich
- Marielle Vigouroux, Bioinformatician, John Innes Centre, Norwich
- Govind Chandra, Bioinformatician, John Innes Centre, Norwich
### Key words: Proteomics data analysis, R, open source.