Commit 2985585

added content from intro-to-r repo
0 parents  commit 2985585

125 files changed

+107500
-0
lines changed

README.md

+39
@@ -0,0 +1,39 @@
## Introduction to R

| Audience | Computational skills required | Duration |
|:----------|:-------------|:----------|
| Biologists | None | 1.5 or 2-day workshop (~9-13 hours of trainer-led time) |

### Description
This repository has teaching materials for a hands-on **Introduction to R** workshop. The workshop will introduce participants to the basics of R and RStudio. R is a simple programming environment that enables the effective handling of data, while providing excellent graphical support. RStudio is a tool that provides a user-friendly environment for working with R.

These materials are intended to provide both basic R programming knowledge and practice in applying it to make data analysis more efficient.

> These materials are developed for a trainer-led workshop, but are also amenable to self-guided learning.

### Learning Objectives

1. **R syntax**: Understand the different 'parts of speech'.
2. **Data types and structures in R**: Describe the various data types and data structures.
3. **Data inspection and wrangling**: Demonstrate the use of functions and indices to inspect and subset data from various data structures.
4. **Visualizing data**: Demonstrate the use of the ggplot2 package to create plots for easy data visualization.

### Lessons
**[Click here](schedules/1.5-day.md) for links to lessons and the suggested schedule**

### Installation Requirements

Download the most recent versions of R and RStudio for the appropriate OS using the links below:

- [R](https://cran.r-project.org/)
- [RStudio](https://www.rstudio.com/products/rstudio/download/#download)

### Dataset

All the files used for the above lessons are linked within, but can also be [accessed here](https://github.com/hbctraining/Intro-to-R-with-DGE/tree/master/data).

---
*These materials have been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*

* *Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/).
All Data Carpentry instructional material is made available under the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0).*

_config.yml

+2
@@ -0,0 +1,2 @@
theme: jekyll-theme-cayman
title: Introduction to R

assets/css/style.scss

+8
@@ -0,0 +1,8 @@
---
---

@import "{{ site.theme }}";

.page-header { color: #fff; text-align: center; background-image: url("../images/dna-sequence-1600x800.jpg"); }

.main-content h1, .main-content h2, .main-content h3, .main-content h4, .main-content h5, .main-content h6 { margin-top: 2rem; margin-bottom: 1rem; font-weight: normal; color: #000000; }

data/Mov10_full_meta.txt

+9
@@ -0,0 +1,9 @@
sampletype MOVexpr
Mov10_kd_2 MOV10_knockdown low
Mov10_kd_3 MOV10_knockdown low
Mov10_oe_1 MOV10_overexpression high
Mov10_oe_2 MOV10_overexpression high
Mov10_oe_3 MOV10_overexpression high
Irrel_kd_1 control normal
Irrel_kd_2 control normal
Irrel_kd_3 control normal

data/animals.csv

+7
@@ -0,0 +1,7 @@
,speed,color
Elephant,40,Gray
Cheetah,120,Tan
Tortoise,0.1,Green
Hare,48,Grey
Lion,80,Tan
PolarBear,30,White

data/counts.rpkm

+38,829
Large diffs are not rendered by default.

data/mouse_exp_design.csv

+13
@@ -0,0 +1,13 @@
genotype,celltype,replicate
sample1,Wt,typeA,1
sample2,Wt,typeA,2
sample3,Wt,typeA,3
sample4,KO,typeA,1
sample5,KO,typeA,2
sample6,KO,typeA,3
sample7,Wt,typeB,1
sample8,Wt,typeB,2
sample9,Wt,typeB,3
sample10,KO,typeB,1
sample11,KO,typeB,2
sample12,KO,typeB,3

data/normalized_counts.txt

+23,369
Large diffs are not rendered by default.

data/ordered_counts_rpkm.csv

+38,829
Large diffs are not rendered by default.

dataset.zip

1.08 MB
Binary file not shown.

homework/Intro_to_R_hw.md

+107
@@ -0,0 +1,107 @@
# Introduction to R practice

## Creating vectors/factors and dataframes

1. We are performing RNA-Seq on cancer samples being treated with three different types of treatment (A, B, and P). You have 12 samples total, with 4 replicates per treatment. Write the R code you would use to construct your metadata table as described below.
    - Create the vectors/factors for each column (Hint: you can type out each vector/factor, or if you want the process to go faster try exploring the `rep()` function).
    - Put them together into a dataframe called `meta`.
    - Use the `rownames()` function to assign row names to the dataframe (Hint: you can type out the row names as a vector, or if you want the process to go faster try exploring the `paste()` function).

Your finished metadata table should have information for the variables `sex`, `stage`, `treatment`, and `myc` levels:

| | sex | stage | treatment | myc |
|:--:|:--:|:--:|:--:|:--:|
| sample1 | M | I | A | 2343 |
| sample2 | F | II | A | 457 |
| sample3 | M | II | A | 4593 |
| sample4 | F | I | A | 9035 |
| sample5 | M | II | B | 3450 |
| sample6 | F | II | B | 3524 |
| sample7 | M | I | B | 958 |
| sample8 | F | II | B | 1053 |
| sample9 | M | II | P | 8674 |
| sample10 | F | I | P | 3424 |
| sample11 | M | II | P | 463 |
| sample12 | F | II | P | 5105 |

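The two hints above mention `rep()` and `paste()`. As a minimal sketch of how those functions behave, here is a toy example; the names `genotype`, `batch`, and `toy_meta` and all values are made up for illustration and are not the homework answer.

```r
# Toy example with hypothetical values (not the homework solution).

# rep() with 'each' repeats every element in turn: Wt Wt Wt KO KO KO
genotype <- factor(rep(c("Wt", "KO"), each = 3))

# rep() with 'times' repeats the whole vector: 1 2 3 1 2 3
batch <- rep(c(1, 2, 3), times = 2)

# Combine the columns into a data frame
toy_meta <- data.frame(genotype, batch)

# paste() with sep = "" glues the prefix onto each number:
# "sample1" "sample2" ... "sample6"
rownames(toy_meta) <- paste("sample", 1:6, sep = "")

toy_meta
```

The `each` and `times` arguments can also be combined when a column needs a more involved repeating pattern.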
## Subsetting vectors/factors and dataframes

2. Using the `meta` data frame from question #1, write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other):

    - return only the `treatment` and `sex` columns using `[]`:
    - return the `treatment` values for samples 5, 7, 9, and 10 using `[]`:
    - use `filter()` to return all data for those samples receiving treatment `P`:
    - use `filter()`/`select()` to return only the `stage` and `treatment` columns for those samples with `myc` > 5000:
    - remove the `treatment` column from the dataset using `[]`:
    - remove samples 7, 8 and 9 from the dataset using `[]`:
    - keep only samples 1-6 using `[]`:
    - add a column called `pre_treatment` to the beginning of the dataframe with the values T, F, F, F, T, T, F, T, F, F, T, T (Hint: use `cbind()`):
    - change the names of the columns to: "A", "B", "C", "D":

## Extracting components from lists
3. Create a new list, `list_hw`, with three components: the `glengths` vector, the dataframe `df`, and the `number` value. Use this list to answer the questions below. `list_hw` has the following structure (NOTE: the components of this list are not currently named):

    [[1]]
    [1]     4.6  3000.0 50000.0

    [[2]]
      species glengths
    1   ecoli      4.6
    2   human   3000.0
    3    corn  50000.0

    [[3]]
    [1] 8

Write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other):
- return the second component of the list:
- return `50000.0` from the first component of the list:
- return the value `human` from the second component:
- give the components of the list the following names: "genome_lengths", "genomes", "record":

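The list operations above rely on double-bracket indexing and `names()`. The sketch below shows the general pattern on an unrelated toy list; `toy_list` and its contents are hypothetical and chosen only for illustration.

```r
# Toy list with hypothetical contents (not the homework list).
toy_list <- list(
  c(10, 20, 30),
  data.frame(fruit = c("apple", "pear"),
             weight = c(150, 180),
             stringsAsFactors = FALSE),
  42
)

# [[ ]] pulls out a whole component
toy_list[[1]]

# Chain [ ] after [[ ]] to reach inside a component
second_value <- toy_list[[1]][2]

# For a data frame component, use $ (or [row, column]) after [[ ]]
fruit_name <- toy_list[[2]]$fruit[1]

# names() assigns a name to each component, which enables $ access
names(toy_list) <- c("numbers", "fruits", "answer")
toy_list$answer
```

Single brackets (`toy_list[1]`) would return a list of length one rather than the component itself, which is why `[[ ]]` is used here.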
## Creating figures with ggplot2

![plot_image](plotcounts.png)

4. Create the same plot as above with ggplot2, using the provided metadata and counts datasets. The [metadata table](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/Mov10_full_meta.txt) describes an experiment that you have set up for RNA-seq analysis, while the [associated count matrix](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/normalized_counts.txt) gives the normalized counts for each sample for every gene. Download the count matrix and metadata using the links provided.

Follow the instructions below to build your plot. Write the code you used and provide the final image.

- Read in the metadata file using: `meta <- read.delim("Mov10_full_meta.txt", sep="\t", row.names=1)`

- Read in the count matrix file using: `data <- read.delim("normalized_counts.txt", sep="\t", row.names=1)`

- Create a vector called `expression` that contains the normalized count values from the row in normalized_counts that corresponds to the MOV10 gene.

- Check the class of this expression vector. Then, convert it to a numeric vector using `as.numeric(expression)`

- Bind that vector to your metadata data frame (`meta`) and call the new data frame `df`.

- Create a ggplot by constructing the plot line by line:

    - Initialize a ggplot with your `df` as input.

    - Add the `geom_jitter()` geometric object with the required aesthetics, which are x and y.

    - Color the points based on `sampletype`

    - Add the `theme_bw()` layer

    - Add the title "Expression of MOV10" to the plot

    - Change the x-axis label to be blank

    - Change the y-axis label to "Normalized counts"

    - Using `theme()` change the following properties of the plot:

        - Remove the legend (Hint: use ?theme help and scroll down to legend.position)

        - Change the plot title size to 1.5x the default and center align

        - Change the axis title to 1.5x the default size

        - Change the size of the axis text only on the y-axis to 1.25x the default size

        - Rotate the x-axis text to 45 degrees using `axis.text.x=element_text(angle=45, hjust=1)`
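
The data-preparation steps above (pull one gene's row, check its class, coerce it to numeric, bind it to the metadata) can be sketched in base R on made-up data. The objects `toy_counts`, `toy_meta`, `toy_df`, the gene names, and all values are hypothetical stand-ins, not the real files, and the plotting steps are not shown.

```r
# Hypothetical toy data standing in for the count matrix and metadata.
toy_counts <- data.frame(
  s1 = c(5.2, 100.4),
  s2 = c(7.9, 80.1),
  s3 = c(6.3, 95.0),
  row.names = c("GENE_A", "GENE_B")
)
toy_meta <- data.frame(
  sampletype = c("control", "control", "treated"),
  row.names = c("s1", "s2", "s3")
)

# Selecting a single row of a data frame by name returns a one-row data frame
expression <- toy_counts["GENE_B", ]
class(expression)

# as.numeric() flattens that one-row data frame into a plain numeric vector
expression <- as.numeric(expression)

# cbind() adds the vector as a new column; this assumes the samples are in
# the same order in the counts columns and the metadata rows
toy_df <- cbind(toy_meta, expression)
toy_df
```

Because `cbind()` matches by position rather than by name, it is worth confirming that the column names of the counts and the row names of the metadata line up before binding.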
