hbctraining
diff --git a/README.md
+39 b/README.md
diff --git a/_config.yml
+2 b/_config.yml
diff --git a/assets/css/style.scss
+8 b/assets/css/style.scss
diff --git a/assets/images/dna-sequence-1600x800.jpg
201 KB b/assets/images/dna-sequence-1600x800.jpg
diff --git a/data/Mov10_full_meta.txt
+9 b/data/Mov10_full_meta.txt
diff --git a/data/animals.csv
+7 b/data/animals.csv
diff --git a/data/counts.rpkm
+38,829 b/data/counts.rpkm
diff --git a/data/mouse_exp_design.csv
+13 b/data/mouse_exp_design.csv
diff --git a/data/normalized_counts.txt
+23,369 b/data/normalized_counts.txt
diff --git a/data/ordered_counts_rpkm.csv
+38,829 b/data/ordered_counts_rpkm.csv
diff --git a/dataset.zip
1.08 MB b/dataset.zip
diff --git a/homework/Intro_to_R_hw.md
+107 b/homework/Intro_to_R_hw.md
@@ -0,0 +1,39 @@
+## Introduction to R
+
+| Audience | Computational skills required | Duration |
+|:----------|:-------------|:----------|
+| Biologists | None | 1.5 or 2-day workshop (~ 9 - 13 hours of trainer-led time)|
+
+### Description
+This repository has teaching materials for a hands-on **Introduction to R** workshop. The workshop will introduce participants to the basics of R and RStudio. R is a simple programming environment that enables the effective handling of data, while providing excellent graphical support. RStudio is a tool that provides a user-friendly environment for working with R. 
+
+These materials are intended to teach both basic R programming and how to apply it to make data analysis more efficient.
+
+> These materials were developed for a trainer-led workshop, but are also amenable to self-guided learning.
+
+### Learning Objectives
+
+1. **R syntax**: Understand the different 'parts of speech'.
+2. **Data types and structures in R**: Describe the various data types and data structures.
+3. **Data inspection and wrangling**: Demonstrate the utilization of functions and indices to inspect and subset data from various data structures.
+4. **Visualizing data**: Demonstrate the use of the ggplot2 package to create plots for easy data visualization.
+
+### Lessons
+**[Click here](schedules/1.5-day.md) for links to lessons and the suggested schedule**
+
+### Installation Requirements
+
+Download the most recent versions of R and RStudio for the appropriate OS using the links below:
+
+ - [R](https://cran.r-project.org/) 
+ - [RStudio](https://www.rstudio.com/products/rstudio/download/#download)
+
+### Dataset
+
+All the files used for the above lessons are linked within, but can also be [accessed here](https://github.com/hbctraining/Intro-to-R-with-DGE/tree/master/data).
+
+---
+*These materials have been developed by members of the teaching team at the [Harvard Chan Bioinformatics Core (HBC)](http://bioinformatics.sph.harvard.edu/). These are open access materials distributed under the terms of the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.*
+
+* *Some materials used in these lessons were derived from work that is Copyright © Data Carpentry (http://datacarpentry.org/). 
+All Data Carpentry instructional material is made available under the [Creative Commons Attribution license](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0).*
@@ -0,0 +1,2 @@
+theme: jekyll-theme-cayman
+title: Introduction to R
@@ -0,0 +1,8 @@
+---
+---
+
+@import "{{ site.theme }}";
+
+.page-header { color: #fff; text-align: center; background-image: url("../images/dna-sequence-1600x800.jpg"); }
+
+.main-content h1, .main-content h2, .main-content h3, .main-content h4, .main-content h5, .main-content h6 { margin-top: 2rem; margin-bottom: 1rem; font-weight: normal; color: #000000; }
@@ -0,0 +1,9 @@
+	sampletype	MOVexpr
+Mov10_kd_2	MOV10_knockdown	low
+Mov10_kd_3	MOV10_knockdown	low
+Mov10_oe_1	MOV10_overexpression	high
+Mov10_oe_2	MOV10_overexpression	high
+Mov10_oe_3	MOV10_overexpression	high
+Irrel_kd_1	control	normal
+Irrel_kd_2	control	normal
+Irrel_kd_3	control	normal
@@ -0,0 +1,7 @@
+,speed,color
+Elephant,40,Gray
+Cheetah,120,Tan
+Tortoise,0.1,Green
+Hare,48,Grey
+Lion,80,Tan
+PolarBear,30,White
@@ -0,0 +1,13 @@
+genotype,celltype,replicate
+sample1,Wt,typeA,1
+sample2,Wt,typeA,2
+sample3,Wt,typeA,3
+sample4,KO,typeA,1
+sample5,KO,typeA,2
+sample6,KO,typeA,3
+sample7,Wt,typeB,1
+sample8,Wt,typeB,2
+sample9,Wt,typeB,3
+sample10,KO,typeB,1
+sample11,KO,typeB,2
+sample12,KO,typeB,3
@@ -0,0 +1,107 @@
+# Introduction to R practice
+
+## Creating vectors/factors and dataframes
+
+1. We are performing RNA-Seq on cancer samples being treated with three different types of treatment (A, B, and P). You have 12 samples total, with 4 replicates per treatment. Write the R code you would use to construct your metadata table as described below.  
+     - Create the vectors/factors for each column (Hint: you can type out each vector/factor, or if you want the process to go faster, try exploring the `rep()` function).
+     - Put them together into a dataframe called `meta`.
+     - Use the `rownames()` function to assign row names to the dataframe (Hint: you can type out the row names as a vector, or if you want the process to go faster, try exploring the `paste()` function).
+
+     Your finished metadata table should have information for the variables `sex`, `stage`, `treatment`, and `myc` levels: 
+
+     | |sex	| stage	| treatment	| myc |
+     |:--:|:--: | :--:	| :------:	| :--: |
+     |sample1|	M	|I	|A	|2343|
+     |sample2|	F	|II	|A	|457|
+     |sample3	|M	|II	|A	|4593|
+     |sample4	|F	|I	|A	|9035|
+     |sample5|	M	|II	|B	|3450|
+     |sample6|	F|	II|	B|	3524|
+     |sample7|	M|	I|	B|	958|
+     |sample8|	F|	II|	B|	1053|
+     |sample9|	M|	II|	P|	8674|
+     |sample10	|F|	I	|P	|3424|
+     |sample11|	M	|II	|P	|463|
+     |sample12|	F|	II|	P|	5105|
+
+ 
+## Subsetting vectors/factors and dataframes
+
+2. Using the `meta` data frame from question #1, write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other):
+
+     - return only the `treatment` and `sex` columns using `[]`:
+     - return the `treatment` values for samples 5, 7, 9, and 10 using `[]`:
+     - use `filter()` to return all data for those samples receiving treatment `P`:
+     - use `filter()`/`select()` to return only the `stage` and `treatment` columns for those samples with `myc` > 5000:
+     - remove the `treatment` column from the dataset using `[]`:
+     - remove samples 7, 8 and 9 from the dataset using `[]`:
+     - keep only samples 1-6 using `[]`:
+     - add a column called `pre_treatment` to the beginning of the dataframe with the values T, F, F, F, T, T, F, T, F, F, T, T (Hint: use `cbind()`): 
+     - change the names of the columns to: "A", "B", "C", "D":
+ 
+## Extracting components from lists
+3. Create a new list, `list_hw`, with three components: the `glengths` vector, the dataframe `df`, and the `number` value. Use this list to answer the questions below. `list_hw` has the following structure (NOTE: the components of this list are not currently named):
+
+          [[1]]
+          [1]   4.6  3000.0 50000.0 
+
+          [[2]]
+                 species  glengths 
+            1    ecoli    4.6
+            2    human    3000.0
+            3    corn     50000.0
+
+          [[3]]
+          [1] 8
+
+Write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other):
+ - return the second component of the list:
+ - return `50000.0` from the first component of the list:
+ - return the value `human` from the second component: 
+ - give the components of the list the following names: "genome_lengths", "genomes", "record":
+   
+## Creating figures with ggplot2
+
+![plot_image](plotcounts.png)
+
+4. Create the same plot as above with ggplot2, using the provided metadata and counts datasets. The [metadata table](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/Mov10_full_meta.txt) describes an experiment that you have set up for RNA-seq analysis, while the [associated count matrix](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/normalized_counts.txt) gives the normalized counts for each sample for every gene. Download the count matrix and metadata using the links provided.
+
+     Follow the instructions below to build your plot. Write the code you used and provide the final image.
+
+     - Read in the metadata file using: `meta <- read.delim("Mov10_full_meta.txt", sep="\t", row.names=1)`
+
+     - Read in the count matrix file using: `data <- read.delim("normalized_counts.txt", sep="\t", row.names=1)`
+
+     - Create a vector called `expression` that contains the normalized count values from the row in normalized_counts that corresponds to the MOV10 gene.  
+
+     - Check the class of this expression vector. Then, convert it to a numeric vector using `as.numeric(expression)`
+
+     - Bind that vector to your metadata data frame (`meta`) and call the new data frame `df`. 
+
+     - Create a ggplot by constructing the plot line by line:
+     
+          - Initialize a ggplot with your `df` as input.
+
+          - Add the `geom_jitter()` geometric object with the required aesthetics, x and y.
+
+          - Color the points based on `sampletype`
+
+          - Add the `theme_bw()` layer 
+
+          - Add the title "Expression of MOV10" to the plot
+
+          - Change the x-axis label to be blank
+
+          - Change the y-axis label to "Normalized counts"
+
+          - Using `theme()` change the following properties of the plot:
+
+               - Remove the legend (Hint: use ?theme help and scroll down to legend.position)
+
+               - Change the plot title size to 1.5x the default and center align
+
+               - Change the axis title to 1.5x the default size
+
+               - Change the size of the axis text only on the y-axis to 1.25x the default size
+               
+               - Rotate the x-axis text to 45 degrees using `axis.text.x=element_text(angle=45, hjust=1)`
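
The `rep()` and `paste()` hints in question 1 of `homework/Intro_to_R_hw.md` can be illustrated with a minimal sketch. The six-sample design below is hypothetical and is deliberately not the homework table; the names `condition`, `replicate`, and `design` are assumptions made for illustration only.

```r
# Hypothetical design (not the homework table):
# two conditions with three replicates each.

# rep() with `each` repeats every element in turn: ctrl ctrl ctrl treated treated treated
condition <- factor(rep(c("ctrl", "treated"), each = 3))

# rep() with `times` repeats the whole vector: 1 2 3 1 2 3
replicate <- rep(1:3, times = 2)

# Combine the columns into a data frame
design <- data.frame(condition, replicate)

# paste() is vectorized, so one call generates "sample1" ... "sample6"
rownames(design) <- paste("sample", 1:6, sep = "")

print(design)
```

The `each` and `times` arguments of `rep()` cover the two repetition patterns that a replicated design usually needs, and `paste()` with `sep = ""` avoids typing each row name by hand.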