|
| 1 | +# Introduction to R practice |
| 2 | + |
| 3 | +## Creating vectors/factors and dataframes |
| 4 | + |
| 5 | +1. We are performing RNA-Seq on cancer samples, each treated with one of three treatments (A, B, and P). You have 12 samples total, with 4 replicates per treatment. Write the R code you would use to construct your metadata table as described below. |
| 6 | + - Create the vectors/factors for each column (Hint: you can type out each vector/factor, or if you want the process to go faster, try exploring the `rep()` function). |
| 7 | + - Put them together into a dataframe called `meta`. |
| 8 | + - Use the `rownames()` function to assign row names to the dataframe (Hint: you can type out the row names as a vector, or if you want the process to go faster, try exploring the `paste()` function). |
| 9 | + |
| 10 | + Your finished metadata table should have information for the variables `sex`, `stage`, `treatment`, and `myc` levels: |
| 11 | + |
| 12 | + | | sex | stage | treatment | myc | |
| 13 | + | :--: | :--: | :--: | :--: | :--: | |
| 14 | + | sample1 | M | I | A | 2343 | |
| 15 | + | sample2 | F | II | A | 457 | |
| 16 | + | sample3 | M | II | A | 4593 | |
| 17 | + | sample4 | F | I | A | 9035 | |
| 18 | + | sample5 | M | II | B | 3450 | |
| 19 | + | sample6 | F | II | B | 3524 | |
| 20 | + | sample7 | M | I | B | 958 | |
| 21 | + | sample8 | F | II | B | 1053 | |
| 22 | + | sample9 | M | II | P | 8674 | |
| 23 | + | sample10 | F | I | P | 3424 | |
| 24 | + | sample11 | M | II | P | 463 | |
| 25 | + | sample12 | F | II | P | 5105 | |
| 26 | + |
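The `rep()` and `paste()` hints in question 1 can be explored on toy data first. The sketch below uses made-up values (`groups`, `labels`, and the other names are hypothetical and unrelated to the metadata table) to show how the two functions behave; it is an illustration only, not a solution to the question.

```r
# Toy illustration of rep() and paste(); all values are made up.

# `each` repeats every element in turn before moving to the next one
groups <- rep(c("ctrl", "drug"), each = 3)
# "ctrl" "ctrl" "ctrl" "drug" "drug" "drug"

# `times` repeats the whole vector end to end
doses <- rep(c("lo", "hi"), times = 3)
# "lo" "hi" "lo" "hi" "lo" "hi"

# wrap a character vector in factor() to turn it into a factor
groups_f <- factor(groups)

# paste() is vectorized: the string is recycled across the numbers
labels <- paste("plate", 1:3, sep = "_")
# "plate_1" "plate_2" "plate_3"

# paste0() is a shortcut for paste(..., sep = "")
wells <- paste0("well", 1:3)
# "well1" "well2" "well3"
```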
| 27 | + |
| 28 | +## Subsetting vectors/factors and dataframes |
| 29 | + |
| 30 | +2. Using the `meta` data frame from question #1, write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other): |
| 31 | + |
| 32 | + - return only the `treatment` and `sex` columns using `[]`: |
| 33 | + - return the `treatment` values for samples 5, 7, 9, and 10 using `[]`: |
| 34 | + - use `filter()` to return all data for those samples receiving treatment `P`: |
| 35 | + - use `filter()`/`select()` to return only the `stage` and `treatment` columns for those samples with `myc` > 5000: |
| 36 | + - remove the `treatment` column from the dataset using `[]`: |
| 37 | + - remove samples 7, 8 and 9 from the dataset using `[]`: |
| 38 | + - keep only samples 1-6 using `[]`: |
| 39 | + - add a column called `pre_treatment` to the beginning of the dataframe with the values T, F, F, F, T, T, F, T, F, F, T, T (Hint: use `cbind()`): |
| 40 | + - change the names of the columns to: "A", "B", "C", "D": |
| 41 | + |
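As a warm-up for the `[]` operations in question 2, here is a minimal sketch of bracket subsetting on a toy data frame. The data frame `toy` and its columns are hypothetical and chosen only for illustration; the same patterns should carry over to `meta`.

```r
# Toy data frame; names and values are made up for illustration.
toy <- data.frame(
  colour = c("red", "blue", "green", "blue"),
  weight = c(1.2, 3.4, 2.2, 5.0),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)

# [rows, columns]: leave one side empty to keep everything on that side
two_cols <- toy[, c("weight", "colour")]   # select columns by name
some_wts <- toy[c(1, 3), "weight"]         # rows 1 and 3 of one column: 1.2 2.2

# negative indices drop rows or columns instead of keeping them
no_row2 <- toy[-2, ]
# drop = FALSE keeps a data frame even when a single column remains
no_col1 <- toy[, -1, drop = FALSE]

# cbind() with the new column listed first puts it at the front
with_flag <- cbind(flag = c(TRUE, FALSE, TRUE, FALSE), toy)

# colnames<- replaces all column names at once
renamed <- toy
colnames(renamed) <- c("X", "Y")
```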
| 42 | +## Extracting components from lists |
| 43 | +3. Create a new list, `list_hw`, with three components: the `glengths` vector, the dataframe `df`, and the `number` value. Use this list to answer the questions below. `list_hw` has the following structure (NOTE: the components of this list are not currently named): |
| 44 | + |
| 45 | + [[1]] |
| 46 | + [1] 4.6 3000.0 50000.0 |
| 47 | + |
| 48 | + [[2]] |
| 49 | + species glengths |
| 50 | + 1 ecoli 4.6 |
| 51 | + 2 human 3000.0 |
| 52 | + 3 corn 50000.0 |
| 53 | + |
| 54 | + [[3]] |
| 55 | + [1] 8 |
| 56 | + |
| 57 | +Write out the R code you would use to perform the following operations (questions **DO NOT** build upon each other): |
| 58 | + - return the second component of the list: |
| 59 | + - return `50000.0` from the first component of the list: |
| 60 | + - return the value `human` from the second component: |
| 61 | + - give the components of the list the following names: "genome_lengths", "genomes", "record": |
| 62 | + |
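The list operations in question 3 rely on the difference between `[[ ]]` and `[ ]`. The sketch below shows that difference on a toy list; `toy_list` and its contents are hypothetical and are not the `list_hw` from the question.

```r
# Toy list with an unnamed vector, data frame, and string (all made up).
toy_list <- list(
  c(10, 20, 30),
  data.frame(id = c("a", "b"), n = c(1, 2), stringsAsFactors = FALSE),
  "note"
)

# [[ ]] extracts the component itself; [ ] returns a shorter list
vec     <- toy_list[[1]]   # numeric vector
sublist <- toy_list[1]     # list of length 1

# chain further indexing onto the extracted component
second_value <- toy_list[[1]][2]      # 20
first_id     <- toy_list[[2]]$id[1]   # "a"
cell         <- toy_list[[2]][2, "n"] # 2

# names<- labels the components so they can be reached with $
names(toy_list) <- c("values", "table", "comment")
by_name <- toy_list$values
```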
| 63 | +## Creating figures with ggplot2 |
| 64 | + |
| 65 | + |
| 66 | + |
| 67 | +4. Create the same plot as above with ggplot2, using the provided metadata and counts datasets. The [metadata table](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/Mov10_full_meta.txt) describes an experiment that you have set up for RNA-seq analysis, while the [associated count matrix](https://github.com/hbc/Intro-to-R-2-day/raw/master/data/normalized_counts.txt) gives the normalized counts for each sample for every gene. Download the count matrix and metadata using the links provided. |
| 68 | + |
| 69 | + Follow the instructions below to build your plot. Write the code you used and provide the final image. |
| 70 | + |
| 71 | + - Read in the metadata file using: `meta <- read.delim("Mov10_full_meta.txt", sep="\t", row.names=1)` |
| 72 | + |
| 73 | + - Read in the count matrix file using: `data <- read.delim("normalized_counts.txt", sep="\t", row.names=1)` |
| 74 | + |
| 75 | + - Create a vector called `expression` that contains the normalized count values from the row in normalized_counts that corresponds to the MOV10 gene. |
| 76 | + |
| 77 | + - Check the class of this expression vector. Then, convert it to a numeric vector using `as.numeric(expression)` |
| 78 | + |
| 79 | + - Bind that vector to your metadata data frame (`meta`) and call the new data frame `df`. |
| 80 | + |
| 81 | + - Create a ggplot by constructing the plot line by line: |
| 82 | + |
| 83 | + - Initialize a ggplot with your `df` as input. |
| 84 | + |
| 85 | + - Add the `geom_jitter()` geometric object with the required aesthetics which are x and y. |
| 86 | + |
| 87 | + - Color the points based on `sampletype` |
| 88 | + |
| 89 | + - Add the `theme_bw()` layer |
| 90 | + |
| 91 | + - Add the title "Expression of MOV10" to the plot |
| 92 | + |
| 93 | + - Change the x-axis label to be blank |
| 94 | + |
| 95 | + - Change the y-axis label to "Normalized counts" |
| 96 | + |
| 97 | + - Using `theme()` change the following properties of the plot: |
| 98 | + |
| 99 | + - Remove the legend (Hint: use ?theme help and scroll down to legend.position) |
| 100 | + |
| 101 | + - Change the plot title size to 1.5x the default and center-align it |
| 102 | + |
| 103 | + - Change the axis title to 1.5x the default size |
| 104 | + |
| 105 | + - Change the size of the axis text only on the y-axis to 1.25x the default size |
| 106 | + |
| 107 | + - Rotate the x-axis text to 45 degrees using `axis.text.x=element_text(angle=45, hjust=1)` |
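The data-preparation steps in question 4 (pull one gene's row from the count matrix, convert it to numeric, bind it to the metadata) can be rehearsed on toy data before touching the real files. The sketch below assumes made-up objects `toy_counts` and `toy_meta`; none of these names or values come from the Mov10 datasets.

```r
# Toy counts (genes in rows, samples in columns) and matching metadata.
# All names and values are hypothetical.
toy_counts <- data.frame(
  s1 = c(5, 50), s2 = c(7, 70), s3 = c(9, 90),
  row.names = c("geneA", "geneB")
)
toy_meta <- data.frame(
  group = c("x", "x", "y"),
  row.names = c("s1", "s2", "s3"),
  stringsAsFactors = FALSE
)

# Selecting a single row by name still returns a data frame
row_vals <- toy_counts["geneB", ]
row_class <- class(row_vals)       # "data.frame"

# as.numeric() flattens the one-row data frame into a plain numeric vector
row_num <- as.numeric(row_vals)    # 50 70 90

# cbind() appends the vector as a new, explicitly named column
toy_df <- cbind(toy_meta, expr = row_num)
```

The resulting `toy_df` has one row per sample with the metadata and the gene's values side by side, which is the shape `ggplot()` expects as input.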