---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
```{r, setoptions, echo=FALSE}
library(knitr)
# Show the code of every chunk in the report
opts_chunk$set(echo = TRUE)
```
## Loading and preprocessing the data
The data are loaded with the readr package. Its read_csv() function reads the compressed activity.zip file directly, so the archive does not need to be unzipped first.
```{r, loaddata, results=FALSE, warning=FALSE, comment=NA, message=FALSE}
library(readr)
data<-read_csv("activity.zip")
```
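readr decides whether to decompress a file from its extension; according to the readr documentation, files ending in .gz, .bz2, .xz or .zip are uncompressed automatically. The self-contained sketch below illustrates that mechanism with a tiny, made-up gzip-compressed CSV in a temporary file instead of activity.zip (writing a .zip from R would need an external zip program). The column names only mimic the activity data.

```{r, sketch-readgz, eval=FALSE}
library(readr)

# Write a tiny, made-up CSV with the same columns as the activity data
# into a gzip-compressed temporary file
toyfile <- tempfile(fileext = ".csv.gz")
con <- gzfile(toyfile, "w")
writeLines(c("steps,date,interval",
             "NA,2012-10-01,0",
             "0,2012-10-02,0",
             "117,2012-10-02,5"), con)
close(con)

# read_csv() sees the .gz extension and decompresses the file on the fly;
# "NA" becomes a missing value and the ISO dates are parsed as Date
toy <- read_csv(toyfile)
toy
```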
## What is the mean total number of steps taken per day?
To calculate the total number of steps taken per day, rows with a missing step count (NA) are removed with dplyr's filter() function, and the remaining steps are then summed for each date.
```{r, histogramday, results=TRUE, warning=FALSE, comment=NA, message=FALSE}
library(dplyr)
library(ggplot2)
# Total steps per day, ignoring rows where steps is NA
stepsperday <- data %>%
  filter(!is.na(steps)) %>%
  group_by(date) %>%
  summarise(steps = sum(steps))
# Histogram of the daily totals
ggplot(stepsperday, aes(x = steps)) +
  geom_histogram() +
  ggtitle("Histogram of the total number of steps taken each day")
```
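Removing the NAs before summing has a consequence worth keeping in mind: a date whose step counts are all missing disappears from stepsperday, whereas summing with sum(steps, na.rm = TRUE) would keep that date with a total of 0 and pull the mean and median down. The sketch below shows the difference on a small made-up data frame; the values are illustrative and not taken from activity.zip.

```{r, sketch-narm, eval=FALSE}
library(dplyr)

# Made-up data: day 1 is fully observed, day 2 has only missing values
toy <- tibble(
  date     = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  interval = c(0, 5, 0, 5),
  steps    = c(10, 30, NA, NA)
)

# Approach used above: drop NA rows first, so day 2 vanishes entirely
filtered <- toy %>%
  filter(!is.na(steps)) %>%
  group_by(date) %>%
  summarise(steps = sum(steps))

# Alternative: keep every day and let na.rm = TRUE turn day 2 into a 0 total
narm <- toy %>%
  group_by(date) %>%
  summarise(steps = sum(steps, na.rm = TRUE))

filtered  # one row: 40 steps on 2012-10-01
narm      # two rows: 40 and 0
```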
The mean and median of the daily totals are computed with mean() and median().
```{r, summary, results=TRUE, warning=FALSE, message=FALSE}
#Mean value
mean(stepsperday$steps)
#Median value
median(stepsperday$steps)
```
## What is the average daily activity pattern?
First, a new data frame is created that holds the mean number of steps for every 5-minute interval, averaged across all days with the NAs ignored. This mean is then drawn as a line plot against the interval.
```{r, histogramint, results=TRUE, warning=FALSE, comment=NA, message=FALSE}
stepsperint <- data %>%
  group_by(interval) %>%
  summarise(meansteps = mean(steps, na.rm = TRUE))
ggplot(stepsperint, aes(x = interval, y = meansteps)) +
  geom_line() +
  ggtitle("Average number of steps taken across all days by the 5-minute interval")
```
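The interval identifiers are plotted directly on the x-axis. Assuming they encode the time of day as HHMM (so 55 means 00:55 and 100 means 01:00), consecutive identifiers are not evenly spaced: the step from 55 to 100 covers 45 units but only 5 minutes, which would leave small gaps in the line at every hour. One possible remedy, sketched below with a hypothetical helper interval_to_minutes(), is to convert the identifiers to minutes since midnight before plotting.

```{r, sketch-minutes, eval=FALSE}
# Hypothetical helper, assuming HHMM-coded identifiers (e.g. 835 = 08:35):
# hours are the integer part of interval / 100, minutes the remainder
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}

ids <- c(0, 5, 55, 100, 835, 2355)
interval_to_minutes(ids)  # 0 5 55 60 515 1435
```

With stepsperint from above, the plot could then use aes(x = interval_to_minutes(interval), y = meansteps).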
The 5-minute interval that, on average across all the days in the dataset, contains the maximum number of steps:
```{r, maxinterval, results=TRUE, warning=FALSE, comment=NA, message=FALSE}
stepsperint$interval[stepsperint$meansteps==max(stepsperint$meansteps)]
```
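An equivalent lookup uses which.max(), which returns the position of the first maximum; the == max() comparison above would instead return every interval tied for the maximum. The toy vectors below are made up purely to show that difference.

```{r, sketch-whichmax, eval=FALSE}
# Made-up interval means with a tie for the maximum
toyint  <- c(0, 5, 10, 15)
toymean <- c(1.5, 7.2, 3.0, 7.2)

toyint[toymean == max(toymean)]  # every tied interval: 5 and 15
toyint[which.max(toymean)]       # first maximum only: 5
```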
## Imputing missing values
Total number of missing values (NAs) in the steps column:
```{r, NAsnumber, results=TRUE, warning=FALSE, comment=NA, message=FALSE}
sum(is.na(data$steps))
```
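sum(is.na(data$steps)) counts the missing values in the steps column only. To check every column at once, and to see whether the gaps are scattered or cover whole days, the counts can be broken down as in the sketch below. It runs on made-up data; applying it to data only requires swapping the data frame.

```{r, sketch-nacount, eval=FALSE}
library(dplyr)

# Made-up data: one missing value on day 1, all values missing on day 2
toy <- tibble(
  date     = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  interval = c(0, 5, 0, 5),
  steps    = c(12, NA, NA, NA)
)

# Number of NAs in each column
colSums(is.na(toy))

# Number of NAs per day, and whether the whole day is missing
nasperday <- toy %>%
  group_by(date) %>%
  summarise(nas = sum(is.na(steps)), allmissing = all(is.na(steps)))
nasperday
```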
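To impute the missing values, the interval means computed above (stepsperint) are joined to the dataset by interval, and each NA in steps is replaced with the mean for its interval. The daily totals and their histogram are then recomputed from this completed dataset.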
```{r, datanoNA, results=TRUE, warning=FALSE, comment=NA, message=FALSE}
# Join the per-interval means onto every row and fill each missing
# step count with the mean for its interval
datanona <- left_join(data, stepsperint, by = "interval") %>%
  mutate(steps = coalesce(steps, meansteps)) %>%
  select(steps, date, interval)
# Total steps per day (no NAs remain after imputation)
stepsperdaynona <- datanona %>%
  group_by(date) %>%
  summarise(steps = sum(steps))
# Histogram of the daily totals after imputation
ggplot(stepsperdaynona, aes(x = steps)) +
  geom_histogram() +
  ggtitle("Histogram of the total number of steps taken each day (no NAs)")
```
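The imputation rule, replacing each missing count by the mean of its interval, can be checked in isolation on a small made-up data frame. This sketch uses dplyr's coalesce(), which takes the first non-missing value element by element; it is one idiomatic way to express the replacement, not necessarily how the chunk above is written.

```{r, sketch-impute, eval=FALSE}
library(dplyr)

# Made-up data: interval 5 is missing on the second day
toy <- tibble(
  date     = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  interval = c(0, 5, 0, 5),
  steps    = c(4, 10, 6, NA)
)

# Mean steps per interval, ignoring NAs
toymeans <- toy %>%
  group_by(interval) %>%
  summarise(meansteps = mean(steps, na.rm = TRUE))

# Join the means by interval and fill each NA with its interval's mean
toyfilled <- toy %>%
  left_join(toymeans, by = "interval") %>%
  mutate(steps = coalesce(steps, meansteps)) %>%
  select(steps, date, interval)

toyfilled$steps  # 4 10 6 10
```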
The mean and median of the daily totals after imputation are again computed with mean() and median().
```{r, summarynona, results=TRUE, warning=FALSE, message=FALSE}
#Mean value
mean(stepsperdaynona$steps)
#Median value
median(stepsperdaynona$steps)
```
Compared with the initial values (computed with the NAs removed), the mean is unchanged, whereas the median has changed and become equal to the mean.
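One plausible explanation, assuming the missing values cover entire days (worth confirming on the data): each imputed day then receives the mean of every interval, and because each interval mean is taken over the same set of complete days, those means add up to the mean daily total. Every imputed day therefore contributes a total exactly equal to the old mean, which leaves the mean unchanged; several identical totals added at that value can then occupy the middle of the sorted totals and become the median. The sketch below reproduces the effect on made-up numbers.

```{r, sketch-meanmedian, eval=FALSE}
library(dplyr)

# Made-up data: two intervals per day; days 1-3 complete, days 4-5 entirely missing
toy <- tibble(
  date     = rep(as.Date("2012-10-01") + 0:4, each = 2),
  interval = rep(c(0, 5), times = 5),
  steps    = c(6, 4,  9, 11,  15, 45,  NA, NA,  NA, NA)
)

# Daily totals with the missing days dropped: 10, 20, 60
before <- toy %>%
  filter(!is.na(steps)) %>%
  group_by(date) %>%
  summarise(steps = sum(steps))

# Interval means (10 and 20) sum to the old mean daily total (30)
means <- toy %>%
  group_by(interval) %>%
  summarise(meansteps = mean(steps, na.rm = TRUE))

# Impute, then recompute the daily totals: 10, 20, 60, 30, 30
after <- toy %>%
  left_join(means, by = "interval") %>%
  mutate(steps = coalesce(steps, meansteps)) %>%
  group_by(date) %>%
  summarise(steps = sum(steps))

c(mean(before$steps), median(before$steps))  # 30 20
c(mean(after$steps), median(after$steps))    # 30 30
```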
## Are there differences in activity patterns between weekdays and weekends?
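The wday() function from the lubridate package gives the day of the week as a number, from 1 (Sunday) to 7 (Saturday) by default. Days 2 to 6 are labelled "weekday" and days 1 and 7 "weekend", the label is stored as a factor, and the mean number of steps per interval is plotted separately for each day type.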
```{r, daytype, results=TRUE, warning=FALSE, message=FALSE}
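library(lubridate)
# wday() numbers the days 1 (Sunday) to 7 (Saturday); label 1 and 7 as weekend.
# Building the labels in one step avoids assigning text into a numeric column,
# which tibble's type-stable subassignment rejects
datanona$daytype <- factor(ifelse(wday(datanona$date) %in% c(1, 7), "weekend", "weekday"))
# Mean steps per interval, separately for weekdays and weekends
stepsbyintday <- datanona %>%
  group_by(interval, daytype) %>%
  summarise(meansteps = mean(steps), .groups = "drop")
# One panel per day type
ggplot(stepsbyintday, aes(x = interval, y = meansteps)) +
  geom_line() +
  facet_grid(daytype ~ .)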
```
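The numeric day codes are easy to mix up, so a quick check on dates with known weekdays helps: 2012-10-05 was a Friday and 2012-10-06 and 2012-10-07 were a Saturday and a Sunday. The sketch below builds the weekday/weekend factor in one step with ifelse() and relies on lubridate's default week start, under which wday() returns 1 for Sunday and 7 for Saturday; a different lubridate.week.start option would change the codes.

```{r, sketch-wday, eval=FALSE}
library(lubridate)

# Friday, Saturday, Sunday, Monday
toydates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# With lubridate's default week start, 1 = Sunday and 7 = Saturday
wday(toydates)  # 6 7 1 2

# Label weekends; everything else is a weekday. Store the label as a factor
toydaytype <- factor(ifelse(wday(toydates) %in% c(1, 7), "weekend", "weekday"))
toydaytype
```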