# Linear Regression Machine Learning Project With R
# Read the Data
```{r}
bike <- read.csv('./bikeshare.csv')
head(bike)
```
## Exploratory Data Analysis
# Load the Libraries
```{r}
library(ggplot2)
library(dplyr)
```
```{r}
ggplot(bike,aes(temp,count)) + geom_point(alpha=0.3,aes(color=temp)) + theme_bw()
```
# Convert to Timestamp
```{r}
bike$datetime <- as.POSIXct(bike$datetime)
```
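`as.POSIXct()` parses strings in the session's local time zone unless told otherwise. The following is a minimal sketch, assuming timestamps of the form `YYYY-MM-DD HH:MM:SS`. The sample strings and the names `stamps` and `parsed` are made up for illustration and are not taken from `bikeshare.csv`.

```r
# Hypothetical sample timestamps; the real file's format is assumed, not verified.
stamps <- c("2011-01-01 00:00:00", "2011-01-01 13:00:00")

# An explicit format and time zone make the parse reproducible across machines.
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(parsed)         # "POSIXct" "POSIXt"
format(parsed, "%H")  # "00" "13"
```

If the strings do not match the given format, `as.POSIXct()` returns `NA` for them, so checking `anyNA()` on the result after conversion is a cheap safeguard.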
```{r}
plot <- ggplot(bike,aes(datetime,count)) + geom_point(aes(color=temp), alpha = 0.5)
print(plot)
```
```{r}
plot + scale_color_continuous(low='#55D8CE',high = '#FF6E2E') + theme_bw()
```
# Correlation Between Temperature and Count, and a Boxplot by Season
```{r}
cor(bike[,c('temp','count')])
ggplot(bike,aes(factor(season),count)) + geom_boxplot(aes(color=factor(season))) + theme_bw()
```
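Calling `cor()` on a two-column data frame returns a 2 x 2 correlation matrix, while calling it on two vectors returns a single number. The sketch below illustrates the difference on the built-in `cars` dataset, which stands in for the bike data here. The names `m` and `r` are only for illustration.

```r
# `cars` ships with base R; it is a stand-in, not the bike data.
m <- cor(cars[, c("speed", "dist")])  # 2 x 2 symmetric matrix, diagonal of 1s
r <- cor(cars$speed, cars$dist)       # a single correlation coefficient

dim(m)                            # 2 2
all.equal(m["speed", "dist"], r)  # TRUE
```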
# Feature Engineering
```{r}
# Extract the hour of day as a two-digit string; format() is vectorized, so sapply() is not needed
bike$hour <- format(bike$datetime, "%H")
head(bike)
```
# Scatterplot
## For Working Days
```{r}
plot <- ggplot(filter(bike,workingday==1),aes(hour,count))
```
```{r}
plot <- plot + geom_point(position=position_jitter(w=1, h=0),aes(color=temp),alpha=0.5)
plot <- plot + scale_color_gradientn(colours = c('dark blue','blue','light blue','light green','yellow','orange','red'))
plot + theme_bw()
```
## For Non-Working Days
```{r}
plot <- ggplot(filter(bike,workingday==0),aes(hour,count))
plot <- plot + geom_point(position=position_jitter(w=1, h=0),aes(color=temp),alpha=0.8)
```
```{r}
plot <- plot + scale_color_gradientn(colours = c('dark blue','blue','light blue','light green','yellow','orange','red'))
plot + theme_bw()
```
# Building the Linear Model
```{r}
temp.model <- lm(count ~ temp, data = bike)
```
## Getting the Summary of the Model
```{r}
summary(temp.model)
```
## How many bike rentals would we predict if the temperature were 25 degrees Celsius? Calculate this in two ways:

* Using the coefficients from the model summary above
* Using the predict() function
# Method 1
```{r}
# Intercept + slope * temperature, with coefficients copied from summary(temp.model) above
6.0462 + 9.17 * 25
```
# Method 2
```{r}
temp.test <- data.frame(temp=c(25))
predict(temp.model,temp.test)
```
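Coefficients typed in by hand from a printed summary are rounded, so the two methods above can differ in the later decimals. The sketch below shows one way to avoid that by reading the coefficients with `coef()`. It uses the built-in `cars` dataset as a stand-in for the bike data, and the names `fit`, `manual` and `pred` are hypothetical.

```r
# `cars` (built into R) stands in for the bike data; `fit` is a hypothetical name.
fit <- lm(dist ~ speed, data = cars)

# Method 1: intercept + slope * x, using the unrounded coefficients.
b <- coef(fit)
manual <- unname(b["(Intercept)"] + b["speed"] * 25)

# Method 2: predict() on a one-row data frame whose column name matches the predictor.
pred <- unname(predict(fit, data.frame(speed = 25)))

all.equal(manual, pred)  # TRUE
```

The same pattern should carry over to `temp.model` by replacing `speed` with `temp`.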
# Convert the Hour Column to Numeric
```{r}
bike$hour <- as.numeric(bike$hour)
```
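The conversion matters because `lm()` treats a character column as categorical and a numeric column as a single continuous predictor. A small sketch of the conversion on made-up hour strings (the names `hour_chr` and `hour_num` are only for illustration):

```r
# Hypothetical hour strings, shaped like the output of format(x, "%H").
hour_chr <- c("00", "07", "13", "23")

# as.numeric() is vectorized and drops the leading zeros.
hour_num <- as.numeric(hour_chr)

hour_num              # 0 7 13 23
is.numeric(hour_num)  # TRUE
```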
# Building Models Based on Different Variables
```{r}
model <- lm(count ~ . - casual - registered - datetime - atemp, data = bike)
summary(model)
```
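In a formula, `.` stands for every column of the data other than the response, and `- name` removes a column from that set. The sketch below shows the effect on the built-in `mtcars` dataset, used here only as a stand-in for the bike data. The name `fit_all` is hypothetical.

```r
# `mtcars` ships with base R; it is a stand-in, not the bike data.
fit_all <- lm(mpg ~ . - cyl - disp, data = mtcars)

# Every column except the response and the two excluded ones becomes a predictor.
predictors <- names(coef(fit_all))[-1]  # drop "(Intercept)"
expected   <- setdiff(names(mtcars), c("mpg", "cyl", "disp"))

setequal(predictors, expected)  # TRUE
```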