-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathR_regression.Rmd
108 lines (92 loc) · 4.27 KB
/
R_regression.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
---
title: 'DMwP-EnergyUsage'
author: 'Wenze ma, Darren Zhu, Lawrence Ko, Shiloh Musser, Steven Feng'
date: ' Jan 22, 2021'
output:
word_document: default
---
# Energy Usage in the U.S.(2007)
## "The U.S. Energy Information Adminstration collects and curates self-reported information from energy utilities about energy production and usage in the United States. This data set contains information from over 2,000 U.S. utilities in 2017. The information includes sources of energy, its uses in different economic sectors, and the revenues obtained from the sale of electrical energy."
```{r}
getwd()
setwd('C:/Users/fengr/Desktop/CSSE386_Data_Mining/DMwP-EnergyUsage') #set to the folder where the data is saved
electricity <- read.csv('electricity.csv', header = TRUE) #import the data to the R session
```
```{r}
head(electricity)
```
```{r}
plot(electricity$Sources.Generation, electricity$Sources.Purchased, main="Relationship Between Net Generation and Purchase of Power (in MGh)",
xlab="Power Generated (MGh) ", ylab="Power Purchased (MGh) ", pch=19)
```
It looks like they have inverse or exponential relationship, so just took reciprocal about the power generated.
```{r}
plot(1/electricity$Sources.Generation, electricity$Sources.Purchased, main="Relationship Between Net Generation and Purchase of Power (in MGh)",
xlab="1/Power Generated (MGh^-1) ", ylab="Power Purchased (MGh) ", pch=19)
```
```{r}
plot(log(electricity$Sources.Generation), electricity$Sources.Purchased, main="Relationship Between Net Generation and Purchase of Power (in MGh)",
xlab="1/Power Generated (MGh^-1) ", ylab="Power Purchased (MGh) ", pch=19)
```
```{r}
lm1=lm(electricity$Sources.Purchased~electricity$Sources.Generation)
summary(lm1)
```
```{r}
plot(lm1)
```
```{r}
# Create Training and Test data -
set.seed(7) # setting seed to reproduce results of random sampling
trainingRowIndex <- sample(1:nrow(electricity), 0.8*nrow(electricity)) # row indices for training data
trainingData <- electricity[trainingRowIndex, ] # model training data
testData <- electricity[-trainingRowIndex, ] # test data
```
```{r}
# Build the model on training data -
lmMod <- lm(electricity$Revenue.Total ~ electricity$Sources.Total + electricity$Retail.Total.Revenue + electricity$Retail.Total.Sales + electricity$Retail.Total.Customers, data=trainingData) # build the model
revPred <- predict(lmMod, testData) # predict distance
```
```{r}
summary (lmMod)
library(MASS)
```
```{r}
# Build the robust model on training data -
rlmMod <- rlm(electricity$Revenue.Total ~ electricity$Sources.Total + electricity$Retail.Total.Revenue + electricity$Retail.Total.Sales + electricity$Retail.Total.Customers, data=trainingData, maxit=40, psi=psi.bisquare) # build the model
revPred <- predict(lmMod, testData) # predict distance
```
```{r}
summary (rlmMod)
```
```{r}
DMwR::regr.eval(electricity$Revenue.Total, lmMod$fitted.values) # error for the above lm model
```
```{r}
DMwR::regr.eval(electricity$Revenue.Total, rlmMod$fitted.values) # error for the above rlm model
```
```{r}
library(plyr)
count(electricity$Revenue.Resale!=0)
```
```{r}
385/dim(electricity)[1]
```
```{r}
electricity$Resale[electricity$Revenue.Resale>0] <- 1
electricity$Resale[electricity$Revenue.Resale==0] <- 0
```
```{r}
pairs(~electricity$Revenue.Total + electricity$Resale + electricity$Retail.Total.Revenue + electricity$Retail.Total.Sales + electricity$Retail.Total.Customers)
```
```{r}
# Build the model on training data -
lmMod2 <- lm(electricity$Revenue.Total ~ electricity$Sources.Total + electricity$Retail.Total.Revenue + electricity$Retail.Total.Sales + electricity$Retail.Total.Customers + as.factor(electricity$Utility.Type) + electricity$Resale, data=trainingData) # build the model
revPred2 <- predict(lmMod2, testData) # predict distance
```
```{r}
summary(lmMod2)
```
Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.