---
title: "Abalone"
subtitle: "Lab 1"
author: "Brendan Alexander"
date: "9/27/2019"
output:
  html_document:
    toc: true
    toc_depth: 4
    toc_float: true
    code_folding: hide
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Import the data
```{r results=FALSE}
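# read.table() defaults to header = FALSE, so the columns arrive as V1..V9
data <- read.table("abalone.data", sep = ",")
head(data)
# Assign descriptive column names
names(data) <- c("sex", "length", "diameter", "height", "whole", "shucked", "viscera", "shell", "rings")
head(data)
# pairs(data)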
```
# Week 1
**Team lead presenting:** Madsen (grp 3)
## Proposal (1-3PM)
1. Height: Outlier issue
    - Check for outliers
    - Pulled 2 outliers from height
1. Check pairs again
```{r}
# Drop the two rows flagged as height outliers, then redraw the scatterplot matrix
data1 <- data[-c(2052, 1418), ]
pairs(data1)
```
Suggestions:

**Group 1**

- Stepwise regression to evaluate all the variables
- Remove outliers from weight
- Reconsider sex as: mature vs immature

**Group 2**

- Sex as a dummy variable?
- Don't want to kill animals; stepwise regression
- How to estimate weights without killing?
- Use whole weight, regress on shucked, viscera, and shell
- Derived variables from the weight

**Group 4**

- Correlations first, with `rings`
- Diameter has a strong correlation
- Combine length, diameter, and height to make a derived variable

## Proposal
1. Inspect correlations
    - sex, length, and diameter: highest correlation, and measurable without killing the animal
1. Box plots
    - comparative boxplots for the different sex categories
1. Outliers?
    - homogeneity of variance (HOV)?
    - if not homogeneous, transform the data
1. Fit a model
    - rings (or age) ~ (nothing specified)
1. Derived variable --> mature = male + female vs infant
    - boxplot against length and diameter
1. Derived variable --> `volume = length * diameter * height`
1. Multiple regression: whole ~ shucked + viscera + shell
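
As a rough illustration of the "inspect correlations" step, the sketch below ranks numeric predictors by their Pearson correlation with `rings`. The `toy` data frame and its values are made up for the example; in the lab, the cleaned abalone data frame would take its place.

```r
# Made-up illustrative rows (an assumption); substitute the real abalone data.
toy <- data.frame(
  length   = c(0.35, 0.44, 0.53, 0.46, 0.60, 0.33),
  diameter = c(0.27, 0.37, 0.42, 0.35, 0.47, 0.26),
  height   = c(0.09, 0.13, 0.14, 0.11, 0.16, 0.08),
  rings    = c(7, 10, 9, 11, 15, 8)
)

# Pearson correlation of each numeric predictor with rings
predictors <- setdiff(names(toy), "rings")
cors <- sapply(toy[predictors], function(x) cor(x, toy$rings))

# Strongest positive correlation first
cors_sorted <- sort(cors, decreasing = TRUE)
print(round(cors_sorted, 2))
```

`sex` is categorical, so it is left out here; comparative boxplots are one way to look at it instead.
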
# Week 2: Fulfilling the proposal, collaborating with git
You all had several good initial analysis ideas.
Unfortunately, your programmer was fired.
You have to code it yourself.
Let's break the problem down into the following components:
1. Data cleaning
    - Please remove the 2 outliers from the `height` variable and re-run the `pairs` plot.
1. Variable manipulation
    - Create a new variable "mature" based on "sex"
        - Male and female are mature, infants are not
    - Create a new variable "volume" based on "length", "diameter", and "height"
1. Visualize the data
    - Plot metrics by sex
1. Model fit: Can we predict shucked, viscera, and shell weight based on whole weight?
1. Model fit: Perform an initial model fit for the data.
    - You can use a `stepwise` procedure if you like
    - You can use `lm` or `glm`

Groups are the same as last time.
Group leaders are now:
1. Yuting
1. Jenny
1. Fik
1. Allen
Group leaders are responsible for reporting their analysis.
## Data cleaning
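One possible starting point, shown on made-up data: instead of hard-coding row numbers, locate the two most extreme `height` values and drop those rows. Treating the two largest heights as the outliers is an assumption of this sketch; check it against the `pairs` plot before relying on it.

```r
# Made-up illustrative rows (an assumption); substitute the real abalone data.
toy <- data.frame(
  height = c(0.09, 0.13, 1.13, 0.11, 0.52, 0.08, 0.14),
  rings  = c(7, 10, 9, 11, 15, 8, 9)
)

# Row positions of the two largest heights (assumed here to be the outliers)
outlier_rows <- order(toy$height, decreasing = TRUE)[1:2]

# Negative indexing drops those rows and keeps everything else
toy_clean <- toy[-outlier_rows, ]

print(outlier_rows)
print(nrow(toy_clean))
```

Calling `pairs(toy_clean)` afterwards redraws the scatterplot matrix without the dropped rows.
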
## Variable manipulation
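A minimal sketch of the two derived variables, on made-up rows. It assumes `sex` is coded as `"M"`, `"F"`, and `"I"` (male, female, infant); confirm the coding with `table(data$sex)` first.

```r
# Made-up illustrative rows (an assumption); substitute the real abalone data.
toy <- data.frame(
  sex      = c("M", "F", "I", "M", "I"),
  length   = c(0.455, 0.530, 0.330, 0.440, 0.425),
  diameter = c(0.365, 0.420, 0.255, 0.365, 0.300),
  height   = c(0.095, 0.135, 0.080, 0.125, 0.095)
)

# mature: TRUE for males and females, FALSE for infants
# (assumes the codes "M", "F", "I")
toy$mature <- toy$sex %in% c("M", "F")

# volume: product of the three linear dimensions.
# This is a bounding-box proxy, not a true volume.
toy$volume <- toy$length * toy$diameter * toy$height

print(toy)
```

`%in%` works the same way whether `sex` was read in as a character vector or as a factor.
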
## Visualize the data
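One way to plot metrics by sex is a row of comparative boxplots, one panel per metric. The data below are made up, and the choice of `length` and `diameter` as the metrics is only an example.

```r
# Made-up illustrative rows (an assumption); substitute the real abalone data.
toy <- data.frame(
  sex      = factor(c("M", "F", "I", "M", "I", "F", "I", "M")),
  length   = c(0.455, 0.530, 0.330, 0.440, 0.425, 0.545, 0.355, 0.475),
  diameter = c(0.365, 0.420, 0.255, 0.365, 0.300, 0.425, 0.280, 0.370)
)

metrics <- c("length", "diameter")

# One panel per metric, side by side; remember the old layout to restore it
old_par <- par(mfrow = c(1, length(metrics)))

# boxplot() draws the panel and invisibly returns its summary statistics
stats_by_metric <- lapply(metrics, function(m) {
  boxplot(toy[[m]] ~ toy$sex, main = m, xlab = "sex", ylab = m)
})
names(stats_by_metric) <- metrics

par(old_par)
```

The returned list keeps the five-number summaries per sex, which can be handy when reporting group differences.
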
## Model fit 1: Modeling the weight variables
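A hedged sketch of one approach: fit all three weight components against `whole` at once by giving `lm` a matrix response. The data are simulated, and the slopes used to generate them are arbitrary, not estimates from the abalone data.

```r
# Simulated illustrative data (an assumption); the slopes below are arbitrary.
set.seed(1)
whole <- runif(30, 0.2, 2.0)
toy <- data.frame(
  whole   = whole,
  shucked = 0.43 * whole + rnorm(30, sd = 0.03),
  viscera = 0.22 * whole + rnorm(30, sd = 0.02),
  shell   = 0.28 * whole + rnorm(30, sd = 0.02)
)

# cbind() on the left-hand side fits one regression per response column
fit_w <- lm(cbind(shucked, viscera, shell) ~ whole, data = toy)

# Coefficient matrix: rows are terms, columns are responses
print(coef(fit_w))

# Predicted component weights for a hypothetical animal with whole weight 1.0
new_animal <- data.frame(whole = 1.0)
print(predict(fit_w, newdata = new_animal))
```

Fitting three separate `lm(shucked ~ whole)`-style models gives the same coefficients; the matrix form just keeps them together.
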
## Model fit 2: Initial rings fit
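A minimal sketch of an initial fit with backward stepwise selection, using `lm` and `step` from base R. The data are simulated, with a deliberately irrelevant `noise` column included so there is something for the selection to consider dropping.

```r
# Simulated illustrative data (an assumption); substitute the real abalone data.
set.seed(2)
n <- 60
toy <- data.frame(
  length = runif(n, 0.2, 0.7),
  whole  = runif(n, 0.2, 2.0),
  noise  = rnorm(n)  # deliberately unrelated to rings
)
toy$rings <- 3 + 10 * toy$length + 2 * toy$whole + rnorm(n, sd = 1)

# Start from the full model, then let step() drop terms that do not lower AIC
full <- lm(rings ~ length + whole + noise, data = toy)
selected <- step(full, direction = "backward", trace = 0)

print(formula(selected))
print(summary(selected)$coefficients)
```

Because `rings` is a count, `glm(..., family = poisson)` is a reasonable alternative to try, and `step` accepts a `glm` fit in the same way.
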