Commit 9d5bf96

Bhanu Mittal authored and committed
Completed the tutorial
1 parent a4c5c6c commit 9d5bf96

1 file changed: +81 -2 lines changed

README.md

@@ -33,7 +33,7 @@ python3 --version
3333
which python3
3434
```
3535

36-
The ouput of the directory should be somethhing like "/usr/local/bin/python3.7" or "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"
36+
The output of the command should be a path like "/usr/local/bin/python3.7" or "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"
3737

3838
## Getting Started
3939

@@ -55,7 +55,86 @@ if __name__ == '__main__':
5555

5656

5757
## Solving our first ML Problem
58-
We are gonna work on our first dataset by using simple [linear regression](http://onlinestatbook.com/2/regression/intro.html). We would be using a fairly small dataset for our problem called [Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg).
58+
We are going to work on our first dataset using simple [linear regression](http://onlinestatbook.com/2/regression/intro.html). We will use a fairly small dataset called [Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg). Before we get started, we need to install some libraries in our Python environment. To do that, click the Terminal tab at the bottom of the PyCharm window; its prompt should show (venv) as the active environment. Run the following commands in it.
59+
```bash
pip install pandas
pip install scikit-learn
pip install matplotlib
```
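Before moving on, it can help to confirm that the libraries are visible to the interpreter in the (venv) environment. The sketch below is an illustration rather than part of the original tutorial: `missing_packages` is a hypothetical helper name, and it relies only on the standard library's `importlib.util.find_spec`. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == '__main__':
    # Import names of the libraries installed above
    missing = missing_packages(['pandas', 'sklearn', 'matplotlib'])
    if missing:
        print('Still missing:', ', '.join(missing))
    else:
        print('All libraries are installed.')
```

If a name is reported as missing, the most likely cause is that the terminal is not using the (venv) environment.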
64+
Our dataset is in the file "auto-mpg.data", so download it into your project directory.
65+
66+
### Preparing the Dataset
67+
First, let's see how we can explore the dataset in Python. Observe that the values in each row of the dataset are separated by runs of spaces of varying length, so we read the file line by line and split each line on whitespace. The source code below does this.
68+
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn.model_selection  # Makes sklearn.model_selection available explicitly
from sklearn.linear_model import LinearRegression

headers = ['cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model year', 'origin']
mpg = []       # Dependent variable: mpg (miles per gallon)
features = []  # Independent variables: every column except the car name

# Preparing the data
with open('auto-mpg.data') as raw:
    for line in raw:
        if not line.strip():  # Skip blank lines, if any
            continue
        var = []
        for i, y in enumerate(line.split()):
            if i == 0:  # The first column is mpg
                mpg.append([float(y)])
            elif i < 8:  # Columns 1-7 are features; the rest is the car name
                if y == '?':  # Handling missing values for "horsepower"
                    y = '0'   # Set them to 0 for now; we replace them later
                # float() accepts a trailing dot such as '3504.' in "weight"
                var.append(float(y))
        features.append(var)

df_x = pd.DataFrame.from_records(features, columns=headers)
df_y = pd.DataFrame.from_records(mpg, columns=['mpg'])
```
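To make the parsing step concrete, here is a minimal sketch of how a single row breaks down into the target value and the seven features. `parse_line` is a hypothetical helper introduced only for illustration, the column order is assumed to be the one given by the headers above with mpg first, and the sample row is made up rather than copied from the dataset.

```python
def parse_line(line):
    """Split one row of the data file into (mpg, features).

    Assumes the column order mpg, cylinders, displacement, horsepower,
    weight, acceleration, model year, origin, car name.
    """
    fields = line.split()
    mpg = float(fields[0])
    # Fields 1-7 are the features; '?' marks a missing value, stored as 0
    features = [0.0 if f == '?' else float(f) for f in fields[1:8]]
    return mpg, features


# A made-up row in the same layout (not copied from the dataset)
sample = '20.0   4   120.0   ?   2500.   15.0   75  1\t"example car name"'
print(parse_line(sample))
# → (20.0, [4.0, 120.0, 0.0, 2500.0, 15.0, 75.0, 1.0])
```

The car name is split into several tokens by `split()`, which is why only the first eight fields are used.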
103+
104+
Now we have the entire dataset in the variables '*df_x*' and '*df_y*'. You can preview these variables and get an overall summary of them using:
105+
```python
print(df_x.describe(include='all'))  # Summary statistics of the features
print(df_y.describe(include='all'))  # Summary statistics of mpg
print(df_x.head())  # Prints the first 5 rows
print(df_y.head())  # Prints the first 5 rows
```
111+
Now let us replace the missing horsepower values, which are currently stored as 0, with the average of the known values.
112+
```python
# Average horsepower over the rows where it is known (missing ones are 0)
known = df_x['horsepower'] != 0
avg_bhp = df_x.loc[known, 'horsepower'].mean()
df_x['horsepower'] = df_x['horsepower'].replace(0, avg_bhp)
```
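The imputation step boils down to averaging only the known values and writing that average into the gaps. Here is a minimal pure-Python sketch on a made-up list; `fill_zeros_with_mean` is a hypothetical helper, and the numbers are illustrative, not taken from the dataset.

```python
def fill_zeros_with_mean(values):
    """Replace 0 placeholders with the mean of the non-zero entries."""
    known = [v for v in values if v != 0]
    mean_known = sum(known) / len(known)
    return [mean_known if v == 0 else v for v in values]


horsepower = [100.0, 0, 150.0, 200.0, 0]  # Two placeholders for missing values
print(fill_zeros_with_mean(horsepower))
# → [100.0, 150.0, 150.0, 200.0, 150.0]
```

Averaging all five entries, zeros included, would give 90 instead of 150, which is why the placeholders have to be left out of the average.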
116+
Now let's train our first ML model in Python.
117+
```python
X_train, X_test, Y_train, Y_test = sklearn.model_selection.train_test_split(
    df_x, df_y, test_size=0.2, random_state=5)
lm = LinearRegression()

lm.fit(X_train, Y_train)
pred_test = lm.predict(X_test)

print('Coefficients (all vars):')
print(lm.coef_)

print('Intercept (all vars):')
print(lm.intercept_)

# score() returns the coefficient of determination (R^2), not an accuracy
print('R^2 score (all vars): %s' % lm.score(X_test, Y_test))

residues = (Y_test - pred_test)

# X_test has seven columns, so plot predicted against actual mpg instead
plt.scatter(Y_test, pred_test, color='black')  # One point per test car
lims = [Y_test['mpg'].min(), Y_test['mpg'].max()]
plt.plot(lims, lims, color='blue', linewidth=3)  # Perfect-prediction line
plt.xlabel('Actual mpg')
plt.ylabel('Predicted mpg')
plt.show()
```
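The value returned by `lm.score` for a `LinearRegression` is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, rather than a classification accuracy. The sketch below computes the same quantity by hand on made-up numbers; `r_squared` is a hypothetical helper name used only for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0]     # Made-up mpg values
predicted = [15.0, 20.0, 25.0]  # Made-up model outputs
print(r_squared(actual, predicted))
# → 0.75
```

A value of 1.0 means the predictions match the actual values exactly, while a value near 0 means the model does no better than always predicting the mean.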
138+
Voila! You have successfully completed your first ML model.
60139
## License
61140
This guide is free to use for non-commercial purposes. For any improvements please reach out to me on: bhanu93(dot)iitd(at)gmail(dot)com.
