Lab07
mrsillydog committed Mar 20, 2020
1 parent ff53412 commit 3213f95
Showing 3 changed files with 66 additions and 0 deletions.
14 changes: 14 additions & 0 deletions lab07/lab07_1.txt
@@ -0,0 +1,14 @@
Exercise 7.1

a.
Exercise 1
cities["Begins with San and 50+ sq. miles"] = cities["City name"].str.startswith("San") & (cities["Area square miles"] > 50)
cities

Exercise 2
cities.reindex(np.random.permutation(cities.index + 5))
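
As a minimal sketch of what the line above does: adding 5 to the index produces labels that do not exist in the original DataFrame, and reindex returns a row of NaN for every label it cannot find. The small `cities` frame here is a hypothetical stand-in, not the lab's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lab's cities DataFrame (index 0, 1, 2).
cities = pd.DataFrame({
    "City name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# cities.index + 5 yields the labels 5, 6, 7, none of which exist in cities.
shifted = cities.reindex(np.random.permutation(cities.index + 5))

# Every requested label is missing, so every cell is NaN.
print(shifted)
```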

b. Pandas presents data in a more understandable form than NumPy: it provides labeled columns and an index on the rows, whereas with NumPy it is very easy to lose track of which column represents what. The 2-dimensional DataFrame object is also very convenient.
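
To illustrate the point about labels, here is a small sketch that reads the same column once by position (NumPy) and once by name (pandas). The values and the city labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative values only: column 0 is population, column 1 is area.
raw = np.array([[852469, 46.87],
                [1015785, 176.53],
                [485199, 97.92]])

# With NumPy, the reader has to remember that column 1 holds the area.
areas_numpy = raw[:, 1]

# With pandas, the same data carries its own column and row labels.
cities = pd.DataFrame(raw,
                      columns=["Population", "Area square miles"],
                      index=["San Francisco", "San Jose", "Sacramento"])
areas_pandas = cities["Area square miles"]

print(areas_pandas["San Jose"])
```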

c. One such scenario is preparing a random sample: instead of sampling at random indices, or risking bias from the order in which the data happens to be stored, one could just randomly reorder the DataFrame and take the first X rows.
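
A minimal sketch of that shuffle-then-slice approach, assuming a hypothetical DataFrame `df` whose rows are stored in sorted order. The seed and the sample size X = 10 are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose rows are sorted by value (a non-random order).
df = pd.DataFrame({"value": range(100)})

# The seed is an arbitrary choice that makes the shuffle repeatable.
rng = np.random.default_rng(seed=0)
shuffled = df.reindex(rng.permutation(df.index.to_numpy()))

# The first X rows of the shuffled frame are a random sample without replacement.
X = 10
sample = shuffled.head(X)
print(sample)
```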

26 changes: 26 additions & 0 deletions lab07/lab07_2.txt
@@ -0,0 +1,26 @@
Exercise 7.2

a. Numerical data is data with number values (1, 100.4, 55). Categorical data is data with discrete labels as values ("True" or "False", "Dog" or "Cat" or "Parrot").

b.

Task 1:

train_model(
    learning_rate=0.00003,
    steps=500,
    batch_size=5
)

Task 2:

train_model(
    learning_rate=0.00003,
    steps=500,
    batch_size=5,
    input_feature="population"
)

c. Hyper-parameters are parameters which dictate how the machine learning algorithm learns (for example learning_rate, steps, and batch_size above), as opposed to the model's own parameters, which are learned from the data. Put a different way, hyper-parameters are set by the developer so that the regular parameters can be fitted well by the machine learning algorithm.

There is no standard tuning algorithm for hyper-parameters, since the effects of the various hyper-parameters are data dependent. The only way to truly judge a setting is therefore to train with it on the data and compare the results.
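
As a sketch of that trial-and-error process, the toy example below fits y = w * x by gradient descent with three candidate learning rates and keeps the one with the lowest error. The `train` function, the data, and the candidate values are assumptions made for illustration; this is not the lab's train_model.

```python
def train(learning_rate, steps, xs, ys):
    """Fit y = w * x by gradient descent; return (w, mean squared error)."""
    w = 0.0  # the model parameter, learned from the data
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    mse = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, mse

# Toy data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Try each candidate learning rate (the hyper-parameter) and compare errors.
results = {lr: train(lr, steps=50, xs=xs, ys=ys) for lr in (0.0001, 0.01, 0.2)}
best_lr = min(results, key=lambda lr: results[lr][1])
print(best_lr, results[best_lr])
```

On this data the smallest rate barely moves w within 50 steps and the largest one overshoots and diverges, which is why the choice has to be checked against the data rather than derived in advance.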
26 changes: 26 additions & 0 deletions lab07/lab07_3.txt
@@ -0,0 +1,26 @@
Exercise 7.3

a.

Task 1

california_housing_dataframe["rooms_per_person"] = california_housing_dataframe["total_rooms"] / california_housing_dataframe["population"]

calibration_data = train_model(
    learning_rate=0.04,
    steps=500,
    batch_size=5,
    input_feature="rooms_per_person"
)

Task 2

plt.scatter(calibration_data["predictions"], calibration_data["targets"])

Task 3

california_housing_dataframe["rooms_per_person"] = california_housing_dataframe["rooms_per_person"].apply(lambda x: min(5, x))

b. Synthetic features which combine two other features allow us to explore how well a combination of multiple features predicts or relates to a single target value.
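
A small sketch of this idea, using made-up numbers in which the target is exactly 10 times rooms_per_person. The `target` column and all values are hypothetical; the point is only that the ratio can track the target far better than either input does on its own.

```python
import pandas as pd

# Made-up data: target is constructed as 10 * total_rooms / population.
df = pd.DataFrame({
    "total_rooms": [100.0, 200.0, 300.0, 400.0],
    "population": [50.0, 400.0, 100.0, 800.0],
    "target": [20.0, 5.0, 30.0, 5.0],
})

# The synthetic feature combines the two raw features.
df["rooms_per_person"] = df["total_rooms"] / df["population"]

# Pearson correlation of each feature with the target.
correlations = {
    name: df[name].corr(df["target"])
    for name in ("total_rooms", "population", "rooms_per_person")
}
print(correlations)
```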

c. Outliers are pieces of data with values that very much stand out from the majority of the data. The usual action with outliers is not to get rid of them entirely, but rather to clip them to a reasonable maximum or minimum value.
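
Clipping can be sketched with pandas' Series.clip, which has the same effect as applying min(5, x) to every value. The series below is hypothetical, with one extreme entry; the cap of 5 matches the value used in Task 3.

```python
import pandas as pd

# Hypothetical rooms_per_person values with a single extreme outlier.
rooms_per_person = pd.Series([1.8, 2.1, 2.4, 55.0, 1.9])

# Cap values at 5; the outlier row is kept but no longer dominates.
clipped = rooms_per_person.clip(upper=5)
print(clipped.tolist())
```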
