Lab07

mrsillydog · Mar 20, 2020 · 3213f95
1 parent ff53412
commit 3213f95

Showing 3 changed files with 66 additions and 0 deletions.
diff --git a/lab07/lab07_1.txt b/lab07/lab07_1.txt
@@ -0,0 +1,14 @@
+Exercise 7.1
+
+a.
+Exercise 1
+cities["Begins with San and 50+ sq. miles"] = (cities["City name"].apply(lambda name: name.startswith("San")) & cities["Area square miles"].apply(lambda area: area > 50))
+cities
+
+Exercise 2
+cities.reindex(np.random.permutation(cities.index + 5))
+
+b. pandas presents data in a more understandable form than NumPy: it provides labeled columns and an index on the rows, whereas with a plain NumPy array it is very easy to lose track of which column represents what. The 2-dimensional DataFrame object is also very convenient for tabular data.
+
+c. One such scenario is preparing for random sampling: instead of sampling at random indices, or risking bias from however the data happen to be ordered, one can randomly reorder the DataFrame and take the first X rows.
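
Editor's sketch for answer c: a minimal illustration of "shuffle, then take the first X rows", assuming a small stand-in for the lab's `cities` DataFrame. The row values, the seed, and the names `shuffled`, `sample`, and `k` are illustrative assumptions, not part of the submitted answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lab's `cities` DataFrame (made-up values).
cities = pd.DataFrame({
    "City name": ["A", "B", "C", "D", "E"],
    "Area square miles": [10.0, 60.0, 75.5, 20.0, 300.0],
})

rng = np.random.default_rng(seed=0)  # fixed seed so the shuffle is repeatable

# Reorder the rows by a random permutation of the existing index labels.
# Every label already exists, so no rows are lost and no NaN rows appear.
shuffled = cities.reindex(rng.permutation(cities.index.to_numpy()))

# The first k rows of the shuffled frame are a random sample
# drawn without replacement.
k = 3
sample = shuffled.head(k)
```

Because the permutation only reuses labels that are already in the index, this differs from Exercise 2 above, where shifting the index by 5 asks `reindex` for labels that may not exist.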
+
diff --git a/lab07/lab07_2.txt b/lab07/lab07_2.txt
@@ -0,0 +1,26 @@
+Exercise 7.2
+
+a. Numerical data is data with number values (1, 100.4, 55). Categorical data is data with discrete labels as values ("True" or "False", "Dog" or "Cat" or "Parrot").
+
+b. 
+
+Task 1:
+
+train_model(
+    learning_rate=0.00003,
+    steps=500,
+    batch_size=5
+)
+
+Task 2:
+
+train_model(
+    learning_rate=0.00003,
+    steps=500,
+    batch_size=5,
+    input_feature="population"
+)
+
+c. Hyper-parameters are settings that dictate how the machine learning algorithm learns (for example learning_rate, steps, and batch_size above), as opposed to the model parameters that are learned from the data. Put a different way, hyper-parameters are chosen by the developer so that the learning algorithm can fit the regular parameters well.
+
+There is no standard tuning algorithm for hyper-parameters, since the effects of the various hyper-parameters are data dependent. The only way to truly validate a setting is to train with it on the data and compare the results.
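
Editor's sketch for answer c: one way to "test on the data" is to train once per candidate setting and compare the final error. The lab's `train_model` is not available here, so `fit_linear` below is a hypothetical stand-in (full-batch gradient descent on a one-feature linear model), and the data are synthetic. None of this is the lab's own code.

```python
import numpy as np

def fit_linear(x, y, learning_rate, steps):
    """Hypothetical stand-in for train_model: fit y ~ w * x + b with
    full-batch gradient descent and return the final RMSE."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = w * x + b - y
        # Gradients of the mean squared error with respect to w and b.
        w -= learning_rate * 2.0 * np.mean(error * x)
        b -= learning_rate * 2.0 * np.mean(error)
    return float(np.sqrt(np.mean((w * x + b - y) ** 2)))

# Synthetic data: y = 3x + 1 plus a little noise (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)

# Train once per candidate learning rate and keep the one with the lowest RMSE.
candidate_rates = [0.0001, 0.01, 0.5]
rmse_by_rate = {lr: fit_linear(x, y, lr, steps=500) for lr in candidate_rates}
best_rate = min(rmse_by_rate, key=rmse_by_rate.get)
```

The ranking of the candidates depends on the scale of the feature, which is why a rate that suits one input (such as "population") may need to change for another.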
diff --git a/lab07/lab07_3.txt b/lab07/lab07_3.txt
@@ -0,0 +1,26 @@
+Exercise 7.3
+
+a. 
+
+Task 1
+
+california_housing_dataframe["rooms_per_person"] = california_housing_dataframe["total_rooms"] / california_housing_dataframe["population"]
+
+calibration_data = train_model(
+    learning_rate=0.04,
+    steps=500,
+    batch_size=5,
+    input_feature="rooms_per_person"
+)
+
+Task 2
+
+plt.scatter(calibration_data["predictions"], calibration_data["targets"])
+
+Task 3
+
+california_housing_dataframe["rooms_per_person"] = california_housing_dataframe["rooms_per_person"].apply(lambda x: min(5, x))
+
+b. Synthetic features that combine two other features let us explore how well a combination of several input features predicts, or relates to, a single target.
+
+c. Outliers are data points whose values stand far apart from the majority of the data. The usual action is not to remove them entirely, but to clip them to a more reasonable maximum or minimum value.
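
Editor's sketch for answers b and c: building the synthetic feature and then clipping its outliers, assuming a tiny made-up stand-in for `california_housing_dataframe` (the name `housing` and all values are hypothetical). It uses pandas' `Series.clip`, which should give the same result as the `apply(lambda x: min(5, x))` approach in Task 3.

```python
import pandas as pd

# Hypothetical miniature of california_housing_dataframe; values are made up.
housing = pd.DataFrame({
    "total_rooms": [2000.0, 1500.0, 720.0, 1501.0, 90.0],
    "population": [1000.0, 600.0, 333.0, 515.0, 3.0],
})

# Synthetic feature: the ratio of two existing columns.
housing["rooms_per_person"] = housing["total_rooms"] / housing["population"]

# The last row works out to 30 rooms per person, far from the rest.
# Cap values at 5 instead of dropping the row, and store the result back.
housing["rooms_per_person"] = housing["rooms_per_person"].clip(upper=5)
```

Note that the clipped Series has to be assigned back to the column; calling `clip` or `apply` on its own returns a new Series and leaves the DataFrame unchanged.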