Commit

Added class 34 and practice 10 material
emeyers committed Apr 20, 2022
1 parent bcc8ab4 commit 09642f5
Showing 10 changed files with 3,164 additions and 0 deletions.
Binary file added demos/lec34.zip
Binary file not shown.
1,373 changes: 1,373 additions & 0 deletions demos/lec34/banknote.csv

Large diffs are not rendered by default.

51 changes: 51 additions & 0 deletions demos/lec34/fruit_baskets.csv
@@ -0,0 +1,51 @@
Bananas,Clementines,Weight
9,19,7.9158474996816794
4,25,6.698709726621817
5,26,7.367896742177805
7,26,8.018789791219415
8,21,7.57554432183902
9,23,8.333710090812431
5,16,5.931816559231025
5,16,5.848549073185536
9,17,7.895173563047283
6,23,7.011109795165099
4,26,6.668068372929213
9,21,7.961371444638614
9,17,7.505879200699456
5,28,7.174203037373924
5,24,6.738540314853398
7,17,7.039008068707994
6,20,7.013657264456396
5,20,6.623699979885603
8,27,8.331002624512315
6,20,6.780726410571946
6,15,6.183445705387013
8,29,8.662572475638402
6,19,6.636719416937433
4,26,6.7231498206239255
4,26,6.749736069327665
9,29,9.028376079829965
4,15,5.5308245420577435
4,22,6.055992593491173
9,17,7.599693180071183
5,18,6.217862693193167
7,14,6.433308301987923
7,27,7.929009194407139
8,22,7.8057436171132535
4,27,6.97372260670841
7,19,7.290583856036912
4,20,6.297207911132039
7,17,6.89676373322237
6,27,7.619274930152079
9,21,8.096026889276203
6,18,6.561978517200374
8,27,8.44991946519999
9,20,7.946105322785703
5,20,6.376491134061626
7,14,6.653772139552654
9,27,8.78023129861189
7,28,8.13465575082018
9,17,7.658174638520082
7,22,7.608134883262746
8,29,8.636244227191405
9,29,9.138903778812676
314 changes: 314 additions & 0 deletions demos/lec34/lec34.ipynb
@@ -0,0 +1,314 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"from datascience import *\n",
"import numpy as np\n",
"import matplotlib\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plots\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"plots.style.use('fivethirtyeight')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lecture 34"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"def standard_units(arr):\n",
"    return (arr - np.average(arr))/np.std(arr)\n",
"\n",
"def correlation(t, x, y):\n",
"    x_standard = standard_units(t.column(x))\n",
"    y_standard = standard_units(t.column(y))\n",
"    return np.average(x_standard * y_standard)\n",
"\n",
"def slope(t, x, y):\n",
"    r = correlation(t, x, y)\n",
"    y_sd = np.std(t.column(y))\n",
"    x_sd = np.std(t.column(x))\n",
"    return r * y_sd / x_sd\n",
"\n",
"def intercept(t, x, y):\n",
"    x_mean = np.mean(t.column(x))\n",
"    y_mean = np.mean(t.column(y))\n",
"    return y_mean - slope(t, x, y)*x_mean\n",
"\n",
"def get_fitted_values(t, x, y):\n",
"    \"\"\"Return an array of the regression estimates at all the x values\"\"\"\n",
"    a = slope(t, x, y)\n",
"    b = intercept(t, x, y)\n",
"    return a*t.column(x) + b\n",
"\n",
"def get_residuals(t, x, y):\n",
"    predictions = get_fitted_values(t, x, y)\n",
"    return t.column(y) - predictions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regression Model \n",
"\n",
"Let's examine the relationship between:\n",
"\n",
"- The true regression line, which captures the linear relationship between two variables (green line)\n",
"- A random sample of n points generated from the underlying linear relationship plus random noise around the regression line\n",
"- A line fit to the sample of points that approximates the true regression line (i.e., the \"line of best fit\", shown in blue)\n",
"\n",
"To do this we will use the function `draw_and_compare` defined below that takes three arguments:\n",
"\n",
"1. The true slope of a linear relationship between our variables\n",
"2. The true y-intercept of a linear relationship between our variables\n",
"3. A sample size (n) of random points that will be used to calculate the \"line of best fit\"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"def draw_and_compare(true_slope, true_int, sample_size):\n",
"\n",
"    x = np.random.normal(50, 5, sample_size)\n",
"    xlims = np.array([np.min(x), np.max(x)])\n",
"    errors = np.random.normal(0, 6, sample_size)\n",
"    y = (true_slope * x + true_int) + errors\n",
"    sample = Table().with_columns('x', x, 'y', y)\n",
"\n",
"    sample.scatter('x', 'y')\n",
"    plots.plot(xlims, true_slope*xlims + true_int, lw=2, color='green')\n",
"    plots.title('True Line, and Points Created')\n",
"\n",
"    sample.scatter('x', 'y')\n",
"    plots.title('What We Get to See')\n",
"\n",
"    sample.scatter('x', 'y', fit_line=True)\n",
"    plots.title('Regression Line: Estimate of True Line')\n",
"\n",
"    sample.scatter('x', 'y', fit_line=True)\n",
"    plots.plot(xlims, true_slope*xlims + true_int, lw=2, color='green')\n",
"    plots.title(\"Regression Line and True Line\")"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"# use a true slope of 2 and a true intercept of -5, and draw 10 random points\n"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"# use a true slope of 2 and a true intercept of -5, and draw 100 random points\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bootstrap slopes, intercepts and regression lines"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load data on fruits\n",
"fruit = Table.read_table('fruit_baskets.csv')\n",
"fruit.show(3)"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"# take a random sample (with replacement) from our original fruit sample and fit a regression line\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"# create a bootstrap distribution for the slope and intercept\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"# visualize all the bootstrap lines\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"# create a 95% confidence interval for the regression slope\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"# visualize the bootstrap distribution\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"# Question:\n",
"# Is a slope of 0 plausible?\n",
"# i.e., is it plausible that there is no linear association between the number of Clementines and Weight?\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Question: could you run a hypothesis test assessing whether the regression slope is 0? "
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"# create a null distribution \n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [],
"source": [
"# visualize the null distribution and compare it to slope calculated on the real data\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Classification"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Can you tell if a banknote is counterfeit or legitimate?\n",
"# Variables are based on photographs of many banknotes (a few numbers calculated for each image)\n",
"\n",
"banknotes = Table.read_table('banknote.csv')\n",
"banknotes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize 'WaveletVar' and 'WaveletCurt'\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize 'WaveletSkew', 'Entropy'\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two attributes have some overlap of classes...what happens with three attributes?\n",
"fig = plots.figure(figsize=(8,8))\n",
"ax = fig.add_subplot(111, projection='3d')\n",
"\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
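
The notebook above leaves the bootstrap cells blank for completion in class. As a rough sketch of what the "bootstrap distribution for the slope" and "95% confidence interval" steps could look like, the block below uses plain NumPy rather than the `datascience` `Table` API. The helper names `fit_slope` and `bootstrap_slopes`, the repetition count, and the seed are assumptions made for illustration, not part of the committed notebook. Only the first ten rows of fruit_baskets.csv are used, to keep the example short.

```python
import numpy as np

# First ten rows of fruit_baskets.csv (Clementines and Weight columns),
# copied from the diff above to keep this sketch self-contained.
clementines = np.array([19, 25, 26, 26, 21, 23, 16, 16, 17, 23], dtype=float)
weight = np.array([
    7.9158474996816794, 6.698709726621817, 7.367896742177805,
    8.018789791219415, 7.57554432183902, 8.333710090812431,
    5.931816559231025, 5.848549073185536, 7.895173563047283,
    7.011109795165099,
])


def fit_slope(x, y):
    """Least-squares slope computed as r * SD(y) / SD(x), as in the notebook's slope()."""
    x_su = (x - np.mean(x)) / np.std(x)
    y_su = (y - np.mean(y)) / np.std(y)
    r = np.mean(x_su * y_su)
    return r * np.std(y) / np.std(x)


def bootstrap_slopes(x, y, repetitions=2000, seed=0):
    """Resample (x, y) pairs with replacement and refit the slope each time.

    `repetitions` and `seed` are illustrative defaults, not values from the lecture.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(repetitions)
    for i in range(repetitions):
        idx = rng.integers(0, n, size=n)
        # Guard for this small sample: redraw if every resampled x is identical,
        # since the slope is undefined when SD(x) is 0.
        while np.std(x[idx]) == 0:
            idx = rng.integers(0, n, size=n)
        slopes[i] = fit_slope(x[idx], y[idx])
    return slopes


observed = fit_slope(clementines, weight)
slopes = bootstrap_slopes(clementines, weight)

# Percentile method: the middle 95% of the bootstrap slopes.
lower, upper = np.percentile(slopes, [2.5, 97.5])
print("observed slope:", observed)
print("approximate 95% CI for the slope:", (lower, upper))
```

If the resulting interval excludes 0, that is evidence against a true slope of 0, which is the question the notebook poses. With only ten rows the interval will be wide; running the same procedure on all 50 rows should narrow it.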