
Commit 7bd40a8

committed: update
1 parent 7bcccbb commit 7bd40a8

File tree

9 files changed, +589 -178 lines changed
Lines changed: 360 additions & 0 deletions
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7d56b2d5",
"metadata": {
"editable": true
},
"source": [
"<!-- HTML file automatically generated from DocOnce source (https://github.com/doconce/doconce/)\n",
"doconce format html exercisesweek37.do.txt -->\n",
"<!-- dom:TITLE: Exercises week 36 -->"
]
},
{
"cell_type": "markdown",
"id": "c7a8e9c7",
"metadata": {
"editable": true
},
"source": [
"# Exercises week 36\n",
"**Implementing gradient descent for Ridge and Ordinary Least Squares Regression**\n",
"\n",
"Date: **September 8-12, 2025**"
]
},
{
"cell_type": "markdown",
"id": "cf8f0ecb",
"metadata": {
"editable": true
},
"source": [
"## Learning goals\n",
"\n",
"After completing these exercises you will:\n",
"1. Have your own implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n",
"\n",
"2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n",
"\n",
"3. Have explored the role of the learning rate in the gradient descent approach and of the hyperparameter $\\lambda$ in Ridge regression\n",
"\n",
"4. Know how to scale the data properly"
]
},
{
"cell_type": "markdown",
"id": "a67ae548",
"metadata": {
"editable": true
},
"source": [
"## Ridge regression and a new Synthetic Dataset\n",
"\n",
"We create a synthetic linear regression dataset with a sparse\n",
"underlying relationship. This means we have many features but only a\n",
"few of them actually contribute to the target. In our example, we’ll\n",
"use 10 features with only 3 non-zero weights in the true model. This\n",
"way, the target is generated as a linear combination of a few features\n",
"(with known coefficients) plus some random noise. The steps we include are:\n",
"\n",
"1. Decide on the number of samples and features (e.g. 100 samples, 10 features).\n",
"\n",
"2. Define the **true** coefficient vector with mostly zeros (for sparsity). For example, we set $\\hat{\\boldsymbol{\\theta}} = [5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]$, meaning only features 0, 1, and 6 have a real effect on $y$.\n",
"\n",
"3. Sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). We use a normal distribution so features are roughly centered around 0.\n",
"\n",
"4. Compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n",
"\n",
"Below is the code to generate the dataset:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f2d4a55d",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Set random seed for reproducibility\n",
"np.random.seed(0)\n",
"\n",
"# Define dataset size\n",
"n_samples = 100\n",
"n_features = 10\n",
"\n",
"# Define true coefficients (sparse linear relationship)\n",
"theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])\n",
"\n",
"# Generate feature matrix X (n_samples x n_features) with random values\n",
"X = np.random.randn(n_samples, n_features) # standard normal distribution\n",
"\n",
"# Generate target values y with a linear combination of X and theta_true, plus noise\n",
"noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n",
"y = X @ theta_true + noise"
]
},
{
"cell_type": "markdown",
"id": "a445583b",
"metadata": {
"editable": true
},
"source": [
"This code produces a dataset where only features 0, 1, and 6\n",
"significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n",
"coefficient, so they only contribute noise. For example, feature 0 has\n",
"a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n",
"the expected relationship is:"
]
},
{
"cell_type": "markdown",
"id": "4a81ddf9",
"metadata": {
"editable": true
},
"source": [
"$$\n",
"y \\approx 5 \\times X_0 \\;-\\; 3 \\times X_1 \\;+\\; 2 \\times X_6 \\;+\\; \\text{noise}.\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "ae590275",
"metadata": {
"editable": true
},
"source": [
"## Exercise 1, scale your data\n",
"\n",
"Before fitting a regression model, it is good practice to normalize or\n",
"standardize the features. This ensures all features are on a\n",
"comparable scale, which is especially important when using\n",
"regularization. Here we will perform standardization, scaling each\n",
"feature to have mean 0 and standard deviation 1:\n",
"\n",
"1. Compute the mean and standard deviation of each column (feature) in $\\boldsymbol{X}$.\n",
"\n",
"2. Subtract the mean and divide by the standard deviation for each feature.\n",
"\n",
"We will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n",
"(and each feature) means the model won’t require a separate intercept\n",
"term – the data is shifted such that the intercept is effectively 0.\n",
"(In practice, one could include an intercept in the model and not\n",
"penalize it, but here we simplify by centering.)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8b40c47a",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Standardize features (zero mean, unit variance for each feature)\n",
"X_mean = X.mean(axis=0)\n",
"X_std = X.std(axis=0)\n",
"X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n",
"X_norm = (X - X_mean) / X_std\n",
"\n",
"# Center the target to zero mean (optional, to simplify intercept handling)\n",
"y_mean = ?\n",
"y_centered = ?"
]
},
{
"cell_type": "markdown",
"id": "ff9c0c81",
"metadata": {
"editable": true
},
"source": [
"### 1a)\n",
"\n",
"Fill in the necessary details.\n",
"\n",
"After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$,\n",
"and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n",
"nicer and ensures the regularization penalty $\\lambda \\sum_j\n",
"\\theta_j^2$ treats each coefficient fairly (since features are on the\n",
"same scale)."
]
},
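{
"cell_type": "markdown",
"id": "sketch-1a-centering",
"metadata": {
"editable": true
},
"source": [
"A possible completion of the centering step is sketched below; it uses the variable names from the template above and is only one of several valid choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-1a-code",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Possible completion of the centering step (one of several valid choices)\n",
"y_mean = y.mean()        # mean of the target\n",
"y_centered = y - y_mean  # target shifted to zero mean\n",
"\n",
"# Quick sanity check: the centered target should have (numerically) zero mean\n",
"print(\"Mean of y_centered:\", y_centered.mean())"
]
},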
{
"cell_type": "markdown",
"id": "d27c70e4",
"metadata": {
"editable": true
},
"source": [
"## Exercise 2, use the analytical formulae for OLS and Ridge regression to find the optimal parameters $\\boldsymbol{\\theta}$"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9f1e5184",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Set regularization parameter lam, either a single value or a vector of values\n",
"# (note that 'lambda' is a reserved keyword in Python, so we use 'lam')\n",
"lam = ?\n",
"\n",
"# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lam * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n",
"I = np.eye(n_features)\n",
"theta_closed_formRidge = ?\n",
"theta_closed_formOLS = ?\n",
"\n",
"print(\"Closed-form Ridge coefficients:\", theta_closed_formRidge)\n",
"print(\"Closed-form OLS coefficients:\", theta_closed_formOLS)"
]
},
{
"cell_type": "markdown",
"id": "2ec556b9",
"metadata": {
"editable": true
},
"source": [
"This computes the Ridge and OLS regression coefficients directly. The identity\n",
"matrix $I$ has the same size as $X^T X$ (which is n_features x\n",
"n_features), and lam * I adds $\\lambda$ to the diagonal of $X^T X$. We\n",
"then invert this matrix and multiply by $X^T y$. The result\n",
"for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n_features,) containing the\n",
"fitted weights."
]
},
{
"cell_type": "markdown",
"id": "a821f0c5",
"metadata": {
"editable": true
},
"source": [
"### 2a)\n",
"\n",
"Finalize the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$."
]
},
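{
"cell_type": "markdown",
"id": "sketch-2a-closed-form",
"metadata": {
"editable": true
},
"source": [
"A possible completion for a single value of $\\lambda$ is sketched below. It uses np.linalg.solve instead of an explicit matrix inverse and an example value lam = 1.0; other values (or a whole vector of $\\lambda$ values) work just as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-2a-code",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Possible completion for a single lambda value (illustrative sketch)\n",
"lam = 1.0  # example value; a range such as np.logspace(-4, 2, 20) can also be explored\n",
"\n",
"I = np.eye(n_features)\n",
"# Closed-form solutions: theta_Ridge = (X^T X + lam*I)^{-1} X^T y, theta_OLS = (X^T X)^{-1} X^T y\n",
"theta_closed_formRidge = np.linalg.solve(X_norm.T @ X_norm + lam * I, X_norm.T @ y_centered)\n",
"theta_closed_formOLS = np.linalg.solve(X_norm.T @ X_norm, X_norm.T @ y_centered)\n",
"\n",
"print(\"Closed-form Ridge coefficients:\", theta_closed_formRidge)\n",
"print(\"Closed-form OLS coefficients:\", theta_closed_formOLS)"
]
},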
{
"cell_type": "markdown",
"id": "d637130e",
"metadata": {
"editable": true
},
"source": [
"### 2b)\n",
"\n",
"Explore the results as a function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36."
]
},
{
"cell_type": "markdown",
"id": "b455ce7e",
"metadata": {
"editable": true
},
"source": [
"## Implementing the simplest form for gradient descent\n",
"\n",
"Alternatively, we can fit the Ridge regression model using gradient\n",
"descent. This is useful to visualize the iterative convergence and is\n",
"necessary if $n$ and $p$ are so large that the closed-form solution might be\n",
"too slow or memory-intensive. We derive the gradients from the cost\n",
"functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n",
"the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n",
"\n",
"Below is a template code for a gradient descent implementation of OLS and Ridge:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cfa1eb29",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Gradient descent parameters, learning rate eta first\n",
"eta = 0.1\n",
"# Then number of iterations\n",
"num_iters = 1000\n",
"\n",
"# Initialize weights for gradient descent\n",
"theta = np.zeros(n_features)\n",
"\n",
"# Arrays to store history for plotting\n",
"cost_history = np.zeros(num_iters)\n",
"\n",
"# Gradient descent loop\n",
"m = n_samples # number of examples\n",
"for t in range(num_iters):\n",
"    # Compute prediction error\n",
"    error = X_norm.dot(theta) - y_centered\n",
"    # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n",
"    cost_OLS = ?\n",
"    cost_Ridge = ?\n",
"    cost_history[t] = ?\n",
"    # Compute gradients for OLS and Ridge\n",
"    grad_OLS = ?\n",
"    grad_Ridge = ?\n",
"    # Update parameters theta\n",
"    theta_gdOLS = ?\n",
"    theta_gdRidge = ?\n",
"\n",
"# After the loop, theta contains the fitted coefficients\n",
"theta_gdOLS = ?\n",
"theta_gdRidge = ?\n",
"print(\"Gradient Descent OLS coefficients:\", theta_gdOLS)\n",
"print(\"Gradient Descent Ridge coefficients:\", theta_gdRidge)"
]
},
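{
"cell_type": "markdown",
"id": "sketch-3-gd",
"metadata": {
"editable": true
},
"source": [
"A possible completion of the template is sketched below. It keeps separate parameter vectors for OLS and Ridge, uses the cost convention $\\frac{1}{m}\\|\\boldsymbol{X}\\boldsymbol{\\theta}-\\boldsymbol{y}\\|^2$ (plus $\\lambda \\|\\boldsymbol{\\theta}\\|^2$ for Ridge), and an example value for $\\lambda$; other conventions and values are equally valid."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-3-gd-code",
"metadata": {
"collapsed": false,
"editable": true
},
"outputs": [],
"source": [
"# Possible completion of the gradient descent template (illustrative sketch).\n",
"# Separate parameter vectors are kept for OLS and Ridge; the cost is\n",
"# (1/m)*||X theta - y||^2, plus lam*||theta||^2 for Ridge.\n",
"eta = 0.1\n",
"num_iters = 1000\n",
"lam = 1.0  # example value for the Ridge hyperparameter\n",
"m = n_samples\n",
"\n",
"theta_gdOLS = np.zeros(n_features)\n",
"theta_gdRidge = np.zeros(n_features)\n",
"cost_history_OLS = np.zeros(num_iters)\n",
"cost_history_Ridge = np.zeros(num_iters)\n",
"\n",
"for t in range(num_iters):\n",
"    # Prediction errors for the two models\n",
"    error_OLS = X_norm @ theta_gdOLS - y_centered\n",
"    error_Ridge = X_norm @ theta_gdRidge - y_centered\n",
"    # Costs for monitoring convergence\n",
"    cost_history_OLS[t] = (error_OLS @ error_OLS) / m\n",
"    cost_history_Ridge[t] = (error_Ridge @ error_Ridge) / m + lam * (theta_gdRidge @ theta_gdRidge)\n",
"    # Gradients of the two cost functions\n",
"    grad_OLS = (2.0 / m) * X_norm.T @ error_OLS\n",
"    grad_Ridge = (2.0 / m) * X_norm.T @ error_Ridge + 2.0 * lam * theta_gdRidge\n",
"    # Parameter updates\n",
"    theta_gdOLS = theta_gdOLS - eta * grad_OLS\n",
"    theta_gdRidge = theta_gdRidge - eta * grad_Ridge\n",
"\n",
"print(\"Gradient Descent OLS coefficients:\", theta_gdOLS)\n",
"print(\"Gradient Descent Ridge coefficients:\", theta_gdRidge)"
]
},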
{
"cell_type": "markdown",
"id": "dc78d58d",
"metadata": {
"editable": true
},
"source": [
"### 3a)\n",
"\n",
"Discuss the results as a function of the learning rate parameter and the number of iterations."
]
},
{
"cell_type": "markdown",
"id": "15060acb",
"metadata": {
"editable": true
},
"source": [
"### 3b)\n",
"\n",
"Add a stopping criterion as a function of the number of iterations.\n",
"\n",
"If everything worked correctly, the learned coefficients should be\n",
"close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n",
"generate the data. Keep in mind that due to regularization and noise,\n",
"the learned values will not exactly equal the true ones, but they\n",
"should be in the same ballpark."
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}
25 Bytes
Binary file not shown.
755 Bytes
Binary file not shown.
