|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "id": "7d56b2d5", |
| 6 | + "metadata": { |
| 7 | + "editable": true |
| 8 | + }, |
| 9 | + "source": [ |
| 10 | + "<!-- HTML file automatically generated from DocOnce source (https://github.com/doconce/doconce/)\n", |
| 11 | + "doconce format html exercisesweek37.do.txt -->\n", |
| 12 | + "<!-- dom:TITLE: Exercises week 36 -->" |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "markdown", |
| 17 | + "id": "c7a8e9c7", |
| 18 | + "metadata": { |
| 19 | + "editable": true |
| 20 | + }, |
| 21 | + "source": [ |
| 22 | + "# Exercises week 36\n", |
| 23 | + "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", |
| 24 | + "\n", |
| 25 | + "Date: **September 8-12, 2025**" |
| 26 | + ] |
| 27 | + }, |
| 28 | + { |
| 29 | + "cell_type": "markdown", |
| 30 | + "id": "cf8f0ecb", |
| 31 | + "metadata": { |
| 32 | + "editable": true |
| 33 | + }, |
| 34 | + "source": [ |
| 35 | + "## Learning goals\n", |
| 36 | + "\n", |
| 37 | + "After having completed these exercises you will have:\n", |
| 38 | + "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", |
| 39 | + "\n", |
| 40 | + "2. Be able to compare the analytical expressions for OLS and Rudge regression with the gradient descent approach\n", |
| 41 | + "\n", |
| 42 | + "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", |
| 43 | + "\n", |
| 44 | + "4. Scale the data properly" |
| 45 | + ] |
| 46 | + }, |
| 47 | + { |
| 48 | + "cell_type": "markdown", |
| 49 | + "id": "a67ae548", |
| 50 | + "metadata": { |
| 51 | + "editable": true |
| 52 | + }, |
| 53 | + "source": [ |
| 54 | + "## Ridge regression and a new Synthetic Dataset\n", |
| 55 | + "\n", |
| 56 | + "We create a synthetic linear regression dataset with a sparse\n", |
| 57 | + "underlying relationship. This means we have many features but only a\n", |
| 58 | + "few of them actually contribute to the target. In our example, we’ll\n", |
| 59 | + "use 10 features with only 3 non-zero weights in the true model. This\n", |
| 60 | + "way, the target is generated as a linear combination of a few features\n", |
| 61 | + "(with known coefficients) plus some random noise. The steps we include are:\n", |
| 62 | + "\n", |
| 63 | + "Decide on the number of samples and features (e.g. 100 samples, 10 features).\n", |
| 64 | + "Define the **true** coefficient vector with mostly zeros (for sparsity). For example, we set $\\hat{\\boldsymbol{\\theta}} = [5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]$, meaning only features 0, 1, and 6 have a real effect on y.\n", |
| 65 | + "\n", |
| 66 | + "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). We use a normal distribution so features are roughly centered around 0.\n", |
| 67 | + "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", |
| 68 | + "\n", |
| 69 | + "Below is the code to generate the dataset:" |
| 70 | + ] |
| 71 | + }, |
| 72 | + { |
| 73 | + "cell_type": "code", |
| 74 | + "execution_count": 1, |
| 75 | + "id": "f2d4a55d", |
| 76 | + "metadata": { |
| 77 | + "collapsed": false, |
| 78 | + "editable": true |
| 79 | + }, |
| 80 | + "outputs": [], |
| 81 | + "source": [ |
| 82 | + "import numpy as np\n", |
| 83 | + "\n", |
| 84 | + "# Set random seed for reproducibility\n", |
| 85 | + "np.random.seed(0)\n", |
| 86 | + "\n", |
| 87 | + "# Define dataset size\n", |
| 88 | + "n_samples = 100\n", |
| 89 | + "n_features = 10\n", |
| 90 | + "\n", |
| 91 | + "# Define true coefficients (sparse linear relationship)\n", |
| 92 | + "theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])\n", |
| 93 | + "\n", |
| 94 | + "# Generate feature matrix X (n_samples x n_features) with random values\n", |
| 95 | + "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", |
| 96 | + "\n", |
| 97 | + "# Generate target values y with a linear combination of X and theta_true, plus noise\n", |
| 98 | + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", |
| 99 | + "y = X.dot @ theta_true + noise" |
| 100 | + ] |
| 101 | + }, |
| 102 | + { |
| 103 | + "cell_type": "markdown", |
| 104 | + "id": "a445583b", |
| 105 | + "metadata": { |
| 106 | + "editable": true |
| 107 | + }, |
| 108 | + "source": [ |
| 109 | + "This code produces a dataset where only features 0, 1, and 6\n", |
| 110 | + "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", |
| 111 | + "coefficient, so they only contribute noise. For example, feature 0 has\n", |
| 112 | + "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", |
| 113 | + "the expected relationship is:" |
| 114 | + ] |
| 115 | + }, |
| 116 | + { |
| 117 | + "cell_type": "markdown", |
| 118 | + "id": "4a81ddf9", |
| 119 | + "metadata": { |
| 120 | + "editable": true |
| 121 | + }, |
| 122 | + "source": [ |
| 123 | + "$$\n", |
| 124 | + "y \\approx 5 \\times X_0 \\;-\\; 3 \\times X_1 \\;+\\; 2 \\times X_6 \\;+\\; \\text{noise}.\n", |
| 125 | + "$$" |
| 126 | + ] |
| 127 | + }, |
| 128 | + { |
| 129 | + "cell_type": "markdown", |
| 130 | + "id": "ae590275", |
| 131 | + "metadata": { |
| 132 | + "editable": true |
| 133 | + }, |
| 134 | + "source": [ |
| 135 | + "## Exercise 1, scale your data\n", |
| 136 | + "\n", |
| 137 | + "Before fitting a regression model, it is good practice to normalize or\n", |
| 138 | + "standardize the features. This ensures all features are on a\n", |
| 139 | + "comparable scale, which is especially important when using\n", |
| 140 | + "regularization. Here we will perform standardization, scaling each\n", |
| 141 | + "feature to have mean 0 and standard deviation 1:\n", |
| 142 | + "\n", |
| 143 | + "Compute the mean and standard deviation of each column (feature) in $bm{X}$.\n", |
| 144 | + "Subtract the mean and divide by the standard deviation for each feature.\n", |
| 145 | + "\n", |
| 146 | + "We will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n", |
| 147 | + "(and each feature) means the model won’t require a separate intercept\n", |
| 148 | + "term – the data is shifted such that the intercept is effectively 0\n", |
| 149 | + ". (In practice, one could include an intercept in the model and not\n", |
| 150 | + "penalize it, but here we simplify by centering.)" |
| 151 | + ] |
| 152 | + }, |
| 153 | + { |
| 154 | + "cell_type": "code", |
| 155 | + "execution_count": 2, |
| 156 | + "id": "8b40c47a", |
| 157 | + "metadata": { |
| 158 | + "collapsed": false, |
| 159 | + "editable": true |
| 160 | + }, |
| 161 | + "outputs": [], |
| 162 | + "source": [ |
| 163 | + "# Standardize features (zero mean, unit variance for each feature)\n", |
| 164 | + "X_mean = X.mean(axis=0)\n", |
| 165 | + "X_std = X.std(axis=0)\n", |
| 166 | + "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n", |
| 167 | + "X_norm = (X - X_mean) / X_std\n", |
| 168 | + "\n", |
| 169 | + "# Center the target to zero mean (optional, to simplify intercept handling)\n", |
| 170 | + "y_mean = ?\n", |
| 171 | + "y_centered = ?" |
| 172 | + ] |
| 173 | + }, |
| 174 | + { |
| 175 | + "cell_type": "markdown", |
| 176 | + "id": "ff9c0c81", |
| 177 | + "metadata": { |
| 178 | + "editable": true |
| 179 | + }, |
| 180 | + "source": [ |
| 181 | + "### 1a)\n", |
| 182 | + "\n", |
| 183 | + "Fill in the necessary details.\n", |
| 184 | + "\n", |
| 185 | + "After this preprocessing, each column of $\\boldsymbol{X}_norm$ has mean zero and standard deviation $1$\n", |
| 186 | + "and $\\boldsymbol{y}_centered$ has mean 0. This makes the optimization landscape\n", |
| 187 | + "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", |
| 188 | + "\\beta_j^2$ treats each coefficient fairly (since features are on the\n", |
| 189 | + "same scale)." |
| 190 | + ] |
| 191 | + }, |
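| 192 | + {
| 193 | + "cell_type": "markdown",
| 194 | + "id": "4f3a9b21",
| 195 | + "metadata": {
| 196 | + "editable": true
| 197 | + },
| 198 | + "source": [
| 199 | + "If you want to check your answer, here is a minimal sketch of one possible completion, together with a quick sanity check (the rounding is only there to suppress floating-point noise):"
| 200 | + ]
| 201 | + },
| 202 | + {
| 203 | + "cell_type": "code",
| 204 | + "execution_count": null,
| 205 | + "id": "5e2c8d10",
| 206 | + "metadata": {
| 207 | + "collapsed": false,
| 208 | + "editable": true
| 209 | + },
| 210 | + "outputs": [],
| 211 | + "source": [
| 212 | + "# One possible completion of the template above\n",
| 213 | + "y_mean = np.mean(y)\n",
| 214 | + "y_centered = y - y_mean\n",
| 215 | + "\n",
| 216 | + "# Sanity check: each standardized feature should have mean ~0 and std ~1,\n",
| 217 | + "# and the centered target should have mean ~0 (up to round-off)\n",
| 218 | + "print(X_norm.mean(axis=0).round(12))\n",
| 219 | + "print(X_norm.std(axis=0).round(12))\n",
| 220 | + "print(round(y_centered.mean(), 12))"
| 221 | + ]
| 222 | + },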
| 192 | + { |
| 193 | + "cell_type": "markdown", |
| 194 | + "id": "d27c70e4", |
| 195 | + "metadata": { |
| 196 | + "editable": true |
| 197 | + }, |
| 198 | + "source": [ |
| 199 | + "## Exercise 2, use the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{theta}$" |
| 200 | + ] |
| 201 | + }, |
| 202 | + { |
| 203 | + "cell_type": "code", |
| 204 | + "execution_count": 3, |
| 205 | + "id": "9f1e5184", |
| 206 | + "metadata": { |
| 207 | + "collapsed": false, |
| 208 | + "editable": true |
| 209 | + }, |
| 210 | + "outputs": [], |
| 211 | + "source": [ |
| 212 | + "# Set regularization parameter, either a single value or a vector of values\n", |
| 213 | + "lambda = ?\n", |
| 214 | + "\n", |
| 215 | + "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", |
| 216 | + "I = np.eye(n_features)\n", |
| 217 | + "theta_closed_formRidge = ?\n", |
| 218 | + "theta_closed_formOLS = ?\n", |
| 219 | + "\n", |
| 220 | + "print(\"Closed-form Ridge coefficients:\", theta_closed_form)\n", |
| 221 | + "print(\"Closed-form OLS coefficients:\", theta_closed_form)" |
| 222 | + ] |
| 223 | + }, |
| 224 | + { |
| 225 | + "cell_type": "markdown", |
| 226 | + "id": "2ec556b9", |
| 227 | + "metadata": { |
| 228 | + "editable": true |
| 229 | + }, |
| 230 | + "source": [ |
| 231 | + "This computes the ridge and OLS regression coefficients directly. The identity\n", |
| 232 | + "matrix $I$ has the same size as $X^T X$ (which is n_features x\n", |
| 233 | + "n_features), and lam * I adds $\\lambda$ to the diagonal of $X^T X. We\n", |
| 234 | + "then invert this matrix and multiply by $X^T y. The result\n", |
| 235 | + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n_features,) containing the\n", |
| 236 | + "fitted weights." |
| 237 | + ] |
| 238 | + }, |
| 239 | + { |
| 240 | + "cell_type": "markdown", |
| 241 | + "id": "a821f0c5", |
| 242 | + "metadata": { |
| 243 | + "editable": true |
| 244 | + }, |
| 245 | + "source": [ |
| 246 | + "### 2a)\n", |
| 247 | + "\n", |
| 248 | + "Finalize the OLS and Ridge regression determination of the optimal parameters $bm{\\theta}$." |
| 249 | + ] |
| 250 | + }, |
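| 251 | + {
| 252 | + "cell_type": "markdown",
| 253 | + "id": "7b1f0a2c",
| 254 | + "metadata": {
| 255 | + "editable": true
| 256 | + },
| 257 | + "source": [
| 258 | + "A minimal sketch of one possible completion, assuming a single fixed value lam = 1.0 (an arbitrary choice) and using the standardized data from exercise 1:"
| 259 | + ]
| 260 | + },
| 261 | + {
| 262 | + "cell_type": "code",
| 263 | + "execution_count": null,
| 264 | + "id": "9d4e6f30",
| 265 | + "metadata": {
| 266 | + "collapsed": false,
| 267 | + "editable": true
| 268 | + },
| 269 | + "outputs": [],
| 270 | + "source": [
| 271 | + "# One possible completion (a sketch; lam = 1.0 is an arbitrary choice)\n",
| 272 | + "lam = 1.0\n",
| 273 | + "I = np.eye(n_features)\n",
| 274 | + "theta_closed_formRidge = np.linalg.inv(X_norm.T @ X_norm + lam * I) @ X_norm.T @ y_centered\n",
| 275 | + "# pinv is used for the OLS solve for numerical safety; inv would also work here\n",
| 276 | + "theta_closed_formOLS = np.linalg.pinv(X_norm.T @ X_norm) @ X_norm.T @ y_centered\n",
| 277 | + "\n",
| 278 | + "print(\"Closed-form Ridge coefficients:\", theta_closed_formRidge)\n",
| 279 | + "print(\"Closed-form OLS coefficients:\", theta_closed_formOLS)"
| 280 | + ]
| 281 | + },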
| 251 | + { |
| 252 | + "cell_type": "markdown", |
| 253 | + "id": "d637130e", |
| 254 | + "metadata": { |
| 255 | + "editable": true |
| 256 | + }, |
| 257 | + "source": [ |
| 258 | + "### 2b)\n", |
| 259 | + "\n", |
| 260 | + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." |
| 261 | + ] |
| 262 | + }, |
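| 263 | + {
| 264 | + "cell_type": "markdown",
| 265 | + "id": "0a2b4c6d",
| 266 | + "metadata": {
| 267 | + "editable": true
| 268 | + },
| 269 | + "source": [
| 270 | + "A short sketch for scanning several values of $\lambda$ (the grid below is an arbitrary choice) and watching how the Ridge coefficients shrink as $\lambda$ grows:"
| 271 | + ]
| 272 | + },
| 273 | + {
| 274 | + "cell_type": "code",
| 275 | + "execution_count": null,
| 276 | + "id": "1c3e5a7f",
| 277 | + "metadata": {
| 278 | + "collapsed": false,
| 279 | + "editable": true
| 280 | + },
| 281 | + "outputs": [],
| 282 | + "source": [
| 283 | + "# Scan a grid of lambda values (an arbitrary choice) and inspect the Ridge coefficients\n",
| 284 | + "lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]\n",
| 285 | + "I = np.eye(n_features)\n",
| 286 | + "for lam in lambdas:\n",
| 287 | + "    theta_r = np.linalg.inv(X_norm.T @ X_norm + lam * I) @ X_norm.T @ y_centered\n",
| 288 | + "    print(f\"lambda = {lam:7.2f}  ->  theta = {np.round(theta_r, 3)}\")"
| 289 | + ]
| 290 | + },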
| 263 | + { |
| 264 | + "cell_type": "markdown", |
| 265 | + "id": "b455ce7e", |
| 266 | + "metadata": { |
| 267 | + "editable": true |
| 268 | + }, |
| 269 | + "source": [ |
| 270 | + "## Implementing the simplest form for gradient descent\n", |
| 271 | + "\n", |
| 272 | + "Alternatively, we can fit the ridge regression model using gradient\n", |
| 273 | + "descent. This is useful to visualize the iterative convergence and is\n", |
| 274 | + "necessary if $n$ and $p$ are so large that the closed-form might be\n", |
| 275 | + "too slow or memory-intensive. We derive the gradients from the cost\n", |
| 276 | + "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n", |
| 277 | + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", |
| 278 | + "\n", |
| 279 | + "Below is a template code for gradient descent implementation of ridge:" |
| 280 | + ] |
| 281 | + }, |
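| 282 | + {
| 283 | + "cell_type": "markdown",
| 284 | + "id": "2d4f6b8e",
| 285 | + "metadata": {
| 286 | + "editable": true
| 287 | + },
| 288 | + "source": [
| 289 | + "For reference, with the $1/n$ mean-squared-error convention (other normalizations only rescale the gradients, which can be absorbed into the learning rate), the cost functions and their gradients read\n",
| 290 | + "\n",
| 291 | + "$$\n",
| 292 | + "C_{\mathrm{OLS}}(\boldsymbol{\theta}) = \frac{1}{n}\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\|^2, \qquad \nabla_{\boldsymbol{\theta}} C_{\mathrm{OLS}} = \frac{2}{n}\boldsymbol{X}^T(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}),\n",
| 293 | + "$$\n",
| 294 | + "\n",
| 295 | + "$$\n",
| 296 | + "C_{\mathrm{Ridge}}(\boldsymbol{\theta}) = \frac{1}{n}\|\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\|^2 + \lambda\|\boldsymbol{\theta}\|^2, \qquad \nabla_{\boldsymbol{\theta}} C_{\mathrm{Ridge}} = \frac{2}{n}\boldsymbol{X}^T(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}) + 2\lambda\boldsymbol{\theta}.\n",
| 297 | + "$$\n",
| 298 | + "\n",
| 299 | + "Below is a template code for a gradient descent implementation of OLS and Ridge:"
| 300 | + ]
| 301 | + },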
| 282 | + { |
| 283 | + "cell_type": "code", |
| 284 | + "execution_count": 4, |
| 285 | + "id": "cfa1eb29", |
| 286 | + "metadata": { |
| 287 | + "collapsed": false, |
| 288 | + "editable": true |
| 289 | + }, |
| 290 | + "outputs": [], |
| 291 | + "source": [ |
| 292 | + "# Gradient descent parameters, learning rate eta first\n", |
| 293 | + "eta = 0.1\n", |
| 294 | + "# Then number of iterations\n", |
| 295 | + "num_iters = 1000\n", |
| 296 | + "\n", |
| 297 | + "# Initialize weights for gradient descent\n", |
| 298 | + "theta = np.zeros(n_features)\n", |
| 299 | + "\n", |
| 300 | + "# Arrays to store history for plotting\n", |
| 301 | + "cost_history = np.zeros(num_iters)\n", |
| 302 | + "\n", |
| 303 | + "# Gradient descent loop\n", |
| 304 | + "m = n_samples # number of examples\n", |
| 305 | + "for t in range(num_iters):\n", |
| 306 | + " # Compute prediction error\n", |
| 307 | + " error = X_norm.dot(theta) - y_centered \n", |
| 308 | + " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", |
| 309 | + " cost_OLS = ?\n", |
| 310 | + " cost_Ridge = ?\n", |
| 311 | + " cost_history[t] = ?\n", |
| 312 | + " # Compute gradients for OSL and Ridge\n", |
| 313 | + " grad_OLS = ?\n", |
| 314 | + " grad_Ridge = ?\n", |
| 315 | + " # Update parameters theta\n", |
| 316 | + " theta_gdOLS = ?\n", |
| 317 | + " theta_gdRidge = ? \n", |
| 318 | + "\n", |
| 319 | + "# After the loop, theta contains the fitted coefficients\n", |
| 320 | + "theta_gdOLS = ?\n", |
| 321 | + "theta_gdRidge = ?\n", |
| 322 | + "print(\"Gradient Descent OLS coefficients:\", theta_gdOLS)\n", |
| 323 | + "print(\"Gradient Descent Ridge coefficients:\", theta_gdRidge)" |
| 324 | + ] |
| 325 | + }, |
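| 326 | + {
| 327 | + "cell_type": "markdown",
| 328 | + "id": "3e5a7c9b",
| 329 | + "metadata": {
| 330 | + "editable": true
| 331 | + },
| 332 | + "source": [
| 333 | + "If you get stuck, here is a minimal sketch of one possible completion of the template, using the gradient expressions above; eta = 0.1 and lam = 1.0 are arbitrary choices, and the cost monitoring is omitted for brevity:"
| 334 | + ]
| 335 | + },
| 336 | + {
| 337 | + "cell_type": "code",
| 338 | + "execution_count": null,
| 339 | + "id": "6f8b0d2a",
| 340 | + "metadata": {
| 341 | + "collapsed": false,
| 342 | + "editable": true
| 343 | + },
| 344 | + "outputs": [],
| 345 | + "source": [
| 346 | + "# One possible completion of the template above (a sketch, not the only way)\n",
| 347 | + "lam = 1.0   # arbitrary choice of the Ridge hyperparameter\n",
| 348 | + "eta = 0.1\n",
| 349 | + "num_iters = 1000\n",
| 350 | + "\n",
| 351 | + "theta_gdOLS = np.zeros(n_features)\n",
| 352 | + "theta_gdRidge = np.zeros(n_features)\n",
| 353 | + "\n",
| 354 | + "n = n_samples\n",
| 355 | + "for t in range(num_iters):\n",
| 356 | + "    # Gradients of the (1/n)-normalized cost functions given above\n",
| 357 | + "    grad_OLS = (2.0 / n) * X_norm.T @ (X_norm @ theta_gdOLS - y_centered)\n",
| 358 | + "    grad_Ridge = (2.0 / n) * X_norm.T @ (X_norm @ theta_gdRidge - y_centered) + 2.0 * lam * theta_gdRidge\n",
| 359 | + "    # Plain gradient descent updates\n",
| 360 | + "    theta_gdOLS = theta_gdOLS - eta * grad_OLS\n",
| 361 | + "    theta_gdRidge = theta_gdRidge - eta * grad_Ridge\n",
| 362 | + "\n",
| 363 | + "print(\"Gradient Descent OLS coefficients:\", theta_gdOLS)\n",
| 364 | + "print(\"Gradient Descent Ridge coefficients:\", theta_gdRidge)"
| 365 | + ]
| 366 | + },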
| 326 | + { |
| 327 | + "cell_type": "markdown", |
| 328 | + "id": "dc78d58d", |
| 329 | + "metadata": { |
| 330 | + "editable": true |
| 331 | + }, |
| 332 | + "source": [ |
| 333 | + "### 3a)\n", |
| 334 | + "\n", |
| 335 | + "Discuss the results as function of the learning rate paramaters and the number of iterations." |
| 336 | + ] |
| 337 | + }, |
| 338 | + { |
| 339 | + "cell_type": "markdown", |
| 340 | + "id": "15060acb", |
| 341 | + "metadata": { |
| 342 | + "editable": true |
| 343 | + }, |
| 344 | + "source": [ |
| 345 | + "### 3b)\n", |
| 346 | + "\n", |
| 347 | + "Add a stopping parameter as function of the number iterations. \n", |
| 348 | + "\n", |
| 349 | + "If everything worked correctly, the learned coefficients should be\n", |
| 350 | + "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", |
| 351 | + "generate the data. Keep in mind that due to regularization and noise,\n", |
| 352 | + "the learned values will not exactly equal the true ones, but they\n", |
| 353 | + "should be in the same ballpark." |
| 354 | + ] |
| 355 | + } |
| 356 | + ], |
| 357 | + "metadata": {}, |
| 358 | + "nbformat": 4, |
| 359 | + "nbformat_minor": 5 |
| 360 | +} |