diff --git a/notebooks/Norms.ipynb b/notebooks/Norms.ipynb
deleted file mode 100644
index 2c9a501..0000000
--- a/notebooks/Norms.ipynb
+++ /dev/null
@@ -1,497 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Norms, singular values, and conditioning\n",
- "\n",
- "\n",
- "In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves\n",
- "_singular values_, which are a measure of how matrices \"stretch\" vectors, similar to\n",
- "eigenvalues but more robust. We also introduce condition of problems, and show that\n",
- "the singular values of a matrix give a notion of a _condition number_, which allows us\n",
- "to bound errors introduced by floating point numbers in linear algebra operations.\n",
- "\n",
- "1. Vector norms: we discuss the standard $p$-norm for vectors in $ℝ^n$.\n",
- "2. Matrix norms: we discuss how two vector norms can be used to induce a norm on matrices. These\n",
- "satisfy an additional multiplicative inequality.\n",
- "3. Singular value decomposition: we introduce the singular value decomposition which is related to\n",
- "the matrix $2$-norm and best low rank approximation.\n",
- "4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems\n",
- "can be bounded in terms of the _condition number_, which is defined in terms of singular values.\n",
- "\n",
- "## 1. Vector norms\n",
- "\n",
- "Recall the definition of a (vector-)norm:\n",
- "\n",
- "**Definition (vector-norm)** A norm $\\|\\cdot\\|$ on $ℝ^n$ is a function that satisfies the following, for $𝐱,𝐲 ∈ ℝ^n$ and\n",
- "$c ∈ ℝ$:\n",
- "1. Triangle inequality: $\\|𝐱 + 𝐲 \\| ≤ \\|𝐱\\| + \\|𝐲\\|$\n",
- "2. Homogeneneity: $\\| c 𝐱 \\| = |c| \\| 𝐱 \\|$\n",
- "3. Positive-definiteness: $\\|𝐱\\| = 0$ implies that $𝐱 = 0$.\n",
- "\n",
- "\n",
- "Consider the following example:\n",
- "\n",
- "**Definition (p-norm)**\n",
- "For $1 ≤ p < ∞$ and $𝐱 \\in ℝ^n$, define the $p$-norm:\n",
- "$$\n",
- "\\|𝐱\\|_p := \\left(\\sum_{k=1}^n |x_k|^p\\right)^{1/p}\n",
- "$$\n",
- "where $x_k$ is the $k$-th entry of $𝐱$. \n",
- "For $p = ∞$ we define\n",
- "$$\n",
- "\\|𝐱\\|_∞ := \\max_k |x_k|\n",
- "$$\n",
- "\n",
- "**Theorem (p-norm)** $\\| ⋅ \\|_p$ is a norm for $1 ≤ p ≤ ∞$.\n",
- "\n",
- "**Proof**\n",
- "\n",
- "Homogeneity and positive-definiteness are straightforward: e.g.,\n",
- "$$\n",
- "\\|c 𝐱\\|_p = (\\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \\sum_{k=1}^n |x_k|^p)^{1/p} = |c| \\| 𝐱 \\|\n",
- "$$\n",
- "and if $\\| 𝐱 \\|_p = 0$ then all $|x_k|^p$ are have to be zero.\n",
- "\n",
- "For $p = 1,∞$ the triangle inequality is also straightforward:\n",
- "$$\n",
- "\\| 𝐱 + 𝐲 \\|_∞ = \\max_k (|x_k + y_k|) ≤ \\max_k (|x_k| + |y_k|) ≤ \\|𝐱\\|_∞ + \\|𝐲\\|_∞\n",
- "$$\n",
- "and\n",
- "$$\n",
- "\\| 𝐱 + 𝐲 \\|_1 = \\sum_{k=1}^n |x_k + y_k| ≤ \\sum_{k=1}^n (|x_k| + |y_k|) = \\| 𝐱 \\|_1 + \\| 𝐲\\|_1\n",
- "$$\n",
- "\n",
- "For $p = 2$ it can be proved using the Cauchy–Schwartz inequality:\n",
- "$$\n",
- "|𝐱^⊤ 𝐲| ≤ \\| 𝐱 \\|_2 \\| 𝐲 \\|_2\n",
- "$$\n",
- "That is, we have\n",
- "$$\n",
- "\\| 𝐱 + 𝐲 \\|^2 = \\|𝐱\\|^2 + 2 𝐱^⊤ 𝐲 + \\|𝐲\\|^2 ≤ \\|𝐱\\|^2 + 2\\| 𝐱 \\| \\| 𝐲 \\| + \\|𝐲\\|^2 = (\\| 𝐱 \\| + \\| 𝐲 \\|)\n",
- "$$\n",
- "\n",
- "For general $1 < p < ∞$, it suffices to assume $\\| 𝐱 + 𝐲 \\| = 1$.\n",
- "consider $\\| 𝐱 + 𝐲 \\|^p$...\n",
- "\n",
- "∎\n",
- "\n",
- "\n",
- " In Julia can use the inbuilt `norm` function to calculate norms:\n",
- " ```julia\n",
- " norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2);\n",
- " norm([1,-2,3],1) == sqrt(1 + 2 + 3)\n",
- " ```\n",
- "\n",
- "\n",
- "## 2. Matrix norms\n",
- " Just like vectors, matrices have norms that measure their \"length\". The simplest example is the Fröbenius norm, \n",
- " defined for an $m \\times n$ real matrix $A$ as\n",
- "$$\n",
- "\\|A\\|_F := \\sqrt{\\sum_{k=1}^m \\sum_{j=1}^n A_{kj}^2}\n",
- "$$\n",
- "While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a \n",
- "vector norm:\n",
- "\n",
- "\n",
- "\n",
- "**Definition (matrix-norm)** Suppose $A ∈ ℝ^{m × n}$ and consider two norms $\\| ⋅ \\|_X$ on $ℝ^n$ and \n",
- "$\\| ⋅ \\|_Y$ on $ℝ^n$. Define the _(induced) matrix norm_ as:\n",
- "$$\n",
- "\\|A \\|_{X → Y} := \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A 𝐯\\|_Y\n",
- "$$\n",
- "Also define\n",
- "$$\n",
- "\\|A\\|_X \\triangleq \\|A\\|_{X \\rightarrow X}\n",
- "$$\n",
- "\n",
- "For the induced 2, 1, and $∞$-norm we use\n",
- "$$\n",
- "\\|A\\|_2, \\|A\\|_1 \\qquad \\hbox{and} \\qquad \\|A\\|_∞.\n",
- "$$\n",
- "\n",
- "Note an equivalent definition of the induced norm:\n",
- "$$\n",
- "\\|A\\|_{X → Y} = \\sup_{𝐱 ∈ ℝ^n, 𝐱 ≠ 0} {\\|A 𝐱\\|_Y \\over \\| 𝐱\\|_X}\n",
- "$$\n",
- "This follows since we can scale $𝐱$ by its norm so that it has unit norm, that is,\n",
- "${𝐱} \\over \\|𝐱\\|_X$ has unit norm.\n",
- "\n",
- "**Lemma (matrix norms are norms)** Induced matrix norms are norms, that is for $\\| ⋅ \\| = \\| ⋅ \\|_{X → Y}$ we have:\n",
- "1. Triangle inequality: $\\| A + B \\| ≤ \\|A\\| + \\|B\\|$\n",
- "1. Homogeneneity: $\\|c A \\| = |c| \\|A\\|$\n",
- "3. Positive-definiteness: $\\|A\\| =0 \\Rightarrow A = 0$\n",
- "In addition, they satisfy the following additional propertie:\n",
- "1. $\\|A 𝐱 \\|_Y ≤ \\|A\\|_{X → Y} \\|𝐱 \\|_X$\n",
- "2. Multiplicative inequality: $\\| AB\\|_{X → Z} ≤ \\|A \\|_{Y → Z} \\|B\\|_{X → Y}$\n",
- "\n",
- "**Proof**\n",
- "\n",
- "First we show the _triangle inequality_:\n",
- "$$\n",
- "\\|A + B \\| ≤ \\sup_{𝐯 : \\|𝐯\\|_X=1} (\\|A 𝐯\\|_Y + \\|B 𝐯\\|_Y) ≤ \\| A \\| + \\|B \\|.\n",
- "$$\n",
- "Homogeneity is also immediate. Positive-definiteness follows from the fact that if\n",
- "$\\|A\\| = 0$ then $A 𝐱 = 0$ for all $𝐱 ∈ ℝ^n$.\n",
- "The property $\\|A 𝐱 \\|_Y ≤ \\|A\\|_{X → Y} \\|𝐱 \\|_X$ follows from the definition. Finally, \n",
- "Finally, the multiplicative inequality follows from\n",
- "$$\n",
- "\\|A B\\| = \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A B 𝐯 |_Z ≤ \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A\\|_{Y → Z} \\| B 𝐯 | = \\|A \\|_{Y → Z} \\|B\\|_{X → Y}\n",
- "$$\n",
- "\n",
- "\n",
- "\n",
- "∎\n",
- "\n",
- "We have some simple examples of induced norms:\n",
- "\n",
- "**Example ($1$-norm)** We claim \n",
- "$$\n",
- "\\|A \\|_1 = \\max_j \\|𝐚_j\\|_1\n",
- "$$\n",
- "that is, the maximum $1$-norm of the columns. To see this use the triangle inequality to\n",
- "find for $\\|𝐱\\|_1 = 1$\n",
- "$$\n",
- "\\| A 𝐱 \\|_1 ≤ ∑_{j = 1}^n |x_j| \\| 𝐚_j\\|_1 ≤ \\max_j \\| 𝐚_j\\| ∑_{j = 1}^n |x_j| = \\max_j \\| 𝐚_j\\|_1.\n",
- "$$\n",
- "But the bound is also attained since if $j$ is the column that maximises the norms then\n",
- "$$\n",
- "\\|A 𝐞_j \\|_1 = \\|𝐚_j\\|_1 = \\max_j \\| 𝐚_j\\|_1.\n",
- "$$\n",
- "\n",
- "In the problem sheet we see that\n",
- "$$\n",
- "\\|A\\|_∞ = \\max_k \\|A[k,:]\\|_1\n",
- "$$\n",
- "that is, the maximum $1$-norm of the rows.\n",
- "\n",
- "\n",
- "\n",
- "An example that is not simple is $\\|A \\|_2$, but we do have two simple cases:\n",
- "\n",
- "**Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then\n",
- "$\\|Λ\\|_2 = \\max_k |λ_k|.$. If $Q$ is orthogonal then $\\|Q\\| = 1$.\n",
- "\n",
- "\n",
- "## 3. Singular value decomposition\n",
- "\n",
- "To define the induced $2$-norm we need to consider the following:\n",
- "\n",
- "**Definition (singular value decomposition)** For $A ∈ ℝ^{m × n}$ with rank $r > 0$, \n",
- "the _reduced singular value decomposition (SVD)_ is\n",
- "$$\n",
- "A = U Σ V^⊤\n",
- "$$\n",
- "where $U ∈ ℝ^{m × r}$ and $V ∈ ℝ^{r × n}$ have orthonormal columns and $Σ ∈ ℝ^{r × r}$ is diagonal whose\n",
- "diagonal entries, which which we call _singular values_, are all non-negative and decreasing: $σ_1 ≥ ⋯ ≥ σ_{\\min(m,n)} ≥ 0$.\n",
- "The _full singular value decomposition (SVD)_ is\n",
- "$$\n",
- "A = U Σ V^⊤\n",
- "$$\n",
- "where $U ∈ ℝ^{m × m}$ and $V ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ ∈ ℝ^{m × n}$.\n",
- "\n",
- "For symmetric matrices, the SVD is related to the eigenvalue decomposition.\n",
- "Recall that a symmetric matrix has real eigenvalues and orthogonal eigenvectors:\n",
- "$$\n",
- "A = Q Λ Q^⊤ = \\underbrace{Q}_U \\underbrace{|Λ|}_Σ \\underbrace{(\\hbox{sign}(Λ) Q)^⊤}_{V^⊤}\n",
- "$$\n",
- "For non-symmetric matrices we relate it to the eigenvalues of the _Gram matrix_ $A^⊤A$ and $AA^⊤$ via:\n",
- "$$\n",
- "\\begin{align*}\n",
- "A^⊤ A = V Σ^2 V^⊤ \\\\\n",
- "A A^⊤ = U Σ^2 U^⊤\n",
- "\\end{align*}\n",
- "$$\n",
- "That is, $σ_k^2$ are non-zero eigenvalues of $A^⊤ A$ and $A A^⊤$. \n",
- "We now establish some properties of a Gram matrix:\n",
- "\n",
- "**Proposition (Gram matrix kernel)** The kernel of $A$ is the also the kernel of $A^⊤ A$. \n",
- "\n",
- "**Proof**\n",
- "If $A^⊤ A 𝐱 = 0$ then we have\n",
- "$$\n",
- "0 = 𝐱 A^⊤ A 𝐱 = \\| A 𝐱 \\|^2\n",
- "$$\n",
- "which means $A 𝐱 = 0$ and $𝐱 ∈ \\hbox{ker}(A)$.\n",
- "∎\n",
- "\n",
- "\n",
- "\n",
- "This connection allows us to prove existence:\n",
- "\n",
- "**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD.\n",
- "\n",
- "**Proof**\n",
- "\n",
- "First note that $A^⊤ A = Q Λ Q^⊤$ has non-negative eigenvalues $λ_k$ as,\n",
- "for the corresponding (orthonormal) eigenvector $𝐪_k$,\n",
- "$$\n",
- "λ_k = λ_k 𝐪_k^⊤ 𝐪_k = 𝐪_k^⊤ A^⊤ A 𝐪_k = \\| A 𝐪_k \\| ≥ 0.\n",
- "$$\n",
- "Further, the kernel of $A^⊤ A$ is the same as $A$.\n",
- "Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n",
- "are an enumeration of the non-zero eigenvalues and\n",
- "$$\n",
- "V := \\begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \\end{bmatrix}\n",
- "$$\n",
- "the corresponding (orthonormal) eigenvectors, with\n",
- "$$\n",
- "K = \\begin{bmatrix} 𝐪_{r+1} | ⋯ | 𝐪_n \\end{bmatrix}\n",
- "$$\n",
- "the corresponding kernel. \n",
- "Define\n",
- "$$\n",
- "Σ := \\begin{bmatrix} \\sqrt{λ_1} \\\\ & ⋱ \\\\ && \\sqrt{λ_r} \\end{bmatrix}\n",
- "$$\n",
- "Now define\n",
- "$$\n",
- "U := AV Σ^{-1}\n",
- "$$\n",
- "which is orthogonal since $A^⊤ A V = Σ^2 V$:\n",
- "$$\n",
- "U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I.\n",
- "$$\n",
- "Thus we have\n",
- "$$\n",
- "U Σ V^⊤ = A V V^⊤ = A \\underbrace{\\begin{bmatrix} V | K \\end{bmatrix}}_Q\\underbrace{\\begin{bmatrix} V^⊤ \\\\ K^⊤ \\end{bmatrix}}_{Q^⊤}\n",
- "$$\n",
- "where we use the fact that $A K = 0$ so that concatenating $K$ does not change the value.\n",
- "\n",
- "∎\n",
- "\n",
- "Singular values tell us the 2-norm:\n",
- "\n",
- "**Corollary (singular values and norm)**\n",
- "$$\n",
- "\\|A \\|_2 = σ_1\n",
- "$$\n",
- "and if $A$ is invertible, then\n",
- "$$\n",
- "\\|A^{-1} \\|_2 = σ_r^{-1}\n",
- "$$\n",
- "\n",
- "**Proof**\n",
- "\n",
- "First we establish the upper-bound:\n",
- "$$\n",
- "\\|A \\|_2 ≤ \\|U \\|_2 \\| Σ \\|_2 \\| V^⊤\\|_2 = \\| Σ \\|_2 = σ_1\n",
- "$$\n",
- "This is attained using the first right singular vector:\n",
- "$$\n",
- "\\|A 𝐯_1\\|_2 = \\|Σ V^⊤ 𝐯_1\\|_2 = \\|Σ 𝐞_1\\|_2 = σ_1\n",
- "$$\n",
- "The inverse result follows since the inverse has SVD\n",
- "$$\n",
- "A^{-1} = V Σ^{-1} U^⊤ = V (W Σ^{-1} W) U^⊤\n",
- "$$\n",
- "where\n",
- "$$\n",
- "W := P_σ = \\begin{bmatrix} && 1 \\\\ & ⋰ \\\\ 1 \\end{bmatrix}\n",
- "$$\n",
- "is the permutation that reverses the entries, that is, $σ$ has Cauchy notation\n",
- "$$\n",
- "\\begin{pmatrix}\n",
- "1 & 2 & ⋯ & n \\\\\n",
- "n & n-1 & ⋯ & 1\n",
- "\\end{pmatrix}.\n",
- "$$\n",
- "\n",
- "\n",
- "∎\n",
- "\n",
- "We will not discuss in this module computation of singular value decompositions or eigenvalues:\n",
- "they involve iterative algorithms (actually built on a sequence of QR decompositions).\n",
- "\n",
- "One of the main usages for SVDs is low-rank approximation:\n",
- "\n",
- "**Theorem (best low rank approximation)** The matrix\n",
- "$$\n",
- "A_k := \\begin{bmatrix} 𝐮_1 | ⋯ | 𝐮_k \\end{bmatrix} \\begin{bmatrix}\n",
- "σ_1 \\\\\n",
- "& ⋱ \\\\\n",
- "&& σ_k\\end{bmatrix} \\begin{bmatrix} 𝐯_1 | ⋯ | 𝐯_k \\end{bmatrix}^⊤\n",
- "$$ \n",
- "is the best 2-norm approximation of $A$ by a rank $k$ matrix, that is, for all rank-$k$ matrices $B$, we have \n",
- "$$\\|A - A_k\\|_2 ≤ \\|A -B \\|_2.$$\n",
- "\n",
- "\n",
- "**Proof**\n",
- "We have\n",
- "\n",
- "$$\n",
- "\\|A - A_k\\|_2 = \\|U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& \\ddots \\cr &&&&& σ_n \\cr &&&&&\\vdots \\cr &&&&&0\\end{bmatrix} = σ_{r+1}\n",
- "$$\n",
- "\n",
- "\n",
- "Suppose a rank-$k$ matrix $B$ has \n",
- "$$\n",
- "\\|A-B\\|_2 < \\|A-A_k\\|_2 = σ_{k+1}.\n",
- "$$\n",
- "For all $𝐰 \\in \\ker(B)$ we have \n",
- "$$\n",
- "\\|A 𝐰\\|_2 = \\|(A-B) 𝐰\\|_2 ≤ \\|A-B\\|\\|𝐰\\|_2 < σ_{k+1} \\|𝐰\\|_2\n",
- "$$\n",
- "\n",
- "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n",
- "$$\n",
- "\\|A 𝐮\\|_2^2 = \\|U Σ_k 𝐜\\|_2^2 = \\|Σ_k 𝐜\\|_2^2 =\n",
- "\\sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \\|c\\|^2,\n",
- "$$\n",
- "i.e., $\\|A 𝐮\\|_2 ≥ σ_{k+1} \\|c\\|$. Thus $𝐰$ cannot be in this span.\n",
- "\n",
- "\n",
- "The dimension of the span of $\\ker(B)$ is at least $n-k$, but the dimension of ${\\rm span}(𝐯_1,…,𝐯_{k+1})$ is at least $k+1$.\n",
- "Since these two spaces cannot intersect we have a contradiction, since $(n-r) + (r+1) = n+1 > n$. ∎\n",
- "\n",
- "\n",
- "In the problem sheet we explore the usage of low rank approximation to smooth functions.\n",
- "\n",
- "\n",
- "\n",
- "## 4. Condition numbers\n",
- "\n",
- "We have seen that floating point arithmetic induces errors in computations, and that we can typically\n",
- "bound the absolute errors to be proportional to $C ϵ_{\\rm m}$. We want a way to bound the\n",
- "effect of more complicated calculations like computing $A 𝐱$ or $A^{-1} 𝐲$ without having to deal with\n",
- "the exact nature of floating point arithmetic. Here we consider only matrix-multiplication but will make a remark\n",
- "about matrix inversion.\n",
- "\n",
- "To justify what follows, we first observe that errors in implementing matrix-vector multiplication\n",
- "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x`\n",
- "(implemented with floating point) is precisely $A + δA$ where $δA$ has small norm, relative to $A$.\n",
- "This is known as _backward error analysis_.\n",
- "\n",
- "\n",
- "\n",
- "To discuss floating point errors we need to be precise which order the operations happened.\n",
- "We will use the definition `mul(A,x)`, which denote ${\\rm mul}(A, 𝐱)$. (Note that `mul_rows` actually\n",
- "does the _exact_ same operations, just in a different order.) Note that each entry of the result is in fact a dot-product\n",
- "of the corresponding rows so we first consider the error in the dot product `dot(𝐱,𝐲)` as implemented in floating-point, \n",
- "which we denote ${\\rm dot}(A,x)$.\n",
- "\n",
- "We first need a helper proposition:\n",
- "\n",
- "**Proposition** If $|ϵ_i| ≤ ϵ$ and $n ϵ < 1$, then\n",
- "$$\n",
- "\\prod_{k=1}^n (1+ϵ_i) = 1+θ_n\n",
- "$$\n",
- "for some constant $θ_n$ satisfying $|θ_n| ≤ {n ϵ \\over 1-nϵ}$.\n",
- "\n",
- "The proof is left as an exercise (Hint: use induction).\n",
- "\n",
- "**Lemma (dot product backward error)**\n",
- "For any norm,\n",
- "$$\n",
- "{\\rm dot}(𝐱, 𝐲) = (𝐱 + δ𝐱)^⊤ 𝐲\n",
- "$$\n",
- "where\n",
- "$$\n",
- "\\|δ𝐱\\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|𝐱 \\|\n",
- "$$\n",
- "\n",
- "\n",
- "**Proof**\n",
- "\n",
- "Note that \n",
- "$$\n",
- "{\\rm dot}(𝐱, 𝐲) = \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n",
- " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \\}(1+γ_{n-1})+x_n y_n(1+δ_n) \\\\\n",
- " & = x_1 y_1 (1+δ_1) \\prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \\prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \\prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\\\\n",
- " &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \\cdots + x_n y_n (1+θ_1)\n",
- "\\end{align*}\n",
- "$$\n",
- "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us\n",
- "$$\n",
- "|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}\n",
- "$$\n",
- "and\n",
- "$$\n",
- "|θ_k| ≤ {k ϵ_{\\rm m} \\over 1-kϵ_{\\rm m}} ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}.\n",
- "$$\n",
- "Thus\n",
- "$$\n",
- "δ𝐱 = \\begin{pmatrix} x_1 θ_n^1 \\cr x_2 θ_n^2 \\cr x_3 θ_{n-1} \\cr \\vdots \\cr x_n θ_1\\end{pmatrix}\n",
- "$$\n",
- "and the theorem follows:\n",
- "$$\n",
- "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\| 𝐱 \\|\n",
- "$$\n",
- "\n",
- "∎\n",
- "\n",
- "**Theorem (matrix-vector backward error)**\n",
- "For any norm,\n",
- "$$\n",
- "{\\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱\n",
- "$$\n",
- "where\n",
- "$$\n",
- "\\|δA\\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|A \\|\n",
- "$$\n",
- "\n",
- "**Proof**\n",
- "\n",
- "Follows from applying the previous lemma to each row.\n",
- "\n",
- "∎\n",
- "\n",
- "\n",
- "So now we get to a mathematical question independent of floating point: can we bound the _relative error_ in approximating\n",
- "$$\n",
- "A 𝐱 ≈ (A + δA) 𝐱\n",
- "$$\n",
- "if we know a bound on $\\|δA\\|$?\n",
- "It turns out we can in turns of the _condition number_ of the matrix:\n",
- "\n",
- "**Definition (condition number)**\n",
- "For a square matrix $A$, the _condition number_ is\n",
- "$$\n",
- "κ(A) := \\| A \\| \\| A^{-1} \\| = {σ_1 \\over σ_n}\n",
- "$$\n",
- "\n",
- "\n",
- "**Theorem (relative-error for matrix-vector)**\n",
- "The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is\n",
- "$$\n",
- "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n",
- "$$\n",
- "\n",
- "**Proof**\n",
- "We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). Denote $𝐲 = A 𝐱$ and we have\n",
- "$$\n",
- "{\\|𝐱 \\| \\over \\| A 𝐱 \\|} = {\\|A^{-1} 𝐲 \\| \\over \\|𝐲 \\|} ≤ \\| A^{-1}\\|\n",
- "$$\n",
- "Thus we have:\n",
- "$$\n",
- "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ \\| δA\\| \\|A^{-1}\\| ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n",
- "$$\n",
- "\n",
- "∎\n",
- "\n",
- "\n",
- "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}$.\n",
- "\n",
- "If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error. \n",
- "As we have already noted, there are some matrices where PLU decompositions introduce large errors, so\n",
- "in that case well-conditioning is not a guarantee (but it still usually works)."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Julia 1.7.0",
- "language": "julia",
- "name": "julia-1.7"
- },
- "language_info": {
- "file_extension": ".jl",
- "mimetype": "application/julia",
- "name": "julia",
- "version": "1.7.1"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/notebooks/SingularValues.ipynb b/notebooks/SingularValues.ipynb
index 6437a11..523d31f 100644
--- a/notebooks/SingularValues.ipynb
+++ b/notebooks/SingularValues.ipynb
@@ -8,8 +8,8 @@
"\n",
"\n",
"In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves\n",
- "_singular values_, which are a measure of how matrices \"stretch\" vectors, similar to\n",
- "eigenvalues but more robust. We also introduce condition of problems, and show that\n",
+ "_singular values_, which are a measure of how matrices \"stretch\" vectors. \n",
+    "We also show that\n",
"the singular values of a matrix give a notion of a _condition number_, which allows us\n",
"to bound errors introduced by floating point numbers in linear algebra operations.\n",
"\n",
@@ -19,8 +19,29 @@
"3. Singular value decomposition: we introduce the singular value decomposition which is related to\n",
"the matrix $2$-norm and best low rank approximation.\n",
"4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems\n",
- "can be bounded in terms of the _condition number_, which is defined in terms of singular values.\n",
- "\n",
+ "can be bounded in terms of the _condition number_, which is defined in terms of singular values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2022-02-05T14:26:10.748464Z",
+ "iopub.status.busy": "2022-02-05T14:26:10.279635Z",
+ "iopub.status.idle": "2022-02-05T14:26:15.813546Z",
+ "shell.execute_reply": "2022-02-05T14:26:15.812864Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "using LinearAlgebra, Plots"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"## 1. Vector norms\n",
"\n",
"Recall the definition of a (vector-)norm:\n",
@@ -49,6 +70,8 @@
"\n",
"**Proof**\n",
"\n",
+    "We will only prove the cases $p = 1, 2, ∞$ as the proof for general $p$ is more involved.\n",
+ "\n",
"Homogeneity and positive-definiteness are straightforward: e.g.,\n",
"$$\n",
"\\|c 𝐱\\|_p = (\\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \\sum_{k=1}^n |x_k|^p)^{1/p} = |c| \\| 𝐱 \\|\n",
@@ -73,16 +96,15 @@
"\\| 𝐱 + 𝐲 \\|^2 = \\|𝐱\\|^2 + 2 𝐱^⊤ 𝐲 + \\|𝐲\\|^2 ≤ \\|𝐱\\|^2 + 2\\| 𝐱 \\| \\| 𝐲 \\| + \\|𝐲\\|^2 = (\\| 𝐱 \\| + \\| 𝐲 \\|)\n",
"$$\n",
"\n",
- "For general $1 < p < ∞$, it suffices to assume $\\| 𝐱 + 𝐲 \\| = 1$.\n",
- "consider $\\| 𝐱 + 𝐲 \\|^p$...\n",
"\n",
"∎\n",
"\n",
"\n",
" In Julia can use the inbuilt `norm` function to calculate norms:\n",
" ```julia\n",
- " norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2);\n",
- " norm([1,-2,3],1) == sqrt(1 + 2 + 3)\n",
+ " norm([1,-2,3]) == norm([1,-2,3], 2) == sqrt(1^2 + 2^2 + 3^2);\n",
+    " norm([1,-2,3], 1) == 1 + 2 + 3;\n",
+ " norm([1,-2,3], Inf) == 3;\n",
" ```\n",
"\n",
"\n",
@@ -92,6 +114,41 @@
"$$\n",
"\\|A\\|_F := \\sqrt{\\sum_{k=1}^m \\sum_{j=1}^n A_{kj}^2}\n",
"$$\n",
+ "This is available as `norm` in Julia:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2022-02-05T14:26:17.314876Z",
+ "iopub.status.busy": "2022-02-05T14:26:15.815322Z",
+ "iopub.status.idle": "2022-02-05T14:26:18.213634Z",
+ "shell.execute_reply": "2022-02-05T14:26:18.212896Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "true"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "A = randn(5,3)\n",
+ "norm(A) == norm(vec(A))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
"While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a \n",
"vector norm:\n",
"\n",
@@ -167,9 +224,45 @@
"$$\n",
"that is, the maximum $1$-norm of the rows.\n",
"\n",
- "\n",
- "\n",
- "An example that is not simple is $\\|A \\|_2$, but we do have two simple cases:\n",
+ "Matrix norms are available via `opnorm`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "execution": {
+ "iopub.execute_input": "2022-02-05T14:26:18.216129Z",
+ "iopub.status.busy": "2022-02-05T14:26:18.215539Z",
+ "iopub.status.idle": "2022-02-05T14:26:18.771829Z",
+ "shell.execute_reply": "2022-02-05T14:26:18.771362Z"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "2.9150640682982107"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "m,n = 5,3\n",
+ "A = randn(m,n)\n",
+ "opnorm(A,1) == maximum(norm(A[:,j],1) for j = 1:n)\n",
+ "opnorm(A,Inf) == maximum(norm(A[k,:],1) for k = 1:m)\n",
+ "opnorm(A) # the 2-norm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "An example that does not have a simple formula is $\\|A \\|_2$, but we do have two simple cases:\n",
"\n",
"**Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then\n",
"$\\|Λ\\|_2 = \\max_k |λ_k|.$. If $Q$ is orthogonal then $\\|Q\\| = 1$.\n",
@@ -190,18 +283,18 @@
"$$\n",
"A = Ũ Σ̃ Ṽ^⊤\n",
"$$\n",
- "where $U ∈ ℝ^{m × m}$ and $V ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ ∈ ℝ^{m × n}$ has only\n",
+ "where $Ũ ∈ ℝ^{m × m}$ and $Ṽ ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ̃ ∈ ℝ^{m × n}$ has only\n",
"diagonal entries, i.e., if $m > n$,\n",
"$$\n",
- "Σ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_n \\\\ && 0 \\\\ && ⋮ \\\\ && 0 \\end{bmatrix}\n",
+ "Σ̃ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_n \\\\ && 0 \\\\ && ⋮ \\\\ && 0 \\end{bmatrix}\n",
"$$\n",
"and if $m < n,\n",
"$$\n",
- "Σ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_m & 0 & ⋯ & 0\\end{bmatrix}\n",
+ "Σ̃ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_m & 0 & ⋯ & 0\\end{bmatrix}\n",
"$$\n",
"where $σ_k = 0$ if $k > r$.\n",
"\n",
- "To show the SVD exists we first establish some properties of a Gram matrix:\n",
+ "To show the SVD exists we first establish some properties of a _Gram matrix_ ($A^⊤A$):\n",
"\n",
"**Proposition (Gram matrix kernel)** The kernel of $A$ is the also the kernel of $A^⊤ A$. \n",
"\n",
@@ -233,11 +326,14 @@
"\n",
"This connection allows us to prove existence:\n",
"\n",
- "**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD.\n",
+ "**Theorem (SVD existence)** Every $A ∈ ℝ^{m × n}$ has an SVD.\n",
"\n",
"**Proof**\n",
- "\n",
- "Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n",
+ "Consider\n",
+ "$$\n",
+ "A^⊤ A = Q Λ Q^⊤.\n",
+ "$$\n",
+ "Assume (as usual) that the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n",
"are an enumeration of the non-zero eigenvalues and\n",
"$$\n",
"V := \\begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \\end{bmatrix}\n",
@@ -255,7 +351,7 @@
"$$\n",
"U := AV Σ^{-1}\n",
"$$\n",
- "which is orthogonal since $A^⊤ A V = Σ^2 V$:\n",
+    "which has orthonormal columns since $A^⊤ A V = V Σ^2$:\n",
"$$\n",
"U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I.\n",
"$$\n",
@@ -273,9 +369,9 @@
"$$\n",
"\\|A \\|_2 = σ_1\n",
"$$\n",
- "and if $A$ is invertible, then\n",
+ "and if $A ∈ ℝ^{n × n}$ is invertible, then\n",
"$$\n",
- "\\|A^{-1} \\|_2 = σ_r^{-1}\n",
+ "\\|A^{-1} \\|_2 = σ_n^{-1}\n",
"$$\n",
"\n",
"**Proof**\n",
@@ -292,7 +388,7 @@
"$$\n",
"A^{-1} = V Σ^{-1} U^⊤ = V (W Σ^{-1} W) U^⊤\n",
"$$\n",
- "where\n",
+    "where the diagonal entries of $W Σ^{-1} W$ are now decreasing, and\n",
"$$\n",
"W := P_σ = \\begin{bmatrix} && 1 \\\\ & ⋰ \\\\ 1 \\end{bmatrix}\n",
"$$\n",
@@ -327,10 +423,8 @@
"We have\n",
"\n",
"$$\n",
- "\\|A - A_k\\|_2 = \\|U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& \\ddots \\cr &&&&& σ_n \\cr &&&&&\\vdots \\cr &&&&&0\\end{bmatrix} = σ_{r+1}\n",
+    "A - A_k = U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& \\ddots \\cr &&&&& σ_r\\end{bmatrix} V^⊤\n",
     "$$\n",
+    "so that, by the proposition on diagonal/orthogonal $2$-norms, $\\|A - A_k\\|_2 = σ_{k+1}$.\n",
- "\n",
- "\n",
"Suppose a rank-$k$ matrix $B$ has \n",
"$$\n",
"\\|A-B\\|_2 < \\|A-A_k\\|_2 = σ_{k+1}.\n",
@@ -340,7 +434,7 @@
"\\|A 𝐰\\|_2 = \\|(A-B) 𝐰\\|_2 ≤ \\|A-B\\|\\|𝐰\\|_2 < σ_{k+1} \\|𝐰\\|_2\n",
"$$\n",
"\n",
- "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n",
+ "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:k+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n",
"$$\n",
"\\|A 𝐮\\|_2^2 = \\|U Σ_k 𝐜\\|_2^2 = \\|Σ_k 𝐜\\|_2^2 =\n",
"\\sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \\|c\\|^2,\n",
@@ -358,13 +452,13 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 4,
"metadata": {
"execution": {
- "iopub.execute_input": "2022-02-04T14:46:57.839027Z",
- "iopub.status.busy": "2022-02-04T14:46:57.441840Z",
- "iopub.status.idle": "2022-02-04T14:47:03.612331Z",
- "shell.execute_reply": "2022-02-04T14:47:03.611754Z"
+ "iopub.execute_input": "2022-02-05T14:26:18.773994Z",
+ "iopub.status.busy": "2022-02-05T14:26:18.773408Z",
+ "iopub.status.idle": "2022-02-05T14:26:19.973597Z",
+ "shell.execute_reply": "2022-02-05T14:26:19.973138Z"
}
},
"outputs": [
@@ -379,13 +473,12 @@
" 0.2 0.166667 0.142857 0.125 0.111111"
]
},
- "execution_count": 1,
+ "execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "using LinearAlgebra, Plots\n",
"function hilbertmatrix(n)\n",
" ret = zeros(n,n)\n",
" for j = 1:n, k=1:n\n",
@@ -405,13 +498,13 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 5,
"metadata": {
"execution": {
- "iopub.execute_input": "2022-02-04T14:47:04.520855Z",
- "iopub.status.busy": "2022-02-04T14:47:03.614279Z",
- "iopub.status.idle": "2022-02-04T14:47:13.320995Z",
- "shell.execute_reply": "2022-02-04T14:47:13.319968Z"
+ "iopub.execute_input": "2022-02-05T14:26:19.976953Z",
+ "iopub.status.busy": "2022-02-05T14:26:19.975449Z",
+ "iopub.status.idle": "2022-02-05T14:26:30.312323Z",
+ "shell.execute_reply": "2022-02-05T14:26:30.311896Z"
}
},
"outputs": [
@@ -421,100 +514,185 @@
"\n",
"\n"
+ "\n",
+ "\n"
]
},
- "execution_count": 2,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -522,25 +700,28 @@
"source": [
"H = hilbertmatrix(100)\n",
"U,σ,V = svd(H)\n",
- "plot(σ; yscale=:log10)"
+ "scatter(σ; yscale=:log10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Thus we can approximate it well with a rank 20 matrix:"
+    "Note that numerically we typically do not get exactly zero singular values, so the rank is always\n",
+    "treated as $\\min(m,n)$.\n",
+    "Because the singular values decay rapidly,\n",
+    "we can approximate the matrix very well with a rank-20 matrix:"
]
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 6,
"metadata": {
"execution": {
- "iopub.execute_input": "2022-02-04T14:47:13.324115Z",
- "iopub.status.busy": "2022-02-04T14:47:13.323489Z",
- "iopub.status.idle": "2022-02-04T14:47:14.774032Z",
- "shell.execute_reply": "2022-02-04T14:47:14.773489Z"
+ "iopub.execute_input": "2022-02-05T14:26:30.314396Z",
+ "iopub.status.busy": "2022-02-05T14:26:30.313854Z",
+ "iopub.status.idle": "2022-02-05T14:26:31.430903Z",
+ "shell.execute_reply": "2022-02-05T14:26:31.430449Z"
}
},
"outputs": [
@@ -550,7 +731,7 @@
"1.15531964884547e-15"
]
},
- "execution_count": 3,
+ "execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -567,6 +748,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
+    "Note that this can be viewed as a _compression_ algorithm: we have replaced a matrix with\n",
+    "$100^2 = 10,000$ entries by two matrices and a vector with roughly $4,000$ entries, while losing\n",
+    "essentially no information (the error is on the order of machine precision).\n",
"In the problem sheet we explore the usage of low rank approximation to smooth functions.\n",
"\n",
"\n",
@@ -580,7 +764,7 @@
"about matrix inversion.\n",
"\n",
"To justify what follows, we first observe that errors in implementing matrix-vector multiplication\n",
- "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x`\n",
+ "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A*x`\n",
"(implemented with floating point) is precisely $A + δA$ where $δA$ has small norm, relative to $A$.\n",
"This is known as _backward error analysis_.\n",
"\n",
@@ -618,27 +802,25 @@
"Note that \n",
"$$\n",
"\\begin{align*}\n",
- "{\\rm dot}(𝐱, 𝐲) = \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n",
- " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \\}(1+γ_{n-1})+x_n y_n(1+δ_n) \\\\\n",
- " & = x_1 y_1 (1+δ_1) \\prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \\prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \\prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\\\\n",
- " &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \\cdots + x_n y_n (1+θ_1)\n",
+ "{\\rm dot}(𝐱, 𝐲) &= \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n",
+ " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_2) +x_3 y_3(1+δ_3)](1+γ_3) + ⋯ +x_n y_n(1+δ_n) \\}(1+γ_n) \\\\\n",
+    " & = ∑_{j = 1}^n x_j y_j (1+δ_j) ∏_{k=j}^n (1 + γ_k) \\\\\n",
+ " & = ∑_{j = 1}^n x_j y_j (1+θ_j)\n",
"\\end{align*}\n",
"$$\n",
- "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us\n",
+ "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ \n",
+    "(with $γ_1 := 0$). Each $1+θ_j$ is a product of at most $n$ factors $(1+δ_j)$ and $(1+γ_k)$, where every\n",
+    "error is bounded by $ϵ_{\\rm m}/2$. Thus the previous proposition tells us\n",
"$$\n",
- "|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}\n",
- "$$\n",
- "and\n",
- "$$\n",
- "|θ_k| ≤ {k ϵ_{\\rm m} \\over 1-kϵ_{\\rm m}} ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}.\n",
+ "|θ_j| ≤ {n ϵ_{\\rm m} \\over 2- nϵ_{\\rm m}}.\n",
"$$\n",
"Thus\n",
"$$\n",
"δ𝐱 = \\begin{pmatrix} x_1 θ_n^1 \\cr x_2 θ_n^2 \\cr x_3 θ_{n-1} \\cr \\vdots \\cr x_n θ_1\\end{pmatrix}\n",
"$$\n",
- "and the theorem follows:\n",
+ "and the theorem follows from homogeneity:\n",
"$$\n",
- "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\| 𝐱 \\|\n",
+ "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}} \\| 𝐱 \\|\n",
"$$\n",
"\n",
"∎\n",
@@ -646,16 +828,15 @@
"**Theorem (matrix-vector backward error)**\n",
"For any norm,\n",
"$$\n",
- "{\\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱\n",
+ "{\\rm mul}(A, 𝐱) = (A + δA) 𝐱\n",
"$$\n",
"where\n",
"$$\n",
- "\\|δA\\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|A \\|\n",
+ "\\|δA\\| ≤ {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}} \\|A \\|\n",
"$$\n",
"\n",
"**Proof**\n",
- "\n",
- "Follows from applying the previous lemma to each row.\n",
+ "Follows from applying the previous lemma to each row and homogeneity.\n",
"\n",
"∎\n",
"\n",
@@ -668,17 +849,22 @@
"It turns out we can in turns of the _condition number_ of the matrix:\n",
"\n",
"**Definition (condition number)**\n",
- "For a square matrix $A$, the _condition number_ is\n",
+ "For a square matrix $A$, the _condition number_ (in $p$-norm) is\n",
+ "$$\n",
+ "κ_p(A) := \\| A \\|_p \\| A^{-1} \\|_p\n",
+ "$$\n",
+    "In the case of the $2$-norm this is given by the singular values:\n",
"$$\n",
- "κ(A) := \\| A \\| \\| A^{-1} \\| = {σ_1 \\over σ_n}\n",
+ "κ_2(A) = {σ_1 \\over σ_n}.\n",
"$$\n",
"\n",
"\n",
"**Theorem (relative-error for matrix-vector)**\n",
"The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is\n",
"$$\n",
- "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n",
+ "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) ε\n",
"$$\n",
+    "assuming the perturbation satisfies $\\|δA\\| = \\|A \\| ε$.\n",
"\n",
"**Proof**\n",
"We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). Denote $𝐲 = A 𝐱$ and we have\n",
@@ -693,7 +879,7 @@
"∎\n",
"\n",
"\n",
- "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}$.\n",
+ "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}}$.\n",
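+    "\n",
+    "For example, the Hilbert matrix constructed above is notoriously ill-conditioned, so this bound degrades very quickly as $n$ grows:\n",
+    "```julia\n",
+    "cond(hilbertmatrix(5)), cond(hilbertmatrix(10)), cond(hilbertmatrix(20))\n",
+    "```\n",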
"\n",
"If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error. \n",
"As we have already noted, there are some matrices where PLU decompositions introduce large errors, so\n",
@@ -711,7 +897,7 @@
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
- "version": "1.7.1"
+ "version": "1.7.0"
}
},
"nbformat": 4,
diff --git a/src/SingularValues.jmd b/src/SingularValues.jmd
index 397917f..15f2994 100644
--- a/src/SingularValues.jmd
+++ b/src/SingularValues.jmd
@@ -2,8 +2,8 @@
In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves
-_singular values_, which are a measure of how matrices "stretch" vectors, similar to
-eigenvalues but more robust. We also introduce condition of problems, and show that
+_singular values_, which are a measure of how matrices "stretch" vectors.
+We also show that
the singular values of a matrix give a notion of a _condition number_, which allows us
to bound errors introduced by floating point numbers in linear algebra operations.
@@ -15,6 +15,10 @@ the matrix $2$-norm and best low rank approximation.
4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems
can be bounded in terms of the _condition number_, which is defined in terms of singular values.
+```julia
+using LinearAlgebra, Plots
+```
+
## 1. Vector norms
Recall the definition of a (vector-)norm:
@@ -43,6 +47,8 @@ $$
**Proof**
+We will only prove the cases $p = 1, 2, ∞$ as the proof for general $p$ is more involved.
+
Homogeneity and positive-definiteness are straightforward: e.g.,
$$
\|c 𝐱\|_p = (\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \sum_{k=1}^n |x_k|^p)^{1/p} = |c| \| 𝐱 \|
@@ -67,16 +73,15 @@ $$
\| 𝐱 + 𝐲 \|^2 = \|𝐱\|^2 + 2 𝐱^⊤ 𝐲 + \|𝐲\|^2 ≤ \|𝐱\|^2 + 2\| 𝐱 \| \| 𝐲 \| + \|𝐲\|^2 = (\| 𝐱 \| + \| 𝐲 \|)^2
$$
-For general $1 < p < ∞$, it suffices to assume $\| 𝐱 + 𝐲 \| = 1$.
-consider $\| 𝐱 + 𝐲 \|^p$...
∎
In Julia we can use the inbuilt `norm` function to calculate norms:
```julia
- norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2);
- norm([1,-2,3],1) == sqrt(1 + 2 + 3)
+ norm([1,-2,3]) == norm([1,-2,3], 2) == sqrt(1^2 + 2^2 + 3^2);
+ norm([1,-2,3], 1) == 1 + 2 + 3;
+ norm([1,-2,3], Inf) == 3;
```
@@ -86,6 +91,12 @@ consider $\| 𝐱 + 𝐲 \|^p$...
$$
\|A\|_F := \sqrt{\sum_{k=1}^m \sum_{j=1}^n A_{kj}^2}
$$
+This is available as `norm` in Julia:
+```julia
+A = randn(5,3)
+norm(A) == norm(vec(A))
+```
+
While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a
vector norm:
@@ -161,9 +172,17 @@ $$
$$
that is, the maximum $1$-norm of the rows.
+Matrix norms are available via `opnorm`:
+```julia
+m,n = 5,3
+A = randn(m,n)
+opnorm(A,1) == maximum(norm(A[:,j],1) for j = 1:n)
+opnorm(A,Inf) == maximum(norm(A[k,:],1) for k = 1:m)
+opnorm(A) # the 2-norm
+```
-An example that is not simple is $\|A \|_2$, but we do have two simple cases:
+An example that does not have a simple formula is $\|A \|_2$, but we do have two simple cases:
**Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then
$\|Λ\|_2 = \max_k |λ_k|$. If $Q$ is orthogonal then $\|Q\|_2 = 1$.
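+
+We can sanity-check this numerically (an illustration, not a proof), using a diagonal matrix and
+the orthogonal factor from a QR decomposition of a random matrix:
+```julia
+Λ = Diagonal([1.0, -3.0, 2.0])
+opnorm(Λ) ≈ 3               # = max_k |λ_k|
+
+Q, R = qr(randn(4,4))       # Q is orthogonal
+opnorm(Matrix(Q)) ≈ 1
+```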
@@ -195,7 +214,7 @@ $$
$$
where $σ_k = 0$ if $k > r$.
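+
+In Julia the (reduced) SVD is computed by `svd`. As a quick sketch of its behaviour on a random matrix:
+```julia
+A = randn(5, 3)
+U, σ, V = svd(A)             # U is 5×3, σ is a decreasing vector, V is 3×3
+A ≈ U*Diagonal(σ)*V'         # recovers A
+norm(U'U - I), norm(V'V - I) # ≈ 0: orthonormal columns
+```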
-To show the SVD exists we first establish some properties of a Gram matrix:
+To show the SVD exists we first establish some properties of a _Gram matrix_ ($A^⊤A$):
**Proposition (Gram matrix kernel)** The kernel of $A$ is also the kernel of $A^⊤ A$.
@@ -227,11 +246,14 @@ $$
This connection allows us to prove existence:
-**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD.
+**Theorem (SVD existence)** Every $A ∈ ℝ^{m × n}$ has an SVD.
**Proof**
-
-Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$
+Consider
+$$
+A^⊤ A = Q Λ Q^⊤.
+$$
+Assume (as usual) that the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$
are an enumeration of the non-zero eigenvalues and
$$
V := \begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \end{bmatrix}
@@ -249,7 +271,7 @@ Now define
$$
U := AV Σ^{-1}
$$
-which is orthogonal since $A^⊤ A V = Σ^2 V$:
+which has orthonormal columns since $A^⊤ A V = V Σ^2$:
$$
U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I.
$$
@@ -267,9 +289,9 @@ Singular values tell us the 2-norm:
$$
\|A \|_2 = σ_1
$$
-and if $A$ is invertible, then
+and if $A ∈ ℝ^{n × n}$ is invertible, then
$$
-\|A^{-1} \|_2 = σ_r^{-1}
+\|A^{-1} \|_2 = σ_n^{-1}
$$
**Proof**
@@ -286,7 +308,7 @@ The inverse result follows since the inverse has SVD
$$
A^{-1} = V Σ^{-1} U^⊤ = (VW) (W Σ^{-1} W) (UW)^⊤
$$
-where
+where the diagonal entries of $W Σ^{-1} W$ are now decreasing, and
$$
W := P_σ = \begin{bmatrix} && 1 \\ & ⋰ \\ 1 \end{bmatrix}
$$
@@ -321,10 +343,8 @@ $$\|A - A_k\|_2 ≤ \|A -B \|_2.$$
We have
$$
-\|A - A_k\|_2 = \|U \begin{bmatrix} 0 \cr &\ddots \cr && 0 \cr &&& σ_{k+1} \cr &&&& \ddots \cr &&&&& σ_n \cr &&&&&\vdots \cr &&&&&0\end{bmatrix} = σ_{r+1}
+A - A_k = U \begin{bmatrix} 0 \cr &\ddots \cr && 0 \cr &&& σ_{k+1} \cr &&&& \ddots \cr &&&&& σ_r\end{bmatrix} V^⊤
$$
+so that, by the proposition on diagonal/orthogonal $2$-norms, $\|A - A_k\|_2 = σ_{k+1}$.
-
-
Suppose a rank-$k$ matrix $B$ has
$$
\|A-B\|_2 < \|A-A_k\|_2 = σ_{k+1}.
@@ -334,7 +354,7 @@ $$
\|A 𝐰\|_2 = \|(A-B) 𝐰\|_2 ≤ \|A-B\|\|𝐰\|_2 < σ_{k+1} \|𝐰\|_2
$$
-But for all $𝐮 \in {\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \in ℝ^{k+1}$ we have
+But for all $𝐮 \in {\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:k+1]𝐜$ for some $𝐜 \in ℝ^{k+1}$ we have
$$
\|A 𝐮\|_2^2 = \|U Σ_k 𝐜\|_2^2 = \|Σ_k 𝐜\|_2^2 =
\sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \|c\|^2,
@@ -349,7 +369,6 @@ Since these two spaces cannot intersect we have a contradiction, since $(n-r) +
Here we show an example of a simple low-rank approximation using the SVD. Consider the Hilbert matrix:
```julia
-using LinearAlgebra, Plots
function hilbertmatrix(n)
ret = zeros(n,n)
for j = 1:n, k=1:n
@@ -393,7 +412,7 @@ the exact nature of floating point arithmetic. Here we consider only matrix-mult
about matrix inversion.
To justify what follows, we first observe that errors in implementing matrix-vector multiplication
-can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x`
+can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A*x`
(implemented with floating point) is precisely $(A + δA)𝐱$ where $δA$ has small norm relative to $A$.
This is known as _backward error analysis_.
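+
+As a rough illustration (comparing `Float32` arithmetic against a higher-precision computation, which we treat as exact),
+the error is indeed tiny relative to $\|A\| \|𝐱\|$:
+```julia
+A = randn(Float32, 100, 100)
+x = randn(Float32, 100)
+err = Float64.(A)*Float64.(x) - A*x                # error in the Float32 product
+norm(err)/(opnorm(Float64.(A))*norm(Float64.(x)))  # small: consistent with ‖δA‖ ≪ ‖A‖
+```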
@@ -431,27 +450,25 @@ $$
Note that
$$
\begin{align*}
-{\rm dot}(𝐱, 𝐲) = \{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\}⊕(x_n ⊗ y_n) \\
- & = \{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \}(1+γ_{n-1})+x_n y_n(1+δ_n) \\
- & = x_1 y_1 (1+δ_1) \prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\
- &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \cdots + x_n y_n (1+θ_1)
+{\rm dot}(𝐱, 𝐲) &= \{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\}⊕(x_n ⊗ y_n) \\
+ & = \{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_2) +x_3 y_3(1+δ_3)](1+γ_3) + ⋯ +x_n y_n(1+δ_n) \}(1+γ_n) \\
+ & = ∑_{j = 1}^n x_j y_j (1+δ_j) ∏_{k=j}^n (1 + γ_k) \\
+ & = ∑_{j = 1}^n x_j y_j (1+θ_j)
\end{align*}
$$
-where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us
+where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$
+(with $γ_1 := 0$). Each $1+θ_j$ is a product of at most $n$ factors $(1+δ_j)$ and $(1+γ_k)$, where every
+error is bounded by $ϵ_{\rm m}/2$. Thus the previous proposition tells us
$$
-|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}}
-$$
-and
-$$
-|θ_k| ≤ {k ϵ_{\rm m} \over 1-kϵ_{\rm m}} ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}}.
+|θ_j| ≤ {n ϵ_{\rm m} \over 2- nϵ_{\rm m}}.
$$
Thus
$$
δ𝐱 = \begin{pmatrix} x_1 θ_1 \cr x_2 θ_2 \cr \vdots \cr x_n θ_n\end{pmatrix}
$$
-and the theorem follows:
+and the theorem follows from homogeneity:
$$
-\| δ𝐱 \| ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}} \| 𝐱 \|
+\| δ𝐱 \| ≤ {n ϵ_{\rm m} \over 2-nϵ_{\rm m}} \| 𝐱 \|
$$
∎
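+
+As a numerical sanity check (taking a `BigFloat` dot product as the reference and $ϵ_{\rm m}$ to be Julia's `eps()`),
+the observed error sits well within the bound implied by the lemma and Cauchy–Schwarz:
+```julia
+n = 1000
+x, y = randn(n), randn(n)
+err = abs(dot(x, y) - dot(big.(x), big.(y)))       # error of the Float64 dot product
+bound = n*eps()/(2 - n*eps()) * norm(x) * norm(y)  # ‖δ𝐱‖₂ ‖𝐲‖₂ bound
+err ≤ bound
+```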
@@ -459,16 +476,15 @@ $$
**Theorem (matrix-vector backward error)**
For any norm,
$$
-{\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱
+{\rm mul}(A, 𝐱) = (A + δA) 𝐱
$$
where
$$
-\|δA\| ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}} \|A \|
+\|δA\| ≤ {n ϵ_{\rm m} \over 2-nϵ_{\rm m}} \|A \|
$$
**Proof**
-
-Follows from applying the previous lemma to each row.
+Follows from applying the previous lemma to each row and homogeneity.
∎
@@ -481,17 +497,22 @@ if we know a bound on $\|δA\|$?
It turns out we can, in terms of the _condition number_ of the matrix:
**Definition (condition number)**
-For a square matrix $A$, the _condition number_ is
+For a square matrix $A$, the _condition number_ (in $p$-norm) is
+$$
+κ_p(A) := \| A \|_p \| A^{-1} \|_p
+$$
+In the case of the $2$-norm this is given by the singular values:
$$
-κ(A) := \| A \| \| A^{-1} \| = {σ_1 \over σ_n}
+κ_2(A) = {σ_1 \over σ_n}.
$$
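+
+Condition numbers are available in Julia via `cond`; a quick check against this definition on a random matrix:
+```julia
+A = randn(5, 5)
+σ = svdvals(A)
+cond(A) ≈ σ[1]/σ[end] ≈ opnorm(A)*opnorm(inv(A))
+cond(A, 1), cond(A, Inf)    # condition numbers in other norms
+```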
**Theorem (relative-error for matrix-vector)**
The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is
$$
-{\| δA 𝐱 \| \over \| A 𝐱 \| } ≤ κ(A) {\|δA\| \over \|A \|}
+{\| δA 𝐱 \| \over \| A 𝐱 \| } ≤ κ(A) ε
$$
+assuming the perturbation satisfies $\|δA\| = \|A \| ε$.
**Proof**
We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). Denote $𝐲 = A 𝐱$ and we have
@@ -506,7 +527,7 @@ $$
∎
-Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\rm m} \over 1-nϵ_{\rm m}}$.
+Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\rm m} \over 2-nϵ_{\rm m}}$.
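+
+For example, the Hilbert matrix constructed above is notoriously ill-conditioned, so this bound degrades
+very quickly as $n$ grows:
+```julia
+cond(hilbertmatrix(5)), cond(hilbertmatrix(10)), cond(hilbertmatrix(20))
+```
+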
If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error.
As we have already noted, there are some matrices where PLU decompositions introduce large errors, so
diff --git a/src/week5s.jmd b/src/week5s.jmd
index 46a3528..1704d17 100644
--- a/src/week5s.jmd
+++ b/src/week5s.jmd
@@ -35,4 +35,47 @@ $$
\end{bmatrix}
$$
Deduce its two Cholesky decompositions: $Δ_n = L_n L_n^⊤ = U_n U_n^⊤$ where
-$L_n$ is lower triangular and $U_n$ is upper triangular.
\ No newline at end of file
+$L_n$ is lower triangular and $U_n$ is upper triangular.
+
+## 2. Matrix norms
+
+**Problem 2.1⋆** Prove the following:
+$$
+\begin{align*}
+\|A\|_∞ &= \max_k \|A[k,:]\|_1 \\
+\|A\|_{1 → ∞} &= \|\hbox{vec}(A)\|_∞ = \max_{kj} |a_{kj}|
+\end{align*}
+$$
+
+**Problem 2.2⋆**
+
+## 3. Singular value decomposition
+
+**Problem 3.1** Consider functions sampled on an $(n+1) × (n+1)$ 2D grid
+$(x_k,y_j) = (k/n, j/n)$ where $k,j = 0,…,n$.
+For $n = 100$, what is the lowest rank $r$ such that
+the best rank-$r$ approximation to the samples
+is accurate to within $10^{-5}$ for the following functions:
+$$
+(x + 2y)^2, \qquad \cos(\sin(x) {\rm e}^y), \qquad {1 \over x + y + 1}, \qquad {\rm sign}(x-y)
+$$
+For which examples does the answer change when $n = 1000$?
+
+**SOLUTION**
+
+```julia
+using LinearAlgebra
+
+n = 100
+x = y = range(0, 1; length=n+1) # (n+1) × (n+1) grid with spacing 1/n
+
+f = (x,y) -> (x + 2y)^2
+g = (x,y) -> cos(sin(x)*exp(y))
+h = (x,y) -> 1/(x + y + 1)
+s = (x,y) -> sign(x-y)
+
+# the best rank-r approximation has 2-norm error σ_{r+1}, so we read off
+# how many singular values sit above the 10⁻⁵ tolerance
+svdvals(f.(x, y')) # rank 3
+svdvals(g.(x, y')) # rank 5
+svdvals(h.(x, y')) # rank 4
+svdvals(s.(x, y')) # full rank: no useful low-rank approximation
+```
+
+**END**
diff --git a/src/week6s.jmd b/src/week6s.jmd
new file mode 100644
index 0000000..1bdc6b8
--- /dev/null
+++ b/src/week6s.jmd
@@ -0,0 +1,10 @@
+
+
+## 1. Condition numbers
+
+
+**Problem 1.1⋆** Prove that, if $|ϵ_k| ≤ ϵ$ and $n ϵ < 1$, then
+$$
+\prod_{k=1}^n (1+ϵ_k) = 1+θ_n
+$$
+for some constant $θ_n$ satisfying $|θ_n| ≤ {n ϵ \over 1-nϵ}$.