From c4cbedf26c86bd9dac2fbcbfaa575b8588343fc1 Mon Sep 17 00:00:00 2001 From: Sheehan Olver Date: Sat, 5 Feb 2022 14:38:27 +0000 Subject: [PATCH] SingularValues chapter --- notebooks/Norms.ipynb | 497 --------------------------------- notebooks/SingularValues.ipynb | 404 +++++++++++++++++++-------- src/SingularValues.jmd | 103 ++++--- src/week5s.jmd | 45 ++- src/week6s.jmd | 10 + 5 files changed, 411 insertions(+), 648 deletions(-) delete mode 100644 notebooks/Norms.ipynb create mode 100644 src/week6s.jmd diff --git a/notebooks/Norms.ipynb b/notebooks/Norms.ipynb deleted file mode 100644 index 2c9a501..0000000 --- a/notebooks/Norms.ipynb +++ /dev/null @@ -1,497 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Norms, singular values, and conditioning\n", - "\n", - "\n", - "In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves\n", - "_singular values_, which are a measure of how matrices \"stretch\" vectors, similar to\n", - "eigenvalues but more robust. We also introduce condition of problems, and show that\n", - "the singular values of a matrix give a notion of a _condition number_, which allows us\n", - "to bound errors introduced by floating point numbers in linear algebra operations.\n", - "\n", - "1. Vector norms: we discuss the standard $p$-norm for vectors in $ℝ^n$.\n", - "2. Matrix norms: we discuss how two vector norms can be used to induce a norm on matrices. These\n", - "satisfy an additional multiplicative inequality.\n", - "3. Singular value decomposition: we introduce the singular value decomposition which is related to\n", - "the matrix $2$-norm and best low rank approximation.\n", - "4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems\n", - "can be bounded in terms of the _condition number_, which is defined in terms of singular values.\n", - "\n", - "## 1. Vector norms\n", - "\n", - "Recall the definition of a (vector-)norm:\n", - "\n", - "**Definition (vector-norm)** A norm $\\|\\cdot\\|$ on $ℝ^n$ is a function that satisfies the following, for $𝐱,𝐲 ∈ ℝ^n$ and\n", - "$c ∈ ℝ$:\n", - "1. Triangle inequality: $\\|𝐱 + 𝐲 \\| ≤ \\|𝐱\\| + \\|𝐲\\|$\n", - "2. Homogeneneity: $\\| c 𝐱 \\| = |c| \\| 𝐱 \\|$\n", - "3. Positive-definiteness: $\\|𝐱\\| = 0$ implies that $𝐱 = 0$.\n", - "\n", - "\n", - "Consider the following example:\n", - "\n", - "**Definition (p-norm)**\n", - "For $1 ≤ p < ∞$ and $𝐱 \\in ℝ^n$, define the $p$-norm:\n", - "$$\n", - "\\|𝐱\\|_p := \\left(\\sum_{k=1}^n |x_k|^p\\right)^{1/p}\n", - "$$\n", - "where $x_k$ is the $k$-th entry of $𝐱$. 
\n", - "For $p = ∞$ we define\n", - "$$\n", - "\\|𝐱\\|_∞ := \\max_k |x_k|\n", - "$$\n", - "\n", - "**Theorem (p-norm)** $\\| ⋅ \\|_p$ is a norm for $1 ≤ p ≤ ∞$.\n", - "\n", - "**Proof**\n", - "\n", - "Homogeneity and positive-definiteness are straightforward: e.g.,\n", - "$$\n", - "\\|c 𝐱\\|_p = (\\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \\sum_{k=1}^n |x_k|^p)^{1/p} = |c| \\| 𝐱 \\|\n", - "$$\n", - "and if $\\| 𝐱 \\|_p = 0$ then all $|x_k|^p$ are have to be zero.\n", - "\n", - "For $p = 1,∞$ the triangle inequality is also straightforward:\n", - "$$\n", - "\\| 𝐱 + 𝐲 \\|_∞ = \\max_k (|x_k + y_k|) ≤ \\max_k (|x_k| + |y_k|) ≤ \\|𝐱\\|_∞ + \\|𝐲\\|_∞\n", - "$$\n", - "and\n", - "$$\n", - "\\| 𝐱 + 𝐲 \\|_1 = \\sum_{k=1}^n |x_k + y_k| ≤  \\sum_{k=1}^n (|x_k| + |y_k|) = \\| 𝐱 \\|_1 + \\| 𝐲\\|_1\n", - "$$\n", - "\n", - "For $p = 2$ it can be proved using the Cauchy–Schwartz inequality:\n", - "$$\n", - "|𝐱^⊤ 𝐲| ≤ \\| 𝐱 \\|_2 \\| 𝐲 \\|_2\n", - "$$\n", - "That is, we have\n", - "$$\n", - "\\| 𝐱 + 𝐲 \\|^2 = \\|𝐱\\|^2 + 2 𝐱^⊤ 𝐲 + \\|𝐲\\|^2 ≤ \\|𝐱\\|^2 + 2\\| 𝐱 \\| \\| 𝐲 \\| + \\|𝐲\\|^2 = (\\| 𝐱 \\| + \\| 𝐲 \\|)\n", - "$$\n", - "\n", - "For general $1 < p < ∞$, it suffices to assume $\\| 𝐱 + 𝐲 \\| = 1$.\n", - "consider $\\| 𝐱 + 𝐲 \\|^p$...\n", - "\n", - "∎\n", - "\n", - "\n", - " In Julia can use the inbuilt `norm` function to calculate norms:\n", - " ```julia\n", - " norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2);\n", - " norm([1,-2,3],1) == sqrt(1 + 2 + 3)\n", - " ```\n", - "\n", - "\n", - "## 2. Matrix norms\n", - " Just like vectors, matrices have norms that measure their \"length\". The simplest example is the Fröbenius norm, \n", - " defined for an $m \\times n$ real matrix $A$ as\n", - "$$\n", - "\\|A\\|_F := \\sqrt{\\sum_{k=1}^m \\sum_{j=1}^n A_{kj}^2}\n", - "$$\n", - "While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a \n", - "vector norm:\n", - "\n", - "\n", - "\n", - "**Definition (matrix-norm)** Suppose $A ∈ ℝ^{m × n}$ and consider two norms $\\| ⋅ \\|_X$ on $ℝ^n$ and \n", - "$\\| ⋅ \\|_Y$ on $ℝ^n$. Define the _(induced) matrix norm_ as:\n", - "$$\n", - "\\|A \\|_{X → Y} := \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A 𝐯\\|_Y\n", - "$$\n", - "Also define\n", - "$$\n", - "\\|A\\|_X \\triangleq \\|A\\|_{X \\rightarrow X}\n", - "$$\n", - "\n", - "For the induced 2, 1, and $∞$-norm we use\n", - "$$\n", - "\\|A\\|_2, \\|A\\|_1 \\qquad \\hbox{and} \\qquad \\|A\\|_∞.\n", - "$$\n", - "\n", - "Note an equivalent definition of the induced norm:\n", - "$$\n", - "\\|A\\|_{X → Y} = \\sup_{𝐱 ∈ ℝ^n, 𝐱 ≠ 0} {\\|A 𝐱\\|_Y \\over \\| 𝐱\\|_X}\n", - "$$\n", - "This follows since we can scale $𝐱$ by its norm so that it has unit norm, that is,\n", - "${𝐱} \\over \\|𝐱\\|_X$ has unit norm.\n", - "\n", - "**Lemma (matrix norms are norms)** Induced matrix norms are norms, that is for $\\| ⋅ \\| = \\| ⋅ \\|_{X → Y}$ we have:\n", - "1. Triangle inequality: $\\| A + B \\| ≤ \\|A\\| + \\|B\\|$\n", - "1. Homogeneneity: $\\|c A \\| = |c| \\|A\\|$\n", - "3. Positive-definiteness: $\\|A\\| =0 \\Rightarrow A = 0$\n", - "In addition, they satisfy the following additional propertie:\n", - "1. $\\|A 𝐱 \\|_Y ≤ \\|A\\|_{X → Y} \\|𝐱 \\|_X$\n", - "2. Multiplicative inequality: $\\| AB\\|_{X → Z} ≤ \\|A \\|_{Y → Z} \\|B\\|_{X → Y}$\n", - "\n", - "**Proof**\n", - "\n", - "First we show the _triangle inequality_:\n", - "$$\n", - "\\|A + B \\| ≤ \\sup_{𝐯 : \\|𝐯\\|_X=1} (\\|A 𝐯\\|_Y + \\|B 𝐯\\|_Y) ≤ \\| A \\| + \\|B \\|.\n", - "$$\n", - "Homogeneity is also immediate. 
Positive-definiteness follows from the fact that if\n", - "$\\|A\\| = 0$ then $A 𝐱 = 0$ for all $𝐱 ∈ ℝ^n$.\n", - "The property $\\|A 𝐱 \\|_Y ≤ \\|A\\|_{X → Y} \\|𝐱 \\|_X$ follows from the definition. Finally, \n", - "Finally, the multiplicative inequality follows from\n", - "$$\n", - "\\|A B\\| = \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A B 𝐯 |_Z ≤ \\sup_{𝐯 : \\|𝐯\\|_X=1} \\|A\\|_{Y → Z} \\| B 𝐯 | = \\|A \\|_{Y → Z} \\|B\\|_{X → Y}\n", - "$$\n", - "\n", - "\n", - "\n", - "∎\n", - "\n", - "We have some simple examples of induced norms:\n", - "\n", - "**Example ($1$-norm)** We claim \n", - "$$\n", - "\\|A \\|_1 = \\max_j \\|𝐚_j\\|_1\n", - "$$\n", - "that is, the maximum $1$-norm of the columns. To see this use the triangle inequality to\n", - "find for $\\|𝐱\\|_1 = 1$\n", - "$$\n", - "\\| A 𝐱 \\|_1 ≤ ∑_{j = 1}^n |x_j| \\| 𝐚_j\\|_1 ≤ \\max_j \\| 𝐚_j\\| ∑_{j = 1}^n |x_j| = \\max_j \\| 𝐚_j\\|_1.\n", - "$$\n", - "But the bound is also attained since if $j$ is the column that maximises the norms then\n", - "$$\n", - "\\|A 𝐞_j \\|_1 = \\|𝐚_j\\|_1 = \\max_j \\| 𝐚_j\\|_1.\n", - "$$\n", - "\n", - "In the problem sheet we see that\n", - "$$\n", - "\\|A\\|_∞ = \\max_k \\|A[k,:]\\|_1\n", - "$$\n", - "that is, the maximum $1$-norm of the rows.\n", - "\n", - "\n", - "\n", - "An example that is not simple is $\\|A \\|_2$, but we do have two simple cases:\n", - "\n", - "**Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then\n", - "$\\|Λ\\|_2 = \\max_k |λ_k|.$. If $Q$ is orthogonal then $\\|Q\\| = 1$.\n", - "\n", - "\n", - "## 3. Singular value decomposition\n", - "\n", - "To define the induced $2$-norm we need to consider the following:\n", - "\n", - "**Definition (singular value decomposition)** For $A ∈ ℝ^{m × n}$ with rank $r > 0$, \n", - "the _reduced singular value decomposition (SVD)_ is\n", - "$$\n", - "A = U Σ V^⊤\n", - "$$\n", - "where $U ∈ ℝ^{m × r}$ and $V ∈ ℝ^{r × n}$ have orthonormal columns and $Σ ∈ ℝ^{r × r}$ is diagonal whose\n", - "diagonal entries, which which we call _singular values_, are all non-negative and decreasing: $σ_1 ≥ ⋯ ≥ σ_{\\min(m,n)} ≥ 0$.\n", - "The _full singular value decomposition (SVD)_ is\n", - "$$\n", - "A = U Σ V^⊤\n", - "$$\n", - "where $U ∈ ℝ^{m × m}$ and $V ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ ∈ ℝ^{m × n}$.\n", - "\n", - "For symmetric matrices, the SVD is related to the eigenvalue decomposition.\n", - "Recall that a symmetric matrix has real eigenvalues and orthogonal eigenvectors:\n", - "$$\n", - "A = Q Λ Q^⊤ = \\underbrace{Q}_U \\underbrace{|Λ|}_Σ \\underbrace{(\\hbox{sign}(Λ) Q)^⊤}_{V^⊤}\n", - "$$\n", - "For non-symmetric matrices we relate it to the eigenvalues of the _Gram matrix_ $A^⊤A$ and $AA^⊤$ via:\n", - "$$\n", - "\\begin{align*}\n", - "A^⊤ A = V Σ^2 V^⊤ \\\\\n", - "A A^⊤ = U Σ^2 U^⊤\n", - "\\end{align*}\n", - "$$\n", - "That is, $σ_k^2$ are non-zero eigenvalues of $A^⊤ A$ and $A A^⊤$. \n", - "We now establish some properties of a Gram matrix:\n", - "\n", - "**Proposition (Gram matrix kernel)** The kernel of $A$ is the also the kernel of $A^⊤ A$. 
\n", - "\n", - "**Proof**\n", - "If $A^⊤ A 𝐱 = 0$ then we have\n", - "$$\n", - "0 = 𝐱 A^⊤ A 𝐱 = \\| A 𝐱 \\|^2\n", - "$$\n", - "which means $A 𝐱 = 0$ and $𝐱 ∈ \\hbox{ker}(A)$.\n", - "∎\n", - "\n", - "\n", - "\n", - "This connection allows us to prove existence:\n", - "\n", - "**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD.\n", - "\n", - "**Proof**\n", - "\n", - "First note that $A^⊤ A = Q Λ Q^⊤$ has non-negative eigenvalues $λ_k$ as,\n", - "for the corresponding (orthonormal) eigenvector $𝐪_k$,\n", - "$$\n", - "λ_k = λ_k 𝐪_k^⊤ 𝐪_k = 𝐪_k^⊤ A^⊤ A 𝐪_k = \\| A 𝐪_k \\| ≥ 0.\n", - "$$\n", - "Further, the kernel of $A^⊤ A$ is the same as $A$.\n", - "Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n", - "are an enumeration of the non-zero eigenvalues and\n", - "$$\n", - "V := \\begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \\end{bmatrix}\n", - "$$\n", - "the corresponding (orthonormal) eigenvectors, with\n", - "$$\n", - "K = \\begin{bmatrix} 𝐪_{r+1} | ⋯ | 𝐪_n \\end{bmatrix}\n", - "$$\n", - "the corresponding kernel. \n", - "Define\n", - "$$\n", - "Σ := \\begin{bmatrix} \\sqrt{λ_1} \\\\ & ⋱ \\\\ && \\sqrt{λ_r} \\end{bmatrix}\n", - "$$\n", - "Now define\n", - "$$\n", - "U := AV Σ^{-1}\n", - "$$\n", - "which is orthogonal since $A^⊤ A V = Σ^2 V$:\n", - "$$\n", - "U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I.\n", - "$$\n", - "Thus we have\n", - "$$\n", - "U Σ V^⊤ = A V V^⊤ = A \\underbrace{\\begin{bmatrix} V | K \\end{bmatrix}}_Q\\underbrace{\\begin{bmatrix} V^⊤ \\\\ K^⊤ \\end{bmatrix}}_{Q^⊤}\n", - "$$\n", - "where we use the fact that $A K = 0$ so that concatenating $K$ does not change the value.\n", - "\n", - "∎\n", - "\n", - "Singular values tell us the 2-norm:\n", - "\n", - "**Corollary (singular values and norm)**\n", - "$$\n", - "\\|A \\|_2 = σ_1\n", - "$$\n", - "and if $A$ is invertible, then\n", - "$$\n", - "\\|A^{-1} \\|_2 = σ_r^{-1}\n", - "$$\n", - "\n", - "**Proof**\n", - "\n", - "First we establish the upper-bound:\n", - "$$\n", - "\\|A \\|_2 ≤  \\|U \\|_2 \\| Σ \\|_2 \\| V^⊤\\|_2 = \\| Σ \\|_2 = σ_1\n", - "$$\n", - "This is attained using the first right singular vector:\n", - "$$\n", - "\\|A 𝐯_1\\|_2 = \\|Σ V^⊤ 𝐯_1\\|_2 = \\|Σ 𝐞_1\\|_2 = σ_1\n", - "$$\n", - "The inverse result follows since the inverse has SVD\n", - "$$\n", - "A^{-1} = V Σ^{-1} U^⊤ = V (W Σ^{-1} W) U^⊤\n", - "$$\n", - "where\n", - "$$\n", - "W := P_σ = \\begin{bmatrix} && 1 \\\\ & ⋰ \\\\ 1 \\end{bmatrix}\n", - "$$\n", - "is the permutation that reverses the entries, that is, $σ$ has Cauchy notation\n", - "$$\n", - "\\begin{pmatrix}\n", - "1 & 2 & ⋯ & n \\\\\n", - "n & n-1 & ⋯ & 1\n", - "\\end{pmatrix}.\n", - "$$\n", - "\n", - "\n", - "∎\n", - "\n", - "We will not discuss in this module computation of singular value decompositions or eigenvalues:\n", - "they involve iterative algorithms (actually built on a sequence of QR decompositions).\n", - "\n", - "One of the main usages for SVDs is low-rank approximation:\n", - "\n", - "**Theorem (best low rank approximation)** The matrix\n", - "$$\n", - "A_k := \\begin{bmatrix} 𝐮_1 | ⋯ | 𝐮_k \\end{bmatrix} \\begin{bmatrix}\n", - "σ_1 \\\\\n", - "& ⋱ \\\\\n", - "&& σ_k\\end{bmatrix} \\begin{bmatrix} 𝐯_1 | ⋯ | 𝐯_k \\end{bmatrix}^⊤\n", - "$$ \n", - "is the best 2-norm approximation of $A$ by a rank $k$ matrix, that is, for all rank-$k$ matrices $B$, we have \n", - "$$\\|A - A_k\\|_2 ≤ \\|A -B \\|_2.$$\n", - "\n", - "\n", - "**Proof**\n", - "We have\n", - "\n", - "$$\n", - "\\|A - A_k\\|_2 = \\|U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& 
\\ddots \\cr &&&&& σ_n \\cr &&&&&\\vdots \\cr &&&&&0\\end{bmatrix} = σ_{r+1}\n", - "$$\n", - "\n", - "\n", - "Suppose a rank-$k$ matrix $B$ has \n", - "$$\n", - "\\|A-B\\|_2 < \\|A-A_k\\|_2 = σ_{k+1}.\n", - "$$\n", - "For all $𝐰 \\in \\ker(B)$ we have \n", - "$$\n", - "\\|A 𝐰\\|_2 = \\|(A-B) 𝐰\\|_2 ≤ \\|A-B\\|\\|𝐰\\|_2 < σ_{k+1} \\|𝐰\\|_2\n", - "$$\n", - "\n", - "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n", - "$$\n", - "\\|A 𝐮\\|_2^2 = \\|U Σ_k 𝐜\\|_2^2 = \\|Σ_k 𝐜\\|_2^2 =\n", - "\\sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \\|c\\|^2,\n", - "$$\n", - "i.e., $\\|A 𝐮\\|_2 ≥ σ_{k+1} \\|c\\|$. Thus $𝐰$ cannot be in this span.\n", - "\n", - "\n", - "The dimension of the span of $\\ker(B)$ is at least $n-k$, but the dimension of ${\\rm span}(𝐯_1,…,𝐯_{k+1})$ is at least $k+1$.\n", - "Since these two spaces cannot intersect we have a contradiction, since $(n-r) + (r+1) = n+1 > n$. ∎\n", - "\n", - "\n", - "In the problem sheet we explore the usage of low rank approximation to smooth functions.\n", - "\n", - "\n", - "\n", - "## 4. Condition numbers\n", - "\n", - "We have seen that floating point arithmetic induces errors in computations, and that we can typically\n", - "bound the absolute errors to be proportional to $C ϵ_{\\rm m}$. We want a way to bound the\n", - "effect of more complicated calculations like computing $A 𝐱$ or $A^{-1} 𝐲$ without having to deal with\n", - "the exact nature of floating point arithmetic. Here we consider only matrix-multiplication but will make a remark\n", - "about matrix inversion.\n", - "\n", - "To justify what follows, we first observe that errors in implementing matrix-vector multiplication\n", - "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x`\n", - "(implemented with floating point) is precisely $A + δA$ where $δA$ has small norm, relative to $A$.\n", - "This is known as _backward error analysis_.\n", - "\n", - "\n", - "\n", - "To discuss floating point errors we need to be precise which order the operations happened.\n", - "We will use the definition `mul(A,x)`, which denote ${\\rm mul}(A, 𝐱)$. (Note that `mul_rows` actually\n", - "does the _exact_ same operations, just in a different order.) 
Note that each entry of the result is in fact a dot-product\n", - "of the corresponding rows so we first consider the error in the dot product `dot(𝐱,𝐲)` as implemented in floating-point, \n", - "which we denote ${\\rm dot}(A,x)$.\n", - "\n", - "We first need a helper proposition:\n", - "\n", - "**Proposition** If $|ϵ_i| ≤ ϵ$ and $n ϵ < 1$, then\n", - "$$\n", - "\\prod_{k=1}^n (1+ϵ_i) = 1+θ_n\n", - "$$\n", - "for some constant $θ_n$ satisfying $|θ_n| ≤ {n ϵ \\over 1-nϵ}$.\n", - "\n", - "The proof is left as an exercise (Hint: use induction).\n", - "\n", - "**Lemma (dot product backward error)**\n", - "For any norm,\n", - "$$\n", - "{\\rm dot}(𝐱, 𝐲) = (𝐱 + δ𝐱)^⊤ 𝐲\n", - "$$\n", - "where\n", - "$$\n", - "\\|δ𝐱\\| ≤  {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|𝐱 \\|\n", - "$$\n", - "\n", - "\n", - "**Proof**\n", - "\n", - "Note that \n", - "$$\n", - "{\\rm dot}(𝐱, 𝐲) = \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n", - " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \\}(1+γ_{n-1})+x_n y_n(1+δ_n) \\\\\n", - " & = x_1 y_1 (1+δ_1) \\prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \\prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \\prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\\\\n", - " &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \\cdots + x_n y_n (1+θ_1)\n", - "\\end{align*}\n", - "$$\n", - "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us\n", - "$$\n", - "|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}\n", - "$$\n", - "and\n", - "$$\n", - "|θ_k| ≤ {k ϵ_{\\rm m} \\over 1-kϵ_{\\rm m}} ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}.\n", - "$$\n", - "Thus\n", - "$$\n", - "δ𝐱 = \\begin{pmatrix} x_1 θ_n^1 \\cr x_2 θ_n^2 \\cr x_3 θ_{n-1} \\cr \\vdots \\cr x_n θ_1\\end{pmatrix}\n", - "$$\n", - "and the theorem follows:\n", - "$$\n", - "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\| 𝐱 \\|\n", - "$$\n", - "\n", - "∎\n", - "\n", - "**Theorem (matrix-vector backward error)**\n", - "For any norm,\n", - "$$\n", - "{\\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱\n", - "$$\n", - "where\n", - "$$\n", - "\\|δA\\| ≤  {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|A \\|\n", - "$$\n", - "\n", - "**Proof**\n", - "\n", - "Follows from applying the previous lemma to each row.\n", - "\n", - "∎\n", - "\n", - "\n", - "So now we get to a mathematical question independent of floating point: can we bound the _relative error_ in approximating\n", - "$$\n", - "A 𝐱 ≈ (A + δA) 𝐱\n", - "$$\n", - "if we know a bound on $\\|δA\\|$?\n", - "It turns out we can in turns of the _condition number_ of the matrix:\n", - "\n", - "**Definition (condition number)**\n", - "For a square matrix $A$, the _condition number_ is\n", - "$$\n", - "κ(A) := \\| A \\| \\| A^{-1} \\| = {σ_1 \\over σ_n}\n", - "$$\n", - "\n", - "\n", - "**Theorem (relative-error for matrix-vector)**\n", - "The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is\n", - "$$\n", - "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n", - "$$\n", - "\n", - "**Proof**\n", - "We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). 
Denote $𝐲 = A 𝐱$ and we have\n", - "$$\n", - "{\\|𝐱 \\| \\over \\| A 𝐱 \\|} = {\\|A^{-1} 𝐲 \\| \\over \\|𝐲 \\|} ≤ \\| A^{-1}\\|\n", - "$$\n", - "Thus we have:\n", - "$$\n", - "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ \\| δA\\| \\|A^{-1}\\| ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n", - "$$\n", - "\n", - "∎\n", - "\n", - "\n", - "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}$.\n", - "\n", - "If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error. \n", - "As we have already noted, there are some matrices where PLU decompositions introduce large errors, so\n", - "in that case well-conditioning is not a guarantee (but it still usually works)." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Julia 1.7.0", - "language": "julia", - "name": "julia-1.7" - }, - "language_info": { - "file_extension": ".jl", - "mimetype": "application/julia", - "name": "julia", - "version": "1.7.1" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/notebooks/SingularValues.ipynb b/notebooks/SingularValues.ipynb index 6437a11..523d31f 100644 --- a/notebooks/SingularValues.ipynb +++ b/notebooks/SingularValues.ipynb @@ -8,8 +8,8 @@ "\n", "\n", "In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves\n", - "_singular values_, which are a measure of how matrices \"stretch\" vectors, similar to\n", - "eigenvalues but more robust. We also introduce condition of problems, and show that\n", + "_singular values_, which are a measure of how matrices \"stretch\" vectors. \n", + "We also introduce show that\n", "the singular values of a matrix give a notion of a _condition number_, which allows us\n", "to bound errors introduced by floating point numbers in linear algebra operations.\n", "\n", @@ -19,8 +19,29 @@ "3. Singular value decomposition: we introduce the singular value decomposition which is related to\n", "the matrix $2$-norm and best low rank approximation.\n", "4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems\n", - "can be bounded in terms of the _condition number_, which is defined in terms of singular values.\n", - "\n", + "can be bounded in terms of the _condition number_, which is defined in terms of singular values." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "execution": { + "iopub.execute_input": "2022-02-05T14:26:10.748464Z", + "iopub.status.busy": "2022-02-05T14:26:10.279635Z", + "iopub.status.idle": "2022-02-05T14:26:15.813546Z", + "shell.execute_reply": "2022-02-05T14:26:15.812864Z" + } + }, + "outputs": [], + "source": [ + "using LinearAlgebra, Plots" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "## 1. 
Vector norms\n", "\n", "Recall the definition of a (vector-)norm:\n", @@ -49,6 +70,8 @@ "\n", "**Proof**\n", "\n", + "We will only prove the case $p = 1, 2, ∞$ as general $p$ is more involved.\n", + "\n", "Homogeneity and positive-definiteness are straightforward: e.g.,\n", "$$\n", "\\|c 𝐱\\|_p = (\\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \\sum_{k=1}^n |x_k|^p)^{1/p} = |c| \\| 𝐱 \\|\n", @@ -73,16 +96,15 @@ "\\| 𝐱 + 𝐲 \\|^2 = \\|𝐱\\|^2 + 2 𝐱^⊤ 𝐲 + \\|𝐲\\|^2 ≤ \\|𝐱\\|^2 + 2\\| 𝐱 \\| \\| 𝐲 \\| + \\|𝐲\\|^2 = (\\| 𝐱 \\| + \\| 𝐲 \\|)\n", "$$\n", "\n", - "For general $1 < p < ∞$, it suffices to assume $\\| 𝐱 + 𝐲 \\| = 1$.\n", - "consider $\\| 𝐱 + 𝐲 \\|^p$...\n", "\n", "∎\n", "\n", "\n", " In Julia can use the inbuilt `norm` function to calculate norms:\n", " ```julia\n", - " norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2);\n", - " norm([1,-2,3],1) == sqrt(1 + 2 + 3)\n", + " norm([1,-2,3]) == norm([1,-2,3], 2) == sqrt(1^2 + 2^2 + 3^2);\n", + " norm([1,-2,3], 1) == sqrt(1 + 2 + 3);\n", + " norm([1,-2,3], Inf) == 3;\n", " ```\n", "\n", "\n", @@ -92,6 +114,41 @@ "$$\n", "\\|A\\|_F := \\sqrt{\\sum_{k=1}^m \\sum_{j=1}^n A_{kj}^2}\n", "$$\n", + "This is available as `norm` in Julia:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "execution": { + "iopub.execute_input": "2022-02-05T14:26:17.314876Z", + "iopub.status.busy": "2022-02-05T14:26:15.815322Z", + "iopub.status.idle": "2022-02-05T14:26:18.213634Z", + "shell.execute_reply": "2022-02-05T14:26:18.212896Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "true" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "A = randn(5,3)\n", + "norm(A) == norm(vec(A))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a \n", "vector norm:\n", "\n", @@ -167,9 +224,45 @@ "$$\n", "that is, the maximum $1$-norm of the rows.\n", "\n", - "\n", - "\n", - "An example that is not simple is $\\|A \\|_2$, but we do have two simple cases:\n", + "Matrix norms are available via `opnorm`:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "execution": { + "iopub.execute_input": "2022-02-05T14:26:18.216129Z", + "iopub.status.busy": "2022-02-05T14:26:18.215539Z", + "iopub.status.idle": "2022-02-05T14:26:18.771829Z", + "shell.execute_reply": "2022-02-05T14:26:18.771362Z" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "2.9150640682982107" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "m,n = 5,3\n", + "A = randn(m,n)\n", + "opnorm(A,1) == maximum(norm(A[:,j],1) for j = 1:n)\n", + "opnorm(A,Inf) == maximum(norm(A[k,:],1) for k = 1:m)\n", + "opnorm(A) # the 2-norm" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An example that does not have a simple formula is $\\|A \\|_2$, but we do have two simple cases:\n", "\n", "**Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then\n", "$\\|Λ\\|_2 = \\max_k |λ_k|.$. 
If $Q$ is orthogonal then $\\|Q\\| = 1$.\n", @@ -190,18 +283,18 @@ "$$\n", "A = Ũ Σ̃ Ṽ^⊤\n", "$$\n", - "where $U ∈ ℝ^{m × m}$ and $V ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ ∈ ℝ^{m × n}$ has only\n", + "where $Ũ ∈ ℝ^{m × m}$ and $Ṽ ∈ ℝ^{n × n}$ are orthogonal matrices and $Σ̃ ∈ ℝ^{m × n}$ has only\n", "diagonal entries, i.e., if $m > n$,\n", "$$\n", - "Σ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_n \\\\ && 0 \\\\ && ⋮ \\\\ && 0 \\end{bmatrix}\n", + "Σ̃ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_n \\\\ && 0 \\\\ && ⋮ \\\\ && 0 \\end{bmatrix}\n", "$$\n", "and if $m < n,\n", "$$\n", - "Σ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_m & 0 & ⋯ & 0\\end{bmatrix}\n", + "Σ̃ = \\begin{bmatrix} σ_1 \\\\ & ⋱ \\\\ && σ_m & 0 & ⋯ & 0\\end{bmatrix}\n", "$$\n", "where $σ_k = 0$ if $k > r$.\n", "\n", - "To show the SVD exists we first establish some properties of a Gram matrix:\n", + "To show the SVD exists we first establish some properties of a _Gram matrix_ ($A^⊤A$):\n", "\n", "**Proposition (Gram matrix kernel)** The kernel of $A$ is the also the kernel of $A^⊤ A$. \n", "\n", @@ -233,11 +326,14 @@ "\n", "This connection allows us to prove existence:\n", "\n", - "**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD.\n", + "**Theorem (SVD existence)** Every $A ∈ ℝ^{m × n}$ has an SVD.\n", "\n", "**Proof**\n", - "\n", - "Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n", + "Consider\n", + "$$\n", + "A^⊤ A = Q Λ Q^⊤.\n", + "$$\n", + "Assume (as usual) that the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$\n", "are an enumeration of the non-zero eigenvalues and\n", "$$\n", "V := \\begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \\end{bmatrix}\n", @@ -255,7 +351,7 @@ "$$\n", "U := AV Σ^{-1}\n", "$$\n", - "which is orthogonal since $A^⊤ A V = Σ^2 V$:\n", + "which is orthogonal since $A^⊤ A V = V Σ^2 $:\n", "$$\n", "U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I.\n", "$$\n", @@ -273,9 +369,9 @@ "$$\n", "\\|A \\|_2 = σ_1\n", "$$\n", - "and if $A$ is invertible, then\n", + "and if $A ∈ ℝ^{n × n}$ is invertible, then\n", "$$\n", - "\\|A^{-1} \\|_2 = σ_r^{-1}\n", + "\\|A^{-1} \\|_2 = σ_n^{-1}\n", "$$\n", "\n", "**Proof**\n", @@ -292,7 +388,7 @@ "$$\n", "A^{-1} = V Σ^{-1} U^⊤ = V (W Σ^{-1} W) U^⊤\n", "$$\n", - "where\n", + "is the SVD of $A^{-1}$, where\n", "$$\n", "W := P_σ = \\begin{bmatrix} && 1 \\\\ & ⋰ \\\\ 1 \\end{bmatrix}\n", "$$\n", @@ -327,10 +423,8 @@ "We have\n", "\n", "$$\n", - "\\|A - A_k\\|_2 = \\|U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& \\ddots \\cr &&&&& σ_n \\cr &&&&&\\vdots \\cr &&&&&0\\end{bmatrix} = σ_{r+1}\n", + "A - A_k = \\|U \\begin{bmatrix} 0 \\cr &\\ddots \\cr && 0 \\cr &&& σ_{k+1} \\cr &&&& \\ddots \\cr &&&&& σ_r\\end{bmatrix} V^⊤.\n", "$$\n", - "\n", - "\n", "Suppose a rank-$k$ matrix $B$ has \n", "$$\n", "\\|A-B\\|_2 < \\|A-A_k\\|_2 = σ_{k+1}.\n", @@ -340,7 +434,7 @@ "\\|A 𝐰\\|_2 = \\|(A-B) 𝐰\\|_2 ≤ \\|A-B\\|\\|𝐰\\|_2 < σ_{k+1} \\|𝐰\\|_2\n", "$$\n", "\n", - "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n", + "But for all $𝐮 \\in {\\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:k+1]𝐜$ for some $𝐜 \\in ℝ^{k+1}$ we have \n", "$$\n", "\\|A 𝐮\\|_2^2 = \\|U Σ_k 𝐜\\|_2^2 = \\|Σ_k 𝐜\\|_2^2 =\n", "\\sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \\|c\\|^2,\n", @@ -358,13 +452,13 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 4, "metadata": { "execution": { - "iopub.execute_input": "2022-02-04T14:46:57.839027Z", - "iopub.status.busy": 
"2022-02-04T14:46:57.441840Z", - "iopub.status.idle": "2022-02-04T14:47:03.612331Z", - "shell.execute_reply": "2022-02-04T14:47:03.611754Z" + "iopub.execute_input": "2022-02-05T14:26:18.773994Z", + "iopub.status.busy": "2022-02-05T14:26:18.773408Z", + "iopub.status.idle": "2022-02-05T14:26:19.973597Z", + "shell.execute_reply": "2022-02-05T14:26:19.973138Z" } }, "outputs": [ @@ -379,13 +473,12 @@ " 0.2 0.166667 0.142857 0.125 0.111111" ] }, - "execution_count": 1, + "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "using LinearAlgebra, Plots\n", "function hilbertmatrix(n)\n", " ret = zeros(n,n)\n", " for j = 1:n, k=1:n\n", @@ -405,13 +498,13 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 5, "metadata": { "execution": { - "iopub.execute_input": "2022-02-04T14:47:04.520855Z", - "iopub.status.busy": "2022-02-04T14:47:03.614279Z", - "iopub.status.idle": "2022-02-04T14:47:13.320995Z", - "shell.execute_reply": "2022-02-04T14:47:13.319968Z" + "iopub.execute_input": "2022-02-05T14:26:19.976953Z", + "iopub.status.busy": "2022-02-05T14:26:19.975449Z", + "iopub.status.idle": "2022-02-05T14:26:30.312323Z", + "shell.execute_reply": "2022-02-05T14:26:30.311896Z" } }, "outputs": [ @@ -421,100 +514,185 @@ "\n", "\n", "\n", - " \n", + " \n", " \n", " \n", "\n", - "\n", "\n", - " \n", + " \n", " \n", " \n", "\n", - "\n", "\n", - " \n", + " \n", " \n", " \n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", - "\n", - "\n", - "\n" + "\n", + "\n" ] }, - "execution_count": 2, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -522,25 +700,28 @@ "source": [ "H = hilbertmatrix(100)\n", "U,σ,V = svd(H)\n", - "plot(σ; yscale=:log10)" + "scatter(σ; yscale=:log10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Thus we can approximate it well with a rank 20 matrix:" + "Note numerically we typically do not get a exactly zero singular values so the rank is always\n", + "treated as $\\min(m,n)$.\n", + "Because the singular values decay rapidly \n", + " we can approximate the matrix very well with a rank 20 matrix:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 6, "metadata": { "execution": { - "iopub.execute_input": "2022-02-04T14:47:13.324115Z", - "iopub.status.busy": "2022-02-04T14:47:13.323489Z", - "iopub.status.idle": "2022-02-04T14:47:14.774032Z", - "shell.execute_reply": "2022-02-04T14:47:14.773489Z" + "iopub.execute_input": "2022-02-05T14:26:30.314396Z", + "iopub.status.busy": "2022-02-05T14:26:30.313854Z", + "iopub.status.idle": "2022-02-05T14:26:31.430903Z", + "shell.execute_reply": 
"2022-02-05T14:26:31.430449Z" } }, "outputs": [ @@ -550,7 +731,7 @@ "1.15531964884547e-15" ] }, - "execution_count": 3, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -567,6 +748,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "Note that this can be viewed as a _compression_ algorithm: we have replaced a matrix with \n", + "$100^2 = 10,000$ entries by two matrices and a vector with $4,000$ entries without losing\n", + "any information.\n", "In the problem sheet we explore the usage of low rank approximation to smooth functions.\n", "\n", "\n", @@ -580,7 +764,7 @@ "about matrix inversion.\n", "\n", "To justify what follows, we first observe that errors in implementing matrix-vector multiplication\n", - "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x`\n", + "can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A*x`\n", "(implemented with floating point) is precisely $A + δA$ where $δA$ has small norm, relative to $A$.\n", "This is known as _backward error analysis_.\n", "\n", @@ -618,27 +802,25 @@ "Note that \n", "$$\n", "\\begin{align*}\n", - "{\\rm dot}(𝐱, 𝐲) = \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n", - " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \\}(1+γ_{n-1})+x_n y_n(1+δ_n) \\\\\n", - " & = x_1 y_1 (1+δ_1) \\prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \\prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \\prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\\\\n", - " &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \\cdots + x_n y_n (1+θ_1)\n", + "{\\rm dot}(𝐱, 𝐲) &= \\{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\\}⊕(x_n ⊗ y_n) \\\\\n", + " & = \\{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_2) +x_3 y_3(1+δ_3)](1+γ_3) + ⋯ +x_n y_n(1+δ_n) \\}(1+γ_n) \\\\\n", + " & = ∑_{j = 1}^n x_j y_j (1+δ_j) ∏_{k=j}^n (1 + γ_k)\n", + " & = ∑_{j = 1}^n x_j y_j (1+θ_j)\n", "\\end{align*}\n", "$$\n", - "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us\n", + "where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ \n", + "(with $γ_1 = 0$). 
Note that $θ_j$ each have at most $n$ terms each bounded by $ϵ_{\\rm m}/2$,\n", + "Thus the previous proposition tells us\n", "$$\n", - "|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}\n", - "$$\n", - "and\n", - "$$\n", - "|θ_k| ≤ {k ϵ_{\\rm m} \\over 1-kϵ_{\\rm m}} ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}.\n", + "|θ_j| ≤ {n ϵ_{\\rm m} \\over 2- nϵ_{\\rm m}}.\n", "$$\n", "Thus\n", "$$\n", "δ𝐱 = \\begin{pmatrix} x_1 θ_n^1 \\cr x_2 θ_n^2 \\cr x_3 θ_{n-1} \\cr \\vdots \\cr x_n θ_1\\end{pmatrix}\n", "$$\n", - "and the theorem follows:\n", + "and the theorem follows from homogeneity:\n", "$$\n", - "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\| 𝐱 \\|\n", + "\\| δ𝐱 \\| ≤ {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}} \\| 𝐱 \\|\n", "$$\n", "\n", "∎\n", @@ -646,16 +828,15 @@ "**Theorem (matrix-vector backward error)**\n", "For any norm,\n", "$$\n", - "{\\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱\n", + "{\\rm mul}(A, 𝐱) = (A + δA) 𝐱\n", "$$\n", "where\n", "$$\n", - "\\|δA\\| ≤  {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}} \\|A \\|\n", + "\\|δA\\| ≤  {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}} \\|A \\|\n", "$$\n", "\n", "**Proof**\n", - "\n", - "Follows from applying the previous lemma to each row.\n", + "Follows from applying the previous lemma to each row and homogeneity.\n", "\n", "∎\n", "\n", @@ -668,17 +849,22 @@ "It turns out we can in turns of the _condition number_ of the matrix:\n", "\n", "**Definition (condition number)**\n", - "For a square matrix $A$, the _condition number_ is\n", + "For a square matrix $A$, the _condition number_ (in $p$-norm) is\n", + "$$\n", + "κ_p(A) := \\| A \\|_p \\| A^{-1} \\|_p\n", + "$$\n", + "with the $2$-norm:\n", "$$\n", - "κ(A) := \\| A \\| \\| A^{-1} \\| = {σ_1 \\over σ_n}\n", + "κ_2(A) = {σ_1 \\over σ_n}.\n", "$$\n", "\n", "\n", "**Theorem (relative-error for matrix-vector)**\n", "The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is\n", "$$\n", - "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) {\\|δA\\| \\over \\|A \\|}\n", + "{\\| δA 𝐱 \\| \\over \\| A 𝐱 \\| } ≤ κ(A) ε\n", "$$\n", + "if we have the relative pertubation error $\\|δA\\| = \\|A \\| ε$.\n", "\n", "**Proof**\n", "We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). Denote $𝐲 = A 𝐱$ and we have\n", @@ -693,7 +879,7 @@ "∎\n", "\n", "\n", - "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 1-nϵ_{\\rm m}}$.\n", + "Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\\rm m} \\over 2-nϵ_{\\rm m}}$.\n", "\n", "If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error. \n", "As we have already noted, there are some matrices where PLU decompositions introduce large errors, so\n", @@ -711,7 +897,7 @@ "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", - "version": "1.7.1" + "version": "1.7.0" } }, "nbformat": 4, diff --git a/src/SingularValues.jmd b/src/SingularValues.jmd index 397917f..15f2994 100644 --- a/src/SingularValues.jmd +++ b/src/SingularValues.jmd @@ -2,8 +2,8 @@ In this lecture we discuss matrix and vector norms. The matrix $2$-norm involves -_singular values_, which are a measure of how matrices "stretch" vectors, similar to -eigenvalues but more robust. We also introduce condition of problems, and show that +_singular values_, which are a measure of how matrices "stretch" vectors. +We also introduce show that the singular values of a matrix give a notion of a _condition number_, which allows us to bound errors introduced by floating point numbers in linear algebra operations. 
@@ -15,6 +15,10 @@ the matrix $2$-norm and best low rank approximation. 4. Condition numbers: we will see how errors in matrix-vector multiplication and solving linear systems can be bounded in terms of the _condition number_, which is defined in terms of singular values. +```julia +using LinearAlgebra, Plots +``` + ## 1. Vector norms Recall the definition of a (vector-)norm: @@ -43,6 +47,8 @@ $$ **Proof** +We will only prove the case $p = 1, 2, ∞$ as general $p$ is more involved. + Homogeneity and positive-definiteness are straightforward: e.g., $$ \|c 𝐱\|_p = (\sum_{k=1}^n |cx_k|^p)^{1/p} = (|c|^p \sum_{k=1}^n |x_k|^p)^{1/p} = |c| \| 𝐱 \| @@ -67,16 +73,15 @@ $$ \| 𝐱 + 𝐲 \|^2 = \|𝐱\|^2 + 2 𝐱^⊤ 𝐲 + \|𝐲\|^2 ≤ \|𝐱\|^2 + 2\| 𝐱 \| \| 𝐲 \| + \|𝐲\|^2 = (\| 𝐱 \| + \| 𝐲 \|) $$ -For general $1 < p < ∞$, it suffices to assume $\| 𝐱 + 𝐲 \| = 1$. -consider $\| 𝐱 + 𝐲 \|^p$... ∎ In Julia can use the inbuilt `norm` function to calculate norms: ```julia - norm([1,-2,3]) == norm([1,-2,3],2) == sqrt(1^2+2^2+3^2); - norm([1,-2,3],1) == sqrt(1 + 2 + 3) + norm([1,-2,3]) == norm([1,-2,3], 2) == sqrt(1^2 + 2^2 + 3^2); + norm([1,-2,3], 1) == sqrt(1 + 2 + 3); + norm([1,-2,3], Inf) == 3; ``` @@ -86,6 +91,12 @@ consider $\| 𝐱 + 𝐲 \|^p$... $$ \|A\|_F := \sqrt{\sum_{k=1}^m \sum_{j=1}^n A_{kj}^2} $$ +This is available as `norm` in Julia: +```julia +A = randn(5,3) +norm(A) == norm(vec(A)) +``` + While this is the simplest norm, it is not the most useful. Instead, we will build a matrix norm from a vector norm: @@ -161,9 +172,17 @@ $$ $$ that is, the maximum $1$-norm of the rows. +Matrix norms are available via `opnorm`: +```julia +m,n = 5,3 +A = randn(m,n) +opnorm(A,1) == maximum(norm(A[:,j],1) for j = 1:n) +opnorm(A,Inf) == maximum(norm(A[k,:],1) for k = 1:m) +opnorm(A) # the 2-norm +``` -An example that is not simple is $\|A \|_2$, but we do have two simple cases: +An example that does not have a simple formula is $\|A \|_2$, but we do have two simple cases: **Proposition (diagonal/orthogonal 2-norms)** If $Λ$ is diagonal with entries $λ_k$ then $\|Λ\|_2 = \max_k |λ_k|.$. If $Q$ is orthogonal then $\|Q\| = 1$. @@ -195,7 +214,7 @@ $$ $$ where $σ_k = 0$ if $k > r$. -To show the SVD exists we first establish some properties of a Gram matrix: +To show the SVD exists we first establish some properties of a _Gram matrix_ ($A^⊤A$): **Proposition (Gram matrix kernel)** The kernel of $A$ is the also the kernel of $A^⊤ A$. @@ -227,11 +246,14 @@ $$ This connection allows us to prove existence: -**Theorem (SVD existence)** Every $A ∈ ℝ^{m× n}$ has an SVD. +**Theorem (SVD existence)** Every $A ∈ ℝ^{m × n}$ has an SVD. **Proof** - -Assume the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$ +Consider +$$ +A^⊤ A = Q Λ Q^⊤. +$$ +Assume (as usual) that the eigenvalues are sorted in decreasing modulus, and so $λ_1,…,λ_r$ are an enumeration of the non-zero eigenvalues and $$ V := \begin{bmatrix} 𝐪_1 | ⋯ | 𝐪_r \end{bmatrix} @@ -249,7 +271,7 @@ Now define $$ U := AV Σ^{-1} $$ -which is orthogonal since $A^⊤ A V = Σ^2 V$: +which is orthogonal since $A^⊤ A V = V Σ^2 $: $$ U^⊤ U = Σ^{-1} V^⊤ A^⊤ A V Σ^{-1} = I. 
$$ @@ -267,9 +289,9 @@ Singular values tell us the 2-norm: $$ \|A \|_2 = σ_1 $$ -and if $A$ is invertible, then +and if $A ∈ ℝ^{n × n}$ is invertible, then $$ -\|A^{-1} \|_2 = σ_r^{-1} +\|A^{-1} \|_2 = σ_n^{-1} $$ **Proof** @@ -286,7 +308,7 @@ The inverse result follows since the inverse has SVD $$ A^{-1} = V Σ^{-1} U^⊤ = V (W Σ^{-1} W) U^⊤ $$ -where +is the SVD of $A^{-1}$, where $$ W := P_σ = \begin{bmatrix} && 1 \\ & ⋰ \\ 1 \end{bmatrix} $$ @@ -321,10 +343,8 @@ $$\|A - A_k\|_2 ≤ \|A -B \|_2.$$ We have $$ -\|A - A_k\|_2 = \|U \begin{bmatrix} 0 \cr &\ddots \cr && 0 \cr &&& σ_{k+1} \cr &&&& \ddots \cr &&&&& σ_n \cr &&&&&\vdots \cr &&&&&0\end{bmatrix} = σ_{r+1} +A - A_k = \|U \begin{bmatrix} 0 \cr &\ddots \cr && 0 \cr &&& σ_{k+1} \cr &&&& \ddots \cr &&&&& σ_r\end{bmatrix} V^⊤. $$ - - Suppose a rank-$k$ matrix $B$ has $$ \|A-B\|_2 < \|A-A_k\|_2 = σ_{k+1}. @@ -334,7 +354,7 @@ $$ \|A 𝐰\|_2 = \|(A-B) 𝐰\|_2 ≤ \|A-B\|\|𝐰\|_2 < σ_{k+1} \|𝐰\|_2 $$ -But for all $𝐮 \in {\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:r+1]𝐜$ for some $𝐜 \in ℝ^{k+1}$ we have +But for all $𝐮 \in {\rm span}(𝐯_1,…,𝐯_{k+1})$, that is, $𝐮 = V[:,1:k+1]𝐜$ for some $𝐜 \in ℝ^{k+1}$ we have $$ \|A 𝐮\|_2^2 = \|U Σ_k 𝐜\|_2^2 = \|Σ_k 𝐜\|_2^2 = \sum_{j=1}^{k+1} (σ_j c_j)^2 ≥ σ_{k+1}^2 \|c\|^2, @@ -349,7 +369,6 @@ Since these two spaces cannot intersect we have a contradiction, since $(n-r) + Here we show an example of a simple low-rank approximation using the SVD. Consider the Hilbert matrix: ```julia -using LinearAlgebra, Plots function hilbertmatrix(n) ret = zeros(n,n) for j = 1:n, k=1:n @@ -393,7 +412,7 @@ the exact nature of floating point arithmetic. Here we consider only matrix-mult about matrix inversion. To justify what follows, we first observe that errors in implementing matrix-vector multiplication -can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A x` +can be captured by considering the multiplication to be exact on the wrong matrix: that is, `A*x` (implemented with floating point) is precisely $A + δA$ where $δA$ has small norm, relative to $A$. This is known as _backward error analysis_. @@ -431,27 +450,25 @@ $$ Note that $$ \begin{align*} -{\rm dot}(𝐱, 𝐲) = \{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\}⊕(x_n ⊗ y_n) \\ - & = \{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_1) +x_3 y_3(1+δ_3)](1+γ_2) + ⋯ \}(1+γ_{n-1})+x_n y_n(1+δ_n) \\ - & = x_1 y_1 (1+δ_1) \prod_{k=1}^{n-1}(1+γ_k) + x_2 y_2 (1+δ_2) \prod_{k=1}^{n-1}(1+γ_k) + x_3 y_3 (1+δ_3) \prod_{k=2}^{n-1}(1+γ_k) + ⋯ +x_{n-1} y_{n-1}(1+δ_{n-1})(1+γ_{n-1}) + x_n y_n(1+δ_n) \\ - &= x_1y_1(1+θ_n^1) + x_2y_2(1+θ_n^2)+ x_3y_3(1+θ_{n-1}) + \cdots + x_n y_n (1+θ_1) +{\rm dot}(𝐱, 𝐲) &= \{ [(x_1 ⊗ y_1) ⊕ (x_2 ⊗ y_2)] ⊕(x_3⊗ y_3)] ⊕⋯\}⊕(x_n ⊗ y_n) \\ + & = \{ [(x_1 y_1)(1+δ_1) + (x_2 y_2)(1+δ_2)](1+γ_2) +x_3 y_3(1+δ_3)](1+γ_3) + ⋯ +x_n y_n(1+δ_n) \}(1+γ_n) \\ + & = ∑_{j = 1}^n x_j y_j (1+δ_j) ∏_{k=j}^n (1 + γ_k) + & = ∑_{j = 1}^n x_j y_j (1+θ_j) \end{align*} $$ -where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ and the previous proposition tells us +where we denote the errors from multiplication as $δ_k$ and those from addition by $γ_k$ +(with $γ_1 = 0$). Note that $θ_j$ each have at most $n$ terms each bounded by $ϵ_{\rm m}/2$, +Thus the previous proposition tells us $$ -|θ_n^1|, |θ_n^2| ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}} -$$ -and -$$ -|θ_k| ≤ {k ϵ_{\rm m} \over 1-kϵ_{\rm m}} ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}}. +|θ_j| ≤ {n ϵ_{\rm m} \over 2- nϵ_{\rm m}}. 
$$ Thus $$ δ𝐱 = \begin{pmatrix} x_1 θ_n^1 \cr x_2 θ_n^2 \cr x_3 θ_{n-1} \cr \vdots \cr x_n θ_1\end{pmatrix} $$ -and the theorem follows: +and the theorem follows from homogeneity: $$ -\| δ𝐱 \| ≤ {n ϵ_{\rm m} \over 1-nϵ_{\rm m}} \| 𝐱 \| +\| δ𝐱 \| ≤ {n ϵ_{\rm m} \over 2-nϵ_{\rm m}} \| 𝐱 \| $$ ∎ @@ -459,16 +476,15 @@ $$ **Theorem (matrix-vector backward error)** For any norm, $$ -{\rm mul}(A, 𝐱) = (A + δA)^⊤ 𝐱 +{\rm mul}(A, 𝐱) = (A + δA) 𝐱 $$ where $$ -\|δA\| ≤  {n ϵ_{\rm m} \over 1-nϵ_{\rm m}} \|A \| +\|δA\| ≤  {n ϵ_{\rm m} \over 2-nϵ_{\rm m}} \|A \| $$ **Proof** - -Follows from applying the previous lemma to each row. +Follows from applying the previous lemma to each row and homogeneity. ∎ @@ -481,17 +497,22 @@ if we know a bound on $\|δA\|$? It turns out we can in turns of the _condition number_ of the matrix: **Definition (condition number)** -For a square matrix $A$, the _condition number_ is +For a square matrix $A$, the _condition number_ (in $p$-norm) is +$$ +κ_p(A) := \| A \|_p \| A^{-1} \|_p +$$ +with the $2$-norm: $$ -κ(A) := \| A \| \| A^{-1} \| = {σ_1 \over σ_n} +κ_2(A) = {σ_1 \over σ_n}. $$ **Theorem (relative-error for matrix-vector)** The _worst-case_ relative error in $A 𝐱 ≈ (A + δA) 𝐱$ is $$ -{\| δA 𝐱 \| \over \| A 𝐱 \| } ≤ κ(A) {\|δA\| \over \|A \|} +{\| δA 𝐱 \| \over \| A 𝐱 \| } ≤ κ(A) ε $$ +if we have the relative pertubation error $\|δA\| = \|A \| ε$. **Proof** We can assume $A$ is invertible (as otherwise $κ(A) = ∞$). Denote $𝐲 = A 𝐱$ and we have @@ -506,7 +527,7 @@ $$ ∎ -Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\rm m} \over 1-nϵ_{\rm m}}$. +Thus for floating point arithmetic we know the error is bounded by $κ(A) {n ϵ_{\rm m} \over 2-nϵ_{\rm m}}$. If one uses QR to solve $A 𝐱 = 𝐲$ the condition number also gives a meaningful bound on the error. As we have already noted, there are some matrices where PLU decompositions introduce large errors, so diff --git a/src/week5s.jmd b/src/week5s.jmd index 46a3528..1704d17 100644 --- a/src/week5s.jmd +++ b/src/week5s.jmd @@ -35,4 +35,47 @@ $$ \end{bmatrix} $$ Deduce its two Cholesky decompositions: $Δ_n = L_n L_n^⊤ = U_n U_n^⊤$ where -$L_n$ is lower triangular and $U_n$ is upper triangular. \ No newline at end of file +$L_n$ is lower triangular and $U_n$ is upper triangular. + +## 2. Matrix norms + +**Problem 2.1⋆** Prove the following: +$$ +\begin{align*} +\|A\|_∞ &= \max_k \|A[k,:]\|_1 \\ +\|A\|_{1 → ∞} &= \|\hbox{vec}(A)\|_∞ = \max_{kj} |a_{kj}| +\end{align*} +$$ + +**Problem 2.2⋆** + +## 3. Singular value decomposition + +**Problem 3.1** Consider functions sampled on a $(n+1) × (n+1)$ 2D grid +$(x_k,y_j) = (k/n, j/n)$ where $k,j = 0,…,n$. +For $n = 100$, what is the lowest rank $r$ such that +the best rank-$r$ approximation to the samples +that is accurate to within $10^{-5}$ accuracy for the following functions: +$$ +(x + 2y)^2, \cos(\sin x \E^y), 1/(x + y + 1), $\sign(x-y)$ +$$ +For which examples does the answer change when $n = 1000$? + +**SOLUTION** + +```julia +n = 100 +x = y = range(0, 1; length=n) + +f = (x,y) -> (x + 2y)^2 +g = (x,y) -> cos(sin(x)exp(y)) +h = (x,y) -> 1/(x + y + 1) +s = (x,y) -> sign(x-y) + +svdvals(f.(x, y')) # rank 3 +svdvals(g.(x, y')) # rank 5 +svdvals(h.(x, y')) # rank 4 +svdvals(s.(x, y')) # rank n +``` + +**END** diff --git a/src/week6s.jmd b/src/week6s.jmd new file mode 100644 index 0000000..1bdc6b8 --- /dev/null +++ b/src/week6s.jmd @@ -0,0 +1,10 @@ + + +## 1. 
Condition numbers
+
+
+**Problem 1.1⋆** Prove that, if $|ϵ_i| ≤ ϵ$ and $n ϵ < 1$, then
+$$
+\prod_{i=1}^n (1+ϵ_i) = 1+θ_n
+$$
+for some constant $θ_n$ satisfying $|θ_n| ≤ {n ϵ \over 1-nϵ}$.
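+
+As a quick numerical sanity check (not a proof, and not a substitute for the problem), the
+sketch below draws random perturbations $ϵ_i$ with $|ϵ_i| ≤ ϵ$ and verifies that the product
+stays within the claimed bound. The choices `n = 10`, `ϵ = 0.01` and the sample size `1000`
+are arbitrary, picked only for illustration:
+```julia
+using Random
+Random.seed!(1)                        # reproducible random draws
+n, ϵ = 10, 0.01
+@assert n*ϵ < 1                        # hypothesis of the proposition
+for _ = 1:1000
+    ϵs = ϵ * (2 * rand(n) .- 1)        # n values drawn uniformly from [-ϵ, ϵ]
+    θ = prod(1 .+ ϵs) - 1              # so that ∏ (1+ϵ_i) = 1 + θ
+    @assert abs(θ) ≤ n*ϵ/(1 - n*ϵ)     # the claimed bound on θ_n
+end
+```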