Merge branch 'main' of ../../Foundations in Data Analysis

Showing 13 changed files with 593 additions and 2 deletions.
Foundations of Data Analysis/SoSe2022/Obsidian Notes/1 Basics.md (2 additions & 2 deletions)
...ions of Data Analysis/SoSe2022/Obsidian Notes/2.1 SVD - Principal Components.md (1 addition)
Foundations of Data Analysis/SoSe2022/Obsidian Notes/3.2 TODO.md (1 addition)

#TODO
Foundations of Data Analysis/SoSe2022/Obsidian Notes/3.3 TODO.md (1 addition)

#TODO
...2022/Obsidian Notes/4 Dimensionality Reduction - Johnson-Lindenstrauss Lemma.md (93 additions)
# 4. Dimensionality Reduction
%% #Lecture 15, 04.07. [[Foundations Section 4 printout.pdf]] %%
## 4.1 The Johnson-Lindenstrauss Lemma

> **Theorem 13.1** (*JL Lemma*)
> For $\epsilon \in (0, 1)$ and $n \in \mathbb N$, let $k \in \mathbb N$ be such that $$k \geq C(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$$
> for some $\beta \geq 2$ (where do we use $\beta$? Does he mean $C$? #TODO)
> Then for any set $\mathcal P$ of $n$ points in $\mathbb R^d$, there is a map $f: \mathbb R^d \to \mathbb R^k$ such that
> $$\forall v,w \in \mathcal P: (1-\epsilon)||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$$
> $f$ can be chosen as a linear map.
^9f9398

>**Theorem 13.2** (*Probabilistic JL Lemma*)
> Let $\epsilon, n, k$ and $\beta \geq 2$ be such that
> $$k \geq 2\beta(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$$
> ...or $k \geq 2 \beta \epsilon^{-2} \ln n$ for Gaussians.
> Then there is a matrix-valued random variable $\Phi$ with values in $\mathbb R^{k \times d}$ such that for any set of $n$ points $\mathcal P$ in $\mathbb R^d$, it holds that
> $$(1-\epsilon) ||v-w||_2^2 \leq ||\Phi v - \Phi w||_2^2 \leq (1+\epsilon) ||v-w||_2^2$$
> for all $v, w \in \mathcal P$, with probability $1-(n^{2-\beta} - n^{1-\beta})$.
^ecd655

Remark on probability: choosing $\beta \gg 2$ yields a more favorable success probability.

##### Some intuition on the numbers
Consider a point cloud of $n=10^7$ points of dimension $d=10^6$. We can project it down to dimension $k = O(\epsilon^{-2} \ln n) = O(16\,\epsilon^{-2})$ while preserving all pairwise distances up to a distortion of $O(\epsilon)$. For example, for $\epsilon = 0.1$ we get $k$ on the order of a few thousand, a reduction of roughly three orders of magnitude, and notably independent of $d$.
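
A quick numerical illustration of Theorem 13.2 with a Gaussian matrix (my own sketch, not from the lecture; sizes scaled down so it runs instantly):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy sizes, scaled down from the 10^7 / 10^6 example above
n, d, eps, beta = 200, 10_000, 0.5, 2.0
k = int(np.ceil(2 * beta * np.log(n) / (eps**2 / 2 - eps**3 / 3)))

P = rng.normal(size=(n, d))                  # n points in R^d, one per row
Phi = rng.normal(size=(k, d)) / np.sqrt(k)   # (1/sqrt(k)) * standard Gaussian matrix
Q = P @ Phi.T                                # embedded points in R^k

def pairwise_sq_dists(X):
    """Squared Euclidean distances via the Gram matrix (avoids an n x n x d array)."""
    sq = (X**2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

iu = np.triu_indices(n, 1)                   # each unordered pair once
orig = pairwise_sq_dists(P)[iu]
emb = pairwise_sq_dists(Q)[iu]
print("k =", k, "max relative distortion:", np.abs(emb / orig - 1).max())
# with high probability, the printed distortion is <= eps
```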

#### Towards proving the JL Lemma
##### JL for Gaussian random matrices
Consider $\Phi = (\Phi_{ij})_{i,j}$ with i.i.d. entries $\Phi_{ij} \sim \mathcal N(0, 1)$. We aim to show that $\tfrac{1}{\sqrt{k}} \Phi$ preserves the norm of any fixed vector with high probability; the JL embedding property then follows via the union bound.

> **Lemma 13.3**
> Let $x \in \mathbb R^d$ and assume $\Phi \in \mathbb R^{k\times d}$ has i.i.d. standard Gaussian entries. Then
> $$P\left( \left|\, ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 - ||x||_2^2 \right| > \epsilon ||x||_2^2 \right) \leq 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$ or, equivalently, $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 \leq (1+\epsilon)||x||_2^2 \right) \geq 1 - 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$
^06b03b

###### #Proof of [[#^06b03b|Lemma 13.3]]
Step 1: show that for any $x$, $\mathbb E\left( ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 \right) = ||x||_2^2$. Indeed,
$$\mathbb E\, ||\tfrac{1}{\sqrt{k}} \Phi x||_2^2
= \tfrac{1}{k} \sum_{j=1}^k \mathbb E \left(\sum_{\ell=1}^d \Phi_{j\ell}x_\ell\right)\left(\sum_{\ell'=1}^d \Phi_{j\ell'}x_{\ell'}\right)$$
Since the entries are independent and standard, $\mathbb E(\Phi_{j\ell} \Phi_{j\ell'}) = \delta_{\ell \ell'}$, so all cross terms vanish and the expression reduces to $\tfrac{1}{k} \sum_{j=1}^k \sum_{\ell=1}^d x_\ell^2 = ||x||_2^2$.

Step 2: note that $||\tfrac{1}{\sqrt{k}} \Phi x||_2^2 = \tfrac{1}{k} \sum_{j=1}^k (\Phi x)_j^2$, and by rotation invariance of the Gaussian distribution, $Z_j := \tfrac{(\Phi x)_j}{||x||_2} \sim \mathcal N(0,1)$, independently across $j$.
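
The rotation invariance claim can be sanity-checked numerically (a hypothetical sketch; the fixed direction $x$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
x = rng.normal(size=d)                  # arbitrary fixed direction, not axis-aligned

# (Phi x)_1 is the dot product of the first (Gaussian) row of Phi with x;
# sample it over many independent draws of that row and normalize by ||x||_2
rows = rng.normal(size=(100_000, d))
Z1 = rows @ x / np.linalg.norm(x)
print(Z1.mean(), Z1.var())              # approx 0 and 1, i.e. Z_1 ~ N(0, 1)
```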

Step 3: apply Markov's inequality to the exponentiated sum. For $\theta > 0$,
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right)
= P\left( \exp\left(\theta \sum_{j=1}^k Z_j^2\right) > \exp(\theta (1+\epsilon) k) \right)$$
$$\leq e^{-(1+\epsilon) k \theta} \mathbb E\left(\exp\left(\theta \sum_{j=1}^k Z_j^2\right)\right)
= e^{-(1+\epsilon) k \theta}\prod_{j=1}^k \mathbb E\left( \exp(\theta Z_j^2) \right)$$
using independence in the last step. Compute the expectation with the substitution $t = g\sqrt{1-2\theta}$:
$$\mathbb E\left(\exp(\theta Z_j^2)\right) = \tfrac{1}{\sqrt{2\pi}}\int \exp(\theta g^2 - g^2/2) \,dg = \tfrac{1}{\sqrt{2\pi}} \int \exp\left(-\tfrac{g^2}{2}(1-2\theta)\right)dg = \tfrac{1}{\sqrt{1-2\theta}} \cdot \tfrac{1}{\sqrt{2\pi}}\int\exp(-t^2/2)\,dt = \tfrac{1}{\sqrt{1-2\theta}}$$
(provided that $\theta \in (0, \tfrac{1}{2})$). Hence the main computation continues as
$$\dots = e^{-(1+\epsilon) k \theta} \cdot \left(\tfrac{1}{1-2\theta}\right)^{k/2}$$
Step 4: choose $\theta$. Choosing $\theta = \tfrac{\epsilon}{2(1+\epsilon)}$ minimizes the above expression and satisfies $\theta < \tfrac{1}{2}$. We get
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \left( (1+\epsilon)e^{-\epsilon} \right)^{\tfrac{k}{2}} \leq \exp\left(-\tfrac{k}{4}(\epsilon^2-\epsilon^3)\right)$$
using the estimate $1+\epsilon \leq \exp(\epsilon-(\epsilon^2 - \epsilon^3)/2)$ (which can be proved e.g. by taking $\log$ on both sides and using the power series of $\log(1+\epsilon)$).

In a similar way, one can prove the other direction: $$P\left( \sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4}(\epsilon^2-\epsilon^3)\right)$$
Finally, step 5: combine the pieces.
$$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 > (1+\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
and similarly, $$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 < (1-\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
Combining the two cases with the union bound gives the factor $2$ in the lemma.
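
A Monte Carlo check of this bound (my own sketch): by Step 2 we may sample $\tfrac{1}{k}\sum_j Z_j^2$ directly instead of drawing full matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
k, eps, trials = 100, 0.3, 100_000

# ||(1/sqrt k) Phi x||_2^2 / ||x||_2^2 has the law of (1/k) * chi^2_k (Step 2)
Z = rng.normal(size=(trials, k))
stat = (Z**2).mean(axis=1)

empirical = np.mean(np.abs(stat - 1.0) > eps)
bound = 2 * np.exp(-(eps**2 - eps**3) * k / 4)
print(empirical, "<=", bound)           # e.g. ~0.03 vs ~0.41: the bound holds, loosely
```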

#Exam Especially the rotation invariance argument could be relevant for the exam.

###### #Proof of the JL Lemma [[#^ecd655|Theorem 13.2]]
- Choose $f$ as a random linear function $f(x) = \tfrac{1}{\sqrt k} \Phi x$, where $\Phi \in \mathbb R^{k\times d}$ has i.i.d. entries sampled from $\mathcal N(0, 1)$.
- There are $\binom{n}{2}$ pairs $(v, w)$ of points from $\mathcal P$, and for each pair we apply [[#^06b03b|Lemma 13.3]] to the difference vector $v - w$.
- Hence, by the union bound, the probability that some pair $(v, w) \in \mathcal P^2$ fails the inequality $(1-\epsilon)||v-w||_2^2 \leq \tfrac{1}{k}||\Phi v - \Phi w||_2^2 \leq (1+\epsilon)||v-w||_2^2$ is at most $$P\left(\exists v, w \in \mathcal P: \left| \tfrac{1}{k} ||\Phi v - \Phi w||_2^2 - ||v-w||_2^2 \right| \geq \epsilon ||v-w||_2^2\right)$$ $$\leq 2 \binom{n}{2} e^{-(\epsilon^2 - \epsilon^3)\tfrac{k}{4}} \leq 2 \binom{n}{2} n^{-\beta(1-\epsilon)} = n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)}$$ (the middle step is where the lower bound on $k$ from the theorem statement enters, via $e^{-(\epsilon^2-\epsilon^3)k/4} \leq n^{-\beta(1-\epsilon)}$)
- Hence with probability at least $1-(n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)})$ we have $(1-\epsilon) ||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$ for all $v, w \in \mathcal P$.

### Nothing special about Gaussian matrices
If we instead choose the entries of $\Phi$ i.i.d. uniformly from $\{-1, +1\}$, we get comparable results; see *Lemma 13.4* on the slides (not covered in detail here). A quick empirical comparison follows below.
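
A hypothetical side-by-side of the two matrix types (same toy setup as the Gaussian sketch above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 200, 10_000, 300

P = rng.normal(size=(n, d))

def max_distortion(Phi):
    """Largest relative error over all pairwise squared distances under x -> Phi x."""
    Q = P @ Phi.T
    sq = lambda X: (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T
    iu = np.triu_indices(n, 1)
    return np.abs(sq(Q)[iu] / sq(P)[iu] - 1).max()

gauss = rng.normal(size=(k, d)) / np.sqrt(k)
rademacher = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)
print(max_distortion(gauss), max_distortion(rademacher))   # comparable distortions
```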

### Scalar Product Preservation
%% #Lecture 16, 05.07. [[Foundations Section 4 printout.pdf]] %%
Under a JL random matrix, scalar products of vectors in the unit ball are approximately preserved:

> **Corollary 13.5** (*Scalar product preservation*)
> Let $u, v$ be two points in the $d$-dimensional unit ball $B(1) \subseteq \mathbb R^d$ and let $\Phi \in \mathbb R^{k\times d}$ be a random matrix satisfying a JL-like inequality,
> $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x||_2^2 \leq (1+\epsilon) ||x||_2^2\right) \geq 1-2e^{-(\epsilon^2-\epsilon^3)k/4}$$
> (e.g. a matrix with i.i.d. Gaussian entries, or with entries drawn uniformly from $\{-1, +1\}$).
> Then with probability at least $1 - 4e^{-(\epsilon^2 - \epsilon^3)k/4}$, $$|\langle u, v \rangle - \langle \Phi u, \Phi v \rangle | \leq \epsilon$$ (where, abusing notation, $\Phi$ now denotes the rescaled matrix $\tfrac{1}{\sqrt k}\Phi$).
^99bd43
###### #Proof of [[#^99bd43|Corollary 13.5]]
By the polarization identity and the JL-like inequality (applied to both $u+v$ and $u-v$, hence the factor $4$ in the failure probability),
$$4 \langle \Phi u, \Phi v\rangle = ||\Phi (u+v)||_2^2 - ||\Phi(u-v)||_2^2
\geq (1-\epsilon)||u+v||_2^2 - (1+\epsilon) ||u-v||_2^2$$
$$= 4 \langle u, v \rangle - 2 \epsilon (||u||_2^2 + ||v||_2^2)
\geq 4 \langle u, v \rangle - 4 \epsilon$$
where the last step uses $||u||_2, ||v||_2 \leq 1$. The other direction is analogous.
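
A small numeric check of the corollary (my own sketch; $u, v$ are arbitrary points in the unit ball):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, eps = 1_000, 2_000, 0.1

u = rng.normal(size=d); u /= 2 * np.linalg.norm(u)   # ||u||_2 = 1/2
v = rng.normal(size=d); v /= 3 * np.linalg.norm(v)   # ||v||_2 = 1/3

Phi = rng.normal(size=(k, d)) / np.sqrt(k)           # rescaled matrix, as in the corollary
print(abs(u @ v - (Phi @ u) @ (Phi @ v)), "<=", eps)
# holds with probability >= 1 - 4*exp(-4.5) here, since (eps^2 - eps^3) * k / 4 = 4.5
```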
...dations of Data Analysis/SoSe2022/Obsidian Notes/5.1 Convexity - Convex Sets.md (37 additions)
# 5 Convex Analysis
## 5.1 Convex Sets
A *convex set* is a set $K \subseteq \mathbb R^N$ such that for all $x, z \in K$ and all $t \in [0, 1]$, $tx + (1-t)z \in K$. By induction, it follows that every convex combination of finitely many elements of the set is also in the set.

The *convex hull* $\text{conv}(T)$ of a set $T$ is the smallest convex set containing $T$, or equivalently the set of all convex combinations of elements from $T$.

### Cones
A set $K\subseteq \mathbb R^N$ is called a *cone* if for all $x \in K$ and all $t \geq 0$, $tx$ is contained in $K$. A cone that is convex is called a *convex cone*, and $K$ is a convex cone iff for all $x, z \in K$ and $t, s \geq 0$, also $sx + tz \in K$. Examples:
- the set of positive semidefinite matrices in $\mathbb R^{N \times N}$
- the *positive orthant* $\mathbb R_+^N = \{x \mid x_i \geq 0 ~\forall i \in [N] \}$

##### Dual Cones
For a cone $K$, its dual cone $K^\ast$ is defined as
$$K^\ast = \{ z \in \mathbb R^N : \langle x, z \rangle \geq 0 ~~ \forall x \in K\}$$
- $K^\ast$ is closed and convex (as an intersection of half-spaces), and is itself again a cone.
- If $K$ is a closed convex cone, then $K^{\ast\ast} = K$.
- If $H, K$ are cones and $H \subseteq K$, then $K^\ast \subseteq H^\ast$.

For example, the positive orthant is self-dual: $(\mathbb R_+^N)^\ast = \mathbb R_+^N$.
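
A short argument for the self-duality (my own filling-in, not from the slides): if $z \in \mathbb R_+^N$, then $\langle x, z \rangle = \sum_i x_i z_i \geq 0$ for every $x \in \mathbb R_+^N$, so $z \in (\mathbb R_+^N)^\ast$. Conversely, if $z_i < 0$ for some $i$, then the standard basis vector $e_i \in \mathbb R_+^N$ gives $\langle e_i, z \rangle = z_i < 0$, so $z \notin (\mathbb R_+^N)^\ast$. Hence $(\mathbb R_+^N)^\ast = \mathbb R_+^N$.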

##### Polar Cones
Given a cone $K$, the polar cone is defined as
$$K^\circ = \{ z \in \mathbb R^N \mid \langle x, z \rangle \leq 0 ~~\forall x \in K \} = -K^\ast$$
##### Conic Hull
The *conic hull* $\text{cone}(T)$ of a set $T$ is the smallest convex cone containing $T$, or equivalently the set of all conic combinations of elements from $T$, i.e. $$\text{cone}(T) = \left\{ \sum_j t_j x_j : t_j \geq 0, x_j \in T \right\}$$
### Geometric Hahn-Banach Theorem
Hahn-Banach, informally: "non-overlapping convex sets can be separated by hyperplanes".

> **Theorem 14.4** (*Finite-dimensional Hahn-Banach Theorem*)
> Let $K_1, K_2 \subseteq \mathbb R^N$ be convex sets with disjoint interiors. Then there exist $w \in \mathbb R^N \setminus \{0\}$ and $\lambda \in \mathbb R$ such that
> $$K_1 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \leq \lambda \}, \quad K_2 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \geq \lambda \}.$$
### Extreme Points
Let $K \subseteq \mathbb R^N$ be a convex set. A point $x \in K$ is called an *extreme point* of $K$ if the only way to represent it as a convex combination $x = tw + (1-t)z$ with $w, z \in K$, $t \in (0, 1)$ is $w = z = x$. Note: the extreme points are *not* the same as the boundary (e.g. for a convex polygon, only the corners are extreme points, not all points on the boundary).

> **Theorem 14.6**
> A compact convex set is the convex hull of its extreme points.
...ns of Data Analysis/SoSe2022/Obsidian Notes/5.2 Convexity - Convex Functions.md (108 additions)
## 5.2 Convex Functions
### Extended-valued functions
To model inadmissible points, we work with *extended-valued functions* $F: \mathbb R^N \to (-\infty, \infty]$ and define $F(x) = \infty$ at non-admissible points $x$: none of these points will ever be a minimizer.

This extension of a function $F: K \to \mathbb R$ to all of $\mathbb R^N$ is called its *canonical extension*.

The domain of an extended-valued function $F$ is defined as $\text{dom}(F) = \{x \in \mathbb R^N : F(x) < \infty\}$. A function with non-empty domain is called *proper*.
### Convex functions
> **Definition 15.1** (*Convex function*)
> An extended-valued function $F$ is called
> - *convex* if for all $x, z \in \mathbb R^N$ and $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z)$$
> - *strictly convex* if the above inequality is strict whenever $x \neq z$ and $t \in (0, 1)$
> - *strongly convex* with parameter $\gamma > 0$ if for all $x, z$ and $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z) - \tfrac{\gamma}{2} t (1-t) ||x-z||_2^2$$
>
> $F$ is called (strictly; strongly) concave if $-F$ is (strictly; strongly) convex.
> A function $F: K \to \mathbb R$ on a convex subset $K \subseteq \mathbb R^N$ is called convex if its canonical extension is convex.

Strongly convex functions are always strictly convex, and strictly convex functions are always convex. Convex functions always have convex domains.

$F$ is convex if and only if its *epigraph* $\text{epi}(F) = \{(x, r) \mid r \geq F(x)\}$ is a convex set.
##### Smooth convex functions | ||
If $F$ is differentiable, its convexity can further be characterized as follows: | ||
|
||
> **Proposition 15.2** (*Smooth convex functions*) | ||
> Let $F: \mathbb R^N \to \mathbb R$ be differentiable. | ||
> 1. $F$ is convex iff for all $x, y$: $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle$$, where $\nabla F(y) = (\partial_{y_1}F(y),\dots,\partial_{y_n}F(y))^\top$ | ||
> 2. $F$ is strongly convex with $\gamma > 0$ if for all $x, y$, $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle + \frac{\gamma}{2}(t)(1-t)||x-z||_2^2$$ | ||
> 3. If $F$ is twice differentiable, then $F$ is convex iff for all $x$,$$\nabla^2 F(x) \succcurlyeq 0$$ where $\nabla^2 F$ is the Hessian of $F$ | ||
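
A quick numeric check of the first-order characterization (my own sketch, for a convex quadratic):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
B = rng.normal(size=(N, N))
A = B.T @ B                                # positive semidefinite, so F is convex

F = lambda x: x @ A @ x                    # F(x) = x^T A x
grad = lambda y: 2 * A @ y                 # its gradient

x, y = rng.normal(size=N), rng.normal(size=N)
print(F(x) >= F(y) + grad(y) @ (x - y))    # first-order inequality: True
```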
##### Composition of convex functions
> **Proposition 15.3** (*Compositions of convex functions*)
> 1. If $F, G$ are convex functions on $\mathbb R^N$, then for all $\alpha, \beta \geq 0$, $\alpha F + \beta G$ is convex.
> 2. Let $F$ be convex and non-decreasing and let $G$ be convex. Then $H(x) = F(G(x))$ is convex.
##### Examples of convex functions
- Norms $||\cdot||$ on $\mathbb R^N$ are always convex, by the triangle inequality and homogeneity.
- The $\ell_p$-norms are strictly convex for $p \in (1, \infty)$, and *not* strictly convex for $p \in \{1, \infty\}$.
- If $A \in \mathbb R^{N \times N}$ is positive semidefinite, the function $F(x) = x^\top Ax$ is convex. If $A$ is positive definite, $F$ is strictly convex.
- For a convex set $K$, the characteristic function defined by $\chi_K(x) = 0$ if $x \in K$ and $\chi_K(x) = \infty$ otherwise is convex (watch out, this is defined differently than usual!). ^d5bfc2

%% #Lecture 17, 11.07. [[Foundations Section 5.2 printout.pdf]] %%
##### Convexity and continuity | ||
> **Proposition 15.4** | ||
> Convex functions $F: \mathbb R^N \to \mathbb R$ (which are **not** extended-valued functions) are continuous. | ||
For extended-valued functions, one uses *lower semi-continuity*: | ||
|
||
> **Definition 15.5** (*Lower semicontinuity*) | ||
> An [[#Extended-valued functions|extended-valued function]] $F$ is *lower semicontinuous* if for all $x$ and every sequence $x_j \to x$, it holds that $$\lim\inf_{j} F(x_j) \geq F(x)$$ | ||
> A function is lower semicontinuous iff its epigraph is closed. | ||
Examples:
- Continuous functions are lower semicontinuous.
- $\chi_K$ ([[#^d5bfc2|definition]]) is not continuous, but it is lower semicontinuous iff $K$ is closed:
    - $K$ closed, $x \in K$: for a sequence $(x_j)_j$ converging to $x$ from inside $K$, $\liminf \chi_K(x_j) = 0 = \chi_K(x)$; for one converging from outside, $\liminf \chi_K(x_j) = \infty \geq 0 = \chi_K(x)$.
    - $K$ not closed, $x \notin K$ a limit point of $K$: find a sequence $x_j \to x$ inside $K$; then $\liminf \chi_K(x_j) = 0 < \infty = \chi_K(x)$.

Lower semicontinuity is particularly useful in *infinite-dimensional* Hilbert spaces (e.g. $||\cdot||$ can fail to be continuous w.r.t. the weak topology while still being lower semicontinuous).

### Minimizing convex functions
> **Proposition 15.6**
> Let $F$ be a convex extended-valued function. Then:
> 1. Every local minimum of $F$ is a global minimum.
> 2. The set of minima of $F$ is a convex set.
> 3. If $F$ is strictly convex, the minimum is unique.
^f3aef6

###### #Proof of [[#^f3aef6|Proposition 15.6]]
Statement 1:
Let $x$ be a local minimum and assume that $F(z) < F(x)$ for some $z$. Then for all $t \in (0, 1)$, $F(tx + (1-t)z) \leq t F(x) + (1-t) F(z) < F(x)$. As $t \to 1$, the point $tx + (1-t)z$ approaches $x$, so every neighborhood of $x$ contains a point $y$ with $F(y) < F(x)$, contradicting local minimality.

Statement 2:
Let $x, y$ be two (by statement 1, global) minima, so $F(x) = F(y)$. Then for $t \in [0, 1]$, $F(tx + (1-t)y) \leq tF(x) + (1-t)F(y) = F(x)$, and since $F(x)$ is the minimal value, equality holds. Hence every point on the segment between $x$ and $y$ is also a minimizer.

Statement 3:
Suppose $x \neq y$ are both minima. Then by strict convexity, $F(tx + (1-t)y) < t F(x) + (1-t) F(y) = F(x)$ for $t \in (0, 1)$, i.e. every proper convex combination has a *strictly smaller* function value than the minima, a contradiction.

### Jointly convex functions
A function $f(x, y)$ of *two arguments* $x \in \mathbb R^n$, $y \in \mathbb R^m$ is *jointly convex* if it is convex as a function of the joint variable $z = (x,y)$. #TODO What is the purpose of this definition? I think it is mostly "syntactic sugar", since it is equivalent to convexity of the corresponding function $f: \mathbb R^{n+m} \to \mathbb R$.

Example: $f(x, y) = xy$ is "marginally" (or element-wise) convex, but not jointly convex, because it fails to be convex along the direction $(1, -1)$; see the computation below and [this figure](https://www.researchgate.net/figure/A-marginally-convex-function-is-not-necessarily-jointly-convex-The-function-f-x-y_fig2_339504312).
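
Making the direction argument precise (my computation, not from the slides): along $(1, -1)$ we have $f(t, -t) = -t^2$, which is strictly concave. Equivalently, the Hessian $$\nabla^2 f(x, y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$ has eigenvalues $\pm 1$ and is not positive semidefinite, so $f$ is not jointly convex; yet for each fixed $y$ (or $x$), $f$ is linear and hence marginally convex.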

> **Theorem 15.7**
> Let $f$ be an extended-valued, *jointly convex* function. Then the function $g(x) = \inf_{y \in \mathbb R^m} f(x, y)$, $x \in \mathbb R^n$, is convex.
^e2d3b3

###### #Proof of [[#^e2d3b3|Theorem 15.7]]
For $t \in [0, 1]$ and arbitrary $y_1, y_2 \in \mathbb R^m$, joint convexity gives $$g(tx_1 + (1-t)x_2) \leq f(tx_1 + (1-t)x_2, t y_1 + (1-t)y_2) \leq t f(x_1, y_1) + (1-t) f(x_2, y_2)$$
Taking the infimum over $y_1$ and $y_2$ on the right-hand side yields $g(tx_1 + (1-t)x_2) \leq t g(x_1) + (1-t) g(x_2)$, which is the claim.

### *Maxima* of convex functions
We can also say something about *maxima* on *compact convex sets*:

> **Theorem 15.8**
> Let $K \subseteq \mathbb R^N$ be compact and convex, and let $F: K \to \mathbb R$ be a convex function. Then $F$ attains its maximum at an *[[5.1 Convexity - Convex Sets#Extreme Points|extreme point]]* of $K$.
^86df37

###### #Proof of [[#^86df37|Theorem 15.8]]
Let $x \in K$ be a maximizer of $F$ over $K$. Since $K$ is the convex hull of its extreme points, we can write $x = \sum_{j=1}^m t_j x_j$ for some $m$, some (*not necessarily all!*) extreme points $x_j$, and weights $t_j \geq 0$ with $\sum_j t_j = 1$. Then by convexity and maximality,
$$F(x) \leq \sum_j t_j F(x_j) \leq \sum_j t_j F(x) = F(x)$$
so equality holds throughout, and therefore $F(x_j) = F(x)$ for every $x_j$ with $t_j > 0$; in particular the maximum is attained at an extreme point.
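
As an illustration (my example, not from the lecture): linear functions are convex, so their maxima over compact polytopes sit at vertices, i.e. extreme points; this is exactly what linear programming exploits.

```python
import numpy as np
from scipy.optimize import linprog

# maximize the convex (linear) function F(x) = <c, x> over the square [0, 1]^2
c = np.array([2.0, 1.0])
res = linprog(-c, bounds=[(0, 1), (0, 1)], method="highs")  # linprog minimizes, so negate c
print(res.x)                                                # -> [1. 1.], a corner of the square
```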
...ns of Data Analysis/SoSe2022/Obsidian Notes/5.3 Convexity - Convex Conjugate.md (31 additions)
## 5.3 The Convex Conjugate

^0a5cd2

> **Definition 16.1** (*Convex conjugate*)
> Let $F$ be an extended-valued function. Then its *convex conjugate* or *Fenchel dual* $F^\ast: \mathbb R^N \to (-\infty, \infty]$ is defined by
> $$F^\ast(y) = \sup_{x \in \mathbb R^N} \left(\langle x, y\rangle - F(x)\right)$$

- $F^\ast$ is always convex, regardless of whether $F$ is (it is a pointwise supremum of affine functions of $y$).
- We directly get the *Fenchel-Young inequality* $$\langle x, y \rangle \leq F(x) + F^\ast(y)$$ ^d415ba

> **Proposition 16.2** (*Properties of the convex conjugate*)
> Let $F: \mathbb R^N \to (-\infty, \infty]$.
> 1. $F^\ast$ is lower semicontinuous.
> 2. The *biconjugate* $F^{\ast\ast}$ is the *largest* lower semicontinuous *convex* function satisfying $F^{\ast\ast}(x) \leq F(x)$ for all $x$. **In particular, if $F$ is convex and lower semicontinuous, then $F^{\ast\ast} = F$**. ^e536c6
> 3. For $\tau \neq 0$ define $F_\tau(x) := F(\tau x)$. Then $(F_\tau)^\ast(y) = F^\ast(\tfrac{y}{\tau})$.
> 4. If $\tau > 0$, then $(\tau F)^\ast(y) = \tau F^\ast(\tfrac{y}{\tau})$.
> 5. For $z \in \mathbb R^N$ let $F^{(z)}(x) := F(x-z)$. Then $(F^{(z)})^\ast(y) = \langle z, y \rangle + F^\ast(y)$.
^ec92f9

Because of property [[#^e536c6|2.]], the biconjugate $F^{\ast\ast}$ is sometimes called the *convex relaxation* of $F$.
##### Example: computing the convex conjugate | ||
Consider $F(x) = ||x||_2^2/2$. Then $F^\ast(y) = F(y)$ #TODO calculate this! | ||
$$F^\ast(y) = \sup_x (\langle x, y \rangle - ||x||_2^2/2) = \sup_x \langle x, y-x/2\rangle ) = \dots$$ | ||
|

##### Example: conjugate of $\exp$
Let $F(x) = \exp(x)$ (for $N = 1$). The map $x \mapsto xy - \exp(x)$ has its maximum at $x = \ln(y)$ if $y > 0$, so
$$F^\ast(y) = \begin{cases} y \ln y - y, & y > 0 \\ 0, & y = 0 \\ \infty, & y < 0\end{cases}$$
Via Fenchel-Young, this gives rise to *Young's inequality*: $$xy \leq e^x + y \ln(y) - y \qquad \forall x \in \mathbb R,\ y > 0$$
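
A brute-force numeric check of Young's inequality (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-5, 5, size=100_000)
y = rng.uniform(1e-9, 10, size=100_000)

# x*y <= e^x + y*ln(y) - y for all x and all y > 0 (small tolerance for rounding)
assert np.all(x * y <= np.exp(x) + y * np.log(y) - y + 1e-9)
print("no counterexample found")
```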