Merge branch 'main' of ../../Foundations in Data Analysis

Showing 13 changed files with 593 additions and 2 deletions.
Foundations of Data Analysis/SoSe2022/Obsidian Notes/1 Basics.md (2 additions & 2 deletions)
...ions of Data Analysis/SoSe2022/Obsidian Notes/2.1 SVD - Principal Components.md (1 addition)
Foundations of Data Analysis/SoSe2022/Obsidian Notes/3.2 TODO.md (1 addition)

#TODO
Foundations of Data Analysis/SoSe2022/Obsidian Notes/3.3 TODO.md (1 addition)

#TODO
...2022/Obsidian Notes/4 Dimensionality Reduction - Johnson-Lindenstrauss Lemma.md (93 additions)
# 4. Dimensionality Reduction
%% #Lecture 15, 04.07. [[Foundations Section 4 printout.pdf]] %%
## 4.1 The Johnson-Lindenstrauss Lemma

> **Theorem 13.1** (*JL Lemma*)
> For $\epsilon \in (0, 1)$ and $n \in \mathbb N$, let $k \in \mathbb N$ be such that $$k \geq C(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$$
> for some $\beta \geq 2$ (where do we use $\beta$? Does he mean $C$? #TODO)
> Then for any set $\mathcal P$ of $n$ points in $\mathbb R^d$, there is a map $f: \mathbb R^d \to \mathbb R^k$ such that
> $$\forall v,w \in \mathcal P: (1-\epsilon)||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$$
> $f$ can be chosen as a linear map.
^9f9398

>**Theorem 13.2** (*Probabilistic JL Lemma*)
> Let $\epsilon, n, k$ and $\beta \geq 2$ be such that
> $$k \geq 2\beta(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$$
> ...or $k \geq 2 \beta \epsilon^{-2} \ln n$ for Gaussians.
> Then there is a matrix-valued random variable $\Phi$ with values in $\mathbb R^{k \times d}$ such that for any set of $n$ points $\mathcal P$ in $\mathbb R^d$, it holds that
> $$(1-\epsilon) ||v-w||_2^2 \leq ||\Phi v - \Phi w||_2^2 \leq (1+\epsilon) ||v-w||_2^2$$
> for all $v, w \in \mathcal P$, with probability $1-(n^{2-\beta} - n^{1-\beta})$.
^ecd655

Remark on probability: choosing $\beta \gg 2$ yields a more favorable success probability.

##### Some intuition on the numbers
Consider a point cloud of $n=10^7$ points of dimension $d=10^6$. We can project it down to dimension $k = O(\epsilon^{-2} \ln n) = O(16\,\epsilon^{-2})$ while preserving all pairwise distances up to a distortion of $O(\epsilon)$. For example, for $\epsilon = 0.1$ we get $k$ on the order of a few thousand, a reduction of roughly three orders of magnitude, and notably independent of $d$.
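
A quick numerical illustration of Theorem 13.2 with a Gaussian matrix (my own sketch, not from the lecture; sizes scaled down so it runs instantly):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy sizes, scaled down from the 10^7 / 10^6 example above
n, d, eps, beta = 200, 10_000, 0.5, 2.0
k = int(np.ceil(2 * beta * np.log(n) / (eps**2 / 2 - eps**3 / 3)))

P = rng.normal(size=(n, d))                  # n points in R^d, one per row
Phi = rng.normal(size=(k, d)) / np.sqrt(k)   # (1/sqrt(k)) * standard Gaussian matrix
Q = P @ Phi.T                                # embedded points in R^k

def pairwise_sq_dists(X):
    """Squared Euclidean distances via the Gram matrix (avoids an n x n x d array)."""
    sq = (X**2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

iu = np.triu_indices(n, 1)                   # each unordered pair once
orig = pairwise_sq_dists(P)[iu]
emb = pairwise_sq_dists(Q)[iu]
print("k =", k, "max relative distortion:", np.abs(emb / orig - 1).max())
# with high probability, the printed distortion is <= eps
```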

#### Towards proving the JL Lemma
##### JL for Gaussian random matrices
Consider $\Phi = (\Phi_{ij})_{i,j}$ with i.i.d. entries $\Phi_{ij} \sim \mathcal N(0, 1)$. We aim to show that $\tfrac{1}{\sqrt{k}} \Phi$ preserves the norm of any fixed vector with high probability; the JL embedding property then follows via the union bound.

> **Lemma 13.3**
> Let $x \in \mathbb R^d$ and assume $\Phi \in \mathbb R^{k\times d}$ has i.i.d. standard Gaussian entries. Then
> $$P\left( \left|\, ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 - ||x||_2^2 \right| > \epsilon ||x||_2^2 \right) \leq 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$ or, equivalently, $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 \leq (1+\epsilon)||x||_2^2 \right) \geq 1 - 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$
^06b03b

###### #Proof of [[#^06b03b|Lemma 13.3]]
Step 1: show that for any $x$, $\mathbb E\left( ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 \right) = ||x||_2^2$. Indeed,
$$\mathbb E\, ||\tfrac{1}{\sqrt{k}} \Phi x||_2^2
= \tfrac{1}{k} \sum_{j=1}^k \mathbb E \left(\sum_{\ell=1}^d \Phi_{j\ell}x_\ell\right)\left(\sum_{\ell'=1}^d \Phi_{j\ell'}x_{\ell'}\right)$$
Since the entries are independent and standard, $\mathbb E(\Phi_{j\ell} \Phi_{j\ell'}) = \delta_{\ell \ell'}$, so all cross terms vanish and the expression reduces to $\tfrac{1}{k} \sum_{j=1}^k \sum_{\ell=1}^d x_\ell^2 = ||x||_2^2$.

Step 2: note that $||\tfrac{1}{\sqrt{k}} \Phi x||_2^2 = \tfrac{1}{k} \sum_{j=1}^k (\Phi x)_j^2$, and by rotation invariance of the Gaussian distribution, $Z_j := \tfrac{(\Phi x)_j}{||x||_2} \sim \mathcal N(0,1)$, independently across $j$.
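
The rotation invariance claim can be sanity-checked numerically (a hypothetical sketch; the fixed direction $x$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
x = rng.normal(size=d)                  # arbitrary fixed direction, not axis-aligned

# (Phi x)_1 is the dot product of the first (Gaussian) row of Phi with x;
# sample it over many independent draws of that row and normalize by ||x||_2
rows = rng.normal(size=(100_000, d))
Z1 = rows @ x / np.linalg.norm(x)
print(Z1.mean(), Z1.var())              # approx 0 and 1, i.e. Z_1 ~ N(0, 1)
```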

Step 3: apply Markov's inequality to the exponentiated sum. For $\theta > 0$,
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right)
= P\left( \exp\left(\theta \sum_{j=1}^k Z_j^2\right) > \exp(\theta (1+\epsilon) k) \right)$$
$$\leq e^{-(1+\epsilon) k \theta} \mathbb E\left(\exp\left(\theta \sum_{j=1}^k Z_j^2\right)\right)
= e^{-(1+\epsilon) k \theta}\prod_{j=1}^k \mathbb E\left( \exp(\theta Z_j^2) \right)$$
using independence in the last step. Compute the expectation with the substitution $t = g\sqrt{1-2\theta}$:
$$\mathbb E\left(\exp(\theta Z_j^2)\right) = \tfrac{1}{\sqrt{2\pi}}\int \exp(\theta g^2 - g^2/2) \,dg = \tfrac{1}{\sqrt{2\pi}} \int \exp\left(-\tfrac{g^2}{2}(1-2\theta)\right)dg = \tfrac{1}{\sqrt{1-2\theta}} \cdot \tfrac{1}{\sqrt{2\pi}}\int\exp(-t^2/2)\,dt = \tfrac{1}{\sqrt{1-2\theta}}$$
(provided that $\theta \in (0, \tfrac{1}{2})$). Hence the main computation continues as
$$\dots = e^{-(1+\epsilon) k \theta} \cdot \left(\tfrac{1}{1-2\theta}\right)^{k/2}$$
Step 4: choose $\theta$. Choosing $\theta = \tfrac{\epsilon}{2(1+\epsilon)}$ minimizes the above expression and satisfies $\theta < \tfrac{1}{2}$. We get
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \left( (1+\epsilon)e^{-\epsilon} \right)^{\tfrac{k}{2}} \leq \exp\left(-\tfrac{k}{4}(\epsilon^2-\epsilon^3)\right)$$
using the estimate $1+\epsilon \leq \exp(\epsilon-(\epsilon^2 - \epsilon^3)/2)$ (which can be proved e.g. by taking $\log$ on both sides and using the power series of $\log(1+\epsilon)$).

In a similar way, one can prove the other direction: $$P\left( \sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4}(\epsilon^2-\epsilon^3)\right)$$
Finally, step 5: combine the pieces.
$$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 > (1+\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
and similarly, $$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 < (1-\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
Combining the two cases with the union bound gives the factor $2$ in the lemma.
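
A Monte Carlo check of this bound (my own sketch): by Step 2 we may sample $\tfrac{1}{k}\sum_j Z_j^2$ directly instead of drawing full matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
k, eps, trials = 100, 0.3, 100_000

# ||(1/sqrt k) Phi x||_2^2 / ||x||_2^2 has the law of (1/k) * chi^2_k (Step 2)
Z = rng.normal(size=(trials, k))
stat = (Z**2).mean(axis=1)

empirical = np.mean(np.abs(stat - 1.0) > eps)
bound = 2 * np.exp(-(eps**2 - eps**3) * k / 4)
print(empirical, "<=", bound)           # e.g. ~0.03 vs ~0.41: the bound holds, loosely
```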

#Exam Especially the rotation invariance argument could be relevant for the exam.

###### #Proof of the JL Lemma [[#^ecd655|Theorem 13.2]]
- Choose $f$ as a random linear function $f(x) = \tfrac{1}{\sqrt k} \Phi x$, where $\Phi \in \mathbb R^{k\times d}$ has i.i.d. entries sampled from $\mathcal N(0, 1)$.
- There are $\binom{n}{2}$ pairs $(v, w)$ of points from $\mathcal P$, and for each pair we apply [[#^06b03b|Lemma 13.3]] to the difference vector $v - w$.
- Hence, by the union bound, the probability that some pair $(v, w) \in \mathcal P^2$ fails the inequality $(1-\epsilon)||v-w||_2^2 \leq \tfrac{1}{k}||\Phi v - \Phi w||_2^2 \leq (1+\epsilon)||v-w||_2^2$ is at most $$P\left(\exists v, w \in \mathcal P: \left| \tfrac{1}{k} ||\Phi v - \Phi w||_2^2 - ||v-w||_2^2 \right| \geq \epsilon ||v-w||_2^2\right)$$ $$\leq 2 \binom{n}{2} e^{-(\epsilon^2 - \epsilon^3)\tfrac{k}{4}} \leq 2 \binom{n}{2} n^{-\beta(1-\epsilon)} = n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)}$$ (the middle step is where the lower bound on $k$ from the theorem statement enters, via $e^{-(\epsilon^2-\epsilon^3)k/4} \leq n^{-\beta(1-\epsilon)}$)
- Hence with probability at least $1-(n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)})$ we have $(1-\epsilon) ||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$ for all $v, w \in \mathcal P$.

### Nothing special about Gaussian matrices
If we instead choose the entries of $\Phi$ i.i.d. uniformly from $\{-1, +1\}$, we get comparable results; see *Lemma 13.4* on the slides (not covered in detail here). A quick empirical comparison follows below.
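
A hypothetical side-by-side of the two matrix types (same toy setup as the Gaussian sketch above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 200, 10_000, 300

P = rng.normal(size=(n, d))

def max_distortion(Phi):
    """Largest relative error over all pairwise squared distances under x -> Phi x."""
    Q = P @ Phi.T
    sq = lambda X: (X**2).sum(1)[:, None] + (X**2).sum(1)[None, :] - 2 * X @ X.T
    iu = np.triu_indices(n, 1)
    return np.abs(sq(Q)[iu] / sq(P)[iu] - 1).max()

gauss = rng.normal(size=(k, d)) / np.sqrt(k)
rademacher = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)
print(max_distortion(gauss), max_distortion(rademacher))   # comparable distortions
```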

### Scalar Product Preservation
%% #Lecture 16, 05.07. [[Foundations Section 4 printout.pdf]] %%
Under a JL random matrix, scalar products of vectors in the unit ball are approximately preserved:

> **Corollary 13.5** (*Scalar product preservation*)
> Let $u, v$ be two points in the $d$-dimensional unit ball $B(1) \subseteq \mathbb R^d$ and let $\Phi \in \mathbb R^{k\times d}$ be a random matrix satisfying a JL-like inequality,
> $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x||_2^2 \leq (1+\epsilon) ||x||_2^2\right) \geq 1-2e^{-(\epsilon^2-\epsilon^3)k/4}$$
> (e.g. a matrix with i.i.d. Gaussian entries, or with entries drawn uniformly from $\{-1, +1\}$).
> Then with probability at least $1 - 4e^{-(\epsilon^2 - \epsilon^3)k/4}$, $$|\langle u, v \rangle - \langle \Phi u, \Phi v \rangle | \leq \epsilon$$ (where, abusing notation, $\Phi$ now denotes the rescaled matrix $\tfrac{1}{\sqrt k}\Phi$).
^99bd43
###### #Proof of [[#^99bd43|Corollary 13.5]]
By the polarization identity and the JL-like inequality (applied to both $u+v$ and $u-v$, hence the factor $4$ in the failure probability),
$$4 \langle \Phi u, \Phi v\rangle = ||\Phi (u+v)||_2^2 - ||\Phi(u-v)||_2^2
\geq (1-\epsilon)||u+v||_2^2 - (1+\epsilon) ||u-v||_2^2$$
$$= 4 \langle u, v \rangle - 2 \epsilon (||u||_2^2 + ||v||_2^2)
\geq 4 \langle u, v \rangle - 4 \epsilon$$
where the last step uses $||u||_2, ||v||_2 \leq 1$. The other direction is analogous.
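
A small numeric check of the corollary (my own sketch; $u, v$ are arbitrary points in the unit ball):

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, eps = 1_000, 2_000, 0.1

u = rng.normal(size=d); u /= 2 * np.linalg.norm(u)   # ||u||_2 = 1/2
v = rng.normal(size=d); v /= 3 * np.linalg.norm(v)   # ||v||_2 = 1/3

Phi = rng.normal(size=(k, d)) / np.sqrt(k)           # rescaled matrix, as in the corollary
print(abs(u @ v - (Phi @ u) @ (Phi @ v)), "<=", eps)
# holds with probability >= 1 - 4*exp(-4.5) here, since (eps^2 - eps^3) * k / 4 = 4.5
```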
...dations of Data Analysis/SoSe2022/Obsidian Notes/5.1 Convexity - Convex Sets.md (37 additions)
# 5 Convex Analysis
## 5.1 Convex Sets
A *convex set* is a set $K \subseteq \mathbb R^N$ such that for all $x, z \in K$ and all $t \in [0, 1]$, $tx + (1-t)z \in K$. By induction, it follows that every convex combination of finitely many elements of the set is also in the set.

The *convex hull* $\text{conv}(T)$ of a set $T$ is the smallest convex set containing $T$, or equivalently the set of all convex combinations of elements from $T$.

### Cones
A set $K\subseteq \mathbb R^N$ is called a *cone* if for all $x \in K$ and all $t \geq 0$, $tx$ is contained in $K$. A cone that is convex is called a *convex cone*, and $K$ is a convex cone iff for all $x, z \in K$ and $t, s \geq 0$, also $sx + tz \in K$. Examples:
- the set of positive semidefinite matrices in $\mathbb R^{N \times N}$
- the *positive orthant* $\mathbb R_+^N = \{x \mid x_i \geq 0 ~\forall i \in [N] \}$

##### Dual Cones
For a cone $K$, its dual cone $K^\ast$ is defined as
$$K^\ast = \{ z \in \mathbb R^N : \langle x, z \rangle \geq 0 ~~ \forall x \in K\}$$
- $K^\ast$ is closed and convex (as an intersection of half-spaces), and is itself again a cone.
- If $K$ is a closed convex cone, then $K^{\ast\ast} = K$.
- If $H, K$ are cones and $H \subseteq K$, then $K^\ast \subseteq H^\ast$.

For example, the positive orthant is self-dual: $(\mathbb R_+^N)^\ast = \mathbb R_+^N$.
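
A short argument for the self-duality (my own filling-in, not from the slides): if $z \in \mathbb R_+^N$, then $\langle x, z \rangle = \sum_i x_i z_i \geq 0$ for every $x \in \mathbb R_+^N$, so $z \in (\mathbb R_+^N)^\ast$. Conversely, if $z_i < 0$ for some $i$, then the standard basis vector $e_i \in \mathbb R_+^N$ gives $\langle e_i, z \rangle = z_i < 0$, so $z \notin (\mathbb R_+^N)^\ast$. Hence $(\mathbb R_+^N)^\ast = \mathbb R_+^N$.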

##### Polar Cones
Given a cone $K$, the polar cone is defined as
$$K^\circ = \{ z \in \mathbb R^N \mid \langle x, z \rangle \leq 0 ~~\forall x \in K \} = -K^\ast$$
##### Conic Hull
The *conic hull* $\text{cone}(T)$ of a set $T$ is the smallest convex cone containing $T$, or equivalently the set of all conic combinations of elements from $T$, i.e. $$\text{cone}(T) = \left\{ \sum_j t_j x_j : t_j \geq 0, x_j \in T \right\}$$
### Geometric Hahn-Banach Theorem
Hahn-Banach, informally: "non-overlapping convex sets can be separated by hyperplanes".

> **Theorem 14.4** (*Finite-dimensional Hahn-Banach Theorem*)
> Let $K_1, K_2 \subseteq \mathbb R^N$ be convex sets with disjoint interiors. Then there exist $w \in \mathbb R^N \setminus \{0\}$ and $\lambda \in \mathbb R$ such that
> $$K_1 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \leq \lambda \}, \quad K_2 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \geq \lambda \}.$$
### Extreme Points
Let $K \subseteq \mathbb R^N$ be a convex set. A point $x \in K$ is called an *extreme point* of $K$ if the only way to represent it as a convex combination $x = tw + (1-t)z$ with $w, z \in K$, $t \in (0, 1)$ is $w = z = x$. Note: the extreme points are *not* the same as the boundary (e.g. for a convex polygon, only the corners are extreme points, not all points on the boundary).

> **Theorem 14.6**
> A compact convex set is the convex hull of its extreme points.
...ns of Data Analysis/SoSe2022/Obsidian Notes/5.2 Convexity - Convex Functions.md (108 additions)
## 5.2 Convex Functions
### Extended-valued functions
To model inadmissible points, we work with *extended-valued functions* $F: \mathbb R^N \to (-\infty, \infty]$ and define $F(x) = \infty$ at non-admissible points $x$: none of these points will ever be a minimizer.

This extension of a function $F: K \to \mathbb R$ to all of $\mathbb R^N$ is called its *canonical extension*.

The domain of an extended-valued function $F$ is defined as $\text{dom}(F) = \{x \in \mathbb R^N : F(x) < \infty\}$. A function with non-empty domain is called *proper*.
### Convex functions
> **Definition 15.1** (*Convex function*)
> An extended-valued function $F$ is called
> - *convex* if for all $x, z \in \mathbb R^N$ and $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z)$$
> - *strictly convex* if the above inequality is strict whenever $x \neq z$ and $t \in (0, 1)$
> - *strongly convex* with parameter $\gamma > 0$ if for all $x, z$ and $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z) - \tfrac{\gamma}{2} t (1-t) ||x-z||_2^2$$
>
> $F$ is called (strictly; strongly) concave if $-F$ is (strictly; strongly) convex.
> A function $F: K \to \mathbb R$ on a convex subset $K \subseteq \mathbb R^N$ is called convex if its canonical extension is convex.

Strongly convex functions are always strictly convex, and strictly convex functions are always convex. Convex functions always have convex domains.

$F$ is convex if and only if its *epigraph* $\text{epi}(F) = \{(x, r) \mid r \geq F(x)\}$ is a convex set.
##### Smooth convex functions | ||
If $F$ is differentiable, its convexity can further be characterized as follows: | ||
|
||
> **Proposition 15.2** (*Smooth convex functions*) | ||
> Let $F: \mathbb R^N \to \mathbb R$ be differentiable. | ||
> 1. $F$ is convex iff for all $x, y$: $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle$$, where $\nabla F(y) = (\partial_{y_1}F(y),\dots,\partial_{y_n}F(y))^\top$ | ||
> 2. $F$ is strongly convex with $\gamma > 0$ if for all $x, y$, $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle + \frac{\gamma}{2}(t)(1-t)||x-z||_2^2$$ | ||
> 3. If $F$ is twice differentiable, then $F$ is convex iff for all $x$,$$\nabla^2 F(x) \succcurlyeq 0$$ where $\nabla^2 F$ is the Hessian of $F$ | ||
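
A quick numeric check of the first-order characterization (my own sketch, for a convex quadratic):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
B = rng.normal(size=(N, N))
A = B.T @ B                                # positive semidefinite, so F is convex

F = lambda x: x @ A @ x                    # F(x) = x^T A x
grad = lambda y: 2 * A @ y                 # its gradient

x, y = rng.normal(size=N), rng.normal(size=N)
print(F(x) >= F(y) + grad(y) @ (x - y))    # first-order inequality: True
```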
##### Composition of convex functions
> **Proposition 15.3** (*Compositions of convex functions*)
> 1. If $F, G$ are convex functions on $\mathbb R^N$, then for all $\alpha, \beta \geq 0$, $\alpha F + \beta G$ is convex.
> 2. Let $F$ be convex and non-decreasing and let $G$ be convex. Then $H(x) = F(G(x))$ is convex.
##### Examples of convex functions
- Norms $||\cdot||$ on $\mathbb R^N$ are always convex, by the triangle inequality and homogeneity.
- The $\ell_p$-norms are strictly convex for $p \in (1, \infty)$, and *not* strictly convex for $p \in \{1, \infty\}$.
- If $A \in \mathbb R^{N \times N}$ is positive semidefinite, the function $F(x) = x^\top Ax$ is convex. If $A$ is positive definite, $F$ is strictly convex.
- For a convex set $K$, the characteristic function defined by $\chi_K(x) = 0$ if $x \in K$ and $\chi_K(x) = \infty$ otherwise is convex (watch out, this is defined differently than usual!). ^d5bfc2

%% #Lecture 17, 11.07. [[Foundations Section 5.2 printout.pdf]] %%
##### Convexity and continuity | ||
> **Proposition 15.4** | ||
> Convex functions $F: \mathbb R^N \to \mathbb R$ (which are **not** extended-valued functions) are continuous. | ||
For extended-valued functions, one uses *lower semi-continuity*: | ||
|
||
> **Definition 15.5** (*Lower semicontinuity*) | ||
> An [[#Extended-valued functions|extended-valued function]] $F$ is *lower semicontinuous* if for all $x$ and every sequence $x_j \to x$, it holds that $$\lim\inf_{j} F(x_j) \geq F(x)$$ | ||
> A function is lower semicontinuous iff its epigraph is closed. | ||
Examples:
- Continuous functions are lower semicontinuous.
- $\chi_K$ ([[#^d5bfc2|definition]]) is not continuous, but it is lower semicontinuous iff $K$ is closed:
    - $K$ closed, $x \in K$: for a sequence $(x_j)_j$ converging to $x$ from inside $K$, $\liminf \chi_K(x_j) = 0 = \chi_K(x)$; for one converging from outside, $\liminf \chi_K(x_j) = \infty \geq 0 = \chi_K(x)$.
    - $K$ not closed, $x \notin K$ a limit point of $K$: find a sequence $x_j \to x$ inside $K$; then $\liminf \chi_K(x_j) = 0 < \infty = \chi_K(x)$.

Lower semicontinuity is particularly useful in *infinite-dimensional* Hilbert spaces (e.g. $||\cdot||$ can fail to be continuous w.r.t. the weak topology while still being lower semicontinuous).

### Minimizing convex functions
> **Proposition 15.6**
> Let $F$ be a convex extended-valued function. Then:
> 1. Every local minimum of $F$ is a global minimum.
> 2. The set of minima of $F$ is a convex set.
> 3. If $F$ is strictly convex, the minimum is unique.
^f3aef6

###### #Proof of [[#^f3aef6|Proposition 15.6]]
Statement 1:
Let $x$ be a local minimum and assume that $F(z) < F(x)$ for some $z$. Then for all $t \in (0, 1)$, $F(tx + (1-t)z) \leq t F(x) + (1-t) F(z) < F(x)$. As $t \to 1$, the point $tx + (1-t)z$ approaches $x$, so every neighborhood of $x$ contains a point $y$ with $F(y) < F(x)$, contradicting local minimality.

Statement 2:
Let $x, y$ be two (by statement 1, global) minima, so $F(x) = F(y)$. Then for $t \in [0, 1]$, $F(tx + (1-t)y) \leq tF(x) + (1-t)F(y) = F(x)$, and since $F(x)$ is the minimal value, equality holds. Hence every point on the segment between $x$ and $y$ is also a minimizer.

Statement 3:
Suppose $x \neq y$ are both minima. Then by strict convexity, $F(tx + (1-t)y) < t F(x) + (1-t) F(y) = F(x)$ for $t \in (0, 1)$, i.e. every proper convex combination has a *strictly smaller* function value than the minima, a contradiction.

### Jointly convex functions
A function $f(x, y)$ of *two arguments* $x \in \mathbb R^n$, $y \in \mathbb R^m$ is *jointly convex* if it is convex as a function of the joint variable $z = (x,y)$. #TODO What is the purpose of this definition? I think it is mostly "syntactic sugar", since it is equivalent to convexity of the corresponding function $f: \mathbb R^{n+m} \to \mathbb R$.

Example: $f(x, y) = xy$ is "marginally" (or element-wise) convex, but not jointly convex, because it fails to be convex along the direction $(1, -1)$; see the computation below and [this figure](https://www.researchgate.net/figure/A-marginally-convex-function-is-not-necessarily-jointly-convex-The-function-f-x-y_fig2_339504312).
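
Making the direction argument precise (my computation, not from the slides): along $(1, -1)$ we have $f(t, -t) = -t^2$, which is strictly concave. Equivalently, the Hessian $$\nabla^2 f(x, y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$ has eigenvalues $\pm 1$ and is not positive semidefinite, so $f$ is not jointly convex; yet for each fixed $y$ (or $x$), $f$ is linear and hence marginally convex.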

> **Theorem 15.7**
> Let $f$ be an extended-valued, *jointly convex* function. Then the function $g(x) = \inf_{y \in \mathbb R^m} f(x, y)$, $x \in \mathbb R^n$, is convex.
^e2d3b3

###### #Proof of [[#^e2d3b3|Theorem 15.7]]
For $t \in [0, 1]$ and arbitrary $y_1, y_2 \in \mathbb R^m$, joint convexity gives $$g(tx_1 + (1-t)x_2) \leq f(tx_1 + (1-t)x_2, t y_1 + (1-t)y_2) \leq t f(x_1, y_1) + (1-t) f(x_2, y_2)$$
Taking the infimum over $y_1$ and $y_2$ on the right-hand side yields $g(tx_1 + (1-t)x_2) \leq t g(x_1) + (1-t) g(x_2)$, which is the claim.

### *Maxima* of convex functions
We can also say something about *maxima* on *compact convex sets*:

> **Theorem 15.8**
> Let $K \subseteq \mathbb R^N$ be compact and convex, and let $F: K \to \mathbb R$ be a convex function. Then $F$ attains its maximum at an *[[5.1 Convexity - Convex Sets#Extreme Points|extreme point]]* of $K$.
^86df37

###### #Proof of [[#^86df37|Theorem 15.8]]
Let $x \in K$ be a maximizer of $F$ over $K$. Since $K$ is the convex hull of its extreme points, we can write $x = \sum_{j=1}^m t_j x_j$ for some $m$, some (*not necessarily all!*) extreme points $x_j$, and weights $t_j \geq 0$ with $\sum_j t_j = 1$. Then by convexity and maximality,
$$F(x) \leq \sum_j t_j F(x_j) \leq \sum_j t_j F(x) = F(x)$$
so equality holds throughout, and therefore $F(x_j) = F(x)$ for every $x_j$ with $t_j > 0$; in particular the maximum is attained at an extreme point.
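
As an illustration (my example, not from the lecture): linear functions are convex, so their maxima over compact polytopes sit at vertices, i.e. extreme points; this is exactly what linear programming exploits.

```python
import numpy as np
from scipy.optimize import linprog

# maximize the convex (linear) function F(x) = <c, x> over the square [0, 1]^2
c = np.array([2.0, 1.0])
res = linprog(-c, bounds=[(0, 1), (0, 1)], method="highs")  # linprog minimizes, so negate c
print(res.x)                                                # -> [1. 1.], a corner of the square
```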
...ns of Data Analysis/SoSe2022/Obsidian Notes/5.3 Convexity - Convex Conjugate.md (31 additions)
## 5.3 The Convex Conjugate

^0a5cd2

> **Definition 16.1** (*Convex conjugate*)
> Let $F$ be an extended-valued function. Then its *convex conjugate* or *Fenchel dual* $F^\ast: \mathbb R^N \to (-\infty, \infty]$ is defined by
> $$F^\ast(y) = \sup_{x \in \mathbb R^N} \left(\langle x, y\rangle - F(x)\right)$$

- $F^\ast$ is always convex, regardless of whether $F$ is (it is a pointwise supremum of affine functions of $y$).
- We directly get the *Fenchel-Young inequality* $$\langle x, y \rangle \leq F(x) + F^\ast(y)$$ ^d415ba

> **Proposition 16.2** (*Properties of the convex conjugate*)
> Let $F: \mathbb R^N \to (-\infty, \infty]$.
> 1. $F^\ast$ is lower semicontinuous.
> 2. The *biconjugate* $F^{\ast\ast}$ is the *largest* lower semicontinuous *convex* function satisfying $F^{\ast\ast}(x) \leq F(x)$ for all $x$. **In particular, if $F$ is convex and lower semicontinuous, then $F^{\ast\ast} = F$**. ^e536c6
> 3. For $\tau \neq 0$ define $F_\tau(x) := F(\tau x)$. Then $(F_\tau)^\ast(y) = F^\ast(\tfrac{y}{\tau})$.
> 4. If $\tau > 0$, then $(\tau F)^\ast(y) = \tau F^\ast(\tfrac{y}{\tau})$.
> 5. For $z \in \mathbb R^N$ let $F^{(z)}(x) := F(x-z)$. Then $(F^{(z)})^\ast(y) = \langle z, y \rangle + F^\ast(y)$.
^ec92f9

Because of property [[#^e536c6|2.]], the biconjugate $F^{\ast\ast}$ is sometimes called the *convex relaxation* of $F$.
##### Example: computing the convex conjugate | ||
Consider $F(x) = ||x||_2^2/2$. Then $F^\ast(y) = F(y)$ #TODO calculate this! | ||
$$F^\ast(y) = \sup_x (\langle x, y \rangle - ||x||_2^2/2) = \sup_x \langle x, y-x/2\rangle ) = \dots$$ | ||
|

##### Example: conjugate of $\exp$
Let $F(x) = \exp(x)$ (for $N = 1$). The map $x \mapsto xy - \exp(x)$ has its maximum at $x = \ln(y)$ if $y > 0$, so
$$F^\ast(y) = \begin{cases} y \ln y - y, & y > 0 \\ 0, & y = 0 \\ \infty, & y < 0\end{cases}$$
Via Fenchel-Young, this gives rise to *Young's inequality*: $$xy \leq e^x + y \ln(y) - y \qquad \forall x \in \mathbb R,\ y > 0$$
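
A brute-force numeric check of Young's inequality (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-5, 5, size=100_000)
y = rng.uniform(1e-9, 10, size=100_000)

# x*y <= e^x + y*ln(y) - y for all x and all y > 0 (small tolerance for rounding)
assert np.all(x * y <= np.exp(x) + y * np.log(y) - y + 1e-9)
print("no counterexample found")
```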