
Commit

Merge branch 'main' of ../../Foundations in Data Analysis
Vuenc committed Aug 3, 2022
2 parents 85a3af3 + f7bfd84 commit 4812308
Showing 13 changed files with 593 additions and 2 deletions.
@@ -1,5 +1,5 @@

# 1 Basics
%% #Lecture 1, 02.05. %%

Motivation: to understand phenomena in deep learning, the mathematical concepts needed are *linear algebra*, *nonlinear optimization*, and *probability and statistics*.

@@ -1,4 +1,5 @@
# 2 Singular Value Decomposition
%% #Lecture 2, 03.05. %%
## 2.1 Principal Components
Assume an $n$-dim. random vector $X$, $E(X) = 0$.

@@ -0,0 +1 @@
#TODO
@@ -0,0 +1 @@
#TODO
@@ -0,0 +1,93 @@
# 4 Dimensionality Reduction
%% #Lecture 15, 04.07. [[Foundations Section 4 printout.pdf]] %%
## 4.1 The Johnson-Lindenstrauss Lemma

> **Theorem 13.1** (*JL Lemma*)
> For $\epsilon \in (0, 1)$ and $n \in \mathbb N$, let $k \in \mathbb N$ such that $$k \geq C(\epsilon^{2}/2 - \epsilon^3/3)^{-1} \ln n$$
> for some constant $C$ (presumably $C = 2\beta$ with the $\beta \geq 2$ from Theorem 13.2 below).
> Then for any set $\mathcal P$ of $n$ points in $\mathbb R^d$, there is a map $f: \mathbb R^d \to \mathbb R^k$ such that
> $$\forall v,w \in \mathcal P: (1-\epsilon)||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$$
> $f$ can be chosen as a linear map.
^9f9398

>**Theorem 13.2** (*Probabilistic JL Lemma*)
> Let $\epsilon, n, k$ and $\beta \geq 2$ be such that
> $$k \geq 2\beta(\epsilon^{2}/2 - \epsilon^3/3)^{-1} \ln n$$
> ...or $k \geq 2 \beta \epsilon^{-2} \ln n$ for Gaussians.
> Then there is a matrix-valued random variable $\Phi$ with values in $\mathbb R^{k \times d}$ such that for any set of $n$ points $\mathcal P$ in $\mathbb R^d$, it holds that
> $$(1-\epsilon) ||v-w||_2^2 \leq ||\Phi v - \Phi w||_2^2 \leq (1+\epsilon) ||v-w||_2^2$$
> with probability $1-(n^{2-\beta} - n^{1-\beta})$.
^ecd655

Remark on probability: $\beta \gg 2$ leads to a favorable probability of success.

##### Some intuition on the numbers
Consider a point cloud of $n=10^7$ points of dimension $d=10^6$. We can project it down to dimension $k = O(\epsilon^{-2} \ln n) = O(16\,\epsilon^{-2})$ (with $\ln 10^7 \approx 16$), preserving the pairwise distances up to a distortion of $O(\epsilon)$. For example, for $\epsilon = 0.1$ we get $k$ on the order of a thousand, a reduction of the dimension by roughly three orders of magnitude, independent of $d$.
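To make these numbers concrete, here is a minimal numerical sketch (my own, not from the lecture; the sizes and the use of `numpy` are assumptions) that picks $k$ via the bound from Theorem 13.2 and measures the empirical distortion of a Gaussian random projection:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, eps, beta = 1000, 2000, 0.5, 2.0
# Bound from Theorem 13.2: k >= 2*beta*(eps^2/2 - eps^3/3)^(-1) * ln(n)
k = int(np.ceil(2 * beta / (eps**2 / 2 - eps**3 / 3) * np.log(n)))

X = rng.standard_normal((n, d))       # the point cloud P
Phi = rng.standard_normal((k, d))     # Gaussian random matrix
Y = (X @ Phi.T) / np.sqrt(k)          # f(x) = (1/sqrt(k)) * Phi x

# Check the distortion of squared distances on random pairs
pairs = rng.integers(0, n, size=(2000, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
orig = np.sum((X[pairs[:, 0]] - X[pairs[:, 1]]) ** 2, axis=1)
proj = np.sum((Y[pairs[:, 0]] - Y[pairs[:, 1]]) ** 2, axis=1)
ratios = proj / orig
print(k, ratios.min(), ratios.max())  # ratios should lie in [1-eps, 1+eps]
```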

#### Towards proving the JL Lemma
##### JL for Gaussian random matrices
Consider $\Phi = (\Phi_{ij})_{i,j}$ with i.i.d. entries $\Phi_{ij} \sim \mathcal N(0, 1)$. We aim to show that $\tfrac{1}{\sqrt{k}} \Phi$ preserves the norm of any fixed vector with high probability (Lemma 13.3 below); the JL property for all $\binom{n}{2}$ pairs then follows via the union bound.

> **Lemma 13.3**
> Let $x \in \mathbb R^d$ and assume $\Phi \in \mathbb R^{k\times d}$ has i.i.d. standard Gaussian entries. Then
> $$P\left( \left| \,||\tfrac{1}{\sqrt k} \Phi x ||_2^2 - ||x||_2^2 \,\right| > \epsilon ||x||_2^2 \right) \leq 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$ or, equivalently, $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x||_2^2 \leq (1+\epsilon)||x||_2^2 \right) \geq 1 - 2\exp\left(-\tfrac{(\epsilon^2-\epsilon^3)k}{4}\right)$$
^06b03b

###### #Proof of [[#^06b03b|Lemma 13.3]]
Step 1: show that for any $x$, $\mathbb E\left( ||\tfrac{1}{\sqrt k} \Phi x ||_2^2 \right) = ||x||_2^2$. Indeed,
$$\mathbb E\left(||\tfrac{1}{\sqrt{k}} \Phi x||_2^2\right)
= \frac{1}{k} \sum_{j=1}^k \mathbb E \left[\left(\sum_{\ell=1}^d \Phi_{j\ell}x_\ell\right)\left(\sum_{\ell'=1}^d \Phi_{j\ell'}x_{\ell'}\right)\right]$$
Since the entries are independent with unit variance, $\mathbb E(\Phi_{j\ell} \Phi_{j\ell'}) = \delta_{\ell \ell'}$, so all cross terms vanish and the expression reduces to $\frac{1}{k}\sum_{j=1}^k \sum_{\ell=1}^d x_\ell^2 = ||x||_2^2$.

Step 2: we note that $||\tfrac{1}{\sqrt{k}} \Phi x||_2^2 = \tfrac{1}{k} \sum_{j=1}^k (\Phi x)_j^2$. By rotation invariance of the Gaussian distribution, the $Z_j = \tfrac{(\Phi x)_j}{||x||_2}$ are i.i.d. $\mathcal N(0,1)$.

Step 3: apply Markov's inequality. For any $\theta > 0$, by monotonicity of $\exp$ and Markov's inequality, we get
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right)
= P\left( \exp\left(\theta \sum_{j=1}^k Z_j^2\right) > \exp(\theta (1+\epsilon) k) \right)$$
$$\leq e^{-(1+\epsilon) k \theta} \mathbb E\left(\exp\left(\theta \sum_{j=1}^k Z_j^2\right)\right)
= e^{-(1+\epsilon) k \theta}\prod_{j=1}^k \mathbb E\left( \exp(\theta Z_j^2) \right)$$
where the last equality uses independence of the $Z_j$. Compute the expectation for a single standard Gaussian as $$\mathbb E\left(\exp(\theta Z_j^2)\right) = \frac{1}{\sqrt{2\pi}}\int \exp(\theta g^2 - g^2/2) \,dg = \frac{1}{\sqrt{2\pi}} \int \exp\left(-\tfrac{g^2}{2}(1-2\theta)\right)dg$$
Substituting $t = g\sqrt{1-2\theta}$ gives
$$\dots = \frac{1}{\sqrt{1-2\theta}} \cdot \frac{1}{\sqrt{2\pi}}\int\exp(-t^2/2)\,dt = \frac{1}{\sqrt{1-2\theta}}$$
(provided that $\theta \in (0, \tfrac{1}{2})$, so that the integral converges). Hence the main computation continues as
$$\dots = e^{-(1+\epsilon) k \theta} \cdot \left(\tfrac{1}{1-2\theta}\right)^{k/2}$$
Step 4: choose $\theta$. The choice $\theta = \tfrac{\epsilon}{2(1+\epsilon)}$ minimizes the above expression and satisfies $\theta < \tfrac{1}{2}$. We get
$$P\left( \sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \left( (1+\epsilon)e^{-\epsilon} \right)^{k/2} \leq \exp\left(-\tfrac{k}{4}(\epsilon^2-\epsilon^3)\right)$$
using the estimate $1+\epsilon \leq \exp\left(\epsilon-\tfrac{\epsilon^2 - \epsilon^3}{2}\right)$ (which follows by applying $\log$ to both sides and the power series bound $\ln(1+\epsilon) \leq \epsilon - \epsilon^2/2 + \epsilon^3/3 \leq \epsilon - \tfrac{\epsilon^2-\epsilon^3}{2}$).

In a similar way, one can prove the other direction $$P\left( \sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp(-\tfrac{k}{4}(\epsilon^2-\epsilon^3))$$
Finally, step 5: combining the pieces.
$$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 > (1+\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 > (1+\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
and similarly, $$P\left( ||\tfrac{1}{\sqrt k} \Phi x||_2^2 < (1-\epsilon)||x||_2^2 \right) = P\left(\sum_{j=1}^k Z_j^2 < (1-\epsilon)k \right) \leq \exp\left(-\tfrac{k}{4} (\epsilon^2-\epsilon^3)\right)$$
We conclude by combining the two cases with the union bound.
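Both tails can be sanity-checked by simulation; a quick Monte Carlo sketch (my own, with arbitrarily chosen parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
k, eps, trials = 200, 0.3, 100_000

# Sample sum_{j=1}^k Z_j^2 for i.i.d. standard Gaussians Z_j
s = np.sum(rng.standard_normal((trials, k)) ** 2, axis=1)

emp = np.mean((s > (1 + eps) * k) | (s < (1 - eps) * k))
bound = 2 * np.exp(-k / 4 * (eps**2 - eps**3))
print(emp, bound)  # the empirical failure rate should stay below the bound
```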

#Exam The part about rotation invariance in particular could be relevant for the exam.

###### #Proof of the JL Lemma [[#^ecd655|Theorem 13.2]]
- Choose $f$ as a random linear function $f(x) = \tfrac{1}{\sqrt k} \Phi x$, where $\Phi \in \mathbb R^{k\times d}$ has entries sampled from $\mathcal N(0, 1)$.
- There are $\binom{n}{2}$ pairs $(v, w)$ of points from $\mathcal P$.
- Hence we can apply the union bound: the probability that some pair $(v, w) \in \mathcal P^2$ fails the inequality $(1-\epsilon)||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$ is $$P\left(\exists v, w \in \mathcal P: \left| \tfrac{1}{k} ||\Phi v - \Phi w||_2^2 - ||v-w||_2^2 \right| \geq \epsilon ||v-w||_2^2\right)$$ $$\dots \leq 2 \binom{n}{2} e^{-(\epsilon^2 - \epsilon^3)\tfrac{k}{4}} \leq 2 \binom{n}{2} n^{-\beta(1-\epsilon)} = n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)}$$ (in the last two steps, the assumption $k \geq 2\beta(\epsilon^2/2 - \epsilon^3/3)^{-1}\ln n$ gives $e^{-(\epsilon^2-\epsilon^3)k/4} \leq n^{-\beta(1-\epsilon)}$, and $2\binom{n}{2} = n^2 - n$)
- Hence with probability $1-(n^{2-\beta(1-\epsilon)} - n^{1-\beta(1-\epsilon)})$ we have $(1-\epsilon) ||v-w||_2^2 \leq ||f(v) - f(w)||_2^2 \leq (1+\epsilon)||v-w||_2^2$ for all $v, w \in \mathcal P$. (The slightly weaker exponent $\beta(1-\epsilon)$ instead of the theorem's $\beta$ comes from using the cruder tail exponent $(\epsilon^2-\epsilon^3)/4$ of Lemma 13.3; carrying the sharper exponent $(\epsilon^2/2-\epsilon^3/3)/2$ through instead recovers the probability stated in the theorem.)

### Nothing special about Gaussian matrices
If we instead choose the entries of $\Phi$ to be i.i.d. uniform on $\{-1, +1\}$, we obtain comparable results. For the full statement, see *Lemma 13.4* on the slides (not covered in detail here).
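A small empirical comparison (my own sketch, not from the slides) of norm preservation under Gaussian and sign matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, trials = 500, 100, 1000

x = rng.standard_normal(d)
x /= np.linalg.norm(x)  # fix a unit vector

samplers = {
    "gaussian": lambda: rng.standard_normal((k, d)),
    "sign": lambda: rng.choice([-1.0, 1.0], size=(k, d)),
}
for name, sample in samplers.items():
    # deviation of ||(1/sqrt(k)) Phi x||^2 from ||x||^2 = 1
    errs = [abs(np.sum((sample() @ x) ** 2) / k - 1.0) for _ in range(trials)]
    print(name, np.mean(errs), np.max(errs))  # both should look alike
```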

### Scalar Product Preservation
%% #Lecture 16, 05.07. [[Foundations Section 4 printout.pdf]] %%
Under the transformation of a JL random matrix, scalar products of unit vectors are approximately preserved.

> **Corollary 13.5** (*Scalar product preservation*)
> Let $u, v$ be two points in the real $d$-dim. unit ball $B(1)$ and $\Phi \in \mathbb R^{k\times d}$ a matrix with random entries that satisfies a JL-like inequality,
> $$P\left( (1-\epsilon)||x||_2^2 \leq ||\tfrac{1}{\sqrt k} \Phi x||_2^2 \leq (1+\epsilon) ||x||_2^2\right) \geq 1-2e^{-(\epsilon^2-\epsilon^3)k/4}$$
> (e.g. if $\Phi$ is a matrix with Gaussian random entries, or with entries $U(\{-1, +1\})$-distributed).
> Then with probability at least $1 - 4e^{-(\epsilon^2 - \epsilon^3)k/4}$, $$|\langle u, v \rangle - \langle \tfrac{1}{\sqrt k}\Phi u, \tfrac{1}{\sqrt k}\Phi v \rangle | \leq \epsilon$$
^99bd43
###### #Proof of [[#^99bd43|Corollary 13.5]]
We have, writing $\tilde\Phi := \tfrac{1}{\sqrt k}\Phi$ for the rescaled matrix (both applications of the JL-like inequality below, to $u+v$ and to $u-v$, hold simultaneously with probability at least $1 - 4e^{-(\epsilon^2-\epsilon^3)k/4}$ by the union bound):
$$4 \langle \tilde\Phi u, \tilde\Phi v\rangle = ||\tilde\Phi (u+v)||_2^2 - ||\tilde\Phi(u-v)||_2^2
\geq (1-\epsilon)||u+v||_2^2 - (1+\epsilon) ||u-v||_2^2$$$$\dots = 4 \langle u, v \rangle - 2 \epsilon (||u||_2^2 + ||v||_2^2)
\geq 4 \langle u, v \rangle - 4 \epsilon$$
using the parallelogram law and $||u||_2, ||v||_2 \leq 1$; the other direction is analogous.
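A numerical illustration of the corollary (a sketch of mine; the rescaling by $\tfrac{1}{\sqrt k}$ matches Lemma 13.3):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 1000, 400

u = rng.standard_normal(d)
v = rng.standard_normal(d)
u /= np.linalg.norm(u)  # points in the unit ball B(1)
v /= np.linalg.norm(v)

Phi = rng.standard_normal((k, d)) / np.sqrt(k)  # rescaled Gaussian matrix
err = abs(u @ v - (Phi @ u) @ (Phi @ v))
print(err)  # small with high probability, roughly O(1/sqrt(k))
```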
@@ -0,0 +1,37 @@
# 5 Convex Analysis
## 5.1 Convex Sets
A *convex set* is a set $K \subseteq \mathbb R^N$ where for all $x, z \in K$ and all $t \in [0, 1]$, $tx + (1-t)z \in K$. By induction, it follows that all convex combinations of several elements in the set are also in the set.

The *convex hull* $\text{conv}(T)$ of a set $T$ is the smallest convex set containing $T$, or equivalently the set of all convex combinations of elements from $T$.

### Cones
A set $K\subseteq \mathbb R^N$ is called a *cone* if for all $x \in K$ and all $t \geq 0$, $tx$ is contained in $K$. A cone that is convex is called a *convex cone*; $K$ is a convex cone iff for all $x, z \in K$ and $s, t \geq 0$, also $sx + tz \in K$.
- for example, the set of positive semidefinite matrices in $\mathbb R^{N \times N}$
- or the *positive orthant* $\mathbb R_+^N = \{x \in \mathbb R^N \mid x_i \geq 0 ~\forall i \in [N] \}$

##### Dual Cones
For a cone $K$, its dual cone $K^\ast$ is defined as
$$K^\ast = \{ z \in \mathbb R^N : \langle x, z \rangle \geq 0 ~~ \forall x \in K\}$$
- $K^\ast$ is closed and convex (as intersection of half-spaces), and furthermore again a cone.
- If $K$ is a closed convex cone, then $K^{\ast\ast} = K$.
- If $H, K$ are cones and $H \subseteq K$, then $K^\ast \subseteq H^\ast$

For example, the positive orthant is self-dual.
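A one-line check of this (not spelled out in the notes): testing the defining inequality against the standard basis vectors $e_i$,
$$z \in (\mathbb R_+^N)^\ast \iff \langle x, z \rangle \geq 0 ~~\forall x \in \mathbb R_+^N \iff z_i = \langle e_i, z \rangle \geq 0 ~~\forall i \iff z \in \mathbb R_+^N$$
(one direction tests $x = e_i \in \mathbb R_+^N$; the other holds since sums of products of nonnegative numbers are nonnegative).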

##### Polar Cones
Given a cone $K$, the polar cone is defined as
$$K^\circ = \{ z \in \mathbb R^N \mid \langle x, z \rangle \leq 0 ~~\forall x \in K \} = -K^\ast$$
##### Conic Hull
The *conic hull* $\text{cone}(T)$ of a set $T$ is the smallest convex cone containing $T$, or equivalently the set of all conic combinations of elements from $T$, i.e. $$\text{cone}(T) = \left\{ \sum_j t_j x_j : t_j \geq 0, x_j \in T \right\}$$
### Geometric Hahn-Banach Theorem
Hahn-Banach: "Non-overlapping convex sets can be separated by hyperplanes".

> **Theorem 14.4** (*Finite-dimensional Hahn-Banach Theorem*)
> Let $K_1, K_2 \subseteq \mathbb R^N$ be convex sets with disjoint interiors. Then there exist $w \in \mathbb R^N \setminus \{0\}$ and $\lambda \in \mathbb R$ such that
> $$K_1 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \leq \lambda \}, \quad K_2 \subseteq \{ x \in \mathbb R^N \mid \langle x, w \rangle \geq \lambda \}.$$
### Extreme Points
Let $K \subseteq \mathbb R^N$ be a convex set. A point $x \in K$ is called an *extreme point* of $K$ if the only way to represent $x$ as a convex combination $x = tw + (1-t)z$ with $w, z \in K$ and $t \in (0, 1)$ is $w = z = x$. Note: the extreme points are *not* the same as the boundary (e.g. for a convex polygon, only the corners are extreme points, not all points on the boundary).

> **Theorem 14.6**
> A compact convex set is the convex hull of its extreme points.
@@ -0,0 +1,108 @@
## 5.2 Convex Functions
### Extended-valued functions
To model inadmissible points: work with *extended-valued functions* $F: \mathbb R^N \to (-\infty, \infty]$. Define $F(x) = \infty$ at non-admissible points $x$: none of these points will ever be a minimizer.

This extension of a function $F: K \to \mathbb R$ to the domain of all of $\mathbb R^N$ is called *canonical extension*.

The domain of an extended-valued function $F$ is defined as $\text{dom}(F) = \{x \in \mathbb R^N : F(x) < \infty\}$. A function with non-empty domain is called *proper*.

### Convex functions
> **Definition 15.1** (*Convex function*)
> An extended-valued function $F$ is called
> - *convex* if for all $x, z$ and all $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z)$$
> - *strictly convex* if the above inequality holds strictly for all $x \neq z$ and $t \in (0, 1)$
> - *strongly convex* with parameter $\gamma > 0$ if for all $x, z$ and all $t \in [0, 1]$, $$F(tx + (1-t)z) \leq tF(x) + (1-t)F(z) - \tfrac{\gamma}{2} t (1-t) ||x-z||_2^2$$
> $F$ is called (strictly; strongly) concave if $-F$ is (strictly; strongly) convex.
> A function $F: K \to \mathbb R$ on a convex subset $K \subseteq \mathbb R^N$ is called convex if its canonical extension is convex.
Strongly convex functions are always strictly convex, and strictly convex functions are always convex. Convex functions always have convex domains.

$F$ is convex if and only if its *epigraph* $\text{epi}(F) = \{(x, r) \mid r \geq F(x)\}$ is a convex set.

##### Smooth convex functions
If $F$ is differentiable, its convexity can further be characterized as follows:

> **Proposition 15.2** (*Smooth convex functions*)
> Let $F: \mathbb R^N \to \mathbb R$ be differentiable.
> 1. $F$ is convex iff for all $x, y$: $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle,$$ where $\nabla F(y) = (\partial_{y_1}F(y),\dots,\partial_{y_N}F(y))^\top$
> 2. $F$ is strongly convex with parameter $\gamma > 0$ iff for all $x, y$: $$F(x) \geq F(y) + \langle \nabla F(y), x-y \rangle + \frac{\gamma}{2}||x-y||_2^2$$
> 3. If $F$ is twice differentiable, then $F$ is convex iff for all $x$, $$\nabla^2 F(x) \succcurlyeq 0$$ where $\nabla^2 F$ is the Hessian of $F$
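A quick numerical check of the first-order characterization (my own sketch) for the convex quadratic $F(x) = x^\top A x$ with $A \succcurlyeq 0$:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 5
B = rng.standard_normal((N, N))
A = B.T @ B  # positive semidefinite, so F(x) = x^T A x is convex

F = lambda x: x @ A @ x
grad_F = lambda y: 2 * A @ y  # gradient of x -> x^T A x

for _ in range(1000):
    x, y = rng.standard_normal(N), rng.standard_normal(N)
    # first-order characterization from Proposition 15.2 (1)
    assert F(x) >= F(y) + grad_F(y) @ (x - y) - 1e-9
print("first-order convexity inequality holds on all samples")
```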
##### Composition of Convex functions
> **Proposition 15.3** (*Compositions of convex functions*)
> 1. If $F, G$ are convex functions on $\mathbb R^N$, then for all $\alpha, \beta \geq 0$, $\alpha F + \beta G$ is convex.
> 2. Let $F$ be convex and non-decreasing, and let $G$ be convex. Then $H(x) = F(G(x))$ is convex.
##### Examples of convex functions
- norms $||\cdot||$ on $\mathbb R^N$ are always convex, by the triangle inequality and homogeneity
- the $\ell_p$-norms are strictly convex for $p \in (1, \infty)$, and *not* strictly convex for $p \in \{1, \infty\}$
- if $A \in \mathbb R^{N \times N}$ is positive semi-definite, the function $F(x) = x^\top Ax$ is convex. If $A$ is positive definite, $F$ is strictly convex.
- For a convex set $K$, the characteristic function $\chi_K(x) = 0$ if $x \in K$ and $\infty$ otherwise is convex (watch out, this is defined in a different way than usual!) ^d5bfc2

%% #Lecture 17, 11.07. [[Foundations Section 5.2 printout.pdf]] %%

##### Convexity and continuity
> **Proposition 15.4**
> Convex functions $F: \mathbb R^N \to \mathbb R$ (which are **not** extended-valued functions) are continuous.
For extended-valued functions, one uses *lower semi-continuity*:

> **Definition 15.5** (*Lower semicontinuity*)
> An [[#Extended-valued functions|extended-valued function]] $F$ is *lower semicontinuous* if for all $x$ and every sequence $x_j \to x$, it holds that $$\liminf_{j\to\infty} F(x_j) \geq F(x)$$
> A function is lower semicontinuous iff its epigraph is closed.
Examples:
- continuous functions are lower semicontinuous.
- $\chi_K$ ([[#^d5bfc2|definition]]) is not continuous, but it is lower semicontinuous iff $K$ is closed:
	- $K$ closed, $x \in K$: for a sequence $(x_j)_j$ converging to $x$ from inside $K$, $\liminf \chi_K(x_j) = 0 = \chi_K(x)$; from outside, $\liminf \chi_K(x_j) = \infty \geq 0 = \chi_K(x)$.
	- $K$ not closed, $x \notin K$ a limit point of $K$: choose a sequence $x_j \to x$ inside $K$; then $\liminf \chi_K(x_j) = 0 < \infty = \chi_K(x)$.

Lower semicontinuity is particularly useful in *infinite-dimensional* Hilbert spaces (e.g. $||\cdot||$ can be non-continuous wrt. the weak topology, but still lower semicontinuous).

### Minimizing convex functions
> **Proposition 15.6**
> Let $F$ be a convex extended-valued function. Then:
> 1. Every local minimum of $F$ is a global minimum.
> 2. The set of minima of $F$ is a convex set.
> 3. If $F$ is strictly convex, the minimum is unique.
^f3aef6

###### #Proof of [[#^f3aef6|Proposition 15.6]]
Statement 1:
Let $x$ be a local minimum and assume that $F(z) < F(x)$ for some $z$. Then for all $t \in (0, 1)$, $F(tx + (1-t)z) \leq t F(x) + (1-t) F(z) < F(x)$. As $t \to 1$, the point $tx + (1-t)z$ enters every neighborhood of $x$, so every neighborhood of $x$ contains a point with strictly smaller function value, contradicting local minimality.

Statement 2:
Let $x, y$ be two minima of $F$; by Statement 1 both are global, so $F(x) = F(y)$. Then for $t \in [0, 1]$, $F(tx + (1-t)y) \leq tF(x) + (1-t)F(y) = F(x)$. Hence every point on the segment between $x$ and $y$ is also a minimizer, i.e. the set of minima is convex.

Statement 3:
Suppose $x \neq y$ are both minima. Then by strict convexity, for $t \in (0, 1)$, $F(tx + (1-t)y) < t F(x) + (1-t) F(y) = F(x)$, i.e. every proper convex combination has a *strictly smaller* function value than the minimum, a contradiction.


### Jointly convex functions
A function $f(x, y)$ of *two arguments* $x \in \mathbb R^n$, $y \in \mathbb R^m$ is *jointly convex* if it is convex as a function of the combined variable $z = (x,y)$. This is mostly "syntactic sugar", since it is equivalent to convexity of the corresponding function $f: \mathbb R^{n+m} \to \mathbb R$; the point of the term is to distinguish joint convexity from the weaker notion of convexity in each argument separately (see the example below).

Example: $f(x, y) = xy$ is "marginally" (i.e. coordinate-wise) convex, but not jointly convex: along the direction $(1, -1)$ it is strictly concave, since $f(t, -t) = -t^2$. See [here](https://www.researchgate.net/figure/A-marginally-convex-function-is-not-necessarily-jointly-convex-The-function-f-x-y_fig2_339504312).

> **Theorem 15.7**
> Let $f$ be an extended-valued, *jointly convex* function. Then the function $g(x) = \inf_{y \in \mathbb R^m} f(x, y)$, $x \in \mathbb R^n$, is convex.
^e2d3b3

###### #Proof of [[#^e2d3b3|Theorem 15.7]]
Fix $x_1, x_2 \in \mathbb R^n$ and arbitrary $y_1, y_2 \in \mathbb R^m$. For $t \in [0, 1]$, by joint convexity, $$g(tx_1 + (1-t)x_2) \leq f(tx_1 + (1-t)x_2, t y_1 + (1-t)y_2)$$$$\leq t f(x_1, y_1) + (1-t) f(x_2, y_2)$$
Taking the infimum over $y_1$ and $y_2$ on the right-hand side yields $g(tx_1 + (1-t)x_2) \leq t g(x_1) + (1-t) g(x_2)$, which is the claim.

### *Maxima* of convex functions
We can also say something about *maxima* on *compact convex sets*:

> **Theorem 15.8**
> Let $K \subseteq \mathbb R^N$ be compact and convex, and $F: K \to \mathbb R$ be a convex function. Then $F$ attains its maximum at an *[[5.1 Convexity - Convex Sets#Extreme Points|extreme point]]* of $K$.
^86df37

###### #Proof of [[#^86df37|Theorem 15.8]]
Let $x \in K$ be a maximizer of $F$ in $K$. Since $K$ is the convex hull of its extreme points (Theorem 14.6), we can write $x = \sum_{j=1}^m t_j x_j$ for some $m$, some (*not necessarily all!*) extreme points $x_j$, and weights $t_j \geq 0$ with $\sum_j t_j = 1$. Then

$$F(x) \leq \sum_j t_j F(x_j) \leq \sum_j t_j F(x) = F(x)$$
so equality holds throughout, and therefore $F(x_j) = F(x)$ for every $x_j$ with $t_j > 0$; in particular, the maximum is attained at an extreme point.
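An illustration of the theorem (my own sketch): for a convex function on the unit square, no convex combination of the corners beats the best corner.

```python
import numpy as np

rng = np.random.default_rng(5)
corners = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1]])  # extreme points of K
F = lambda p: (p[0] - 0.3) ** 2 + abs(p[1] - 0.7)       # convex on K

# Sample K as convex combinations of the corners (Dirichlet weights)
W = rng.dirichlet(np.ones(4), size=10_000)
inner_max = max(F(p) for p in W @ corners)
corner_max = max(F(c) for c in corners)
print(inner_max <= corner_max)  # True: the maximum sits at a corner
```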
@@ -0,0 +1,31 @@
## 5.3 The Convex Conjugate

^0a5cd2

> **Definition 16.1** (*Convex conjugate*).
> Let $F$ be an extended-valued function. Then its *convex conjugate* or *Fenchel dual* $F^\ast: \mathbb R^N \to (-\infty, \infty]$ is defined by
> $$F^\ast(y) = \sup_{x \in \mathbb R^N} \left(\langle x, y\rangle - F(x)\right)$$
- $F^\ast$ is always a convex function, regardless of whether $F$ is convex (it is a supremum of affine functions of $y$)
- We directly get the *Fenchel-Young inequality* $$\langle x, y \rangle \leq F(x) + F^\ast(y)$$ ^d415ba

> **Proposition 16.2** (*Properties of the convex conjugate*)
> Let $F: \mathbb R^N \to (-\infty, \infty]$
> 1. $F^\ast$ is lower semicontinuous.
> 2. The *biconjugate* $F^{\ast\ast}$ is the *largest lower semicontinuous convex function* satisfying $F^{\ast\ast}(x) \leq F(x)$ for all $x$. **In particular, if $F$ is convex and lower semicontinuous, $F^{\ast\ast} = F$**. ^e536c6
> 3. For $\tau \neq 0$ define $F_\tau(x) := F(\tau x)$. Then $(F_\tau)^\ast(y) = F^\ast(\tfrac{y}{\tau})$.
> 4. If $\tau > 0$, $(\tau F)^\ast(y) = \tau F^\ast(\tfrac{y}{\tau})$.
> 5. For $z \in \mathbb R^N$ let $F^{(z)}(x) := F(x-z)$. Then $(F^{(z)})^\ast(y) = \langle z, y \rangle + F^\ast(y)$.
^ec92f9


Because of property [[#^e536c6|2.]], the biconjugate $F^{\ast\ast}$ is sometimes called the *convex relaxation* of $F$.

##### Example: computing the convex conjugate
Consider $F(x) = ||x||_2^2/2$. Then $F^\ast = F$, i.e. $F$ is self-conjugate:
$$F^\ast(y) = \sup_x \left(\langle x, y \rangle - \tfrac{1}{2}||x||_2^2\right)$$
The objective is concave in $x$ with gradient $y - x$, which vanishes at $x = y$, so $$F^\ast(y) = \langle y, y \rangle - \tfrac{1}{2}||y||_2^2 = \tfrac{1}{2}||y||_2^2 = F(y)$$
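The same conjugate can be checked numerically; a brute-force one-dimensional sketch (mine):

```python
import numpy as np

# Grid approximation of F*(y) = sup_x (x*y - F(x)) for F(x) = x^2/2
x = np.linspace(-10.0, 10.0, 100_001)
F = 0.5 * x**2

def conjugate(y):
    # supremum over the grid; accurate as long as the maximizer x = y
    # lies well inside the grid range
    return np.max(x * y - F)

for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    print(y, conjugate(y), 0.5 * y**2)  # the last two columns should agree
```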

##### Example: conjugate of $\exp$
Let $F(x) = \exp(x)$. The map $x \mapsto xy - \exp(x)$ has a maximum at $x = \ln(y)$ if $y > 0$, so
$$F^\ast(y) = \begin{cases} y \ln y - y, & y > 0 \\ 0, & y = 0 \\ \infty & y < 0\end{cases}$$
This gives rise to *Young's inequality*: $$xy \leq e^x + y \ln(y) - y ~\qquad \forall y>0, x$$