Convex Optimization Reading Notes (5)

Chapter 6: Approximation and fitting

6.1 Norm approximation

6.1.1 Basic norm approximation problem

The simplest norm approximation problem is an unconstrained problem of the form
$$\mathrm{minimize} \ \ \ \ \|Ax-b\|$$
where $A\in \mathbf{R}^{m\times n}$, $b\in \mathbf{R}^m$, $x\in \mathbf{R}^n$, and $\|\cdot\|$ is a norm on $\mathbf{R}^m$. A solution of the norm approximation problem is sometimes called an approximate solution of $Ax \approx b$, in the norm $\|\cdot\|$. The vector
$$r=Ax-b$$
is called the residual for the problem; its components are sometimes called the individual residuals associated with $x$.
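
As a concrete illustration (not from the book), this problem can be solved numerically with a modeling tool such as CVXPY; the data `A`, `b` below are random placeholders, and changing the norm argument switches between least-squares, sum-of-absolute-residuals, and Chebyshev approximation:

```python
import cvxpy as cp
import numpy as np

# Random placeholder data, just to make the sketch runnable
m, n = 30, 10
rng = np.random.default_rng(0)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

x = cp.Variable(n)
# p = 2: least squares; p = 1: sum of absolute residuals; p = "inf": Chebyshev
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2)))
prob.solve()
print("optimal residual norm:", prob.value)
```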

6.1.2 Penalty function approximation

In $l_p$-norm approximation, for $1 \leq p < \infty$, the objective is
$$(|r_1|^p+\cdots+|r_m|^p)^{1/p}.$$
As in least-squares problems, we can consider the equivalent problem with objective
$$|r_1|^p+\cdots+|r_m|^p.$$
The penalty function approximation problem has the form
$$\begin{aligned} \mathrm{minimize} \ \ \ \ & \phi(r_1)+\cdots+\phi(r_m)\\ \mathrm{subject\ to} \ \ \ \ & r=Ax-b \end{aligned}$$
where $\phi : \mathbf{R} \to \mathbf{R}$ is called the (residual) penalty function. We assume that $\phi$ is convex, so the penalty function approximation problem is a convex optimization problem.

deadzone-linear

$$\phi(u)=\begin{cases} 0 & |u|\leq a \\ |u|-a & |u| > a \end{cases}$$

log barrier

$$\phi(u)=\begin{cases} -a^2\log\left(1-(u/a)^2\right) & |u| < a \\ \infty & |u| \geq a \end{cases}$$

The log barrier penalty (shown here with limit $a = 1$) puts weight very much like the $l_2$-norm penalty on small residuals, but puts very strong weight on residuals larger than around 0.8, and infinite weight on residuals larger than 1.
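
A minimal CVXPY sketch (my own illustration, not the book's code) of penalty approximation with the deadzone-linear penalty above; `A`, `b`, and the width `a` are placeholder values:

```python
import cvxpy as cp
import numpy as np

m, n, a = 30, 10, 0.5          # a is the deadzone half-width (placeholder)
rng = np.random.default_rng(0)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

x = cp.Variable(n)
r = A @ x - b
# Deadzone-linear penalty: phi(u) = max(|u| - a, 0), summed over all residuals
objective = cp.sum(cp.pos(cp.abs(r) - a))
cp.Problem(cp.Minimize(objective)).solve()
print("deadzone-linear objective:", objective.value)
```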

6.1.3 Approximation with constraints

It is possible to add constraints to the basic norm approximation problem. When these constraints are convex, the resulting problem is convex.

6.2 Least-norm problems

The basic least-norm problem has the form
$$\begin{aligned} \mathrm{minimize} \ \ \ \ & \|x\| \\ \mathrm{subject\ to} \ \ \ \ & Ax=b \end{aligned}$$
where $A\in \mathbf{R}^{m\times n}$, $b\in \mathbf{R}^m$, $x\in \mathbf{R}^n$, and $\|\cdot\|$ is a norm on $\mathbf{R}^n$.
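
When the norm is the Euclidean norm and $A$ has full row rank, the least-norm problem has the closed-form solution $x = A^T(AA^T)^{-1}b$; a quick NumPy check with placeholder data (this sketch assumes the Euclidean case, other norms require a solver):

```python
import numpy as np

m, n = 5, 10                       # fewer equations than unknowns
rng = np.random.default_rng(0)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# Minimum Euclidean-norm solution of Ax = b (A assumed full row rank)
x = A.T @ np.linalg.solve(A @ A.T, b)
print(np.allclose(A @ x, b))       # feasibility check
print(np.linalg.norm(x))           # the minimum 2-norm
```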

6.3 Regularized approximation

6.3.1 Bi-criterion formulation

In the basic form of regularized approximation, the goal is to find a vector $x$ that is small (if possible), and also makes the residual $Ax-b$ small. This is naturally described as a (convex) vector optimization problem with two objectives, $\|Ax-b\|$ and $\|x\|$:
$$\mathrm{minimize\ (w.r.t.\ } \mathbf{R}^2_+) \ \ \ \ (\|Ax-b\|,\|x\|)$$
The two norms can be different: the first, used to measure the size of the residual, is a norm on $\mathbf{R}^m$; the second, used to measure the size of $x$, is a norm on $\mathbf{R}^n$.

6.3.2 Regularization

Regularization is a common scalarization method used to solve the bi-criterion problem. One form of regularization is to minimize the weighted sum of the objectives:
$$\mathrm{minimize} \ \ \ \ \|Ax-b\|+\gamma\|x\|$$
where $\gamma > 0$ is a problem parameter. As $\gamma$ varies over $(0, \infty)$, the solution traces out the optimal trade-off curve.

Another common method of regularization, especially when the Euclidean norm is used, is to minimize the weighted sum of squared norms:
$$\mathrm{minimize} \ \ \ \ \|Ax-b\|^2+\delta\|x\|^2$$
for a variety of values of $\delta > 0$.
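
For the Euclidean norm this is Tikhonov regularization, with closed-form solution $x = (A^TA + \delta I)^{-1}A^Tb$; a small NumPy sketch with placeholder data, sweeping $\delta$ to sample the trade-off between the two objectives:

```python
import numpy as np

m, n = 30, 10
rng = np.random.default_rng(0)
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

for delta in [0.01, 0.1, 1.0, 10.0]:
    # Tikhonov-regularized solution: x = (A^T A + delta I)^{-1} A^T b
    x = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)
    print(delta, np.linalg.norm(A @ x - b), np.linalg.norm(x))
```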

6.3.3 Reconstruction, smoothing, and de-noising

Quadratic smoothing

The simplest reconstruction method uses the quadratic smoothing function
$$\phi_{\rm quad}(\hat{x})=\sum_{i=1}^{n-1}(\hat{x}_{i+1}-\hat{x}_i)^2$$
We can obtain the optimal trade-off between $\|\hat{x}-x_{\rm cor}\|_2^2$ and $\|D\hat{x}\|_2^2$ by minimizing
$$\|\hat{x}-x_{\rm cor}\|_2^2+\delta\|D\hat{x}\|_2^2$$
where $D\in \mathbf{R}^{(n-1)\times n}$ is the bidiagonal difference matrix with $D_{i,i}=-1$ and $D_{i,i+1}=1$.
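
A minimal NumPy sketch (with a synthetic placeholder signal): build the difference matrix $D$ and solve the smoothing problem in closed form by setting the gradient to zero, which gives $(I + \delta D^TD)\hat{x} = x_{\rm cor}$:

```python
import numpy as np

n, delta = 200, 10.0
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, n)
x_cor = np.sin(t) + 0.2 * rng.standard_normal(n)     # noisy placeholder signal

# Bidiagonal difference matrix D in R^{(n-1) x n}
D = np.zeros((n - 1, n))
D[np.arange(n - 1), np.arange(n - 1)] = -1.0
D[np.arange(n - 1), np.arange(1, n)] = 1.0

# Minimizing ||xhat - x_cor||^2 + delta ||D xhat||^2 gives (I + delta D^T D) xhat = x_cor
xhat = np.linalg.solve(np.eye(n) + delta * D.T @ D, x_cor)
```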

But any rapid variations in the original signal will, obviously, be attenuated or removed by quadratic smoothing.

Total variation reconstruction

The method is based on the smoothing function
$$\phi_{\rm tv}(\hat{x})=\sum_{i=1}^{n-1}|\hat{x}_{i+1}-\hat{x}_i|$$
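
Because the $l_1$ smoothing term is not quadratic there is no closed form, but the problem is still convex; a hedged CVXPY sketch of total variation reconstruction on a synthetic piecewise-constant placeholder signal:

```python
import cvxpy as cp
import numpy as np

n, delta = 200, 2.0
rng = np.random.default_rng(0)
x_cor = np.sign(np.sin(np.linspace(0, 3 * np.pi, n))) + 0.1 * rng.standard_normal(n)

xhat = cp.Variable(n)
tv = cp.sum(cp.abs(cp.diff(xhat)))    # total variation: sum_i |xhat_{i+1} - xhat_i|
cp.Problem(cp.Minimize(cp.sum_squares(xhat - x_cor) + delta * tv)).solve()
```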

6.4 Robust approximation

6.4.1 Stochastic robust approximation

We assume that $A$ is a random variable taking values in $\mathbf{R}^{m\times n}$ with mean $\bar{A}$, so we can describe $A$ as
$$A=\bar{A}+U$$
where $U$ is a random matrix with zero mean. Then the stochastic robust least-squares problem is
$$\mathrm{minimize} \ \ \ \ \mathbb{E}\big[\|Ax-b\|_2^2\big]$$
This is equivalent to the problem
$$\mathrm{minimize} \ \ \ \ \|\bar{A}x-b\|_2^2+\|P^{1/2}x\|_2^2$$
where $P=\mathbb{E}[U^TU]$. The solution is
$$x=(\bar{A}^T\bar{A}+P)^{-1}\bar{A}^Tb$$
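
A quick NumPy illustration of this closed-form solution under an assumed noise model (not specified in the notes): if $U$ has i.i.d. zero-mean entries with variance $\sigma^2$, then $P = \mathbb{E}[U^TU] = m\sigma^2 I$:

```python
import numpy as np

m, n, sigma = 30, 10, 0.3
rng = np.random.default_rng(0)
A_bar, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# Assumed noise model: U has i.i.d. zero-mean entries with variance sigma^2,
# so P = E[U^T U] = m * sigma^2 * I
P = m * sigma**2 * np.eye(n)

# Stochastic robust least-squares solution: x = (A_bar^T A_bar + P)^{-1} A_bar^T b
x = np.linalg.solve(A_bar.T @ A_bar + P, A_bar.T @ b)
```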

6.4.2 Worst-case robust approximation

The (worst-case) robust approximation problem is to minimize the worst-case error:
$$\mathrm{minimize} \ \ \ \ e_{\rm wc}(x)=\sup\{ \|Ax-b\| \mid A\in \mathcal{A} \}$$
where $\mathcal{A}$ is the set of possible values of $A$. The robust approximation problem is always a convex optimization problem, but its tractability depends on the norm used and on the description of the uncertainty set $\mathcal{A}$.
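
One tractable case is a finite uncertainty set $\mathcal{A} = \{A_1, \ldots, A_k\}$: the worst-case error is then a pointwise maximum of norms, and the problem can be written directly in CVXPY (a sketch with random placeholder matrices):

```python
import cvxpy as cp
import numpy as np

m, n, k = 20, 8, 5
rng = np.random.default_rng(0)
b = rng.standard_normal(m)
A_list = [rng.standard_normal((m, n)) for _ in range(k)]   # finite uncertainty set

x = cp.Variable(n)
# Worst-case error over the finite set: max_i ||A_i x - b||_2
e_wc = cp.max(cp.hstack([cp.norm(Ai @ x - b, 2) for Ai in A_list]))
cp.Problem(cp.Minimize(e_wc)).solve()
```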

6.5 Function fitting and interpolation

6.5.1 Function families

We consider a family of functions $f_1, \cdots , f_n : \mathbf{R}^k \to \mathbf{R}$, with common domain $\mathbf{dom}\ f_i = D$. With each $x \in \mathbf{R}^n$ we associate the function $f : \mathbf{R}^k \to \mathbf{R}$ given by
$$f(u)=x_1f_1(u)+\cdots+x_nf_n(u)$$
The family $\{f_1, \cdots , f_n\}$ is sometimes called the set of basis functions (for the fitting problem) even when the functions are not independent. The vector $x \in \mathbf{R}^n$, which parametrizes the subspace of functions, is our optimization variable, and is sometimes called the coefficient vector.

6.5.2 Constraints

Interpolation and inequalities

Interpolation conditions
$$f(v_j)=z_j, \ \ j=1,\ldots,m,$$
which require the function $f$ to have the values $z_j \in \mathbf{R}$ at specified points $v_j \in D$, form a set of linear equalities in $x$.

Derivative constraints

The gradient
$$\nabla f(v)=\sum_{i=1}^{n}x_i\nabla f_i(v)$$
is a linear function of $x$, so interpolation conditions on the derivative of $f$ at $v$ reduce to linear equality constraints on $x$. A bound on the gradient norm,
$$\|\nabla f(v)\|=\Big\|\sum_{i=1}^{n}x_i\nabla f_i(v)\Big\|\leq M,$$
is a convex constraint on $x$.
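
In terms of the coefficient vector: if the columns of a matrix $G \in \mathbf{R}^{k\times n}$ hold the gradients $\nabla f_i(v)$, the bound reads $\|Gx\| \leq M$. A small CVXPY sketch; `G`, `M`, and the fitting data here are hypothetical placeholders:

```python
import cvxpy as cp
import numpy as np

n, k, m, M = 6, 2, 40, 1.5
rng = np.random.default_rng(0)
Phi = rng.standard_normal((m, n))   # Phi[j, i] = f_i(u_j): basis values at the data points
y = rng.standard_normal(m)          # data to fit
G = rng.standard_normal((k, n))     # column i is grad f_i(v) at a chosen point v

x = cp.Variable(n)
constraints = [cp.norm(G @ x, 2) <= M]                      # ||grad f(v)|| <= M
cp.Problem(cp.Minimize(cp.norm(Phi @ x - y, 2)), constraints).solve()
```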

Integral constraints

Any linear functional $\mathcal{L}$ on the subspace of functions can be expressed as a linear function of $x$:
$$\mathcal{L}(f)=c^Tx$$
where, for the integral functional $\mathcal{L}(f)=\int_{D}\phi(u)f(u)\,du$, the coefficients are $c_i=\int_{D}\phi(u)f_i(u)\,du$.

6.5.3 Fitting and interpolation problems

6.5.4 Sparse descriptions and basis pursuit

In basis pursuit, there is a very large number of basis functions, and the goal is to find a good fit of the given data as a linear combination of a small number of the basis functions.

Thus we seek a function $f \in \mathcal{F}$ that fits the data well,
$$f(u_i)\approx y_i,\ \ i=1,\ldots,m,$$
with a sparse coefficient vector $x$, i.e., $\mathbf{card}(x)$ small, so that
$$f=\sum_{i \in \mathcal{B}}x_if_i,$$
where $\mathcal{B}$ is the (small) set of indices of the nonzero coefficients.
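
Since $\mathbf{card}(x)$ is not convex, a common heuristic is to replace it with the $l_1$ norm ($l_1$-regularized least squares); a hedged CVXPY sketch, where `Phi[j, i] = f_i(u_j)` is a random placeholder basis-value matrix:

```python
import cvxpy as cp
import numpy as np

m, n, gamma = 50, 200, 0.1            # many more basis functions than data points
rng = np.random.default_rng(0)
Phi = rng.standard_normal((m, n))     # Phi[j, i] = f_i(u_j)
y = rng.standard_normal(m)

x = cp.Variable(n)
# l1 heuristic for sparsity: trade off the fit against ||x||_1
cp.Problem(cp.Minimize(cp.sum_squares(Phi @ x - y) + gamma * cp.norm1(x))).solve()
print("nonzero coefficients (above tolerance):", int(np.sum(np.abs(x.value) > 1e-4)))
```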

6.5.5 Interpolation with convex functions

There exists a convex function $f:\mathbf{R}^k \to \mathbf{R}$, with $\mathbf{dom}\ f= \mathbf{R}^k$, that satisfies the interpolation conditions
$$f(u_i)=y_i, \ \ i=1,\ldots,m,$$
if and only if there exist $g_1, \ldots, g_m$ such that
$$y_j\geq y_i+g_i^T(u_j-u_i), \ \ i,j=1,\ldots,m.$$
For example, one such function is $f(z)=\max_{i=1,\ldots,m}\big(y_i+g_i^T(z-u_i)\big).$
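
The condition is a set of linear inequalities in the unknowns $g_i$, so checking it is a feasibility problem; a minimal CVXPY sketch with made-up placeholder data points:

```python
import cvxpy as cp
import numpy as np

# Placeholder data: points u_i in R^2 with candidate values y_i
U = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 2.5])
m, k = U.shape

G = cp.Variable((m, k))               # row i is the candidate subgradient g_i
constraints = [
    y[i] + cp.sum(cp.multiply(G[i], U[j] - U[i])) <= y[j]
    for i in range(m) for j in range(m) if i != j
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("convex interpolant exists:", prob.status == cp.OPTIMAL)
```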