Convex Optimization Reading Notes (5)
Chapter 6: Approximation and fitting
6.1 Norm approximation
6.1.1 Basic norm approximation problem
The simplest norm approximation problem is an unconstrained problem of the form
$$ {\rm minimize} \ \ \ \ ||Ax-b|| $$

where $A\in \mathbf{R}^{m\times n}$, $b\in \mathbf{R}^m$, $x\in \mathbf{R}^n$, and $||\cdot||$ is a norm on $\mathbf{R}^m$. A solution of the norm approximation problem is sometimes called an approximate solution of $Ax \approx b$, in the norm $||\cdot||$. The vector $r = Ax - b$ is called the residual for the problem; its components are sometimes called the individual residuals associated with $x$.
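As a concrete illustration (not part of the original notes), the norm approximation problem is easy to set up in CVXPY; the matrix $A$ and vector $b$ below are random placeholders, and the loop solves the problem for the $l_1$, $l_2$, and $l_\infty$ norms.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)     # placeholder data matrix A in R^{m x n}
b = np.random.randn(m)        # placeholder vector b in R^m

x = cp.Variable(n)
residual = A @ x - b          # r = Ax - b

for p in [1, 2, "inf"]:       # l1-, l2-, and l-infinity-norm approximation
    prob = cp.Problem(cp.Minimize(cp.norm(residual, p)))
    prob.solve()
    print(f"p = {p}: optimal value {prob.value:.4f}")
```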
6.1.2 Penalty function approximation
In $l_p$-norm approximation, for $1 \leq p < \infty$, the objective is

$$ (|r_1|^p+\cdots+|r_m|^p)^{\frac{1}{p}} $$

As in least-squares problems, we can consider the equivalent problem with objective

$$ |r_1|^p+\cdots+|r_m|^p $$

The penalty function approximation problem has the form
$$ \begin{aligned} {\rm minimize} \ \ \ \ & \phi(r_1)+\cdots+\phi(r_m)\\ {\rm subject \ to} \ \ \ \ & r=Ax-b \end{aligned} $$

where $\phi : \mathbf{R} \to \mathbf{R}$ is called the (residual) penalty function. We assume that $\phi$ is convex, so the penalty function approximation problem is a convex optimization problem.
deadzone-linear
$$ \phi(u)=\left\{ \begin{array}{ll} 0 & |u|\leq a \\ |u|-a & |u| > a \end{array} \right. $$
log barrier
$$ \phi(u)=\left\{ \begin{array}{ll} -a^2\log(1-(u/a)^2) & |u| < a \\ \infty & |u| \geq a \end{array} \right. $$
The log barrier penalty (with limit $a=1$) behaves very much like the $l_2$-norm penalty for small residuals, but puts very strong weight on residuals larger than around 0.8, and infinite weight on residuals larger than 1.
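A sketch of penalty function approximation with the deadzone-linear penalty, assuming CVXPY and random placeholder data; the deadzone width $a$ is a made-up value. The penalty is written as $\max\{|u|-a,\,0\}$, which is convex.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 30, 10
a = 0.5                                 # made-up deadzone width
A = np.random.randn(m, n)               # placeholder data
b = np.random.randn(m)

x = cp.Variable(n)
r = A @ x - b
# sum_i phi(r_i) with phi(u) = max(|u| - a, 0): residuals with |r_i| <= a
# are free, larger residuals are penalized linearly.
deadzone_obj = cp.sum(cp.pos(cp.abs(r) - a))
cp.Problem(cp.Minimize(deadzone_obj)).solve()
print(x.value)
```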
6.1.3 Approximation with constraints
It is possible to add constraints to the basic norm approximation problem. When these constraints are convex, the resulting problem is convex.
6.2 Least-norm problems
The basic least-norm problem has the form
$$ \begin{aligned} {\rm minimize} \ \ \ \ & ||x|| \\ {\rm subject \ to} \ \ \ \ & Ax=b \end{aligned} $$

where $A\in \mathbf{R}^{m\times n}$, $b\in \mathbf{R}^m$, $x\in \mathbf{R}^n$, and $||\cdot||$ is a norm on $\mathbf{R}^n$.
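For the Euclidean norm, the least-norm problem has the closed-form solution $x = A^T(AA^T)^{-1}b$ when $A$ has full row rank; the NumPy sketch below (with placeholder data) computes it.

```python
import numpy as np

np.random.seed(0)
m, n = 5, 10                           # underdetermined system: m < n
A = np.random.randn(m, n)              # placeholder data, full row rank w.h.p.
b = np.random.randn(m)

# minimum Euclidean norm solution of Ax = b
x_ln = A.T @ np.linalg.solve(A @ A.T, b)
# np.linalg.pinv(A) @ b returns the same vector
print(np.allclose(A @ x_ln, b), np.linalg.norm(x_ln))
```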
6.3 Regularized approximation
6.3.1 Bi-criterion formulation
In the basic form of regularized approximation, the goal is to find a vector $x$ that is small (if possible), and also makes the residual $Ax-b$ small. This is naturally described as a (convex) vector optimization problem with two objectives, $||Ax-b||$ and $||x||$:

$$ {\rm minimize \ (w.r.t. \ } \mathbf{R}^2_+) \ \ \ \ (||Ax-b||,||x||) $$

The two norms can be different: the first, used to measure the size of the residual, is on $\mathbf{R}^m$; the second, used to measure the size of $x$, is on $\mathbf{R}^n$.
6.3.2 Regularization
Regularization is a common scalarization method used to solve the bi-criterion problem. One form of regularization is to minimize the weighted sum of the objectives:
$$ {\rm minimize} \ \ \ \ ||Ax-b||+\gamma||x|| $$

where $\gamma > 0$ is a problem parameter. As $\gamma$ varies over $(0, \infty)$, the solution traces out the optimal trade-off curve.
Another common method of regularization, especially when the Euclidean norm is used, is to minimize the weighted sum of squared norms:
$$ {\rm minimize} \ \ \ \ ||Ax-b||^2+\delta||x||^2 $$

for a variety of values of $\delta > 0$.
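For the Euclidean norm, this sum-of-squares regularization (Tikhonov regularization) has the closed-form solution $x = (A^TA + \delta I)^{-1}A^Tb$. A minimal NumPy sketch with placeholder data, sweeping $\delta$ to sample the trade-off between residual norm and solution norm:

```python
import numpy as np

np.random.seed(0)
m, n = 30, 10
A = np.random.randn(m, n)              # placeholder data
b = np.random.randn(m)

for delta in [0.01, 0.1, 1.0, 10.0]:   # made-up grid of weights
    x = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)
    # (residual norm, solution norm) pairs sample the trade-off curve
    print(delta, np.linalg.norm(A @ x - b), np.linalg.norm(x))
```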
6.3.3 Reconstruction, smoothing, and de-noising
Quadratic smoothing
The simplest reconstruction method uses the quadratic smoothing function
$$ \phi_{\rm quad}(x)=\sum_{i=1}^{n-1}(x_{i+1}-x_i)^2 $$
We can obtain the optimal trade-off between $||\hat{x}-x_{\rm cor}||_2^2$ and $||D\hat{x}||_2^2$ by minimizing

$$ ||\hat{x}-x_{\rm cor}||^2_2+\delta||D\hat{x}||_2^2 $$

where $D\in \mathbf{R}^{(n-1)\times n}$ is the bidiagonal first-difference matrix, so that $D\hat{x}=(\hat{x}_2-\hat{x}_1,\ldots,\hat{x}_n-\hat{x}_{n-1})$.
But any rapid variations in the original signal will, obviously, be attenuated or removed by quadratic smoothing.
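Setting the gradient of the quadratic-smoothing objective to zero gives the closed form $\hat{x} = (I + \delta D^TD)^{-1}x_{\rm cor}$. The NumPy sketch below (synthetic noisy signal, made-up $\delta$) builds the first-difference matrix $D$ explicitly and solves this linear system.

```python
import numpy as np

np.random.seed(0)
n, delta = 200, 10.0                       # made-up size and weight
t = np.linspace(0, 4 * np.pi, n)
x_cor = np.sin(t) + 0.2 * np.random.randn(n)   # synthetic corrupted signal

# bidiagonal first-difference matrix D in R^{(n-1) x n}
D = np.zeros((n - 1, n))
D[np.arange(n - 1), np.arange(n - 1)] = -1.0
D[np.arange(n - 1), np.arange(1, n)] = 1.0

# minimizer of ||xhat - x_cor||_2^2 + delta * ||D xhat||_2^2
x_hat = np.linalg.solve(np.eye(n) + delta * D.T @ D, x_cor)
print(np.linalg.norm(x_hat - x_cor), np.linalg.norm(D @ x_hat))
```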
Total variation reconstruction
The method is based on the smoothing function
$$ \phi_{\rm tv}(x)=\sum_{i=1}^{n-1}|x_{i+1}-x_i| $$
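Because $\phi_{\rm tv}$ is an $l_1$-type penalty, total variation reconstruction is naturally written as a convex problem. A hedged CVXPY sketch on a synthetic step signal (the data and the weight $\delta$ are placeholders):

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
n, delta = 200, 2.0                                   # made-up size and weight
x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])  # step signal
x_cor = x_true + 0.2 * np.random.randn(n)             # synthetic noisy signal

x_hat = cp.Variable(n)
phi_tv = cp.sum(cp.abs(x_hat[1:] - x_hat[:-1]))       # total variation of xhat
prob = cp.Problem(cp.Minimize(cp.sum_squares(x_hat - x_cor) + delta * phi_tv))
prob.solve()
# the reconstruction keeps the jump instead of smearing it out
print(x_hat.value[:3], x_hat.value[-3:])
```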
6.4 Robust approximation
6.4.1 Stochastic robust approximation
We assume that $A$ is a random variable taking values in $\mathbf{R}^{m\times n}$ with mean $\bar{A}$, so we can describe $A$ as

$$ A=\bar{A}+U $$

where $U$ is a random matrix with zero mean. Then the problem is
$$ {\rm minimize} \ \ \ \ \mathbb{E}[||Ax-b||_2^2] $$
We can express the objective as $\mathbb{E}[||Ax-b||_2^2]=||\bar{A}x-b||_2^2+x^TPx$ with $P=\mathbb{E}[U^TU]$, so the problem is equivalent to

$$ {\rm minimize} \ \ \ \ ||\bar{A}x-b||_2^2+||P^{\frac{1}{2}}x||_2^2 $$

The solution is

$$ x=(\bar{A}^T\bar{A}+P)^{-1}\bar{A}^Tb $$
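A small NumPy sketch of this formula, under the assumption that $U$ has i.i.d. zero-mean entries with standard deviation $\sigma$, in which case $P = \mathbb{E}[U^TU] = m\sigma^2 I$; all data are synthetic.

```python
import numpy as np

np.random.seed(0)
m, n, sigma = 30, 10, 0.1                 # made-up sizes and noise level
Abar = np.random.randn(m, n)              # mean of the random matrix A
b = np.random.randn(m)
P = sigma**2 * m * np.eye(n)              # E[U^T U] for i.i.d. N(0, sigma^2) entries

x_stoch = np.linalg.solve(Abar.T @ Abar + P, Abar.T @ b)
x_nom = np.linalg.solve(Abar.T @ Abar, Abar.T @ b)   # nominal solution, ignores U
print(np.linalg.norm(x_stoch), np.linalg.norm(x_nom))
```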
6.4.2 Worst-case robust approximation
The (worst-case) robust approximation problem is to minimize the worst-case error:
$$ {\rm minimize} \ \ \ \ e_{\rm wc}(x)=\sup\{ ||Ax-b|| \mid A\in \mathcal{A} \} $$

where $\mathcal{A}$ is the set of possible values of $A$. The robust approximation problem is always a convex optimization problem, but its tractability depends on the norm used and the description of the uncertainty set $\mathcal{A}$.
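For a finite uncertainty set $\mathcal{A} = \{A_1, \ldots, A_K\}$, the worst-case problem can be written in epigraph form and solved directly; a CVXPY sketch with synthetic placeholder data:

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n, K = 30, 10, 5                          # made-up sizes
A_nom = np.random.randn(m, n)
As = [A_nom + 0.1 * np.random.randn(m, n) for _ in range(K)]   # possible A's
b = np.random.randn(m)

x = cp.Variable(n)
t = cp.Variable()
# e_wc(x) <= t  <=>  ||A_k x - b||_2 <= t for every A_k in the set
constraints = [cp.norm(Ak @ x - b, 2) <= t for Ak in As]
cp.Problem(cp.Minimize(t), constraints).solve()
print(t.value)
```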
6.5 Function fitting and interpolation
6.5.1 Function families
We consider a family of functions
$f_1, \cdots , f_n : \mathbf{R}^k \to \mathbf{R}$, with common domain $\mathbf{dom} \ f_i = D$. With each $x \in \mathbf{R}^n$ we associate the function $f : \mathbf{R}^k \to \mathbf{R}$ given by

$$ f(u)=x_1f_1(u)+\cdots+x_nf_n(u) $$

The family $\{f_1, \cdots , f_n\}$ is sometimes called the set of basis functions (for the fitting problem) even when the functions are not independent. The vector $x \in \mathbf{R}^n$, which parametrizes the subspace of functions, is our optimization variable, and is sometimes called the coefficient vector.
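An illustrative sketch, assuming the polynomial basis $f_i(u) = u^{i-1}$ on $\mathbf{R}$ and synthetic data: the coefficient vector $x$ parametrizes $f$, and a least-squares fit of the data reduces to a norm approximation problem in $x$.

```python
import numpy as np

np.random.seed(0)
n, m = 4, 25                                    # basis functions / data points
u = np.linspace(-1, 1, m)
y = np.sin(3 * u) + 0.05 * np.random.randn(m)   # synthetic data to fit

F = np.vander(u, n, increasing=True)        # F[:, k] holds u**k at the data points
x, *_ = np.linalg.lstsq(F, y, rcond=None)   # minimize ||F x - y||_2
print(x)                                    # coefficient vector of the fitted f
```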
6.5.2 Constraints
Interpolation and inequalities
Interpolation conditions
$$ f(v_j)=z_j, \ \ j=1,\ldots,m $$

which require the function $f$ to have the values $z_j \in \mathbf{R}$ at specified points $v_j \in D$, form a set of linear equalities in $x$.
Derivative constraints
The gradient

$$ \nabla f(v)=\sum_{i=1}^{n}x_i\nabla f_i(v) $$

is a linear function of $x$, so interpolation conditions on the derivative of $f$ at $v$ reduce to linear equality constraints on $x$. Likewise, the norm bound

$$ ||\nabla f(v)||=\left|\left|\sum_{i=1}^{n}x_i\nabla f_i(v)\right|\right|\leq M $$

is a convex constraint on $x$.
Integral constraints
Any linear functional $\mathcal{L}$ on the subspace of functions can be expressed as a linear function of $x$:

$$ \mathcal{L}(f)=c^Tx $$

where $c_i=\int_{D}\phi(u)f_i(u)\,du$ for a given weight function $\phi$.
6.5.3 Fitting and interpolation problems
6.5.4 Sparse descriptions and basis pursuit
In basis pursuit, there is a very large number of basis functions, and the goal is to find a good fit of the given data as a linear combination of a small number of the basis functions.
Thus we seek a function
$f \in \mathcal{F}$ that fits the data well,

$$ f(u_i)\approx y_i,\ \ i=1,\ldots,m $$

with a sparse coefficient vector $x$, i.e., $\mathbf{card}(x)$ small:

$$ f=\sum_{i \in \mathcal{B}}x_if_i $$

where $\mathcal{B}$ is the (small) set of indices of the nonzero coefficients.
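A common heuristic for making the coefficient vector sparse is to replace $\mathbf{card}(x)$ with the $l_1$ norm; the CVXPY sketch below uses a synthetic basis matrix and data, and the weight $\gamma$ is a placeholder.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n, gamma = 40, 200, 0.1                # many basis functions, made-up weight
F = np.random.randn(m, n)                 # synthetic basis matrix, F[j, i] = f_i(u_j)
x_true = np.zeros(n)
x_true[[3, 50, 120]] = [1.0, -2.0, 0.5]   # only three basis functions are active
y = F @ x_true + 0.01 * np.random.randn(m)

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.sum_squares(F @ x - y) + gamma * cp.norm(x, 1)))
prob.solve()
print(np.sum(np.abs(x.value) > 1e-3))     # number of (essentially) nonzero coefficients
```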
6.5.5 Interpolation with convex functions
There exists a convex function $f:\mathbf{R}^k \to \mathbf{R}$ with $\mathbf{dom} \ f= \mathbf{R}^k$ that satisfies the interpolation conditions

$$ f(u_i)=y_i, \ \ i=1,\ldots,m $$

if and only if there exist $g_1, \ldots , g_m$ such that

$$ y_j\geq y_i+g_i^T(u_j-u_i), \ \ i,j=1,\ldots,m $$
For example, the piecewise-linear function

$$ f(z)=\max_{i=1,\ldots,m}\left(y_i+g_i^T(z-u_i)\right) $$

satisfies the interpolation conditions.
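A hedged CVXPY sketch of this condition: given data $(u_i, y_i)$ (synthetic here), we search for subgradients $g_i$ satisfying the inequalities; if the problem is feasible, the piecewise-linear maximum above interpolates the data.

```python
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, k = 8, 2
U = np.random.randn(m, k)                  # synthetic points u_i in R^k
y = np.sum(U**2, axis=1)                   # values of a convex function at the u_i

G = cp.Variable((m, k))                    # candidate subgradients g_i (rows)
constraints = [G[i, :] @ (U[j] - U[i]) <= y[j] - y[i]
               for i in range(m) for j in range(m) if i != j]
prob = cp.Problem(cp.Minimize(cp.sum_squares(G)), constraints)
prob.solve()
print(prob.status)                         # "optimal" means the data are interpolatable

Gv = G.value
def f(z):
    # the piecewise-linear interpolant built from the recovered subgradients
    return max(float(y[i] + Gv[i] @ (z - U[i])) for i in range(m))

print(np.allclose([f(U[i]) for i in range(m)], y, atol=1e-5))
```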