
CG method for Iteratively Reweighted Least Squares (IRLS)

Instead of the $\ell^2$-norm solution obtained by the conventional LS method, $\ell^p$-norm minimization solutions, with $1\le p\le 2$, are often sought. Iterative inversion algorithms known as IRLS (Iteratively Reweighted Least Squares) have been developed to solve these problems, which lie between the least-absolute-values problem and the classical least-squares problem. The main advantage of IRLS is that it provides an easy way to compute an approximate $\ell^p$-norm solution. Among the various $\ell^p$-norm solutions, $\ell^1$-norm solutions are known to be more robust than $\ell^2$-norm solutions, being less sensitive to spiky, high-amplitude noise (Scales et al., 1988; Scales and Gersztenkorn, 1987; Claerbout and Muir, 1973; Taylor et al., 1979).

The problem solved by IRLS is the minimization of a weighted residual/model in the least-squares sense. The residual to be minimized in the weighted problem is written as

\begin{displaymath}
\mathbf r = \mathbf W_r (\mathbf L \mathbf W_m \mathbf m - \mathbf d)
\end{displaymath} (5)

where $\mathbf W_r$ and $\mathbf W_m$ are the weights for the residual and the model, respectively. These weights express our preferences regarding the residual and the model, and they can be applied separately or together, according to a given inversion goal. In this section, for simplicity, the explanation is limited to the case of applying both weights together, but the examples in a later section show all the cases, including the residual and the model weights applied separately, for comparison. The weights can be any matrices, but diagonal matrices are most often used, and this paper assumes that all weights are diagonal. The gradient for the weighted least-squares problem then becomes
\begin{displaymath}
\mathbf W_m^T \mathbf L^T \mathbf W_r^T \mathbf r =
\frac{\partial \left( \frac{1}{2}\, \mathbf r^T \mathbf r \right)}{\partial \mathbf m} =
\mathbf W_m^T \mathbf L^T \mathbf W_r^T \mathbf W_r (\mathbf L \mathbf W_m \mathbf m - \mathbf d) .
\end{displaymath} (6)
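
As a concrete illustration, the short sketch below applies Equations (5) and (6) with NumPy, storing the diagonal weights $\mathbf W_r$ and $\mathbf W_m$ as vectors; the array names are illustrative and not from the original implementation.

\begin{verbatim}
import numpy as np

def weighted_residual(L, m, d, wr, wm):
    """Equation (5): r = W_r (L W_m m - d), diagonal weights stored as vectors."""
    return wr * (L @ (wm * m) - d)

def weighted_gradient(L, m, d, wr, wm):
    """Equation (6): g = W_m^T L^T W_r^T W_r (L W_m m - d)."""
    r = weighted_residual(L, m, d, wr, wm)
    return wm * (L.T @ (wr * r))
\end{verbatim}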

A particular choice for the residual weight $\mathbf W_r$ is the one that results in minimizing the $\ell^p$-norm of the residual. Choosing the $i^{th}$ diagonal element of $\mathbf W_r$ to be a function of the $i^{th}$ component of the residual vector as follows:
\begin{displaymath}
diag ({\mathbf W_r})_i = \vert r_i\vert^{(p-2)/2},
\end{displaymath} (7)

the $\ell^2$-norm of the weighted residual is then
\begin{displaymath}
\Vert\mathbf W_r \mathbf r\Vert _2^2
= \mathbf r^T \mathbf W_r^T \mathbf W_r \mathbf r
= \mathbf r^T \mathbf W_r^2 \mathbf r = \Vert\mathbf r\Vert _p^p .
\end{displaymath} (8)
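
The following short check (a sketch, not from the paper) verifies Equation (8) numerically: with the weight of Equation (7), the squared $\ell^2$-norm of the weighted residual equals $\Vert\mathbf r\Vert_p^p$. The sample residual vector is arbitrary and avoids zero entries.

\begin{verbatim}
import numpy as np

def residual_weight(r, p):
    """Equation (7): diag(W_r)_i = |r_i|^{(p-2)/2}."""
    return np.abs(r) ** ((p - 2.0) / 2.0)

r = np.array([0.5, -1.5, 2.0, -0.25])   # arbitrary residual, no zeros
p = 1.0
wr = residual_weight(r, p)
lhs = np.sum((wr * r) ** 2)              # ||W_r r||_2^2
rhs = np.sum(np.abs(r) ** p)             # ||r||_p^p
print(lhs, rhs)                          # both print the same value
\end{verbatim}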

Therefore, the minimization of the $\ell^2$-norm of the weighted residual, with a weight as shown in Equation (7), can be thought of as a minimization of the $\ell^p$-norm of the residual. This method is valid for $\ell^p$-norms with $1\le p \le 2$. When the $\ell^1$-norm is desired, the weighting is as follows:

\begin{displaymath}diag({\mathbf W_r})_i = \vert r_i\vert^{-1/2}.\end{displaymath}

This weight reduces the contribution of large residuals and improves the fit to the data that are already well estimated. Thus, the $\ell^1$-norm-based minimization is robust, i.e., less sensitive to noise bursts in the data. In practice, the weighting operator is modified slightly to avoid dividing by zero. For this purpose, a damping parameter $\epsilon$ is chosen and the weighting operator is modified to be:

\begin{displaymath}
diag({\mathbf W_r})_i =
\left\{
\begin{array}{cc}
\vert r_i\vert^{-1/2}, & \vert r_i\vert > \epsilon \\
1/\sqrt{\epsilon}, & \vert r_i\vert \le \epsilon \\
\end{array}
\right.
\end{displaymath}
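
A small sketch of this damped weighting, assuming the cap $1/\sqrt{\epsilon}$ for small residuals as written above (illustrative names, not the paper's code):

\begin{verbatim}
import numpy as np

def damped_l1_weight(r, eps):
    """diag(W_r)_i = |r_i|^{-1/2} for |r_i| > eps, and eps^{-1/2} otherwise."""
    return np.maximum(np.abs(r), eps) ** -0.5
\end{verbatim}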

The choice of this parameter is related to the distribution of the residual values. Some authors choose a relatively small fixed value such as $\epsilon = \max\vert\mathbf d\vert/100$, while others choose a value that corresponds to a small percentile of the data, for example the 2nd percentile (the value with 98% of the values above it and 2% below it). In this paper, I used the percentile approach to decide the parameter $\epsilon$ because it reflects the distribution of the residual values and showed more stable behavior in the experiments performed for this paper.
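
Both choices of $\epsilon$ mentioned above can be sketched as follows; the percentile version, applied here to the absolute residual values, is the one used in this paper, and np.percentile is a stand-in for however the percentile is actually computed.

\begin{verbatim}
import numpy as np

def eps_from_max(d):
    """Fixed fraction of the data amplitude: eps = max|d| / 100."""
    return np.max(np.abs(d)) / 100.0

def eps_from_percentile(r, q=2.0):
    """q-th percentile of the absolute residuals (q = 2: 98% of values lie above)."""
    return np.percentile(np.abs(r), q)
\end{verbatim}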

The model weight $\mathbf W_m$ is used to express our preference regarding the model, for example the parsimony or the smoothness of the solution. Introducing the model weight corresponds to applying a preconditioner and solving the problem

\begin{displaymath}\mathbf L \mathbf W_m \hat {\mathbf m} = \mathbf d\end{displaymath}

followed by

\begin{displaymath}\mathbf m = \mathbf W_m \mathbf{ \hat{m}}.\end{displaymath}

The iterative solution of this system minimizes the energy of the new model parameter $\mathbf{\hat m}$

\begin{displaymath}\Vert \hat{\mathbf m} \Vert _2^2 = \mathbf{\hat{m}}^T \mathbf{\hat{m}} = \mathbf m^T \mathbf W_m^{-T}\mathbf W_m^{-1}\mathbf m.\end{displaymath}
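
As a sketch of this preconditioned two-step solve (using SciPy's LSQR as a stand-in for the CG solver of this paper, with the diagonal $\mathbf W_m$ stored as the vector wm; the function name is illustrative):

\begin{verbatim}
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def solve_preconditioned(L, d, wm):
    """Solve L W_m m_hat = d iteratively, then back-transform m = W_m m_hat."""
    nrow, ncol = L.shape
    A = LinearOperator((nrow, ncol),
                       matvec=lambda x: L @ (wm * x),     # L W_m
                       rmatvec=lambda y: wm * (L.T @ y))  # W_m^T L^T
    m_hat = lsqr(A, d)[0]
    return wm * m_hat                                     # m = W_m m_hat
\end{verbatim}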

In the same vein as the residual weight, the model weight $\mathbf W_m$ can be chosen as
\begin{displaymath}
diag ({\mathbf W_m})_i = \vert m_i\vert^{(2-p)/2}.
\end{displaymath} (9)

The weighted model energy that is minimized is then

\begin{displaymath}\mathbf m^T \mathbf W_m^{-2} \mathbf m = \Vert\mathbf m\Vert _p^p, \end{displaymath}

which is the $\ell^p$-norm of the model. When the minimum $\ell ^1$-norm model is desired, the weighting is as follows:

\begin{displaymath}diag ({\mathbf W_m})_i = \vert m_i\vert^{1/2}.\end{displaymath}

The IRLS method can easily be incorporated into CG algorithms by including the weights $\mathbf W_r$ and $\mathbf W_m$ so that the forward operator becomes $\mathbf W_r \mathbf L \mathbf W_m$, i.e., the operator $\mathbf L$ is premultiplied by $\mathbf W_r$ and postmultiplied by $\mathbf W_m$, and the adjoint operator $\mathbf L^T$ is premultiplied by $\mathbf W_m^T$ and postmultiplied by $\mathbf W_r^T$ (Claerbout, 2004). However, the introduction of weights that change during the iterations leads to a nonlinear CG method with two nested loops. The outer loop iterates over the changing weights, and the inner loop iterates the LS solution for a given weighted operator. Even though we do not know the true residual/model vector at the beginning of the iterations, we can approximate it with the residual/model of the previous iteration step, and it converges to a residual/model that is very close to the true one as the iterations continue. This method is summarized in Algorithm 2, where $f(\mathbf r)$ and $f(\mathbf m)$ represent the functions of the residual and the model described in Equation (7) and Equation (9), respectively.


[Algorithm 2: CG method for the IRLS solution. The outer (while) loop updates the weights and ends each pass with the back-transform $\mathbf m \Leftarrow \mathbf W_m \mathbf m$.]

For efficiency, Algorithm 2 is often implemented not to wait for the convergence of the inner loop; instead, the inner loop is finished after a certain number of iterations, and the weights and the corresponding residual are recomputed (Darche, 1989; Nichols, 1994). In order to take advantage of the plane search in CG, however, the number of iterations of the inner loop should be at least two. The experiments performed for the examples of this paper showed almost no differences between the results for different numbers of inner-loop iterations. In this paper, therefore, the IRLS algorithm is implemented to finish the inner loop after two iterations.
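
A minimal sketch of the nested-loop scheme described above (not the original implementation): the inner loop runs a small, fixed number of CGLS iterations on the weighted, preconditioned system $\mathbf W_r \mathbf L \mathbf W_m \hat{\mathbf m} = \mathbf W_r \mathbf d$, and the outer loop recomputes the weights of Equations (7) and (9) from the current residual and model, with $\epsilon$ taken from a small percentile of the residual. The first pass uses unit weights, which corresponds to an ordinary LS start.

\begin{verbatim}
import numpy as np

def cgls(apply_A, apply_At, b, x0, n_iter):
    """A few conjugate-gradient iterations on the normal equations A^T A x = A^T b."""
    x = x0.copy()
    r = b - apply_A(x)
    s = apply_At(r)
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iter):
        q = apply_A(p)
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = apply_At(r)
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

def irls_cg(L, d, p=1.0, n_outer=10, n_inner=2):
    """Outer loop: update W_r, W_m; inner loop: n_inner CG steps on the weighted system."""
    nrow, ncol = L.shape
    m = np.zeros(ncol)
    wr = np.ones(nrow)                                       # first pass: unit weights
    wm = np.ones(ncol)
    for _ in range(n_outer):
        apply_A = lambda x, wr=wr, wm=wm: wr * (L @ (wm * x))     # W_r L W_m
        apply_At = lambda y, wr=wr, wm=wm: wm * (L.T @ (wr * y))  # W_m^T L^T W_r^T
        m_hat = cgls(apply_A, apply_At, wr * d, np.zeros(ncol), n_inner)
        m = wm * m_hat                                       # m = W_m m_hat
        r = L @ m - d
        eps = np.percentile(np.abs(r), 2.0) + 1e-30          # 2nd percentile; guard against zero
        wr = np.maximum(np.abs(r), eps) ** ((p - 2.0) / 2.0) # Equation (7), damped
        wm = np.abs(m) ** ((2.0 - p) / 2.0)                  # Equation (9)
    return m
\end{verbatim}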

