Model fitting by least squares

Next: Why steepest descent is Up: KRYLOV SUBSPACE ITERATIVE METHODS Previous: Sign convention

## Method of random directions and steepest descent

Let us minimize the sum of the squares of the components of the residual vector $\mathbf{r}$:

$$ \mathbf{r} \;=\; \mathbf{F}\mathbf{x} \,-\, \mathbf{d} \qquad (50) $$

$$ Q(\mathbf{x}) \;=\; \mathbf{r}\cdot\mathbf{r} \qquad (51) $$

A contour plot is based on an altitude function of space. The altitude is the dot product $\mathbf{r}\cdot\mathbf{r}$. By finding the lowest altitude, we are driving the residual vector $\mathbf{r}$ as close as we can to zero. If the residual vector reaches zero, then we have solved the simultaneous equations $\mathbf{d} = \mathbf{F}\mathbf{x}$. In a two-dimensional world, the vector $\mathbf{x}$ has two components, $(x_1, x_2)$. A contour is a curve of constant $\mathbf{r}\cdot\mathbf{r}$ in $(x_1, x_2)$-space. These contours have a statistical interpretation as contours of uncertainty in $(x_1, x_2)$, given measurement errors in $\mathbf{d}$.

Let us see how a random search direction can be used to reduce the residual $\mathbf{r}$. Let $\Delta\mathbf{x}$ be an abstract vector with the same number of components as the solution $\mathbf{x}$, and let $\Delta\mathbf{x}$ contain arbitrary or random numbers. We add an unknown quantity $\alpha$ of the vector $\Delta\mathbf{x}$ to the vector $\mathbf{x}$, and thereby create $\mathbf{x}_{\rm new}$:

$$ \mathbf{x}_{\rm new} \;=\; \mathbf{x} + \alpha\,\Delta\mathbf{x} \qquad (52) $$

The new $\mathbf{x}$ gives a new residual:

$$ \mathbf{r}_{\rm new} \;=\; \mathbf{F}\,\mathbf{x}_{\rm new} - \mathbf{d} \qquad (53) $$
$$ \mathbf{r}_{\rm new} \;=\; \mathbf{F}(\mathbf{x} + \alpha\,\Delta\mathbf{x}) - \mathbf{d} \qquad (54) $$
$$ \mathbf{r}_{\rm new} \;=\; \mathbf{r} + \alpha\,\Delta\mathbf{r} \qquad (55) $$

which defines $\Delta\mathbf{r} = \mathbf{F}\,\Delta\mathbf{x}$.

Next, we adjust $\alpha$ to minimize the dot product $\mathbf{r}_{\rm new}\cdot\mathbf{r}_{\rm new}$:

$$ Q(\alpha) \;=\; (\mathbf{r} + \alpha\,\Delta\mathbf{r})\cdot(\mathbf{r} + \alpha\,\Delta\mathbf{r}) \qquad (56) $$

Set to zero its derivative with respect to $\alpha$:

$$ 0 \;=\; \frac{dQ}{d\alpha} \;=\; 2\,(\mathbf{r} + \alpha\,\Delta\mathbf{r})\cdot\Delta\mathbf{r} \;=\; 2\,\mathbf{r}_{\rm new}\cdot\Delta\mathbf{r} \qquad (57) $$

which says that the new residual $\mathbf{r}_{\rm new}$ is perpendicular to the "fitting function" $\Delta\mathbf{r}$. Solving gives the required value of $\alpha$:

$$ \alpha \;=\; -\,\frac{\mathbf{r}\cdot\Delta\mathbf{r}}{\Delta\mathbf{r}\cdot\Delta\mathbf{r}} \qquad (58) $$
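As a sanity check, here is a short NumPy sketch (with made-up $\mathbf{F}$ and $\mathbf{d}$) confirming that this choice of step length makes the new residual perpendicular to $\Delta\mathbf{r}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small overdetermined system r = F x - d (synthetic test data).
F = rng.standard_normal((5, 2))
d = rng.standard_normal(5)
x = np.zeros(2)
r = F @ x - d

# An arbitrary search direction and its image in residual space.
dx = rng.standard_normal(2)
dr = F @ dx

# Optimal step length alpha = -(r . dr) / (dr . dr).
alpha = -(r @ dr) / (dr @ dr)
r_new = r + alpha * dr

# The new residual is orthogonal to the "fitting function" dr.
print(abs(r_new @ dr))  # zero, up to floating-point roundoff
```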

A computation template for the method of random directions is:

    r ← F x − d
    iterate {
        Δx ← random numbers
        Δr ← F Δx
        α  ← −(r · Δr) / (Δr · Δr)
        x  ← x + α Δx
        r  ← r + α Δr
    }

A nice thing about the method of random directions is that you do not need to know the adjoint operator $\mathbf{F}'$.
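A minimal NumPy sketch of this template might look as follows; the operator here is just a dense matrix, and both it and the data are synthetic:

```python
import numpy as np

def random_directions(F, d, niter=2000, seed=0):
    """Minimize ||F x - d||^2 by random search directions (a sketch).

    Note that only the forward operator F is applied; the adjoint is
    never needed.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(F.shape[1])
    r = F @ x - d
    for _ in range(niter):
        dx = rng.standard_normal(F.shape[1])  # random direction
        dr = F @ dx
        alpha = -(r @ dr) / (dr @ dr)         # optimal step length
        x += alpha * dx
        r += alpha * dr
    return x, r

# Usage on a small synthetic system, compared with a direct solver.
rng = np.random.default_rng(1)
F = rng.standard_normal((20, 3))
d = rng.standard_normal(20)
x, r = random_directions(F, d)
x_ls = np.linalg.lstsq(F, d, rcond=None)[0]
print(np.allclose(x, x_ls, atol=1e-3))
```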

In practice, random directions are rarely used. It is more common to use the gradient direction than a random direction. Notice that the vector $\mathbf{F}'\,\mathbf{r}$ has the same number of components as $\mathbf{x}$:

$$ \Delta\mathbf{x} \;=\; \mathbf{F}'\,\mathbf{r} \qquad (59) $$

Recall that this vector can be found by taking the gradient of the size of the residuals:

$$ \frac{\partial}{\partial\mathbf{x}}\,\mathbf{r}\cdot\mathbf{r} \;=\; \frac{\partial}{\partial\mathbf{x}}\,(\mathbf{F}\mathbf{x}-\mathbf{d})\cdot(\mathbf{F}\mathbf{x}-\mathbf{d}) \;=\; 2\,\mathbf{F}'\,\mathbf{r} \qquad (60) $$
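This gradient identity can be checked numerically. The sketch below (with synthetic $\mathbf{F}$, $\mathbf{d}$, and $\mathbf{x}$) compares $2\,\mathbf{F}'\,\mathbf{r}$ against a centered finite-difference gradient of $\mathbf{r}\cdot\mathbf{r}$:

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((6, 3))
d = rng.standard_normal(6)
x = rng.standard_normal(3)

# Objective Q(x) = r . r with r = F x - d.
Q = lambda x: np.sum((F @ x - d) ** 2)

# Analytic gradient: 2 F' r.
grad = 2 * F.T @ (F @ x - d)

# Centered finite-difference gradient for comparison.
eps = 1e-6
fd = np.array([(Q(x + eps * np.eye(3)[j]) - Q(x - eps * np.eye(3)[j])) / (2 * eps)
               for j in range(3)])
print(np.allclose(grad, fd, atol=1e-5))
```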

Choosing $\Delta\mathbf{x}$ to be the gradient vector is called the "method of steepest descent."

Starting from a model $\mathbf{x}$ (which may be zero), the following is a template of pseudocode for minimizing the residual $\mathbf{r}$ by the steepest-descent method:

    r ← F x − d
    iterate {
        Δx ← F′ r
        Δr ← F Δx
        α  ← −(r · Δr) / (Δr · Δr)
        x  ← x + α Δx
        r  ← r + α Δr
    }
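A minimal NumPy sketch of the steepest-descent template, again with a dense matrix standing in for the operator $\mathbf{F}$ and its transpose for the adjoint $\mathbf{F}'$:

```python
import numpy as np

def steepest_descent(F, d, x0=None, niter=100):
    """Minimize ||F x - d||^2 by steepest descent (a sketch)."""
    x = np.zeros(F.shape[1]) if x0 is None else x0.copy()
    r = F @ x - d
    for _ in range(niter):
        dx = F.T @ r                  # gradient direction
        dr = F @ dx
        if dr @ dr == 0:              # already converged
            break
        alpha = -(r @ dr) / (dr @ dr) # optimal step length
        x += alpha * dx
        r += alpha * dr
    return x, r

# Usage: at the least-squares solution the model-space residual
# F' r vanishes.
rng = np.random.default_rng(3)
F = rng.standard_normal((20, 3))
d = rng.standard_normal(20)
x, r = steepest_descent(F, d)
print(np.allclose(F.T @ r, 0, atol=1e-8))
```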


Good science and engineering is finding something unexpected. Look for the unexpected both in data space and in model space. In data space, you look at the residual $\mathbf{r}$. In model space, you look at the residual projected there, $\mathbf{F}'\,\mathbf{r}$. What does it mean? It is simply $\Delta\mathbf{x}$, the changes you need to make to your model. It means more in later chapters, where the operator $\mathbf{F}$ is a column vector of operators that are fighting with one another to grab the data.


2014-12-01