Model fitting by least squares

Next: Why steepest descent is Up: KRYLOV SUBSPACE ITERATIVE METHODS Previous: Sign convention

## Method of random directions and steepest descent

Let us minimize the sum of the squares of the components of the residual vector $\mathbf{r}$:

$$ \mathbf{r} \;=\; \mathbf{F}\mathbf{x} \,-\, \mathbf{d} \qquad (50) $$

$$ Q(\mathbf{x}) \;=\; \mathbf{r}\cdot\mathbf{r} \qquad (51) $$

A contour plot is based on an altitude function of space. The altitude is the dot product $\mathbf{r}\cdot\mathbf{r}$. By finding the lowest altitude, we are driving the residual vector $\mathbf{r}$ as close as we can to zero. If the residual vector reaches zero, then we have solved the simultaneous equations $\mathbf{d} = \mathbf{F}\mathbf{x}$. In a two-dimensional world, the vector $\mathbf{x}$ has two components, $(x_1, x_2)$. A contour is a curve of constant $\mathbf{r}\cdot\mathbf{r}$ in $(x_1, x_2)$-space. These contours have a statistical interpretation as contours of uncertainty in $(x_1, x_2)$, given measurement errors in $\mathbf{d}$.

Let us see how a random search direction can be used to reduce the residual $\mathbf{r}$. Let $\Delta\mathbf{x}$ be an abstract vector with the same number of components as the solution $\mathbf{x}$, and let $\Delta\mathbf{x}$ contain arbitrary or random numbers. We add an unknown quantity $\alpha$ of the vector $\Delta\mathbf{x}$ to the vector $\mathbf{x}$, and thereby create $\mathbf{x}_{\rm new}$:

$$ \mathbf{x}_{\rm new} \;=\; \mathbf{x} + \alpha\,\Delta\mathbf{x} \qquad (52) $$

The new $\mathbf{x}$ gives a new residual:

$$ \mathbf{r}_{\rm new} \;=\; \mathbf{F}\,\mathbf{x}_{\rm new} - \mathbf{d} \qquad (53) $$
$$ \mathbf{r}_{\rm new} \;=\; \mathbf{F}(\mathbf{x} + \alpha\,\Delta\mathbf{x}) - \mathbf{d} \qquad (54) $$
$$ \mathbf{r}_{\rm new} \;=\; \mathbf{r} + \alpha\,\Delta\mathbf{r} \qquad (55) $$

which defines $\Delta\mathbf{r} = \mathbf{F}\,\Delta\mathbf{x}$.

Next, we adjust $\alpha$ to minimize the dot product $\mathbf{r}_{\rm new}\cdot\mathbf{r}_{\rm new}$:

$$ Q(\alpha) \;=\; (\mathbf{r} + \alpha\,\Delta\mathbf{r})\cdot(\mathbf{r} + \alpha\,\Delta\mathbf{r}) \qquad (56) $$

Set to zero its derivative with respect to $\alpha$:

$$ 0 \;=\; \frac{dQ}{d\alpha} \;=\; 2\,(\mathbf{r} + \alpha\,\Delta\mathbf{r})\cdot\Delta\mathbf{r} \;=\; 2\,\mathbf{r}_{\rm new}\cdot\Delta\mathbf{r} \qquad (57) $$

which says that the new residual $\mathbf{r}_{\rm new}$ is perpendicular to the "fitting function" $\Delta\mathbf{r}$. Solving gives the required value of $\alpha$:

$$ \alpha \;=\; -\,\frac{\mathbf{r}\cdot\Delta\mathbf{r}}{\Delta\mathbf{r}\cdot\Delta\mathbf{r}} \qquad (58) $$
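As a sanity check, here is a short NumPy sketch (with made-up $\mathbf{F}$ and $\mathbf{d}$) confirming that this choice of step length makes the new residual perpendicular to $\Delta\mathbf{r}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small overdetermined system r = F x - d (synthetic test data).
F = rng.standard_normal((5, 2))
d = rng.standard_normal(5)
x = np.zeros(2)
r = F @ x - d

# An arbitrary search direction and its image in residual space.
dx = rng.standard_normal(2)
dr = F @ dx

# Optimal step length alpha = -(r . dr) / (dr . dr).
alpha = -(r @ dr) / (dr @ dr)
r_new = r + alpha * dr

# The new residual is orthogonal to the "fitting function" dr.
print(abs(r_new @ dr))  # zero, up to floating-point roundoff
```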

A computation template for the method of random directions is:

    r ← F x − d
    iterate {
        Δx ← random numbers
        Δr ← F Δx
        α  ← −(r · Δr) / (Δr · Δr)
        x  ← x + α Δx
        r  ← r + α Δr
    }

A nice thing about the method of random directions is that you do not need to know the adjoint operator $\mathbf{F}'$.
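A minimal NumPy sketch of this template might look as follows; the operator here is just a dense matrix, and both it and the data are synthetic:

```python
import numpy as np

def random_directions(F, d, niter=2000, seed=0):
    """Minimize ||F x - d||^2 by random search directions (a sketch).

    Note that only the forward operator F is applied; the adjoint is
    never needed.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(F.shape[1])
    r = F @ x - d
    for _ in range(niter):
        dx = rng.standard_normal(F.shape[1])  # random direction
        dr = F @ dx
        alpha = -(r @ dr) / (dr @ dr)         # optimal step length
        x += alpha * dx
        r += alpha * dr
    return x, r

# Usage on a small synthetic system, compared with a direct solver.
rng = np.random.default_rng(1)
F = rng.standard_normal((20, 3))
d = rng.standard_normal(20)
x, r = random_directions(F, d)
x_ls = np.linalg.lstsq(F, d, rcond=None)[0]
print(np.allclose(x, x_ls, atol=1e-3))
```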

In practice, random directions are rarely used. It is more common to use the gradient direction than a random direction. Notice that the vector $\mathbf{F}'\,\mathbf{r}$ has the same number of components as $\mathbf{x}$:

$$ \Delta\mathbf{x} \;=\; \mathbf{F}'\,\mathbf{r} \qquad (59) $$

Recall that this vector can be found by taking the gradient of the size of the residuals:

$$ \frac{\partial}{\partial\mathbf{x}}\,\mathbf{r}\cdot\mathbf{r} \;=\; \frac{\partial}{\partial\mathbf{x}}\,(\mathbf{F}\mathbf{x}-\mathbf{d})\cdot(\mathbf{F}\mathbf{x}-\mathbf{d}) \;=\; 2\,\mathbf{F}'\,\mathbf{r} \qquad (60) $$
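This gradient identity can be checked numerically. The sketch below (with synthetic $\mathbf{F}$, $\mathbf{d}$, and $\mathbf{x}$) compares $2\,\mathbf{F}'\,\mathbf{r}$ against a centered finite-difference gradient of $\mathbf{r}\cdot\mathbf{r}$:

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((6, 3))
d = rng.standard_normal(6)
x = rng.standard_normal(3)

# Objective Q(x) = r . r with r = F x - d.
Q = lambda x: np.sum((F @ x - d) ** 2)

# Analytic gradient: 2 F' r.
grad = 2 * F.T @ (F @ x - d)

# Centered finite-difference gradient for comparison.
eps = 1e-6
fd = np.array([(Q(x + eps * np.eye(3)[j]) - Q(x - eps * np.eye(3)[j])) / (2 * eps)
               for j in range(3)])
print(np.allclose(grad, fd, atol=1e-5))
```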

Choosing $\Delta\mathbf{x}$ to be the gradient vector is called the "method of steepest descent."

Starting from a model $\mathbf{x}$ (which may be zero), the following is a template of pseudocode for minimizing the residual $\mathbf{r}$ by the steepest-descent method:

    r ← F x − d
    iterate {
        Δx ← F′ r
        Δr ← F Δx
        α  ← −(r · Δr) / (Δr · Δr)
        x  ← x + α Δx
        r  ← r + α Δr
    }
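A minimal NumPy sketch of the steepest-descent template, again with a dense matrix standing in for the operator $\mathbf{F}$ and its transpose for the adjoint $\mathbf{F}'$:

```python
import numpy as np

def steepest_descent(F, d, x0=None, niter=100):
    """Minimize ||F x - d||^2 by steepest descent (a sketch)."""
    x = np.zeros(F.shape[1]) if x0 is None else x0.copy()
    r = F @ x - d
    for _ in range(niter):
        dx = F.T @ r                  # gradient direction
        dr = F @ dx
        if dr @ dr == 0:              # already converged
            break
        alpha = -(r @ dr) / (dr @ dr) # optimal step length
        x += alpha * dx
        r += alpha * dr
    return x, r

# Usage: at the least-squares solution the model-space residual
# F' r vanishes.
rng = np.random.default_rng(3)
F = rng.standard_normal((20, 3))
d = rng.standard_normal(20)
x, r = steepest_descent(F, d)
print(np.allclose(F.T @ r, 0, atol=1e-8))
```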


Good science and engineering is finding something unexpected. Look for the unexpected both in data space and in model space. In data space, you look at the residual $\mathbf{r}$. In model space, you look at the residual projected there, $\mathbf{F}'\,\mathbf{r}$. What does it mean? It is simply $\Delta\mathbf{x}$, the changes you need to make to your model. It means more in later chapters, where the operator $\mathbf{F}$ is a column vector of operators that are fighting with one another to grab the data.


2014-12-01