Multidimensional autoregression

Working in an optimization application, we begin from residuals between theory and practice. These residuals can be scaled to make new optimization residuals before we start minimizing their energy. What scaling should we use? The scaling can be a simple weighting function or a filter. A filter is simply a weighting function in Fourier space.

The basic idea, which is common sense and which also comes to us as results proven by Gauss and from the theory of statistical signal processing, is this: the optimization residuals should be roughly of equal scale. This makes sense because squaring magnifies differences in scale: anything small will be ignored while anything large will dominate. Scaling optimization residuals to a common range makes them all equally influential on the final solution. Not only should optimization residuals be of like scale in physical space, they should be of like scale in Fourier space, in eigenvector space, or in any other space that we might use to represent them. This implies that the optimization residuals should be uncorrelated. If they were correlated, their spectrum would not be white, and "not white" means of differing sizes in Fourier space. Residuals should be the same size as one another in physical space and likewise in Fourier space. Thus the optimization residuals should be orthogonal and of unit scale, much as Fourier components or eigenvectors are orthonormal.

Let us approach the problem backwards. Suppose we have two random variables that we take to be the ideal optimization residuals $x_1$ and $x_2$. In reality the two may be few or trillions. In the language of statistics, the optimization residuals are expected to have zero mean, an idea that is formalized by writing $E(x_1)=0$ and $E(x_2)=0$. Likewise these ideal optimization residuals have equal energy, $E(x_1^2)=1$ and $E(x_2^2)=1$. Finally, these two optimization residuals are uncorrelated, a condition which is written as $E(x_1 x_2)=0$. The expectation symbol $E$ is like a summation over many instances of the random variable.

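Now suppose there exists a transformation $\mathbf{B}$ from these ideal optimization residuals to two experimental residuals $y_1$ and $y_2$, say $\mathbf{y} = \mathbf{B}\mathbf{x}$, where

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{53}
$$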

The experimental residuals $y_1$ and $y_2$ are likely to be neither orthogonal nor equal in energy. From the column vector $\mathbf{y}$, the experimenter can form a square matrix. Let us also allow the experimenter to write the symbol $E$ to denote summation over many trials or over many sections of data, over ranges of time or space, over soundings or over receiver locations. The experimenter writes

$$
\mathbf{R} = E(\mathbf{y}\mathbf{y}^T) \tag{54}
$$

$$
\mathbf{R} = E(\mathbf{B}\mathbf{x}\mathbf{x}^T\mathbf{B}^T) \tag{55}
$$

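Given a random variable $r$, the expectation of $2r$ is simply $E(2r) = 2E(r)$. The $E$ symbol is a summation on random variables, but constants like the coefficients of $\mathbf{B}$ pass right through it. Thus,

$$
\mathbf{R} = \mathbf{B}\, E(\mathbf{x}\mathbf{x}^T)\, \mathbf{B}^T \tag{56}
$$

$$
\mathbf{R} = \mathbf{B}\, E\!\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \right) \mathbf{B}^T \tag{57}
$$

$$
\mathbf{R} = \mathbf{B} \begin{bmatrix} E(x_1 x_1) & E(x_1 x_2) \\ E(x_2 x_1) & E(x_2 x_2) \end{bmatrix} \mathbf{B}^T \tag{58}
$$

$$
\mathbf{R} = \mathbf{B}\mathbf{B}^T \tag{59}
$$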
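The chain of equalities ending in (59) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes Gaussian draws for the ideal residuals (any zero-mean, unit-energy, uncorrelated source would serve), and the matrix `B` and the helper names `sample_covariance` and `b_bt` are hypothetical values and names chosen only for illustration. It replaces the expectation by an average over many trials and compares the result with the product of the transformation and its transpose.

```python
import random

def sample_covariance(B, n_trials=100_000, seed=0):
    """Estimate R = E(y y^T) for y = B x by averaging over many trials.

    x holds two ideal residuals: zero mean, unit energy, uncorrelated.
    Gaussian draws are an assumption made for this illustration.
    """
    rng = random.Random(seed)
    R = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        y = [B[i][0] * x[0] + B[i][1] * x[1] for i in range(2)]
        for i in range(2):
            for j in range(2):
                R[i][j] += y[i] * y[j]
    return [[R[i][j] / n_trials for j in range(2)] for i in range(2)]

def b_bt(B):
    """Form the exact product B B^T for a 2x2 matrix."""
    return [[sum(B[i][k] * B[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical transformation from ideal to experimental residuals.
B = [[2.0, 0.0],
     [1.0, 0.5]]

R_est = sample_covariance(B)   # average of y y^T over the trials
R_exact = b_bt(B)              # [[4.0, 2.0], [2.0, 1.25]]

max_err = max(abs(R_est[i][j] - R_exact[i][j])
              for i in range(2) for j in range(2))
print("largest deviation from B B^T:", max_err)
```

The deviation shrinks roughly as one over the square root of the number of trials, which is the sense in which the summation symbol stands in for the expectation.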

Given a matrix $\mathbf{R}$, there is a simple well-known method called the **Cholesky factorization** method that will factor $\mathbf{R}$ into two parts like $\mathbf{B}$ and $\mathbf{B}^T$. The method creates for us either an upper or a lower triangular matrix (our choice) for $\mathbf{B}$. You can easily reinvent the Cholesky method if you multiply the symbols for two triangular matrices like $\mathbf{B}$ and $\mathbf{B}^T$ and notice the procedure that works backwards from $\mathbf{R}$ to $\mathbf{B}$. The experimenter seeks not $\mathbf{B}$, however, but its inverse, the matrix that takes us from the experimental residuals to the ideal optimization residuals that are uncorrelated and of equal energies. The Cholesky factorization costs on the order of $N^3$ computations for an $N \times N$ matrix, which is about the same as the cost of the matrix inversion of $\mathbf{R}$ or $\mathbf{B}$. For geophysical maps and other functions on Cartesian spaces, the Prediction Error Filter (PEF) accomplishes the same general goal and has the advantage that we have already learned how to perform the operation using operators instead of matrices.
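To make the "work backwards" remark concrete, here is a minimal sketch of the lower-triangular Cholesky recursion, together with a forward substitution that applies the inverse of the triangular factor without ever forming that inverse. The function names `cholesky_lower` and `forward_solve` and the numerical values are hypothetical, chosen for illustration; production code would normally call a library routine instead.

```python
import math

def cholesky_lower(R):
    """Return lower-triangular L with L L^T = R (R symmetric positive definite).

    Each entry follows from writing out the product L L^T and solving for
    the one unknown that appears in each element of R, row by row.
    """
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = R[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def forward_solve(L, y):
    """Solve L x = y for x by forward substitution (applies L^{-1} to y)."""
    x = []
    for i in range(len(y)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((y[i] - s) / L[i][i])
    return x

# Hypothetical covariance of two experimental residuals.
R = [[4.0, 2.0],
     [2.0, 1.25]]
L = cholesky_lower(R)        # lower-triangular factor of R

# One vector of experimental residuals mapped back to whitened residuals.
y = [2.0, 1.5]
x = forward_solve(L, y)
print(L, x)
```

The triangular factor is one of many matrices whose product with its own transpose reproduces the covariance (any orthogonal rotation of it works equally well), so it need not equal the transformation that actually generated the data. Its inverse nonetheless yields residuals that are uncorrelated and of unit energy, which is all the optimization requires.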

The multivariate spectrum of experimental residuals $\mathbf{y}$ is the matrix $\mathbf{R} = E(\mathbf{y}\mathbf{y}^T)$. For optimum model finding, the experimental residuals (squared) should be weighted inversely (matrix inverse) by their multivariate spectrum.
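A short worked step shows why inverse weighting by the multivariate spectrum is the same thing as minimizing the energy of the ideal residuals. It assumes that the transformation from ideal to experimental residuals, written here as $\mathbf{B}$ with $\mathbf{y} = \mathbf{B}\mathbf{x}$ and $\mathbf{R} = \mathbf{B}\mathbf{B}^T$, is invertible:

$$
\mathbf{y}^T \mathbf{R}^{-1} \mathbf{y}
= \mathbf{y}^T (\mathbf{B}\mathbf{B}^T)^{-1} \mathbf{y}
= \mathbf{y}^T \mathbf{B}^{-T} \mathbf{B}^{-1} \mathbf{y}
= (\mathbf{B}^{-1}\mathbf{y})^T (\mathbf{B}^{-1}\mathbf{y})
= \mathbf{x}^T \mathbf{x}
$$

So the inversely weighted quadratic form of the experimental residuals equals the plain sum of squares of the uncorrelated, unit-energy residuals.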

If I were a little stronger at analysis (or rhetoric) I would tell you that the optimizer's preconditioned variable is the statistician's IID (Independent Identically Distributed) random variable. For stationary (statistically constant) signals and images, the operator that produces this variable from the model is the model-space PEF. Echo soundings and **interval velocity** have statistical properties that change with depth. There, that operator is a diagonal weighting matrix (perhaps applied before or after a PEF).

2013-07-26