Working in an optimization application, we begin from residuals between theory and practice. These residuals can be scaled to make new optimization residuals before we start minimizing their energy. What scaling should we use? The scaling can be a simple weighting function or a filter. A filter is simply a weighting function in Fourier space.
The basic idea is common sense, and it also comes to us as a result proven by Gauss and from the theory of statistical signal processing: the optimization residuals should be roughly of equal scale. This makes sense because squaring magnifies differences in scale, so anything small will be ignored while anything large will dominate. Scaling optimization residuals to be in a common range makes them all equally influential on the final solution. Not only should optimization residuals be of like scale in physical space, they should be of like scale in Fourier space or eigenvector space, or any other space that we might use to represent the optimization residuals. This implies that the optimization residuals should be uncorrelated. If the optimization residuals were correlated, they would have a spectrum that was not white, and "not white" means of differing sizes in Fourier space. Thus the optimization residuals should be orthogonal and of unit scale, much as Fourier components or eigenvectors are orthonormal.
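Let us approach the problem backwards. Suppose we have two random variables that we take to be the ideal optimization residuals $x_1$ and $x_2$. In reality the two may be few or trillions. In the language of statistics, the optimization residuals are expected to have zero mean, an idea that is formalized by writing $E(x_1)=0$ and $E(x_2)=0$. Likewise these ideal optimization residuals have equal energy, $E(x_1^2)=1$ and $E(x_2^2)=1$. Finally, these two optimization residuals are uncorrelated, a condition which is written as $E(x_1 x_2)=0$. The expectation symbol $E$ is like an average (a normalized summation) over many instances of the random variable.
Now suppose there exists a transformation $\mathbf{B}$ from these ideal optimization residuals to two experimental residuals $y_1$ and $y_2$, say $\mathbf{y}=\mathbf{B}\mathbf{x}$ where
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
The experimental residuals $y_1$ and $y_2$ are likely to be neither uncorrelated nor of equal energy. Their covariance matrix follows from the three conditions on the ideal residuals, which together say $E(\mathbf{x}\mathbf{x}^T)=\mathbf{I}$:
$$E(\mathbf{y}\mathbf{y}^T) = \mathbf{B}\,E(\mathbf{x}\mathbf{x}^T)\,\mathbf{B}^T = \mathbf{B}\mathbf{B}^T$$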
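A small numerical experiment may make the setup concrete. The sketch below draws many instances of two ideal residuals (zero mean, unit energy, uncorrelated), mixes them with a made-up 2x2 transformation, and averages the products of the resulting experimental residuals. The matrix values, the Gaussian choice for the ideal residuals, and the function names are all assumptions for illustration; the point is only that the averaged products approach the transformation times its own transpose.

```python
import random

def simulate_covariance(B, n_samples=100_000, seed=0):
    """Average y y^T over many draws of y = B x, where the components of x
    are independent, zero-mean, unit-variance Gaussian samples."""
    rng = random.Random(seed)
    n = len(B)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [sum(B[i][k] * x[k] for k in range(n)) for i in range(n)]
        for i in range(n):
            for j in range(n):
                acc[i][j] += y[i] * y[j]
    return [[a / n_samples for a in row] for row in acc]

def times_transpose(B):
    """Return the matrix product B B^T."""
    n = len(B)
    return [[sum(B[i][k] * B[j][k] for k in range(len(B[0])))
             for j in range(n)] for i in range(n)]

# Hypothetical mixing matrix, chosen only for this illustration.
B = [[2.0, 0.0],
     [1.0, 0.5]]

R_estimated = simulate_covariance(B)   # sample average of y y^T
R_theory = times_transpose(B)          # [[4.0, 2.0], [2.0, 1.25]]
```

The off-diagonal entry of `R_theory` is nonzero and the diagonal entries differ, so these experimental residuals are both correlated and of unequal energy, even though the ideal residuals behind them are neither.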
Given a covariance matrix $\mathbf{R}=E(\mathbf{y}\mathbf{y}^T)$, there is a simple well-known method called Cholesky factorization that will factor $\mathbf{R}$ into two parts like $\mathbf{B}$ and $\mathbf{B}^T$. The method creates for us either an upper or a lower triangular matrix (our choice) for $\mathbf{B}$. You can easily reinvent the Cholesky method if you multiply the symbols for two triangular matrices like $\mathbf{B}$ and $\mathbf{B}^T$ and notice the procedure that works backwards from $\mathbf{R}$ to $\mathbf{B}$. The experimenter seeks not $\mathbf{B}$, however, but its inverse $\mathbf{B}^{-1}$, the matrix that takes us from the experimental residuals to the ideal optimization residuals that are uncorrelated and of equal energies. The Cholesky factorization costs about $N^3$ computations for $N$ residuals, which is about the same as the cost of the matrix inversion of $\mathbf{R}$ or $\mathbf{B}$. For geophysical maps and other functions on Cartesian spaces, the Prediction Error Filter (PEF) accomplishes the same general goal and has the advantage that we have already learned how to perform the operation using operators instead of matrices.
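The "reinvention" described above can be sketched in a few lines. Multiplying a lower triangular matrix by its transpose and matching entries one at a time, row by row, gives each unknown in terms of ones already found. The code below is a minimal pure-Python sketch of that procedure, not a production routine; the function names and the 2x2 covariance values are assumptions for illustration. A forward substitution then applies the inverse of the triangular factor to a residual vector without ever forming the inverse explicitly.

```python
import math

def cholesky_lower(R):
    """Factor a symmetric positive-definite matrix R into L L^T,
    with L lower triangular, by matching entries row by row."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the entries of L that are already known.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = R[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L

def forward_solve(L, y):
    """Solve L x = y for lower triangular L (applies the inverse of L to y)."""
    x = []
    for i in range(len(y)):
        s = sum(L[i][k] * x[k] for k in range(i))
        x.append((y[i] - s) / L[i][i])
    return x

# Hypothetical covariance of two experimental residuals.
R = [[4.0, 2.0],
     [2.0, 1.25]]

L = cholesky_lower(R)             # [[2.0, 0.0], [1.0, 0.5]]
x = forward_solve(L, [2.0, 1.5])  # [1.0, 1.0]
```

One caveat worth keeping in mind: the triangular factor is only one of many matrices whose product with its own transpose reproduces the covariance, so it need not equal the transformation that actually generated the data. Its inverse still yields residuals that are uncorrelated and of unit energy, which is all the optimization requires.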
|The multivariate spectrum of experimental residuals $\mathbf{y}$ is the matrix $\mathbf{R}=E(\mathbf{y}\mathbf{y}^T)$. For optimum model finding, the experimental residuals (squared) should be weighted inversely (matrix inverse) by their multivariate spectrum.|
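The boxed statement can be tied to the triangular factor in one line. As a sketch, write $\mathbf{y}$ for the vector of experimental residuals, $\mathbf{R}$ for its covariance, and $\mathbf{B}$ for any factor satisfying $\mathbf{R}=\mathbf{B}\mathbf{B}^T$ (symbols assumed here for illustration). Then the energy weighted by the inverse covariance is simply the unweighted energy of the residuals after the inverse factor has been applied:

```latex
\mathbf{y}^T \mathbf{R}^{-1} \mathbf{y}
  \;=\; \mathbf{y}^T \left(\mathbf{B}\mathbf{B}^T\right)^{-1} \mathbf{y}
  \;=\; \left(\mathbf{B}^{-1}\mathbf{y}\right)^T \left(\mathbf{B}^{-1}\mathbf{y}\right)
  \;=\; \left\| \mathbf{B}^{-1}\mathbf{y} \right\|^2
```

So minimizing the inversely weighted energy of the experimental residuals is the same as minimizing the plain energy of residuals that have first been made uncorrelated and of equal scale.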
If I were a little stronger at analysis (or rhetoric) I would tell you that the optimizer's preconditioned variable is the statistician's IID (Independent, Identically Distributed) random variable. For stationary (statistically constant) signals and images, the operator that maps the model to this variable is the model-space PEF. Echo soundings and interval velocity have statistical properties that change with depth. There the operator is a diagonal weighting matrix (perhaps applied before or after a PEF).
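For the stationary case, a one-dimensional toy example may help show what the PEF does. The sketch below builds a synthetic correlated signal, estimates a two-term prediction-error filter $(1, -a)$ by least squares, and compares the lag-1 correlation before and after filtering. The synthetic signal model, the filter length, and every function name are assumptions made for illustration; practical PEFs are longer and often multidimensional.

```python
import random

def lag1_correlation(s):
    """Normalized lag-1 autocorrelation of a zero-mean sequence."""
    num = sum(s[t] * s[t - 1] for t in range(1, len(s)))
    den = sum(v * v for v in s)
    return num / den

def estimate_pef2(s):
    """Least-squares two-term PEF (1, -a): the value of a that
    minimizes the sum over t of (s[t] - a * s[t-1])**2."""
    num = sum(s[t] * s[t - 1] for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for t in range(1, len(s)))
    return num / den

def apply_pef2(s, a):
    """Convolve s with the filter (1, -a), dropping the first sample."""
    return [s[t] - a * s[t - 1] for t in range(1, len(s))]

# Synthetic stationary signal: each sample is 0.9 times the previous
# sample plus fresh white noise (an assumed model, for illustration).
rng = random.Random(1)
signal = [0.0]
for _ in range(20_000):
    signal.append(0.9 * signal[-1] + rng.gauss(0.0, 1.0))

a = estimate_pef2(signal)
residual = apply_pef2(signal, a)

corr_before = lag1_correlation(signal)    # strongly correlated input
corr_after = lag1_correlation(residual)   # close to zero after the PEF
```

The filter output is close to uncorrelated, which is the operator counterpart of applying the inverse triangular factor to a residual vector. For signals whose statistics change with depth, a fixed filter like this one would not be enough on its own, which is where a diagonal weighting enters.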