
EMPTY BINS AND PRECONDITIONING

There are at least three ways to fill empty bins. Two require a roughening operator $ \bold A$ , while the third requires a smoothing operator, which (for comparison purposes) we denote $ \bold A^{-1}$ . The three methods are generally equivalent, though they differ in significant details.

The original way in Chapter [*] is to restore missing data by ensuring that the restored data, after specified filtering, have minimum energy, say $ \bold A\bold m\approx \bold 0$ . Introduce the selection mask operator $ \bold K$ , a diagonal matrix with ones at the known data and zeros elsewhere (at the missing data). Thus, $ \bold 0 \approx \bold A(\bold I-\bold K+\bold K)\bold m $ or:

$\displaystyle \bold 0 \quad\approx\quad \bold A (\bold I-\bold K) \bold m  + \bold A \bold m_k\;,$ (43)

where we define $ \bold m_k$ to be the data with missing values set to zero by $ \bold m_k=\bold K\bold m$ .
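
For concreteness, here is a minimal NumPy sketch of solving equation (43) for the missing values. Everything in it is an assumption for illustration, not from the text: a 12-sample 1-D model, a second-difference roughener for $ \bold A$ , made-up mask locations and data values, and a dense least-squares solve standing in for an iterative one.

import numpy as np

n = 12
rng = np.random.default_rng(0)

# Roughening operator A: second difference, shape (n-2, n).
A = np.zeros((n - 2, n))
for i in range(n - 2):
    A[i, i:i + 3] = [1.0, -2.0, 1.0]

# Selection mask K as a boolean vector: True at known samples.
known = np.zeros(n, dtype=bool)
known[[0, 3, 7, 11]] = True
m_k = np.zeros(n)
m_k[known] = rng.normal(size=int(known.sum()))   # known data, zeros elsewhere

# Equation (43): 0 ~ A (I - K) m + A m_k.  Only the missing components
# of m are free, so keep just those columns of A and solve for them.
missing = ~known
m_fill, *_ = np.linalg.lstsq(A[:, missing], -A @ m_k, rcond=None)

m = m_k.copy()
m[missing] = m_fill          # restored model; known samples are untouched
print(np.round(m, 3))

Because only the missing components of $ \bold m$ are free, the solve restricts $ \bold A$ to those columns; the known samples are honored exactly.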

A second way to find missing data is with the set of goals:

\begin{displaymath}
\begin{array}{rcl}
\bold 0 & \approx & \bold K \bold m \;-\; \bold m_k \\
\bold 0 & \approx & \epsilon \bold A \bold m
\end{array}
\end{displaymath} (44)

and take the limit as the scalar $ \epsilon \rightarrow 0$ . In that limit, we should obtain the same result as equation (43).
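
The limit is easy to watch numerically. The self-contained sketch below (same assumed roughener, mask, and made-up data as before) stacks $ \bold K$ and $ \epsilon \bold A$ into one least-squares system and sweeps $ \epsilon$ downward; the misfit on the known samples shrinks toward zero, recovering the strictly-honoring behavior of equation (43).

import numpy as np

n = 12
A = np.zeros((n - 2, n))                 # second-difference roughener A
for i in range(n - 2):
    A[i, i:i + 3] = [1.0, -2.0, 1.0]

known = np.zeros(n, dtype=bool)          # selection mask K, boolean form
known[[0, 3, 7, 11]] = True
K = np.diag(known.astype(float))
m_k = np.zeros(n)
m_k[known] = [1.0, -0.5, 2.0, 0.3]       # made-up known data, zeros elsewhere

for eps in (1.0, 1e-2, 1e-4):
    G = np.vstack([K, eps * A])          # stacked goals: K m ~ m_k, eps A m ~ 0
    d = np.concatenate([m_k, np.zeros(n - 2)])
    m, *_ = np.linalg.lstsq(G, d, rcond=None)
    print(eps, np.round(m[known] - m_k[known], 6))   # known-data misfit -> 0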

There is an important philosophical difference between the first method and the second. The first method strictly honors the known data. The second acknowledges that when the data misfit the regularization theory, the fault might lie with the data, so the data need not be strictly honored. Just what balance is proper falls to the numerical choice of $ \epsilon$ , a nontrivial topic.

A third way to find missing data is to precondition equation (44), namely, to try the substitution $ \bold m = \bold A^{-1} \bold p$ :

\begin{displaymath}
\begin{array}{rcl}
\bold 0 & \approx & \bold K \bold A^{-1} \bold p \;-\; \bold m_k \\
\bold 0 & \approx & \epsilon \bold p
\end{array}
\end{displaymath} (45)

There is no simple way of knowing beforehand the best value of $ \epsilon$ . Practitioners like to see solutions for various values of $ \epsilon$ ; of course, that can cost a lot of computational effort. Practical exploratory data analysis is more pragmatic. Without a simple, clear theoretical basis, analysts generally begin from $ \bold p=\bold 0$ and abandon the fitting goal $ \epsilon \bold I \bold p\approx \bold 0$ . Implicitly, they take $ \epsilon=0$ . Then, they examine the solution as a function of iteration count, imagining that the solution at larger iteration counts corresponds to smaller $ \epsilon$ . There is an eigenvector analysis indicating some basis for this approach, but I believe there is no firm guidance.
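
A toy version of this pragmatic procedure is sketched below. The smoother standing in for $ \bold A^{-1}$ is double causal integration, and the iteration is plain steepest descent with exact line search; both are assumptions of this sketch, not choices made in the text. Here $ \epsilon$ is dropped, iteration starts from $ \bold p=\bold 0$ , and the model is printed after each step so one can watch it evolve with iteration count.

import numpy as np

n = 12
C = np.tril(np.ones((n, n)))             # causal integration
S = C @ C                                # smoother standing in for A^{-1}

known = np.zeros(n, dtype=bool)          # selection mask K, boolean form
known[[0, 3, 7, 11]] = True
K = np.diag(known.astype(float))
m_k = np.zeros(n)
m_k[known] = [1.0, -0.5, 2.0, 0.3]       # made-up known data, zeros elsewhere

F = K @ S                                # forward operator of 0 ~ K A^{-1} p - m_k
p = np.zeros(n)
r = m_k - F @ p                          # residual of the remaining goal

for it in range(1, 6):                   # steepest descent, exact line search
    g = F.T @ r                          # negative gradient of 0.5*||F p - m_k||^2
    Fg = F @ g
    if Fg @ Fg == 0.0:                   # fully converged; avoid divide by zero
        break
    alpha = (g @ g) / (Fg @ Fg)
    p += alpha * g
    r -= alpha * Fg
    print(it, np.round(S @ p, 2))        # the model m = A^{-1} p after `it` steps

Because $ \bold K\bold A^{-1}$ is rank-deficient, starting from $ \bold p=\bold 0$ keeps the iterates in its row space, and stopping early leaves the model comparatively smooth; loosely, that is the sense in which iteration count trades off against $ \epsilon$ .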


