Next: Global vector distributions Up: Parallel sweeping preconditioner Previous: Parallel multifrontal algorithms


Selective inversion

The lackluster scalability of dense triangular solves is well known, and a scheme known as selective inversion was introduced in (32) and implemented in (31) specifically to avoid this issue. The approach directly inverts every distributed dense triangular matrix that would otherwise be solved against during a standard multifrontal triangular solve. Thus, after performing selective inversion, every parallel dense triangular solve can be translated into a parallel dense triangular matrix-vector multiply.

Suppose that we have paused a multifrontal $ LDL^T$ factorization just before processing a particular front, $ F$ , which corresponds to some supernode, $ \mathcal{S}$ . Then all of the fronts for the descendants of $ \mathcal{S}$ have already been handled, and $ F$ can be partitioned as

$\displaystyle F = \left(\begin{array}{cc} F_{TL} & \star \\ F_{BL} & F_{BR}\end{array}\right),$ (6)

where $ F_{TL}$ holds the Schur complement of supernode $ \mathcal{S}$ with respect to all of its descendants, $ F_{BL}$ represents the coupling of $ \mathcal{S}$ and its descendants to $ \mathcal{S}$ 's ancestors, and $ F_{BR}$ holds the Schur complement updates from the descendants of $ \mathcal{S}$ destined for the ancestors of $ \mathcal{S}$ . Using hats to denote input states, e.g., $ \hat F_{TL}$ for the input state of $ F_{TL}$ , the first step in processing the frontal matrix $ F$ is to overwrite $ F_{TL}$ with its $ LDL^T$ factorization; that is, $ \hat F_{TL}$ is overwritten with the strictly lower portion of a unit lower triangular matrix $ L_F$ and a diagonal matrix $ D_F$ such that $ \hat F_{TL} = L_F D_F L_F^T$ .

The partial factorization of $ F$ can then be completed via the following steps:

  1. Solve against $ L_F^T$ to form $ F_{BL} := F_{BL} L_F^{-T}$ .
  2. Form the temporary copy $ Z_{BL} := F_{BL}$ .
  3. Finalize the coupling matrix as $ F_{BL} := F_{BL} D_F^{-1}$ .
  4. Finalize the update matrix as $ F_{BR} := F_{BR} - \hat F_{BL} \hat F_{TL}^{-1} \hat F_{BL}^T = F_{BR} - Z_{BL} F_{BL}^T$ .
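The four steps above, together with the initial $ LDL^T$ factorization of $ F_{TL}$ , can be sketched for a single small dense front stored as a NumPy array. This is purely illustrative: the function name is hypothetical, and NumPy calls stand in for the distributed BLAS-3 kernels an actual multifrontal code would use.

```python
import numpy as np

def partial_factor_front(F, k):
    """Partially factor a dense front F in place, where k is the supernode
    size: F_TL = F[:k,:k], F_BL = F[k:,:k], F_BR = F[k:,k:].

    Hypothetical sketch mirroring steps 1-4 in the text; returns (L_F, d_F)
    rather than overwriting F_TL, and assumes no pivoting is needed.
    """
    A = F[:k, :k].copy()
    L_F = np.eye(k)
    d_F = np.empty(k)
    # Unpivoted LDL^T of the top-left block: F_TL = L_F diag(d_F) L_F^T.
    for j in range(k):
        d_F[j] = A[j, j]
        L_F[j + 1:, j] = A[j + 1:, j] / d_F[j]
        A[j + 1:, j + 1:] -= d_F[j] * np.outer(L_F[j + 1:, j], L_F[j + 1:, j])
    # Step 1: F_BL := F_BL L_F^{-T}  (a solve against L_F^T).
    F[k:, :k] = np.linalg.solve(L_F, F[k:, :k].T).T
    # Step 2: form the temporary copy Z_BL := F_BL.
    Z_BL = F[k:, :k].copy()
    # Step 3: finalize the coupling matrix, F_BL := F_BL D_F^{-1}.
    F[k:, :k] /= d_F
    # Step 4: the Schur complement update, F_BR := F_BR - Z_BL F_BL^T.
    F[k:, k:] -= Z_BL @ F[k:, :k].T
    return L_F, d_F
```

After this routine, $ F_{BR}$ holds exactly the update to be added onto the parent front, and $ F_{TL}$ (via $ L_F$ , $ D_F$ ) and $ F_{BL}$ are in the finalized states used by the multifrontal solve.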
After adding $ F_{BR}$ onto the parent frontal matrix, only $ F_{TL}$ and $ F_{BL}$ are needed in order to perform a multifrontal solve. For instance, applying $ L^{-1}$ to some vector $ x$ proceeds up the elimination tree (starting from the leaves) in a manner similar to the factorization; after handling all of the work for the descendants of some supernode $ \mathcal{S}$ , only a few dense linear algebra operations with $ \mathcal{S}$ 's corresponding frontal matrix, say $ F$ , are required. Denoting the portion of $ x$ corresponding to the degrees of freedom in supernode $ \mathcal{S}$ by $ x_{\mathcal{S}}$ , we must perform:
  1. $ x_{\mathcal{S}} := L_F^{-1} x_{\mathcal{S}}$
  2. Form the update $ x_U := -F_{BL} x_{\mathcal{S}}$ .
  3. Add $ x_U$ onto the entries of $ x$ corresponding to the parent supernode.
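The per-supernode solve steps above can be sketched in NumPy as follows; the function name is hypothetical, and a real multifrontal solve would perform these operations on distributed data, with step 3 carried out by the caller while traversing the elimination tree.

```python
import numpy as np

def apply_Linv_supernode(L_F, F_BL, x_S):
    """The per-supernode work when applying L^{-1} (illustrative sketch).

    L_F: the unit lower triangular factor of this supernode's F_TL;
    F_BL: the finalized coupling matrix; x_S: the supernode's entries of x.
    Returns the updated x_S and the update x_U destined for the parent.
    """
    # Step 1: x_S := L_F^{-1} x_S.  After selective inversion this
    # triangular solve becomes a matrix-vector multiply with inv(L_F).
    x_S = np.linalg.solve(L_F, x_S)
    # Step 2: form the update x_U := -F_BL x_S.
    x_U = -F_BL @ x_S
    # Step 3 is done by the caller: add x_U onto the parent's entries of x.
    return x_S, x_U
```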
The key insight of selective inversion is that, if we invert each distributed dense unit lower triangular matrix $ L_F$ in place, all of the parallel dense triangular solves in a multifrontal triangular solve are replaced by parallel dense matrix-vector multiplies. It is also observed in (32) that the work required for the selective inversion is typically only a modest percentage of the work required for the multifrontal factorization, and that the overhead of the selective inversion is easily recouped if there are several right-hand sides to solve against.
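As a toy illustration of this trade-off on a single small front (NumPy again standing in for the distributed kernels): the one-time inversion of $ L_F$ lets every subsequent right-hand side be handled by a matrix-vector multiply that produces the same result as the triangular solve.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
# A unit lower triangular factor, as produced by the LDL^T factorization.
L_F = np.tril(rng.standard_normal((k, k)), -1) + np.eye(k)

# Selective inversion: a one-time cost per front (done in place, in
# parallel, in a real code).
L_F_inv = np.linalg.inv(L_F)

# Every later right-hand side now needs only a matrix-vector multiply,
# which parallelizes far better than a distributed triangular solve.
b = rng.standard_normal(k)
x_solve = np.linalg.solve(L_F, b)  # the original triangular solve
x_mv = L_F_inv @ b                 # the replacement after selective inversion
```

The two results agree; the benefit is purely in the parallel efficiency of the repeated application, which is why the overhead is recouped once several right-hand sides must be solved against.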

Since each application of the sweeping preconditioner requires two multifrontal solves for each of the $ m=O(n/\gamma)$ subdomains, which are relatively small and likely distributed over a large number of processes, selective inversion will be shown to yield a very large performance improvement. We also note that, while it is widely believed that direct inversion is numerically unstable, Druinsky and Toledo (11) review (apparently obscure) results dating back to Wilkinson (42) which show that $ x := \mathrm{inv}(A)\!*\!b$ is as accurate as a backwards-stable solve under reasonable assumptions on the accuracy of $ \mathrm{inv}(A)$ . Since $ \mathrm{inv}(A)\!*\!b$ is argued to be more accurate when the columns of $ \mathrm{inv}(A)$ have been computed with a backwards-stable solver, and both $ \mathrm{inv}(F_{TL})$ and $ \mathrm{inv}(F_{TL}^T)$ must be applied after selective inversion, it might be worthwhile to modify selective inversion to compute and store two different inverses of each $ F_{TL}$ : one by columns and one by rows.
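As a toy sketch of this last idea, the two inverses could be formed by backwards-stable solves against the identity, one from $ F_{TL}$ and one from $ F_{TL}^T$ ; NumPy stands in for the distributed kernels, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# A stand-in for a well-conditioned F_TL.
A = rng.standard_normal((n, n)) + n * np.eye(n)
I = np.eye(n)

# Inverse "by columns": column j is obtained by stably solving A x = e_j.
inv_cols = np.linalg.solve(A, I)
# Second inverse, for applying inv(A^T): column j stably solves A^T x = e_j,
# which amounts to the rows of inv(A) computed column-stably.
inv_rows_T = np.linalg.solve(A.T, I)

b = rng.standard_normal(n)
x = inv_cols @ b    # stands in for a solve against F_TL
y = inv_rows_T @ b  # stands in for a solve against F_TL^T
```

Mathematically the two stored matrices are transposes of one another; the point of storing both would be that each application, with $ A$ or with $ A^T$ , uses the copy whose columns were computed stably for that direction.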


2014-08-20