next up previous [pdf]

Next: Related work Up: Introduction Previous: Introduction

Moving PML sweeping preconditioner

The focus of this paper is on parallelization of the second class of sweeping preconditioners mentioned above, which makes use of auxiliary problems with artificial radiation boundary conditions in order to approximate the Schur complements that arise during block $ LDL^T$ factorization. The approach is referred to as a moving PML preconditioner since the introductory paper represented the artificial radiation boundary conditions using PML.

One interpretation of radiation conditions is that they allow for the analysis of a finite portion of an infinite domain, as their purpose is to enforce the condition that waves propagate outwards and not reflect at the boundary of the truncated domain. This concept is crucial to understanding the Schur complement approximations that take place within the moving PML sweeping preconditioner which is reintroduced in this paper for the sake of completeness.

For the sake of simplicity, we will describe the preconditioner in the context of a second-order finite-difference discretization over the unit cube, with PML used to approximate a radiation boundary condition on the $ x_3=0$ face and homogeneous Dirichlet boundary conditions applied on all other boundaries (an $ x_1 x_3$ cross-section is shown in Fig. 1). If the domain is discretized into an $ n \times n \times n$ grid, then ordering the vertices in the grid such that vertex $ (i_1,i_2,i_3)$ is assigned index $ i_1 + i_2 n + i_3 n^2$ results in a block tridiagonal system of equations, say

$\displaystyle \left(\begin{array}{ccccc} A_{0,0} & A_{1,0}^T & & & \\ A_{1,0} &...
...\left(\begin{array}{c}f_0\\ f_1\\ \vdots\\ f_{n-2}\\ f_{n-1}\end{array}\right),$ (2)

where $ A_{i,j}$ propagates sources from the $ i$ 'th $ x_1 x_2$ plane into the $ j$ 'th $ x_1 x_2$ plane, and the overall linear system is complex symmetric (not Hermitian) due to the imaginary terms introduced by the PML (15).

plane-with-single-pml
plane-with-single-pml
Figure 1.
An $ x_1 x_3$ cross section of a cube with PML on its $ x_3=0$ face. The domain is shaded in a manner which loosely corresponds to its extension into the complex plane.
[pdf] [png] [tikz]

If we were to ignore the sparsity within each block, then the following naïve factorization and solve algorithms would be appropriate for a direct solver, where the application of $ S_i^{-1}$ implicitly makes use of the factorization of $ S_i$ .


\begin{algorithm2e}
% latex2html id marker 87\DontPrintSemicolon
$S_0 := A_{0,...
...DL^T$\ factorization.
$O(n^7)=O(N^{7/3})$\ work is required.}
\end{algorithm2e}


\begin{algorithm2e}
% latex2html id marker 99\DontPrintSemicolon
\tcp{Apply $L...
... block $LDL^T$\ solve. $O(n^5)=O(N^{5/3})$\ work is required.}
\end{algorithm2e}
While the computational complexities of Algs. 0.0.1 and 0.0.2 are significantly higher than multifrontal algorithms (21,27,12), which have $ O(N^2)$ factorization complexity and $ O(N^{4/3})$ solve complexity for regular three-dimensional meshes, they are the conceptual starting points of the sweeping preconditioner.% latex2html id marker 3333
\setcounter{footnote}{5}\fnsymbol{footnote}

Suppose that we paused Alg. 0.0.1 after computing the $ i$ 'th Schur complement, $ S_i$ , where the $ i$ 'th $ x_1 x_2$ plane is in the interior of the domain (i.e., it is not in a PML region). Due to the ordering imposed on the degrees of freedom of the discretization, the first $ i$ Schur complements are equivalent to those that would have been produced from applying Alg. 0.0.1 to an auxiliary problem formed by placing a homogeneous Dirichlet boundary condition on the $ (i+1)$ 'th $ x_1 x_2$ plane and ignoring all of the successive planes (see the left half of Fig. 2). While this observation does not immediately yield any computational savings, it does allow for a qualitative description of the inverse of the $ i$ 'th Schur complement, $ S_i^{-1}$ : it is the restriction of the half-space Green's function of the exact auxiliary problem onto the $ i$ 'th $ x_1 x_2$ plane (recall that PML can be interpreted as approximating the effect of a domain extending to infinity).

The main approximation made in the sweeping preconditioner can now be succinctly described: since $ S_i^{-1}$ is a restricted half-space Green's function which incorporates the velocity field of the first $ i$ planes, it is natural to approximate it with another restricted half-space Green's function which only takes into account the local velocity field, i.e., by moving the PML next to the $ i$ 'th plane (see the right half of Fig. 2).

auxiliary
auxiliary
Figure 2.
(Left) A depiction of the portion of the domain involved in the computation of the Schur complement of an $ x_1 x_2$ plane (marked with the dashed line) with respect to all of the planes to its left during execution of Alg. 0.0.1. (Middle) An equivalent auxiliary problem which generates the same Schur complement; the original domain is truncated just to the right of the marked plane and a homogeneous Dirichlet boundary condition is placed on the cut. (Right) A local auxiliary problem for generating an approximation to the relevant Schur complement; the radiation boundary condition of the exact auxiliary problem is moved next to the marked plane.
[pdf] [png] [tikz]

If we use $ \gamma(\omega)$ to denote the number of grid points of PML as a function of the frequency, $ \omega$ , then it is important to recognize that the depth of the approximate auxiliary problems in the $ x_3$ direction is $ \Omega(\gamma)$ .% latex2html id marker 3377
\setcounter{footnote}{6}\fnsymbol{footnote}If the depth of the approximate auxiliary problems was $ O(1)$ , then combining nested dissection with the multifrontal method over the regular $ n \times n \times O(1)$ mesh would only require $ O(n^3)$ work and $ O(n^2 \log n)$ storage (21). However, if the PML size is a slowly growing function of frequency, then applying 2D nested dissection to the quasi-2D domain requires $ O(\gamma^3 n^3)$ work and $ O(\gamma^2 n^2 \log n)$ storage, as the number of degrees of freedom in each front increases by a factor of $ \gamma$ and dense factorizations have cubic complexity.

Let us denote the quasi-2D discretization of the local auxiliary problem for the $ i$ 'th plane as $ H_i$ , and its corresponding approximation to the Schur complement $ S_i$ as $ \tilde S_i$ . Since $ \tilde S_i$ is essentially a dense matrix, it is advantageous to come up with an abstract scheme for applying $ \tilde S_i^{-1}$ : Assuming that $ H_i$ was ordered in a manner consistent with the $ (i_1,i_2,i_3) \mapsto i_1+i_2 n+i_3 n^2$ global ordering, the degrees of freedom corresponding to the $ i$ 'th plane come last and we find that

$\displaystyle H_i^{-1}=\left(\begin{array}{cc} \star & \star \\ \star & \tilde S_i^{-1} \end{array}\right),$ (3)

where the irrelevant portions of the inverse have been marked with a $ \star$ . Then, trivially,

$\displaystyle H_i^{-1} \left(\begin{array}{c}0\\ u_i\end{array}\right)= \left(\...
...ay}\right)= \left(\begin{array}{c}\star\\ \tilde S_i^{-1}u_i\end{array}\right),$ (4)

which implies a method for quickly computing $ \tilde S_i^{-1} u_i$ given a factorization of $ H_i$ :


\begin{algorithm2e}
% latex2html id marker 163\DontPrintSemicolon
Form $\hat u...
...ization of $H_i$. $O(\gamma^2 n^2 \log n)$\ work is required.}
\end{algorithm2e}
From now on, we write $ T_i$ to refer to the application of the (approximate) inverse of the Schur complement for the $ i$ 'th plane.

There are two more points to discuss before defining the full serial algorithm. The first is that, rather than performing an approximate $ LDL^T$ factorization using a discretization of $ \mathcal{A}$ , the preconditioner is instead built from a discretization of an artificially damped version of the Helmholtz operator, say

$\displaystyle \mathcal{J} \equiv \left[-\Delta - \frac{(\omega+i\alpha)^2}{c^2(x)}\right],$ (5)

where $ \alpha \approx 2 \pi$ is responsible for the artificial damping. This is in contrast to shifted Laplacian preconditioners (16,5), where $ \alpha$ is typically $ O(\omega)$ (18), and our motivation is to avoid introducing large long-range dispersion error by damping the long range interactions in the preconditioner. Just as $ A$ refers to the discretization of the original Helmholtz operator, $ \mathcal{A}$ , we will use $ J$ to refer to the discretization of the artificially damped operator, $ \mathcal{J}$ .

The second point is much easier to motivate: since the artificial PML in each approximate auxiliary problem is of depth $ \gamma(\omega)$ , processing a single plane at a time would require processing $ O(n)$ subdomains with $ O(\gamma^3 n^3)$ work each. Clearly, processing $ O(\gamma)$ planes at once has the same asymptotic complexity as processing a single plane, and so combining $ O(\gamma)$ planes reduces the setup cost from $ O(\gamma^3 N^{4/3})$ to $ O(\gamma ^2 N^{4/3})$ . Similarly, the memory usage becomes $ O(\gamma N \log N)$ instead of $ O(\gamma^2 N \log N)$ .% latex2html id marker 3465
\setcounter{footnote}{7}\fnsymbol{footnote} If we refer to these sets of $ O(\gamma)$ contiguous planes as panels, which we label from 0 to $ m-1$ , where $ m=O(n/\gamma)$ , and correspondingly redefine $ \{u_i\}$ , $ \{f_i\}$ , $ \{S_i\}$ , $ \{T_i\}$ , and $ \{H_i\}$ , we have the following algorithm for setting up an approximate block $ LDL^T$ factorization of the discrete artificially damped Helmholtz operator:


\begin{algorithm2e}
% latex2html id marker 186\DontPrintSemicolon
$S_0 := J_{0...
...ing preconditioner.
$O(\gamma^2 N^{4/3})$\ work is required.}
\end{algorithm2e}

Once the preconditioner is set up, it can be applied using a straightforward modification of Alg. 0.0.2 which avoids redundant solves by combining the application of $ L^{-1}$ and $ D^{-1}$ :


\begin{algorithm2e}
% latex2html id marker 196\DontPrintSemicolon
\tcp{Apply $...
...ping preconditioner.
$O(\gamma N \log N)$\ work is required.}
\end{algorithm2e}
Given that the preconditioner can be set up with $ O(\gamma ^2 N^{4/3})$ work, and applied with $ O(\gamma N \log N)$ work, it provides a near-linear scheme for solving Helmholtz equations if only $ O(1)$ iterations are required for convergence. The experiments in this paper seem to indicate that, as long as the velocity model does not include a large cavity, the sweeping preconditioner indeed only requires $ O(1)$ iterations.

Though this paper is focused on the parallel solution of Helmholtz equations, which are the time-harmonic form of acoustic wave equations, Tsuji et al. have shown that the moving PML sweeping preconditioner is equally effective for time-harmonic Maxwell's equations (39,38), and we believe that the same will hold true for time-harmonic linear elasticity. The rest of the paper will be presented in the context of the Helmholtz equation, but we emphasize that the parallelization should carry over to more general wave equations in a conceptually trivial way.


next up previous [pdf]

Next: Related work Up: Introduction Previous: Introduction

2014-08-20