Monte Carlo inversion

Equipped with the general concepts of time-warping and the choice of 3D GMA described in the previous sections, we propose to solve this moveout inversion problem with a global Monte Carlo inversion method. This choice is motivated by the non-uniqueness of the solution of this inverse problem, which relies only on traveltime (kinematic) data, and by the fact that the 3D GMA depends non-linearly on the associated moveout parameters. By employing a global optimization scheme within a Bayesian framework, we can map out the non-uniqueness of possible solutions and present them as posterior probability distributions, while also honoring the non-linearity of the problem. Here, we largely follow the notation of Mosegaard and Tarantola (1995) and Tarantola (2005) and express the posterior probability density function $\sigma(\mathbf{m})$ as

$\displaystyle \sigma(\mathbf{m}) = k\rho(\mathbf{m}) L(\mathbf{m})~.$ (6)

$\rho(\mathbf{m})$ denotes the prior probability density function of model parameters $\mathbf{m}$ and $L(\mathbf{m})$ represents the likelihood function (with an appropriate normalization constant $k$) that measures the degree of fit between predicted data and observed data. Assuming a prior $\rho(\mathbf{m})$ that we can draw samples from, the goal is to design a selection process based on $L(\mathbf{m})$ that will result in samples whose density directly represents $\sigma(\mathbf{m})$.

Mosegaard and Tarantola (1995) showed that given samples of $\rho(\mathbf{m})$ and a choice of $L(\mathbf{m})$, the selection process can be done by means of the Metropolis-Hastings algorithm. In this study, we define $L(\mathbf{m})$ as:

$\displaystyle L(\mathbf{m}) = \frac{1}{(2 \pi S^2 )^{N/2}}\exp{\left(-\frac{1}{2 S^2} \sum^{N}_{n=1} (F^{\text{obs}}_n - F_n(\mathbf{m}))^2 \right)}~,$ (7)

where for each $t_0$, the misfit between the observed $F^{\text{obs}}_n$ and modeled $F_n(\mathbf{m})$ for all $N$ traces in the considered CMP gather is evaluated and summed according to equation 7. Here, $S$ represents a `hyperparameter' related to data uncertainty, which we choose to express as

$\displaystyle S = \frac{S_{\%}}{100} \sqrt{\frac{\sum^{N}_{n=1} (F^{\text{obs}}_n)^2}{N}}~,$ (8)

with $S_{\%}$ denoting the magnitude of uncertainty expressed in percent with respect to the root mean square (RMS) of the data $F^{\text{obs}}_n$. Because $t_0$ is constant for each CMP event, $S$ can also be considered as a relative uncertainty in the estimated reflection traveltime squared (equation 2).
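The evaluation of equations 7 and 8 for a single CMP event can be sketched as follows. This is a schematic helper, not the authors' implementation: the names `F_obs`, `F_mod`, and `S_pct` are hypothetical, and the normalization is written with $S^2$ so that the Gaussian density integrates to one.

```python
import numpy as np

def likelihood(F_obs, F_mod, S_pct):
    """Gaussian likelihood (equation 7) for one CMP event.

    F_obs, F_mod : observed and modeled moveout data for the N traces
    S_pct        : uncertainty hyperparameter S_% in percent (equation 8)
    """
    N = len(F_obs)
    # Equation 8: S as a percentage of the RMS of the observed data.
    S = (S_pct / 100.0) * np.sqrt(np.sum(F_obs**2) / N)
    misfit = np.sum((F_obs - F_mod)**2)
    return (2.0 * np.pi * S**2) ** (-N / 2.0) * np.exp(-misfit / (2.0 * S**2))
```

In practice, one would evaluate the logarithm of this expression to avoid numerical underflow when $N$ is large.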

We emphasize that data uncertainty is generally not easily quantified in practice due to the workflow of estimating $F$. To account for this complication, we propose to utilize the transdimensional (hierarchical Bayesian) inversion framework (Sambridge et al., 2006; Malinverno and Briggs, 2004; Sambridge et al., 2013), which permits us to treat $S_{\%}$ as yet another parameter to be estimated during the inversion. In this way, we obviate the need to estimate the data uncertainty explicitly, while correctly honoring its effects in our inversion. Our implementation of the Monte Carlo inversion with Metropolis-Hastings rule can, therefore, be summarized as follows:

  1. Assume that we start at a point $j$ in the model space with an initial guess $\mathbf{m}_j$ drawn from the prior distributions of each model parameter. In the case of the 3D GMA, $\mathbf{m}$ consists of $W_i$, $A_i$, $B_i$, $C_i$, and $S_{\%}$. For uniform prior distributions, a random number generator can be used for drawing samples, whereas for Gaussian prior distributions, a combination of a random number generator and the Box-Muller transform can be used.

  2. Evaluate the current likelihood function $L_j = L(\mathbf{m}_j)$ according to equation 7 with a choice of cutoff offset that depends on the example at hand.

  3. Attempt to move to another point $i$ with a new guess $\mathbf{m}_i$ drawn from the same distribution. This attempt is evaluated according to the Metropolis-Hastings rule: if $L_i \geq L_j$, the move to point $i$ is accepted; if $L_i < L_j$, the move is accepted with probability $L_i/L_j$ and rejected otherwise, in which case we stay at point $j$.

  4. Record the values of $\mathbf{m}$ after the selection.

  5. Repeat steps 2–4 as many times as needed.
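The five steps above can be sketched as a short sampling loop. This is a one-parameter toy with a uniform prior and a user-supplied log-likelihood; the function name, signature, and bounds are hypothetical, and the full problem would draw the vector $\mathbf{m}$ of all moveout parameters instead of a scalar.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_likelihood, prior_bounds, n_iter=5000):
    """Metropolis-Hastings sampling with independent draws from a uniform prior."""
    lo, hi = prior_bounds
    m_j = rng.uniform(lo, hi)          # step 1: initial draw from the prior
    ll_j = log_likelihood(m_j)         # step 2: current log-likelihood
    samples = []
    for _ in range(n_iter):            # step 5: repeat as needed
        m_i = rng.uniform(lo, hi)      # step 3: propose a new prior draw
        ll_i = log_likelihood(m_i)
        # Metropolis-Hastings rule: always accept if L_i >= L_j,
        # otherwise accept with probability L_i / L_j.
        if np.log(rng.uniform()) < ll_i - ll_j:
            m_j, ll_j = m_i, ll_i
        samples.append(m_j)            # step 4: record after the selection
    return np.array(samples)
```

The density of the recorded samples then represents the posterior $\sigma(\mathbf{m})$, as shown by Mosegaard and Tarantola (1995).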

In our workflow, we propose to perform moveout parameter estimation with two inversion runs:

  1. The first run is aimed at estimating $W_i$ using only small-offset data. At this stage, we assume that the prior distribution is uniform within the chosen bounds (min and max values) for each model parameter in $\mathbf{m}$ and specify a small cutoff offset (e.g., where the offset-to-depth ratio is assumed to be unity).
  2. Subsequently, the resulting distributions of $W_i$ from the first run are approximated as Gaussians and used as new prior distributions for $W_i$ in the second inversion run, which uses all available (small- and large-offset) data to estimate $A_i$, $B_i$, and $C_i$. The improved prior distributions of $W_i$ from the first run are expected to help with the estimation of $A_i$, $B_i$, and $C_i$. The prior distributions for the remaining parameters are again assumed to be uniform within some chosen bounds (min and max values).
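The hand-off between the two runs can be sketched as follows: the first-run samples of a parameter (e.g., $W_i$) are summarized by their mean and standard deviation, and a Gaussian sampler built with the Box-Muller transform serves as the second-run prior. The function name is hypothetical, and in practice a library routine such as NumPy's `rng.normal` would be used directly.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_prior_from_run1(samples_W):
    """Fit a Gaussian to first-run posterior samples and return a sampler
    usable as the prior in the second inversion run."""
    mu, sd = samples_W.mean(), samples_W.std()
    def draw():
        # Box-Muller transform: two uniform draws -> one standard normal.
        u1, u2 = rng.uniform(), rng.uniform()
        z = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
        return mu + sd * z
    return draw
```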

After the two runs, the final recorded $\mathbf{m}$ can be visualized as histograms that represent 1D marginal posterior probability distributions of the estimated moveout parameters $\mathbf{m}$. The corresponding posterior probability density function $\sigma(\mathbf{m})$ can subsequently be obtained with appropriate normalization of the histograms. To quantify the `gain of information' from the moveout inversion process, we use the Kullback-Leibler divergence $D_{KL} \in [0,\infty)$ given by

$\displaystyle D_{KL} = \int \sigma(\mathbf{m})\ln{\left( \frac{\sigma(\mathbf{m})}{\rho(\mathbf{m})}\right)} d\mathbf{m}~,$ (9)

which estimates the relative difference between the prior $\rho(\mathbf{m})$ and the posterior $\sigma(\mathbf{m})$. A high $D_{KL}$ value indicates that the posterior distribution is very different from the prior and, therefore, that new information on the moveout parameters has been gained through the inversion process using the provided data (i.e., the moveout traveltime shifts $F^{\text{obs}}_n$ in our problem). On the other hand, $D_{KL}=0$ indicates that the two distributions are identical and no gain has been achieved upon seeing the data. In all of our examples, we report $D_{KL}$ with respect to the uniform prior distributions input to the first inversion run.
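For a single parameter with a uniform prior, equation 9 can be evaluated numerically from the normalized posterior histogram. The sketch below assumes this setup; the function name and the bin count are hypothetical choices, and empty bins are skipped since $0 \ln 0$ is taken as zero.

```python
import numpy as np

def kl_from_histogram(post_samples, prior_lo, prior_hi, bins=50):
    """Discretized equation 9: KL divergence of a posterior sample
    histogram against a uniform prior on [prior_lo, prior_hi]."""
    counts, edges = np.histogram(post_samples, bins=bins,
                                 range=(prior_lo, prior_hi), density=True)
    width = edges[1] - edges[0]
    rho = 1.0 / (prior_hi - prior_lo)   # uniform prior density
    mask = counts > 0                   # 0 * ln(0) contributes nothing
    return np.sum(counts[mask] * np.log(counts[mask] / rho) * width)
```

A sharply peaked posterior yields a large $D_{KL}$, whereas samples drawn from the prior itself yield a value near zero.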

Even though $D_{KL}$ is generally unbounded, it is possible to compute a benchmark value for a simple case. For example, if both the prior and the posterior are Gaussian distributions, the Kullback-Leibler divergence can be expressed as

$\displaystyle D_{KL} = \ln{\left(\sqrt{\frac{v_\rho}{v_\sigma}}~\right)} + \frac{v_\sigma +(\mu_\sigma-\mu_\rho)^2}{2 v_\rho} -\frac{1}{2}~,$ (10)

where $v$ and $\mu$ denote the variance and mean of each distribution, with subscripts indicating the prior ($\rho$) and posterior ($\sigma$). It follows that if no information is gained during the inversion process and the posterior equals the prior ($v_\sigma = v_\rho$ and $\mu_\sigma = \mu_\rho$), then $D_{KL}=0$. If the standard deviation of the posterior reduces to half that of the prior with the same mean, the Kullback-Leibler divergence is equal to

$\displaystyle D_{KL} = \ln{(2)} + \frac{1}{8} -\frac{1}{2} \approx 0.318 ~.$ (11)
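The closed-form expression of equation 10 is simple to evaluate directly; the short sketch below (with hypothetical argument names) reproduces both limiting cases, $D_{KL}=0$ for identical distributions and the benchmark value of equation 11 for a halved posterior standard deviation.

```python
import numpy as np

def kl_gaussians(mu_s, v_s, mu_r, v_r):
    """Equation 10: KL divergence between a Gaussian posterior
    (mean mu_s, variance v_s) and a Gaussian prior (mu_r, v_r)."""
    return (np.log(np.sqrt(v_r / v_s))
            + (v_s + (mu_s - mu_r) ** 2) / (2.0 * v_r)
            - 0.5)
```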


2024-07-04