Code organization

To align with the GPU block layout, the thickness of the CPML boundary is set to 32. Most of the CUDA kernels are configured with a block size of 16x16; the exceptions are the kernels that initialize and compute the CPML boundary area, which use special configurations. The CPML variables are initialized along the x and z axes with the CUDA kernels cuda_init_abcz(...) and cuda_init_abcx(...). When device_alloc(...) is invoked to allocate memory, the variable phost controls what percentage of the effective boundary is saved in host memory and what percentage in device memory; the host part is allocated by calling cudaHostAlloc(...). A device pointer to this pinned memory is obtained via cudaHostGetDevicePointer(...).

The wavelet is generated on the device using cuda_ricker_wavelet(...) with a dominant frequency fm and a time delay. A shot is added with a smooth bell transition using cuda_add_bellwlt(...). We implement RTM (of order NJ=2, 4, 6, 8, 10) with the forward and backward propagation functions step_forward(...) and step_backward(...), both of which also use shared memory for faster computation.

The cross-correlation imaging of each shot is done by cuda_cross_correlate(...). The final image is obtained by stacking the images of many shots using cuda_imaging(...). Most of the low-frequency noise can be removed by applying the muting function cuda_mute(...) and the Laplacian filter cuda_laplace_filter(...).



2021-08-31