Venue: Exhibit Hall
Thursday, February 27
Building an Optimized Elastic Finite-difference Propagator from Scratch for FWI on NVIDIA's Latest GPUs
Thursday, February 27, 2025 | Exhibit Hall | 8:30 am - 5:00 pm CST
Limited seating; 30 registrants - SOLD OUT

Speakers: Guillaume Barnier, Guillaume Thomas-Collignon, and Igor Terentyev (NVIDIA)

Schedule:
  • 8:30 - 9:00 am: Check-in + Breakfast
  • 9:00 - 10:00 am: Introduction, Theory Review (PDE + Numerical Scheme)
  • 10:00 - 11:30 am: Initial Implementation + Profiler Report Introduction and Analysis
  • 11:30 - 12:30 pm: Lunch
  • 12:30 - 1:30 pm: Optimization #1 Using Shared Memory
  • 1:30 - 2:30 pm: Optimization #2 Using Asynchronous Shared Memory Loads
  • 2:30 - 3:30 pm: Optimization #3 Using TMA
  • 3:30 - 4:00 pm: Break
  • 4:00 - 5:00 pm: Theory Review on Adjoint System of Equations for FWI, Numerical Implementation, and Differences with Forward
Materials: Attendees are strongly encouraged to bring their own laptops, though the speakers will do their best to keep the workshop accessible to those without one. Power will be available, but please arrive with devices charged, as some outlets may need to be shared.

Abstract: Elastic full waveform inversion (FWI) is becoming the industry standard for subsurface model parameter estimation. However, this technique requires simulating hundreds of thousands of wave propagations by numerically solving a system of partial differential equations (PDEs). Consequently, an efficient numerical implementation on GPUs is critical.
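For context, the first-order velocity-stress formulation commonly used for isotropic elastic wave propagation (the workshop may use a different but equivalent parameterization) reads:

```latex
\rho \frac{\partial v_i}{\partial t} = \frac{\partial \sigma_{ij}}{\partial x_j} + f_i,
\qquad
\frac{\partial \sigma_{ij}}{\partial t}
  = \lambda \, \delta_{ij} \frac{\partial v_k}{\partial x_k}
  + \mu \left( \frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} \right),
```

where $v$ is particle velocity, $\sigma$ is stress, $f$ is a source term, $\rho$ is density, and $\lambda$, $\mu$ are the Lamé parameters; summation over repeated indices is implied.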

In this workshop, we teach attendees how to gradually build finite-difference (FD) propagators for elastic media (isotropic and VTI) optimized for NVIDIA's latest GPU architectures (Ampere, Hopper, and Blackwell).

We provide a brief theoretical review and describe the numerical scheme we implement, which is based on a staggered-grid approach in both time and space. We then gradually implement multiple versions of the forward propagator, starting from a baseline implementation that requires minimal GPU hardware knowledge and ending with our fastest version, which uses asynchronous loads to shared memory. At each step, we use NVIDIA's profiling tool, Nsight Compute (NCU), to identify bottlenecks in our kernels, and we show how to leverage NVIDIA's new hardware features to mitigate them. Finally, we show how to derive and efficiently implement the adjoint propagator required for the elastic FWI gradient computation.