2025 Energy HPC Conference: Full Schedule

8:00am CST

Best Practices in HPC Systems Management

Thursday February 27, 2025 8:00am - 4:00pm CST

Speakers: Practitioners and Experts from Industry, Academia, and National Labs

Organizer: Keith Gray (TotalEnergies)

Schedule:

8:00 - 8:30 am: Check-in + Breakfast
8:30 - 8:35 am: Welcome + Introductions
- Keith Gray, TotalEnergies
8:35 - 9:35 am: TACC Update — Click here to view the recording.
- Tommy Minyard, TACC
9:35 - 10:40 am: Facilities Trends and Best Practices — Click here to view the recording.
- Tommy Minyard, TACC
- Kent Blancett, bp
- Jeremy Singer, ExxonMobil
- Bill Guyton, Shell
- Donny Cooper, TotalEnergies
- Wade Vinson, NVIDIA (Host)
10:40 - 11:00 am: Break
11:00 am - 12:00 pm: Systems Management — Click here to view the recording.
- Ashwini Rathnakara, Shell - Experiences with Singularity, Enroot, and Podman
- Russell Jones, TotalEnergies - Migrating from xCAT to Warewulf
- Jonathon Anderson, CIQ - Surveying the state of CentOS migration
12:00 - 1:00 pm: Lunch
1:00 - 2:00 pm: RoCE in the Blue corner!
- Shawn Hall, Jump Trading
- Donny Cooper, TotalEnergies
2:00 - 2:20 pm: Memory Technologies Update — Click here to view the recording.
- Eric L Pope, HPE
- PJ Waskiewicz, Jump Trading
2:20 - 3:00 pm: Towards an Architecture for Cloud-Native HPC — Click here to view the recording.
- Alex Loddoch, Chevron
- Alex Morris, Microsoft
3:00 - 3:15 pm: Break
3:15 - 4:00 pm: Queueing System Migration Experiences — Click here to view the recording.
- Kent Blancett, bp
- Tim Osborne, ORNL
4:00 pm: Close

Thursday February 27, 2025 8:00am - 4:00pm CST
2nd Floor Room 280

Add-On Workshop

8:15am CST

Devito User and Developer Workshop

Thursday February 27, 2025 8:15am - 4:00pm CST

10th Floor Room 1003

4th Annual Devito User and Developer Workshop
10th Floor, Room 1003 | 8:15am - 4:00 pm; Reception at 5:00pm (organized by workshop speakers)
Limited Seating (50 registrants)

Devito provides a powerful set of abstractions for finite-difference models in applications across the energy and medical industries, combining productivity, portability, and performance through symbolic computation and a flexible code-generation framework. DevitoPRO, an advanced extension of the open-source Devito platform, adds features aimed specifically at high-performance computing (HPC) needs, especially in seismic imaging and inversion.

This workshop features presentations from industry and academia, highlighting experiences and applications of Devito/DevitoPRO in research and production. It includes an overview of ongoing research at Devito Codes and a panel with cloud service providers and hardware vendors discussing emerging trends, opportunities, and challenges in high-performance computing and seismic imaging applications. The workshop will maintain an inclusive atmosphere and will include ample time for discussion.

Materials: A laptop is not required. There will be power, but please charge in advance as some outlets may need to be shared.

Organizers:

Paul Holzhauer (Devito Codes)
Gerard Gorman (Devito Codes)
Fabio Luporini (Devito Codes)

Schedule

8:15-8:45 am: Check-in + Breakfast
8:45 am: Welcome + Introductions
9:00-10:15 am: Invited talks
- Cloud Native Polychromatic Multiparameter FWI backed by Devito — John Washbourne et al. (Chevron)
- Enabling Rapid Interdisciplinary R&D with Devito — Jeremy Tillay (BP)
- Seismic imaging algorithms development and implementation with Devito — Cosmin Macesanu and Yongzhong Wang (TGS), Hao Hu (University of Oklahoma, formerly TGS)
10:15-10:45 am: Break
10:45-11:35 am: Invited talks
- Uncertainty-aware machine-learning enabled velocity-model building with Devito — Felix Herrmann (Georgia Tech)
- Leveraging Devito to image foothills seismic data — Tim MacArthur, Greg Cameron, and Rob Vestrum (Thrust Belt Imaging)
11:35-11:50 am: Q&A
11:50 am-1:00 pm: Lunch
1:00-1:50 pm: Invited talks
- Spatial Hyperparameter Optimisation for Full Waveform Inversion Algorithms — Peter Barnhill (Seimax), Christos Mavropoulos, Ayush Modi, and Adam Kovacs (S-Cube), Mathias Louboutin and Ed Caunt (Devito Codes)
- Galactic Seismic Imaging Study Phase 2: wave-equation-based model building and imaging in shallow water with XWI, Julia, and Devito — Henry Debens, Rob Eliott-Lockhart, and Jenny Moss (Woodside), Mathias Louboutin, Ed Caunt, and Gerard Gorman (Devito Codes), Christos Mavropoulos, Adam Kovacs, Sourajit Debnath, and Tenice Nangoo (S-Cube)
1:50-2:45 pm: Talks from Devito Codes
- 2024 Recap – Elastic solvers, mixed precision, PETSc, new features, and performance review
- 2025 and Beyond – Roadmap
2:45-3:15 pm: Break
3:15-3:45 pm: Annual GSD Panel discussion
- Chair: Elizabeth L'Heureux (BP)
- Panelists: Joe Greenseid (Microsoft), Gerard Gorman (Devito Codes), Dmitriy Tishechkin (AWS), Marc Spieler (NVIDIA), Keith Ritchie (AMD), and Guoquan Chen (Intel)
3:45-4:00 pm: Closing remarks
5:00 pm: Reception jointly organized by NVIDIA and Devito Codes


Add-On Workshop

8:30am CST

Scientific Machine Learning

Thursday February 27, 2025 8:30am - 3:00pm CST

Auditorium

Organizers:

Beatrice Riviere (Rice University)
Matthias Heinkenschloss (Rice University)

Schedule
8:30 - 9:00 am: Check-in + Breakfast
9:00 - 10:00 am: Charbel Farhat (Stanford)
10:00 - 11:00 am: Jonas Actor (Sandia National Lab)
11:00 - 11:20 am: Adrian Celaya (Rice University)
11:20 - 11:40 am: Jonathan Cangelosi (Rice University)
11:40 am - 1:00 pm: Lunch
1:00 - 2:00 pm: Elizabeth Qian (Georgia Tech)
2:00 - 3:00 pm: Benjamin Peherstorfer (NYU)

Speaker: Charbel Farhat (Stanford)
Session: Mechanics-Informed Machine Learning for the Discovery of Constitutive Models

Speaker: Jonas Actor (Sandia National Lab)
Session: Leveraging Approximation Theory for Efficient Scientific Machine Learning

Speaker: Adrian Celaya (Rice University)
Session: Learning Finite Difference and Discontinuous Galerkin Solutions to Elliptic Problems via Numerics-Informed Neural Networks

Speaker: Jonathan Cangelosi (Rice University)
Session: Sensitivity-Driven Surrogate Modeling For Trajectory Optimization

Speaker: Elizabeth Qian (Georgia Tech)
Session: Multifidelity Linear Regression for Scientific Machine Learning from Scarce Data

Speaker: Benjamin Peherstorfer (NYU)
Session: Leveraging Nonlinear Latent Dynamics for Numerically Forecasting High-Dimensional Systems


Add-On Workshop

8:30am CST

Performance Evaluation of GPU Accelerated HPC and AI Applications Using HPCToolkit, TAU, and ParaTools Pro for E4S(TM)

Thursday February 27, 2025 8:30am - 3:15pm CST

1st Floor Room 106

1st Floor, Room 106 | 8:30 am - 3:15 pm
Limited Seating (20 registrants)

Course Skill Level: 25% basic content, 25% intermediate content, and 50% advanced content

Speakers:

John Mellor-Crummey, Professor of Computer Science and of Electrical and Computer Engineering, Rice University
Sameer Shende, Research Professor and Director of the Performance Research Laboratory, University of Oregon

Schedule:

8:30 - 9:00 am: Check-in + Breakfast
9:00 - 9:30 am: Setup ParaTools Pro for E4S on Cloud Platforms
9:30 - 11:30 am: HPCToolkit
11:30 am - 12:30 pm: Lunch
12:30 - 2:30 pm: TAU
2:30 - 2:45 pm: Break
2:45 - 3:15 pm: ParaTools Pro for E4S and Conclusion

Materials: Attendees will need to bring their laptop to access materials during the workshop. There will be power, but please charge in advance as some outlets may need to be shared.

Abstract:
This hands-on workshop will present two performance evaluation tools, HPCToolkit and TAU, for evaluating and optimizing the performance of GPU-accelerated HPC and AI applications.

HPCToolkit (https://hpctoolkit.org) is an integrated suite of tools for profiling and tracing of parallel programs on computers ranging from multicore desktop systems to GPU-accelerated supercomputers and cloud platforms. HPCToolkit can measure and analyze executions of fully optimized, dynamically linked parallel applications on tens of thousands of CPU cores and GPUs. It supports multi-lingual codes with external binary-only libraries. It collects sampling-based measurements of CPU codes with a controllable overhead. It measures GPU performance using vendor APIs to collect fine-grained measurements using PC sampling or instrumentation, and it monitors asynchronous GPU operations using activity APIs. HPCToolkit can attribute performance measurements to rich dynamic calling contexts containing procedures, inlined functions, loop nests, and source lines on both CPUs and GPUs.

The TAU Performance System [http://tau.uoregon.edu] is a versatile performance evaluation toolkit supporting both profiling and tracing modes of measurement. It supports performance evaluation of applications running on CPUs and GPUs and supports runtime preloading of a Dynamic Shared Object (DSO) that allows users to measure performance without modifying the source code or binary. This tutorial will describe how TAU may be used with MVAPICH and support advanced performance introspection capabilities at the runtime layer. TAU's support for tracking the idle time spent in implicit barriers within collective operations will be demonstrated. TAU also supports event-based sampling at the function, file, and statement level. TAU's support for runtime systems such as CUDA (for NVIDIA GPUs), Level Zero (for Intel oneAPI DPC++/SYCL), ROCm (for AMD GPUs), OpenMP with support for OMPT and Target Offload directives, Kokkos, and MPI allows instrumentation at the runtime system layer while using sampling to evaluate statement-level performance data.

HPCToolkit and TAU will be demonstrated on AWS using the ParaTools Pro for E4S(TM) image. The Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] is a curated, Spack-based software distribution of 100+ HPC and AI/ML packages. The Spack package manager is a core component of E4S. It serves as a platform for product integration and deployment of performance evaluation tools such as HPCToolkit, TAU, DyninstAPI, and PAPI, and it supports both bare-metal and containerized deployment for CPU and GPU platforms. E4S provides a Spack binary cache and a set of base and full-featured container images with vendor runtimes to support GPU architectures from NVIDIA, Intel, and AMD. E4S is a community effort to provide open-source software packages for developing, deploying, and running scientific applications and tools on HPC platforms.

Speakers

John Mellor-Crummey, PhD

Professor of Computer Science and of Electrical and Computer Engineering, Rice University

John Mellor-Crummey is a Professor of Computer Science at Rice University in Houston, TX. His research focuses on software technology for high-performance parallel computing. His current research focus is tools for measurement and analysis of application performance. He leads the...

Sameer Shende, PhD

Research Professor and Director of the Performance Research Laboratory, University of Oregon

Sameer Shende serves as a Research Associate Professor and the Director of the Performance Research Laboratory at the University of Oregon and the President and Director of ParaTools, Inc. (USA) and ParaTools, SAS (France). He serves as the lead developer of the Extreme-scale Scientific...


Add-On Workshop

8:30am CST

Building an Optimized Elastic Finite-difference Propagator from Scratch for FWI on NVIDIA's Latest GPUs

Thursday February 27, 2025 8:30am - 5:00pm CST

Exhibit Hall

Exhibit Hall | 8:30 am - 5:00 pm
Limited seating; 30 registrants - SOLD OUT

Speakers: Guillaume Barnier, Guillaume Thomas-Collignon, and Igor Terentyev (NVIDIA)

Schedule:

8:30 - 9:00 am: Check-in + Breakfast
9:00 - 10:00 am: Introduction, Theory Review (PDE + Numerical Scheme)
10:00 - 11:30 am: Initial Implementation + Profiler Report Introduction and Analysis
11:30 am - 12:30 pm: Lunch
12:30 - 1:30 pm: Optimization #1 Using Shared Memory
1:30 - 2:30 pm: Optimization #2 Using Asynchronous Shared Memory Loads
2:30 - 3:30 pm: Optimization #3 Using TMA
3:30 - 4:00 pm: Break
4:00 - 5:00 pm: Theory Review on Adjoint System of Equations for FWI, Numerical Implementation, and Differences with Forward

Materials: Attendees are strongly encouraged to bring their own laptop, though the speakers will try to keep the workshop understandable for those without a computer. There will be power, but please charge in advance as some outlets may need to be shared.

Abstract: Elastic full waveform inversion (FWI) is becoming the industry standard for subsurface model parameter estimation. However, this technique requires simulating hundreds of thousands of wave propagations by numerically solving a system of partial differential equations (PDEs). Consequently, implementing an efficient numerical scheme on GPUs is critical.

In this workshop, we teach attendees how to build, step by step, finite-difference (FD) propagators for elastic media (ISO and VTI) optimized for NVIDIA's latest GPUs (Ampere, Hopper, and Blackwell).

We provide a brief theoretical review and describe the numerical scheme we implement, which is based on a staggered-grid approach in both time and space. We then implement successive versions of the forward propagator, starting from a baseline that requires minimal GPU hardware knowledge and ending with our fastest version, which uses asynchronous loads to shared memory. At each step, we use the Nsight Compute (NCU) profiler to identify bottlenecks in our kernels, and we show how to leverage NVIDIA's new hardware features to mitigate them. Finally, we show how to derive and efficiently implement the adjoint propagator required for the elastic FWI gradient computation.
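As a rough illustration of the staggered-grid idea mentioned in the abstract, the sketch below advances a 1D velocity-stress system with a leapfrog scheme in NumPy. It is not the speakers' implementation: the workshop targets elastic (ISO and VTI) GPU kernels, whereas this example is 1D and runs on the CPU. The function name `propagate_1d`, the Gaussian source pulse, and the material values are all assumptions chosen only for demonstration.

```python
import numpy as np


def propagate_1d(nx=201, nt=300, dx=5.0, dt=5e-4, rho=2000.0, vp=3000.0):
    """Advance a 1D velocity-stress system on a staggered grid (illustrative only).

    Velocity v lives on integer grid points; stress s lives halfway between
    them and is advanced after v within each step (leapfrog in time).
    """
    modulus = rho * vp ** 2        # P-wave modulus, assumed constant
    v = np.zeros(nx)               # particle velocity at nodes 0..nx-1
    s = np.zeros(nx - 1)           # stress at half nodes i + 1/2
    src = nx // 2                  # source injected at the central node
    t0, width = 0.01, 0.003        # hypothetical Gaussian pulse parameters (seconds)

    for it in range(nt):
        # dv/dt = (1/rho) ds/dx: centered difference of the two adjacent stresses.
        # The end nodes are never updated, which acts as a rigid boundary.
        v[1:-1] += dt / rho * (s[1:] - s[:-1]) / dx
        # Inject the source as a force term at a single node.
        v[src] += dt * np.exp(-(((it * dt) - t0) / width) ** 2)
        # ds/dt = M dv/dx: centered difference of the two adjacent velocities.
        s += dt * modulus * (v[1:] - v[:-1]) / dx
    return v, s


v, s = propagate_1d()
print(f"max |v| = {np.abs(v).max():.3e}")
```

With these assumed defaults the Courant number `vp * dt / dx` is 0.3, which keeps this second-order scheme stable. Because each field is differenced only from its immediate staggered neighbors, the same update pattern maps naturally onto a per-grid-point GPU kernel.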


Add-On Workshop