A Pragmatic Approach to Optimize Execution Time and Cost of Complex Coupled-Physics Codes in Chevron’s HPC
Abstract: This work introduces pragmatic approaches for systematically optimizing the wall-clock time and execution cost of complex codes, such as GEOS, in Chevron's Azure HPC environment. The target codes partition the computation at the process and thread levels, need to scale to O(1000) cores or O(100) accelerators, and must run with minimal wall-clock time or cost on a wide variety of processors and hardware platforms. We demonstrate that the performance of these codes is not a monotonically increasing function of the hardware resources they use. It varies with the simulation model, and it cannot easily be assessed without running the code on the specific hardware. Our approach relies on application profiling to identify the points in the Run-Time Configuration (RTC) space $(H, n_{\text{nodes}}, N_{\text{thr}}, n_{\text{thr-rank}}, \ldots)$ that minimize wall-clock time or cost, and to generate strong or weak scalability curves for each simulation model of interest. It uses information about the target hardware to place ranks and threads optimally and to reduce the set of RTC points that must be assessed, and it further “compresses” the profiling information to the optimal RTC for each specific node count. Here, $n_{\text{nodes}}$ is the number of nodes of model $H$, $N_{\text{thr}}$ the total number of application threads, $n_{\text{thr-rank}}$ the number of threads per rank, and “…” denotes additional parameters such as compiler optimization options. Among other quantities, the profiling information includes initialization, linear-solve, nonlinear implicit-step, and MPI times. We use the profiling data at the best RTC point to characterize the performance of the linear and nonlinear solvers, and to gauge the actual improvements as subject-matter experts (SMEs) change the algorithms. We have implemented this approach in a semi-automated run-time optimization framework. Using GEOS and actual physical models, we demonstrate that our methodology attains significant savings in wall-clock time or cost.
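The RTC-space idea can be illustrated with a minimal Python sketch. It is not the authors' framework: the `RTC` record, the helper names `candidate_rtcs`, `compress`, and `strong_scaling`, the NUMA-based pruning rule, the node model name `"H1"`, the per-node-hour price, and all timings are hypothetical and made up for illustration. The sketch enumerates a pruned set of RTC points from hardware information, keeps only the fastest point per node count, and derives a strong-scaling curve together with the fastest and the cheapest node counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RTC:
    """One Run-Time Configuration point (H, n_nodes, N_thr, n_thr-rank)."""
    hw_model: str      # H: node/VM model (hypothetical name below)
    n_nodes: int       # number of H nodes
    n_thr: int         # N_thr: total application threads
    thr_per_rank: int  # n_thr-rank: threads per MPI rank


def candidate_rtcs(hw_model, cores_per_node, cores_per_numa, node_counts):
    """Enumerate RTC points, using hardware info to prune the space.

    Assumed rule (illustrative only): one thread per core, all cores used,
    and threads-per-rank must divide the cores of one NUMA domain so that
    no rank straddles two domains.
    """
    points = []
    for n in node_counts:
        for t in range(1, cores_per_numa + 1):
            if cores_per_numa % t == 0:
                points.append(RTC(hw_model, n, n * cores_per_node, t))
    return points


def compress(profile, price_per_node_hour):
    """Keep only the fastest RTC point for each (H, n_nodes).

    profile: {RTC: wall-clock seconds}
    returns: {(H, n_nodes): (rtc, wall_s, cost)}
    """
    best = {}
    for rtc, wall in profile.items():
        key = (rtc.hw_model, rtc.n_nodes)
        cost = wall / 3600.0 * rtc.n_nodes * price_per_node_hour[rtc.hw_model]
        if key not in best or wall < best[key][1]:
            best[key] = (rtc, wall, cost)
    return best


def strong_scaling(best, hw_model):
    """(n_nodes, speedup, parallel efficiency) vs. the smallest node count."""
    rows = sorted((n, v[1]) for (h, n), v in best.items() if h == hw_model)
    n0, t0 = rows[0]
    return [(n, t0 / t, (t0 / t) / (n / n0)) for n, t in rows]


# Made-up wall-clock seconds keyed by (n_nodes, threads/rank). These are NOT
# GEOS measurements; they only mimic a non-monotonic scaling curve.
fake_seconds = {(1, 1): 1000, (1, 2): 1100, (2, 1): 560, (2, 2): 520,
                (4, 1): 330, (4, 2): 300, (8, 1): 340, (8, 2): 320}
points = candidate_rtcs("H1", cores_per_node=4, cores_per_numa=2,
                        node_counts=[1, 2, 4, 8])
profile = {p: fake_seconds[(p.n_nodes, p.thr_per_rank)] for p in points}
best = compress(profile, {"H1": 36.0})

fastest = min(best.values(), key=lambda v: v[1])
cheapest = min(best.values(), key=lambda v: v[2])
print(fastest[0].n_nodes, cheapest[0].n_nodes)  # 4 1
for n, speedup, eff in strong_scaling(best, "H1"):
    print(n, round(speedup, 2), round(eff, 2))
```

One design note on this sketch: at a fixed $(H, n_{\text{nodes}})$, cost is proportional to wall-clock time, so the fastest point at each node count is also the cheapest there. The time/cost trade-off only appears across node counts, which is why the synthetic data above yield 4 nodes as the fastest choice but 1 node as the cheapest. A weak-scaling curve would instead grow the model size with the node count and is not shown.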
GEOS is an exascale-grade, multi-physics, multi-scale simulation framework that advances the state of the art in complex numerical analysis. Among other capabilities, it can simulate coupled flow, geomechanics, and fracture models, including CO2 sequestration and storage, over simulation horizons of O(1000) years.
Speaker: Michael Thomadakis (Chevron CTC, Innovation and HPC R&D)
Authors: Michael Thomadakis (Chevron CTC, Innovation and HPC R&D), Pavel Tomin (Chevron, Reservoir Simulation and R&D), Alex Loddoch (Chevron, CTC, Chevron Fellow), and Victor Magri (Lawrence Livermore National Lab, Hypre Project)