Name: Performance Evaluation of GPU Accelerated HPC and AI Applications Using HPCToolkit, TAU, and ParaTools Pro for E4S(TM)
Start: 2025-02-27T08:30:00-0600
End: 2025-02-27T15:15:00-0600

Thursday February 27, 2025 8:30am - 3:15pm CST

1st Floor Room 106

Limited Seating (20 registrants)

Course Skill Level: 25% basic content, 25% intermediate content, and 50% advanced content

Speakers:

John Mellor-Crummey, Professor of Computer Science and of Electrical and Computer Engineering, Rice University
Sameer Shende, Research Professor and Director of the Performance Research Laboratory, University of Oregon

Schedule:

8:30 - 9:00 am: Check-in + Breakfast
9:00 - 9:30 am: Setup ParaTools Pro for E4S on Cloud Platforms
9:30 - 11:30 am: HPCToolkit
11:30 am - 12:30 pm: Lunch
12:30 - 2:30 pm: TAU
2:30 - 2:45 pm: Break
2:45 - 3:15 pm: ParaTools Pro for E4S and Conclusion

Materials: Attendees will need to bring their laptop to access materials during the workshop. There will be power, but please charge in advance as some outlets may need to be shared.

Abstract:
This hands-on workshop will present two performance evaluation tools, HPCToolkit and TAU, for evaluating and optimizing the performance of GPU-accelerated HPC and AI applications.

HPCToolkit (https://hpctoolkit.org) is an integrated suite of tools for profiling and tracing parallel programs on computers ranging from multicore desktop systems to GPU-accelerated supercomputers and cloud platforms. HPCToolkit can measure and analyze executions of fully optimized, dynamically linked parallel applications on tens of thousands of CPU cores and GPUs. It supports multi-lingual codes with external binary-only libraries. It collects sampling-based measurements of CPU codes with controllable overhead. It measures GPU performance using vendor APIs to collect fine-grained measurements using PC sampling or instrumentation, and it monitors asynchronous GPU operations using activity APIs. HPCToolkit can attribute performance measurements to rich dynamic calling contexts containing procedures, inlined functions, loop nests, and source lines on both CPUs and GPUs.

The TAU Performance System [http://tau.uoregon.edu] is a versatile performance evaluation toolkit supporting both profiling and tracing modes of measurement. It supports performance evaluation of applications running on CPUs and GPUs, and it supports runtime preloading of a Dynamic Shared Object (DSO) that allows users to measure performance without modifying the source code or binary. This tutorial will describe how TAU may be used with MVAPICH and how it supports advanced performance introspection capabilities at the runtime layer. TAU's support for tracking the idle time spent in implicit barriers within collective operations will be demonstrated. TAU also supports event-based sampling at the function, file, and statement level. TAU's support for runtime systems such as CUDA (for NVIDIA GPUs), Level Zero (for Intel oneAPI DPC++/SYCL), ROCm (for AMD GPUs), OpenMP with support for OMPT and Target Offload directives, Kokkos, and MPI allows instrumentation at the runtime system layer while using sampling to evaluate statement-level performance data.

HPCToolkit and TAU will be demonstrated on AWS using the ParaTools Pro for E4S(TM) image. The Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] is a curated, Spack-based software distribution of 100+ HPC and AI/ML packages. The Spack package manager is a core component of E4S, and it is a platform for product integration and deployment of performance evaluation tools such as HPCToolkit, TAU, DyninstAPI, and PAPI. It supports both bare-metal and containerized deployment for CPU and GPU platforms. E4S provides a Spack binary cache and a set of base and full-featured container images with vendor runtimes to support GPU architectures from NVIDIA, Intel, and AMD. E4S is a community effort to provide open-source software packages for developing, deploying, and running scientific applications and tools on HPC platforms.

Speakers

John Mellor-Crummey, PhD

Professor of Computer Science and of Electrical and Computer Engineering, Rice University

John Mellor-Crummey is a Professor of Computer Science at Rice University in Houston, TX. His research focuses on software technology for high-performance parallel computing. His current research focus is tools for measurement and analysis of application performance. He leads the...

Sameer Shende, PhD

Research Professor and Director of the Performance Research Laboratory, University of Oregon

Sameer Shende serves as a Research Associate Professor and the Director of the Performance Research Laboratory at the University of Oregon and the President and Director of ParaTools, Inc. (USA) and ParaTools, SAS (France). He serves as the lead developer of the Extreme-scale Scientific...


Add-On Workshop

2025 Energy HPC Conference
