Course: Hybrid Programming in HPC - MPI+X @ HLRS

Section outline

General

Most HPC systems are clusters of shared memory nodes. To use such systems efficiently both memory consumption and communication time has to be optimized. Therefore, hybrid programming may combine the distributed memory parallelization on the node interconnect (e.g., with MPI) with the shared memory parallelization inside of each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket-multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.

Hands-on sessions are included on all days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a "how-to" section. This course provides scientific training in Computational Science and, in addition, the scientific exchange of the participants among themselves.

This course is a joint training event of EuroCC@GCS and EuroCC-Austria, the German and Austrian National Competence Centres for High-Performance Computing. It is organized by the HLRS in cooperation with the VSC Research Center, TU Wien and NHR@FAU.

Agenda & Content

1st day – 21 January 2025

08:45 Join in
09:00 Welcome
09:10 Hunter's hardware architecture and its programming models
Dr. Christian Simmendinger (HPE) and Igor Pasichnyk (AMD), and Johanna Potyka (AMD)
10:00 Break
10:15 Introduction to the general MPI+X course with exercises on existing Hawk cluster
10:45 Programming Models
10:50 - MPI + OpenMP
11:45 Practical (how to compile and start)
12:30 Lunch
14:00 - continue: MPI + OpenMP
14:45 Break
15:00 - continue: MPI + OpenMP
15:45 Practical (how to do pinning)
16:15 Q & A
16:30 End of first day

2nd day – 22 January 2025

08:45 Join in
09:00 - continue: MPI + OpenMP
09:00 - Case study: Simple 2D stencil smoother
09:30 Practical (hybrid through OpenMP parallelization)
10:45 Break
11:00 - Overlapping Communication and Computation
11:30 Practical (taskloops)
12:15 - MPI + OpenMP Conclusions
12:30 Lunch
14:00 - MPI + Accelerators
15:00 Break
15:15 - MPI + Accelerators (continued)
16:15 Q & A
16:30 End of second day

3rd day – 23 January 2025

08:45 Join in
09:00 Programming Models (continued)
09:05 - MPI + MPI-3.0 Shared Memory
10:00 Break
10:15 - MPI Memory Models and Synchronization
11:00 Break
11:15 - Optimized node to node communication
11:35 - Recap - MPI Virtual Topologies
12:05 Lunch
13:35 - Topology Optimization
14:15 Conclusions
14:30 Break
14:45 Practical (replicated data)
16:00 Q & A, Feedback
16:30 End of third day (course)

Date:	Tuesday, January 23, 2024, 08:45 - Thursday, January 25, 2024, 16:30
Location:	HLRS, Room 0.439 / Rühle Saal, University of Stuttgart, Nobelstr. 19, D-70569 Stuttgart, Germany

Lecturers:

Rolf Rabenseifner (HLRS), Claudia Blaas-Schenner (VSC Team, TU Wien), Georg Hager (RRZE)

Course material (here ☺):

http://tiny.cc/MPIX-HLRS

Select section Slides

Collapse Expand
Slides
- Select activity "HLRS Hunter - Architecture and Programming Models" Part 1 by Christian Simmendinger (HPE) (2025-01-22)
  
  "HLRS Hunter - Architecture and Programming Models" Part 1 by Christian Simmendinger (HPE) (2025-01-22) File PDF
- Select activity "HLRS Hunter - Architecture and Programming Models": Part 2 - "MI300A Architecture and Programming model" by Johanna Potyka (AMD) and Igor Pasichnyk (AMD) (2025-01-22)
  
  "HLRS Hunter - Architecture and Programming Models": Part 2 - "MI300A Architecture and Programming model" by Johanna Potyka (AMD) and Igor Pasichnyk (AMD) (2025-01-22) File PDF
- Select activity Presentation slides (updated Jan 16, 2025, 16:37)
  
  Presentation slides (updated Jan 16, 2025, 16:37) File PDF
- Select activity Presentation slides also including all slide animations (updated Jan 16, 2025, 16:49)
  
  Presentation slides also including all slide animations (updated Jan 16, 2025, 16:49) File PDF
Select section Recording of the course 2025-HY-HLRS (Jan 21-23, 2025 at HLRS)

Collapse Expand
Recording of the course 2025-HY-HLRS (Jan 21-23, 2025 at HLRS)
- Select activity The lectures (without the exercises) are recorded ...
  
  The lectures (without the exercises) are recorded in the course 2025/HY-HLRS.
  
  Hunter's hardware architecture and its programming models
  Dr. Christian Simmendinger (HPE) and Igor Pasichnyk (AMD), and Johanna Potyka (AMD)
  HPE slides, AMD slides, recording
  
  Hybrid Programming in HPC - MPI+X
  Claudia Blaas-Schenner, Georg Hager, Rolf Rabenseifner
  slides / slides with animations
  
  Introduction:
  slides 1-13 recording_{(the first 100 seconds are organizational stuff, so please skip them, I apologize for that)}
  
  Programming Models and Optimizations:
  
  MPI + OpenMP
  
  General considerations /
  How to compile, link, and run
  slides 14-29 recording
  
  Hands-on: MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting: see Exercises below
  
  System topology, ccNUMA , and memory bandwidth /
  
  Memory placement on ccNUMA systems
  slides 32-52 recording
  
  Topology and affinity on multicore
  slides 53-67 recording_{(the first 57 seconds are quiet, so please skip them, I apologize for that)}
  
  Hands-on: MPI+OpenMP: he-hy - Hello Hybrid! - pinning: see Exercises below
  
  Case study: Simple 2D stencil smoother
  slides 69-77 recording
  
  Overlapping communication and computation /
  Communication overlap with OpenMP taskloops
  slides 88-100 recording
  
  Hands-on: MPI+OpenMP: jacobi - hybrid through OpenMP parallelization: see Exercises below
  
  Main advantages, disadvantages, conclusions
  slides 106-109 recording
  
  MPI + Accelerator:
  slides 110-154 recording
  
  MPI + MPI 3 shared memory
  
  General considerations & uses cases /
  Re-cap: MPI_Comm_split & one sided communication /
  How to
  slides 156-183 recording
  
  Hands-on: see lecture slides 184-195, MPI: data-rep - how to avoid replicated data: see Exercises below
  
  MPI memory models & synchronization /
  Shared memory problems /
  Advantages & disadvantages, conclusions
  slides 197-214 recording
  
  Optimized node to node communication
  slides 216-274 recording
  
  Conclusions
  slides 275-282 recording
Select section Exercises

Collapse Expand
Exercises
- Select activity Login to Hawk
  
  Login to Hawk Page
- Select activity MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting
  
  MPI+OpenMP: he-hy - Hello Hybrid! - compiling, starting Page
- Select activity MPI+OpenMP: he-hy - Hello Hybrid! - pinning
  
  MPI+OpenMP: he-hy - Hello Hybrid! - pinning Page
- Select activity MPI+OpenMP: jacobi - hybrid through OpenMP parallelization
  
  MPI+OpenMP: jacobi - hybrid through OpenMP parallelization Page
- Select activity MPI: data-rep - how to avoid replicated data
  
  MPI: data-rep - how to avoid replicated data Page
- Select activity Download Exercises (outside of course)
  
  Download Exercises (outside of course) Page
Select section Miscellaneous information

Collapse Expand
Miscellaneous information
- Select activity OpenMP standard documents
  
  OpenMP standard documents URL
- Select activity MPI standard documents
  
  MPI standard documents URL
- Select activity LIKWID documentation
  
  LIKWID documentation URL