NHR Learning Platform
Available courses
Lecturers:
- Dr.-Ing. J. Eitzinger (jan.eitzinger@fau.de), Martensstr. 1, Room 1.025-113. Phone -28911
Location: The seminar will be held in hybrid format (in person and online).
Room: 11501.01.019 (01.019 Seminarraum) Elektrotechnik - Cauerstraße 7-9, 1. OG
Zoom Link: TBA
Time: Tuesday 14:15 - 15:45, First lecture: October 18th, 2022 (also see Campo)
Credits: 5 ECTS credits. This requires two talks and a written seminar report.
Possible topics can be found in the intro talk (see below).
Tutors:
- Christie Louis Alappat (christie.alappat@fau.de), RRZE, Room 1.025-113
In all modern HPC systems, the compute node is where code is executed
and "performance is generated." Hence, this is where a deep
understanding of the performance issues of any application must start.
At first glance, computer architecture appears extremely intricate,
making it next to impossible to derive general rules for good
performance. However, on closer inspection it turns out that there is a
surprisingly small number of guiding principles which govern most of the
performance behavior of HPC codes.
This online tutorial aims to convey those components of compute node
architecture that are most relevant for performance in HPC. We start
with the core level and cover code execution via pipelining and
out-of-order processing, Single Instruction Multiple Data (SIMD), and
Simultaneous Multi-Threading (SMT). Advancing through the memory
hierarchy, we look at cache hierarchies, main memory, and cache-coherent
non-uniform memory (ccNUMA) architecture. The commonalities and
differences between CPUs and GPUs are clearly described. Using simple
compute kernels from computational science, we show how architectural
features interact with code. We also introduce the Roofline performance
model as a simple way to formulate quantitative performance
expectations, compare them with observations, and derive possible
optimizations. Simple performance tools are introduced that favor
insight instead of automation.
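As a rough illustration of the Roofline idea mentioned above, the following minimal sketch computes an upper performance limit as the minimum of peak performance and the product of computational intensity and memory bandwidth. The function name, the machine numbers, and the choice of the vector triad kernel are assumptions made for this example; they are not taken from the course material.

```python
def roofline(peak_gflops, mem_bandwidth_gbs, intensity_flop_per_byte):
    """Upper performance limit in GFlop/s: min(P_peak, I * b_S)."""
    return min(peak_gflops, intensity_flop_per_byte * mem_bandwidth_gbs)


# Hypothetical machine (assumed numbers, not a real system).
PEAK_GFLOPS = 1000.0    # peak floating-point performance
BANDWIDTH_GBS = 100.0   # attainable memory bandwidth

# Vector triad a[i] = b[i] + c[i] * d[i] in double precision:
# 2 flops per iteration; 3 loads + 1 store = 32 bytes per iteration
# (assumption: write-allocate traffic is ignored).
triad_intensity = 2 / 32

limit = roofline(PEAK_GFLOPS, BANDWIDTH_GBS, triad_intensity)
print(f"Triad limit: {limit} GFlop/s")  # memory-bound on this machine
```

A measured performance far below such a limit suggests that the assumed bottleneck is not the relevant one, or that the code does not yet make good use of the hardware.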
To make this online event interactive, several online quizzes are
interspersed with lectures. Participants can also solve exercise
problems using H5P online content and our interactive "Layer Condition
Calculator" for stencil codes.
This course, a collaboration of the Erlangen National High Performance Computing Center (NHR@FAU) and the Leibniz Supercomputing Center (LRZ), is targeted at students and scientists with an interest in programming modern HPC hardware, specifically the large-scale parallel computing systems available in Jülich, Stuttgart and Munich, as well as smaller clusters.
This course teaches performance engineering approaches on the compute node level. "Performance engineering" as we define it is more than employing tools to identify hotspots and bottlenecks. It is about developing a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed. Once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of optimizations can often be predicted. We introduce a "holistic" node-level performance engineering strategy and apply it to different algorithms from computational science.
Most HPC systems are clusters of shared memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine the distributed memory parallelization on the node interconnect (e.g., with MPI) with the shared memory parallelization inside of each node (e.g., with OpenMP or MPI-3.0 shared memory). This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket-multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI. Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming.
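To make the range of options between pure MPI and hybrid MPI+OpenMP more concrete, the following minimal sketch enumerates possible process/thread layouts for one node. The function name and the node size are hypothetical, and the sketch assumes that every core is used and that the threads of one process stay within a single socket; other layouts (for example, one process spanning the whole node) are possible and are not listed here.

```python
def hybrid_layouts(sockets_per_node, cores_per_socket):
    """List (processes_per_node, threads_per_process) pairs.

    Assumptions of this sketch: all cores are used, and the threads
    of one process never span more than one socket.
    """
    layouts = []
    for threads in range(1, cores_per_socket + 1):
        # Only thread counts that divide the socket evenly are kept.
        if cores_per_socket % threads == 0:
            processes = sockets_per_node * (cores_per_socket // threads)
            layouts.append((processes, threads))
    return layouts


# Hypothetical node: 2 sockets with 4 cores each.
for processes, threads in hybrid_layouts(2, 4):
    kind = "pure MPI" if threads == 1 else "hybrid"
    print(f"{processes} processes x {threads} threads ({kind})")
```

The first entry corresponds to pure MPI with one process per core; the last one places a single multi-threaded process on each socket.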
Hands-on sessions are included on both days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a "how-to" section. This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants. This course is organized by VSC (Vienna Scientific Cluster) in cooperation with HLRS and RRZE.