Course: A Beginner's Introduction to Node-Level Computer Architecture and Performance

General Information

In all modern HPC systems, the compute node is where code is executed and "performance is generated." Hence, this is where a deep understanding of the performance issues of any application must start. At first glance, computer architecture appears extremely intricate, making it next to impossible to derive general rules for good performance. On closer inspection, however, a surprisingly small number of guiding principles govern most of the performance behavior of HPC codes. This online tutorial aims to convey those components of compute node architecture that are most relevant for performance in HPC. We start at the core level and cover code execution via pipelining and out-of-order processing, Single Instruction Multiple Data (SIMD), and Simultaneous Multi-Threading (SMT). Advancing through the memory hierarchy, we look at cache hierarchies, main memory, and cache-coherent non-uniform memory access (ccNUMA) architecture. We describe the commonalities and differences between CPUs and GPUs. Using simple compute kernels from computational science, we show how architectural features interact with code. We also introduce the Roofline performance model as a simple way to formulate quantitative performance expectations, compare them with observations, and derive possible optimizations. Simple performance tools are introduced that favor insight over automation. To make this online event interactive, several online quizzes are interspersed with the lectures. Participants can also solve exercise problems using H5P online content and our interactive "Layer Condition Calculator" for stencil codes.

Half-day online seminar for ISC High Performance 2023.

Lecturer: Dr. Georg Hager, Erlangen National High Performance Computing Center (NHR@FAU)

Date and time: Thursday, May 11, 2023, 2:00 p.m. - 6:00 p.m. CEST

Zoom link: The link will be available on the Swapcard conference platform a few hours before the event.

Agenda:

Basic CPU/GPU and node architecture
- Core: Pipelines, SIMD, out-of-order processing, SMT
- Cache hierarchy
- Memory interface
- Basic performance phenomenology and bottlenecks
Hardware-software interaction
- The naive Roofline model
- Examples: sum reduction, stencils
Common-sense code analysis
- Using hardware performance counters
- Characterizing code with hardware counters
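
As a rough illustration of the naive Roofline model named in the agenda, the following minimal sketch computes a performance upper limit as the minimum of the peak performance and the product of computational intensity and memory bandwidth. The machine numbers (`P_PEAK`, `B_S`) and the function name `roofline` are hypothetical and chosen for illustration only; they are not taken from the course material.

```python
def roofline(p_peak, b_s, intensity):
    """Naive Roofline limit: P = min(P_peak, I * b_S).

    p_peak:    peak performance in GFlop/s
    b_s:       memory bandwidth in GByte/s
    intensity: computational intensity in Flop/Byte
    """
    return min(p_peak, intensity * b_s)


# Hypothetical machine parameters (assumptions, not measured values)
P_PEAK = 1000.0  # GFlop/s
B_S = 100.0      # GByte/s

# Sum reduction s += a[i] in double precision:
# one flop per 8 bytes loaded from memory
sum_intensity = 1.0 / 8.0
sum_limit = roofline(P_PEAK, B_S, sum_intensity)

# A hypothetical kernel with high intensity hits the compute ceiling instead
dense_limit = roofline(P_PEAK, B_S, 20.0)

print(f"sum reduction limit:  {sum_limit} GFlop/s (memory bound)")
print(f"high-intensity limit: {dense_limit} GFlop/s (compute bound)")
```

Under these assumed numbers, the sum reduction is limited by memory bandwidth, far below the peak, which is the kind of quantitative expectation the model is meant to provide before any measurement is made.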

Slides

All presentation slides (PDF file, updated May 11, 12:35 p.m. CEST)

Exercise: Out-of-order code execution (H5P)

Exercise: Layer Conditions for a strange stencil (page)

Links

For publications of NHR@FAU, see https://hpc.fau.de/research/publications

Important links:

LIKWID tool suite: https://github.com/RRZE-HPC/likwid

LIKWID documentation Wiki: http://tiny.cc/LIKWID

Online Layer Condition calculator: http://tiny.cc/LayerConditions

Kerncraft automatic Roofline/ECM modeling tool: https://github.com/RRZE-HPC/kerncraft

ClusterCockpit monitoring infrastructure: https://github.com/ClusterCockpit

Upcoming course on Node-Level Performance Engineering (online) on June 27-30, 2023 at HLRS Stuttgart: https://www.hlrs.de/training/2023/nlp
