A Beginner's Introduction to Node-Level Computer Architecture and Performance
In all modern HPC systems, the compute node is where code is executed and "performance is generated." Hence, this is where a deep understanding of the performance issues of any application must start. At first glance, computer architecture appears extremely intricate, making it next to impossible to derive general rules for good performance. However, on closer inspection it turns out that there is a surprisingly small number of guiding principles which govern most of the performance behavior of HPC codes.
    
This online tutorial aims to convey the aspects of compute node architecture that are most relevant for performance in HPC. We start at the core level and cover code execution via pipelining and out-of-order processing, Single Instruction Multiple Data (SIMD), and Simultaneous Multi-Threading (SMT). Advancing through the memory hierarchy, we look at caches, main memory, and cache-coherent non-uniform memory access (ccNUMA) architecture. The commonalities and differences between CPUs and GPUs are explained. Using simple compute kernels from computational science, we show how architectural features interact with code. We also introduce the Roofline performance model as a simple way to formulate quantitative performance expectations, compare them with observations, and derive possible optimizations. Finally, we present simple performance tools that favor insight over automation.

To make this online event interactive, several online quizzes are interspersed with the lectures. Participants can also solve exercise problems using H5P online content and our interactive "Layer Condition Calculator" for stencil codes.
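
The layer condition can be illustrated with a 2D five-point Jacobi stencil. A commonly used rule of thumb is that three consecutive rows of the source array must fit into the cache, with a safety factor of one half to leave room for other data. If this holds, each element of the source array comes from main memory only once per sweep. Assuming write-allocate on the store to b, the loop then needs 24 bytes of memory traffic per lattice site update: 8 bytes for loading a, 8 for storing b, and 8 for the write-allocate of b. The sketch below further assumes double-precision data, row-major storage with the inner loop over the contiguous index, and a single cache level of given size. The function names are hypothetical, and the check is far simpler than what the online calculator does.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2D five-point Jacobi sweep on an ny x nx grid stored row-major.
 * Updating row j reads rows j-1, j and j+1 of a. */
void jacobi2d(size_t ny, size_t nx, const double *a, double *b)
{
    for (size_t j = 1; j + 1 < ny; ++j)
        for (size_t i = 1; i + 1 < nx; ++i)
            b[j * nx + i] = 0.25 * (a[j * nx + i - 1] + a[j * nx + i + 1]
                                  + a[(j - 1) * nx + i] + a[(j + 1) * nx + i]);
}

/* Simplified layer condition (assumed rule of thumb): three rows of
 * doubles must fit into half of the cache. */
bool lc_jacobi2d_fulfilled(size_t nx, size_t cache_bytes)
{
    return 3 * nx * sizeof(double) <= cache_bytes / 2;
}

/* Largest inner dimension nx for which the condition above holds. */
size_t lc_jacobi2d_max_nx(size_t cache_bytes)
{
    return (cache_bytes / 2) / (3 * sizeof(double));
}
```

For a hypothetical 1 MiB cache, `lc_jacobi2d_max_nx(1048576)` returns 21845. In this simplified model, rows of up to about 21,800 doubles therefore satisfy the condition.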
Half-day online seminar for ISC High Performance 2023.
Lecturer: Dr. Georg Hager, Erlangen National High Performance Computing Center (NHR@FAU)
Date and time: Thursday, May 11, 2023, 2:00 p.m. - 6:00 p.m. CEST
Zoom link: The link will be available on the Swapcard conference platform a few hours before the event.
Agenda:
- Basic CPU/GPU and node architecture
 - Core: Pipelines, SIMD, out-of-order processing, SMT
 - Cache hierarchy
 - Memory interface
 - Basic performance phenomenology and bottlenecks
 - Hardware-software interaction
- The naive Roofline model
 - Examples: sum reduction, stencils
 - Common-sense code analysis
- Using hardware performance counters
 - Characterizing code with hardware counters
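
The sum reduction is a classic example of how pipelining interacts with code. The C sketch below contrasts two versions of the loop; its function names are hypothetical. The first uses a single accumulator, so its additions form one long dependency chain. The second uses four partial sums ("modulo variable expansion"), which creates independent chains that can occupy the adder pipeline at the same time. Because the second version changes the order of floating-point additions, its result can differ from the first in the last bits. Compilers typically perform this transformation themselves only when allowed to reassociate floating-point math, for example with GCC's or Clang's `-ffast-math`. The choice of four accumulators is an assumption; the number needed to hide the ADD latency depends on the hardware.

```c
#include <stddef.h>

/* Single accumulator: every addition depends on the previous one, so the
 * loop completes at most one addition per ADD latency. */
double sum_naive(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Four independent partial sums (modulo variable expansion): four
 * dependency chains can be in flight in the adder pipeline at once. */
double sum_4acc(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)   /* remainder loop for n not divisible by 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```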
For publications of NHR@FAU, see https://hpc.fau.de/research/publications
Important links:
LIKWID tool suite: https://github.com/RRZE-HPC/likwid
LIKWID documentation Wiki: http://tiny.cc/LIKWID
Online Layer Condition calculator: http://tiny.cc/LayerConditions
Kerncraft automatic Roofline/ECM modeling tool: https://github.com/RRZE-HPC/kerncraft
ClusterCockpit monitoring infrastructure: https://github.com/ClusterCockpit
Upcoming course on Node-Level Performance Engineering (online) on June 27-30, 2023 at HLRS Stuttgart: https://www.hlrs.de/training/2023/nlp