Course: Node-Level Performance Engineering @HLRS

General

This course covers performance engineering approaches at the compute node level. Even application developers who are fluent in OpenMP and MPI often lack a good grasp of how much performance their code could achieve at best. This is because parallelism takes us only halfway to good performance. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted. This course conveys the knowledge required to develop a thorough understanding of the interactions between software and hardware. This process must start at the core, socket, and node level, where the code that does the actual computational work is executed.

We introduce the basic architectural features and bottlenecks of modern processors and compute nodes, including pipelining, SIMD, superscalarity, caches, memory interfaces, and ccNUMA. A cornerstone of node-level performance analysis is the Roofline model, which is introduced in detail and applied to various examples from computational science. We also show how simple software tools can be used to acquire knowledge about the system, run code in a reproducible way, and validate hypotheses about resource consumption. Finally, once the architectural requirements of a code are understood and correlated with performance measurements, the potential benefit of code changes can often be predicted, replacing hope-for-the-best optimizations with a scientific process.

The last course day focuses on lectures and exercises using Score-P and Vampir for performance engineering, showing how these more traditional parallel performance analysis tools can be applied at the node level as well.
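As a rough illustration of the Roofline model mentioned in the course description (a minimal sketch, not taken from the course material): in its basic form, the model bounds the performance of a loop by the smaller of the peak compute rate and the product of memory bandwidth and computational intensity. The machine numbers below are hypothetical, and the traffic count for the vector triad assumes double precision and ignores write-allocate transfers.

```python
def roofline(p_peak_gflops, bandwidth_gbytes, intensity_flop_per_byte):
    """Basic Roofline limit in GFlop/s: min(P_peak, I * b_S)."""
    return min(p_peak_gflops, intensity_flop_per_byte * bandwidth_gbytes)


# Hypothetical machine parameters (assumptions for illustration only).
P_PEAK = 1000.0  # peak floating-point performance in GFlop/s
B_S = 100.0      # attainable memory bandwidth in GB/s

# Vector triad a[i] = b[i] + c[i] * d[i] in double precision:
# 2 flops per iteration; 3 loads + 1 store = 32 bytes per iteration
# (assumption: write-allocate traffic is ignored).
triad_intensity = 2 / 32  # flop/byte

triad_limit = roofline(P_PEAK, B_S, triad_intensity)
print(f"Triad limit: {triad_limit} GFlop/s")  # memory-bound: 6.25

# A hypothetical kernel with 20 flop/byte hits the compute ceiling instead.
dense_limit = roofline(P_PEAK, B_S, 20.0)
print(f"High-intensity limit: {dense_limit} GFlop/s")  # compute-bound: 1000.0
```

Under these assumed numbers the triad can reach well below 1% of peak, however well it is parallelized, which is the kind of upper bound the course description refers to.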

Lecturers: Georg Hager and Jan Eitzinger (Erlangen National High Performance Computing Center); Bill Williams (Center for Information Services and High Performance Computing, ZIH)

Course date: June 3-5, 2025 (9:00 am - 4:00 pm) and June 6, 2025 (9:00 am - 12:15 pm)

This course will be conducted online as a Zoom event. Details will be sent via e-mail to registered participants.

Course Outline:

- Introduction
  - Basic architecture of multicore systems: threads, cores, caches, sockets, memory
  - The important role of system topology
- Tools: topology and affinity in multicore environments
  - Overview
  - likwid-topology and likwid-pin
- Roofline model: basics
  - Model assumptions and construction
  - Simple examples
  - Limitations of the Roofline model
- Tools: hardware performance counters
  - Why hardware performance counters?
  - likwid-perfctr
  - Applications
- Roofline case studies
  - Stencil algorithms
  - Tall & Skinny dense matrix-matrix multiplication
  - Sparse matrix-vector multiplication
- Basic skills in performance engineering
- Optimal use of parallel resources
  - Single Instruction Multiple Data (SIMD)
  - Cache-coherent Non-Uniform Memory Architecture (ccNUMA)
- Extending Roofline: The ECM performance model
- Performance Engineering using Score-P and Vampir
  - Analyzing MiniMD
  - Analyzing SpMV

- Course schedule (day 1-3) (File)
- Tools Day schedule (day 4) (File)

Day 1

- General intro (File)
- Introduction to computer node architecture (File)
- Hands-on: Logging in (Page)
- Hands-on: The divide instruction (Page)
- LIKWID tools: topology, affinity, clock speed (File)
- Hands-on: likwid-topology, likwid-pin, memory bandwidth (Page)
- Small affinity check application (File, ZIP)
- The Roofline model: Introduction (File)

Day 2

- LIKWID tools: hardware performance counters (File)
- Hands-on: Performance counters and memory bandwidth (Page)
- Roofline case study: Stencils (File)
- Performance Engineering basics (File)
- Hands-on: Dense matrix-vector multiplication (Page)
- Cache-coherent Non-Uniform Memory Architecture (ccNUMA) (File)
- Roofline case study: "Tall & Skinny" dense matrix-matrix multiplication (File)

Day 3

- Single Instruction Multiple Data (SIMD) (File)
- Hands-on: Analyzing the MiniMD proxy app (Page)
- Analysis spreadsheet template (File)
- Roofline case study: Sparse matrix-vector multiplication (SpMV) (File)
- Hands-on: Matrix-free CG solver (Page)
- Beyond Roofline: The ECM performance model (File)

Day 4

- Trace-based performance engineering (File)
- Introduction to Score-P (File)
- Score-P Demo (Page)
- Exercise: MiniMD Trace Collection (Page)
- Trace analysis with Vampir (File)
- Exercise: Load imbalance: SMxV (Page)

Feedback

Please fill out the feedback form at:

https://survey.hlrs.de/index.php/688569?lang=en

This link will be active from Thursday, June 5, 12:00 p.m.
