NLPE_HLRS: Hands-on: Performance counters and memory bandwidth

Task: Explore the behavior of a memory benchmark using likwid-perfctr

In this exercise you will analyze typical streaming kernels, predict their memory data traffic, and validate your predictions with likwid-perfctr measurements.

Preparation

You can find the benchmark code in the BWBENCH folder.

Investigate the benchmark code

Analyze the bwBench source code and derive the relation between read and write data volume for all benchmark cases.

Take into account possible write-allocate transfers!
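The bookkeeping behind this analysis can be sketched in C. The kernel definitions used here (COPY as c[i] = a[i], TRIAD as a[i] = b[i] + s*c[i], DAXPY as a[i] = a[i] + s*b[i]) are the commonly used ones and are an assumption; check them against the bwBench source. The struct and function names are hypothetical. The sketch also assumes 8-byte doubles and one write-allocate read for every stored array that the loop does not also load.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-iteration description of a streaming kernel. */
typedef struct {
    const char *name;
    int loads;            /* distinct arrays loaded                   */
    int stores;           /* distinct arrays stored                   */
    int stores_also_read; /* stored arrays that the loop also loads   */
} Kernel;

/* Bytes read from memory per iteration, including write-allocate reads. */
static int read_bytes(const Kernel *k)
{
    assert(k->stores_also_read <= k->stores);
    int write_allocates = k->stores - k->stores_also_read;
    return 8 * (k->loads + write_allocates);
}

/* Bytes written to memory per iteration. */
static int write_bytes(const Kernel *k)
{
    return 8 * k->stores;
}

/* Assumed kernel definitions; verify them against the benchmark source. */
static const Kernel kernels[] = {
    { "COPY",  1, 1, 0 },  /* c[i] = a[i]            */
    { "TRIAD", 2, 1, 0 },  /* a[i] = b[i] + s * c[i] */
    { "DAXPY", 2, 1, 1 },  /* a[i] = a[i] + s * b[i] */
};

/* Print the expected read and write volume per iteration for each kernel. */
static void print_expected_traffic(void)
{
    for (size_t i = 0; i < sizeof kernels / sizeof kernels[0]; i++) {
        printf("%-6s read %2d B, write %2d B per iteration\n",
               kernels[i].name,
               read_bytes(&kernels[i]),
               write_bytes(&kernels[i]));
    }
}
```

Calling print_expected_traffic() from a main function lists the predicted volumes, which can then be compared with the measured read and write data volumes.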

Run benchmark

Data traffic analysis

Instrument the benchmark source yourself using the LIKWID Marker API, or use the provided instrumented version bwBench-likwid.{c,f90}. Load the likwid and compiler modules:

$ module load likwid intel

Compile the C version with:

$ icx -Ofast -xHost -fno-alias -std=c99 -qopenmp -DLIKWID_PERFMON ${LIKWID_INC} -o bwBench-perf bwBench-likwid.c ${LIKWID_LIB} -llikwid

or the Fortran version with:

$ ifx -Ofast -xHost -fno-alias -qopenmp ${LIKWID_INC} -o bwBench-perf bwBench-likwid.f90 ${LIKWID_LIB} -llikwid

These command lines rely on the LIKWID_INC and LIKWID_LIB environment variables set by the likwid module on this system, so they are not portable to other systems.
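If you instrument the code yourself, the Marker API calls might look like the following minimal sketch. It is not the provided bwBench-likwid.c: the function names and the region name "COPY" are hypothetical. It assumes the header likwid-marker.h from LIKWID 5 or later (older releases provide the macros in likwid.h). The fallback macro definitions let the file compile without LIKWID when -DLIKWID_PERFMON is not set.

```c
#include <stddef.h>

#ifdef LIKWID_PERFMON
#include <likwid-marker.h>
#else
/* Without -DLIKWID_PERFMON the marker macros expand to nothing. */
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_REGISTER(regionTag)
#define LIKWID_MARKER_START(regionTag)
#define LIKWID_MARKER_STOP(regionTag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical instrumented COPY kernel: c[i] = a[i]. */
static void copy_marked(double *c, const double *a, size_t n)
{
#pragma omp parallel
    {
        /* Every thread starts and stops the region it takes part in. */
        LIKWID_MARKER_START("COPY");
#pragma omp for
        for (size_t i = 0; i < n; i++) {
            c[i] = a[i];
        }
        LIKWID_MARKER_STOP("COPY");
    }
}

/* Initialize the Marker API, run the kernel once, and write the results. */
static void run_instrumented_copy(double *c, const double *a, size_t n)
{
    LIKWID_MARKER_INIT;              /* once, in the serial part */
#pragma omp parallel
    {
        LIKWID_MARKER_REGISTER("COPY"); /* optional, reduces start overhead */
    }
    copy_marked(c, a, n);
    LIKWID_MARKER_CLOSE;             /* once, before the program exits */
}
```

The measurements for a region are only reported when the program is compiled with -DLIKWID_PERFMON and run under likwid-perfctr with the -m option.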

First, run on a single core (the first core of socket 0) with the MEM group; the -m option enables measurement of the Marker API regions:

$ likwid-perfctr -g MEM -C S0:0 -m ./bwBench-perf

Look at the following derived metrics (concentrate on the COPY, TRIAD, and DAXPY loops):

Memory read data volume
Memory write data volume
Overall memory bandwidth
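As a rough guide to how such derived metrics relate to raw counts, the sketch below converts cache-line transfer counts into a data volume and a bandwidth. It assumes that the underlying events count 64-byte cache-line transfers at the memory controllers, which is the case on many Intel systems but may not hold on this machine. The function names are hypothetical. The exact events and formulas for a given system are printed by likwid-perfctr -H -g MEM.

```c
#include <assert.h>

/* Assumed transfer granularity: one cache line of 64 bytes per event. */
#define CACHE_LINE_BYTES 64.0

/* Data volume in GBytes for a given number of cache-line transfers. */
static double volume_gbytes(double cache_lines)
{
    assert(cache_lines >= 0.0);
    return 1.0e-9 * cache_lines * CACHE_LINE_BYTES;
}

/* Overall memory bandwidth in MBytes/s from read and write transfers. */
static double bandwidth_mbytes_per_s(double read_lines, double write_lines,
                                     double runtime_s)
{
    assert(runtime_s > 0.0);
    return 1.0e-6 * (read_lines + write_lines) * CACHE_LINE_BYTES / runtime_s;
}
```

For example, 1.0e9 read transfers correspond to about 64 GBytes of read data volume under this assumption.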

Questions:

Is there anything unexpected in the data?
Does the memory bandwidth reported by the benchmark match the bandwidth measured by likwid-perfctr?
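One way to frame the second question: a benchmark of this kind typically computes its bandwidth from the loads and stores visible in the source, which does not include write-allocate reads. Assuming bwBench counts this way (check the source), the bandwidth seen by the hardware counters would be higher by the ratio of actual to counted data streams. The function name below is hypothetical.

```c
#include <assert.h>

/*
 * Bandwidth expected at the memory interface, given the bandwidth reported
 * by the benchmark, the number of data streams the benchmark counts, and
 * the number of additional write-allocate streams it does not count.
 */
static double expected_measured_bandwidth(double reported_bw,
                                          int counted_streams,
                                          int write_allocate_streams)
{
    assert(counted_streams > 0);
    assert(write_allocate_streams >= 0);
    return reported_bw * (counted_streams + write_allocate_streams)
           / counted_streams;
}
```

For a copy-like loop with two counted streams and one write-allocate stream, a reported 10000 MBytes/s would correspond to 15000 MBytes/s of actual memory traffic under these assumptions.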

Now execute the benchmark using all cores on one ccNUMA domain:

$ likwid-perfctr -g MEM -C S0:0-15 -m ./bwBench-perf

Questions:

Is anything different from the single-core run? (Look at the total data volumes and at the ratio of read to write data volume!)
What could have happened?

Last modified: Wednesday, 10 June 2026, 9:39 AM