Hands-on: Performance counters and memory bandwidth
Task: Explore the behavior of a memory benchmark using likwid-perfctr
In this exercise you will analyze the data access behavior of typical streaming kernels, predict their memory traffic, and validate your prediction with likwid-perfctr measurements.
Preparation
You can find the benchmark code in the ~q26z0001/BWBENCH folder. Copy it again since there might have been updates.
Investigate the benchmark code
Analyze the bwBench source code and derive the relation between read and write data volume for all benchmark cases.
Take into account possible write-allocate transfers!
Run benchmark
Data traffic analysis
Instrument the source code yourself using the LIKWID Marker API, or use the provided bwBench-likwid.{c,f90}. Load the likwid and compiler modules:
$ module load likwid intel
Compile the code with:
$ icx -Ofast -xHost -fno-alias -std=c99 -qopenmp -DLIKWID_PERFMON ${LIKWID_INC} -o bwBench-perf bwBench-likwid.c ${LIKWID_LIB} -llikwid
or
$ ifx -Ofast -xHost -fno-alias -qopenmp ${LIKWID_INC} -o bwBench-perf bwBench-likwid.f90 ${LIKWID_LIB} -llikwid
These command lines rely on the LIKWID_INC and LIKWID_LIB variables set by the likwid module, so they are not portable to other systems.
First, run on a single core with the MEM group:
$ likwid-perfctr -g MEM -C S0:0 -m ./bwBench-perf
Look at the following derived metrics (concentrate on the COPY, TRIAD, and DAXPY loops):
- Memory read data volume
- Memory write data volume
- Overall memory bandwidth
Questions:
- Is there anything unexpected in the data?
- Does the memory bandwidth reported by the benchmark match the bandwidth measured by likwid-perfctr?
Now execute the benchmark using all cores on one ccNUMA domain:
$ likwid-perfctr -g MEM -C S0:0-17 -m ./bwBench-perf
Questions:
- Is anything different from the single-core run? (look at the data volumes!)
- What could have happened?