In this hands-on exercise you will compile and run a main memory bandwidth benchmark. You will learn how to explore node properties and topology with likwid-topology and how to use likwid-pin to explicitly control thread affinity.

Finally, you will learn how to determine the maximum sustained memory bandwidth for one socket and for a complete node.

Preparation

Copy all required files to your home directory. You can find the benchmark code in the BWBENCH folder.

$ cp -a ~dc-grub1/NLPE-Durham $HOME
$ cd NLPE-Durham/BWBENCH

Explore node topology

Submit job-dine2-part1.sh

$ sbatch job-dine2-part1.sh

Check the output

$ less -S job-dine2-part1sh.o*

(The -S option of less turns off line wrapping so that you can pan horizontally; the output is too wide for most screens.)

Answer the following questions:

  1. How many cores are available in one socket, and how many in the whole node?
  2. Is SMT enabled?
  3. What is the aggregate size of the last level cache in MB per socket?
  4. How many ccNUMA memory domains are there?
  5. What is the total installed memory capacity?
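If you want to cross-check the job output interactively, the following minimal sketch calls likwid-topology directly. It assumes the tool is on your PATH (otherwise run module load likwid first); the function name explore_topology is purely illustrative and not part of the exercise files.

```shell
# Sketch: print the node topology with likwid-topology, if it is available.
# explore_topology is a hypothetical helper name used only for illustration.
explore_topology() {
    if command -v likwid-topology >/dev/null 2>&1; then
        # -c: extended cache information, -g: ASCII-art view of the topology
        likwid-topology -c -g
    else
        echo "likwid-topology not found; try 'module load likwid' first" >&2
        return 1
    fi
}
```

Calling explore_topology | less -S gives the same horizontally scrollable view as above. Keep in mind that a login node may have different hardware than the compute node the job ran on.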

Run the benchmark

BWBENCH runs several data-streaming loops over large arrays and reports the observed memory bandwidth for each loop. It is essentially an improved version of the popular STREAM benchmark.

Submit job-dine2-part2.1.sh

$ sbatch job-dine2-part2.1.sh

The script runs BWBENCH with 16 threads and without explicit pinning. Submit it several times. Do the results fluctuate? What is the average bandwidth reading for, e.g., the Triad benchmark?
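To quantify the fluctuation, you can collect the Triad reading from each run and compute the mean and the relative spread. The sketch below assumes you supply the readings yourself, one number per line on standard input; it does not parse the BWBENCH output, and the helper name bw_stats is hypothetical.

```shell
# Sketch: mean, min, max and relative spread of bandwidth readings.
# Input: one numeric value per line on stdin (blank lines are skipped).
# bw_stats is a hypothetical helper name used only for illustration.
bw_stats() {
    awk '
        NF {
            v = $1; sum += v; n++
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            # spread = (max - min) relative to the mean, in percent
            printf "runs=%d mean=%.1f min=%.1f max=%.1f spread=%.1f%%\n",
                   n, mean, min, max, 100 * (max - min) / mean
        }'
}
```

For example, with three made-up placeholder values (not measurements), printf '100\n110\n90\n' | bw_stats prints runs=3 mean=100.0 min=90.0 max=110.0 spread=20.0%. A large spread across unpinned runs is a hint that thread placement varies from run to run.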

$ sbatch job-dine2-part2.2.sh

This runs BWBENCH again with 16 threads, this time explicitly pinned to 16 physical cores of socket 0. (If you have not already done so, run module load likwid.)

 
  1. Is the performance different? If yes: why is it different? 
  2. Can you recover the previous (best) performance result?
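For experimenting with the pinning yourself, here is a minimal sketch of how such a run could be launched with likwid-pin. It is not taken from the job script: the helper names pin_expr and run_pinned and the binary name ./bwbench are assumptions, so substitute the executable you actually built.

```shell
# Sketch: build a likwid-pin core expression for the first N cores of a socket.
# pin_expr and run_pinned are hypothetical helper names.
pin_expr() {
    socket=$1
    ncores=$2
    if [ "$ncores" -eq 1 ]; then
        echo "S${socket}:0"
    else
        # S<k>:0-<N-1> selects logical cores 0..N-1 inside socket domain k
        echo "S${socket}:0-$((ncores - 1))"
    fi
}

# Run a binary pinned to the first N cores of a socket.
# ./bwbench is a placeholder for the benchmark executable.
run_pinned() {
    likwid-pin -c "$(pin_expr "$1" "$2")" "${3:-./bwbench}"
}
```

With these definitions, run_pinned 0 16 expands to likwid-pin -c S0:0-15 ./bwbench. Changing the socket number or the core count lets you compare different placements against the unpinned result.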

Benchmark the memory bandwidth scaling within one ccNUMA domain (in 1-core steps from 1 to 32 cores):

$ sbatch job-dine2-part3.sh

  1. What is the maximum memory bandwidth in GB/s?
  2. Which benchmark case reaches the highest bandwidth?
  3. At which core count does the main memory bandwidth saturate?
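The scaling scan can be pictured as a loop over core counts within the first ccNUMA domain. The sketch below only prints the commands such a loop would run and is not the content of job-dine2-part3.sh; the helper name scaling_cmds and the binary name ./bwbench are assumptions, while M0 is the likwid name for memory domain 0.

```shell
# Sketch: print one likwid-pin command per core count (dry run only).
# scaling_cmds is a hypothetical helper; ./bwbench is a placeholder binary name.
scaling_cmds() {
    max=${1:-32}
    exe=${2:-./bwbench}
    n=1
    while [ "$n" -le "$max" ]; do
        if [ "$n" -eq 1 ]; then
            cores="M0:0"
        else
            # first n logical cores of ccNUMA memory domain 0
            cores="M0:0-$((n - 1))"
        fi
        echo "likwid-pin -c $cores $exe"
        n=$((n + 1))
    done
}
```

Running scaling_cmds 32 lists the 32 invocations from 1 to 32 cores. Plotting the reported bandwidth against the core count shows where the curve flattens, which is the saturation point asked for in question 3.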
Last modified: Friday, 12 June 2026, 7:14 PM