Hands-on: likwid-topology, likwid-pin
In this hands-on exercise you will compile and run an example code. You will learn how to explore node properties and topology with likwid-topology and how to use likwid-pin to explicitly control thread affinity.
Preparation
You can find the benchmark code in the MFCG folder of the teacher account.
- Get the source from the teacher's account:
$ cp -a ~x19a0001/MFCG ~
- Load the Intel compiler and LIKWID modules:
$ module load intel likwid
Explore node topology
Execute likwid-topology:
$ likwid-topology -g
Hint: To avoid line wrapping, pipe the output to less -S (for example, likwid-topology -g | less -S).
Answer the following questions:
- How many cores are available in one socket and in the whole node?
- Is SMT enabled?
- What is the aggregate size of the last level cache in MB per socket?
- How many ccNUMA memory domains are there?
- What is the total installed memory capacity?
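To cross-check your answers, you can compare them with what standard Linux tools report. The following is a minimal sketch, assuming a Linux node with lscpu installed; kb_to_gib is a hypothetical helper introduced here, not part of LIKWID or the exercise code. Note that MemTotal is the memory usable by the kernel, so it is slightly below the installed capacity.

```shell
# kb_to_gib KB: convert a size in kB (as reported in /proc/meminfo) to GiB.
# Hypothetical helper for this sketch only.
kb_to_gib() {
  awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 / 1024 }'
}

# Sockets, cores per socket, SMT threads per core and NUMA domains as seen by the kernel.
# LC_ALL=C keeps the field labels in English so that the grep pattern matches.
LC_ALL=C lscpu | grep -E '^(Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core|NUMA node\(s\))'

# Memory usable by the kernel in GiB (slightly below the installed capacity).
kb_to_gib "$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
```

A value of "Thread(s) per core" greater than 1 indicates that SMT is enabled.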
Compile the benchmark
Compile a threaded OpenMP binary with optimizing flags:
$ make
This creates a parallel version and a serial version (the serial one is needed for the runtime profile later).
Run the benchmark
Execute with 16 threads without explicit pinning:
$ env OMP_NUM_THREADS=16 ./perf 2500 40000
Repeat the run several times.
- Do the results fluctuate?
- By how much?
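Repeating the run and quantifying the fluctuation can be scripted. The sketch below is one possible way to do it; repeat_runs and spread are hypothetical helpers introduced here, and because the output format of ./perf is not described in this exercise, spread expects you to feed it the performance numbers yourself, one per line.

```shell
# repeat_runs N CMD...: run CMD N times in a row (hypothetical helper).
repeat_runs() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@"
    i=$((i + 1))
  done
}

# spread: read one number per line on stdin and print the minimum, the maximum
# and the relative spread (max - min) / min in percent.
# Assumes positive values, e.g. performance numbers copied from the benchmark output.
spread() {
  awk '{ v = $1 + 0
         if (NR == 1 || v < min) min = v
         if (NR == 1 || v > max) max = v }
       END { if (NR > 0) printf "min=%g max=%g spread=%.1f%%\n", min, max, 100 * (max - min) / min }'
}

# Usage on the node (the numbers fed to spread are placeholders, not measurements):
#   repeat_runs 5 env OMP_NUM_THREADS=16 ./perf 2500 40000
#   printf '%s\n' 100 110 105 | spread
```

The same helpers can be reused for the pinned runs by replacing the command passed to repeat_runs, which makes the unpinned and pinned fluctuations directly comparable.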
Run again with 16 threads, this time explicitly pinned to 16 physical cores of socket 0:
$ likwid-pin -c S0:0-15 ./perf 2500 40000
Then run again with explicit pinning across both CPU sockets, first with 8 threads per socket and then with 16 threads per socket:
$ likwid-pin -c S0:0-7@S1:0-7 ./perf 2500 40000
$ likwid-pin -c S0:0-15@S1:0-15 ./perf 2500 40000
Can you interpret the observed performance pattern? What could be going on?
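When interpreting the results, it helps to see which hardware threads each pin expression selects and how many threads that implies. The sketch below relies on the -p option of likwid-pin, which, combined with -c, prints the physical processor IDs for the expression instead of running a program. It assumes that this output is a single comma-separated list; count_ids is a hypothetical helper introduced here.

```shell
# count_ids: count the entries of a comma-separated ID list read from stdin
# (hypothetical helper; prints one count per input line).
count_ids() {
  awk -F, '{ print NF }'
}

# For each pin expression used in this exercise, print the resolved
# hardware-thread IDs and their count.
# Skipped when likwid-pin is not in PATH (e.g. the module is not loaded).
if command -v likwid-pin >/dev/null 2>&1; then
  for expr in S0:0-15 S0:0-7@S1:0-7 S0:0-15@S1:0-15; do
    ids=$(likwid-pin -c "$expr" -p)
    printf '%s -> %s (%s threads)\n' "$expr" "$ids" "$(printf '%s\n' "$ids" | count_ids)"
  done
fi
```

Compare the listed IDs with the likwid-topology output to see which sockets and ccNUMA domains each run occupies.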