In this hands-on exercise you will compile and run an example benchmark. You will learn how to explore node properties and topology with likwid-topology and how to use likwid-pin to explicitly control thread affinity.

Preparation

You can find the benchmark code in the MFCG folder of the teacher account.

  • Get the source from the teacher's account:

    $ cp -a ~x19a0001/MFCG ~

  • Load the Intel compiler and LIKWID modules:

    $ module load intel likwid

Explore node topology

Execute likwid-topology:

$ likwid-topology -g

Hint: To avoid line wrapping, pipe the output to less -S.

Answer the following questions:

  1. How many cores are available in one socket and in the whole node?
  2. Is SMT enabled?
  3. What is the aggregate size of the last level cache in MB per socket?
  4. How many ccNUMA memory domains are there?
  5. What is the total installed memory capacity?
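
As a cross-check, most of these numbers can also be read with standard Linux tools. The following is only a sketch: it assumes that lscpu (from util-linux) is installed on the node and that the last-level cache is the L3. The function name topo_filter is hypothetical, and the exact field labels differ somewhat between lscpu versions.

```shell
# topo_filter: hypothetical helper, not part of LIKWID or the course material.
# Reads lscpu output on stdin and keeps only the lines about sockets,
# cores per socket, SMT threads per core, NUMA domains and the L3 cache.
topo_filter() {
    grep -E 'Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core|NUMA node\(s\)|L3'
}

# Usage on the compute node:
#   lscpu | topo_filter
```

Whether lscpu reports the L3 size per cache instance or summed over all instances depends on its version, so treat that value only as a sanity check against likwid-topology. The memory visible to the operating system can be checked separately with free -g.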

Compile the benchmark

Compile a threaded OpenMP binary with optimizing flags:

$ make

This creates a parallel version and a serial version (the latter is used for runtime profiling later).

Run the benchmark

Execute with 16 threads without explicit pinning:

$ env OMP_NUM_THREADS=16 ./perf 2500 40000

Repeat the run several times.

  1. Do the results fluctuate?
  2. By how much?
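
To put a number on the fluctuation, the measured values can be summarized with a small helper. The sketch below makes no assumption about the output format of ./perf: it expects that you copy the performance figure of each run into a file by hand, one value per line. The function name spread is hypothetical.

```shell
# spread: hypothetical helper, not part of the course material.
# Reads one number per line from stdin and prints count, min, max, mean
# and the relative spread (max - min) / mean in percent.
spread() {
    awk '
        NF {
            x = $1 + 0
            if (n == 0 || x < min) min = x
            if (n == 0 || x > max) max = x
            sum += x
            n++
        }
        END {
            if (n == 0) { print "no data"; exit 1 }
            mean = sum / n
            printf "n=%d min=%g max=%g mean=%g spread=%.1f%%\n", n, min, max, mean, 100 * (max - min) / mean
        }'
}

# Usage, with results.txt being a file you create yourself:
#   spread < results.txt
```

Reporting the spread relative to the mean makes the unpinned and pinned runs directly comparable, whatever unit the benchmark prints.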

Run again with 16 threads, this time explicitly pinned to 16 physical cores of socket 0:

$ likwid-pin -c S0:0-15 ./perf 2500 40000

Then, run again with explicit pinning using 8 threads on each CPU socket:

$ likwid-pin -c S0:0-7@S1:0-7 ./perf 2500 40000

Then, run with 16 threads pinned to 16 cores on each socket:

$ likwid-pin -c S0:0-15@S1:0-15 ./perf 2500 40000

Finally, run with a single thread.

Can you interpret the observed performance pattern? What could be going on?
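
When comparing the runs, it can help to express each result as a speedup over the single-thread run and as a parallel efficiency (speedup divided by thread count). The sketch below assumes that the benchmark reports a higher-is-better metric (for example a rate); if it reports a runtime instead, swap the first two arguments. The function name speedup is hypothetical.

```shell
# speedup: hypothetical helper, not part of the course material.
# Arguments: P1 = performance with 1 thread,
#            PN = performance with N threads,
#            N  = number of threads.
# Prints speedup = PN / P1 and parallel efficiency = speedup / N.
speedup() {
    awk -v p1="$1" -v pn="$2" -v n="$3" 'BEGIN {
        if (p1 <= 0 || n <= 0) { print "usage: speedup P1 PN N"; exit 1 }
        s = pn / p1
        printf "speedup=%.2f efficiency=%.0f%%\n", s, 100 * s / n
    }'
}

# Usage, with your own measured values in place of P1 and PN:
#   speedup P1 PN 16
```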

Last modified: Tuesday, 23 July 2024, 11:28 AM