Investigate access pattern of MFPCG GSPrecon (bwd)

By analyzing the memory data volumes for reads and writes, we can pick the benchmark for our node-specific roofline (i.e., the memory bandwidth limit) that best reflects the application's access pattern.

Run MFPCG on the full node with the MEM group again and derive the ratio of read and write traffic for the backward sweep of the GS preconditioner:

$ likwid-perfctr -C N:0-31 -g MEM -m ./perf 2500 40000
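The ratio itself is a simple division of the two data volumes reported for the backward-sweep region. A minimal sketch, assuming you copy the read and write volumes by hand from the likwid-perfctr output; rw_ratio is a hypothetical helper, not part of LIKWID, and the numbers in the call are placeholders, not measurements:

```shell
# Hypothetical helper (not part of LIKWID): ratio of read to write data volume.
# $1: memory read data volume, $2: memory write data volume (same unit for both)
rw_ratio() {
    awk -v r="$1" -v w="$2" 'BEGIN { printf "%.2f\n", r / w }'
}

# Placeholder values, NOT measurements: 30 units read, 10 units written.
rw_ratio 30 10
```

A ratio close to 1 points to a copy-like pattern, while a clearly larger ratio points to kernels with more load streams than store streams.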

Check the list of benchmarks of likwid-bench and determine which fits best (there may be more than one choice):

$ likwid-bench -a

Difference between -w and -W

Run the selected benchmark once with -w and once with -W for the workgroup specification:

$ likwid-bench -t <kernel> -w N:4GB:32
$ likwid-bench -t <kernel> -W N:4GB:32

Which one is faster and why?

Get data for the roofline (bandwidth roof)

Run the selected benchmark (with -w or -W, as chosen above):

$ likwid-bench -t <kernel> -w/-W N:8GB:32

Get data for the peak ceiling

The ceiling in the Roofline Model represents the maximal compute performance the node can achieve. For most HPC codes, we use the peak FP rate as the ceiling. Since the MFPCG code uses MLUP/s as its metric, we could also derive the maximal MLUP/s from the maximal FP rate and use this value.
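Converting the FP ceiling into an MLUP/s ceiling only needs the number of floating-point operations per lattice update. A minimal sketch, assuming that count is known from inspecting the kernel; max_mlups is a hypothetical helper and both arguments in the call are placeholders, not values for MFPCG or this node:

```shell
# Hypothetical helper: convert a peak FP rate into a peak lattice-update rate.
# $1: peak FP rate in MFLOP/s
# $2: floating-point operations per lattice update (assumed, count it from the kernel)
max_mlups() {
    awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f\n", f / n }'
}

# Placeholder values, NOT measurements: 1000000 MFLOP/s peak, 10 flops per update.
max_mlups 1000000 10
```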

likwid-bench can be used with the peakflops* benchmarks to get an FP ceiling.

Look up the size of the L1 data cache and give each thread a working set of half that size.

$ likwid-bench -t peakflops{,_sse,_avx,_avx512} -W N:${DATASIZE}:32
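One way to fill in DATASIZE is sketched below. It assumes that the size in the workgroup string is the total for the whole workgroup and is split across the threads, so half an L1 cache per thread has to be multiplied by the thread count; check this against the per-thread working set that likwid-bench prints. datasize_kb is a hypothetical helper, and getconf LEVEL1_DCACHE_SIZE is a glibc feature that may report nothing useful on some systems (likwid-topology -c shows the cache sizes as well):

```shell
# Hypothetical helper: total workgroup size in kB for half an L1 cache per thread.
# $1: L1 data cache size per core in bytes, $2: number of threads
# Assumption: likwid-bench divides the workgroup size among the threads.
datasize_kb() {
    echo $(( $1 / 2 * $2 / 1024 ))
}

# Try to read the L1 data cache size; fall back to 0 if it is unavailable.
L1_BYTES=$(getconf LEVEL1_DCACHE_SIZE 2>/dev/null || echo 0)
case "$L1_BYTES" in
    ''|*[!0-9]*) L1_BYTES=0 ;;   # empty or non-numeric answer
esac

if [ "$L1_BYTES" -gt 0 ]; then
    DATASIZE="$(datasize_kb "$L1_BYTES" 32)kB"
    echo "DATASIZE=$DATASIZE"
else
    echo "L1 size not available via getconf, set DATASIZE by hand" >&2
fi
```

For a 32 kB L1 data cache and 32 threads this gives 512kB.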

Determine performance of MFPCG

If you cannot scroll up to the data anymore, re-run MFPCG with the MEM_DP group:

$ likwid-perfctr -C N:0-31 -g MEM_DP -m ./perf 2500 40000

Calculate the operational intensity of the backward preconditioner sweep by dividing "DP MFLOPS/s" by "Memory bandwidth [MByte/s]". Do this by hand: the operational intensity printed by the MEM_DP group itself has a bug and is only correct for single-socket cases.
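The division can be scripted as follows. This is only a sketch: op_intensity is a hypothetical helper, and the two arguments in the call are placeholders, not measurements. Since both inputs are per-second rates with the same mega prefix, the result is in FLOP/Byte:

```shell
# Hypothetical helper: operational intensity in FLOP/Byte.
# $1: DP FP rate in MFLOP/s, $2: memory bandwidth in MByte/s
op_intensity() {
    awk -v f="$1" -v b="$2" 'BEGIN { printf "%.4f\n", f / b }'
}

# Placeholder values, NOT measurements: 5000 MFLOP/s at 40000 MByte/s.
op_intensity 5000 40000
```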

Combine data to the Roofline Model

Copy the script roofline.plot from the teacher's account:

$ cp -a ~f51h0001/roofline.plot .

Edit the plotting script and add your data:

  • maxperf: maximal FP rate
  • maxband: maximal bandwidth
  • op_ins: operational intensity
  • app_perf: FP rate of application
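Before plotting, the four values can be sanity-checked against the roofline formula, attainable performance = min(maxperf, maxband * op_ins). A minimal sketch, assuming MFLOP/s for the FP rates and MByte/s for the bandwidth (the units expected by roofline.plot may differ); roofline is a hypothetical helper and the numbers in the call are placeholders, not measurements:

```shell
# Hypothetical helper: attainable performance according to the Roofline Model.
# $1: maxperf in MFLOP/s, $2: maxband in MByte/s, $3: op_ins in FLOP/Byte
roofline() {
    awk -v p="$1" -v b="$2" -v i="$3" 'BEGIN {
        bw_limit = b * i                       # bandwidth-bound limit in MFLOP/s
        printf "%.1f\n", (bw_limit < p) ? bw_limit : p
    }'
}

# Placeholder values, NOT measurements.
roofline 1000000 200000 0.125
```

If app_perf is close to the value printed here, the backward sweep runs near the limit the model predicts for its operational intensity.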

Now you can plot the data and look at it:

$ module use ~unrz139/.modules/modulefiles
$ module load gnuplot
$ gnuplot roofline.plot

Last modified: Monday, 24 July 2023, 4:07 PM