ccNUMA analysis

Perform a scaling run of the MFPCG code up to all 32 hardware threads of the node, filling the ccNUMA domains one after another. The loop below runs the code with 16 and with 32 threads.

$ for i in 16 32; do \
   likwid-perfctr -C E:N:${i} -g FLOPS_DP ./perf 2500 40000 \
   2>&1 | grep -E "Performance|^DP"; done

Does the performance scale across the ccNUMA domains? If not, what could be the reason?
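One way to make "scaling" concrete is to compare the speedup between the two runs with the increase in thread count. The following is a minimal sketch, not part of the original exercise: the helper name `scaling_efficiency` is hypothetical, and the numbers in the usage line are placeholders, not measurements. Replace them with the "Performance" values printed by the two runs.

```shell
# Hypothetical helper (not part of LIKWID or the MFPCG code):
# parallel efficiency between two runs, defined as
#   (perf_large / perf_small) / (threads_large / threads_small)
# Arguments: perf_small perf_large threads_small threads_large
scaling_efficiency() {
    awk -v p1="$1" -v p2="$2" -v n1="$3" -v n2="$4" \
        'BEGIN { printf "%.2f\n", (p2 / p1) / (n2 / n1) }'
}

# Placeholder values only: performance doubles from 16 to 32 threads.
scaling_efficiency 1000 2000 16 32   # prints 1.00
```

A value close to 1 means the second run made full use of the additional threads; a value close to 0.5 means that doubling the thread count brought no performance gain.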

Last modified: Sunday, 23 July 2023, 5:37 PM