Pinning

by Ananda Krishnan Rejikumar Bindu -

Dear PTFS team,

I am having some difficulty understanding the difference between compact and scattered pinning.

Could you please explain the exact difference? Also, could you please look at the attached slide and explain why compact pinning gives higher performance than scattered pinning? And is it the case that compact means the full-chip configuration, while scattered means CoD?

With Regards,

Ananda

Attachment scattered_compact_pinning.PNG
In reply to Ananda Krishnan Rejikumar Bindu

Re: Pinning

by Jan Laukemann -

I have some difficulty in understanding the difference between compact and scattered pinning.
Could you please explain the exact difference? 

Affinity matters whenever a system has several domains, such as nodes, sockets, or NUMA domains (as in the slide you attached).
Compact or close pinning means the threads are bound to consecutive places, i.e., fill the first domain first, using the first core first.
Scattered or spread pinning means the threads are divided evenly among places, i.e., fill each domain equally, in a round-robin fashion.

Imagine the SNC scenario of the graph in your slides, i.e., we have 20 cores and two NUMA domains, with cores 0-9 in the first NUMA domain M0 and cores 10-19 in the other NUMA domain M1.
Filling the node in a compact fashion would lead to the following core order (going from 1 to 20 threads): 0,1,2,3,4,...,18,19
Filling the node in a scattered fashion would lead to: 0,10,1,11,2,12,3,13,...,8,18,9,19
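
To make the two orderings concrete, here is a minimal Python sketch that generates both lists for the example above. The function names are only illustrative and have nothing to do with likwid or any other pinning tool; the domain layout (M0 = cores 0-9, M1 = cores 10-19) is the one from the example.

```python
def compact_order(domains):
    """Fill the first domain completely before moving on to the next one."""
    return [core for domain in domains for core in domain]


def scattered_order(domains):
    """Take one core from each domain in turn (round-robin)."""
    order = []
    longest = max(len(domain) for domain in domains)
    for i in range(longest):
        for domain in domains:
            if i < len(domain):  # tolerate domains of unequal size
                order.append(domain[i])
    return order


# Two NUMA domains as in the example: M0 = cores 0-9, M1 = cores 10-19
domains = [list(range(0, 10)), list(range(10, 20))]

compact = compact_order(domains)
scattered = scattered_order(domains)

print("compact:  ", ",".join(map(str, compact)))
print("scattered:", ",".join(map(str, scattered)))
```

Running with n threads then simply means using the first n entries of the respective list.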


Also, could you please look at the attached slide and explain why compact pinning gives higher performance than scattered pinning?
Compact pinning does not give higher performance than scattered pinning when the full node is used: in the plot, both end up at the same sustained bandwidth of ~100 GB/s. What you can see is that with compact pinning the first NUMA domain saturates at a lower thread count, because up to 10 cores you are using only that one domain. The second NUMA domain comes into play only with the 11th core, and from that point on the bandwidth increases again.
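
The shape of the two curves can be reproduced with a very rough toy model. This is only a sketch under assumptions: each NUMA domain is taken to saturate at 50 GB/s (half of the ~100 GB/s seen for the full node), the single-core bandwidth of 12 GB/s is a made-up value, and bandwidth is assumed to scale linearly with the thread count until the domain saturates. None of these numbers are read off the slide.

```python
CORES_PER_DOMAIN = 10  # from the example: cores 0-9 (M0) and 10-19 (M1)
DOMAIN_BW = 50.0       # GB/s, assumed: half of the ~100 GB/s full-node value
CORE_BW = 12.0         # GB/s, hypothetical single-core bandwidth


def domain_bandwidth(threads_in_domain):
    """Linear scaling with thread count, capped at the domain's saturation."""
    return min(threads_in_domain * CORE_BW, DOMAIN_BW)


def compact_bandwidth(n):
    """Threads fill M0 first; M1 is only used from the 11th thread on."""
    in_m0 = min(n, CORES_PER_DOMAIN)
    in_m1 = n - in_m0
    return domain_bandwidth(in_m0) + domain_bandwidth(in_m1)


def scattered_bandwidth(n):
    """Threads alternate between M0 and M1, starting with M0."""
    in_m0 = (n + 1) // 2
    in_m1 = n // 2
    return domain_bandwidth(in_m0) + domain_bandwidth(in_m1)


print("threads  compact  scattered")
for n in range(1, 2 * CORES_PER_DOMAIN + 1):
    print(f"{n:7d}  {compact_bandwidth(n):7.1f}  {scattered_bandwidth(n):9.1f}")
```

In this model the compact curve flattens once M0 is saturated, stays flat up to 10 threads, and rises again from the 11th thread on, while the scattered curve rises on both domains from the start. Both meet at the same value for the full node.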

And is it the case that compact means the full-chip configuration, while scattered means CoD?
No. Both approaches can be applied to any setup as soon as you have multiple domains. SNC=on gives you more NUMA domains, but even with one NUMA domain per socket you might use a system with multiple sockets, so you again end up with multiple memory domains. The same holds for multiple nodes.

PS: When using likwid-pin to pin your threads (using the -C option), you can print the used cores with the additional -p flag.