Use the Layer Condition Calculator to determine the spatial blocking criterion required to achieve minimum code balance (in bytes per lattice site update, B/LUP) at the L2 cache for the following stencil code on a single core:

#define M 8000
#define N 10000
for(int i=0; i<M; ++i)
  for(int j=0; j<N; ++j)
    b[i][j] = 0.25*(a[i-1][j] + a[i][j-1]
                  + a[i-2][j] + a[i-3][j]
                  + a[i-4][j]);

Assume the following hardware characteristics:

  • L1 cache size 32 kB per core
  • L2 cache size 512 kB per core
  • L3 cache size 20 MB shared among 10 cores
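
As a cross-check on whatever the calculator reports, the layer condition can also be estimated by hand. The sketch below is only an illustration under stated assumptions, none of which come from the exercise text: array elements are 8-byte doubles, 1 kB is 1024 bytes, the usual rule of thumb of using only half the cache applies, and the stencil needs five rows of a (i-4 to i) resident at once. The helper names lc_max_block and stencil_blocked are hypothetical, and the blocked loop starts at i=4 and j=1 so that all accesses stay inside the arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest inner block length (in elements) for which
 * `layers` rows of that length occupy at most `safety * cache_bytes`.
 * Assumed layer condition: layers * block * elem_bytes <= safety * cache. */
size_t lc_max_block(size_t cache_bytes, size_t layers,
                    size_t elem_bytes, double safety)
{
    assert(layers > 0 && elem_bytes > 0);
    return (size_t)((double)cache_bytes * safety
                    / (double)(layers * elem_bytes));
}

/* Hypothetical blocked variant of the stencil on flat row-major arrays of
 * m x n doubles. The inner (j) dimension is cut into blocks of at most
 * `jblock` elements; each block is swept over all rows before the next
 * block starts, so only rows of length `jblock` must stay in cache.
 * Assumption: loops start at i=4, j=1 to avoid out-of-bounds accesses. */
void stencil_blocked(size_t m, size_t n, size_t jblock,
                     const double *a, double *b)
{
    assert(jblock > 0);
    for (size_t js = 1; js < n; js += jblock) {
        /* End of the current block, clipped to the row length. */
        size_t je = (js + jblock < n) ? js + jblock : n;
        for (size_t i = 4; i < m; ++i)
            for (size_t j = js; j < je; ++j)
                b[i * n + j] = 0.25 * (a[(i - 1) * n + j] + a[i * n + j - 1]
                                     + a[(i - 2) * n + j] + a[(i - 3) * n + j]
                                     + a[(i - 4) * n + j]);
    }
}
```

Under these assumptions, lc_max_block(512 * 1024, 5, 8, 0.5) evaluates to 6553 elements, which is smaller than N = 10000, so the inner loop would need blocking for the layer condition to hold in L2. The calculator may count layers or apply the safety factor differently, so its result can deviate from this estimate.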

Last modified: Thursday, 11 May 2023, 12:33 PM