Data transfers and strided loops

by Georg Hager

Dear PTfS students,

There was some confusion in the tutorial about how to calculate the data transfer volume for a strided loop, e.g.:

for(i=0; i<N; i+=2) {
  z[i] = a[i] * 0.5f;
}

As discussed on slide 16 of slide set 4, cache lines are always read and written as a whole. Consequently, a[] must be read completely, and z[] must be read and written completely, including the elements that are not used in any calculation. Since we want to calculate the in-memory code balance, all that matters are the memory transfers. Per array element (4 bytes in single precision), the loop moves 4 bytes (read a[]) + 4 bytes (read z[]) + 4 bytes (write z[]) = 12 bytes, but it performs only 0.5 flops because only every second element takes part in a multiplication. Hence, this loop has a memory code balance of 12 bytes / 0.5 flops = 24 byte/flop. This is also true for the L2 and L3 code balance. (Note that if the stride is larger than a cache line (16 elements in this case), the data traffic may be reduced, depending on the details of the architecture.)
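
If you want to play with other strides, the calculation above can be sketched as a small C function. This is only a simplified model and not part of the lecture material: the name code_balance and the two constants are made up for illustration, and it assumes 64-byte cache lines, 4-byte floats, a read of z[] before it is written, and no prefetcher effects. For strides beyond one cache line it assumes that exactly one line per data stream is touched per iteration, which real hardware may or may not do.

```c
/* Simplified traffic model for:  for(i=0; i<N; i+=stride) z[i] = a[i] * 0.5f;
 * Assumptions (not from the original post): 64-byte cache lines,
 * 4-byte floats, z[] is read before being written, no prefetcher effects. */
#define CACHE_LINE_BYTES 64.0
#define ELEM_BYTES        4.0

/* Returns the code balance in byte/flop for a given stride (in elements).
 * Each loop iteration performs exactly one flop (the multiplication). */
double code_balance(int stride)
{
    /* Bytes between two consecutively accessed elements of one array. */
    double span = stride * ELEM_BYTES;

    /* Whole cache lines are transferred: as long as the span fits into a
     * line, every byte in between is moved as well. For larger spans,
     * at most one full line per iteration is touched in this model. */
    double per_stream = (span < CACHE_LINE_BYTES) ? span : CACHE_LINE_BYTES;

    /* Three data streams: read a[], read z[], write z[]. */
    return 3.0 * per_stream;
}
```

For example, code_balance(1) gives 12 byte/flop for the contiguous loop and code_balance(2) gives the 24 byte/flop from above. For strides of 16 elements or more, this model saturates at 192 byte/flop; whether a real machine matches that number depends on the architecture.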

Best,

Georg.