ParProg20: Assignment 1 | NHR Learning Platform

The Hockney Model. The simplistic bandwidth/latency model for communication paths postulates that the transfer time is composed of a latency and a bandwidth part: T=λ+N/b, where λ is the latency, b is the asymptotic bandwidth, and N is the length of the message in bytes. The effective bandwidth of the transfer is the ratio of message size to transfer time: B_eff=N/T.
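
The two formulas above translate directly into code. The following is a minimal sketch, not part of the original assignment; the function names `hockney_time`, `effective_bandwidth`, and `tabulate_beff` are hypothetical, and the units (seconds, bytes, bytes per second) are an assumption chosen to match N in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hockney model: transfer time T = lambda + N/b in seconds.
   latency_s     : latency lambda in seconds
   bandwidth_Bps : asymptotic bandwidth b in bytes per second (assumed unit)
   n_bytes       : message length N in bytes */
double hockney_time(double latency_s, double bandwidth_Bps, double n_bytes)
{
    assert(latency_s >= 0.0 && bandwidth_Bps > 0.0 && n_bytes >= 0.0);
    return latency_s + n_bytes / bandwidth_Bps;
}

/* Effective bandwidth B_eff = N / T in bytes per second. */
double effective_bandwidth(double latency_s, double bandwidth_Bps, double n_bytes)
{
    double t = hockney_time(latency_s, bandwidth_Bps, n_bytes);
    /* N = 0 with zero latency would give 0/0; return 0 in that case. */
    return (t > 0.0) ? n_bytes / t : 0.0;
}

/* Sample B_eff on a logarithmic grid N = 1, 10, 100, ... bytes.
   Writes `count` entries into sizes[] and beff[]. */
void tabulate_beff(double latency_s, double bandwidth_Bps,
                   double *sizes, double *beff, size_t count)
{
    double n = 1.0;
    for (size_t i = 0; i < count; i++) {
        sizes[i] = n;
        beff[i] = effective_bandwidth(latency_s, bandwidth_Bps, n);
        n *= 10.0;  /* next decade */
    }
}
```

As a check with made-up values λ = 1 μs and b = 1 Gbyte/s, a 3000-byte message takes 1 μs + 3 μs = 4 μs, so its effective bandwidth is 0.75 Gbyte/s. The decade grid in `tabulate_beff` is only one possible sampling; a finer grid gives a smoother curve.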

(a) There is a specific message size N_1/2 at which half of the asymptotic bandwidth is achieved. Derive an expression for N_1/2(λ,b).
(b) An Intel "Omni-Path" network connection has an asymptotic bandwidth of 100 Gbit/s per direction and a latency of 1.2 μs. Draw a diagram of effective bandwidth in Gbyte/s vs. message size in bytes for message sizes up to 100 Mbytes. Hint: Choose a logarithmic scale on the x axis (which therefore has to start at a nonzero size, e.g., 1 byte) and a linear scale on the y axis. What is N_1/2 here?
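
Before plotting, it may help to convert the bandwidth to the units of the diagram and to look at how the model behaves for very small and very large messages. The lines below are a sketch of that preparation, assuming 8 bits per byte and decimal prefixes (1 Gbyte = 10^9 bytes); they require the `amsmath` package and leave the derivation of N_1/2 to part (a).

```latex
\begin{align*}
b &= 100~\text{Gbit/s} = \frac{100}{8}~\text{Gbyte/s} = 12.5~\text{Gbyte/s}, \\
B_\mathrm{eff}(N) &= \frac{N}{T} = \frac{N}{\lambda + N/b}
                   = \frac{b}{1 + \lambda b / N}, \\
B_\mathrm{eff}(N) &\approx \frac{N}{\lambda} \quad (N \ll \lambda b), \qquad
B_\mathrm{eff}(N) \to b \quad (N \to \infty).
\end{align*}
```

The curve therefore rises linearly with N while latency dominates and saturates at b for large messages, which is why a logarithmic x axis shows the transition more clearly than a linear one.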

Performance limits. The (now discontinued) Intel Xeon Phi "Knights Landing" (KNL) coprocessor had the following features (maximum values for top model):
- Clock speed: up to 1.4 GHz
- SIMD register width: 512 bit
- Floating-point superscalarity: 2 FMA instructions per cycle
- Cache size: up to 36 Mbyte
- Memory bandwidth (measured): 480 Gbyte/s
- Number of cores: up to 72
(a) What is the double-precision peak performance of the chip?
(b) Derive an upper bound for the performance in Gflop/s of the following code on the KNL, assuming that it is properly parallelized to run on all cores in parallel (a and b are double-precision arrays, s is a double-precision variable):
```
for(int i=0; i<10000000; i++)
  a[i] += s*b[i];
```
Hint: There is not enough information here about the CPU to really get to the bottom of this. Just do your best with the information you were given.
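
One common way to organize such an estimate is to compare two limits: the peak arithmetic rate of the chip and the rate at which memory can feed the loop. The helpers below are a hedged sketch of that bookkeeping, not the intended solution. All function names are hypothetical. The counting conventions (2 flops per FMA per SIMD lane, every core issuing full-width FMAs every cycle at the stated clock) are assumptions. How many bytes the loop moves per iteration is left as an input for the reader to justify.

```c
#include <assert.h>

/* Peak floating-point rate in Gflop/s.
   Assumes every core issues fma_per_cycle full-width FMA instructions
   each cycle and counts one FMA as 2 flops per SIMD lane. */
double peak_gflops(int cores, double clock_ghz, int fma_per_cycle,
                   int simd_bits, int bits_per_element)
{
    assert(cores > 0 && clock_ghz > 0.0 && fma_per_cycle > 0);
    assert(bits_per_element > 0 && simd_bits >= bits_per_element);
    int lanes = simd_bits / bits_per_element;  /* elements per register */
    return cores * clock_ghz * fma_per_cycle * lanes * 2.0;
}

/* Bandwidth-limited rate in Gflop/s for a loop that executes
   flops_per_iter flops and moves bytes_per_iter bytes between memory
   and the chip in each iteration (mem_bw_gbytes in Gbyte/s). */
double bandwidth_bound_gflops(double mem_bw_gbytes, double flops_per_iter,
                              double bytes_per_iter)
{
    assert(mem_bw_gbytes > 0.0 && flops_per_iter > 0.0 && bytes_per_iter > 0.0);
    return mem_bw_gbytes * flops_per_iter / bytes_per_iter;
}

/* Upper bound: the loop can run no faster than the lower of the two limits. */
double performance_bound_gflops(double peak, double bw_bound)
{
    return (peak < bw_bound) ? peak : bw_bound;
}
```

For illustration with made-up numbers, a 4-core, 2 GHz chip with 256-bit SIMD and one FMA per cycle gives `peak_gflops(4, 2.0, 1, 256, 64)` = 4 × 2 × 1 × 4 × 2 = 64 Gflop/s. For the loop above, each iteration performs one multiply and one add, and the two arrays of 10^7 doubles occupy about 160 Mbyte in total, which exceeds the 36 Mbyte of cache. This suggests that the memory bandwidth limit is the relevant one.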

Last modified: Tuesday, 7 September 2021, 4:35 PM