1. The Hockney Model. The simplistic bandwidth/latency model for communication paths postulates that the transfer time is composed of a latency part and a bandwidth part: T = λ + N/b, where λ is the latency, b is the asymptotic bandwidth (in bytes per second), and N is the length of the message in bytes. The effective bandwidth of the transfer is the ratio of message size to transfer time: B_eff = N/T.

    (a) There is a specific message size N_1/2 at which half of the asymptotic bandwidth is achieved, i.e., B_eff = b/2. Derive an expression for N_1/2(λ, b).
    (b) An Intel "Omni-Path" network connection has an asymptotic bandwidth of 100 Gbit/s per direction and a latency of 1.2 μs. Draw a diagram of effective bandwidth in Gbyte/s vs. message size in bytes for messages from 0 to 100 Mbytes. Hint: Choose a logarithmic scale on the x axis and a linear scale on the y axis. What is N_1/2 here?


2. Performance limits. The (now discontinued) Intel Xeon Phi "Knights Landing" (KNL) coprocessor had the following features (maximum values for the top model):
    • Clock speed: up to 1.4 GHz
    • SIMD register width: 512 bit
    • Floating-point superscalarity: 2 FMA instructions per cycle
    • Cache size: up to 36 Mbyte
    • Memory bandwidth (measured): 480 Gbyte/s
    • Number of cores: up to 72

    (a) What is the double-precision peak performance of the chip?
    (b) Derive an upper bound for the performance in Gflop/s of the following code on the KNL, assuming that it is properly parallelized to run on all cores in parallel (a and b are double-precision arrays, s is a double-precision variable):

    for (int i = 0; i < 10000000; i++)
        a[i] += s * b[i];

    Hint: There is not enough information here about the CPU to really get to the bottom of this. Just do your best with the information you were given.

Last modified: Tuesday, 7 September 2021, 4:35 PM