1. Slow computing. Assume that a parallel program behaves according to Amdahl's Law plus communication overhead (serial fraction s, communication time c(N) = kN). We compare the scalability of this program on two machines. Both have the same communication network, but machine 1 executes code three times faster than machine 2.
    (a) Prove that the slow machine "shows better speedup" than the fast machine.
    (b) Does it matter?
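    Hint for (a): with normalized serial runtime t_s, the model gives the speedup S(N) = t_s / (t_s·(s + (1−s)/N) + kN); the communication term kN does not shrink on the faster machine. A minimal numeric sketch of this model follows — the values s = 0.05 and k = 0.005 are hypothetical, chosen only for illustration:

    ```c
    #include <stdio.h>

    /* Speedup model: normalized serial runtime t_s, serial fraction s,
     * communication overhead c(N) = k*N (independent of CPU speed):
     *   S(N) = t_s / (t_s*(s + (1-s)/N) + k*N)                        */
    static double speedup(double t_s, double s, double k, int N) {
        return t_s / (t_s * (s + (1.0 - s) / N) + k * N);
    }

    int main(void) {
        const double s = 0.05, k = 0.005;  /* hypothetical parameters */
        for (int N = 1; N <= 64; N *= 2) {
            double S_slow = speedup(1.0,       s, k, N); /* slow machine  */
            double S_fast = speedup(1.0 / 3.0, s, k, N); /* 3x faster one */
            printf("N=%2d  slow: %6.2f  fast: %6.2f\n", N, S_slow, S_fast);
        }
        return 0;
    }
    ```

    For the faster machine the fixed communication time kN weighs three times as heavily relative to the (shorter) compute time, which is why its speedup curve falls below the slow machine's for N > 1.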

2. A performance model. On slide 22 of Lecture 3 we analyzed a simple loop:

    #pragma omp parallel for
    for (i = 0; i < n; ++i)
        a[i] = a[i] + s * c[i];

    In contrast to our analysis in the lecture, in reality the OpenMP parallelization does incur some overhead. In this case it amounts to roughly 2000 CPU cycles (with 8 cores at a clock speed of 3 GHz).

    Make a model of the expected performance in Gflop/s of this loop with respect to n, the loop length, on an 8-core CPU with a peak performance of 192 Gflop/s and a memory bandwidth of 40 Gbyte/s. You can assume that all the arrays (even if they are short) are in main memory. At which n do we get half of the asymptotic (i.e., large-n) performance?
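    As a starting point, here is a sketch of such a model in C. It assumes (this is not stated on the sheet) 2 flops and 24 bytes of main-memory traffic per iteration — a[] read and written, c[] read, with no extra write-allocate transfer since a[] is loaded anyway — so the loop is memory-bound and the asymptotic performance is set by bandwidth, while the parallelization overhead determines at which n half of it is reached:

    ```c
    #include <stdio.h>

    /* Assumed per-iteration cost (not from the slides):
     * 2 flops (add + multiply), 24 bytes of memory traffic.          */
    #define BW     40e9           /* memory bandwidth [bytes/s]       */
    #define T_OVH  (2000.0 / 3e9) /* OpenMP overhead: 2000 cy @ 3 GHz */
    #define BYTES  24.0           /* memory traffic per iteration     */
    #define FLOPS  2.0            /* flops per iteration              */

    /* model: runtime and performance as functions of loop length n  */
    static double model_time(double n)   { return T_OVH + n * BYTES / BW; }
    static double model_gflops(double n) { return FLOPS * n / model_time(n) / 1e9; }

    int main(void) {
        double p_inf  = FLOPS / BYTES * BW / 1e9; /* asymptotic Gflop/s  */
        double n_half = T_OVH * BW / BYTES;       /* overhead == streaming */
        printf("asymptotic: %.2f Gflop/s (memory-bound, far below peak)\n", p_inf);
        printf("half of asymptotic performance at n ~ %.0f\n", n_half);
        for (double n = 1e2; n <= 1e6; n *= 10)
            printf("n=%8.0f  P=%.2f Gflop/s\n", n, model_gflops(n));
        return 0;
    }
    ```

    Under these assumptions the 192 Gflop/s peak never matters: the bandwidth ceiling lies far below it, and half of the asymptotic performance is reached exactly where the overhead time equals the data-streaming time.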

Last modified: Wednesday, 4 November 2020, 4:23 PM