There is a major difference between the loop kernel we analyze on slide 3b-27 and the one in assignment 1, task 2. In the assignment, a dependency chain spans the iterations and we need to take it into account: we CANNOT compute a[i] before a[i-2] is ready. In contrast, the individual iterations of the loop on the slides (a[i] = s1+s2*b[i]) can fully overlap with any other iteration, because all computations are independent (a[i] and b[i] are never accessed outside of iteration i).

With regard to the flops/cy on the slide for assignment 1: 4/14 is 0.286, as you pointed out, so this was obviously a typo. I fixed it and updated the PDF. Thank you for pointing this out.
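To make the contrast between the two kinds of loop concrete, here is a minimal sketch in C. The first function uses the kernel from the slides as quoted above. The second is only a hypothetical stand-in for a loop with a distance-2 dependency; the assignment's actual kernel is not reproduced here, and the function names and the exact right-hand side are assumptions for illustration.

```c
#include <stddef.h>

/* Kernel as quoted from the slides: iteration i reads b[i] and writes a[i].
 * No iteration reads a value produced by another iteration, so the
 * hardware is free to overlap (pipeline) any number of iterations. */
void independent_kernel(double *a, const double *b,
                        double s1, double s2, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = s1 + s2 * b[i];
}

/* HYPOTHETICAL stand-in for a loop-carried dependency of distance 2
 * (not the assignment's actual code): iteration i reads a[i-2], which
 * was written two iterations earlier. The multiply-add for a[i] cannot
 * start before a[i-2] is ready, so the even and odd elements form two
 * separate dependency chains that each run serially. */
void dependent_kernel(double *a, double s1, double s2, size_t n)
{
    for (size_t i = 2; i < n; ++i)
        a[i] = s1 + s2 * a[i - 2];
}
```

In the first function the throughput of the execution units is the limit, whereas in the second the latency of the arithmetic along each chain also has to be taken into account when estimating cycles per iteration.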