Performance Calculation

Re: Performance Calculation

by Jan Laukemann
There is a major difference between the loop kernel we analyze on slide 3b-27 and the one in assignment 1, task 2. In the assignment, a dependency chain spans the iterations, and we need to take this into account: we CANNOT compute a[i] before a[i-2] is ready. In contrast, the individual iterations of the loop in the slides (a[i] = s1+s2*b[i]) can fully overlap with any other iteration, because all computations are independent (a[i] and b[i] are never accessed outside of iteration i).
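To make the contrast concrete, here is a minimal sketch of the two loop shapes. The first function is the kernel from the slides. The second is only a hypothetical stand-in for a loop with a dependency distance of 2: its body (a[i] = s1 + s2*a[i-2]) and both function names are assumptions for illustration, not the actual assignment kernel.

```c
#include <stddef.h>

/* Kernel from the slides: iteration i reads b[i] and writes a[i] only,
   so no iteration has to wait for the result of another one. */
void independent_kernel(double *a, const double *b,
                        double s1, double s2, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = s1 + s2 * b[i];
}

/* Hypothetical loop-carried dependency with distance 2 (assumed body,
   not the assignment's kernel): a[i] cannot be computed before a[i-2]
   is ready, so iterations i-2, i, i+2, ... form a serial chain. */
void carried_kernel(double *a, double s1, double s2, size_t n)
{
    for (size_t i = 2; i < n; ++i)
        a[i] = s1 + s2 * a[i - 2];
}
```

In the second loop the even and odd indices form two separate chains, so at most two iterations can be in flight at a time. In the first loop the overlap is limited only by the hardware.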

With regard to the flops/cy value on the assignment 1 slide: 4/14 is 0.286, as you pointed out, so the value given there was a typo. I have fixed it and updated the PDF. Thank you for catching this.