PTfS25: Performance Calculation | NHR Learning Platform

Hi,

I am confused about the performance calculation. In assignment 1, question 2, performance was calculated using the bottleneck latency of the add-mult instructions. However, on slide 3b, page 27, performance was calculated using the bottleneck throughput of the store instruction. Is performance independent of latency on hardware that has SIMD vectorization? Likewise, is performance independent of latency on hardware that does not have SIMD vectorization?

Best regards.

Re: Performance Calculation

by Meenal Baberwal - Monday, 19 May 2025, 5:46 PM

I have a query regarding the same assignment question. We are supposed to give Pmax in units of flops/cycle, and since there are 4 flops, 4/14 gives 0.286 flops/cycle. Is the value given on the slide correct, or am I missing something? Please assist.

Re: Performance Calculation

by Jan Laukemann - Monday, 19 May 2025, 6:33 PM

There is a major difference between the loop kernel we are analyzing on slide 3b-27 and the one in assignment 1, task 2. In the assignment, there is a dependency chain spreading across the iterations, and we need to take its latency into account (we CANNOT compute a[i] before a[i-2] is ready). In contrast, the individual iterations of the loop in the slides (a[i] = s1+s2*b[i]) can fully overlap with any other iteration, as all computations are independent (a[i] and b[i] are never accessed outside of iteration i), so only the throughput bottleneck matters there.
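
To make the distinction concrete, here is a minimal sketch of the two cases as a simple bound model. It is not taken from the slides or the assignment: the function names and the throughput figures are hypothetical, and only the "4 flops in 14 cycles" figure comes from this thread. It assumes those 14 cycles are the latency of the loop-carried dependency chain.

```python
def cycles_per_iteration(throughput_cycles, carried_latency_cycles=0.0):
    """Lower bound on cycles per loop iteration in a simple in-core model.

    throughput_cycles: cycles the busiest execution unit needs per iteration.
    carried_latency_cycles: latency of the loop-carried dependency chain per
        iteration; 0 if the iterations are independent and can overlap.
    """
    # The slower of the two limits determines the iteration time.
    return max(throughput_cycles, carried_latency_cycles)


def flops_per_cycle(flops_per_iteration, cycles):
    """Performance in flops/cycle for a given iteration time."""
    return flops_per_iteration / cycles


# Dependent iterations: 4 flops and 14 cycles are the figures quoted in this
# thread; the throughput limit of 2 cycles is a made-up placeholder.
dependent_cycles = cycles_per_iteration(throughput_cycles=2.0,
                                        carried_latency_cycles=14.0)
p_dependent = flops_per_cycle(4, dependent_cycles)

# Independent iterations (like a[i] = s1 + s2*b[i]): no carried latency, so
# only the throughput limit counts. Both numbers here are hypothetical.
independent_cycles = cycles_per_iteration(throughput_cycles=1.0)
p_independent = flops_per_cycle(2, independent_cycles)

print(f"dependent:   {p_dependent:.3f} flops/cy")
print(f"independent: {p_independent:.3f} flops/cy")
```

In this model, latency only shows up in the result when a dependency chain links the iterations. Otherwise the instruction latencies are hidden by overlapping iterations and the throughput limit alone sets the iteration time.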

With regard to the flops/cy on the assignment 1 slide: 4/14 is 0.286, as you pointed out, so this is obviously a typo. I fixed it and updated the PDF. Thank you for pointing this out.