Dear PTFS team,
Does the requested performance follow the definition given in the slides from 05/08 or the one in the slides from 04/17? And, in general, how should we distinguish between the two when performance is asked for?
Kind regards
Hassan Rady
If you look at the lecture you will see that we generally assume that SIMD variants of instructions have the same throughput and latency as their scalar counterparts. There are exceptions, as you know from the SQRT problem, but if this is the case it will be clearly stated.
Horizontal add: most of this was shown in the lecture on 04/30. We assume here that it is a single instruction that takes some extra amount of latency. If the compiler performs unrolling on top of SIMD, the horizontal add is needed multiple times, plus a final scalar ADD.
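To illustrate the point about unrolling on top of SIMD, here is a minimal sketch that models the reduction in plain Python. The lane width, the unroll factor, and the function names are assumptions chosen for illustration; they are not taken from the lecture or the problem statement. Each SIMD register is modelled as a short list, and the horizontal add is treated as one operation per register.

```python
# Illustrative model only: LANES and UNROLL are assumed values, not from the lecture.
LANES = 4    # assumed SIMD width (elements per register)
UNROLL = 2   # assumed unroll factor (independent vector accumulators)


def horizontal_add(vec):
    """Collapse one SIMD register into a scalar (modelled as a single instruction)."""
    return sum(vec)


def simd_unrolled_sum(data):
    """Sum `data` with UNROLL vector accumulators of LANES elements each.

    Returns the total and the number of horizontal adds that were needed.
    """
    step = LANES * UNROLL
    assert len(data) % step == 0, "sketch assumes no remainder loop"

    # One vector accumulator per unrolled stream.
    acc = [[0.0] * LANES for _ in range(UNROLL)]

    for base in range(0, len(data), step):
        for u in range(UNROLL):
            off = base + u * LANES
            # Models one SIMD ADD: all lanes of accumulator u are updated at once.
            acc[u] = [a + x for a, x in zip(acc[u], data[off:off + LANES])]

    # After the loop: one horizontal add per vector accumulator ...
    partials = [horizontal_add(v) for v in acc]

    # ... followed by scalar ADDs that merge the partial sums.
    total = partials[0]
    for p in partials[1:]:
        total += p

    return total, len(partials)


total, n_hadds = simd_unrolled_sum([float(i) for i in range(16)])
```

With the assumed two-way unrolling, the sketch performs two horizontal adds after the loop and one final scalar ADD to combine them. In this model, a larger unroll factor needs one horizontal add per accumulator and correspondingly more scalar ADDs.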
I do not understand the question; according to the problem statement, you should assume that the data comes from the L1 cache.
(this pertains to your first question; Moodle somehow did not recognize that I wanted to respond to this specific item)