Assignment 3 Task 2

by Hassan Rady -

Dear PTFS team,


Does the performance requested here follow the definition from the 05/08 slides or the one from the 04/17 slides? And, in general, how can we tell which of the two is meant when a task asks for the performance?


Kind regards

Hassan Rady 

In reply to Hassan Rady

Re: Assignment 3 Task 2

by Timm Bugla -
Hi! I'll reply in this thread since my questions also concern subtask c) of Task 2.

When we use SIMD, can we assume that the SIMD MULT and LOAD operations take the same time as their scalar counterparts?
Is a single LOAD operation enough to fill a SIMD register, or do we need to call LOAD multiple times per SIMD register?
When doing the horizontal add, is that part of the ADD operation or is it separate from it? If it is the latter, is it valid in the context of c) to save the result of the horizontal add in a separate register "t" and then execute a normal add on "t" and "s"? Or would that be considered a form of loop unrolling?


In reply to Timm Bugla

Re: Assignment 3 Task 2

by Georg Hager -

If you look at the lecture you will see that we generally assume that SIMD variants of instructions have the same throughput and latency as their scalar counterparts. There are exceptions, as you know from the SQRT problem, but if this is the case it will be clearly stated.

Regarding the horizontal add: most of this was shown in the lecture on 04/30. We assume here that it is a single instruction that incurs some extra latency. If the compiler performs unrolling on top of SIMD, we need it multiple times (once per SIMD accumulator register), plus a final scalar ADD.
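To make the structure concrete, here is a minimal sketch in plain C that emulates the SIMD lanes with small arrays. It is only an illustration under assumptions that are not taken from the problem statement: the kernel is assumed to be a scalar product `s += a[i]*b[i]`, the SIMD width `VLEN` is assumed to be 4 doubles, the unrolling factor is assumed to be 2, and the names `hadd` and `dot_simd_unrolled` are hypothetical. With two SIMD accumulators there are two horizontal adds after the loop, followed by one scalar ADD.

```c
#include <stddef.h>

#define VLEN 4  /* assumed SIMD width in doubles (hypothetical) */

/* Emulated horizontal add: sums the VLEN lanes of one SIMD register.
   In the model discussed above this counts as a single instruction
   with some extra latency. */
static double hadd(const double v[VLEN]) {
    double t = 0.0;
    for (int l = 0; l < VLEN; ++l)
        t += v[l];
    return t;
}

/* Scalar product with SIMD plus 2-way unrolling, emulated lane by lane.
   Assumes n is a multiple of 2*VLEN (no remainder loop in this sketch). */
double dot_simd_unrolled(const double *a, const double *b, size_t n) {
    double acc0[VLEN] = {0.0};  /* first SIMD accumulator register  */
    double acc1[VLEN] = {0.0};  /* second SIMD accumulator register */

    for (size_t i = 0; i < n; i += 2 * VLEN) {
        /* Each inner iteration stands for one lane of a SIMD LOAD, MULT, ADD. */
        for (int l = 0; l < VLEN; ++l) {
            acc0[l] += a[i + l] * b[i + l];
            acc1[l] += a[i + VLEN + l] * b[i + VLEN + l];
        }
    }

    double t0 = hadd(acc0);  /* horizontal add no. 1 */
    double t1 = hadd(acc1);  /* horizontal add no. 2 */
    return t0 + t1;          /* final scalar ADD     */
}
```

The horizontal adds and the final scalar ADD happen once after the loop, so they do not affect the per-iteration throughput of the loop body.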


In reply to Hassan Rady

Re: Assignment 3 Task 2

by Georg Hager -

I do not understand the question; according to the problem statement, you should assume that the data comes from the L1 cache.

(this pertains to your first question; Moodle somehow did not recognize that I wanted to respond to this specific item)