Question on Sheet 1, Ex 2

Re: Question on Sheet 1, Ex 2

by Jan Laukemann

Hi Max,

  1. You are absolutely right: a smart CPU would keep a[i] in a register for two more iterations and then reuse it as a[i-2]. However, assuming each load and store takes 1 cycle, performance would not change in this case even if the CPU reloaded a[i-2] from cache in every iteration. (That last sentence is more complicated and, I guess, more confusing than helpful, so let's forget about it, sorry.)
  2. As soon as any arithmetic or load instruction finishes, its result is in a register and can be used by any other instruction that needs it as an input. That is, two dependent ADDs on the same variable take exactly 2 x 6 = 12 cycles. Any instruction that is independent of all instructions currently in flight (i.e., currently being executed) can be executed in parallel, provided the first stage of the required pipeline is free. This means, of course, that I can't start two independent ADD instructions in the same cycle, due to the given hardware limitation.
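
To make point 2 a bit more concrete, here is a minimal sketch of that timing model in Python. It is not part of the exercise sheet: the function name `schedule_adds` and the `deps` format are made up for illustration, and it assumes a single ADD pipeline with a latency of 6 cycles (taken from the "2 x 6" above), at most one ADD entering the first stage per cycle, and a result that is usable in the cycle the instruction finishes.

```python
def schedule_adds(deps, latency=6):
    """Return a (start, finish) cycle pair for each ADD, in program order.

    deps[k] lists the indices of earlier ADDs whose results ADD k needs.
    Assumed model: one ADD pipeline, one new ADD per cycle, fixed latency.
    """
    starts, finishes = [], []
    busy = set()  # cycles in which the first pipeline stage is already taken
    for needed in deps:
        # Earliest cycle at which all inputs are available in registers.
        cycle = max((finishes[j] for j in needed), default=0)
        # The first stage accepts only one ADD per cycle.
        while cycle in busy:
            cycle += 1
        busy.add(cycle)
        starts.append(cycle)
        finishes.append(cycle + latency)
    return list(zip(starts, finishes))


dependent = schedule_adds([[], [0]])   # second ADD consumes the first result
independent = schedule_adds([[], []])  # two unrelated ADDs

print(dependent[-1][1])    # 12: the second ADD has to wait for the first
print(independent[-1][1])  # 7: the second ADD starts one cycle later
```

Under these assumptions the dependent pair finishes after 12 cycles, while the independent pair overlaps in the pipeline and finishes after 7, because the second ADD only has to wait one cycle for the first stage to become free.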

Did this clarify the task for you?

Best,
Jan