Assignment 1 Task 2

by Sascha Hofmann -

Hello everyone,

I have a question regarding Assignment 1 Task 2. 

The exercise slides state that two iterations can be completed every 14 cycles. I understand that this is the sum of the ADD and MUL latencies, so 6 + 8 = 14.

But where does this leave load and store? Shouldn't these two operations take at least one cycle each?

In reply to Sascha Hofmann

Re: Assignment 1 Task 2

by Jan Laukemann -

The LOAD and STORE instructions are outside of the dependency chain and can therefore overlap with the MUL-ADD dependency chain.

For the LOAD, think about the earliest point at which each element can be loaded. We can load a[0] in the first cycle, a[1] in the second, a[2] in the third, and so on. In the warm-up phase of the first iteration we have to wait a cycle for the load before the MUL can start, but for every following iteration the data we need is either already loaded (in the case of a[i]) or already in a register because we computed its value earlier (in the case of a[i-2]).

For the STORE, we do have to execute it, but the value of a[i] is already in a register and stays there until we need it again two iterations later, so the STORE can also fully overlap with the limiting MUL-ADD chain.
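
The overlap argument can be checked with a small cycle-counting sketch in Python. It is only a toy model under assumptions that are not in the assignment text: the loop is taken to have the form a[i] = s * a[i-2] + a[i], one LOAD can be issued per cycle with a one-cycle latency, the execution units are fully pipelined, and the 6 and 8 cycle latencies are assigned to ADD and MUL arbitrarily (only their sum matters here). The function name finish_times and its parameters are made up for illustration.

```python
def finish_times(n_iters, lat_mul=8, lat_add=6, lat_load=1):
    """Toy model: cycle at which each a[i] is available in a register.

    Assumed (hypothetical) loop body: a[i] = s * a[i-2] + a[i].
    Assumed: the load of a[i] is issued in cycle i and takes lat_load cycles.
    STOREs are not modeled: they only consume a value that is already in a
    register, so they never delay the MUL-ADD chain.
    """
    done = {}
    # Warm-up: a[0] and a[1] only need to be loaded.
    done[0] = 0 + lat_load
    done[1] = 1 + lat_load
    for i in range(2, 2 + n_iters):
        load_ready = i + lat_load          # a[i] has arrived from memory
        mul_done = done[i - 2] + lat_mul   # MUL waits for a[i-2]
        add_start = max(mul_done, load_ready)
        done[i] = add_start + lat_add      # new a[i] is in a register
    return done


if __name__ == "__main__":
    done = finish_times(6)
    for i in range(4, 8):
        # Distance between iterations i-2 and i on the same chain.
        print(i, done[i] - done[i - 2])
```

In this model the gap between a[i-2] and a[i] is always 14 cycles, and since the even and odd elements form two independent chains, two iterations complete every 14 cycles. The load of a[i] is ready long before the MUL result it is added to, which is why the LOAD never shows up in the cycle count.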