Hello,
I'm currently working on ex. 2) of this week's assignment sheet. I have some questions about the microarchitectural assumptions that I don't think the assignment statement addresses.
As far as I understand, each iteration consists of the following steps:
- Assuming the 16 registers suffice, an updated value a[i] stays in a register after an iteration has completed and becomes a[i-2] once the counter has advanced far enough. Hence, we do not need to load a[i-2] from the L1 cache again.
- The value a[i-2] is multiplied by s, which takes 8 cycles, and the result is written to a register. During that time, we can load a[i] and write the updated value a[i-2] from the previous iteration back to L1/MM, because the load and the store each take only one cycle.
- Next, we add the result "s*a[i-2]" to a[i], which takes 6 cycles, and write the sum back to a register.
- Finally, this updated value of a[i] needs to be written back to L1/MM. However, we can start new operations on it immediately, since it already resides in a register.
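To make the data flow in the steps above concrete, here is a minimal Python sketch. It assumes the loop body is a[i] = a[i] + s*a[i-2], as the steps describe. The function names and the variables r2 and r1 (standing in for the registers that hold a[i-2] and a[i-1]) are made up for illustration and are not taken from the assignment sheet.

```python
def update_naive(a, s):
    """Reference version: reads a[i-2] from the array in every iteration."""
    for i in range(2, len(a)):
        a[i] = a[i] + s * a[i - 2]
    return a


def update_register_carried(a, s):
    """Same update, but a[i-2] is taken from a carried local variable
    (a stand-in for a register) instead of being read from the array again."""
    n = len(a)
    if n < 3:
        return a
    r2, r1 = a[0], a[1]      # hypothetical registers holding a[i-2] and a[i-1]
    for i in range(2, n):
        x = a[i]             # LOAD a[i]
        p = s * r2           # MULT: s * a[i-2], operand comes from the register
        x = x + p            # ADD
        a[i] = x             # STORE; x is still available in the register
        r2, r1 = r1, x       # shift: the new a[i] becomes a[i-2] two iterations later
    return a
```

Both functions should produce the same array, so the only difference is where the operand a[i-2] comes from: the array (L1) in the first version, a carried value in the second.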
This model relies on two assumptions, and I'm not sure whether they are sensible:
- Assuming the 16 registers suffice, may we assume that "intermediate" values a[i-2] do not need to be fetched from L1 if they still reside in a register from a previously calculated iteration?
- Can multiple instructions, i.e. MULT, STORE, LOAD, ADD, access a register's value simultaneously? I ask because my model assumes that a new multiplication with a[i] can be started while a[i] is simultaneously being written back to L1/MM.
I'd be grateful if anyone could clarify these points for me.
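As a rough illustration of why the second assumption matters, the sketch below counts the cycles on the dependency chain from a[i-2] to a[i], using the latencies mentioned above (8 for MULT, 6 for ADD, 1 for STORE). The function name and the flag store_blocks_reuse are hypothetical, and the model deliberately ignores everything else (issue width, pipelining of independent chains, cache misses), so it is only a sketch under these assumptions, not the expected solution.

```python
def chain_step_cycles(mult_latency=8, add_latency=6, store_latency=1,
                      store_blocks_reuse=False):
    """Cycles from 'a[i-2] is available in a register' to
    'a[i] may be used as an operand by the next MULT'."""
    cycles = mult_latency + add_latency   # MULT followed by the dependent ADD
    if store_blocks_reuse:
        # Assumption: the register cannot be read by MULT until the STORE is done.
        cycles += store_latency
    return cycles


# If the register can be read while it is being stored, the STORE is hidden.
overlapped = chain_step_cycles(store_blocks_reuse=False)
# If the STORE has to finish first, it lengthens the chain by its latency.
serialized = chain_step_cycles(store_blocks_reuse=True)
```

Under these assumptions, the two cases differ by exactly the store latency per step of the dependency chain, which is why the answer to the second question changes the final cycle count.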
Best Regards
Max Jordan