That is not what I said.
The H2D and D2H transfers need to be done in addition to that (if the scenario in question calls for them).
The program needs to stage the data to the GPU (via PCIe, as discussed in the lecture), do the computation (this part is compute bound), and copy the result back.
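
To make the accounting concrete, here is a rough sketch of how the three phases add up. It assumes the transfers and the kernel run strictly one after another (no overlap), that both transfer directions see the same PCIe bandwidth, and that the compute phase is limited only by the GPU's arithmetic throughput. The function name and all numbers are hypothetical and only serve as an illustration; they are not taken from the lecture.

```python
def end_to_end_time(bytes_in, bytes_out, flops, pcie_bytes_per_s, gpu_flops_per_s):
    """Return (h2d, compute, d2h, total) in seconds for a non-overlapped run."""
    h2d = bytes_in / pcie_bytes_per_s      # stage the input over PCIe (host to device)
    compute = flops / gpu_flops_per_s      # compute-bound kernel
    d2h = bytes_out / pcie_bytes_per_s     # copy the result back (device to host)
    return h2d, compute, d2h, h2d + compute + d2h


# Hypothetical scenario: 16 GB in, 8 GB out, 2 TFLOP of work,
# 16 GB/s effective PCIe bandwidth, 1 TFLOP/s sustained on the GPU.
h2d, compute, d2h, total = end_to_end_time(16e9, 8e9, 2e12, 16e9, 1e12)
print(f"H2D {h2d} s + compute {compute} s + D2H {d2h} s = {total} s")
```

The point is only that the kernel time alone is not the answer whenever the scenario includes the transfers; their time gets added on top.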