Q1 and Q2 assignment 6

by Thies Weel -

Hi, I’d like to double-check two details about the PCIe links in Q1 and Q2.

1st question:
In Q1 the statement says “The CPUs and GPUs of a node are connected via 2 PCIe interfaces each capable of transferring 32 GB s⁻¹ in each direction.”
Does “2 interfaces” mean
• two links for the whole node, i.e. an aggregate 64 GB s⁻¹ up and 64 GB s⁻¹ down for the node, or
• two links per chip, giving each device its own 64 GB s⁻¹?

2nd question:
In the formulas from the slides, for calculation of P_max we never take memory roofline into account, yet it was given in Q1, so do you expect us to use it there as well?

Thank you in advance.
Thies

In reply to Thies Weel

Re: Q1 and Q2 assignment 6

by Sebastian Kuckuk -

The CPUs and GPUs of a node are connected via 2 PCIe

All CPUs are connected to all GPUs via two links. That is, the whole node shares two links.
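
For illustration, the shared-link reading can be put into numbers with a small sketch. The link count and the 32 GB/s per direction come from the task statement; the function names and the GPU count in the usage note are hypothetical, and the even split assumes that all GPUs transfer at the same time and share the links fairly.

```python
# Values taken from the task statement.
LINKS_PER_NODE = 2           # the whole node shares two PCIe links
GB_PER_S_PER_LINK = 32       # per link, in each direction


def node_bandwidth_per_direction(links=LINKS_PER_NODE, per_link=GB_PER_S_PER_LINK):
    """Aggregate PCIe bandwidth of the node in one direction, in GB/s."""
    return links * per_link


def fair_share_per_gpu(num_gpus):
    """Bandwidth per GPU in GB/s if all GPUs transfer simultaneously.

    Assumption (not from the task): the links are shared evenly.
    """
    return node_bandwidth_per_direction() / num_gpus
```

With a hypothetical node of 4 GPUs, `fair_share_per_gpu(4)` gives each GPU a quarter of the node's 64 GB/s per direction.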

In the formulas from the slides, for calculation of P_max we never take memory roofline into account, yet it was given in Q1, so do you expect us to use it there as well?

Not sure which part of the exercise this refers to. Task 1 states “Compute the theoretical peak floating-point performance (in FLOP/s)”, so there is no need for anything but the computational throughput.
In task 2, we implicitly assume that the computational part is compute bound (as is usual for dense matrix-matrix multiplication). The H2D and D2H transfers need to be done in addition to that (if the scenario in question calls for them).

In reply to Sebastian Kuckuk

Re: Q1 and Q2 assignment 6

by Thies Weel -

Alright, thank you.

Then I will assume that the PCIe bandwidth is not a limit in either question.

In reply to Thies Weel

Re: Q1 and Q2 assignment 6

by Sebastian Kuckuk -

That is not what I said.

The H2D and D2H transfers need to be done in addition to that (if the scenario in question calls for them).

The program needs to stage the data to the GPU (via PCIe, as discussed in the lecture), do the computation (this part is compute bound), and copy the result back.
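
The three steps can be sketched as a simple time model. This is only a rough sketch under assumptions that are not part of the assignment text: square n x n matrices in double precision, 2·n³ FLOP for the multiplication, both input matrices copied to the GPU and one result matrix copied back, and no overlap between transfers and computation. The function name and all parameter values are hypothetical; plug in the numbers from the task.

```python
def matmul_time_with_transfers(n, peak_flops, pcie_bytes_per_s, bytes_per_elem=8):
    """Estimate the time of C = A * B on a GPU including PCIe staging.

    n                 -- matrix dimension (assumed square, hypothetical)
    peak_flops        -- peak floating-point performance in FLOP/s
    pcie_bytes_per_s  -- usable PCIe bandwidth per direction in bytes/s
    bytes_per_elem    -- 8 for double precision (assumption)
    Returns (t_h2d, t_compute, t_d2h, t_total) in seconds.
    """
    h2d_bytes = 2 * n * n * bytes_per_elem   # stage A and B to the GPU
    d2h_bytes = n * n * bytes_per_elem       # copy C back to the host

    t_h2d = h2d_bytes / pcie_bytes_per_s
    t_compute = 2 * n**3 / peak_flops        # compute-bound part
    t_d2h = d2h_bytes / pcie_bytes_per_s

    # No overlap assumed: the three phases run one after another.
    return t_h2d, t_compute, t_d2h, t_h2d + t_compute + t_d2h


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    phases = matmul_time_with_transfers(n=1000, peak_flops=2e12, pcie_bytes_per_s=64e9)
    print(phases)
```

The point of the model is that the transfer times are added to the compute time rather than ignored, even though the computation itself is treated as compute bound.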