A stencil on the GPU
The folder J2D contains the CUDA version of a Jacobi 2D code as shown in the lecture.
Look at the source code in jacobi-2d.cu: The GPU kernel is in jacobi().
- How many bytes per lattice site update (LUP) do you expect this code will transfer from and to memory?
- What is the expected maximum performance of the code in GLUP/s when run on an A100-80 GB GPU (theoretical memory bandwidth 1.9 Tbyte/s)?
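A rough way to set up both estimates is a bandwidth-limit (roofline-style) calculation. The sketch below rests on assumptions that must be checked against the actual kernel: double-precision data (8 bytes per value), a 5-point stencil whose neighbor accesses are served from cache so that each site is loaded from memory only once, and one store per update with no extra write-allocate traffic. The symbols B_C (code balance), b_S (memory bandwidth), and P_max are notation introduced here, not taken from the exercise.

```latex
\begin{align*}
B_C &= \underbrace{8\,\mathrm{B}}_{\text{load}} + \underbrace{8\,\mathrm{B}}_{\text{store}}
     = 16\,\mathrm{B/LUP} \\
P_{\max} &= \frac{b_S}{B_C}
          = \frac{1.9\times10^{12}\,\mathrm{B/s}}{16\,\mathrm{B/LUP}}
          \approx 119\,\mathrm{GLUP/s}
\end{align*}
```

If the code turns out to use single precision, the same reasoning would give 8 B/LUP and roughly 237.5 GLUP/s. If neighbor rows are not reused from cache, B_C grows and the limit drops accordingly.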
Run the code using the provided job script in job-marvin.sh. Are your expectations met? Vary the problem size and the number of threads per block (x and y dimensions). What is the highest performance you can reach?
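To illustrate what varying the problem size and the block shape means in practice, here is a minimal, self-contained sketch. It is not the code from jacobi-2d.cu: the kernel name `jacobi_sketch`, the command-line arguments, the default sizes, and the sweep count are all hypothetical, and double precision is assumed. It times a number of sweeps with CUDA events and reports GLUP/s.

```cuda
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <cuda_runtime.h>

// Hypothetical 5-point Jacobi sweep; the kernel in jacobi-2d.cu may differ.
__global__ void jacobi_sketch(const double* __restrict__ src,
                              double* __restrict__ dst, int nx, int ny)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // contiguous (fast) index
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // strided (slow) index
    if (i >= 1 && i < nx - 1 && j >= 1 && j < ny - 1) {
        size_t c = (size_t)j * nx + i;
        dst[c] = 0.25 * (src[c - 1] + src[c + 1] + src[c - nx] + src[c + nx]);
    }
}

int main(int argc, char** argv)
{
    // Assumed usage: ./a.out [nx] [ny] [block_x] [block_y]
    int nx = argc > 1 ? atoi(argv[1]) : 8192;
    int ny = argc > 2 ? atoi(argv[2]) : 8192;
    int bx = argc > 3 ? atoi(argv[3]) : 32;
    int by = argc > 4 ? atoi(argv[4]) : 8;
    const int sweeps = 100;  // arbitrary choice for this sketch

    size_t bytes = (size_t)nx * ny * sizeof(double);
    double *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess ||
        cudaMalloc(&b, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    cudaMemset(a, 0, bytes);
    cudaMemset(b, 0, bytes);

    dim3 block(bx, by);  // bx * by must not exceed 1024 threads
    dim3 grid((nx + bx - 1) / bx, (ny + by - 1) / by);

    // Warm-up launch, also used to catch an invalid launch configuration.
    jacobi_sketch<<<grid, block>>>(a, b, nx, ny);
    if (cudaGetLastError() != cudaSuccess) {
        fprintf(stderr, "kernel launch failed (check block size)\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int s = 0; s < sweeps; ++s) {
        jacobi_sketch<<<grid, block>>>(a, b, nx, ny);
        std::swap(a, b);  // output of this sweep is input of the next
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    // Only interior sites are updated.
    double lups = (double)(nx - 2) * (double)(ny - 2) * sweeps;
    printf("%d x %d, block %d x %d: %.2f GLUP/s\n",
           nx, ny, bx, by, lups / (ms * 1e-3) / 1e9);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Assuming the file is saved as `sketch.cu`, it could be built with `nvcc -O3 sketch.cu` and run as, for example, `./a.out 16384 16384 64 4`. Block shapes that are wide in x tend to favor coalesced accesses, but the best choice has to be measured.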
From your best measured performance and the bytes per LUP, calculate the maximum memory bandwidth you can achieve.
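One way to obtain this number, sketched under the assumption that the code is memory bound and that the bytes-per-LUP estimate holds, is to multiply the best measured performance by the code balance. The symbols b_eff, P_meas, and B_C are notation introduced here.

```latex
\[
b_{\mathrm{eff}} = P_{\mathrm{meas}} \times B_C
\qquad
\left[\frac{\mathrm{B}}{\mathrm{s}}\right]
= \left[\frac{\mathrm{LUP}}{\mathrm{s}}\right]
  \times \left[\frac{\mathrm{B}}{\mathrm{LUP}}\right]
\]
```

Comparing the result with the theoretical 1.9 Tbyte/s shows what fraction of the peak bandwidth the kernel reaches.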
Last modified: Tuesday, 17 March 2026, 6:10 AM