The folder J2D contains the CUDA version of a Jacobi 2D code as shown in the lecture. 

Look at the source code in jacobi-2d.cu; the GPU kernel is the function jacobi().

  • How many bytes per lattice site update (LUP) do you expect this code to transfer to and from memory?
  • What is the expected maximum performance of the code in GLUP/s when run on an A100-80 GB GPU (theoretical memory bandwidth 1.9 Tbyte/s)?
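The two questions above are linked by a simple bandwidth-bound estimate: maximum performance = memory bandwidth / bytes per LUP. The following is a minimal sketch of that arithmetic, not a solution. The function name max_glups and the two byte/LUP scenarios are illustrative assumptions (double precision, 8 bytes per element); replace them with the value you derive from jacobi-2d.cu.

```python
# Bandwidth-bound performance estimate for a streaming stencil kernel.
# All scenario values are assumptions; substitute your own analysis of the kernel.

def max_glups(bandwidth_bytes_per_s: float, bytes_per_lup: float) -> float:
    """Upper performance limit in GLUP/s if memory bandwidth is the only bottleneck."""
    return bandwidth_bytes_per_s / bytes_per_lup / 1e9

# Theoretical memory bandwidth quoted in the exercise (1.9 Tbyte/s).
BANDWIDTH = 1.9e12  # byte/s

# Hypothetical traffic scenarios, assuming 8-byte (double precision) elements.
scenarios = {
    "1 load + 1 store per LUP (neighbour loads served from cache)": 2 * 8,
    "4 loads + 1 store per LUP (no cache reuse at all)": 5 * 8,
}

for label, bytes_per_lup in scenarios.items():
    limit = max_glups(BANDWIDTH, bytes_per_lup)
    print(f"{label}: {bytes_per_lup} byte/LUP -> {limit:.2f} GLUP/s")
```

If the kernel uses single precision, or if your traffic analysis gives a different count of loads and stores, only the byte/LUP value changes; the formula stays the same.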

Run the code using the provided job script job-marvin.sh. Are your expectations met? Experiment with the problem size and the number of threads per block (x and y dimensions); how fast can you make the code run?
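To explore the thread block dimensions systematically, it can help to list the candidate shapes first. The sketch below enumerates power-of-two (x, y) combinations under two assumptions: x is a multiple of the warp size (32), which usually favours coalesced access when x is the contiguous index, and x * y does not exceed the CUDA limit of 1024 threads per block. The helper block_shapes is hypothetical. How the block size is passed to the program is not shown here; check jacobi-2d.cu and job-marvin.sh for that.

```python
# Enumerate candidate thread block shapes for a 2D kernel.
# Assumptions: power-of-two sizes, x a multiple of the warp size,
# and at most 1024 threads per block (CUDA limit, valid for the A100).

def block_shapes(max_threads: int = 1024, warp: int = 32) -> list[tuple[int, int]]:
    """Return all (x, y) pairs with x = warp * 2^i, y = 2^j and x * y <= max_threads."""
    shapes = []
    x = warp
    while x <= max_threads:
        y = 1
        while x * y <= max_threads:
            shapes.append((x, y))
            y *= 2
        x *= 2
    return shapes

for x, y in block_shapes():
    print(f"block = ({x:4d}, {y:2d})  threads per block = {x * y}")
```

Non-power-of-two shapes are also legal; this list is only a starting point for the scan.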

From your best measured performance and the bytes per LUP, calculate the maximum memory bandwidth you can achieve.
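The achieved bandwidth follows from multiplying the measured performance by the data traffic per update. A minimal sketch, assuming the performance is reported in GLUP/s; the function names are illustrative, and the numbers in the usage lines are placeholders, not measurements.

```python
# Convert a measured performance into an achieved memory bandwidth.

def achieved_bandwidth_gbs(glups: float, bytes_per_lup: float) -> float:
    """GLUP/s times byte/LUP gives Gbyte/s."""
    return glups * bytes_per_lup

def fraction_of_peak(achieved_gbs: float, peak_gbs: float) -> float:
    """Share of the theoretical bandwidth that was actually reached."""
    return achieved_gbs / peak_gbs

# Placeholder inputs: replace with your own measurement and traffic analysis.
measured_glups = 90.0   # hypothetical value, NOT a real result
bytes_per_lup = 16.0    # assumed code balance (double precision, 1 load + 1 store)
peak_gbs = 1900.0       # 1.9 Tbyte/s from the exercise text

bw = achieved_bandwidth_gbs(measured_glups, bytes_per_lup)
print(f"achieved bandwidth: {bw:.0f} Gbyte/s "
      f"({100 * fraction_of_peak(bw, peak_gbs):.1f}% of theoretical peak)")
```

Note that this number is only as reliable as the assumed byte/LUP value: if the kernel actually moves more data than assumed, the true bandwidth is higher than the one computed here.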

Last modified: Tuesday, 17 March 2026, 6:10 AM