Assignment 10 (last): Tasking, ccNUMA
Opened: Tuesday, 9 July 2024, 12:00 AM
Due: Thursday, 18 July 2024, 5:00 PM
- Tasking with the ray tracer
A correct example code with standard loop parallelization for the ray tracer can be found in ~ptfs100h/GettingStarted/RAY. Choose the same problem size as before (15k x 15k).
(a) (25 crd) Parallelize the program with OpenMP tasking. There are different ways to do this. Do not use the taskloop directive but use individual tasks. Make sure that the code computes a correct result and has about the same performance as the original (loop-based) variant on 36 cores of Fritz at 2.0 GHz.
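As an illustration of the "individual tasks" pattern mentioned in (a), here is a minimal sketch in C. It is not the ray tracer itself: `render_tile` is a hypothetical stand-in for the per-tile kernel, and the names `render_image_tasks`, `img`, `size`, and `tile` are assumptions made for this example. One thread generates one task per tile inside a `single` region, and the whole team executes the tasks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ray tracer's per-tile kernel (assumption):
   fills one tile with a value derived from the pixel coordinates. */
static void render_tile(unsigned char *img, int size, int x0, int y0, int tile)
{
    /* Clip the tile at the image border so any tile size works. */
    int x1 = (x0 + tile < size) ? x0 + tile : size;
    int y1 = (y0 + tile < size) ? y0 + tile : size;
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            img[(size_t)y * size + x] = (unsigned char)((x ^ y) & 0xff);
}

/* One thread creates one task per tile; all threads of the team execute them. */
void render_image_tasks(unsigned char *img, int size, int tile)
{
    assert(img != NULL && size > 0 && tile > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (int y0 = 0; y0 < size; y0 += tile)
            for (int x0 = 0; x0 < size; x0 += tile) {
                /* Each task gets its own copy of the tile origin. */
                #pragma omp task firstprivate(x0, y0)
                render_tile(img, size, x0, y0, tile);
            }
    }   /* all tasks have completed once the parallel region ends */
}
```

Tiles write to disjoint pixels, so no synchronization between tasks is needed in this sketch. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.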
(b) (25 crd) Investigate how the performance of the code on 36 cores of a Fritz socket (2.0 GHz clock speed) depends on the tile size. Scan tile sizes from 1x1 to 3000x3000 pixels and compare the performance in Mpx/s with the loop-parallel version in a diagram that shows performance vs. tile size. Are there any significant differences? What can you conclude from that?
- ccNUMA map
(20 crd) In Lecture 16n (slide 12) we showed a ccNUMA map of an AMD Zen1 (Naples) node. Using the DAXPY benchmark loop [ a(i)=a(i)+s*b(i) ], make such a map for a Sapphire Rapids node in Fritz. You do not have to produce a fancy color map - a simple table is sufficient. Fix the clock frequency to 2.0 GHz.
Document exactly how you took the data so your experiment is reproducible!
Remember that you have to submit your job with the "-p spr1tb" option in order to get one of the SPR nodes.
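For the ccNUMA map, a minimal sketch of a DAXPY bandwidth measurement in C might look as follows. The function names and the timing helper are assumptions for this example, not part of the course material. The sketch assumes that thread and memory placement are controlled from outside the program, for instance with `numactl --cpunodebind=<i> --membind=<j>` for each pair of ccNUMA domains that `numactl --hardware` reports. Each DAXPY iteration is counted as 24 bytes of traffic (load a, load b, store a).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Initialize both arrays; with external memory binding (assumption), the
   pages end up in the ccNUMA domain selected on the command line. */
void daxpy_init(double *a, double *b, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
    }
}

/* The benchmark loop from the assignment: a(i) = a(i) + s * b(i). */
void daxpy(double *restrict a, const double *restrict b, double s, size_t n)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] + s * b[i];
}

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Returns the memory bandwidth in GB/s, counting 24 bytes per iteration. */
double daxpy_bandwidth_gbs(double *a, const double *b, double s,
                           size_t n, int repeats)
{
    assert(n > 0 && repeats > 0);
    double t0 = wall_seconds();
    for (int r = 0; r < repeats; ++r)
        daxpy(a, b, s, n);
    double t1 = wall_seconds();
    return 24.0 * (double)n * (double)repeats / (t1 - t0) * 1.0e-9;
}
```

The arrays should be much larger than the aggregate cache size so that the measurement reflects memory bandwidth; one table entry then corresponds to one (thread domain, memory domain) pair. Recording the exact command line, array length, and repeat count alongside each number keeps the experiment reproducible.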