In this exercise, we want to analyze the 2D Gauss-Seidel (GS) kernel. In this algorithm we iterate over a two-dimensional grid and update a cell in-place based on its four neighbor cells.

You can find the code here:

1. Compile the code using icc 2021.6.0 and use the optimization flags "-Ofast -qopenmp-simd -qopt-zmm-usage=low -xHost -fargument-noalias -funroll-loops -fno-builtin" and create an executor with the same flags. Run the code with a grid of 6000x6000.

2. Identify the region of interest (ROI) and analyze it with OSACA. Does your measurement meet your expectations? What is the limiting bottleneck?

3. Now replace the benchmark loop (lines 78-87) with an optimized version to overcome dependencies:
   // benchmark loop
    for (int i=1; i<NI-1; ++i) {
        t = phi[i][0];
        for (int k=1; k<NK-1; ++k) {
            t1 = (phi[i+1][k] + phi[i][k+1]) * 0.25;
            t2= t1 + (phi[i-1][k]) * 0.25;
            t = t2 + 0.25 * (t);
            phi[i][k] = t;
    // end of benchmark loop

  a) Analyze the new code and measure it. Did the overall prediction and/or the bottleneck change? 
  b) Do the same analysis with the -O1 flag instead of -Ofast in your compiler options. This will make the compiler optimize for the smallest code size and not for performance. What can you see now and why?
  c) Test out at least two other compilers or compiler versions with either -Ofast, -O3, or -O1 flag and try to find out how they try to optimize the kernel. Note that for Clang and GCC, you need to use -fopenmp-simd -march=native -funroll-loops -fno-builtin besides the -O option instead of the in flags mentioned in 1..

Last modified: Friday, 20 October 2023, 2:22 PM