In this exercise, we want to calculate the value of \( \pi \) by numerically integrating a function:

\( \displaystyle\pi=\int\limits_0^1\frac{4}{1+x^2}\,\mathrm dx \)

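We use a very simple rectangular integration scheme (the midpoint rule): the interval is split into slices of width \( \Delta x \), and the integral is approximated by summing the areas of rectangles centered at \( x_i \) with width \( \Delta x \) and height \( f(x_i) \), i.e. \( \pi\approx\sum_i f(x_i)\,\Delta x \) with \( f(x)=\frac{4}{1+x^2} \).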

You can find the code here: https://hpc-mover.rrze.uni-erlangen.de/compiler-explorer/z/xKffPG

1. Compile the code with icpc 2021.6.0 using the optimization flags "-Ofast -qopenmp-simd -qopt-zmm-usage=low -xHost -fargument-noalias -funroll-loops -fno-builtin", and create an executor with the same compiler and flags.

2. Identify the region of interest (ROI) and analyze it with OSACA. Does the measurement match what you expected from the analysis? What is the limiting bottleneck?

3. Change "-qopt-zmm-usage" from "low" to "high" (make sure to set it in both the compiler and the executor pane) and analyze the code again. What has changed? Does your analysis still match the measurement, and what is the limiting bottleneck now?

Last modified: Tuesday, 8 October 2024, 3:33 PM