PTfS25: Assignment 0: Warming up | NHR Learning Platform

Opened: Wednesday, 30 April 2025, 12:00 AM

Due: Thursday, 8 May 2025, 10:05 AM

Write a benchmark code that numerically computes the integral

$ \displaystyle \int\limits_0^1\frac{1}{{1+x}}\,\mathrm{d}x $.

The result should be an approximation to $ \ln(2) $. Use the very simple "midpoint rule" integration scheme, which works by summing up areas of rectangles centered around $ x_i $ with a width of $ \Delta x $ and a height of $ f(x_i) $:
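
Written out with $ \Delta x = 1/N $ and midpoints $ x_i = (i + \tfrac{1}{2})\,\Delta x $, the scheme amounts to the sum below. The index range $ i = 0, \dots, N-1 $ is chosen here to match the loop in the code fragment that follows:

$ \displaystyle \int\limits_0^1\frac{1}{1+x}\,\mathrm{d}x \approx \Delta x \sum_{i=0}^{N-1} \frac{1}{1+x_i} $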

      double sum, x, delta_x, S, E;
      int N = 10;
      do {
        S = getTimeStamp();
        delta_x = 1./N;
        sum = 0.;
        for(int i=0; i<N; i++) {
          x = (i + 0.5) * delta_x;
          sum = sum + 1.0 / (1.0 + x);
        }
        double ln = sum * delta_x;
        E = getTimeStamp();
        if(E-S > 1e-7) break;
        N *= 2;
      } while(1);
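
The fragment calls getTimeStamp() without defining it. One possible implementation is sketched below, assuming a POSIX system and the monotonic clock from clock_gettime(). The version shown in the tutorials kick-off may differ, so treat this as an illustration rather than the reference solution.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std=c99/c11 */
#include <time.h>

/* Wall-clock time stamp in seconds, read from a monotonic clock
   (assumption: POSIX clock_gettime() is available on the target system). */
double getTimeStamp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.e-9;
}
```

The difference E-S of two such time stamps is the elapsed wall-clock time in seconds, which is what the threshold comparison in the fragment expects.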

(25 credits) Complete the code fragment, using suitable timing functions to measure the runtime of the main loop (as described in the tutorials kick-off). Make sure that it actually computes an approximation to $ \ln(2)=0.69314718... $ by printing the result, and run it on one core of the Fritz cluster. Use the Intel compiler (icx for C or icpx for C++) with the recommended compiler options (-O3 -xHost). Include the relevant parts of your code in your submission. Do not forget to fix the clock speed (use 2.4 GHz).
(25 credits) As you can see, we are making sure that the runtime of the loop is above a certain threshold by doubling N (the number of integration points) if this is not the case. Experiment with the threshold (set to $ 10^{-7} $ seconds in the example above). What is a reasonable minimum runtime for getting reliable, accurate, and reproducible measurements for the loop's runtime per iteration? Calculate how many CPU cycles the code takes to execute per iteration of the loop! Describe how you arrived at this number from your measurements!
(20 credits) What is the performance of the code? Is Flops/s a good performance metric here? Discuss different options for appropriate performance metrics in this code.
(30 credits) Run the experiment again, but this time do not fix the clock frequency but set the "performance governor" for the CPU:
```
$ srun --cpu-freq=performance ./a.out
```
This makes sure that the CPU will run at the highest possible frequency. Assuming that the performance of the code is linear in the clock speed, calculate the actual frequency the CPU was running at.
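
The arithmetic behind the cycle count and the frequency estimate can be sketched as two small helpers. The function and parameter names are hypothetical and only illustrate the conversions. The second helper relies on the assumption stated above that performance is linear in the clock speed, and both runtimes must refer to the same N.

```c
/* Cycles per loop iteration, given the measured runtime in seconds,
   the number of iterations n, and the clock frequency in Hz. */
double cyclesPerIteration(double runtime_s, double n, double freq_hz) {
    return runtime_s * freq_hz / n;
}

/* Estimate an unknown clock frequency from a reference run at a known,
   fixed frequency. Assumes runtime is inversely proportional to the clock
   speed and that both runs used the same number of iterations. */
double estimateFrequency(double f_ref_hz, double t_ref_s, double t_unknown_s) {
    return f_ref_hz * t_ref_s / t_unknown_s;
}
```

With made-up numbers that are not measurements: a loop of 600 million iterations taking 1.0 s at 2.4 GHz corresponds to 4 cycles per iteration.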

Actually, this code can be used to measure the duration of a double-precision floating-point divide operation because the divide entirely dominates its runtime. Everything else can be "hidden" behind the divide.

Note: Make sure you actually print the result for $ \ln(2) $ because if you don't then the compiler might be smart and optimize away your whole benchmark because you have not used the result anywhere.
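
One way to make sure the result is used is to return it from the integration routine and print it together with its deviation from the known value. The sketch below is only an illustration: midpointLn2 and reportResult are hypothetical names, and the timing code is left out.

```c
#include <stdio.h>

/* Midpoint-rule approximation of the integral of 1/(1+x) over [0,1]
   using n rectangles. */
double midpointLn2(int n) {
    double delta_x = 1.0 / n;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * delta_x;
        sum += 1.0 / (1.0 + x);
    }
    return sum * delta_x;
}

/* Print the approximation so the compiler cannot discard the computation.
   Returns the absolute deviation from ln(2). */
double reportResult(int n, double approx) {
    const double ln2 = 0.69314718055994531;
    double err = approx - ln2;
    if (err < 0.0) err = -err;
    printf("N = %d  ln(2) ~ %.15f  |error| = %.3e\n", n, approx, err);
    return err;
}
```

Calling reportResult(N, midpointLn2(N)) after the timing loop both checks the numerics and keeps the computation alive.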

Note: How to fix the clock speed was described in the first tutorial session. To reiterate: run your executable with the srun command as a wrapper and pass the --cpu-freq option, e.g.:

srun --cpu-freq=<min_freq_in_kHz>-<max_freq_in_kHz>:performance ./a.out

Setting the minimum and maximum to the same value pins the clock; for 2.4 GHz, both are 2400000 (the option takes kHz).

If you want to run this experiment in a job script, this is a possible starting point:

#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH --time=00:08:00
#SBATCH --job-name=ln2
#SBATCH --export=NONE
#
# first non-empty non-comment line ends SBATCH options

unset SLURM_EXPORT_ENV

module load intel
echo Hello World!
# START YOUR CODE HERE

Adding the --cpu-freq option to the sbatch command at job submission will not fix the clock frequency. You have to use it with srun.