Assignment 0: Warmup
Completion requirements
Due: Thursday, 2 May 2024, 10:03 AM
Write a benchmark code that numerically computes the integral
\( \displaystyle 4\int\limits_0^1\sqrt{1-x^2}\,\mathrm{d}x \).
The
result should be an approximation to \( \pi \), of course. You may use a
very simple rectangular integration scheme that works by summing up
areas of rectangles centered around \( x_i \) with a width of \( \Delta
x \) and a height of \( f(x_i) \):
int SLICES = 1000000000;
double sum = 0., delta_x = ....;
for (int i = 0; i < SLICES; i++) {
    double x = (i + 0.5) * delta_x;
    sum += 4.0 * sqrt(1.0 - x * x);
}
double Pi = sum * delta_x;
Complete the code fragment and add suitable timing functions to measure the runtime of the main loop (as described in the tutorials kick-off). Make sure that it actually computes an approximation to \( \pi \) by printing the result, and run it on one core of the Fritz cluster. Use the Intel compiler with the recommended compiler options (-O3 -xHost). Include the relevant parts of your code in your submission. Do not forget to fix the clock speed (use 2.4 GHz).
- (40 credits) How many CPU cycles does the code take to execute per iteration of the loop? Describe how you arrived at this number from your measurements!
- (20 credits) What is the performance of the code? Is Flops/s a good performance metric here? Discuss different options for appropriate performance metrics for this code.
- (20 credits) Discuss what
happens (in terms of performance and accuracy) if you change the code to use single-precision floating-point
numbers! Are there any relevant code changes beyond data types? (Remember to use sqrtf() instead of sqrt()!) Why do
you not get a reasonable accuracy for \( \pi \)? (Please submit your single-precision code as well).
- (20 credits) Run the (double-precision)
experiment again, but this time do not fix the clock frequency but set the "performance governor" for the CPU:
$ srun --cpu-freq=performance ./a.out
This makes sure that the CPU will run at the highest possible frequency. Assuming that the performance of the code is linear in the clock speed, calculate the actual frequency the CPU was running at.
Actually, this code can be used to
measure the duration of a floating-point square root operation because the square root computation entirely dominates its runtime. Everything else can be "hidden" behind
the sqrt.
Note: Make sure you actually print the result for \( \pi \); otherwise the compiler may be smart enough to optimize away the entire benchmark, since the result is never used anywhere.
Note: How to fix the clock speed was described in the first tutorial session. To reiterate: you have to run your executable with the srun command as a wrapper, using the --cpu-freq option, e.g.:
srun --cpu-freq=<min_freq_in_kHz>-<max_freq_in_kHz> ./a.out
The option takes the frequency in kHz, so 2.4 GHz corresponds to 2400000.
If you want to run this experiment in a job script, this is a possible starting point:
#!/bin/bash -l
#
#SBATCH --nodes=1
#SBATCH --time=00:08:00
#SBATCH --job-name=pi
#SBATCH --export=NONE
#
# first non-empty non-comment line ends SBATCH options
unset SLURM_EXPORT_ENV
module load intel
echo Hello World!
# START YOUR CODE HERE
Adding the --cpu-freq option to the sbatch command at job submission will not fix the clock frequency. You have to use it with srun.