The following code computes a histogram with 16 bins, one for each value of the four least significant bits, from the results of the standard rand_r() random number generator. You can find serial example codes (Fortran and C) in the HISTO folder.


unsigned int seed = 123;
long hist[16];
for(int i=0; i<16; ++i)
    hist[i]=0;
timing(&wcstart, &ct);
for(long i=0; i<2000000000; ++i) {
    hist[rand_r(&seed) & 0xf]++;
}
timing(&wcend, &ct);
for(int i=0; i<16; ++i) {
    cout << "hist[" << i << "]=" << hist[i] << endl;
}
cout << "Time: " << wcend-wcstart << " sec" << endl;


  1. Parallelize the histogram calculation using OpenMP. Does your parallel code scale from 1 to 10 cores? (Hint: There should be a significant speedup; if you see the program slowing down with increasing thread count, something is wrong)

  2. Parallelize the code with MPI. Again, check whether the code scales from 1 to 10 processes.

    Reminder: In order to compile an MPI program, you have to use one of the wrapper scripts mpiicx, mpiicpx, or mpiifx instead of the normal Intel compiler for C, C++, and Fortran code, respectively. To run the code, use mpiexec in your batch script:

    $ mpiexec -n # ./my_executable

    Here, "#" is the number of processes you want to use. Each CoolMUC-4 node has 112 cores, so if you allocate one node you can run up to 112 processes. 
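
One possible approach to task 1 is sketched below. It is not the reference solution. It assumes that a slowdown with more threads comes from all threads updating the same seed and the same hist array, and avoids that by giving each thread a private seed and a private histogram that are merged at the end. The function name parallel_histogram, its parameters, and the seeding scheme (base seed plus thread number) are hypothetical choices for illustration. Because every thread draws from its own random sequence, the bin counts will differ from those of the serial code.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>   // rand_r (POSIX)
#ifdef _OPENMP
#include <omp.h>     // omp_get_thread_num
#endif

// Hypothetical helper, not part of the course material:
// counts the four least significant bits of n random numbers.
std::array<long, 16> parallel_histogram(long n, unsigned int base_seed) {
    assert(n >= 0);
    std::array<long, 16> hist{};              // shared result, zero-initialized

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        unsigned int seed = base_seed + tid;  // assumption: one seed per thread
        long local[16] = {0};                 // private histogram, no sharing

        #pragma omp for
        for (long i = 0; i < n; ++i) {
            local[rand_r(&seed) & 0xf]++;
        }

        // Merge the private counts into the shared histogram, one thread at a time.
        #pragma omp critical
        for (int b = 0; b < 16; ++b) {
            hist[b] += local[b];
        }
    }
    return hist;
}
```

Calling parallel_histogram(2000000000, 123) between the two timing() calls would replace the serial loop. Without OpenMP enabled at compile time the pragmas are ignored and the function runs serially, which is convenient for checking that the bin counts still add up to n.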

Note that "the code scales" means that it runs (almost) N times faster with N threads/processes than with a single thread/process.
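
To make this statement quantitative, you can compute the speedup T(1)/T(N) and the parallel efficiency, which is the speedup divided by N, from the measured runtimes. The following is a small sketch; the function names speedup and efficiency are hypothetical and not part of the course material.

```cpp
#include <cassert>

// Speedup: runtime with one thread/process divided by runtime with N.
double speedup(double t1, double tn) {
    assert(t1 > 0.0 && tn > 0.0);
    return t1 / tn;
}

// Parallel efficiency: speedup divided by N; values close to 1 mean
// the code scales in the sense described above.
double efficiency(double t1, double tn, int n) {
    assert(n > 0);
    return speedup(t1, tn) / n;
}
```

For example, a run that takes 10 seconds on one core and 2.5 seconds on four cores has a speedup of 4 and an efficiency of 1.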




Last modified: Wednesday, 19 February 2025, 9:29 AM