Exercise: Parallel histogram computation
The following code calculates a histogram with 16 bins (binning by the least significant four bits) from the results of the standard rand_r() random number generator. You can find serial (Fortran and C) example codes in the HISTO folder.
unsigned int seed = 123;
long hist[16];
for (int i = 0; i < 16; ++i)
    hist[i] = 0;

timing(&wcstart, &ct);                 // timing() comes with the example codes
for (long i = 0; i < 2000000000; ++i) {
    hist[rand_r(&seed) & 0xf]++;       // bin by the least significant four bits
}
timing(&wcend, &ct);

for (int i = 0; i < 16; ++i) {
    cout << "hist[" << i << "]=" << hist[i] << endl;
}
cout << "Time: " << wcend - wcstart << " sec" << endl;
- Parallelize the histogram calculation using OpenMP. Does your parallel code scale from 1 to 10 cores? (Hint: there should be a significant speedup; if the program slows down with increasing thread count, something is wrong.) One possible approach is sketched after this list.
- Parallelize the code with MPI and check the scaling in the same way. An MPI sketch follows the compile/run reminder below.
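The following is a minimal OpenMP sketch of one possible approach, not the provided solution: it uses omp_get_wtime() instead of the course timing() helper, gives each thread a private rand_r() seed, and merges the 16 bins with an OpenMP array reduction (OpenMP 4.5 or newer). Because the seeds differ per thread, the exact bin counts will not match the serial run, but they should still be roughly uniform.

#include <stdlib.h>     // rand_r()
#include <iostream>
#include <omp.h>

int main() {
    long hist[16] = {0};

    double wcstart = omp_get_wtime();
    #pragma omp parallel reduction(+ : hist[:16])
    {
        // each thread gets its own seed, so the combined random stream
        // (and the exact bin counts) differs from the serial run
        unsigned int seed = 123 + omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < 2000000000L; ++i) {
            hist[rand_r(&seed) & 0xf]++;   // bin by the four least significant bits
        }
    }
    double wcend = omp_get_wtime();

    for (int i = 0; i < 16; ++i)
        std::cout << "hist[" << i << "]=" << hist[i] << std::endl;
    std::cout << "Time: " << wcend - wcstart << " sec" << std::endl;
    return 0;
}

The reduction (or an equivalent per-thread copy of hist) is what makes the loop scale: having all threads update one shared hist array, even with atomic increments, leads to heavy contention and false sharing on the 16 adjacent counters, which is the kind of slowdown the hint warns about.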
Reminder: To compile an MPI program, use one of the wrapper scripts mpiicx, mpiicpx, or mpiifx instead of the plain Intel compiler for C, C++, and Fortran code, respectively. To run the code, use mpiexec in your batch script:

$ mpiexec -n # ./my_executable

Here, "#" is the number of processes you want to use. Each CoolMUC-4 node has 112 cores, so if you allocate one node you can run up to 112 processes.
Note that "the code scales" means that it runs (almost) N times faster with N threads/processes than with a single thread/process.