Exercise: J1D-Collectives-Blocking
In this exercise we show how collective operations can be used to add checkpointing to exercise J1D-PtP-Blocking. MPI offers a more elegant way of doing this, namely parallel I/O, but that topic is not on our course agenda. Instead, the chunks of the potential array are gathered from all processes so that the complete array can be written to a file for a restart run. Checkpointing is an essential part of a parallel code because the run times required by typical scientific simulations exceed the maximum walltime available on cluster computers. Consequently, one has to submit a restart run that continues from the last state saved by the previous run.
- In the directory J1D-Collectives-Blocking, the subdirectories f and c contain the Fortran and C versions, respectively. If you have loaded the modules as described in exercise Hello, the compilers and MPI wrappers are ready for use. A Makefile is provided to simplify compilation.
- There are 6 FIXME markers which are related to the MPI collective operations to gather and scatter the potential arrays.
- For the initial run, execute the program as you have learnt so far; it writes the final state of the potential to the file starting_potential.dat. To invoke a restart run, pass the command line arguments -r and the potential filename when executing the program:
$ mpirun -n 72 ./j1d_collectives_blocking -r starting_potential.dat
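How a program might detect the -r option shown above can be sketched in plain C. The helper name `restart_file` is hypothetical, and the course code may parse its arguments differently. The sketch only assumes that the filename is the argument directly following -r.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the filename that follows "-r" on the
 * command line, or NULL if "-r" is absent or has no argument after it. */
const char *restart_file(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-r") == 0)
            return (i + 1 < argc) ? argv[i + 1] : NULL;
    }
    return NULL;
}
```

In `main`, a non-NULL result from `restart_file(argc, argv)` would select the restart path (read the file and scatter the potential), while NULL would select the initial run.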
Last modified: Tuesday, 2 April 2024, 3:24 PM