In this exercise we show how collective operations can be used to add checkpointing to exercise J1D-PtP-Blocking. Admittedly, MPI offers a more elegant way of doing this, namely parallel I/O, but it is not on our course agenda. By gathering the chunks from all processes, the complete potential array can be written to a file and then used for a restart run. Checkpointing is an essential part of a parallel code because the run times typically required by scientific simulations exceed the maximum walltime available on cluster computers. Consequently, a simulation is continued by submitting a restart run that starts from the last state saved by the previous run.
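As a minimal sketch of the file side of checkpointing, the C functions below write a gathered potential array to a text file and read it back for a restart. The file layout (a point count followed by one value per line) and the names write_checkpoint and read_checkpoint are hypothetical and need not match the format of starting_potential.dat used in the exercise code. In an MPI program, only the root rank would call these functions: after gathering the chunks when writing, and before scattering them when reading.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical checkpoint format: the first line holds the number of points,
   followed by one value per line. "%.17g" prints enough significant digits
   for an IEEE-754 double to be read back unchanged.
   Returns 0 on success, -1 if a write fails. */
int write_checkpoint(FILE *f, const double *phi, int n)
{
    assert(f != NULL && phi != NULL && n >= 0);
    if (fprintf(f, "%d\n", n) < 0)
        return -1;
    for (int i = 0; i < n; ++i)
        if (fprintf(f, "%.17g\n", phi[i]) < 0)
            return -1;
    return 0;
}

/* Reads a file written by write_checkpoint into phi, which has room for
   nmax values. Returns the number of values read, or -1 if the header is
   missing, the stored size exceeds nmax, or a value cannot be parsed. */
int read_checkpoint(FILE *f, double *phi, int nmax)
{
    int n;
    assert(f != NULL && phi != NULL);
    if (fscanf(f, "%d", &n) != 1 || n < 0 || n > nmax)
        return -1;
    for (int i = 0; i < n; ++i)
        if (fscanf(f, "%lf", &phi[i]) != 1)
            return -1;
    return n;
}
```

Because the values are printed with 17 significant digits, a restarted run begins from exactly the values the previous run saved. A binary file written with fwrite would be more compact, at the cost of being less portable between machines.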

  • In the directory J1D-Collectives-Blocking, there are subdirectories f and c for Fortran and C, respectively. If you have loaded the modules as described in exercise Hello, the compilers and MPI wrappers are ready for use. A Makefile is provided for compilation.
  • There are six FIXME markers related to the MPI collective operations that gather and scatter the potential arrays.
  • For the initial run, execute the program as you have learnt so far; at the end, the program writes the last state of the potential to the file starting_potential.dat. To invoke a restart run, pass the command-line arguments -r and the name of the potential file when executing the program:

      $ mpirun -n 72 ./j1d_collectives_blocking -r starting_potential.dat
  • Execute the restart run multiple times and check whether the error becomes smaller with each execution.
  • The reference file ref/domain-10000000.pgm was generated with nstep=10000000 and n=10001 (n=10000 for Fortran) by performing an initial run followed by one restart run. You can use the cmp command to compare your domain file with the reference file.
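Because n is generally not a multiple of the number of processes (10001 points over 72 ranks leaves a remainder of 65), the vector variants MPI_Gatherv and MPI_Scatterv are a natural fit for collecting and redistributing the chunks. The sketch below computes the per-rank counts and displacements these calls expect, assuming a simple block decomposition in which the first n % nprocs ranks receive one extra point. The function name block_counts_displs is hypothetical, and the decomposition used in J1D-PtP-Blocking may differ, for example in how boundary or ghost points are handled.

```c
#include <assert.h>

/* Hypothetical helper: block decomposition of n grid points over nprocs
   ranks. Rank r owns counts[r] consecutive points starting at global index
   displs[r]; the first (n % nprocs) ranks get one extra point. */
void block_counts_displs(int n, int nprocs, int counts[], int displs[])
{
    assert(n >= 0 && nprocs > 0);
    int base = n / nprocs;   /* points every rank gets */
    int rem = n % nprocs;    /* leftover points, one each for the first ranks */
    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        counts[r] = base + (r < rem ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With these arrays, the root could collect the interior points of every rank with `MPI_Gatherv(local, counts[rank], MPI_DOUBLE, global, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD)` and redistribute a restart file with `MPI_Scatterv(global, counts, displs, MPI_DOUBLE, local, counts[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD)`. Here `local` is assumed to point at the first interior point of the rank's chunk, so ghost cells are neither sent nor overwritten.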
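The restart option shown in the mpirun command above could be detected with a few lines of C. This sketch assumes that only the form -r <file> needs to be recognised; restart_file is a hypothetical name, not a function of the exercise code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the filename following "-r", or NULL when
   the option is absent (initial run) or has no filename after it.
   All other arguments are ignored. */
const char *restart_file(int argc, char *argv[])
{
    assert(argc == 0 || argv != NULL);
    for (int i = 1; i < argc - 1; ++i) {
        if (strcmp(argv[i], "-r") == 0)
            return argv[i + 1];
    }
    return NULL;
}
```

In an MPI program, rank 0 would typically open the returned file, read the global potential, and scatter it, while the other ranks only need to know that a restart is taking place. Broadcasting that flag from rank 0 with MPI_Bcast keeps all ranks consistent regardless of how the command-line arguments are delivered to each process.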


Last modified: Tuesday, 2 April 2024, 3:24 PM