Prepare environment with paths for Score-P and Vampir:

$ source ~k78q0039/env.sh
$ cp -r ~k78q0039/Tools-Material ~
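
After sourcing the environment, a quick check can confirm that the tools used below are actually on PATH. This is a minimal sketch. The helper name check_tools is hypothetical, and it assumes that env.sh puts scorep, scorep-score and vampir on PATH (vampir may only be available on the login nodes or cshpc).

```shell
# Hypothetical helper: report for each given command whether it is on PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage:

$ check_tools scorep scorep-score vampir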

Build MiniMD without LIKWID:

$ cd ~/Tools-Material/MINIMD
$ make

Run the uninstrumented MiniMD proxy app with 2 ranks and 10 threads each (make sure that you are in the 'MINIMD/data' directory):

$ cd data
$ OMP_NUM_THREADS=10 srun -n 2 -c 10 ../miniMD-ICC -t 10 --half_neigh 1

Modify the build settings to use the Score-P instrumenter 'scorep' in all $(CC), $(CXX), and $(FC) commands (see include_ICC.scorep.mk):

$ cd ..
$ cat include_ICC.scorep.mk

Build with modified Makefile configuration:

$ make TAG=ICC.scorep
$ cd data

Run instrumented MiniMD proxy app:

$ export SCOREP_EXPERIMENT_DIRECTORY=scorep-minimd-2x10-profile
$ OMP_NUM_THREADS=10 srun -n 2 -c 10 ../miniMD-ICC.scorep -t 10 --half_neigh 1

Examine scoring:

$ scorep-score scorep-minimd-2x10-profile/profile.cubex 

Estimated aggregate size of event trace:                   157MB
Estimated requirements for largest trace buffer (max_buf): 79MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       99MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=99MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)
flt     type max_buf[B]    visits time[s] time[%] time/visit[us]  region
         ALL 82,096,353 6,311,108  112.52   100.0          17.83  ALL
         USR 79,570,842 6,120,244   95.80    85.1          15.65  USR
         OMP  1,752,280   134,650   13.25    11.8          98.41  OMP
         COM    696,670    53,590    3.03     2.7          56.46  COM
         MPI     76,520     2,622    0.44     0.4         169.19  MPI
      SCOREP         41         2    0.00     0.0          97.98  SCOREP
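
The time/visit column is the accumulated time divided by the number of visits. For the ALL row:

```latex
\frac{112.52\ \mathrm{s}}{6{,}311{,}108\ \text{visits}} \approx 1.783 \times 10^{-5}\ \mathrm{s} = 17.83\ \mu\mathrm{s}\ \text{per visit}
```

Regions with very short but very frequent visits therefore account for little time while still producing many events for the trace buffer.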

Also list the individual regions that contribute to the trace buffer requirements:

$ scorep-score -r scorep-minimd-2x10-profile/profile.cubex 
Estimated aggregate size of event trace:                   157MB
Estimated requirements for largest trace buffer (max_buf): 79MB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       99MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=99MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)
flt     type max_buf[B]    visits time[s] time[%] time/visit[us]  region
         ALL 82,096,353 6,311,108  112.52   100.0          17.83  ALL
         USR 79,570,842 6,120,244   95.80    85.1          15.65  USR
         OMP  1,752,280   134,650   13.25    11.8          98.41  OMP
         COM    696,670    53,590    3.03     2.7          56.46  COM
         MPI     76,520     2,622    0.44     0.4         169.19  MPI
      SCOREP         41         2    0.00     0.0          97.98  SCOREP

         USR 36,642,554 2,817,976    0.46     0.4           0.16  Neighbor::coord2bin
         USR 30,670,848 2,359,296    0.33     0.3           0.14  random
         USR  5,115,604   393,154    0.06     0.1           0.15  Atom::pack_border
         USR  5,115,396   393,154    0.05     0.0           0.14  Atom::unpack_border
         USR  1,703,936   131,072    0.02     0.0           0.17  Atom::addatom
         OMP    156,312    12,024    0.07     0.1           6.19  !$omp for @atom.cpp:170
         OMP    156,312    12,024   10.24     9.1         851.86  !$omp implicit barrier @atom.cpp:175
         OMP    156,312    12,024    0.12     0.1           9.70  !$omp for @atom.cpp:182
         OMP    156,312    12,024    0.03     0.0           2.66  !$omp implicit barrier @atom.cpp:188
         OMP    156,312    12,024    1.53     1.4         127.07  !$omp barrier @comm.cpp:372
         COM    156,312    12,024    0.00     0.0           0.40  Atom::pack_reverse
         COM    156,312    12,024    0.00     0.0           0.33  Atom::unpack_reverse
         OMP    148,200    11,400    0.11     0.1           9.58  !$omp for @atom.cpp:158
         OMP    148,200    11,400    0.02     0.0           1.61  !$omp implicit barrier @atom.cpp:163
         OMP    148,200    11,400    0.27     0.2          24.12  !$omp barrier @comm.cpp:322
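
The regions at the top of this listing share a pattern: well under 1 us per visit, but hundreds of thousands to millions of visits (e.g. Neighbor::coord2bin with 0.16 us and 2,817,976 visits). The sketch below pulls such USR regions out of the `scorep-score -r` output. The helper name cheap_usr_regions and the default threshold of 1 us are assumptions for illustration; this is not the heuristic that `scorep-score -g` uses. It expects output without the flt column, i.e. no `-f` filter applied.

```shell
# Hypothetical helper: print USR regions whose time per visit (column 6, in us)
# is below a threshold given as $1 (default: 1 us).
cheap_usr_regions() {
  awk -v limit="${1:-1}" '
    # Skip the aggregated "USR ... USR" summary row; keep individual regions.
    $1 == "USR" && $7 != "USR" && ($6 + 0) < (limit + 0) {
      name = $7
      # Re-join region names that contain spaces.
      for (i = 8; i <= NF; i++) name = name " " $i
      print name
    }'
}
```

Usage:

$ scorep-score -r scorep-minimd-2x10-profile/profile.cubex | cheap_usr_regions 1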

Generate an initial filter file based on the default heuristic:

$ scorep-score -g scorep-minimd-2x10-profile/profile.cubex
An initial filter file template has been generated: 'initial_scorep.filter'
To use this file for filtering at run-time, set the respective Score-P variable:
    SCOREP_FILTERING_FILE=initial_scorep.filter
For compile-time filtering 'scorep' has to be provided with the '--instrument-filter' option:
    $ scorep --instrument-filter=initial_scorep.filter
Compile-time filtering depends on support in the used Score-P installation.
The filter file is annotated with comments, please check if the selection is
suitable for your purposes and add or remove functions if needed.

Examine the filter file:
$ cat initial_scorep.filter
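
A filter file can also be written by hand. The sketch below is an illustration, not the content of initial_scorep.filter: it excludes only the five USR regions listed by `scorep-score -r` above, using the SCOREP_REGION_NAMES_BEGIN / EXCLUDE / SCOREP_REGION_NAMES_END syntax. The helper name write_filter and the file name my_scorep.filter are hypothetical, and the trailing '*' wildcards are an assumption in case the recorded region names carry a parameter list.

```shell
# Hypothetical helper: write a hand-made Score-P filter file to the path in $1.
# The trailing '*' wildcards are an assumption (names may include parameters).
write_filter() {
  cat > "$1" <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE
    Neighbor::coord2bin*
    random*
    Atom::pack_border*
    Atom::unpack_border*
    Atom::addatom*
SCOREP_REGION_NAMES_END
EOF
}
```

Usage, scoring the hand-written filter against the existing profile:

$ write_filter my_scorep.filter
$ scorep-score -f my_scorep.filter scorep-minimd-2x10-profile/profile.cubex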

Apply the filter file to the existing profile to estimate the expected reduction:

$ scorep-score -f initial_scorep.filter scorep-minimd-2x10-profile/profile.cubex
Estimated aggregate size of event trace:                   3969kB
Estimated requirements for largest trace buffer (max_buf): 1985kB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       23MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=23MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)
flt     type max_buf[B]    visits time[s] time[%] time/visit[us]  region
 -       ALL 82,096,353 6,311,108  112.52   100.0          17.83  ALL
 -       USR 79,570,842 6,120,244   95.80    85.1          15.65  USR
 -       OMP  1,752,280   134,650   13.25    11.8          98.41  OMP
 -       COM    696,670    53,590    3.03     2.7          56.46  COM
 -       MPI     76,520     2,622    0.44     0.4         169.19  MPI
 -    SCOREP         41         2    0.00     0.0          97.98  SCOREP
 *       ALL  2,031,667   152,876  111.53    99.1         729.57  ALL-FLT
 +       FLT 80,064,686 6,158,232    0.99     0.9           0.16  FLT
 -       OMP  1,752,280   134,650   13.25    11.8          98.41  OMP-FLT
 *       USR    117,806     9,062   94.83    84.3       10464.72  USR-FLT
 *       COM     85,020     6,540    3.01     2.7         459.98  COM-FLT
 -       MPI     76,520     2,622    0.44     0.4         169.19  MPI-FLT
 -    SCOREP         41         2    0.00     0.0          97.98  SCOREP-FLT
 +       USR 36,642,554 2,817,976    0.46     0.4           0.16  Neighbor::coord2bin
 +       USR 30,670,848 2,359,296    0.33     0.3           0.14  random
 +       USR  5,115,604   393,154    0.06     0.1           0.15  Atom::pack_border
 +       USR  5,115,396   393,154    0.05     0.0           0.14  Atom::unpack_border
 +       USR  1,703,936   131,072    0.02     0.0           0.17  Atom::addatom
 -       OMP    156,312    12,024    0.07     0.1           6.19  !$omp for @atom.cpp:170
 -       OMP    156,312    12,024   10.24     9.1         851.86  !$omp implicit barrier @atom.cpp:175
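
In this estimate, the original totals (ALL) split exactly into the part that remains (ALL-FLT) and the part removed by the filter (FLT). Using the max_buf and visits columns:

```latex
\begin{aligned}
2{,}031{,}667 + 80{,}064{,}686 &= 82{,}096{,}353\ \mathrm{B} && (\text{max\_buf: ALL-FLT} + \text{FLT} = \text{ALL})\\
152{,}876 + 6{,}158{,}232 &= 6{,}311{,}108 && (\text{visits})\\
\frac{82{,}096{,}353}{2{,}031{,}667} &\approx 40.4 && (\text{reduction factor of max\_buf})
\end{aligned}
```

The filter thus removes about 97.5% of the largest trace buffer, while the filtered regions account for only 0.99 s (0.9%) of the measured time.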

Apply filter file to measurement and re-run MiniMD:

$ export SCOREP_EXPERIMENT_DIRECTORY=scorep-minimd-2x10-profile+filter
$ export SCOREP_FILTERING_FILE=initial_scorep.filter
$ OMP_NUM_THREADS=10 srun -n 2 -c 10 ../miniMD-ICC.scorep -t 10 --half_neigh 1

Re-examine effect of filter file:

$ scorep-score scorep-minimd-2x10-profile+filter/profile.cubex
Estimated aggregate size of event trace:                   3969kB
Estimated requirements for largest trace buffer (max_buf): 1985kB
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       23MB
(hint: When tracing set SCOREP_TOTAL_MEMORY=23MB to avoid intermediate flushes
 or reduce requirements using USR regions filters.)
flt     type max_buf[B]  visits time[s] time[%] time/visit[us]  region
         ALL  2,031,667 152,876  115.94   100.0         758.39  ALL
         OMP  1,752,280 134,650   15.33    13.2         113.86  OMP
         USR    117,806   9,062   97.80    84.4       10792.11  USR
         COM     85,020   6,540    2.12     1.8         324.89  COM
         MPI     76,520   2,622    0.69     0.6         261.65  MPI
      SCOREP         41       2    0.00     0.0         118.20  SCOREP
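
The hint line in this output suggests the SCOREP_TOTAL_MEMORY value that is set by hand for tracing below. A minimal sketch that extracts it from the scorep-score output. The helper name suggested_total_memory is hypothetical, and it assumes the hint keeps the wording "set SCOREP_TOTAL_MEMORY=<value>" shown here, which may differ between Score-P versions.

```shell
# Hypothetical helper: read scorep-score output on stdin and print the
# SCOREP_TOTAL_MEMORY value suggested in the hint line (e.g. "23MB").
suggested_total_memory() {
  sed -n 's/.*set SCOREP_TOTAL_MEMORY=\([0-9][0-9]*[kMG]*B\).*/\1/p' | head -n 1
}
```

Usage:

$ export SCOREP_TOTAL_MEMORY=$(scorep-score scorep-minimd-2x10-profile+filter/profile.cubex | suggested_total_memory)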

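Enable trace file collection (SCOREP_FILTERING_FILE is still exported, so the 23MB suggested for the filtered measurement applies):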

$ export SCOREP_EXPERIMENT_DIRECTORY=scorep-minimd-2x10-tracing
$ export SCOREP_ENABLE_TRACING=true
$ export SCOREP_TOTAL_MEMORY=23MB
$ OMP_NUM_THREADS=10 srun -n 2 -c 10 ../miniMD-ICC.scorep -t 10 --half_neigh 1

Examine experiment result:

$ ls -la scorep-minimd-2x10-tracing
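
Instead of reading the listing by eye, a small helper can test for the two files used in this tutorial: profile.cubex (assuming Score-P's default of profiling enabled is left unchanged) and traces.otf2, the OTF2 anchor file opened in Vampir. The helper name check_experiment is hypothetical.

```shell
# Hypothetical helper: check that a Score-P experiment directory ($1) contains
# the profile and the OTF2 anchor file; returns non-zero if one is missing.
check_experiment() {
  local dir=$1 status=0 f
  for f in profile.cubex traces.otf2; do
    if [ -f "$dir/$f" ]; then
      echo "ok:      $dir/$f"
    else
      echo "missing: $dir/$f"
      status=1
    fi
  done
  return "$status"
}
```

Usage:

$ check_experiment scorep-minimd-2x10-tracing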

Open the trace in Vampir (on the login nodes or cshpc):

$ vampir scorep-minimd-2x10-tracing/traces.otf2
Last modified: Thursday, 29 June 2023, 1:54 PM