vsc4

Prepare for these Exercises:


cd ~/HY-VSC/he-hy          
#   change into your he-hy directory



Contents:

job_*.sh                              #   job-scripts to run the provided tools, 2 x job_*_exercise.sh

*.[c|f90]                             #   various codes (hello world & tests) - NO need to look into these!

vsc4/vsc4_slurm.out_*     #   vsc4 output files --> sorted (note: physical cores via modulo 48)



vsc-4:   790 nodes - 2 Intel Skylake Platinum CPUs - 48 cores/node - 24 cores/socket - 96 GB/node (some nodes: 384 GB / 768 GB)

IN THE ONLINE COURSE, he-hy shall be done in two parts:

    first exercise    =   1. + 2. + 3. + 4.

    second exercise   =   5. + 6. + 7.   (after the talk on pinning)



1. FIRST THINGS FIRST - PART 1: :DEMO: find out about a (new) cluster - login node

    module (avail, load, list, unload); compiler (name & --version)

    Some suggestions for what you might want to try:

[vsc4 - login node - topology]

    module list                                # are there any default modules loaded ?   are these okay ?
                                               # ==> no default modules loaded on vsc4

    module avail                               # which modules are available ?
                                                        # default versions ?   latest versions ?   spack ?
                                                        # ==> look for: compiler, mpi, likwid

# likwid is (temporarily) not available on vsc4...
    module load likwid                    # let's try to load a module...
    likwid-topology -c -g                # ... and use it on the login node
    module purge                            # clean up and purge all loaded modules

    numactl --hardware                  # provides the same information...
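
    Since likwid is (temporarily) missing, lscpu (assumed to be installed, which is typical on Linux
    login nodes) is another way to get roughly the same topology information:

    lscpu | grep -E 'Socket|Core|Thread|NUMA|Model name'             # sockets, cores/socket, threads/core, NUMA nodes
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list   # hyperthread sibling(s) of core 0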

    module load intel intel-mpi   # decide on the modules to be used (Intel compiler and Intel MPI library)

    
    module list                                 # list modules loaded

    echo $CC                                   # check compiler names
    echo $CXX                                 #                     
    echo $F90                                  #

    icc --version                              # check versions for standard Intel compilers
    icpc --version                            #
    ifort  --version                           #

[VSC4 - components]

    mpi<tab><tab>                         # figure out which MPI wrappers are available

    mpicc --version                        # try standard names for the MPI wrappers
    mpicxx --version                      #
    mpif90 --version                      # ==> oh no, these are old gnu compilers...

    mpiicc --version                       # try special names, e.g., for Intel...
    mpiicpc --version                     #
    mpiifort --version                     # ==> finally, we get what we expect (note the double ii) :-)

    Always check compiler names and versions, e.g. with: mpiicc --version !!!


2. FIRST THINGS FIRST - PART 2: :DEMO: find out about a (new) cluster - batch jobs

[SLURM - partition + qos]

    job environment, job scripts (clean) & batch system (SLURM); test compiler and MPI version

    job_env.sh,  job_te-ve_[c|f].sh,  te-ve*    

   ! there might be several hardware partitions and qos on a cluster & a default !
   ! you have to check all hardware partitions you would like to use separately     !

   ! in the course we have a node reservation on vsc4 in the skylake_0096 partition !
   ! (it's switched on via your ~/.bashrc file (done with setup.sh))                !
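
    A hedged sketch of how to check this yourself with standard SLURM commands (the qos name in the
    #SBATCH lines below is only an assumption - check what is valid for your project):

    sinfo -o "%P %D %c %m %l"                              # partitions: nodes, cores/node, memory, time limit
    sacctmgr show qos format=Name,Priority,MaxWall         # qos defined on the cluster

    #SBATCH --partition=skylake_0096                       # typical job-script header lines for the course partition
    #SBATCH --qos=skylake_0096                             # (qos name assumed)
    #SBATCH --nodes=1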

SLURM (vsc4):

[vsc-4 - job scripts]

sbatch job*.sh                                           #   submit a job

sq                                                                 #   check own jobs
                                                                    #   sq is an alias for: squeue -u $USER

scancel JOB_ID                                           #   cancel a job

output will be written to: slurm-*.out     #   output

    sbatch job_env.sh                                                        # check job environment

    ./job_te-ve_c.sh             |   ./job_te-ve_f.sh               # run on login nodes  --> test version (te-ve)

    sbatch job_te-ve_c.sh   |   sbatch job_te-ve_f.sh     # submit job               --> test version (te-ve)
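
    The provided job scripts are not reproduced here, but a minimal sketch of a comparable test-version
    job script (partition from the note above, modules from part 1) could look like this:

    #!/bin/bash
    #SBATCH --job-name=te-ve
    #SBATCH --nodes=1
    #SBATCH --partition=skylake_0096       # course partition, see the note above

    module purge
    module load intel intel-mpi            # same modules as used on the login node

    mpiicc --version                       # which compiler is really behind the wrapper ?
    mpirun -n 4 hostname                   # do MPI processes start on the compute node ?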


3. MPI+OpenMP: :TODO: how to compile and start an application
                                              how to do conditional compilation

    job_co-co_[c|f].sh,  co-co.[c|f90]

    Recap with Intel compiler & Intel MPI (→ see also slide ##):

[vsc-4 - info about hardware]

                   compiler           ? USE_MPI    ? _OPENMP    START APPLICATION
    C:                                                          export OMP_NUM_THREADS=#
      with MPI     mpiicc             -DUSE_MPI    -qopenmp     mpirun -n # ./<exe>
      no MPI       icc                             -qopenmp     ./<exe>
    Fortran:                                                    export OMP_NUM_THREADS=#
      with MPI     mpiifort -fpp      -DUSE_MPI    -qopenmp     mpirun -n # ./<exe>
      no MPI       ifort -fpp                      -qopenmp     ./<exe>
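
    Spelled out as plain commands for the C version (executable names and the process/thread counts are
    just examples; on a login node additionally export I_MPI_FABRICS=shm, see below):

    mpiicc -DUSE_MPI -qopenmp -o co-co_mpi co-co.c         # hybrid MPI+OpenMP
    export OMP_NUM_THREADS=4
    mpirun -n 2 ./co-co_mpi

    icc -qopenmp -o co-co_omp co-co.c                      # OpenMP only (no USE_MPI)
    export OMP_NUM_THREADS=4
    ./co-co_omp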


      TODO:

    → Compile and Run (4 possibilities): co-co.[c|f90] = Demo for conditional compilation.

    → Do it by hand - compile and run it directly on the login node (e.g., with #=4).
         export I_MPI_FABRICS=shm                          # needed on (some) login nodes

    → Have a look at the code co-co.[c|f90] to see how it works.

    → It's also available as a script:  job_co-co_[c|f].sh

    Always check compiler names and versions, e.g. with: mpiicc --version !!!


4. MPI+OpenMP: :TODO: get to know the hardware - needed for pinning

    (→ see also slide ##)

      TODO:

[vsc4 - compute node - topology]

    → Find out about the hardware of compute nodes:

    → Write and Submit: job_check-hw_exercise.sh   (a possible starting point is sketched after this list)

        # likwid is (temporarily) not available on vsc4...

    → Describe the compute nodes... (core numbering?)

    → solution = job_check-hw_solution.sh

    → solution.out =  vsc4/vsc4_slurm.out_check-hw_solution
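
    Not the provided solution, just a possible starting point for job_check-hw_exercise.sh - since likwid
    is unavailable it falls back on standard Linux tools (partition name as used in the course):

    #!/bin/bash
    #SBATCH --job-name=check-hw
    #SBATCH --nodes=1
    #SBATCH --partition=skylake_0096

    lscpu                                                             # sockets, cores/socket, threads/core
    numactl --hardware                                                # NUMA layout, memory per NUMA node
    grep -c processor /proc/cpuinfo                                   # total number of logical cores
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list    # hyperthread partner of core 0 (--> modulo 48)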




