Hands-On: logging in, editing, compiling, and starting a job at LRZ
The CoolMUC-4 Linux Cluster at LRZ
Exercises will be done on the new CoolMUC-4 Linux cluster at LRZ.
See https://doku.lrz.de/coolmuc-4-1082337877.html for further information.
Logging in to the CoolMUC-4 cluster
Login under Linux:
- Open xterm
- ssh -Y cool.hpc.lrz.de -l username
- Use the username, password, and second factor (2FA) provided by LRZ staff.
Login under Windows:
- We recommend using MobaXterm https://mobaxterm.mobatek.net/ (comes with built-in X11 support).
- Open a new session via Session -> SSH.
- Specify remote host: cool.hpc.lrz.de and username as provided by LRZ staff.
- Enter the password and second factor (2FA), provided by LRZ staff, into the opened console.
- As an alternative to MobaXterm you can also use the terminal software PuTTY (https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html) together with the Xming X11 server for Windows: https://sourceforge.net/projects/xming/ .
Login under Mac:
- Install X11 support for MacOS XQuartz: https://www.xquartz.org/ .
- Open Terminal.
- ssh -Y cool.hpc.lrz.de -l username
- Use the username, password, and second factor (2FA) provided by LRZ staff.
See https://doku.lrz.de/access-and-login-to-the-linux-cluster-10745974.html for further information on how to access the Linux cluster.
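A minimal login session under Linux or Mac could look like this (replace "username" with the account provided by LRZ staff; the echo is just a quick way to check that X11 forwarding via -Y is active):
$ ssh -Y cool.hpc.lrz.de -l username
$ echo $DISPLAY    # a non-empty value (e.g., localhost:10.0) means X11 forwarding works
$ exit             # log out again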
Copying example code to your home directory
You will mostly be working on example code that we provide. The file(s) for the examples are available in a central directory. In order to copy a specific code, e.g., "HELLO", you just type:
$ cp -a /lrz/sys/courses/PPHPS25/HELLO/ ~
This copies the whole folder, including any subfolders, into your $HOME.
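For example, to fetch the "HELLO" example and verify that it arrived (the listed contents depend on the example):
$ cp -a /lrz/sys/courses/PPHPS25/HELLO/ ~
$ ls ~/HELLO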
Editing and compiling code
Standard editors like vim and emacs are available on the frontend nodes. It is good practice to have at least two terminal windows open: one with a connection to the frontend to do editing, compiling, and other interactive stuff, and one for submitting a job (see below).
Most software, including compilers and some libraries, is available via the "modules" system. The "module load <x>" command loads software module "<x>", i.e., it sets some environment variables (e.g., PATH) so you can use the software.
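A few module commands that are useful for exploring what is installed (the module name passed to "module show" is just an example):
$ module avail                 # list all available modules
$ module list                  # show the modules currently loaded
$ module show intel-toolkit    # display what loading a module would set (PATH, libraries, ...)
$ module unload intel-toolkit  # unload a module again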
On CoolMUC-4, the Intel compiler, MPI, and MKL modules are no longer loaded by default. Users can activate the Intel oneAPI toolkit environment by inserting the following command into their corresponding SLURM scripts:
$ module load intel-toolkit
After that, you have a C (icx), C++ (icpx), and a Fortran (ifx) compiler at your disposal. In addition, an Intel MPI module is loaded. The example codes you will be working with are quite simple. The Intel compiler accepts the same options (well, mostly) as the GCC (here shown for a C code):
$ icx -Ofast my_source.c
In order to make the compiler recognize OpenMP directives, you have to add "-qopenmp" to the command line. For MPI code you have to use one of the provided wrapper scripts (mpiicx, mpiicpx, mpiifx). They behave like normal compilers except that they "know" where to find the MPI headers and libraries.
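For illustration, assuming hypothetical source files hello_omp.c (OpenMP) and hello_mpi.c (MPI), the compile lines could look like this:
$ icx -Ofast -qopenmp hello_omp.c -o hello_omp    # OpenMP code: add -qopenmp
$ mpiicx -Ofast hello_mpi.c -o hello_mpi          # MPI code: use the wrapper instead of icx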
It is good practice to do all editing and compiling on a frontend node. Not all tools and libraries you need may be available on the compute nodes.
Running a batch job
In order to run something on the cluster, you have to start a "batch job," i.e., you specify some amount of resources and the system decides when these resources will be available to you. During the course, 10 nodes (1120 cores) of CoolMUC-4 are reserved for the course participants.
To run jobs on CoolMUC-4 during the course you have to use the SLURM batch scheduler, see
- https://doku.lrz.de/job-processing-on-the-linux-cluster-10745970.html
- https://doku.lrz.de/running-parallel-jobs-on-the-linux-cluster-11484078.html
Sample OpenMP job file for allocating 1 node (112 physical cores) on CoolMUC-4 and running 112 OpenMP threads:
#!/bin/bash
#SBATCH -D ./
#SBATCH -o ./%x.%j.%N.out
#SBATCH -e ./%x.%j.%N.err
#SBATCH -J cmtest
#SBATCH --clusters=cm4
#SBATCH --partition=cm4_tiny
#SBATCH --qos=cm4_tiny
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=112
#SBATCH --get-user-env
#SBATCH --export=NONE
##SBATCH --reservation=hppb1w24
#SBATCH --time=00:10:00
module load slurm_setup
module load intel-toolkit
#for OpenMP
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmp-program
Sample MPI job file for allocating 1 node (112 physical cores) on CoolMUC-4 and running 112 MPI tasks:
#!/bin/bash
#SBATCH -D ./
#SBATCH -o ./%x.%j.%N.out
#SBATCH -e ./%x.%j.%N.err
#SBATCH -J cmtest
#SBATCH --clusters=cm4
#SBATCH --partition=cm4_tiny
#SBATCH --qos=cm4_tiny
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=112
#SBATCH --cpus-per-task=1
#SBATCH --get-user-env
#SBATCH --export=NONE
##SBATCH --reservation=hppb1w24
#SBATCH --time=00:10:00
module load slurm_setup
module load intel-toolkit
#for MPI
mpiexec -n $SLURM_NTASKS ./mpi-program
Please do not use "#SBATCH --mail-type=…", as this could be considered a denial-of-service attack on the LRZ mail hub!
- Submit a job
sbatch --reservation=hppb1w24 job.sh
- List own jobs
squeue -M cm4
- Cancel jobs
scancel -M cm4 jobid
- Information about the cm4 cluster segment
sinfo -M cm4
- Show reservation
scontrol -M cm4 show reservation
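Putting it all together, a typical course workflow could look like this ("job.sh" is one of the sample job files above; the output file name follows the %x.%j.%N pattern from the script, so the actual job ID and node name will differ):
$ sbatch --reservation=hppb1w24 job.sh   # submit into the course reservation; sbatch prints the job ID
$ squeue -M cm4                          # watch the job (PD = pending, R = running)
$ cat cmtest.*.out                       # inspect the output once the job has finished
$ scancel -M cm4 <jobid>                 # cancel the job if something went wrong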