Logging in and starting jobs on the Emmy cluster at RRZE
1. Login to the HPC machines at RRZE
Course logins for the HPC systems at RRZE are provided after you send an e-mail to Georg Hager. You will perform all of the benchmark work on the Ivy Bridge cluster "Emmy". Detailed information about login, file systems, etc. can be found on the website:
All HPC machines can only be accessed directly via ssh from machines within the FAU network. From outside the University of Erlangen-Nürnberg, type
ssh -p 8196 YOUR_USERNAME@grid.rrze.uni-erlangen.de
If you need software components that are not present on the production system you can use our general dialog server, cshpc.rrze.uni-erlangen.de. All network file systems are also available on this dialog server, and special software (e.g., plotting tools) is most probably installed there in reasonably current versions. If anything is missing, tell us so we can install it.
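For example, from within the FAU network you can log in to the dialog server with a standard ssh command (YOUR_USERNAME stands for your course login):
$ ssh YOUR_USERNAME@cshpc.rrze.uni-erlangen.de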
Please do not perform memory-intensive test runs on the frontends or dialog servers as this will disturb user operations. Moreover, many users are active on the frontends and you will not get sensible performance data anyway.
2. Compilers
On the Emmy cluster we use the Intel compiler suite. Usually the Intel compilers deliver higher performance than GCC, and we are quite familiar with their characteristics. To access the Intel compilers you first have to set up your environment correctly. This can be done for the currently running shell via:
module load intel64
(you can also specify a version number; this will be required from time to time). This will set up the PATH and other environment variables that you need to work with the Intel compilers. The compilers are called ifort (Fortran 77/90), icc (C), and icpc (C++).
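For example, to see which versions are installed and load one (the version names change over time, so check the module avail output):
$ module avail intel64            # list installed versions
$ module load intel64             # load the default version
$ module load intel64/<version>   # or load a specific version from that list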
2.1 Recommended compiler switches
The Intel compilers have loads of command line options. We recommend using -O3 -xHost -fno-alias. The option -help will give you a complete list. The standard options (-c, -g, -o, etc.) are identical to those of GCC. If you want a report on what the compiler did in the optimization stage you can use -opt_report3, but don't expect too much readable information.
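For example, a C benchmark could be compiled with the recommended switches like this (the file name perf.c and the executable name perf are just placeholders):
$ icc -O3 -xHost -fno-alias -o perf perf.c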
3. Batch processing
Short test runs can be started directly on the Emmy frontends. However, for producing reliable benchmark results it is preferable to submit the jobs to the batch queue. The batch system accepts requests for resources (e.g., "6 nodes for 24 hours") and queues them according to some priority scheme. A job gets run, i.e., a previously specified shell script gets executed, when the resources are available and the batch system has chosen the job to be started. Apart from running a batch script (see below) and interactive testing on the frontends, you can submit an interactive batch job, which gives you, e.g., a shell on a compute node for some time. You can do this by typing:
$ qsub -l nodes=1:ppn=40,walltime=01:00:00 -I
This command will allocate a complete node (40 virtual cores) for 1 hour. You should always request complete nodes so that you can do your benchmarks on a quiet machine. Unless you do message passing parallelization (later this term) there will be no need to request more than one node.
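Once the interactive shell has opened on the compute node, a session could, for example, look like this (a sketch only; a.out stands for your own executable):
$ module load intel64    # load the compiler environment if your binary needs it
$ cd $PBS_O_WORKDIR      # the directory from which qsub was called
$ ./a.out
$ exit                   # ends the interactive job and releases the node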
If you want to run longer benchmarks or parameter studies you have to submit a batch script:
$ qsub -l nodes=1:ppn=40,walltime=05:00:00 -m be -M you@somewhere script.sh
This will again request one complete node, this time for 5 hours. After job submission, qsub will print the job's ID number. You will be notified when the job starts and when it ends (-m be) via e-mail to the indicated address (-M option). When the job starts, the script script.sh gets executed on the allocated node. Fig. 1 shows a simple example for a batch script.
Figure 1: A simple batch script
#!/bin/bash
#
# the script starts in $HOME, so
# change to the correct directory (i.e., the directory
# from which the job was submitted)
cd $PBS_O_WORKDIR
# start the executable
./a.out
If your job is not finished after the requested walltime, it will be killed mercilessly. You may request up to 24 hours of runtime, but be aware that shorter runtimes will increase the job's probability of running early. So try to give a sensible estimate of your runtime requirements.
After the job has finished, its stdout and stderr outputs will be saved in the directory from which you submitted it. The file names are usually composed of the job name and ID, but they can be modified using the -o and -e options to qsub (see the man page).
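For example, the following submits the same job with a descriptive job name and custom output file names (mybench, mybench.out, and mybench.err are just placeholders):
$ qsub -l nodes=1:ppn=40,walltime=05:00:00 -N mybench -o mybench.out -e mybench.err script.sh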
You can watch and control your jobs using the qstat and qdel commands, respectively (see the examples below).
- qstat will show you all your jobs, whether running (status `R') or queued (status `Q').
- qdel takes one or more job IDs (just the numbers) as arguments and allows you to remove a job from the queue, even when it is already running.
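For example (123456 stands for a job ID as printed by qsub or shown by qstat):
$ qstat          # list all of your jobs
$ qdel 123456    # remove the job with this ID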
4. Measuring elapsed time and consumed CPU time
A sample for measuring elapsed time and consumed CPU time is provided in the files timing.* located in the directory ~unrz55/GettingStarted. An example for the use of the timing functions in C is provided in the file example.c, also located in that directory. You can link the timing.o object file to a Fortran program; it should work out of the box because timing.c also defines a wrapper function with an underscore appended to its name.
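As an illustration of how such a timer typically looks, here is a minimal sketch in C. This is not the code from ~unrz55/GettingStarted, and the function name get_walltime is just an example, but it shows the usual gettimeofday() approach and the underscore wrapper that makes the routine callable from Fortran:
#include <sys/time.h>

/* return the current wallclock time in seconds */
double get_walltime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* wrapper with a trailing underscore, so that Fortran code
   compiled with ifort can call it as get_walltime() */
double get_walltime_(void)
{
    return get_walltime();
}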
Of course there are a lot of other possibilities for measuring time. Feel free to use your favorite routines instead of the ones mentioned above. Bear in mind that the only reliable measure of performance is wallclock time; CPU time can often be misleading.
Please bear in mind that timing functions have limited granularity, i.e., it usually does not make sense to measure time on a scale of microseconds. Always write your benchmarks in such a way that the time intervals to be measured are at least 100 ms.
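A common way to achieve this is to repeat the benchmark kernel and divide the measured time by the number of repetitions. The following is only a sketch with a dummy kernel and its own timer; replace kernel() with your own benchmark code:
#include <stdio.h>
#include <sys/time.h>

#define N 1000000
static double a[N], b[N];

/* wallclock timer (same approach as sketched above) */
static double get_walltime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}

/* dummy benchmark kernel: a simple vector update */
static void kernel(void)
{
    for (int i = 0; i < N; ++i)
        a[i] = b[i] + 1.0;
}

int main(void)
{
    int nrep = 1;
    double t0 = 0.0, t1 = 0.0;

    /* double the repetition count until the measured
       interval exceeds roughly 100 ms */
    do {
        t0 = get_walltime();
        for (int r = 0; r < nrep; ++r)
            kernel();
        t1 = get_walltime();
        nrep *= 2;
    } while (t1 - t0 < 0.1);

    /* the last measurement used nrep/2 repetitions */
    printf("time per kernel run: %g s\n", (t1 - t0) / (nrep / 2));
    printf("a[0] = %g\n", a[0]);  /* use the result so it is not optimized away */
    return 0;
}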
4.1 Clock frequency settings
If you want to get accurate timings in terms of processor cycles, you have to know the exact clock speed of the CPU. The Emmy processors have a nominal clock speed of 2.2 GHz, but "Turbo Mode" is enabled by default. This means that the CPU can "overclock" to some degree, depending on the number of active cores and the temperature. The highest possible clock speed is 3.0 GHz. In order to set the clock frequency to a specific (fixed) value you can specify a parameter at job submission time:
$ qsub -l nodes=1:ppn=40:f2.2,walltime=01:00:00 ...
In this example, the clock speed for all cores in this job would be set to 2.2 GHz. The available settings are listed in the output of the "pbsnodes -a" command.