srun_error

by Ananda Krishnan Rejikumar Bindu -
Number of replies: 8

Dear PTFS team,

I have a problem while using srun. Could you please see the attachment for the error?

I saw that Slurm was updated on Fritz. Does that mean we need to specify the srun options in another way?

With Regards,

Ananda



In reply to Ananda Krishnan Rejikumar Bindu

Re: srun_error

by Zurab Mujirishvili -
Hi,

I'm not sure if the behaviour remains exactly the same, but appending `:performance` to the end of the frequency option seems to resolve the issue. Like this:
`$ srun --cpu-freq=200000-200000:performance ...`
In reply to Zurab Mujirishvili

Re: srun_error

by Jan Laukemann -
You are right that the program runs with the performance governor and that this avoids the error you hit before, but if you look closely, you will see that your cores run at a frequency of 800 MHz.
The min/max frequency must be given in kHz, and your mistake was to specify 200 MHz instead of 2 GHz; that is why srun stops with the "Invalid --cpu-freq argument" error.

Provide a valid range for the min/max clock frequency and everything should work as expected.
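
As a small illustration of the unit conversion described above, here is a minimal sketch in shell. The helper name `mhz_to_khz` is hypothetical and not part of Slurm; it only multiplies by 1000 so that 2 GHz (2000 MHz) becomes the kHz value used in `--cpu-freq`.

```shell
# Hypothetical helper: convert a frequency in MHz to kHz,
# the unit expected for the min/max values of --cpu-freq.
mhz_to_khz() {
    echo $(( $1 * 1000 ))
}

# 2 GHz = 2000 MHz
freq_khz=$(mhz_to_khz 2000)
echo "$freq_khz"    # prints 2000000
```

By the same arithmetic, the value 200000 from the earlier command corresponds to 200 MHz, not 2 GHz.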
In reply to Jan Laukemann

Re: srun_error

by Zurab Mujirishvili -
Ah my bad, the frequency number is, in fact, incorrect. However, I will add that a correct frequency number alone is not sufficient to resolve this error. See logs below:

```
icpx -std=c++0x -Wall -Winline -Wshadow -W -O3 -qopenmp -xHOST -pg -o perf Grid.o PDE.o Solver.o perf.o timer.o
Running ./perf with options 1000 400000 and CPU frequency 2000000 KHz
srun: error: governor of 2000000-2000000 is not allowed in slurm.conf
srun: error: Invalid --cpu-freq argument
gmon.out: No such file or directory
=== JOB_STATISTICS ===
=== current date : Wed Aug 21 11:29:25 CEST 2024
= Job-ID : 1541210 on fritz
= Job-Name : profile-job
= Job-Command : ./jobs/profile.sh
= Initial workdir : /home/hpc/ptfs/ptfs275h/project
= Queue/Partition : singlenode
= Slurm account : ptfs with QOS=ptfs
= Features : hwperf
= Requested resources: for 00:05:00
= Elapsed runtime : 00:00:07
= Total RAM usage : 0.0 GiB
= Node list : f0566
= Subm/Elig/Start/End: 2024-08-21T11:28:52 / 2024-08-21T11:28:52 / 2024-08-21T11:29:17 / 2024-08-21T11:29:24
======================
=== Quota infos ======
Path Used SoftQ HardQ Gracetime Filec FileQ FiHaQ FileGrace
/home/woody 0.0K 1000.0G 1500.0G N/A 1 5,000K 7,500K N/A
/home/hpc 2237.7M 104.9G 209.7G N/A 4,878 500K 1,000K N/A
/lustre 4.0K 0.0K 0.0K N/A 1 80K 250K N/A
======================
```

This issue has been occurring since the maintenance work of August 13-15.
In reply to Zurab Mujirishvili

Re: srun_error

by Jan Laukemann -
The default governor (onDemand) does not support fixing frequencies anymore; thus, for setting a min (and/or max) frequency, you always need to provide a governor.
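
To illustrate the syntax only: the option value has the form min-max:governor, as in the commands posted in this thread. The sketch below assembles and prints such a command without running it. `cpu_freq_opt` is a hypothetical helper, and `performance` is simply the governor tried earlier in the thread; which governors are permitted depends on the cluster's slurm.conf, so treat that choice as an assumption.

```shell
# Hypothetical helper: build a --cpu-freq option of the form
# min-max:governor, with both frequencies given in kHz.
cpu_freq_opt() {
    # $1 = min frequency (kHz), $2 = max frequency (kHz), $3 = governor
    if [ "$1" -gt "$2" ]; then
        echo "min frequency exceeds max frequency" >&2
        return 1
    fi
    echo "--cpu-freq=$1-$2:$3"
}

# Assumption: 'performance' is allowed on the target cluster.
opt=$(cpu_freq_opt 2000000 2000000 performance)
echo "srun $opt ./perf 1000 400000"
```

Since the earlier run with the performance governor ended up at 800 MHz, it is worth verifying on the node that the requested frequency is actually applied.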
In reply to Jan Laukemann

Re: srun_error

by Ananda Krishnan Rejikumar Bindu -
Could you please show an example with the exact command to use for locking the frequency with a governor? Should I use powersave, or which governor should be used for locking the frequency?

`srun --cpu-freq=2000000-2000000:performance ./perf 1000 1000`
`srun --cpu-freq=2000000-2000000:powersave ./perf 1000 1000`
In reply to Zurab Mujirishvili

Re: srun_error

by Erik Fabrizzi -

I last ran my benchmarks almost a month ago, and the command `srun --cpu-freq=20000000-2000000` was working. I reran the same batch script I used back then, and it is now throwing the same error. Also, back then, giving a frequency that was too low would default to the lowest supported one rather than throwing an error, so something may really have changed on the administration side.