Revision 6 as of 2019-09-06 10:46:53

Introduction

At ITET, the Condor batch queueing system has long been used for running compute-intensive jobs. It uses the free resources on the tardis PCs in the student rooms and on numerous PCs and compute servers at the ITET institutes. Interactive work takes priority over batch computing, so running jobs may be killed by new interactive load or by the shutdown or restart of a PC.

The SLURM system installed on the powerful ITET arton compute servers is an alternative to the Condor batch computing system and is reserved for staff of the contributing institutes (IBT, IFA, TIK, IKT, APS). It consists of a master host, where the scheduler resides, and the arton compute nodes, where the batch jobs are executed. The compute nodes are powerful servers that reside in server rooms and are exclusively reserved for batch processing. Interactive logins are disabled.

SLURM

SLURM (Simple Linux Utility for Resource Management) is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. Slurm's design is very modular, with about 100 optional plugins. In 2010, the developers of Slurm founded SchedMD (https://www.schedmd.com), which maintains the canonical source, provides development, level 3 commercial support and training services, and also provides very good online documentation for Slurm (https://slurm.schedmd.com).

SLURM Arton Grid

Hardware

At the moment, the computing power of the SLURM Arton Grid is provided by the following 11 CPU compute servers and 1 GPU compute server (compute nodes):

Server         CPU                                   Frequency   Cores   GPUs   Memory   Operating System
arton01 - 03   Dual Octa-Core Intel Xeon E5-2690     2.90 GHz    16      -      128 GB   Debian 9
arton04 - 08   Dual Deca-Core Intel Xeon E5-2690 v2  3.00 GHz    20      -      128 GB   Debian 9
arton09 - 10   Dual Deca-Core Intel Xeon E5-2690 v2  3.00 GHz    20      -      256 GB   Debian 9
arton11        Dual Deca-Core Intel Xeon E5-2690 v2  3.00 GHz    20      -      768 GB   Debian 9
artongpu01     Dual Octa-Core Intel Xeon Silver 4208 2.10 GHz    16      2      128 GB   Debian 9


The local disks (/scratch) of arton09, arton10 and arton11 are fast SSD-disks (6 GBit/s) with a size of 720 GByte.

The SLURM job scheduler runs on the Linux server itetmaster01.

Software

The arton CPU nodes offer the same software environment as all D-ITET managed Linux clients; the GPU nodes have a restricted software set (no desktops installed).

Using SLURM

At a basic level, SLURM is very easy to use. The following sections describe the commands you need to run and manage your batch jobs on the SLURM Arton grid.

Setting environment

The SLURM commands only work if the environment variables for SLURM are set. Please put the following two lines in your ~/.bashrc:

export PATH=/usr/pack/slurm-19.05.0-sr/amd64-debian-linux9/bin:$PATH
export SLURM_CONF=/home/sladmitet/slurm/slurm.conf
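To confirm that the two settings above have taken effect in a new shell, a small check along the following lines may help. This is only a sketch: the function name check_slurm_env is made up for illustration and is not part of SLURM or of the D-ITET setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of SLURM): checks that sbatch is reachable
# through PATH and that SLURM_CONF points to a readable file.
check_slurm_env() {
    local status=0
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found in PATH"
        status=1
    fi
    if [ -z "${SLURM_CONF:-}" ] || [ ! -r "${SLURM_CONF}" ]; then
        echo "SLURM_CONF is unset or not readable"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "SLURM environment looks fine"
    fi
    return "$status"
}
```

After sourcing this snippet, calling check_slurm_env prints one line per detected problem and returns a non-zero status if anything is missing.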

sbatch : Submitting a job

sbatch does not allow a binary program to be submitted directly; please wrap the program to run in a surrounding bash script. The sbatch command has the following syntax:

> sbatch [options] job_script [job_script arguments]

The job_script is a standard UNIX shell script. The fixed options for the SLURM scheduler are placed in the job_script in lines starting with #SBATCH. The UNIX shell interpreter reads these lines as comments and ignores them. Only temporary options should be placed outside the job_script. To test your job_script, you can simply run it interactively.

Assume there is a C program primes.c which is compiled to an executable binary named primes with "gcc -o primes primes.c". The program runs for 5 seconds and calculates prime numbers. The prime numbers found and a final summary report are written to standard output. A sample job_script primes.sh to perform a batch run of the binary primes on the Arton grid looks like this:

#!/bin/bash
#
#SBATCH  --mail-type=ALL                     # mail configuration: NONE, BEGIN, END, FAIL, REQUEUE, ALL
#SBATCH  --output=log/%j.out                 # where to store the output ( %j is the JOBID )
#
# print some information about the job environment
/bin/echo Running on host: `hostname`
/bin/echo In directory: `pwd`
/bin/echo Starting on: `date`
/bin/echo SLURM_JOB_ID: $SLURM_JOB_ID
#
# binary to execute
./primes
echo finished at: `date`
exit 0

You can test the script by running it interactively in a terminal:

gfreudig@trollo:~/Batch$ ./primes.sh

If the script runs successfully, you can now submit it as a batch job to the SLURM arton grid:

gfreudig@trollo:~/Batch$ sbatch primes.sh 
sbatch: Start executing function slurm_job_submit......
sbatch: Job partition set to : cpu.normal.32 (normal memory)
Submitted batch job 931
gfreudig@trollo:~/Batch$ 

When the job has finished, you will find its output file in the log subdirectory under the name <JOBID>.out.
Note: The directory for the job output must exist; it is not created automatically!
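Because the output directory is not created automatically, it can be convenient to create it just before submitting. The following sketch reads the first #SBATCH --output line of a job script and creates the directory it names. The function name ensure_output_dir is hypothetical, and the parsing assumes the --output=<path> form used in primes.sh.

```shell
#!/bin/bash
# Hypothetical helper: create the directory named in the first
# "#SBATCH --output=<path>" line of a job script, if there is one.
ensure_output_dir() {
    local script="$1"
    local pattern
    # Extract the path after --output= up to the first whitespace
    pattern=$(sed -n 's/^#SBATCH[[:space:]]*--output=\([^[:space:]]*\).*/\1/p' "$script" | head -n 1)
    if [ -n "$pattern" ]; then
        # For log/%j.out this creates the directory "log"
        mkdir -p "$(dirname "$pattern")"
    fi
}
```

A possible use is to run ensure_output_dir primes.sh and then sbatch primes.sh from the directory that contains the job script.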

As in Condor, it is also possible to start an array job. The job above would run 10 times if you put the option #SBATCH --array=0-9 in the job script. The repeated execution only makes sense if the executed program changes its behaviour depending on the array task number. The array task number can be referenced through the variable $SLURM_ARRAY_TASK_ID. You can pass the value of $SLURM_ARRAY_TASK_ID, or parameters derived from it, to the executable. A simple way to pass an input filename that depends on $SLURM_ARRAY_TASK_ID to the executable looks like this:

.
#SBATCH   --array=0-9
#
# binary to execute
<path-to-executable> data$SLURM_ARRAY_TASK_ID.dat

Every run of the program in the array job with a different task-id will also produce a separate output file.
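Parameters derived from the task ID can be computed in the job script before the executable is called. The sketch below is one possible way to do this: the function derive_params, the zero-padded file naming and the seed offset of 1000 are assumptions made for illustration. The task ID falls back to 0 so that the script can still be tested interactively outside SLURM.

```shell
#!/bin/bash
#SBATCH  --array=0-9
#SBATCH  --output=log/%j.out

# Hypothetical mapping from a task ID to an input file name and a seed.
derive_params() {
    local task_id="${1:-0}"
    printf 'data%03d.dat %d\n' "$task_id" "$((1000 + task_id))"
}

# SLURM_ARRAY_TASK_ID is set by SLURM for array jobs; default to 0 otherwise
read -r input_file seed <<< "$(derive_params "${SLURM_ARRAY_TASK_ID:-0}")"
echo "input file: $input_file, seed: $seed"
# <path-to-executable> "$input_file" "$seed"
```

The last line is left as a comment because the executable and the arguments it accepts depend on your own program.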

The following table shows the most common options for sbatch, to be placed in the job script in lines starting with #SBATCH:

option                description
--mail-type=...       Possible values: NONE, BEGIN, END, FAIL, REQUEUE, ALL
--mem=<n>G            the job needs a maximum of <n> GByte of memory (if omitted, the default of 12G is used)
--cpus-per-task=<n>   number of cores to be used for the job
--gres=gpu:1          number of GPUs needed for the job (limited to 1!)
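As an illustration of how these options combine, the header of a job script for a multi-threaded program might look like the sketch below. The values (16G, 4 cores) are arbitrary examples, and the use of OMP_NUM_THREADS assumes that the program is parallelised with OpenMP, which may not apply to your code. SLURM sets SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given; the fallback to 1 only serves interactive test runs.

```shell
#!/bin/bash
#SBATCH  --mail-type=END,FAIL      # send mail when the job ends or fails
#SBATCH  --output=log/%j.out       # the log directory must already exist
#SBATCH  --mem=16G                 # example value: request 16 GByte of memory
#SBATCH  --cpus-per-task=4         # example value: request 4 cores

# Use as many threads as cores were requested (1 when run outside SLURM)
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"  # assumption: the program uses OpenMP
echo "Running with $threads thread(s)"
# <path-to-executable>
```

For a job on the GPU node, a line with #SBATCH --gres=gpu:1 would be added to the header in the same way.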