Introduction

At ITET, the Condor Batch Queueing System has been used for a long time for running compute-intensive jobs. It uses the free resources on the tardis-PCs of the student rooms and on numerous PCs and compute servers at ITET institutes. Interactive work is privileged over batch computing, so running jobs can be killed by new interactive load or by a shutdown/restart of a PC.

The SLURM system installed on the powerful ITET arton compute servers is an alternative to the Condor batch computing system and is reserved for staff of the contributing institutes (IBT, IFA, TIK, IKT, APS). It consists of a master host, where the scheduler resides, and the arton compute nodes, where the batch jobs are executed. The compute nodes are powerful servers which reside in server rooms and are exclusively reserved for batch processing; interactive logins are disabled.

SLURM

SLURM (Simple Linux Utility for Resource Management) is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. Slurm's design is very modular, with about 100 optional plugins. In 2010, the developers of Slurm founded SchedMD (https://www.schedmd.com), which maintains the canonical source, provides development, level-3 commercial support and training services, and also provides very good online documentation for Slurm (https://slurm.schedmd.com).

SLURM Arton Grid

Hardware

At the moment the computing power of the SLURM Arton Grid is based on the following 11 CPU compute servers and 1 GPU compute server (compute nodes):

Server       | CPU                                   | Frequency | Cores | GPUs | Memory | Operating System
arton01 - 03 | Dual Octa-Core Intel Xeon E5-2690     | 2.90 GHz  | 16    | -    | 128 GB | Debian 9
arton04 - 08 | Dual Deca-Core Intel Xeon E5-2690 v2  | 3.00 GHz  | 20    | -    | 128 GB | Debian 9
arton09 - 10 | Dual Deca-Core Intel Xeon E5-2690 v2  | 3.00 GHz  | 20    | -    | 256 GB | Debian 9
arton11      | Dual Deca-Core Intel Xeon E5-2690 v2  | 3.00 GHz  | 20    | -    | 768 GB | Debian 9
artongpu01   | Dual Octa-Core Intel Xeon Silver 4208 | 2.10 GHz  | 16    | 2    | 128 GB | Debian 9


The local disks (/scratch) of arton09, arton10 and arton11 are fast SSDs (6 Gbit/s) with a size of 720 GB.

The SLURM job scheduler runs on the Linux server itetmaster01.
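
Once the SLURM environment is set up (see "Setting environment" below), the current state and hardware of the compute nodes can be inspected with sinfo, for example:

sinfo                 # overview of the partitions and node states
sinfo --Node --long   # per-node listing including CPUs, memory and state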

Software

The arton CPU nodes offer the same software environment as all D-ITET managed Linux clients; the GPU nodes have a restricted software set (no desktops installed).

Using SLURM

At a basic level, SLURM is very easy to use. The following sections describe the commands you need to run and manage your batch jobs on the Arton grid; a short example job script is shown after the list below. The commands that will be most useful to you are:

  • sbatch - submit a job to the batch scheduler
  • squeue - examine running and waiting jobs
  • sinfo - show the status of the compute nodes
  • scancel - cancel a running or waiting job
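
As a sketch of how these commands work together, a job is typically written as a small shell script with #SBATCH resource directives and handed to sbatch. The job name, file names and resource values below are only illustrative; adjust them to your needs:

#!/bin/bash
#SBATCH --job-name=myjob          # illustrative job name
#SBATCH --output=myjob-%j.out     # file for stdout/stderr, %j is replaced by the job ID
#SBATCH --cpus-per-task=1         # number of CPU cores for the job
#SBATCH --mem=4G                  # memory required for the job
#SBATCH --time=01:00:00           # wall-clock time limit

# replace with your own program
srun hostname

Assuming the script is saved as myjob.sh, it is submitted and monitored as follows:

sbatch myjob.sh        # submit the job; prints the assigned job ID
squeue -u $USER        # list your running and waiting jobs
scancel <jobid>        # cancel the job with the given ID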

Setting environment

The above commands only work if the environment variables for SLURM are set. Please add the following two lines to your ~/.bashrc:

export PATH=/usr/pack/slurm-19.05.0-sr/amd64-debian-linux9/bin:$PATH
export SLURM_CONF=/home/sladmitet/slurm/slurm.conf
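
After editing ~/.bashrc, open a new shell or source the file once, then verify that the SLURM commands are found, for example:

source ~/.bashrc
which sbatch          # should point into /usr/pack/slurm-19.05.0-sr/amd64-debian-linux9/bin
sinfo                 # should list the arton compute nodes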
