Introduction
At ITET the Condor Batch Queueing System has been used for a long time to run compute-intensive jobs. It uses the free resources on the tardis-PCs of the student rooms and on numerous PCs and compute servers at ITET institutes. Interactive work is privileged over batch computing, so running jobs can be killed by new interactive load or by a shutdown/restart of a PC.
The SLURM system installed on the powerful ITET arton compute servers is an alternative to the Condor batch computing system and is reserved for staff of the contributing institutes (IBT, IFA, TIK, IKT, APS). It consists of a master host, where the scheduler resides, and the arton compute nodes, where the batch jobs are executed. The compute nodes are powerful servers which reside in server rooms and are exclusively reserved for batch processing. Interactive logins are disabled.
SLURM
SLURM (Simple Linux Utility for Resource Management) is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters. Slurm's design is very modular, with about 100 optional plugins. In 2010, the developers of Slurm founded SchedMD (https://www.schedmd.com), which maintains the canonical source, provides development, level-3 commercial support and training services, and also provides very good online documentation for Slurm (https://slurm.schedmd.com).
SLURM Arton Grid
Hardware
At the moment the computing power of the SLURM Arton Grid is based on the following 11 CPU compute servers and 1 GPU compute server (compute nodes):
|| Server || CPU || Frequency || Cores || GPUs || Memory || Operating System ||
|| arton01 - 03 || Dual Octa-Core Intel Xeon E5-2690 || 2.90 GHz || 16 || - || 128 GB || Debian 9 ||
|| arton04 - 08 || Dual Deca-Core Intel Xeon E5-2690 v2 || 3.00 GHz || 20 || - || 128 GB || Debian 9 ||
|| arton09 - 10 || Dual Deca-Core Intel Xeon E5-2690 v2 || 3.00 GHz || 20 || - || 256 GB || Debian 9 ||
|| arton11 || Dual Deca-Core Intel Xeon E5-2690 v2 || 3.00 GHz || 20 || - || 768 GB || Debian 9 ||
|| artongpu01 || Dual Octa-Core Intel Xeon Silver 4208 || 2.10 GHz || 16 || 2 || 128 GB || Debian 9 ||
The local disks (/scratch) of arton09, arton10 and arton11 are fast SSD disks (6 Gbit/s) with a size of 720 GB.
The SLURM job scheduler runs on the Linux server itetmaster01.
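The node characteristics listed above can also be queried directly from SLURM once your environment is set up (see "Setting environment" below). The following is a sketch; the exact output depends on the SLURM configuration:
{{{
sinfo -N -l                   # list all nodes with core count, memory and state
scontrol show node arton01    # show the full configuration of a single node
}}}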
Software
The arton CPU nodes offer the same software environment as all D-ITET managed Linux clients; the GPU nodes have a restricted software environment (no desktops installed).
Using SLURM
At a basic level, SLURM is very easy to use. The following sections describe the commands you need to run and manage your batch jobs on the SLURM Arton Grid. The commands that will be most useful to you are the following (a simple job script example is shown after this list):
- sbatch - submit a job to the batch scheduler
- squeue - examine running and waiting jobs
- sinfo - show the status of the compute nodes
- scancel - delete a running job
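As an illustration, a minimal job script could look like the following sketch; the job name, output file and resource values are assumptions and should be adapted to your own job:
{{{
#!/bin/bash
#SBATCH --job-name=myjob        # illustrative job name
#SBATCH --output=myjob-%j.out   # stdout/stderr file, %j is replaced by the job ID
#SBATCH --cpus-per-task=1       # number of cores to reserve
#SBATCH --mem=4G                # memory to reserve
#SBATCH --time=01:00:00         # maximum run time (hh:mm:ss), assumed limit

# replace this with your own program
hostname
}}}
Submit the script with "sbatch myjob.sh", check its state with "squeue", and remove it with "scancel <jobid>" if necessary.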
Setting environment
The above commands only work if the environment variables for SLURM are set. Please add the following two lines to your ~/.bashrc :
{{{
export PATH=/usr/pack/slurm-19.05.0-sr/amd64-debian-linux9/bin:$PATH
export SLURM_CONF=/home/sladmitet/slurm/slurm.conf
}}}
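Afterwards you can verify in a new shell (or after re-reading your ~/.bashrc) that the setup works; the following is a quick sanity check:
{{{
source ~/.bashrc     # re-read the shell configuration
sinfo                # should list the arton nodes and their state
squeue -u $USER      # should list your own pending and running jobs (possibly empty)
}}}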